From: Ian Upright
Subject: Bit Vectored Identity Sets
Date: 
Message-ID: <0m6hfssihc9h2ugk1jssen4v79sluhqd4p@4ax.com>
Are there any Smalltalk implementations of an IdentitySet that internally
use Bit Vectors?  If one was doing a lot of set unions/intersections with
objects in ram, it would seem there could be a lot of performance benefits
to be gained there by using some sort of bit vector implementation.

Apparently GemStone uses some type of implementation of bit vectors in it's
identity sets?

The use of a bit vector seems pretty simple, when you hard-wire each bit to
be used for a specific individual set.  I imagine one really simple
implementation might be having a huge hashed/btree Dictionary containing all
entities, lookup and return another hashed/btree Dictionary of the bits
based on which IdentitySets the entity was in.  With IdentitySets that
aren't intersected at all, I would think this would be very significant
overhead, hardly worth doing this sort of implementation.  Also, I'm not
sure what kinds of benefits would be gained when you have a btree of bits,
it's hardly a simple bit comparision.

So what kind of pattern (if any exists) is used to take advantage of bit
vectors in cases where sets are being frequently unioned/interseted, and
with sets that aren't leave well enough alone and implement the set as it's
own individual instance with just one bit, or join it together with another
small IdentitySet that isn't used frequently?  Assuming you have tens of
thousands or even millions of different IdentitySets, how do you know which
IdentitySets should be grouped together to use the same hashtable/btree
lookup and get bit vectored together?  In GemStone it seems, this magic all
happens behind the scenes and I have no idea what might be going on in
there.  :-)

Anybody able to enlighten me, or have any references to papers, etc. that
might be relevant?

Ian