From: Gary Livingston
Subject: hash-table questions
Date: 
Message-ID: <6r4cao$c$1@usenet01.srv.cis.pitt.edu>
Hello,

I have several questions regarding efficient use of hash
tables:

1.  I notice in Paul Graham's "ANSI COMMON LISP," that
	eq and eql hash tables both incur extra costs under
	copying garbage collection algorithms.  Why?

2.  Why eq and eql but not equal or equalp?

3.  Why does Graham recommend as the best solution using an 
	eql hash table and fixnums as keys?  I thought he
	just stated that eql hash tables also incur GC costs.

THanks,
Gary Livingston


	

From: Jeffrey Mark Siskind
Subject: Re: hash-table questions
Date: 
Message-ID: <yq7lnoqm1si.fsf@qobi.nj.nec.com>
EQ (and EQL) determine whether two objects are identical (i.e. have the same
address). EQ (and EQL) hash tables thus hash on the address of the keys. A
copying GC changes the address of objects. Thus EQ (and EQL) hash tables must
be rehashed after a GC moves one of the objects in that table. EQUAL hash
tables don't hash on object identity or address. The hash function for EQUAL,
however, must traverse the key to compute the hash value, which can be
expensive for large keys. And lookup must also do an EQUAL (which requires key
traversal) when there is hash bucket collision. If you have an EQL hash table
where all the keys are numbers, then GC cannot cause the hash value of the
keys of the objects in that table to change so it never needs to be rehashed.
Presumably, a good implementation keeps a flag in the header of an EQL hash
table to keep track if there has ever been a SETF GETHASH on that table with a
nonnumeric key.

The standard effficient way of hashing objects (i.e. CLOS objects or DEFSTRUCT
structures) is to have a slot called SERIAL-NUMBER or some such, have a global
variable *SERIAL-NUMBER* (initialized to zero), and have an :AFTER
MAKE-INSTANCE method that initializes the SERIAL-NUMBER slot of new instances
of objects to (INCF *SERIAL-NUMBER*) as they are created. Then you use an EQL
hash table on the SERIAL-NUMBER slots of objects. This can all be done nicely
with mixins. In fact, it would have been nice if this was made part of the
language spec.

    Jeff (http://www.neci.nj.nec.com/homepages/qobi)
From: Gary Livingston
Subject: Re: hash-table questions
Date: 
Message-ID: <6r58gp$28h$1@usenet01.srv.cis.pitt.edu>
Thanks,

THat question about the difference between eq, eql, and equla hash tables had
been bugging me for a while.


I have often thought of a scheme similar to your serial numbers.

Thanks again,
Gary

Jeffrey Mark Siskind <····@research.nj.nec.com> wrote:
: EQ (and EQL) determine whether two objects are identical (i.e. have the same
: address). EQ (and EQL) hash tables thus hash on the address of the keys. A
: copying GC changes the address of objects. Thus EQ (and EQL) hash tables must
: be rehashed after a GC moves one of the objects in that table. EQUAL hash
: tables don't hash on object identity or address. The hash function for EQUAL,
: however, must traverse the key to compute the hash value, which can be
: expensive for large keys. And lookup must also do an EQUAL (which requires key
: traversal) when there is hash bucket collision. If you have an EQL hash table
: where all the keys are numbers, then GC cannot cause the hash value of the
: keys of the objects in that table to change so it never needs to be rehashed.
: Presumably, a good implementation keeps a flag in the header of an EQL hash
: table to keep track if there has ever been a SETF GETHASH on that table with a
: nonnumeric key.

: The standard effficient way of hashing objects (i.e. CLOS objects or DEFSTRUCT
: structures) is to have a slot called SERIAL-NUMBER or some such, have a global
: variable *SERIAL-NUMBER* (initialized to zero), and have an :AFTER
: MAKE-INSTANCE method that initializes the SERIAL-NUMBER slot of new instances
: of objects to (INCF *SERIAL-NUMBER*) as they are created. Then you use an EQL
: hash table on the SERIAL-NUMBER slots of objects. This can all be done nicely
: with mixins. In fact, it would have been nice if this was made part of the
: language spec.

:     Jeff (http://www.neci.nj.nec.com/homepages/qobi)
From: Kelly Murray
Subject: Re: hash-table questions
Date: 
Message-ID: <35D88D3A.D060AC8@IntelliMarket.Com>
> The standard effficient way of hashing objects (i.e. CLOS objects or DEFSTRUCT
> structures) is to have a slot called SERIAL-NUMBER or some such, have a global
> variable *SERIAL-NUMBER* (initialized to zero), and have an :AFTER
> MAKE-INSTANCE method that initializes the SERIAL-NUMBER slot of new instances
> of objects to (INCF *SERIAL-NUMBER*) as they are created. Then you use an EQL
> hash table on the SERIAL-NUMBER slots of objects. This can all be done nicely
> with mixins. In fact, it would have been nice if this was made part of the
> language spec.
> 

This would work nicely if sxhash was a generic function,
since then if an existing slot could function as a unique id,
there would be no need to duplicate that info by a mixin.
Actually, gethash needs to be a generic function, and have
its arguments reversed, and a hashtable would be a CLOS class.
If progress was only possible..   -kelly murray  


(method gethash ((obj hashtable) key) (lisp::gethash key obj))

(method hash (obj) (lisp::sxhash obj))

(method hash ((obj hashable-mixin)) (hashable-key obj))

and furthermore, if initforms of a slot could reference
the value of other slots, in this case, the class slot with
global value to increment the no global variable is needed,
and no :after method either. 

(class hashable-mixin () 
 (hashcount :allocation :class :initform 0)
 (hashkey :initform (incf hashcount) :accessor hashable-key)
 )
From: Jeffrey Mark Siskind
Subject: Re: hash-table questions
Date: 
Message-ID: <yq7iujrm6ia.fsf@qobi.nj.nec.com>
> Actually, gethash needs to be a generic function, and have
> its arguments reversed

To carry this further, I wish there were multidimensional hash tables. Instead
of just changing (GETHASH key table) to (GETHASH table key) it should further
change to (HREF table key0 ... keyN) and also allow SETF. And the different
axes should allow different :TESTs. I implemented this once upon a time.

> If progress was only possible..   -kelly murray

Sigh :-(

    Jeff (http://www.neci.nj.nec.com/homepages/qobi)
From: Marco Antoniotti
Subject: Re: hash-table questions
Date: 
Message-ID: <lwemuey8mz.fsf@galvani.parades.rm.cnr.it>
Jeffrey Mark Siskind <····@research.nj.nec.com> writes:

> > Actually, gethash needs to be a generic function, and have
> > its arguments reversed
> 
> To carry this further, I wish there were multidimensional hash tables. Instead
> of just changing (GETHASH key table) to (GETHASH table key) it should further
> change to (HREF table key0 ... keyN) and also allow SETF. And the different
> axes should allow different :TESTs. I implemented this once upon a time.
> 
> > If progress was only possible..   -kelly murray
> 


(WITH-GRIPE- MODE-ON
"
Why is not progress possible?  Harlequin and Franz could just ask for
comments on a *COMMON* proposal on C.L.L. and make the necessary
changes to their products.  Moreover, they could provide a *public
domain* implementation CL implementation of this technology (which is
*non relevant* for competition purposes) and be done with it.
Consensus may take some time to build, but it may work.
Finally, Harlequin's Hyperspec could have an "addendum" on the new
features non included in the standard, but available under all the
CL's under the Sun.
")

Cheers

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - (0)6 - 68 80 79 23, fax. +39 - (0)6 - 68 80 79 26
http://www.parades.rm.cnr.it
From: Duane Rettig
Subject: Re: hash-table questions
Date: 
Message-ID: <4af52p9or.fsf@beta.franz.com>
Marco Antoniotti <·······@galvani.parades.rm.cnr.it> writes:

> Jeffrey Mark Siskind <····@research.nj.nec.com> writes:
> 
> > > Actually, gethash needs to be a generic function, and have
> > > its arguments reversed
> > 
> > To carry this further, I wish there were multidimensional hash tables. Instead
> > of just changing (GETHASH key table) to (GETHASH table key) it should further
> > change to (HREF table key0 ... keyN) and also allow SETF. And the different
> > axes should allow different :TESTs. I implemented this once upon a time.
> > 
> > > If progress was only possible..   -kelly murray
> > 
> 
> 
> (WITH-GRIPE- MODE-ON
> "
> Why is not progress possible?

I think different people have differing views on what constitutes
progress.

1. Some may feel that progress is made when a formal standard itself
progresses.  Such progress is by nature extremely slow (when was the
last version of the CL spec?  And the one before it? :-)

2. Others may view progress as a cooperative effort among competitors
to provide common features above and beyond the language.  This kind
of progress is not only possible, but it does happen, and is usually
brought about by business relationships and mutual non-competitive
goals.  But I also believe that this kind of progress is relatively
slow.

3. As a CL developer, I view strong progress very myopically.  My
customers tell me what they want (we have many formal Requests For
Enhancement) logged by customers and developers), and part of my
job is to deal with those requests on a priority basis, implementing
those that end up with high priority.  In that sense, progress does not
necessarily produce portable extensions, but it is much faster and
very real.  As to the subject of this thread, many of the hash table
extensions in our lisp (including various weak hashtable options and
a sans-value (key-only) hash-table, as well as user-defined hash code
generation and :test) are due to such RFEs.  And I do like the
multi-dimensional idea; I will probably file an RFE to do some
research into it.

4. Finally, since CL (and lisp in general) is an extensible language,
it turns all CL programmers into CL developers, and many of the
features that are in CL today are the result of someone implementing
an extension on top of an existing lisp and then sharing it.

>  Harlequin and Franz could just ask for
> comments on a *COMMON* proposal on C.L.L. and make the necessary
> changes to their products.

I presume that by your emphasis you are looking for progress in either
the first (standards) or second (cooperative effort) categories.
However, if this is the case, why only Harlequin and Franz?  Shouldn't
this include all CL providers?

>  Moreover, they could provide a *public
> domain* implementation CL implementation of this technology (which is
> *non relevant* for competition purposes) and be done with it.

I assume that by "this technology" you mean a package of lisp code
that implements a new feature, and not an entire lisp.
I think most of us CL implementors try to do such things, but
sometimes it is hard to add a complex feature without making
low-level changes to the core lisp technology.  

> Consensus may take some time to build, but it may work.
> Finally, Harlequin's Hyperspec could have an "addendum" on the new
> features non included in the standard, but available under all the
> CL's under the Sun.

This leans more toward the "standards" category, above, which 
tends to take the longest time.

> ")
> 
> Cheers
> 
> -- 
> Marco Antoniotti ===========================================
> PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
> tel. +39 - (0)6 - 68 80 79 23, fax. +39 - (0)6 - 68 80 79 26
> http://www.parades.rm.cnr.it

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Mike McDonald
Subject: Re: hash-table questions
Date: 
Message-ID: <hZhC1.21367$MV.15438689@news.teleport.com>
In article <·············@beta.franz.com>,
	Duane Rettig <·····@franz.com> writes:
> Marco Antoniotti <·······@galvani.parades.rm.cnr.it> writes:

>>  Harlequin and Franz could just ask for
>> comments on a *COMMON* proposal on C.L.L. and make the necessary
>> changes to their products.
> 
> I presume that by your emphasis you are looking for progress in either
> the first (standards) or second (cooperative effort) categories.
> However, if this is the case, why only Harlequin and Franz?  Shouldn't
> this include all CL providers?

  From a practicle point of view, don't the two of you make up a vast majority
of the commercial CL market? If the two of you could agree on anything, that
pretty much makes it a defacto industry standard.

  Mike McDonald
  ·······@mikemac.com
From: Duane Rettig
Subject: Re: hash-table questions
Date: 
Message-ID: <47m06p25l.fsf@beta.franz.com>
·······@teleport.com (Mike McDonald) writes:

> In article <·············@beta.franz.com>,
> 	Duane Rettig <·····@franz.com> writes:
> > Marco Antoniotti <·······@galvani.parades.rm.cnr.it> writes:
> 
> >>  Harlequin and Franz could just ask for
> >> comments on a *COMMON* proposal on C.L.L. and make the necessary
> >> changes to their products.
> > 
> > I presume that by your emphasis you are looking for progress in either
> > the first (standards) or second (cooperative effort) categories.
> > However, if this is the case, why only Harlequin and Franz?  Shouldn't
> > this include all CL providers?
> 
>   From a practicle point of view, don't the two of you make up a vast majority
> of the commercial CL market?

Actually, from my own personal point of view, there are many more
customers involved in the commercial CL market than there are
vendors.  And although we CL vendors know the implementations
of our products better than our customers know (or care to know),
our customers know a lot more about their products that we vendors
know.  The CL standard itself is completely open, and in fact was
defined by both vendors and users of lisp.  So I view the commercial
CL market as a give-and-take between both customers and vendors.

> If the two of you could agree on anything, that
> pretty much makes it a defacto industry standard.

For this one, I looked up the word "industry" in my dictionary;
not all definitions had commercial connotations.  I think that
the lisp industry is made up of both commercial and non-commercial
elements, and any attempt by commercial vendors to create a
de facto standard, without taking into consideration non-commercial
users and vendors, might leave those groups out.  This may in fact
occur occasionally, but if the rest of the industry doesn't buy
the new piece of the standard, then a new faction is created,
splitting the CL industry.  A house divided against itself ...

>   Mike McDonald
>   ·······@mikemac.com
> 

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Tim Bradshaw
Subject: Re: hash-table questions
Date: 
Message-ID: <ey390klb1q8.fsf@todday.aiai.ed.ac.uk>
* Duane Rettig wrote:

> Actually, from my own personal point of view, there are many more
> customers involved in the commercial CL market than there are
> vendors.  And although we CL vendors know the implementations
> of our products better than our customers know (or care to know),
> our customers know a lot more about their products that we vendors
> know.  The CL standard itself is completely open, and in fact was
> defined by both vendors and users of lisp.  So I view the commercial
> CL market as a give-and-take between both customers and vendors.

Surely the right thing to do is for interested people (either vendors
or users) to actually sit down and write down & publicise a definition
of what they want, preferably with one or more implementations on top
of existing CLs, and such a definition could become a defacto standard
if enough people liked it (and a real official standard too in due
course I suppose).  I definitely don't think that some little
conglomerate of vendors should sit down and try and define a standard.

So I suppose what I'm trying to say is that, if people want some extra
feature (fancier hashtables, say), the answer is to sit down and write
a spec for it, preferably with an implementation, and let people
explore it.  comp.lang.lisp would probably be a good forum for
`publishing' such things.  I certainly wouldn't like to see the
vendors drive this.

--tim
From: Marco Antoniotti
Subject: Re: hash-table questions
Date: 
Message-ID: <lwiujpe5xg.fsf@galileo.parades.rm.cnr.it>
Hi,

I posted a polemic article, and I got a civilized response. Thanks.

Duane Rettig <·····@franz.com> writes:

> Marco Antoniotti <·······@galvani.parades.rm.cnr.it> writes:
> 
> > (WITH-GRIPE- MODE-ON
> > "
> > Why is not progress possible?
> 
> I think different people have differing views on what constitutes
> progress.
> 
> 1. Some may feel that progress is made when a formal standard itself
> progresses.  Such progress is by nature extremely slow (when was the
> last version of the CL spec?  And the one before it? :-)
> 
> 2. Others may view progress as a cooperative effort among competitors
> to provide common features above and beyond the language.  This kind
> of progress is not only possible, but it does happen, and is usually
> brought about by business relationships and mutual non-competitive
> goals.  But I also believe that this kind of progress is relatively
> slow.

Both are right, but both are kind "off the mark". IMHO progress in the
CL camp should be viewed right now (i.e. in August of 1998) as "making
the language more usable to the widest possible group of people". This
amounts to keeping option (1) in mind, while trying to homogeneize the
language as much as possible.

However, I personally feel (though I do not really know) that option
(1) is pragmatically pursuable right now.
Therefore, IMHO, this leaves option (2) as the sole source of possible
"progress" (as I defined before) in the CL camp.

This should answer all your later remarks .

> 3. As a CL developer, I view strong progress very myopically.  My
> customers tell me what they want (we have many formal Requests For
> Enhancement) logged by customers and developers), and part of my
> job is to deal with those requests on a priority basis, implementing
> those that end up with high priority.  In that sense, progress does not
> necessarily produce portable extensions, but it is much faster and
> very real.  As to the subject of this thread, many of the hash table
> extensions in our lisp (including various weak hashtable options and
> a sans-value (key-only) hash-table, as well as user-defined hash code
> generation and :test) are due to such RFEs.  And I do like the
> multi-dimensional idea; I will probably file an RFE to do some
> research into it.

The problem of HASH tables extensions is an epitome of the current
state of affairs. I do not have a LWW on my machine, so I can't
control, but, given that weak pointers have been around in most
implementations and that user defined hash functions are so useful, I
strongly believe that cooperation (between Franz and Harlequin) in
this case would definitively do more good to the CL camp as a whole
than the current situation.  The fact that "extensions" to hash tables
have been well understood for quite some time makes the matter only
more inexplicable.

> 
> 4. Finally, since CL (and lisp in general) is an extensible language,
> it turns all CL programmers into CL developers, and many of the
> features that are in CL today are the result of someone implementing
> an extension on top of an existing lisp and then sharing it.

Yes.  You are right. Unfortunately, most of the time this "layer" is
just a wrapping of some implementation dependent code which is
semantically equivalent but syntactically not.

> 
> >  Harlequin and Franz could just ask for
> > comments on a *COMMON* proposal on C.L.L. and make the necessary
> > changes to their products.
> 
> I presume that by your emphasis you are looking for progress in either
> the first (standards) or second (cooperative effort) categories.
> However, if this is the case, why only Harlequin and Franz?  Shouldn't
> this include all CL providers?

As Mike McDonald already answered you, the only other provider which
comes to mind is Digitool. They should be included as well.
> 
> >  Moreover, they could provide a *public
> > domain* implementation CL implementation of this technology (which is
> > *non relevant* for competition purposes) and be done with it.
> 
> I assume that by "this technology" you mean a package of lisp code
> that implements a new feature, and not an entire lisp.
> I think most of us CL implementors try to do such things, but
> sometimes it is hard to add a complex feature without making
> low-level changes to the core lisp technology.

I agree with that.  However, given that progress is defined in the
terms I gave above (after all, an overall bigger CL market will
benefit both Franz and Harlequin and Digitool), there have been cases
were the "public" availability of an implementation dependent CL piece
of software made it possible to adapt it to non commercial systems and
therefore enlarged the pool of CL users (the case of CLX - to refer to
a substantial piece of code - and CMUCL comes to mind).

> > Consensus may take some time to build, but it may work.
> > Finally, Harlequin's Hyperspec could have an "addendum" on the new
> > features non included in the standard, but available under all the
> > CL's under the Sun.
> 
> This leans more toward the "standards" category, above, which 
> tends to take the longest time.

Of course it leans toward the "standards" category. I simply argue
that a two-vendors standard is better than a one-vendor standard (do
keywords like FFI, Socket Interface, and even GUI ring a bell?)
given that the universal standard is nowadays difficult to achieve. A
three-vendors standard is even better.  A three-vendor standard made
easy to incorporate in non commercial systems is even better since it
enlarges the pool of CL users.

I just hope you partially agree with me and that you have some
decision power within Franz to make it happen.

And now back to programming in Java.

Cheers

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - (0)6 - 68 80 79 23, fax. +39 - (0)6 - 68 80 79 26
http://www.parades.rm.cnr.it
From: Scott L. Burson
Subject: Re: hash-table questions
Date: 
Message-ID: <35D64775.E19C2ACC@zeta-sqoft.com>
Gary Livingston wrote:
> I have several questions regarding efficient use of hash
> tables:
> 
> 1.  I notice in Paul Graham's "ANSI COMMON LISP," that
>         eq and eql hash tables both incur extra costs under
>         copying garbage collection algorithms.  Why?

Because in these tables, non-immediate objects are hashed by their address. 
Since copying GC relocates objects to new addresses, a hash table containing
non-immediate objects has to be rehashed after a GC.

> 2.  Why eq and eql but not equal or equalp?

EQUAL and EQUALP tables hash structures by using SXHASH or a similar hash
function, which does not look at addresses of objects.

> 3.  Why does Graham recommend as the best solution using an 
>         eql hash table and fixnums as keys?  I thought he
>         just stated that eql hash tables also incur GC costs.

Fixnums, being immediate objects, have no address and are hashed by their value
instead.  Some hash table packages keep track of whether only immediate objects
have been used as keys for a given table, and if so, they skip the rehashing
step.

"Immediate" means that the object's value is given directly by the machine word
that represents it.  These normally include fixnums, characters, and in some
implementations, some kinds of floating-point numbers.  "Non-immediate" means
that the object's value is stored in memory at some address, and the machine
word that represents it is a pointer to that address.

A side note -- about a year ago I encountered the first situation I can recall,
in my two decades of Lisp work, in which I really needed the full power of EQUAL
hashing.  That is, I wasn't just hashing strings -- the most common application
of EQUAL hashing -- but arbitrarily nested list structures containing integers,
strings, and symbols.  It is a remarkably potent technique.  I wonder if I might
not have found uses for it sooner had I been more alert to them.

-- Scott

				  * * * * *

To use the email address, remove all occurrences of the letter "q".
From: Duane Rettig
Subject: Re: hash-table questions
Date: 
Message-ID: <4af4y67u6.fsf@beta.franz.com>
"Scott L. Burson" <·····@zeta-sqoft.com> writes:

> Gary Livingston wrote:
> > I have several questions regarding efficient use of hash
> > tables:

> > 3.  Why does Graham recommend as the best solution using an 
> >         eql hash table and fixnums as keys?  I thought he
> >         just stated that eql hash tables also incur GC costs.
> 
> Fixnums, being immediate objects, have no address and are hashed by their value
> instead.  Some hash table packages keep track of whether only immediate objects
> have been used as keys for a given table, and if so, they skip the rehashing
> step.
> 
> "Immediate" means that the object's value is given directly by the machine word
> that represents it.  These normally include fixnums, characters, and in some
> implementations, some kinds of floating-point numbers.  "Non-immediate" means
> that the object's value is stored in memory at some address, and the machine
> word that represents it is a pointer to that address.

Actually, "immediacy" wrt hashing doesn't have to be tied to the immediacy
of the implemented type; any object for which a hash code can be generated
deterministically (and preferably uniquely) without regard to the object's
address can be used efficiently in eq/eql hash tables.  In Allegro, we store
all numbers into eq/eql hash-tables without dirtying them, and also any
objects for which a unique hash code is generated and placed into the object
itself (i.e. symbols, function objects, and standard-instances).  So using
eq/eql tables is very fast with any of the above types.

Furthermore, we observed the phenomenon that once an object becomes old
enough (in Allegro, it is tenured into oldspace; some other lisps promote
finally to their dynamic space), the object never moves again until the
next global (dynamic) gc.  Because of this fact, we added a twofold
gradation to the "dirtying" of eq/eql hashtables - so even if there are
objects as keys that are not immediate in the above sense, if they are
all old enough the hash-table will only need rehashing the first use
after a global-gc, instead of after each scavenge.

Of course, if the keys are all "immediate" as described above, the hash
table would _never_ need rehashing until full.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)