(make-hash-table :test #'mytest)

From: Knut Olav Bøhmer
Subject: (make-hash-table :test #'mytest)
Date: Sat, 02 Nov 2002 09:43:38 +0000
Message-ID: <ujp8z0ce27p.fsf@patch.linpro.no>

Hi,

Is it possible to make ones own :test to use with hash-tables? if yes,
how (where is the doc)? if no, why?


/me newbie
-- 
Knut Olav B�hmer
         _   _
       / /  (_)__  __ ____  __
      / /__/ / _ \/ // /\ \/ /  ... The choice of a
     /____/_/_//_/\.,_/ /_/\.\         GNU generation

An ideal world is left as an exercise to the reader. (Paul Graham)

Re: (make-hash-table :test #'mytest) Paul F. Dietz
Re: (make-hash-table :test #'mytest) Kaz Kylheku
Re: (make-hash-table :test #'mytest) Vijay L
Re: (make-hash-table :test #'mytest) David Lichteblau
Re: (make-hash-table :test #'mytest) Wade Humeniuk
Re: (make-hash-table :test #'mytest) Duane Rettig
Re: (make-hash-table :test #'mytest) Coby Beck
Re: (make-hash-table :test #'mytest) Matthew Danish
Re: (make-hash-table :test #'mytest) Erik Naggum
- Re: (make-hash-table :test #'mytest) Peter Seibel
  - Re: (make-hash-table :test #'mytest) Paul F. Dietz
    - Re: (make-hash-table :test #'mytest) Marcus Breiing
    - Re: (make-hash-table :test #'mytest) Frode Vatvedt Fjeld
      - Re: (make-hash-table :test #'mytest) Paul F. Dietz
        Re: (make-hash-table :test #'mytest) Duane Rettig
        Re: (make-hash-table :test #'mytest) Paul F. Dietz
        Re: (make-hash-table :test #'mytest) Michael Hudson
        Re: (make-hash-table :test #'mytest) Paul F. Dietz
        Re: (make-hash-table :test #'mytest) Duane Rettig
        Re: (make-hash-table :test #'mytest) Tim Bradshaw
        Re: (make-hash-table :test #'mytest) Erik Naggum
        Re: (make-hash-table :test #'mytest) Thomas A. Russ
      - Re: (make-hash-table :test #'mytest) Tim Bradshaw
        Re: (make-hash-table :test #'mytest) Daniel Barlow
        Re: (make-hash-table :test #'mytest) Rob Warnock
        Re: (make-hash-table :test #'mytest) Tim Bradshaw
        Re: (make-hash-table :test #'mytest) Duane Rettig
        Re: (make-hash-table :test #'mytest) Erik Naggum
        Re: (make-hash-table :test #'mytest) Tim Bradshaw
    - Re: (make-hash-table :test #'mytest) Peter Seibel

From: Paul F. Dietz
Subject: Re: (make-hash-table :test #'mytest)
Date: Sat, 02 Nov 2002 13:30:52 +0000
Message-ID: <3DC3D38C.9070301@dls.net>

Knut Olav B�hmer wrote:
> Hi,
> 
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

How would the hash table be able to obtain a hash function H
with the property

    (funcall testfn x y) ==> (eql (H x) (H y))

The only way it could do that would be if H were a function
returning a constant, but in that case you might as well
use a list with FIND.

It would make sense to augment the hash table interface to
take a hash function as well as a comparison function.

	Paul

From: Kaz Kylheku
Subject: Re: (make-hash-table :test #'mytest)
Date: Sat, 02 Nov 2002 16:28:15 +0000
Message-ID: <cf333042.0211020828.549cfe0a@posting.google.com>

"Knut Olav B�hmer <·····@linpro.no> wrote in message news:<···············@patch.linpro.no>...
> Hi,
> 
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

The language specification says that the test function must be one of
EQ, EQL, EQUAL or EQUALP. So the restriction does not just rule out
your own test functions, but even some standard ones like STRING=.

One productive way to deal with hash tables is to view them as a low
level optimization tool. You can wrap them behind interfaces that
don't require the user to select the appropriate test. Internally, you
know which of the four applies.

E.g. how about this hack:

  ;; real version would pass through other make-hash-table args
  (defgeneric make-hash-for-slot (obj slot))

 
  (defmethod make-hash-for-slot ((obj employee) (slot symbol))
    (ecase slot
      ((employee-number) (make-hash-table :test #'eql))
      ((first-name) (make-hash-table :test #'equal))
      (otherwise 

        (error "MAKE-HASH-FOR-SLOT: Unsupported slot ~a."
               slot)))

Another useful technique is a wrapper that maps some standard
comparison functions to the closes EQ- one.

  (defconstant hash-fun-substitution-list
    `((nil ,#'eql)
      (,#'eq ,#'eq)
      (,#'eql ,#'eql)
      (,#'equal ,#'equal)
      (,#'equalp ,#'equalp)
      (,#'string= ,#'equal)
      (,#'string-equal ,#'equalp)
      (,#'= ,#'eql)
            ;; ... etc
      ))

  (defun make-hash-table* (&key test) ;; other args
    (let ((fun (find test hash-fun-substitution-list 
                     :key #'first :test #'eq)))
      (if fun
        (make-hash-table :test (second fun))
        (error "MAKE-HASH-TABLE*: Unsupported function ~a." test))))

From: Vijay L
Subject: Re: (make-hash-table :test #'mytest)
Date: Sat, 02 Nov 2002 16:55:23 +0000
Message-ID: <1eaf81aa.0211020855.4e9d941a@posting.google.com>

"Knut Olav B�hmer <·····@linpro.no> wrote in message news:<···············@patch.linpro.no>...
> Hi,
> 
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?
> 
> 
> /me newbie

AFAIK it isn't possible. I don't think it is really necessary to keep
any a test other than the four already provided: EQ, EQL, EQUAL,
EQUALP

EQ takes care of symbols
EQL symbols and numbers
EQUAL conses
EQUALP structures (even copies of structures with the same data), like
an EQUAL test for all structures

The :test tests for equality when searching the hash-table. If you
want to alter the key value use the :key. However check the manual and
see if the key function is applied to both the hash-key in the table
and key you give, I don't have access to the manual right now.

Thanks,

Vijay L

From: David Lichteblau
Subject: Re: (make-hash-table :test #'mytest)
Date: Sat, 02 Nov 2002 11:43:46 +0000
Message-ID: <oelof98p571.fsf@vanilla.rz.fhtw-berlin.de>

"Knut Olav B�hmer" <·····@linpro.no> writes:

> Is it possible to make ones own :test to use with hash-tables? if yes,

Not portably, but most implementations support non-standard hash
functions and test predicates, either as MAKE-HASH-TABLE keyword
arguments or (in the case of CMUCL) using a separate registration
mechanism.

> how (where is the doc)?

Ships with your Lisp.

> if no, why?

No idea.

From: Wade Humeniuk
Subject: Re: (make-hash-table :test #'mytest)
Date: Sat, 02 Nov 2002 15:29:41 +0000
Message-ID: <FbSw9.2763$vr4.254609@news2.telusplanet.net>

"Knut Olav B�hmer" <·····@linpro.no> wrote in message ····················@patch.linpro.no...
>
> Hi,
>
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

It is likely that if you need your own test for a hash-table that should
implement some specialized data structure to hold your data (and
not use the built in hash tables).  For example I had some data
sorted by date, so I created arrays of months for each year.  Each
month contained an array of days-in-the-month, each day was an
ordered list of entries.  This specialized storage is faster than a
hash-table, but its structure is based on knowledge of the nature
of the data.


--
Wade

(format t "Email: ~A"
        (map 'string
             'code-char
             '(119 104 117 109 101 110 105 117 64
                   116 101 108 117 115 46 110 101 116)))

From: Duane Rettig
Subject: Re: (make-hash-table :test #'mytest)
Date: Sat, 02 Nov 2002 21:00:00 +0000
Message-ID: <4iszfhg9n.fsf@beta.franz.com>

"Knut Olav B�hmer" <·····@linpro.no> writes:

> Hi,
> 
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

Not portably, other than the ones given in the Ansi Spec.  However,
some implementations provide such features, also including hash-code
generator function specification, as well as weak hash-tables and key-only
tables for space savings.  For documentation on Allegro CL's extensions,
see

http://www.franz.com/support/documentation/6.2/doc/implementation.htm#cl-make-hash-table-2

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Coby Beck
Subject: Re: (make-hash-table :test #'mytest)
Date: Sat, 02 Nov 2002 10:17:44 +0000
Message-ID: <aq08ma$2i2d$1@otis.netspace.net.au>

"Knut Olav B�hmer" <·····@linpro.no> wrote in message
····················@patch.linpro.no...
>
> Hi,
>
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

Can't say why, but the answer is no.

from the make-hash-table entry in the hyperspec:

<quote>
make-hash-table &key test size rehash-size rehash-threshold => hash-table

Arguments and Values:

test---a designator for one of the functions eq, eql, equal, or equalp. The
default is eql.

</quote>


--
Coby Beck
(remove #\Space "coby 101 @ bigpond . com")

From: Matthew Danish
Subject: Re: (make-hash-table :test #'mytest)
Date: Sun, 03 Nov 2002 11:16:02 +0000
Message-ID: <20021103061602.E28154@lain.cheme.cmu.edu>

On Sat, Nov 02, 2002 at 10:43:38AM +0100, Knut Olav B?hmer wrote:
> Is it possible to make ones own :test to use with hash-tables? if yes,
> how (where is the doc)? if no, why?

The problem is that you need more than a :TEST to hash values in a
table; you need a hash-function for that type of value.  You have the
option of using the hash-table interface extensions that various
implementations provide, using a third-party hash-table package, or
writing your own hash function which maps your values to something that
can be inserted into the standard hash-tables.

Generally I take the third option; for example I encoded a Checkers
board as a 96-bit integer and stored using that number in a EQL
hash-table.  In this case the key was always unique to the board, so
there were no collisions, but if there is the possibility of a collision
then you will need to handle it yourself via chaining or some other
mechanism.  The built-in hash-table would not know the difference
between two different Checkers boards that give the same key value, for
example.

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."

From: Erik Naggum
Subject: Re: (make-hash-table :test #'mytest)
Date: Sun, 03 Nov 2002 18:58:46 +0000
Message-ID: <3245338726772626@naggum.no>

* Knut Olav B�hmer
| Is it possible to make ones own :test to use with hash-tables? if yes,
| how (where is the doc)? if no, why?

  This is actually very simple, but not immediately obvious because of the
  many choices to `make-hash-table�, which tends to make you look for ways
  to expand on those choices.  The better solution is to look another way
  entirely and realize that the key in the hash-table does not necessarily
  have to be the actual key.

  Even with `equalp�, which traverses structures, you still end up with a
  hash bucket that is some index into an array.  So the function has to do
  two things: compute some magic number and compare the item found in the
  hash-table to the actual key before returning the value in the hash-table.

  Your `mytest� function would therefore have to consist of two functions:
  (A) One of one argument that returns a (numeric) hash key, and (B) one of
  two arguments that returns true if the two arguments are the same under
  your test and nil otherwise.  You might argue that the way we specify the
  `:test� argument to `make-hash-table� today is deficient in that it hides
  the details and pretends that real functions are involved when they are
  really keyword arguments from a finite set of values that select two or
  more aspects of the hash-table at once.

  To make your own hashtable work, you would therefore have to implement
  the bucket yourself, and each hash key you use would have to retrieve a
  list of potential values and scan it for the actual value you want.  One
  good way to do this is to find an orthogonal quality that can also be
  reduced to a `eq�-testable value and then use plists.  Then you build a
  few simple wrapper macros to make use of your improved hash table.

  In practice, your knowledge of the data you wish to store and retrieve
  will in the general case be superior to what you can express in any
  declarative language and it is therefore alluring the way politcal
  campaign promises are to choose some "simple" mechanism that covers
  another small area of the problem space in addition to the existing hash
  test functions, but then someone will come along to ask exactly the same
  question you did when they step outside that area, too.  Therefore, a
  standard has to strike a balance between convenience and simplicity.

  To wrap this up with a reference back to what I said at the top, the fact
  that you can get a bunch of test functions tends to make you look for a
  way to extend that list.  This is the wrong approach in general, but it
  is known to be hard to leave a path you think ought to be good.  There is
  even an idiom for it: To be stuck in a rut, where rut is literally a deep
  track made by the passage of a wheel.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

From: Peter Seibel
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 05:23:45 +0000
Message-ID: <m3u1iw1tga.fsf@localhost.localdomain>

Erik Naggum <····@naggum.no> writes:

>   You might argue that the way we specify the `:test� argument to
>   `make-hash-table� today is deficient in that it hides the details
>   and pretends that real functions are involved when they are really
>   keyword arguments from a finite set of values that select two or
>   more aspects of the hash-table at once.

So, is it the case that internally gethash uses different hash
functions depending on which :test you choose?

-Peter

-- 
Peter Seibel
·····@javamonkey.com

From: Paul F. Dietz
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 11:36:38 +0000
Message-ID: <J8KdneHVhYjXMFqgXTWcoA@dls.net>

Peter Seibel wrote:

> So, is it the case that internally gethash uses different hash
> functions depending on which :test you choose?

Almost certainly.  It would be inefficient to have an EQ hash
table use the same hash function as an EQUAL hash table, since the
latter must traverse the S expression to compute the function, while
the former can just hash the address of the object.

	Paul

From: Marcus Breiing
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 12:19:43 +0000
Message-ID: <tDh508E9tIVFavme6rMeld@breiing.com>

"Paul F. Dietz" <·····@dls.net> wrote:

> Peter Seibel wrote:
>> So, is it the case that internally gethash uses different hash
>> functions depending on which :test you choose?

> Almost certainly.  It would be inefficient to have an EQ hash table
> use the same hash function as an EQUAL hash table, since the latter
> must traverse the S expression to compute the function, while the
> former can just hash the address of the object.

Beyond efficiency, the reason to have different hash functions is
modifications to hash table keys.  You can't reasonably expect
EQUAL-hash tables to work correctly under such modifications.  If
you'd use the EQUAL-hash for EQ-tables, such modifications wouldn't
work there, either.  However, EQ-Hash tables are rightly required to
work correctly if you modify objects used as keys (using standard
functions;) see 18.1.2 in the Hyperspec.

-- 
Marcus Breiing
···················@breiing.com (will expire)

From: Frode Vatvedt Fjeld
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 12:46:07 +0000
Message-ID: <2h1y60gp68.fsf@vserver.cs.uit.no>

"Paul F. Dietz" <·····@dls.net> writes:

> Almost certainly.  It would be inefficient to have an EQ hash table
> use the same hash function as an EQUAL hash table, since the latter
> must traverse the S expression to compute the function, while the
> former can just hash the address of the object.

And rehash at GC-time? Is this really being done in practice?

-- 
Frode Vatvedt Fjeld

From: Paul F. Dietz
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 13:10:51 +0000
Message-ID: <eeadnSK897_CXlqgXTWc3A@dls.net>

Frode Vatvedt Fjeld wrote:

> And rehash at GC-time? Is this really being done in practice?

If the collector moves the objects then, yes, it has to rehash
at GC time.  ACL does this, I believe.

GC clearly also needs to be involved if the lisp supports a weak key
extension for EQ hash tables.

	Paul

From: Duane Rettig
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 17:00:01 +0000
Message-ID: <4ela0m0vg.fsf@beta.franz.com>

"Paul F. Dietz" <·····@dls.net> writes:

> Frode Vatvedt Fjeld wrote:
> 
> > And rehash at GC-time? Is this really being done in practice?
> 
> If the collector moves the objects then, yes, it has to rehash
> at GC time.  ACL does this, I believe.

No.  Rehashing after every gc would be incredibly inefficient.  Objects
whose hash codes will change due to a gc only _flag_ the hash table
for rehash before the next gethash or puthash operation (if further
GCs occur before the next hashing operation, then if automatic
rehashing had been forced each gc would have wasted one rehash
operation).  Also, some objects provably don't move, and some objects
can be hashed in ways other than their address, so that they don't
"dirty" the hash-table.  So no rehashing is ever needed if the
hash-table only contains those objects whose hash codes have provably
not changed.

> GC clearly also needs to be involved if the lisp supports a weak key
> extension for EQ hash tables.

Yes.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Paul F. Dietz
Subject: Re: (make-hash-table :test #'mytest)
Date: Wed, 06 Nov 2002 00:11:57 +0000
Message-ID: <ldCdnfDLTLTTw1WgXTWc3g@dls.net>

Duane Rettig wrote:

> No.  Rehashing after every gc would be incredibly inefficient.  Objects
> whose hash codes will change due to a gc only _flag_ the hash table
> for rehash before the next gethash or puthash operation (if further
> GCs occur before the next hashing operation, then if automatic
> rehashing had been forced each gc would have wasted one rehash
> operation).

Ah.  Thanks.

	Paul

From: Michael Hudson
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 18:37:51 +0000
Message-ID: <7h3el9z27cb.fsf@pc150.maths.bris.ac.uk>

Duane Rettig <·····@franz.com> writes:

> No.  Rehashing after every gc would be incredibly inefficient.  Objects
> whose hash codes will change due to a gc only _flag_ the hash table
> for rehash before the next gethash or puthash operation

How do you efficiently tell which hashtables a given object is in?  I
can't think of a blindly obvious way of doing that. I haven't spent
long thinking about it, so sorry if I'm just being dumb.

Cheers,
M.

-- 
  All programs evolve until they can send email.      -- Richard Letts
  Except Microsoft Exchange.                                    -- Art
               -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html

From: Paul F. Dietz
Subject: Re: (make-hash-table :test #'mytest)
Date: Wed, 06 Nov 2002 00:09:39 +0000
Message-ID: <ldCdnfHLTLRZwFWgXTWc3g@dls.net>

Michael Hudson wrote:

> How do you efficiently tell which hashtables a given object is in?  I
> can't think of a blindly obvious way of doing that. I haven't spent
> long thinking about it, so sorry if I'm just being dumb.

If GC is moving objects around it already has to solve the problem
of finding all the pointers to those objects.

	Paul

From: Duane Rettig
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 21:00:02 +0000
Message-ID: <47kfrpxrc.fsf@beta.franz.com>

Michael Hudson <···@python.net> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > No.  Rehashing after every gc would be incredibly inefficient.  Objects
> > whose hash codes will change due to a gc only _flag_ the hash table
> > for rehash before the next gethash or puthash operation
> 
> How do you efficiently tell which hashtables a given object is in?  I
> can't think of a blindly obvious way of doing that. I haven't spent
> long thinking about it, so sorry if I'm just being dumb.

Who is "you" in this case?  Why do "you" _want_ to tell which hashtables
a given object is in?

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Tim Bradshaw
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 22:26:41 +0000
Message-ID: <ey365vbd55q.fsf@cley.com>

* Michael Hudson wrote:

> How do you efficiently tell which hashtables a given object is in?  I
> can't think of a blindly obvious way of doing that. I haven't spent
> long thinking about it, so sorry if I'm just being dumb.

You don't need to.  When you insert an object whose hash-code may
change into a table, you flag the table as needing a rehash after a
GC, then, next time you access it, if a GC has happened, you rehash
then.  (I bet implementations do even smarter things of course).

The classic case where you can avoid rehashing is EQL tables which
only have numeric keys.

--tim

From: Erik Naggum
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 22:30:40 +0000
Message-ID: <3245524240078182@naggum.no>

* Michael Hudson
| How do you efficiently tell which hashtables a given object is in?  I
| can't think of a blindly obvious way of doing that.  I haven't spent long
| thinking about it, so sorry if I'm just being dumb.

  You are not allowed to change the key in a way that would change the
  value of the test function applied to it.  This is not when rehashing
  takes place.  Rehashing takes place when you have an `eq� hashtable of
  objects that do not have an internal hash code (such as is commonly done
  with symbols), and a garbage collection moves the objects around.  As
  Duane explained, there is no point in actually performing the rehash
  until the hash-table is accessed after a garbage collection that actually
  moved an object.  The information necessary to make this decision is all
  local to the hash-table.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

From: Thomas A. Russ
Subject: Re: (make-hash-table :test #'mytest)
Date: Thu, 07 Nov 2002 22:47:20 +0000
Message-ID: <ymi1y5xatfr.fsf@sevak.isi.edu>

Michael Hudson <···@python.net> writes:

> 
> Duane Rettig <·····@franz.com> writes:
> 
> > No.  Rehashing after every gc would be incredibly inefficient.  Objects
> > whose hash codes will change due to a gc only _flag_ the hash table
> > for rehash before the next gethash or puthash operation
> 
> How do you efficiently tell which hashtables a given object is in?  I
> can't think of a blindly obvious way of doing that. I haven't spent
> long thinking about it, so sorry if I'm just being dumb.

I don't think you have to do that at all.  I can imagine it being done
with a gc-counter and a flag.  If any item whose hash changes on a GC
is added to a particular hash-table, it sets the hash-changes flag.  The
GC counter is set on each rehash of the table, and on accesses checked
to see if one (or more) GC operations have occurred since the last rehash.

-- 
Thomas A. Russ,  USC/Information Sciences Institute          ···@isi.edu

From: Tim Bradshaw
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 13:01:47 +0000
Message-ID: <ey33cqgdvb8.fsf@cley.com>

* Frode Vatvedt Fjeld wrote:
> And rehash at GC-time? Is this really being done in practice?

It more-or-less has to be I think.  Unless object contain some magic
bit of data which uniquely identifies them but is not their address,
then if their address changes, you need to rehash.

--tim

From: Daniel Barlow
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 17:29:31 +0000
Message-ID: <874rawkjr8.fsf@noetbook.telent.net>

Tim Bradshaw <···@cley.com> writes:

> * Frode Vatvedt Fjeld wrote:
>> And rehash at GC-time? Is this really being done in practice?
>
> It more-or-less has to be I think.  Unless object contain some magic
> bit of data which uniquely identifies them but is not their address,
> then if their address changes, you need to rehash.

Though note that you can do this lazily (I think CMUCL does) - do the
rehash on the first gethash after the gc has happened, instead of at
gc time itself.  This is a win if you don't access the hash table very
often (you could do two or more GCs in between rehashes), and a lose
if you were expecting that gethash takes a predictable time to run.

-dan

-- 

  http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources

From: Rob Warnock
Subject: Re: (make-hash-table :test #'mytest)
Date: Wed, 06 Nov 2002 02:56:41 +0000
Message-ID: <gp2cnWVJDsB0GVWgXTWcpg@giganews.com>

Tim Bradshaw  <···@cley.com> wrote:
+---------------
| * Frode Vatvedt Fjeld wrote:
| > And rehash at GC-time? Is this really being done in practice?
| 
| It more-or-less has to be I think.  Unless object contain some magic
| bit of data which uniquely identifies them but is not their address,
| then if their address changes, you need to rehash.
+---------------

Some kinds of objects might conveniently have a hash computed on them
when they're created or read in (or lazily, the first time they're used
in any hash-related operation) that is permanently stored with the object
(and recomputed if the object is mutated). Strings & symbols (with the
hash being based on their name-strings) come to mind as the main candidates
for this, though in some cases bignums might also benefit.

In that case, hash tables containing only keys with such "persistent"
hash values [or values whose hashes don't depend on location] need not
be re-hashed when GC'd, nor even when expanded (if the stored hash value
contains "extra" bits which were unused in doing lookups with the smaller
hash table size). Indeed, a hash table might contain a hint bit that says
whether the table is currently "GC-clean", which is cleared whenever a
non-cooperating key is stored into it (and perhaps set again during some
future GC if the table becomes "clean" again).


-Rob

-----
Rob Warnock, PP-ASEL-IA		<····@rpw3.org>
627 26th Avenue			<URL:http://www.rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Tim Bradshaw
Subject: Re: (make-hash-table :test #'mytest)
Date: Wed, 06 Nov 2002 09:28:18 +0000
Message-ID: <ey3y987avyl.fsf@cley.com>

* Rob Warnock wrote:

> Some kinds of objects might conveniently have a hash computed on
> them when they're created or read in (or lazily, the first time
> they're used in any hash-related operation) that is permanently
> stored with the object (and recomputed if the object is
> mutated). Strings & symbols (with the hash being based on their
> name-strings) come to mind as the main candidates for this, though
> in some cases bignums might also benefit.

This can only work, I think, for immutable objects, or for objects
whose hash-code doesn't need to change if they are mutated.
Otherwise, if you change the object, then any hashtable which contains
it as a key needs to be marked for rehash, and that is hideous -
either you need a global bit which says `some hash code may have
changed, rehash any table on access', which bit needs to be set by any
mutation of any such hash-coded object, or such objects need to keep
tabs on which tables they are in.

(This isn't to say that such a trick isn't very useful.)

--tim

From: Duane Rettig
Subject: Re: (make-hash-table :test #'mytest)
Date: Wed, 06 Nov 2002 19:00:01 +0000
Message-ID: <4n0ommu3m.fsf@beta.franz.com>

Tim Bradshaw <···@cley.com> writes:

> * Rob Warnock wrote:
> 
> > Some kinds of objects might conveniently have a hash computed on
> > them when they're created or read in (or lazily, the first time
> > they're used in any hash-related operation) that is permanently
> > stored with the object (and recomputed if the object is
> > mutated). Strings & symbols (with the hash being based on their
> > name-strings) come to mind as the main candidates for this, though
> > in some cases bignums might also benefit.
> 
> This can only work, I think, for immutable objects, or for objects
> whose hash-code doesn't need to change if they are mutated.

This is what I understood Rob to be saying.  Symbols are easy, but
strings are not quite as easy, since they may or may not be
immutable,  depending on whether they were created as constants or
via make-string.  So you can still use strings in eq/eql hash-tables,
though we don't assume immutability for strings.

In Allegro CL, we have an internal function called excl::sxhash-if-fast
(which also implies "if-immutable-hash-code").  It yields results for
all of the following object types (which are thus "clean" by definition
for eq/eql hashtables): null, symbol, character, single-float, double-float,
fixnum, bignum, ratio, complex, function-object, standard-instance.

> Otherwise, if you change the object, then any hashtable which contains
> it as a key needs to be marked for rehash, and that is hideous -
> either you need a global bit which says `some hash code may have
> changed, rehash any table on access', which bit needs to be set by any
> mutation of any such hash-coded object, or such objects need to keep
> tabs on which tables they are in.
> 
> (This isn't to say that such a trick isn't very useful.)

Useful, and used.  A hash-table is marked as dirty when an "unclean"
object is put into it.

However, there is yet one more trick that we employ, even if an object
is not clean wrt its hash-code generation:  For objects of types other
than listed above, where for eq/eql hastables the hash-code is generated
from the object's address, the object will only dirty the table if it
moves.  Since some objects move less often than others, we can sometimes
determine _how_ dirty the hash-table is by where its objects were when
they were allocated.  In our own generational GC, the nursery generations
are kept in a newspace and are likely to move each scavenge (generational
gc) and the aged objects are kept in an old area and cannot move unless 
a global gc is performed.  Thus, when an "unclean" object is put into an
eq/eql hash-table, its age is checked and the hash-table is only dirtied
according to that age.  We maintain two dirty bits instead of one:
a dirty-after-scavenge bit and a dirty-after-global-gc bit.   Thus, if all
of the objects have been tenured (aged) before being placed into the
hash-table, the hash table need never be rehashed until the before the
access after the next global-gc, rather than before the access after the
next scavenge.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Erik Naggum
Subject: Re: (make-hash-table :test #'mytest)
Date: Wed, 06 Nov 2002 20:58:17 +0000
Message-ID: <3245605097824604@naggum.no>

* Tim Bradshaw
| Otherwise, if you change the object, then any hashtable which contains
| it as a key needs to be marked for rehash, and that is hideous -

  It is also undefined behavior in Common Lisp.

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.

From: Tim Bradshaw
Subject: Re: (make-hash-table :test #'mytest)
Date: Thu, 07 Nov 2002 00:37:16 +0000
Message-ID: <ey3iszatdtv.fsf@cley.com>

* Erik Naggum wrote:

>   It is also undefined behavior in Common Lisp.

Good point.

From: Peter Seibel
Subject: Re: (make-hash-table :test #'mytest)
Date: Tue, 05 Nov 2002 15:57:01 +0000
Message-ID: <m365vcyprj.fsf@localhost.localdomain>

"Paul F. Dietz" <·····@dls.net> writes:

> Peter Seibel wrote:
> 
> > So, is it the case that internally gethash uses different hash
> > functions depending on which :test you choose?
> 
> Almost certainly.  It would be inefficient to have an EQ hash
> table use the same hash function as an EQUAL hash table, since the
> latter must traverse the S expression to compute the function, while
> the former can just hash the address of the object.

That's pretty much what I figured, thanks. Anyone know why (I'm sure
someone here does) the API was designed with this sort of "fake"
function argument as opposed to actually taking a :test function
argument that can be any comparator function and something like a
:hash argument that takes a hash function (which might be optional if
you pass one of the standard comparators in the :test argument.)

I've read Erik Naggum's response to the OP saying this isn't what you
really want a couple times but I'm still not quite getting it. I
understand why--given the current API--you can't pass any old function
to :test; what I don't get is why it would have been so horrible to do
something like what I proposed to support different
comparators/hash-functions.

-Peter

-- 
Peter Seibel
·····@javamonkey.com