From: akopa
Subject: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <d38de617-50c2-4eda-a7ca-0fded1272124@e6g2000prf.googlegroups.com>
Does anyone have any insight into which is more efficient when the
number of elements in the sequence keys is constant? E.g. (gethash
'(foo bar) hash) vs (gethash 'bar (gethash 'foo hash)).  I have a
function I want to memoize on a certain number of arguments, but I
don't want to cons up a list (or vector) every time I check the table.
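Concretely, the two layouts I'm comparing look something like this (rough
sketch, not tested; names are made up), for a memo table on two symbol
arguments A and B:

```lisp
;; Layout 1: one EQUAL table, keyed on a freshly consed list.
;; Layout 2: nested EQ tables, keyed on the raw symbols.

(defvar *flat-memo*   (make-hash-table :test #'equal))
(defvar *nested-memo* (make-hash-table :test #'eq))

(defun flat-ref (a b)
  ;; conses a fresh key list on every lookup
  (gethash (list a b) *flat-memo*))

(defun (setf flat-ref) (value a b)
  (setf (gethash (list a b) *flat-memo*) value))

(defun nested-ref (a b)
  ;; no consing: two EQ lookups
  (let ((inner (gethash a *nested-memo*)))
    (and inner (gethash b inner))))

(defun (setf nested-ref) (value a b)
  (let ((inner (or (gethash a *nested-memo*)
                   (setf (gethash a *nested-memo*)
                         (make-hash-table :test #'eq)))))
    (setf (gethash b inner) value)))
```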


Thanks,
Matt

From: Pascal Bourguignon
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <87fxv72u3z.fsf@thalassa.informatimago.com>
akopa <··················@gmail.com> writes:

> Does anyone have any insight into which is more efficient when the
> number of elements in the sequence keys is constant? E.g. (gethash
> '(foo bar) hash) vs (gethash 'bar (gethash 'foo hash)).  I have a
> function I want to memoize on a certain number of arguments, but I
> don't want to cons up a list (or vector) every time I check the table.

This is totally implementation dependent.  If you really care, you
should profile it.


You could write a layer to automatically adapt to the fastest.


;; Pseudo-code, not tested.

(defun get-my-stuff (key collection)
  (profile-my-stuff)
  (funcall 'get-my-stuff key collection)) ; NOT a recursive call:
                                          ; PROFILE-MY-STUFF has just
                                          ; redefined GET-MY-STUFF.

(defun get-my-stuff/one-gethash (key collection)
  (gethash key collection))

(defun get-my-stuff/two-gethash (key collection)
  (gethash (second key) (gethash (first key) collection)))

(defmacro internal-run-duration (&body body)
  "Return the difference in INTERNAL-RUN-TIME from start to end of BODY."
  (let ((start (gensym)))
    `(let ((,start (get-internal-run-time)))
       ,@body
       (- (get-internal-run-time) ,start))))

(defun profile-my-stuff ()
  (setf (symbol-function 'get-my-stuff)
        (symbol-function
         (let ((count 100))
           (if (< (let ((collection (make-some-typical-collection/1 count))
                        (keys       (make-some-typical-keys/1       count)))
                    (internal-run-duration
                      (loop repeat count
                            for key in keys
                            do (get-my-stuff/one-gethash key collection))))
                  (let ((collection (make-some-typical-collection/2 count))
                        (keys       (make-some-typical-keys/2       count)))
                    (internal-run-duration
                      (loop repeat count
                            for key in keys
                            do (get-my-stuff/two-gethash key collection)))))
               'get-my-stuff/one-gethash
               'get-my-stuff/two-gethash)))))

;; implementation of make-some-typical-{collection,keys}/{1,2} left as
;; exercise to the reader.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

ADVISORY: There is an extremely small but nonzero chance that,
through a process known as "tunneling," this product may
spontaneously disappear from its present location and reappear at
any random place in the universe, including your neighbor's
domicile. The manufacturer will not be responsible for any damages
or inconveniences that may result.
From: Rob Warnock
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <-omdnd5JFOB3kFDanZ2dnUVZ_h2pnZ2d@speakeasy.net>
Pascal Bourguignon  <···@informatimago.com> wrote:
+---------------
| akopa <··················@gmail.com> writes:
| 
| > Does anyone have any insight into which is more efficient when the
| > number of elements in the sequence keys is constant? E.g. (gethash
| > '(foo bar) hash) vs (gethash 'bar (gethash 'foo hash)).  I have a
| > function I want to memoize on a certain number of arguments, but I
| > don't want to cons up a list (or vector) every time I check the table.
| 
| This is totally implementation dependent.  If you really care, you
| should profile it.
+---------------

In addition, some implementations do a *MUCH* better or worse
job of hashing some data types than others. In <akopa>'s case,
the types to compare are conses (and/or lists and/or trees)
versus symbols. [I only mention this because in the past some
versions of some implementations were known to have very bad
hashing performance on conses.]

Also note that if you "cons up a list" to compute a hash key
at runtime, you *must* use an EQUAL or EQUALP hash table
(since your freshly-consed keys will never be EQ or EQL
to any already-existing entry), which is potentially more
expensive, whereas with nested hash tables keyed by symbols,
EQ hash tables always suffice. [The "Subject:" <akopa>
used implies that was already known, but I thought
I'd mention it "pour les autres".]
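A quick illustration of that point (untested sketch): a freshly consed
key is EQUAL, but never EQL, to the key already stored in the table.

```lisp
;; Store under a consed list key in an EQL table and an EQUAL table,
;; then look both up with a *fresh* (but EQUAL) list.
(let ((eql-table   (make-hash-table :test #'eql))
      (equal-table (make-hash-table :test #'equal)))
  (setf (gethash (list 'foo 'bar) eql-table)   1
        (gethash (list 'foo 'bar) equal-table) 2)
  (list (gethash (list 'foo 'bar) eql-table)     ; => NIL, key not found
        (gethash (list 'foo 'bar) equal-table))) ; => 2
```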

On the other hand, nested hash tables have a memory cost,
so it depends.

As Pascal suggests, if it really matters then profile it.


-Rob

p.s. I'd also suggest reading the section in the CLHS on SXHASH
and following the references to the material on "similarity",
especially the implications of objects which are not EQUAL or
EQUALP but *are* "similar" [that is, they'll have the same SXHASH
value and thus GETHASH for such keys may have to traverse long
internal lists of hash table entries with the same hash value
until it finds the one that's EQUAL/EQUALP to the GETHASH key arg].

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Juho Snellman
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <87ve42ac8f.fsf@vasara.proghammer.com>
····@rpw3.org (Rob Warnock) writes:
> p.s. I'd also suggest reading the section in the CLHS on SXHASH
> and following the references to the material on "similarity",
> especially the implications of objects which are not EQUAL or
> EQUALP but *are* "similar" [that is, they'll have the same SXHASH
> value and thus GETHASH for such keys may have to traverse long
> internal lists of hash table entries with the same hash value
> until it finds the one that's EQUAL/EQUALP to the GETHASH key arg].

That would be true if EQUAL hash tables had to use SXHASH as their
hash function. But luckily nothing forces an implementation to do
that.

-- 
Juho Snellman
From: Frode Vatvedt Fjeld
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <2hmypekgz2.fsf@vserver.cs.uit.no>
akopa <··················@gmail.com> writes:

> I have a function I want to memoize on a certain number of
> arguments, but I don't want to cons up a list (or vector) every time I
> check the table.

One possibility is to use e.g. (reduce #'foo key :key #'sxhash) as the
key for the hash-table (with foo for example being + or logxor), and
resolve collisions manually, for example by having each hash-table
bucket be an assoc-list. Lookup then becomes a combination of gethash
and assoc, with no consing involved.
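An untested sketch of what I mean, for a two-part key already in hand as
separate arguments (all names are made up):

```lisp
;; Combine the parts' SXHASHes with LOGXOR, use the resulting fixnum as
;; an EQL key, and keep an alist bucket to resolve collisions by
;; comparing the key parts element-wise.

(defvar *buckets* (make-hash-table :test #'eql))

(defun combined-hash (a b)
  (logxor (sxhash a) (sxhash b)))

(defun bucket-ref (a b)
  ;; lookup itself does not cons
  (loop for ((ka . kb) . value) in (gethash (combined-hash a b) *buckets*)
        when (and (eql ka a) (eql kb b))
          return value))

(defun bucket-store (a b value)
  ;; storing conses, but only once per new entry
  (push (cons (cons a b) value)
        (gethash (combined-hash a b) *buckets*))
  value)
```

Note that LOGXOR is commutative, so (A B) and (B A) land in the same
bucket; the element-wise comparison in BUCKET-REF keeps them distinct.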

-- 
Frode Vatvedt Fjeld
From: Barry Margolin
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <barmar-F5CD78.17111404032008@newsgroups.comcast.net>
In article <··············@vserver.cs.uit.no>,
 Frode Vatvedt Fjeld <······@cs.uit.no> wrote:

> akopa <··················@gmail.com> writes:
> 
> > I have a function I want to memoize on a certain number of
> > arguments, but I don't want to cons up a list (or vector) every time I
> > check the table.
> 
> One possibility is to use e.g. (reduce #'foo key :key #'sxhash) as the
> key for the hash-table (with foo for example being + or logxor), and
> resolve collisions manually, for example by having each hash-table
> bucket be an assoc-list. Lookup then becomes a combination of gethash
> and assoc, with no consing involved.

But you had to cons to make the KEY argument to REDUCE, so this is 
likely to be very similar to the way an EQUAL hash-table is implemented.  
I wouldn't worry too much about consing up these temporary lists, that's 
why ephemeral GC was invented.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: Frode Vatvedt Fjeld
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <2h63w1jzpc.fsf@vserver.cs.uit.no>
Frode Vatvedt Fjeld <······@cs.uit.no> wrote:

> > One possibility is to use e.g. (reduce #'foo key :key #'sxhash) as the
> > key for the hash-table (with foo for example being + or logxor), and
> > resolve collisions manually, for example by having each hash-table
> > bucket be an assoc-list. Lookup then becomes a combination of gethash
> > and assoc, with no consing involved.

Barry Margolin <······@alum.mit.edu> writes:

> But you had to cons to make the KEY argument to REDUCE, so this is
> likely to be very similar to the way an EQUAL hash-table is
> implemented.  I wouldn't worry too much about consing up these
> temporary lists, that's why ephemeral GC was invented.

I don't see how REDUCE needs to cons here at all, except that if the
combinator "foo" is e.g. +, the sum may overflow into bignums.  I would
certainly not expect the form #'sxhash itself to cons, if that's what
you meant (but I assume not).

Performance-wise I agree that a good GC would ensure that consing up a
key for each lookup would yield an overall lookup cost of O(key-size),
just as the scheme I outlined using reduce. But I would expect the
constant factors involved in the latter to be smaller.

-- 
Frode Vatvedt Fjeld
From: Barry Margolin
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <barmar-E35E74.16512505032008@newsgroups.comcast.net>
In article <··············@vserver.cs.uit.no>,
 Frode Vatvedt Fjeld <······@cs.uit.no> wrote:

> Frode Vatvedt Fjeld <······@cs.uit.no> wrote:
> 
> > > One possibility is to use e.g. (reduce #'foo key :key #'sxhash) as the
> > > key for the hash-table (with foo for example being + or logxor), and
> > > resolve collisions manually, for example by having each hash-table
> > > bucket be an assoc-list. Lookup then becomes a combination of gethash
> > > and assoc, with no consing involved.
> 
> Barry Margolin <······@alum.mit.edu> writes:
> 
> > But you had to cons to make the KEY argument to REDUCE, so this is
> > likely to be very similar to the way an EQUAL hash-table is
> > implemented.  I wouldn't worry too much about consing up these
> > temporary lists, that's why ephemeral GC was invented.
> 
> I don't see how reduce needs to cons here at all, except if the
> combinator "foo" is e.g. + it may overflow to bignums..? I would
> certainly not expect the form #'sxhash to cons at all, if that's what
> you meant (but I assume not).

REDUCE doesn't need to cons, but where did KEY come from?  That's the 
same list the OP didn't want to cons for the EQUAL hash-table.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: akopa
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <216a2354-3490-41a0-ab1a-0826b06f28e0@u72g2000hsf.googlegroups.com>
On Mar 3, 5:04 pm, akopa <··················@gmail.com> wrote:
> Does anyone have any insight into which is more efficient when the
> number of elements in the sequence keys is constant?

Thanks for all the replies.  I think I will use nesting for now.  It
seems like "the simplest thing that could work".

Not earth shattering, but I have found functions like
(defun ensure-child-hash (key hash &optional (test 'eql))
  (let ((inner-hash (gethash key hash)))
    (if inner-hash
        inner-hash
        (setf (gethash key hash)
              (make-hash-table :test test)))))

help move the idiom along.
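For instance, a two-argument memo accessor built on it might look like
this (hypothetical sketch; ENSURE-CHILD-HASH repeated so the sketch
stands alone):

```lisp
(defun ensure-child-hash (key hash &optional (test 'eql))
  (let ((inner-hash (gethash key hash)))
    (if inner-hash
        inner-hash
        (setf (gethash key hash)
              (make-hash-table :test test)))))

(defvar *memo* (make-hash-table :test 'eq))

(defun memo-ref (a b)
  ;; note: a plain MEMO-REF call creates an empty inner table
  ;; for A as a side effect
  (gethash b (ensure-child-hash a *memo* 'eq)))

(defun (setf memo-ref) (value a b)
  (setf (gethash b (ensure-child-hash a *memo* 'eq)) value))
```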
From: K Livingston
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <917b6de7-a1be-4fae-895e-df9c66fb6e39@47g2000hsb.googlegroups.com>
On Mar 3, 5:04 pm, akopa <··················@gmail.com> wrote:
> Does anyone have any insight into which is more efficient when the
> number of elements in the sequence keys is constant? E.g. (gethash
> '(foo bar) hash) vs (gethash 'bar (gethash 'foo hash)).  I have a
> function I want to memoize on a certain number of arguments, but I
> don't want to cons up a list (or vector) every time I check the table.
>
> Thanks,
> Matt

It probably doesn't apply to your situation, but... depending on where
your arguments come from and how much variation there is, you might be
able to intern them so that they are then EQ.  (In my application I
can do a lot of that at read-time, get the keys to be EQ, and then use
them over and over.)

For example, I have a lot of data that lives for a while and uses the
same key for lookup several times.  Before the first lookup I can use
an EQUAL hash table to map each key to a canonical, identical copy;
doing this once up front for all my keys makes them EQ thereafter.

So things become interned, like symbols.  It might be easier to look
at example code:


;;;table for hashing the unique expressions
(defvar *intern-expression-table* (make-hash-table :test #'equal))

(defun intern-expression (expr)
  (typecase expr
    (symbol expr)          ; SYMBOL covers NIL as well
    #+allegro
    (fixnum expr)          ; fixnums are reliably EQ in Allegro; elsewhere
                           ; they get interned like everything else
    (t (multiple-value-bind (unique existsp)
           (gethash expr *intern-expression-table*)
         (if existsp
             unique
             (setf (gethash expr *intern-expression-table*) expr))))))

;;; if (equal exp1 exp2)
;;; then (eq (intern-expression exp1) (intern-expression exp2))

In my case, I then defined a reader macro to make that a little
simpler.
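Here is a condensed, untested sketch of the payoff: intern once, and all
later lookups can use a cheap EQ table.  (INTERN-EXPR is a cut-down
stand-in for INTERN-EXPRESSION above, so the sketch stands alone.)

```lisp
(defvar *intern-table* (make-hash-table :test #'equal))
(defvar *eq-memo*      (make-hash-table :test #'eq))

(defun intern-expr (expr)
  (if (symbolp expr)      ; symbols (including NIL) are already interned
      expr
      (multiple-value-bind (unique existsp) (gethash expr *intern-table*)
        (if existsp
            unique
            (setf (gethash expr *intern-table*) expr)))))

;; Store under the interned key...
(setf (gethash (intern-expr (list 'foo 'bar)) *eq-memo*) 42)
;; ...and an EQUAL-but-fresh key finds it after interning:
;; (gethash (intern-expr (list 'foo 'bar)) *eq-memo*) => 42
```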


Anyway, it might not help in memoizing a function (like you
mentioned), unless your arguments are frequently constants, or you
hang on to the keys to call again.  But I just thought I'd point out
another option in the space of trade-offs.

Kevin