From: akopa
Subject: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <d38de617-50c2-4eda-a7ca-0fded1272124@e6g2000prf.googlegroups.com>
Does anyone have any insight into which is more efficient when the
number of elements in the sequence keys is constant? E.g. (gethash
'(foo bar) hash) vs (gethash 'bar (gethash 'foo hash)).  I have a
function I want to memoize on a certain number of arguments, but I
don't want to cons up a list (or vector) every time I check the table.
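Concretely, the two layouts I'm comparing look something like this (rough
sketch, not tested; names are made up), for a memo table on two symbol
arguments A and B:

```lisp
;; Layout 1: one EQUAL table, keyed on a freshly consed list.
;; Layout 2: nested EQ tables, keyed on the raw symbols.

(defvar *flat-memo*   (make-hash-table :test #'equal))
(defvar *nested-memo* (make-hash-table :test #'eq))

(defun flat-ref (a b)
  ;; conses a fresh key list on every lookup
  (gethash (list a b) *flat-memo*))

(defun (setf flat-ref) (value a b)
  (setf (gethash (list a b) *flat-memo*) value))

(defun nested-ref (a b)
  ;; no consing: two EQ lookups
  (let ((inner (gethash a *nested-memo*)))
    (and inner (gethash b inner))))

(defun (setf nested-ref) (value a b)
  (let ((inner (or (gethash a *nested-memo*)
                   (setf (gethash a *nested-memo*)
                         (make-hash-table :test #'eq)))))
    (setf (gethash b inner) value)))
```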


Thanks,
Matt

From: Pascal Bourguignon
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <87fxv72u3z.fsf@thalassa.informatimago.com>
akopa <··················@gmail.com> writes:

> Does anyone have any insight into which is more efficient when the
> number of elements in the sequence keys is constant? E.g. (gethash
> '(foo bar) hash) vs (gethash 'bar (gethash 'foo hash)).  I have a
> function I want to memoize on a certain number of arguments, but I
> don't want to cons up a list (or vector) every time I check the table.

This is totally implementation dependent.  If you really care, you
should profile it.


You could write a layer to automatically adapt to the fastest.


;; Pseudo-code, not tested.

(defun get-my-stuff (key collection)
  (profile-my-stuff)
  (funcall 'get-my-stuff key collection)) ; NOT a recursive call:
                                          ; PROFILE-MY-STUFF has just
                                          ; redefined GET-MY-STUFF.

(defun get-my-stuff/one-gethash (key collection)
  (gethash key collection))

(defun get-my-stuff/two-gethash (key collection)
  (gethash (second key) (gethash (first key) collection)))

(defmacro internal-run-duration (&body body)
  "Return the difference in INTERNAL-RUN-TIME from start to end of BODY."
  (let ((start (gensym)))
    `(let ((,start (get-internal-run-time)))
       ,@body
       (- (get-internal-run-time) ,start))))

(defun profile-my-stuff ()
  (setf (symbol-function 'get-my-stuff)
        (symbol-function
         (let ((count 100))
           (if (< (let ((collection (make-some-typical-collection/1 count))
                        (keys       (make-some-typical-keys/1       count)))
                    (internal-run-duration
                      (loop repeat count
                            for key in keys
                            do (get-my-stuff/one-gethash key collection))))
                  (let ((collection (make-some-typical-collection/2 count))
                        (keys       (make-some-typical-keys/2       count)))
                    (internal-run-duration
                      (loop repeat count
                            for key in keys
                            do (get-my-stuff/two-gethash key collection)))))
               'get-my-stuff/one-gethash
               'get-my-stuff/two-gethash)))))

;; implementation of make-some-typical-{collection,keys}/{1,2} left as
;; exercise to the reader.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

ADVISORY: There is an extremely small but nonzero chance that,
through a process known as "tunneling," this product may
spontaneously disappear from its present location and reappear at
any random place in the universe, including your neighbor's
domicile. The manufacturer will not be responsible for any damages
or inconveniences that may result.
From: Rob Warnock
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <-omdnd5JFOB3kFDanZ2dnUVZ_h2pnZ2d@speakeasy.net>
Pascal Bourguignon  <···@informatimago.com> wrote:
+---------------
| akopa <··················@gmail.com> writes:
| 
| > Does anyone have any insight into which is more efficient when the
| > number of elements in the sequence keys is constant? E.g. (gethash
| > '(foo bar) hash) vs (gethash 'bar (gethash 'foo hash)).  I have a
| > function I want to memoize on a certain number of arguments, but I
| > don't want to cons up a list (or vector) every time I check the table.
| 
| This is totally implementation dependent.  If you really care, you
| should profile it.
+---------------

In addition, some implementations do a *MUCH* better or worse
job of hashing some data types than others. In <akopa>'s case,
the types to compare are conses (and/or lists and/or trees)
versus symbols. [I only mention this because in the past some
versions of some implementations were known to have very bad
hashing performance on conses.]

Also note that if you "cons up a list" to compute a hash key
at runtime, you *must* use an EQUAL or EQUALP hash table
(since your freshly-consed keys will never be EQ or EQL
to any already-existing entry), which is potentially more
expensive, whereas with nested hash tables keyed by symbols,
EQ hash tables always suffice. [The "Subject:" <akopa>
used implies that was already known, but I thought
I'd mention it "pour les autres".]
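A quick illustration of that point (untested sketch): a freshly consed
key is EQUAL, but never EQL, to the key already stored in the table.

```lisp
;; Store under a consed list key in an EQL table and an EQUAL table,
;; then look both up with a *fresh* (but EQUAL) list.
(let ((eql-table   (make-hash-table :test #'eql))
      (equal-table (make-hash-table :test #'equal)))
  (setf (gethash (list 'foo 'bar) eql-table)   1
        (gethash (list 'foo 'bar) equal-table) 2)
  (list (gethash (list 'foo 'bar) eql-table)     ; => NIL, key not found
        (gethash (list 'foo 'bar) equal-table))) ; => 2
```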

On the other hand, nested hash tables have a memory cost,
so it depends.

As Pascal suggests, if it really matters then profile it.


-Rob

p.s. I'd also suggest reading the section in the CLHS on SXHASH
and following the references to the material on "similarity",
especially the implications of objects which are not EQUAL or
EQUALP but *are* "similar" [that is, they'll have the same SXHASH
value and thus GETHASH for such keys may have to traverse long
internal lists of hash table entries with the same hash value
until it finds the one that's EQUAL/EQUALP to the GETHASH key arg].

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Juho Snellman
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <87ve42ac8f.fsf@vasara.proghammer.com>
····@rpw3.org (Rob Warnock) writes:
> p.s. I'd also suggest reading the section in the CLHS on SXHASH
> and following the references to the material on "similarity",
> especially the implications of objects which are not EQUAL or
> EQUALP but *are* "similar" [that is, they'll have the same SXHASH
> value and thus GETHASH for such keys may have to traverse long
> internal lists of hash table entries with the same hash value
> until it finds the one that's EQUAL/EQUALP to the GETHASH key arg].

That would be true if EQUAL hash tables had to use SXHASH as their
hash function. But luckily nothing forces an implementation to do
that.

-- 
Juho Snellman
From: Frode Vatvedt Fjeld
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <2hmypekgz2.fsf@vserver.cs.uit.no>
akopa <··················@gmail.com> writes:

> I have a function I want to memoize on a certain number of
> arguments, but I don't want to cons up a list (or vector) every time I
> check the table.

One possibility is to use e.g. (reduce #'foo key :key #'sxhash) as the
key for the hash-table (with foo for example being + or logxor), and
resolve collisions manually, for example by having each hash-table
bucket be an assoc-list. Lookup then becomes a combination of gethash
and assoc, with no consing involved.
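An untested sketch of what I mean, for a two-part key already in hand as
separate arguments (all names are made up):

```lisp
;; Combine the parts' SXHASHes with LOGXOR, use the resulting fixnum as
;; an EQL key, and keep an alist bucket to resolve collisions by
;; comparing the key parts element-wise.

(defvar *buckets* (make-hash-table :test #'eql))

(defun combined-hash (a b)
  (logxor (sxhash a) (sxhash b)))

(defun bucket-ref (a b)
  ;; lookup itself does not cons
  (loop for ((ka . kb) . value) in (gethash (combined-hash a b) *buckets*)
        when (and (eql ka a) (eql kb b))
          return value))

(defun bucket-store (a b value)
  ;; storing conses, but only once per new entry
  (push (cons (cons a b) value)
        (gethash (combined-hash a b) *buckets*))
  value)
```

Note that LOGXOR is commutative, so (A B) and (B A) land in the same
bucket; the element-wise comparison in BUCKET-REF keeps them distinct.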

-- 
Frode Vatvedt Fjeld
From: Barry Margolin
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <barmar-F5CD78.17111404032008@newsgroups.comcast.net>
In article <··············@vserver.cs.uit.no>,
 Frode Vatvedt Fjeld <······@cs.uit.no> wrote:

> akopa <··················@gmail.com> writes:
> 
> > I have a function I want to memoize on a certain number of
> > arguments, but I don't want to cons up a list (or vector) every time I
> > check the table.
> 
> One possibility is to use e.g. (reduce #'foo key :key #'sxhash) as the
> key for the hash-table (with foo for example being + or logxor), and
> resolve collisions manually, for example by having each hash-table
> bucket be an assoc-list. Lookup then becomes a combination of gethash
> and assoc, with no consing involved.

But you had to cons to make the KEY argument to REDUCE, so this is 
likely to be very similar to the way an EQUAL hash-table is implemented.  
I wouldn't worry too much about consing up these temporary lists, that's 
why ephemeral GC was invented.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: Frode Vatvedt Fjeld
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <2h63w1jzpc.fsf@vserver.cs.uit.no>
Frode Vatvedt Fjeld <······@cs.uit.no> wrote:

> > One possibility is to use e.g. (reduce #'foo key :key #'sxhash) as the
> > key for the hash-table (with foo for example being + or logxor), and
> > resolve collisions manually, for example by having each hash-table
> > bucket be an assoc-list. Lookup then becomes a combination of gethash
> > and assoc, with no consing involved.

Barry Margolin <······@alum.mit.edu> writes:

> But you had to cons to make the KEY argument to REDUCE, so this is
> likely to be very similar to the way an EQUAL hash-table is
> implemented.  I wouldn't worry too much about consing up these
> temporary lists, that's why ephemeral GC was invented.

I don't see how REDUCE needs to cons here at all, except that if the
combinator "foo" is e.g. +, the sum may overflow into bignums.  I would
certainly not expect the form #'sxhash itself to cons, if that's what
you meant (but I assume not).

Performance-wise I agree that a good GC would ensure that consing up a
key for each lookup would yield an overall lookup cost of O(key-size),
just as the scheme I outlined using reduce. But I would expect the
constant factors involved in the latter to be smaller.

-- 
Frode Vatvedt Fjeld
From: Barry Margolin
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <barmar-E35E74.16512505032008@newsgroups.comcast.net>
In article <··············@vserver.cs.uit.no>,
 Frode Vatvedt Fjeld <······@cs.uit.no> wrote:

> Frode Vatvedt Fjeld <······@cs.uit.no> wrote:
> 
> > > One possibility is to use e.g. (reduce #'foo key :key #'sxhash) as the
> > > key for the hash-table (with foo for example being + or logxor), and
> > > resolve collisions manually, for example by having each hash-table
> > > bucket be an assoc-list. Lookup then becomes a combination of gethash
> > > and assoc, with no consing involved.
> 
> Barry Margolin <······@alum.mit.edu> writes:
> 
> > But you had to cons to make the KEY argument to REDUCE, so this is
> > likely to be very similar to the way an EQUAL hash-table is
> > implemented.  I wouldn't worry too much about consing up these
> > temporary lists, that's why ephemeral GC was invented.
> 
> I don't see how reduce needs to cons here at all, except if the
> combinator "foo" is e.g. + it may overflow to bignums..? I would
> certainly not expect the form #'sxhash to cons at all, if that's what
> you meant (but I assume not).

REDUCE doesn't need to cons, but where did KEY come from?  That's the 
same list the OP didn't want to cons for the EQUAL hash-table.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: akopa
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <216a2354-3490-41a0-ab1a-0826b06f28e0@u72g2000hsf.googlegroups.com>
On Mar 3, 5:04 pm, akopa <··················@gmail.com> wrote:
> Does anyone have any insight into which is more efficient when the
> number of elements in the sequence keys is constant?

Thanks for all the replies.  I think I will use nesting for now.  It
seems like "the simplest thing that could work".

Not earth shattering, but I have found functions like
(defun ensure-child-hash (key hash &optional (test 'eql))
  (let ((inner-hash (gethash key hash)))
    (if inner-hash
        inner-hash
        (setf (gethash key hash)
              (make-hash-table :test test)))))

help move the idiom along.
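For instance, a two-argument memo accessor built on it might look like
this (hypothetical sketch; ENSURE-CHILD-HASH repeated so the sketch
stands alone):

```lisp
(defun ensure-child-hash (key hash &optional (test 'eql))
  (let ((inner-hash (gethash key hash)))
    (if inner-hash
        inner-hash
        (setf (gethash key hash)
              (make-hash-table :test test)))))

(defvar *memo* (make-hash-table :test 'eq))

(defun memo-ref (a b)
  ;; note: a plain MEMO-REF call creates an empty inner table
  ;; for A as a side effect
  (gethash b (ensure-child-hash a *memo* 'eq)))

(defun (setf memo-ref) (value a b)
  (setf (gethash b (ensure-child-hash a *memo* 'eq)) value))
```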
From: K Livingston
Subject: Re: equal test hash-tables vs nested hash tables
Date: 
Message-ID: <917b6de7-a1be-4fae-895e-df9c66fb6e39@47g2000hsb.googlegroups.com>
On Mar 3, 5:04 pm, akopa <··················@gmail.com> wrote:
> Does anyone have any insight into which is more efficient when the
> number of elements in the sequence keys is constant? E.g. (gethash
> '(foo bar) hash) vs (gethash 'bar (gethash 'foo hash)).  I have a
> function I want to memoize on a certain number of arguments, but I
> don't want to cons up a list (or vector) every time I check the table.
>
> Thanks,
> Matt

It probably doesn't apply to your situation, but... depending on where
your arguments come from and how much variation there is, you might be
able to intern them so that they are then EQ.  (In my application I
can do a lot of that at read-time, get the keys to be EQ, and then use
them over and over.)

For example, I have a lot of data that lives for a while and uses the
same key for lookup several times.  Before the first lookup I can use
an EQUAL hash table to map each key to a canonical, identical copy;
doing this once up front for all my keys makes them EQ thereafter.

So things become interned, like symbols.  It might be easier to look
at example code:


;;;table for hashing the unique expressions
(defvar *intern-expression-table* (make-hash-table :test #'equal))

(defun intern-expression (expr)
  (typecase expr
    (symbol expr)          ; SYMBOL covers NIL as well
    #+allegro
    (fixnum expr)          ; fixnums are reliably EQ in Allegro; elsewhere
                           ; they get interned like everything else
    (t (multiple-value-bind (unique existsp)
           (gethash expr *intern-expression-table*)
         (if existsp
             unique
             (setf (gethash expr *intern-expression-table*) expr))))))

;;; if (equal exp1 exp2)
;;; then (eq (intern-expression exp1) (intern-expression exp2))

In my case, I then defined a reader macro to make that a little
simpler.
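Here is a condensed, untested sketch of the payoff: intern once, and all
later lookups can use a cheap EQ table.  (INTERN-EXPR is a cut-down
stand-in for INTERN-EXPRESSION above, so the sketch stands alone.)

```lisp
(defvar *intern-table* (make-hash-table :test #'equal))
(defvar *eq-memo*      (make-hash-table :test #'eq))

(defun intern-expr (expr)
  (if (symbolp expr)      ; symbols (including NIL) are already interned
      expr
      (multiple-value-bind (unique existsp) (gethash expr *intern-table*)
        (if existsp
            unique
            (setf (gethash expr *intern-table*) expr)))))

;; Store under the interned key...
(setf (gethash (intern-expr (list 'foo 'bar)) *eq-memo*) 42)
;; ...and an EQUAL-but-fresh key finds it after interning:
;; (gethash (intern-expr (list 'foo 'bar)) *eq-memo*) => 42
```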


Anyway, it might not help in memoizing a function (like you
mentioned), unless your arguments are frequently constants, or you
hang on to the keys to call again.  But I just thought I'd point out
another option in the space of trade-offs.

Kevin