Hi,
I was doing some work with clisp hash tables and discovered the
following:
[1]> (sxhash '(1 2 3 4 5))
4231295
[2]> (sxhash '(1 2 3 4 5 6 7 8 9))
4231295
Emacs lisp returns this:
(sxhash '(1 2 3 4 5))
74565
(sxhash '(1 2 3 4 5 6 7 8 9))
19088743
I think that this is a bug in clisp. Has anyone else seen this? Am I
using clisp correctly?
clisp --version returns the following:
GNU CLISP 2.33.2 (2004-06-02) (built on g242.suse.de [10.10.103.242])
Software: GNU C 3.3.4 (pre 3.3.5 20040809) ANSI C program
Features:
(CLOS LOOP COMPILER CLISP ANSI-CL COMMON-LISP LISP=CL INTERPRETER
SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI GETTEXT
UNICODE BASE-CHAR=CHARACTER PC386 UNIX)
Installation directory: /usr/lib/clisp/
User language: ENGLISH
Regards,
Matthew O'Connor
Here's what I see from my CLISP session.
CL-USER> (sxhash '(1 ))
11286385
CL-USER> (sxhash '(1 2 ))
11318769
CL-USER> (sxhash '(1 2 3))
11306577
CL-USER> (sxhash '(1 2 3 4 ))
11310649
CL-USER> (sxhash '(1 2 3 4 5))
8889177
CL-USER> (sxhash '(1 2 3 4 5 6 7 8 9))
8889177
CL-USER> (sxhash '(1 2 3 4 5 6 7 8 9 10 11 12))
8889177
CL-USER> (sxhash '(1 2 3 4 5 a b c d e))
8889177
The hash code are same for list longer than 5. I guess it's for
efficiency CLISP only use first five elements of list for hashing.
The standard only require that same object return same hash-code, not
that different object always return different hash-code.
Don't know how that will effect the performance of hashtable if you
tends to have store long lists which first five elements are identical
though.
·········@hotmail.com wrote:
> Hi,
>
> I was doing some work with clisp hash tables and discovered the
> following:
>
> [1]> (sxhash '(1 2 3 4 5))
> 4231295
> [2]> (sxhash '(1 2 3 4 5 6 7 8 9))
> 4231295
>
> Emacs lisp returns this:
>
> (sxhash '(1 2 3 4 5))
> 74565
> (sxhash '(1 2 3 4 5 6 7 8 9))
> 19088743
>
> I think that this is a bug in clisp. Has anyone else seen this? Am I
> using clisp correctly?
>
> clisp --version returns the following:
>
> GNU CLISP 2.33.2 (2004-06-02) (built on g242.suse.de [10.10.103.242])
> Software: GNU C 3.3.4 (pre 3.3.5 20040809) ANSI C program
> Features:
> (CLOS LOOP COMPILER CLISP ANSI-CL COMMON-LISP LISP=CL INTERPRETER
> SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI GETTEXT
> UNICODE BASE-CHAR=CHARACTER PC386 UNIX)
> Installation directory: /usr/lib/clisp/
> User language: ENGLISH
>
> Regards,
>
> Matthew O'Connor
From: Zach Beane
Subject: Re: sxhash on clisp - gives same result for different lists
Date:
Message-ID: <m364phg7d6.fsf@unnamed.xach.com>
·········@hotmail.com writes:
> Hi,
>
> I was doing some work with clisp hash tables and discovered the
> following:
>
> [1]> (sxhash '(1 2 3 4 5))
> 4231295
> [2]> (sxhash '(1 2 3 4 5 6 7 8 9))
> 4231295
[snip]
> I think that this is a bug in clisp. Has anyone else seen this? Am I
> using clisp correctly?
Hmm, I don't see how the specification of SXHASH prohibits this
behavior. Can you clarify?
Zach
From: Brian Downing
Subject: Re: sxhash on clisp - gives same result for different lists
Date:
Message-ID: <SUoqf.425459$084.393322@attbi_s22>
In article <·······················@f14g2000cwb.googlegroups.com>,
<·········@hotmail.com> wrote:
> I was doing some work with clisp hash tables and discovered the
> following:
>
> [1]> (sxhash '(1 2 3 4 5))
> 4231295
> [2]> (sxhash '(1 2 3 4 5 6 7 8 9))
> 4231295
>
> Emacs lisp returns this:
>
> (sxhash '(1 2 3 4 5))
> 74565
> (sxhash '(1 2 3 4 5 6 7 8 9))
> 19088743
>
> I think that this is a bug in clisp. Has anyone else seen this? Am I
> using clisp correctly?
SXHASH returns a number for an object such that if two objects are
EQUAL, their SXHASH values will be the same.
It doesn't say anything about objects not EQUAL having non-equal SXHASH
values, and indeed that's impossible since it's impossible to fit the
infinity of potential Lisp values into a fixnum.
Indeed, looking at the clisp source code reveals that it only calculates
the hash to depth 4. This is a good plan because it means that the hash
value gets computed quickly, and long lists with evenly-distributed
values in the first few elements will still have a decent distribution.
Note that SBCL behaves similarly:
* (sxhash '(1 2 3 4))
81346688
* (sxhash '(1 2 3 4 5))
329232196
* (sxhash '(1 2 3 4 5 6 7 8 9))
329232196
For specifics, see:
http://www.lispworks.com/documentation/HyperSpec/Body/f_sxhash.htm
-bcd
--
*** Brian Downing <bdowning at lavos dot net>
Thanks Brian,
Your restating of the specification for sxhash brings the problem into
a clearer light.
I was indeed expecting objects that were not EQUAL to have non-equal
SXHASH values.
Thanks to everyone for replying,
Matthew O'Connor
From: Thomas A. Russ
Subject: Re: sxhash on clisp - gives same result for different lists
Date:
Message-ID: <ymilkydhtim.fsf@sevak.isi.edu>
·········@hotmail.com writes:
> Thanks Brian,
>
> Your restating of the specification for sxhash brings the problem into
> a clearer light.
>
> I was indeed expecting objects that were not EQUAL to have non-equal
> SXHASH values.
This is a not uncommon misapprehension. But if one thinks about the
nature of hash tables, it is clear that in most normal cases there is
some duplication of hash codes. Without that, there would be no space
savings from using a hash rather than just assigning every object a
unique ID and using that as the index into a table.
Hash tables are useful when there is a reasonably fast hash code
generation function (otherwise the constant for the O(1) lookup makes it
impractical) and that the space of possible objects is sparsely
populated (othewise using a unique ID and direct lookup would be more
efficient). That leads to the expectation that there will be some
hashing collisions.
--
Thomas A. Russ, USC/Information Sciences Institute
From: Brian Downing
Subject: Re: sxhash on clisp - gives same result for different lists
Date:
Message-ID: <yXoqf.425466$084.354431@attbi_s22>