From: ·········@hotmail.com
Subject: sxhash on clisp - gives same result for different lists
Date: 
Message-ID: <1135219133.785139.25550@f14g2000cwb.googlegroups.com>
Hi,

I was doing some work with clisp hash tables and discovered the
following:

[1]> (sxhash '(1 2 3 4 5))
4231295
[2]> (sxhash '(1 2 3 4 5 6 7 8 9))
4231295

Emacs lisp returns this:

(sxhash '(1 2 3 4 5))
74565
(sxhash '(1 2 3 4 5 6 7 8 9))
19088743

I think that this is a bug in clisp. Has anyone else seen this? Am I
using clisp correctly?

clisp --version returns the following:

GNU CLISP 2.33.2 (2004-06-02) (built on g242.suse.de [10.10.103.242])
Software: GNU C 3.3.4 (pre 3.3.5 20040809) ANSI C program
Features:
(CLOS LOOP COMPILER CLISP ANSI-CL COMMON-LISP LISP=CL INTERPRETER
SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI GETTEXT
 UNICODE BASE-CHAR=CHARACTER PC386 UNIX)
Installation directory: /usr/lib/clisp/
User language: ENGLISH

Regards,

Matthew O'Connor

From: Pisin Bootvong
Subject: Re: sxhash on clisp - gives same result for different lists
Date: 
Message-ID: <1135221854.024244.27840@g49g2000cwa.googlegroups.com>
Here's what I see from my CLISP session.

CL-USER> (sxhash '(1 ))
11286385
CL-USER> (sxhash '(1 2 ))
11318769
CL-USER> (sxhash '(1 2 3))
11306577
CL-USER> (sxhash '(1 2 3 4 ))
11310649
CL-USER> (sxhash '(1 2 3 4 5))
8889177
CL-USER> (sxhash '(1 2 3 4 5 6 7 8 9))
8889177
CL-USER> (sxhash '(1 2 3 4 5 6 7 8 9 10 11 12))
8889177
CL-USER> (sxhash '(1 2 3 4 5 a b c d e))
8889177

The hash code are same for list longer than 5. I guess it's for
efficiency CLISP only use first five elements of list for hashing.
The standard only require that same object return same hash-code, not
that different object always return different hash-code.

Don't know how that will effect the performance of hashtable if you
tends to have store long lists which first five elements are identical
though.

·········@hotmail.com wrote:
> Hi,
>
> I was doing some work with clisp hash tables and discovered the
> following:
>
> [1]> (sxhash '(1 2 3 4 5))
> 4231295
> [2]> (sxhash '(1 2 3 4 5 6 7 8 9))
> 4231295
>
> Emacs lisp returns this:
>
> (sxhash '(1 2 3 4 5))
> 74565
> (sxhash '(1 2 3 4 5 6 7 8 9))
> 19088743
>
> I think that this is a bug in clisp. Has anyone else seen this? Am I
> using clisp correctly?
>
> clisp --version returns the following:
>
> GNU CLISP 2.33.2 (2004-06-02) (built on g242.suse.de [10.10.103.242])
> Software: GNU C 3.3.4 (pre 3.3.5 20040809) ANSI C program
> Features:
> (CLOS LOOP COMPILER CLISP ANSI-CL COMMON-LISP LISP=CL INTERPRETER
> SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI GETTEXT
>  UNICODE BASE-CHAR=CHARACTER PC386 UNIX)
> Installation directory: /usr/lib/clisp/
> User language: ENGLISH
> 
> Regards,
> 
> Matthew O'Connor
From: Zach Beane
Subject: Re: sxhash on clisp - gives same result for different lists
Date: 
Message-ID: <m364phg7d6.fsf@unnamed.xach.com>
·········@hotmail.com writes:

> Hi,
> 
> I was doing some work with clisp hash tables and discovered the
> following:
> 
> [1]> (sxhash '(1 2 3 4 5))
> 4231295
> [2]> (sxhash '(1 2 3 4 5 6 7 8 9))
> 4231295
[snip]
> I think that this is a bug in clisp. Has anyone else seen this? Am I
> using clisp correctly?

Hmm, I don't see how the specification of SXHASH prohibits this
behavior. Can you clarify?

Zach
From: Brian Downing
Subject: Re: sxhash on clisp - gives same result for different lists
Date: 
Message-ID: <SUoqf.425459$084.393322@attbi_s22>
In article <·······················@f14g2000cwb.googlegroups.com>,
 <·········@hotmail.com> wrote:
> I was doing some work with clisp hash tables and discovered the
> following:
> 
> [1]> (sxhash '(1 2 3 4 5))
> 4231295
> [2]> (sxhash '(1 2 3 4 5 6 7 8 9))
> 4231295
> 
> Emacs lisp returns this:
> 
> (sxhash '(1 2 3 4 5))
> 74565
> (sxhash '(1 2 3 4 5 6 7 8 9))
> 19088743
> 
> I think that this is a bug in clisp. Has anyone else seen this? Am I
> using clisp correctly?

SXHASH returns a number for an object such that if two objects are
EQUAL, their SXHASH values will be the same.

It doesn't say anything about objects not EQUAL having non-equal SXHASH
values, and indeed that's impossible since it's impossible to fit the
infinity of potential Lisp values into a fixnum.

Indeed, looking at the clisp source code reveals that it only calculates
the hash to depth 4.  This is a good plan because it means that the hash
value gets computed quickly, and long lists with evenly-distributed
values in the first few elements will still have a decent distribution.

Note that SBCL behaves similarly:

* (sxhash '(1 2 3 4))
81346688
* (sxhash '(1 2 3 4 5))
329232196
* (sxhash '(1 2 3 4 5 6 7 8 9))
329232196

For specifics, see:
http://www.lispworks.com/documentation/HyperSpec/Body/f_sxhash.htm

-bcd
-- 
*** Brian Downing <bdowning at lavos dot net> 
From: ·········@hotmail.com
Subject: Re: sxhash on clisp - gives same result for different lists
Date: 
Message-ID: <1135225906.517139.86630@f14g2000cwb.googlegroups.com>
Thanks Brian,

Your restating of the specification for sxhash brings the problem into
a clearer light.

I was indeed expecting objects that were not EQUAL to have non-equal
SXHASH values.

Thanks to everyone for replying,

Matthew O'Connor
From: Thomas A. Russ
Subject: Re: sxhash on clisp - gives same result for different lists
Date: 
Message-ID: <ymilkydhtim.fsf@sevak.isi.edu>
·········@hotmail.com writes:

> Thanks Brian,
> 
> Your restating of the specification for sxhash brings the problem into
> a clearer light.
> 
> I was indeed expecting objects that were not EQUAL to have non-equal
> SXHASH values.

This is a not uncommon misapprehension.  But if one thinks about the
nature of hash tables, it is clear that in most normal cases there is
some duplication of hash codes.  Without that, there would be no space
savings from using a hash rather than just assigning every object a
unique ID and using that as the index into a table.  

Hash tables are useful when there is a reasonably fast hash code
generation function (otherwise the constant for the O(1) lookup makes it
impractical) and that the space of possible objects is sparsely
populated (othewise using a unique ID and direct lookup would be more
efficient).  That leads to the expectation that there will be some
hashing collisions.

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Brian Downing
Subject: Re: sxhash on clisp - gives same result for different lists
Date: 
Message-ID: <yXoqf.425466$084.354431@attbi_s22>
In article <·······················@f14g2000cwb.googlegroups.com>,
 <·········@hotmail.com> wrote:
> Emacs lisp returns this:
> 
> (sxhash '(1 2 3 4 5))
> 74565
> (sxhash '(1 2 3 4 5 6 7 8 9))
> 19088743

Also:

ELISP> (sxhash '(1 2 3 4 5)) 
74565
ELISP> (sxhash '(1 2 3 4 5 6)) 
1193046
ELISP> (sxhash '(1 2 3 4 5 6 7)) 
19088743
ELISP> (sxhash '(1 2 3 4 5 6 7 8 9)) 
19088743
ELISP> (sxhash '(1 2 3 4 5 6 7 8 9 10 11)) 
19088743

-bcd
-- 
*** Brian Downing <bdowning at lavos dot net> 
From: ···············@yahoo.com
Subject: Re: sxhash on clisp - gives same result for different lists
Date: 
Message-ID: <1135264411.139330.263630@g49g2000cwa.googlegroups.com>
Apparently ACL 7.0 looks at the first 14.

CL-USER(13): (defun one-thru-n (n &optional (ans '()))
                (if (zerop n) ans (one-thru-n (1- n) (cons n ans))))
ONE-THRU-N
CL-USER(15): (dotimes (i 20) (print (sxhash (one-thru-n i))))
905410 ;; ()
907970 ;; (1)
824515
2447082
1908178
1342171
3710967
10763883
3147263
12019576
5973908
13839900
13765405
16165689
14670607 ;; (1 ... 14)
14670607 ;; (1 ... 15)
14670607 
14670607 
14670607 
14670607