sxhash: what is "same implementation"?

From: Sam Steingold
Subject: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 16:22:28 +0000
Message-ID: <usm0dap4r.fsf@gnu.org>

<http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
1. SXHASH must return a FIXNUM
2. for any two objects, x and y, both of which are bit vectors,
   characters, conses, numbers, pathnames, strings, or symbols, and
   which are similar, (sxhash x) and (sxhash y) yield the same
   mathematical value even if x and y exist in different Lisp images of
   the same implementation.
   See Section 3.2.4 (Literal Objects in Compiled Files).

what is "same implementation" here?
often the size of fixnum is different on different platforms for the
"same implementation" (e.g., 64-bit platforms may have 48+ bit fixnums
which is not possible on a 32-bit platform).

So, is the AMD64 incarnation of an implementation required to return the
same mathematical fixnum as the i386 incarnation of the "same
implementation"?

The reference to compiled files appear to suggest that the "same
implementation" is defined as "able to share compiled files".
This would mean that CLISP/AMD64, CLISP/Sparc64, CLISP/i386 &c are all
the same implementations because they can share the compiled files
(platform-independent bytecode) - and thus are required to produce the
same SXHASH values on all platforms,
while CMUCL/i386 and CMUCL/Sparc are different implementations because
they cannot share the natively compiled files - and thus the 64-bit
version is free to produce larger fixnums.
Right?

Thanks!


-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.iris.org.il> <http://www.memri.org/> <http://pmw.org.il/>
<http://www.jihadwatch.org/> <http://www.honestreporting.com>
We are born naked, wet, and hungry.  Then things get worse.

Re: sxhash: what is "same implementation"? Christopher C. Stacy
- Re: sxhash: what is "same implementation"? Pascal Bourguignon
Multiple fixnum size lisps? Re: sxhash: what is "same implementation"? Michael Ferrador
- Re: Multiple fixnum size lisps? Re: sxhash: what is "same implementation"? Christopher C. Stacy
Re: sxhash: what is "same implementation"? Pascal Bourguignon
- Re: sxhash: what is "same implementation"? Duane Rettig
  - Re: sxhash: what is "same implementation"? Barry Margolin
    - Re: sxhash: what is "same implementation"? Ron Garret
    - Re: sxhash: what is "same implementation"? Duane Rettig
- Re: sxhash: what is "same implementation"? sds
  - Re: sxhash: what is "same implementation"? Adam Warner
Re: sxhash: what is "same implementation"? Florian Weimer
Re: sxhash: what is "same implementation"? Barry Margolin
- Re: sxhash: what is "same implementation"? Duane Rettig
Re: sxhash: what is "same implementation"? Kent M Pitman
Re: sxhash: what is "same implementation"? Cameron MacKinnon
- Re: sxhash: what is "same implementation"? Pascal Bourguignon
  - Re: sxhash: what is "same implementation"? Adam Warner
Re: sxhash: what is "same implementation"? Adam Warner
- Re: sxhash: what is "same implementation"? Duane Rettig
  - Re: sxhash: what is "same implementation"? Pascal Bourguignon
  - Re: sxhash: what is "same implementation"? Adam Warner
- Re: sxhash: what is "same implementation"? lin8080
  - Re: sxhash: what is "same implementation"? Kent M Pitman
    - Re: sxhash: what is "same implementation"? lin8080

From: Christopher C. Stacy
Subject: Re: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 17:55:05 +0000
Message-ID: <uis197rph.fsf@news.dtpq.com>

Sam Steingold <···@gnu.org> writes:

> what is "same implementation" here?
> often the size of fixnum is different on different platforms

I think that's your answer right there.

> "same implementation" (e.g., 64-bit platforms may have 48+ bit
> fixnums which is not possible on a 32-bit platform).

Implementation is not the same thing as "product line".

From: Pascal Bourguignon
Subject: Re: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 18:24:15 +0000
Message-ID: <87k6lpn6ls.fsf@thalassa.informatimago.com>

······@news.dtpq.com (Christopher C. Stacy) writes:

> Sam Steingold <···@gnu.org> writes:
>
>> what is "same implementation" here?
>> often the size of fixnum is different on different platforms
>
> I think that's your answer right there.
>
>> "same implementation" (e.g., 64-bit platforms may have 48+ bit
>> fixnums which is not possible on a 32-bit platform).
>
> Implementation is not the same thing as "product line".

Indeed.  By the way, in a more formally specified standard, perhaps
one would have:

(for-all (i (or (vectors bit) character cons number pathname string symbol))
   (if (equal (with-implementation (a)
                (list (lisp-implementation-type) (lisp-implementation-version)))
              (with-implementation (b)
                (list (lisp-implementation-type) (lisp-implementation-version))))
      (= (with-implementation (a) (sxhash i)) 
         (with-implementation (b) (sxhash i)))))


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

There is no worse tyranny than to force a man to pay for what he does not
want merely because you think it would be good for him. -- Robert Heinlein

From: Michael Ferrador
Subject: Multiple fixnum size lisps? Re: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 23:09:30 +0000
Message-ID: <Lotke.46564$HJ2.26734@fe11.lga>

maybe I was just distracted by Tiger's current 64 bit compute
needs to IPC to a 32 GUI front end...

(and cutting out the IPC)

Has there been any Multiple level Data or Address space size lisps?

From: Christopher C. Stacy
Subject: Re: Multiple fixnum size lisps? Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 02:45:09 +0000
Message-ID: <uhdgtz6iy.fsf@news.dtpq.com>

Michael Ferrador <·····@yahoo.com> writes:

> maybe I was just distracted by Tiger's current 64 bit compute
> needs to IPC to a 32 GUI front end...
> 
> (and cutting out the IPC)
> 
> Has there been any Multiple level Data or Address space size lisps?

I think there might have been a Common Lisp implementation 
on the later PDP-10 models which used "extended addressing",
which was a kind of segmented address space.

But the usual answer is that Lisp implementations have always 
just pushed the address space.   When the PDP-10 address space
became too small, the Lisp Machine was invented, featuring a
huge address space.    More recently, 64 bits has usually been
enough for most people.

From: Pascal Bourguignon
Subject: Re: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 17:50:04 +0000
Message-ID: <87u0ktn86r.fsf@thalassa.informatimago.com>

Sam Steingold <···@gnu.org> writes:

> <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
> 1. SXHASH must return a FIXNUM
> 2. for any two objects, x and y, both of which are bit vectors,
>    characters, conses, numbers, pathnames, strings, or symbols, and
>    which are similar, (sxhash x) and (sxhash y) yield the same
>    mathematical value even if x and y exist in different Lisp images of
>    the same implementation.
>    See Section 3.2.4 (Literal Objects in Compiled Files).
>
> what is "same implementation" here?
> often the size of fixnum is different on different platforms for the
> "same implementation" (e.g., 64-bit platforms may have 48+ bit fixnums
> which is not possible on a 32-bit platform).
>
> So, is the AMD64 incarnation of an implementation required to return the
> same mathematical fixnum as the i386 incarnation of the "same
> implementation"?
>
> The reference to compiled files appear to suggest that the "same
> implementation" is defined as "able to share compiled files".
> This would mean that CLISP/AMD64, CLISP/Sparc64, CLISP/i386 &c are all
> the same implementations because they can share the compiled files
> (platform-independent bytecode) - and thus are required to produce the
> same SXHASH values on all platforms,
> while CMUCL/i386 and CMUCL/Sparc are different implementations because
> they cannot share the natively compiled files - and thus the 64-bit
> version is free to produce larger fixnums.
> Right?

I'd say: 
any two implementations that can load each others' .fasl or images.

In the paragraph 2 above, I undestood 'image' as the running process
core.  Saved images are just a copy of a running process core.  This
paragraph allows one to save an image and reload it later  and not to
have to rehash everything.

That said, I don't see any reason why  SXHASH could not be the same on
all 32-bit platforms, or even on all platforms.  If I was an
implementor, I'd rather write one hash function running everywhere
than one hash function for each processor...

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

The world will now reboot.  don't bother saving your artefacts.

From: Duane Rettig
Subject: Re: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 20:17:36 +0000
Message-ID: <4mzqlsnmn.fsf@franz.com>

Pascal Bourguignon <···@informatimago.com> writes:

> Sam Steingold <···@gnu.org> writes:
> 
> > <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
> > 1. SXHASH must return a FIXNUM
> > 2. for any two objects, x and y, both of which are bit vectors,
> >    characters, conses, numbers, pathnames, strings, or symbols, and
> >    which are similar, (sxhash x) and (sxhash y) yield the same
> >    mathematical value even if x and y exist in different Lisp images of
> >    the same implementation.
> >    See Section 3.2.4 (Literal Objects in Compiled Files).
> >
> > what is "same implementation" here?
> > often the size of fixnum is different on different platforms for the
> > "same implementation" (e.g., 64-bit platforms may have 48+ bit fixnums
> > which is not possible on a 32-bit platform).
> >
> > So, is the AMD64 incarnation of an implementation required to return the
> > same mathematical fixnum as the i386 incarnation of the "same
> > implementation"?
> >
> > The reference to compiled files appear to suggest that the "same
> > implementation" is defined as "able to share compiled files".
> > This would mean that CLISP/AMD64, CLISP/Sparc64, CLISP/i386 &c are all
> > the same implementations because they can share the compiled files
> > (platform-independent bytecode) - and thus are required to produce the
> > same SXHASH values on all platforms,
> > while CMUCL/i386 and CMUCL/Sparc are different implementations because
> > they cannot share the natively compiled files - and thus the 64-bit
> > version is free to produce larger fixnums.
> > Right?
> 
> I'd say: 
> any two implementations that can load each others' .fasl or images.

That's probably a good definition, but it's probably too strict;
some pair of implementations might have the same hash code generation
but they may be different architecures, so whereas the fasl files are
not exchangeable, the sxhashes are still the same.

> In the paragraph 2 above, I undestood 'image' as the running process
> core.  Saved images are just a copy of a running process core.  This
> paragraph allows one to save an image and reload it later  and not to
> have to rehash everything.
> 
> 
> 
> That said, I don't see any reason why  SXHASH could not be the same on
> all 32-bit platforms, or even on all platforms.

Don't count out endianness - if the implemetation's hash-code
generation is done on a word-basis rather than an element-basis,
then the sxhash will be different for each endianness as well as
for different word-sizes.  For example, on Allegro CL 7.0,
on an x86 (32-bit, little-endian):

CL-USER(1): (format t "~x" (sxhash "abcd"))
8db322
NIL
CL-USER(2): 


while on a Sparc (32-bit, big-endian):

CL-USER(1): (format t "~x" (sxhash "abcd"))
784763
NIL
CL-USER(2): 

and you can verify that the other 32-bit lisps yield the same values
based on their endianness (mutatis mutandis for 64-bits).

  If I was an
> implementor, I'd rather write one hash function running everywhere
> than one hash function for each processor...

Precisely!

:-)

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Barry Margolin
Subject: Re: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 23:55:05 +0000
Message-ID: <barmar-70B684.19550523052005@comcast.dca.giganews.com>

In article <·············@franz.com>, Duane Rettig <·····@franz.com> 
wrote:

> That's probably a good definition, but it's probably too strict;
> some pair of implementations might have the same hash code generation
> but they may be different architecures, so whereas the fasl files are
> not exchangeable, the sxhashes are still the same.

But the specification doesn't *require* them to.  We're talking about 
what constitutes the same implementation in the context of the ANSI 
standard.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

From: Ron Garret
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 00:36:00 +0000
Message-ID: <rNOSPAMon-8EDB1D.17360023052005@news.gha.chartermi.net>

In article <····························@comcast.dca.giganews.com>,
 Barry Margolin <······@alum.mit.edu> wrote:

> In article <·············@franz.com>, Duane Rettig <·····@franz.com> 
> wrote:
> 
> > That's probably a good definition, but it's probably too strict;
> > some pair of implementations might have the same hash code generation
> > but they may be different architecures, so whereas the fasl files are
> > not exchangeable, the sxhashes are still the same.
> 
> But the specification doesn't *require* them to.  We're talking about 
> what constitutes the same implementation in the context of the ANSI 
> standard.

This has apparently been known to be a confusing issue for a long time:

http://www.lisp.org/HyperSpec/Issues/iss337-writeup.html

rg

From: Duane Rettig
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 06:04:44 +0000
Message-ID: <4wtpprwg3.fsf@franz.com>

Barry Margolin <······@alum.mit.edu> writes:

> In article <·············@franz.com>, Duane Rettig <·····@franz.com> 
> wrote:
> 
> > That's probably a good definition, but it's probably too strict;
> > some pair of implementations might have the same hash code generation
> > but they may be different architecures, so whereas the fasl files are
> > not exchangeable, the sxhashes are still the same.
> 
> But the specification doesn't *require* them to.  We're talking about 
> what constitutes the same implementation in the context of the ANSI 
> standard.

You're referring to the spec; I am not.  The spec only defines sameness
for objects and types, and not for implementations.  But to imply that
something shouldn't be talked about just because it is not defined by
the spec isn't very useful.  We're also discussing implementations, and
what an implementation might best do, as the question was originally
posed by an implementor.  And finally, to state the context, I responded
directly not to any other post, but to Pascal Bourguignon, who started
his definition with "I'd say:".

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: sds
Subject: Re: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 20:36:02 +0000
Message-ID: <1116880562.483441.38180@o13g2000cwo.googlegroups.com>

Pascal Bourguignon wrote:
> any two implementations that can load each others' .fasl or images.

image portability is NOT the same as FAS portability.
<http://www.podval.org/~sds/clisp/impnotes/image.html>

> That said, I don't see any reason why  SXHASH could not be the same
on
> all 32-bit platforms, or even on all platforms.  If I was an
> implementor, I'd rather write one hash function running everywhere
> than one hash function for each processor...

clisp computes a 32 bit hash code which then has to be truncated to 24
bits because that's the min fixnum sized on all platforms.
I wondered if that was still necessary on 64-bit platforms.

From: Adam Warner
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 02:06:25 +0000
Message-ID: <pan.2005.05.24.02.06.23.860836@consulting.net.nz>

On Mon, 23 May 2005 13:36:02 -0700, sds wrote:
> clisp computes a 32 bit hash code which then has to be truncated to 24
> bits because that's the min fixnum sized on all platforms. I wondered if
> that was still necessary on 64-bit platforms.

EXT:SXHASH32 could be a useful extension that returns a 32-bit platform
independent hash code upon all CLISP implementations. Code intended to run
better on 64-bit platforms could use it in place of CL:SXHASH.

Regards,
Adam

From: Florian Weimer
Subject: Re: sxhash: what is "same implementation"?
Date: Mon, 23 May 2005 19:26:22 +0000
Message-ID: <871x7xagm9.fsf@deneb.enyo.de>

* Sam Steingold:

> <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
> 1. SXHASH must return a FIXNUM
> 2. for any two objects, x and y, both of which are bit vectors,
>    characters, conses, numbers, pathnames, strings, or symbols, and
>    which are similar, (sxhash x) and (sxhash y) yield the same
>    mathematical value even if x and y exist in different Lisp images of
>    the same implementation.
>    See Section 3.2.4 (Literal Objects in Compiled Files).
>
> what is "same implementation" here?

Is this term used anywhere else in the standard?  Otherwise "same
implementation" just means "an implementation in which SXHASH yields
the same values".

However, this requirement is utterly broken because it makes SXHASH
unsuitable for hash tables where the keys do not come from a
trustworthy data source.

From: Barry Margolin
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 02:26:47 +0000
Message-ID: <barmar-F544E2.22264723052005@comcast.dca.giganews.com>

In article <·············@gnu.org>, Sam Steingold <···@gnu.org> wrote:

> <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
> 1. SXHASH must return a FIXNUM
> 2. for any two objects, x and y, both of which are bit vectors,
>    characters, conses, numbers, pathnames, strings, or symbols, and
>    which are similar, (sxhash x) and (sxhash y) yield the same
>    mathematical value even if x and y exist in different Lisp images of
>    the same implementation.
>    See Section 3.2.4 (Literal Objects in Compiled Files).
> 
> what is "same implementation" here?

I think the implementor gets to decide which implementations he's "the 
same" as.  The only case I expect you can inherently depend on is the 
same Lisp application running on the same version of the same OS on the 
same type of hardware.

More than likely, if an implementor creates Lisp versions for different 
operating systems they could arrange for them to use the same SXHASH 
algorithm, to make it easier for his customers to share data files 
across these related implementations.  But I don't think there's any 
requirement for it.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

From: Duane Rettig
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 06:23:32 +0000
Message-ID: <4oeb1rvkr.fsf@franz.com>

Barry Margolin <······@alum.mit.edu> writes:

> In article <·············@gnu.org>, Sam Steingold <···@gnu.org> wrote:
> 
> > <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
> > 1. SXHASH must return a FIXNUM
> > 2. for any two objects, x and y, both of which are bit vectors,
> >    characters, conses, numbers, pathnames, strings, or symbols, and
> >    which are similar, (sxhash x) and (sxhash y) yield the same
> >    mathematical value even if x and y exist in different Lisp images of
> >    the same implementation.
> >    See Section 3.2.4 (Literal Objects in Compiled Files).
> > 
> > what is "same implementation" here?
> 
> I think the implementor gets to decide which implementations he's "the 
> same" as.  The only case I expect you can inherently depend on is the 
> same Lisp application running on the same version of the same OS on the 
> same type of hardware.

Agreed, but consider also a distinction based on a patch which explicitly
changes the hashing algorithm.  This could be viewed as a "version change",
which would then render the same lisp without the patch to be not the "same
implementation" as that lisp with the patch loaded in.  Oh, the wonders
of dynamic redefinition...

> More than likely, if an implementor creates Lisp versions for different 
> operating systems they could arrange for them to use the same SXHASH 
> algorithm, to make it easier for his customers to share data files 
> across these related implementations.  But I don't think there's any 
> requirement for it.

Also agreed.  The usefulness of regularizing hash code generation is
mostly in the areas of debug and determinism.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Kent M Pitman
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 03:41:47 +0000
Message-ID: <uoeb1e1c6.fsf@nhplace.com>

Sam Steingold <···@gnu.org> writes:

> <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
> 1. SXHASH must return a FIXNUM
> 2. for any two objects, x and y, both of which are bit vectors,
>    characters, conses, numbers, pathnames, strings, or symbols, and
>    which are similar, (sxhash x) and (sxhash y) yield the same
>    mathematical value even if x and y exist in different Lisp images of
>    the same implementation.
>    See Section 3.2.4 (Literal Objects in Compiled Files).
> 
> what is "same implementation" here?

I think it means, at minimum, that (a) it won't change within the same image
and (b) that if I have a .exe, the value won't change from one day
to the next, so that I can save something out one day, quit image, and read
it back in.

Whether a new release of the same software is or is not compatible
sounds to me like a business relationship management problem, not a
conformance problem.

> often the size of fixnum is different on different platforms for the
> "same implementation" (e.g., 64-bit platforms may have 48+ bit fixnums
> which is not possible on a 32-bit platform).
> 
> So, is the AMD64 incarnation of an implementation required to return the
> same mathematical fixnum as the i386 incarnation of the "same
> implementation"?
> 
> The reference to compiled files appear to suggest that the "same
> implementation" is defined as "able to share compiled files".
> This would mean that CLISP/AMD64, CLISP/Sparc64, CLISP/i386 &c are all
> the same implementations because they can share the compiled files
> (platform-independent bytecode) - and thus are required to produce the
> same SXHASH values on all platforms,
> while CMUCL/i386 and CMUCL/Sparc are different implementations because
> they cannot share the natively compiled files - and thus the 64-bit
> version is free to produce larger fixnums.
> Right?

I can't speak for the committee, but I can tell you what ANSI told the
committee about lawsuits, and that's that if a vendor makes a plausible
claim that they conform, it's dangerous for someone on the committee (or
presumably anyone else either) to claim with any certainty that it doesn't.

From a conservative construction point of view, though, the simplest
thing is for the implementation to simply document that they were
unsure about this point and then document what they do.  Then leave it
to users to open a dialog if they want something different.

Incidentally, I think the wording in question (or, at least, the
restriction in question) goes way, way back to Maclisp, which had the
same restriction....  I think the question of what an implementation was
was simpler back then; I'm pretty sure the two implementations of relevance
were the DEC PDP10 version and the Honeywell Multics version, which were
really decidedly different (the former using a 36-bit word and the latter
a 72-bit word)...

From: Cameron MacKinnon
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 14:25:52 +0000
Message-ID: <N9mdnQ5ALPzspA7fRVn-rA@rogers.com>

Sam Steingold wrote:
> <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
> 1. SXHASH must return a FIXNUM
> 2. for any two objects, x and y, both of which are bit vectors,
>    characters, conses, numbers, pathnames, strings, or symbols, and
>    which are similar, (sxhash x) and (sxhash y) yield the same
>    mathematical value even if x and y exist in different Lisp images of
>    the same implementation.
>    See Section 3.2.4 (Literal Objects in Compiled Files).

But also (same source):
4. The hash-code is intended for hashing. This places no verifiable
    constraint on a conforming implementation, but the intent is that an
    implementation should make a good-faith effort to produce hash-codes
    that are well distributed within the range of non-negative fixnums.

> what is "same implementation" here?
> often the size of fixnum is different on different platforms for the
> "same implementation" (e.g., 64-bit platforms may have 48+ bit fixnums
> which is not possible on a 32-bit platform).

(4) implies that, between platforms where fixnum sizes differ, sxhash 
should differ.

The committee did a lot of dancing around, and the cleanup issue 
SXHASH-DEFINITION:SIMILAR-FOR-SXHASH keeps dancing, all to say, 
essentially, that the hash must be dependent on the item's contents, not 
its memory location.

This makes (sxhash x) an invariant between instances of the same 
implementation separated spatially or temporally; it could be used 
between two images connected via (read) and TCP/IP.

> So, is the AMD64 incarnation of an implementation required to return the
> same mathematical fixnum as the i386 incarnation of the "same
> implementation"?

I think you could justify either answer: Better hash distribution on the 
64 bit implementation versus sxhash compatibility between 
CLISP/platformX and CLISP/platformY. Which do you think your customers 
want more?

From: Pascal Bourguignon
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 15:02:35 +0000
Message-ID: <87fywcll9w.fsf@thalassa.informatimago.com>

Cameron MacKinnon <··········@clearspot.net> writes:

> Sam Steingold wrote:
>> <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
>> 1. SXHASH must return a FIXNUM
>> 2. for any two objects, x and y, both of which are bit vectors,
>>    characters, conses, numbers, pathnames, strings, or symbols, and
>>    which are similar, (sxhash x) and (sxhash y) yield the same
>>    mathematical value even if x and y exist in different Lisp images of
>>    the same implementation.
>>    See Section 3.2.4 (Literal Objects in Compiled Files).
>
> But also (same source):
> 4. The hash-code is intended for hashing. This places no verifiable
>     constraint on a conforming implementation, but the intent is that an
>     implementation should make a good-faith effort to produce hash-codes
>     that are well distributed within the range of non-negative fixnums.
>
>> what is "same implementation" here?
>> often the size of fixnum is different on different platforms for the
>> "same implementation" (e.g., 64-bit platforms may have 48+ bit fixnums
>> which is not possible on a 32-bit platform).
>
> (4) implies that, between platforms where fixnum sizes differ, sxhash
> should differ.

Good argument.

> The committee did a lot of dancing around, and the cleanup issue
> SXHASH-DEFINITION:SIMILAR-FOR-SXHASH keeps dancing, all to say,
> essentially, that the hash must be dependent on the item's contents,
> not its memory location.
>
> This makes (sxhash x) an invariant between instances of the same
> implementation separated spatially or temporally; it could be used
> between two images connected via (read) and TCP/IP.
>
>> So, is the AMD64 incarnation of an implementation required to return the
>> same mathematical fixnum as the i386 incarnation of the "same
>> implementation"?
>
> I think you could justify either answer: Better hash distribution on
> the 64 bit implementation versus sxhash compatibility between
> CLISP/platformX and CLISP/platformY. Which do you think your customers
> want more?

In conclusion, Clisp could provide three functions the standard
CL:SXHASH, and two extensions EXT:SXHASH32 and EXT:SXHASH64 (on all
platforms) that would return.

         on:  32-bit    64-bit

CL:SXHASH     fixnum    fixnum          ; different values

EXT:SXHASH32  fixnum    fixnum (small)  ; the same value on the two platforms.

EXT:SXHASH64  bigint    fixnum          ; the same value on the two platforms.



An "implementation" per CLHS would be identified by the same result
from CL:SXHASH.

An "implementation" per clisp could be identified by the same result
from EXT:SXHASH32 or EXT:SXHASH64.

So depending on the set of platforms, the need for spatio-temporal
communications, and the need for space-and-time optimization, clisp
users could choose to use either one of these hashing functions.


Portable or only one platform:                     CL:SXHASH

Clisp specific, both 32-bit and 64-bit platforms, 
need speed or space on 32-bit:                     EXT:SXHASH32

Clisp specific, both 32-bit and 64-bit platforms, 
need good spreading of hash values:                EXT:SXHASH64


CL:SXHASH could be EXT:SXHASH32 on 32-bit 
      and could be EXT:SXHASH64 on 64-bit.



-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Kitty like plastic.
Confuses for litter box.
Don't leave tarp around.

From: Adam Warner
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 23:51:25 +0000
Message-ID: <pan.2005.05.24.23.51.21.974943@consulting.net.nz>

On Tue, 24 May 2005 17:02:35 +0200, Pascal Bourguignon wrote:
> Cameron MacKinnon <··········@clearspot.net> writes:
> 
>> Sam Steingold wrote:
>>> <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
>>> 1. SXHASH must return a FIXNUM
>>> 2. for any two objects, x and y, both of which are bit vectors,
>>>    characters, conses, numbers, pathnames, strings, or symbols, and
>>>    which are similar, (sxhash x) and (sxhash y) yield the same
>>>    mathematical value even if x and y exist in different Lisp images of
>>>    the same implementation.
>>>    See Section 3.2.4 (Literal Objects in Compiled Files).
>>
>> But also (same source):
>> 4. The hash-code is intended for hashing. This places no verifiable
>>     constraint on a conforming implementation, but the intent is that an
>>     implementation should make a good-faith effort to produce hash-codes
>>     that are well distributed within the range of non-negative fixnums.
>>
>>> what is "same implementation" here?
>>> often the size of fixnum is different on different platforms for the
>>> "same implementation" (e.g., 64-bit platforms may have 48+ bit fixnums
>>> which is not possible on a 32-bit platform).
>>
>> (4) implies that, between platforms where fixnum sizes differ, sxhash
>> should differ.
> 
> Good argument.

Then many of the SXHASH literals will be bignums on the platform with
smaller fixnums. A good argument also takes into account whether the
implementations have interchangeable FASL files.

Mentally compile this code upon a 64-bit platform with bigger fixnums,
bigger hash codes and a shared FASL file format:

(defparameter *vector*
  (make-array 3 :initial-contents '#.(list (sxhash 'a) (sxhash 'b) (sxhash 'c))
              :element-type 'fixnum))

Now load the compiled file into an implementation with smaller fixnums
that shares the same FASL files. The code will invoke undefined behaviour.

Frankly I cannot see any way to have a portable FASL file format without
imposing the same sized fixnums upon implementations, just as Java imposes
the same sized primitives upon all platforms.

>> The committee did a lot of dancing around, and the cleanup issue
>> SXHASH-DEFINITION:SIMILAR-FOR-SXHASH keeps dancing, all to say,
>> essentially, that the hash must be dependent on the item's contents,
>> not its memory location.
>>
>> This makes (sxhash x) an invariant between instances of the same
>> implementation separated spatially or temporally; it could be used
>> between two images connected via (read) and TCP/IP.
>>
>>> So, is the AMD64 incarnation of an implementation required to return the
>>> same mathematical fixnum as the i386 incarnation of the "same
>>> implementation"?
>>
>> I think you could justify either answer: Better hash distribution on
>> the 64 bit implementation versus sxhash compatibility between
>> CLISP/platformX and CLISP/platformY. Which do you think your customers
>> want more?
> 
> In conclusion, Clisp could provide three functions the standard
> CL:SXHASH, and two extensions EXT:SXHASH32 and EXT:SXHASH64 (on all
> platforms) that would return.

Any shared FASL file may contain integer literals derived from either
implementation's SXHASH.
 
>          on:  32-bit    64-bit
> 
> CL:SXHASH     fixnum    fixnum          ; different values

With shared FASL files and platform-specific fixnum-sized hash codes 
the type of the resulting literals will be an integer between 0 and the
maximum most-positive-fixnum of the so-called "same" implementations.

Regards,
Adam

From: Adam Warner
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 01:44:16 +0000
Message-ID: <pan.2005.05.24.01.44.12.949669@consulting.net.nz>

On Mon, 23 May 2005 12:22:28 -0400, Sam Steingold wrote:
> <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
[...]

> The reference to compiled files appear to suggest that the "same
> implementation" is defined as "able to share compiled files".

I share this interpretation.

> This would mean that CLISP/AMD64, CLISP/Sparc64, CLISP/i386 &c are all
> the same implementations because they can share the compiled files
> (platform-independent bytecode) - and thus are required to produce the
> same SXHASH values on all platforms, while CMUCL/i386 and CMUCL/Sparc
> are different implementations because they cannot share the natively
> compiled files - and thus the 64-bit version is free to produce larger
> fixnums. Right?

Right.

Here's one reason why this is so vital: Say you use SXHASH as a component
of a binary sorting algorithm. If the SXHASH of object1 is less than the
SXHASH of object2 then object1 is placed before object2 (ignore the more
expensive tests that may be necessary if the hash codes collide). If these
objects are saved to a FASL file in sorted order then they must continue
to be sorted when loaded into an implementation intended to share the same
FASL files.

If you don't observe this invariance then any subsequent binary search
upon the literal objects may break. For example, symbol |ABCD| has hash
code 15584287 in CLISP. |ZYXW| has hash code 680336. They could be saved
in SXHASH code order in the FASL file as literal object #(zyxw abcd).

If you decide to generate larger hash codes for 64-bit CLISPs then it's
quite possible that the hash code of |ZYXW| could turn out to be larger
than the hash code of |ABCD|. The sorted literal #(zyxw abcd) generated by
the 32-bit CLISP may no longer be sorted in the 64-bit CLISP.

Apart from just putting up with 24-bit SXHASH codes on 64-bit platforms,
your options include:

1. Break FASL file compatibility between 32- and 64-bit platforms so you
can generate hashes with far less collisions on 64-bit platforms.

2. Implement larger fixnums for 32-bit CLISP. This is just another cost
that tiny fixnums impose. There may be many more collisions on CLISP with
24-bit hash codes than say 32-bit SBCL with 29-bit hash codes or ABCL with
31-bit hash codes (unless the other implementations have very poor hash
code generators).

You could even abstract away the hardware differences by giving users the
same sized fixnums on all platforms. CLISP could be extended to generate
optimal bytecode for explicit types such as (integer 0 9999).

Regards,
Adam

From: Duane Rettig
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 06:17:29 +0000
Message-ID: <4sm0drvuu.fsf@franz.com>

Adam Warner <······@consulting.net.nz> writes:

> On Mon, 23 May 2005 12:22:28 -0400, Sam Steingold wrote:
> > <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
> [...]
> 
> > The reference to compiled files appear to suggest that the "same
> > implementation" is defined as "able to share compiled files".
> 
> I share this interpretation.
> 
> > This would mean that CLISP/AMD64, CLISP/Sparc64, CLISP/i386 &c are all
> > the same implementations because they can share the compiled files
> > (platform-independent bytecode) - and thus are required to produce the
> > same SXHASH values on all platforms, while CMUCL/i386 and CMUCL/Sparc
> > are different implementations because they cannot share the natively
> > compiled files - and thus the 64-bit version is free to produce larger
> > fixnums. Right?
> 
> Right.

I agree that it is Good, but it is not Right according to the spec.  The
spec doesn't define what the same implemetation is.  Consider this even
more subtle difference; your vendor has discovered a hashing distribution
problem, and has put out a patch that changes the distribtion and increases
the speed of hashing, making customers happy.  As a result, (and with the
vendor's warning) the sxhash of certain kinds of objects have changed from
before the patch to after the patch.  Is it the "same implemtation"?  One
could say not; the inclusion of patches might be construed as a version
shift of sorts.  And it's a Good Thing.

> Here's one reason why this is so vital: Say you use SXHASH as a component
> of a binary sorting algorithm. If the SXHASH of object1 is less than the
> SXHASH of object2 then object1 is placed before object2 (ignore the more
> expensive tests that may be necessary if the hash codes collide). If these
> objects are saved to a FASL file in sorted order then they must continue
> to be sorted when loaded into an implementation intended to share the same
> FASL files.

Consider a hash table with 21 buckets, and two objects, one which hashes
to 17 and one to 25.  So the first hashes to smaller than the second, but
the second goes into bucket 4 and the first into bucket 17.  Consider
next that the hash table grows to 37.  Now the first and second object
have swapped order wrt each other.  Users should not count on an ordering
within hash tables.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Pascal Bourguignon
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 07:47:02 +0000
Message-ID: <8764x9m5ft.fsf@thalassa.informatimago.com>

Duane Rettig <·····@franz.com> writes:
>> Here's one reason why this is so vital: Say you use SXHASH as a component
>> of a binary sorting algorithm. If the SXHASH of object1 is less than the
>> SXHASH of object2 then object1 is placed before object2 (ignore the more
>> expensive tests that may be necessary if the hash codes collide). If these
>> objects are saved to a FASL file in sorted order then they must continue
>> to be sorted when loaded into an implementation intended to share the same
>> FASL files.
>
> Consider a hash table with 21 buckets, and two objects, one which hashes
> to 17 and one to 25.  So the first hashes to smaller than the second, but
> the second goes into bucket 4 and the first into bucket 17.  Consider
> next that the hash table grows to 37.  Now the first and second object
> have swapped order wrt each other.  Users should not count on an ordering
> within hash tables.

This was not the point.

(sort random-object (lambda (a b) (< (sxhash a) (sxhash b))))


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

This is a signature virus.  Add me to your signature and help me to live

From: Adam Warner
Subject: Re: sxhash: what is "same implementation"?
Date: Tue, 24 May 2005 08:37:05 +0000
Message-ID: <pan.2005.05.24.08.37.02.76560@consulting.net.nz>

On Mon, 23 May 2005 23:17:29 -0700, Duane Rettig wrote:

> Adam Warner <······@consulting.net.nz> writes:
> 
>> On Mon, 23 May 2005 12:22:28 -0400, Sam Steingold wrote:
>> > <http://www.lisp.org/HyperSpec/Body/fun_sxhash.html>
>> [...]
>> 
>> > The reference to compiled files appear to suggest that the "same
>> > implementation" is defined as "able to share compiled files".
>> 
>> I share this interpretation.
>> 
>> > This would mean that CLISP/AMD64, CLISP/Sparc64, CLISP/i386 &c are
>> > all the same implementations because they can share the compiled
>> > files (platform-independent bytecode) - and thus are required to
>> > produce the same SXHASH values on all platforms, while CMUCL/i386 and
>> > CMUCL/Sparc are different implementations because they cannot share
>> > the natively compiled files - and thus the 64-bit version is free to
>> > produce larger fixnums. Right?
>> 
>> Right.
> 
> I agree that it is Good, but it is not Right according to the spec.  The
> spec doesn't define what the same implementation is.  Consider this even
> more subtle difference; your vendor has discovered a hashing
> distribution problem, and has put out a patch that changes the
> distribution and increases the speed of hashing, making customers happy.
> As a result, (and with the vendor's warning) the sxhash of certain kinds
> of objects have changed from before the patch to after the patch.  Is it
> the "same implementation"? One could say not; the inclusion of patches
> might be construed as a version shift of sorts.  And it's a Good Thing.

Fine. I consider this a change worthy of a FASL file version increment.

>> Here's one reason why this is so vital: Say you use SXHASH as a
>> component of a binary sorting algorithm. If the SXHASH of object1 is
>> less than the SXHASH of object2 then object1 is placed before object2
>> (ignore the more expensive tests that may be necessary if the hash
>> codes collide). If these objects are saved to a FASL file in sorted
>> order then they must continue to be sorted when loaded into an
>> implementation intended to share the same FASL files.
> 
> Consider a hash table with 21 buckets, and two objects, one which hashes
> to 17 and one to 25.  So the first hashes to smaller than the second,
> but the second goes into bucket 4 and the first into bucket 17. Consider
> next that the hash table grows to 37.  Now the first and second object
> have swapped order wrt each other.  Users should not count on an
> ordering within hash tables.

When I mentioned a binary sorting algorithm I had a vector of objects in
mind to apply it to (my example was a vector of a mere 2 objects). This
vector of objects may be generated at compile time for inclusion as a FASL
file literal. As the vector of objects is binary sorted any object can be
looked up within (1+ (floor (log N 2))) comparisons, where N is the number
of objects.

If SXHASH is used as a component of this binary sort then changing it will
invalidate the order of the objects in the vector. A new FASL file should
be compiled to generate a freshly sorted literal vector. This is why I
consider a change in SXHASH is worthy of a FASL file version increment.

SXHASH is explicitly intended for use where ANSI Common Lisp's
abstractions are insufficient: "sxhash is intended for use where the
pre-defined abstractions are insufficient." Not being able to count upon
an ordering within Common Lisp's predefined hash tables is precisely a
reason to use SXHASH to write a custom abstraction where this can be
counted upon.

Note that it is very expensive to create a consistent symbol inequality
predicate [within the same version of an implementation] without access
to SXHASH-like functionality. Without it all none-EQ symbols have to be
compared using, at a minimum, string comparisons upon the symbol and
package names. With the functionality one only has to make such a
comparison when the symbols are non-EQ but happen to have the same hash
code (because they have the same hash code another way has to be found to
consistently order them, hence the string comparisons).

Regards,
Adam

From: lin8080
Subject: Re: sxhash: what is "same implementation"?
Date: Thu, 26 May 2005 11:57:08 +0000
Message-ID: <4295B994.83C615EE@freenet.de>

Adam Warner schrieb:


> 1. Break FASL file compatibility between 32- and 64-bit platforms so you
> can generate hashes with far less collisions on 64-bit platforms.

> 2. Implement larger fixnums for 32-bit CLISP. This is just another cost
> that tiny fixnums impose. There may be many more collisions on CLISP with
> 24-bit hash codes than say 32-bit SBCL with 29-bit hash codes or ABCL with
> 31-bit hash codes (unless the other implementations have very poor hash
> code generators).

Hallo

And what about a time/status-stamp from the compiler inside the *.fasl
file? This way one can realize the "same implementation" otherwise the
file should be marked to be recompiled (!). 

stefan

the day will come and cpu has 128-bit

From: Kent M Pitman
Subject: Re: sxhash: what is "same implementation"?
Date: Thu, 26 May 2005 15:31:39 +0000
Message-ID: <uvf569f1k.fsf@nhplace.com>

lin8080 <·······@freenet.de> writes:

> Adam Warner schrieb:
> 
> 
> > 1. Break FASL file compatibility between 32- and 64-bit platforms so you
> > can generate hashes with far less collisions on 64-bit platforms.
> 
> > 2. Implement larger fixnums for 32-bit CLISP. This is just another cost
> > that tiny fixnums impose. There may be many more collisions on CLISP with
> > 24-bit hash codes than say 32-bit SBCL with 29-bit hash codes or ABCL with
> > 31-bit hash codes (unless the other implementations have very poor hash
> > code generators).
> 
> Hallo
> 
> And what about a time/status-stamp from the compiler inside the *.fasl
> file? This way one can realize the "same implementation" otherwise the
> file should be marked to be recompiled (!). 

Heh.  You're assuming there is a source file.  The information may have
been dumped direct to fasl and then seeing that the version has changed
tells you it's too late.

Relevant Viewing:  "The Atomic Cafe", especially the part where the soldier
is being interviewed about how he's going to be sent into a nuclear blast
area.  The interviewer asks about the radiation badge he's wearing.  He
explains that it records how much radiation he's received.  The interviewer
says (paraphrasing from memory): "Oh, so you can tell when you've received
a lethal does?" The man blythely responds, probably not really understanding
the import of the question, "That's right."

From: lin8080
Subject: Re: sxhash: what is "same implementation"?
Date: Fri, 27 May 2005 14:23:06 +0000
Message-ID: <42972D4A.13747489@freenet.de>

Kent M Pitman schrieb:
> lin8080 <·······@freenet.de> writes:

 Hallo 

> > And what about a time/status-stamp from the compiler inside the *.fasl
> > file? This way one can realize the "same implementation" otherwise the
> > file should be marked to be recompiled (!).

> Heh.  You're assuming there is a source file.  The information may have
> been dumped direct to fasl and then seeing that the version has changed
> tells you it's too late.

Well ok. Practice shows what I missed in my thoughts. What I saw was
asdf.lisp about 30kb and asdf.fasl about 140kb (done by lispworks).
(Usually I do not compile my c-lisp-files). But a developer should have
the original source-files.

Thinking about a solution it comes in mind to include a reference link
to the source-file. But this is also not good for the same reason. (ie.
(update path-to-old-source-file) -returns new-fasl-file and some msg,
-only among the same implemt.-versions?). Sounds easy for the first
view. (something like imp-notes: 30.7.�or 35.3 -not very elegant). 

What is between the most common use and external modules? So, what does
it take, to make the interpreter able to recompile any *.fasl file?
Guess this is not possible (-could end in undefined status). 

Hmm. I see, I should dig deeper into lisp... (ffi a c-file and no way to
do that with a fasl-file? -could end with overloads) 

stefan   

-diging makes a small horizont