From: Lars Rune Nøstdal
Subject: Re: lisp newbie question: how can I associate  properties to list	cells?
Date: 
Message-ID: <1227927307.9324.34.camel@blackbox.nostdal.org>
On Thu, 2008-11-27 at 11:44 -0800, yang wrote:
> for a symbol, 'a, I can do
>  (setf (get 'a 'state) 'finish)
> 
> so that 'a has a property state with value finish,
> 
> I want to know how to do similar things for lisp. set a property and
> value for each cell in the list
> eg:  given a list
> (setq lst '(1 2 3 1))
> 
> (setf  (get (car lst ) 'state) 'finish)
> will report an error.
> 
> properties are bound to symbol names other than list cells. how can I
> specify properties for cells ?
> thanks

this is probably not a good idea for what you're (really?) after, but:

        
        (defmacro mk-meta-slot (accessor-name &key (test '#'eq))
         `(let ((slot-data (make-hash-table :test ,test :weakness :key)))
             (defmethod ,accessor-name (object)
               (gethash object slot-data))
        
             (defmethod (setf ,accessor-name) (new-value object)
               (setf (gethash object slot-data) new-value))))


        
        ;; anything stronger than EQ decends into the lists/conses and determines equality based on content instead of the conses themselves ...
        (mk-meta-slot properties-of :test #'eq) 
        
        
        (defun property-of (object property-name &key (test #'eq))
          (let ((result (assoc property-name (properties-of object) :test test)))
            (if result
                (values (cdr result) t)
                (values nil nil))))
        
        
        (defun (setf property-of) (property-value object property-name &key (test #'eq))
          (let ((already-existing (assoc property-name (properties-of object) :test test)))
            (if already-existing
                (setf (cdr already-existing) property-value)
                (push (cons property-name property-value) (properties-of object)))))
                      
        
        
        SW> (let ((lst (list 1 2 3 1)))
              (setf (property-of (nthcdr 0 lst) 'cell-name) 'first
                    (property-of (nthcdr 1 lst) 'cell-name) 'second
                    (property-of (nthcdr 2 lst) 'cell-name) 'third
                    (property-of (nthcdr 3 lst) 'cell-name) 'fourth)
              (assert (not (eq (property-of (nthcdr 0 lst) 'cell-name)    ;; this is either exactly what you wanted or it is the opposite of it; both "ways" are possible though
                               (property-of (nthcdr 3 lst) 'cell-name))))
              (setf (first lst) 0)
              (assert (eq (property-of (nthcdr 0 lst) 'cell-name) 'first)))
        NIL
        
        

From: budden
Subject: Re: lisp newbie question: how can I associate properties to list 	cells?
Date: 
Message-ID: <d333085a-5d1b-4b9c-965b-6c0b33478c4b@r40g2000yqj.googlegroups.com>
Hi!
  You can not (in general) put conses to hash table with :test #'eq

http://www.lispworks.com/documentation/HyperSpec/Body/f_sxhash.htm

states:

> 3. The hash-code for an object is always the same within a single session provided that the object is not visibly modified with regard to the equivalence test equal.

That means, when we modify car or cdr, conforming implementation may
change hash value of the cons.
In particulare, in SBCL we get:
CL-USER> (defvar *v* (cons nil nil))
CL-USER> (sxhash *v*)
457802361
CL-USER> (setf (cdr *v*) 4)
4
CL-USER> (sxhash *v*)
213127439

For mapping conses with a eq, we could use association lists only, but
they're not efficient. Or we must prohibit the modification of the
cons once a property is put on it.

Why is it so? This is because conses are garbage-collected. They may
be moved while gc-ing and so they have no permanent address in a
memory, as C objects do. It looks like inherent source of inefficiency
in any garbage-collected languages where objects may move.
From: Lars Rune Nøstdal
Subject: Re: lisp newbie question: how can I associate properties to list 	cells?
Date: 
Message-ID: <1227992004.9324.57.camel@blackbox.nostdal.org>
yeah, and defmethod inside some-other-form (or something like that)
isn't allowed either; this was pointed out to me in another post

..even if sxhash returns different values, it seems to work anyway(?):

        > (let ((lst (cons nil nil)))
            (setf (property-of (nthcdr 0 lst) 'state) 'finished)
            (setf (cdr lst) 4)
            (property-of (nthcdr 0 lst) 'state))
        FINISHED
        T
From: Pascal Costanza
Subject: Re: lisp newbie question: how can I associate properties to list    cells?
Date: 
Message-ID: <6pdt49F7ig0vU1@mid.individual.net>
Lars Rune N�stdal wrote:
> yeah, and defmethod inside some-other-form (or something like that)
> isn't allowed either; this was pointed out to me in another post
> 
> ..even if sxhash returns different values, it seems to work anyway(?):
> 
>         > (let ((lst (cons nil nil)))
>             (setf (property-of (nthcdr 0 lst) 'state) 'finished)
>             (setf (cdr lst) 4)
>             (property-of (nthcdr 0 lst) 'state))
>         FINISHED
>         T

sxhash and hash tables are not related. See further down in the 
definition of sxhash:

"The hash codes returned by sxhash are not necessarily related to any 
hashing strategy used by any other function in Common Lisp."

What's important is the :test argument for make-hash-tables, nothing 
else (unless you are in implementation-dependent land).

Furthermore, you can (!) have local defmethod forms. Although that's 
exotic, it is required to work per ANSI CL.


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
From: Lars Rune Nøstdal
Subject: Re: lisp newbie question: how can I associate properties to list	cells?
Date: 
Message-ID: <1228001345.9324.69.camel@blackbox.nostdal.org>
On Sat, 2008-11-29 at 23:13 +0100, Pascal Costanza wrote:
> Lars Rune Nøstdal wrote:
> > yeah, and defmethod inside some-other-form (or something like that)
> > isn't allowed either; this was pointed out to me in another post
> > 
> > ..even if sxhash returns different values, it seems to work anyway(?):
> > 
> >         > (let ((lst (cons nil nil)))
> >             (setf (property-of (nthcdr 0 lst) 'state) 'finished)
> >             (setf (cdr lst) 4)
> >             (property-of (nthcdr 0 lst) 'state))
> >         FINISHED
> >         T
> 
> sxhash and hash tables are not related. See further down in the 
> definition of sxhash:
> 
> "The hash codes returned by sxhash are not necessarily related to any 
> hashing strategy used by any other function in Common Lisp."
> 
> What's important is the :test argument for make-hash-tables, nothing 
> else (unless you are in implementation-dependent land).
> 
> Furthermore, you can (!) have local defmethod forms. Although that's 
> exotic, it is required to work per ANSI CL.
> 
> 
> Pascal


ah, great .. i can keep "misusing my computer" then :) .. .. instead of
the other way around

*alt-tabs back to emacs*
From: budden
Subject: Re: lisp newbie question: how can I associate properties to list 	cells?
Date: 
Message-ID: <a1857b29-278f-4e13-bc3d-6573529967bf@d23g2000yqc.googlegroups.com>
I was wrong. Sorry.

Some time ago I also was interested with associating properties to
cons cells. I have read cltl2, not the new standard and I was sure
sxhash is used in a hash tables. Well, it is fine that I was wrong.

But I still can't imagine a really efficient way to do so. We either
need to store cons cell identity in the cons cell itself (consumes
memory), or we need to do some rehashing if a cons cell have recently
moved with a GC (consumes a time), or we should not move objects at
all (consumes a memory).
From: Robert Maas, http://tinyurl.com/uh3t
Subject: EQ hashtables (was: lisp newbie question: how can I associate properties to list cells?)
Date: 
Message-ID: <REM-2008nov30-002@Yahoo.Com>
(EQ hash tables *must* work, even if GC moves objects around:)
> From: budden <········@gmail.com>
> ... I still can't imagine a really efficient way to do so. We
> either need to store cons cell identity in the cons cell itself
> (consumes memory), or we need to do some rehashing if a cons cell
> have recently moved with a GC (consumes a time), or we should not
> move objects at all (consumes a memory).

Note: I'm not angry at you. You're a newbie, so presumably you
often doesn't know what's important and what's not important, so
the following is intended to be educational, not accusatory.
(I only get angry at newbies who ask us to do their homework for them,
 and your article is most surely not in that category.)

It's an implementation issue deep down inside the guts of some
version of CL, so you as an ordinary user absolutely never need to
worry about how it's actually done. It's the same as you can't
think of a really efficient way to implement multiple value
returns, or keyword parameters, or multiple lexical closures that
share some bindings, etc. It's not your job, as application
programmer, as user of some CL implementation, to know or care how
it's done. You just must know that any *conforming* CL
implementation must somehow make it work per spec, and you just
refuse to use any seriously non-conforming implementation, and
*efficiency* is a matter of trade secrets and competitive advantage
among the various confirming implementations, so if you find a
bottleneck where it's utterly slow in one company's implementation
you try another company's implementation if you can afford the
extra cost they charge for that extra efficiency better than you
can afford your wasted personal time waiting for the less-efficient
but cheaper implemention to finish common tasks.

As for how it *might* be implemented, here are some ideas I have:

Never move anything. Use the Mad Hatter's Tea Party algorithm: When
your current seating (pages of memory) are too dirty (storage
fragmented so you can't allocate a large object you want), you move
to a new seat (allocate another page of virtual memory). This is
feasible because virtual address space is **IMMENSE** on modern
computers and swap space on hard disk is **IMMENSE** so as long as
your working set isn't larger than physical RAM you are OK, and
physical RAM is pretty large on modern computers so fragmented
memory isn't really as bad as it used to be on 1MB Macintosh Plus
or 256k MicroSoft DOS.

Never move CONS cells. This is feasible because CONS cells are
always the same size, so fragmentation among competing uses for
CONS cells isn't a problem. Only if you allocate lots of CONS
cells, then discard most of them, then try to allocate a large
contiguous array, you'll need to use the Mad Hatter's Tea Party
algorithm. But that doesn't solve the problem of EQ hash tables
that point use *other* things, such as arrays, as keys. So this
section is moot. Go back to fullfledged Mad Hatter's Tea Party
algorithm, or forward to next algorithm below.

Have an extra bit in each object, CONS cell or array etc. etc.,
which tells whether any EQ hash table has ever used it as a key.
Once that bit is set, that object is permanently locked in place
for the duration of the Lisp session, or until the next super-GC
which resets that bit everywhere and then sets it again for all
keys of all EQ hash tables. Any object not recenty a key of an EQ
hash table is free to move. This is reasonble because only a few
objects are ever used as EQ keys in a typical program, or because
even if every object in your program is used as an EQ hash key
still the Mad Hatter's Tea Party agorithm will save you.

Rehash all EQ hash tables whenever GC needs to unfragment memory.
This is reasonable as a space/time tradeoff on older machines that
have only a small address space, such as Macintoshes running in
24-bit address space mode to make older software continue to work,
so the Mad Hatter has only a small number of seats at the table and
you'll shortly get back to where you started and *really* need to
clean your place (unfragment memory). It's also reasonble if
fragmentation causes your working set to occasionally exceed
physical RAM and the GC algorithm has a way to notice that
thrashing has commenced and take remedial action to reduce the
working set by defragmentation. For example, the OS can do a
call-back into the Lisp environment whenever page swapping exceeds
some pre-set threshold, and the called Lisp function can simply set
a flag that is later recognized by CONS and other memory allocaters
to trigger an *immediate* repacking-GC followed by global
EQ-hash-rehashing (or setting a flag in each EQ hash table to
invaidate it and force re-hash the next time anybody tries to
search it) at that moment even before memory is exhausted.

Don't GC in the first place!! This is feasible if the longest
possible lifetime of the Lisp environment is shorter than the least
possible time to run out of memory, such as one-transaction
Web-service applications (regular CGI without mod_lisp persistent
processes), or quickie Unix-style utilities, or even decently long
lasting applications on computer with immense quantity of physical
RAM (such as Lisp Machine, often run for weeks or months with GC
turned off before needing to re-boot the system then re-start the
very last task that had started but didn't get to finish). This is
reasonble because it may be faster to re-boot the system and re-do
that last task rather than run a GC.

So don't worry about it already!

Your idea of storing a CONS-cell identity in each CONS cell would
need to be extended to all moveable objects, including arrays. It
would need to be done for *all* such objects, not just those that
are used as keys in EQ hashtables, because typically you build the
object first and *then* make it a key, and there's no adjacent
storage available to store the identity info at the time you make
it a key. Unfortuately the number of different objects that are
allocated is unbounded, even on a system with finite address space,
because objects are repeatedly discarded and later replaced (in
RAM) by new objects, so the size of the identity info is unbounded
(logarithm of number of objects), so you'd need to use some
variable-length ID-info, such as radix-1 or UTF-8, which would
probably be less efficient than most of the ideas I suggested
above. (In addition to such variable-length strings of data taking
more storage than the no-additional-space of a simple machine
address you already have to find the object in the first place,
hashing a fixed-length virtual machine address is a fixed sequence
of instructions whereas hashing a variable-length string of data is
a WHILE loop, both for computing the bucket location and for
comparing two strings that hash to the same bucket to see if they
are identical. EQ hash tables *ought* to be more efficient than
EQUAL hash tables of course for this reason, hence should be used
instead whereever appropriate. EQL hash tables, like EQ hash
tables, should be fixed code with no loop, but with a dispatch to
handle numbers of various types and character objects differently
from other objects where EQ is called, so EQL hashtables should be
slighty slower than EQ hash tables, because if not all keys are
numbers or characters then all the GC problems regarding object
moving occurs with EQL tables too.)

OT Gripe: For most practical applications, EQUAL hash tables are
what you *really* want, so that for example the most common case of
the same string or list/tree/sortedSet of strings generated from
two different sources will match. IMO the default hashtable type
should have been EQUAL instead of EQL, except that EQL is the
default for virtually ever other function that allows various
comparison functions, such as ASSOC and MEMBER and POSITION, so I
guess we're stuck with EQL being default for hash tables, sigh.
(But note that GET etc. use EQ, so there is one group of exceptions
 to the EQL default behaviour. Why not a second exception?)
From: Kaz Kylheku
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate properties to list cells?)
Date: 
Message-ID: <20081216085629.74@gmail.com>
On 2008-11-30, Robert Maas, http://tinyurl.com/uh3t
<·············@teh.intarweb.org> wrote:
> As for how it *might* be implemented, here are some ideas I have:
>
> Never move anything. Use the Mad Hatter's Tea Party algorithm: When
> your current seating (pages of memory) are too dirty (storage
> fragmented so you can't allocate a large object you want), you move
> to a new seat (allocate another page of virtual memory). This is
> feasible because virtual address space is **IMMENSE** on modern
> computers and swap space on hard disk is **IMMENSE** so as long as
> your working set isn't larger than physical RAM you are OK, and
> physical RAM is pretty large on modern computers so fragmented
> memory isn't really as bad as it used to be on 1MB Macintosh Plus
> or 256k MicroSoft DOS.

Note that virtual memory is not exactly free. It requires virtual address
translation, whose efficiency depends on locality of reference: keeping
everything clumped together.

You don't want to scatter all your objects to different virtual pages, so that
they all hash to different TLB entries. If you're working with a number of
pages that is much larger than the TLB size, you will be thrashing your TLB
cache.

This is made even worse by braindead TLB management, which may be direct
mapped, depending on architecture. Direct mapped means that a page is assigned
a fixed position in the TLB. If you are accessing two pages that map to the
same TLB entry, they bump each other's entry.

Combine a large VM footprint with poor TLB replacement and you have bad
performance.
From: budden
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <53219988-98be-4533-90e5-93baef03a86c@l42g2000yqe.googlegroups.com>
Hi!
  Thanks to all for verbose replies.
  I disagree (regardless of anyone's opinion) that I shouldn't care
efficiency from the very start. Lisp is very flexible and allows to
fix some architecture errors afterwards, but some decisions are to be
made from the very beginning and
knowledge of compared efficiency of misc. mechanisms is important.

I made a test:

CL-USER> (require :iterate-keywords) ; http://sourceforge.net/projects/iteratekeywords/
CL-USER> (defparameter *count* 1000000)
CL-USER> (defparameter *v* (make-hash-table :test #'eq))
CL-USER> (progn (defparameter *lst* (iter (:for i from 1 to *count*)
(:collect i))) nil)
CL-USER> (time (iter (:for i from 1 to *count*) (:for c on *lst*)
(setf (gethash c *v*) i)))
Evaluation took:
  1.049 seconds of real time
  33,632,184 bytes consed
CL-USER> (time (iter (:for i from 1 to *count*) (:for c on *lst*)
(when (gethash c *v*) (:count 1))))
Evaluation took:
  0.233 seconds of real time
  31,216 bytes consed

1000000

Acceptable.
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate properties to list cells?)
Date: 
Message-ID: <REM-2008dec02-004@Yahoo.Com>
> From: budden <········@gmail.com>
> I disagree (regardless of anyone's opinion) that I shouldn't care
> efficiency from the very start.

I take a middle ground. If you're writing a one-shot script that
won't be processing a lot of data, then just write it the quickest
way you can think of and don't worry about computer efficiency.
Your personal efficiency writing the code is more important. An
hour saved writing the code is more important than the five seconds
of extra CPU time your script consumed. Even if you run that script
a hundred times, and thereby waste a total of 500 CPU seconds, over
the course of a year, that's still probably fine.

If you're going to be processing a medium amount of data, like one
million records on today's fast computers, then worry about order
of magnitude considerations, and see if you can find a built-in
utility that is reasonble, for example:
- Guaranteed log(n) for finding a key in a set, and then returning
   the value or deleting or inserting or changing
- Guaranteed n for iterating through a set
- Guaranteed n * log(n) for sorting a set, or for building a
   structure that will allow log(n) operations listed above
This is a trade-off between computer efficiency and personal efficiency.

If your program is apparently too slow, and it's going to be used a
lot, and you really need to speed it up, *then* start getting more
serious about efficiency. Do timing tests to see where the
bottlenecks are, and work hard tuning *only* those parts.
From: budden
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <12dff0f1-783d-4c8e-b854-1471f635f274@g38g2000yqn.googlegroups.com>
Hi!
> - Guaranteed log(n) for finding a key in a set, and then returning
>    the value or deleting or inserting or changing
> - Guaranteed n for iterating through a set
> - Guaranteed n * log(n) for sorting a set, or for building a
>    structure that will allow log(n) operations listed above
Hmm, we might also get some function (n) on each GC, and it depends on
many factors which I don't know yet.

If my hash-table is huge, but some parts of it are modified
frequently, would generational GC help it or it
will be stored in a "youngest" generation and rehashed on each GC?
This might slow down program significantly.
I'm interested in it not as in theory, but in a relation to existing
implementations (I have no means to build my own EQ hash table, as
sxhash can't hash cons cells efficiently and reliably).

And, also, lisp is sometimes advertised as *fast* language. Cl-ppcre
said to be faster than perl on regexps (and it is),
CMU CL is said to be faster than fortarn on numerical maths (didn't
try CMUCL, but hesitate that SBCL is so fast. E.g.
fixnum in SBCL 1.0.22 is limited by 1FFFFFFF, so it loses compared to
plain 32-bit C on 64-bit tasks, I don't know about FORTRAN).

I believed that lisp is fast from the start, but now I'm not so sure
it is true. See, e.g. http://shootout.alioth.debian.org/
Tasks are too simple to show real performance, but SBCL (designed to
be fast) aint perform so well even in regexps tasks.
Looks like JIT produces faster code than lisp's compiler. And even C,
where regexps are likely to be interpreted, is faster than cl-ppcre.
From: budden
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <bb98c7f3-598b-43bb-9996-be306f46be9b@3g2000yqs.googlegroups.com>
Hallo!
  Hmm, it seems I set very interesting question about hash tables, but
no replies.
  Repeating myself: if I have a big hash table which is mostly
constant, but some parts of it are changed frequently (consider OLAP
as an example there we have a huge client database which is mostly
constant). We have already found that
if objects are moved on GC (and as far as I know they _are_ moved in
most CL implementations), we pay some price on each rehash. In
generational GC, will this price be proportional to:
- total size of a table (we'll get poor efficiency)
- intensity of read/write access between GCs (it is ok)
  If it is proportional to total size of a table, I should avoid
hashing on conses, maybe should avoid eq hash or hash tables at all
and build tables using sxhash or hand-written type-specific
algorithms.
  It it is proportional to intensity of access, eq hash tables are
scalable.
  What can be said about it regarding the best existing CL
implementations?
From: Abdulaziz Ghuloum
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <1f11cad7-6e0f-4fa5-9650-2dd439d827fb@t2g2000yqm.googlegroups.com>
I have written a paper[1] on implementing eq-hash-tables such that:

1. No need to store extra info (hash key) in any data structure
   making it ideal for cons cells and other small data structures.

2. It is generation-friendly: *only* objects that actually *move*
   during a collection are rehashed, and only when lookup operation
   fail.  Large tables with old keys are not penalized by minor
   collections.

Ikarus Scheme and Chez Scheme employ this strategy.  I don't know
about CL implementations but last time I checked, what they did was
rehash the entire table when a lookup operation fails after a GC.

Aziz,,,

[1] Generation-Friendly Eq Hash Tables.
    Abdulaziz Ghuloum and R. Kent Dybvig.
    2007 Workshop on Scheme and
    Functional Programming
    Freiburg, Germany
    http://sfp2007.ift.ulaval.ca/procPaper3.pdf

[PS. I don't read comp.lang.lisp usually, so, please email me if
 you have any questions]
From: budden
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <ed3224d2-937c-4d15-8ee2-5e905f7252e2@f13g2000yqj.googlegroups.com>
Hallo, Aziz!
  If I get it right, it looks like you have at least a pointer and a
cons cell in a guardian per each key/value pair stored in a hash
table. So it is a memory vs time tradeoff.
  The advantage of your approach is that you do not have this cell for
every entity in an image.
  But if there are many hash tables in an image, the same key might be
stored in many guardians and we get unnecessary memory consumption. If
you store an object list per generation, In this case, storing
object's identity in the object itself seems to be better.
>    http://sfp2007.ift.ulaval.ca/procPaper3.pdf
  If an implementation had a generation number, we might store
generation number of the youngest object in a bucket and rehash only
the buckets which store young data. It would be optimal to do that
only on hash lookup failure. But I see sbcl do not have such an id.

gc.lisp:
(declaim (type cons *gc-epoch*))
(defvar *gc-epoch* (cons nil nil))

  Maybe we might add such a number (likely, a bignum) just for hash
tables. But then accessing hash table implies  comparing two bignums,
I'm not sure it is not slow.
From: Abdulaziz Ghuloum
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <4f4dede6-ca8f-4000-bb45-8f7c1ef45a81@41g2000yqf.googlegroups.com>
On Dec 5, 2:11 am, budden <········@gmail.com> wrote:
> Hallo, Aziz!
>   If I get it right, it looks like you have at least a pointer and a
> cons cell in a guardian per each key/value pair stored in a hash
> table. So it is a memory vs time tradeoff.

The intent of using the tconcs was not for performance vs. memory
tradeoff.  It was to eliminate the need to synchronize the mutator
with the collector.  The collector knows when keys move.  It does not
eagerly rehash them because the mutator might be operating on the
table when the collector kicks in (e.g., it might be collecting the
list of keys/values in the table, or it might be enlarging/shrinking
the table).  The tconc is used to only gather the list of keys that
actually moved: the collector adds them to the end, and the mutator
pops them from the front.  This requires no synchronization between
the two. (you can refer to the original Dybvig paper on guardians
for details of how that's done).

>   The advantage of your approach is that you do not have this cell
> for every entity in an image.

Correct.

>   But if there are many hash tables in an image, the same key might be
> stored in many guardians and we get unnecessary memory consumption.

Also correct.  However, you have to make a choice between adding an
extra field to all of your objects (thus doubling the size of pairs
and other small data structures) or consuming an extra pair for each
key that moves during GC.  If you want to optimize for the common
use of Scheme or CommonLisp, you better not double the size of every
pair in the system and let hash tables pay for themselves.  After
all, with chained hashing, you do require about 5 cells for each
key/value pair in a hash table, so, using an occasional extra pair
may not be the worst thing.

Aziz,,,
From: budden
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <4bc2ef86-8624-4f3a-8e7f-a10893385cdd@f13g2000yqj.googlegroups.com>
> It was to eliminate the need to synchronize the mutator with the collector.
Ok, that's fine.
> After all, with chained hashing, you do require about 5 cells for each
key/value pair in a hash table, so, using an occasional extra pair
may not be the worst thing.
Reasonable, I agree.
So, I conclude that (in principle, at least) there is a possibility to
store large eq hash tables without catastrophic time/memory penalties.
So, I might e.g. track a source locations of large code base with a
high degree of precision. This way I can use cons cells for storing
AST (which is the natural way to go in lisp) and annotate them via
hash tables, instead of creating "specialized conses" fot that. It is
rather important and fine.
Thanks for your comments.
From: Abdulaziz Ghuloum
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <0b2c9ae1-9674-4141-bc9c-2ca4a0e39a24@g38g2000yqd.googlegroups.com>
On Dec 5, 4:34 am, budden <········@gmail.com> wrote:

> So, I conclude that (in principle, at least) there is a possibility to
> store large eq hash tables without catastrophic time/memory penalties.
> So, I might e.g. track a source locations of large code base with a
> high degree of precision. This way I can use cons cells for storing
> AST (which is the natural way to go in lisp) and annotate them via
> hash tables, instead of creating "specialized conses" fot that. It is
> rather important and fine.

I don't know how big your sources are at any given time, but I tried
timing Ikarus on all Scheme source in my home's ikarus/scheme
directory
(77 files).  I timed how long it took to read all files, then how long
it took to insert all pairs in a hash table (taking into account the
possibility of having cyclic and shared data structures).  This is
what I have on my macbook:

running stats for (map read-file files):
    12 collections
    304 ms elapsed cpu time, including 58 ms collecting
    305 ms elapsed real time, including 59 ms collecting
    51311024 bytes allocated
running stats for (for-each hash sources):
    1 collection
    52 ms elapsed cpu time, including 9 ms collecting
    52 ms elapsed real time, including 9 ms collecting
    5268152 bytes allocated
77 files, 38937 lines, 198197 keys

It doesn't look catastrophic, but the size if not that big either.
(what's 40,000 lines or 200,000 pairs)

Aziz,,,
From: Abdulaziz Ghuloum
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <54a5878c-d8fc-446f-85b4-d460f0126b02@j39g2000yqn.googlegroups.com>
Maybe one meaningful benchmark here is what happens if we
perform 100 collections while the hash table is in memory,
and how that changes if we probe the table before each
collection.  (the probe is with a nonexistent key, which
causes rehashing of all keys that moved)


running stats for (do ((i 0 (+ i 1))) ((= i 100)) (collect)):
    100 collections
    148 ms elapsed cpu time, including 147 ms collecting

running stats for (do ((i 0 (+ i 1))) ((= i 100)) (hashtable-ref h
(cons 1 2) #f) (collect)):
    100 collections
    177 ms elapsed cpu time, including 157 ms collecting

The total rehashing overhead here is 20~30 ms for 100 collections.
(depending on how you count:
  total time - collection time = 20ms
  with probe - without probe   = 30ms)


We try the same experiment with 1000 collections now.

running stats for (do ((i 0 (+ i 1))) ((= i 1000)) (collect)):
    1000 collections
    408 ms elapsed cpu time, including 406 ms collecting

running stats for (do ((i 0 (+ i 1))) ((= i 1000)) (hashtable-ref h
(cons 1 2) #f) (collect)):
    1000 collections
    487 ms elapsed cpu time, including 437 ms collecting

The total rehashing overhead is now 50~80 ms.


So, for increasing the number of collections by 10x, there is
about 2.5x increase in collection time, and about 2.5x increase
in rehashing time.  (which is what "generation-friendly" means)

Enough of that for now.

Aziz,,,
From: budden
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <872b45c0-b51d-4717-a722-66a0f5ce7b68@t11g2000yqg.googlegroups.com>
I think it is rather fast.
From: Juho Snellman
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate  properties to list cells?)
Date: 
Message-ID: <87abbbvypf.fsf@vasara.proghammer.com>
Abdulaziz Ghuloum <········@gmail.com> writes:

> I have written a paper[1] on implementing eq-hash-tables such that:
> 
> 1. No need to store extra info (hash key) in any data structure
>    making it ideal for cons cells and other small data structures.
> 
> 2. It is generation-friendly: *only* objects that actually *move*
>    during a collection are rehashed, and only when lookup operation
>    fail.  Large tables with old keys are not penalized by minor
>    collections.
> 
> Ikarus Scheme and Chez Scheme employ this strategy.  I don't know
> about CL implementations but last time I checked, what they did was
> rehash the entire table when a lookup operation fails after a GC.

In CMUCL the GC eagerly rehashes only the keys that moved (but still
needs to do a full scan of the backing vector of the table to find
those objects).

SBCL used to do the same, but that code was scrapped a while ago,
since the extra syncronization costs from safely coordinating the GC
and the mutators was much larger than the gains from doing less
rehashing. Getting rid of the partial rehashes made it much easier to
optimize the rest of the code.

-- 
Juho Snellman
From: Abdulaziz Ghuloum
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <170c3bdc-9600-4751-8803-24cea88cdc20@h5g2000yqh.googlegroups.com>
On Dec 5, 3:00 am, Juho Snellman <······@iki.fi> wrote:

> SBCL used to do the same, but that code was scrapped a while ago,
> since the extra syncronization costs from safely coordinating the GC
> and the mutators was much larger than the gains from doing less
> rehashing.

I don't understand how these are the same.  In our approach, there is
no synchronization whatsoever between the collector and the mutator
because the collector does not rehash (or reorganize, or mess up) the
table.  It's the mutator that owns the hash table.  The collector only
tells what keys have moved.

Aziz,,,
From: Juho Snellman
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate  properties to list cells?)
Date: 
Message-ID: <8763lzvu80.fsf@vasara.proghammer.com>
Abdulaziz Ghuloum <········@gmail.com> writes:
> On Dec 5, 3:00�am, Juho Snellman <······@iki.fi> wrote:
> > SBCL used to do the same, but that code was scrapped a while ago,
> > since the extra syncronization costs from safely coordinating the GC
> > and the mutators was much larger than the gains from doing less
> > rehashing.
> 
> I don't understand how these are the same.  In our approach, there is
> no synchronization whatsoever between the collector and the mutator
> because the collector does not rehash (or reorganize, or mess up) the
> table.  It's the mutator that owns the hash table.  The collector only
> tells what keys have moved.

The removed code that I was referring to was the code inherited from
CMUCL, which did eager partial rehashing. So the GC would actually be
mutating the hash table (in a rather clever way) to rehash the keys
that moved after scavenging the hash table.

I wasn't trying to imply that this was the same technique you're
using, just pointing out that some CL implementations have done
something more complicated than full rehashes on failing lookups.

-- 
Juho Snellman
From: Abdulaziz Ghuloum
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate 	properties to list cells?)
Date: 
Message-ID: <cc5c2736-9f3c-4e17-a4a9-e045aea4de84@j32g2000yqn.googlegroups.com>
On Dec 5, 4:37 am, Juho Snellman <······@iki.fi> wrote:

> I wasn't trying to imply that this was the same technique you're
> using, just pointing out that some CL implementations have done
> something more complicated than full rehashes on failing lookups.

Understood.  Thanks for the clarification.

Aziz,,,
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: EQ hashtables (was: lisp newbie question: how can I associate properties to list cells?)
Date: 
Message-ID: <REM-2008dec02-003@Yahoo.Com>
> > As for how it *might* be implemented, here are some ideas I have:
> > Never move anything. Use the Mad Hatter's Tea Party algorithm: ...
> From: Kaz Kylheku <········@gmail.com>
> Note that virtual memory is not exactly free. It requires virtual
> address translation, whose efficiency depends on locality of
> reference: keeping everything clumped together.

Indeed. Still there may be some mid-sized algorithms where the MHTP
algorithm will allow you to avoid needing GC while you use a lot
more VM than physical RAM. But as you pointed out if your application
gets *really* huge, scattered with just a few bytes used on
each swapping page, to where swapping is very inefficient you may
at that point need to really GC then.

Thanks for the info, <SNL>Debbie Downer</SNL> ^H^H...^H^H Kaz Killjoy (just kidding)
From: Pascal Costanza
Subject: Re: lisp newbie question: how can I associate properties to list    cells?
Date: 
Message-ID: <6pfseeF7nmk3U1@mid.individual.net>
budden wrote:
> I was wrong. Sorry.
> 
> Some time ago I also was interested with associating properties to
> cons cells. I have read cltl2, not the new standard and I was sure
> sxhash is used in a hash tables. Well, it is fine that I was wrong.
> 
> But I still can't imagine a really efficient way to do so. We either
> need to store cons cell identity in the cons cell itself (consumes
> memory), or we need to do some rehashing if a cons cell have recently
> moved with a GC (consumes a time), or we should not move objects at
> all (consumes a memory).

...or we should only worry if and when we find out that this is an 
actual bottleneck in our program...


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
From: Kaz Kylheku
Subject: Re: lisp newbie question: how can I associate properties to list  cells?
Date: 
Message-ID: <20081216074831.211@gmail.com>
On 2008-11-30, budden <········@gmail.com> wrote:
> I was wrong. Sorry.
>
> Some time ago I also was interested with associating properties to
> cons cells. I have read cltl2, not the new standard and I was sure
> sxhash is used in a hash tables. Well, it is fine that I was wrong.
>
> But I still can't imagine a really efficient way to do so. We either
> need to store cons cell identity in the cons cell itself (consumes
> memory), or we need to do some rehashing if a cons cell have recently
> moved with a GC (consumes a time), or we should not move objects at
> all (consumes a memory).

A common implementation strategy for conses is to put them into a dedicated
heap that contains only conses. Since the conses are all of the same size, that
heap doesn't fragment. Since it doesn't fragment, it doesn't have to be
compacted, and so conses can stay where they are.

You are right, though; this is a general problem. If you compact heaps, how do
you maintain object identity? It's not just heap compaction. There is also the
possibility of CLOS objects changing type dynamically; they can acquire and
lose slots. If an object changes type in such a way that it's now too large to
fit into its original space, it has to be reloated.

The possibilities you have outlined are not exhaustive. Your idea of putting an
ID into the object is one possibility.  Another way is to have handles. Under
handles, an object's ID isn't an absolute address, but a displacement into an
array of pointers.  The disadvantage is the double dereference to access an
object: fetch the real pointer from the table, then go to the object.  This can
be dealt with by some aggressive optimization: a thread can fetch the real
addresses of objects and keep them cached in registers. (Some such caching will
necessarily happen even in unoptimized access sequences, and so the GC has to
deal with it anyway).

An advantage of the handle approach is that it provides reliable ID recycling.
Two different objects have a different position in the table and hence
different ID. When an object becomes garbage, its ID (position in the table)
can be easily reused for another object.  If ID's are stored in
objects, you need some algorithm to generate new ID's which don't clash with
any existing ID's (e.g. a data structure holding dynamic, compacted set of
intervals over the integer domain).