Loop over keys and values of a hashtable

From: Gisle SÃ¦lensminde
Subject: Loop over keys and values of a hashtable
Date: Fri, 17 Jun 2005 11:19:47 +0000
Message-ID: <slrndb5cej.v7e.gisle@kaktus.ii.uib.no>

I'm trying to make a loop that runs over all pairs of keys/values of a
hashtable. My question is - will the code below always give the key/value
pairs that belong to each other. In some implementations it does, but 
I'm asking whether it is reasonable to expect the keys/values to be paird.
My gut feeling is that it's not.

For example the following code converting a hashtable to an assoc-list:

(loop for key being the hash-keys of *hash*
      for val being the hash-values of *hash* collect (cons key val))


-- 
Gisle S�lensminde, Phd student, Scientific programmer
Computational biology unit, University of Bergen, Norway
Email: gisle at cbu.uib.no | Complicated is easy, simple is hard.

Re: Loop over keys and values of a hashtable ···············@yahoo.com
- Re: Loop over keys and values of a hashtable Peter Lewerin
Re: Loop over keys and values of a hashtable Edi Weitz
- Re: Loop over keys and values of a hashtable Kent M Pitman
  - Re: Loop over keys and values of a hashtable David
    - Re: Loop over keys and values of a hashtable Cameron MacKinnon
      - Re: Loop over keys and values of a hashtable Tayssir John Gabbour
      - Re: Loop over keys and values of a hashtable Greg Menke
    - Re: Loop over keys and values of a hashtable Marcin 'Qrczak' Kowalczyk
    - Re: Loop over keys and values of a hashtable Kent M Pitman
      - Re: Loop over keys and values of a hashtable Robert Maas, see http://tinyurl.com/uh3t
        Re: Loop over keys and values of a hashtable Kent M Pitman
        Re: Loop over keys and values of a hashtable Kirk Job Sluder
        Re: Loop over keys and values of a hashtable Robert Maas, see http://tinyurl.com/uh3t
      - Re: Loop over keys and values of a hashtable Kirk Job Sluder
Re: Loop over keys and values of a hashtable ·············@oktetlabs.ru

From: ···············@yahoo.com
Subject: Re: Loop over keys and values of a hashtable
Date: Fri, 17 Jun 2005 15:49:34 +0000
Message-ID: <1119023374.314815.206700@g14g2000cwa.googlegroups.com>

There is also with-hash-table-iterator.

From: Peter Lewerin
Subject: Re: Loop over keys and values of a hashtable
Date: Sun, 19 Jun 2005 10:37:03 +0000
Message-ID: <b72f3640.0506190237.79d27cd3@posting.google.com>

···············@yahoo.com wrote in message news:<························@g14g2000cwa.googlegroups.com>...
> There is also with-hash-table-iterator.

And, for pedantic completeness in case newbies are reading, MAPHASH:

   (let (alist)
     (maphash #'(lambda (key value)
                  (push (cons key value) alist)) *table*)
     (reverse alist))

From: Edi Weitz
Subject: Re: Loop over keys and values of a hashtable
Date: Fri, 17 Jun 2005 11:23:06 +0000
Message-ID: <u8y19z0px.fsf@agharta.de>

On 17 Jun 2005 11:19:47 GMT, Gisle Sælensminde <·····@nowhere.uib.no> wrote:

> I'm trying to make a loop that runs over all pairs of keys/values of
> a hashtable. My question is - will the code below always give the
> key/value pairs that belong to each other. In some implementations
> it does, but I'm asking whether it is reasonable to expect the
> keys/values to be paird.  My gut feeling is that it's not.
>
> For example the following code converting a hashtable to an assoc-list:
>
> (loop for key being the hash-keys of *hash*
>       for val being the hash-values of *hash* collect (cons key val))

I think there's nothing in the standard that says the above code
should do what you expect.  Instead, use somthing like this:

  (loop for key being the hash-keys of *hash*
        using (hash-value val)
        collect (cons key val))

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")

From: Kent M Pitman
Subject: Re: Loop over keys and values of a hashtable
Date: Fri, 17 Jun 2005 15:58:09 +0000
Message-ID: <ull59f012.fsf@nhplace.com>

Edi Weitz <········@agharta.de> writes:

> On 17 Jun 2005 11:19:47 GMT, Gisle Sælensminde <·····@nowhere.uib.no> wrote:
> 
> > I'm trying to make a loop that runs over all pairs of keys/values of
> > a hashtable. My question is - will the code below always give the
> > key/value pairs that belong to each other. In some implementations
> > it does, but I'm asking whether it is reasonable to expect the
> > keys/values to be paird.  My gut feeling is that it's not.
> >
> > For example the following code converting a hashtable to an assoc-list:
> >
> > (loop for key being the hash-keys of *hash*
> >       for val being the hash-values of *hash* collect (cons key val))
> 
> I think there's nothing in the standard that says the above code
> should do what you expect.  Instead, use somthing like this:
> 
>   (loop for key being the hash-keys of *hash*
>         using (hash-value val)
>         collect (cons key val))

While I agree that this is so, I also think it's reasonable to contact
each vendor and assert that you think these two things should move in
lockstep and ask why they do not.  If they don't have a clear reason,
I'd ask them to change it.  If they do have a good reason, post it
back here so we'll all be informed.  I think the language should do
reasonable things where possible, whether specified or not, and that
it's possible for vendors to migrate toward convergence
notwithstanding the absence of prodding by a standard.

From: David
Subject: Re: Loop over keys and values of a hashtable
Date: Sat, 18 Jun 2005 14:58:39 +0000
Message-ID: <zEWse.22$y57.2@fe1.news.blueyonder.co.uk>

Kent M Pitman wrote:
> Edi Weitz <········@agharta.de> writes:
> 
> 
>>On 17 Jun 2005 11:19:47 GMT, Gisle Sælensminde <·····@nowhere.uib.no> wrote:
>>
>>
>>>I'm trying to make a loop that runs over all pairs of keys/values of
>>>a hashtable. My question is - will the code below always give the
>>>key/value pairs that belong to each other. In some implementations
>>>it does, but I'm asking whether it is reasonable to expect the
>>>keys/values to be paird.  My gut feeling is that it's not.
>>>
>>>For example the following code converting a hashtable to an assoc-list:
>>>
>>>(loop for key being the hash-keys of *hash*
>>>      for val being the hash-values of *hash* collect (cons key val))
>>
>>I think there's nothing in the standard that says the above code
>>should do what you expect.  Instead, use somthing like this:
>>
>>  (loop for key being the hash-keys of *hash*
>>        using (hash-value val)
>>        collect (cons key val))
> 
> 
> While I agree that this is so, I also think it's reasonable to contact
> each vendor and assert that you think these two things should move in
> lockstep and ask why they do not.  If they don't have a clear reason,
> I'd ask them to change it.  If they do have a good reason, post it
> back here so we'll all be informed.  I think the language should do
> reasonable things where possible, whether specified or not, and that
> it's possible for vendors to migrate toward convergence
> notwithstanding the absence of prodding by a standard.

I disagree :)

I don't think it _is_ reasonable to expect those things to move in 
lockstep. The thing is, hash tables are by nature unordered, so no code 
should ever expect them to come out in any particular order - even if 
you iterate over the same hash table you shouldn't expect to do so in 
the same order. The problem with a vendor trying to make the language do 
reasonable things is that people might then write code which relies on 
those 'reasonable' things. This code could then fail in subtle, hard to 
debug, ways if you move to another implementation. For this reason I 
think it would almost be more helpful for the vendor to try to _prevent_ 
this loop running in lockstep.

I agree that languages should generally try to do reasonable things 
where possible, but I don't think it is helpful in this case.

I've noticed in perl that even if you have a hash containing a 
particular set of key value pairs it won't always be looped over in the 
same order for different runs of the program. Maybe it depends on what 
other hashes there are around at the time?

From: Cameron MacKinnon
Subject: Re: Loop over keys and values of a hashtable
Date: Sat, 18 Jun 2005 15:39:22 +0000
Message-ID: <0Y2dnfzskt223SnfRVn-qg@rogers.com>

David wrote:
> Kent M Pitman wrote:
> 
>> Edi Weitz <········@agharta.de> writes:
>> 
>>> <·····@nowhere.uib.no> wrote:
>>> 
>>>> (loop for key being the hash-keys of *hash* for val being the
>>>> hash-values of *hash* collect (cons key val))
>>> 
>>> I think there's nothing in the standard that says the above code 
>>> should do what you expect.
>>> 
>> While I agree that this is so, I also think it's reasonable to
>> contact each vendor and assert that you think these two things
>> should move in lockstep and ask why they do not.

> I disagree :)
> 
> I don't think it _is_ reasonable to expect those things to move in 
> lockstep. The thing is, hash tables are by nature unordered, so no
> code should ever expect them to come out in any particular order -
> even if you iterate over the same hash table you shouldn't expect to
> do so in the same order. The problem with a vendor trying to make the
> language do reasonable things is that people might then write code
> which relies on those 'reasonable' things. This code could then fail
> in subtle, hard to debug, ways if you move to another implementation.
> For this reason I think it would almost be more helpful for the
> vendor to try to _prevent_ this loop running in lockstep.

Just out of curiosity, what is your mental model for how Lisp or Perl
iterate over the entries in a hash table? You say that "hash tables are
by nature unordered", which is a good description of how users are
supposed to think about hash tables, but it does precious little good to
an implementor. Do you suppose that an implementation's iterator loops
over all possible keys and checks to see whether the table has an entry
for each? Given your definition, it would have little choice. After all,
there's no first element and no "next element" operation.

In real life, hash tables are "by nature" ordered, because they occupy
memory, which is ordered. The discussion, so far, has been about whether
  a user can assume that an implementor isn't brain-dead. Can you
conceive of an implementation where the above hash-keys and hash-values
come out uncorrelated and which is just as fast as the obvious
correlated implementation?

> I agree that languages should generally try to do reasonable things 
> where possible, but I don't think it is helpful in this case.

What you're arguing is that user programs should bear the burden of a
performance hit (a hash and lookup for each table entry) because the
language doesn't have a "loop over the hash table giving me the
key-value pair" operation and the spec doesn't say that the OP's code
must work the way it ought. That's fine for people who aren't concerned
about performance... but those people wouldn't be using hash tables in
the first place, would they?

I think the OP's code should be added to Paul Dietz's test suite (if
Paul considers it reasonable). Failure of the test case wouldn't
indicate non-conformance, but it sure would raise questions in users' mnds.

-- 
Cameron MacKinnon
Toronto, Canada

From: Tayssir John Gabbour
Subject: Re: Loop over keys and values of a hashtable
Date: Sat, 18 Jun 2005 17:39:53 +0000
Message-ID: <1119116393.829093.300930@g43g2000cwa.googlegroups.com>

Cameron MacKinnon wrote:
> David wrote:
> > Kent M Pitman wrote:
> >> Edi Weitz <········@agharta.de> writes:
> >>> <·····@nowhere.uib.no> wrote:
> >>>> (loop for key being the hash-keys of *hash* for val being the
> >>>> hash-values of *hash* collect (cons key val))
> >>>
> >>> I think there's nothing in the standard that says the above code
> >>> should do what you expect.
> >>>
> >> While I agree that this is so, I also think it's reasonable to
> >> contact each vendor and assert that you think these two things
> >> should move in lockstep and ask why they do not.
>
> > I agree that languages should generally try to do reasonable things
> > where possible, but I don't think it is helpful in this case.
>
> What you're arguing is that user programs should bear the burden of a
> performance hit (a hash and lookup for each table entry) because the
> language doesn't have a "loop over the hash table giving me the
> key-value pair" operation and the spec doesn't say that the OP's code
> must work the way it ought. That's fine for people who aren't concerned
> about performance... but those people wouldn't be using hash tables in
> the first place, would they?

I don't quite understand a point you made... loop already does this:
;; #1
(loop for key being the hash-keys of *hash* using (hash-value val) ...)

instead of the original poster's:
;; #2
(loop for key being the hash-keys of *hash*
      for val being the hash-values of *hash* ...)

The issue here is that #2 isn't guaranteed to act like #1, and some
argue it probably should. But #1 already exists and is supported.


Tayssir

From: Greg Menke
Subject: Re: Loop over keys and values of a hashtable
Date: Sat, 18 Jun 2005 19:23:12 +0000
Message-ID: <m3y89779lr.fsf@athena.pienet>

Cameron MacKinnon <··········@clearspot.net> writes:

> David wrote:
> > Kent M Pitman wrote:
> >
> >> While I agree that this is so, I also think it's reasonable to
> >> contact each vendor and assert that you think these two things
> >> should move in lockstep and ask why they do not.
> 
> > I disagree :)
> > I don't think it _is_ reasonable to expect those things to move in
> > lockstep. The thing is, hash tables are by nature unordered, so no
> > code should ever expect them to come out in any particular order -
> > even if you iterate over the same hash table you shouldn't expect to
> > do so in the same order. The problem with a vendor trying to make the
> > language do reasonable things is that people might then write code
> > which relies on those 'reasonable' things. This code could then fail
> > in subtle, hard to debug, ways if you move to another implementation.
> > For this reason I think it would almost be more helpful for the
> > vendor to try to _prevent_ this loop running in lockstep.
> 
> Just out of curiosity, what is your mental model for how Lisp or Perl
> iterate over the entries in a hash table? You say that "hash tables are
> by nature unordered", which is a good description of how users are
> supposed to think about hash tables, but it does precious little good to
> an implementor. Do you suppose that an implementation's iterator loops
> over all possible keys and checks to see whether the table has an entry
> for each? Given your definition, it would have little choice. After all,
> there's no first element and no "next element" operation.

What he's saying is there is no <specific> order.  An implementation is
free to return items in an arbitrary and possibly changing order from
pass to pass.  The only thing the user may expect is they will
eventually iterate over all the items.  Observations like that do give
an implementor well defined scope.

> In real life, hash tables are "by nature" ordered, because they occupy
> memory, which is ordered. The discussion, so far, has been about whether
>   a user can assume that an implementor isn't brain-dead. Can you
> conceive of an implementation where the above hash-keys and hash-values
> come out uncorrelated and which is just as fast as the obvious
> correlated implementation?

The fact that they're in memory does give the items a sequence in the
abstract, but not necessarily a sequence relevant to the hash algorithm
which put them there in the first place.  Not to mention the items and
hash table overhead could well be dispersed throughout the image's
memory and not reasonably addressable outside the hash algorithm's
indices.

I think the problem is the uncorrelated nature of the two loop terms, if
a new loop term was added to specifically return the key and value like
maphash does then I think concerns about correlation would be addressed
more cleanly.

> > I agree that languages should generally try to do reasonable things
> > where possible, but I don't think it is helpful in this case.
> 
> What you're arguing is that user programs should bear the burden of a
> performance hit (a hash and lookup for each table entry) because the
> language doesn't have a "loop over the hash table giving me the
> key-value pair" operation and the spec doesn't say that the OP's code
> must work the way it ought. That's fine for people who aren't concerned
> about performance... but those people wouldn't be using hash tables in
> the first place, would they?

If the user is concerned about performance they are probably not going
to be iterating over a hash table, returning keys and values at the same
time..  If the user really wants to do that kind of thing theres always
maphash.  And since we're talking about Common Lisp and not C/C++, the
user is also free to tweak the LOOP macro as necessary.

> I think the OP's code should be added to Paul Dietz's test suite (if
> Paul considers it reasonable). Failure of the test case wouldn't
> indicate non-conformance, but it sure would raise questions in users' mnds.

Its definitely a good thing to know, loop is so handy it sometimes gives
a false sense of security.

Gregm

From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Loop over keys and values of a hashtable
Date: Sat, 18 Jun 2005 15:43:14 +0000
Message-ID: <87hdfv1xil.fsf@qrnik.zagroda>

David <······@SPAMAROONEY.blueyonder.co.uk> writes:

> I've noticed in perl that even if you have a hash containing a
> particular set of key value pairs it won't always be looped over in
> the same order for different runs of the program.

This is to avoid feeding Perl scripts with maliciously constructed
data which triggers a pathological amount of hash collisions.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From: Kent M Pitman
Subject: Re: Loop over keys and values of a hashtable
Date: Sat, 18 Jun 2005 18:45:48 +0000
Message-ID: <ubr634i51.fsf@nhplace.com>

David <······@SPAMAROONEY.blueyonder.co.uk> writes:

> I don't think it _is_ reasonable to expect those things to move in
> lockstep. The thing is, hash tables are by nature unordered, so no
> code should ever expect them to come out in any particular order -
> even if you iterate over the same hash table you shouldn't expect to
> do so in the same order.

You're not relying on a particular order here.  Relying on the fact of an
order is not relying on the fact of a particular order, any more than
 (length (list (random)))
or
 (integerp (random)) 
is relying on something about your random number generator. A number is 
a number.

You're relying merely on synchronized order of access, given that they
have to be in SOME order.  Hash tables have no SPECIFIED ordering, but
that doesn't mean that accesses to a hash table are nondeterministic
in some literal sense.

I was having a hard time imagining an implementation of a hash table in 
which the keys were not associated with the values such that if you found
one, you wouldn't have both.

If you think this would be hard to implement in some implementation of hash
table, I'd believe you... but I'd also want you to show me the implementation
so I could learn what I'm not clicking on that makes it thus hard.  I'm not
saying I'm infallible or all-knowing, in other words, just that I don't 
believe in ghost stories.  I want to talk about real situations.

> The problem with a vendor trying to make the
> language do reasonable things is that people might then write code
> which relies on those 'reasonable' things.

This is how the entire system you now use came to pass.  Lisp evolves and
this is precisely how.  You don't rely on it doing reasonable things, you
ask vendors to promise to do things because those things are reasonable
and then you rely on vendor promises.  Vendor promises are as good as 
a standard.  In fact, better, since the standard cannot promise a single
implementation.  A vendor can promise at least one.  (Again I have to add
that by vendor I include free software makers--i.e., any implementor 
committed to his/her product).

> This code could then fail in subtle, hard to debug, ways if you move
> to another implementation.

Only if you run it in some implementation that has made no promise.
No one is asking you to rely on faith.  I don't know where you got that. 
I suggested, rather, that people contact vendors and ask them to tighten
up their promises so that in the future it might become part of a layered
or revised standard.

Standards are made up from current practice.  No change current practice 
means the next standard looks like the previous.

> For this reason I think it would almost be more helpful for the
> vendor to try to _prevent_ this loop running in lockstep.

Nonsense.  This would be true in a community full of non-evolution and
wishful thinking.  But it's not how Lisp as a community has worked.

> I agree that languages should generally try to do reasonable things
> where possible, but I don't think it is helpful in this case.

Well, we disagree.  My opinion per se, of course, carries no more
weight than yours.  But I personally hope my arguments will catch
hold, since they at least have the possibility of moving things forward
and fixing small things without waiting for a large thing, which I
somewhat hope is not soon in coming.  (I don't think it's yet the right
time for another big standard.)

> I've noticed in perl that even if you have a hash containing a
> particular set of key value pairs it won't always be looped over in
> the same order for different runs of the program.

Nor have I suggested that here.

> Maybe it depends on
> what other hashes there are around at the time?

It can depend on a lot of things.  Lisp can change the order between
iterations.  No one here is talking about that, as far as I understand.

From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: Loop over keys and values of a hashtable
Date: Sun, 19 Jun 2005 03:01:13 +0000
Message-ID: <REM-2005jun18-004@Yahoo.Com>

> From: Kent M Pitman <······@nhplace.com>
> Relying on the fact of an order is not relying on the fact of a
> particular order

I agree.

> I was having a hard time imagining an implementation of a hash table
> in which the keys were not associated with the values such that if
> you found one, you wouldn't have both.

If you mean that literally, that all hashtables work both ways, from
keys to values and from values to keys, then the only reasonable
implementation would be to store keys and values in separate hash
tables with links from each entry in each to the corresponding entry in
the other. With such dual hashing, there's no reason whatsoever to
expect that mapping down the key table and in parallel with that
process separately mapping down the value table would produce anything
remotely resembling the usual key<->value correspondence.

Now the current spec doesn't say what you said, it specifies mapping in
just one direction, lookup key and map to value, right? Still some
implementation might in fact provide mapping in both directions as an
enhanced value above what's required by the spec, and the LOOP macro
term to map down values might be optimized to directly use the value
hashtable instead of use the key hashtable and lookup value therein.
So the OP's code would be totally broken on such an implementation.

> Lisp can change the order between iterations.

And to avoid a deadlock of some interation just sits there, rehashing
should be possible while still in the middle of an iteration, so long
as the already-started iteration can continue using its old obsolete
copy of the hash table, right? In that case, the use of the LOOP macro
given by the OP could start an iteration for the keys, then a GC could
occur and notice the hash table is nearly full so re-hash it, then the
iteration on the values could be started on the re-hashed table instead
of the original, so the parallel iterations of keys and values in the
LOOP macro would be running down non-matching sequences. Kent, would
that be allowed by the spec, IYO?

To the OP, if you want to avoid the hit on looking up each key twice,
once by simple iteration to retrieve the keys in sequence, and once by
actual computing hash to lookup said key and retrieve value, then you
can do this simple trick: When you originally store the values in the
table, instead store (key . value) pairs. When you want the value, take
CDR of what hash lookup gives you, very small extra cost. When you want
to map down keys and values in parallel, map down only values, and take
car and cdr of each item returned to get guaranteed corresponding key
and value. Only you (or better, a profiler) know(s) whether you do
parallel key/value association often enough to pay back the cost you
spent always needing to take extra cdr when you only want value.

From: Kent M Pitman
Subject: Re: Loop over keys and values of a hashtable
Date: Sun, 19 Jun 2005 04:16:47 +0000
Message-ID: <uekazq8sh.fsf@nhplace.com>

·······@Yahoo.Com (Robert Maas, see http://tinyurl.com/uh3t) writes:

> > From: Kent M Pitman <······@nhplace.com>
> > Relying on the fact of an order is not relying on the fact of a
> > particular order
> 
> I agree.
> 
> > I was having a hard time imagining an implementation of a hash table
> > in which the keys were not associated with the values such that if
> > you found one, you wouldn't have both.
> 
> If you mean that literally, that all hashtables work both ways, from
> keys to values and from values to keys,

No, I was saying nothing about reverse mappings from values to keys.

I'm saying that I don't see any reason you can't just walk either 
the keys or the key/value pairs, depending on how it's laid out, and
get the values from there in either case.  Hash tables want to be 
implemented such that they're fast, so it can't be THAT slow to go from
keys to values.  Hence, you can generate all the values by calling some
internal mapping function from the locative of the key you're at anywhere
along the way to the value, just the same as if you'd already done the hash
key part and the bucket search and gotten to the key locative that way.

> To the OP, if you want to avoid the hit on looking up each key twice,
> once by simple iteration to retrieve the keys in sequence, and once by
> actual computing hash to lookup said key and retrieve value, then you
> can do this simple trick: ...

I would think a full user-level gethash should be needed.  This all happens
internally, so implementations that take advantage of knowledge of the 
detailed representation are allowable.

From: Kirk Job Sluder
Subject: Re: Loop over keys and values of a hashtable
Date: Sun, 19 Jun 2005 08:55:21 +0000
Message-ID: <8764wa913b.fsf@debian.kirkjobsluder.is-a-geek.net>

Kent M Pitman <······@nhplace.com> writes:

> ·······@Yahoo.Com (Robert Maas, see http://tinyurl.com/uh3t) writes:
> I'm saying that I don't see any reason you can't just walk either 
> the keys or the key/value pairs, depending on how it's laid out, and
> get the values from there in either case.  Hash tables want to be 
> implemented such that they're fast, so it can't be THAT slow to go from
> keys to values.  Hence, you can generate all the values by calling some
> internal mapping function from the locative of the key you're at anywhere
> along the way to the value, just the same as if you'd already done the hash
> key part and the bucket search and gotten to the key locative that way.

Certainly.  I don't think you can do it using a two-clause loop like:
(loop for x being the hash-keys in hash
      for y being the hash-values in hash)

Each time you add a new clause to a loop statement, you create a new
sandbox.  The hash-keys and hash-values statements have no idea that the
other exists.  They both know that x and y exist, but not that they are
accessing the same hash table in sequence.

I suspect that some implementations of (loop for x being the hash-keys
using (hash-value y)) may use the walk method as opposed to the lookup
method.  However, from what I can tell, gethash is pretty trivial in
terms of performance, so there does not seem to be that much to be
gained from walking vs. hashing.  
-- 
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust."  --Scary Go Round

From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: Loop over keys and values of a hashtable
Date: Sun, 26 Jun 2005 06:17:34 +0000
Message-ID: <REM-2005jun25-018@Yahoo.Com>

> From: Kent M Pitman <······@nhplace.com>
> I don't see any reason you can't just walk either the keys or the
> key/value pairs, depending on how it's laid out, and get the values
> from there in either case.

I agree, but that's not what the OP was trying to do, so your remark is
irrelevant to his question about legitimacy of what he was trying.

There's nothing in the spec that says you can run multiple iterators
through a collection in parallel and expect that they will remain in
sync indefinitely (until they reach the end of the collection).
Relaying on such undocumented and unexpected behaviour is not good,
especially if you'd like your code to be portable, but even if not.
(Same applies to lisp or java or any other language with iterators.)

From: Kirk Job Sluder
Subject: Re: Loop over keys and values of a hashtable
Date: Sun, 19 Jun 2005 08:30:37 +0000
Message-ID: <87br62928i.fsf@debian.kirkjobsluder.is-a-geek.net>

Kent M Pitman <······@nhplace.com> writes:
> You're relying merely on synchronized order of access, given that they
> have to be in SOME order.  Hash tables have no SPECIFIED ordering, but
> that doesn't mean that accesses to a hash table are nondeterministic
> in some literal sense.
> 
> I was having a hard time imagining an implementation of a hash table in 
> which the keys were not associated with the values such that if you found
> one, you wouldn't have both.

Spending a bit too much time than I should digging into this, I think
you have a bigger problem than just "how do I get the key-value pairs
out of the hash table?"

To start with, iterating over a hash table is expensive. This is one of
the tradeoffs we have to deal with.  Searching an index is easy, listing
the entire library is hard.  So you really want to avoid hash-iteration
if you can.

So here are our contestants:

One clause:
(loop for i being the hash-keys of foo using (hash-value j))

Two clauses: 
(loop 
        for i being the hash-keys of foo ;;clause A
        for j being the hash-values of foo));;clause B

The specs for loop define each clause within a loop macro as independent
of the other clauses, able to access only the values explicitly set in
the lexical scope. B has no idea what A is doing outside of the fact
that "key" was set.  In general, this is a good thing.

Using macroexpand (at least in cmucl, sbcl, and clisp) reveals that the
two-clause form requires two hash-iterators.  In other words, you are
searching for every key-value pair twice.  The one-clause form gets the
key-value pairs in one shot.  Also, the hash-keys/hash-values forms
appear to be just syntactic sugar.  The two-clause form may or may not
work, but probably not for the reason you are proposing in this discussion.

The benchmarks (appended) reveal that you get about the performance hit
that you would expect.  The two-clause form takes about 190 CPU cycles
per iteration.  The one-clause form takes about 120.  The manual form
using an explicit (gethash ...) appears to be about the same.

Now personally, being a python person, I'd like to see a less ugly hash
iteration form like (for x,y in hash-pairs of foo).  But I don't think you
can get there by making an exception for the two-clause form, at least
not without breaking other aspects of how loop works.  

These were done using CMUCL and slime on an EPIA 900MHz running Linux BTW.

CL-USER> (loop for i from 1 to 1000000
	       do (setf (gethash i foo) i))
NIL
CL-USER> foo
#<EQL hash table, 1000000 entries {58E1073D}>
CL-USER> (time (loop for i being the hash-keys of foo
		     for j being the hash-values of foo
		     )
	       )
; Compiling LAMBDA NIL: 
; Compiling Top-Level Form: 

; Evaluation took:
;   0.2 seconds of real time
;   0.2 seconds of user run time
;   0.0 seconds of system run time
;   188,127,313 CPU cycles
;   0 page faults and
;   0 bytes consed.
; 
NIL
CL-USER> (time (loop for i being the hash-keys of foo using (hash-value j)))
; Compiling LAMBDA NIL: 
; Compiling Top-Level Form: 

; Evaluation took:
;   0.12 seconds of real time
;   0.12 seconds of user run time
;   0.0 seconds of system run time
;   115,272,923 CPU cycles
;   0 page faults and
;   0 bytes consed.
; 
NIL
CL-USER> 

the "using" clause gives you about the same performance hit as looking
up the value using the key manually:

CL-USER> (time (loop for i being the hash-keys of foo
		     for j = (gethash i foo)))
; Compiling LAMBDA NIL: 
; Compiling Top-Level Form: 

; Evaluation took:
;   0.13 seconds of real time
;   0.13 seconds of user run time
;   0.0 seconds of system run time
;   116,602,311 CPU cycles
;   0 page faults and
;   0 bytes consed.
; 
NIL
CL-USER> 

-- 
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust."  --Scary Go Round

From: ·············@oktetlabs.ru
Subject: Re: Loop over keys and values of a hashtable
Date: Fri, 17 Jun 2005 11:32:17 +0000
Message-ID: <87oea5i5ha.fsf@artem.iling.nw.ru>

Gisle Sælensminde <·····@nowhere.uib.no> writes:

> I'm trying to make a loop that runs over all pairs of keys/values of a
> hashtable. My question is - will the code below always give the key/value
> pairs that belong to each other. In some implementations it does, but 
> I'm asking whether it is reasonable to expect the keys/values to be paird.
> My gut feeling is that it's not.
> 
> For example the following code converting a hashtable to an assoc-list:
> 
> (loop for key being the hash-keys of *hash*
>       for val being the hash-values of *hash* collect (cons key val))

You should use:

(loop for key being the hash-keys 4 using (hash-value val) 
      collect (cons key val))

-- 
Artem V. Andreev
Institute of Linguistic studies, St.Petersburg, Russia