From: K. Ari Krupnikov
Subject: reject duplicates in a hashtable?
Date: 
Message-ID: <86zn29u6zf.fsf@localhost.localdomain>
I need to signal an error (or a warning maybe) if a value is inserted
into a hashtable with a key that already exists in that table. As far
as I can tell, no such facility is provided in CL. Am I correct?

What I did was write a function to check this. Can you guys tell me if
I'm thinking in the right direction? Is there a clearer way of doing
this?

(defun unique-sethash (key value ht)
  (multiple-value-bind (old-value was-found) (gethash key ht)
    (if was-found
      (error "duplicate key ~A" key)
      (setf (gethash key ht) value))))


Also, I ended up calling gethash twice in this function (if no error
is reported). How do I avoid that? old-value has the value of the what
was in the table, not the pointer back to the "place" in the table
that I could pass to setf. (I can pass it of course, but I'd be
setting the local old-value, not the hashtable). How can I preserve
the pointer?

Ari.

-- 
Elections only count as free and trials as fair if you can lose money
betting on the outcome.

From: Duane Rettig
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <4wtxdbj1k.fsf@franz.com>
···@lib.aero (K. Ari Krupnikov) writes:

> I need to signal an error (or a warning maybe) if a value is inserted
> into a hashtable with a key that already exists in that table. As far
> as I can tell, no such facility is provided in CL. Am I correct?

Correct.

> What I did was write a function to check this. Can you guys tell me if
> I'm thinking in the right direction? Is there a clearer way of doing
> this?
> 
> (defun unique-sethash (key value ht)
>   (multiple-value-bind (old-value was-found) (gethash key ht)
>     (if was-found
>       (error "duplicate key ~A" key)
>       (setf (gethash key ht) value))))
> 
> 
> Also, I ended up calling gethash twice in this function (if no error
> is reported). How do I avoid that? old-value has the value of the what
> was in the table, not the pointer back to the "place" in the table
> that I could pass to setf. (I can pass it of course, but I'd be
> setting the local old-value, not the hashtable). How can I preserve
> the pointer?

I assume that by "calling gethash twice" you mean that you're generating
the same hashcode and MOD-ing it twice, once in gethash and once in
(setf gethash).

There is a kind of hash-table that Allegro CL provides called a
"sans values" hash-table.  The keys themselves become the values,
and the contents are marked by the presence or absences of a
particular value/key.  We implement, for example, a shared-cons
hash-table with this extension, which is used to share some kinds
of constants that happen to be lists.  This sans-values hash-table
has a special setter called excl:puthash-key, which takes only
two arguments (the key and the hash-table) and which always returns
the actual value (key) stored in the hash-table, regardless whether
the key which did the finding is EQ to the stored key.

Obviously this would not be useful to you if you also want to
associate a value with the key.  But if your hash-table is wanting
to create an "existence" set, then it might be the kind of thing
you want.

This exaple is similar to how we implement our shared-cons hash-table.
I didn't demonstrate any usage for the weak-keys aspect of it, but
you can probably speculate what it is for.

CL-USER(1): (setq ht (make-hash-table :values nil :weak-keys t :test #'equal))
#<EQUAL hash-table (sans values) with weak keys, 0 entries @ #x717c2312>
CL-USER(2): (setq x (list 10) y (list 10) z (list 20))
(20)
CL-USER(3): (eq x y)
NIL
CL-USER(4): (puthash-key x ht)
(10)
CL-USER(5): (setq w (puthash-key y ht))
(10)
CL-USER(6): (eq w y)
NIL
CL-USER(7): (eq w x)
T
CL-USER(8): (puthash-key z ht)
(20)
CL-USER(9): ht
#<EQUAL hash-table (sans values) with weak keys, 2 entries @ #x717c2312>
CL-USER(10): :i *
#<EQUAL hash-table (sans values) with weak keys, 2 entries @ #x717c2312>
   0 KEY ----------> (10), a proper list with 1 element
   1 KEY ----------> (20), a proper list with 1 element
CL-USER(11): (gethash y ht)
(10)
T
CL-USER(12): (eq * y)
NIL
CL-USER(13): (eq ** x)
T
CL-USER(14): 

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Edi Weitz
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <uy8htiutq.fsf@agharta.de>
On 26 Oct 2004 03:08:36 -0700, ···@lib.aero (K. Ari Krupnikov) wrote:

> I need to signal an error (or a warning maybe) if a value is
> inserted into a hashtable with a key that already exists in that
> table. As far as I can tell, no such facility is provided in CL. Am
> I correct?
>
> What I did was write a function to check this. Can you guys tell me
> if I'm thinking in the right direction? Is there a clearer way of
> doing this?
>
> (defun unique-sethash (key value ht)
>   (multiple-value-bind (old-value was-found) (gethash key ht)
>     (if was-found
>       (error "duplicate key ~A" key)
>       (setf (gethash key ht) value))))
>
> Also, I ended up calling gethash twice in this function (if no error
> is reported). How do I avoid that? old-value has the value of the
> what was in the table, not the pointer back to the "place" in the
> table that I could pass to setf. (I can pass it of course, but I'd
> be setting the local old-value, not the hashtable). How can I
> preserve the pointer?

I think if your solution (which seems fine to me) isn't acceptable to
you you have to resort to DEFINE-SETF-EXPANDER to get what you want.

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Svein Ove Aas
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <cllhue$g4n$1@services.kq.no>
Edi Weitz wrote:

> On 26 Oct 2004 03:08:36 -0700, ···@lib.aero (K. Ari Krupnikov) wrote:
> 
>> I need to signal an error (or a warning maybe) if a value is
>> inserted into a hashtable with a key that already exists in that
>> table. As far as I can tell, no such facility is provided in CL. Am
>> I correct?
>>
>> What I did was write a function to check this. Can you guys tell me
>> if I'm thinking in the right direction? Is there a clearer way of
>> doing this?
>>
>> (defun unique-sethash (key value ht)
>>   (multiple-value-bind (old-value was-found) (gethash key ht)
>>     (if was-found
>>       (error "duplicate key ~A" key)
>>       (setf (gethash key ht) value))))
>>
>> Also, I ended up calling gethash twice in this function (if no error
>> is reported). How do I avoid that? old-value has the value of the
>> what was in the table, not the pointer back to the "place" in the
>> table that I could pass to setf. (I can pass it of course, but I'd
>> be setting the local old-value, not the hashtable). How can I
>> preserve the pointer?
> 
> I think if your solution (which seems fine to me) isn't acceptable to
> you you have to resort to DEFINE-SETF-EXPANDER to get what you want.
> 

OP:

You'll have to copy and alter the actual code that's called by (setf
gethash), which would be exremely implementation-dependent.

It could be fairly easy, however - at some point said function has to detect
whether it's supposed to replace an existing entry or insert a new one.
Simply have it error out instead of replacing anything.


Still, I suggest you don't do this unless you *really* need that extra
performance, especially as the compiler may already be optimizing your code
to be equivalent to that.
From: K. Ari Krupnikov
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <86u0sgtvek.fsf@localhost.localdomain>
Svein Ove Aas <·········@aas.no> writes:

> >> (defun unique-sethash (key value ht)
> >>   (multiple-value-bind (old-value was-found) (gethash key ht)
> >>     (if was-found
> >>       (error "duplicate key ~A" key)
> >>       (setf (gethash key ht) value))))
> >>
> >> Also, I ended up calling gethash twice in this function (if no error
> >> is reported). How do I avoid that?
> > 
> > I think if your solution (which seems fine to me) isn't acceptable to
> > you you have to resort to DEFINE-SETF-EXPANDER to get what you want.
> > 
> 
> OP:

[snip]

> Still, I suggest you don't do this unless you *really* need that extra
> performance, especially as the compiler may already be optimizing your code
> to be equivalent to that.

I haven't phrased my question well perhaps. I wasn't looking for
performance advice (at least not yet), rather for advice on style. I
am only just learning lisp and what I was looking for was an idiomatic
way to write this. It looked awkward to me to repeat (gethash key ht)
twice when I really mean the same thing both times, and especially so
since I'm binding the result of the first call.

Ari.

-- 
Elections only count as free and trials as fair if you can lose money
betting on the outcome.
From: Edi Weitz
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <uwtxcy2hk.fsf@agharta.de>
On 27 Oct 2004 01:30:59 -0700, ···@lib.aero (K. Ari Krupnikov) wrote:

> I haven't phrased my question well perhaps. I wasn't looking for
> performance advice (at least not yet), rather for advice on style. I
> am only just learning lisp and what I was looking for was an
> idiomatic way to write this. It looked awkward to me to repeat
> (gethash key ht) twice when I really mean the same thing both times,
> and especially so since I'm binding the result of the first call.

Read the answers you got already again. It might look to you as if you
call the same function, GETHASH, twice but actually you're calling two
/different/ functions - one is GETHASH, the other one is (SETF
GETHASH). Perhaps this'll become clearer if you rewrite your function
like this:

  (defun unique-sethash (key value ht)
    (multiple-value-bind (old-value was-found)
        (funcall #'gethash key ht)
      (declare (ignore old-value))
      (if was-found
        (error "duplicate key ~A" key)
        (funcall #'(setf gethash) value key ht))))

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Rahul Jain
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <87wtxchddu.fsf@nyct.net>
Svein Ove Aas <·········@aas.no> writes:

> You'll have to copy and alter the actual code that's called by (setf
> gethash), which would be exremely implementation-dependent.

Uh... no... he'd get it using GET-SETF-EXPANSION and insert it into his
code via macros and backquoting.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Svein Ove Aas
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <clnm9m$djm$1@services.kq.no>
Rahul Jain wrote:

> Svein Ove Aas <·········@aas.no> writes:
> 
>> You'll have to copy and alter the actual code that's called by (setf
>> gethash), which would be exremely implementation-dependent.
> 
> Uh... no... he'd get it using GET-SETF-EXPANSION and insert it into his
> code via macros and backquoting.
> 
Not in SBCL, he wouldn't, and probably not anywhere else either.
The point of the exercise was to avoid hashing twice. Nothing you can do
with the values of get-setf-expansion lets you do that.

Try it, and you'll see. 
From: Rahul Jain
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <871xffaok0.fsf@nyct.net>
Svein Ove Aas <·········@aas.no> writes:

> Rahul Jain wrote:
>
>> Svein Ove Aas <·········@aas.no> writes:
>> 
>>> You'll have to copy and alter the actual code that's called by (setf
>>> gethash), which would be exremely implementation-dependent.
>> 
>> Uh... no... he'd get it using GET-SETF-EXPANSION and insert it into his
>> code via macros and backquoting.
>> 
> Not in SBCL, he wouldn't, and probably not anywhere else either.
> The point of the exercise was to avoid hashing twice. Nothing you can do
> with the values of get-setf-expansion lets you do that.
>
> Try it, and you'll see. 

So you're looking for an implementation of CL with CL's hash tables that
have some feature that isn't required in the standard? Then use the
implementation that has the performance optimization you need. If GCC
isn't compiling your C code well enough, but the Intel compiler is,
which compiler will you compile the release version with?

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Kalle Olavi Niemitalo
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <87lldtb2h3.fsf@Astalo.kon.iki.fi>
Edi Weitz <········@agharta.de> writes:

> I think if your solution (which seems fine to me) isn't acceptable to
> you you have to resort to DEFINE-SETF-EXPANDER to get what you want.

I think you mean GET-SETF-EXPANSION:

  (defun unique-sethash (key value ht)
    (macrolet ((body (&environment env)
                 (multiple-value-bind (vars vals stores write read)
                     (get-setf-expansion '(gethash key ht) env)
                   `(let* ,(mapcar #'list vars vals)
                      (if (nth-value 1 ,read)
                          (error "duplicate key ~A" key)
                          (multiple-value-bind ,stores value
                            ,write))))))
      (body)))

Or use my SYMBOL-MACROLETF macro:

  (defun unique-sethash (key value ht)
    (symbol-macroletf ((place (gethash key ht)))
      (if (nth-value 1 place)
          (error "duplicate key ~A" key)
          (setf place value))))

Unfortunately, with CMUCL 18e, these variations and the OP's
original version compile to equivalent machine code; the setf
expander of GETHASH makes no attempt to cache the hash bucket.

Suppose some implementation does cache it.  The user sets up a
hash table with :TEST 'EQUAL, gets the setf expansion of (gethash
key hash-table), binds the resulting temporary variables suitably
for key=(good day), and evaluates the accessing form; this
process primes the cache.  The user then does (setf (second key)
'night), binds the store variable, and evaluates the storing
form.  Is the storing form required to notice that the key has
been visibly modified with respect to EQUAL and the cache is thus
no longer valid?  If so, it is probably cheaper to compute the
hash code afresh each time than to preserve a copy of the key
and check whether it has changed.

Section 18.1.2 forbids visible modifications to objects that have
been used as keys in a hash table.  However, some people have
claimed this only refers to keys that have been inserted to the
hash table, not to keys that have merely been looked up.
From: Toomas Altosaar
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <3ef90b62.0410270055.5a93c096@posting.google.com>
Side question: I remeber using locatives on the Lisp Machines a long
time ago but can't remember exactly what their purpose was. Some sort
of a fast access to memory.

Back to the OP:

I think I follow Kalle's "is caching valid anymore" concern.

I would assume that the "check, then write using the same 'locative'"
idea would have to run without-interrupts since what would happen if a
gc occured in the meantime and someone rehashed the table? How
expensive is it to hash a key from some symbol or fixnum in general?
From: Rahul Jain
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <87wtx799st.fsf@nyct.net>
···············@hut.fi (Toomas Altosaar) writes:

> How expensive is it to hash a key from some symbol or fixnum in
> general?

The hash key could be the symbol's memory address or the fixnum itself,
respectively.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Toomas Altosaar
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <3ef90b62.0410310049.1996965a@posting.google.com>
So, the question remains:

Is it faster to rehash the key,

or,

to disable interrupts and use the address generated by the first
gethash operation during the write in (setf (gethash ... ?

I would assume that in very time critical code the latter would be
preferred, especially if the keys are not fixnums nor symbols, but
more complex objects requiring sxhash or such.
From: Pascal Costanza
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <clldka$1404$1@f1node01.rhrz.uni-bonn.de>
K. Ari Krupnikov wrote:

> I need to signal an error (or a warning maybe) if a value is inserted
> into a hashtable with a key that already exists in that table. As far
> as I can tell, no such facility is provided in CL. Am I correct?
> 
> What I did was write a function to check this. Can you guys tell me if
> I'm thinking in the right direction? Is there a clearer way of doing
> this?
> 
> (defun unique-sethash (key value ht)
>   (multiple-value-bind (old-value was-found) (gethash key ht)
>     (if was-found
>       (error "duplicate key ~A" key)
>       (setf (gethash key ht) value))))

I'd call it like this:

(defun (setf unique-gethash) (value key ht &optional default)
   (declare (ignore default))
   (multiple-value-bind
     (old-value foundp)
     (gethash key ht)
     (if foundp
         (error "duplicate key ~A" key)
         (setf (gethash key ht) value))))

Then you can say

(setf (unique-gethash key ht) ...)

...which integrates better with the rest of Common Lisp (-> setf 
framework). (The "default" value is part of the interface for the same 
reason that it is part of the origional (setf gethash) function.)

> Also, I ended up calling gethash twice in this function (if no error
> is reported). How do I avoid that? old-value has the value of the what
> was in the table, not the pointer back to the "place" in the table
> that I could pass to setf. (I can pass it of course, but I'd be
> setting the local old-value, not the hashtable). How can I preserve
> the pointer?

You could wrap the values in a cons:

(defun unique-gethash (key ht &optional default)
   (let ((result (gethash key ht)))
     (if result
         (car result)
         default)))

(defun (setf unique-gethash) (value key ht &optional default)
   (declare (ignore default))
   (let ((result (gethash key ht)))
     (if result
         (setf (car result) value)
         (progn
            (setf (gethash key ht)
                  (list value))
             value))))


Pascal

-- 
Pascal Costanza               University of Bonn
···············@web.de        Institute of Computer Science III
http://www.pascalcostanza.de  R�merstr. 164, D-53117 Bonn (Germany)
From: K. Ari Krupnikov
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <86lldstt8g.fsf@localhost.localdomain>
Pascal Costanza <········@web.de> writes:

> > (defun unique-sethash (key value ht)
> >   (multiple-value-bind (old-value was-found) (gethash key ht)
> >     (if was-found
> >       (error "duplicate key ~A" key)
> >       (setf (gethash key ht) value))))
> 
> I'd call it like this:
> 
> (defun (setf unique-gethash) (value key ht &optional default)
>    (declare (ignore default))
>    (multiple-value-bind
>      (old-value foundp)
>      (gethash key ht)
>      (if foundp
>          (error "duplicate key ~A" key)
>          (setf (gethash key ht) value))))
> 
> Then you can say
> 
> (setf (unique-gethash key ht) ...)
> 
> ...which integrates better with the rest of Common Lisp (-> setf
> framework). (The "default" value is part of the interface for the same
> reason that it is part of the origional (setf gethash) function.)

Thank you. I can see how it's closer to standard facilities. Thanks
for pointing out a better name for foundp, too. However, the name
unique-gethash doesn't really capture what the function does -- there
is nothing wrong with trying to get an existing key from the table,
only setting it is an error. Would it be clearer to pass in an extra
argument, say, if-foundp -- a function that would be called if the key
were present in the table? How would you arrange the arguments then
wrt &optional?  (assuming you tried to stick to the convention you
pointed out)

Am I correct assuming that one would customarily
(defun unique-gethash (value key ht &optional default)..) in addition
to the above?

> You could wrap the values in a cons:
> 
> (defun unique-gethash (key ht &optional default)
>    (let ((result (gethash key ht)))
>      (if result
>          (car result)
>          default)))

CLISP says "*** - CAR: "value" is not a LIST" when I try to invoke
it. How would this work? I guess you could say
(multiple-value-list (gethash key ht)), but then result would always
be true, so you must have meant something else.

> (defun (setf unique-gethash) (value key ht &optional default)
>    (declare (ignore default))
>    (let ((result (gethash key ht)))
>      (if result
>          (setf (car result) value)
>          (progn
>             (setf (gethash key ht)
>                   (list value))
>              value))))

-- 
Elections only count as free and trials as fair if you can lose money
betting on the outcome.
From: Pascal Costanza
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <clo62b$n4q$1@newsreader2.netcologne.de>
K. Ari Krupnikov wrote:
> Pascal Costanza <········@web.de> writes:
> 
> 
>>>(defun unique-sethash (key value ht)
>>>  (multiple-value-bind (old-value was-found) (gethash key ht)
>>>    (if was-found
>>>      (error "duplicate key ~A" key)
>>>      (setf (gethash key ht) value))))
>>
>>I'd call it like this:
>>
>>(defun (setf unique-gethash) (value key ht &optional default)
>>   (declare (ignore default))
>>   (multiple-value-bind
>>     (old-value foundp)
>>     (gethash key ht)
>>     (if foundp
>>         (error "duplicate key ~A" key)
>>         (setf (gethash key ht) value))))
>>
>>Then you can say
>>
>>(setf (unique-gethash key ht) ...)
>>
>>...which integrates better with the rest of Common Lisp (-> setf
>>framework). (The "default" value is part of the interface for the same
>>reason that it is part of the origional (setf gethash) function.)
> 
> Thank you. I can see how it's closer to standard facilities. Thanks
> for pointing out a better name for foundp, too. However, the name
> unique-gethash doesn't really capture what the function does -- there
> is nothing wrong with trying to get an existing key from the table,
> only setting it is an error. Would it be clearer to pass in an extra
> argument, say, if-foundp -- a function that would be called if the key
> were present in the table? How would you arrange the arguments then
> wrt &optional?  (assuming you tried to stick to the convention you
> pointed out)

I don't know exactly what you are trying to achieve, but here I would 
probably try to follow one of the CLOS designs: Whenever a slot is 
either not bound or missing, CLOS automatically calls a method you can 
override for your own purposes. Here's a sketch:

(defun (setf unique-gethash) (value key ht &optional default)
     (declare (ignore default))
     (multiple-value-bind
       (old-value foundp)
       (gethash key ht)
       (if foundp
           (duplicate-hash-key value key ht)
           (setf (gethash key ht) value))))

(defgeneric duplicate-hash-key (value key ht))

Unfortunately there is no class involved that you could specialize on. 
An idea from the MOP would be to define a wrapper class for keys for 
specialization. But this is really going to be overkill.

Do you have some specific use for this in mind? It may be better to 
discuss the more general goal that you are trying to achieve.

> Am I correct assuming that one would customarily
> (defun unique-gethash (value key ht &optional default)..) in addition
> to the above?

Don't know. Depends on the big picture.

>>You could wrap the values in a cons:
>>
>>(defun unique-gethash (key ht &optional default)
>>   (let ((result (gethash key ht)))
>>     (if result
>>         (car result)
>>         default)))
> 
> CLISP says "*** - CAR: "value" is not a LIST" when I try to invoke
> it. How would this work? I guess you could say
> (multiple-value-list (gethash key ht)), but then result would always
> be true, so you must have meant something else.

That code was supposed to be used in conjunction with the code below. 
Note that it wraps new values in a cons (-> (list value)).

>>(defun (setf unique-gethash) (value key ht &optional default)
>>   (declare (ignore default))
>>   (let ((result (gethash key ht)))
>>     (if result
>>         (setf (car result) value)
>>         (progn
>>            (setf (gethash key ht)
>>                  (list value))
>>             value))))


Pascal

-- 
Tyler: "How's that working out for you?"
Jack: "Great."
Tyler: "Keep it up, then."
From: K. Ari Krupnikov
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <86zn27rdon.fsf@localhost.localdomain>
Pascal Costanza <········@web.de> writes:

> Do you have some specific use for this in mind? It may be better to
> discuss the more general goal that you are trying to achieve.

I read records from files and put them into hashtables. Other
components in the program can then query. Sometimes, the files contain
duplicate records (duplicate keys), which is an error that I need to
report.

Ari.

-- 
Elections only count as free and trials as fair if you can lose money
betting on the outcome.
From: Christopher C. Stacy
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <u4qkfahmh.fsf@news.dtpq.com>
···@lib.aero (K. Ari Krupnikov) writes:

> Pascal Costanza <········@web.de> writes:
> 
> > Do you have some specific use for this in mind? It may be better to
> > discuss the more general goal that you are trying to achieve.
> 
> I read records from files and put them into hashtables. Other
> components in the program can then query. Sometimes, the files
> contain duplicate records (duplicate keys), which is an error that I
> need to report.

Just call GETHASH to see if anything is there, and then call 
(SETF (GETHASH ...)) to set it if appropriate (or warn of 
duplicates of whatever).

I get the impression there's some fear that this will be too inefficient,
but no evidence to suggest that this fear is justified.

(defun record-record (r)
  (let ((key (compute-file-record-key r)))
    (if (gethash key *records*)
        (format t "~&Duplicate record ~A" key)
        (setf (gethash key *records*)
              (compute-file-record-vale r)))))


Use COND instead of IF, if there's more than one thing to do in each case.
From: Peter Seibel
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <m3mzy7k7pz.fsf@javamonkey.com>
······@news.dtpq.com (Christopher C. Stacy) writes:

> ···@lib.aero (K. Ari Krupnikov) writes:
>
>> Pascal Costanza <········@web.de> writes:
>> 
>> > Do you have some specific use for this in mind? It may be better to
>> > discuss the more general goal that you are trying to achieve.
>> 
>> I read records from files and put them into hashtables. Other
>> components in the program can then query. Sometimes, the files
>> contain duplicate records (duplicate keys), which is an error that I
>> need to report.
>
> Just call GETHASH to see if anything is there, and then call 
> (SETF (GETHASH ...)) to set it if appropriate (or warn of 
> duplicates of whatever).
>
> I get the impression there's some fear that this will be too inefficient,
> but no evidence to suggest that this fear is justified.
>
> (defun record-record (r)
>   (let ((key (compute-file-record-key r)))
>     (if (gethash key *records*)
>         (format t "~&Duplicate record ~A" key)
>         (setf (gethash key *records*)
>               (compute-file-record-vale r)))))
>
>
> Use COND instead of IF, if there's more than one thing to do in each case.

Or if your hash can contain NIL values:

  (defun record-record (r)
    (let ((key (compute-file-record-key r)))
      (if (nth-value 1 (gethash key *records*))
          (format t "~&Duplicate record ~A" key)
          (setf (gethash key *records*)
                (compute-file-record-vale r)))))

And if you need lots of flexibility you may want to use the condition
system. Something like this (untested) code:

  (define-condition duplicate-record (error)
    ((key :initarg :key :reader key)
     (old-value :initarg :old-value :reader old-value)
     (new-value :initarg :new-value :reader new-value))
    (:report
      (lambda (c s) 
       (format s "~&Duplicate record found for key: ~s.~&Old-value:~&~s~&New-value:~&~s~%" 
         (key c)
         (old-value c)
         (new-value c)))))

  (defun record-record (r)
    (let ((key (compute-file-record-key r))
          (new-value (compute-file-record-value r)))
      (multiple-value-bind (old-value present-p) (gethash key *records*)
        (when present-p
          (restart-case (error 'duplicate-record
                               :key key
                               :old-value old-value
                               :new-value new-value)
          (keep-old-record ()
            (return-from record-record))
          (keep-new-record ())))
      (setf (gethash key *records* new-value)))))


Now during development you can use these restarts interactively to
track down the duplicate records in your input file. Or if you have
some automatable logic for choosing which record you want to keep you
can implement it with a condition handler:

  (defun pick-em (c)
    (invoke-restart 
      (if (better-p (new-value c) (old-value c))
       'keep-new-record
       'keep-old-record)))

  (defun load-file (file)
    (with-open-file (in file)
      (handler-bind (duplicate-record #'pick-em)
        (loop for rec = (read-record in)
           while rec do (record-record rec)))))

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Paul Foley
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <m2mzy9acf0.fsf@mycroft.actrix.gen.nz>
On 26 Oct 2004 03:08:36 -0700, K Ari Krupnikov wrote:

> I need to signal an error (or a warning maybe) if a value is inserted
> into a hashtable with a key that already exists in that table. As far
> as I can tell, no such facility is provided in CL. Am I correct?

> What I did was write a function to check this. Can you guys tell me if
> I'm thinking in the right direction? Is there a clearer way of doing
> this?

> (defun unique-sethash (key value ht)
>   (multiple-value-bind (old-value was-found) (gethash key ht)
>     (if was-found
>       (error "duplicate key ~A" key)
>       (setf (gethash key ht) value))))


> Also, I ended up calling gethash twice in this function (if no error

You're not calling GETHASH twice, you're calling GETHASH once and
(SETF GETHASH) once.

> is reported). How do I avoid that? old-value has the value of the what
> was in the table, not the pointer back to the "place" in the table
> that I could pass to setf. (I can pass it of course, but I'd be

The key isn't in the table, so there isn't necessarily any "place" it
could return (the table may need to grow, etc).  Your implementation
may cache stuff under the covers to make this operation faster,
anyway.  If it's not actually causing you pain, forget about it.

-- 
Being really good at C++ is like being really good at using rocks to
sharpen sticks.
                                                         -- Thant Tessman
(setq reply-to
  (concatenate 'string "Paul Foley " "<mycroft" '(··@) "actrix.gen.nz>"))
From: Marco Antoniotti
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <plxfd.14$TY6.9165@typhoon.nyu.edu>
Paul Foley wrote:
> On 26 Oct 2004 03:08:36 -0700, K Ari Krupnikov wrote:
> 
> 
>>I need to signal an error (or a warning maybe) if a value is inserted
>>into a hashtable with a key that already exists in that table. As far
>>as I can tell, no such facility is provided in CL. Am I correct?
> 
> 
>>What I did was write a function to check this. Can you guys tell me if
>>I'm thinking in the right direction? Is there a clearer way of doing
>>this?
> 
> 
>>(defun unique-sethash (key value ht)
>>  (multiple-value-bind (old-value was-found) (gethash key ht)
>>    (if was-found
>>      (error "duplicate key ~A" key)
>>      (setf (gethash key ht) value))))
> 
> 
> 
>>Also, I ended up calling gethash twice in this function (if no error
> 
> 
> You're not calling GETHASH twice, you're calling GETHASH once and
> (SETF GETHASH) once.
> 
> 
>>is reported). How do I avoid that? old-value has the value of the what
>>was in the table, not the pointer back to the "place" in the table
>>that I could pass to setf. (I can pass it of course, but I'd be
> 
> 
> The key isn't in the table, so there isn't necessarily any "place" it
> could return (the table may need to grow, etc).  Your implementation
> may cache stuff under the covers to make this operation faster,
> anyway.  If it's not actually causing you pain, forget about it.

I think the OP wanted something different.

Suppose you have a value associated with the key in HT.  He wants the 
following

(multiple-value-bind (htloc foundp)
    (get-pointer-to-bucket-in-hashtable key ht)
    ;; the above function works like GETHASH but returns a "pointer" or
    ;; "locative" to the actual hash table bucket - assuming an "open"
    ;; hash table implementation
  (if foundp
     (error "FOO")
     (setf (content htloc) value)))

... for an appropriate value of CONTENT (or you can call it DEREF)

Now, CL does not offer these.

In any case, it would break what IMHO is the obvious Dictionary API in 
whatever language you want to program it, as it requires knowledge of 
the hash table implementation.  Maybe we are looking at a case of 
"premature optimization".

Cheers
--
Marco
From: Marco Antoniotti
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <onxfd.15$TY6.9165@typhoon.nyu.edu>
Marco Antoniotti wrote:

	...

> 
> I think the OP wanted something different.

	...
> 
> Suppose you have a value associated with the key in HT. 

Sorry, the above sentence was to be deleted.

--
Marco
From: K. Ari Krupnikov
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <86pt34tumw.fsf@localhost.localdomain>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> >>Also, I ended up calling gethash twice in this function (if no error

> > You're not calling GETHASH twice, you're calling GETHASH once and
> > (SETF GETHASH) once.

Right. My simplified mental model of how setf works obviously needs
updating.

> In any case, it would break what IMHO is the obvious Dictionary API in
> whatever language you want to program it, as it requires knowledge of
> the hash table implementation.  Maybe we are looking at a case of
> "premature optimization".

That impression is the last thing in the world I wanted to create. I
haven't written enough of this application to even guess where the
bottlenecks are going to be, much less profile them. If I can venture
a guess at all, it would be that it would all fit into main memory and
will run in a short enough time that optimizing it won't be
worthwhile. In any case, my primary focus in writing it is to learn
good lisp, not to squeeze every bit of performance out of it. Not yet.

Thanks everyone who suggested low-level hacks to solve my misstated
"pointer" question. many of them went over my head, but I'll revisit
them as I learn more lisp.

Ari.

-- 
Elections only count as free and trials as fair if you can lose money
betting on the outcome.
From: Coby Beck
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <tPZfd.10882$df2.2531@edtnps89>
"K. Ari Krupnikov" <···@lib.aero> wrote in message 
···················@localhost.localdomain...
> Marco Antoniotti <·······@cs.nyu.edu> writes:
>> In any case, it would break what IMHO is the obvious Dictionary API in
>> whatever language you want to program it, as it requires knowledge of
>> the hash table implementation.  Maybe we are looking at a case of
>> "premature optimization".
>
> That impression is the last thing in the world I wanted to create. I
> haven't written enough of this application to even guess where the
> bottlenecks are going to be, much less profile them. If I can venture

Just to add a bit more conflicting style advice...

Perhaps this is a unique situation, and you don't need a general 
abstraction.  If so, I would probably prefer hiding the fact it is a 
hashtable (this is usually preferable anyway, maybe you will change your 
mind about the data structure later on).  So rather write:

(defun store-a-foo (key foo-thing) ...)
(defun get-a-foo (key) ...)

Use those all over the place and then when you switch from a hashtable to a 
Red-Black tree or p-list or whatever nothing changes except the insides of 
these two functions.

Now the insides of store-a-foo with a hashtable could also be simpler if you 
know you will never store a NIL foo, which is not an uncommon situation (I 
personally find I rarely need the second return value of gethash)

(defun store-a-foo (key foo-thing)
    (if (gethash key *foos*)
       (error "already got a ~A, thanks" foo-thing)
      (setf (gethash key *foos*) foo-thing)))

Perhaps in your app it is also an error to ask for something when it is not 
there:

(defun get-a-foo (key)
   (or (gethash key *foos*)
       (error "No foo for ~A, sorry" key)))

Here also we have an place where a closure would be very apropos:

(let ((foos (make-hash-table)))
  (defun store-a-foo (key foo-thing)
     (if (gethash key foos)
        (error "already got a foo for ~A, thanks" key)
       (setf (gethash key foos) foo-thing)))
  (defun get-a-foo (key)
     (or (gethash key foos)
         (error "No foo for ~A, sorry" key))))

Don't forget to give yourself some debugging utilities like

(defun dump-foos ()
   (setf foos (make-hash-table)))

inside the closure of course....

-- 
Coby Beck
(remove #\Space "coby 101 @ big pond . com")
From: K. Ari Krupnikov
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <861xfighth.fsf@localhost.localdomain>
"Coby Beck" <·····@mercury.bc.ca> writes:

> Perhaps this is a unique situation, and you don't need a general 
> abstraction.  If so, I would probably prefer hiding the fact it is a 
> hashtable (this is usually preferable anyway, maybe you will change your 
> mind about the data structure later on).  So rather write:
> 
> (defun store-a-foo (key foo-thing) ...)
> (defun get-a-foo (key) ...)
> 
> Use those all over the place and then when you switch from a hashtable to a 
> Red-Black tree or p-list or whatever nothing changes except the insides of 
> these two functions.

Hm. This seems to be the advice for languages like C++ and Java: hide
your implementation behind an interface, protect it from unauthorized
access (with a closure in this case). My experience with this approach
has been mixed. The increase in complexity (forcing a reader to learn
what get-a-foo is and how it works, instead of using a common
construct such as gethash) must be offset by some reduction in
complexity, and I don't think it happens here. If store-a-foo isn't
used much, there is no great harm in going through the code and
replacing the few calls to gethash with some other calls that
correspond to the new implementation.  If is used often, inevitably, I
find myself adding a function to count stored foos, to clear foos, add
arguments to it that say how to handle failures, etc[1], leaving it as
unprotected as a global hashtable and more complicated to use.

> Now the insides of store-a-foo with a hashtable could also be simpler if you 
> know you will never store a NIL foo, which is not an uncommon situation (I 
> personally find I rarely need the second return value of gethash)

Right. Now this is clearly something I have overlooked. Indeed, I
don't intend to store nils in the table, so indeed I don't need the
second value. Thank you for pointing this out. I think I was just
fascinated with this cool new feature (multiple return values) and dug
myself into a hole using them :=)

Ari.

[1] http://www.propylon.com/news/ctoarticles/040330_business_objects.html 
-- 
Elections only count as free and trials as fair if you can lose money
betting on the outcome.
From: Coby Beck
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <2TGgd.29667$df2.2653@edtnps89>
"K. Ari Krupnikov" <···@lib.aero> wrote in message 
···················@localhost.localdomain...
> "Coby Beck" <·····@mercury.bc.ca> writes:
>
>> Perhaps this is a unique situation, and you don't need a general
>> abstraction.  If so, I would probably prefer hiding the fact it is a
>> hashtable (this is usually preferable anyway, maybe you will change your
>> mind about the data structure later on).  So rather write:
>>
>> (defun store-a-foo (key foo-thing) ...)
>> (defun get-a-foo (key) ...)
>>
>> Use those all over the place and then when you switch from a hashtable to 
>> a
>> Red-Black tree or p-list or whatever nothing changes except the insides 
>> of
>> these two functions.
>
> Hm. This seems to be the advice for languages like C++ and Java: hide
> your implementation behind an interface,

Well, it is not so much a matter of "hiding behind" but rather one of 
"providing" an interface.  That's just the benefit of abstracting.  You will 
thank yourself later.  However you implement it, protected or not, a simple 
API of some sort can not be wrong!

> protect it from unauthorized
> access (with a closure in this case).

The closure idea was really more of a "here's something cool that would 
work" suggestion rather than a design sermon ;)

> My experience with this approach
> has been mixed. The increase in complexity (forcing a reader to learn
> what get-a-foo is and how it works, instead of using a common
> construct such as gethash) must be offset by some reduction in
> complexity, and I don't think it happens here.

But a general construct does not provide easier understanding!  I see 
gethash and I know "oh, he's storing and retrieving data."  Well, what else 
is new?  I see "get-a-foo" and I know right away you are storing and 
retrieving foo's and I don't have to bother with where or how.  Here you 
have the simplest kind of API, a meaningful name.

>> Now the insides of store-a-foo with a hashtable could also be simpler if 
>> you
>> know you will never store a NIL foo, which is not an uncommon situation 
>> (I
>> personally find I rarely need the second return value of gethash)
>
> Right. Now this is clearly something I have overlooked. Indeed, I
> don't intend to store nils in the table, so indeed I don't need the
> second value. Thank you for pointing this out. I think I was just
> fascinated with this cool new feature (multiple return values) and dug
> myself into a hole using them :=)

I remember often finding it impossible to resist going back over perfectly 
fine code just so I could make use of the "cool new feature" du jour!  So 
don't kick yourself for slightly gratuitous uses in new code ;)

-- 
Coby Beck
(remove #\Space "coby 101 @ big pond . com")
From: Rahul Jain
Subject: Re: reject duplicates in a hashtable?
Date: 
Message-ID: <87sm80hd89.fsf@nyct.net>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> In any case, it would break what IMHO is the obvious Dictionary API in
> whatever language you want to program it, as it requires knowledge of
> the hash table implementation.

Nah, just the setf-expansion needs to know that. :)

It's up to the implementation to implement it such that it caches the
location of the hash bucket and invalidates that cache if a GC causes a
rehash of an EQ hashtable.

> Maybe we are looking at a case of "premature optimization".

Very likely, yes.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist