From: Anton V. Belyaev
Subject: How to get key reference from a hashtable
Date: 
Message-ID: <1150360228.417650.245570@y41g2000cwy.googlegroups.com>
Hello!

Let me have a hashtable:

(setq table (make-hash-table :test 'string-eq))
(setf (gethash "integer" table) 'integer)
(setf (gethash "float" table) 'float)

And a function which for a specified string returns
a cons cell, for example (integer . "integer"):

(defun search (str)
     (cons (gethash str table) str))

The problem is that for each cell returned
(search "integer") => (integer . "integer")
(search "integer") => (integer . "integer")
(search "integer") => (integer . "integer")
three instances of "integer" string are created and referenced by
resulting cells.
All these conses refer to different "integer" instances.
It would be nice if I had a possibility to put a reference
to the key while constructing that cons cells. It will
save the memory.

Do you have an idea on how to do it?
Thanks in advance!

From: Kaz Kylheku
Subject: Re: How to get key reference from a hashtable
Date: 
Message-ID: <1150382513.055457.224940@y41g2000cwy.googlegroups.com>
Anton V. Belyaev wrote:
> Hello!
>
> Let me have a hashtable:
>
> (setq table (make-hash-table :test 'string-eq))
> (setf (gethash "integer" table) 'integer)
> (setf (gethash "float" table) 'float)
>
> And a function which for a specified string returns
> a cons cell, for example (integer . "integer"):
>
> (defun search (str)
>      (cons (gethash str table) str))
>
> The problem is that for each cell returned
> (search "integer") => (integer . "integer")
> (search "integer") => (integer . "integer")
> (search "integer") => (integer . "integer")
> three instances of "integer" string are created and referenced by
> resulting cells.

These instances are created by the READ function invoked by your Lisp
listener. That has nothing to do with your hash table.

In a compiled program, the literals will be "baked" into the program
image. There will be a finite number of them. That is to say, if you
have N expressions throughout the program that evaluate the literal
"integer", then there won't be more than N distinct instances of that
literal in the program image. Perhaps fewer than that if the compiler
or loader folds identical string constants together.

Memory for literals is a sunk cost, so it doesn't matter whether your
function returns the original literal or the key from the hash table.
Depending on the Lisp implementation, a literal might never become
garbage. It's implicitly referenced by the expression that produces it;
if that expression might be called in the future, that literal has to
stick around.

So in fact it's better to work with the literal wherever there is a
choice. It's better for a memory location to refer to a literal than to
a non-literal object, because the non-literal object occupies
potentially reclaimable storage.

Think about it. You have some dynamically allocated string "integer",
and you have an "integer" literal. If the program loses all of its
references to both of these objects, the dynamic one becomes garbage
whose space can be reclaimed. The literal continues to occupy space
whether or not a variable or other memory location refers to it. If the
expression that yields it is reachable code, then it has to stick
around. Literals may even be loaded into special memory that is not
subject to garbage collection. In an embedded Lisp, they could even be
put into ROM. I.e. the storage issues with respect to literals tend to
be load-time issues, not run-time issues. That is what I mean by "sunk
cost". Once they are loaded, you have them, and might as well use them.

> All these conses refer to different "integer" instances.
> It would be nice if I had a possibility to put a reference
> to the key while constructing that cons cells. It will
> save the memory.

Would it really ... haha.
From: Anton V. Belyaev
Subject: Re: How to get key reference from a hashtable
Date: 
Message-ID: <1150383455.828618.291520@r2g2000cwb.googlegroups.com>
When I call search function it is not necessary that I call it with
literal.
This string might be produced during parsing a file.
And if search returns cons cell with reference to this string, it will
not be
garbage-collected.

In case it returned a reference to key, the search string might be
GCed.

;; create the hashtable
(setq key "integer")
(setq hash (make-hash-table :test 'equal))
(setf (get-hash key hash) (cons 'integer key))

;; now we have one "integer" in memory, which might never dispose
;; becasue it is built-in literal

;; read these two strings from file. let them be equal to "integer"
(setq str1 (read-from-file))
(setq str2 (read-from-file))

;; now we have three "integer" strings in memory

;; res1 and res2 cells contain reference to the same "integer" which is
stored in
;; hash as a key
(setq res1 (search str1))   => (integer . "integer")
(setq res2 (search str2))   => (integer . "integer")

(setq str1 nil)
(setq str1 nil)
(run-garbage-collector)

;; after the GC run only one "integer" string is in memory
From: Kaz Kylheku
Subject: Re: How to get key reference from a hashtable
Date: 
Message-ID: <1150384914.885452.150590@u72g2000cwu.googlegroups.com>
Anton V. Belyaev wrote:
> ;; read these two strings from file. let them be equal to "integer"
> (setq str1 (read-from-file))
> (setq str2 (read-from-file))
>
> ;; now we have three "integer" strings in memory
>
> ;; res1 and res2 cells contain reference to the same "integer" which is
> stored in
> ;; hash as a key
> (setq res1 (search str1))   => (integer . "integer")
> (setq res2 (search str2))   => (integer . "integer")

Right; so in this example, you have basically implemented an interning
service for strings.

You're on your way to reinventing symbols.

The design of your example could be implemented by using a package
instead of the hash table, symbols interned in that package instead of
strings, and SYMBOL-VALUE on the symbols to store the associated value.


:)

The nice thing about symbols is that the interning (i.e. the hash table
lookup, or whatever, implemented by the package system) is done only at
read time. To actually do the look up and retrieve the value, no work
has to be done other than just fetching the value slot of the symbol.
The SEARCH function becomes superfluous.

Since you have reduced down to a single "integer" object, don't you
think it would be more efficeint to have an extra cell of hidden
storage right inside that literal which could store the associated
value? Then (search "integer") would just look there, instead of going
through a hash table.

That's the kind of thing that symbols do.

If you /do/ use a hash table to associate symbols with values, it will
be an EQ hash. EQ hashes can be implemented efficiently with something
like a radix tree, since the keys are essentially machine addresses.
From: Damyan Pepper
Subject: Re: How to get key reference from a hashtable
Date: 
Message-ID: <uwtbi93fw.fsf@gmail.com>
"Anton V. Belyaev" <·············@gmail.com> writes:

> (setq table (make-hash-table :test 'string-eq))

Is this actually valid?  According to CLHS the test has to be one of
eq, eql, equal or equalp.  CLisp gives an error on the above line -
with my tests I've used 'equal.

> The problem is that for each cell returned
> (search "integer") => (integer . "integer")
> (search "integer") => (integer . "integer")
> (search "integer") => (integer . "integer")
> three instances of "integer" string are created and referenced by
> resulting cells.
> All these conses refer to different "integer" instances.
> It would be nice if I had a possibility to put a reference
> to the key while constructing that cons cells. It will
> save the memory.

You're creating the 3 instances of "integer" each time you call
search.  

It looks like the hash table does store the keys because you can
access them using maphash:

CL-USER> (maphash (lambda (key value) (print (cons key value))) *table*)

("float" . FLOAT) 
("integer" . INTEGER) 
NIL


But I can't find a way to extract them without iterating over the
whole table.


One solution would be to use the same string object in your queries:

CL-USER> (let ((str "integer"))
		   (eq (cdr (search str)) (cdr (search str))))
T

But I don't think that's what you're after :)

Maybe you could start along the route of having another hash table
that contains definitive keys, then set your original table to 'eq:

(setq *table* (make-hash-table))
(setq *keys* (make-hash-table :test 'equal))
(setf (gethash "integer" *keys*) "integer")
(setf (gethash "float" *keys*) "float")
(setf (gethash (gethash "integer" *keys*) *table*) 'integer)

(defun mysearch (str)
	   (let ((key (gethash str *keys*)))
		 (cons (gethash key *table*) key)))

CL-USER> (mysearch "integer")
(INTEGER . "integer")

CL-USER> (eq (cdr (mysearch "integer")) (cdr (mysearch "integer")))
T
From: Anton V. Belyaev
Subject: Re: How to get key reference from a hashtable
Date: 
Message-ID: <1150364642.272149.244360@c74g2000cwc.googlegroups.com>
Thanks Damyan.
The second table for keys will work of course, but it not optimal.

I've just invented an idea on how to achieve what I want.
It is possible to put both the key and value into value fields of a
hashtable:

(defun puthash (table key value)
  (setf (gethash key table) (cons key value))

The expense of the solution is a couple of bytes fo cons cell:
hash key and key in the cell will refer to the same string.
From: Kaz Kylheku
Subject: Re: How to get key reference from a hashtable
Date: 
Message-ID: <1150384417.901960.319660@c74g2000cwc.googlegroups.com>
Anton V. Belyaev wrote:
> Thanks Damyan.
> The second table for keys will work of course, but it not optimal.
>
> I've just invented an idea on how to achieve what I want.
> It is possible to put both the key and value into value fields of a
> hashtable:
>
> (defun puthash (table key value)
>   (setf (gethash key table) (cons key value))
>
> The expense of the solution is a couple of bytes fo cons cell:
> hash key and key in the cell will refer to the same string.

But your SEARCH function now reduces to a trivial wrapper around
GETHASH; it just returns this cons cell now and so doesn't have to
allocate anything.

Callers have to take care to make a copy if they want to modify that
cell, of course.
From: Barry Margolin
Subject: Re: How to get key reference from a hashtable
Date: 
Message-ID: <barmar-D4A287.08463915062006@comcast.dca.giganews.com>
In article <························@c74g2000cwc.googlegroups.com>,
 "Anton V. Belyaev" <·············@gmail.com> wrote:

> Thanks Damyan.
> The second table for keys will work of course, but it not optimal.
> 
> I've just invented an idea on how to achieve what I want.
> It is possible to put both the key and value into value fields of a
> hashtable:
> 
> (defun puthash (table key value)
>   (setf (gethash key table) (cons key value))

That was the solution I was going to suggest.

> 
> The expense of the solution is a couple of bytes fo cons cell:
> hash key and key in the cell will refer to the same string.

But since you save on consing new cells later, this should be a net 
saving.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***