From: Carl Taylor
Subject: gethash key arg question
Date: 
Message-ID: <Sg6sj.586091$kj1.382567@bgtnsc04-news.ops.worldnet.att.net>
I would appreciate an explanation as to why a <copy-seq> is needed in line 12 of 
the following <defun> to avoid garbage output.  It is only needed when initially 
inserting a new entry to the hash table.  The implementation is LW.

Thanks.

Carl Taylor


(defparameter *t-string*
   (format nil "Quuxabcdefghcarlijkl-carlmnQuuxopqrcarlstuvwxyz~
                    123carl45678carl90abcdefQuuxQuuxabcdefghijklmno~
                    pqrstuvwxyzfoofoo123456789foo-abcdefghijkwosjvq"))


(defun extract-repeating-sub-strings (ss-lng x-string)
   (loop with wk-str = (make-string ss-lng)
           and subseq-ht = (make-hash-table :size 503 :test #'equal)
           and sis = (make-string-input-stream x-string)
           and stream-pointer fixnum = 0
           unless (eql ss-lng (read-sequence wk-str sis))
                 do (loop-finish)
           else
                 if (nth-value 1 (gethash wk-str subseq-ht))
                    do (incf (gethash wk-str subseq-ht))
                 else
                    do (incf (gethash (copy-seq wk-str) subseq-ht 0))
                 end
           end
           do (file-position sis (incf stream-pointer))
           finally
              (maphash (lambda (k v)
                                (when (> v 1)
                                    (print (cons k v))))
                             subseq-ht)))

(compile *)

(extract-repeating-sub-strings 4 *t-string*) 

From: Rainer Joswig
Subject: Re: gethash key arg question
Date: 
Message-ID: <joswig-C46FE1.02364412022008@news-europe.giganews.com>
In article 
<·······················@bgtnsc04-news.ops.worldnet.att.net>,
 "Carl Taylor" <··········@att.net> wrote:

> I would appreciate an explanation as to why a <copy-seq> is needed in line 12 of 
> the following <defun> to avoid garbage output.  It is only needed when initially 
> inserting a new entry to the hash table.  The implementation is LW.
> 
> Thanks.
> 
> Carl Taylor
> 
> 
> (defparameter *t-string*
>    (format nil "Quuxabcdefghcarlijkl-carlmnQuuxopqrcarlstuvwxyz~
>                     123carl45678carl90abcdefQuuxQuuxabcdefghijklmno~
>                     pqrstuvwxyzfoofoo123456789foo-abcdefghijkwosjvq"))
> 
> 
> (defun extract-repeating-sub-strings (ss-lng x-string)
>    (loop with wk-str = (make-string ss-lng)
>            and subseq-ht = (make-hash-table :size 503 :test #'equal)
>            and sis = (make-string-input-stream x-string)
>            and stream-pointer fixnum = 0
>            unless (eql ss-lng (read-sequence wk-str sis))
>                  do (loop-finish)
>            else
>                  if (nth-value 1 (gethash wk-str subseq-ht))
>                     do (incf (gethash wk-str subseq-ht))
>                  else
>                     do (incf (gethash (copy-seq wk-str) subseq-ht 0))
>                  end
>            end
>            do (file-position sis (incf stream-pointer))
>            finally
>               (maphash (lambda (k v)
>                                 (when (> v 1)
>                                     (print (cons k v))))
>                              subseq-ht)))
> 
> (compile *)
> 
> (extract-repeating-sub-strings 4 *t-string*) 

http://www.lispworks.com/documentation/HyperSpec/Body/18_ab.htm
18.1.2 Modifying Hash Table Keys

From reading the spec, I would infer:

You are using a string as a key for a hash-table operation.
Later you are modifying this string and are use it again
as a key for a hash-table operation. The change of the string
in combination with an EQUAL hash-table is called a
'visible modification'. The consequences of such an visible
modification are undefined. This means for me, that modifying
a string which is used several times as a key for hash-table operations
with an EQUAL hashtable should be avoided.
For each new string content, you need to cons a fresh new string,
if you want to use this string as a parameter for hash-table
operations.
From: vanekl
Subject: Re: gethash key arg question
Date: 
Message-ID: <for0l5$8v5$1@aioe.org>
Carl Taylor wrote:
> I would appreciate an explanation as to why a <copy-seq> is needed in 
> line 12 of the following <defun> to avoid garbage output.  It is only 
> needed when initially inserting a new entry to the hash table.  The 
> implementation is LW.
> 
> Thanks.
> 
> Carl Taylor
> 
> 
> (defparameter *t-string*
>   (format nil "Quuxabcdefghcarlijkl-carlmnQuuxopqrcarlstuvwxyz~
>                    123carl45678carl90abcdefQuuxQuuxabcdefghijklmno~
>                    pqrstuvwxyzfoofoo123456789foo-abcdefghijkwosjvq"))
> 
> 
> (defun extract-repeating-sub-strings (ss-lng x-string)
>   (loop with wk-str = (make-string ss-lng)
>           and subseq-ht = (make-hash-table :size 503 :test #'equal)
>           and sis = (make-string-input-stream x-string)
>           and stream-pointer fixnum = 0
>           unless (eql ss-lng (read-sequence wk-str sis))
>                 do (loop-finish)
>           else
>                 if (nth-value 1 (gethash wk-str subseq-ht))
>                    do (incf (gethash wk-str subseq-ht))
>                 else
>                    do (incf (gethash (copy-seq wk-str) subseq-ht 0))
>                 end
>           end
>           do (file-position sis (incf stream-pointer))
>           finally
>              (maphash (lambda (k v)
>                                (when (> v 1)
>                                    (print (cons k v))))
>                             subseq-ht)))
> 
> (compile *)
> 
> (extract-repeating-sub-strings 4 *t-string*)

If you're familiar with C, what's happening is the equivalent to:

1. the hash table is saving a pointer to each hash (string) key,
    but is not calling strdup on the key
2. when the loop terminates, there are 138 key pointers stored in the
    hash table, but the pointers are all identical and all point to
    the same place in memory (that happens to contain "jvqq" when the
    loop terminates, because the last read was only 3 chars, "jvq")

In other words, the hash table is trying not to do any unnecessary consing,
but leaving that up to the calling procedure.
From: Maciej Katafiasz
Subject: Re: gethash key arg question
Date: 
Message-ID: <fornfu$g3d$3@news.net.uni-c.dk>
Den Tue, 12 Feb 2008 01:04:50 +0000 skrev Carl Taylor:

> I would appreciate an explanation as to why a <copy-seq> is needed in
> line 12 of the following <defun> to avoid garbage output.  It is only
> needed when initially inserting a new entry to the hash table.  The
> implementation is LW.

A minor stylistic nitpick: it's customary to take advantage of the 
default upcasing behaviour of CL and say COPY-SEQ and DEFUN instead of 
<copy-seq> and <defun>. That makes it easy to distinguish, say, LENGTH-
the-function from length-the-word even outside code snippets.

Cheers,
Maciej