I would appreciate an explanation as to why a <copy-seq> is needed in line 12 of
the following <defun> to avoid garbage output. It is only needed when initially
inserting a new entry to the hash table. The implementation is LW.
Thanks.
Carl Taylor
(defparameter *t-string*
(format nil "Quuxabcdefghcarlijkl-carlmnQuuxopqrcarlstuvwxyz~
123carl45678carl90abcdefQuuxQuuxabcdefghijklmno~
pqrstuvwxyzfoofoo123456789foo-abcdefghijkwosjvq"))
(defun extract-repeating-sub-strings (ss-lng x-string)
(loop with wk-str = (make-string ss-lng)
and subseq-ht = (make-hash-table :size 503 :test #'equal)
and sis = (make-string-input-stream x-string)
and stream-pointer fixnum = 0
unless (eql ss-lng (read-sequence wk-str sis))
do (loop-finish)
else
if (nth-value 1 (gethash wk-str subseq-ht))
do (incf (gethash wk-str subseq-ht))
else
do (incf (gethash (copy-seq wk-str) subseq-ht 0))
end
end
do (file-position sis (incf stream-pointer))
finally
(maphash (lambda (k v)
(when (> v 1)
(print (cons k v))))
subseq-ht)))
(compile *)
(extract-repeating-sub-strings 4 *t-string*)
In article
<·······················@bgtnsc04-news.ops.worldnet.att.net>,
"Carl Taylor" <··········@att.net> wrote:
> I would appreciate an explanation as to why a <copy-seq> is needed in line 12 of
> the following <defun> to avoid garbage output. It is only needed when initially
> inserting a new entry to the hash table. The implementation is LW.
>
> Thanks.
>
> Carl Taylor
>
>
> (defparameter *t-string*
> (format nil "Quuxabcdefghcarlijkl-carlmnQuuxopqrcarlstuvwxyz~
> 123carl45678carl90abcdefQuuxQuuxabcdefghijklmno~
> pqrstuvwxyzfoofoo123456789foo-abcdefghijkwosjvq"))
>
>
> (defun extract-repeating-sub-strings (ss-lng x-string)
> (loop with wk-str = (make-string ss-lng)
> and subseq-ht = (make-hash-table :size 503 :test #'equal)
> and sis = (make-string-input-stream x-string)
> and stream-pointer fixnum = 0
> unless (eql ss-lng (read-sequence wk-str sis))
> do (loop-finish)
> else
> if (nth-value 1 (gethash wk-str subseq-ht))
> do (incf (gethash wk-str subseq-ht))
> else
> do (incf (gethash (copy-seq wk-str) subseq-ht 0))
> end
> end
> do (file-position sis (incf stream-pointer))
> finally
> (maphash (lambda (k v)
> (when (> v 1)
> (print (cons k v))))
> subseq-ht)))
>
> (compile *)
>
> (extract-repeating-sub-strings 4 *t-string*)
http://www.lispworks.com/documentation/HyperSpec/Body/18_ab.htm
18.1.2 Modifying Hash Table Keys
From reading the spec, I would infer:
You are using a string as a key for a hash-table operation.
Later you are modifying this string and are use it again
as a key for a hash-table operation. The change of the string
in combination with an EQUAL hash-table is called a
'visible modification'. The consequences of such an visible
modification are undefined. This means for me, that modifying
a string which is used several times as a key for hash-table operations
with an EQUAL hashtable should be avoided.
For each new string content, you need to cons a fresh new string,
if you want to use this string as a parameter for hash-table
operations.
Carl Taylor wrote:
> I would appreciate an explanation as to why a <copy-seq> is needed in
> line 12 of the following <defun> to avoid garbage output. It is only
> needed when initially inserting a new entry to the hash table. The
> implementation is LW.
>
> Thanks.
>
> Carl Taylor
>
>
> (defparameter *t-string*
> (format nil "Quuxabcdefghcarlijkl-carlmnQuuxopqrcarlstuvwxyz~
> 123carl45678carl90abcdefQuuxQuuxabcdefghijklmno~
> pqrstuvwxyzfoofoo123456789foo-abcdefghijkwosjvq"))
>
>
> (defun extract-repeating-sub-strings (ss-lng x-string)
> (loop with wk-str = (make-string ss-lng)
> and subseq-ht = (make-hash-table :size 503 :test #'equal)
> and sis = (make-string-input-stream x-string)
> and stream-pointer fixnum = 0
> unless (eql ss-lng (read-sequence wk-str sis))
> do (loop-finish)
> else
> if (nth-value 1 (gethash wk-str subseq-ht))
> do (incf (gethash wk-str subseq-ht))
> else
> do (incf (gethash (copy-seq wk-str) subseq-ht 0))
> end
> end
> do (file-position sis (incf stream-pointer))
> finally
> (maphash (lambda (k v)
> (when (> v 1)
> (print (cons k v))))
> subseq-ht)))
>
> (compile *)
>
> (extract-repeating-sub-strings 4 *t-string*)
If you're familiar with C, what's happening is the equivalent to:
1. the hash table is saving a pointer to each hash (string) key,
but is not calling strdup on the key
2. when the loop terminates, there are 138 key pointers stored in the
hash table, but the pointers are all identical and all point to
the same place in memory (that happens to contain "jvqq" when the
loop terminates, because the last read was only 3 chars, "jvq")
In other words, the hash table is trying not to do any unnecessary consing,
but leaving that up to the calling procedure.
Den Tue, 12 Feb 2008 01:04:50 +0000 skrev Carl Taylor:
> I would appreciate an explanation as to why a <copy-seq> is needed in
> line 12 of the following <defun> to avoid garbage output. It is only
> needed when initially inserting a new entry to the hash table. The
> implementation is LW.
A minor stylistic nitpick: it's customary to take advantage of the
default upcasing behaviour of CL and say COPY-SEQ and DEFUN instead of
<copy-seq> and <defun>. That makes it easy to distinguish, say, LENGTH-
the-function from length-the-word even outside code snippets.
Cheers,
Maciej