Comments requested on noob code

From: John Landahl
Subject: Comments requested on noob code
Date: Wed, 26 Jan 2005 18:58:07 +0000
Message-ID: <3LRJd.3784$m31.52913@typhoon.sonic.net>

Hi, all.  I'm pretty much a complete beginner at Lisp (though not a
beginning programmer), and would appreciate some feedback from experienced
Lispers on a tiny bit of code.

The aim is to use a dictionary file (one word per line,
e.g. /usr/share/dict/words) to determine how many valid words can be made
from a "rack" of letters (a la Scrabble).  Here's a sample run:

(time (find-in-file "/usr/share/dict/words" "struoi"))
Evaluation took:
     0.303 seconds of real time
     0.26596 seconds of user run time
     0.025996 seconds of system run time
     0 page faults and
     18,410,256 bytes consed.
("" "I" "Io" "Ir" "It" "Ito" "O" "Os" "Otis" "R" "Rio" "Rios" "Ru" "S" "Si"
 "Sr" "St" "Stu" "Sui" "T" "Ti" "U" "Ur" "Uris" "i" "is" "it" "its" "o" "or"
 "our" "ours" "oust" "out" "outs" "r" "riot" "riots" "rot" "rots" "rout"
 "routs" "rs" "rust" "rut" "ruts" "s" "sir" "sit" "so" "sort" "sot" "sour"
 "stir" "suit" "suitor" "t" "ti" "to" "tor" "tors" "torsi" "torus" "tour"
 "tours" "trio" "trios" "ts" "u" "us")

The code is included below.  Some specific questions I have:

  1) The speed is quite acceptable, but quite a bit of memory is
     being used.  This is no doubt because of the way match-letters
     is written, and so:
  2) I'm using recursion with first/rest and remove in match-letters;
     is there a better way to do this?
  3) I'm using "loop for ... across ..." to split a string into a list
     of chars.  Is this acceptable, or is there a more idiomatic way to
     do this?
  4) match-word is an intermediary function used for sanity checking
     and for converting the string to a char list.  Is there a cleaner
     way to do this within find-in-file's loop?
  5) Would type declarations help in way?
  6) SLIME question: why does SLIME indent slightly differently when
     editing a file versus indenting in the REPL?  Specifically, the
     else clauses are backdented two spaces when editing a file, but
     are lined up with the main clause in the REPL.

Any and all comments/suggestions are welcome.  I'm especially interested in
learning the most idiomatic way to do these things in Lisp.  Thanks in
advance.  Code follows:

(defun match-letters (letters rack)
  "Determine whether 'letters' (a list of chars) can be created from choices
   in 'rack' (also a list of chars)."
  (if (null letters)
      t
    (let ((letter (first letters)))
      (if (find letter rack)
   (match-letters (rest letters) (remove letter rack :count 1))
        nil))))

(defun match-word (word rack-letters)
  (if (or (null word))
      nil
    (let ((word-letters (loop for l across word
         collect (char-downcase l))))
      (match-letters word-letters rack-letters))))

(defun find-in-file (file-name rack)
  (let ((rack-letters (loop for l across rack
       collect (char-downcase l))))
    (with-open-file (in file-name)
      (loop for word = (read-line in nil)
     while word
     when (match-word word rack-letters)
     collect word))))

Re: Comments requested on noob code John Landahl
Re: Comments requested on noob code Christopher C. Stacy
- Re: Comments requested on noob code John Landahl
  - Re: Comments requested on noob code Christopher C. Stacy
    - Re: Comments requested on noob code John Landahl
Re: Comments requested on noob code Peter Seibel
- Re: Comments requested on noob code Peter Seibel
  - Re: Comments requested on noob code John Landahl
    - Re: Comments requested on noob code Peter Seibel
      - Re: Comments requested on noob code Peter Seibel
        Re: Comments requested on noob code John Landahl
        Re: Comments requested on noob code Peter Seibel
      - Re: Comments requested on noob code David Sletten
Re: Comments requested on noob code David Sletten
- Re: Comments requested on noob code Frank Buss
  - Re: Comments requested on noob code David Sletten
    - Re: Comments requested on noob code Frank Buss
      - Re: Comments requested on noob code Christopher C. Stacy
  - Re: Comments requested on noob code John Landahl
- Re: Comments requested on noob code John Landahl
  - Re: Comments requested on noob code David Sletten
Re: Comments requested on noob code Brian Downing

From: John Landahl
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 19:06:12 +0000
Message-ID: <ESRJd.3786$m31.52884@typhoon.sonic.net>

Sorry for the typo:

>   5) Would type declarations help in way?

s/in way/in any way/

From: Christopher C. Stacy
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 19:16:38 +0000
Message-ID: <uu0p4q9uy.fsf@news.dtpq.com>

This is not strictly an "idiomatic Lisp" question, but rather an 
issue with the algorithm that you are using to match the words.
Exploding the letters of the word into a list, rather than just
dealing with the string representation of the word is going to 
use up a lot of memory.  You are also using up memory when you
create a new character object to do your lowercase comparison.

Why not just map over the string in the first place, with a 
non-consing (no new memory allocation) case-insensitive 
testing function?  

I'd suggest looking more closely at the arguments that you
could give to the mapping and sequence functions (like FIND), 
and explore the character and string functions (such as CHAR-EQUAL).

From: John Landahl
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 19:42:33 +0000
Message-ID: <JoSJd.3792$m31.53039@typhoon.sonic.net>

Christopher C. Stacy wrote:

> This is not strictly an "idiomatic Lisp" question, but rather an
> issue with the algorithm that you are using to match the words.

I was as interested in comments on the algorithm as on idiomatic concerns. 
I don't know enough Lisp that an alternative implementation immediately
suggests itself.

> ...
> I'd suggest looking more closely at the arguments that you
> could give to the mapping and sequence functions (like FIND),
> and explore the character and string functions (such as CHAR-EQUAL).

I thought about extra arguments to FIND and co., but the "rack" might have
multiples of the same character (e.g. "fctaaht" can make "attach") and I
didn't know if they could be used to effectively deal with that.

- John

From: Christopher C. Stacy
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 21:55:57 +0000
Message-ID: <umzuvrh1v.fsf@news.dtpq.com>

John Landahl <····@landahl.org> writes:

> Christopher C. Stacy wrote:
> 
> > This is not strictly an "idiomatic Lisp" question, but rather an
> > issue with the algorithm that you are using to match the words.
> 
> I was as interested in comments on the algorithm as on idiomatic concerns. 
> I don't know enough Lisp that an alternative implementation immediately
> suggests itself.

There is no standard Lisp idiom that implements Scrabble,
but the Lisp code that you have presented uses the functions
and techniques that you have selected in a normal way that is
easily understood by Lisp programmers.  Your complaint that the
program uses way too much memory simply indicates that your 
algorithm may be inappropriate.  Wouldn't you have the same problem
if you implemented this the same way in any other language?

The question that you seem to be asking is not about Lisp programming
idioms, but about performance, specifically memory usage.  I pointed
out places where you could revise your algorithm to not allocate
a lot of memory for intermediate data structures.

I thought I was being pretty clear when I said that in Lisp we do 
not normally operate on strings by iterating across each character 
in order to package them up into a linked list structure and then
perform recursion on that structure (and creating additional list
structures along the way).

You stated that you are "not a beginning programmer", so I now ask you
how you would write the program in whatever your favorite language is.
There's nothing wrong with your toplevel function, FIND-IN-FILE.  
All the interesting work happens in the MATCH-WORD subroutine.

If I understand what this program is supposed to do, it's just 
handed two piles of letters, and it calculates whether there are
enough latters in one pile to construct the second pile of letters.

So the big hint for you is that this program is about counting letters.
I again suggest that you take a look at the sequence functions.

From: John Landahl
Subject: Re: Comments requested on noob code
Date: Thu, 27 Jan 2005 03:40:22 +0000
Message-ID: <GoZJd.3869$m31.53791@typhoon.sonic.net>

Christopher C. Stacy wrote:

> There is no standard Lisp idiom that implements Scrabble,
> but the Lisp code that you have presented uses the functions
> and techniques that you have selected in a normal way that is
> easily understood by Lisp programmers.  Your complaint that the
> program uses way too much memory simply indicates that your
> algorithm may be inappropriate.

I may have overemphasized the point slightly.  I really didn't think it was
that big a deal, I just wondered if there was a more efficient solution
that still looked "right" to a Lisp programmer.

> Wouldn't you have the same problem 
> if you implemented this the same way in any other language?

It would no doubt be far worse, since lists are not typically implemented as
conses in another language I might use (e.g. Python).  It seemed like a
fairly standard thing to do in Lisp, though, at least according to examples
I've seen in books.

> The question that you seem to be asking is not about Lisp programming
> idioms, but about performance, specifically memory usage.
> ...

It was just one of my questions (ok, two).  I was more interested in knowing
if I was writing code that looked in remotely like something a Lisp
programmer would come up with.  Trying to avoid writing FORTRAN in Lisp, as
the saying goes.  Thanks for your help.

- John

From: Peter Seibel
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 19:26:24 +0000
Message-ID: <m3651k0z83.fsf@javamonkey.com>

John Landahl <····@landahl.org> writes:

> Hi, all. I'm pretty much a complete beginner at Lisp (though not a
> beginning programmer), and would appreciate some feedback from
> experienced Lispers on a tiny bit of code.
>
> The aim is to use a dictionary file (one word per line,
> e.g. /usr/share/dict/words) to determine how many valid words can be made
> from a "rack" of letters (a la Scrabble).  Here's a sample run:
>
> (time (find-in-file "/usr/share/dict/words" "struoi"))
> Evaluation took:
>      0.303 seconds of real time
>      0.26596 seconds of user run time
>      0.025996 seconds of system run time
>      0 page faults and
>      18,410,256 bytes consed.
> ("" "I" "Io" "Ir" "It" "Ito" "O" "Os" "Otis" "R" "Rio" "Rios" "Ru" "S" "Si"
>  "Sr" "St" "Stu" "Sui" "T" "Ti" "U" "Ur" "Uris" "i" "is" "it" "its" "o" "or"
>  "our" "ours" "oust" "out" "outs" "r" "riot" "riots" "rot" "rots" "rout"
>  "routs" "rs" "rust" "rut" "ruts" "s" "sir" "sit" "so" "sort" "sot" "sour"
>  "stir" "suit" "suitor" "t" "ti" "to" "tor" "tors" "torsi" "torus" "tour"
>  "tours" "trio" "trios" "ts" "u" "us")
>
> The code is included below.  Some specific questions I have:
>
>   1) The speed is quite acceptable, but quite a bit of memory is
>      being used.  This is no doubt because of the way match-letters
>      is written, and so:
>   2) I'm using recursion with first/rest and remove in match-letters;
>      is there a better way to do this?
>   3) I'm using "loop for ... across ..." to split a string into a list
>      of chars.  Is this acceptable, or is there a more idiomatic way to
>      do this?

That's fine if that's what you really want. But in this case you
probably don't need to make a list each time; you can operate on the
strings as sequences. What about something like this:

  (defun match-word (word sorted-rack)
    (loop for char across (sort (copy-seq word) #'char<)
         for start = 0 then (1+ idx)
         for idx = (position char sorted-rack :start start)
         always idx))

  (defun find-in-file (file-name rack)
    (with-open-file (in file-name)
      (loop with sorted-rack = (sort (copy-seq rack) #'char<)
         for word = (read-line in nil)
         while word
         when (match-word word rack) collect word)))

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Peter Seibel
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 19:29:30 +0000
Message-ID: <m31xc80z2x.fsf@javamonkey.com>

Peter Seibel <·····@javamonkey.com> writes:

> John Landahl <····@landahl.org> writes:
>
>> Hi, all. I'm pretty much a complete beginner at Lisp (though not a
>> beginning programmer), and would appreciate some feedback from
>> experienced Lispers on a tiny bit of code.
>>
>> The aim is to use a dictionary file (one word per line,
>> e.g. /usr/share/dict/words) to determine how many valid words can be made
>> from a "rack" of letters (a la Scrabble).  Here's a sample run:
>>
>> (time (find-in-file "/usr/share/dict/words" "struoi"))
>> Evaluation took:
>>      0.303 seconds of real time
>>      0.26596 seconds of user run time
>>      0.025996 seconds of system run time
>>      0 page faults and
>>      18,410,256 bytes consed.
>> ("" "I" "Io" "Ir" "It" "Ito" "O" "Os" "Otis" "R" "Rio" "Rios" "Ru" "S" "Si"
>>  "Sr" "St" "Stu" "Sui" "T" "Ti" "U" "Ur" "Uris" "i" "is" "it" "its" "o" "or"
>>  "our" "ours" "oust" "out" "outs" "r" "riot" "riots" "rot" "rots" "rout"
>>  "routs" "rs" "rust" "rut" "ruts" "s" "sir" "sit" "so" "sort" "sot" "sour"
>>  "stir" "suit" "suitor" "t" "ti" "to" "tor" "tors" "torsi" "torus" "tour"
>>  "tours" "trio" "trios" "ts" "u" "us")
>>
>> The code is included below.  Some specific questions I have:
>>
>>   1) The speed is quite acceptable, but quite a bit of memory is
>>      being used.  This is no doubt because of the way match-letters
>>      is written, and so:
>>   2) I'm using recursion with first/rest and remove in match-letters;
>>      is there a better way to do this?
>>   3) I'm using "loop for ... across ..." to split a string into a list
>>      of chars.  Is this acceptable, or is there a more idiomatic way to
>>      do this?
>
> That's fine if that's what you really want. But in this case you
> probably don't need to make a list each time; you can operate on the
> strings as sequences. What about something like this:
>
>   (defun match-word (word sorted-rack)
>     (loop for char across (sort (copy-seq word) #'char<)
>          for start = 0 then (1+ idx)
>          for idx = (position char sorted-rack :start start)
>          always idx))
>
>   (defun find-in-file (file-name rack)
>     (with-open-file (in file-name)
>       (loop with sorted-rack = (sort (copy-seq rack) #'char<)
>          for word = (read-line in nil)
>          while word
>          when (match-word word rack) collect word)))

Whoops, you'll want to use STRING-DOWNCASE around the COPY-SEQ to get
back case insensitivity. (Don't replace COPY-SEQ with STRING-DOWNCASE
since although its not "destructive" it can return the original string
if it is already all lowercase in which case the SORT will destroy
your original string.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: John Landahl
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 20:05:37 +0000
Message-ID: <lKSJd.3800$m31.53168@typhoon.sonic.net>

Peter Seibel wrote:

> Peter Seibel <·····@javamonkey.com> writes:
> ...
>> That's fine if that's what you really want. But in this case you
>> probably don't need to make a list each time; you can operate on the
>> strings as sequences. What about something like this:
>>
>>   (defun match-word (word sorted-rack)
>>     (loop for char across (sort (copy-seq word) #'char<)
>>          for start = 0 then (1+ idx)
>>          for idx = (position char sorted-rack :start start)
>>          always idx))
>>
>>   (defun find-in-file (file-name rack)
>>     (with-open-file (in file-name)
>>       (loop with sorted-rack = (sort (copy-seq rack) #'char<)
>>          for word = (read-line in nil)
>>          while word
>>          when (match-word word rack) collect word)))
> 
> Whoops, you'll want to use STRING-DOWNCASE around the COPY-SEQ to get
> back case insensitivity.
> ...

Thanks for taking the time to look at this, Peter -- it must be a really
scarce commodity for you right now. :)  Your solution is much cleaner,
clearer, and better demonstrates the elegance of LOOP.  Oddly, though, it
takes longer to run and uses more memory:

(time (find-in-file "/usr/share/dict/words" "struoi"))
Evaluation took:
   1.358 seconds of real time
   1.286804 seconds of user run time
   0.046993 seconds of system run time
   0 page faults and
   37,768,608 bytes consed.

... versus the original:

(time (find-in-file "/usr/share/dict/words" "struoi"))
Evaluation took:
���0.303�seconds�of�real�time
���0.26596�seconds�of�user�run�time
���0.025996�seconds�of�system�run�time
���0�page�faults�and
���18,410,256�bytes�consed.

Any idea why that would be the case?

- John

From: Peter Seibel
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 20:13:45 +0000
Message-ID: <m3wtu0ymnu.fsf@javamonkey.com>

John Landahl <····@landahl.org> writes:

> Peter Seibel wrote:
>
>> Peter Seibel <·····@javamonkey.com> writes:
>> ...
>>> That's fine if that's what you really want. But in this case you
>>> probably don't need to make a list each time; you can operate on the
>>> strings as sequences. What about something like this:
>>>
>>>   (defun match-word (word sorted-rack)
>>>     (loop for char across (sort (copy-seq word) #'char<)
>>>          for start = 0 then (1+ idx)
>>>          for idx = (position char sorted-rack :start start)
>>>          always idx))
>>>
>>>   (defun find-in-file (file-name rack)
>>>     (with-open-file (in file-name)
>>>       (loop with sorted-rack = (sort (copy-seq rack) #'char<)
>>>          for word = (read-line in nil)
>>>          while word
>>>          when (match-word word rack) collect word)))
>> 
>> Whoops, you'll want to use STRING-DOWNCASE around the COPY-SEQ to get
>> back case insensitivity.
>> ...
>
> Thanks for taking the time to look at this, Peter -- it must be a really
> scarce commodity for you right now. :)  Your solution is much cleaner,
> clearer, and better demonstrates the elegance of LOOP.  Oddly, though, it
> takes longer to run and uses more memory:

Well, if you put in the STRING-DOWNCASE it's (sadly) probably making
two copies of the word--once in COPY-SEQ and once in STRING-DOWNCASE.
But you can't count on STRING-DOWNCASE always making a copy. Duh--it
just hit me, use NSTRING-DOWNCASE since you know you're dealing with a
fresh string. Beyond that, I have no idea.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Peter Seibel
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 20:21:21 +0000
Message-ID: <m3oefcymb5.fsf@javamonkey.com>

Peter Seibel <·····@javamonkey.com> writes:

> John Landahl <····@landahl.org> writes:
>
>> Peter Seibel wrote:
>>
>>> Peter Seibel <·····@javamonkey.com> writes:
>>> ...
>>>> That's fine if that's what you really want. But in this case you
>>>> probably don't need to make a list each time; you can operate on the
>>>> strings as sequences. What about something like this:
>>>>
>>>>   (defun match-word (word sorted-rack)
>>>>     (loop for char across (sort (copy-seq word) #'char<)
>>>>          for start = 0 then (1+ idx)
>>>>          for idx = (position char sorted-rack :start start)
>>>>          always idx))
>>>>
>>>>   (defun find-in-file (file-name rack)
>>>>     (with-open-file (in file-name)
>>>>       (loop with sorted-rack = (sort (copy-seq rack) #'char<)
>>>>          for word = (read-line in nil)
>>>>          while word
>>>>          when (match-word word rack) collect word)))
>>> 
>>> Whoops, you'll want to use STRING-DOWNCASE around the COPY-SEQ to get
>>> back case insensitivity.
>>> ...
>>
>> Thanks for taking the time to look at this, Peter -- it must be a really
>> scarce commodity for you right now. :)  Your solution is much cleaner,
>> clearer, and better demonstrates the elegance of LOOP.  Oddly, though, it
>> takes longer to run and uses more memory:
>
> Well, if you put in the STRING-DOWNCASE it's (sadly) probably making
> two copies of the word--once in COPY-SEQ and once in STRING-DOWNCASE.
> But you can't count on STRING-DOWNCASE always making a copy. Duh--it
> just hit me, use NSTRING-DOWNCASE since you know you're dealing with a
> fresh string. Beyond that, I have no idea.

Oh, and find-in-file has a bug, it's passing RACK instead of
SORTED-RACK to MATCH-WORD. Dunno if it'll run faster or slower when
you fix that but at least it'll be correct.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: John Landahl
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 20:28:49 +0000
Message-ID: <54TJd.3805$m31.53269@typhoon.sonic.net>

Peter Seibel wrote:
> Oh, and find-in-file has a bug, it's passing RACK instead of
> SORTED-RACK to MATCH-WORD. Dunno if it'll run faster or slower when
> you fix that but at least it'll be correct.

I did fix that.  And by the way, I tried it first without string-downcase
and it didn't make that big of a difference time/memory-wise.

- John

From: Peter Seibel
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 23:47:02 +0000
Message-ID: <m3fz0nycsd.fsf@javamonkey.com>

John Landahl <····@landahl.org> writes:

> Peter Seibel wrote:
>> Oh, and find-in-file has a bug, it's passing RACK instead of
>> SORTED-RACK to MATCH-WORD. Dunno if it'll run faster or slower when
>> you fix that but at least it'll be correct.
>
> I did fix that.  And by the way, I tried it first without string-downcase
> and it didn't make that big of a difference time/memory-wise.

Okay, here's my best offer. Basically the same algorithm as your
original but without all the needless list creation.

  (defun find-in-file (file-name rack)
    (with-open-file (in file-name)
      (loop for word = (read-line in nil nil) while word
         when (match-word word rack) collect word)))

  (defun match-word (word rack)
    (loop with tiles = (copy-seq rack)
       for char across word
       always (find char tiles :test #'char-equal)
       do (setf tiles (delete char tiles :test #'char-equal :count 1))))

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: David Sletten
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 20:23:46 +0000
Message-ID: <m%SJd.1100$e11.646@twister.socal.rr.com>

Peter Seibel wrote:

> John Landahl <····@landahl.org> writes:
> 
> 
>>Peter Seibel wrote:
>>
>>
>>>Peter Seibel <·····@javamonkey.com> writes:
>>>...
>>>
>>>>That's fine if that's what you really want. But in this case you
>>>>probably don't need to make a list each time; you can operate on the
>>>>strings as sequences. What about something like this:
>>>>
>>>>  (defun match-word (word sorted-rack)
>>>>    (loop for char across (sort (copy-seq word) #'char<)
>>>>         for start = 0 then (1+ idx)
>>>>         for idx = (position char sorted-rack :start start)
>>>>         always idx))
>>>>
>>>>  (defun find-in-file (file-name rack)
>>>>    (with-open-file (in file-name)
>>>>      (loop with sorted-rack = (sort (copy-seq rack) #'char<)
>>>>         for word = (read-line in nil)
>>>>         while word
>>>>         when (match-word word rack) collect word)))
>>>
>>>Whoops, you'll want to use STRING-DOWNCASE around the COPY-SEQ to get
>>>back case insensitivity.
>>>...
>>
>>Thanks for taking the time to look at this, Peter -- it must be a really
>>scarce commodity for you right now. :)  Your solution is much cleaner,
>>clearer, and better demonstrates the elegance of LOOP.  Oddly, though, it
>>takes longer to run and uses more memory:
> 
> 
> Well, if you put in the STRING-DOWNCASE it's (sadly) probably making
> two copies of the word--once in COPY-SEQ and once in STRING-DOWNCASE.
> But you can't count on STRING-DOWNCASE always making a copy. Duh--it
> just hit me, use NSTRING-DOWNCASE since you know you're dealing with a
> fresh string. Beyond that, I have no idea.
> 
> -Peter
> 
It doesn't seem to return the same results either...
("I" "i" "O" "o" "R" "r" "S" "s" "st" "T" "t" "tu" "U" "u" "ur" "us" 
"ust" "ut")

From: David Sletten
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 20:20:49 +0000
Message-ID: <BYSJd.1099$e11.642@twister.socal.rr.com>

John Landahl wrote:

> Hi, all.  I'm pretty much a complete beginner at Lisp (though not a
> beginning programmer), and would appreciate some feedback from experienced
> Lispers on a tiny bit of code.
> 
> The aim is to use a dictionary file (one word per line,
> e.g. /usr/share/dict/words) to determine how many valid words can be made
> from a "rack" of letters (a la Scrabble).  Here's a sample run:
> 
> (time (find-in-file "/usr/share/dict/words" "struoi"))
> Evaluation took:
>      0.303 seconds of real time
>      0.26596 seconds of user run time
>      0.025996 seconds of system run time
>      0 page faults and
>      18,410,256 bytes consed.
> ("" "I" "Io" "Ir" "It" "Ito" "O" "Os" "Otis" "R" "Rio" "Rios" "Ru" "S" "Si"
>  "Sr" "St" "Stu" "Sui" "T" "Ti" "U" "Ur" "Uris" "i" "is" "it" "its" "o" "or"
>  "our" "ours" "oust" "out" "outs" "r" "riot" "riots" "rot" "rots" "rout"
>  "routs" "rs" "rust" "rut" "ruts" "s" "sir" "sit" "so" "sort" "sot" "sour"
>  "stir" "suit" "suitor" "t" "ti" "to" "tor" "tors" "torsi" "torus" "tour"
>  "tours" "trio" "trios" "ts" "u" "us")
> 
> The code is included below.  Some specific questions I have:
> 
>   1) The speed is quite acceptable, but quite a bit of memory is
>      being used.  This is no doubt because of the way match-letters
>      is written, and so:
>   2) I'm using recursion with first/rest and remove in match-letters;
>      is there a better way to do this?
>   3) I'm using "loop for ... across ..." to split a string into a list
>      of chars.  Is this acceptable, or is there a more idiomatic way to
>      do this?
>   4) match-word is an intermediary function used for sanity checking
>      and for converting the string to a char list.  Is there a cleaner
>      way to do this within find-in-file's loop?
>   5) Would type declarations help in way?
>   6) SLIME question: why does SLIME indent slightly differently when
>      editing a file versus indenting in the REPL?  Specifically, the
>      else clauses are backdented two spaces when editing a file, but
>      are lined up with the main clause in the REPL.
> 
> Any and all comments/suggestions are welcome.  I'm especially interested in
> learning the most idiomatic way to do these things in Lisp.  Thanks in
> advance.  Code follows:
> 
> (defun match-letters (letters rack)
>   "Determine whether 'letters' (a list of chars) can be created from choices
>    in 'rack' (also a list of chars)."
>   (if (null letters)
>       t
>     (let ((letter (first letters)))
>       (if (find letter rack)
>    (match-letters (rest letters) (remove letter rack :count 1))
>         nil))))
> 
> (defun match-word (word rack-letters)
>   (if (or (null word))
>       nil
>     (let ((word-letters (loop for l across word
>          collect (char-downcase l))))
>       (match-letters word-letters rack-letters))))
> 
> (defun find-in-file (file-name rack)
>   (let ((rack-letters (loop for l across rack
>        collect (char-downcase l))))
>     (with-open-file (in file-name)
>       (loop for word = (read-line in nil)
>      while word
>      when (match-word word rack-letters)
>      collect word))))

John,

I'm not claiming any land speed records, but on my system this takes 
about 68% of the time yours does and uses about 54% of the memory:

(defun match-word (word rack-hash)
   (loop with lowercase-word = (string-downcase word)
         for ch across (remove-duplicates lowercase-word)
         always (<= (count ch lowercase-word) (gethash ch rack-hash 0))
         finally (return word)))

(defun string->hash (s)
   (let ((h (make-hash-table :test #'eql)))
     (loop for char across (string-downcase s)
           do (setf (gethash char h) (1+ (gethash char h 0)))
           finally (return h))))

(defun find-in-file (file-name rack)
   (with-open-file (in file-name)
     (loop with rack-hash = (string->hash rack)
           with rack-length = (length rack)
           for word = (read-line in nil nil)
           while word
           when (and (<= (length word) rack-length)
                     (match-word (string-downcase word) rack-hash))
           collect word)))

The basic plan is to take your "rack" and make a hash table counting 
each "tile". As we examine the words in the dictionary we immediately 
discard those that are longer than the rack. If we have the right 
length, then we look at the individual letters. In MATCH-WORD I use the 
COUNT function to compare unique characters in the target word with 
those in the rack hash. You may want to experiment with creating a hash 
for each target word instead...

David Sletten

From: Frank Buss
Subject: Re: Comments requested on noob code
Date: Thu, 27 Jan 2005 00:20:45 +0000
Message-ID: <ct9c4t$f5m$1@newsreader2.netcologne.de>

David Sletten <·····@slytobias.com> wrote:

> The basic plan is to take your "rack" and make a hash table counting 
> each "tile". As we examine the words in the dictionary we immediately 
> discard those that are longer than the rack. 

good idea. For short search strings this implementation is ca. 50% faster:

(defparameter *primes*
  #(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101))

(defun word-prime (word)
  (reduce #'(lambda (a b) 
              (* a (svref *primes* (- (char-code b)
                                      (char-code #\a))))) 
          (nstring-downcase word) :initial-value 1))

(defun find-in-file (file-name rack)
  (let ((rack-prime (word-prime rack))
        (rack-length (length rack)))
    (with-open-file (in file-name)
      (loop for word = (read-line in nil)
            while word
            when (and (<= (length word) rack-length)
                      (= 0 (rem rack-prime (word-prime word))))
            collect word))))


-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: David Sletten
Subject: Re: Comments requested on noob code
Date: Thu, 27 Jan 2005 00:34:46 +0000
Message-ID: <GGWJd.2747$e11.1683@twister.socal.rr.com>

Frank Buss wrote:
> David Sletten <·····@slytobias.com> wrote:
> 
> 
>>The basic plan is to take your "rack" and make a hash table counting 
>>each "tile". As we examine the words in the dictionary we immediately 
>>discard those that are longer than the rack. 
> 
> 
> good idea. For short search strings this implementation is ca. 50% faster:
> 
> (defparameter *primes*
>   #(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101))
> 
> (defun word-prime (word)
>   (reduce #'(lambda (a b) 
>               (* a (svref *primes* (- (char-code b)
>                                       (char-code #\a))))) 
>           (nstring-downcase word) :initial-value 1))
> 
> (defun find-in-file (file-name rack)
>   (let ((rack-prime (word-prime rack))
>         (rack-length (length rack)))
>     (with-open-file (in file-name)
>       (loop for word = (read-line in nil)
>             while word
>             when (and (<= (length word) rack-length)
>                       (= 0 (rem rack-prime (word-prime word))))
>             collect word))))
> 
> 
That is fantastic! So WORD-PRIME "factors" a word based solely on the 
characters it contains irrespective of order but taking repeated 
characters into accout. Just as any natural number has a unique prime 
factorization (Fundamental Theorem of Arithmetic), any bag (not 
sequence) of characters has a unique prime factorization associated with 
it via WORD-PRIME. Any word whose own value is a "factor" of the rack is 
kept. Very nice.

David Sletten

From: Frank Buss
Subject: Re: Comments requested on noob code
Date: Thu, 27 Jan 2005 00:41:02 +0000
Message-ID: <ct9dau$fla$1@newsreader2.netcologne.de>

David Sletten <·····@slytobias.com> wrote:

> That is fantastic! So WORD-PRIME "factors" a word based solely on the 
> characters it contains irrespective of order but taking repeated 
> characters into accout. Just as any natural number has a unique prime 
> factorization 

thanks, but it's not all my idea. I've seen a class called "Prime*" in the 
Add-A-Gram Java solution yesterday and then it was obvious, without looking 
inside the Java source :-)

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Christopher C. Stacy
Subject: Re: Comments requested on noob code
Date: Thu, 27 Jan 2005 07:22:49 +0000
Message-ID: <uhdl3qqt3.fsf@news.dtpq.com>

Frank Buss <··@frank-buss.de> writes:
> David Sletten <·····@slytobias.com> wrote:
> 
> > That is fantastic! So WORD-PRIME "factors" a word based solely on the 
> > characters it contains irrespective of order but taking repeated 
> > characters into accout. Just as any natural number has a unique prime 
> > factorization 
> 
> thanks, but it's not all my idea. I've seen a class called "Prime*" in the 
> Add-A-Gram Java solution yesterday and then it was obvious, without looking 
> inside the Java source :-)

That's the coolest solution, I think!

I was thinking along the following lines:

----------------------------------------------------------------------
(defun find-in-file (rack &optional (file *dictionary-file*))
  (with-open-file (dict file)
    (loop for word = (read-line dict nil)
            until (null word)
            when (match-word word rack)
            collect word)))

(defun match-word (word rack)
  (unless (> (length word) (length rack)) ;Not enough letters
    (every #'(lambda (c) (enough-chars c word rack))
           word)))

(defun enough-chars (c word rack)
  (let ((number-in-rack (count c rack :test #'char-equal)))
    (and (not (zerop number-in-rack))
         (>= number-in-rack
             (count c word :test #'char-equal)))))
----------------------------------------------------------------------

This doesn't cons at all, except for the READ-LINE (and of course the
final answer list of words that it returns).  If this were a real
program, I would either read the entire dictionary into memory just
once, or use a buffered IO stream that did not cons.

You could also represent each dictionary word as a vector of 
character counts.  Loop across this vector ignoring zeros,
using the index as an ASCII code; COUNT the number of characters
in the rack as above.  This means each word takes 26 iterations,
but you only have to count the characters in the word one time,
rather than every word in the dictionary over again for every rack.

I like mine because I think it's obvious what's going on.
ENOUGH-CHARS could be folded into that LAMBDA, of course.
If you're looking for examples, it shows sequence mapping functions,
the ever useful :TEST keyword argument, and an anonymous function.
Like the original program, it shows WITH-OPEN-FILE and a simple
LOOP collection.

From: John Landahl
Subject: Re: Comments requested on noob code
Date: Thu, 27 Jan 2005 03:13:17 +0000
Message-ID: <h%YJd.3868$m31.53933@typhoon.sonic.net>

Frank Buss wrote:

> David Sletten <·····@slytobias.com> wrote:
> 
>> The basic plan is to take your "rack" and make a hash table counting
>> each "tile". As we examine the words in the dictionary we immediately
>> discard those that are longer than the rack.
> 
> good idea. For short search strings this implementation is ca. 50% faster:
> 
> (defparameter *primes*
>   #(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
>   #101))
> 
> (defun word-prime (word)
>   (reduce #'(lambda (a b)
>               (* a (svref *primes* (- (char-code b)
>                                       (char-code #\a)))))
>           (nstring-downcase word) :initial-value 1))
> 
> (defun find-in-file (file-name rack)
>   (let ((rack-prime (word-prime rack))
>         (rack-length (length rack)))
>     (with-open-file (in file-name)
>       (loop for word = (read-line in nil)
>             while word
>             when (and (<= (length word) rack-length)
>                       (= 0 (rem rack-prime (word-prime word))))
>             collect word))))

Wow, great idea, and nice use of reduce.  Part of the learning experience
for me is getting general FP idioms into my head so that I think to use
them more often.

The only problem with the above code is that my words file has entries with
apostrophes and even some with extended characters, which will choke
word-prime.  A little extra code for error handling should do the trick.

- John

From: John Landahl
Subject: Re: Comments requested on noob code
Date: Thu, 27 Jan 2005 03:07:56 +0000
Message-ID: <gWYJd.3866$m31.53895@typhoon.sonic.net>

David Sletten wrote:
> ...
> I'm not claiming any land speed records, but on my system this takes
> about 68% of the time yours does and uses about 54% of the memory:
> ...
> The basic plan is to take your "rack" and make a hash table counting
> each "tile".

Right, this is how I would do it in Python or what-have-you.  I didn't do
this initially in Lisp because the recursive solution seemed closer to the
spirit of a lot of other Lisp code I've seen (at least in books...).  I
also thought I read somewhere that hash tables were somewhat inefficient
for small collections.

> As we examine the words in the dictionary we immediately 
> discard those that are longer than the rack.

D'oh, right, should have thought of that.

> In MATCH-WORD I use the
> COUNT function to compare unique characters in the target word with
> those in the rack hash.

This is quite a nice solution, and I really like how you integrated it
within LOOP.

> You may want to experiment with creating a hash 
> for each target word instead...

Seems like that would involve a lot more overhead (my words file has >96,000
entries, for example).

Thanks for the suggestions and code.  It's beautiful to see the whole thing
done in three LOOP statements.

- John

From: David Sletten
Subject: Re: Comments requested on noob code
Date: Thu, 27 Jan 2005 05:19:58 +0000
Message-ID: <2S_Jd.2764$e11.1012@twister.socal.rr.com>

John Landahl wrote:


> 
> Thanks for the suggestions and code.  It's beautiful to see the whole thing
> done in three LOOP statements.

The funny thing is that up until a few months ago I never used LOOP! 
There are a lot of neat things you can do with it though. Take a look at 
Peter Seibel's book:
http://www.gigamonkeys.com/book/loop-for-black-belts.html

As you may be aware, LOOP is a macro, so it's often instructive to use 
MACROEXPAND to see what code is actually generated. In particular, study 
how LOOP implements the COLLECT directive. This is LispWorks:
(macroexpand '(loop for i from 1 upto 10 collect i)) =>
(BLOCK NIL
   (MACROLET ((LOOP-FINISH () '(GO #:|end-loop-741|)))
     (LET ((I 1) (#:|to-744| 10) (#:|by-745| 1))
       (LET ((#:|accumulator-742| (LIST NIL)))
         (DECLARE (TYPE LIST #:|accumulator-742|))
         (LET ((#:|aux-var-747| #:|accumulator-742|))
           (TAGBODY (PROGN (WHEN (OR (> I #:|to-744|)) (GO 
#:|end-loop-741|)))
            #:|begin-loop-740| (SETQ #:|aux-var-747| (LAST (RPLACD 
#:|aux-var-747| (LIST I))))
                    (PROGN
                      (LET ((#:|temp-746| (+ I #:|by-745|))) (SETQ I 
#:|temp-746|))
                      (WHEN (OR (> I #:|to-744|)) (GO #:|end-loop-741|)))
                    (GO #:|begin-loop-740|)
            #:|end-loop-741| (RETURN-FROM NIL (CDR 
#:|accumulator-742|))))))))

Notice the interaction with #:|accumulator-742| and #:|aux-var-747| 
(Hint: the accumulator points to the (head of the) result, while aux-var 
adds elements to the end of the list.)

David Sletten

From: Brian Downing
Subject: Re: Comments requested on noob code
Date: Wed, 26 Jan 2005 23:19:15 +0000
Message-ID: <TzVJd.22878$4I2.17812@attbi_s01>

In article <····················@typhoon.sonic.net>,
John Landahl  <····@landahl.org> wrote:
> Hi, all.  I'm pretty much a complete beginner at Lisp (though not a
> beginning programmer), and would appreciate some feedback from experienced
> Lispers on a tiny bit of code.
> 
> The aim is to use a dictionary file (one word per line,
> e.g. /usr/share/dict/words) to determine how many valid words can be made
> from a "rack" of letters (a la Scrabble).  Here's a sample run:

What follows is an algorithm change, but contains consless code for the
main search, which should help with some of your Lisp questions.

I've slightly modified the sample code to read out of a dictionary list,
as reading from the file takes a significant amount of time on my Lisp.
Since I assume you're worried about the algorithm speed, this is
confusing.

Here are the original timings on my computer (500MHz PowerBook, SBCL
0.8.16), modified as described above:

CL-USER> (time (find-in-dictionary *dictionary* "struoi"))
Evaluation took:
  		 4.139 seconds of real time
  		 3.32 seconds of user run time
  		 0.45 seconds of system run time
  		 0 page faults and
  		 39,032,424 bytes consed.
("I" "i" "Io" "io" "is" "iso" "ist" "it" "Ito" "its" "O" "o" "or" "ort" "Os"
 "os" "Otis" "Otus" "our" "ours" "oust" "out" "R" "r" "Rio" "rio" "riot" "rist"
 "rit" "Ro" "roi" "Roist" "roit" "rot" "roust" "rout" "Rus" "rusot" "rust"
 "rut" "S" "s" "si" "Sir" "sir" "sit" "so" "sori" "sort" "sot" "sou" "sour"
 "Sri" "sri" "sruti" "st" "stir" "stour" "Sui" "suit" "suitor" "sur" "Suto"
 "sutor" "T" "t" "ti" "Tiou" "to" "toi" "tor" "toru" "torus" "tou" "tour" "tri"
 "Trio" "trio" "tu" "tui" "tur" "Turi" "turio" "tursio" "U" "u" "ur" "Uro" "us"
 "ust" "ut")

Now, I've made a 26-ary lookup tree for the rack.  This makes looking up
each character a very cheap operation, and consless.  Building the tree
isn't too terribly expensive, either (and only has to be done once per
search, verses once per word):

CL-USER> (time (progn (make-rack-continuation "struoi") nil))
Evaluation took:
  		 0.064 seconds of real time
  		 0.02 seconds of user run time
  		 0.0 seconds of system run time
  		 0 page faults and
  		 185,592 bytes consed.
NIL
CL-USER> (time (progn (make-rack-continuation "struoil") nil))
Evaluation took:
  		 0.269 seconds of real time
  		 0.16 seconds of user run time
  		 0.02 seconds of system run time
  		 0 page faults and
  		 1,298,800 bytes consed.
NIL

The full search:

CL-USER> (time (find-in-dictionary *dictionary* "struoi"))
Evaluation took:
  		 0.274 seconds of real time
  		 0.16 seconds of user run time
  		 0.0 seconds of system run time
  		 0 page faults and
  		 186,352 bytes consed.
("I" "i" "Io" "io" "is" "iso" "ist" "it" "Ito" "its" "O" "o" "or" "ort" "Os"
 "os" "Otis" "Otus" "our" "ours" "oust" "out" "R" "r" "Rio" "rio" "riot" "rist"
 "rit" "Ro" "roi" "Roist" "roit" "rot" "roust" "rout" "Rus" "rusot" "rust"
 "rut" "S" "s" "si" "Sir" "sir" "sit" "so" "sori" "sort" "sot" "sou" "sour"
 "Sri" "sri" "sruti" "st" "stir" "stour" "Sui" "suit" "suitor" "sur" "Suto"
 "sutor" "T" "t" "ti" "Tiou" "to" "toi" "tor" "toru" "torus" "tou" "tour" "tri"
 "Trio" "trio" "tu" "tui" "tur" "Turi" "turio" "tursio" "U" "u" "ur" "Uro" "us"
 "ust" "ut")

A bigger search:

CL-USER> (time (find-in-dictionary *dictionary* "struoil"))
Evaluation took:
  		 0.388 seconds of real time
  		 0.28 seconds of user run time
  		 0.02 seconds of system run time
  		 0 page faults and
  		 1,300,024 bytes consed.
("I" "i" "ilot" "Io" "io" "is" "islot" "iso" "ist" "it" "Ito" "its" "L" "l"
 "li" "lis" "list" "lit" "litus" "Lo" "lo" "loir" "Lois" "lori" "loris"
 "Lorius" "lors" "lost" "Lot" "lot" "lots" "lotus" "Lou" "Louis" "lour" "lout"
 "Lu" "Luo" "Lur" "Luri" "lust" "lut" "O" "o" "oil" "or" "ort" "Os" "os" "Otis"
 "Otus" "our" "ours" "oust" "out" "R" "r" "Rio" "rio" "riot" "rist" "rit" "Ro"
 "roi" "roil" "Roist" "roit" "rot" "roust" "rout" "Rus" "rusot" "rust" "rut"
 "S" "s" "si" "sil" "silo" "silt" "siol" "Sir" "sir" "sit" "slirt" "slit"
 "slot" "slour" "sluit" "slur" "slut" "so" "soil" "Sol" "sol" "soli" "sori"
 "sort" "sot" "sou" "soul" "sour" "Sri" "sri" "sruti" "st" "stir" "stour"
 "stroil" "Sui" "suit" "suitor" "sur" "Suto" "sutor" "T" "t" "ti" "til" "Tiou"
 "tirl" "to" "toi" "toil" "tol" "tolu" "tor" "toru" "torus" "tou" "tour" "tri"
 "Trio" "trio" "trisul" "tu" "tui" "tulsi" "tur" "Turi" "turio" "tursio" "U"
 "u" "ur" "Uro" "us" "ust" "ut")

The new code:

(declaim (inline letter-index))
(defun letter-index (letter)
  (- (char-code (char-downcase letter)) (char-code #\a)))

(declaim (inline index-letter))
(defun index-letter (index)
  (declare (type (mod 26) index))
  (code-char (+ index (char-code #\a))))

(defun make-rack-continuation (rack)
  (declare (optimize speed) (simple-string rack))
  (if (zerop (length rack))
      :empty
      (let ((array (make-array 26 :initial-element nil)))
        (loop for index below 26
              for letter = (index-letter index) do
              (when (find letter rack)
                (setf (aref array index) (make-rack-continuation
                                          (remove letter rack :count 1)))))
        array)))

(defun match-word (word start rack-continuation)
  (declare (optimize speed) (simple-string word) (type (mod 10000) start))
  (cond ((= start (length word)) (if rack-continuation t nil))
        ((or (null rack-continuation) (eq rack-continuation :empty)) nil)
        (t (match-word word (1+ start)
                       (locally (declare (simple-vector rack-continuation))
                         (aref rack-continuation
                               (letter-index (aref word start))))))))

(defun load-dictionary (file-name)
  (with-open-file (in file-name)
    (loop for word = (read-line in nil)
          while word
          collect word)))

(defun find-in-dictionary (dictionary rack)
  (let ((rack-continuation (make-rack-continuation rack)))
    (loop for word in dictionary
          when (match-word word 0 rack-continuation)
          collect word)))

-bcd
-- 
*** Brian Downing <bdowning at lavos dot net>