lists of lists (newbie)

From: Stephen Ramsay
Subject: lists of lists (newbie)
Date: Thu, 18 Aug 2005 22:09:01 +0000
Message-ID: <1G7Ne.30721$rp.24693@bignews1.bellsouth.net>

Lisp newbie asks . . .

For various reasons, I need to process a bunch of terms with associated
weights.  In other words, I have things like:

"foo" -> .8838
"bar" -> .2838
"bat" -> .2849

etc.

Now, this seems like a natural for a hash table, and I can easily get
one set up.  However, I need to return the the results sorted by the
value.  I can't use the value as the key, because I can't guarantee the
uniqueness of the value.

There doesn't seem to be a function called "sort-by-value", and so I'm
essentially trying to build one (or at least set up another data
structure that makes this trivial).

I've tried lots of different things with maphash, dolist, and whatever
else, but always seems to come down to wanting to have something like
this:

(("foo" .8838) ("bar" .2838) ("bat" .2849))

If I had that, I could use CL's amazing swiss-army sort function to sort
by value.

Question is, how do I "accumulate" a list like that?  I can't seem to
use append or push, because I end up with this:

("foo" .8838 "bar" .2838 "bat" .2849)

I can maybe provide some failed code, but I thought I'd ask the general
question first.  How do I iteratively append lists to a list (so I end
up with a list of lists)?  In lisp.

Steve



-- 
Stephen Ramsay
web: http://cantor.english.uga.edu/
PGP Public Key ID: 0xA38D7B11

Re: lists of lists (newbie) Eric Lavigne
Re: lists of lists (newbie) Frank Buss
Re: lists of lists (newbie) Pascal Bourguignon
- Re: lists of lists (newbie) Joe Marshall
  - Re: lists of lists (newbie) Rob Warnock
    - Re: lists of lists (newbie) Joe Marshall
Re: lists of lists (newbie) Marco Antoniotti

From: Eric Lavigne
Subject: Re: lists of lists (newbie)
Date: Thu, 18 Aug 2005 23:23:08 +0000
Message-ID: <1124407388.414981.327620@f14g2000cwb.googlegroups.com>

>Now, this seems like a natural for a hash table, and I can easily get
>one set up.  However, I need to return the the results sorted by the
>value.

Nicolas Neuss sorted key-value pairs in a hash table (sorting according
to the values) for the word frequency benchmark on the shootout. You
can see how it is done here:
http://shootout.alioth.debian.org/benchmark.php?test=wordfreq&lang=cmucl&id=0&sort=fullcpu

From: Frank Buss
Subject: Re: lists of lists (newbie)
Date: Thu, 18 Aug 2005 22:22:50 +0000
Message-ID: <11oe3fsgwqj4b$.1ev7llycmc2id$.dlg@40tude.net>

Stephen Ramsay wrote:

> Question is, how do I "accumulate" a list like that?  I can't seem to
> use append or push, because I end up with this:
> 
> ("foo" .8838 "bar" .2838 "bat" .2849)

You should learn more of the Lisp basics, perhaps with this book:

http://www.gigamonkeys.com/book/

If it is ok to prepend, try this:

(defparameter l ())
(push '("bar" 0.2838) l)
(push '("foo" 0.8838) l)

Append is possible, too (but slower for large lists) :

(defparameter l ())
(setf l (append l '(("foo" 0.8838))))
(setf l (append l '(("bar" 0.2838))))

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Pascal Bourguignon
Subject: Re: lists of lists (newbie)
Date: Thu, 18 Aug 2005 23:59:18 +0000
Message-ID: <877jei94q1.fsf@thalassa.informatimago.com>

Stephen Ramsay <·······@uga.edu> writes:

> Lisp newbie asks . . .
>
> For various reasons, I need to process a bunch of terms with associated
> weights.  In other words, I have things like:
>
> "foo" -> .8838
> "bar" -> .2838
> "bat" -> .2849
>
> etc.
>
> Now, this seems like a natural for a hash table, and I can easily get
> one set up.  However, I need to return the the results sorted by the
> value.  I can't use the value as the key, because I can't guarantee the
> uniqueness of the value.
>
> There doesn't seem to be a function called "sort-by-value", and so I'm
> essentially trying to build one (or at least set up another data
> structure that makes this trivial).
>
> I've tried lots of different things with maphash, dolist, and whatever
> else, but always seems to come down to wanting to have something like
> this:
>
> (("foo" .8838) ("bar" .2838) ("bat" .2849))

You might want to use conses instead of lists, and getting a true a-list:

(defparameter al (list '("foo" . .8838) '("bar" . .2838) '("bat" . .2849)))

This way, you can get the second fields with CDR:

     (cdr (assoc "bar" al :test (function string=)))



> If I had that, I could use CL's amazing swiss-army sort function to sort
> by value.

You have:

(setf al (sort al (function <=) :key (function cdr)))

> Question is, how do I "accumulate" a list like that?  

And you can easily add new items:

    (push (cons "new" 0.1234) al)


> How do I iteratively append lists to a list (so I end
> up with a list of lists)?  In lisp.

Wrong question.  Appending a list to a list gives a concatenated list.
You want to ask how to append items (which happen to be list but this
is irrelevant!) to a list.

(setf list (append list (list item))) ; this copies list everytime
(setf list (nconc  list (list item))) ; this modifies list


Now, iteratively:

(mapcar (function cons) 
        (list "foo" "bar" "bat")
        (list 0.8838 0.2838 0.2849))

(loop for key in (list "foo" "bar" "bat")
      for value in (some-value-for-key key)
      collect (cons key value))

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
I need a new toy.
Tail of black dog keeps good time.
Pounce! Good dog! Good dog!

From: Joe Marshall
Subject: Re: lists of lists (newbie)
Date: Fri, 19 Aug 2005 13:26:21 +0000
Message-ID: <zmreawhu.fsf@ccs.neu.edu>

Pascal Bourguignon <····@mouse-potato.com> writes:

>> How do I iteratively append lists to a list (so I end
>> up with a list of lists)?  In lisp.
>
> Wrong question.  Appending a list to a list gives a concatenated list.
> You want to ask how to append items (which happen to be list but this
> is irrelevant!) to a list.
>
> (setf list (append list (list item))) ; this copies list everytime
> (setf list (nconc  list (list item))) ; this modifies list

DON'T EVER DO THIS!

When you use append or nconc to add things to the `far end' of a list,
you have to traverse the entire list structure to get there.  If you
do this repeatedly --- and you will because if you add one item, you
are bound to want to add more --- this will be a quadratic process
rather than linear.

Here's an example:  Both of these functions read a file into a list of
strings.  The first uses append to tack each line on to the `far end'
of the accumulated list, the second uses cons to accumulate on to the
`near end'.  The second accumulates the list backwards, so we use
nreverse to get the list in the right order before returning it.

(defun file->list-bad (file)
  (with-open-file (stream file :direction :input :element-type 'character)
    (do ((line (read-line stream nil nil) (read-line stream nil nil))
         (result '() (nconc result (list line))))
        ((null line) result))))

(defun file->list-good (file)
  (with-open-file (stream file :direction :input :element-type 'character)
    (do ((line (read-line stream nil nil) (read-line stream nil nil))
         (result '() (cons line result)))
        ((null line) (nreverse result)))))

When we run this on a moderately sized file, we can see the
difference:

    [12]> (time (length (file->list-good "12dicts\\6of12.txt"))))

    Real time: 0.094 sec.
    Run time: 0.09375 sec.
    Space: 1596172 Bytes
    GC: 1, GC time: 0.03125 sec.
    32153
    [13]> (time (length (file->list-good "12dicts\\2of12.txt"))))

    Real time: 0.11 sec.
    Run time: 0.109375 sec.
    Space: 2055900 Bytes
    GC: 3, GC time: 0.046875 sec.
    41238
    [14]> (time (length (file->list-good "12dicts\\2of12inf.txt"))))

    Real time: 0.235 sec.
    Run time: 0.234375 sec.
    Space: 4166620 Bytes
    GC: 5, GC time: 0.0625 sec.
    81520

Notice how roughly doubling the file size roughly doubles the time.

    [15]> (time (length (file->list-bad "12dicts\\6of12.txt"))))

    Real time: 3.328 sec.
    Run time: 3.328125 sec.
    Space: 1596172 Bytes
    GC: 2, GC time: 0.015625 sec.
    32153
    [16]> (time (length (file->list-bad "12dicts\\2of12.txt"))))

    Real time: 5.453 sec.
    Run time: 5.4375 sec.
    Space: 2055900 Bytes
    GC: 4, GC time: 0.046875 sec.
    41238
    [17]> (time (length (file->list-bad "12dicts\\2of12inf.txt"))))

    Real time: 22.047 sec.
    Run time: 21.96875 sec.
    Space: 4166620 Bytes
    GC: 7, GC time: 0.125 sec.
    81520

Notice how roughly doubling the file size roughly squares the time.
Note, too, that it is 100 times slower than the good version!

I once worked on a project where the original author couldn't
understand why lisp was so slow.  It took several minutes to read in
just one file.  This was the culprit.

As it turned out, a list of strings wasn't the best representation
anyway.  A single string was better.  Since this was a bottleneck, it
was worthwhile to do a serious hack.  Using the FFI, I called mmap to
map the file into the address space, then I faked up an array header
so it looked like a lisp object and returned a displaced array to it.
(The GC was ok with this, but it really depended on the details of how
the GC was coded.)  This was six orders of magnitude faster than the
original program.

~jrm

From: Rob Warnock
Subject: Re: lists of lists (newbie)
Date: Mon, 22 Aug 2005 04:10:34 +0000
Message-ID: <1LKdnYSa2fOnzZTeRVn-iQ@speakeasy.net>

Joe Marshall  <···@ccs.neu.edu> wrote:
+---------------
| As it turned out, a list of strings wasn't the best representation
| anyway.  A single string was better.  Since this was a bottleneck, it
| was worthwhile to do a serious hack.  Using the FFI, I called mmap to
| map the file into the address space, then I faked up an array header
| so it looked like a lisp object and returned a displaced array to it.
| (The GC was ok with this, but it really depended on the details of how
| the GC was coded.)  This was six orders of magnitude faster than the
| original program.
+---------------

In some implementations [e.g., CMUCL], READ-SEQUENCE will call the
underlying operating system's "read()" routine more-or-less directly,
which is why I use the following for this kind of thing:

    (defun file-string (path)
      "Sucks up an entire file from PATH into a freshly-allocated string,
      returning two values: the string and the number of bytes read."
      (with-open-file (s path)
	(let* ((len (file-length s))
	       (data (make-string len)))
	  (values data (read-sequence data s)))))

It's... uh... FAST:

    > (defvar *data* nil)	; avoid printing many megabytes

    *DATA*
    > (time (multiple-value-bind (data len)
		(file-string "MAIL.today")
	      (setf *data* data)
	      len))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 
    ; Evaluation took:
    ;   0.05 seconds of real time
    ;   0.00165 seconds of user run time
    ;   0.047575 seconds of system run time
    ;   101,130,862 CPU cycles
    ;   0 page faults and
    ;   11,638,960 bytes consed.
    ; 
    11637220
    > (subseq *data* 0 100)	; prove that it worked

    "From ····@CENSORED.org  Sun Jan  2 01:02:57 2005
    Return-Path: <····@CENSORED.org>
    X-Original-To: lis"
    > 


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Joe Marshall
Subject: Re: lists of lists (newbie)
Date: Mon, 22 Aug 2005 16:42:39 +0000
Message-ID: <1x4l53eo.fsf@ccs.neu.edu>

····@rpw3.org (Rob Warnock) writes:

> Joe Marshall  <···@ccs.neu.edu> wrote:
> +---------------
> | As it turned out, a list of strings wasn't the best representation
> | anyway.  A single string was better.  Since this was a bottleneck, it
> | was worthwhile to do a serious hack.  Using the FFI, I called mmap to
> | map the file into the address space, then I faked up an array header
> | so it looked like a lisp object and returned a displaced array to it.
> | (The GC was ok with this, but it really depended on the details of how
> | the GC was coded.)  This was six orders of magnitude faster than the
> | original program.
> +---------------
>
> In some implementations [e.g., CMUCL], READ-SEQUENCE will call the
> underlying operating system's "read()" routine more-or-less directly,

READ-SEQUENCE wasn't in CLTL1

From: Marco Antoniotti
Subject: Re: lists of lists (newbie)
Date: Fri, 19 Aug 2005 14:45:51 +0000
Message-ID: <zgmNe.46$DJ5.69341@typhoon.nyu.edu>

Stephen Ramsay wrote:
> Lisp newbie asks . . .
> 
> For various reasons, I need to process a bunch of terms with associated
> weights.  In other words, I have things like:
> 
> "foo" -> .8838
> "bar" -> .2838
> "bat" -> .2849
> 
> etc.
> 
> Now, this seems like a natural for a hash table, and I can easily get
> one set up.  However, I need to return the the results sorted by the
> value.  I can't use the value as the key, because I can't guarantee the
> uniqueness of the value.
> 
> There doesn't seem to be a function called "sort-by-value", and so I'm
> essentially trying to build one (or at least set up another data
> structure that makes this trivial).


You have two choices.

1 - If you want to maintain O(1) access time and you need to return the 
ordered list every once in a while, then you can simply do

	(let ((hash-list
                  (loop for k being the hash-key of *my-hash-table*
                        using (hash-value v)
                        collect (cons k v))))
            (sort hash-list #'< :key #'cdr))

2 - If you want to traverse the data in order without generating a list 
every time, you need a data structure that allows you to do so.  The 
Red-Black tree package in the CMU AI.Repository does that :)  (shameless 
plug)

Cheers
--
Marco