From: Cortez
Subject: Collecting like-labelled sublists of a list
Date: 
Message-ID: <420afb4e-0ed0-4a48-93bb-dd473c3e51ad@b1g2000hsg.googlegroups.com>
I need to traverse a list of lists, where each sublist is labelled by
a number, and collect together the contents of all sublists sharing
the same label. So if I have the list -

((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
r) (5 s t))

where the first element of each sublist is the label, I need to
produce -

((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))

I do this with the following -

(defun test (list)
  (loop for j in list
	  for index = (first j)
          for k = (rest j)
          with indices = nil
	  if (not (member index indices))
	    do (pushnew index indices)
	    and collect k into res
	  else
	    do (nconc (nth index res) k)
	  finally (return res)))

I suspect that there is a more efficient and elegant way of doing
this, however. Any suggestions welcome.

Brief background: this is part of a program I've written for reading
data from SDIF files, a binary format which stores sound description
data. The labelled lists represent partials in spectral analysis data
(partial-index, time, frequency).

From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <ff4e5361-6c0c-41ea-a0a2-a3bab97b5a6b@z66g2000hsc.googlegroups.com>
On Jul 29, 6:29 pm, Cortez <············@hotmail.co.uk> wrote:
> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
>   (loop for j in list
>           for index = (first j)
>           for k = (rest j)
>           with indices = nil
>           if (not (member index indices))
>             do (pushnew index indices)
>             and collect k into res
>           else
>             do (nconc (nth index res) k)
>           finally (return res)))
>
> I suspect that there is a more efficient and elegant way of doing
> this, however. Any suggestions welcome.
>
> Brief background: this is part of a program I've written for reading
> data from SDIF files, a binary format which stores sound description
> data. The labelled lists represent partials in spectral analysis data
> (partial-index, time, frequency).

Actually the function uses ASSOC instead of NTH, because the labels
themselves will not necessarily be integers -

(defun test (list)
  (loop for j in list
	for index = (first j)
        for k = (rest j)
        with indices = nil
	if (not (member index indices))
          do (pushnew index indices)
	  and collect j into res
	else
	  do (nconc (assoc index res) k) ; ASSOC instead of NTH
	finally (return res)))
From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <712195e4-2e38-4b2f-97bd-48fe7b2f7af1@b1g2000hsg.googlegroups.com>
On Jul 29, 6:44 pm, Cortez <············@hotmail.co.uk> wrote:
> On Jul 29, 6:29 pm, Cortez <············@hotmail.co.uk> wrote:
>
>
>
> > I need to traverse a list of lists, where each sublist is labelled by
> > a number, and collect together the contents of all sublists sharing
> > the same label. So if I have the list -
>
> > ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> > r) (5 s t))
>
> > where the first element of each sublist is the label, I need to
> > produce -
>
> > ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> > I do this with the following -
>
> > (defun test (list)
> >   (loop for j in list
> >           for index = (first j)
> >           for k = (rest j)
> >           with indices = nil
> >           if (not (member index indices))
> >             do (pushnew index indices)
> >             and collect k into res
> >           else
> >             do (nconc (nth index res) k)
> >           finally (return res)))
>
> > I suspect that there is a more efficient and elegant way of doing
> > this, however. Any suggestions welcome.
>
> > Brief background: this is part of a program I've written for reading
> > data from SDIF files, a binary format which stores sound description
> > data. The labelled lists represent partials in spectral analysis data
> > (partial-index, time, frequency).
>
> Actually the function uses ASSOC instead of NTH, because the labels
> themselves will not necessarily be integers -
>
> (defun test (list)
>   (loop for j in list
>         for index = (first j)
>         for k = (rest j)
>         with indices = nil
>         if (not (member index indices))
>           do (pushnew index indices)
>           and collect j into res
>         else
>           do (nconc (assoc index res) k) ; ASSOC instead of NTH
>         finally (return res)))

To be more precise (if that helps), I'm wondering if there's a way of
doing this without having to build up a list of the indices (labels)
and using membership/non-membership of this list as the test for
whether we have encountered a new index or not.

Thanks for any ideas.
From: Alberto Riva
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <g6nnq7$i82k$2@usenet.osg.ufl.edu>
Cortez wrote:
> On Jul 29, 6:44 pm, Cortez <············@hotmail.co.uk> wrote:
>> On Jul 29, 6:29 pm, Cortez <············@hotmail.co.uk> wrote:
>>
>>
>>
>>> I need to traverse a list of lists, where each sublist is labelled by
>>> a number, and collect together the contents of all sublists sharing
>>> the same label. So if I have the list -
>>> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
>>> r) (5 s t))
>>> where the first element of each sublist is the label, I need to
>>> produce -
>>> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>>> I do this with the following -
>>> (defun test (list)
>>>   (loop for j in list
>>>           for index = (first j)
>>>           for k = (rest j)
>>>           with indices = nil
>>>           if (not (member index indices))
>>>             do (pushnew index indices)
>>>             and collect k into res
>>>           else
>>>             do (nconc (nth index res) k)
>>>           finally (return res)))
>>> I suspect that there is a more efficient and elegant way of doing
>>> this, however. Any suggestions welcome.
>>> Brief background: this is part of a program I've written for reading
>>> data from SDIF files, a binary format which stores sound description
>>> data. The labelled lists represent partials in spectral analysis data
>>> (partial-index, time, frequency).
>> Actually the function uses ASSOC instead of NTH, because the labels
>> themselves will not necessarily be integers -
>>
>> (defun test (list)
>>   (loop for j in list
>>         for index = (first j)
>>         for k = (rest j)
>>         with indices = nil
>>         if (not (member index indices))
>>           do (pushnew index indices)
>>           and collect j into res
>>         else
>>           do (nconc (assoc index res) k) ; ASSOC instead of NTH
>>         finally (return res)))
> 
> To be more precise (if that helps), I'm wondering if there's a way of
> doing this without having to build up a list of the indices (labels)
> and using membership/non-membership of this list as the test for
> whether we have encountered a new index or not.

I posted my reply before seeing this, but my solution takes care of this 
  too: GETHASH automatically creates a new entry in the hash table for 
each new key, so you don't have to worry about keeping a list of 
already-seen indexes.

Alberto
From: Madhu
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <m31w1b1lob.fsf@moon.robolove.meer.net>
* Cortez <····································@b1g2000hsg.googlegroups.com> :
Wrote on Tue, 29 Jul 2008 11:24:02 -0700 (PDT):

|> (defun test (list)
|>   (loop for j in list
|>         for index = (first j)
|>         for k = (rest j)
|>         with indices = nil
|>         if (not (member index indices))
|>           do (pushnew index indices)
|>           and collect j into res
|>         else
|>           do (nconc (assoc index res) k) ; ASSOC instead of NTH
|>         finally (return res)))
|
| To be more precise (if that helps), I'm wondering if there's a way of
| doing this without having to build up a list of the indices (labels)
| and using membership/non-membership of this list as the test for
| whether we have encountered a new index or not.

You can get by without building indices and just using ASSOC (which you
cannot avoid):

(defun cortez-group (list)      ; Destroys LIST!
  (let (result)
    (dolist (el list)
      (let ((entry (assoc (car el) result)))
        (if entry
            (rplacd entry (nconc (cdr entry) (cdr el)))
            (push el result))))
    (nreverse (mapcar #'cdr result))))

* (setq $a '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)
             (2 k l) (4 m n) (2 o p) (4 q r) (5 s t)))
* (cortez-group $a)
=> ((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T))

If you want the value to be sorted by label, consider sorting RESULT
:KEY #'CAR.

--
Madhu
From: Alberto Riva
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <g6nnm2$i82k$1@usenet.osg.ufl.edu>
Cortez wrote:
> On Jul 29, 6:29 pm, Cortez <············@hotmail.co.uk> wrote:
>> I need to traverse a list of lists, where each sublist is labelled by
>> a number, and collect together the contents of all sublists sharing
>> the same label. So if I have the list -
>>
>> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
>> r) (5 s t))
>>
>> where the first element of each sublist is the label, I need to
>> produce -
>>
>> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>>
>> I do this with the following -
>>
>> (defun test (list)
>>   (loop for j in list
>>           for index = (first j)
>>           for k = (rest j)
>>           with indices = nil
>>           if (not (member index indices))
>>             do (pushnew index indices)
>>             and collect k into res
>>           else
>>             do (nconc (nth index res) k)
>>           finally (return res)))
>>
>> I suspect that there is a more efficient and elegant way of doing
>> this, however. Any suggestions welcome.
>>
>> Brief background: this is part of a program I've written for reading
>> data from SDIF files, a binary format which stores sound description
>> data. The labelled lists represent partials in spectral analysis data
>> (partial-index, time, frequency).
> 
> Actually the function uses ASSOC instead of NTH, because the labels
> themselves will not necessarily be integers -
> 
> (defun test (list)
>   (loop for j in list
> 	for index = (first j)
>         for k = (rest j)
>         with indices = nil
> 	if (not (member index indices))
>           do (pushnew index indices)
> 	  and collect j into res
> 	else
> 	  do (nconc (assoc index res) k) ; ASSOC instead of NTH
> 	finally (return res)))

If the number of labels can be large, I would use a hash table:

   (defun test (list)
     (let ((ht (make-hash-table :test #'equal)))
       (dolist (j list)
	(dolist (k (rest j))
	  (push k (gethash (car j) ht))))
       (loop
	  for v being the hash-value of ht
	  collect v)))


Of course you get the sublists in random order, but it wasn't clear from 
your message whether that's an issue or not...

Alberto
From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <6d4f68e3-a1e4-4c0b-b9b9-ef8c81f610db@k30g2000hse.googlegroups.com>
On Jul 29, 7:31 pm, Alberto Riva <·····@nospam.ufl.edu> wrote:
> Cortez wrote:
> > On Jul 29, 6:29 pm, Cortez <············@hotmail.co.uk> wrote:
> >> I need to traverse a list of lists, where each sublist is labelled by
> >> a number, and collect together the contents of all sublists sharing
> >> the same label. So if I have the list -
>
> >> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> >> r) (5 s t))
>
> >> where the first element of each sublist is the label, I need to
> >> produce -
>
> >> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> >> I do this with the following -
>
> >> (defun test (list)
> >>   (loop for j in list
> >>           for index = (first j)
> >>           for k = (rest j)
> >>           with indices = nil
> >>           if (not (member index indices))
> >>             do (pushnew index indices)
> >>             and collect k into res
> >>           else
> >>             do (nconc (nth index res) k)
> >>           finally (return res)))
>
> >> I suspect that there is a more efficient and elegant way of doing
> >> this, however. Any suggestions welcome.
>
> >> Brief background: this is part of a program I've written for reading
> >> data from SDIF files, a binary format which stores sound description
> >> data. The labelled lists represent partials in spectral analysis data
> >> (partial-index, time, frequency).
>
> > Actually the function uses ASSOC instead of NTH, because the labels
> > themselves will not necessarily be integers -
>
> > (defun test (list)
> >   (loop for j in list
> >    for index = (first j)
> >         for k = (rest j)
> >         with indices = nil
> >    if (not (member index indices))
> >           do (pushnew index indices)
> >      and collect j into res
> >    else
> >      do (nconc (assoc index res) k) ; ASSOC instead of NTH
> >    finally (return res)))
>
> If the number of labels can be large, I would use a hash table:
>
>    (defun test (list)
>      (let ((ht (make-hash-table :test #'equal)))
>        (dolist (j list)
>         (dolist (k (rest j))
>           (push k (gethash (car j) ht))))
>        (loop
>           for v being the hash-value of ht
>           collect v)))
>
> Of course you get the sublists in random order, but it wasn't clear from
> your message whether that's an issue or not...
>
> Alberto

Thanks Alberto. I had considered using a hash. Unfortunately the order
is important, though. To clarify still further, the final list of
lists is passed as a sequence of coordinate pairs (X1 Y1 X2 Y2... Xn
Yn) to a drawing function in McClim, which outputs them to a graphical
display pane. The main problem I have is that it is not known prior to
actually reading in all the data from an SDIF file how many partials
it contains. This information is not contained in the file header. So
I can't, for example, create an array with a row for each partial and
then just read in all the data. So I want some way of building up a
list (or vector) for each partial as I'm reading through the data,
which is what the function I originally suggested does.
From: Alberto Riva
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <g6o21b$fo46$1@usenet.osg.ufl.edu>
Cortez wrote:
> On Jul 29, 7:31 pm, Alberto Riva <·····@nospam.ufl.edu> wrote:
>> Cortez wrote:
>>> On Jul 29, 6:29 pm, Cortez <············@hotmail.co.uk> wrote:
>>>> I need to traverse a list of lists, where each sublist is labelled by
>>>> a number, and collect together the contents of all sublists sharing
>>>> the same label. So if I have the list -
>>>> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
>>>> r) (5 s t))
>>>> where the first element of each sublist is the label, I need to
>>>> produce -
>>>> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>>>> I do this with the following -
>>>> (defun test (list)
>>>>   (loop for j in list
>>>>           for index = (first j)
>>>>           for k = (rest j)
>>>>           with indices = nil
>>>>           if (not (member index indices))
>>>>             do (pushnew index indices)
>>>>             and collect k into res
>>>>           else
>>>>             do (nconc (nth index res) k)
>>>>           finally (return res)))
>>>> I suspect that there is a more efficient and elegant way of doing
>>>> this, however. Any suggestions welcome.
>>>> Brief background: this is part of a program I've written for reading
>>>> data from SDIF files, a binary format which stores sound description
>>>> data. The labelled lists represent partials in spectral analysis data
>>>> (partial-index, time, frequency).
>>> Actually the function uses ASSOC instead of NTH, because the labels
>>> themselves will not necessarily be integers -
>>> (defun test (list)
>>>   (loop for j in list
>>>    for index = (first j)
>>>         for k = (rest j)
>>>         with indices = nil
>>>    if (not (member index indices))
>>>           do (pushnew index indices)
>>>      and collect j into res
>>>    else
>>>      do (nconc (assoc index res) k) ; ASSOC instead of NTH
>>>    finally (return res)))
>> If the number of labels can be large, I would use a hash table:
>>
>>    (defun test (list)
>>      (let ((ht (make-hash-table :test #'equal)))
>>        (dolist (j list)
>>         (dolist (k (rest j))
>>           (push k (gethash (car j) ht))))
>>        (loop
>>           for v being the hash-value of ht
>>           collect v)))
>>
>> Of course you get the sublists in random order, but it wasn't clear from
>> your message whether that's an issue or not...
>>
>> Alberto
> 
> Thanks Alberto. I had considered using a hash. Unfortunately the order
> is important, though.

Sorry, do you mean the order of the labels, or the order of the elements 
inside each sublist? I though it was the former (and if this is the 
case, see the function below[*]), but reading what you say about 
interpreting your result as a sequence of coordinates, I'm starting to 
think it's the latter. In this case you just need to REVERSE each 
sublist, since the elements are pushed into them.


[*] If you want to preserve the order of the labels, you could still use 
a hash table to store the values and keep a separate list of labels in 
order of appearance:

       (defun test (list)
	(let ((ht (make-hash-table :test #'equal))
	      (labels nil)
	      (result nil))
	  (dolist (j list)
	    (pushnew (car j) labels :test #'equal)
	    (dolist (k (rest j))
	      (push k (gethash (car j) ht))))
	  (dolist (j labels)
	    (push (gethash j ht) result))
	  result))


Alberto

> To clarify still further, the final list of
> lists is passed as a sequence of coordinate pairs (X1 Y1 X2 Y2... Xn
> Yn) to a drawing function in McClim, which outputs them to a graphical
> display pane. The main problem I have is that it is not known prior to
> actually reading in all the data from an SDIF file how many partials
> it contains. This information is not contained in the file header. So
> I can't, for example, create an array with a row for each partial and
> then just read in all the data. So I want some way of building up a
> list (or vector) for each partial as I'm reading through the data,
> which is what the function I originally suggested does.
From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <b0781f79-2f34-4b9c-a578-171eadf2d91a@m3g2000hsc.googlegroups.com>
On Jul 29, 10:28 pm, Alberto Riva <·····@nospam.ufl.edu> wrote:
> Cortez wrote:
> > On Jul 29, 7:31 pm, Alberto Riva <·····@nospam.ufl.edu> wrote:
> >> Cortez wrote:
> >>> On Jul 29, 6:29 pm, Cortez <············@hotmail.co.uk> wrote:
> >>>> I need to traverse a list of lists, where each sublist is labelled by
> >>>> a number, and collect together the contents of all sublists sharing
> >>>> the same label. So if I have the list -
> >>>> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> >>>> r) (5 s t))
> >>>> where the first element of each sublist is the label, I need to
> >>>> produce -
> >>>> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
> >>>> I do this with the following -
> >>>> (defun test (list)
> >>>>   (loop for j in list
> >>>>           for index = (first j)
> >>>>           for k = (rest j)
> >>>>           with indices = nil
> >>>>           if (not (member index indices))
> >>>>             do (pushnew index indices)
> >>>>             and collect k into res
> >>>>           else
> >>>>             do (nconc (nth index res) k)
> >>>>           finally (return res)))
> >>>> I suspect that there is a more efficient and elegant way of doing
> >>>> this, however. Any suggestions welcome.
> >>>> Brief background: this is part of a program I've written for reading
> >>>> data from SDIF files, a binary format which stores sound description
> >>>> data. The labelled lists represent partials in spectral analysis data
> >>>> (partial-index, time, frequency).
> >>> Actually the function uses ASSOC instead of NTH, because the labels
> >>> themselves will not necessarily be integers -
> >>> (defun test (list)
> >>>   (loop for j in list
> >>>    for index = (first j)
> >>>         for k = (rest j)
> >>>         with indices = nil
> >>>    if (not (member index indices))
> >>>           do (pushnew index indices)
> >>>      and collect j into res
> >>>    else
> >>>      do (nconc (assoc index res) k) ; ASSOC instead of NTH
> >>>    finally (return res)))
> >> If the number of labels can be large, I would use a hash table:
>
> >>    (defun test (list)
> >>      (let ((ht (make-hash-table :test #'equal)))
> >>        (dolist (j list)
> >>         (dolist (k (rest j))
> >>           (push k (gethash (car j) ht))))
> >>        (loop
> >>           for v being the hash-value of ht
> >>           collect v)))
>
> >> Of course you get the sublists in random order, but it wasn't clear from
> >> your message whether that's an issue or not...
>
> >> Alberto
>
> > Thanks Alberto. I had considered using a hash. Unfortunately the order
> > is important, though.
>
> Sorry, do you mean the order of the labels, or the order of the elements
> inside each sublist? I though it was the former (and if this is the
> case, see the function below[*]), but reading what you say about
> interpreting your result as a sequence of coordinates, I'm starting to
> think it's the latter. In this case you just need to REVERSE each
> sublist, since the elements are pushed into them.

Of course, yes - sorry, got confused by the data I was testing it
with. :)
The ordering of the final collected lists is not important, since
they're
just passed to the drawing function separately. I might want to
preserve the order
eventually, once I've done a bit more testing. Thank you very much for
your
suggestions.

My main concern is efficiency, however, since the files I'm dealing
with contain a
huge amount of data (a 1MB SDIF file might contain thousands of
partials). Timing the
various (compiled) versions of TEST with realistic data in ACL/Slime
gives -

My original version -

; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  0 msec user, 0 msec system
; real time  0 msec
; space allocation:
;  76 cons cells, 0 other bytes, 0 static bytes

Alberto -

; cpu time (non-gc) 16 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  16 msec user, 0 msec system
; real time  15 msec
; space allocation:
;  691 cons cells, 944 other bytes, 0 static bytes

Volkan -

; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  0 msec user, 0 msec system
; real time  0 msec
; space allocation:
;  327 cons cells, 944 other bytes, 0 static bytes

My 'naive' assumption (based on only limited experience profiling and
optimizing code)
is that my version would therefore be more efficient. True?
From: Volkan YAZICI
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <56ed8e6a-8d35-49b8-b093-54c1e87f98f8@j22g2000hsf.googlegroups.com>
On Jul 30, 1:36 am, Cortez <············@hotmail.co.uk> wrote:
> My original version -
>
> ; cpu time (non-gc) 0 msec user, 0 msec system
> ; cpu time (gc)     0 msec user, 0 msec system
> ; cpu time (total)  0 msec user, 0 msec system
> ; real time  0 msec
> ; space allocation:
> ;  76 cons cells, 0 other bytes, 0 static bytes
>
> Alberto -
>
> ; cpu time (non-gc) 16 msec user, 0 msec system
> ; cpu time (gc)     0 msec user, 0 msec system
> ; cpu time (total)  16 msec user, 0 msec system
> ; real time  15 msec
> ; space allocation:
> ;  691 cons cells, 944 other bytes, 0 static bytes
>
> Volkan -
>
> ; cpu time (non-gc) 0 msec user, 0 msec system
> ; cpu time (gc)     0 msec user, 0 msec system
> ; cpu time (total)  0 msec user, 0 msec system
> ; real time  0 msec
> ; space allocation:
> ;  327 cons cells, 944 other bytes, 0 static bytes

I don't think your original version will keep its speed stable --
because of (member index indices) searches -- as input grows. Maybe
you should also tell us more about the average # of labels.


Regards.
From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <d07f0451-fb11-4ab5-8b8e-47b6968c53cb@34g2000hsh.googlegroups.com>
On Jul 30, 6:45 am, Volkan YAZICI <·············@gmail.com> wrote:
> On Jul 30, 1:36 am, Cortez <············@hotmail.co.uk> wrote:
>
>
>
> > My original version -
>
> > ; cpu time (non-gc) 0 msec user, 0 msec system
> > ; cpu time (gc)     0 msec user, 0 msec system
> > ; cpu time (total)  0 msec user, 0 msec system
> > ; real time  0 msec
> > ; space allocation:
> > ;  76 cons cells, 0 other bytes, 0 static bytes
>
> > Alberto -
>
> > ; cpu time (non-gc) 16 msec user, 0 msec system
> > ; cpu time (gc)     0 msec user, 0 msec system
> > ; cpu time (total)  16 msec user, 0 msec system
> > ; real time  15 msec
> > ; space allocation:
> > ;  691 cons cells, 944 other bytes, 0 static bytes
>
> > Volkan -
>
> > ; cpu time (non-gc) 0 msec user, 0 msec system
> > ; cpu time (gc)     0 msec user, 0 msec system
> > ; cpu time (total)  0 msec user, 0 msec system
> > ; real time  0 msec
> > ; space allocation:
> > ;  327 cons cells, 944 other bytes, 0 static bytes
>
> I don't think your original version will keep its speed stable --
> because of (member index indices) searches -- as input grows. Maybe
> you should also tell us more about the average # of labels.
>
> Regards.

Hi Volkan,

Well a 1MB SDIF file might contain about 3000 partials, so that's 3000
labels. That's about the size of file I typically work with. But the
situation is complicated by the fact that within the file itself these
partials are distributed over a set of time-ordered matrices, which
have to be traversed in order to extract the data. Each matrix
contains information on every partial located at that point (each
matrix also has a header). The information itself (frequency,
amplitude, phase, etc - although I'm only interested in frequency at
the moment) is typically in the form of single or double floats. For
example, here are three matrices (which I model as arrays) containing
data extracted from a single clarinet note. The label is in the left-
most column, followed by frequency, amplitude and other data which I
won't elaborate on here (relating to time-offset and noise content.
The time of the matrix itself is contained in the matrix header). So
you can see how the partials (0 to 6 in this example) are distributed
across the matrices -

#2A((0.0d0 1207.8944079486291d0 3.0572778826667964d-5
0.8507826862315859d0
     0.9633105769790011d0 8.735621320690784d-4)
    (1.0d0 2799.767798804896d0 2.9758011552006322d-5
3.45542871985445d0
     0.9999134899562377d0 7.948914515372436d-4)
    (2.0d0 556.677039133396d0 1.7342238369471644d-5
3.5441699661680883d0 0.0d0
     0.0013469001214222072d0))
#2A((0.0d0 1203.4944446791235d0 2.7734811951076312d-5
3.7002160311387104d0
     0.8199124994997562d0 4.790926227113963d-4)
    (1.0d0 2660.5866387929855d0 2.5368686049752616d-5
3.5907080654740575d0
     0.9996524472597946d0 3.0160728537549295d-4)
    (2.0d0 606.6191777974843d0 6.739337893175857d-6
2.3996345665170553d0 0.0d0
     0.0011529762653688502d0)
    (3.0d0 2324.4893423205413d0 4.0769660018709746d-5
3.41151701297919d0
     0.9974054018288645d0 3.8066262505557277d-4)
    (4.0d0 5335.909767981719d0 3.5703443068263075d-5
3.111386950381825d0
     0.9988603457041186d0 4.2724354268293147d-4))
#2A((0.0d0 1201.692599868909d0 3.140013470961943d-5
2.6591665451516757d0
     0.6600595944811739d0 7.04389350218825d-4)
    (3.0d0 2345.9740698897076d0 2.501488942774825d-5
3.3084674908933707d0
     0.9955756776543997d0 3.5830861408908156d-4)
    (4.0d0 5335.896889017415d0 3.63802335700116d-5 6.002182744750073d0
     0.9999136282420076d0 1.5051639945999097d-4)
    (5.0d0 1963.8199112992056d0 2.1833696069379764d-5
3.411731878197788d0
     0.9837995802145886d0 0.0012953992561072873d0)
    (6.0d0 3466.815966572441d0 4.645427928279146d-5
0.4173643804422096d0
     0.9949535597520733d0 5.296405453024079d-4))...

The project I'm working on is a personal Lisp library for reading and
writing SDIF files (it's also a general exercise for me in writing an
application). I already have a working version, but I want to make it
much more efficient. It's not that slow (and in SBCL is actually very
fast), but now I'm trying to write a simple graphical interface, in
which the data is visualized. I'm experimenting with McClim and ACL.
Both get quite slow for large files. At the moment I'm just reading in
all the data into a slot in one of the graphical objects, and then
passing it to the various drawing functions I use. What I want,
ideally, is to do the drawing as I'm parsing the file.

I model the internal structure of an SDIF file through various
classes, instances of which are used to hold the data (SDIF is a
binary format). My current method of extracting partials is to read
all of the matrices into arrays (see above) then iterate over these
for each partial, starting from 0, until I've found them all. This is
horribly inefficient, of course. So the original function I posted was
a model (admittedly simplistic) for the kind of thing I want to do
instead, which is to extract the partials as I'm actually reading in
the bytes.

BTW, more info on SDIF can be found at: http://recherche.ircam.fr/equipes/analyse-synthese/sdif/
From: Thomas A. Russ
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <ymifxprc6jj.fsf@blackcat.isi.edu>
Cortez <············@hotmail.co.uk> writes:
> 
> Hi Volkan,
> 
> Well a 1MB SDIF file might contain about 3000 partials, so that's 3000
> labels. That's about the size of file I typically work with. But the
> situation is complicated by the fact that within the file itself these
> partials are distributed over a set of time-ordered matrices, which
> have to be traversed in order to extract the data. Each matrix
> contains information on every partial located at that point (each
> matrix also has a header). The information itself (frequency,
> amplitude, phase, etc - although I'm only interested in frequency at
> the moment) is typically in the form of single or double floats. For
> example, here are three matrices (which I model as arrays) containing
> data extracted from a single clarinet note. 

> The label is in the left-
> most column, followed by frequency, amplitude and other data which I
> won't elaborate on here (relating to time-offset and noise content.
> The time of the matrix itself is contained in the matrix header). So
> you can see how the partials (0 to 6 in this example) are distributed
> across the matrices -
> 
> #2A((0.0d0 1207.8944079486291d0 3.0572778826667964d-5
> 0.8507826862315859d0
>      0.9633105769790011d0 8.735621320690784d-4)
>     (1.0d0 2799.767798804896d0 2.9758011552006322d-5
> 3.45542871985445d0
>      0.9999134899562377d0 7.948914515372436d-4)
>     (2.0d0 556.677039133396d0 1.7342238369471644d-5
> 3.5441699661680883d0 0.0d0
>      0.0013469001214222072d0))
> #2A((0.0d0 1203.4944446791235d0 2.7734811951076312d-5
> 3.7002160311387104d0
>      0.8199124994997562d0 4.790926227113963d-4)
>     (1.0d0 2660.5866387929855d0 2.5368686049752616d-5
> 3.5907080654740575d0
...

Well, it seems that one of the key questions is whether there are any
other constraints on the label that you are using.  All of the values
are doubles, but it seems from the small sample given above that the
label values are all integer values.

Is this a correct assumption?

If so, and if the number of labels is reasonably bounded, then you can
take a two-pass buffering approach that should be reasonably efficient.

Is it reasonable to assume that the number of items you will need to
collect for each label is relatively small?  If so, then a simple
solution to accumulating the values will work.  Otherwise, something
more complicated will be needed.

Is it also the case that the labels will be in ascending order without
any missing values?

> So the original function I posted was
> a model (admittedly simplistic) for the kind of thing I want to do
> instead, which is to extract the partials as I'm actually reading in
> the bytes.

Based on some guesses to the questions above, namely that the labels are
non-negative integers, not many total values per label, a bounded number
of labels and consecutive labels, I created the following.  Actually,
consecutive is not a strict requirement, but you might end up with some
empty entries that way.

You set up a vector of maximum size, and then just assign the incoming
values into the correct bucket.  In some ways this is a short-cut of a
hash-table where we assume that the label value itself constitutes the
hash-key into our collision-free vector.  [It is related to the O(n)
radix-sort routine.]

(defconstant maximum-label-value 1024)  ;; Or whatever is reasonable.

(defun collate-data (data)
  (let ((results (make-array (list maximum-label-value)))
        (max-label -1))
    (dolist (datum data)
       (let ((label (truncate (first datum)))
             (values (rest datum)))
          (setf (aref results label)
                (nconc (aref results label) values))
          (setf max-label (max max-label label))))
    (values results max-label)))

This will return a vector with the data in place, and an indication of
how many elements are present.

CL-USER> (setq *input* (copy-tree '((0 a b) (1 c d) (2 e f) (3 g h)
                                    (1 i j) (2 k l) (4 m n) (2 o p)
                                    (4 q r) (5 s t))))
((0 A B) (1 C D) (2 E F) (3 G H) (1 I J) (2 K L) (4 M N) (2 O P) (4 Q R) (5 S T))

CL-USER> (time (collate-data *input*))
; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  0 msec user, 0 msec system
; real time  0 msec
; space allocation:
;  1 cons cell, 4,112 other bytes, 0 static bytes

#((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T) NIL NIL NIL NIL ...)
5


> BTW, more info on SDIF can be found at:
> http://recherche.ircam.fr/equipes/analyse-synthese/sdif/

Wow.  Looks complicated....

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <9e66fd2d-55bb-4bc9-ad6d-93fd9edc344d@l42g2000hsc.googlegroups.com>
On Jul 31, 12:10 am, ····@sevak.isi.edu (Thomas A. Russ) wrote:
> Cortez <············@hotmail.co.uk> writes:
>
> > Hi Volkan,
>
> > Well a 1MB SDIF file might contain about 3000 partials, so that's 3000
> > labels. That's about the size of file I typically work with. But the
> > situation is complicated by the fact that within the file itself these
> > partials are distributed over a set of time-ordered matrices, which
> > have to be traversed in order to extract the data. Each matrix
> > contains information on every partial located at that point (each
> > matrix also has a header). The information itself (frequency,
> > amplitude, phase, etc - although I'm only interested in frequency at
> > the moment) is typically in the form of single or double floats. For
> > example, here are three matrices (which I model as arrays) containing
> > data extracted from a single clarinet note.
> > The label is in the left-
> > most column, followed by frequency, amplitude and other data which I
> > won't elaborate on here (relating to time-offset and noise content.
> > The time of the matrix itself is contained in the matrix header). So
> > you can see how the partials (0 to 6 in this example) are distributed
> > across the matrices -
>
> > #2A((0.0d0 1207.8944079486291d0 3.0572778826667964d-5
> > 0.8507826862315859d0
> >      0.9633105769790011d0 8.735621320690784d-4)
> >     (1.0d0 2799.767798804896d0 2.9758011552006322d-5
> > 3.45542871985445d0
> >      0.9999134899562377d0 7.948914515372436d-4)
> >     (2.0d0 556.677039133396d0 1.7342238369471644d-5
> > 3.5441699661680883d0 0.0d0
> >      0.0013469001214222072d0))
> > #2A((0.0d0 1203.4944446791235d0 2.7734811951076312d-5
> > 3.7002160311387104d0
> >      0.8199124994997562d0 4.790926227113963d-4)
> >     (1.0d0 2660.5866387929855d0 2.5368686049752616d-5
> > 3.5907080654740575d0
>
> ...
>
> Well, it seems that one of the key questions is whether there are any
> other constraints on the label that you are using.  All of the values
> are doubles, but it seems from the small sample given above that the
> label values are all integer values.
>
> Is this a correct assumption?

The data within each matrix has to be of the same binary type -
they're generally either 32- or 64-bit floats.

> If so, and if the number of labels is reasonably bounded, then you can
> take a two-pass buffering approach that should be reasonably efficient.
>
> Is it reasonable to assume that the number of items you will need to
> collect for each label is relatively small?  If so, then a simple
> solution to accumulating the values will work.  Otherwise, something
> more complicated will be needed.
>
> Is it also the case that the labels will be in ascending order without
> any missing values?

Yes, I just want to extract frequency data, contained in the second
column. And yes again, they're in ascending order. There might be
occasional small gaps, though, when a partial disappears from the
stream of matrices only to reappear a few matrices later.

> > So the original function I posted was
> > a model (admittedly simplistic) for the kind of thing I want to do
> > instead, which is to extract the partials as I'm actually reading in
> > the bytes.
>
> Based on some guesses to the questions above, namely that the labels are
> non-negative integers, not many total values per label, a bounded number
> of labels and consecutive labels, I created the following.  Actually,
> consecutive is not a strict requirement, but you might end up with some
> empty entries that way.
>
> You set up a vector of maximum size, and then just assign the incoming
> values into the correct bucket.  In some ways this is a short-cut of a
> hash-table where we assume that the label value itself constitutes the
> hash-key into our collision-free vector.  [It is related to the O(n)
> radix-sort routine.]
>
> (defconstant maximum-label-value 1024)  ;; Or whatever is reasonable.
>
> (defun collate-data (data)
>   (let ((results (make-array (list maximum-label-value)))
>         (max-label -1))
>     (dolist (datum data)
>        (let ((label (truncate (first datum)))
>              (values (rest datum)))
>           (setf (aref results label)
>                 (nconc (aref results label) values))
>           (setf max-label (max max-label label))))
>     (values results max-label)))
>
> This will return a vector with the data in place, and an indication of
> how many elements are present.
>
> CL-USER> (setq *input* (copy-tree '((0 a b) (1 c d) (2 e f) (3 g h)
>                                     (1 i j) (2 k l) (4 m n) (2 o p)
>                                     (4 q r) (5 s t))))
> ((0 A B) (1 C D) (2 E F) (3 G H) (1 I J) (2 K L) (4 M N) (2 O P) (4 Q R) (5 S T))
>
> CL-USER> (time (collate-data *input*))
> ; cpu time (non-gc) 0 msec user, 0 msec system
> ; cpu time (gc)     0 msec user, 0 msec system
> ; cpu time (total)  0 msec user, 0 msec system
> ; real time  0 msec
> ; space allocation:
> ;  1 cons cell, 4,112 other bytes, 0 static bytes
>
> #((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T) NIL NIL NIL NIL ...)
> 5

Thank you Thomas, that's a neat suggestion.

> > BTW, more info on SDIF can be found at:
> >http://recherche.ircam.fr/equipes/analyse-synthese/sdif/
>
> Wow.  Looks complicated....

I model the various internal SDIF structures using just a few classes
and binary types. I'm only really interested in certain types of SDIF
files, namely 1TRC and RBEP types, which contain sinusoidal track
data. As it stands my library can handle them quite well. It's mainly
because I want to extract time and frequency values and convert them
into xy coordinates (for graphical display) that I'm interested in
parsing the files in a new (and more efficient) way.
From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <099a810d-0b9b-4170-a031-22cbcfcd1247@c65g2000hsa.googlegroups.com>
On Jul 31, 3:21 am, Cortez <············@hotmail.co.uk> wrote:
> On Jul 31, 12:10 am, ····@sevak.isi.edu (Thomas A. Russ) wrote:
>
>
>
> > Cortez <············@hotmail.co.uk> writes:
>
> > > Hi Volkan,
>
> > > Well a 1MBSDIFfile might contain about 3000 partials, so that's 3000
> > > labels. That's about the size of file I typically work with. But the
> > > situation is complicated by the fact that within the file itself these
> > > partials are distributed over a set of time-ordered matrices, which
> > > have to be traversed in order to extract the data. Each matrix
> > > contains information on every partial located at that point (each
> > > matrix also has a header). The information itself (frequency,
> > > amplitude, phase, etc - although I'm only interested in frequency at
> > > the moment) is typically in the form of single or double floats. For
> > > example, here are three matrices (which I model as arrays) containing
> > > data extracted from a single clarinet note.
> > > The label is in the left-
> > > most column, followed by frequency, amplitude and other data which I
> > > won't elaborate on here (relating to time-offset and noise content.
> > > The time of the matrix itself is contained in the matrix header). So
> > > you can see how the partials (0 to 6 in this example) are distributed
> > > across the matrices -
>
> > > #2A((0.0d0 1207.8944079486291d0 3.0572778826667964d-5
> > > 0.8507826862315859d0
> > >      0.9633105769790011d0 8.735621320690784d-4)
> > >     (1.0d0 2799.767798804896d0 2.9758011552006322d-5
> > > 3.45542871985445d0
> > >      0.9999134899562377d0 7.948914515372436d-4)
> > >     (2.0d0 556.677039133396d0 1.7342238369471644d-5
> > > 3.5441699661680883d0 0.0d0
> > >      0.0013469001214222072d0))
> > > #2A((0.0d0 1203.4944446791235d0 2.7734811951076312d-5
> > > 3.7002160311387104d0
> > >      0.8199124994997562d0 4.790926227113963d-4)
> > >     (1.0d0 2660.5866387929855d0 2.5368686049752616d-5
> > > 3.5907080654740575d0
>
> > ...
>
> > Well, it seems that one of the key questions is whether there are any
> > other constraints on the label that you are using.  All of the values
> > are doubles, but it seems from the small sample given above that the
> > label values are all integer values.
>
> > Is this a correct assumption?
>
> The data within each matrix has to be of the same binary type -
> they're generally either 32- or 64-bit floats.
>
> > If so, and if the number of labels is reasonably bounded, then you can
> > take a two-pass buffering approach that should be reasonably efficient.
>
> > Is it reasonable to assume that the number of items you will need to
> > collect for each label is relatively small?  If so, then a simple
> > solution to accumulating the values will work.  Otherwise, something
> > more complicated will be needed.
>
> > Is it also the case that the labels will be in ascending order without
> > any missing values?
>
> Yes, I just want to extract frequency data, contained in the second
> column. And yes again, they're in ascending order. There might be
> occasional small gaps, though, when a partial disappears from the
> stream of matrices only to reappear a few matrices later.
>
>
>
> > > So the original function I posted was
> > > a model (admittedly simplistic) for the kind of thing I want to do
> > > instead, which is to extract the partials as I'm actually reading in
> > > the bytes.
>
> > Based on some guesses to the questions above, namely that the labels are
> > non-negative integers, not many total values per label, a bounded number
> > of labels and consecutive labels, I created the following.  Actually,
> > consecutive is not a strict requirement, but you might end up with some
> > empty entries that way.
>
> > You set up a vector of maximum size, and then just assign the incoming
> > values into the correct bucket.  In some ways this is a short-cut of a
> > hash-table where we assume that the label value itself constitutes the
> > hash-key into our collision-free vector.  [It is related to the O(n)
> > radix-sort routine.]
>
> > (defconstant maximum-label-value 1024)  ;; Or whatever is reasonable.
>
> > (defun collate-data (data)
> >   (let ((results (make-array (list maximum-label-value)))
> >         (max-label -1))
> >     (dolist (datum data)
> >        (let ((label (truncate (first datum)))
> >              (values (rest datum)))
> >           (setf (aref results label)
> >                 (nconc (aref results label) values))
> >           (setf max-label (max max-label label))))
> >     (values results max-label)))
>
> > This will return a vector with the data in place, and an indication of
> > how many elements are present.
>
> > CL-USER> (setq *input* (copy-tree '((0 a b) (1 c d) (2 e f) (3 g h)
> >                                     (1 i j) (2 k l) (4 m n) (2 o p)
> >                                     (4 q r) (5 s t))))
> > ((0 A B) (1 C D) (2 E F) (3 G H) (1 I J) (2 K L) (4 M N) (2 O P) (4 Q R) (5 S T))
>
> > CL-USER> (time (collate-data *input*))
> > ; cpu time (non-gc) 0 msec user, 0 msec system
> > ; cpu time (gc)     0 msec user, 0 msec system
> > ; cpu time (total)  0 msec user, 0 msec system
> > ; real time  0 msec
> > ; space allocation:
> > ;  1 cons cell, 4,112 other bytes, 0 static bytes
>
> > #((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T) NIL NIL NIL NIL ...)
> > 5
>
> Thank you Thomas, that's a neat suggestion.
>
> > > BTW, more info onSDIFcan be found at:
> > >http://recherche.ircam.fr/equipes/analyse-synthese/sdif/
>
> > Wow.  Looks complicated....
>
> I model the various internalSDIFstructures using just a few classes
> and binary types. I'm only really interested in certain types ofSDIF
> files, namely 1TRC and RBEP types, which contain sinusoidal track
> data. As it stands my library can handle them quite well. It's mainly
> because I want to extract time and frequency values and convert them
> into xy coordinates (for graphical display) that I'm interested in
> parsing the files in a new (and more efficient) way.

I am happy to say that I have now solved this problem. I am using a
hash table stored in a global variable, into which I read the partials
as I'm reading the matrices from the file. This has reduced the
loading time for a large (c.1MB) SDIF file from more than 3 minutes
(yes!) to less than 7 seconds. This is for loading in to the graphical
interface in ACL - the non-GUI version is even faster. Once I've
optimized and done the appropriate type-declarations I'm sure I can
get the time down even further.

Many thanks for all the very helpful suggestions and advice!
From: John Thingstad
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <op.ue2sojvwut4oq5@pandora.alfanett.no>
P� Wed, 30 Jul 2008 00:36:28 +0200, skrev Cortez  
<············@hotmail.co.uk>:

>
> My main concern is efficiency, however, since the files I'm dealing
> with contain a
> huge amount of data (a 1MB SDIF file might contain thousands of
> partials). Timing the
> various (compiled) versions of TEST with realistic data in ACL/Slime
> gives -
>
> My original version -
>
> ; cpu time (non-gc) 0 msec user, 0 msec system
> ; cpu time (gc)     0 msec user, 0 msec system
> ; cpu time (total)  0 msec user, 0 msec system
> ; real time  0 msec
> ; space allocation:
> ;  76 cons cells, 0 other bytes, 0 static bytes
>
> Alberto -
>
> ; cpu time (non-gc) 16 msec user, 0 msec system
> ; cpu time (gc)     0 msec user, 0 msec system
> ; cpu time (total)  16 msec user, 0 msec system
> ; real time  15 msec
> ; space allocation:
> ;  691 cons cells, 944 other bytes, 0 static bytes
>
> Volkan -
>
> ; cpu time (non-gc) 0 msec user, 0 msec system
> ; cpu time (gc)     0 msec user, 0 msec system
> ; cpu time (total)  0 msec user, 0 msec system
> ; real time  0 msec
> ; space allocation:
> ;  327 cons cells, 944 other bytes, 0 static bytes
>
> My 'naive' assumption (based on only limited experience profiling and
> optimizing code)
> is that my version would therefore be more efficient. True?
>

This doesn't really tell you anything.
You need to test it on a reasonable size dataset.
Hashing doesn't really come into effect before you have a large amount of  
elements so it scores unreasonably low here compared to what it would on a  
large dataset.

--------------
John Thingstad
From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <3a56c130-ed9c-4d51-8d91-0056905579bd@j22g2000hsf.googlegroups.com>
On Jul 29, 11:56 pm, "John Thingstad" <·······@online.no> wrote:
> På Wed, 30 Jul 2008 00:36:28 +0200, skrev Cortez
> <············@hotmail.co.uk>:
>
>
>
>
>
> > My main concern is efficiency, however, since the files I'm dealing
> > with contain a
> > huge amount of data (a 1MB SDIF file might contain thousands of
> > partials). Timing the
> > various (compiled) versions of TEST with realistic data in ACL/Slime
> > gives -
>
> > My original version -
>
> > ; cpu time (non-gc) 0 msec user, 0 msec system
> > ; cpu time (gc)     0 msec user, 0 msec system
> > ; cpu time (total)  0 msec user, 0 msec system
> > ; real time  0 msec
> > ; space allocation:
> > ;  76 cons cells, 0 other bytes, 0 static bytes
>
> > Alberto -
>
> > ; cpu time (non-gc) 16 msec user, 0 msec system
> > ; cpu time (gc)     0 msec user, 0 msec system
> > ; cpu time (total)  16 msec user, 0 msec system
> > ; real time  15 msec
> > ; space allocation:
> > ;  691 cons cells, 944 other bytes, 0 static bytes
>
> > Volkan -
>
> > ; cpu time (non-gc) 0 msec user, 0 msec system
> > ; cpu time (gc)     0 msec user, 0 msec system
> > ; cpu time (total)  0 msec user, 0 msec system
> > ; real time  0 msec
> > ; space allocation:
> > ;  327 cons cells, 944 other bytes, 0 static bytes
>
> > My 'naive' assumption (based on only limited experience profiling and
> > optimizing code)
> > is that my version would therefore be more efficient. True?
>
> This doesn't really tell you anything.
> You need to test it on a reasonable size dataset.
> Hashing doesn't really come into effect before you have a large amount of
> elements so it scores unreasonably low here compared to what it would on a
> large dataset.
>
> --------------
> John Thingstad

Thanks John, I suspected as much.
From: Volkan YAZICI
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <059e6f33-e56e-4aff-817c-d57ba19a9809@t54g2000hsg.googlegroups.com>
On Jul 29, 8:29 pm, Cortez <············@hotmail.co.uk> wrote:
> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
>   (loop for j in list
>           for index = (first j)
>           for k = (rest j)
>           with indices = nil
>           if (not (member index indices))
>             do (pushnew index indices)
>             and collect k into res
>           else
>             do (nconc (nth index res) k)
>           finally (return res)))
>
> I suspect that there is a more efficient and elegant way of doing
> this, however. Any suggestions welcome.
>
> Brief background: this is part of a program I've written for reading
> data from SDIF files, a binary format which stores sound description
> data. The labelled lists represent partials in spectral analysis data
> (partial-index, time, frequency).

CL-USER> (let ((table (make-hash-table)))
           (mapcar
            (lambda (key) (gethash key table))
            (mapcar
              (lambda (item &aux (key (first item)))
               (setf (gethash key table)
                     (nconc (gethash key table) (rest item)))
               key)
             '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k
l)
               (4 m n) (2 o p) (4 q r) (5 s t)))))
((A B) (C D I J) (E F K L O P) (G H) (C D I J) (E F K L O P) (M N Q
R)
 (E F K L O P) (M N Q R) (S T))


Regards.
From: Thomas A. Russ
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <ymiod4gcea3.fsf@blackcat.isi.edu>
Cortez <············@hotmail.co.uk> writes:

> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
> 
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
> 
> where the first element of each sublist is the label, I need to
> produce -
> 
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
> 
> I do this with the following -
> 
> (defun test (list)
>   (loop for j in list
> 	  for index = (first j)
>           for k = (rest j)
>           with indices = nil
> 	  if (not (member index indices))
> 	    do (pushnew index indices)
> 	    and collect k into res
> 	  else
> 	    do (nconc (nth index res) k)
> 	  finally (return res)))

The one comment that I have about this particular code is that you end
up destructively modifying the original list structure that you start
with.  Depending on the precise application that may or may not be a
good idea.

As long as you don't need any of the original structure, you will be
fine.  But if the input structure needs to be preserved, you are in
trouble.

To see the effect, try the following:

(defvar *input* (copy-tree '((0 a b)
                             (1 c d)
                             (2 e f)
                             (3 g h)
                             (1 i j)
                             (2 k l)
                             (4 m n)
                             (2 o p)
                             (4 q r)
                             (5 s t))))

;; I use copy-tree to avoid problems with destructively modifying
;; constant list structure.

(test *input*)
  => ((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T))

*input*
  =>  ((0 A B) (1 C D I J) (2 E F K L O P) (3 G H) (1 I J)
       (2 K L O P) (4 M N Q R) (2 O P) (4 Q R) (5 S T))

Note in particular the changes to the first occurences of the lists
headed by 1, 2 and 4.

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Cortez
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <4aba07dc-9f1b-457d-a869-4dd27bfea3d1@34g2000hsf.googlegroups.com>
On Jul 30, 3:11 am, ····@sevak.isi.edu (Thomas A. Russ) wrote:
> Cortez <············@hotmail.co.uk> writes:
> > I need to traverse a list of lists, where each sublist is labelled by
> > a number, and collect together the contents of all sublists sharing
> > the same label. So if I have the list -
>
> > ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> > r) (5 s t))
>
> > where the first element of each sublist is the label, I need to
> > produce -
>
> > ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> > I do this with the following -
>
> > (defun test (list)
> >   (loop for j in list
> >      for index = (first j)
> >           for k = (rest j)
> >           with indices = nil
> >      if (not (member index indices))
> >        do (pushnew index indices)
> >        and collect k into res
> >      else
> >        do (nconc (nth index res) k)
> >      finally (return res)))
>
> The one comment that I have about this particular code is that you end
> up destructively modifying the original list structure that you start
> with.  Depending on the precise application that may or may not be a
> good idea.
>
> As long as you don't need any of the original structure, you will be
> fine.  But if the input structure needs to be preserved, you are in
> trouble.
>
> To see the effect, try the following:
>
> (defvar *input* (copy-tree '((0 a b)
>                              (1 c d)
>                              (2 e f)
>                              (3 g h)
>                              (1 i j)
>                              (2 k l)
>                              (4 m n)
>                              (2 o p)
>                              (4 q r)
>                              (5 s t))))
>
> ;; I use copy-tree to avoid problems with destructively modifying
> ;; constant list structure.
>
> (test *input*)
>   => ((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T))
>
> *input*
>   =>  ((0 A B) (1 C D I J) (2 E F K L O P) (3 G H) (1 I J)
>        (2 K L O P) (4 M N Q R) (2 O P) (4 Q R) (5 S T))
>
> Note in particular the changes to the first occurences of the lists
> headed by 1, 2 and 4.
>
> --
> Thomas A. Russ,  USC/Information Sciences Institute

Yes, thanks for pointing that out - I should have mentioned that the
destructive modification is not, in fact, an issue.
From: John Thingstad
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <op.ue4ohlswut4oq5@pandora.alfanett.no>
P� Wed, 30 Jul 2008 04:11:00 +0200, skrev Thomas A. Russ  
<···@sevak.isi.edu>:

>
> To see the effect, try the following:
>
> (defvar *input* (copy-tree '((0 a b)
>                              (1 c d)
>                              (2 e f)
>                              (3 g h)
>                              (1 i j)
>                              (2 k l)
>                              (4 m n)
>                              (2 o p)
>                              (4 q r)
>                              (5 s t))))
>
> ;; I use copy-tree to avoid problems with destructively modifying
> ;; constant list structure.
>
> (test *input*)
>   => ((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T))
>
> *input*
>   =>  ((0 A B) (1 C D I J) (2 E F K L O P) (3 G H) (1 I J)
>        (2 K L O P) (4 M N Q R) (2 O P) (4 Q R) (5 S T))
>
> Note in particular the changes to the first occurences of the lists
> headed by 1, 2 and 4.
>

If your files are as big as you mentioned 25 Mb or more a buffered  
approach sounds better unless you have a pressing need to keep it all in  
memory.

--------------
John Thingstad
From: Steve Allan
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <uljzj2luv.fsf@attachmate.com>
Cortez <············@hotmail.co.uk> writes:

> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
>   (loop for j in list
> 	  for index = (first j)
>           for k = (rest j)
>           with indices = nil
> 	  if (not (member index indices))
> 	    do (pushnew index indices)
> 	    and collect k into res
> 	  else
> 	    do (nconc (nth index res) k)
> 	  finally (return res)))
>
> I suspect that there is a more efficient and elegant way of doing
> this, however. Any suggestions welcome.
>
> Brief background: this is part of a program I've written for reading
> data from SDIF files, a binary format which stores sound description
> data. The labelled lists represent partials in spectral analysis data
> (partial-index, time, frequency).

I'm a junior lisper, so I tried a different approach mostly for my own
education.


(let ((mylist '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) 
		(4 m n) (2 o p) (4 q) (5 s t)))
      (ar (make-array 10 :adjustable t)))
  (mapcar (lambda (l)
	    (setf (aref ar (first l)) 
		  (append (aref ar (first l)) (rest l)))) mylist))


My choice of 10 for the initial size of the array was arbitrary.  I
chose the non-destructive append over nconc, but nconc might be ok.


-- 
-- Steve
From: Steve Allan
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <u1w1b2bpx.fsf@attachmate.com>
Madhu <·······@meer.net> writes:

> * Steve Allan <·············@attachmate.com> :
> Wrote on Wed, 30 Jul 2008 12:49:44 -0700:
> | I'm a junior lisper, so I tried a different approach mostly for my own
> | education.
> |
> | (let ((mylist '((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) 
> | 		(4 m n) (2 o p) (4 q) (5 s t)))
> |       (ar (make-array 10 :adjustable t)))
> |   (mapcar (lambda (l)
> | 	    (setf (aref ar (first l)) 
> | 		  (append (aref ar (first l)) (rest l)))) mylist))
> |
> |
> | My choice of 10 for the initial size of the array was arbitrary.  I
> | chose the non-destructive append over nconc, but nconc might be ok.
>
> This is a correct approach.  Note it is the almost identical to what
> [the experienced] Kaz suggests in this thread!  Like he says, it will
> work well "If the labels are within a small numeric range, like 0 to 99"
>
> [Even if the label bound is not known but is known to be small, as you
>  have declared AR adjustable, if the label I is greater than (LENGTH AR)
>  you can resize AR before setting the Ith element.]
>

I realize now that I don't understand how adjustable arrays work.  I
assumed the expanding would happen automagically, but after a quick
test I see that it doesn't.  So my approach doesn't work the way I had
intended.  I'll have to read up on adjustable arrays a bit.

> However in this case the label may not be useful to index the array, as
> elsewhere in the thread Cortez has said:
>
>> Actually the function uses ASSOC instead of NTH, because the labels
>> themselves will not necessarily be integers

Ah, that does change things.  

-- 
-- Steve
From: Kaz Kylheku
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <20080730145539.324@gmail.com>
On 2008-07-29, Cortez <············@hotmail.co.uk> wrote:
> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))

Note that this is basically like join of a database onto itself,
where the number is the join key.

> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
>   (loop for j in list
> 	  for index = (first j)
>           for k = (rest j)
>           with indices = nil
> 	  if (not (member index indices))
> 	    do (pushnew index indices)
> 	    and collect k into res
> 	  else
> 	    do (nconc (nth index res) k)
> 	  finally (return res)))
>
> I suspect that there is a more efficient and elegant way of doing
> this, however. Any suggestions welcome.

If you want efficiency for situations when such lists are large,
you probably want to use hashing to implement the join.

If the labels are within a small numeric range, like 0 to 99, you could use an
array instead of hashing:

  (defun join (list)
    (let ((array (make-array '(100))))
      (dolist (sublist list (coerce (remove nil array) 'list))
        (destructuring-bind (numeric-label &rest items) sublist
	  (setf (aref array numeric-label) 
	        (append (aref array numeric-label) items))))))

Change APPEND to NCONC for the destructive version.

> Brief background: this is part of a program I've written for reading
> data from SDIF files, a binary format which stores sound description
> data. The labelled lists represent partials in spectral analysis data
> (partial-index, time, frequency).

So the labels are bounded, since they represent partials, and there are only so
many partials in the spectral analysis.
From: ······@gmail.com
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <616342f3-5723-4fae-8fcf-b6348968d044@a8g2000prf.googlegroups.com>
On Jul 29, 10:29 am, Cortez <············@hotmail.co.uk> wrote:
> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
>   (loop for j in list
>           for index = (first j)
>           for k = (rest j)
>           with indices = nil
>           if (not (member index indices))
>             do (pushnew index indices)
>             and collect k into res
>           else
>             do (nconc (nth index res) k)
>           finally (return res)))
>
> I suspect that there is a more efficient and elegant way of doing
> this, however. Any suggestions welcome.
>
> Brief background: this is part of a program I've written for reading
> data from SDIF files, a binary format which stores sound description
> data. The labelled lists represent partials in spectral analysis data
> (partial-index, time, frequency).


For some excursion, here's how one'd do it in Mathematica.

define the list:

mylist={0[a,b],1[c,d],2[e,f],3[g,f],1[i,j],2[k,l],4[m,n],2[o,p],4[q,r],
5[s,t]}

then do this:

····@mylist //. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l}

output is:

{0[a, b], 1[c, d, i, j], 2[e, f, k, l, o, p], 3[g, f], 4[m, n, q, r],
5[s, t]}

if you want the result cleaned up so that the integer labels are
removed, do like this

result /. _Integer[b___] -> {b}

-----------------------------------

Explanation:

The ····@mylist is syntactically equivalent to Sort[mylist]. It just
sorts it.
The result is:

{0[a, b], 1[c, d], 1[i, j], 2[e, f], 2[k, l], 2[o, p], 3[g, f], 4[m,
n],
  4[q, r], 5[s, t]}

The “//. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l}”

means use a pattern matching so that if ajacent elements has the same
head, merge them into one.

the shortcut syntax for structural transformation used above is this:

myExpr //. myPattern -> myNewForm

The “myExpr” is any expression. The “myPattern” is any structural
pattern (i.e. like regex except it work on list structures and
datatypes). The “myNewForm” is like the regex's replacement string.

The syntax
“myExpr //. myPattern -> myNewForm”
can also be written in purely nested form, like this:

ReplaceRepeated[myExpr, Rule[myPattern, myNewForm]]

Now, here's some explanation on the the shortcut syntax used to match
patterns:

“_” means any single symbol.
“__” means any 1 or more symbols.
“___” means any 0 or more symbols.

“x_” means a pattern to be named “x”, so later you can refer it as
“x”.
So, “f___” means a pattern that's 0 or more symbols, and named f.
Similar for “l___”.
The “x_[a__]” means basically a expression with 1 or more elements.
The head we named “x”, and its elements we named “a”. Similar for
“x_[b__]”.

So, all together, the “{f___,x_[a__],x_[b__],l___}” just matches a
list or 0 or more length, and capture neighbors that has the same
head.

The “{f,x[a,b],l}” just means the new form we want. Note the f,x,a,b,l
are just names we used for the captured pattern.

So now, ·····@mylist //. {f___,x_[a__],x_[b__],l___} -> {f,x[a,b],l}”
gives us the desired result.

Now to clean up the head that function as integer labels, we do:

result /. _Integer[b___] -> {b}

which is another expression transformation with pattern. The
“_Integer[b___]” just means any list who's head is of Integer type,
and its element we name “b”. Any expression matching it is replaced by
a list of its elements, expressed as “{b}”.

----------------------

Now, all the above may seem like weired syntax. But actually it has a
purely nested form.

For example, this whole expression:

····@mylist /. {f___, x_[a__], x_[b__], l___} -> {f, x[a, b], l}

is syntactically equivalent to this:

ReplaceRepeated[Sort[mylist],
    Rule[List[Pattern[f, BlankNullSequence[]],
        Pattern[x, Blank[]][Pattern[a, BlankSequence[]]],
        Pattern[x, Blank[]][Pattern[b, BlankSequence[]]],
        Pattern[l, BlankNullSequence[]]], List[f, x[a, b], l]]]

In a lisp form, it'd be:

(ReplaceRepeated (Sort mylist)
            (Rule
             (List
              (Pattern f (BlankNullSequence))
              ((Pattern x (Blank)) (Pattern a BlankSequence))
              ((Pattern x (Blank)) (Pattern b BlankSequence))
              (Pattern l (BlankNullSequence)))
             (List f (x a b) l)))

That's some power of fully nested, _regular_, syntax.

For some detailed reading regarding the syntax, see:
 “The Concepts and Confusions of Prefix, Infix, Postfix and Fully
Functional Notations”
 http://xahlee.org/UnixResource_dir/writ/notations.html

-------------------------

Now, Qi support pattern matching in common lisp. I wonder if the above
can be translated to Qi. A functional lang such as Haskell or OCaml
would be interesting.

  Xah
∑ http://xahlee.org/

☄
From: Jon Harrop
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <g7g7nr$eiq$1@aioe.org>
······@gmail.com wrote:
> For example, this whole expression:
> 
> ····@mylist /. {f___, x_[a__], x_[b__], l___} -> {f, x[a, b], l}
> ...

In OCaml/F#, the original list would be a list of tuples of integers and
lists and the solution is then simply:

  List.sort (fun (a, _) (b, _) -> compare a b) mylist

-- 
Dr Jon D Harrop, Flying Frog Consultancy
http://www.ffconsultancy.com/products/?u
From: William James
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <45ba28f7-94f1-4b53-a5fb-d75afc5fd1da@x41g2000hsb.googlegroups.com>
On Jul 29, 12:29 pm, Cortez <············@hotmail.co.uk> wrote:
> I need to traverse a list of lists, where each sublist is labelled by
> a number, and collect together the contents of all sublists sharing
> the same label. So if I have the list -
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
> r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce -
>
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>
> I do this with the following -
>
> (defun test (list)
>   (loop for j in list
>           for index = (first j)
>           for k = (rest j)
>           with indices = nil
>           if (not (member index indices))
>             do (pushnew index indices)
>             and collect k into res
>           else
>             do (nconc (nth index res) k)
>           finally (return res)))

Ruby:

[[0,"a","b"],[1,"c","d"],[2,"e","f"],[3,"g","h"],[1,"i","j"],
[2,"k","l"],[4,"m","n"],[2,"o","p"],[4,"q","r"],[5,"s","t"]].
sort.inject([]){|a,b|
  (a==[] or b[0] != a[-1][0]) ?
    a << b : ( a[-1] += b[1..-1] ; a ) }.
map{|a| a[1..-1] }

    ==>[["a", "b"], ["c", "d", "i", "j"], ["e", "f", "k", "l", "o",
"p"],
 ["g","h"], ["m", "n", "q", "r"], ["s", "t"]]
From: André Thieme
Subject: Re: Collecting like-labelled sublists of a list
Date: 
Message-ID: <gavhp2$ckj$1@registered.motzarella.org>
William James schrieb:
> On Jul 29, 12:29 pm, Cortez <············@hotmail.co.uk> wrote:
>> I need to traverse a list of lists, where each sublist is labelled by
>> a number, and collect together the contents of all sublists sharing
>> the same label. So if I have the list -
>>
>> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q
>> r) (5 s t))
>>
>> where the first element of each sublist is the label, I need to
>> produce -
>>
>> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))
>>
>> I do this with the following -
>>
>> (defun test (list)
>>   (loop for j in list
>>           for index = (first j)
>>           for k = (rest j)
>>           with indices = nil
>>           if (not (member index indices))
>>             do (pushnew index indices)
>>             and collect k into res
>>           else
>>             do (nconc (nth index res) k)
>>           finally (return res)))
> 
> Ruby:
> 
> [[0,"a","b"],[1,"c","d"],[2,"e","f"],[3,"g","h"],[1,"i","j"],
> [2,"k","l"],[4,"m","n"],[2,"o","p"],[4,"q","r"],[5,"s","t"]].
> sort.inject([]){|a,b|
>   (a==[] or b[0] != a[-1][0]) ?
>     a << b : ( a[-1] += b[1..-1] ; a ) }.
> map{|a| a[1..-1] }

Although your Ruby solution looks shorter, because the function names
only consist of a few characters your solution is longer in fact, when
looking at the program complexity, which is what really counts in my
opinion. We don�t want programs to have more complexity elements.

I did not count the data structure itself, because in that case Ruby
would easily have lost, because of the so many strings.
While '((1 a b) (2 c d)) are only 11 points we have
[[1, "a", "b"], [2, "c", "d"]] already with 18 points.

Both, the Lisp code and your Ruby code (without counting the complexity
points of your data structure) have 46 points. But the Lisp code wraps
this into a function.
So, in your Ruby code does less. If we add  def test(list) and end
we already have a more complex Ruby version.


Andr�
--