How do I make this utility more flexible without losing speed?

From: Chaitanya Gupta
Subject: How do I make this utility more flexible without losing speed?
Date: Tue, 08 May 2007 12:24:11 +0000
Message-ID: <f1pq5c$m8o$1@registered.motzarella.org>

Hello,

I have a function, GROUP, which groups together items in a list if they
satisfy a TEST (of two arguments), after applying the parameter KEY to 
both the items -

(defun group (list &key (test #'eql) (key #'identity))
   "Groups items in a list based on TEST and KEY"
   (let ((hash-table (make-hash-table :test test)))
     (dolist (el list)
       (push el (gethash (funcall key el) hash-table)))
     (loop
        for val being the hash-value in hash-table
        collect val into vals
        finally (return vals))))
=>
GROUP


(defvar *strings* '("foo" "bar" "baz" "quux" "quuux" "bazola" "gorp"))
=>
*STRINGS*


(group *strings* :key #'(lambda (x) (char x 0)) :test #'equalp)
=>
(("bazola" "baz" "bar") ("foo") ("gorp") ("quuux" "quux"))


Seems good enough. But the only limitation of this function is that
because it uses hash tables, TEST can only be one of EQ, EQL,
EQUAL or EQUALP.

So I write something which should take any TEST argument -

(defun group2 (list &key (test #'eql) (key #'identity))
   (let ((result nil))
     (dolist (el list)
       (let* ((entry-key (funcall key el))
	     (entry (assoc entry-key result :test test)))
	(if entry
	    (push el (cdr entry))
	    (push (list entry-key el) result))))
     (mapcar #'cdr result)))
=>
GROUP2

(group2 *strings* :key #'(lambda (x) (char x 0)) :test #'char-equal)
=>
(("gorp") ("quuux" "quux") ("bazola" "baz" "bar") ("foo"))


(group2 *strings* :test #'(lambda (x y)
			    (char-equal (char x 0) (char y 0))))
=>
(("gorp") ("quuux" "quux") ("bazola" "baz" "bar") ("foo"))


Now this works fine. But my only concern is that GROUP2 would be too 
slow when the number of "keys" for grouping is large.

e.g.

(defvar *nums* (loop repeat 100000 collect (random 100000)))
=>
*NUMS*

(time (loop repeat 10 do (group *nums* :key #'(lambda (x) (mod x 1000)))))
=>
; cpu time (non-gc) 2,050 msec user, 10 msec system
; cpu time (gc)     160 msec user, 0 msec system
; cpu time (total)  2,210 msec user, 10 msec system
; real time  2,219 msec
; space allocation:
;  12,010,358 cons cells, 80,607,872 other bytes, 0 static bytes
NIL


(time (loop repeat 10 do (group2 *nums* :key #'(lambda (x) (mod x 1000)))))
=>
; cpu time (non-gc) 19,450 msec user, 30 msec system
; cpu time (gc)     170 msec user, 0 msec system
; cpu time (total)  19,620 msec user, 30 msec system
; real time  19,666 msec
; space allocation:
;  12,030,348 cons cells, 80,015,168 other bytes, 0 static bytes
NIL

So, how do I make GROUP2 faster using standard CL constructs?

Thanks,

Chaitanya

Re: How do I make this utility more flexible without losing speed? Alan Manuel K. Gloria
- Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
  - Re: How do I make this utility more flexible without losing speed? Geoff Wozniak
    - Re: How do I make this utility more flexible without losing speed? John Thingstad
    - Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
      - Re: How do I make this utility more flexible without losing speed? Ken Tilton
        Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
        Re: How do I make this utility more flexible without losing speed? Geoff Wozniak
        Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
        Re: How do I make this utility more flexible without losing speed? Ken Tilton
        Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
        Re: How do I make this utility more flexible without losing speed? Ken Tilton
        Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
        Re: How do I make this utility more flexible without losing speed? Ken Tilton
        Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
        Re: How do I make this utility more flexible without losing speed? Ken Tilton
Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
Re: How do I make this utility more flexible without losing speed? Wade Humeniuk
- Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
  - Re: How do I make this utility more flexible without losing speed? Wade Humeniuk
    - Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
      - Re: How do I make this utility more flexible without losing speed? Duane Rettig
        Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
        Re: How do I make this utility more flexible without losing speed? Duane Rettig
        Re: How do I make this utility more flexible without losing speed? Pascal Bourguignon
        Re: How do I make this utility more flexible without losing speed? Chaitanya Gupta
Re: How do I make this utility more flexible without losing speed? Madhu
- Re: How do I make this utility more flexible without losing speed? Madhu

From: Alan Manuel K. Gloria
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Tue, 08 May 2007 14:11:05 +0000
Message-ID: <1178633465.138457.235960@e65g2000hsc.googlegroups.com>

On May 8, 8:24 pm, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
> Hello,
>
> I have a function, GROUP, which groups together items in a list if they
> satisfy a TEST (of two arguments), after applying the parameter KEY to
> both the items -
>
> (defun group (list &key (test #'eql) (key #'identity))
>    "Groups items in a list based on TEST and KEY"
>    (let ((hash-table (make-hash-table :test test)))
>      (dolist (el list)
>        (push el (gethash (funcall key el) hash-table)))
>      (loop
>         for val being the hash-value in hash-table
>         collect val into vals
>         finally (return vals))))
> =>
> GROUP
>
> (defvar *strings* '("foo" "bar" "baz" "quux" "quuux" "bazola" "gorp"))
> =>
> *STRINGS*
>
> (group *strings* :key #'(lambda (x) (char x 0)) :test #'equalp)
> =>
> (("bazola" "baz" "bar") ("foo") ("gorp") ("quuux" "quux"))
>
> Seems good enough. But the only limitation of this function is that
> because it uses hash tables, TEST can only be one of EQ, EQL,
> EQUAL or EQUALP.
>
> So I write something which should take any TEST argument -
>
> (defun group2 (list &key (test #'eql) (key #'identity))
>    (let ((result nil))
>      (dolist (el list)
>        (let* ((entry-key (funcall key el))
>              (entry (assoc entry-key result :test test)))
>         (if entry
>             (push el (cdr entry))
>             (push (list entry-key el) result))))
>      (mapcar #'cdr result)))
> =>
> GROUP2
>
> (group2 *strings* :key #'(lambda (x) (char x 0)) :test #'char-equal)
> =>
> (("gorp") ("quuux" "quux") ("bazola" "baz" "bar") ("foo"))
>
> (group2 *strings* :test #'(lambda (x y)
>                             (char-equal (char x 0) (char y 0))))
> =>
> (("gorp") ("quuux" "quux") ("bazola" "baz" "bar") ("foo"))
>
> Now this works fine. But my only concern is that GROUP2 would be too
> slow when the number of "keys" for grouping is large.
>
> e.g.
>
> (defvar *nums* (loop repeat 100000 collect (random 100000)))
> =>
> *NUMS*
>
> (time (loop repeat 10 do (group *nums* :key #'(lambda (x) (mod x 1000)))))
> =>
> ; cpu time (non-gc) 2,050 msec user, 10 msec system
> ; cpu time (gc)     160 msec user, 0 msec system
> ; cpu time (total)  2,210 msec user, 10 msec system
> ; real time  2,219 msec
> ; space allocation:
> ;  12,010,358 cons cells, 80,607,872 other bytes, 0 static bytes
> NIL
>
> (time (loop repeat 10 do (group2 *nums* :key #'(lambda (x) (mod x 1000)))))
> =>
> ; cpu time (non-gc) 19,450 msec user, 30 msec system
> ; cpu time (gc)     170 msec user, 0 msec system
> ; cpu time (total)  19,620 msec user, 30 msec system
> ; real time  19,666 msec
> ; space allocation:
> ;  12,030,348 cons cells, 80,015,168 other bytes, 0 static bytes
> NIL
>
> So, how do I make GROUP2 faster using standard CL constructs?
>
> Thanks,
>
> Chaitanya
Hmm, as far as I can tell, hash tables would be the best for your
problem domain.

However, it seems that you might prefer to use a different test
function.  Here are some things I noticed -
1.  (dolist ... with an inner (assoc ....)) is actually a loop within
a loop.  The outer loop is the dolist, the inner loop the assoc.
2.  The cleanup (mapcar ...) is yet another loop, but is thankfully
not a loop within the other loops.  Considering that it's been
grouped, I expect the list here to be reasonably small.  Probably not
worth optimizing out.

Anyway it's an interesting problem, so I'll try to hack on it.

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Wed, 09 May 2007 06:41:57 +0000
Message-ID: <f1rqfo$ao2$1@registered.motzarella.org>

Alan Manuel K. Gloria wrote:
> Hmm, as far as I can tell, hash tables would be the best for your
> problem domain.
> 
> However, it seems that you might prefer to use a different test
> function.  Here are some things I noticed -
> 1.  (dolist ... with an inner (assoc ....)) is actually a loop within
> a loop.  The outer loop is the dolist, the inner loop the assoc.

Point. That's the problem I am trying to solve.

> 2.  The cleanup (mapcar ...) is yet another loop, but is thankfully
> not a loop within the other loops.  Considering that it's been
> grouped, I expect the list here to be reasonably small.  Probably not
> worth optimizing out.

Even for GROUP (using hash tables), the form (loop for val being the 
hash-value... ), is a loop which does the samething as (mapcar ...) in 
GROUP2 i.e. spit out a grouped list. I don't think this makes any 
difference.

Chaitanya

From: Geoff Wozniak
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Wed, 09 May 2007 13:29:46 +0000
Message-ID: <1178717386.221624.254230@y5g2000hsa.googlegroups.com>

On May 9, 2:41 am, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
> Alan Manuel K. Gloria wrote:
>
> > Hmm, as far as I can tell, hash tables would be the best for your
> > problem domain.
>
> Point. That's the problem I am trying to solve.
>

Why not both hash tables and lists, depending on the situation?

(defun group-hash (list test key)
  (let ((hash-table (make-hash-table :test test)))
    (dolist (el list)
      (push el (gethash (funcall key el) hash-table)))
    (loop
       for val being the hash-value in hash-table
       collect val into vals
       finally (return vals))))

(defun group-list (list test key)
  (let ((groups nil))
    (dolist (elt list groups)
      (let ((pos (position (funcall key elt) groups :test
                                 (lambda (e group)
                                   (funcall test e
                                              (funcall key (car
group)))))))
        (if pos
           (push elt (nth pos groups))
           (push (list elt) groups))))))

(defun standard-test-function-p (fn)
  (member fn (list 'eq     #'eq
                          'eql    #'eql
                          'equal  #'equal
                          'equalp #'equalp)))

(defun group (list &key (test #'eql) (key #'identity))
  "Groups items in a list based on TEST and KEY.  If TEST is one of
the
standard equality functions, the function will likely be considerably
faster."
  (if (standard-test-function-p test)
      (group-hash list test key)
      (group-list list test key)))

CL-USER> (defvar *nums* (loop repeat 100000 collect (random 100000)))
*NUMS*
CL-USER> (time (progn (group *nums* :key #'(lambda (x) (mod x 100)))
nil))
Evaluation took:
  0.032 seconds of real time
  0.029349 seconds of user run time
  0.001179 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  806,960 bytes consed.
NIL
CL-USER> (time (progn (group *nums*
                                             :test #'(lambda (x y)
(eql x y))
                                             :key #'(lambda (x) (mod x
100))) nil))
Evaluation took:
  0.624 seconds of real time
  0.536391 seconds of user run time
  0.005239 seconds of system run time
  [Run times include 0.023 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  2,398,024 bytes consed.
NIL

(This is using SBCL 1.0.5.  Also, I hope the formatting comes out
since I'm posting this through Google Groups.)

I can't see any way around the problem of iterating through the known
equivalence classes (groups) when testing to see what class an item
should belong to.  With a pre-defined mapping from the keys to the
natural numbers, you can deal with it in an efficient way (like hash
tables).  Without that knowledge, I don't see a way of guaranteeing
much of anything.

When situations like this arise, I try to take advantage of what
information is available and get the function to work fast for those
situations.  In this case, if the test function is compatible with a
hash table, then I say take advantage of it.  There will still be
situations where it's slow, but I don't see a major problem if you
document the behaviour.

Alternatively, if you're not worried about tracing the function, you
could write a compiler macro that specializes the call.

Geoff

From: John Thingstad
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Wed, 09 May 2007 13:50:42 +0000
Message-ID: <op.tr2bes19pqzri1@pandora.upc.no>

On Wed, 09 May 2007 15:29:46 +0200, Geoff Wozniak  
<·············@gmail.com> wrote:

> On May 9, 2:41 am, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
>> Alan Manuel K. Gloria wrote:
>>
>> > Hmm, as far as I can tell, hash tables would be the best for your
>> > problem domain.
>>
>> Point. That's the problem I am trying to solve.
>>
>
> Why not both hash tables and lists, depending on the situation?
>
> (defun group (list &key (test #'eql) (key #'identity))
>   "Groups items in a list based on TEST and KEY.  If TEST is one of
> the
> standard equality functions, the function will likely be considerably
> faster."
>   (if (standard-test-function-p test)
>       (group-hash list test key)
>       (group-list list test key)))
>

Better still use define-compiler-macro to determine this
at compile time if possible.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 14:45:26 +0000
Message-ID: <f1vb66$9f3$1@registered.motzarella.org>

Geoff Wozniak wrote:
> On May 9, 2:41 am, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
>> Alan Manuel K. Gloria wrote:
>>
>>> Hmm, as far as I can tell, hash tables would be the best for your
>>> problem domain.
>> Point. That's the problem I am trying to solve.
>>
> 
> Why not both hash tables and lists, depending on the situation?
> 
> (defun group-hash (list test key)
>   (let ((hash-table (make-hash-table :test test)))
>     (dolist (el list)
>       (push el (gethash (funcall key el) hash-table)))
>     (loop
>        for val being the hash-value in hash-table
>        collect val into vals
>        finally (return vals))))
> 
> (defun group-list (list test key)
>   (let ((groups nil))
>     (dolist (elt list groups)
>       (let ((pos (position (funcall key elt) groups :test
>                                  (lambda (e group)
>                                    (funcall test e
>                                               (funcall key (car
> group)))))))
>         (if pos
>            (push elt (nth pos groups))
>            (push (list elt) groups))))))
> 
> (defun standard-test-function-p (fn)
>   (member fn (list 'eq     #'eq
>                           'eql    #'eql
>                           'equal  #'equal
>                           'equalp #'equalp)))
> 
> (defun group (list &key (test #'eql) (key #'identity))
>   "Groups items in a list based on TEST and KEY.  If TEST is one of
> the
> standard equality functions, the function will likely be considerably
> faster."
>   (if (standard-test-function-p test)
>       (group-hash list test key)
>       (group-list list test key)))

Cool. Didn't think about that. From what I know till now, this is 
probably the optimum solution.

>
> 
> I can't see any way around the problem of iterating through the known
> equivalence classes (groups) when testing to see what class an item
> should belong to.  With a pre-defined mapping from the keys to the
> natural numbers, you can deal with it in an efficient way (like hash
> tables).  Without that knowledge, I don't see a way of guaranteeing
> much of anything.
> 
> When situations like this arise, I try to take advantage of what
> information is available and get the function to work fast for those
> situations.  In this case, if the test function is compatible with a
> hash table, then I say take advantage of it.  There will still be
> situations where it's slow, but I don't see a major problem if you
> document the behaviour.
> 
> Alternatively, if you're not worried about tracing the function, you
> could write a compiler macro that specializes the call.

I don't think a compiler macro will make much of a difference though. 
The test for the TEST function is made only once, so it won't have much 
impact when the dataset is large, IMHO.

Chaitanya

From: Ken Tilton
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 16:58:15 +0000
Message-ID: <N8I0i.6$ep4.4@newsfe12.lga>

Chaitanya Gupta wrote:
> Geoff Wozniak wrote:
> 
>> On May 9, 2:41 am, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
>>
>>> Alan Manuel K. Gloria wrote:
>>>
>>>> Hmm, as far as I can tell, hash tables would be the best for your
>>>> problem domain.
>>>
>>> Point. That's the problem I am trying to solve.
>>>
>>
>> Why not both hash tables and lists, depending on the situation?
>>
>> (defun group-hash (list test key)
>>   (let ((hash-table (make-hash-table :test test)))
>>     (dolist (el list)
>>       (push el (gethash (funcall key el) hash-table)))
>>     (loop
>>        for val being the hash-value in hash-table
>>        collect val into vals
>>        finally (return vals))))
>>
>> (defun group-list (list test key)
>>   (let ((groups nil))
>>     (dolist (elt list groups)
>>       (let ((pos (position (funcall key elt) groups :test
>>                                  (lambda (e group)
>>                                    (funcall test e
>>                                               (funcall key (car
>> group)))))))
>>         (if pos
>>            (push elt (nth pos groups))
>>            (push (list elt) groups))))))
>>
>> (defun standard-test-function-p (fn)
>>   (member fn (list 'eq     #'eq
>>                           'eql    #'eql
>>                           'equal  #'equal
>>                           'equalp #'equalp)))
>>
>> (defun group (list &key (test #'eql) (key #'identity))
>>   "Groups items in a list based on TEST and KEY.  If TEST is one of
>> the
>> standard equality functions, the function will likely be considerably
>> faster."
>>   (if (standard-test-function-p test)
>>       (group-hash list test key)
>>       (group-list list test key)))
> 
> 
> Cool. Didn't think about that. From what I know till now, this is 
> probably the optimum solution.

What happens if you have a non-standard test and a large dataset?

Part of the problem is your definition of the problem: you limit
solutions to standard Lisp constructs. Is this homework? If not, use
something non-standard, a C library or your own hashing algorithm.

But if you really really really want to use only standard Lisp, then
listen to yourself!:

You want hash tables for speed, and they only take certain tests. So you
have to use one of those tests:

(group2 *strings*
      :key #'(lambda (x &aux (c (char x 0)))
                (cons c (upper-case-p c)))
      :test #'equal)

ie, if the mountain won't come to your test function, your key function
must go to the mountain.

hth,kzo

-- 
http://www.theoryyalgebra.com/

"Algebra is the metaphysics of arithmetic." - John Ray

"As long as algebra is taught in school,
there will be prayer in school." - Cokie Roberts

"Stand firm in your refusal to remain conscious during algebra."
    - Fran Lebowitz

"I'm an algebra liar. I figure two good lies make a positive."
    - Tim Allen

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 18:04:11 +0000
Message-ID: <1178820251.086157.188930@o5g2000hsb.googlegroups.com>

On May 10, 9:58 pm, Ken Tilton <···········@optonline.net> wrote:
> What happens if you have a non-standard test and a large dataset?
That of course remains a problem.

>
> Part of the problem is your definition of the problem: you limit
> solutions to standard Lisp constructs. Is this homework? If not, use
> something non-standard, a C library or your own hashing algorithm.
>
> But if you really really really want to use only standard Lisp, then
> listen to yourself!:
I don't really really really want to use only standard (Common) Lisp,
but I was hoping I could do it that way.

>
> You want hash tables for speed, and they only take certain tests. So you
> have to use one of those tests:
>
> (group2 *strings*
>       :key #'(lambda (x &aux (c (char x 0)))
>                 (cons c (upper-case-p c)))
>       :test #'equal)
>
> ie, if the mountain won't come to your test function, your key function
> must go to the mountain.
I know, and this is what I have been doing, but that's just plain
*ugly*. It's not the right thing...

Still, I think I can get away with GROUP2 (or Geoff Wozniak's GROUP-
LIST) most of the time I need a non-hashtable test. So I will probably
use something similar to the above solution (i.e. use GROUP-HASH when
the test is one of the standard hash table tests, and GROUP-LIST
otherwise).

Unless, of course, there's still some standard way to make the
mountain come to TEST... ;)

Chaitanya

From: Geoff Wozniak
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 20:24:20 +0000
Message-ID: <1178828660.277503.213760@e65g2000hsc.googlegroups.com>

On May 10, 2:04 pm, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
> Unless, of course, there's still some standard way to make the
> mountain come to TEST... ;)
>

There's an excellent article by Kent Pitman that may provide some
insight into the difficulties of making the mountain come to TEST.

http://www.nhplace.com/kent/PS/EQUAL.html

Kenny is right: the KEY function must come to the mountain because it
affects the TEST function.  If you make the KEY function return
something compatible with the EQ* functions, then you'll always be
able to use hash tables.  Otherwise, you'll be forced to write your
own hash functions.  (And since it looks like you're using ACL, this
means you'll be able to use them in conjunction with MAKE-HASH-TABLE.)

Geoff

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Sat, 12 May 2007 09:45:31 +0000
Message-ID: <f242bp$rq2$1@registered.motzarella.org>

Geoff Wozniak wrote:
> On May 10, 2:04 pm, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
>> Unless, of course, there's still some standard way to make the
>> mountain come to TEST... ;)
>>
> 
> There's an excellent article by Kent Pitman that may provide some
> insight into the difficulties of making the mountain come to TEST.
> 
> http://www.nhplace.com/kent/PS/EQUAL.html
Thanks for the link. Really excellent.

> 
> Kenny is right: the KEY function must come to the mountain because it
> affects the TEST function.  If you make the KEY function return
> something compatible with the EQ* functions, then you'll always be
> able to use hash tables.  Otherwise, you'll be forced to write your
> own hash functions.  (And since it looks like you're using ACL, this
> means you'll be able to use them in conjunction with MAKE-HASH-TABLE.)
Thanks for the bit about ACL's hash tables. I will check it up. But then 
of course, I go off from my original requirement for a "standard" CL 
solution... ;)

Chaitanya

From: Ken Tilton
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 18:18:41 +0000
Message-ID: <bkJ0i.16$ep4.9@newsfe12.lga>

Chaitanya Gupta wrote:
> On May 10, 9:58 pm, Ken Tilton <···········@optonline.net> wrote:
> 
>>What happens if you have a non-standard test and a large dataset?
> 
> That of course remains a problem.
> 
> 
>>Part of the problem is your definition of the problem: you limit
>>solutions to standard Lisp constructs. Is this homework? If not, use
>>something non-standard, a C library or your own hashing algorithm.
>>
>>But if you really really really want to use only standard Lisp, then
>>listen to yourself!:
> 
> I don't really really really want to use only standard (Common) Lisp,
> but I was hoping I could do it that way.
> 
> 
>>You want hash tables for speed, and they only take certain tests. So you
>>have to use one of those tests:
>>
>>(group2 *strings*
>>      :key #'(lambda (x &aux (c (char x 0)))
>>                (cons c (upper-case-p c)))
>>      :test #'equal)
>>
>>ie, if the mountain won't come to your test function, your key function
>>must go to the mountain.
> 
> I know, and this is what I have been doing, but that's just plain
> *ugly*. It's not the right thing...

Ugly? Funny, I was thinking "elegant". I am afraid you are not yet 
listening to yourself. Given your constraints (the acceptable test 
parameters to the only standard Lisp mechanism that does random access), 
that is precisely The Right Thing: provide a key function that satisfies 
one of the acceptable test parameters.

So what we see is that poor Wade is working his tail off to solve a 
problem arising only from your poor judgment of ugliness. tsk, tsk.

:)

kenzo

-- 
http://www.theoryyalgebra.com/

"Algebra is the metaphysics of arithmetic." - John Ray

"As long as algebra is taught in school,
there will be prayer in school." - Cokie Roberts

"Stand firm in your refusal to remain conscious during algebra."
    - Fran Lebowitz

"I'm an algebra liar. I figure two good lies make a positive."
    - Tim Allen

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 18:53:26 +0000
Message-ID: <1178823206.215035.202250@h2g2000hsg.googlegroups.com>

On May 10, 11:18 pm, Ken Tilton <···········@optonline.net> wrote:
> Ugly? Funny, I was thinking "elegant". I am afraid you are not yet
> listening to yourself. Given your constraints (the acceptable test
> parameters to the only standard Lisp mechanism that does random access),
I guess this is precisely what I should have asked in the first place
- whether anything other than hash tables that provide random access
in CL (I think symbol plists do, but then the test is even more
constrained). If there's really nothing else, I won't worry about a
"standard" CL solution.

> that is precisely The Right Thing: provide a key function that satisfies
> one of the acceptable test parameters.

Its maybe the right thing.. but its still ugly. I would prefer

(group things
       :test #'(lambda (x y)
		 (and (= (some-number x) (some-number y))
		      (string-equal (some-string x) (some-string y)))))

to

(group things
       :key #'(lambda (x)
		(list (some-number x) (some-string y)))
       :test #'equal)


And if you want to take things even further (disclaimer: I don't think
I'll ever need this, but still...)

(group things
       :test #'(lambda (x y)
		 (or (= (some-number x) (some-number y))
		     (string-equal (some-string x) (some-string y)))))


How do you take the key to the mountain here? I can't think of
anything (maybe because I am only half-awake now, really should get
some sleep...)

>
> So what we see is that poor Wade is working his tail off to solve a
> problem arising only from your poor judgment of ugliness. tsk, tsk.
>
I hope Wade doesn't read this. ;)

cheers,
Chaitanya

From: Ken Tilton
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 19:35:52 +0000
Message-ID: <ysK0i.22$kJ1.8@newsfe12.lga>

Chaitanya Gupta wrote:
> On May 10, 11:18 pm, Ken Tilton <···········@optonline.net> wrote:
> 
>>Ugly? Funny, I was thinking "elegant". I am afraid you are not yet
>>listening to yourself. Given your constraints (the acceptable test
>>parameters to the only standard Lisp mechanism that does random access),
> 
> I guess this is precisely what I should have asked in the first place
> - whether anything other than hash tables that provide random access
> in CL (I think symbol plists do, but then the test is even more
> constrained). If there's really nothing else, I won't worry about a
> "standard" CL solution.
> 
> 
>>that is precisely The Right Thing: provide a key function that satisfies
>>one of the acceptable test parameters.
> 
> 
> Its maybe the right thing.. but its still ugly. 

Well, you certainly have your heels dug in on the word "ugly". I can 
almost hear you stamping your foot. Cleary I am going to need a bigger 
tractor:

Tilton's Law: "The model must map perfectly onto the thing being 
modelled, including looking bad where the thing modelled looks bad."

The artificial constraints on the solution (limited test funcs) verily 
demand heavy-lifting by the key function.

If Tilton's Law is enough to pry the word "ugly" from your tongue, maybe 
Einstein can: "Make everything as simple as possible, but not simpler."

Look at it another way: can you imagine how general-purpose 
random-access with any test function would be implemented? I am just a 
simple application programmer, but if I do not know when X and Y are the 
same, how do I (as author of this general purpose random-access library) 
concoct a key entry for X which will reliably find X but not Y when they 
are different according to the arbitrary test, or replace X in the index 
when they are the same?

If, as I suspect, this is inconceivable, then your very requirement 
is... well, let's just hope Wade does not see this, either.

:)

kenzo

-- 
http://www.theoryyalgebra.com/

"Algebra is the metaphysics of arithmetic." - John Ray

"As long as algebra is taught in school,
there will be prayer in school." - Cokie Roberts

"Stand firm in your refusal to remain conscious during algebra."
    - Fran Lebowitz

"I'm an algebra liar. I figure two good lies make a positive."
    - Tim Allen

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Sat, 12 May 2007 09:28:30 +0000
Message-ID: <f241bq$ot2$1@registered.motzarella.org>

Ken Tilton wrote:
> 
> 
> 
> Well, you certainly have your heels dug in on the word "ugly". I can 
> almost hear you stamping your foot. Cleary I am going to need a bigger 
> tractor:
> 
> Tilton's Law: "The model must map perfectly onto the thing being 
> modelled, including looking bad where the thing modelled looks bad."
Makes sense. But, what is the model? And what is the thing being modelled?

> 
> The artificial constraints on the solution (limited test funcs) verily 
> demand heavy-lifting by the key function.
Never disagreed with that.

> 
> If Tilton's Law is enough to pry the word "ugly" from your tongue, maybe 
> Einstein can: "Make everything as simple as possible, but not simpler."
This sounds right, but I just can't understand it... Maybe when I grow 
old, and have white hair, I will... ;)

> 
> Look at it another way: can you imagine how general-purpose 
> random-access with any test function would be implemented? I am just a 
> simple application programmer, but if I do not know when X and Y are the 
> same, how do I (as author of this general purpose random-access library) 
> concoct a key entry for X which will reliably find X but not Y when they 
> are different according to the arbitrary test, or replace X in the index 
> when they are the same?
Hmmm... I think I can see the point you are trying to make. Maybe I 
should understand better how hash tables work.


Chaitanya

From: Ken Tilton
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 17:53:52 +0000
Message-ID: <WYI0i.14$ep4.13@newsfe12.lga>

Ken Tilton wrote:
> 
> 
> Chaitanya Gupta wrote:
> 
>> Geoff Wozniak wrote:
>>
>>> On May 9, 2:41 am, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
>>>
>>>> Alan Manuel K. Gloria wrote:
>>>>
>>>>> Hmm, as far as I can tell, hash tables would be the best for your
>>>>> problem domain.
>>>>
>>>>
>>>> Point. That's the problem I am trying to solve.
>>>>
>>>
>>> Why not both hash tables and lists, depending on the situation?
>>>
>>> (defun group-hash (list test key)
>>>   (let ((hash-table (make-hash-table :test test)))
>>>     (dolist (el list)
>>>       (push el (gethash (funcall key el) hash-table)))
>>>     (loop
>>>        for val being the hash-value in hash-table
>>>        collect val into vals
>>>        finally (return vals))))
>>>
>>> (defun group-list (list test key)
>>>   (let ((groups nil))
>>>     (dolist (elt list groups)
>>>       (let ((pos (position (funcall key elt) groups :test
>>>                                  (lambda (e group)
>>>                                    (funcall test e
>>>                                               (funcall key (car
>>> group)))))))
>>>         (if pos
>>>            (push elt (nth pos groups))
>>>            (push (list elt) groups))))))
>>>
>>> (defun standard-test-function-p (fn)
>>>   (member fn (list 'eq     #'eq
>>>                           'eql    #'eql
>>>                           'equal  #'equal
>>>                           'equalp #'equalp)))
>>>
>>> (defun group (list &key (test #'eql) (key #'identity))
>>>   "Groups items in a list based on TEST and KEY.  If TEST is one of
>>> the
>>> standard equality functions, the function will likely be considerably
>>> faster."
>>>   (if (standard-test-function-p test)
>>>       (group-hash list test key)
>>>       (group-list list test key)))
>>
>>
>>
>> Cool. Didn't think about that. From what I know till now, this is 
>> probably the optimum solution.
> 
> 
> What happens if you have a non-standard test and a large dataset?
> 
> Part of the problem is your definition of the problem: you limit
> solutions to standard Lisp constructs. Is this homework? If not, use
> something non-standard, a C library or your own hashing algorithm.
> 
> But if you really really really want to use only standard Lisp, then
> listen to yourself!:
> 
> You want hash tables for speed, and they only take certain tests. So you
> have to use one of those tests:
> 
> (group2 *strings*
>      :key #'(lambda (x &aux (c (char x 0)))
>                (cons c (upper-case-p c)))
>      :test #'equal)

Sorry, I meant to disclose "mostly untested", and I meant to (downcase 
or upcase (or char-code, think to come of it.. hmmms, can we avoid the 
cons? express lowercase as minus the char-code? something like that...)) 
the char.

kt

> 
> ie, if the mountain won't come to your test function, your key function
> must go to the mountain.
> 
> hth,kzo
> 

-- 
http://www.theoryyalgebra.com/

"Algebra is the metaphysics of arithmetic." - John Ray

"As long as algebra is taught in school,
there will be prayer in school." - Cokie Roberts

"Stand firm in your refusal to remain conscious during algebra."
    - Fran Lebowitz

"I'm an algebra liar. I figure two good lies make a positive."
    - Tim Allen

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 18:09:04 +0000
Message-ID: <1178820543.975366.87910@y5g2000hsa.googlegroups.com>

On May 10, 10:53 pm, Ken Tilton <···········@optonline.net> wrote:
> Ken Tilton wrote:
>
> > Chaitanya Gupta wrote:
>
> >> Geoff Wozniak wrote:
>
> >>> On May 9, 2:41 am, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
>
> >>>> Alan Manuel K. Gloria wrote:
>
> >>>>> Hmm, as far as I can tell, hash tables would be the best for your
> >>>>> problem domain.
>
> >>>> Point. That's the problem I am trying to solve.
>
> >>> Why not both hash tables and lists, depending on the situation?
>
> >>> (defun group-hash (list test key)
> >>>   (let ((hash-table (make-hash-table :test test)))
> >>>     (dolist (el list)
> >>>       (push el (gethash (funcall key el) hash-table)))
> >>>     (loop
> >>>        for val being the hash-value in hash-table
> >>>        collect val into vals
> >>>        finally (return vals))))
>
> >>> (defun group-list (list test key)
> >>>   (let ((groups nil))
> >>>     (dolist (elt list groups)
> >>>       (let ((pos (position (funcall key elt) groups :test
> >>>                                  (lambda (e group)
> >>>                                    (funcall test e
> >>>                                               (funcall key (car
> >>> group)))))))
> >>>         (if pos
> >>>            (push elt (nth pos groups))
> >>>            (push (list elt) groups))))))
>
> >>> (defun standard-test-function-p (fn)
> >>>   (member fn (list 'eq     #'eq
> >>>                           'eql    #'eql
> >>>                           'equal  #'equal
> >>>                           'equalp #'equalp)))
>
> >>> (defun group (list &key (test #'eql) (key #'identity))
> >>>   "Groups items in a list based on TEST and KEY.  If TEST is one of
> >>> the
> >>> standard equality functions, the function will likely be considerably
> >>> faster."
> >>>   (if (standard-test-function-p test)
> >>>       (group-hash list test key)
> >>>       (group-list list test key)))
>
> >> Cool. Didn't think about that. From what I know till now, this is
> >> probably the optimum solution.
>
> > What happens if you have a non-standard test and a large dataset?
>
> > Part of the problem is your definition of the problem: you limit
> > solutions to standard Lisp constructs. Is this homework? If not, use
> > something non-standard, a C library or your own hashing algorithm.
>
> > But if you really really really want to use only standard Lisp, then
> > listen to yourself!:
>
> > You want hash tables for speed, and they only take certain tests. So you
> > have to use one of those tests:
>
> > (group2 *strings*
> >      :key #'(lambda (x &aux (c (char x 0)))
> >                (cons c (upper-case-p c)))
> >      :test #'equal)
>
> Sorry, I meant to disclose "mostly untested", and I meant to (downcase
> or upcase (or char-code, think to come of it.. hmmms, can we avoid the
> cons? express lowercase as minus the char-code? something like that...))
> the char.
>

Maybe just use #'equalp as test?

From: Ken Tilton
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 18:26:03 +0000
Message-ID: <5rJ0i.17$ep4.16@newsfe12.lga>

Chaitanya Gupta wrote:
> On May 10, 10:53 pm, Ken Tilton <···········@optonline.net> wrote:
> 
>>Ken Tilton wrote:
>>
>>
>>>Chaitanya Gupta wrote:
>>
>>>>Geoff Wozniak wrote:
>>
>>>>>On May 9, 2:41 am, Chaitanya Gupta <····@chaitanyagupta.com> wrote:
>>
>>>>>>Alan Manuel K. Gloria wrote:
>>
>>>>>>>Hmm, as far as I can tell, hash tables would be the best for your
>>>>>>>problem domain.
>>
>>>>>>Point. That's the problem I am trying to solve.
>>
>>>>>Why not both hash tables and lists, depending on the situation?
>>
>>>>>(defun group-hash (list test key)
>>>>>  (let ((hash-table (make-hash-table :test test)))
>>>>>    (dolist (el list)
>>>>>      (push el (gethash (funcall key el) hash-table)))
>>>>>    (loop
>>>>>       for val being the hash-value in hash-table
>>>>>       collect val into vals
>>>>>       finally (return vals))))
>>
>>>>>(defun group-list (list test key)
>>>>>  (let ((groups nil))
>>>>>    (dolist (elt list groups)
>>>>>      (let ((pos (position (funcall key elt) groups :test
>>>>>                                 (lambda (e group)
>>>>>                                   (funcall test e
>>>>>                                              (funcall key (car
>>>>>group)))))))
>>>>>        (if pos
>>>>>           (push elt (nth pos groups))
>>>>>           (push (list elt) groups))))))
>>
>>>>>(defun standard-test-function-p (fn)
>>>>>  (member fn (list 'eq     #'eq
>>>>>                          'eql    #'eql
>>>>>                          'equal  #'equal
>>>>>                          'equalp #'equalp)))
>>
>>>>>(defun group (list &key (test #'eql) (key #'identity))
>>>>>  "Groups items in a list based on TEST and KEY.  If TEST is one of
>>>>>the
>>>>>standard equality functions, the function will likely be considerably
>>>>>faster."
>>>>>  (if (standard-test-function-p test)
>>>>>      (group-hash list test key)
>>>>>      (group-list list test key)))
>>
>>>>Cool. Didn't think about that. From what I know till now, this is
>>>>probably the optimum solution.
>>
>>>What happens if you have a non-standard test and a large dataset?
>>
>>>Part of the problem is your definition of the problem: you limit
>>>solutions to standard Lisp constructs. Is this homework? If not, use
>>>something non-standard, a C library or your own hashing algorithm.
>>
>>>But if you really really really want to use only standard Lisp, then
>>>listen to yourself!:
>>
>>>You want hash tables for speed, and they only take certain tests. So you
>>>have to use one of those tests:
>>
>>>(group2 *strings*
>>>     :key #'(lambda (x &aux (c (char x 0)))
>>>               (cons c (upper-case-p c)))
>>>     :test #'equal)
>>
>>Sorry, I meant to disclose "mostly untested", and I meant to (downcase
>>or upcase (or char-code, think to come of it.. hmmms, can we avoid the
>>cons? express lowercase as minus the char-code? something like that...))
>>the char.
>>
> 
> 
> Maybe just use #'equalp as test?
> 

We're looking for speed, I like EQL and a key of char-code minusized for 
uppercase.

kzo

ps. Oh, sh*t, #\a and #\A have different char-codes. 
PWAUAUAHAHAJJAJAJAJAJAHAHHAHAHAH!!!!!!


-- 
http://www.theoryyalgebra.com/

"Algebra is the metaphysics of arithmetic." - John Ray

"As long as algebra is taught in school,
there will be prayer in school." - Cokie Roberts

"Stand firm in your refusal to remain conscious during algebra."
    - Fran Lebowitz

"I'm an algebra liar. I figure two good lies make a positive."
    - Tim Allen

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Tue, 08 May 2007 12:36:40 +0000
Message-ID: <f1pqsp$nvo$1@registered.motzarella.org>

I see that GROUP2 wasn't indented properly in the last post. Hopefully 
this should come out right -

(defun group2 (list &key (test #'eql) (key #'identity))
   (let ((result nil))
     (dolist (el list)
       (let* ((entry-key (funcall key el))
	     (entry (assoc entry-key result :test test)))
	(if entry
	    (push el (cdr entry))
	    (push (list entry-key el) result))))
     (mapcar #'cdr result)))
=>
GROUP2

From: Wade Humeniuk
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Tue, 08 May 2007 23:05:36 +0000
Message-ID: <m2lkfysyps.fsf@telus.net.no.spam>

Here is antoher way.  With some tweaking it could be more efficient.

(defun group (list &key (test #'eql) (key #'identity) &aux groups)
  (dolist (elt list groups)
    (let ((pos (position elt groups :test
			 (lambda (e group)
			   (funcall test (funcall key e)
				    (funcall key (car group)))))))
      (if pos
	  (push elt (nth pos groups))
	  (push (list elt) groups)))))

; SLIME 2007-04-19
;Compiling "/private/var/tmp/tmp\\.0.419Tsv"...
;Loading #P"/var/tmp/tmp\\.0.dx64fsl"...
CL-USER> (defvar *strings* '("foo" "bar" "baz" "quux" "quuux" "bazola" "gorp"))

*STRINGS*
CL-USER> (group *strings* :key #'(lambda (x) (char x 0)) :test #'char=)
(("gorp") ("quuux" "quux") ("bazola" "baz" "bar") ("foo"))
CL-USER> 

Wade

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Wed, 09 May 2007 06:33:33 +0000
Message-ID: <f1rpvu$9ro$1@registered.motzarella.org>

Wade Humeniuk wrote:
> Here is antoher way.  With some tweaking it could be more efficient.
> 
> (defun group (list &key (test #'eql) (key #'identity) &aux groups)
>   (dolist (elt list groups)
>     (let ((pos (position elt groups :test
> 			 (lambda (e group)
> 			   (funcall test (funcall key e)
> 				    (funcall key (car group)))))))
>       (if pos
> 	  (push elt (nth pos groups))
> 	  (push (list elt) groups)))))


Unfortunately, this way is prohibitively slow, and one reason for that 
is the excessive use of KEY.

CL-USER> (defvar *nums* (loop repeat 100000 collect (random 100000)))
*NUMS*

CL-USER> (time (progn (group *nums* :key #'(lambda (x) (mod x 100))) nil))
; cpu time (non-gc) 20,900 msec user, 90 msec system
; cpu time (gc)     8,060 msec user, 40 msec system
; cpu time (total)  28,960 msec user, 130 msec system
; real time  29,157 msec
; space allocation:
;  100,729,663 cons cells, -728,850,992 other bytes, 0 static bytes
NIL


Compare this with GROUP2 in my first post -

CL-USER> (time (progn (group2 *nums* :key #'(lambda (x) (mod x 100))) nil))
; cpu time (non-gc) 390 msec user, 0 msec system
; cpu time (gc)     90 msec user, 10 msec system
; cpu time (total)  480 msec user, 10 msec system
; real time  489 msec
; space allocation:
;  1,100,303 cons cells, 120,800,048 other bytes, 0 static bytes
NIL


Here's an improvement which will make it slightly faster -

(defun group (list &key (test #'eql) (key #'identity) &aux groups)
   (dolist (elt list groups)
     (let ((pos (position (funcall key elt) groups :test
			 (lambda (e group)
			   (funcall test e
				    (funcall key (car group)))))))
       (if pos
	  (push elt (nth pos groups))
	  (push (list elt) groups)))))



CL-USER> (group *strings* :test #'(lambda (x y)
				     (char= (char x 0) (char y 0))))
(("gorp") ("quuux" "quux") ("bazola" "baz" "bar") ("foo"))


CL-USER> (time (progn (group *nums* :key #'(lambda (x) (mod x 100))) nil))
; cpu time (non-gc) 11,030 msec user, 40 msec system
; cpu time (gc)     4,110 msec user, 30 msec system
; cpu time (total)  15,140 msec user, 70 msec system
; real time  15,237 msec
; space allocation:
;  51,414,883 cons cells, -2,391,109,120 other bytes, 0 static bytes
NIL


But still too  slow. I believe that if you can avoid calling KEY in 
(funcall key (car group)), it can go much faster. But then I think you 
will get something very similar to GROUP2. So I am not sure whether this 
is the way to go...

Chaitanya

From: Wade Humeniuk
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Wed, 09 May 2007 13:27:40 +0000
Message-ID: <m2bqguyvn7.fsf@telus.net.no.spam>

Chaitanya Gupta <····@chaitanyagupta.com> writes:

> Wade Humeniuk wrote:
> (defun group (list &key (test #'eql) (key #'identity) &aux groups)
>   (dolist (elt list groups)
>     (let ((pos (position (funcall key elt) groups :test
> 			 (lambda (e group)
> 			   (funcall test e
> 				    (funcall key (car group)))))))
>       (if pos
> 	  (push elt (nth pos groups))
> 	  (push (list elt) groups)))))
>
>
>
> CL-USER> (group *strings* :test #'(lambda (x y)
> 				     (char= (char x 0) (char y 0))))
> (("gorp") ("quuux" "quux") ("bazola" "baz" "bar") ("foo"))
>
>
> CL-USER> (time (progn (group *nums* :key #'(lambda (x) (mod x 100))) nil))
> ; cpu time (non-gc) 11,030 msec user, 40 msec system
> ; cpu time (gc)     4,110 msec user, 30 msec system
> ; cpu time (total)  15,140 msec user, 70 msec system
> ; real time  15,237 msec
> ; space allocation:
> ;  51,414,883 cons cells, -2,391,109,120 other bytes, 0 static bytes
> NIL
>
>
> But still too  slow. I believe that if you can avoid calling KEY in
> (funcall key (car group)), it can go much faster. But then I think you
> will get something very similar to GROUP2. So I am not sure whether
> this is the way to go...
>


Give this a test...

(defstruct key-group key (group nil :type list))

(defun group (list &key (test #'eql) (key #'identity) &aux groups)
  (dolist (elt list (mapcar 'key-group-group groups))
    (let* ((key (funcall key elt))
	   (group (find key groups :test test :key #'key-group-key)))
      (if group
	  (push elt (key-group-group group))
	  (push (make-key-group :key key :group (list elt))
		groups)))))

Wade

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 14:26:58 +0000
Message-ID: <f1va3m$76h$1@registered.motzarella.org>

Wade Humeniuk wrote:
> 
> Give this a test...
> 
> (defstruct key-group key (group nil :type list))
> 
> (defun group (list &key (test #'eql) (key #'identity) &aux groups)
>   (dolist (elt list (mapcar 'key-group-group groups))
>     (let* ((key (funcall key elt))
> 	   (group (find key groups :test test :key #'key-group-key)))
>       (if group
> 	  (push elt (key-group-group group))
> 	  (push (make-key-group :key key :group (list elt))
> 		groups)))))
> 


Speed wise, (compared to GROUP2),

CL-USER> (time (loop repeat 10 do (group *nums* :key #'(lambda (x) (mod 
x 100)))))
; cpu time (non-gc) 4,130 msec user, 10 msec system
; cpu time (gc)     140 msec user, 0 msec system
; cpu time (total)  4,270 msec user, 10 msec system
; real time  4,283 msec
; space allocation:
;  12,002,348 cons cells, 80,039,360 other bytes, 0 static bytes
NIL


CL-USER> (time (loop repeat 10 do (group2 *nums* :key #'(lambda (x) (mod 
x 100)))))
; cpu time (non-gc) 3,550 msec user, 10 msec system
; cpu time (gc)     220 msec user, 0 msec system
; cpu time (total)  3,770 msec user, 10 msec system
; real time  3,786 msec
; space allocation:
;  12,003,348 cons cells, 80,015,456 other bytes, 0 static bytes
NIL


Not much or a difference there. Both these functions suffer from the 
same problem - they use a loop within a loop (as Alan K. Gloria 
mentioned), and I am not sure there is any way around it...

Chaitanya

From: Duane Rettig
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 17:54:51 +0000
Message-ID: <o0fy64y36c.fsf@gemini.franz.com>

Chaitanya Gupta <····@chaitanyagupta.com> writes:

> (defun group (list &key (test #'eql) (key #'identity) &aux groups)
  ...

> CL-USER> (time (loop repeat 10 do (group *nums* :key #'(lambda (x)
> (mod x 100)))))

Are you sure everything is compiled?  Be sure to compile the
functions.  Also, as I always suggest at the tutorials I give, I
suggest putting the loop form itself into a function and compiling it:

(compile
  (defun test ()
    (loop repeat 10 do (group *nums* :key #'(lambda (x) (mod x 100))))))

(time (test))

otherwise, a lot of what you're timing is the interpreter, on lisps
that use an interpreter, and that is not very interesting.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 19:02:13 +0000
Message-ID: <1178823733.020837.23250@e65g2000hsc.googlegroups.com>

On May 10, 10:54 pm, Duane Rettig <····@franz.com> wrote:
> Chaitanya Gupta <····@chaitanyagupta.com> writes:
> > (defun group (list &key (test #'eql) (key #'identity) &aux groups)
>
>   ...
>
> > CL-USER> (time (loop repeat 10 do (group *nums* :key #'(lambda (x)
> > (mod x 100)))))
>
> Are you sure everything is compiled?  Be sure to compile the
> functions.  Also, as I always suggest at the tutorials I give, I
> suggest putting the loop form itself into a function and compiling it:
>
> (compile
>   (defun test ()
>     (loop repeat 10 do (group *nums* :key #'(lambda (x) (mod x 100))))))
>
> (time (test))
>
> otherwise, a lot of what you're timing is the interpreter, on lisps
> that use an interpreter, and that is not very interesting.

Thanks for the suggestion. It definitely gives a speed bump. But space
allocation is where things improve a lot.

These are the results with the version using alists -

CL-USER> (time (loop repeat 10 do (group2 *nums* :key #'(lambda (x)
(mod x 1000)))))
; cpu time (non-gc) 18,210 msec user, 30 msec system
; cpu time (gc)     210 msec user, 0 msec system
; cpu time (total)  18,420 msec user, 30 msec system
; real time  18,473 msec
; space allocation:
;  12,030,348 cons cells, 80,015,456 other bytes, 0 static bytes
NIL
CL-USER> (time
	  (funcall
	   (compile nil (lambda ()
			  (loop repeat 10 do (group2 *nums* :key #'(lambda (x) (mod x
1000))))))))
; cpu time (non-gc) 16,070 msec user, 20 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  16,070 msec user, 20 msec system
; real time  16,100 msec
; space allocation:
;  1,034,277 cons cells, 17,904 other bytes, 0 static bytes
NIL


And these are for the hash table version -

CL-USER> (time (loop repeat 10 do (group *nums* :key #'(lambda (x)
(mod x 1000)))))
; cpu time (non-gc) 2,030 msec user, 0 msec system
; cpu time (gc)     170 msec user, 0 msec system
; cpu time (total)  2,200 msec user, 0 msec system
; real time  2,206 msec
; space allocation:
;  12,010,358 cons cells, 80,608,160 other bytes, 0 static bytes
NIL
CL-USER> (time
	  (funcall
	   (compile nil (lambda ()
			  (loop repeat 10 do (group *nums* :key #'(lambda (x) (mod x
1000))))))))
; cpu time (non-gc) 170 msec user, 0 msec system
; cpu time (gc)     10 msec user, 0 msec system
; cpu time (total)  180 msec user, 0 msec system
; real time  177 msec
; space allocation:
;  1,014,287 cons cells, 617,280 other bytes, 0 static bytes
NIL

Chaitanya

From: Duane Rettig
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 19:07:20 +0000
Message-ID: <o0bqgsxztj.fsf@gemini.franz.com>

Chaitanya Gupta <····@chaitanyagupta.com> writes:

> CL-USER> (time
> 	  (funcall
> 	   (compile nil (lambda ()
> 			  (loop repeat 10 do (group2 *nums* :key #'(lambda (x) (mod x
> 1000))))))))

Remember, time is a macro; its arguments are not evaluated before it
is "invoked".  So now, you are timing, among other things, the
compiler :-)

Create a dummy function and do the compile first, and then run time on
the call to the dummy function.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Pascal Bourguignon
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Thu, 10 May 2007 19:22:21 +0000
Message-ID: <877irglc0i.fsf@thalassa.lan.informatimago.com>

Duane Rettig <·····@franz.com> writes:

> Chaitanya Gupta <····@chaitanyagupta.com> writes:
>
>> CL-USER> (time
>> 	  (funcall
>> 	   (compile nil (lambda ()
>> 			  (loop repeat 10 do (group2 *nums* :key #'(lambda (x) (mod x
>> 1000))))))))
>
> Remember, time is a macro; its arguments are not evaluated before it
> is "invoked".  So now, you are timing, among other things, the
> compiler :-)
>
> Create a dummy function and do the compile first, and then run time on
> the call to the dummy function.

s/dummy/anonymous/

(funcall (compile nil (lambda ()
                       (time (loop repeat 10 
                                do (group2 *nums* 
                                           :key (lambda (x)
                                                   (mod x 1000))))))))

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

READ THIS BEFORE OPENING PACKAGE: According to certain suggested
versions of the Grand Unified Theory, the primary particles
constituting this product may decay to nothingness within the next
four hundred million years.

From: Chaitanya Gupta
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Sat, 12 May 2007 09:05:28 +0000
Message-ID: <f2400l$kno$1@registered.motzarella.org>

Duane Rettig wrote:
> Chaitanya Gupta <····@chaitanyagupta.com> writes:
> 
>> CL-USER> (time
>> 	  (funcall
>> 	   (compile nil (lambda ()
>> 			  (loop repeat 10 do (group2 *nums* :key #'(lambda (x) (mod x
>> 1000))))))))
> 
> Remember, time is a macro; its arguments are not evaluated before it
> is "invoked".  So now, you are timing, among other things, the
> compiler :-)

Oops. Thanks for pointing this out. :)

From: Madhu
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Wed, 09 May 2007 09:59:09 +0000
Message-ID: <m3wszigvwy.fsf@robolove.meer.net>

* Chaitanya Gupta <············@registered.motzarella.org> :

| I have a function, GROUP, which groups together items in a list if they
| satisfy a TEST (of two arguments), after applying the parameter KEY to 
| both the items -

Are you trying to partition the data?  Can an element belong to more
than one group? The semantics are not clear from the examples you
gave.

Consider the output of these two, adapted from your examples to see my
point.

(group2 *nums* :key #'(lambda (x) (or (mod x 3) (mod x 10))))
(group2 *nums* :key #'(lambda (x) (or (mod x 10) (mod x 3))))

This makes it very sensitive to what the KEY is.

So, if you are not sure your KEY will partition your data into a
number of disjoint sets, I think you will have to consider all
combinations of your input taken two at a time.  If efficiently
implemented, this will involve {}^nC_2 applications of TEST.  And
twice as many calls to KEY PLUS as many calls to KEY to find the right
bucket you are hashing the two elements to, which I'd guess would be
the order of the length of the output, and unavoidable.

It is not clear if such a generalization is worthwhile, and you may be
better off trying to exploit features/structures specific to your
actual problem data, say to name each bucket.
--
Madhu

From: Madhu
Subject: Re: How do I make this utility more flexible without losing speed?
Date: Wed, 09 May 2007 11:10:27 +0000
Message-ID: <m3sla6gsm4.fsf@robolove.meer.net>

 [ugh]

* Madhu <··············@robolove.meer.net> :
| * Chaitanya Gupta <············@registered.motzarella.org> :
|
| | I have a function, GROUP, which groups together items in a list if they
| | satisfy a TEST (of two arguments), after applying the parameter KEY to 
| | both the items -
|
| Are you trying to partition the data?  Can an element belong to more
| than one group? The semantics are not clear from the examples you
| gave.

Actually it is.  [I was confused by the fact that the output lists may
have had repeating elements, since you were using PUSH not PUSHNEW]

| (group2 *nums* :key #'(lambda (x) (or (mod x 3) (mod x 10))))

I probably meant:  (or (zerop (mod x 3)) (zerop (mod x 10)))

[snip]

Sorry for any confusion. :)
--
Madhu