Most efficient way of adding to a list?

From: dpapathanasiou
Subject: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 20:48:37 +0000
Message-ID: <1137790117.309369.18180@g14g2000cwa.googlegroups.com>

As a student (still) of Common Lisp, I've been using it to develop an
experimental project, and I came across the following situation:

Suppose you have a global list to which you need to add more items
(without duplication) periodically at runtime.

Here's the solution I came up with:

(setf *items*
  ; A global list
  nil)

(defun increment-items (items)
  "Add each of the elements of items to *items* (if it is not there
already)."
  (mapcar #'(lambda (item)
	      (when (not (member item *items* :test #'equalp))
		(setf *items* (cons item *items*)))))
    items))

It does what I want, but is it the best (in terms of efficiency)
solution?

After reading Norvig's excellent book "Case Studies in Common Lisp", I
know it's important to avoid unnecessary consing, but I'm not sure how
I could do it differently to make it cons less.

Any suggestions or recommendations?

Re: Most efficient way of adding to a list? Tayssir John Gabbour
- Re: Most efficient way of adding to a list? Thomas F. Burdick
  - Re: Most efficient way of adding to a list? Pascal Bourguignon
    - Re: Most efficient way of adding to a list? Thomas F. Burdick
  - Re: Most efficient way of adding to a list? Thomas A. Russ
Re: Most efficient way of adding to a list? ·······@gmail.com
Re: Most efficient way of adding to a list? jayessay
- Re: Most efficient way of adding to a list? dpapathanasiou
  - Re: Most efficient way of adding to a list? Tayssir John Gabbour
  - Re: Most efficient way of adding to a list? jayessay
    - Re: Most efficient way of adding to a list? verec
      - Re: Most efficient way of adding to a list? Brian Downing
      - Re: Most efficient way of adding to a list? Edi Weitz
        Re: Most efficient way of adding to a list? verec
        Re: Most efficient way of adding to a list? jayessay
      - Re: Most efficient way of adding to a list? Tim X
  - Re: Most efficient way of adding to a list? Wade Humeniuk
  - Re: Most efficient way of adding to a list? Tim X
Re: Most efficient way of adding to a list? Wade Humeniuk

From: Tayssir John Gabbour
Subject: Re: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 21:03:26 +0000
Message-ID: <1137791006.172703.177290@g44g2000cwa.googlegroups.com>

dpapathanasiou wrote:
> As a student (still) of Common Lisp, I've been using it to develop an
> experimental project, and I came across the following situation:
>
> Suppose you have a global list to which you need to add more items
> (without duplication) periodically at runtime.
>
> Here's the solution I came up with:
>
> (setf *items*
>   ; A global list
>   nil)
>
> (defun increment-items (items)
>   "Add each of the elements of items to *items* (if it is not there
> already)."
>   (mapcar #'(lambda (item)
> 	      (when (not (member item *items* :test #'equalp))
> 		(setf *items* (cons item *items*)))))
>     items))
>
> It does what I want, but is it the best (in terms of efficiency)
> solution?
>
> After reading Norvig's excellent book "Case Studies in Common Lisp", I
> know it's important to avoid unnecessary consing, but I'm not sure how
> I could do it differently to make it cons less.
>
> Any suggestions or recommendations?

Here's a more idiomatic version of your code...

;; untested!
(defparameter *items* nil)

(defun increment-items (items)
  "Add each of the elements of items to *items* (if it is not there
already)."
  (mapc (lambda (item) (pushnew item *items* :test #'equalp))
        items))


For performance, one thing you could do is use an 'equalp hashtable,
keying items into (say) T.

Tayssir

From: Thomas F. Burdick
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 11:23:48 +0000
Message-ID: <xcvacdp25a3.fsf@conquest.OCF.Berkeley.EDU>

"Tayssir John Gabbour" <···········@yahoo.com> writes:

> Here's a more idiomatic version of your code...
> 
> ;; untested!
> (defparameter *items* nil)
> 
> (defun increment-items (items)
>   "Add each of the elements of items to *items* (if it is not there
> already)."
>   (mapc (lambda (item) (pushnew item *items* :test #'equalp))
>         items))

Or even:

  (defparameter *items* nil)

  (defun increment-items (items)
    (setf *items* (union items *items* :test #'equalp))
    items)

assuming that the order of the items in *items* doesn't matter.

> For performance, one thing you could do is use an 'equalp hashtable,
> keying items into (say) T.

I bet you can guess what UNION typically does internally :-)

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | Free Mumia Abu-Jamal! |
     ,--'    _,'   | Abolish the racist    |
    /       /      | death penalty!        |
   (   -.  |       `-----------------------'
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Pascal Bourguignon
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 17:52:32 +0000
Message-ID: <87hd7xfoyn.fsf@thalassa.informatimago.com>

···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> "Tayssir John Gabbour" <···········@yahoo.com> writes:
>
>> Here's a more idiomatic version of your code...
>> 
>> ;; untested!
>> (defparameter *items* nil)
>> 
>> (defun increment-items (items)
>>   "Add each of the elements of items to *items* (if it is not there
>> already)."
>>   (mapc (lambda (item) (pushnew item *items* :test #'equalp))
>>         items))
>
> Or even:
>
>   (defparameter *items* nil)
>
>   (defun increment-items (items)
>     (setf *items* (union items *items* :test #'equalp))
>     items)
>
> assuming that the order of the items in *items* doesn't matter.
>
>> For performance, one thing you could do is use an 'equalp hashtable,
>> keying items into (say) T.
>
> I bet you can guess what UNION typically does internally :-)

I would have proposed that, but UNION doesn't guarantee the unicity of
the elements when they're not unique in the original lists.

With PUSHNEW:

 (defparameter *items* nil)
 (increment-items '(a a a a)) 
 (assert (equalp *items* '(a)))

With UNION:

 (defparameter *items* nil)
 (increment-items '(a a a a)) 
 (assert (subset '(a) *items* :test (function equalp)))
 ;; you can have (< 1 (length *items*)), implementation dependant.

That's the problem with Posters who post only code, without posting
their specifications: code overspecifies!

-- 
__Pascal_Bourguignon__               _  Software patents are endangering
()  ASCII ribbon against html email (o_ the computer industry all around
/\  1962:DO20I=1.100                //\ the world http://lpf.ai.mit.edu/
    2001:my($f)=`fortune`;          V_/   http://petition.eurolinux.org/

From: Thomas F. Burdick
Subject: Re: Most efficient way of adding to a list?
Date: Sun, 22 Jan 2006 09:45:00 +0000
Message-ID: <xcvslrgzjdv.fsf@conquest.OCF.Berkeley.EDU>

Pascal Bourguignon <····@mouse-potato.com> writes:

> I would have proposed that, but UNION doesn't guarantee the unicity of
> the elements when they're not unique in the original lists.

Good point, I was assuming the OP really wanted a set, but we really
had no way of knowing.  I suppose I could bring up the possibility of
running the resulting set (resulting either from UNION or just APPEND)
through REMOVE-DUPLICATES, but...

> That's the problem with Posters who post only code, without posting
> their specifications: code overspecifies!

... I'd have to post like 8 different variations.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | Free Mumia Abu-Jamal! |
     ,--'    _,'   | Abolish the racist    |
    /       /      | death penalty!        |
   (   -.  |       `-----------------------'
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Thomas A. Russ
Subject: Re: Most efficient way of adding to a list?
Date: Mon, 23 Jan 2006 17:46:42 +0000
Message-ID: <ymi3bjehm65.fsf@sevak.isi.edu>

···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:
> 
> I bet you can guess what UNION typically does internally :-)

Unfortunately, experience tells me I can't.  Some implementations seem
to do something like

    (union x y) ==>

    (remove-duplicates (append x y))

and then use a N^2 algorithm for REMOVE-DUPLICATES.  That's why some of
our lisp software has calls to our own FAST-REMOVE-DUPLICATES function
that is based on hash-tables.

Of course, things could have improved in the years since that software
was written.

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: ·······@gmail.com
Subject: Re: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 21:30:59 +0000
Message-ID: <1137792659.538929.13410@g43g2000cwa.googlegroups.com>

dpapathanasiou wrote:
> As a student (still) of Common Lisp, I've been using it to develop an
> experimental project, and I came across the following situation:
>
> Suppose you have a global list to which you need to add more items
> (without duplication) periodically at runtime.
[...]
> After reading Norvig's excellent book "Case Studies in Common Lisp", I
> know it's important to avoid unnecessary consing, but I'm not sure how
> I could do it differently to make it cons less.
>
> Any suggestions or recommendations?

Your task description looks pretty much like a set union. Is there
anything wrong with just using the standard UNION function (or possibly
NUNION, if applicable, to reduce consing)? You can always use that for
now, since it's much less code to read and debug, and see if you can
make it better if/when it's ever needed.

Paul Khuong

From: jayessay
Subject: Re: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 21:31:48 +0000
Message-ID: <m3fynibn7f.fsf@rigel.goldenthreadtech.com>

"dpapathanasiou" <···················@gmail.com> writes:

> Suppose you have a global list to which you need to add more items
> (without duplication) periodically at runtime.

Why do you think you need a list here?

> Here's the solution I came up with:
> 
> (setf *items*
>   ; A global list
>   nil)

Have you previously defined *items* (via defparameter (or possibly
defvar))?  If not, you may be in the land of dragons.

> 	      (when (not (member item *items* :test #'equalp))
> 		(setf *items* (cons item *items*)))

See ADJOIN, and more likely PUSHNEW

> It does what I want, but is it the best (in terms of efficiency)
> solution?

Will *items* ever have more than, oh say 10 items in it?  Do you need
the order in which the items were added?

> After reading Norvig's excellent book "Case Studies in Common Lisp", I
> know it's important to avoid unnecessary consing

The issue in this case is more likely time not space complexity.  You
have an O(n^2) algorithm when you may well need something much better
(hint: how big is *items* again?)

/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com

From: dpapathanasiou
Subject: Re: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 21:33:03 +0000
Message-ID: <1137792783.509194.178760@g14g2000cwa.googlegroups.com>

Tayssir & jayessay,

Many thanks for pointing out that I should be using defparameter
instead of setf for *items* (I was being very careful about the
contexts and scopes where *items* was used, but you're right, it's
better to define it with defparameter from the start).

Thanks, too, for the clarification on pushnew instead of the combined
member/setf/cons logic -- I see that pushnew is a macro which
encapsulates all of the same logic, but is it any different in terms of
efficiency necessarily?

jayessay: the sequence of what's in *items* doesn't matter (i.e. I
won't have to sort them, for example), but *items* has the potential to
grow very large (millions or even more) under the right circumstances,
and that's what got me thinking about efficiency in the first place.

From: Tayssir John Gabbour
Subject: Re: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 21:45:44 +0000
Message-ID: <1137793544.481865.311870@o13g2000cwo.googlegroups.com>

dpapathanasiou wrote:
> Tayssir & jayessay,
>
> Many thanks for pointing out that I should be using defparameter
> instead of setf for *items* (I was being very careful about the
> contexts and scopes where *items* was used, but you're right, it's
> better to define it with defparameter from the start).
>
> Thanks, too, for the clarification on pushnew instead of the combined
> member/setf/cons logic -- I see that pushnew is a macro which
> encapsulates all of the same logic, but is it any different in terms of
> efficiency necessarily?

Not likely. So you might eventually do something like:

;; untested
(defparameter *items* (make-hash-table :test 'equalp))

(defun increment-items (items)
  "Add each of the elements of items to *items* (if it is not there
already)."
  (mapc (lambda (item) (setf (gethash item *items*) t))
        items))


Or whatever datastructure you find appropriate.

Tayssir

From: jayessay
Subject: Re: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 22:08:16 +0000
Message-ID: <m3bqy6blin.fsf@rigel.goldenthreadtech.com>

"dpapathanasiou" <···················@gmail.com> writes:

> jayessay: the sequence of what's in *items* doesn't matter (i.e. I
> won't have to sort them, for example), but *items* has the potential to
> grow very large (millions or even more) under the right circumstances,
> and that's what got me thinking about efficiency in the first place.

You don't want a list.  You don't want a sequence.  Since the order
doesn't matter either, you don't even need a tree.  You almost
certainly want a hashtable.  A big honkin hashtable.  This presumes
you can conjure a decent hashfunction for your items (that is also,
almost certainly fairly (or even outright) easy to do.)  You will then
have an O(1) operation on finding and putting things from/to *items*.

See:

http://www.lispworks.com/documentation/HyperSpec/Body/c_hash_t.htm

/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com

From: verec
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 00:08:19 +0000
Message-ID: <43d17b77$0$87291$5a6aecb4@news.aaisp.net.uk>

On 2006-01-20 22:08:16 +0000, jayessay <······@foo.com> said:

> "dpapathanasiou" <···················@gmail.com> writes:
>> jayessay: the sequence of what's in *items* doesn't matter (i.e. I
>> won't have to sort them, for example), but *items* has the potential to
>> grow very large (millions or even more) under the right circumstances,
>> and that's what got me thinking about efficiency in the first place.
> You don't want a list.  You don't want a sequence.  Since the order
> doesn't matter either, you don't even need a tree.  You almost
> certainly want a hashtable.

I'm curious. Why a hash-table and not an adjustable array +
vector-push-extend ?

In a hash-table the value slot would be wasted, since the
code would most likely be:

(setf (gethash item *items*) t)

just to record that the item is there?

Whereas, with an adjustable array, a million entries won't
need two millions slots?
--
JFB

From: Brian Downing
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 00:13:57 +0000
Message-ID: <91fAf.503233$084.109545@attbi_s22>

In article <·························@news.aaisp.net.uk>,
verec  <·····@mac.com> wrote:
> I'm curious. Why a hash-table and not an adjustable array +
> vector-push-extend ?

Because he doesn't want duplicates.

-bcd

From: Edi Weitz
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 00:14:07 +0000
Message-ID: <uek32igj4.fsf@agharta.de>

On Sat, 21 Jan 2006 00:08:19 +0000, verec <·····@mac.com> wrote:

> I'm curious. Why a hash-table and not an adjustable array +
> vector-push-extend ?
>
> In a hash-table the value slot would be wasted, since the code would
> most likely be:
>
> (setf (gethash item *items*) t)
>
> just to record that the item is there?
>
> Whereas, with an adjustable array, a million entries won't need two
> millions slots?

With any type of sequence you always have to walk through the whole
thingy (in the worst) case to check if something is already there.
With a hash table this operation is O(1).

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")

From: verec
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 00:53:04 +0000
Message-ID: <43d185f4$0$87291$5a6aecb4@news.aaisp.net.uk>

On 2006-01-21 00:14:07 +0000, Edi Weitz <········@agharta.de> said:

> On Sat, 21 Jan 2006 00:08:19 +0000, verec <·····@mac.com> wrote:
> 
>> I'm curious. Why a hash-table and not an adjustable array +
>> vector-push-extend ?
>> 
>> In a hash-table the value slot would be wasted, since the code would
>> most likely be:
>> 
>> (setf (gethash item *items*) t)
>> 
>> just to record that the item is there?
>> 
>> Whereas, with an adjustable array, a million entries won't need two
>> millions slots?
> 
> With any type of sequence you always have to walk through the whole
> thingy (in the worst) case to check if something is already there.
> With a hash table this operation is O(1).

You're right. Sorry, I missed the no duplicate bits.

Though there's still the usual time/space trade-off.

1 million hash-table entries should weight about 8 MB
in RAM plus the total size of all items (assuming 32 bits
slots for both the key and the value)

This is probably inconsequential today, but not so long
ago ...
--
JFB

From: jayessay
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 04:41:42 +0000
Message-ID: <m33bjib3ax.fsf@rigel.goldenthreadtech.com>

verec <·····@mac.com> writes:

> Though there's still the usual time/space trade-off.

As I pointed out before, in this type of case, the issue is _time_
complexity, not space.

> 1 million hash-table entries should weight about 8 MB in RAM plus
> the total size of all items (assuming 32 bits slots for both the key
> and the value)

Any concern here is completely blown away by trying to build up
millions of of non duplicated items in a sequence, where the time will
quickly reach into the the 10s of _trillions_ of comparisons.

> This is probably inconsequential today, but not so long
> ago ...

Even not so long ago space would be an irrelevant concern in this type
of case even if you had to spill to secondary storage.  Well, assuming
you ever wanted to see the thing finish.

/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com

From: Tim X
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 04:09:48 +0000
Message-ID: <87lkxa8bn7.fsf@tiger.rapttech.com.au>

verec <·····@mac.com> writes:

> On 2006-01-20 22:08:16 +0000, jayessay <······@foo.com> said:
> 
> > "dpapathanasiou" <···················@gmail.com> writes:
> >> jayessay: the sequence of what's in *items* doesn't matter (i.e. I
> >> won't have to sort them, for example), but *items* has the potential to
> >> grow very large (millions or even more) under the right circumstances,
> >> and that's what got me thinking about efficiency in the first place.
> > You don't want a list.  You don't want a sequence.  Since the order
> > doesn't matter either, you don't even need a tree.  You almost
> > certainly want a hashtable.
> 
> I'm curious. Why a hash-table and not an adjustable array +
> vector-push-extend ?
> 
> In a hash-table the value slot would be wasted, since the
> code would most likely be:
> 
> (setf (gethash item *items*) t)
> 
> just to record that the item is there?
> 
> Whereas, with an adjustable array, a million entries won't
> need two millions slots?
> --
> JFB
> 
> 

I think the main reason a hash table is being proposed is because a
list or an array will involve lots of traversal, which for a large
list/array will be a performance hit if this operation occurs a
lot. The amount of space wasted for unused hash slots is likely to be
negligable to the repeated traversal and comparison of a list or
array. 

Of course, like most situations, its a play off between memory use and
execution time. More would need to be known before a definitive answer
could be made concerning the application and how/where it is used and
the environment it is used in. 

Tim
-- 
Tim Cross
The e-mail address on this message is FALSE (obviously!). My real e-mail is
to a company in Australia called rapttech and my login is tcross - if you 
really need to send mail, you should be able to work it out!

From: Wade Humeniuk
Subject: Re: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 21:41:25 +0000
Message-ID: <9OcAf.135874$OU5.117170@clgrps13>

dpapathanasiou wrote:
> Tayssir & jayessay,
> 
> Many thanks for pointing out that I should be using defparameter
> instead of setf for *items* (I was being very careful about the
> contexts and scopes where *items* was used, but you're right, it's
> better to define it with defparameter from the start).
> 
> Thanks, too, for the clarification on pushnew instead of the combined
> member/setf/cons logic -- I see that pushnew is a macro which
> encapsulates all of the same logic, but is it any different in terms of
> efficiency necessarily?
> 
> jayessay: the sequence of what's in *items* doesn't matter (i.e. I
> won't have to sort them, for example), but *items* has the potential to
> grow very large (millions or even more) under the right circumstances,
> and that's what got me thinking about efficiency in the first place.
> 

Use a hash table.

(defvar *items* (make-hash-table :test 'equalp))

(defun add-items (items)
   (dolist (item items)
     (setf (gethash item *items*) t)))

(defun get-items ()
   (loop for key being the hash-key of *items*
         collect key))

(defun item-p (item)
   (gethash item *items*))

CL-USER 1 > (add-items '(1 2 3))
NIL

CL-USER 2 > (add-items '(1 2 6))
NIL

CL-USER 3 > (item-p 6)
T
T

CL-USER 4 > (item-p 10)
NIL
NIL

CL-USER 5 > (get-items)
(1 2 3 6)

CL-USER 6 >
Wade

From: Tim X
Subject: Re: Most efficient way of adding to a list?
Date: Sat, 21 Jan 2006 04:01:03 +0000
Message-ID: <87psmm8c1s.fsf@tiger.rapttech.com.au>

"dpapathanasiou" <···················@gmail.com> writes:

> Tayssir & jayessay,
> 
> Many thanks for pointing out that I should be using defparameter
> instead of setf for *items* (I was being very careful about the
> contexts and scopes where *items* was used, but you're right, it's
> better to define it with defparameter from the start).
> 
> Thanks, too, for the clarification on pushnew instead of the combined
> member/setf/cons logic -- I see that pushnew is a macro which
> encapsulates all of the same logic, but is it any different in terms of
> efficiency necessarily?
> 
> jayessay: the sequence of what's in *items* doesn't matter (i.e. I
> won't have to sort them, for example), but *items* has the potential to
> grow very large (millions or even more) under the right circumstances,
> and that's what got me thinking about efficiency in the first place.
> 

In that case jayessay's comments are very relevant. If your going to
have millions of items, I can't see a list being a good solution
unless you very rarely have to add new items AND the list gives you
desired efficiencies in other areas. 

With millions of itmes, the consing of new items is likely to be
insignificant with respect to efficiency compared to the repeated
scan/traversal of your items list comparing each element to the new
item until either you find a match or get to the end of the list. As
your list grows, its likely the chances a prospective new item is
already in the list will also grow, which means that it won't be added
to the list - so, its likely the longer the list gets, its likely the
performance hit for joinning new items to the global list will
decrease even further and the traversal and comparison will become an
even greater hit.

However, putting all this to the side, I think you may be putting the
cart before the horse. Don't worry about this level of efficiency
until later as once you have more of the application developed your
likely to find other areas of your application which are making far
greater contribution to performance overheads. Start instead with
contemplation of the best data structures for representing your
problem - in my experience, regardless of the implementation language,
this is far more fundamental than worrying about individual points of
inefficiency. If you get the right data structure, often the right
algorithms just seem to jump out and fit with the problem and you are
usually able to express the problem with much greater clarity, which
also tends to lead to clearer algorithms etc. 

Once you have it all running, do some profiling and see where the real
inefficiencies are and see how you can improve the worst of these
first. 

Tim
-- 
Tim Cross
The e-mail address on this message is FALSE (obviously!). My real e-mail is
to a company in Australia called rapttech and my login is tcross - if you 
really need to send mail, you should be able to work it out!

From: Wade Humeniuk
Subject: Re: Most efficient way of adding to a list?
Date: Fri, 20 Jan 2006 21:35:05 +0000
Message-ID: <dIcAf.135872$OU5.89611@clgrps13>

dpapathanasiou wrote:
> 
> Any suggestions or recommendations?
> 

(defvar *items* nil)

(defun add-items (items)
   (dolist (item items) (pushnew item *items* :test #'equalp)))

Wade