From: Shin
Subject: PUSH vs LOOP + COLLECT
Date: 
Message-ID: <a19o7sk3bgfqeqvi9qsfrjre7vno0c9o8i@4ax.com>
Hello,

If one wants to build a list and can do it easly with PUSH:

   (let ((x ()))
        (loop for i from 2000 downto 1 do (push i x))

Can one assume, in general, that this is faster than the LOOP + COLLECT
combination?

   (let ((x (loop for i from 1 to 2000 collect i)))

Or the latter is normally implemented to be as quick as the former?

Thanks,

-- Shin

From: Robert Monfera
Subject: Re: PUSH vs LOOP + COLLECT
Date: 
Message-ID: <387E4466.2171C759@fisec.com>
I'm not sure if there is a general answer (maybe others do or
implementations vary), you can simply try it in your implementation
(make sure you put it in a function and compile).

If you want it to be real fast, or provide fast random access, a vector
is a much better choice, usually orders of magnitude faster.

Robert

Shin wrote:
>
> Hello,
>
> If one wants to build a list and can do it easly with PUSH:
>
>    (let ((x ()))
>         (loop for i from 2000 downto 1 do (push i x))
>
> Can one assume, in general, that this is faster than the LOOP + COLLECT
> combination?
>
>    (let ((x (loop for i from 1 to 2000 collect i)))
>
> Or the latter is normally implemented to be as quick as the former?
>
> Thanks,
>
> -- Shin
From: Barry Margolin
Subject: Re: PUSH vs LOOP + COLLECT
Date: 
Message-ID: <2Brf4.28$%%2.328@burlma1-snr2>
In article <··································@4ax.com>,
Shin  <···@retemail.es> wrote:
>Hello,
>
>If one wants to build a list and can do it easly with PUSH:
>
>   (let ((x ()))
>        (loop for i from 2000 downto 1 do (push i x))
>
>Can one assume, in general, that this is faster than the LOOP + COLLECT
>combination?
>
>   (let ((x (loop for i from 1 to 2000 collect i)))
>
>Or the latter is normally implemented to be as quick as the former?

The typical implementation of loop+collect keeps a reference to the last
cons in the list.  It performs a rplacd and then sets the reference to
point to the new last cons.

So on each iteration, your PUSH will do CONS and SETQ.  LOOP + COLLECT will
do CONS, RPLACD, and SETQ.  So it's likely that PUSH will be a little
faster.

Note that this only works out in the case where it's easy for you to
arrange the two loops to go in different orders -- the first one counted
down, while the second one counted up.  Often this isn't easy to arrange,
so the PUSH loop is follewed by (setq q (nreverse x)).  This extra step
will probably make up for any savings that you got by using PUSH rather
than COLLECT.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Marco Antoniotti
Subject: Re: PUSH vs LOOP + COLLECT
Date: 
Message-ID: <lwiu0x0zcr.fsf@parades.rm.cnr.it>
Shin <···@retemail.es> writes:

> Hello,
> 
> If one wants to build a list and can do it easly with PUSH:
> 
>    (let ((x ()))
>         (loop for i from 2000 downto 1 do (push i x))
> 
> Can one assume, in general, that this is faster than the LOOP + COLLECT
> combination?
> 
>    (let ((x (loop for i from 1 to 2000 collect i)))
> 
> Or the latter is normally implemented to be as quick as the former?

There are certainly some efficiency considerations to keep in mind,
but I like the second better.

BTW. Be careful about the following cases, which may be a little
counterintuitive.  Cfr. the Hyperspec, Section 6.1.3.


* (loop for i from 2 to 100 collect i)
(2 3 4 5 6 ...)


* (loop for i from 2 to 100 collect i into x)
NIL


* (loop for i from 2 to 100 collect i into x finally (return x))
(2 3 4 5 6 ...)


* (let ((x '(-1 -2))) (loop for i from 2 to 100 collect i into x) x)
(-1 -2)
* 


-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa
From: Pierre R. Mai
Subject: Re: PUSH vs LOOP + COLLECT
Date: 
Message-ID: <871z7lnep1.fsf@orion.dent.isdn.cs.tu-berlin.de>
Shin <···@retemail.es> writes:

> If one wants to build a list and can do it easly with PUSH:
> 
>    (let ((x ()))
>         (loop for i from 2000 downto 1 do (push i x))

You could also write

(loop with result-list = ()
      for i from 2000 downto 1 do (push i x)
      finally (return result-list))

Which will get you the same behaviour and code contour of a simple 
loop + collect.

> Can one assume, in general, that this is faster than the LOOP + COLLECT
> combination?
> 
>    (let ((x (loop for i from 1 to 2000 collect i)))

Others have already argued that loop + push might be infinitesimally
faster than loop + collect, because the former does the equivalent of
cons + setq, whereas the other does cons + rplacd + setq.  OTOH,
for a given implementation loop+collect might be expanded into special
code that the compiler recognizes and codes very efficiently, whereas
loop+push is unlikely to receive this treatment.

But in general the time to cons all the list elements (and to collect
them later on) will most likely dwarf any small differences in
performance:  On CMU CL on a AMD K6-2 350 it takes ~ 1.34s to collect
1000 elements into a list 1000 times.  Of this time, 0.36s go to GC
run-time.  Any differences (there were no measurable differences with
this example) between loop+push and loop+collect will be drowned by GC 
time alone.

> Or the latter is normally implemented to be as quick as the former?

In general you can assume that loop clauses are generally at least as
efficient as code a normal user might write, and generally they are
more efficient.

Anyway:  Don't worry about small run-time differences, worry about
overall development- and run-time speed instead ;)

Regs, Pierre.

-- 
Pierre Mai <····@acm.org>         PGP and GPG keys at your nearest Keyserver
  "One smaller motivation which, in part, stems from altruism is Microsoft-
   bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]
From: Paul Dietz
Subject: Re: PUSH vs LOOP + COLLECT
Date: 
Message-ID: <85ned9$b32$1@schbbs.mot.com>
In article <··············@orion.dent.isdn.cs.tu-berlin.de>,
Pierre R. Mai <····@acm.org> wrote:


>> Can one assume, in general, that this is faster than the LOOP + COLLECT
>> combination?
>> 
>>    (let ((x (loop for i from 1 to 2000 collect i)))
>
>Others have already argued that loop + push might be infinitesimally
>faster than loop + collect, because the former does the equivalent of
>cons + setq, whereas the other does cons + rplacd + setq.  OTOH,
>for a given implementation loop+collect might be expanded into special
>code that the compiler recognizes and codes very efficiently, whereas
>loop+push is unlikely to receive this treatment.
>
>But in general the time to cons all the list elements (and to collect
>them later on) will most likely dwarf any small differences in
>performance:  On CMU CL on a AMD K6-2 350 it takes ~ 1.34s to collect
>1000 elements into a list 1000 times.  Of this time, 0.36s go to GC
>run-time.  Any differences (there were no measurable differences with
>this example) between loop+push and loop+collect will be drowned by GC 
>time alone.
...
>In general you can assume that loop clauses are generally at least as
>efficient as code a normal user might write, and generally they are
>more efficient.



This is implementation dependent.  As a counterexample, consider
this program compiled on ACL 5.0.1 on an Sparc Ultra 2:

(in-package "USER")

(declaim (optimize (speed 3) (safety 0) (debug 0)))

(defun f1 (n)
  (declare (fixnum n))
  (loop
    for i of-type fixnum from 1 to n
    collect i))

(defun f2 (n)
  (declare (fixnum n))
  (let ((x nil))
    (loop
      for i of-type fixnum from n downto 1
      do (push i x))
    x))

(defun f3 (n)
  (declare (fixnum n))
  ;; Assume n > 0
  (if (evenp n)
      (f3a 1 n)
    (cons 1 (f3a 2 n))))

(defun f3a (b n)
  (declare (fixnum b n))
  (loop
    for i of-type fixnum from b to n by 2
    nconc (list i (the fixnum (1+ i)))))

(defun f4 (n)
  (declare (fixnum n))
  ;; Assume n > 0
  (if (evenp n)
      (f4a 1 n)
    (cons 1 (f4a 2 n))))

(defun f4a (b n)
  (declare (fixnum b n))
  (let* ((head (cons nil nil))
	 (tail head))
    (declare ;; (dynamic-extent head)
	     (type cons head tail))
    (loop
      for i of-type fixnum from b to n by 2
      do (let* ((c2 (list (the fixnum (1+ i))))
		(c1 (cons i c2)))
	   (setf (cdr tail) c1)
	   (setq tail c2)))
    (cdr head)))

(defun f5 (n)
  (declare (fixnum n))
  (let* ((head (cons nil nil))
	 (tail head))
;;    (declare (dynamic-extent head))
    (loop
      for i of-type fixnum from 1 to n
      do (setq tail
	       (setf (cdr tail) (list i))))
    (cdr head)))

(defun f6 (n)
  (declare (fixnum n))
  (let ((x nil))
    (loop for i of-type fixnum from 1 to n
      do (push i x))
    x))

(defun f7 (n)
  (declare (fixnum n))
  (let ((x nil))
    (loop for i of-type fixnum from 1 to n
      do (push i x))
    (nreverse x)))

(defun test1 (reps)
  (declare (fixnum reps))
  (dotimes (r reps)
    (declare (fixnum r))
    (f1 2000)))

(defun test2 (reps)
  (declare (fixnum reps))
  (dotimes (r reps)
    (declare (fixnum r))
    (f2 2000)))

(defun test3 (reps)
  (declare (fixnum reps))
  (dotimes (r reps)
    (declare (fixnum r))
    (f3 2000)))
  
(defun test4 (reps)
  (declare (fixnum reps))
  (dotimes (r reps)
    (declare (fixnum r))
    (f4 2000)))

(defun test5 (reps)
  (declare (fixnum reps))
  (dotimes (r reps)
    (declare (fixnum r))
    (f5 2000)))

(defun test6 (reps)
  (declare (fixnum reps))
  (dotimes (r reps)
    (declare (fixnum r))
    (f6 2000)))

(defun test7 (reps)
  (declare (fixnum reps))
  (dotimes (r reps)
    (declare (fixnum r))
    (f7 2000)))

(defun testall ()
  (time (test1 10000))
  (time (test2 10000))
  (time (test3 10000))
  (time (test4 10000))
  (time (test5 10000))
  (time (test6 10000))
  (time (test7 10000)))

---

USER(24): (testall)
; cpu time (non-gc) 4,040 msec user, 20 msec system
; cpu time (gc)     750 msec user, 0 msec system
; cpu time (total)  4,790 msec user, 20 msec system
; real time  4,824 msec
; space allocation:
;  20,010,075 cons cells, 0 symbols, -32 other bytes, 0 static bytes
; cpu time (non-gc) 2,280 msec user, 0 msec system
; cpu time (gc)     800 msec user, 10 msec system
; cpu time (total)  3,080 msec user, 10 msec system
; real time  3,115 msec
; space allocation:
;  20,000,000 cons cells, 0 symbols, -32 other bytes, 0 static bytes
; cpu time (non-gc) 3,630 msec user, 10 msec system
; cpu time (gc)     800 msec user, 20 msec system
; cpu time (total)  4,430 msec user, 30 msec system
; real time  4,462 msec
; space allocation:
;  20,010,000 cons cells, 0 symbols, -32 other bytes, 0 static bytes
; cpu time (non-gc) 2,720 msec user, 0 msec system
; cpu time (gc)     760 msec user, 0 msec system
; cpu time (total)  3,480 msec user, 0 msec system
; real time  3,560 msec
; space allocation:
;  20,010,000 cons cells, 0 symbols, -32 other bytes, 0 static bytes
; cpu time (non-gc) 3,760 msec user, 0 msec system
; cpu time (gc)     740 msec user, 0 msec system
; cpu time (total)  4,500 msec user, 0 msec system
; real time  4,512 msec
; space allocation:
;  20,010,000 cons cells, 0 symbols, -32 other bytes, 0 static bytes
; cpu time (non-gc) 1,940 msec user, 0 msec system
; cpu time (gc)     680 msec user, 0 msec system
; cpu time (total)  2,620 msec user, 0 msec system
; real time  2,642 msec
; space allocation:
;  20,000,000 cons cells, 0 symbols, -32 other bytes, 0 static bytes
; cpu time (non-gc) 5,300 msec user, 0 msec system
; cpu time (gc)     910 msec user, 0 msec system
; cpu time (total)  6,210 msec user, 0 msec system
; real time  6,267 msec
; space allocation:
;  20,000,000 cons cells, 0 symbols, -32 other bytes, 0 static bytes
NIL
USER(25): 

Note that GC time (as such) is not dominant (it could be
reduced further here by playing with the GC tuning parameters.)

LOOP+COLLECT suffers because it does destructive operations
on the list, doing a RPLACD on the tail of the intermediate
list on each iteration.  ACL on the Sparc uses a generational
garbage collector, so any destructive operation like this
(unless the compiler can determine that the value being assigned
is not a pointer to some potentially GCable object) hits a
write barrier, and is expensive.

test3 unrolls the loop once and uses the LOOP ... NCONC form
to nconc lists of length2.  This avoids half the write barrier
calls.  test4 does this manually, avoiding the call to nconc.
test5 is a manual reimplementation of test1.  There appears to
be some unnecessary overhead in the loop expansion in test1,
since test5 and test1 should be identical.

	Paul Dietz
From: Kent M Pitman
Subject: Re: PUSH vs LOOP + COLLECT
Date: 
Message-ID: <sfwya9jfjvu.fsf@world.std.com>
·····@comm.mot.com (Paul Dietz) writes:

> In article <··············@orion.dent.isdn.cs.tu-berlin.de>,
> Pierre R. Mai <····@acm.org> wrote:
> >[...]
> >In general you can assume that loop clauses are generally at least as
> >efficient as code a normal user might write, and generally they are
> >more efficient.

Rather than "can", I suggest "should".
 
> This is implementation dependent.  As a counterexample, consider
> this program compiled on ACL 5.0.1 on an Sparc Ultra 2: [...]
> [...] 
> Note that GC time (as such) is not dominant (it could be
> reduced further here by playing with the GC tuning parameters.)
> 
> LOOP+COLLECT suffers because it does destructive operations
> on the list, doing a RPLACD on the tail of the intermediate
> list on each iteration.  ACL on the Sparc uses a generational
> garbage collector, so any destructive operation like this
> (unless the compiler can determine that the value being assigned
> is not a pointer to some potentially GCable object) hits a
> write barrier, and is expensive.
> 
> test3 unrolls the loop once and uses the LOOP ... NCONC form
> to nconc lists of length2.  This avoids half the write barrier
> calls.  test4 does this manually, avoiding the call to nconc.
> test5 is a manual reimplementation of test1.  There appears to
> be some unnecessary overhead in the loop expansion in test1,
> since test5 and test1 should be identical.

The right thing is to assume that programs whose intent is directly
inferrable from syntax (such as LOOP+COLLECT) "should" be optimized
better than programs whose intent must be pieced together by examining
implementation (such as LOOP+PUSH+NREVERSE, etc.).

You will definitely be wrong on some occasions when you use LOOP+COLLECT
expecting it to be fastest, but you will be investing in the future.
You should send a bug report where you observe you are not getting
the performance you need.  Since only the implementation can know whether
these low-level GC issues will dominate, or whether there are special
primitives available that compile fast, only the implementation can 
properly do this.  If you make an assumption about what is right in one
implementation, expect that you are pessimizing it in another.

And, incidentally, even if your program runs only in one
implementation, remember that portability extends not only in space
(to other vendors) but also in time (to future releases, where you
might not pick up an important optimization to LOOP+COLLECT.  When you
unfold this to something more obscure, you are betting on your vendor
to be non-responsive.  And you are betting against your ever wising up
and getting a better vendor/implementation. That may be a good bet in
some cases, but if you really think it is, I honestly think it's time
for you to either change vendors or change languages.

Life is short.  Do not spend your precious days trying to find ways
to outrace the obviously-correct primitive just because it is 
instantaneously not optimal for your situation unless everything else
in your running system is in order and the dollars are coming in for 
your deployed product at a rate that supports such luxury.

On the other hand, figuring out that things like LOOP+COLLECT are 
suboptimal in some implementations ARE valuable if it is for the purpose
of sending a bug report to a vendor or taking some other action that will
leverage the Future Good by making a change that everyone benefits from.

When I was first using Maclisp, I used to ask people if PROG or LET was
faster for binding variables.  Scarily enough, sometimes people would 
actually have a definite opinion.  But fortunately, some reasonable person
told me that I should program as if the difference did not matter because
both were supposed to be optimally fast, and I should send a bug report if
I found this not to be so.  This was really excellent advice that has, for
me, stood the test of time.
From: Pierre R. Mai
Subject: Re: PUSH vs LOOP + COLLECT
Date: 
Message-ID: <87r9f7kmhn.fsf@orion.dent.isdn.cs.tu-berlin.de>
Kent M Pitman <······@world.std.com> writes:

> ·····@comm.mot.com (Paul Dietz) writes:
> 
> > In article <··············@orion.dent.isdn.cs.tu-berlin.de>,
> > Pierre R. Mai <····@acm.org> wrote:
> > >[...]
> > >In general you can assume that loop clauses are generally at least as
> > >efficient as code a normal user might write, and generally they are
> > >more efficient.
> 
> Rather than "can", I suggest "should".

[ very good advice and reasoning elided ]

Right, that was indeed the idea that I wanted to convey, but I did a
rather poor job of it...

Regs, Pierre.

-- 
Pierre Mai <····@acm.org>         PGP and GPG keys at your nearest Keyserver
  "One smaller motivation which, in part, stems from altruism is Microsoft-
   bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]