From: Lupus Yonderboy
Subject: Lisp optimization
Date: 
Message-ID: <24890@sdcc6.ucsd.edu>
How do you write Lisp code so it can be optimized easily by the
compiler?  I've already done (proclaim '(optimize ...)) and tried to
use (declare (type ...)) but they didn't seem to do enough.  For
instance, the following program took nine hours under Allegro CL 3.1.
The equivalent C program took 15 1/2 minutes.


(proclaim '(optimize (safety 0) (speed 3) (space 1)
                     (compilation-speed 0)))

(defun doit()
   ; Loop through a list of primes, determine if the current
   ; number is prime, add it to the list if it is.
   (let
      (
         ; Our list of prime numbers so far
         (primes '(2 3))
      )
      (declare (type list primes))
      (do
         (
            ; Our counter
            (current 5 (setq current (+ current 2)))
         )
         ; Look up to 1,000,000.
         ((> current 1000000) primes)
   
         ; Some declarations.
         (declare (type fixnum current))
   
         ; Loop through the list of primes.
         (let
            (
               ; Where we should search to
               (limit (isqrt current))
            )
            (declare (type fixnum limit))
            (do*
               (
                  ; Pointer to current location in prime list.
                  (pointer primes (setq pointer (rest pointer)))
                  (current-prime (first pointer)
                     (setq current-prime (first pointer)))
               )
      
               ; Search until we go past the possible limit.
               ((> current-prime limit)
                  (nconc primes (cons current nil)))
      
               ; Some declares.
               (declare (type fixnum current-prime)
                      (type cons pointer))
      
               ; If the number we're testing divides evenly with
               ; the current prime, fail it.
               (when
                  (zerop (the fixnum
                     (mod current current-prime)))
                  (return nil)))))))


The books keep saying that it's just a myth that Lisp is slow, yet I
have so much trouble making it run fast.  Can anyone help?

I have CLtL2 -- that's my only info on declares.

Thanks much,

Steve Boswell         | This opinion is distributed in the hopes that it
······@ucsd.edu       | will be useful, but WITHOUT ANY WARRANTY...
······@gnu.ai.mit.edu |

From: David Gadbois
Subject: Lisp optimization
Date: 
Message-ID: <19911027061103.2.GADBOIS@CLIO.ACA.MCC.COM>
    Date: Sat, 26 Oct 1991 12:55 CDT
    From: ········@sdcc5.ucsd.edu (Lupus Yonderboy)

    How do you write Lisp code so it can be optimized easily by the
    compiler?  I've already done (proclaim '(optimize ...)) and tried to
    use (declare (type ...)) but they didn't seem to do enough.  For
    instance, the following program took nine hours under Allegro CL
    3.1.  The equivalent C program took 15 1/2 minutes.

Before you start trying to figure out how to tell the compiler to do
things, it is always a good idea to work on improving your code
algorithmically.  There has been a huge amount of work on computing
primes, and I am sure there is some tricky way to find them very
quickly.  Unfortunately, I am too lazy to go downstairs to the library
to look one up, so I will just point out an easy way to speed your code
up.

Here is an annotated version of the original code, reground for
readability:

    (defun doit ()
      (let ((primes '(2 3)))

Note that destructively modifying quoted constants is a big no-no.
Using a quote gives the compiler license to make the value read-only.
The initializer here should be written (LIST 2 3).

	(declare (type list primes))

One would hope that this declaration is a no-op, since PRIMES was just
intitialized to a list.

	(do ((current 5 (setq current (+ current 2))))

Note that SETQing the variable in the stepper form is superfluous -- it
gets set to the value of the form automatically.
    
	    ((> current 1000000) primes)
    	  (declare (type fixnum current))

A sufficiently smart compiler should be able to determine that CURRENT
will always be a fixnum without the declaration :-).
       
	  (let ((limit (isqrt current)))

There was a big discussion about ISQRT implementations at the beginning
of the year.  It turns out that most of them really sucked.  Let me know
if you want the results.

	    (declare (type fixnum limit))
	    (do* ((pointer primes (setq pointer (rest pointer)))
		  (current-prime (first pointer)
				 (setq current-prime (first pointer))))
		 ((> current-prime limit)
		  (nconc primes (cons current nil)))

Aha.  Here is the problem.  In addition to destructively modifying a
quoted constant, sticking the new prime on to the end of the current
list by using NCONC is incredibly expensive for large lists.  Imagine
what is happening -- each time you NCONC something on to the list, it
has to go down the list until it reaches the end.  As the list grows,
you spend more and more of your time travelling down the list.  For the
all the primes under one million, this means traversing some
3,081,007,251 list elements when you really needn't traverse any.

	      (declare (type fixnum current-prime)
		       (type cons pointer))
	      (when
		(zerop (the fixnum
			    (mod current current-prime)))
		(return nil)))))))

So, here is the updated code.  The variable TAIL holds a pointer to the
last cons of the list of primes, and we just keep sticking the newly
found one in its CDR -- no list traversal at all.

    (defun doit-2 ()
      (let* ((primes (list 2 3))
	     (tail   (cdr primes)))
	(declare (type list primes tail))
	(do ((current 5 (+ current 2)))
	    ((> current 1000000) primes)
	  (declare (type fixnum current))
	  (let ((limit (isqrt current)))
	    (declare (type fixnum limit))
	    (do* ((pointer primes (rest pointer))
		  (current-prime (first pointer) (first pointer)))
		 ((> current-prime limit)
		  (setf (cdr tail) (cons current nil)
			tail       (cdr tail)))
	      (declare (type fixnum current-prime))
	      (when (zerop (mod current current-prime))
		(return nil)))))))

(DOIT-2) took 96 seconds to run on a SPARC Station II under CMU Common
Lisp.

(The CMUCL compiler complained that it had to a fixnum trunation while
doing the MOD, and so I tried replacing the INTEGER declarations with
(UNSIGNED-BYTE 29).  It made only a marginal difference in run time and
was terribly unsightly.)

    The books keep saying that it's just a myth that Lisp is slow, yet I
    have so much trouble making it run fast.  Can anyone help?

Looks like our next trick should be to debunk the myth that C is slow.
Really, I have found that it is possible to con most any compiler for
any language to emit any given sequence of machine instructions.  The
problem with Lisp, if indeed it is one, is that is very easy to not
think about what a program is doing implementation-wise, which makes it
easy to write very inefficient code.  The "NCONC problem" does not come
up often in C since the "while (++pointer != NULL)" would stick out like
a sore thumb.

--David Gadbois
From: Bob Kerns
Subject: Re: Lisp optimization
Date: 
Message-ID: <1991Oct27.080532.22284@crl.dec.com>
In article <·····@sdcc6.ucsd.edu>, ········@sdcc5.ucsd.edu (Lupus Yonderboy) writes:
> 
> How do you write Lisp code so it can be optimized easily by the
> compiler?  I've already done (proclaim '(optimize ...)) and tried to
> use (declare (type ...)) but they didn't seem to do enough.  For
> instance, the following program took nine hours under Allegro CL 3.1.
> The equivalent C program took 15 1/2 minutes.

I've taken the liberty of converting your program from
C-style indentation and spacing to the standard Lisp style
to make it easier to read.

> (defun doit()
>    ; Loop through a list of primes, determine if the current
>    ; number is prime, add it to the list if it is.
>    (let ((primes '(2 3))) ; Our list of prime numbers so far

There's a serious bug right here, that will cause it die in
many implementations (such as Lucid and Symbolics) which check
for this bug.  This is self-modifying code.  Later, you perform
NCONC on PRIMES, but here you set PRIMES to a constant list.

If it didn't die for you on Allegro, it certainly would have run
much more slowly on each successive trial, as the "constant" list
gets longer and longer.


>      (declare (type list primes))
>      (do ((current 5 (setq current (+ current 2)))); Our counter

The SETQ is completely redundant, and possibly time-wasting.
That is, you should write:

(do ((current 5 (+ current 2))) ...)

In other words, first you SETQ CURRENT to a value, and then
DO takes the return value from that (the value you set CURRENT
to), and sets CURRENT to it.

I removed it before doing my timing tests, to avoid confusion.

>          ; Look up to 1,000,000.
>          ((> current 1000000) primes)
>          ; Some declarations.
>          (declare (type fixnum current))
>          ; Loop through the list of primes.
>          (let ((limit (isqrt current))) ; Where we should search to
>            (declare (type fixnum limit))
>            (do* (; Pointer to current location in prime list.
>                  (pointer primes (rest pointer))
>                  (current-prime (first pointer) (first pointer)))
>              ; Search until we go past the possible limit.
>              ((> current-prime limit)
>                 (nconc primes (cons current nil)))

I'll bet that when you wrote the C code, you kept a pointer
to the end of the list, instead of iterating through the
entire list at each step as you do here.  (NCONC is implicitly
an iteration through all but the last argument).  You've
just converted an O(N^A) algorithm to O(N^(A+1)).

>              ; Some declares.
>              (declare (type fixnum current-prime)
>                       (type cons pointer))
>              ; If the number we're testing divides evenly with
>              ; the current prime, fail it.
>              (when (zerop (the fixnum
>                                (mod current current-prime)))
>                 (return nil)))))))
> 
> The books keep saying that it's just a myth that Lisp is slow, yet I
> have so much trouble making it run fast.  Can anyone help?

Well, as this case illustrates, one reason why people write
slow code in Lisp, is that code is easier in Lisp.  To write

(nconc primes (cons current nil))

in C, you'd write it something like:

{ 
  list p;
  for (p = current; (p->cdr) != NULL; p = p->cdr) {};
  p->cdr = CONS(p,current);
}

Or it's about three times as much work.  Since code like this
is easy to write, people write it even when they shouldn't.

Another way of looking at it, is that in C, every operation
does something very simple-minded, and takes about the same
length of time.  This makes it easy to estimate how fast a
piece of code is by looking at how big it is.

But in Lisp you can write an O(n^3), exponential, or even
worse algorithm in just a couple of lines.  But in C, by
the time you've written all that code, you've had time to
realize that it's going to be slow and to think of a better
way.

Here's your code, rewritten without these bugs.  I also
took out the declarations, since you're not doing anything
which will benefit noticably from them.  Most of your time
is spent inside NCONC, and declarations won't help that.
Plus, you can now look for primes which are larger than a
fixnum.

On a DECstation 5000/200, with 64 MBytes of memory, it took
33 minutes 38 seconds to run with the declarations, but with
the constant-list bug fixed.

It took 33 minutes 7 seconds to run the same code without
declarations.  (I.e. it was in the noise in my measurements).

With the code below, which doesn't use NCONC down the ever-growing
list, it took 1 minute, 23 seconds.

There was one ephemeral GC in each trial, and the
same amount of storage was allocated.

Note that I got this speedup while *REMOVING* declarations.
It is far, far, more important to not write slow algorithms
than to put in declarations.  The difference in speed would
have been much greater if you were looking for bigger primes.
Declarations generally make a speed difference of a constant
few percent.  In some cases, they may make a speed difference
of perhaps 300%, but they will never equal the kind of speedup
I got you here.

Then I put the declarations back in.  It now takes 1 minute
and 1 second.  This illustrates another point:  That declarations
matter more, the higher percentage of your code is doing fixnum
arithmetic.  Before, when your code was spending most of its
time doing NCONC, declarations made no measurable difference.
Now, however, they save 27% of the time, a quite noticable
result.  However, it comes at the cost of limiting the size of
prime we can handle.


> I have CLtL2 -- that's my only info on declares.

That's everything you need, except for understanding, of
course.  Of course, if you want to tune for a particular
implementation, it's helpful to have the documentation for
that implementation.

I don't know of any book to refer you to help your 
understanding of the performance aspects, but if you
think about what each Lisp routine has to do in order
to do its job, and take that into consideration, you'll
write much faster code.

Here's my version of your routine:

(proclaim '(optimize (safety 0) (speed 3) (compilation-speed 0) (space 1)))

(defun doit()
  ;; Loop through a list of primes, determine if the current
  ;; number is prime, add it to the list if it is.
  (do* ((primes (list 2 3))        ; Our list of prime numbers so far
	(last-prime (last primes))
        (current 5 (+ current 2))) ; Our counter
      ;; Look up to 1,000,000.
      ((> current 1000000) primes)
      ;; Loop through the list of primes.
    (do* ((limit (isqrt current))
	  ;; Pointer to current location in prime list.
	  (pointer primes (rest pointer))
	  (current-prime (first pointer) (first pointer)))
	 ;; Search until we go past the possible limit.
	 ((> current-prime limit)
	  (setf (cdr last-prime)
		(setf last-prime (cons current nil))))
       (when (zerop (mod current current-prime))
	 (return nil)))))

Here's the version with declarations.  Note that I left
out the (type list ...) declarations.  I'm not aware
of any Lisp which makes use of them in any significant
way.  That doesn't mean that none ever will, but my
philosphy about adding declarations is not to add them
unless I'm looking for a specific speedup, usually
verified through measurement.  Declarations are another
piece of code which can have bugs, but you can't have
bugs in code that you eliminate!

I also would never do PROCLAIM of the unsafe OPTIMIZE
parameters as you do, but rather declare them locally,
as below.  (SAFETY 0) should be reserved for small pieces
of well-tested, performance-critical code, because you'll
NEVER get all the bugs out of a large system.


(proclaim '(optimize (compilation-speed 0)))

(defun doit()
  (declare (optimize (safety 0) (speed 3) (space 1)))
  ;; Loop through a list of primes, determine if the current
  ;; number is prime, add it to the list if it is.
  (do* ((primes (list 2 3))		; Our list of prime numbers so far
	(last-prime (last primes))      ; Where to add more primes.
        (current 5 (+ current 2)))	; Our counter
      ;; Look up to 1,000,000.
      ((> current 1000000) primes)
      (declare (type fixnum current))
      ;; Loop through the list of primes.
    (do* ((limit (isqrt current))
	  ;; Pointer to current location in prime list.
	  (pointer primes (rest pointer))
	  (current-prime (first pointer) (first pointer)))
	 ;; Search until we go past the possible limit.
	 ((> current-prime limit)
	  (setf (cdr last-prime)
		(setf last-prime (cons current nil))))
	 (declare (type fixnum limit current-prime))
	 (when (zerop (the fixnum (mod current current-prime)))
	   (return nil)))))
From: Ted Dunning
Subject: Re: Lisp optimization
Date: 
Message-ID: <TED.91Oct27084616@lole.nmsu.edu>
In article <······················@crl.dec.com> ···@crl.dec.com (Bob Kerns) writes:


   I also would never do PROCLAIM of the unsafe OPTIMIZE
   parameters as you do, but rather declare them locally,
   as below.  (SAFETY 0) should be reserved for small pieces
   of well-tested, performance-critical code, because you'll
   NEVER get all the bugs out of a large system.




i think that bob is actually much too generous here.  in fact, you
only need (speed 3) in a _very_ few situations and (safety 0) in even
fewer situations.

as a test of this opinion, i ran bob's fixed version of the code under
cmu common lisp on a sun ipx both with and without the speed
declaration.  with the speed and safety declaration the program took
96 seconds, and without the declaration it took 104 seconds.  in my
opinion, this 8% difference is too small to lose the run-time type
checking.
From: Richard Lynch
Subject: Re: Lisp optimization
Date: 
Message-ID: <3947@anaxagoras.ils.nwu.edu>
One thing I noticed that has noticed that has not been explicitly mentioned is
that one suggestion remove the let from inside the do.

I believe that
  (do
    (let
  ) )
is also very expensive.  The let will be stacking and popping values *a lot*.

Far cheaper to:
  a) (do (var (form) (form))
     )
where form is the same in both places.

  b) (let (var)
       (do
         (setq var (form))
     ) )

This may not be as "pretty", but will (I believe) save a lot of cycles.

Disclaimer:  let may not work like I think it does.


-- 
"TANSTAAFL"     ·····@aristotle.ils.nwu.edu
From: Robert Krajewski
Subject: Re: Lisp optimization
Date: 
Message-ID: <ROBERTK.91Oct31182033@rkrajewski.lotus.com>
In article <····@anaxagoras.ils.nwu.edu> ·····@ils.nwu.edu (Richard Lynch) writes:

   One thing I noticed that has noticed that has not been explicitly mentioned is
   that one suggestion remove the let from inside the do.

   I believe that
     (do
       (let
     ) )
   is also very expensive.  The let will be stacking and popping values *a lot*.

Umm, maybe the abstract "Common Lisp" machine is changing the lexical
contour on each iteration, but it's very likely that the compiler
allocates the space (a lexical frame) for all the locals at once.
Optimizing for stack depth doesn't buy you much anyway.

   Disclaimer:  let may not work like I think it does.
From: Kenneth S A Oksanen
Subject: Re: Lisp optimization
Date: 
Message-ID: <CESSU.91Oct31155801@tahma.hut.fi>
···@crl.dec.com (Bob Kerns) writes:
>········@sdcc5.ucsd.edu (Lupus Yonderboy) writes:
>>                 (nconc primes (cons current nil)))

>(NCONC is implicitly an iteration through all but the last argument).
>You've just converted an O(N^A) algorithm to O(N^(A+1)).

Since the amount of primes under N is O(log N), the change is not from
O(f(N)) to O(f(N)*N) but to O(f(N)*log(N)).

--
; (c) 1991 by Kenneth Oksanen, <·····@cs.hut.fi>, +358-0-4513295 or -555017
((lambda (a) (a a ((lambda (a) (lambda () (set! a (+ 1 a)))) 2))) (lambda 
(a b) ((lambda (c) (newline) (write c) (a a ((lambda (b) (lambda () (b b))) 
(lambda (a) ((lambda (b) (if (= (modulo b c) 0) (a a) b)) (b)))))) (b))))