From: ········@gmail.com
Subject: optimizing lisp - best practices?
Date: 
Message-ID: <1120170700.740075.199350@f14g2000cwb.googlegroups.com>
So my LISP raytracer is coming along pretty well.  It still could use
more features (you can only do so much with spheres :P ), but I would
like to start optimizing it - as of now, it's slow!  The following
image took about 10 minutes to generate:
http://inst.eecs.berkeley.edu/~stevenan/04.png

Now I know you shouldn't prematurely optimize..but I think the time has
come.  And I just wanna make some simple optimizations.  Namely, type
declarations.  I was wondering, is there a "best way" to do this?  It
seems that (declare)'s clutter up the code a lot - is there a way to do
it while keeping the logic clean?  Any tools you guys use to do this
(maybe even tools that do type-inference)?

I just want some good advice before I dig in and waste a lot of time :)

Thanks,
Steve

From: Joe Marshall
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <1x6jv382.fsf@comcast.net>
········@gmail.com writes:

> So my LISP raytracer is coming along pretty well.  It still could use
> more features (you can only do so much with spheres :P ), but I would
> like to start optimizing it - as of now, it's slow!  The following
> image took about 10 minutes to generate:
> http://inst.eecs.berkeley.edu/~stevenan/04.png
>
> Now I know you shouldn't prematurely optimize..but I think the time has
> come.  And I just wanna make some simple optimizations.  Namely, type
> declarations.  I was wondering, is there a "best way" to do this?  

Yes. Run the profiler and find out exactly where the optimizations
will pay off.  Don't add declarations elsewhere.

> It
> seems that (declare)'s clutter up the code a lot - is there a way to do
> it while keeping the logic clean?  Any tools you guys use to do this
> (maybe even tools that do type-inference)?
>
> I just want some good advice before I dig in and waste a lot of time :)
>
> Thanks,
> Steve
>

-- 
~jrm
From: Simon Alexander
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <m3zmt6mtm5.fsf@ardbeg.math.uwaterloo.ca>
········@gmail.com writes:

> So my LISP raytracer is coming along pretty well.  It still could use
> more features (you can only do so much with spheres :P ), but I would
> like to start optimizing it - as of now, it's slow!  The following
> image took about 10 minutes to generate:
> http://inst.eecs.berkeley.edu/~stevenan/04.png

Without knowing what machine you have used, that 10 minutes doesn't mean much
... but yes, it does sound very slow, especially with only a few spheres (which
are very cheap).

I can offer you advice on two levels, and some comments.  I should mention that
I have done somewhat sophisticated rendering using lisp, and have a pretty good
idea of the issues you will hit.  When I say `somewhat sophisticated', I mean
global illumination (stochastic rendering) and meshes with hundreds of thousands
of elements.  Without reasonable efficiency, this is hopeless, even though what
I was doing wasn't exactly bleeding-edge.


> Now I know you shouldn't prematurely optimize..but I think the time has
> come.  And I just wanna make some simple optimizations.  Namely, type
> declarations.  I was wondering, is there a "best way" to do this?  It

Good that you are thinking about premature optimization.  As you will
undoubtedly hear elsewhere: if you are going to optimize, you must first measure
(i.e. profile).  These two issues are the backbone of any attempt at
optimization.

The next point is this (and this applies to a lot of things, but in particular it
applies to your problem):

Your biggest gains will be algorithmic, not through code generation.  In this
sense, your use of declarations falls in the same boat as using a faster
compiler.  We can squeeze extra cycles here and there, but you can't paper over
bad algorithms.

In your case, it looks like this: The vast majority of your time will be spent
making ray-object intersection tests.  As soon as you move away from spheres,
especially, this becomes crucial.  So in order of importance, what you want
to do is the following (*if* what you want is a fast renderer):

1) reduce the number of tests made on each ray to those where intersection is `likely'
    
   - some sort of space partitioning (gridding, trees, etc.)
   - there is an entire literature about this stuff...

2) use the best intersection algorithms you can find.

   - some of the best algorithms are a bit harder to understand, but very fast
     compared to the `obvious' ones

3) improve your code generation

   - declarations are part of this
   - to get really fast intersection code, you will need to fix on a
     representation for the underlying vector space that allows you to do the
     vector calculations (dot & cross products, matrix-vector mults, etc)
     without consing at all
   - don't waste time polishing unimportant parts (use your profiler!)

I ended up writing code generation code to do the last bit, but even by doing
simple things by hand, you will see a big improvement.

At the end of the day, on a modern PC your scene should take O(seconds) not
O(ten minutes).

Important note: The ordering above is by *impact on performance*.  The ordering
is roughly inverse in terms of ease of implementation.  You may, especially if
you are learning as you go along (I haven't seen earlier messages), want to
ignore the first issue while you can, and concentrate on a mix of #2 and #3.
For example, get a decent ray-sphere intersection algorithm. Implement it, test
it, time it.  Then attack it in the sense of #3, optimize it.  Test the
optimized version against your original version. 

lather. rinse. repeat.

Hope that helps!

Simon.
From: Joe Marshall
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <ekainmbd.fsf@comcast.net>
Simon Alexander <··@spam.net> writes:

> Your biggest gains will be algorithmic, not through code generation.  

Definitely true, but it is often the case that naive Lisp code will
turn into truly bletcherous generated code.  Floating-point code in
Lisp can be amazingly slow if you don't put the right declarations in the
inner loops.

> In this sense, your use of declarations falls in the same boat as
> using a faster compiler.  We can squeeze extra cycles here and
> there, but you can't paper over bad algorithms.
>
> In your case, it looks like this: The vast majority of your time will be spent
> making ray-object intersection tests.  As soon as you move away from spheres,
> especially, this becomes crucial.  So in order of importance, what you want
> to do is the following (*if* what you want is a fast renderer):
>
> 1) reduce the number of tests made on each ray to those where intersection is `likely'
>     
>    - some sort of space partitioning (gridding, trees, etc.)
>    - there is an entire literature about this stuff...
>
> 2) use the best intersection algorithms you can find.
>
>    - some of the best algorithms are a bit harder to understand, but very fast
>      compared to the `obvious' ones
>
> 3) improve your code generation
>
>    - declarations are part of this

I agree completely, except for the following:

>    - to get really fast intersection code, you will need to fix on a
>      representation for the underlying vector space that allows you to do the
>      vector calculations (dot & cross products, matrix-vector mults, etc)
>      without consing at all

These days, if your GC is properly tuned, consing is not that big a
deal.  It is important to avoid `boxing' your floats in these inner
loops (you can lose a lot of time there), but you can generally be
quite prolific with consing without too much overhead.

>    - don't waste time polishing unimportant parts (use your profiler!)

This is *very* important.  Measure what is taking the time.  You'd be
surprised.  In one project, I found that a lot of code was spending a
horrendous amount of time in `length'.  Since we only took the length
of simple vectors, a few vector declarations made a significant
difference.
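A sketch of the kind of declaration I mean (not the actual project code; the
function is made up for illustration):

```lisp
;; Declaring the element type lets LENGTH on a SIMPLE-VECTOR compile to
;; a direct header read instead of a generic sequence dispatch.
(defun total-length (vectors)
  (declare (optimize (speed 3)))
  (let ((n 0))
    (declare (fixnum n))
    (dolist (v vectors n)
      (incf n (length (the simple-vector v))))))
```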

> I ended up writing code generation code to do the last bit, but even by doing
> simple things by hand, you will see a big improvement.
>
> At the end of the day, on a modern PC your scene should take O(seconds) not
> O(ten minutes).
>
> Important note: The ordering above is by *impact on performance*.  The ordering
> is roughly inverse in terms of ease of implementation.  You may, especially if
> you are learning as you go along (I haven't seen earlier messages) want to
> ignore the first issue while you can, and concentrate on a mix of #2 and #3.
> For example, get a decent ray-sphere intersection algorithm. Implement it, test
> it, time it.  Then attack it in the sense of #3, optimize it.  Test the
> optimized version against your original version. 
>
> lather. rinse. repeat.

I keep running out of shampoo.

-- 
~jrm
From: Simon Alexander
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <m3u0jdmajo.fsf@ardbeg.math.uwaterloo.ca>
Joe Marshall <·············@comcast.net> writes:

> Simon Alexander <··@spam.net> writes:
> 
> > Your biggest gains will be algorithmic, not through code generation.  
> 
> Definitely true, but it is often the case that naive Lisp code will
> turn into truly bletcherous generated code.  Floating-point code in
> Lisp can be amazingly slow if you don't put the right declarations in the
> inner loops.

This is true, but in the particular case, I expect it to be dominated by the
sheer number of object tests, if you don't prune.  But naive Lisp code can be
very slow, certainly!

> I agree completely, except for the following:
> 
> >    - to get really fast intersection code, you will need to fix on a
> >      representation for the underlying vector space that allows you to do the
> >      vector calculations (dot & cross products, matrix-vector mults, etc)
> >      without consing at all
> 
> These days, if your GC is properly tuned, consing is not that big a
> deal.  It is important to avoid `boxing' your floats in these inner
> loops (you can lose a lot of time there), but you can generally be
> quite prolific with consing without too much overhead.
> 

Perhaps I should have been more specific... It isn't the consing in general
that is a problem but, as you suggest, the boxing of floats.  Which is why I
particularly noted the dot & cross products, etc.  In this application, that is
where the trouble will show up.  You will also need to avoid creating temporary
vectors in the code.  A typical intersection calculation can be only a few
floating-point operations, which is completely dominated by boxing and unboxing
floats if that is what happens.  The "symptom" of this is consing in the
ray-object intersection test, but it is not the same as chasing down consing in
general.  In order to get decent performance you will need to tune these tests
quite carefully, and the particulars will be implementation specific. 
So, good point!  
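To be concrete, a rough sketch of the sort of tuned test I mean (the flat
double-float argument list is just one way of avoiding temporary vectors, and
the direction is assumed normalized):

```lisp
;; Does the ray from origin o along unit direction d hit the sphere at
;; center c with radius r?  Geometric test: the squared distance from
;; the center to the ray line must be <= r^2, with the sphere in front.
(defun ray-sphere-hit-p (ox oy oz dx dy dz cx cy cz r)
  (declare (double-float ox oy oz dx dy dz cx cy cz r)
           (optimize (speed 3)))
  (let* ((lx (- cx ox)) (ly (- cy oy)) (lz (- cz oz))
         (tca (+ (* lx dx) (* ly dy) (* lz dz)))          ; L . d
         (d2  (- (+ (* lx lx) (* ly ly) (* lz lz))        ; |L|^2 - tca^2
                 (* tca tca))))
    (and (>= tca 0d0) (<= d2 (* r r)))))
```

All intermediates are unboxed double-floats; nothing is consed per ray.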

S.
From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <REM-2005jul08-009@Yahoo.Com>
> From: Simon Alexander <··@spam.net>
> A typical intersection calculation can be only a few floating-point
> operations, which is completely dominated by boxing and unboxing
> floats if that is what happens.

This reminds me: One of the great failures of CL, in my opinion, is the
lack of overwriting arithmetic functions. In my opinion, *every*
arithmetic function that produces a fixed-precision result, including
*all* floating point results, should have an optional keyword parameter
that specifies a place of the appropriate type to store the result in
lieu of allocating a whole new floating-point object. For example:
Instead of writing:
  (setq f12 (* f1 f2))
which allocates storage for a new object and then changes f12 to point
to it, leaving the old value in f12 to be garbage collected, it should
be possible to write:
  (* f1 f2 :store-result f12)
The semantics, if these variables didn't have their types declared,
would be that at runtime a check is made whether the current binding of
f12 is of the same type as the result of (* f1 f2).  If so, just
overwrite the innards of the boxed value (caveat emptor if there's
another reference to that same numeric value that shouldn't be changed);
throw an exception if the value of f12 is a literal constant in
read-only memory; and if the value of f12 is not of the right type, then
throw an exception or do nothing, depending on safety/debug settings. If
the variables are declared to be of appropriate type, and this is
compiled code, then all variables hold unboxed values and the safety
checks can be done at compile time and f12 is guaranteed not to hold a
literal constant so the compiler can generate direct machine code for
floating point multiplication to overwrite the contents of f12
unconditionally. Of course you'd profile first, and only in places
where it really counts would you unroll all your nice nested
s-expressions to flat sequential arithmetic-store-result lines of code.

And not just arithmetic! Anything that computes fixed-size data should
have the option of directly overwriting a value instead of allocating a
new value and overwriting the handle to the old value to be a handle to
the new value. This would even apply to list structure. If you have a
lot of car/cdr trees all of the same shape, and all the leaves are of
fixed size too, you should, if you want, be able to traverse the
fixed-size tree overwriting all the leaves while leaving the internal
pointers as-is, where you write the code as if you were going to make
new structure but include the :store-result parameter to make it
overwrite instead. Same with fixed-size arrays of fixed-size elements.
Same with read-line with adjustable string big enough to hold the
largest line in your file (or throw exception if you have a line larger
than the capacity of your string).

Of course you'd want a macro WITH-OVERWRITING to wrap around a block
of ordinary code to perform a pre-compilation syntax transformation
to produce the source code described above.
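For what it's worth, in today's CL you can approximate the arithmetic case by
hand, using a preallocated specialized array as the place to store into (a
sketch, not the proposed syntax; the names are made up):

```lisp
;; A one-element double-float array acts as a reusable "box"; storing
;; into it overwrites in place rather than allocating a fresh float.
(defun mul-into! (result f1 f2)
  (declare (type (simple-array double-float (1)) result)
           (double-float f1 f2)
           (optimize (speed 3)))
  (setf (aref result 0) (* f1 f2))
  result)
```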
From: Joe Marshall
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <8y0fizmn.fsf@comcast.net>
·······@Yahoo.Com (Robert Maas, see http://tinyurl.com/uh3t) writes:

>> From: Simon Alexander <··@spam.net>
>> A typical intersection calculation can be only a few floating-point
>> operations, which is completely dominated by boxing and unboxing
>> floats if that is what happens.
>
> This reminds me: One of the great failures of CL, in my opinion, is the
> lack of overwriting arithmetic functions. In my opinion, *every*
> arithmetic function that produces a fixed-precision result, including
> *all* floating point results, should have an optional keyword parameter
> that specifies a place of the appropriate type to store the result in
> lieu of allocating a whole new floating-point object. For example:
> Instead of writing:
>   (setq f12 (* f1 f2))
> which allocates storage for a new object and then changes f12 to point
> to it, leaving the old value in f12 to be garbage collected, it should
> be possible to write:
>   (* f1 f2 :store-result f12)

I disagree completely.  This kind of re-use is one that any decent
compiler ought to be able to do (you don't need the mythical
sufficiently smart compiler).

> And not just arithmetic! Anything that computes fixed-size data, should
> have the option of directly overwriting a value instead of allocating a
> new value and overwriting the handle to the old value to be a handle to
> the new value. This would even apply to list structure. If you have a
> lot of car/cdr trees all of the same shape, and all the leaves are of
> fixed size too, you should, if you want, be able to traverse the
> fixed-size tree overwriting all the leaves while leaving the internal
> pointers as-is, where you write the code as if you were going to make
> new structure but include the :store-result parameter to make it
> overwrite instead. Same with fixed-size arrays of fixed-size elements.
> Same with read-line with adjustable string big enough to hold the
> largest line in your file (or throw exception if you have a line larger
> than the capacity of your string).

This is more interesting, but I disagree with the approach.  `Linear
variables' are ones that are defined exactly once and used exactly
once (no more, no less).  Henry Baker describes this in this paper:

  http://home.pipeline.com/~hbaker1/LinearLisp.html

The advantage, as you point out, is that there is no need for garbage
collection.

Alan Bawden extended Baker's work in his PhD dissertation:

  Implementing Distributed Systems Using Linear Naming 
  ftp://publications.ai.mit.edu/ai-publications/1500-1999/AITR-1627.ps

Where I disagree with your approach is that you annotate the operator
to specify where the variables are linear rather than annotating the
variables.

-- 
~jrm
From: Peter Seibel
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <m2ll4pdkfz.fsf@beagle.local>
Joe Marshall <·············@comcast.net> writes:

>> lather. rinse. repeat.
>
> I keep running out of shampoo.

Well, you're a pretty hairy guy.

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/
From: Joe Marshall
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <slywn8es.fsf@comcast.net>
Peter Seibel <·····@gigamonkeys.com> writes:

> Joe Marshall <·············@comcast.net> writes:
>
>>> lather. rinse. repeat.
>>
>> I keep running out of shampoo.
>
> Well, you're a pretty hairy guy.

It's the hairy macros.

-- 
~jrm
From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <REM-2005jul08-008@Yahoo.Com>
> From: Joe Marshall <·············@comcast.net>
> > lather. rinse. repeat.
> I keep running out of shampoo.

I guess you're not as intelligent as I am/was. As long ago as my early
teens, when my mother stopped washing my hair for me and I had to do it
myself, I had to read the instructions on the tube of Prell, and I was
pissed at the stupid instructions that were obviously wrong. I was very
shy and terribly afraid of failing to follow instructions and risking
public humiliation for my alleged stupidity, so I had to keep it secret
that I wasn't following the blatantly stupid instructions on the tube.

More appropriate instructions would be:
  while (hair.IsDirty()) {
    hair.lather();
    hair.rinse();
    }
(Sorry, it's easier to express in Java, sigh.)
(Also Java is more popular so it's more likely the makers of Prell
would change the instructions like the above, rather than the CL
version.)
(Also, they stopped making tubes of Prell several years ago, and I
didn't like the other brands, so now I just use bars of hand soap.)

Anyway, the modified algorithm still has one problem:

Error at line 1:
  while (hair.IsDirty()) {
              ^
undefined method.

Like when you're in the shower with the water running, how can you tell
whether your hair is clean yet or still dirty?? You know the joke about
the two-pass algorithm for the guy on the bus who wanted to know which
place to get off? The other person said "Watch where I get off, and
your stop is two stops before that"? Well the only time you can know if
your hair got washed clean or is still dirty is later when it's dry and
you try to brush it and it is so gummy it snags on the brush, and then
you know you got off the bus too early.

Oh, I'm supposed to mention Lisp at least once to avoid being
off-topic? Well, stand by, it's now 21:53, and I need to look at the
LOOP tutorial to remember how to use (LOOP ... WHILE ...), stand by
.. oops, I forgot I was going to go to the LOOP tutorial, but went
here instead:
  http://www.lispworks.com/documentation/HyperSpec/Body/m_loop.htm
the tutorial is much easier to follow, but I think I found what I want
in the HyperSpec. Here's my guess at the code:
  (loop while (is-dirty hair)
        do (progn (lather hair shampoo) (rinse hair water)))
Did I get it right? 21:59 now.
From: Kalle Olavi Niemitalo
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <87y88gb5wn.fsf@Astalo.kon.iki.fi>
·······@Yahoo.Com (Robert Maas, see http://tinyurl.com/uh3t) writes:

>   (loop while (is-dirty hair)
>         do (progn (lather hair shampoo) (rinse hair water)))
> Did I get it right?

It'd work, but the PROGN is not needed; you can have multiple
compound forms after DO.
From: Robert Uhl
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <m31x65luvg.fsf@4dv.net>
·······@Yahoo.Com (Robert Maas, see http://tinyurl.com/uh3t) writes:
>
> As long ago as early teens, when my mother stopped washing my hair for
> me and I had to do it myself, so I had to read the instructions on the
> tube of Prell, I was pissed at the stupid instructions that were
> obviously wrong.

Teens?!?!

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
A Puritan is such a one as loves God with all his soul, but hates his
neighbour with all his heart.                 --John Manningham, 1602
From: Tayssir John Gabbour
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <1120401844.801863.235660@g14g2000cwa.googlegroups.com>
········@gmail.com wrote:
> So my LISP raytracer is coming along pretty well.  It still could use
> more features (you can only do so much with spheres :P ), but I would
> like to start optimizing it - as of now, it's slow!  The following
> image took about 10 minutes to generate:
> http://inst.eecs.berkeley.edu/~stevenan/04.png
>
> Now I know you shouldn't prematurely optimize..but I think the time has
> come.  And I just wanna make some simple optimizations.  Namely, type
> declarations.  I was wondering, is there a "best way" to do this?  It
> seems that (declare)'s clutter up the code a lot - is there a way to do
> it while keeping the logic clean?  Any tools you guys use to do this
> (maybe even tools that do type-inference)?
>
> I just want some good advice before I dig in and waste a lot of time :)

"Compiler macros" may help keep your code free of clutter, if there are
special cases which you can exploit before runtime. May not be useful
in your case, but I haven't seen your code and won't try guessing
whether it is...
http://lisp.tech.coop/lisp-user-meeting-amsterdam-april-2004#compiler-macro

Caveat: I've never used them, personally. ;) And it's optional for a
given Lisp implementation to actually use the ones you define...
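To give the flavor, an untested sketch (DOT here is a made-up example
function, not from any library):

```lisp
;; Ordinary definition, used whenever the compiler macro declines.
(defun dot (a b)
  (reduce #'+ (map 'vector #'* a b)))

;; Compiler macro: when both arguments are literal vectors in the
;; source, fold the whole call to a constant; otherwise leave it alone.
(define-compiler-macro dot (&whole form a b)
  (if (and (vectorp a) (vectorp b))
      (reduce #'+ (map 'list #'* a b))
      form))
```

The nice part is that the clutter lives next to DOT's definition instead of
at every call site.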

Tayssir
From: Pietro Campesato
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <1120491446.249782.142540@o13g2000cwo.googlegroups.com>
Tayssir John Gabbour wrote:
>
> "Compiler macros" may help keep your code free of clutter, if there are
> special cases which you can exploit before runtime. May not be useful
> in your case, but I haven't seen your code and won't try guessing
> whether it is...
> http://lisp.tech.coop/lisp-user-meeting-amsterdam-april-2004#compiler-macro
>
> Caveat: I've never used them, personally. ;) And it's optional for a
> given Lisp implementation to actually use the ones you define...
>
> Tayssir

Just wondering, is there a way to do a comprehensive profile of your
code without littering it with (time ...) all over the place? Something
like a "profile-eval"?
From: Edi Weitz
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <uvf3qmu01.fsf@agharta.de>
On 4 Jul 2005 08:37:26 -0700, "Pietro Campesato" <················@gmail.com> wrote:

> Just wondering, is there a way to do a comprehensive profile of your
> code without littering it with (time ...) all over the place?

Littering it with (time ...) all over the place?  Which Lisp are you
using?  Have you actually read its documentation?

Here are some examples for popular CL implementations.

  <http://www.lispworks.com/documentation/lw445/LWUG/html/lwuser-122.htm>
  <http://www.lispworks.com/documentation/lw445/CLWUG-W/html/clwuser-w-380.htm>

  <http://www.franz.com/support/documentation/7.0/doc/runtime-analyzer.htm>

  <http://common-lisp.net/project/cmucl/doc/cmu-user/compiler-hint.html#toc219>

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Pietro Campesato
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <1120501490.964897.83770@g14g2000cwa.googlegroups.com>
Edi Weitz wrote:
>
> Littering it with (time ...) all over the place?  Which Lisp are you
> using?  Have you actually read its documentation?
>

Thank you, but I was wondering if there is a standard way to profile,
such as a widely used library supporting many implementations.
From: ········@gmail.com
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <1120502976.596023.317720@g14g2000cwa.googlegroups.com>
Yeah I'd like to know as well.  I'm using GNU Clisp, so I don't think
there's anything native to it, but I did find this:
http://www.lisp-p.org/lh/node11.html#SECTION001140000000000000000

I've only gone halfway through the tutorial, but it seems easy and
usable.
From: Edi Weitz
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <ud5pycssg.fsf@agharta.de>
On 4 Jul 2005 11:24:51 -0700, "Pietro Campesato" <················@gmail.com> wrote:

> I was wondering if there is a standard way to profile, such as a
> widely used library supporting many implementations.

I only know about this one

  <http://www-cgi.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/tools/metering/0.html>

but I don't think it's "widely used."  I guess the
implementation-provided tools are usually much better.

Cheers,
Edi.


-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Thomas Schilling
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <3iqb7fFl3cdvU1@news.dfncis.de>
In case you don't already do this:

You might want to take real advantage of Lisp's macro system and do
"partial application": given your scene parameters, you can generate
code specialized for your scene, pulling some amount of work to
compile time.

Ok, here's a simple example: Let's say you have a function to calculate
x^n:

(defun power (x n)
  (cond ((= n 0) 1)
        ((evenp n)
         (let ((y (power x (/ n 2))))
           (* y y)))
        (t (* x (power x (1- n))))))

if you have n given (at compile time) to be, say, 7 you can optimize
this to x*(x*(x^2))^2.

You can directly express this using a macro:

(defmacro power-pe (x n)
  (if (not (numberp n))
     `(power ,x ,n)      ;; fall back to default version
    (let ((gx (gensym)))
     `(let ((,gx ,x))
       ,(cond ((= n 0) 1)
	     ((evenp n)
	      (let ((y (gensym)))
		`(let ((,y (power-pe ,gx ,(/ n 2))))
		  (* ,y ,y))))
	     (t
	      `(* ,gx (power-pe ,gx ,(1- n)))))))))

For (power-pe x 5) this will expand to (gensyms beautified)
=> (let ((x1 x))
     (* x1 (power-pe x1 4)))
=> (let ((x1 x))
     (* x1 (let ((x2 x1))
             (let ((y1 (power-pe x2 2))) (* y1 y1)))))
=> ...
=> essentially resulting in something equivalent to
     x*((x^2)^2)

The generated code using this macro is still quite ugly but, e.g., those
  (let ((x1 ..)) (let ((x2 x1)) ... use x2 ...))
will be optimized by (almost?) every compiler to
  (let ((x1 ..)) ... use x1 ...)

Paul Graham also shows some examples of this technique in his book "On
Lisp".

Admittedly it would be nicer (and possible) to let the compiler do this
stuff when declared so (actually this is just some extension to
constant-folding) and some compilers might already do this (at least for
some built-in functions like EXPT).

Partial application might even be used at runtime, since the compiler is
available at runtime resulting in code like:
  (render-scene (compile-scene "somefile.scn"))
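A tiny sketch of the run-time variant, reusing the power example
(SPECIALIZE-POWER is just an illustrative name):

```lisp
;; Specialize x^n at run time: generate the unrolled expression for a
;; known non-negative integer N and hand it to COMPILE.
(defun specialize-power (n)
  (labels ((expand (n)
             (cond ((= n 0) 1)
                   ((evenp n) `(let ((y ,(expand (/ n 2)))) (* y y)))
                   (t `(* x ,(expand (1- n)))))))
    (compile nil `(lambda (x)
                    (declare (double-float x) (optimize speed))
                    ,(expand n)))))
```

COMPILE-SCENE would do the same thing on a larger scale: read the scene,
emit a lambda with the scene constants baked in, and compile it.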

The speedup gained heavily depends on the parameters with which the
functions are called, and may vary from much slower to speedups of 10x
or more. (Note also that we trade code size for speed, which may raise
cache issues.)

For how this can be applied to ray-tracing applications, the paper below
might give some hints. They report a speedup of 1.5x - 3x.

  http://repository.readscheme.org/ftp/papers/topps/D-289.pdf

-ts
From: Kalle Olavi Niemitalo
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <87d5pzpwzt.fsf@Astalo.kon.iki.fi>
Thomas Schilling <······@yahoo.de> writes:

> The generated code using this macro is still quite ugly but, eg. those
>   (let ((x1 ..)) (let ((x2 x1)) ... use x2 ...))
> will be optimized by (almost?) every compiler to
>   (let ((x1 ..)) ... use x1 ...)

Here's a version that produces very readable code.  It is quite
pointless (compilers are already supposed to optimize EXPT), but
perhaps it can serve as a demonstration of LOOP.

(defmacro power-pretty (x n)
  (if (not (typep n '(integer 0 *)))
      `(power ,x ,n)			; fall back to default version
    (loop for index below (max 1 (integer-length n)) ; evaluate x even when n=0
	  for gensym = (make-symbol (format nil "~S~S" 'power (expt 2 index)))
	  and formula = x then `(* ,gensym ,gensym)
	  collect (list gensym formula) into bindings
	  if (logbitp index n)
	    collect gensym into final-factors
	  finally (return `(let* ,bindings
			     (* ,@final-factors))))))

Using DEFINE-COMPILER-MACRO instead of DEFMACRO would make the
compiler automatically use the optimized version where
appropriate, but then again one may disagree with the compiler
about what is appropriate.
From: Kalle Olavi Niemitalo
Subject: Re: optimizing lisp - best practices?
Date: 
Message-ID: <87acl3pqxv.fsf@Astalo.kon.iki.fi>
Kalle Olavi Niemitalo <···@iki.fi> writes:

>         for gensym = (make-symbol (format nil "~S~S" 'power (expt 2 index)))

Doh!  I wanted to make the symbols look pretty in both standard
and "modern" lisps but botched that and forgot about *PACKAGE*.
Please substitute:

          for gensym = (make-symbol (format nil "~A~S" (symbol-name 'power)
                                            (expt 2 index)))