Optimizing Numerical Code

From: David Steuber
Subject: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 00:07:07 +0000
Message-ID: <87isb09nmc.fsf@david-steuber.com>

I need to speed up the following function:

(defun double-float-escape-fn (c-real c-imag)
  "Return iterations for c-real + c-imag i as floating-point numbers"
  (declare (type double-float c-real c-imag)
           (optimize (speed 3) (safety 0) (space 0)))
  (loop for iterations below *max-iterations*
     for period from 1
     for z-imag         = 0.0D0 then (+ (* 2.0D0 z-real z-imag)
  c-imag)       
     for z-real         = 0.0D0 then (+ (- z-real-squared
  z-imag-squared) c-real)
     for z-real-squared = 0.0D0 then (* z-real z-real)
     for z-imag-squared = 0.0D0 then (* z-imag z-imag)
     with a-imag        = 0.0D0
     with a-real        = 0.0D0
     with check         = 3
     unless (< (+ z-real-squared z-imag-squared) 4.0D0) return (the
  fixnum iterations)
     when (and (= a-imag z-imag) (= a-real z-real) (> iterations 1))
  return (the fixnum *max-iterations*)
     when (> period check) do (setf period 1 check (* 2 check) a-imag
  z-imag a-real z-real)
     finally (return (the fixnum iterations))))

The function calculates the escape value for the Mandelbrot set.  I
use the returned value to decide what color to paint a pixel.  I am
considering doing this function in assembler and using FFI.  The only
problem with that is that it locks me into an architecture and
implementation.  So before I go that route, I want to see what else
can be done in Lisp land.

Here is a sample timing of the function from SBCL on OS X running on a
PowerBook G4 866Mhz:

* (svref *real-array* 260)

-0.10488076384628028d0
* (svref *imag-array* 145)

0.9256099657646029d0
* *max-iterations*

2858197
* (time (double-float-escape-fn (svref *real-array* 260) (svref
*imag-array* 145)))

Evaluation took:
                 3.51 seconds of real time
                 2.35 seconds of user run time
                 1.07 seconds of system run time
                 0 page faults and
                 204,646,208 bytes consed.
2858197


This was one pixel in a 320x240 image.

Thanks.

-- 
An ideal world is left as an excercise to the reader.
   --- Paul Graham, On Lisp 8.1

Re: Optimizing Numerical Code Shiro Kawai
Re: Optimizing Numerical Code Thomas Bakketun
Re: Optimizing Numerical Code Nicolas Neuss
Re: Optimizing Numerical Code Paolo Amoroso
Re: Optimizing Numerical Code Joel Ray Holveck
- Re: Optimizing Numerical Code David Steuber
Re: Optimizing Numerical Code Marco Baringer
- Re: Optimizing Numerical Code David Steuber
- Re: Optimizing Numerical Code ··········@hotmail.com
Re: Optimizing Numerical Code Wade Humeniuk
- Re: Optimizing Numerical Code Wade Humeniuk
- Re: Optimizing Numerical Code David Steuber

From: Shiro Kawai
Subject: Re: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 12:16:34 +0000
Message-ID: <1bc2f7b2.0408310416.365b0cc0@posting.google.com>

compile and disassemble are your friends.
I didn't benchmark, but adding types to all loop variables
as follows made Allegro 7.0beta compile all numeric operations 
inline.

(defun double-float-escape-fn (c-real c-imag)
  "Return iterations for c-real + c-imag i as floating-point numbers"
  (declare (type double-float c-real c-imag)
           (optimize (speed 3) (safety 0) (space 0)))
  (loop for iterations of-type fixnum below *max-iterations*
     for period of-type fixnum from 1
     for z-imag of-type double-float = 0.0D0 then (+ (* 2.0D0 z-real z-imag)
  c-imag)       
     for z-real of-type double-float = 0.0D0 then (+ (- z-real-squared
  z-imag-squared) c-real)
     for z-real-squared of-type double-float = 0.0D0 then (* z-real z-real)
     for z-imag-squared of-type double-float = 0.0D0 then (* z-imag z-imag)
     with a-imag of-type double-float = 0.0D0
     with a-real of-type double-float = 0.0D0
     with check of-type fixnum = 3
     unless (< (+ z-real-squared z-imag-squared) 4.0D0) return (the
  fixnum iterations)
     when (and (= a-imag z-imag) (= a-real z-real) (> iterations 1))
  return (the fixnum *max-iterations*)
     when (> period check) do (setf period 1 check (* 2 check) a-imag
  z-imag a-real z-real)
     finally (return (the fixnum iterations))))

From: Thomas Bakketun
Subject: Re: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 08:07:18 +0000
Message-ID: <pan.2004.08.31.08.07.16.224482@kokusbolle.bakketun.net>

* David Steuber:

> I need to speed up the following function:

[snip]

Try this one with type declarations for all variables:

(defun double-float-escape-fn (c-real c-imag)
  "Return iterations for c-real + c-imag i as floating-point numbers"
  (declare (type double-float c-real c-imag)
           (optimize (speed 3) (safety 0) (space 0)))
  (loop for iterations fixnum below *max-iterations*
     for period fixnum from 1
     for z-imag double-float = 0.0D0 then (+ (* 2.0D0 z-real z-imag) c-imag)       
     for z-real double-float = 0.0D0 then (+ (- z-real-squared
					   z-imag-squared) c-real)
     for z-real-squared double-float = 0.0D0 then (* z-real z-real)
     for z-imag-squared double-float = 0.0D0 then (* z-imag z-imag)
     with a-imag double-float       = 0.0D0
     with a-real double-float       = 0.0D0
     with check fixnum        = 3
     unless (< (+ z-real-squared z-imag-squared) 4.0D0) 
		return (the fixnum iterations)
     when (and (= a-imag z-imag) (= a-real z-real) (> iterations 1))
     return (the fixnum *max-iterations*)
     when (> period check) do (setf period 1 check (* 2 check) a-imag
				    z-imag a-real z-real)
     finally (return (the fixnum iterations))))

> Evaluation took:
>                  3.51 seconds of real time
>                  2.35 seconds of user run time
>                  1.07 seconds of system run time
>                  0 page faults and
>                  204,646,208 bytes consed.

Note the last line.  Over 200 MB of memory was allocated.  With type
declariations for all variables this should be reduced to 0.

From: Nicolas Neuss
Subject: Re: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 08:13:12 +0000
Message-ID: <87656zk9nr.fsf@ortler.iwr.uni-heidelberg.de>

David Steuber <·····@david-steuber.com> writes:

> I need to speed up the following function:
> 
> ...

Not quite your function (I did not understand why you do some stuff), but
much faster:

(setq *max-iterations* 2858197)

(defun double-float-escape-fn (c-real c-imag &key (max-iterations *max-iterations*))
  "Return iterations for c-real + c-imag i as floating-point numbers"
  (declare (type double-float c-real c-imag)
           (type fixnum max-iterations)
           (optimize (speed 3) (safety 0) (space 0)))
  (loop for iterations of-type fixnum below max-iterations
        for z-imag of-type double-float = 0.0D0 then (+ (* 2.0D0 z-real z-imag) c-imag)
        for z-real of-type double-float = 0.0D0 then (+ (- z-real-squared z-imag-squared) c-real)
        for z-real-squared of-type double-float = 0.0D0 then (* z-real z-real)
        for z-imag-squared of-type double-float = 0.0D0 then (* z-imag z-imag)
        until (>= (+ z-real-squared z-imag-squared) 4.0D0)
        finally (return iterations)))


(time (double-float-escape-fn -0.10488076384628028d0 0.9256099657646029d0))

=>
; Evaluation took:
;   0.05f0 seconds of real time
;   0.06f0 seconds of user run time
;   0.0f0 seconds of system run time
;   132,102,896 CPU cycles
;   0 page faults and
;   0 bytes consed.

2858197

From: Paolo Amoroso
Subject: Re: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 10:51:48 +0000
Message-ID: <87d617eg1n.fsf@plato.moon.paoloamoroso.it>

This paper might be useful, in case you are not already aware of it:

  "Fast Floating-Point Processing in Common Lisp"
  Richard J. Fateman, Kevin A. Broughan, Diane K. Willcock, Duane Rettig

I'm offline right now and I don't have a URL handy, but it should be
pretty easy to find it with Google.


Paolo
-- 
Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film
Recommended Common Lisp libraries/tools (Google for info on each):
- ASDF/ASDF-INSTALL: system building/installation
- CL-PPCRE: regular expressions
- UFFI: Foreign Function Interface

From: Joel Ray Holveck
Subject: Re: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 21:42:45 +0000
Message-ID: <y7cd6173rxm.fsf@sindri.juniper.net>

> I need to speed up the following function:

Most people have suggested adding types to your code.  The compiler
can give you that hint by itself, too.  CMUCL, for example will tell
you about places where it can use optimized calls (such as inline
addition instead of calling #'+) if you give it a narrower type.  I
assume that SBCL will do the same.  I've used this information fairly
extensively in some of my tight loops.

These are the compiler settings I had at the time:
   (declaim (optimize (debug 3) (safety 3) (inhibit-warnings 3)))
   (setq *compile-print* nil)
   (setq *compile-verbose* nil)

I compiled your code on CMUCL here, and here's what I got:

    * (compile 'double-float-escape-fn)
    In: LAMBDA (C-REAL C-IMAG)
      (LOOP FOR ITERATIONS BELOW *MAX-ITERATIONS* ...)
    --> LET LET LET LET LET LET LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY WHEN 
    --> COND IF >= IF 
    ==>
      (< ITERATIONS #:G0)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a SINGLE-FLOAT.
          The second argument is a REAL, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a DOUBLE-FLOAT.
          The second argument is a REAL, not a SINGLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a FLOAT.
          The second argument is a REAL, not a RATIONAL.

      (+ Z-REAL-SQUARED Z-IMAG-SQUARED)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a RATIONAL.
          The second argument is a NUMBER, not a FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a FLOAT.
          The second argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.

      (< (+ Z-REAL-SQUARED Z-IMAG-SQUARED) 4.0d0)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a SINGLE-FLOAT.

      (= A-IMAG Z-IMAG)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a FLOAT.
          The second argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
          The second argument is a NUMBER, not a REAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a REAL.
          The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
          The second argument is a NUMBER, not a REAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a REAL.
          The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
    Note: Unable to open code because:
          Operands might not be the same type.

      (= A-REAL Z-REAL)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a FLOAT.
          The second argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
          The second argument is a NUMBER, not a REAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a REAL.
          The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
          The second argument is a NUMBER, not a REAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a REAL.
          The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
    Note: Unable to open code because:
          Operands might not be the same type.

      (> ITERATIONS 1)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a FLOAT.

      (> PERIOD CHECK)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a SINGLE-FLOAT.
          The second argument is a REAL, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a DOUBLE-FLOAT.
          The second argument is a REAL, not a SINGLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a FLOAT.
          The second argument is a REAL, not a RATIONAL.

      (LOOP FOR ITERATIONS BELOW *MAX-ITERATIONS* ...)
    --> LET LET LET LET LET LET LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY 
    --> ANSI-LOOP::LOOP-REALLY-DESETQ SETQ 1+ 
    ==>
      (+ ITERATIONS 1)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a FLOAT.

    --> LET LET LET LET LET LET LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY WHEN 
    --> COND IF >= IF 
    ==>
      (< ITERATIONS #:G0)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a SINGLE-FLOAT.
          The second argument is a REAL, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a DOUBLE-FLOAT.
          The second argument is a REAL, not a SINGLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a FLOAT.
          The second argument is a REAL, not a RATIONAL.

    --> LET LET LET LET LET LET LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY 
    --> ANSI-LOOP::LOOP-REALLY-DESETQ SETQ 1+ 
    ==>
      (+ PERIOD 1)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a FLOAT.

      (* 2.0d0 Z-REAL Z-IMAG)
    ==>
      (* (* 2.0d0 Z-REAL) Z-IMAG)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a RATIONAL.
          The second argument is a NUMBER, not a FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a FLOAT.
          The second argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
    Note: Unable to convert x*2^k to shift due to type uncertainty:
          The first argument is a NUMBER, not a INTEGER.
          The second argument is a NUMBER, not a INTEGER.
    Note: Unable to recode as shift and add due to type uncertainty:
          The first argument is a NUMBER, not a (UNSIGNED-BYTE 32).
          The second argument is a NUMBER, not a (UNSIGNED-BYTE 32).
          The result is a NUMBER, not a (UNSIGNED-BYTE 32).

      (+ (* 2.0d0 Z-REAL Z-IMAG) C-IMAG)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.

      (- Z-REAL-SQUARED Z-IMAG-SQUARED)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a RATIONAL.
          The second argument is a NUMBER, not a FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a FLOAT.
          The second argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.

      (+ (- Z-REAL-SQUARED Z-IMAG-SQUARED) C-REAL)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.

      (* Z-REAL Z-REAL)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a RATIONAL.
          The second argument is a NUMBER, not a FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a FLOAT.
          The second argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
    Note: Unable to convert x*2^k to shift due to type uncertainty:
          The first argument is a NUMBER, not a INTEGER.
          The second argument is a NUMBER, not a INTEGER.
    Note: Unable to recode as shift and add due to type uncertainty:
          The first argument is a NUMBER, not a (UNSIGNED-BYTE 32).
          The second argument is a NUMBER, not a (UNSIGNED-BYTE 32).
          The result is a NUMBER, not a (UNSIGNED-BYTE 32).

      (* Z-IMAG Z-IMAG)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a RATIONAL.
          The second argument is a NUMBER, not a FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a FLOAT.
          The second argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
    Note: Unable to convert x*2^k to shift due to type uncertainty:
          The first argument is a NUMBER, not a INTEGER.
          The second argument is a NUMBER, not a INTEGER.
    Note: Unable to recode as shift and add due to type uncertainty:
          The first argument is a NUMBER, not a (UNSIGNED-BYTE 32).
          The second argument is a NUMBER, not a (UNSIGNED-BYTE 32).
          The result is a NUMBER, not a (UNSIGNED-BYTE 32).

      (* 2.0d0 Z-REAL Z-IMAG)
    --> * * 
    ==>
      (* C::Y 2.0d0)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a RATIONAL.
    Note: Unable to optimize due to type uncertainty:
          The first argument is a NUMBER, not a SINGLE-FLOAT.

      (* 2 CHECK)
    ==>
      (* C::Y 2)
    Note: Unable to optimize due to type uncertainty:
          The first argument is a REAL, not a FLOAT.
    Note: Unable to convert x*2^k to shift due to type uncertainty:
          The first argument is a REAL, not a INTEGER.
    Note: Unable to recode as shift and add due to type uncertainty:
          The first argument is a REAL, not a (UNSIGNED-BYTE 32).
          The result is a REAL, not a (UNSIGNED-BYTE 32).

      (LOOP FOR ITERATIONS BELOW *MAX-ITERATIONS* ...)
    --> LET LET LET LET LET LET LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY WHEN 
    --> COND IF >= IF 
    ==>
      (< ITERATIONS #:G0)
    Note: Forced to do GENERIC-< (cost 10).
          Unable to do inline float comparison (cost 3) because:
          The first argument is a REAL, not a DOUBLE-FLOAT.
          The second argument is a REAL, not a DOUBLE-FLOAT.
          Unable to do inline float comparison (cost 3) because:
          The first argument is a REAL, not a SINGLE-FLOAT.
          The second argument is a REAL, not a SINGLE-FLOAT.
          etc.

      (+ Z-REAL-SQUARED Z-IMAG-SQUARED)
    Note: Forced to do GENERIC-+ (cost 10).
          Unable to do inline float arithmetic (cost 2) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
          The result is a REAL, not a DOUBLE-FLOAT.
          Unable to do inline float arithmetic (cost 2) because:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
          The result is a REAL, not a SINGLE-FLOAT.
          etc.

      (< (+ Z-REAL-SQUARED Z-IMAG-SQUARED) 4.0d0)
    Note: Forced to do GENERIC-< (cost 10).
          Unable to do inline float comparison (cost 3) because:
          The first argument is a REAL, not a DOUBLE-FLOAT.

      (= A-IMAG Z-IMAG)
    Note: Forced to do GENERIC-= (cost 10).
          Unable to do inline float comparison (cost 3) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
          Unable to do inline float comparison (cost 3) because:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.

      (= A-REAL Z-REAL)
    Note: Forced to do GENERIC-= (cost 10).
          Unable to do inline float comparison (cost 3) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
          Unable to do inline float comparison (cost 3) because:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.

      (> ITERATIONS 1)
    Note: Forced to do GENERIC-> (cost 10).
          Unable to do inline fixnum comparison (cost 3) because:
          The first argument is a REAL, not a FIXNUM.
          Unable to do inline fixnum comparison (cost 4) because:
          The first argument is a REAL, not a FIXNUM.
          etc.

      (> PERIOD CHECK)
    Note: Forced to do GENERIC-> (cost 10).
          Unable to do inline float comparison (cost 3) because:
          The first argument is a REAL, not a DOUBLE-FLOAT.
          The second argument is a REAL, not a DOUBLE-FLOAT.
          Unable to do inline float comparison (cost 3) because:
          The first argument is a REAL, not a SINGLE-FLOAT.
          The second argument is a REAL, not a SINGLE-FLOAT.
          etc.

      (* 2 CHECK)
    ==>
      (* C::Y 2)
    Note: Forced to do GENERIC-* (cost 30).
          Unable to do inline fixnum arithmetic (cost 3) because:
          The first argument is a REAL, not a FIXNUM.
          The result is a REAL, not a FIXNUM.
          Unable to do inline fixnum arithmetic (cost 4) because:
          The first argument is a REAL, not a FIXNUM.
          The result is a REAL, not a FIXNUM.
          etc.

      (LOOP FOR ITERATIONS BELOW *MAX-ITERATIONS* ...)
    --> LET LET LET LET LET LET LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY 
    --> ANSI-LOOP::LOOP-REALLY-DESETQ SETQ 1+ 
    ==>
      (+ ITERATIONS 1)
    Note: Forced to do GENERIC-+ (cost 10).
          Unable to do inline fixnum arithmetic (cost 1) because:
          The first argument is a REAL, not a FIXNUM.
          The result is a REAL, not a FIXNUM.
          Unable to do inline fixnum arithmetic (cost 2) because:
          The first argument is a REAL, not a FIXNUM.
          The result is a REAL, not a FIXNUM.
          etc.

    --> LET LET LET LET LET LET LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY WHEN 
    --> COND IF >= IF 
    ==>
      (< ITERATIONS #:G0)
    Note: Forced to do GENERIC-< (cost 10).
          Unable to do inline float comparison (cost 3) because:
          The first argument is a REAL, not a DOUBLE-FLOAT.
          The second argument is a REAL, not a DOUBLE-FLOAT.
          Unable to do inline float comparison (cost 3) because:
          The first argument is a REAL, not a SINGLE-FLOAT.
          The second argument is a REAL, not a SINGLE-FLOAT.
          etc.

    --> LET LET LET LET LET LET LET LET LET BLOCK ANSI-LOOP::LOOP-BODY TAGBODY 
    --> ANSI-LOOP::LOOP-REALLY-DESETQ SETQ 1+ 
    ==>
      (+ PERIOD 1)
    Note: Forced to do GENERIC-+ (cost 10).
          Unable to do inline fixnum arithmetic (cost 1) because:
          The first argument is a REAL, not a FIXNUM.
          The result is a REAL, not a FIXNUM.
          Unable to do inline fixnum arithmetic (cost 2) because:
          The first argument is a REAL, not a FIXNUM.
          The result is a REAL, not a FIXNUM.
          etc.

      (* 2.0d0 Z-REAL Z-IMAG)
    --> * * 
    ==>
      (* C::Y 2.0d0)
    Note: Forced to do GENERIC-* (cost 30).
          Unable to do inline float arithmetic (cost 3) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The result is a NUMBER, not a DOUBLE-FLOAT.

    ==>
      (* (* 2.0d0 Z-REAL) Z-IMAG)
    Note: Forced to do GENERIC-* (cost 30).
          Unable to do inline float arithmetic (cost 3) because:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
          The result is a NUMBER, not a SINGLE-FLOAT.
          Unable to do inline float arithmetic (cost 3) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
          The result is a NUMBER, not a DOUBLE-FLOAT.
          etc.

      (+ (* 2.0d0 Z-REAL Z-IMAG) C-IMAG)
    Note: Forced to do GENERIC-+ (cost 10).
          Unable to do inline float arithmetic (cost 2) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The result is a NUMBER, not a DOUBLE-FLOAT.

      (- Z-REAL-SQUARED Z-IMAG-SQUARED)
    Note: Forced to do GENERIC-- (cost 10).
          Unable to do inline float arithmetic (cost 2) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
          The result is a NUMBER, not a DOUBLE-FLOAT.
          Unable to do inline float arithmetic (cost 2) because:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
          The result is a NUMBER, not a SINGLE-FLOAT.
          etc.

      (+ (- Z-REAL-SQUARED Z-IMAG-SQUARED) C-REAL)
    Note: Forced to do GENERIC-+ (cost 10).
          Unable to do inline float arithmetic (cost 2) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The result is a NUMBER, not a DOUBLE-FLOAT.

      (* Z-REAL Z-REAL)
    Note: Forced to do GENERIC-* (cost 30).
          Unable to do inline float arithmetic (cost 3) because:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
          The result is a NUMBER, not a SINGLE-FLOAT.
          Unable to do inline float arithmetic (cost 3) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
          The result is a NUMBER, not a DOUBLE-FLOAT.
          etc.

      (* Z-IMAG Z-IMAG)
    Note: Forced to do GENERIC-* (cost 30).
          Unable to do inline float arithmetic (cost 3) because:
          The first argument is a NUMBER, not a SINGLE-FLOAT.
          The second argument is a NUMBER, not a SINGLE-FLOAT.
          The result is a NUMBER, not a SINGLE-FLOAT.
          Unable to do inline float arithmetic (cost 3) because:
          The first argument is a NUMBER, not a DOUBLE-FLOAT.
          The second argument is a NUMBER, not a DOUBLE-FLOAT.
          The result is a NUMBER, not a DOUBLE-FLOAT.
          etc.
      (THE FIXNUM *MAX-ITERATIONS*)
    Warning: Undefined variable: *MAX-ITERATIONS*


    Warning: This variable is undefined:
      *MAX-ITERATIONS*


    Compilation unit finished.
      2 warnings
      82 notes


    DOUBLE-FLOAT-ESCAPE-FN
    T
    NIL
    *

From: David Steuber
Subject: Re: Optimizing Numerical Code
Date: Wed, 01 Sep 2004 00:45:17 +0000
Message-ID: <87r7pm3jhe.fsf@david-steuber.com>

Joel Ray Holveck <·····@juniper.net> writes:

> > I need to speed up the following function:
> 
> Most people have suggested adding types to your code.  The compiler
> can give you that hint by itself, too.  CMUCL, for example will tell
> you about places where it can use optimized calls (such as inline
> addition instead of calling #'+) if you give it a narrower type.  I
> assume that SBCL will do the same.  I've used this information fairly
> extensively in some of my tight loops.
> 
> These are the compiler settings I had at the time:
>    (declaim (optimize (debug 3) (safety 3) (inhibit-warnings 3)))
>    (setq *compile-print* nil)
>    (setq *compile-verbose* nil)
> 
> I compiled your code on CMUCL here, and here's what I got:
<snip>

SBCL also gives such notes.  My big problem is not knowing what to do
about them.  Being informed that OF-TYPE existed was a major help to
me and now I get fewer notes when compiling my code.  In time, I hope
to understand things better so that I can satisfy the compiler.

OpenMCL has not given me any notes at all.  It may be some setting I
have to turn on for that, I don't know.

It's all part of the process of learning how to make Lisp compile into
code that runs as fast as C.  This experience has helped me see how to
do that.  I'm sure Lisp would be a more popular language if more
people knew how to write Lisp programs that ran as fast as their C
equivalents but could be developed as quickly as something written in
Python or Perl.

A 48x improvement was pretty darn good.  If I can learn to understand
Wade's code and it runs 4x faster than that, well, that will have me
very impressed.  The thought of going 192x faster makes my head spin
in a good way.

-- 
An ideal world is left as an excercise to the reader.
   --- Paul Graham, On Lisp 8.1

From: Marco Baringer
Subject: Re: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 14:04:51 +0000
Message-ID: <m24qmj1jzw.fsf@convey.it>

David Steuber <·····@david-steuber.com> writes:

> I need to speed up the following function:

[snip]

i saw you're using os x, if you're also using openmcl i had _great_
success with randall bear's fpc-ppc:

http://vorlon.cwru.edu/~beer/Software/FPC-PPC/FPC-PPC-DOC-0.2.txt

-- 
-Marco
Ring the bells that still can ring.
Forget your perfect offering.
There is a crack in everything.
That's how the light gets in.
     -Leonard Cohen

From: David Steuber
Subject: Re: Optimizing Numerical Code
Date: Wed, 01 Sep 2004 00:25:55 +0000
Message-ID: <87wtze3kdo.fsf@david-steuber.com>

Marco Baringer <··@bese.it> writes:

> i saw you're using os x, if you're also using openmcl i had _great_
> success with randall bear's fpc-ppc:
> 
> http://vorlon.cwru.edu/~beer/Software/FPC-PPC/FPC-PPC-DOC-0.2.txt

I've been developing on Emacs + SLIME + OpenMCL and testing with SBCL
which has been running faster (and also giving me notes on things it
can't optimize, although I do not understand all things yet).  The
above link is something I will look at though.  Thanks.

Also thanks to the others for their replies.  I need to look at Wade's
code.  If I can gain another 4x improvement over a fully typed loop,
then that will be nearly 200 times faster than my original code.  As
it is, OF-TYPE was the key.  I was not familiar with this word in Lisp
and it made a tremendous improvement all by itself.  With SBCL on my
Mac, here is the before and after:

From:

* (time (double-float-escape-fn -0.10488076384628028d0
  0.9256099657646029d0))

Evaluation took:
                 3.51 seconds of real time
                 2.35 seconds of user run time
                 1.07 seconds of system run time
                 0 page faults and
                 204,646,208 bytes consed.
2858197

To:

* (time (double-float-escape-fn -0.10488076384628028d0
  0.9256099657646029d0))

Evaluation took:
                 0.073 seconds of real time
                 0.07 seconds of user run time
                 0.0 seconds of system run time
                 0 page faults and
                 80 bytes consed.
2858197

* (/ 3.51 0.073)

48.08219

The real fun will be trying to speed up bignum code when the
resolution can't be represented by a double-float.  At that point, the
iteration counts will also be much higher.  Until I grok and test
Wade's version, here is how the function currently stands:

(defun double-float-escape-fn (c-real c-imag)
  "Return iterations for c-real + c-imag i as floating-point numbers"
  (declare (type double-float c-real c-imag)
           (optimize (speed 3) (safety 0) (space 0)))
  (loop for iterations of-type fixnum below *max-iterations*
     for period of-type fixnum from 1
     for z-imag of-type double-float         = 0.0D0 then (+ (* 2.0D0
  z-real z-imag) c-imag)       
     for z-real of-type double-float         = 0.0D0 then (+ (-
  z-real-squared z-imag-squared) c-real)
     for z-real-squared of-type double-float = 0.0D0 then (* z-real
  z-real)
     for z-imag-squared of-type double-float = 0.0D0 then (* z-imag
  z-imag)
     with a-imag of-type double-float        = 0.0D0
     with a-real of-type double-float        = 0.0D0
     with check of-type fixnum               = 3
     unless (< (+ z-real-squared z-imag-squared) 4.0D0) return
  iterations
     when (and (= a-imag z-imag) (= a-real z-real) (> iterations 1))
  return *max-iterations*
     when (> period check) do (setf period 1 check (* 2 check) a-imag
  z-imag a-real z-real)
     finally (return iterations)))

-- 
An ideal world is left as an excercise to the reader.
   --- Paul Graham, On Lisp 8.1

From: ··········@hotmail.com
Subject: Re: Optimizing Numerical Code
Date: Wed, 01 Sep 2004 06:43:41 +0000
Message-ID: <f7afbb21.0408312243.573c419c@posting.google.com>

Marco Baringer <··@bese.it> wrote in message news:<··············@convey.it>...
> David Steuber <·····@david-steuber.com> writes:
> 
> > I need to speed up the following function:
> 
> [snip]
> 
> i saw you're using os x, if you're also using openmcl i had _great_
> success with randall bear's fpc-ppc:
> 
> http://vorlon.cwru.edu/~beer/Software/FPC-PPC/FPC-PPC-DOC-0.2.txt

I haven't followed the thread, but one should be aware that OS X and
trigonometric functions will automatically result in a desaster
provided one uses a G3 or G4 processor. A G3/G4 1000 MHz processor is
not better than a Celeron when speaking of cosine, sine, etc.
functions.

That should bet kept in mind.

From: Wade Humeniuk
Subject: Re: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 17:12:26 +0000
Message-ID: <_j2Zc.80192$X12.40846@edtnps84>

Here's the fastest version I can get on LWW

(defun double-float-escape-fn (c-real c-imag &optional (max-iterations *max-iterations*))
   "Return iterations for c-real + c-imag i as floating-point numbers"
   (declare (type double-float c-real c-imag) (type fixnum max-iterations)
            (optimize (speed 3) (safety 0) (space 0) #+lispworks(float 0)))
   (prog ((z-imag 0.0D0) (z-real 0.0D0) (z-real-squared 0.0D0) (z-imag-squared 0.0D0)
          (a-imag 0.0D0) (a-real 0.0D0)
          (iterations 0) (check 3) (period 1))
     (declare (type double-float z-imag z-real z-real-squared z-imag-squared
                    a-imag a-real)
              (type fixnum iterations check period))
     iterate
     (if (= iterations max-iterations)
         (go complete)
       (progn
         (setf z-imag (+ (* 2.0D0 z-real z-imag) c-imag)
               z-real (+ (- z-real-squared z-imag-squared) c-real)
               z-real-squared (* z-real z-real)
               z-imag-squared (* z-imag z-imag))
         (unless (< (+ z-real-squared z-imag-squared) 4.0D0) (go complete))
         (when (and (= a-imag z-imag) (= a-real z-real))
           (setf iterations max-iterations)
           (go complete))
         (when (> period check)
           (setf period 1 check (* 2 check) a-imag z-imag a-real z-real))))
     (incf iterations)
     (incf period)
     (go iterate)
     complete
     (return-from double-float-escape-fn iterations)))

CL-USER 18 > (time (double-float-escape-fn -0.10488076384628028d0 0.9256099657646029d0))

0.1 seconds used.
Standard Allocation 0 bytes.
Fixlen Allocation 0 bytes.
2858197

CL-USER 19 >

It's about 4 times faster than a fully typed loop version.  It's about
as close to the bone as I can think of.

Wade

From: Wade Humeniuk
Subject: Re: Optimizing Numerical Code
Date: Tue, 31 Aug 2004 17:25:22 +0000
Message-ID: <6w2Zc.80195$X12.5260@edtnps84>

Just for the curious, here is the disassemble

CL-USER 20 : 1 > (disassemble 'double-float-escape-fn)
; Loading fasl file D:\HARLEQUIN\LISPWORKS\lib\4-1-0-0\modules\util\disass.fsl
; Loading fasl file D:\HARLEQUIN\LISPWORKS\lib\4-1-0-0\modules\memory\findptr.fsl
21217BBA:
        0:      80FD02           cmpb  ch, 2
        3:      7508             jne   L1
        5:      5E               pop   esi
        6:      50               push  eax
        7:      56               push  esi
        8:      B810000000       move  eax, 10
L1:   13:      55               push  ebp
       14:      89E5             move  ebp, esp
       16:      83EC44           sub   esp, 44
       19:      C7042445440000   move  [esp], 4445
       26:      50               push  eax
       27:      50               push  eax
       28:      50               push  eax
       29:      50               push  eax
       30:      81E100FF0000     and   ecx, FF00
       36:      80FD03           cmpb  ch, 3
       39:      724E             jb    L5
L2:   41:      DD0578E8D520     fldl  [20D5E878]       ; 0.0
       47:      DD5DF0           fstpl [ebp-10]
       50:      DD0578E8D520     fldl  [20D5E878]       ; 0.0
       56:      DD5DF8           fstpl [ebp-8]
       59:      DD0578E8D520     fldl  [20D5E878]       ; 0.0
       65:      DD5DE8           fstpl [ebp-18]
       68:      DD0578E8D520     fldl  [20D5E878]       ; 0.0
       74:      DD5DE0           fstpl [ebp-20]
       77:      DD0578E8D520     fldl  [20D5E878]       ; 0.0
       83:      DD5DD0           fstpl [ebp-30]
       86:      DD0578E8D520     fldl  [20D5E878]       ; 0.0
       92:      DD5DD8           fstpl [ebp-28]
       95:      33DB             xor   ebx, ebx
       97:      BA00030000       move  edx, 300
      102:      BF00010000       move  edi, 100
L3:  107:      3B5DAC           cmp   ebx, [ebp-54]
      110:      7512             jne   L6
L4:  112:      89D8             move  eax, ebx
      114:      FD               std
      115:      C9               leave
      116:      C20800           ret   8
L5:  119:      8B3DA8821D21     move  edi, [211D82A8]  ; *MAX-ITERATIONS*
      125:      897DAC           move  [ebp-54], edi
      128:      EBA7             jmp   L2
L6:  130:      DD45F0           fldl  [ebp-10]
      133:      DC4DF8           fmull [ebp-8]
      136:      DD05886D2121     fldl  [21216D88]       ; 2.0
      142:      DEC9             fmulp st(1), st
      144:      8B4508           move  eax, [ebp+8]
      147:      DD4004           fldl  [eax+4]
      150:      DEC1             faddp st(1), st
      152:      DD5DF0           fstpl [ebp-10]
      155:      DD45E8           fldl  [ebp-18]
      158:      DD5DF8           fstpl [ebp-8]
      161:      DD45E0           fldl  [ebp-20]
      164:      DC6DF8           fsubrl [ebp-8]
      167:      8B450C           move  eax, [ebp+C]
      170:      DD4004           fldl  [eax+4]
      173:      DEC1             faddp st(1), st
      175:      DD55F8           fstl  [ebp-8]
      178:      DD5DF8           fstpl [ebp-8]
      181:      DD45F8           fldl  [ebp-8]
      184:      DC4DF8           fmull [ebp-8]
      187:      DD5DE8           fstpl [ebp-18]
      190:      DD45F0           fldl  [ebp-10]
      193:      DC4DF0           fmull [ebp-10]
      196:      DD5DE0           fstpl [ebp-20]
      199:      DD45E0           fldl  [ebp-20]
      202:      DC45E8           faddl [ebp-18]
      205:      DD05786D2121     fldl  [21216D78]       ; 4.0
      211:      DED9             fcompp
      213:      DFE0             fstsw ax
      215:      9E               sahf
      216:      7A96             jp    L4
      218:      7694             jbe   L4
      220:      DD45F0           fldl  [ebp-10]
      223:      DC5DD0           fcompl [ebp-30]
      226:      DFE0             fstsw ax
      228:      9E               sahf
      229:      7A17             jp    L7
      231:      7515             jne   L7
      233:      DD45F8           fldl  [ebp-8]
      236:      DC5DD8           fcompl [ebp-28]
      239:      DFE0             fstsw ax
      241:      9E               sahf
      242:      7A0A             jp    L7
      244:      7508             jne   L7
      246:      8B5DAC           move  ebx, [ebp-54]
      249:      E972FFFFFF       jmp   L4
L7:  254:      3BFA             cmp   edi, edx
      256:      7E1A             jle   L8
      258:      BF00010000       move  edi, 100
      263:      8955B0           move  [ebp-50], edx
      266:      8B55B0           move  edx, [ebp-50]
      269:      C1E201           shl   edx, 1
      272:      DD45F0           fldl  [ebp-10]
      275:      DD5DD0           fstpl [ebp-30]
      278:      DD45F8           fldl  [ebp-8]
      281:      DD5DD8           fstpl [ebp-28]
L8:  284:      8DB300010000     lea   esi, [ebx+100]
      290:      8975B4           move  [ebp-4C], esi
      293:      8B5DB4           move  ebx, [ebp-4C]
      296:      8DB700010000     lea   esi, [edi+100]
      302:      8975B8           move  [ebp-48], esi
      305:      8B7DB8           move  edi, [ebp-48]
      308:      E932FFFFFF       jmp   L3
      313:      90               nop
NIL

CL-USER 21 : 1 >

Wade

From: David Steuber
Subject: Re: Optimizing Numerical Code
Date: Wed, 01 Sep 2004 01:05:19 +0000
Message-ID: <87llfu3ik0.fsf@david-steuber.com>

Wade Humeniuk <····································@telus.net> writes:

> Here's the fastest version I can get on LWW
<snip>
> CL-USER 18 > (time (double-float-escape-fn -0.10488076384628028d0 0.9256099657646029d0))
> 
> 0.1 seconds used.
> Standard Allocation 0 bytes.
> Fixlen Allocation 0 bytes.
> 2858197
> 
> CL-USER 19 >
> 
> It's about 4 times faster than a fully typed loop version.  It's about
> as close to the bone as I can think of.

Thanks.  I was looking for the big speed up, but it didn't happen for
me on SBCL :-(.  Perhaps SBCL just has a really good LOOP macro or the
compiler is just that good.  I made a couple minor changes when I
tested so that I could do a side by side apples to apples comparison.
I made max-iterations *max-iterations* to point to my global variable
and I renamed the function so that it could coexist with the one I
had.  Here are my results:

* (setf *max-iterations* 2858197)

2858197
* (time (double-float-escape-fn -0.10488076384628028d0
0.9256099657646029d0))

Evaluation took:
                 0.072 seconds of real time
                 0.07 seconds of user run time
                 0.0 seconds of system run time
                 0 page faults and
                 80 bytes consed.
2858197
* (time (wade-double-float-escape-fn -0.10488076384628028d0
0.9256099657646029d0))

Evaluation took:
                 0.075 seconds of real time
                 0.07 seconds of user run time
                 0.0 seconds of system run time
                 0 page faults and
                 80 bytes consed.
2858197

I'd call it a wash, so I don't think I'll go with the extra
complexity.  It is interesting to see C style GOTO functionality
though.  Thanks.

-- 
An ideal world is left as an excercise to the reader.
   --- Paul Graham, On Lisp 8.1