From: Frederick Crabbe
Subject: efficiency, complex numbers
Date: 
Message-ID: <wzc4r9hftkl.fsf@crab.cs.usna.edu>
I have a bit of code I wrote to find the roots of a quartic
polynomial.  It's a closed-form solution, so it is pretty
straightforward.  But this code will lie at the core of the innermost
loop of the program; I estimate it will be called 20 million
times. Obviously, I wish to make it as fast as possible.

My primary worry is with how I'm handling complex numbers.  There are
lots of square roots being found, and right now I don't do anything
special with the return value.  That is, if it is complex, I treat it
just as I do real values in other computations.  Is that the right
thing to do, or should I test before my sqrt and handle the complex
cases myself explicitly?

Similarly, when I take the cube root of a negative real, I want the
real root, not the principal one.  I wrote a function for that, but is
that the right thing to do? Maple has a built-in function, surd(x),
which finds the root closest to x, returning the real root in my
cases.  I don't see anything like that in Common Lisp.

ACL 6.0 on Solaris, if it makes a difference.  

Any suggestions would be much appreciated.

cheers,
ric

;;;Given a quartic in the form x^4+ax^3+bx^2+cx+d, quartic-roots
;;;returns a list of the four roots.
(defun quartic-roots (a b c d)
  (let* ((y (real-cubic-root (* -1 b) (- (* a c) (* 4 d))  
                             (+ (* -1 a a d) (* 4 b d) (* -1 c c))))
	 (R (sqrt (+ (* a a 1/4) (* -1 b) y)))
	 (p1 (* 3/4 a a))
	 (p2 (* -1 R R))
	 (p3 (* -2 b))
	 (p4 (* 4 a b))
	 (p5 (* -8 c))
	 (p6 (* -1 a a a))
	 (p7 (* a -1/4))
	 (bigD (/ (sqrt (+ p1 p2 p3 (/ (+ p4 p5 p6) (* 4 R)))) 2))
	 (bigE (/ (sqrt (+ p1 p2 p3 (/ (+ p4 p5 p6) (* -4 R)))) 2))
	 (f (+ p7 (* R 1/2)))
	 (g (+ p7 (* R -1/2))))
    ;;bigD, bigE, or f and g could be complex.  Return them all and
    ;;let the caller figure it out.
    (list (+ f bigD) (- f bigD) (+ g bigE) (- g bigE))))


;;;quartic-roots needs at least one root from its resolvent cubic equation.
;;;real-cubic-root returns the real one.
(defun real-cubic-root (p q r)
  (let ((a (/ (- (* 3 q) (* p p)) 3))
	(b (/ (+ (* 2 p p p) (* -1 9 p q) (* 27 r)) 27)))
    (let* ((p1 (* -1 b 1/2))
	   (p2 (expt (+ (* b b 1/4) (* a a a 1/27)) 1/2)))
      ;;There are 3 roots to the cubic.  This one is guaranteed to be real.
      ;;If b^2/4+a^3/27 < 0, the two cube-root terms are complex conjugates,
      ;;and adding them together results in a real.  Since Lisp returns #c(n 0.0)
      ;;in that case, I take the real part.
      (realpart (+ (* -1/3 p) (real-cube-root (+ p1 p2)) 
                              (real-cube-root (- p1 p2)))))))

(defmacro sign (n)
  `(if (< ,n 0) -1 1))

(defun real-cube-root (n)
  ;;;The principal cube root of a negative number is likely to be
  ;;;complex because of how the branch cuts are selected.  This
  ;;;function ensures that we get the real root. If the number is
  ;;;already complex, don't worry about it.
  (if (complexp n)
      (expt n 1/3)
    (let* ((s (sign n))
	   (n2 (* s n)))
      (* s (expt n2 1/3)))))

From: Joe Marshall
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <el8la5y4.fsf@ccs.neu.edu>
Frederick Crabbe <······@usna.edu> writes:

> I have a bit of code I wrote to find the roots of a quartic
> polynomial.  It's a closed form solution, so it is pretty
> straightforward.  But this code will lie at the core of the most inner
> loop of the program; I estimate will be called 20 million
> times. Obviously, I wish to make it as fast as possible.
> 


Do you expect to be getting a lot of complex results?  If the
bulk of your computations will be complex it might be better to
separate out the real and imaginary parts.  Complex numbers are
(probably) heap-allocated, so keeping the real and imaginary parts
separate would allow them to be on the stack.  If the real and
imaginary parts are floats, then there may be performance lost when
boxing and unboxing them, too.
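A minimal sketch of the idea (the function name is made up, and I'm
assuming double-floats throughout): return a possibly-complex square
root as two separate values rather than consing a COMPLEX object.

```lisp
;; Sketch: the square root of a possibly-negative double-float,
;; returned as two values (real part, imaginary part) instead of
;; a heap-allocated COMPLEX.
(defun split-sqrt (x)
  (declare (type double-float x))
  (if (minusp x)
      (values 0.0d0 (sqrt (- x)))   ; purely imaginary result
      (values (sqrt x) 0.0d0)))     ; ordinary real result
```

A compiler that handles multiple values well can then keep both parts
unboxed; whether ACL manages that is another question.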

> My primary worry is with how I'm handling complex numbers.  There are
> lots of square roots being found, and right now I don't do anything
> special with the return value.  That is, if it is complex, I treat it
> just as I do real values in other computations.  Is that the right
> thing to do, or should I test before my sqrt and handle the complex
> cases myself explicitly?

Since you are looking for the real cubic root, it might be a better
idea to ensure that you don't take a negative square root from the
start rather than attempting to recover the real part after the fact.

From: Frederick Crabbe
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <wzcvg1tesc9.fsf@crab.cs.usna.edu>
Joe Marshall <···@ccs.neu.edu> writes:
> Do you expect to be getting a lot of complex results?  If the
> bulk of your computations will be complex it might be better to
> separate out the real and imaginary parts.  Complex numbers are
> (probably) heap-allocated, so keeping the real and imaginary parts
> separate would allow them to be on the stack.  If the real and
> imaginary parts are floats, then there may be performance lost when
> boxing and unboxing them, too.

I have no idea how often the results will be complex.  But I'm getting
the impression from your post and others that it very well might be
worth my while to separate the complex cases out.  Thanks.

> Since you are looking for the real cubic root, it might be a better
> idea to ensure that you don't take a negative square root from the
> start rather than attempting to recover the real part after the fact.

Unfortunately, there are some cases where the cubic root is real, but
the intermediate values are complex conjugates that are later summed.
But of course, I can test for that and go down a different branch in
that case.

cheers,
ric
From: Gareth McCaughan
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <slrnavni3a.11v7.Gareth.McCaughan@g.local>
Frederick Crabbe wrote:
>  I have a bit of code I wrote to find the roots of a quartic
>  polynomial.  It's a closed form solution, so it is pretty
>  straightforward.  But this code will lie at the core of the most inner
>  loop of the program; I estimate will be called 20 million
>  times. Obviously, I wish to make it as fast as possible.

20 million times *ever*? (If so, you probably spent more time
typing that paragraph than reducing the runtime of the
quartic solver to 0 would buy you.) 

>  My primary worry is with how I'm handling complex numbers.  There are
>  lots of square roots being found, and right now I don't do anything
>  special with the return value.  That is, if it is complex, I treat it
>  just as I do real values in other computations.  Is that the right
>  thing to do, or should I test before my sqrt and handle the complex
>  cases myself explicitly?

You may get better code by checking explicitly, even if the
calculations you then do are identical in each case. Consider
these two functions:

    (defun f1 (x)
      (* x (1+ x) (1- x)))

    (defun f2 (x)
      (typecase x
        ((double-float) (* x (1+ x) (1- x)))
        ((single-float) (* x (1+ x) (1- x)))
        (t              (* x (1+ x) (1- x)))))

Clearly they calculate the same thing. However, in F2 the
compiler can, in principle, open-code everything in the
double-float and single-float cases; it might not think
to try that with the original F1.

For instance, on my box (Athlon/1GHz, FreeBSD, CMUCL 18d)
if I do (declaim (inline f1 f2)), then define those functions
above, then

    (defun t1 ()
      (let ((xs (make-array '(1000000) :element-type 'double-float
                                       :initial-element 1.0d0)))
        (time (loop for x across xs sum (f1 x)))))

and similarly for t2 using f2, then t1 reports having used about
1.6 CPU seconds and t2 reports having used about 0.7 CPU seconds.
How this would be for you using ACL on Solaris, I don't know.

It's also conceivable, alas, that doing everything with reals
might be faster than using complex numbers. If your implementation
is clever enough then it might be able to open-code complex
arithmetic and avoid making boxes for the complex numbers it
uses, but I wouldn't advise you to bet on it.

    (defun f1 (z)
      (declare (type (complex double-float) z))
      (loop repeat 1000000 do
        (setq z (* z z)))
      z)

    (defun f2 (z)
      (declare (type (complex double-float) z))
      (let ((x (realpart z))
            (y (imagpart z)))
        (loop repeat 1000000 do
          (psetq x (- (* x x) (* y y))
                 y (* 2 x y)))
        (complex x y)))

On my system, F2 is about twice the speed of F1.

In any case, you should consider sprinkling your code with
declarations. If anything's known to be (say) of type
double-float, then you may get much better results if you
declare the functions and their arguments to have the
appropriate types. In particular, the compiler may be
able to avoid boxing things. (This works better when
the functions are inlined. If I recall correctly, ACL
ignores the INLINE declaration for user-defined functions,
but maybe it's good at providing special entries and
returns that avoid boxing arguments?)

Please note that effective use of declarations is *very*
compiler-dependent. CMUCL is very good with numeric stuff,
and its compiler puts a lot of effort into type inference.
I don't know how ACL compares in this area.

Another word of warning. Is there any danger that some of
the coefficients passed into QUARTIC-ROOTS might be integers?
(Say, the integer 0.) If so, then note that you're liable
to get a SINGLE-FLOAT from things like (expt 2 1/3). See
section 12.1.3.3 of the Standard or the HyperSpec.
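To make that concrete (exact printed values will vary by
implementation, so I've left them out):

```lisp
;; With a rational base and a rational exponent whose result is
;; irrational, EXPT returns a SINGLE-FLOAT (CLHS 12.1.3.3):
(expt 2 1/3)       ; => a SINGLE-FLOAT approximation of 2^(1/3)
;; Forcing the base to a double-float gives you double precision:
(expt 2.0d0 1/3)   ; => a DOUBLE-FLOAT approximation
```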

>  (defmacro sign (n)
>    `(if (< ,n 0) -1 1))

There's already a SIGNUM function. I'd hope your implementation
is clever enough to inline it at least somewhat.

>  (defun real-cube-root (n)
>    ;;;The principle cube root of a negative number is likely to be
>    ;;;complex because of how the branch cuts are selected.  This
>    ;;;function ensures that we get the real root. If the number is
>    ;;;already complex, don't worry about it.
>    (if (complexp n)
>        (expt n 1/3)
>      (let* ((s (sign n))
>             (n2 (* s n)))
>        (* s (expt n2 1/3)))))

I dislike both the name and the implementation of this function.
I don't think you should call something REAL-CUBE-ROOT if it
might return something that isn't real. So, either (1) check
the sign of the thing you square-root to get P2 and then call
a function defined thus

    (defun real-cube-root (x)
      "The unique real cube root of the real number X."
      (declare (type real x))
      ;; if you can make that SINGLE-FLOAT or DOUBLE-FLOAT,
      ;; it is likely to be a win.
      (* (signum x) (expt (abs x) 1/3)))

or else (2) don't bother checking the signs and call this

    (defun cube-root (x)
      "A cube root of X. If X is real, this function guarantees
      to return its real cube root."
      (if (complexp x)
        (expt x 1/3)
        (* (signum x) (expt (abs x) 1/3))))

I prefer #1, especially as you can move the sign check to
before you square-root P2 and thereby might get faster code
in the positive case :-).

-- 
Gareth McCaughan  ················@pobox.com
.sig under construc
From: Lieven Marchand
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <87pts3moq5.fsf@wyrd.be>
Gareth McCaughan <················@pobox.com> writes:

> 20 million times *ever*? (If so, you probably spent more time
> typing that paragraph than reducing the runtime of the
> quartic solver to 0 would buy you.) 

For what it's worth, if you're interested in numerical solutions, the
quartic solver is probably not the way to go. Go for newton iteration
with root polishing or something similar and it'll probably run faster
than the quartic formula.
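Roughly the sort of thing I mean (a bare-bones sketch: caller-supplied
starting guess, fixed iteration cap, and no safeguard against a
vanishing derivative):

```lisp
;; Newton iteration for one real root of x^4 + a x^3 + b x^2 + c x + d.
;; The polynomial and its derivative are evaluated by Horner's rule.
(defun quartic-newton (a b c d x0 &key (tol 1d-12) (max-iter 50))
  (declare (type double-float a b c d x0 tol))
  (loop repeat max-iter
        for x = x0 then (- x (/ p dp))          ; Newton step
        for p = (+ (* x (+ (* x (+ (* x (+ x a)) b)) c)) d)
        for dp = (+ (* x (+ (* x (+ (* 4 x) (* 3 a))) (* 2 b))) c)
        when (< (abs p) tol) return x
        finally (return x)))
```

With a decent starting guess (root polishing from the previous
solve, say) this converges quadratically.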

-- 
When they ran out of ships, they used guns. When they ran out of guns,
they used knives and sticks and bare hands. They did this for two
years. They never ran out of courage. But in the end, they ran out of
time.                                          B5 --- In the Beginning
From: Gareth McCaughan
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <slrnavq87k.11v7.Gareth.McCaughan@g.local>
Lieven Marchand wrote:

[I wrote:]
> > 20 million times *ever*? (If so, you probably spent more time
> > typing that paragraph than reducing the runtime of the
> > quartic solver to 0 would buy you.) 

[Lieven:]
>  For what it's worth, if you're interested in numerical solutions, the
>  quartic solver is probably not the way to go. Go for newton iteration
>  with root polishing or something similar and it'll probably run faster
>  than the quartic formula.

I'm not sure whether that's true in general[1], but it occurs to me
that if the original poster is solving 20 million quartics every time
the program runs then maybe each is very close to its predecessor --
which is about the most favourable circumstance for Newton iteration
that one can think of. :-)

[1] I've heard it claimed, but I expect it depends on just how
    accurate you want your solutions to be, whether you need all
    the roots, how fast your cube-root routine is, and so on.

    Speaking of which, it *might* conceivably be worth the original
    poster's while to write a special cube-root routine rather than
    depending on EXPT to run fast.
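    Something along these lines, say (a sketch only; whether it
    actually beats the library EXPT is a question for the profiler):

```lisp
;; Sketch of a dedicated real cube root: a cheap initial estimate,
;; then two Newton steps for y^3 = |x|, with the sign restored at
;; the end so negative reals get their real cube root.
(defun cbrt (x)
  (declare (type double-float x))
  (if (zerop x)
      0.0d0
      (let* ((ax (abs x))
             (y (exp (/ (log ax) 3))))   ; initial guess
        (dotimes (i 2)
          ;; Newton step for f(y) = y^3 - ax
          (setf y (/ (+ (* 2 y) (/ ax (* y y))) 3)))
        (if (minusp x) (- y) y))))
```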

-- 
Gareth McCaughan  ················@pobox.com
.sig under construc
From: Frederick Crabbe
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <wzcn0n5ep0q.fsf@crab.cs.usna.edu>
Gareth McCaughan <················@pobox.com> writes:
> Lieven Marchand wrote:
> >  For what it's worth, if you're interested in numerical solutions, the
> >  quartic solver is probably not the way to go. Go for newton iteration
> >  with root polishing or something similar and it'll probably run faster
> >  than the quartic formula.
> 
> I'm not sure whether that's true in general[1], but it occurs to me
> that if the original poster is solving 20 million quartics every time
> the program runs then maybe each is very close to its predecessor --
> which is about the most favourable circumstance for Newton iteration
> that one can think of. :-)
> 
> [1] I've heard it claimed, but I expect it depends on just how
>     accurate you want your solutions to be, whether you need all
>     the roots, how fast your cube-root routine is, and so on.
> 
>     Speaking of which, it *might* conceivably be worth the original
>     poster's while to write a special cube-root routine rather than
>     depending on EXPT to run fast.


Yes, this (what Lieven wrote) blows my intuition all ahoo.  Newton's
method has a tendency to be very bad if you don't get a good initial
guess, and the only fixes I know of require extra backtracking.  I
would expect that whatever search is built into expt, it is less of a
hassle than Newton; but what do I know, that's why I posted.

As for writing my own cube-root: I certainly could, but I have the
default model of trusting that the library functions in most languages
will be orders of magnitude better than anything I could write.  Is it
really the case that one I wrote could be that much better? Especially
since I've never really coded lisp with speed in mind before...

Finally, now that the subject is up, I should mention where the
quartic comes from.  I am really trying to _maximize_ a quadratic that
is constrained by another quadratic.  After applying Lagrange, I end
up with the quartic.  The whole point of this was to be faster than
finding the max of the original quadratic by search.  But I'm getting
the feeling that you folks think this might not be the case (which
would be a pity, wasting all that beautiful math).  What do y'all
think? should I just maximize the original function?

cheers,
ric
From: Lieven Marchand
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <87isxtobkt.fsf@wyrd.be>
Frederick Crabbe <······@usna.edu> writes:

> Yes, this (what Lieven wrote) blows my intuition all ahoo.  Newton's
> method has a tendency to be very bad if you don't get a good initial
> guess, and the only fixes I know of require extra backtracking.  I
> would expect that whatever search is built in to expt, it is less of a
> hassle than Newton;  but what do I know, that's why I posted.

But you can get excellent initial values for polynomials. There are
whole heaps of theorems about the distribution of zeroes. 
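For instance, the Cauchy bound (stated here for a monic quartic)
gives you a bracketing interval essentially for free:

```lisp
;; Cauchy bound: every root z of x^4 + a x^3 + b x^2 + c x + d
;; satisfies |z| < 1 + max(|a|, |b|, |c|, |d|), so any real roots
;; lie in (-bound, +bound).
(defun cauchy-root-bound (a b c d)
  (+ 1 (max (abs a) (abs b) (abs c) (abs d))))
```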

> Finally, now that the subject is up, I should mention where the
> quartic comes from.  I am really trying to _maximize_ a quadratic that
> is constrained by another quadratic.  After applying Lagrange, I end
> up with the quartic.  The whole point of this was to be faster than
> finding the max of the original quadratic by search.  But I'm getting
> the feeling that you folks think this might not be the case (which
> would be a pity, wasting all that beautiful math).  What do y'all
> think? should I just maximize the original function?

Could you post the full problem definition?

-- 
When they ran out of ships, they used guns. When they ran out of guns,
they used knives and sticks and bare hands. They did this for two
years. They never ran out of courage. But in the end, they ran out of
time.                                          B5 --- In the Beginning
From: Raymond Toy
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <4nlm2qj56t.fsf@rtp.ericsson.se>
>>>>> "Gareth" == Gareth McCaughan <················@pobox.com> writes:

    Gareth>     (defun f1 (z)
    Gareth>       (declare (type (complex double-float) z))
    Gareth>       (loop repeat 1000000 do
    Gareth>         (setq z (* z z)))
    Gareth>       z)

    Gareth>     (defun f2 (z)
    Gareth>       (declare (type (complex double-float) z))
    Gareth>       (let ((x (realpart z))
    Gareth>             (y (imagpart z)))
    Gareth>         (loop repeat 1000000 do
    Gareth>           (psetq x (- (* x x) (* y y))
    Gareth>                  y (* 2 x y)))
    Gareth>         (complex x y)))

    Gareth> On my system, F2 is about twice the speed of F1.

This is because of two things.  In f1 the compiler is not smart
enough to know you're squaring z, whereas in f2 you've optimized that
by hand to remove some redundant operations.  Also, the compiler for
some reason usually generates a bunch of redundant move operations for
f1 that don't happen for f2.  (Based on my experience on cmucl/sparc;
I didn't disassemble these to look.)

Compiler macros could be very useful here if this is really important.

At least CMUCL doesn't cons to death for f1 like every other
implementation I know of.  :-)

Ray
From: Frederick Crabbe
Subject: Re: efficiency, complex numbers
Date: 
Message-ID: <wzcr8cher5a.fsf@crab.cs.usna.edu>
Gareth McCaughan <················@pobox.com> writes:

> Frederick Crabbe wrote:
> >  I have a bit of code I wrote to find the roots of a quartic
> >  polynomial.  It's a closed form solution, so it is pretty
> >  straightforward.  But this code will lie at the core of the most inner
> >  loop of the program; I estimate will be called 20 million
> >  times. Obviously, I wish to make it as fast as possible.
> 
> 20 million times *ever*? (If so, you probably spent more time
> typing that paragraph than reducing the runtime of the
> quartic solver to 0 would buy you.) 

Well, no; I guess that wasn't as clear as it should have been.  20
million times in one invocation of the unit.  I guess that unit will be
called 500,000 to 1,000,000 times over the course of the experiment.
> 
> You may get better code by checking explicitly, even if the
> calculations you then do are identical in each case. Consider
> these two functions:
> 
>     (defun f1 (x)
>       (* x (1+ x) (1- x)))
> 
>     (defun f2 (x)
>       (typecase x
>         ((double-float) (* x (1+ x) (1- x)))
>         ((single-float) (* x (1+ x) (1- x)))
>         (t              (* x (1+ x) (1- x)))))
> 


Ohh, nice.  I wouldn't have thought of that at all.

> In any case, you should consider sprinkling your code with
> declarations. If anything's known to be (say) of type

Yes, of course declarations are key.  I should have had them in the
code I posted, but laziness uber alles.

> Please note that effective use of declarations is *very*
> compiler-dependent. CMUCL is very good with numeric stuff,
> and its compiler puts a lot of effort into type inference.
> I don't know how ACL compares in this area.

Does anyone know? Aren't there some ACL types who hang out around
here?

Another question is, if CMUCL does such a nice job, should I switch to
it?  I have a linux box that's faster than my Sun, and if CMUCL is so
good, it might be worth my while to move over to that.

> Another word of warning. Is there any danger that some of
> the coefficients passed into QUARTIC-ROOTS might be integers?
> (Say, the integer 0.) If so, then note that you're liable
> to get a SINGLE-FLOAT from things like (expt 2 1/3). See
> section 12.1.3.3 of the Standard or the HyperSpec.

Yikes.  At this stage, I think it is unlikely that they will be
integers, but that's another thing I wouldn't have thought to check.
Thanks for the heads up.

> I dislike both the name and the implementation of this function.
> I don't think you should call something REAL-CUBE-ROOT if it
> might return something that isn't real. So, either (1) check
> the sign of the thing you square-root to get P2 and then call
> a function defined thus

Ugh, I'm embarrassed I posted that.  Please pretend you never saw it.
The problem with the name came from the function originally just
returning the real root, and I later grafted on the complex part.  Not
a very good excuse, I know.  The general yuckiness came from me coding
yuckily.

cheers,
ric