From: larry
Subject: newbie question: fast floating point computations
Date: 
Message-ID: <7b8f89d6.0108080857.5238d45a@posting.google.com>
I heard that Lisp with declarations and the proper optimization settings
can be just as fast as C code. I tried a simple program written in C and
Lisp, and I just can't get the speed of the compiled Lisp program anywhere
near that of the corresponding C program. The Lisp version is maybe 10
times slower than the C version.

(defun test(n)
  (declare (optimize (speed 3) (safety 0))
           (fixnum n))
  (let ((x 0.0s0))
    (declare (single-float x))
    (dotimes (i n)
      (incf x (the single-float (cos x))))
    x))

Am I doing something wrong with the declarations and optimization
settings?
I'm using ACL 5.0.

From: Christophe Rhodes
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <sqitfyeaky.fsf@lambda.jesus.cam.ac.uk>
··········@hotmail.com (larry) writes:

> I heard that Lisp with declarations and the proper optimization settings
> can be just as fast as C code. I tried a simple program written in C and
> Lisp, and I just can't get the speed of the compiled Lisp program anywhere
> near that of the corresponding C program. The Lisp version is maybe 10
> times slower than the C version.
> 
> (defun test(n)
>   (declare (optimize (speed 3) (safety 0))
>            (fixnum n))
>   (let ((x 0.0s0))
>     (declare (single-float x))
>     (dotimes (i n)
>       (incf x (the single-float (cos x))))
>     x))
> 
> Am I doing something wrong with the declarations and optimization
> settings?

I don't think so.

On my system (Debian unstable, CMUCL, and gcc -O2) I get roughly equal
times with C and Lisp:

[ C ]
·····@lambda:~$ time ./foo
1.570796

real	0m0.043s
user	0m0.040s
sys	0m0.000s

[ Lisp ]
* (time (test 100000))
Compiling LAMBDA NIL: 
Compiling Top-Level Form: 

Evaluation took:
  0.03 seconds of real time
  0.03 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.5707964

But this doesn't look like a real world test, to be honest.

CMUCL emitted two notes when compiling your test function, one of
which I think is unavoidable:

In: LAMBDA (N)
  (COS X)
Note: Unable to avoid inline argument range check
because the argument range (SINGLE-FLOAT) was not within 2^64

Note: Doing float to pointer coercion (cost 13) from X to "<return value>".

The second is inevitable (cmucl has to box up the float to return it
to the caller, unless any particular magic happens); the first,
however, should be avoidable.
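For what it's worth, here is a minimal, untested sketch of how bounding
the declared types ought to let the compiler prove the argument is in
range and drop that first check (the bounds are tight enough for this
particular iteration):

  (defun test-bounded (n)
    (declare (optimize (speed 3) (safety 0))
             (fixnum n))
    (let ((x 0.0f0))
      ;; X converges upward towards pi/2, so it stays in [0, 2); COS of
      ;; such an X is at worst a hair below zero due to rounding.
      (declare (type (single-float 0.0f0 2.0f0) x))
      (dotimes (i n)
        (incf x (the (single-float -1.0f0 1.0f0) (cos x))))
      x))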

Perhaps you could disassemble your test function in ACL to see what
it's doing? Does ACL emit any efficiency notes on compilation?

Christophe
-- 
Jesus College, Cambridge, CB5 8BL                           +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/                  (defun pling-dollar 
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)
From: Raymond Toy
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <4nhevil8og.fsf@rtp.ericsson.se>
>>>>> "Christophe" == Christophe Rhodes <·····@cam.ac.uk> writes:

    Christophe> CMUCL emitted two notes when compiling your test function, one of
    Christophe> which I think is unavoidable:

    Christophe> In: LAMBDA (N)
    Christophe>   (COS X)
    Christophe> Note: Unable to avoid inline argument range check
    Christophe> because the argument range (SINGLE-FLOAT) was not within 2^64

    Christophe> Note: Doing float to pointer coercion (cost 13) from X to "<return value>".

    Christophe> The second is inevitable (cmucl has to box up the float to return it
    Christophe> to the caller, unless any particular magic happens); the first,
    Christophe> however, should be avoidable.

For what it's worth, I think the first note is emitted because CMUCL
will try to use the fsincos (?) floating-point instruction directly.
This only works if |x| < 2^64.  If the compiler can't prove this is
true, it, instead, generates a call to the C cos function on the
assumption that it will do a better job.

Ray
From: Siegfried Gonzi
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <3B72474D.5C491644@kfunigraz.ac.at>
larry wrote:


> The lisp version is maybe 10
> times slower than the C version.

> 
> (defun test(n)
>   (declare (optimize (speed 3) (safety 0))
>            (fixnum n))
>   (let ((x 0.0s0))
>     (declare (single-float x))
>     (dotimes (i n)
>       (incf x (the single-float (cos x))))
>     x))


My happiness about my PowerLisp Macintosh version turned out to be premature.

You are not alone: on my Mac, too, the C version is 10 times faster
than the version compiled with PowerLisp 2.2.  Admittedly, the compiled
version (I copied and pasted it from your post and compiled it) is
about 5 times faster than the interpreted one, but it still loses
noticeably to C (see the C code at the end of my post).

I think my career as a potential Lisp programmer is over. It is
intimidating to see that even commercial products (at least Larry's
product from Franz Inc.) are way behind what might be possible. Where
is the famous 80% of C speed (a figure I have often come across when
searching the net)?

I also tried a small Clean program on my Macintosh. Clean is a
functional language that is free for non-commercial use. It has been
developed (and is still being developed and improved) at the
University of Nijmegen (Netherlands). Clean is not just a toy project;
it comes with a powerful IDE (e.g. efficient platform-independent
GUI programming is possible in Clean).

And to my great surprise, Clean is as fast as the C version (compiled
with MPW, the native Macintosh C/C++ compiler).

Here is the executable Clean version (Real is Clean's "double" type):

module cosx
import StdEnv

test::!Int !Real -> !Real
test n cosx
        | n==0 = cosx
        | otherwise = test (n-1) (cosx+cos(cosx))
        
Start = test 10000000 0.0


Why isn't it possible for the Lisp version to do in-place updates when
one gives the compiler type declarations? In PowerLisp on my Mac I can
see messages during execution, e.g. for (test 1000000), showing that
the compiled version is busy with garbage collection.

In Clean, when I give strictness information, e.g. "!Real", there is
virtually no garbage collection, and nested scope!


S. Gonzi (please see the enclosed code if you want to check my claims;
I am not interested in that C/C++ stuff, so send complaints about my
C coding style to Plan9 or elsewhere)

#include <stdio.h>
#include <math.h>

int main(void)
{
        int i;
        int n = 10000000;   /* The same 'n' as in the Lisp code; a big n makes
                               it easier to time the execution with a watch. */
        double cosx = 0.0;  /* Use float here if you want, but it doesn't change
                               the execution time, at least on my Macintosh. */

        for (i = 0; i <= n; i++)
        {
                cosx = cosx + cos(cosx);
        }
        printf("cosx: %f\n", cosx);
        return 0;
}

---END OF POST---
From: Duane Rettig
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <4vgjx5m3s.fsf@beta.franz.com>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> I think my career as a potential Lisp programmer is over. It is
> intimidating to see that even commercial products (at least Larry's
> product from Franz Inc.) are way behind what might be possible.
> Where is the famous 80% of C speed (a figure I have often come
> across when searching the net)?

Benchmarking is a black art.

I strongly advise you not to make life-changing decisions based on
benchmarks.  If you do, you will end up bouncing from language to
language, because no language implementation will satisfy a need for
speed 100% of the time, especially one that places correctness at a
higher priority than speed.

For the original poster:

When benchmarking cos functionality, one must realize that it is hard to
get an accurate answer from trig functions.  Sometimes you must work
within a limited range in order to allow hardware versions to be used,
and sometimes the libraries simply don't provide a single-float version
of the function, because it isn't accurate enough.

Below are two examples of speedups in different versions of Allegro CL.
The first takes advantage of the x86 hardware cos instruction, which can
fire if the range of the single-float or double-float is limited to
something less than plus-or-minus (expt 2 63).  The second is on a Sparc,
and makes a foreign call to the cos in the C library (using defforeign,
since def-foreign-call has a performance bug I just discovered and will
fix).  Note also that the foreign call uses double-floats instead of
single-floats, since that is what the library function provides.

The results:  On the x86, the 10x slowdown is gained back almost
completely.  On the Sparc, I doubt that Lisp's slowdown is as much as
10x, since not even the C compiler can do better than the library
call.  However, note that removing the consing results in a 4x speedup.


USER(1): (shell "cat costime.cl")
(in-package :user)

(defun test1 (n)
  (declare (optimize (speed 3) (safety 0))
           (fixnum n))
  (let ((x 0.0s0))
    (declare (single-float x))
    (dotimes (i n)
      (incf x (the single-float (cos x))))
    x))

(defun test2 (n)
  (declare (optimize (speed 3) (safety 0))
           (fixnum n))
  (let ((x 0.0s0))
    (declare (type (single-float 0.0 1.0) x))
    (dotimes (i n)
      (incf x (the (single-float 0.0 1.0) (cos x))))
    x))
0
USER(2): :cl costime
;;; Compiling file costime.cl
;;; Writing fasl file costime.fasl
;;; Fasl write complete
; Fast loading /home/duane/acl62/src/costime.fasl
USER(3): (time (test1 1000000))
; cpu time (non-gc) 910 msec user, 0 msec system
; cpu time (gc)     160 msec user, 0 msec system
; cpu time (total)  1,070 msec user, 0 msec system
; real time  1,078 msec
; space allocation:
;  1 cons cell, 0 symbols, 80,000,048 other bytes, 0 static bytes
1.5707964
USER(4): (time (test2 1000000))
; cpu time (non-gc) 110 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  110 msec user, 0 msec system
; real time  114 msec
; space allocation:
;  1 cons cell, 0 symbols, 48 other bytes, 0 static bytes
1.5707964
USER(5): 

USER(1): (shell "cat costime.cl")
(in-package :user)

(defun test1 (n)
  (declare (optimize (speed 3) (safety 0))
           (fixnum n))
  (let ((x 0.0s0))
    (declare (single-float x))
    (dotimes (i n)
      (incf x (the single-float (cos x))))
    x))

(eval-when (compile load eval)
  (ff:defforeign 'c-cos
      :entry-point "cos"
      :arguments '(double-float)
      :return-type :double-float
      :arg-checking nil
      :call-direct t))

(defun test2 (n)
  (declare (optimize (speed 3) (safety 0))
           (fixnum n))
  (let ((x 0.0d0))
    (declare (double-float x))
    (dotimes (i n)
      (incf x (the double-float (c-cos x))))
    x))
0
USER(2): :cl costime
;;; Compiling file costime.cl
; Fast loading from bundle code/ffcompat.fasl.
;;; Writing fasl file costime.fasl
;;; Fasl write complete
; Fast loading /acl2/duane/acl62/src/costime.fasl
USER(3): (time (test1 1000000))
; cpu time (non-gc) 2,330 msec user, 10 msec system
; cpu time (gc)     400 msec user, 20 msec system
; cpu time (total)  2,730 msec user, 30 msec system
; real time  2,793 msec
; space allocation:
;  1 cons cell, 0 symbols, 80,000,016 other bytes, 0 static bytes
1.5707964
USER(4): (time (test2 1000000))
; cpu time (non-gc) 620 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  620 msec user, 0 msec system
; real time  623 msec
; space allocation:
;  1 cons cell, 0 symbols, 24 other bytes, 0 static bytes
1.5707963267948966d0
USER(5): 

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Christophe Rhodes
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <sqzo996wkz.fsf@lambda.jesus.cam.ac.uk>
Duane Rettig <·····@franz.com> writes:

> (defun test2 (n)
>   (declare (optimize (speed 3) (safety 0))
>            (fixnum n))
>   (let ((x 0.0s0))
>     (declare (type (single-float 0.0 1.0) x))
                     ^^^^^^^^^^^^^^^^^^^^^^
>     (dotimes (i n)
>       (incf x (the (single-float 0.0 1.0) (cos x))))
>     x))

A pedantic point perhaps, but the underlined declaration isn't true if
n > 1. Of course, since it's highly unlikely that any compiler will
generate different code for this declaration than for the correct
(single-float 0.0 2.0), you've gotten away with it :)

Cheers,

Christophe
-- 
Jesus College, Cambridge, CB5 8BL                           +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/                  (defun pling-dollar 
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)
From: Duane Rettig
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <4n1595gzr.fsf@beta.franz.com>
Christophe Rhodes <·····@cam.ac.uk> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > (defun test2 (n)
> >   (declare (optimize (speed 3) (safety 0))
> >            (fixnum n))
> >   (let ((x 0.0s0))
> >     (declare (type (single-float 0.0 1.0) x))
>                      ^^^^^^^^^^^^^^^^^^^^^^
> >     (dotimes (i n)
> >       (incf x (the (single-float 0.0 1.0) (cos x))))
> >     x))
> 
> A pedantic point perhaps, but the underlined declaration isn't true if
> n > 1. Of course, since it's highly unlikely that any compiler will
> generate different code for this declaration than for the correct
> (single-float 0.0 2.0), you've gotten away with it :)

Ah, yes, of course.  Thanks.  I forgot about the addition within
the series.

Of course, any declaration within the range of the type
'(single-float #.(- (expt 2.0 63)) #.(expt 2.0 63))
will do, since that is the range (as I recall) which the x86 allows
for accurate answers.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Pekka P. Pirinen
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <upua4t21v.fsf@globalgraphics.com>
Christophe Rhodes <·····@cam.ac.uk> writes:
> Duane Rettig <·····@franz.com> writes:
> > (defun test2 (n)
> >   (declare (optimize (speed 3) (safety 0))
> >            (fixnum n))
> >   (let ((x 0.0s0))
> >     (declare (type (single-float 0.0 1.0) x))
>                      ^^^^^^^^^^^^^^^^^^^^^^
> >     (dotimes (i n)
> >       (incf x (the (single-float 0.0 1.0) (cos x))))
> >     x))
> 
> A pedantic point perhaps, but the underlined declaration isn't true if
> n > 1. Of course, since it's highly unlikely that any compiler will
> generate different code for this declaration than for the correct
> (single-float 0.0 2.0), you've gotten away with it :)

Another very pedantic point, definitely, but the other declaration
isn't true either, if n > 4, since the sum might overshoot (/ pi 2),
in which case (cos x) goes negative.  Of course, since it's highly
unlikely that any compiler will generate different code for this
declaration than for the correct (single-float -1.0 1.0), you've
gotten away with that too. :)

Mathematically, the sum would never overshoot pi/2 and cos(x) would
never go negative: since |cos'(t)| = |sin(t)| <= 1 and cos(pi/2) = 0,
we have cos(x) <= pi/2 - x for 0 <= x <= pi/2, and so x + cos(x) <= pi/2.
But floating-point arithmetic being what it is, a rounding error might
push it just over.  In fact, with LW single-floats (24 bits of
precision): (cos (coerce (/ pi 2) 'single-float)) => -4.3711388E-8.

And yet another pedantic point: 0.0s0 is a SHORT-FLOAT; you meant
0.0f0.  Some compilers might generate interestingly incorrect code
from this one.
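A quick check at the listener, for the curious (bearing in mind that
many implementations simply make SHORT-FLOAT and SINGLE-FLOAT the same
type, so the distinction can be invisible):

  (type-of 0.0s0)  ; => SHORT-FLOAT, or SINGLE-FLOAT where the two coincide
  (type-of 0.0f0)  ; => SINGLE-FLOAT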
-- 
Pekka P. Pirinen
The triumph of cloning tech: A sheep that looks just like another sheep.
From: larry
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <7b8f89d6.0108091658.20ada0b7@posting.google.com>
> (defun test2 (n)
>   (declare (optimize (speed 3) (safety 0))
>            (fixnum n))
>   (let ((x 0.0s0))
>     (declare (type (single-float 0.0 1.0) x))
>     (dotimes (i n)
>       (incf x (the (single-float 0.0 1.0) (cos x))))
>     x))

Ahh, so this is how you do cons-free floating point arithmetic.
Great!

But can you explain how (single-float 0.0 1.0) stops the consing?
In particular, what are the 0.0 and 1.0 for?  Do they specify the
range of x and (cos x)?
From: Duane Rettig
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <4elqkxw7f.fsf@beta.franz.com>
··········@hotmail.com (larry) writes:

> > (defun test2 (n)
> >   (declare (optimize (speed 3) (safety 0))
> >            (fixnum n))
> >   (let ((x 0.0s0))
> >     (declare (type (single-float 0.0 1.0) x))
> >     (dotimes (i n)
> >       (incf x (the (single-float 0.0 1.0) (cos x))))
> >     x))
> 
> Ahh, so this is how you do cons free floating point arithmetic.
> Great!

Careful; bear in mind that this only works on x86-based systems
(and on the old 68k systems, though we no longer have any products
for them) because their floating-point architecture includes a cos
instruction (as well as sin and tan instructions).

> But can you explain how (single-float 0.0 1.0) stops the consing.
> In particular what is the 0.0 and 1.0 for. Do they specify the
> range of x and cos x?

Yes, although it was pointed out to me that the range should have been
(single-float 0.0 2.0) instead.

The consing is stopped by having the compiler use the instruction on
unboxed floats and then not box up the results.  By keeping the
floating-point variables unboxed, no boxes are needed for the
intermediate calculations.

However, there is a point where the compiler doesn't want to use the
instruction.  I don't have my Pentium manual handy, but the maximum
range for which any of these three instructions can accurately
calculate the result is plus or minus (expt 2 63).  Now consider
that the type 'single-float is equivalent to the type (single-float * *),
which is _not_ a subtype of
 '(single-float #.(- (expt 2.0 63)) #.(expt 2.0 63)),
whereas both '(single-float 0.0 1.0) and '(single-float 0.0 2.0)
are subtypes.  So when the compiler can assume one of the latter two
types, it is free to use the instruction.
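A quick way to see this at the listener (illustration only; the bound
below is just (expt 2.0 63) written out as a single-float):

  (subtypep 'single-float '(single-float -9.223372e18 9.223372e18))
  ;; => NIL, T -- the unbounded type can exceed the hardware-safe range
  (subtypep '(single-float 0.0 2.0) '(single-float -9.223372e18 9.223372e18))
  ;; => T, T -- so the compiler may use the hardware instruction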

What happens when the instruction is not used?  Well, the float
variable is boxed up for passing as an argument to the COS Lisp
function, which normalizes the range of the float, converts it to a
double-float (because that's what the library function wants), then
calls the lower-level library functionality and conses the result
after converting it back to a single-float.  Thus it conses more all
the way around.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Raymond Toy
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <4nlmkodr1i.fsf@rtp.ericsson.se>
>>>>> "Duane" == Duane Rettig <·····@franz.com> writes:

    Duane> However there is a point where the compiler doesn't want to use the
    Duane> instruction.  I don't have my Pentium manual handy, but the maximum
    Duane> range for which any of these three instructions can accurately
    Duane> calculate the result is plus or minus (expt 2 63).  Now consider

I think that's the right limit.  However, "accurately" is debatable.
I think the Pentium only has a 66-bit value of pi, so after argument
reduction of something near 2^63 you essentially only know which octant
the angle is in, and that's the result you get: one of 8 possible values.

Kahan argues in one of his online papers that cos(2^120) should return
-0.9258790228548379d0 and not some other value.  Sun's libm does this
because it uses a thousand or so bits to represent pi.

    Duane> What happens when the instruction is not used?  Well, the float
    Duane> variable is boxed up for passing as an argument to the COS
    Duane> lisp function, which normalizes the range of the float, converts it
    Duane> to double float (because that's what the library function wants)
    Duane> and then calls lower level library functionality and consing the
    Duane> result after converting it back to single-float.  Thus it conses
    Duane> more all the way around.

If this is important, you might be able to use CMUCL, which doesn't
cons when calling cos because the compiler knows it can call the C cos
function directly. :-)

Ray
From: Christophe Rhodes
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <sqvgjx1vrx.fsf@lambda.jesus.cam.ac.uk>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> larry wrote:
> 
> 
> > The lisp version is maybe 10
> > times slower than the C version.
> 
> > 
> > (defun test(n)
> >   (declare (optimize (speed 3) (safety 0))
> >            (fixnum n))
> >   (let ((x 0.0s0))
> >     (declare (single-float x))
> >     (dotimes (i n)
> >       (incf x (the single-float (cos x))))
> >     x))
> 
> 
> I think my career as a potential Lisp programmer is over. It is
> intimidating to see that even commercial products (at least Larry's
> product from Franz Inc.) are way behind what might be possible.
> Where is the famous 80% of C speed (a figure I have often come
> across when searching the net)?

Why?

For goodness' sake, just rewrite the function to use a different
algorithm, if this is really the speed-critical part of your
application:

(defun test (n)
  ;; no need for declarations, really
  (case n
    (0 0.0s0)
    (1 1.0s0)
    (2 1.5403023s0)
    (3 1.5707916s0)
    (t 1.5707964s0)))

There. I hope that's fast enough for you.

Christophe
-- 
Jesus College, Cambridge, CB5 8BL                           +44 1223 510 299
http://www-jcsu.jesus.cam.ac.uk/~csr21/                  (defun pling-dollar 
(str schar arg) (first (last +))) (make-dispatch-macro-character #\! t)
(set-dispatch-macro-character #\! #\$ #'pling-dollar)
From: Siegfried Gonzi
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <3B724DF9.4FD63551@kfunigraz.ac.at>
Christophe Rhodes wrote:


> For goodness' sake, just rewrite the function to use a different
> algorithm, if this is really the speed-critical part of your
> application:


And this, in my opinion, is the problem. I know there are always
better algorithms that suit a particular programming language better
(Forth people always say: do not translate your code directly from C
to Forth, it can become awful).

On the other hand, it is not obvious to me why I should give
strictness information (sorry, that is probably not the right term in
the Lisp community; I mean type declarations) and then have it not
pay off.
 
> (defun test (n)
>   ;; no need for declarations, really
>   (case n
>     (0 0.0s0)
>     (1 1.0s0)
>     (2 1.5403023s0)
>     (3 1.5707916s0)
>     (t 1.5707964s0)))
> 
> There. I hope that's fast enough for you.


As I saw in your post from yesterday, it is certainly possible for CL
to be fast on Unix, no question (though the Pseudoknot benchmark
results should at least be updated).


S. Gonzi
From: Ole Rohne
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <ebwk80dfumw.fsf@lxplus042.cern.ch>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:
> larry wrote:
> 
> > The lisp version is maybe 10
> > times slower than the C version.
> 
> You are not alone: on my Mac, too, the C version is 10 times faster
> than the version compiled with PowerLisp 2.2.

n		gcc		cmucl
10      	0.0		0.0	;; negligible C startup disadvantage
100000000       12.8            11.9    ;; C speed within 90% of Lisp

You guys just need better compilers :-)

To be fair: inspecting the assembly code shows that gcc doesn't inline
cos. Does anyone know whether it could, and how to make it do that?


	Regards,
        Ole


/* test.c */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv)
{
  float x = 0.0;
  int i = atoi(argv[1]);
  for (; i; i--) x += cos(x);
  printf("%10.7f\n", x);
  return 0;
}

$ uname -smrp
Linux 2.2.19-6.2.1.1smp i686 unknown
$ CC=gcc LOADLIBES=-lm CFLAGS=-O9 make test
$ time -p ./test 10     
 1.5707964
real 0.03
user 0.00
sys 0.02
$ time -p ./test 100000000
 1.5707964
real 12.79
user 12.70
sys 0.02

;;; test.lisp
(defun test (n)
  (declare (type fixnum n))
  (declare (optimize (speed 3) (space 0) (safety 0)))
  (let ((x 0.0d0))
    (declare (type (double-float 0.0d0 2.0d0) x)) ;; Eliminate range check
    (dotimes (i n)
      (incf x (cos x)))
    x))

CMU Common Lisp 18c, running on lxplus042
* (compile-file "test" :load t)
* (time (test 10))
Evaluation took:
  0.0 seconds of real time
  0.0 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.
1.5707963267948966d0
* (time (test 100000000))
Evaluation took:
  11.88 seconds of real time
  11.34 seconds of user run time
  0.06 seconds of system run time
  0 page faults and
  0 bytes consed.
1.5707963267948966d0
From: Raymond Toy
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <4n4rrhjsau.fsf@rtp.ericsson.se>
>>>>> "Ole" == Ole Rohne <·········@cern.ch> writes:

    Ole> Siegfried Gonzi <···············@kfunigraz.ac.at> writes:
    >> larry wrote:
    >> 
    >> > The lisp version is maybe 10
    >> > times slower than the C version.
    >> 
    >> You are not alone: on my Mac, too, the C version is 10 times faster
    >> than the version compiled with PowerLisp 2.2.

    Ole> n		gcc		cmucl
    Ole> 10      	0.0		0.0	;; negligible C startup disadvantage
    Ole> 100000000       12.8            11.9    ;; C speed within 90% of Lisp

    Ole> You guys just need better compilers:-)

    Ole> To be fair: Inspecting the assembly code shows that gcc doesn't inline
    Ole> cos. Anyone knows if it could? - and how to make it do that?

Don't know how to make gcc inline cos by itself, but gcc supports a
pretty nice inline assembly feature, so you could do something like

        asm ("fsincos" : "=t" (c), "=u" (s) : "0" (x));

to inline it (leaving the cosine in c and the sine in s).  I used to
know some of the complicated inline asm syntax, but have forgotten it.

Ray
From: Siegfried Gonzi
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <3B725F29.378AA78E@kfunigraz.ac.at>
Ole Rohne wrote:

 
> n               gcc             cmucl
> 10              0.0             0.0     ;; negligible C startup disadvantage
> 100000000       12.8            11.9    ;; C speed within 90% of Lisp
> 
> You guys just need better compilers:-)

Do you have any figures on whether CMUCL is busy with garbage
collection during the computation?

As I wrote, in Clean there is absolutely no garbage collection (0.0 sec),
but PowerLisp makes extensive use of it. Maybe this is also the case
with the Lisp Larry is using.

I saw that your code includes (space 0). Could it be that this reduces
garbage collection?


S. Gonzi
From: Duncan Harvey
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <1exvh9f.1p7onhk5t7uv4N%spam@hubris2.demon.co.uk>
Siegfried Gonzi <···············@kfunigraz.ac.at> wrote:

> As I wrote, in Clean there is absolutely no garbage collection (0.0 sec),
> but PowerLisp makes extensive use of it.

At the risk of upsetting Roger Corman, I'd tentatively suggest that
PowerLisp's compiler is unlikely to be as mature as other
implementations, simply because it hasn't existed/been maintained for as
long.

> I saw that your code includes (space 0). Could it be that this reduces
> garbage-collection?

Probably not in the way you may be expecting -- the above space
declaration tells the compiler that you don't care about the size of the
emitted code.  Increasing its importance, to say (space 3), tells the
compiler to try to make the code as small as possible at the expense of
other factors (e.g. speed and/or debuggability).
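
For reference, a rough sketch of the standard OPTIMIZE qualities (each
takes a value from 0 to 3; their precise effect is implementation-dependent):

  (declaim (optimize (speed 3)               ; favour fast code
                     (safety 0)              ; drop run-time checks
                     (space 0)               ; code size doesn't matter
                     (debug 0)               ; minimal debugging support
                     (compilation-speed 0))) ; compile time doesn't matter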

-- 
Duncan Harvey
"Smiling and waving. Before letting himself fall."
                       -- Anja Garbarek, The Diver
From: Raymond Wiker
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <86wv4da5ae.fsf@raw.grenland.fast.no>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> Ole Rohne wrote:
> 
>  
> > n               gcc             cmucl
> > 10              0.0             0.0     ;; negligible C startup disadvantage
> > 100000000       12.8            11.9    ;; C speed within 90% of Lisp
> > 
> > You guys just need better compilers:-)
> 
> Do you have any figures whether CMUCL is busy with garbage-collection
> during the computation?

        If you look closer at the article you quoted, you'll see a
line saying something like "0 bytes consed". You may take this as an
indication that there was no GC activity :-)

> As I wrote, in Clean there is absolutely no garbage collection (0.0 sec),
> but PowerLisp makes extensive use of it. Maybe this is also the case
> with the Lisp Larry is using.

        Most Lisp systems report GC activity as part of the output
from (time).

> I saw that your code includes (space 0). Could it be that this reduces
> garbage-collection?

        (space 0) was used as part of a (declare (optimize ...)). It
tells the compiler that the size of the compiled code is completely
irrelevant.

-- 
Raymond Wiker
·············@fast.no
From: Siegfried Gonzi
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <3B726ED9.A042AF42@kfunigraz.ac.at>
Raymond Wiker wrote:

 
>         If you look closer at the article you quoted, you'll see a
> line saying something like "0 bytes consed". You may take this as an
> indication that there was no GC activity :-)


Maybe Larry should conduct a benchmark with arrays. If his Lisp version
also performed badly there, I think he would have a really big problem
with his compiler if he wants to use his Lisp for scientific
computations (we are not talking about a factor of 2 or so; we are
talking about a factor of 10, and it is a pity that the factor is
multiplicative rather than additive).

Often micro-benchmarks (like Larry's) show you only one side of the
coin. But when it comes to arrays, the situation often changes.
Functional programming languages often face the problem that they
cannot update arrays destructively; this slows down execution
enormously. In Clean it is possible to make destructive updates. This
is debatable, because some think destructive updates are not really
"functional".

I am always speaking from the numerical side here. Does anybody have
array benchmarks in Lisp?


S. Gonzi
From: Raymond Toy
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <4nvgjwgk1d.fsf@rtp.ericsson.se>
>>>>> "Siegfried" == Siegfried Gonzi <···············@kfunigraz.ac.at> writes:


    Siegfried> I only always speak from the numerical-side. Does anybody have array
    Siegfried> benchmarks in LISP?

A while ago there was a thread on implementing a DCT.  Lisp was quite
a bit worse than C, mainly because the original Lisp version was using
a 2-D array and not a 2-D simple-array.  Changing to a 1-D simple-array
helped quite a bit, but C was still faster by a fairly large margin.

Perhaps google still has it.
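
The general point, roughly, is that the compiler can only open-code the
array accesses when it knows the array is a specialized SIMPLE-ARRAY.
An illustrative sketch (not the code from that thread):

  (defun sum-squares (v)
    (declare (optimize (speed 3) (safety 0))
             (type (simple-array double-float (*)) v))
    (let ((s 0.0d0))
      (declare (double-float s))
      (dotimes (i (length v) s)
        (incf s (* (aref v i) (aref v i))))))

With those declarations, AREF compiles down to an unboxed memory access
instead of a generic array dispatch.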

Ray
From: Marco Antoniotti
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <y6c3d6zjqu0.fsf@octagon.mrl.nyu.edu>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> Ole Rohne wrote:
> 
>  
> > n               gcc             cmucl
> > 10              0.0             0.0     ;; negligible C startup disadvantage
> > 100000000       12.8            11.9    ;; C speed within 90% of Lisp
> > 
> > You guys just need better compilers:-)
> 
> Do you have any figures whether CMUCL is busy with garbage-collection
> during the computation?
> 
> As I wrote, in Clean there is absolutely no garbage collection (0.0 sec),
> but PowerLisp makes extensive use of it. Maybe this is also the case
> with the Lisp Larry is using.
> 
> I saw that your code includes (space 0). Could it be that this reduces
> garbage-collection?

Why do you insist on using a system that has never been completely
supported and developed?  Even on the Mac, I would consider the test
reasonable only if run on MCL. Not PowerLisp.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Shannon Spires
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <svspire-4E506D.20263110082001@news.sandia.gov>
Here's the test run in MCL 4.3.1 on a 500 MHz PowerBook G3:

;  Larry's original function 

(defun test(n)
  (declare (optimize (speed 3) (safety 0))
           (fixnum n))
  (let ((x 0.0s0))
    (declare (single-float x))
    (dotimes (i n)
      (incf x (the single-float (cos x))))
    x))

? (time (test 1000000))
(TEST 1000000) took 1,518 milliseconds (1.518 seconds) to run.
Of that, 20 milliseconds (0.020 seconds) were spent in The Cooperative 
Multitasking Experience.
205 milliseconds (0.205 seconds) was spent in GC.
Net execution time is thus 1293 milliseconds (1.293 seconds).
 32,000,000 bytes of memory allocated.
1.5707963267948966

Ouch! Too much consing.

Now let's rewrite it for less consing:

(defun test2 (n)
  (declare (optimize (speed 3) (safety 0))
           (fixnum n))
  (let ((x 0.0d0))
    (declare (destructive x)) ; just like dynamic-extent except it conses once
    (declare (type double-float x))
    (dotimes (i n)
      (incf x (the double-float (cos x))))
    x))

? (time (test2 1000000))
(TEST2 1000000) took 2,101 milliseconds (2.101 seconds) to run.
Of that, 89 milliseconds (0.089 seconds) were spent in The Cooperative 
Multitasking Experience.
824 milliseconds (0.824 seconds) was spent in GC.
Net execution time is thus 1188 milliseconds (1.188 seconds).
 16,000,104 bytes of memory allocated.
1.5707963267948966

Better. At least x isn't consing now. But now the problem is cos itself.

(defun test3 (n)
  (declare (optimize (speed 3) (safety 0))
           (fixnum n))
  (let ((x 0.0d0)
        (y 0.0d0))
    (declare (destructive x)) ; just like dynamic-extent except it conses once
    (declare (dynamic-extent y))
    (declare (type double-float x y))
    (dotimes (i n)
      (ccl::%double-float-cos! x y) ; an internal function
      (incf x y)
      )
    x))

? (time (test3 1000000))
(TEST3 1000000) took 706 milliseconds (0.706 seconds) to run.
Of that, 9 milliseconds (0.009 seconds) were spent in The Cooperative 
Multitasking Experience.
Net execution time is thus 697 milliseconds (0.697 seconds).
 32 bytes of memory allocated.
1.5707963267948966

Now we've done it. But this last example requires some knowledge of the 
guts of MCL.

These examples didn't show it, but MCL can also deal with 1d and 2d 
arrays of floats without consing.

Shannon Spires
svspire-at-nmia-dot-com
From: Takehiko Abe
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <keke-1208010041470001@solg4.keke.org>
In article <·····························@news.sandia.gov>, 
Shannon Spires wrote:

> ? (time (test3 1000000))
> (TEST3 1000000) took 706 milliseconds (0.706 seconds) to run.
> Of that, 9 milliseconds (0.009 seconds) were spent in The Cooperative 
> Multitasking Experience.
> Net execution time is thus 697 milliseconds (0.697 seconds).
>  32 bytes of memory allocated.
> 1.5707963267948966
> 

On PPC G4 350mhz with MCL4.3.1, I got:

? (time (test3 1000000))
(TEST3 1000000) took 971 milliseconds (0.971 seconds) to run.
Of that, 6 milliseconds (0.006 seconds) were spent in The Cooperative 
Multitasking Experience.
 56 bytes of memory allocated.          ; hmmm
1.5707963267948966
? 

This is impressive, but all the declarations seem a bit like voodoo to me.
I would still take the C library route.

Here's my C version [compiled into shared library with MPW.]
-------
#include <math.h>

extern double temp_test(int n);

extern double temp_test(int n)
{
   double x=0.0;
   
   while (n) {
      x += cos(x);
      n--;
   }

   return x;
}
--------
Now, back to MCL,

(define-entry-point ("temp_test" ("temp_test"))
  ((:n :long))
  :double-float)

? (time (temp_test 1000000))
(TEMP_TEST 1000000) took 244 milliseconds (0.244 seconds) to run.
 32 bytes of memory allocated.
1.5707963267948966
? 

But I guess you can beat this easily with your Altivec LAP.


regards,
abe
<keke at mac com>
From: Takehiko Abe
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <keke-1008010022570001@solg4.keke.org>
In article <··································@4ax.com>,
Fernando wrote:

> [...] However, it
> does raise an interesting question: how do you do fast, cons-free floating
> point calculations in Lisp?  This isn't the kind of thing I usually do, so
> it doesn't really worry me, but I'm curious.

I'd write the portion of code in C and ff call from Lisp.

regards,
abe
-- 
<keke at mac com>
From: Kent M Pitman
Subject: Re: newbie question: fast floating point computations
Date: 
Message-ID: <sfwk80dz04v.fsf@world.std.com>
····@mac.spam (Takehiko Abe) writes:

> 
> In article <··································@4ax.com>,
> Fernando wrote:
> 
> > [...] However, it
> > does raise an interesting question: how do you do fast, cons-free floating
> > point calculations in Lisp?  This isn't the kind of thing I usually do, so
> > it doesn't really worry me, but I'm curious.
> 
> I'd write the portion of code in C and ff call from Lisp.

Nothing wrong with this approach.

I don't think there is anything fundamentally wrong with the Lisp paradigm,
but I do think the language definition as presently spec'd is short on
normative hints about how you should write code so that it will be compiled
well, and so writing portably efficient code is quite hard.  Perhaps this
will be fixed in the future.  In the near term, your best options are tight
cooperation with a specific vendor of choice to find out what they expect
(and maybe also what secret operations/modes they have, not in the standard,
that can help you), or else foreign function calls (which means you can
write a C library for certain rote tasks and then call it from most
implementations, since almost everyone has a foreign function interface --
though, like "flammable vs. inflammable", some call it a native function
interface).