Hi.
I have been challenged to "port" to CL the following (for now
quite inaccurate) approximation of pi/2. Depending on the
speed outcome, CL may get considered for the project that
will be heavy on numeric computations.
C++ code
#include <iostream>
#include <cmath>
#include <ctime>
static double piOver2() {
    double sum = std::exp(-std::pow(-10.0, 2.0)) ;
    sum += std::exp(-std::pow(10.0, 2)) ;
    for (double x = -9.99 ; x < 10.0 ; x += 0.01) {
        double v = std::exp(-std::pow(x, 2.0)) ;
        sum += v + v ;
    }
    return sum * 0.005 ;
}
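For what it's worth, the loop is just the composite trapezoidal rule with f(x) = e^(-x^2) and step h = 0.01: the endpoints are weighted once, the interior points twice, and the trailing factor is h/2 = 0.005:

    integral[-10, 10] f(x) dx  ~=  (h/2) * (f(-10) + 2 * sum_i f(x_i) + f(10)),  h = 0.01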
10,000 iterations take about 3s.
I then went through to java (replacing "std::" with "Math.")
and this takes about 9s.
I finally coded the Lisp version
(defun pi/2 ()
  (do ((sum (+ (exp (- (expt -10.0 2.0)))
               (exp (- (expt 10.0 2.0)))))
       (x -9.99 (+ x 0.01)))
      ((>= x 10.0) (* sum 0.005))
    (incf sum (* 2 (exp (- (expt x 2.0)))))))
But I just can't tell how long 10,000 iterations would
take because I lost patience after 2 minutes.
So I started inserting a few (declare) and (the float)
here and there:
(defun pi/2-opt ()
  (declare (optimize (safety 0) (debug 0) (speed 3)))
  (declare (float sum))
  (declare (float x))
  (do ((sum (+ (exp (- (expt -10.0 2.0)))
               (exp (- (expt 10.0 2.0)))))
       (x -9.99 (+ x 0.01)))
      ((>= x 10.0) (* sum 0.005))
    (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))
CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt)))
Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))
user time = 97.631
system time = 2.084
Elapsed time = 0:02:05
Allocation = 4877356064 bytes
0 Page faults
Calls to %EVAL 120050
NIL
At nearly 30 times slower, I just don't stand a chance.
I can probably justify the high-level, DSL, and other Lisp features
at around a 5x or 10x speed penalty, but not at 30x.
What did I misunderstand about the use of (declare (optimize ...))
and (the float ...), or anything else that might affect the
speed so adversely?
Just in case this isn't clear, the point is not about getting
pi/2 but getting numeric speed for computations of complexity
similar to that used in this approximation of pi/2.
Many thanks
--
JFB
verec wrote:
> [...]
> (defun pi/2-opt ()
>   (declare (optimize (safety 0) (debug 0) (speed 3)))
>   (declare (float sum))
>   (declare (float x))
>   (do ((sum (+ (exp (- (expt -10.0 2.0)))
>                (exp (- (expt 10.0 2.0)))))
>        (x -9.99 (+ x 0.01)))
>       ((>= x 10.0) (* sum 0.005))
>     (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))
>
> CL-USER 54 > (time (dotimes (i 10000) (pi/2-opt)))
> Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))
>
> user time = 97.631
> system time = 2.084
> Elapsed time = 0:02:05
> Allocation = 4877356064 bytes
> 0 Page faults
> Calls to %EVAL 120050
> NIL
> [...]
1. Note that the call to TIME also tells you that your function was
interpreted, not compiled.
2. I pasted your code in SBCL's REPL. Here's what I got:
; caught WARNING:
; These variables are undefined:
; SUM X
The declares must be in the scope of the declared variables, like so:
(read the standard to know where to put your declarations)
(defun pi/2-opt ()
  (declare (optimize (safety 0) (debug 0) (speed 3)))
  (do ((sum (+ (exp (- (expt -10.0 2.0)))
               (exp (- (expt 10.0 2.0)))))
       (x -9.99 (+ x 0.01)))
      ((>= x 10.0) (* sum 0.005))
    (declare (type single-float x sum))
    (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))
10000 iterations take ~2.9 seconds on a P-M 1.6 GHz.
3. I don't know which compiler you intend to use, but Python (the
name of the CMUCL/SBCL compiler, not the language) is a bit
moronic when it comes to side-effected variables. Here's a
side-effect-free version that gives the same answer (but as doubles,
not single floats). In this case, 10000 iterations also take ~2.9
seconds, but I prefer this version :)
(defun pi/2 ()
  (declare (optimize (speed 3) (safety 0)))
  (labels ((inner (x acc)
             (declare (type double-float x acc))
             (if (>= x 10d0)
                 acc
                 (inner (+ 0.01d0 x)
                        (+ acc (* 2 (exp (- (expt x 2)))))))))
    (* 0.005d0 (inner -9.99d0
                      (+ (exp (- (expt -10.0d0 2)))
                         (exp (- (expt 10.0d0 2))))))))
4. It is sometimes useful to compile different versions and then
examine their assembly with DISASSEMBLE. For heavily numeric code, one
doesn't have to understand the innards of the CL implementation, only
normal x86 + x87.
Good Luck,
Paul Khuong
On 2006-08-12 20:33:51 +0100, ·······@gmail.com said:
>> [...]
> 1. Note that the call to TIME also tells you that your function was
> interpreted, not compiled.
CL-USER 58 > (compile 'pi/2-opt)
;;;*** Warning in PI/2-OPT: The definition of PI/2-OPT is already compiled.
PI/2-OPT
((PI/2-OPT #<STYLE-WARNING 100B2BDF>))
NIL
It already was compiled :-(
> 2. I pasted your code in SBCL's REPL. Here's what I got:
> ; caught WARNING:
> ; These variables are undefined:
> ; SUM X
> The declares must be in the scope of the declared variables, like so:
>
> (read the standard to know where to put your declarations)
>
> (defun pi/2-opt ()
>   (declare (optimize (safety 0) (debug 0) (speed 3)))
>   (do ((sum (+ (exp (- (expt -10.0 2.0)))
>                (exp (- (expt 10.0 2.0)))))
>        (x -9.99 (+ x 0.01)))
>       ((>= x 10.0) (* sum 0.005))
>     (declare (type single-float x sum))
>     (incf sum (the float (* 2 (exp (- (expt x 2.0))))))))
Excellent. Thanks. I tried.
CL-USER 59 > (pi/2-opt)
1.3412517E-31
Oops, definitely incorrect. So I rewrote the declare part
as:
(declare (type float x sum))
CL-USER 60 > (pi/2-opt)
#C(1.7724538 -7.747651E-8)
OK. Back in the game :-)
CL-USER 61 > (time (dotimes (i 10000) (pi/2-opt)))
Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))
user time = 99.182
:-(
>
> 10000 iterations take ~2.9 seconds on a P-M 1.6 GHz.
>
> 3. I don't know which compiler you intend to use, but python is a bit
> moronic when it comes to side-effected variables. Here's a
> side-effect-free version that gives the same answer (but as doubles,
> not single floats). In this case, 10000 iterations also take ~2.9
> seconds, but I prefer this version :)
>
> (defun pi/2 ()
>   (declare (optimize (speed 3) (safety 0)))
>   (labels ((inner (x acc)
>              (declare (type double-float x acc))
>              (if (>= x 10d0)
>                  acc
>                  (inner (+ 0.01d0 x)
>                         (+ acc (* 2 (exp (- (expt x 2)))))))))
>     (* 0.005d0 (inner -9.99d0
>                       (+ (exp (- (expt -10.0d0 2)))
>                          (exp (- (expt 10.0d0 2))))))))
CL-USER 63 > (pi/2-PK)
1.7724538905229454D0
Great. Correct too.
CL-USER 64 > (time (dotimes (i 10000) (pi/2-PK)))
Timing the evaluation of (DOTIMES (I 10000) (PI/2-PK))
user time = 43.500
Your version definitely halves the time required!
That's excellent, but nowhere near the speed you
quote for your compiler/OS combo.
> 4. It is sometimes useful to compile different versions and then
> examine their assembly with DISASSEMBLE. For heavily numeric code, one
> doesn't have to understand the innards of the CL implementation, only
> normal x86 + x87.
I had tried to (disassemble 'pi/2-opt) but nearly fainted at the
sight of 104 lines of assembly :-(
> Good Luck,
Thank you very much! The lesson here for me is that I should
first try another compiler, as I just can't see why a copy and
paste of your version would run 15 times slower on my setup.
(LW 4.4.6 on a G4 @1.5GHz ... the fact that the disassembly shows
104 lines is probably a good hint that no optimization is applied
at all in the PE of LW?)
> Paul Khuong
--
JFB
verec <·····@mac.com> writes:
> On 2006-08-12 20:33:51 +0100, ·······@gmail.com said:
> > 1. Note that the call to TIME also tells you that your function was
> > interpreted, not compiled.
>
> CL-USER 58 > (compile 'pi/2-opt)
> ;;;*** Warning in PI/2-OPT: The definition of PI/2-OPT is already compiled.
> PI/2-OPT
> ((PI/2-OPT #<STYLE-WARNING 100B2BDF>))
> NIL
>
> It already was compiled :-(
The function PI/2-OPT may have been compiled, but what you tested with
TIME was the following:
(time (dotimes (i 10000) (pi/2-opt)))
and the DOTIMES loop is interpreted, not compiled. What I usually end
up doing to get around that is to write some simple test function like
so:
(defun test (n) (dotimes (i n) (pi/2-opt)))
and then make sure TEST is compiled. With dynamic linking, it also
means that subsequent changes to the tested function can be timed
without needing to recompile the TEST function.
(time (test 10000))
--
Thomas A. Russ, USC/Information Sciences Institute
Thomas A. Russ <···@sevak.isi.edu> wrote:
+---------------
| verec <·····@mac.com> writes:
| > On 2006-08-12 20:33:51 +0100, ·······@gmail.com said:
| > > 1. Note that the call to TIME also tells you that your function was
| > > interpreted, not compiled.
| >
| > CL-USER 58 > (compile 'pi/2-opt)
| > ;;;*** Warning in PI/2-OPT: The definition of PI/2-OPT is already compiled.
| > PI/2-OPT
| > ((PI/2-OPT #<STYLE-WARNING 100B2BDF>))
| > NIL
| >
| > It already was compiled :-(
|
| The function PI/2-OPT may have been compiled, but what you tested with
| TIME was the following:
| (time (dotimes (i 10000) (pi/2-opt)))
| and the DOTIMES loop is interpreted, not compiled.
+---------------
Interesting. In CMUCL, TIME always compiles the form to be timed:
cmu> (time (dotimes (i 100000000)))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:
; Evaluation took:
; 0.11f0 seconds of real time
; 0.11f0 seconds of user run time
; 0.0f0 seconds of system run time
; 200,529,433 CPU cycles
; 0 page faults and
; 0 bytes consed.
;
NIL
cmu>
-Rob
-----
Rob Warnock <····@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
Here are some good links on improving numerical computation speed in
Common Lisp:
"On using Common Lisp in scientific computing", by Nicolas Neuss. This
has a pretty good tutorial-level introduction to how to do numerical
optimization in Common Lisp.
http://www.iwr.uni-heidelberg.de/organization/sfb359/PP/Preprint2002-40.ps.gz
"Fast Floating-Point Processing in Common Lisp", by Richard Fateman.
This is a more in-depth treatment.
http://www.cs.berkeley.edu/~fateman/papers/lispfloat.ps
Ken Anderson's Common Lisp Performance page:
http://openmap.bbn.com/~kanderso/performance/
Additionally, here's a listing of math oriented libraries available for
Common Lisp:
http://www.cliki.net/mathematics
Glenn
On 2006-08-12 23:28:59 +0100, ·············@specastro.com said:
> Here are some good links on improving numerical computation speed in
> Common Lisp: [...]
Much appreciated. Thanks!
> Glenn
--
JFB
From: Wade Humeniuk
Subject: Re: help with numeric speed optimization
Date:
Message-ID: <YQqDg.3088$Nz6.726@edtnps82>
On 2006-08-12 22:28:15 +0100, Wade Humeniuk
<··················@telus.net> said:
> This version is better than the last
>
> (defun pi/2-opt ()
>   (declare (optimize (speed 3) (safety 0) (debug 0) (float 0)))
>   (loop with sum of-type double-float
>           = #.(realpart (+ (exp (- (expt -10.0d0 2.0d0)))
>                            (exp (- (expt 10.0d0 2.0d0)))))
>         for x of-type double-float from -9.99d0 below 10.0d0 by 0.01d0
>         do (incf sum (* 2d0 (exp (- (* x x)))))
>         finally (return (* sum 0.005d0))))
>
> CL-USER 1 > (time (dotimes (i 10000) (pi/2-opt)))
> Timing the evaluation of (DOTIMES (I 10000) (PI/2-OPT))
> ; Loading fasl file C:\Program
> Files\Xanalys\LispWorks\lib\4-3-0-0\modules\util\callcoun.fsl
>
> user time = 1.231
> system time = 0.000
> Elapsed time = 0:00:01
> Allocation = 164304 bytes standard / 113113 bytes conses
> 0 Page faults
> Calls to %EVAL 10035
CL-USER 1 > (pi/2-WH)
1.7724538509055174D0
CL-USER 2 > (time (dotimes (i 10000) (pi/2-WH)))
Timing the evaluation of (DOTIMES (I 10000) (PI/2-WH))
...
Elapsed time = 0:00:12
...
CL-USER 4 > (pi/2-WH2)
1.7724538509055174D0
CL-USER 5 > (time (dotimes (i 10000) (pi/2-WH2)))
Timing the evaluation of (DOTIMES (I 10000) (PI/2-WH2))
...
Elapsed time = 0:00:12
...
Thanks Wade!
At 12s for the Lisp version (still with the same LW 4.4.6 PE compiler)
I have no doubt that I now can successfully make my case!
(Though, out of curiosity I'm going to try with SBCL and OpenMCL)
Many thanks
--
JFB
On 2006-08-12 21:27:36 +0100, Wade Humeniuk
<··················@telus.net> said:
> Removed some expt stuff, I think it is still correct.
Hmmm. That's a point I overlooked when copying and pasting
your example (as well as PK's.)
The thing is, both of your versions are about 10 times
faster than what I initially came up with _precisely_
because you have replaced (expt x 2) with (* x x).
When I plug (expt x 2) back into the loop, I get again
the same dismal performance as with my initial version.
with:
do (incf sum (* 2d0 (exp (- (* x x)))))
12s
with:
do (incf sum (* 2d0 (exp (- (expt x 2.0d0)))))
115s!
but with:
do (incf sum (* 2d0 (exp (- (realpart (expt x 2.0d0))))))
88s
Hmmm. Still scratching my head :-(
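A plausible explanation, worth verifying in your implementation: with a
float exponent, (expt x 2.0d0) goes through the general power routine
(roughly e^(y log x)), and since the log of a negative real is complex,
every iteration over negative x may build a complex intermediate; that is
also likely why (the float ...) blew up earlier. With a literal integer
exponent, or with (* x x), everything stays real and can be open-coded.
A quick comparison sketch (function names made up):

```lisp
;; Sketch: compare the generic float-power path against a plain
;; multiply.  Positive bases only, so SQ-EXPT stays real.
(defun sq-expt (x)
  (declare (double-float x) (optimize (speed 3)))
  (expt x 2.0d0))

(defun sq-mul (x)
  (declare (double-float x) (optimize (speed 3)))
  (* x x))

;; (time (loop for x from 1d0 below 1d6 by 1d0 sum (sq-expt x)))
;; (time (loop for x from 1d0 below 1d6 by 1d0 sum (sq-mul x)))
```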
--
JFB
verec <·····@mac.com> writes:
> On 2006-08-12 21:27:36 +0100, Wade Humeniuk
> <··················@telus.net> said:
>
>> Removed some expt stuff, I think it is still correct.
>
> Hmmm. That's a point I overlooked when copying and pasting
> your example (as well as PK's.)
>
> The thing is, both of your versions are about 10 times
> faster than what I initially came up with _precisely_
> because you have replaced (expt x 2) with (* x x)
>
> When I plug (expt x 2) back into the loop, I get again
> the same dismal performance as with my initial version.
You can write a compiler-macro. Unfortunately, not on CL:EXPT since
this is forbidden, but you can do it this way:
(shadow 'expt)
(declaim (inline expt))
(defun expt (base-number power-number)
  (cl:expt base-number power-number))
(define-compiler-macro expt (&whole whole base-number power-number)
  (if (equal 2 power-number)
      (let ((bn (gensym)))
        `(let ((,bn ,base-number)) (* ,bn ,bn)))
      whole))
Then compare:
(disassemble (compile (defun f (x) (expt x 2))))
(disassemble (compile (defun g (x y) (expt x y))))
So there is no need to modify the source; keep (expt x 2).
(You can also write a more sophisticated compiler macro, handling
other constant exponents.)
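For instance, here is a sketch of one such generalization, building on
the shadowed EXPT above: it open-codes small constant positive integer
exponents as repeated multiplication (the 2..8 cutoff is an arbitrary
choice):

```lisp
;; Hypothetical extension of the compiler macro above: expand
;; (expt x n) for a small constant integer n into (* x x ... x).
(define-compiler-macro expt (&whole whole base-number power-number)
  (if (and (integerp power-number) (<= 2 power-number 8))
      (let ((bn (gensym "BASE")))
        `(let ((,bn ,base-number))
           (* ,@(make-list power-number :initial-element bn))))
      whole))
```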
--
__Pascal Bourguignon__ http://www.informatimago.com/
"This statement is false." In Lisp: (defun Q () (eq nil (Q)))
On 2006-08-13 00:08:25 +0100, Pascal Bourguignon <···@informatimago.com> said:
> You can write a compiler-macro. Unfortunately, not on CL:EXPT since
> this is forbidden, but you can do it this way:
>
> (shadow 'expt)
> (declaim (inline expt))
> (defun expt (base-number power-number)
>   (cl:expt base-number power-number))
> (define-compiler-macro expt (&whole whole base-number power-number)
>   (if (equal 2 power-number)
>       (let ((bn (gensym)))
>         `(let ((,bn ,base-number)) (* ,bn ,bn)))
>       whole))
>
> Then compare:
>
> (disassemble (compile (defun f (x) (expt x 2))))
> (disassemble (compile (defun g (x y) (expt x y))))
Cute :-)
Many thanks
--
JFB
On Sat, 12 Aug 2006 20:38:26 +0200, verec <·····@mac.com> wrote:
I get: Error: The value #C(1.1094963563118383E-43 2.7065338742349504E-57)
does not satisfy the type specifier FLOAT.
Perhaps that gives you a hint.
> [...]
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
On 2006-08-12 20:12:06 +0100, "John Thingstad" <··············@chello.no> said:
> On Sat, 12 Aug 2006 20:38:26 +0200, verec <·····@mac.com> wrote:
>
> I get: Error: The value #C(1.1094963563118383E-43 2.7065338742349504E-57 )
> does not satisfy the type specifier FLOAT.
>
> Perhaps that gives you a hint.
Hmmm...
CL-USER 55 > (pi/2-opt)
#C(1.7724538 -7.747651E-8)
CL-USER 56 >
Are type-casts supposed to be implementation dependent?
I thought (erroneously?) that (the float XYZ) would
just drop the imaginary part, precisely because that's
the purpose of a cast? Or should I use (coerce ...) in
addition?
FYI, I'm using LispWorks 4.4.6 PE on a G4.
Many thanks
--
JFB
>> [...]
In article <·······················@news.aaisp.net.uk>,
verec <·····@mac.com> wrote:
> I thought (erroneously?) that (the float XYZ) would
> just drop the imaginary part, precisely because that's
> the purpose of a cast? Or should I use (coerce ... in
> addition?
THE is not a type-cast, it is a declaration that the value of the
expression is known a priori to be of the specified type. It allows the
compiler to perform type dispatching at compile time rather than at run
time, but it doesn't perform any conversion to ensure that the object is
of that type.
So you should write (the float (realpart xyz)).
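To make the distinction concrete, a small sketch using the value from
the session above:

```lisp
;; THE merely asserts a type; REALPART and COERCE actually convert.
(let ((z #C(1.7724538 -7.747651e-8)))
  ;; (the float z) would be wrong here: z is a complex, so the
  ;; consequences are undefined.
  (list (realpart z)                          ; drops the imaginary part
        (coerce (realpart z) 'double-float))) ; then widens to a double
```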
--
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
On 2006-08-12 22:54:06 +0100, Barry Margolin <······@alum.mit.edu> said:
> In article <·······················@news.aaisp.net.uk>,
> verec <·····@mac.com> wrote:
>
>> I thought (erroneously?) that (the float XYZ) would
>> just drop the imaginary part, precisely because that's
>> the purpose of a cast? Or should I use (coerce ... in
>> addition?
>
> THE is not a type-cast, it is a declaration that the value of the
> expression is known a priori to be of the specified type. It allows
> the compiler to perform type dispatching at compile time rather than at
> run time, but it doesn't perform any conversion to ensure that the
> object is of that type.
>
> So you should write (the float (realpart xyz)).
I knew that when in doubt, I should head straight to the CLHS :-)
Many Thanks
--
JFB
On 2006-08-12 15:18:47 -0400, verec <·····@mac.com> said:
> Are type-casts supposed to be implementation dependent?
>
> I thought (erroneously?) that (the float XYZ) would
> just drop the imaginary part, precisely because that's
> the purpose of a cast?
Just to be clear, "the" is *not* a type cast - it is a special form
that acts as a type declaration.
(the float (foo ...)) is a special form telling the
compiler/interpreter that you, the programmer, are certain that the
expression beginning (foo ...) will definitely return a float so the
compiler can treat the return of (foo ...) as a float. If you use this
declaration with an expression that returns something other than a
float you are in undefined territory:
"The consequences are undefined if the values yielded by the form are
not of the type specified by value-type."
(from <http://www.lisp.org/HyperSpec/Body/speope_the.html#the>)
The function cl:float takes a real number and an optional prototype for
float format (e.g., 1.0d0) and returns a float of the same format as
the prototype or a single float if no prototype is supplied:
(float 1.0s0 1.0d0)
=> 1.0d0
(float 33/3 1.0d0)
=> 11.0d0
(float 2)
=> 2.0s0
If you want only the real part of a complex number use the function
cl:realpart:
(realpart #C(1.7724538 -7.747651E-8))
=> 1.7724538
On 2006-08-12 23:07:13 +0100, Raffael Cavallaro
<················@pas-d'espam-s'il-vous-plait-mac.com> said:
> "The consequences are undefined if the values yielded by the form are
> not of the type specified by value-type."
> (from <http://www.lisp.org/HyperSpec/Body/speope_the.html#the>)
>
[...]
> If you want only the real part of a complex number use the function
> cl:realpart:
>
> (realpart #C(1.7724538 -7.747651E-8))
>
> => 1.7724538
Point taken. Many thanks.
--
JFB
On Sat, 12 Aug 2006 19:38:26 +0200, verec wrote:
> static double piOver2() {
> double sum = std::exp(-std::pow(-10.0, 2.0)) ;
> sum += std::exp(-std::pow(10.0, 2)) ;
For the sake of my curiosity, why do you calculate the same
number twice? 10^2 == (-10)^2
> But I just can't tell how long 10,000 iterations would
> take because I lost patience after 2 minutes.
That's very strange indeed. I can confirm it is pretty fast without any
optimization. I'm also working with SBCL. It gets from 5 seconds without
optimization down to 1.1 with speed 3 and everything else set to 0.
Maybe there's a flaw in your compiler. Try it on another implementation.
Curiously,
Nowhere man
--
···········@levallois.eu.org
OpenPGP 0xD9D50D8A
On 2006-08-12 21:49:58 +0100, Pierre THIERRY
<···········@levallois.eu.org> said:
> On Sat, 12 Aug 2006 19:38:26 +0100, verec wrote:
>> static double piOver2() {
>> double sum = std::exp(-std::pow(-10.0, 2.0)) ;
>> sum += std::exp(-std::pow(10.0, 2)) ;
>
> For the sake of my curiosity, why do you calculate twice the same
> number? 10^2 == (-10)^2
Well spotted! I guess this is just a mindless translation, on my
part, of the math formula

    pi/2 ~= integral[-10, 10] e^(-x^2) dx
:-(
>> But I just can't tell how long 10,000 iterations would
>> take because I lost patience after 2 minutes.
>
> That's very strange indeed. I can confirm it is pretty fast without any
> optimization. I'm also working with SBCL. It gets from 5 seconds without
> optimization down to 1.1 with speed 3 and anything else to 0.
>
> Maybe there's a flaw in your compiler. Try it on another implementation.
Still on the case :-)
I'll report in due course.
> Curiously,
> Nowhere man
--
JFB
On 9565 day of my life ·····@mac.com wrote:
> On 2006-08-12 21:49:58 +0100, Pierre THIERRY
>>> static double piOver2() {
>>> double sum = std::exp(-std::pow(-10.0, 2.0)) ;
>>> sum += std::exp(-std::pow(10.0, 2)) ;
>> For the sake of my curiosity, why do you calculate twice the same
>> number? 10^2 == (-10)^2
>
> Well spotted!
Well, any modern compiler will calculate both values at compile time.
And this "-" will help other people to understand your code.
--
Ivan Boldyrev
| recursion, n:
| See recursion
verec <·····@mac.com> writes:
> I have been challenged to "port" to CL the following (for now
> quite inaccurate) approximation of pi/2. Depending on the
> speed outcome, CL may get considered for the project that
> will be havy on mumeric computations.
I get consistently *faster* run times with SBCL 0.9.14.30 than with
GCC 4.0.4 using plain C, not C++. (There's no reason C++ would be
faster, is there?)
100,000 iterations on a 2GHz x86:
SBCL
17.729 seconds of real time
17.713108 seconds of user run time
0.004 seconds of system run time
0 page faults and
0 bytes consed.
GCC
real 0m19.948s
user 0m19.861s
sys 0m0.004s
On Sun, 13 Aug 2006 13:50:12 +0200, Lars Brinkhoff wrote:
> I get consistently *faster* run times with SBCL 0.9.14.30 than with
> GCC 4.0.4 using plain C, not C++. (There's no reason C++ would be
> faster, is there?)
How exactly did you instrument the C/C++ code to time it? As I didn't
want to take much time on it, I just compiled the given code with a main
doing 10,000 iterations with a for loop, and timed it with my shell, but
this includes spawning and shutting down the process...
Ha, wait. Don't mind. I could just change the number of iterations and
then easily calculate the impact it really has.
OK, I'm also significantly faster in Lisp... It took 1.7s even with -O3
for the C++ version (ten times with ten times iterations, so spawning
the process is quite lightweight, in fact), and 1.1s in Lisp for the
following:
(defun pi/2-opt ()
  (declare (optimize (debug 0) (space 0) (safety 0) (speed 3)))
  (do ((x -9.98 (+ x 0.01))
       (v 0.0 (exp (- (expt x 2.0))))
       (sum (* 2 (exp (- (expt 10.0 2.0)))) (+ sum (* 2 v))))
      ((>= x 10.0) (* 0.005 sum))
    (declare (type float x v sum))))
I'm impressed...
Quickly,
Nowhere man
--
···········@levallois.eu.org
OpenPGP 0xD9D50D8A
On 2006-08-13 13:51:43 +0100, Pierre THIERRY
<···········@levallois.eu.org> said:
> (defun pi/2-opt ()
>   (declare (optimize (debug 0) (space 0) (safety 0) (speed 3)))
>   (do ((x -9.98 (+ x 0.01))
>        (v 0.0 (exp (- (expt x 2.0))))
>        (sum (* 2 (exp (- (expt 10.0 2.0)))) (+ sum (* 2 v))))
>       ((>= x 10.0) (* 0.005 sum))
>     (declare (type float x v sum))))
>
> I'm impressed...
So am I :-)
I tested this version with sbcl 0.9.15 and indeed:
* (time (dotimes (i 10000) (pi/2-optPT)))
Evaluation took:
4.413 seconds of real time
3.753417 seconds of user run time
0.036023 seconds of system run time
0 page faults and
81,920 bytes consed.
NIL
I'm amazed at the difference a compiler makes...
Many thanks to all, for all your time and efforts.
--
JFB
verec wrote, On 14/8/06 2:06 AM:
> I'm amazed at the difference a compiler makes...
If you look at the recent thread in this newsgroup with subject "Squeezing
blood from a turnip" you will see a related discussion about getting better
performance in Lisp than C at counting bits in an integer.
What ended up being found was that the Common Lisp built-in function
logcount was the fastest way to do it in SBCL, that the fastest C program
to do it was faster still, that the implementation of logcount in SBCL was
not using the fastest algorithm known in C, and then, coincidentally,
someone contributed a faster logcount to the SBCL project, which made the
Lisp version faster than the C version.
An important point that you can get from my post to that thread is that you
can write your own compiler implementation of a function like logcount using
the SBCL macro define-vop, which allows you to write the assembler language
implementation of a function. So even if the bottleneck in your program is not
something that is part of the CL language definition such as logcount, you are
able to hand-optimize it if it makes sense to do so. And you can evaluate a
define-vop form right at the REPL; you don't have to touch the compiler source
code to, in effect, patch the compiler.
If profiling shows that the bottleneck is a function that is part of Common
Lisp, you can give your improvements back to the project, if it is an open
source Lisp.
That means that if you have a very large program, once you have the algorithm
tweaked and you profile to find where the bottleneck is, you can go the extra
step of telling the compiler how to write fully optimized machine language for
a critical function in the inner loop, if there is such a function.
The advantage I have found in using Common Lisp instead of C for highly
optimized code is that it is much easier to make major changes to the
algorithm, which usually is the right way to get big improvements, before
dealing with low level tweaking.
--
Sidney Markowitz
http://www.sidney.com
Pierre Thierry wrote:
> Lars Brinkhoff wrote:
> > I get consistently *faster* run times with SBCL 0.9.14.30 than
> > with GCC 4.0.4.
> How exactly did you instrument the C/C++ code to time it?
Here's the whole thing:
#include <math.h>

static double pi_over_2 (void)
{
    double sum = exp (-pow (-10.0, 2.0)) + exp (-pow (10.0, 2.0));
    double x;
    for (x = -9.99; x < 10.0; x += 0.01)
        sum += 2.0 * exp (-pow (x, 2.0));
    return sum * 0.005;
}

int main (void)
{
    int i;
    for (i = 0; i < 100000; i++)
        pi_over_2 ();
    return 0;
}
> OK, I'm also significantly faster in Lisp... It took 1.7s even with
> -O3 for the C++ version [...], and 1.1s in Lisp
I didn't see any significant differences between -O2, -O3, and -Os, or
between C and C++, so I guess that's not a factor to consider here.
> (defun pi/2-opt ()
>   (declare (optimize (debug 0) (space 0) (safety 0) (speed 3)))
>   (do ((x -9.98 (+ x 0.01))
>        (v 0.0 (exp (- (expt x 2.0))))
>        (sum (* 2 (exp (- (expt 10.0 2.0)))) (+ sum (* 2 v))))
>       ((>= x 10.0) (* 0.005 sum))
>     (declare (type float x v sum))))
You can also avoid boxing the return value, but I find that it doesn't
improve speed noticeably in this toy program.
(defun pi/2 (box)
  (declare (type (simple-array double-float ()) box))
  (do (...)
      ((>= x 10.0d0)
       (setf (aref box) (* 0.005d0 sum))
       nil)
    ...))
A one-element vector works just as well, but it's always fun to use a
zero-dimensional array. :) (Perhaps other implementations optimize one
of those better than another.)
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: help with numeric speed optimization
Date:
Message-ID: <87lkps5v3a.fsf@qrnik.zagroda>
Pierre THIERRY <···········@levallois.eu.org> writes:
> OK, I'm also significantly faster in Lisp... It took 1.7s even with -O3
> for the C++ version (ten times with ten times iterations, so spawning
> the process is quite lightweight, in fact), and 1.1s in Lisp
Compile the C version with -ffast-math and do something to keep the
whole computation from being optimized out (e.g. sum the results and
print the sum). It's twice as fast as without -ffast-math on my box.
--
__("< Marcin Kowalczyk
\__/ ······@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/
Marcin 'Qrczak' Kowalczyk wrote:
> Pierre Thierry wrote;
> > OK, I'm also significantly faster in Lisp... It took 1.7s even
> > with -O3 for the C++ version [...], and 1.1s in Lisp
> Compile the C version with -ffast-math and do something to keep the
> whole computation from being optimized out (e.g. sum the results and
> print the sum). It's twice as fast as without -ffast-math on my box.
Significantly faster, yes, but not by a factor of two on my machine.
SBCL
17.772 seconds of real time
17.745108 seconds of user run time
0.004 seconds of system run time
0 page faults and
0 bytes consed.
GCC
real 0m15.863s
user 0m15.785s
sys 0m0.012s
Old results, without summing the return values:
SBCL
17.729 seconds of real time
17.713108 seconds of user run time
0.004 seconds of system run time
0 page faults and
0 bytes consed.
GCC
real 0m19.948s
user 0m19.861s
sys 0m0.004s
Marcin 'Qrczak' Kowalczyk wrote:
> Pierre THIERRY <···········@levallois.eu.org> writes:
>
> > OK, I'm also significantly faster in Lisp... It took 1.7s even with -O3
> > for the C++ version (ten times with ten times iterations, so spawning
> > the process is quite lightweight, in fact), and 1.1s in Lisp
>
> Compile the C version with -ffast-math and do something to keep the
> whole computation from being optimized out (e.g. sum the results and
> print the sum). It's twice as fast as without -ffast-math on my box.
On the MS Windows machine I'm currently using:
Cygwin GCC 3.4.4 takes 19.5secs (with -O3 and -ffast-math)
SBCL "kitten of death" takes ~23secs
GCL 2.6.1 takes 59secs
I'm impressed by how close SBCL gets to GCC. I'm also impressed by the
acrobatics needed to stop GCC from removing the calculation altogether
(as you say, sum the results and print the sum).
The 'time' function seems not to work in SBCL on MS Windows; I imagine
fixing that isn't a priority.
Rob Thorpe <·············@antenova.com> wrote:
> The 'time' function seems not to work in SBCL on MS Windows; I imagine
> fixing that isn't a priority.
It works in the latest release, but you'll need to build that from
source. Binaries aren't made for all platforms on every release.
--
Juho Snellman
"verec" <·····@mac.com> wrote:
> Depending on the
> speed outcome, CL may get considered for the project that
> will be heavy on numeric computations.
Why use Lisp for numerical computations?
patro
On 2006-08-13 23:15:29 +0100, "patro" <·····@ortap.com> said:
>> Depending on the
>> speed outcome, CL may get considered for the project that
>> will be heavy on numeric computations.
>
> Why use Lisp for numerical computations?
I said that the project was going to be heavy on numerical
computations, not that it was only about them.
The basic blocks (financial modelling on time series) are
fairly well defined. That's where the "raw speed" is needed
because the data samples are potentially _huge_, and the
computations not trivial. The math guy doesn't really care
which language is chosen, provided it is "fast enough", and
I'm pushing as hard as I can to avoid having to use Java
for the rest, because Java forces me to build too much
infrastructure/protocols that have to be almost entirely discarded
when the higher level requirements change.
The higher levels (what strategies to apply according to the
results of the models) are a lot less well defined, and it is
my belief that changing our mind three times along the way about
what we really want is going to be less expensive and faster
in CL (than the daily nightmare of forcing new business requirements
into a three-year-old Java code base that was not meant for them).
But in order to be able to change our minds, we first need
to see some results, hence experiment, hence my plea for using
CL.
--
JFB
From: Paul Jones
Subject: Re: help with numeric speed optimization
Message-ID: <1155659641_508@news-east.n>
On 2006-08-14 01:30:26 +0200, verec <·····@mac.com> said:
> On 2006-08-13 23:15:29 +0100, "patro" <·····@ortap.com> said:
>
>>> Depending on the
>>> speed outcome, CL may get considered for the project that
>>> will be heavy on numeric computations.
>
> I said that the project was going to be heavy on numerical
> computations, not that it was only about them.
>
> The basic blocks (financial modelling on time series) are
> fairly well defined. That's where the "raw speed" is needed
> because the data samples are potentially _huge_, and the
> computations not trivial. The math guy doesn't really care
> which language is chosen, provided it is "fast enough", and
> I'm pushing as hard as I can to avoid having to use Java
> for the rest, because Java forces me to build too much
> infrastructure/protocols that have to be almost entirely discarded
> when the higher level requirements change.
Speaking from experience, it's really not at all about how fast a
language can do one sort or another of numerical floating point
arithmetic. With huge data samples, it's about how the data are
organized and what types of methods are used to access them. We handle
very large time series using Lisp, and are eternally thankful that we
chose it for a language. We do use some routines for numerical
computation that are not Lisp, but they're sure not Java either.
They're carefully coded and optimized and then wrapped in Lisp wrappers.
We have years of experience in using Lisp with (very large) time series
for financial applications. If you'd like more pointers, let me know.
>
> The higher levels (what strategies to apply according to the
> results of the models) are a lot less well defined, and it is
> my belief that changing our mind three times along the way about
> what we really want is going to be less expensive and faster
> in CL (than the daily nightmare of forcing new business requirements
> into a three year old Java code base that was not meant for them.)
>
> But in order to be able to change our minds, we need first
> to see some results, hence experiment, hence my plea for using
> CL
Paul Jones wrote:
> On 2006-08-14 01:30:26 +0200, verec <·····@mac.com> said:
>
> > On 2006-08-13 23:15:29 +0100, "patro" <·····@ortap.com> said:
> >
> >>> Depending on the
> >>> speed outcome, CL may get considered for the project that
> >>> will be heavy on numeric computations.
> >
> > I said that the project was going to be heavy on numerical
> > computations, not that it was only about them.
> >
> > The basic blocks (financial modelling on time series) are
> > fairly well defined. That's where the "raw speed" is needed
> > because the data samples are potentially _huge_, and the
> > computations not trivial. The math guy doesn't really care
> > which language is chosen, provided it is "fast enough", and
> > I'm pushing as hard as I can to avoid having to use Java
> > for the rest, because Java forces me to build too much
> > infrastructure/protocols that have to be almost entirely discarded
> > when the higher level requirements change.
> Speaking from experience, it's really not at all about how fast a
> language can do one sort or another of numerical floating point
> arithmetic. With huge data samples, it's about how the data are
> organized and what types of methods are used to access them.
I'd agree with that. The example program given isn't a particularly
good one.
Most programs that use floating point iterate over large data
structures to read the data they need and output the results. The
optimization of these data structures is often the most important part.
In the SPECfp2000 floating point benchmark set there are now few
benchmarks that make great use of the floating point capabilities of
the processor. Most test the capabilities of the memory subsystem to
access regularly structured data. This is because the benchmarks are
built to be typical of fp applications, and that is what fp applications
do today.
Le Mon, 14 Aug 2006 00:15:29 +0200, patro a écrit :
> Why use Lisp for numerical computations ?
Why not?
Curiously,
Nowhere man
--
···········@levallois.eu.org
OpenPGP 0xD9D50D8A
Thanks to all for your replies. It helped win the case: we're
using Lisp for the project! Yeap! BTW: the math guy apologized:
the reason that the initial pi/2 wasn't very accurate (15% off
or so) is that the formula he gave me was for (sqrt pi), not
(/ pi 2) ...
CL-USER 11 > (defun pi/2-opt4 () ;;; formula for (sqrt pi) not for (/ pi 2) !!
               (declare (optimize (speed 3) (safety 0) (debug 0) (float 0)))
               (loop with sum of-type double-float
                       = #.(realpart (+ (exp (- (expt -10.0d0 2.0d0)))
                                        (exp (- (expt 10.0d0 2.0d0)))))
                     for x of-type double-float
                       from -9.99d0 below 10.0d0 by 0.01d0
                     do (incf sum (* 2d0 (exp (- (realpart (expt x 2.0d0))))))
                     finally (return (* sum 0.005d0))))
CL-USER 12 > (pi/2-opt4)
1.7724538509055174D0
CL-USER 13 > (sqrt pi)
1.7724538509055159D0
CL-USER 14 > (/ pi 2)
1.5707963267948966D0
Many thanks again.
--
JFB