From: ···@sef-pmax.slisp.cs.cmu.edu
Subject: Re: Floating Point
Date:
Message-ID: <CMB3tq.3MG.3@cs.cmu.edu>
From: ········@silver.ucs.indiana.edu (lael schooler)
I have heard that there exist implementations of LISP
which are as fast as C. Is this true?
CMU Common Lisp is especially good at floating-point, as Lisps go. When we
first got it running on the MIPS-based DECstation (sorry -- only under Mach!),
we found that float-intensive code (for example, some of my neural-net
simulators) ran five times as fast in CMU CL as in certain commercial
Common Lisps (which may have improved since then), and that CMU CL was
quite competitive with C -- usually within 20% and sometimes faster.
However, this assumes that your CL program has all the "short-float"
declarations you need to generate optimal code. The Python compiler in CMU
CL will helpfully point out all the places where a lack of declarations
forces it to do generic arithmetic.
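For illustration (an editor's sketch, not from the original post; the function
name is invented), declarations of the kind in question look roughly like this.
With them the compiler can open-code native float arithmetic; without them
Python will note each place it has to fall back to generic arithmetic:

(defun norm-2d (x y)
  ;; The argument and result type declarations let +, *, and SQRT
  ;; compile to machine float instructions instead of generic calls.
  (declare (single-float x y)
           (optimize (speed 3) (safety 0)))
  (the single-float (sqrt (+ (* x x) (* y y)))))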
Of course, the old Maclisp compiler did even better relative to the
conventional-language compilers of its day.
-- Scott
===========================================================================
Scott E. Fahlman Internet: ····@cs.cmu.edu
Senior Research Scientist Phone: 412 268-2575
School of Computer Science Fax: 412 681-5739
Carnegie Mellon University Latitude: 40:26:33 N
5000 Forbes Avenue Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================
From: ········@silver.ucs.indiana.edu (lael schooler)
I have heard that there exist implementations of LISP
which are as fast as C. Is this true?
You have to be very careful. Differences in Lisp compilers can produce wildly
different results on rather benign floating point code. In some compilers,
even very small changes to the code can trigger the enabling or disabling of a
whole cascade of code generation optimizations which can affect performance
by an order of magnitude or more. Here is the synopsis of my intimate
experience with the Lucid/SPARC production compiler version 4.1 (henceforth
referred to as Lucid). (It says nothing about any other compiler.)
I once wrote some code to do some computational geometry. In a nutshell, it
took two line-segments or circles as input, and computed how much you could
translate or rotate one until it intersected the other. There are eight cases
here, the cross product of the trajector being a line-segment or a circle, the
target being a line-segment or a circle, and translation versus rotation.
Since across all of these cases there were a lot of similar calculations, I
wrote it in CLOS with all of the nice high-level abstractions we teach in
university. While the code was pristine and elegant, it was slow as a dog.
There were several reasons for this:
A. The straightforward encoding of the mathematical derivations meant that
there were numerous cases of repeated computations or computations that would
be discarded. Compilers usually can't do common subexpression removal or dead
code elimination across function call abstraction boundaries, let alone across
CLOS method dispatch and method combination.
B. The elegant way of doing things dictated that I have a generic function
for computing things like the intersection point of two figures. This function
did a multimethod dispatch on both arguments. In all cases in my code the
types of both arguments could be determined at compile time. But in Lucid,
even if you fully declare everything, you still get run time dispatch.
C. The elegant way of doing things dictated that many routines construct and
return new objects like points, lines, and rays. This meant that the
implementation was consing new temporary objects out the wazoo. Often these
temporary objects had extremely short lifetime, often serving just to package
up a pair of numbers returned from one function as input to another. I
originally used multiple values for this but found that code very hard to
debug and prove correct.
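For example (hypothetical code, not from Jeff's program), the two styles look
like this: the first allocates a fresh point object on every call, while the
second returns the coordinates as multiple values and allocates nothing, at
the cost of being harder to follow:

(defstruct point
  (x 0d0 :type double-float)
  (y 0d0 :type double-float))

;; Consing style: every call allocates a short-lived POINT whose only
;; job is to carry two numbers back to the caller.
(defun midpoint (p q)
  (make-point :x (/ (+ (point-x p) (point-x q)) 2d0)
              :y (/ (+ (point-y p) (point-y q)) 2d0)))

;; Multiple-value style: no allocation, but callers must keep track of
;; which value is which.
(defun midpoint-values (x1 y1 x2 y2)
  (declare (double-float x1 y1 x2 y2))
  (values (/ (+ x1 x2) 2d0)
          (/ (+ y1 y2) 2d0)))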
So I figured I would be industrious and I wrote a partial evaluator. This
partial evaluator accepted most of the Common Lisp special forms and even accepted
a fair subset of CLOS. It did very extensive type inference. With minimal
declarations it could unravel arbitrary Common Lisp code into straightline
arithmetic code so long as certain compile-time termination conditions were
met (like compile-time determinable bounds on loops or recursion). This
straightline code did no consing, no slot access, no type dispatching, no
loops, no recursion, and no high-level function calls; just really dense calls
to primitive functions like +, -, *, /, ABS, SIN, ... along with conditional
branching. Having squished the abstraction hierarchy down to a single
function, the partial evaluator did common subexpression and dead code
elimination on this straightline code. The result was something like:
(flet ((f1 (x y) (sqrt (+ (* x x) (* y y))))
       (f2 (x y u v) (atan (- y v) (- x u))))
  (if (f2 a b c d)
      (f1 f g)
      (f1 i j)))
(This is not the actual code. I could run the partial evaluator and give
actual code but I don't have the time right now. This is just a hand
simulation to give you an idea of the code generated.)
After spending several months of hard work on the partial evaluator, I finally
got it to translate all 1200 lines of the inner loop of my computational
geometry calculations into thousands upon thousands of lines of straightline
code. And yes, the code was extremely dense with no consing, slot access, or
type dispatching. To my great dismay and utter disappointment the code was
still slow as a dog. So I began disassembling the output of the Lucid
compiler. What I found was that even though my partial evaluator did complete
type inference for every expression in my code, and it generated straightline
code fully decorated with float declarations, the Lucid compiler insisted upon
boxing all of the floats since THERE IS NO WAY WITH LUCID TO PASS AN UNBOXED
FLOAT AS AN ARGUMENT TO A FUNCTION OR RETURN AN UNBOXED FLOAT AS A RESULT.
This holds true even for local functions generated by FLET. So I tried to use
lexical free variables to pass parameters instead of arguments. No go. Lucid
still would box and unbox the floats upon each access. I tried everything
without success. And because the floats were boxed, calls to `primitives' like
ABS used the generic version rather than being open-coded.
Now I experimented by hand and could write very dense code that, within one
function, would leave all floats unboxed and open-code calls to primitives.
But I needed to have this work across functions which I fully declared. And
there was no way to get this with Lucid.
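Roughly, the distinction looks like this (an illustrative sketch, not the
original code): inside one fully declared function the intermediates can stay
unboxed, but split the same computation across a call and, under Lucid 4.1 as
described above, the argument and result cross the boundary boxed:

;; One body: X, Y, and all intermediates can stay unboxed.
(defun dist (x y)
  (declare (double-float x y)
           (optimize (speed 3) (safety 0)))
  (sqrt (+ (* x x) (* y y))))

;; The same computation split across a call: even with everything fully
;; declared, SQUARE's argument and result get boxed at the call boundary.
(defun square (x)
  (declare (double-float x)
           (optimize (speed 3) (safety 0)))
  (* x x))

(defun dist-split (x y)
  (declare (double-float x y)
           (optimize (speed 3) (safety 0)))
  (sqrt (+ (square x) (square y))))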
Fortunately, my partial evaluator used a very general and abstract internal
representation for straightline code from which I could generate various
different styles of target Lisp code. I explored the space of styles quite
extensively. And then I gave up. I ended up having my partial evaluator
generate straightline C code from that same internal representation and then
load that C code in with the FFI.
Anyway, this is just my war story.
Jeff
In article <·················@qobi.ai.toronto.edu> ····@qobi.ai.toronto.edu (Jeffrey Mark Siskind) writes:
[...] the Lucid compiler insisted upon
boxing all of the floats since THERE IS NO WAY WITH LUCID TO PASS AN UNBOXED
FLOAT AS AN ARGUMENT TO A FUNCTION OR RETURN AN UNBOXED FLOAT AS A RESULT.
This holds true even for local functions generated by FLET. So I tried to use
You'll find CMU CL much better, at least in the FLET case. The best thing you
can currently do in general Common Lisp is to inline the functions in the
inner loop so that the compiler can propagate the type information. I
realize that is not always what you want.
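A minimal sketch of the inlining suggestion (a hypothetical example, not Ken's
code; the names are invented): once the helper is declared INLINE, its body is
compiled into the caller, the declarations propagate, and the intermediate
floats no longer have to cross a call boundary:

(declaim (inline hypot2))
(defun hypot2 (dx dy)
  (declare (double-float dx dy))
  (sqrt (+ (* dx dx) (* dy dy))))

(defun perimeter (x1 y1 x2 y2 x3 y3)
  ;; With HYPOT2 inlined, the whole computation is visible in one body,
  ;; so a compiler that supports unboxed floats can keep them in
  ;; registers throughout.
  (declare (double-float x1 y1 x2 y2 x3 y3)
           (optimize (speed 3) (safety 0)))
  (+ (hypot2 (- x2 x1) (- y2 y1))
     (hypot2 (- x3 x2) (- y3 y2))
     (hypot2 (- x1 x3) (- y1 y3))))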
Also, Allegro 4.2 final can let users pass unboxed floats to Lisp functions
in much the same way that you can pass them to foreign functions. This
approach goes against some Common Lisp semantics (which is why this is an
issue in the first place), so it isn't currently advertised.
lexical free variables to pass parameters instead of arguments. No go. Lucid
still would box and unbox the floats upon each access. I tried everything
without success. And because the floats were boxed, calls to `primitives' like
One trick that I've used is to use a (simple-array double-float (*)) array
as a place to store floats temporarily without consing them. I realize
this may seem like a kludge in your application, but in others it fits in
fairly well.
ABS used the generic version rather than being open-coded.
I don't actually believe this is true, at least in Lucid today, but ABS is
a classic example of how an apparently trivial operation can be much slower
than you expect. (I'm cleaning up an example of this right now.)
Now I experimented by hand and could write very dense code that within one
function, would leave all floats unboxed and open-code calls to primitives.
But I needed to have this work across functions which I fully declared. And
there was no way to get this with Lucid.
Fortunately, my partial evaluator used a very general and abstract internal
representation for straighline code from which I could generate various
different styles of target Lisp code. I explored the space of styles quite
extensively. And then I gave up. I ended up having my partial evaluator
generate straightline C code from that same internal representation and then
load that C code in with the FFI.
I think that writing Lisp code that writes C code is a wonderful use of
Lisp. You've crystallized the knowledge that started out in an elegant but
slow CLOS program into an efficient C program. Of course, your point is that
we should be able to do this more automatically.
Anyway, this is just my war story.
A wonderful one. These are important to share with users and vendors.
I've been trying to turn such war stories into a practical guide for
improving Lisp performance. I find that it usually isn't hard to find
places where small changes to code lead to vast (10x or more) improvements in
performance. The surprising thing is that this can be true in C as well.
k
--
Ken Anderson
Internet: ·········@bbn.com
BBN STC Work Phone: 617-873-3160
10 Moulton St. Home Phone: 617-643-0157
Mail Stop 6/4c FAX: 617-873-3776
Cambridge MA 02138
USA
In article <·····················@wheaton.bbn.com>,
Ken Anderson <········@wheaton.bbn.com> wrote:
>
>One trick that i've used is to use a (simple-array double-float (*)) array
>as a place to store floats temporarily without consing them. I realize
>this may seem like a kludge in your application, but in others it fits in
>fairly well.
Yes, this works fairly well albeit awkwardly. A similar solution
solves the problem where you want to store a float in a structure or
class.
Structure and instance slot storage is generally equivalent to
SIMPLE-VECTOR. This is the biggest lossage I find in practice with
using floating point. I can usually use INLINE, compiler options, and
other chicanery to keep consing from float boxing down to nearly zero,
but you usually have to store the answer somewhere when you are done.
As far as I know only CMUCL offers unboxed slots in structures. You
wind up having to do something like this in all other implementations:
(defstruct frob
  ;;
  ;; I really want a slot named XOOL of type DOUBLE-FLOAT, but this
  ;; cruft will keep me from consing.
  (xool-unboxed-storage
   (make-array 1 :element-type 'double-float :initial-element 0.0d0)
   :type (simple-array double-float (1))
   :read-only t))

(declaim (inline frob-xool))
(defun frob-xool (frob)
  (declare (type frob frob))
  (aref (frob-xool-unboxed-storage frob) 0))

(declaim (inline set-frob-xool))
(defun set-frob-xool (frob value)
  (declare (double-float value)
           (type frob frob))
  (setf (aref (frob-xool-unboxed-storage frob) 0) value))

;;;
(defsetf frob-xool set-frob-xool)
It costs you an extra dereference everywhere, but you can get some
tight code from it. Of course you can hide the cruft with some macro
magic.
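For comparison, in an implementation with unboxed structure slots (CMU CL, per
the remark above) the whole dance collapses to an ordinary typed slot; this
sketch assumes that behavior and uses made-up names:

;; With a :TYPE DOUBLE-FLOAT slot, CMU CL stores the value unboxed in the
;; structure itself, so reading it and SETF-ing it need not cons.
(defstruct frob2
  (xool 0d0 :type double-float))

;; (setf (frob2-xool f) 3.14159d0) then allocates nothing.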
Bottom line: vendors should support unboxed slots in structures and
classes as well. It's not that hard, although inheritance keeps it
interesting.
-- Chris.
(··@lks.csi.com)
In article <············@cs.cmu.edu> ···@cs.cmu.edu (Christopher Hoover) writes:
(defstruct frob
  ;;
  ;; I really want a slot named XOOL of type DOUBLE-FLOAT, but this
  ;; cruft will keep me from consing.
  (xool-unboxed-storage
   (make-array 1 :element-type 'double-float :initial-element 0.0)
   :type (simple-array double-float (1))
   :read-only t))
In a case like this, I myself prefer an array rank of 0 instead of 1:
(defstruct frob
  (xool-unboxed-storage
   (make-array nil :element-type 'double-float :initial-element 0d0)
   :type (simple-array double-float nil)
   :read-only t))
The array contents can then be referenced simply as
(aref (frob-xool-unboxed-storage frob))
with no indices at all.
--
Lawrence G. Mayka
AT&T Bell Laboratories
···@iexist.att.com
Standard disclaimer.
In article <·················@polaris.flw.att.com>,
Lawrence G. Mayka <···@polaris.flw.att.com> wrote:
>In article <············@cs.cmu.edu> ···@cs.cmu.edu (Christopher Hoover) writes:
>
> (defstruct frob
> ;;
> ;; I really want a slot named XOOL of type DOUBLE-FLOAT, but this
> ;; cruft will keep me from consing.
> (xool-unboxed-storage (make-array 1 :element-type 'double-float
> :initial-element 0.0)
> :type (simple-array double-float (1))
> :read-only t))
>
>In a case like this, I myself prefer an array rank of 0 instead of 1:
>
>(defstruct frob
> (xool-unboxed-storage (make-array nil :element-type 'double-float
> :initial-element 0d0)
> :type (simple-array double-float nil)
> :read-only t))
>
>The array contents can then be referenced simply as
>
>(aref (frob-xool-unboxed-storage frob))
>
>with no indices at all.
Neat idea. Unfortunately Lucid CL apparently doesn't do the same
optimizations for simple 0-d double-float arrays as it does for simple
1-d and 2-d double-float arrays. (N.B. I didn't try very hard to make it
work, but I believe this is the case.)
-- Chris.
(··@lks.csi.com)
In article <·················@qobi.ai.toronto.edu> ····@qobi.ai.toronto.edu (Jeffrey Mark Siskind) writes:
[...] the Lucid compiler insisted upon
boxing all of the floats since THERE IS NO WAY WITH LUCID TO PASS AN UNBOXED
FLOAT AS AN ARGUMENT TO A FUNCTION OR RETURN AN UNBOXED FLOAT AS A RESULT. [...]
"Horror stories" like this are a good argument for vendors to support
unboxed SHORT-FLOATs. In the two implementations that I know of that
support unboxed SHORT-FLOATs (LispWorks and CLISP), the range from
LEAST-POSITIVE-SHORT-FLOAT to MOST-POSITIVE-SHORT-FLOAT is roughly the
same as for SINGLE-FLOAT; only the precision is lower.
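To see how a given implementation treats SHORT-FLOAT, one can ask it directly
(a sketch; the results are implementation dependent, which is the point -- in
an implementation without a distinct short float, 1.0s0 simply reads as a
single float):

(list (type-of 1.0s0)            ; SHORT-FLOAT, or SINGLE-FLOAT if merged
      (float-precision 1.0s0)    ; significand bits of a short float
      (float-precision 1.0f0)    ; significand bits of a single float
      most-positive-short-float
      most-positive-single-float)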
--
Lawrence G. Mayka
AT&T Bell Laboratories
···@iexist.att.com
Standard disclaimer.
I can vouch for cmucl.
I did a simple neural net benchmark and cmucl beat the Sun C compiler,
and was just beaten by gcc with full opt. The code used arrays, but
not complex function calls or method dispatching. (cmucl knows how to
compile a group of functions as a block, and thus do certain cross
function optimizations.) cmucl beat gcc for a simple integer
benchmark, but I'd say that they are actually similar.
I suggest that you write a test for the guts of what you are trying to
do in C and lisp, and find out for yourself. The lisp code will need
declarations; make sure that you know how to use them.
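For instance (a toy example, not Anthony's benchmark), something like this
exercises declared double-float array access and can be timed against the
equivalent C loop:

(defun saxpy! (a x y n)
  "Y <- A*X + Y over the first N elements; everything double-float."
  (declare (double-float a)
           (type (simple-array double-float (*)) x y)
           (fixnum n)
           (optimize (speed 3) (safety 0)))
  (dotimes (i n y)
    (setf (aref y i)
          (+ (* a (aref x i)) (aref y i)))))

;; For example:
;;   (let ((x (make-array 100000 :element-type 'double-float :initial-element 1d0))
;;         (y (make-array 100000 :element-type 'double-float :initial-element 2d0)))
;;     (time (dotimes (r 100) (saxpy! 2d0 x y 100000))))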
Anthony
--
Anthony Berglas
Rm 312a, Computer Science, Uni of Qld, 4072, Australia.
Uni Ph +61 7 365 4184, Home 391 7727, Fax 365 1999