From: ···@sef-pmax.slisp.cs.cmu.edu
Subject: Re: Floating Point
Date: 
Message-ID: <CMB3tq.3MG.3@cs.cmu.edu>
    From: ········@silver.ucs.indiana.edu (lael schooler)
    
    I have heard that there exist implementations of LISP
    which are as fast as C. Is this true?
    
CMU Common Lisp is especially good at floating-point, as Lisps go.  When we
first got it running on the MIPS-based DECstation (sorry -- only under Mach!),
we found that float-intensive code (for example, some of my neural-net
simulators) ran five times as fast in CMU CL as in certain commercial
Common Lisps (which may have improved since then), and that CMU CL was
quite competitive with C -- usually within 20% and sometimes faster.
However, this assumes that your CL program has all the "short-float"
declarations you need to generate optimal code.  The Python compiler in CMU
CL will helpfully point out all the places where a lack of declarations
forces it to do generic arithmetic.

Of course, the old Maclisp compiler did even better relative to the
conventional-language compilers of its day.

-- Scott

===========================================================================
Scott E. Fahlman			Internet:  ····@cs.cmu.edu
Senior Research Scientist		Phone:     412 268-2575
School of Computer Science              Fax:       412 681-5739
Carnegie Mellon University		Latitude:  40:26:33 N
5000 Forbes Avenue			Longitude: 79:56:48 W
Pittsburgh, PA 15213
===========================================================================

From: Jeffrey Mark Siskind
Subject: Re: Floating Point
Date: 
Message-ID: <QOBI.94Mar7150501@qobi.ai.toronto.edu>
    From: ········@silver.ucs.indiana.edu (lael schooler)
    
    I have heard that there exist implementations of LISP
    which are as fast as C. Is this true?
    
You have to be very careful. Differences in Lisp compilers can produce wildly
different results on rather benign floating point code. In some compilers,
even very small changes to the code can trigger the enabling or disabling of a
whole cascade of code generation optimizations which can affect performance
by an order of magnitude or more. Here is the synopsis of my intimate
experience with the Lucid/SPARC production compiler version 4.1 (henceforth
referred to as Lucid). (It says nothing about any other compiler.)

I once wrote some code to do some computational geometry. In a nutshell, it
took two line-segments or circles as input, and computed how much you could
translate or rotate one until it intersected the other. There are eight cases
here, the cross product of the trajector being a line-segment or a circle, the
target being a line-segment or a circle, and translation versus rotation.
Since across all of these cases there were a lot of similar calculations, I
wrote it in CLOS with all of the nice high-level abstractions we teach in
university. While the code was pristine and elegant, it was slow as a dog.
There were several reasons for this:

A. The straightforward encoding of the mathematical derivations meant that
there were numerous cases of repeated computations or computations that would
be discarded. Compilers usually can't do common subexpression removal or dead
code elimination across function call abstraction boundaries, let alone across
CLOS method dispatch and method combination.

B. The elegant way of doing things dictated that I have a generic function
for computing things like the intersection point of two figures. This function
did a multimethod dispatch on both arguments. In all cases in my code the
types of both arguments could be determined at compile time. But in Lucid,
even if you fully declare everything, you still get run time dispatch.

C. The elegant way of doing things dictated that many routines construct and
return new objects like points, lines, and rays. This meant that the
implementation was consing new temporary objects out the wazoo. Often these
temporary objects had extremely short lifetime, often serving just to package
up a pair of numbers returned from one function as input to another. I
originally used multiple values for this but found that code very hard to
debug and prove correct.
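
For what it's worth, the multiple-values alternative he mentions looks
roughly like this (a hand-made sketch with hypothetical names, not code
from the geometry package):

```lisp
;; Returning a pair of floats as multiple values instead of consing a
;; fresh temporary point object to package them up.
(defun midpoint (x1 y1 x2 y2)
  (declare (double-float x1 y1 x2 y2))
  (values (/ (+ x1 x2) 2d0) (/ (+ y1 y2) 2d0)))

;; The caller receives both coordinates without any intermediate object:
(multiple-value-bind (mx my) (midpoint 0d0 0d0 4d0 2d0)
  (list mx my))
```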

So I figured I would be industrious and I wrote a partial evaluator. This
partial evaluator accepted most of the Common Lisp special forms and even
accepted a fair subset of CLOS. It did very extensive type inference. With
minimal declarations it could unravel arbitrary Common Lisp code into
arithmetic code so long as certain compile-time termination conditions were
met (like compile-time determinable bounds on loops or recursion). This
straightline code did no consing, no slot access, no type dispatching, no
loops, no recursion, and no high-level function calls; just really dense calls
to primitive functions like +, -, *, /, ABS, SIN, ... along with conditional
branching. Having squished the abstraction hierarchy down to a single
function, the partial evaluator did common subexpression and dead code
elimination on this straightline code. The result was something like:

(flet ((f1 (x y) (sqrt (+ (* x x) (* y y))))
       (f2 (x y u v) (atan (- y v) (- x u))))
 (if (f2 a b c d)
     (f1 f g)
     (f1 i j)))

(This is not the actual code. I could run the partial evaluator and give
actual code but I don't have the time right now. This is just a hand
simulation to give you an idea of the code generated.)

After spending several months of hard work on the partial evaluator, I finally
got it to translate all 1200 lines of the inner loop of my computational
geometry calculations into thousands upon thousands of lines of straightline
code. And yes, the code was extremely dense with no consing, slot access, or
type dispatching. To my great dismay and utter disappointment the code was
still slow as a dog. So I began disassembling the output of the Lucid
compiler. What I found was that even though my partial evaluator did complete
type inference for every expression in my code, and it generated straightline
code fully decorated with float declarations, the Lucid compiler insisted upon
boxing all of the floats since THERE IS NO WAY WITH LUCID TO PASS AN UNBOXED
FLOAT AS AN ARGUMENT TO A FUNCTION OR RETURN AN UNBOXED FLOAT AS A RESULT.
This holds true even for local functions generated by FLET. So I tried to use
lexical free variables to pass parameters instead of arguments. No go. Lucid
still would box and unbox the floats upon each access. I tried everything
without success. And because the floats were boxed, calls to `primitives' like
ABS used the generic version rather than being open-coded.

Now I experimented by hand and could write very dense code that within one
function, would leave all floats unboxed and open-code calls to primitives.
But I needed to have this work across functions which I fully declared. And
there was no way to get this with Lucid.
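
To make the shape of the problem concrete, here is a hypothetical fully
declared example of the kind that, per the account above, still forced
boxing under Lucid 4.1 (names made up for illustration):

```lisp
;; Everything is declared DOUBLE-FLOAT, yet the value returned by
;; SUM-SQ crosses a function boundary, which is exactly where Lucid 4.1
;; would box it; within a single body the floats could stay unboxed.
(defun dist (x1 y1 x2 y2)
  (declare (double-float x1 y1 x2 y2))
  (flet ((sum-sq (dx dy)
           (declare (double-float dx dy))
           (+ (* dx dx) (* dy dy))))
    (sqrt (sum-sq (- x2 x1) (- y2 y1)))))
```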

Fortunately, my partial evaluator used a very general and abstract internal
representation for straightline code from which I could generate various
different styles of target Lisp code. I explored the space of styles quite
extensively. And then I gave up. I ended up having my partial evaluator
generate straightline C code from that same internal representation and then
load that C code in with the FFI.

Anyway, this is just my war story.
	Jeff
From: Ken Anderson
Subject: Re: Floating Point
Date: 
Message-ID: <KANDERSO.94Mar8124230@wheaton.bbn.com>
In article <·················@qobi.ai.toronto.edu> ····@qobi.ai.toronto.edu (Jeffrey Mark Siskind) writes:

   Newsgroups: comp.lang.lisp
   From: ····@qobi.ai.toronto.edu (Jeffrey Mark Siskind)
   Reply-To: ····@CS.Toronto.EDU
   Organization: Department of Computer Science, University of Toronto
   References: <············@cs.cmu.edu>
   Date: 7 Mar 94 20:05:03 GMT
   Lines: 97

       From: ········@silver.ucs.indiana.edu (lael schooler)

       I have heard that there exist implementations of LISP
       which are as fast as C. Is this true?

   You have to be very careful. Differences in Lisp compilers can produce wildly
   different results on rather benign floating point code. In some compilers,
   even very small changes to the code can trigger the enabling or disabling of a
   whole cascade of code generation optimizations which can affect performance
   by an order of magnitude or more. Here is the synopsis of my intimate
   experience with the Lucid/SPARC production compiler version 4.1 (henceforth
   referred to as Lucid). (It says nothing about any other compiler.)

   I once wrote some code to do some computational geometry. In a nutshell, it
   took two line-segments or circles as input, and computed how much you could
   translate or rotate one until it intersected the other. There are eight cases
   here, the cross product of the trajector being a line-segment or a circle, the
   target being a line-segment or a circle, and translation versus rotation.
   Since across all of these cases there were a lot of similar calculations, I
   wrote it in CLOS with all of the nice high-level abstractions we teach in
   university. While the code was pristine and elegant, it was slow as a dog.
   There were several reasons for this:

   A. The straightforward encoding of the mathematical derivations meant that
   there were numerous cases of repeated computations or computations that would
   be discarded. Compilers usually can't do common subexpression removal or dead
   code elimination across function call abstraction boundaries, let alone across
   CLOS method dispatch and method combination.

   B. The elegant way of doing things dictated that I have a generic function
   for computing things like the intersection point of two figures. This function
   did a multimethod dispatch on both arguments. In all cases in my code the
   types of both arguments could be determined at compile time. But in Lucid,
   even if you fully declare everything, you still get run time dispatch.

   C. The elegant way of doing things dictated that many routines construct and
   return new objects like points, lines, and rays. This meant that the
   implementation was consing new temporary objects out the wazoo. Often these
   temporary objects had extremely short lifetime, often serving just to package
   up a pair of numbers returned from one function as input to another. I
   originally used multiple values for this but found that code very hard to
   debug and prove correct.

   So I figured I would be industrious and I wrote a partial evaluator. This
   partial evaluator accepted most of the Common Lisp special forms and even
   accepted a fair subset of CLOS. It did very extensive type inference. With
   minimal declarations it could unravel arbitrary Common Lisp code into straightline
   arithmetic code so long as certain compile-time termination conditions were
   met (like compile-time determinable bounds on loops or recursion). This
   straightline code did no consing, no slot access, no type dispatching, no
   loops, no recursion, and no high-level function calls; just really dense calls
   to primitive functions like +, -, *, /, ABS, SIN, ... along with conditional
   branching. Having squished the abstraction hierarchy down to a single
   function, the partial evaluator did common subexpression and dead code
   elimination on this straightline code. The result was something like:

   (flet ((f1 (x y) (sqrt (+ (* x x) (* y y))))
	  (f2 (x y u v) (atan (- y v) (- x u))))
    (if (f2 a b c d)
	(f1 f g)
	(f1 i j)))

   (This is not the actual code. I could run the partial evaluator and give
   actual code but I don't have the time right now. This is just a hand
   simulation to give you an idea of the code generated.)

   After spending several months of hard work on the partial evaluator, I finally
   got it to translate all 1200 lines of the inner loop of my computational
   geometry calculations into thousands upon thousands of lines of straightline
   code. And yes, the code was extremely dense with no consing, slot access, or
   type dispatching. To my great dismay and utter disappointment the code was
   still slow as a dog. So I began disassembling the output of the Lucid
   compiler. What I found was that even though my partial evaluator did complete
   type inference for every expression in my code, and it generated straightline
   code fully decorated with float declarations, the Lucid compiler insisted upon
   boxing all of the floats since THERE IS NO WAY WITH LUCID TO PASS AN UNBOXED
   FLOAT AS AN ARGUMENT TO A FUNCTION OR RETURN AN UNBOXED FLOAT AS A RESULT.
   This holds true even for local functions generated by FLET. So I tried to use

You'll find CMU CL much better, at least in the FLET case.  The best thing
you can currently do in general Common Lisp is to inline the functions in
the inner loop so that the compiler can propagate the type information.  I
realize that is not always what you want.
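
A sketch of that inlining advice (hypothetical names; behavior will vary
by implementation):

```lisp
;; Declaring SQ inline lets the compiler substitute its body at the call
;; site, so the DOUBLE-FLOAT type information propagates across what
;; would otherwise be a boxing function boundary.
(declaim (inline sq))
(defun sq (x)
  (declare (double-float x))
  (* x x))

(defun sum-of-squares (x y)
  (declare (double-float x y)
           (optimize (speed 3)))
  (+ (sq x) (sq y)))
```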

Also, Allegro 4.2 final can let users pass unboxed floats to Lisp
functions much the same way that you can pass them to foreign functions.
This approach goes against some Common Lisp semantics (which is why this
is an issue in the first place), so it isn't currently advertised.

   lexical free variables to pass parameters instead of arguments. No go. Lucid
   still would box and unbox the floats upon each access. I tried everything
   without success. And because the floats were boxed, calls to `primitives' like

One trick that I've used is to store floats temporarily in a
(simple-array double-float (*)) array, without consing them.  I realize
this may seem like a kludge in your application, but in others it fits in
fairly well.
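
Roughly like this (a sketch; the names are made up):

```lisp
;; A preallocated specialized array used as unboxed scratch storage:
;; writing a double-float into it does not cons a boxed float.
(defvar *scratch*
  (make-array 2 :element-type 'double-float :initial-element 0d0))

(defun stash-sum-and-product (a b)
  (declare (double-float a b)
           (type (simple-array double-float (*)) *scratch*))
  (setf (aref *scratch* 0) (+ a b)
        (aref *scratch* 1) (* a b))
  *scratch*)
```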

   ABS used the generic version rather than being open-coded.

I don't actually believe this is true, at least in Lucid today, but ABS is
a classic example of how an apparently trivial operation can be much slower
than you expect.  (I'm cleaning up an example of this right now.)

   Now I experimented by hand and could write very dense code that within one
   function, would leave all floats unboxed and open-code calls to primitives.
   But I needed to have this work across functions which I fully declared. And
   there was no way to get this with Lucid.

   Fortunately, my partial evaluator used a very general and abstract internal
   representation for straightline code from which I could generate various
   different styles of target Lisp code. I explored the space of styles quite
   extensively. And then I gave up. I ended up having my partial evaluator
   generate straightline C code from that same internal representation and then
   load that C code in with the FFI.

I think that writing Lisp code that writes C code is a wonderful use of
Lisp.  You've crystallized the knowledge that started out in an elegant
but slow CLOS program into an efficient C program.  Of course, your point
is that we should be able to do this more automatically.

   Anyway, this is just my war story.

A wonderful one.  These are important to share, with users and vendors.
I've been trying to turn such war stories into a practical guide for
improving Lisp performance.  I find that it usually isn't hard to find
places where small changes to code lead to vast (10x or more) improvements
in performance.  The surprising thing is that this can be true in C as well.

k
--
Ken Anderson 
Internet: ·········@bbn.com
BBN STC              Work Phone: 617-873-3160
10 Moulton St.       Home Phone: 617-643-0157
Mail Stop 6/4c              FAX: 617-873-3776
Cambridge MA 02138
USA
From: Christopher Hoover
Subject: Re: Floating Point
Date: 
Message-ID: <CMFBvw.4Dy.3@cs.cmu.edu>
In article <·····················@wheaton.bbn.com>,
Ken Anderson <········@wheaton.bbn.com> wrote:
>
>One trick that i've used is to use a (simple-array double-float (*)) array
>as a place to store floats temporarily without consing them.  I realize
>this may seem like a kludge in your application, but in others it fits in
>fairly well.

Yes, this works fairly well, albeit awkwardly.  A similar solution
solves the problem where you want to store a float in a structure or
class.

Structure and instance slot storage is generally equivalent to
SIMPLE-VECTOR.  This is the biggest lossage I find in practice with
using floating point.  I can usually use INLINE, compiler options, and
other chicanery to keep consing from float boxing down to nearly zero,
but you usually have to store the answer somewhere when you are done.

As far as I know only CMUCL offers unboxed slots in structures.  You
wind up having to do something like this in all other implementations:

(defstruct frob
  ;;
  ;; I really want a slot named XOOL of type DOUBLE-FLOAT, but this
  ;; cruft will keep me from consing.
  (xool-unboxed-storage (make-array 1 :element-type 'double-float
				    :initial-element 0d0)
			:type (simple-array double-float (1))
			:read-only t))

(declaim (inline frob-xool))
(defun frob-xool (frob)
  (declare (type frob frob))
  (aref (frob-xool-unboxed-storage frob) 0))

(declaim (inline set-frob-xool))
(defun set-frob-xool (frob value)
  (declare (double-float value)
	   (type frob frob))
  (setf (aref (frob-xool-unboxed-storage frob) 0) value))
;;;
(defsetf frob-xool set-frob-xool)

It costs you an extra dereference everywhere, but you can get some
tight code from it.  Of course you can hide the cruft with some macro
magic.
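
The macro magic might look something like this (a sketch;
DEFINE-UNBOXED-FLOAT-SLOT is a made-up name, and it assumes the storage
slot follows the naming convention used above):

```lisp
;; Hypothetical macro hiding the unboxed-storage cruft: given a struct
;; FOO with a slot FOO-BAR-UNBOXED-STORAGE holding a 1-element
;; double-float array, it defines inline FOO-BAR and (SETF FOO-BAR).
(defmacro define-unboxed-float-slot (struct slot)
  (let ((storage (intern (format nil "~A-~A-UNBOXED-STORAGE" struct slot)))
        (reader  (intern (format nil "~A-~A" struct slot)))
        (writer  (intern (format nil "SET-~A-~A" struct slot))))
    `(progn
       (declaim (inline ,reader ,writer))
       (defun ,reader (s) (aref (,storage s) 0))
       (defun ,writer (s v)
         (declare (double-float v))
         (setf (aref (,storage s) 0) v))
       (defsetf ,reader ,writer)
       ',reader)))
```

With the FROB structure above, (define-unboxed-float-slot frob xool)
expands into the reader, writer, and DEFSETF written out by hand there.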

Bottom line: vendors should support unboxed slots in structures and
classes as well.  It's not that hard, although inheritance keeps it
interesting.

-- Chris.
(··@lks.csi.com)
From: Lawrence G. Mayka
Subject: Re: Floating Point
Date: 
Message-ID: <LGM.94Mar11090926@polaris.flw.att.com>
In article <············@cs.cmu.edu> ···@cs.cmu.edu (Christopher Hoover) writes:

   (defstruct frob
     ;;
     ;; I really want a slot named XOOL of type DOUBLE-FLOAT, but this
     ;; cruft will keep me from consing.
     (xool-unboxed-storage (make-array 1 :element-type 'double-float
				       :initial-element 0d0)
			   :type (simple-array double-float (1))
			   :read-only t))

In a case like this, I myself prefer an array rank of 0 instead of 1:

(defstruct frob
  (xool-unboxed-storage (make-array nil :element-type 'double-float
                                        :initial-element 0d0)
                        :type (simple-array double-float nil)
                        :read-only t))

The array contents can then be referenced simply as

(aref (frob-xool-unboxed-storage frob))

with no indices at all.

--
        Lawrence G. Mayka
        AT&T Bell Laboratories
        ···@iexist.att.com

Standard disclaimer.
From: Christopher Hoover
Subject: Re: Floating Point
Date: 
Message-ID: <CMIHEo.51L.3@cs.cmu.edu>
In article <·················@polaris.flw.att.com>,
Lawrence G. Mayka <···@polaris.flw.att.com> wrote:
>In article <············@cs.cmu.edu> ···@cs.cmu.edu (Christopher Hoover) writes:
>
>   (defstruct frob
>     ;;
>     ;; I really want a slot named XOOL of type DOUBLE-FLOAT, but this
>     ;; cruft will keep me from consing.
>     (xool-unboxed-storage (make-array 1 :element-type 'double-float
>				       :initial-element 0d0)
>			   :type (simple-array double-float (1))
>			   :read-only t))
>
>In a case like this, I myself prefer an array rank of 0 instead of 1:
>
>(defstruct frob
>  (xool-unboxed-storage (make-array nil :element-type 'double-float
>                                        :initial-element 0d0)
>                        :type (simple-array double-float nil)
>                        :read-only t))
>
>The array contents can then be referenced simply as
>
>(aref (frob-xool-unboxed-storage frob))
>
>with no indices at all.
>

Neat idea.  Unfortunately, Lucid CL apparently doesn't do the same
optimizations for simple 0-d double-float arrays as it does for simple
1-d and 2-d double-float arrays.  (N.B. I didn't try very hard to make
it work, but I believe this is the case.)

-- Chris.
(··@lks.csi.com)
From: Lawrence G. Mayka
Subject: Re: Floating Point
Date: 
Message-ID: <LGM.94Mar9091703@polaris.flw.att.com>
In article <·················@qobi.ai.toronto.edu> ····@qobi.ai.toronto.edu (Jeffrey Mark Siskind) writes:

   After spending several months of hard work on the partial evaluator, I finally
   got it to translate all 1200 lines of the inner loop of my computational
   geometry calculations into thousands upon thousands of lines straightline
   code. And yes, the code was extremely dense with no consing, slot access, or
   type dispatching. To my great dismay and utter disappointment the code was
   still slow as a dog. So I began disassembling the output of the Lucid
   compiler. What I found was that even though my partial evaluator did complete
   type inference for every expression in my code, and it generated straightline
   code fully decorated with float declarations, the Lucid compiler insisted upon
   boxing all of the floats since THERE IS NO WAY WITH LUCID TO PASS AN UNBOXED
   FLOAT AS AN ARGUMENT TO A FUNCTION OR RETURN AN UNBOXED FLOAT AS A RESULT.
   This holds true even for local functions generated by FLET. So I tried to use
   lexical free variables to pass parameters instead of arguments. No go. Lucid
   still would box and unbox the floats upon each access. I tried everything
   without success. And because the floats were boxed, calls to `primitives' like
   ABS used the generic version rather than being open-coded.

"Horror stories" like this are a good argument for vendors to support
unboxed SHORT-FLOATs.  In the two implementations that I know of that
support unboxed SHORT-FLOATs (LispWorks and CLISP), the range from
LEAST-POSITIVE-SHORT-FLOAT to MOST-POSITIVE-SHORT-FLOAT is roughly the
same as for SINGLE-FLOAT; only the precision is lower.
--
        Lawrence G. Mayka
        AT&T Bell Laboratories
        ···@iexist.att.com

Standard disclaimer.
From: Anthony Berglas
Subject: Re: Floating Point
Date: 
Message-ID: <2lgl37$538@uqcspe.cs.uq.oz.au>
I can vouch for cmucl.

I did a simple neural-net benchmark and cmucl beat the Sun C compiler,
and was just beaten by gcc with full optimization.  The code used arrays,
but not complex function calls or method dispatching.  (cmucl knows how
to compile a group of functions as a block, and thus do certain
cross-function optimizations.)  cmucl beat gcc on a simple integer
benchmark, but I'd say that they are actually similar.

I suggest that you write a test for the guts of what you are trying to
do in both C and Lisp, and find out for yourself.  The Lisp code will
need declarations; make sure that you know how to use them.
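
A declared micro-benchmark of the kind Anthony suggests might look like
this (a sketch, not his actual test):

```lisp
;; Hypothetical float kernel with the declarations a Lisp compiler needs
;; for a fair comparison against C: a specialized array type, a typed
;; accumulator, and an optimize setting.
(defun dot (a b)
  (declare (type (simple-array double-float (*)) a b)
           (optimize (speed 3) (safety 0)))
  (let ((sum 0d0))
    (declare (double-float sum))
    (dotimes (i (length a) sum)
      (incf sum (* (aref a i) (aref b i))))))
```

Time it against the equivalent C loop on large arrays; removing any one
of the declarations typically changes the result dramatically.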


Anthony

--
Anthony Berglas
Rm 312a, Computer Science, Uni of Qld, 4072, Australia.
Uni Ph +61 7 365 4184,  Home 391 7727,  Fax 365 1999