From: H. Tunc Simsek
Subject: [optimization] Is coercion expensive?
Date: 
Message-ID: <38449FB1.F0297EFA@EECS.Berkeley.Edu>
I'm still working on optimizing a piece of code (an integration algo)
that is part of a larger system.  The bottlenecks I've encountered
so far are:

i)  each time the function INTEGRATE is called it conses a short list.
   Apparently, (declare (dynamic extent ... )) doesn't work with CMUCL.
   The effect of this consing should not be too bad but the function
	integrate is called too many times and things accumulate.
	Some performance figures are (prob. not very accurate):

	C code:  1.2 seconds to integrate 10,000 components
       Lisp code: 2.3 seconds (of which 0.6 sec. is GC).

  It'd be nice to get rid of this GC bit.  Any suggestions?

ii) The integration should work for DOUBLE-FLOAT numbers.
    But if the rhs of a differential eqn. returns 1 instead of 1.0d0
   then the program gives an error.  I suppose I could coerce
   the numbers to DOUBLE-FLOAT, but would that be expensive?
   

   e.g.

    (defun adder (a b)
       (declare (type double a b))
       (+ a b))

    (adder 1 2)

    should work for convenience.

Thanks,
Tunc

From: John Watton
Subject: Re: [optimization] Is coercion expensive?
Date: 
Message-ID: <823a2j$bkb$1@nnrp1.deja.com>
In article <·················@EECS.Berkeley.Edu>,
  ······@EECS.Berkeley.Edu wrote:
> I'm still working on optimizing a piece of code (an integration algo)
> that is part of a larger system.  The bottlenecks I've encountered
> so far are:
>
> i)  each time the function INTEGRATE is called it conses a short list.
>    Apparently, (declare (dynamic extent ... )) doesn't work with
CMUCL.
>    The effect of this consing should not be too bad but the function
> 	integrate is called too many times and things accumulate.
> 	Some performance figures are (prob. not very accurate):
>
> 	C code:  1.2 seconds to integrate 10,000 components
>        Lisp code: 2.3 seconds (of which 0.6 sec. is GC).
>
>   It'd be nice to get rid of this GC bit.  Any suggestions?

I am not a CMUCL user but with ACL calling a function with numerical
arguments requires them to be boxed and then unboxed in the called
function if it is optimized. When I want to avoid this overhead which I
suspect is a similar overhead with CMUCL I "prebox" the numbers myself
by placing them into an array which is recycled for this purpose. You
trade a couple of aref's and setf on aref's for the consing overhead of
boxing which is usually a big win.

Here is an example:

(defparameter *inputs* (make-array 2))
(defparameter *output* (make-array 1 :element-type 'single-float))
(declaim (type (simple-array t (*)) *inputs*)
	 (type (simple-array single-float (1)) *output*))

(defun test (inputs)
  (declare (optimize (speed 3) (safety 1))
	   (type (simple-array t (*)) inputs))
  (let* ((i (the fixnum (aref inputs 0)))
	 (a (the single-float (aref inputs 1)))
	 (value 0.0))
    (declare (fixnum i)
	     (single-float a value))
    (setq value (* i a))
    (setf (aref *output* 0) value)
    nil)) ;; nil here to prevent boxing return value

(defmacro test-mult (i a)
  `(progn
     (setf (aref *inputs* 0) ,i
	   (aref *inputs* 1) ,a)
     (test *inputs*)
     (aref *output* 0)))

> ii) The integration should work for DOUBLE-FLOAT numbers.
>     But if the rhs of a differential eqn. returns 1 instead of 1.0d0
>    then the program gives an error.  I suppose I could coerce
>    the numbers to DOUBLE-FLOAT, but would that be expensive?
>
>    e.g.
>
>     (defun adder (a b)
>        (declare (type double a b))
>        (+ a b))
>
>     (adder 1 2)
>
>     should work for convenience.
>

Coercing isn't usually a big overhead. Try it and see:

(defun adder (a b)
  (let ((a (float a 0.0d0))
        (b (float b 0.0d0)))
   (declare (double-float a b))
   (+ a b)))

--
John Watton
Alcoa Inc.


Sent via Deja.com http://www.deja.com/
Before you buy.
From: Hidayet Tunc Simsek
Subject: Re: [optimization] Is coercion expensive?
Date: 
Message-ID: <3845BA71.E9EE21A@EECS.Berkeley.Edu>
John Watton wrote:
> 


> 
> I am not a CMUCL user but with ACL calling a function with numerical
> arguments requires them to be boxed and then unboxed in the called
> function if it is optimized. When I want to avoid this overhead which I
> suspect is a similar overhead with CMUCL I "prebox" the numbers myself
> by placing them into an array which is recycled for this purpose. You
> trade a couple of aref's and setf on aref's for the consing overhead of
> boxing which is usually a big win.
>


What is meant by "boxing"?  I'm quite new to lisp and not familiar with 
this term.

> > ii) The integration should work for DOUBLE-FLOAT numbers.
> >     But if the rhs of a differential eqn. returns 1 instead of 1.0d0
> >    then the program gives an error.  I suppose I could coerce
> >    the numbers to DOUBLE-FLOAT, but would that be expensive?
> >
> >    e.g.
> >
> >     (defun adder (a b)
> >        (declare (type double a b))
> >        (+ a b))
> >
> >     (adder 1 2)
> >
> >     should work for convenience.
> >
> 
> Coercing isn't usually a big overhead. Try it and see:
> 
> (defun adder (a b)
>   (let ((a (float a 0.0d0))
>         (b (float b 0.0d0)))
>    (declare (double-float a b))
>    (+ a b)))
> 

I had tried the function FLOAT with CMUCL, but that consed a lot
(perhaps misdeclarations).
Is (coerce a 'double-float) and (float a 0.0d0) the same, i.e. are they
optimized into
the same code?

> --
> John Watton
> Alcoa Inc.
> 
> Sent via Deja.com http://www.deja.com/
> Before you buy.
From: John Watton
Subject: Re: [optimization] Is coercion expensive?
Date: 
Message-ID: <8260kq$aaq$1@nnrp1.deja.com>
In article <················@EECS.Berkeley.Edu>,
  ······@EECS.Berkeley.Edu wrote:
> What is meant by "boxing"?  I'm quite new to lisp and not familiar
with
> this term.

From the Franz ACL5.0 documentation at:

http://www.franz.com/support/docs/5.0/doc/cl/compiling.htm
9.5 Explain boxing

A number is boxed when it is converted from its machine representation
to the Lisp representation. For floats, the machine representation is
one (for singles) or two (for doubles) words. Lisp adds an extra word,
which contain a pointer and a type code. For fixnums, boxing simply
involves a left shift of two bits. For bignums which are in the range
of machine integers, boxing again adds an additional word.

Boxing obviously involves a computational overhead, but more important
it involves a space overhead. If a calculation involves the calculation
of thousands of floats, for example, thousands of bytes of space will
be used. Often that space need not be used.

> I had tried the function FLOAT with CMUCL, but that consed a lot
> (perhaps misdeclarations).
> Is (coerce a 'double-float) and (float a 0.0d0) the same, i.e. are
they
> optimized into
> the same code?

They are not the same with Franz's ACL. Using float is more efficient.
But this doesn't necessarily apply to CMUCL. You can use disassemble to
compare the two approaches. For example with ACL:

(defun c1 (a) (float a 0.0d0))
(defun c2 (a) (coerce a 'double-float))

(disassemble 'c1)
;; disassembly of #<Function C1>
;; formals: A

;; code start: #x127b65fc:
   0: 37de0080             ldo 64(r30),r30
   4: 6bc23f59             stw r2,-84(0,r30)
   8: 6bd13fe1             stw r17,-16(0,r30)
  12: b08037ff             addit,<> -1,r4,r0     	"number of args"
  16: b1403000             addit,<> 0,r10,r0     	"c_interrupt"
  20: 4a483b1d             ldw -626(0,r18),r8     	DOUBLE-FLOAT
  24: e520e000             ble 0(sr7,r9)
  28: 34040002     [ldo]   ldi #x1,r4
  32: 4bc23f59             ldw -84(0,r30),r2
  36: 37de3f81             ldo -64(r30),r30
  40: e13fffed             be -12(sr7,r9)
  44: 4bd13fe1             ldw -16(0,r30),r17

(disassemble 'c2)
;; disassembly of #<Function C2>
;; formals: A
;; constant vector:
0:	COERCE

;; code start: #x127ccf44:
   0: 37de0080             ldo 64(r30),r30
   4: 6bc23f59             stw r2,-84(0,r30)
   8: 6bd13fe1             stw r17,-16(0,r30)
  12: b08037ff             addit,<> -1,r4,r0     	"number of args"
  16: b1403000             addit,<> 0,r10,r0     	"c_interrupt"
  20: 4a593b1d             ldw -626(0,r18),r25     	DOUBLE-FLOAT
  24: 4a280056             ldw 43(0,r17),r8     	COERCE
  28: e520e000             ble 0(sr7,r9)
  32: 34040004     [ldo]   ldi #x2,r4
  36: 4bc23f59             ldw -84(0,r30),r2
  40: 37de3f81             ldo -64(r30),r30
  44: e13fffed             be -12(sr7,r9)
  48: 4bd13fe1             ldw -16(0,r30),r17

--
John Watton
Alcoa Inc.


Sent via Deja.com http://www.deja.com/
Before you buy.
From: His Holiness the Reverend Doktor Xenophon Fenderson, the Carbon(d)ated
Subject: Re: [optimization] Is coercion expensive?
Date: 
Message-ID: <w4oso1l2uws.fsf@nemesis.irtnog.org>
>>>>> "HTS" == Hidayet Tunc Simsek <······@EECS.Berkeley.Edu> writes:

    HTS> What is meant by "boxing"?  I'm quite new to lisp and not
    HTS> familiar with this term.

This has to do with type tags being inside a particular datum.

You'll want to read "Representing Type Information in Dynamically
Typed Languages" by David Gudeman (University of Arizona TR 93-27).  I
have a copy of the paper in PostScript format---if you can't find it
(I think I found it on Henry Baker's FTP site; the file is called
"typeinfo.ps"), send me a note and I'll gladly email you the paper.

Here's the abstract:

	This report is a discussion of various techniques for
	representing type information in dynamically typed languages,
	as implemented on general-purpose machines (and costs are
	discussed in terms of modern RISC machines).  It is intended
	to make readily available a large body of knowledge that
	currently has to be absorbed piecemeal from the literature or
	re-invented by each language implementer.  This discussion
	covers not only tagging schemes but other forms of
	representation as well, although the discussion is strictly
	limited to the representation of type information.  It should
	also be noted that this report does not purport to contain a
	survey of the relevant literature.  Instead, this report
	gathers together a body of folklore, organizes it into a
	logical structure, makes some generalizations, and then
	discusses the results in terms of modern hardware.

HTH.  HAND.

-- 
"Applying computer technology is simply finding the right wrench to pound
in the correct screw." --.sig of Tim Wright, Sequent Computer Systems.
From: Fernando
Subject: Re: [optimization] Is coercion expensive?
Date: 
Message-ID: <79af4soebvalesd7d73ar0snabop6o5n3v@4ax.com>
On 02 Dec 1999 11:02:11 -0500, "His Holiness the Reverend Doktor
Xenophon Fenderson, the Carbon(d)ated" <········@irtnog.org> wrote:

>>>>>> "HTS" == Hidayet Tunc Simsek <······@EECS.Berkeley.Edu> writes:
>
>    HTS> What is meant by "boxing"?  I'm quite new to lisp and not
>    HTS> familiar with this term.
>
>This has to do with type tags being inside a particular datum.
>
>You'll want to read "Representing Type Information in Dynamically
>Typed Languages" by David Gudeman (University of Arizona TR 93-27).  I
>have a copy of the paper in PostScript format---if you can't find it
>(I think I found it on Henry Baker's FTP site; the file is called
>"typeinfo.ps"), send me a note and I'll gladly email you the paper.
You may find it here
http://www.cs.indiana.edu/scheme-repository/doc.publications.html




//-----------------------------------------------
//	Fernando Rodriguez Romero
//
//	frr at mindless dot com
//------------------------------------------------
From: Thomas A. Russ
Subject: Re: [optimization] Is coercion expensive?
Date: 
Message-ID: <ymihfi2474e.fsf@sevak.isi.edu>
"H. Tunc Simsek" <······@EECS.Berkeley.Edu> writes:

> ii) The integration should work for DOUBLE-FLOAT numbers.
>     But if the rhs of a differential eqn. returns 1 instead of 1.0d0
>    then the program gives an error.  I suppose I could coerce
>    the numbers to DOUBLE-FLOAT, but would that be expensive?

Undoubtedly less expensive than a trip to the debugger :)

>    e.g.
> 
>     (defun adder (a b)
>        (declare (type double a b))
>        (+ a b))
> 
>     (adder 1 2)
> 
>     should work for convenience.

Actually, no it shouldn't.  The declaration of types in Lisp is a
promise on the part of the programmer that the variables really will be
of that type.  At this point you have to decide whether you want the
function ADDER to be able to handle generic numbers (in which case you
need to remove the declaration) or if you want the compiler to produce
more efficient code (by keeping the declarations and then only passing
in floating point numbers).

You can't really have it both ways.  Without benchmarking a particular
implementation, it is difficult to say what the cost of coercing either
an integer to a double float or even what the cost of calling coerce on
a double float which doesn't need to be coerced would be.

A quick check of Allegro Common Lisp shows that 
   (coerce 3.2d0 'double-float)
doesn't cons up a new float, although the COERCE code does call TYPEP to
check if its argument is already of the correct type.

I would guess that by the time you have done all of the work for doing
the coercion, you will have given up any speed advantage that having
floating point only code would have provided -- since you have done a
good part of the work of implementing generic arithmetic yourself.

If you want speed, then you will have to stay in floating point numbers
and sometimes resort to other sorts of fancy tricks as well.

> 
> Thanks,
> Tunc

-- 
Thomas A. Russ,  USC/Information Sciences Institute          ···@isi.edu    
From: H. Tunc Simsek
Subject: Re: [optimization] Is coercion expensive?
Date: 
Message-ID: <38462230.2EB3D590@EECS.Berkeley.Edu>
"Thomas A. Russ" wrote:

> Undoubtedly less expensive than a trip to the debugger :)

indeed.

> 
> >    e.g.
> >
> >     (defun adder (a b)
> >        (declare (type double a b))
> >        (+ a b))
> >
> >     (adder 1 2)
> >
> >     should work for convenience.
> 
> Actually, no it shouldn't.  The declaration of types in Lisp is a
> promise on the part of the programmer that the variables really will be
> of that type.  At this point you have to decide whether you want the
> function ADDER to be able to handle generic numbers (in which case you
> need to remove the declaration) or if you want the compiler to produce
> more efficient code (by keeping the declarations and then only passing
> in floating point numbers).
> 
> You can't really have it both ways.  Without benchmarking a particular
> implementation, it is difficult to say what the cost of coercing either
> an integer to a double float or even what the cost of calling coerce on
> a double float which doesn't need to be coerced would be.


I guess I've been programming with C too long and can't quite give up
some habits.
What I'm trying to get at is:

(defun adder (a b)
   (declare (type (or fixnum single-float double-float ratio) a b))
   (the double-float (+ (coerce a 'double-float)
      (coerce b 'double-float))))

should (at least as I would like it to) neither cons nor give up
optimization.
The reason why it shouldn't cons is because it can allocate the maximum
space needed
for a and b on the stack, and it should not give up on optimization
because
it can coerce on the stack.
However, my tests with CMUCL indicates otherwise.  That's o.k., as long
as
I understand exactly why and what a *documentable* solution is.
I guess I'm really trying to mimic the *casting* provided by C.

Tunc 

> 
> --
> Thomas A. Russ,  USC/Information Sciences Institute          ···@isi.edu
From: Raymond Toy
Subject: Re: [optimization] Is coercion expensive?
Date: 
Message-ID: <4nr9h5b8db.fsf@rtp.ericsson.se>
>>>>> "H" == H Tunc Simsek <······@EECS.Berkeley.Edu> writes:

    H> I guess I've been programming with C too long and can't quite
    H> give up some habits.  What I'm trying to get at is:

    H> (defun adder (a b)
    H>    (declare (type (or fixnum single-float double-float ratio) a b))
    H>    (the double-float (+ (coerce a 'double-float)
    H>       (coerce b 'double-float))))

    H> should (at least as I would like it to) neither cons nor give
    H> up optimization.  The reason why it shouldn't cons is because
    H> it can allocate the maximum space needed for a and b on the
    H> stack, and it should not give up on optimization because it can
    H> coerce on the stack.  However, my tests with CMUCL indicates
    H> otherwise.  That's o.k., as long as I understand exactly why
    H> and what a *documentable* solution is.  I guess I'm really
    H> trying to mimic the *casting* provided by C.

You can try something like the following (not tested).  It assumes
that a and b are always the same types, so if they're not, you'll have
to generalize this.

(defun adder (a b)
  (typecase a
     (fixnum
       (locally () (declare (fixnum b))
         (coerce (+ a b) 'double-float)))
     (single-float
       (locally () (declare (single-float b))
         (coerce (+ a b) 'double-float)))
     (double-float
       (locally () (declare (double-float b))
         (coerce (+ a b) 'double-float)))
     ; etc.
  ))

CMUCL has a macro called number-dispatch that basically generates this
type of code.  This will give you maximum speed for the individual
operation, but you have to do a type dispatch on a and b.  In this
case, since the operation is so simple, it might be give you anything.

Ray