From: Tim Bradshaw
Subject: Optimising code using CLOS in CMUCL
Date: Fri, 13 May 1994 15:17:33 GMT
Message-ID: <TFB.94May13161733@burns.cogsci.ed.ac.uk>
I know that talking about `optimising' in the same sentence as `CLOS'
sounds a bit strange, but...

I'm beginning to play with some float-intensive simulations which I
will eventually write in non-CLOS CL (because I know I can make CMUCL
do float stuff fast).  Before I know what I want to do, I'm doing
stuff in CLOS for all the benefits it gives.  What I plan to do is to
pull things out of instances and then manipulate them and put them
back -- i.e. I hope I don't have to do too much method dispatch.  But
I'm not sure how I should declare things to tell CMUCL what to do.

Here is a trivial piece of code I wrote to experiment:

    (defclass particle ()
      ;; a free particle in 1 dimension
      ((m :accessor mass :type single-float :initform 0.0 :initarg :mass)
       (q :accessor pos :type single-float :initform 0.0 :initarg :pos)
       (p :accessor mom :type single-float :initform 0.0 :initarg :mom)))

    (declaim (ftype (function (particle) single-float) mass pos mom vel vel2))

    (defgeneric vel (thing)
      ;; Rely on the above DECLAIM
      )
    (defmethod vel ((thing particle))
      (/ (mom thing) (mass thing)))

    (defgeneric vel2 (thing)
      ;; Do things locally
      )
    (defmethod vel2 ((thing particle))
      (let ((mom (mom thing)) (mass (mass thing)))
        (declare (type single-float mom mass))
        (/ mom mass)))

What I want to happen is that the `/' can be done efficiently, and
that I can write code without a billion extra declarations.  This
fragment lets that happen in CMUCL, but obviously I'd rather write
things like VEL rather than VEL2.

Questions: (1) is that DECLAIM a reasonable approach to this, given
that PCL itself doesn't do anything with types that I can see?  Is the
DECLAIM even *correct* (i.e. is it correct to declare that GFs are
functions?).

(2) Is it right that moving the DECLAIM to the start of the file
    (ahead of the DEFCLASS) is going to break the optimisation? -- it
    does in CMUCL.

(3) Since PCL ignores these type declarations, VEL loses if I stuff a
    non-single-float into one of the slots.  Even worse, it isn't
    picked up by the type checking, it just gets a bus error.  VEL2
    gets a type error which is at least better.  Is there any way
    around this?

(4) any other ideas?  Is it even reasonable to try and write fastish
    code this way?  No suggestions to switch to C++ please.

Thanks

--tim

From: Ken Anderson
Subject: Re: Optimising code using CLOS in CMUCL
Date: 
Message-ID: <KANDERSO.94May13130559@wheaton.bbn.com>
In article <·················@burns.cogsci.ed.ac.uk> ···@cogsci.ed.ac.uk (Tim Bradshaw) writes:

   Newsgroups: comp.lang.lisp
   From: ···@cogsci.ed.ac.uk (Tim Bradshaw)
   Organization: Centre for Cognitive Science, University of Edinburgh
   Date: Fri, 13 May 1994 15:17:33 GMT

   I know that talking about `optimising' in the same sentence as `CLOS'
   sounds a bit strange, but...

Why?

   I'm beginning to play with some float-intensive simulations which I
   will eventually write in non-CLOS CL (because I know I can make CMUCL
   do float stuff fast).  Before I know what I want to do, I'm doing
   stuff in CLOS for all the benefits it gives.  What I plan to do is to
   pull things out of instances and then manipulate them and put them
   back -- i.e. I hope I don't have to do too much method dispatch.  But
   I'm not sure how I should declare things to tell CMUCL what to do.

   Here is a trivial piece of code I wrote to experiment:

       (defclass particle ()
         ;; a free particle in 1 dimension
         ((m :accessor mass :type single-float :initform 0.0 :initarg :mass)
          (q :accessor pos :type single-float :initform 0.0 :initarg :pos)
          (p :accessor mom :type single-float :initform 0.0 :initarg :mom)))

       (declaim (ftype (function (particle) single-float) mass pos mom vel vel2))

       (defgeneric vel (thing)
         ;; Rely on the above DECLAIM
         )
       (defmethod vel ((thing particle))
         (/ (mom thing) (mass thing)))

       (defgeneric vel2 (thing)
         ;; Do things locally
         )
       (defmethod vel2 ((thing particle))
         (let ((mom (mom thing)) (mass (mass thing)))
           (declare (type single-float mom mass))
           (/ mom mass)))

   What I want to happen is that the `/' can be done efficiently, and
   that I can write code without a billion extra declarations.  This
   fragment lets that happen in CMUCL, but obviously I'd rather write
   things like VEL rather than VEL2.

You could declare / or a special version of / to take only single-float
arguments and return a single-float, much like your ftype declaration, above.

You could also declare a macro, /-sf, say, that would write the
declarations for you.
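
A minimal sketch of both suggestions, assuming nothing beyond standard
Common Lisp.  The names /-SF and SF/ are hypothetical, and using a
separate name avoids redeclaring the standard `/' itself.  Note that THE
and DECLARE are promises to the compiler: whether they are checked at run
time depends on the implementation and the safety setting.

```lisp
;; Hypothetical helper names; neither is part of CL or CMUCL.
(defmacro /-sf (x y)
  "Divide X by Y, promising the compiler that all three values are single-floats."
  `(the single-float (/ (the single-float ,x) (the single-float ,y))))

;; The same promise written as an inlinable function.
(declaim (inline sf/))
(defun sf/ (x y)
  (declare (type single-float x y))
  (/ x y))
```

With either helper, a method body such as (/-sf (mom thing) (mass thing))
carries its declarations without a surrounding LET.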

   Questions: (1) is that DECLAIM a reasonable approach to this, given
   that PCL itself doesn't do anything with types that I can see?  Is the
   DECLAIM even *correct* (i.e. is it correct to declare that GFs are
   functions?).

   (2) Is it right that moving the DECLAIM to the start of the file
       (ahead of the DEFCLASS) is going to break the optimisation? -- it
       does in CMUCL.

   (3) Since PCL ignores these type declarations, VEL loses if I stuff a
       non-single-float into one of the slots.  Even worse, it isn't
       picked up by the type checking, it just gets a bus error.  VEL2
       gets a type error which is at least better.  Is there any way
       around this?

You could have CMUCL do all the type checking for you, but then you
wouldn't have the performance you want.  Get the code working before you
optimize it.
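
One way to get a type error instead of a crash while the code is still
changing is to check the slot values explicitly.  The sketch below uses
the standard CHECK-TYPE; CHECKED-VEL is a hypothetical name, and the
class is repeated from the original post so the fragment stands alone.

```lisp
(defclass particle ()
  ((m :accessor mass :type single-float :initform 0.0 :initarg :mass)
   (q :accessor pos :type single-float :initform 0.0 :initarg :pos)
   (p :accessor mom :type single-float :initform 0.0 :initarg :mom)))

;; Development-time variant (hypothetical name): slower, but a bad slot
;; value signals a TYPE-ERROR rather than reaching the float arithmetic.
(defmethod checked-vel ((thing particle))
  (let ((mom (mom thing))
        (mass (mass thing)))
    (check-type mom single-float)
    (check-type mass single-float)
    (/ mom mass)))
```

Once the program works, the checks can be dropped or replaced by plain
declarations in the inner loops that profiling shows to matter.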

   (4) any other ideas?  Is it even reasonable to try and write fastish
       code this way?  No suggestions to switch to C++ please.


Write your application in a very generic way so that you can play with
various optimization strategies without changing your code much.  I.e., use a
layer of macros so you can change your mind about implementation without
changing the code.  For example, if you had a DEFOBJECT macro, you could
switch from standard-objects to structure-objects for any single
inheritance class (CLOS can tell you which those are).  Similarly use
readers like (slot x) rather than (slot-value x 'slot).
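
A rough sketch of what such a DEFOBJECT layer might look like.  Everything
here is an assumption made for illustration: the slot syntax
(accessor default type), the *OBJECT-REPRESENTATION* switch, and the
MAKE-<NAME> constructor convention are not from any library.

```lisp
;; Hypothetical switch, read at macroexpansion time: :class or :struct.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *object-representation* :class))

(defmacro defobject (name &rest slots)
  "Define NAME with SLOTS, each written (accessor default type).
Expands to DEFCLASS or DEFSTRUCT; either way the accessors are the slot
names and the constructor is MAKE-<NAME> taking keyword arguments."
  (let ((maker (intern (concatenate 'string "MAKE-" (symbol-name name)))))
    (ecase *object-representation*
      (:class
       `(progn
          (defclass ,name ()
            ,(loop for (slot default type) in slots
                   collect `(,slot :accessor ,slot
                                   :initarg ,(intern (symbol-name slot) :keyword)
                                   :initform ,default
                                   :type ,type)))
          (defun ,maker (&rest initargs)
            (apply #'make-instance ',name initargs))))
      (:struct
       `(defstruct (,name (:conc-name nil) (:constructor ,maker))
          ,@(loop for (slot default type) in slots
                  collect `(,slot ,default :type ,type)))))))

;; Client code reads the same under either representation.
(defobject particle
  (mass 0.0 single-float)
  (pos  0.0 single-float)
  (mom  0.0 single-float))

(defun vel (p)
  (/ (mom p) (mass p)))
```

Changing the switch to :struct and recompiling from a fresh image swaps
the representation without touching VEL or any other client code.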

The biggest issue will be avoiding floating point consing.  In some Lisps
at least, vel and vel2 will always cons a single-float.  In those Lisps
you're better off making vel a macro.  This way the compiler will be able to
see that it is dealing with floats and not make new ones when it doesn't
need to.  Similarly, when you store a single-float into a slot on an
object, Lisp will cons a float to store there.  If you need to store lots
of numbers that change all the time, use a (simple-array single-float (*)).
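
As an illustration of the array suggestion, here is a sketch that keeps
the state of many particles in specialized vectors and updates positions
in place.  ADVANCE-POSITIONS and its argument layout are hypothetical;
whether the loop really runs without consing depends on the compiler.

```lisp
(defun advance-positions (pos mom mass dt)
  "Destructively add DT * MOM[i] / MASS[i] to each POS[i]; returns POS."
  (declare (type (simple-array single-float (*)) pos mom mass)
           (type single-float dt)
           (optimize speed))
  (dotimes (i (length pos) pos)
    (incf (aref pos i)
          (* dt (/ (aref mom i) (aref mass i))))))
```

The vectors would be created with
(make-array n :element-type 'single-float :initial-element 0.0), so the
floats live unboxed inside the array rather than as separate objects.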

You may also want to crystallize an object oriented representation of your
problem into an efficient implementation.

Don't trust your intuition; profile your program to see where the
bottlenecks are.
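
Implementations usually ship their own profilers, but a crude portable
measurement needs only standard functions.  The helper below is a sketch
with a hypothetical name; for one-off measurements the standard TIME
macro does much the same and typically also reports consing.

```lisp
(defun run-time-seconds (thunk &key (repetitions 1000))
  "Return the CPU seconds spent calling THUNK REPETITIONS times."
  (let ((start (get-internal-run-time)))
    (dotimes (i repetitions)
      (declare (ignorable i))
      (funcall thunk))
    (/ (- (get-internal-run-time) start)
       (float internal-time-units-per-second 1.0d0))))
```

Timing two candidate versions of the same function this way, on realistic
data, gives a direct comparison before committing to either.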

Scott Fahlman's cascor-1 code is a good example of things you can do to
make Lisp as fast as C. ftp: pt.cs.cmu.edu /afs/cs/project/connect/code
cascor-v1.0.3.share.

Contact me separately, if you need more help.

k

--
Ken Anderson 
Internet: ·········@bbn.com
BBN STC              Work Phone: 617-873-3160
10 Moulton St.       Home Phone: 617-643-0157
Mail Stop 6/4c              FAX: 617-873-3776
Cambridge MA 02138
USA
From: Jim McDonald
Subject: Re: Optimising code using CLOS in CMUCL
Date: 
Message-ID: <1994May13.214307.23811@kestrel.edu>
In article <······················@wheaton.bbn.com>, ········@wheaton.bbn.com (Ken Anderson) writes:
|> 
|> You could declare / or a special version of / to take only single-float
|> arguments and return a single-float, much like your ftype declaration, above.
|> 
...
|> 
|> Don't trust your intuition; profile your program to see where the
|> bottlenecks are.

Along these lines, note that on some machines double-float is faster than
single-float, since the internal operations are on double-floats and singles
are handled via extra conversions (all in hardware, but using extra cycles).
It's also plausible that loading/storing a doubleword vs. singleword could 
have unintuitive timing properties, depending on the memory bus, caches, etc.
If you need that last few percent, profile it both ways.  

Also, I think it's legal for the compiler to treat single and double float
the same (e.g. both as double float), so you might want to disassemble some
code to see what's really being generated for various declarations.
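
A small sketch of that check using only standard functions.  SF-DIV,
DF-DIV and DISTINCT-FLOAT-FORMATS-P are hypothetical names, and the
output of DISASSEMBLE is entirely implementation-specific.

```lisp
(defun sf-div (x y)
  (declare (type single-float x y))
  (/ x y))

(defun df-div (x y)
  (declare (type double-float x y))
  (/ x y))

;; Print the generated code for each version and compare the listings.
(disassemble 'sf-div)
(disassemble 'df-div)

;; Portable check: if the implementation merges the two formats,
;; a single-float literal is also of type DOUBLE-FLOAT.
(defun distinct-float-formats-p ()
  (not (typep 1.0f0 'double-float)))

;; Mantissa digits of each format, another hint about the representations.
(list (float-digits 1.0f0) (float-digits 1.0d0))
```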


-- 
James McDonald
Kestrel Institute                       ········@kestrel.edu
3260 Hillview Ave.                      (415) 493-6871 ext. 339
Palo Alto, CA 94304                fax: (415) 424-1807