From: Peter C Olsen
Subject: How can I write acceptably fast code in CLISP or AKCL (or scheme)?
Date: 
Message-ID: <1994Apr24.170153.17490@super.org>
I'm writing to ask for some ideas on how to write "acceptably fast"
numerical analytic code in either CLISP or AKCL.  I would also be
interested in suggestions about other public-domain (or GPL-like) lisp
(or scheme) interpreters or compilers.

Here is some background.  I've been tasked with performing some
exploratory time-series/signal-processing analysis on some very large
experimental datasets --- from about 10 million samples on up.  My
first sets are 8-bit signed integers, although this may change later
on.

In the past, I've done this by plotting segments of data, finding
patterns, writing C programs to find the patterns elsewhere, looking
at more segments, finding more patterns, etc.  Sometimes I can use
tools like Ptolemy (a DSP package from UC Berkeley), but usually I wind up
having to write a suite of special-purpose C programs to do most of
the work --- a fact which diverts too much time from "analysis" to
"programming."

Now I'd like to redress that balance by turning to lisp, at least in
the preliminary stages when most of my programs are written to be used
once and thrown away.  Unfortunately, my first few attempts in CLISP
yielded programs which took only minutes to write but HOURS to run ---
after I compiled them.  I'm happy that I could design and code the
programs so quickly, but their run-time is so long that I can't use
them.

Now for some specific questions:

	1.  Is AKCL faster than CLISP?  How about CMULISP?  Scheme->C?
I need a PD/GPL-like lisp because I may want to run on several
platforms, from PCs to mainframes.

	2.  I have easy access to both an SGI-IRIS and a Sun SPARC-20.
Is one better than the other?

	3.  My data is collected as a large binary file, one point per
byte.  I've been reading it directly, one byte at a time with
"read-byte", but only because I can't figure out how to do it better.
Is there an analog to the C function "fread" to allow block-reads?
Alternatively, would it be faster to translate the data to
ascii-character representation (thereby expanding it manyfold) and
read it that way?

I would appreciate any help.  I like lisp, and I want to use it for
some serious work, but I won't be able to unless I can make it run
much faster than I have been able to do so far.

Peter


-- 
   Peter Olsen, n2ell, ·······@super.super.org  ...!uunet!super!pcolsen
         P.O. Box 410, Simpsonville, MD 21150-0410; 410-997-8584
     "Engineering is the art of applying a professional knowledge of
   mathematics and the physical sciences to improve the quality of life"

From: Henry G. Baker
Subject: Re: How can I write acceptably fast code in CLISP or AKCL (or scheme)?
Date: 
Message-ID: <hbakerCotnzM.663@netcom.com>
In article <······················@super.org> ·······@super.org (Peter C Olsen) writes:
>I'm writing to ask for some ideas on how to write "acceptably fast"
>numerical analytic code in either CLISP or AKCL.  I would also be
>interested in suggestions about other public-domain (or GPL-like) lisp
>(or scheme) interpreters or compilers.

I don't know CLISP, but I know KCL.  You may want to treat KCL as the
'driver' program for some C functions that do much of the dirty work.
The advantage is that you can continue to use Lisp as the driver to
perform the tests.

The interface between KCL and C is extremely efficient, and not difficult
to work with, once you understand the basic KCL array format.  Use KCL
to allocate and manage the arrays, but C to do the calculations.

There's no inherent reason why KCL couldn't compile array hacking code
into good C, but it currently requires so many declarations that it's
probably easier to code in C directly.

I haven't tested it, but 'read-byte' is probably quite slow, so you
might want to use C functions for reading, as well.

It is very easy to declare C functions which are callable from Lisp.
KCL also allows incremental loading of C functions (but you have to
be a little careful), so you can do Lisp-style incremental development.
(Obsolete versions of C functions take up virtual memory, but will no
longer be called.)
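
I haven't tried this exact code, so take it as a sketch rather than
working source (the names c_sum_bytes and sum-bytes are made up, and
you should check the KCL manual for the precise Clines/defentry syntax
and h/object.h for the structure layout), but the flavor is:

;; Sketch only: a C function defined with Clines, made callable
;; from Lisp with defentry.
(Clines "
static int c_sum_bytes(buf, n)
object buf;                  /* a Lisp string passed in as an object  */
int n;
{
  int i, s = 0;
  char *p = buf->st.st_self; /* raw data pointer; check h/object.h    */
  for (i = 0; i < n; i++) s += p[i];
  return s;
}
")

(defentry sum-bytes (object int) (int "c_sum_bytes"))

;; (sum-bytes my-string (length my-string)) would then run the loop in C.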

You should be able to take advantage of the best of both worlds.
From: Jeff Dalton
Subject: Re: How can I write acceptably fast code in CLISP or AKCL (or scheme)?
Date: 
Message-ID: <Cottv5.D6o@cogsci.ed.ac.uk>
In article <······················@super.org> ·······@super.org (Peter C Olsen) writes:
>
>I'm writing to ask for some ideas on how to write "acceptably fast"
>numerical analytic code in either CLISP or AKCL.  I would also be
>interested in suggestions about other public-domain (or GPL-like) lisp
>(or scheme) interpreters or compilers.

You can get reasonably efficient code in many cases in AKCL
by adding declarations.  There are some examples in the documentation.

Now, rather than wondering whether your declarations have had any effect
and trying to figure it out via benchmarking, you can look at the
emitted code.  This is easier for AKCL than for most other Lisps,
because AKCL compiles to native code _via C_.  (That is, it emits C
code and then calls the C compiler.)  The C code is moderately easy to
understand (especially compared to assembler output), once you get
used to it.  

You don't have to worry about all the details, but it may be worth
figuring out enough of it so that you can tell how well your
declarations are working.  Here's some information that may help:

----------------------------------------------------------------------

I. In-line operations, declarations and type-specific arithmetic in AKCL

Since the [A]KCL compiler emits C code, you can often understand
what it is doing by looking at the emitted code.  If you define
a function in Lisp and call DISASSEMBLE on its name, you will
see the code that would be compiled for that function.  If you
want to see the C code for a whole file, call COMPILE-FILE with
:C-FILE T as a keyword parameter.
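
For example (the file name here is just a placeholder):

(compile-file "analysis.lsp" :c-file t)  ; should leave analysis.c behind
(disassemble 'sum-row)                   ; prints the C for one function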

You can see how a built-in function will be compiled in-line by
looking at the property-list of the function name.  For instance:

>(symbol-plist '+)
(COMPILER::INLINE-ALWAYS
    (((COMPILER::FIXNUM-FLOAT COMPILER::FIXNUM-FLOAT) LONG-FLOAT NIL
      NIL "(double)(#0)+(double)(#1)")
     ((COMPILER::FIXNUM-FLOAT COMPILER::FIXNUM-FLOAT) SHORT-FLOAT NIL
      NIL "(double)(#0)+(double)(#1)")
     ((FIXNUM FIXNUM) FIXNUM NIL NIL "(#0)+(#1)")
     ((T T) T NIL T "number_plus(#0,#1)"))
    COMPILER::RETURN-TYPE T COMPILER::ARG-TYPES (*) COMPILER::LFUN
    "Lplus")

INLINE-ALWAYS always applies.  There's also INLINE-SAFE and -UNSAFE
which depend on the OPTIMIZE declaration.  An entry like this

((FIXNUM FIXNUM) FIXNUM NIL NIL "(#0)+(#1)")

means that when a fixnum is added to a fixnum and the result is known
to be a fixnum, the compiler emits "(#0)+(#1)" as the C code.  So this is ordinary
C addition.  "number_plus(#0,#1)", on the other hand, will be a
procedure call.

More information on inline compilation can be found by looking
at the comments in cmpnew/cmpinline.lsp in the sources.

Now, to actually get fixnum addition, you need to make sure the
compiler knows the argument types **and the result type**.  
(That much is true of CL in general.)  You can do all of that
via a macro:

(defmacro fix+ (i j)
  `(the fixnum (+ (the fixnum ,i) (the fixnum ,j))))

I have a file containing such macros for all of the arithmetic
operations I need.

However, you can also declare variable types, adding THE forms
to specify the result type.  For instance:

>(defun f (i j)
   (declare (fixnum i j))
   (let ((k (the fixnum (+ i j))))
     (declare (fixnum k))
     (the fixnum (* k k))))
F

To see how this would compile, type (disassemble 'f).

Here is the most relevant part of what will result:

/*	function definition for F	*/

static L1()
{register object *base=vs_base;
	register object *sup=base+VM1; VC1
	vs_check;
	{int V1;
	int V2;
	V1=fix(base[0]);
	V2=fix(base[1]);
	vs_top=sup;
TTL:;
	{int V3;
	V3= (V1)+(V2);
	base[2]= CMPmake_fixnum((V3)*(V3));
	vs_top=(vs_base=base+2)+1;
	return;}
	}
}

V1, V2, and V3 (V for "value") are ordinary C ints.  So declaring i,
j, k to be fixnums causes them to be allocated as C ints as well as
telling the compiler how to perform arithmetic on them more efficiently.
Struct slots and array elements that are identified as fixnums (by :type
for a slot or :element-type for an array) will also be allocated
specially.
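
For instance, declarations like these (not from my original code, just
to show what I mean by :type and :element-type):

(defstruct point
  (x 0 :type fixnum)
  (y 0 :type fixnum))

(make-array 1000 :element-type 'fixnum)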

Single- and double-precision floats can also be specially allocated.

"V1=fix(base[0])" accesses the first arg to the function and
extracts a C int from it.

"V3= (V1)+(V2)" is ordinary C addition.

"base[2]= CMPmake_fixnum((V3)*(V3))" does ordinary C multiplication
using "(V3)*(V3)" then constructs a Lisp fixnum object via
CMPmake_fixnum.  The object is put on the AKCL "value stack" by
assigning it to base[2].  That's how it gets returned as a
result.

CMPmake_fixnum is a C macro defined like this (see h/cmpinclude.h
in the sources):

#define CMPmake_fixnum(x) \
((((FIXtemp=(x))+1024)&-2048)==0?small_fixnum(FIXtemp):make_fixnum(FIXtemp))

What this amounts to is: if the number is small enough (with this mask,
between -1024 and 1023), use a preallocated fixnum object; otherwise
allocate a new one.


II. Fixnum iterations

It's sometimes difficult to get a CL compiler to actually use
fixnum arithmetic in a DOTIMES.  I find the following macro useful:

(defmacro fix-dotimes ((var count &optional (result nil))
		       &body body)
  (let ((count-var (gensym)))
    `(let ((,count-var ,count))
       (declare (fixnum ,count-var))
       (do ((,var 0 (fix+ ,var 1)))
	   ((fix>= ,var ,count-var)
	    ,result)
	 (declare (fixnum ,var))
	 ,@body))))
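
As a toy example of how it's used (SUM-SQUARES is just made up here,
using the FIX+ and FIX>= macros shown elsewhere in this message):

(defun sum-squares (n)
  (declare (fixnum n))
  (let ((sum 0))
    (declare (fixnum sum))
    (fix-dotimes (i n sum)
      (setq sum (fix+ sum (the fixnum (* i i)))))))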


III. An array example

;;; Fixnum arithmetic

(defmacro fix+ (i j)
  `(the fixnum (+ (the fixnum ,i) (the fixnum ,j))))

(defmacro fix>= (i j)
  `(>= (the fixnum ,i) (the fixnum ,j)))

;;; Double-float arithmetic

(defmacro df+ (x y)
  `(the double-float (+ (the double-float ,x) (the double-float ,y))))

;;; 2-D arrays

(defmacro numrows (mat)
  `(array-dimension ,mat 0))

(defmacro numcols (mat)
  `(array-dimension ,mat 1))

;;; 2-D arrays of double-floats

(deftype df-matrix (&optional rows cols)
  `(array double-float (,rows ,cols)))

;;; Summing a row

(defun sum-row (mat r)
  (declare (type df-matrix mat)
	   (type fixnum r))
  (let ((cols (numcols mat)))
    (declare (type fixnum cols))
    (do ((sum 0d0 (df+ sum (aref mat r c)))
	 (c   0   (fix+ c 1)))
	((fix>= c cols)
	 sum)
      (declare (type fixnum c)
	       (type double-float sum)))))
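
Toy usage (again, not real code -- just to show the calling pattern):

(defvar *m* (make-array '(3 4) :element-type 'double-float
                        :initial-element 1d0))

(sum-row *m* 0)    ; => 4.0 (four columns of 1d0)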


IV. OPTIMIZE etc.

You can usually get away with saying:

#+akcl (proclaim '(optimize (safety 0)))

But even SAFETY 2 is not too bad.  SAFETY 3 (I think) is very slow.

There's also a global choice about how functions are called.  It
affects whether a call saves backtrace information.  For fast calls,
evaluate (use-fast-links t).  If you want backtrace information,
evaluate (use-fast-links nil).  There's some more about this in the
doc directory in the sources.  

use-fast-links is something you do at the top-level.  E.g., run
AKCL and type: (use-fast-links nil).  Or you can put it in an
init.lsp file or whatever.
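
E.g. an init.lsp along these lines (I believe [A]KCL loads init.lsp
from the current directory at startup, but check your installation):

;; init.lsp
(proclaim '(optimize (safety 0)))
(use-fast-links t)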

----------------------------------------------------------------------

>Here is some background.  I've been tasked with performing some
>exploratory time-series/signal-processing analysis on some very large
>experimental datasets --- from about 10 million samples on up.  My
>first sets are 8-bit signed integers, although this may change later
>on.

I don't know whether AKCL has any direct support for 8-bit signed
integers.
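
One quick way to find out is to ask for such an array and see what the
implementation actually gives you back:

;; If ARRAY-ELEMENT-TYPE reports T (or some wider type) rather than
;; (SIGNED-BYTE 8), the array isn't being specialized.
(array-element-type
  (make-array 10 :element-type '(signed-byte 8)))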

>[...]   Unfortunately, my first few attempts in CLISP
>yielded programs which took only minutes to write but HOURS to run ---
>after I compiled them.  I'm happy that I could design and code the
>programs so quickly, but their run-time is so long that I can't use
>them.

How long do you think it would take a C program?

Can you tell where the Lisp is spending its time?  For instance, how
much of it is I/O?  (See below.)
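
A crude way to check the I/O part (the file name is made up; adjust the
element type to match your data):

(time
  (with-open-file (s "data.bin" :element-type '(unsigned-byte 8))
    (do ((b (read-byte s nil nil) (read-byte s nil nil)))
        ((null b)))))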

>Now for some specific questions:
>
>	1.  Is AKCL faster than CLISP?  How about CMULISP?  Scheme->C?
>I need a PD/GPL-like lisp because I may want to run on several
>platforms, from PCs to mainframes.

AKCL is probably faster than CLISP.  I'm not sure whether CLISP
compiles to native code.  I think it doesn't.  KCL does.  CMULISP
may be faster than AKCL, but (1) it's bigger, (2) it runs on far
fewer OS/machine combinations, (3) it's much harder to port to
new OS/machine combinations, and (4) its future is uncertain.

>	2.  I have easy access to both an SGI-IRIS and a Sun SPARC-20.
>Is one better than the other?

Don't know.

>	3.  My data is collected as a large binary file, one point per
>byte.  I've been reading it directly, one byte at a time with
>"read-byte", but only because I can't figure out how to do it better.
>Is there an analog to the C function "fread" to allow block-reads?
>Alternatively, would it be faster to translate the data to
>ascii-character representation (thereby expanding it manyfold) and
>read it that way?

I would expect read-byte to be buffered, but I'm not sure what
AKCL actually does.  It might be worth writing some C code just
to read the data.

-- jeff

Jeff Dalton,
AI Applications Institute,                               ········@ed.ac.uk
Edinburgh University.
From: Richard A. O'Keefe
Subject: Re: How can I write acceptably fast code in CLISP or AKCL (or scheme)?
Date: 
Message-ID: <2pio52$p6p@goanna.cs.rmit.oz.au>
·······@super.org (Peter C Olsen) writes:
>I'm writing to ask for some ideas on how to write "acceptably fast"
>numerical analytic code in either CLISP or AKCL.  I would also be
>interested in suggestions about other public-domain (or GPL-like) lisp
>(or scheme) interpreters or compilers.

I'm a happy user of XLISP-STAT, a rather nice package for doing statistics.
It's available for Macs and for UNIX+X systems.
How does it get acceptable performance when XLISP (which it is built on)
is only an interpreter?
The computational and graphic kernels are written in C, and the Lisp
code just figures out what needs doing.

>Here is some background.  I've been tasked with performing some
>exploratory time-series/signal-processing analysis on some very large
>experimental datasets --- from about 10 million samples on up.  My
>first sets are 8-bit signed integers, although this may change later
>on.

Mind you, 10 000 000 samples is a lot of samples.
If you stored that many numbers in a vector _as numbers_ (at, say, four
bytes apiece) you'd need 40 Mbytes of memory to hold them, so managing
the data is going to take
careful work whatever language you use.

For comparison: I've got the TSPACK book (code in Fortran), and despite
Fortran's fame as a number-crunching language, (a) you can't read
signed bytes in Fortran and (b) that's more data than TSPACK was
designed for.

-- 
Richard A. O'Keefe; ··@goanna.cs.rmit.oz.au; RMIT, Melbourne, Australia.
"C is quirky, flawed, and an enormous success."  -- Dennis M. Ritchie.