Common Lisp implementations are still multiple times slower than C

From: Frank Buss
Subject: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 04:51:45 +0000
Message-ID: <1v6fny4g59bkb.fhal0z83t2j$.dlg@40tude.net>

I've tested my cellular automaton from
http://www.frank-buss.de/automaton/arabeske.html again on my current
computer with C and some Common Lisp implementations on Windows:

Time for calculating 1000 generations:

Visual Studio C++ 7.1.3088: 1.9s
sbcl-0.9.9-x86-win32: 29s
LispWorks Personal Edition 4.4.6 and 4.3.7: 172s
CLISP-2.3.8: 518s

I'm interested in the timings for other Lisp implementations. You can use
the C version from http://www.frank-buss.de/tmp/arabeske.zip for a time
reference (or use the CLISP version).

In another thread in this newsgroup some people wrote that Common Lisp can
be as fast as C. This might be possible, but looks like no current
implementation reachs this goal. And I know, it's unfair to compare C and
Lisp with such low-level tasks, because Lisp is better for high-level
tasks, but would be nice to do fast image processing like in
http://www.frank-buss.de/lisp/texture.html in Lisp, only.

The code I've used for testing:

Lisp:

(defun generations (count file) 
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )" 
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0)))
  (let* ((width 640) 
         (height 480) 
         (ca-width (+ width 2)) 
         (ca-height (+ height 2)) 
         (len (* ca-width ca-height)) 
         (ca1 (make-array len :element-type '(unsigned-byte 8)
:initial-element 0)) 
         (ca2 (make-array len :element-type '(unsigned-byte 8)
:initial-element 0))) 
    (declare (type fixnum count)
             (type fixnum width)
             (type fixnum height)
             (type fixnum ca-width)
             (type fixnum ca-height)
             (type fixnum len)
             (type (simple-array (unsigned-byte 8) *) ca1)
             (type (simple-array (unsigned-byte 8) *) ca2))

    ;; seed the cellular automaton space 
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1) 

    ;; calcuclate the generation "count" 
    (loop for c fixnum from 1 to count do 
          (loop for y fixnum from 0 below height 
                with i fixnum = (1+ ca-width) do 
                (loop for x fixnum from 0 below width do 
                      (let ((cell (aref ca1 i))) 
                        (if (zerop cell) 
                            ;; summing the 8 neighbours 
                            (let ((sum (+ (aref ca1 (1+ i)) 
                                          (aref ca1 (1- i)) 
                                          (aref ca1 (+ i ca-width -1)) 
                                          (aref ca1 (+ i ca-width)) 
                                          (aref ca1 (+ i ca-width 1)) 
                                          (aref ca1 (- i ca-width -1)) 
                                          (aref ca1 (- i ca-width)) 
                                          (aref ca1 (- i ca-width 1))))) 
                              (when (= sum 1) (setf cell 1))) 
                          (if (= cell 62) 
                              (setf cell 0) 
                            (incf cell))) 
                        (setf (aref ca2 i) cell) 
                        (incf i))) 
                (incf i 2)) 
          (rotatef ca1 ca2)) 

    ;; save it 
    (with-open-file (stream file :direction :output 
                            :element-type '(unsigned-byte 8) 
                            :if-exists :supersede 
                            :if-does-not-exist :create) 
      (loop for y from 0 below ca-height 
            with i = 0 do 
            (loop for x from 0 below ca-width do 
                  (let ((cell (aref ca1 i))) 
                    (write-byte cell stream)) 
                  (incf i)))))) 

(defun test ()
  (generations 1000 "c:/tmp/test.txt"))


C:

#include <stdio.h> 
#include <stdlib.h> 
#include <memory.h> 

#define WIDTH 640 
#define HEIGHT 480 
#define CA_WIDTH (WIDTH + 2) 
#define CA_HEIGHT (HEIGHT + 2) 

int main(int argc, char** argv) 
{ 
        /* a generations CA (see http://www.mirekw.com/ca/rullex_gene.html
) */ 
        unsigned char *ca1, *ca2, *tmp; 
        int count; 
        FILE* f; 
        int c, x, y; 


        if (argc != 3) { 
                printf("usage: arabeske count output-file\n"); 
                return 1; 
        } 


        count = atoi(argv[1]); 
        ca1 = (unsigned char*) malloc(CA_WIDTH*CA_HEIGHT); 
        ca2 = (unsigned char*) malloc(CA_WIDTH*CA_HEIGHT); 
        memset(ca1, 0, CA_WIDTH*CA_HEIGHT); 
        memset(ca2, 0, CA_WIDTH*CA_HEIGHT); 


    /* seed the cellular automaton space */ 
        ca1[CA_WIDTH*CA_HEIGHT/2-CA_WIDTH-4] = 1; 


    /* calcuclate the generation "count" */ 
        for (c = 0; c < count; c++) { 
                int i = CA_WIDTH + 1; 
                for (y = 0; y < HEIGHT; y++) { 
                        for (x = 0; x < WIDTH; x++) { 
                                int cell = ca1[i]; 
                                if (cell) { 
                                        if (++cell == 63) cell = 0; 
                                } else { 
                                        /* summing the 8 neighbours */ 
                                        int sum = ca1[i + 1] + 
                                                        ca1[i - 1] + 
                                                        ca1[i + CA_WIDTH -
1] + 
                                                        ca1[i + CA_WIDTH] + 
                                                        ca1[i + CA_WIDTH +
1] + 
                                                        ca1[i - CA_WIDTH -
1] + 
                                                        ca1[i - CA_WIDTH] + 
                                                        ca1[i - CA_WIDTH +
1]; 
                                        if (sum == 1) cell = 1; 
                                } 
                                ca2[i] = cell; 
                                i++; 
                        } 
                        i += 2; 
                } 
                tmp = ca1; 
                ca1 = ca2; 
                ca2 = tmp; 
        } 


        /* save it */ 
        f = fopen(argv[2], "wb"); 
        if (f) { 
                fwrite(ca1, CA_WIDTH * CA_HEIGHT, 1, f); 
                fclose(f); 
        } 

        return 0; 
} 


-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

Re: Common Lisp implementations are still multiple times slower than C Jon Harrop
- Re: Common Lisp implementations are still multiple times slower than C Frank Buss
  - Re: Common Lisp implementations are still multiple times slower than C Nicolas Neuss
    - Re: Common Lisp implementations are still multiple times slower than C Nicolas Neuss
- Re: Common Lisp implementations are still multiple times slower than C funkyj
  - Re: Common Lisp implementations are still multiple times slower than C Jon Harrop
Re: Common Lisp implementations are still multiple times slower than C Kaz Kylheku
Re: Common Lisp implementations are still multiple times slower than C Ken Tilton
- Re: Common Lisp implementations are still multiple times slower than C Frank Buss
  - Re: Common Lisp implementations are still multiple times slower than C Ken Tilton
Re: Common Lisp implementations are still multiple times slower than C Frank Buss
- Re: Common Lisp implementations are still multiple times slower than C j.k.
  - Re: Common Lisp implementations are still multiple times slower than C Frank Buss
- Re: Common Lisp implementations are still multiple times slower than C j.k.
  - Re: Common Lisp implementations are still multiple times slower than C j.k.
  - Re: Common Lisp implementations are still multiple times slower than C Marko Kocic
  - Re: Common Lisp implementations are still multiple times slower than C funkyj
Re: Common Lisp implementations are still multiple times slower than C Rainer Joswig
- Re: Common Lisp implementations are still multiple times slower than C Christophe Rhodes
- Re: Common Lisp implementations are still multiple times slower than C Thomas Schilling
  - Re: Common Lisp implementations are still multiple times slower than C Rainer Joswig
    - Re: Common Lisp implementations are still multiple times slower than C Thomas Schilling
  - Re: Common Lisp implementations are still multiple times slower than C Juho Snellman
- Re: Common Lisp implementations are still multiple times slower than C Frank Buss
  - Re: Common Lisp implementations are still multiple times slower than C Pascal Costanza
    - Re: Common Lisp implementations are still multiple times slower than C Frank Buss
  - Re: Common Lisp implementations are still multiple times slower than C Rainer Joswig
    - Re: Common Lisp implementations are still multiple times slower than C Frank Buss
      - Re: Common Lisp implementations are still multiple times slower than C ······@corporate-world.lisp.de
        Re: Common Lisp implementations are still multiple times slower than C John Thingstad
        Re: Common Lisp implementations are still multiple times slower than C John Thingstad
    - Re: Common Lisp implementations are still multiple times slower than C Marko Kocic
      - Re: Common Lisp implementations are still multiple times slower than C funkyj
        Re: Common Lisp implementations are still multiple times slower than C Marko Kocic
      - Re: Common Lisp implementations are still multiple times slower than C Joerg Hoehle
  - Re: Common Lisp implementations are still multiple times slower than C Thomas Schilling
    - Re: Common Lisp implementations are still multiple times slower than C Frank Buss
  - Re: Common Lisp implementations are still multiple times slower than C Douglas Crosher
Re: Common Lisp implementations are still multiple times slower than C John Thingstad
- Re: Common Lisp implementations are still multiple times slower than C Frank Buss
Re: Common Lisp implementations are still multiple times slower than C Wade Humeniuk
- Re: Common Lisp implementations are still multiple times slower than C Wade Humeniuk
Re: Common Lisp implementations are still multiple times slower than C Dmitry Gorbatovsky
Re: Common Lisp implementations are still multiple times slower than C bradb
- Re: Common Lisp implementations are still multiple times slower than C Kaz Kylheku
  - Re: Common Lisp implementations are still multiple times slower than C Rob Thorpe
    - Re: Common Lisp implementations are still multiple times slower than C Kaz Kylheku
      - Re: Common Lisp implementations are still multiple times slower than C Rob Thorpe
        Re: Common Lisp implementations are still multiple times slower than C Kaz Kylheku
    - Re: Common Lisp implementations are still multiple times slower than C Michael Livshin
- Re: Common Lisp implementations are still multiple times slower than C Ken Tilton
  - Re: Common Lisp implementations are still multiple times slower than C bradb
- Re: Common Lisp implementations are still multiple times slower than C ·······@gmail.com
  - Re: Common Lisp implementations are still multiple times slower than C bradb
  - Re: Common Lisp implementations are still multiple times slower than C bradb
Re: Common Lisp implementations are still multiple times slower than C Antonio Menezes Leitao
Re: Common Lisp implementations are still multiple times slower than C Shyamal Prasad
- Re: Common Lisp implementations are still multiple times slower than C Frank Buss
  - Re: Common Lisp implementations are still multiple times slower than C HB
    - Re: Common Lisp implementations are still multiple times slower than C bradb
      - Re: Common Lisp implementations are still multiple times slower than C HB
  - Re: Common Lisp implementations are still multiple times slower than C ·············@verizon.net
- Re: Common Lisp implementations are still multiple times slower than C Peter Seibel
  - Re: Common Lisp implementations are still multiple times slower than C Christophe Rhodes
    - Re: Common Lisp implementations are still multiple times slower than C ··············@ericsson.com

From: Jon Harrop
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sat, 25 Mar 2006 15:32:30 +0000
Message-ID: <4426b410$0$3599$ed2e19e4@ptn-nntp-reader04.plus.net>

Frank Buss wrote:
> In another thread in this newsgroup some people wrote that Common Lisp can
> be as fast as C. This might be possible, but looks like no current
> implementation reachs this goal. And I know, it's unfair to compare C and
> Lisp with such low-level tasks, because Lisp is better for high-level
> tasks, but would be nice to do fast image processing like in
> http://www.frank-buss.de/lisp/texture.html in Lisp, only.

Neuss recently posted about a simple stencil operation, giving C and Lisp
code. I found the Lisp code to be faster than the C code on my AMD64.

If you want to do those kinds of operations efficiently then use OpenGL and
do them on the GPU.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 16:37:22 +0000
Message-ID: <1dhko7ays1o3t$.1aeoz0segvs0m$.dlg@40tude.net>

Jon Harrop wrote:

> Neuss recently posted about a simple stencil operation, giving C and Lisp
> code. I found the Lisp code to be faster than the C code on my AMD64.

can you post a reference? I've read something about this in some optimizing
technics in the links Rainer posted and C was slower when the size of the
stencil area was not known at compile time, because then the Lisp program
can compile code at runtime for the size. With static sizes C was always
faster.

> If you want to do those kinds of operations efficiently then use OpenGL and
> do them on the GPU.

Doing it in C is easier and faster (at least with my graphics card) :-)

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Nicolas Neuss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 09:02:57 +0000
Message-ID: <87k6agqmym.fsf@ortler.iwr.uni-heidelberg.de>

Frank Buss <··@frank-buss.de> writes:

> Jon Harrop wrote:
>
>> Neuss recently posted about a simple stencil operation, giving C and Lisp
>> code. I found the Lisp code to be faster than the C code on my AMD64.
>
> can you post a reference? I've read something about this in some optimizing
> technics in the links Rainer posted and C was slower when the size of the
> stencil area was not known at compile time, because then the Lisp program
> can compile code at runtime for the size. With static sizes C was always
> faster.

Here is the message Jon Harrop is refering too:

http://groups.google.com/group/comp.lang.lisp/msg/3cd599c51ea91c3d

It has some links to code.  Test it for yourself.

Nicolas.

From: Nicolas Neuss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 09:19:27 +0000
Message-ID: <87fyl3q63k.fsf@ortler.iwr.uni-heidelberg.de>

Nicolas Neuss <·············@iwr.uni-heidelberg.de> writes:

> Here is the message Jon Harrop is refering too:
>
> http://groups.google.com/group/comp.lang.lisp/msg/3cd599c51ea91c3d
>
> It has some links to code.  Test it for yourself.
>
> Nicolas.

Hmm, I could not resist in trying to adapt my stencil code to this
situation.  Compiled with CMUCL it runs within a factor of 1.5-2 of the
speed of your C-code.  I optimized a little bit more by not performing the
additions for points where the previous point already contained a number
larger than 1.

To the OP: please check if it still does the same as your code.

Food for thought: maybe one could do at least some of the work with 32bit
additions.  However, the algorithm would be even more complicated.

Yours, Nicolas.

Here is the code:

;;; This codes generates and compiles stencil application functions for
;;; arbitrary stencils.  For simplicity, we assume Dirichlet boundary
;;; conditions.

(defun compress-stencil (width stencil)
  "Transforms a 2D-stencil given as array in a sparse form where entries
with equal value are put together."
  (let ((sparse-stencil ()))
    (dotimes (i 3)
      (dotimes (j 3)
	(let ((value (aref stencil i j)))
	  (unless (zerop value)
	    (let ((pos (+ (1- i) (* (1- j) width)))
		  (entry-group (assoc value sparse-stencil)))
	      (if entry-group
		  (push pos (cdr entry-group))
		  (push (list value pos) sparse-stencil)))))))
    sparse-stencil))

(defun apply-stencil-code (width height stencil)
  "Code for applying a stencil to an array of given width and height."
  (let ((cstencil (compress-stencil width stencil)))
    `(lambda (result values)
      (declare (type (simple-array (unsigned-byte 8) (*)) result values))
      (declare (optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0)))
      (loop
       for pos1 of-type (integer ,width ,(* width height))
       from ,width below ,(* width (1- height)) by ,width do
       (loop
	for pos2 of-type (integer ,width ,(* width height))
	from (1+ pos1) below (+ pos1 ,(1- width))
	for value of-type fixnum = (aref values pos2)
	and previous of-type fixnum = (aref values pos1) then value
	do
	(setf (aref result pos2)
	      (if (zerop value)
		  (if (> previous 1)
		      0
		      (if (= 1 (+
				,@(loop
				   for (value . indices) in cstencil collecting
				   `(* ,value
				     (+ ,@(loop for index in indices collecting
						`(aref values (+ pos2 ,index))))))))
			  1
			  0))
		  (if (= value 63)
		      0
		      (1+ value)))))))))

(defun stencil-application-function (width height stencil)
  "Returns a function for a stencil application."
  (compile nil (apply-stencil-code width height stencil)))

(defun test (stencil width height &optional (count (/ 100000000 (* width height))))
  "Applies stencil count times to an width * height grid function."
  (let ((entries (make-array (* width height) :initial-element 0 :element-type '(unsigned-byte 8)))
	(entries2 (make-array (* width height) :initial-element 0 :element-type '(unsigned-byte 8)))
	(stencil-function (stencil-application-function width height stencil))
	(middle (/ (* (1+ width) height) 2)))
    (setf (aref entries middle) 1)
    (time
     (dotimes (i count)
       (funcall stencil-function entries2 entries)
       (rotatef entries entries2)
       ))))

(defparameter *nine-point-stencil*
  #2A((1 1 1)
      (1 0 1)
      (1 1 1)))

(test *nine-point-stencil* 642 482 1000)

From: funkyj
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 17:57:44 +0000
Message-ID: <1143568664.218877.232580@t31g2000cwb.googlegroups.com>

So where is the Ocaml implementation of the OP's program?  How about a
Stalin implementation?

From: Jon Harrop
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 01:09:59 +0000
Message-ID: <4429de6e$0$3608$ed2e19e4@ptn-nntp-reader04.plus.net>

funkyj wrote:
> So where is the Ocaml implementation of the OP's program?

Here's an OCaml version:

let width = 640 and height = 480
let ca_width = width+2 and ca_height = height+2

let count, file = match Sys.argv with [|_; n; f|] -> int_of_string n, f
  | _ ->
      print_endline "Usage: arabeske <count> <filename>";
      exit 1

let ca1 = Array.make (ca_width*ca_height) 0
let ca2 = Array.make (ca_width*ca_height) 0

let rec loop ca1 ca2 n =
  if n=0 then () else
    let i = ref (ca_width+1) in
    for y=0 to height-1 do
      for x=0 to width-1 do
        ca2.(!i) <-
          (match ca1.(!i) with
             62 -> 0
           | 0 ->
               if ca1.(!i-ca_width-1)+ca1.(!i-ca_width)+ca1.(!i-ca_width+1)+
                 ca1.(!i+1)+ca1.(!i-1)+
                 ca1.(!i+ca_width-1)+ca1.(!i+ca_width)+ca1.(!i+ca_width+1)=1
then
                   1
               else 0
           | cell -> cell + 1);
        incr i;
      done;
      i := !i + 2;
    done;
    loop ca2 ca1 (n-1)

let () =
  ca1.(ca_width*ca_height/2 - ca_width - 4) <- 1;

  loop ca1 ca2 count;

  let ch = open_out file in
  Array.iter (fun i -> output_char ch (Char.chr i)) ca1;
  close_out ch

On my 1.2GHz Athlon T-bird I get:

$ ocamlopt -unsafe -inline 100 arabeske2.ml -o arabeske2
$ time ./arabeske2 1000 output.pgm

real    0m18.159s
user    0m16.520s
sys     0m0.160s

Compared to:

$ time java -server -Xms16M Arabeske 1000 output.txt

real    0m16.874s
user    0m14.100s
sys     0m0.130s

So this OCaml implementation (with bounds checking turned off!) is still
slightly slower than the Java.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html

From: Kaz Kylheku
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 05:36:52 +0000
Message-ID: <1143351412.271184.179930@j33g2000cwa.googlegroups.com>

Frank Buss wrote:
> I've tested my cellular automaton from
> http://www.frank-buss.de/automaton/arabeske.html again on my current
> computer with C and some Common Lisp implementations on Windows:
>
> Time for calculating 1000 generations:
>
> Visual Studio C++ 7.1.3088: 1.9s
> sbcl-0.9.9-x86-win32: 29s
> LispWorks Personal Edition 4.4.6 and 4.3.7: 172s
> CLISP-2.3.8: 518s

There are still a few things in your Lisp code that could be declared.
That big + addition isn't declared as coming out a fixnum using (the
...). There are some undeclared locals like cell and sum. Maybe the
compiler isn't smart enough to realize that adding together eight
(unsigned 8) values produces a result that fits into fixnum. Try
changing

  (aref ca1 (+ i ca-width -1))

to

  (aref ca1 (the fixnum (+ i ca-width -1)))

Adding two fixnums together doesn't necessarily produce a fixnum,
right? Just because i is a fixnum and ca-width is a fixnum doesn't mean
that the result of adding them can be assumed to be a fixnum. If you
want such a dangerous assumption, you have to ask for it. In Lisp,
everything is safe unless asked for otherwise.

C is completely the opposite. In C, every expression and subexpression
has a type, statically sythesized from the types of its constituents
according to rules. Information analogous (the fixnum (expr ..)) is
implicit everywhere!  If you add two unsigned char values together they
are promoted to int (or, rarely, unsigned int). Addition between two
ints produces an int. If the result doesn't fit into int, the behavior
is undefined.

You get C to be fast by implicitly meeting these restrictions. You
"just know" that if you add this variable and that one, they don't
overflow int. You know that since these 8 cells in the automaton are
values in the range 0 to 255, you can add all of them together and fit
into type int. The compiler can emit code which just blindly widens the
values into word-sized operands that are milled through seven add
instructions.

From: Ken Tilton
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 11:39:33 +0000
Message-ID: <S3vVf.1917$Ph.662@fe10.lga>

Frank Buss wrote:
> I've tested my cellular automaton from
> http://www.frank-buss.de/automaton/arabeske.html again on my current
> computer with C and some Common Lisp implementations on Windows:
> 
> Time for calculating 1000 generations:
> 
> Visual Studio C++ 7.1.3088: 1.9s
> sbcl-0.9.9-x86-win32: 29s
> LispWorks Personal Edition 4.4.6 and 4.3.7: 172s
> CLISP-2.3.8: 518s

ACL 8 on win32: 4.8s

> 
> I'm interested in the timings for other Lisp implementations. You can use
> the C version from http://www.frank-buss.de/tmp/arabeske.zip for a time
> reference (or use the CLISP version).

I do not know how to time things on the win32 command line, but FWIW:

arabeske 1000 a.txt: one-thousand-one-one-thousand-two-one

kt

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 15:09:58 +0000
Message-ID: <1eozwnskzmvmu$.1es9dop26lrg4.dlg@40tude.net>

Ken Tilton wrote:

> ACL 8 on win32: 4.8s

This sounds interesting. But I didn't found a link to the 8.0 trial
version.

> I do not know how to time things on the win32 command line, but FWIW:
> 
> arabeske 1000 a.txt: one-thousand-one-one-thousand-two-one

I've updated the program:

#include <stdio.h> 
#include <stdlib.h> 
#include <memory.h> 
#include <time.h>

	clock_t start, finish;
	double  duration;
	start = clock();
	...
	finish = clock();
	duration = (double)(finish - start) / CLOCKS_PER_SEC;
	printf( "%2.1f seconds\n", duration );

http://www.frank-buss.de/tmp/arabeske2.zip

My first timing was stopped with a stop watch, now it is faster,  1.6
seconds :-)

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Ken Tilton
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 17:22:57 +0000
Message-ID: <F5AVf.36$YH5.5@fe10.lga>

Frank Buss wrote:
> Ken Tilton wrote:
> 
> 
>>ACL 8 on win32: 4.8s
> 
> 
> This sounds interesting. But I didn't found a link to the 8.0 trial
> version.

Maybe they are saving 8 for the Express Edition?

> 
> 
>>I do not know how to time things on the win32 command line, but FWIW:
>>
>>arabeske 1000 a.txt: one-thousand-one-one-thousand-two-one
> 
> 
> I've updated the program:
> 
> #include <stdio.h> 
> #include <stdlib.h> 
> #include <memory.h> 
> #include <time.h>
> 
> 	clock_t start, finish;
> 	double  duration;
> 	start = clock();
> 	...
> 	finish = clock();
> 	duration = (double)(finish - start) / CLOCKS_PER_SEC;
> 	printf( "%2.1f seconds\n", duration );
> 
> http://www.frank-buss.de/tmp/arabeske2.zip
> 
> My first timing was stopped with a stop watch, now it is faster,  1.6
> seconds :-)

I get 1.7s

ken

-- 
Cells: http://common-lisp.net/project/cells/

"And I will know my song well before I start singing."  - Bob Dylan

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 05:16:53 +0000
Message-ID: <718fihdrfvvp.vxh47xsd0aul.dlg@40tude.net>

Frank Buss wrote:

> Time for calculating 1000 generations:
> 
> Visual Studio C++ 7.1.3088: 1.9s
> sbcl-0.9.9-x86-win32: 29s
> LispWorks Personal Edition 4.4.6 and 4.3.7: 172s
> CLISP-2.3.8: 518s

java version "1.5.0_06": 6.4s

import java.io.*;

public class Arabeske {

	public static void main(String args[]) throws Exception {
		/*
		 * a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )
		 */
		int WIDTH = 640;
		int HEIGHT = 480;
		int CA_WIDTH = (WIDTH + 2);
		int CA_HEIGHT = (HEIGHT + 2);

		if (args.length != 2) {
			System.out.println("usage: arabeske count output-file\n");
			return;
		}

		int count = Integer.parseInt(args[0]);
		byte[] ca1 = new byte[CA_WIDTH * CA_HEIGHT];
		byte[] ca2 = new byte[CA_WIDTH * CA_HEIGHT];

		/* seed the cellular automaton space */
		ca1[CA_WIDTH * CA_HEIGHT / 2 - CA_WIDTH - 4] = 1;

		/* calcuclate the generation "count" */
		for (int c = 0; c < count; c++) {
			int i = CA_WIDTH + 1;
			for (int y = 0; y < HEIGHT; y++) {
				for (int x = 0; x < WIDTH; x++) {
					int cell = ca1[i];
					if (cell != 0) {
						if (++cell == 63)
							cell = 0;
					} else {
						/* summing the 8 neighbours */
						int sum = ((int) ca1[i + 1]) + ((int) ca1[i - 1]) + ((int) ca1[i +
CA_WIDTH - 1])
								+ ((int) ca1[i + CA_WIDTH]) + ((int) ca1[i + CA_WIDTH + 1])
								+ ((int) ca1[i - CA_WIDTH - 1]) + ((int) ca1[i - CA_WIDTH])
								+ ((int) ca1[i - CA_WIDTH + 1]);
						if (sum == 1)
							cell = 1;
					}
					ca2[i] = (byte) cell;
					i++;
				}
				i += 2;
			}
			byte[] tmp = ca1;
			ca1 = ca2;
			ca2 = tmp;
		}

		/* save it */
		FileOutputStream f = new FileOutputStream(args[1]);
		f.write(ca1);
		f.close();
	}
}


-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: j.k.
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 03:58:39 +0000
Message-ID: <1143431919.775192.208640@i40g2000cwc.googlegroups.com>

Frank Buss wrote:
> Frank Buss wrote:
>
> > Time for calculating 1000 generations:
> >
> > Visual Studio C++ 7.1.3088: 1.9s
> > sbcl-0.9.9-x86-win32: 29s
> > LispWorks Personal Edition 4.4.6 and 4.3.7: 172s
> > CLISP-2.3.8: 518s
>
> java version "1.5.0_06": 6.4s

I suspect that the Java time you give here was just using the default
java options, which is probably not fair if the point is to compare
optimized run times.

You'll see much better results for Java with some very minor changes
and a couple of commandline args to the java process.

As a baseline, with gcc, and gcc -O2 respectively, I see the following
for your C code:

real    0m4.318s
user    0m4.310s

real    0m2.191s
user    0m2.140s

(-O3 wasn't better than -O2.)

Your Java code with default options, shows for me:

real    0m8.376s
user    0m8.330s

If I add the "-Xms16M" to allocate 16 megs for the heap initially, that
drastically improves to:

real    0m4.820s
user    0m4.780s

Making the initial constants final (as they are constants in the C
code) yields:

real    0m4.552s
user    0m4.500s

And then switching to the -server vm (which optimizes much more
aggressively than the default -client vm) yields:

real    0m3.383s
user    0m3.260s

Of that time, only 3.19 s is actually inside the main method. The rest
of the time is startup, which is a one-time cost and probably shouldn't
be included.

That puts Java (3.19 s) about halfway between unoptimized gcc c (4.32
s), and gcc -O2 (2.19 s). This is on Linux with kernel 2.6.15 and  java
version "1.5.0_06-b05".

(The final Java command for the best time was "java -server -Xms16M
Arabeske 1000 /tmp/java.out".)

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 04:28:13 +0000
Message-ID: <1i1v5ljc4i8e0.1svf13nr7oov1$.dlg@40tude.net>

j.k. wrote:

> That puts Java (3.19 s) about halfway between unoptimized gcc c (4.32
> s), and gcc -O2 (2.19 s). This is on Linux with kernel 2.6.15 and  java
> version "1.5.0_06-b05".
> 
> (The final Java command for the best time was "java -server -Xms16M
> Arabeske 1000 /tmp/java.out".)

Thanks, I've tried this and with constants ("final int WIDTH = 640" etc.)
and with Java 5 now it needs 4.2 seconds and with 1.4.2_07 it needs 3.6
seconds.

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: j.k.
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 04:29:41 +0000
Message-ID: <1143433781.609153.121600@e56g2000cwe.googlegroups.com>

Frank Buss wrote:
> Frank Buss wrote:
>
> > Time for calculating 1000 generations:
> >
> > Visual Studio C   7.1.3088: 1.9s
> > sbcl-0.9.9-x86-win32: 29s
> > LispWorks Personal Edition 4.4.6 and 4.3.7: 172s
> > CLISP-2.3.8: 518s
>
> java version "1.5.0_06": 6.4s

I suspect that the Java time you give here was just using the default
java options, which is probably not fair if the point is to compare
optimized run times.

You'll see much better results for Java with some very minor changes
and a couple of commandline args to the java process.

As a baseline, with gcc, and gcc -O2 respectively, I see the following
for your C code:

real    0m4.318s
user    0m4.310s

real    0m2.191s
user    0m2.140s

(-O3 wasn't better than -O2.)

Your Java code with default options, shows for me:

real    0m8.376s
user    0m8.330s

If I add the "-Xms16M" to allocate 16 megs for the heap initially, that
drastically improves to:

real    0m4.820s
user    0m4.780s

Making the initial constants final (as they are constants in the C
code) yields:

real    0m4.552s
user    0m4.500s

And then switching to the -server vm (which optimizes much more
aggressively than the default -client vm) yields:

real    0m3.383s
user    0m3.260s

Of that time, only 3.19 s is actually inside the main method. The rest
of the time is startup, which is a one-time cost and probably shouldn't
be included.

That puts Java (3.19 s) about halfway between unoptimized gcc c (4.32
s), and gcc -O2 (2.19 s). This is on Linux with kernel 2.6.15 and  java
version "1.5.0_06-b05".

(The final Java command for the best time was "java -server -Xms16M
Arabeske 1000 /tmp/java.out".)

From: j.k.
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 04:37:29 +0000
Message-ID: <1143434249.683836.92070@u72g2000cwu.googlegroups.com>

Oops, sorry about the double post. I don't know how that happened.

From: Marko Kocic
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 14:03:22 +0000
Message-ID: <1143468201.927843.159390@v46g2000cwv.googlegroups.com>

Also, one should compile java with javac -g:none -O

From: funkyj
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 18:12:16 +0000
Message-ID: <1143569536.595298.192060@i40g2000cwc.googlegroups.com>

The fastest Java result still seems to be much faster than the fastest
Common Lisp result.

I make the following whole unsubstantiated claims to explain this
result:
  + a lot more money (i.e. man years) has gone into optimizing JVMs
than has
     gone into optimizing CL compilers.
  + naive java code (like naive C code) is more easily optimized.  This
has been
     discussed other postings in this thread (i.e. C vs CL).

From: Rainer Joswig
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 10:08:16 +0000
Message-ID: <joswig-19CA5B.11081626032006@news-europe.giganews.com>

In article <······························@40tude.net>,
 Frank Buss <··@frank-buss.de> wrote:

> I've tested my cellular automaton from
> http://www.frank-buss.de/automaton/arabeske.html again on my current
> computer with C and some Common Lisp implementations on Windows:
> 
> Time for calculating 1000 generations:
> 
> Visual Studio C++ 7.1.3088: 1.9s
> sbcl-0.9.9-x86-win32: 29s
> LispWorks Personal Edition 4.4.6 and 4.3.7: 172s
> CLISP-2.3.8: 518s
> 
> I'm interested in the timings for other Lisp implementations. You can use
> the C version from http://www.frank-buss.de/tmp/arabeske.zip for a time
> reference (or use the CLISP version).
> 
> In another thread in this newsgroup some people wrote that Common Lisp can
> be as fast as C. This might be possible, but looks like no current
> implementation reachs this goal.

Some implementations do. But you have to add more information
to the code to reach that goal. Common Lisp compilers cannot
deduce that you want to work around the language semantics
- see below.

> And I know, it's unfair to compare C and
> Lisp with such low-level tasks, because Lisp is better for high-level
> tasks, but would be nice to do fast image processing like in
> http://www.frank-buss.de/lisp/texture.html in Lisp, only.
> 
> The code I've used for testing:

Your code isn't optimized for speed. Just look at the
disassembly.

If you use CMUCL the compiler will generate lots
of warnings for stuff it couldn't optimize (because
you may not have told the compiler what you want).

The most basic example in code typically is this:

(defun foo (a b)
  (+ (the fixnum a) (the fixnum b)))

There is no way the compiler can deduce that you intend
to do fixnum operations, because the result can
be a non-fixnum. In Lisp by default adding two
fixnums may result in a bignum. If you don't want
that you have to tell it your Lisp compiler.
It is not a problem that the compiler might
not do type inference - it is a problem of the semantics
of the operations. So, declaring a few variables to be
fixnums doesn't tell the compiler what all the
result types should be.

So you write this:

(defun foo (a b)
  (the fixnum (+ (the fixnum a) (the fixnum b))))

But then adding two large fixnums are just a fixnum
and you have to deal with that in your algorithm.
You have now the typical overflow problem when you do numerics
in C (and similar languages). That's why the US government
had problems calculating Bill Gates' tax.

Code like that easily gets large. So usually you
write some macrology to shorten the code.

The next problem for several applications is that
fixnums are not 32 bit wide in 32bit Lisp
implementations. See the value
of MOST-POSITIVE-FIXNUM. In typical Common Lisp
implementations Lisp data has tags. Tags are
costing space. So if Lisp has a 32bit implementation,
you get less than 32bits for your fixnums.
Same also the size of arrays and so on. On a 64bit
machine with a 64bit Lisp implementation, you
easily have 32bit (and more) fixnums. The problem
there is that you don't have the full 64bit.
Because again, the tags take space. The problem
is the same, but for different sizes.

So, what are the typical solutions for doing full
32bit arithmetic?

a) use special operators (see the recent LispWorks release notes
   on that topic)
b) use a 64bit machine and a 64bit Lisp implementation.
   LispWorks 5 will do that. ACL already does it.
c) magic

Similar problems exists with floats.

Pointers:

Making Lisp fast:
http://www.cip.physik.uni-muenchen.de/~tf/lambda/aei/lisp.html
On using Common Lisp for Scientifc Computing
http://www.iwr.uni-heidelberg.de/organization/sfb359/PP/Preprint2002-40.ps.gz
Building Common Lisp Applications with Reasonable Performance
http://bmrc.berkeley.edu/research/publications/1993/125/Lisp.html
Another expert I had the pleasure to meet personally:
http://openmap.bbn.com/~kanderso/performance/

-- 
http://lispm.dyndns.org/

From: Christophe Rhodes
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 09:56:24 +0000
Message-ID: <sqpsk97cmv.fsf@cam.ac.uk>

Rainer Joswig <······@lisp.de> writes:

> So, what are the typical solutions for doing full
> 32bit arithmetic?
>
> a) use special operators (see the recent LispWorks release notes
>    on that topic)
> b) use a 64bit machine and a 64bit Lisp implementation.
>    LispWorks 5 will do that. ACL already does it.
> c) magic

(d) write the algorithm out and let the compiler do the work:
     (logand (+ x y) #xffffffff)
    should compile to a machine addition if x and y are known to be
    (unsigned-byte 32).

I'm sure that there are still native-code lisp compilers which don't
implement this: they should be named and shamed.

Christophe

From: Thomas Schilling
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 09:47:26 +0000
Message-ID: <48n5ogFkh1qfU1@news.dfncis.de>

You are right, but from the SBCL-disassembly this is probably not the
bootleneck in this example.  I'd say the major bottleneck is array access:

  (+ (aref ca1 (1+ i))
     (aref ca1 (1- i))
     (aref ca1 (+ i ca-width -1))
     (aref ca1 (+ i ca-width))
     (aref ca1 (+ i ca-width 1))
     (aref ca1 (- i ca-width -1))
     (aref ca1 (- i ca-width))
     (aref ca1 (- i ca-width 1)))))

(At least for SBCL) These aren't identified as dependent on the
induction variable "i" and are thus recalculated on every iteration.
Furthermore the SBCL register allocator doesn't do a great job, either.

For (aref ca1 (+ i ca-width -1)) and adding it, it produces this code:

  ;     4E83:       8945F4           MOV [EBP-12], EAX
  ;     4E86:       C17DF402         SAR DWORD PTR [EBP-12], 2
  ;     4E8A:       8145F482020000   ADD DWORD PTR [EBP-12], 642
  ;     4E91:       8345F4FF         ADD DWORD PTR [EBP-12], -1
  ;     4E95:       8B45F4           MOV EAX, [EBP-12]
  ;     4E98:       0FB6440201       MOVZX EAX, [EDX+EAX+1]
  ;     4E9D:       8945F4           MOV [EBP-12], EAX
  ;     4EA0:       C165F402         SHL DWORD PTR [EBP-12], 2
  ;     4EA4:       8B45F0           MOV EAX, [EBP-16]
  ;     4EA7:       0345F4           ADD EAX, [EBP-12]
  ;     4EAA:       8945F0           MOV [EBP-16], EAX
  ;     4EAD:       8B45E0           MOV EAX, [EBP-32]

While the optimum would keep the offset in, say, EDX and only increment
EDX when going from (aref ca1 (+ i ca-width -1)) to (aref ca1 (+ i
ca-width)).  Actually the C-version does so.

Note: I'm using a slightly oldish version of SBCL (0.9.4). So the
situation may be better now.

So, it looks like C doesn't (mainly) win here for its "special"
semantics but for its better support for loop optimizations in
compilers. (I used gcc 3.3.6 with -O2)

From: Rainer Joswig
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 11:13:48 +0000
Message-ID: <joswig-79B24B.12134826032006@news-europe.giganews.com>

In article <··············@news.dfncis.de>,
 Thomas Schilling <······@yahoo.de> wrote:

> You are right, but from the SBCL-disassembly this is probably not the
> bootleneck in this example.
>
>  I'd say the major bottleneck is array access:
> 
>   (+ (aref ca1 (1+ i))
>      (aref ca1 (1- i))
>      (aref ca1 (+ i ca-width -1))
>      (aref ca1 (+ i ca-width))
>      (aref ca1 (+ i ca-width 1))
>      (aref ca1 (- i ca-width -1))
>      (aref ca1 (- i ca-width))
>      (aref ca1 (- i ca-width 1)))))
> 
> (At least for SBCL) These aren't identified as dependent on the
> induction variable "i" and are thus recalculated on every iteration.
> Furthermore the SBCL register allocator doesn't do a great job, either.
> 
> For (aref ca1 (+ i ca-width -1)) and adding it, it produces this code:
> 
>   ;     4E83:       8945F4           MOV [EBP-12], EAX
>   ;     4E86:       C17DF402         SAR DWORD PTR [EBP-12], 2
>   ;     4E8A:       8145F482020000   ADD DWORD PTR [EBP-12], 642
>   ;     4E91:       8345F4FF         ADD DWORD PTR [EBP-12], -1
>   ;     4E95:       8B45F4           MOV EAX, [EBP-12]
>   ;     4E98:       0FB6440201       MOVZX EAX, [EDX+EAX+1]
>   ;     4E9D:       8945F4           MOV [EBP-12], EAX
>   ;     4EA0:       C165F402         SHL DWORD PTR [EBP-12], 2
>   ;     4EA4:       8B45F0           MOV EAX, [EBP-16]
>   ;     4EA7:       0345F4           ADD EAX, [EBP-12]
>   ;     4EAA:       8945F0           MOV [EBP-16], EAX
>   ;     4EAD:       8B45E0           MOV EAX, [EBP-32]
> 
> While the optimum would keep the offset in, say, EDX and only increment
> EDX when going from (aref ca1 (+ i ca-width -1)) to (aref ca1 (+ i
> ca-width)).  Actually the C-version does so.
> 
> Note: I'm using a slightly oldish version of SBCL (0.9.4). So the
> situation may be better now.
> 
> So, it looks like C doesn't (mainly) win here for its "special"
> semantics but for its better support for loop optimizations in
> compilers. (I used gcc 3.3.6 with -O2)

The Lisp standard has lots of implications on how you implement Lisp
and under what circumstances you can reach better performance.
The standard tried to give the implementors some room - but
implementing some of the optimizations is tough and
the code the user writes might be problematic (since it
has to be handtuned, you have to specify that you want to deal
with slightly different semantics, ...).

Array access is a topic.

For example in many Lisp implementations (those with a copying or
compacting GC) storage can be moved around at runtime. In those
Lisps you cannot assume a fixed start address during calculations.

Say you have a calculation:

(+ (aref array (index-calc1))
   (aref array (index-calc2)))

In Common Lisp the arguments are evaluated left to right.
Running the function INDEX-CALC1 might trigger a
garbage collection (or it might be triggered by some
other mechanism). During that garbage collection,
the array might be moved (to compact memory, to
move it to another half-space, or to move it to another
generation, ...). The the second AREF will access
the same array, but at a different base location.

The Lisp standard doesn't say anything about the GC,
but most implementations are using some kind of GC.
The kind of GC has lots of interesting
performance implications (for example for hashtables).

Then the array access checks index overflow conditions.

If you do (aref array (+ i 3)) the Lisp compiler typically will not
know that the result of (+ i 3) is  a) is still a fixnum,
b) is still a valid index (see ARRAY-DIMENSION-LIMIT) and
c) is still a valid index for the special type of index
that may be use here.

The layout of the storage in arrays is largely implementation
dependant. Some Lisps only have few (with boxed data) array types.
Boxing and unboxing values from arrays maybe a topic.
Automatic avoidance of boxing/unboxing seems to be also
a difficult topic.

LOOP optimization is still another topic. I guess this is a topic
where many Lisp implementations are very weak.

I would guess many performance problems can be reduced to very
small examples. It might be useful to collect those and
compare the various implementations and the way how they
deal with the problem (if at all).

-- 
http://lispm.dyndns.org/

From: Thomas Schilling
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 13:16:06 +0000
Message-ID: <48nhvnFkbngkU1@news.dfncis.de>

Rainer Joswig wrote:
> If you do (aref array (+ i 3)) the Lisp compiler typically will not
> know that the result of (+ i 3) is  a) is still a fixnum,
> b) is still a valid index (see ARRAY-DIMENSION-LIMIT) and
> c) is still a valid index for the special type of index
> that may be use here.
> 
> The layout of the storage in arrays is largely implementation
> dependant. Some Lisps only have few (with boxed data) array types.
> Boxing and unboxing values from arrays maybe a topic.
> Automatic avoidance of boxing/unboxing seems to be also
> a difficult topic.

Yes, I agree that optimizing lisp code is way much harder than FORTRAN
or C (in that order).  CL's dynamic nature makes it hard to prove
assumptions allowing for optimizations. E.g. in this case the compiler
must make sure the array cannot move, is not resized and only contains
items of the same type.

To make sure it does not move, you either have to allocate it to a
special region (simple here, because it doesn't "escape", i.e. no other
function can get access to it, so it's dynamic extend) or make sure no
GC happends during the optimized accesses. This might be tractable here,
but probably very hard in the general case (i.e., expensive to do, and
not beneficient in most cases).

> LOOP optimization is still another topic. I guess this is a topic
> where many Lisp implementations are very weak.

Looks like.  But I can't tell.  It's really non-trivial though, because
you'd need dependency analysis, interprocedural alias analysis and much
much more.  Furthermore, most people use FORTRAN for this, because the
compilers are better.  They are better for (at least) three reasons:
tradition, demand, and because the language is much more restrictive to
allow better optimizations.  To gain the same performance you'd actually
need to *embed* FORTRAN. This need not necessarily be a bad thing, but
it's a lot of work and since there's no (sufficient) demand, noone will
touch such a big project.

It could be nice to enable high-level, high-performance
DSL-implementations (e.g. for Bioinformatics). But would it be
sufficiently more useful than using Mathematica/Matlab/Maple/whatever
and make it generate efficient Fortran code?

> I would guess many performance problems can be reduced to very
> small examples. It might be useful to collect those and
> compare the various implementations and the way how they
> deal with the problem (if at all).

Yes, that'd be nice.  So let's take this one as a start :)

From: Juho Snellman
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 13:09:59 +0000
Message-ID: <slrne2d4l7.59f.jsnell@sbz-30.cs.Helsinki.FI>

Thomas Schilling <······@yahoo.de> wrote:
> You are right, but from the SBCL-disassembly this is probably not the
> bootleneck in this example.  I'd say the major bottleneck is array access:
> 
>   (+ (aref ca1 (1+ i))
>      (aref ca1 (1- i))
>      (aref ca1 (+ i ca-width -1))
>      (aref ca1 (+ i ca-width))
>      (aref ca1 (+ i ca-width 1))
>      (aref ca1 (- i ca-width -1))
>      (aref ca1 (- i ca-width))
>      (aref ca1 (- i ca-width 1)))))
> 
> (At least for SBCL) These aren't identified as dependent on the
> induction variable "i" and are thus recalculated on every iteration.
> Furthermore the SBCL register allocator doesn't do a great job, either.

Wrapping the (LET ((SUM ...)) ...) in a (DOTIMES (Z 1) ...) would
probably improve the allocation.

There are a couple of other compiler issues here too. One of would be
very hard to fix (representation selection, i.e. selecting whether an
integer should be stored with a fixnum tag or not, is responsible for
the extra shifts in the inner loop). The other one would probably be
easy to fix (the constants in (+ variable constant constant) aren't
folded).

-- 
Juho Snellman

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 11:06:23 +0000
Message-ID: <1uug7ddhssnoc.1o0xw0tsz5ove$.dlg@40tude.net>

Rainer Joswig wrote:

>> In another thread in this newsgroup some people wrote that Common Lisp can
>> be as fast as C. This might be possible, but looks like no current
>> implementation reachs this goal.
> 
> Some implementations do. But you have to add more information
> to the code to reach that goal. Common Lisp compilers cannot
> deduce that you want to work around the language semantics
> - see below.

Which implementation is as fast as Visual C++ 7.1?

Thanks for your references. I was able to optimize it with some tips from
the Fannkuch benchmark. Looks like in LispWorks using SVREF instead of AREF
and some more type declarations makes it multiple times faster, doubles the
speed for SBCL and gives some percent for CLISP. With later versions of
LispWorks it is a bit slower than with older versions:

Visual Studio C++ 7.1.3088: 1.9s
java version "1.5.0_06": 6.4s
sbcl-0.9.9-x86-win32: 17s
LispWorks Personal Edition 4.3.7: 25s
LispWorks Personal Edition 4.4.6: 33s
CLISP-2.3.8: 483s

But this is still multiple times slower than the C or the Java version. 

I think writing my own inline-assembler would be a good idea :-)

The optimized source:

(deftype ui8 () '(unsigned-byte 8))
(defmacro ui8 (a) `(the ui8 ,a))
(defmacro s+ (&rest args)
  `(the fixnum (+ ,@(loop for arg in args collect `(ui8 ,arg)))))
(defmacro sref (a i) `(ui8 (svref ,a (the fixnum ,i))))
(defmacro s= (a b) `(= (the fixnum ,a) (the fixnum ,b)))
(defmacro setfs (a b) `(setf ,a (the fixnum ,b)))

(defun generations (count file) 
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )" 
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0)))
  (let* ((width 640) 
         (height 480) 
         (ca-width (+ width 2)) 
         (ca-height (+ height 2)) 
         (len (* ca-width ca-height)) 
         (ca1 (make-array len :initial-element 0)) 
         (ca2 (make-array len :initial-element 0))) 
    (declare (type fixnum count)
             (type fixnum width)
             (type fixnum height)
             (type fixnum ca-width)
             (type fixnum ca-height)
             (type fixnum len)
             (type simple-vector ca1)
             (type simple-vector ca2))

    ;; seed the cellular automaton space 
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1) 

    ;; calcuclate the generation "count" 
    (loop for c fixnum from 1 to count do 
          (loop for y fixnum from 0 below height 
                with i fixnum = (1+ ca-width) do 
                (loop for x fixnum from 0 below width do 
                      (let ((cell (the fixnum (aref ca1 i))))
                        (declare (type fixnum cell))
                        (if (zerop cell) 
                            ;; summing the 8 neighbours 
                            (let ((sum (s+ (sref ca1 (1+ i)) 
                                           (sref ca1 (1- i)) 
                                           (sref ca1 (+ i ca-width -1)) 
                                           (sref ca1 (+ i ca-width)) 
                                           (sref ca1 (+ i ca-width 1)) 
                                           (sref ca1 (- i ca-width -1)) 
                                           (sref ca1 (- i ca-width)) 
                                           (sref ca1 (- i ca-width 1))))) 
                              (declare (type fixnum sum))
                              (when (= sum 1) (setfs cell 1)))
                          (if (s= cell 62)
                              (setfs cell 0)
                            (incf cell))) 
                        (setfs (sref ca2 i) cell)
                        (incf i))) 
                (incf i 2)) 
          (rotatef ca1 ca2)) 

    ;; save it 
    (with-open-file (stream file :direction :output 
                            :element-type '(unsigned-byte 8) 
                            :if-exists :supersede 
                            :if-does-not-exist :create) 
      (loop for y from 0 below ca-height 
            with i = 0 do 
            (loop for x from 0 below ca-width do 
                  (let ((cell (aref ca1 i))) 
                    (write-byte cell stream)) 
                  (incf i)))))) 

(defun test ()
  (generations 1000 "c:/tmp/test.txt"))


-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Pascal Costanza
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 11:43:52 +0000
Message-ID: <48nd3oFl229sU1@individual.net>

Frank Buss wrote:
> Rainer Joswig wrote:
> 
> 
>>>In another thread in this newsgroup some people wrote that Common Lisp can
>>>be as fast as C. This might be possible, but looks like no current
>>>implementation reachs this goal.
>>
>>Some implementations do. But you have to add more information
>>to the code to reach that goal. Common Lisp compilers cannot
>>deduce that you want to work around the language semantics
>>- see below.
> 
> 
> Which implementation is as fast as Visual C++ 7.1?
> 
> Thanks for your references. I was able to optimize it with some tips from
> the Fannkuch benchmark. Looks like in LispWorks using SVREF instead of AREF
> and some more type declarations makes it multiple times faster, doubles the
> speed for SBCL and gives some percent for CLISP. With later versions of
> LispWorks it is a bit slower than with older versions:
> 
> Visual Studio C++ 7.1.3088: 1.9s
> java version "1.5.0_06": 6.4s
> sbcl-0.9.9-x86-win32: 17s
> LispWorks Personal Edition 4.3.7: 25s
> LispWorks Personal Edition 4.4.6: 33s
> CLISP-2.3.8: 483s
> 
> But this is still multiple times slower than the C or the Java version. 

It seems that you are also measuring file i/o. You may want to measure 
the calculations and file i/o separately.


Pascal

-- 
3rd European Lisp Workshop
July 3-4 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 15:18:22 +0000
Message-ID: <tebmm7qcr4ub$.1atse33uycrmm.dlg@40tude.net>

Pascal Costanza wrote:

> It seems that you are also measuring file i/o. You may want to measure 
> the calculations and file i/o separately.

the file i/o needs 0.031 seconds in LispWorks and 0.1 seconds in CLISP.
I've added it just to make sure the compiler don't optimize away the whole
function, which should be possible when I don't use the result.

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Rainer Joswig
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 12:47:05 +0000
Message-ID: <joswig-1B7FF4.13470526032006@news-europe.giganews.com>

In article <································@40tude.net>,
 Frank Buss <··@frank-buss.de> wrote:

> Rainer Joswig wrote:
> 
> >> In another thread in this newsgroup some people wrote that Common Lisp can
> >> be as fast as C. This might be possible, but looks like no current
> >> implementation reachs this goal.
> > 
> > Some implementations do. But you have to add more information
> > to the code to reach that goal. Common Lisp compilers cannot
> > deduce that you want to work around the language semantics
> > - see below.
> 
> Which implementation is as fast as Visual C++ 7.1?
> 
> Thanks for your references. I was able to optimize it with some tips from
> the Fannkuch benchmark. Looks like in LispWorks using SVREF instead of AREF
> and some more type declarations makes it multiple times faster, doubles the
> speed for SBCL and gives some percent for CLISP. With later versions of
> LispWorks it is a bit slower than with older versions:

You are using a general array. You don't specify the
ARRAY element type when doing MAKE-ARRAY.

9.4 Compiler control
http://www.lispworks.com/documentation/lw445/LWUG/html/lwuser-89.htm#pgfId-886019

9.6 Optimizing your code
http://www.lispworks.com/documentation/lw445/LWUG/html/lwuser-92.htm#pgfId-887598

using CMUCL on your code gives me:

; File: /home/joswig/test-compiler.lisp

; In: DEFUN GENERATIONS

;   (LOOP FOR Y FROM 0...)
; --> BLOCK LET LET ANSI-LOOP::LOOP-BODY TAGBODY WHEN COND IF >= IF
; ==>
;   (< Y #:G30)
; Note: Unable to optimize due to type uncertainty:
;     The first argument is a REAL, not a FLOAT.
;
;   (LOOP FOR X FROM 0...)
; --> BLOCK LET ANSI-LOOP::LOOP-BODY TAGBODY WHEN COND IF >= IF
; ==>
;   (< X #:G31)
; Note: Unable to optimize due to type uncertainty:
;     The first argument is a REAL, not a FLOAT.
;
;   (LOOP FOR Y FROM 0...)
; --> BLOCK LET LET ANSI-LOOP::LOOP-BODY TAGBODY ANSI-LOOP::LOOP-REALLY-DESETQ
; --> SETQ 1+
; ==>
;   (+ Y 1)
; Note: Unable to optimize due to type uncertainty:
;     The first argument is a NUMBER, not a FLOAT.
;
; Note: Unable to optimize due to type uncertainty:
;     The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; Note: Unable to optimize due to type uncertainty:
;     The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
;   (LOOP FOR X FROM 0...)
; --> BLOCK LET ANSI-LOOP::LOOP-BODY TAGBODY ANSI-LOOP::LOOP-REALLY-DESETQ SETQ
; --> 1+
; ==>
;   (+ X 1)
; Note: Unable to optimize due to type uncertainty:
;     The first argument is a REAL, not a FLOAT.
;
;   (LOOP FOR Y FROM 0...)
; --> BLOCK LET LET ANSI-LOOP::LOOP-BODY TAGBODY WHEN COND IF >= IF
; ==>
;   (< Y #:G30)
; Note: Forced to do GENERIC-< (cost 10).
;     Unable to do inline fixnum comparison (cost 3) because:
;     The first argument is a REAL, not a FIXNUM.
;     Unable to do inline fixnum comparison (cost 4) because:
;     The first argument is a REAL, not a FIXNUM.
;     etc.
;

and so on...

The next thing I would do is to find out where the time
is spend. Use a profiler. Is it in the computation
or in the output? The output code looks notoriously
slow for example.

You iterate over the array in a 2d fashion.
Writing byte by byte
to a Lisp stream is often very slow (often there
are CLOS generic function calls involved, streams
are CLOS classes and so on).

CMUCL for example can do some optimizations for
arithmetic, but for naive stream processing it is
mostly slow.

A first thing I would do is to use WRITE-SEQUENCE where
possible, instead of WRITE-BYTE.

For example something like (write-sequence ca1 stream) ,
if you can/want write the whole array in one and
the element type of the array and the stream is the same
and so on....

-- 
http://lispm.dyndns.org/

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 15:33:28 +0000
Message-ID: <1hon0f823xkdm$.15mphtv76ncmv$.dlg@40tude.net>

Rainer Joswig wrote:

> You are using a general array. You don't specify the
> ARRAY element type when doing MAKE-ARRAY.

I can't specify the array element when I want to use simple-vector, because
CLHS says, that a simple-vector can hold every element. And I need a simple
vector, because looks like aref is slower in LispWorks than svref, even if
I can't specify the element type. Using :element-type and aref in my last
code results in a running time of 190 seconds.

> using CMUCL on your code gives me:
> 
> ; File: /home/joswig/test-compiler.lisp
> 
> ; In: DEFUN GENERATIONS
> 
> ;   (LOOP FOR Y FROM 0...)
> ; --> BLOCK LET LET ANSI-LOOP::LOOP-BODY TAGBODY WHEN COND IF >= IF
> ; ==>
> ;   (< Y #:G30)
> ; Note: Unable to optimize due to type uncertainty:
> ;     The first argument is a REAL, not a FLOAT.

In my optimized version, which I posted in the message you answered, I've
used "(loop for y fixnum from 0 below height..." and at least SBCL doesn't
print warnings. Did you used this code for CMUCL? If yes, how can I fix it?

> The next thing I would do is to find out where the time
> is spend. Use a profiler. Is it in the computation
> or in the output? The output code looks notoriously
> slow for example.

For SBCL and LispWorks it is in the computation, the output needs 0.05 to
0.1 seconds.

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: ······@corporate-world.lisp.de
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 20:13:18 +0000
Message-ID: <1143403998.278642.89660@e56g2000cwe.googlegroups.com>

Frank Buss schrieb:

> Rainer Joswig wrote:
>
> > You are using a general array. You don't specify the
> > ARRAY element type when doing MAKE-ARRAY.
>
> I can't specify the array element when I want to use simple-vector, because
> CLHS says, that a simple-vector can hold every element. And I need a simple
> vector, because looks like aref is slower in LispWorks than svref, even if
> I can't specify the element type. Using :element-type and aref in my last
> code results in a running time of 190 seconds.

Yes, it depends a bit on the compiler. Maybe LispWorks needs to add
some
more magic to their compiler. I would ask them.

Btw., see also the FIXNUM-SAFETY declaration.

>
> > using CMUCL on your code gives me:
> >
> > ; File: /home/joswig/test-compiler.lisp
> >
> > ; In: DEFUN GENERATIONS
> >
> > ;   (LOOP FOR Y FROM 0...)
> > ; --> BLOCK LET LET ANSI-LOOP::LOOP-BODY TAGBODY WHEN COND IF >= IF
> > ; ==>
> > ;   (< Y #:G30)
> > ; Note: Unable to optimize due to type uncertainty:
> > ;     The first argument is a REAL, not a FLOAT.
>
> In my optimized version, which I posted in the message you answered, I've
> used "(loop for y fixnum from 0 below height..." and at least SBCL doesn't
> print warnings. Did you used this code for CMUCL? If yes, how can I fix it?

Do you use y and x? did you see the REPEAT clause of LOOP?

I would also get rid of the local LET stuff where you do:

(let ((foo (something)))
  (+ foo 1))

Why not just (+ (something) 1)  ?

Then I would look at the output of TIME. If there is too much consed
memory,
this points to some inefficiency (maybe bignum arithmetic instead of
fixnum
arithmetic, boxing/unboxing, ...). I would ask the vendor how to reduce
the
consing to zero (or to what you statically allocate (the arrays ...)).
>From looking at the LispWorks output, there seems to be too much
consing.
Maybe the vendor has some ideas to improve the code.

I would also change the nested IFs to a COND.

You can also put an AND between independent WITH clauses (similar to
LET vs. LET*).

LOOP
WITH a = 1
AND b = 2
AND c = 3
WITH d = 4

Also

(loop ...

   (loop with foo = 1

could be changed

to

(loop for foo = 1
  (loop ...

which would reduce the code size of the inner loop (which *could* have
some effect)...

It also might be interesting to compare a 32bit Lisp against a 64bit
Lisp. For example
ACL 8 in both versions on a single machine.

From: John Thingstad
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 10:48:23 +0000
Message-ID: <op.s62iyxg8pqzri1@mjolner.upc.no>

On Sun, 26 Mar 2006 22:13:18 +0200, <······@corporate-world.lisp.de> wrote:

> Frank Buss schrieb:
>
>> Rainer Joswig wrote:
>>
>> > You are using a general array. You don't specify the
>> > ARRAY element type when doing MAKE-ARRAY.
>>
>> I can't specify the array element when I want to use simple-vector,  
>> because
>> CLHS says, that a simple-vector can hold every element. And I need a  
>> simple
>> vector, because looks like aref is slower in LispWorks than svref, even  
>> if
>> I can't specify the element type. Using :element-type and aref in my  
>> last
>> code results in a running time of 190 seconds.
>
> Yes, it depends a bit on the compiler. Maybe LispWorks needs to add
> some
> more magic to their compiler. I would ask them.
>

Lispworks has a extention for fast 32 bit integers.

In this example the safety level assures a second optimization in  
fli:foreign-typed-aref
(float 0) is needed to turn boxing off

(defun incf-signed-byte-32 (ptr index)
   (declare (optimize (safety 0) (float 0))
            (type fixnum index))
   (setf (fli:foreign-typed-aref 'sys:int32 ptr index)
         (sys:int32-1+ (fli:foreign-typed-aref 'sys:int32
                                               ptr index)))
   ;; return ptr, since otherwise the int32 would
   ;; need to be boxed to return it
   ptr)

This is just a paste from the documentation to give you a taste.

This reference into the LispWorks Reference manual should give you a idea.
http://www.lispworks.com/documentation/lw445/LWRM/html/lwref-586.htm#pgfId-1080734

I'll look into it.
Seems something of a headache. like (sys:add32+ a b) only works fow two  
arguments.
I suspect these features will be integrated into the optimizer at some  
later date
but for now they are only one step above assembly.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

From: John Thingstad
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 10:50:15 +0000
Message-ID: <op.s62i11d7pqzri1@mjolner.upc.no>

On Mon, 27 Mar 2006 12:48:23 +0200, John Thingstad  
<··············@chello.no> wrote:

>
> Seems something of a headache. like (sys:add32+ a b) only works fow two  
> arguments.
That was supposed to be sys:int32+.. sorry

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

From: Marko Kocic
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 15:22:20 +0000
Message-ID: <1143818540.815709.263400@e56g2000cwe.googlegroups.com>

> The next thing I would do is to find out where the time
> is spend. Use a profiler. Is it in the computation
> or in the output? The output code looks notoriously
> slow for example.

While we are at it, is there any profiling package that can be used
with clisp under wndows that actually work?

From: funkyj
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 19:03:18 +0000
Message-ID: <1143831798.727705.11320@g10g2000cwb.googlegroups.com>

I believe corman lisp includes a profiler.  The Corman compiler & REPL
are free but the IDE (which, no doubt, is where the profiler is) is
payware.  I think you can use the Corman IDE for a 30 day trial period.

The cost of the full Corman IDE is very reasonable.  I think corman
supports threads and windows GUI APIs.  All in all *if* I was going to
get a payware lisp for windows I think I'd go with Corman.

These days I use CLISP because (1) it is free and (2) I can install and
manage CLISP updates with the cygwin installer.  Sure, CLISP is slower
than SBCL and others but newbie user experience is my highest priority.

From: Marko Kocic
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 03 Apr 2006 13:23:33 +0000
Message-ID: <1144070613.036000.288530@g10g2000cwb.googlegroups.com>

Ok, tried it again but deleted it fast.
I couldn't make it to work with asdf, couldn't start slime, and that
was a showstopper. As for profiler, I couldn't find it cause my IDE
expired long time ago.

As for profiler, I used one from
http://cybertiggyr.com/gene/lh/profile.lisp which works with clisp in
some cases, but couldn't make it to profile examples in this thread.

From: Joerg Hoehle
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 07 Apr 2006 11:55:51 +0000
Message-ID: <uk6a1a9bs.fsf@users.sourceforge.net>

"Marko Kocic" <···········@gmail.com> writes:

> While we are at it, is there any profiling package that can be used
> with clisp under wndows that actually work?

I used the plain old CMUCL AI repository metering package with CLISP
from times to times.  IIRC, it's included in SLIME.

>As for profiler, I used one from
>http://cybertiggyr.com/gene/lh/profile.lisp which works with clisp in
>some cases, but couldn't make it to profile examples in this thread.
If you have reasons to believe there's a bug somewhere, please
identify it and report a bug to Gene, Peter Norvig or CLISP.

Rainer Joswig writes:
>I would also change the nested IFs to a COND.
If that makes a difference to the compiler, then it's really poor at
opimizations, isn't it?  I'd not do that just for some microseconds.

I already find it annoying enough that some implementations is faster
at make-array without :element-type => simple-vector => may use SVREF,
while others are faster at MAKE-ARRAY :element-type '(unsigned-byte 32).

What a pity!

Regards,
	Jorg Hohle
Telekom/T-Systems Technology Center

From: Thomas Schilling
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 12:48:41 +0000
Message-ID: <48ngcaFl1h8oU1@news.dfncis.de>

Frank Buss wrote:
> The optimized source:
[... code ...]

On my machine using SBCL 9.4.0 (no I/O) this takes:

Evaluation took:
  27.248 seconds of real time
  27.207865 seconds of user run time
  0.005999 seconds of system run time
  0 page faults and
  2,475,568 bytes consed.

Doing manual strength reduction / code hoisting I get:

Evaluation took:
  15.302 seconds of real time
  15.287676 seconds of user run time
  0.003999 seconds of system run time
  0 page faults and
  2,475,568 bytes consed.

I basically pulled up the calculation of the array offsets (variables
named "i-1", "i+1", "i-w-1", ...) . Still the register allocator does a
bad job.

I also refrained from rewriting

 (aref ca1 (+ i width -1))
 (aref ca1 (+ i width))
 (aref ca1 (+ i width 1))

into

 (+ ...
  (for d from (+ i width -1) to (+ i width 1) summing (aref c1 d))
  ...)

because I assumed the possible speedup would be spoiled by the register
allocator, again.  You might try, though.

Here's the code.

(defun generations-hoist (count file)
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0)))
  (let* ((width 640)
         (height 480)
         (ca-width (+ width 2))
         (ca-height (+ height 2))
         (len (* ca-width ca-height))
         (ca1 (make-array len :initial-element 0))
         (ca2 (make-array len :initial-element 0)))
    (declare (type fixnum count)
             (type fixnum width)
             (type fixnum height)
             (type fixnum ca-width)
             (type fixnum ca-height)
             (type fixnum len)
             (type simple-vector ca1)
             (type simple-vector ca2))

    ;; seed the cellular automaton space
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)

    ;; calcuclate the generation "count"
    (loop for c fixnum from 1 to count do
          (loop for y fixnum from 0 below height
	     with i fixnum = (1+ ca-width)
	     with i+1 fixnum = (+ i 1)
	     with i-1 fixnum = (- i 1)
	     with i+w-1 fixnum = (+ i ca-width -1)
	     with i+w fixnum = (+ i ca-width)
	     with i+w+1 fixnum = (+ i ca-width 1)
	     with i-w-1 fixnum = (- i ca-width 1)
	     with i-w fixnum = (- i ca-width)
	     with i-w+1 fixnum = (- i ca-width -1) do
	     (loop for x fixnum from 0 below width do
		  (let ((cell (the fixnum (aref ca1 i))))
		    (declare (type fixnum cell))
		    (if (zerop cell)
			;; summing the 8 neighbours
			(let ((sum (s+ (sref ca1 i+1)
				       (sref ca1 i-1)
				       (sref ca1 i+w-1)
				       (sref ca1 i+w)
				       (sref ca1 i+w+1)
				       (sref ca1 i-w-1)
				       (sref ca1 i-w)
				       (sref ca1 i-w+1))))
			  (declare (type fixnum sum))
			  (when (= sum 1) (setfs cell 1)))
			(if (s= cell 62)
			    (setfs cell 0)
			    (incf cell)))
		    (setfs (sref ca2 i) cell)
		    (sincf i 1)
		    (sincf i-1 1)
		    (sincf i+1 1)
		    (sincf i-w-1 1)
		    (sincf i-w 1)
		    (sincf i-w+1 1)
		    (sincf i+w-1 1)
		    (sincf i+w 1)
		    (sincf i+w+1 1)))
	     (sincf i 2)
	     (sincf i-1 2)
	     (sincf i+1 2)
	     (sincf i-w-1 2)
	     (sincf i-w 2)
	     (sincf i-w+1 2)
	     (sincf i+w-1 2)
	     (sincf i+w 2)
	     (sincf i+w+1 2))
	 (rotatef ca1 ca2))

    ;; save it
    #+ignore
    (with-open-file (stream file :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede
                            :if-does-not-exist :create)
      (loop for y from 0 below ca-height
            with i = 0 do
            (loop for x from 0 below ca-width do
                  (let ((cell (aref ca1 i)))
                    (write-byte cell stream))
                  (incf i))))))

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 16:19:39 +0000
Message-ID: <1xr7ecywcpn34$.v4dcsojed2v.dlg@40tude.net>

Thomas Schilling wrote:

> I basically pulled up the calculation of the array offsets (variables
> named "i-1", "i+1", "i-w-1", ...) . Still the register allocator does a
> bad job.

This helped a bit. I assume your sincf looks like below. Now it runs in 9.1
seconds with SBCL. Still more than 5 times slower than C, but at least 3
times faster than the first version.

> I also refrained from rewriting
> 
>  (aref ca1 (+ i width -1))
>  (aref ca1 (+ i width))
>  (aref ca1 (+ i width 1))
> 
> into
> 
>  (+ ...
>   (for d from (+ i width -1) to (+ i width 1) summing (aref c1 d))
>   ...)
> 
> because I assumed the possible speedup would be spoiled by the register
> allocator, again.  You might try, though.

I didn't manage to write it without warnings like this:

; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline fixnum arithmetic (cost 2) because:
;       The first argument is a UNSIGNED-BYTE, not a FIXNUM.

The latest code:

(deftype ui8 () '(unsigned-byte 8))
(defmacro ui8 (a) `(the ui8 ,a))
(defmacro s+ (&rest args)
  `(the fixnum (+ ,@(loop for arg in args collect `(ui8 ,arg)))))
(defmacro sref (a i) `(ui8 (svref ,a (the fixnum ,i))))
(defmacro s= (a b) `(= (the fixnum ,a) (the fixnum ,b)))
(defmacro setfs (a b) `(setf ,a (the fixnum ,b)))
(defmacro sincf (a b) `(incf ,a (the fixnum ,b)))

(defconstant +width+ 640)
(defconstant +height+ 480)
(defconstant +ca-width+ (+ +width+ 2))
(defconstant +ca-height+ (+ +height+ 2))
(defconstant +len+ (* +ca-width+ +ca-height+))

(defun generations (count file) 
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )" 
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0)))
  (let* ((ca1 (make-array +len+ :initial-element 0)) 
         (ca2 (make-array +len+ :initial-element 0))) 
    (declare (type fixnum count)
             (type fixnum +width+)
             (type fixnum +height+)
             (type fixnum +ca-width+)
             (type fixnum +ca-height+)
             (type fixnum +len+)
             (type simple-vector ca1)
             (type simple-vector ca2))

    ;; seed the cellular automaton space 
    (setf (aref ca1 (- (floor +len+ 2) +ca-width+ 4)) 1) 

    ;; calcuclate the generation "count" 
    (loop for c fixnum from 1 to count do
          (loop for y fixnum from 0 below +height+
                with i fixnum = (1+ +ca-width+)
                with i+1 fixnum = (+ i 1)
                with i-1 fixnum = (- i 1)
                with i+w-1 fixnum = (+ i +ca-width+ -1)
                with i+w fixnum = (+ i +ca-width+)
                with i+w+1 fixnum = (+ i +ca-width+ 1)
                with i-w-1 fixnum = (- i +ca-width+ 1)
                with i-w fixnum = (- i +ca-width+)
                with i-w+1 fixnum = (- i +ca-width+ -1) do
                (loop for x fixnum from 0 below +width+ do
                      (let ((cell (the fixnum (aref ca1 i))))
                        (declare (type fixnum cell))
                        (if (zerop cell)
                            ;; summing the 8 neighbours
                            (let ((sum (s+ (sref ca1 i+1)
                                           (sref ca1 i-1)
                                           (sref ca1 i+w-1)
                                           (sref ca1 i+w)
                                           (sref ca1 i+w+1)
                                           (sref ca1 i-w-1)
                                           (sref ca1 i-w)
                                           (sref ca1 i-w+1))))
                              (declare (type fixnum sum))
                              (when (= sum 1) (setfs cell 1)))
                          (if (s= cell 62)
                              (setfs cell 0)
			    (incf cell)))
                        (setfs (sref ca2 i) cell)
                        (sincf i 1)
                        (sincf i-1 1)
                        (sincf i+1 1)
                        (sincf i-w-1 1)
                        (sincf i-w 1)
                        (sincf i-w+1 1)
                        (sincf i+w-1 1)
                        (sincf i+w 1)
                        (sincf i+w+1 1)))
                (sincf i 2)
                (sincf i-1 2)
                (sincf i+1 2)
                (sincf i-w-1 2)
                (sincf i-w 2)
                (sincf i-w+1 2)
                (sincf i+w-1 2)
                (sincf i+w 2)
                (sincf i+w+1 2))
          (rotatef ca1 ca2))

    ;; save it 
    (with-open-file (stream file :direction :output 
                            :element-type '(unsigned-byte 8) 
                            :if-exists :supersede 
                            :if-does-not-exist :create) 
      (loop for y from 0 below +ca-height+ 
            with i = 0 do 
            (loop for x from 0 below +ca-width+ do 
                  (let ((cell (aref ca1 i))) 
                    (write-byte cell stream)) 
                  (incf i)))))) 

(defun test ()
  (generations 1000 "c:/tmp/test.txt"))

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Douglas Crosher
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 13:25:49 +0000
Message-ID: <4426965D.6080604@scieneer.com>

Frank Buss wrote:
...
> Which implementation is as fast as Visual C++ 7.1?
> 
> Thanks for your references. I was able to optimize it with some tips from
> the Fannkuch benchmark. Looks like in LispWorks using SVREF instead of AREF
> and some more type declarations makes it multiple times faster, doubles the
> speed for SBCL and gives some percent for CLISP. With later versions of
> LispWorks it is a bit slower than with older versions:
> 
> Visual Studio C++ 7.1.3088: 1.9s
> java version "1.5.0_06": 6.4s
> sbcl-0.9.9-x86-win32: 17s
> LispWorks Personal Edition 4.3.7: 25s
> LispWorks Personal Edition 4.4.6: 33s
> CLISP-2.3.8: 483s
> 
> But this is still multiple times slower than the C or the Java version. 

The Scieneer Common Lisp implementation does a better job and appears
to be a good deal faster than Java but still slower than C.

* (time (test))
Compiling lambda nil:
Compiling Top-Level Form:

Evaluation took:
   3.87 seconds of real time
   3.856241 seconds of user run time
   0.012001 seconds of system run time
   0 page faults
   4,955,632 bytes consed by this thread and
   4,955,632 total bytes consed.
nil

And for reference CLISP on the same system:
[4]> (time (test))
Real time: 497.23822 sec.
Run time: 492.19077 sec.
Space: 4956240 Bytes
GC: 2, GC time: 0.048003 sec.
NIL

Use the competitive edge that lisp has and developer a smarter algorithm
in a fraction of the time and leave the competition for dust.

Regards
Douglas Crosher

From: John Thingstad
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 12:32:05 +0000
Message-ID: <op.s60s3rwgpqzri1@mjolner.upc.no>

On Sun, 26 Mar 2006 06:51:45 +0200, Frank Buss <··@frank-buss.de> wrote:

A lot of stuff has already been mentioned here.
(the fixmun, delaration for sum etc.
In addition in the C code your variables like width are
#define'd while in the Lisp code they are defined in let.

#define WIDTH 640
#define HEIGHT 480
#define CA_WIDTH (WIDTH + 2)
#define CA_HEIGHT (HEIGHT + 2)

Do a defconstant instead and you get a value substitution.
This should speed up the inner loop.
Also write-sequence rather than loop - write-byte (minor).

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 15:42:47 +0000
Message-ID: <k7fyipg9f2s2.1dvgnlq5xvzva.dlg@40tude.net>

John Thingstad wrote:

> Do a defconstant instead and you get a value substitution.
> This should speed up the inner loop.

no, it doesn't, at least not for SBCL.

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Wade Humeniuk
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 15:33:27 +0000
Message-ID: <bvyVf.8025$B_1.4598@edtnps89>

Here is the best I could get on LW.  This version seems to be 2x
a fast as your fastest LW version.  There are far fewer arefs
in this code.  (I think the algorithm is correct, but it is not
thoroughly checked out).

CL-USER 16 > (time (test))
Timing the evaluation of (TEST)

user time    =     15.882
system time  =      0.000
Elapsed time =   0:00:16
Allocation   = 630336 bytes standard / 3113 bytes conses
0 Page faults
NIL

CL-USER 17 >
(defun generations (count file)
   "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
   (declare (optimize (speed 3) (safety 0) (space 0) (debug 0)
                      #+lispworks(float 0) #+lispworks(fixnum-safety 0))
            (type fixnum count))

   (let* ((width 640)
          (height 480)
          (ca-width (+ width 2))
          (ca-height (+ height 2))
          (len (* ca-width ca-height))
          (ca1 (make-array len :element-type '(unsigned-byte 8) :initial-element 0))
          (ca2 (make-array len :element-type '(unsigned-byte 8) :initial-element 0))
          (c00 0) (c01 0) (c02 0)
          (c10 0) (c11 0) (c12 0)
          (c20 0) (c21 0) (c22 0)
          (i 0) (i0 0) (i1 0) (i2 0))
     (declare (type (simple-array (unsigned-byte 8) (*)) ca1 ca2)
              (type fixnum width height ca-width ca-height len
                    c00 c01 c02 c10 c11 c12 c20 c21 c22
                    i i0 i1 i2))

     (flet ((set-filter (i)
              (setf i0 (- i ca-width) i1 i i2 (+ i ca-width))
              (setf c00 (aref ca1 (1- i0)) c01 (aref ca1 i0) c02 (aref ca1 (1+ i0))
                    c10 (aref ca1 (1- i1)) c11 (aref ca1 i1) c12 (aref ca1 (1+ i1))
                    c20 (aref ca1 (1- i2)) c21 (aref ca1 i2) c22 (aref ca1 (1+ i2))))
            (move-filter-right ()
              (setf i0 (1+ i0) i1 (1+ i1) i2 (1+ i2))
              (setf c00 c01 c01 c02 c02 (aref ca1 (1+ i0))
                    c10 c11 c11 c12 c12 (aref ca1 (1+ i1))
                    c20 c21 c21 c22 c22 (aref ca1 (1+ i2)))))
       (declare (inline set-filter move-filter-right))
       ;; seed the cellular automaton space
       (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)

       ;; calcuclate the generation "count"
       (loop repeat count do
             (setf i (+ 1 ca-width))
             (loop repeat height do
                   (set-filter i)
                   (loop repeat width do
                         (setf (aref ca2 i)
                               (if (zerop c11)
                                   ;; summing the 8 neighbours
                                   (let ((sum (+ (+ c00 c01 c02)
                                                 (+ c10     c12)
                                                 (+ c20 c21 c22))))
                                     (declare (type fixnum sum))
                                     (if (= sum 1) 1 sum))
                                 (if (= c11 62) 0 (1+ c11))))
                         (incf i)
                         (move-filter-right))
                   (incf i 2))
             (rotatef ca1 ca2)))

     ;; save it
     (with-open-file (stream file :direction :output
                             :element-type '(unsigned-byte 8)
                             :if-exists :supersede
                             :if-does-not-exist :create)
       (loop repeat height
             for i = 1 then (+ i ca-width) do
             (write-sequence ca1 stream :start i :end (+ i width))))))

(defun test ()
   (generations 1000 "c:/tmp/test.txt"))

From: Wade Humeniuk
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sun, 26 Mar 2006 15:59:08 +0000
Message-ID: <gTyVf.12329$%H.6175@clgrps13>

Wade Humeniuk wrote:
> Mistake at

>                                     (if (= sum 1) 1 sum))
should be                             (if (= sum 1) 1 c11)

Wade

From: Dmitry Gorbatovsky
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Mon, 27 Mar 2006 00:32:14 +0000
Message-ID: <e07boc$315$1@emma.aioe.org>

Pleas ...

-- 
*Prolonged contact with the computer turns mathematicians
into clerks and vice versa.

From: bradb
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 06:10:11 +0000
Message-ID: <1143526211.570799.218260@t31g2000cwb.googlegroups.com>

This is the best I can do, considering I am new to Lisp maybe it's not
too bad.  Though it is a little scary that you need to be a serious
Lisp expert to get C speed out of Lisp.
This is run on a 1.83 Ghz Intel Mac.  SBCL is version 0.9.10.37.
C code runs in 3.34 seconds.
SBCL in 8.139
My code is not indented to emacs style, if this will cause you an
aneurysm, please stop reading now.

Brad

(defconstant +WIDTH+ 640)
(defconstant +HEIGHT+ 480)
(defconstant +CA-WIDTH+ (+ +WIDTH+ 2))
(defconstant +CA-HEIGHT+ (+ +HEIGHT+ 2))
(defconstant +CA-WIDTH+1+ (+ +CA-WIDTH+ 1))
(defconstant +WIDTH-1+ (- +WIDTH+ 1))
(defconstant +HEIGHT-1+ (- +HEIGHT+ 1))

(declaim (inline sum-neighbours))
(defun sum-neighbours (array pos)
  (declare (optimize (speed 3) (safety 0)))
  (declare (type (vector (unsigned-byte 8)) array)
           (type fixnum pos))
  (+ (aref array (1+ pos))
     (aref array (1- pos))
     (aref array (+ pos +CA-WIDTH+ -1))
     (aref array (+ pos +CA-WIDTH+))
     (aref array (+ pos +CA-WIDTH+ 1))
     (aref array (- pos +CA-WIDTH+ 1))
     (aref array (- pos +CA-WIDTH+))
     (aref array (- pos +CA-WIDTH+ -1))))

(declaim (inline loop-body))
(defun loop-body (ca1 ca2)
  (declare (inline sum-neighbours))
  (declare (optimize (speed 3) (safety 0)))
  (declare (type (simple-array (unsigned-byte 8)) ca1 ca2))
  (let ((i +CA-WIDTH+1+)
        (cell 0))
    (declare (type fixnum cell i))
    (dotimes (y +HEIGHT+)
      (dotimes (x +WIDTH+)
        (declare (type fixnum y x)
                 (type (unsigned-byte 8) cell))
        (setf cell (aref ca1 i))
        (if (= cell 0)
          (when (= (sum-neighbours ca1 i) 1) (setf cell 1))
          (when (= (incf cell) 63) (setf cell 0)))
        (setf (aref ca2 i) cell)
        (incf i)))
    (incf i 2)))

(defun main-loop (num-loops)
  (declare (optimize (speed 3) (safety 0)))
  (declare (type fixnum num-loops))
  (declare (inline loop-body))
  (let ((ca1 (make-array (* +CA-WIDTH+ +CA-HEIGHT+)
                         :element-type '(unsigned-byte 8)
:initial-element 0))
        (ca2 (make-array (* +CA-WIDTH+ +CA-HEIGHT+)
                         :element-type '(unsigned-byte 8)
:initial-element 0)))

    (setf (aref ca1 (- (/ (* +CA-WIDTH+ +CA-HEIGHT+) 2) +CA-WIDTH+ 4))
1)
    (loop repeat num-loops
          do (loop-body ca1 ca2)
          (rotatef ca1 ca2))
    ca1))

(defun call-main (num-loops)
  (let ((ca1 nil))
   (time
      (setf ca1 (main-loop num-loops)))
    (with-open-file (f "lisplog.txt" :direction :output :element-type
'(unsigned-byte 8) :if-exists :supersede)
      (write-sequence ca1 f))
    0))

(progn
  (gc-off)
  (call-main 1000)
  (gc-on))

From: Kaz Kylheku
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 06:40:41 +0000
Message-ID: <1143528041.674981.280800@e56g2000cwe.googlegroups.com>

bradb wrote:
> This is the best I can do, considering I am new to Lisp maybe it's not
> too bad.  Though it is a little scary that you need to be a serious
> Lisp expert to get C speed out of Lisp.

That's nothing compared to the expert you have to be to get Lisp
flexibility and reliability out of C, on any meaningful scale. :)

In C, you're optimizing everything. Everything is declared, and,
effectively, what would be the safety setting equivalent in C is zero.

Even the main function gets optimized. That argc + argv option
processing, boy, does that get compiled into tight code.  After half a
dozen shared libraries are mmap'ed and initialized, localization and
time zone info is scanned, and hundreds of page faults occur, main() is
ready to tear into that argument vector like a chainsaw!

From: Rob Thorpe
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 10:13:41 +0000
Message-ID: <1143540821.670748.250270@e56g2000cwe.googlegroups.com>

Kaz Kylheku wrote:
> bradb wrote:
> > This is the best I can do, considering I am new to Lisp maybe it's not
> > too bad.  Though it is a little scary that you need to be a serious
> > Lisp expert to get C speed out of Lisp.
>
> That's nothing compared to the expert you have to be to get Lisp
> flexibility and reliability out of C, on any meaningful scale. :)

Yes.

> In C, you're optimizing everything. Everything is declared, and,
> effectively, what would be the safety setting equivalent in C is zero.

Not really, it sits in a place that can't be compared to CL safety
settings.  C checks types very well, the problems are unrestricted
pointer arithmetic and indexing of arrays.

> Even the main function gets optimized. That argc + argv option
> processing, boy, does that get compiled into tight code.  After half a
> dozen shared libraries are mmap'ed and initialized, localization and
> time zone info is scanned, and hundreds of page faults occur, main() is
> ready to tear into that argument vector like a chainsaw!

:) and most of that happens in CL too often. Don't forget the huge
number of cache misses you get on code that's never run again.

The real issue is that performance code is not needed in many
situations these days.

From: Kaz Kylheku
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 14:02:51 +0000
Message-ID: <1143554571.794445.31500@v46g2000cwv.googlegroups.com>

Rob Thorpe wrote:
> Kaz Kylheku wrote:
> > bradb wrote:
> > > This is the best I can do, considering I am new to Lisp maybe it's not
> > > too bad.  Though it is a little scary that you need to be a serious
> > > Lisp expert to get C speed out of Lisp.
> >
> > That's nothing compared to the expert you have to be to get Lisp
> > flexibility and reliability out of C, on any meaningful scale. :)
>
> Yes.
>
> > In C, you're optimizing everything. Everything is declared, and,
> > effectively, what would be the safety setting equivalent in C is zero.
>
> Not really, it sits in a place that can't be compared to CL safety
> settings.

Actually yes it can, quite meaningfully.

> C checks types very well, the problems are unrestricted
> pointer arithmetic and indexing of arrays.

You think that's it?

The  C language is chock full of undefined behavior. There is a trap
hiding in every nook and cranny, just about.

- unrestricted type conversions which are unchecked at run time
- effects of statically undetectable or undetected type incompatibility
  (e. g. wrong number or types of arguments)
- effects of arithmetic overflow and certain arithmetic conversions
- side effects, sequencing and order of evaluation.

You can make plenty of trouble even if you don't do anything bad with
array indexing.

Arithmetic in C is can be compared to Lisp that is saturated with
declarations and compiled with safety 0.

If you add two values of type int, the resulting type is int. If the
result doesn't mathematically fit into type int, the behavior is
undefined. That means that the program can halt with or without a
diagnostic, behave unpredictably, continue with an erroneous result
that wraps around or is clamped, etc.

There isn't any standard-defined safety setting which would turn on a
diagnosing behavior.

C programmers do things like:

   /* Check for overflow */
   assert (a < 0 || b < 0 || a < INT_MAX - b);
   /* Check underflow */
   assert (a > 0 || b > 0 || a > INT_MIN - b);
   /* OK! */
   c = a + b;

Other C-like languages don't fare too much better. Remember that
Arianne rocket "accident" that was caused by Ada code which performed a
bad conversion from a 64 bit float to a 16 bit number? In C it's the
same thing. A floating point to integer conversion which is out of
range for the target integral type results in undefined behavior.

  unsigned x = (unsigned) -1.0;  /* Nasal demons! */

From: Rob Thorpe
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 14:49:58 +0000
Message-ID: <1143557398.224495.247630@i40g2000cwc.googlegroups.com>

Kaz Kylheku wrote:
> Rob Thorpe wrote:
> > Kaz Kylheku wrote:
> > > bradb wrote:
> > > > This is the best I can do, considering I am new to Lisp maybe it's not
> > > > too bad.  Though it is a little scary that you need to be a serious
> > > > Lisp expert to get C speed out of Lisp.
> > >
> > > That's nothing compared to the expert you have to be to get Lisp
> > > flexibility and reliability out of C, on any meaningful scale. :)
> >
> > Yes.
> >
> > > In C, you're optimizing everything. Everything is declared, and,
> > > effectively, what would be the safety setting equivalent in C is zero.
> >
> > Not really, it sits in a place that can't be compared to CL safety
> > settings.
>
> Actually yes it can, quite meaningfully.
>
> > C checks types very well, the problems are unrestricted
> > pointer arithmetic and indexing of arrays.
>
> You think that's it?
>
> The  C language is chock full of undefined behavior. There is a trap
> hiding in every nook and cranny, just about.
>
> - unrestricted type conversions which are unchecked at run time

You mean coercion, that's true, yes.

> - effects of statically undetectable or undetected type incompatibility
>   (e. g. wrong number or types of arguments)

I don't understand what you mean here.  Modern C programs tend to use
prototypes to avoid the problems associated with arguments.  That
problem only exists for old C programs and use of varargs.

> - effects of arithmetic overflow and certain arithmetic conversions

Yup.

> - side effects, sequencing and order of evaluation.

These are mostly defined, if you understand the definitions.
Understanding the definitions is the problem.

> You can make plenty of trouble even if you don't do anything bad with
> array indexing.
>
> Arithmetic in C is can be compared to Lisp that is saturated with
> declarations and compiled with safety 0.

No you can't because declare does not do the same thing as C types.
If something is declared then the compiler may emit code on the basis
that it is the specifed type, and may check that the operations
performed on the variable correspond to the specified type.  But it
does not need to do either.

In C though the compiler to be compliant must spit out code that is
makes incorrect assignments unless the programmer casts them or they
fall into the set of expressions that will be automatically coerced.

C is more like lisp saturated with declarations and compiler with
safety 0 on CMUCL or SBCL.  Since on those compilers declares are take
seriously.  Even that though is not quite comparable since those lisps
can take more complex declarations than C.

> If you add two values of type int, the resulting type is int. If the
> result doesn't mathematically fit into type int, the behavior is
> undefined. That means that the program can halt with or without a
> diagnostic, behave unpredictably, continue with an erroneous result
> that wraps around or is clamped, etc.

Yep.

> There isn't any standard-defined safety setting which would turn on a
> diagnosing behavior.
>
> C programmers do things like:
>
>    /* Check for overflow */
>    assert (a < 0 || b < 0 || a < INT_MAX - b);
>    /* Check underflow */
>    assert (a > 0 || b > 0 || a > INT_MIN - b);
>    /* OK! */
>    c = a + b;

Yep, I've done things like that.

From: Kaz Kylheku
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 15:07:57 +0000
Message-ID: <1143558477.367554.216850@z34g2000cwc.googlegroups.com>

Rob Thorpe wrote:
> Kaz Kylheku wrote:
> > - effects of statically undetectable or undetected type incompatibility
> >   (e. g. wrong number or types of arguments)
>
> I don't understand what you mean here.  Modern C programs tend to use
> prototypes to avoid the problems associated with arguments.  That
> problem only exists for old C programs and use of varargs.

Prototypes are just another example of an unsafe static declaration
that is blindly trusted after the compiling is done, and not even
before the linking is done. All it takes to call something is:

   extern int function(double, char *);
   function(3.0, "foo");

This does not necessarily reflect how function() is defined in the
other module. Without type-safe linkage, it will link anyway if
function() exists. (And, by the way, calling an undefined function is
simply undefined behavior; no diagnostic is required).

To achieve safety beyond that, you have to ensure that the site of the
definition of a function, and all sites which call that function, have
the same prior declaration (achieved by including including a common
header file which declares that function). Moreover, you have to have
external tools, not defined by the language, which ensure that all
dependencies are recompiled when something changes.

From: Michael Livshin
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 11:41:37 +0000
Message-ID: <87ek0miyoe.fsf@colinux.pc-mlivshin-ibm.cadence.com.cmm>

"Rob Thorpe" <·············@antenova.com> writes:

> The real issue is that performance code is not needed in many
> situations these days.

no, the real issue is that the traditional "code bumming"-style
optimization is becoming decreasingly relevant as we speak.  

some people even see this as a sign of Von Neumann's prototype
computer architecture finally showing its inherent limitations.  not
that anything else is exactly ready to replace it. :/

cheers,
--m

From: Ken Tilton
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 06:53:19 +0000
Message-ID: <z35Wf.833$YU7.414@fe08.lga>

bradb wrote:
> This is the best I can do, considering I am new to Lisp maybe it's not
> too bad.  Though it is a little scary that you need to be a serious
> Lisp expert to get C speed out of Lisp.

Nonsense. You get C speed out of Lisp by turning Lisp into C, so of 
course you need to be an expert. How much C do you have to know to get 
Lisp expresssiveness out of C? Or get anywhere near it, since of course 
you cannot? Automatic memory management, untyped variables, dynamism... 
as for the tradeoffs, how often does a programmer need a powerful 
language, and how often do they need to code up Conway's Life?

btw, the following looks like it would have run about 12 minutes under 
AllegroCL. I killed it when it went past the five seconds I had in my 
last version and ran it for 100 iterations and it came in a little over 
7 seconds.

ken

> This is run on a 1.83 Ghz Intel Mac.  SBCL is version 0.9.10.37.
> C code runs in 3.34 seconds.
> SBCL in 8.139
> My code is not indented to emacs style, if this will cause you an
> aneurysm, please stop reading now.
> 
> Brad
> 
> (defconstant +WIDTH+ 640)
> (defconstant +HEIGHT+ 480)
> (defconstant +CA-WIDTH+ (+ +WIDTH+ 2))
> (defconstant +CA-HEIGHT+ (+ +HEIGHT+ 2))
> (defconstant +CA-WIDTH+1+ (+ +CA-WIDTH+ 1))
> (defconstant +WIDTH-1+ (- +WIDTH+ 1))
> (defconstant +HEIGHT-1+ (- +HEIGHT+ 1))
> 
> (declaim (inline sum-neighbours))
> (defun sum-neighbours (array pos)
>   (declare (optimize (speed 3) (safety 0)))
>   (declare (type (vector (unsigned-byte 8)) array)
>            (type fixnum pos))
>   (+ (aref array (1+ pos))
>      (aref array (1- pos))
>      (aref array (+ pos +CA-WIDTH+ -1))
>      (aref array (+ pos +CA-WIDTH+))
>      (aref array (+ pos +CA-WIDTH+ 1))
>      (aref array (- pos +CA-WIDTH+ 1))
>      (aref array (- pos +CA-WIDTH+))
>      (aref array (- pos +CA-WIDTH+ -1))))
> 
> (declaim (inline loop-body))
> (defun loop-body (ca1 ca2)
>   (declare (inline sum-neighbours))
>   (declare (optimize (speed 3) (safety 0)))
>   (declare (type (simple-array (unsigned-byte 8)) ca1 ca2))
>   (let ((i +CA-WIDTH+1+)
>         (cell 0))
>     (declare (type fixnum cell i))
>     (dotimes (y +HEIGHT+)
>       (dotimes (x +WIDTH+)
>         (declare (type fixnum y x)
>                  (type (unsigned-byte 8) cell))
>         (setf cell (aref ca1 i))
>         (if (= cell 0)
>           (when (= (sum-neighbours ca1 i) 1) (setf cell 1))
>           (when (= (incf cell) 63) (setf cell 0)))
>         (setf (aref ca2 i) cell)
>         (incf i)))
>     (incf i 2)))
> 
> (defun main-loop (num-loops)
>   (declare (optimize (speed 3) (safety 0)))
>   (declare (type fixnum num-loops))
>   (declare (inline loop-body))
>   (let ((ca1 (make-array (* +CA-WIDTH+ +CA-HEIGHT+)
>                          :element-type '(unsigned-byte 8)
> :initial-element 0))
>         (ca2 (make-array (* +CA-WIDTH+ +CA-HEIGHT+)
>                          :element-type '(unsigned-byte 8)
> :initial-element 0)))
> 
>     (setf (aref ca1 (- (/ (* +CA-WIDTH+ +CA-HEIGHT+) 2) +CA-WIDTH+ 4))
> 1)
>     (loop repeat num-loops
>           do (loop-body ca1 ca2)
>           (rotatef ca1 ca2))
>     ca1))
> 
> (defun call-main (num-loops)
>   (let ((ca1 nil))
>    (time
>       (setf ca1 (main-loop num-loops)))
>     (with-open-file (f "lisplog.txt" :direction :output :element-type
> '(unsigned-byte 8) :if-exists :supersede)
>       (write-sequence ca1 f))
>     0))
> 
> (progn
>   (gc-off)
>   (call-main 1000)
>   (gc-on))
> 


-- 
Cells: http://common-lisp.net/project/cells/

"Have you ever been in a relationship?"
    Attorney for Mary Winkler, confessed killer of her
    minister husband, when asked if the couple had
    marital problems.

From: bradb
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 15:30:26 +0000
Message-ID: <1143559826.379164.199310@t31g2000cwb.googlegroups.com>

Would you mind sharing your version?  Or can someone direct me to the
current fastest version that exists within this thread?

I took my best shot at getting C like speed out of Lisp, I got within a
factor of 3 on my machine, which I think isn't bad - but not good
enough in some situations.  Most other people are saying things like
"You can get the speed of C in Lisp by being expert enough" and "How
expert in C do you need to be to get Lisp's flexibility?".
The first argument sounds to me like "Lisp will be fast with a
sufficently smart programmer", to paraphrase an old saying.

So, are there any sufficiently smart Lisp programmers that can match
Frank's C implementation for speed?  Can we see your code please?
I am genually interested in learning what performance code looks like.

Cheers
Brad

From: ·······@gmail.com
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 07 Apr 2006 19:13:03 +0000
Message-ID: <1144437183.529094.62700@z34g2000cwc.googlegroups.com>

bradb wrote:
> This is the best I can do, considering I am new to Lisp maybe it's not
> too bad.  Though it is a little scary that you need to be a serious
> Lisp expert to get C speed out of Lisp.
> This is run on a 1.83 Ghz Intel Mac.  SBCL is version 0.9.10.37.
> C code runs in 3.34 seconds.
> SBCL in 8.139

Here's my refinement.  For comparison, on a 2.40GHz P4 with Linux, SBCL
0.9.10.22, your original code runs in:

Evaluation took:
  9.605 seconds of real time
  9.51 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  618,912 bytes consed.

My refinements, running on the same machine:

Evaluation took:
  4.337 seconds of real time
  4.27 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  618,912 bytes consed.

This is starting to get pretty close to C (I didn't run the C code on
this machine, however).  Please note that the code below is compiled
with default safety options--I believe for SBCL this would be (SAFETY
1)--so you still get all the type checking, array bounds checking, etc.
goodness that comes standard in CL.

And now I have to get back to work. :)

-Nathan

(defconstant +WIDTH+ 640)
(defconstant +HEIGHT+ 480)
(defconstant +CA-WIDTH+ (+ +WIDTH+ 2))
(defconstant +CA-HEIGHT+ (+ +HEIGHT+ 2))
(defconstant +CA-WIDTH+1+ (+ +CA-WIDTH+ 1))
(defconstant +WIDTH-1+ (- +WIDTH+ 1))
(defconstant +HEIGHT-1+ (- +HEIGHT+ 1))

(deftype ca-world () `(simple-array (unsigned-byte 8) (,(* +CA-WIDTH+
+CA-HEIGHT+))))
(deftype ca-index () `(integer 0 (mod ,(* +CA-WIDTH+ +CA-HEIGHT+))))

(declaim (inline sum-neighbours))
(defun sum-neighbours (array pos)
  (declare (optimize (speed 3)))
  (declare (type ca-world array)
           (type (integer #.+CA-WIDTH+1+ #.(+ +CA-WIDTH+1+ (* +HEIGHT+
+WIDTH+))) pos))
  (+ (aref array (1+ pos))
     (aref array (1- pos))
     (aref array (+ pos +CA-WIDTH+ -1))
     (aref array (+ pos +CA-WIDTH+))
     (aref array (+ pos +CA-WIDTH+ 1))
     (aref array (- pos +CA-WIDTH+ 1))
     (aref array (- pos +CA-WIDTH+))
     (aref array (- pos +CA-WIDTH+ -1))))

(declaim (inline loop-body))
(defun loop-body (ca1 ca2)
  (declare (inline sum-neighbours))
  (declare (optimize (speed 3)))
  (declare (type ca-world ca1 ca2))
  (let ((i +CA-WIDTH+1+))
    (dotimes (y +HEIGHT+)
      (dotimes (x +WIDTH+)
        (let ((cell (aref ca1 i)))
          (if (= cell 0)
              (if (= (sum-neighbours ca1 i) 1)
                  (setf (aref ca2 i) 1)
                  (setf (aref ca2 i) cell))
              (if (= cell 62)
                  (setf (aref ca2 i) 0)
                  (setf (aref ca2 i) (1+ cell)))))
        (incf i))
      (incf i 2))))

(defun main-loop (num-loops)
  (declare (optimize (speed 3)))
  (declare (notinline loop-body))
  (let ((ca1 (make-array (* +CA-WIDTH+ +CA-HEIGHT+)
                         :element-type '(unsigned-byte 8)
                         :initial-element 0))
        (ca2 (make-array (* +CA-WIDTH+ +CA-HEIGHT+)
                         :element-type '(unsigned-byte 8)
                         :initial-element 0)))

    (setf (aref ca1 (- (/ (* +CA-WIDTH+ +CA-HEIGHT+) 2) +CA-WIDTH+ 4))
          1)
    (loop repeat num-loops
      do (loop-body ca1 ca2)
      (rotatef ca1 ca2))
    ca1))

(defun call-main (num-loops)
  (let ((ca1 nil))
    (time
     (setf ca1 (main-loop num-loops)))
    (with-open-file (f "lisplog.txt" :direction :output :element-type
                       '(unsigned-byte 8) :if-exists :supersede)
      (write-sequence ca1 f))
    0))

(call-main 1000)

From: bradb
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 07 Apr 2006 20:42:23 +0000
Message-ID: <1144442543.683904.272530@e56g2000cwe.googlegroups.com>

Very cool, that is the kind of reply that I was wanting.  I'll have to
run it through my SBCL tonight.

Thanks alot!

Brad

From: bradb
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Sat, 08 Apr 2006 02:18:32 +0000
Message-ID: <1144462712.537859.247560@e56g2000cwe.googlegroups.com>

Results on my machine
C - 3.33
My SBCL version - 8.2 seconds
Yours - 5.406

So we're down to less than a factor or 2.  Nice!  Interestingly if one
follows the SBCL optimisation notes, things get slower?!?  I guess this
speaks well of the underlying type inference from the compiler, but not
so well of the printed notes.
Somebody mentioned that the array access code must cope with arrays
potentially being moved, I wonder if there is a way to tell the
compiler that "this code will run with GC off", so that the generated
assembler won't have to worry about array relocation.

Cheers
Brad

From: Antonio Menezes Leitao
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Tue, 28 Mar 2006 21:51:49 +0000
Message-ID: <87u09igruy.fsf@gia.ist.utl.pt>

Frank Buss <··@frank-buss.de> writes:

> I've tested my cellular automaton from
> http://www.frank-buss.de/automaton/arabeske.html again on my current
> computer with C and some Common Lisp implementations on Windows:
>
> ...
>
> I'm interested in the timings for other Lisp implementations. You can use
> the C version from http://www.frank-buss.de/tmp/arabeske.zip for a time
> reference (or use the CLISP version).
>
> In another thread in this newsgroup some people wrote that Common Lisp can
> be as fast as C. This might be possible, but looks like no current
> implementation reachs this goal. And I know, it's unfair to compare C and
> Lisp with such low-level tasks, because Lisp is better for high-level
> tasks, but would be nice to do fast image processing like in
> http://www.frank-buss.de/lisp/texture.html in Lisp, only.

If you don't mind compiling your code with slightly different
semantics, then you don't need to do much to have more or less the
same performance as C _without_ messing with lots of type
declarations.

What I usually do is to develop in Common Lisp.  If I need extreme
performance, I just add a few type declarations and feed the program
to the Linj compiler that outputs Java source code.  Then, I use a
Java compiler to create an efficient version.

The interesting thing is that I only need the include as many type
declarations as the Linj compiler requires me to do.  It will infer
all the others according to Java semantics.

So, just for comparison, I timed you original Common Lisp code (in
Allegro 7.0) and your hand-written C and Java code in my laptop and
got the following results:

Allegro CL 7.0 Linux: 117.1 s
GCC 4.0.3 Debian:      11.2 s
Java 1.4.2 Linux:      11.2 s

So, without any  special optimization, the Java code runs on par with
the C code and both much faster than the Common Lisp code.

Now, let's change your example so that it becomes easier to translate
into Java.  First, here is your code _without_ type declarations:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun generations (count file) 
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )" 
  (let* ((width 640) 
         (height 480) 
         (ca-width (+ width 2)) 
         (ca-height (+ height 2)) 
         (len (* ca-width ca-height)) 
         (ca1 (make-array len :initial-element 0)) 
         (ca2 (make-array len :initial-element 0)))

    ;; seed the cellular automaton space 
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1) 

    ;; calcuclate the generation "count" 
    (loop for c  from 1 to count do 
          (loop for y  from 0 below height 
                with i  = (1+ ca-width) do 
                (loop for x  from 0 below width do 
                      (let ((cell (aref ca1 i))) 
                        (if (zerop cell) 
                            ;; summing the 8 neighbours 
                            (let ((sum (+ (aref ca1 (1+ i)) 
                                          (aref ca1 (1- i)) 
                                          (aref ca1 (+ i ca-width -1)) 
                                          (aref ca1 (+ i ca-width)) 
                                          (aref ca1 (+ i ca-width 1)) 
                                          (aref ca1 (- i ca-width -1)) 
                                          (aref ca1 (- i ca-width)) 
                                          (aref ca1 (- i ca-width 1))))) 
                              (when (= sum 1) (setf cell 1))) 
                          (if (= cell 62) 
                              (setf cell 0) 
                            (incf cell))) 
                        (setf (aref ca2 i) cell) 
                        (incf i))) 
                (incf i 2)) 
          (rotatef ca1 ca2)) 

    ;; save it 
    (with-open-file (stream file :direction :output 
                            :element-type '(unsigned-byte 8) 
                            :if-exists :supersede 
                            :if-does-not-exist :create) 
      (loop for y from 0 below ca-height 
            with i = 0 do 
            (loop for x from 0 below ca-width do 
                  (let ((cell (aref ca1 i))) 
                    (write-byte cell stream)) 
                  (incf i))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Using the previous code, the timings get even worse (as expected).  I
get 133.1 seconds in Allegro.

Now, let's change the function name and include a type declaration for
its parameters.  These declarations aren't strictly needed but,
without them, Linj will include extra libraries (to deal with bignums,
symbols and such) that we don't need for this example.  Also, we need
to be a bit more specific about the array declaration.  You will also
notice that I had to change the with-open-file macro call because Java
IO is somewhat different and I replaced the rotatef macro by a psetf
because I didn't bother to include rotatef in Linj.  Anyway, here is
the result.  I hope you will not find many differences.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun main (count file)
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
  (declare (int count) (string file))                        ;;changed
  (let* ((width 640) 
         (height 480) 
         (ca-width (+ width 2)) 
         (ca-height (+ height 2)) 
         (len (* ca-width ca-height)) 
         (ca1 (make-array (list len) :element-type 'byte))   ;;changed
         (ca2 (make-array (list len) :element-type 'byte)))  ;;changed

    ;; seed the cellular automaton space 
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1) 

    ;; calcuclate the generation "count" 
    (loop for c from 1 to count do 
          (loop for y from 0 below height 
                with i = (1+ ca-width) do 
                (loop for x from 0 below width do 
                      (let ((cell (aref ca1 i))) 
                        (if (zerop cell) 
                            ;; summing the 8 neighbours 
                            (let ((sum (+ (aref ca1 (1+ i)) 
                                          (aref ca1 (1- i)) 
                                          (aref ca1 (+ i ca-width -1)) 
                                          (aref ca1 (+ i ca-width)) 
                                          (aref ca1 (+ i ca-width 1)) 
                                          (aref ca1 (- i ca-width -1)) 
                                          (aref ca1 (- i ca-width)) 
                                          (aref ca1 (- i ca-width 1))))) 
                              (when (= sum 1) (setf cell 1))) 
                          (if (= cell 62) 
                              (setf cell 0) 
                            (incf cell))) 
                        (setf (aref ca2 i) cell) 
                        (incf i))) 
                (incf i 2)) 
          (psetf ca1 ca2                                    ;;changed
		 ca2 ca1))

    ;; save it 
    (with-open-file (stream file :direction :output          ;;changed
			    :stream-class file-output-stream) 
      (loop for y from 0 below ca-height 
            with i = 0 do 
            (loop for x from 0 below ca-width do 
                  (let ((cell (aref ca1 i))) 
                    (write-byte cell stream)) 
                  (incf i))))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Now, you can translate this code into Java using the Linj compiler.
The result is:

//////////////////////////////////////////////////////////////////////
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintWriter;

public class Generator extends Object {

    // methods

    // a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )
    public static void main(String[] outsideArgs) throws FileNotFoundException {
        int count = Integer.parseInt(outsideArgs[0]);
        String file = outsideArgs[1];
        int width = 640;
        int height = 480;
        int caWidth = width + 2;
        int caHeight = height + 2;
        int len = caWidth * caHeight;
        byte[] ca1 = new byte[len];
        byte[] ca2 = new byte[len];
        ca1[(((int)Math.floor(len / 2)) - caWidth) - 4] = 1;
        for (int c = 1; c <= count; ++c) {
            {
                int y = 0;
                for (int i = caWidth + 1; y < height; ++y) {
                    for (int x = 0; x < width; ++x) {
                        byte cell = ca1[i];
                        if (cell == 0) {
                            int sum =
                                ca1[i + 1] +
                                ca1[i - 1] +
                                ca1[i + caWidth + -1] +
                                ca1[i + caWidth] +
                                ca1[i + caWidth + 1] +
                                ca1[(i - caWidth) - -1] +
                                ca1[i - caWidth] +
                                ca1[(i - caWidth) - 1];
                            if (sum == 1) {
                                cell = 1;
                            }
                        } else if (cell == 62) {
                            cell = 0;
                        } else {
                            ++cell;
                        }
                        ca2[i] = cell;
                        ++i;
                    }
                    i += 2;
                }
            }
            byte[] ca10 = ca1;
            ca1 = ca2;
            ca2 = ca10;
        }
        PrintWriter stream = new PrintWriter(new FileOutputStream(file));
        try {
            int y = 0;
            for (int i = 0; y < caHeight; ++y) {
                for (int x = 0; x < caWidth; ++x) {
                    byte cell = ca1[i];
                    stream.write(cell);
                    ++i;
                }
            }
        } finally {
            if (stream != null) {
                stream.close();
            }
        }
    }
}
//////////////////////////////////////////////////////////////////////

You might find it interesting to compare the translated code with your
hand-written code.

Now, the good news is that this code executes the same example in 11.7
seconds which is 10 times faster than your original Common Lisp
version that included type declarations and is similar to your
hand-written Java a C examples.  Note that I didn't include type
declarations except for the function parameters.

Here is the summary:

Your versions
Allegro CL 7.0 Linux: 117.1 s
GCC 4.0.3 Debian:      11.2 s
Java 1.4.2 Linux:      11.2 s

Linj version
Java 1.4.2 Linux:      11.7 s

Obviously, now that the code is targeting Java, we might take
advantage of Java libraries and we can do the file printing as you
did.  Here is the change:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun main (count file)
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"

  ...

    ;; save it 
    (with-open-file (stream file :direction :output
			    :stream-class file-output-stream
			    :decorator-classes nil)
      (write stream ca1))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

The translation (for the changed fragment) is:

//////////////////////////////////////////////////////////////////////
import java.io.FileOutputStream;
import java.io.IOException;

public class Generator extends Object {

    // methods

    // a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )
    public static void main(String[] outsideArgs) throws IOException {

    ...

        FileOutputStream stream = new FileOutputStream(file);
        try {
            stream.write(ca1);
        } finally {
            if (stream != null) {
                stream.close();
            }
        }
    }
}
//////////////////////////////////////////////////////////////////////

Notice that using with-open-file in Linj gives you the correct
treatment to IO errors (that you forgot in the hand-written Java
example).

Ant�nio Leit�o.

From: Shyamal Prasad
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 05:43:54 +0000
Message-ID: <87mzf7godj.fsf@localhost.localdomain>

>>>>> "Frank" == Frank Buss <··@frank-buss.de> writes:

    Frank> I've tested my cellular automaton from
    Frank> http://www.frank-buss.de/automaton/arabeske.html again on
    Frank> my current computer with C and some Common Lisp
    Frank> implementations on Windows:

    Frank> Time for calculating 1000 generations:

    Frank> Visual Studio C++ 7.1.3088: 1.9s 
    Frank> sbcl-0.9.9-x86-win32: 29s

I'm no lisp expert, but while browsing this thread I never noticed
until now how much slower sbcl (and your other lisp implementations)
were than the C++ on win32. So I got around to timing it on my
computers.

    Frank> I'm interested in the timings for other Lisp
    Frank> implementations. You can use the C version from
    Frank> http://www.frank-buss.de/tmp/arabeske.zip for a time
    Frank> reference (or use the CLISP version).

Linux on a 600 Mhz Pentium: your lisp code is approximately  2.65x slower.

gcc (-O3) 3.3.5 - 10.71s
sbcl 0.8.16 - 28.42s

Linux on a 2Ghz PowerPC G5: your lisp code is approximaely 2.32x slower

gcc (-03) 3.3.6 - 3.86
sbcl 0.9.9 - 8.94

CMUCL and SBCL are advertised as generating poor code on the x86 (see
the man page or documentation) so this improvement was not a total
surprise.

    Frank> In another thread in this newsgroup some people wrote that
    Frank> Common Lisp can be as fast as C. This might be possible,
    Frank> but looks like no current implementation reachs this
    Frank> goal. And I know, it's unfair to compare C and Lisp with
    Frank> such low-level tasks, because Lisp is better for high-level
    Frank> tasks, but would be nice to do fast image processing like
    Frank> in http://www.frank-buss.de/lisp/texture.html in Lisp,
    Frank> only.

I tried an optimized versions you posted later in this thread (with
defconst and better typing) using sbcl on the powerpc. the run time
dropped to 6.86 seconds - the lisp code is now only 1.77 times slower!

A constant factor like this hardly proves anything conclusively about
the language implementation IMHO (except that, on this type of
program, yes, gcc compiles faster code than sbcl).

I'd reached exactly the opposite conclusion: the fact that sbcl is
only ~2 times slower than a C program on a bit of code ideally suited
for optimzation in C was proof that execution speed really is not a
serious issue to consider when choosing lisp for *most* applications.

Oh yes, and sbcl on win32 does seem to suck....I'd be curious to know
why. 

Cheers!
Shyamal

From: Frank Buss
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 09:16:17 +0000
Message-ID: <1jxwar14or6f.fnknagw7leqo$.dlg@40tude.net>

Shyamal Prasad wrote:

> I'd reached exactly the opposite conclusion: the fact that sbcl is
> only ~2 times slower than a C program on a bit of code ideally suited
> for optimzation in C was proof that execution speed really is not a
> serious issue to consider when choosing lisp for *most* applications.

Another conclusion would be that GCC is slower than Visual C++ :-)
Maybe the Intel C compiler is even faster.

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: HB
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 16:38:13 +0000
Message-ID: <1143823093.821952.302820@e56g2000cwe.googlegroups.com>

Don't know about Visual C++ but your guess concerning the Intel C
Compiler seems to be wrong. On an 1100MHz Duron I get the following
results:
gcc-4.1 -march=athlon-xp -O3 arabeske.c -o arabeske-gcc

time ./arabeske-gcc 1000 test.dat

real    0m5.826s
user    0m4.608s
sys     0m0.011s

icc -march=pentium4 -O3 arabeske.c -o arabeske-icc

time ./arabeske-icc 1000 test.dat

real    0m8.064s
user    0m5.596s
sys     0m0.026s

(march=pentium4 seems to generate the fastest code for athlon-xp).
Anybody with knowledge about icc optimization options similar results?

Greetings HB

From: bradb
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 16:56:10 +0000
Message-ID: <1143824170.383138.68650@g10g2000cwb.googlegroups.com>

I don't know how much I'd trust the Intel compiler to generate good
code for non-Intel chips.
See http://www.swallowtail.org/naughty-intel.html

Brad

From: HB
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 20:37:31 +0000
Message-ID: <1143837451.303898.191840@u72g2000cwu.googlegroups.com>

bradb schrieb:

> I don't know how much I'd trust the Intel compiler to generate good
> code for non-Intel chips.
> See http://www.swallowtail.org/naughty-intel.html
>
> Brad

I patched icc with the latest patch on swallowtail.org, should have
mentioned that. But I didn't run benchmarks with patched icc vs.
unpatched icc, so there's the possibility that the patch didn't work on
my build of icc.

From: ·············@verizon.net
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 19:28:53 +0000
Message-ID: <hd5etnuy.fsf@ericsson.com>

    "Frank" == Frank Buss <··@frank-buss.de> writes:

    Frank> Shyamal Prasad wrote:
    >> I'd reached exactly the opposite conclusion: the fact that sbcl
    >> is only ~2 times slower than a C program on a bit of code
    >> ideally suited for optimzation in C was proof that execution
    >> speed really is not a serious issue to consider when choosing
    >> lisp for *most* applications.

    Frank> Another conclusion would be that GCC is slower than Visual
    Frank> C++ :-) Maybe the Intel C compiler is even faster.

Yes. A discussion for another day :) I've heard rumours that Visual
C++ is a better optimizing compiler than GCC (of course, only on
x86!).

But, back to the topic, since I work with open source software, Visual
C++ is not an option for me; gcc *is* my C compiler.

So, for the platforms I care about, I'd stick with my conclusion. For
my "choose a language" mind game SBCL is close enough to GCC that I'd
call it a draw :) My choice between Lisp and C when developing an
application would rarely be based on execution speed.

Cheers!
Shyamal

From: Peter Seibel
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 13:38:24 +0000
Message-ID: <m2acb620q7.fsf@gigamonkeys.com>

Shyamal Prasad <·············@verizon.net> writes:

>>>>>> "Frank" == Frank Buss <··@frank-buss.de> writes:

> Oh yes, and sbcl on win32 does seem to suck....I'd be curious to know
> why. 

Easy there--it's only a few weeks old. Give it time.

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Christophe Rhodes
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 16:31:59 +0000
Message-ID: <sq3bgyefsw.fsf@cam.ac.uk>

Peter Seibel <·····@gigamonkeys.com> writes:

> Shyamal Prasad <·············@verizon.net> writes:
>
>>>>>>> "Frank" == Frank Buss <··@frank-buss.de> writes:
>
>> Oh yes, and sbcl on win32 does seem to suck....I'd be curious to know
>> why. 
>
> Easy there--it's only a few weeks old. Give it time.

SBCL on Win32 sucks in many ways, but in this case I think the suckage
is rather more deep-rooted than the windows port.  (I think the
sentiment in the grandparent post would better be expressed as
something a little more nuanced, like "for certain cases on
register-poor architectures, the implementations of representation
selection and register allocation can interact in ways to produce
strongly suboptimal compilation results").

Additionally, I'd like to second Peter's recommendation, but in a way
he probably didn't mean: if you want SBCL on win32 to suck less, give
it time -- your time. :-)

Christophe

From: ··············@ericsson.com
Subject: Re: Common Lisp implementations are still multiple times slower than C
Date: Fri, 31 Mar 2006 19:17:52 +0000
Message-ID: <mzf6todb.fsf@ericsson.com>

    "Christophe" == Christophe Rhodes <·····@cam.ac.uk> writes:

    Christophe> Peter Seibel <·····@gigamonkeys.com> writes:
    >> Shyamal Prasad <·············@verizon.net> writes:
    >> 
    >>>>>>>> "Frank" == Frank Buss <··@frank-buss.de> writes:
    >>
    >>> Oh yes, and sbcl on win32 does seem to suck....I'd be curious
    >>> to know why.

    >>  Easy there--it's only a few weeks old. Give it time.

    Christophe> Additionally, I'd like to second Peter's
    Christophe> recommendation, but in a way he probably didn't mean:
    Christophe> if you want SBCL on win32 to suck less, give it time
    Christophe> -- your time. :-)

Point taken :-) I don't use win32 to write sofware, or else I would...

My comment was not meant to be any criticism of SBCL's win32 port (I
wouldn't dare, really). It was more a question about the causes of the
OP's reported performance gap (which I don't see on two versions of
SBCL, two versions of the GNU C compiler, on both a Pentium and a
PowerPC machine, when running Linux).

Cheers!
Shyamal