From: Frank Buss
Subject: How to speed up an OpenGL application?
Date: 
Message-ID: <d5jolt$4ii$1@newsreader3.netcologne.de>
I've ported my Arabeske automaton to Lisp. This is the original one
(yes, I wrote it in assembler, first)

http://www.frank-buss.de/automaton/arabeske.html

My first hack in Lisp (which is based on the hack at
http://hocwp.free.fr/ah2cl/ )

http://www.frank-buss.de/tmp/arabeske-cl.zip

To start it, you need LispWorks for Windows, OpenGL and Glu (included in
Windows), Glut (http://www.xmission.com/~nate/glut.html) and my patched
Hello-C version (http://www.frank-buss.de/lisp/hello-c.zip), which needs
ASDF, which you can find for example here:
http://cvs.sourceforge.net/viewcvs.py/cclan/asdf/ 

Then you need to load these files first in the LispWorks IDE:

(defpackage :gl)
(load "...gl-uffi.lisp")
(load "...glu-uffi.lisp")
(load "...glut-uffi.lisp")

Then you can compile my-test.lisp and start it with my-test::start.

It's terrible slow, about 4 frames per second on my PC. So I ported it
again to C++. You can load the source, the Visual Studio project file
and the arabeske.exe (which is 24 kB): 

http://www.frank-buss.de/tmp/arabaske-vc.zip

You just need to install Glut, then you can launch the project,
rebuilding it and you have a new exe. I don't have measured it, but
looks like it runs with VSync speed, so 60 Hz for my TFT, which looks
much smoother. Even in 1000x1000 it runs smooth and looks interesting,
see this screenshot: 

http://www.frank-buss.de/tmp/arabaske.png

Is it possible to tune the Lisp program to run as fast as the C++
program? And how can we simplifiying the setup-process for a working
environment for developing OpenGL applications in Lisp, for all Lisp
implementations on all platforms?

Currently I'm feeling like the "Master Programmer" in this article:

http://www.informatik.hu-berlin.de/~wianke/The_Evolution_of_a_Programmer.html 

The assembler binary was just a 181 byte long program, working on all
PCs from DOS on 386 to Windows on Pentium, without any library.

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Kenny Tilton
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <QBffe.16561$n93.9242@twister.nyc.rr.com>
Frank Buss wrote:
> I've ported my Arabeske automaton to Lisp. This is the original one
> (yes, I wrote it in assembler, first)
...

> It's terrible slow, about 4 frames per second on my PC. So I ported it
> again to C++. You can load the source,...

No, /you/ can load the source, I just need to look at it. You are trying 
to do assembler in Lisp. Please tell me you grok the problem there. You 
have a 640x480 loop, aka 307k iterations. In the loop you hit aref a 
dozen times, god knows what else, without an iota of optimization.

When you want to do assembler in Lisp, you need to defeat the mad cool 
things Lisp does on every instruction. aka, declarations out the wazoo. 
you might even want to lose the array and get into bit operations. When 
in doubt, disassemble and see how tight is your code.

kt

-- 
Cells? Cello?: http://www.common-lisp.net/project/cells/
Cells-Gtk?: http://www.common-lisp.net/project/cells-gtk/
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film

"Doctor, I wrestled with reality for forty years, and I am happy to 
state that I finally won out over it." -- Elwood P. Dowd
From: Frank Buss
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <d5kfjm$995$1@newsreader3.netcologne.de>
Kenny Tilton <·······@nyc.rr.com> wrote:

> You 
> have a 640x480 loop, aka 307k iterations. In the loop you hit aref a 
> dozen times, god knows what else, without an iota of optimization.

yes, this is the speed limiting factor. Copying the 640x480 bitmap with
OpenGL needs nearly no time (nearly no CPU usage at full 60 fps rate),
but the cellular automaton algorithm is too slow in my Lisp
implementation. 

> When you want to do assembler in Lisp, you need to defeat the mad cool
> things Lisp does on every instruction. aka, declarations out the
> wazoo. you might even want to lose the array and get into bit
> operations. When in doubt, disassemble and see how tight is your code.

you are right, but this is difficult. In C I just write the code and I
know that the compiler produces something fast. I've tried with the help
of others to optimize my texture code in Lisp (and one main result was,
that declaring the type of the array helps LispWorks alot, but declaring
the type of normal variables has nearly no effects) and my sound
generation code, but looks like all this is about 10 times faster in C,
without any special optimization declarations in the C program. So the
result is, implementing modern games (which means some action games,
where it is not a good idea too make a 1 GHz PC look like a 100 MHz PC)
in Lisp is not possible without the help of low-level C libraries. 

But don't misunderstand me, I like Lisp and I don't want to use C, if
possible. Perhaps ECL ( http://ecls.sourceforge.net/ ) is a better
solution for such tasks? Has anyone used this for larger, time critical
applications? 

But perhaps it is really easy to create fast Lisp code, so below is the
main problem, first in Common Lisp, then in ANSI C. It looks like the
Lisp implemenation is really slow: 

CL-USER > (time (generations 100 "c:\\tmp\\cl.bin"))
Timing the evaluation of (GENERATIONS 100 "c:\\tmp\\cl.bin")

user time    =     23.312
system time  =      0.000
Elapsed time =   0:00:23
Allocation   = 632576 bytes standard / 9625 bytes conses
0 Page faults
Calls to %EVAL    35
NIL

In C you can calculate more than 12,000 generations in 20 seconds (with
Visual Studio .NET in release mode compiled, with all default settings
for console project), so the current Lisp implementations is more than
120 times slower than the C implementation. 

The sources are free for any usage, feel free to include it in own
benchmarks or other programs. The Common Lisp program:

(proclaim '(optimize (speed 3) (safety 0) (debug 0)))

(defun generations (count file)
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
  (let* ((width 640)
         (height 480)
         (ca-width (+ width 2))
         (ca-height (+ height 2))
         (len (* ca-width ca-height))
         (ca1 (make-array len :element-type '(unsigned-byte 8) :initial-element 0))
         (ca2 (make-array len :element-type '(unsigned-byte 8) :initial-element 0)))
    ;; seed the cellular automaton space
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)

    ;; calcuclate the generation "count"
    (loop for c from 1 to count do
          (loop for y from 0 below height
                with i = (1+ ca-width) do
                (loop for x from 0 below width do
                      (let ((cell (aref ca1 i)))
                        (if (zerop cell)
                            ;; summing the 8 neighbours
                            (let ((sum (+ (aref ca1 (1+ i))
                                          (aref ca1 (1- i))
                                          (aref ca1 (+ i ca-width -1))
                                          (aref ca1 (+ i ca-width))
                                          (aref ca1 (+ i ca-width 1))
                                          (aref ca1 (- i ca-width -1))
                                          (aref ca1 (- i ca-width))
                                          (aref ca1 (- i ca-width 1)))))
                              (when (= sum 1) (setf cell 1)))
                          (if (= cell 62)
                              (setf cell 0)
                            (incf cell)))
                        (setf (aref ca2 i) cell)
                        (incf i)))
                (incf i 2))
          (rotatef ca1 ca2))

    ;; save it
    (with-open-file (stream file :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede
                            :if-does-not-exist :create)
      (loop for y from 0 below ca-height
            with i = 0 do
            (loop for x from 0 below ca-width do
                  (let ((cell (aref ca1 i)))
                    (write-byte cell stream))
                  (incf i))))))


The C source:

#include <stdio.h>
#include <stdlib.h>
#include <memory.h>

#define WIDTH 640
#define HEIGHT 480
#define CA_WIDTH (WIDTH + 2)
#define CA_HEIGHT (HEIGHT + 2)

int main(int argc, char** argv)
{
    	/* a generations CA (see http://www.mirekw.com/ca/rullex_gene.html ) */
	unsigned char *ca1, *ca2, *tmp;
	int count;
	FILE* f;
	int c, x, y;

	if (argc != 3) {
		printf("usage: arabeske count output-file\n");
		return 1;
	}

	count = atoi(argv[1]);
	ca1 = (unsigned char*) malloc(CA_WIDTH*CA_HEIGHT);
	ca2 = (unsigned char*) malloc(CA_WIDTH*CA_HEIGHT);
	memset(ca1, 0, CA_WIDTH*CA_HEIGHT);
	memset(ca2, 0, CA_WIDTH*CA_HEIGHT);

    /* seed the cellular automaton space */
	ca1[CA_WIDTH*CA_HEIGHT/2-CA_WIDTH-4] = 1;

    /* calcuclate the generation "count" */
	for (c = 0; c < count; c++) {
		int i = CA_WIDTH + 1;
		for (y = 0; y < HEIGHT; y++) {
			for (x = 0; x < WIDTH; x++) {
				int cell = ca1[i];
				if (cell) {
					if (++cell == 63) cell = 0;
				} else {
					/* summing the 8 neighbours */
					int sum = ca1[i + 1] +
							ca1[i - 1] +
							ca1[i + CA_WIDTH - 1] +
							ca1[i + CA_WIDTH] +
							ca1[i + CA_WIDTH + 1] +
							ca1[i - CA_WIDTH - 1] +
							ca1[i - CA_WIDTH] +
  							ca1[i - CA_WIDTH + 1];
					if (sum == 1) cell = 1;
				}
				ca2[i] = cell;
				i++;
			}
			i += 2;
		}
		tmp = ca1;
		ca1 = ca2;
		ca2 = tmp;
	}

	/* save it */
	f = fopen(argv[2], "wb");
	if (f) {
		fwrite(ca1, CA_WIDTH * CA_HEIGHT, 1, f);
		fclose(f);
	}

	return 0;
}



-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Kenny Tilton
Subject: How to Speed up Cellular Automata [was Re: How to speed up an OpenGL application?]
Date: 
Message-ID: <b4kfe.16574$n93.1178@twister.nyc.rr.com>
Frank Buss wrote:
> Kenny Tilton <·······@nyc.rr.com> wrote:
> 
> 
>>You 
>>have a 640x480 loop, aka 307k iterations. In the loop you hit aref a 
>>dozen times, god knows what else, without an iota of optimization.
> 
> 
> yes, this is the speed limiting factor. Copying the 640x480 bitmap with
> OpenGL needs nearly no time (nearly no CPU usage at full 60 fps rate),

This I doubt. I saw big performance hits copying bitmaps in OpenGL. 
Using textures to cache them on the GPU was vital. Sorry, I have not 
stared at your code since the first quick glance. I think you are saying 
you copy it once creating a texture and then later use the texture name. 
The copying is slow, but you only do it once. Yes, textures are fast 
because you just pass one value (the tx number/name) in each iteration.

> but the cellular automaton algorithm is too slow in my Lisp
> implementation. 

Did you disassemble the code to see how tight it was? btw, I am no 
expert on these matters, but seems like the bottom line: keep optimizing 
until you see tight (unsafe) disassembler output.

> 
> 
>>When you want to do assembler in Lisp, you need to defeat the mad cool
>>things Lisp does on every instruction. aka, declarations out the
>>wazoo. you might even want to lose the array and get into bit
>>operations. When in doubt, disassemble and see how tight is your code.
> 
> 
> you are right, but this is difficult. In C I just write the code and I
> know that the compiler produces something fast.

No, C as a language is just a portable assembler, which is the point I 
am trying to make. Lisp does more by default, so if you need to write 
machine language you have some work to do. C does diddly by default, 
which is why it is so obtuse.

> I've tried with the help
> of others to optimize my texture code in Lisp (and one main result was,
> that declaring the type of the array helps LispWorks alot, but declaring
> the type of normal variables has nearly no effects) and my sound
> generation code, but looks like all this is about 10 times faster in C,
> without any special optimization declarations in the C program.

No shit. Your benchmark is "let's write assembler!".

> So the
> result is, implementing modern games (which means some action games,
> where it is not a good idea too make a 1 GHz PC look like a 100 MHz PC)
> in Lisp is not possible without the help of low-level C libraries. 

Games do not require automata, which is the straw man you have settled 
on in assessing the suitability of Lisp for games. Do you have space 
ships fighting it out in front of Conway's Life?!

If you want to make a speech about "Lisp Sucks for OpenGL", you will 
need a relevant benchmark.

> 
> But don't misunderstand me, I like Lisp and I don't want to use C, if
> possible.

Well if your /real/ project is to write automaton after automaton, hell, 
go ahead and use C. Tho we agree, it might be more fun to learn really 
well how to optimize code and use Lisp. Otoh, it is probably more 
empowering to learn FFI really well.

I remember recently playing with that physics code here recently, and 
seeing AREF show up on AllegroCL's profiler at a sizeable fraction of 
total run time. I will try profiling your code to see what I can see, 
but clearly AREF is going to be a big winner.

kenny

> Perhaps ECL ( http://ecls.sourceforge.net/ ) is a better
> solution for such tasks? Has anyone used this for larger, time critical
> applications? 
> 
> But perhaps it is really easy to create fast Lisp code, so below is the
> main problem, first in Common Lisp, then in ANSI C. It looks like the
> Lisp implemenation is really slow: 
> 
> CL-USER > (time (generations 100 "c:\\tmp\\cl.bin"))
> Timing the evaluation of (GENERATIONS 100 "c:\\tmp\\cl.bin")
> 
> user time    =     23.312
> system time  =      0.000
> Elapsed time =   0:00:23
> Allocation   = 632576 bytes standard / 9625 bytes conses
> 0 Page faults
> Calls to %EVAL    35
> NIL
> 
> In C you can calculate more than 12,000 generations in 20 seconds (with
> Visual Studio .NET in release mode compiled, with all default settings
> for console project), so the current Lisp implementations is more than
> 120 times slower than the C implementation. 
> 
> The sources are free for any usage, feel free to include it in own
> benchmarks or other programs. The Common Lisp program:
> 
> (proclaim '(optimize (speed 3) (safety 0) (debug 0)))
> 
> (defun generations (count file)
>   "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
>   (let* ((width 640)
>          (height 480)
>          (ca-width (+ width 2))
>          (ca-height (+ height 2))
>          (len (* ca-width ca-height))
>          (ca1 (make-array len :element-type '(unsigned-byte 8) :initial-element 0))
>          (ca2 (make-array len :element-type '(unsigned-byte 8) :initial-element 0)))
>     ;; seed the cellular automaton space
>     (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)
> 
>     ;; calcuclate the generation "count"
>     (loop for c from 1 to count do
>           (loop for y from 0 below height
>                 with i = (1+ ca-width) do
>                 (loop for x from 0 below width do
>                       (let ((cell (aref ca1 i)))
>                         (if (zerop cell)
>                             ;; summing the 8 neighbours
>                             (let ((sum (+ (aref ca1 (1+ i))
>                                           (aref ca1 (1- i))
>                                           (aref ca1 (+ i ca-width -1))
>                                           (aref ca1 (+ i ca-width))
>                                           (aref ca1 (+ i ca-width 1))
>                                           (aref ca1 (- i ca-width -1))
>                                           (aref ca1 (- i ca-width))
>                                           (aref ca1 (- i ca-width 1)))))
>                               (when (= sum 1) (setf cell 1)))
>                           (if (= cell 62)
>                               (setf cell 0)
>                             (incf cell)))
>                         (setf (aref ca2 i) cell)
>                         (incf i)))
>                 (incf i 2))
>           (rotatef ca1 ca2))
> 
>     ;; save it
>     (with-open-file (stream file :direction :output
>                             :element-type '(unsigned-byte 8)
>                             :if-exists :supersede
>                             :if-does-not-exist :create)
>       (loop for y from 0 below ca-height
>             with i = 0 do
>             (loop for x from 0 below ca-width do
>                   (let ((cell (aref ca1 i)))
>                     (write-byte cell stream))
>                   (incf i))))))
> 
> 
> The C source:
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <memory.h>
> 
> #define WIDTH 640
> #define HEIGHT 480
> #define CA_WIDTH (WIDTH + 2)
> #define CA_HEIGHT (HEIGHT + 2)
> 
> int main(int argc, char** argv)
> {
>     	/* a generations CA (see http://www.mirekw.com/ca/rullex_gene.html ) */
> 	unsigned char *ca1, *ca2, *tmp;
> 	int count;
> 	FILE* f;
> 	int c, x, y;
> 
> 	if (argc != 3) {
> 		printf("usage: arabeske count output-file\n");
> 		return 1;
> 	}
> 
> 	count = atoi(argv[1]);
> 	ca1 = (unsigned char*) malloc(CA_WIDTH*CA_HEIGHT);
> 	ca2 = (unsigned char*) malloc(CA_WIDTH*CA_HEIGHT);
> 	memset(ca1, 0, CA_WIDTH*CA_HEIGHT);
> 	memset(ca2, 0, CA_WIDTH*CA_HEIGHT);
> 
>     /* seed the cellular automaton space */
> 	ca1[CA_WIDTH*CA_HEIGHT/2-CA_WIDTH-4] = 1;
> 
>     /* calcuclate the generation "count" */
> 	for (c = 0; c < count; c++) {
> 		int i = CA_WIDTH + 1;
> 		for (y = 0; y < HEIGHT; y++) {
> 			for (x = 0; x < WIDTH; x++) {
> 				int cell = ca1[i];
> 				if (cell) {
> 					if (++cell == 63) cell = 0;
> 				} else {
> 					/* summing the 8 neighbours */
> 					int sum = ca1[i + 1] +
> 							ca1[i - 1] +
> 							ca1[i + CA_WIDTH - 1] +
> 							ca1[i + CA_WIDTH] +
> 							ca1[i + CA_WIDTH + 1] +
> 							ca1[i - CA_WIDTH - 1] +
> 							ca1[i - CA_WIDTH] +
>   							ca1[i - CA_WIDTH + 1];
> 					if (sum == 1) cell = 1;
> 				}
> 				ca2[i] = cell;
> 				i++;
> 			}
> 			i += 2;
> 		}
> 		tmp = ca1;
> 		ca1 = ca2;
> 		ca2 = tmp;
> 	}
> 
> 	/* save it */
> 	f = fopen(argv[2], "wb");
> 	if (f) {
> 		fwrite(ca1, CA_WIDTH * CA_HEIGHT, 1, f);
> 		fclose(f);
> 	}
> 
> 	return 0;
> }
> 
> 
> 

-- 
Cells? Cello?: http://www.common-lisp.net/project/cells/
Cells-Gtk?: http://www.common-lisp.net/project/cells-gtk/
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film

"Doctor, I wrestled with reality for forty years, and I am happy to 
state that I finally won out over it." -- Elwood P. Dowd
From: Frank Buss
Subject: Re: How to Speed up Cellular Automata [was Re: How to speed up an OpenGL application?]
Date: 
Message-ID: <d5kl4l$i8j$1@newsreader3.netcologne.de>
Kenny Tilton <·······@nyc.rr.com> wrote:

> I think you are
> saying you copy it once creating a texture and then later use the
> texture name. 

no, I'm using glDrawPixels, which copies from system memory to graphics 
buffer every frame, which needs nearly no CPU usage at 60 fps at my 
computer (a 3 GHz PC with PCI-E bus and a NVIDIA Quadro graphics card).

> Did you disassemble the code to see how tight it was? btw, I am no 
> expert on these matters, but seems like the bottom line: keep
> optimizing until you see tight (unsafe) disassembler output.

I disassembled the posted code and the "aref"s are not optimized, but 
using calls. This is of course really slow, because the C version has no 
call in the inner loop.

> Games do not require automata, which is the straw man you have settled
> on in assessing the suitability of Lisp for games. Do you have space 
> ships fighting it out in front of Conway's Life?!

but games (at least 3D games) requires for example for shadow 
calculation, similar things: Processing large arrays and doing some 
calculation for every entry. Same thing for other nice effects, like 
realistic fog, water and physics etc.

> If you want to make a speech about "Lisp Sucks for OpenGL", you will 
> need a relevant benchmark.

you are right, this has nothing to do with OpenGL, it is Lisp per se 
which is slow :-)

> Well if your /real/ project is to write automaton after automaton,
> hell, go ahead and use C. Tho we agree, it might be more fun to learn
> really well how to optimize code and use Lisp. Otoh, it is probably
> more empowering to learn FFI really well.

I don't know. My idea was, if I have to spent hours for improving a bit 
of Lisp code, which runs without any optimization at all in C 100 times 
faster, why not combining the advantages of both languages and using 
something like ECL? I can write nearly all in Lisp and need to program 
only the hotspots in C, but well integrated in the Lisp program. FFI is 
not flexible enough, it is difficult to use it across Windows, Linux and 
MacOS, if you want to use all the special things like callbacks and 
optimizing Lisp to be as fast as C is not always possible or to 
complicated.

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: M Jared Finder
Subject: Re: How to Speed up Cellular Automata [was Re: How to speed up an OpenGL application?]
Date: 
Message-ID: <NomdnY24LO7W0ePfRVn-og@speakeasy.net>
Frank Buss wrote:
> Kenny Tilton <·······@nyc.rr.com> wrote:
> 
>>Well if your /real/ project is to write automaton after automaton,
>>hell, go ahead and use C. Tho we agree, it might be more fun to learn
>>really well how to optimize code and use Lisp. Otoh, it is probably
>>more empowering to learn FFI really well.
> 
> I don't know. My idea was, if I have to spent hours for improving a bit 
> of Lisp code, which runs without any optimization at all in C 100 times 
> faster, why not combining the advantages of both languages and using 
> something like ECL? I can write nearly all in Lisp and need to program 
> only the hotspots in C, but well integrated in the Lisp program. FFI is 
> not flexible enough, it is difficult to use it across Windows, Linux and 
> MacOS, if you want to use all the special things like callbacks and 
> optimizing Lisp to be as fast as C is not always possible or to 
> complicated.

It seems to me like your code needs more declarations.  In less than 10 
minutes, I was able to get almost a 30x speed improvement on SBCL by 
declaring the arrays CA1 and CA2 to be of type
(SIMPLE-ARRAY (UNSIGNED-BYTE 8) 1).

Before declarations:

CL-USER> (time (generations 100 "tmp/cl.bin"))
Evaluation took:
   47.231 seconds of real time
   42.771496 seconds of user run time
   0.157976 seconds of system run time
   0 page faults and
   618,912 bytes consed.

After declarations:

CL-USER> (time (generations 100 "tmp/cl.bin"))
Evaluation took:
   1.58 seconds of real time
   1.494773 seconds of user run time
   0.010998 seconds of system run time
   0 page faults and
   627,104 bytes consed.

I'm a bit confused as to why SBCL can't figure this out and propagate 
the type, at least in this case.  It would have been really nice to not 
have to type the declarations, since all the needed information is 
available at compile time.

   -- MJF
From: Kenny Tilton
Subject: Re: How to Speed up Cellular Automata [was Re: How to speed up an OpenGL application?]
Date: 
Message-ID: <ximfe.16578$n93.8131@twister.nyc.rr.com>
Frank Buss wrote:

> Kenny Tilton <·······@nyc.rr.com> wrote:
> 
> 
>>I think you are
>>saying you copy it once creating a texture and then later use the
>>texture name. 
> 
> 
> no, I'm using glDrawPixels, which copies from system memory to graphics 
> buffer every frame, which needs nearly no CPU usage at 60 fps at my 
> computer (a 3 GHz PC with PCI-E bus and a NVIDIA Quadro graphics card).
> 
> 
>>Did you disassemble the code to see how tight it was? btw, I am no 
>>expert on these matters, but seems like the bottom line: keep
>>optimizing until you see tight (unsafe) disassembler output.
> 
> 
> I disassembled the posted code and the "aref"s are not optimized, but 
> using calls. This is of course really slow, because the C version has no 
> call in the inner loop.
> 
> 
>>Games do not require automata, which is the straw man you have settled
>>on in assessing the suitability of Lisp for games. Do you have space 
>>ships fighting it out in front of Conway's Life?!
> 
> 
> but games (at least 3D games) requires for example for shadow 
> calculation, similar things: Processing large arrays and doing some 
> calculation for every entry. Same thing for other nice effects, like 
> realistic fog, water and physics etc.

I would use a library for that anyway. But I see you are afraid of FFI. 
What about the new OpenGL Shading Language? Played with that at all? I 
mean, as long as you are worried about speed, let's do this right.

> I don't know. My idea was, if I have to spent hours for improving a bit 
> of Lisp code, which runs without any optimization at all in C 100 times 
> faster, why not combining the advantages of both languages and using 
> something like ECL?

Let us know how ECL works out for you. Me, I am over the FFI hump (and 
Hello-C (ne UFFI) makes it all portable), so I can stick with more 
conventional environments.

kt

-- 
Cells? Cello?: http://www.common-lisp.net/project/cells/
Cells-Gtk?: http://www.common-lisp.net/project/cells-gtk/
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film

"Doctor, I wrestled with reality for forty years, and I am happy to 
state that I finally won out over it." -- Elwood P. Dowd
From: Frank Buss
Subject: Re: How to Speed up Cellular Automata [was Re: How to speed up an OpenGL application?]
Date: 
Message-ID: <d686fu$d7k$1@newsreader3.netcologne.de>
Kenny Tilton <·······@nyc.rr.com> wrote:

> I would use a library for that anyway. But I see you are afraid of
> FFI. What about the new OpenGL Shading Language? Played with that at
> all? I mean, as long as you are worried about speed, let's do this
> right. 

The OpenGL Shading Language is a good idea for such tasks. I've installed 
GLEW, because it is a bit tricky to use OpenGL 2.0 features with Windows:

http://www.gamedev.net/reference/articles/article1929.asp
http://glew.sourceforge.net/

and for example on this page it is explained how to use it:

http://www.zjprogramming.com/html/helloGLSL.html

and many examples:

http://developer.3dlabs.com/openGL2/downloads/

I've tested it a bit (only in C++, because I don't want to convert all the 
new header files to UFFI), but it looks like it is not so easy to create a 
cellular automaton with it, because the fragment shader can only read 
textures and returns for every call the color of a pixel, but I didn't 
found a way to change the texture (declaring a 1000x1000 array . There is a 
NVIDIA example for the Game of Life, which uses another shading language 
and uses a multi pass concept: first a shader program calculates the sum of 
all neighbour cells, then the output is used for the input of another 
shader program, which does a lookup for the rules.

I don't know, if it is possible with OpenGL 2.0 to calculate cellular 
automata, so I crosspost this to the OpenGL newsgroup, perhaps someone 
knows how to do it with GLSL. I've tried an array instead of a texture, 
too, but if I define a 1024*1024 int-array it hangs at glLinkProgramARB and 
after some seconds it runs, but with no output.

BTW: thanks for all your tests, I'll use "normal" Common Lisp 
implementations (ECL doesn't support CLOS and other important features, as 
the ECL website says). It's easier and looks like it is fast enough while 
developing games (that's my goal). If it is too slow, I hope it is not much 
work to optimize the hot-spots in C before releasing.

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: GP lisper
Subject: Re: How to Speed up Cellular Automata [was Re: How to speed up an OpenGL application?]
Date: 
Message-ID: <1116190825.68b68d7e4c74734b81b54d5e7d0a439f@teranews>
On Sun, 15 May 2005 19:04:31 +0000 (UTC), <··@frank-buss.de> wrote:
>
> BTW: thanks for all your tests, I'll use "normal" Common Lisp 
> implementations (ECL doesn't support CLOS and other important features, as 
> the ECL website says). It's easier and looks like it is fast enough while 
> developing games (that's my goal). If it is too slow, I hope it is not much 
> work to optimize the hot-spots in C before releasing.

Thanks for the lisp games effort Frank, I enjoy the problems and
resolutions, and have a couple games in mind myself.

While it's not CLOS, Kesslers "LISP, Objects, and Symbolic
Programming" contains an object system based on structures, with
complete source code in an appendix.  It was written just before CLOS
was standardized, and seems to cover the basics.  An option to keep in
mind.  The ECL website is out of date, based on my brief stay on their
mailing list about 6 weeks ago.


-- 
Everyman has three hearts;
one to show the world, one to show friends, and one only he knows.
From: Adam Warner
Subject: Re: How to Speed up Cellular Automata [was Re: How to speed up an OpenGL application?]
Date: 
Message-ID: <pan.2005.05.08.11.37.06.186635@consulting.net.nz>
On Sun, 08 May 2005 09:11:49 +0000, Frank Buss wrote:

>> Did you disassemble the code to see how tight it was? btw, I am no 
>> expert on these matters, but seems like the bottom line: keep
>> optimizing until you see tight (unsafe) disassembler output.
> 
> I disassembled the posted code and the "aref"s are not optimized, but 
> using calls. This is of course really slow, because the C version has no 
> call in the inner loop.

Then it's likely you have too few type declarations and as a consequence
the array lookups require too much generic code to inline.

By the way the proclaim has no compile-time effect. Wrap it in an
eval-when (...) ["If compile time side effects are desired, eval-when may
be useful."] or use declaim instead ["In most such cases, however, it is
preferrable to use declaim for this purpose."]

I added (declare (optimize (speed 3) (safety 0) (debug 0))) after the
docstring of generations and compiled the function in SBCL. I believe you
will soon appreciate why your code is 10+ times slower than C by viewing
the output I've pasted below.

SBCL has very good type inference. You will not need many declarations to
eliminate most of the optimisation notes. It's fun to figure out how few
you can get away with (but you may have to keep some in for
implementations with less powerful inference engines. I don't know how
LispWorks type inference compares. In my experience CMUCL's inference is
marginally less complete than SBCL's).

I suggest avoiding a global declaim. Work upon optimising one critical
function at a time so the volume of optimisation notes doesn't overwhelm
you. A safety of 0 is very dangerous for program correctness. Isolate the
use of safety 0 to those functions you have well tested and can trust to
receive correct input.

Regards,
Adam

* (compile-file "frank")

; compiling file "/home/adam/t/frank.lisp" (written 08 MAY 2005 11:03:51 PM):
; compiling (PROCLAIM (QUOTE #))
; compiling (DEFUN GENERATIONS ...)
; file: /home/adam/t/frank.lisp
; in: DEFUN GENERATIONS
;     (AREF CA1 I)
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (INCF I 2)
; --> LET*
; ==>
;   (+ I #:G14)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (AREF CA1 I)
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (ZEROP CELL)
; ==>
;   (= CELL 0)
;
; note: unable to
;   open-code FLOAT to RATIONAL comparison
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to open code because: The operands might not be the same type.

;     (AREF CA1 (1+ I))
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (AREF CA1 (1- I))
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (+ (AREF CA1 (1+ I))
;      (AREF CA1 (1- I))
;      (AREF CA1 (+ I CA-WIDTH -1))
;      (AREF CA1 (+ I CA-WIDTH))
;      (AREF CA1 (+ I CA-WIDTH 1))
;      (AREF CA1 (- I CA-WIDTH -1))
;      (AREF CA1 (- I CA-WIDTH))
;      (AREF CA1 (- I CA-WIDTH 1)))
; --> + + + + + +
; ==>
;   (+ (AREF CA1 (1+ I)) (AREF CA1 (1- I)))
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a RATIONAL.
;   The second argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;   The second argument is a NUMBER, not a RATIONAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a SINGLE-FLOAT.
;   The second argument is a NUMBER, not a DOUBLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a DOUBLE-FLOAT.
;   The second argument is a NUMBER, not a SINGLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (AREF CA1 (+ I CA-WIDTH -1))
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (+ (AREF CA1 (1+ I))
;      (AREF CA1 (1- I))
;      (AREF CA1 (+ I CA-WIDTH -1))
;      (AREF CA1 (+ I CA-WIDTH))
;      (AREF CA1 (+ I CA-WIDTH 1))
;      (AREF CA1 (- I CA-WIDTH -1))
;      (AREF CA1 (- I CA-WIDTH))
;      (AREF CA1 (- I CA-WIDTH 1)))
; --> + + + + +
; ==>
;   (+ (+ (AREF CA1 (1+ I)) (AREF CA1 (1- I))) (AREF CA1 (+ I CA-WIDTH -1)))
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a RATIONAL.
;   The second argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;   The second argument is a NUMBER, not a RATIONAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a SINGLE-FLOAT.
;   The second argument is a NUMBER, not a DOUBLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a DOUBLE-FLOAT.
;   The second argument is a NUMBER, not a SINGLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (AREF CA1 (+ I CA-WIDTH))
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (+ (AREF CA1 (1+ I))
;      (AREF CA1 (1- I))
;      (AREF CA1 (+ I CA-WIDTH -1))
;      (AREF CA1 (+ I CA-WIDTH))
;      (AREF CA1 (+ I CA-WIDTH 1))
;      (AREF CA1 (- I CA-WIDTH -1))
;      (AREF CA1 (- I CA-WIDTH))
;      (AREF CA1 (- I CA-WIDTH 1)))
; --> + + + +
; ==>
;   (+ (+ (+ (AREF CA1 (1+ I)) (AREF CA1 (1- I))) (AREF CA1 (+ I CA-WIDTH -1)))
;      (AREF CA1 (+ I CA-WIDTH)))
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a RATIONAL.
;   The second argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;   The second argument is a NUMBER, not a RATIONAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a SINGLE-FLOAT.
;   The second argument is a NUMBER, not a DOUBLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a DOUBLE-FLOAT.
;   The second argument is a NUMBER, not a SINGLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (AREF CA1 (+ I CA-WIDTH 1))
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (+ (AREF CA1 (1+ I))
;      (AREF CA1 (1- I))
;      (AREF CA1 (+ I CA-WIDTH -1))
;      (AREF CA1 (+ I CA-WIDTH))
;      (AREF CA1 (+ I CA-WIDTH 1))
;      (AREF CA1 (- I CA-WIDTH -1))
;      (AREF CA1 (- I CA-WIDTH))
;      (AREF CA1 (- I CA-WIDTH 1)))
; --> + + +
; ==>
;   (+
;    (+ (+ (+ (AREF CA1 #) (AREF CA1 #)) (AREF CA1 (+ I CA-WIDTH -1)))
;       (AREF CA1 (+ I CA-WIDTH)))
;    (AREF CA1 (+ I CA-WIDTH 1)))
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a RATIONAL.
;   The second argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;   The second argument is a NUMBER, not a RATIONAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a SINGLE-FLOAT.
;   The second argument is a NUMBER, not a DOUBLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a DOUBLE-FLOAT.
;   The second argument is a NUMBER, not a SINGLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (AREF CA1 (- I CA-WIDTH -1))
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (+ (AREF CA1 (1+ I))
;      (AREF CA1 (1- I))
;      (AREF CA1 (+ I CA-WIDTH -1))
;      (AREF CA1 (+ I CA-WIDTH))
;      (AREF CA1 (+ I CA-WIDTH 1))
;      (AREF CA1 (- I CA-WIDTH -1))
;      (AREF CA1 (- I CA-WIDTH))
;      (AREF CA1 (- I CA-WIDTH 1)))
; --> + +
; ==>
;   (+
;    (+ (+ (+ (+ # #) (AREF CA1 #)) (AREF CA1 (+ I CA-WIDTH)))
;       (AREF CA1 (+ I CA-WIDTH 1)))
;    (AREF CA1 (- I CA-WIDTH -1)))
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a RATIONAL.
;   The second argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;   The second argument is a NUMBER, not a RATIONAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a SINGLE-FLOAT.
;   The second argument is a NUMBER, not a DOUBLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a DOUBLE-FLOAT.
;   The second argument is a NUMBER, not a SINGLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (AREF CA1 (- I CA-WIDTH))
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (+ (AREF CA1 (1+ I))
;      (AREF CA1 (1- I))
;      (AREF CA1 (+ I CA-WIDTH -1))
;      (AREF CA1 (+ I CA-WIDTH))
;      (AREF CA1 (+ I CA-WIDTH 1))
;      (AREF CA1 (- I CA-WIDTH -1))
;      (AREF CA1 (- I CA-WIDTH))
;      (AREF CA1 (- I CA-WIDTH 1)))
; --> +
; ==>
;   (+
;    (+ (+ (+ (+ # #) (AREF CA1 #)) (AREF CA1 (+ I CA-WIDTH 1)))
;       (AREF CA1 (- I CA-WIDTH -1)))
;    (AREF CA1 (- I CA-WIDTH)))
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a RATIONAL.
;   The second argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;   The second argument is a NUMBER, not a RATIONAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a SINGLE-FLOAT.
;   The second argument is a NUMBER, not a DOUBLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a DOUBLE-FLOAT.
;   The second argument is a NUMBER, not a SINGLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (AREF CA1 (- I CA-WIDTH 1))
; --> LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-REF ARRAY SB-INT:INDEX)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (+ (AREF CA1 (1+ I))
;      (AREF CA1 (1- I))
;      (AREF CA1 (+ I CA-WIDTH -1))
;      (AREF CA1 (+ I CA-WIDTH))
;      (AREF CA1 (+ I CA-WIDTH 1))
;      (AREF CA1 (- I CA-WIDTH -1))
;      (AREF CA1 (- I CA-WIDTH))
;      (AREF CA1 (- I CA-WIDTH 1)))
; ==>
;   (+
;    (+ (+ (+ (+ # #) (AREF CA1 #)) (AREF CA1 (- I CA-WIDTH -1)))
;       (AREF CA1 (- I CA-WIDTH)))
;    (AREF CA1 (- I CA-WIDTH 1)))
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a RATIONAL.
;   The second argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;   The second argument is a NUMBER, not a RATIONAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a SINGLE-FLOAT.
;   The second argument is a NUMBER, not a DOUBLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a DOUBLE-FLOAT.
;   The second argument is a NUMBER, not a SINGLE-FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;   The second argument is a NUMBER, not a REAL.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a REAL.
;   The second argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (= SUM 1)
;
; note: unable to
;   open-code FLOAT to RATIONAL comparison
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to open code because: The operands might not be the same type.

;     (= CELL 62)
;
; note: unable to
;   open-code FLOAT to RATIONAL comparison
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).
;
; note: unable to open code because: The operands might not be the same type.

;     (INCF CELL)
; --> LET*
; ==>
;   (+ CELL #:G7)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a FLOAT.
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX SINGLE-FLOAT).
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a NUMBER, not a (COMPLEX DOUBLE-FLOAT).

;     (SETF (AREF CA2 I) CELL)
; --> SB-KERNEL:%ASET LET*
; ==>
;   (SB-KERNEL:HAIRY-DATA-VECTOR-SET ARRAY SB-INT:INDEX SB-C::NEW-VALUE)
;
; note: unable to
;   optimize
; due to type uncertainty:
;   The first argument is a VECTOR, not a SIMPLE-STRING.
;
; note: unable to
;   avoid runtime dispatch on array element type
; because:
;   Upgraded element type of array is not known at compile time.

;     (LOOP FOR
;         C
;         FROM
;         1
;         TO
;         COUNT
;         DO
;         (LOOP FOR Y FROM 0 BELOW HEIGHT WITH I = (1+ CA-WIDTH) DO ...)
;         (ROTATEF CA1 CA2))
; --> BLOCK LET SB-LOOP::LOOP-BODY TAGBODY WHEN COND IF
; ==>
;   (> C #:LOOP-LIMIT-3)
;
; note: forced to do GENERIC-> (cost 10)
;       unable to do inline fixnum comparison (cost 4) because:
;       The first argument is a (INTEGER 1), not a FIXNUM.
;       The second argument is a REAL, not a FIXNUM.

;     (+ (AREF CA1 (1+ I))
;      (AREF CA1 (1- I))
;      (AREF CA1 (+ I CA-WIDTH -1))
;      (AREF CA1 (+ I CA-WIDTH))
;      (AREF CA1 (+ I CA-WIDTH 1))
;      (AREF CA1 (- I CA-WIDTH -1))
;      (AREF CA1 (- I CA-WIDTH))
;      (AREF CA1 (- I CA-WIDTH 1)))
; --> + + + + + +
; ==>
;   (+ (AREF CA1 (1+ I)) (AREF CA1 (1- I)))
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a DOUBLE-FLOAT.
;       The second argument is a NUMBER, not a DOUBLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES DOUBLE-FLOAT &REST T).;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a SINGLE-FLOAT.
;       The second argument is a NUMBER, not a SINGLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES SINGLE-FLOAT &REST T).;       etc.

; --> + + + + +
; ==>
;   (+ (+ (AREF CA1 (1+ I)) (AREF CA1 (1- I))) (AREF CA1 (+ I CA-WIDTH -1)))
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a DOUBLE-FLOAT.
;       The second argument is a NUMBER, not a DOUBLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES DOUBLE-FLOAT &REST T).;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a SINGLE-FLOAT.
;       The second argument is a NUMBER, not a SINGLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES SINGLE-FLOAT &REST T).;       etc.

; --> + + + +
; ==>
;   (+ (+ (+ (AREF CA1 (1+ I)) (AREF CA1 (1- I))) (AREF CA1 (+ I CA-WIDTH -1)))
;      (AREF CA1 (+ I CA-WIDTH)))
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a DOUBLE-FLOAT.
;       The second argument is a NUMBER, not a DOUBLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES DOUBLE-FLOAT &REST T).;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a SINGLE-FLOAT.
;       The second argument is a NUMBER, not a SINGLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES SINGLE-FLOAT &REST T).;       etc.

; --> + + +
; ==>
;   (+
;    (+ (+ (+ (AREF CA1 #) (AREF CA1 #)) (AREF CA1 (+ I CA-WIDTH -1)))
;       (AREF CA1 (+ I CA-WIDTH)))
;    (AREF CA1 (+ I CA-WIDTH 1)))
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a DOUBLE-FLOAT.
;       The second argument is a NUMBER, not a DOUBLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES DOUBLE-FLOAT &REST T).;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a SINGLE-FLOAT.
;       The second argument is a NUMBER, not a SINGLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES SINGLE-FLOAT &REST T).;       etc.

; --> + +
; ==>
;   (+
;    (+ (+ (+ (+ # #) (AREF CA1 #)) (AREF CA1 (+ I CA-WIDTH)))
;       (AREF CA1 (+ I CA-WIDTH 1)))
;    (AREF CA1 (- I CA-WIDTH -1)))
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a DOUBLE-FLOAT.
;       The second argument is a NUMBER, not a DOUBLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES DOUBLE-FLOAT &REST T).;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a SINGLE-FLOAT.
;       The second argument is a NUMBER, not a SINGLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES SINGLE-FLOAT &REST T).;       etc.

; --> +
; ==>
;   (+
;    (+ (+ (+ (+ # #) (AREF CA1 #)) (AREF CA1 (+ I CA-WIDTH 1)))
;       (AREF CA1 (- I CA-WIDTH -1)))
;    (AREF CA1 (- I CA-WIDTH)))
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a DOUBLE-FLOAT.
;       The second argument is a NUMBER, not a DOUBLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES DOUBLE-FLOAT &REST T).;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a SINGLE-FLOAT.
;       The second argument is a NUMBER, not a SINGLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES SINGLE-FLOAT &REST T).;       etc.

; ==>
;   (+
;    (+ (+ (+ (+ # #) (AREF CA1 #)) (AREF CA1 (- I CA-WIDTH -1)))
;       (AREF CA1 (- I CA-WIDTH)))
;    (AREF CA1 (- I CA-WIDTH 1)))
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a DOUBLE-FLOAT.
;       The second argument is a NUMBER, not a DOUBLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES DOUBLE-FLOAT &REST T).;       unable to do inline float arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a SINGLE-FLOAT.
;       The second argument is a NUMBER, not a SINGLE-FLOAT.
;       The result is a (VALUES NUMBER
;                               &OPTIONAL), not a (VALUES SINGLE-FLOAT &REST T).;       etc.

;     (INCF CELL)
; --> LET*
; ==>
;   (+ CELL #:G7)
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline fixnum arithmetic (cost 1) because:
;       The first argument is a NUMBER, not a FIXNUM.
;       The result is a (VALUES NUMBER &OPTIONAL), not a (VALUES FIXNUM &REST T).
;       unable to do inline fixnum arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a FIXNUM.
;       The result is a (VALUES NUMBER &OPTIONAL), not a (VALUES FIXNUM &REST T).
;       etc.

;     (INCF I 2)
; --> LET*
; ==>
;   (+ I #:G14)
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline fixnum arithmetic (cost 1) because:
;       The first argument is a NUMBER, not a FIXNUM.
;       The result is a (VALUES NUMBER &OPTIONAL), not a (VALUES FIXNUM &REST T).
;       unable to do inline fixnum arithmetic (cost 2) because:
;       The first argument is a NUMBER, not a FIXNUM.
;       The result is a (VALUES NUMBER &OPTIONAL), not a (VALUES FIXNUM &REST T).
;       etc.

;     (LOOP FOR
;         C
;         FROM
;         1
;         TO
;         COUNT
;         DO
;         (LOOP FOR Y FROM 0 BELOW HEIGHT WITH I = (1+ CA-WIDTH) DO ...)
;         (ROTATEF CA1 CA2))
; --> BLOCK LET SB-LOOP::LOOP-BODY TAGBODY SB-LOOP::LOOP-REALLY-DESETQ SETQ THE
; --> 1+
; ==>
;   (+ C 1)
;
; note: forced to do GENERIC-+ (cost 10)
;       unable to do inline fixnum arithmetic (cost 1) because:
;       The first argument is a (INTEGER 1), not a FIXNUM.
;       The result is a (VALUES (INTEGER 2)
;                               &OPTIONAL), not a (VALUES FIXNUM &REST T).
;       unable to do inline fixnum arithmetic (cost 2) because:
;       The first argument is a (INTEGER 1), not a FIXNUM.
;       The result is a (VALUES (INTEGER 2)
;                               &OPTIONAL), not a (VALUES FIXNUM &REST T).
;       etc.
;
; compilation unit finished
;   printed 121 notes
From: Frank Buss
Subject: Re: How to Speed up Cellular Automata
Date: 
Message-ID: <d5logq$keb$1@newsreader3.netcologne.de>
Adam Warner <······@consulting.net.nz> wrote:

> By the way the proclaim has no compile-time effect. Wrap it in an
> eval-when (...) ["If compile time side effects are desired, eval-when
> may be useful."] or use declaim instead ["In most such cases, however,
> it is preferrable to use declaim for this purpose."]

at least in LispWorks it looks like there is no speed difference for the 
"generations" function when adding the declaim or proclaim (which looks 
like they behave the same, except that declaim is a macro and doesn't 
need the quote), when nothing other is changed.

But when I add these declares after the let:

    (declare (type (simple-array (unsigned-byte 8) (*)) ca1))
    (declare (type (simple-array (unsigned-byte 8) (*)) ca2))

it is 10 times faster, if I use the global declaim or local declare, so 
it is evaluated at least in LispWorks. Now it is only ca. 10 times slower 
than the C version.

I don't understand this huge speed improvement, because just a line 
before I created the array with the specified element type and I don't 
understand the disassembly any more :-)

[ many "unable to optimize" compiler messages ]

Looks like this doesn't help me. I've added integer declares and "of-type 
integer" for all loop variables, but the same speed. The problem is the 
complicated looking assembler, I think. The LispWorks function needs 521 
opcodes, while the Visual Studio .NET function needs 204 opcodes and 
looks more straightforward.

Perhaps there are better optimization tricks in the web pages others have 
posted, but for me it looks like an trial-and-error game: add as much 
declares, proclaims and whatever as possible, hoping that one of these 
hints produces faster code. Or trying to restructure the code as long as 
the the compiler produces more straightforward assembler code.

Of course, all these technics are independent of high-level technics, 
like memoization and building tree structures for handling large blocks 
of the cellular automaton space at once, but this is not possible for 
other problems and if the array access itself is the limiting factor, 
there is no way to improve it in pure Lisp.

Perhaps other Lisp implementations are better, but I don't think much. As 
posted by someone other, SBCL is 30 times faster with some declarations 
compared to a version without the declarations, but I assume this was on 
Linux, so it can't be compared to the Windows version.

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Rainer Joswig
Subject: Re: How to Speed up Cellular Automata
Date: 
Message-ID: <joswig-FCF2AE.22162508052005@news-europe.giganews.com>
In article <············@newsreader3.netcologne.de>,
 Frank Buss <··@frank-buss.de> wrote:

> Adam Warner <······@consulting.net.nz> wrote:
> 
> > By the way the proclaim has no compile-time effect. Wrap it in an
> > eval-when (...) ["If compile time side effects are desired, eval-when
> > may be useful."] or use declaim instead ["In most such cases, however,
> > it is preferrable to use declaim for this purpose."]
> 
> at least in LispWorks it looks like there is no speed difference for the 
> "generations" function when adding the declaim or proclaim (which looks 
> like they behave the same, except that declaim is a macro and doesn't 
> need the quote), when nothing other is changed.
> 
> But when I add these declares after the let:
> 
>     (declare (type (simple-array (unsigned-byte 8) (*)) ca1))
>     (declare (type (simple-array (unsigned-byte 8) (*)) ca2))
> 
> it is 10 times faster, if I use the global declaim or local declare, so 
> it is evaluated at least in LispWorks. Now it is only ca. 10 times slower 
> than the C version.
> 
> I don't understand this huge speed improvement, because just a line 
> before I created the array with the specified element type and I don't 
> understand the disassembly any more :-)

The element-type when you create the array is mostly only
used for the layout of the datastructure. Few Lisp
compilers can propagate this information.
You still need to declare the variables, function parameters,
return values, etc. - unless the implementation uses a certain
amount of type inferences.

Typical stuff:

(setf foo (1+ bar))

bar was somewhere declared as fixnum. The result possibly won't be.
And foo can't be declared as fixnum either.

You end up with:

(setf foo (the fixnum (1+ bar)))

Telling that the result of adding 1 to a fixnum stays a fixnum.
Which is not true - when your fixnum is already the largest fixnum.

Unfortunately fixnums are in a 32bit Lisp most of the time
not 32 bits wide, since you have a few bits for tags.

LispWorks provides for this case now special support with some
special operators.

> [ many "unable to optimize" compiler messages ]
> 
> Looks like this doesn't help me. I've added integer declares and "of-type 
> integer" for all loop variables, but the same speed. The problem is the 
> complicated looking assembler, I think. The LispWorks function needs 521 
> opcodes, while the Visual Studio .NET function needs 204 opcodes and 
> looks more straightforward.
> 
> Perhaps there are better optimization tricks in the web pages others have 
> posted, but for me it looks like an trial-and-error game: add as much 
> declares, proclaims and whatever as possible, hoping that one of these 
> hints produces faster code. Or trying to restructure the code as long as 
> the the compiler produces more straightforward assembler code.

There is a bit truth to it. You need to look at the disassembly to
understand the various effects of the optimization, read the
implementations manual about the compiler atleast twice,
experiment quite a bit - after some time you get
a feeling what is possible and what not. There are quite a few
areas where the compiler will behave differently.
Lisp comes with a moderately large library and the implementation
+ compiler technology combination will really make a difference.

- arithmetic with the various number types
- sequences, strings, arrays, hashtables
- function calling, argument passing, stack allocation, ...
- consing
- IO, streams, buffered streams, ...
- defstruct vs. defclass

and more.

You need to get experience - but this is not really different
from what you do in C. GCC has very different performance characteristics
compared to, say, IBM's C.

The question is whether you want to invest the time and if its
worth it. If you just want a few small fast running programs, C
might be a better choice. On the other hand, if you want to
develop a larger (commercial) application, it will be useful
to invest that time, to have most of the code in Lisp and only
the fewest amount in foreign code - if that helps you to be
productive.

> Of course, all these technics are independent of high-level technics, 
> like memoization and building tree structures for handling large blocks 
> of the cellular automaton space at once, but this is not possible for 
> other problems and if the array access itself is the limiting factor, 
> there is no way to improve it in pure Lisp.
> 
> Perhaps other Lisp implementations are better, but I don't think much. As 
> posted by someone other, SBCL is 30 times faster with some declarations 
> compared to a version without the declarations, but I assume this was on 
> Linux, so it can't be compared to the Windows version.
From: Bulent Murtezaoglu
Subject: Re: How to Speed up Cellular Automata
Date: 
Message-ID: <87k6m95v99.fsf@p4.internal>
>>>>> "FB" == Frank Buss <··@frank-buss.de> writes:

    FB> Adam Warner <······@consulting.net.nz> wrote:
    >> By the way the proclaim has no compile-time effect. Wrap it in
    >> an eval-when (...) ["If compile time side effects are desired,
    >> eval-when may be useful."] or use declaim instead ["In most
    >> such cases, however, it is preferrable to use declaim for this
    >> purpose."]

    FB> at least in LispWorks it looks like there is no speed
    FB> difference for the "generations" function when adding the
    FB> declaim or proclaim (which looks like they behave the same,
    FB> except that declaim is a macro and doesn't need the quote),
    FB> when nothing other is changed.

I have not tested the code, so I believe you about your empirical 
observation (perhaps you had a declare in there already?).  
Nonetheless, Adam is right about proclaim/declaim.  Check the hyperspec.

[...]
    FB> Looks like this doesn't help me. I've added integer declares
    FB> and "of-type integer" for all loop variables, but the same
    FB> speed. 

Perhaps you think the integer type in CL is like the integer type in 
C (ie 'machine integer').  It isn't, bignums are also integers.  You 
probably want to declare them fixnums and furthermore promise the 
compiler that they stay fixnums (lispworks should have a fixnum-safety 
switch for this, though I am not sure of the name).  

[...]
    FB> Perhaps there are better optimization tricks in the web pages
    FB> others have posted, but for me it looks like an
    FB> trial-and-error game: add as much declares, proclaims and
    FB> whatever as possible, hoping that one of these hints produces
    FB> faster code. Or trying to restructure the code as long as the
    FB> the compiler produces more straightforward assembler code.

This kind of optimization is almost always a process of trial and
error, but what to try becomes easier to pick once one's more
comfortable with the language and the implementation.  It is a good
exercise that you are attempting, and when you understand how
everything works and truly hit your implementation's limits, the next
time it will be easier and will seem less random.  Also, if you are
tied to Lispworks, try their mailing list too.  Lispworks people read
that and vendors know their implementation best.

cheers,

BM
From: ···········@mail.ru
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <1115583889.719131.109750@z14g2000cwz.googlegroups.com>
I removed file operations and call (time (generations 100)) without
declarations (LispWorks 4.4 Windows XP P4-C 2.6Ghz). Final version with
declarations:

(declaim (optimize (speed 3)(safety 0)(debug 0)(fixnum-safety 0)))

(defun generations (count)
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
  (let* ((width 640)
         (height 480)
         (ca-width (+ width 2))
         (ca-height (+ height 2))
         (len (* ca-width ca-height))
         (ca1 (make-array len :element-type '(unsigned-byte 8)
:initial-element 0))
         (ca2 (make-array len :element-type '(unsigned-byte 8)
:initial-element 0)))
    (declare (fixnum width height ca-width ca-height len)
             (type (simple-array (unsigned-byte 8) (*)) ca1 ca2))
    ;; seed the cellular automaton space
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)

    ;; calcuclate the generation "count"
    (loop for c fixnum from 1 to count do
          (loop for y fixnum from 0 below height
                with i of-type fixnum = (1+ ca-width) do
                (loop for x fixnum from 0 below width do
                      (let ((cell (aref ca1 i)))
                        (declare (fixnum cell))
                        (if (= cell 0)
                            ;; summing the 8 neighbours
                            (let ((sum (+ (aref ca1 (1+ i))
                                          (aref ca1 (1- i))
                                          (aref ca1 (+ i ca-width -1))
                                          (aref ca1 (+ i ca-width))
                                          (aref ca1 (+ i ca-width 1))
                                          (aref ca1 (- i ca-width -1))
                                          (aref ca1 (- i ca-width))
                                          (aref ca1 (- i ca-width
1)))))
                              (when (= sum 1) (setf cell 1)))
                          (if (= cell 62)
                              (setf cell 0)
                            (incf cell)))
                        (setf (aref ca2 i) cell)
                        (incf i)))
                (incf i 2))
          (rotatef ca1 ca2))))

Result was:

user time    =     23.187
system time  =      0.000
Elapsed time =   0:00:24
Allocation   = 645192 bytes standard / 21384 bytes conses
0 Page faults

With type declarations :

Timing the evaluation of (GENERATIONS 100)

user time    =      2.859
system time  =      0.000
Elapsed time =   0:00:03
Allocation   = 630048 bytes standard / 8954 bytes conses
0 Page faults

Not so bad ?

Best regards
Lisper
From: Matthias Buelow
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <3e6iuqF1f083U3@news.dfncis.de>
Frank Buss <··@frank-buss.de> wrote:

>without any special optimization declarations in the C program. So the
>result is, implementing modern games (which means some action games,
>where it is not a good idea too make a 1 GHz PC look like a 100 MHz PC)
>in Lisp is not possible without the help of low-level C libraries. 

Well, this very issue (games) has been iterated over the last few
weeks in great detail with Mr. van Every if not here then in
comp.lang.scheme.  The common solution seems to be to write the
lower-level stuff (graphics engine etc.) in C or C++, and the "game
logics" in a high-level language such as a dialect of Lisp or Scheme,
that drives the lower-level layers.  By this approach, you get the
near maximum performance where necessary aswell as a great deal of
flexibility and programming comfort in the logical levels.  For
reference, look at Quake and their use of QuakeC, an interpreted
C-like language, that is used to script all the games on top of the
Quake engine (which provides the graphics, networking and OS
interface).

mkb.
From: Patrick Frankenberger
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <d5kvqm$ni1$03$1@news.t-online.com>
Frank Buss schrieb:
> In C you can calculate more than 12,000 generations in 20 seconds (with
> Visual Studio .NET in release mode compiled, with all default settings
> for console project), so the current Lisp implementations is more than
> 120 times slower than the C implementation. 

VS.NET seems to be great at optimizing this special code, it is more 
than twice as fast as GCC.
With declarations for the simple-arrays Lisp does 500 generations in 20 
s while the GCC executable does 5000.

The inner loop of your lispcode takes more than 1000 byte of machine 
code when compiled with ACL6.2 and has more than 20 different variables.

Is the lispcode that bad? Or the native code generator?
From: Patrick Frankenberger
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <d5l0e4$u5g$00$1@news.t-online.com>
Patrick Frankenberger schrieb:
> Frank Buss schrieb:
> 
>> In C you can calculate more than 12,000 generations in 20 seconds (with
>> Visual Studio .NET in release mode compiled, with all default settings
>> for console project), so the current Lisp implementations is more than
>> 120 times slower than the C implementation. 
> 
> 
> VS.NET seems to be great at optimizing this special code, it is more 
> than twice as fast as GCC.
> With declarations for the simple-arrays Lisp does 500 generations in 20 
> s while the GCC executable does 5000.
> 
> The inner loop of your lispcode takes more than 1000 byte of machine 
> code when compiled with ACL6.2 and has more than 20 different variables.
> 
> Is the lispcode that bad? Or the native code generator?

Missed one important typespec, now lisp does 1600 generations in 20s and 
the machine code is only 450 byte.
From: Kenny Tilton
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <6Lsfe.16606$n93.12933@twister.nyc.rr.com>
Patrick Frankenberger wrote:
> Patrick Frankenberger schrieb:
> 
>> Frank Buss schrieb:
>>
>>> In C you can calculate more than 12,000 generations in 20 seconds (with
>>> Visual Studio .NET in release mode compiled, with all default settings
>>> for console project), so the current Lisp implementations is more than
>>> 120 times slower than the C implementation. 
>>
>>
>>
>> VS.NET seems to be great at optimizing this special code, it is more 
>> than twice as fast as GCC.
>> With declarations for the simple-arrays Lisp does 500 generations in 
>> 20 s while the GCC executable does 5000.
>>
>> The inner loop of your lispcode takes more than 1000 byte of machine 
>> code when compiled with ACL6.2 and has more than 20 different variables.
>>
>> Is the lispcode that bad? Or the native code generator?
> 
> 
> Missed one important typespec, now lisp does 1600 generations in 20s and 
> the machine code is only 450 byte.

Could you post that or email me your version? I only got 300 in 20s, 
using AllegroCL 7.

kenny

-- 
Cells? Cello?: http://www.common-lisp.net/project/cells/
Cells-Gtk?: http://www.common-lisp.net/project/cells-gtk/
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film

"Doctor, I wrestled with reality for forty years, and I am happy to 
state that I finally won out over it." -- Elwood P. Dowd
From: Patrick Frankenberger
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <d5ln8s$at7$03$1@news.t-online.com>
Kenny Tilton schrieb:
> Could you post that or email me your version? I only got 300 in 20s, 
> using AllegroCL 7.

Sure:

(defun generations (count file)
   "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
   (let* ((width 640)
	 (height 480)
	 (ca-width (+ width 2))
	 (ca-height (+ height 2))
	 (len (* ca-width ca-height))
	 (ca1 (make-array len
			  :element-type '(unsigned-byte 8)
			  :initial-element 0))
	 (ca2 (make-array len
			  :element-type '(unsigned-byte 8)
			  :initial-element 0)))
     (declare (type (simple-array (unsigned-byte 8) (*)) ca1 ca2))
     (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)
     (loop for c from 1 to count do
	  (loop for y of-type fixnum from 0 below height with i of-type
		fixnum = (1+ ca-width) do
		(loop for x of-type fixnum from 0 below width do
		      (let ((cell (aref ca1 i)))
			(if (zerop cell)
			    (let ((sum
				   (+
				    (aref ca1 (1+ i))
				    (aref ca1 (1- i))
				    (aref ca1 (+ i ca-width -1))
				    (aref ca1 (+ i ca-width))
				    (aref ca1 (+ i ca-width 1))
				    (aref ca1 (- i ca-width -1))
				    (aref ca1 (- i ca-width))
				    (aref ca1 (- i ca-width 1)))))
			      (when (= sum 1) (setf cell 1)))
			    (if (= cell 62) (setf cell 0) (incf cell)))
			(setf (aref ca2 i) cell)
			(incf i)))
		(incf i 2))
	  (rotatef ca1 ca2))
     (with-open-file (stream file :direction :output :element-type
			    '(unsigned-byte 8) :if-exists :supersede
			    :if-does-not-exist :create)
       (loop for y from 0 below ca-height with i = 0 do
	    (loop for x from 0 below ca-width do
		  (let ((cell (aref ca1 i))) (write-byte cell stream))
		  (incf i))))))
From: Paolo Amoroso
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <878y2pnagv.fsf@plato.moon.paoloamoroso.it>
Frank Buss <··@frank-buss.de> writes:

> But perhaps it is really easy to create fast Lisp code, so below is the

Have you checked the optimization sections of Norvig's PAIP book?  See
also the links in the Papers section of this page:

  http://openmap.bbn.com/~kanderso/performance/


Paolo
-- 
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film
Recommended Common Lisp libraries/tools (see also http://clrfi.alu.org):
- ASDF/ASDF-INSTALL: system building/installation
- CL-PPCRE: regular expressions
- UFFI: Foreign Function Interface
From: Rainer Joswig
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <joswig-C01808.17332308052005@news-europe.giganews.com>
In article <············@newsreader3.netcologne.de>,
 Frank Buss <··@frank-buss.de> wrote:

> Kenny Tilton <·······@nyc.rr.com> wrote:
> 
> > You 
> > have a 640x480 loop, aka 307k iterations. In the loop you hit aref a 
> > dozen times, god knows what else, without an iota of optimization.
> 
> yes, this is the speed limiting factor. Copying the 640x480 bitmap with
> OpenGL needs nearly no time (nearly no CPU usage at full 60 fps rate),
> but the cellular automaton algorithm is too slow in my Lisp
> implementation. 
> 
> > When you want to do assembler in Lisp, you need to defeat the mad cool
> > things Lisp does on every instruction. aka, declarations out the
> > wazoo. you might even want to lose the array and get into bit
> > operations. When in doubt, disassemble and see how tight is your code.
> 
> you are right, but this is difficult. In C I just write the code and I
> know that the compiler produces something fast. I've tried with the help
> of others to optimize my texture code in Lisp (and one main result was,
> that declaring the type of the array helps LispWorks alot, but declaring
> the type of normal variables has nearly no effects) and my sound
> generation code, but looks like all this is about 10 times faster in C,
> without any special optimization declarations in the C program. So the
> result is, implementing modern games (which means some action games,
> where it is not a good idea too make a 1 GHz PC look like a 100 MHz PC)
> in Lisp is not possible without the help of low-level C libraries. 
> 
> But don't misunderstand me, I like Lisp and I don't want to use C, if
> possible. Perhaps ECL ( http://ecls.sourceforge.net/ ) is a better
> solution for such tasks? Has anyone used this for larger, time critical
> applications? 

Even writing in C naively will get you different levels of speed.
On the Mac for example the IBM compiler suite will generate
faster code than GCC. Making the code use the Altivec unit
also can make a big difference. Finally you would also need
to use any of the performance analysis tools from Apple
to try out different variants of your code and to
see what kind of effects it has when you use a G5 processor.

ECL is a not very mature (IMHO) and not really a high-performance
Lisp system. I guess you will end up with both lots of
C code and Lisp side which will keep you busy.

Most of the higher-performance Lisp applications have
applied one of the following strategies:

- Use a special purpose Lisp compiler to generate optimized
  code. Example: This has been done by Naughty Dog. The Lisp compiler
  was written in another Lisp (Allegro Common Lisp).

- Use a commercial Common Lisp implementation with relatively
  good performance (Allegro CL and LispWorks probably),
  let the vendor implement the necessary performance improvements
  if possible and affordable, write specialized high-performance
  code in its own module using all the Lisp implementations'
  support and interface to special C libraries where necessary.

  This has been done in many commercial applications written
  in Lisp. In some areas they can get applications with higher speed
  then you get with C/C++. But often developers also will have
  to move some functionality to C/C++ and interface to it, to improve
  the speed. Sometimes the developers later are moving away from
  Lisp and are reimplementing all or large parts of the
  application in C/C++.

- Use one of the free Common Lisp implementations with a good
  compiler (CMUCL, SBCL for example). This is mostly useful
  if you work in some of the application domains where these
  Lisp systems are quite good (numerics, ...).

- Use special hardware. Special hardware also can be just a cluster
  of normal x86/Linux boxes. In former, more stylish times,
  it could have been a Thinking Machine parallel computer.

Still:

- Common Lisp compilers have very different performance characteristics.

- It is almost impossible to get the best performance code without
  tuning it for a special Lisp implementation.

- Naive code usually will not get you much speed. Often though it is
  enough, but for high-performance Lisp code you need to do something.

- Good performance can be increased with special techniques
  (memoizing, partial evaluation, resources, ...).

- Often naive type declarations are wrong.

- GC is always an issue.

- Use tools. Metering, ...

- Conversion of Lisp data types to and from native data types is often an issue.

- Another problem can be blocking I/O in Lisp systems with their own
  process scheduler. A process waiting for I/O will halt the whole
  Lisp for that period of time. Typical solution was to hand I/O
  over to external programs.

Some resources:

Ken Anderson's page (who died some time ago, unfortunately):
http://openmap.bbn.com/~kanderso/performance/

Building Common Lisp Applications with Reasonable Performance
http://bmrc.berkeley.edu/research/publications/1993/125/Lisp.html

Performance and Evaluation of LISP Systems 
http://mitpress.mit.edu/catalog/item/default.asp?tid=6005&ttype=2


Summary: you need to invest more work to make your code faster.
The obvious first thing is to think about type declarations.
From: Thomas F. Burdick
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <xcv7ji88utd.fsf@conquest.OCF.Berkeley.EDU>
Frank Buss <··@frank-buss.de> writes:

> But perhaps it is really easy to create fast Lisp code, so below is the
> main problem, first in Common Lisp, then in ANSI C. It looks like the
> Lisp implemenation is really slow: 
> 
> CL-USER > (time (generations 100 "c:\\tmp\\cl.bin"))
> Timing the evaluation of (GENERATIONS 100 "c:\\tmp\\cl.bin")

I spent about two minutes on this code with SBCL, and it now runs at
2x the speed of C.  I'm sure it could be made faster, but this what I
got by spending essentially no time on it.  I don't actually ever
write (safety 0), because I *like* having the extra checks in there,
but it does make it closer to the C in semantics.

I suspect that, at least with SBCL, changing those loops to dotimes
will speed things up further.

(defconstant width 640)
(defconstant height 480)
(defconstant ca-width (+ width 2))
(defconstant ca-height (+ height 2))

(defun generations (count file)
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
  (declare (optimize (speed 3) (safety 0))
	   (type (mod 16) count))
  (let* ((len (* ca-width ca-height))
         (ca1 (make-array len :element-type '(unsigned-byte 8) :initial-element 0))
         (ca2 (make-array len :element-type '(unsigned-byte 8) :initial-element 0)))
    (declare (type (simple-array (unsigned-byte 8)) ca1 ca2))
    ;; seed the cellular automaton space
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)

    ;; calcuclate the generation "count"
    (loop for c from 1 to count do
          (loop for y from 0 below height
                with i = (1+ ca-width) do
                (loop for x from 0 below width do
                      (let ((cell (aref ca1 i)))
                        (if (zerop cell)
                            ;; summing the 8 neighbours
                            (let ((sum (+ (aref ca1 (1+ i))
                                          (aref ca1 (1- i))
                                          (aref ca1 (+ i ca-width -1))
                                          (aref ca1 (+ i ca-width))
                                          (aref ca1 (+ i ca-width 1))
                                          (aref ca1 (- i ca-width -1))
                                          (aref ca1 (- i ca-width))
                                          (aref ca1 (- i ca-width 1)))))
                              (when (= sum 1) (setf cell 1)))
                          (if (= cell 62)
                              (setf cell 0)
                            (incf cell)))
                        (setf (aref ca2 i) cell)
                        (incf i)))
                (incf i 2))
          (rotatef ca1 ca2))

    ;; save it
    (with-open-file (stream file :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede
                            :if-does-not-exist :create)
      (loop for y from 0 below ca-height
            with i = 0 do
            (loop for x from 0 below ca-width do
                  (let ((cell (aref ca1 i)))
                    (write-byte cell stream))
                  (incf i))))))
From: Patrick Frankenberger
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <d5nt56$d2s$05$1@news.t-online.com>
Thomas F. Burdick schrieb:
> Frank Buss <··@frank-buss.de> writes:
> 
> 
>>But perhaps it is really easy to create fast Lisp code, so below is the
>>main problem, first in Common Lisp, then in ANSI C. It looks like the
>>Lisp implemenation is really slow: 
>>
>>CL-USER > (time (generations 100 "c:\\tmp\\cl.bin"))
>>Timing the evaluation of (GENERATIONS 100 "c:\\tmp\\cl.bin")
> 
> 
> I spent about two minutes on this code with SBCL, and it now runs at
> 2x the speed of C.  I'm sure it could be made faster, but this what I
> got by spending essentially no time on it.  I don't actually ever
> write (safety 0), because I *like* having the extra checks in there,
> but it does make it closer to the C in semantics.
> 
> (defun generations (count file)
>   "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
>   (declare (optimize (speed 3) (safety 0))
> 	   (type (mod 16) count))

(mod 16) is a wrong type for count. (mod n) is a short form for (integer 
0 n).

If the compiler really creates code that forces count to be (integer 0 
16) than it is faster because it does less.

2x the speed of C? How did you compile your C-code? Which CPU?
From: Raffael Cavallaro
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <2005050914324116807%raffaelcavallaro@pasdespamsilvousplaitdotmaccom>
On 2005-05-09 10:46:35 -0400, Patrick Frankenberger 
<···············@gmx.net> said:

> (mod 16) is a wrong type for count. (mod n) is a short form for (integer 0 n).
> 
> If the compiler really creates code that forces count to be (integer 0 
> 16) than it is faster because it does less.

Not Thomas Burdick here, but, FWIW, changing this declaration from (mod 
16) to fixnum makes no appreciable difference to the execution time for 
sbcl on Mac OS X - i.e., it's just as fast without the (mod 16) 
declaration.
From: Thomas F. Burdick
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <xcv1x8f8r9f.fsf@conquest.OCF.Berkeley.EDU>
Patrick Frankenberger <···············@gmx.net> writes:

> Thomas F. Burdick schrieb:
> > Frank Buss <··@frank-buss.de> writes:
> > 
> > 
> >>But perhaps it is really easy to create fast Lisp code, so below is the
> >>main problem, first in Common Lisp, then in ANSI C. It looks like the
> >>Lisp implemenation is really slow: 
> >>
> >>CL-USER > (time (generations 100 "c:\\tmp\\cl.bin"))
> >>Timing the evaluation of (GENERATIONS 100 "c:\\tmp\\cl.bin")
> > 
> > 
> > I spent about two minutes on this code with SBCL, and it now runs at
> > 2x the speed of C.  I'm sure it could be made faster, but this what I
> > got by spending essentially no time on it.  I don't actually ever
> > write (safety 0), because I *like* having the extra checks in there,
> > but it does make it closer to the C in semantics.
> > 
> > (defun generations (count file)
> >   "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
> >   (declare (optimize (speed 3) (safety 0))
> > 	   (type (mod 16) count))
> 
> (mod 16) is a wrong type for count. (mod n) is a short form for (integer 
> 0 n).
> 
> If the compiler really creates code that forces count to be (integer 0 
> 16) than it is faster because it does less.

Yeah, that was a brain-o, I meant (unsigned-byte 16).  It doesn't
affect the result, though.

> 2x the speed of C? How did you compile your C-code? Which CPU?

Sloppy wording on my part: I meant 1/2 the speed.  This was on the
SPARC, with SBCL-0.9.0 and gcc-2.8.1 compiling with -O2.
From: =?ISO-8859-15?Q?Andr=E9_Thieme?=
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <d5q7cl$7j4$2@ulric.tng.de>
Thomas F. Burdick schrieb:

> Sloppy wording on my part: I meant 1/2 the speed.  This was on the
> SPARC, with SBCL-0.9.0 and gcc-2.8.1 compiling with -O2.

I am still not certain what you mean. Does your Lisp program now have 
half the speed of the C program, meaing it has the double runtime?
Or does it have half of the runtime of the C program which would mean it 
as twice as fast as it?


Andr�
--
From: Alan Crowe
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <863bsw1inh.fsf@cawtech.freeserve.co.uk>
Frank Buss wrote
> But perhaps it is really easy to create fast Lisp code, so
> below is the main problem, first in Common Lisp, then in
> ANSI C. It looks like the Lisp implemenation is really
> slow:

I put in some declarations and my old Pentium
running FreeBSD gave

     cawtech$ cc orig.c
     cawtech$ time  a.out 100 c-output

     real    0m20.227s
     user    0m20.095s
     sys     0m0.051s

and CMUCL managed

     * (time (generations 100 "orig-lisp-output"))
     Compiling LAMBDA NIL: 
     Compiling Top-Level Form: 

     Evaluation took:
       38.2 seconds of real time
       36.990864 seconds of user run time
       0.062134 seconds of system run time
       0 page faults and
       618912 bytes consed.
     NIL

20 seconds for C, 37 seconds for Lisp.
That is within a factor of two.

Mind you, cranking up the optimization on the C compiler
gives it another x 2 1/2.

cawtech$ cc -O3 orig.c
cawtech$ time  a.out 100 c-output

real    0m7.969s
user    0m7.847s
sys     0m0.044s

Here are is the Lisp file with my declarations

(declaim (optimize (speed 3) (safety 0) (debug 0)))

(defun generations (count file)
  (declare (fixnum count))
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
  (let* ((width 640)
         (height 480)
         (ca-width (+ width 2))
         (ca-height (+ height 2))
         (len (* ca-width ca-height))
         (ca1 (make-array len :element-type '(unsigned-byte 8)
			  :initial-element 0))
         (ca2 (make-array len :element-type '(unsigned-byte 8)
			  :initial-element 0)))
    (declare (fixnum width height
		     ca-width ca-height
		     len))
    (declare (type (array (unsigned-byte 8) 1) ca1 ca2))
    ;; seed the cellular automaton space
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)

    ;; calcuclate the generation "count"
    (loop for c of-type fixnum from 1 to count do
          (loop for y of-type fixnum from 0 below height
                with i of-type fixnum = (1+ ca-width) do
                (loop for x of-type fixnum from 0 below width do
                      (let ((cell (aref ca1 i)))
			(declare (fixnum cell))
                        (if (zerop cell)
                            ;; summing the 8 neighbours
                            (let ((sum (+ (aref ca1 (1+ i))
                                          (aref ca1 (1- i))
                                          (aref ca1 (+ i ca-width -1))
                                          (aref ca1 (+ i ca-width))
                                          (aref ca1 (+ i ca-width 1))
                                          (aref ca1 (- i ca-width -1))
                                          (aref ca1 (- i ca-width))
                                          (aref ca1 (- i ca-width 1)))))
			      (declare (fixnum sum))
                              (when (= sum 1) (setf cell 1)))
                          (if (= cell 62)
                              (setf cell 0)
                            (incf cell)))
                        (setf (aref ca2 i) cell)
                        (incf i)))
                (incf i 2))
          (rotatef ca1 ca2))

    ;; save it
    (with-open-file (stream file :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede
                            :if-does-not-exist :create)
      (loop for y of-type fixnum from 0 below ca-height
            with i of-type fixnum = 0 do
            (loop for x of-type fixnum from 0 below ca-width do
                  (let ((cell (aref ca1 i)))
                    (write-byte cell stream))
                  (incf i))))))

Alan Crowe
Edinburgh
Scotland
From: Ulrich Hobelmann
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <3e9gu9F1tglgU1@individual.net>
Alan Crowe wrote:
> Frank Buss wrote
> 
>>But perhaps it is really easy to create fast Lisp code, so
>>below is the main problem, first in Common Lisp, then in
>>ANSI C. It looks like the Lisp implemenation is really
>>slow:
> 
> 
> I put in some declarations and my old Pentium
> running FreeBSD gave
> 
>      cawtech$ cc orig.c
>      cawtech$ time  a.out 100 c-output

try
$ cc -O2 orig.c
to get more speed...
Everything else is cheating in favor of Lisp ;)

-- 
No man is good enough to govern another man without that other's 
consent. -- Abraham Lincoln
From: ···········@mail.ru
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <1115882785.561832.20670@o13g2000cwo.googlegroups.com>
The fastest version for LispWorks is still 3.81 times slower than
VC++7.1 on 12000 iterations.

(declaim (optimize (speed 3)(safety 0)(debug 0)(space 0)(fixnum-safety
0)(float 0)))

(defun generations (count)
  "a generations CA (see http://www.mirekw.com/ca/rullex_gene.html )"
  (declare (fixnum count))
  (let* ((width 640)
         (height 480)
         (ca-width (+ width 2))
         (ca-height (+ height 2))
         (len (* ca-width ca-height))
         (ca1 (make-array len :element-type '(unsigned-byte 8)
:initial-element 0))
         (ca2 (make-array len :element-type '(unsigned-byte 8)
:initial-element 0)))
    (declare (fixnum width height ca-width ca-height len)
             (type (simple-array (unsigned-byte 8) (*)) ca1 ca2))
    ;; seed the cellular automaton space
    (setf (aref ca1 (- (floor len 2) ca-width 4)) 1)

    ;; calcuclate the generation "count"
    (loop for c from 1 to count do
          (loop for y from 0 below height
                with i = (1+ ca-width) do
                (loop for x from 0 below width do
                      (let ((cell (aref ca1 i)))
                           (declare (type (unsigned-byte 8) cell))
                        (if (zerop cell)
                            ;; summing the 8 neighbours
                            (let ((sum (+ (+ (+ (+ (+ (+ (+ (aref ca1
(1+ i))
                                          (aref ca1 (1- i)))
                                          (aref ca1 (+ i ca-width -1)))
                                          (aref ca1 (+ i ca-width)))
                                          (aref ca1 (+ i ca-width 1)))
                                          (aref ca1 (- i ca-width -1)))
                                          (aref ca1 (- i ca-width)))
                                          (aref ca1 (- i ca-width
1)))))
                              (declare (type (unsigned-byte 8) sum))
                              (when (= sum 1) (setf cell 1)))
                          (if (= cell 62)
                              (setf cell 0)
                            (incf cell)))
                        (setf (aref ca2 i) cell)
                        (incf i)))
                (incf i 2))
          (rotatef ca1 ca2))))

Disassemble shows that code is quite similar to optimized VC++7.1. I
think a little further optimization of internal cycle's body can speed
up this much.
From: Nicolas Neuss
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <87is1uugga.fsf@ortler.iwr.uni-heidelberg.de>
Frank Buss <··@frank-buss.de> writes:

> [about performance problems]

If you used CMUCL/SBCL with maximum speed optimization, you would obtain a
lot of notes where you could further optimize.  Presumably, Lispworks would
give you similar information.

Nicolas.
From: Michael J. Ferrador
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <s6rfe.20736$o32.14224@fe09.lga>
Nicolas Neuss wrote:
> Frank Buss <··@frank-buss.de> writes:
> 
>>[about performance problems]
> 
> If you used CMUCL/SBCL with maximum speed optimization, you would obtain a
> lot of notes where you could further optimize.  Presumably, Lispworks would
> give you similar information.

Best algorithm first (were lisp may be more agile)

Then, optimize notes*


---

*Where as a newbie - I did NOT see that I was NOT programming lexically
  (functionally ?) until I cross checked my code from clisp (interp)
  to OpenMCL (on the fly REPL compile \ load)
From: Paolo Amoroso
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <87zmv5lvry.fsf@plato.moon.paoloamoroso.it>
Frank Buss <··@frank-buss.de> writes:

> program? And how can we simplifiying the setup-process for a working
> environment for developing OpenGL applications in Lisp, for all Lisp
> implementations on all platforms?

You may check CL-SDL:

  http://www.cliki.net/CL-SDL


Paolo
-- 
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film
Recommended Common Lisp libraries/tools (see also http://clrfi.alu.org):
- ASDF/ASDF-INSTALL: system building/installation
- CL-PPCRE: regular expressions
- UFFI: Foreign Function Interface
From: Wade Humeniuk
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <D_sfe.31243$0X6.5984@edtnps90>
On LW make these changes: (I have not tested this at all, not even compiled it.)
Also note this uses the LW FLI not the UFFI.  The gist of this that in LW you
can create Lisp arrays in a static area of memory.  Then you can pass the Lisp
array directly to C, (without copying!)

These changes make use of:

1) SYS:IN-STATIC-AREA
2) Declaring array types and optimizations better
3) FLI:WITH-DYNAMIC-LISP-ARRAY-POINTER  Here I am not sure it will work with
the UFFI, but I assume it will.

file:///C:/Program%20Files/Xanalys/LispWorks/lib/4-3-0-0/manual/online/web/FLI-W/html/fli-116.htm#pgfId-1112879

Wade

(defparameter *fb* (sys:in-static-area (make-array (* *width* *height* 4) :element-type 
'unsigned-byte)))

(hic:ff-defun-callable :cdecl :void draw-callback
     ()
   (glClear GL_COLOR_BUFFER_BIT)
   (prog ()
     (declare (type (simple-array unsigned-byte *) *ca1* *ca2* *fb*)
              (optimize (speed 3) (safety 0) (debug 0)))
     (loop for y from 0 below *height*
           with i2 = 0
           with i = (1+ *ca-width*) do
           (loop for x from 0 below *width* do
                 (let ((cell (aref *ca1* i)))
                   (if (zerop cell)
                       (let ((sum (+ (aref *ca1* (1+ i))
                                     (aref *ca1* (1- i))
                                     (aref *ca1* (+ i *ca-width* -1))
                                     (aref *ca1* (+ i *ca-width*))
                                     (aref *ca1* (+ i *ca-width* 1))
                                     (aref *ca1* (- i *ca-width* -1))
                                     (aref *ca1* (- i *ca-width*))
                                     (aref *ca1* (- i *ca-width* 1)))))
                         (when (= sum 1) (setf cell 1)))
                     (if (= cell 63)
                         (setf cell 0)
                       (incf cell)))
                   (setf (aref *ca2* i) cell)
                   (incf i)
                   (setf (aref *fb* (* cell 4)))
                   (incf i2)
                   (setf (aref *fb* (* cell 4)))
                   (incf i2)
                   (setf (aref *fb* (* cell 4)))
                   (incf i2)
                   (setf (aref *fb*  0))
                   (incf i2)))
           (incf i 2)))
   (rotatef *ca1* *ca2*)
   (fli:with-dynamic-lisp-array-pointer (fb *fb*)
     (glDrawPixels *width* *height* GL_RGBA GL_UNSIGNED_BYTE fb))
   (glutSwapBuffers))

(defun start ()
   (declare (special *window*))
   (glutInitDisplayMode (+ GLUT_RGB GLUT_DOUBLE))
   (glutInitWindowPosition 50 50)
   (glutInitWindowSize *width* *height*)
   (setq *window* (glutCreateWindow "Arabeske"))

   (glutDisplayFunc (hic:pointer-address (hic:ff-register-callable 'draw-callback)))
   (glutReshapeFunc (hic:pointer-address (hic:ff-register-callable 'reshape-callback)))
   (glutIdleFunc (hic:pointer-address (hic:ff-register-callable 'idle-callback)))
   (glutKeyboardFunc (hic:pointer-address (hic:ff-register-callable 'key-callback)))
   (glutVisibilityFunc (hic:pointer-address (hic:ff-register-callable 'visible-callback)))

   (let ((len (* *ca-width* *ca-height*)))
     (setf *ca1* (sys:in-static-area (make-array len :element-type '(unsigned-byte 8) 
:initial-element 0))
           *ca2* (sys:in-static-area (make-array len :element-type '(unsigned-byte 8) 
:initial-element 0))
           (aref *ca1* (- (floor len 2) *ca-width* 4)) 1))

   (catch :exit-my-test
     (glutMainLoop)))
From: Wade Humeniuk
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <6jvfe.46358$HR1.45352@clgrps12>
I found an hour and got much more speed up on LW.  File source below.

Wade

(declaim '(optimize (speed 3) (safety 0) (debug 0)))

(in-package :common-lisp-user)

(defpackage :my-test
   (:use :common-lisp #+SBCL :sb-ext :gl :hello-c)
   (:export :start))
	
(in-package :my-test)

(defvar *window* nil)

(defparameter *width* 640)
(defparameter *height* 480)

(defparameter *ca-width* (+ *width* 2))
(defparameter *ca-height* (+ *height* 2))
(defparameter *fb* (sys:in-static-area
                      (make-array (* *width* *height* 4)
                                  :element-type '(unsigned-byte 8)
                                  :initial-element 0)))
(defparameter *ca1* nil)
(defparameter *ca2* nil)

(hic:ff-defun-callable :cdecl :void reshape-callback
     ((width :int) (height :int))
   (glViewport 0 0 width height))

(defun recalc (ca1 ca2 fb)
   (declare (type (simple-array (unsigned-byte 8) (*)) ca1 ca2 fb)
            (optimize (speed 3) (safety 0) (debug 0) (float 0) (hcl:fixnum-safety 0)))
   (loop for y from 0 below *height*
         with ca-width = *ca-width*
         with width = *width*
         with i2 = 0
         with i = (1+ ca-width) do
         (loop for x from 0 below width do
               (let ((cell (aref ca1 i)))
                 (if (zerop cell)
                     (let ((sum (+ (aref ca1 (1+ i))
                                   (aref ca1 (1- i))
                                   (aref ca1 (+ i ca-width -1))
                                   (aref ca1 (+ i ca-width))
                                   (aref ca1 (+ i ca-width 1))
                                   (aref ca1 (- i ca-width -1))
                                   (aref ca1 (- i ca-width))
                                   (aref ca1 (- i ca-width 1)))))
                       (when (= sum 1) (setf cell 1)))
                   (if (= cell 63)
                       (setf cell 0)
                     (incf cell)))
                 (setf (aref ca2 i) cell)
                 (incf i)
                 (setf (aref fb i2) (* cell 4))
                 (incf i2)
                 (setf (aref fb i2) (* cell 4))
                 (incf i2)
                 (setf (aref fb i2) (* cell 4))
                 (incf i2)
                 (setf (aref fb i2) 0)))
         (incf i2)))

(hic:ff-defun-callable :cdecl :void draw-callback
     ()
   (glClear GL_COLOR_BUFFER_BIT)
   (recalc *ca1* *ca2* *fb*)
   (rotatef *ca1* *ca2*)
   (fli:with-dynamic-lisp-array-pointer (fb *fb* :type '(:unsigned :byte))
     (glDrawPixels *width* *height* GL_RGBA GL_UNSIGNED_BYTE fb))
   (glutSwapBuffers))

(hic:ff-defun-callable :cdecl :void idle-callback
     ()
   (glutPostRedisplay))

(hic:ff-defun-callable :cdecl :void visible-callback
     ((vis :int))
   (declare (ignore vis)))


(hic:ff-defun-callable :cdecl :void key-callback
     ((k :int) (x :int) (y :int))
   (declare (ignore k x y))
   (declare (special *window*))
   ;; exit on all keys
   (glutDestroyWindow *window*)
   (throw :exit-my-test nil))


(defun start ()
   (declare (special *window*))
   (glutInitDisplayMode (+ GLUT_RGB GLUT_DOUBLE))
   (glutInitWindowPosition 50 50)
   (glutInitWindowSize *width* *height*)
   (setq *window* (glutCreateWindow "Arabeske"))

   (glutDisplayFunc (hic:pointer-address (hic:ff-register-callable 'draw-callback)))
   (glutReshapeFunc (hic:pointer-address (hic:ff-register-callable 'reshape-callback)))
   (glutIdleFunc (hic:pointer-address (hic:ff-register-callable 'idle-callback)))
   (glutKeyboardFunc (hic:pointer-address (hic:ff-register-callable 'key-callback)))
   (glutVisibilityFunc (hic:pointer-address (hic:ff-register-callable 'visible-callback)))

   (let ((len (* *ca-width* *ca-height*)))
     (setf *ca1* (make-array len :element-type '(unsigned-byte 8) :initial-element 0)
           *ca2* (make-array len :element-type '(unsigned-byte 8) :initial-element 0)
           (aref *ca1* (- (floor len 2) *ca-width* 4)) 1))

   (catch :exit-my-test
     (glutMainLoop)))
From: Wade Humeniuk
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <k%Cfe.56979$tg1.29118@edtnps84>
I have corrected a few errors in my LW version and sped it
up a bit more.  I added a recalc-test just to benchmark
the generation of the frame buffer.  This version has
no calls to aref and all arithmetic is now fixnum.

See http://www3.telus.net/public/whumeniu/my-test.lisp

On my Athlon 1.4 GHz, the original code ran less than a
frame/sec.  Now it runs at about 16 frames/sec.

MY-TEST 13 > (time (recalc-test 1000))
Timing the evaluation of (RECALC-TEST 1000)
; Loading fasl file C:\Program Files\Xanalys\LispWorks\lib\4-3-0-0\modules\util\callcoun.fsl

user time    =     58.113
system time  =      0.020
Elapsed time =   0:00:58
Allocation   = 621152 bytes standard / 4004 bytes conses
0 Page faults
Calls to %EVAL    34
NIL

MY-TEST 14 > (time (recalc-test 16))
Timing the evaluation of (RECALC-TEST 16)

user time    =      0.981
system time  =      0.000
Elapsed time =   0:00:01
Allocation   = 620592 bytes standard / 3377 bytes conses
0 Page faults
Calls to %EVAL    34
NIL

MY-TEST 15 > (* 48 60)
2880

MY-TEST 16 > (time (recalc-test 100))
Timing the evaluation of (RECALC-TEST 100)

user time    =      5.928
system time  =      0.010
Elapsed time =   0:00:06
Allocation   = 620592 bytes standard / 3443 bytes conses
0 Page faults
Calls to %EVAL    34
NIL

MY-TEST 17 >
From: Wade Humeniuk
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <XN4ge.49070$vN2.30312@clgrps13>
Just for people's information.  I created a C library version of
recalc which calculates the cellular automata arrays and frame buffer.
This is only for LWW. (careful of the hack in the Lisp recalc if
you move this code to a big endian machine).

It turns out the C version of recalc is about 4X faster than my best Lisp version.
I am not particularly surprised by this and shows that (at least with LW) it is
better for heavy calculations to code in C (or assembler).
Personally I find nothing wrong with this approach.  If
I was going to pick two languages for getting things done
it would be Lisp and C.

Files
http://www3.telus.net/public/whumeniu/my-test.lisp
http://www3.telus.net/public/whumeniu/recalc.c

The lisp version runs as (start),
the lisp version using the C library is (c-start)

To compile and create a dll under cygwin

gcc -O3 -c recalc.c
gcc -mno-cygwin -shared -o recalc.dll recalc.o

MY-TEST 11 > (time (c-recalc-test 100)) ; tests the C recalc
Timing the evaluation of (C-RECALC-TEST 100)

user time    =      1.191
system time  =      0.000
Elapsed time =   0:00:01
Allocation   = 620616 bytes standard / 3102 bytes conses
0 Page faults
Calls to %EVAL    34
NIL

MY-TEST 12 > (time (recalc-test 100)) ; tests the Lisp recalc
Timing the evaluation of (RECALC-TEST 100)

user time    =      4.887
system time  =      0.000
Elapsed time =   0:00:05
Allocation   = 620608 bytes standard / 3124 bytes conses
0 Page faults
Calls to %EVAL    34
NIL

MY-TEST 13 > (/ 4.9 1.2)
4.083333333333334

MY-TEST 14 >

Wade
From: Wade Humeniuk
Subject: Re: How to speed up an OpenGL application?
Date: 
Message-ID: <PUfge.36259$0X6.1287@edtnps90>
One more version.  I got tired looking at grayscale, so
I have the cells change hue around a constant saturation
and value of the HSV color space.  You still need LW for
this as it uses the COLOR package.

http://www3.telus.net/public/whumeniu/my-test.lisp

Wade