From: Siegfried Gonzi
Subject: Declaring arrays in Lisp
Date: 
Message-ID: <3C42B0CB.1DF4B9CE@kfunigraz.ac.at>
[Before I commence: I did a dejanews-search, but I did not get the
enlightenment. The
following is due to the fact that I often had the experience in Clean
that it does not
well scale up to large arrays, when one has been using the inappropiate
naive
functional-programming elements.]

What is wrong (concerning the declaration) with the following code:

(defun matmul (m1 m2 m3 dims)
            (declare (optimize (speed 3) (safety 0) (debug 0))
                     (fixnum i j)
                     (double-float val)
                     (fixnum k dims)
                     (simple-array (double-float (* *)) m1 m2 m3))
              (loop :for i :from 0 :below dims :do
                  (loop :for j :from 0 :below dims :do
                  (setf val 0.0)
                      (loop :for k :from 0 :below dims :do
                            (setf val (+ val (* (aref m1   i j) (aref
m2  j i)))))
               (setf (aref m3 i j) val)))
(aref m3 0 0))


The above code performs the matrix-matrix-multiplication. I never have
had the dire need for
the matrix-matrix-multiplication in a real-world situation; but it is a
good gauge-tool
for checking the array access of a programming-language implementation.

I do the benchmarks, because I have to work on midsize  arrays of size
(about): 10000 x 12.

I only want to train how to declare arrays in order to get a good
performance out of it.
But it seems this plan is harder to realize than  assumed.

The above matmul-function delivers for a 512x512
matrix-matrix-multiplication the following time scheme:


CG-USER(9): (time (matmul m1 m2 m3 512))
; cpu time (non-gc) 134,242 msec (00:02:14.242) user, 620 msec system
; cpu time (gc)     84,623 msec (00:01:24.623) user, 20 msec system
; cpu time (total)  218,865 msec (00:03:38.865) user, 640 msec system
; real time  221,739 msec (00:03:41.739)
; space allocation:
;  9,095 cons cells, 2,147,535,960 other bytes, 1,368 static bytes
0.0d0
CG-USER(10):


And this on a 1000MHZ Celeron and 256MB RAM and Allegro CL 6.1. The AL
CL compiler variable has been set to:
 (declaim (optimize (speed 3) (safety 0) (space 0) (debug 0)))

A reasonable C version (without blocking) would take about 2sec on my
machine. I wrote a quick Clean-program
in order to compare the execution times (the function is at the end of
the post; for the curiosity only).
The naive Clean version takes 12sec (there have been posted faster ones
on the Clean mailing-list) for a
512x512 multiplication; and it takes 45sec without any declarations.


How can I improve the declaration of the Lisp version; I assume that
(maybe) a small change will be enough:
I do not see the wood for the trees. I tried also to vary the array
indices (in C columns are running faster than
rows; the reverse is true for Fortran indexing); but the execution times
are neary identical.


S. Gonzi

[Look on the Clean mailing-list for a way faster one]
matmul:: !{#{#Real}} !{#{#Real}} !*{#*{#Real}} -> !*{#*{#Real}}
matmul m1 m2 m3 = matmul_i 0 m1 m2 m3
where
     matmul_i:: !Int !{#{#Real}} !{#{#Real}} !*{#*{#Real}} ->
!*{#*{#Real}}
     matmul_i i m1 m2 m3
          |i==(size m1) = m3
          #! m3j = matmul_j i 0 m1 m2 m3
          =matmul_i (inc i)  m1 m2 m3j
      where
           matmul_j:: !Int !Int !{#{#Real}} !{#{#Real}} !*{#*{#Real}} ->
!*{#*{#Real}}
           matmul_j i j m1 m2 m3
                |j==(size m1.[0])=m3
                #! m3ij = matmul_k 0 i j m1 m2  0.0
                =matmul_j  i (inc j) m1 m2 {m3 & [i,j] = m3ij}
            where
                 matmul_k::!Int !Int !Int !{#{#Real}} !{#{#Real}} !Real
-> !Real
                 matmul_k k i j m1 m2  val
                      |k==(size m1.[0]) = val
                      =matmul_k (inc k) i j m1 m2  (val +
m1.[i,k]*m2.[k,j])

From: Will Deakin
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C42D89D.2080907@hotmail.com>
Siegfried Gonzi wrote:

> What is wrong (concerning the declaration) with the following code:
> 
> (defun matmul (m1 m2 m3 dims)
>             (declare (optimize (speed 3) (safety 0) (debug 0))
>                      (fixnum i j)
>                      (double-float val)
>                      (fixnum k dims)
>                      (simple-array (double-float (* *)) m1 m2 m3))
>               (loop :for i :from 0 :below dims :do
>                   (loop :for j :from 0 :below dims :do
>                   (setf val 0.0)
>                       (loop :for k :from 0 :below dims :do
>                             (setf val (+ val (* (aref m1   i j) (aref
> m2  j i)))))
>                (setf (aref m3 i j) val)))
> (aref m3 0 0))
Hmmm. What lisp are you using? Did you compile the function?


:)w

 
From: Raymond Wiker
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <86bsfxqb6h.fsf@raw.grenland.fast.no>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> [Before I commence: I did a dejanews-search, but I did not get the
> enlightenment. The
> following is due to the fact that I often had the experience in Clean
> that it does not
> well scale up to large arrays, when one has been using the inappropiate
> naive
> functional-programming elements.]
> 
> What is wrong (concerning the declaration) with the following code:
> 
> (defun matmul (m1 m2 m3 dims)
>             (declare (optimize (speed 3) (safety 0) (debug 0))
>                      (fixnum i j)
>                      (double-float val)
>                      (fixnum k dims)
>                      (simple-array (double-float (* *)) m1 m2 m3))
>               (loop :for i :from 0 :below dims :do
>                   (loop :for j :from 0 :below dims :do
>                   (setf val 0.0)
>                       (loop :for k :from 0 :below dims :do
>                             (setf val (+ val (* (aref m1   i j) (aref
> m2  j i)))))
>                (setf (aref m3 i j) val)))
> (aref m3 0 0))

        I've rewritten this slightly as follows:

(defun matmul (m1 m2 m3 dims)
  (declare (optimize (speed 3) (safety 0) (debug 0)
		     (compilation-speed 0))
	   (type fixnum dims)
	   (type (simple-array double-float (* *)) m1 m2 m3))
  (loop :for i of-type (unsigned-byte 29) :from 0 :below dims :do
	(loop :for j of-type (unsigned-byte 29) :from 0 :below dims :do
	      (let ((val 0.0d0))
		(declare (type double-float val))
		(loop :for k of-type (unsigned-byte 29) :from 0 :below dims :do
		      (incf val (* (aref m1 i k) (aref m2 k i))))
		(setf (aref m3 i j) val))))
  (aref m3 0 0))

        With SBCL (a relative of CMUCL) on a 667-MHz PIII, I get an
execution time of around 120sec. The only warning/note emitted by SBCL
is for the return value, which has to be unboxed.

        The original code has an error (I think): the body of the
innermost loop (on k) should use the indexes "i k" and "k j".

        The typespec (unsigned-byte 29) is correct for SBCL, but
should really be something like (integer 0 array-dimension-limit).

        Still, this is pretty far from your target of 2 seconds... The
CMUCL web pages used to have some example CMUCL code that ran faster
than the equivalent C/C++ code, but I cannot find that at the moment.

-- 
Raymond Wiker                        Mail:  ·············@fast.no
Senior Software Engineer             Web:   http://www.fast.no/
Fast Search & Transfer ASA           Phone: +47 23 01 11 60
P.O. Box 1677 Vika                   Fax:   +47 35 54 87 99
NO-0120 Oslo, NORWAY                 Mob:   +47 48 01 11 60

Try FAST Search: http://alltheweb.com/
From: Pierre R. Mai
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <87vge4ltr0.fsf@orion.bln.pmsf.de>
Raymond Wiker <·············@fast.no> writes:

>         With SBCL (a relative of CMUCL) on a 667-MHz PIII, I get an
> execution time of around 120sec. The only warning/note emitted by SBCL
> is for the return value, which has to be unboxed.

That seems strange, given that current CMUCL takes around 39s for your
code on a lowly AMD K6-2/550, which has a lousy FPU, when compared
with a PIII.  On a PII-300, your code takes around 49s, so I'd expect
a 667-MHz PIII to take less than 30s, and a 1000-MHz PIII to take
around 15s.

That all would come out to about 110 cycles per iteration (for the
PII/PIIIs) or around 159 cycles per iteration (for the AMD), which
isn't the tightest of code (a cursory glance suggests lots of
uneccessary moves, which aren't optimized away by the peep-whole
optimizer...):

* (disassemble 'matmul)

48098920:       .ENTRY MATMUL()              ; FUNCTION
     938:       POP   DWORD PTR [EBP-8]
     93B:       LEA   ESP, [EBP-32]
     93E:       MOV   EAX, [EBP-16]
     941:       MOV   [EBP-24], EAX
     944:       MOV   DWORD PTR [EBP-28], 0  ; No-arg-parsing entry point
     94B:       JMP   L5
     950: L0:   XOR   EBX, EBX
     952:       JMP   L4
     957: L1:   FSTPD FR1
     959:       FLDZ
     95B:       FXCH  FR1
     95D:       MOV   DWORD PTR [EBP-12], 0
     964:       JMP   L3
     966: L2:   MOV   EAX, [EDX+25]
     969:       MOV   [EBP-16], EAX
     96C:       MOV   EAX, [EBP-28]
     96F:       SAR   EAX, 2
     972:       IMUL  EAX, [EBP-16]
     976:       MOV   [EBP-20], EAX
     979:       MOV   EAX, [EBP-20]
     97C:       ADD   EAX, [EBP-12]
     97F:       MOV   [EBP-20], EAX
     982:       MOV   EAX, [EDX+9]
     985:       MOV   [EBP-16], EAX
     988:       MOV   EAX, [EBP-16]
     98B:       MOV   ECX, [EBP-20]
     98E:       FSTPD FR0
     990:       FLDD  [EAX+ECX*2+1]
     994:       MOV   EAX, [EDI+25]
     997:       MOV   [EBP-16], EAX
     99A:       MOV   EAX, [EBP-12]
     99D:       SAR   EAX, 2
     9A0:       IMUL  EAX, [EBP-16]
     9A4:       MOV   [EBP-20], EAX
     9A7:       MOV   EAX, [EBP-20]
     9AA:       ADD   EAX, [EBP-28]
     9AD:       MOV   [EBP-20], EAX
     9B0:       MOV   EAX, [EDI+9]
     9B3:       MOV   [EBP-16], EAX
     9B6:       MOV   EAX, [EBP-16]
     9B9:       MOV   ECX, [EBP-20]
     9BC:       FSTPD FR2
     9BE:       FLDD  [EAX+ECX*2+1]
     9C2:       FXCH  FR2
     9C4:       FMULD FR2
     9C6:       FADD-STI FR1
     9C8:       ADD   DWORD PTR [EBP-12], 4
     9CC: L3:   MOV   EAX, [EBP-24]
     9CF:       MOV   [EBP-16], EAX
     9D2:       MOV   EAX, [EBP-12]
     9D5:       CMP   EAX, [EBP-16]
     9D8:       JL    L2
     9DA:       MOV   EAX, [ESI+25]
     9DD:       MOV   [EBP-12], EAX
     9E0:       MOV   EAX, [EBP-28]
     9E3:       SAR   EAX, 2
     9E6:       IMUL  EAX, [EBP-12]
     9EA:       MOV   [EBP-16], EAX
     9ED:       ADD   [EBP-16], EBX
     9F0:       MOV   EAX, [ESI+9]
     9F3:       MOV   [EBP-12], EAX
     9F6:       MOV   EAX, [EBP-12]
     9F9:       MOV   ECX, [EBP-16]
     9FC:       FXCH  FR1
     9FE:       FSTD  [EAX+ECX*2+1]
     A02:       FXCH  FR1
     A04:       ADD   EBX, 4
     A07: L4:   MOV   EAX, [EBP-24]
     A0A:       MOV   [EBP-12], EAX
     A0D:       CMP   EBX, [EBP-12]
     A10:       JL    L1
     A16:       MOV   ECX, [EBP-28]
     A19:       ADD   ECX, 4
     A1C:       MOV   [EBP-28], ECX
     A1F: L5:   MOV   EBX, [EBP-24]
     A22:       CMP   [EBP-28], EBX
     A25:       JL    L0
     A2B:       MOV   EAX, [ESI+9]
     A2E:       FSTPD FR0
     A30:       FLDD  [EAX+1]
     A33:       MOV   BYTE PTR [#x280001D4], 0 ; COMMON-LISP::*PSEUDO-ATOMIC-INTERRUPTED*
     A3A:       MOV   BYTE PTR [#x280001BC], 4 ; COMMON-LISP::*PSEUDO-ATOMIC-ATOMIC*
     A41:       MOV   EDX, 16
     A46:       ADD   EDX, [#x8069724]       ; current_region_free_pointer
     A4C:       CMP   EDX, [#x80696F8]       ; current_region_end_addr
     A52:       JBE   L6
     A54:       CALL  #x805372C              ; alloc_overflow_edx
     A59: L6:   XCHG  EDX, [#x8069724]       ; current_region_free_pointer
     A5F:       MOV   DWORD PTR [EDX], 790
     A65:       LEA   EDX, [EDX+7]
     A68:       FSTD  [EDX+1]
     A6B:       MOV   BYTE PTR [#x280001BC], 0 ; COMMON-LISP::*PSEUDO-ATOMIC-ATOMIC*
     A72:       CMP   BYTE PTR [#x280001D4], 0 ; COMMON-LISP::*PSEUDO-ATOMIC-INTERRUPTED*
     A79:       JEQ   L7
     A7B:       BREAK 9                      ; Pending interrupt trap
     A7D: L7:   MOV   ECX, [EBP-8]
     A80:       MOV   EAX, [EBP-4]
     A83:       ADD   ECX, 2
     A86:       MOV   ESP, EBP
     A88:       MOV   EBP, EAX
     A8A:       JMP   ECX
     A8C:       NOP
     A8D:       NOP
     A8E:       NOP
     A8F:       NOP

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Bulent Murtezaoglu
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <87zo3gpuet.fsf@nkapi.internal>
>>>>> "PRM" == Pierre R Mai <····@acm.org> writes:

    PRM> [...] which isn't the tightest of code (a cursory glance
    PRM> suggests lots of uneccessary moves, which aren't optimized
    PRM> away by the peep-whole optimizer...).

I am not seeing the beginning of the thread here (news server problems 
prolly), but I will jump in nonetheless:

-- Does CMUCL have a peephole optimizer now?  

-- The last I played with deeply nested loops in CMUCL there appeared
to be an additional inefficiency introduced by the register allocator
choosing to keep the indices used in the inner loops in memory.  Or so 
it seemed to me.  

I think it would be worthwhile to see both the C code and the Lisp code 
side by side and go through them here.  This maye have already happened 
but I don't see it here, my apologies if that is the case.

cheers,

BM
From: Jochen Schmidt
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C431ED3.40606@dataheaven.de>
Raymond Wiker wrote:
> (defun matmul (m1 m2 m3 dims)
>   (declare (optimize (speed 3) (safety 0) (debug 0)
> 		     (compilation-speed 0))
> 	   (type fixnum dims)
> 	   (type (simple-array double-float (* *)) m1 m2 m3))
>   (loop :for i of-type (unsigned-byte 29) :from 0 :below dims :do
> 	(loop :for j of-type (unsigned-byte 29) :from 0 :below dims :do
> 	      (let ((val 0.0d0))
> 		(declare (type double-float val))
> 		(loop :for k of-type (unsigned-byte 29) :from 0 :below dims :do
> 		      (incf val (* (aref m1 i k) (aref m2 k i))))
> 		(setf (aref m3 i j) val))))
>   (aref m3 0 0))


There is a little typo in the code here. (aref m2 k j) instead of (aref 
m2 k i).


>         With SBCL (a relative of CMUCL) on a 667-MHz PIII, I get an
> execution time of around 120sec. The only warning/note emitted by SBCL
> is for the return value, which has to be unboxed.

I've tested the code under CMUCL18d (prerelease) and LispWorks4.2 on a
800-Mhz Duron with 256MB Ram. The mean time was ~40 seconds. I was 
indeed quite impressed to see that LispWorks4.2 even was a bit faster 
than CMUCL (LW ~39sec, CMUCL ~41 sec)

>         The original code has an error (I think): the body of the
> innermost loop (on k) should use the indexes "i k" and "k j".
> 
>         The typespec (unsigned-byte 29) is correct for SBCL, but
> should really be something like (integer 0 array-dimension-limit).
> 
>         Still, this is pretty far from your target of 2 seconds... The
> CMUCL web pages used to have some example CMUCL code that ran faster
> than the equivalent C/C++ code, but I cannot find that at the moment.

Hm... I'm not sure but a direct implementation in C compiled with GCC 
and -O3 optimization was not faster (in the mean 2-3 seconds slower than 
Lispworks)...

I have to say I'm quite impressed by this...

ciao,
Jochen
From: Raymond Toy
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <4nadvghl2h.fsf@rtp.ericsson.se>
>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:

    Jochen> Raymond Wiker wrote:
    >> CMUCL web pages used to have some example CMUCL code that ran faster
    >> than the equivalent C/C++ code, but I cannot find that at the moment.

    Jochen> Hm... I'm not sure but a direct implementation in C compiled with GCC
    Jochen> and -O3 optimization was not faster (in the mean 2-3 seconds slower
    Jochen> than Lispworks)...

Are you saying that the matmul with gcc was actually slower than
Lispworks on the same platform?  

I find this hard to believe.  From what I know of CMUCL, each (2D)
array access takes at least 2 memory reads (1 to get to the pointer to
the actual memory, and one more to get the actual element) to get at a
single element.  I'm pretty sure gcc only needs one.

Ray
From: Pierre R. Mai
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <87lmf0lnid.fsf@orion.bln.pmsf.de>
Raymond Toy <···@rtp.ericsson.se> writes:

> >>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
> 
>     Jochen> Raymond Wiker wrote:
>     >> CMUCL web pages used to have some example CMUCL code that ran faster
>     >> than the equivalent C/C++ code, but I cannot find that at the moment.
> 
>     Jochen> Hm... I'm not sure but a direct implementation in C compiled with GCC
>     Jochen> and -O3 optimization was not faster (in the mean 2-3 seconds slower
>     Jochen> than Lispworks)...
> 
> Are you saying that the matmul with gcc was actually slower than
> Lispworks on the same platform?  
> 
> I find this hard to believe.  From what I know of CMUCL, each (2D)
> array access takes at least 2 memory reads (1 to get to the pointer to
> the actual memory, and one more to get the actual element) to get at a
> single element.  I'm pretty sure gcc only needs one.

FWIW, a simple translation of Raymond Wiker's implementation
(including bugs, for consistency), takes 28s vs. 39s on my AMD
K6-2/550, and 16s vs. 49s on a PII-300, when compiled with gcc 2.95.2
(gcc 2.95.3) with -O6.  That would suggest a 5s vs. 15s run-time for a
PIII-1000.  The difference is much higher with a PII/PIII, since the
AMD K6-2 takes much more time for FP calculations...

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Jochen Schmidt
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C43482C.9030708@dataheaven.de>
Raymond Toy wrote:
> Are you saying that the matmul with gcc was actually slower than
> Lispworks on the same platform?  
> 
> I find this hard to believe.  From what I know of CMUCL, each (2D)
> array access takes at least 2 memory reads (1 to get to the pointer to
> the actual memory, and one more to get the actual element) to get at a
> single element.  I'm pretty sure gcc only needs one.
Yes in my last test it was so. But you have to take into account that:

- I measured the time of the C version simply with the "time" unix
   utility and the CL version with the CL "TIME" function
- One run lasts ~40 secs in all versions - so the 2-3 seconds difference
   between LispWorks and C may have been noise.
- I may have done something wrong

I quickly looked into the sources for any obvious bugs, found none and 
ran the test again.

LW:
user time    =   40.440
system time  =    0.060
Elapsed time = 0:00:41
Allocation   = 6292696 bytes standard / 1573 bytes fixlen
0 Page faults

C:
real 
0m40.792s
user 
0m40.690s
sys 
0m0.100s

Following your doubts this can only mean that I do something wrong - so
lets try to investigate were my error was.

Here is the Lisp code I used to run the test:

;; matrix.lisp

;; Note:In LispWorks it is _essential_ to set the FLOAT optimizer switch
;;      to 0 if you want to get really fast floating point code!
(defun matmul (m1 m2 m3 dims)
   (declare (optimize (speed 3) (safety 0) (debug 0)
                      #+lispworks (float 0))
            (fixnum dims)
            (type (simple-array double-float (* *)) m1 m2 m3))
   (loop :for i :of-type fixnum :from 0 :below dims :do
         (loop :for j :of-type fixnum :from 0 :below dims :do
               (let ((val 0d0))
                 (declare (double-float val))
                 (loop :for k :of-type fixnum :from 0 :below dims :do
                       (incf val (* (aref m1 i k) (aref m2 k j))))
                 (setf (aref m3 i j) val))))
   (aref m3 0 0))

(defun make-matrix (n)
   (declare (optimize (speed 3) (safety 0) (debug 0)
                       #+lispworks(float 0))
            (fixnum n))
   (let ((m (make-array (list n n) :element-type 'double-float))
         (count 1.0d0))
     (declare (type (simple-array double-float (* *)) m)
              (double-float count))
     (dotimes (i n)
       (dotimes (j n)
         (setf (aref m i j) (incf count 0.5d0))))
     m))

(defun test (n)
   (let ((m1 (make-matrix n))
	(m2 (make-matrix n))
	(m3 (make-array (list n n) :element-type 'double-float)))
     (matmul m1 m2 m3 n)))

I compiled it in a LispWorks buffer and then typed into the listener:

   (time (test 512))

And here is some ugly quick hacked together C++ code:

/* matrix.c */

#include <stdio.h>

double matmul(double m1[512][512],
	      double m2[512][512],
	      double m3[512][512]) {
    for (int i=0; i<512; ++i) {
     for (int j=0; j<512; ++j) {
       double val=0.0;
       for (int k=0;k<512; ++k) {
   	val += m1[i][k]*m2[k][j];
       }
       m3[i][j]=val;
     }
    }
   return m3[0][0];
}

void fill_matrix (double m[512][512]) {
   double count = 0.0;
   for (int i=0; i<512;++i)
     for (int j=0; j<512;++j){
       count+=0.5;
       m[i][j]=count;
     }
}

int main ()
{
   double m1[512][512];
   double m2[512][512];
   double m3[512][512];

   fill_matrix(m1);
   fill_matrix(m2);
   fill_matrix(m3);

   printf("%E\n", matmul(m1,m2,m3));

   return 0;
}


I compiled the C++ code with "g++ -O3 matrix.lisp"
And ran the executable like "time ./a.out"


Probably it's only an error in one of the test programs or something
I have forgotten to do.

ciao,
Jochen
From: Raymond Toy
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <4nn0zgfxa1.fsf@rtp.ericsson.se>
>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:

    Jochen> I quickly looked into the sources for any obvious bugs, found none and
    Jochen> ran the test again.

A couple of notes from CMUCL: In make-matrix, you might want to
declare i and j to be fixnum's. 

    Jochen> LW:
    Jochen> user time    =   40.440
    Jochen> system time  =    0.060
    Jochen> Elapsed time = 0:00:41
    Jochen> Allocation   = 6292696 bytes standard / 1573 bytes fixlen
    Jochen> 0 Page faults

    Jochen> C:
    Jochen> real 0m40.792s

    Jochen> user 0m40.690s

    Jochen> sys 0m0.100s

For me on a 300 MHz Ultrasparc (which has a 2MB cache), gcc 3.0 gives
27.2 sec user time, and CMUCL (using time) gives 46.5 sec real time.
(I also note that the output differs between the two program.)

However, since we are doing a 512x512 matrix multiply, and the
matrices probably don't fit in the cache, we are probably getting
killed by the cache filling so that the extra instructions that CMUCL
adds don't really have much impact.

When I do a 512x512 multiply via matlisp using the cache-aware ATLAS
libs, this multiply (not counting the initialization) only takes 0.72
sec.

Ray
From: Geoff Summerhayes
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <7jJ08.6742$ZF3.396943@news2.calgary.shaw.ca>
"Jochen Schmidt" <···@dataheaven.de> wrote in message
·····················@dataheaven.de...
>
> I quickly looked into the sources for any obvious bugs, found none and
> ran the test again.
>
> LW:
> user time    =   40.440
> system time  =    0.060
> Elapsed time = 0:00:41
> Allocation   = 6292696 bytes standard / 1573 bytes fixlen
> 0 Page faults
>
> C:
> real
> 0m40.792s
> user
> 0m40.690s
> sys
> 0m0.100s
>
*snip*
>
> I compiled the C++ code with "g++ -O3 matrix.lisp"
> And ran the executable like "time ./a.out"
>
> Probably it's only an error in one of the test programs or something
> I have forgotten to do.
>

The C code can be ?improved? if the compiler is not doing this
optimization job, there are calculations on indexing that can be
hoisted out of the inner loop.
Haven't tested this, but this may provide some increase:

#include <stdio.h>

double matmul(double *m1,double *m2,double *m3,int size)
{
    int i,j,k,l;
    double *line,*out,*col;
    int num_elem=size*size;
    for(i=0;i<num_elem;i+=size)
    {
        line=&m1[i];
        for(j=0;j<size;j++)
        {
            out=&m3[i+j];
            col=&m2[j];
            *out=0.0;
            for(k=l=0;k<size;k++,l+=size)
            {
                *out+=line[k]*col[l];
            }
        }
    }
    return *m3;
}

void fill_matrix (double *m,int size)
{
    double count = 0.0;
    int i,elems=size*size;
    for(i=0;i<elems;i++)
    {
        count+=0.5;
        m[i]=count;
    }
}

test(int size)
{
    double *m1=malloc(size*size*sizeof(double));
    double *m2=malloc(size*size*sizeof(double));
    double *m3=malloc(size*size*sizeof(double));
    fill_matrix(m1,size);
    fill_matrix(m2,size);
    printf("%E\n", matmul(m1,m2,m3,size));
    free(m1);
    free(m2);
    free(m3);
}

int main ()
{
    time_t start,end;
    time(&start);
    test(512);
    time(&end);
    printf("%ld\n",end-start);
    return 0;
}

To be fair, I've altered the code to allow for variable-sized
square matricies as in the Lisp.
Note also that the call fill_matrix(m3); is not something a C
programmer is likely to make since the next call overwrites
the information, so it has been excised.

Does `time' take into account program load times? On my system
the C code with time measured just before the first call to
fill_matrix and just after the printf gives 13 seconds as opposed
to 37 for the current Lisp. I've haven't played with the Lisp
code yet, I assume changes along the same lines may be as
beneficial.

Geoff
From: Jochen Schmidt
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C44B5CF.1070309@dataheaven.de>
Geoff Summerhayes wrote:
> "Jochen Schmidt" <···@dataheaven.de> wrote in message
> ·····················@dataheaven.de...
> 
>>I quickly looked into the sources for any obvious bugs, found none and
>>ran the test again.
>>
>>LW:
>>user time    =   40.440
>>system time  =    0.060
>>Elapsed time = 0:00:41
>>Allocation   = 6292696 bytes standard / 1573 bytes fixlen
>>0 Page faults
>>
>>C:
>>real
>>0m40.792s
>>user
>>0m40.690s
>>sys
>>0m0.100s
>>
>>
> *snip*
> 
>>I compiled the C++ code with "g++ -O3 matrix.lisp"
>>And ran the executable like "time ./a.out"
>>
>>Probably it's only an error in one of the test programs or something
>>I have forgotten to do.
>>
>>
> 
> The C code can be ?improved? if the compiler is not doing this
> optimization job, there are calculations on indexing that can be
> hoisted out of the inner loop.
> Haven't tested this, but this may provide some increase:
> 
> #include <stdio.h>
> 
> double matmul(double *m1,double *m2,double *m3,int size)
> {
>     int i,j,k,l;
>     double *line,*out,*col;
>     int num_elem=size*size;
>     for(i=0;i<num_elem;i+=size)
>     {
>         line=&m1[i];
>         for(j=0;j<size;j++)
>         {
>             out=&m3[i+j];
>             col=&m2[j];
>             *out=0.0;
>             for(k=l=0;k<size;k++,l+=size)
>             {
>                 *out+=line[k]*col[l];
>             }
>         }
>     }
>     return *m3;
> }
> 
> void fill_matrix (double *m,int size)
> {
>     double count = 0.0;
>     int i,elems=size*size;
>     for(i=0;i<elems;i++)
>     {
>         count+=0.5;
>         m[i]=count;
>     }
> }
> 
> test(int size)
> {
>     double *m1=malloc(size*size*sizeof(double));
>     double *m2=malloc(size*size*sizeof(double));
>     double *m3=malloc(size*size*sizeof(double));
>     fill_matrix(m1,size);
>     fill_matrix(m2,size);
>     printf("%E\n", matmul(m1,m2,m3,size));
>     free(m1);
>     free(m2);
>     free(m3);
> }
> 
> int main ()
> {
>     time_t start,end;
>     time(&start);
>     test(512);
>     time(&end);
>     printf("%ld\n",end-start);
>     return 0;
> }
> 
> To be fair, I've altered the code to allow for variable-sized
> square matricies as in the Lisp.

Yes that is ok - As I said it was only a quick hack. I was quite sure 
that the C code will outperform the Lisp code by at least a small 
faktor. I was then surprised to see that the C code ran so slow on my 
machine. Your code needs 34 secs on my machine (Duron 800, 256MB Ram)
while my last LW tests needed ~39 seconds. The Duron has only 64K L2 
cache (Athlon has 256K, P4 512K).

> g++ -v
Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.3/specs
gcc version 2.95.3 20010315 (SuSE)

> uname -a
Linux shsp0629 2.4.4-4GB #1 Wed May 16 00:37:55 GMT 2001 i686 unknown

> more /proc/cpuinfo
processor 
: 0
vendor_id 
: AuthenticAMD
cpu family	: 6
model 
	: 3
model name	: AMD Duron(tm) Processor
stepping 
: 1
cpu MHz		: 801.432
cache size	: 64 KB
fdiv_bug 
: no
hlt_bug 
	: no
f00f_bug 
: no
coma_bug 
: no
fpu 
	: yes
fpu_exception 
: yes
cpuid level	: 1
wp 
	: yes
flags 
	: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx 
fxsr syscall mmxext 3dnowext 3dnow
bogomips 
: 1599.07


and LispWorks 4.2 Enterprise Edition for Linux

As another test I copied the C source and a delivered executable of the 
lisp test to another machine in our LAN:

 > g++ -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.2/specs
gcc version 2.95.2 20000220 (Debian GNU/Linux)

 > uname -a
Linux somehost 2.4.16 #1 Wed Jan 9 16:54:17 CET 2002 i686 unknown

 > more /proc/cpuinfo
processor 
: 0
vendor_id 
: GenuineIntel
cpu family	: 6
model 
	: 8
model name	: Pentium III (Coppermine)
stepping 
: 1
cpu MHz		: 601.373
cache size	: 256 KB
fdiv_bug 
: no
hlt_bug 
	: no
f00f_bug 
: no
coma_bug 
: no
fpu 
	: yes
fpu_exception 
: yes
cpuid level	: 2
wp 
	: yes
flags 
	: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
mmx fxsr sse
bogomips 
: 1199.30

On this machine your C code ran in 9 seconds and mine in 15 seconds.
Besides the big speed difference between this PIII 600Mhz and my Duron 
800Mhz you may be surprised that the delivered lisp executable ran in 15 
seconds too! It is quite understandable that your code with optimized 
indexing is faster - but it is still surprising to me that LW is as fast
as C if both use the simple version of before...

> Note also that the call fill_matrix(m3); is not something a C
> programmer is likely to make since the next call overwrites
> the information, so it has been excised

I wanted to initialize the matrices and I don't wanted to type a 512x512 
array in. The third fill_matrix is actually a typo left in because I 
used three MAKE-MATRIX calls in lisp and to be fair I used three 
fill_matrix calls in C too. Later (after some little experiments with 
alternative matrix representations in Lisp) I changed the third 
MAKE-MATRIX to a simple MAKE-ARRAY (but forgot to remove the 
fill_matrix). You are right that no C programmer would do something like 
that in real life code but this is no real life code. I cannot even 
remember when I last used a fixed sized array when writing a 
production-level C application. I furthermore don't think that this is 
actually the source of this curious results...

> Does `time' take into account program load times? On my system
> the C code with time measured just before the first call to
> fill_matrix and just after the printf gives 13 seconds as opposed
> to 37 for the current Lisp. I've haven't played with the Lisp
> code yet, I assume changes along the same lines may be as
> beneficial.

The load time is small enough for an C executable that is 6K big.
It at least would not describe this results.
Can you publish your machine stats and which Lisp (+ version) you used?

If there is some interest I can publish my C and the delivered lisp 
executable at my webpage.

ciao,

Jochen
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a1v700$t621f$1@ID-125440.news.dfncis.de>
In article <··············@dataheaven.de>, Jochen Schmidt wrote:

> I've tested the code under CMUCL18d (prerelease) and LispWorks4.2 on a
> 800-Mhz Duron with 256MB Ram. The mean time was ~40 seconds. I was 
> indeed quite impressed to see that LispWorks4.2 even was a bit faster 
> than CMUCL (LW ~39sec, CMUCL ~41 sec)

That's indeed surprising.  When I played with the trial version of
Lispworks, it was slower by a factor of 10 in every little toy benchmark
I did.  Are you sure you compiled the CMUCL code? <gd&r> Maybe I should
give LW another try...

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Dr. Edmund Weitz
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <m3u1to23wu.fsf@dyn138.dbdmedia.de>
Nils Goesche <······@cartan.de> writes:

> That's indeed surprising.  When I played with the trial version of
> Lispworks, it was slower by a factor of 10 in every little toy
> benchmark I did.  Are you sure you compiled the CMUCL code? <gd&r>
> Maybe I should give LW another try...

I've also encountered situations where LispWorks (4.1.20 trial on
Linux) was as fast as (or even a little bit faster than) CMUCL
18c. This was code that didn't use floating point arithmetic,
though. See <http://weitz.de/einstein.html> for an example.

Other implementations (including ACL) were considerably slower.

Edi
From: Jochen Schmidt
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C434F82.8070408@dataheaven.de>
Nils Goesche wrote:
> In article <··············@dataheaven.de>, Jochen Schmidt wrote:
> 
> 
>>I've tested the code under CMUCL18d (prerelease) and LispWorks4.2 on a
>>800-Mhz Duron with 256MB Ram. The mean time was ~40 seconds. I was 
>>indeed quite impressed to see that LispWorks4.2 even was a bit faster 
>>than CMUCL (LW ~39sec, CMUCL ~41 sec)
>>
> 
> That's indeed surprising.  When I played with the trial version of
> Lispworks, it was slower by a factor of 10 in every little toy benchmark
> I did.  Are you sure you compiled the CMUCL code? <gd&r> Maybe I should
> give LW another try...


Yes I compiled the CMUCL code - I tested it multiple times with the same 
result. Note that much more astonishing all three (CMUCL, LW and C) are 
somewhere near 40 seconds for this test (according to the test-setting I 
described in the other posting). I by myself have seen CL as being slow 
for multidimensional arrays and have often rewritten such code to use 
1-dimensional arrays instead.
For many things LW was indeed slower than CMUCL (in some things like 
CLOS it was alot faster). One thing you should not forget is that the 
default settings of LW are for safety and debugability. If you for 
example want fast floating-point code you should set the FLOAT optimizer
switch to 0. If you have some inner loop were you need really fast 
fixnum arithmetic you can try setting FIXNUM-SAFETY to something like 0 
or 1 (but be careful this can break many things!). Some optimizing 
options are only available if you finally deliver an executable and this 
is not possible with the personal edition. In general I would say that 
it is not such a good idea to compare CMUCL declared benchmark code with
LispWorks without trying to optimize it to LW too. There are many 
switches and things in LW that enhance it's performance.

Another fact is that Xanalys enhanced the performance of LW in the 4.2 
release (I do not know if this counts here and I do not know which parts 
actually got enhanced)

ciao,
Jochen
From: Siegfried Gonzi
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C43F573.FB19C961@kfunigraz.ac.at>
Jochen Schmidt wrote:

> >         With SBCL (a relative of CMUCL) on a 667-MHz PIII, I get an
> > execution time of around 120sec. The only warning/note emitted by SBCL
> > is for the return value, which has to be unboxed.
>
> I've tested the code under CMUCL18d (prerelease) and LispWorks4.2 on a
> 800-Mhz Duron with 256MB Ram. The mean time was ~40 seconds. I was
> indeed quite impressed to see that LispWorks4.2 even was a bit faster
> than CMUCL (LW ~39sec, CMUCL ~41 sec)

Please, do you have got any suggestions for me, how to set the compiler-flags in
LispWorks. I too have actually installed the personal edition, but I am a little
bit confused what to set-up before compiling. [In AL CL the documentation is quite
straightforward.]


S. Gonzi
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a216at$tp516$1@ID-125440.news.dfncis.de>
In article <·················@kfunigraz.ac.at>, Siegfried Gonzi wrote:
> Jochen Schmidt wrote:
> 
>> >         With SBCL (a relative of CMUCL) on a 667-MHz PIII, I get an
>> > execution time of around 120sec. The only warning/note emitted by SBCL
>> > is for the return value, which has to be unboxed.
>>
>> I've tested the code under CMUCL18d (prerelease) and LispWorks4.2 on a
>> 800-Mhz Duron with 256MB Ram. The mean time was ~40 seconds. I was
>> indeed quite impressed to see that LispWorks4.2 even was a bit faster
>> than CMUCL (LW ~39sec, CMUCL ~41 sec)
> 
> Please, do you have got any suggestions for me, how to set the
> compiler-flags in LispWorks. I too have actually installed the
> personal edition, but I am a little bit confused what to set-up
> before compiling. [In AL CL the documentation is quite
> straightforward.]

I have changed your code a bit and now it takes 4.8 seconds on
LispWorks Personal Edition, Pentium III 700 MHz.  Changed version:

(eval-when (compile)
    (declaim (optimize (float 0) (speed 3) (safety 0)
                       (space 0) (debug 0))))

(defun span-array (rows cols &optional (start-val 0d0))
     (declare (fixnum rows cols)
              (double-float start-val))
  (let* ((array (make-array (list rows cols) :element-type 'double-float
                            :initial-element 0d0)))
    (loop :for i :of-type fixnum :from 0 :below rows :do
          (loop :for j :of-type fixnum :from 0 :below cols :do
                (setf (aref array i j) (+ start-val (float j 0d0)))))
array))

;;safety setting 0 will result in a segementation-default error
(defun matmul (m1 m2 m3 dims)
  (declare (fixnum dims)
           (type (simple-array double-float (* *)) m1 m2 m3))
  (loop :for i :of-type fixnum :from 0 :below dims :do
        (loop :for j :of-type fixnum :from 0 :below dims :do
              (let ((val 0d0))
                (declare (double-float val))
                (loop :for k :of-type fixnum :from 0 :below dims :do
                      (incf val (* (aref m1 i k) (aref m2 k j))))
        (setf (aref m3 i j) val))))
(aref m3 0 0))


(defun tst ()
  (let ((m1 (span-array 256 256))
        (m2 (span-array 256 256))
        (m3 (span-array 256 256)))
    (time (matmul m1 m2 m3 256))))

CL-USER 70 > (tst)

4.8 seconds used.
Standard Allocation 16 bytes.
Fixlen Allocation 0 bytes.
0.0

CL-USER 71 >

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Jochen Schmidt
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C44B7DD.8000103@dataheaven.de>
Nils Goesche wrote:
> In article <·················@kfunigraz.ac.at>, Siegfried Gonzi wrote:
> 
>>Jochen Schmidt wrote:
>>
>>
>>>>        With SBCL (a relative of CMUCL) on a 667-MHz PIII, I get an
>>>>execution time of around 120sec. The only warning/note emitted by SBCL
>>>>is for the return value, which has to be unboxed.
>>>>
>>>I've tested the code under CMUCL18d (prerelease) and LispWorks4.2 on a
>>>800-Mhz Duron with 256MB Ram. The mean time was ~40 seconds. I was
>>>indeed quite impressed to see that LispWorks4.2 even was a bit faster
>>>than CMUCL (LW ~39sec, CMUCL ~41 sec)
>>>
>>Please, do you have got any suggestions for me, how to set the
>>compiler-flags in LispWorks. I too have actually installed the
>>personal edition, but I am a little bit confused what to set-up
>>before compiling. [In AL CL the documentation is quite
>>straightforward.]
>>
> 
> I have changed your code a bit and now it takes 4.8 seconds on
> LispWorks Personal Edition, Pentium III 700 MHz.  Changed version:
> 
> (eval-when (compile)
>     (declaim (optimize (float 0) (speed 3) (safety 0)
>                        (space 0) (debug 0))))
> 
> (defun span-array (rows cols &optional (start-val 0d0))
>      (declare (fixnum rows cols)
>               (double-float start-val))
>   (let* ((array (make-array (list rows cols) :element-type 'double-float
>                             :initial-element 0d0)))
>     (loop :for i :of-type fixnum :from 0 :below rows :do
>           (loop :for j :of-type fixnum :from 0 :below cols :do
>                 (setf (aref array i j) (+ start-val (float j 0d0)))))
> array))
> 
> ;;safety setting 0 will result in a segementation-default error
> (defun matmul (m1 m2 m3 dims)
>   (declare (fixnum dims)
>            (type (simple-array double-float (* *)) m1 m2 m3))
>   (loop :for i :of-type fixnum :from 0 :below dims :do
>         (loop :for j :of-type fixnum :from 0 :below dims :do
>               (let ((val 0d0))
>                 (declare (double-float val))
>                 (loop :for k :of-type fixnum :from 0 :below dims :do
>                       (incf val (* (aref m1 i k) (aref m2 k j))))
>         (setf (aref m3 i j) val))))
> (aref m3 0 0))
> 
> 
> (defun tst ()
>   (let ((m1 (span-array 256 256))
>         (m2 (span-array 256 256))
>         (m3 (span-array 256 256)))
>     (time (matmul m1 m2 m3 256))))
> 
> CL-USER 70 > (tst)
> 
> 4.8 seconds used.
> Standard Allocation 16 bytes.
> Fixlen Allocation 0 bytes.
> 0.0

It need ~3.8 seconds on my Duron 800 - but :-( you only used 256x256 
arrays (I used 512x512). Your code need 40 seconds too on my machine for 
512x512 matrices.


I have made a little table to demonstrate the increase of runtime 
dependend on the size of the matrices:

n            | 64    | 128    | 256     | 384     | 512     | 640     |
-----------------------------------------------------------------------
user time    | 0.020 |  0.185 |   2.480 |  10.540 |  38.230 |  80.410 |
allocation   | 98416 | 394440 | 1574088 | 3540296 | 6292776 | 9843336 |
fixlen alloc.|    66 |   2178 |    2332 |    2530 |    2376 |    2530 |

In another mail I made some tests with my and the enhanced C version on 
another machine than my own (a PIII 600Mhz running Debian Linux and the 
2.4.16 Kernel). To my surprise all versions ran much faster on that machine:

9secs   - The optimized indexing C version
15 secs - My ugly C hack
15 secs - A delivered executable of the LispWorks 4.2 version (!)

As you can see it is still equally fast to my ugly C hack on a 
completely other machine and configuration (a machine _not_ configured 
by me btw. ;-) )

I wonder if it would be a good idea to publish the executables 
(particularily the LW delivered one) on my webpage so we can compare 
some other results.

ciao,
Jochen
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a22c3d$ti84v$1@ID-125440.news.dfncis.de>
In article <················@dataheaven.de>, Jochen Schmidt wrote:
> Nils Goesche wrote:
>> (defun tst ()
>>   (let ((m1 (span-array 256 256))
>>         (m2 (span-array 256 256))
>>         (m3 (span-array 256 256)))
>>     (time (matmul m1 m2 m3 256))))
>> 
>> CL-USER 70 > (tst)
>> 
>> 4.8 seconds used.
>> Standard Allocation 16 bytes.
>> Fixlen Allocation 0 bytes.
>> 0.0
> 
> It need ~3.8 seconds on my Duron 800 - but :-( you only used 256x256 
> arrays (I used 512x512). Your code need 40 seconds too on my machine for 
> 512x512 matrices.

Sorry, I wasn't really following the thread.  I saw the code, the
error and posted a fix, that's all :-)

> I have made a little table to demonstrate the increase of runtime 
> dependend on the size of the matrices:
> 
> n            | 64    | 128    | 256     | 384     | 512     | 640     |
> -----------------------------------------------------------------------
> user time    | 0.020 |  0.185 |   2.480 |  10.540 |  38.230 |  80.410 |
> allocation   | 98416 | 394440 | 1574088 | 3540296 | 6292776 | 9843336 |
> fixlen alloc.|    66 |   2178 |    2332 |    2530 |    2376 |    2530 |
> 
> In another mail I made some tests with my and the enhanced C version on 
> another machine than my own (a PIII 600Mhz running Debian Linux and the 
> 2.4.16 Kernel). To my surprise all versions ran much faster on that machine:
> 
> 9secs   - The optimized indexing C version
> 15 secs - My ugly C hack
> 15 secs - A delivered executable of the LispWorks 4.2 version (!)
> 
> As you can see it is still equally fast to my ugly C hack on a 
> completely other machine and configuration (a machine _not_ configured 
> by me btw. ;-) )
> 
> I wonder if it would be a good idea to publish the executables 
> (particularily the LW delivered one) on my webpage so we can compare 
> some other results.

Sure.  I am also curious if it runs here ;-)  Linux, please.

Wait until I'm at home and we can time my new Pentium IV 1700 MHz
with 512 MB PC800 Rambus memory *drool* --- sorry, it's been a while
since I had a machine to brag about :-) Won't last long, I guess :-(

A question:  You seem to own Lispworks.  I am going to buy 4.2
very soon now.  Any reason I shouldn't?

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Bulent Murtezaoglu
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <87sn97p52y.fsf@nkapi.internal>
>>>>> "NG" == Nils Goesche <······@cartan.de> writes:
[...]
    NG> Wait until I'm at home and we can time my new Pentium IV 1700
    NG> MHz with 512 MB PC800 Rambus memory *drool* --- sorry, it's
    NG> been a while since I had a machine to brag about :-) Won't
    NG> last long, I guess :-( [...]

Hmm, I was looking for someone to report whether the missing barrel
shifter in the P4 was hurting lisp performance.  This might hurt
performance some when fixnums with tags in the least significant bits
need to be used as machine integers (eg address an array of (byte 8)).
I was curious about this when the P4 first came out.  If anyone else
is interested maybe we can impose on you with some test code?

cheers,

BM
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a22e2v$ti84v$3@ID-125440.news.dfncis.de>
In article <··············@nkapi.internal>, Bulent Murtezaoglu wrote:
>>>>>> "NG" == Nils Goesche <······@cartan.de> writes:
> [...]
>    NG> Wait until I'm at home and we can time my new Pentium IV 1700
>    NG> MHz with 512 MB PC800 Rambus memory *drool* --- sorry, it's
>    NG> been a while since I had a machine to brag about :-) Won't
>    NG> last long, I guess :-( [...]
> 
> Hmm, I was looking for someone to report whether the missing barrel
> shifter in the P4 was hurting lisp performance.  This might hurt
> performance some when fixnums with tags in the least significant bits
> need to be used as machine integers (eg address an array of (byte 8)).
> I was curious about this when the P4 first came out.  If anyone else
> is interested maybe we can impose on you with some test code?

Sure, I just love silly little benchmarks :)

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Dr. Edmund Weitz
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <m37kqj5hly.fsf@bird.agharta.de>
Nils Goesche <······@cartan.de> writes:

> A question: You seem to own Lispworks.  I am going to buy 4.2 very
> soon now.  Any reason I shouldn't?

Just in case you didn't try yourself: The trial version is already
available from their ftp server although the website still points to
the old version. I just installed it. I only had a few minutes to play
with it, but one of the things that I didn't like so much - the
debugger - seems to have been vastly improved. Plus a bunch of other
nice things.

I wasn't 100% sure with 4.1.20 but now with 4.2 I'm sure I will buy
it.

Edi.
From: Jochen Schmidt
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C450389.90004@dataheaven.de>
Nils Goesche wrote:
> In article <················@dataheaven.de>, Jochen Schmidt wrote:
>>I wonder if it would be a good idea to publish the executables 
>>(particularily the LW delivered one) on my webpage so we can compare 
>>some other results.
>>
> 
> Sure.  I am also curious if it runs here ;-)  Linux, please.

AFAIU you now already have the LW4.2 Personal Edition to try it out?

Btw... out of some funny mood I tried to install the Windows LW4.2 
Personal Edition on my Linux Box using "Wine". It worked and runs.
You do not need _any_ Windows installation or other Windows libraries or 
binaries - only what comes with Wine.
The Wine release I use seems to have problems with LW in MDI mode - you 
can't focus an editor or listener MDI window - but if you choose 
"seperate windows" it works like a charm ;-)



> Wait until I'm at home and we can time my new Pentium IV 1700 MHz
> with 512 MB PC800 Rambus memory *drool* --- sorry, it's been a while
> since I had a machine to brag about :-) Won't last long, I guess :-(

I'm interested in the times too :-)

> A question:  You seem to own Lispworks.  I am going to buy 4.2
> very soon now.  Any reason I shouldn't?

I see no reason :-)
It is a really nice and very slick development environment. I really 
enjoy working with it.

ciao,
Jochen
From: Bijan Parsia
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <Pine.A41.4.21L1.0201161022380.62520-100000@login9.isis.unc.edu>
On 15 Jan 2002, Nils Goesche wrote:
[snip]
> Wait until I'm at home and we can time my new Pentium IV 1700 MHz
> with 512 MB PC800 Rambus memory *drool* --- sorry, it's been a while
> since I had a machine to brag about :-) 

What, only 512*MB*? <sniff/>Couldn't you spring the extra nickel for some
*real* ram? It's cheaper than most bottled waters these days. Might be
enough to have two Mozilla windows open at once, if the pages aren't too
complex....

> Won't last long, I guess :-(

:)

Sorry, couldn't resist.

Cheers,
Bijan Parsia.
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a24ibr$u2e6v$1@ID-125440.news.dfncis.de>
In article <··········································@login9.isis.unc.edu>, Bijan Parsia wrote:
> On 15 Jan 2002, Nils Goesche wrote:
> [snip]
>> Wait until I'm at home and we can time my new Pentium IV 1700 MHz
>> with 512 MB PC800 Rambus memory *drool* --- sorry, it's been a while
>> since I had a machine to brag about :-) 

Since I use vtwm, I start the day with about 20 MB usage...  The swap
partition hasn't been touched for a looooong time :-)

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a22gd8$tf803$1@ID-125440.news.dfncis.de>
In article <················@dataheaven.de>, Jochen Schmidt wrote:
> Nils Goesche wrote:
>> (defun tst ()
>>   (let ((m1 (span-array 256 256))
>>         (m2 (span-array 256 256))
>>         (m3 (span-array 256 256)))
>>     (time (matmul m1 m2 m3 256))))
>> 
>> CL-USER 70 > (tst)
>> 
>> 4.8 seconds used.
>> Standard Allocation 16 bytes.
>> Fixlen Allocation 0 bytes.
>> 0.0
> 
> It need ~3.8 seconds on my Duron 800 - but :-( you only used 256x256 
> arrays (I used 512x512). Your code need 40 seconds too on my machine for 
> 512x512 matrices.

Strange; I just downloaded the Personal Edition 4.2 and tried again
on the 700 MHz Pentium with 512x512 matrices:

CL-USER 5 > (tst)
Timing the evaluation of (MATMUL M1 M2 M3 512)

user time    =     19.630
system time  =      0.050
Elapsed time =   0:00:20
Allocation   = 30040 bytes standard / 51502 bytes fixlen
50 Page faults
0.0

CL-USER 6 >

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Siegfried Gonzi
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C4543A5.1DB67024@kfunigraz.ac.at>
Nils Goesche wrote:

>I have changed your code a bit and now it takes 4.8 seconds on
>LispWorks Personal Edition, Pentium III 700 MHz.  Changed version:

Thank you gentlemen (and all the other guys).


Your version takes on LispWorks 4.1.2 (I heard rumors that there is even an
improved 4.2 version out there; and maybe I can squeeze out another 1 or
2seconds) actually for a 512x512 matrix-matrix multiplication 43sec. [5.7sec
for a 256x256 dimension]

And on Allegro CL it takes 39sec.

Hat off to the LispWorks developers; I have been thinking all the time that
LispWorks is way
behind Allegro Cl when it comes to floating-point and array performance.


I did also a C++. The code actually is more or less a pure C version) timming
(enclosed the
code for my testimony at court). I pinched up a version from
the language-shootout page. The version uses some dynamically generated arrays.
And the second
array becomes transposed (this quite often improves the performance in C).

Microsoft Visual C++ takes 20sec for a 512x512 matrix-matrix multiplication on
a 1000MHz Celeron.

And gcc takes also 20sec for the same multplication.

I did the C timmings without any optimization settings, because I am not well
informed about
C compiler settings (maybe compiler settings will change the execution times
around a factor
of 2).

Otherwise I am quite contented and surprised that Lisp (with appropiate
declarations) performs
well now. A performance hit of factor 2-3 compared to C in other circumstances
is quite acceptable
(at least for me). It is always better to have this option (declaration at ones
disposal), than to
call every time a compiled C function, as lets say in Python. But people should
abandon the 30%
performance-hit-believe of Lisp compared to C; it is maybe possible but the
reality (not only when one is
using arrays and numeric stuff) is a factor of 2 or 3 and more compared to C.


Clean is (maybe) fast due to some caching; which  becomes better exploited by
the Clean compiler.
I have two other Clean versions for the matrix-matrix multiplication, where one
of them performs
the 512x512 multiplication in about 2sec; and the other --hyper tuned version
-- performs a
1024x1024 (O(N^3)) multiplication in about 12sec on my 1000MHz Celeron (see the
Clean mailinglist).
In Clean it is possible to access a row at whole and this can maybe highly
speed up the caching.

A Fast Fourier Transformation would be a better measure (I know a place where
one can get a
2 dimensional one for
Lisp::http://www.cip.physik.uni-muenchen.de/~tf/tutorials.html
but in my opinion this implementation is quite ugly and cumbersome). I once
implemented a 2 dimensional one in
Clean (rummage through the Clean mailing-list in ordere to get my code). It
turned out that it
is about 2 times slower than a comparable C version.

It is highly questionable to draw hasty any conclusions from the above
discussed benchmarks.
Matlab (or put in here your favorite scientific calculation tool) for example
is based on some
highly efficient BLAS libraries or therelike and will perform a matrix-matrix
multiplication or
solving (with LU for example) a system of equations very strikingly (often hand
coded C or Fortran code
will have a hard time to keep up). But it is not the first time that I have red
that a Matlab
simulation takes 2 days whereas a comparable implementation in Fortran
completes the task in
1 hour or so.


That said,
S. Gonzi

Semi C++ version for the matrix-matrix multiplication timming:

==
#include <iostream>
#include <stdlib.h>

using namespace std;

#define SIZE 512

double **mkmatrix(int rows, int cols) {
    int i, j, count = 1;
    double **m = (double **) malloc(rows * sizeof(double *));
    for (i=0; i<rows; i++) {
 m[i] = (double *) malloc(cols * sizeof(double));
 for (j=0; j<cols; j++) {
     m[i][j] = count++;
 }
    }
    return(m);
}

void zeromatrix(int rows, int cols, double **m) {
    int i, j;
    for (i=0; i<rows; i++)
 for (j=0; j<cols; j++)
     m[i][j] = 0;
}

void freematrix(int rows, double **m) {
    while (--rows > -1) { free(m[rows]); }
    free(m);
}

double **transpose(int rows, int cols, double **matrix)
{
 double temp;
 for(int i=0; i<rows; i++)
 {
  for(int j=0; j<cols; j++)
  {
   temp = matrix[i][j];
   matrix[i][j] = matrix[j][i];
   matrix[j][i] = temp;
  }
 }
return(matrix);
}

double **mmult(int rows, int cols, double **m1, double **m2, double **m3) {
    int i, j, k, val;

 m2 = transpose(rows, cols, m2);
 for (i=0; i<rows; i++) {
 for (j=0; j<cols; j++) {
     val = 0;
     for (k=0; k<cols; k++) {
  val += m1[i][k] * m2[j][k];
     }
     m3[i][j] = val;
 }
    }
    return(m3);
}




int main(int argc, char *argv[]) {
    int i, n = 1;

    double **m1 = mkmatrix(SIZE, SIZE);
    double **m2 = mkmatrix(SIZE, SIZE);
    double **mm = mkmatrix(SIZE, SIZE);

    for (i=0; i<n; i++) {
 mm = mmult(SIZE, SIZE, m1, m2, mm);
    }
    cout << mm[0][0] << " " << mm[2][3] << " " << mm[3][2] << " " << mm[4][4]
<< endl;

    freematrix(SIZE, m1);
    freematrix(SIZE, m2);
    freematrix(SIZE, mm);
    return(0);
}
==
From: Pierre R. Mai
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <87zo3edsu3.fsf@orion.bln.pmsf.de>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> Otherwise I am quite contented and surprised that Lisp (with
> appropiate declarations) performs well now. A performance hit
> of factor 2-3 compared to C in other circumstances is quite
> acceptable (at least for me). It is always better to have this
> option (declaration at ones disposal), than to call every time
> a compiled C function, as lets say in Python. But people should
> abandon the 30% performance-hit-believe of Lisp compared to C;
> it is maybe possible but the reality (not only when one is
> using arrays and numeric stuff) is a factor of 2 or 3 and more
> compared to C.

That is not the whole truth.  While in this case (and probably other
cases) there is a factor of 2 or 3 between e.g. gcc and CMU CL (which
fared particularly badly in this case because of deficiencies in its
register allocator and peep-hole optimizer, especially on
register-starved x86[1]), there are lots of cases where there is much
less of an impact, much more in line with the often quoted 30%
figure.  Take e.g. the recent case of fast MD5 calculations, where in
the end CMU CL came within 1.15-1.5 of md5sum compiled with gcc, which
likewise isn't an isolated occurrance.

So while there are cases where 2 or 3 is closer to the truth, there
are also lots of cases where 30-50% are much closer to the truth.
That said, for cases where it really matters, nothing beats
benchmarking your specific code...  ;)

Regs, Pierre.

Footnotes: 
[1]  On e.g. UltraSPARC, the runtime difference was e.g. only 12.5s
     vs. 19.5s.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a24oo3$ucq22$1@ID-125440.news.dfncis.de>
In article <·················@kfunigraz.ac.at>, Siegfried Gonzi wrote:
> 
> Otherwise I am quite contented and surprised that Lisp (with
> appropiate declarations) performs well now. A performance hit of
> factor 2-3 compared to C in other circumstances is quite acceptable
> (at least for me). It is always better to have this option
> (declaration at one s disposal), than to call every time a compiled C
> function, as lets say in Python. But people shoul d abandon the 30%
> performance-hit-believe of Lisp compared to C; it is maybe possible
> but the reality (not only when one is using arrays and numeric stuff)
> is a factor of 2 or 3 and more compared to C.

This kept me wondering because it contradicts my personal experience.
In /my/ benchmarks, I often could make Lisp /faster/ than C, or gcc,
at least.  Are you sure you got your timings correct?

I rewrote your C++ version to a C version as close as possible to
the Lisp version:

#include <stdio.h>
#include <stdlib.h>

#define SIZE 512

typedef double (*mp)[SIZE];

mp span_array(int rows, int cols, double start_val)
{
	mp ret;
	if ((ret = malloc(rows * (SIZE * sizeof(double))))) {
		int i, j;

		for (i = 0; i < rows; i++)
			for (j = 0; j < cols; j++)
				ret[i][j] = start_val + j;
	}
	return ret;
}

double matmul(mp m1, mp m2, mp m3, int dims)
{
	int i, j, k;

	for (i = 0; i < dims; i++)
		for (j = 0; j < dims; j++) {
			double val = 0;

			for (k = 0; k < dims; k++)
				val += m1[i][k] * m2[k][j];
			m3[i][j] = val;
		}
	return m3[0][0];
}

int main(void)
{
	mp m1, m2, m3;

	m1 = span_array(SIZE, SIZE, 0);
	m2 = span_array(SIZE, SIZE, 0);
	m3 = span_array(SIZE, SIZE, 0);

	if (m1 && m2 && m3) {
		printf("Return val is %g\n", matmul(m1, m2, m3, SIZE));
	}
	free(m1), free(m2), free(m3);
	return 0;
}

Now I compile it with *full* optimizations:

~ $ gcc -Wall -O4 -march=i686 -ansi -pedantic matmul.c 
~ $ time ./a.out 
Return val is 0
19.690 secs
~ $

Lispworks 4.2 on the same 700 MHz machine takes 21 secs.  I am getting
confused about all those different results :-)

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Siegfried Gonzi
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C48251C.52348332@kfunigraz.ac.at>
Nils Goesche wrote:

> I rewrote your C++ version to a C version as close as possible to
> the Lisp version:
>
> #include <stdio.h>
> #include <stdlib.h>
>
> #define SIZE 512
>
> typedef double (*mp)[SIZE];
>
> mp span_array(int rows, int cols, double start_val)
> {
>         mp ret;
>         if ((ret = malloc(rows * (SIZE * sizeof(double))))) {
>                 int i, j;
>
>                 for (i = 0; i < rows; i++)
>                         for (j = 0; j < cols; j++)
>                                 ret[i][j] = start_val + j;
>         }
>         return ret;
> }
>
> double matmul(mp m1, mp m2, mp m3, int dims)
> {
>         int i, j, k;
>
>         for (i = 0; i < dims; i++)
>                 for (j = 0; j < dims; j++) {
>                         double val = 0;
>
>                         for (k = 0; k < dims; k++)
>                                 val += m1[i][k] * m2[k][j];
>                         m3[i][j] = val;
>                 }
>         return m3[0][0];
> }
>
> int main(void)
> {
>         mp m1, m2, m3;
>
>         m1 = span_array(SIZE, SIZE, 0);
>         m2 = span_array(SIZE, SIZE, 0);
>         m3 = span_array(SIZE, SIZE, 0);
>
>         if (m1 && m2 && m3) {
>                 printf("Return val is %g\n", matmul(m1, m2, m3, SIZE));
>         }
>         free(m1), free(m2), free(m3);
>         return 0;
> }
>
> Now I compile it with *full* optimizations:
>
> ~ $ gcc -Wall -O4 -march=i686 -ansi -pedantic matmul.c
> ~ $ time ./a.out
> Return val is 0
> 19.690 secs
> ~ $
>
> Lispworks 4.2 on the same 700 MHz machine takes 21 secs.  I am getting
> confused about all those different results :-)

This code on a 1000MHz Celeron machine and Windows XP and gcc takes 13sec.

[Your code just only works with your optimizations (otherwise the warings
become errors; according to the compiler: something is then not ANSI
compliant with your code)]

The LispWorks 4.1.20 multiplication, again, takes 43sec. But I will see if
4.2 is also available for Windows and will try it again.


S. Gonzi
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a29aim$t0i5j$1@ID-125440.news.dfncis.de>
In article <·················@kfunigraz.ac.at>, Siegfried Gonzi wrote:
> Nils Goesche wrote:
> 
>> I rewrote your C++ version to a C version as close as possible to
>> the Lisp version:

[snip]

> [Your code just only works with your optimizations (otherwise the warings
> become errors; according to the compiler: something is then not ANSI
> compliant with your code)]

Pardon?  gcc gives no warnings at all on the code, and I believe it
to be 100% ANSI compliant...

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Siegfried Gonzi
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C48359D.61C758BE@kfunigraz.ac.at>
Nils Goesche wrote:

> > [Your code just only works with your optimizations (otherwise the warings
> > become errors; according to the compiler: something is then not ANSI
> > compliant with your code)]
>
> Pardon?  gcc gives no warnings at all on the code, and I believe it
> to be 100% ANSI compliant...

Both gcc and Microsoft Visual C++ are complaining (they exhibit nearly the same
error messages).

I will post it tomorrow.

S. Gonzi
From: Siegfried Gonzi
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C4936BC.A1D82599@kfunigraz.ac.at>
Nils Goesche wrote:

> In article <·················@kfunigraz.ac.at>, Siegfried Gonzi wrote:
> > Nils Goesche wrote:
> >
> >> I rewrote your C++ version to a C version as close as possible to
> >> the Lisp version:
>
> [snip]
>
> > [Your code just only works with your optimizations (otherwise the warings
> > become errors; according to the compiler: something is then not ANSI
> > compliant with your code)]
>
> Pardon?  gcc gives no warnings at all on the code, and I believe it
> to be 100% ANSI compliant...

The issue has been resolved. I was believing that it is irrelevant whether to
compile a file with ".c" or ".cpp", because the compiler should see at the
include files on which source-code it should work on.

If one tries to compile your C-code under C++ then:

---Microsoft Visual C++ 6:

matrix.cpp
C:\Wissenschaft\Microsoft Visual Studio\MyProjects\matrix\matrix.cpp(11) :
error C2440: '=' : 'void *' kann nicht in 'double (*)[512]' konvertiert werden
        Konvertierung von 'void*' in Zeiger auf nicht-'void' erfordert eine
explizite Typumwandlung
Fehler beim Ausf�hren von cl.exe.

matrix.obj - 1 Fehler, 0 Warnung(en)

---gcc

gcc -c ma.cpp

ma.cpp: In function 'double (* span_array(int, int, double))[512]':
ma.cpp:11: ANSI C++ forbids implicit conversion from 'void *' in assignment


gcc -Wall -O4 -march=i686 -ansi -pedantic ma.cpp

ma.cpp: In function 'double (* span_array(int, int, double))[512]':
ma.cpp:11: warning: ANSI C++ forbids implicit conversion from 'void *' in
assignment

okay:

gcc -c ma.c


S. Gonzi
From: Siegfried Gonzi
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <1012503882.509484@hagakure.utanet.at>
----- Original Message -----
From: "Nils Goesche" <······@cartan.de>

>
> Now I compile it with *full* optimizations:
>
> ~ $ gcc -Wall -O4 -march=i686 -ansi -pedantic matmul.c
> ~ $ time ./a.out
> Return val is 0
> 19.690 secs
> ~ $
>
> Lispworks 4.2 on the same 700 MHz machine takes 21 secs.  I am getting
> confused about all those different results :-)

I had the possibility to download LispWorks 4.2. It only supports the
results that LispWorks is 2 times faster than Allegro Common Lisp 6; at
least on matrix-matrix-multiplication:

Machine: Laptop 1000MHz Celeron and 256MB RAM:

For a 512x512 multiplication (your Lisp code):

LISPWorks:

CL-USER 1 > (tst)
Timing the evaluation of (MATMUL M1 M2 M3 512)
; Loading fasl file
C:\Wissenschaft\LispWorks\lib\4-2-0-0\modules\util\callcoun.fsl

user time    =     20.479
system time  =      0.000
Elapsed time =   0:00:21
Allocation   = 856 bytes standard / 781 bytes fixlen
0 Page faults
0.0


For a 256x256 multiplication:

LispWorks:

CL-USER 2 > (tst)
Timing the evaluation of (MATMUL M1 M2 M3 256)

user time    =      2.693
system time  =      0.000
Elapsed time =   0:00:03
Allocation   = 728 bytes standard / 1155 bytes fixlen
0 Page faults
0.0
AllegroCommonLisp:

CG-USER(2): (tst)
; cpu time (non-gc) 4,687 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  4,687 msec user, 0 msec system
; real time  4,697 msec
; space allocation:
;  9 cons cells, 456 other bytes, 72 static bytes
0.0d0


But there remains the true tragedy that declaring is hardly ever a topic:
how will one declare the essential parts in a many 1000 lines of Lisp code
(I mean in a few days an not 300 years)?

In reality a factor of 10 to 100 remains to lets say C or Fortran; at least
for numerical code.

BTW: The LispWorks 4.2 environment is getting more and more acceptable
(except that I couldn't figure out how to persuade LispWorks to make the
indentation on its own).

S. Gonzi
From: Thomas F. Burdick
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <xcvvgdimhwc.fsf@whirlwind.OCF.Berkeley.EDU>
"Siegfried Gonzi" <···············@kfunigraz.ac.at> writes:

> But there remains the true tragedy that declaring is hardly ever a topic:
> how will one declare the essential parts in a many 1000 lines of Lisp code
> (I mean in a few days an not 300 years)?

In N thousand lines of code, there are probably < N hundred lines of
code that are the performance bottleneck.  If you know how to use a
profiler and you know your Lisp implementation, tweaking this amount
of code is maybe a few days worth of work.  If you don't know your
implementation, of course it's going to be hard, but that's why
craftsmen and -women learn their tools.

> In reality a factor of 10 to 100 remains to lets say C or Fortran; at least
> for numerical code.

No way.  Fortran implementations probably beat Lisps for numerical
code, but not by two orders of magnitude, not if the Lisp is in
competent hands.  And some of the functional languages (ML) probably
beat both.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Dr. Edmund Weitz
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <m3665i5k63.fsf@dyn138.dbdmedia.de>
"Siegfried Gonzi" <···············@kfunigraz.ac.at> writes:

> BTW: The LispWorks 4.2 environment is getting more and more
> acceptable

I'd call this understatement...

> (except that I couldn't figure out how to persuade LispWorks to make
> the indentation on its own).

Are you talking about indentation of Lisp code in the editor? Did you
make sure that your buffer is in Lisp mode (by either switching
explicitely to this mode via 'M-x lisp mode' or by opening a file with
a 'lispy' suffix)? If in Lisp mode, the editor should more or less
behave like Emacs.

On Linux, if you open a new buffer you're in Lisp mode
automatically. I'm not sure but I seem to remember that on Wintendo
(with LWW 4.1), new buffers were in fundamental mode initially.

One noticeable difference between Emacs and the LW editor is that in
LW pressing the return key won't automatically indent the next line. I
haven't checked yet if there's a way to customize this behaviour.

Edi.
From: Nils Goesche
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <a3caq9$166aaj$1@ID-125440.news.dfncis.de>
In article <··············@dyn138.dbdmedia.de>, Dr. Edmund Weitz wrote:
> "Siegfried Gonzi" <···············@kfunigraz.ac.at> writes:
> 
>> BTW: The LispWorks 4.2 environment is getting more and more
>> acceptable
> 
> I'd call this understatement...

Indeed :-)
 
>> (except that I couldn't figure out how to persuade LispWorks to make
>> the indentation on its own).

> One noticeable difference between Emacs and the LW editor is that in
> LW pressing the return key won't automatically indent the next line. I
> haven't checked yet if there's a way to customize this behaviour.

(editor:bind-key "Indent New Line" #\return)

But I'd strongly suggest using M-( and M-) instead (maybe bound to
more accessable keys instead); makes for much faster Lisp editing...
especially together with

(defpackage "MYEDIT"
  (:add-use-defaults t)
  (:use "EDITOR")
  (:export move-past-close-and-reindent-command))

(in-package "MYEDIT")

(defcommand "Move Past Close And Reindent" (p)
     "Move past next `)', delete indentation before it, then indent after it.
Prefix argument is ignored."
     "Just like Emacs' M-), as God intended."
  (declare (ignore p))
  (let ((cpoint (current-point)))
    (form-offset cpoint 1 t 1)
    (forward-character-command -1)
    (when (save-excursion
            (let ((before-paren (copy-point cpoint :temporary)))
              (back-to-indentation-command nil)
              (point= cpoint before-paren)))
      (beginning-of-line-command nil)
      (delete-previous-character-command nil)
      (delete-horizontal-space-command nil)))
  (forward-character-command 1)
  (new-line-command nil)
  (indent-command nil))

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Dr. Edmund Weitz
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <m3wuxy43k1.fsf@dyn138.dbdmedia.de>
Nils Goesche <······@cartan.de> writes:

> (editor:bind-key "Indent New Line" #\return)
> 
> But I'd strongly suggest using M-( and M-) instead (maybe bound to
> more accessable keys instead); makes for much faster Lisp editing...
> especially together with
> 
> (defpackage "MYEDIT"
>   (:add-use-defaults t)
>   (:use "EDITOR")
>   (:export move-past-close-and-reindent-command))
> 
> (in-package "MYEDIT")
> 
> (defcommand "Move Past Close And Reindent" (p)
>      "Move past next `)', delete indentation before it, then indent after it.
> Prefix argument is ignored."
>      "Just like Emacs' M-), as God intended."
>   (declare (ignore p))
>   (let ((cpoint (current-point)))
>     (form-offset cpoint 1 t 1)
>     (forward-character-command -1)
>     (when (save-excursion
>             (let ((before-paren (copy-point cpoint :temporary)))
>               (back-to-indentation-command nil)
>               (point= cpoint before-paren)))
>       (beginning-of-line-command nil)
>       (delete-previous-character-command nil)
>       (delete-horizontal-space-command nil)))
>   (forward-character-command 1)
>   (new-line-command nil)
>   (indent-command nil))

I remember the funny line "as God intended" so I must have seen it
before. But it looks as if I forgot it - sigh.

Thanks,
Edi.
From: Rahul Jain
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <87665hbtvl.fsf@photino.sid.rice.edu>
"Siegfried Gonzi" <···············@kfunigraz.ac.at> writes:

> But there remains the true tragedy that declaring is hardly ever a topic:
> how will one declare the essential parts in a many 1000 lines of Lisp code
> (I mean in a few days an not 300 years)?

Probably more easily than C programmers declare ALL the 3000 lines of
code they would write instead.

-- 
-> -/-                       - Rahul Jain -                       -\- <-
-> -\- http://linux.rice.edu/~rahul -=-  ············@techie.com  -/- <-
-> -/- "I never could get the hang of Thursdays." - HHGTTG by DNA -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.221020101.23.50110101.042
   (c)1996-2002, All rights reserved. Disclaimer available upon request.
From: Lieven Marchand
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <m3pu4ddn7p.fsf@localhost.localdomain>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> What is wrong (concerning the declaration) with the following code:
> 
> (defun matmul (m1 m2 m3 dims)
>             (declare (optimize (speed 3) (safety 0) (debug 0))
>                      (fixnum i j)
>                      (double-float val)
>                      (fixnum k dims)
>                      (simple-array (double-float (* *)) m1 m2 m3))
>               (loop :for i :from 0 :below dims :do
>                   (loop :for j :from 0 :below dims :do
>                   (setf val 0.0)
>                       (loop :for k :from 0 :below dims :do
>                             (setf val (+ val (* (aref m1   i j) (aref
> m2  j i)))))
>                (setf (aref m3 i j) val)))
> (aref m3 0 0))

You're variable VAL is undeclared and will be assumed special. This
can have consequences on the efficiency of your program.

You declare (FIXNUM I J) but this declaration can't affect the LOOP
variables I and J since they have their own scope. So use the type
spec feature of LOOP as (loop for i fixnum from ...)

If you use Allegro, ask it to explain its type inference and call
decisions by adding a (declare (:explain :types :calls)).

And most implementations will need to be told the values of all but
the fastest changing index size in order to inline AREF.

-- 
Lieven Marchand <···@wyrd.be>
She says, "Honey, you're a Bastard of great proportion."
He says, "Darling, I plead guilty to that sin."
Cowboy Junkies -- A few simple words
From: Alexey Dejneka
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <m3wuyl12e2.fsf@comail.ru>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> What is wrong (concerning the declaration) with the following code:
> 
> (defun matmul (m1 m2 m3 dims)
>             (declare (optimize (speed 3) (safety 0) (debug 0))
>                      (fixnum i j)
                       ^^^^^^^^^^^^
>                      (double-float val)
>                      (fixnum k dims)
                              ^^^
>                      (simple-array (double-float (* *)) m1 m2 m3))
>               (loop :for i :from 0 :below dims :do
>                   (loop :for j :from 0 :below dims :do
>                   (setf val 0.0)
>                       (loop :for k :from 0 :below dims :do
>                             (setf val (+ val (* (aref m1   i j) (aref
> m2  j i)))))
>                (setf (aref m3 i j) val)))
> (aref m3 0 0))

Underlined declarations are related to global variables and say
nothing about loop-introduced vars.

VAL is a dynamic variable.

0.0 can be not of type DOUBLE-FLOAT, therefore (SETF VAL 0.0) can
produce garbage.

Declaration of M1, M2, M3 has invalid syntax.


> The above code performs the matrix-matrix-multiplication

... after replacing
    (* (aref m1 i j) (aref m2 j i))
with
    (* (aref m1 i k) (aref m2 k j))
:-)

The resulting code:

---
(defun matmul (m1 m2 m3 dims)
  (declare (optimize (speed 3) (safety 0) (debug 0))
           (fixnum dims)
           (type (simple-array double-float (* *)) m1 m2 m3))
  (loop :for i :of-type fixnum :from 0 :below dims :do
        (loop :for j :of-type fixnum :from 0 :below dims :do
              (let ((val 0d0))
                (declare (double-float val))
                (loop :for k :of-type fixnum :from 0 :below dims :do
                      (incf val (* (aref m1 i k) (aref m2 k j))))
                (setf (aref m3 i j) val))))
  (aref m3 0 0))
---

Or you can use typed :SUM clause in the inner loop instead of VAL:

(setf (aref m3 i j)
      (loop :for k :of-type fixnum :from 0 :below dims
            :sum (* (aref m1 i k) (aref m2 k j)) :of-type double-float))

--
Regards,
Alexey Dejneka
From: Siegfried Gonzi
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C42EEC2.DE07BFB2@kfunigraz.ac.at>
Alexey Dejneka wrote:

> > The above code performs the matrix-matrix-multiplication
>
> ... after replacing
>     (* (aref m1 i j) (aref m2 j i))
> with
>     (* (aref m1 i k) (aref m2 k j))
> :-)
>
> The resulting code:

Thank you very much; i will try it. Maybe I forgot to change the k-index
back after experimenting with indices. [BTW: my posted Clean version uses k
as a correct index].


S. Gonzi
From: Siegfried Gonzi
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <3C43EF4F.799A0FF2@kfunigraz.ac.at>
Alexey Dejneka wrote:

>
> The resulting code:
>
> ---
> (defun matmul (m1 m2 m3 dims)
>   (declare (optimize (speed 3) (safety 0) (debug 0))
>            (fixnum dims)
>            (type (simple-array double-float (* *)) m1 m2 m3))
>   (loop :for i :of-type fixnum :from 0 :below dims :do
>         (loop :for j :of-type fixnum :from 0 :below dims :do
>               (let ((val 0d0))
>                 (declare (double-float val))
>                 (loop :for k :of-type fixnum :from 0 :below dims :do
>                       (incf val (* (aref m1 i k) (aref m2 k j))))
>                 (setf (aref m3 i j) val))))
>   (aref m3 0 0))
> ---

[I am aware that some posts appeared afterwards, which are discussing
improvements; but I post it nevertheless]

This version takes for arrays of dimension 256x256 about 40sec; an
equivalent version but without any
declarations takes exactly the same time. I have to emphasize here that the
declared-version
does only function when the safety is set to 3 (at least on Allegro CL 6.1
for Windows); the
version without declaration works with safety setting 0. This all on a
Celeron 1000MHz processor and
Allegro CL 6.1 for Windows (student edition). A simple time profiling shows
the following (for a
256x256 matrix-matrix multiplication with AL CL 6.1 and a 1000MHz Celeron):

==
Times below 1.0% will be suppressed.

  %     %     self  total            self   total  Function
 Time  Cum.   secs   secs    calls ms/call ms/call   name
 25.2  25.2    6.3    7.4                          "aref"
 18.1  43.3    4.5    4.5                          EXCL::*_2OP
 13.5  56.8    3.4    3.4                          "_gsgc_scan_stacks"
 12.6  69.4    3.1    3.1                          EXCL::+_2OP
  7.6  77.0    1.9   24.3                          MATMUL
  7.6  84.6    1.9    1.9
"_prelink_stacks_scan_other"
  5.1  89.7    1.3    1.3                          "calc_index"
  3.6  93.3    0.9    7.5                          "new_double_float"
  3.3  96.6    0.8    0.8                          "kf_new_other"
  2.0  98.6    0.5    0.5                          "_ausize_other"
==


[A 256x256 Clean version takes 1.6sec (3sec without declarations). The
R-language takes about 1.5sec;
normaly R is interpreted but the matrix-matrix-multiplication is a build-in
function; I assume so]


Enclosed the file which I process (only for the sake of completeness in
order to exhibit possible
mistakes along my side).


Honestly speaking: A performance hit of a factor of 10 (that means about
10sec for a 256x256
matrix-matrix multiplication compared to 1.6sec with Clean) would not mind
me. But factors of
30 and more are somewhat bothersome for a compiled language - isn't it?

S. Gonzi
My file which is beeing loaded and compiled in AL CL 6.1:

==
(eval-when (compile)
     (declaim (optimize (speed 3) (safety 0) (space 0) (debug 0))))

(defun span-array (rows cols &optional (start-val 0d0))
     (declare (optimize (speed 3) (safety 0) (debug 0)))
  (let* ((array (make-array (list rows cols) )))
    (loop :for i :from 0 :below rows :do
          (loop :for j :from 0 :below cols :do
                (setf (aref array i j) (+ start-val j))))
array))

;;safety setting 0 will result in a segementation-default error
(defun matmul (m1 m2 m3 dims)
  (declare (optimize (speed 3) (safety 3) (debug 0))
           (fixnum dims)
           (type (simple-array double-float (* *)) m1 m2 m3))
  (loop :for i :of-type fixnum :from 0 :below dims :do
        (loop :for j :of-type fixnum :from 0 :below dims :do
              (let ((val 0d0))
                (declare (double-float val))
                (loop :for k :of-type fixnum :from 0 :below dims :do
                      (incf val (* (aref m1 i k) (aref m2 k j))))
        (setf (aref m3 i j) val))))
(aref m3 0 0))


(defun matmul8 (m1 m2 m3 dims)
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (loop :for i :from 0 :below dims :do
        (loop :for j  :from 0 :below dims :do
              (let ((val 0d0))
                (loop :for k  :from 0 :below dims :do
                      (incf val (* (aref m1 i k) (aref m2 k j))))
        (setf (aref m3 i j) val))))
(aref m3 0 0))



(setf m1 (span-array 256 256))
(setf m2 (span-array 256 256))
(setf m3 (span-array 256 256))

(time (matmul m1 m2 m3 256))
;;(time (matmul8 m1 m2 m3 256))
==
From: Duane Rettig
Subject: Re: Declaring arrays in Lisp
Date: 
Message-ID: <41ygrsgmz.fsf@beta.franz.com>
Siegfried Gonzi <···············@kfunigraz.ac.at> writes:

> [I am aware that some posts appeared afterwards, which are discussing
> improvements; but I post it nevertheless]

I came in late to this thread, so some of what I say might have been
said before.  Also, Nils Goesche posted a corrected version which
he claims speeds up the Lispworks times; his version also speeds
up the Allegro CL times.

However, I do want to make a point that Nils didn't make (though
he corrected his version of the code anyway): it is usually not
good to "lie" to the compiler when you are adding declarations -
note that in matmul you are declaring 

>            (type (simple-array double-float (* *)) m1 m2 m3))

but the arrays created in span-array are not specialized:

>   (let* ((array (make-array (list rows cols) )))

The mismatch between declared and actual types is much more
egregious than the apparent timing difficulty - slowness is
usually apparent right away, but blasted array cells due to
aggressive compilation that is wrong could eventually crash
your lisp.  No modifications are being done to the array
contents in this example, but sometimes a GC error might
eventually be seen after an over-optimized (i.e. wrong) code
stuffs unboxed floats into a general lisp array.

Note the types of the two arrays (first the one you created,
and then the corrected one with corrected element-type specification:

CL-USER(1): (make-array '(256 256))
#2A((NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    (NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...)
    ...)
CL-USER(2): (type-of *)
(ARRAY T (256 256))
CL-USER(3): (make-array '(256 256) :element-type 'double-float :initial-element 0.0d0)
#2A((0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    (0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 0.0d0 ...)
    ...)
CL-USER(4): (type-of *)
(ARRAY DOUBLE-FLOAT (256 256))
CL-USER(5): 

Now, it may be that not all CL implementations will have this trouble;
any implementation which does not provide a specialized double-float
array will upgrade a double-float specification to type T, and thus
there will be no difference in the compilation between an array of
double-floats and an array of general lisp objects (however, you also
won't get the speedup that unboxed arithmetic affords you).
To find out if a particular element-type is supported, use
the following:

CL-USER(5): (upgraded-array-element-type 'double-float)
DOUBLE-FLOAT
CL-USER(6): 

You can also find out whether the element type will be moved to the
next more specific element-type:

CL-USER(7): (upgraded-array-element-type '(signed-byte 24))
(SIGNED-BYTE 32)
CL-USER(8): 

or if the impolementation will decline to specialize it at all:

CL-USER(8): (upgraded-array-element-type 'ratio)
T
CL-USER(9): 

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)