From: Martin Raspaud
Subject: Map on an array
Date: 
Message-ID: <c0vib5$1en$1@news.u-bordeaux.fr>
Hi all,

I'm trying to enhance a part of the program I'm working on. I profiled 
it and came to the conclusion that a part similar to the next function 
was slowing all the process.



(defun stupid-function ()
   (let ((array (make-array 2048 :element-type '(complex double-float)))
	(window (make-array 2048 :element-type '(complex double-float))))
     (loop for i from 0 below 2048 do           ;initialize the arrays
	  (setf (aref array i) (coerce (1+ i) '(complex double-float)))
	  (setf (aref window i) (coerce (- 2048 i) '(complex double-float))))
     (loop for i from 1 to 2000 do
	  (setf array (map '(simple-array (complex double-float) (*)) #'/ 
window array)))))


This takes around 10 seconds on my computer. The problematic part is 
_map_... So I thought maybe there is a way to replace it somehow, 
because as you can see, the "window" is always the same...

Any ideas ?

Martin

From: Paul F. Dietz
Subject: Re: Map on an array
Date: 
Message-ID: <ZeSdnSnWxNh42q7d4p2dnA@dls.net>
Martin Raspaud wrote:
> Hi all,
> 
> I'm trying to enhance a part of the program I'm working on. I profiled 
> it and came to the conclusion that a part similar to the next function 
> was slowing all the process.
> 
> 
> 
> (defun stupid-function ()
>   (let ((array (make-array 2048 :element-type '(complex double-float)))
>     (window (make-array 2048 :element-type '(complex double-float))))
>     (loop for i from 0 below 2048 do           ;initialize the arrays
>       (setf (aref array i) (coerce (1+ i) '(complex double-float)))
>       (setf (aref window i) (coerce (- 2048 i) '(complex double-float))))
>     (loop for i from 1 to 2000 do
>       (setf array (map '(simple-array (complex double-float) (*)) #'/ 
> window array)))))
> 
> 
> This takes around 10 seconds on my computer. The problematic part is 
> _map_... So I thought maybe there is a way to replace it somehow, 
> because as you can see, the "window" is always the same...

MAP is allocating a new array each time.  It's also probably
not being inlined (use DISASSEMBLE to inspect your code).

You need to specify the optimization settings as well.  The default
settings may not lead to good performance.

There's MAP-ONTO, and you could always write your own loop.

If MAP-ONTO is too slow in your implementation, complain to the
implementors that you'd like it to be inlined in such cases.

Which implementation are you using?

	Paul
	who plans to write performance tests sometime after ansi-tests
From: Martin Raspaud
Subject: Re: Map on an array
Date: 
Message-ID: <c0vjca$1tq$1@news.u-bordeaux.fr>
Paul F. Dietz wrote:
> 
> MAP is allocating a new array each time.  It's also probably
> not being inlined (use DISASSEMBLE to inspect your code).
> 
> You need to specify the optimization settings as well.  The default
> settings may not lead to good performance.
> 
> There's MAP-ONTO, and you could always write your own loop.
> 
> If MAP-ONTO is too slow in your implementation, complain to the
> implementors that you'd like it to be inlined in such cases.
> 
> Which implementation are you using?
> 
>     Paul
>     who plans to write performance tests sometime after ansi-tests

I'm using cmucl... And I found map-into (not map-onto) in the hyperspec, 
but it's even slower. Actually the setf I wrote in the end was just 
there to modify "array" which is not the way I do it in my program.

What optimization settings would you recommand ? I used (declare 
(optimize (speed 3) (safety 0))) but it didn't do much (if anything).

Martin
From: Paul F. Dietz
Subject: Re: Map on an array
Date: 
Message-ID: <_M-dnaumm8ZN067dRVn-hQ@dls.net>
Martin Raspaud wrote:

> I'm using cmucl... And I found map-into (not map-onto)

Oops. :)

	Paul
From: Martin Raspaud
Subject: Re: Map on an array
Date: 
Message-ID: <c0vjea$1tq$2@news.u-bordeaux.fr>
Paul F. Dietz wrote:

> 
> There's MAP-ONTO, and you could always write your own loop.
> 

I tried to write my own loop, but it's not faster...

Martin
From: Paul F. Dietz
Subject: Re: Map on an array
Date: 
Message-ID: <c7GdnSxqYe6aza7dRVn-iQ@dls.net>
Martin Raspaud wrote:

>> There's [MAP-INTO], and you could always write your own loop.
>>
> 
> I tried to write my own loop, but it's not faster...

What's may be happening is things are being boxed and unboxed,
which means one heap allocation per floating point number.  This
is slow.  Do disassemble on the compiled function to see if
that's happening.  Make sure the arefs are being inlined as
well (type declarations might help if that's not happening).

What did your loop look like?  Again, disassemble it and look
for places where things haven't been inlined.

Also, check that (upgraded-array-element-type '(complex double-float))
is (complex double-float).  It is on recent CMUCL, but I don't
know about older versions.

	Paul
From: Paul F. Dietz
Subject: Re: Map on an array
Date: 
Message-ID: <--OdnTuAn9mSy67dRVn-hw@dls.net>
Martin Raspaud wrote:

> I tried to write my own loop, but it's not faster...

Funny, a loop does look faster to me.  You may need to declare the loop index
variables to avoid generic arithmetic.  Also note that COMPLEX is not being
inlined, which may be quite expensive (since it is probably heap allocating the
result.)

In comparing this against C, be sure you include the cost of allocating
the vectors in the C program.

(This is at safety 0, debug 0, space 0, speed 3)

* (defun stupid-function ()
   (let ((array (make-array 2048 :element-type '(complex double-float)))
     (window (make-array 2048 :element-type '(complex double-float))))
     (loop for i of-type fixnum from 0 below 2048 do           ;initialize the arrays
       (setf (aref array i) (complex (1+ i) 0.0d0))
       (setf (aref window i) (complex (- 2048 i) 0.0d0)))
     (loop for i of-type fixnum from 1 to 2000 do (setf (aref array i) (/ (aref window i) (aref array i))))
     array))
STUPID-FUNCTION
* (disassemble 'stupid-function)
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:

4877D430:       .ENTRY "LAMBDA NIL"()        ; FUNCTION
      448:       POP     DWORD PTR [EBP-8]
      44B:       LEA     ESP, [EBP-32]
      44E:       MOV     EAX, 102             ; No-arg-parsing entry point
      453:       MOV     EBX, 8192
      458:       MOV     ECX, 32768
      45D:       CALL    #x10000120           ; #x10000120: ALLOCATE-VECTOR
      462:       MOV     ESI, EDX
      464:       MOV     EAX, 102
      469:       MOV     EBX, 8192
      46E:       MOV     ECX, 32768
      473:       CALL    #x10000120           ; #x10000120: ALLOCATE-VECTOR
      478:       MOV     [EBP-12], ESI
      47B:       MOV     [EBP-16], EDX
      47E:       XOR     EBX, EBX
      480: L0:   MOV     [EBP-20], EBX
      483:       LEA     EDX, [EBX+4]
      486:       MOV     ESI, ESP
      488:       SUB     ESP, 12
      48B:       MOV     EDI, [#x280002DC]    ; X86::*FP-CONSTANT-0D0*
      491:       MOV     EAX, [#x4877D428]    ; #<FDEFINITION object for COMPLEX>
      497:       MOV     ECX, 8
      49C:       MOV     [ESI-4], EBP
      49F:       MOV     EBP, ESI
      4A1:       CALL    DWORD PTR [EAX+5]
      4A4:       MOV     ESP, EBX
      4A6:       MOV     EBX, [EBP-20]
      4A9:       FSTPD   ST(0)
      4AB:       FLDD    [EDX+1]
      4AE:       FSTPD   ST(1)
      4B0:       FLDD    [EDX+9]
      4B3:       FXCH    ST(1)
      4B5:       MOV     EAX, [EBP-12]
      4B8:       FSTD    [EAX+EBX*4+1]
      4BC:       FXCH    ST(1)
      4BE:       FSTD    [EAX+EBX*4+9]
      4C2:       FXCH    ST(1)
      4C4:       MOV     EDX, 8192
      4C9:       SUB     EDX, EBX
      4CB:       MOV     ESI, ESP
      4CD:       SUB     ESP, 12
      4D0:       MOV     EDI, [#x280002DC]    ; X86::*FP-CONSTANT-0D0*
      4D6:       MOV     EAX, [#x4877D428]    ; #<FDEFINITION object for COMPLEX>
      4DC:       MOV     ECX, 8
      4E1:       MOV     [ESI-4], EBP
      4E4:       MOV     EBP, ESI
      4E6:       CALL    DWORD PTR [EAX+5]
      4E9:       MOV     ESP, EBX
      4EB:       MOV     EBX, [EBP-20]
      4EE:       FSTPD   ST(0)
      4F0:       FLDD    [EDX+1]
      4F3:       FSTPD   ST(1)
      4F5:       FLDD    [EDX+9]
      4F8:       FXCH    ST(1)
      4FA:       MOV     EAX, [EBP-16]
      4FD:       FSTD    [EAX+EBX*4+1]
      501:       FXCH    ST(1)
      503:       FSTD    [EAX+EBX*4+9]
      507:       FXCH    ST(1)
      509:       ADD     EBX, 4
      50C:       CMP     EBX, 8192
      512:       JL      L0
      518:       MOV     ECX, 4
      51D:       JMP     L3
      51F: L1:   FSTPD   ST(0)
      521:       FLDD    ST(0)
      523:       FDIVD   ST(2)
      525:       FSTD    ST(4)
      527:       FSTPD   ST(0)
      529:       FLDD    ST(3)
      52B:       FMULD   ST(1)
      52D:       FADDD   ST(2)
      52F:       FSTD    ST(1)
      531:       FSTPD   ST(0)
      533:       FLDD    ST(4)
      535:       FMULD   ST(4)
      537:       FADDD   ST(3)
      539:       FDIVD   ST(1)
      53B:       FSTD    ST(2)
      53D:       FSTPD   ST(0)
      53F:       FLDD    ST(2)
      541:       FMULD   ST(4)
      543:       FSUBRD  ST(5)
      545:       FDIVD   ST(1)
      547:       FSTD    ST(3)
      549:       FSTPD   ST(0)
      54B:       FLDD    ST(1)
      54D:       FXCH    ST(3)
      54F:       FSTD    ST(1)
      551:       FXCH    ST(3)
      553: L2:   MOV     EAX, [EBP-12]
      556:       FSTD    [EAX+ECX*4+1]
      55A:       FXCH    ST(1)
      55C:       FSTD    [EAX+ECX*4+9]
      560:       FXCH    ST(1)
      562:       ADD     ECX, 4
      565:       CMP     ECX, 8000
      56B:       JNLE    L4
      571: L3:   MOV     EAX, [EBP-16]
      574:       FSTPD   ST(0)
      576:       FLDD    [EAX+ECX*4+1]
      57A:       FSTPD   ST(1)
      57C:       FLDD    [EAX+ECX*4+9]
      580:       FXCH    ST(1)
      582:       MOV     EAX, [EBP-12]
      585:       FSTPD   ST(6)
      587:       FLDD    [EAX+ECX*4+1]
      58B:       FXCH    ST(6)
      58D:       FSTPD   ST(7)
      58F:       FLDD    [EAX+ECX*4+9]
      593:       FXCH    ST(7)
      595:       FSTD    ST(3)
      597:       FXCH    ST(1)
      599:       FSTD    ST(5)
      59B:       FXCH    ST(1)
      59D:       FXCH    ST(6)
      59F:       FSTD    ST(2)
      5A1:       FXCH    ST(6)
      5A3:       FXCH    ST(7)
      5A5:       FSTD    ST(1)
      5A7:       FXCH    ST(7)
      5A9:       FXCH    ST(2)
      5AB:       FSTD    ST(2)
      5AD:       FABS
      5AF:       FSTD    ST(4)
      5B1:       FXCH    ST(1)
      5B3:       FSTD    ST(1)
      5B5:       FABS
      5B7:       FCOMD   ST(4)
      5B9:       FNSTSW
      5BB:       AND     AH, 69
      5BE:       CMP     AH, 1
      5C1:       JEQ     L1
      5C7:       FSTPD   ST(0)
      5C9:       FLDD    ST(1)
      5CB:       FDIVD   ST(1)
      5CD:       FSTD    ST(4)
      5CF:       FSTPD   ST(0)
      5D1:       FLDD    ST(3)
      5D3:       FMULD   ST(2)
      5D5:       FADD    ST(1), ST(0)
      5D7:       FSTPD   ST(0)
      5D9:       FLDD    ST(2)
      5DB:       FMULD   ST(4)
      5DD:       FADDD   ST(5)
      5DF:       FDIVD   ST(1)
      5E1:       FSTD    ST(2)
      5E3:       FSTPD   ST(0)
      5E5:       FLDD    ST(4)
      5E7:       FMULD   ST(4)
      5E9:       FSUBD   ST(3)
      5EB:       FDIVD   ST(1)
      5ED:       FSTD    ST(3)
      5EF:       FSTPD   ST(0)
      5F1:       FLDD    ST(1)
      5F3:       FXCH    ST(3)
      5F5:       FSTD    ST(1)
      5F7:       FXCH    ST(3)
      5F9:       JMP     L2
      5FE: L4:   MOV     EDX, [EBP-12]
      601:       MOV     ECX, [EBP-8]
      604:       MOV     EAX, [EBP-4]
      607:       ADD     ECX, 2
      60A:       MOV     ESP, EBP
      60C:       MOV     EBP, EAX
      60E:       JMP     ECX
*
From: Martin Raspaud
Subject: Re: Map on an array
Date: 
Message-ID: <c10170$7fm$1@news.u-bordeaux.fr>
Paul F. Dietz wrote:
> Martin Raspaud wrote:
> 
>> I tried to write my own loop, but it's not faster...
> 
> 
> Funny, a loop does look faster to me.  You may need to declare the loop 
> index
> variables to avoid generic arithmetic.  Also note that COMPLEX is not being
> inlined, which may be quite expensive (since it is probably heap 
> allocating the
> result.)
> 
> In comparing this against C, be sure you include the cost of allocating
> the vectors in the C program.
> 
> (This is at safety 0, debug 0, space 0, speed 3)
> 
> * (defun stupid-function ()
>   (let ((array (make-array 2048 :element-type '(complex double-float)))
>     (window (make-array 2048 :element-type '(complex double-float))))
>     (loop for i of-type fixnum from 0 below 2048 do           
> ;initialize the arrays
>       (setf (aref array i) (complex (1+ i) 0.0d0))
>       (setf (aref window i) (complex (- 2048 i) 0.0d0)))
>     (loop for i of-type fixnum from 1 to 2000 do (setf (aref array i) (/ 
> (aref window i) (aref array i))))
>     array))
> STUPID-FUNCTION

Well, you forgot a loop in there, maybe on purpose (for simplicity).
However, this did the trick....

Thanks a million.

But let me understand this better : is it because the types have been 
declared , thus avoiding generic arithmetic ?

Martin
From: Barry Margolin
Subject: Re: Map on an array
Date: 
Message-ID: <barmar-AEB4F8.07073718022004@comcast.ash.giganews.com>
In article <············@news.u-bordeaux.fr>,
 Martin Raspaud <········@labri.fr> wrote:

> Hi all,
> 
> I'm trying to enhance a part of the program I'm working on. I profiled 
> it and came to the conclusion that a part similar to the next function 
> was slowing all the process.
> 
> 
> 
> (defun stupid-function ()
>    (let ((array (make-array 2048 :element-type '(complex double-float)))
> 	(window (make-array 2048 :element-type '(complex double-float))))
>      (loop for i from 0 below 2048 do           ;initialize the arrays
> 	  (setf (aref array i) (coerce (1+ i) '(complex double-float)))
> 	  (setf (aref window i) (coerce (- 2048 i) '(complex double-float))))
>      (loop for i from 1 to 2000 do
> 	  (setf array (map '(simple-array (complex double-float) (*)) #'/ 
> window array)))))
> 
> 
> This takes around 10 seconds on my computer. The problematic part is 
> _map_... So I thought maybe there is a way to replace it somehow, 
> because as you can see, the "window" is always the same...

You could simply replace it with a loop:

(dotimes (j (length array))
  (setf (aref array j) (/ (aref window j) (aref array j))))

My guess is that by using MAP, you're not getting the division inlined, 
but this version should allow it.

It might also help to declare the types of ARRAY and WINDOW, so that it 
can generate the most efficient division code.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From: Jeremy Yallop
Subject: Re: Map on an array
Date: 
Message-ID: <slrnc36no4.f4v.jeremy@digo.cl.cam.ac.uk>
Martin Raspaud wrote:
> (defun stupid-function ()
>    (let ((array (make-array 2048 :element-type '(complex double-float)))
> 	(window (make-array 2048 :element-type '(complex double-float))))
>      (loop for i from 0 below 2048 do           ;initialize the arrays
> 	  (setf (aref array i) (coerce (1+ i) '(complex double-float)))
> 	  (setf (aref window i) (coerce (- 2048 i) '(complex double-float))))
>      (loop for i from 1 to 2000 do
> 	  (setf array (map '(simple-array (complex double-float) (*)) #'/ 
> window array)))))

;; Copied and adapted from CMUCL's two-arg-/
(defun complex-divide-2 (x y)
  (declare (type (complex double-float) x y))
  (let ((rx (realpart x))
        (ix (imagpart x))
        (ry (realpart y))
        (iy (imagpart y)))
    (if (> (abs ry) (abs iy))
        (let* ((r (/ iy ry))
               (dn (* ry (+ 1 (* r r)))))
	  (complex (/ (+ rx (* ix r)) dn)
		   (/ (- ix (* rx r)) dn)))
      (let* ((r (/ ry iy))
	     (dn (* iy (+ 1 (* r r)))))
	(complex (/ (+ (* rx r) ix) dn)
		 (/ (- (* ix r) rx) dn))))))

(declaim (inline complex-divide-2))

(defun stupid-function ()
  (let ((array (make-array 2048 :element-type '(complex double-float)))
	(window (make-array 2048 :element-type '(complex double-float))))
    (loop for i from 0 below 2048 do           ;initialize the arrays
	  (setf (aref array i) (coerce (1+ i) '(complex double-float)))
	  (setf (aref window i) (coerce (- 2048 i) '(complex double-float))))
    (loop for i from 1 to 2000 do
	  (dotimes (j (length array))
	    (setf (aref array j) (complex-divide-2 (aref window j)
						   (aref array j)))))))

Before:
* (time (stupid-function))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:

; Evaluation took:
;   23.18 seconds of real time
;   18.47 seconds of user run time
;   3.97 seconds of system run time
;   18,542,216,592 CPU cycles
;   [Run times include 5.73 seconds GC run time]
;   2 page faults and
;   1,443,517,040 bytes consed.


After:
* (time (stupid-function))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:

; Evaluation took:
;   0.44 seconds of real time
;   0.43 seconds of user run time
;   0.0 seconds of system run time
;   351,376,912 CPU cycles
;   [Run times include 0.05 seconds GC run time]
;   0 page faults and
;   491,784 bytes consed.
;


Jeremy.
From: Jeremy Yallop
Subject: Re: Map on an array
Date: 
Message-ID: <slrnc36of3.fdc.jeremy@digo.cl.cam.ac.uk>
Jeremy Yallop wrote:
> Martin Raspaud wrote:
>> (defun stupid-function ()
>>    (let ((array (make-array 2048 :element-type '(complex double-float)))
>> 	(window (make-array 2048 :element-type '(complex double-float))))
>>      (loop for i from 0 below 2048 do           ;initialize the arrays
>> 	  (setf (aref array i) (coerce (1+ i) '(complex double-float)))
>> 	  (setf (aref window i) (coerce (- 2048 i) '(complex double-float))))
>>      (loop for i from 1 to 2000 do
>> 	  (setf array (map '(simple-array (complex double-float) (*)) #'/ 
>> window array)))))
> 
> ;; Copied and adapted from CMUCL's two-arg-/

Actually, forget that: declaring the types of the arrays has the same
effect:

(defun stupid-function ()
  (let ((array (make-array 2048 :element-type '(complex
						double-float)))
        (window (make-array 2048 :element-type '(complex
						 double-float))))
    (declare (type (simple-array (complex double-float) (*))
		   array window))
    (loop for i from 0 below 2048 do ;initialize the arrays
          (setf (aref array i) (coerce (1+ i) '(complex
						double-float)))
          (setf (aref window i) (coerce (- 2048 i) '(complex double-float))))                                                                       
    (loop for i from 1 to 2000 do
          (dotimes (j (length array))
	    (setf (aref array j) (/ (aref window j) (aref array j)))))))

Jeremy.