optimizing simple-arrays

From: Dave Watson
Subject: optimizing simple-arrays
Date: Tue, 05 Apr 2005 21:35:06 +0000
Message-ID: <slrnd56149.1ll.djwatson@dearf.cs.washington.edu>

I am trying to get sbcl to return an array of floats in an optimized fashion:

(declaim (ftype (function (&optional) (simple-array single-float (3))) blah)) 
(defun blah ()
	(declare (optimize (speed 3)))
	(vector (get-float1) (get-float2) (get-float3)))
--
and I get:
doing float to pointer coercion (cost 13).... 
three times, once for each (get-float).

presumably this has something to do with using a function, 
because the following works as expected:
(vector 1 2 3)

it also works by let'ing the array first, then setf'ing 
each element, but that starts to muddy the point.

Is there a better way to do this?

Thanks to everyone on comp.lang.lisp, you've been very helpful!

-Dave Watson
········@docwatson.org

Re: optimizing simple-arrays Raffael Cavallaro
Re: optimizing simple-arrays Raffael Cavallaro
Re: optimizing simple-arrays Raffael Cavallaro
Re: optimizing simple-arrays Raffael Cavallaro
- Re: optimizing simple-arrays Dave Watson
Re: optimizing simple-arrays Christophe Rhodes
- Re: optimizing simple-arrays Dave Watson
- Re: optimizing simple-arrays Dave Watson
  - Re: optimizing simple-arrays Christophe Rhodes
    - Re: optimizing simple-arrays Dave Watson
      - Re: optimizing simple-arrays Frank Buss
      - Re: optimizing simple-arrays Christophe Rhodes
  - Re: optimizing simple-arrays Juho Snellman
Re: optimizing simple-arrays Adam Warner
Re: optimizing simple-arrays Adam Warner
- Re: optimizing simple-arrays Dave Watson
  - Re: optimizing simple-arrays Adam Warner
    - Re: optimizing simple-arrays Adam Warner
      - Re: optimizing simple-arrays Duane Rettig
        Re: optimizing simple-arrays Christophe Rhodes
        Re: optimizing simple-arrays Duane Rettig
        Re: optimizing simple-arrays Adam Warner
        Re: optimizing simple-arrays Duane Rettig

From: Raffael Cavallaro
Subject: Re: optimizing simple-arrays
Date: Tue, 05 Apr 2005 23:09:10 +0000
Message-ID: <raffaelcavallaro-86FD49.18091005042005@comcast.dca.giganews.com>

In article <·······················@dearf.cs.washington.edu>,
 Dave Watson <········@docwatson.org> wrote:

> Is there a better way to do this?

does this help?

(declare (inline blah))

From: Raffael Cavallaro
Subject: Re: optimizing simple-arrays
Date: Tue, 05 Apr 2005 23:26:39 +0000
Message-ID: <raffaelcavallaro-0B7B29.18263905042005@comcast.dca.giganews.com>

In article <·······················@dearf.cs.washington.edu>,
 Dave Watson <········@docwatson.org> wrote:

> Is there a better way to do this?

Sorry, meant to say -

does this help?

(declaim (ftype (function (VALUES (SIMPLE-VECTOR 3) &OPTIONAL)) blah))

(declare (inline get-float1 get-float2 get-float3))

(defun blah ()
   (declare (optimize (speed 3)))
   (vector (get-float1) (get-float2) (get-float3)))

;; I changed the ftype declaration because sbcl complained that the
;; declared and derived types didn't match. And the inline is to
;; get rid of the function calls of course.
;; All the above assumes suitable definitions of get-float1 etc.

From: Raffael Cavallaro
Subject: Re: optimizing simple-arrays
Date: Tue, 05 Apr 2005 23:27:12 +0000
Message-ID: <raffaelcavallaro-FE000D.18271205042005@comcast.dca.giganews.com>

In article <·······················@dearf.cs.washington.edu>,
 Dave Watson <········@docwatson.org> wrote:

> Is there a better way to do this?

Sorry, meant to say -

does this help?

(declaim (ftype (function (VALUES (SIMPLE-VECTOR 3) &OPTIONAL)) blah))

(declare (inline get-float1 get-float2 get-float3))

(defun blah ()
   (declare (optimize (speed 3)))
   (vector (get-float1) (get-float2) (get-float3)))

;; I changed the ftype declaration because sbcl complained that the
;; declared and derived types didn't match. And the inline is to
;; get rid of the function calls of course.
;; All the above assumes suitable definitions of get-float1 etc.

From: Raffael Cavallaro
Subject: Re: optimizing simple-arrays
Date: Tue, 05 Apr 2005 22:33:51 +0000
Message-ID: <raffaelcavallaro-AD012C.18334905042005@comcast.dca.giganews.com>

In article <·······················@dearf.cs.washington.edu>,
 Dave Watson <········@docwatson.org> wrote:

> Is there a better way to do this?

Wow - this is just not my day for cut and paste (nor for canceling news 
posts either!)

Here's what I actually compiled in sbcl. Does this do what you want?


(declaim (ftype (function (&optional) (simple-vector 3)) blah))

(declare (inline get-float1 get-float2 get-float3))

(defun blah ()
   (declare (optimize (speed 3)))
   (vector (get-float1) (get-float2) (get-float3)))

;; I changed the ftype declaration because sbcl complained that the
;; declared and derived types didn't match. And the inline is to
;; get rid of the function calls of course.
;; All the above assumes suitable definitions of get-float1 etc.

From: Dave Watson
Subject: Re: optimizing simple-arrays
Date: Tue, 05 Apr 2005 23:53:19 +0000
Message-ID: <slrnd5697f.1hs.djwatson@dearf.cs.washington.edu>

In article <······································@comcast.dca.giganews.com>, Raffael Cavallaro wrote:
> In article <·······················@dearf.cs.washington.edu>,
>  Dave Watson <········@docwatson.org> wrote:
> 
>> Is there a better way to do this?
> 
> Here's what I actually compiled in sbcl. Does this do what you want?
> 
> 
> (declaim (ftype (function (&optional) (simple-vector 3)) blah))
> 
> (declare (inline get-float1 get-float2 get-float3))
> 
> (defun blah ()
>    (declare (optimize (speed 3)))
>    (vector (get-float1) (get-float2) (get-float3)))
> 
> ;; I changed the ftype declaration because sbcl complained that the
> ;; declared and derived types didn't match. And the inline is to
> ;; get rid of the function calls of course.
> ;; All the above assumes suitable definitions of get-float1 etc.

The get-floats are a bit more complicated, something like:

(* maxvalue (the single-float
	(cos (the (single-float -1.0 1.0)
		(safe/
			(* .5 (+ (- R G .....

you get the idea.   Basically, something that cannot be inlined.

-Dave Watson
········@docwatson.org

From: Christophe Rhodes
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 08:17:10 +0000
Message-ID: <sqy8bwpbeh.fsf@cam.ac.uk>

Dave Watson <········@docwatson.org> writes:

> (declaim (ftype (function (&optional) (simple-array single-float (3))) blah)) 
> (defun blah ()
> 	(declare (optimize (speed 3)))
> 	(vector (get-float1) (get-float2) (get-float3)))
> --
> and I get:
> doing float to pointer coercion (cost 13).... 
> three times, once for each (get-float).

There are (at least) two issues here.

The first is that you need to be careful with your declaration. The
type (simple-array single-float (3)) does not mean, as one might
expect, "any one-dimensional simple array with three elements, each of
which is a single-float".  Instead, it means "a one-dimensional simple
array, which by its very nature can only[*] store single-floats, with
three elements".  The distinction is that not any old vector will do
as a return value; it had better be one that fits with this additional
requirement.  You get one of those by calling MAKE-ARRAY with a
specified element type.

The second is that the best representation in memory for an array of
single floats depends on what you're going to do with them next.  If
you're going to treat them as objects, by printing them, storing them
in slots, or the like, you might be better off storing them as
pointers, within a (simple-array t (3)); if you're going to be mostly
doing arithmetic with them, then you are indeed probably better off
with them in a (simple-array single-float (3)).

Meanwhile, are you sure that packing your floats into your array is a
relevant bottleneck in your program?  It's something that smacks of an
initializtion, which is a one-time cost that largely evaporates on
real computations...  also, I'd expect your get-floatN routines to
take longer than a couple of memory indirections.

Christophe

[*] This is not quite true for arbitrary values of single-float, but
    in this case it probably is true.

From: Dave Watson
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 14:16:34 +0000
Message-ID: <d30qsa$407$1@gnus01.u.washington.edu>

On 2005-04-06, Christophe Rhodes <·····@cam.ac.uk> wrote:
> Meanwhile, are you sure that packing your floats into your array is a
> relevant bottleneck in your program?  It's something that smacks of an
> initializtion, which is a one-time cost that largely evaporates on
> real computations...  also, I'd expect your get-floatN routines to
> take longer than a couple of memory indirections.

Actually the full routine is an rgb to hsv pixel converter:

(declaim (inline rgb-pto-hsv)
         (ftype (function (&key ) (simple-array (float) (3))) rgb-pto-hsv))
(defun rgb-pto-hsv (pixel &key (maxvalue 255))
  "transforms a pixel in rgb to hsv (vector triple"
  (declare #.*optimize* 
           (type (simple-array (unsigned-byte 8) (3)) pixel)
           (type (unsigned-byte 8) maxvalue)
           (sb-ext:muffle-conditions sb-ext:compiler-note))
  (symbol-macrolet ((R (aref pixel 0)) (G (aref pixel 1)) (B (aref pixel 2)))
    (let ((ret (make-array 3 :element-type 'single-float)))
      (setf (aref ret 0) (* maxvalue (the single-float 
                   (cos (the (single-float -1.0 1.0)
                          (safe/
                           (* .5 (+ (- R G) (- R B)))
                           (sqrt (+ (expt (- R G) 2) (* (- R B) (- G B))))))))))
      (setf (aref ret 1) (* maxvalue (- 1 (the single-float (float
                        (safe/ 
                         (* 3 (min R G B)) 
                         (+ R G B)))))))
      (setf (aref ret 2) (* .333333 (+ R G B)))
      ret)))

(just for context)

I've profiled to find that this (and the caller) are the bottlenecks,
but I didn't know if there was a way to find exactly what in the
function was the exact bottleneck, so I just optimized the whole thing
the best I could because it was small.  

-- 
-Dave Watson
········@docwatson.org

From: Dave Watson
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 14:22:45 +0000
Message-ID: <d30r7t$4e9$1@gnus01.u.washington.edu>

On 2005-04-06, Christophe Rhodes <·····@cam.ac.uk> wrote:
> Meanwhile, are you sure that packing your floats into your array is a
> relevant bottleneck in your program?  It's something that smacks of an
> initializtion, which is a one-time cost that largely evaporates on
> real computations...  also, I'd expect your get-floatN routines to
> take longer than a couple of memory indirections.

Actually the full routine is an rgb to hsv pixel converter:

(declaim (inline rgb-pto-hsv)
         (ftype (function (&key ) (simple-array (float) (3))) rgb-pto-hsv))
(defun rgb-pto-hsv (pixel &key (maxvalue 255))
  "transforms a pixel in rgb to hsv (vector triple"
  (declare #.*optimize* 
           (type (simple-array (unsigned-byte 8) (3)) pixel)
           (type (unsigned-byte 8) maxvalue)
           (sb-ext:muffle-conditions sb-ext:compiler-note))
  (symbol-macrolet ((R (aref pixel 0)) (G (aref pixel 1)) (B (aref pixel 2)))
    (let ((ret (make-array 3 :element-type 'single-float)))
      (setf (aref ret 0) (* maxvalue (the single-float 
                   (cos (the (single-float -1.0 1.0)
                          (safe/
                           (* .5 (+ (- R G) (- R B)))
                           (sqrt (+ (expt (- R G) 2) (* (- R B) (- G B))))))))))
      (setf (aref ret 1) (* maxvalue (- 1 (the single-float (float
                        (safe/ 
                         (* 3 (min R G B)) 
                         (+ R G B)))))))
      (setf (aref ret 2) (* .333333 (+ R G B)))
      ret)))

(just for context)

I've profiled to find that this (and the caller) are the bottlenecks,
but I didn't know if there was a way to find exactly what in the
function was the exact bottleneck, so I just optimized the whole thing
the best I could because it was small.  

-- 
-Dave Watson
········@docwatson.org

From: Christophe Rhodes
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 14:54:16 +0000
Message-ID: <sqd5t8klbb.fsf@cam.ac.uk>

Dave Watson <········@docwatson.org> writes:

> On 2005-04-06, Christophe Rhodes <·····@cam.ac.uk> wrote:
>> Meanwhile, are you sure that packing your floats into your array is a
>> relevant bottleneck in your program?  It's something that smacks of an
>> initializtion, which is a one-time cost that largely evaporates on
>> real computations...  also, I'd expect your get-floatN routines to
>> take longer than a couple of memory indirections.
>
> Actually the full routine is an rgb to hsv pixel converter:
>
> (declaim (inline rgb-pto-hsv)
>          (ftype (function (&key ) (simple-array (float) (3))) rgb-pto-hsv))
> (defun rgb-pto-hsv (pixel &key (maxvalue 255))
>   "transforms a pixel in rgb to hsv (vector triple"
>   (declare #.*optimize* 
>            (type (simple-array (unsigned-byte 8) (3)) pixel)
>            (type (unsigned-byte 8) maxvalue)
>            (sb-ext:muffle-conditions sb-ext:compiler-note))
>   (symbol-macrolet ((R (aref pixel 0)) (G (aref pixel 1)) (B (aref pixel 2)))
>     (let ((ret (make-array 3 :element-type 'single-float)))
>       (setf (aref ret 0) (* maxvalue (the single-float 
>                    (cos (the (single-float -1.0 1.0)
>                           (safe/
>                            (* .5 (+ (- R G) (- R B)))
>                            (sqrt (+ (expt (- R G) 2) (* (- R B) (- G B))))))))))
>       (setf (aref ret 1) (* maxvalue (- 1 (the single-float (float
>                         (safe/ 
>                          (* 3 (min R G B)) 
>                          (+ R G B)))))))
>       (setf (aref ret 2) (* .333333 (+ R G B)))
>       ret)))
>
> (just for context)

What is SAFE/?  If this function is not known to the compiler, it
won't know if it can compile FLOAT of it efficiently.

Also, why SYMBOL-MACROLET, rather than a simple LET?  The system will
have to perform the array lookup at each reference, which is unlikely
to be a win.

You might do better, also, to have maxvalue (and 1!) as single floats
explicitly.

(While I'm at it, your FTYPE declaration is wrong, both in the lambda
list and the return value declarations)

> I've profiled to find that this (and the caller) are the bottlenecks,
> but I didn't know if there was a way to find exactly what in the
> function was the exact bottleneck, so I just optimized the whole thing
> the best I could because it was small.  

I'm not really an expert in timings, but I would have expected
floating point operations to dwarf memory accesses on functions of the
above form.  Still, let's see the support routines for this piece of
code, and maybe something else will leap out.

Christophe

From: Dave Watson
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 15:09:38 +0000
Message-ID: <d30tvq$667$1@gnus01.u.washington.edu>

On 2005-04-06, Christophe Rhodes <·····@cam.ac.uk> wrote:
>> Actually the full routine is an rgb to hsv pixel converter:
>>
>> (declaim (inline rgb-pto-hsv)
>>          (ftype (function (&key ) (simple-array (float) (3))) rgb-pto-hsv))
>> (defun rgb-pto-hsv (pixel &key (maxvalue 255))
>>   "transforms a pixel in rgb to hsv (vector triple"
>>   (declare #.*optimize* 
>>            (type (simple-array (unsigned-byte 8) (3)) pixel)
>>            (type (unsigned-byte 8) maxvalue)
>>            (sb-ext:muffle-conditions sb-ext:compiler-note))
>>   (symbol-macrolet ((R (aref pixel 0)) (G (aref pixel 1)) (B (aref pixel 2)))
>>     (let ((ret (make-array 3 :element-type 'single-float)))
>>       (setf (aref ret 0) (* maxvalue (the single-float 
>>                    (cos (the (single-float -1.0 1.0)
>>                           (safe/
>>                            (* .5 (+ (- R G) (- R B)))
>>                            (sqrt (+ (expt (- R G) 2) (* (- R B) (- G B))))))))))
>>       (setf (aref ret 1) (* maxvalue (- 1 (the single-float (float
>>                         (safe/ 
>>                          (* 3 (min R G B)) 
>>                          (+ R G B)))))))
>>       (setf (aref ret 2) (* .333333 (+ R G B)))
>>       ret)))
>>
>> (just for context)
>
> What is SAFE/?  If this function is not known to the compiler, it
> won't know if it can compile FLOAT of it efficiently.

Ok, well the name is backwards, but:

(defun safe/ (nom denum)
  (if (and (= 0 nom) (= 0 denum))
        0.0
	      (/ nom denum)))

> Also, why SYMBOL-MACROLET, rather than a simple LET?  The system will
> have to perform the array lookup at each reference, which is unlikely
> to be a win.

Yes I was unsure about this, in any case it doesn't seem to make much
of a difference either way.

> I'm not really an expert in timings, but I would have expected
> floating point operations to dwarf memory accesses on functions of the
> above form.  Still, let's see the support routines for this piece of
> code, and maybe something else will leap out.

This is the first time I've tried any optimization with lisp, so
I'm sure you're very right ;)   Then again the question was
half just for the understanding.

-- 
-Dave Watson
········@docwatson.org

From: Frank Buss
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 15:48:36 +0000
Message-ID: <d310cj$ro3$1@newsreader3.netcologne.de>

Dave Watson <········@docwatson.org> wrote:

> Ok, well the name is backwards, but:
> 
> (defun safe/ (nom denum)
>   (if (and (= 0 nom) (= 0 denum))
>         0.0
>            (/ nom denum)))

don't know, if it is possible with your algorithm, but if you want to 
avoid division-by-zero errors, this is the right definition:

(defun safe/ (nom denum)
  (if (zerop denum) 0 (/ nom denum)))

This works like yours, execpt that it doesn't generate an error if denum 
is zero only.

> This is the first time I've tried any optimization with lisp, so
> I'm sure you're very right ;)   Then again the question was
> half just for the understanding.

you can try (disassemble 'fun) to look like it is compiled. Sometimes 
this helps.

For your rgb-pto-hsv perhaps it could be faster to define a struct for 
your RGB values instead of using a vector.

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Christophe Rhodes
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 15:17:52 +0000
Message-ID: <sq8y3wkk7z.fsf@cam.ac.uk>

Dave Watson <········@docwatson.org> writes:

> On 2005-04-06, Christophe Rhodes <·····@cam.ac.uk> wrote:
>> What is SAFE/?  If this function is not known to the compiler, it
>> won't know if it can compile FLOAT of it efficiently.
>
> Ok, well the name is backwards, but:
>
> (defun safe/ (nom denum)
>   (if (and (= 0 nom) (= 0 denum))
>         0.0
> 	      (/ nom denum)))

Hm.  So under normal circumstances, this actually (in your code)
returns a rational number, right?  I would expect coercing a rational
to a float to take longer even than square roots and the like.

How about

(declaim (ftype (function ((signed-byte 11) (signed-byte 11)) single-float)))
(defun safe/ (nom denum)
  (declare #.*optimize*
           ;; I think this declaration is right
           (type (signed-byte 11) nom denum))
  (if (and (= 0 nom) (= 0 denum))
      0.0
      (/ (float nom) (float denum))))

(defun rgb-pto-hsv 
  ...)

does that make any difference?

Christophe

From: Juho Snellman
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 17:19:17 +0000
Message-ID: <slrnd586gl.sp3.jsnell@sbz-31.cs.Helsinki.FI>

<········@docwatson.org> wrote:
> I've profiled to find that this (and the caller) are the bottlenecks,
> but I didn't know if there was a way to find exactly what in the
> function was the exact bottleneck, so I just optimized the whole thing
> the best I could because it was small.  

I get the impression that you're using the instrumenting profiler
(SB-PROFILE) instead of the statistical profiler (SB-SPROF,
http://www.sbcl.org/manual/Statistical-Profiler.html). The latter is
probably going to be more useful in this situation. It would, for
example, show that lots of rationals are being constructed (as
Christophe noted) and that a lot of time is spent calling the generic
SQRT. It's also integrated into the disassembler, which usually helps
in locating the hotspots inside the function:

;     AF4C: L0:   D9FF             FCOS                       ; 1/1000 samples
;     AF4E:       9B               WAIT                       ; 278/1000 samples
;     AF4F:       DDD4             FSTD FR4                   ; 7/1000 samples
;     AF51:       897DF4           MOV [EBP-12], EDI          ; 7/1000 samples
;     AF54:       DDD8             FSTPD FR0                  ; 3/1000 samples
;     AF56:       DB45F4           FILD [EBP-12]

Anyway, I'd probably write the functions somewhere along these lines:

(declaim (inline safe/))
(defun safe/ (nom denum)
  (declare (single-float nom denum))
  (if (and (= 0 nom) (= 0 denum))
      0.0
      (/ nom denum)))

(defun rgb-pto-hsv (pixel &key (maxvalue 255))
  "transforms a pixel in rgb to hsv (vector triple"
  (declare #.*optimize*
           (type (simple-array (unsigned-byte 8) (3)) pixel)
           (type (unsigned-byte 8) maxvalue))
  (let ((R (float (aref pixel 0) 0.0))
	(G (float (aref pixel 1) 0.0))
	(B (float (aref pixel 2) 0.0)))
    (let ((ret (make-array 3 :element-type 'single-float)))
      (setf (aref ret 0)
	    (* maxvalue (cos (the (single-float -1.0 1.0)
			       (safe/ (* .5 (+ (- R G) (- R B)))
				      (sqrt (+ (expt (- R G) 2)
					       (* (- R B)
						  (- G B)))))))))
      (setf (aref ret 1)
	    (* maxvalue (- 1 (safe/ (* 3 (min R G B))
				    (+ R G B)))))
      (setf (aref ret 2)
	    (* .333333 (+ R G B)))
      ret)))

And just for laughs, here's the source of the latter function with
profiler annotations:

CL-USER> (profile-annotate-source #'rgb-pto-hsv)
([16/1001]
 (LAMBDA ()
   ([1/143]
    (DEFUN RGB-PTO-HSV (PIXEL MAXVALUE)
      (DECLARE (OPTIMIZE (SPEED 3) (SAFETY 0) (DEBUG 2))
	       (TYPE (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (3)) PIXEL)
	       (TYPE (UNSIGNED-BYTE 8) MAXVALUE))
      (LET ((R
	     ([1/143]
	      (FLOAT (AREF PIXEL 0) 0.0)))
	    (G
	     ([6/1001]
	      (FLOAT (AREF PIXEL 1) 0.0)))
	    (B
	     ([10/1001]
	      (FLOAT (AREF PIXEL 2) 0.0))))
	(LET ((RET (MAKE-ARRAY 3 ELEMENT-TYPE 'SINGLE-FLOAT)))
	  (SETF (AREF RET 0)
		([2/91]
		 (* MAXVALUE
		    ([293/1001]
		     (COS
		      (THE (SINGLE-FLOAT -1.0 1.0)
			([48/1001]
			 (SAFE/ ([4/1001]
				 (* 0.5
				    ([27/1001]
				     (+ ([1/91]
					 (- R G))
					([10/1001]
					 (- R B))))))
				([31/1001]
				 (SQRT
				  (+ ([2/1001]
				      (EXPT (- R G) 2))
				     ([3/1001]
				      (* ([2/1001]
					  (- R B))
					 (- G B))))))))))))))
	  (SETF (AREF RET 1)
		([31/1001]
		 (* MAXVALUE
		    ([9/1001]
		     (- 1
			([1/77]
			 (SAFE/ ([5/1001]
				 (* 3
				    ([1/143]
				     (MIN R G B))))
				([9/1001]
				 (+ R G B)))))))))
	  (SETF (AREF RET 2)
		([79/1001]
		 (* 0.333333
		    ([1/1001]
		     (+ R G B)))))
	  RET))))))

-- 
Juho Snellman
"Premature profiling is the root of all evil."

From: Adam Warner
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 00:14:34 +0000
Message-ID: <pan.2005.04.06.00.14.31.968877@consulting.net.nz>

On Tue, 05 Apr 2005 21:35:06 +0000, Dave Watson wrote:
> I am trying to get sbcl to return an array of floats in an optimized fashion:
> 
> (declaim (ftype (function (&optional) (simple-array single-float (3))) blah)) 
> (defun blah ()
> 	(declare (optimize (speed 3)))
> 	(vector (get-float1) (get-float2) (get-float3)))

So don't returning a vector of type T!

(defun get-float1 () (random 1f0))
(defun get-float2 () (random 1f0))
(defun get-float3 () (random 1f0))

(declaim (ftype (function (&optional) (simple-array single-float (3))) blah)) 
(defun blah ()
  (declare (optimize (speed 3)))
  (let ((array (make-array 3 :element-type 'single-float)))
    (setf (aref array 0) (get-float1))
    (setf (aref array 1) (get-float2))
    (setf (aref array 2) (get-float3))
  array))

> and I get:
> doing float to pointer coercion (cost 13).... 
> three times, once for each (get-float).
> 
> presumably this has something to do with using a function, 
> because the following works as expected:
> (vector 1 2 3)

(type-of (vector 1 2 3)) => (SIMPLE-VECTOR 3))

(type-of (blah)) => (SIMPLE-ARRAY SINGLE-FLOAT (3))

(vector 1 2 3) doesn't return the correct type.
 
> it also works by let'ing the array first, then setf'ing 
> each element, but that starts to muddy the point.

It doesn't muddy the point. How else are you going to generate a return
value _of the correct type_? You've got to use MAKE-ARRAY.
:INITIAL-CONTENTS can also be passed to MAKE-ARRAY.

> Is there a better way to do this?

No. Just use MAKE-ARRAY with the correct :ELEMENT-TYPE.

Regards,
Adam

From: Adam Warner
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 00:15:02 +0000
Message-ID: <pan.2005.04.06.00.14.54.349072@consulting.net.nz>

On Tue, 05 Apr 2005 21:35:06 +0000, Dave Watson wrote:
> I am trying to get sbcl to return an array of floats in an optimized fashion:
> 
> (declaim (ftype (function (&optional) (simple-array single-float (3))) blah)) 
> (defun blah ()
> 	(declare (optimize (speed 3)))
> 	(vector (get-float1) (get-float2) (get-float3)))

So don't return a vector of type T!

(defun get-float1 () (random 1f0))
(defun get-float2 () (random 1f0))
(defun get-float3 () (random 1f0))

(declaim (ftype (function (&optional) (simple-array single-float (3))) blah)) 
(defun blah ()
  (declare (optimize (speed 3)))
  (let ((array (make-array 3 :element-type 'single-float)))
    (setf (aref array 0) (get-float1))
    (setf (aref array 1) (get-float2))
    (setf (aref array 2) (get-float3))
  array))

> and I get:
> doing float to pointer coercion (cost 13).... 
> three times, once for each (get-float).
> 
> presumably this has something to do with using a function, 
> because the following works as expected:
> (vector 1 2 3)

(type-of (vector 1 2 3)) => (SIMPLE-VECTOR 3))

(type-of (blah)) => (SIMPLE-ARRAY SINGLE-FLOAT (3))

(vector 1 2 3) doesn't return the correct type.
 
> it also works by let'ing the array first, then setf'ing 
> each element, but that starts to muddy the point.

It doesn't muddy the point. How else are you going to generate a return
value _of the correct type_? You've got to use MAKE-ARRAY.
:INITIAL-CONTENTS can also be passed to MAKE-ARRAY.

> Is there a better way to do this?

No. Just use MAKE-ARRAY with the correct :ELEMENT-TYPE.

Regards,
Adam

From: Dave Watson
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 00:27:21 +0000
Message-ID: <slrnd56b78.ci1.djwatson@attu2.cs.washington.edu>

In article <······························@consulting.net.nz>, Adam Warner wrote:
> On Tue, 05 Apr 2005 21:35:06 +0000, Dave Watson wrote:
>> it also works by let'ing the array first, then setf'ing 
>> each element, but that starts to muddy the point.
> 
> It doesn't muddy the point. How else are you going to generate a return
> value _of the correct type_? You've got to use MAKE-ARRAY.
>:INITIAL-CONTENTS can also be passed to MAKE-ARRAY.

Ok, this is the approach I've taken for now then...

>> Is there a better way to do this?
> 
> No. Just use MAKE-ARRAY with the correct :ELEMENT-TYPE.

One still must use the setf's with this approach, instead of using
:initial-contents, correct?   As initial-contents requires a list, 
getting into the same mess as before...

Thanks for the clarification

-Dave Watson
········@docwatson.org

From: Adam Warner
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 01:11:53 +0000
Message-ID: <pan.2005.04.06.01.11.50.910008@consulting.net.nz>

On Wed, 06 Apr 2005 00:27:21 +0000, Dave Watson wrote:
> In article <······························@consulting.net.nz>, Adam
> Warner wrote:
>> On Tue, 05 Apr 2005 21:35:06 +0000, Dave Watson wrote:
>>> it also works by let'ing the array first, then setf'ing each element,
>>> but that starts to muddy the point.
>> 
>> It doesn't muddy the point. How else are you going to generate a return
>> value _of the correct type_? You've got to use MAKE-ARRAY.
>>:INITIAL-CONTENTS can also be passed to MAKE-ARRAY.
> 
> Ok, this is the approach I've taken for now then...
> 
>>> Is there a better way to do this?
>> 
>> No. Just use MAKE-ARRAY with the correct :ELEMENT-TYPE.
> 
> One still must use the setf's with this approach, instead of using
> :initial-contents, correct?   As initial-contents requires a list,
> getting into the same mess as before...

From the HyperSpec definition of MAKE-ARRAY: "initial-contents is composed
of a nested structure of sequences." e.g. (make-array 3 :initial-contents
#(1 2 3)) also works.

The problem is that you are evaluating the initial contents at run time.
So you would need to build a sequence at run time to pass to
:INITIAL-CONTENTS. You'd actually build two sequences and would be likely
to generate more garbage. That's why it's better to just make the array
and then fill each element in.

Note that "If initial-contents is not supplied, the consequences of later
reading an uninitialized element of new-array are undefined unless either
initial-element is supplied or displaced-to is non-nil."

This means the implementation is not required to first initialise the
array elements to zero before you later setf them. You can think of the
non-initialised version of make-array as a fast version of C's malloc (I
say fast because Lisp implementations are typically faster at allocating
heap memory than malloc because a good malloc has to also attempt to keep
memory fragmentation low and this is an algorithmically complex task).

[An implementation could sill zero the initial elements in order to reduce
false positives from, e.g., conservative garbage collection. But the
implementation might also deduce that the elements are immediately being
filled in and choose not to zero them first. The global function calls in
the example would complicate such a deduction]

[Any Lisp implementation also creates overhead setting up the array
header. This should not outweigh the larger cost of C allocating such a
small array via malloc. Lisp implementations are typically really good at
allocating small objects]

Regards,
Adam

From: Adam Warner
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 01:33:55 +0000
Message-ID: <pan.2005.04.06.01.33.52.677218@consulting.net.nz>

On Wed, 06 Apr 2005 13:11:53 +1200, Adam Warner wrote:
> The problem is that you are evaluating the initial contents at run time.
> So you would need to build a sequence at run time to pass to
> :INITIAL-CONTENTS. You'd actually build two sequences and would be likely
> to generate more garbage. That's why it's better to just make the array
> and then fill each element in.

Here's another approach where one first builds a sequence of the wrong
type and then coerces it to the correct type:

(coerce (vector (random 1f0) (random 1f0) (random 1f0))
        '(simple-array single-float (*)))

While a Sufficiently Smart Compiler could transform this to code that
skips the initial generation of the type T vector, SBCL does not have such
a SSC (they're a mythical construct):

* (lambda ()
    (declare (optimize (speed 3) (safety 0) (debug 0)))
    (coerce (vector (random 1f0) (random 1f0) (random 1f0))\
            '(simple-array single-float (*))))

; in: LAMBDA NIL
;     (VECTOR (RANDOM 1.0) (RANDOM 1.0) (RANDOM 1.0))
; --> LET PROGN LET LOCALLY SETF SB-KERNEL:%SVSET SB-KERNEL:%ASET LET*
; --> SB-KERNEL:HAIRY-DATA-VECTOR-SET MULTIPLE-VALUE-BIND MULTIPLE-VALUE-CALL
; --> FUNCTION
; ==>
;   (SB-KERNEL:DATA-VECTOR-SET ARRAY SB-INT:INDEX SB-C::NEW-VALUE)
;
; note: doing float to pointer coercion (cost 13), for:
;       the second argument of DATA-VECTOR-SET/SIMPLE-VECTOR-C
;
; note: doing float to pointer coercion (cost 13), for:
;       the second argument of DATA-VECTOR-SET/SIMPLE-VECTOR-C
;
; note: doing float to pointer coercion (cost 13), for:
;       the second argument of DATA-VECTOR-SET/SIMPLE-VECTOR-C
; compilation unit finished
;   printed 3 notes
#<FUNCTION (LAMBDA ()) {93062DD}>

The compiler notes indicate that the randomly generated single floats are
first assigned to the vector of type T. A disassembly will also confirm
the call to ALLOCATE-VECTOR.

Regards,
Adam

From: Duane Rettig
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 02:31:34 +0000
Message-ID: <4u0mkbpq1.fsf@franz.com>

Adam Warner <······@consulting.net.nz> writes:

> On Wed, 06 Apr 2005 13:11:53 +1200, Adam Warner wrote:
> > The problem is that you are evaluating the initial contents at run time.
> > So you would need to build a sequence at run time to pass to
> > :INITIAL-CONTENTS. You'd actually build two sequences and would be likely
> > to generate more garbage. That's why it's better to just make the array
> > and then fill each element in.
> 
> Here's another approach where one first builds a sequence of the wrong
> type and then coerces it to the correct type:
> 
> (coerce (vector (random 1f0) (random 1f0) (random 1f0))
>         '(simple-array single-float (*)))

Dangerous: that won't work too well in a 64-bit lisp...


-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Christophe Rhodes
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 08:09:09 +0000
Message-ID: <sq3bu4qqca.fsf@cam.ac.uk>

Duane Rettig <·····@franz.com> writes:

> Adam Warner <······@consulting.net.nz> writes:
>
>> Here's another approach where one first builds a sequence of the wrong
>> type and then coerces it to the correct type:
>> 
>> (coerce (vector (random 1f0) (random 1f0) (random 1f0))
>>         '(simple-array single-float (*)))
>
> Dangerous: that won't work too well in a 64-bit lisp...

?!??!

Do you mean that you can't simply alter the vector's tag bits, because
a T vector takes 64 bits per element and a single-float vector 32?  I
don't think you can normally smash the tag bits anyway, because a
simple vector contains pointers and, probably, a (simple-array
single-float (*)) contains the single float bits.

What am I missing?

Christophe

From: Duane Rettig
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 15:04:33 +0000
Message-ID: <44qekj69q.fsf@franz.com>

Christophe Rhodes <·····@cam.ac.uk> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Adam Warner <······@consulting.net.nz> writes:
> >
> >> Here's another approach where one first builds a sequence of the wrong
> >> type and then coerces it to the correct type:
> >> 
> >> (coerce (vector (random 1f0) (random 1f0) (random 1f0))
> >>         '(simple-array single-float (*)))
> >
> > Dangerous: that won't work too well in a 64-bit lisp...
> 
> ?!??!
> 
> Do you mean that you can't simply alter the vector's tag bits, because
> a T vector takes 64 bits per element and a single-float vector 32?

Yes, that's what I meant, but obviously the technique I had been
thinking about wouldn't work anyway, because of your reasoning.

I could claim that the fact that I have been sick the last few days
has clouded not only my thinking but my sinuses, or I could make the
excuse that I was thinking of more reliable techniques, like perhaps
blasting vectors full of fixnums into vectors full of signed or
unsigned integers with a *4 or *8 factor  - we do not tend to do such
tag blasting, but we do pun fixnums as natural-word-aligned addresses
for static values like stacks, etc. - they are called "aligned"
pointers:
http://www.franz.com/support/documentation/7.0/doc/ftype.htm#aligned-pointers-1

(though I do notice that this documentation does not address the 64-bit
lisps, where the factor is 8).

Now, which excuse shall I use?  Choices, choices :-)

>  I
> don't think you can normally smash the tag bits anyway, because a
> simple vector contains pointers and, probably, a (simple-array
> single-float (*)) contains the single float bits.
> 
> What am I missing?

Absolutely nothing.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182

From: Adam Warner
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 03:04:23 +0000
Message-ID: <pan.2005.04.06.03.04.21.793199@consulting.net.nz>

On Tue, 05 Apr 2005 19:31:34 -0700, Duane Rettig wrote:
>> Here's another approach where one first builds a sequence of the wrong
>> type and then coerces it to the correct type:
>> 
>> (coerce (vector (random 1f0) (random 1f0) (random 1f0))
>>         '(simple-array single-float (*)))
> 
> Dangerous: that won't work too well in a 64-bit lisp...

I suspect many will be surprised to learn this Duane. Would you mind
explaining why?

Regards,
Adam

PS: SBCL for AMD64 happily performs the array conversion. If you can show
it's non-compliant/undefined behaviour this may not work much longer!

From: Duane Rettig
Subject: Re: optimizing simple-arrays
Date: Wed, 06 Apr 2005 14:51:20 +0000
Message-ID: <48y3wj6vr.fsf@franz.com>

Adam Warner <······@consulting.net.nz> writes:

> On Tue, 05 Apr 2005 19:31:34 -0700, Duane Rettig wrote:
> >> Here's another approach where one first builds a sequence of the wrong
> >> type and then coerces it to the correct type:
> >> 
> >> (coerce (vector (random 1f0) (random 1f0) (random 1f0))
> >>         '(simple-array single-float (*)))
> > 
> > Dangerous: that won't work too well in a 64-bit lisp...
> 
> I suspect many will be surprised to learn this Duane. Would you mind
> explaining why?

It may be that I've misunderstood the original intention of the technique.
Coerce would certainly not smash the array type in-place, but some
internal::coerce-in-place functionality might, and that was what I was
thinking had been the intention.  Of course the coerce will work, but
you'd find that it had created a copy of the vector (also as expected).

> PS: SBCL for AMD64 happily performs the array conversion. If you can show
> it's non-compliant/undefined behaviour this may not work much longer!

No, there would be no undefined behavior.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182