From: Johann Hibschman Subject: Optimizing general array functions? Date: Message-ID: <60s76u$pr4@agate.berkeley.edu>

I've been trying to figure out the best way to write a function which will take an arbitrary number of 1D arrays of double-float's and sum them, like (a+ a1 a2 a3). It's easy to write a function that does this, but I can't quite figure out how to optimize it. (I'm using CMUCL, btw.) Does anyone have any suggestions? I've tried implementing the routine as follows: ;;; Test different methods of applying numerical operators to ;;; multiple vectors. (declaim (optimize (speed 3) (safety 0))) (defun a+ (&rest arrays) (let* ((n (apply #'min (mapcar #'length arrays))) (out (make-array n :element-type 'double-float))) (declare (fixnum n)) (dotimes (i n) (declare (fixnum i)) (setf (aref out i) (apply #'+ (mapcar #'(lambda (a) (aref a i)) arrays)))) out)) (defun a+2 (&rest arrays) (let* ((n (reduce #'min arrays :key #'length)) (out (make-array n :element-type 'double-float))) (declare (fixnum n)) (dotimes (i n) (declare (fixnum i)) (setf (aref out i) (reduce #'+ arrays :key (lambda (a) (aref a i))))) out)) (defun a+3 (&rest arrays) (let* ((n (apply #'min (mapcar #'length arrays))) (out (make-array n :element-type 'double-float))) (declare (fixnum n)) (apply #'map-into out #'+ arrays))) (defun test (n-arrays n-elements n-times) (declare (fixnum n-arrays n-elements n-times)) (let ((alist nil)) (dotimes (i n-arrays) (declare (fixnum i)) (push (make-array n-elements :element-type 'double-float :initial-element (* 1d0 i)) alist)) (time (dotimes (i n-times) (declare (fixnum i)) (apply #'a+ alist))) (time (dotimes (i n-times) (declare (fixnum i)) (apply #'a+2 alist))) (time (dotimes (i n-times) (declare (fixnum i)) (apply #'a+3 alist))))) -- Johann A. Hibschman | Grad student in Physics, working in Astronomy. ······@physics.berkeley.edu | Probing pulsar pair production processes.

- Re: Optimizing general array functions?
**Bernhard Pfahringer**- Re: Optimizing general array functions?
**Johann Hibschman**- Re: Optimizing general array functions?
**Bernhard Pfahringer**

- Re: Optimizing general array functions?

- Re: Optimizing general array functions?
- Re: Optimizing general array functions?
**Raymond Toy** - Re: Optimizing general array functions?
**Kelly Murray**

From: Bernhard Pfahringer Subject: Re: Optimizing general array functions? Date: Message-ID: <60tfjm$ia2$1@ftp.univie.ac.at>

In article <··········@agate.berkeley.edu>, Johann Hibschman <······@physics12.Berkeley.EDU> wrote: >I've been trying to figure out the best way to write a function >which will take an arbitrary number of 1D arrays of double-float's >and sum them, like (a+ a1 a2 a3). > > >(declaim (optimize (speed 3) (safety 0))) > >(defun a+ (&rest arrays) > (let* ((n (apply #'min (mapcar #'length arrays))) > (out (make-array n :element-type 'double-float))) > (declare (fixnum n)) > (dotimes (i n) > (declare (fixnum i)) > (setf (aref out i) (apply #'+ (mapcar #'(lambda (a) (aref a i)) > arrays)))) > out)) > Hows about the following (plus more declarations): (defun a+ (&rest arrays) (loop with n = (loop for a in arrays minimize (length a)) with out = (make-array n :element-type 'double-float) for i below n do (setf (aref out i) (loop for a in arrays sum (aref a i))) finally (return out))) But I guess that especially for larger and/or more vectors the following should be more efficient due to locality of access reasons: (defun a+ (&rest arrays) (loop with n = (loop for a in arrays minimize (length a)) with out = (make-array n :element-type 'double-float :initial-element 0.0d0) for a in arrays do (loop for i below n do (incf (aref out i) (aref a i))) finally (return out))) cheers, Bernhard -- -------------------------------------------------------------------------- Bernhard Pfahringer Austrian Research Institute for http://www.ai.univie.ac.at/~bernhard/ Artificial Intelligence ········@ai.univie.ac.at

From: Johann Hibschman Subject: Re: Optimizing general array functions? Date: Message-ID: <60uc9m$99@agate.berkeley.edu>

In article <············@ftp.univie.ac.at>, Bernhard Pfahringer <········@korb.ai.univie.ac.at> wrote: >But I guess that especially for larger and/or more vectors >the following should be more efficient due to locality of access >reasons: I've been trying to understand these "locality of access issues," so if you could elaborate a bit more, I'd appreciate it. I know it's better to access many data elements close to each other in memory, due to caches, etc. However, I see the C++ template numerics people saying that their method is great because they implement (for vectors a, b, c, d) a = b + c + d; as for(i=0; i<N; i++) a[i] = b[i] + c[i] + d[i]; rather than as lots of pairwise interactions a = b + (c + d). double temp[N]; for(i=0; i<N; i++) temp1[i] = c[i]+d[i]; for(i=0; i<N; i++) a[i] = b[i]+temp[i]; If the former is better, it's either because the cost of generating the temporary, or because it's somehow better to only have to access each element of 'a' once. One 'easy method' in Lisp seems to be in between the two, in C terms: for(i=0; i<N; i++) a[i] = d[i]; for(i=0; i<N; i++) a[i] += c[i] for(i=0; i<N; i++) a[i] += d[i] which avoids the temporary, and only involves two arrays in each loop, but which contains the looping overhead multiple times. If temporary allocation is the problem, this is fine. If accessing 'a' or the looping overhead is the problem, then this is bad. What mental picture do I need to decide between the two? >(defun a+ (&rest arrays) > (loop with n = (loop for a in arrays minimize (length a)) > with out = (make-array n > :element-type 'double-float > :initial-element 0.0d0) > for a in arrays do > (loop for i below n do > (incf (aref out i) (aref a i))) > finally > (return out))) I haven't yet figured out the loop macro. How would you insert the appropriate declarations into this? - Johann -- Johann A. Hibschman | Grad student in Physics, working in Astronomy. ······@physics.berkeley.edu | Probing pulsar pair production processes.

From: Bernhard Pfahringer Subject: Re: Optimizing general array functions? Date: Message-ID: <60vqtc$13fo$1@ftp.univie.ac.at>

In article <·········@agate.berkeley.edu>, Johann Hibschman <······@physics12.Berkeley.EDU> wrote: >In article <············@ftp.univie.ac.at>, > >I've been trying to understand these "locality of access issues," so >if you could elaborate a bit more, I'd appreciate it. > >I know it's better to access many data elements close to each other in >memory, due to caches, etc. However, I see the C++ template numerics >people saying that their method is great because they implement (for >vectors a, b, c, d) > >a = b + c + d; > >as > >for(i=0; i<N; i++) > a[i] = b[i] + c[i] + d[i]; > For 3 vectors this is the better solution I guess, especially as the temp-sum can be kept in a register. The problem with this kind of solution starts when the number of vectors is so large that accessing zzz[i] causes the prefetched range of values b[i]-b[i+SOME-N] to drop out of the cache. So for a small number of vectors the above solution rules, whereas for a larger number the alternative is better. The exact breakeven point depends on the specifics of your cache. And if your vectors are large enough as to not fit into main memory all at once (i.e. paging occurs), then I guess the alternative is better, too. But you never know without benchmarking. > >I haven't yet figured out the loop macro. How would you insert the >appropriate declarations into this? > Easy-peasy, I should have included some, but I was lazy, the following compiles with no notes at all: (defun a+ (&rest arrays) (loop with n = (loop for a of-type (simple-array double-float (*)) in arrays minimize (length a)) with out = (make-array (the integer n) :element-type 'double-float :initial-element 0.0d0) for a of-type (simple-array double-float (*)) in arrays do (loop for i below n do (incf (aref out i) (aref a i))) finally (return out))) cheers, Bernhard -- -------------------------------------------------------------------------- Bernhard Pfahringer Austrian Research Institute for http://www.ai.univie.ac.at/~bernhard/ Artificial Intelligence ········@ai.univie.ac.at

From: Raymond Toy Subject: Re: Optimizing general array functions? Date: Message-ID: <4nyb4d8wvm.fsf@rtp.ericsson.se>

>>>>> "Johann" == Johann Hibschman <······@physics12.Berkeley.EDU> writes: Johann> I've been trying to figure out the best way to write a function Johann> which will take an arbitrary number of 1D arrays of double-float's Johann> and sum them, like (a+ a1 a2 a3). Johann> It's easy to write a function that does this, but I can't quite Johann> figure out how to optimize it. (I'm using CMUCL, btw.) Johann> Does anyone have any suggestions? I've tried implementing the Johann> routine as follows: [code deleted] How about this? I get just one compiler note for make-array. I think this does what you want: (defun a+ (&rest arrays) (declare (optimize (speed 3))) (let* ((n (reduce #'min arrays :key #'length)) (out (make-array n :element-type 'double-float :initial-element 0d0))) (declare (fixnum n)) (dolist (array arrays out) (declare (type (simple-array double-float (*)) array)) (dotimes (k n) (incf (aref out k) (aref array k)))))) I assume each array is a simple-array of double-float's. Ray

From: Kelly Murray Subject: Re: Optimizing general array functions? Date: Message-ID: <60vame$iaf$1@vapor.franz.com>

In article <··········@agate.berkeley.edu>, ······@physics12.Berkeley.EDU (Johann Hibschman) writes: >> I've been trying to figure out the best way to write a function >> which will take an arbitrary number of 1D arrays of double-float's >> and sum them, like (a+ a1 a2 a3). >> >> It's easy to write a function that does this, but I can't quite >> figure out how to optimize it. (I'm using CMUCL, btw.) >> >> (declaim (optimize (speed 3) (safety 0))) >> >> (defun a+ (&rest arrays) >> (let* ((n (apply #'min (mapcar #'length arrays))) >> (out (make-array n :element-type 'double-float))) >> (declare (fixnum n)) >> (dotimes (i n) >> (declare (fixnum i)) >> (setf (aref out i) (apply #'+ (mapcar #'(lambda (a) (aref a i)) >> arrays)))) >> out)) The real key to speeding this stuff up is using declarations for the arrays so the compiler can opencode the float operations without consing up a bunch of temporary float objects. This likely means you can't #'apply or #'function the operator, but it must be called directly in the source on the declared arg types. This leads to writing a macro instead of a function, that can expand into the fast code. It may or maynot crimp your style, since as a macro, it can't be passed as a function, but you can define a "slow" version as above to handle the general case. the "op" could be a closure, and not a simple arithmetic symbol, but that would take a small change to the macro. The macros handles all the variable args stuff at "compile" time. Here is an implementation. It is "hairy", but it will produce highly optimal code (at least under ACL) Hope this is helpful. -Kelly Murray ···@franz.com (defmacro float-vector-operation (op &rest vectors) (let* ((vbinds (mapcar #'(lambda (v)`(,(gensym) ,v)) vectors)) (vvars (mapcar #'car vbinds)) ) `(let* ( ,@vbinds ;; each vector arg is evaluated, and bound to a local ;; now find the minimum length (n (min ,@vvars)) ;; make the result vector (out (make-array n :element-type 'double-float)) ) (declare (fixnum n) (type (simple-array double-float (*)) out) (type (simple-array double-float (*)) ,@vvars) (optimize (speed 3) (space 0) (safety 0)) ;; yee-ha! ) (dotimes (i n) (declare (fixnum i)) (setf (aref out i) (,op ,@(mapcar #'(lambda (v)`(aref ,v i)) vvars)))) out)))) Here is an example of how it would be called: (let* ((v1+v2 (float-vector-operation + v1 v2))) ...) and what it expands into: (let* ((v1+v2 (let* ((#:gensym1 v1) (#:gensym2 v2) (n (min #:gensym1 #:gensym2)) (out (make-array n :element-type 'double-float)) ) (declare (fixnum n) (type (simple-array double-float (*)) out) (type (simple-array double-float (*)) #:gensym1 #:gensym2) (optimize (speed 3) (space 0) (safety 0)) (dotimes (i n) (declare (fixnum i)) (setf (aref out i) (+ (aref #:gensym1 i) (aref #:gensym2 i)))) out))) ..)