I'm debating between two methods of storing and manipulating 2D and 3D
vectors: a struct or a 1D array. To test this, I whipped up a test
routine and structure:
(defstruct v3 x y z)

(defun v3-cross (v1 v2)
  "Returns the cross product of two 3D vectors."
  (make-v3 :x (- (* (v3-y v1) (v3-z v2)) (* (v3-z v1) (v3-y v2)))
           :y (- (* (v3-z v1) (v3-x v2)) (* (v3-x v1) (v3-z v2)))
           :z (- (* (v3-x v1) (v3-y v2)) (* (v3-y v1) (v3-x v2)))))
(defun v3-across (v1 v2)
  "Returns the cross product of two 3D vectors."
  (vector (- (* (aref v1 1) (aref v2 2)) (* (aref v1 2) (aref v2 1)))
          (- (* (aref v1 2) (aref v2 0)) (* (aref v1 0) (aref v2 2)))
          (- (* (aref v1 0) (aref v2 1)) (* (aref v1 1) (aref v2 0)))))
Now, both of these could be improved by storing x, y, and z in local
variables, but the point of the test is to see whether, overall, one
method of access really beats out the other (this is running on
LispWorks).
What gets me is that using a benchmark macro, the TIME or PROFILE
functions over 1,000,000+ iterations return very varied results each
time. I'm assuming this is caused by the garbage collector kicking in
at inopportune times?
As example numbers that I'm getting, using TIME and 1M iterations of a
single cross product on each gives me:
V3-CROSS: 3.640s, 72003376 bytes standard / 11004213 bytes conses
V3-ACROSS: 3.343s, 96003216 bytes standard / 11004257 bytes conses
Yet running the same test again results in almost swapped values (the
CROSS version runs faster than the ACROSS version). Just wondering if
anyone out there has any suggestions on how to better profile these
functions.
Note: this is not a quest for the fastest cross product... rather the
quest for the better storage method. However, learning how to profile
functions better is a good side-quest ;)
Jeff M.
--
http://www.retrobyte.org
··············@gmail.com
"Jeff" <·······@gmail.com> writes:
> I'm debating between two methods of storing and manipulating 2D and 3D
> vectors: a struct or a 1D array.
There should not be a lot of difference.
For example, in clisp, note how the same thing is done both in the
case of vectors and in the case of structures:
LISPFUNNR(svref,2)
{ /* (SVREF simple-vector index), CLTL p. 291 */
/* check simple-vector: */
if (!simple_vector_p(STACK_1))
fehler_kein_svector(TheSubr(subr_self)->name,STACK_1);
/* check index: */
var uintL index = test_index();
/* fetch element: */
VALUES1(TheSvector(STACK_1)->data[index]);
skipSTACK(2);
}
/* Subroutine for record access functions
> STACK_1: record argument
> STACK_0: index argument
< STACK: cleared up
< returns: the address of the referred record item */
local gcv_object_t* record_up (void) {
/* the record must be a Closure/Structure/Stream/OtherRecord: */
if_recordp(STACK_1, ; , { skipSTACK(1); fehler_record(); } );
var object record = STACK_1;
var uintL length = Record_length(record);
var uintL index;
if (!(posfixnump(STACK_0) && ((index = posfixnum_to_L(STACK_0)) < length)))
/* extract and check index */
fehler_index(length);
skipSTACK(2); /* clear up stack */
return &TheRecord(record)->recdata[index]; /* record element address */
}
LISPFUNNR(record_ref,2)
{ /* (SYS::%RECORD-REF record index) return the index'th entry in the record */
VALUES1(*(record_up())); /* record element as value */
}
An implementation could easily use exactly the same low-level data
structure for both structures and (vector t). Note how the CLHS even
gives the user the option to reveal such an implementation choice
with the :type option of defstruct.
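To illustrate that option, here is a minimal sketch (the V3T name is
mine, not from the thread):

    ;; With :type, DEFSTRUCT lays the "structure" out as a plain
    ;; vector, so the slot accessors and AREF address the very same
    ;; representation.
    (defstruct (v3t (:type vector)) x y z)

    (let ((v (make-v3t :x 1.0 :y 2.0 :z 3.0)))
      (list (v3t-y v)     ; slot accessor
            (aref v 1)))  ; same slot, by index

Both expressions read the same cell, so the list contains 2.0 twice.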
So you will choose one or the other depending on how you use the
fields. If the operations on your objects can be expressed simply
with absolute names, a structure will do, and it _could_ be slightly
faster with a highly optimizing compiler. On the other hand, if you
have complex operations that are better expressed with indices, you'd
be better off using vectors. With few dimensions, it's probably a
better bet to choose structures, maybe with some macrology to expand
indexed notation into absolute "fields", but I would not bother.
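The macrology mentioned could be as small as this sketch (V3-REF is a
hypothetical name; it assumes the V3 struct from the original post):

    ;; The V3 struct from the original post:
    (defstruct v3 x y z)

    ;; Expand a literal index into the corresponding slot accessor,
    ;; so indexed notation compiles down to a direct struct access.
    (defmacro v3-ref (v i)
      (ecase i
        (0 `(v3-x ,v))
        (1 `(v3-y ,v))
        (2 `(v3-z ,v))))

Note that (v3-ref v 1) expands to (v3-y v); the index must be a
literal known at macroexpansion time for this trivial version to work.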
--
__Pascal Bourguignon__ http://www.informatimago.com/
The world will now reboot; don't bother saving your artefacts.
Pascal Bourguignon wrote:
>
> "Jeff" <·······@gmail.com> writes:
> > I'm debating between two methods of storing and manipulating 2D and
> > 3D vectors: a struct or a 1D array.
>
> There should not be a lot of difference.
Actually, for what they are being used for, there could potentially be
a lot of difference (which is why I'll be using vectors), considering
other functions that exist in the standard.
That's why I really didn't want this to turn into a debate between
structs and vectors (in this particular case). I was more interested in
why the profiling seemed to bounce back and forth. Thanks for the
reply, though. :)
Jeff M.
--
http://www.retrobyte.org
··············@gmail.com
For speed, the most important thing is to declare the number types:
single-float, double-float, fixnum, or whatever. Use :type in
defstruct, and
(declare (double-float ...)) or
(declare (type (simple-array double-float (*)) ...)) after let and defun.
Then compile with
(declaim (optimize speed)). Or even
(declaim (optimize speed (safety 0) (debug 0))),
but read your compiler's manual for why (safety 0) is way risky.
In v3-across, consider simple-vector and svref instead of array/vector
and aref.
To be absolutely pedantic, wrap (the double-float ...) around every
function call that returns a double-float.
All this is ugly. But you can experiment, remove the changes that
don't matter, and wrap some of the others in a macro.
After compilation with the speed setting, compare
(disassemble 'v3-cross) and
(disassemble 'v3-across)
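Putting that advice together, a declared version of the array variant
might look like this sketch (V3-ACROSS-FAST is my name; it assumes
double-float components rather than the single-floats in the original
test data):

    (declaim (optimize speed))

    (defun v3-across-fast (v1 v2)
      ;; A specialized simple-array lets the compiler use unboxed
      ;; double-float arithmetic instead of generic dispatch.
      (declare (type (simple-array double-float (3)) v1 v2))
      (let ((r (make-array 3 :element-type 'double-float)))
        (setf (aref r 0) (- (* (aref v1 1) (aref v2 2))
                            (* (aref v1 2) (aref v2 1)))
              (aref r 1) (- (* (aref v1 2) (aref v2 0))
                            (* (aref v1 0) (aref v2 2)))
              (aref r 2) (- (* (aref v1 0) (aref v2 1))
                            (* (aref v1 1) (aref v2 0))))
        r))

Disassembling this alongside the undeclared version should make the
boxing and generic-arithmetic overhead visible.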
···············@yahoo.com wrote:
> For speed, the most important thing is to declare the number types:
> single-float, double-float, fixnum, or whatever. Use :type in
> defstruct, and
> (declare (double-float ...)) or
> (declare (type (simple-array double-float (*)) ...)) after let and defun.
>
> Then compile with
> (declaim (optimize speed)). Or even
> (declaim (optimize speed (safety 0) (debug 0))),
> but read your compiler's manual for why (safety 0) is way risky.
>
> In v3-across, consider simple-vector and svref instead of array/vector
> and aref.
>
> To be absolutely pedantic, wrap (the double-float ...) around every
> function call that returns a double-float.
>
> All this is ugly. But you can experiment, remove the changes that
> don't matter, and wrap some of the others in a macro.
>
> After compilation with the speed setting, compare
> (disassemble 'v3-cross) and
> (disassemble 'v3-across)
Yes, that'll work.
It'll *work*, but...
Instead of declaring parameters to be of type fixnum, declare them to be
something more limited. For example, if one parameter can only be between 0
and 9, declare it as (integer 0 9).
Type inference will work better, *and* you get a bonus assertion at high
safety levels. (1+ a) can return a bignum if a is declared as a fixnum, but
not if a is an (integer 0 1023), or similar.
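As a sketch of that point (the function name is mine):

    (defun next-digit (a)
      ;; With the narrow declaration the compiler can prove the result
      ;; fits in (integer 1 10), so no bignum code path is generated;
      ;; with a plain FIXNUM declaration, (1+ a) could overflow.
      (declare (type (integer 0 9) a)
               (optimize speed))
      (1+ a))

At high safety levels, the declaration doubles as a runtime check that
the argument really is a digit.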
On Tue, 07 Dec 2004 07:07:14 GMT, <·······@gmail.com> wrote:
> structs and vectors (in this particular case). I was more interested in
> why the profiling seemed to bounce back and forth. Thanks for the
The OS, especially if it's windoze.
You need to run multiple trials in such circumstances.
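For instance, one simple multiple-trial harness (BEST-OF is a
hypothetical helper, not from the thread):

    ;; Run FN several times and report the minimum wall-clock time in
    ;; internal time units; the minimum is less sensitive to OS
    ;; scheduling noise than the mean.
    (defun best-of (fn &optional (trials 5))
      (loop repeat trials
            minimize (let ((start (get-internal-real-time)))
                       (funcall fn)
                       (- (get-internal-real-time) start))))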
--
Everyman has three hearts;
one to show the world, one to show friends, and one only he knows.
Jeff wrote:
>
> What gets me, is that using a benchmark macro, the TIME or PROFILE
> functions over 1,000,000+ iterations returns very varied results each
> time. I'm assuming that this is being caused by the garbage collector
> possibly kicking in at inopportune times?
>
> As example numbers that I'm getting, using TIME and 1M iterations of a
> single cross product on each gives me:
>
> V3-CROSS: 3.640s, 72003376 bytes standard / 11004213 bytes conses
> V3-ACROSS: 3.343s, 96003216 bytes standard / 11004257 bytes conses
>
> Yet, running the same test again results in almost swapped values (the
> CROSS versions runs faster than the ACROSS version). Just wondering if
> anyone out there had any suggestions as to how and better profile these
> functions.
>
There are three possible areas that can introduce uncertainty into
the timing: multiprocessing, interrupts, and GC. I seem to be able to
reduce the uncertainty by using:
mp:without-preemption
mp:without-interrupts
hcl:with-heavy-allocation
See the reference manual under the proper sections:
http://www.lispworks.com/reference/lw43/LWRM/html/lwref.htm
You could try this version. It seems to be more stable for timing the
operation; at least one has more control over what my-time-function
does than over what is available in TIME.
(defun v3-cross-test ()
  (let ((v1 (make-v3 :x 3.0 :y 1.23 :z 4.5))
        (v2 (make-v3 :x 4.5 :y 19.43 :z 3.1)))
    (dotimes (i 1000000)
      (v3-cross v1 v2))))

(defun v3-across-test ()
  (let ((v1 (vector 3.0 1.23 4.5))
        (v2 (vector 4.5 19.43 3.1)))
    (dotimes (i 1000000)
      (v3-across v1 v2))))
(defun my-time-function (function)
  (mp:without-preemption
    (mp:without-interrupts
      (hcl:with-heavy-allocation
        (let ((start-time (get-internal-real-time)))
          (funcall function)
          (prog1
              (float (/ (- (get-internal-real-time) start-time)
                        internal-time-units-per-second))
            (hcl:gc-if-needed)))))))
Seems I have to run it a few times for it to stabilize:
CL-USER 60 > (my-time-function #'v3-cross-test)
2.985
CL-USER 61 > (my-time-function #'v3-cross-test)
3.025
CL-USER 62 > (my-time-function #'v3-cross-test)
3.114
CL-USER 63 > (my-time-function #'v3-cross-test)
3.135
CL-USER 64 > (my-time-function #'v3-cross-test)
3.155
CL-USER 65 > (my-time-function #'v3-cross-test)
2.653
CL-USER 66 > (my-time-function #'v3-cross-test)
2.653
CL-USER 67 > (my-time-function #'v3-cross-test)
2.654
CL-USER 68 > (my-time-function #'v3-cross-test)
2.654
CL-USER 69 > (my-time-function #'v3-cross-test)
2.674
CL-USER 70 > (my-time-function #'v3-cross-test)
2.654
CL-USER 71 > (my-time-function #'v3-cross-test)
2.474
CL-USER 72 > (loop for i from 0 below 30
                   sum (my-time-function #'v3-cross-test) into total
                   finally (return (/ total 30)))
2.6111

CL-USER 73 > (loop for i from 0 below 30
                   sum (my-time-function #'v3-across-test) into total
                   finally (return (/ total 30)))
3.560133333333333
CL-USER 74 >
Wade
On second thought it may be better to do:
(defun my-time-function (function)
  (mp:without-preemption
    (mp:without-interrupts
      (unwind-protect
          (progn
            (hcl:avoid-gc)
            (let ((start-time (get-internal-real-time)))
              (funcall function)
              (float (/ (- (get-internal-real-time) start-time)
                        internal-time-units-per-second))))
        (progn (hcl:normal-gc)
               (hcl:gc-if-needed))))))
Wade