From: Lyle Borg-Graham
Subject: ? on efficient iteration over structures
Date: 
Message-ID: <LYLE.94Oct13122255@cogni.ai.mit.edu>
(FWIW, this is with CMU CL 17c, SPARC 10)

I have a collection of structures ("segments") which are initially
stored in SEGMENT-HASH-TABLE. In these examples, there are a total of
1000 segment structures:

(defstruct segment
  node-1		               ; Another type of structure 
  ;; ... additional slots elided (including NODE-2, used below) ...
  (capacitance zero :type single-float)	;nanofarads
  graphics-objects			; A list of graphics objects
  )


For the tests below, the segments are also referenced by
*SEGMENT-LIST* and *SEGMENT-ARRAY*, as follows:


(setf *segment-array*  (make-array (list (hash-table-count segment-hash-table))))

(loop for segment being the hash-value of segment-hash-table 
      for i upfrom 0 do
      (setf (aref *segment-array* i) segment))

(setq *SEGMENT-LIST* (loop for segment being the hash-value of segment-hash-table 
			   collecting segment)) 



A function EVAL-SEGMENT is applied to each segment, in arbitrary
order. This function performs two multiplies, three
additions/subtractions, and three SETFs. While the function itself is
not central to the discussion, here it is anyway:



(defun eval-segment (segment)
  (declare (optimize (safety 0) (speed 3) (space 1))
	   (type segment segment))
  (let* ((alpha-cap-double (the double-float (* (the double-float *2/dt*)
						(segment-capacitance segment))))
	 (node (segment-node-2 segment))
	 (node-double-floats (node-double-floats node)))
    (declare (double-float alpha-cap-double))
    (setf
     (node-double-floats-aref-current node-double-floats)
     (the double-float (- (node-double-floats-aref-current node-double-floats)
			  (segment-g-leak*v-leak-double segment)))

     (node-double-floats-aref-alpha-charge node-double-floats)
     (the double-float (+ (node-double-floats-aref-alpha-charge node-double-floats)
			  (the double-float
			       (* (the double-float alpha-cap-double)
				  (node-double-floats-aref-voltage-n node-double-floats)))))

     (node-double-floats-aref-jacobian node-double-floats)
     (the double-float (+ (node-double-floats-aref-jacobian node-double-floats)
			  (the double-float alpha-cap-double)))))
  nil)




I have tried various iteration constructs to find the most efficient
one, yet all seem to spend as much time on the iteration mechanics as
on the applied function:



(defun eval-all-segments-array ()
  (declare (optimize (safety 0) (speed 3) (space 0) (compilation-speed 0)))
  (do ((i 0 (1+ (the fixnum i))))
      ((= (the fixnum i)
	  (the fixnum (length (the (simple-array segment (*)) *segment-array*)))))
    (eval-segment (aref (the (simple-array segment (*)) *segment-array*) i))))

(defun eval-all-segments-list ()
  (declare (optimize (safety 0) (speed 3) (space 0) (compilation-speed 0)))
  (dolist (segment *segment-list*)
    (eval-segment segment)))

(defun eval-all-segments-list2 ()
  (declare (optimize (safety 0) (speed 3) (space 0) (compilation-speed 0)))
  (do ((segment *segment-list* (cdr segment)))
      ((null segment))
    (eval-segment (the segment (car segment)))))

(defun eval-all-segments-list3 ()
  (declare (optimize (safety 0) (speed 3) (space 0) (compilation-speed 0)))
  (loop for segment in *segment-list* do
	(eval-segment (the segment segment))))

(defun eval-all-segments-mapc ()
  (declare (optimize (safety 0) (speed 3) (space 0) (compilation-speed 0)))
  (mapc #'eval-segment *segment-list*))
	

(defun eval-all-segments-hash ()
  (declare (optimize (safety 0) (speed 3) (space 0) (compilation-speed 0)))
  (loop for segment being the hash-value of segment-hash-table do
	(eval-segment (the segment segment))))





The results of profiling:


(progn (profile:unprofile)
       (profile:profile EVAL-ALL-SEGMENTS-array
			EVAL-ALL-SEGMENTS-LIST
			EVAL-ALL-SEGMENTS-LIST2
			EVAL-ALL-SEGMENTS-LIST3 		
			EVAL-ALL-SEGMENTS-mapc
			EVAL-ALL-SEGMENTS-hash)
       (dotimes (i 1000) (EVAL-ALL-SEGMENTS-array))
       (dotimes (i 1000) (EVAL-ALL-SEGMENTS-LIST))
       (dotimes (i 1000) (EVAL-ALL-SEGMENTS-LIST2))
       (dotimes (i 1000) (EVAL-ALL-SEGMENTS-LIST3))
       (dotimes (i 1000) (EVAL-ALL-SEGMENTS-mapc))
       (dotimes (i 1000) (EVAL-ALL-SEGMENTS-hash))
       (profile:report-time))
  Seconds  |  Consed   |  Calls  |  Sec/Call  |  Name:
------------------------------------------------------
     7.110 |         0 |   1,000 |    0.00711 | EVAL-ALL-SEGMENTS-HASH
     5.510 |         0 |   1,000 |    0.00551 | EVAL-ALL-SEGMENTS-MAPC
     5.470 |         0 |   1,000 |    0.00547 | EVAL-ALL-SEGMENTS-ARRAY
     5.390 |         0 |   1,000 |    0.00539 | EVAL-ALL-SEGMENTS-LIST2
     5.340 |         0 |   1,000 |    0.00534 | EVAL-ALL-SEGMENTS-LIST
     5.270 |         0 |   1,000 |    0.00527 | EVAL-ALL-SEGMENTS-LIST3
------------------------------------------------------
    34.090 |         0 |   6,000 |            | Total

Estimated total profiling overhead: 0.24 seconds


Hash-table iteration is clearly the slowest; the remaining constructs
are roughly equivalent.  Now, to estimate the time taken by the
iteration versus the EVAL-SEGMENT call itself:


(defun eval-all-segments-list3-five-times ()
  (declare (optimize (safety 0) (speed 3) (space 0) (compilation-speed 0)))
  (loop for segment in *segment-list* do
	(eval-segment (the segment segment))
	(eval-segment (the segment segment))
	(eval-segment (the segment segment))
	(eval-segment (the segment segment))
	(eval-segment (the segment segment))))

(progn (profile:unprofile)
       (profile:profile 
			EVAL-ALL-SEGMENTS-LIST3 		
			EVAL-ALL-SEGMENTS-LIST3-five-times)
       (dotimes (i 1000) (EVAL-ALL-SEGMENTS-LIST3))
       (dotimes (i 1000) (EVAL-ALL-SEGMENTS-LIST3-five-times))
       (profile:report-time))
  Seconds  |  Consed   |  Calls  |  Sec/Call  |  Name:
------------------------------------------------------
    14.190 |         0 |   1,000 |    0.01419 | EVAL-ALL-SEGMENTS-LIST3-FIVE-TIMES
     5.200 |         0 |   1,000 |    0.00520 | EVAL-ALL-SEGMENTS-LIST3
------------------------------------------------------
    19.390 |         0 |   2,000 |            | Total

Estimated total profiling overhead: 0.08 seconds



Thus, if T1 is the per-call time of the plain loop (iteration plus one
EVAL-SEGMENT) and T5 that of the five-times version (iteration plus
five EVAL-SEGMENTs), the ratio of iteration mechanics to the
EVAL-SEGMENT call is (5*T1 - T5)/(T5 - T1):

* (/ (- (* 5 .0052) .01419)
     (- 0.01419 .0052))
1.3136821
*

Is this overhead unavoidable?

Thanks for any ideas!

Lyle

From: Simon Leinen
Subject: Re: ? on efficient iteration over structures
Date: 
Message-ID: <SIMON.94Oct13123801@ligsg6.epfl.ch>
Your comparison is somewhat unfair in that the non-iterating version
evals the same segment five times.  It is very likely that all data
necessary for the computation fit in the (primary data) cache, so the
second to fifth iterations might be much faster than the first.  If I
understand correctly, the iteration tests were done on "real" data,
where there are actually 1000 distinct segments with their attached
data structures.  These certainly won't fit in the primary cache on
the SPARC.

To test whether the difference is due to the overhead of the iteration
constructs rather than to caching effects, you should fill all slots
in the data structures with the same segment and try again.
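
For example (an untested sketch, reusing the original poster's
*SEGMENT-LIST* and EVAL-SEGMENT; *SAME-SEGMENT-LIST* and the function
name are my inventions):

```lisp
;; Build a 1000-element list that repeats a single segment.  The
;; iteration work is identical to EVAL-ALL-SEGMENTS-LIST3, but every
;; EVAL-SEGMENT call now touches the same, cache-hot data.
(defparameter *same-segment-list*
  (make-list 1000 :initial-element (first *segment-list*)))

(defun eval-all-segments-same ()
  (declare (optimize (safety 0) (speed 3) (space 0)))
  (loop for segment in *same-segment-list* do
        (eval-segment (the segment segment))))
```

Comparing this against the original list loop should show how much of
the per-segment time is cache misses rather than iteration mechanics.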

If it turns out that the performance loss is because of caching, you
should try to improve the locality of the data you are actually using.
For example, you could make a copy of the "important" (i.e. used in
the time-critical part) slots of your segments and nodes and store
them in vectors of smaller structs or simply vectors of numbers.
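
A minimal sketch of the packed layout, assuming the
SEGMENT-CAPACITANCE accessor from the original post (the vector name
is mine):

```lisp
;; Copy a time-critical slot into a contiguous, specialized vector.
;; The inner loop then walks sequential memory instead of chasing
;; pointers to 1000 scattered structures.
(defparameter *capacitances*
  (map '(simple-array single-float (*))
       #'segment-capacitance
       *segment-list*))
```

The same could be done for the node slots, with the results written
back to the structures once per time step.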
-- 
Simon.
From: Barry Margolin
Subject: Re: ? on efficient iteration over structures
Date: 
Message-ID: <37k2el$5q@tools.near.net>
In article <···················@ligsg6.epfl.ch> ·····@lig.di.epfl.ch (Simon Leinen) writes:
>To test whether the difference is due to overhead of the iteration
>constructs rather than from caching effects, you should fill all slots
>in the data structures with the same segment and try again.

The way I like to determine iteration overhead is by replacing the call to
the work function (EVAL-SEGMENT in his example) with a call to a function
like:

(defun no-op (&rest ignore)
  (declare (ignore ignore))
  nil)
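
Substituting it into one of the loops from the original post would
look like this (a sketch, assuming the earlier *SEGMENT-LIST*):

```lisp
;; Same loop shape as EVAL-ALL-SEGMENTS-LIST3, with the work function
;; swapped out, so the profile shows (nearly) pure iteration cost.
(defun eval-all-segments-noop ()
  (declare (optimize (safety 0) (speed 3) (space 0) (compilation-speed 0)))
  (loop for segment in *segment-list* do
        (no-op segment)))
```

Note that the &REST call itself has some cost (and may cons the rest
list in some implementations), so this bounds the overhead from above.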

As to how to reduce iteration overhead, the best technique is probably loop
unrolling.  Iteration over arrays is most easily amenable to this
optimization technique.  And Lisp macros make it easy to write the code
once.  Here's an (untested) macro to generate unrolled loops:

(defmacro dotimes-unrolled ((variable limit unrolling-factor
                             &optional result-form)
                            &body body)
  "Like DOTIMES, but unrolls the loop into groups of UNROLLING-FACTOR chunks.
UNROLLING-FACTOR must be a literal constant."
  (check-type unrolling-factor (satisfies constantp) "a literal constant")
  (let ((limit-var (gensym))           ; Prevent multiple evaluation (ONCE-ONLY)
        (unrolled-limit-var (gensym))) ; Prevent variable shadowing
    `(let* ((,limit-var ,limit)
            (,unrolled-limit-var
              (- ,limit-var (mod ,limit-var ,unrolling-factor))))
       (do ((,variable 0))
           ((>= ,variable ,unrolled-limit-var)
            ;; Do the remaining iterations one at a time.
            (do ((,variable ,variable (1+ ,variable)))
                ((>= ,variable ,limit-var) ,result-form)
              ,@body))
         ;; UNROLLING-FACTOR copies of the body, each followed by an INCF.
         ,@(loop repeat unrolling-factor
                 append body
                 collect `(incf ,variable))))))
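
A possible use over the array from the original post (again untested;
the unrolling factor of 8 is an arbitrary choice):

```lisp
;; Unrolled variant of EVAL-ALL-SEGMENTS-ARRAY: eight EVAL-SEGMENT
;; calls per loop test instead of one.
(defun eval-all-segments-unrolled ()
  (declare (optimize (safety 0) (speed 3) (space 0)))
  (dotimes-unrolled (i (length (the simple-vector *segment-array*)) 8)
    (eval-segment (aref (the simple-vector *segment-array*) i))))
```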
-- 

Barry Margolin
BBN Internet Services Corp.
······@near.net
From: Henry G. Baker
Subject: Re: ? on efficient iteration over structures
Date: 
Message-ID: <hbakerCxotuu.7t2@netcom.com>
In article <·········@tools.near.net> ······@nic.near.net (Barry Margolin) writes:
>In article <···················@ligsg6.epfl.ch> ·····@lig.di.epfl.ch (Simon Leinen) writes:
>>To test whether the difference is due to overhead of the iteration
>>constructs rather than from caching effects, you should fill all slots
>>in the data structures with the same segment and try again.
>
>The way I like to determine iteration overhead is by replacing the call to
>the work function (EVAL-SEGMENT in his example) with a call to a function
>like:
>
>(defun no-op (&rest ignore)
>  (declare (ignore ignore))
>  nil)
>
>As to how to reduce iteration overhead, the best technique is probably loop
>unrolling.  Iteration over arrays is most easily amenable to this
>optimization technique.  And Lisp macros make it easy to write the code
>once.  Here's an (untested) macro to generate unrolled loops:

There's a discussion of loop-unrolling in Lisp in:

"Inlining Semantics for Subroutines which are Recursive".  ACM Sigplan
Not. 27,12 (Dec 1992), 39-46.  Also in my ftp directory.

      Henry Baker
      Read ftp.netcom.com:/pub/hbaker/README for info on ftp-able papers.
From: Rob MacLachlan
Subject: Re: ? on efficient iteration over structures
Date: 
Message-ID: <37jpef$q3v@cantaloupe.srv.cs.cmu.edu>
I suspect what you're seeing is not loop overhead per se, but cache-miss
overhead.  When you evaluate the same segment 5 times, the first evaluation
incurs all of the cache misses, and subsequent evaluations run much faster.

A gratuitous overhead that you may be incurring in the inner loop is the
actual call to eval-segment.  This should be inlined or block-compiled.
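
In CMU CL that might look like the following (a sketch; I have not
measured the effect on this particular code):

```lisp
;; Ask the compiler to inline EVAL-SEGMENT at its call sites.  The
;; DECLAIM must be processed before both the definition and the
;; callers are compiled.
(declaim (inline eval-segment))

;; ... (defun eval-segment (segment) ...) as before ...

;; Alternatively, block-compile the file so cross-function calls
;; within it become direct local calls:
;; (compile-file "segments.lisp" :block-compile t)
```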

  Rob