From: R. Bharat Rao
Subject: Extreme let-itis.  Two more efficiency questions
Date: 
Message-ID: <bharat.683229683@milton>
This is related to the previous note on let and let*, but is somewhat
different.

QUESTION A.
;;; Consider
  (let ((x1 ..) (x2 ...))
    (dotimes (i 5000)
       (some-function (FOO2 eqn) x1 x2))
    (more stuff))
;;VERSUS
  (let ((x1 ..) (x2 ...)
	(EXTRA-VAR (FOO2 eqn)))
    (dotimes (i 5000)
       (some-function EXTRA-VAR x1 x2))
    (more stuff))

Assume that the readability issue is nil (both forms are equally
readable).  At what point (number of dotimes iterations, each
involving a call to FOO2) does it become computationally more
efficient to create an extra local variable, EXTRA-VAR, that does this
just once (obviously FOO2 has no side effects)?

This is clearly related to the complexity of FOO2.  So what if FOO2 is:

a) a slot-access fn for a defstruct, like (equation-template eqn)
  [ corresponds to a single aref].  

b) slightly more complicated, like (length (equation-cover eqn)), where
   equation-cover is also a slot-access function on a defstruct.

Or are the relative tradeoffs in such cases so small, or so machine
dependent, that it is not even a factor?
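
For reference, a minimal sketch of the kind of structure these accessors
assume (only the two slots named above; everything else about EQUATION is
a guess):

  ;; Hypothetical EQUATION structure -- just the slots referenced above.
  (defstruct equation
    template        ; read by (equation-template eqn), as in (a)
    cover)          ; a sequence; (length (equation-cover eqn)) as in (b)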

QUESTION B:
Say you need to define a few (< 3) ``basic'' local variables, and
then some more local variables that depend on these.  For example,
consider the function linear-regression (below) that, given an equation
(a structure), performs linear regression to find the coefficients.

So you need to first define local variables n (number of datapoints),
and k+1 (number of initial matrix columns), and then use these to
create matrices of appropriate orders.

Again, this function will be called several thousand times

1. Using two let's
(defun linear-regression (eqn)
  (let ((k+1 (compute-order-length (equation-template eqn)))
	(n (length (equation-cover eqn))))
    (let ((A-matrix (make-matrix n k+1))	; A is a nX(k+1) matrix
	  (Y-matrix (make-matrix n 1))		;Y is a nX1 matrix
	  (A-transpose (make-matrix k+1 n))	;Trans[A] is (k+1)Xn matrix
	  ....
	  (B-matrix (make-matrix k+1 1))	;INV . A-tr-Y
	  )
      (many matrix computations)
      ...)))

2. Using one let*
(defun linear-regression (eqn)
  (let* ((k+1 (compute-order-length (equation-template eqn)))
	 (n (length (equation-cover eqn)))
	 (A-matrix (make-matrix n k+1))	; A is a nX(k+1) matrix
	 (Y-matrix (make-matrix n 1))		;Y is a nX1 matrix
	  )
    (many matrix computations)
    ...))

3. Using &aux & let
(defun linear-regression (eqn &aux k+1 n)
  (setq k+1 (compute-order-length (equation-template eqn)))
  (setq n (length (equation-cover eqn)))
  (let ((A-matrix (make-matrix n k+1))		; A is a nX(k+1) matrix
	..)
    (many matrix computations)
    ...))


It seems to me that [2] is the most convenient to write but the most
computationally expensive.  

I personally find 3 more readable than 1 (especially when you have
just 1 ``basic'' variable, say only n and no k+1), but is 1
significantly more efficient?

Thanks,

-Bharat
--
R. Bharat Rao                         E-mail: ······@cs.uiuc.edu
Beckman Institute for Advanced Science and Technology
Electrical & Computer Engineering, University of Illinois, Urbana
Snail Mail: Beckman Institute, 405 N. Matthews, Urbana, IL 61801

From: Barry Margolin
Subject: Re: Extreme let-itis.  Two more efficiency questions
Date: 
Message-ID: <1991Aug26.224546.25802@Think.COM>
In article <················@milton> ······@herodotus.cs.uiuc.edu (R. Bharat Rao) writes:
>This is related to the previous note on let and let*, but is somewhat
>different.
>
>QUESTION A.
>;;; Consider
>  (let ((x1 ..) (x2 ...))
>    (dotimes (i 5000)
>       (some-function (FOO2 eqn) x1 x2))
>    (more stuff))
>;;VERSUS
>  (let ((x1 ..) (x2 ...)
>	(EXTRA-VAR (FOO2 eqn)))
>    (dotimes (i 5000)
>       (some-function EXTRA-VAR x1 x2))
>    (more stuff))
>
>Assume that the readability issue is nil (both forms are equally
>readable).  At what point (number of dotimes iterations, each
>involving a call to FOO2) does it become computationally more
>efficient to create an extra local variable, EXTRA-VAR, that does this
>just once (obviously FOO2 has no side effects)?

I would guess that in most implementations EXTRA-VAR would make it more
efficient if there is more than one iteration.  Both forms

	(some-function (FOO2 eqn) x1 x2)

and

	(some-function EXTRA-VAR x1 x2)

access three variables and call SOME-FUNCTION, but the first one also calls
FOO2.  Assuming it takes the same amount of time to access the variables
EXTRA-VAR and EQN, I would expect the first to be slower than the second.

>This is clearly related to the complexity of FOO2.  So what if FOO2 is:

I don't think so.  No matter what FOO2 does, invoking it will be more
expensive than not invoking it.

In some implementations the two forms may have equivalent performance.  If
the compiler is able to determine that FOO2 is a pure function (has no side
effects or dependencies on anything other than its arguments) and that the
value of (FOO2 eqn) can't change during the iteration, the optimizer may
move the computation out of the loop for you; this is one of the oldest
known compiler optimizations (along with its cousin, common subexpression
elimination).  For instance, if FOO2 is CAR, there is no (RPLACA eqn ...)
in the loop, and the value of eqn is never stored anywhere else or passed
to a non-pure function (which might modify the cons or store it in a global
variable), then repetitive (CAR eqn) calls can be optimized away.

My personal style is to do common subexpression elimination myself,
even for something as simple as CAR; I'll write

	(let ((first-foo (car foo)))
	  (call-one-function first-foo)
	  (call-another-function first-foo))

rather than

	(call-one-function (car foo))
	(call-another-function (car foo))

First, I think it makes the program clearer -- if I perform a computation
more than once, the result probably deserves a name.  Second, it's better
for modularity: if the computation changes, I only have to change one
place.  Finally, I don't expect compilers to be able to determine most of
the occasions when common subexpression elimination and moving computations
out of loops are appropriate; look at all the "if"s in my earlier
paragraph.  The age-old compiler techniques for determining when these
conditions are met are mostly applicable to variables that are known to
hold unmodifiable data (e.g. numbers); as soon as you pass a cons cell or
defstruct structure to an unrecognized function, the compiler must assume
that it may modify it.
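
As a concrete sketch of the problem (BAR is a made-up stand-in for any
function whose definition the compiler can't see into), the compiler can't
hoist the slot access out of a loop like this:

	(dotimes (i 5000)
	  (some-function (foo2 eqn) x1 x2)
	  (bar eqn))	; BAR might (setf (foo2 eqn) ...), so (FOO2 eqn)
			; must be recomputed on every iteration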

>QUESTION B:
>Say you need to define a few (< 3) ``basic'' local variables, and
>then some more local variables that depend on these.  For example,
>consider the function linear-regression (below) that, given an equation
>(a structure), performs linear regression to find the coefficients.
>
>So you need to first define local variables n (number of datapoints),
>and k+1 (number of initial matrix columns), and then use these to
>create matrices of appropriate orders.
>
>Again, this function will be called several thousand times
>
>1. Using two let's
>(defun linear-regression (eqn)
>  (let ((k+1 (compute-order-length (equation-template eqn)))
>	(n (length (equation-cover eqn))))
>    (let ((A-matrix (make-matrix n k+1))	; A is a nX(k+1) matrix
>	  (Y-matrix (make-matrix n 1))		;Y is a nX1 matrix
>	  (A-transpose (make-matrix k+1 n))	;Trans[A] is (k+1)Xn matrix
>	  ....
>	  (B-matrix (make-matrix k+1 1))	;INV . A-tr-Y
>	  )
>      (many matrix computations)
>      ...)))
>
>2. Using one let*
>(defun linear-regression (eqn)
>  (let* ((k+1 (compute-order-length (equation-template eqn)))
>	 (n (length (equation-cover eqn)))
>	 (A-matrix (make-matrix n k+1))	; A is a nX(k+1) matrix
>	 (Y-matrix (make-matrix n 1))		;Y is a nX1 matrix
>	  )
>    (many matrix computations)
>    ...))
>
>3. Using &aux & let
>(defun linear-regression (eqn &aux k+1 n)
>  (setq k+1 (compute-order-length (equation-template eqn)))
>  (setq n (length (equation-cover eqn)))
>  (let ((A-matrix (make-matrix n k+1))		; A is a nX(k+1) matrix
>	..)
>    (many matrix computations)
>    ...))
>
>
>It seems to me that [2] is the most convenient to write but the most
>computationally expensive.  

I'm not sure I understand this problem.  What happened to A-transpose and
B-matrix in [2] and [3]?  Are they now internal, intermediate values in the
"many matrix computations"?  So is the point of this question actually:
should I preallocate arrays to hold intermediate values, or pass them back
and forth as parameters and values?

Without seeing what the bodies of these LETs look like, it's hard to answer
that.  It's likely to be better to preallocate -- this code is likely to be
dominated by the time spent allocating and GCing the arrays, and page
faulting while accessing them (assuming they're fairly large).  Allocating
as few arrays as possible and reusing them is likely to help in this
regard.  Some Lisp implementations provide a facility called "resources"
that exists to manage pools of similar objects that are allocated once
and then reused as needed.
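
In portable Common Lisp you can get much of the same effect with a simple
pool of your own.  Here is a minimal sketch (all the names are made up,
and a real resource facility does considerably more bookkeeping):

	;; A trivial pool of reusable matrices, instead of consing a new
	;; one on every call.
	(defvar *matrix-pool* '())

	(defun allocate-matrix (n m)
	  ;; Reuse a pooled matrix with the right dimensions, else make one.
	  (let ((old (find-if #'(lambda (mat)
				  (and (= (array-dimension mat 0) n)
				       (= (array-dimension mat 1) m)))
			      *matrix-pool*)))
	    (cond (old (setq *matrix-pool* (delete old *matrix-pool* :count 1))
		       old)
		  (t (make-array (list n m) :initial-element 0)))))

	(defun free-matrix (mat)
	  ;; Give a matrix back to the pool for later reuse.
	  (push mat *matrix-pool*)
	  nil)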

>I personally find 3 more readable than 1 (especially when you have
>just 1 ``basic'' variable, say only n and no k+1), but is 1
>significantly more efficient?

[2] and [3] should be almost identical in performance.  Some
implementations will bind N and K+1 to NIL in [3], even though they are
never used before being set to "real" values; this time will usually be
negligible, though (it's often done for debuggability -- if you set a
breakpoint before the code that sets K+1, its value should be NIL).
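
In other words, in such an implementation [3] behaves roughly as if it had
been written like this (just a sketch, not any particular compiler's
expansion):

	(defun linear-regression (eqn)
	  (let ((k+1 nil) (n nil))	; the &AUX variables, bound to NIL first
	    (setq k+1 (compute-order-length (equation-template eqn)))
	    (setq n (length (equation-cover eqn)))
	    ;; ... the inner LET over A-matrix and the matrix computations,
	    ;; exactly as in [3] ...
	    ))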

Other than what I said above, I have no comment on the comparison between
[1] and the other two.
-- 
Barry Margolin, Thinking Machines Corp.

······@think.com
{uunet,harvard}!think!barmar
From: Bob Kerns
Subject: Re: Extreme let-itis.  Two more efficiency questions
Date: 
Message-ID: <1991Aug26.225128.24292@crl.dec.com>
In article <················@milton>, ······@herodotus.cs.uiuc.edu (R. Bharat Rao) writes:
> QUESTION A.
> ;;; Consider
>   (let ((x1 ..) (x2 ...))
>     (dotimes (i 5000)
>        (some-function (FOO2 eqn) x1 x2))
>     (more stuff))
> ;;VERSUS
>   (let ((x1 ..) (x2 ...)
> 	(EXTRA-VAR (FOO2 eqn)))
>     (dotimes (i 5000)
>        (some-function EXTRA-VAR x1 x2))
>     (more stuff))
> 
> Assume that the readability issue is nil (both forms are equally
> readable).  At what point (number of dotimes iterations, each
> involving a call to FOO2) does it become computationally more
> efficient to create an extra local variable, EXTRA-VAR, that does this
> just once (obviously FOO2 has no side effects)?

Figure that the cost of binding a variable is essentially the
same cost as passing one argument to a function.  (Not CALLING
the function, mind you, just setting up the value).

I.e.

(let ((x 6))
  (dotimes (i 1000)
    (foo i x)))

and

(dotimes (i 1000)
  (foo i 6))

will be essentially the same, except for perhaps 1 stack slot.
This is just a rule of thumb; in some architectures this may
vary slightly one way or the other.

> This is clearly related to the complexity of FOO2.  So what if FOO2 is:
> 
> a) a slot-access fn for a defstruct, like (equation-template eqn)
>   [ corresponds to a single aref].  
Not necessarily exactly an AREF.  For example, it might do type
checking.  But regardless, it will be faster to compute the value once
and store it in a local variable.

> b) slightly more complicated, like (length (equation-cover eqn)), where
>    equation-cover is also a slot-access function on a defstruct.

The more complex, the more you gain by moving the computation out
of the loop.
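
For example, hoisting case (b) out of the loop in QUESTION A (N-POINTS is
just a made-up name for the hoisted value):

(let ((n-points (length (equation-cover eqn))))	; computed once
  (dotimes (i 5000)
    (some-function n-points x1 x2)))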

> Or are the relative tradeoffs in such cases so small, or so machine
> dependent, that it is not even a factor?

Moving computations out of inner loops is always a good thing to do.

> QUESTION B:
> Again, this function will be called several thousand times
> 1. Using two let's
> 2. Using one let*
> 3. Using &aux & let

> It seems to me that [2] is the most convenient to write but the most
> computationally expensive.  

I didn't see any computational-cost difference between ANY of them,
although I might have misinterpreted your ellipsis.

> I personally find 3 more readable than 1 (especially when you have
> just 1 ``basic'' variable, say only n and no k+1), but is 1
> significantly more efficient?

I really recommend against use of &AUX.  It clutters up the
argument list, making it harder to read, etc.

I generally try to keep the creation of a variable together with
the creation of the data to put into it, and LET* is usually
the best way to achieve that.
From: Bob Kerns
Subject: Re: Extreme let-itis. Two more efficiency questions
Date: 
Message-ID: <1991Aug27.003129.29530@crl.dec.com>
In article <················@milton>, ······@herodotus.cs.uiuc.edu (R. Bharat Rao) writes:
> Does adding an extra let block have much more overhead than simply
> adding a new local variable to an existing let (see below)?

No, it doesn't matter performance-wise whether you write
LET once or twice to bind two variables.
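
That is, these two forms (with made-up initializers) should compile to
essentially the same code:

;; one LET binding both variables
(let ((a (compute-a))
      (b (compute-b)))
  (body a b))

;; versus two nested LETs
(let ((a (compute-a)))
  (let ((b (compute-b)))
    (body a b)))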

> At what value of *local-loop* is the tradeoff worth the extra
> variable/let-block?  If *local-loop* is 1, obviously the extra
> variable is unnecessary overhead (again, readability is not an issue
> here).  If *local-loop* = 10,000,000, then the extra local var makes
> sense.  Anyone have a rough idea of the tradeoffs involved?

Do the LET if *LOCAL-LOOP* will ever be larger than 1.  Otherwise,
remove the DOTIMES, which is far more expensive than the LET!