From: Didier Verna
Subject: inlining works /against/ performance ??
Date: 
Message-ID: <muxzmpouf6h.fsf@uzeb.lrde.epita.fr>
        Hi !

I'm puzzled by the following benchmark I just ran on my computer. This is with
the CMUCL compiler. Consider the following code:


(eval-when (:compile-toplevel)
  (declaim (inline add) ;; =================> Comment it or not
	   (optimize (speed 3)
		     (compilation-speed 0)
		     (safety 0)
		     (debug 0))))

(defun add (image value)
  (declare (type (simple-array single-float (1048576)) image))
  (declare (type single-float value))
  (let ((size (array-dimension image 0)))
    (dotimes (pos size)
      (setf (aref image pos) (+ (aref image pos) value)))))

(let ((image (make-array 1048576
			 :element-type 'single-float
			 :initial-element 0.0)))
  (time (dotimes (i 500)
	  (declare (type fixnum i))
	  (add image 3.0))))



Without inlining, I get the following result:

; Evaluation took:
;   1.39 seconds of real time
;   1.36 seconds of user run time
;   0.01 seconds of system run time
;   4,147,211,774 CPU cycles
;   0 page faults and
;   0 bytes consed.
;

However, with inlining, I get this:

; Evaluation took:
;   1.73 seconds of real time
;   1.72 seconds of user run time
;   0.01 seconds of system run time
;   5,185,222,800 CPU cycles
;   0 page faults and
;   0 bytes consed.
;


It's counter-intuitive (at least to me) that inlining works against
performance. Can somebody explain ?

Thanks !

-- 
Didier Verna, ······@lrde.epita.fr, http://www.lrde.epita.fr/~didier

EPITA / LRDE, 14-16 rue Voltaire   Tel.+33 (1) 44 08 01 85
94276 Le Kremlin-Bicêtre, France   Fax.+33 (1) 53 14 59 22   ······@xemacs.org

From: Hannah Schroeter
Subject: Re: inlining works /against/ performance ??
Date: 
Message-ID: <di0uu3$e7g$2@c3po.use.schlund.de>
Hello!

Didier Verna  <······@lrde.epita.fr> wrote:
>[...]

>It's counter-intuitive (at least to me) that inlining works against
>performance. Can somebody explain ?

I have no explanation for your particular problem.

But in general, inlining can make the code bigger so that it might
not fit into the instruction cache of the CPU as well as before.

This is an argument to prefer inlining of very small functions over
inlining of bigger functions.

In addition, there are some things that aren't solved that well in
CMUCL, IIRC for example register allocation. Maybe this leads to
worse register allocation in the new function, with the incorporated
body of the called function, in comparison to two separate functions.

However, I can't really be sure about the latter, because I don't
know the CMUCL internals all too well.

>Thanks !

Kind regards,

Hannah.
From: Juho Snellman
Subject: Re: inlining works /against/ performance ??
Date: 
Message-ID: <slrndk8254.gas.jsnell@sbz-30.cs.Helsinki.FI>
<······@lrde.epita.fr> wrote:
> (let ((image (make-array 1048576
> 			 :element-type 'single-float
> 			 :initial-element 0.0)))
>   (time (dotimes (i 500)
> 	  (declare (type fixnum i))
> 	  (add image 3.0))))
[...]
> It's counter-intuitive (at least to me) that inlining works against
> performance. Can somebody explain ?

When the function is inlined the constant 3.0 gets propagated inside
the loop in ADD, where it's loaded from memory to a fp register (or
x87 stack) on each iteration. When it isn't inlined no such
propagation is possible, and VALUE is unboxed just once at the start
of ADD. So the compiler is smart enough to perform the constant
propagation, but not smart enough to know that it's a bad idea
in this particular case.

If you changed the call to something with non-constant arguments, for
example (add image (float i 1.0)), the inlined and non-inlined
versions would probably be about as fast.
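
A minimal sketch of that variant, for anyone who wants to try it. ADD is
restated from the original post, except that the array dimension is left
open as (*) so a small array can be used; TEST-NON-CONSTANT is a
hypothetical name, not something from the original code. I have not
measured this, so treat it as a starting point rather than a result.

```lisp
;; ADD as in the original post, with an open dimension (*) instead of
;; the fixed 1048576 so the sketch also runs on small arrays.
(declaim (inline add))
(defun add (image value)
  (declare (type (simple-array single-float (*)) image)
           (type single-float value))
  (dotimes (pos (length image) image)
    (setf (aref image pos) (+ (aref image pos) value))))

;; Hypothetical driver: the second argument depends on the loop counter,
;; so it is not a compile-time constant and cannot be propagated into
;; the inlined loop as a literal.
(defun test-non-constant (image n)
  (declare (type (simple-array single-float (*)) image)
           (type fixnum n))
  (dotimes (i n image)
    (declare (type fixnum i))
    (add image (float i 1.0))))
```

Timing (test-non-constant image 500) with and without the INLINE
declamation should then show whether the gap closes on a given system.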

-- 
Juho Snellman
"Premature profiling is the root of all evil."
From: Thomas A. Russ
Subject: Re: inlining works /against/ performance ??
Date: 
Message-ID: <ymivf0a2tmm.fsf@sevak.isi.edu>
Didier Verna <······@lrde.epita.fr> writes:

> 
> (let ((image (make-array 1048576
> 			 :element-type 'single-float
> 			 :initial-element 0.0)))
>   (time (dotimes (i 500)
> 	  (declare (type fixnum i))
> 	  (add image 3.0))))

Regardless of the specifics of this case, it's generally not a good idea
to call TIME on arbitrary Lisp expressions.  Depending on the Lisp system,
the form may be run as interpreted code rather than compiled (I don't
think CMUCL actually does this), or the time spent compiling the form may
be included in the timing output.

A better approach is to create one or more test functions for doing the
timing.  They can then be compiled and you can be sure(r) of what you're
actually testing.  For example:

(defun test-inline (image n)
   (declare (inline add)
	    (type fixnum n))
   (dotimes (i n)
      (declare (type fixnum i))
      (add image 3.0)))

(defun test-notinline (image n)
   (declare (notinline add)
	    (type fixnum n))
   (dotimes (i n)
      (declare (type fixnum i))
      (add image 3.0)))

(compile 'test-inline)
(compile 'test-notinline)

(let ...
  (time (test-inline image 500))
  (time (test-notinline image 500))
)
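
Spelled out end to end, the approach might look like the sketch below.
Several things here are assumptions rather than part of the original
code: ADD is restated with an open array dimension (*), RUN-BOTH is a
hypothetical helper, and each test function gets its own fresh array so
the two runs do not affect each other. The array size and repeat count
are left as parameters.

```lisp
;; ADD restated from the original post, with an open dimension (*).
;; It is declaimed INLINE before the DEFUN so the inline expansion is
;; available to callers that ask for it.
(declaim (inline add))
(defun add (image value)
  (declare (type (simple-array single-float (*)) image)
           (type single-float value))
  (dotimes (pos (length image) image)
    (setf (aref image pos) (+ (aref image pos) value))))

(defun test-inline (image n)
  (declare (inline add) (type fixnum n))
  (dotimes (i n image)
    (declare (type fixnum i))
    (add image 3.0)))

(defun test-notinline (image n)
  (declare (notinline add) (type fixnum n))
  (dotimes (i n image)
    (declare (type fixnum i))
    (add image 3.0)))

;; Make sure both test functions are compiled before timing them.
(compile 'test-inline)
(compile 'test-notinline)

;; Hypothetical helper: times both variants on separate arrays of the
;; given size and returns the two arrays for comparison.
(defun run-both (size n)
  (let ((a (make-array size :element-type 'single-float
                            :initial-element 0.0))
        (b (make-array size :element-type 'single-float
                            :initial-element 0.0)))
    (time (test-inline a n))
    (time (test-notinline b n))
    (values a b)))
```

Something like (run-both 1048576 500) would then correspond to the sizes
used in the original post.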


-- 
Thomas A. Russ,  USC/Information Sciences Institute