From: Boris Schaefer
Subject: Efficiency in cmucl
Date: 
Message-ID: <f5b77cc5.0410241436.5c10811e@posting.google.com>
Hi,

most of my Lisp programs were fast enough and so I never really
learned all the necessary stuff about efficiency problems, but over
the weekend I investigated a toy program and now I'm wondering how to
do this efficiently.

Basically it's a very simple root-finder for an equation that uses the
bisection method.  Now, I coded this in Lisp and C, and the C version
is about 3-4 times as fast.  I believe the main problem with the Lisp
version is that it conses so much, and I'm not able to avoid it.  I'm
wondering especially about FIND-ZERO-RECURSIVE; I don't really
understand why it conses so much.  As far as I understand it, the 16
bytes per call that NEXT-X-1 and TEST-2 cons up are due to the boxing
of the return value, and that seems unavoidable, as far as I can tell
from my quick Google search.  But FIND-ZERO-RECURSIVE I just don't
understand.

The speed isn't really changed by uncommenting the block-compilation
declaims, and I can't tell whether the consing changes, because I can't
profile it anymore (with Slime's profile interface, at least).  In any
case, the warning about the boxing of the return value remains, even
with block compilation turned on.

As you can see, I just put type declarations all over the place,
because I don't yet fully understand where they are useful and where
they aren't.

I'd be grateful for any help to understand this and any tips on how to
make this more efficient in runtime and space.

(declaim (optimize (speed 3) (safety 0)))
;;(declaim (start-block zero-test))

(setq extensions:*gc-verbose* nil)

(defun zero-test ()
  (loop repeat 100
        sum (find-zero-recursive #'test-2
                                 #'next-x-1
                                 0.0d0
                                 100.0d0
                                 0.0001d0) of-type double-float))

(defun test-2 (x)
  (declare (type double-float x))
  (the double-float (+ (the double-float (* x x)) -3)))

(defun next-x-1 (f xk-1 xk)
  (declare (type double-float xk-1 xk)
           (ignore f))
  (the double-float (/ (the double-float (+ xk-1 xk)) 2d0)))

(defun find-zero-recursive (f next-x xk-1 xk epsilon)
  (declare (type double-float xk-1 xk epsilon)
           (type (function (double-float) double-float) f)
           (type (function ((function (double-float) double-float)
                            double-float double-float) double-float)
                 next-x))
  (let ((xk+1 (funcall next-x f xk-1 xk)))
    (declare (type double-float xk+1))
    (if (< (the double-float
                (abs (the double-float (funcall f xk+1)))) epsilon)
        (the double-float xk+1)
        (if (< (the double-float (* (the double-float (funcall f xk))
                                    (the double-float (funcall f xk+1))))
               0d0)
            (the double-float
                 (find-zero-recursive f next-x xk xk+1 epsilon))
            (the double-float
                 (find-zero-recursive f next-x xk-1 xk+1 epsilon))))))

;;(declaim (end-block))
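
The profile below is from Slime's profiler interface; from a plain REPL
something like this should produce the same report (assuming I remember
the CMUCL profile package correctly):

(profile:profile test-2 next-x-1 find-zero-recursive zero-test)
(zero-test)
(profile:report-time)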

  Consed  | Calls   | Secs  | Sec/Call|Bytes/C.| Name:
-------------------------------------------------------------------
   88,000 |   5,500 | 0.010 | 0.00000 |     16 | TEST-2
   59,200 |     100 | 0.000 | 0.00000 |    592 | FIND-ZERO-RECURSIVE
   30,400 |   1,900 | 0.000 | 0.00000 |     16 | NEXT-X-1
      816 |       1 | 0.000 | 0.00000 |    816 | ZERO-TEST
-------------------------------------------------------------------
  178,416 |   7,501 | 0.010 |         |        | Total

Regards,
Boris Schaefer

From: Juho Snellman
Subject: Re: Efficiency in cmucl
Date: 
Message-ID: <slrncnoloe.ffa.jsnell@sbz-31.cs.Helsinki.FI>
<·····@uncommon-sense.net> wrote:
>I'd be grateful for any help to understand this and any tips on how to
>make this more efficient in runtime and space.

The following code is about 10 times faster than the original on my
computer. It has a generic FIND-ZERO with no declarations, and a
version that's specialized for double-floats / TEST-2 / NEXT-X-1. The
specialized function is implemented by inlining a call to the generic
version.

(declaim (inline find-zero))
(defun find-zero (f next-x xk-1 xk epsilon)
  (labels ((find-zero-recursive (xk-1 xk)
	     (let ((xk+1 (funcall next-x f xk-1 xk)))
	       (if (< (abs (funcall f xk+1))
		      epsilon)
		   xk+1
		   (if (< (* (funcall f xk)
			     (funcall f xk+1))
			  0d0)
		       (find-zero-recursive xk xk+1)
		       (find-zero-recursive xk-1 xk+1))))))
    (find-zero-recursive xk-1 xk)))

(defun find-zero-with-test-2 (xk-1 xk epsilon)
  (declare (type double-float xk-1 xk epsilon)
	   (optimize (speed 3)))
  (labels ((test-2 (x)
	     (declare (double-float x))
	     (+ (* x x) -3))
	   (next-x-1 (f xk-1 xk)
	     (declare (double-float xk-1 xk)
		      (ignore f))
	     (/ (+ xk-1 xk) 2d0)))
    (declare (inline test-2 next-x-1))
    (find-zero #'test-2 #'next-x-1 xk-1 xk epsilon)))

(defun zero-test ()
  (loop repeat 100
    sum (find-zero-with-test-2 0.0d0
			       100.0d0
			       0.0001d0) of-type double-float))
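
A quick sanity check at the REPL (the positive root of x^2 - 3 is
sqrt(3), roughly 1.7320508):

(find-zero-with-test-2 0.0d0 100.0d0 0.0001d0)  ; => ~1.7320...
(time (zero-test))                              ; compare against the original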

-- 
Juho Snellman
"Premature profiling is the root of all evil."
From: Boris Schaefer
Subject: Re: Efficiency in cmucl
Date: 
Message-ID: <f5b77cc5.0410242353.42a1f1bb@posting.google.com>
······@iki.fi (Juho Snellman) wrote:
> 
> The following code is about 10 times faster than the original on my
> computer. It has a generic FIND-ZERO with no declarations, and a
> version that's specialized for double-floats / TEST-2 / NEXT-X-1. The
> specialized function is implemented by inlining a call to the generic
> version.

Wow, thanks.  I'm really impressed.  It's about 20 times faster on my
computer, and also about 5 times faster than my (probably naive) C
versions.  (I've written a tail-recursive and an iterative C version
and both are equally fast on my computer.  The tail-recursive C
version is effectively the Lisp code translated to C.)

Regards, and thanks again,
Boris
From: Boris Schaefer
Subject: Re: Efficiency in cmucl
Date: 
Message-ID: <f5b77cc5.0410250404.16b857e@posting.google.com>
······@iki.fi (Juho Snellman) wrote: [a nice and fast version of my
problem]

I'm not seeing my original followup to your post in Google Groups yet,
so I'm replying to this article instead of my own.  In my other
followup I wrote that your version is 4-5 times as fast as my C
version.  That isn't right: I had tested with different epsilon values.
But it's still about twice as fast with the same starting values ...
great.

Thanks again,
Boris
From: Peter Scott
Subject: Re: Efficiency in cmucl
Date: 
Message-ID: <7e267a92.0410250915.5f531282@posting.google.com>
·····@uncommon-sense.net (Boris Schaefer) wrote in message news:<····························@posting.google.com>...
> Hi,
> 
> most of my Lisp programs were fast enough and so I never really
> learned all the necessary stuff about efficiency problems, but over
> the weekend I investigated a toy program and now I'm wondering how to
> do this efficiently.
> 
> Basically it's a very simple root-finder for an equation that uses the
> bisection method.

I recommend moving to the regula falsi method. It should converge much
faster than the plain bisection method, and it's not very complicated.
This may not teach you much about type declaration optimization, but
it should be pretty darn effective at making your program go faster.

See http://en.wikipedia.org/wiki/Regula_falsi
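
In your setup only the NEXT-X step changes: instead of the midpoint you
take the x-intercept of the secant through the two bracket points.  A
rough, untested sketch that should plug into your FIND-ZERO-RECURSIVE
(the name NEXT-X-REGULA-FALSI is made up here):

(defun next-x-regula-falsi (f xk-1 xk)
  (declare (type double-float xk-1 xk)
           (type (function (double-float) double-float) f))
  (let ((fk-1 (funcall f xk-1))
        (fk   (funcall f xk)))
    (declare (type double-float fk-1 fk))
    ;; x-intercept of the line through (xk-1, fk-1) and (xk, fk);
    ;; note that this divides by zero if fk = fk-1.
    (- xk (/ (* fk (- xk xk-1))
             (- fk fk-1)))))

Called like the original, e.g.
(find-zero-recursive #'test-2 #'next-x-regula-falsi 0.0d0 100.0d0 0.0001d0)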

-Peter
From: Christophe Turle
Subject: Re: Efficiency in cmucl
Date: 
Message-ID: <417d4b7d$0$2519$636a15ce@news.free.fr>
"Peter Scott" <·········@gmail.com> a �crit dans le message de news: 
····························@posting.google.com...
> ·····@uncommon-sense.net (Boris Schaefer) wrote in message 
> news:<····························@posting.google.com>...
>> Hi,
>>
>> most of my Lisp programs were fast enough and so I never really
>> learned all the necessary stuff about efficiency problems, but over
>> the weekend I investigated a toy program and now I'm wondering how to
>> do this efficiently.
>>
>> Basically it's a very simple root-finder for an equation that uses the
>> bisection method.
>
> I recommend moving to the regula falsi method. It should converge much
> faster than the plain bisection method, and it's not very complicated.
> This may not teach you much about type declaration optimization, but
> it should be pretty darn effective at making your program go faster.
>
> See http://en.wikipedia.org/wiki/Regula_falsi
>
> -Peter

Searching for a good algorithm is often better than searching for good optimizations.

-- 
___________________________________________________________
Christophe Turle.
sava preview http://perso.wanadoo.fr/turle/lisp/sava.html
(format nil ···@~a.~a" 'c.turle 'wanadoo 'fr) 
From: Boris Schaefer
Subject: Re: Efficiency in cmucl
Date: 
Message-ID: <f5b77cc5.0410260526.63adc2a7@posting.google.com>
I wrote:
> 
> Basically it's a very simple root-finder for an equation that uses the
> bisection method.

·········@gmail.com (Peter Scott) replied:
> 
> I recommend moving to the regula falsi method. It should converge much
> faster than the plain bisection method, and it's not very complicated.
> This may not teach you much about type declaration optimization, but
> it should be pretty darn effective at making your program go faster.

Actually, I already tried it, and it's slower (at least in my quick
test).  It converges in fewer steps, but the calculation of the next
value is more expensive, so at least with my simple test it's slower
overall.  (It would probably be faster if I had tried harder not to
evaluate things more often than necessary, but see the next paragraph.)

Anyhow, this was not really the point of my exercise.  I did this to
learn about optimization techniques and (especially) effective type
declarations to avoid irresponsible consing.
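
For the record, what I mean by evaluating things more often than
necessary is that my regula falsi step calls the function again for
values that were already computed in the previous step.  A sketch of how
to avoid that (untested; the names here are made up, not from my posted
code) would be to carry the function values through the recursion, so
the function is called only once per step:

(defun find-zero-rf (f xk-1 fk-1 xk fk epsilon)
  (declare (type double-float xk-1 fk-1 xk fk epsilon))
  (let* ((xk+1 (- xk (/ (* fk (- xk xk-1)) (- fk fk-1))))  ; regula falsi step
         (fk+1 (funcall f xk+1)))
    (declare (type double-float xk+1 fk+1))
    (cond ((< (abs fk+1) epsilon) xk+1)
          ((< (* fk fk+1) 0d0)
           (find-zero-rf f xk fk xk+1 fk+1 epsilon))
          (t
           (find-zero-rf f xk-1 fk-1 xk+1 fk+1 epsilon)))))

;; e.g. (find-zero-rf #'test-2 0d0 (test-2 0d0) 100d0 (test-2 100d0) 0.0001d0)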

Regards,
Boris
From: Boris Schaefer
Subject: Re: Efficiency in cmucl
Date: 
Message-ID: <f5b77cc5.0410260547.5aef51b@posting.google.com>
·········@gmail.com (Peter Scott) wrote:
> 
> I recommend moving to the regula falsi method. It should converge much
> faster than the plain bisection method, and it's not very complicated.

I already replied in another article, but I'm replying again because I
repeated the mistake[1] in that article.  Regula falsi does not
converge "much faster" than plain bisection.  They both have linear
convergence.  Still, in my test, regula falsi needs fewer steps than
bisection, but takes longer, because the steps are more expensive (in
my very inept implementation, that is).

Regards,
Boris

[1] I think it's more like an exaggeration than a mistake.