I've seen a video presentation on how a 96 kB game was created:
http://spaces.msn.com/members/grammerjack/Blog/cns!1peCHL_zjY_VAAoE21WVgU4Q!117.entry
and I've tried the texture generation in Lisp:
http://www.frank-buss.de/lisp/texture.html
But for the last big pictures with #'olympic-rings it took half an hour
to calculate the image. How can I improve it? I think the funcalls are
too slow, perhaps it can be assembled together with direct function
calls with some macros at compile-time? And the values should be cached,
perhaps with some memoizing concept, because currently too much points
are calculated multiple times when chaining #'blur and #'emboss.
My goal is to adjust the parameters in real-time with an interactive
GUI, like in the editor for the kkriger game:
http://www.theprodukkt.com/werkkzeug1.html
--
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
To speed up the funcalls, perhaps you could declare functions which are
funcalled to be of type FUNCTION. As it is, they can be either a
function or an fbound symbol. If the funcalls really are the slow part
(profile!) this might be able to speed it up.
-Peter
"Peter Scott" <·········@gmail.com> wrote:
> To speed up the funcalls, perhaps you could declare functions which
> are funcalled to be of type FUNCTION. As it is, they can be either a
> function or an fbound symbol. If the funcalls really are the slow part
> (profile!) this might be able to speed it up.
This doesn't help. But I've managed to speed up the calculation of the
big picture from 17 minutes to 80 seconds with caching (see code below).
Now it is not so exact, because the resolution of the cache is only
twice as high as the resolution of the canvas compared to the pure
mathematical solution, which uses as many sub-pixels as required, but
the result looks nearly identicly. A drawback: zooming, twirling etc.
would be impossible or more difficult now.
But the "werkkzeug" tool is still more than 1000 times faster: You don't
notice a delay in recalculating, when changing a parameter of an operator
of an complex texture with emboss, lights etc., you can do it in real-time.
(defparameter *buffer-delete-callbacks* '())
(defmacro memoized-lambda ((x y) &body body)
(let ((buffer (gensym))
(xb (gensym))
(yb (gensym)))
`(let (,buffer)
(lambda (,x ,y)
(let ((,xb (round (* (+ 1.0 ,x) clc:*width*)))
(,yb (round (* (+ 1.0 ,y) clc:*height*))))
(when (< ,xb 0) (setf ,xb 0))
(when (>= ,xb (* 2 clc:*width*))
(setf ,xb (1- (* 2 clc:*width*))))
(when (< ,yb 0) (setf ,yb 0))
(when (>= ,yb (* 2 clc:*height*))
(setf ,yb (1- (* 2 clc:*height*))))
(unless ,buffer
(setf ,buffer (make-array (list (* 2 clc:*height*)
(* 2 clc:*width*))
:initial-element 1.0e10
:element-type 'single-float))
(push #'(lambda () (setf ,buffer nil))
*buffer-delete-callbacks*))
(let ((date (aref ,buffer ,yb ,xb)))
(if (/= date 1.0e10)
date
(setf (aref ,buffer ,yb ,xb) ,@body))))))))
(defun circle (&key x0 y0 radius line-width)
(memoized-lambda (x y)
(let ((xc (- x x0))
(yc (- y y0)))
(let ((r2 (sqrt (+ (* xc xc) (* yc yc)))))
(if (and (>= r2 (- radius line-width))
(< r2 (+ radius line-width)))
1.0
0.0)))))
(defun convolution (&key function matrix sample-radius divisor)
(memoized-lambda (x y)
(let ((sum 0.0))
(loop for xo from -1 to 1 do
(loop for yo from -1 to 1 do
(incf sum (* (aref matrix (1+ yo) (1+ xo))
(funcall function
(+ x (* sample-radius xo))
(+ y (* sample-radius yo)))))))
(setf sum (/ sum divisor))
sum)))
(defconstant *blur-matrix* '#2a((0.7 1.0 0.7)
(1.0 1.0 1.0)
(0.7 1.0 0.7)))
(defconstant *emboss-matrix* '#2a(( 1.0 0.7 0.0)
( 0.7 0.0 -0.7)
( 0.0 -1.0 -0.7)))
(defun blur (&key function sample-radius)
(convolution :function function
:matrix *blur-matrix*
:sample-radius sample-radius
:divisor 9.0))
(defun emboss (&key function sample-radius)
(convolution :function function
:matrix *emboss-matrix*
:sample-radius sample-radius
:divisor 1.0))
(defun add (&rest functions)
(memoized-lambda (x y)
(let ((sum 0))
(loop for function in functions do
(incf sum (funcall function x y)))
sum)))
(defun black-white (&key function limit)
(memoized-lambda (x y)
(if (> (funcall function x y) limit)
1.0
0.0)))
(defun paint (function)
(loop for callback in *buffer-delete-callbacks* do (funcall callback))
(setf *buffer-delete-callbacks* '())
(let ((data (make-array (list clc:*height* clc:*width*)))
(min-date 1.0e10)
(max-date -1.0e10))
(loop for y from 0 below clc:*height* do
(loop for x from 0 below clc:*width* do
(let ((date (funcall function
(- (* (/ x clc:*width*) 2) 1.0)
(- (* (/ y clc:*height*) 2) 1.0))))
(setf (aref data y x) date)
(when (< date min-date) (setf min-date date))
(when (> date max-date) (setf max-date date)))))
(let ((diff (- max-date min-date)))
(when (< diff 0.001) (setf diff 0.001))
(loop for y from 0 below clc:*height* do
(loop for x from 0 below clc:*width* do
(let ((normalized-date (/ (- (aref data y x) min-date)
diff)))
(let ((c (round (* 256 normalized-date))))
(when (> c 255) (setf c 255))
(when (< c 0) (setf c 0))
(setf (clc:framebuffer-point x y)
(logior c (ash c 8) (ash c 16)))))))))
(clc:repaint))
(defun olympia ()
(clc:show-canvas 300 300 :caption "Olympia" :topmost t)
(let* ((c1 (circle :x0 -0.5 :y0 -0.5 :radius 0.2 :line-width 0.02))
(c2 (circle :x0 0.0 :y0 -0.5 :radius 0.2 :line-width 0.02))
(c3 (circle :x0 0.5 :y0 -0.5 :radius 0.2 :line-width 0.02))
(c4 (circle :x0 -0.25 :y0 -0.3 :radius 0.2 :line-width 0.02))
(c5 (circle :x0 0.25 :y0 -0.3 :radius 0.2 :line-width 0.02))
(olympic (add c1 c2 c3 c4 c5))
(olympic-bw (black-white :function olympic :limit 0.5))
(embossed-rings (emboss :function olympic-bw :sample-radius 0.01))
(added-rings (add olympic-bw embossed-rings))
(olympic-rings (blur :function added-rings :sample-radius 0.007)))
(paint olympic-rings)))
--
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Frank Buss wrote:
> "Peter Scott" <·········@gmail.com> wrote:
> > To speed up the funcalls, perhaps you could declare functions which
> > are funcalled to be of type FUNCTION. As it is, they can be either
a
> > function or an fbound symbol. If the funcalls really are the slow
part
> > (profile!) this might be able to speed it up.
>
> This doesn't help. But I've managed to speed up the calculation of
the
> big picture from 17 minutes to 80 seconds with caching (see code
below).
Once again, high-level optimization beats low level. Who says Lisp is
slow?
-Peter
Frank Buss wrote:
> I've seen a video presentation on how a 96 kB game was created:
>
> http://spaces.msn.com/members/grammerjack/Blog/cns!1peCHL_zjY_VAAoE21WVgU4Q!117.entry
>
> and I've tried the texture generation in Lisp:
>
> http://www.frank-buss.de/lisp/texture.html
Interesting video. Your code reminded me of this nice text
by Jerzy Karczmarczuk:
<http://users.info.unicaen.fr/~karczma/Work/Clastic_distr/ctreport.pdf>
--
Jens Axel Søgaard
Wade Humeniuk <··················@telus.net> wrote:
> Just by adding declarations and some slight recoding to your original
> code (not the meomized version) I can get:
thanks. I've measured the difference of every of your declarations and
nearly all are useless or only some percent, but only one: Using the code
from my webpage it took 90 seconds, but with specifying the element-type in
the paint function it took 38 seconds (your code: 36 seconds).
Same speedup for my memoizing version: With ":element-type 'single-float"
in paint it took 6.8 seconds, without 14.4 seconds. Now I need only a 10x
speedup for some useful realtime handling :-)
--
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Frank Buss <··@frank-buss.de> wrote:
> Same speedup for my memoizing version: With ":element-type
> 'single-float" in paint it took 6.8 seconds, without 14.4 seconds. Now
> I need only a 10x speedup for some useful realtime handling :-)
Wade has managed to improve the speed. Now it runs in 0.4 seconds on my
computer. Now the functions doesn't return a value for every point,
but works on the same array to apply the changes. This is not as
flexible as the pure mathematical solution, but the results look nearly
the same. When I've some time, I'll implement a GUI for changing the
parameters interactivly.
--
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Frank Buss wrote:
> But for the last big pictures with #'olympic-rings it took half an
hour
> to calculate the image. How can I improve it? I think the funcalls
are
> too slow, perhaps it can be assembled together with direct function
> calls with some macros at compile-time? And the values should be
cached,
> perhaps with some memoizing concept, because currently too much
points
> are calculated multiple times when chaining #'blur and #'emboss.
The short answer is: don't use the CPU, use the GPU
blur and emboss can be executed in real time if implemented as pixel
shader programs. Instead of rendering 2x/hour you could render this
100x/second, i.e., a speedup of 180,000
tweaking your Lisp code, or even rewriting your code with assembler,
will not get you anywhere close to this
I am trying to talk some people into helping me writing a Lisp shader
compiler. It would be possible to have a Lisp subset to be compiled
into a shader program.
The alternative is to use an OS that has built-in shader support
accessible via API. OS X Tiger offers real time blur and emboss as part
of Core Image: http://developer.apple.com/macosx/tiger/coreimage.html
The shader-based approach is fast enough to apply image operations to
running movies.
Alex
Frank Buss <··@frank-buss.de> writes:
> I've seen a video presentation on how a 96 kB game was created:
>
> http://spaces.msn.com/members/grammerjack/Blog/cns!1peCHL_zjY_VAAoE21WVgU4Q!117.entry
>
> and I've tried the texture generation in Lisp:
>
> http://www.frank-buss.de/lisp/texture.html
>
> But for the last big pictures with #'olympic-rings it took half an hour
> to calculate the image. How can I improve it?
Try partial evaluation.
Joe Marshall <···@ccs.neu.edu> wrote:
> Try partial evaluation.
Thanks, this is a good idea. I've done something like this a bit with my
memoization code, but it was not as fast as Wade's code with mainly one
array. But perhaps it is possible to build one complex function, which
gets x and y as input and returns the result. But then the function has
to be simplified. But is this possible with functions, which does
coordinate transformations, like a warp function?
(defun warp (&key function p0 p1 p2 p3)
(let* ((x0d (point-x p0))
(y0d (point-y p0))
(x1d (point-x p1))
(y1d (point-y p1))
(x2d (point-x p2))
(y2d (point-y p2))
(x3d (point-x p3))
(y3d (point-y p3))
(c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
(c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
(c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
(c03 (/ (+ x3d x2d x1d x0d) 4))
(c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
(c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
(c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
(c13 (/ (+ y3d y2d y1d y0d) 4)))
(lambda (x y)
(let ((x-warped (+ (* c00 x y) (* c01 x) (* c02 y) c03))
(y-warped (+ (* c10 x y) (* c11 x) (* c12 y) c13)))
(funcall function x-warped y-warped)))))
(paint (warp :function olympic-rings
:p0 '(-1.2 -1.0) :p1 '(1.0 -0.8)
:p2 '(1.0 0.9) :p3 '(-0.4 0.0)))
--
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Frank Buss <··@frank-buss.de> writes:
> Joe Marshall <···@ccs.neu.edu> wrote:
>
>> Try partial evaluation.
>
> Thanks, this is a good idea. I've done something like this a bit with my
> memoization code, but it was not as fast as Wade's code with mainly one
> array. But perhaps it is possible to build one complex function, which
> gets x and y as input and returns the result. But then the function has
> to be simplified.
I was looking at the original code:
(defun convolution (&key function matrix sample-radius divisor)
(lambda (x y)
(let ((sum 0.0))
(loop for xo from -1 to 1 do
(loop for yo from -1 to 1 do
(incf sum (* (aref matrix (1+ yo) (1+ xo))
(funcall function
(+ x (* sample-radius xo))
(+ y (* sample-radius yo)))))))
(setf sum (/ sum divisor))
sum)))
and noticing that in EMBOSS and BLUR:
(defun blur (&key function sample-radius)
(convolution :function function
:matrix '#2a((0.7 1.0 0.7)
(1.0 1.0 1.0)
(0.7 1.0 0.7))
:sample-radius sample-radius
:divisor 9.0))
(defun emboss (&key function sample-radius)
(convolution :function function
:matrix '#2a(( 1.0 0.7 0.0)
( 0.7 0.0 -0.7)
( 0.0 -1.0 -0.7))
:sample-radius sample-radius
:divisor 1.0))
that you'll be doing an awful lot of multiplication by 0.0 and 1.0
Although the function and sample-radius are dynamic parameters, since
the image is big, it *still* might be worthwhile to construct a chunk
of source code, run it through the compiler, and funcall the output.
> But is this possible with functions, which does coordinate
> transformations, like a warp function?
> (defun warp (&key function p0 p1 p2 p3)
> (let* ((x0d (point-x p0))
> (y0d (point-y p0))
> (x1d (point-x p1))
> (y1d (point-y p1))
> (x2d (point-x p2))
> (y2d (point-y p2))
> (x3d (point-x p3))
> (y3d (point-y p3))
> (c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
> (c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
> (c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
> (c03 (/ (+ x3d x2d x1d x0d) 4))
> (c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
> (c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
> (c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
> (c13 (/ (+ y3d y2d y1d y0d) 4)))
> (lambda (x y)
> (let ((x-warped (+ (* c00 x y) (* c01 x) (* c02 y) c03))
> (y-warped (+ (* c10 x y) (* c11 x) (* c12 y) c13)))
> (funcall function x-warped y-warped)))))
>
> (paint (warp :function olympic-rings
> :p0 '(-1.2 -1.0) :p1 '(1.0 -0.8)
> :p2 '(1.0 0.9) :p3 '(-0.4 0.0)))
Sure.
(define (make-warper &key p0 p1 p2 p3)
(let* ((x0d (point-x p0))
(y0d (point-y p0))
(x1d (point-x p1))
(y1d (point-y p1))
(x2d (point-x p2))
(y2d (point-y p2))
(x3d (point-x p3))
(y3d (point-y p3))
(c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
(c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
(c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
(c03 (/ (+ x3d x2d x1d x0d) 4))
(c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
(c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
(c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
(c13 (/ (+ y3d y2d y1d y0d) 4)))
(compile `(lambda (f x y)
(funcall f (+ (* ,c00 x y) (* ,c01 x) (* ,c02 y) ,c03)
(+ (* ,c10 x y) (* ,c11 x) (* ,c12 y)
,c13))))))
This may or may not be worthwhile depending on how often you change
the points and how fast the compiler is.
Joe Marshall <···@ccs.neu.edu> wrote:
> I was looking at the original code:
>
> (defun convolution (&key function matrix sample-radius divisor)
> (lambda (x y)
> (let ((sum 0.0))
> (loop for xo from -1 to 1 do
> (loop for yo from -1 to 1 do
> (incf sum (* (aref matrix (1+ yo) (1+ xo))
> (funcall function
> (+ x (* sample-radius xo))
> (+ y (* sample-radius
> yo)))))))
> (setf sum (/ sum divisor))
> sum)))
>
> and noticing that in EMBOSS and BLUR:
>
> (defun blur (&key function sample-radius)
> (convolution :function function
> :matrix '#2a((0.7 1.0 0.7)
> (1.0 1.0 1.0)
> (0.7 1.0 0.7))
> :sample-radius sample-radius
> :divisor 9.0))
>
> (defun emboss (&key function sample-radius)
> (convolution :function function
> :matrix '#2a(( 1.0 0.7 0.0)
> ( 0.7 0.0 -0.7)
> ( 0.0 -1.0 -0.7))
> :sample-radius sample-radius
> :divisor 1.0))
>
>
> that you'll be doing an awful lot of multiplication by 0.0 and 1.0
yes, the 0.4 version by Wade looks like this:
(defun convolution (&key function matrix sample-radius divisor)
(declare (optimize (speed 3) (safety 0) (float 0) (debug 0))
(type fixnum sample-radius)
(type single-float divisor))
(destructuring-bind (m00 m01 m02 m10 m11 m12 m20 m21 m22)
matrix
(declare (type single-float m00 m01 m02 m10 m11 m12 m20 m21 m22))
(let ((-sample-radius (* sample-radius -1)))
(declare (type fixnum -sample-radius))
(lambda (array)
(declare (type (simple-array single-float (* *)) array))
(funcall function array)
(let ((copy (copy-image-array array)))
(declare (type (simple-array single-float (* *)) copy))
(destructuring-bind (height width)
(array-dimensions array)
(flet ((aref-internal (x y)
(aref copy
(cond ((< x 0) 0)
((>= x width) (1- width))
(t x))
(cond ((< y 0) 0)
((>= y height) (1- height))
(t y)))))
(declare (inline aref-internal))
(loop for x fixnum from 0 below width do
(loop for y fixnum from 0 below height do
(let ((-x (+ x -sample-radius))
(+x (+ x sample-radius))
(-y (+ y -sample-radius))
(+y (+ y sample-radius)))
(declare (type fixnum -x +x -y +y))
(setf (aref array x y)
(/
(+
(* m00 (aref-internal -x -y))
(* m01 (aref-internal -x y))
(* m02 (aref-internal -x +y))
(* m10 (aref-internal x -y))
(* m11 (aref-internal x y))
(* m12 (aref-internal x +y))
(* m20 (aref-internal +x -y))
(* m21 (aref-internal +x y))
(* m22 (aref-internal +x +y)))
divisor))))))))))))
> Sure.
>
> (define (make-warper &key p0 p1 p2 p3)
> (let* ((x0d (point-x p0))
> (y0d (point-y p0))
> (x1d (point-x p1))
> (y1d (point-y p1))
> (x2d (point-x p2))
> (y2d (point-y p2))
> (x3d (point-x p3))
> (y3d (point-y p3))
> (c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
> (c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
> (c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
> (c03 (/ (+ x3d x2d x1d x0d) 4))
> (c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
> (c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
> (c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
> (c13 (/ (+ y3d y2d y1d y0d) 4)))
>
> (compile `(lambda (f x y)
> (funcall f (+ (* ,c00 x y) (* ,c01 x) (* ,c02 y) ,c03)
> (+ (* ,c10 x y) (* ,c11 x) (* ,c12 y)
> ,c13))))))
>
> This may or may not be worthwhile depending on how often you change
> the points and how fast the compiler is.
I have not measured it, but I don't think that this makes much
difference, because it does not much optimization, only an immediate
register load (on CPUs where this is possible) instead of reading it from
memory.
My idea was something like this: (add circle1 circle2) should be
optimized in a merged version of both circle lambda bodies, without any
funcalls and perhaps with automatic simplifications (for example, if the
center of both circles are the same, there are only one xc/xy pair
needed). And if the result lambda of the add call is used as an argument
for another function, it should be merged into this function and
simplified, if possible etc. This would be really nice partial
evaluation. The merge process should be possible with macros and the
simplications can be done by the compiler.
--
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Hello,
probably you should also not use 2d-arrays. Some time ago, I wrote similar
code which compared favorably with gcc [1]. Unfortunately, this depends
quite a lot on the architecture.
Nicolas.
[1] N. Neuss: On using Common Lisp for Scientific Computing
see <http://sit.iwr.uni-heidelberg.de/~neuss/publications.html>
and <http://sit.iwr.uni-heidelberg.de/~neuss/misc/codegen.lisp>
Frank Buss <··@frank-buss.de> writes:
> Joe Marshall <···@ccs.neu.edu> wrote:
>
>> I was looking at the original code:
>>
>> (defun convolution (&key function matrix sample-radius divisor)
>> (lambda (x y)
>> (let ((sum 0.0))
>> (loop for xo from -1 to 1 do
>> (loop for yo from -1 to 1 do
>> (incf sum (* (aref matrix (1+ yo) (1+ xo))
>> (funcall function
>> (+ x (* sample-radius xo))
>> (+ y (* sample-radius
>> yo)))))))
>> (setf sum (/ sum divisor))
>> sum)))
>>
>> and noticing that in EMBOSS and BLUR:
>>
>> (defun blur (&key function sample-radius)
>> (convolution :function function
>> :matrix '#2a((0.7 1.0 0.7)
>> (1.0 1.0 1.0)
>> (0.7 1.0 0.7))
>> :sample-radius sample-radius
>> :divisor 9.0))
>>
>> (defun emboss (&key function sample-radius)
>> (convolution :function function
>> :matrix '#2a(( 1.0 0.7 0.0)
>> ( 0.7 0.0 -0.7)
>> ( 0.0 -1.0 -0.7))
>> :sample-radius sample-radius
>> :divisor 1.0))
>>
>>
>> that you'll be doing an awful lot of multiplication by 0.0 and 1.0
>
> yes, the 0.4 version by Wade looks like this:
>
> (defun convolution (&key function matrix sample-radius divisor)
> (declare (optimize (speed 3) (safety 0) (float 0) (debug 0))
> (type fixnum sample-radius)
> (type single-float divisor))
> (destructuring-bind (m00 m01 m02 m10 m11 m12 m20 m21 m22)
> matrix
> (declare (type single-float m00 m01 m02 m10 m11 m12 m20 m21 m22))
> (let ((-sample-radius (* sample-radius -1)))
> (declare (type fixnum -sample-radius))
> (lambda (array)
> (declare (type (simple-array single-float (* *)) array))
> (funcall function array)
> (let ((copy (copy-image-array array)))
> (declare (type (simple-array single-float (* *)) copy))
> (destructuring-bind (height width)
> (array-dimensions array)
> (flet ((aref-internal (x y)
> (aref copy
> (cond ((< x 0) 0)
> ((>= x width) (1- width))
> (t x))
> (cond ((< y 0) 0)
> ((>= y height) (1- height))
> (t y)))))
> (declare (inline aref-internal))
> (loop for x fixnum from 0 below width do
> (loop for y fixnum from 0 below height do
> (let ((-x (+ x -sample-radius))
> (+x (+ x sample-radius))
> (-y (+ y -sample-radius))
> (+y (+ y sample-radius)))
> (declare (type fixnum -x +x -y +y))
> (setf (aref array x y)
> (/
> (+
> (* m00 (aref-internal -x -y))
> (* m01 (aref-internal -x y))
> (* m02 (aref-internal -x +y))
> (* m10 (aref-internal x -y))
> (* m11 (aref-internal x y))
> (* m12 (aref-internal x +y))
> (* m20 (aref-internal +x -y))
> (* m21 (aref-internal +x y))
> (* m22 (aref-internal +x +y)))
> divisor))))))))))))
That helps, but you really want the matrix coefficients inlined.
Consider the EMBOSS function. Since m02, m11, and m20 are zero, the
last form could be written like this:
(setf (aref array x y)
(/
(+
(* m00 (aref-internal -x -y))
(* m01 (aref-internal -x y))
(* m10 (aref-internal x -y))
(* m12 (aref-internal x +y))
(* m21 (aref-internal +x y))
(* m22 (aref-internal +x +y))
)
divisor))))))))))))
Not only do we save the multiply, we save the aref-internal and the
associated computations.
>> Sure.
>>
>> (define (make-warper &key p0 p1 p2 p3)
>> (let* ((x0d (point-x p0))
>> (y0d (point-y p0))
>> (x1d (point-x p1))
>> (y1d (point-y p1))
>> (x2d (point-x p2))
>> (y2d (point-y p2))
>> (x3d (point-x p3))
>> (y3d (point-y p3))
>> (c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
>> (c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
>> (c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
>> (c03 (/ (+ x3d x2d x1d x0d) 4))
>> (c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
>> (c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
>> (c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
>> (c13 (/ (+ y3d y2d y1d y0d) 4)))
>>
>> (compile `(lambda (f x y)
>> (funcall f (+ (* ,c00 x y) (* ,c01 x) (* ,c02 y) ,c03)
>> (+ (* ,c10 x y) (* ,c11 x) (* ,c12 y)
>> ,c13))))))
>>
>> This may or may not be worthwhile depending on how often you change
>> the points and how fast the compiler is.
>
> I have not measured it, but I don't think that this makes much
> difference, because it does not much optimization, only an immediate
> register load (on CPUs where this is possible) instead of reading it from
> memory.
It depends. If any of the cXX values end up being zero, you avoid the
register load *and* the multiply. If any end up being one, you at
least avoid the multiply.
> My idea was something like this: (add circle1 circle2) should be
> optimized in a merged version of both circle lambda bodies, without any
> funcalls and perhaps with automatic simplifications (for example, if the
> center of both circles are the same, there are only one xc/xy pair
> needed). And if the result lambda of the add call is used as an argument
> for another function, it should be merged into this function and
> simplified, if possible etc. This would be really nice partial
> evaluation. The merge process should be possible with macros and the
> simplications can be done by the compiler.
This would be a bigger win than the suggestions above.
Jeffrey Mark Siskind has a partial evaluator for Common Lisp that I've
played with. Partial evaluation is tricky, but the payoff is big.