From: Frank Buss
Subject: Functional Textures
Date: 
Message-ID: <d2650l$dai$1@newsreader3.netcologne.de>
I've seen a video presentation on how a 96 kB game was created:

http://spaces.msn.com/members/grammerjack/Blog/cns!1peCHL_zjY_VAAoE21WVgU4Q!117.entry 

and I've tried the texture generation in Lisp:

http://www.frank-buss.de/lisp/texture.html

But for the last big pictures with #'olympic-rings it took half an hour
to calculate the image. How can I improve it? I think the funcalls are
too slow, perhaps it can be assembled together with direct function
calls with some macros at compile-time? And the values should be cached,
perhaps with some memoizing concept, because currently too much points
are calculated multiple times when chaining #'blur and #'emboss.

My goal is to adjust the parameters in real-time with an interactive
GUI, like in the editor for the kkriger game:

http://www.theprodukkt.com/werkkzeug1.html

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Peter Scott
Subject: Re: Functional Textures
Date: 
Message-ID: <1111940393.634644.286240@z14g2000cwz.googlegroups.com>
To speed up the funcalls, perhaps you could declare functions which are
funcalled to be of type FUNCTION. As it is, they can be either a
function or an fbound symbol. If the funcalls really are the slow part
(profile!) this might be able to speed it up.

-Peter
From: Frank Buss
Subject: Re: Functional Textures
Date: 
Message-ID: <d26rpf$olo$1@newsreader3.netcologne.de>
"Peter Scott" <·········@gmail.com> wrote:

> To speed up the funcalls, perhaps you could declare functions which
> are funcalled to be of type FUNCTION. As it is, they can be either a
> function or an fbound symbol. If the funcalls really are the slow part
> (profile!) this might be able to speed it up.

This doesn't help. But I've managed to speed up the calculation of the
big picture from 17 minutes to 80 seconds with caching (see code below).
Now it is not so exact, because the resolution of the cache is only
twice as high as the resolution of the canvas compared to the pure
mathematical solution, which uses as many sub-pixels as required, but
the result looks nearly identicly. A drawback: zooming, twirling etc.
would be impossible or more difficult now. 

But the "werkkzeug" tool is still more than 1000 times faster: You don't
notice a delay in recalculating, when changing a parameter of an operator
of an complex texture with emboss, lights etc., you can do it in real-time.


(defparameter *buffer-delete-callbacks* '())

(defmacro memoized-lambda ((x y) &body body)
  (let ((buffer (gensym))
        (xb (gensym))
        (yb (gensym)))
    `(let (,buffer)
       (lambda (,x ,y)
         (let ((,xb (round (* (+ 1.0 ,x) clc:*width*)))
               (,yb (round (* (+ 1.0 ,y) clc:*height*))))
           (when (< ,xb 0) (setf ,xb 0))
           (when (>= ,xb (* 2 clc:*width*))
             (setf ,xb (1- (* 2 clc:*width*))))
           (when (< ,yb 0) (setf ,yb 0))
           (when (>= ,yb (* 2 clc:*height*))
             (setf ,yb (1- (* 2 clc:*height*))))
           (unless ,buffer
             (setf ,buffer (make-array (list (* 2 clc:*height*)
                                             (* 2 clc:*width*))
                                       :initial-element 1.0e10
                                       :element-type 'single-float))
             (push #'(lambda () (setf ,buffer nil))
                   *buffer-delete-callbacks*))
           (let ((date (aref ,buffer ,yb ,xb)))
             (if (/= date 1.0e10)
                 date
               (setf (aref ,buffer ,yb ,xb) ,@body))))))))

(defun circle (&key x0 y0 radius line-width)
  (memoized-lambda (x y)
    (let ((xc (- x x0))
          (yc (- y y0)))
      (let ((r2 (sqrt (+ (* xc xc) (* yc yc)))))
        (if (and (>= r2 (- radius line-width))
                 (< r2 (+ radius line-width)))
            1.0
          0.0)))))

(defun convolution (&key function matrix sample-radius divisor)
  (memoized-lambda (x y)
    (let ((sum 0.0))
      (loop for xo from -1 to 1 do
            (loop for yo from -1 to 1 do
                  (incf sum (* (aref matrix (1+ yo) (1+ xo))
                               (funcall function
                                        (+ x (* sample-radius xo))
                                        (+ y (* sample-radius yo)))))))
      (setf sum (/ sum divisor))
      sum)))
    
(defconstant *blur-matrix* '#2a((0.7 1.0 0.7)
                                (1.0 1.0 1.0)
                                (0.7 1.0 0.7)))

(defconstant *emboss-matrix* '#2a(( 1.0  0.7  0.0)
                                  ( 0.7  0.0 -0.7)
                                  ( 0.0 -1.0 -0.7)))

(defun blur (&key function sample-radius)
  (convolution :function function
               :matrix *blur-matrix*
               :sample-radius sample-radius
               :divisor 9.0))

(defun emboss (&key function sample-radius)
  (convolution :function function
               :matrix *emboss-matrix*
               :sample-radius sample-radius
               :divisor 1.0))

(defun add (&rest functions)
  (memoized-lambda (x y)
    (let ((sum 0))
      (loop for function in functions do
            (incf sum (funcall function x y)))
      sum)))

(defun black-white (&key function limit)
  (memoized-lambda (x y)
    (if (> (funcall function x y) limit)
        1.0
      0.0)))

(defun paint (function)
  (loop for callback in *buffer-delete-callbacks* do (funcall callback))
  (setf *buffer-delete-callbacks* '())
  (let ((data (make-array (list clc:*height* clc:*width*)))
        (min-date 1.0e10)
        (max-date -1.0e10))
    (loop for y from 0 below clc:*height* do
          (loop for x from 0 below clc:*width* do
                (let ((date (funcall function
                                     (- (* (/ x clc:*width*) 2) 1.0)
                                     (- (* (/ y clc:*height*) 2) 1.0))))
                  (setf (aref data y x) date)
                  (when (< date min-date) (setf min-date date))
                  (when (> date max-date) (setf max-date date)))))
    (let ((diff (- max-date min-date)))
      (when (< diff 0.001) (setf diff 0.001))
      (loop for y from 0 below clc:*height* do
            (loop for x from 0 below clc:*width* do
                  (let ((normalized-date (/ (- (aref data y x) min-date)
                                            diff)))
                    (let ((c (round (* 256 normalized-date))))
                      (when (> c 255) (setf c 255))
                      (when (< c 0) (setf c 0))
                      (setf (clc:framebuffer-point x y)
                            (logior c (ash c 8) (ash c 16)))))))))
  (clc:repaint))

(defun olympia ()
  (clc:show-canvas 300 300 :caption "Olympia" :topmost t)
  (let* ((c1 (circle :x0 -0.5 :y0 -0.5 :radius 0.2 :line-width 0.02))
         (c2 (circle :x0  0.0 :y0 -0.5 :radius 0.2 :line-width 0.02))
         (c3 (circle :x0  0.5 :y0 -0.5 :radius 0.2 :line-width 0.02))
         (c4 (circle :x0  -0.25 :y0 -0.3 :radius 0.2 :line-width 0.02))
         (c5 (circle :x0  0.25 :y0 -0.3 :radius 0.2 :line-width 0.02))
         (olympic (add c1 c2 c3 c4 c5))
         (olympic-bw (black-white :function olympic :limit 0.5))
         (embossed-rings (emboss :function olympic-bw :sample-radius 0.01))
         (added-rings (add olympic-bw embossed-rings))
         (olympic-rings (blur :function added-rings :sample-radius 0.007)))
    (paint olympic-rings)))



-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Peter Scott
Subject: Re: Functional Textures
Date: 
Message-ID: <1111961214.402028.26360@z14g2000cwz.googlegroups.com>
Frank Buss wrote:
> "Peter Scott" <·········@gmail.com> wrote:
> > To speed up the funcalls, perhaps you could declare functions which
> > are funcalled to be of type FUNCTION. As it is, they can be either
a
> > function or an fbound symbol. If the funcalls really are the slow
part
> > (profile!) this might be able to speed it up.
>
> This doesn't help. But I've managed to speed up the calculation of
the
> big picture from 17 minutes to 80 seconds with caching (see code
below).

Once again, high-level optimization beats low level. Who says Lisp is
slow?

-Peter
From: Jens Axel Søgaard
Subject: Re: Functional Textures
Date: 
Message-ID: <4246b0a1$0$253$edfadb0f@dread12.news.tele.dk>
Frank Buss wrote:

> I've seen a video presentation on how a 96 kB game was created:
> 
> http://spaces.msn.com/members/grammerjack/Blog/cns!1peCHL_zjY_VAAoE21WVgU4Q!117.entry 
> 
> and I've tried the texture generation in Lisp:
> 
> http://www.frank-buss.de/lisp/texture.html

Interesting video. Your code reminded me of this nice text
by Jerzy Karczmarczuk:

<http://users.info.unicaen.fr/~karczma/Work/Clastic_distr/ctreport.pdf>

-- 
Jens Axel Søgaard
From: Wade Humeniuk
Subject: Re: Functional Textures
Date: 
Message-ID: <2wD1e.100518$ZO2.61968@edtnps84>
Frank,

Just by adding declarations and some slight recoding to your original code
(not the meomized version) I can get:

LWW 4.4

CL-USER 51 > (time (paint olympic-rings))
Timing the evaluation of (PAINT OLYMPIC-RINGS)

user time    =    103.068
system time  =      0.020
Elapsed time =   0:01:46
Allocation   = 6695847544 bytes standard / 4994 bytes conses
0 Page faults
1

CL-USER 52 >

Code -------------->

(in-package :cl-user)

(defun circle (&key x0 y0 radius line-width)
   (declare (optimize (speed 3) (safety 0))
            (type single-float x0 y0 radius line-width))
   (lambda (x y)
     (declare (optimize (speed 3) (safety o))
              (type single-float x y))
     (let ((xc (- x x0))
           (yc (- y y0)))
       (declare (type single-float xc yc))
       (let ((r2 (sqrt (+ (* xc xc) (* yc yc)))))
         (declare (type single-float r2))
         (if (and (>= r2 (- radius line-width))
                  (< r2 (+ radius line-width)))
             1.0
           0.0)))))

(declaim (inline convolution))
(defun convolution (&key function matrix sample-radius divisor)
   (lambda (x y)
     (declare (optimize (speed 3) (safety 0))
              (type single-float x y))
     (let ((sum 0.0))
       (declare (type single-float sum))
       (loop for xo from -1 to 1 do
             (loop for yo from -1 to 1 do
                   (incf sum (* (aref matrix (1+ yo) (1+ xo))
                                (the single-float (funcall function
                                                          (+ x (* sample-radius xo))
                                                          (+ y (* sample-radius yo))))))))
       (setf sum (/ sum divisor))
       sum)))


(defun blur (&key function sample-radius)
   (declare (optimize (speed 3) (safety 0))
            (type single-float sample-radius))
   (let ((matrix (make-array '(3 3) :element-type 'single-float
                             :initial-contents '((0.7 1.0 0.7)
                                                (1.0 1.0 1.0)
                                                (0.7 1.0 0.7)))))
     (declare (type (simple-array single-float (3 3)) matrix))
     (convolution :function function
                  :matrix matrix
                  :sample-radius sample-radius
                  :divisor 9.0)))

(defun emboss (&key function sample-radius)
   (declare (optimize (speed 3) (safety 0))
            (type single-float sample-radius))
   (let ((matrix (make-array '(3 3) :element-type 'single-float
                             :initial-contents
                             '(( 1.0  0.7  0.0)
                               ( 0.7  0.0 -0.7)
                               ( 0.0 -1.0 -0.7)))))
     (declare (type (simple-array single-float (3 3)) matrix))
     (convolution :function function
                  :matrix matrix
                  :sample-radius sample-radius
                  :divisor 1.0)))

(defun add (&rest functions)
   (lambda (x y)
     (declare (optimize (speed 3) (safety 0))
              (type single-float x y))
     (let ((sum 0.0))
       (declare (type  single-float sum))
       (loop for function in functions do
             (incf sum (funcall function x y)))
       sum)))

(defun black-white (&key function limit)
   (declare (optimize (speed 3) (safety 0))
            (type single-float limit))
   (lambda (x y)
     (declare (optimize (speed 3) (safety 0))
              (type single-float x y))
     (if (> (funcall function x y) limit)
         1.0
       0.0)))

(defun paint (function)
   (declare (optimize (speed 3) (safety 0)))
   (let ((data (make-array (list clc:*height* clc:*width*) :element-type 'single-float))
         (min-date 1.0e10)
         (max-date -1.0e10))
     (declare (type (simple-array single-float (* *)) data)
              (type single-float min-date max-date))
     (loop for y from 0 below clc:*height* do
           (loop for x from 0 below clc:*width* do
                 (let ((date (funcall function
                                      (- (* (/ x clc:*width*) 2) 1.0)
                                      (- (* (/ y clc:*height*) 2) 1.0))))
                   (declare (type single-float date))
                   (setf (aref data y x) date)
                   (when (< date min-date) (setf min-date date))
                   (when (> date max-date) (setf max-date date)))))
     (loop for y from 0 below clc:*height* do
           (loop for x from 0 below clc:*width* do
                 (let ((normalized-date (/ (- (aref data y x) min-date)
                                           (- max-date min-date))))
                   (declare (type single-float normalized-date))
                   (let ((c (round (* 256 normalized-date))))
                     (when (> c 255) (setf c 255))
                     (when (< c 0) (setf c 0))
                     (setf (clc:framebuffer-point x y)
                           (logior c (ash c 8) (ash c 16)))))))
     (clc:repaint)))
From: Frank Buss
Subject: Re: Functional Textures
Date: 
Message-ID: <d274fh$b9v$1@newsreader3.netcologne.de>
Wade Humeniuk <··················@telus.net> wrote:

> Just by adding declarations and some slight recoding to your original
> code (not the meomized version) I can get:

thanks. I've measured the difference of every of your declarations and 
nearly all are useless or only some percent, but only one: Using the code 
from my webpage it took 90 seconds, but with specifying the element-type in 
the paint function it took 38 seconds (your code: 36 seconds).

Same speedup for my memoizing version: With ":element-type 'single-float" 
in paint it took 6.8 seconds, without 14.4 seconds. Now I need only a 10x 
speedup for some useful realtime handling :-)

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Frank Buss
Subject: Re: Functional Textures
Date: 
Message-ID: <d2c7cj$2ea$1@newsreader3.netcologne.de>
Frank Buss <··@frank-buss.de> wrote:

> Same speedup for my memoizing version: With ":element-type
> 'single-float" in paint it took 6.8 seconds, without 14.4 seconds. Now
> I need only a 10x speedup for some useful realtime handling :-)

Wade has managed to improve the speed. Now it runs in 0.4 seconds on my
computer. Now the functions doesn't return a value for every point,
but works on the same array to apply the changes. This is not as
flexible as the pure mathematical solution, but the results look nearly
the same. When I've some time, I'll implement a GUI for changing the
parameters interactivly. 

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Alexander Repenning
Subject: Re: Functional Textures
Date: 
Message-ID: <1112030878.513569.113570@o13g2000cwo.googlegroups.com>
Frank Buss wrote:
> But for the last big pictures with #'olympic-rings it took half an
hour
> to calculate the image. How can I improve it? I think the funcalls
are
> too slow, perhaps it can be assembled together with direct function
> calls with some macros at compile-time? And the values should be
cached,
> perhaps with some memoizing concept, because currently too much
points
> are calculated multiple times when chaining #'blur and #'emboss.

The short answer is: don't use the CPU, use the GPU

blur and emboss can be executed in real time if implemented as pixel
shader programs. Instead of rendering 2x/hour you could render this
100x/second, i.e., a speedup of 180,000

tweaking your Lisp code, or even rewriting your code with assembler,
will not get you anywhere close to this

I am trying to talk some people into helping me writing a Lisp shader
compiler. It would be possible to have a Lisp subset to be compiled
into a shader program.

The alternative is to use an OS that has built-in shader support
accessible via API. OS X Tiger offers real time blur and emboss as part
of Core Image:  http://developer.apple.com/macosx/tiger/coreimage.html
The shader-based approach is fast enough to apply image operations to
running movies.

Alex
From: Joe Marshall
Subject: Re: Functional Textures
Date: 
Message-ID: <8y3xz588.fsf@ccs.neu.edu>
Frank Buss <··@frank-buss.de> writes:

> I've seen a video presentation on how a 96 kB game was created:
>
> http://spaces.msn.com/members/grammerjack/Blog/cns!1peCHL_zjY_VAAoE21WVgU4Q!117.entry 
>
> and I've tried the texture generation in Lisp:
>
> http://www.frank-buss.de/lisp/texture.html
>
> But for the last big pictures with #'olympic-rings it took half an hour
> to calculate the image. How can I improve it? 

Try partial evaluation.
From: Frank Buss
Subject: Re: Functional Textures
Date: 
Message-ID: <d2u7fn$i5o$1@newsreader3.netcologne.de>
Joe Marshall <···@ccs.neu.edu> wrote:

> Try partial evaluation.

Thanks, this is a good idea. I've done something like this a bit with my 
memoization code, but it was not as fast as Wade's code with mainly one 
array. But perhaps it is possible to build one complex function, which 
gets x and y as input and returns the result. But then the function has 
to be simplified. But is this possible with functions, which does 
coordinate transformations, like a warp function?

(defun warp (&key function p0 p1 p2 p3)
  (let* ((x0d (point-x p0))
         (y0d (point-y p0))
         (x1d (point-x p1))
         (y1d (point-y p1))
         (x2d (point-x p2))
         (y2d (point-y p2))
         (x3d (point-x p3))
         (y3d (point-y p3))
         (c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
         (c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
         (c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
         (c03 (/ (+ x3d x2d x1d x0d) 4))
         (c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
         (c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
         (c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
         (c13 (/ (+ y3d y2d y1d y0d) 4)))
    (lambda (x y)
      (let ((x-warped (+ (* c00 x y) (* c01 x) (* c02 y) c03))
            (y-warped (+ (* c10 x y) (* c11 x) (* c12 y) c13)))
        (funcall function x-warped y-warped)))))

(paint (warp :function olympic-rings
             :p0 '(-1.2 -1.0) :p1 '(1.0 -0.8)
             :p2 '(1.0 0.9) :p3 '(-0.4 0.0)))


-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Joe Marshall
Subject: Re: Functional Textures
Date: 
Message-ID: <y8bxuldi.fsf@ccs.neu.edu>
Frank Buss <··@frank-buss.de> writes:

> Joe Marshall <···@ccs.neu.edu> wrote:
>
>> Try partial evaluation.
>
> Thanks, this is a good idea. I've done something like this a bit with my 
> memoization code, but it was not as fast as Wade's code with mainly one 
> array. But perhaps it is possible to build one complex function, which 
> gets x and y as input and returns the result. But then the function has 
> to be simplified. 

I was looking at the original code:

(defun convolution (&key function matrix sample-radius divisor)
  (lambda (x y)
    (let ((sum 0.0))
      (loop for xo from -1 to 1 do
            (loop for yo from -1 to 1 do
                  (incf sum (* (aref matrix (1+ yo) (1+ xo))
                               (funcall function
                                        (+ x (* sample-radius xo))
                                        (+ y (* sample-radius yo)))))))
      (setf sum (/ sum divisor))
      sum)))

and noticing that in EMBOSS and BLUR:

(defun blur (&key function sample-radius)
  (convolution :function function
               :matrix '#2a((0.7 1.0 0.7)
                            (1.0 1.0 1.0)
                            (0.7 1.0 0.7))
               :sample-radius sample-radius
               :divisor 9.0))

(defun emboss (&key function sample-radius)
  (convolution :function function
               :matrix '#2a(( 1.0  0.7  0.0)
                            ( 0.7  0.0 -0.7)
                            ( 0.0 -1.0 -0.7))
               :sample-radius sample-radius
               :divisor 1.0))


that you'll be doing an awful lot of multiplication by 0.0 and 1.0

Although the function and sample-radius are dynamic parameters, since
the image is big, it *still* might be worthwhile to construct a chunk
of source code, run it through the compiler, and funcall the output.

> But is this possible with functions, which does coordinate
> transformations, like a warp function? 

> (defun warp (&key function p0 p1 p2 p3)
>   (let* ((x0d (point-x p0))
>          (y0d (point-y p0))
>          (x1d (point-x p1))
>          (y1d (point-y p1))
>          (x2d (point-x p2))
>          (y2d (point-y p2))
>          (x3d (point-x p3))
>          (y3d (point-y p3))
>          (c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
>          (c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
>          (c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
>          (c03 (/ (+ x3d x2d x1d x0d) 4))
>          (c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
>          (c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
>          (c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
>          (c13 (/ (+ y3d y2d y1d y0d) 4)))
>     (lambda (x y)
>       (let ((x-warped (+ (* c00 x y) (* c01 x) (* c02 y) c03))
>             (y-warped (+ (* c10 x y) (* c11 x) (* c12 y) c13)))
>         (funcall function x-warped y-warped)))))
>
> (paint (warp :function olympic-rings
>              :p0 '(-1.2 -1.0) :p1 '(1.0 -0.8)
>              :p2 '(1.0 0.9) :p3 '(-0.4 0.0)))

Sure.

(define (make-warper &key p0 p1 p2 p3)
   (let* ((x0d (point-x p0))
          (y0d (point-y p0))
          (x1d (point-x p1))
          (y1d (point-y p1))
          (x2d (point-x p2))
          (y2d (point-y p2))
          (x3d (point-x p3))
          (y3d (point-y p3))
          (c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
          (c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
          (c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
          (c03 (/ (+ x3d x2d x1d x0d) 4))
          (c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
          (c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
          (c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
          (c13 (/ (+ y3d y2d y1d y0d) 4)))

    (compile `(lambda (f x y)
                (funcall f (+ (* ,c00 x y) (* ,c01 x) (* ,c02 y) ,c03)
                           (+ (* ,c10 x y) (* ,c11 x) (* ,c12 y)
                           ,c13))))))

This may or may not be worthwhile depending on how often you change
the points and how fast the compiler is.
From: Frank Buss
Subject: Re: Functional Textures
Date: 
Message-ID: <d2un5p$i3i$1@newsreader3.netcologne.de>
Joe Marshall <···@ccs.neu.edu> wrote:

> I was looking at the original code:
> 
> (defun convolution (&key function matrix sample-radius divisor)
>   (lambda (x y)
>     (let ((sum 0.0))
>       (loop for xo from -1 to 1 do
>             (loop for yo from -1 to 1 do
>                   (incf sum (* (aref matrix (1+ yo) (1+ xo))
>                                (funcall function
>                                         (+ x (* sample-radius xo))
>                                         (+ y (* sample-radius
>                               yo))))))) 
>       (setf sum (/ sum divisor))
>       sum)))
> 
> and noticing that in EMBOSS and BLUR:
> 
> (defun blur (&key function sample-radius)
>   (convolution :function function
>                :matrix '#2a((0.7 1.0 0.7)
>                             (1.0 1.0 1.0)
>                             (0.7 1.0 0.7))
>                :sample-radius sample-radius
>                :divisor 9.0))
> 
> (defun emboss (&key function sample-radius)
>   (convolution :function function
>                :matrix '#2a(( 1.0  0.7  0.0)
>                             ( 0.7  0.0 -0.7)
>                             ( 0.0 -1.0 -0.7))
>                :sample-radius sample-radius
>                :divisor 1.0))
> 
> 
> that you'll be doing an awful lot of multiplication by 0.0 and 1.0

yes, the 0.4 version by Wade looks like this:

(defun convolution (&key function matrix sample-radius divisor)
  (declare (optimize (speed 3) (safety 0) (float 0) (debug 0))
           (type fixnum sample-radius)
           (type single-float divisor))
  (destructuring-bind (m00 m01 m02 m10 m11 m12 m20 m21 m22)
      matrix
    (declare (type single-float m00 m01 m02 m10 m11 m12 m20 m21 m22))
    (let ((-sample-radius (* sample-radius -1)))
      (declare (type fixnum -sample-radius))
      (lambda (array)
        (declare (type (simple-array single-float (* *)) array))
        (funcall function array)
        (let ((copy (copy-image-array array)))
          (declare (type (simple-array single-float (* *)) copy))
          (destructuring-bind (height width)
              (array-dimensions array)
            (flet ((aref-internal (x y)
                     (aref copy
                           (cond ((< x 0) 0)
                                 ((>= x width) (1- width))
                                 (t x))
                           (cond ((< y 0) 0)
                                 ((>= y height) (1- height))
                                 (t y)))))
              (declare (inline aref-internal))
              (loop for x fixnum from 0 below width do
                    (loop for y fixnum from 0 below height do
                          (let ((-x (+ x -sample-radius))
                                (+x (+ x sample-radius))
                                (-y (+ y -sample-radius))
                                (+y (+ y sample-radius)))
                            (declare (type fixnum -x +x -y +y))
                            (setf (aref array x y)
                                  (/
                                   (+
                                    (* m00 (aref-internal -x -y))
                                    (* m01 (aref-internal -x y))
                                    (* m02 (aref-internal -x +y))
                                    (* m10 (aref-internal x -y))
                                    (* m11 (aref-internal x y))
                                    (* m12 (aref-internal x +y))
                                    (* m20 (aref-internal +x -y))
                                    (* m21 (aref-internal +x y))
                                    (* m22 (aref-internal +x +y)))
                                   divisor))))))))))))

> Sure.
> 
> (define (make-warper &key p0 p1 p2 p3)
>    (let* ((x0d (point-x p0))
>           (y0d (point-y p0))
>           (x1d (point-x p1))
>           (y1d (point-y p1))
>           (x2d (point-x p2))
>           (y2d (point-y p2))
>           (x3d (point-x p3))
>           (y3d (point-y p3))
>           (c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
>           (c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
>           (c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
>           (c03 (/ (+ x3d x2d x1d x0d) 4))
>           (c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
>           (c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
>           (c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
>           (c13 (/ (+ y3d y2d y1d y0d) 4)))
> 
>     (compile `(lambda (f x y)
>                 (funcall f (+ (* ,c00 x y) (* ,c01 x) (* ,c02 y) ,c03)
>                            (+ (* ,c10 x y) (* ,c11 x) (* ,c12 y)
>                            ,c13))))))
> 
> This may or may not be worthwhile depending on how often you change
> the points and how fast the compiler is.

I have not measured it, but I don't think that this makes much 
difference, because it does not much optimization, only an immediate 
register load (on CPUs where this is possible) instead of reading it from 
memory.

My idea was something like this: (add circle1 circle2) should be 
optimized in a merged version of both circle lambda bodies, without any 
funcalls and perhaps with automatic simplifications (for example, if the 
center of both circles are the same, there are only one xc/xy pair 
needed). And if the result lambda of the add call is used as an argument 
for another function, it should be merged into this function and 
simplified, if possible etc. This would be really nice partial 
evaluation. The merge process should be possible with macros and the 
simplications can be done by the compiler.

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Nicolas Neuss
Subject: Re: Functional Textures
Date: 
Message-ID: <871x9oswvj.fsf@ortler.iwr.uni-heidelberg.de>
Hello,

probably you should also not use 2d-arrays.  Some time ago, I wrote similar
code which compared favorably with gcc [1].  Unfortunately, this depends
quite a lot on the architecture.

Nicolas.

[1] N. Neuss: On using Common Lisp for Scientific Computing
    see <http://sit.iwr.uni-heidelberg.de/~neuss/publications.html>
    and <http://sit.iwr.uni-heidelberg.de/~neuss/misc/codegen.lisp>
From: Joe Marshall
Subject: Re: Functional Textures
Date: 
Message-ID: <is30uhlc.fsf@ccs.neu.edu>
Frank Buss <··@frank-buss.de> writes:

> Joe Marshall <···@ccs.neu.edu> wrote:
>
>> I was looking at the original code:
>> 
>> (defun convolution (&key function matrix sample-radius divisor)
>>   (lambda (x y)
>>     (let ((sum 0.0))
>>       (loop for xo from -1 to 1 do
>>             (loop for yo from -1 to 1 do
>>                   (incf sum (* (aref matrix (1+ yo) (1+ xo))
>>                                (funcall function
>>                                         (+ x (* sample-radius xo))
>>                                         (+ y (* sample-radius
>>                               yo))))))) 
>>       (setf sum (/ sum divisor))
>>       sum)))
>> 
>> and noticing that in EMBOSS and BLUR:
>> 
>> (defun blur (&key function sample-radius)
>>   (convolution :function function
>>                :matrix '#2a((0.7 1.0 0.7)
>>                             (1.0 1.0 1.0)
>>                             (0.7 1.0 0.7))
>>                :sample-radius sample-radius
>>                :divisor 9.0))
>> 
>> (defun emboss (&key function sample-radius)
>>   (convolution :function function
>>                :matrix '#2a(( 1.0  0.7  0.0)
>>                             ( 0.7  0.0 -0.7)
>>                             ( 0.0 -1.0 -0.7))
>>                :sample-radius sample-radius
>>                :divisor 1.0))
>> 
>> 
>> that you'll be doing an awful lot of multiplication by 0.0 and 1.0
>
> yes, the 0.4 version by Wade looks like this:
>
> (defun convolution (&key function matrix sample-radius divisor)
>   (declare (optimize (speed 3) (safety 0) (float 0) (debug 0))
>            (type fixnum sample-radius)
>            (type single-float divisor))
>   (destructuring-bind (m00 m01 m02 m10 m11 m12 m20 m21 m22)
>       matrix
>     (declare (type single-float m00 m01 m02 m10 m11 m12 m20 m21 m22))
>     (let ((-sample-radius (* sample-radius -1)))
>       (declare (type fixnum -sample-radius))
>       (lambda (array)
>         (declare (type (simple-array single-float (* *)) array))
>         (funcall function array)
>         (let ((copy (copy-image-array array)))
>           (declare (type (simple-array single-float (* *)) copy))
>           (destructuring-bind (height width)
>               (array-dimensions array)
>             (flet ((aref-internal (x y)
>                      (aref copy
>                            (cond ((< x 0) 0)
>                                  ((>= x width) (1- width))
>                                  (t x))
>                            (cond ((< y 0) 0)
>                                  ((>= y height) (1- height))
>                                  (t y)))))
>               (declare (inline aref-internal))
>               (loop for x fixnum from 0 below width do
>                     (loop for y fixnum from 0 below height do
>                           (let ((-x (+ x -sample-radius))
>                                 (+x (+ x sample-radius))
>                                 (-y (+ y -sample-radius))
>                                 (+y (+ y sample-radius)))
>                             (declare (type fixnum -x +x -y +y))
>                             (setf (aref array x y)
>                                   (/
>                                    (+
>                                     (* m00 (aref-internal -x -y))
>                                     (* m01 (aref-internal -x y))
>                                     (* m02 (aref-internal -x +y))
>                                     (* m10 (aref-internal x -y))
>                                     (* m11 (aref-internal x y))
>                                     (* m12 (aref-internal x +y))
>                                     (* m20 (aref-internal +x -y))
>                                     (* m21 (aref-internal +x y))
>                                     (* m22 (aref-internal +x +y)))
>                                    divisor))))))))))))

That helps, but you really want the matrix coefficients inlined.
Consider the EMBOSS function.  Since m02, m11, and m20 are zero, the
last form could be written like this:

                             (setf (aref array x y)
                                   (/
                                    (+
                                     (* m00 (aref-internal -x -y))
                                     (* m01 (aref-internal -x y))
                                     (* m10 (aref-internal x -y))
                                     (* m12 (aref-internal x +y))
                                     (* m21 (aref-internal +x y))
                                     (* m22 (aref-internal +x +y))
                                     )
                                    divisor))))))))))))

Not only do we save the multiply, we save the aref-internal and the
associated computations.

>> Sure.
>> 
>> (define (make-warper &key p0 p1 p2 p3)
>>    (let* ((x0d (point-x p0))
>>           (y0d (point-y p0))
>>           (x1d (point-x p1))
>>           (y1d (point-y p1))
>>           (x2d (point-x p2))
>>           (y2d (point-y p2))
>>           (x3d (point-x p3))
>>           (y3d (point-y p3))
>>           (c00 (/ (+ (- x3d) x2d (- x1d) x0d) 4))
>>           (c01 (- (/ (+ (- x3d x2d x1d) x0d) 4)))
>>           (c02 (- (/ (+ (- x3d) (- x2d) x1d x0d) 4)))
>>           (c03 (/ (+ x3d x2d x1d x0d) 4))
>>           (c10 (/ (+ (- y3d) y2d (- y1d) y0d) 4))
>>           (c11 (- (/ (+ (- y3d y2d y1d) y0d) 4)))
>>           (c12 (- (/ (+ (- y3d) (- y2d) y1d y0d) 4)))
>>           (c13 (/ (+ y3d y2d y1d y0d) 4)))
>> 
>>     (compile `(lambda (f x y)
>>                 (funcall f (+ (* ,c00 x y) (* ,c01 x) (* ,c02 y) ,c03)
>>                            (+ (* ,c10 x y) (* ,c11 x) (* ,c12 y)
>>                            ,c13))))))
>> 
>> This may or may not be worthwhile depending on how often you change
>> the points and how fast the compiler is.
>
> I have not measured it, but I don't think that this makes much 
> difference, because it does not much optimization, only an immediate 
> register load (on CPUs where this is possible) instead of reading it from 
> memory.

It depends.  If any of the cXX values end up being zero, you avoid the
register load *and* the multiply.  If any end up being one, you at
least avoid the multiply.

> My idea was something like this: (add circle1 circle2) should be 
> optimized in a merged version of both circle lambda bodies, without any 
> funcalls and perhaps with automatic simplifications (for example, if the 
> center of both circles are the same, there are only one xc/xy pair 
> needed). And if the result lambda of the add call is used as an argument 
> for another function, it should be merged into this function and 
> simplified, if possible etc. This would be really nice partial 
> evaluation. The merge process should be possible with macros and the 
> simplications can be done by the compiler.

This would be a bigger win than the suggestions above.

Jeffrey Mark Siskind has a partial evaluator for Common Lisp that I've
played with.  Partial evaluation is tricky, but the payoff is big.