Help with GA, and critique my Lisp (please ;-))

From: William Bland
Subject: Help with GA, and critique my Lisp (please ;-))
Date: Mon, 27 Sep 2004 16:25:38 +0000
Message-ID: <pan.2004.09.27.16.17.29.325052@abstractnonsense.com>

Hi all,
	going on the theory that you have to not mind looking like an idiot if
you want to learn anything worth knowing...

I wanted to try something a little more advanced than the toy problems
I've been solving in Lisp.  Actually this one's still a toy, but I can't
get it to work!  I have to (shamefully) admit that I thought I understood
Genetic Algorithms because I've read a lot about how other people have
used them, but I now realise I know next to nothing.  This is the first
time I've tried to implement one myself and, embarrassingly the fitness
(both max and average) seems to trend downwards!  It never gets close to
solving the problem (simply finding a polynomial that passes through a
given set of points in the plane).  I'd be very interested to hear from
any GA gurus who might be able to see what I'm doing wrong.

I'm also interested to hear what warts I might have in my Lisp code, as
this is one of the largest chunks of CL code I've written so far.  I
probably still write CL with a bit of a Scheme accent (hopefully my Java
accent doesn't show too much though!).  Oh, I know using ERROR to end the
program is *unspeakably* ugly - any thoughts on the Right Way to do it?

Thanks very much for any comments.

;;;; ga.lisp  Try to find a polynomial to fit a set of points
;;;; There is an easy exact solution - I wanted to see if a
;;;; GA could find it.

(defun evaluate-poly (p x)  ; (7 3 3.14) represents 7 + 3x + 3.14x^2
  (labels ((looper (p acc power)
	      (if p
		  (looper (rest p) (+ acc (* (first p) (expt x power)))
		     (1+ power))
		  acc)))
    (looper p 0 0)))

(defun fitness (p points)  ; points is a list of points (z1 z2 ...)
  (let ((dist 0))
    (dolist (pt points dist)
      (incf dist (abs (- (imagpart pt) (evaluate-poly p (realpart pt))))))
    (if (= dist 0)
	(error "Found solution")
	(/ 1 dist))))

(defun make-random-poly ()
  (let ((ret nil))
    (dotimes (i (random 10) ret)
      (push (- (random 10.0) (random 10.0)) ret))))

(defun make-random-point ()
  (complex (- (random 10.0) (random 10.0))
	   (- (random 10.0) (random 10.0))))

(defun mate-polys (p q)
  (cond ((or (null p) (null q))
	 (append p q))
	((> (random 100) 95)
	 (mate-polys p (make-random-poly)))
	((> (random 100) 95)
	 (mate-polys (make-random-poly) q))
	(t
	 (let ((split (random (min (length p) (length q)))))
	   (append (subseq p 0 split)
		   (subseq q split))))))

(defun random-element (lst)
  (nth (random (length lst)) lst))

(defun main ()
  (let ((population nil)
	(points nil))
    (dotimes (i 5) (push (make-random-point) points))
    (dotimes (i 10000) (push (make-random-poly) population))
    (loop
       (let* ((new-population nil)
	      (fitnesses (mapcar (lambda (p) (fitness p points)) population))
	      (sum-fitnesss (apply #'+ fitnesses))
	      (weighted-population nil))
	 (format t "Current best: ~A, average: ~A~%"
		 (apply #'max fitnesses) (/ sum-fitnesss (length population)))
	 (force-output)
	 (dolist (p population)
	   (dotimes (i (/ (fitness p points) sum-fitnesss))
	     (push p weighted-population)))
	 (dotimes (i (length population))
	   (push (mate-polys (random-element weighted-population)
			     (random-element weighted-population))
		 new-population))
	 (setf population new-population)))))

;;;; End of file

Thanks again,
		Bill.
-- 
Dr. William Bland.
It would not be too unfair to any language to refer to Java as a
stripped down Lisp or Smalltalk with a C syntax.   (Ken Anderson).

Re: Help with GA, and critique my Lisp (please ;-)) Marco Antoniotti
- Re: Help with GA, and critique my Lisp (please ;-)) Geoffrey Summerhayes
  - Re: Help with GA, and critique my Lisp (please ;-)) Kalle Olavi Niemitalo
Re: Help with GA, and critique my Lisp (please ;-)) Gareth McCaughan
- Re: Help with GA, and critique my Lisp (please ;-)) William Bland
Re: Help with GA, and critique my Lisp (please ;-)) Matthias
Genetic Algorithms for Lisp [was: Re: Help with GA, ... ] szymon
- Re: Genetic Algorithms for Lisp [was: Re: Help with GA, ... ] Zakath
Re: Help with GA, and critique my Lisp (please ;-)) Time Waster

From: Marco Antoniotti
Subject: Re: Help with GA, and critique my Lisp (please ;-))
Date: Mon, 27 Sep 2004 21:29:18 +0000
Message-ID: <PC%5d.32$NQ1.13133@typhoon.nyu.edu>

William Bland wrote:
> Hi all,
> 	going on the theory that you have to not mind looking like an idiot if
> you want to learn anything worth knowing...
> 
> I wanted to try something a little more advanced than the toy problems
> I've been solving in Lisp.  Actually this one's still a toy, but I can't
> get it to work!  I have to (shamefully) admit that I thought I understood
> Genetic Algorithms because I've read a lot about how other people have
> used them, but I now realise I know next to nothing.  This is the first
> time I've tried to implement one myself and, embarrassingly the fitness
> (both max and average) seems to trend downwards!  It never gets close to
> solving the problem (simply finding a polynomial that passes through a
> given set of points in the plane).  I'd be very interested to hear from
> any GA gurus who might be able to see what I'm doing wrong.
> 
> I'm also interested to hear what warts I might have in my Lisp code, as
> this is one of the largest chunks of CL code I've written so far.  I
> probably still write CL with a bit of a Scheme accent (hopefully my Java
> accent doesn't show too much though!).  Oh, I know using ERROR to end the
> program is *unspeakably* ugly - any thoughts on the Right Way to do it?
> 
> Thanks very much for any comments.
> 
> ;;;; ga.lisp  Try to find a polynomial to fit a set of points
> ;;;; There is an easy exact solution - I wanted to see if a
> ;;;; GA could find it.
> 
> (defun evaluate-poly (p x)  ; (7 3 3.14) represents 7 + 3x + 3.14x^2
>   (labels ((looper (p acc power)
> 	      (if p
> 		  (looper (rest p) (+ acc (* (first p) (expt x power)))
> 		     (1+ power))
> 		  acc)))
>     (looper p 0 0)))

Nice but waaaaayyyyy too Schemeish.

(defun evaluate-poly (p x)
    (loop for coeff in p
          for power from 0
          sum (* coeff (expt x power)))

A little wasteful, but what the heck.


Cheers
--
Marco

From: Geoffrey Summerhayes
Subject: Re: Help with GA, and critique my Lisp (please ;-))
Date: Tue, 28 Sep 2004 00:31:47 +0000
Message-ID: <Ii26d.4803$MD5.390912@news20.bellglobal.com>

"Marco Antoniotti" <·······@cs.nyu.edu> wrote in message ·······················@typhoon.nyu.edu...
>
> Nice but waaaaayyyyy too Schemeish.
>
> (defun evaluate-poly (p x)
>    (loop for coeff in p
>          for power from 0
>          sum (* coeff (expt x power)))
>
> A little wasteful, but what the heck.
>

(defun evaluate-poly (p x)
  (reduce #'(lambda (a c) (+ c (* x a)))
         (reverse p) :initial-value 0))

--
Geoff

From: Kalle Olavi Niemitalo
Subject: Re: Help with GA, and critique my Lisp (please ;-))
Date: Sun, 03 Oct 2004 09:22:09 +0000
Message-ID: <87zn34kvge.fsf@Astalo.kon.iki.fi>

"Geoffrey Summerhayes" <·······@NhOoStPmAaMil.com> writes:

> (defun evaluate-poly (p x)
>   (reduce #'(lambda (a c) (+ c (* x a)))
>          (reverse p) :initial-value 0))

I thought REDUCE with :FROM-END T might cons less, but there is
apparently no difference in SBCL 0.8.14.9.

  (declaim (optimize (speed 2) (space 2) (safety 0) (debug 0)))

  (defun evaluate-poly-from-end (p x)
    (reduce #'(lambda (c a) (+ c (* x a)))
            p :from-end t :initial-value 0))

  (defun evaluate-poly-antoniotti (p x)
    (loop for coeff in p
          for power from 0
          sum (* coeff (expt x power))))

  (defun evaluate-poly-mul (p x)
    (loop for coeff in p
          and x^n = 1 then (* x^n x)
          sum (* coeff x^n)))

* (time (loop repeat 10000000 do (evaluate-poly '(5 7 3 4) 9)))

Evaluation took:
  		 37.241 seconds of real time
  		 33.25 seconds of user run time
  		 3.9 seconds of system run time
  		 0 page faults and
  		 2,082,226,184 bytes consed.
NIL
* (time (loop repeat 10000000 do (evaluate-poly-from-end '(5 7 3 4) 9)))

Evaluation took:
  		 40.868 seconds of real time
  		 37.11 seconds of user run time
  		 3.49 seconds of system run time
  		 0 page faults and
  		 2,082,280,296 bytes consed.
NIL
* (time (loop repeat 10000000 do (evaluate-poly-antoniotti '(5 7 3 4) 9)))

Evaluation took:
  		 20.961 seconds of real time
  		 19.96 seconds of user run time
  		 0.75 seconds of system run time
  		 0 page faults and
  		 320,346,128 bytes consed.
NIL
* (time (loop repeat 10000000 do (evaluate-poly-mul '(5 7 3 4) 9)))

Evaluation took:
  		 8.62 seconds of real time
  		 7.8 seconds of user run time
  		 0.63 seconds of system run time
  		 0 page faults and
  		 320,343,240 bytes consed.
NIL

The results are almost equal with (optimize (speed 3) (space 2))
or (optimize (speed 0) (space 3)).

From: Gareth McCaughan
Subject: Re: Help with GA, and critique my Lisp (please ;-))
Date: Tue, 28 Sep 2004 00:33:01 +0000
Message-ID: <87y8ivp7az.fsf@g.mccaughan.ntlworld.com>

William Bland wrote:

> (defun evaluate-poly (p x)  ; (7 3 3.14) represents 7 + 3x + 3.14x^2
>   (labels ((looper (p acc power)
> 	      (if p
> 		  (looper (rest p) (+ acc (* (first p) (expt x power)))
> 		     (1+ power))
> 		  acc)))
>     (looper p 0 0)))

Very Scheme-y indeed. :-) Also, all those EXPTs are going to
be inefficient when you're doing a lot of them. (And when you
do GA, you inevitably do a lot of whatever you do.)

    (loop for coeff in p
          for x^n = 1 then (* x^n x)
          sum (* coeff x^n))

is more idiomatic CL, if you can stand LOOP. Or put the coefficients
in the other order, and use Horner's rule:

    (loop with s = 0
          for coeff in p
          do (setf s (+ coeff (* coeff x)))
          finally (return s))

or something of the sort. That saves half the multiplications.

> (defun fitness (p points)  ; points is a list of points (z1 z2 ...)
>   (let ((dist 0))
>     (dolist (pt points dist)
>       (incf dist (abs (- (imagpart pt) (evaluate-poly p (realpart pt))))))
>     (if (= dist 0)
> 	(error "Found solution")
> 	(/ 1 dist))))

You were concerned about the ugliness of using ERROR here.
Two possible ways to avoid that:

  - Use CATCH and THROW instead. Same sort of effect,
    but THROW doesn't say "error" so loudly as ERROR
    does.

  - Don't bother taking the reciprocal of your total
    error; change the sign of your testing in MAIN and
    stop when the fittest polynomial has unfitness 0.

I favour the latter. In fact, I favour using unfitnesses
rather than fitnesses in any case. If you get very close
to a perfect fit, then you can suffer overflow. Why bother?

Using complex numbers to represent your points is a nasty
hack, since complex numbers *are* numbers. And some day
you might want to do your optimization over complex
polynomials, anyway. Why not use, say, cons cells?

> (defun make-random-poly ()
>   (let ((ret nil))
>     (dotimes (i (random 10) ret)
>       (push (- (random 10.0) (random 10.0)) ret))))
> 
> (defun make-random-point ()
>   (complex (- (random 10.0) (random 10.0))
> 	   (- (random 10.0) (random 10.0))))
> 
> (defun mate-polys (p q)
>   (cond ((or (null p) (null q))
> 	 (append p q))
> 	((> (random 100) 95)
> 	 (mate-polys p (make-random-poly)))
> 	((> (random 100) 95)
> 	 (mate-polys (make-random-poly) q))
> 	(t
> 	 (let ((split (random (min (length p) (length q)))))
> 	   (append (subseq p 0 split)
> 		   (subseq q split))))))

This probably doesn't matter, but have you noticed that this
chooses to mutate just P more often than it does just Q?

All this LENGTH and SUBSEQ and APPEND on lists is a bit
painful efficiency-wise, though I suspect that in real
applications almost anything you do in this part of the
code will have negligible cost compared to the actual
fitness evaluations. I'd be inclined to make polynomials
be vectors. (If and when you start needing real speed,
you may -- at least on some implementations -- be able
to get substantial gains using specialized vectors of
double-floats, or whatever: save a lot of boxing.)

> (defun random-element (lst)
>   (nth (random (length lst)) lst))

This is pretty inefficient. (It iterates over the entire list
to find its length, and then iterates over some random initial
sublist to find the requested element.) Consider using an
adjustable array instead of a list for your population.

> (defun main ()
>   (let ((population nil)
> 	(points nil))
>     (dotimes (i 5) (push (make-random-point) points))
>     (dotimes (i 10000) (push (make-random-poly) population))
>     (loop
>        (let* ((new-population nil)
> 	      (fitnesses (mapcar (lambda (p) (fitness p points)) population))
> 	      (sum-fitnesss (apply #'+ fitnesses))

The calculation of SUM-FITNESSS (shouldn't there be another E
in there?) is not guaranteed to work; CALL-ARGUMENTS-LIMIT
is allowed to be quite small. Use (reduce #'+ fitnesses) instead.

> 	      (weighted-population nil))
> 	 (format t "Current best: ~A, average: ~A~%"
> 		 (apply #'max fitnesses) (/ sum-fitnesss (length population)))
> 	 (force-output)
> 	 (dolist (p population)
> 	   (dotimes (i (/ (fitness p points) sum-fitnesss))
> 	     (push p weighted-population)))

You're calculating each polynomial's fitness twice. That seems
a bit gratuitous.

And how can this possibly work? The number-of-times
you're feeding to DOTIMES isn't an integer (it results
from dividing a float by another float), and it's
always at most 1 (it's one term of a sum of positive
terms divided by the sum).

So let me make a guess at what's happening. Your CL
implementation, if you say

    (dotimes (i 0.1) ...),

will execute the loop body exactly once. So every polynomial
from POPULATION gets into WEIGHTED-POPULATION exactly once.
And then you do your random mating, etc., and there's nothing
to encourage it to get any better. I'm not sure offhand why
it should get *worse*, though.

I suspect that what you intended is

    (dotimes (i (round (* (length population)
                          (/ (fitness p points) sum-fitnesses))))
      ...)

or something along those lines. (Possibly better to use the
actual intended population size instead of (LENGTH POPULATION),
not for reasons of efficiency (the cost of iterating over
POPULATION will be swamped by other things) but so as to
avoid funny effects where the population grows or shrinks
by 1% every generation due to rounding effects and ends up
tiny or huge.)

> 	 (dotimes (i (length population))
> 	   (push (mate-polys (random-element weighted-population)
> 			     (random-element weighted-population))
> 		 new-population))
> 	 (setf population new-population)))))

That bit looks OK, though despite my remarks above I'm
a bit queasy at the unnecessary calculation of
(LENGTH POPULATION). Either stick the population size
in a variable, or do something like

    (setf population
          (loop for dummy in population collect
                (mate-polys (random-element weighted-population)
                            (random-element weighted-population))))

Actually, I'd factor a bit of that out:

    (defun breed-once (population)
      (mate-polys (random-element population)
                  (random-element population)))

I'd factor a bit more out too -- the whole business of
making the next generation. But I'd do that in parallel
with changing the way that works: using an array instead
of a list, not recalculating all the fitnesess, etc.

-- 
Gareth McCaughan
.sig under construc

From: William Bland
Subject: Re: Help with GA, and critique my Lisp (please ;-))
Date: Wed, 29 Sep 2004 19:29:39 +0000
Message-ID: <pan.2004.09.29.19.21.19.386824@abstractnonsense.com>

On Tue, 28 Sep 2004 00:33:01 +0000, Gareth McCaughan wrote:
[snip - lots of useful comments]
> And how can this possibly work? The number-of-times
> you're feeding to DOTIMES isn't an integer (it results
> from dividing a float by another float), and it's
> always at most 1 (it's one term of a sum of positive
> terms divided by the sum).

Doh!

Well I'm glad it was a fairly simple mistake rather than a fundamental
misunderstanding.  It works now, albeit very slowly.  I'll try all the
other things you suggested - they all make a lot of sense.  In particular
I thought I was being clever, using complex numbers to represent points in
the plane, but I think on reflection you're right - it's not a good idea.

Thanks very much.

Cheers,
	Bill.
-- 
Dr. William Bland.
It would not be too unfair to any language to refer to Java as a
stripped down Lisp or Smalltalk with a C syntax.   (Ken Anderson).

From: Matthias
Subject: Re: Help with GA, and critique my Lisp (please ;-))
Date: Tue, 28 Sep 2004 08:09:53 +0000
Message-ID: <36wmzzastke.fsf@hundertwasser.ti.uni-mannheim.de>

William Bland <·······@abstractnonsense.com> writes:

> Hi all,
> 	going on the theory that you have to not mind looking like an idiot if
> you want to learn anything worth knowing...
> 
> I wanted to try something a little more advanced than the toy problems
> I've been solving in Lisp.  Actually this one's still a toy, but I can't
> get it to work!  I have to (shamefully) admit that I thought I understood
> Genetic Algorithms because I've read a lot about how other people have
> used them, but I now realise I know next to nothing.  This is the first
> time I've tried to implement one myself and, embarrassingly the fitness
> (both max and average) seems to trend downwards!  It never gets close to
> solving the problem (simply finding a polynomial that passes through a
> given set of points in the plane).  I'd be very interested to hear from
> any GA gurus who might be able to see what I'm doing wrong.

I'm not a GA guru.  In fact, in my experience they are often among the
most inefficient ways to solve an optimization problem.

But in your algorithm the problem simply is that you don't use the
fitness score to select the next generation.

Try 
	(dotimes (i (* 10000 (/ (fitness p points) sum-fitnesss)))

instead of 

        (dotimes (i (/ (fitness p points) sum-fitnesss))

in main.  Then your GA works, in principle, and the fun of finding the
most appropriate parameters starts. :-)

  Matthias

From: szymon
Subject: Genetic Algorithms for Lisp [was: Re: Help with GA, ... ]
Date: Tue, 28 Sep 2004 23:10:59 +0000
Message-ID: <cjcrct$nod$1@atlantis.news.tpi.pl>

You may be interested in Gene Stover's article:
[ http://www.lisp-p.org/evie/ ]

Regards, Szymon.

From: Zakath
Subject: Re: Genetic Algorithms for Lisp [was: Re: Help with GA, ... ]
Date: Thu, 30 Sep 2004 04:54:25 +0000
Message-ID: <415b9181$0$27723$626a14ce@news.free.fr>

szymon wrote:
> 
> You may be interested in Gene Stover's article:
> [ http://www.lisp-p.org/evie/ ]
> 
> Regards, Szymon.

It looked very interesting, but there is almost nothing on the page
you linked (i didn't even manage to find the tarballs...).

If you have other references, I am interested too !

Thanks

From: Time Waster
Subject: Re: Help with GA, and critique my Lisp (please ;-))
Date: Wed, 29 Sep 2004 10:18:50 +0000
Message-ID: <1096453130.WK7h0JIaF6rOLfPekcbDMg@teranews>

On Mon, 27 Sep 2004 16:25:38 GMT, <·······@abstractnonsense.com> wrote:
> Hi all,
> 	going on the theory that you have to not mind looking like an idiot if
> you want to learn anything worth knowing...

Well, in that spirit...

You are simply matching a fixed polynominals coefficients, the hard
way.  There has been a similar problem on the net for a few years
using perl that is far more GP than this example.  A point to ponder
is that an infinite supply of functions can pass thru your discrete
point list, it's more interesting (and far more useful GP code) to
constrain the solution set in a different fashion.

Look up the perl version, it's been published in The Perl Journal
(they make basic mathematical mistakes too, programmers are always in
good company there).  Do the basic "Hello world" and Ones programs.
Then look for Koza's work (it is in Lisp already).  Then solve your
discrete point list (same list of course) with a Fourier series... or
anything you want with Kozas tree method.

GP is for optimization problems where you don't know any alternative,
it's obviously not as fast in an area with advanced techniques in
place.

Oh, and the fitness function is always the key issue, you will get
whatever you put into it ;-)