From: ···········@yahoo.com
Subject: FART Performance
Date: 
Message-ID: <1175474786.265158.54580@n59g2000hsh.googlegroups.com>
I'm looking for some advice on mathematical optimization in CL. I have
implemented a Fuzzy ART network, and I wonder how I might
significantly improve the performance.

After trying some simple differential benchmarks on simple constructs,
I tried implementing it without CLOS, using a struct instead of class
and functions instead of methods. After benchmarking the alternate
implementation (and keeping in mind that small differences will
snowball into big differences as the size of the network grows), I
didn't see enough difference to justify the sacrifices. (Due to the
input randomization, the resulting networks had slightly different
sizes, but they were close enough. I'll have to devise a static set of
inputs to truly compare apples to apples.)

Because the application requires chaining FARTs together, to build
multi-level networks, such as Fusion ARTMAPs, I'll require a
structured architecture.

I tried profiling with SBCL's profiling utilities, but as I don't have
much experience profiling in CL, I didn't quite know how to interpret
the reports.

>>>>>>>>>>

(defclass fart ()
  ((inputs
    :initarg :inputs :accessor inputs :type simple-vector
    :documentation "the vector of inputs (I)")
   (outputs
    :initarg :outputs :accessor outputs :initform nil
    :documentation "the vector of outputs (Y)")
   (weights
    :initarg :weights :accessor weights :initform nil
    :documentation "the two-dimensional vector of weights (W)")
   (vigilance
    :initarg :vigilance :accessor vigilance :initform 0.0 :type single-
float
    :documentation "the vigilance parameter (phi)")
   (alpha
    :initarg :alpha :accessor alpha :initform 0.0 :type single-float
    :documentation "the alpha parameter")
   (beta
    :initarg :beta :accessor beta :initform 0.0 :type single-float
    :documentation "the beta parameter")
   (winner
    :initarg :winner :accessor winner :initform 0 :type single-float
    :documentation "the winning category")
   (size-delta
    :initarg :size-delta :accessor size-delta :initform 8 :type
integer
    :documentation "the amount by which vector allocations will grow")
   (complement-coding
    :initarg :complement-coding :accessor complement-coding :initform
t
    :documentation "whether the fuzzy-art complement-codes its
input")))

(defmethod initialize-instance :after ((instance fart) &key)
  (with-accessors ((outputs outputs) (weights weights) (winner winner)
		   (size-delta size-delta)) instance
    (unless (and (slot-boundp instance 'outputs) outputs)
      (setf outputs (make-array size-delta :element-type 'single-
float :fill-pointer 0)))
    (unless (and (slot-boundp instance 'weights) weights )
      (setf weights (make-array size-delta :initial-element nil :fill-
pointer 0)))
    (unless (and (slot-boundp instance 'winner) (>= winner 0))
      (setf winner 0)))
  instance)

(defgeneric feed-inputs (element value))

(defmethod feed-inputs ((fart fart) (value array))
  "Set the current input vector for training. With complement-coding
disabled,
this method merely sets inputs to value, so altering the shared array
object would
lead to undefined behavior. With complement-coding enabled, it sets
inputs to
a complement-coded copy of value. Related egghead methods do not alter
the value array."
  (if (complement-coding fart)
      (let* ((length (length value))
	     (array (make-array (* 2 (length value)) :element-type 'single-
float)))
	(loop for i from 0 below length
	      for ii across value
	      do (setf (aref array i) ii (aref array (+ i length)) (- 1 ii))
	      finally (return (setf (inputs fart) array))))
      (setf (inputs fart) value)))

(defgeneric grow (element))

(defmethod grow ((fart fart))
  "The outputs vector (y) and the weights vector (w) grow together.
This method will
increment the fill-pointer of each vector, increase the vector
allocations if necessary,
set the winner to the new category and return T, indicating that the
winning category is
uncommitted."
  (with-accessors ((outputs outputs) (weights weights) (winner
winner)
		   (size-delta size-delta)) fart
    (let ((category-count (fill-pointer outputs)))
      (when (eq (array-dimension outputs 0) category-count)
	(adjust-array outputs (+ category-count size-delta))
	(adjust-array weights (+ category-count size-delta) :initial-element
nil))
      (incf (fill-pointer outputs))
      (incf (fill-pointer weights))))
  t)

(defgeneric fire (element #|activator signal context|#))

(defmethod fire ((fart fart) #|activator signal context|#)
  (setf (winner fart) -1)
  (if (eq 0 (fill-pointer (outputs fart)))
      (grow fart)
      (let* ((max-y most-negative-single-float)
	     (inputs (inputs fart)) (outputs (outputs fart)) (weights
(weights fart))
	     (alpha (alpha fart)) (vigilance (vigilance fart)) (winner
(winner fart))
	     (sum-inputs (loop for ii across inputs sum ii into sum finally
(return sum))))
	;; TODO: With complement-coding, sum-inputs will be constant across
input vectors
	;; of the same size--i.e., half of the node count of the complement-
coded input vector.
	(loop for j from 0 below (length outputs)
	      for wj across weights
	      do (multiple-value-bind (sum-weights sum-iw)
		     (loop for wji across wj
			   for ii across inputs
			   sum wji into sum-weights
			   sum (min wji ii) into sum-iw
			   finally (return (values sum-weights sum-iw)))
		   ;; FIX: As it is, fire activates all of the outputs, when it
should only
		   ;; activate the winner. Use a vector pre-y for pre-activation
signals,
		   ;; zero the outputs and activate only the winner.
		   (setf (aref outputs j) (/ sum-iw (+ alpha sum-weights)))
		   (if (and (>= sum-iw (* vigilance sum-inputs))
			    (< max-y (aref outputs j)))
		       (setf winner j max-y (aref outputs j)))))
	(setf (winner fart) winner)
	(if (eq winner -1)
	    (grow fart)
	    nil))))

(defgeneric learn (element #|specialty signal context|#))

(defmethod learn ((fart fart) #|specialty signal context|#)
  (with-accessors ((inputs inputs) (outputs outputs) (weights weights)
		   (beta beta) (winner  winner)) fart
    (if (eq -1 winner)
	;; fast learning
	(progn
	  (setf winner (1- (fill-pointer outputs)))
	  (setf (aref weights winner) (copy-seq inputs))
	  (setf (aref outputs winner) 1.0))
	;; slow recoding
	(let ((wj (aref weights winner))) ; winner must be a valid index into
weights
	  (loop for i from 0 below (length inputs)
		for wji across wj
		for ii across inputs
		do (setf (aref wj i) (+ (* beta (min ii wji)) (* (- 1 beta) wji)))
		finally (return nil))))))

(defgeneric train-current (element))

(defmethod train-current ((fart fart))
  (fire fart #|nil nil nil|#)
  (learn fart #|nil nil nil|#))

(defparameter *fart* (make-instance 'fart :vigilance 0.89 :alpha
0.000001 :beta .1 :size-delta 2048))
(defparameter *inputs* (make-array 4 ))
(proclaim '(simple-vector *inputs*))

(defun bang ()
  (loop for i from 0 below (length *inputs*)
	do (setf (svref *inputs* i) (coerce (/ (random 100) 100) 'single-
float)))
  (feed-inputs *fart* *inputs*)
  (train-current *fart*)
  (winner *fart*))

>>>>>>>>>>

; On an AMD Athlon(tm) 64 X2 Dual Core Processor 4200+, with 2GB RAM,
; a linux-2.6.17, 64-bit Gentoo system and SBCL 1.0.1, I have the
following:
CL-USER> (time (dotimes (n 10000) (bang)))
Evaluation took:
  7.193 seconds of real time
  7.192449 seconds of user run time
  0.0 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  2,467,552 bytes consed.
NIL
CL-USER> (length (outputs *fart*))
560
; Of course, as the network grows, the runtime increases.
CL-USER> (time (dotimes (n 10000) (bang)))
Evaluation took:
  13.337 seconds of real time
  13.336833 seconds of user run time
  0.0 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  2,446,848 bytes consed.
NIL
CL-USER> (length (outputs *fart*))
829
CL-USER>
; Build a network up to a few thousand nodes, and you might as well go
find other things to do while it chews on it.
>>>>>>>>>>

I tried a couple of things, such as proclaiming data types of slots,
arrays and specific variables, making arrays simple-vectors where
possible, etc., but I wasn't seeing any dramatic gains. I'd wager that
the bottleneck is the math--correct me if I'm wrong. Due mostly to my
inexperience in CL optimizations, I imagine that there must be a few
things that will result in marked performance improvements. Does
anyone have some pointers on where to start?

Thanks in advance,
Jack

From: D Herring
Subject: Re: FART Performance
Date: 
Message-ID: <t96dnWIiLLg6xI3bnZ2dnUVZ_oCmnZ2d@comcast.com>
···········@yahoo.com wrote:
> I'm looking for some advice on mathematical optimization in CL. I have
> implemented a Fuzzy ART network, and I wonder how I might
> significantly improve the performance.
...
> I tried profiling with SBCL's profiling utilities, but as I don't have
> much experience profiling in CL, I didn't quite know how to interpret
> the reports.
...

Try using the statistical profiler to identify hot spots:
http://www.sbcl.org/manual/Statistical-Profiler.html

As for efficiency,
- use type declarations in hot spots, especially for raw numbers
- generic CLOS method calls are slower than normal function calls
- fix any warnings SBCL may give (these are often optimization hints)

Hope that helps,
Daniel
From: ···········@yahoo.com
Subject: Re: FART Performance
Date: 
Message-ID: <1175483525.189535.295460@l77g2000hsb.googlegroups.com>
On Apr 1, 6:17 pm, D Herring <········@at.tentpost.dot.com> wrote:
> Try using the statistical profiler to identify hot spots:http://www.sbcl.org/manual/Statistical-Profiler.html
>
> As for efficiency,
> - use type declarations in hot spots, especially for raw numbers
> - generic CLOS method calls are slower than normal function calls
> - fix any warnings SBCL may give (these are often optimization hints)
>
> Hope that helps,
> Daniel

Thanks, Daniel. I had played with the statistical profiler before I
posted the query, but I wasn't exactly sure what it was showing me.
After looking at the report again, I decided that it was spending a
lot of time in SB-KERNEL:HAIRY-DATA-VECTOR-REF and related functions,
so I declared the types of some variables in the LOOP ACROSS clauses.
This trick improved performance by 20%! Profiling now reports a ~60%
reduction of total sample of HAIR-DATA-VECTOR-REF, moving it from
first to third on the list of samples.

CL-USER> (time (dotimes (n 10000) (bang)))
Evaluation took:
  5.71 seconds of real time
  5.712357 seconds of user run time
  0.0 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  2,778,640 bytes consed.
NIL
CL-USER> (length (outputs *fart*))
563
CL-USER>

>
> As for efficiency,
> - use type declarations in hot spots, especially for raw numbers

I found some warnings on c.l.l. that indiscriminate use of type
declarations can actually degrade performance, so until I find some
advice on how to do it effectively I'll stay away from it. I did find
one raw number in the learn method and decorated it with (the single-
float 1.0). I'm not too concerned about the other raw numbers in the
code, because they're not in the inner loops.

> - generic CLOS method calls are slower than normal function calls

I was aware of this and did some micro-benchmarks which suggested that
calls to methods incur a 600% penalty over calls to functions on my
system. When I tried the aforementioned alternate implementation with
a structure and plain functions, I didn't see that much difference in
the overall performance. After making other tweaks, if I want to shave
off more, I'll definitely reconsider the alternate implementation.

> - fix any warnings SBCL may give (these are often optimization hints)

I'm sure this would help, but it hasn't given me any warnings.

Thanks for the ideas. Keep 'em coming.

Jack
From: kavenchuk
Subject: Re: FART Performance
Date: 
Message-ID: <1175501534.543902.158970@l77g2000hsb.googlegroups.com>
On Apr 2, 4:17 am, D Herring <········@at.tentpost.dot.com> wrote:

> - generic CLOS method calls are slower than normal function calls

Hmm, compiler not translate generic call with parameters of concrete
type to call of a concrete method?

--
WBR, Yaroslav Kavenchuk.
From: Rainer Joswig
Subject: Re: FART Performance
Date: 
Message-ID: <joswig-C35AA6.11533202042007@news-europe.giganews.com>
In article <························@l77g2000hsb.googlegroups.com>,
 "kavenchuk" <·········@gmail.com> wrote:

> On Apr 2, 4:17 am, D Herring <········@at.tentpost.dot.com> wrote:
> 
> > - generic CLOS method calls are slower than normal function calls
> 
> Hmm, compiler not translate generic call with parameters of concrete
> type to call of a concrete method?

One can change the generic function at runtime. CLOS
automatically then uses the actual definition of the
generic function. Note also that the called method
is not THE method, but a computed method based
on the method combination and the matching methods
(in standard method combination this includes
:around, :before and :after methods). Note also
that the class hierarchy can change at runtime which
may trigger that a different set of methods matches
a generic function call.

CLOS has been design to provide a maximum of dynamic
features at runtime. There is little/none in the ANSI CL
standard to make CLOS more efficient. This is left
to the implementations. The implementations can be based
on a naive CLOS implementation or use the mythical 
smart compiler / runtime to implement the standard CLOS.
Additionally non-standard features are provided by
some implementations:.

A certain implementation may provide some extension to
Common Lisp to allow the compiler to generate more
efficient code. One such non-standard optimization
is called 'sealing'. You would then declare the classes
and/or methods as non-extensible and the compiler
can make static assumptions. But, again, this is
purely non-standard and sometimes used for delivery
of applications.

> 
> --
> WBR, Yaroslav Kavenchuk.

-- 
http://lispm.dyndns.org
From: dpapathanasiou
Subject: Re: FART Performance
Date: 
Message-ID: <1175519968.786181.248460@o5g2000hsb.googlegroups.com>
On Apr 1, 8:46 pm, ···········@yahoo.com wrote:
> I'm looking for some advice on mathematical optimization in CL. I have
> implemented a Fuzzy ART network, and I wonder how I might

"Fuzzy ART network", eh?

> (defclass fart ()

hmmm...

> (defmethod grow ((fart fart))

Suspicious...

> (defmethod fire ((fart fart) #|activator signal context|#)

Ok, that's just silly.

Can anyone say "April Fools"?
From: John Thingstad
Subject: Re: FART Performance
Date: 
Message-ID: <op.tp5vzra9pqzri1@pandora.upc.no>
On Mon, 02 Apr 2007 15:19:28 +0200, dpapathanasiou  
<···················@gmail.com> wrote:

> On Apr 1, 8:46 pm, ···········@yahoo.com wrote:
>> I'm looking for some advice on mathematical optimization in CL. I have
>> implemented a Fuzzy ART network, and I wonder how I might
>
> "Fuzzy ART network", eh?
>
>> (defclass fart ()
>
> hmmm...
>
>> (defmethod grow ((fart fart))
>
> Suspicious...
>
>> (defmethod fire ((fart fart) #|activator signal context|#)
>
> Ok, that's just silly.
>
> Can anyone say "April Fools"?
>
>

nop, but ROTFL

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: ·······@gmail.com
Subject: Re: FART Performance
Date: 
Message-ID: <1175538217.286795.145130@d57g2000hsg.googlegroups.com>
On Apr 2, 11:00 am, "John Thingstad" <··············@chello.no> wrote:
> On Mon, 02 Apr 2007 15:19:28 +0200, dpapathanasiou
>
>
>
> <···················@gmail.com> wrote:
> > On Apr 1, 8:46 pm, ···········@yahoo.com wrote:
> >> I'm looking for some advice on mathematical optimization in CL. I have
> >> implemented a Fuzzy ART network, and I wonder how I might
>
> > "Fuzzy ART network", eh?
>
> >> (defclass fart ()
>
> > hmmm...
>
> >> (defmethod grow ((fart fart))
>
> > Suspicious...
>
> >> (defmethod fire ((fart fart) #|activator signal context|#)
>
> > Ok, that's just silly.
>
> > Can anyone say "April Fools"?
>
> nop, but ROTFL
>
> --
> Using Opera's revolutionary e-mail client:http://www.opera.com/mail/

Since being in the South for a few days, I must admit that the FART
performance has made significant strides. What was once a struggle to
get a squeeker out has now become a full on battle roy-al thanks to
the fabulous southern cuisine.
From: ···········@yahoo.com
Subject: Re: FART Performance
Date: 
Message-ID: <1175567496.810663.217700@o5g2000hsb.googlegroups.com>
On Apr 2, 11:23 am, ·······@gmail.com wrote:
> On Apr 2, 11:00 am, "John Thingstad" <··············@chello.no> wrote:
>
>
>
> > On Mon, 02 Apr 2007 15:19:28 +0200, dpapathanasiou
>
> > <···················@gmail.com> wrote:
> > > On Apr 1, 8:46 pm, ···········@yahoo.com wrote:
> > >> I'm looking for some advice on mathematical optimization in CL. I have
> > >> implemented a Fuzzy ART network, and I wonder how I might
>
> > > "Fuzzy ART network", eh?
>
> > >> (defclass fart ()
>
> > > hmmm...
>
> > >> (defmethod grow ((fart fart))
>
> > > Suspicious...
>
> > >> (defmethod fire ((fart fart) #|activator signal context|#)
>
> > > Ok, that's just silly.
>
> > > Can anyone say "April Fools"?
>
> > nop, but ROTFL
>
> > --
> > Using Opera's revolutionary e-mail client:http://www.opera.com/mail/
>
> Since being in the South for a few days, I must admit that the FART
> performance has made significant strides. What was once a struggle to
> get a squeeker out has now become a full on battle roy-al thanks to
> the fabulous southern cuisine.

Hey, I'm busting a gut here. Stop! it hurts!
From: ···········@yahoo.com
Subject: Re: FART Performance
Date: 
Message-ID: <1175567461.304768.250020@d57g2000hsg.googlegroups.com>
On Apr 2, 6:19 am, "dpapathanasiou" <···················@gmail.com>
wrote:
> On Apr 1, 8:46 pm, ···········@yahoo.com wrote:
>
> > I'm looking for some advice on mathematical optimization in CL. I have
> > implemented a Fuzzy ART network, and I wonder how I might
>
> "Fuzzy ART network", eh?
>
> > (defclass fart ()
>
> hmmm...
>
> > (defmethod grow ((fart fart))
>
> Suspicious...
>
> > (defmethod fire ((fart fart) #|activator signal context|#)
>
> Ok, that's just silly.
>
> Can anyone say "April Fools"?

I especially liked...

 (feed-inputs *fart* *inputs*)
 (train-current *fart*)
 (winner *fart*))

By the way, after some tweaking, my FART machine now bangs 10,000
times in less than 2 seconds. And I see a few more tweaks that I can
make, when I have a chance. Woohoo!