From: Jeffrey Mark Siskind
Subject: Re: Query on efficiency
Date: 
Message-ID: <6gevtv$q09$1@goldenapple.srv.cs.cmu.edu>
   > My question really is, how efficient is this? In an imperative language,
   > the environment (E) is obviously mutable, and I know this could be
   > simulated by using references. Or would the compiler recognise this as a
   > common form (???) and have techniques to transform it into one
   > internally?

   On a model of storage which says that RAM access is constant-time, then
   functional programs can cost more than non-functional programs by a
   logarithmic factor. However the constant-time memory model is a load
   of nonsense. For a more realistic model, take your pick of logarithmic
   versus cube-root-of n, for a single processor.

   Potentially, functional programming languages [or more precisely garbage
   collected languages] have the advantage that the system has an extra degree
   of freedom in that it can choose to relocate objects. Garbage collectors
   in fact typically compact storage usage, leaving no holes. Combined with
   some kind of generational scheme this can offer real efficiency since with
   modern machines storage access is expensive relative to computation.

While this may be true in theory, in practise I have yet to see a compiler
which generates as good code for functional programs as for imperative
programs. Let me give a concrete example. I have enclosed below two benchmarks
that compute the same thing. One is written in a functional style and the
other is written in an imperative style. The benchmarks are written in R4RS
Scheme and implement a multidimensional clustering algorithm based on the EM
algorithm.

I have run this pair of programs on four different Scheme implementations. And
in all four, the imperative version runs faster than the functional version.

          functional imperative ratio
Stalin        19.670     10.710  1.84
Scheme->C    117.770     43.710  2.69
Gambit-C     456.090    270.460  1.69
Chez          93.550     64.960  1.44

I want to stress that the absolute performance of the various Scheme
implementations is not important here. It is obvious that different
implementations employ compilation techniques of different levels of
sophistication. What is important, however, is the relative performance of the
functional and imperative versions within a given implementation. That
controls for the level of sophistication of an implementation such that
presumably the difference being measured is in the cost of a functional style
versus imperative style keeping all other compilation technology constant.
While it is conceivable that implementations go out of their way to optimize
things that pertain to an imperative style more than those that pertain to a
functional style, I would conjecture that this is not the case for Scheme
implementations where the predominant culture has been to support a functional
style. Yet, despite that culture, all four of the implementations that I have
tested perform significantly better with an imperative style than a functional
style.

So you may say, OK, all of the Scheme implementations that you tested are
broken. Well, here is my challenge. Rewrite these benchmarks faithfully in
your favourite language. Any language that supports both a functional and
imperative style. And run them in your favourite implementation of that
language. I predict that you will get similar results.

Prove me wrong. Really. I want to be proven wrong. I have been programming in
a predominantly functional style for almost twenty years. Back in 1981 I wrote
a silicon compiler known as MacPitts. At the time, it was probably the world's
largest purely functional program (about 50,000 lines of Franz Lisp without
any side effects). And I continue to program in a functional style despite my
experience that such a style incurs a speed penalty.

I'd like to ward off anticipated criticism of the enclosed benchmarks. I
posted earlier versions of these benchmarks about two years ago. But those
versions had several flaws that have been corrected in the enclosed versions:

1. The new versions are almost almost almost R4RS. The only exception is that
   (log 0.0) must return an IEEE minus-infinity instead of raising an
   exception.
2. They include a Scheme random-number generator (stolen from slib and
   modified) instead of calling C's rand.
3. They have been rewritten so that they execute the same number of iterations
   in both the functional and imperative versions. Furthermore, the log
   likelihood at each iteration is the same for both the functional and
   imperative version. The precise number, order, and arguments of the
   floating point operations varies slightly between the two benchmarks,
   particularly when reducing a summation or multiplying matrices and
   vectors. The follows from differences in what is the most natural way to
   write this code in a functional style versus an imperative style. But
   despite this, both versions perform the same number of iterations in all
   four implementations tested and compute precisely the same log likelihood
   at each iteration in three of the four implementations tested. (Stalin
   yields slightly different low-order bits on x86s because register floating
   point arithmetic is 80 bits while floating point representation in memory is
   64 bits. Different register allocation and spill schedules between the
   functional and imperative versions result in slight differences in the
   low-order bits of the computed log likelihoods at each iteration. But these
   do not affect the number of iterations and do not affect the end result.
   Furthermore, this only affects Stalin because Stalin does unboxed floating
   point arithmetic. The three other implementations do boxed floating point
   which forces a store after each floating point operation, effectively
   normalizing all computation to 64 bits.

It has also been pointed out that the benchmarks involve numerous
single-element vectors. This is legitimate for two reasons. First, the
benchmark is written as a multidimensional clustering algorithm. The points to
be clustered are vectors, not scalars. As it turns out, this benchmark is only
run on unit-length vectors but that does not invalidate the benchmark. It is
often the case that general-purpose code is run on special-case data. Second,
both versions suffer from the same problem. Absolute performance is not the
issue, relative performance is. And boxing scalars as unit-length vectors
hould not affect the relative performance.

I put forth this benchmark as a challenge to the functional programming
community.

    Jeff (http://www.neci.nj.nec.com/homepages/qobi)
-------------------------------------------------------------------------------
;;; The constants are hardwired to be inexact for efficiency.

;;; begin Stalin
(define make-model (primitive-procedure make-structure model 6))
(define model-pi (primitive-procedure structure-ref model 0))
(define model-mu (primitive-procedure structure-ref model 1))
(define model-sigma (primitive-procedure structure-ref model 2))
(define model-log-pi (primitive-procedure structure-ref model 3))
(define model-sigma-inverse (primitive-procedure structure-ref model 4))
(define model-log-determinant-sigma
 (primitive-procedure structure-ref model 5))
(define (void) ((lambda ())))
(define ieee-log log)
;;; end Stalin
;;; begin Scheme->C
(define make-model vector)
(define (model-pi model) (vector-ref model 0))
(define (model-mu model) (vector-ref model 1))
(define (model-sigma model) (vector-ref model 2))
(define (model-log-pi model) (vector-ref model 3))
(define (model-sigma-inverse model) (vector-ref model 4))
(define (model-log-determinant-sigma model) (vector-ref model 5))
(define (panic s) (error 'panic s))
(define (void) #f)
(define ieee-log log)
;;; end Scheme->C
;;; begin Gambit-C
(define-structure model
 pi mu sigma log-pi sigma-inverse log-determinant-sigma)
(define (panic s) (error s))
(define ieee-log ##flonum.log)
;;; end Gambit-C
;;; begin Chez
(define make-model vector)
(define (model-pi model) (vector-ref model 0))
(define (model-mu model) (vector-ref model 1))
(define (model-sigma model) (vector-ref model 2))
(define (model-log-pi model) (vector-ref model 3))
(define (model-sigma-inverse model) (vector-ref model 4))
(define (model-log-determinant-sigma model) (vector-ref model 5))
(define (panic s) (error 'panic s))
(define ieee-log log)
;;; end Chez

(define (hex-string->number s)
 (let loop ((s (string->list s)) (c 0))
  (if (null? s)
      c
      (loop (cdr s) (+ (* 16 c)
		       (if (char-numeric? (car s))
			   (- (char->integer (car s)) (char->integer #\0))
			   (+ (- (char->integer (car s)) (char->integer #\a))
			      10)))))))

;;; The following code is a modified version of code taken from SLIB.
;;; Copyright (C) 1991, 1993 Aubrey Jaffer.
;
;Permission to copy this software, to redistribute it, and to use it
;for any purpose is granted, subject to the following restrictions and
;understandings.
;
;1.  Any copy made of this software must include this copyright notice
;in full.
;
;2.  I have made no warrantee or representation that the operation of
;this software will be error-free, and I am under no obligation to
;provide any services, by way of maintenance, update, or otherwise.
;
;3.  In conjunction with products arising from the use of this
;material, there shall be no use of my name in any advertising,
;promotional, or sales literature without prior written consent in
;each case.

(define *most-positive-fixnum* 65535)

(define (logical:logxor n1 n2)
 (cond ((= n1 n2) 0)
       ((zero? n1) n2)
       ((zero? n2) n1)
       (else (+ (* (logical:logxor (logical:ash-4 n1) (logical:ash-4 n2)) 16)
		(vector-ref (vector-ref logical:boole-xor (modulo n1 16))
			    (modulo n2 16))))))

(define (logical:ash-4 x)
 (if (negative? x) (+ -1 (quotient (+ 1 x) 16)) (quotient x 16)))

(define logical:boole-xor
 '#(#(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
    #(1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14)
    #(2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13)
    #(3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12)
    #(4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11)
    #(5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10)
    #(6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9)
    #(7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8)
    #(8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7)
    #(9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6)
    #(10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5)
    #(11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4)
    #(12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3)
    #(13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2)
    #(14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1)
    #(15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0)))

(define random:tap 24)

(define random:size 55)

(define (random:size-int l)
 (let ((trial (hex-string->number (make-string l #\f))))
  (if (and (exact? trial) (positive? trial) (>= *most-positive-fixnum* trial))
      l
      (random:size-int (- l 1)))))

(define random:chunk-size (* 4 (random:size-int 8)))

(define random:MASK
 (hex-string->number (make-string (quotient random:chunk-size 4) #\f)))

(define *random-state* '#())

(let ((random-strings
       '#("d909ef3e" "fd330ab3" "e33f7843" "76783fbd" "f3675fb3"
		     "b54ef879" "0be45590" "a6794679" "0bcd56d3" "fabcdef8"
		     "9cbd3efd" "3fd3efcd" "e064ef27" "dddecc08" "34444292"
		     "85444454" "4c519210" "c0366273" "54734567" "70abcddc"
		     "1bbdac53" "616c5a86" "a982efa9" "105996a0" "5f0cccba"
		     "1ea055e1" "fe2acd8d" "1891c1d4" "e6690270" "6912bccc"
		     "2678e141" "61222224" "907abcbb" "4ad6829b" "9cdd1404"
		     "57798841" "5b892496" "871c9cd1" "d1e67bda" "8b0a3233"
		     "578ef23f" "28274ef6" "823ef5ef" "845678c5" "e67890a5"
		     "5890abcb" "851fa9ab" "13efa13a" "b12278d6" "daf805ab"
		     "a0befc36" "0068a7b5" "e024fd90" "a7b690e2" "27f3571a"
		     0)))
 (set! *random-state* (make-vector (+ random:size 1) 0))
 (let ((nibbles (quotient random:chunk-size 4)))
  (do ((i 0 (+ i 1))) ((= i random:size))
   (vector-set!
    *random-state* i
    (hex-string->number
     (substring (vector-ref random-strings i) 0 nibbles))))))

;;; random:chunk returns an integer in the range of
;;; 0 to (- (expt 2 random:chunk-size) 1)
(define (random:chunk v)
 (let* ((p (vector-ref v random:size))
	(ans (logical:logxor
	      (vector-ref v (modulo (- p random:tap) random:size))
	      (vector-ref v p))))
  (vector-set! v p ans)
  (vector-set! v random:size (modulo (- p 1) random:size))
  ans))

(define (rand)
 (do ((ilen 0 (+ 1 ilen))
      (s random:MASK (+ random:MASK (* (+ 1 random:MASK) s))))
   ((>= s (- *most-positive-fixnum* 1))
    (let ((slop (modulo (+ s (- 1 *most-positive-fixnum*))
			*most-positive-fixnum*)))
     (let loop ((n ilen) (r (random:chunk *random-state*)))
      (cond ((not (zero? n))
	     (loop (+ -1 n)
		   (+ (* r (+ 1 random:MASK)) (random:chunk *random-state*))))
	    ((>= r slop) (modulo r *most-positive-fixnum*))
	    (else (loop ilen (random:chunk *random-state*)))))))))

;;; End of code taken from SLIB

(define log-math-precision 35.0)

(define minus-infinity (ieee-log 0.0))

(define first car)

(define second cadr)

(define rest cdr)

(define (for-each-n f n)
 (let loop ((i 0)) (if (< i n) (begin (f i) (loop (+ i 1))))))

(define (for-each-from-a-up-to-b f a b)
 (let loop ((i a)) (if (< i b) (begin (f i) (loop (+ i 1))))))

(define (map-vector f v)
 ;; needs work: Won't work correctly when F is nondeterministic.
 (let ((u (make-vector (vector-length v))))
  (for-each-n (lambda (i) (vector-set! u i (f (vector-ref v i))))
	      (vector-length v))
  u))

(define (map-vector-two f v1 v2)
 ;; needs work: Won't work correctly when F is nondeterministic.
 (let ((u (make-vector (vector-length v1))))
  (for-each-n
   (lambda (i) (vector-set! u i (f (vector-ref v1 i) (vector-ref v2 i))))
   (vector-length v1))
  u))

(define (reduce f l i)
 (cond ((null? l) i)
       ((null? (rest l)) (first l))
       (else (let loop ((l (rest l)) (c (first l)))
	      (if (null? l) c (loop (rest l) (f c (first l))))))))

(define (reduce-vector f v i)
 (let ((n (vector-length v)))
  (cond ((zero? n) i)
	((= n 1) (vector-ref v 0))
	(else (let loop ((i 1) (c (vector-ref v 0)))
	       (if (= i n) c (loop (+ i 1) (f c (vector-ref v i)))))))))

(define (every-n p n)
 (let loop ((i 0)) (or (>= i n) (and (p i) (loop (+ i 1))))))

(define (sum f n)
 (let loop ((n (- n 1)) (c 0.0))
  (if (negative? n) c (loop (- n 1) (+ c (f n))))))

(define (dot u v) (reduce-vector + (map-vector-two * u v) 0.0))

(define (v+ u v) (map-vector-two + u v))

(define (v- u v) (map-vector-two - u v))

(define (k*v k u) (map-vector (lambda (x) (* k x)) u))

(define (add-exp e1 e2)
 (let* ((e-max (max e1 e2))
	(e-min (min e1 e2))
	(factor (floor e-min)))
  (if (= e-max minus-infinity)
      minus-infinity
      (if (> (- e-max factor) log-math-precision)
	  e-max
	  (+ (ieee-log (+ (exp (- e-max factor)) (exp (- e-min factor))))
	     factor)))))

(define (map-n f n)
 ;; needs work: To eliminate REVERSE.
 (let loop ((i 0) (c '()))
  (if (< i n) (loop (+ i 1) (cons (f i) c)) (reverse c))))

(define (map-n-vector f n)
 (let ((v (make-vector n)))
  (let loop ((i 0))
   (if (< i n)
       (begin (vector-set! v i (f i))
	      (loop (+ i 1)))))
  v))

(define (remove-if-not p l)
 ;; needs work: To eliminate REVERSE.
 (let loop ((l l) (c '()))
  (cond ((null? l) (reverse c))
	((p (first l)) (loop (rest l) (cons (first l) c)))
	(else (loop (rest l) c)))))

(define (positionv x l)
 (let loop ((l l) (i 0))
  (cond ((null? l) #f)
	((eqv? x (first l)) i)
	(else (loop (rest l) (+ i 1))))))

(define (make-matrix m n) (map-n-vector (lambda (i) (make-vector n)) m))

(define (make-matrix-initial m n initial)
 (map-n-vector (lambda (i) (make-vector n initial)) m))

(define (matrix-rows a) (vector-length a))

(define (matrix-columns a) (vector-length (vector-ref a 0)))

(define (matrix-ref a i j) (vector-ref (vector-ref a i) j))

(define (matrix-set! a i j x) (vector-set! (vector-ref a i) j x))

(define (matrix-row-ref a i) (vector-ref a i))

(define (matrix-column-ref a j) (map-vector (lambda (v) (vector-ref v j)) a))

(define (matrix-row-set! a i v) (vector-set! a i v))

(define (m+ a b) (map-vector-two v+ a b))

(define (m- a b) (map-vector-two v- a b))

(define (m*v a v) (map-vector (lambda (u) (dot u v)) a))

(define (transpose a)
 (map-n-vector (lambda (j) (matrix-column-ref a j)) (matrix-columns a)))

(define (outer-product f u v)
 (map-vector (lambda (ui) (map-vector (lambda (vj) (f ui vj)) v)) u))

(define (self-outer-product f v) (outer-product f v v))

(define (m* a b) (outer-product dot a (transpose b)))

(define (k*m k m)
 (map-vector (lambda (row) (map-vector (lambda (e) (* k e)) row)) m))

(define (determinant a)
 (if (not (= (matrix-rows a) (matrix-columns a)))
     (panic "Can only find determinant of a square matrix"))
 (call-with-current-continuation
  (lambda (return)
   (let* ((n (matrix-rows a))
	  (b (make-matrix n n))
	  (d 1.0))
    (for-each-n
     (lambda (i)
      (for-each-n (lambda (j) (matrix-set! b i j (matrix-ref a i j))) n))
     n)
    (for-each-n
     (lambda (i)
      ;; partial pivoting reduces rounding errors
      (let ((greatest (abs (matrix-ref b i i)))
	    (index i))
       (for-each-from-a-up-to-b
	(lambda (j)
	 (let ((x (abs (matrix-ref b j i))))
	  (if (> x greatest) (begin (set! index j) (set! greatest x)))))
	(+ i 1)
	n)
       (if (= greatest 0.0) (return 0.0))
       (if (not (= index i))
	   (let ((v (matrix-row-ref b i)))
	    (matrix-row-set! b i (matrix-row-ref b index))
	    (matrix-row-set! b index v)
	    (set! d (- d))))
       (let ((c (matrix-ref b i i)))
	(set! d (* d c))
	(for-each-from-a-up-to-b
	 (lambda (j) (matrix-set! b i j (/ (matrix-ref b i j) c)))
	 i
	 n)
	(for-each-from-a-up-to-b
	 (lambda (j)
	  (let ((e (matrix-ref b j i)))
	   (for-each-from-a-up-to-b
	    (lambda (k)
	     (matrix-set!
	      b j k (- (matrix-ref b j k) (* e (matrix-ref b i k)))))
	    (+ i 1)
	    n)))
	 (+ i 1)
	 n))))
     n)
    d))))

(define (invert-matrix a)
 (if (not (= (matrix-rows a) (matrix-columns a)))
     (panic "Can only invert a square matrix"))
 (let* ((n (matrix-rows a))
	(c (make-matrix n n))
	(b (make-matrix-initial n n 0.0)))
  (for-each-n
   (lambda (i)
    (for-each-n (lambda (j) (matrix-set! c i j (matrix-ref a i j))) n))
   n)
  (for-each-n (lambda (i) (matrix-set! b i i 1.0)) n)
  (for-each-n
   (lambda (i)
    (if (zero? (matrix-ref c i i))
	(call-with-current-continuation
	 (lambda (return)
	  (for-each-n (lambda (j)
		       (if (and (> j i) (not (zero? (matrix-ref c j i))))
			   (begin (let ((e (vector-ref c i)))
				   (vector-set! c i (vector-ref c j))
				   (vector-set! c j e))
				  (let ((e (vector-ref b i)))
				   (vector-set! b i (vector-ref b j))
				   (vector-set! b j e))
				  (return (void)))))
		      n)
	  (panic "Matrix is singular"))))
    (let ((d (/ (matrix-ref c i i))))
     (for-each-n (lambda (j)
		  (matrix-set! c i j (* d (matrix-ref c i j)))
		  (matrix-set! b i j (* d (matrix-ref b i j))))
		 n)
     (for-each-n
      (lambda (k)
       (let ((d (- (matrix-ref c k i))))
	(if (not (= k i))
	    (for-each-n
	     (lambda (j)
	      (matrix-set!
	       c k j (+ (matrix-ref c k j) (* d (matrix-ref c i j))))
	      (matrix-set!
	       b k j (+ (matrix-ref b k j) (* d (matrix-ref b i j)))))
	     n))))
      n)))
   n)
  b))

(define (jacobi a)
 (if (not (and (= (matrix-rows a) (matrix-columns a))
	       (every-n (lambda (i)
			 (every-n (lambda (j)
				   (= (matrix-ref a i j) (matrix-ref a j i)))
				  (matrix-rows a)))
			(matrix-rows a))))
     (panic "Can only compute eigenvalues/eigenvectors of a symmetric matrix"))
 (let* ((a (map-vector (lambda (row) (map-vector (lambda (x) x) row)) a))
	(n (matrix-rows a))
	(d (make-vector n))
	(v (make-matrix-initial n n 0.0))
	(b (make-vector n))
	(z (make-vector n 0.0)))
  (for-each-n (lambda (ip)
	       (matrix-set! v ip ip 1.0)
	       (vector-set! b ip (matrix-ref a ip ip))
	       (vector-set! d ip (matrix-ref a ip ip)))
	      n)
  (let loop ((i 0))
   (if (> i 50) (panic "Too many iterations in JACOBI"))
   (let ((sm (sum (lambda (ip)
		   (sum (lambda (ir)
			 (let ((iq (+ ip ir 1)))
			  (abs (matrix-ref a ip iq))))
			(- n ip 1)))
		  (- n 1))))
    (if (not (zero? sm))
	(begin
	 (let ((tresh (if (< i 3) (/ (* 0.2 sm) (* n n)) 0.0)))
	  (for-each-n
	   (lambda (ip)
	    (for-each-n
	     (lambda (ir)
	      (let* ((iq (+ ip ir 1))
		     (g (* 100.0 (abs (matrix-ref a ip iq)))))
	       (cond
		((and
		  (> i 3)
		  (= (+ (abs (vector-ref d ip)) g) (abs (vector-ref d ip)))
		  (= (+ (abs (vector-ref d iq)) g) (abs (vector-ref d iq))))
		 (matrix-set! a ip iq 0.0))
		((> (abs (matrix-ref a ip iq)) tresh)
		 (let* ((h (- (vector-ref d iq) (vector-ref d ip)))
			(t (if (= (+ (abs h) g) (abs h))
			       (/ (matrix-ref a ip iq) h)
			       (let ((theta
				      (/ (* 0.5 h) (matrix-ref a ip iq))))
				(if (negative? theta)
				    (- (/ (+ (abs theta)
					     (sqrt (+ (* theta theta) 1.0)))))
				    (/ (+ (abs theta)
					  (sqrt (+ (* theta theta) 1.0))))))))
			(c (/ (sqrt (+ (* t t) 1.0))))
			(s (* t c))
			(tau (/ s (+ c 1.0)))
			(h (* t (matrix-ref a ip iq))))
		  (define (rotate a i j k l)
		   (let ((g (matrix-ref a i j))
			 (h (matrix-ref a k l)))
		    (matrix-set! a i j (- g (* s (+ h (* g tau)))))
		    (matrix-set! a k l (+ h (* s (- g (* h tau)))))))
		  (vector-set! z ip (- (vector-ref z ip) h))
		  (vector-set! z iq (+ (vector-ref z iq) h))
		  (vector-set! d ip (- (vector-ref d ip) h))
		  (vector-set! d iq (+ (vector-ref d iq) h))
		  (matrix-set! a ip iq 0.0)
		  (for-each-n (lambda (j)
			       (cond ((< j ip) (rotate a j ip j iq))
				     ((< ip j iq) (rotate a ip j j iq))
				     ((< iq j) (rotate a ip j iq j)))
			       (rotate v j ip j iq))
			      n))))))
	     (- n ip 1)))
	   (- n 1)))
	 (for-each-n
	  (lambda (ip)
	   (vector-set! b ip (+ (vector-ref b ip) (vector-ref z ip)))
	   (vector-set! d ip (vector-ref b ip))
	   (vector-set! z ip 0.0))
	  n)
	 (loop (+ i 1))))))
  (for-each-n
   (lambda (i)
    (let ((k i)
	  (p (vector-ref d i)))
     (for-each-n
      (lambda (l)
       (let* ((j (+ i l 1)))
	(if (>= (vector-ref d j) p)
	    (begin (set! k j) (set! p (vector-ref d j))))))
      (- n i 1))
     (if (not (= k i))
	 (begin (vector-set! d k (vector-ref d i))
		(vector-set! d i p)
		(for-each-n (lambda (j)
			     (let ((p (matrix-ref v j i)))
			      (matrix-set! v j i (matrix-ref v j k))
			      (matrix-set! v j k p)))
			    n)))))
   (- n 1))
  (list d (transpose v))))

(define (vector->diagonal-matrix v)
 (let ((m (make-matrix-initial (vector-length v) (vector-length v) 0.0)))
  (for-each-n (lambda (i) (matrix-set! m i i (vector-ref v i)))
	      (vector-length v))
  m))

(define (clip-eigenvalues a v)
 (let* ((j (jacobi a))
	(e (second j)))
  (m* (transpose e)
      (m* (vector->diagonal-matrix (map-vector-two max v (first j))) e))))

;;; EM

(define (e-step x models)
 (let ((z (map-vector
	   (lambda (xi)
	    ;; Compute for each model.
	    (map-vector
	     (lambda (model)
	      (let ((log-pi (model-log-pi model))
		    (mu (model-mu model))
		    (sigma-inverse (model-sigma-inverse model))
		    (log-determinant-sigma
		     (model-log-determinant-sigma model)))
	       ;; Compute likelihoods (note: up to constant for all models).
	       (- log-pi
		  (* 0.5
		     (+ log-determinant-sigma
			(dot (v- xi mu) (m*v sigma-inverse (v- xi mu))))))))
	     models))
	   x)))
  ;; Normalize ownerships to sum to one.
  (let ((s (map-vector (lambda (zi) (reduce-vector add-exp zi minus-infinity))
		       z)))
   ;; Return log likelihood and ownerships matrix.
   (list (reduce-vector + s 0.0)
	 (map-vector (lambda (zi) (map-vector exp zi))
		     (m- z (transpose (make-vector (matrix-columns z) s))))))))

(define (m-step x z clip)
 ;; Returns new set of models.
 (let* ((ii (vector-length x))
	(kk (vector-length (vector-ref x 0))))
  ;; For each model, optimize parameters.
  (map-n-vector
   (lambda (j)
    (let* ((zj (matrix-column-ref z j))
	   (zj-sum (reduce-vector + zj 0.0))
	   ;; Optimize values.
	   (mu (k*v (/ zj-sum)
		    (reduce-vector
		     v+ (map-vector-two k*v zj x) (make-vector kk 0.0))))
	   (sigma (clip-eigenvalues
		   (k*m (/ zj-sum)
			(reduce-vector
			 m+
			 (map-vector-two
			  (lambda (zij xi)
			   (k*m zij (self-outer-product * (v- xi mu))))
			  zj
			  x)
			 (make-matrix-initial kk kk 0.0)))
		   clip)))
     (make-model (/ zj-sum ii)
		 mu
		 sigma
		 (ieee-log (/ zj-sum ii))
		 (invert-matrix sigma)
		 (ieee-log (determinant sigma)))))
   (matrix-columns z))))

(define (em x pi mu sigma clip em-kick-off-tolerance em-convergence-tolerance)
 (let ((jj (vector-length mu)))
  (let loop ((models
	      (map-n-vector (lambda (j)
			     (make-model
			      (vector-ref pi j)
			      (vector-ref mu j)
			      (vector-ref sigma j)
			      (ieee-log (vector-ref pi j))
			      (invert-matrix (vector-ref sigma j))
			      (ieee-log (determinant (vector-ref sigma j)))))
			    jj))
	     (old-log-likelihood minus-infinity)
	     (starting? #t))
   (let ((log-likelihood-z (e-step x models)))
    (if (or (and starting? (> (first log-likelihood-z) old-log-likelihood))
	    (> (first log-likelihood-z)
	       (+ old-log-likelihood em-convergence-tolerance)))
	(loop (m-step x (second log-likelihood-z) clip)
	      (first log-likelihood-z)
	      (and starting?
		   (not (= jj 1))
		   (or (= old-log-likelihood minus-infinity)
		       (< (first log-likelihood-z)
			  (+ old-log-likelihood em-kick-off-tolerance)))))
	(list old-log-likelihood models))))))

(define (noise epsilon)
 (- (* 2.0 epsilon (/ (exact->inexact (rand)) *most-positive-fixnum*))
    epsilon))

(define (initial-z ii jj)
 (map-n-vector
  (lambda (i)
   (let ((zi (map-n-vector
	      (lambda (j)
	       (+ (/ (exact->inexact jj)) (noise (/ (exact->inexact jj)))))
	      jj)))
    (k*v (/ (reduce-vector + zi 0.0)) zi)))
  ii))

(define (ems x clip em-kick-off-tolerance em-convergence-tolerance
	     ems-convergence-tolerance)
 (let loop ((jj 1)
	    ;; needs work: Should replace #F with ((LAMBDA ())).
	    (old-log-likelihood-models (list minus-infinity #f)))
  (let* ((models (m-step x (initial-z (vector-length x) jj) clip))
	 (new-log-likelihood-models
	  (em x
	      (map-vector model-pi models)
	      (map-vector model-mu models)
	      (map-vector model-sigma models)
	      clip
	      em-kick-off-tolerance
	      em-convergence-tolerance)))
   (if (> (- (/ (first old-log-likelihood-models)
		(first new-log-likelihood-models))
	     1.0)
	  ems-convergence-tolerance)
       (loop (+ jj 1) new-log-likelihood-models)
       (second old-log-likelihood-models)))))

(define (em-clusterer x clip em-kick-off-tolerance em-convergence-tolerance
		      ems-convergence-tolerance)
 (let* ((z (second (e-step x (ems x clip em-kick-off-tolerance
				  em-convergence-tolerance
				  ems-convergence-tolerance))))
	(clusters
	 (map-n (lambda (i)
		 (let ((zi (vector->list (vector-ref z i))))
		  (list i (positionv (reduce max zi minus-infinity) zi))))
		(vector-length z))))
  (map-n (lambda (j)
	  (map (lambda (cluster) (vector-ref x (first cluster)))
	       (remove-if-not (lambda (cluster) (= (second cluster) j))
			      clusters)))
	 (vector-length (vector-ref z 0)))))

(do ((i 0 (+ i 1))) ((= i 100))
 (write
  (em-clusterer
   '#(#(1.0) #(2.0) #(3.0) #(11.0) #(12.0) #(13.0)) '#(1.0) 10.0 1.0 0.01))
 (newline))
-------------------------------------------------------------------------------
;;; The constants are hardwired to be inexact for efficiency.

;;; begin Stalin
(define make-model (primitive-procedure make-structure model 6))
(define model-pi (primitive-procedure structure-ref model 0))
(define set-model-pi! (primitive-procedure structure-set! model 0))
(define model-mu (primitive-procedure structure-ref model 1))
(define model-sigma (primitive-procedure structure-ref model 2))
(define model-log-pi (primitive-procedure structure-ref model 3))
(define set-model-log-pi! (primitive-procedure structure-set! model 3))
(define model-sigma-inverse (primitive-procedure structure-ref model 4))
(define model-log-determinant-sigma
 (primitive-procedure structure-ref model 5))
(define set-model-log-determinant-sigma!
 (primitive-procedure structure-set! model 5))
(define (void) ((lambda ())))
(define ieee-log log)
;;; end Stalin
;;; begin Scheme->C
(define make-model vector)
(define (model-pi model) (vector-ref model 0))
(define (set-model-pi! model pi) (vector-set! model 0 pi))
(define (model-mu model) (vector-ref model 1))
(define (model-sigma model) (vector-ref model 2))
(define (model-log-pi model) (vector-ref model 3))
(define (set-model-log-pi! model log-pi) (vector-set! model 3 log-pi))
(define (model-sigma-inverse model) (vector-ref model 4))
(define (model-log-determinant-sigma model) (vector-ref model 5))
(define (set-model-log-determinant-sigma! model log-determinant-sigma)
 (vector-set! model 5 log-determinant-sigma))
(define (panic s) (error 'panic s))
(define (void) #f)
(define ieee-log log)
;;; end Scheme->C
;;; begin Gambit-C
(define-structure model
 pi mu sigma log-pi sigma-inverse log-determinant-sigma)
(define set-model-pi! model-pi-set!)
(define set-model-log-pi! model-log-pi-set!)
(define set-model-log-determinant-sigma! model-log-determinant-sigma-set!)
(define (panic s) (error s))
(define ieee-log ##flonum.log)
;;; end Gambit-C
;;; begin Chez
(define make-model vector)
(define (model-pi model) (vector-ref model 0))
(define (set-model-pi! model pi) (vector-set! model 0 pi))
(define (model-mu model) (vector-ref model 1))
(define (model-sigma model) (vector-ref model 2))
(define (model-log-pi model) (vector-ref model 3))
(define (set-model-log-pi! model log-pi) (vector-set! model 3 log-pi))
(define (model-sigma-inverse model) (vector-ref model 4))
(define (model-log-determinant-sigma model) (vector-ref model 5))
(define (set-model-log-determinant-sigma! model log-determinant-sigma)
 (vector-set! model 5 log-determinant-sigma))
(define (panic s) (error 'panic s))
(define ieee-log log)
;;; end Chez

(define (hex-string->number s)
 (let loop ((s (string->list s)) (c 0))
  (if (null? s)
      c
      (loop (cdr s) (+ (* 16 c)
		       (if (char-numeric? (car s))
			   (- (char->integer (car s)) (char->integer #\0))
			   (+ (- (char->integer (car s)) (char->integer #\a))
			      10)))))))

;;; The following code is a modified version of code taken from SLIB.
;;; Copyright (C) 1991, 1993 Aubrey Jaffer.
;
;Permission to copy this software, to redistribute it, and to use it
;for any purpose is granted, subject to the following restrictions and
;understandings.
;
;1.  Any copy made of this software must include this copyright notice
;in full.
;
;2.  I have made no warrantee or representation that the operation of
;this software will be error-free, and I am under no obligation to
;provide any services, by way of maintenance, update, or otherwise.
;
;3.  In conjunction with products arising from the use of this
;material, there shall be no use of my name in any advertising,
;promotional, or sales literature without prior written consent in
;each case.

(define *most-positive-fixnum* 65535)

(define (logical:logxor n1 n2)
 (cond ((= n1 n2) 0)
       ((zero? n1) n2)
       ((zero? n2) n1)
       (else (+ (* (logical:logxor (logical:ash-4 n1) (logical:ash-4 n2)) 16)
		(vector-ref (vector-ref logical:boole-xor (modulo n1 16))
			    (modulo n2 16))))))

(define (logical:ash-4 x)
 (if (negative? x) (+ -1 (quotient (+ 1 x) 16)) (quotient x 16)))

(define logical:boole-xor
 '#(#(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)
    #(1 0 3 2 5 4 7 6 9 8 11 10 13 12 15 14)
    #(2 3 0 1 6 7 4 5 10 11 8 9 14 15 12 13)
    #(3 2 1 0 7 6 5 4 11 10 9 8 15 14 13 12)
    #(4 5 6 7 0 1 2 3 12 13 14 15 8 9 10 11)
    #(5 4 7 6 1 0 3 2 13 12 15 14 9 8 11 10)
    #(6 7 4 5 2 3 0 1 14 15 12 13 10 11 8 9)
    #(7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8)
    #(8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7)
    #(9 8 11 10 13 12 15 14 1 0 3 2 5 4 7 6)
    #(10 11 8 9 14 15 12 13 2 3 0 1 6 7 4 5)
    #(11 10 9 8 15 14 13 12 3 2 1 0 7 6 5 4)
    #(12 13 14 15 8 9 10 11 4 5 6 7 0 1 2 3)
    #(13 12 15 14 9 8 11 10 5 4 7 6 1 0 3 2)
    #(14 15 12 13 10 11 8 9 6 7 4 5 2 3 0 1)
    #(15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0)))

(define random:tap 24)

(define random:size 55)

(define (random:size-int l)
 (let ((trial (hex-string->number (make-string l #\f))))
  (if (and (exact? trial) (positive? trial) (>= *most-positive-fixnum* trial))
      l
      (random:size-int (- l 1)))))

(define random:chunk-size (* 4 (random:size-int 8)))

(define random:MASK
 (hex-string->number (make-string (quotient random:chunk-size 4) #\f)))

(define *random-state* '#())

(let ((random-strings
       '#("d909ef3e" "fd330ab3" "e33f7843" "76783fbd" "f3675fb3"
		     "b54ef879" "0be45590" "a6794679" "0bcd56d3" "fabcdef8"
		     "9cbd3efd" "3fd3efcd" "e064ef27" "dddecc08" "34444292"
		     "85444454" "4c519210" "c0366273" "54734567" "70abcddc"
		     "1bbdac53" "616c5a86" "a982efa9" "105996a0" "5f0cccba"
		     "1ea055e1" "fe2acd8d" "1891c1d4" "e6690270" "6912bccc"
		     "2678e141" "61222224" "907abcbb" "4ad6829b" "9cdd1404"
		     "57798841" "5b892496" "871c9cd1" "d1e67bda" "8b0a3233"
		     "578ef23f" "28274ef6" "823ef5ef" "845678c5" "e67890a5"
		     "5890abcb" "851fa9ab" "13efa13a" "b12278d6" "daf805ab"
		     "a0befc36" "0068a7b5" "e024fd90" "a7b690e2" "27f3571a"
		     0)))
 (set! *random-state* (make-vector (+ random:size 1) 0))
 (let ((nibbles (quotient random:chunk-size 4)))
  (do ((i 0 (+ i 1))) ((= i random:size))
   (vector-set!
    *random-state* i
    (hex-string->number
     (substring (vector-ref random-strings i) 0 nibbles))))))

;;; random:chunk returns an integer in the range of
;;; 0 to (- (expt 2 random:chunk-size) 1)
(define (random:chunk v)
 (let* ((p (vector-ref v random:size))
	(ans (logical:logxor
	      (vector-ref v (modulo (- p random:tap) random:size))
	      (vector-ref v p))))
  (vector-set! v p ans)
  (vector-set! v random:size (modulo (- p 1) random:size))
  ans))

(define (rand)
 (do ((ilen 0 (+ 1 ilen))
      (s random:MASK (+ random:MASK (* (+ 1 random:MASK) s))))
   ((>= s (- *most-positive-fixnum* 1))
    (let ((slop (modulo (+ s (- 1 *most-positive-fixnum*))
			*most-positive-fixnum*)))
     (let loop ((n ilen) (r (random:chunk *random-state*)))
      (cond ((not (zero? n))
	     (loop (+ -1 n)
		   (+ (* r (+ 1 random:MASK)) (random:chunk *random-state*))))
	    ((>= r slop) (modulo r *most-positive-fixnum*))
	    (else (loop ilen (random:chunk *random-state*)))))))))

;;; End of code taken from SLIB

(define log-math-precision 35.0)

(define minus-infinity (ieee-log 0.0))

(define first car)

(define second cadr)

(define rest cdr)

(define (reduce f l i)
 (cond ((null? l) i)
       ((null? (rest l)) (first l))
       (else (let loop ((l (rest l)) (c (first l)))
	      (if (null? l) c (loop (rest l) (f c (first l))))))))

(define (every-n p n)
 (let loop ((i 0)) (or (>= i n) (and (p i) (loop (+ i 1))))))

(define (sum f n)
 (let loop ((n (- n 1)) (c 0.0))
  (if (negative? n) c (loop (- n 1) (+ c (f n))))))

(define (add-exp e1 e2)
 (let* ((e-max (max e1 e2))
	(e-min (min e1 e2))
	(factor (floor e-min)))
  (if (= e-max minus-infinity)
      minus-infinity
      (if (> (- e-max factor) log-math-precision)
	  e-max
	  (+ (ieee-log (+ (exp (- e-max factor)) (exp (- e-min factor))))
	     factor)))))

(define (map-n f n)
 ;; needs work: To eliminate REVERSE.
 (let loop ((i 0) (c '()))
  (if (< i n) (loop (+ i 1) (cons (f i) c)) (reverse c))))

(define (map-n-vector f n)
 (let ((v (make-vector n)))
  (let loop ((i 0))
   (if (< i n)
       (begin (vector-set! v i (f i))
	      (loop (+ i 1)))))
  v))

(define (remove-if-not p l)
 ;; needs work: To eliminate REVERSE.
 (let loop ((l l) (c '()))
  (cond ((null? l) (reverse c))
	((p (first l)) (loop (rest l) (cons (first l) c)))
	(else (loop (rest l) c)))))

(define (positionv x l)
 (let loop ((l l) (i 0))
  (cond ((null? l) #f)
	((eqv? x (first l)) i)
	(else (loop (rest l) (+ i 1))))))

(define (make-matrix m n)
 (map-n-vector (lambda (i) (make-vector n)) m))

(define (make-matrix-initial m n initial)
 (map-n-vector (lambda (i) (make-vector n initial)) m))

(define (matrix-rows a) (vector-length a))

(define (matrix-columns a) (vector-length (vector-ref a 0)))

(define (matrix-ref a i j) (vector-ref (vector-ref a i) j))

(define (matrix-set! a i j x) (vector-set! (vector-ref a i) j x))

(define (matrix-row-ref a i) (vector-ref a i))

(define (matrix-row-set! a i v) (vector-set! a i v))

(define (determinant a)
 (if (not (= (matrix-rows a) (matrix-columns a)))
     (panic "Can only find determinant of a square matrix"))
 (call-with-current-continuation
  (lambda (return)
   (let* ((n (matrix-rows a))
	  (b (make-matrix n n))
	  (d 1.0))
    (do ((i 0 (+ i 1))) ((= i n))
     (do ((j 0 (+ j 1))) ((= j n)) (matrix-set! b i j (matrix-ref a i j))))
    (do ((i 0 (+ i 1))) ((= i n))
     ;; partial pivoting reduces rounding errors
     (let ((greatest (abs (matrix-ref b i i)))
	   (index i))
      (do ((j (+ i 1) (+ j 1))) ((= j n))
       (let ((x (abs (matrix-ref b j i))))
	(if (> x greatest) (begin (set! index j) (set! greatest x)))))
      (if (= greatest 0.0) (return 0.0))
      (if (not (= index i))
	  (let ((v (matrix-row-ref b i)))
	   (matrix-row-set! b i (matrix-row-ref b index))
	   (matrix-row-set! b index v)
	   (set! d (- d))))
      (let ((c (matrix-ref b i i)))
       (set! d (* d c))
       (do ((j i (+ j 1))) ((= j n))
	(matrix-set! b i j (/ (matrix-ref b i j) c)))
       (do ((j (+ i 1) (+ j 1))) ((= j n))
	(let ((e (matrix-ref b j i)))
	 (do ((k (+ i 1) (+ k 1))) ((= k n))
	  (matrix-set!
	   b j k (- (matrix-ref b j k) (* e (matrix-ref b i k))))))))))
    d))))

(define (invert-matrix! a b)
 (if (not (= (matrix-rows a) (matrix-columns a)))
     (panic "Can only invert a square matrix"))
 (let* ((n (matrix-rows a))
	(c (make-matrix n n)))
  (do ((i 0 (+ i 1))) ((= i n))
   (do ((j 0 (+ j 1))) ((= j n))
    (matrix-set! b i j 0.0)
    (matrix-set! c i j (matrix-ref a i j))))
  (do ((i 0 (+ i 1))) ((= i n)) (matrix-set! b i i 1.0))
  (do ((i 0 (+ i 1))) ((= i n))
   (if (zero? (matrix-ref c i i))
       (call-with-current-continuation
	(lambda (return)
	 (do ((j 0 (+ j 1))) ((= j n))
	  (if (and (> j i) (not (zero? (matrix-ref c j i))))
	      (begin (let ((e (vector-ref c i)))
		      (vector-set! c i (vector-ref c j))
		      (vector-set! c j e))
		     (let ((e (vector-ref b i)))
		      (vector-set! b i (vector-ref b j))
		      (vector-set! b j e))
		     (return (void)))))
	 (panic "Matrix is singular"))))
   (let ((d (/ (matrix-ref c i i))))
    (do ((j 0 (+ j 1))) ((= j n))
     (matrix-set! c i j (* d (matrix-ref c i j)))
     (matrix-set! b i j (* d (matrix-ref b i j))))
    (do ((k 0 (+ k 1))) ((= k n))
     (let ((d (- (matrix-ref c k i))))
      (if (not (= k i))
	  (do ((j 0 (+ j 1))) ((= j n))
	   (matrix-set!
	    c k j (+ (matrix-ref c k j) (* d (matrix-ref c i j))))
	   (matrix-set!
	    b k j (+ (matrix-ref b k j) (* d (matrix-ref b i j))))))))))))

(define (jacobi! a)
 (if (not (and (= (matrix-rows a) (matrix-columns a))
	       (every-n (lambda (i)
			 (every-n (lambda (j)
				   (= (matrix-ref a i j) (matrix-ref a j i)))
				  (matrix-rows a)))
			(matrix-rows a))))
     (panic "Can only compute eigenvalues/eigenvectors of a symmetric matrix"))
 (let* ((n (matrix-rows a))
	(d (make-vector n))
	(v (make-matrix-initial n n 0.0))
	(b (make-vector n))
	(z (make-vector n 0.0)))
  (do ((ip 0 (+ ip 1))) ((= ip n))
   (matrix-set! v ip ip 1.0)
   (vector-set! b ip (matrix-ref a ip ip))
   (vector-set! d ip (matrix-ref a ip ip)))
  (let loop ((i 0))
   (if (> i 50) (panic "Too many iterations in JACOBI!"))
   (let ((sm (sum (lambda (ip)
		   (sum (lambda (ir)
			 (let ((iq (+ ip ir 1)))
			  (abs (matrix-ref a ip iq))))
			(- n ip 1)))
		  (- n 1))))
    (if (not (zero? sm))
	(begin
	 (let ((tresh (if (< i 3) (/ (* 0.2 sm) (* n n)) 0.0)))
	  (do ((ip 0 (+ ip 1))) ((= ip (- n 1)))
	   (do ((ir 0 (+ ir 1))) ((= ir (- n ip 1)))
	    (let* ((iq (+ ip ir 1))
		   (g (* 100.0 (abs (matrix-ref a ip iq)))))
	     (cond
	      ((and (> i 3)
		    (= (+ (abs (vector-ref d ip)) g) (abs (vector-ref d ip)))
		    (= (+ (abs (vector-ref d iq)) g) (abs (vector-ref d iq))))
	       (matrix-set! a ip iq 0.0))
	      ((> (abs (matrix-ref a ip iq)) tresh)
	       (let* ((h (- (vector-ref d iq) (vector-ref d ip)))
		      (t (if (= (+ (abs h) g) (abs h))
			     (/ (matrix-ref a ip iq) h)
			     (let ((theta (/ (* 0.5 h) (matrix-ref a ip iq))))
			      (if (negative? theta)
				  (- (/ (+ (abs theta)
					   (sqrt (+ (* theta theta) 1.0)))))
				  (/ (+ (abs theta)
					(sqrt (+ (* theta theta) 1.0))))))))
		      (c (/ (sqrt (+ (* t t) 1.0))))
		      (s (* t c))
		      (tau (/ s (+ c 1.0)))
		      (h (* t (matrix-ref a ip iq))))
		(define (rotate a i j k l)
		 (let ((g (matrix-ref a i j))
		       (h (matrix-ref a k l)))
		  (matrix-set! a i j (- g (* s (+ h (* g tau)))))
		  (matrix-set! a k l (+ h (* s (- g (* h tau)))))))
		(vector-set! z ip (- (vector-ref z ip) h))
		(vector-set! z iq (+ (vector-ref z iq) h))
		(vector-set! d ip (- (vector-ref d ip) h))
		(vector-set! d iq (+ (vector-ref d iq) h))
		(matrix-set! a ip iq 0.0)
		(do ((j 0 (+ j 1))) ((= j n))
		 (cond ((< j ip) (rotate a j ip j iq))
		       ((< ip j iq) (rotate a ip j j iq))
		       ((< iq j) (rotate a ip j iq j)))
		 (rotate v j ip j iq)))))))))
	 (do ((ip 0 (+ ip 1))) ((= ip n))
	  (vector-set! b ip (+ (vector-ref b ip) (vector-ref z ip)))
	  (vector-set! d ip (vector-ref b ip))
	  (vector-set! z ip 0.0))
	 (loop (+ i 1))))))
  (do ((i 0 (+ i 1))) ((= i (- n 1)))
   (let ((k i)
	 (p (vector-ref d i)))
    (do ((l 0 (+ l 1))) ((= l (- n i 1)))
     (let* ((j (+ i l 1)))
      (if (>= (vector-ref d j) p)
	  (begin (set! k j) (set! p (vector-ref d j))))))
    (if (not (= k i))
	(begin (vector-set! d k (vector-ref d i))
	       (vector-set! d i p)
	       (do ((j 0 (+ j 1))) ((= j n))
		(let ((p (matrix-ref v j i)))
		 (matrix-set! v j i (matrix-ref v j k))
		 (matrix-set! v j k p)))))))
  (list d v)))

(define (clip-eigenvalues! a v)
 (let* ((j (jacobi! a))
	(l (first j))
	(e (second j)))
  (do ((k1 0 (+ k1 1))) ((= k1 (vector-length a)))
   (let ((a-k1 (vector-ref a k1))
	 (e-k1 (vector-ref e k1)))
    (do ((k2 0 (+ k2 1))) ((= k2 (vector-length a-k1)))
     (let ((e-k2 (vector-ref e k2))
	   (s 0.0))
      (do ((k 0 (+ k 1))) ((= k (vector-length a)))
       (set! s (+ s (* (vector-ref e-k1 k)
		       (max (vector-ref v k) (vector-ref l k))
		       (vector-ref e-k2 k)))))
      (vector-set! a-k1 k2 s)))))))

;;; EM

(define (e-step! x z models)
 (do ((i 0 (+ i 1))) ((= i (vector-length x)))
  (let ((xi (vector-ref x i))
	(zi (vector-ref z i)))
   (do ((j 0 (+ j 1))) ((= j (vector-length models)))
    ;; Compute for each model.
    (let* ((model (vector-ref models j))
	   (log-pi (model-log-pi model))
	   (mu (model-mu model))
	   (sigma-inverse (model-sigma-inverse model))
	   (log-determinant-sigma (model-log-determinant-sigma model))
	   (t 0.0))
     ;; Compute likelihoods (note: up to constant for all models).
     (set! t 0.0)
     (do ((k1 0 (+ k1 1))) ((= k1 (vector-length xi)))
      (let ((sigma-inverse-k1 (vector-ref sigma-inverse k1)))
       (do ((k2 0 (+ k2 1))) ((= k2 (vector-length xi)))
	(set! t (+ t (* (- (vector-ref xi k1) (vector-ref mu k1))
			(vector-ref sigma-inverse-k1 k2)
			(- (vector-ref xi k2) (vector-ref mu k2))))))))
     (vector-set! zi j (- log-pi (* 0.5 (+ log-determinant-sigma t))))))))
 (let ((l 0.0))
  (do ((i 0 (+ i 1))) ((= i (vector-length x)))
   (let ((s minus-infinity)
	 (zi (vector-ref z i)))
    ;; Normalize ownerships to sum to one.
    (do ((j 0 (+ j 1))) ((= j (vector-length models)))
     (set! s (add-exp s (vector-ref zi j))))
    (do ((j 0 (+ j 1))) ((= j (vector-length models)))
     (vector-set! zi j (exp (- (vector-ref zi j) s))))
    (set! l (+ l s))))
  ;; Return log likelihood.
  l))

(define (m-step! x models z clip)
 (let ((kk (vector-length (vector-ref x 0))))
  ;; For each model, optimize parameters.
  (do ((j 0 (+ j 1))) ((= j (vector-length models)))
   (let* ((model (vector-ref models j))
	  (mu (model-mu model))
	  (sigma (model-sigma model))
	  (s 0.0))
    ;; Optimize values.
    (do ((i 0 (+ i 1))) ((= i (vector-length x)))
     (set! s (+ s (vector-ref (vector-ref z i) j))))
    (let ((si (/ s)))
     (do ((k 0 (+ k 1))) ((= k kk))
      (let ((m 0.0))
       (do ((i 0 (+ i 1))) ((= i (vector-length x)))
	(set! m (+ m (* (vector-ref (vector-ref z i) j)
			(vector-ref (vector-ref x i) k)))))
       (vector-set! mu k (* si m)))))
    (let ((si (/ s)))
     (do ((k1 0 (+ k1 1))) ((= k1 kk))
      (let ((sigma-k1 (vector-ref sigma k1))
	    (mu-k1 (vector-ref mu k1)))
       (do ((k2 0 (+ k2 1))) ((= k2 kk))
	(let ((mu-k2 (vector-ref mu k2))
	      (m 0.0))
	 (do ((i 0 (+ i 1))) ((= i (vector-length x)))
	  (set! m (+ m (* (vector-ref (vector-ref z i) j)
			  (* (- (vector-ref (vector-ref x i) k1) mu-k1)
			     (- (vector-ref (vector-ref x i) k2) mu-k2))))))
	 (vector-set! sigma-k1 k2 (* si m)))))))
    (clip-eigenvalues! sigma clip)
    (set-model-pi! model (/ s (vector-length x)))
    (set-model-log-pi! model (ieee-log (/ s (vector-length x))))
    (invert-matrix! sigma (model-sigma-inverse model))
    (set-model-log-determinant-sigma! model (ieee-log (determinant sigma)))))))

(define (em! x z models clip em-kick-off-tolerance em-convergence-tolerance)
 (let loop ((old-log-likelihood minus-infinity) (starting? #t))
  (let ((log-likelihood (e-step! x z models)))
   (cond
    ((or (and starting? (> log-likelihood old-log-likelihood))
	 (> log-likelihood (+ old-log-likelihood em-convergence-tolerance)))
     (m-step! x models z clip)
     (loop log-likelihood
	   (and starting?
		(not (= (vector-length models) 1))
		(or (= old-log-likelihood minus-infinity)
		    (< log-likelihood
		       (+ old-log-likelihood em-kick-off-tolerance))))))
    (else old-log-likelihood)))))

(define (noise epsilon)
 (- (* 2.0 epsilon (/ (exact->inexact (rand)) *most-positive-fixnum*))
    epsilon))

(define (initial-z ii jj)
 (map-n-vector
  (lambda (i)
   (let ((zi (map-n-vector (lambda (j)
			    (+ (/ (exact->inexact jj))
			       (noise (/ (exact->inexact jj)))))
			   jj))
	 (s 0.0))
    (do ((j 0 (+ j 1))) ((= j jj)) (set! s (+ s (vector-ref zi j))))
    (let ((si (/ s)))
     (do ((j 0 (+ j 1))) ((= j jj))
      (vector-set! zi j (* si (vector-ref zi j)))))
    zi))
  ii))

(define (ems x clip em-kick-off-tolerance em-convergence-tolerance
	     ems-convergence-tolerance)
 (let loop ((jj 1)
	    (old-z (void))
	    (old-models (void))
	    (old-log-likelihood minus-infinity))
  (let* ((kk (vector-length (vector-ref x 0)))
	 (z (initial-z (vector-length x) jj))
	 (models (map-n-vector
		  (lambda (j)
		   ;; needs work: Should replace 0.0 with ((LAMBDA ())).
		   (make-model 0.0
			       (make-vector kk)
			       (make-matrix kk kk)
			       0.0
			       (make-matrix kk kk)
			       0.0))
		  jj)))
   (m-step! x models z clip)
   (let ((new-log-likelihood
	  (em!
	   x z models clip em-kick-off-tolerance em-convergence-tolerance)))
    (if (> (- (/ old-log-likelihood new-log-likelihood) 1.0)
	   ems-convergence-tolerance)
	(loop (+ jj 1) z models new-log-likelihood)
	(list old-z old-models))))))

(define (em-clusterer x clip em-kick-off-tolerance em-convergence-tolerance
		      ems-convergence-tolerance)
 (let* ((z-models (ems x clip em-kick-off-tolerance
		       em-convergence-tolerance
		       ems-convergence-tolerance))
	(z (first z-models))
	(models (second z-models)))
  (e-step! x z models)
  (let ((clusters
	 (map-n (lambda (i)
		 (let ((zi (vector->list (vector-ref z i))))
		  (list i (positionv (reduce max zi minus-infinity) zi))))
		(vector-length z))))
   (map-n (lambda (j)
	   (map (lambda (cluster) (vector-ref x (first cluster)))
		(remove-if-not (lambda (cluster) (= (second cluster) j))
			       clusters)))
	  (vector-length (vector-ref z 0))))))

(do ((i 0 (+ i 1))) ((= i 100))
 (write
  (em-clusterer
   '#(#(1.0) #(2.0) #(3.0) #(11.0) #(12.0) #(13.0)) '#(1.0) 10.0 1.0 0.01))
 (newline))

From: William D Clinger
Subject: Re: Query on efficiency
Date: 
Message-ID: <6gmgnj$ffr$1@goldenapple.srv.cs.cmu.edu>
Abstract:  No, there is no inherent asymptotic inefficiency
associated with functional programming.  Yes, there probably
is a small constant factor of inefficiency associated with
functional programming in current implementations of languages
like Standard ML, Scheme, and (especially) Common Lisp.

Jeffrey Mark Siskind wrote:

    While this may be true in theory, in practise I have yet to
    see a compiler which generates as good code for functional
    programs as for imperative programs. Let me give a concrete
    example....

    I put forth this benchmark as a challenge to the functional
    programming community.

This is an interesting challenge.  Most of the difference between
the performance of Siskind's EM-IMP and EM-FUN benchmarks is due
to inefficiencies in EM-FUN that have nothing to do with imperative
versus functional programming.  By getting rid of some of the more
obvious inefficiencies I have been able to eliminate over half of
the performance differential.  I will be able to eliminate more,
and will report the results in these newsgroups, but I do not expect
to be able to make the functional version run quite as fast as the
imperative version, mainly because Scheme provides little support
for functional programming with vectors and arrays.

To make the functional version run as fast as the imperative version
would require features such as state monads and/or destructive update
optimization.  In purely functional languages, these features are
common.  Imperative languages like Scheme don't have these features,
partly because they are harder to support in imperative languages,
but mainly because most people would use side effects instead.

The reason I have decided to put so much time into Siskind's
benchmarks is that it is hard to devise a convincing benchmark
that demonstrates a difference in performance between functional
and imperative programming.  Unless the benchmark is very small,
it is hard to be sure that the functional and imperative versions
are using the same algorithms, are written equally well, and have
been compiled equally well.

Some years ago I wrote an LL(1) parser generator in Scheme.  The
computational core of this program, which computed the FIRST, FOLLOW,
and director sets, was purely functional.  I thought it would run
faster if I translated it into imperative code.  It did run a little
faster on the systems of that time.  Nowadays, on an Ultrasparc
running either Chez Scheme or Larceny, the purely functional code
runs a hair faster than the imperative code.

After translating the code into imperative style, I realized that
I could make it run even faster by memoizing the FIRST sets.  That
made a big difference.  Then I realized I could do essentially the
same thing with the purely functional version.  When the memoizing
is done without side effects, the proper name for it is dynamic
programming.  Here are the relative speeds of the four versions,
when computing the director sets for Modula-2, as measured today
on an Ultrasparc named vega:

                                      Chez v5.0a   Larceny v0.32
  imperative with memoization:           100           100
  functional with dynamic programming:    74            85
  functional without cleverness:          19            21
  imperative without cleverness:          17            20

Do these figures imply that imperative programming is a little
faster than functional programming, a hair slower, or what?  No
valid conclusion can be drawn from such limited data.

So which implementation did I use when I released the parser
generator?  The functional version with dynamic programming.
It may not be quite as fast as the memoized imperative code,
but it's simpler and easier to maintain.

Turning back to theory for a moment:  There is no asymptotic
performance penalty associated with sequential functional
programming compared to sequential imperative programming.  Not
even a logarithmic factor.  This follows from the fact that
sequential imperative programs are single-threaded by definition,
which implies that all side effects are performed on values that
are dead following the update (because imperative updates "kill"
the previous value, to use the jargon of optimizing compilers).
A slight modification to your favorite algorithm for
CPS-conversion suffices to convert the imperative program into
implicit-state-passing form, without changing its asymptotic
complexity.  We can then convert the program into state-monad
functional form, which doesn't change its asymptotic complexity
because we can rely on the functional updates to be implemented
as imperative updates on the target machine.  QED

Notice, please, that functional programs are not the same as
functional machines.  Nicholas Pippenger has proved that
the asymptotic time complexity of functional machines are
sometimes worse than the asymptotic time complexity of imperative
machines, for the somewhat specialized circumstances of online
time complexity, an infinite set of symbols that can each be
read in constant time, and no data structures larger than
pairs (cons cells) [ACM TOPLAS 19(2), March 1997].  Pippenger's
results are not applicable to the efficiency of functional
programming, because modern implementations of functional
languages generally execute on an imperative target machine.

It is wonderful that functional programs can always be made to
run with the same asymptotic efficiency as imperative programs,
but constant factors matter too.  Hence Siskind's challenge.

In mostly-functional languages like Standard ML, Scheme, and (to
a lesser extent) Common Lisp, functional programming does have a
few performance advantages, such as avoiding the overhead of a
write barrier.  It also has some disadvantages, notably the
aggregate update problem:  None of these languages provide state
monads, array comprehension, or other serious support for large
functional data structures.  You have to construct these things
on your own, and they will probably not be as efficient as they
would be if built-in to the language.

In practice, it is easy to write small functional programs in
SML or Scheme that are likely to run as fast as any equivalent
imperative program, but it is harder to find large functional
programs for which this is true.  Of course, it's hard to make
any valid comparisons at all between large programs.  Siskind's
programs are just barely small enough for me to believe that a
fair comparison will be possible.

William D Clinger
From: William D Clinger
Subject: Re: Query on efficiency
Date: 
Message-ID: <6hoh1a$be0$1@goldenapple.srv.cs.cmu.edu>
Here is a summary of what I have learned about Jeffrey Mark Siskind's EM
benchmarks.  For the full story, including source code and procedure call
profiles, please see

    http://www.ccs.neu.edu/home/will/Twobit/Benchmarks/EM/index.html

Recall that Siskind posted a pair of numerical benchmarks that computed
the same results.  Both were written in Scheme.  One was written in an
imperative style, the other in a functional style.  Siskind reported
that the functional version was 1.44 to 2.69 times as slow as the
imperative version, depending on the implementation of Scheme.  He
wrote:

  So you may say, OK, all of the Scheme implementations that you tested are
  broken. Well, here is my challenge. Rewrite these benchmarks faithfully in
  your favourite language. Any language that supports both a functional and
  imperative style. And run them in your favourite implementation of that
  language. I predict that you will get similar results.

The main reason why Siskind's imperative code is faster than his functional
code is that the functional version was written in a gratuitously inefficient
style that resulted in six times as many procedure calls to top-level and
anonymous procedures as in the imperative version.

I cleaned up the functional version to make it more faithful to the
algorithms and coding styles that were used in the imperative version.
During this cleanup, I was careful not to use any side effects within
a procedure unless that procedure was a primitive that had been coded
using side effects in the original functional [sic] version.  The
cleaned-up version of the functional benchmark still performs about 40%
more calls than the imperative version, but is only 5 to 10 per cent
slower.

William D Clinger