From: Ken Anderson
Subject: Fannkuch revisited or Benchmarking is hard
Date: 
Message-ID: <KANDERSO.94Sep23150911@wheaton.bbn.com>
#||

Compile and load this file to run the fannkuch benchmark.

I have worked to make this study as complete and accurate as i can.  I
apologize for anything that isn't quite right.  Please help me correct
them.  Duane Rettig (·····@franz.com) provided a great deal of help to get
me to understand what is going on, and has developed two patches (0230 and
0231) that provide a significant performance improvement (almost 30%).  The
before and after performance figures are referred to as "Allegro1" and
"Allegro2" below.  Our collaboration is an excellent example of what Lisp
vendors and users can do together to make the world a safer place for
object kind.


HISTORY:

This benchmark came out of a thread on comp.lang.lisp in Sept 1994
originated by Bruno Haible (······@ma2s2.mathematik.uni-karlsruhe.de).  The
original post introduced the language Beta to the news group and in passing
mentioned an "integer hacking" benchmark that indicated that at least some
Lisp implementations were much slower (50 to 100 times) on the benchmark
than C.

Several people then began studying the benchmark.  Lawrence Mayka
(···@polaris.ih.att.com) produced a properly optimized Lisp version.  I
modified that version to pull out the formatted output and array
construction.  Jacob Seligmann (Jacob Seligmann) provided Haible's C
version and a Beta version provided at the end of this file.

The following exhibit shows the time in seconds for running various version
of the fannkuch function, where N = 9 or 10, for various Lisp and C
implementations on a Sun Sparc 10:

Exhibit 1: 

Time in seconds for various versions of the fannkuch benchmark.
Rightmost two column show time relative to gcc -O2.

Optimized    NO  YES  YES   YES   YES   RELATIVE
N             9    9    9    10    10     9   10
I/O         YES  YES   NO   YES    NO    NO   NO
gcc -02              1.72       19.65  1.00 1.00
cc -O2               1.83       21.54  1.06 1.10
Allero2   18.83 3.25 2.35 38.40 26.95  1.36 1.37
CMU        8.59 2.84 2.65 32.24 34.08  1.54 1.73
Allegro1  19.12 4.60 2.85 55.78 34.08  1.65 1.73
cc                   2.93       34.21  1.70 1.74        
Lispworks 19.48 3.00 3.00 35.16 35.20  1.74 1.79
gcc                  3.37       38.95  1.95 1.98
Lucid      7.45 3.60 3.50 41.58 41.57  2.03 2.12

ANALYSIS:

Good benchmarking is harder than you think. 

While the original Lisp version of the fannkuch function is about 10 times
slower than optimized C, the optimized version takes 1.75 times the
optimized C time, which is comparable to the time for unoptimized C.

While the original version, FANNKUCH-1, looked optimized, there were
several important optimizations missing.  For example:

o I in (DOTIME (I N) ...) was not declared to be a fixnum which can cause
  generic arithmetic and comparison to be used.

o There are several places where code of the form (SETF (SVREF X I)
  RESULT)" should have RESULT declared as FIXNUM to avoid a GC write
  barrier check.

o While the arrays were declared to be of type (SIMPLE-ARRAY FIXNUM (*)),
  they were accessed with SVREF which lets the compiler assume it is dealing
  with a SIMPLE-VECTOR which is wrong.

o Computation of (CEILING K 2) in some Lisp's can be quite slow.  In the
  leftmost Allegro time reported above, CEILING accounted for 60% of the
  run time.  Allegro has recently provided a patch to fix this.  The
  FANNKUCH-FAST version uses (ash (1+ K) -1) to perform this computation.

So why is the Lisp version slower?  Some people speculated that the
creation of 4 arrays and the I/O should be removed because it would have a
slight effect that isn't relevant.  So, I produced a version,
FANNKUCH-FAST-1 and FANNKUCH-FAST-2, that separated that out from the body
of the algorithm.  The effect was small except for Allegro (and MCL not
shown here) where the FANNKUCH-FAST-1 was much faster than FANNKUCH-FAST.
The effect was over 60% which could not be explained by a little I/O.

As originally coded as one page of code, some crucial variables like I, K,
K2, and PERM were allocated to the stack rather than to registers.  By
breaking up the algorithm into two peices all the variables became register
allocated and performance improved accordingly.

I disassembled three important loops to check the quality of the compiled
code produced (see the code below):

Exhibit 2: 

Lines of code three loops.

          fill-i svcopy flip
gcc -O2   5  9    5 10  10 15
Allegro2  5  7    7  9  12 14
CMU       6  9    7 10  13 15
Allegro1  6  7    8  9  16 17
Lucid     6  8    8 10  13 15
Lispworks 8 10    9 11  17 


For each loop two numbers are shown.  The first is the number of
instructions in the body of the do loop.  The second is the number of
instructions in the entire loop, the loop's "footprint".

The bodies of the loops are smaller for C than for Lisp, while the overall
footprint of a loop is smaller for Lisp than for C.  Lets look at the
footprint size first.

Take a loop like 

 (dotimes (i n) ...)

In a simple C-like assembly language, the loop is coded as something like:

    GCC loop              Typical Lisp loop        Allegro loop
                                    
     i = 0                       i = 0                   i= 0
     if (i=n) goto end	         goto test          top  if (< i N) goto body
      ...		    top  ...                     return     .
 top  ...                        i = i + 4          body ....
     i = i +  1            test  if (i < N) goto top     i = i + 4 
     if (i < N) goto top                                 goto top
 end

To get the full details, i recommend dissassembling the functions yourself.

The difference there is a matter of style, rather than language.  The gcc
compiler does a test and start executing the body of the loop, while the
typical Lisp code jumps to the test which is at the end of the loop body.
Allegro originally put the test first which requires an extra instruction,
but a patch will soon be available to correct this.

The number of instructions in the body of the loop varies between the
different Lisps.  The Lisps with the higher instruction counts are
typically due to one or more redundant instructions that could be removed.
There is no reason such deficiencies could not be corrected.

To see the influence that the two languages have on the compiled code, lets
look at the loop body for the function FILL-I:

  (dotimes (i N)
   (setf (aref p i) i))

    Compiled C               Compiled Lisp

1      i = 0                      i = 0
2      goto test                  goto test
3 top  temp = i << 2         top temp = i + ARRAYOFFSET
4      p[j] = i                   p[temp] = i
5      i = i + 1                  i = i + 4
6 test if (i < N) goto top   test if (i < N) goto top

Here the loops have been written in the same style for ease of comparison.
While both take the same number of instructions, there are two differences.
First, Lisp uses fixnums, which are typically implemented as a machine
integer with the two lower bits of #b00 used as a type tag.  We can see the
effect of this in line 5 where i is incremented by 4 rather than by 1 as in
the C version.  This form of fixnum is very convenient for array indexing
where fixnums can be used directly.  In C, where machine integer are used,
they must be left shifted before they can be used as array offsets.

On the other hand, when accessing a Lisp array, one must account for the
array type tag.  This is done by adding an offset (ARRAYOFFSET) to the
index to remove the tag.  On CISC computers, like the M68000's, the offset
and index can be added to the array pointer in one instruction, but on RISC
machines, two instructions are required.

While ARRAYOFFSET is a loop constant, it is generally no pulled out of the
loop if an interrupt could occur during the loop.  The reason for this is
that if a GC occurs, it must be able to recognize all pointers to the array
as lisp-values, so that the pointers can be changed if the array itself
changes location.  When a loop is noninterruptable, the compiled Lisp code
could be smaller than the compiled C code, because the GC would never be
invoked."

CONCLUSIONS

1.  Take time to benchmark carefully and understand where the time is
going.  Poorly optimized Lisp code can easily make Lisp look bad.

2.  Breaking up algorithms into smaller (less than page size) chunks can
lead to better performance due to reduced stack allocation of variables.

3.  The relative slowness between current Lisp implementations and C is
about 1.75. This difference correlates reasonably well with the relative
number of instructions in critical loops.  Lisp loops tend to have smaller
footprints but larger loop bodies.  Properly compiled Lisp code should
provide comparable loop sizes (even smaller) and performance on this
benchmark.

||#

(defun fannkuch-1 (&optional (n (progn
                                (format *query-io* "n = ?")
                                (parse-integer (read-line *query-io*))
				)          )  )
  ;; Original benchmark.
  (unless (and (> n 0) (<= n 100)) (return-from fannkuch-1))
  (let ((n n))
    (declare (fixnum n))
    (let ((perm (make-array n :element-type 'fixnum))
          (perm1 (make-array n :element-type 'fixnum))
          (zaehl (make-array n :element-type 'fixnum))
          (permmax (make-array n :element-type 'fixnum))
          (bishmax -1))
      (declare (type (simple-array fixnum (*)) perm perm1 zaehl permmax))
      (declare (fixnum bishmax))
      (dotimes (i n) (setf (svref perm1 i) i))
      (prog ((\t n))
        (declare (fixnum \t))
        Kreuz
          (when (= \t 1) (go standardroutine))
          (setf (svref zaehl (- \t 1)) \t)
          (decf \t)
          (go Kreuz)
        Dollar
          (when (= \t n) (go fertig))
          (let ((perm0 (svref perm1 0)))
            (dotimes (i \t) (setf (svref perm1 i) (svref perm1 (+ i 1))))
            (setf (svref perm1 \t) perm0)
          )
          (when (plusp (decf (svref zaehl \t))) (go Kreuz))
          (incf \t)
          (go Dollar)
        standardroutine
          (dotimes (i n) (setf (svref perm i) (svref perm1 i)))
          (let ((Spiegelungsanzahl 0) (k 0))
            (declare (fixnum Spiegelungsanzahl k))
            (loop
              (when (= (setq k (svref perm 0)) 0) (return))
              (let ((k2 (ceiling k 2)))
                (declare (fixnum k2))
                (dotimes (i k2) (rotatef (svref perm i) (svref perm (- k i))))
              )
              (incf Spiegelungsanzahl)
            )
            (when (> Spiegelungsanzahl bishmax)
              (setq bishmax Spiegelungsanzahl)
              (dotimes (i n) (setf (svref permmax i) (svref perm1 i)))
          ) )
          (go Dollar)
        fertig
      )
      (format t "The maximum was ~D.~% at " bishmax)
      (format t "(")
      (dotimes (i n)
        (when (> i 0) (format t " "))
        (format t "~D" (+ (svref permmax i) 1))
      )
      (format t ")")
      (terpri)
      (values)
) ) )

(defun fannkuch-fast (&optional (n (progn
				     (format *query-io* "n = ?")
				     (parse-integer (read-line *query-io*)))))
  ;; Properly optimized version.
  (declare (optimize (safety 0) (speed 3) (space 0) (debug 0))
	   (fixnum n))
  (unless (and (> n 0) (<= n 100))
    (return-from fannkuch-fast))
  (let ((perm (make-array n :initial-element 0))
	(perm1 (make-array n :initial-element 0))
	(zaehl (make-array n :initial-element 0))
	(permmax (make-array n :initial-element 0))
	(bishmax -1))
    (declare (type simple-vector perm perm1 zaehl permmax)
	     (dynamic-extent perm perm1 zaehl permmax)
	     (fixnum bishmax))
    (dotimes (i n)
      (declare (fixnum i))
      (setf (svref perm1 i) i))
    (prog ((\t n))
      (declare (fixnum \t))
     Kreuz
      (when (= \t 1)
	(go standardroutine))
      (setf (svref zaehl (the fixnum (1- \t))) \t)
      (setf \t (the fixnum (1- \t)))
      (go Kreuz)
     Dollar
      (when (= \t n)
	(go fertig))
      (let ((perm0 (svref perm1 0)))
	(declare (fixnum perm0))
	(dotimes (i \t)
	  (declare (fixnum i))
	  (setf (svref perm1 i) (the fixnum (svref perm1 (the fixnum (1+ i))))))
	(setf (svref perm1 \t) perm0))
      (when (> (the fixnum (setf (svref zaehl \t)
			     (the fixnum (1- (the fixnum (svref zaehl \t))))))
	       0)
	(go Kreuz))
      (setf \t (the fixnum (1+ \t)))
      (go Dollar)
     standardroutine
      (dotimes (i n)
	(declare (fixnum i))
	(setf (svref perm i) (the fixnum (svref perm1 i))))
      (let ((Spiegelungsanzahl 0)
	    (k 0))
	(declare (fixnum Spiegelungsanzahl k))
	(loop
	  (when (= (the fixnum (setq k (svref perm 0))) 0)
	    (return))
	  (let ((k2 (ash (the fixnum (1+ k)) -1)))
	    (declare (fixnum k2))
	    (dotimes (i k2)
	      (declare (fixnum i))
	      (rotatef (the fixnum (svref perm i))
		       (the fixnum (svref perm (the fixnum (- k i)))))))
	  (setf Spiegelungsanzahl (the fixnum (1+ Spiegelungsanzahl))))
	(when (> Spiegelungsanzahl bishmax)
	  (setq bishmax Spiegelungsanzahl)
	  (dotimes (i n)
	    (declare (fixnum i))
	    (setf (svref permmax i) (the fixnum (svref perm1 i))))))
      (go Dollar)
     fertig)
    (format t "The maximum was ~D.~% at " bishmax)
    (format t "(")
    (dotimes (i n)
      (declare (fixnum i))
      (when (> i 0)
	(format t " "))
      (format t "~D" (the fixnum (1+ (the fixnum (svref permmax i))))))
    (format t ")")
    (terpri)
    (values)))

(defun fannkuch-fast-1 (&optional (n 10))
  ;; Driver for fannkuch-fast-2
  (declare (optimize (safety 0) (speed 3) (space 0) (debug 0))
	   (fixnum n))
  (unless (and (> n 0) (<= n 100))
    (return-from fannkuch-fast-1))
  (let ((perm (make-array n :initial-element 0))
	(perm1 (make-array n :initial-element 0))
	(zaehl (make-array n :initial-element 0))
	(permmax (make-array n :initial-element 0))
	)
    (declare (type simple-vector perm perm1 zaehl permmax)
	     (dynamic-extent perm perm1 zaehl permmax))
    (let ((bishmax (fannkuch-fast-2 n perm perm1 zaehl permmax)))
      (declare (fixnum bishmax))
      (format t "The maximum was ~D.~% at " bishmax)
      (format t "(")
      (dotimes (i n)
	(declare (fixnum i))
	(when (> i 0)
	  (format t " "))
	(format t "~D" (the fixnum (1+ (the fixnum (svref permmax i))))))
      (format t ")"))
    (terpri)
    (values)))

(defun fannkuch-fast-2 (n perm perm1 zaehl permmax)
  ;; Guts of benchmark.
  (declare (optimize (safety 0) (speed 3) (space 0) (debug 0))
	   (type simple-vector perm perm1 zaehl permmax)
	   (type (integer 1 100) n))
  (let ((bishmax -1))
    (declare (fixnum bishmax))
    (dotimes (i n)
      (declare (fixnum i))
      (setf (svref perm1 i) i))
    (prog ((\t n))
       (declare (fixnum \t))
       Kreuz
       (when (= \t 1)
	 (go standardroutine))
       (setf (svref zaehl (the fixnum (1- \t))) \t)
       (setf \t (the fixnum (1- \t)))
       (go Kreuz)
       Dollar
       (when (= \t n)
	 (go fertig))
       (let ((perm0 (svref perm1 0)))
	 (declare (fixnum perm0))
	 (dotimes (i \t)
	   (declare (fixnum i))
	   (setf (svref perm1 i) (the fixnum (svref perm1 (the fixnum (1+ i))))))
	 (setf (svref perm1 \t) perm0))
       (when (> (the fixnum (setf (svref zaehl \t)
				  (the fixnum (1- (the fixnum (svref zaehl \t))))))
		0)
	 (go Kreuz))
       (setf \t (the fixnum (1+ \t)))
       (go Dollar)
       standardroutine
       (dotimes (i n)
	 (declare (fixnum i))
	 (setf (svref perm i) (the fixnum (svref perm1 i))))
       (let ((Spiegelungsanzahl 0)
	     (k 0))
	 (declare (fixnum Spiegelungsanzahl k))
	 (loop
	  (when (= (the fixnum (setq k (svref perm 0))) 0)
	    (return))
	  (let ((k2 (ash (the fixnum (1+ k)) -1)))
	    (declare (fixnum k2))
	    (dotimes (i k2)
	      (declare (fixnum i))
	      (rotatef (the fixnum (svref perm i))
		       (the fixnum (svref perm (the fixnum (- k i)))))))
	  (setf Spiegelungsanzahl (the fixnum (1+ Spiegelungsanzahl))))
	 (when (> Spiegelungsanzahl bishmax)
	   (setq bishmax Spiegelungsanzahl)
	   (dotimes (i n)
	     (declare (fixnum i))
	     (setf (svref permmax i) (the fixnum (svref perm1 i))))))
       (go Dollar)
       fertig)
    (values bishmax)))

(defun fannkuch-benchmark ()
  #-gcl
  (dolist (f '(fill-i svcopy flip))
    (print f)
    (disassemble f))
  (let ((perm (make-array 100))
	(perm1 (make-array 100))
	(zaehl (make-array 100))
	(permmax (make-array 100)))
    #-mcl
    (dotimes (i 3)
      (time (fannkuch-1 9)))
    (dotimes (i 3)
      (time (fannkuch-fast 9)))
    (dotimes (i 3)
      (time (fannkuch-fast-2 9 perm perm1 zaehl permmax)))
    (dotimes (i 3)
      (time (fannkuch-fast 10)))
    (dotimes (i 3)
      (time (fannkuch-fast-2 10 perm perm1 zaehl permmax)))))


(defun fill-i (perm1 N)
  (declare (optimize (safety 0) (speed 3) (space 0) (debug 0))
	   (type simple-vector perm1)
	   (fixnum N))
  (dotimes (i n)
    (declare (fixnum i))
    (setf (svref perm1 i) i)))

(defun svcopy (perm perm1 N)
  (declare (optimize (safety 0) (speed 3) (space 0) (debug 0))
	   (type simple-vector perm perm1)
	   (fixnum N))
  (dotimes (i n)
    (declare (fixnum i))
    (setf (svref perm i) (the fixnum (svref perm1 i)))))

(defun flip (perm k k2)
  (declare (optimize (safety 0) (speed 3) (space 0) (debug 0))
	   (simple-vector perm)
	   (fixnum k k2))
  (dotimes (i k2)
    (declare (fixnum i))
    (rotatef (the fixnum (svref perm i))
	     (the fixnum (svref perm (the fixnum (- k i)))))))

(fannkuch-benchmark)


#||
=== fill-i.c ===
void filli(int * perm1, int N)
{
  int i;
  for (i = 0; i < N; i++)
    perm1[i] = i;
}

void svcopy (int * perm, int * perm1, int N)
{
  int i;
  for (i = 0; i < N; i++)
    perm[i] = perm1[i];
}
       
void flip (int * perm, int k, int k2)
{
  int i;
  for (i = 0; i < k2; i++)
    {
      int temp = perm[i];
      perm[i] = perm[k-i];
      perm[k-i] = temp;
   }
}

=== fannkuch.c ===

/* Programm zur Simulation des Pfannkuchenspiels */
/* Bruno Haible 10.06.1990 */

#include <stdio.h>

#define CLOCKS_PER_SEC 1000000 /* Sun Sparc 10 */
#define PermLength 100
#define PermCopy(Source,Dest,n)                                              \
  {register int h = n; register int *s = Source; register int *d = Dest;     \
   while (h) {*d++ = *s++; h--;};                                               \
  }

void main()
{ int n;
  long start;
  int Perm[PermLength];
  int Perm1[PermLength];
  int Zaehl[PermLength];
  int PermMax[PermLength];
  int BishMax; /* bisheriges Maximum aller Spiegelungsanzahlen */
/* 
  printf("n = ?");
  scanf("%d",&n); if (!((n>0)&&(n<=PermLength))) goto Ende;
*/
  start = clock();
  n = 10;
  BishMax=-1;
  /* Erzeugung aller Permutationen */
  /* Erzeuge die Permutationen nach dem Algorithmus:
     PERM1[0..n-1] := (0,...,n-1]
     t:=n
   # if t=1 then standardroutine, goto $
     Z_hl[t-1]:=t
     t:=t-1, goto #
   $ if t<n then goto &, if t=n then fertig.
   & rotiere PERM1[0..t], dec Z_hl[t], if >0 then goto #
     t:=t+1, goto $
  */
  { register int i;
    for (i=0; i<n; i++) { Perm1[i]=i; };
  };
  { register int t;
    t=n;
    Kreuz:  if (t==1) goto standardroutine;
            Zaehl[t-1]=t;
            t=t-1; goto Kreuz; /* rekursiver Aufruf */
    Dollar: /* R_cksprung aus dem rekursiven Aufruf */
            if (t==n) goto Fertig;
            /* Rotieren: Perm1[0] <- Perm1[1] <- ... <- Perm1[n-1] <- Perm1[0] */
            { register int Perm0; register int i;
              Perm0=Perm1[0];
              for (i=0; i<t; i++) {Perm1[i]=Perm1[i+1];};
              Perm1[t]=Perm0;
            };
            if (--Zaehl[t]) goto Kreuz;
            t=t+1; goto Dollar;

    standardroutine:
      PermCopy(Perm1,Perm,n); /* Perm := Perm1 */
      { int Spiegelungsanzahl;
        Spiegelungsanzahl=0;
        { unsigned int k;
          while (!((k=Perm[0]) == 0))
           {/* Spiegle Perm[0..k] */
            unsigned int k2=(k+1)/2;
            register int *up = &Perm[0]; register int *down = &Perm[k];
            { register int i;
              i=k2; while (i) {int h; h=*up; *up++=*down; *down--=h; i--;};
            }
            Spiegelungsanzahl++;
           };
        };
        if (Spiegelungsanzahl>BishMax)
          {BishMax=Spiegelungsanzahl; PermCopy(Perm1,PermMax,n);};
      }
      goto Dollar;
  }
  Fertig:
/*
  printf("Das Maximum betrug %d.\n bei ",BishMax);
          {register unsigned int i;
           printf("(");
           for (i=0; i<n; i++)
            {if (i>0) printf(" ");
             printf("%d",PermMax[i]+1);
            };
           printf(")");
          };
          printf("\n");
*/
  Ende: ;
  printf ("%d\n", clock() - start);
}
=== fannkuch.bet ===

ORIGIN '~beta/basiclib/v1.4/betaenv'
--program:descriptor--
(#
   PermLength: (# exit 100 #);
   Perm, Perm1, PermMax, Zaehl: [PermLength]@integer;
   h, i, k, n, t, up, down, BishMax, Spiegelungsanzahl: @integer;
do
   'n = ?' -> putText;
   getInt -> n;
   (if (n < 1) or (n > PermLength) then stop if);
   
   -1 -> BishMax;
   (for i:n repeat i-1 -> Perm1[i] for);
   n -> t;
   
   again: 
     (# 
     do (for i:t repeat i -> Zaehl[i] for); 1 -> t;
        (for i:n repeat Perm1[i] -> Perm[i] for);
        0 -> Spiegelungsanzahl;
        
        while1: 
          (# 
          do (if Perm[1]->k // 0 then leave while1 if);
             1 -> up; k+1 -> down; down/2 -> i;
             while2:
               (# 
               do (if i // 0 then leave while2 if);
                  Perm[up] -> h; Perm[down] -> Perm[up]; h -> Perm[down];
                  up+1 -> up; down-1 -> down; i-1 -> i;
                  restart while2;
               #);
             Spiegelungsanzahl+1 -> Spiegelungsanzahl;
             restart while1;
          #);

        (if Spiegelungsanzahl > BishMax then
            Spiegelungsanzahl -> BishMax; 
            (for i:n repeat Perm1[i] -> PermMax[i] for)
        if);
        
        while3:
          (# 
          do (if t // n then leave while3 if);
             Perm1[1] -> h; 
             (for i:t repeat Perm1[i+1] -> Perm1[i] for);
             h -> Perm1[t+1];
             (if (Zaehl[t+1]-1 -> Zaehl[t+1]) <> 0 then restart again if);
             t+1 -> t;
             restart while3;
          #);
     #);
   
   'Das Maximum betrug ' -> putText; 
   BishMax -> putInt; 
   '.\n bei (' -> putText;
   (for i:n repeat 
        (if i > 1 then ' ' -> put if); 
        PermMax[i]+1 -> putInt; 
   for);
   ')' -> putLine;
#)
||#
--
Ken Anderson 
Internet: ·········@bbn.com
BBN ST               Work Phone: 617-873-3160
10 Moulton St.       Home Phone: 617-643-0157
Mail Stop 6/4a              FAX: 617-873-2794
Cambridge MA 02138
USA
From: William D. Gooch
Subject: Re: Fannkuch revisited or Benchmarking is hard
Date: 
Message-ID: <Pine.A32.3.90.940923144449.42373c-100000@swim5.eng.sematech.org>
Excellent work, Ken!  Thanks for taking the time to do this.