From: Christian Pietsch
Subject: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <ce9b35$6jir0$1@hades.rz.uni-saarland.de>
Hi Lisp hackers,

after installing a couple of Lisp implementations on my new Linux box,
I could not resist running Eric Marsden's benchmark collection on
them.  These include the classic Gabriel benchmarks as well as
``mathematical functions, bignum-intensive operations, CLOS test,
hashtable exercising, read-line exercising, and various operations of
arrays, strings and bitvectors,'' if I may quote from his site:
http://purl.org/net/emarsden/home/downloads

You can find my results at http://www.cliki.net/Performance%20Benchmarks
but I include them here as well in case somebody wants to discuss them.
Sorry for the long lines.  These figures are by no means scientific
measurements.  Only the reference column shows absolute times; all
other columns are ratios relative to it.
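(For example, SBCL's BOYER entry of 0.86 means roughly 0.86 * 5.51 s,
i.e. about 4.7 seconds.)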


,---- dual Pentium 4, 3.0 GHz, 1000 MB RAM, 512 KB cache ---
                                                                                                         
Benchmark               Reference  CMUCL CMUCL  SBCL     ACL  GNUCL   ECL-S  CLISP Poplog ABCL/CVS ABCL/CVS
                        CMUCL 18e    19a 19aP4 0.8.13    5.0  2.6.3    0.9d 2.33.2  15.53 JDK1.4.2 JDK1.5b2
-----------------------------------------------------------------------------------------------------------
COMPILER                 [  1.98]   1.15  1.05  1.49    0.90   0.08    3.50   0.69   0.01    -1.00    -1.00 
LOAD-FASL                [  0.18]   1.16  1.12  2.79    1.19   0.45    2.61   2.19  11.14    -1.00    -1.00 
SUM-PERMUTATIONS         [  2.05]   1.13  1.11  1.52    1.52   1.24   -1.00   1.51   1.63    10.34     8.90 
WALK-LIST/SEQ            [  0.03]   1.00  0.96  1.00    1.07   1.43    4.29   1.00  10.36     3.89     3.79 
WALK-LIST/MESS           [  0.16]   1.31  0.47  1.82   -1.00   2.06    0.75   1.57  -1.00    -1.00    -1.00 
BOYER                    [  5.51]   0.99  0.97  0.86    1.05   2.86    4.23  12.43   7.73     1.39     0.92 
BROWSE                   [  0.58]   1.04  0.97  0.96    1.33  12.85    2.50   6.27   1.74     5.61     4.25 
DDERIV                   [  0.56]   0.97  0.98  0.91    1.11   3.97    6.97  12.55   2.30     7.90     6.01 
DERIV                    [  0.62]   1.02  1.01  0.93    0.41   3.65    6.37  12.88   2.11     6.97     5.06 
DESTRUCTIVE              [  0.51]   0.97  1.00  0.89    0.33   3.66    3.72   7.75   2.39     8.74     5.11 
DIV2-TEST-1              [  1.30]   0.97  0.94  0.71    0.18   2.66    4.54  10.66   1.64     4.24     2.42 
DIV2-TEST-2              [  1.41]   0.96  0.94  0.84    0.33   2.89    4.42  10.49   1.34     1.56     0.65 
FFT                      [  0.06]   1.03  1.02  0.87    1.09  55.00   42.03  65.58 250.16    63.53    20.61 
FRPOLY/FIXNUM            [  0.54]   0.99  1.00  0.87    2.00   2.52    4.69  12.59   3.04     4.68     6.12 
FRPOLY/BIGNUM            [  0.62]   0.98  1.02  0.84    2.20   1.68    2.84   7.81   1.60     1.98     1.59 
FRPOLY/FLOAT             [  0.94]   0.88  0.90  0.76    2.76   1.35    2.70  10.75   2.19     1.65     1.35 
PUZZLE                   [  0.17]   0.99  1.00  1.10    1.90   6.21   46.26  71.59  14.14   112.05    62.28 
TAK                      [  0.36]   1.02  1.00  1.00    0.78   1.71    7.17  13.94   2.97    10.51     6.17 
CTAK                     [  0.33]   1.20  1.19  0.94    3.34   4.53    6.90  11.24  25.02  1424.99  1292.46 
TRTAK                    [  0.36]   1.01  0.99  0.99    0.78   1.67    7.13  13.64   2.95     6.65     3.38 
TAKL                     [  0.37]   1.05  1.00  0.99    0.98   0.87    4.78  20.55   5.03     5.91     4.38 
STAK                     [  0.47]   1.12  0.99  1.05    2.54   1.67   -1.00   7.64   7.34    17.38    10.50 
FPRINT/UGLY              [  0.68]   1.08  1.06  1.63   13.72   1.23    1.95   1.70   8.58    30.99    29.87 
FPRINT/PRETTY            [  2.23]   0.95  0.94  1.37    4.40   0.39    1.22   3.34   6.77   224.69   190.60 
TRAVERSE                 [  0.80]   1.00  1.00  4.21    0.64   2.23    7.21  14.10   5.29    22.50    13.48 
TRIANGLE                 [  0.60]   1.04  1.07  1.01    0.68   2.09   18.39  30.20  10.58    16.32    14.10 
RICHARDS                 [  0.46]   1.05  1.03  1.01    3.21   1.19   10.72  30.24  12.08    24.88    26.11 
FACTORIAL                [  0.33]   0.98  0.97  1.33    3.03   9.79    4.71  20.27   2.40     3.43     2.28 
FIB                      [  0.44]   1.01  1.00  1.01    0.27   4.77    2.24   6.44   1.10     6.00     2.33 
FIB-RATIO                [  0.34]   0.99  0.99  1.01   15.95  30.55    0.99   0.11   2.94     3.47     1.94 
ACKERMANN                [  5.63]   1.00  1.01  1.00    0.64  22.28    2.06   6.69   1.92     4.47     2.06 
MANDELBROT/COMPLEX       [  9.13]   0.97  0.97  1.01    5.36   1.93    2.24  10.58   0.77     0.68     0.34 
MANDELBROT/DFLOAT        [  5.26]   0.99  0.98  0.96    5.55   2.51    3.42  13.09   1.50     1.52     0.78 
MRG32K3A                 [  0.78]   1.00  1.00  1.03  183.82   5.45    7.13 118.85   7.38    11.35     7.91 
CRC40                    [ 19.30]   1.01  1.11  0.95    4.27  13.52    8.91  12.63   2.26     4.27     2.31 
BIGNUM/ELEM-100-1000     [  0.50]   1.01  1.02  1.17    1.88   0.77    0.22   0.10   2.40     1.54     1.52 
BIGNUM/ELEM-1000-100     [  2.74]   1.01  1.00  1.40    1.68   0.92    0.05   0.07   3.08     0.43     0.38 
BIGNUM/ELEM-10000-1      [  3.00]   1.01  1.00  1.38    4.80   0.79    0.05   0.06  19.89     2.02     1.77 
BIGNUM/PARI-100-10       [  1.28]   0.50  0.49  0.69    0.07   0.02    0.02   0.01   0.02     0.27     0.24 
BIGNUM/PARI-200-5        [ 15.90]   0.49  0.49  0.76    0.02   0.01    0.00   0.00   0.01     0.03     0.03 
PI-DECIMAL/SMALL         [ 25.10]   1.01  1.00  1.42   15.31   8.44    0.27   0.24   3.31     1.91     1.83 
PI-DECIMAL/BIG           [ 53.32]   1.00  1.00  1.55   15.74   6.07   -1.00   0.05   3.84    -1.00    -1.00 
PI-ATAN                  [  1.38]   1.01  0.99  1.29    2.98   5.80   -1.00   6.89   1.16    -1.00    -1.00 
PI-RATIOS                [  4.51]   0.99  1.00  1.11    5.42   3.91    0.58   0.33   2.01     1.70     1.60 
SLURP-LINES              [  0.89]   0.96  0.98  1.02    7.09   2.85    1.63   4.69   3.00    33.60    30.80 
HASH-STRINGS             [  0.47]   0.84  0.78  0.81 4218.26 251.42   50.04   5.81  35.71    78.09    70.22 
HASH-INTEGERS            [  0.84]   0.99  0.94  1.21    4.89   0.58    1.49   6.31  11.79    12.11    10.33 
BOEHM-GC                 [  1.89]   1.01  0.97  1.16    3.82   3.89    7.87  18.34   2.22    66.31    64.75 
DEFLATE-FILE             [  0.45]   1.30  1.30  1.22   24.97   3.27    9.55   8.87   7.80    25.86    20.27 
1D-ARRAYS                [  0.07]   1.01  1.03  1.49    8.22   4.11    6.30  12.14  18.63     4.81     3.08 
2D-ARRAYS                [  0.87]   1.01  1.01  0.63    9.77  22.62   21.31  26.39  14.16    66.20    45.94 
3D-ARRAYS                [  2.91]   1.00  1.00  0.77    7.62   8.91   16.19  16.69   7.88    -1.00    -1.00 
BITVECTORS               [  0.63]   0.97  0.95  1.60    1.74  10.16   13.12  33.12  -1.00  2233.05   868.14 
BENCH-STRINGS            [  2.54]   1.02  0.99  0.14   16.87   4.30    5.78   0.47  33.55     9.39     4.20 
fill-strings/adjustable  [ 19.24]   1.01  1.00  1.00   -1.00   0.90   15.19   4.62 104.94     1.23     0.94 
STRING-CONCAT            [ 55.08]   1.07  1.03  0.50   -1.00  26.24   -1.00  10.29  -1.00    20.64    19.87 
SEARCH-SEQUENCE          [  3.03]   1.00  1.00  0.05    4.07   2.28    3.29   3.58   3.61    11.59     9.53 
CLOS/defclass            [  3.46]   1.08  1.07  0.63    0.11  -1.00    0.47   0.12   0.02     4.74     4.30 
CLOS/defmethod           [  7.27]   1.03  1.05  0.85    0.05  -1.00    0.17   0.01   0.00     1.55     1.41 
CLOS/instantiate         [  8.03]   1.13  1.09  0.98    0.86  -1.00  281.76   1.36   1.55    30.37    28.01 
CLOS/simple-instantiate  [  0.33]   0.99  0.99  0.96    0.57  -1.00   51.78  18.96 112.65   687.92   605.75 
CLOS/methodcalls         [  1.61]   0.97  0.97  1.23    3.01  -1.00   50.49  10.92   9.61    95.39    84.53 
CLOS/complex-methods     [  0.06]   1.00  0.97  7.44    5.76  -1.00 1963.22  -1.00  -1.00  1746.36  1510.32 
EQL-SPECIALIZED-FIB      [  0.32]   1.02  1.01  1.06   10.87  -1.00    7.16   5.30   9.06 11985.26 10777.13 

Reference time in first column is in seconds; other columns are relative
Reference implementation: CMU Common Lisp 18e CVS Head 2003-09-14 16:41:12 (binary release)
Impl CMUCL:  CMU Common Lisp 19a-pre3 (binary release from ftp.linux.org.uk/pub/lisp/cmucl)
Impl CMUCL:  CMU Common Lisp 19a-pre3 Pentium4 (built with CFLAGS="-march=pentium4")
Impl SBCL:   SBCL 0.8.13 (bootstrapped with CMUCL 18e)
Impl ACL 5:  Allegro CL Trial Edition 5.0 [Linux/X86] (no heap limit! binary only:)
Impl GNUCL:  Kyoto Common Lisp GCL 2.6.3 (built with CFLAGS="-march=pentium4")
Impl ECL-S:  ECL 0.9d CVS Head 2004-07-27 (aka ECL Spain, CFLAGS="-march=pentium4")
Impl CLISP:  CLISP 2.33.2 (2004-06-02) (not optimized for pentium4 because of make failure)
Impl Poplog: Sussex Poplog 15.53e Common Lisp 2.0 (using "easy to install version" haha)
Impl ABCL14: Armed Bear Common Lisp 0.0.3.16+ CVS Head 2004-07-26 / Sun Java 1.4.2-b28
Impl ABCL15: Armed Bear Common Lisp 0.0.3.16+ CVS Head 2004-07-26 / Sun Java 1.5.0-beta2-b51
Java invocation for ABCL: java -server -Xss64M -Xmx256M -Xrs
=== Test machine ===
   Machine-type: X86
   Machine-version: Dual Intel Pentium 4 at 3.00GHz, 1000 MB RAM, 512 KB cache
   Linux 2.4.21-199-smp4G i686 i386 GNU/Linux (SuSE 9.0)
(declaim (optimize (speed 3) (space 1) (safety 0) (debug 0) (compilation-speed 0)))
`----

What this tells me as a Lisp non-expert is that building your own CMUCL
does not always increase performance (except for WALK-LIST/MESS).  Also,
it confirms my observation from Xalan performance that Java 1.5 is
about 10% faster than Java 1.4.2 (except for PUZZLE, where the new Java
is twice as fast, and FFT, where it is three times as fast -- they
must have sped up floating-point operations).

It's interesting to see how some implementations excel at some tasks,
and other ones at other tasks.  May I naively ask why nobody combines
the best routines into a high-performance Lisp?  I'm sure somebody is
trying to do this right now.

When you see a -1 value, it means that the Lisp either aborted or seemed
to take ``forever.''  I told you I didn't do scientific measurements! :-)

Let's see what I can remember...

Allegro CL 5.0 aborted WALK-LIST/MESS because of a stack overflow.  I
checked ulimit -s, which said unlimited.  Then I played around with
set-stack-cushion -- no success.  Can anyone explain the extremely bad
performance of ACL for HASH-STRINGS?  I've never had any problems using
large hashes of strings with ACL.  This holds for ACL 4.3
on Solaris and the ACL 5.0 trial edition on Linux (which is not crippled
like later ACL 6.x trial editions -- it just lacks CLIM, which you can
compensate for with McCLIM -- thanks, Franz:).

Armed Bear Common Lisp also had a stack overflow in WALK-LIST/MESS. I
increased the maximum stack size to ridiculous values of up to
256 MB without success.  Although Armed Bear CL does not look like a
sprinter, it does a great job as a scripting language for the editor J
it comes with.  J feels like an Emacs made today, and you don't need a
gigahertz machine to run it comfortably.  Of course, it has a nice Lisp
mode, and you can run an ``inferior'' Armed Bear CL by typing ALT-x lisp.

There were also some run-time errors with ECL-S which I denote as -1.

CLOS would not work with GCL but I guess I made some mistake.  The
shell scripts for GCL and Armed Bear Common Lisp that came with the
benchmark collection required quite a bit of fiddling.  STRING-CONCAT
uses write-sequence/2, which can't be found in GCL, so I dropped in a
piece of code I found using a search engine at
http://www.cormanlisp.com/CormanLisp/patches/1_5/streams.lisp .
Strictly speaking, that value of 26.24 should therefore read -1.  Poplog also has no
write-sequence/2, and it refused to load the definition I used for
GCL, so it got -1.  I would like to have given it a -2 for all the
hassle I had setting it up.  That must be the reason why it is only
used at 3 or 4 sites.  Otherwise it has nice features integrating
Prolog, Pop11 and Standard ML besides a reasonably fast CLtL2 Lisp.
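
A minimal sketch of the kind of fallback definition I mean (this is
from memory and not the actual Corman Lisp code; it only handles the
character-stream case):

    (unless (fboundp 'write-sequence)
      (defun write-sequence (sequence stream &key (start 0) end)
        ;; Hedged stopgap: write the elements of SEQUENCE to a character
        ;; STREAM, the way ANSI WRITE-SEQUENCE would.
        (let ((end (or end (length sequence))))
          (if (stringp sequence)
              (write-string sequence stream :start start :end end)
              (loop for i from start below end
                    do (write-char (elt sequence i) stream))))
        sequence))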

I know that a lot of you folks are emotionally involved with one or
the other Lisp implementation.  If your favourite Lisp does not look
good in my results, please don't blame me -- I'm still a beginner at
Lisp, and I have treated each implementation with the same amount of
ignorance.  Of course I do have an implementation of choice, Allegro,
but I'm looking for a free (libre) alternative.  Because I am considering
porting a large CLtL1-2 application that makes heavy use of Screamer,
I would like to see a Screamer benchmark included in Eric's
collection.  Since Screamer has become a part of CLOCC, it should be
portable enough now.

Comments welcome.
Christian

-- 
  Christian Pietsch
  http://www.interling.de

From: Pascal Costanza
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <cea81f$2ao$1@newsreader2.netcologne.de>
Christian Pietsch wrote:

> It's interesting to see how some implementations excel at some tasks,
> and other ones at other tasks.  May I naively ask why nobody combines
> the best routines into a high-performance Lisp?  I'm sure somebody is
> trying to do this right now.

I don't think so. The performance of an implementation is largely an 
emergent property resulting from a combination of many influences. You 
can't just throw together the best results of different implementations. 
Do you know the butterfly effect?

Furthermore, benchmarks usually don't measure what is important in real 
applications, at least not in general. This makes matters even more complex.


Pascal

-- 
Tyler: "How's that working out for you?"
Jack: "Great."
Tyler: "Keep it up, then."
From: Christian Pietsch
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <ceafvn$6m7m2$1@hades.rz.uni-saarland.de>
In <························@lon-reader.news.telstra.net>, Mike Thomas wrote:
>> CLOS would not work with GCL but I guess I made some mistake.  The
>> shell scripts for GCL and Armed Bear Common Lisp that came with the
>> benchmark collection required quite a bit of fiddling.  STRING-CONCAT
>> uses write-sequence/2 which can't be found in GCL.
> 
> You probably used a CLtL1 build of GCL.  To get an ANSI build, configure
> like this:
> 
>      configure --enable-ansi (+ whatever other arguments you want)

That's what I did.  The tests were run with GCL in ANSI mode.  Mike
tells me in a PM that enthusiasts on the GCL developers list are
currently trying to figure out what's going on.

As an aside to Pascal: I have heard about the butterfly effect as a
popular way of explaining chaos theory.  Do you consider a Lisp
implementation a chaotic system?  Anyway, I am fully aware of the fact
that speed is not the only interesting aspect of a Common Lisp
implementation. :-)

Cheers,
Christian

-- 
  Christian Pietsch
  http://www.interling.de
From: Pascal Costanza
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <ceagas$hag$1@f1node01.rhrz.uni-bonn.de>
Christian Pietsch wrote:

> As an aside to Pascal: I have heard about the butterfly effect as a
> popular way of explaining chaos theory.  Do you consider a Lisp
> implementation a chaotic system?

Any sufficiently complicated C or Fortran program contains an ad-hoc, 
informally-specified bug-ridden slow implementation of half of a chaotic 
system. ;)


Pascal

-- 
Pascal Costanza               University of Bonn
···············@web.de        Institute of Computer Science III
http://www.pascalcostanza.de  Römerstr. 164, D-53117 Bonn (Germany)
From: Christophe Rhodes
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <sqvfg782fj.fsf@cam.ac.uk>
Christian Pietsch <·······@interling.de> writes:

> As an aside to Pascal: I have heard about the butterfly effect as a
> popular way of explaining chaos theory.  Do you consider a Lisp
> implementation a chaotic system?

"chaotic" in the sense of slight changes in initial conditions giving
vastly different outcomes over time, possibly not.  However, the
system can appear chaotic in the sense of complexity theory, in that
the observable outcomes (speeds of programs) are affected by large
numbers of underlying factors (implementation techniques, hardware
cache sizes, OS paging behaviour...) such that it becomes a nonsense
to talk of amalgamating all the fastest routines.

To take a typical example, consider the representation of NIL.  Since
NIL is both a symbol and a list (and is the only such object), there
are decisions to be made on its machine representation.  For example,
one could give NIL a representation that matches the Lisp symbol
representation, but ensure that the first two slots in the symbol
layout are such that NIL can safely be stored there, so that CAR and
CDR work (SBCL and CMUCL do essentially this).  One can give NIL a
completely distinguished representation, and store all sorts of
interesting information in this unique object, such as pointers to
per-thread storage areas -- this, of course, complicates the
implementation of SYMBOLP (Movitz does this).  Or, one can give NIL a
list-like representation, but ensure that there is enough space around
it to include all the necessary slots for it to look like a symbol
(which complicates the implementation of RPLACA and RPLACD, unless the
Operating System can be convinced to turn off write permission to the
memory where NIL resides).  There are probably other strategies I
haven't thought of; and each of the strategies contains several
subchoices, which often have different answers per-architecture, based
on possible addressing modes, availability of register renaming, and
so on.
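
(Whichever representation is chosen, the observable contract is the
same; at any conforming REPL:

    (car nil)          ; => NIL
    (cdr nil)          ; => NIL
    (symbolp nil)      ; => T
    (listp nil)        ; => T
    (symbol-name nil)  ; => "NIL"

The strategies above differ only in how cheaply each of these questions
can be answered.)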

Each of these choices gives different performance characteristics to
various operators, which will have knock-on effects to the performance
characteristics of larger programs.  There are many such places where
choices have to be made in Common Lisp implementations.  Given all of
this, I hope you realise that it's not quite as simple as identifying
one 'routine' that will obviously improve all implementations out
there.

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Svein Ove Aas
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <cec8hb$lqn$1@services.kq.no>
Christophe Rhodes wrote:

> One can give NIL a
> completely distinguished representation, and store all sorts of
> interesting information in this unique object, such as pointers to
> per-thread storage areas -- this, of course, complicates the
> implementation of SYMBOLP (Movitz does this).  

I find this concept to be very interesting; I take it it does that for
performance reasons? What I mean is, most functions that allocate space
will have a nil or two laying around, so dereferencing that means less
time spent chasing pointers and/or fewer registers used?

I'm afraid I don't understand this half as well as I should; I'm trying to
read through the Movitz code, but it's hard going, especially since I
don't have a functional CL implementation around these days.

On that note, does anyone have a pointer to a (free!) amd64-based Lisp I
could use? I could probably get sbcl to work in 32-bit mode, but it seems
like such a waste of good registers, and keeping two sets of libraries
around is anything but easy.
From: Pascal Bourguignon
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <877jsm5kc2.fsf@thalassa.informatimago.com>
Svein Ove Aas <··············@brage.info> writes:

> Christophe Rhodes wrote:
> 
> > One can give NIL a
> > completely distinguished representation, and store all sorts of
> > interesting information in this unique object, such as pointers to
> > per-thread storage areas -- this, of course, complicates the
> > implementation of SYMBOLP (Movitz does this).  
> 
> I find this concept to be very interesting; I take it it does that for
> performance reasons? What I mean is, most functions that allocate space
> will have a nil or two laying around, so dereferencing that means less
> time spent chasing pointers and/or fewer registers used?

No, that's just a gross kludge that poses all sorts of problems.  I bet
one day they'll have a major version rewrite scrapping that.


> I'm afraid I don't understand this half as well as I should; I'm trying to
> read through the Movitz code, but it's hard going, especially since I
> don't have a functional CL implementation around these days.
> 
> On that note, does anyone have a pointer to a (free!) amd64-based Lisp I
> could use? I could probably get sbcl to work in 32-bit mode, but it seems
> like such a waste of good registers, and keeping two sets of libraries
> around is anything but easy.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

There is no worse tyranny than to force a man to pay for what he does not
want merely because you think it would be good for him. -- Robert Heinlein
From: Frode Vatvedt Fjeld
Subject: NIL implementation designs
Date: 
Message-ID: <2h7jslu6sc.fsf_-_@vserver.cs.uit.no>
Christophe Rhodes wrote:

>> > One can give NIL a completely distinguished representation, and
>> > store all sorts of interesting information in this unique object,
>> > such as pointers to per-thread storage areas -- this, of course,
>> > complicates the implementation of SYMBOLP (Movitz does this).

Svein Ove Aas <··············@brage.info> writes:

>> I find this concept to be very interesting; I take it it does that
>> for performance reasons? What I mean is, most functions that
>> allocate space will have a nil or two laying around, so
>> dereferencing that means less time spent chasing pointers and/or
>> fewer registers used?

Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> No, that's just a gross kludge that poses all sorts of problems. I
> bet one day they'll have a major version rewrite scrapping that.

Well, Christophe's account of NIL isn't entirely accurate with regard
to Movitz. The NIL value is a system-wide constant (currently #x6d)
that is pretty much always located in one designated register. This
value is also congruent with cons cells (modulo the word width of 4
octets, or 32 bits), so storing NIL at the car and cdr offsets
from #x6d ensures that the car and cdr operations work with no
overhead (this was my highest priority in this NIL design).

Since NIL is congruent with the cons low-tag, it cannot also be
congruent with the symbol low-tag.  Still, it is quite a simple trick
in (x86) assembly to implement symbolp with zero overhead.  However,
the symbol accessors (such as symbol-name) are somewhat more of an
issue, because NIL and symbols not being congruent modulo 4 means that
dereferencing them with the same offset would yield unaligned memory
references (which work on x86 but are highly undesirable for
performance and other reasons) in one of the cases.  However, it is
possible to compute a good offset in about 2-3 CPU instructions,
without branching, so I don't really consider this a problem.  (In
fact, I don't believe any of the symbol accessors are that important
speed-wise anyway, except the symbol-function access that is done on
every function call etc., but then you don't need NIL to work
anyway.)

Additionally, the memory locations around #x6d are used for some
global constants so that they (given the x86 instruction encoding) can
be referenced efficiently. And, further, some of these memory
locations are denoted as thread-local, and accesses to such locations
are performed indirectly also through the FS segment selector. In
other words, the FS segment selector is used to switch between
thread-local storages (although there are no actual threads in Movitz
just yet).  All of this works quite well, and I foresee no scrapping
of this design.

If anyone reading this is worrying that one shouldn't overwrite the
first few kilobytes of memory (i.e. around #x6d) on the PC
architecture, I can inform you that the default segment selector is
set up to start at one megabyte, such that e.g. location #x6d is in
fact the physical location #x10006d. One reason for this is that it is
in fact helpful (in terms of code-size, and therefore i-cache
performance) under the x86 instruction architecture to have access to
a small (6-7 bits), well-known value in a register, for generating
(smallish) constant values. On a register-starved architecture, one
must try to make the most of everything.

So, there are indeed many considerations in a NIL design :-)

Svein Ove Aas <··············@brage.info> writes:

>> I'm afraid I don't understand this half as well as I should; I'm
>> trying to read through the Movitz code, but it's hard going,
>> especially since I don't have a functional CL implementation around
>> these days.

I seem to remember you said you're a student here in Tromsø? You are
of course welcome to stop by my lab or office anytime for a chat. Or,
like everyone, to ask questions on the movitz-devel mailing list.

-- 
Frode Vatvedt Fjeld
From: Adam Warner
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <pan.2004.07.31.03.23.43.178729@consulting.net.nz>
Hi Svein Ove Aas,

> On that note, does anyone have a pointer to a (free!) amd64-based Lisp I
> could use? I could probably get sbcl to work in 32-bit mode, but it seems
> like such a waste of good registers, and keeping two sets of libraries
> around is anything but easy.

ArmedBear Common Lisp. Just add an AMD64 JVM (for example Sun's 1.5.0
beta: <http://java.sun.com/j2se/1.5.0/install-linux-64.html>)

GNU GCL is another choice. It's compiled for Debian Pure64. I'm yet to
find any documentation on the differences when it is compiled for AMD64.
I do know MOST-POSITIVE-FIXNUM remains unchanged and if you're expecting
bignum arithmetic to be faster in AMD64 mode you may be disappointed (I
found a simple sum of fixnums (where the sum overflows to a bignum) to be
marginally slower and I suspect it's because the bignum is now an object
with a 64-bit pointer, which puts more strain upon memory bandwidth).

Regards,
Adam
From: Svein Ove Aas
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <ceg24t$59q$1@services.kq.no>
Adam Warner wrote:

> Hi Svein Ove Aas,
> 
>> On that note, does anyone have a pointer to a (free!) amd64-based Lisp
>> I could use? I could probably get sbcl to work in 32-bit mode, but it
>> seems like such a waste of good registers, and keeping two sets of
>> libraries around is anything but easy.
> 
> ArmedBear Common Lisp. Just add an AMD64 JVM (for example Sun's 1.5.0
> beta: <http://java.sun.com/j2se/1.5.0/install-linux-64.html>)
> 
I'll go take a look.

> GNU GCL is another choice. It's compiled for Debian Pure64. I'm yet to
> find any documentation on the differences when it is compiled for AMD64.
>
Pure64... amd64... aren't those two names for the same thing?

> I do know MOST-POSITIVE-FIXNUM remains unchanged and if you're expecting
> bignum arithmetic to be faster in AMD64 mode you may be disappointed (I
> found a simple sum of fixnums (where the sum overflows to a bignum) to
> be marginally slower and I suspect it's because the bignum is now an
> object with a 64-bit pointer, which puts more strain upon memory
> bandwidth).
> 
Not quite correct; if you read /proc/cpuinfo, you'll find that it doesn't
use 64-bit addresses. In my case, it's 48-bit virtual addresses (and
40-bit physical); it probably doesn't matter, but there is a slight
possibility that the cpu optimizes by not transferring the full 64 bits.
From: Adam Warner
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <pan.2004.07.31.12.37.09.287871@consulting.net.nz>
Hi Svein Ove Aas,

>> GNU GCL is another choice. It's compiled for Debian Pure64. I'm yet to
>> find any documentation on the differences when it is compiled for AMD64.
>>
> Pure64... amd64... aren't those two names for the same thing?

Debian's Pure64 only runs AMD64 binaries. It requires a compatibility
package or a chroot environment to run 32-bit binaries. I meant the
differences when GCL is compiled for AMD64 compared to IA32.

>> I do know MOST-POSITIVE-FIXNUM remains unchanged and if you're expecting
>> bignum arithmetic to be faster in AMD64 mode you may be disappointed (I
>> found a simple sum of fixnums (where the sum overflows to a bignum) to
>> be marginally slower and I suspect it's because the bignum is now an
>> object with a 64-bit pointer, which puts more strain upon memory
>> bandwidth).
>> 
> Not quite correct; if you read /proc/cpuinfo, you'll find that it doesn't
> use 64-bit addresses. In my case, it's 48-bit virtual addresses (and
> 40-bit physical); it probably doesn't matter, but there is a slight
> possibility that the cpu optimizes by not transferring the full 64 bits.

Interesting observation, thanks.

Regards,
Adam
From: Svein Ove Aas
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <ceo480$eon$1@services.kq.no>
Adam Warner wrote:

>>> I do know MOST-POSITIVE-FIXNUM remains unchanged and if you're
>>> expecting bignum arithmetic to be faster in AMD64 mode you may be
>>> disappointed (I found a simple sum of fixnums (where the sum overflows
>>> to a bignum) to be marginally slower and I suspect it's because the
>>> bignum is now an object with a 64-bit pointer, which puts more strain
>>> upon memory bandwidth).
>>> 
>> Not quite correct; if you read /proc/cpuinfo, you'll find that it
>> doesn't use 64-bit addresses. In my case, it's 48-bit virtual addresses
>> (and 40-bit physical); it probably doesn't matter, but there is a
>> slight possibility that the cpu optimizes by not transferring the full
>> 64 bits.
> 
> Interesting observation, thanks.
> 
One thing I missed at the time, which has turned out to be useful later:
A pointer is still 64 bits; you have to set the upper (16, at this point)
bits to all 0 or all 1 to use it, but you can certainly store information
there if you 'and' or 'or' it into submission before using it.

As such, maybe that would be a good place to store tags?
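
Just to make the idea concrete, the bit fiddling would look something
like this (illustrative Lisp only; the 48/16 split is just how current
chips happen to behave, and the tag width is an arbitrary choice):

    ;; Pack a small tag into the top 16 bits of a 64-bit word, and strip
    ;; it again (re-extending bit 47) before the word is used as an address.
    (defun pack-tag (address tag)
      (logior (ldb (byte 48 0) address) (ash tag 48)))

    (defun word-tag (word)
      (ldb (byte 16 48) word))

    (defun canonical-address (word)
      (let ((low (ldb (byte 48 0) word)))
        (if (logbitp 47 low)
            (logior low (ash #xffff 48))   ; upper bits all 1
            low)))                         ; upper bits all 0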
From: Brian Downing
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <HhPPc.71589$8_6.2003@attbi_s04>
In article <············@services.kq.no>,
Svein Ove Aas  <··············@brage.info> wrote:
> One thing I missed at the time, which has turned out to be useful later:
> A pointer is still 64 bits; you have to set the upper (16, at this point)
> bits to all 0 or all 1 to use it, but you can certainly store information
> there if you 'and' or 'or' it into submission before using it.
> 
> As such, maybe that would be a good place to store tags?

I believe history shows that subverting the architecture specification
like this usually bites you in the ass in the end.

For example, look at the 32-bit cleanliness issues the Macintosh world
went through after some overly-clever developers decided that the
"unused" high address bits in the 68000 would be a good place to store
stuff.

Avoiding issues like that is the very reason AMD specified that all the
unused bits must be 0 or 1.

-bcd
From: Svein Ove Aas
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <ceoikl$cav$1@services.kq.no>
Brian Downing wrote:

> In article <············@services.kq.no>,
> Svein Ove Aas  <··············@brage.info> wrote:
>> One thing I missed at the time, which has turned out to be useful
>> later: A pointer is still 64 bits; you have to set the upper (16, at
>> this point) bits to all 0 or all 1 to use it, but you can certainly
>> store information there if you 'and' or 'or' it into submission before
>> using it.
>> 
>> As such, maybe that would be a good place to store tags?
> 
> I believe history shows that subverting the architecture specification
> like this usually bites you in the ass in the end.
> 
> For example, look at the 32-bit cleanliness issues the Macintosh world
> went through after some overly-clever developers decided that the
> "unused" high address bits in the 68000 would be a good place to store
> stuff.
> 
> Avoiding issues like that is the very reason AMD specified that all the
> unused bits must be 0 or 1.
> 
Right you are.

For some reason I thought it was specified to *never* use more than 48
bits of virtual address space, but after checking that turns out to be
false - and they even give the same justification for requiring canonical
addresses as you do.

Well, I'll just have to do this the traditional way.
From: Don Geddis
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <87ekmur8jy.fsf@sidious.geddis.org>
> Christian Pietsch wrote:
>> It's interesting to see how some implementations excel at some tasks,
>> and other ones at other tasks.  May I naively ask why nobody combines
>> the best routines into a high-performance Lisp?

Pascal Costanza <········@web.de> wrote on Thu, 29 Jul 2004:
> I don't think so. The performance of an implementation is largely an emergent
> property resulting from a combination of many influences. You can't just
> throw together the best results of different implementations.

Moreover, sometimes it really is a matter of choice.  You can optimize either
case A or case B, but not both.  Do you use lists or hash tables for an
association map?  Lists are faster if the maps are small.  Hash tables have
a higher fixed overhead, but better scaling as the maps grow larger.

Neither choice is "better" than the other.  If one implementation chose one
approach, and another implementation the other approach, then you'd find
results much like your benchmarks, where one is better on some benchmarks
and the other on different benchmarks.

But in this case, it doesn't make any sense to "combine the best of both".
You either implement the map as a list or as a hash table.  There's no
automatic way to get list-like performance on the small cases while also
getting hash-table-like performance on the large cases.
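
To make the tradeoff concrete, here are the two representations in
plain Lisp (textbook versions, nothing implementation-specific):

    ;; Association list: trivial to build, lookup is a linear scan.
    (defvar *alist* '((:red . 1) (:green . 2) (:blue . 3)))
    (cdr (assoc :green *alist*))          ; => 2

    ;; Hash table: more fixed overhead per table, but lookup stays cheap
    ;; as the map grows.
    (defvar *table* (make-hash-table :test #'eq))
    (setf (gethash :green *table*) 2)
    (gethash :green *table*)              ; => 2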

        -- Don
_______________________________________________________________________________
Don Geddis                  http://don.geddis.org/               ···@geddis.org
In my opinion anyone interested in improving himself should not rule out
becoming pure energy.  -- Jack Handey, The New Mexican, 1988
From: Lynn Winebarger
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <cebs97$fv2$1@hood.uits.indiana.edu>
Don Geddis wrote:
>>Christian Pietsch wrote:
>>
>>>It's interesting to see how some implementations excel at some tasks,
>>>and other ones at other tasks.  May I naively ask why nobody combines
>>>the best routines into a high-performance Lisp?
> 
> 
> Pascal Costanza <········@web.de> wrote on Thu, 29 Jul 2004:
> 
>>I don't think so. The performance of an implementation is largely an emergent
>>property resulting from a combination of many influences. You can't just
>>throw together the best results of different implementations.
> 
> 
> Moreover, sometimes it really is a matter of choice.  You can optimize either
> case A or case B, but not both.  Do you use lists or hash tables for an
> association map?  Lists are faster if the maps are small.  Hash tables have
> a higher fixed overhead, but better scaling as the maps grow larger.
> 
> Neither choice is "better" than the other.  If one implementation chose one
> approach, and another implementation the other approach, then you'd find
> results much like your benchmarks, where one is better on some benchmarks
> and the other on different benchmarks.
> 
> But in this case, it doesn't make any sense to "combine the best of both".
> You either implement the map as a list or as a hash table.  There's no
>automatic way to get list-like performance on the small cases while also
>getting hash-table-like performance on the large cases.

    You could try using balanced trees of some sort...

Lynn
From: Don Geddis
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <871xitpil7.fsf@sidious.geddis.org>
I wrote:
>> Do you use lists or hash tables for an
>> association map?  Lists are faster if the maps are small.  Hash tables have
>> a higher fixed overhead, but better scaling as the maps grow larger.
>> Neither choice is "better" than the other.

Lynn Winebarger <········@indiana.edu> wrote on Thu, 29 Jul 2004:
>     You could try using balanced trees of some sort...

Yes, there are all sorts of mapping algorithms.  I saw some "dictionary"
Lisp code once that actually started with lists, and then as the map grew
bigger it automatically converted everything to hash tables.
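
A hedged sketch of what such an adaptive map might look like (the names
and the 16-entry threshold are invented for illustration; this is not
the code I saw):

    (defstruct adaptive-map (data nil))   ; starts life as an alist

    (defun amap-get (map key)
      (let ((d (adaptive-map-data map)))
        (if (hash-table-p d)
            (gethash key d)
            (cdr (assoc key d :test #'equal)))))

    (defun amap-put (map key value)
      (let* ((d (adaptive-map-data map))
             (entry (and (listp d) (assoc key d :test #'equal))))
        (cond ((hash-table-p d)
               (setf (gethash key d) value))
              (entry
               (setf (cdr entry) value))
              ((< (length d) 16)
               (push (cons key value) (adaptive-map-data map)))
              (t
               ;; Past the threshold: copy the alist into a hash table.
               (let ((table (make-hash-table :test #'equal)))
                 (dolist (pair d)
                   (setf (gethash (car pair) table) (cdr pair)))
                 (setf (gethash key table) value
                       (adaptive-map-data map) table)))))
      value)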

But all these approaches, including balanced trees, involve tradeoffs.
Generally higher startup (or lookup) overhead, in exchange for better
complexity growth at large sizes.

I can basically guarantee you that if I used association lists for a map
and you used balanced trees, I could construct a "benchmark" where my
implementation easily beat yours in speed by a significant fraction.

Rarely do you get anything for free in algorithms.  Generally you optimize
certain cases by making other cases worse.  The trick is understanding what
is likely in your domain, so that you optimize the common cases instead of
the rare ones.

        -- Don
_______________________________________________________________________________
Don Geddis                  http://don.geddis.org/               ···@geddis.org
If I could sum up my life in one sentence, I think it would be: He was born, he
lived, and then he kept on living, much longer than anyone had ever lived
before, getting richer and richer and glowing with a bright white light.
	-- Deep Thoughts, by Jack Handey [1999]
From: Raymond Toy
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <sxdfz7ayg7k.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "Christian" == Christian Pietsch <·······@interling.de> writes:

    Christian> What this tells me as a Lisp non-expert is that building your own CMUCL
    Christian> does not always increase performance (except for WALK-LIST/MESS).  Also,

I think the walk-list/mess and walk-list/seq benchmarks are bogus
because they use random numbers.  Each run may get a different sequence
of numbers, and the sequences are almost certainly different between
implementations.
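
Within one implementation you can at least make repeated runs comparable
by fixing the random state up front, roughly like this (a sketch; it does
nothing for cross-implementation comparability, since each
implementation's RANDOM yields a different sequence anyway):

    ;; Capture one random state and reuse a copy of it for every run.
    (defvar *bench-state* (make-random-state t))

    (defun benchmark-randoms (n limit)
      (let ((*random-state* (make-random-state *bench-state*)))
        (loop repeat n collect (random limit))))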

Also, building CMUCL yourself just recompiles the C files.  Those are
mostly used during garbage collection, so even if the C code were
infinitely fast, you wouldn't see much improvement in speed unless you
spend all of your time collecting garbage.

Nice set of results, though.  Interesting reading.

Ray
From: Mike Thomas
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <4108841c$0$9127$c30e37c6@lon-reader.news.telstra.net>
Hi Christian.

Thanks for your work in obtaining and presenting these results.

> CLOS would not work with GCL but I guess I made some mistake.  The
> shell scripts for GCL and Armed Bear Common Lisp that came with the
> benchmark collection required quite a bit of fiddling.  STRING-CONCAT
> uses write-sequence/2 which can't be found in GCL.

You probably used a CLtL1 build of GCL.  To get an ANSI build, configure
like this:

     configure --enable-ansi (+ whatever other arguments you want)

before running make.  For help on other options, try the usual
"configure --help".

Write-sequence and CLOS are present in the ANSI build but not in the CLtL1
build.  You can check which one you have by looking at the startup message:

GCL (GNU Common Lisp)  2.6.3 ANSI   Jul 19 2004 15:11:02

or

GCL (GNU Common Lisp)  2.6.3 CLtL1   Jul 20 2004 11:26:58

It would be interesting to see how GCL goes on the PIII, as we have recently
noted significant execution speed differences relative to other
architectures depending, we believe, on the memory/cache architecture of the
CPU and system in question.

Cheers

Mike Thomas.
From: Douglas Crosher
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <410a3006$0$9140$c30e37c6@lon-reader.news.telstra.net>
Christian Pietsch wrote:
> after installing a couple of Lisp implementations on my new Linux box,
> I could not resist running Eric Marsden's benchmark collection on
> them.  These include the classic Gabriel benchmarks as well as
> ``mathematical functions, bignum-intensive operations, CLOS test,
> hashtable exercising, read-line exercising, and various operations of
> arrays, strings and bitvectors,'' if I may quote from his site:
> http://purl.org/net/emarsden/home/downloads
...

> It's interesting to see how some implementations excel at some tasks,
> and other ones at other tasks.  May I naively ask why nobody combines
> the best routines into a high-performance Lisp?  I'm sure somebody is
> trying to do this right now.
...

Out of interest, below are results for a 64-bit version of Scieneer
Common Lisp (SCL) running on an AMD64 3200 (2GHz, 1024k cache).  The
results are compared to the same reference, and most of them
are close to this 32-bit CMUCL reference.  The larger fixnums in a
64-bit CL probably explain the improved performance in the CRC40
benchmark.  The bignum benchmarks also appear to do well on the 64-bit
platform, with SCL processing bignum calculations in 64-bit chunks.
The AMD64 Linux environment has matured well over the past year and
may well be worth considering for CL-based project development and
deployment.

Regards
Douglas Crosher
Scieneer Pty Ltd


-------------------------------------------------------------------------------------
Benchmark                 Reference  SCL 1.2.3b2
                           CMUCL18e   (AMD64)
-------------------------------------------------------------------------------------
COMPILER                 [      1.98]   1.03
LOAD-FASL                [      0.18]   0.72
SUM-PERMUTATIONS         [      2.05]   0.54
WALK-LIST/SEQ            [      0.03]   2.10
WALK-LIST/MESS           [      0.16]   0.41
BOYER                    [      5.51]   0.79
BROWSE                   [      0.58]   0.66
DDERIV                   [      0.56]   0.55
DERIV                    [      0.62]   0.58
DESTRUCTIVE              [      0.51]   0.86
DIV2-TEST-1              [      1.30]   0.44
DIV2-TEST-2              [      1.41]   0.54
FFT                      [      0.06]   1.43
FRPOLY/FIXNUM            [      0.54]   0.83
FRPOLY/BIGNUM            [      0.62]   0.55
FRPOLY/FLOAT             [      0.94]   0.75
PUZZLE                   [      0.17]   2.51
TAK                      [      0.36]   1.02
CTAK                     [      0.33]   0.96
TRTAK                    [      0.36]   1.01
TAKL                     [      0.37]   1.45
STAK                     [      0.47]   1.63
FPRINT/UGLY              [      0.68]   2.30
FPRINT/PRETTY            [      2.23]   2.15
TRAVERSE                 [      0.80]   1.27
TRIANGLE                 [      0.60]   1.67
RICHARDS                 [      0.46]   1.71
FACTORIAL                [      0.33]   0.85
FIB                      [      0.44]   1.02
FIB-RATIO                [      0.34]   0.60
ACKERMANN                [      5.63]   1.26
MANDELBROT/COMPLEX       [      9.13]   0.78
MANDELBROT/DFLOAT        [      5.26]   0.63
MRG32K3A                 [      0.78]   0.90
CRC40                    [     19.30]   0.02
BIGNUM/ELEM-100-1000     [      0.50]   0.51
BIGNUM/ELEM-1000-100     [      2.74]   0.24
BIGNUM/ELEM-10000-1      [      3.00]   0.18
BIGNUM/PARI-100-10       [      1.28]   0.16
BIGNUM/PARI-200-5        [     15.90]   0.10
PI-DECIMAL/SMALL         [     25.10]   0.32
PI-DECIMAL/BIG           [     53.32]   0.18
PI-ATAN                  [      1.38]   0.56
PI-RATIOS                [      4.51]   0.59
SLURP-LINES              [      0.89]   0.42
HASH-STRINGS             [      0.47]   1.13
HASH-INTEGERS            [      0.84]   2.22
BOEHM-GC                 [      1.89]   0.71
DEFLATE-FILE             [      0.45]   1.07
1D-ARRAYS                [      0.07]   1.26
2D-ARRAYS                [      0.87]   1.41
3D-ARRAYS                [      2.91]   1.12
BITVECTORS               [      0.63]   0.86
BENCH-STRINGS            [      2.54]   1.05
fill-strings/adjustable  [     19.24]   1.01
STRING-CONCAT            [     55.08]   0.45
SEARCH-SEQUENCE          [      3.03]   0.87
CLOS/defclass            [      3.46]   0.21
CLOS/defmethod           [      7.27]   0.29
CLOS/instantiate         [      8.03]   0.44
CLOS/simple-instantiate  [      0.33]   1.01
CLOS/methodcalls         [      0.33]   4.30
CLOS/complex-methods     [      0.06]  15.53
EQL-SPECIALIZED-FIB      [      0.32]   4.09
Reference time in first column is in seconds; other columns are relative
Reference implementation: CMU Common Lisp 18e CVS Head 2003-09-14 16:41:12 (binary rel.)
From: Eric Marsden
Subject: Re: new benchmark results for 8 CL implementations in cliki.net
Date: 
Message-ID: <wzismb578kj.fsf@melbourne.laas.fr>
>>>>> "cp" == Christian Pietsch <·······@interling.de> writes:

  cp> What this tells me as a Lisp non-expert is that building your own CMUCL
  cp> does not always increase performance (except for WALK-LIST/MESS).

  as Raymond noted, the WALK-LIST/* tests were bogus, because the data
  they were working on were generated using RANDOM. I have uploaded a
  fixed version to <URL:http://www.chez.com/emarsden/downloads/>. In
  the future I shall be creating a common-lisp.net project for
  cl-bench, to make it easier for others to contribute.

  cp> It's interesting to see how some implementations excel at some tasks,
  cp> and other ones at other tasks.  May I naively ask why nobody combines
  cp> the best routines into a high-performance Lisp? 

  until they test them, people may not be aware of the performance
  characteristics of different implementations. This is one of the
  reasons for maintaining the cl-bench suite: it has highlighted some
  areas where implementations' performance can be improved
  considerably. For instance, the atrocious performance of older
  versions of Allegro CL on some of the tests caused Franz to improve
  its product.

  Performance is also not the only factor to consider when building a
  CL implementation; ease of debugging, maintainability, and memory
  footprint may be more important to a vendor.

  cp> Poplog also has no write-sequence/2, and it refused to load the
  cp> definition I used for GCL, so it got -1. I would like to have
  cp> given it a -2 for all the hassle I had setting it up. That must
  cp> be the reason why it is only used at 3 or 4 sites. Otherwise it
  cp> has nice features integrating Prolog, Pop11 and Standard ML
  cp> besides a reasonably fast CLtL2 Lisp.

  yes, given its implementation technique Poplog is really quite fast.
  It's a shame that it's no longer really maintained.


  BTW, I see you mention that you changed the cl-bench parameters when
  running GCL, to increase the heap size that it uses. To make things
  as fair as possible, I make an attempt to run each implementation
  under similar conditions, including factors such as the heap size.
  If you change this, please list the changes explicitly when
  publishing results.
  
-- 
Eric Marsden                          <URL:http://www.laas.fr/~emarsden/>
From: Christian Pietsch
Subject: benchmark results updated, now including Allegro CL 6.2 Enterprise
Date: 
Message-ID: <ceo4f6$6uenv$1@hades.rz.uni-saarland.de>
Eric Marsden wrote in article <···············@melbourne.laas.fr>:
>   BTW, I see you mention that you changed the cl-bench parameters when
>   running GCL, to increase the heap size that it uses. To make things
>   as fair as possible, I make an attempt to run each implementation
>   under similar conditions, including factors such as the heap size.
>   If you change this, please list the changes explicitly when
>   publishing results.

When generating the figures I reported here, GCL ran with stack limits
unchanged.  What I mentioned in #lisp was a later, unpublished
experiment.

[ACL 6.2] Incidentally, I found out that my institute has a license for
Allegro Common Lisp 6.2 Enterprise Edition, so I could test it too
(see below).  Some tests showed a spectacular speedup.  Note that I am
not yet using Eric's new test suite released on August 2.

Before I present the new results table, I would like to share some
comments:

[Armed Bear Common Lisp] The LOAD-FASL and COMPILER tests were
disabled because, in earlier versions, Armed Bear Common Lisp
did not support file compilation.  Just a day after I published my
results, Peter Graves informed me he had fixed a rational-to-float
conversion bug in Armed Bear CL, so I checked out the fix from CVS and
got three more tests running: PI-DECIMAL/BIG, PI-ATAN, and
3D-ARRAYS.  The two ABCL columns contain all new measurements.

[CMUCL] Some people on Freenode's #lisp channel reported credibly that
the speed improvement from CMUCL version 18e to 19a is much greater on
other machines.

[GCL] They also expressed doubts about the validity of compile time
measurements for GCL (GNUCL in the table).  I should add that GCL in
ANSI mode is indeed capable of running CLOS code.  However, I stumbled
over a GCL bug that triggers an ``Error in FIND-METHOD-COMBINATION.''
I have been told that this bug is on the to-do list of the GCL
developers.

CLOS/method+after test results are still missing altogether because
report.lisp refuses to generate this table row for unknown reasons.

A copy of the table below can be found at the usual place:
http://www.cliki.net/Performance%20Benchmarks

Cheers,
Christian



,---- dual Pentium 4, 3.0 GHz, 1000 MB RAM, 512 KB cache ---

Benchmark               Reference  CMUCL CMUCL  SBCL    ACL     ACL  GNUCL   ECL-S  CLISP Poplog     ABCL    ABCL
                        CMUCL 18e    19a 19aP4 0.8.13   6.2     5.0  2.6.3    0.9d 2.33.2  15.53   J1.4.2  J1.5b2
-----------------------------------------------------------------------------------------------------------------
COMPILER                 [  1.98]   1.15  1.05  1.49   0.60    0.90   0.08    3.50   0.69   0.01    20.23   19.35
LOAD-FASL                [  0.18]   1.16  1.12  2.79   1.70    1.19   0.45    2.61   2.19  11.14    24.47   23.08
SUM-PERMUTATIONS         [  2.05]   1.13  1.11  1.52   0.84    1.52   1.24   -1.00   1.51   1.63     6.39    5.51
WALK-LIST/SEQ            [  0.03]   1.00  0.96  1.00   1.07    1.07   1.43    4.29   1.00  10.36     3.93    3.82
WALK-LIST/MESS           [  0.16]   1.31  0.47  1.82  -1.00   -1.00   2.06    0.75   1.57  -1.00    -1.00   -1.00
BOYER                    [  5.51]   0.99  0.97  0.86   1.14    1.05   2.86    4.23  12.43   7.73     5.65    4.22
BROWSE                   [  0.58]   1.04  0.97  0.96   0.44    1.33  12.85    2.50   6.27   1.74     4.86    3.43
DDERIV                   [  0.56]   0.97  0.98  0.91   0.54    1.11   3.97    6.97  12.55   2.30     7.98    6.38
DERIV                    [  0.62]   1.02  1.01  0.93   0.47    0.41   3.65    6.37  12.88   2.11     6.99    5.36
DESTRUCTIVE              [  0.51]   0.97  1.00  0.89   0.33    0.33   3.66    3.72   7.75   2.39     8.89    5.11
DIV2-TEST-1              [  1.30]   0.97  0.94  0.71   0.15    0.18   2.66    4.54  10.66   1.64     4.18    1.99
DIV2-TEST-2              [  1.41]   0.96  0.94  0.84   0.33    0.33   2.89    4.42  10.49   1.34     1.61    0.64
FFT                      [  0.06]   1.03  1.02  0.87   0.94    1.09  55.00   42.03  65.58 250.16    63.09   23.39
FRPOLY/FIXNUM            [  0.54]   0.99  1.00  0.87   1.39    2.00   2.52    4.69  12.59   3.04    13.01    5.85
FRPOLY/BIGNUM            [  0.62]   0.98  1.02  0.84   1.49    2.20   1.68    2.84   7.81   1.60     2.06    1.57
FRPOLY/FLOAT             [  0.94]   0.88  0.90  0.76   1.09    2.76   1.35    2.70  10.75   2.19     1.77    1.27
PUZZLE                   [  0.17]   0.99  1.00  1.10   4.31    1.90   6.21   46.26  71.59  14.14   109.75   61.32
TAK                      [  0.36]   1.02  1.00  1.00   0.62    0.78   1.71    7.17  13.94   2.97    10.29    6.06
CTAK                     [  0.33]   1.20  1.19  0.94   3.71    3.34   4.53    6.90  11.24  25.02  1461.22 1128.68
TRTAK                    [  0.36]   1.01  0.99  0.99   0.61    0.78   1.67    7.13  13.64   2.95     7.44    3.35
TAKL                     [  0.37]   1.05  1.00  0.99   0.85    0.98   0.87    4.78  20.55   5.03     4.37    4.08
STAK                     [  0.47]   1.12  0.99  1.05   5.35    2.54   1.67   -1.00   7.64   7.34    15.64    7.32
FPRINT/UGLY              [  0.68]   1.08  1.06  1.63   3.30   13.72   1.23    1.95   1.70   8.58    21.18   21.04
FPRINT/PRETTY            [  2.23]   0.95  0.94  1.37   2.30    4.40   0.39    1.22   3.34   6.77   216.94  199.28
TRAVERSE                 [  0.80]   1.00  1.00  4.21   0.56    0.64   2.23    7.21  14.10   5.29    20.60   15.60
TRIANGLE                 [  0.60]   1.04  1.07  1.01   1.48    0.68   2.09   18.39  30.20  10.58    17.64   12.62
RICHARDS                 [  0.46]   1.05  1.03  1.01   4.43    3.21   1.19   10.72  30.24  12.08    28.02   24.34
FACTORIAL                [  0.33]   0.98  0.97  1.33   3.15    3.03   9.79    4.71  20.27   2.40     3.55    2.16
FIB                      [  0.44]   1.01  1.00  1.01   0.25    0.27   4.77    2.24   6.44   1.10     5.55    2.26
FIB-RATIO                [  0.34]   0.99  0.99  1.01  11.25   15.95  30.55    0.99   0.11   2.94     3.24    1.69
ACKERMANN                [  5.63]   1.00  1.01  1.00   0.53    0.64  22.28    2.06   6.69   1.92     4.33    2.05
MANDELBROT/COMPLEX       [  9.13]   0.97  0.97  1.01   1.80    5.36   1.93    2.24  10.58   0.77     0.75    0.32
MANDELBROT/DFLOAT        [  5.26]   0.99  0.98  0.96   0.92    5.55   2.51    3.42  13.09   1.50     1.58    0.76
MRG32K3A                 [  0.78]   1.00  1.00  1.03   5.93  183.82   5.45    7.13 118.85   7.38    10.43    8.45
CRC40                    [ 19.30]   1.01  1.11  0.95   4.02    4.27  13.52    8.91  12.63   2.26     4.40    2.23
BIGNUM/ELEM-100-1000     [  0.50]   1.01  1.02  1.17   1.59    1.88   0.77    0.22   0.10   2.40     1.56    1.46
BIGNUM/ELEM-1000-100     [  2.74]   1.01  1.00  1.40   1.66    1.68   0.92    0.05   0.07   3.08     0.43    0.35
BIGNUM/ELEM-10000-1      [  3.00]   1.01  1.00  1.38   4.77    4.80   0.79    0.05   0.06  19.89     0.48    0.37
BIGNUM/PARI-100-10       [  1.28]   0.50  0.49  0.69   0.05    0.07   0.02    0.02   0.01   0.02     0.19    0.15
BIGNUM/PARI-200-5        [ 15.90]   0.49  0.49  0.76   0.02    0.02   0.01    0.00   0.00   0.01     0.02    0.02
PI-DECIMAL/SMALL         [ 25.10]   1.01  1.00  1.42  14.89   15.31   8.44    0.27   0.24   3.31     1.97    1.79
PI-DECIMAL/BIG           [ 53.32]   1.00  1.00  1.55  15.63   15.74   6.07   -1.00   0.05   3.84     1.90    1.58
PI-ATAN                  [  1.38]   1.01  0.99  1.29   2.97    2.98   5.80   -1.00   6.89   1.16     2.40    2.38
PI-RATIOS                [  4.51]   0.99  1.00  1.11   5.31    5.42   3.91    0.58   0.33   2.01     1.54    1.44
SLURP-LINES              [  0.89]   0.96  0.98  1.02   0.57    7.09   2.85    1.63   4.69   3.00    23.99   21.94
HASH-STRINGS             [  0.47]   0.84  0.78  0.81  13.63 4218.26 251.42   50.04   5.81  35.71     7.68    6.02
HASH-INTEGERS            [  0.84]   0.99  0.94  1.21   3.13    4.89   0.58    1.49   6.31  11.79     5.04    4.08
BOEHM-GC                 [  1.89]   1.01  0.97  1.16   1.37    3.82   3.89    7.87  18.34   2.22    12.13    7.14
DEFLATE-FILE             [  0.45]   1.30  1.30  1.22   1.07   24.97   3.27    9.55   8.87   7.80    16.87   11.25
1D-ARRAYS                [  0.07]   1.01  1.03  1.49   2.60    8.22   4.11    6.30  12.14  18.63     4.79    3.29
2D-ARRAYS                [  0.87]   1.01  1.01  0.63  10.25    9.77  22.62   21.31  26.39  14.16    46.67   31.88
3D-ARRAYS                [  2.91]   1.00  1.00  0.77   7.88    7.62   8.91   16.19  16.69   7.88    36.01   26.28
BITVECTORS               [  0.63]   0.97  0.95  1.60   1.69    1.74  10.16   13.12  33.12  -1.00  2334.29  822.13
BENCH-STRINGS            [  2.54]   1.02  0.99  0.14   3.88   16.87   4.30    5.78   0.47  33.55     9.67    4.18
fill-strings/adjustable  [ 19.24]   1.01  1.00  1.00   1.47   -1.00   0.90   15.19   4.62 104.94     1.22    0.91
STRING-CONCAT            [ 55.08]   1.07  1.03  0.50   0.60   -1.00  26.24   -1.00  10.29  -1.00     2.23    1.66
SEARCH-SEQUENCE          [  3.03]   1.00  1.00  0.05   0.73    4.07   2.28    3.29   3.58   3.61    11.74    9.28
CLOS/defclass            [  3.46]   1.08  1.07  0.63   0.05    0.11  -1.00    0.47   0.12   0.02     3.92    3.75
CLOS/defmethod           [  7.27]   1.03  1.05  0.85   0.03    0.05  -1.00    0.17   0.01   0.00     1.34    1.27
CLOS/instantiate         [  8.03]   1.13  1.09  0.98   0.26    0.86  -1.00  281.76   1.36   1.55    28.19   27.07
CLOS/simple-instantiate  [  0.33]   0.99  0.99  0.96   0.84    0.57  -1.00   51.78  18.96 112.65   692.94  635.93
CLOS/methodcalls         [  1.61]   0.97  0.97  1.23   3.25    3.01  -1.00   50.49  10.92   9.61    97.55   86.93
CLOS/complex-methods     [  0.06]   1.00  0.97  7.44   4.92    5.76  -1.00 1963.22  -1.00  -1.00  1799.75 1600.25
EQL-SPECIALIZED-FIB      [  0.32]   1.02  1.01  1.06   1.19   10.87  -1.00    7.16   5.30   9.06 10336.55 9321.19

Reference time in first column is in seconds; other columns are relative
Reference implementation: CMU Common Lisp 18e CVS Head 2003-09-14 16:41:12 (binary rel.)
Impl CMUCL:  CMU Common Lisp 19a-pre3 (binary rel. from ftp.linux.org.uk/pub/lisp/cmucl)
Impl CMUCL:  CMU Common Lisp 19a-pre3 Pentium4 (built with CFLAGS="-march=pentium4")
Impl SBCL:   SBCL 0.8.13 (bootstrapped with CMUCL 18e)
Impl ACL6.2: International Allegro CL Enterprise Edition 6.2 [Linux/X86] (binary:)
Impl ACL 5:  Allegro CL Trial Edition 5.0 [Linux/X86] (no heap limit! binary only:)
Impl GNUCL:  Kyoto Common Lisp GCL 2.6.3 ANSI (gcc 3.3.1 -march=pentium4)
Impl ECL-S:  ECL 0.9d CVS Head 2004-07-27 (aka ECL Spain, gcc 3.3.1 -march=pentium4)
Impl CLISP:  CLISP 2.33.2 (2004-06-02) (not optimized for pentium4 because of make failure)
Impl Poplog: Sussex Poplog 15.53e Common Lisp 2.0 (using "easy to install version" haha)
Impl ABCL14: Armed Bear Common Lisp 0.0.3.16+ CVS Head 2004-08-01 / Sun Java 1.4.2-b28
Impl ABCL15: Armed Bear Common Lisp 0.0.3.16+ CVS Head 2004-08-01 / Sun Java 1.5.0-beta2-b51
             Java invocation for ABCL: java -server -Xss64M -Xmx256M -Xrs
=== Test machine ===
   Machine-type: X86
   Machine-version: dual Intel Pentium 4 at 3.00GHz, 1000 MB RAM, 512 KB cache
   Linux 2.4.21-199-smp4G i686 i386 GNU/Linux (SuSE 9.0)
(declaim (optimize (speed 3) (space 1) (safety 0) (debug 0) (compilation-speed 0)))
`----


-- 
  Christian Pietsch
  http://www.interling.de