From: Martin Cracauer
Subject: Lisp-benchmarks: Call for apps and comments
Date: 
Message-ID: <1994Apr15.130741.15988@wavehh.hanse.de>
Hello, 
some of you might remember that we had a discussion about Lisp
benchmarks a few weeks ago and that I called for volunteers who would
run a given set of benchmarks. Well, there are enough of them to cover
almost all widely-used UNIX workstations and their Lisp systems, MCL
and LispM. The only platforms missing are MS-Windows-based systems.

So, this is a call for applications that would be usable within a set
of benchmarks and - of course - for comments on how to collect and run
such a set of benchmarks.

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
If you've got or know an application that you would contribute and it
fits the demands below (or could be made so), please contact me.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let me first explain how I think we could get a useful set of
benchmarks and then show the demands for applications that would be
usable for this benchmark collection. Those of you who are interested
in these demands only may skip to 'Application Demands:'.

I expect the discussion to be very Lisp-specific, so please followup
to comp.lang.lisp and keep comp.benchmarks clear. I might set up a
mailing-list if necessary.

When I say 'Lisp' in this posting, I mean Common Lisp.

General description of what to do:
----------------------------------

I am thinking of collecting a set of applications similar to what the
SPECmark suite is based on.

This benchmark collection should:

- consist of several parts that cover different kinds of applications
- consist of 'real-world' programs with 'real-world' data
- be freely distributable via ftp
- be easy to run

This collection is not intended to be useful for discovering problems
with specific parts of a Lisp implementation, say function calls,
consing, or garbage collection. Instead, we want to test the overall
performance for a given kind of application, say floating-point-intensive
programs, search-intensive apps...

How do I define 'real-world application'? A program on which one's
work is based, which produces results that are usable for purposes
other than benchmarking :-) They don't have to be big programs; in
fact I'd like to see a good mix of small and big programs.

The data sets in the benchmarks should be of the same size as in the
application's natural operation. One of the strongest factors in the
performance of today's workstations is cache and memory bandwidth, so
reducing the data size would not be useful.

Therefore, I don't require any application to have a short run time.
They should be easy to run, but once running I find it necessary to
allow them to run for several hours (all the benchmarks together; I
hope we'll have some shorter apps, too :-)

I don't want to bother the person who runs the benchmarks too much, so
we must take care that no further attention is required after the
first few minutes of compiling. Therefore, each application should be
compilable without actually running it, so that the compilation of the
entire benchmark set can be done up front.
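
To make this concrete, here is a rough sketch of how a single
benchmark entry could separate compilation from the actual run. The
file names and the RUN-BENCHMARK entry point are only placeholders,
not taken from any existing application:

    ;; Compile all source files of one application without running it.
    (defun compile-benchmark (files)
      (dolist (file files)
        (compile-file file)))

    ;; The unattended run later loads the compiled files and calls the
    ;; application's entry point.
    (defun run-compiled-benchmark (files)
      (dolist (file files)
        (load (compile-file-pathname file)))
      (run-benchmark))   ; entry point provided by the application

    ;; Up-front compilation:  (compile-benchmark '("search.lisp" "main.lisp"))
    ;; Unattended run later:  (run-compiled-benchmark '("search.lisp" "main.lisp"))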

One problem is implementation-specific optimization. My suggested
solution: the benchmark collection may be run twice. One run
includes only generic Common-Lisp optimizations; for the second run
all optimizations are allowed, as long as program flow is not affected.
Of course, there will be additional guidelines, such as that you aren't
allowed to use foreign modules.
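
Something like the following portable declarations is what I have in
mind by 'generic Common-Lisp optimizations' (the function itself is
just made up for illustration):

    (declaim (optimize (speed 3) (safety 1) (debug 0)))

    (defun dot-product (a b)
      (declare (type (simple-array double-float (*)) a b))
      (let ((sum 0.0d0))
        (declare (type double-float sum))
        (dotimes (i (length a) sum)
          (incf sum (* (aref a i) (aref b i))))))

Anything beyond such portable declarations, e.g. vendor-specific
compiler switches, would belong to the second, 'optimized' run.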

So, for every given Lisp implementation there are two values, 'as
shipped' and 'optimized, maybe vendor-specific'. 

This way the effort of providing a set of benchmarks would not grow
too much, because the applications aren't required to be optimally
tuned for all platforms.  In fact, I could even imagine having
applications without much Common-Lisp optimization in them. We want
to test a system for the performance a user can expect, and some
programs we run don't even have all useful declarations, maybe because
they're not intended to be used much. It would still be useful to
have those run as fast as possible. So, if we include apps without
many declarations, we could get an idea of how well a given Lisp
implementation handles this.

If someone feels that the benchmarks discriminate against his Lisp
more than against others, he is free to optimize it as much as he can
and provide this as the second value.

Another problem is how to handle garbage collection time. I find it
impossible to handle garbage collection time differently from the user
run time. Not every CL implementation's "(time ...)" tells you about GC
time, so we have to use 'real time'/'run time', which includes GC time.
So, we must provide guidelines on how the garbage collector should be
configured when running the 'as shipped' part of the benchmarks. Any
comments?
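
For the measurement itself, a small portable wrapper around the
standard timing functions might be all we need; their results include
whatever GC happens during the call, which is exactly the behaviour
described above. A rough sketch:

    ;; Call THUNK and return two values: run time and real time in
    ;; seconds, both including any GC that happens during the call.
    (defun timed-run (thunk)
      (let ((run0  (get-internal-run-time))
            (real0 (get-internal-real-time)))
        (funcall thunk)
        (values (/ (- (get-internal-run-time) run0)
                   internal-time-units-per-second)
                (/ (- (get-internal-real-time) real0)
                   internal-time-units-per-second))))

    ;; Example (RUN-BENCHMARK stands for an application's entry point):
    ;;   (timed-run #'run-benchmark)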

I want to force everyone who publishes results of this benchmark suite
to include all individual numbers, not only a mean. I really find it
annoying what happens with the SPECmark numbers; the mean is not very
useful for seeing whether a platform fits your needs. With different
kinds of Lisp applications it will be even harder to see this without
the individual results. Additionally, published results must include a
useful description of the platform, including version numbers of the
software, the RAM of the machine and so on.
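
Most of this platform description can even be collected automatically
with the standard CL introspection functions; only things such as the
amount of RAM have to be filled in by hand. For example:

    (defun platform-description ()
      (list :lisp             (lisp-implementation-type)
            :lisp-version     (lisp-implementation-version)
            :machine          (machine-type)
            :machine-version  (machine-version)
            :software         (software-type)
            :software-version (software-version)))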

One word on the Gabriel benchmarks: as you might know, there is
already a set of benchmarks, the Gabriel benchmarks. But these
benchmarks take a different approach from the one used here. The
Gabriel benchmarks mostly test specific features, not full
applications. Additionally, they are a bit outdated; the run time of
most of them is too short on modern platforms to be useful. Still,
the Gabriel benchmarks are useful for tuning an implementation. I
want to do something different - evaluating the overall performance
for various kinds of applications - and by no means want to belittle
Dick Gabriel's work.

'Application Demands:'
----------------------

So, this is what I think we need. The applications that form the CL
benchmark set should be:

- be freely distributable
- allow the whole compilation to be done apart from the main run
- not include any vendor-specific optimizations
- in general, not include any non-CL features
- not be required to have the best CL-level optimization; they should
  be real-world applications and include the optimizations the
  original author found sufficient
- make it obvious what the benchmark does, so that a reader of the
  results can judge whether it resembles his own apps
- have a total run time of more than 30 seconds on fast machines
  and more than 3 minutes on slow machines (a SPARC 1 with kcl/clisp,
  MCL on a 68030). If the run time is shorter, it is not acceptable
  to simply run the program multiple times, at least not without
  investigating the effects; the repetitions might be computed mostly
  on cached values. On the other hand, more than a few hours on slow
  machines isn't acceptable either.

The author should be reachable by e-mail, because some modifications
may be necessary to write the run times out in a re-readable way and
to remove non-CL statements. I want to make sure that these
modifications don't change the application itself. And I need a
description of what the application does.
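
By 'a re-readable way' I mean something Lisp can READ back in later,
for example one plist per benchmark appended to a results file. A
rough sketch (the file name and the keys are just placeholders):

    (defun write-result (name run-time real-time
                         &optional (file "clmarks.results"))
      (with-open-file (out file :direction :output
                                :if-exists :append
                                :if-does-not-exist :create)
        (with-standard-io-syntax
          (print (list :benchmark name
                       :run-time  run-time
                       :real-time real-time)
                 out))))

    ;; Reading the results back later is just READ in a loop.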

Otherwise, there will be no further work to be done by the original
author, so a shortage of time shouldn't make you think you cannot
contribute here.


One word on my person: I am most certainly not the most experienced
Lisp hacker, just an in-house developer who needs to run his
applications faster. Anyway, my Lisp knowledge is sufficient to
compile this set of benchmarks and make them useful and easy to run on
several platforms. I am a Harlequin customer, but that won't affect
this work in any way. My Internet connection is this private e-mail
and dial-up Internet, so I won't be able to make the collection
available by ftp on this machine. As you might have seen, my English
is not the best, but I'm afraid you have to live with it :-)

Any questions, suggestions welcome.

P.S. We need a name for what we are doing - is "CLmarks" ok for you?
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <········@wavehh.hanse.de>,Voice+4940-5221829,Fax.-5228536
Waldstrasse 200, 22846 Norderstedt, Germany.     German language accepted.