Stalin's optimisations: Can they be used outside Scheme ?

From: Advance Australia Dear
Subject: Stalin's optimisations: Can they be used outside Scheme ?
Date: Sat, 26 Oct 2002 05:27:41 +0000
Message-ID: <bu9kru04q8eib094p1f0dfovtooa1j7kgi@4ax.com>

I have often wondered about the optimisations that Stalin does to
achieve it's speed. 

It seems to me that these optimisations could also be used to speed up
ML, Haskell or even CL implementations.

Here follows some information that I found in documents by  Siskind.

Any comments about the applicability of these outside the context of
Scheme ?

From Statement of Research
ftp://ftp.ecn.purdue.edu/qobi/sor.pdf
"Stalin is the only Lisp or Scheme compiler that does whole-program
analysis and is one of only a handful of whole-program compilers for
any language. 

Stalin does whole-program polyvariant 
flow analysis, reachability analysis, aggregate reference analysis,
points-to analysis, escape analysis, lifetime analysis, reentrency
analysis, and must-alias analysis and uses these analyses to support
globalization, lightweight closure conversion, lightweight CPS
conversion, in-lining, per-program and per{program-point
representation selection and coercion insertion, typetag,
type-check, and type-dispatch elimination, unboxing, and combined
heap-, region-, and stack-based storage management. Siskind (1999a)
discusses the lightweight closure-conversion optimizations performed
by Stalin as well as the analyses that support those optimizations.
The lightweight CPS-conversion techniques used by Stalin
build on earlier work on Screamer, an extension to Common Lisp that
supports backtracking and constraint-based reasoning. Siskind and
McAllester (1993a, 1993b) describe the earlier work on Screamer.

Stalin generates code that often runs ten times faster, and sometimes
even more than a hundred times faster, than code generated by other
current state-of-the-art native-code Scheme compilers. It even
surpasses handwritten c code in some cases. In one benchmark, a
merge-sort routine compiled with Stalin outperformed the Berkeley
Unix qsort implementation by a factor of seven and outperformed a
merge-sort procedure handwritten in c by a factor of two. In another
benchmark, a double-integration algorithm compiled by Stalin
outperformed handwritten implementations of the same algorithm in
Fortran, c, Pascal, and Ada. The c code generated by Stalin
outperformed the handwritten c code by a factor of twenty despite the
fact that the handwritten code did not exhibit any obvious  flaws."

And from Flow-Directed Lightweight Closure and CPS Conversion by
Siskind:
http://www.ai.mit.edu/projects/dynlangs/Talks/flow-directed-cps-conversion.htm

"Jeffrey Siskind

Affiliation
NEC Research Institute, Inc.

Abstract
Lexically-scoped higher-order programming languages require closures
for procedures. Typical implementations create a closure for every
procedure and a variable slot for every free variable. This entails
substantial overhead. In this talk, I will present a novel
compile-time analysis and code-generation strategy, called lightweight
closure conversion, that significantly reduces the number of
procedures that need closures and the number of variables that need
slots, resulting in faster code.

Programming languages such as Scheme and SML/NJ support first-class
continuations through a CALL/CC primitive. Without continuations,
programs obey a last-in-first-out procedure-call discipline that
allows stack- based allocation of activation records. In the presence
of continuations, programs no longer obey a last-in-first-out
procedure-call discipline and thus require heap-based allocation of
activation records. In the past, two general strategies have been used
to support CALL/CC. One involves converting the entire program to
continuation-passing style (CPS). This process is called CPS
conversion. In essence, this heap-allocates all activation records.
The other involves leaving the program in direct style by implementing
mechanisms to copy portions of the stack to the heap when creating
continuations and copying saved continuations from the heap back to
the stack when calling continuations. The former has the advantage
that creating and calling continuations is cheap but the disadvantage
that ordinary procedure call and return is expensive. The latter has
the reverse set of tradeoffs. Lightweight CPS conversion is a novel
scheme that gives the advantages of both approaches, without the
disadvantages, in most situations."

Alos worth reading in this context is Siskind's `Flow-Directed
Lightweight Closure Conversion,' Technical Report 99-105, NEC Research
Institute, Inc., July 1999. available at
ftp://ftp.ecn.purdue.edu/qobi/fdlcc.pdf

Re: Stalin's optimisations: Can they be used outside Scheme ? Shriram Krishnamurthi
- Re: Stalin's optimisations: Can they be used outside Scheme ? Michael Hudson
Re: Stalin's optimisations: Can they be used outside Scheme ? William D Clinger

From: Shriram Krishnamurthi
Subject: Re: Stalin's optimisations: Can they be used outside Scheme ?
Date: Sat, 26 Oct 2002 14:59:07 +0000
Message-ID: <w7dznt1l004.fsf@cs.brown.edu>

Advance Australia Dear <····················@yahoo.com> writes:

> I have often wondered about the optimisations that Stalin does to
> achieve it's speed. 
> 
> It seems to me that these optimisations could also be used to speed up
> ML, Haskell or even CL implementations.

They could.  Indeed, good compilers for these languages do employ most
of these techniques.

The major difference between the other compilers and Stalin is that
most of the others are not whole-program compilers to the same extent
that Stalin is, so the scope over which they apply these techniques is
much more limited.  This exposes opportunities for optimization to
Stalin, but is expensive and may not integrate well into the
development cycle.

Many good compilers now report performance comparable to some
hand-written C programs.  I believe Bigloo, Larceny and Gambit for
Scheme and OCaml for ML make such claims on particular cases.  These
glib comparisons are not too useful, however, because much depends on
the exact nature of the program.  For instance, I believe Bigloo and
Stalin did not handle tail calls in general (maybe one or both do
now).  Also, Stalin has seen special investment into efficiently
handling floating-point numbers without boxing, because of the kinds
of programs Siskind wants to be able to write, but that in turn has an
impact on the kinds of analyses it must use.  Of the lot, I think
Stalin has better news to report than the others, but that's not
surprising for a whole-program compiler.  I bet it also has longer
compilation times of the lot.  This trade-off between compilation time
and execution time is non-trivial, and you can't leave it out of any
comparisons you make.

Stalin's a great project, and it's definitely very nice that someone
has taken the trouble to build and maintain a whole-program compiler.
However, it seems to me (and maybe Jeff will correct me) the lessons
of Stalin are not so easy to adapt to other compilers without
completely changing their nature.  If you don't understand why, you
need to think about your question a bit harder.

Shriram

From: Michael Hudson
Subject: Re: Stalin's optimisations: Can they be used outside Scheme ?
Date: Fri, 01 Nov 2002 13:19:05 +0000
Message-ID: <lk7kfxzaz5.fsf@pc150.maths.bris.ac.uk>

Shriram Krishnamurthi <··@cs.brown.edu> writes:

> I bet [stalin] also has longer compilation times of the lot.

No kidding; the README says:

    If you wish to rebuild Stalin from the Scheme sources (given a
    running Stalin compiler), you can touch or modify the file
    `stalin.sc' and then do:

    % make

    which will first generate `stalin.c' from `stalin.sc' and then
    generate `stalin' from `stalin.c'.  This takes about 17 hours and
    800M on a 450MHz Xeon with 512K L2.

17 hours!  Ouch.

Cheers,
M.

-- 
  To summarise the summary of the summary:- people are a problem.
                   -- The Hitch-Hikers Guide to the Galaxy, Episode 12

From: William D Clinger
Subject: Re: Stalin's optimisations: Can they be used outside Scheme ?
Date: Sat, 26 Oct 2002 20:33:25 +0000
Message-ID: <b84e9a9f.0210261233.607b316c@posting.google.com>

Advance Australia Dear wrote:
> I have often wondered about the optimisations that Stalin does to
> achieve it's speed. 
...
> Any comments about the applicability of these outside the context of
> Scheme ?

Except perhaps for the lightweight CPS conversion, every optimization
that you listed in your message would be applicable to Common Lisp.

Will