LLVM: A superior Lisp compiler framework?

From: Adam Warner
Subject: LLVM: A superior Lisp compiler framework?
Date: Sat, 18 Dec 2004 11:29:36 +0000
Message-ID: <pan.2004.12.18.11.29.33.33249@consulting.net.nz>

Hi all,

Today I learned of LLVM, the Low Level Virtual Machine framework. It is
*very* impressive: <http://llvm.cs.uiuc.edu/>

It's a BSD-style licensed ahead-of-time, just-in-time, highly optimising
compiler with a very attractive bytecode format that doesn't force a
particular object system upon bytecode generators. In addition:
<http://llvm.cs.uiuc.edu/pubs/2004-01-30-CGO-LLVM.html>

   LLVM defines a common, low-level code representation in Static Single
   Assignment (SSA) form, with several novel features: a simple,
   language-independent type-system that exposes the primitives commonly
   used to implement high-level language features; an instruction for
   typed address arithmetic; and a simple mechanism that can be used to
   implement the exception handling features of high-level languages (and
   setjmp/longjmp in C) uniformly and efficiently.

Furthermore it has GNU C and C++ to LLVM translators based upon GCC (this
part is GPL'd, of course). This is handy for bootstrapping a language and
library support. However you can completely bypass GCC by generating LLVM
bytecode that is natively or JIT compiled (and we're talking state of the
art optimisation with multiple register allocation and spiller algorithms,
peephole optimisation, frame pointer elimination, etc). As a consequence,
<http://llvm.cs.uiuc.edu/docs/FAQ.html#cfe_code> ;-)

   Where did all of my code go??

   If you are using the LLVM demo page, you may often wonder what happened
   to all of the code that you typed in. Remember that the demo script is
   running the code through the LLVM optimizers, so if your code doesn't
   actually do anything useful, it might all be deleted.

There's also an LLVM->LLVM optimiser which "takes LLVM bytecode as input,
runs the specified optimizations on it, and then outputs the optimized
LLVM bytecode" (this may assist naive bytecode generators). LLVM includes
type analysis that is able to deduce more safe optimisations than some C
compilers. Tests and benchmarks are run daily:
<http://llvm.cs.uiuc.edu/testresults/X86/#Program>

If you compare the GCC and CBE (C backend->LLVM) columns you will see its
native compilation occasionally does better than GCC -O2 (and its JIT
results are not too shabby).

Lisp developers have wanted access to GCC's backend for years but
political issues have made this technically and legally infeasible:
<http://gcc.gnu.org/ml/gcc/2000-01/msg00572.html>.

In addition to X86 there are SparcV9 and PowerPC backends. There is also
a plain interpreter (largely for presently unsupported JIT platforms).

There is also support for hooking in precise garbage collectors:
<http://llvm.cs.uiuc.edu/docs/LangRef.html#int_gc>
<http://llvm.cs.uiuc.edu/docs/GarbageCollection.html>
(A simple copying garbage collector example is available)

An instructive Scheme compiler has been written by Tobias Nurmiranta
<http://www.ida.liu.se/~tobnu/scheme2llvm/> in around a thousand lines of
code. As a batch compiler it converts a subset of Scheme to assembly which
can then be piped to llvm-as for bytecode generation and execution by lli
(the JIT or plain interpreter) or llc (the ahead-of-time compiler).

Note that the LLVM platform also supports in-memory code generation that
bypasses the file system. At this stage it appears the only JIT code
generation API is C++.

I hope there will soon be instructive solutions to incremental run time
compilation (or JIT compilation) without having to write a REPL in C++.
This is the most applicable discussion about JIT compilation from the
developer's mailing list:
<http://mail.cs.uiuc.edu/pipermail/llvmdev/2004-November/002566.html>

I may not be able to participate in any discussion this message generates.
I just want to pass this info on to the Lisp/Scheme community.

Regards,
Adam

From: Adam Warner
Subject: Re: LLVM: A superior Lisp compiler framework?
Date: Sun, 19 Dec 2004 07:50:32 +0000
Message-ID: <pan.2004.12.19.07.49.30.601946@consulting.net.nz>

Hi Michael Sperber,

> Adam> Today I learned of LLVM, the Low Level Virtual Machine framework. It is
> Adam> *very* impressive: <http://llvm.cs.uiuc.edu/>
> 
> Without an efficient representation of first-class continuations, it's
> too weak for Scheme.  I also don't see direct support for proper tail
> recursion, which would also be needed to make things work well.

The tail call documentation is unofficial and currently available from the
primary author of LLVM, Chris Lattner:
<http://nondot.org/sabre/LLVMNotes/>
<http://nondot.org/sabre/LLVMNotes/GuaranteedEfficientTailCalls.txt>

   ...

   Efficient support for aritrary tail calls is absolutely essential for a
   broad class of languages, particularly functional languages.  In
   particular, the Scheme language requires, as part of its language
   definition, that all implementations implement arbitrary tail calls
   efficiently (even indirect calls!).

   ...

   LLVM currently supports an optimization named 'tail recursion
   elimination'. This optimization converts self recursive functions who
   have simple tail calls into explicit loops.  There is a large amount of
   overlap between tail recursion elimination and proper tail call
   support, but proper tail call support is more general: it applies to
   arbitrary tail calls to arbitrary (even unknown) functions.

   ...

   ... languages that do not require all functions to be compatible with C
   calling conventions should default to marking their functions as CC#0
   explicitly: this will permit guaranteed predictable tail calls in all
   non-varargs cases (which are all of the cases possible).

Regards,
Adam