On the implementation of lexical variables...

From: ····@domain.invalid
Subject: On the implementation of lexical variables...
Date: Tue, 06 Nov 2001 22:22:54 +0000
Message-ID: <3BE862BE.5050305@domain.invalid>

     Sorry to post anonymously, but my employer, etc.

     I've been trying to implement a core Lisp-2 style language in the
     style of Baker's critique of the DKLisp definition (see
     http://www.http://linux.rice.edu/~rahul/hbaker/CritLisp.html )
     with the further goal of keeping the interpreter under 10,000 lines
     of ANSI C. The eventual goal is to produce a small, portable
     interpreter for use in scripting and as a bootstrap for a more
     canonical CL implementation. Never mind why I decided to do this
     in the face of readily available and competent CL implementations,
     it's a sort of Road To Damascus thing.

     I've lurked here for years, and both Tim Bradshaw and Barry Margolin
     have been generous enough to answer private questions. It's plain
       that the question I'm about to ask is broad enough so that I should
     not impose on either of them, so I'll toss it out here and see if
     I can understand any responses.

     I've implemented symbols as a data structure containing pointers
     to function and value slots, as a low tag pointer (2 bits for the
     tag.)  Since read/intern don't know anything about lexical or
     special variables, I allocate a symbol from the heap every time
     read sees a new one. This implementation does not use packages to
     map names to symbols, I keep them in a single hash table.

     The documentation of environments in both CLTL2 and the Hyperspec
     is vague to the point of distraction (this is not a complaint about
     the authors' capabilities, from reading Kent's comments in previous
     threads it seems that environments were deliberately not detailed.)
     Currently, I have the lexical environment implemented as a vector of
     symbol data structures.

     Lambda is the single primitive form that establishes a lexical
     binding; let and friends are macros that expand to a lambda form.
     When I evaluate a lambda form, I note the lexical variables
     in the lambda list, allocate space in the lexicals vector, rewrite
     all the corresponding symbol pointers in the body to be of type
     lexical and to point to the corresponding symbol in the lexical
     vector, and wrap the resulting body in a nil labelled block. The
     lexical symbol contains a pointer to the corresponding special of
     the same name so forms that only access globals can get to the
     global binding.

     On evaluating the resulting block, I save the state of the form's
     lexicals on the stack, bind the evaluated parameters to them, and
     evaluate the body. Upon exit, I restore the shadowed lexicals from
     the stack. This makes recursion work.

     This design made getting at the value and function slots of lexicals
     a simple machine pointer indirection (after masking the tag) and the
     corresponding global only one more indirection after that. The
     alternative seemed to be to look up the lexical variable in the
     environment vector every time I need its binding.

     The problem (other than where this design might not be correct for
     some implication I have not yet seen) is that rewriting symbols in
     the body forms of the lambda correctly needs something like a
     complete code walker, which is not easy. Further, it does not
     implement closures at all since it does not associate specific
     lexical bindings with a function object. Doing that would require
     a per-closure private binding vector, which gets very messy
     very quickly. Added to that, unwind-protect not only has a lexical
     enviroment for its clean up forms, but it also has to capture and
     restore special bindings.

     It was at this point that my head started to hurt and I reviewed
     the literature. I went through www.lisp.org and comments in this
     newsgroup, and have a line on a copy of SICP, which I believe may
     help. This cannot be as complicated as I'm making it, but I'm kind
     of stumped. I know I could go read the clisp source to see how they
     do it, but I want to understand it first and I obviously don't.

     Any suggestions for additional resources or comments to help me
     figure this out?

     My thanks.

-- 
James M. Putnam

Re: On the implementation of lexical variables... Daniel Lakeland
- Re: On the implementation of lexical variables... ·······@nvidia.com
  - Re: On the implementation of lexical variables... Daniel Lakeland
    - Re: On the implementation of lexical variables... ·······@nvidia.com
      - Re: On the implementation of lexical variables... Daniel Lakeland
        Re: On the implementation of lexical variables... ·······@nvidia.com
        Re: On the implementation of lexical variables... Daniel Lakeland
    - Re: On the implementation of lexical variables... Thomas F. Burdick
      - Re: On the implementation of lexical variables... Daniel Lakeland
        Re: On the implementation of lexical variables... Thomas F. Burdick
        Re: On the implementation of lexical variables... Daniel Lakeland
        Re: On the implementation of lexical variables... Thomas F. Burdick
        Re: On the implementation of lexical variables... Tim Bradshaw
        Re: On the implementation of lexical variables... Thomas F. Burdick
Re: On the implementation of lexical variables... Lieven Marchand
Re: On the implementation of lexical variables... Reini Urban
Re: On the implementation of lexical variables... Roger Corman
- Re: On the implementation of lexical variables... ·······@nvidia.com
  - Re: On the implementation of lexical variables... Juanjo

From: Daniel Lakeland
Subject: Re: On the implementation of lexical variables...
Date: Wed, 07 Nov 2001 03:10:30 +0000
Message-ID: <20011106.191030.437116466.5743@silnospamcon.com>

In article <················@domain.invalid>, "user" <····@domain.invalid>
wrote:

>[problems with implementing closures]

You might try getting a copy of "Essentials of Programming Languages"
Friedman et al. ISBN 0262062178

>      Lambda is the single primitive form that establishes a lexical
>      binding; let and friends are macros that expand to a lambda form.
>      When I evaluate a lambda form, I note the lexical variables in the
>      lambda list, allocate space in the lexicals vector, rewrite all the
>      corresponding symbol pointers in the body to be of type lexical and
>      to point to the corresponding symbol in the lexical vector, and
>      wrap the resulting body in a nil labelled block. The lexical symbol
>      contains a pointer to the corresponding special of the same name so
>      forms that only access globals can get to the global binding.
> 
>      On evaluating the resulting block, I save the state of the form's
>      lexicals on the stack, bind the evaluated parameters to them, and
>      evaluate the body. Upon exit, I restore the shadowed lexicals from
>      the stack. This makes recursion work.

Yes, and with clever tricks you can make trivial tail recursion
elimination. However so far, you've implemented what's called "dynamic
scope"  lexical scope is distinguishable from dynamic scope only with
respect to FREE variables used inside a lambda. How do you deal with free
vars? Look them up on the stack, or look them up in a "closure" that
remembers the bindings in place at the time of creation?

>      This design made getting at the value and function slots of
>      lexicals a simple machine pointer indirection (after masking the
>      tag) and the corresponding global only one more indirection after
>      that. The alternative seemed to be to look up the lexical variable
>      in the environment vector every time I need its binding.

>      The problem (other than where this design might not be correct for
>      some implication I have not yet seen) is that rewriting symbols in
>      the body forms of the lambda correctly needs something like a
>      complete code walker, which is not easy. Further, it does not
>      implement closures at all since it does not associate specific
>      lexical bindings with a function object. Doing that would require a
>      per-closure private binding vector, which gets very messy very
>      quickly. Added to that, unwind-protect not only has a lexical
>      enviroment for its clean up forms, but it also has to capture and
>      restore special bindings.

I think the per-closure private environment is required for proper lexical
scope.

You should definitely get Essentials of Programming Languages (see above).

>      It was at this point that my head started to hurt and I reviewed
>      the literature. I went through www.lisp.org and comments in this
>      newsgroup, and have a line on a copy of SICP, which I believe may
>      help. This cannot be as complicated as I'm making it, but I'm kind
>      of stumped. I know I could go read the clisp source to see how they
>      do it, but I want to understand it first and I obviously don't.

The technique you're using is closely related to compilation. Ie. you're
partially compiling to an intermediate form that you are interpreting. As
such, I think a code-walker is probably required. Your goal should be to
create a small enough intermediate representation that the code walker
doesn't have to deal with understanding anything hairy other than lambda
and maybe some other trivial forms (or, and, setq funcall etc)

I once wanted to create a GCC frontend or perhaps a full compiler for a
lisp language. But I decided that there were better ways to use my time. I
wanted a system that would let me write low level code (ie. kernel level)
in CL or scheme... To be able to have an extensible kernel, with code
migration and all kinds of nice things... but it ultimately wasn't worth
my time (at the time).

I was going to write it in scheme or common lisp, and bootstrap it from an
interpreter.

From: ·······@nvidia.com
Subject: Re: On the implementation of lexical variables...
Date: Wed, 07 Nov 2001 19:41:33 +0000
Message-ID: <3BE98E6D.9030906@nvidia.com>

Daniel Lakeland wrote:

>
> You might try getting a copy of "Essentials of Programming Languages"
> Friedman et al. ISBN 0262062178


     Exactly the suggestion I was looking for, thank you. Also thanks
     to Reini for the link to Knott's book.


> Yes, and with clever tricks you can make trivial tail recursion
> elimination. However so far, you've implemented what's called "dynamic
> scope"  lexical scope is distinguishable from dynamic scope only with
> respect to FREE variables used inside a lambda. How do you deal with free
> vars? Look them up on the stack, or look them up in a "closure" that
> remembers the bindings in place at the time of creation?


     That's part of the problem. As I understand it (and I've already
     decided that I have some portion of this horribly wrong) lexical
     variables have lexical scope (that is, a unique symbol is created
     by each defining form) and indefinite extent (the binding of the
     symbol persists after the defining form exits.) A closure, for
     example, consists of a function object and a set of bindings of
     for unshadowed lexical variables in effect when it was defined.

     Where I get confused has to do with the uniqueness of those
     variables (I think they're what you're referring to as free
     variables. The Hyperspec doesn't cite free variables, but it
     does describe a free declaration which might be the same thing.)

     In something like:

     (let ((a 0) (b 1))
        (defun foo (a)
            (print (list a b))
        (defun bar (b)
            (print (list a b))
        (defun baz ()
            (print (list a b)))

     the let form creates two lexical bindings, a and b. foo creates a
     lexical binding for a (which is a different symbol than let's
     a) and inherits b from the let. bar does the opposite. When foo
     or bar are evaluated, they use the bindings for a and b respectively
     that were in effect at the time they were defined.

     baz has no lexical bindings, so it captures let's a and b.

     Are foo's b, bar's a, and baz's a and b and let's a and b all unique
     symbols?


> I think the per-closure private environment is required for proper lexical
> scope.


     I think so too. That implies that all of the symbols above are
     unique.


> The technique you're using is closely related to compilation. Ie. you're
> partially compiling to an intermediate form that you are interpreting. As
> such, I think a code-walker is probably required. Your goal should be to
> create a small enough intermediate representation that the code walker
> doesn't have to deal with understanding anything hairy other than lambda
> and maybe some other trivial forms (or, and, setq funcall etc)


     That was the idea. I wanted to reduce whatever forms the kernel
     language understands to a small number of easily walked functions.
     I suppose that's a compilation of a sort.

 
> I once wanted to create a GCC frontend or perhaps a full compiler for a
> lisp language. But I decided that there were better ways to use my time. I
> wanted a system that would let me write low level code (ie. kernel level)
> in CL or scheme... To be able to have an extensible kernel, with code
> migration and all kinds of nice things... but it ultimately wasn't worth
> my time (at the time).


     That was sort of where I was going with this, I wanted to be able
     to write lisp code and have it work in standalone (either user or
     system) environments.

From: Daniel Lakeland
Subject: Re: On the implementation of lexical variables...
Date: Wed, 07 Nov 2001 22:53:25 +0000
Message-ID: <pan.2001.11.07.14.52.38.396.2221@sil-no.spam-con.com>

In article <················@nvidia.com>, "jputnam" <·······@nvidia.com>
wrote:

> Daniel Lakeland wrote:
> 
> 
>> You might try getting a copy of "Essentials of Programming Languages"
>> Friedman et al. ISBN 0262062178
> 
> 
>      Exactly the suggestion I was looking for, thank you. Also thanks to
>      Reini for the link to Knott's book.

Yes, Essentials is a nice book that will walk you through a lot of
programming language theory, and it uses scheme as its implementation
language.

>      That's part of the problem. As I understand it (and I've already
>      decided that I have some portion of this horribly wrong) lexical
>      variables have lexical scope (that is, a unique symbol is created
>      by each defining form) and indefinite extent (the binding of the
>      symbol persists after the defining form exits.) A closure, for
>      example, consists of a function object and a set of bindings of for
>      unshadowed lexical variables in effect when it was defined.

It's not so much the symbols that are new, but rather the bindings.
Creating a lexical variable allows the binding to be captured by a
closure.

>      Where I get confused has to do with the uniqueness of those
>      variables (I think they're what you're referring to as free
>      variables. The Hyperspec doesn't cite free variables, but it does
>      describe a free declaration which might be the same thing.)

A free variable is one that is used inside a form whose closest lexical
binding doesn't define the form.

(lambda (x) (list a x)) is a function with one free variable, 'a'

If lambda is the only binding form you use (all others are macros
expanding to lambda) then a variable is free with respect to a lambda when
it appears inside a lambda but doesn't appear in the formal argument list
of that lambda.

>      In something like:
> 
>      (let ((a 0) (b 1))
>         (defun foo (a)
>             (print (list a b))
>         (defun bar (b)
>             (print (list a b))
>         (defun baz ()
>             (print (list a b)))
>...

>      Are foo's b, bar's a, and baz's a and b and let's a and b all
>      unique symbols?

No, they're all unique BINDINGS... a different thing.

A symbol is a symbol, if it's interned (all of these are automatically
interned by read) then there is only ONE, in the symbol table (if in CL
you have a symbol table per package, in scheme, a single symbol table)

However, the *binding* that the symbol has in a certain context changes,
and is usually established by an "environment", a data structure that
provides a mapping from symbols to values, kind of like a hash table.
For compilation, you can completely ignore all the symbols and
simply replace them with offsets into an array. For bound variables you
can always compute a reference at compile time, for free variables with
lexical scope it's the same. For dynamically scoped variables you have to
look up the value on the run-time stack or somewhere an might need to
keep the symbol name.

(lambda (x) (list x x)) would become something like (lambda (#1) (list #1
#1)) where #n refers to the n'th slot in the environment array.

So while in the let,  'a' refers to a memory location containing
the number 0, and  'b' to 1, in 'bar' 'a' will refer to the same memory
location as the zero, but 'b' will refer to the memory location of the
argument of the function.

>      That was the idea. I wanted to reduce whatever forms the kernel
>      language understands to a small number of easily walked functions.
>      I suppose that's a compilation of a sort.

Yes, you might look at the code for scheme 48 as well, I think it compiles
to a systems level "s-code" subset of scheme. There are TONS of scheme
implementations, unfortunately.

>> I once wanted to create a GCC frontend or perhaps a full compiler for a
>> lisp language. But I decided that there were better ways to use my
>> time. I wanted a system that would let me write low level code (ie.
>> kernel level) in CL or scheme... To be able to have an extensible
>> kernel, with code migration and all kinds of nice things... but it
>> ultimately wasn't worth my time (at the time).
> 
> 
>      That was sort of where I was going with this, I wanted to be able
>      to write lisp code and have it work in standalone (either user or
>      system) environments.

If this is your goal, I would be interested in participating in the design
and implementation, assuming you are doing a GPL or similar DFSG Free
license.

Though I would love to have a free Common Lisp implementation that
supported multi-threaded standalone code, was processor independent,
supported linkage with hand-assembly, and provided a restricted subset for
use in non-gc kernel interrupt type code, I think there's enough work just
to implement naive CL that makes CL a poor choice for this type of
project. Roger Corman will probably chime in here as well :-)

My idea was to use 3 passes,

1) scheme superset -> scheme subset,

2) scheme subset -> processor independent machine code, 

3) processor independent code -> processor machine code.

1,2 could be done in scheme.

3 could be done in scheme, or in a GCC frontend.

In order to do anything like this, you'd have to:

1) Implement the (2) compilers in step 1, and 2.
2) Implement a runtime library, (ie. GC, allocation, type system, basic
scheme functions etc)
3) Do a textual representation of your processor independent code, and a
GCC frontend if you wanted to take that route., or do a hardware
translation layer in scheme to make machine code from your processor independent
code.

It's a lot of work. But I think the results would be a real asset to lisp
hackers who want to do system-level coding in lisp. Especially for the
HURD for example.

From: ·······@nvidia.com
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 00:12:38 +0000
Message-ID: <3BE9CDF6.2070305@nvidia.com>

Daniel Lakeland wrote:

> 
> It's not so much the symbols that are new, but rather the bindings.
> Creating a lexical variable allows the binding to be captured by a
> closure.


     With you so far.


> A free variable is one that is used inside a form whose closest lexical
> binding doesn't define the form.


     This part I get.


> No, they're all unique BINDINGS... a different thing.


     Now I'm really confused. Kent told us in a thread named "Scope of
     Variables" in July of this year that (Google doesn't let me grab the
     message ID, sorry.)

Kent writes:
 >> In the case of a lexical variable, each binding is a new variable.
 >> So when you do
 >>
 >> (defun make-counter ()
 >>   (let ((x 0))
 >>     #'(lambda () (incf x))))
 >>
 >> and you do
 >>
 >> (setq *counter-1* (make-counter))
 >> (setq *counter-2* (make-counter))
 >>
 >> you will be counting with two different variables because each
 >> call to
 >> MAKE-COUNTER allocates a new variable (whose lexical name is X, but
 >> that doesn't really matter--it's never referred to by name).
 >>
 >> Special variables are ALWAYS the same location.  There is only one
 >> special named X.  If you re-bind a lexical, as in
 >> (let ((x 1)) (values (let ((x (1+ x))) x) x))
 >> you get a second variable.  In the case of
 >> (let ((x 1))
 >>   (declare (special x))
 >>   (values (let ((x (1+ x)))
 >>             (declare (special x))
 >>             x)
 >>           x))
 >> you'd get the same answer but the way it works is VERY different.
 >> The inner binding secretly stores the old value of X at the time of
 >> the binding (on the stack somewhere) and replaces the SAME location
 >> with a new value.
 >>
 >> So for a bound lexical, you get a new variable [location].
 >> For a bound special, you get a new value [in same location].

     I get lexical shadowing of special variables. I have this
     implemented as shallow binding.


> A symbol is a symbol, if it's interned (all of these are automatically
> interned by read) then there is only ONE, in the symbol table (if in CL
> you have a symbol table per package, in scheme, a single symbol table)


     The above implies strongly there are multiple instances of lexical
     variables (ie, are not eq) unless Kent means variable in the sense
     a binding.


> (lambda (x) (list x x)) would become something like (lambda (#1) (list #1
> #1)) where #n refers to the n'th slot in the environment array.


     I had implemented something similar before I got caught up in the
     semantic quagmire of lexical variables.

 
> So while in the let,  'a' refers to a memory location containing
> the number 0, and  'b' to 1, in 'bar' 'a' will refer to the same memory
> location as the zero, but 'b' will refer to the memory location of the
> argument of the function.


     That makes sense. Free variables are shared. I plainly need to
     do some reading.


> Yes, you might look at the code for scheme 48 as well, I think it compiles
> to a systems level "s-code" subset of scheme. There are TONS of scheme
> implementations, unfortunately.


     I want a Lisp-2, but there may be some ideas I can grab.


> If this is your goal, I would be interested in participating in the design
> and implementation, assuming you are doing a GPL or similar DFSG Free
> license.


     That was the general idea (I'm fractionally more fond of the
     BSD-stlye license) but I don't have any commercial interest
     in this work. If you really are interested, contact me privately
     and I'll send you the design whitepaper and if that doesn't scare
     you off, a snapshot of the code base.


> Though I would love to have a free Common Lisp implementation that
> supported multi-threaded standalone code, was processor independent,
> supported linkage with hand-assembly, and provided a restricted subset for
> use in non-gc kernel interrupt type code, I think there's enough work just
> to implement naive CL that makes CL a poor choice for this type of
> project. Roger Corman will probably chime in here as well :-)


     It has been plain to me for some time that much as I love CL,
     it's not really suited for a kernel language. I'm not competent
     to criticise the language in any particular and I come directly
     from a worse-is-better background so I don't especially trust my
     instincts. I think a small, fast CLish Lisp-2 would solve a number
     of my problems (or at least the problems for which I want to write
     lisp code but can't carry around the entire Lisp environment.) The
     original idea was to have an alternative to tcl for writing shells
     and extendable tools.

 
> It's a lot of work. But I think the results would be a real asset to lisp
> hackers who want to do system-level coding in lisp. Especially for the
> HURD for example.


     I have spent some nontrivial amount of time looking at compiler
     backends, and if I can ever get out of the current morass I want
     to use lcc's backend to emit machine code. I don't want to invent
     yet another virtual machine.

From: Daniel Lakeland
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 01:20:16 +0000
Message-ID: <pan.2001.11.07.17.20.13.729.1090@sil-no.spam-con.com>

In article <················@nvidia.com>, "jputnam" <·······@nvidia.com>
wrote:

> Daniel Lakeland wrote:
>> No, they're all unique BINDINGS... a different thing.
> 
> 
>      Now I'm really confused. Kent told us in a thread named "Scope of
>      Variables" in July of this year that (Google doesn't let me grab
>      the message ID, sorry.)
> 
> Kent writes:
>  >> In the case of a lexical variable, each binding is a new variable.
>  >> So when you do
>  >>
>  >> (defun make-counter ()
>  >>   (let ((x 0))
>  >>     #'(lambda () (incf x))))
>  >>
>  >> and you do
>  >>
>  >> (setq *counter-1* (make-counter))
>  >> (setq *counter-2* (make-counter))

This is correct, because every time you call make-counter, the let form
creates a new environment in which the first entry is 0 and then the
lambda captures that environment. Don't think in terms of the names,
because as I said, you can always replace formals with direct references
at compile time, in other words, the names are just there to identify
what goes with what.

Does this help?

>  >> So for a bound lexical, you get a new variable [location]. For a
>  >> bound special, you get a new value [in same location].
> 
>      I get lexical shadowing of special variables. I have this
>      implemented as shallow binding.

In a dynamic binding system you allocate a new binding on a stack and
always use the most recent binding. As you pop the stack (exiting the
scope of a binding) you'll see the more and more ancient bindings.

In a lexical binding system, whenever you create a lambda that has a free
variable, you snag the current environment and remember it for all time
until the closure is GCd. The next time you come back to the same spot,
the environment you snag is a different one. Hence kent's explanation.

Does that help?

>> A symbol is a symbol, if it's interned (all of these are automatically
>> interned by read) then there is only ONE, in the symbol table (if in CL
>> you have a symbol table per package, in scheme, a single symbol table)
> 
> 
>      The above implies strongly there are multiple instances of lexical
>      variables (ie, are not eq) unless Kent means variable in the sense
>      a binding.

I think that's what he means. A variable is really (abstractly) a storage
location. The symbol is just the NAME of that storage location.

From: ·······@nvidia.com
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 19:36:45 +0000
Message-ID: <3BEADECD.9020505@nvidia.com>

Daniel Lakeland wrote:


> This is correct, because every time you call make-counter, the let form
> creates a new environment in which the first entry is 0 and then the
> lambda captures that environment. Don't think in terms of the names,
> because as I said, you can always replace formals with direct references
> at compile time, in other words, the names are just there to identify
> what goes with what.


     I need to read some more before I can ask intelligent questions
     (none of which I've made in this thread.) Let me see if I got this
     right.

     A symbol is a symbol is a symbol, lexical or special. However
     many of lexicals there may be by the same name, there is only one
     symbol data structure. A symbol data structure is primarily a
     name string. Symbol bindings associate a name string with a
     variety of values; property list, function, data, etc.

     A lexical variable is a symbol binding, not a symbol itself. A
     symbol is not a binding, it's a name.

     A bound lexical variable is a binding whose value is established
     dynamically, that is at the time its binding form is invoked. Its
     scope is lexical and its extent is dynamic in its defining form,
     just like special bindings.

     A free lexical variable is a binding whose value is established
     textually by an enclosing binding form at the time the form it is
     free in was defined. Its scope is lexical and extent indefinite. Its
     value at any one time depends on the current binding of the free
     lexical in the environment.

     Environments hold bindings. Symbol tables hold symbols. The
     two should not be confused.


> In a lexical binding system, whenever you create a lambda that has a free
> variable, you snag the current environment and remember it for all time
> until the closure is GCd. The next time you come back to the same spot,
> the environment you snag is a different one. Hence kent's explanation.


     In this sense, the value of a free lexical depends on the
     environment at the time it is established by its enclosing form.
     An enclosing form may alter a free variable, which is reflected
     in the state of the environment at the time the form was defined.
     These bindings are persistent while the closure is accessible.

     Closures may share a free lexical's environment, in which case
     one closure's change of that variable will be seen by the other.

     Is that in the slightest bit correct?

From: Daniel Lakeland
Subject: Re: On the implementation of lexical variables...
Date: Fri, 09 Nov 2001 17:42:51 +0000
Message-ID: <pan.2001.11.09.09.42.45.648.761@sil-no.spam-con.com>

In article <················@nvidia.com>, "jputnam" <·······@nvidia.com>
wrote:

> Daniel Lakeland wrote:
> 
> 
>> This is correct, because every time you call make-counter, the let form
>> creates a new environment in which the first entry is 0 and then the
>> lambda captures that environment. Don't think in terms of the names,
>> because as I said, you can always replace formals with direct
>> references at compile time, in other words, the names are just there to
>> identify what goes with what.
> 
> 
>      I need to read some more before I can ask intelligent questions
>      (none of which I've made in this thread.) Let me see if I got this
>      right.
> 
>      A symbol is a symbol is a symbol, lexical or special. However many
>      of lexicals there may be by the same name, there is only one symbol
>      data structure. A symbol data structure is primarily a name string.
>      Symbol bindings associate a name string with a variety of values;
>      property list, function, data, etc.
> 
>      A lexical variable is a symbol binding, not a symbol itself. A
>      symbol is not a binding, it's a name.
> 
>      A bound lexical variable is a binding whose value is established
>      dynamically, that is at the time its binding form is invoked. Its
>      scope is lexical and its extent is dynamic in its defining form,
>      just like special bindings.
> 
>      A free lexical variable is a binding whose value is established
>      textually by an enclosing binding form at the time the form it is
>      free in was defined. Its scope is lexical and extent indefinite.
>      Its value at any one time depends on the current binding of the
>      free lexical in the environment.
> 
>      Environments hold bindings. Symbol tables hold symbols. The two
>      should not be confused.


I think this sounds right... So hopefully that helps. Essentials of
Programming Languages will give you a better view on the whole issue.
Lisp In Small Pieces also sounds like it would cover this but I haven't
read it.



>> In a lexical binding system, whenever you create a lambda that has a
>> free variable, you snag the current environment and remember it for all
>> time until the closure is GCd. The next time you come back to the same
>> spot, the environment you snag is a different one. Hence kent's
>> explanation.
> 
> 
>      In this sense, the value of a free lexical depends on the
>      environment at the time it is established by its enclosing form. An
>      enclosing form may alter a free variable, which is reflected in the
>      state of the environment at the time the form was defined. These
>      bindings are persistent while the closure is accessible.
> 
>      Closures may share a free lexical's environment, in which case one
>      closure's change of that variable will be seen by the other.
> 
>      Is that in the slightest bit correct?

Yes, by Jove I think you've got it!

(unless of course I have got it wrong too :-) ) hopefully someone else
will confirm this as well.

From: Thomas F. Burdick
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 00:23:34 +0000
Message-ID: <xcvvggm6rg9.fsf@conquest.OCF.Berkeley.EDU>

"Daniel Lakeland" <········@sil-no.spam-con.com> writes:

> In article <················@nvidia.com>, "jputnam" <·······@nvidia.com>
> wrote:
> 
> > Daniel Lakeland wrote:
> > 
> > 
> >> You might try getting a copy of "Essentials of Programming Languages"
> >> Friedman et al. ISBN 0262062178
> > 
> > 
> >      Exactly the suggestion I was looking for, thank you. Also thanks to
> >      Reini for the link to Knott's book.
> 
> Yes, Essentials is a nice book that will walk you through a lot of
> programming language theory, and it uses scheme as its implementation
> language.

As I get further and further into the compiler I'm writing (is fall
make-your-own-Lisp-system or something?), I'm *very* glad I'm writing
it in CL.  Scheme or C++ would make my life very difficult, since I
find myself taking advantage of most corners of the language.
Unfortunately, that means any aspirations of making my Lisp
self-hosting anytime soon have pretty much vanished.  Oh well,
compilers are hard problems, and I'm glad I'm using the most powerful
language I know to write mine.

 [...]
> A free variable is one that is used inside a form whose closest lexical
> binding doesn't define the form.
> 
> (lambda (x) (list a x)) is a function with one free variable, 'a'
> 
> If lambda is the only binding form you use (all others are macros
> expanding to lambda) then a variable is free with respect to a lambda when
> it appears inside a lambda but doesn't appear in the formal argument list
> of that lambda.

I guess it depends on the approach you're taking, but if I'd made
LAMBDA the only binding form in my system, LET would cons up closures
like crazy.  Of course, I handle ((lambda ... ) ...) by turning it
into

 (let ((#:CLOSURE112 #'(lambda ...)))
   (declare (dynamic-extent #:CLOSURE112))
   (funcall #:CLOSURE122 ...))

but unless you're going to handle ((lambda ...) ...) specially, and
differently from (funcall #'(lambda ...) ...), that's going to be the
case.  I see these two LAMBDAs as being different expressions, though,
so I don't really get the advantage of making ((lambda ...) ...)
primitive over LET.

 [...]
> >      That was sort of where I was going with this, I wanted to be able
> >      to write lisp code and have it work in standalone (either user or
> >      system) environments.
> 
> If this is your goal, I would be interested in participating in the design
> and implementation, assuming you are doing a GPL or similar DFSG Free
> license.
> 
> Though I would love to have a free Common Lisp implementation that
> supported multi-threaded standalone code, was processor independent,
> supported linkage with hand-assembly, and provided a restricted subset for
> use in non-gc kernel interrupt type code, I think there's enough work just
> to implement naive CL that makes CL a poor choice for this type of
> project. Roger Corman will probably chime in here as well :-)
> 
> 
> My idea was to use 3 passes,
> 
> 1) scheme superset -> scheme subset,
> 
> 2) scheme subset -> processor independent machine code, 
> 
> 3) processor independent code -> processor machine code.
> 
> 1,2 could be done in scheme.
> 
> 3 could be done in scheme, or in a GCC frontend.
> 
> 
> In order to do anything like this, you'd have to:
> 
> 1) Implement the (2) compilers in step 1, and 2.
> 2) Implement a runtime library, (ie. GC, allocation, type system, basic
> scheme functions etc)
> 3) Do a textual representation of your processor independent code,
> and a GCC frontend if you wanted to take that route., or do a
> hardware translation layer in scheme to make machine code from your
> processor independent code.
> 
> It's a lot of work. But I think the results would be a real asset to lisp
> hackers who want to do system-level coding in lisp. Especially for the
> HURD for example.

If nothing else, this would be a fun project.  If I were you, I'd
start it and see if people would help out (me, I've already got a Lisp
system to work on :).  Although I don't really see the advantage of
making it Scheme-based, rather than Lisp-based.  You could replace
"scheme subset" with "small Lisp dialect", and "scheme superset" with
"aspires to be Common Lisp", and you'd have a much more appealing
project, IMO.  Plus, then you could write your compilers in CL, and
hope to one day make them self-hosting :)

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Daniel Lakeland
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 01:53:09 +0000
Message-ID: <pan.2001.11.07.17.53.08.433.1090@sil-no.spam-con.com>

In article <···············@conquest.OCF.Berkeley.EDU>, "Thomas F.
Burdick" <···@conquest.ocf.berkeley.edu> wrote:

> "Daniel Lakeland" <········@sil-no.spam-con.com> writes:
> 
>> In article <················@nvidia.com>, "jputnam"
>> <·······@nvidia.com> wrote:
>> 
>> > Daniel Lakeland wrote:
>> > 

....

> I guess it depends on the approach you're taking, but if I'd made LAMBDA
> the only binding form in my system, LET would cons up closures like
> crazy.  Of course, I handle ((lambda ... ) ...) by turning it into
> 
>  (let ((#:CLOSURE112 #'(lambda ...)))
>    (declare (dynamic-extent #:CLOSURE112)) (funcall #:CLOSURE122 ...))
> 
> but unless you're going to handle ((lambda ...) ...) specially, and
> differently from (funcall #'(lambda ...) ...), that's going to be the
> case.  I see these two LAMBDAs as being different expressions, though,
> so I don't really get the advantage of making ((lambda ...) ...)
> primitive over LET.

Is this an interpreter or a compiler? It looks like an interpreter.

I believe that CMUCL makes lambda the only binding form. So 

(let ((a 1) (b 2)) ...body) => ((lambda (a b) body) 1 2)

which doesn't have to cons unless the body has free variables. Even then
I think it analyzes which closures can be captured and can treat this
kind of lambda specially since it immediately is called and then can
never be referenced again.

>> Though I would love to have a free Common Lisp implementation that
>> supported multi-threaded standalone code, was processor independent,
>> supported linkage with hand-assembly, and provided a restricted subset
>> for use in non-gc kernel interrupt type code, I think there's enough
>> work just to implement naive CL that makes CL a poor choice for this
>> type of project. Roger Corman will probably chime in here as well :-)
>> 
>> 
>> My idea was to use 3 passes,

....

>> It's a lot of work. But I think the results would be a real asset to
>> lisp hackers who want to do system-level coding in lisp. Especially for
>> the HURD for example.
> 
> If nothing else, this would be a fun project.  If I were you, I'd start
> it and see if people would help out (me, I've already got a Lisp system
> to work on :).  Although I don't really see the advantage of making it
> Scheme-based, rather than Lisp-based.  You could replace "scheme subset"
> with "small Lisp dialect", and "scheme superset" with "aspires to be
> Common Lisp", and you'd have a much more appealing project, IMO.  Plus,
> then you could write your compilers in CL, and hope to one day make them
> self-hosting :)

Yes. I agree. But for the moment I'm busy trying to get my own company
off the ground, and don't have resources to do this. I'm spending all my
programming effort on writing code in common lisp for my first
product/service. (not a compiler)

Still I'd love to see it. And I agree making it "aspire to be" CL would
be best.

Using an existing abstract machine with a backend code
generator would also be best.

Perhaps I'll look into that. Someone else might be willing to write the
CL -> subset part if I got into the subset -> abstract machine, and we
leave the abstract machine -> machine code to existing tools.

I considered using FORTH as the abstract machine part. I don't know if
there's a good FORTH-> x86 compiler or not though. Also FORTH -> IA64
might be a bitch to optimize.

From: Thomas F. Burdick
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 02:54:28 +0000
Message-ID: <xcvelna6kgr.fsf@conquest.OCF.Berkeley.EDU>

"Daniel Lakeland" <········@sil-no.spam-con.com> writes:

> In article <···············@conquest.OCF.Berkeley.EDU>, "Thomas F.
> Burdick" <···@conquest.ocf.berkeley.edu> wrote:
>
> > I guess it depends on the approach you're taking, but if I'd made LAMBDA
> > the only binding form in my system, LET would cons up closures like
> > crazy.  Of course, I handle ((lambda ... ) ...) by turning it into
> > 
> >  (let ((#:CLOSURE112 #'(lambda ...)))
> >    (declare (dynamic-extent #:CLOSURE112)) (funcall #:CLOSURE122 ...))
> > 
> > but unless you're going to handle ((lambda ...) ...) specially, and
> > differently from (funcall #'(lambda ...) ...), that's going to be the
> > case.  I see these two LAMBDAs as being different expressions, though,
> > so I don't really get the advantage of making ((lambda ...) ...)
> > primitive over LET.
> 
> Is this an interpreter or a compiler? It looks like an interpreter.

It's a compiler, why do you say it looks like an interpreter?

> I believe that CMUCL makes lambda the only binding form. So 
> 
> (let ((a 1) (b 2)) ...body) => ((lambda (a b) body) 1 2)
> 
> which doesn't have to cons unless the body has free variables. Even then
> I think it analyzes which closures can be captured and can treat this
> kind of lambda specially since it immediately is called and then can
> never be referenced again.

Well, yes (let ((a 1) (b 2)) (+ a b)) isn't going to cons, but

 (let ((a 0))
   (incf a)
   (let ((b 2))
     (+ a b)))

gets turned into

  ((lambda (a)
     (incf a)
     ((lambda (b)
        (+ a b)) 2))
   0)

which has a free reference in the inner lambda.  Now of course, you
can treat (lambda ...) forms in the first position of a list
differently than you treat other lambdas, and not cons a closure,
since it's impossible to get your hands on the would-be closure
(lambda (b) ...).  I just don't see how the weird special-case lambda
is more primitive than let.  It's probably one's hnistorical
perspective, but I just see it as an annoying historical special case.

 [...]
> > If nothing else, this would be a fun project.  If I were you, I'd start
> > it and see if people would help out (me, I've already got a Lisp system
> > to work on :).  Although I don't really see the advantage of making it
> > Scheme-based, rather than Lisp-based.  You could replace "scheme subset"
> > with "small Lisp dialect", and "scheme superset" with "aspires to be
> > Common Lisp", and you'd have a much more appealing project, IMO.  Plus,
> > then you could write your compilers in CL, and hope to one day make them
> > self-hosting :)
> 
> Yes. I agree. But for the moment I'm busy trying to get my own company
> off the ground, and don't have resources to do this. I'm spending all my
> programming effort on writing code in common lisp for my first
> product/service. (not a compiler)

Heh.  I have a sad suspicion that most people who want to build a
compiler, some day, don't ever get to.  Which is why I'm glad I found
a good reason/excuse.

> Still I'd love to see it. And I agree making it "aspire to be" CL would
> be best.
> 
> Using an existing abstract machine with a backend code
> generator would also be best.
> 
> Perhaps I'll look into that. Someone else might be willing to write the
> CL -> subset part if I got into the subset -> abstract machine, and we
> leave the abstract machine -> machine code to existing tools.

You never know.  If there were a usable Lisp dialect that could
compile code that was reasonable to use in a context like a linux
device driver, I'd probably actually be willing to write a device
driver if I found myself stuck with a device that wasn't supported.
And I'd probably add in features I wanted in the Lisp system.  Of
course, you wouldn't really want to build a CL system in such a
hack-on-top-of-a-hack kind of way.

> I considered using FORTH as the abstract machine part. I don't know if
> there's a good FORTH-> x86 compiler or not though. Also FORTH -> IA64
> might be a bitch to optimize.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Daniel Lakeland
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 03:30:18 +0000
Message-ID: <pan.2001.11.07.19.30.17.834.5362@sil-no.spam-con.com>

In article <···············@conquest.OCF.Berkeley.EDU>, "Thomas F.
Burdick" <···@conquest.ocf.berkeley.edu> wrote:

> "Daniel Lakeland" <········@sil-no.spam-con.com> writes:
> 
>> In article <···············@conquest.OCF.Berkeley.EDU>, "Thomas F.
>> Burdick" <···@conquest.ocf.berkeley.edu> wrote: Is this an interpreter
>> or a compiler? It looks like an interpreter.
> 
> It's a compiler, why do you say it looks like an interpreter?

Never mind, I just realized what you're doing. It was a little too
narrowly focused for me to see the big picture. You're obviously doing a
source to source transform before compiling to whatever your lower level
representation is.

>I
> just don't see how the weird special-case lambda is more primitive than
> let.  It's probably one's hnistorical perspective, but I just see it as
> an annoying historical special case.

It gets rid of let, so in that sense it is more primitive, because you
can't synthesize lambda from let... I guess people were looking for the
simplest possible source to have to compile.

> You never know.  If there were a usable Lisp dialect that could compile
> code that was reasonable to use in a context like a linux device driver,
> I'd probably actually be willing to write a device driver if I found
> myself stuck with a device that wasn't supported. And I'd probably add
> in features I wanted in the Lisp system.  Of course, you wouldn't really
> want to build a CL system in such a hack-on-top-of-a-hack kind of way.

Exactly. Or for example, I'd be willing to write a file system handler for
HURD in common lisp, or be willing to write an "exokernel" in common lisp
(see one of the groups at MIT  for their thoughts on exokernels).

I think it'd be great to write a kernel in common lisp that exposes the
hardware registers directly through some sort of "safe code" interface
(this is where the CL code migration comes in).

Then you could do something like run linux as a user-mode process on your
tiny lisp kernel, and get a useable UNIX system, and at the same time
write specialty drivers and soforth in lisp that ran as other user mode
processes.... You get all the advantages of co-opting linux, without
having to sacrifice lispability in the kernel.

blablabla... lots of cool projects in the world. many of them would be
nice to do in CL... I just wish someone had a DFSG free ANSI common lisp
that could do this kind of thing.

From: Thomas F. Burdick
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 04:24:38 +0000
Message-ID: <xcvadxx7uux.fsf@conquest.OCF.Berkeley.EDU>

"Daniel Lakeland" <········@sil-no.spam-con.com> writes:

> In article <···············@conquest.OCF.Berkeley.EDU>, "Thomas F.
> Burdick" <···@conquest.ocf.berkeley.edu> wrote:
> 
> > "Daniel Lakeland" <········@sil-no.spam-con.com> writes:
> > 
> >> In article <···············@conquest.OCF.Berkeley.EDU>, "Thomas F.
> >> Burdick" <···@conquest.ocf.berkeley.edu> wrote: Is this an interpreter
> >> or a compiler? It looks like an interpreter.
> > 
> > It's a compiler, why do you say it looks like an interpreter?
> 
> Never mind, I just realized what you're doing. It was a little too
> narrowly focused for me to see the big picture. You're obviously doing a
> source to source transform before compiling to whatever your lower level
> representation is.
> 
> >I
> > just don't see how the weird special-case lambda is more primitive than
> > let.  It's probably one's hnistorical perspective, but I just see it as
> > an annoying historical special case.
> 

> It gets rid of let, so in that sense it is more primitive, because you
> can't synthesize lambda from let... 

Okay, I'll buy that.

> I guess people were looking for the simplest possible source to have
> to compile.

This, however ... I believe that's what people were doing, but I found
it much easier to compile LET than it would have been to compile
((lambda ...) ...) forms in a reasonable manner.  Because if you treat
them like (function (lambda ...)) forms, you have to cons closures, so
that's not reasonable.

Thinking about it a bit more, though, I bet this is a legacy of
dynamic binding.  Because in a dynamically-scoped lisp, neither
  (let ((fn #'(lambda ...)))
    (funcall fn ...))
nor
  ((lambda ...) ...)
would create a closure, so you wouldn't need to treat the second in a
special manner to prevent consing.  In a situation like that, I could
see not wanting to deal with LET

 [...]
> Exactly. Or for example, I'd be willing to write a file system handler for
> HURD in common lisp, or be willing to write an "exokernel" in common lisp
> (see one of the groups at MIT  for their thoughts on exokernels).

Come to think of it, a Lisp system to do systems work in was a part of
the original description of the GNU system.  So, I'm sure it'll get
taken care of, right after the HURD's ready ;-)

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Tim Bradshaw
Subject: Re: On the implementation of lexical variables...
Date: Fri, 09 Nov 2001 13:03:09 +0000
Message-ID: <fbc0f5d1.0111090503.40a45c51@posting.google.com>

···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) wrote in message news:<···············@conquest.OCF.Berkeley.EDU>...
> 
> This, however ... I believe that's what people were doing, but I found
> it much easier to compile LET than it would have been to compile
> ((lambda ...) ...) forms in a reasonable manner.  Because if you treat
> them like (function (lambda ...)) forms, you have to cons closures, so
> that's not reasonable.

It's quite interesting to look at what implementations actually do.
In the process of writing some obscurified answer to a homework
question on cll I discovered that at least some compilers (I think
allegro and CMUCL were the ones I looked at) generate bitwise
identical code for at least ((lambda (...) ...) ...) forms to (let
(...) ...), and I think (but do not recall exactly) also for general
single-use LAMBDAs, so something like

(let ((l (lambda (...) ...))) ... (funcall l ...)) was equivalent to
(progn ... (let (...) ...))

I think that the point is that you already want to to analysis to
avoid consing a closure when you definitely don't need to and you also
want to do lifetime-analysis for bindings, and the combination of
these can show that the thing can be completely optimised away.  At
least that's what I think - I didn't look at the implementations just
the assembler they produced - perhaps an implementor can comment?

--tim

From: Thomas F. Burdick
Subject: Re: On the implementation of lexical variables...
Date: Fri, 09 Nov 2001 22:42:40 +0000
Message-ID: <xcvwv0z5zxb.fsf@conquest.OCF.Berkeley.EDU>

··········@tfeb.org (Tim Bradshaw) writes:

> (let ((l (lambda (...) ...))) ... (funcall l ...)) was equivalent to
> (progn ... (let (...) ...))
> 
> I think that the point is that you already want to to analysis to
> avoid consing a closure when you definitely don't need to and you also
> want to do lifetime-analysis for bindings, and the combination of
> these can show that the thing can be completely optimised away.  At
> least that's what I think - I didn't look at the implementations just
> the assembler they produced - perhaps an implementor can comment?

Ah yes, CMUCL is very much an eager sheepdog of a compiler (a very
good trait in a compiler, btw), and will go running around sniffing at
things you didn't necessarily ask it to, in an attempt to be helpful.
Generally, it doesn't do about anything the naive way (or when it
does, it's probably considered a bug).  I look through the cmucl
source, at the assembly it produces, and especially its documentation,
a lot.  In private e-mail, though, I'm prone to referring to 10 years
and unlimited grad students, when contrasting it to my compiler.
That's a great design decision, if you have the grad students to throw
at the problem to keep it from consing unnecessary closures.  At some
point, I want to be smarter about making closures.  Until that time,
though, I might as well do the obvious analyses myself.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Lieven Marchand
Subject: Re: On the implementation of lexical variables...
Date: Tue, 06 Nov 2001 21:57:42 +0000
Message-ID: <m3u1w7sgtl.fsf@localhost.localdomain>

····@domain.invalid writes:

>      Any suggestions for additional resources or comments to help me
>      figure this out?

Christian Queinnec: Lisp in Small Pieces

-- 
Lieven Marchand <···@wyrd.be>
She says, "Honey, you're a Bastard of great proportion."
He says, "Darling, I plead guilty to that sin."
Cowboy Junkies -- A few simple words

From: Reini Urban
Subject: Re: On the implementation of lexical variables...
Date: Wed, 07 Nov 2001 14:52:21 +0000
Message-ID: <3be947d0.445802009@news.tu-graz.ac.at>

>     Any suggestions for additional resources or comments to help me
>     figure this out?

SICP and "LiSP In Small Pieces" by Christian Queinnec are the best. 
These are by far more understandable than looking at the sources. 
BTW: CMUCL and Corman Lisp sources are IMHO better organized than CLISP.
Esp. Corman because it only has to support one platform and CMUCL's
optimizer is quite advanced.

"Anatomy of Lisp" implements only dynamic vars.

The postscript lispbook by gary knott is very simple, but covers your
binding questions in easily understandable form. just no implementation.

On the other books at http://www.lisp.org/table/books.htm#impl I cannot
comment. 
-- 
Reini Urban
http://tv.mur.at/film/

From: Roger Corman
Subject: Re: On the implementation of lexical variables...
Date: Wed, 07 Nov 2001 18:47:34 +0000
Message-ID: <3be97bbe.557360431@news.callatg.com>

On Tue, 06 Nov 2001 14:22:54 -0800, ····@domain.invalid wrote:

>     It was at this point that my head started to hurt and I reviewed
>     the literature. I went through www.lisp.org and comments in this
>     newsgroup, and have a line on a copy of SICP, which I believe may
>     help. This cannot be as complicated as I'm making it, but I'm kind
>     of stumped. I know I could go read the clisp source to see how they
>     do it, but I want to understand it first and I obviously don't.
>
>     Any suggestions for additional resources or comments to help me
>     figure this out?

As others have pointed out, Lisp In Small Pieces, by Christian Queinnec,  is an
amazing book which pretty well addresses all these things. it walks you through
the implementation of literallay dozens of lisp (or lisp-like) interpreters and
compilers, all simple, with each one demonstrating elegant ways to implement
lisp features (and some non-standard features).

Of course it still is, however, a leap from that to a real production-quality
system. But for understanding, I don't think you can beat it.

As to your comment about how complicated it seems, believe me, it is that
complicated. Closures may be conceptually simple, but efficient implementation
can make your head spin. The payoff is that once you have them working, they
enable all kinds of infrastructure, often using macros, which utilize their
capabilities.

You are welcome to look at the Corman Lisp source, which is included with the
release. Keep in mind, however, that there is no interpreter (only compilation)
and much of the code generator is written in C++. The compilation of special
variables, closures, unwind-protect, et. al. is very complex, and intel
processor specific. You may still find it useful.

The source for PowerLisp, which runs on Mac OS, is much simpler and implements
most of the core features of common lisp. It is an interpreter (the compiler is
an optional package, coded in lisp) written in C++. It is pretty portable. That
may be useful to look at.

When I first started writing lisp engines some people tried to discourage me.
One told me I hald built a boat in a basement, and it would never float. Another
said I should concentrate on leveraging existing systems by adding new
capabilities. The latter comment made a lot of sense.

I would never discourage anybody from building their own compiler--the people
who gravitate to lisp tend to be people who like that kind of thing. However, my
experience is that every step of the way turned out to be much more difficult
than it at first appeared. Overcoming each hurdle became a technical challenge
which I have come to love, which is why I cannot stop myself from continuing.

Doing it under a strict time budget, operating under typical scheduling
pressures, could be a nightmare. Like everything else, once you have already
done it once, it's easy to predict how long it will take. However, in software,
once you have done it once, there is no need to do it again (just reuse the
code).

In this case, I suggest limiting your functionality as much as possible is
probably key to success. Common Lisp is much too big a target (at least to begin
with).

Roger

From: ·······@nvidia.com
Subject: Re: On the implementation of lexical variables...
Date: Wed, 07 Nov 2001 20:09:31 +0000
Message-ID: <3BE994FB.6060203@nvidia.com>

Roger Corman wrote:

>
> As to your comment about how complicated it seems, believe me, it is that
> complicated. Closures may be conceptually simple, but efficient implementation
> can make your head spin. The payoff is that once you have them working, they
> enable all kinds of infrastructure, often using macros, which utilize their
> capabilities.


     Rats. I was hoping I was making it more complicated than it was.
     I'll work on efficiency once I get correctness.

     It seems to me that unwind-protect can be partially implemented with
     a closure. It does need to save and restore special bindings. I have
     to do the same thing for threads when I get them in.


> You are welcome to look at the Corman Lisp source, which is included with the
> release. Keep in mind, however, that there is no interpreter (only compilation)
> and much of the code generator is written in C++. The compilation of special
> variables, closures, unwind-protect, et. al. is very complex, and intel
> processor specific. You may still find it useful.


     Thank you. I actually have Corman lisp installed at home. It is the
     sole reason that machine runs Windows, and the only one of my
     systems that does.


> When I first started writing lisp engines some people tried to discourage me.
> One told me I hald built a boat in a basement, and it would never float. Another
> said I should concentrate on leveraging existing systems by adding new
> capabilities. The latter comment made a lot of sense.


     This little project started about 1987, after I picked up a copy of
     Winston, Horn, et al and thought "This looks elegant. I wonder if I
     could implement it usefully on a small machine?"

     It has taken me down some strange paths, but I'm gratified for the
     experience.


> In this case, I suggest limiting your functionality as much as possible is
> probably key to success. Common Lisp is much too big a target (at least to begin
> with).


     Sage advice.


     Canto kernel Lisp has approximately fifty functions and special
     forms, down from a couple of hundred. The complexity was killing me,
     and after reading what Gabriel and Baker had to say about new lisps
     I got the picture and tossed about ten thousand lines of code. I'm
     tired of writing this in C, I want to switch over to Lisp for added
     functionality.

From: Juanjo
Subject: Re: On the implementation of lexical variables...
Date: Thu, 08 Nov 2001 08:38:49 +0000
Message-ID: <ab4b7d4.0111080038.7f5dcd92@posting.google.com>

·······@nvidia.com wrote in message news:<················@nvidia.com>...
>      Rats. I was hoping I was making it more complicated than it was.
>      I'll work on efficiency once I get correctness.
>      It seems to me that unwind-protect can be partially implemented with
>      a closure. It does need to save and restore special bindings. I have
>      to do the same thing for threads when I get them in.

Since you are working rather the lambda way by implementing everything
on top of this special form, maybe it is not that useful. But perhaps
you will find it interesting to have a look at ECLS
(http://ecls.sourceforge.net). I has evolved from an interpreter based
on lists (code = list that has to be grovelled on every execution) to
a bytecodes interpreter. Here all environments are association lists
that keep the bindings for variables, macros and local functions. The
next generation (to be released in a few weeks) optimizes access to
this environment by computing the offset of each variable, function or
whatever at compile time. ECLS also has a Lisp->C translator which
comes back from ECoLisp, but I would not recommend studying its code
unless one really knows the guts of ECLS.

Juanjo