From: robbie carlton
Subject: Implementing Lisp in C?
Date: 
Message-ID: <32b5ef05.0404070605.2d99c869@posting.google.com>
Hi. I'm looking at doing a LIsp implementation, for educational
reasons. Most of it seems doable, but I'm wondering how to do
functions. Is it necessary to use assembly language to dynamically
create functions or can it be done in C? I know one way of doing it
would be using a virtual machine, but that seems prohibitively
expensive.

From: Robert Bruce Carleton
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <1078b6of9pr2f6f@corp.supernews.com>
I have a book called "Data Structures, An advanced approach using C" by 
Jeffrey Esakov and Tom Weiss.  It uses a simple lisp subset interpreter 
as an example of circular linked lists.  I've never tried it myself, but 
it might give you an idea of what's involved.

Best regards,

			--Bruce

robbie carlton wrote:
> Hi. I'm looking at doing a LIsp implementation, for educational
> reasons. Most of it seems doable, but I'm wondering how to do
> functions. Is it necessary to use assembly language to dynamically
> create functions or can it be done in C? I know one way of doing it
> would be using a virtual machine, but that seems prohibitively
> expensive.
From: Sam Steingold
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <uisgb336e.fsf@gnu.org>
> * robbie carlton <··············@ubgznvy.pbz> [2004-04-07 07:05:22 -0700]:
>
> I'm looking at doing a LIsp implementation, for educational reasons.

Sounds like a waste of time, something Forth afficionados do all the
time: creating half-baked incompatible partial implementations.

There are quite a few open source Lisp implementations, most in C, which
you can study for "educational reasons".  Even if you are dead set on
doing yet another one yourself, you should start with learning what and
how others did.

-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.honestreporting.com>
The only time you have too much fuel is when you're on fire.
From: Will Hartung
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <c51shk$2ohp86$1@ID-197644.news.uni-berlin.de>
"Sam Steingold" <···@gnu.org> wrote in message ··················@gnu.org...
> > * robbie carlton <··············@ubgznvy.pbz> [2004-04-07
07:05:22 -0700]:
> >
> > I'm looking at doing a LIsp implementation, for educational reasons.
>
> Sounds like a waste of time, something Forth afficionados do all the
> time: creating half-baked incompatible partial implementations.
>
> There are quite a few open source Lisp implementations, most in C, which
> you can study for "educational reasons".  Even if you are dead set on
> doing yet another one yourself, you should start with learning what and
> how others did.

There is much to be learned by building and reinventing your own wheels.
Reading others code for anything is not the same as writing it yourself. The
simple act of typing code into an editor is vastly more educating than
simply cutting and pasting that same code, even if the end result is the
same -- a bunch of source code in an editor window. It all depends on the
goals of the user.

Going through the code character by character makes your think about each
little piece in detail, however fleetingly (unless you're in "touch
typing"/transcription mode).

The final result may be half baked, and even incomplete, but far more
instructive to the user than the most polished system can ever be.

Regards,

Will Hartung
(·····@msoft.com)
From: Cameron MacKinnon
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <6eKdneVD3sg37-ndRVn-hA@golden.net>
Sam Steingold wrote:
>>* robbie carlton <··············@ubgznvy.pbz> [2004-04-07 07:05:22 -0700]:
>>
>>I'm looking at doing a LIsp implementation, for educational reasons.
> 
> 
> Sounds like a waste of time, something Forth afficionados do all the
> time: creating half-baked incompatible partial implementations.

Speaking as someone who has created a Forth which, at every design 
choice point, went with smaller rather than faster, and who learned a 
lot in the process, I can tell you it was time well wasted.

One learns a lot more about software by writing one's own mistakes than 
by reading someone else's.

-- 
Cameron MacKinnon
Toronto, Canada
From: Jeff Dalton
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <fx4brm35yji.fsf@todday.inf.ed.ac.uk>
··············@hotmail.com (robbie carlton) writes:

> Hi. I'm looking at doing a LIsp implementation, for educational
> reasons. Most of it seems doable, but I'm wondering how to do
> functions. Is it necessary to use assembly language to dynamically
> create functions or can it be done in C? I know one way of doing it
> would be using a virtual machine, but that seems prohibitively
> expensive.

You shouldn't need any assembler.  C has pointers to functions,
so you can have a heap-allocated struct that contains such a
pointer (the struct will be your lisp function), then when
calling the Lisp function, gab the pointer and call it.

Are you thinking of writing an interpreter, a compiler, or both?

An interpreter shouldn't present any problem, so far as
dynamically creating functions is concerned.  I've written
Lisp interpreters in Basic and Pascal, and they're both
more restrictive than C.

-- jd
From: Rob Warnock
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <XKSdnQ9i_s0ysujdRVn-iQ@speakeasy.net>
Jeff Dalton  <····@todday.inf.ed.ac.uk> wrote:
+---------------
| ··············@hotmail.com (robbie carlton) writes:
| > Most of it seems doable, but I'm wondering how to do
| > functions. Is it necessary to use assembly language to dynamically
| > create functions or can it be done in C?
| 
| You shouldn't need any assembler.  C has pointers to functions,
| so you can have a heap-allocated struct that contains such a
| pointer (the struct will be your lisp function), then when
| calling the Lisp function, gab the pointer and call it.
+---------------

However, note that on some machine architectures it is necessary
to explicitly flush the data cache and/or the instruction cache
(for at least the affected region) *after* putting the instructions
to be executed into the heap-allocated array but *before* calling
the function. Most operating systems provide a user-accessible call
to perform that function.


-Rob

p.s. For a concrete example of this sort of thing, look in the
CMUCL C sources of the image loader (in "cmucl-18e/src/lisp/")
for references to the routines "sanctify_for_execution()" and
"os_flush_icache()"...

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Duane Rettig
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <4d66hkucw.fsf@franz.com>
····@rpw3.org (Rob Warnock) writes:

> Jeff Dalton  <····@todday.inf.ed.ac.uk> wrote:
> +---------------
> | ··············@hotmail.com (robbie carlton) writes:
> | > Most of it seems doable, but I'm wondering how to do
> | > functions. Is it necessary to use assembly language to dynamically
> | > create functions or can it be done in C?
> | 
> | You shouldn't need any assembler.  C has pointers to functions,
> | so you can have a heap-allocated struct that contains such a
> | pointer (the struct will be your lisp function), then when
> | calling the Lisp function, gab the pointer and call it.
> +---------------
> 
> However, note that on some machine architectures it is necessary
> to explicitly flush the data cache and/or the instruction cache
> (for at least the affected region) *after* putting the instructions
> to be executed into the heap-allocated array but *before* calling
> the function. Most operating systems provide a user-accessible call
> to perform that function.

Ironically (because you're the one saying it), Irix is one of the
very _few_ operating systems to actually provide a documented
interface to flush a specified range of the icache.  Most of the
systems which require icache flushing either provide undocumented
support that has to be flushed (sic :-) out, or they provide none
at all and expect the stupid programmer (well, who would be so stupid
as to try to execute in data space? :-) to perform the flush, and
some architectures don't provide any way to flush anything but the
entire cache (what a waste, especially when the caches get larger).

So, for example,

SGI has cacheflush, which even has a man page.

AIX has a sync_cache_range, but I don't see it documented in my
man pages (perhaps it is, but it is not in the standard set on
my machine).

HP-PA forces you to use fdc and fic instructions to flush out
the data cache and instruction cache, although this does indeed
give you control over what cache lines you are flushing (i.e.,
like the above functions, it doesn't force you to fluch the
whole caches).

Alpha has a PAL instruction that flushes out the whole cache.

Sparc is (was) the worst of all; newer architectures (I think
those from V8 and later) have a flush instructon, but the
rub is that if you execute this instruction on a V7,
you will get an illegal-instruction trap.  Perhaps newer versions
of solaris now handle such a trap by ignoring it, but at the
time of the transition from sparc-2 to sparc-10 and 5, when there
were more of the former than the latter in the field, and when the
version of solaris at the time expected no such flush operations,
it was indeed painful determining what hardware we had so that
we could execute or not execute the flush instruction, as
appropriate...

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Rob Warnock
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <1NCdnYTB7_m1qurdRVn-tw@speakeasy.net>
Duane Rettig  <·····@franz.com> wrote:
+---------------
| ····@rpw3.org (Rob Warnock) writes:
| > However, note that on some machine architectures it is necessary
| > to explicitly flush the data cache and/or the instruction cache
| > (for at least the affected region) *after* putting the instructions
| > to be executed into the heap-allocated array but *before* calling
| > the function. Most operating systems provide a user-accessible call
| > to perform that function.
| 
| Ironically (because you're the one saying it), Irix is one of the
| very _few_ operating systems to actually provide a documented
| interface to flush a specified range of the icache.
+---------------

Yes, well, by the time I got to SGI I was already well-sensitized to
the issue, having just-previously consulted for several years for AMD
on a Unix port to the Am29000, which had *completely* separate I & D
memory busses! [...though the I & D busses shared a common address bus.]
Depending on the system design, it wasn't necessarily even *possible*
to execute data! [Think ROM-only I-space for embedded systems...]

The later Am29030 had only a single memory bus, but did have an 2-way
set-associative I-cache, evoking similar issues. So naturally when I got
to SGI I was pleased to discover that they had a documented interface
that worked across all their platforms (even when it was a no-op, as it
was on a few machines).

+---------------
| Most of the systems which require icache flushing either provide
| undocumented support that has to be flushed (sic :-) out, or they
| provide none at all and expect the stupid programmer (well, who would
| be so stupid as to try to execute in data space? :-) to perform the
| flush, and some architectures don't provide any way to flush anything
| but the entire cache (what a waste, especially when the caches get larger).
+---------------

Looking at the CMUCL source code, they seem to assume(!?!) that the
system call "mprotect(addr, len, PROT_READ|PROT_WRITE|PROT_EXEC)"
will flush the caches, which at least the FreeBSD "mprotect(2)" man
page does *not* guarantee. [In fact, it doesn't even promise page-level
granularity: "...protection changes may be as large as an entire region."]

It'd be "interesting" [to say the least!] to know of a system implementing
"mprotect()" where the caches *weren't* flushed...


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Gareth McCaughan
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <87u0zvv57l.fsf@g.mccaughan.ntlworld.com>
··············@hotmail.com (robbie carlton) writes:

> Hi. I'm looking at doing a Lisp implementation, for educational
> reasons. Most of it seems doable, but I'm wondering how to do
> functions. Is it necessary to use assembly language to dynamically
> create functions or can it be done in C? I know one way of doing it
> would be using a virtual machine, but that seems prohibitively
> expensive.

If you're making an interpreter (which I'm guessing you
are, because otherwise there are plenty of much more
difficult things) then all you have to do is to store
the function body (as list structure), the argument
list, and the environment. So:

    /* |form| is a cons cell with the argument list
     * as car and the function body as cdr.
     * |env| is an environment object, however you
     * choose to represent that.
     */
    Object * make_function(Object * form, Environment * env) {

      /* Allocate memory.
       */
      Function * result = new_function();

      /* Fill in the object.
       */
      result->arglist = car(form);
      result->env = env;
      result->kind = func_INTERPRETED;

      /* Convert to a Lisp object.
       */
      return box_pointer(result, function_type);
    }

    /* |func| is a function object. |args| is the
     * evaluated argument list, as a Lisp object.
     * |env| is the environment.
     */
    Object * apply(Function * func, Object * args,
                   Environment * env) {
      /* Allocate new environment object and link it
       * to its predecessor.
       */
      Environment new_env = new_environment();
      new_env->parent = func->env;

      /* Bind function arguments.
       */
      { Object * vars = func->arglist;
        while (vars && args) {
          bind(new_env, car(vars), var(args));
          vars = cdr(vars);
          args = cdr(args);
        }
      }

      /* Transfer control.
       */
      return eval(func->body, new_env);
    }

And, just by way of background:

    Object * eval(Object * form, Environment * env) {
      if (is_cons(form)) {
        Object * first = car(form);
        Object * args = cdr(form);
        if (!is_symbol(first)) /* error: return in some magical way */
        first = lookup_operator(first, env); /* just lookup(...) for a Lisp-1 */
        if (!first) /* error: return in some magical way */
        if (operator_evaluates_args(first)) args = eval_list(args, env);
        if (operator_is_primitive(first)) return apply_primitive(first, args, env);
        else return apply(first, args, env);
      }
      else if (is_immediate(form)) {
        if (is_symbol(form)) return lookup(form, env);
        else if (is_self_evaluating(form)) return form;
        /* ... */
      }
      else {
        /* ... */
      }
    }

    Object * lookup(Object * name, Environment * env) {
      while (env) {
        Binding * binding = env->bindings;
        while (binding) {
          if (binding->name == name) return binding->value;
          binding = binding->next;
        }
        env = env->parent;
      }
      /* error (unbound symbol): do something magic */
    }

    Object * bind(Environment * env, Object * name, Object * value) {
      Binding * b = new_binding();
      b->name = name;
      b->value = value;
      b->next = env->bindings;
      env->bindings = b;
    }

If your C compiler doesn't eliminate tail calls, you may want to
combine |eval| and |apply| and various other things into a single
big hairy function and transfer control around using goto so as
not to make your stack explode. But that's ugly.

If you can live without lexical scoping (as the first Lisp
implementations did -- but it's not recommended) then life
is even easier: a function object doesn't need to store its
environment, so there's no particular reason not to represent
it as plain ol' list structure -- say, as a LAMBDA form --
and then your interpreter can do the right thing when the
is_cons branch of eval finds that the "operator" is a list
whose car is LAMBDA.

The /* do something magic */ for errors isn't cheating; it's
just that there are various ways to do it. You could let
all those functions return some special error value and
check for it everywhere, you could use |longjmp|, you could
write in C++ instead of C and use exceptions.

I've used C structs for function and environment objects.
You could make them Lisp objects instead and, e.g.,
implement an environment as an "alist" ((name . val) (name . val) ...).
That would be a little less efficient than the above, but it
has some advantages your your self-educational use.

-- 
Gareth McCaughan
.sig under construc
From: Erann Gat
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <gNOSPAMat-0704041531560001@k-137-79-50-101.jpl.nasa.gov>
FWIW:

# l.py - A tiny interpreter for a lisp-like language with full
# lexical closures in Python
#
# Written by Erann Gat. contributed to the public domain

# The global environment

globalenv = {}

# A handy utility: turn a list of keys and values into a dictionary

def dictify(keys, vals):
  d = {}
  for i in range(len(keys)):
    d[keys[i]]=vals[i]
  return d

# Find the lexical frame where a variable is bound

def findframe(s, env):
  if type(env)==type({}):
    return env
  else:
    if env[0].get(s) != None:
      return env[0]
    else:
      return findframe(s, env[1])

# Set the value of a variable.

def set(s, val, env):
  findframe(s, env)[s] = val

# The interpreter proper

class closure:
  def __init__(self, env, name, args, body):
    self.env = env
    self.name = name
    self.args = args
    self.body = body
  def __repr__(self):
    return "<closure %s (%s) %s %s>" % (self.name, self.args, self.body, self.en
v)
  def apply(self, params):
    frame = dictify(self.args, params)
    frame[self.name] = self
    return ev(self.body, [frame, self.env])

class primop:
  def __init__(self, op, name=None):
    if name == None: name = op.__name__
    if not isCallable(op):
      raise '%s is not callable' % op
    self.op = op
    globalenv[name] = self
  def __repr__(self):
    return "<primop %s>" % self.op.__name__
  def apply(self, params):
    return apply(self.op, params)

def ev(l, env):
  if type(l)==type(''):
    return findframe(l, env)[l]
  elif type(l)==type([]):
    if len(l)==0:
      return l
    elif l[0]=='fn':
      return closure(env, l[1], l[2], l[3])
    elif l[0]=='quote':
      return l[1:]
    elif l[0]=='cond':
      for clause in l[1:]:
        if ev(clause[0], env):
          return ev(clause[1], env)
      return 0
    elif len(l)==3 and l[1]=='=':
      set(l[0], ev(l[2], env), env)
    else:
      l = map(lambda x, env=env: ev(x, env), l)
      try: f = l[0].apply
      except: raise '%s is not a function object' % l[0]
      return f(l[1:])
  else:
    return l

# That's it!  Now we need a parser because Python doesn't provide one

import re, string

def parse(s):
  return lparse(re.split('(\[|\])|[\\s+|,]', s))

def parseAtom(s):
  try:
    s = string.atoi(s)
  except ValueError:
    try:
      s = string.atof(s)
    except ValueError:
      pass
  return s

def lparse(l):
  result = []
  rstack = [result]
  for item in l:
    if item == None or item == '':
      continue
    elif item == '[':
      r1 = []
      rstack[0].append(r1)
      rstack = [r1] + rstack
    elif item == ']':
      if rstack == []:
        print "Ignoring extra right paren"
      rstack = rstack[1:]
    else:
      rstack[0].append(parseAtom(item))
  if len(rstack)>1:
    print "Providing %s missing right parens" % (len(rstack)-1)
  return result

def evl(s):
  return ev(parse(s), globalenv)

# Examples.  Note: FN is like LAMBDA except that it takes the name
# of the function as its first argument so it can be printed to help
# in debugging.

import operator
from operator import *
for op in [add, mul, sub, div, abs, mod]: primop(op)

def eql(x,y): return x==y

primop(eql, '==')

evl('[fn foo [x y] [add x y]] 2 3.3')
evl('fact = [fn fact [x] [cond [[== x 0] 1] [1 [mul x [fact [sub x 1]]]]]]')
evl('fact 3')

evl('''
  fib = [fn fib [x] [cond [[== x 0] 1]
                          [[== x 1] 1]
                          [1 [add [fib [sub x 1]]
                                  [fib [sub x 2]]]]]]
                                  ''')
From: André Thieme
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <c595fd$r8p$1@ulric.tng.de>
Erann Gat wrote:

> FWIW:
> 
> # l.py - A tiny interpreter for a lisp-like language with full
> # lexical closures in Python

Nice nice nice!


Andr�
--
From: Erann Gat
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <gNOSPAMat-1004041009460001@192.168.1.51>
In article <············@ulric.tng.de>, =?ISO-8859-1?Q?Andr=E9_Thieme?=
<······································@justmail.de> wrote:

> Erann Gat wrote:
> 
> > FWIW:
> > 
> > # l.py - A tiny interpreter for a lisp-like language with full
> > # lexical closures in Python
> 
> Nice nice nice!

Thanks!  I was beginning to wonder if anyone had noticed.

E.
From: Peter Lewerin
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <dbc03c5a.0404150145.2039b7e6@posting.google.com>
> Thanks!  I was beginning to wonder if anyone had noticed.

I liked it too, and as an experiment I translated it to Tcl:

# l.tcl - A tiny interpreter for a lisp-like language with full
# lexical closures in Tcl
#
# by Peter Lewerin, based on
#
# # l.py - A tiny interpreter for a lisp-like language with full
# # lexical closures in Python
#
# # Written by Erann Gat. contributed to the public domain

# Find the lexical frame where a variable is bound

proc findframe {s ns} {
	if {$ns eq "::"} {
		return $ns
	} else {
		if {[namespace eval $ns [list info exists $s]]} {
			return $ns
		} else {
			return [findframe $s [namespace parent $ns]]
		}
	}
}

# The interpreter proper

package require snit

snit::type closure {
	option -args
	option -body
	constructor args {
		$self configurelist $args
	}
	method __repr__ {} {
		return "<$type $self ([$self cget -args]) [$self cget -body] $selfns>"
	}
	method apply {s ns} {
		foreach a [$self cget -args] p [lrange $s 1 end] {
			namespace eval ${ns}::frame set $a [appraise $p $ns]
		}
		return [appraise [$self cget -body] ${ns}::frame]
	}
}

snit::type mathop {
	option -expr
	constructor args {
		$self configurelist $args
	}
	method __repr__ {} {
		return "<$type $self ([$self cget -expr])>"
	}
	method apply {params ns} {
		set s {}
		foreach term [lrange $params 1 end] {
			lappend s [appraise $term $ns]
		}
		return [expr [subst [$self cget -expr]]]
	}
}

proc appraise {s {ns ::}} {
	if {$s eq {}} {
		return
	} elseif {[llength $s] == 1} {
		set lookupFailed [catch {
			# try to find a variable named by s in a namespace from this to global
			namespace eval [findframe $s $ns] set $s
		} value]
		if {$lookupFailed} {
			# didn't find any, s must contain a literal
			return $s
		} else {
			return $value
		}
	}
	
	# s contains a list; try to evaluate is as an s-expr
	set op [lindex $s 0]
	set params [lrange $s 1 end]
	
	# pre-evaluate operator in case it's an expression or a variable.
	# disable this for more lisp-like behavior
	set op [appraise $op $ns]

	if {![catch {$op info type} t]} {
		if {$t eq "::closure" || $t eq "::mathop"} {
			# if the first element is a closure or mathop object, call its apply method
			return [$op apply $s $ns]
		}
	}

	# otherwise, try the "special forms" and the "general form"
	switch -- $op {
		fn {
			return [closure ${ns}[lindex $params 0] \
				-args [lindex $params 1] -body [lindex $params 2]]
		}
		cond {
			foreach clause $params {
				if {[appraise [lindex $clause 0] $ns]} {
					return [appraise [lindex $clause 1] $ns]
				}
			}
			return 0
		}
		default {
			foreach term $params {
				lappend op [appraise $term $ns]
			}
			return [eval $op]
		}
	}
}

proc echo s {
	puts "{$s} = [appraise $s]"
}

mathop + -expr {[join $s +]}
mathop - -expr {[lindex $s 0] - [join [lrange $s 1 end] +]}
mathop * -expr {[join $s *]}
mathop / -expr {double([lindex $s 0]) / [join [lrange $s 1 end] *]}
mathop = -expr {[lindex $s 0] == [lindex $s 1]}

echo {+ 2 3 4}
echo {fn foo {x y} {+ x y}}
echo {foo 2 3.3}
echo {{fn bar {x y} {+ x y}} 2 3.3}
echo {fn fact {x} {cond {{= x 0} 1} {1 {* x {fact {- x 1}}}}}}
echo {fact 3}
echo {
	fn fib {x} {
	  cond {{= x 0} 1}
	       {{= x 1} 1}
	       {1 {+ {fib {- x 1}}
	             {fib {- x 2}}}}}
}
echo {fib 5}
From: David Sletten
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <5u0dc.43947$Lq4.20003@twister.socal.rr.com>
robbie carlton wrote:

> Hi. I'm looking at doing a LIsp implementation, for educational
> reasons. Most of it seems doable, but I'm wondering how to do
> functions. Is it necessary to use assembly language to dynamically
> create functions or can it be done in C? I know one way of doing it
> would be using a virtual machine, but that seems prohibitively
> expensive.

The book 'Lisp in Small Pieces' discusses implementing Lisp/Scheme in C. 
There's also a book online here:
http://www.civilized.com/LispBook/

David Sletten
From: Lars Brinkhoff
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <85oeq3j8o8.fsf@junk.nocrew.org>
··············@hotmail.com (robbie carlton) writes:
> I'm looking at doing a LIsp implementation, for educational reasons.
> Most of it seems doable, but I'm wondering how to do functions.  Is
> it necessary to use assembly language to dynamically create
> functions or can it be done in C?

My guess is that your're asking about how to dynamically compile Lisp
functions into machine code.  One way to do that would be to emit C
code into a temporary file, compile it with a C compiler to a
dynamically loadable object file, and then load that.  Some Lisp
implementations do exactly that.

> I know one way of doing it would be using a virtual machine, but
> that seems prohibitively expensive.

Maybe not.

-- 
Lars Brinkhoff,         Services for Unix, Linux, GCC, HTTP
Brinkhoff Consulting    http://www.brinkhoff.se/
From: Jeff Dalton
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <fx4n05m76a6.fsf@todday.inf.ed.ac.uk>
Lars Brinkhoff <·········@nocrew.org> writes:

> ··············@hotmail.com (robbie carlton) writes:
> > I'm looking at doing a LIsp implementation, for educational reasons.
> > Most of it seems doable, but I'm wondering how to do functions.  Is
> > it necessary to use assembly language to dynamically create
> > functions or can it be done in C?
> 
> My guess is that your're asking about how to dynamically compile Lisp
> functions into machine code.  One way to do that would be to emit C
> code into a temporary file, compile it with a C compiler to a
> dynamically loadable object file, and then load that.  Some Lisp
> implementations do exactly that.

It's tricky though.  Perhaps there are utilities that make it
easier these days.  (Tell me if so, because I'd have a use for
them.)

Garbage collection can also be tricky, with obscure bugs.

If you just want to see what implementing the higher levels
of Lisp looks like, it would be easier to do it in Java.

-- jd
From: Cameron MacKinnon
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <j7adnfnkOOdAFOjdRVn-jw@golden.net>
Jeff Dalton wrote:
> Lars Brinkhoff <·········@nocrew.org> writes:
>>My guess is that your're asking about how to dynamically compile Lisp
>>functions into machine code.  One way to do that would be to emit C
>>code into a temporary file, compile it with a C compiler to a
>>dynamically loadable object file, and then load that.  Some Lisp
>>implementations do exactly that.
> 
> It's tricky though.  Perhaps there are utilities that make it
> easier these days.  (Tell me if so, because I'd have a use for
> them.)

If I were taking the Lisp->C->machine code approach, I'd be tempted to 
hack TCC, The Tiny C Compiler - http://fabrice.bellard.free.fr/tcc/
to just pass it C in a string and have it emit code to memory.

And with a few extra hours of hacking, you could have a REPL that 
accepts either Lisp or C, and they could call each other! Sick, sick, sick.

-- 
Cameron MacKinnon
Toronto, Canada
From: Romain Francoise
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <87ptahz5kx.fsf@orebokech.com>
Cameron MacKinnon <··········@clearspot.net> writes:

> If I were taking the Lisp->C->machine code approach, I'd be tempted to
> hack TCC, The Tiny C Compiler - http://fabrice.bellard.free.fr/tcc/
> to just pass it C in a string and have it emit code to memory.

Actually you don't even need to hack it at all; it comes with libtcc, a
library which does exactly that.

-- 
Romain Francoise <······@orebokech.com> | Shine the headlight, straight
it's a miracle -- http://orebokech.com/ | into my eyes.
From: Paul F. Dietz
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <w_6dnfhMgPrMPOvdRVn-ug@dls.net>
Romain Francoise wrote:
> Cameron MacKinnon <··········@clearspot.net> writes:
> 
> 
>>If I were taking the Lisp->C->machine code approach, I'd be tempted to
>>hack TCC, The Tiny C Compiler - http://fabrice.bellard.free.fr/tcc/
>>to just pass it C in a string and have it emit code to memory.
> 
> 
> Actually you don't even need to hack it at all; it comes with libtcc, a
> library which does exactly that.

You might also be interested in

   vcode          http://www.pdos.lcs.mit.edu/~engler/
                  http://www.pdos.lcs.mit.edu/~engler/pldi96-abstract.html
   `C (Tick C)    http://www.pdos.lcs.mit.edu/tickc/
   GNU Lightning  http://www.gnu.org/software/lightning/
   DyC            http://www.cs.washington.edu/research/dyncomp/

It would be interesting if the CLISP project were to use a portable
dynamic code generation system like GNU Lightning to further speed
up their system.

	Paul
From: Ray Dillinger
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <40759A68.C6F72E33@sonic.net>
Jeff Dalton wrote:
 
> Garbage collection can also be tricky, with obscure bugs.

Actually, a simple and reasonably efficient generational 
garbage collector can be written in under a hundred lines 
of C.  I know, because I wrote one. 

It requires a 3-word "header" for all boxed objects, though, 
so even cons cells wind up occupying 5 32-bit words.  What 
I figured was "get it working before worrying about complications
that make it more space-efficient."

Would you like to see the code?

				Bear
From: Michael Walter
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <c547og$2oh7f2$1@ID-88904.news.uni-berlin.de>
Ray Dillinger wrote:
> Jeff Dalton wrote:
>  
> 
>>Garbage collection can also be tricky, with obscure bugs.
> 
> 
> Actually, a simple and reasonably efficient generational 
> garbage collector can be written in under a hundred lines 
> of C.  I know, because I wrote one. 
> 
> It requires a 3-word "header" for all boxed objects, though, 
> so even cons cells wind up occupying 5 32-bit words.  What 
> I figured was "get it working before worrying about complications
> that make it more space-efficient."
> 
> Would you like to see the code?
> 
> 				Bear
Yes, we would like to :)

Cheers,
Michael
From: Jeff Dalton
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <fx47jwq7yuh.fsf@tarn.inf.ed.ac.uk>
Ray Dillinger <····@sonic.net> writes:

> Jeff Dalton wrote:
>  
> > Garbage collection can also be tricky, with obscure bugs.
> 
> Actually, a simple and reasonably efficient generational 
> garbage collector can be written in under a hundred lines 
> of C.  I know, because I wrote one. 
> 
> It requires a 3-word "header" for all boxed objects, though, 
> so even cons cells wind up occupying 5 32-bit words.  What 
> I figured was "get it working before worrying about complications
> that make it more space-efficient."
> 
> Would you like to see the code?

Definitely.

I don't think it has to be hard to write a garbage collector.

If the Lisp implementation keeps all live pointers in easy-to-find
places, it can even be written without knowing any C hackery
such as how to look at C's stack.

However, bugs can be hard to track down; and things can work
in many cases but fail in unusual ones that are hard to
reporduce; and so on.

-- jd
From: Ari Johnson
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <Oprdc.19255$Vo.13274@fed1read03>
Jeff Dalton wrote:
> Ray Dillinger <····@sonic.net> writes:
> 
> 
>>Jeff Dalton wrote:
>> 
>>
>>>Garbage collection can also be tricky, with obscure bugs.
>>
>>Actually, a simple and reasonably efficient generational 
>>garbage collector can be written in under a hundred lines 
>>of C.  I know, because I wrote one. 
>>
>>It requires a 3-word "header" for all boxed objects, though, 
>>so even cons cells wind up occupying 5 32-bit words.  What 
>>I figured was "get it working before worrying about complications
>>that make it more space-efficient."
>>
>>Would you like to see the code?
> 
> 
> Definitely.
> 
> I don't think it has to be hard to write a garbage collector.
> 
> If the Lisp implementation keeps all live pointers in easy-to-find
> places, it can even be written without knowing any C hackery
> such as how to look at C's stack.
> 
> However, bugs can be hard to track down; and things can work
> in many cases but fail in unusual ones that are hard to
> reporduce; and so on.
> 
> -- jd

I have a mark-and-sweep type of GC in about 100 lines of C (113 with 
blank lines for formatting) in my ALisp interpreter.  It was essentially 
bug-free from the first time I typed in the code, and requires a grand 
total of 1 bit per 2-word cell to do its work.  It's also not 
horrendously slow. :)
From: Jeff Dalton
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <fx4u0zr4tf5.fsf@tarn.inf.ed.ac.uk>
Ari Johnson <·····@hotmail.com> writes:

> Jeff Dalton wrote:
> > Ray Dillinger <····@sonic.net> writes:
> >
> >>Jeff Dalton wrote:
> >>
> >>>Garbage collection can also be tricky, with obscure bugs.
> >>
> >> Actually, a simple and reasonably efficient generational garbage
> >> collector can be written in under a hundred lines of C.  I know,
> >> because I wrote one. It requires a 3-word "header" for all boxed
> >> objects, though, so even cons cells wind up occupying 5 32-bit
> >> words.  What I figured was "get it working before worrying about
> >> complications that make it more space-efficient."

> > I don't think it has to be hard to write a garbage collector.
> > If the Lisp implementation keeps all live pointers in easy-to-find
> > places, it can even be written without knowing any C hackery
> > such as how to look at C's stack.

> > However, bugs can be hard to track down; and things can work
> > in many cases but fail in unusual ones that are hard to
> > reporduce; and so on.

> I have a mark-and-sweep type of GC in about 100 lines of C (113 with
> blank lines for formatting) in my ALisp interpreter.  It was
> essentially bug-free from the first time I typed in the code, and
> requires a grand total of 1 bit per 2-word cell to do its work.  It's
> also not horrendously slow. :)

I wrote a similar-sounding GC at one point, though not in C.
It changed pointers as it went to avoid needing a stack,
following an algorithm from Knuth Vol 1.  It had one tough bug
I can remember, somewhere in the part that found the roots
of all live pointers.  But it hardly ever happened, making
it tricky to find.

So I know a GC can work out fairly well; but I've worked on
a number of Lisps over the years, and GC bugs were among the
most difficult.

If someone is interested in all the low-level parts of Lisp
implementation, then they definitely should write a GC.  But
otherwise, they can use an existing conservative collector or
else use a language that has GC built-in, and they might prefer
to do so.

BTW, I am curious about how people handled the problem of
finding live pointers in their Lisp-in-Cs.

A fairly common technique, even in non-toy implementations,
is to use a separate stack for pointers to Lisp objects, but
that seems inelegant.  Using C's own stack puts you in the
realm of conservative GC.

-- jd
From: Joe Marshall
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <1xmv4p8z.fsf@comcast.net>
Jeff Dalton <····@tarn.inf.ed.ac.uk> writes:

> BTW, I am curious about how people handled the problem of
> finding live pointers in their Lisp-in-Cs.
>
> A fairly common technique, even in non-toy implementations,
> is to use a separate stack for pointers to Lisp objects, but
> that seems inelegant.  Using C's own stack puts you in the
> realm of conservative GC.

For Rebol, I used a hybrid system.  The main heap was handled by
Boehm's conservative GC.  I used the `Cheney on the MTA' technique for
the stack, so the stack was used as a `nursery' for continuations.
When the stack overflowed, it was necessary to copy the active parts
of the call chain into the heap prior to resetting the stack pointer.
When allocating a continuation on the stack, the first cell was
initialized with the address of a routine that would copy the
continuation and (recursively) its contents to the heap.  Therefore,
stack overflow was handled by an indirect jump to the first slot of
the current continuation.  This would evacuate the topmost frame and
recursively call via first cell on the containing continuations (the
call chain usually had two parallel continuations).  When the stack
was initialized, the topmost continuation first slot was set to a
routine that would finish the collection and longjmp to reset the
stack.  These `micro-gcs' were very fast and could run several dozen
times a second.

-- 
~jrm
From: Ray Dillinger
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <40785B73.7612AA99@sonic.net>
Jeff Dalton wrote:
 
> If someone is interested in all the low-level parts of Lisp
> implementation, then they definitely should write a GC.  But
> otherwise, they can use an existing conservative collector or
> else use a language that has GC built-in, and they might prefer
> to do so.
> 
> BTW, I am curious about how people handled the problem of
> finding live pointers in their Lisp-in-Cs.

I was implementing a lisp->c compiler (small-L lisp, not Common 
Lisp) first in C then in lisp.  What I did was to pack the pointers 
to the front of each object.  16 bits of my GC header is a count of 
how many pointers are in an object, and then as far as the GC is 
concerned each boxed object, no matter what else it is, is simply 
an array of pointers with some irrelevant data at the end. This 
means I didn't need to keep type information (pointer counts and 
maps) around in a lookup table where I could find it from the 
typetag; it meant bigger header blocks on boxed objects, but it 
also means fewer cache stalls since I don't have to look something 
up in a separate table. 

Call frames (the "stack") are allocated on the heap, along with 
everything else, and I just mark the "rootset" bit in the GC header 
when I enter the scope and clear it when I exit the scope.  This 
is because stack frames may be saved even when they aren't rootset 
by being pointed to from a continuation. Each stack frame also points
to its parent stack frame, (the dynamic pointer) and its lexical 
scope object (the static pointer) and each lexical scope object
has a pointer to its parent lexical scope object.  This means I 
only need one stack frame in my root set to find all live pointers.
In practice, there are other objects I keep around so I don't have 
to traverse them, such as unmutated global functions and the 
datatables they contain. 

Every allocated object is part of a linked list ("soft" pointers 
in the header) so the garbage collector can always find everything.  
The GC just takes something from one list, processes it, and moves 
it to another list.  Whenever the "to-visit" list becomes empty, 
the "unvisited" list is renamed the "garbage" list, the "visited" 
list is renamed the "unvisited" list, and the "root objects" list 
is renamed the "to-visit" list. 

Every time the GC is invoked, if the "garbage" list is nonempty,
I take something off and free it.  If the "to-visit" list is nonempty,
I traverse its pointers moving anything that it points to which is 
in the "unvisited" list to the "to-visit" list. Then I check its 
header to see if it's a root object. If so, it goes back in the 
"root" list, and if not, it goes into the "visited" list. And if 
both the garbage and to-visit lists are empty, then the current 
GC is over and we start again by moving the list heads around.
(when this happens, the GC swizzles a set of numeric list 
identifiers, so that the list identifier nybble in each GC header 
continues to correspond to its new list).

The mutator always allocates new objects on the "to-visit" list, 
and whenever it changes an object that's currently on the "unvisited" 
list it moves it to the "to-visit" list. 

The way the GC works, finding the "root" pointers is automatic; 
they're just the data pointers in the nodes whose "root" bit is 
set, and you find those nodes by traversing the "to-visit" list 
through the pointers in the GC header. 

One consequence of having the pointer-count be a 16-bit value was 
that arrays and vectors longer than 65535 elements have to be 
allocated in chunks rather than in one contiguous gulp; but since 
the GC is supposed to be incremental and interleaved anyway, I 
didn't want noticeable pauses as it traversed million-element 
arrays. Making big arrays into 2-layered trees is a small 
performance hit that results in a constant-factor slowdown when 
very large arrays are in use.  

It's very simple.  It's also portable, incremental, precise, 
noncopying, well-behaved since it allows the heap to grow and 
shrink normally as the program uses more or less memory, and, 
generally, adequate.  There are faster collectors, but this 
list-based mark&sweep just has a lot of nice properties that 
most faster collectors don't have. 

The GC header is: a 32-bit next pointer, a 32-bit last pointer, 
a 16-bit pointer count, a 3-bit GC list identifier and the 
rootset bit, and a 12-bit typetag.  

				Bear
From: Rob Warnock
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <7vednTOErMZrHOvd4p2dnA@speakeasy.net>
Ray Dillinger  <····@sonic.net> wrote:
+---------------
| Jeff Dalton wrote:
| > Garbage collection can also be tricky, with obscure bugs.
| 
| Actually, a simple and reasonably efficient generational 
| garbage collector can be written in under a hundred lines 
| of C.  I know, because I wrote one. 
| 
| It requires a 3-word "header" for all boxed objects, though, 
| so even cons cells wind up occupying 5 32-bit words.
+---------------

If you allocate memory in "hunks", as the Train Algorithm does or
classic BiBoP (or a number of others), you can store the generation
information in the per-hunk metadata, and the per-object overhead
goes away. You should be able to get back to a 1-word header for
most heap objects, and if you use a low-bits pointer-tagging method
a small class of heap objects (cons cells are a obvious type to include)
will need *no* header [while the rest will still need a 1-word header].

Also note that card-marking software write barriers co-exist nicely
with "hunk" allocation (e.g., 256-byte cards in 64 KB hunks).


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Lars Brinkhoff
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <851xmyjst8.fsf@junk.nocrew.org>
Jeff Dalton <····@todday.inf.ed.ac.uk> writes:
> Lars Brinkhoff <·········@nocrew.org> writes:
> > emit C code into a temporary file, compile it with a C compiler to
> > a dynamically loadable object file, and then load that.
> It's tricky though.  Perhaps there are utilities that make it easier
> these days.  (Tell me if so, because I'd have a use for them.)

It's quite easy on the typical ELF platform.  Just write the C file,
call the C compiler with appropriate flag to generate an .so file, and
use dlopen, dlsym, etc to load and access it.

-- 
Lars Brinkhoff,         Services for Unix, Linux, GCC, HTTP
Brinkhoff Consulting    http://www.brinkhoff.se/
From: Erann Gat
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <gNOSPAMat-0804042327320001@192.168.1.51>
In article <··············@junk.nocrew.org>, Lars Brinkhoff
<·········@nocrew.org> wrote:

> Jeff Dalton <····@todday.inf.ed.ac.uk> writes:
> > Lars Brinkhoff <·········@nocrew.org> writes:
> > > emit C code into a temporary file, compile it with a C compiler to
> > > a dynamically loadable object file, and then load that.
> > It's tricky though.  Perhaps there are utilities that make it easier
> > these days.  (Tell me if so, because I'd have a use for them.)
> 
> It's quite easy on the typical ELF platform.  Just write the C file,
> call the C compiler with appropriate flag to generate an .so file, and
> use dlopen, dlsym, etc to load and access it.

That works nicely until you want to change the definition of a function
(as opposed to defining a new function).

E.
From: Lars Brinkhoff
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <85wu4pimf0.fsf@junk.nocrew.org>
·········@flownet.com (Erann Gat) writes:
> > It's quite easy on the typical ELF platform.  Just write the C file,
> > call the C compiler with appropriate flag to generate an .so file, and
> > use dlopen, dlsym, etc to load and access it.
> That works nicely until you want to change the definition of a function
> (as opposed to defining a new function).

I haven't tested it, so I don't know for sure, but I believe it could
work something like this:

  new_object = dlopen ("file.so", RTLD_NOW);
  new_function = dlsym (new_object, "fn");
  set_symbol_function (find_symbol ("fn", package),
                       make_lisp_function (new_function));

-- 
Lars Brinkhoff,         Services for Unix, Linux, GCC, HTTP
Brinkhoff Consulting    http://www.brinkhoff.se/
From: Erann Gat
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <gNOSPAMat-0904041916350001@192.168.1.51>
In article <··············@junk.nocrew.org>, Lars Brinkhoff
<·········@nocrew.org> wrote:

> ·········@flownet.com (Erann Gat) writes:
> > > It's quite easy on the typical ELF platform.  Just write the C file,
> > > call the C compiler with appropriate flag to generate an .so file, and
> > > use dlopen, dlsym, etc to load and access it.
> > That works nicely until you want to change the definition of a function
> > (as opposed to defining a new function).
> 
> I haven't tested it, so I don't know for sure, but I believe it could
> work something like this:
> 
>   new_object = dlopen ("file.so", RTLD_NOW);
>   new_function = dlsym (new_object, "fn");
>   set_symbol_function (find_symbol ("fn", package),
>                        make_lisp_function (new_function));

Doesn't work, at least not under Linux.  The second time you dlopen the
"same" file nothing happens.  (It even says so in the man page.)

E.
From: Duane Rettig
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <4vfk8348r.fsf@franz.com>
·········@flownet.com (Erann Gat) writes:

> In article <··············@junk.nocrew.org>, Lars Brinkhoff
> <·········@nocrew.org> wrote:
> 
> > ·········@flownet.com (Erann Gat) writes:
> > > > It's quite easy on the typical ELF platform.  Just write the C file,
> > > > call the C compiler with appropriate flag to generate an .so file, and
> > > > use dlopen, dlsym, etc to load and access it.
> > > That works nicely until you want to change the definition of a function
> > > (as opposed to defining a new function).
> > 
> > I haven't tested it, so I don't know for sure, but I believe it could
> > work something like this:
> > 
> >   new_object = dlopen ("file.so", RTLD_NOW);
> >   new_function = dlsym (new_object, "fn");
> >   set_symbol_function (find_symbol ("fn", package),
> >                        make_lisp_function (new_function));
> 
> Doesn't work, at least not under Linux.  The second time you dlopen the
> "same" file nothing happens.  (It even says so in the man page.)

The "same" file means the inode is the same.  When you rebuild
the .so, it becomes a "different" .so.  Some operating systems force
you to rename the old .so before creating a new one.  You can
use mv for that.

Be sure to dlclose() the handle first, to release and gc the old
inode and so that the dlopen will grab the new file.

As far as I know, Windows is the only system that actually doesn't let you
do this, and the reason is that it won't allow you to move an active .dll

[Some systems have different names for dlopen/dlclose, but all have
something with similar functionality]

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Erann Gat
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <gNOSPAMat-0904042248210001@192.168.1.51>
In article <·············@franz.com>, Duane Rettig <·····@franz.com> wrote:

> ·········@flownet.com (Erann Gat) writes:
> 
> > In article <··············@junk.nocrew.org>, Lars Brinkhoff
> > <·········@nocrew.org> wrote:
> > 
> > > ·········@flownet.com (Erann Gat) writes:
> > > > > It's quite easy on the typical ELF platform.  Just write the C file,
> > > > > call the C compiler with appropriate flag to generate an .so file, and
> > > > > use dlopen, dlsym, etc to load and access it.
> > > > That works nicely until you want to change the definition of a function
> > > > (as opposed to defining a new function).
> > > 
> > > I haven't tested it, so I don't know for sure, but I believe it could
> > > work something like this:
> > > 
> > >   new_object = dlopen ("file.so", RTLD_NOW);
> > >   new_function = dlsym (new_object, "fn");
> > >   set_symbol_function (find_symbol ("fn", package),
> > >                        make_lisp_function (new_function));
> > 
> > Doesn't work, at least not under Linux.  The second time you dlopen the
> > "same" file nothing happens.  (It even says so in the man page.)
> 
> The "same" file means the inode is the same.  When you rebuild
> the .so, it becomes a "different" .so.

That's what I thought too, but empirically this is not so (no pun
intended).  Either rebuilding the .so somehow results in the file having
the same inode as before, or dlopen judges "sameness" by path, not inode.

> Some operating systems force
> you to rename the old .so before creating a new one.  You can
> use mv for that.

Doesn't help.

> Be sure to dlclose() the handle first, to release and gc the old
> inode and so that the dlopen will grab the new file.

That is probably the key.  But this is problematic because now you have to
keep track for every function that you load which file it came from.  When
you redefine a function you have to close its corresponding library, and
then make sure that *all* of the functions defined in that library get
restored in the new version.  It's a titanic pain in the barumpus.  It
would probably be easier to get the code for dlopen and hack it to do the
Right Thing.

E.
From: Duane Rettig
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <4r7uw2ub7.fsf@franz.com>
·········@flownet.com (Erann Gat) writes:

> In article <·············@franz.com>, Duane Rettig <·····@franz.com> wrote:
> 
> > ·········@flownet.com (Erann Gat) writes:
> > 
> > > In article <··············@junk.nocrew.org>, Lars Brinkhoff
> > > <·········@nocrew.org> wrote:
> > > 
> > > > ·········@flownet.com (Erann Gat) writes:
> > > > > > It's quite easy on the typical ELF platform.  Just write the C file,
> > > > > > call the C compiler with appropriate flag to generate an .so file, and
> > > > > > use dlopen, dlsym, etc to load and access it.
> > > > > That works nicely until you want to change the definition of a function
> > > > > (as opposed to defining a new function).
> > > > 
> > > > I haven't tested it, so I don't know for sure, but I believe it could
> > > > work something like this:
> > > > 
> > > >   new_object = dlopen ("file.so", RTLD_NOW);
> > > >   new_function = dlsym (new_object, "fn");
> > > >   set_symbol_function (find_symbol ("fn", package),
> > > >                        make_lisp_function (new_function));
> > > 
> > > Doesn't work, at least not under Linux.  The second time you dlopen the
> > > "same" file nothing happens.  (It even says so in the man page.)
> > 
> > The "same" file means the inode is the same.  When you rebuild
> > the .so, it becomes a "different" .so.
> 
> That's what I thought too, but empirically this is not so (no pun
> intended).  Either rebuilding the .so somehow results in the file having
> the same inode as before, or dlopen judges "sameness" by path, not inode.

Well, a non-buggy linker would refuse to replace an in-use .so directly.
We generally have our makefiles move the old one out of the way before
building a new one.  The only times this doesn't work is if there are
old zombie processes (for whatever reason) that are hanging on to any
of the old .so files so that the rename fails.

> > Some operating systems force
> > you to rename the old .so before creating a new one.  You can
> > use mv for that.
> 
> Doesn't help.

We tend to have no problem with this.

> > Be sure to dlclose() the handle first, to release and gc the old
> > inode and so that the dlopen will grab the new file.
> 
> That is probably the key.  But this is problematic because now you have to
> keep track for every function that you load which file it came from.  When
> you redefine a function you have to close its corresponding library, and
> then make sure that *all* of the functions defined in that library get
> restored in the new version.  It's a titanic pain in the barumpus.  It
> would probably be easier to get the code for dlopen and hack it to do the
> Right Thing.

Correct.  And I know we're talking about more-or-less direct calls to C
in this thread, but I tend to think more in terms of our own
foreign-function interface, which goes the extra distance to keep track
of the foreign functions it knows about in what we call "entry-vecs" -
dlopen calls are done indirectly and kept track of, and then if the
library changes and is reloaded, the interface automatically unloads,
reloads, and re-fills entry addresses and handles in the entry-vecs;
it makes for a vrey lispy interface.  And the fact that the
ff:def-foreign-call and the cl:load call are separate means that
you can do something like this:

CL-USER(2): (ff:def-foreign-call add2 ((x :int) (y :int)))
ADD2
CL-USER(3): (add2 10 20)
Error: Attempt to call #("add2" 1074540579 0 2 1074540579) for which
       the definition has not yet been (or is no longer) loaded.
  [condition type: SIMPLE-ERROR]

Restart actions (select using :continue):
 0: Return to Top Level (an "abort" restart).
 1: Abort entirely from this (lisp) process.
[1] CL-USER(4): :ld test/c/add2.so
; Foreign loading test/c/add2.so.
[1] CL-USER(5): :prt
CL-USER(6): (ADD2 10 20) ;; :prt evaluation
30
CL-USER(7): :ld test/c/add2.so
; Foreign loading test/c/add2.so.
CL-USER(8): (add2 30 40)
70
CL-USER(9): 

Note how I reloaded add2.so, (even though I hadn't changed the
library - if I had made a change the behavior would have been
what was intuitively expected).

And, playing around with some internals:

CL-USER(9): (ff:get-entry-point "add2")
1073824504
134560512
CL-USER(10): (db::get-c-name-from-pc 1073824504)
"add2+0"
CL-USER(11): (ff:unload-foreign-library "test/c/add2.so")
#p"test/c/add2.so"
CL-USER(12): (ff:get-entry-point "add2")
NIL
NIL
CL-USER(13): (ff:get-entry-point "add2" :return-missing-stub-address t)
1074540579
NIL
CL-USER(14): (db::get-c-name-from-pc 1074540579)
"lisp_missing_stub+0"
CL-USER(15): :ld test/c/add2.so
; Foreign loading test/c/add2.so.
CL-USER(16): (ff:get-entry-point "add2")
1073824504
134562320
CL-USER(17): 

In this last case the operating system just happens to have loaded
the .so in the same place as it had been loaded before.  Notice,
however, that the handle (the second value returned from
get-entry-point) is in fact different for the last loading than for
the next-to-last loading of the .so file.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Duane Rettig
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <4n05k2u0k.fsf@franz.com>
Duane Rettig <·····@franz.com> writes:

> > That's what I thought too, but empirically this is not so (no pun
> > intended).  Either rebuilding the .so somehow results in the file having
> > the same inode as before, or dlopen judges "sameness" by path, not inode.
> 
> Well, a non-buggy linker would refuse to replace an in-use .so directly.

Forgot to mention: that the usual error syndrome for this kind of
thing is a "text busy" errno returned by the linker.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Rob Warnock
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <iL-dnTpOUdceT-rd4p2dnA@speakeasy.net>
Erann Gat <·········@flownet.com> wrote:
+---------------
|  Duane Rettig <·····@franz.com> wrote:
| > Be sure to dlclose() the handle first, to release and gc the old
| > inode and so that the dlopen will grab the new file.
| 
| That is probably the key.  But this is problematic because now you have
| to keep track for every function that you load which file it came from.
| When you redefine a function you have to close its corresponding library,
| and then make sure that *all* of the functions defined in that library get
| restored in the new version.  It's a titanic pain in the barumpus.
+---------------

Somebody posted some code here just recently to do *exactly* that...
Now where was it...?  *AHA!!*  Here it is:

  Newsgroups: comp.lang.lisp
  From: Eric Marsden <········@laas.fr>
  Subject: Re: does CMUCL garbage-collect object code?
  Date: Wed, 31 Mar 2004 11:48:38 +0200
  Message-ID: <···············@melbourne.laas.fr>

or:

  <http://www.google.com/groups?as_umsgid=wzin05x9wjt.fsf%40melbourne.laas.fr>

The function RELOAD-SHARED-LIBRARIES, which he cheerfully admits is
CMUCL-specific and nowhere near "supported".


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Duane Rettig
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <4r7uvrd67.fsf@franz.com>
····@rpw3.org (Rob Warnock) writes:

> Erann Gat <·········@flownet.com> wrote:
> +---------------
> |  Duane Rettig <·····@franz.com> wrote:
> | > Be sure to dlclose() the handle first, to release and gc the old
> | > inode and so that the dlopen will grab the new file.
> | 
> | That is probably the key.  But this is problematic because now you have
> | to keep track for every function that you load which file it came from.
> | When you redefine a function you have to close its corresponding library,
> | and then make sure that *all* of the functions defined in that library get
> | restored in the new version.  It's a titanic pain in the barumpus.
> +---------------
> 
> Somebody posted some code here just recently to do *exactly* that...
> Now where was it...?  *AHA!!*  Here it is:
> 
>   Newsgroups: comp.lang.lisp
>   From: Eric Marsden <········@laas.fr>
>   Subject: Re: does CMUCL garbage-collect object code?
>   Date: Wed, 31 Mar 2004 11:48:38 +0200
>   Message-ID: <···············@melbourne.laas.fr>
> 
> or:
> 
>   <http://www.google.com/groups?as_umsgid=wzin05x9wjt.fsf%40melbourne.laas.fr>
> 
> The function RELOAD-SHARED-LIBRARIES, which he cheerfully admits is
> CMUCL-specific and nowhere near "supported".

This looks correct, but there's still a piece missing (though CMUCL might
already do this anyway): When a user does a
dumplisp/save-image/whatever-it's-called-by-your-vendor the reborn
lisp image should for the most part come back up the way it had been
before.  So all those loaded .so files should also be re-loaded if
they exist (this part is a stickler, since the dumped image might
be moved to a different directory or even to a different machine,
but it is resolvable, especially by using logical pathnames).  For
Allegro CL, we have a FAQ entry, with the question "How does Lisp
start up, in terms of shared-library linking and loading?" which is at

http://www.franz.com/support/documentation/6.2/doc/faq/faq3-9.htm#393

Note specifically bullet 9 in the section labelled "The Startup Process".

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Rob Warnock
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <PJSdne0zmMIHquTdRVn-gQ@speakeasy.net>
Duane Rettig  <·····@franz.com> wrote:
+---------------
| ····@rpw3.org (Rob Warnock) writes:
| > Somebody posted some code here just recently to do *exactly* that...
| > From: Eric Marsden <········@laas.fr>
| > <http://www.google.com/groups?as_umsgid=wzin05x9wjt.fsf%40melbourne.laas.fr>
| > The function RELOAD-SHARED-LIBRARIES, which he cheerfully admits is
| > CMUCL-specific and nowhere near "supported".
| 
| This looks correct, but there's still a piece missing (though CMUCL might
| already do this anyway): When a user does a
| dumplisp/save-image/whatever-it's-called-by-your-vendor the reborn
| lisp image should for the most part come back up the way it had been
| before.  So all those loaded .so files should also be re-loaded if
| they exist...
+---------------

No, CMUCL doesn't try to do this for you. In fact, there's a warning
in the CMUCL User's Manual in the section on the "Alien" FFI stuff:

    8.6 Loading Unix Object Files
    Foreign object files are loaded into the running Lisp process
    by LOAD-FOREIGN...
    Note that if a Lisp core image is saved (using SAVE-LISP),
    all loaded foreign code is lost when the image is restarted.

[The same is equally true if you use the internal SYSTEM::LOAD-OBJECT-FILE
to load ".so" files directly, as some of us do...]

That said, though, CMUCL *does* provide an *AFTER-SAVE-INITIALIZATIONS*
hook -- "a list of functions which are called when a saved core image
starts up" -- so before saving the image you can push a closure onto
this hook to (re)load stuff for you.

+---------------
| (this part is a stickler, since the dumped image might be moved
| to a different directory or even to a different machine, but it
| is resolvable, especially by using logical pathnames).
+---------------

That helps, but varying versions of operating systems and/or system
libraries might create further portability problems, just as they do
with C programs that use shared libs.

+---------------
| For Allegro CL, we have a FAQ entry, with the question "How does Lisp
| start up, in terms of shared-library linking and loading?" which is at
|   http://www.franz.com/support/documentation/6.2/doc/faq/faq3-9.htm#393
| Note specifically bullet 9 in the section labelled "The Startup Process".
+---------------

Interesting, thanks!


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Cor Gest
Subject: Re: Implementing Lisp in C?
Date: 
Message-ID: <873c7cl504.fsf@cleopatra.clsnet.nl>
Doubleplus-good witty human: ·········@flownet.com (Erann Gat) wrote :
> > > > I haven't tested it, so I don't know for sure, but I believe it could
> > > > work something like this:
> > > > 
> > > >   new_object = dlopen ("file.so", RTLD_NOW);
> > > >   new_function = dlsym (new_object, "fn");
> > > >   set_symbol_function (find_symbol ("fn", package),
> > > >                        make_lisp_function (new_function));
> > > 
> > > Doesn't work, at least not under Linux.  The second time you dlopen the
> > > "same" file nothing happens.  (It even says so in the man page.)
> > 
> > The "same" file means the inode is the same.  When you rebuild
> > the .so, it becomes a "different" .so.
> 
> That's what I thought too, but empirically this is not so (no pun
> intended).  Either rebuilding the .so somehow results in the file having
> the same inode as before, or dlopen judges "sameness" by path, not inode.
> 
> > Be sure to dlclose() the handle first, to release and gc the old
> > inode and so that the dlopen will grab the new file.
> 
> That is probably the key.  But this is problematic because now you have to
> keep track for every function that you load which file it came from.  When
> you redefine a function you have to close its corresponding library, and
> then make sure that *all* of the functions defined in that library get
> restored in the new version.  It's a titanic pain in the barumpus.  It
> would probably be easier to get the code for dlopen and hack it to do the
> Right Thing.

But since it is an .so the old could still be in memory, and not reread if
you change it without having run ldconfig to reread the dynamic libraries as
stated in /etc/ld.so.conf, which stay in memory after loading only once,
no matter what you do with the library on disk.
At least that is how this could be caused in linux.

cor


-- 
    Elke overeenkomst met de waarheid berust op louter toeval en mag
             derhalve de schrijver niet worden aangerekend.
Alle sgraifvauden zijn bedoelt voor het internet-werkgelegenheidsproject
http://www.clsnet.nl     http://geekgrrrl.nl     http://thefreeworld.net