From: Thomas Bushnell, BSG
Subject: self-hosting gc
Date: 
Message-ID: <87elj5i2rf.fsf@becket.becket.net>
So I have a question for the people who know more than me.  

Is there experience with writing self-hosting GC for Lisp or Scheme
systems?  By "self-hosting" I mean that the GC is principally written
in Lisp/Scheme, and compiled by the native compiler.  I do not mean
something written in a secondary language (like C) and compiled by a
separate compiler, or something written all in assembly language.

Obviously there are interesting problems to be solved in making such a
thing work.

What did the old Lisp Machines do?  It's my understanding that the GC
was basically all written in assembly language; is that correct?

Is there research/experience on doing it?  Guesses about the best ways
to make it work?

Thomas

From: Carl Shapiro
Subject: Re: self-hosting gc
Date: 
Message-ID: <ouy664hggth.fsf@panix3.panix.com>
·········@becket.net (Thomas Bushnell, BSG) writes:

> Is there experience with writing self-hosting GC for Lisp or Scheme
> systems?  By "self-hosting" I mean that the GC is principally written
> in Lisp/Scheme, and compiled by the native compiler.  I do not mean
> something written in a secondary language (like C) and compiled by a
> separate compiler, or something written all in assembly language.

A former Lucid employee (now a co-worker of mine) claims that Lucid
Common Lisp's collector was written entirely in Lisp.

There is a publicly available paper describing some aspects of one
particular revision of their collector:

ftp://publications.ai.mit.edu/ai-publications/1000-1499/AITR-1417.ps.Z
From: Barry Margolin
Subject: Re: self-hosting gc
Date: 
Message-ID: <atzf8.29$h14.6669@paloalto-snr2.gtei.net>
In article <··············@becket.becket.net>,
Thomas Bushnell, BSG <·········@becket.net> wrote:
>
>So I have a question for the people who know more than me.  
>
>Is there experience with writing self-hosting GC for Lisp or Scheme
>systems?  By "self-hosting" I mean that the GC is principally written
>in Lisp/Scheme, and compiled by the native compiler.  I do not mean
>something written in a secondary language (like C) and compiled by a
>separate compiler, or something written all in assembly language.
>
>Obviously there are interesting problems to be solved in making such a
>thing work.
>
>What did the old Lisp Machines do?  It's my understanding that the GC
>was basically all written in assembly language; is that correct?

No, it was mostly written in Lisp, IIRC.  There was some hardware and
microcode acceleration, for things like ephemeral GC, but the main
algorithm was implemented in Lisp.

To do it, the Lisp system has to provide subprimitives that allow a program
to access the low-level data, so it can look at type tags and memory
addresses, perform direct memory reads and writes, etc.  The GC code will
probably have to restrict itself to a subset of the full language's
capabilities; it might not even be able to cons, or it might have to do its
consing in a special region of memory that's set aside for it.
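To make this concrete, here is a toy model of such subprimitives in Python (the tagging scheme and every name here are invented for illustration, not taken from any real Lisp implementation):

```python
# Toy model of GC subprimitives: memory is a flat array of tagged
# words; the collector sees raw reads/writes plus tag extraction,
# never the safe, type-checked accessors.  Tagging scheme is invented.

TAG_BITS = 3                  # low 3 bits hold the type tag
TAG_FIXNUM, TAG_CONS = 0, 1   # two example tags

heap = [0] * 64               # "memory": a flat word array

def make_word(data, tag):
    return (data << TAG_BITS) | tag

def word_tag(w):
    return w & ((1 << TAG_BITS) - 1)

def word_data(w):
    return w >> TAG_BITS

def peek(addr):               # direct memory read
    return heap[addr]

def poke(addr, w):            # direct memory write
    heap[addr] = w

# A cons cell is two adjacent words; the GC inspects it via peek/poke
# and the tag bits -- exactly the kind of access a normal Lisp program
# is never allowed to perform.
poke(0, make_word(42, TAG_FIXNUM))   # car slot
poke(1, make_word(0, TAG_FIXNUM))    # cdr slot
cell = make_word(0, TAG_CONS)        # tagged pointer to address 0
```

A collector written against peek/poke and tag bits like these can scan any object, but nothing protects it from its own mistakes.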

-- 
Barry Margolin, ······@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87ofi9gmh6.fsf@becket.becket.net>
Barry Margolin <······@genuity.net> writes:

> No, it [lispm gc] was mostly written in Lisp, IIRC.  There was some
> hardware and microcode acceleration, for things like ephemeral GC,
> but the main algorithm was implemented in Lisp.

That's really cool to hear.  Are there papers on implementation
around? 

> To do it, the Lisp system has to provide subprimitives that allow a program
> to access the low-level data, so it can look at type tags and memory
> addresses, perform direct memory reads and writes, etc.  The GC code will
> probably have to restrict itself to a subset of the full language's
> capabilities; it might not even be able to cons, or it might have to do its
> consing in a special region of memory that's set aside for it.

Sure, of course I was taking for granted the existence of suitable
peek/poke primitives.  

I'm wondering what happens if the entire set of language constructs is
usable (for Scheme, maybe even call/cc)--at least, as the hard case.
And if the hard case works, then why bother restricting the language?!

So I had thought of the "special memory region" technique; the problem
is that you need to be darn sure that region doesn't run out.
Moreover, it might amount to reserving significant amounts of storage
for this case (to hold a stack, for example).  Though a conventional
("in the metalanguage") GC might well need just as much extra space.
From: Thomas F. Burdick
Subject: Re: self-hosting gc
Date: 
Message-ID: <xcvadttdlav.fsf@conquest.OCF.Berkeley.EDU>
·········@becket.net (Thomas Bushnell, BSG) writes:

> So I had thought of the "special memory region" technique; the problem
> is that you need to be darn sure that region doesn't run out.
> Moreover, it might amount to reserving significant amounts of storage
> for this case (to hold a stack, for example).  Though a conventional
> ("in the metalanguage") GC might well need just as much extra space.

This isn't a language issue, really, it's a general design issue for
GC algorithms.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Tim Moore
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5mi23$fiv$0@216.39.145.192>
On 28 Feb 2002 15:51:32 -0800, Thomas Bushnell, BSG <·········@becket.net>
 wrote:
>
>So I have a question for the people who know more than me.  
>
>Is there experience with writing self-hosting GC for Lisp or Scheme
>systems?  By "self-hosting" I mean that the GC is principally written
>in Lisp/Scheme, and compiled by the native compiler.  I do not mean
>something written in a secondary language (like C) and compiled by a
>separate compiler, or something written all in assembly language.

It depends what you mean by "principally written in Lisp/Scheme."  If
you mean "Common Lisp or R5RS Scheme with no other functions allowed,"
I doubt that is practical.  On the other hand, if you assume that
allocation, copying, and tag-bit twiddling operations are callable
from Lisp, then it is quite possible, at least for simple collectors.
>
>Obviously there are interesting problems to be solved in making such a
>thing work.
>
>What did the old Lisp Machines do?  It's my understanding that the GC
>was basically all written in assembly language; is that correct?
>
>Is there research/experience on doing it?  Guesses about the best ways
>to make it work?

The Utah Common Lisp collector was written in Lisp (not by me).  It
was a stop-and-copy collector, and it certainly didn't look like any
other Lisp program. IIRC, (car foo) would get you the forwarding
pointer of the cons cell.  On the other hand, coding the GC algorithm
in Lisp was pretty straightforward, given the right primitives.

I'm not sure if it's advantageous to write a Lisp collector in Lisp.
Because the normal Lisp world is so inconsistent and screwed up while
running the collector, normal Lisp advantages like debuggability and
access to a repl simply don't apply.

Tim
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87elj5glx5.fsf@becket.becket.net>
······@sea-tmoore-l.dotcast.com (Tim Moore) writes:

> It depends what you mean by "principally written in Lisp/Scheme."  If
> you mean "Common Lisp or R5RS Scheme with no other functions allowed,"
> I doubt that is practical.  On the other hand, if you assume that
> allocation, copying, and tag-bit twiddling operations are callable
> from Lisp, then it is quite possible, at least for simple collectors.

Um, it's clear that it *can't* be done from the primitives of the
standard, but that's not the point.  I'm thinking about this from a
systems design perspective: if you wanted a Lisp/Scheme system and you
didn't want to write *two* compilers....

> The Utah Common Lisp collector was written in Lisp (not by me).  It
> was a stop-and-copy collector, and it certainly didn't look like any
> other Lisp program. IIRC, (car foo) would get you the forwarding
> pointer of the cons cell.  On the other hand, coding the GC algorithm
> in Lisp was pretty straightforward, given the right primitives.

Well, I'll be more explicit about what I mean by "the obvious problems".
The obvious problems are: the collector itself uses memory.  Obviously
the memory isn't in the same arena as everything else.

What techniques are in use for making that work?

> I'm not sure if it's advantageous to write a Lisp collector in Lisp.
> Because the normal Lisp world is so inconsistent and screwed up while
> running the collector, normal Lisp advantages like debuggability and
> access to a repl simply don't apply.

How about Lisp advantages like: "it's a better language".  
From: Tim Moore
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5mldu$mom$0@216.39.145.192>
On 28 Feb 2002 16:40:38 -0800, Thomas Bushnell, BSG <·········@becket.net> 
 wrote:
>······@sea-tmoore-l.dotcast.com (Tim Moore) writes:
>
>> The Utah Common Lisp collector was written in Lisp (not by me).  It
>> was a stop-and-copy collector, and it certainly didn't look like any
>> other Lisp program. IIRC, (car foo) would get you the forwarding
>> pointer of the cons cell.  On the other hand, coding the GC algorithm
>> in Lisp was pretty straightforward, given the right primitives.
>
>Well, I'll be more explicit about what I mean by "the obvious problems".
>The obvious problems are: the collector itself uses memory.  Obviously
>the memory isn't in the same arena as everything else.

That "obvious problem" isn't particularly a problem.  A two-space
copying collector doesn't use memory other than what it allocates from
to-space, except for a couple of pointers which can hang out in
globals.  More complex algorithms can use preallocated memory,
allocate it directly from the OS, whatever.
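As a hedged illustration of this point (in Python rather than Lisp, with an object representation invented for the sketch), a Cheney-style breadth-first copier makes those "couple of pointers" explicit: a scan index plus the to-space allocation frontier are its entire state, and to-space itself serves as the work queue:

```python
# Toy Cheney-style two-space copying collector.  Breadth-first order
# means to-space doubles as the work queue, so the collector's only
# state is a scan index and the to-space frontier -- the "couple of
# pointers which can hang out in globals".  Objects are modelled as
# Python lists; the representation is invented for this sketch.

FWD = object()   # slot 0 == FWD marks a moved object; slot 1 is the copy

def copy_obj(obj, to_space):
    """Move one object to to-space, leaving a forwarding pointer."""
    if not isinstance(obj, list):
        return obj                   # immediate value: nothing to copy
    if obj and obj[0] is FWD:
        return obj[1]                # already moved: follow forwarding
    new = list(obj)
    to_space.append(new)             # "allocate" at the to-space frontier
    obj[:] = [FWD, new]              # install forwarding pointer
    return new

def collect(roots):
    to_space = []
    new_roots = [copy_obj(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):      # everything before `scan` is done
        cell = to_space[scan]
        for i, field in enumerate(cell):
            cell[i] = copy_obj(field, to_space)
        scan += 1
    return new_roots, to_space

# Live cycle a <-> b survives; unreachable c does not.
a = [1, None]; b = [2, None]; a[1] = b; b[1] = a
c = [3, None]
roots, to_space = collect([a])
```

In a real system, from-space would be the other half of a fixed arena and the append a bump of the free pointer; here Python's own allocator stands in for both.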

>
>What techniques are in use for making that work?
>
>> I'm not sure if it's advantageous to write a Lisp collector in Lisp.
>> Because the normal Lisp world is so inconsistent and screwed up while
>> running the collector, normal Lisp advantages like debuggability and
>> access to a repl simply don't apply.
>
>How about Lisp advantages like: "it's a better language".  
>

Some of us are prepared to be more ecumenical in our views :)

I suppose that access to macros might be a bonus when writing a
collector in Lisp, but assuming that much Lisp functionality won't be
available in the collector or will be available in some weird and
crippled form, and that it's desirable for the collector not to cons
itself, the Lisp you write for the collector ends up looking a lot
like C.

Tim
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87wuwxf49k.fsf@becket.becket.net>
······@sea-tmoore-l.dotcast.com (Tim Moore) writes:

> That "obvious problem" isn't particularly a problem.  A two-space
> copying collector doesn't use memory other than what it allocates from
> to-space, except for a couple of pointers which can hang out in
> globals.  More complex algorithms can use preallocated memory,
> allocate it directly from the OS, whatever.

Sure, stop and copy doesn't require much dynamic allocation in the
collector at all.  Well, except if procedure invocation itself does
allocation, which might be the behavior of a compiler.  In which case,
the number of allocations will still be something like order n.

That's the sort of problem I have in mind...given that the collector
does do allocations--even if its official space complexity is still a
constant--is there good knowledge about how to set the bounds?

> I suppose that access to macros might be a bonus when writing a
> collector in Lisp, but assuming that much Lisp functionality won't be
> available in the collector or will be available in some weird and
> crippled form, and that it's desirable for the collector not to cons
> itself, the Lisp you write for the collector ends up looking a lot
> like C.

Um, so the question is to tackle the hard case--that is, not crippling
the language.

And part of the motivation is to avoid the need to write two
compilers. 
From: Frank A. Adrian
Subject: Re: self-hosting gc
Date: 
Message-ID: <OOGf8.249$a87.392439@news.uswest.net>
Thomas Bushnell, BSG wrote:
> That's the sort of problem I have in mind...given that the collector
> does do allocations--even if its official space complexity is still a
> constant--is there good knowledge about how to set the bounds?

Read the first half of Jones & Lins book 
(http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0471941484) for 
a good survey of the state of the GC art as of about seven years or so ago. 
If you're doing a basic uniprocessor implementation, it should give you the 
info you need.  It also has a good survey of bounded space strategies.  The 
multiproc and some RT stuff has moved on a bit beyond what's offered in the 
book, but you can catch up by checking out the last few proceedings of the 
International Symposia on Memory Management (even though you'll have to pan 
a lot of Java crap to get to a few Lisp applicable nuggets).

faa
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87pu2mg03l.fsf@becket.becket.net>
"Frank A. Adrian" <·······@ancar.org> writes:

> Read the first half of Jones & Lins book 
> (http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0471941484) for 
> a good survey of the state of the GC art as of about seven years or so ago. 
> If you're doing a basic uniprocessor implementation, it should give you the 
> info you need.  It also has a good survey of bounded space strategies.  The 
> multiproc and some RT stuff has moved on a bit beyond what's offered in the 
> book, but you can catch up by checking out the last few proceedings of the 
> International Symposia on Memory Management (even though you'll have to pan 
> a lot of Java crap to get to a few Lisp applicable nuggets).

No, it doesn't, because you aren't thinking about the actual
complexity of the problem.

Jones & Lins uses normal space complexity measurements, which don't
apply here, because garbage created by the GC is not itself collected;
as a result, such an algorithm uses more space than its nominal space
complexity suggests.
From: Barry Margolin
Subject: Re: self-hosting gc
Date: 
Message-ID: <lUNf8.2$jn.213@paloalto-snr1.gtei.net>
In article <··············@becket.becket.net>,
Thomas Bushnell, BSG <·········@becket.net> wrote:
>That's the sort of problem I have in mind...given that the collector
>does do allocations--even if its official space complexity is still a
>constant--is there good knowledge about how to set the bounds?

This issue is completely orthogonal to the language that the GC is written
in.  After all, even if it's written in assembler, if the algorithm
requires dynamic memory allocation it will need to be done, and you'll need
to be concerned about running out of space.

Many (perhaps most) GC algorithms are able to use tricks that avoid needing
dynamic memory allocation.  For instance, in a recursive copying GC, you
can use the objects in old-space to implement the stack.

-- 
Barry Margolin, ······@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Stefan Monnier
Subject: Re: self-hosting gc
Date: 
Message-ID: <5l4rjvjaho.fsf@rum.cs.yale.edu>
>>>>> "Barry" == Barry Margolin <······@genuity.net> writes:
> Many (perhaps most) GC algorithms are able to use tricks that avoid needing
> dynamic memory allocation.  For instance, in a recursive copying GC, you
> can use the objects in old-space to implement the stack.

Or in a depth-first stop-and-copy (i.e. recursive copying), you can also
show that the amount of stack space is less than or equal to the amount
of space that's yet to be copied, so you can put your stack at the end of
the to space.
See Appel's paper about hash-consing-during-GC where they did just that.


	Stefan
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3223987471212925@naggum.net>
[ Not responding to comp.lang.scheme. ]

* ······@sea-tmoore-l.dotcast.com (Tim Moore)
| I suppose that access to macros might be a bonus when writing a collector
| in Lisp, but assuming that much Lisp functionality won't be available in
| the collector or will be available in some weird and crippled form, and
| that it's desirable for the collector not to cons itself, the Lisp you
| write for the collector ends up looking a lot like C.

  Why is this?  It seems obviously false to me.  The first thing you would
  do in a copying garbage collector would be to switch the active half of
  the two-space memory organization.  The GC's task is only to move live
  objects from the other half to this half, right?  It should be doable
  with the full language available and with no real constraints on consing.
  Even reorganizing old space in a generational garbage collector should be
  doable while consing from the fresh memory arena.

///                                                             2002-03-01
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Martin Simmons
Subject: Re: self-hosting gc
Date: 
Message-ID: <3c7fe967$0$234$ed9e5944@reading.news.pipex.net>
"Erik Naggum" <····@naggum.net> wrote in message
·····················@naggum.net...
> [ Not responding to comp.lang.scheme. ]
>
> * ······@sea-tmoore-l.dotcast.com (Tim Moore)
> | I suppose that access to macros might be a bonus when writing a collector
> | in Lisp, but assuming that much Lisp functionality won't be available in
> | the collector or will be available in some weird and crippled form, and
> | that it's desirable for the collector not to cons itself, the Lisp you
> | write for the collector ends up looking a lot like C.
>
>   Why is this?  It seems obviously false to me.  The first thing you would
>   do in a copying garbage collector would be to switch the active half of
>   the two-space memory organization.  The GC's task is only to move live
>   objects from the other half to this half, right?  It should be doable
>   with the full language available and with no real constraints on consing.

Yes, but the two-space model usually assumes equal sized spaces.  If from-space
is almost full of live data you won't have much room in the to-space for data
allocated during the copying.

Also, the subset of Lisp available will be constrained to things that can run
safely at any allocation point.  E.g. you can't use anything that allocates
while holding a lock (I'm assuming the lock is keeping some data structure
consistent).
--
Martin Simmons, Xanalys Software Tools
······@xanalys.com
rot13 to reply
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224012428193198@naggum.net>
* "Martin Simmons" <······@xanalys.com>
| Yes, but the two-space model usually assumes equal sized spaces.  If
| from-space is almost full of live data you won't have much room in the
| to-space for data allocated during the copying.

  I assume that if this technique is used at all, it is used all the time,
  it is not suddenly used in a system that did not do this previously, but
  your argument looks like you think someone would suddenly start to do
  some allocation in the garbage collector of a Common Lisp system that had
  never done this before.  I find this a rather peculiar inability to see
  the simplest possible ramification of a suggestion: that it be used all
  the time.  So, at least some of the garbage in from-space is the direct
  result of its collection phase of what is now to-space, right?  In other
  words, there is at least as much space as you allocated last time you
  collected.

  Also, what happens in a system that does not allocate during collection?
  Do they crash and burn if they have to grow their new-space because
  it did not release enough space on the last allocation?  No?  If not,
  survival tactics of a similar kind _might_ be re-usable in a collector
  that does allocation, don't you think?

  In other words, your fake "problem" does not exist.  If it should spring
  up nonetheless, the solution is the same as for a system that does not
  allocate during collection.  Worst case, we start garbage collection some
  time prior to actually hitting the roof of the available space.  Yet none
  of these very trivial solutions came to mind between starting to write
  your article and deciding to post it after it was written.  How come?

| Also, the subset of Lisp available will be constrained to things that can
| run safely at any allocation point.  E.g. you can't use anything that
> | allocates while holding a lock (I'm assuming the lock is keeping some data
| structure consistent).

  This makes even less sense than your previous, non-existing problem.
  What other parts of the system inhibit allocation while being locked?
  Please remember that garbage collection happens at a time when the memory
  allocation subsystem has the control of the system, so if you had a lock
  and you never allocated anything before releasing it, you would never
  trigger a garbage collection in the first place.

  Do not waste my time by trying to make me think for you.  Think and post
  in that order, please.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Martin Simmons
Subject: Re: self-hosting gc
Date: 
Message-ID: <3c83a316$0$236$ed9e5944@reading.news.pipex.net>
"Erik Naggum" <····@naggum.net> wrote in message
·····················@naggum.net...
> * "Martin Simmons" <······@xanalys.com>
> | Yes, but the two-space model usually assumes equal sized spaces.  If
> | from-space is almost full of live data you won't have much room in the
> | to-space for data allocated during the copying.
>
>   I assume that if this technique is used at all, it is used all the time,
>   it is not suddenly used in a system that did not do this previously, but
>   your argument looks like you think someone would suddenly start to do
>   some allocation in the garbage collector of a Common Lisp system that had
>   never done this before.  I find this a rather peculiar inability to see
>   the simplest possible ramification of a suggestion: that it be used all
>   the time.  So, at least some of the garbage in from-space is the direct
>   result of its collection phase of what is now to-space, right?  In other
>   words, there is at least as much space as you allocated last time you
>   collected.

That makes some more assumptions, e.g.:

1) The first time you won't have any garbage from the previous collection.
2) A collection cannot allocate more than the previous one.
3) All allocation during collection is garbage by the time the next collection
starts.

Of course it would be possible to make a system that will work most of the time
(or in specific cases), but that has to be made clear from the start.


>   Also, what happens in a system that does not allocate during collection?
>   Do they crash and burn if they have to grow their new-space because
>   it did not release enough space on the last allocation?  No?  If not,
>   survival tactics of a similar kind _might_ be re-usable in a collector
>   that does allocation, don't you think?

Could do, provided they work in the middle of a collection rather than at the
end of one where they are normally used.


>   In other words, your fake "problem" does not exist.  If it should spring
>   up nonetheless, the solution is the same as for a system that does not
>   allocate during collection.  Worst case, we start garbage collection some
>   time prior to actually hitting the roof of the available space.  Yet none
>   of these very trivial solutions came to mind between starting to write
>   your article and deciding to post it after it was written.  How come?

Actually, I discarded the "allow some extra space" idea because it doesn't work:
it also assumes 3 above.

>
> | Also, the subset of Lisp available will be constrained to things that can
> | run safely at any allocation point.  E.g. you can't use anything that
> | allocates while holding a lock (I'm assuming the lock is keeping some data
> | structure consistent).
>
>   This makes even less sense than your previous, non-existing problem.
>   What other parts of the system inhibit allocation while being locked?
>   Please remember that garbage collection happens at a time when the memory
>   allocation subsystem has the control of the system, so if you had a lock
>   and you never allocated anything before releasing it, you would never
>   trigger a garbage collection in the first place.

By "available/use" I meant "available/use in the GC" (since that would result in
recursive entry to the lock if the GC was triggered by allocating while holding
a lock).
--
Martin Simmons, Xanalys Software Tools
······@xanalys.com
rot13 to reply
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224264069822536@naggum.net>
* "Martin Simmons" <······@xanalys.com>
| Of course it would be possible to make a system that will work most of
| the time (or in specific cases), but that has to be made clear from the
| start.

  I vastly prefer open insults to veiled ones, thank you.

  I am sorry I criticized you, as you are obviously not ready for public
  debate about your beliefs.  Please let me know when you become ready.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87lmdafzvi.fsf@becket.becket.net>
"Martin Simmons" <······@xanalys.com> writes:

> Also, the subset of Lisp available will be constrained to things that can run
> safely at any allocation point.  E.g. you can't use anything that allocates
> while holding a lock (I'm assuming the lock is keeping some data structure
> consistent).

This kind of restriction is *exactly* what I want to avoid.  I'll be
posting a revised version of my question next week.
From: Tim Moore
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5p5ij$s8b$0@216.39.145.192>
On Fri, 01 Mar 2002 16:04:26 GMT, Erik Naggum <····@naggum.net> wrote:
>[ Not responding to comp.lang.scheme. ]
>
>* ······@sea-tmoore-l.dotcast.com (Tim Moore)
>| I suppose that access to macros might be a bonus when writing a collector
>| in Lisp, but assuming that much Lisp functionality won't be available in
>| the collector or will be available in some weird and crippled form, and
>| that it's desirable for the collector not to cons itself, the Lisp you
>| write for the collector ends up looking a lot like C.
>
>  Why is this?  It seem obviously false to me.  The first thing you would
>  do in a copying garbage collector would be to switch the active half of
>  the two-space memory organization.  The GC's task is only to move live
>  objects from the other half to this half, right?  It should be doable
>  with the full language available and with no real constraints on consing.

I'm not sure what you find obviously false; certainly some algorithms,
when implemented in Lisp, will look more like C or Fortran than
typical Lisp code just by nature of the algorithm itself.  I believe
GC fits in that category.  Nevertheless, here are some additional
reasons why the GC-in-Lisp -- which I haven't looked at in 10 years --
did not look like typical Lisp code to me:

Instead of using the normal Lisp accessor functions like car and aref,
open-coded equivalents of the same were used everywhere to avoid type
checking and other surprises.  Perhaps we could have achieved the same
by using (optimize inline) and other declarations.  But we didn't.

The semantics of the accessors was further muddied by forwarding
pointers, which were stashed in odd spots.  Even simple things
like car require a check for a forwarding pointer, or not, depending
on the context.  I could imagine that this requirement would break
pieces of the full language, for example conditions and handler-bind
if the handlers are stored in a list.
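A toy sketch of that check (a Python stand-in; the two-slot representation and all names are invented) shows why even car stops being the ordinary accessor during a collection:

```python
# During a copying collection, an object may already have been moved,
# in which case its old copy holds only a forwarding pointer.  The
# GC-internal car must check for that before trusting the slot.
# Representation is invented for this sketch.

FORWARD = object()           # sentinel marking a moved cons

def gc_car(cell):
    if cell[0] is FORWARD:   # forwarding pointer stashed in the car slot?
        cell = cell[1]       # read through to the relocated copy
    return cell[0]

old = ["x", "y"]             # a cons as a two-slot list: (car cdr)
new = list(old)              # the collector copies it to to-space...
old[:] = [FORWARD, new]      # ...and smashes the old cell
```

Handler lists and any other structures the runtime walks mid-collection would need the same treatment, which is exactly where the full language starts to break.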

I wouldn't say that a GC can't cons in the course of doing a
collection, just that it's desirable not to.  It may be awkward to
reclaim the GC's own storage as garbage immediately, especially if in
a two-space collector the storage is simply allocated in to-space.

You can put in arbitrary effort to making the full language available
for use in the GC.  On the other hand, in a Common Lisp implementation
there are a lot of things into which you can put arbitrary effort :)

Tim
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87henyfzp9.fsf@becket.becket.net>
······@sea-tmoore-l.dotcast.com (Tim Moore) writes:

> Instead of using the normal Lisp accessor functions like car and aref,
> open-coded equivalents of the same were used everywhere to avoid type
> checking and other surprises.  Perhaps we could have achieved the same
> by using (optimize inline) and other declarations.  But we didn't.

So now perhaps it is clear why my original problem is a *problem* and
not a triviality.

See, open coding such functions is exactly what I want to avoid.  The
question is "implement GC in lisp", not "implement GC in some
pseudo-Lisp with all the guts marked off-limits".

> I wouldn't say that a GC can't cons in the course of doing a
> collection, just that it's desirable not to.  It may be awkward to
> reclaim the GC's own storage as garbage immediately, especially if in
> a two-space collector the storage is simply allocated in to-space.

So this is a failure to look at the actual space of solutions.  For
example, if the GC thread has a special arena to allocate from, then
it's perfectly fine to cons.
From: David Rush
Subject: Re: self-hosting gc
Date: 
Message-ID: <okfvgcgad9i.fsf@bellsouth.net>
·········@becket.net (Thomas Bushnell, BSG) writes:
> ······@sea-tmoore-l.dotcast.com (Tim Moore) writes:
> > I'm not sure if it's advantageous to write a Lisp collector in Lisp.
> > Because the normal Lisp world is so inconsistent and screwed up while
> > running the collector, normal Lisp advantages like debuggability and
> > access to a repl simply don't apply.
> 
> How about Lisp advantages like: "it's a better language".  

For writing a GC, it isn't. GC algorithms are all about manipulating
typed data, Lisp is about flexible types. There is something of an
impedance mismatch here. With the proper type and location reification
operators, it is certainly possible to do this in Lisp/Scheme (in fact
it's probably easier in Scheme because of guaranteed TCO), but that
doesn't mean that the expression of the GC algorithm itself is more
"natural".

That said, I'm waiting for someone to well and truly resurrect certain
aspects of the LM ideal - specifically a GC-friendly OS - although I
suspect that it may be better written in SML.

This is all IMHO. Someone is sure to flame me for this...

david rush
-- 
There is only one computer program, and we're all just writing
pieces of it.
	-- Thant Tessman (on comp.lang.scheme)
From: Christian Lynbech
Subject: Re: self-hosting gc
Date: 
Message-ID: <87g03kd1m5.fsf@baguette.webspeed.dk>
>>>>> "David" == David Rush <····@bellsouth.net> writes:

David> That said, I'm waiting for someone to well and truly resurrect certain
David> aspects of the LM ideal - specifically a GC-friendly OS - although I
David> suspect that it may be better written in SML.

Are you implying that SML is better suited for writing an OS than
Lisp? If so, what are the advantages of SML in that problem domain?


------------------------+-----------------------------------------------------
Christian Lynbech       | 
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: David Rush
Subject: Re: self-hosting gc
Date: 
Message-ID: <okfadtsa5vs.fsf@bellsouth.net>
Followup-to ignored because I read c.l.s although I'm not sure that I
really want to see the flamewar that is brewing...

Christian Lynbech <·······@get2net.dk> writes:
> >>>>> "David" == David Rush <····@bellsouth.net> writes:
> 
> David> That said, I'm waiting for someone to well and truly resurrect certain
> David> aspects of the LM ideal - specifically a GC-friendly OS - although I
> David> suspect that it may be better written in SML.
> 
> Are you implying that SML is better suited for writing an OS than
> Lisp? 

Yes.

> If so, what are the advantages of SML in that problem domain?

Static typing. Provable correctness. I *hate* slow buggy OSes.

david rush
-- 
Einstein said that genius abhors consensus because when consensus is
reached, thinking stops. Stop nodding your head.
	-- the Silicon Valley Tarot
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5nqdb$8if4g$1@ID-125440.news.dfncis.de>
In article <···············@bellsouth.net>, David Rush wrote:
> Followup-to ignored because I read c.l.s although I'm not sure that I
> really want to see the flamewar that is brewing...
> 
> Christian Lynbech <·······@get2net.dk> writes:
>> >>>>> "David" == David Rush <····@bellsouth.net> writes:
>> 
>> David> That said, I'm waiting for someone to well and truly resurrect certain
>> David> aspects of the LM ideal - specifically a GC-friendly OS - although I
>> David> suspect that it may be better written in SML.
>> 
>> Are you implying that SML is better suited for writing an OS than
>> Lisp? 
> 
> Yes.

Well, that is hard to prove.  We already had an OS written in Lisp
(several, actually); it was even sold.  But I'm not aware of any written
in SML...

>> If so, what are the advantages of SML in that problem domain?
> 
> Static typing. Provable correctness. I *hate* slow buggy OSes.

Last time I checked, SML compilers generated pretty slow code.
When I ported programs to CMUCL they ran several times faster.
Provable correctness?  SML has a really great language definition,
which is provably correct.  But I don't think it follows that an OS
written in SML will be any less buggy than one written in any other
language with a less strict definition.  It just doesn't follow.
Same with static typing.  By your logic, OSes should rather be written
in *Haskell*.

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.01.10.21.15.524027.25143@shimizu-blume.com>
On Fri, 01 Mar 2002 06:58:35 -0500, Nils Goesche wrote:

> Last time I checked, SML compilers generated pretty slow code.

Must have been a long time ago.

> When I ported programs to CMUCL they ran several times faster.

See the Great PL Shootout page.  There are three ML compilers (two SML
and one Ocaml) ahead of CMUCL.  Two of them are *significantly* ahead.

> Provable correctness?  SML has a really great language definition, which is
> provably correct.  But I don't think it follows that an OS written in
> SML will be any less buggy than one written in any other language with a
> less strict definition.  It just doesn't follow.

That is probably so.  (Actually, "any less buggy" is probably not so.
But that's just my opinion, and I cannot prove it.)

> Same with static typing.  By your logic, OSes should rather be written in *Haskell*.

Actually, no.  I think that ML's fairly restrictive type system would be
just the right fit.

By the way, the type system is not just for making sure your
implementation is correct.  It can also be used to *structure* the OS
itself -- in such a way that things could (at least in theory) become
much more efficient.

Let me give you an example:

In Unix-like OSes, we have the familiar "read" syscall, which roughly has
the following type signature:

    val unix_read : int * pointer * int -> int

When unix_read is being called, the OS must check several things:

   1. First argument:

      - must be a number obtained from a previous call to open/dup/...
      - must not have been subjected to "close" in the meantime
      - must have been opened for reading

   2. Second argument:

      - must point to a writable memory region owned by the current process
      - the region must contain at least <third argument> many writable
        bytes

All this is being checked (by a combination of hardware protection and
actual dynamic checking *at runtime*) when the call is being made.

With a strong static type system that provides abstraction facilities,
one could have the following alternative interface:

---------------------
module FileDesc : sig
   type readable
   type writable
   type ('r, 'w) filedes
   val open_for_reading : ... -> (readable, unit) filedes
   val open_for_writing : ... -> (unit, writable) filedes
   val open_for_both :    ... -> (readable, writable) filedes
   ...
end

module Buffer : sig
  type buffer
  val alloc : int -> buffer
  val size : buffer -> int
  val getData : buffer -> char_array
  ...
end

open FileDesc
open Buffer

val mlos_read : (readable, 'w) filedes * buffer -> int
val mlos_write : ('r, writable) filedes * buffer -> int
--------------

The point is that both file descriptors and buffers are *unforgeable*
abstract types.  So whenever a user program invokes mlos_read, it can
only do so if it has obtained a valid file descriptor and a valid buffer
beforehand.  Thus, mlos_read does not need to do *any* checking of its
arguments at runtime because there is a compile-time proof that
everything will be ok.  And the important contribution of the programming
language is that it lets you define such abstractions (they do not have
to be built-in).

This is just an example, and I have left out many of the details. In
fact, the above may not be the "best" design once we start living in
a compiler-checked strongly-typed world.  But I think it serves to
demonstrate the idea.
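The unforgeable-descriptor idea above can be approximated in any language with parametric types. Below is a rough, purely illustrative sketch using phantom type parameters; the names (mlos_read, the capability markers) follow the post's hypothetical interface, and the "kernel state" behind a descriptor is simulated by an in-memory string:

```rust
use std::marker::PhantomData;

// Capability markers. In a real system the constructors below would live
// in a module whose internals are private, so that a FileDes<Readable>
// is unforgeable from ordinary client code.
struct Readable;
struct WriteOnly;

struct FileDes<R> {
    contents: String,    // stand-in for the kernel state behind the descriptor
    _cap: PhantomData<R>, // zero-sized capability tag
}

fn open_for_reading(contents: &str) -> FileDes<Readable> {
    FileDes { contents: contents.to_string(), _cap: PhantomData }
}

fn open_for_writing() -> FileDes<WriteOnly> {
    FileDes { contents: String::new(), _cap: PhantomData }
}

// Analogue of mlos_read: it only type-checks on FileDes<Readable>, so no
// runtime "is this descriptor valid and opened for reading?" check is needed.
fn mlos_read(f: &FileDes<Readable>, buf: &mut String) -> usize {
    buf.push_str(&f.contents);
    f.contents.len()
}

fn main() {
    let f = open_for_reading("hello");
    let mut buf = String::new();
    assert_eq!(mlos_read(&f, &mut buf), 5);
    let _w = open_for_writing();
    // mlos_read(&_w, &mut buf); // rejected at compile time: wrong capability
    println!("typed ok");
}
```

This is only a sketch of the typing discipline, not of an actual OS interface: the compile-time rejection in the commented line is where the runtime check of unix_read disappears.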

Matthias
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey38z9cjnh7.fsf@cley.com>
* Matthias Blume wrote:

> The point is that both file descriptors and buffers are *unforgeable*
> abstract types.  So whenever a user program invokes mlos_read, it can
> only do so if it has obtained a valid file descriptor and a valid buffer
> beforehand.  Thus, mlos_read does not need to do *any* checking of its
> arguments at runtime because there is a compile-time proof that
> everything will be ok.  And the important contribution of the programming
> language is that it lets you define such abstractions (they do not have
> to be built-in).

What magic prevents me from stopping the program at a crucial moment
and inserting some bogus stuff in these `unforgeable' types?  For
instance: OS gives me a file descriptor, I then hack at it with a hex
editor, and hand it back.  Oops, now I'm all over your machine.

--tim
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.01.13.31.50.191245.25143@shimizu-blume.com>
On Fri, 01 Mar 2002 10:51:00 -0500, Tim Bradshaw wrote:

> * Matthias Blume wrote:
> 
>> The point is that both file descriptors and buffers are *unforgeable*
>> abstract types.  So whenever a user program invokes mlos_read, it can
>> only do so if it has obtained a valid file descriptor and a valid
>> buffer beforehand.  Thus, mlos_read does not need to do *any* checking
>> of its arguments at runtime because there is a compile-time proof that
>> everything will be ok.  And the important contribution of the
>> programming language is that it lets you define such abstractions (they
>> do not have to be built-in).
> 
> What magic prevents me from stopping the program at a crucial moment and
> inserting some bogus stuff in these `unforgeable' types?  For instance:
> OS gives me a file descriptor, I then hack at it with a hex editor, and
> hand it back.  Oops, now I'm all over your machine.

Well, there are certainly a lot of things that one has to be very careful
about here.  For one, the OS clearly cannot let you run any old code that
comes out of a hex editor.  A low-tech solution might be cryptographic
fingerprints inserted by a certified compiler (questionable, because
compilers tend to be buggy).  A better solution might eventually emerge
from the idea of proof-carrying code.

Matthias
From: Stefan Monnier
Subject: Re: self-hosting gc
Date: 
Message-ID: <5lzo1nhvnh.fsf@rum.cs.yale.edu>
>>>>> "Tim" == Tim Bradshaw <···@cley.com> writes:
> instance: OS gives me a file descriptor, I then hack at it with a hex

The OS disallows "hacking at it with a hex editor".
Unless you're some kind of super-privileged user, of course (just like
you can write all over /proc/kmem if you're root).


	Stefan
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3d6yjggb1.fsf@cley.com>
* Stefan Monnier wrote:
>>>>>> "Tim" == Tim Bradshaw <···@cley.com> writes:
>> instance: OS gives me a file descriptor, I then hack at it with a hex

> The OS disallows "hacking at it with a hex editor".
> Unless you're some kind of super-privileged user, of course (just like
> you can write all over /proc/kmem if you're root).

I didn't quite mean it quite so literally.  Imagine I get a blob of
code, how do I know that it doesn't fake things?  The only way I can
see to do this is a completely trusted compiler, which can sign its
output, so you're still dynamically checking, you just do it once,
when the program starts (isn't this what MS push with ActiveX?).  Or I
guess you can do some kind of proof on the program before running it
(Java?).

Given the negligible cost of checks, I'd kind of rather the OS just
did them though.

--tim
From: Christian Lynbech
Subject: Re: self-hosting gc
Date: 
Message-ID: <87y9h6hjcd.fsf@baguette.webspeed.dk>
>>>>> "Tim" == Tim Bradshaw <···@cley.com> writes:

Tim> Or I guess you can do some kind of proof on the program before
Tim> running it (Java?).

Back in the good old days when it was still generally believed that
Java was a secure solution, I saw an article picking that idea
apart. 

On top of a bunch of bugs in the various implementations, they also
demonstrated (according to my recollection) that it was possible to
circumvent the security mechanisms of Java, not through valid Java
code, but by hacking directly at the JVM bytecodes.

The fix was to add signing of applets, such that also for Java you
need to trust the SW supplier.

I must admit not to have a firm reference on the paper. I'll do a
little digging if people start accusing me of lying :-)


------------------------+-----------------------------------------------------
Christian Lynbech       | 
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey37koqhg33.fsf@cley.com>
* Christian Lynbech wrote:

> The fix was to add signing of applets, such that also for Java you
> need to trust the SW supplier.

This is nice to know, and enables me to make my point more succinctly:
(a) you need signing, and (b) do you think the average software
vendor's digital signature is worth the bits it's made of?  Better
check those system calls...

--tim
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.05.17.45.09.609547.13987@shimizu-blume.com>
On Tue, 05 Mar 2002 16:14:56 -0500, Tim Bradshaw wrote:

> * Christian Lynbech wrote:
> 
>> The fix was to add signing of applets, such that also for Java you need
>> to trust the SW supplier.
> 
> This is nice to know, and enables me to make my point more succinctly:
> (a) you need signing, and (b) do you think the average software vendor's
> digital signature is worth the bits its made of?  Better check those
> system calls...

No-one was talking about Java.

Matthias
From: Ray Dillinger
Subject: Re: self-hosting gc
Date: 
Message-ID: <3C864D82.5396EB91@sonic.net>
Tim Bradshaw wrote:
> 
> * Stefan Monnier wrote:
> >>>>>> "Tim" == Tim Bradshaw <···@cley.com> writes:
> >> instance: OS gives me a file descriptor, I then hack at it with a hex
> 
> > The OS disallows "hacking at it with a hex editor".
> > Unless you're some kind of super-privileged user, of course (just like
> > you can write all over /proc/kmem if you're root).
> 
> I didn't quite mean it quite so literally.  Imagine I get a blob of
> code, how do I know that it doesn't fake things?  The only way I can
> see to do this is a completely trusted compiler, which can sign its
> output, so you're still dynamically checking, you just do it once,
> when the program starts (isn't this what MS push with ActiveX?).  Or I
> guess you can do some kind of proof on the program before running it
> (Java?).
> 
> Given the negligible cost of checks, I'd kind of rather the OS just
> did them though.
> 

NAK!  This implies that nobody can modify the compiler.  If you 
have a compiler that signs its output, then somebody can open up 
the source code and find the signing key.  Then the signing key 
can be used to sign arbitrary output.  That means you cannot 
release the source code for your compiler.  

Or maybe read privileges to it are root-only and root can set the 
signing key for a particular installation -- but then you have a 
problem that nobody can compile on one system and run on another.

Far far better to have potentially-dangerous processes running in 
their own memory arenas where the OS can keep an eye on them in 
case they try messing anything up.

				Bear
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.06.12.35.21.929658.21623@shimizu-blume.com>
On Wed, 06 Mar 2002 12:09:55 -0500, Ray Dillinger wrote:

> Tim Bradshaw wrote:
>> 
>> * Stefan Monnier wrote:
>> >>>>>> "Tim" == Tim Bradshaw <···@cley.com> writes:
>> >> instance: OS gives me a file descriptor, I then hack at it with a
>> >> hex
>> 
>> > The OS disallows "hacking at it with a hex editor". Unless you're
>> > some kind of super-privileged user, of course (just like you can
>> > write all over /proc/kmem if you're root).
>> 
>> I didn't quite mean it quite so literally.  Imagine I get a blob of
>> code, how do I know that it doesn't fake things?  The only way I can
>> see to do this is a completely trusted compiler, which can sign its
>> output, so you're still dynamically checking, you just do it once, when
>> the program starts (isn't this what MS push with ActiveX?).  Or I guess
>> you can do some kind of proof on the program before running it (Java?).
>> 
>> Given the negligible cost of checks, I'd kind of rather the OS just did
>> them though.
>> 
>> 
> NAK!  This implies that nobody can modify the compiler. [ ... ]

Yes.  But there are far better methods than just signing the output of
the compiler.  In particular, read up on proof-carrying code:  It does
not require a certifying compiler (you can even write the code by hand
as long as you also write the corresponding proof).  Code (and
proof!) can come from anywhere. Finally, the trusted computing base can be
far smaller than a typical compiler.

Matthias
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87lmd574lf.fsf@becket.becket.net>
Ray Dillinger <····@sonic.net> writes:

> NAK!  This implies that nobody can modify the compiler.  If you 
> have a compiler that signs its output, then somebody can open up 
> the source code and find the signing key.  Then the signing key 
> can be used to sign arbitrary output.  That means you cannot 
> release the source code for your compiler.  

No, a trusted compiler is simply the only object that has the ability
to create compiled-procedure objects.  No problem at all!  

Well, the problem is still that only the one compiler is the trusted
one.  Two solutions for that problem are to use a subsetted bytecode
thing, like the Java VM, and to use proof-carrying code to validate
compiler output.
From: ·······@andrew.cmu.edu
Subject: Re: self-hosting gc
Date: 
Message-ID: <20020306140132.V16447@emu>
On Wed, Mar 06, 2002 at 05:09:55PM +0000, Ray Dillinger wrote:
> Tim Bradshaw wrote:
> > 
> > * Stefan Monnier wrote:
> > >>>>>> "Tim" == Tim Bradshaw <···@cley.com> writes:
> > >> instance: OS gives me a file descriptor, I then hack at it with a hex
> > 
> > > The OS disallows "hacking at it with a hex editor".
> > > Unless you're some kind of super-privileged user, of course (just like
> > > you can write all over /proc/kmem if you're root).
> > 
> > I didn't quite mean it quite so literally.  Imagine I get a blob of
> > code, how do I know that it doesn't fake things?  The only way I can
> > see to do this is a completely trusted compiler, which can sign its
> > output, so you're still dynamically checking, you just do it once,
> > when the program starts (isn't this what MS push with ActiveX?).  Or I
> > guess you can do some kind of proof on the program before running it
> > (Java?).
> > 
> > Given the negligible cost of checks, I'd kind of rather the OS just
> > did them though.
> > 
> 
> NAK!  This implies that nobody can modify the compiler.  If you 
> have a compiler that signs its output, then somebody can open up 
> the source code and find the signing key.  Then the signing key 
> can be used to sign arbitrary output.  That means you cannot 
> release the source code for your compiler.  

No need to sign output.  Simply disallow any binaries that were not
created by that machine's compiler.  In order to run source code it
must be passed through, and checked, by the compiler on that machine.

> 
> Or maybe read privileges to it are root-only and root can set the 
> signing key for a particular installation -- but then you have a 
> problem that nobody can compile on one system and run on another.
> 

Oh well.  FreeBSD gets by (though it's not required to compile, they
still do a lot).

> Far far better to have potentially-dangerous processes running in 
> their own memory arenas where the OS can keep an eye on them in 
> case they try messing anything up.

Context-switches are expensive, remember.  An OS/compiler that removed
as many layers as possible between program and underlying hardware would
be much faster; if the compiler has a chance to examine every piece
of code that goes in the system then it may be able to do this.

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
From: Christian Lynbech
Subject: Re: self-hosting gc
Date: 
Message-ID: <of4rjrywcg.fsf@chl.ted.dk.eu.ericsson.se>
>>>>> "mdanish" == mdanish  <·······@andrew.cmu.edu> writes:

mdanish> Context-switches are expensive, remember.  An OS/compiler
mdanish> that removed as many layers as possible between program and
mdanish> underlying hardware would be much faster;

Isn't this what exokernels are all about? As I remember/understood it,
the exokernel "movement" takes the microkernel idea to the logical
extreme, expecting the kernel to do very little other than
multiplexing access to hardware.


------------------------+-----------------------------------------------------
Christian Lynbech       | Ericsson Telebit, Skanderborgvej 232, DK-8260 Viby J
Phone: +45 8938 5244    | email: ·················@ted.ericsson.se
Fax:   +45 8938 5101    | web:   www.ericsson.com
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Vilhelm Sjoberg
Subject: Re: self-hosting gc
Date: 
Message-ID: <3C8821C9.3080608@cam.ac.uk>
Tim Bradshaw wrote:

> I didn't quite mean it quite so literally.  Imagine I get a blob of
> code, how do I know that it doesn't fake things?  The only way I can
> see to do this is a completely trusted compiler, which can sign its
> output, so you're still dynamically checking, you just do it once,
> when the program starts (isn't this what MS push with ActiveX?).  Or I
> guess you can do some kind of proof on the program before running it
> (Java?).
> 
> Given the negligible cost of checks, I'd kind of rather the OS just
> did them though. 


Running untrusted code, i.e. an opaque binary handed to you by a 
potentially malicious stranger, is a problem that requires somewhat 
elaborate solutions. For example, you can run it in a sandbox that 
restricts its access to sensitive resources (Java VM), you can do the 
same thing but with hardware support to speed it up (Unix processes with 
CPU-supported memory protection), or you can require the person giving 
you the code to supply a machine-checkable proof that it is harmless 
(proof-carrying code), or you can just give up and ask the user "do you 
trust this guy?" (digital signatures).

Note how even the low-tech alternative of sandboxing does not work too 
well; numerous flaws were pointed out in the Java security scheme, and 
even though e.g. FreeBSD provides a "jail" system call, you probably 
would only want to use it to provide an extra level of protection for 
daemons that already provide their own checks. Relying completely on the 
OS provisions for this would feel risky.

The primary reason that these runtime checks are universally included is 
that they solve a *different* problem: when I make a mistake and 
introduce a bug in my program, how can I prevent it from clobbering all 
information on my computer and setting me back a week's worth of work? 
Somewhere on Dennis Ritchie's homepage there is a description of what it 
was like working on their multi-user machine before the memory 
protection unit was delivered. Before you ran your newly compiled 
program you shouted out "a.out!" and waited until your coworkers had had 
time to save their files. Quite often the computer would stop echoing 
keystrokes after that.

But the crucial point is that this problem has another solution, by 
writing your program in a type-safe language like Lisp, ML, Java instead 
of in assembly or C. No Lisp or Java program will ever modify memory 
that wasn't properly allocated to it. And this is not because of 
run-time checking - there are some checks, like array bounds, but 
primarily this is because these languages have _no way to describe_ the 
act of peeking and poking at data that does not belong to you (kind of 
like Orwell's newspeak). So there is no need to monitor an ML program 
with runtime checks - the very fact that it was written in ML gives us 
all the confidence we need.
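The "no way to describe it" point can be illustrated in any type-safe language; Rust is used here only as a convenient stand-in for ML or Lisp. Safe code offers checked access that forces the caller to handle failure, and there is simply no safe syntax for poking at an arbitrary address (that requires an explicit escape hatch, visible in the source):

```rust
fn main() {
    let v = vec![10, 20, 30];

    // In-bounds access works; out-of-bounds access cannot silently read
    // foreign memory -- .get() returns an Option the caller must handle.
    assert_eq!(v.get(1), Some(&20));
    assert_eq!(v.get(99), None);

    // There is no safe way to write "store 4 bytes at address 0xdead":
    // that act is inexpressible without an explicit `unsafe` block, which
    // is exactly the newspeak property the post describes.
    println!("safe ok");
}
```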

There is some loss of clarity because the same word "protection" is used 
to describe both the concepts above (and also the unrelated idea of 
enforcing good software engineering practice by "encapsulating" parts of 
programs in modules).

But if you do your program development in a type-safe language using a 
standard OS, then you are paying for a feature you don't need, namely 
the sandboxing of programs into processes that cannot hurt each other. 
(Whereas if you program in some more assembly-style language, this is 
definitely a feature you would want.) And these checks do not have 
negligible cost. Every time you want your processes to communicate with 
each other, or with the world, or you just want to switch to another 
process in a multi-tasking system, you need to tell the 
memory-protection unit about it, set up a new stack and virtual address 
translation table, etc. These context switches take time - OS 
performance is assessed partly on how long they take, buffering 
IO libraries are used to avoid them, and so on.

It would be nice to have an operating system based on a type-safe 
language instead of C. It could then dispense with the concept of 
processes altogether, and there would be no distinction between user 
and system code. (The "kernel" would dissolve into a set of libraries, 
with a thread scheduler somewhere.)

Such an OS could of course still sandbox or proof-check or verify the 
signatures of untrusted code, just as you run a Java VM in Windows/Unix.

-Vilhelm

(The ideas above come mainly from Vapour 
[http://vapour.sourceforge.net/], a pet project of an IRC acquaintance. He 
seems to have given up developing it, though).
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <874rjrbuv1.fsf@becket.becket.net>
Vilhelm Sjoberg <·····@cam.ac.uk> writes:

> It would be nice to have an operating system based on a type-safe
> language instead of C. It could then dispense with the concept of
> processes altogether, and there would be no distinction between user
> and system code. (The "kernel" would dissolve into a set of libraries,
> with a thread scheduler somewhere.)

Which goal is indeed part of the background behind the question of
mine that started this thread.
From: Ray Blaak
Subject: Re: self-hosting gc
Date: 
Message-ID: <m3n0xjtyr6.fsf@blight.transcend.org>
Vilhelm Sjoberg <·····@cam.ac.uk> writes:
> But if you do your program development in a type-safe language using a 
> standard OS, then you are paying for a feature you don't need, namely 
> the sandboxing of programs into processes that cannot hurt each other. 

As long as your type-safe language has "escape hatches" for bypassing safety
(e.g. unchecked conversion, for V'Address use ..., calls for foreign functions,
etc.) then OS protection features are still necessary.

Even if your language has no escape hatches, you are still putting a lot of
trust in the security and quality of your runtime environment (which ultimately
is not implemented in the safe language).

It is far far better to have both safety features (language safety and
OS-protections).

-- 
Cheers,                                        The Rhythm is around me,
                                               The Rhythm has control.
Ray Blaak                                      The Rhythm is inside me,
·····@telus.net                                The Rhythm has my soul.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87vgc78r5b.fsf@becket.becket.net>
Ray Blaak <·····@telus.net> writes:

> As long as your type-safe language has "escape hatches" for
> bypassing safety (e.g. unchecked conversion, for V'Address use ...,
> calls for foreign functions, etc.) then OS protection features are
> still necessary.

If the escape hatches are only available to privileged code, you still
don't need OS protection features.

> Even if your language has no escape hatches, you are still putting a
> lot of trust in the security and quality of your runtime environment
> (which ultimately is not implemented in the safe language).

Sez who?
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.08.07.04.40.694457.26032@shimizu-blume.com>
On Fri, 08 Mar 2002 00:30:27 -0500, Ray Blaak wrote:

> Vilhelm Sjoberg <·····@cam.ac.uk> writes:
>> But if you do your program development in a type-safe language using a
>> standard OS, then you are paying for a feature you don't need, namely
>> the sandboxing of programs into processes that cannot hurt eachother.
> 
> As long as your type-safe language has "escape hatches" for bypassing
> safety (e.g. unchecked conversion, for V'Address use ..., calls for
> foreign functions, etc.) then OS protection features are still
> necessary.

But in the kind of language we are talking about, it is statically known
whether "escape hatches" have been use in a particular program.  Only
programs that do use unsafe features need OS protection.  In practice,
those should be the vast minority.

> Even if your language has no escape hatches, you are still putting a lot
> of trust in the security and quality of your runtime environment (which
> ultimately is not implemented in the safe language).

Right.  Just as much trust as I put now into my runtime environment --
the millions of lines of kernel code,  the code for all those setuid
programs on my system, ...

I would have an easier time "trusting" my environment if I knew its
safety relied on proofs rather than on the mere hope that not one of a
bunch of a few thousand programmers whom I don't personally know has
screwed up somewhere.

> It is far far better to have both safety features (language safety and
> OS-protections).

I see no reason to pay for OS protection if I provably won't need it.

Matthias
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3elivcd5a.fsf@cley.com>
> I would have an easier time "trusting" my environment if I knew its
> safety relied on proofs rather than on the mere hope that not one of a
> bunch of a few thousand programmers whom I don't personally know has
> screwed up somewhere.

Well, that's good.  Perhaps you should start writing a provably
correct shared-memory multiprocessor OS that scales to, say, a 100
processor machine reasonably well and supports all the features of,
say, Solaris on such HW, and performs as well.

--tim
From: Matthias Blume
Subject: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <pan.2002.03.08.11.55.31.75224.9049@shimizu-blume.com>
On Fri, 08 Mar 2002 10:06:25 -0500, Tim Bradshaw wrote:


>> I would have an easier time "trusting" my environment if I knew its
>> safety relied on proofs rather than on the mere hope that not one of a
>> bunch of a few thousand programmers whom I don't personally know has
>> screwed up somewhere.
> 
> Well, that's good.  Perhaps you should start writing a provably correct
> shared-memory multiprocessor OS that scales to, say, a 100 processor
> machine reasonably well and supports all the features of, say, Solaris
> on such HW, and performs as well.

I might, but I need someone to pay me for doing so.  Plus, there is no
guarantee that the result would be widely employed.  Look at what we are
stuck with now (Windows, Windows, Windows, with perhaps an odd Linux
here and there in the mix):  Technical superiority is not at all a
guarantee for success in this marketplace.  (Neither Windows nor Linux
would be so popular otherwise.)

By the way, what I am thinking of would never support "all the features
of, say, Solaris", at least not to the point of low-level (API-)
compatibility. This is because we are talking complete redesign of
*everything*, and this must start with exactly those interfaces.  Of
course, the resulting incompatibility with existing software would make
it even harder to be accepted. Basically, we are stuck with what we have
now.  I am quite confident that in, say, 100 years the world of OS design
will look quite different, but it will take a major revolution or two to
get there.

Matthias
From: Tim Bradshaw
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <ey3lmd3ar7p.fsf@cley.com>
* Matthias Blume wrote:

> I might, but I need someone to pay me for doing so.  Plus, there is no
> guarantee that the result would be widely employed.  Look at what we are
> stuck with now (Windows, Windows, Windows, with perhaps an odd Linux
> here and there in the mix):  Technical superiority is not at all a
> guarantee for success in this marketplace.  (Neither Windows nor Linux
> would be so popular otherwise.)

Well, OK.  Here's a related question: has anyone ever produced a
substantial system that dealt with physical hardware (and specifically
handled completely asynchronous events) that was provably correct?
`Substantial' would mean something of the size of a reasonable small
OS kernel, say 50-100,000 lines.  Bonus points for being deployed in
commercial use and explaining how it deals with hardware failure.

This isn't a completely rhetorical question.  While I find myself
getting annoyed by the static language / provable correctness people,
who seem to live in a strange alternative universe to the one I
inhabit, I'd be interested in knowing if anything substantial had ever
actually been done.

--tim
From: Erann Gat
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <gat-0803021031470001@192.168.1.50>
In article <···············@cley.com>, Tim Bradshaw <···@cley.com> wrote:

> * Matthias Blume wrote:
> 
> > I might, but I need someone to pay me for doing so.  Plus, there is no
> > guarantee that the result would be widely employed.  Look at what we are
> > stuck with now (Windows, Windows, Windows, with perhaps an odd Linux
> > here and there in the mix):  Technical superiority is not at all a
> > guarantee for success in this marketplace.  (Neither Windows nor Linux
> > would be so popular otherwise.)
> 
> Well, OK.  Here's a related question: has anyone ever produced a
> substantial system that dealt with physical hardware (and specifically
> handled completely asynchronous events) that was provably correct?
> `Substantial' would mean something of the size of a reasonable small
> OS kernel, say 50-100,000 lines.  Bonus points for being deployed in
> commercial use and explaining how it deals with hardware failure.

Not directly on point, but you might find the following interesting:

http://www.computer.org/tse/ts2001/e1000abs.htm

+++++

"Formal Analysis of a Space-Craft Controller Using SPIN"

Klaus Havelund, Mike Lowry, John Penix

     Abstract: This paper documents an application of the finite state
     model checker Spin to formally analyze a multithreaded plan
     execution module. The plan execution module is one component of
     NASA's New Millennium Remote Agent, an artificial
     intelligence-based space-craft control system architecture which
     launched in October of 1998 as part of the Deep Space 1 mission.
     The bottom layer of the plan execution module architecture is a
     domain specific language, named Esl (Executive Support Language),
     implemented as an extension to multithreaded Common Lisp. Esl
     supports the construction of reactive control mechanisms for
     autonomous robots and space-craft. For this case study, we
     translated the Esl services for managing interacting parallel
     goal-and-event driven processes into the Promela input language
     of Spin. A total of five previously undiscovered concurrency
     errors were identified within the implementation of Esl.
     According to the Remote Agent programming team, the effort has
     had a major impact, locating errors that would not have been
     located otherwise and, in one case, identifying a major design
     flaw. In fact, in a different part of the system, a concurrency
     bug identical to one discovered by this study escaped testing and
     caused a deadlock during an in-flight experiment 96 million
     kilometers from earth. The work additionally motivated the
     introduction of procedural abstraction in terms of inline
     procedures into Spin.
From: Marco Antoniotti
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <y6csn7a7tcx.fsf@octagon.mrl.nyu.edu>
···@jpl.nasa.gov (Erann Gat) writes:

> In article <···············@cley.com>, Tim Bradshaw <···@cley.com> wrote:
> 
	...
> Not directly on point, but you might find the following interesting:
> 
> http://www.computer.org/tse/ts2001/e1000abs.htm
> 

That is interesting.  May I ask you why SPIN was chosen and not, say,
`smv' or some of its descendants?

Incidentally, I'd like an OBDD CL library.  Any pointers out there?

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Erann Gat
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <gat-0803021431520001@192.168.1.50>
In article <···············@octagon.mrl.nyu.edu>, Marco Antoniotti
<·······@cs.nyu.edu> wrote:

> ···@jpl.nasa.gov (Erann Gat) writes:
> 
> > In article <···············@cley.com>, Tim Bradshaw <···@cley.com> wrote:
> > 
>         ...
> > Not directly on point, but you might find the following interesting:
> > 
> > http://www.computer.org/tse/ts2001/e1000abs.htm
> > 
> 
> That is interesting.  May I ask you why SPIN was chosen and not, say,
> `smv' or some of its descendants?


No idea.  You'd have to ask the authors of the paper.

E.
From: Stefan Monnier
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <5leliuhmcy.fsf@rum.cs.yale.edu>
> This isn't a completely rhetorical question.  While I find myself
> getting annoyed by the static language / provable correctness people,

There is a world of difference between strong static typing and
program correctness.


	Stefan
From: Tim Bradshaw
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <ey3y9h2ajwm.fsf@cley.com>
* Stefan Monnier wrote:
> There is a world of difference between strong static typing and
> program correctness.

That's OK, I find both groups equally annoying.

--tim
From: Sander Vesik
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <1015616668.914134@haldjas.folklore.ee>
In comp.lang.scheme Tim Bradshaw <···@cley.com> wrote:
> * Matthias Blume wrote:
> 
>> I might, but I need someone to pay me for doing so.  Plus, there is no
>> guarantee that the result would be widely employed.  Look at what we are
>> stuck with now (Windows, Windows, Windows, with perhaps an odd Linux
>> here and there in the mix):  Technical superiority is not at all a
>> guarantee for success in this marketplace.  (Neither Windows nor Linux
>> would be so popular otherwise.)
> 
> Well, OK.  Here's a related question: has anyone ever produced a
> substantial system that dealt with physical hardware (and specifically
> handled completely asynchronous events) that was provably correct?
> `Substantial' would mean something of the size of a reasonable small
> OS kernel, say 50-100,000 lines.  Bonus points for being deployed in
> commercial use and explaining how it deals with hardware failure.

Nobody in commercial use cares about that level - the 'proved correct'
things would only be found in the inner circles of large defence networks
in paranoid countries. 

> 
> This isn't a completely rhetorical question.  While I find myself
> getting annoyed by the static language / provable correctness people,
> who seem to live in a strange alternative universe to the one I
> inhabit, I'd be interested in knowing if anything substantial had ever
> actually been done.

Well, it really depends on how many features you want to provide in
the OS vs. libraries. If the OS provides a limited set of need-to-have
functions plus essential security and is designed from the ground up as
provable, then it's not impossible.

This is of course the Orange Book A1+ rating, which nobody - or close
to nobody - seems to hold, the result being that such systems are
probably still running on the machines they were designed on, or
similar ones (think core memory).

> 
> --tim

-- 
	Sander

+++ Out of cheese error +++
From: Marco Antoniotti
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <y6czo1i6bx7.fsf@octagon.mrl.nyu.edu>
Sander Vesik <······@haldjas.folklore.ee> writes:

	...

> Nobody in commercial use cares about that level - the 'proved correct'
> things would only be found in the inner circles of large defence networks
> in paranoid countries. 

Or Intel.

You underestimate the value of the term "provably correct".  The fact
that a lot of the research done on "verification" is difficult to
grasp does not mean that it has no value, even in commercial settings.

This is also part of the reason why I look with respect at the
`statically typed language' crowd.  I believe there is a lot of value in
that field.  I just hate what they did with the core languages (read:
they should have expanded CL), but the type inference algorithms are
fascinating (and not that difficult to implement, at least in simple form).

E.g. Suppose you wrote in CL

	(defun fun (l)
           (declare (type list l))
           (loop for x in l collect (+ x 100)))

Now, AFAIU, the compiler has no way to infer that the type of X in the
loop is anything more specific than NUMBER. This means that the + being
called will actually be 'generic'.

Suppose now that I have

(defvar *l* (list 1 2 3 4))

(declaim (type (simple-list fixnum) *l*))

(print (fun *l*))

What would be very nice here is for the compiler to recompile `fun'
(a different "instance" of it), re-inferring that the type of X
is going to be a FIXNUM.

Note that this is, AFAIK, even beyond what a *ML compiler can do. (I
believe it will infer the type of FUN to be `int list -> int list'.) I
might be wrong, since I have not followed the literature, but this is
what I think would be a very nice thing to have.

On a different note, the SIMPLE-LIST type declaration is not
ANSI. Being able to write

	(defun fun (l)
	   (declare (type (simple-list fixnum) l))
           (loop for x in l collect (+ x 100)))

and have the compiler know that X is going to be of type FIXNUM would
already be nice.
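For comparison, here is a rough analogue in a modern statically typed
language (TypeScript, chosen purely for illustration; the function name
is invented and this is not what Marco proposes for CL): the declared
parameter type plays the role of the (simple-list fixnum) declaration,
so the compiler statically knows the type of `x` inside the body and no
"generic" addition is needed.

```typescript
// Illustrative sketch only. The parameter type `number[]` tells the
// compiler the element type, so `x` is known to be a number inside
// the body at compile time.
function fun(l: number[]): number[] {
  return l.map((x) => x + 100); // x: number, no generic dispatch
}

const result = fun([1, 2, 3, 4]);
console.log(result); // [101, 102, 103, 104]
```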

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Tim Bradshaw
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <ey3u1rqaiou.fsf@cley.com>
* Sander Vesik wrote:

> Nobody in commercial use cares about that level - the 'proved correct'
> things would only be found in the inner circles of large defence networks
> in paranoid countries. 

Well, this is kind of interesting.  People with big commercial
applications *definitely* care about them being reliable and correct,
and are willing to pay serious money for this.  Banking systems can
cost many, many millions of pounds a day if they're down, and more
than that if they develop undetected errors.

So I guess that proved correct systems cost a lot more than this, if
they can be written at all.

I wonder, also, how the military people manage to get hardware which
is provably correct too - they must have some pretty amazing
technology to stop the things that cause random failures for the rest
of us hurting them.  I guess they get to save a lot by leaving off all
the dynamic checks since their software is known to be correct...

--tim
From: Simon Helsen
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <Pine.LNX.4.33.0203082247320.20582-100000@waialeale.informatik.uni-freiburg.de>
On 8 Mar 2002, Tim Bradshaw wrote:

>So I guess that proved correct systems cost a lot more than this, if
>they can be written at all.
>
>I wonder, also, how the military people manage to get hardware which
>is provably correct too - they must have some pretty amazing
>technology to stop the things that cause random failures for the rest
>of us hurting them.  I guess they get to save a lot by leaving off all
>the dynamic checks since their software is known to be correct...

perhaps not entirely addressing your questions, still, you might be
interested in what the following company does: http://www.polyspace.com/

In particular the following product:
http://www.polyspace.com/product_datasheet/cverifier.htm

Yes, I know, it's for C, and (to my own astonishment) they use abstract
interpretation (an obscure method which even statically typed people do
not necessarily like), and no, of course, they do not detect *all* errors
at compile time as they claim. But they give the developer the
opportunity to prove certain properties or to narrow down potentially
buggy parts (and to do so provably correctly!). So I can imagine that
combining such a tool with computer-guided (human) theorem proving may
actually get you a bit closer to an all provably correct OS.

Btw, they seem to have a whole bunch of customers who *are* actually
interested in provably correct products:
http://www.polyspace.com/references.htm

Kind regards,

	Simon
From: Nils Kassube
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <87ofhxhih0.fsf@kursk.kassube.de>
Tim Bradshaw <···@cley.com> writes:

> I wonder, also, how the military people manage to get hardware which
> is provably correct too - they must have some pretty amazing
> technology to stop the things that cause random failures for the rest
> of us hurting them.  I guess they get to save a lot by leaving off all

ROTFLBTC. 

comp.risks is next door. 

(Yes, this _must_ be a small malfunction of my irony detector.) 
From: Sander Vesik
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <1015694157.261258@haldjas.folklore.ee>
In comp.lang.scheme Tim Bradshaw <···@cley.com> wrote:
> * Sander Vesik wrote:
> 
>> Nobody in commercial use cares about that level - the 'proved correct'
>> things would only be found in the inner circles of large defence networks
>> in paranoid countries. 
> 
> Well, this is kind of interesting.  People with big commercial
> applications *definitely* care about them being reliable and correct,
> and are willing to pay serious money for this.  Banking systems can
> cost many, many millions of pounds a day if they're down, and more
> than that if they develop undetected errors.

They also tend to ask questions like 'how many millions of transactions per
second?' and 'let's say this disk here goes bad, what happens then?' They
don't just want systems that are reliable - they want systems that are
reliable, fast, and have a large capacity. And they couldn't care less
whether the downtime is due to the OS, the database, or the hardware.

> 
> So I guess that proved correct systems cost a lot more than this, if
> they can be written at all.

You are assuming that a "proved correct" OS kernel is a huge beneficial
asset on its own.

> 
> I wonder, also, how the military people manage to get hardware which
> is provably correct too - they must have some pretty amazing

For a start, they don't buy off the shelf PC boxes.

> technology to stop the things that cause random failures for the rest
> of us hurting them.  I guess they get to save a lot by leaving off all

Well, redundant and self-checking hardware isn't exactly new. There are
also algorithms for software that check that the result is correct as part
of the computation.

> the dynamic checks since their software is known to be correct...

I doubt it

> 
> --tim
> 
> 

-- 
	Sander

+++ Out of cheese error +++
From: Tim Bradshaw
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <ey3vgc58kva.fsf@cley.com>
* Sander Vesik wrote:

> They also tend to ask questions like 'how many millions of transactions per
> second?' and 'let's say this disk here goes bad, what happens then?' They
> don't just want systems that are reliable - they want systems that are
> reliable, fast, and have a large capacity. And they couldn't care less
> whether the downtime is due to the OS, the database, or the hardware.

Yes.  That was my point.  They look (or should look) at the
reliability and performance of the whole system, including hardware
and so forth, and they don't buy proved correct OS's.  So should the
military.

>> 
>> So I guess that proved correct systems cost a lot more than this, if
>> they can be written at all.

> You are assuming that a "proved correct" OS kernel is a huge beneficial
> asset on its own.

No, I'm not.  Actually I'm trying to make the point that it isn't, and
that you have to look at the total system issues. 

--tim
From: Sander Vesik
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <1015961690.489226@haldjas.folklore.ee>
In comp.lang.scheme Tim Bradshaw <···@cley.com> wrote:
> * Sander Vesik wrote:
> 
>> They also tend to ask questions like 'how many millions of transactions per
>> second?' and 'let's say this disk here goes bad, what happens then?' They
>> don't just want systems that are reliable - they want systems that are
>> reliable, fast, and have a large capacity. And they couldn't care less
>> whether the downtime is due to the OS, the database, or the hardware.
> 
> Yes.  That was my point.  They look (or should look) at the
> reliability and performance of the whole system, including hardware
> and so forth, and they don't buy proved correct OS's.  So should the
> military.
> 

But the military, its requirements and specs need not be "rational"
nor "cost effective".  Paranoia is considerably easier to satisfy
there as well.

> 
> --tim
> 

-- 
	Sander

+++ Out of cheese error +++
From: ····@pobox.com
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <7eb8ac3e.0203101325.5deddf78@posting.google.com>
Tim Bradshaw <···@cley.com> wrote in message news:<···············@cley.com>...

> Here's a related question: has anyone ever produced a
> substantial system that dealt with physical hardware (and specifically
> handled completely asynchronous events) that was provably correct?

Yes. J. Strother Moore mentioned in his invited talk at PADL02
that he, Boyer, and their students have built a microprocessor, an
assembler, a linker, and a simple OS -- and *formally proven them
correct*.

In less toy-like applications, J. Strother Moore and his students have
formally proved that the FPU of the AMD Athlon chip is correct.  Moore
and his students later proved correctness theorems for an IBM 4758
crypto-processor. The theorems contributed to the crypto-processor's
being awarded a FIPS 140-1 rating, the highest security rating for any
piece of hardware and software.  They have proved that an executable
model of the Rockwell-Collins JEM1, the world's first silicon Java
Virtual Machine, formally satisfies the JVM specifications.

The summary of J. Strother Moore talk can be read at:
	http://lambda.weblogs.com/discuss/msgReader$2629

The summary also mentions the issue of static typing of the prover
and its term language. The term language of ACL2 is a pure-functional
subset of Common Lisp. The prover is written in the same language.

BTW, the FPUs of Pentiums (save the first one) have also been
proven correct, by a group at Intel. They used a theorem prover that
is based on ML (and whose term language is based on ML). In many other
respects, Intel's theorem prover (FL) and ACL2 are surprisingly
similar.
From: Sander Vesik
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <1015961845.352866@haldjas.folklore.ee>
In comp.lang.scheme ····@pobox.com <····@pobox.com> wrote:
> Tim Bradshaw <···@cley.com> wrote in message news:<···············@cley.com>...
> 
>> Here's a related question: has anyone ever produced a
>> substantial system that dealt with physical hardware (and specifically
>> handled completely asynchronous events) that was provably correct?
> 
> Yes. J. Strother Moore has mentioned in his invited talk at PADL02
> that Boyer, he and their student have built a microprocessor; an
> assembler and a linker, and a simple OS -- and *formally proven them
> correct*.
> 
> In less toy applications, J. Strother Moore and his students have
> formally proved that the FPU of the AMD Athlon chip is correct.  Moore
> and his students later proved correctness theorems for an IBM 4758
> crypto-processor. The theorems contributed to the crypto-processor's
> being awarded IFIPS 140-1 rating, the highest security rating for any
> piece of hardware and software.  They have proved that executable
> model of the Rockwell-Collins JEM1, the world's first silicon Java
> Virtual Machine, formally satisfies the JVM specifications.

Would the (confusingly named) VLISP, which uses (or used to use)
pre-scheme in its backend, qualify as a non-toy example as well?

-- 
	Sander

+++ Out of cheese error +++
From: Skip Egdorf
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <3C8C1D9C.6040908@cybermesa.com>
>
>
>Well, OK.  Here's a related question: has anyone ever produced a
>substantial system that dealt with physical hardware (and specifically
>handled completely asynchronous events) that was provably correct?
>`Substantial' would mean something of the size of a reasonable small
>OS kernel, say 50-100,000 lines.  Bonus points for being deployed in
>commercial use and explaining how it deals with hardware failure.
>
>This isn't a completely rhetorical question.  While I find myself
>getting annoyed by the static language / provable correctness people,
>who seem to live in a strange alternative universe to the one I
>inhabit, I'd be interested in knowing if anything substantial had ever
>actually been done.
>
>--tim
>

Yes. The Honeywell SCOMP. This was a 16-bit minicomputer about like
a PDP-11/34 or so that was originally designed for a secure front end for
Multics. Honeywell added hardware to assist multi-level security and
coded a Unix (v6) like OS with the kernel in a Pascal subset. The system
call level of the OS was modeled against the standard security model of the
time and proven. It was sort of a prototype of the Orange Book A1 level
and was certified at the same time that the Orange Book was being completed.

Skip Egdorf
······@cybermesa.com
From: Daniel Barlow
Subject: Re: on OS design and language technology [was: Re: self-hosting gc]
Date: 
Message-ID: <87y9h2odhr.fsf@noetbook.telent.net>
Matthias Blume <········@shimizu-blume.com> writes:

> now.  I am quite confident that in, say, 100 years the world of OS design
> will look quite different, but it will take a major revolution or two to
> get there.

In 100 years?  I should bloody hope so, yes.


-dan

-- 

  http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources 
From: David Rush
Subject: Re: self-hosting gc
Date: 
Message-ID: <okfhenq90mr.fsf@bellsouth.net>
Tim Bradshaw <···@cley.com> writes:
> > I would have an easier time "trusting" my environment if I knew its
> > safety relies on proofs rather than on mere hope that not one of a bunch of
> > a few thousand programmers who I personally don't know has screwed up
> > somewhere.
> 
> Well, that's good.  Perhaps you should start writing a provably
> correct shared-memory multiprocessor OS that scales to, say, a 100
> processor machine reasonably well and supports all the features of,
> say, Solaris on such HW, and performs as well.

It's not hard to make the argument that he has, given that he's
working on SML/NJ. The *first* thing that's needed is a solid language
platform.

david rush
-- 
With guns, we are citizens. Without them, we are subjects.
	-- YZGuy, IPL
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6dcro$djtk4$1@ID-125440.news.dfncis.de>
In article <···············@bellsouth.net>, David Rush wrote:
> Tim Bradshaw <···@cley.com> writes:
>> > I would have an easier time "trusting" my environment if I knew its
>> > safety relies on proofs rather than on mere hope that not one of a bunch of
>> > a few thousand programmers who I personally don't know has screwed up
>> > somewhere.
>> 
>> Well, that's good.  Perhaps you should start writing a provably
>> correct shared-memory multiprocessor OS that scales to, say, a 100
>> processor machine reasonably well and supports all the features of,
>> say, Solaris on such HW, and performs as well.
> 
> It's not hard to make the argument that he has, given that he's
> working on SML/NJ. The *first* thing that's needed is a solid language
> platform.

Maybe so; only that Lisp is a solid language platform, too.  SML/NJ
is started by a shell script called ``.run-sml''.  I had to hack it
before I could use SML/NJ, because it contained a bug that passed
a command line argument like "foo bar" as two arguments "foo" and
"bar".  The shell script wasn't written in SML, of course, but the
point is: bugs lurk everywhere.  You will not magically get rid of
them only because you use a language that is statically strongly
typed instead of one that is dynamically strongly typed.  Typed
lambda calculus is a nice thing.  But that does not imply at all,
as many people seem to believe, that we suddenly all have to
use a statically typed language in order to write good software.
You can just as well regard dynamic typing as a /feature/ of your
language which makes your language more expressive and hence
programming easier; and easily written programs usually have /fewer/
bugs than more complicated ones.  There is no way to prove whether
this is true or not.  All you can do is try and see for yourself :-)

And another point I made long ago /still/ remains:  If people think
we need statically typed languages to write working software, because
there provably won't be any runtime type-errors [1], why not go
all the way and demand Haskell?  Look into your mind and answer
yourself why you wouldn't write an OS in Haskell and you are
almost there:  You'll suddenly see why Lisp people don't use
statically typed languages :-)

Oh, and look at this:

(defun first-two (list)
  (list (car list) (cadr list)))

(defun start-program (program &rest args)
  (handler-case (apply (symbol-function program) args)
    (type-error (cnd)
                (format *error-output*
                        "~&~A was shut down because of a type error:~&~A"
                        program cnd))))



BLARK 7 > (start-program 'first-two 42)
FIRST-TWO was shut down because of a type error:
42 is not of type CONS.
NIL

BLARK 8 > 


See?  Why couldn't my OS do just that?  No weird things happen,
no BSOD, no kernel panic.  Aren't there any exceptions you
might forget to catch in *ML?


[1] As if those languages didn't contain undocumented loopholes that
    let you circumvent that security; read the OCaml list and learn
    that advanced users of OCaml use them all the time.  I always
    have to laugh when I see that.

Regards,
-- 
Nils Goesche                          PGP key ID 0x42B32FC9

"The sooner all the animals are dead, the sooner we'll find
 their money."                              -- Ed Bluestone
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.09.12.48.52.348925.31024@shimizu-blume.com>
On Sat, 09 Mar 2002 11:22:16 -0500, Nils Goesche wrote:

> In article <···············@bellsouth.net>, David Rush wrote:
>> Tim Bradshaw <···@cley.com> writes:
>>> > I would have an easier time "trusting" my environment if I knew its
>>> > safety relies on proofs rather than on mere hope that not one of a
>>> > bunch of a few thousand programmers who I personally don't know
>>> > has screwed up somewhere.
>>> 
>>> Well, that's good.  Perhaps you should start writing a provably
>>> correct shared-memory multiprocessor OS that scales to, say, a 100
>>> processor machine reasonably well and supports all the features of,
>>> say, Solaris on such HW, and performs as well.
>> 
>> It's not hard to make the argument that he has, given that he's working
>> on SML/NJ. The *first* thing that's needed is a solid language
>> platform.
> 
> Maybe so; only that Lisp is a solid language platform, too.

But it is not statically typed, so the point is moot.  It is not
expressive enough to do the things that I want to do.  (That is: I can't
express invariants as types.)

>  SML/NJ is
> started by a shell script called ``.run-sml''.  I had to hack it before
> I could use SML/NJ, because it contained a bug that passed a command
> line argument like "foo bar" as two arguments "foo" and "bar".  The
> shell script wasn't written in SML, of course, but the point is: bugs
> lurk everywhere.

Thank you for just making my point:  Since we still have to live in the world
of Unix, we have to somehow get started.  That's why there is
Unix-specific stuff.  I'd *love* to get rid of it, but currently I can't,
precisely because of how the OS works.

> You will not magically get rid of them only because
> you use a language that is statically strongly typed instead of one that
> is dynamically strongly typed.  Typed lambda calculus is a nice thing.
> But that does not imply at all, as many people seem to believe, that we
> suddenly all have to use a statically typed language in order to write
> good software. You can just as well regard dynamic typing as a /feature/
> of your language which makes your language more expressive and hence
> programming easier; and easily written programs usually have /fewer/ bugs
> than more complicated ones.  There is no way to prove whether this is
> true or not.  All you can do is try and see for yourself :-)

Well, what can I say.  I did, and I found the opposite to be true.  But I
said that already...

> And another point I made long ago /still/ remains:  If people think we
> need statically typed languages to write working software, because there
> provably won't be any runtime type-errors [1], why not go all the way
> and demand Haskell?  Look into your mind and answer yourself why you
> wouldn't write an OS in Haskell and you are almost there:  You'll
> suddenly see why Lisp people don't use statically typed languages :-)

Nonsense.  In Haskell it is very, very hard to get an intuitive grasp on
resource consumption.  Switch two operands of a +, and your program blows
up.  This is due to lazy evaluation (which, for many other purposes, has
its very good sides, too).  This has nothing to do with static vs.
dynamic typing.

> See?  Why couldn't my OS do just that?  No weird things happen, no BSOD,
> no kernel panic.  Aren't there any exceptions you might forget to catch
> in *ML?

True.  But there are static exception analyzers that try to mitigate this
problem.  In the end, yes, there will always be things that need to be
checked at runtime.  But I am not willing to pay for those that don't
have to be.

The exception thing is also a program design problem -- and those don't
get magically resolved by switching over to a statically typed language
alone.  What the language gives you is a set of tools that enable you to
redesign the problem.

Simple example:  I have a finite map data structure.  Should lookup raise
an exception if the item to be looked up is not in the domain of the map?
Well, I prefer to have lookup return an option.  This is sometimes less
convenient to use, but it makes it clear at the type level that  one must
somehow be prepared to deal with the failure case.  (Of course, in
certain applications the other interface is more convenient.)
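A minimal sketch of the interface trade-off Matthias describes
(TypeScript rather than SML, purely for illustration; the names `env`,
`lookup`, and `lookupOrZero` are invented): a lookup that returns
`V | undefined` makes the failure case visible in the type, so a strict
compiler forces the caller to deal with it, whereas an exception-raising
lookup does not.

```typescript
// Illustrative sketch. Map.get already has the "option-style"
// signature: it returns V | undefined instead of throwing on a
// missing key.
const env = new Map<string, number>([["answer", 42]]);

function lookup(key: string): number | undefined {
  return env.get(key);
}

// Under strict null checks the caller cannot use the result as a
// number without first handling the undefined case.
function lookupOrZero(key: string): number {
  const v = lookup(key);
  return v === undefined ? 0 : v;
}

console.log(lookupOrZero("answer"));  // 42
console.log(lookupOrZero("missing")); // 0
```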

> [1] As if those languages didn't contain undocumented loopholes that
>     let you circumvent that security; read the OCaml list and learn that
>     advanced users of OCaml use them all the time.  I always have to
>     laugh when I see that.

So do I.  (I don't program in Ocaml.)  By the way, Ocaml is a very poor
example because "language" coincides with "the one and only
implementation".  But take SML, for example:  There are *no*
"undocumented" loopholes in the "language", but most implementations
probably do provide some "escape hatches". Clearly, if the language is
to be used in the way that I outlined, then there must be no loopholes
in the implementation.  (Or, those loopholes must be governed by some
access control mechanism.  This is just like "root" in Unix:  "root" is
a *gigantic* loophole, but only privileged users have access to it -- at
least that's the theory. :-)

Matthias
From: Marco Antoniotti
Subject: Re: self-hosting gc
Date: 
Message-ID: <y6cofhx37d9.fsf@octagon.mrl.nyu.edu>
Matthias Blume <········@shimizu-blume.com> writes:

> On Sat, 09 Mar 2002 11:22:16 -0500, Nils Goesche wrote:
> 
> > In article <···············@bellsouth.net>, David Rush wrote:
> >> Tim Bradshaw <···@cley.com> writes:
> >>> > I would have an easier time "trusting" my environment if I knew its
> >>> > safety relies on proofs rather than on mere hope that not one of a
> >>> > buch of a few thousand programmers who I personally don't know
> >>> > haven't screwed up somewhere.
> >>> 
> >>> Well, that's good.  Perhaps you should start writing a provably
> >>> correct shared-memory multiprocessor OS that scales to, say, a 100
> >>> processor machine reasonably well and supports all the features of,
> >>> say, Solaris on such HW, and performs as well.
> >> 
> >> It's not hard to make the argument that he has, given that he's working
> >> on SML/NJ. The *first* thing that's needed is a solid language
> >> platform.
> > 
> > Maybe so; only that Lisp is a solid language platform, too.
> 
> But it is not statically typed, so the point is moot.  It is not
> expressive enough to do the things that I want to do.  (That is: I can't
> express invariants as types.)

And why do you need to express an "invariant" as a "type"?

Seriously, most uses of the type system I saw in *ML languages are to
define "sublanguages".  Here is an example from a Type Checker written
in very standard (AFAIU) SML/NJ

datatype Expression = constant of Constant |
		      variable of Variable |
		      lambda of Variable * Expression |
		      conditional of Expression * Expression * Expression |
		      application of Expression * Expression |
		      let_exp of (Variable list) *
				 (Expression list) *
				 Expression |
		      letrec_exp of (Variable list) *
				    (Expression list) *
				    Expression |
		      oper of Operator * Expression * Expression

So, now you have your "type", which is nothing but a glorified
S-expr.  This is essentially my personal experience with *ML programs.

Yes.  I do believe that type inferencing does buy you a lot.  I just
do not think it buys you enough to make you switch.

After all

(defun foo (x) (+ x 3))
(defun baz () (car (foo 3)))

When compiling with CMUCL gives me the following

* (compile 'foo)
FOO
NIL
NIL

* (compile 'baz)
In: LAMBDA NIL
  (FOO 3)
Warning: Result is a NUMBER, not a (VALUES &OPTIONAL LIST &REST T).

Compilation unit finished.
  1 warning


BAZ
T
T


You may not consider this sufficient, but it is already a lot.

	...

> Simple example:  I have a finite map data structure.  Should lookup raise
> an exception if the item to be looked up is not in the domain of the map?
> Well, I prefer to have lookup return an option.  This is sometimes less
> convenient to use, but it makes it clear at the type level that  one must
> somehow be prepared to deal with the failure case.  (Of course, in
> certain applications the other interface is more convenient.)

This is one of the typical examples of different viewpoints, which are
just an example of "diversity".

Your insistence that "at the type level" you "prefer to return an option"
(it'd be nice to explain what this actually means) is just a fancy way
to say: my function will return a second value of NIL if the item is
not in the map.  E.g. this is what GETHASH does.  Somehow, I will have
to deal with the case,  but this is simply part of the "contract" the
function writer is asking me to respect.  There is nothing magical
about "types" in this context (unless I am grossly misunderstanding
something).

I.e. defining something like

datatype 'a result = item of 'a | not_found

fun find x [] = not_found
  | find x (first::rest) = if x = first then item(first) else find x rest

and then writing code that does not handle the `not_found' type
constant is just the same as writing code that does not handle the
NIL.

fun foo item(x) = x + 4

is definable (AFAIK) and all you will get is a warning about a
possible match exception.  Granted, it is better than doing

(defun find* (x list) ....)

(defun foo (x) (+ x 4))

(foo (find* 3 '(1 2 44)))

But the bottom line is that you will still get a runtime error if you
wrote

foo (find 3 [1, 2, 44])

And, at the end of the day, by moving to *ML, I have lost code and
data equivalency, a powerful macro system and multimethods.  This is
why I stick to CL, hoping that the compilers will improve their type
inferencing capabilities.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: ·······@andrew.cmu.edu
Subject: Re: self-hosting gc
Date: 
Message-ID: <20020309233637.C11996@emu>
I think he means by "return an option" the datatype:

datatype 'a option = SOME of 'a | NONE

which is essentially what you thought, it seems, anyway.

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
From: Marco Antoniotti
Subject: Re: self-hosting gc
Date: 
Message-ID: <y6cu1rn2gjd.fsf@octagon.mrl.nyu.edu>
·······@andrew.cmu.edu writes:

> I think he means by "return an option" the datatype:
> 
> datatype 'a option = SOME of 'a | NONE
> 
> which is essentially what you thought, it seems, anyway.

Yes, thanks, I figured it out by looking at some of my old code.


Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.10.00.00.39.761796.1795@shimizu-blume.com>
On Sat, 09 Mar 2002 13:48:18 -0500, Marco Antoniotti wrote:


> Matthias Blume <········@shimizu-blume.com> writes:
> 
>> On Sat, 09 Mar 2002 11:22:16 -0500, Nils Goesche wrote:
>> 
>> > In article <···············@bellsouth.net>, David Rush wrote:
>> >> Tim Bradshaw <···@cley.com> writes:
>> >>> > I would have an easier time "trusting" my environment if I knew
>> >>> > its safety relies on proofs rather than on mere hope that not one
>> >>> > of a buch of a few thousand programmers who I personally don't
>> >>> > know haven't screwed up somewhere.
>> >>> 
>> >>> Well, that's good.  Perhaps you should start writing a provably
>> >>> correct shared-memory multiprocessor OS that scales to, say, a 100
>> >>> processor machine reasonably well and supports all the features of,
>> >>> say, Solaris on such HW, and performs as well.
>> >> 
>> >> It's not hard to make the argument that he has, given that he's
>> >> working on SML/NJ. The *first* thing that's needed is a solid
>> >> language platform.
>> > 
>> > Maybe so; only that Lisp is a solid language platform, too.
>> 
>> But it is not statically typed, so the point is moot.  It is not
>> expressive enough to do the things that I want to do.  (That is: I
>> can't express invariants as types.)
> 
> And why do you need to express an "invariant" as a "type"?
> 
> Seriously, most uses of the type system I saw in *ML languages are to
> define "sublanguages".  Here is an example from a Type Checker written
> in very standard (AFAIU) SML/NJ
> 
> datatype Expression = constant of Constant |
> 		      variable of Variable |
> 		      lambda of Variable * Expression |
> 		      conditional of Expression * Expression * Expression |
> 		      application of Expression * Expression | let_exp of (Variable
> 		      list) *
> 				 (Expression list) *
> 				 Expression |
> 		      letrec_exp of (Variable list) *
> 				    (Expression list) *
> 				    Expression |
> 		      oper of Operator * Expression * Expression
> 
> So, now you have your "type" which is nothing else than a glorified
> S-expr.  This is essentially my personal experience with *ML programs.

You conveniently gloss over the important difference by using the word
"glorified".  Notice that this is *not* an S-expression at all!  It is a
brand-new type that cannot be confused with anything else.  Yes, you
could encode values of this type as S-expressions, but programming at
that level would not detect when you accidentally confuse such a value
with a completely unrelated one, it does not give you feedback about
whether your code handles all the cases, you can accidentally create
ill-formed S-expressions that do not correspond to any Expression value,
etc...

>> Simple example:  I have a finite map data structure.  Should lookup
>> raise an exception if the item to be looked up is not in the domain of
>> the map? Well, I prefer to have lookup return an option.  This is
>> sometimes less convenient to use, but it makes it clear at the type
>> level that  one must somehow be prepared to deal with the failure case.
>>  (Of course, in certain applications the other interface is more
>> convenient.)
> 
> This is one of the typical examples of different viewpoints, which are
> just an example of "diversity".
> 
> You insistence on "at the type level" you "prefer to return an option"
> (it'd be nice to explain what this actually means)

It's a standard type constructor in SML:  A value of type T option can be
either SOME t where t is a value of type T or NONE.
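
In code, that description amounts to no more than this (a toy
illustration):

val present : int option = SOME 42
val absent  : int option = NONE

(* Pattern matching forces both cases to be spelled out: *)
fun orZero (SOME n) = n
  | orZero NONE     = 0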

> is just a fancy way
> to say: my function will return a second value of NIL if the item is not
> in the map.  E.g. this is what GETHASH does.  Somehow, I will have to
> deal with the case,  but this is simply part of the "contract" the
> function writer is asking me to respect.  There is nothing magical about
> "types" in this context (unless I am grossly misunderstanding
> something).
> 
> I.e. defining something like
> 
> datatype 'a result = item of 'a | not_found

This is precisely the (alpha-renamed) option type that I mentioned.

> 
> fun find x [] = not_found
>   | find x (first::rest) = if x = first then item(first) else find x
>   rest
> 
> and then writing code that does not handle the `not_found' type constant
> is just the same as writing code that does not handle the NIL.
> 
> fun foo item(x) = x + 4

The correct syntax is

  fun foo (item x) = x + 4

> 
> is definable (AFAIK) and all you will get is a warning about a possible
> match exception.  Granted, it is better than doing
> 
> (defun find* (x list) ....)
> 
> (defun foo (x) (+ x 4))
> 
> (foo (find* 3 '(1 2 44)))
> 
> But the bottom line is that you will still get a runtime error if you
> wrote
> 
> foo (find 3 [1, 2, 44])

Only if you tolerate compile-time warnings of the sort "non-exhaustive
match".  I never do that in my own code.  (And if I had designed ML,
I'd have made the above definition of foo illegal.)

But even now we are better off than in Lisp:  The compiler will *force*
me to write "item x" and not just "x".  And by doing so it will remind me
that there is another case to consider.  As it stands, yes, I can choose
to not handle the other case. But I couldn't simply forget that there
*are* cases.
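
A small sketch of that point, reusing the datatype from earlier in the
thread:

datatype 'a result = item of 'a | not_found

(* Compiles, but with a "non-exhaustive match" warning; applying it
   to not_found fails at run time with a Match exception. *)
fun fooPartial (item x) = x + 4

(* Covering both constructors removes the warning and documents the
   failure case right in the code: *)
fun foo (item x)  = x + 4
  | foo not_found = 0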

Matthias
From: Marco Antoniotti
Subject: Re: self-hosting gc
Date: 
Message-ID: <y6cr8mr2fjp.fsf@octagon.mrl.nyu.edu>
Matthias Blume <········@shimizu-blume.com> writes:

> On Sat, 09 Mar 2002 13:48:18 -0500, Marco Antoniotti wrote:
> 
> 
> > Matthias Blume <········@shimizu-blume.com> writes:
> > 
> >> On Sat, 09 Mar 2002 11:22:16 -0500, Nils Goesche wrote:
> >> 
> >> > In article <···············@bellsouth.net>, David Rush wrote:
> >> >> Tim Bradshaw <···@cley.com> writes:
> >> >>> > I would have an easier time "trusting" my environment if I knew
> >> >>> > its safety relies on proofs rather than on mere hope that not one
> >> >>> > of a buch of a few thousand programmers who I personally don't
> >> >>> > know haven't screwed up somewhere.
> >> >>> 
> >> >>> Well, that's good.  Perhaps you should start writing a provably
> >> >>> correct shared-memory multiprocessor OS that scales to, say, a 100
> >> >>> processor machine reasonably well and supports all the features of,
> >> >>> say, Solaris on such HW, and performs as well.
> >> >> 
> >> >> It's not hard to make the argument that he has, given that he's
> >> >> working on SML/NJ. The *first* thing that's needed is a solid
> >> >> language platform.
> >> > 
> >> > Maybe so; only that Lisp is a solid language platform, too.
> >> 
> >> But it is not statically typed, so the point is moot.  It is not
> >> expressive enough to do the things that I want to do.  (That is: I
> >> can't express invariants as types.)
> > 
> > And why do you need to express an "invariant" as a "type"?
> > 
> > Seriously, most uses of the type system I saw in *ML languages are to
> > define "sublanguages".  Here is an example from a Type Checker written
> > in very standard (AFAIU) SML/NJ
> > 
> > datatype Expression = constant of Constant |
> > 		      variable of Variable |
> > 		      lambda of Variable * Expression |
> > 		      conditional of Expression * Expression * Expression |
> > 		      application of Expression * Expression | let_exp of (Variable
> > 		      list) *
> > 				 (Expression list) *
> > 				 Expression |
> > 		      letrec_exp of (Variable list) *
> > 				    (Expression list) *
> > 				    Expression |
> > 		      oper of Operator * Expression * Expression
> > 
> > So, now you have your "type" which is nothing else than a glorified
> > S-expr.  This is essentially my personal experience with *ML programs.
> 
> You conveniently gloss over the important difference by using the word
> "glorified".  Notice that this is *not* an S-expression at all!  It is a
> brand-new type that cannot be confused with anything else.

I understand that that is your point of view.  My answer to that is:
"so what"?  The fact that it is a type does help me, yes; I just claim
that what you must give up to switch to *ML is too much.

> Yes, you
> could encode values of this type as S-expressions, but programming at
> that level would not detect when you accidentally confuse such a value
> with a completely unrelated one, it does not give you feedback about
> whether your code handles all the cases, you can accidentally create
> ill-formed S-expressions that do not correspond to any Expression value,
> etc...

That is called debugging.  Maybe the case above is a bad case, since
it does mix syntax and semantics.  However, when you write some
processor of the kind in CL you essentially write something like

	(ecase (first expr)
           (op1 ...)
           (op2 ...)
           ...
           (opN ...))

So, your "malformed" S-expr case is taken care of.  Eventually your
program will have to be "correct".  The type inference will help you,
sure, but again, it is not something that helps you *that much*.

> >> Simple example:  I have a finite map data structure.  Should lookup
> >> raise an exception if the item to be looked up is not in the domain of
> >> the map? Well, I prefer to have lookup return an option.  This is
> >> sometimes less convenient to use, but it makes it clear at the type
> >> level that  one must somehow be prepared to deal with the failure case.
> >>  (Of course, in certain applications the other interface is more
> >> convenient.)
> > 
> > This is one of the typical examples of different viewpoints, which are
> > just an example of "diversity".
> > 
> > You insistence on "at the type level" you "prefer to return an option"
> > (it'd be nice to explain what this actually means)
> 
> It's a standard type constructor in SML:  A value of type T option can be
> either SOME t where t is a value of type T or NONE.
> 
> > is just a fancy way
> > to say: my function will return a second value of NIL if the item is not
> > in the map.  E.g. this is what GETHASH does.  Somehow, I will have to
> > deal with the case,  but this is simply part of the "contract" the
> > function writer is asking me to respect.  There is nothing magical about
> > "types" in this context (unless I am grossly misunderstanding
> > something).
> > 
> > I.e. defining something like
> > 
> > datatype 'a result = item of 'a | not_found
> 
> This is precisely the (alpha-renamed) option type that I mentioned.
> 
> > 
> > fun find x [] = not_found
> >   | find x (first::rest) = if x = first then item(first) else find x
> >   rest
> > 
> > and then writing code that does not handle the `not_found' type constant
> > is just the same as writing code that does not handle the NIL.
> > 
> > fun foo item(x) = x + 4
> 
> The correct syntax is
> 
>   fun foo (item x) = x + 4

AFAIK, both syntaxes are correct.  Looks like you prefer a Lispy one :)

> > is definable (AFAIK) and all you will get is a warning about a possible
> > match exception.  Granted, it is better than doing
> > 
> > (defun find* (x list) ....)
> > 
> > (defun foo (x) (+ x 4))
> > 
> > (foo (find* 3 '(1 2 44)))
> > 
> > But the bottom line is that you will still get a runtime error if you
> > wrote
> > 
> > foo (find 3 [1, 2, 44])
> 
> Only if you tolerate compile-time warnings of the sort "non-exhaustive
> match".  I never do that in my own code.  (And if I had designed ML,
> I'd made the above definition of foo illegal.)

But it is not illegal.  Not only that: with the benefit of hindsight,
it was left like that (i.e. only as a warning) exactly because you want
to achieve some CL flexibility and not let the compiler get in your way
when doing what used to be called "prototyping" and "exploratory
programming".

> But even now we are better off than in Lisp:  The compiler will *force*
> me to write "item x" and not just "x".  And by doing so it will remind me
> that there is another case to consider.  As it stands, yes, I can choose
> to not handle the other case. But I couldn't simply forget that there
> *are* cases.

That is correct.  And I applaud the *ML systems for doing that.
However, first, you can extend Common Lisp to accept such union
datatypes with a layer of macrology; second, by switching to *ML, you
still miss *all* the other niceties that CL offers you.

IMHO the best language you may want is the CL of 1994 + type inference.
*ML languages get the type inference, but forgo too much of the rest.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.11.12.35.59.10146.12504@shimizu-blume.com>
On Mon, 11 Mar 2002 12:13:46 -0500, Marco Antoniotti wrote:

>> > fun foo item(x) = x + 4
>> 
>> The correct syntax is
>> 
>>   fun foo (item x) = x + 4
> 
> AFAIK, both syntaxes are correct.  Looks like you prefer a Lispy one :)

You are wrong, you can trust me on this one.  (You may put parentheses around the
x, but the ones around "item x" are not optional. Actually, they are, but
leaving them out gives you a completely different function.)

Matthias
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.11.13.28.05.722908.12899@shimizu-blume.com>
On Mon, 11 Mar 2002 12:36:01 -0500, Matthias Blume wrote:

> On Mon, 11 Mar 2002 12:13:46 -0500, Marco Antoniotti wrote:
> 
>>> > fun foo item(x) = x + 4
>>> 
>>> The correct syntax is
>>> 
>>>   fun foo (item x) = x + 4
>> 
>> AFAIK, both syntaxes are correct.  Looks like you prefer a Lispy one :)
> 
> You are wrong, you can trust me on this one.  (You may put parentheses
> around the x, but the ones around "item x" are not optional.

So far so good...

> Actually,
> they are, but leaving them out gives you a completely different
> function.)

... but this one I take back. The definition without parentheses around
"item x" or "item (x)" is not valid SML.

Matthias
From: Marco Antoniotti
Subject: Re: self-hosting gc
Date: 
Message-ID: <y6czo1e24zf.fsf@octagon.mrl.nyu.edu>
Matthias Blume <········@shimizu-blume.com> writes:

> >> AFAIK, both syntaxes are correct.  Looks like you prefer a Lispy one :)
> > 
> > You are wrong, you can trust me on this one.  (You may put parentheses
> > around the x, but the ones around "item x" are not optional.
> 
> So far so good...
> 
> > Actually,
> > they are, but leaving them out gives you a completely different
> > function.)
> 
> ... but this one I take back. The definition without parentheses around
> "item x" or "item (x)" is not valid SML.

Fine.  The implementation I am using (Poplog ML) is not up to speed.

This is another of the annoying things about writing *ML code.  On the
surface, the language seems not to care about parenthesization.
However, because of issues like the `(item x)' above, you end up
needing a lot of them.  So, either you program defensively and put a
lot of parentheses here and there, or you keep fighting with 'tycon'
errors etc. etc.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.11.16.24.58.136008.13460@shimizu-blume.com>
On Mon, 11 Mar 2002 16:01:56 -0500, Marco Antoniotti wrote:

> This is another of the annoying things about writing *ML code.  On the
> surface, the language seems not to care about parenthesization. However,
> because of issues like the `(item x)' above, you end up needing a lot of
> them.  So, either you program defensively and put a lot of parenthiesis
> here and there, or you keep fighting with 'tycon' errors etc. etc.

I also don't know how you arrived at the impression that ML does not
care about parenthesization.  It is pretty much like every other
language except Lisp in that it tolerates extra parentheses around
things that are already grouped (so x and (x) are the same, etc.).
But it obviously does need parentheses in some cases.  (Otherwise, why
would there be any parentheses in the language in the first place?)

Internalizing those few simple rules that govern parentheses takes maybe
a week, two at most.  After that, the kind of "fight" that you describe
should be a thing of the past, pretty much.
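
For instance (a couple of toy definitions to show the grouping rules;
nothing here is specific to any one implementation):

fun add x y = x + y        (* curried: int -> int -> int *)
val five  = add 2 3        (* application associates left: (add 2) 3 *)
val five' = add (2) (3)    (* extra parentheses around atoms are harmless *)

(* In a pattern, however, a constructor applied to its argument is a
   unit and must be parenthesized, which is why `(item x)' above needs
   its parentheses. *)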

Matthias
From: Marco Antoniotti
Subject: Re: self-hosting gc
Date: 
Message-ID: <y6cpu2azse6.fsf@octagon.mrl.nyu.edu>
Matthias Blume <········@shimizu-blume.com> writes:

> On Mon, 11 Mar 2002 16:01:56 -0500, Marco Antoniotti wrote:
> 
> > This is another of the annoying things about writing *ML code.  On the
> > surface, the language seems not to care about parenthesization. However,
> > because of issues like the `(item x)' above, you end up needing a lot of
> > them.  So, either you program defensively and put a lot of parenthiesis
> > here and there, or you keep fighting with 'tycon' errors etc. etc.
> 
> I also don't know how you arrived at the impression that ML does not
> care about parenthesization.  It is pretty much like every other language
> with the main exception being Lisp in that it tolerates extra parentheses
> around things that are already grouped (so x and (x) are the same etc.).
> But it obviously does need parentheses in some cases. (Or otherwise, why
> would there be any parentheses in the language in the first place?)
> 
> Internalizing those few simple rules that govern parentheses takes maybe
> a week, two at most.  After that, the kind of "fight" that you describe
> should be a thing of the past, pretty much.

Of course.  I should have qualified the above statement, by saying
that "I" find it annoying.  Anyway, there are still too many things I
miss from CL for me to have made the switch to *ML.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <874rjmr1y5.fsf@becket.becket.net>
Matthias Blume <········@shimizu-blume.com> writes:

> On Mon, 11 Mar 2002 12:13:46 -0500, Marco Antoniotti wrote:
> 
> >> > fun foo item(x) = x + 4
> >> 
> >> The correct syntax is
> >> 
> >>   fun foo (item x) = x + 4
> > 
> > AFAIK, both syntaxes are correct.  Looks like you prefer a Lispy one :)
> 
> You are wrong, you can trust me on this one.  (You may put
> parentheses around the x, but the ones around "item x" are not
> optional. Actually, they are, but leaving them out gives you a
> completely different function.)

It's things like this that make me glad that
Scheme/Lisp/whatever-you-want-to-call-it doesn't even *have* syntax of
this sort.
From: Sander Vesik
Subject: Re: self-hosting gc
Date: 
Message-ID: <1015697789.673218@haldjas.folklore.ee>
In comp.lang.scheme Matthias Blume <········@shimizu-blume.com> wrote:

[snip]

> Thank you for just making my point:  Since we still have to live in the world
> of Unix, we have to somehow get started.  That's why there is
> Unix-specific stuff.  I'd *love* to get rid of it, but currently I can't,
> precisely because of how the OS works.

You mean there is a really, quantifiable problem that keeps you from supporting the 

	#!/path/to/interpreter [arg1 [arg2 ... ] ...]

format?

> So do I.  (I don't program in Ocaml.)  By the way, Ocaml is a very poor
> example because "language" coincides with "the one and only
> implementation".  But take SML, for example:  There are *no*
> "undocumented" loopholes in the "language", but most implementations
> probably do provide some "escape hatches". Clearly, if the language is
> to be used in the way that I outlined, then there must be no loopholes
> in the implementation.  (Or, those loopholes must be governed by some
> access control mechanism.  This is just like "root" in Unix:  "root" is
> a *gigantic* loophole, but only privileged users have access to it -- at
> least that's the theory. :-)

This is not strictly true - both the BSD runlevels and the B2 variants
of the various commercial Unixes are counterexamples.

> 
> Matthias

-- 
	Sander

+++ Out of cheese error +++
From: Jeffrey M. Vinocur
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6duqn$cvi$4@marduk.litech.org>
In article <·················@haldjas.folklore.ee>,
Sander Vesik  <······@haldjas.folklore.ee> wrote:
>
>You mean there is a realy, quantifyable problem that keeps you from
>supporting the 
>
>	#!/path/to/interpreter [arg1 [arg2 ... ] ...]
>
>format?

What?

The reason they use a shell script is that they support
running on multiple architectures with the same PATH setting.
No binary can do that...


-- 
Jeffrey M. Vinocur   *   ·····@cornell.edu
http://www.people.cornell.edu/pages/jmv16/
From: Sander Vesik
Subject: Re: self-hosting gc
Date: 
Message-ID: <1015945835.164480@haldjas.folklore.ee>
In comp.lang.scheme Jeffrey M. Vinocur <·····@cornell.edu> wrote:
> In article <·················@haldjas.folklore.ee>,
> Sander Vesik  <······@haldjas.folklore.ee> wrote:
>>
>>You mean there is a realy, quantifyable problem that keeps you from
>>supporting the 
>>
>>       #!/path/to/interpreter [arg1 [arg2 ... ] ...]
>>
>>format?
> 
> What?
> 
> The reason they use a shell script is because they support
> running on multiple architectures with the same PATH setting.
> No binary can do that...
> 

Just like perl? Oh wait, perl handles this just fine by having a
default assumed location for the perl interpreter...

-- 
	Sander

+++ Out of cheese error +++
From: Seth Gordon
Subject: Re: self-hosting gc
Date: 
Message-ID: <3C8E3809.BFBF14D@genome.wi.mit.edu>
Sander Vesik wrote:
> 
> > The reason they use a shell script is because they support
> > running on multiple architectures with the same PATH setting.
> > No binary can do that...
> >
> 
> Just like perl? Oh wait, perl handles this just fine by having a
> default assumed location for perl interpreter...

Tell that to the people where I work, who installed perl on a DEC server
at /util/bin/perl....

> 
> --
>         Sander
> 
> +++ Out of cheese error +++

-- 
"Any fool can write code that a computer can understand.
 Good programmers write code that humans can understand."
 --Martin Fowler
// seth gordon // wi/mit ctr for genome research //
// ····@genome.wi.mit.edu // standard disclaimer //
From: Jeffrey M. Vinocur
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6le5v$p0a$5@marduk.litech.org>
In article <·················@haldjas.folklore.ee>,
Sander Vesik  <······@haldjas.folklore.ee> wrote:
>In comp.lang.scheme Jeffrey M. Vinocur <·····@cornell.edu> wrote:
>> In article <·················@haldjas.folklore.ee>,
>> Sander Vesik  <······@haldjas.folklore.ee> wrote:
>>>
>>>You mean there is a realy, quantifyable problem that keeps you from
>>>supporting the 
>>>
>>>       #!/path/to/interpreter [arg1 [arg2 ... ] ...]
>>>
>>>format?
>> 
>> The reason they use a shell script is because they support
>> running on multiple architectures with the same PATH setting.
>> No binary can do that...
>
>Just like perl? Oh wait, perl handles this just fine by having a
>default assumed location for perl interpreter...

You're not understanding me.  SML allows you to run on multiple
architectures out of the same *tree*.  If you have a filesystem
which is mountable from several machines, the SML script will
choose the appropriate binary *without* you having to work things
so that the directories in your PATH point to different actual
locations depending on the system.  (This is an issue when you're
storing SML in your home directory, and not as much of one when
it's installed by the system administrator.)

The MzScheme package does the same thing.  It's quite handy.


-- 
Jeffrey M. Vinocur   *   ·····@cornell.edu
http://www.people.cornell.edu/pages/jmv16/
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6dkpl$dkrfq$1@ID-125440.news.dfncis.de>
In article <····································@shimizu-blume.com>, Matthias Blume wrote:
> On Sat, 09 Mar 2002 11:22:16 -0500, Nils Goesche wrote:
> 
>> In article <···············@bellsouth.net>, David Rush wrote:
>>> Tim Bradshaw <···@cley.com> writes:
>>>> > I would have an easier time "trusting" my environment if I knew its
>>>> > safety relies on proofs rather than on mere hope that not one of a
>>>> > buch of a few thousand programmers who I personally don't know
>>>> > haven't screwed up somewhere.
>>>> 
>>>> Well, that's good.  Perhaps you should start writing a provably
>>>> correct shared-memory multiprocessor OS that scales to, say, a 100
>>>> processor machine reasonably well and supports all the features of,
>>>> say, Solaris on such HW, and performs as well.
>>> 
>>> It's not hard to make the argument that he has, given that he's working
>>> on SML/NJ. The *first* thing that's needed is a solid language
>>> platform.
>> 
>> Maybe so; only that Lisp is a solid language platform, too.
> 
> But it is not statically typed, so the point is moot.

Maybe it is moot for you; some people, however, think that we need
a statically typed language for writing a reliable OS.  I know that
your point is actually a bit different, but I wasn't talking only
to you.

>  It is not expressive enough to do the things that I want to do.
>  (That is: I can't express invariants as types.)

This reminds me of a Perl programmer who says ``I want Perl because
I can easily do fancy regexp things with it'' ;-)

>> SML/NJ is
>> started by a shell script called ``.run-sml''.  I had to hack it before
>> I could use SML/NJ, because it contained a bug that passed a command
>> line argument like "foo bar" as two arguments "foo" and "bar".  The
>> shell script wasn't written in SML, of course, but the point is: bugs
>> lurk everywhere.
> 
> Thank you for just making my point:  Since we still have to live in the world
> of Unix, we have to somehow get started.  That's why there is
> Unix-specific stuff.  I'd *love* to get rid of it, but currently I can't,
> precisely because of how the OS works.

Hey, it's *my* point :-)  I anticipated that response, that's why I
added ``It wasn't written in SML, of course''.  BTW, Unix has no
problems at all with spaces in arguments to processes, it's sh,
something quite different.  The only significance of this is that
bugs come from various directions.  Static typing gets rid of one
particular kind of bug, a million others remain.  And as there is
no such thing as a free lunch, you pay a price for getting rid of
run-time type errors.  If what you get for what you pay for is worth
it, fine; I, however, believe it isn't.

>> You will not magically get rid of them only because
>> you use a language that is statically strongly typed instead of one that
>> is dynamically strongly typed.  Typed lambda calculus is a nice thing.
>> But that does not imply at all, as many people seem to believe, that we
>> suddenly all have to use a statically typed language in order to write
>> good software. You can just as well regard dynamic typing as a /feature/
>> of your language which makes your language more expressive and hence
>> programming easier; and easily written programs usually have /fewer/ bugs
>> than more complicated ones.  There is no way to prove whether this is
>> true or not.  All you can do is try and see for yourself :-)
> 
> Well, what can I say.  I did, and I found the opposite to be true.  But I
> said that already...

Maybe, but that's all we can say: We tried and came to different
conclusions.  Neither of us can prove that he is right on this.  Other
than actually writing the OS, which is quite hard :-)

>> And another point I made long ago /still/ remains:  If people think we
>> need statically typed languages to write working software, because there
>> provably won't be any runtime type-errors [1], why not go all the way
>> and demand Haskell?  Look into your mind and answer yourself why you
>> wouldn't write an OS in Haskell and you are almost there:  You'll
>> suddenly see why Lisp people don't use statically typed languages :-)
> 
> Nonsense.  In Haskell it is very, very hard to get an intuitive grasp on 
> resource consumption.  Switch two operands of a +, and your program blows
> up.  This is due to lazy evaluation (which, for many other purposes, has
> its very good sides, too).  This has nothing to do with static vs.
> dynamic typing.

It has to do with the question whether we need static typing for writing
reliable code - indirectly: If that were true, if we really needed
provable facts about our programs to make them safe, then why not use
Haskell?  Because we can prove much more things much more easily about
purely functional code; you give good reasons why choosing Haskell
would be stupid, but if it is allowed to make such arguments, maybe
some other arguments that would lead to the conclusion that we don't
need static typing either, would /also/ be allowed...  Do you really
not understand this?

>> See?  Why couldn't my OS do just that?  No weird things happen, no BSOD,
>> no kernel panic.  Aren't there any exceptions you might forget to catch
>> in *ML?
> 
> True.  But there are static exception analyzers that try to mitigate this
> problem.  In the end, yes, there will always be things that need to be
> checked at runtime.  But I am not willing to pay for those that don't
> have to be.

Yes, that was /your/ point all along.  I wasn't talking about that.
What I'd answer to /you/ is that I doubt that you would gain very
much by eliminating some runtime checks - at least not enough that
it would be worth the inevitable price that comes with it...  I can't
prove that, of course; but stop acting as if you or anybody else
could prove otherwise, because they can't.

> Simple example:  I have a finite map data structure.  Should lookup raise
> an exception if the item to be looked up is not in the domain of the map?
> Well, I prefer to have lookup return an option.  This is sometimes less
> convenient to use, but it makes it clear at the type level that  one must
> somehow be prepared to deal with the failure case.  (Of course, in
> certain applications the other interface is more convenient.)

I agree and am happy that Lisp agrees, too :-)
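The two lookup interfaces Matthias describes can be sketched like this
(Python used purely for illustration; being dynamically typed, it
cannot enforce the check at compile time the way ML's option type
does, and the names are hypothetical):

```python
def lookup_opt(table, key):
    """Option-style lookup: return (value, True) on a hit and
    (None, False) on a miss.  The caller must inspect the flag,
    much as an ML caller must match on SOME/NONE -- and as Common
    Lisp's GETHASH signals presence via its second return value."""
    if key in table:
        return table[key], True
    return None, False


def lookup_exn(table, key):
    """Exception-style lookup: raise on a miss; the failure case is
    handled (or not) wherever someone remembers to catch it."""
    if key not in table:
        raise KeyError(key)
    return table[key]
```

The option-style interface makes the failure case part of the return
value; the exception-style one keeps the common path terser.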

Regards,
-- 
Nils Goesche                          PGP key ID 0x42B32FC9

"The sooner all the animals are dead, the sooner we'll find
 their money."                              -- Ed Bluestone
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.09.13.56.46.2194.31100@shimizu-blume.com>
On Sat, 09 Mar 2002 13:37:42 -0500, Nils Goesche wrote:

>>> And another point I made long ago /still/ remains:  If people think we
>>> need statically typed languages to write working software, because
>>> there provably won't be any runtime type-errors [1], why not go all
>>> the way and demand Haskell?  Look into your mind and answer yourself
>>> why you wouldn't write an OS in Haskell and you are almost there:
>>> You'll suddenly see why Lisp people don't use statically typed
>>> languages :-)
>> 
>> Nonsense.  In Haskell it is very, very hard to get an intuitive grasp
>> on resource consumption.  Switch two operands of a +, and your program
>> blows up.  This is due to lazy evaluation (which, for many other
>> purposes, has its very good sides, too).  This has nothing to do with
>> static vs. dynamic typing.
> 
> It has to do with the question whether we need static typing for writing
> reliable code - indirectly: If that were true, if we really needed
> provable facts about our programs to make them safe, then why not use
> Haskell?  Because we can prove much more things much more easily about
> purely functional code; you give good reasons why choosing Haskell would
> be stupid, but if it is allowed to make such arguments, maybe some other
> arguments that would lead to the conclusion that we don't need static
> typing either, would /also/ be allowed...  Do you really not understand
> this?

I understand that you are trying to make this point.  But I haven't seen
those "other arguments" yet, at least not convincing ones.  And I happen
to believe that they don't really exist.

>> Simple example:  I have a finite map data structure.  Should lookup
>> raise an exception if the item to be looked up is not in the domain of
>> the map? Well, I prefer to have lookup return an option.  This is
>> sometimes less convenient to use, but it makes it clear at the type
>> level that  one must somehow be prepared to deal with the failure case.
>>  (Of course, in certain applications the other interface is more
>> convenient.)
> 
> I agree and am happy that Lisp agrees, too :-)

How does Lisp remind me (at compile time!) that I am  not handling a case
that I should handle?

Matthias
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6dmqa$dmjke$1@ID-125440.news.dfncis.de>
In article <··································@shimizu-blume.com>, Matthias Blume wrote:
> On Sat, 09 Mar 2002 13:37:42 -0500, Nils Goesche wrote:
> 
>> It has to do with the question whether we need static typing for writing
>> reliable code - indirectly: If that were true, if we really needed
>> provable facts about our programs to make them safe, then why not use
>> Haskell?  Because we can prove much more things much more easily about
>> purely functional code; you give good reasons why choosing Haskell would
>> be stupid, but if it is allowed to make such arguments, maybe some other
>> arguments that would lead to the conclusion that we don't need static
>> typing either, would /also/ be allowed...  Do you really not understand
>> this?
> 
> I understand that you are trying to make this point.  But I haven't seen
> those "other arguments" yet, at least not convincing ones.  And I happen
> to believe that they don't really exist.

Well, they, and my ML experience, convinced me enough to return to
Lisp...

>>> Simple example:  I have a finite map data structure.  Should lookup
>>> raise an exception if the item to be looked up is not in the domain of
>>> the map? Well, I prefer to have lookup return an option.  This is
>>> sometimes less convenient to use, but it makes it clear at the type
>>> level that  one must somehow be prepared to deal with the failure case.
>>>  (Of course, in certain applications the other interface is more
>>> convenient.)
>> 
>> I agree and am happy that Lisp agrees, too :-)
> 
> How does Lisp remind me (at compile time!) that I am  not handling a case
> that I should handle?

I am handling it, so it doesn't.  Or I know that it can't fail in
a particular situation, so I don't handle it and the compiler is
happy with that, too.  How many complaints do you get from the
compiler when you write programs in *ML?  Newbies get a lot, but
how many do /you/ get?  The only ones I got in the end were those
when the type checker complained about things it didn't like
although they were perfectly correct and could never fail.  You
know the function can return None or something, so you are
prepared for that.  In Lisp I am, too.  And I know I won't
convince you of that.  But I think it is necessary once in a while
to demonstrate that there are /other/ views on the world than
that of the static typing crowd.  These people always act as if they
could somehow prove that their languages were superior to ours,
but all they can in fact prove is that they won't have any runtime
type errors, and that's about it :-)

Regards,
-- 
Nils Goesche                          PGP key ID 0x42B32FC9

"The sooner all the animals are dead, the sooner we'll find
 their money."                              -- Ed Bluestone
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.09.16.50.31.636232.31557@shimizu-blume.com>
On Sat, 09 Mar 2002 14:12:10 -0500, Nils Goesche wrote:

>> How does Lisp remind me (at compile time!) that I am  not handling a
>> case that I should handle?
> 
> I am handling it, so it doesn't.  Or I know that it can't fail in a
> particular situation, so I don't handle it and the compiler is happy
> with that, too.

Ok, I see.  So you are one of those legendary super-hero programmers who
always know when something can or cannot happen.  Good for you.  For the
forgetful rest of us, there are typecheckers... :)

> How many complaints do you get from the compiler when
> you write programs in *ML?  Newbies get a lot, but how many do /you/
> get?

Make that: newbies and really old hares.  Newbies get them because they
screw up at simple things.  Old hares get them because they have set up
their abstractions in such a way that they get timely reminders when
they screw up at difficult things.

>  The only ones I got in the end were those when the type checker
> complained about things it didn't like although they were perfectly
> correct and could never fail.

So you made it beyond the newbie stage but never to the old hare stage.
In other words, you weren't setting up your types cleverly.  (E.g., if a
function with return type "foo option" cannot return NONE, then why does
it have type "foo option" and not just "foo" in the first place?)

> You know the function can return None or
> something, so you are prepared for that.  In Lisp I am, too.  And I know
> I won't convince you of that.

Indeed, you won't because I have been programming in Lisp long enough
to know better.  At least for my case.  (In your case, since you are
that super-hero programmer from above, I guess you must be right.)

>  But I think it is necessary once in a
> while to demonstrate that there are /other/ views on the world than that
> of the static typing crowd.

It is not (necessary).  I already knew that.  Used to be one of those who
held that /other/ view.  And I was just as /passionate/ as /you/. /:-)/

> These people always act as if they could
> somehow prove that their languages were superior to ours, but all they
> can in fact prove is that they won't have any runtime type errors, and
> that's about it :-)

No, that's not "about it".  Not only won't there be runtime type errors.
What makes the real difference is that those runtime type errors can
correspond to violations of real program invariants (if types were used
by an experienced programmer!).   So by knowing that there are no runtime
type errors, we know that all invariants whose violations we managed to
turn into runtime type errors, will always be maintained.  And *that* is
what it's all about.  (Of course, this does not matter to anyone such as
yourself who *already* knew that his program invariants were all
satisfied.  Unfortunately, not everyone is that smart, I'm afraid.)

Matthias
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6g9jb$e3dih$1@ID-125440.news.dfncis.de>
In article <····································@shimizu-blume.com>, Matthias Blume wrote:
> On Sat, 09 Mar 2002 14:12:10 -0500, Nils Goesche wrote:
> 
>>> How does Lisp remind me (at compile time!) that I am  not handling a
>>> case that I should handle?
>> 
>> I am handling it, so it doesn't.  Or I know that it can't fail in a
>> particular situation, so I don't handle it and the compiler is happy
>> with that, too.
> 
> Ok, I see.  So you are one of those legendary super-hero programmers who
> always know when something can or cannot happen.  Good for you.  For the
> forgetful rest of us, there are typecheckers... :)

Hehe, but it isn't that hard to do:  When in doubt, check.  When I can
prove that some library function won't fail in a particular situation,
I can safely omit the check.  Fortunately, /I/ am programming in a
language where I can just do it and don't have to communicate my
proof to the compiler, first.

>>  The only ones I got in the end were those when the type checker
>> complained about things it didn't like although they were perfectly
>> correct and could never fail.
> 
> So you made it beyond the newbie stage but never to the old hare stage.
> In other words, you weren't setting up your types cleverly.  (E.g., if a
> function with return type "foo option" cannot return NONE, then why does
> it have type "foo option" and not just "foo" in the first place?)

Good question.  What if it's a library function?  Maybe one I wrote,
maybe one my colleague wrote.  Maybe it used to return NONE once in
a while in an earlier version of the program but for version 3.1
things changed and it can't in that particular situation.  Maybe
I am still experimenting - writing small parts of a whole system
most of which is not written yet - how can I possibly know already
which types I will use in the end, if I don't even know exactly what
the program is going to do?  When the specifications change along
the way, the whole type system has to change, too, every time.

>> These people always act as if they could
>> somehow prove that their languages were superior to ours, but all they
>> can in fact prove is that they won't have any runtime type errors, and
>> that's about it :-)
> 
> No, that's not "about it".  Not only won't there be runtime type errors.
> What makes the real difference is that those runtime type errors can
> correspond to violations of real program invariants (if types were used
> by an experienced programmer!).   So by knowing that there are no runtime
> type errors, we know that all invariants whose violations we managed to
> turn into runtime type errors, will always be maintained.  And *that* is
> what it's all about.  (Of course, this does not matter to anyone such as
> yourself who *already* knew that his program invariants were all
> satisfied.  Unfortunately, not everyone is that smart, I'm afraid.)

I think I understand that perfectly well :-)  But you don't have to be
that smart to do it.  If you are an experienced programmer, you know
your language, its pitfalls, and what has to be checked.  The programs
I write in C aren't horrible piles of core dumping garbage, either.
But I can write them in Lisp much faster.  That's it.  Sure, it is
also very satisfying if you have a particular problem at hand, think
of a proper ML type representation for it and implement it in SML.
But if the specification changes when I have already written half
of the program, as it usually does, adapting things in ML will be
much harder than in Lisp.

Yes, SML is perfect, in a way.  ``The Definition of Standard ML'' is
certainly a milestone in the history of computer science.  It is
round and complete.  If I were still working in a university, I'd
probably use it and nothing else.  But I left the university and
have spent years in the real world, now.  In the real world, things
aren't as clean as in the ML world.  I am not setting up a problem,
thinking of a mathematical solution, implementing it and proving its
correctness.  Marketing people are telling me what they want.  And
two weeks later, they change their mind and want something different.
And then the customer wants something different again.  I want a
language that helps me cope with ``dynamic'' specifications.

Of course, I am not saying that all that static typing research is
for Tim Bradshaw's cat... I think it is very valuable; I hope my
Lisp compiler vendor reads your papers, too, and takes as much
as possible out of it to improve my Lisp systems even further.

Regards,
-- 
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID 0xC66D6E6F
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.10.17.23.06.59011.3368@shimizu-blume.com>
On Sun, 10 Mar 2002 13:44:59 -0500, Nils Goesche 

> [ ... ] Sure, it is also very
> satisfying if you have a particular problem at hand, think of a proper
> ML type representation for it and implement it in SML. But if the
> specification changes when I have already written half of the program,
> as it usually does, adapting things in ML will be much harder than in
> Lisp.

In my experience, the exact opposite is true.  And that is precisely
because of the types:  I change one thing in one corner of the program,
invariants shift, so types change, and the compiler points out to me
which other parts of the program -- even those that I had already
forgotten about, or which I never looked at before -- need to change
as well.

Matthias
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6gmvt$donoh$1@ID-125440.news.dfncis.de>
In article <··································@shimizu-blume.com>, Matthias Blume wrote:
> On Sun, 10 Mar 2002 13:44:59 -0500, Nils Goesche 
> 
>> [ ... ] Sure, it is also very
>> satisfying if you have a particular problem at hand, think of a proper
>> ML type representation for it and implement it in SML. But if the
>> specification changes when I have already written half of the program,
>> as it usually does, adapting things in ML will be much harder than in
>> Lisp.
> 
> In my experience, the exact opposite is true.  And that is precisely
> because of the types:  I change one thing in one corner of the program,
> invariants shift, so types change, and the compiler points out to me
> which other parts of the program -- even those that I had already
> forgotten about, or which I never looked at before -- need to change
> as well.

Interestingly, I think my *ML experience has made me a better Lisp
programmer :-)  I have more discipline now how I represent my data
through abstractions, and I have suddenly begun to use CLOS, which
I had totally ignored before.

Regards,
-- 
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID 0xC66D6E6F
From: David Rush
Subject: Re: self-hosting gc
Date: 
Message-ID: <okfzo14keu9.fsf@bellsouth.net>
Nils Goesche <···@cartan.de> writes:
> In article <····································@shimizu-blume.com>, Matthias Blume wrote:
> > if a
> > function with return type "foo option" cannot return NONE, then why does
> > it have type "foo option" and not just "foo" in the first place?)
> 
> Good question.  What if it's a library function?  Maybe one I wrote,
> maybe one my colleague wrote.  Maybe it used to return NONE once in
> a while in an earlier version of the program but for version 3.1
> things changed and it can't in that particular situation.  Maybe
> I am still experimenting - writing small parts of a whole system
> most of which is not written yet - how can I possibly know already
> which types I will use in the end, if I don't even know exactly what
> the program is going to do?  When the specifications change along
> the way, the whole type system has to change, too, every time.

OK guys. I am more of a fence-sitter than either of you. The first
(non-trivial, say 3kloc+) program that I ever wrote that worked 100%
the first time I ran it was a *direct* product of learning to live
with the SML type system. Since then, I have written a few others, so
SML clearly did a lot of good for my programming skills. 

That said, I now program nearly everything in Scheme for *exactly* the
reasons Nils mentions. It is strictly a time/functionality
tradeoff. When I'm prototyping code for which I have no managerial
backing, I'm much more concerned with getting it to work. Random
failures are OK, because the quality level I'm working for is strictly
prototype/proof of concept. However for shipping code, I would *far*
prefer to use SML (I can't, because nobody will pay for it), to
eliminate as many problems as possible. Purify/MrSpidey/PolySpace
are a poor (and labor-intensive) substitute for the benefits of the
H-M inference in SML and it's cousins.

And I would especially appreciate an OS built with those 'restrictions'.

Also, in the FWIW category. I am starting to move back towards SML
these days, even for prototyping. Managing code reuse with multiple
covariant types seems much easier given SML's functors than with any
other system I have encountered thus far.

david rush
-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS d? s-: a C++$ ULSAH+++$ P+(---) L++ E+++ W+(--) N++ K w(---) ···@
PS+++(--) PE(++) Y+ PGP !tv b+++ DI++ D+(--) e*(+++>+++) h---- r+++
z++++
-----END GEEK CODE BLOCK-----
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a77muc$j888k$1@ID-125440.news.dfncis.de>
In article <···············@bellsouth.net>, David Rush wrote:
> That said, I now program nearly everything in Scheme for *exactly* the
> reasons Nils mentions.

[snip]

> Also, in the FWIW category. I am starting to move back towards SML
> these days, even for prototyping. Managing code reuse with multiple
> covariant types seems much easier given SML's functors than with any
> other system I have encountered thus far.

Well, if the only choice I had were between Scheme and SML, I think
I'd rather go for SML, too :->

Just kidding.  Seriously, this thread had already died quite a while
ago, and as the temperature is already pretty high in comp.lang.lisp
these days, it would be nice if you could keep this out of
comp.lang.lisp.  Thank you.  Follow-up set, if I can figure out
how to do that with slrn...

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3zo1h8l0p.fsf@cley.com>
* Nils Goesche wrote:

> See?  Why couldn't my OS do just that?  No weird things happen,
> no BSOD, no kernel panic.  Aren't there any exceptions you
> might forget to catch in *ML?

And in fact your OS *must* do that because *nothing* you can do will
avoid dynamic errors such as those arising from hardware: you can't
prove those away at compile time.

--tim
From: Ray Dillinger
Subject: Re: self-hosting gc
Date: 
Message-ID: <3CAE0B71.880A75ED@sonic.net>
Matthias Blume wrote:
> 
> On Fri, 08 Mar 2002 00:30:27 -0500, Ray Blaak wrote:
> 
> > Vilhelm Sjoberg <·····@cam.ac.uk> writes:
> >> But if you do your program development in a type-safe language using a
> >> standard OS, then you are paying for a feature you don't need, namely
> >> the sandboxing of programs into processes that cannot hurt eachother.
> >
> > As long as your type-safe language has "escape hatches" for bypassing
> > safety (e.g. unchecked conversion, for V'Address use ..., calls for
> > foreign functions, etc.) then OS protection features are still
> > necessary.
> 
> But in the kind of language we are talking about, it is statically known
> whether "escape hatches" have been use in a particular program.  Only
> programs that do use unsafe features need OS protection.  In practice,
> those should be the vast minority.

It becomes a little more difficult to reliably detect an "escape 
hatch" when all you have to work on is raw binary.  Most commercial 
software houses don't want to sell you more than the raw binary.  
Ergo, no matter what you can do with programs you compile and whose 
source you can analyze, you'll probably always have to also provide a 
separate memory space for commercial code to run in. Unless you just 
want to trust those guys implicitly.... 

				Bear
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2hwuwnjgri.fsf@vserver.cs.uit.no>
Ray Blaak <·····@telus.net> writes:

> It is far far better to have both safety features (language safety
> and OS-protections).

The old lisp machines lacked OS protection mechanisms like address
spaces, didn't they? Did they suffer substantially from this design?

-- 
Frode Vatvedt Fjeld
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224587620423566@naggum.net>
* Frode Vatvedt Fjeld <······@acm.org>
| The old lisp machines lacked OS protection mechanisms like address
| spaces, didn't they? Did they suffer substantially from this design?

  They would today, since all kinds of creep can and do get access to
  computers.  Besides, if anyone would implement a system that invites
  intrusion and viruses and malicious abuse and is completely helpless in
  the face of such abuse, they would be violating a number of Microsoft
  patents on how _not_ to design software.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <871yev7yl4.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

>   They would today, since all kinds of creep can and do get access to
>   computers.  Besides, if anyone would implement a system that invites
>   intrusion and viruses and malicious abuse and is completely helpless in
>   the face of such abuse, they would be violating a number of Microsoft
>   patents on how _not_ to design software.

And since the popular operating systems *do* have those memory
protection features, it's not possible for a creep to get in and
destroy a system, right?
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224602690894312@naggum.net>
* Thomas Bushnell, BSG
| And since the popular operating systems *do* have those memory protection
| features, it's not possible for a creep to get in and destroy a system,
| right?

  Wrong, idiot.

  Please take a remedial course in argumentation and logic and figure out
  the difference between a necessary and a sufficient condition.  You seem
  not to be the only one who might need to understand this point around
  here, but it is still tremendously annoying when some snotty bastard who
  fails to grasp that difference tries to make a fool out of others because
  of _his_ lack of understanding.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87y9h2684q.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

>   Wrong, idiot.

Bzzt, try again.  
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87u1rq67zd.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

>   Please take a remedial course in argumentation and logic and figure out
>   the difference between a necessary and a sufficient condition.  

The point is that a kernel/user barrier doesn't protect the system
against miscreants *AT ALL*.

If I have access to user mode on any Unix system in the world, I can
still run the halt system call.

The user/kernel barrier does not *AT ALL* improve security against
miscreants with access to privileged user-mode programs.  And, in
fact, it is such access in which almost all current security problems
on *nix systems reside.

However, it is possible to cause almost any *nix system to fall over
and start behaving *very* badly even from untrusted non-root user-mode
programs.

The kernel/user barrier is a hedge against programming mistakes, but
not *AT ALL* a hedge against miscreant attackers.

Finally, you do not improve your arguments by calling someone an
idiot.  It only makes you look foolish.

Thomas
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224619654799393@naggum.net>
* Thomas Bushnell, BSG
| Finally, you do not improve your arguments by calling someone an idiot.
| It only makes you look foolish.

  You are arguing against ludicrous and bogus arguments you have imputed to
  me, but which I have not expressed, implied, or made it possible to infer
  in the first place.  _That_ was what was wrong, not your "argument".  I
  have extremely low tolerance for people who play rhetorical games where
  the goal is to embarrass someone instead of actually making a point, and
  if you engage in gratuitous fault-finding, too, there is no good-will
  left on my end at all.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87henq4nwb.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

>   I have extremely low tolerance for people who play rhetorical
>   games where the goal is to embarrass someone instead of actually
>   making a point, and if you engage in gratuitous fault-finding,
>   too, there is no good-will left on my end at all.

As do I.  Please accept my apology for any slight; I was not trying to
engage in gratuitous fault-finding.

My training in philosophy sometimes gets ahead of me, I'm afraid.
When confronted with a proposition that I believe to be false, I find
the clearest most direct counterexample; the goal is never focused on
the person, but only on the proposition.  And I don't have any
intention of belittling someone or scoring rhetorical points; I'm
actually a very issue-focused person.  But that tends (again) to get
ahead of me at times, and people interpret my zeal to demonstrate the
falisty of some proposition as a zeal to demonstrate the foolishness
of some other person.  But that is never my goal--though I grant that
it can be near impossible to tell.

So this is a plea for good will.  In the particular case, I was trying
to augment the previous poster's example, the point being that
hand-typed tables can be maddeningly hard to get right at times, and
even harder to find bugs in than normal code.  I once had such a case
in (IIRC) a table of parities for ascii.  There was a bug in one of
the values, and it was so hard to notice in the midst of a big table,
that (once I found the bug) I immediately replaced it with a table
programmatically generated on startup.  *That* I could be sure was
correct.
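
A minimal Common Lisp sketch of such a startup-generated table (the
name *ascii-parity* and the :even/:odd encoding are illustrative, not
the original code):

```lisp
;; Build the 128-entry ASCII parity table at load time instead of
;; typing 128 values in by hand, so the table cannot silently contain
;; one wrong entry.  LOGCOUNT counts the 1-bits of an integer.
(defvar *ascii-parity*
  (let ((table (make-array 128)))
    (dotimes (code 128 table)
      (setf (aref table code)
            (if (evenp (logcount code)) :even :odd)))))

;; #\A is code 65 = #b1000001: two 1-bits, hence even parity.
(assert (eq (aref *ascii-parity* 65) :even))
```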

In the instant case, however, my message was even worse, given the
presence of a foolish carelessness on my part (since, of course,
omitting the "I" was part of the original problem statement).

Thomas
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224634269989329@naggum.net>
* ·········@becket.net (Thomas Bushnell, BSG)
| My training in philosophy sometimes gets ahead of me, I'm afraid.  When
| confronted with a proposition that I believe to be false, I find the
| clearest most direct counterexample; the goal is never focused on the
| person, but only on the proposition.

  However, the space of things that are the negations of falsehoods is
  rather large and does not at all need to include any truth.  By picking a
  particular negation of a falsehood, you imply that your angle on what is
  false is sufficient to construct a counterexample.  This is generally not
  true.  When somebody is opposed to something, as is the case here, there
  is no telling what they actually would like to propose.  By choosing some
  counter-example to a negative argument, you actually do impute intention
  and opinions about what a person would propose if they did not oppose
  your particular argument to begin with.  I have a very hard time seeing
  how this can _not_ result in hostilities.

| But that tends (again) to get ahead of me at times, and people interpret
| my zeal to demonstrate the falsity of some proposition as a zeal to
| demonstrate the foolishness of some other person.  But that is never my
| goal--though I grant that it can be near impossible to tell.

  If someone spends a lot of time demonstrating the falsity of something a
  person does not even express or imply, it is because the falsity he
  perceives is, in his view, a counter-argument to his own position.  It is
  vital to keep track of positive and negative propositions and arguments
  in order to make this work.  Because of the disproportionately larger
  space of a negative proposition, the negation of a negative may not even
  include the positive -- and that means that you must be very careful in
  countering arguments against your own position with counter-examples.
  Instead, support your initial position.

| So this is a plea for good will.  In the particular case, I was trying to
| augment the previous poster's example, the point being that hand-typed
| tables can be maddeningly hard to get right at times, and even harder to
| find bugs in than normal code.

  However, it was not hand-typed.  Christ, give me a break.  First, if it
  were a hand-typed table, the line with l-p would not _all_ be lowercase,
  now, would it?  Second, just because something is in source form does not
  mean it was typed in by a human being.  It is far easier to read a case
  statement that actually contains the cases than code that builds a hairy
  table.  It is also far better to let the compiler writers worry about the
  optimization of table lookup than to do that grunt work yourself.  case
  is the right tool for the job.

  Of all the languages in which you can write code, the Lisp family offers
  the best way of all to produce some output and recycle it as code.  I do
  this a lot, because it is often easier to write Emacs Lisp code to write
  code for you than it is to type it in manually, anyway.
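
  As a sketch of that recycle-output-as-code workflow (the function name
  here is invented for illustration): a function that builds a ready-to-
  paste case form from a data table, so the cases never have to be typed
  in by hand.

```lisp
;; Build a CASE form from a list of (key value) pairs; PRINT the result
;; and paste it back in as source, instead of typing the cases manually.
(defun make-case-form (var pairs)
  `(case ,var
     ,@(mapcar (lambda (pair) (list (first pair) (second pair)))
               pairs)))

(make-case-form 'c '((#\a 1) (#\b 2)))
;; => (CASE C (#\a 1) (#\b 2))
```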

| I once had such a case in (IIRC) a table of parities for ascii.  There
| was a bug in one of the values, and it was so hard to notice in the midst
| of a big table, that (once I found the bug) I immediately replaced it
| with a table programmatically generated on startup.  *That* I could be
| sure was correct.

  Well, I prefer to produce source code in intimate cooperation with the
  editor, Emacs.

| In the instant case, however, my message was even worse, given the
| presence of a foolish carelessness in my part (since, of course,
| omitting the "I" was part of the original problem statement).

  I think your careless assumption that it was hand-typed should have been
  amply refuted by the presence of a systematic error that would be
  unlikely to have been made by hand.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2hsn7bjcs1.fsf@vserver.cs.uit.no>
* Frode Vatvedt Fjeld <······@acm.org>
| The old lisp machines lacked OS protection mechanisms like address
| spaces, didn't they? Did they suffer substantially from this design?

Erik Naggum <····@naggum.net> writes:

> They would today, since all kinds of creep can and do get access to
> computers.  Besides, if anyone would implement a system that invites
> intrusion and viruses and malicious abuse and is completely helpless
> in the face of such abuse, they would be violating a number of
> Microsoft patents on how _not_ to design software.

But recent Microsoft OSes do implement address spaces, and enforce
them as unix systems do. The fundamental problem of macro viruses is
not solved by address spaces (even if they arguably can help somewhat
once the damage is done), nor is that of buffer/stack overruns or
similar attacks. It is also generally understood that the unix (and, I
suppose, windows) protection mechanisms do not seriously protect
against anyone with physical access to the machine.

I'd say if you run malicious code, outside some specially rigged jail
environment, you are in big trouble regardless. Java's virtual machine
is one such rigged environment, but to my knowledge this approach also
doesn't rely on strictly enforced address spaces.

Address spaces do allow multiple users (to a limited degree) not to
interfere with one another, but I believe most PC-users today get to
have one personal computer each for typical work, and demanding
applications get at least one devoted machine.

-- 
Frode Vatvedt Fjeld
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3u1rrav9r.fsf@cley.com>
* Frode Vatvedt Fjeld wrote:

> I'd say if you run malicious code, outside some specially rigged jail
> environment, you are in big trouble regardless. Java's virtual machine
> is one such rigged environment, but to my knowledge this approach also
> doesn't rely on strictly enforced address spaces.

> Address spaces do allow multiple users (to a limited degree) not to
> interfere with one another, but I believe most PC-users today get to
> have one personal computer each for typical work, and demanding
> applications get at least one devoted machine.

I think you're missing the distinction between necessary and
sufficient.  No one is claiming that memory protection mechanisms are
*sufficient* but for native code applications I think I would like to
claim they are *necessary* to run untrusted native code.

--tim
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87wuwn6jze.fsf@becket.becket.net>
Tim Bradshaw <···@cley.com> writes:

> I think you're missing the distinction between necessary and
> sufficient.  No one is claiming that memory protection mechanisms are
> *sufficient* but for native code applications I think I would like to
> claim they are *necessary* to run untrusted native code.

I'd agree, if you add the qualification "non-proof-carrying".
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2hit87j76o.fsf@vserver.cs.uit.no>
Tim Bradshaw <···@cley.com> writes:

> I think you're missing the distinction between necessary and
> sufficient.  No one is claiming that memory protection mechanisms
> are *sufficient* but for native code applications I think I would
> like to claim they are *necessary* to run untrusted native code.

But why would you want to run untrusted native code? I believe
Microsoft is setting up an entire trust infrastructure for binaries,
with cryptographic signing of applications and drivers and whatnot,
and that's probably for a reason.

Are you willing to take a (potentially) big performance hit just to be
able to support the jail system call? Maybe sometimes that's
necessary, but not always.

-- 
Frode Vatvedt Fjeld
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3henraqo6.fsf@cley.com>
* Frode Vatvedt Fjeld wrote:

> But why would you want to run untrusted native code? 

Because I can't lay my hands on any other kind.  The only kind of
native code I'd regard as trusted is that for which there is a formal
correctness proof.  Since I can't get an X server or a Lisp system
which has such a proof for instance, I'm constrained to run ones which
might follow random pointers occasionally.  In order to reduce the
damage that this kind of problem does I like to have a whole lot of
dynamic checks on things.

(Note, I say `reduce': since the OS is not provably correct and the HW
may fail and so forth I'm not foolish enough to believe I can
eliminate the damage).

> I believe
> Microsoft is setting up an entire trust infrastructure for binaries,
> with cryptographic signing of applications and drivers and whatnot,
> and that's probably for a reason.

You think I'm going to trust a program to be correct just because
someone's signed it?  Come on, be serious.


--tim
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224603675566967@naggum.net>
* Tim Bradshaw <···@cley.com>
| The only kind of native code I'd regard as trusted is that for which
| there is a formal correctness proof.

  How would you arrive at that proof?  What software would you trust
  implicitly in order to trust some other software explicitly?

> I believe Microsoft is setting up an entire trust infrastructure for
> binaries, with cryptographic signing of applications and drivers and
> whatnot, and that's probably for a reason.

| You think I'm going to trust a program to be correct just because
| someone's signed it?  Come on, be serious.

  Well, you are obviously not the target audience for Microsoft's "we're
  the good guys, and the government that made us look like criminals in
  their court are the real bad guys" or their "we're the good guys, but all
  those bad guys abuse our naive incompetence to do bad things" propaganda.
  If there is one company I would _not_ trust to sign software I would
  depend on to be correct, it is Microsoft.  Who _cares_ if buggy shitware
  with security holes the size of Washington state is signed or not?
  
///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey38z92c1au.fsf@cley.com>
* Erik Naggum wrote:

>   How would you arrive at that proof?  What software would you trust
>   implicitly in order to trust some other software explicitly?

I don't know.  I regard it as kind of up to the static correctness
proof people to convince me of this.  Until then I'm keeping my
hardware checks.

(Incidentally they also need to convince me that if the machine gets
an undetected error in memory (which for me means a 3 bit error I
think and for most PCs means a single bit error) that the resulting
code is also safe.)

--tim
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <873cza7mqs.fsf@becket.becket.net>
Tim Bradshaw <···@cley.com> writes:

> I don't know.  I regard it as kind of up to the static correctness
> proof people to convince me of this.  Until then I'm keeping my
> hardware checks.

Wait, how does the hardware check?

In the case of Linux, for example, it doesn't check automatically, but
only as directed by the kernel--the big, complicated, filled-with-bugs
kernel.  
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3it86afkr.fsf@cley.com>
* Thomas Bushnell wrote:

> In the case of Linux, for example, it doesn't check automatically, but
> only as directed by the kernel--the big, complicated, filled-with-bugs
> kernel.  

Sure, but the kernel only has to set up the page tables once, then the
HW will prevent you trashing (or seeing) other people's memory,
including the OS's.  I'm obviously not suggesting a pure hardware
solution, did you really think I was?

--tim
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87henq67it.fsf@becket.becket.net>
Tim Bradshaw <···@cley.com> writes:

> * Thomas Bushnell wrote:
> 
> > In the case of Linux, for example, it doesn't check automatically, but
> > only as directed by the kernel--the big, complicated, filled-with-bugs
> > kernel.  
> 
> Sure, but the kernel only has to set up the page tables once, then the
> HW will prevent you trashing (or seeing) other people's memory,
> including the OS's.  I'm obviously not suggesting a pure hardware
> solution, did you really think I was?

Here's the point I was making, I'll spell it out more carefully:

You were arguing that proof-carrying code, or
trusted-compiler-generated code was not an adequate replacement for a
user/kernel barrier (and concomitant user/user process barriers).  It
seemed to me that you were saying that was because you wouldn't be
willing to actually trust the compiler or the proof-checker to do a
perfect job, and so it might let in things that shouldn't be let in.

My reply is that your "hardware" solution actually still depends on
the compiler's ability to correctly compile the kernel code (and not
generate stray memory writes that might clobber page tables, for
example).  It also depends on your ability to read through the
page-table handling code, and verify that it really does enforce the
required barriers.  It depends on there being no errant code
(anywhere!) in the kernel which might execute a stray memory write
that happens to clobber a page table.

So you are really trusting quite a lot.

A proof-checker is pretty easy to get right.

Moreover, a trusted-compiler is just like any other compiler.  That
is, when I trust the trusted-compiler not to generate stray memory
writes, I'm doing *exactly* the same thing as when I trust whatever
compiles the kernel not to generate stray memory writes.

Thomas
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3adtiadjk.fsf@cley.com>
* Thomas Bushnell wrote:

> You were arguing that proof-carrying code, or
> trusted-compiler-generated code was not an adequate replacement for a
> user/kernel barrier (and concomitant user/user process barriers).  It
> seemed to me that you were saying that was because you wouldn't be
> willing to actually trust the compiler or the proof-checker to do a
> perfect job, and so it might let in things that shouldn't be let in.

Well, no, I wasn't. What I actually wrote was:

    Because I can't lay my hands on any other kind.  The only kind of
    native code I'd regard as trusted is that for which there is a formal
    correctness proof.  Since I can't get an X server or a Lisp system
    which has such a proof for instance, I'm constrained to run ones which
    might follow random pointers occasionally.  In order to reduce the
    damage that this kind of problem does I like to have a whole lot of
    dynamic checks on things.

--tim
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87u1rq4qxk.fsf@becket.becket.net>
Tim Bradshaw <···@cley.com> writes:

>     Because I can't lay my hands on any other kind.  The only kind of
>     native code I'd regard as trusted is that for which there is a formal
>     correctness proof.  Since I can't get an X server or a Lisp system
>     which has such a proof for instance, I'm constrained to run ones which
>     might follow random pointers occasionally.  In order to reduce the
>     damage that this kind of problem does I like to have a whole lot of
>     dynamic checks on things.

Ah, thanks for the clarification.

If you are running traditional process/kernel programs, they may well
rely on such protections (even if inadvertently).

The discussion is, however, focused on new systems--ones which, for
example, might have no process/kernel split.  Or, for example, adding
extensions to existing well-designed systems.

Why do you want a formal proof for a Lisp system before you accept its
pointer guarantees, but you don't expect a formal proof for the C
compiler or your kernel, before you accept their guarantees?

Thomas
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224622354197752@naggum.net>
* ·········@becket.net (Thomas Bushnell, BSG)
| Why do you want a formal proof for a Lisp system before you accept its
| pointer guarantees, but you don't expect a formal proof for the C
| compiler or your kernel, before you accept their guarantees?

  But has anyone actually asked for a formal proof for a Lisp system?

  Mostly, what I have written and what I understand Tim to be writing is a
  resounding rejection of all the wild claims made by the "proof" and
  "static typing" crowd, namely that such tactics at the source level are
  _sufficient_ to ensure a bug-free and fully operational system, and that
  is only necessary because of their attacks on the infrastructure upon
  which we rely today for problems that very few of us believe would go
  away with all that fancy-schmancy type-based proof cruft, which has a lot
  of theoretical value in how to design and not to design software, but
  those crowds are so incredibly snotty about their "superior" theories and
  so absolutely clueless about many real-world issues that simply do not
  fit their theories, and which therefore appear to many to be a case of
  "if the theory does not fit, you must acquit", which so many people who
  have "good theories" resort to in order to purposefully discard those
  parts of reality that are not explainable by their theories.  I mean, I
  know some really smart people who have these incredibly ludicrous ideas
  about how to design and run societies because they flat out refuse to
  believe that bad people exist, and so completely ignore the threat they
  pose and offer no way to deal with anyone who would seize the opportunity
  to do someone harm.  My theory of society-building is that people are
  only polite and friendly and can work together because there are some
  very serious and credible counter-forces that would be applied against
  any real or attempted use of force to begin with, and that translates to
  how computers have to deal with all of those malicious people who do
  _not_ perceive a credible counter-force to their destructive intents and
  to all those _mishaps_ that just happen to software in a very unfriendly
  real world.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <878z92e8tk.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

>   Mostly, what I have written and what I understand Tim to be writing is a
>   resounding rejection of all the wild claims made by the "proof" and
>   "static typing" crowd, namely that such tactics at the source level are
>   _sufficient_ to ensure a bug-free and fully operational system
>   [....]

I don't think it is; such wild claims are certainly, well, wild.

I think the point is that proof carrying code, or using only
pointer-safe languages, obviates the need for memory barriers between
programs.  It certainly doesn't obviate the need for other security
and reliability techniques.

Thomas
From: Will Deakin
Subject: Re: self-hosting gc
Date: 
Message-ID: <3C8F38E0.6080304@hotmail.com>
Thomas Bushnell, BSG wrote:
> I think the point is that proof carrying code, or using only
> pointer-safe languages, obviates the need for memory barriers between
> programs.
Fair enough -- I don't. And although I can't prove this ;) -- or want 
to for that matter -- I don't think Erik Naggum or Tim do either,

:)w
From: Christopher Browne
Subject: Re: self-hosting gc
Date: 
Message-ID: <m3g034v47q.fsf@salesman.cbbrowne.com>
In an attempt to throw the authorities off his trail, Will Deakin <···········@hotmail.com> transmitted:
> Thomas Bushnell, BSG wrote:
> > I think the point is that proof carrying code, or using only
> > pointer-safe languages, obviates the need for memory barriers between
> > programs.

> Fair enough -- I don't. And although I can't prove this ;) -- or
> want to for that matter -- I don't think Erik Naggum or Tim do
> either,

Seems to me this is rather like the issue of how much you trust a PGP
"web of trust."

They have the same kind of issues, namely of whether the "proof"
involves evidence that is useful for what you want it to imply.

A "proof-carrying code" doesn't necessarily provide evidence that the
system is proof against nefarious code.
-- 
(reverse (concatenate 'string ··········@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/lisp.html
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <874rjkjulk.fsf@becket.becket.net>
Christopher Browne <········@acm.org> writes:

> They have the same kind of issues, namely of whether the "proof"
> involves evidence that is useful for what you want it to imply.
> 
> A "proof-carrying code" doesn't necessarily provide evidence that the
> system is proof against nefarious code.

Of course.  The operating system gets to dictate what the compiler
must prove, and they are typically easy things, like "all memory reads
and writes are in bounds to objects that you already have a pointer
to".  That sort of thing.
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.13.11.34.10.327679.32670@shimizu-blume.com>
On Wed, 13 Mar 2002 11:10:17 -0500, Christopher Browne wrote:

> In an attempt to throw the authorities off his trail, Will Deakin
> <···········@hotmail.com> transmitted:
>> Thomas Bushnell, BSG wrote:
>> > I think the point is that proof carrying code, or using only
>> > pointer-safe languages, obviates the need for memory barriers between
>> > programs.
> 
>> Fair enough -- I don't. And although I can't prove this ;) -- or want
>> to for that matter -- I don't think Erik Naggum or Tim do either,
> 
> Seems to me this is rather like the issue of how much you trust a PGP
> "web of trust."
> 
> They have the same kind of issues, namely of whether the "proof"
> involves evidence that is useful for what you want it to imply.
> 
> A "proof-carrying code" doesn't necessarily provide evidence that the
> system is proof against nefarious code.

You may have some misconception about how PCC works.  The "proof" in
"proof-carrying code" does not refer to the concept of proof as in
"proof that I am 21 by showing my id".  Such a scheme would, indeed,
be vulnerable to whether or not you trust the id and the person who
issued it.

PCC carries a proof in the traditional logic sense:  It comes with a
logic derivation of the safety property in question.  The derivation
starts from basic principles -- axioms that the host can agree with.
The derivation constitutes a formal proof.  Such a formal proof is
mechanically checkable.

Now, there are 4 possible scenarios:

  1. The program is correct, the proof is correct.
     This is a safe situation, and the host will let the code run.

  2. The program is incorrect, the proof is correct.
     In this case, unless the logic was inconsistent to begin with,
     the proof will not actually conclude with the safety property
     in question.  Thus, the host will not let the code run.
     Safe again.

  3. The program is correct, the proof is incorrect.
     In this case the host will detect the flaw in the proof and will
     not let the code run.  This is the case of a "false positive",
     but it is still safe.

  4. The program is incorrect, and so is the proof.
     Again, the host will detect the flawed proof and, thus, reject
     the program.  Safe.

As you see, the only thing we rely on here is that the logic itself is
sound.  There is no reference to an external authority or any form of
"web of trust".  It does not matter at all who sends the code; all that
matters is that it comes with a correct proof that shows the program to
be safe.
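
As a toy illustration of this trust model (purely hypothetical; real
PCC logics are far richer), consider a checker for a propositional
system in which every proof line must be a supplied axiom or follow by
modus ponens from earlier lines -- the host trusts only this small
checker, never the code's author:

```lisp
;; Toy PCC-style checker: a "proof" is a list of formulas; each must be
;; an axiom or follow by modus ponens from formulas already derived.
;; Formulas are symbols or (:implies antecedent consequent) lists.
(defun mp-consequence-p (formula derived)
  ;; FORMULA follows by modus ponens if some derived (:implies A FORMULA)
  ;; exists whose antecedent A has also been derived.
  (some (lambda (f)
          (and (consp f)
               (eq (first f) :implies)
               (equal (third f) formula)
               (member (second f) derived :test #'equal)))
        derived))

(defun check-proof (axioms proof goal)
  ;; Return T only if every line checks out and the last line is GOAL;
  ;; a flawed proof is rejected no matter who produced it.
  (let ((derived '()))
    (dolist (f proof)
      (unless (or (member f axioms :test #'equal)
                  (mp-consequence-p f derived))
        (return-from check-proof nil))
      (push f derived))
    (equal (first derived) goal)))

;; A correct proof of B from A and A->B is accepted ...
(assert (check-proof '(a (:implies a b)) '(a (:implies a b) b) 'b))
;; ... while a "proof" that merely asserts B is rejected.
(assert (not (check-proof '(a (:implies a b)) '(b) 'b)))
```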

Matthias
From: Christopher Browne
Subject: Proof Carrying Code
Date: 
Message-ID: <m34rjkt74z.fsf_-_@salesman.cbbrowne.com>
In the last exciting episode, Matthias Blume <········@shimizu-blume.com> wrote::
> On Wed, 13 Mar 2002 11:10:17 -0500, Christopher Browne wrote:
>> In an attempt to throw the authorities off his trail, Will Deakin
>> <···········@hotmail.com> transmitted:
>>> Thomas Bushnell, BSG wrote:

>>>> I think the point is that proof carrying code, or using only
>>>> pointer-safe languages, obviates the need for memory barriers
>>>> between programs.

>>> Fair enough -- I don't. And although I can't prove this ;) -- or
>>> want to for that matter -- I don't think Erik Naggum or Tim do
>>> either,

>> Seems to me this is rather like the issue of how much you trust a
>> PGP "web of trust."

>> They have the same kind of issues, namely of whether the "proof"
>> involves evidence that is useful for what you want it to imply.

>> A "proof-carrying code" doesn't necessarily provide evidence that
>> the system is proof against nefarious code.

> You may have some misconception about how PCC works.  The "proof" in
> "proof-carrying code" does not refer to the concept of proof as in
> "proof that I am 21 by showing my id".  Such a scheme would, indeed,
> be vulnerable to whether or not you trust the id and the person who
> issued it.

> PCC carries a proof in the traditional logic sense: It comes with a
> logic derivation of the safety property in question.  The derivation
> starts from basic principles -- axioms that the host can agree with.
> The derivation constitutes a formal proof.  Such a formal proof is
> mechanically checkable.

> Now, there are 4 possible scenarios:
> 
>   1. The program is correct, the proof is correct.
>      This is a safe situation, and the host will let the code run.
> 
>   2. The program is incorrect, the proof is correct.
>      In this case, unless the logic was inconsistent to begin with,
>      the proof will not actually conclude with the safety property
>      in question.  Thus, the host will not let the code run.
>      Safe again.
> 
>   3. The program is correct, the proof is incorrect.
>      In this case the host will detect the flaw in the proof and will
>      not let the code run.  This is the case of a "false positive",
>      but it is still safe.
> 
>   4. The program is incorrect, and so is the proof.
>      Again, the host will detect the flawed proof and, thus, reject
>      the program.  Safe.
> 
> As you see, the only thing we rely on here is that the logic itself is
> sound.  There is no reference to an external authority or any form of
> "web of trust".  It does not matter at all who sends the code; all that
> matters is that it comes with a correct proof that shows the program to
> be safe.

That still leaves open the question of what is the form of the code
and of the proof, and _precisely_ what you're doing with it to
validate "safety."

What happens with Java code is that there is the claim that compiled
Java satisfies various properties that ensure some forms of safety.
_Unfortunately_, the power of this is diminished by the fact that
this is only true of JVM bytecode that was compiled by a "safe" Java
compiler.  

You can hack together bytecode that _won't_ be safe, which essentially
means that you head back to needing to trust whomever generated the
bytecode.  Bye, bye, "proof."

That is, of course, something of a strawman; it may not apply directly
to your scheme.

I guess this leaves open the question of what are the things that get
deployed.  With Java, it's a bunch of bytecode, perhaps with a digital
signature of some sort.

With PCC, what do we have distributed?

I'm speculating that there's:
 a) A set of binary bytecode of some sort;
 b) Some encoding of the "proof of safety."

You take the proof, take the bytecode, throw them into some program,
and, with some hopefully-reasonable amount of computation, either
achieve:

  T - the bytecode is consistent with the proof;
  NIL - there is something inconsistent with the proof.

It seems to me that this may still leave open a door to whether or not
the proof that you sent me is properly relevant, that is, that _I_
agree that your proof implies "safety."  Maybe I'm wrong about that;
perhaps there is a way of specifying what 'safety' is supposed to imply
in the local environment, and to verify that the proof is meaningful.

My concern is that we might see a bunch of effort going into building
"proofs," when the proofs don't actually provide any "certification of
safety" that I happen to care about.

For instance, if I'm building a satellite, it doesn't suffice for you
to claim you have components that you can "prove" are somehow "really
good;" what I _need_ are components that I can take to my insurance
company and say:
  "Here, Rockwell has signed off on this 80486 chip, establishing that
   they certify it to be of Space Grade.  If it fails, you can pursue
   Rockwell..."

If we're not proving something that is consequential and readily
interpreted to be relevant to a particular context, it's a bit of a
waste of time...
-- 
(reverse (concatenate 'string ··········@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/internet.html
Rules of the Evil Overlord #46. If an advisor says to me "My liege, he
is but one man. What can one man possibly do?", I will reply "This."
and kill the advisor. <http://www.eviloverlord.com/>
From: Thomas Bushnell, BSG
Subject: Re: Proof Carrying Code
Date: 
Message-ID: <87zo1cnk03.fsf@becket.becket.net>
Christopher Browne <········@acm.org> writes:

> What happens with Java code is that there is the claim that compiled
> Java satisfies various properties that ensure some forms of safety.
> _Unfortunately_, the power of this is diminished by the fact that
> this is only true of JVM bytecode that was compiled by a "safe" Java
> compiler.  
> 
> You can hack together bytecode that _won't_ be safe, which essentially
> means that you head back to needing to trust whomever generated the
> bytecode.  Bye, bye, "proof."

No.  If someone hands me JVM bytecodes which don't satisfy the
standard properties, I can check that at load time, and determine that
it doesn't satisfy them.  The properties were carefully chosen to make
this easy to do.

There may be safe programs which don't meet the official properties
needed, and those will also get rejected; things that compile to JVM
bytecodes must therefore stick to the official properties in all the
code they generate.

The JVM does *not* trust the compiler to stick to the rules.

The difference between the JVM approach and the proof-carrying code
approach is that the JVM picks one sort of proof-schema and requires
all the proofs to match that schema.  The result is that the actual
proof can be omitted; it becomes easy (if you have stuck to the
schema==official required properties) for the JVM to generate the
necessary proof on the fly.

> I guess this leaves open the question of what are the things that get
> deployed.  With Java, it's a bunch of bytecode, perhaps with a digital
> signature of some sort.

The whole point of JVM bytecodes is that a signature is *NOT* required.

> My concern is that we might see a bunch of effort going into building
> "proofs," when the proofs don't actually provide any "certification of
> safety" that I happen to care about.

*Your* system advertises "by safety I mean X, Y, and Z".  Compilers
must generate proofs of *those* properties to satisfy you.  Other
proofs don't amount to a hill of beans.

If you get a piece of code from a random source and wonder "can I run
this safely", you simply see whether it actually proves the things you
want proven.  Certainly that requires that the compiler have a clue
about what things you care about.

So the run time says "you must generate a proof that no stray memory
reads or writes ever happen"; indeed, the run time specifies exactly
what it means by "stray memory read or write".

Then, any code that comes with a proof of those properties, you can
check and run.

Really, please read the papers; references have already been posted.

Thomas
From: Matthias Blume
Subject: Re: Proof Carrying Code
Date: 
Message-ID: <pan.2002.03.13.21.37.32.20465.11510@shimizu-blume.com>
On Wed, 13 Mar 2002 17:50:04 -0500, Christopher Browne wrote:

[ ... my explanation of PCC snipped ... ]
 
> That still leaves open the question of what is the form of the code and
> of the proof, and _precisely_ what you're doing with it to validate
> "safety."
> 
> What happens with Java code is that there is the claim that compiled
> Java satisfies various properties that ensure some forms of safety.
> _Unfortunately_, the power of this is diminished by the fact that this
> is only true of JVM bytecode that was compiled by a "safe" Java
> compiler.
> 
> You can hack together bytecode that _won't_ be safe, which essentially
> means that you head back to needing to trust whomever generated the
> bytecode.  Bye, bye, "proof."

I was not talking about Java- or any other bytecode.  PCC works with
machine code, even hand-optimized machine code.  PCC means that there are
two parts to a program: the code itself and the proof.  In the end, only
the code gets to execute.  The proof is just there to convince the host
that the code can be trusted.
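The (code, proof) pairing can be illustrated with a toy: here the
"machine code" is a list of absolute stores, the host's policy is an
allowed address range, and the "proof" is a list of per-instruction
claims.  Every detail is invented for this sketch; the point is only
that the small trusted checker validates the proof against the code,
and only the code ever executes.

```python
# Toy sketch of the PCC shape (all details invented): a program is
# shipped as (code, proof).  The host checks the proof against the
# code and its own safety policy; only the code ever runs.

# "Machine code": writes to absolute addresses.
code = [("STORE", 0x1004), ("STORE", 0x1008)]

# Host policy: the program may only touch this address range.
ALLOWED = range(0x1000, 0x2000)

# "Proof": one checkable claim per instruction.  The producer (an
# untrusted compiler) did the hard work of finding these claims.
proof = [
    {"insn": 0, "claim": ("in_range", 0x1004)},
    {"insn": 1, "claim": ("in_range", 0x1008)},
]

def check(code, proof):
    """The small trusted checker: verify every claim against the code,
    and that every instruction is covered.  Cheap and mechanical."""
    covered = set()
    for step in proof:
        op, addr = code[step["insn"]]
        kind, claimed_addr = step["claim"]
        if kind != "in_range" or claimed_addr != addr:
            return False            # proof talks about the wrong code
        if addr not in ALLOWED:
            return False            # claim is checkable and false
        covered.add(step["insn"])
    return covered == set(range(len(code)))

print(check(code, proof))           # True: accepted, code may run

bad = [("STORE", 0x9999)]
bad_proof = [{"insn": 0, "claim": ("in_range", 0x9999)}]
print(check(bad, bad_proof))        # False: rejected, claim is false
```

Note the division of labor: nothing about the system's safety depends on
how the proof was produced, only on the checker being correct.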
 
> That is, of course, something of a strawman; it may not apply directly
> to your scheme.

Indeed.  Java bytecode is *not* PCC.

> I guess this leaves open the question of what are the things that get
> deployed.  With Java, it's a bunch of bytecode, perhaps with a digital
> signature of some sort.
> 
> With PCC, what do we have distributed?
> 
> I'm speculating that there's:
>  a) A set of binary bytecode of some sort; b) Some encoding of the
>  "proof of safety."
> 
> You take the proof, take the bytecode, throw them into some program,
> and, with some hopefully-reasonable amount of computation, either
> achieve:
> 
>   T - the bytecode is consistent with the proof; NIL - there is
>   something inconsistent with the proof.
> 
> It seems to me that this may still leave open a door to whether or not
> the proof that you sent me is properly relevant, that is, that _I_ agree
> that your proof implies "safety."  Maybe I'm wrong about that; perhaps
> there is a way of specifying what 'safety' is supposed to imply in the
> local environment, and to verify that the proof is meaningful.

The host sets the rules of what needs to be proved.  If the code does not
come with a proof of precisely that, it will be rejected.

Matthias
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <fbc0f5d1.0203140227.7b9b0b9e@posting.google.com>
Christopher Browne <········@acm.org> wrote in message news:<··············@salesman.cbbrowne.com>...

> Seems to me this is rather like the issue of how much you trust a PGP
> "web of trust."

No, I don't think so.  As I understand it (and I'm not familiar with
the literature) PCC carries with it a formal logical proof that the
code satisfies some properties.  Due to the wonders of such things
this proof can be checked really efficiently, even though it might be
hard to write.  So, so long as you trust your proof checker, then if
the proof is OK, then the code is OK.  I am comfortable that this
approach does actually provide the safety it advertises.

My problems with this approach are really elsewhere:

1. Generating these proofs might be very hard in general.  I asked
about this in this thread and one of the pro-PCC people (sorry if I am
unfairly representing your position whoever it was) described it, I
think, as a research problem, which might have tractable solutions if
the compiler generated only very stylized code.  My intuition is that
this is pretty damning: it looks to me like it would place a very
significant burden on compiler implementors and so forth (which means
everything gets more expensive), and it might significantly constrain
the kind of code they can generate to the extent that non-PCC systems
have possibly large performance wins and so on.

2. It begs various questions such as wondering whether the hardware
actually implements what it claims to.  I suspect that many processors
have ugly little problems with things like interrupts which can make
bad things occasionally happen, *even if* the code they are running is
correct.  Certainly if you look at a modern high-performance CPU it's
hard to imagine that one could ever prove it.  Of course CPU designers
already design in a lot of redundant checks on stuff such as ECC on
cache and so forth, because they *know* that the whole thing is
relying on all sorts of marginally-working (because if they weren't
marginally working, you'd clock it faster until they were) components,
and other parts of HW design do the same thing.

3. It's a classic completely-brittle CS solution to a problem: if the
system (including the hardware, in fact particularly including the
hardware) behaves *exactly* according to its spec then the proof is
valid, but *any* deviation leaves you with really no promises at all,
and (if you leave out all the `redundant' hardware-supported checks of
things) it's actually quite likely to result in complete and possibly
undetectable system failure.  I'm an engineer not a theorist and I
want my computer systems built the same way good battleships got
built: as far as possible there were layers upon layers of stuff so
that the ship was as unbrittle as it was possible to be[1].  I want
computing systems to be less brittle, and more like the kind of
amazingly robust physical engineering systems that we build, not more
brittle.  We already build hardware like this (just look at the kind
of stuff that mainframes (including big Unix boxes) do), and I want
software to be more like this, not less.

--tim

[1] There's an interesting (and probably apocryphal) story about this.
 When considering the design of, I think, the Hood class, there was a
question of whether the magazines should have the ability to vent to
the outside.  If a magazine can vent like this, then the possibility
of a catastrophic explosion is reduced because the burning cordite
doesn't build up enough pressure & heat to cause a really big bang.  A
magazine fire still isn't something you want to happen, but it might
not sink the ship.  The decision was made to not do this because
`shells should not penetrate the magazines of His Majesty's ships'. 
This is a classic choice of an elegant-but-brittle solution: why vent
the magazines when you can prevent shells getting in. In quite recent
history (1916) the British had lost three battlecruisers to
catastrophic magazine explosion due either to shells penetrating or
(more likely I think) flash from shell hits on turrets combined with
inadequately designed flash protection systems.  We'd also lost at
least one ship due to *accidental* magazine explosion (cordite can
become unstable and catch fire spontaneously).  And twenty-odd years
later Hood herself was sunk due to something that was almost certainly
a catastrophic magazine explosion.  Better to have been redundant, I
think.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87lmcupx49.fsf@becket.becket.net>
··········@tfeb.org (Tim Bradshaw) writes:

> 1. Generating these proofs might be very hard in general.  I asked
> about this in this thread and one of the pro-PCC people (sorry if I am
> unfairly representing your position whoever it was) described it, I
> think, as a research problem, which might have tractable solutions if
> the compiler generated only very stylized code.  My intuition is that
> this is pretty damning: it looks to me like it would place a very
> significant burden on compiler implementors and so forth (which means
> everything gets more expensive), and it might significantly constrain
> the kind of code they can generate to the extent that non-PCC systems
> have possibly large performance wins and so on.

I think I said this.

So I don't know *how* hard it is; I know that some cases are no
trouble.  And if you are compiling a language like Scheme, it's *way*
easier than for a language like C.  Indeed, so much easier that it's
not too hard at all.

The obvious trivial compilation of Scheme, for example, will be easy
to prove memory validity with.  If you optimize it, it might be
harder--except that the compiler knows what the optimizations are!
That is, it doesn't have to intuit "what's going on", but rather as it
works its optimization, it simply updates the proof to match.

So I think for some languages, it's reasonably clear how to do it
efficiently.  Now, that said, it *is* a research topic, and how far
actual work has progressed is something I don't have the facts on;
you'd have to ask the folks actively working on it.  

> 2. It begs various questions such as wondering whether the hardware
> actually implements what it claims to.  I suspect that many processors
> have ugly little problems with things like interrupts which can make
> bad things occasionally happen, *even if* the code they are running is
> correct. 

This is a classic problem with "proofs of program correctness", but
proof-carrying code is a totally different beast.

If the hardware f*cks up, then you have *no* security guarantees;
that's just a well established maxim.  There is *no* solution that
somehow fixes that one, and anyway, PCC is not a replacement for
thinking about security.

Here's what PCC is for: 

If I use a correct Scheme compiler, on a correct runtime, I'm
guaranteed that the procedures I load will not tromp on random
memory.  By "correct" here, I mean nothing involving proofs; I just
mean that this is a property of a well-functioning Scheme system.

But I want to run a program that *you* write.  Now if you give me
sources, then I compile it with my compiler, and I'm guaranteed it
will not tromp on random memory.  It might do *other* things that are
horrible, but because of the way the language works, it won't tromp on
memory.

But that way restricts me to getting programs from you in sources.  If
I want a binary from you, then I have two choices:

1) Trust you to have compiled with a compiler that works as well as
   mine,
2) Make you prove that the program doesn't tromp on random memory.

So PCC is about doing number (2); signing code is a way of
standardizing number (1).

Once you have PCC, lots of *other* benefits come too.  I can ask you
to prove other things about the code too, if that's important to me.
(Though I think it's very unlikely that one could cover *all* security
issues this way.)

> 3. It's a classic completely-brittle CS solution to a problem: if the
> system (including the hardware, in fact particularly including the
> hardware) behaves *exactly* according to its spec then the proof is
> valid, but *any* deviation leaves you with really no promises at all,
> and (if you leave out all the `redundant' hardware-supported checks of
> things) it's actually quite likely to result in complete and possibly
> undetectable system failure. 

You're still thinking about proofs-of-correctness.

Think of PCC as aiming at a smaller problem: replacing the need to
trust *your* compiler as much as I already trust *mine*.  That I
already trust mine is a given, and PCC is not about somehow obviating
*that*.

Thomas
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <fbc0f5d1.0203150203.3332249c@posting.google.com>
·········@becket.net (Thomas Bushnell, BSG) wrote in message news:<··············@becket.becket.net>...
> 
> If the hardware f*cks up, then you have *no* security guarantees;
> that's just a well established maxim.  There is *no* solution that
> somehow fixes that one, and anyway, PCC is not a replacement for
> thinking about security.
> 

I'm going to stop now, because I think that, yet again in this thread,
I'm running into what Erik is calling the `1-bit/n-bit person' issue
and what I think of as the `discrete/continuous person' issue[1].

However to reiterate once more: I am not after security guarantees,
and, in fact, I hold such things to be chimeras.  What I am after is
elegant failure such that if, say, a bit of HW fails, I have a *good
chance* that some other part of the system will catch this failure and
stop it spreading to kill the entire system.  HW-supported memory
protection is a mechanism which can help this: arguing that it can be
omitted because code is known not to require it is a way of reducing
this good chance of survival to almost no chance, and is a classic
example of brittle `discrete' or `1-bit' thinking.

That's really all I have to say on this issue: I think it's an
unfortunate truth that communication between discrete and continuous
people is essentially impossible.  I wish this was not so, and for
years I denied it, but there does seem to be a lot of evidence.

--tim

[1] A continuous person is an n-bit person who is comfortable with the
idea that for n large enough you can quietly assume a continuum, in
particular you can gloss over the boundary between countably infinite
and continuous.  A discrete person is one who is not comfortable with
this but requires proof.  The classic example of this kind of thing is
QM, where physicists (who are generally continuous people) did some
really enormous glossing over in the 30s, which I think got sorted out
in the 40s or 50s when the discrete people worked out that you can in
fact choose a countable basis for the cases where the `obvious' basis
becomes uncountable.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87r8mlaibn.fsf@becket.becket.net>
··········@tfeb.org (Tim Bradshaw) writes:

> However to reiterate once more: I am not after security guarantees,
> and, in fact, I hold such things to be chimeras.  What I am after is
> elegant failure such that if, say, a bit of HW fails, I have a *good
> chance* that some other part of the system will catch this failure and
> stop it spreading to kill the entire system.  HW-supported memory
> protection is a mechanism which can help this: arguing that it can be
> omitted because code is known not to require it is a way of reducing
> this good chance of survival to almost no chance, and is a classic
> example of brittle `discrete' or `1-bit' thinking.

This is certainly a possible benefit of HW-supported memory
protection.

I think, however:

1) It is nowhere near the principal reason that a system like Unix is
   using HW-supported memory protection;

2) The kinds of security-relevant processor bugs that hardware often
   ends up having are things that memory protection is not able to
   stop.  (Lock-up bugs, etc.)

Thomas
From: Rahul Jain
Subject: Re: self-hosting gc
Date: 
Message-ID: <871yelya2s.fsf@photino.sid.rice.edu>
··········@tfeb.org (Tim Bradshaw) writes:

> However to reiterate once more: I am not after security guarantees,
> and, in fact, I hold such things to be chimeras.  What I am after is
> elegant failure such that if, say, a bit of HW fails, I have a *good
> chance* that some other part of the system will catch this failure and
> stop it spreading to kill the entire system.  HW-supported memory
> protection is a mechanism which can help this: arguing that it can be
> omitted because code is known not to require it is a way of reducing
> this good chance of survival to almost no chance, and is a classic
> example of brittle `discrete' or `1-bit' thinking.

That's a common, and very reasonable position, but look at what it has
given us: the Unix process model. Every application runs in its own
universe, without knowing or wanting to know about what any other
process is doing. IPC in unix is painful because of processes. Instead
of just passing the other application a reference to your data, you
have to use elaborate schemes to register and protect the IPC services
you provide. I don't know if a pure capability-based IPC model is
really practical in unix, because of processes. This causes wasteful
copying and effectively forces all IPC to be CORBA-like instead of
function-call-and-reference-passing-like.

I don't know of any hardware which supports pure-capability-based
memory protection, but I'd be interested in hearing about it. Of
course, that doesn't help the fact that an OS that only runs on a
specialized hardware platform won't get much general use.

-- 
-> -/                        - Rahul Jain -                        \- <-
-> -\  http://linux.rice.edu/~rahul -=-  ············@techie.com   /- <-
-> -/ "Structure is nothing if it is all you got. Skeletons spook  \- <-
-> -\  people if [they] try to walk around on their own. I really  /- <-
-> -/  wonder why XML does not." -- Erik Naggum, comp.lang.lisp    \- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   (c)1996-2002, All rights reserved. Disclaimer available upon request.
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.15.07.30.37.841076.2448@shimizu-blume.com>
On Fri, 15 Mar 2002 05:03:06 -0500, Tim Bradshaw wrote:

> ·········@becket.net (Thomas Bushnell, BSG) wrote in message
> news:<··············@becket.becket.net>...
>> 
>> If the hardware f*cks up, then you have *no* security guarantees;
>> that's just a well established maxim.  There is *no* solution that
>> somehow fixes that one, and anyway, PCC is not a replacement for
>> thinking about security.
>

[ ... ]

> However to reiterate once more: I am not after security guarantees, and,
> in fact, I hold such things to be chimeras.  What I am after is elegant
> failure such that if, say, a bit of HW fails, I have a *good chance*
> that some other part of the system will catch this failure and stop it
> spreading to kill the entire system.  HW-supported memory protection is
> a mechanism which can help this: arguing that it can be omitted because
> code is known not to require it is a way of reducing this good chance of
> survival to almost no chance, [ ... ]

Do you (or anyone) have a pointer to some empirical study of how much
*in currently existing systems* memory protection buys you in terms of
making them more robust wrt. minor hardware failures?

(It seems to me that it can't be all that much, but that's just a guess.
For example, on a lightly-loaded linux server that has lots of main memory,
most of that memory will be occupied by a kernel data structure -- the
buffer cache.  What that means is that, say, a flipped bit in a pointer
has a good chance of messing with a part of the system that is thoroughly
non-separated from the most privileged code in the system.  So what I am
saying is that at the current granularity, memory-protection is likely
(or so it seems to me) to have only a negligible net effect on system
robustness wrt. hardware failures.  But I'd be delighted to see proof
that my guess is wrong.)

Matthias
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6t1kk$h1oql$1@ID-125440.news.dfncis.de>
In article <···································@shimizu-blume.com>, Matthias Blume wrote:
> On Fri, 15 Mar 2002 05:03:06 -0500, Tim Bradshaw wrote:
> 
>> However to reiterate once more: I am not after security guarantees, and,
>> in fact, I hold such things to be chimeras.  What I am after is elegant
>> failure such that if, say, a bit of HW fails, I have a *good chance*
>> that some other part of the system will catch this failure and stop it
>> spreading to kill the entire system.  HW-supported memory protection is
>> a mechanism which can help this: arguing that it can be omitted because
>> code is known not to require it is a way of reducing this good chance of
>> survival to almost no chance, [ ... ]
> 
> Do you (or anyone) have a pointer to some empirical study of how much
> *in currently existing systems* memory protection buys you in terms of
> making them more robust wrt. minor hardware failures?
> 
> (It seems to me that it can't be all that much, but that's just a guess.
> For example, on a lightly-loaded linux server that has lots of main memory,
> most of that memory will be occupied by a kernel data structure -- the
> buffer cache.  What that means is that, say, a flipped bit in a pointer
> has a good chance of messing with a part of the system that is thoroughly
> non-separated from the most privileged code in the system.  So what I am
> saying is that at the current granularity, memory-protection is likely
> (or so it seems to me) to have only a negligible net effect on system
> robustness wrt. hardware failures.  But I'd be delighted to see proof
> that my guess is wrong.)

I once lived for a while with a defective memory module.  I had chosen
all parts of my computer myself, put them together in my living room
and everything seemed to work at first.  After a while, sometimes
unpleasant things happened.  On Winblows, Quake3 crashed after a few
hours, sometimes locking the system.  I don't run anything but games
on Windows, so that was all I noticed there (at first I thought the
OpenGL driver was faulty etc.).  On Linux, everything went fine until I
noticed that there were sometimes damaged inodes on my var partition,
and a random, unexplainable core dump now and then.  This happened
much later on Linux, so it took me a while to figure out that the
hardware might be faulty.  First I bought lots of fans.  Much later
I tried the Heise RAM test and found a defective module - after
exchanging it, everything has worked fine, to this day.

So, what was the net cost?  The worst thing was that I had to reinstall
teTeX, because some of its files were damaged on the var partition.  But
that was it.  I am certain that the damage would have been much higher
if there hadn't been an MMU that shut down some of the programs which
touched the faulty bits, and if I didn't always use lots of partitions
on systems I install, not only for limiting size but much more for
localizing the effects of hardware failures.

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3u1rgdv09.fsf@cley.com>
* Matthias Blume wrote:

> Do you (or anyone) have a pointer to some empirical study of how much
> *in currently existing systems* memory protection buys you in terms of
> making them more robust wrt. minor hardware failures?

No.  However I think this is both because most OS's have awful
designs (everything in one unprotected space inside the kernel, oops),
and because the systems people would bother testing already have ECC
which pushes undetected memory problems down so far they are hard to
measure.

At least one system (Sun? not sure) had or has stuff which would
detect ECC errors in a memory module, and if the pages on the module
were clean would mark them trap-on-touch and thus cause any processes
which mapped them to get traps next time they accessed them which
would cause them to be refetched from disk and mapped to a different
module. I think you can administratively mark a board as bad which
will cause it all to get marked this way so you can then swap it.

--tim
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey366469cvb.fsf@cley.com>
* Thomas Bushnell wrote:

> Why do you want a formal proof for a Lisp system before you accept its
> pointer guarantees, but you don't expect a formal proof for the C
> compiler or your kernel, before you accept their guarantees?

I would like formal proofs for both.  Since I can't get them for
either, I want as many layers of probably-working stuff as I can get
between me and disaster.  Given that there is one instance of the kernel
gets tested continually (and has not yet failed on our systems) as
against some thousand or so user programs none of which I trust, I'm
also reasonably convinced that the kernel actually is correct enough,
whereas I have plenty of evidence that the programs (including lisp
systems) are not entirely trustworthy.

(Incidentally this is the general flaw in the `why do you trust this
thing and not that thing' argument.  If `this thing' is a library or
kernel which is used all the time and `that thing' is several hundred
programs, then it is completely rational to trust this thing more than
that thing.  It's the reason why when my Lisp programs die, I tend to
assume that I've made a mistake rather than the compiler is buggy.
Statistics is a wonderful thing, although something alien to the `discrete'
CS mindset I realise.)

--tim
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.09.22.42.00.224019.1560@shimizu-blume.com>
On Fri, 08 Mar 2002 14:22:17 -0500, Tim Bradshaw wrote:

> * Erik Naggum wrote:
> 
>>   How would you arrive at that proof?  What software would you trust
>>   implicitly in order to trust some other software explicitly?
> 
> I don't know.  I regard it as kind of up to the static correctness proof
> people to convince me of this.  Until then I'm keeping my hardware
> checks.
> 
> (Incidentally they also need to convince me that if the machine gets an
> undetected error in memory (which for me means a 3 bit error I think and
> for most PCs means a single bit error) that the resulting code is also
> safe.)

You are not seriously suggesting that any of the existing "let's check
explicitly" systems would still be safe in that situation?

Matthias
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3it848mfo.fsf@cley.com>
* Matthias Blume wrote:

> You are not seriously suggesting that any of the existing "let's check
> explicitly" systems would still be safe in that situation?

In case it's not clear, I am not suggesting (and indeed I hold the
idea to be ludicrous) that a computing system can ever be *known* to
be correct in the sense that you seem to mean by `safe'.  Something
sufficiently bad can always result in failure.  I'm simply after
systems that offer good statistical properties.

And yes, a system that dynamically enforces protection between programs is
safer in this case.  The failing program will cause, say, wild
pointers to be followed, which will cause it to get hardware traps,
and likely get killed.  The system without hardware traps will simply
blunder on, probably writing crap all over other programs and the OS.

Of course undetected hardware errors are not a good situation to be
in, but it's one in which hardware protection can still help you;
systems without this protection lose even that help.

--tim
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2hvgc4ifmk.fsf@vserver.cs.uit.no>
Tim Bradshaw <···@cley.com> writes:

> And yes, a system that dynamically enforces protection between
> safer in this case.  The failing program will cause, say, wild
> pointers to be followed, which will cause it to get hardware traps,
> and likely get killed.  The system without hardware traps will
> simply blunder on, probably writing crap all over other programs and
> the OS.

I'd like to emphasize that presented with the choice "protection" or
"no protection", the choice is very easy. However, sometimes this is
conflicting with "performance" and also "flexibility" (in the sense
that communication between two lisp worlds is much more problematic
than between two processes in the same world, "world" being newspeak
for address-space, I guess).

-- 
Frode Vatvedt Fjeld
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.09.22.40.19.885200.1560@shimizu-blume.com>
On Fri, 08 Mar 2002 14:14:26 -0500, Erik Naggum wrote:

> * Tim Bradshaw <···@cley.com>
> | The only kind of native code I'd regard as trusted is that for which
> | there is a formal correctness proof.
> 
>   How would you arrive at that proof?

That does not matter.  You can check whether the proof is correct.
That's far, far easier than coming up with it.

>  What software would you trust
>   implicitly in order to trust some other software explicitly?

The proofchecker -- which must be part of the trusted computing base, but
which does not have to be big or complicated.

That's the whole premise of PCC.

Matthias
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224720919056228@naggum.net>
* Matthias Blume <········@shimizu-blume.com>
| That does not matter.  You can check whether the proof is correct.
| That's far, far easier than coming up with it.

  Do you check this for the source code or for the compiled machine code?

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87g038n9un.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

> * Matthias Blume <········@shimizu-blume.com>
> | That does not matter.  You can check whether the proof is correct.
> | That's far, far easier than coming up with it.
> 
>   Do you check this for the source code or for the compiled machine code?

For the compiled machine code.

The compiler (which need not be trusted) provides, along with the
code, proofs that the code satisfies whatever memory rule (or other
checks) that the OS requires of executable code.  Producing such
proofs may be hard, but that's ok, no security depends on the proofs
being correctly produced.

Then, the system *checks* the proofs.  That's easy to do--it's just
making sure (essentially) that proper deductive logic is being used.
That's a fairly small piece of code, very easy to get right.  Anything
that comes with a valid proof meets the security tests.
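The produce/check asymmetry being described here has a classic
miniature form (a standard textbook example, not from this thread):
finding a 3-coloring of a graph can take exponential search, but
checking a claimed coloring is one linear pass.  PCC leans on the same
asymmetry: the untrusted compiler searches, the trusted base only
checks.

```python
# Produce-hard, check-easy, in miniature.  The "proof" here is just a
# coloring certificate; checking it is trivial, producing it is not.
from itertools import product

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]

def check_coloring(coloring):
    """The cheap, easy-to-get-right part: every edge joins two
    differently-colored nodes.  One linear pass."""
    return all(coloring[a] != coloring[b] for a, b in edges)

def find_coloring(n_nodes, colors=3):
    """The expensive part: brute-force search over colors**n_nodes
    assignments.  System correctness never depends on this code."""
    for coloring in product(range(colors), repeat=n_nodes):
        if check_coloring(coloring):
            return coloring
    return None

cert = find_coloring(4)
print(cert is not None and check_coloring(cert))  # True
```

In the same way, a buggy proof-producing compiler can at worst emit
proofs the checker rejects; it cannot smuggle unsafe code past a
correct checker.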

Thomas
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3r8ms8n3g.fsf@cley.com>
* Thomas Bushnell wrote:

> The compiler (which need not be trusted) provides, along with the
> code, proofs that the code satisfies whatever memory rule (or other
> checks) that the OS requires of executable code.  Producing such
> proofs may be hard, but that's ok, no security depends on the proofs
> being correctly produced.

I kind of like the way people skate around the problem here.  Just
*how* hard is producing these proofs?  Just *how* expensive will this
make software?
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87d6ycuriq.fsf@becket.becket.net>
Tim Bradshaw <···@cley.com> writes:

> I kind of like the way people skate around the problem here.  Just
> *how* hard is producing these proofs?  Just *how* expensive will this
> make software?

Producing the proofs is, generally, pretty darn hard, depending on
what the OS is asking the compiler to prove.  But it's not *that*
hard, if the compiler sticks to well-worn idioms, for certain kinds of
proofs.  But definitely, this is a research topic--albeit one that is
seeing plenty of results.

Thomas
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.09.23.05.41.455545.1560@shimizu-blume.com>
On Sat, 09 Mar 2002 22:48:29 -0500, Erik Naggum wrote:

> * Matthias Blume <········@shimizu-blume.com>
> | That does not matter.  You can check whether the proof is correct.
> | That's far, far easier than coming up with it.
> 
>   Do you check this for the source code or for the compiled machine
>   code?

For the machine code, of course!  That's what PCC (proof-carrying code)
is all about.  (The proof in question proves a "safety theorem" about the
*machine* code.)

I really recommend looking up some of the canonical references.  Necula
and Lee is a good starting point.

Matthias
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224752888315724@naggum.net>
* Matthias Blume <········@shimizu-blume.com>
| I really recommend looking up some of the canonical references.  Necula
| and Lee is a good starting point.

  As far as I can see, this is the first time you have bothered to give
  anyone any indication of where they could learn about all the stuff you
  claim is true, somewhat ironically without proof or evidence.  I have
  read your messages with what I find to be increasingly wild claims on a
  lacking foundation.  You may think you are right, but there is nothing in
  what you have written which communicates _how_ and _why_ to anybody else,
  unless they already agree with you, in which case it is rather pointless.

  You have also not credibly argued why any of these things are any better
  than what we have today, not in any technical sense on your own premises,
  of course, but in the sociological sense that the world would be better
  with this technology, that it would actually matter, and that people
  would go along with it.  Nor have you argued for a way of getting there
  from here or how to determine whether what you want to prove (in society)
  has succeeded or failed.  So far, it all looks like the kind of hope for
  a better world that I associate with cults and bad religions and really
  bad political ideologies, who all would probably work wonderfully --
  provided that human diversity does not get in the way, i.e., if everybody
  understood the theory, agreed to all of the tacit as well as explicit
  premises, and practiced it faithfully.

  I think you should have realized that the arguments you have received are
  not about the merits of the technology, but about its desirability and
  its premises.  This is the question of _why_ we do things, not _that_ we
  can do them.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.10.10.04.14.337167.2852@shimizu-blume.com>
On Sun, 10 Mar 2002 07:41:18 -0500, Erik Naggum wrote:

> * Matthias Blume <········@shimizu-blume.com> | I really recommend
> looking up some of the canonical references.  Necula | and Lee is a good
> starting point.
> 
>   As far as I can see, this is the first time you have bothered to give
>   anyone any indication or where they could learn about all the stuff
>   you claim is true,

I recall having mentioned proof-carrying code before.  I don't remember
whether I said "Necula" or "Lee" or both, but anyone who would have
invested the 5 seconds it takes to type "proof-carrying code" into
Google, would have immediately arrived at a link to Peter Lee's overview
on the topic.

Matthias
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224762994900674@naggum.net>
* Matthias Blume
| I recall having mentioned proof-carrying code before.  I don't remember
| whether I said "Necula" or "Lee" or both, but anyone who would have
| invested the 5 seconds it takes to type "proof-carrying code" into
| Google, would have immediately arrived at a link to Peter Lee's overview
| on the topic.

  Thank you for your condescension.  It is becoming clear that you do not
  want to help people understand why you are right, but prefer to huff and
  puff because you probably are wrong.  It is clearly a waste of time for
  you to guide people.  It seems that you work very hard to make a peculiar
  kind of "you wouldn't understand, anyway" prejudice come true.  One has
  to wonder how you arrived at what you currently believe, but I shall not
  bother you with more questions nor request that you be helpful if you
  have any desire to convince others at all and not only try to intimidate
  them with your "superior" theories.  Just asking people to google is not
  quite sufficient to see how _you_ arrived at what you believe, you know.

  Perhaps you have become so defensive that you should take a long break?

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey36647ccet.fsf@cley.com>
* Frode Vatvedt Fjeld wrote:

> The old lisp machines lacked OS protection mechanisms like address
> spaces, didn't they? Did they suffer substantially from this design?

Not then, but the thought of such a system with a MIME mailer
which could cause random Lisp code to be executed fills me with
horror.

--tim
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2hn0xjjch9.fsf@vserver.cs.uit.no>
Tim Bradshaw <···@cley.com> writes:

> Not then, but the thought of such a system with a MIME mailer which
> could cause random Lisp code to be executed fills me with horror.

But wouldn't any mailer that causes any kind of random code to be
executed equally fill you with horror?

-- 
Frode Vatvedt Fjeld
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3pu2fav6o.fsf@cley.com>
* Frode Vatvedt Fjeld wrote:

> But wouldn't any mailer that causes any kind of random code to be
> executed equally fill you with horror?

yes, but not for the safety of the OS.

--tim
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2helivj5vo.fsf@vserver.cs.uit.no>
Tim Bradshaw <···@cley.com> writes:

> * Frode Vatvedt Fjeld wrote:
>
>> But wouldn't any mailer that causes any kind of random code to be
>> executed equally fill you with horror?
>
> yes, but not for the safety of the OS.

Why is that distinction important? If your computer starts to send out
fake e-mails to your friends in your name, does it matter much if the
OS per se is infected or not? Can you ever trust your computer after
you have run untrusted code under your own privileges? Maybe if you
delete your entire account and install a new one with a backup, but is
that substantially better than doing the same thing for the entire OS,
given a single-user machine?

Or do you mean by "safety of the OS" that the machine will not crash?
In that case I think you have one question to ask yourself when facing
some binary blob that you consider executing as machine code: Do I
trust this code to obey the run-time invariants, function-call
conventions etc. of my system? If yes, you can have faith in this
run-time system to catch any errors, just like regular CL systems
catch pretty much all of the errors I experience in regular
work. Otherwise you either refuse to run the code, or if that is not
an option you could in fact implement address-spaces and the other
machinery needed for such special cases, as an extra feature. Just
like it would be conceivable to implement support for (say) Linux
binaries on top of some lisp system like this. This would not be my
first priority, however.

-- 
Frode Vatvedt Fjeld
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3d6yec31r.fsf@cley.com>
* Frode Vatvedt Fjeld wrote:
> Or do you mean by "safety of the OS" that the machine will not crash?

This is what I mean.  It matters to me that machines don't crash even
when individual programs misbehave: the machine I'm typing this on is
supporting a couple of web servers, primary DNS for several domains,
is the preferred MX for those domains, and is supporting a couple of
users doing normal user things and developing code with all the tools
that entails.  I don't want it to fall over, ever, and I'm happy to
say that despite the terrible deficiencies of the Unix privilege
model, it never has fallen over other than because of disk failure
(it's a small machine, we don't have RAID though we rsync stuff so our
loss in the case of disk failure is an hour or so).

> In that case I think you have one question to ask yourself when facing
> some binary blob that you consider executing as machine code: Do I
> trust this code to obey the run-time invariants, function-call
> conventions etc. of my system? 

No, I don't.

I think what this comes down to is that I'm a Lisp person.  I like my
languages dynamically checked, and I like my hardware dynamically
checked too.  Fortunately hardware designers are Lisp people too and
design hardware that checks dynamically. They have to be because they
live in a world of real objects which suffer from wear and random
failure, and most of all from software writers.

--tim
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2hadtikgpb.fsf@vserver.cs.uit.no>
Tim Bradshaw <···@cley.com> writes:

> No, I don't.

In that case, as I said, you have special needs that require specific
support. I can certainly sympathize with the "I live in the real
world" argument, and I know that a DNS server for this hypothetical
lisp system won't materialize out of nowhere. However, an equally
pragmatic approach would be to get an old 486 with NetBSD (or whatever
your choice unix is) and have it do your DNS serving while you do your
exciting new computing on the hypo-box. And remember, keep both feet
on the ground and you'll never move forward (in other words I think
there is such a thing as being too pragmatic).

> I think what this comes down to is that I'm a Lisp person.  I like
> my languages dynamically checked, and I like my hardware dynamically
> checked too.

The way I see it, you are slightly confusing things here. The hardware
itself is rarely dynamically checked. What is currently happening is
that hardware is required to dynamically check software written in
inferior languages that don't know how to check anything.

> Fortunately hardware designers are Lisp people too

Oh, I'm not so sure about that. Have you looked at VHDL or Verilog? :-)

> and design hardware that checks dynamically. They have to be because
> they live in a world of real objects which suffer from wear and
> random failure, and most of all from software writers.

Hm. So you wouldn't consider the lisp machines' hardware designers to
be Lisp people?

I guess we're back to my earlier question (more or less): Did the lisp
machines crash often due to not having address spaces? What I do know
is that the uptime of the lisp systems I use is consistently better
than most OSes I use.

-- 
Frode Vatvedt Fjeld
From: Joe Marshall
Subject: Re: self-hosting gc
Date: 
Message-ID: <u29i8.2865$44.847449@typhoon.ne.ipsvc.net>
"Frode Vatvedt Fjeld" <······@acm.org> wrote in message
···················@vserver.cs.uit.no...
>
> I guess we're back to my earlier question (more or less): Did the lisp
> machines crash often due to not having address spaces?

Rarely.  They almost never crashed from stray or bad pointers.

However, it is trivial to crash a Lisp Machine deliberately (iterate through
the SYS package setting every symbol value to NIL, for example).  I think
this would be the `moral equivalent' of zeroing out memory in a more
primitive language.
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey34rjqbykv.fsf@cley.com>
* Frode Vatvedt Fjeld wrote:

>> I think what this comes down to is that I'm a Lisp person.  I like
>> my languages dynamically checked, and I like my hardware dynamically
>> checked too.

> The way I see it, you are slightly confusing things here. The hardware
> itself is rarely dynamically checked. What is currently happening is
> that hardware is required to dynamically check software written in
> inferior languages that don't know how to check anything.

Actually, the hardware is dynamically checked - what do you think an
ECC error is?  Or what do you think RAID does?  If you had seen big
machines, you'd also know how much HW checking goes on there, too.

But yes, I want the hardware to check what the programs do, because
it's incredibly cheap to do this checking and it buys a lot (and
incidentally, you need most of this stuff *anyway* to implement things
like VM systems, but we'll let that pass shall we).

> Hm. So you wouldn't consider the lisp machines' hardware designers to
> be Lisp people?

Ah wait a minute.  Have you ever seen a Lisp machine?  They
dynamically checked to an extent which very few modern machines do.
CAR of a non-list results in a hardware trap.  If you write C programs
on them then errors from wild pointers result in hardware traps.  I
think it went away later, but early 36xx machines (L machines rather
than G machines I think) had all sorts of amazing stuff with the FEP
checking the real processor and stopping it if it detected bad things
happening. Dynamic checking was kind of what a Lisp machine was about.

> I guess we're back to my earlier question (more or less): Did the lisp
> machines crash often due to not having address spaces? 

If they had been multiuser machines (which they were not) then I think
they would have been uncomfortably unreliable and weird, yes.  Even as
single-user machines you could cause the system to eat itself if you
didn't know what you were doing: if you screwed up the package system
you really needed to reboot unless you were very skillful and patient.

> What I do know is that the uptime of the lisp systems I use is
> consistently better than most OSes I use.

Well I don't know what OSs you use, but the ones I rely on effectively
don't crash any more so long as you don't insist on the latest
bleeding edge version.

--tim
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2h6646kclu.fsf@vserver.cs.uit.no>
Tim Bradshaw <···@cley.com> writes:

> Actually, the hardware is dynamically checked - what do you think an
> ECC error is?  Or what do you think RAID does?

Ok, you got me there.

> But yes, I want the hardware to check what the programs do, because
> it's incredibly cheap to do this checking and it buys a lot

I'm not sure it's so cheap; recent OS research indicates it's not, in
many circumstances.

> (and incidentally, you need most of this stuff *anyway* to implement
> things like VM systems, but we'll let that pass shall we).

Many of the problems that come with address spaces are quite separate
from the issues VM etc. bring with them. Hm.. flashback, didn't I have
this discussion with someone here about one or two years ago?

> Ah wait a minute.  Have you ever seen a Lisp machine?

No, that's why I ask these questions about them :)

> They dynamically checked to an extent which very few modern machines
> do.  CAR of a non-list results in a hardware trap.

But this can be (and is) simulated on any CPU. Didn't the lisp
machines feature non-type-checked "load word" instructions?

> Well I don't know what OSs you use, but the ones I rely on
> effectively don't crash any more so long as you don't insist on the
> latest bleeding edge version.

But do your lisp systems crash, or experience many traps caused by
"bus errors"?

-- 
Frode Vatvedt Fjeld
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3pu2eafyd.fsf@cley.com>
* Frode Vatvedt Fjeld wrote:
> But this can be (and is) simulated on any CPU. Didn't the lisp
> machines feature non-type-checked "load word" instructions?

I don't know.  I suspect not, although you could probably have loaded
microcode which did.

> But do your lisp systems crash, or experience many traps caused by
> "bus errors"?

Yes, if you mean the Lisp systems I use to write on the Unix boxes.
Not often, but more often than once a year which is a conservative
estimate of how well the Unix boxes have done over the last 3 years
(they've been down more than that but it's been either HW failure or
changes, moving them, or reboots to test we still can reboot them).

--tim
From: Joe Marshall
Subject: Re: self-hosting gc
Date: 
Message-ID: <Rqci8.3102$44.1006416@typhoon.ne.ipsvc.net>
"Frode Vatvedt Fjeld" <······@acm.org> wrote in message
···················@vserver.cs.uit.no...

> Didn't the lisp machines feature non-type-checked "load word"
> instructions?
>

Yes, there were primitives that could be used to manipulate untagged data,
but they were non-standard.
From: Jerry
Subject: Re: self-hosting gc
Date: 
Message-ID: <b69f1938.0203120447.47780ba7@posting.google.com>
Vilhelm Sjoberg <·····@cam.ac.uk> wrote in message news:<················@cam.ac.uk>...
...
> It would be nice with a operating system based on a type-safe language 

You're lucky: most of it exists: http://www.askemos.org/

> instead of C. It could then dispense with the concept of processes all 

It does.

> together, and there would be no distinction between user or system code. 
> (the "kernel" would dissolve into a set of libraries, with a thread 
> scheduler somewhere).

Mostly.
 
> Such an OS could of course still sandbox or proof-check or verify the 
> signatures of untrusted code, just as you run a Java VM in Windows/Unix.
> 
> -Vilhelm

Though it's currently implemented at user level (an ordinary process),
it was designed with the idea of changing the underlying language
to operate directly on hardware.

/Jerry
From: Lex Spoon
Subject: sandboxes are hard?
Date: 
Message-ID: <m3u1rlp6xs.fsf_-_@logrus.dnsalias.net>
Vilhelm Sjoberg <·····@cam.ac.uk> writes:

> Tim Bradshaw wrote:
> 
> > I didn't quite mean it quite so literally.  Imagine I get a blob of
> > code, how do I know that it doesn't fake things?  The only way I can
> > see to do this is a completely trusted compiler, which can sign its
> > output, so you're still dynamically checking, you just do it once,
> > when the program starts (isn't this what MS push with ActiveX?).  Or I
> > guess you can do some kind of proof on the program before running it
> > (Java?).
> > Given the negligible cost of checks, I'd kind of rather the OS just
> > did them though.
> 
> 
> Running untrusted code, i.e. an opaque binary handed to you by a
> potentially malicious stranger, is a problem that requires somewhat
> elaborate solutions. 

I disagree entirely.  Java is elaborate, but solutions don't have to be.
In particular, if your language is fairly simple (like Scheme!), and
if you run on a virtual machine, then you can easily move the problem 
to verifying that the primitive operations are secure.  This is quite
similar to verifying that a TCP/IP server is secure.

I really have no idea why Java's security mechanisms are so complex.
You really don't have to do things like analyze who called you on the
stack.  A key idea to make one of these simpler sandboxes is to think
about capabilities instead of permission lists....


-Lex
From: Jeffrey Siegal
Subject: Re: sandboxes are hard?
Date: 
Message-ID: <3C8F1C4C.AE4351F4@quiotix.com>
Lex Spoon wrote:
> I really have no idea why Java's security mechanisms are so complex.
> You really don't have to do things like analyze who called you on the
> stack.

You don't as long as the sandbox is limited to being unable to do
anything powerful (like access files).  This was the original Java
security model, until they realized that being unable to do anything
powerful also meant being unable to do anything useful.  So they gave
"signed code" the ability to escape from the sandbox, but that opens up
the possibility of signed code being exploited by unsigned code running
in the same JVM.  It was all downhill from there.
From: Lex Spoon
Subject: Re: sandboxes are hard?
Date: 
Message-ID: <m3lmctontj.fsf@logrus.dnsalias.net>
Jeffrey Siegal <···@quiotix.com> writes:

> Lex Spoon wrote:
> > I really have no idea why Java's security mechanisms are so complex.
> > You really don't have to do things like analyze who called you on the
> > stack.
> 
> You don't as long as the sandbox is limited to being unable to do
> anything powerful (like access files).  This was the original Java
> security model, until they realized that being unable to do anything
> powerful also meant being unable to do anything useful.  So they gave
> "signed code" the ability to escape from the sandbox, but that opens up
> the possibility of signed code being exploited by unsigned code running
> in the same JVM.  It was all downhill from there.


No way.  Please visit, say, www.erights.org some time.  Or look at how
the Grail web browser is set up for running Python code.  The general
strategy is to make it impossible to even ask for things you aren't
allowed to do.  A mistake Java and a lot of systems make is that you
can ask for anything, and then the system has to decide whether you
are allowed.  This gets complicated because the answer depends on who
is asking....

Here's a key point: don't put iffy things in the global environment.
Instead, start out with an environment that only has innocuous things
in it, and then add objects with more power one by one.  

Consider files, for example.  Make a LimitedFileFactory that is only
allowed to open files from a certain directory.  Then put an
*instance* of the factory in the global environment.  Now you've
converted a sandbox that can do nothing, into a sandbox that can open
files by talking to LimitedFileFactory.  LimitedFileFactory doesn't
need to analyze who called it -- it implements the same limited
behavior no matter who calls it.  (Presumably, an UnlimitedFileFactory
is available for system code, and thus system code would never talk to
a LimitedFileFactory.)
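
A minimal sketch of such a factory (in Python, since that is what Grail
sandboxes; the class name and the path check are mine, not taken from
any particular system):

```python
import os

class LimitedFileFactory:
    """Capability object: can only open files under its root directory.
    It never inspects its caller -- holding a reference IS the authority."""

    def __init__(self, root):
        self.root = os.path.realpath(root)

    def open(self, name, mode="r"):
        path = os.path.realpath(os.path.join(self.root, name))
        # Refuse anything that escapes the sandbox directory, e.g. via ".."
        if not path.startswith(self.root + os.sep):
            raise PermissionError("outside sandbox: " + name)
        return open(path, mode)
```

Untrusted code is handed an instance of this and nothing else; there is
no global `open` in its environment to ask for more, so there is no
permission question to adjudicate.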


This approach is much easier if you have a simple language.  You need
to guarantee, for example, that no one can obtain pointers that they
aren't handed by safe means.  But people outside of Sun have even
tinkered with doing it in Java.  In Java, you have to change the way
some services are requested, however.


-Lex
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5ob1e$8lth7$1@ID-125440.news.dfncis.de>
In article <····································@shimizu-blume.com>, Matthias Blume wrote:
> On Fri, 01 Mar 2002 06:58:35 -0500, Nils Goesche wrote:
> 
>> Last time I checked, SML compilers generated pretty slow code.
> 
> Must have been a long time ago.

Last year.  I used SML/NJ, which you are probably familiar with :-))
Don't get me wrong:  I think SML/NJ is a great system, I am not such
a performance junkie.

>> When I ported programs to CMUCL they ran several times faster.
> 
> See the Great PL Shootout page.  There are three ML compilers (two SML
> and one Ocaml) ahead of CMUCL.  Two of them are *significantly* ahead.

I am aware of that web page.  Frankly I don't understand why CMUCL
looks so bad there.  Whenever /I/ implemented anything in several
languages, admittedly something bigger than the programs you find
there, CMUCL left most other compilers far behind, sometimes even
gcc.  And yes, the OCaml compiler is just great w.r.t. performance,
note that I was talking about SML.  Again: I actually prefer SML to
OCaml for various reasons; but it is not my experience that it is
particularly fast.

>> Same with static typing.  By your logic, OS's should be written rather
>> in *Haskell*.
> 
> Actually, no.  I think that ML's fairly restrictive type system would be
> just the right fit.

I am not surprised that you think so ;-)  But the point remains true:
If you advocate using a language that forbids the programmer to do
all kinds of things in order to make programs safer, why not go all
the way?  Shouldn't we use Haskell, then?  Why should that *one*
point, that we know there won't be any runtime type errors in SML,
make something like an OS so instantaneously safer?  I thought there
were /lots and lots/ of different kinds of runtime errors usually
occurring in programs.  Writing device drivers and hacking around
in kernels is my job.  I do that every day.  The problems I face
there are of a totally different nature.  Buggy hardware for instance.
Timing problems.  Synchronization.  Braindead protocols.  Deadlocks.
Non-aligned buffers.  Memory management.  Wild pointers.  Registers
that don't work as advertised.  I don't think much of these will
go away only because I won't have runtime type errors.  What I could
need down there is an expressive language that makes programming
those things easier, on a more abstract level.  And nothing is more
expressive than Lisp.

> By the way, the type system is not just for making sure your
> implementation is correct. It can also be used to *structure* the OS
> itself -- in such a way that things could (at least in theory) become
> much more efficient.

And all of a sudden we don't disagree anymore:  Sure, a rich type
system is a nice thing.  I am wondering if you are aware of the
fact that Common Lisp in fact /has/ a very rich type system: Your
example could be very easily rewritten in CLOS, for instance.
Static typing fanatics often pretend that there is no typing
besides static typing, but Common Lisp is /not/ an untyped language.
Very strange that this has to be repeated over and over again.
If you call a Lisp function with an argument it doesn't expect and
can't handle it will signal an error which you can catch.  It won't
do any weird things like an untyped language would.  And the
important observation is that experienced programmers hardly if
ever /get/ any of those runtime type errors.  And when they do,
then only because of some trivial oversight.  Typically when
that happens, a type-error is signalled the first time you run the
program and the bug is trivially fixed.  Compare that with your
experience:  When was the last time SML/NJ complained about a
type error you programmed that revealed a significant bug in your
program?  Sure, when I was a newbie in ML I got type errors all
the time and I was very impressed by how the compiler found all
those errors I made.  But when I became more experienced, those
errors occurred more and more rarely.  In the end they found
simple typos, not more.  Or complained about things which I
knew were perfectly correct but were too hard for the type
inference algorithm to figure out.  Then it became very annoying.

We'll probably never agree on this, but I wish people would stop
making claims like ``Lisp is an untyped language''; you didn't
make that claim, but all too many people draw that conclusion.

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.01.15.33.21.468449.25143@shimizu-blume.com>
On Fri, 01 Mar 2002 11:42:22 -0500, Nils Goesche wrote:

> In article <····································@shimizu-blume.com>,
> Matthias Blume wrote:
>> On Fri, 01 Mar 2002 06:58:35 -0500, Nils Goesche wrote:
>> 
>>> Last time I checked, SML compilers generated pretty slow code.
>> 
>> Must have been a long time ago.
> 
> Last year.  I used SML/NJ, which you are probably familiar with :-))
> Don't get me wrong:  I think SML/NJ is a great system, I am not such a
> performance junkie.
> 

Thanks.  I am not such a performance junkie either. :-)
(You brought that topic up.)  But I acknowledge that poor performance can
be a showstopper in some situations.

>>> When I ported programs to CMUCL they ran several times faster.
>>
>> See the Great PL Shootout page.  There are three ML compilers (two SML
>> and one Ocaml) ahead of CMUCL.  Two of them are *significantly* ahead.
> 
> I am aware of that web page.  Frankly I don't understand why CMUCL looks
> so bad there.  Whenever /I/ implemented anything in several languages,
> admittedly something bigger than the programs you find there, CMUCL left
> most other compilers far behind, sometimes even gcc.


Maybe you are just more experienced in cranking out efficient code for
CMUCL than you are for SML.  I am serious about this:  The stuff on "that
web page" is actually putting SML/NJ at a disadvantage because Bagley
insisted on not using our latest versions -- which especially for the
x86 platform produces much better code almost all of the time.  Moreover,
some of the code was written in a slightly strange style.  For example,
using 110.39 sped up the matrix multiplication code by a factor of 3,
and re-coding that code in a (to me) more natural style sped it up by
another factor of 2.

> And yes, the OCaml
> compiler is just great w.r.t. performance, note that I was talking about
> SML.  Again: I actually prefer SML to OCaml for various reasons; but it
> is not my experience that it is particularly fast.

Certainly.  We perfectly agree here.

> 
>>> Same with static typing.  By your logic, OS's should be written rather
>>> in *Haskell*.
>> 
>> Actually, no.  I think that ML's fairly restrictive type system would
>> be just the right fit.
> 
> I am not surprised that you think so ;-)  But the point remains true: If
> you advocate using a language that forbids the programmer to do all
> kinds of things in order to make programs safer, why not go all the way?
>  Shouldn't we use Haskell, then?

We could. I am just a bit paranoid about programming in a language where
switching from writing a + b to b + a (literally!) can change the
complexity class of your program from O(n) to O(n^2) etc.  Don't get me
wrong, Haskell is a wonderful language as far as beauty is concerned.
But I prefer ML because my brain is wired in such a way that I can more
easily predict how well code that I write down will perform in practice.
Other people's brains are wired differently, and they may well go ahead
and code an OS in Haskell.

> Why should that *one* point, that we
> know there won't be any runtime type errors in SML, make something like
> an OS so instantaneously safer?

It is not "instantaneous".  It actually takes quite a bit of experience
to set up the right invariants and then code them into the type
system.

>> By the way, the type system is not just for making sure your
>> implementation is correct. It can also be used to *structure* the OS
>> itself -- in such a way that things could (at least in theory) become
>> much more efficient.
> 
> And all of a sudden we don't disagree anymore:  Sure, a rich type system
> is a nice thing.  I am wondering if you are aware of the fact that
> Common Lisp in fact /has/ a very rich type system: Your example could be
> very easily rewritten in CLOS, for instance.

But the CLOS type system will not be checked at compile time, at least
not the same way that the ML typesystem is being checked.  That's at
least my understanding.  Correct me if I am wrong.  Here is the acid
test:

I want to store in a variable x a 32-bit quantity that has the exact same
representation as an 'unsigned int' in C but for which I have a
compile-time guarantee that it will never be an odd number.  What I want
to be able to do on these numbers are the "usual" four arithmetic
operations (with division rounding down to the next even number), and
perhaps some comparisons.  And a conversion from/to "unsigned int"
(again, with rounding down to the next even).  In SML I write:

structure Even :> sig

   type even

   val + : even * even -> even
   val - : even * even -> even
   val * : even * even -> even
   val / : even * even -> even
   val compare : even * even -> order
   val fromWord: Word32.word -> even
   val toWord: even -> Word32.word

end = struct

   fun roundeven (x: Word32.word) =
       Word32.andb (x, 0wxfffffffe)

   type even = Word32.word
   val op + = Word32.+
   val op - = Word32.-
   val op * = Word32.*
   fun x / y = roundeven (Word32./ (x, y))
   val compare = Word32.compare
   fun toWord x = x
   val fromWord = roundeven

end

Now, whenever I see a value of type "even", I can be absolutely certain
that it will be an even number -- but the point is that all the
operations (except /) are *precisely* as efficient as the ones on Word32.

How do you do this in Common Lisp?

> Static typing fanatics
> often pretend that there is no typing besides static typing, but Common
> Lisp is /not/ an untyped language. Very strange that this has to be
> repeated over and over again. If you call a Lisp function with an
> argument it doesn't expect and can't handle it will signal an error
> which you can catch.

Right, but that happens only after I have already called it -- which is
too late.  This is precisely the thing that I described earlier: OS
syscalls do dynamic runtime checks.  But what I want is to get rid of
these runtime checks -- safely (i.e., without compromising the integrity
of the OS).

> It won't do any weird things like an untyped
> language would.  And the important observation is that experienced
> programmers hardly if ever /get/ any of those runtime type errors.  And
> when they do, then only because of some trivial oversight.

That is irrelevant, because you use types in quite a different way from
the one I am proposing.  Catching the occasional programming error (which
happens more rarely the more experienced the programmer is) is one thing;
catching the malicious attempt at breaking an abstraction (which happens
more frequently the more experienced the malicious hacker is) is another.

I don't want a guarantee that works 99.9% of the time because that's not
good enough for this particular application.  What I want is something
that works always, but I want to eliminate the usual runtime penalty for it.

Matthias
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5rk8j$9oets$1@ID-125440.news.dfncis.de>
In article <····································@shimizu-blume.com>, Matthias Blume wrote:
> On Fri, 01 Mar 2002 11:42:22 -0500, Nils Goesche wrote:
> 
>> And all of a sudden we don't disagree anymore:  Sure, a rich type system
>> is a nice thing.  I am wondering if you are aware of the fact that
>> Common Lisp in fact /has/ a very rich type system: Your example could be
>> very easily rewritten in CLOS, for instance.
> 
> But the CLOS type system will not be checked at compile time, at least
> not the same way that the ML typesystem is being checked.  That's at
> least my understanding.  Correct me if I am wrong.  Here is the acid
> test:

Well, not in the same way, I guess.  I declare whatever I want
to be checked...

> I want to store in a variable x a 32-bit quantity that has the exact same
> representation as an 'unsigned int' in C but for which I have a
> compile-time guarantee that it will never be an odd number.  What I want
> to be able to do on these numbers are the "usual" four arithmetic
> operations (with division rounding down to the next even number), and
> perhaps some comparisons.  And a conversion from/to "unsigned int"
> (again, with rounding down to the next even).  In SML I write:
> 
> structure Even :> sig
> 
>    type even
> 
>    val + : even * even -> even
>    val - : even * even -> even
>    val * : even * even -> even
>    val / : even * even -> even
>    val compare : even * even -> order
>    val fromWord: Word32.word -> even
>    val toWord: even -> Word32.word
> 
> end = struct
> 
>    fun roundeven (x: Word32.word) =
>        Word32.andb (x, 0wxfffffffe)
> 
>    type even = Word32.word
>    val op + = Word32.+
>    val op - = Word32.-
>    val op * = Word32.*
>    fun x / y = roundeven (Word32./ (x, y))
>    val compare = Word32.compare
>    fun toWord x = x
>    val fromWord = roundeven
> 
> end
> 
> Now, whenever I see a value of type "even", I can be absolutely certain
> that it will be an even number -- but the point is that all the
> operations (except /) are *precisely* as efficient as the ones on Word32.
> 
> How do you do this in Common Lisp?

I am not sure what to make of this.  I'd never define a type like
this, but you could do something like

(deftype even (num-type &optional size)
  `(and ,(if size
	     (list num-type size)
	     num-type)
        (satisfies evenp)))

(defun blark (x)
  (declare (type (even unsigned-byte 32) x))
  (+ x 42))

and whether that will be slower or faster than if you left out the
declaration depends only on compilation settings.  Sure, you'll say
that I won't have any guarantee that x will in fact be an even
number.  Well, that's right:  There is no guarantee that you
won't have any runtime type errors in CL.  That's the point:
I think it doesn't matter :-)  I don't have any guarantee that
I won't try to use too large array indices, either, for instance.
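To make the tradeoff concrete: a `satisfies' type like the one above is
enforced dynamically, not proven statically.  Under high safety settings an
implementation such as CMUCL will compile the declaration (or an explicit
`check-type') into a runtime test, so an odd argument signals an error at
the call site rather than being rejected at compile time.  A minimal sketch,
reusing the `even' deftype above (the name `blark-checked' is made up for
illustration, and the exact effect of declarations under safety settings is
implementation-dependent):

```lisp
;; With safety turned up, the type is checked at run time.
(declaim (optimize (safety 3)))

(defun blark-checked (x)
  ;; CHECK-TYPE signals a correctable TYPE-ERROR on mismatch.
  (check-type x (even unsigned-byte 32))
  (+ x 42))

;; (blark-checked 10) => 52
;; (blark-checked 11) signals a TYPE-ERROR at run time
```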

> I don't want a guarantee that works 99.9% of the time because that's not
> good enough for this particular application.  What I want is something
> that works always, but I want to eliminate the usual runtime penalty for it.

I know what you mean.  There are lots of things I'd like to have, too.
But there is no such thing as a free lunch.  Your perfect type
inference algorithm comes with a price: it will always be too
restrictive; there is nothing one can do about that.  You think
that price is acceptable, I think it isn't.

This reminds me of what some people perceive as conflicting goals
of physicists and mathematicians:  The physicists want to get the
job done.  If they can't prove some formula they need rigorously,
they'll use it anyway.  They have no time to do otherwise.
Some of them don't like mathematicians because they think that
mathematicians are always trying to force them to prove everything
rigorously or whatever, but strangely I have never met a
mathematician who actually did that.  In the same way, I want
to get the program done.  Fast.  I can't spend days and days
to find workarounds for some obscure restrictions on types
of elements of modules or something that are only there to ensure
the soundness of your carefully designed type system when I know
that what I am trying to do will work anyway in that case.
And I want to be able to quickly redesign and change arbitrarily large
parts of my software, and your type system will certainly get
in the way again.

You'll disagree, of course.  Let's leave it at that.  There is
no way you or I can /prove/ that our respective views on
practical programming are correct.  Only practice can show -- people
have to try it out for themselves :-)

Regards,
-- 
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID 0xC66D6E6F
From: David Rush
Subject: Re: self-hosting gc
Date: 
Message-ID: <okfwuws8y6q.fsf@bellsouth.net>
Nils Goesche <···@cartan.de> writes:
> I want
> to get the program done.  Fast.  I can't spend days and days
> to find workarounds for some obscure restrictions on types
> of elements of modules or something that are only there to ensure
> the soundness of your carefully designed type system when I know
> that what I am trying to do will work anyway in that case.
> And I want to be able to quickly redesign and change arbitrarily large
> parts of my software, and your type system will certainly get
> in the way again.

Actually *I* agree, but having said that, it largely depends on the
application domain. For "five-9s" systems (OSes, telephone switches,
life-support systems...) I want as much surety of correctness as I can
get, so it's worth the time it takes to work within the constraints
of a static type discipline. For human-oriented systems (where failure modes
can be handled by humans), especially when they must be developed
quickly, I'll take a dynamically-typed system any day of the week. In
these cases the delayed detection of failures is a benefit because it
allows me to cope with the fluidity of the marketplace.

Matthias thinks I'm a heretic for this, I'm sure, but that's the way
*my* brain is wired.

david rush
-- 
Coffee should be black as hell, strong as death and sweet as love.
	-- Turkish proverb
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey34rjwjzku.fsf@cley.com>
* Matthias Blume wrote:
> Right, but that happens only after I have already called it -- which is
> too late.  This is precisely the thing that I described earlier: OS
> syscalls do dynamic runtime checks.  But what I want is to get rid of
> these runtime checks -- safely (i.e., without compromising the integrity
> of the OS).

One thing that I think you should bear in mind is what these checks
cost, or rather don't.  Imagine the case where the OS has been given a
file handle (or something that claims to be one) and a buffer
(likewise).  In a dynamically checked system you have to do a check to
make sure the filehandle is a filehandle of a suitable kind, and a
check to make sure the buffer points at writable space belonging to
whoever made the call.  And then you have to fetch data from the disk
and put it into the buffer.  Even if the data is already cached you're
doing a memory-to-memory copy of probably hundreds of bytes.  It's in
the nature of modern processors that this will *hugely* outweigh the
cost of the checks unless you have done something terribly wrong.  On a
PDP-11 with fast memory and a slow processor this might have mattered;
on a modern machine, where you spend most of your time waiting for
memory, it's an irrelevance.

So the efficiency argument is really a red-herring.  A couple of more
interesting arguments are that the OS design might be simplified and
that errors might happen at a more convenient time.  The former I
don't know about (to make such a system work would require such
radical changes that it's hard to reason about simplicity), the latter
is also fairly bogus because you still need to handle things like the
disk filling up, ECC errors from main memory and so on -- no amount of
theorem proving will make these go away, or mean that well-written
applications (databases, say) need not try to cope with them.

--tim
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.04.10.28.24.295409.1109@shimizu-blume.com>
On Mon, 04 Mar 2002 07:18:41 -0500, Tim Bradshaw wrote:

> * Matthias Blume wrote:
>> Right, but that happens only after I have already called it -- which is
>> too late.  This is precisely the thing that I described earlier: OS
>> syscalls do dynamic runtime checks.  But what I want is to get rid of
>> these runtime checks -- safely (i.e., without compromising the
>> integrity of the OS).
> 
> One thing that I think you should bear in mind is what these checks
> cost, or rather don't.  Imagine the case where the OS has been given a
> file handle (or something that claims to be one) and a buffer
> (likewise).  In a dynamically checked system you have to do a check to
> make sure the filehandle is a filehandle of a suitable kind, and a check
> to make sure the buffer points at writable space belonging to whoever
> made the call.  And then you have to fetch data from the disk and put it
> into the buffer.  Even if the data is already cached you're doing a
> memory-to-memory copy of probably hundreds of bytes.  It's in the nature
> of modern processors that this will *hugely* outweigh the cost of the
> checks unless you have done something terribly wrong. On a PDP11 with
> fast memory and a slow processor this might have mattered, on a modern
> machine where you spend most of your time waiting for memory it's an
> irrelevance.

Well, "read" was just one example -- and maybe not the best one, at least
not if you want to retain a Unix-style interface.  This is part of the
problem, really.  For example, in a strongly typed world I could imagine
the kernel handing back directly a pointer to its internal buffer data
structures (knowing that the type system will prevent user code from
messing with its invariants).  This way you get zero-copy I/O -- which
for a while has been pretty much the holy grail of OS implementation --
essentially for free.

Or, another example, consider "fstat":  In a world where kernel and user
space can safely share memory (because the type system will prevent "bad"
things from happening), all you need to do is give user
code a pointer to the kernel's data structures.  No context switch
necessary, no buffer copying, no checks whether the buffer exists and
is writable, etc.

Matthias
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3r8n0i66g.fsf@cley.com>
* Matthias Blume wrote:
> For example, in a strongly typed world I could imagine
> the kernel handing back directly a pointer to its internal buffer data
> structures (knowing that the type system will prevent user code from
> messing with its invariants).  This way you get zero-copy I/O -- which for a
> while has been pretty much the holy grail of OS implementation --
> essentially for free.

But this is already done.  Instead of the kernel handing a pointer
back, the user process hands a pointer in and the kernel reads directly
into that.

I really think that going after efficiency is not the thing to do:
conventional OSs have really good implementations by now.  You need to
aim for something which is not solved in conventional systems, not
even those that check dynamically (so buffer overflow isn't it).  I
have no idea what the win might be (I think there isn't one, but then
I'm not a fan of static type systems, so I would think that).

--tim
From: David Rush
Subject: Re: self-hosting gc
Date: 
Message-ID: <okf664g9oge.fsf@bellsouth.net>
Nils Goesche <······@cartan.de> writes:
> In article <····································@shimizu-blume.com>, Matthias Blume wrote:
> > On Fri, 01 Mar 2002 06:58:35 -0500, Nils Goesche wrote:
> >> Last time I checked, SML compilers generated pretty slow code.
> > Must have been a long time ago.
> Last year.  I used SML/NJ, which you are probably familiar with :-))

I've been hearing good things about MLton. And OCaml is widely known
for its excellent code. My experience of SML/NJ was that the 0.93
codebase produced acceptably fast code. Then it got slower, and I
switched to Scheme (those facts are *not* related, actually).

> >> Same with static typing.  By your logic, OS's should be written rather
> >> in *Haskell*.

Actually OSes are so inherently stateful that I can't see how using a
pure functional language would help. Perhaps my imagination is simply
inadequate.

> Writing device drivers and hacking around
> in kernels is my job.  I do that every day.  The problems I face
> there are of a totally different nature.  Buggy hardware for instance.
> Timing problems.  Synchronization.  Braindead protocols.  Deadlocks.
> Non-aligned buffers.  Memory management.  Wild pointers.  Registers
> that don't work as advertised.  I don't think much of these will
> go away only because I won't have runtime type errors.  What I could
> need down there is an expressive language that makes programming
> those things easier, on a more abstract level.  And nothing is more
> expressive than Lisp.

Actually, I found SML to be incredibly expressive, but admittedly my
SML programs generally turned into huge masses of functors...

david rush
-- 
Don't you think it would be a useful item to add to your intellectual
toolkit to be capable of saying, when a ton of wet steaming bullshit
lands on your head, "My goodness, this appears to be bullshit?"
	-- Douglas MacArthur Shaftoe, in _Cryptonomicon_ 
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5ofbt$92vvn$1@ID-125440.news.dfncis.de>
In article <···············@bellsouth.net>, David Rush wrote:
> Nils Goesche <······@cartan.de> writes:

>> What I could need down there is an expressive language that
>> makes programming those things easier, on a more abstract level.
>> And nothing is more expressive than Lisp.

> Actually, I found SML to be incredibly expressive, but admittedly my
> SML programs generally turned into huge masses of functors...

Oh, I agree that SML is very expressive.  SML would certainly be
the language of my choice... if Common Lisp didn't exist :-)

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Kenny Tilton
Subject: Re: self-hosting gc
Date: 
Message-ID: <3C83D8E5.7F0F2373@nyc.rr.com>
David Rush wrote:
> Actually OSes are so inherently stateful that I can't see how using a
> pure functional language would help. Perhaps my imagination is simply
> inadequate.

Simple constraint systems (so simple they are essentially spreadsheets
for program state) are at once functional and all about state. My Cell
system /feels/ functional, but its purpose is to keep program state
internally consistent. This has puzzled me for a while. :)

-- 

 kenny tilton
 clinisys, inc
 ---------------------------------------------------------------
 "Be the ball...be the ball...you're not being the ball, Danny."
                                               - Ty, Caddy Shack
From: Nicolas Neuss
Subject: Re: self-hosting gc
Date: 
Message-ID: <87k7ssiit2.fsf@ortler.iwr.uni-heidelberg.de>
Nils Goesche <······@cartan.de> writes:

> > See the Great PL Shootout page.  There are three ML compilers (two SML
> > and one Ocaml) ahead of CMUCL.  Two of them are *significantly* ahead.
> 
> I am aware of that web page.  Frankly I don't understand why CMUCL
> looks so bad there.  Whenever /I/ implemented anything in several
> languages, admittedly something bigger than the programs you find
> there, CMUCL left most other compilers far behind, sometimes even
> gcc.  And yes, the OCaml compiler is just great w.r.t. performance,
> note that I was talking about SML.  Again: I actually prefer SML to
> OCaml for various reasons; but it is not my experience that it is
> particularly fast.

I looked again at that page and (randomly) chose the sieve of
Eratosthenes.  Here, the CMUCL code uses a fixnum array instead of a
char array as in the gcc code.  On my machine, I get

Evaluation took:
  1.07 seconds of real time
  0.92 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  32784 bytes consed.

Choosing a boolean array with CMUCL, I get

Evaluation took:
  0.3 seconds of real time
  0.3 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  32784 bytes consed.
NIL

which would make it the third-fastest (after gcc with 0.23 and ghc
with 0.27).

Considering that the stated methodology:

http://www.bagley.org/~doug/shootout/bench/sieve/

says that one should use small arrays (so that they fit in the cache),
I find the comparison quite unfair.  I'll mail an improved version to
Doug Bagley.

Yours, Nicolas.
From: Nicolas Neuss
Subject: Re: self-hosting gc
Date: 
Message-ID: <87g03gidwp.fsf@ortler.iwr.uni-heidelberg.de>
Nicolas Neuss <·············@iwr.uni-heidelberg.de> writes:

> Nils Goesche <······@cartan.de> writes:
> 
> > > See the Great PL Shootout page.  There are three ML compilers (two SML
> > > and one Ocaml) ahead of CMUCL.  Two of them are *significantly* ahead.
> > 
> > I am aware of that web page.  Frankly I don't understand why CMUCL
> > looks so bad there.  Whenever /I/ implemented anything in several
> > languages, admittedly something bigger than the programs you find
> > there, CMUCL left most other compilers far behind, sometimes even
> > gcc.  And yes, the OCaml compiler is just great w.r.t. performance,
> > note that I was talking about SML.  Again: I actually prefer SML to
> > OCaml for various reasons; but it is not my experience that it is
> > particularly fast.
> 
> I looked again at that page, and (randomly) chose the sieve of
> Eratosthenes.  Here, the CMUCL code uses a fixnum array instead of a
> char array as the gcc code.  On my machine, I get
> 
> Evaluation took:
>   1.07 seconds of real time
>   0.92 seconds of user run time
>   0.0 seconds of system run time
>   0 page faults and
>   32784 bytes consed.
> 
> Choosing a boolean array with CMUCL, I get
> 
> Evaluation took:
>   0.3 seconds of real time
>   0.3 seconds of user run time
>   0.0 seconds of system run time
>   0 page faults and
>   32784 bytes consed.
> NIL
> 
> which would make it the third-fastest (after gcc with 0.23 and ghc
> with 0.27).
> 
> Considering that the stated methodology:
> 
> http://www.bagley.org/~doug/shootout/bench/sieve/
> 
> says that one should use small arrays (so that they fit in the cache),
> I find the comparison quite unfair.  I'll mail an improved version to
> Doug Bagley.
> 
> Yours, Nicolas.

I'm sorry, but the above is nonsense.  CMUCL allocates full
words for booleans, too (as can be seen from the bytes consed).
[Additionally, the original code contained an omission (it does not
reinitialize the array for each test run), which I have augmented with
another error...]

The correct result for CMUCL on my machine is 
Evaluation took:
  0.58 seconds of real time
  0.58 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  32784 bytes consed.

Compared with gcc (same version as used for the test)

real	0m0.268s
user	0m0.260s
sys	0m0.000s

so CMUCL is a factor of 2 slower, which is quite typical.  I
don't understand why Doug Bagley reports 0.94 secs for CMUCL; I'll
ask him about that.

Nicolas.


P.S. 1: Here is the code I used:

(defun main ()
  (let ((flags (make-array 8193 :element-type 'fixnum :initial-element 1)))
    (loop repeat 900 for count of-type fixnum = 0 then 0 do
          (loop for i fixnum from 2 upto 8192 do
                (unless (zerop (aref flags i))
                  (loop for k fixnum from (* 2 i) upto 8192 by i do
                        (setf (aref flags k) 0))
                  (incf count)))
          (dotimes (i 8192) (setf (aref flags i) 1))
          finally (format t "Count: ~D~%" count))))

(time (main))

P.S. 2: I also tested strings and bit arrays, but the results were worse.
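For reference, the storage variants being compared differ only in the
element type handed to the array constructor; roughly (variable names made
up for illustration, sizes as in the code above):

```lisp
;; Storage variants for the sieve's flag array.  Only the element type
;; changes; access cost differs by tagging/shifting overhead.
(defvar *flags-fixnum* (make-array 8193 :element-type 'fixnum :initial-element 1))
(defvar *flags-bit*    (make-array 8193 :element-type 'bit    :initial-element 1))
(defvar *flags-string* (make-string 8193 :initial-element #\1))
```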
From: Bulent Murtezaoglu
Subject: Re: self-hosting gc
Date: 
Message-ID: <87r8mxwogq.fsf@nkapi.internal>
>>>>> "NN" == Nicolas Neuss <·············@iwr.uni-heidelberg.de> writes:
[...]
    NN> I'm sorry for it, but the above is nonsense.  CMUCL allocates
    NN> full words also for booleans (as can be seen from the consed
    NN> bytes).  [Additionally, the original code contained an
    NN> omission (it does not reinitialize the array for each test
    NN> run), which I have augmented with another error...]

I went back and forth with Doug on this.  There are several issues you 
need to think about:

-- If you use bit vectors, you pay for some shifting of bits etc.
   CMUCL actually generates somewhat suboptimal code for the x86 platform
   (an extra mask, and returning a value that's not used).

-- If you use eight-bit bytes and don't coerce the compiler to use machine
   integers to address the array, you again pay for shifting for fixnum
   untagging.

-- If you use fixnums or machine integers (32 bit), fixnums can address
   them w/o untagging and you get fast results, BUT this is cheating (it
   won't scale, and it will spill over to the L2 cache even with a small
   array).  Doug's machine has half-speed L2 cache (P-II), so your results
   will vary if you have a full-speed L2 (e.g. a Celeron will beat a
   regular P-II).

This is about all I remember.

There was also the additional issue of the loop macro and declarations,
if I remember correctly.

Doug probably has a changelog of all this somewhere.  But disassemble and the 
compiler trace facility of CMUCL should be helpful also.

cheers,

BM
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3223993027817075@naggum.net>
* Matthias Blume
| The point is that both file descriptors and buffers are *unforgeable*
| abstract types.   So whenever a user program invokes mlos_read, it can
| only do so if it has obtained a valid file descriptor and a valid buffer
| beforehand.  Thus, mlos_read does not need to do *any* checking of its
| arguments at runtime because there is a compile-time proof that
| everything will be ok.  And the important contribution of the programming
| language is that it lets you define such abstractions (they do not have
| to be built-in).

  What happens when you close a file?  Suppose I get an opened-for-reading
  file descriptor back from open, store it somewhere, and its type is known
  to be such that I can mlos_read from it.  Do we have a different way to
  keep track of this open file than storing (a reference to) the file
  descriptor in a variable?  If not, how does close communicate the type
  back to the compiler so that there is only compile-time type checking
  that prevents you from calling mlos_read on a closed file descriptor?

  It is probably something very trivial in SML, but I keep getting confused
  by such things as the run-time behavior of streams, and wonder how a file
  descriptor that has hit end of file is prevented at compile-time from
  being used to read more data, and other such simple things.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.01.15.38.30.868244.25143@shimizu-blume.com>
On Fri, 01 Mar 2002 12:37:03 -0500, Erik Naggum wrote:

> * Matthias Blume
> | The point is that both file descriptors and buffers are *unforgeable*
> | abstract types.   So whenever a user program invokes mlos_read, it can
> | only do so if it has obtained a valid file descriptor and a valid buffer
> | beforehand.  Thus, mlos_read does not need to do *any* checking of its
> | arguments at runtime because there is a compile-time proof that
> | everything will be ok.  And the important contribution of the programming
> | language is that it lets you define such abstractions (they do not have
> | to be built-in).
> 
>   What happens when you close a file?

You wouldn't.

>   Suppose I get an
>   opened-for-reading file descriptor back from open, store it somewhere,
>   and its type is known to be such that I can mlos_read from it.  Do we
>   have a different way to keep track of this open file than storing (a
>   reference to) the file descriptor in a variable?  If not, how does
>   close communicate the type back to the compiler so that there is only
>   compile-time type checking that prevents you from calling mlos_read on
>   a closed file descriptor?

Well, if you insist on having "close" around, there are linear type
systems or unique types that have been built into some languages (but not
ML).  Alternatively, one could probably use monads in some clever way.

>   It is probably something very trivial in SML, but I keep getting
>   confused by such things as the run-time behavior of streams, and
>   wonder how a file descriptor that has hit end of file is prevented at
>   compile-time from being used to read more data, and other such simple
>   things.

Notice that I did not include "hitting the end of the file" in my
earlier description of which runtime checks I want to avoid.  Of course,
nobody should claim that *every* runtime check can be eliminated by
static analysis; doing so would be silly.  But the checks that can
be eliminated should be, IMO.

Matthias
From: Christian Lynbech
Subject: Re: self-hosting gc
Date: 
Message-ID: <87elj4f846.fsf@baguette.webspeed.dk>
>>>>> "Matthias" == Matthias Blume <········@shimizu-blume.com> writes:

Matthias> When unix_read is being called, the OS must check several things:

Matthias>    1. First argument:

Matthias>       - must be a number obtained from a previous call to open/dup/...
Matthias>       - must not have been subjected to "close" in the meantime
Matthias>       - must have been opened for reading

It may be that there are things I do not know about the ML system, or
how an ML OS would handle file objects, but as I know file objects from
UNIX (be it represented by a small integer or something else), such an
object has state; so how would it be verifiable at compile time that a
particular call to `read' would not be operating on an object that has
been closed?

Is the ML compiler smart enough to know that in regions following the
close operation, the object has another type and would that work in a
multiprocess setting?

Or did I just misinterpret the extent of your claims?  (In which case I
must state for the record that it was with no malicious intent :-)

------------------------+-----------------------------------------------------
Christian Lynbech       | 
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.01.15.45.24.920817.25143@shimizu-blume.com>
On Fri, 01 Mar 2002 13:36:25 -0500, Christian Lynbech wrote:

>>>>>> "Matthias" == Matthias Blume <········@shimizu-blume.com> writes:
> 
> Matthias> When unix_read is being called, the OS must check several things:
> 
> Matthias>    1. First argument:
> 
> Matthias>       - must be a number obtained from a previous call to open/dup/...
> Matthias>       - must not have been subjected to "close" in the meantime
> Matthias>       - must have been opened for reading
> 
> It may be that there are things I do not know about the ML system, or
> how an ML OS would work file objects, but as I know file objects from
> UNIX (be it represented by a small integer or something else), such an
> object has state; so how would it be verifiable at compile time that a
> particular call to `read' would not be operating on an object that has
> been closed?

Well, I did explicitly say that I want to structure things differently,
so I was not really thinking of a Unix-style treatment of files.

As I have already said in a reply to another such question, I would
probably not have a "close" operation at all -- for pretty much the same
reason that we don't need "free" in Lisp or ML.

And again, not all runtime checks can be eliminated this way -- in
particular not those that are testing for inherently dynamic properties.

> Is the ML compiler smart enough to know that in regions following the
> close operation, the object has another type and would that work in a
> multiprocess setting?

I don't know what this has to do with multiprocess settings.  (Notice
that even in Unix you can close a file in one process and another process
can still read/write its clone of that file descriptor.)

And there are static typing mechanisms that would allow one to express the
effect that "close" has (linear types, unique types, monads).

Matthias
From: Frank A. Adrian
Subject: Re: self-hosting gc
Date: 
Message-ID: <BcZf8.370$i64.134381@news.uswest.net>
Matthias Blume wrote:
> And again, not all runtime checks can be eliminated this way -- in
> particular not those that are testing for inherently dynamic properties.

In other words, the "provable correctness" claim that was stated by another
in an earlier post is false.  It is only static type claims that are
enforced -- a non-trivial, but fairly gross, level of checking, and certainly
one that can be circumvented in any "useful" system (read: any system that
needs to input untagged binary object representations).

faa
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.02.00.30.16.500864.1901@shimizu-blume.com>
On Sat, 02 Mar 2002 00:14:08 -0500, Frank A. Adrian wrote:

> Matthias Blume wrote:
>> And again, not all runtime checks can be eliminated this way -- in
>> particular not those that are testing for inherently dynamic
>> properties.
> 
> In other words, the "provable correctness" claim that was stated by
> another in an earlier post is false.

Better: The claim is not related to the above problem.  (Dynamic tests
do not preclude provable correctness -- in fact, correctness will
often require certain dynamic tests to be present (and correct :-).)

>  It is only static type claims that
> are enforced, a non-trivial, but fairly gross level of checking and

Well, how far one can push these things remains to be seen.  It is
certainly not the technology of today.

> certainly one that can be circumvented in any "useful" system (read any
> system that needs to input untagged binary object representations).

This I would dispute (your definition of "useful", that is).  In our
more and more networked world, it is less and less likely that untagged
binary objects can be tolerated as input (and then used, e.g., as
executable code). This is completely independent of whether or
not static typing is involved.  For safety and security, it is absolutely
necessary to perform checks on dynamic data.  Static typing is simply a
technique to reduce the number of them by providing compile-time proofs
that some of them will always come out true, no matter what.
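A small sketch of that distinction (Python used here purely for concreteness; all names are invented): a shape check on untrusted external input is inherently dynamic and must stay, while a check on internal data is exactly the kind a static type system could prove away.

```python
import struct

def read_point(buf: bytes):
    """Input from the outside world: this check is inherently dynamic
    and cannot be compiled away."""
    if len(buf) != 8:                      # runtime check on untrusted data
        raise ValueError("expected 8 bytes")
    x, y = struct.unpack("<ii", buf)
    return (x, y)

def manhattan(p):
    """Internal data: a static type system could prove p is a pair of
    ints by construction, so no runtime shape check is needed here."""
    x, y = p
    return abs(x) + abs(y)

print(manhattan(read_point(struct.pack("<ii", 3, -4))))  # prints 7
```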

Matthias
From: Frank A. Adrian
Subject: Re: self-hosting gc
Date: 
Message-ID: <bn9g8.41$KD5.62253@news.uswest.net>
Matthias Blume wrote:

> This I would dispute (your definition of "useful", that is).  In our
> more and more networked world, it is less and less likely that untagged
> binary objects can be tolerated as input (and then used, e.g., as
> executable code). This is completely independent of whether or
> not static typing is involved.  For safety and security, it is absolutely
> necessary to perform checks on dynamic data.  Static typing is simply a
> technique to reduce the number of them by providing compile-time proofs
> that some of them will always come out true, no matter what.

You may dispute it.  The fact that there are legacy machines out there that 
provide these capabilities and the fact that it is cheaper to use this 
representation (in the sense of sending fewer bits) will ensure that it has 
some value in certain configurations and will need to be handled in a 
dynamic way even in static-typed systems.

My concern is that many people blur the line between levels of safety (note 
the earlier poster's confusion between "statically type-checked" and 
"provably correct") and that many static typing aficionados like to take 
advantage of this blurring to make assertions that languages that do not 
provide these checks are somehow deficient rather than having simply 
selected a different point on the single axis labeled "type-safe", perhaps 
to gain ground on other valuation axes.

faa
From: David Rush
Subject: Re: self-hosting gc
Date: 
Message-ID: <okf1yf0adeu.fsf@bellsouth.net>
"Frank A. Adrian" <·······@ancar.org> writes:
> Matthias Blume wrote:
> > And again, not all runtime checks can be eliminated this way -- in
> > particular not those that are testing for inherently dynamic properties.
> 
> In other words, the "provable correctness" claim that was stated by another 
> in an earlier post is false.  It is only static type claims that are 
> enforced, a non-trivial, but fairly gross level of checking and certainly 
> one that can be circumvented in any "useful" system (read any system that 
> needs to input untagged binary object representations).

So what? It also allows one to clearly delineate the boundaries of
such type-unsafe behavior. This is one of the reasons why I am not a
fan of side-effect-free functional languages.  In Scheme, SML (and I
assume Common Lisp) I can isolate side-effecting code so that the rest
of the program is still amenable to algebraic reasoning. SML gives a
type system strong enough to isolate poorly-typed code so that it
doesn't pollute the clean bits.

Strong type systems are a bridge from the messy real world to a much
cleaner mathematical one.

david rush
-- 
There's man all over for you, blaming on his boots the faults of his feet.
	-- Samuel Becket (Waiting For Godot)
From: Stefan Monnier
Subject: Re: self-hosting gc
Date: 
Message-ID: <5lpu2jhr45.fsf@rum.cs.yale.edu>
>>>>> "Frank" == Frank A Adrian <·······@ancar.org> writes:
> In other words, the "provable correctness" claim that was stated by another
> in an earlier post is false.  It is only static type claims that are
> enforced, a non-trivial, but fairly gross level of checking and certainly

They can also statically enforce the presence of the relevant dynamic checks.
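One way to read this is the "smart constructor" idiom: make the checked value a distinct type whose only producer performs the dynamic check. A loose Python sketch (in ML or Haskell the constructor would be hidden behind a module boundary; Python can only approximate that by convention, and all names here are invented):

```python
class NonEmpty:
    """Evidence that a sequence has been checked.  If the only way to
    build one is parse(), any code receiving a NonEmpty can rely on
    the dynamic check having already happened."""
    def __init__(self, items):
        self._items = list(items)

def parse(items):
    if not items:                  # the dynamic check, made unavoidable
        raise ValueError("empty sequence")
    return NonEmpty(items)

def head(ne):
    return ne._items[0]            # no emptiness re-check needed here

print(head(parse([3, 1, 2])))  # prints 3
```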


	Stefan
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5p367$97tvj$1@ID-125440.news.dfncis.de>
In article <····································@shimizu-blume.com>, Matthias Blume wrote:
> As I have already said in a reply to another such question, I would
> probably not have a "close" operation at all -- for pretty much the same
> reason that we don't need "free" in Lisp or ML.

The guy hanging on the other side of the socket connection represented
by your file descriptor will most certainly not appreciate this
attitude of yours :-)

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Jeffrey Siegal
Subject: Re: self-hosting gc
Date: 
Message-ID: <3C806ACF.2070501@quiotix.com>
Nils Goesche wrote:
> In article <····································@shimizu-blume.com>, Matthias Blume wrote:
> 
>>As I have already said in a reply to another such question, I would
>>probably not have a "close" operation at all -- for pretty much the same
>>reason that we don't need "free" in Lisp or ML.
>>
> 
> The guy hanging on the other side of the socket connection represented
> by your file descriptor will most certainly not appreciate this
> attitude of yours :-)

Not if you send him a message telling him you're done communicating.
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.01.20.44.56.257124.4284@shimizu-blume.com>
On Fri, 01 Mar 2002 18:34:31 -0500, Nils Goesche wrote:

> In article <····································@shimizu-blume.com>,
> Matthias Blume wrote:
>> As I have already said in a reply to another such question, I would
>> probably not have a "close" operation at all -- for pretty much the
>> same reason that we don't need "free" in Lisp or ML.
> 
> The guy hanging on the other side of the socket connection represented
> by your file descriptor will most certainly not appreciate this attitude
> of yours :-)

Well, there are two things that "close" does to a socket: it signals the
other guy that communication is over, and it deallocates the data
structures associated with this communication (if I am the last one
holding on to it).  What I was proposing was to not do the deallocation.
The runtime check for "is communication over?", of course, cannot be
eliminated because it is inherently dynamic.  (Actually, with unique
or linear types, or with monads, there might be a way around even this.)

Anyway, for the last time, I am not saying that every dynamic check can
be eliminated.  But many can, and those I would like to...

Matthias
From: Stefan Monnier
Subject: Re: self-hosting gc
Date: 
Message-ID: <5lu1rvhr8c.fsf@rum.cs.yale.edu>
>>>>> "Matthias" == Matthias Blume <········@shimizu-blume.com> writes:
> Well, there are two things that "close" does to a socket: it signals the
> other guy that communication is over, and it deallocates the data
> structures associated with this communication (if I am the last one
> holding on to it).  What I was proposing was to not do the deallocation.
> The runtime check for "is communication over?", of course, cannot be
> eliminated because it is inherently dynamic.  (Actually, with unique
> or linear types, or with monads, there might be a way around even this.)

Actually no type system can deal with it because the communication might
be closed because someone tripped over the ethernet cable.
I.e. it really is inherently dynamic.

As for those type systems you mention, the most relevant research
in the area is obviously the "Vault" system which is specifically designed
to make sure that operations are executed in the proper order (open, then
read, then close) in device drivers.


	Stefan
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.05.13.30.51.958972.12355@shimizu-blume.com>
On Tue, 05 Mar 2002 12:14:11 -0500, Stefan Monnier <···@acm.com> wrote:

>>>>>> "Matthias" == Matthias Blume <········@shimizu-blume.com> writes:
>> Well, there are two things that "close" does to a socket: it signals
>> the other guy that communication is over, and it deallocates the data
>> structures associated with this communication (if I am the last one
>> holding on to it).  What I was proposing was to not do the
>> deallocation. The runtime check for "is communication over?", of
>> course, cannot be eliminated because it is inherently dynamic.
>> (Actually, with unique or linear types, or with monads, there might be
>> a way around even this.)
> 
> Actually no type system can deal with it because the communication might
> be closed because someone tripped over the ethernet cable. I.e. it
> really is inherently dynamic.

Yes, I know.  I was being too brief to be correct.  What I was thinking
of was not to completely eliminate the "end-of-communication?" test --
which for obvious reasons is not possible.
What I did think of was a type system that won't let me do this test again
after it said "yes" once.  (Of course, this is not very interesting other
than for checking the sanity of my program's internal logic.)

Matthias
From: Joe English
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5r66n1slr@enews3.newsguy.com>
Christian Lynbech wrote:
>
>Matthias> When unix_read is being called, the OS must check several things:
>Matthias>    1. First argument:
>Matthias>       - must be a number obtained from a previous call to open/dup/..
>Matthias>       - must not have been subjected to "close" in the meantime
>Matthias>       - must have been opened for reading
>
>It may be that there are things I do not know about the ML system, or
>how an ML OS would work file objects, but as I know file objects from
>UNIX (be it represented by a small integer or something else), such an
>object has state; so how would it be verifiable at compile time that a
>particular call to `read' would not be operating on an object that has
>been closed?

Here's one way:

Don't provide 'open' and 'close' operations.

Instead, provide (in Haskell syntax, my ML is rusty):

	type ReadableFile = ... opaque
	type WritableFile = ... opaque

	withInputFile  :: (ReadableFile -> IO ()) -> FilePath -> IO ()
	withOutputFile :: (WritableFile -> IO ()) -> FilePath -> IO ()

	readChar :: ReadableFile -> IO (Maybe Char)
	... etc.

'withInputFile proc filename' opens the file for reading,
calls 'proc <filehandle>', closes the file, and returns ().
(I'm ignoring error conditions for the moment).

There's no way to get a ReadableFile except by passing
a callback procedure to withInputFile, and since this returns ()
the file handle can't "escape" after it's been closed.

(Well, you could use call/cc, mutable variables, or some
other trick to sneak an open file handle out of its dynamic
extent, but I think the above should be safe in a pure HM 
type system).
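The same bracketing shape can be sketched outside Haskell (a rough Python analogue; Python's dynamic typing cannot actually stop the handle escaping, so this illustrates the pattern, not the static guarantee, and the file name is invented):

```python
import os, tempfile

def with_input_file(proc, filename):
    """Open filename for reading, hand the file object to proc, then
    close it unconditionally -- callers never see 'open' or 'close'."""
    f = open(filename)
    try:
        proc(f)          # the handle is only meant to live inside this call
    finally:
        f.close()        # closed even if proc raises

# Usage: read a file's contents through a callback.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as out:
    out.write("hi")
seen = []
with_input_file(lambda f: seen.append(f.read()), path)
print(seen)  # ['hi']
```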


--Joe English
From: Michael Sperber [Mr. Preprocessor]
Subject: Re: self-hosting gc
Date: 
Message-ID: <y9lk7sw1nr5.fsf@informatik.uni-tuebingen.de>
>>>>> "David" == David Rush <····@bellsouth.net> writes:

>> If so, what is the advantages of SML in that problem domain?

David> Static typing. Provable correctness. I *hate* slow buggy OSes.

PreScheme gives you exactly that, which is what the GC of Scheme 48 is
written in.  Not that it buys you much there except for performance,
as the (or this particular) GC primarily deals with words and bits.

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla
From: Daniel C. Wang
Subject: Re: self-hosting gc
Date: 
Message-ID: <uk7snui32.fsf@agere.com>
·······@informatik.uni-tuebingen.de (Michael Sperber [Mr. Preprocessor]) writes:

> >>>>> "David" == David Rush <····@bellsouth.net> writes:
> 
> >> If so, what is the advantages of SML in that problem domain?
> 
> David> Static typing. Provable correctness. I *hate* slow buggy OSes.
> 
> PreScheme gives you exactly that, which is what the GC of Scheme 48 is
> written in.  Not that it buys you much there except for performance,
> as the (or this particular) GC primarily deals with words and bits.
> 

N.B. PreScheme is really just ML with Scheme syntax.. :)
From: Jochen Schmidt
Subject: Re: self-hosting gc
Date: 
Message-ID: <a5pigu$l$1@rznews2.rrze.uni-erlangen.de>
David Rush wrote:

> Followup-to ignored because I read c.l.s although I'm not sure that I
> really want to see the flamewar that is brewing...
> 
> Christian Lynbech <·······@get2net.dk> writes:
>> >>>>> "David" == David Rush <····@bellsouth.net> writes:
>> 
>> David> That said, I'm waiting for someone to well and truly resurrect
>> certain David> aspects of the LM ideal - specifically a GC-friendly OS -
>> although I David> suspect that it may be better written in SML.
>> 
>> Are you implying that SML is better suited for writing an OS than
>> Lisp?
> 
> Yes.
> 
>> If so, what is the advantages of SML in that problem domain?
> 
> Static typing. Provable correctness. I *hate* slow buggy OSes.

IMHO a new OS would be much more interesting if written in a highly dynamic 
language like Lisp than a static language like SML.
Operating Systems are very dynamic environments and you really want it to 
adapt to the actual need. How much runtime introspection do you have in SML?
How easy is it to introspect and patch running programs? How easy is it to 
add scripting capabilities to programs written in SML? (all rather trivial  
and elegant in Lisp)

I don't think that there is a need for yet another OS that would only add
compile-time type-safety to its mostly static and autistic 
program-components.

What I would want to see in new OS developments are things like moving the 
general OS behaviour from patiens to agens (from passive to active). I want 
the OS to actively observe the user(s) and try to act in a way that helps 
him/them to reach his/their goals. I cannot imagine developing such a thing 
in a language that is any less dynamic than Lisp.

I sometimes wish that the programming environments in our so called "modern 
operating systems" would not distinguish so strongly between user 
interaction (shells, scripting) and "real" program development. A language 
like Lisp would enable a smooth transition from using it as a shell or for 
little scripting tasks up to writing whole applications. Communication 
between programs would be trivial and scripting would come for free.

ciao,
Jochen


--
http://www.dataheaven.de
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.01.08.45.20.212805.3794@shimizu-blume.com>
On Fri, 01 Mar 2002 05:27:30 -0500, Christian Lynbech wrote:

>>>>>> "David" == David Rush <····@bellsouth.net> writes:
> 
> David> That said, I'm waiting for someone to well and truly resurrect
> certain David> aspects of the LM ideal - specifically a GC-friendly OS -
> although I David> suspect that it may be better written in SML.
> 
> Are you implying that SML is better suited for writing an OS than Lisp?
> If so, what is the advantages of SML in that problem domain?

One word: types.

Matthias
From: Rahul Jain
Subject: Re: self-hosting gc
Date: 
Message-ID: <87pu2op1h7.fsf@photino.sid.rice.edu>
Matthias Blume <········@shimizu-blume.com> writes:

> > Are you implying that SML is better suited for writing an OS than Lisp?
> > If so, what is the advantages of SML in that problem domain?

> One word: types.

I hope that wasn't an intentional troll, but merely a bizarre and
horrible misunderstanding of the entire way Lisp works.

-- 
-> -/-                       - Rahul Jain -                       -\- <-
-> -\- http://linux.rice.edu/~rahul -=-  ············@techie.com  -/- <-
-> -/- "I never could get the hang of Thursdays." - HHGTTG by DNA -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.221020101.23.50110101.042
   (c)1996-2002, All rights reserved. Disclaimer available upon request.
From: Martin Simmons
Subject: Re: self-hosting gc
Date: 
Message-ID: <3c7fe6a4$0$232$ed9e5944@reading.news.pipex.net>
"Tim Moore" <······@sea-tmoore-l.dotcast.com> wrote in message
·················@216.39.145.192...
> The Utah Common Lisp collector was written in Lisp (not by me).  It
> was a stop-and-copy collector, and it certainly didn't look like any
> other Lisp program. IIRC, (car foo) would get you the forwarding
> pointer of the cons cell.  On the other hand, coding the GC algorithm
> in Lisp was pretty straight-forward, given the right primitives.
>
> I'm not sure if it's advantageous to write a Lisp collector in Lisp.
> Because the normal Lisp world is so inconsistant and screwed up while
> running the collector, normal Lisp advantages like debuggability and
> access to a repl simply don't apply.

Actually, they do.  Writing the GC in a subset of Lisp that doesn't cons has
benefits like:

- use of the many code-construction mechanisms that Lisp allows (macros etc)

- easy sharing of the constants needed by the compiler to implement primitives
(assuming your Lisp compiler is written in Lisp :-)

- the ability to develop and test code on-the-fly (sure, if you screw up badly
then you are lost, but in many cases this doesn't happen).
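The forwarding-pointer scheme Tim Moore describes above is the classic Cheney two-space copying collector; a toy model in Python (the heap layout and names are invented for illustration, with int indices standing in for tagged pointers):

```python
# Toy Cheney-style stop-and-copy over a simulated cons heap.
# A cell is [car, cdr]; a "pointer" is an int index into the heap list,
# anything else is an immediate.  As in the collector described above,
# a copied cell's car is overwritten with the forwarding pointer.

FWD = object()   # unique marker: "this cell has been evacuated"

def collect(from_space, roots):
    to_space = []

    def copy(p):
        if not isinstance(p, int):       # immediate, not a pointer
            return p
        cell = from_space[p]
        if cell[0] is FWD:               # already copied: follow forward
            return cell[1]
        to_space.append(list(cell))      # evacuate into to-space
        new = len(to_space) - 1
        cell[0], cell[1] = FWD, new      # leave forwarding pointer behind
        return new

    new_roots = [copy(r) for r in roots]
    scan = 0                             # Cheney's scan pointer
    while scan < len(to_space):
        cell = to_space[scan]
        cell[0] = copy(cell[0])          # update fields breadth-first
        cell[1] = copy(cell[1])
        scan += 1
    return to_space, new_roots

heap = [["a", 2], ["junk", None], ["b", None]]   # cell 1 is unreachable
new_heap, new_roots = collect(heap, [0])
print(new_heap)   # [['a', 1], ['b', None]] -- compacted, garbage dropped
```

Note that the algorithm itself allocates nothing in the heap it is collecting, which is the point of writing the real thing in a non-consing subset.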

--
Martin Simmons, Xanalys Software Tools
······@xanalys.com
rot13 to reply
From: Christopher C. Stacy
Subject: Re: self-hosting gc
Date: 
Message-ID: <uk7sx82p6.fsf@theworld.com>
>>>>> On 28 Feb 2002 15:51:32 -0800, Thomas Bushnell, BSG ("Thomas") writes:

 Thomas> So I have a question for the people who know more than me.  

 Thomas> Is there experience with writing self-hosting GC for Lisp or Scheme
 Thomas> systems?  By "self-hosting" I mean that the GC is principally written
 Thomas> in Lisp/Scheme, and compiled by the native compiler.  I do not mean
 Thomas> something written in a secondary language (like C) and compiled by a
 Thomas> separate compiler, or something written all in assembly language.

 Thomas> Obviously there are interesting problems to be solved it
 Thomas> making such a thing work.

Mostly not consing!

 Thomas> What did the old Lisp Machines do?  It's my understanding that the GC
 Thomas> was basically all written in assembly language; is that correct?

As opposed to the new Lisp Machines?  :)

The Lisp Machine GC (like everything else that comprised
the entire machine operating system) was written in Lisp.

Lisp Machine Lisp was the superset from which most of Common Lisp
descends, and included functions to access the hardware in various
ways, including accessing memory below the normal (Lisp-object level)
storage conventions.  For example, you could hack type codes and
header words, and BLT words around.

The lowest-level machine instruction codes of the Lisp Machines were
intended for implementing Lisp, and mostly corresponded directly to
Lisp functions. The Lisp compiler just translated from higher Lisp
constructs and syntax into the more primitive Lisp constructs
implemented in the hardware.  Nobody programmed in "assembly language"
on those machines -- that would be the same as programming in Lisp,
only more cumbersome.

The hardware instructions included: the various Lisp function call 
and return protocols (including multiple values and CATCH and so on),
branching, stack and bindings manipulation, CONS, CAR, CDR, SET-TO-CAR,
SET-TO-CDR, RPLACA, RPLACD, GETF, MEMBER, ASSOC, EQ, EQL, GREATERP,
LESSP, LOGIOR, LOGTEST, ENDP, PLUSP, MINUSP, TYPEP, ZEROP, ADD, SUB,
MULTIPLY, etc, CEILING, TRUNCATE, ASH, ROT, LSH, array instructions
such as AREF, ASET, STORE-ARRAY-LEADER, %INSTANCE-REF, %INSTANCE-SET,
%GENERIC-DISPATCH, and so forth.  The subprimitive instructions were
things like typed memory allocation and BLTs, %POINTER-DIFFERENCE, 
%SET-TAG, %SET-CDR-CODE, STORE-CONDITIONAL (atomic), and stuff like
%CHECK-PREEMPT-REQUEST for handling hardware interrupts and reading
the clock and writing device control registers. Also, some instructions
were available to the compiler for common optimizations, such as an
instruction that corresponded to (SETQ X (CDR X)) seen in loops.

The actual code for the GC was just plain written in Lisp, although it
called "subprimitive functions" (eg. instructions like %POINTER-TYPE-P).
The code was written using normal Lisp syntax, such as the LOOP macro,
and was compiled by the normal Lisp compiler and run like any other.

 Thomas> Is there research/experience on doing it?
 Thomas> Guesses about the best ways to make it work?

See also T (an early implementation of Scheme by Rees and Pitman)
<http://www.paulgraham.com/thist.html>
From: Craig Brozefsky
Subject: Re: self-hosting gc
Date: 
Message-ID: <873czk7sak.fsf@piracy.red-bean.com>
·········@becket.net (Thomas Bushnell, BSG) writes:

> So I have a question for the people who know more than me.  
> 
> Is there experience with writing self-hosting GC for Lisp or Scheme
> systems?  By "self-hosting" I mean that the GC is principally written
> in Lisp/Scheme, and compiled by the native compiler.  I do not mean
> something written in a secondary language (like C) and compiled by a
> separate compiler, or something written all in assembly language.

If I recall correctly, Scheme48 was written in PreScheme, a Scheme
dialect, and that includes its GC.

-- 
Craig Brozefsky                           <·····@red-bean.com>
                                http://www.red-bean.com/~craig
Ask me about Common Lisp Enterprise Eggplants at Red Bean!
From: Christian Lynbech
Subject: Re: self-hosting gc
Date: 
Message-ID: <87pu2od5sk.fsf@baguette.webspeed.dk>
>>>>> "Thomas" == Thomas Bushnell, BSG <tb> writes:

Thomas> So I have a question for the people who know more than me.  

I will try to answer anyway :-)

Thomas> Is there experience with writing self-hosting GC for Lisp or Scheme
Thomas> systems?

I have been pondering the concept of a "systems lisp" and I think that
Henry Baker's Linear Lisp[1] has some very interesting properties for
such applications.

First of all Linear Lisp does not need a garbage collector. Since each
object is referenced only once, it is possible to deallocate an object
as soon as the unique reference to it is overwritten.

Secondly, it is quite possible that it gets easier to reason about the
memory requirements of Linear Lisp routines due to the highly regular
memory usage patterns.

The question is of course then how to integrate a Linear Lisp into the
system. In some sense it is cheating since we are using a second
language for the GC part, even though we are not cheating very much
since a Linear Lisp would be a proper subset of the real lisp.

The straightforward solution would be to extend the compiler with a
special linear mode, but I think that it would even be possible to
build a linear sublanguage on top of an existing system, if proper
access to lowlevel memory management was available.

Check your favorite Henry Baker archive for his papers on Linear Lisp.


[1] Linear Lisp is a lisp in which every object (the term is used here
    in its generic sense) is referenced exactly once. This means that
    if you want to pass on an object, you must either destroy the
    original reference or explicitly copy the object.
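The single-reference discipline can be mimicked dynamically (a Python toy; in Linear Lisp the language itself would enforce this, and "deallocation" is only simulated here):

```python
class LinearRef:
    """A reference that may be consumed exactly once: take() destroys
    it, so storage could be reclaimed at that point with no collector."""
    _DEAD = object()

    def __init__(self, value):
        self._value = value

    def take(self):
        if self._value is LinearRef._DEAD:
            raise RuntimeError("linear reference used twice")
        v, self._value = self._value, LinearRef._DEAD
        return v

    def dup(self):
        """Explicit copy: the only way to pass a value on while also
        keeping it, as the footnote above describes."""
        v = self.take()
        self._value = v              # restore the original reference
        return LinearRef(v)

r = LinearRef(42)
c = r.dup()
print(c.take(), r.take())   # 42 42 ; a further r.take() would raise
```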


------------------------+-----------------------------------------------------
Christian Lynbech       | 
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Bernhard Pfahringer
Subject: Re: self-hosting gc
Date: 
Message-ID: <a61jd1$v54$1@hummel.cs.waikato.ac.nz>
In article <··············@baguette.webspeed.dk>,
Christian Lynbech  <·······@get2net.dk> wrote:
>
>[1] Linear Lisp is a lisp in which all objects (the term is used here
>    in its generic sense) is referenced exactly once. This means that
>    if you want to pass on an object, you either must destroy the
>    original reference or explicitly copy the object.
>

I have read that some time ago, but I keep wondering how this choice
influences expressivity: it seems that you cannot directly represent
any circular structure. Is that true?
If so, would one need to implement a layer on top (maybe using hash-tables)
to represent any form of circular structure? 
Would that force the application programmer to implement their own specialized
GC for that structure? Similar to triggers and integrity constraints in a 
relational DB?

Bernhard
-- 
---------------------------------------------------------------------
Bernhard Pfahringer, Dept. of Computer Science, University of Waikato
http://www.cs.waikato.ac.nz/~bernhard                  +64 7 838 4041
---------------------------------------------------------------------
From: Christian Lynbech
Subject: Linear Lisp (was Re: self-hosting gc)
Date: 
Message-ID: <of664bxvk1.fsf_-_@chl.ted.dk.eu.ericsson.se>
>>>>> "Bernhard" == Bernhard Pfahringer <········@hummel.cs.waikato.ac.nz> writes:

Bernhard> I have read that some time ago, but I keep wondering how this choice
Bernhard> influences expressivity: it seems that you cannot directly represent
Bernhard> any circular structure. Is that true?

Good question. It is too long since I read Bakers papers on the
subject, but I do recall him having some kind of demonstration that
certain things are still possible, even in a linear lisp.

Even if one cannot have a circular structure "by reference", one can
of course have it "by name" (ie. rather than having a pointer to a
previous element you could use some form of symbolic representation
of the element). 
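A circular structure "by name" might look like this (a hypothetical Python sketch: nodes refer to their neighbours through symbolic keys in a table rather than direct pointers, so each entry still has a single owner, namely the table):

```python
# A two-node cycle represented indirectly: each node names its
# successor with a key instead of holding a direct reference.
table = {
    "a": {"value": 1, "next": "b"},
    "b": {"value": 2, "next": "a"},
}

def walk(table, start, steps):
    """Follow 'next' names through the table for a number of steps."""
    node = start
    for _ in range(steps):
        node = table[node]["next"]
    return table[node]["value"]

print(walk(table, "a", 2))  # two hops around the cycle: back at "a", prints 1
```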


------------------------+-----------------------------------------------------
Christian Lynbech       | Ericsson Telebit, Skanderborgvej 232, DK-8260 Viby J
Phone: +45 8938 5244    | email: ·················@ted.ericsson.se
Fax:   +45 8938 5101    | web:   www.ericsson.com
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Mike Travers
Subject: Re: self-hosting gc
Date: 
Message-ID: <cabbd060.0203041048.79803b67@posting.google.com>
This is not a direct answer to your question, but I know of a couple
of projects in other languages that did something like this, and would
probably give some insight:

The Jalapeno Java JVM:
http://www.research.ibm.com/jalapeno/publication.html#oopsla99_jvm

Squeak Smalltalk:
ftp://st.cs.uiuc.edu/Smalltalk/Squeak/docs/OOPSLA.Squeak.html
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87zo1nvca6.fsf@becket.becket.net>
··@mdli.com (Mike Travers) writes:

> The Jalapeno Java JVM:
> http://www.research.ibm.com/jalapeno/publication.html#oopsla99_jvm
> 
> Squeak Smalltalk:
> ftp://st.cs.uiuc.edu/Smalltalk/Squeak/docs/OOPSLA.Squeak.html

Neither of these have a self-hosting GC.
From: Marco Antoniotti
Subject: Re: self-hosting gc
Date: 
Message-ID: <y6cbse3jai5.fsf@octagon.mrl.nyu.edu>
·········@becket.net (Thomas Bushnell, BSG) writes:

> ··@mdli.com (Mike Travers) writes:
> 
> > The Jalapeno Java JVM:
> > http://www.research.ibm.com/jalapeno/publication.html#oopsla99_jvm
> > 
> > Squeak Smalltalk:
> > ftp://st.cs.uiuc.edu/Smalltalk/Squeak/docs/OOPSLA.Squeak.html
> 
> Neither of these have a self-hosting GC.

From your posts it seems like you want some primitives that get in the
guts of the OS/Hardware layers of the machine.  Am I correct?

Cheers


-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87eliy203l.fsf@becket.becket.net>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> From your posts it seem like you want some primitives that get in the
> guts of the OS/Hardware layers of the machine.  Am I correct?

I thought I was bright-shining clear.  What I want is a GC written in
the language itself, with all the normal language facilities
available.

Having special peek/poke primitives is certainly necessary for that
task, but not sufficient.

Consider, for example, that memory management for implementations of
the C language are normally written in C.

Consider that if a Lisp system's GC is written in some other language
(like, say, C) then you now need two compilers to build the language.
If your only use for a C compiler is to compile your GC, then you have
really wasted a vast effort in writing one.

Thomas
From: Marco Antoniotti
Subject: Re: self-hosting gc
Date: 
Message-ID: <y6cn0xm4s9e.fsf@octagon.mrl.nyu.edu>
·········@becket.net (Thomas Bushnell, BSG) writes:

> Marco Antoniotti <·······@cs.nyu.edu> writes:
> 
> > From your posts it seem like you want some primitives that get in the
> > guts of the OS/Hardware layers of the machine.  Am I correct?
> 
> I thought I was bright-shining clear.  What I want is a GC written in
> the language itself, with all the normal language facilities
> available.
> 
> Having special peek/poke primitives is certainly necessary for that
> task, but not sufficient.
> 
> Consider, for example, that memory management for implementations of
> the C language are normally written in C.
> 
> Consider that if a Lisp system's GC is written in some other language
> (like, say, C) then you now need two compilers to build the language.
> If your only use for a C compiler is to compile your GC, then you have
> really wasted a vast effort in writing one.

I understand your points.  What I wanted to point out is that the
`malloc' library you write under Unix is different from the one you
write under Windows.  In (Common) Lisp, you have another layer to get
past: the specific CL implementation, which may or may not give you
the necessary hooks to control the OS interface in a way that does not
interfere with the (Common) Lisp system itself.

So the question is: how do you get past this `impasse'?  (I surely
don't claim to know how).

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Christopher Browne
Subject: Re: self-hosting gc
Date: 
Message-ID: <m3zo1mejox.fsf@chvatal.cbbrowne.com>
Marco Antoniotti <·······@cs.nyu.edu> wrote:
> ·········@becket.net (Thomas Bushnell, BSG) writes:
>
>> Marco Antoniotti <·······@cs.nyu.edu> writes:
>> 
>> > From your posts it seem like you want some primitives that get in the
>> > guts of the OS/Hardware layers of the machine.  Am I correct?

>> I thought I was bright-shining clear.  What I want is a GC written
>> in the language itself, with all the normal language facilities
>> available.

>> Having special peek/poke primitives is certainly necessary for that
>> task, but not sufficient.

>> Consider, for example, that memory management for implementations
>> of the C language are normally written in C.

>> Consider that if a Lisp system's GC is written in some other
>> language (like, say, C) then you now need two compilers to build
>> the language.  If your only use for a C compiler is to compile your
>> GC, then you have really wasted a vast effort in writing one.

> I understand your points.  What I wanted to point out is that the
> `malloc' library you write under Unix is different from the one your
> write under Windows.  In (Common) Lisp, you have another layer to
> get past by: the specific CL implementation, which may or may not
> give you the necessary hooks to control the OS interface in a way
> that does not interfere with the (Common) Lisp system itself.

> So the question is: how do you get past this `impasse'?  (I surely
> don't claim to know how).

I don't think the `impasse' is passable.

Consider that the various Unix kernels out there do NOT use "all of
C;" they use subsets that on the one hand likely permit all the
_operators_ and control structures of the base language, but which
_EXCLUDE_ great gobs of "The Standard C Library," notably anything
that forcibly depends on malloc().

One of the Frequently Asked Questions about Linux is "So why don't you
port it to C++?  Wouldn't that make it lots better?"

The _real_ answer to that:  "Because the developers prefer C."

But another pointed reason not to is that C++ subsumes into the base
language a bunch of stuff that, in C, is part of LIBC, and, which, in
many cases, depends on having malloc()/free() (or equivalents thereof)
around to do their work, what with constructors and destructors and
the like.

In order to build an OS kernel in C++, you have to very carefully pick
a subset that doesn't require any underlying "runtime support."  By
the time you gut C++ that way, what you've got is basically C with
classes, and there's little point to calling it a "C++-based OS."

With Lisp, it's much the same story; you will at the "base" have to
have some basic set of functions and operations that DO NOT REQUIRE
RUNTIME SUPPORT, because the point of the exercise is to _implement_
that runtime support.

This actually suggests there being merit to the hoary question of
"What's a good `base CL?'" where you bootstrap with some minimal set
of operators, functions, and macros, and then implement the rest of
the system on top of that.

A necessary "base" would include some basic set of operators/functions
necessary for writing the garbage collector which do not themselves
make any use of dynamic memory allocation.  [Might this mean that the
'base' would exclusively use stack-based memory allocation?  I'd tend
to think so...]
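[Editor's sketch: that malloc-free "base" is easy to make concrete.  A
minimal illustration in C -- the thread's "portable assembler" -- of an
allocator that touches only a static arena plus a stack-like
mark/release discipline; all names and sizes here are hypothetical,
not taken from any actual system:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The only memory the "base" layer ever touches: a static arena.
   No malloc/free, so nothing here depends on runtime support. */
static uint8_t arena[4096];
static size_t  arena_top = 0;

/* Bump-allocate nbytes from the arena; NULL when exhausted. */
static void *base_alloc(size_t nbytes)
{
    /* round up to pointer alignment */
    size_t aligned = (nbytes + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    if (aligned > sizeof arena - arena_top)
        return NULL;            /* out of arena: caller must cope */
    void *p = &arena[arena_top];
    arena_top += aligned;
    return p;
}

/* Record the current top... */
static size_t base_mark(void) { return arena_top; }

/* ...and free everything allocated since, stack-fashion. */
static void base_release(size_t m) { arena_top = m; }
```

[Everything the collector itself needs -- mark stacks, scan pointers --
could come from such an arena or from the C stack, which is the
stack-based discipline the bracketed aside suggests.]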

The notion that the system could bootstrap itself without that limited
'base' seems very wishful.

I'll bet an interesting OS to look at would be SPIN, which was
implemented in Modula-3.  M3 offers the same "chewy garbage collection
goodness" of Lisp; presumably the SPIN kernel has to have certain
sections that implement the "memory management runtime support" in
such a way that they require no such runtime support.

Forth would be another candidate; one of the longstanding traditions
there is the notion of implementing "target compilers" which start
with a basic set of CODE words (e.g. - assembly language) and then use
that as a bootstrap on top of which to implement the rest of the
language.  

That actually points to a somewhat reasonable approach:
 - Write a function that issues assembly language instructions
   into a function;
 - Write some functions that issue groups of assembly language
   instructions ("macros" in the assembler sense);
 - Implement a set of memory management functions using that
   "bootstrap";
 - Then you've got the basis for implementing everything else on
   top of that.

The notion of doing that without something like assembly language
macros underneath is just wishful thinking...
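[Editor's sketch: the emitter approach in the list above can itself be
written with no runtime support -- a code buffer plus functions that
append instruction encodings to it, i.e. "macros in the assembler
sense".  A minimal C illustration (x86-64 encodings; actually running
the buffer would additionally require copying it into an executable
mapping, which is omitted here; all names are hypothetical):]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A code buffer playing the role of the target "function" being built. */
typedef struct {
    uint8_t bytes[256];
    size_t  len;
} codebuf;

/* The primitive: append raw instruction bytes to the buffer. */
static void emit(codebuf *cb, const uint8_t *ins, size_t n)
{
    assert(cb->len + n <= sizeof cb->bytes);
    memcpy(cb->bytes + cb->len, ins, n);
    cb->len += n;
}

/* "Macros in the assembler sense": functions that emit instruction
   groups.  x86-64 encodings: mov eax, imm32 is B8 id; ret is C3. */
static void emit_mov_eax_imm32(codebuf *cb, uint32_t imm)
{
    uint8_t ins[5];
    ins[0] = 0xB8;
    ins[1] = (uint8_t)(imm);        /* little-endian immediate, */
    ins[2] = (uint8_t)(imm >> 8);   /* encoded byte by byte so the */
    ins[3] = (uint8_t)(imm >> 16);  /* sketch is host-endian-neutral */
    ins[4] = (uint8_t)(imm >> 24);
    emit(cb, ins, 5);
}

static void emit_ret(codebuf *cb)
{
    static const uint8_t ret = 0xC3;
    emit(cb, &ret, 1);
}
```

[The memory-management functions of step three would then be composed
from emitters like these.]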
-- 
(reverse (concatenate 'string ···········@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/macros.html
Rules of  the Evil Overlord  #123. "If I  decide to hold a  contest of
skill  open to  the general  public, contestants  will be  required to
remove their  hooded cloaks and  shave their beards  before entering."
<http://www.eviloverlord.com/>
From: Joe Marshall
Subject: Re: self-hosting gc
Date: 
Message-ID: <3_kh8.27434$ro5.12864338@typhoon.ne.ipsvc.net>
"Christopher Browne" <········@acm.org> wrote in message
···················@chvatal.cbbrowne.com...
>
> A necessary "base" would include some basic set of operators/functions
> necessary for writing the garbage collector which do not themselves
> make any use of dynamic memory allocation.  [Might this mean that the
> 'base' would exclusively use stack-based memory allocation?  I'd tend
> to think so...]

To be somewhat pedantic, it isn't necessary to eschew *all* dynamic
allocation in a GC.  You just have to collect more than you cons.
From: Frode Vatvedt Fjeld
Subject: Re: self-hosting gc
Date: 
Message-ID: <2hr8myax77.fsf@vserver.cs.uit.no>
Christopher Browne <········@acm.org> writes:

> That actually points to a somewhat reasonable approach:
>  - Write a function that issues assembly language instructions
>    into a function;
>  - Write some functions that issue groups of assembly language
>    instructions ("macros" in the assembler sense);
>  - Implement a set of memory management functions using that
>    "bootstrap";
>  - Then you've got the basis for implementing everything else on
>    top of that.
>
> The notion of doing that without something like assembly language
> macros underneath is just wishful thinking...

I'm working on a CL system that is pretty much based on (x86) assembly
macros like you are describing. It looks something like this:

(defun (setf car) (value cell)
  (check-type cell cons)
  (with-inline-assembly (:returns :eax)
    (:load-lexical cell :ebx)
    (:load-lexical value :eax)
    (:movl :eax (:ebx -1))))

-- 
Frode Vatvedt Fjeld
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87lmd62o5o.fsf@becket.becket.net>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> I understand your points.  What I wanted to point out is that the
> `malloc' library you write under Unix is different from the one your
> write under Windows.  In (Common) Lisp, you have another layer to get
> past by: the specific CL implementation, which may or may not give you
> the necessary hooks to control the OS interface in a way that does not
> interfere with the (Common) Lisp system itself.

I'm not talking about writing it for an existing CL system, I'm
talking about writing it from the standpoint of a systems designer.
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224355185019060@naggum.net>
* Thomas Bushnell, BSG
| Consider that if a Lisp system's GC is written in some other language
| (like, say, C) then you now need two compilers to build the language.
| If your only use for a C compiler is to compile your GC, then you have
| really wasted a vast effort in writing one.

  It seems quite natural that someone who writes a Common Lisp system would
  write its guts in some other language first.  After a while, it would be
  possible to bootstrap the building process in the system itself, but it
  would seem natural to build some lower-level Lisp that would enable a
  highly portable substrate to be written, and then cross-compilation would
  be a breeze, but it still seems fairly reasonable to start off with a
  different compiler or language if you want anybody to repeat the building
  process from scratch, not just for GC, but for the initial substrate.  I
  remember having to compile GNU CC on SPARC with the SunOS-supplied C
  compiler and then with the GNU CC thus built, in order to arrive at a
  "native build" and that when Sun stopped shipping compilers with their
  application-only operating system, someone was nice enough to make
  binaries available for the rest of the world.

  Why is GC so special in your view?

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87henu2o3a.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

>   Why is GC so special in your view?

One might well need bootstrap in designing and initially building the
system.  But now, one needs *only* GCC to build GCC, and not anything
else.  Once one has a running system with GCC, you don't any longer
need the pcc compilers that GCC was originally built with.
From: Bijan Parsia
Subject: Re: self-hosting gc
Date: 
Message-ID: <Pine.A41.4.21L1.0203060740430.98092-100000@login8.isis.unc.edu>
On 5 Mar 2002, Thomas Bushnell, BSG wrote:

> Erik Naggum <····@naggum.net> writes:
> 
> >   Why is GC so special in your view?
> 
> One might well need bootstrap in designing and initially building the
> system.  But now, one needs *only* GCC to build GCC, and not anything
> else.  Once one has a running system with GCC, you don't any longer
> need the pcc compilers that GCC was originally built with.

But then why the restriction that you "must" have the "full" language
available? Sure, Squeak uses a subset "slang" which maps fairly directly
to C and is intended to generate C which is compiled by a separate C
compiler, but it *runs* inside Squeak. You can run/debug a slang based VM
in Squeak (well, it can be done, at least :)). It's *way* slower, but
presumably that's a "mere" implementational issue (the Squeak community
doesn't have the resources to be able to afford *not* to delegate this bit
to C compilers).

There are Smalltalks (and lisps) that let you inline C or asm code...would
that be ok?

Cheers,
Bijan Parsia.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <874rjt74e0.fsf@becket.becket.net>
Bijan Parsia <·······@email.unc.edu> writes:

> But then why the restriction that you "must" have the "full" language
> available? 

Because I'm looking for solutions to the hard problem, not ways of
solving a different problem.
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224397443760956@naggum.net>
* ·········@becket.net (Thomas Bushnell, BSG)
| One might well need bootstrap in designing and initially building the
| system.  But now, one needs *only* GCC to build GCC, and not anything
| else.  Once one has a running system with GCC, you don't any longer
| need the pcc compilers that GCC was originally built with.

  I actually tried to argue that the same would hold true of a Common Lisp
  system, but that portability constraints dictate that those who want to
  port a Common Lisp compiler to System X on the Y processor should be able
  to use the portable assembler (C) instead of having to start off writing
  non-portable assembler and use the system's assembler to bootstrap from.

  Needing *only* GCC, as you say, is predicated on the existence of a
  binary for your system to begin with.  How do people port GCC to a new
  platform on which they intend to build the GNU system?  My take on this
  is that it is no less dependent on some other existing C compiler than
  the similar problem for CL compilers is.  Duane, please help.  :)

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Nils Goesche
Subject: Re: self-hosting gc
Date: 
Message-ID: <a64u73$bk47n$1@ID-125440.news.dfncis.de>
In article <················@naggum.net>, Erik Naggum wrote:
> * ·········@becket.net (Thomas Bushnell, BSG)
>| One might well need bootstrap in designing and initially building the
>| system.  But now, one needs *only* GCC to build GCC, and not anything
>| else.  Once one has a running system with GCC, you don't any longer
>| need the pcc compilers that GCC was originally built with.
> 
>   I actually tried to argue that the same would true of a Common Lisp
>   system, but that portability constraints dictate that those who want to
>   port a Common Lisp compiler to System X on the Y processor should be able
>   to use the portable assembler (C) instead of having to start off writing
>   non-portable assembler and use the system's assembler to bootstrap from.
> 
>   Needing *only* GCC, as you say, is predicated on the existence of a
>   binary for your system to begin with.  How do people port GCC to a new
>   platform om which they intend to build the GNU system?  My take on this
>   is that it is no less dependent on some other existing C compiler than
>   the similar problem for CL compilers is.  Duane, please help.  :)

IIRC, they first write a /cross/ compiler for the new system that
runs on an old system.  Then they use the cross compiler to compile
gcc itself and voila... done.  Hey, sounds easy, doesn't it?  :-))

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224424745291931@naggum.net>
* Nils Goesche
| IIRC, they first write a /cross/ compiler for the new system that
| runs on an old system.  Then they use the cross compiler to compile
| gcc itself and voila... done.  Hey, sounds easy, doesn't it?  :-))

  It sounds like _vastly_ more work than building on the native system with
  a native assembler and linker to build the first executables until you
  could replace those, too.  

  Back in the old days, I wrote 8080 and Z80 code on the PDP-10 and its
  cross-assembler for "microcomputers", because it was so fantastically
  more convenient to work on a real computer and deploy on a toy than work
  on the toy computer -- mostly all I did on the toy computer was to write
  an excellent terminal emulation program, in assembler.  However, the only
  reason this was more convenient was that it was a royal pain in the butt
  to try to use the toy computer for any development.  However, I had to
  copy the ROMs in that machine to the PDP-10 and basically regenerate its
  symbol table in order to make things work correctly.  Luckily, it had an
  emulator, and curiously, the PDP-10 emulated the code about 100 times
  faster than my toy computer executed it.  Were it not for the 100,000
  times difference in the cost of acquisition and ownership of the two
  computers, I would certainly have replaced my Exidy Sorcerer with a
  PDP-10.  Come to think of it, my current home computer is strong enough to
  emulate a PDP-10 about 100 times faster than the real thing, too...

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <878z9574ez.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

> * Nils Goesche
> | IIRC, they first write a /cross/ compiler for the new system that
> | runs on an old system.  Then they use the cross compiler to compile
> | gcc itself and voila... done.  Hey, sounds easy, doesn't it?  :-))
> 
>   It sounds like _vastly_ more work than building on the native system with
>   a native assembler and linker to build the first executables until you
>   could replace those, too.  

You really have no clue how GCC works if you think it's more trouble.
Really, GCC is totally equipped to do cross-compilation (as are all
the other parts of the toolchain).
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224443672602805@naggum.net>
* Thomas Bushnell, BSG
| You really have no clue how GCC works if you think it's more trouble.
| Really, GCC is totally equipped to do cross-compilation (as are all
| the other parts of the toolchain).

  I have helped port GCC in the past, like in 1988.  I am quite sure the
  design has improved since then.  You imply it has in the most unuseful
  way, so I guess you think it is quite useless to be specific and useful.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87sn7d1f65.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

> * Thomas Bushnell, BSG
> | You really have no clue how GCC works if you think it's more trouble.
> | Really, GCC is totally equipped to do cross-compilation (as are all
> | the other parts of the toolchain).
> 
>   I have helped port GCC in the past, like in 1988.  I am quite sure the
>   design has improved since then.  You imply it has in the most unuseful
>   way, so I guess you think it is quite useless to be specific and useful.

GCC has improved rather a lot in the intervening 14 years.  It is now
much easier to simply directly build the compiler for a new target
(using cross compilation) than it is to port some other compiler
first.  It's so easy, that it's generally the preferable option.
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey3g03dgaml.fsf@cley.com>
* Erik Naggum wrote:
>   Needing *only* GCC, as you say, is predicated on the existence of a
>   binary for your system to begin with.  How do people port GCC to a new
>   platform om which they intend to build the GNU system?  My take on this
>   is that it is no less dependent on some other existing C compiler than
>   the similar problem for CL compilers is.  Duane, please help.  :)

I assume they add support for the new target to gcc, compile gcc on an
existing system targeted at the new system and then run this new
compiler on the new system.

--tim
From: Martin Simmons
Subject: Re: self-hosting gc
Date: 
Message-ID: <3c862c73$0$238$ed9e5944@reading.news.pipex.net>
"Tim Bradshaw" <···@cley.com> wrote in message ····················@cley.com...
> * Erik Naggum wrote:
> >   Needing *only* GCC, as you say, is predicated on the existence of a
> >   binary for your system to begin with.  How do people port GCC to a new
> >   platform om which they intend to build the GNU system?  My take on this
> >   is that it is no less dependent on some other existing C compiler than
> >   the similar problem for CL compilers is.  Duane, please help.  :)
>
> I assume they add support for the new target to gcc, compile gcc on an
> existing system targeted at the new system and then run this new
> compiler on the new system.

Correct, though it is often complicated by object file formats.  One approach is
to generate textual assembly language on the host machine, which is then
assembled and linked on the target machine (using existing tools).  Another
approach is to retarget the equivalent GNU tools and generate the binaries
directly on the host machine.
--
Martin Simmons, Xanalys Software Tools
······@xanalys.com
rot13 to reply
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224431235976594@naggum.net>
* Tim Bradshaw
| I assume they add support for the new target to gcc, compile gcc on an
| existing system targeted at the new system and then run this new compiler
| on the new system.

  This is probably doable, but in my experience with cross-compilation, you
  do not just generate code, you effectively generate a module that works
  with a much larger system.  To make this _really_ work, you have to have
  intimate knowledge of the target system.  Since the compiler is often the
  first thing you build on a new system in order to build the other tools
  you want to use there, my thinking is that you save a lot of time using a
  pre-existing compiler and like tool, particularly to ensure that you get
  the linking information right for that particular environment, what with
  all the shared library dependencies and whatnot.

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87d6yh74fz.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

>   This is probably doable, but in my experience with cross-compilation, you
>   do not just generate code, you effectively generate a module that works
>   with a much larger system.  To make this _really_ work, you have to have
>   intimate knowledge of the target system.  Since the compiler is often the
>   first thing you build on a new system in order to build the other tools
>   you want to use there, my thinking is that you save a lot of time using a
>   pre-existing compiler and like tool, particularly to ensure that you get
>   the linking information right for that particular environment, what with
>   all the shared library dependencies and whatnot.

No, Tim was totally right.  You don't use the pre-existing compiler in
general; often times the manufacturer isn't providing one.

Often you are the first person writing one: this is now rather often
the case with GCC.
From: Tim Bradshaw
Subject: Re: self-hosting gc
Date: 
Message-ID: <ey36649e2t9.fsf@cley.com>
* Erik Naggum wrote:
>   This is probably doable, but in my experience with cross-compilation, you
>   do not just generate code, you effectively generate a module that works
>   with a much larger system.  To make this _really_ work, you have to have
>   intimate knowledge of the target system.  Since the compiler is often the
>   first thing you build on a new system in order to build the other tools
>   you want to use there, my thinking is that you save a lot of time using a
>   pre-existing compiler and like tool, particularly to ensure that you get
>   the linking information right for that particular environment, what with
>   all the shared library dependencies and whatnot.

I think this is obviously what you do when you have the choice - when
I got gcc working on our Suns I did it by compiling with Sun's cc and
using a large number of other pre-provided tools (I think I still use
the Sun linker), and so did everyone else in those days (nowadays you
get a binary of gcc to bootstrap because Sun don't ship a c compiler
for free any more...).

But in what I suspect (without real knowledge) is a large proportion
of ports nowadays the target machine is something which does not yet
have anything on it, because it's a little embedded processor for
which you (the maker of the processor) are going to provide the gnu
toolchain (this seems to be pretty much standard for embedded systems
now).  In these cases you probably have seriously minimal support on
the target since you're aiming to get something like an OS up on it
from the bare iron.

--tim
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87hent74he.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

>   I actually tried to argue that the same would true of a Common Lisp
>   system, but that portability constraints dictate that those who want to
>   port a Common Lisp compiler to System X on the Y processor should be able
>   to use the portable assembler (C) instead of having to start off writing
>   non-portable assembler and use the system's assembler to bootstrap from.

You'll always need an assembler, of course; there isn't any way around
that.  And there are advantages for systems like the old KCL and its
descendents which use GCC as the back end for the compiler.

But I'm thinking about a different problem space, not the one you are.

>   Needing *only* GCC, as you say, is predicated on the existence of a
>   binary for your system to begin with.  How do people port GCC to a new
>   platform om which they intend to build the GNU system?  My take on this
>   is that it is no less dependent on some other existing C compiler than
>   the similar problem for CL compilers is.  Duane, please help.  :)

People port GCC to new platforms by having GCC cross-compile code.
No reliance on other compilers is necessary.

MIT Scheme gets ported the same way.
From: Duane Rettig
Subject: Re: self-hosting gc
Date: 
Message-ID: <4r8mxgzsz.fsf@beta.franz.com>
Erik Naggum <····@naggum.net> writes:

> * ·········@becket.net (Thomas Bushnell, BSG)
> | One might well need bootstrap in designing and initially building the
> | system.  But now, one needs *only* GCC to build GCC, and not anything
> | else.  Once one has a running system with GCC, you don't any longer
> | need the pcc compilers that GCC was originally built with.
> 
>   I actually tried to argue that the same would true of a Common Lisp
>   system, but that portability constraints dictate that those who want to
>   port a Common Lisp compiler to System X on the Y processor should be able
>   to use the portable assembler (C) instead of having to start off writing
>   non-portable assembler and use the system's assembler to bootstrap from.
> 
>   Needing *only* GCC, as you say, is predicated on the existence of a
>   binary for your system to begin with.  How do people port GCC to a new
>   platform om which they intend to build the GNU system?  My take on this
>   is that it is no less dependent on some other existing C compiler than
>   the similar problem for CL compilers is.  Duane, please help.  :)

I don't view self-hosting as precluding any bootstrapping which is
necessary to get to that self-hosting state.  In fact, I would be
surprised to hear of _any_ kind of self-hosting which doesn't require
a non-self-hosted bootstrap.  This applies to both cross-compiling
from another architecture and re-compiling on the same architecture
starting with a different compiler.

Thomas Bushnell's challenge is a good one.  And this thread has been
a good one, as well.  Several times I considered answering some of the
statements made on this thread, but have refrained because there are
so many issues and stochastic requirements.  So I thought I'd put
together several ideas and present them at once, from the point of
view of an Allegro CL developer.

As an initial summary, I submit that the entire lisp _could_ be
written entirely in lisp, but that it is not convenient to do so,
given the fact that we run our lisp on Unix and MS systems, which
are all C based, and even embedded systems tend to have libc
equivalent interfaces.  However, I do disagree that it is necessary
to require that the whole language be available for a GC written in
lisp, and will explain that later as well.

First, a background review of Allegro CL's structure, for those
who don't yet know:

 1. Most of Allegro CL is written in Allegro CL, and compiles per
architecture to bits (represented in code vector lisp objects)
using the Allegro CL compile and compile-file functions.

 2. A subsection of the kernel or "runtime" of Allegro CL is an
extension of CL I call runtime or "rs" code, which also uses the
Allegro CL compiler, extended and hooked to produce assembler
source as output.

 3. Some small part of Allegro CL is written in C.  On some
architectures, the C++ compiler is used, but it is mostly written
in C style.  The major purpose of the C code is to parse the .h
header files of the system for the os interface.  We try mostly
to limit our C code to os-interface functionality and regularization.

In addition, as a kind of #3a: We also have written our garbage-collector
and our fasl-file reader in C.

The binaries from 2, 3, and 3a are all linked together using the system
linker to either produce a single executable, or to produce a simple
executable main and a shared-library.  In both cases, that link output
serves dual purpose as a bootstrap mechanism to load pure lisp code in
(i.e. from #1) or to re-establish the environment dumped in a previous
lisp session.

The rs code in #2 is sort of a combination of superset/subset of regular
CL code; it understands both C and Lisp calling conventions, but does not
set up a "current function" for its own operation.  Since the produced
code is just assembler source, and does not set up a function object,
local constants are not allowed; only constants that are in the lisp's
global table can be referenced by rs code.  Recently, I added an
exception to this; string constants can now be represented in rs
code - these will become .asciz or equivalent directives in the
assembler source.  This allows such rs functions as

(def-runtime-q print-answer (n)
  (q-c-call printf "The answer is %d
" n))

I have also recently extended the rs code to allow for large
stack-allocated specialized arrays; we've always been able to
allocate stack-based simple-vectors in rs code, but due to the
rs code stack frame descriptors we provided for gc purposes, non-lisp
data had been restricted to a few hundred bytes until now.

Theoretically, due to these and other changes, we should now be
able to rewrite both the fasl reader and the garbage-collector
in rs code, but it hasn't been a high priority.  For the garbage
collector especially, there must be an incentive to make such a
potentially regressive move; it may be that a new gc to handle
concurrent MP might be just that incentive.

For #3, I was almost ready to disagree with Thomas Bushnell 
because I believed that it is necessary to use C functionality
to interface to C library functions.  This is especially true
for the need to parse .h files, and to get the correct
definitions and interfaces based on particular #define constants.
If you doubt this, just try to figure out, for example, HP's
sigcontext structure, which has layer upon layer of C macrology
to define a large number of incompatible structure and interface
definitions.

However, I had to back off on any such disagreement, because it
certainly is _possible_ to write any of these interface functions
in lisp, using such facilities as our Cbind tool to pre-parse the
header files and to thus present all pertinent information to
the lisp-in-lisp code.  However, I still am not inclined to do such
a thing, because it would be specialized toward lisp bootstrap, and
thus not useful for anything else.  And why not use C at what it does
best (parse C header files)?  Besides, even our Cbind facility uses
the gcc front-end to do the initial parsing, so in essence a non-lisp
compiler part would still be used.  Bottom line; it is more convenient
to write our os-interface code in C, because it interfaces to C
libraries.  I suppose that we would remove such C interfaces if we
were porting our lisp to a Lisp operating system.


Finally, I'd like to disagree wholeheartedly with the notion that
the full language must be available for the whole lisp implementation.
Specifically, I am responding to this point by Thomas Bushnell:

>> I thought I was bright-shining clear.  What I want is a GC written in
>> the language itself, with all the normal language facilities
>> available.

It is the notion of "availability" that I take issue with.  To
make my point, consider the statement that in every English
sentence, all letters of the alphabet are available.  That is,
of course, a true statement.  And as the specific example "The quick
brown fox jumped over the lazy dogs." shows, it is obviously
possible to construct a sentence which _indeed_ does use every
letter in the alphabet.  However, does this require that every
sentence be constructed in such a way?  Of course not!  It is thus
not the whole alphabet which is available to a particular sentence,
but only those letters which in fact work toward constructing the
sentence, which are in fact "available".  Thus, for normal
conversation, the letter "q" is not generally available to me to
use unless I am using a word which has a "q" in it (or unless I'm
specifically talking about the letter "q" itself).

Let's extend this notion to an extensible language like Lisp.
Consider the start of a CL function foo:

(defun foo ()
   ...)

Now, the body of foo can refer to any CL functionality, including
foo itself.  However, it would generally be bad programming (i.e.
a bug) to allow a call to foo within foo which results in an
infinite recursion.  Thus, to some extent, foo is not fully
available to use as one wishes within foo.

Similar truths apply to a garbage-collector.  It might be
perfectly acceptable for a gc function to call cons, but it
had better be prepared to deal with the case where there is
no more room for a new cons cell, which would thus cause a
recursive call to the garbage-collector (presumably an infinite
recursion, since the reason for the initial gc call might have
been for lack of space).
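[Editor's sketch: one common way to realize what Duane describes is a
reserve -- the mutator stops short of the last few cells, which only
the collector may cons from, and a flag forbids recursive entry.  A
toy C illustration; the "collector" here is a stub that merely
pretends some cells survive, and every name is hypothetical:]

```c
#include <assert.h>

#define HEAP_CELLS 64
#define RESERVE     8   /* headroom only the collector may cons from */

static int heap_used = 0;
static int in_gc     = 0;   /* guard against recursive collection */
static int gc_runs   = 0;

static int alloc_cell(void);

/* Stub collector: a real one would trace live data.  This one conses
   a couple of cells itself (from the reserve if need be) and then
   declares the rest free. */
static void gc(void)
{
    assert(!in_gc);         /* re-entry would mean infinite recursion */
    in_gc = 1;
    gc_runs++;
    heap_used = 0;          /* pretend everything was garbage... */
    alloc_cell();           /* ...then cons during collection, */
    alloc_cell();           /* which the guard makes safe */
    in_gc = 0;
}

/* Allocate one cell.  The mutator triggers gc at the reserve line;
   the collector may dig into the reserve but never re-enters gc. */
static int alloc_cell(void)
{
    int limit = in_gc ? HEAP_CELLS : HEAP_CELLS - RESERVE;
    if (heap_used >= limit) {
        if (in_gc)
            return -1;      /* unrecoverable: gc out-consed its reserve */
        gc();
    }
    return heap_used++;
}
```

[Joe Marshall's "collect more than you cons" shows up here as the
requirement that each collection free more cells than the collector
itself allocates; otherwise allocation would eventually wedge inside
the reserve.]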

And, as The Oracle in Matrix says, "What's really going to
bake your noodle ..." is that at least in CL, there is no
definition of what a garbage-collector actually _is_.  There are
a few references, but no definitions or specs...

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87r8mx5ogd.fsf@becket.becket.net>
Duane Rettig <·····@franz.com> writes:

> As an initial summary, I submit that the entire lisp _could_ be
> written entirely in lisp, but that it is not convenient to do so,
> given the fact that we run our lisp on Unix and MS systems, which
> are all C based, and even embedded systems tend to have libc
> equivalent interfaces.  

So it should be pointed out that one of the reasons I'm interested in
this question is that I'm interested in lisp systems running on bare
metal.

> However, I do disagree that it is necessary
> to require that the whole language be available for a GC written in
> lisp, and will explain that later as well.

I agree that it may not be *necessary* depending on what that means.

But note that I began by asking about both Scheme and CL; the point is
that of course I could confine myself to a tiny subset of CL and do
things in PL/I (er, I mean "the loop macro").  

However, the real things I want are fairly simple:  I want complex
closures and I want cons.  I might want call/cc; at least, I'm not
willing to exclude that a priori.

> For #3, I was almost ready to disagree with Thomas Bushnell 
> because I believed that it is necessary to use C functionality
> to interface to C library functions.  

If you really need to, you can do that, and it may well be the most
efficient implementation strategy if you want to run on Unix.  (As, of
course, you do.) 

A "pure" implementation means that you would do the same work the C
library people do, and make Lisp equivalents for the C header files
yourself.  Remember, *I'm* always thinking of this from a systems
design perspective, so "tell the other group to do the work" isn't
really a solution. :) But if the other group is doing the work anyway
(as is the case for people running a Lisp environment on Unix), then of
course it's convenient to piggyback on them.

> For #3, I was almost ready to disagree with Thomas Bushnell 
> because I believed that it is necessary to use C functionality
> to interface to C library functions.  

The actual interfaces you need are the *kernel* interfaces, not
interfaces to the C library.  From the systems design perspective,
your system would be *replacing* the C library, not borrowing it.  If
you do want to borrow it, then it might be most convenient to use C to
hook into it, though as you correctly note, even then you can get
around it.

> Similar truths apply to a garbage-collector.  It might be
> perfectly acceptable for a gc function to call cons, but it
> had better be prepared to deal with the case where there is
> no more room for a new cons cell, which would thus cause a
> recursive call to the garbage-collector (presumably an infinite
> recursion, since the reason for the initial gc call might have
> been for lack of space).

So the *point* of my question is, in part, just this problem.  Now, if
there *isn't* a solution, then you have to subset the language, omit
cons, and then code your GC.

But why do that if there is a convenient solution?

Suppose that GCing an arena of N bytes takes N/10 bytes of memory to
hold dynamically allocated GC data structures.

One strategy is to just keep that space reserved always, so it's there.
Or, if one is using stop-and-copy, then it's even easier to find space.

If one is in a multi-threaded world, and each thread gets its own
allocation arena for normal allocation, and you are using a
stop-the-world approach to GC, then you can't reliably assume
(perhaps) that all the threads have left their arena in an ideal
state.  That means that the GC will probably have to allocate out of a
totally separate arena from what other programs use.  When it's done,
a quick GC pass (allocating from the main heap) can be run to clean
the special GC arena, and copy anything remaining there onto the main
heap.  
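With stop-and-copy the problem nearly vanishes: during a collection
the empty to-space is free by construction, so the collector's own
allocations are just pointer bumps into it.  A toy Cheney-style sketch
in C for two-pointer cells (names and layout are illustrative only,
not any real system's):

```c
#include <assert.h>
#include <stddef.h>

#define SEMISPACE_CELLS 256

typedef struct obj {
    struct obj *car, *cdr;
    int forwarded;       /* has this cell already been evacuated? */
    struct obj *fwd;     /* if so, its new address */
} obj;

static obj space_a[SEMISPACE_CELLS], space_b[SEMISPACE_CELLS];
static obj *from = space_a, *to = space_b;
static size_t from_top = 0, to_top = 0;

/* Evacuate one cell into to-space; allocation here is a bare bump. */
static obj *copy(obj *o) {
    if (o == NULL) return NULL;
    if (o->forwarded) return o->fwd;
    obj *n = &to[to_top++];
    *n = *o;
    o->forwarded = 1;
    o->fwd = n;
    return n;
}

/* Cheney scan: copy the root, then sweep a scan pointer across
   to-space, copying whatever the already-copied cells point at. */
static obj *gc(obj *root) {
    to_top = 0;
    obj *newroot = copy(root);
    for (size_t scan = 0; scan < to_top; scan++) {
        to[scan].car = copy(to[scan].car);
        to[scan].cdr = copy(to[scan].cdr);
    }
    obj *t = from; from = to; to = t;   /* flip the semispaces */
    from_top = to_top;
    return newroot;
}

static obj *cons(obj *car, obj *cdr) {
    assert(from_top < SEMISPACE_CELLS);
    obj *o = &from[from_top++];
    o->car = car; o->cdr = cdr;
    o->forwarded = 0; o->fwd = NULL;
    return o;
}
```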
From: Christopher Browne
Subject: Re: self-hosting gc
Date: 
Message-ID: <a66cc6$bte2g$1@ID-125932.news.dfncis.de>
The world rejoiced as ·········@becket.net (Thomas Bushnell, BSG) wrote:
> Duane Rettig <·····@franz.com> writes:
>> Similar truths apply to a garbage-collector.  It might be
>> perfectly acceptable for a gc function to call cons, but it
>> had better be prepared to deal with the case where there is
>> no more room for a new cons cell, which would thus cause a
>> recursive call to the garbage-collector (presumably an infinite
>> recursion, since the reason for the initial gc call might have
>> been for lack of space).

> So the *point* of my question is, in part, just this problem.  Now,
> if there *isn't* a solution, then you have to subset the language,
> omit cons, and then code your GC.

> But why do that if there is a convenient solution?

> One strategy: suppose to GC an arena of N bytes takes N/10 bytes of
> memory to hold dynamically allocated GC data structures.

> One strategy is to just save that space always, so it's there.  Or,
> if one is using stop-and-copy, then it's even easier to find space.

> If one is in a multi-threaded world, and each thread gets its own
> allocation arena for normal allocation, and you are using a
> stop-the-world approach to GC, then you can't reliably assume
> (perhaps) that all the threads have left their arena in an ideal
> state.  That means that the GC will probably have to allocate out of
> a totally separate arena from what other programs use.  When it's
> done, a quick GC pass (allocating from the main heap) can be run to
> clean the special GC arena, and copy anything remaining there onto
> the main heap.

I think you need to take a step back and try to see _precisely_ what
it is you're trying to implement.

It seems (from what I see) that your goal is to build a "not far off
bare iron" Lisp environment.

A hopefully not-too-wild guess would be that you might take one of the
Unix-like kernels, and rather than running init on top of that,
running Lisp instead.  (I could be wrong, but that would surprise me a
bit...)

I think your goal is likely to be to:
 - Interface to the kernel's devices;
 - Interface to the kernel's memory manager;
and then build a usable environment on top of that.

Let's suppose that this environment is to be an implementation of
Common Lisp.  (Changing to Scheme wouldn't change the fundamentals.)

You clearly need to have the following things:

a) A code generator that generates and stores object code that the
kernel knows how to load and run.

b) A memory management system that requests memory from the kernel,
when needed, which doles it out when the CL wants to CONS a bit, 
which can reclaim garbage, and perhaps even return it to the kernel.

Those are things which fall out of the scope of things that a CL
implementation "trivially provides."

It would certainly be a reasonable idea to create, in portable Common
Lisp (or, for that matter, Scheme):

 a) Some functions that know how to take object code in some sort of
 buffer, and write it out to storage;

 b) Some functions that implement the "code buffer" described in a),
 and which allow storing machine language instructions in that buffer
 by making use of some sort of "assembler";

 c) Some set of Lisp functions that implement a memory manager, using
 the "assembler" described in b).

Given that, you can then proceed to implement:

 d) A library of string and number functions using c) and b);

 e) An interface, in assembly language, to the OS kernel, which uses
 d) and b) to provide an API that can access OS functionality.

At this point you have a memory manager, a program loader and "saver,"
an assembler, and access to OS services.  

You can start assembling functions to implement basic pieces of a Lisp
environment, and then assemble many of those together to implement
further functions and macros.

There's going to be a bunch of code that is written in pretty much raw
assembler, guaranteed.  I'd think a)-e) are all examples of that.

The proper goal of the exercise will doubtless be to implement in the
f)-and-following phases the pieces of the Lisp environment needed to
host a)-e).

Thus, it would be perfectly acceptable to make extensive use of macros
and CLOS throughout, perhaps even the MOP, with the underlying
expectation that f)-and-following includes an implementation of macros,
CLOS, and MOP.  In effect, the project involves implementing an
Extremely Sophisticated Assembler in Lisp, and that's quite a
reasonable idea.

You could reasonably assume that since there exists the CLOS
implementation "PCL," you don't necessarily have to create CLOS from
scratch, but could just implement enough of CL to allow loading in
PCL.

What you _can't_ assume is that simply by the sheer existence of
CLISP, CMU/CL, and ACL, you can avoid a)-e) or f)-and-following.

Your big tasks are going to be to design:
 a) How you'll save and load binary code;
 b) How you'll implement memory management;
 c) How you'll store Lisp code in memory;
 d) How you'll store Lisp data in memory.

That's certainly where the project starts.  If the approaches for
those four things are done well, the system may be good.  If the
approaches suck, well, great suckage lies there...
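For (d), one time-honored answer is a tagged-word representation: use
the low bits of each word, free because of alignment, to say what kind
of object the word denotes.  A sketch in C; the specific tag values
and names are invented for illustration, not taken from any particular
implementation (and the arithmetic right shift for fixnums assumes a
two's-complement target, as essentially all are):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t lispobj;

#define TAG_MASK   3u   /* low two bits, free since pairs are aligned */
#define TAG_FIXNUM 0u   /* fixnums tagged 0: add/sub need no masking  */
#define TAG_CONS   1u

typedef struct pair { lispobj car, cdr; } pair;

static lispobj make_fixnum(intptr_t n)  { return (uintptr_t)n << 2; }
static intptr_t fixnum_value(lispobj o) { return (intptr_t)o >> 2; }

static lispobj lisp_cons(lispobj car, lispobj cdr) {
    /* malloc stands in for the real allocator; it returns memory
       aligned well past 4 bytes, so the low tag bits are zero. */
    pair *p = malloc(sizeof *p);
    p->car = car;
    p->cdr = cdr;
    return (lispobj)p | TAG_CONS;
}

static lispobj lisp_car(lispobj o) {
    assert((o & TAG_MASK) == TAG_CONS);
    return ((pair *)(o & ~(uintptr_t)TAG_MASK))->car;
}
```

The choice of tag 0 for fixnums is the classic trick: tagged fixnum
addition and subtraction then work on the raw words directly.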
-- 
(reverse (concatenate 'string ···········@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/oses.html
"Windows: The ``Big O'' of operating systems."
From: Erik Naggum
Subject: Re: self-hosting gc
Date: 
Message-ID: <3224443903274171@naggum.net>
* Thomas Bushnell, BSG
| So it should be pointed out that one of the reasons I'm interested in
| this question is that I'm interested in lisp systems running on bare
| metal.

  That would be an operating system for and in (Common) Lisp, would it not?

///
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.
From: Thomas Bushnell, BSG
Subject: Re: self-hosting gc
Date: 
Message-ID: <87ofi11ew3.fsf@becket.becket.net>
Erik Naggum <····@naggum.net> writes:

> * Thomas Bushnell, BSG
> | So it should be pointed out that one of the reasons I'm interested in
> | this question is that I'm interested in lisp systems running on bare
> | metal.
> 
>   That would be an operating system for and in (Common) Lisp, would it not?

Or Scheme, or other things.  But the general problem is of more
interest than merely a Lisp-OS strategy.
From: Joe Marshall
Subject: Re: self-hosting gc
Date: 
Message-ID: <tnqj8.18729$44.3920807@typhoon.ne.ipsvc.net>
"Thomas Bushnell, BSG" <·········@becket.net> wrote in message
···················@becket.becket.net...
>
> Consider, for example, that memory management for implementations of
> the C language are normally written in C.

Actually, they are written in a restricted subset of C that does
not use malloc and relies solely on static, stack-allocated, and
OS-allocated (mmap, sbrk, VirtualAlloc, etc.) non-managed memory.
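That restricted dialect is easy to picture: an allocator written in C
that never touches malloc, taking one large region from the OS (mmap
here, one of the calls named above) and handing out pieces by bumping
a pointer.  A sketch with invented names, assuming a POSIX system:

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

#define ARENA_BYTES (1u << 20)

static unsigned char *arena, *arena_end, *bump;

/* Grab one block from the kernel up front; after this, allocation
   never calls into the C library's managed heap. */
static int arena_init(void) {
    arena = mmap(NULL, ARENA_BYTES, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena == MAP_FAILED) return -1;
    arena_end = arena + ARENA_BYTES;
    bump = arena;
    return 0;
}

static void *arena_alloc(size_t n) {
    n = (n + 7) & ~(size_t)7;                /* keep 8-byte alignment */
    if ((size_t)(arena_end - bump) < n)
        return NULL;                         /* exhausted: time to GC */
    void *p = bump;
    bump += n;
    return p;
}
```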

> Consider that if a Lisp system's GC is written in some other language
> (like, say, C) then you now need two compilers to build the language.
> If your only use for a C compiler is to compile your GC, then you have
> really wasted a vast effort in writing one.

Agreed.
From: Christopher Browne
Subject: Re: self-hosting gc
Date: 
Message-ID: <m3r8mggyej.fsf@chvatal.cbbrowne.com>
In an attempt to throw the authorities off his trail, "Joe Marshall" <·············@attbi.com> transmitted:
> "Thomas Bushnell, BSG" <·········@becket.net> wrote in message
> ···················@becket.becket.net...
>> Consider, for example, that memory management for implementations
>> of the C language are normally written in C.

> Actually, they are written in a restricted subset of C that does not
> use malloc and relies solely on static, stack-allocated, and
> OS-allocated (mmap, sbrk, VirtualAlloc, etc.) non-managed memory.

>> Consider that if a Lisp system's GC is written in some other
>> language (like, say, C) then you now need two compilers to build
>> the language.  If your only use for a C compiler is to compile your
>> GC, then you have really wasted a vast effort in writing one.

> Agreed.

... And it would be entirely silly to do so.  (Write a whole C
compiler.)

It would make a _LOT_ more sense to write a _very_ shortened-down
version of C that would, off the top of my head, need to support:

Data types including:
 - Integers (probably of a fixed size, either 32 bit or 64 bit, NOT
   BOTH);
 - Characters (well, more accurately, bytes);
 - Words, maybe;
 - Probably structs;
 - Pointers to the preceding types, and _perhaps_ pointers to
   functions...

Control structures would include the standard for/while/do/until/case,
plus functions.

There would need to be an expression evaluation scheme.

Conspicuously _absent_ would be:
 - Floating point;
 - I/O library;
 - Any complex sort of automatic casting scheme.

The result of this would be a language a minuscule fraction of the
size of ANSI C99, and it would be perhaps more than merely
unconsciously reminiscent of C--, created by Simon Peyton Jones, a
chief Haskell guy.

Of course, this all mandates implementing a parser and a code
generator; it might very well be a better idea to embed the above set
of abstractions into a set of Lisp functions so that you'd go off and
write:

(let ((a (mm:auto :int))
      (b (mm:auto :int))
      (c (mm:auto :char)))
  (mm:set a 20)
  (mm:set b 30)
  (mm:while (mm:< a b)
	    (mm:set c (mm:- a b))
	    (mm:incf a))) 

Which is a totally worthless bit of pseudocode, but which shows the
notion of embedding a calculation written in a very specialized Lisp
vocabulary that conspicuously _isn't_ using plain, ordinary Lisp
operators.
-- 
(reverse (concatenate 'string ···········@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/sgml.html
"We're thinking about upgrading from SunOS 4.1.1 to SunOS 3.5."
-- Henry Spencer
From: Mike Travers
Subject: Re: self-hosting gc
Date: 
Message-ID: <cabbd060.0203051126.5e6881ef@posting.google.com>
·········@becket.net (Thomas Bushnell, BSG) wrote in message news:<··············@becket.becket.net>...
> ··@mdli.com (Mike Travers) writes:
> 
> > The Jalapeno Java JVM:
> > http://www.research.ibm.com/jalapeno/publication.html#oopsla99_jvm
> > 
> > Squeak Smalltalk:
> > ftp://st.cs.uiuc.edu/Smalltalk/Squeak/docs/OOPSLA.Squeak.html
> 
> Neither of these have a self-hosting GC.

In both cases, the GC is written in a modified version of the
high-level language.  If that's not self-hosting, then you better
explain more precisely what you mean by the term.

If you mean that the GC has to be written in the UNmodified language,
this is pretty obviously unrealistic -- you need forms of memory
access that the pure HLL won't give you.
From: Daniel C. Wang
Subject: Re: self-hosting gc
Date: 
Message-ID: <ug03buhmb.fsf@agere.com>
Let me plug my thesis. 

http://ncstrl.cs.princeton.edu/expand.php?id=TR-640-01

Abstract

 ...

I combine existing type systems with several standard type-based compilation
techniques to write strongly typed programs that include a function that
acts as a tracing garbage collector for the program. Since the garbage
collector is an explicit function, there is no need to provide a trusted
garbage collector as a runtime service to manage memory. Since the language
is strongly typed, the standard type soundness guarantee ``Well typed
programs do not go wrong'' is extended to include the collector, making the
garbage collector an untrusted piece of code. This is a desirable property
for both Java and proof-carrying code systems.

 ...

It extends the work I presented at POPL 2001. I won't make any claims
of practicality until I bite the bullet and build a large scale
system. However, the underlying design is sound. If you're not concerned
about type safety then the work still offers you a recipe of what minimal
abstractions you need to do a GC in your favorite untyped language.
From: Jochen Schmidt
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6b3dq$eql$1@rznews2.rrze.uni-erlangen.de>
Daniel C. Wang wrote:

> It is extends the work I presented at POPL 2001. I won't make any claims
> of practicality until I bite the bullet and build a large scale
> system. However, the underlying design is sound. If you're not concerned
> about type saftey then the work still offers you a recipe of what minimal
> abstractions you need to do a GC in your favoriate untyped language.

Just curious - do you mean Lisp by mentioning the term "untyped language"?


--
http://www.dataheaven.de
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.08.15.45.42.41996.10606@shimizu-blume.com>
On Fri, 08 Mar 2002 15:31:47 -0500, Jochen Schmidt wrote:

> Daniel C. Wang wrote:
> 
>> It is extends the work I presented at POPL 2001. I won't make any
>> claims of practicality until I bite the bullet and build a large scale
>> system. However, the underlying design is sound. If you're not
>> concerned about type saftey then the work still offers you a recipe of
>> what minimal abstractions you need to do a GC in your favoriate untyped
>> language.
> 
> Just curious - do you mean Lisp by mentioning the term "untyped
> language"?

I think he meant "languages where it is possible to violate at runtime
the type abstractions that were present at compile time".  With this,
yes, Lisp is in that class.

Matthias
From: ·······@andrew.cmu.edu
Subject: Re: self-hosting gc
Date: 
Message-ID: <20020308184014.Z16447@emu>
On Fri, Mar 08, 2002 at 03:45:42PM -0500, Matthias Blume wrote:
> On Fri, 08 Mar 2002 15:31:47 -0500, Jochen Schmidt wrote:
> 
> > Daniel C. Wang wrote:
> > 
> >> It is extends the work I presented at POPL 2001. I won't make any
> >> claims of practicality until I bite the bullet and build a large scale
> >> system. However, the underlying design is sound. If you're not
> >> concerned about type saftey then the work still offers you a recipe of
> >> what minimal abstractions you need to do a GC in your favoriate untyped
> >> language.
> > 
> > Just curious - do you mean Lisp by mentioning the term "untyped
> > language"?
> 
> I think he meant "languages where it is possible to violate at runtime
> the type abstractions that were present at compile time".  With this,
> yes, Lisp is in that class.
> 

A brief glance at the CLHS or CLTL2 will reveal a quite rich system of
types.  I think the term you are looking for is "static" in reference
to what he said above, not "untyped", which is a misnomer.

Your definition of "untyped"-ness also reveals a definite "static"-outlook,
which is a different view than that taken by Lispers.  In the static
view (which I am not condemning), there are distinct and separate phases
for compilation and execution.  As a result, no changes may be introduced
into the running program and the consequence is that compile-time type-
invariants are held.  Thus it is possible to claim that a program which
has a type-error must've been programmed in a language that did not
distinguish between types (which "untyped" implies to me) because there is
no way to make further changes that rectify the error.

In the "dynamic" view that Lisps generally favor, there is no time set
aside only for compilation and no time set aside only for execution.
Since it is quite possible to merge newly-compiled code into an already-
running image, the language and compiler need to accommodate the fact
that at some point in the process the type-invariants may be broken
(even if it is only temporarily).  The solution to this, while preserving
the dynamic nature of Lisp, is the excellent condition system found in
CL, which allows you to work with the system even in an "exceptional"
state.  No longer is it possible to claim that a type-error implies
untyped-ness, because it is the nature of the system that incremental
updates may break certain invariants.  A "type-error" means that the
system is in fact performing type-checking and distinguishes between
types, which would be a rather odd thing to be called "untyped".
Furthermore there is a strong sense of object-identity in CL which is
linked to "strong" type-checking.

Please do not confuse "dynamic" with "weak" or "untyped".  There are more
points of view than the static one out there, and some of them suit
certain problem-sets better than others.  There are even languages which
can be considered to be static and weakly typed, such as C.  I find the
two criteria to be orthogonal.
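That last remark about C is worth making concrete: C's checking is
purely static, so once the compiler is satisfied, storage can be
reinterpreted across types with no runtime complaint -- where a Lisp
would signal a type-error, C silently hands you the raw encoding.  A
sketch (the expected bit pattern assumes IEEE-754 single-precision
floats, which is nearly universal):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's storage as an integer.  The compiler accepts
   this, and nothing checks it at runtime: static but weak typing. */
static uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* well-defined type punning */
    return bits;
}
```

On an IEEE-754 machine, float_bits(1.0f) yields 0x3f800000; no
condition is signalled, and the type abstraction is simply gone.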

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
From: Jochen Schmidt
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6bnjj$mp0$1@rznews2.rrze.uni-erlangen.de>
Matthias Blume wrote:

> On Fri, 08 Mar 2002 15:31:47 -0500, Jochen Schmidt wrote:
> 
>> Daniel C. Wang wrote:
>> 
>>> It is extends the work I presented at POPL 2001. I won't make any
>>> claims of practicality until I bite the bullet and build a large scale
>>> system. However, the underlying design is sound. If you're not
>>> concerned about type saftey then the work still offers you a recipe of
>>> what minimal abstractions you need to do a GC in your favoriate untyped
>>> language.
>> 
>> Just curious - do you mean Lisp by mentioning the term "untyped
>> language"?
> 
> I think he meant "languages where it is possible to violate at runtime
> the type abstractions that were present at compile time".  With this,
> yes, Lisp is in that class.

Then I would give our lisp newcomers from the static typed world the
advice that they learn to distinguish between "dynamic strong typing"
and "static strong typing". It is complete nonsense to call lisp
untyped and shows that someone has absolutely no clue about typing at
all.

ciao,
Jochen

--
http://www.dataheaven.de
From: Matthias Blume
Subject: Re: self-hosting gc
Date: 
Message-ID: <pan.2002.03.09.23.02.20.555831.1560@shimizu-blume.com>
On Fri, 08 Mar 2002 21:16:16 -0500, Jochen Schmidt wrote:

> Matthias Blume wrote:
> 
>> On Fri, 08 Mar 2002 15:31:47 -0500, Jochen Schmidt wrote:
>> 
>>> Daniel C. Wang wrote:
>>> 
>>>> It is extends the work I presented at POPL 2001. I won't make any
>>>> claims of practicality until I bite the bullet and build a large
>>>> scale system. However, the underlying design is sound. If you're not
>>>> concerned about type saftey then the work still offers you a recipe
>>>> of what minimal abstractions you need to do a GC in your favoriate
>>>> untyped language.
>>> 
>>> Just curious - do you mean Lisp by mentioning the term "untyped
>>> language"?
>> 
>> I think he meant "languages where it is possible to violate at runtime
>> the type abstractions that were present at compile time".  With this,
>> yes, Lisp is in that class.
> 
> Then I would give our lisp newcomers from the static typed world the
> advice that they learn to decide between "dynamic strong typing" and
> "static strong typing". It is completely nonsense to call lisp untyped
> and shows that someone has absolutely no clue about typing at all.

Thanks for the lecture.  Am always eager to learn.  (It is news to me
that I (or Dan, for that matter) are "lisp newcomers".  Actually, it is
quite hilarious.  I was hacking Lisp when you were 5 years old.
Literally.  And I wrote my first Lisp implementation when you were 7.)

Now, the sad truth is that the word "type" has been used by different
people in quite different ways.  Lisp is "untyped" just like the "untyped
lambda calculus" (an established term in theoretical CS) is "untyped" --
even though we can distinguish between different classes of values.

So, yes, Lisp has "types", but those are not the kind of types Dan was
talking about.  One might be tempted to say that it shows a kind of
cluelessness about types if one does not recognize this fact.

Matthias
From: Jochen Schmidt
Subject: Re: self-hosting gc
Date: 
Message-ID: <a6g5hl$hnd$1@rznews2.rrze.uni-erlangen.de>
Matthias Blume wrote:

> On Fri, 08 Mar 2002 21:16:16 -0500, Jochen Schmidt wrote:
> 
>> Matthias Blume wrote:
>> 
>>> On Fri, 08 Mar 2002 15:31:47 -0500, Jochen Schmidt wrote:
>>> 
>>>> Daniel C. Wang wrote:
>>>> 
>>>>> It is extends the work I presented at POPL 2001. I won't make any
>>>>> claims of practicality until I bite the bullet and build a large
>>>>> scale system. However, the underlying design is sound. If you're not
>>>>> concerned about type saftey then the work still offers you a recipe
>>>>> of what minimal abstractions you need to do a GC in your favoriate
>>>>> untyped language.
>>>> 
>>>> Just curious - do you mean Lisp by mentioning the term "untyped
>>>> language"?
>>> 
>>> I think he meant "languages where it is possible to violate at runtime
>>> the type abstractions that were present at compile time".  With this,
>>> yes, Lisp is in that class.
>> 
>> Then I would give our lisp newcomers from the static typed world the
>> advice that they learn to decide between "dynamic strong typing" and
>> "static strong typing". It is completely nonsense to call lisp untyped
>> and shows that someone has absolutely no clue about typing at all.
> 
> Thanks for the lecture.  Am always eager to learn.  (It is news to me
> that I (or Dan, for that matter) are "lisp newcomers".  Actually, it is
> quite hilarious.  I was hacking Lisp when you were 5 years old.
> Literally.  And I wrote my first Lisp implementation when you were 7.)

You answered that you think Dan meant languages in which it is
possible to "violate" at runtime the type abstractions that were
present at compile time. You may be much older than me and have known
lisp much longer (which doesn't have to imply better...) - but have
you never realized that runtime and compile time are laid out somewhat
differently than in other languages? That it is possible to
temporarily switch to compile time at runtime? That the compiler has
the whole language available and can do arbitrary computations? Did
you never realize that your answer to me simply did not make any sense
in the case of lisp?
You wrote that it is possible to "violate" the type abstractions at runtime 
but this is not even true since you will get type errors.
The term "untyped" has absolutely _nothing_ to do with "runtime" or 
"compile time".

> Now, the sad truth is that the word "type" has been used by different
> people in quite different ways.  Lisp is "untyped" just like the "untyped
> lambda calculus" (an established term in theoretical CS) is "untyped" --
> even though we can distinguish between different classes of values.

You are right that the word "type" is used by different people in
quite different ways. This fact alone in no way means that it is ok to
use it in a way that is misleading out of pure laziness.

"Untyped lambda calculus" is called so because the variables are untyped.
Calling a *language* untyped because it has untyped variables is like 
calling women "unhairy" because they do not have beards. ;-)

But even in the case of "untyped variables" this is not quite true since 
you actually _can_ declare the types of variables in Common Lisp.

> So, yes, Lisp has "types", but those are not the kind of types Dan was
> talking about.  One might be tempted to say that it shows a kind of
> cluelessness about types if one does not recognize this fact.

I advised using more explicit notions because it would make things
easier to understand and more difficult to misunderstand. You seem to
agree somewhat that the simple use of "untyped" is too ambiguous to be
of _any_ value in a discussion. One cannot claim to have a clue about
a _very_ general concept like "types" if he only knows a rather
smallish subset of the whole thing - so my claim, that people who are
unable to see the evil consequences of ambiguous use of the term
"untyped" are clueless, is a valid one.

ciao,
Jochen


--
http://www.dataheaven.de