From: Scott McKay
Subject: GCs (was: Why Isn't Lisp a Mainstream Language?)
Date: 
Message-ID: <19930205171052.1.SWM@SUMMER.SCRC.Symbolics.COM>
    Date: Fri, 5 Feb 1993 11:23 EST
    From: Joshua M Yelon <·······@ehsn11.cen.uiuc.edu>

    In article <············@nz12.rz.uni-karlsruhe.de> Bruno Haible writes:

    >>>The only remaining run-time burden from Lisp
    >>>(vs. C or Pascal) is garbage collection.

    ····@aiai.ed.ac.uk (Jeff Dalton) replies:

    >GC isn't necessarily a burden.  For instance, there must be cases
    >where the overheads of explicitly keeping track of storage in C
    >are greater than the overheads of a good garbage collector.  And
    >using malloc to allocate lists (in a naive way such as one malloc
    >per cons) is almost certainly more expensive than the Lisp allocator.
    >I can imagine that allocating a cons could be reduced to one
    >instruction in Lisp (add 2 to the "next cons" pointer and rely 
    >on a reference to unmapped memory to detect the need for a GC.)

    Unfortunately, what you say is true, but it's missing the bigger
    picture.  Like you say, it should be very fast to allocate cons cell.
    Likewise, it should be very fast to allow the collector to collect it.

    The real cost of a GC, though, isn't the time it takes to allocate and
    deallocate memory.  It's the fact that every other part of the system
    must try very hard not to confuse the GC.

The real cost of not having a GC is the time it takes every programmer
to chase down dangling pointers and memory leaks.  Isn't it better for a
small handful of qualified (or at least interested) people, namely the
people who implement the language and the GC, to do this exactly once,
than it is for every single "end-programmer" to have to do it?

I freely admit that there are times when a GC imposes an unacceptable
overhead, but that is just not the case most of the time.  One rather
interesting thing about modern generational copying GCs is that they
often improve the performance of programs over the long run because the
keep the working set of the program quite compact.

    Here is a random smattering of the things you typically have to do to
    keep the GC unconfused:

	* when you allocate an activation record, you must typically
	  initialize all the local variables, even if the optimizer says
	  that the user isn't going to read them before he writes them,
	  so that the GC doesn't find nonsense data and get confused.

That's a good point.

	* you must never borrow a bit from a pointer.  Lots of tree and
	  graph algorithms require a "bit" or two in the nodes.  In more
	  traditional languages, these bits can often be borrowed from
	  the bottoms of pointers.  However, in lisp, this would confuse
	  the GC, so you must allocate an extra word for the bit.

Why does borrowing a bit from the bottom of the word confuse the GC?  It
must not be very confusing, since most of the Lisp implementations I know
of that run on standard hardware do this.

	* you must typically statically partition the machine registers
	  into those that contain lisp-objects, and those that contain other
	  data, so that the GC knows which values to attempt to traverse.

	* you must often put tags in your stack activation records, so that
	  the GC can recognize and understand the format of the stack.

	* it is generally impossible to maintain pointers to the middle of
	  objects, because the GC cannot find the header.  For example,
	  In C, one can scan an array very rapidly by pointing to the
	  first element, then incrementing the pointer (it doesn't matter
	  whether the programmer or the optimizer wrote the code that does
	  this).  In most lisps, scanning an array takes more operations.

    This list keeps going, but I gotta go...

What you say above is all true, but I think you may be reaching the
wrong conclusion.  I believe that the correct conclusion is that
hardware architectures should support tagged data.  Adding as few as 2
bits of tag to every memory word would have the doubly useful effect of
making GCs easier to implement for *any* language, and having the object
type manifest in each memory word also makes it substantially easier to
prevent software from running wild over bad data.  Symbolics hardware
has done this for years, and it's great.  Of all of the stuff that we've
done over the years at `Bolics, this is the main thing that I wish had
caught on.  It's so cheap to implement, and it buys you so much...

From: Joshua M Yelon
Subject: Re: GCs (was: Why Isn't Lisp a Mainstream Language?)
Date: 
Message-ID: <C1zpL4.3Fw@news.cso.uiuc.edu>
>I freely admit that there are times when a GC imposes an unacceptable
>overhead, but that is just not the case most of the time.

I agree completely.  I love GC's ... but I simply have to accept the fact that
they do have requirements that can slow down a language.  That's a price I'm
willing to pay.

Incidentally, I'm not willing to pay the price for the extra safety, as you
suggest.  I like garbage collectors for one big reason.  Watch:

Suppose I want to write this in C:

linkedlistint append(sublist1, sublist2)
    linkedlistint sublist1;
    linkedlistint sublist2;
    { ...

If I implement this like the LISP version, which makes a copy of
sublist1 and makes the last cell of the copy point to sublist2, then
it will run just as fast as the lisp subroutine.  But, what do you do
when it comes time to dispose the result?  The result contains cons
cells that were created by append, and also cons cells that existed
before, as a part of sublist2.  Any attempt to dispose the list will
also dispose sublist2 as an undesired side-effect.

The standard solutions are:

  S1) Copy both sublists, so that the whole result value is fresh conses,
      so that the person using the result can dispose it when he is done.
      This is slow, because it involves extra copying.  And, in similar
      subroutines which simply append, cons, or otherwise "tack on" a
      little bit to a large existing structure, copying is prohibitive.

  S2) Have the user pass in a buffer in which to build the new cons cells.
      When he no longer wants the new list, he should dispose the buffer.
      This is the solution typically taken by C library routines like
      strcat.  Unfortunately, it requires the caller of the subroutine to
      try to allocate memory for the subroutine, when he doesn't know how
      much memory the subroutine will need.  This leads to lots of large,
      fixed-size buffers.

So what conclusions do I draw?  These:  In C, a subroutine that returns a
new object must somehow tell its caller which parts of the new object
should be disposed. This can either be done by:

    * Filling an auxiliary data structure with this information,
      (ie, the buffer in example S2 above), or,

    * Making sure that the object it returns is all disposable, by
      extra copying (as in S1 above).

Both are undesirable: in the first case, the user of what would otherwise
be an elegant subroutine now has to worry about an auxiliary data structure,
and in the second case, speed is sacrificed.

There is no solution to this problem in C.  There is simply no way to make
fast, elegant append subroutine with correct 'dispose' behavior.
From: Michael Greenwald
Subject: Re: GCs (was: Why Isn't Lisp a Mainstream Language?)
Date: 
Message-ID: <michaelg.728940296@Xenon.Stanford.EDU>
···@stony-brook.scrc.symbolics.com (Scott McKay) writes:

>    Date: Fri, 5 Feb 1993 11:23 EST
>    From: Joshua M Yelon <·······@ehsn11.cen.uiuc.edu>

>    Here is a random smattering of the things you typically have to do to
>    keep the GC unconfused:

>	* when you allocate an activation record, you must typically
>	  initialize all the local variables, even if the optimizer says
>	  that the user isn't going to read them before he writes them,
>	  so that the GC doesn't find nonsense data and get confused.

>That's a good point.

Mr Yelon's point is true, "typically", but it doesn't seem much more
compelling than his others.

You can get around initializing the locals by having the compiler
produce a table of PC-range to templates (a type description of the
activation record at the given PCs, often suggested when people want
to implement run-time type checking on standard hardware without
losing bits on pointers >or< immediates).  In those cases, a simple
bit denoting an entry is a pointer (rather than an immediate) is
sufficient for the GC.  Here, the bit would denote "initialized" or
not.

There's no run-time cost for the user; a bit of extra overhead for the
GC when it scanned the stack, and a bit of space overhead per function
(smaller than debugger info, but, then again, many people view
debugger overhead as unacceptably high.)

>	* you must typically statically partition the machine registers
>	  into those that contain lisp-objects, and those that contain other
>	  data, so that the GC knows which values to attempt to traverse.

(a) Similar template like solution is possible.  
(b) Some older machines had to do that already (address vs. data
registers).  It makes register allocation a little harder, but not
impossible.

>	* you must often put tags in your stack activation records, so that
>	  the GC can recognize and understand the format of the stack.

Again, the stack is already tagged by the return PC, and is amenable
to the same type of tricks above.

>	* it is generally impossible to maintain pointers to the middle of
>	  objects, because the GC cannot find the header.  For example,
>	  In C, one can scan an array very rapidly by pointing to the
>	  first element, then incrementing the pointer (it doesn't matter
>	  whether the programmer or the optimizer wrote the code that does
>	  this).  In most lisps, scanning an array takes more operations.

If the pointer is tagged as a pointer, there are plenty of techniques
that allow the GC to find the object header.  

Even ignoring this, if it is important to scan arrays quickly then a
language vendor can write a subprimitive (or compiler optimizer) that
scans an array using the C technique, while holding a pointer to the
header of the array in an extra register.  The only trick is making
sure the GC relocates the "hidden" pointer if the array is moved out
from under.

Most of the stack issues mentioned above, are also debugger issues.
I'm sure you're talking about "production quality" object code, but
solving these problems in a way that lets you debug finished products
is an added bonus.


I agree that these are issues you don't need to confront if you're not
using a language with a GC.  I'm sure you were aware of the
work-arounds, too (I noticed your cautious use of the word "typical").
Where we differ seems to be in the conclusion: I think most of the
problems you presented are amenable to one-time solutions by language
implementors, and don't >need< to impose a burden on the performance
conscious lisp programmer.  There >are< still arguments about how
useful a GC is, but I don't think these arguments should tip the
scale.