From: ··············@gmail.com
Subject: The joy of garbage collection
Date: 
Message-ID: <1128008720.683586.58350@o13g2000cwo.googlegroups.com>
Hi. Sorry if this is a bit of topic, but I thought people here would be
more interested than people in comp.lang.c
I'm writing a simple lisp interpreter, just for the sheer joy of it,
and I've fallen down a bit at the memory management stage. I want to
implement a simple mark and sweep garbage collector, and I understand
how to do the mark part (having dillegently studied Knuth), but it
seems that the sweep bit is incompatible with storing unboxed data in
the heap, which I would like to do. Once the gc has marked all of the
objects that are active, how does it free up the rest, and in what
sense are they free. Does the program have to maintain a list of free
memory somewhere, and if so, how?
Thanks in advance

From: MSCHAEF.COM
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <WeidndEoRu1t3KHeRVn-jQ@io.com>
In article <·······················@o13g2000cwo.googlegroups.com>,
··············@gmail.com <··············@gmail.com> wrote:
  ...
>Does the program have to maintain a list of free
>memory somewhere, 

Yes.

>and if so, how?

SIOD does it with a linked list of cells on the heap.

allocate = remove a cell from the freelist, set its type tag appropriately

mark = traverse the heap, setting a mark bit

sweep = scan the heap. Every unmarked cell that has a type tag
other than TC_FREE_CELL needs to be added to the free list and
have its type tag set to TC_FREE_CELL.

The heap itself is an array or arrays of a C union.

There are a number of significant downsides to the approach, but its 
simple to read/maintain.

-Mike
-- 
http://www.mschaef.com
From: Dave Baum
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <Dave.Baum-C0837F.16482929092005@newshost.mot.com>
In article <·······················@o13g2000cwo.googlegroups.com>,
 ···············@gmail.com" <··············@gmail.com> wrote:

> Hi. Sorry if this is a bit of topic, but I thought people here would be
> more interested than people in comp.lang.c
> I'm writing a simple lisp interpreter, just for the sheer joy of it,
> and I've fallen down a bit at the memory management stage. I want to
> implement a simple mark and sweep garbage collector, and I understand
> how to do the mark part (having dillegently studied Knuth), but it
> seems that the sweep bit is incompatible with storing unboxed data in
> the heap, which I would like to do. Once the gc has marked all of the
> objects that are active, how does it free up the rest, and in what
> sense are they free. Does the program have to maintain a list of free
> memory somewhere, and if so, how?
> Thanks in advance

How you sweep the objects depends on how you allocate them.  For 
example, if each new object results in a call to malloc(), then you can 
call free() to dispose of the object when it is no longer needed.

In general, malloc()/free() is probably not your best bet for 
allocating/deallocating many small objects over and over again.   This 
is especially true if large groups of the objects are of uniform size 
(such as the cons cells in a lisp interpreter).  In such cases you 
should maintain your own pool of free cons cells - perhaps allocating 
1000 of them at a time from the heap using malloc(), then handing them 
out one at a time to the lisp interpreter.

However, if you're just writing the interpreter for fun, then don't 
worry about the performance and get things working with malloc()/free().

If you want to focus on other aspects of the interpreter and not worry 
about GC at all, then you can use a conservative garbage collector for C 
such as the one described at 
http://www.hpl.hp.com/personal/Hans_Boehm/gc/

Dave
From: Alan Crowe
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <86zmpvbzl2.fsf@cawtech.freeserve.co.uk>
robbie.carlton wrote:
> I'm writing a simple lisp interpreter, just for the sheer
> joy of it, and I've fallen down a bit at the memory
> management stage.

You have to think very hard about the exact scope of
learning exercise. The usual idea is to inherit the garbage
collection from the native environment. You write a
meta-circular interpreter, without a garbage collecter, and
the underlying lisp takes care of it for you.

If you specifically want to learn about garbage collection,
well I'm not so sure. Perhaps you need to allocate a big
array and store things in it appropriate to the imaginary
word of your software machine. So maybe a cons is defined by
a defstruct

(defstruct cons mark car cdr)

and car and cdr are always pointers

(defstruct pointer tag mark address)

where address is a number that indexes your big
array. (something like that, I've never tried it)

Then the underlying lisp never needs more memory than the
space required for big array, times a small constant
factor. All the things you put in it don't have pointers
themselves, just "addresses" = numbers in your software
machine, so they don't keep anything else alive.

If you write nils to the big array, then the underlying lisp
can collect the objects thats used to be stored there (once
you have written nil to the last reference.) but I don't
think you need this.

> but it seems that the sweep bit is incompatible with
> storing unboxed data in the heap

I'm very confused here because there are two heaps on the
go. If you have decided to make your educational toy work at
the level of 32-bit words and your big array is

(make-array memory-size :element-type '(unsigned-byte 32))

many lisp implementations will store the unsigned bytes
unboxed. The array is on the heap, and gets collected all in
one go. The elements cannot point anywhere, they are just
numbers.

Meanwhile, inside your education toy, they are not just
numbers. Well maybe the lsb is your tag bit and even ones
are your fixnums, and odd ones are pointers and point
elsewhere in the "heap". You traverse these pointers during
marking. I don't see how they could be on the heap
individually and unboxed, where would the mark bit go, how
could sweep know to leave them?

Maybe the bit you are missing is the free list. Sweep writes
stuff into the cells it sweeps up, organising them into a
linked list. It needs just one cell for the start of the
free list. The rest of the free list hosts itself in the big
array. 

I hope I am remembering this right, it is 23 years ago that
I did a Schorr Waite thingy in 6809 assembler. If you want
to learn this stuff today it is an excellent idea to do it
in a high level language, but you will have to fight the
level confusion ie your heap versus the underlying
implementations heap.

Alan Crowe
Edinburgh
Scotland
From: Pascal Bourguignon
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <87wtkz3kft.fsf@thalassa.informatimago.com>
···············@gmail.com" <··············@gmail.com> writes:

> Hi. Sorry if this is a bit of topic, but I thought people here would be
> more interested than people in comp.lang.c
> I'm writing a simple lisp interpreter, just for the sheer joy of it,
> and I've fallen down a bit at the memory management stage. I want to
> implement a simple mark and sweep garbage collector, and I understand
> how to do the mark part (having dillegently studied Knuth), but it
> seems that the sweep bit is incompatible with storing unboxed data in
> the heap, which I would like to do. Once the gc has marked all of the
> objects that are active, how does it free up the rest, and in what
> sense are they free. 

They're free to be allocated for new objects.

> Does the program have to maintain a list of free
> memory somewhere, and if so, how?

Not the program, the memory management module.  

It can be done however you want.  You can keep a linked list of free
memory cells, or you can move down marked objects at one side of the
heap, and therefore have a big free block on the other side of the
heap, or you can keep a separate bitmap of free memory cells, or
anything else.



If you want to reuse, or just read an example of a simple mark and
sweep gc, have a look at:

cvs -z3 -d ··················@cvs.informatimago.com:/usr/local/cvs/public/chrooted-cvs/cvs co common/common-lisp/heap.lisp

http://www.informatimago.com/develop/lisp/


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.
From: Ulrich Hobelmann
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <3q2lpqF74u3dU1@individual.net>
As Alan wrote, it's easiest to just write your Lisp inside Lisp, so you 
get all the benefits from a good compiler and a performant garbage 
collector for your language (and for free!).

If you want to experiment with garbage collection, however, I suggest 
you read some nice papers on memory management from this page:
http://www.cs.utexas.edu/users/oops/papers.html
especially "Uniprocessor Garbage Collection Techniques" and "Dynamic 
Storage Allocation: A Survey and Critical Review", maybe the second one 
before the first if you want to know how other memory managers (malloc) 
are implemented.

To answer your free-list question: it depends how you manage memory.  If 
you get your memory blocks from malloc(), then the GC's sweep-phase 
should free() the blocks.  If you roll your own memory manager on large 
blocks (that malloc() or some other means provides), then of course you 
need your own mechanism (such as free-lists) to make sure that that 
memory is available for future allocation requests.

-- 
Do or do not.  There is no try.
   Yoda
From: ··············@gmail.com
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <1128021097.355163.32190@z14g2000cwz.googlegroups.com>
Thanks for the replies folks, there's definitely some things I can use
in there. I'm implementing it in c, which I should have made clearer,
so I can't rely on the native environment for garbage collection. And
yes, one of the objectives is to get an understanding of garbage
collection


The gc papers look good, I've been looking for something like that, but
there's a lot of extremely technical papers out there that assume much
prior knowledge. It's good to have a place to start.

Also, the free list is the part I was missing, and the various ideas
for implementing that are interesting. I especially like the bitmap
idea.

"I don't see how they could be on the heap
individually and unboxed, where would the mark bit go, how
could sweep know to leave them?"

Yes! This is what I still don't understand. Where do you put the
unboxed data, as it can't go on the heap. I think I could now implement
a very simple garbage collector if everything was boxed, but I'd like
strings, arrays and various c data structures as unboxed data. Does
this make sense?
From: Pascal Bourguignon
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <87zmpvzmxe.fsf@thalassa.informatimago.com>
···············@gmail.com" <··············@gmail.com> writes:

> Thanks for the replies folks, there's definitely some things I can use
> in there. I'm implementing it in c, which I should have made clearer,
> so I can't rely on the native environment for garbage collection. And
> yes, one of the objectives is to get an understanding of garbage
> collection
>
>
> The gc papers look good, I've been looking for something like that, but
> there's a lot of extremely technical papers out there that assume much
> prior knowledge. It's good to have a place to start.
>
> Also, the free list is the part I was missing, and the various ideas
> for implementing that are interesting. I especially like the bitmap
> idea.
>
> "I don't see how they could be on the heap
> individually and unboxed, where would the mark bit go, how
> could sweep know to leave them?"
>
> Yes! This is what I still don't understand. Where do you put the
> unboxed data, as it can't go on the heap. I think I could now implement
> a very simple garbage collector if everything was boxed, but I'd like
> strings, arrays and various c data structures as unboxed data. Does
> this make sense?

The bitmap, or separate typecode, are not the fastest solutions.  But
if you want to mix easily C with Lisp, they might be worthwhile.

A somewhat intermediate solution between boxed and separate typecode,
is to assign type to each page.  You can have pages of int32, pages of
char, pages of float, etc.  But for random C structures, you can only
use separate typecodes, and this means that you have to duplicate
about all processing.

Otherwise, for C values, you can use/reimplement a conservative GC
like BoehmGC.


From the lisp side, I would keep all values boxed.  C values can come
from a C heap, or C static data or C stack.  But all C values can be
refered to with a pointer.  So you can have a boxed and typed pointer
to refer to C values from the Lisp side.  Do you need to GC C values?
I guess not.  If you mean to use existing C libraries, they do their
own memory management.

(make-c-pointer c-type c-address) --> c-locative
(type-of c-locative)              --> 'C-LOCATIVE
(type-of-c-value c-locative)      --> c-type
(c-value c-locative)              --> lisp-value
(c-value c-locative) is a place.

So you wouldn't need to manage unboxed C values in your lisp heap.


-- 
"Our users will know fear and cower before our software! Ship it!
Ship it and let them flee like the dogs they are!"
From: Sylvain
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <vvqdneh9LblHfaHeRVn-jA@speakeasy.net>
··············@gmail.com wrote:
> The gc papers look good, I've been looking for something like that, but
> there's a lot of extremely technical papers out there that assume much
> prior knowledge. It's good to have a place to start.

you might want to look at Richard Jones' book "Garbage
Collection" and home page:

http://www.cs.kent.ac.uk/people/staff/rej/gc.html

--Sylvain
From: Oleg A. Paraschenko
Subject: Re: The joy of garbage collection
Date: 
Message-ID: <20050930100015.56a2145d.usenet@datahansa.com>
  Hi Robbie,

On 29 Sep 2005 08:45:20 -0700
···············@gmail.com" <··············@gmail.com> wrote:

> Hi. Sorry if this is a bit of topic, but I thought people here would be
> more interested than people in comp.lang.c
> I'm writing a simple lisp interpreter, just for the sheer joy of it,
> and I've fallen down a bit at the memory management stage.

  Look at the documentation for Guile (a Scheme interpreter written in C).
A chapter on garbage collection is good. It briefly explains mark&sweep,
issues and technical implementation. You can borrow some ideas.

-- 
Oleg Paraschenko  ····@ http://uucode.com/
http://uucode.com/blog/  Generative Programming, XML, TeX, Scheme