Hybrid RC/GC system to support real-time applications that can't afford pauses?

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Hybrid RC/GC system to support real-time applications that can't afford pauses?
Date: Fri, 11 Jul 2008 15:16:57 +0000
Message-ID: <rem-2008jul11-002@yahoo.com>

Thursday I came up with a new idea (at least new for me anyway):

Most algorithms in Lisp have shared structure but no pointer-loops
whatsoever except for loops that pass through a symbol. Even
algorithms that have explicit loops can often be re-coded to have
symbols at the key points where loops come around, and to
explicitly keep a database/set of such key points.

Furthermore, many algorithms that are commonly done by modifying
existing structure (by RPLACA/RPLACD or setf equivalent thereof)
are actually cleaner if done *without* modifying existing
structure. For example, a self-balancing binary search tree can be
coded such that updates re-build a path of length log(n) while
sharing all old structure off from that main path. Priority queues
implemented as heaps can likewise be implemented in that way (in
fact I actually did that in 2001.Nov, appx. 2.8k bytes total
including liberal comments). Both have the advantage that they are
guaranteed thread-safe as well as history-safe. (With virtually no
extra effort you can keep a log of earlier states of a sbbst or
heap, as many different states as you want, every last one of them
if you really want. This helps debugging and allows arbitrary undo
as far back as you want. You can use undo-by-history of these
structures to implement backtracking if the sbbst keeps the state
of the recursive search.)

The most obvious case where you really do want pointer loops is
when you model a graph (set of nodes with bi-directional links
between various pairs of nodes). But if each node is an uninterned
symbol (with list of links to neighbors hanging off the value cell
or property list), then every pointer-loop passes through such a
symbol, so again it satisfies my model of loop-free structures
except for loops that pass through symbols.

Now we all know if there are no loops then a reference count system
works just fine, and is in fact more efficient than a
garbage-collect system. It's just that it's awfully difficult to
write useful software without any pointer loops whatsoever, such
that a pure reference-count system would really work.

So my proposal is to have a hybrid garbage management system, and
to disallow side-effects *except* for changing the value cell or
property list directly under a symbol. For most structure, you use
reference counts, which allows immediate starting of re-cycling
garbage whenever a symbol's value or property list is updated so
that the reference count of the *former* value or property list is
decreased by one and reaches zero. For the symbols themselves, you
arrange that your application keeps track of which are actively
used. For example, when modeling a graph you usually have a master
list or table of all nodes in the graph, and either you explicitly
remove a node from the graph because you want to, or you discover a
node has no links and *then* remove it. So my idea is that when you
cease "using" a node, you submit to a garbage collector that first
erases both the value cell and property list to NIL then makes sure
nobody is looking at it. (I'm talking about non-interned symbols
here. Any symbol interned in a package is *never* garbage collected
in any case.) So the vast majority of storage that might ever be
recycled, everything except symbols, is handled by reference count,
and the garbage collector for symbols has a much easier time
because it needs consider *only* symbols which have been explicitly
offered for possible collection.

So anyway, yesterday I was working out in my mind how very simple a
separate-thread reference-count garbage-recycling would be, and how
it would require only about four local variables to hold two
emergency pointers plus two pointers to linked-lists acting as
queues of things that need further processing. When a main thread
updates the value (or property list) of a symbol, the only kind of
memory-modification side-effect allowed, it waits for a
lock/semaphore from the reference-count-recycling process, passes
it the pointer to the top CONS cell. The rcr process decrements the
rc, then if it's reached zero it immediately stores the CAR and CDR
pointers into the two emergency variables, and links the old cell
into the free list. Then both of the emergency pointers are
rc-decremented, where if one reaches zero it's linked through a
cell from the free list (the cell that was put there a moment ago)
into a queue of pointers already rc-decremented but not yet
deep-traced, and then with at most one emergency cell still in use
it can simply pretend like *that* pointer was passed to it from a
main process. I think it works, at least it seems to in my mind. Is
there a flaw, or would this incredibly simple design work just fine
so that the rc-recycling process can run just fine as separate
thread from the main threads and require only the slightest pause
in a main thread at the point of handoff of a toplevel value or
property from a symbol? Has anybody seriously considered such a
hybrid design in a Lisp system, maybe even implemented such?

The above description deals only with CONS-trees, not with trees
that have other kinds of branching nodes with more than two links,
such as general arrays. A slight modification of the algorithm
would seem to work there too. You put the pointer to the array in
one emergency variable, and move the pointer *in* the first element
of the array to the other emergency variable, replacing that first
element in the array by a count of how many elements in the array
still need to be traced. One by one as you have space available
another cell of the array is moved to the *other* emergency
variable and the count in the first element decremented again. Then
when that count reaches zero, you are done with the array and can
put *it* on a freelist for array repacking (a little more
complicated than CONS recycling where all cells are the same size
so no fragmentation occurs and no repacking is needed). CLOS
objects likewise can be handled by an enumeration of all the
general-pointer slots, using the *first* to hold the
count-remaining. Ditto with hash tables. I think my algorithm works
in all cases?? (Or I overlooked something crucial.)

(Mostly my idea is a counter to Jon Harrop who insists we need
 concurrent GC but Lisp doesn't have it and it's too difficult to
 implement so that makes Lisp inferior to his favorite language. It
 seems to me that my mostly-RC system would solve his problem and be
 easy to implement. Yeah, I'll miss RPLAC[A|D], but it's not too
 difficult to get the same effect by non-side-effect (except symbol
 values/properties) methods. Well a no-side-effect hash table
 emulated via a sbbst would be fun to implement. Really!)

Re: Hybrid RC/GC system to support real-time applications that can't afford pauses? Rob Warnock
- Re: Hybrid RC/GC system to support real-time applications that can't afford pauses? George Neuner
Re: Hybrid RC/GC system to support real-time applications that can't afford pauses? D Herring

From: Rob Warnock
Subject: Re: Hybrid RC/GC system to support real-time applications that can't afford pauses?
Date: Sat, 12 Jul 2008 02:18:57 +0000
Message-ID: <xfqdnU1TJ6cMieXVnZ2dnUVZ_oTinZ2d@speakeasy.net>

Robert Maas, <··················@spamgourmet.com.remove> wrote:
+---------------
| Most algorithms in Lisp have shared structure but no pointer-loops
| whatsoever except for loops that pass through a symbol.
+---------------

I suspect you are quite incorrect on this. Large graph-based
algorithms routinely have numerous cycles, up-pointers, sibling
pointers, double-linked lists [for easy insertion/deletion], etc., etc.
Jeff Shrager ["BioBike", etc.] might be able to confirm this.

+---------------
| Even algorithms that have explicit loops can often be re-coded
| to have symbols at the key points where loops come around, and to
| explicitly keep a database/set of such key points.
+---------------

Not algorithms that dynamically generate millions of graph nodes
with arbitrary linkages.

+---------------
| Now we all know if there are no loops then a reference count system
| works just fine, and is in fact more efficient than a garbage-collect
| system.
+---------------

Aha! Here's the main flaw! Reference-counting GCs are in fact *NOT*
more efficient than either copying or mark&sweep GCs!! Ref-counting
requires *stores* to the ref counts as they are modified, and this
causes read/modify/write pipeline stalls. Game over.

[Remainder ignored, being based on a fallacious assumption...]


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: George Neuner
Subject: Re: Hybrid RC/GC system to support real-time applications that can't afford pauses?
Date: Sat, 12 Jul 2008 04:54:50 +0000
Message-ID: <nqdg745aprlcc03cjt0jj4hvac5f9kn0k5@4ax.com>

On Fri, 11 Jul 2008 21:18:57 -0500, ····@rpw3.org (Rob Warnock) wrote:

>Robert Maas, <··················@spamgourmet.com.remove> wrote:
>
>+---------------
>| Now we all know if there are no loops then a reference count system
>| works just fine, and is in fact more efficient than a garbage-collect
>| system.
>+---------------
>
>Aha! Here's the main flaw! Reference-counting GCs are in fact *NOT*
>more efficient than either copying or mark&sweep GCs!! Ref-counting
>requires *stores* to the ref counts as they are modified, and this
>causes read/modify/write pipeline stalls. Game over.

Except for the interesting case of using just 1 bit to distinguish
whether or not the cell is shared.  Where most cells are not shared,
this type of system can be implemented very efficiently.

See:
Jones and Lins, "Garbage Collection: Algorithms for Automatic
Dynamic Memory Management", ISBN 0-471-94148-4.

George
--
for email reply remove "/" from address

From: D Herring
Subject: Re: Hybrid RC/GC system to support real-time applications that can't afford pauses?
Date: Sat, 12 Jul 2008 01:58:20 +0000
Message-ID: <V_qdnbr9qP4ikuXVnZ2dnUVZ_j2dnZ2d@comcast.com>

Robert Maas, http://tinyurl.com/uh3t wrote:
> Thursday I came up with a new idea (at least new for me anyway):

It seems interesting; but, coming from the boost::shared_ptr world, 
I'm not convinced reference counting is so hot.  In particular, 
thread-safe CAS operations every time you move something can become 
costly.  I find myself passing shared pointers (or their contents) by 
reference just to avoid the wasted overhead.

For instance, a function that conses a bunch of stuff and then returns 
a small result will generally do better in a GC than it would if those 
conses were reference counted.

But I totally agree with the sentiment; it seems that there should be 
some way for "advanced users" to give useful hints to the gc and 
locally change its behavior.  There seems to be a lot of room for 
experimentation and "cooperative garbage collection" (if there is such 
a term; I'm thinking roughly like custom allocators in C++ -- rarely 
used because they're hard to set up, but lots of potential).

- Daniel