From: Tim Olson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <54iqsk$5ch@cerberus.ibmoto.com>
In article <··········@cerberus.ibmoto.com>
···@apple.com (me) writes:

> ... Is anyone aware
> of any work that has been done in additionally performing some sort of
> aging on the reachable set?

To clarify, when I mentioned "aging", I'm not talking about existing
generational GC schemes, here. Generational GC's migrate "reachable"
objects into successively older generation bins every time they survive
a garbage collection.  However, this generational migration does not
(as far as I know!) take into account whether the object was recently
used or not -- only that it has survived a garbage collection.  I'm
using the term "aging" to differentiate frequently-used objects from
infrequently-used objects.

Consider an infrequently used function that is bound to the global
symbol table at the top level.  Because it exists in this structure, it
is "reachable" (you cannot prove it will not be used, so you cannot
discard it as garbage).  Therefore it will survive all GC's, finding
its way to the oldest generation bin.  Likewise, a very frequently used
function similarly bound at the top level will find its way there. 
Ideally you would like to differentiate the two so that more resources
can be applied to the frequently-used function (e.g. JIT compilation to
native code), and less resources to the infrequently-used function
(e.g. memory compression).


-- Tim Olson
Apple Computer, Inc.
(···@apple.com)

From: Doug Bell
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <dbell-2210961640400001@news2.cts.com>
···@apple.com (Tim Olson) wrote:

> ···@apple.com (me) writes:
> 
> > ... Is anyone aware
> > of any work that has been done in additionally performing some sort of
> > aging on the reachable set?
> 
> To clarify, when I mentioned "aging", I'm not talking about existing
> generational GC schemes, here. Generational GC's migrate "reachable"
> objects into successively older generation bins every time they survive
> a garbage collection.  However, this generational migration does not
> (as far as I know!) take into account whether the object was recently
> used or not -- only that it has survived a garbage collection.  I'm
> using the term "aging" to differentiate frequently-used objects from
> infrequently-used objects.
> 
> Consider an infrequently used function that is bound to the global
> symbol table at the top level.  Because it exists in this structure, it
> is "reachable" (you cannot prove it will not be used, so you cannot
> discard it as garbage).  Therefore it will survive all GC's, finding
> its way to the oldest generation bin.  Likewise, a very frequently used
> function similarly bound at the top level will find its way there. 
> Ideally you would like to differentiate the two so that more resources
> can be applied to the frequently-used function (e.g. JIT compilation to
> native code), and less resources to the infrequently-used function
> (e.g. memory compression).

Tim,

I don't know of any formal work on this, although I'm sure some must
exist, but I have implemented (for a commercial game) a memory manager
which did exactly what you discuss.  The game managed a number of
reproducible resources.  When the manager received a request for a
resource, it checked to see if the resource had already been produced
and was available in the memory pool; if it wasn't, the resource had to
be produced.  The largest class of managed resources was compressed
graphics, which were decompressed on the fly and maintained in the
memory pool until something else needed the memory.  This was developed
and refined over the course of several successful games, with a great
deal of analysis applied to the strategy of choosing which blocks to
reclaim when memory was needed.  (Now don't go thinking that just
because it was for a game it wasn't sophisticated, 'cause you'd be
wrong. :)
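
In outline, the request path looked something like this minimal Python
sketch (names invented, and the reclaim policy here is a placeholder
plain LRU; the paragraphs below refine the policy):

from collections import OrderedDict

class ResourcePool:
    def __init__(self, capacity, produce):
        self.capacity = capacity        # max number of resident resources
        self.produce = produce          # expensive producer, e.g. decompress
        self.pool = OrderedDict()       # key -> data, oldest first

    def request(self, key):
        if key in self.pool:
            self.pool.move_to_end(key)  # note the request for recency
            return self.pool[key]
        data = self.produce(key)        # miss: produce the resource
        while len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)   # reclaim (placeholder: plain LRU)
        self.pool[key] = data
        return data

# e.g.:  pool = ResourcePool(2048, decompress_graphic)
#        sprite = pool.request("title_screen")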

What I discovered from my analysis was that using a single metric to
determine priority among reclaimable blocks wasn't sufficient.  For my
purposes, the single most important metric usually was not how
frequently, but rather how _recently_, a resource had been requested,
because this was a better predictor of how likely it was to be
requested again.  Another consideration, in a few special cases, was
the ratio of CPU resources required to produce the data vs. the memory
resources required to maintain the result.  Certain classes of data
elements which were expensive to produce, but cheap to maintain, were
given a higher priority.  When it came time to reclaim memory, the
lowest-priority element was reclaimed.
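
As a rough illustration of combining those two metrics (my
construction here, not the actual game code; the weighting constant is
invented):

from dataclasses import dataclass

KEEP_WEIGHT = 10.0        # invented tuning constant

@dataclass
class Entry:
    last_requested: int   # time of the most recent request
    produce_cost: float   # CPU cost to (re)produce the data
    size: int             # memory cost to keep the result resident

def eviction_key(e, now):
    # Larger key = better reclaim candidate: requested long ago, and
    # cheap to reproduce relative to the memory it occupies.
    return (now - e.last_requested) - KEEP_WEIGHT * (e.produce_cost / e.size)

Reclaiming the entry with the largest key then corresponds to
reclaiming the lowest-priority element.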

Based on my experience, I expect that it will be very important to get
good information on how the system performs in real-world circumstances
in order to select the best compromise between the various factors in
prioritizing elements for deletion.  It would be hard to overstate this
point, as the characteristics of a chaotic system are difficult to
accurately predict, and poor choices in implementation can result in
thrashing.  (Actually, so can good choices, so it's really just a
matter of degree.)

Thrashing is what I called it when resources were repeatedly discarded
shortly before they were requested again.  Consider the simplest case
where you have three resources, A, B, & C, which are requested in
cyclic order, and memory adequate to hold any two of them.  Then giving
retention priority to the most recently requested resource (i.e.,
always reclaiming the least recently used) results in:

produce A,
produce B,
reclaim A, produce C,
reclaim B, produce A,
reclaim C, produce B,
reclaim A, produce C,
reclaim B, produce A,
reclaim C, produce B,
...

where every request from this point on will result in a resource
production.  Changing the prioritizing algorithm so that the most
recently _produced_ resource is reclaimed first improves the usage to:

produce A,
produce B,
reclaim B, produce C,
use A,
reclaim C, produce B,
reclaim B, produce C,
use A,
reclaim C, produce B,
...

Of course, reclaiming the most recently produced resource first is not
a good scheme for the list as a whole, but perhaps surprisingly, it did
have a place in my final implementation.
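
Both traces can be reproduced with a toy simulation (mine, for
illustration): "lru" reclaims the least recently requested resource,
"mrp" reclaims the most recently produced one.

def simulate(policy, requests, capacity=2):
    resident = []    # order encodes the policy's notion of recency
    log = []
    for r in requests:
        if r in resident:
            log.append("use %s" % r)
            if policy == "lru":
                resident.remove(r)
                resident.append(r)   # a request refreshes recency under LRU
        else:
            if len(resident) >= capacity:
                victim = resident.pop(0) if policy == "lru" else resident.pop()
                log.append("reclaim %s, produce %s" % (victim, r))
            else:
                log.append("produce %s" % r)
            resident.append(r)
    return log

print("\n".join(simulate("lru", "ABCABCABC")))   # every step produces
print("\n".join(simulate("mrp", "ABCABCABC")))   # A stays resident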

What I ended up with was a simple linked list with multiple insertion
points.  It turned out that distinguishing resources based on cost of
production only required two categories: normal and high-priority. 
Resources were deleted from the tail of the list, and inserted at one of
two heads.  The implementation could be generalized to have 'n' different
priority levels, but over-segregating the list could have ramifications
for potential thrashing scenarios.  If an item in the pool was requested,
it was moved to the appropriate insertion point.

The list might look something like:

 tail            insert      high_pri           head
  |                 |           |                 |
  v                 v           v                 v
  R --> R --> R --> R --> R --> R --> R --> R --> R

where:
R is a data resource,
'tail' is the next element to be reclaimed,
'insert' is the insertion point for the next normal-priority resource,
'high_pri' is the insertion point for the next high-priority resource, and
'head' is the last element in the list to be reclaimed.

If a normal-priority resource * was inserted in the list, the list would
look like:

 tail            insert             high_pri           head
  |                 |                 |                 |
  v                 v                 v                 v
  R --> R --> R --> * --> R --> R --> R --> R --> R --> R

Now my particular implementation had clearly defined points (a game
"frame") where the probability of reuse for normal-priority resources
had a discontinuity.  That is to say, a graphic which was requested in
the immediately prior frame was much more likely to be requested in the
current frame than even a graphic from two frames ago.  At the end of
each frame, the 'insert' pointer was set to the 'high_pri' pointer,
such that after the first element X was added in the new frame, the
list would look like:

 tail                             insert  high_pri          head
  |                                   |     |                 |
  v                                   v     v                 v
  R --> R --> R --> * --> R --> R --> X --> R --> R --> R --> R

(High-priority items were not distinguished by frame.)
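
Reconstructing that in Python (my sketch from the description above;
the three lists are the segments between the pointers, with index 0 on
the tail side):

class AgedList:
    def __init__(self):
        self.old = []     # tail .. insert: reclaimed first
        self.normal = []  # insert .. high_pri: this frame's normal items
        self.high = []    # high_pri .. head: expensive-to-produce items

    def touch(self, r, high_priority=False):
        # On request or production, (re)insert r at its insertion point.
        for seg in (self.old, self.normal, self.high):
            if r in seg:
                seg.remove(r)
        # New items enter at the 'insert'/'high_pri' pointer, i.e. at
        # the tail end of their segment rather than at the head.
        (self.high if high_priority else self.normal).insert(0, r)

    def reclaim(self):
        # Take the element at the tail: old frames first, then the
        # current frame's normal items, then high-priority items.
        for seg in (self.old, self.normal, self.high):
            if seg:
                return seg.pop(0)
        return None

    def end_frame(self):
        # "Set 'insert' to 'high_pri'": the whole normal segment ages
        # into the old region; high-priority items are unaffected.
        self.old.extend(self.normal)
        self.normal = []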

An alternative approach would be to simply age the list based on the
number of resource requests (or possibly productions) and periodically
move the insertion point(s) up.  The reason that insertions were not
made at the head of the list was to reduce thrashing.  Consider what
happens when tail == insert: it means that recently produced resources
are being reclaimed.  By inserting recently requested resources at the
end of the recent list instead of at the head, thrashing was reduced in
my case, but YMMV.  If you don't anticipate running where memory is low
enough to initiate thrashing, or you determine through empirical
evidence that this is not a factor, then a simple implementation which
inserts at the head of the list is adequate.

One other thing which I left out to simplify the discussion is that
each resource in the list maintained a frequency count.  This was only
relevant for resources from 'insert' up to (but not including)
'high_pri' (high-priority resources were not frequency sorted).  Each
time a resource was inserted at 'insert', its frequency counter was
incremented if it had come from the (insert, high_pri) segment, or set
to 1 if it had not; the resource was then moved up the list (toward the
head) until the next item above it in the list had the same frequency.
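
As a sketch of that rule (my reading of it), with the normal segment
held as (frequency, resource) pairs, index 0 on the tail side:

def insert_normal(segment, resource, prev_freq=0):
    # prev_freq is the count the resource had if it came from the
    # (insert, high_pri) segment, else 0.
    freq = prev_freq + 1
    i = 0
    # Move up toward the head past lower-frequency items, stopping when
    # the next item above has the same (or a higher) frequency.
    while i < len(segment) and segment[i][0] < freq:
        i += 1
    segment.insert(i, (freq, resource))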

The management was based on a check out/in sequence, so that while a
resource was checked out, it was not in the list and thus not available
for reclamation.  This system served graphics and data to a real-time
game, managing thousands of resources and several megabytes of memory. 
The costs of the resource management were entirely negligible, especially
when compared to the costs of producing resources.  I imagine a JITC would
have similar characteristics in this regard.
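
The check-out/in protocol, again as a hypothetical sketch layered on
the AgedList above: a checked-out resource lives only in 'in_use', so
no reclaim scan can ever touch it.

class CheckedPool:
    def __init__(self, aged, produce):
        self.aged = aged          # an AgedList, as sketched earlier
        self.produce = produce
        self.available = {}       # resident, eligible for reclaim
        self.in_use = {}          # checked out, never reclaimed

    def check_out(self, key, high_priority=False):
        data = self.available.pop(key, None)
        if data is None:
            data = self.produce(key)
        for seg in (self.aged.old, self.aged.normal, self.aged.high):
            if key in seg:
                seg.remove(key)   # not reclaimable while checked out
        self.in_use[key] = (data, high_priority)
        return data

    def check_in(self, key):
        data, high_priority = self.in_use.pop(key)
        self.available[key] = data
        self.aged.touch(key, high_priority)   # back at its insertion point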

Anyway, I imagine my approach is probably not going to fit your
requirements as-is, but there's one story from the field on addressing
this issue.  If you have questions, I'd be glad to try and answer them.

Doug Bell
·····@shvn.com
From: Torben AEgidius Mogensen
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <54koqm$593@vidar.diku.dk>
···@apple.com (Tim Olson) writes:

>Consider an infrequently used function that is bound to the global
>symbol table at the top level.  Because it exists in this structure, it
>is "reachable" (you cannot prove it will not be used, so you cannot
>discard it as garbage).  Therefore it will survive all GC's, finding
>its way to the oldest generation bin.  Likewise, a very frequently used
>function similarly bound at the top level will find its way there. 
>Ideally you would like to differentiate the two so that more resources
>can be applied to the frequently-used function (e.g. JIT compilation to
>native code), and less resources to the infrequently-used function
>(e.g. memory compression).

While not exactly what you describe, I have a student working on a
simulator for FP instructions on a system without FP hardware. The
simulator uses a kind of JIT scheme to generate code for an FP
instruction the first time it is used, and then calls this code on
subsequent uses. There is a fixed-size memory region used for this code,
and when it runs out some kind of GC is required. You have no
reachability information, so a replacement scheme based on LRU or some
other cache-like policy is needed. Initially, though, the plan is to
just clear the entire code cache when it fills. If this works fine, it
will stay that way; otherwise, more complex schemes will be tried. The
bookkeeping for this must be rather small, though, as it will otherwise
slow down the simulation.
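
The flush-on-full bookkeeping is about as small as it gets; a sketch in
Python (the real simulator would generate machine code, but the shape
of the bookkeeping is the same):

class CodeCache:
    def __init__(self, max_entries):
        self.max = max_entries      # stands in for the fixed-size region
        self.cache = {}             # FP instruction -> generated routine

    def get(self, insn, generate):
        routine = self.cache.get(insn)
        if routine is None:
            if len(self.cache) >= self.max:
                self.cache.clear()  # the "GC": throw everything away
            routine = generate(insn)    # JIT step, once per instruction
            self.cache[insn] = routine
        return routine

The per-hit cost is a single lookup; any LRU-like scheme would add
bookkeeping to every hit, which is exactly the overhead to be avoided.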

	Torben Mogensen (·······@diku.dk)
From: Shannon Spires
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <svspire-ya023180002410960113150001@news.nmia.com>
In article <··········@cerberus.ibmoto.com>, ···@apple.com (Tim Olson) wrote:

> In article <··········@cerberus.ibmoto.com>
> ···@apple.com (me) writes:
> 
> > ... Is anyone aware
> > of any work that has been done in additionally performing some sort of
> > aging on the reachable set?
> 
> To clarify, when I mentioned "aging", I'm not talking about existing
> generational GC schemes, here. Generational GC's migrate "reachable"
> objects into successively older generation bins every time they survive
> a garbage collection.  However, this generational migration does not
> (as far as I know!) take into account whether the object was recently
> used or not -- only that it has survived a garbage collection.  I'm
> using the term "aging" to differentiate frequently-used objects from
> infrequently-used objects.
> 
> Consider an infrequently used function that is bound to the global
> symbol table at the top level.  Because it exists in this structure, it
> is "reachable" (you cannot prove it will not be used, so you cannot
> discard it as garbage).  Therefore it will survive all GC's, finding
> its way to the oldest generation bin.  Likewise, a very frequently used
> function similarly bound at the top level will find its way there. 
> Ideally you would like to differentiate the two so that more resources
> can be applied to the frequently-used function (e.g. JIT compilation to
> native code), and less resources to the infrequently-used function
> (e.g. memory compression).

Given the proper hooks, you can add this to an existing GC mechanism
and thus bring it under programmer control. Consider a set of objects,
some of which are more important to keep in RAM than others. Put them
all into a weak hash table or weak array (most Lisp vendors offer
weak structures of some type) so that the GC can collect them if
necessary. In addition, bind the most important ones to a normal
[strong] array. Now write a hook function that runs on completion of
a GC cycle: the hook will check to see if the GC was able to free
as much memory as was requested, and if not, it will detach a few
of the critical objects from the strong array and run the GC again.

The bookkeeping of which structures are "critical" is up to you.

You have to have weak structures, post-GC hooks, and the ability to
run the hook before the GC throws an "out of memory" error. MCL
comes pretty close here. Allegro offers weak structures but I'm not
sure what kind of GC hooks it has. (I'd be interested in finding out.)
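
For what it's worth, the shape of the scheme in Python (illustration
only: CPython's collector is triggered by allocation counts, not memory
pressure, so memory_is_low() below is a hypothetical stand-in for "the
GC could not free as much as was requested"):

import gc, weakref

class Resource:          # user-class instances can be weakly referenced
    pass

cache = weakref.WeakValueDictionary()   # all cached objects, collectible
pinned = []                             # strong refs keep the critical ones

def memory_is_low():
    return False         # stand-in: ask your runtime about memory here

def post_gc_hook(phase, info):
    # Runs after every collection; demote one critical object and retry
    # if the collection did not free enough.
    if phase == "stop" and memory_is_low() and pinned:
        pinned.pop()
        gc.collect()

gc.callbacks.append(post_gc_hook)

obj = Resource()
cache["big-table"] = obj    # collectible once no strong reference remains
pinned.append(obj)          # ...so pin it until the hook demotes it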

--
Shannon Spires
·······@telespin.com