From: Tim Olson
Subject: Garbage Collection and Aging
Date: 
Message-ID: <54ii57$q0o@cerberus.ibmoto.com>
Garbage collectors usually divide memory into two sets: unreachable
(provably unused) and reachable (not provably unused).  Is anyone aware
of any work that has been done in additionally performing some sort of
aging on the reachable set?  What I'm thinking of here is GC support
for knowing what objects are recently used vs. not recently used, so
that space-time tradeoffs can be performed automatically (e.g. JIT
compilation of frequently-used functions, discarding generated code for
infrequently used functions, or compression of infrequently-used
objects).


-- Tim Olson
Apple Computer, Inc.
(···@apple.com)

From: James McCartney
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <james-2210961400580001@slip-80-6.ots.utexas.edu>
In article <··········@cerberus.ibmoto.com>, ···@apple.com (Tim Olson) wrote:

> Garbage collectors usually divide memory into two sets: unreachable
> (provably unused) and reachable (not provably unused).  Is anyone aware
> of any work that has been done in additionally performing some sort of
> aging on the reachable set?  What I'm thinking of here is GC support
> for knowing what objects are recently used vs. not recently used, so
> that space-time tradeoffs can be performed automatically (e.g. JIT
> compilation of frequently-used functions, discarding generated code for
> infrequently used functions, or compression of infrequently-used
> objects).


It would seem that to keep track of that information you'd have to execute
some extra code at each access, which would be expensive.

   --- james mccartney     ·····@clyde.as.utexas.edu    ·····@lcsaudio.com
If you have a PowerMac check out SuperCollider, a real time synth program:
ftp://mirror.apple.com//mirrors/Info-Mac.Archive/gst/snd/super-collider-demo.hqx
From: Mark Tillotson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <MARKT.96Oct22212224@casper.harlqn.co.uk>
···@apple.com (Tim Olson) wrote:
> Garbage collectors usually divide memory into two sets: unreachable
> (provably unused) and reachable (not provably unused).  Is anyone aware
> of any work that has been done in additionally performing some sort of
> aging on the reachable set?  What I'm thinking of here is GC support
> for knowing what objects are recently used vs. not recently used, so
> that space-time tradeoffs can be performed automatically (e.g. JIT
> compilation of frequently-used functions, discarding generated code for
> infrequently used functions, or compression of infrequently-used
> objects).

I remember a paper about a system on the T.I. Explorer (I think), in
which the GC had a training mode, which activated a read barrier.
This was used to improve the locality of objects which temporally [sic]
reference each other (in training mode the GC copies everything on
first mutator read/write, thus clustering related objects into the
same page(s)).

However, I can't find a reference to this amongst the bibliographies...

I think the conclusion was that you can shrink the working set by a
factor of two this way, if you are prepared to put up with the slow
response during a training period.  I think the other conclusion was
that this technique allowed a heap to be 4 times larger than would
otherwise be bearable in a given amount of real memory under VM
paging.

For the JIT code-cache, you don't really need this generality
(read-barrier hooks), as it is easy to maintain a call count or
timestamp in the function object, and use this in deciding when to
interpret/code-generate, as well as when flushing old stuff from the
code-cache. 
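
For concreteness, a minimal sketch of such a counter-based policy, in C
(hypothetical names and thresholds, not taken from any particular VM):

    /* Sketch of a counter-based JIT policy (hypothetical names and
       thresholds, not from any particular VM). */

    #include <time.h>

    typedef struct Function {
        const unsigned char *bytecode;    /* always kept               */
        void (*native)(void);             /* NULL until JIT-compiled   */
        unsigned long        call_count;  /* bumped on every call      */
        time_t               last_call;   /* consulted when flushing   */
    } Function;

    enum { JIT_THRESHOLD = 1000 };

    extern void (*jit_compile(const unsigned char *bytecode))(void);
    extern void interpret(const unsigned char *bytecode);
    extern void release_native_code(void (*code)(void));

    static void call_function(Function *f)
    {
        f->call_count++;
        f->last_call = time(NULL);
        if (f->native == NULL && f->call_count >= JIT_THRESHOLD)
            f->native = jit_compile(f->bytecode);
        if (f->native != NULL)
            f->native();
        else
            interpret(f->bytecode);
    }

    /* When the code cache fills up, discard native code for the least
       recently called function; the bytecode remains, so it simply
       falls back to the interpreter until it gets hot again. */
    static void evict(Function *victim)
    {
        release_native_code(victim->native);
        victim->native = NULL;
        victim->call_count = 0;
    }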

There might be something useful about this in the paper on the Self
dynamic recompilation technology, obtainable from
    http://self.sunlabs.com/papers/third-generation.html

__Mark
[ ·····@harlequin.co.uk | http://www.harlequin.co.uk/ | +44(0)1954 785433 ]
[ personal homepage http://utter.chaos.org.uk/~markt/ |   fax "  " 785444 ]
From: Kelly Murray
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <54k0cc$kuc@sparky.franz.com>
> For the JIT code-cache, you don't really need this generality
> (read-barrier hooks), as it is easy to maintain a call count or
> timestamp in the function object, and use this in deciding when to
> interpret/code-generate, as well as when flushing old stuff from the
>code-cache. 


I don't have any references, but I believe that PARC used a
byte-code-to-native compiler for Smalltalk, but gave up on it.  If I
recall correctly, they found it was essentially compiling everything,
bloating the image dramatically.  This increases the working set, perhaps
slowing everything down through additional page faulting.

It's also my sense, from thinking about it, that the need for speed of a
function is not highly related to how often it is used.  User-interface
stuff can be executed very often, yet be quite fast enough as a
byte-compiled function.  Some searching function might only be executed
once a day, but it should be very fast the first time it's called.

-kelly murray  ···@franz.com http://www.franz.com


	
From: Jeffrey B. Siegal
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <jbs-2310960312550001@dial-sf2-22.iway.aimnet.com>
In article <··········@sparky.franz.com>, ···@franz.com wrote:
> It's also my sense from thinking about it,
> that the need for speed of a function is not highly 
> related to how often it is used.  User-interface stuff can
> be execute very often, but being quite fast enough
> as a byte-compiled function.  Some searching function might only
> be executed once a day, but it should be very fast the
> first time it's called..

In Scheme, looping is represented by function calls (often tail-recursive).
A searching function almost certainly has some internal loop, which would
mean that an internal function is being called many times.  That function
could easily be recognized as needing compilation (or other optimization).

User interface functions will often not contain loops, or will contain
loops that are executed relatively few times.
From: Matt Kennel
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <54m27b$o29@gaia.ns.utk.edu>
Jeffrey B. Siegal (···@quiotix.com) wrote:
: In article <··········@sparky.franz.com>, ···@franz.com wrote:
: > It's also my sense from thinking about it,
: > that the need for speed of a function is not highly 
: > related to how often it is used.  User-interface stuff can
: > be execute very often, but being quite fast enough
: > as a byte-compiled function.  Some searching function might only
: > be executed once a day, but it should be very fast the
: > first time it's called..

: In Scheme looping is represented by function calls (often tail
: recursive).  A searching function almost certainly has some internal loop
: which would mean that an internal function is being called many times. 
: That function could easily be recognized as needing compilation (or other
: optimization).

In my experience with trying to get high performance out of OO programs
in Sather, the most important optimization for those really low-level loops
which must go fast is very aggressive inlining and subsequent data flow
analysis and loop hoisting.

I feel that programmer-directed hints would be useful for optimization
even in an architecturally neutral distribution format.  I.e.: if you
can, optimize *this* section into machine code. 

: User interface functions will often not contain loops, or will contain
: loops that are executed relatively few times.

--
Matthew B. ··········@caffeine.engr.utk.edu/I do not speak for ORNL, DOE or UT
Oak Ridge National Laboratory/University of Tennessee, Knoxville, TN USA/ 
  I would not, could not SAVE ON PHONE,    |==================================
  I would not, could not BUY YOUR LOAN,    |The US Government does not like
  I would not, could not MAKE MONEY FAST,  |spam either.  It is ILLEGAL!
  I would not, could not SEND NO CA$H,     |USC Title 47, section 227
  I would not, could not SEE YOUR SITE,    |p (b)(1)(C) www.law.cornell.edu/
  I would not, could not EAT VEG-I-MITE,   | /uscode/47/227.html
  I do *not* *like* GREEN CARDS AND SPAM!  |==================================
               M A D - I - A M!
From: Michael Allen Latta
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <01bbc0e5$28e412a0$0929a9cd@latta.hologisys.com>
The current ParcPlace virtual machine does use a byte-code-to-native
compiler.  There is a "code cache" that holds a configurable amount of
native code for execution of methods.  I am not sure whether all methods
must be compiled to run or only selected ones are compiled to native code.
This cache, however, adds about a meg to the working set and removes any
chance of code locality.

I hope that Java goes the route of fully compiled applications, with
keywords added to direct the optimization of the JIT compiler, or that the
new technology shown at OOPSLA makes it to market.  A company indicated
that they have a VM that is 2 times faster than current JITs and 3.5 to 5
times faster than current Smalltalk VMs.  They used some of the Self
research work to provide global on-the-fly optimizations, and they tweaked
the garbage collection system.
-- 
Michael Allen Latta
Hologic Systems International
······@hologisys.com

Kelly Murray <···@math.ufl.edu> wrote in article
<··········@sparky.franz.com>...
> I don't have any references, but I believe that PARC used
> a byte-code-to-native compiler for SmallTalk, but gave up on it.
> If I recall correctly, they found it was essentially compiling
> everthing, bloating the image dramatically.  
> This increases the working set, perhaps slowing down everything
> from additional page faulting.
> 
> It's also my sense from thinking about it,
> that the need for speed of a function is not highly 
> related to how often it is used.  User-interface stuff can
> be execute very often, but being quite fast enough
> as a byte-compiled function.  Some searching function might only
> be executed once a day, but it should be very fast the
> first time it's called..
> 
> -kelly murray  ···@franz.com http://www.franz.com
> 
> 
> 	
> 
From: Douglas Johnson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <326e56e3.3947526@nntp.ix.netcom.com>
·····@harlqn.co.uk (Mark Tillotson) wrote:

>···@apple.com (Tim Olson) wrote:
>> What I'm thinking of here is GC support
>> for knowing what objects are recently used vs. not recently used, so
>> that space-time tradeoffs can be performed automatically (e.g. JIT
>> compilation of frequently-used functions, discarding generated code for
>> infrequently used functions, or compression of infrequently-used
>> objects).
>
>I remember a paper about a system on the T.I. Explorer (I think), in
>which the GC had a training mode, which activated a read barrier.
>This was used to improve the locality of objects which temporally [sic]
>reference each other (in training mode the GC copying everything on
>first mutator read/write, thus clustering related objects into the
>same page(s))
>
>However I can't find reference this amongst the bibliographies...

Sorry, I don't have the exact citation, but the paper is by Bob Courts
and was in the September 1988 Communications of the ACM.

>I think the conclusion was that you can shrink the working set by a
>factor of two this way, if you are prepared to put up with the slow
>response during a training period.  I think the other conclusion was
>that this technique allowed a heap to be 4 times larger than would
>otherwise be bearable in a given amount of real memory under VM
>paging.

For a performance analysis of the technique, see my paper in the
proceedings of the 1991 ASPLOS conference.  I don't have the exact
citation here, either.  I'm on the road away from my library.  Your
summary is not bad, given that the actual performance varies widely
based on problem and memory size.  Note that there really isn't a
"training period".  The moving of objects is an artifact of the Baker
style collector anyway.  The temporal collector just uses it to
optimize virtual memory usage.

>For the JIT code-cache, you don't really need this generality
>(read-barrier hooks), as it is easy to maintain a call count or
>timestamp in the function object, and use this in deciding when to
>interpret/code-generate, as well as when flushing old stuff from the
>code-cache. 

Agreed.  

-- Doug
From: Richard B. Winslow
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <555ha8$r4u@mtinsc01-mgt.ops.worldnet.att.net>
Please forgive my outdated and fragmentary knowledge, but I'm curious about
what's happening in this area; in particular, whether any advances or changes
in philosophy have taken place since I was doing work on it (around '88-'90).

When last I had occasion to look into it, I recall that a guy named Ungar from
MIT had done some excellent heuristic analysis of aging in generational
collectors, and had very accurately characterized the optimal conditions for
retiring (graduating) objects.  His thesis was released by MIT press under a
title something like "The Design and Evaluation of a High Performance Smalltalk
System," and he went on to work for ParcPlace.  This was several years ago, and
I don't have the book handy (nor Lieberman & Hewitt's papers on the
generational strategy, which Ungar's work was based upon), so I may be
completely off-base here, but as I recall, these approaches were based entirely
on aging, and only indirectly dealt with the issue of locality of reference
(i.e., generational schemes would "favor" locality of reference by virtue of
the affinity among similarly-aged objects).  As I remember it, the rationale
had to do with the minimization of copying (and thus reduction of real memory
required) yielding more net benefit than simply trying to keep related objects
together in real memory.  If this is accurate, then the age of an object was
the most important factor motivating its placement in memory, and the issues
that I think were touched upon earlier in this thread (surrounding locality of
reference) weren't considered at the same level.

So, what *is* going on in this area, these days?  Has there been further work
on copy collector efficiencies?  Other strategies?  I had heard, early in
Java's hype cycle, that the Sun implementation used a simple mark 'n' sweep
reference counter, but that other implementations (notably Microsoft and
Asymetrix) were exploring more modern garbage collection technologies.
From: William D Clinger
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <3277716E.5EE@ccs.neu.edu>
Berkeley is not MIT, nor is Stanford ParcPlace.

·······@loop.com (Richard B. Winslow) writes:
> [David Ungar] had very accurately characterized the optimal
> conditions for retiring (graduating) objects.

No one has done this, nor will anyone do this.  The optimal
conditions depend not only on the program, but also on its
inputs.  For example, no garbage collector can possibly
predict the order in which windows will be closed by an
interactive user.

It appears that the most widely applicable heuristic is to
assume that most newly allocated objects will become garbage
sooner than most objects that have survived for some time.
Nonetheless this heuristic fails for many real programs.  For
example, it fails for a simple multipass compiler that builds
an abstract syntax tree, performs static checking, and then
generates code by a recursive tree-walk.

What Ungar actually did was

  *  to notice that the generational aspect of the
     Lieberman-Hewitt algorithm did not depend on its
     real-time aspects,
  *  to build an actual generational collector, and
  *  to refine the heuristics suggested by Lieberman and Hewitt.

For example, Ungar's nursery idea improves the locality of
allocation.

Although age is the most common criterion for assigning objects
to generations, many other criteria can be and often are used.
For example, several collectors keep objects that contain pointers
separate from objects that don't.

William D Clinger
From: Henry Baker
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <hbaker-3110960951020001@10.0.2.1>
In article <············@ccs.neu.edu>, ····@ccs.neu.edu (Will Clinger) wrote:

> It appears that the most widely applicable heuristic is to
> assume that most newly allocated objects will become garbage
> sooner than most objects that have survived for some time.
> Nonetheless this heuristic fails for many real programs.  For
> example, it fails for a simple multipass compiler that builds
> an abstract syntax tree, performs static checking, and then
> generates code by a recursive tree-walk.
> 
> What Ungar actually did was
>   *  to notice that the generational aspect of the
>      Lieberman-Hewitt algorithm did not depend on its
>      real-time aspects,
>   *  to build an actual generational collector, and
>   *  to refine the heuristics suggested by Lieberman and Hewitt.
> 
> For example, Ungar's nursery idea improves the locality of
> allocation.

Ungar et al. also later noticed that it was impossible to guess the
appropriate _size_ for the nursery a priori.  This is kind of like the
situation in image processing, where you can do a lot with thresholding,
but the appropriate threshold is different for every image, and there aren't
any good heuristics for finding it.

See

ftp://ftp.netcom.com/pub/hb/hbaker/YoungGen.html  (also .ps.Z)

for a discussion of infant mortality.
From: Paul Wilson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <55d41u$42e@roar.cs.utexas.edu>
In article <·······················@10.0.2.1>,
Henry Baker <······@netcom.com> wrote:
>In article <············@ccs.neu.edu>, ····@ccs.neu.edu (Will Clinger) wrote:
>
>Ungar, et al, also later noticed that it was impossible to guess the
>appropriate _size_ for the nursery, a priori.  This is kind of like the
>situation in image processing, where you can do a lot with thresholding,
>but the appropriate threshold is different for every image, and there aren't
>any good heuristics for finding it.

On many modern, fast machines, a good guess as to the size of the youngest
generation is to make it somewhat smaller than your second-level cache,
so that the very-frequently used area of memory stays in the cache.
This is the same basic idea that generational GC was invented for at
the VM level, applied to cache memories---which are now similar in
size to the main memories of the era when generational GC was invented.

This doesn't work for caches that are smaller than hundreds of KB,
because the tracing costs go up more than the locality is worth.

It's also not as critical if you have a write-validate cache.  Most
machines people actually build don't seem to have write-validate
caches, so all other things being equal, keeping the youngest generation
within your cache seems to be a good idea.  On slow machines, this
isn't critical, but on fast machines, it's likely to be increasingly
important.  (E.g., new Pentium Pro's, Alphas, high-end PowerPC's, etc.)

For references, see my GC survey and bibliography at 
http://www.cs.utexas.edu/users/oops.

That said, it's certainly true that no GC configuration is optimal
for all workloads.  (See the survey for a basic discussion of that sort
of thing, as well as Henry's papers.)

Details of implementation also matter.  If your mechanism for finding
pointers into the young generation requires scanning large amounts
of memory, it'll defeat the purpose of keeping the youngest generation
in cache.  :-(  This is problematic for pagewise write barriers,
and also for really large card-marking bitmaps.  (Using bytes instead of
bits can help speed by letting you use byte instructions to set flags, but
it can trash your cache locality, because scanning tables that are 8x
bigger may touch more memory than the youngest generation itself.)
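
For concreteness, here is a minimal sketch of a byte-per-card write barrier
of the kind I mean (card size, heap size, and names are all made up):

    /* Sketch of a byte-per-card write barrier; card size, heap size
       and names are illustrative only. */

    #include <stdint.h>
    #include <stddef.h>

    #define CARD_SHIFT 9                        /* 512-byte cards       */
    #define HEAP_SIZE  (64u * 1024 * 1024)
    #define N_CARDS    (HEAP_SIZE >> CARD_SHIFT)

    static uint8_t   card_table[N_CARDS];       /* one byte per card    */
    static uintptr_t heap_base;                 /* set at heap creation */

    /* Every pointer store goes through this; the byte store is cheap,
       but the table itself (1/512 of the heap here) also competes for
       cache space. */
    static inline void write_barrier(void *slot, void *new_value)
    {
        *(void **)slot = new_value;
        card_table[((uintptr_t)slot - heap_base) >> CARD_SHIFT] = 1;
    }

    /* At minor-GC time only dirty cards are scanned for pointers into
       the youngest generation; scanning the table is what can blow the
       cache if the table is too large. */
    extern void scan_card_for_young_pointers(uintptr_t card_start,
                                             size_t card_size);

    static void scan_dirty_cards(void)
    {
        for (size_t i = 0; i < N_CARDS; i++) {
            if (card_table[i]) {
                scan_card_for_young_pointers(
                    heap_base + (i << CARD_SHIFT), (size_t)1 << CARD_SHIFT);
                card_table[i] = 0;
            }
        }
    }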

-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)      
From: Richard B. Winslow
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <55dics$r2f@mtinsc01-mgt.ops.worldnet.att.net>
William D Clinger <····@ccs.neu.edu> wrote:

> ·······@loop.com (Richard B. Winslow) writes:
> > [David Ungar] had very accurately characterized the optimal
> > conditions for retiring (graduating) objects.
>
> No one has done this, nor will anyone do this.  The optimal
> conditions depend not only on the program, but also on its
> inputs.  For example, no garbage collector can possibly
> predict the order in which windows will be closed by an
> interactive user.

I stand corrected.  My shaky recollection was, in short, that Ungar had done
some measurement work that transcended any previous research that I knew of.

> Berkeley is not MIT, nor is Stanford ParcPlace.

Indeed; although, Mssr. Ungar *did* work for ParcPlace as a consultant on their
garbage collector.  Where he went after that, I was not aware.

> [information about current thinking on locality, etc.]

Many thanks for the edification.  I find it interesting that the generational
model survives to this day in more or less the same form (if I understand you
correctly) that it took almost a decade ago.
From: Paul Wilson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <55d2qe$415@roar.cs.utexas.edu>
In article <············@ccs.neu.edu>,
William D Clinger  <····@ccs.neu.edu> wrote:
>Berkeley is not MIT, nor is Stanford ParcPlace.
>
>It appears that the most widely applicable heuristic is to
>assume that most newly allocated objects will become garbage
>sooner than most objects that have survived for some time.
>Nonetheless this heuristic fails for many real programs.  For
>example, it fails for a simple multipass compiler that builds
>an abstract syntax tree, performs static checking, and then
>generates code by a recursive tree-walk.

This is not our experience.  Certainly, you can have a lot of objects
that live a long time, but usually you have a lot of objects that
die young anyway.  E.g., in building an AST, you often have very
short-lived objects, even if each hunk of AST you generate lives
until the entire AST is generated.

Most recursive tree-transformation algorithms generate temporary data 
structures that are very friendly to generational GC, even if the
overall result data structure isn't.

The generational principle isn't perfect, granted, but I think it's
by far the best thing we've got.

>What Ungar actually did was
>
>  *  to notice that the generational aspect of the
>     Lieberman-Hewitt algorithm did not depend on its
>     real-time aspects,
>  *  to build an actual generational collector, and
>  *  to refine the heuristics suggested by Lieberman and Hewitt.

He also showed that for *many* programs, two generations is sufficient,
because most objects die very young.  Unfortunately, there are programs
for which this isn't true, so I'd go for more than 2 generations.

There will always be some programs for which a given configuration is
not ideal.

An even more annoying example than the one you mention above is a
system that loops through a large volume of data, transforming it
repeatedly, creating new objects each time.  There are likely to
be short-lived temporary data structures, but at the larger timescale
(of the big loop), the oldest data die first, not the youngest.

This is pretty analogous to the well-known problem for LRU replacement
of loops that are bigger than memory.  LRU will always evict the
thing that hasn't been touched for the longest time, which for a big
loop is exactly the wrong thing to do because it will always evict
the data that will be touched soonest, the next time through the loop.

-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)      
From: Tom Lord
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <55g8s9$6mm@emf.emf.net>
Paul Wilson writes that on the one hand:

	The generational principle isn't perfect, granted, but I think it's
	by far the best thing we've got.

But on the other:

	He also showed that for *many* programs, two generations is sufficient,
	[....]
	Unfortuantely, there are programs
	for which this isn't true, so I'd go for more than 2 generations.

	There will always be some programs for which a given configuration is
	not ideal.

I think those observations add up to an engineering challenge -- not
a fundamental limitation on GC.  Whenever there is no clearly best solution,
the Right Thing is to defer the choice and keep options open until further
constraints make the decision clearer.

By analogy: 
	Q. What is the best text pattern matching algorithm?
	A. It depends (on the pattern, whether time or space is 
	   more precious, whether the text being searched is 
	   in memory or on disk ...)

The challenge at hand is simply to build systems which provide a
variety of GC strategies, choosing between them either
auto-adaptively, or statically with guidance from humans.  Adaptive
strategies will be best in situations where we can connect a few GC
parameters to a suitable feedback mechanism, and human intervention will be better
when we can't do that, or when we don't want to suffer the penalties
of waiting for an adaptive system to reach a reasonable state.

Mostly (and this is a general principle) I want systems that I can easily
hack to get around undesirable edge cases when they (inevitably) come up 
during actual use.  More knowledge about the space of collectors is only
part of the story -- better engineering of systems that use collectors is
equally important.

As an aside: this is a good argument in favor of insisting on access to
the source code of much of the software you want to rely on, and on 
having the right/means to change, rebuild, and reinstall any of that 
source code.  Whenever things get complicated, it seems, a statically 
optimal solution doesn't exist so you'd better have the ability to
modify whatever solution you are initially stuck with.

As another aside: don't misinterpret the statement that more flexible
engineering is needed to be the same as the statement that a specific
programming methodology that supports reconfiguration is needed.  Programming
methodologies seem to me like another area where no solution is optimal,
and the real challenge is to develop systems that let programming
methodologies be mixed.

As a final aside: some GC strategies, text pattern matching algorithms,
and programming methodologies are objectively, absolutely better than
some others -- just because there is no best doesn't mean comparisons
can't sometimes be made.


	This is pretty analogous to the well-known problem for LRU replacement
	of loops that are bigger than memory.  LRU will always evict the
	thing that hasn't been touched for the longest time, which for a big
	loop is exactly the wrong thing to do because it will always evict
	the data that will be touched soonest, the next time through the loop.

He he... funny analogy.  Too bad computer memories don't generally use
adaptive compression for this kind of situation.
From: Paul Wilson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <55ja3o$152@roar.cs.utexas.edu>
In article <··········@emf.emf.net>, Tom Lord <····@emf.emf.net> wrote:
>Paul Wilson writes that on the one hand:
>	The generational principle isn't perfect, granted, but I think it's
>	by far the best thing we've got.
>But on the other:
>	[....]
>	There will always be some programs for which a given configuration is
>	not ideal.
>
>I think those observations add up to an engineering challenge -- not
>a fundamental limitation on GC.  Whenever there is no clearly best solution,
>the Right Thing is defer the choice and keep options open until further
>constraints make the decision clearer.

Absolutely.  This is why I worked on opportunistic garbage collection
(and others have too---Ungar and Jackson's feedback-mediated tenuring
and Barry Hayes's key object opportunism.)

I should have been clearer on what I meant.  I actually do think that
there can be a pretty darned good one-size-fits-all GC for almost
all purposes.  It just has to have a couple of knobs for tuning it
to particular workloads.  A reasonable generational GC can do a good
job for the vast majority of programs.  With a little tuning, it
can do a good job for most of the rest and an excellent job for
the majority.  Incremental GC's can be pretty fast, so an incremental
generational GC may do the all-purpose job.

What we need is more work on profiling and tuning, to characterize
the relevant program behaviors and how to set the knobs.  We need
more data from real and somewhat nasty programs---the hard cases.
Unfortunately, detailed empirical work with real workloads (rather
than synthetic benchmarks or toy demo programs) is hard to come
by, and not well rewarded in terms of research publications.

[...]
>The challenge at hand is simply to build systems which provide a
>variety of GC strategies, choosing between them either
>auto-adaptively, or statically with guidance from humans. [...]

Right.

>Mostly (and this is a general principle) I want systems that I can easily
>hack to get around undesirable edge cases when they (inevitably) come up 
>during actual use.  More knowledge about the space of collectors is only 
>part of the  story - better engineering of systems that use collectors is 
>equally important.

Absolutely.  We need not only control, but clean designs.  Basic GC issues
are now pretty well understood, but most runtime systems are not designed
in a clean modular way, and there's not a good interface for plugging
in alternative mechanisms and policies.  Sigh.

> [ more good commentary deleted ]
>As a final aside: some GC strategies, text pattern matching algorithms,
>and programming methodologies are objectively, absolutely better than
>some others -- just because there is no best doesn't mean comparisons
>can't be sometimes made.

Right.  Real programs generally exhibit some strong regularities, though
they may exhibit different regularities in different proportions.  In
general, though, it should only take a very few parameters to tune a
GC or allocator very well.

>	This is pretty analogous to the well-known problem for LRU replacement
>	of loops that are bigger than memory.  LRU will always evict the
>	thing that hasn't been touched for the longest time, which for a big
>	loop is exactly the wrong thing to do because it will always evict
>	the data that will be touched soonest, the next time through the loop.
>
>He he... funny analogy.  Too bad computer memories don't generally use
>adaptive compression for this kind of situation.

Isn't it?  :-)  (*Some* computer memories do this;  looked inside the 
Newton lately?  It's got our heap data compression algorithm in it.)
-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)      
From: Walter Smith
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <01bbcb45$d2b24120$c9d63d9d@wsmith>
Paul Wilson <······@cs.utexas.edu> wrote:
> Tom Lord <····@emf.emf.net> wrote:
> >Too bad computer memories don't generally use
> >adaptive compression for this kind of situation.
> 
> Isn't it?  :-)  (*Some* computer memories do this;  looked inside the 
> Newton lately?  It's got our heap data compression algorithm in it.)

Oops, actually, Newton doesn't compress the heap at runtime. Newton
packages (applications) are essentially frozen heaps, and those can be
compressed with your algorithm, but they're read-only.

However, you can set up a virtual binary object (kind of like a
memory-mapped file) for working space, and it will be pagewise compressed
into the object store (RAM or Flash) when the system needs free pages.
That's pretty close. We got (they probably still get) a lot of requests to
extend that mechanism to objects with pointers in them, which is feasible,
and that would be even closer.

- W
-- 
Walter Smith    ···@pobox.com    http://www.pobox.com/~wrs/
This posting consists of my personal opinions, of course.
From: Henry Baker
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <hbaker-0611960929560001@10.0.2.1>
In article <··························@wsmith>, "Walter Smith"
<···@pobox.com> wrote:

> Paul Wilson <······@cs.utexas.edu> wrote:
> > Tom Lord <····@emf.emf.net> wrote:
> > >Too bad computer memories don't generally use
> > >adaptive compression for this kind of situation.
> > 
> > Isn't it?  :-)  (*Some* computer memories do this;  looked inside the 
> > Newton lately?  It's got our heap data compression algorithm in it.)
> 
> Oops, actually, Newton doesn't compress the heap at runtime. Newton
> packages (applications) are essentially frozen heaps, and those can be
> compressed with your algorithm, but they're read-only.
> 
> However, you can set up a virtual binary object (kind of like a
> memory-mapped file) for working space, and it will be pagewise compressed
> into the object store (RAM or Flash) when the system needs free pages.
> That's pretty close. We got (they probably still get) a lot of requests to
> extend that mechanism to objects with pointers in them, which is feasible,
> and that would be even closer.
> -- 
> Walter Smith    ···@pobox.com    http://www.pobox.com/~wrs/
> This posting consists of my personal opinions, of course.

RamDoubler, of course, does do heap compression.
From: Paul Wilson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <55qngn$909@roar.cs.utexas.edu>
In article <·······················@10.0.2.1>,
Henry Baker <······@netcom.com> wrote:
[...]
>RamDoubler, of course, does do heap compression.

Does anybody know what compression algorithms it uses?  I'm assuming
that stuff's proprietary, but if it's not I'd like to know about
it, and be able to cite it.

-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)      
From: Henry Baker
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <hbaker-0711960919490001@10.0.2.1>
In article <··········@roar.cs.utexas.edu>, ······@cs.utexas.edu (Paul
Wilson) wrote:

> In article <·······················@10.0.2.1>,
> Henry Baker <······@netcom.com> wrote:
> [...]
> >RamDoubler, of course, does do heap compression.
> 
> Does anybody know what compression algorithms it uses?  I'm assuming
> that stuff's proprietary, but if it's not I'd like to know about
> it, and be able to cite it.

You might try their web site:

http://www.connectix.com/
From: Stefan Monnier
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <5liv7hy1bc.fsf@daffy.systemsx.cs.yale.edu>
······@cs.utexas.edu (Paul Wilson) writes:
> Newton lately?  It's got our heap data compression algorithm in it.)

So is there (finally) info available about your (infamous) heap compression
algorithm?


        Stefan
From: Richard B. Winslow
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <5583ks$fti@mtinsc01-mgt.ops.worldnet.att.net>
A brief correction:  I think, in my previous reply to this posting, that I used
the phrase "mark and sweep reference counter" to describe the garbage collector
in Sun's VM.  Of course, a mark 'n' sweep collector and a reference counting
collector are two different things (although M&S has been used as an auxiliary
to RC in order to deal with the circular garbage problem).

I think Sun used mark 'n' sweep, but I'm not sure.
From: Ray S. Dillinger
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <327B8EAD.777F@sonic.net>
Richard B. Winslow wrote:
> 
> A brief correction:  I think, in my previous reply to this posting, that I used
> the phrase "mark and sweep reference counter" to describe the garbage collector
> in Sun's VM.  Of course, a mark 'n' sweep collector and a reference counting
> collector are two different things (although M&S has been used as an auxiliary
> to RC in order to deal with the circular garbage problem).
> 
> I think Sun used mark 'n' sweep, but I'm not sure.

Mark and sweep is an excellent complement to reference counting, as I'm finding 
out.

My approach is to use refcounters for all objects; it is a sufficient and necessary 
condition for collecting objects which contain no pointers, and a sufficient 
condition for collecting all objects.  It's hard to get better than 
that for as little work as it takes to maintain refcounts.  One thing though, 
is that for that kind of total reliance, refcounters have to be essentially the 
same width as pointers in a given implementation -- so that they can't overflow. 

However, this still fails to collect some structures containing pointer cycles.  
For those, I use mark&sweep -- and with a little more work I'll be moving it up 
to generational collecting, which is a variation on mark&sweep.  However, I was 
astonished how much of the work is done by the simpleminded refcount collector; 
each time a refcount reaches zero, it puts that object on the collection queue, 
and each time the reaper is called it takes two items off the queue. If the 
queue is empty it advances the mark&sweep by a fixed amount instead.  If the 
mark&sweep is finished, it has separated objects into collectable and 
non-collectable queues -- whereupon the collectable queue becomes the collection 
queue.

I picked this particular GC combo for its realtime performance -- It NEVER causes 
the application to pause for an unbounded length of time.  I was surprised by how 
MUCH of the problem the refcount GC can take care of.  The mark&sweep eventually 
gets everything, but for most programs, over 90% of my garbage is handled by 
refcount alone, which is computationally cheap.  I think the combination means I'm 
still doing a little more work than I'd do with either one by itself; I'm trying to 
figure out how to make my compiler smart enough to figure out statically what can
be refcounted (and not included in mark&sweep) and what not to bother refcounting.
I've already cut objects which contain no pointers out of the mark&sweep set; Now 
I need a way to recognize a few additional classes I can cut out.  
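
In rough C, the reaper amounts to something like this (a simplified sketch
of the scheme just described; the helper names and the mark&sweep quantum
are invented for illustration):

    /* Simplified sketch of the hybrid refcount + incremental
       mark&sweep reaper described above; helper names and the
       work quantum are invented. */

    typedef struct Object Object;

    extern Object *queue_pop(void);            /* collection queue     */
    extern int     queue_empty(void);
    extern void    reclaim(Object *obj);       /* free + drop children */
    extern void    mark_sweep_step(int work);  /* bounded increment    */
    extern int     mark_sweep_finished(void);
    extern void    adopt_collectable_queue(void);

    #define MS_QUANTUM 64    /* fixed amount of mark&sweep work */

    void reaper(void)
    {
        if (!queue_empty()) {
            /* At most two objects per call, so the pause is bounded
               no matter how much garbage is waiting. */
            reclaim(queue_pop());
            if (!queue_empty())
                reclaim(queue_pop());
        } else if (!mark_sweep_finished()) {
            /* Nothing on the refcount side: advance the incremental
               mark&sweep by a fixed amount instead. */
            mark_sweep_step(MS_QUANTUM);
        } else {
            /* Mark&sweep has partitioned objects into collectable and
               non-collectable queues; the collectable queue now
               becomes the collection queue. */
            adopt_collectable_queue();
        }
    }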

						Bear
From: Henry Baker
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <hbaker-0211961916510001@10.0.2.1>
In article <·············@sonic.net>, "Ray S. Dillinger" <····@sonic.net> wrote:

> Richard B. Winslow wrote:
> > 
> > A brief correction:  I think, in my previous reply to this posting, that
> > I used the phrase "mark and sweep reference counter" to describe the
> > garbage collector in Sun's VM.  Of course, a mark 'n' sweep collector and
> > a reference counting collector are two different things (although M&S has
> > been used as an auxiliary to RC in order to deal with the circular
> > garbage problem).
> > 
> > I think Sun used mark 'n' sweep, but I'm not sure.
> 
> Mark and sweep is an excellent complement to reference counting, as I'm
> finding out.
> 
> My approach is to use refcounters for all objects; it is a sufficient and
> necessary condition for collecting objects which contain no pointers, and
> a sufficient condition for collecting all objects.  It's hard to get
> better than that for as little work as it takes to maintain refcounts.
> One thing though, is that for that kind of total reliance, refcounters
> have to be essentially the same width as pointers in a given
> implementation -- so that they can't overflow.
> 
> However, this still fails to collect some structures containing pointer
> cycles.  For those, I use mark&sweep -- and with a little more work I'll
> be moving it up to generational collecting, which is a variation on
> mark&sweep.  However, I was astonished how much of the work is done by the
> simpleminded refcount collector; each time a refcount reaches zero, it
> puts that object on the collection queue, and each time the reaper is
> called it takes two items off the queue.  If the queue is empty it
> advances the mark&sweep by a fixed amount instead.  If the mark&sweep is
> finished, it has separated objects into collectable and non-collectable
> queues -- whereupon the collectable queue becomes the collection queue.
> 
> I picked this particular GC combo for its realtime performance -- It NEVER
> causes the application to pause for an unbounded length of time.  I was
> surprised by how MUCH of the problem the refcount GC can take care of.
> The mark&sweep eventually gets everything, but for most programs, over 90%
> of my garbage is handled by refcount alone, which is computationally
> cheap.  I think the combination means I'm still doing a little more work
> than I'd do with either one by itself; I'm trying to figure out how to
> make my compiler smart enough to figure out statically what can be
> refcounted (and not included in mark&sweep) and what not to bother
> refcounting.  I've already cut objects which contain no pointers out of
> the mark&sweep set; Now I need a way to recognize a few additional classes
> I can cut out.

There has been a _huge_ amount of work comparing M&S with RC -- unfortunately,
not enough published, and not enough to remove ad hoc peculiarities of a
particular implementation.

Nevertheless, most experience shows that RC almost always loses to GC,
whether M&S or copying or whatever.  Therefore, you'd be better off getting
rid of the refcount updating overhead and implementing a decent real-time GC.

That being said, there is one major exception to the above conclusion:
'singly-referenced'/'linear'/'unique' objects, whose reference count is
known to be always one.  In this case, there is no updating overhead, and
RC wins.
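
A minimal sketch of the 'count is always one' case, in C (names invented;
the point is that the single reference is handed off or consumed, so no
count is ever stored or updated):

    /* Sketch of 'linear'/unique references: each object has exactly
       one owner, so there is no count field and no update overhead.
       Names are invented for illustration. */

    #include <stdlib.h>

    typedef struct Node {
        struct Node *next;    /* uniquely owned by this node */
        int          value;
    } Node;

    /* Consuming the unique reference: the caller gives up ownership
       and the callee frees immediately, without consulting a count. */
    static Node *free_node_return_next(Node *n)
    {
        Node *rest = n->next;
        free(n);              /* the one and only reference dies here */
        return rest;
    }

    static void free_list(Node *head)
    {
        while (head != NULL)
            head = free_node_return_next(head);
    }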

There is lots of stuff in my web site and in the utexas pub/garbage site.

ftp://ftp.netcom.com/pub/hb/hbaker/home.html
From: David N. Smith
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <55l08n$rel@watnews1.watson.ibm.com>
In article <··········@mtinsc01-mgt.ops.worldnet.att.net> Richard B.
Winslow, ·······@loop.com writes:
>> Berkeley is not MIT, nor is Stanford ParcPlace.
>
>Indeed; although, Mssr. Ungar *did* work for ParcPlace as a consultant on their
>garbage collector.  Where he went after that, I was not aware.

Ungar went to Stanford after he graduated from Berkeley where he (with
Randall Smith) developed SELF, a language with delegation but inspired by
Smalltalk. After some years he moved to Sun research, where he now
resides. Most of his and his team's work has been in implementation
techniques, especially making SELF run at near-C speeds. I think there is
a paper or two on GC at the SELF web site though:

   http://self.sunlabs.com/

Dave

_____________________________________________
David N. Smith
IBM T J Watson Research Center, Hawthorne, NY
Mailto: ·······@watson.ibm.com
Home Page: http://www.dnsmith.com/
_____________________________________________
Any opinions or recommendations are those 
of the author and not of his employer.
From: Lee Webber
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <55l81u$rqg@wizard.pn.com>
······@netcom.com (Henry Baker) wrote:
>
> In article <·············@sonic.net>, "Ray S. Dillinger" <····@sonic.net> wrote:
>
>	[snip]
>
> > Mark and sweep is an excellent complement to reference counting, as I'm
> > finding 
> > out... I was 
> > astonished how much of the work is done by the simpleminded refcount
> > collector;  ... I picked this particular GC combo for its realtime
> > performance -- It NEVER causes the application to pause for an
> > unbounded length of time.  I was surprised by how MUCH of the
> > problem the refcount GC can take care of.  The mark&sweep
> > eventually gets everything, but for most programs, over 90% of
> > my garbage is handled by refcount alone, which is computationally
> > cheap...
>
> There has been a _huge_ amount of work comparing M&S with RC -- unfortunately,
> not enough published, and not enough to remove ad hoc peculiarities of a
> particular implementation.
> 
> Nevertheless, most experience shows that RC almost always loses to GC,
> whether M&S or copying or whatever.  Therefore, you'd be better off getting
> rid of the refcount updating overhead, and implement a decent real-time GC.

Some things Ray Dillinger alluded to (see excerpts above) haven't had
enough exposure IMO. The good thing about RC is that, in many
instruction sets and ignoring cycles and collecting objects with counter
overflows, the mutator and collector can run with no synchronization
whatsoever. This means that the large majority of the collector's
work can be done entirely in the background; it may take more time
but it can take it whenever there's nothing else to do. This is a
godsend for a soft real-time system that spends a lot of its time idle
but needs every cycle for real work when it is busy.  (Often, recognizing
the system is busy and needs to lock out the GC is not a trivial task.)

I would love to see papers that speak to this set of values and
examine ways to deal efficiently with the "difficult" garbage that
does require that the mutator and collector be synchronized.
Unfortunately, most papers I have read (including some of those
available from Henry Baker's site) do not seem to consider wasted idle
time as a metric. Maybe this approach was resolved a long time ago and
discovered to be a dead end -- I don't know.
From: Paul Wilson
Subject: Opportunistic GC (was Re: Garbage Collection and Aging)
Date: 
Message-ID: <55oc2i$7cu@roar.cs.utexas.edu>
In article <··········@wizard.pn.com>, Lee Webber  <····@micrologic.com> wrote:
>
>Some things Ray Dillinger alluded to (see excerpts above) haven't had
>enough exposure IMO. The good thing about RC is that, in many
>instruction sets and ignoring cycles and collecting objects with counter
>overflows, the mutator and collector can run with no synchronization
>whatsoever. This means that the large majority of the collector's
>work can be done entirely in the background; it may take more time
>but it can take it whenever there's nothing else to do. This is a
>godsend for a soft real-time system that spends a lot of its time idle
>but needs every cycle for real work when it is busy.  (Often, recognizing
>the system is busy and needs to lock out the GC is not a trivial task.)

Several systems do this kind of thing.  My "Opportunistic Garbage Collector"
was designed with this as one of the goals.  It had two basic reasons
for scheduling GC's carefully:

  1. to hide GC costs in pause times when the system would otherwise
     be idle---e.g., when blocked waiting for I/O.

  2. to schedule GC's at times when there's likely to be little live
     data to trace, and actually improve the GC's efficiency.

One of the basic tricks was to schedule GC's preferentially at (approximately)
local minima in the activation stack height, when procedure returns
are likely to have freed up short- and medium-lifetime data structures,
leaving only long-lived data structures to deal with.
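
Very roughly, the trigger looks something like this (a simplified sketch
with invented names and thresholds, not the actual collector code):

    /* Sketch of triggering GC near local minima of stack height and
       during idle time; names and thresholds are invented. */

    #include <stddef.h>

    extern size_t current_stack_depth(void);     /* frames on the stack */
    extern size_t bytes_allocated_since_gc(void);
    extern int    system_is_idle(void);          /* e.g. blocked on I/O */
    extern void   collect_garbage(void);

    #define ALLOC_TRIGGER (4u * 1024 * 1024)
    #define SLACK 2

    void maybe_collect(void)
    {
        static size_t window_min = (size_t)-1;
        size_t depth = current_stack_depth();

        if (depth < window_min)
            window_min = depth;          /* lowest point seen lately */

        if (bytes_allocated_since_gc() < ALLOC_TRIGGER)
            return;                      /* no memory pressure yet   */

        /* Prefer to collect when the stack is back near its recent low
           point (returns have likely freed short-lived structures), or
           when the system is idle and the pause can be hidden. */
        if (depth <= window_min + SLACK || system_is_idle()) {
            collect_garbage();
            window_min = depth;          /* start a new window       */
        }
    }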

This worked pretty well in our experience, but we never published detailed
data because our system was too small and we thought the data wouldn't be
compelling.  (I now think that was a mistake...)

The general idea has been around a long time, e.g., at least one
version of Interlisp did deferred RC, and often did most of its
reference counting and transitive reclamation when the system was
blocked waiting for user input (i.e., during user "think time").

As I understand it, the Java GC can do something similar, though it
aborts the GC if the system unblocks, because its GC is not 
incremental.

My GC survey is a good place to start looking for data about this
kind of thing.  It discusses our opportunistic GC scheme and Barry
Hayes's "Key Object Opportunism", as well as Ungar and Jackson's
"feedback mediated tenuring."

>I would love to see papers that speak to this set of values and
>examine ways to deal efficiently with the "difficult" garbage that
>does require that the mutator and collector be synchronized.
>Unfortunately, most papers I have read (including some of those
>available from Henry Baker's site) do not seem to consider wasted idle
>time as a metric. Maybe this approach was resolved a long time ago and
>discovered to be a dead end -- I don't know.

I don't think it's a dead end at all.  I think it's necessary for
maximum performance and usability.  

One problem with the literature in this area is that papers on tuning
GC's are hard to write---you have to actually tune to a bunch of
real applications, and do some fairly sophisticated arguing about
the significance of the results.  (E.g., you have to make an argument
that the example programs you picked are representative, and any
hand-tuning is "easy", whatever that means, etc. etc.)  Arguments
about non-adaptive and non-hand-tuned systems are usually simpler
and more convincing, but that's unfortunate because in the general
case you need a few knobs to get maximum performance for "unusual"
programs.

OTOH, most implementors seem to be reasonably satisfied with fairly
straightforward generational GC approaches.  They work well for most
programs, and not badly for most of the rest.  Most implementors are (and
often rightly) more worried about the quality of their compilers' code
generation than about improving their GC's once they have a "pretty good"
generational GC.

Most commercial GC's for serious GC'd systems do have a few hand-tuning
knobs, but I know of no good studies of how well they work, and how
programmers should think about tuning.

-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)      
From: Charles Fiterman
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <Dzoy66.BuC@midway.uchicago.edu>
The appropriate place for this is ······@ieee.com.  It is an email list,
not a newsgroup, but it has all the top GC experts.
From: Tom Lord
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <54lcpp$553@emf.emf.net>
	In reply to ···@apple.com
	Re: Garbage Collection and Aging

This is in answer to your query about the existence of garbage collectors
that track how recently objects have been accessed and use this information
as a guide for activities such as compressing or flushing infrequently
used objects.

GNU Rx is an implementation of an on-line regular expression engine;
you give it a regexp, it incrementally compiles a DFA and compares
strings to the regexp.

Rx has a data-structure specific (far from fully general) reference-counting
garbage collector.  This collector does attempt to keep track 
of which objects have been most/least recently accessed.  Rx sometimes
"compresses" objects not recently accessed by throwing away all parts
of their state that can be recomputed on-demand.

It so happens that there is a convenient read barrier in Rx for
the data structures that are collected as I've described.  This
read barrier is used to track (approximately) how recently objects
have been referenced.  The nub of the gist is that sometimes the
collector will pick an object and force it to cause a trap if
that object is accessed by a read.  After a delay, if no trap
has occurred, the collector concludes that the object has not been
used recently, and compresses it.  If a trap does occur, the
collector makes sure that the affected object will not be a candidate
for compression for "a long time".  Thus, frequently used objects are
occasionally considered for compression, but are usually spared
that fate; an infrequently used object will eventually be considered
for compression, and is likely to be compressed at that time.

This was the cheapest way I found to measure (more or less) 
which objects were most recently and most frequently used.  It depends
heavily on the existence of a natural read barrier.  I suspect this
technique could easily be adapted to generalized collectors provided
that they too involve a read barrier.
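
In rough C, the policy is something like this (a simplified sketch rather
than the actual Rx code; names and delays are invented for illustration):

    /* Simplified sketch of the trap-based compression policy described
       above; names and delays are invented. */

    typedef struct Obj {
        int  trap_armed;       /* reads of this object will trap       */
        int  trapped;          /* a read hit the trap since arming     */
        long spare_until;      /* not a candidate again until then     */
        /* ... recomputable state lives here ... */
    } Obj;

    extern long now(void);
    extern void discard_recomputable_state(Obj *o);    /* "compress" */

    #define SPARE_TIME 10000   /* grace period for recently used objects */

    /* Called through the existing read barrier. */
    void on_read(Obj *o)
    {
        if (o->trap_armed) {
            o->trap_armed = 0;
            o->trapped = 1;    /* the object was used recently */
        }
        /* ... recompute discarded state on demand ... */
    }

    /* The collector occasionally picks a candidate and arms its trap... */
    void consider_for_compression(Obj *o)
    {
        if (now() < o->spare_until)
            return;
        o->trapped = 0;
        o->trap_armed = 1;
    }

    /* ...and, after a delay, acts on what it observed. */
    void after_delay(Obj *o)
    {
        if (o->trapped)
            o->spare_until = now() + SPARE_TIME;  /* spare it for a while */
        else
            discard_recomputable_state(o);        /* not recently used    */
        o->trap_armed = 0;
    }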

Warning: Rampant Speculation Follows

Normally, fine-grain read barriers do not come cheap, so I wonder
if this idea can be adapted to a situation in which no read barrier
is used.  One observation is that at any given time, the only
objects a program is able to reference quickly are those addressed by machine
registers.  Every object can be seen to have a certain "distance" from
being referenced where that distance is the number of primitive 
operations (e.g. CAR, CDR, VECTOR-REF ... in Scheme) needed to 
get to the object starting from objects in registers.  This notion
of distance could be further refined by considering the specific
program being executed -- for example, we might be certain that
the value in $R3 will never be passed to CADR if $PC happens to 
point to 0xcafebabe.

Perhaps a barrier-less approximation of MRU could be computed
by periodically sampling the machine registers and noting which
objects are "close" to being referenced.  Alas, a program can cover a lot
of distance quickly which is one reason why this notion of "close" may
not be very useful.

The other direction, using coarse-granularity barriers, would
seem to give you something like a generational collector.  Isn't it
the case that performance considerations have caused people to use write
barriers for such collectors?  Perhaps an infrequently applied read
barrier could help to identify frequently accessed objects which a copying
collector could then cluster in hopes of improving locality.  Has this
been tried?

-t
From: Paul Wilson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <553t8l$h51@roar.cs.utexas.edu>
In article <··········@emf.emf.net>, Tom Lord <····@emf.emf.net> wrote:
>	In reply to ···@apple.com
>	Re: Garbage Collection and Aging
>
>It so happens that there is a convenient read-barrier in Rx for
>the data structures that are collected as I've described.  This
>read barrier is used to track (approximately) how recently objects
>have been referenced.  The nub of the gist is that sometimes the
>collector will pick an object and force it to cause a trap if 
>that object is accessed by a read.   After a delay, if no trap
>has occured, the collector concludes that the object has not been
>used recently, and compresses it.  If a trap does occur, the 
>collector makes sure that the effected object will not be a candidate
>for compression for "a long time".

For general reference, this is very similar to "two-level" FIFO/LRU
approximations of LRU replacement used in several virtual memory
systems.  It was invented for machines without reference bits
in hardware---you access-protect pages occasionally to see if
they get touched.   This scheme is also used on some machines that
do have reference bits.  (I believe I heard that Win95 and/or NT
uses this, and that VMS does/did.)

>Perhaps a barrier-less approximation of MRU could be computed
>by periodically sampling the machine registers and noting which
>objects are "close" to being referenced.  Alas, a program can cover a lot
>of distance quickly which is one reason why this notion of "close" may
>not be very useful.

Yes.  This is a big problem.  A common case for bad locality is when
you have objects that are "close" to each other in terms of pointer
connections, but often not touched together.  For example, suppose
you have an object that has several pointer fields that point to
conceptual "subobjects".  If your important access patterns don't
access some of the subobjects, you DON'T want to group those subobjects
with the main objects, even though they're directly connected by a
single pointer.  If you group things in the obvious way, you'll always
bring in all of the subobjects every time you touch the main object,
and pollute the cache.

A classic example of this is Common Lisp symbols, which have several
binding fields.  Most of the time, running code only ever accesses
the normal variable binding field or the procedure binding field,
not the property binding field or printname field or package
field.

For this reason, some Common Lisp implementations organize the
symbol table as several vectors of fields, rather than a vector
of structs.  A single conceptual "symbol" object is actually spread
across several arrays, so that the frequently-accessed parts
of many symbols can be grouped together on a page.
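
Concretely, the split is the usual "several parallel vectors" layout
(illustrative C only, not any particular Lisp's actual symbol table):

    /* The "vector of structs" layout vs. the split "several vectors of
       fields" layout described above.  Field names loosely follow
       Common Lisp symbols; sizes are illustrative. */

    #define N_SYMBOLS 65536

    /* Naive layout: touching a symbol's value drags its printname,
       property list and package onto the same cache lines/pages. */
    struct symbol {
        void *value;          /* hot  */
        void *function;       /* hot  */
        void *plist;          /* cold */
        void *printname;      /* cold */
        void *package;        /* cold */
    };
    struct symbol symbols_aos[N_SYMBOLS];

    /* Split layout: the frequently accessed fields of many symbols
       share pages, and the rarely touched fields stay out of the way. */
    struct symbol_table {
        void *value[N_SYMBOLS];        /* hot, densely packed */
        void *function[N_SYMBOLS];     /* hot, densely packed */
        void *plist[N_SYMBOLS];        /* cold */
        void *printname[N_SYMBOLS];    /* cold */
        void *package[N_SYMBOLS];      /* cold */
    };
    struct symbol_table symbols_soa;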

Another simple example of this is documentation objects.  In many
systems, objects can have a documentation field that points to a
documentation object.  You generally do NOT want to group the
documentation objects with the objects they document, because they're
very rarely accessed in normal operation.  If you blindly group them,
they'll wildly pollute the cache.

(This is the kind of thing that motivated our "Object-Type Directed
Garbage Collection", cited in Richard Jones's earlier posting.)

Similar examples come up in relational databases all the time.
You often don't want to group all of the attributes of a tuple
together, because only certain attributes are used for the
important/common/memory-hungry queries.  Physical database
design often involves splitting large conceptual tables (relations)
by columns, so that you don't have to bring unneeded attributes
into memory when doing a query.

In an object-oriented system, things are less obvious but the basic
idea still applies.  You often have objects that are like tuples,
with pointers to other objects that are like attributes.  (This
is especially true when you deal with large volumes of interesting
data, as a database does, and that's often where locality really
matters.)   Blindly grouping objects with the objects that hold
pointers to them can be a big locality mistake.

Unfortunately, there's no clustering principle that works in general
for all kinds of data, just heuristics.  And the empirical data to
evaluate heuristics is very, very weak.  (BTW, never trust a synthetic
benchmark for locality experiments, unless the experimenter is very
clear on the assumptions underlying the benchmark, and the limitations
of the results.  Most clustering experiments using synthetic data
are worse than useless.)

-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)