From: Paul Wilson
Subject: garbage collection vs. caches (was: Re: "General Purpose" architectures and symbolic languages)
Date: 
Message-ID: <7104@polya.Stanford.EDU>
In an earlier posting to comp.arch, I asked if anybody was trying to keep
the newest generation of a generational gc entirely within a cache.  I
suggested a small (about 50 KB) generation with a large (about 100 KB) cache.

The earlier posting didn't make clear why I think this is a good idea --
one response said that the cache misses caused by a scavenge would be
no big deal compared to the usual costs of actually running the user program.

Actually, what I meant is that keeping the newest generation in the cache
should keep the user program from taking many cache misses.  (In exactly
the same sense that generational gc's improve locality for the running
program, at the scale of main memory.  Only on a smaller scale.)

My reasoning is this:  Lisp and Smalltalk allocate huge amounts of short-lived
data.  In the absence of garbage collection, the cache will tend to be
full of recently-allocated (hence recently touched) data.  And it will always
be allocating new data in new locations.  This will cause continual cache
misses, to bring the newly-allocated locations into the cache.  In addition,
this will cause some older data to be bumped out of the cache;  you lose
coming and going.

  (The old data will generally be things that were allocated
a couple of seconds before.  Since they were given new values on allocation, 
they *will* be dirty and they *will* be written out.  Yuck.)


  To give this a (VERY APPROXIMATE) quantitative flavor, suppose you have
a 128KB cache, and a 5 MIPS processor;  suppose also that you have a 20%
load ratio, 1% miss rate, and uniformly distributed misses, for an
average cache lifetime of about 3 seconds.  (This VERY off-the-cuff supposition
is due to Doug Johnson, who should not be held accountable.  Feel
free to suggest better numbers.)

   Now suppose you try to run Lisp instead of "normal" programs.  Suppose that
on your 5 MIPS processor a compute-bound Lisp program allocates 50 to 100 KB
of data per second.  (This not-quite off-the-cuff supposition comes from the
programs Bob Shaw analyzed in his Stanford dissertation last year.  His programs
allocated between 40 KB (or thereabouts) and over 200 KB (!) per second on
a 4 MIPS processor.)

   At 50 KB/s allocation, cache misses due to sheer allocation will be
competitive with the usual kinds of cache misses.  Over 100 KB/s, they will
dominate.  It seems to me that this will greatly decrease the usefulness
of a cache in a Lisp system.
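   To make that comparison concrete, here is the arithmetic as a short
Python sketch.  The word-sized (4-byte) cache line is my assumption, not
part of the original figures -- it is what makes the 1% miss rate and the
roughly 3-second cache lifetime mutually consistent; with larger lines the
implied lifetime comes out shorter.

```python
# Back-of-the-envelope check of the figures above (my arithmetic):
# 5 MIPS processor, 20% load ratio, 1% miss rate, 128 KB cache,
# and -- assumed -- one-word (4-byte) cache lines.

mips = 5_000_000            # instructions per second
load_ratio = 0.20           # fraction of instructions touching data
miss_rate = 0.01            # fraction of references that miss
cache_bytes = 128 * 1024
line_bytes = 4              # assumed word-sized lines

refs_per_sec = mips * load_ratio              # data references per second
misses_per_sec = refs_per_sec * miss_rate     # ordinary misses per second
turnover_bytes = misses_per_sec * line_bytes  # bytes of cache replaced per second
lifetime = cache_bytes / turnover_bytes       # average cache lifetime, seconds

print(f"turnover: {turnover_bytes/1024:.0f} KB/s, lifetime: {lifetime:.1f} s")

# Every freshly allocated byte claims a line that (absent tricks) must be
# fetched, so allocation at 50-100 KB/s adds misses on top of the above:
for alloc_rate in (50 * 1024, 100 * 1024):
    alloc_misses = alloc_rate / line_bytes
    print(f"{alloc_rate // 1024} KB/s allocation -> "
          f"{alloc_misses / misses_per_sec:.1f}x the ordinary miss traffic")
```

Under these assumptions, 50 KB/s of allocation already rivals the ordinary
miss traffic, and 100 KB/s is well over double it -- which is the sense in
which allocation misses become "competitive" and then "dominate."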


   Two strategies for dealing with these costs:

     1) Keep the newest generation so small that it lives in the cache,
        reusing the same locations over and over again.

     2) Tell the cache not to bother to swap in blocks that we're allocating
        fresh, so at least we don't swap in garbage only to initialize
        it as new objects with new values.  (I believe the Symbolics does
        something similar with paging (?), "creating" pages rather than
        reusing them.  That way, the VM system is free to reuse any
        page frame rather than swapping garbage in from disk.)
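   Strategy (1) is mechanically simple: allocation just bumps a pointer
through one small, fixed region, and a scavenge resets the pointer so the
same locations are reused.  A minimal sketch, with illustrative names and
sizes of my own choosing (no real collector is this bare):

```python
NURSERY_BYTES = 48 * 1024   # small enough to stay resident in a 128 KB cache

class Nursery:
    """Newest generation confined to one cache-sized region."""

    def __init__(self, size=NURSERY_BYTES):
        self.size = size
        self.free = 0       # bump pointer: next free offset

    def allocate(self, nbytes):
        """Return an offset for nbytes, or None when a scavenge is needed."""
        if self.free + nbytes > self.size:
            return None     # caller scavenges, then retries
        offset = self.free
        self.free += nbytes
        return offset

    def scavenge(self, copy_survivors):
        """Copy live objects out to the next generation, then reset the
        bump pointer -- the region's locations get reused, so they stay
        warm in the cache instead of forcing misses on fresh addresses."""
        copy_survivors()
        self.free = 0
```

The point of the reset, as opposed to marching on through the address
space, is exactly the cache behavior: the same lines are written over and
over instead of new lines being faulted in.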


   The big questions are whether (1) is worth the cost and whether (2) is
possible in a given hardware configuration.

  (1) does seem reasonable to me, again based on data from Bob
Shaw's dissertation.  I calculate that even with a 32 KB newest generation,
his programs will only spend between about 1% and 4% of their time copying
data.  (Using Ungar's measurements that say you can copy about half a
megabyte per second on a ~4 MIPS 68020, and assuming you advance things to the
next generation if they survive one scavenge.)  If that's representative, then
the question reduces to whether a two or three percent cost in copying
is repaid by our improved cache hit rate.  My guess is yes.  Anybody else
care to guess?  Anybody got numbers that will decide the issue?
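   For what it's worth, here is how I'd redo that calculation.  Ungar's
copying rate and Shaw's allocation range come from the text above; the
survival fractions are my own guesses, not Shaw's measurements, so treat
the output as a plausibility check rather than a result:

```python
# Rough check of the 1-4% copying-cost claim.
copy_rate = 512 * 1024      # bytes/s copied (Ungar: ~0.5 MB/s on ~4 MIPS 68020)

for alloc_rate in (40 * 1024, 200 * 1024):   # Shaw's range, bytes/s
    for survival in (0.05, 0.10):            # assumed fraction surviving a scavenge
        survivors = alloc_rate * survival    # bytes needing copying, per second
        cost = survivors / copy_rate         # fraction of run time spent copying
        print(f"{alloc_rate // 1024:3d} KB/s allocated, {survival:.0%} survive "
              f"-> {cost:.1%} of time copying")
```

With survival anywhere in the 5-10% range, the copying cost lands between
a fraction of a percent and about four percent -- consistent with the
1%-4% figure above.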

   About point (2), Allen Baum sez both IBM and HP have patents on caches that
could support this sort of thing (e.g., caches exposed to software control).
So is anybody actively researching/developing such things for garbage
collected systems?

   (BTW, it seems that such a strategy is going to have big problems with
multitasking.  Looking at Hennessy et al.'s recent (TOPLAS?) paper, it didn't
appear that their locality measurements were for any Lisps with neat locality
tricks.  (Most of their measurements were for non-Lisp programs.)
It seems that gc strategies introduce a bunch of new parameters that
all could affect locality in funky ways.)

   -- Paul

Paul R. Wilson                         
Human-Computer Interaction Laboratory    lab ph.: (312) 413-0042
U. of Ill. at Chi. EECS Dept. (M/C 154) 
Box 4348   Chicago,IL 60680              ······@carcoar.stanford.edu