From: Stefan Monnier
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <5l20eli70o.fsf@daffy.systemsx.cs.yale.edu>
David Hanley <·····@netright.com> writes:
> 	I understand that; I suppose the question I needed to append
> is, how often do we reach objects in some other manner than the pointers
> attached to them? :)  I can't think of a lot of situations in which this
> training technique will be better, and all of them are obscure.  Of
> course, it's probably just due to lack of imagination on my part, and
> I'm genuinely curious for examples where this training technique will
> work very well.

There are several possibilities. Most of them were suggested by XXX (can really
not remember his name, sorry, it was a couple years back) in his study where he
tried to improve the locality by statically changing the graph traversal.
Basically, the idea was:
- a normal copying GC goes breadth-first
- most graph traversals are more depth-first
- switching to a depth-first GC doesn't help much (it does help, but not as
  much as one might expect)
- so he switched to a "mostly-depth-first" traversal (based on the observation
  that even though the mutator is basically depth-first, it does "look around"
  at each depth level). Can't remember exactly what this "mostly-depth-first"
  looked like, sorry.
- then he added special cases for hashtables (the system he was working with
  was a lisp system, so most things were lists and hastables): basically,
  hash-tables were handled in a purely depth-first way and only after
  everything else had been scanned (to reduce the (random) ordering influence
  of hashing)
If anybody has the reference I'd be happy to hear about it.

> 	I suppose what i'm skeptical of is if this will give a big enough win
> in the general case to compensate for the cost of the read barrier.  It
> could, perhaps, with the possibility of going to the disk, and hitting
> L2 and L1 caches more frequently.  Do you know of any solid research
> done in this area?

I don't know how well TI Explorer's scheme does compared to the
statically-optimised traversal proposed above. There was a paper (whose
reference has been lost at the exact same time as the previous one)
comparing the TI Exporer's GC to a vanilla 2-generations (non-concurrent) GC
(both on the same hardware). Their conclusion was "the fancy GC doesn't do
any better than the simple one (except in the memory-limited case), so why
bother ?". I happen to totally disagree with their conclusion, because what
they actually showed was that despite its fanciness (concurrency for instance
which makes it very nice for interactive use), TI's GC isn't noticeably slower
in the worst possible case (batch CAD-work with plenty of memory so that
swapping is not an issue). On the other hand, when the memory was limited, TI's
GC only took twice as much time, whereas the simple GC didn't finish at all
(the testers got tired after waiting a whole week, if I remember correctly).

I expect this kind of result to be comparable to what you can get for
"compressed virtual-memory": it only improves behavior when you memory
foot-print is slightly larger that real memory: if it's smaller, you can't
improve anything, if it's much smaller, no compression nor anything else can
help either.

As an aside note, the hit for read-barrier is not so much taken because of the
"let's let the mutator rather than the GC traverse its graph" but rather
because the GC is concurrent. Also the read-barrier was fairly cheap because
the GC was part of the OS, so access to the VM exception handler was much more
efficient than what is usually available to a user process on unix.


        Stefan

From: Richard Jones
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <899@beech.ukc.ac.uk>
In article <··············@daffy.systemsx.cs.yale.edu>,
>There are several possibilities. Most of them were suggested by XXX (can really
>not remember his name, sorry, it was a couple years back) in his study where he
>tried to improve the locality by statically changing the graph traversal.
    [snip]
>If anybody has the reference I'd be happy to hear about it.

There are papers that address these locality issues (see below). At the risk
of being accused of plugging my own book, can I suggest that one accessible

of being accused of pluggin my own book, can I suggest that one accessible
point of reference is
    Garbage Collection
    Algorithms for Automatic Dynamic Memory Management
    John Wiley & Sons, ISBN 0 471 94148 4
    Chapter 6, pages 129-137
    Chapter 8, pages 209-209



Richard Jones

Computing Laboratory                                 Dialling code:
University of Kent at Canterbury                       01227 (UK) 
Canterbury                                             +44 1227 (International)
CT2 7NF, UK                                          Tel: 827943 direct line
                                                          746000 (ext. 7943)
http://www.ukc.ac.uk/computer_science/Html/Jones/    Fax: 762811

===============================================================================

"Approximately depth-first copying" is due to David Moon:
    @inproceedings{moon84,
    title = "Garbage Collection in a Large {LISP} System",
    author = "David A. Moon",
    crossref = "LFP84",
    pages = "235--245",
    }

Paul Wilson et al modify Moon's approach to avoid re-scanning and also
consider the impact of system hash tables on locality:
    @article{wils91,
    title = "Effective Static-Graph Reorganization to Improve Locality in
    Garbage Collected Systems",
    author = "Paul R. Wilson and Michael S. Lam and Thomas G. Moher",
    journal = Sigplan,
    publisher = ACM,
    year = 1991,
    volume = 26,
    number = 6,
    pages = "177--191",
    }

Optimal grouping has been found to be dependent on the shape and type of data
structures being copied:
    @inproceedings{lam92,
    title = "Object Type Directed Garbage Collection to Improve Locality",
    author = "Michael S. Lam and Paul R. Wilson and Thomas G. Moher",
    address = "University of Illinois, USA",
    crossref = "IWMM92",
}

Robert Courts' "Temporal garbage collector" for the TI Explorer is described in:
    @article{cour88,
    title = "Improving Locality Of Reference In A Garbage-Collecting Memory
    Management-System",
    author = "Robert Courts",
    journal = CACM,
    publisher = ACM,
    year = 1988,
    volume = 31,
    number = 9,
    pages = "1128--1138",
    }

Other studies include:

    @techreport{stam82,
    author = "Stamos, James W.",
    title = "A Large Object-Oriented Virtual Memory: Grouping Strategies,
    Measurements, and Performance",
    institution = PARC,
    number = "SCG-82-2",
    month = May,
    year = 1982
    }

    @inproceedings{blau83,
    author = "Ricki Blau",
    title = "Paging on an Object-Oriented Personal Computer for {S}malltalk",
    booktitle = "{ACM} {SIGMETRICS} Conference on Measurement and Modeling of
    Computer Systems, {M}inneapolis",
    month = aug,
    year = 1983,
    publisher = ACM,
    note = "Also appears as Technical Report UCB/CSD 83/125, University of
    California at Berkeley, Computer Science Division (EECS)"

    @article{stam84,
    title = "Static Grouping of Small Objects to Enhance Performance of a Paged Virtual Memory",
    author = "Stamos, James W.",
    journal = TransCompSys,
    publisher = ACM,
    volume = 2,
    number = 3,
    month = May,
    year = 1984,
    pages = "155--180",
    }

Crossreferences:

@proceedings{IWMM92, key = "IWMM",
booktitle = "Proceedings of International Workshop on Memory Management",
title = "Proceedings of International Workshop on Memory Management",
editor = "Yves Bekkers and Jacques Cohen",
address = "St Malo, France",
publisher = SV,
series = LNCS,
volume = 637,
month = "16--18~" # sep,
year = 1992
}

@proceedings{LFP84, key = "LFP",
booktitle = "Conference Record of the 1984 {ACM} Symposium on Lisp and Functiona
l Programming",
title = "Conference Record of the 1984 {ACM} Symposium on Lisp and Functional Pr
ogramming",
editor = "Steele, Guy L.",
publisher = ACM,
address = "Austin, Texas",
month = aug,
year = 1984
}
--
Richard Jones
From: Paul Wilson
Subject: Re: GC and Aging, GC and Allocator Surveys available
Date: 
Message-ID: <553uf3$ha8@roar.cs.utexas.edu>
In article <··············@daffy.systemsx.cs.yale.edu>,
Stefan Monnier  <··············@lia.di.epfl.ch> wrote:
> [ stuff about grouping data structures hierarchically, and special
    treatment of hash tables to avoid randomizing influence that
    destroys locality ]
>If anybody has the reference I'd be happy to hear about it.

I believe that would be my paper on "Effective Static-Graph Reorganization" 
paper with Mike Lam and Tom Moher.  I forget the detailed citation,
but it's in my monster GC survey which is available online
from http://www.cs.utexas.edu/users/oops.  This survey has been
accepted by Computing Surveys.

This survey also discusses Courts' GC, and related work by the MUSROOM
group on an object-grouping memory hierarchy, and a bunch of other
stuff. 

More recently, we published a huge survey on memory allocators (like
malloc/free) that discusses some related issues of locality, as well
as fragmentation.  (It turns out that conventional memory allocators have
been very poorly studied until recently, and their locality characteristics
are important but not very well understood.)  It's also available from our
web site.

My bibliography file with lots of GC and memory hierarchy stuff is
also available there.  (Most of the GC stuff, at least, is also
in Richard Jones' bibliography.)
-- 
| Paul R. Wilson, Comp. Sci. Dept., U of Texas @ Austin (······@cs.utexas.edu)
| Papers on memory allocators, garbage collection, memory hierarchies,
| persistence and  Scheme interpreters and compilers available via ftp from 
| ftp.cs.utexas.edu, in pub/garbage (or http://www.cs.utexas.edu/users/wilson/)