From: David Hanley
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <3270DCA7.6F89@netright.com>
Stefan Monnier wrote:
> 
> David Hanley <·····@netright.com> writes:
> > It is interesting to see copy-on-read used to increase heap locality,
> > but, unless I am mistaken, isn't this what a copying gc will generally
> > do?  I.E. the nodes of a list, even if allocated distinctly, should be
> > copied into the same page, generally speaking.
> 
> Yes, but they generally do it with a GC-directed graph traversal, whereas the
> TI Explorer would mostly let the mutator do it, so that objects would not be
> laid out in a breadth-first or depth-first manner but in an "actual access
> pattern" manner.  In other words, if your graph traversal is fairly constant,
> one GC of your data structure will make sure that you will later step through
> memory sequentially.

	I understand that; I suppose the question I should have added is:
how often do we reach objects by some path other than the pointers
attached to them? :)  I can think of only a few situations in which this
training technique would do better, and all of them are obscure.  Of
course, that's probably just a lack of imagination on my part, and I'm
genuinely curious to see examples where this training technique works
very well.
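
	For what it's worth, here is roughly how I picture the copy-on-read
mechanism, as a C sketch of a Baker-style barrier.  The names and layout
below are mine, not the Explorer's; the point is just that first-touch
copying packs objects in the order the mutator actually reads them:

    #include <stddef.h>
    #include <string.h>

    typedef struct object {
        struct object *forward;  /* forwarding pointer, set once copied */
        size_t size;             /* total size of the object in bytes */
        /* ... payload follows ... */
    } object;

    /* Hypothetical space bounds; a real collector gets these from its heap. */
    static char *fromspace_lo, *fromspace_hi;
    static char *tospace_free;   /* allocation frontier in tospace */

    static int in_fromspace(object *p)
    {
        return (char *)p >= fromspace_lo && (char *)p < fromspace_hi;
    }

    /* Every pointer load goes through the barrier.  The first time the
       mutator reads a fromspace object, it is copied to the frontier, so
       survivors end up laid out in the order the program touched them
       rather than in the order a GC traversal would have visited them. */
    object *read_barrier(object *p)
    {
        if (p == NULL || !in_fromspace(p))
            return p;                      /* already in tospace */
        if (p->forward == NULL) {          /* not copied yet: copy on read */
            object *copy = (object *)tospace_free;
            memcpy(copy, p, p->size);
            tospace_free += p->size;
            copy->forward = NULL;
            p->forward = copy;             /* leave a forwarding pointer */
        }
        return p->forward;
    }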

	One example might be a binary tree in which a few branches are
being hit all the time while the rest are accessed rarely, if ever.  Or
the same thing for a hash table.

	I suppose what I'm skeptical of is whether this will give a big
enough win in the general case to compensate for the cost of the read
barrier.  It might, by avoiding trips to the disk and by hitting the L1
and L2 caches more often.  Do you know of any solid research done in
this area?

dave
From: Douglas Johnson
Subject: Re: Garbage Collection and Aging
Date: 
Message-ID: <32713b17.68916157@nntp.ix.netcom.com>
David Hanley <·····@netright.com> wrote:

>Stefan Monnier wrote:
>> 
>> David Hanley <·····@netright.com> writes:
>> > It is interesting to see copy-on-read used to increase heap locality,

>> Yes, but they generally do it with a GC-directed graph traversal, whereas the
>> TI Explorer would mostly let the mutator do it, so that objects would not be
>> laid out in a breadth-first or depth-first manner but in an "actual access
>> pattern" manner. 

>	I suppose what I'm skeptical of is whether this will give a big
>enough win in the general case to compensate for the cost of the read
>barrier.  It might, by avoiding trips to the disk and by hitting the L1
>and L2 caches more often.  Do you know of any solid research done in
>this area?

It would be immodest to claim that this is solid research, but I did a
paper for the 1991 ASPLOS conference titled "The Case for a Read
Barrier" that compared two collectors, one with a read barrier that
did virtual memory optimization as discussed here and one that did
not.  The problem being run was a VLSI design program.  

This program first created a circuit in connectivity order: transistors,
then the wires they used, then the transistors at the ends of those
wires, and so on.  It then optimized the layout of that design.  When
doing that, the program tended to access the design not by connectivity
but by physical location on the face of the chip.
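
Very roughly, and with invented names (nothing below comes from the real
program), the shape of those two phases was something like this:

    #include <stdlib.h>

    struct device {
        double x, y;           /* physical position on the face of the chip */
        struct device *next;   /* connectivity: next device on the same net */
    };

    /* Phase 1: devices are allocated while walking the netlist, so objects
       that are neighbors in connectivity end up neighbors in the heap. */
    struct device *build_net(int n)
    {
        struct device *head = NULL;
        for (int i = 0; i < n; i++) {
            struct device *d = malloc(sizeof *d);
            d->x = rand() % 1000;  /* position is unrelated to */
            d->y = rand() % 1000;  /* allocation order         */
            d->next = head;
            head = d;
        }
        return head;
    }

    /* Phase 2: the optimizer walks devices sorted by position rather than
       by connectivity, so with creation-order layout consecutive visits
       land on unrelated pages.  A copy-on-read collector re-packs the
       survivors in this visit order instead. */
    void optimize(struct device **by_position, int n)
    {
        for (int i = 1; i < n; i++)
            if (by_position[i]->x == by_position[i - 1]->x)
                by_position[i]->x += 1.0;  /* stand-in for real layout work */
    }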

The virtual memory behavior of the program with the conventional
collector was only slightly worse than with the temporal collector
during the first phase.  In the second phase, however, it was (depending
on physical memory size) dramatically worse, to the point that it
thrashed itself to death, failing to complete after a week of running,
while the program using the temporal collector completed normally.

Now whether this is worth it depends on the problem, the cost of the
read barrier, and the cost of memory.  The read barrier in the Explorer
was nearly free: one bit in each entry in the memory map and a NAND
gate.  It can be similarly inexpensive on conventional architectures.
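
Just to give a feel for the software case: the per-load check can be a
mask, a compare, and a rarely-taken branch.  The layout and names below
are invented for illustration, not taken from any real system:

    #include <stdint.h>

    /* Assume fromspace occupies one aligned region, so "is this pointer
       into fromspace?" is a single mask-and-compare. */
    #define FROMSPACE_BASE ((uintptr_t)0x40000000u)
    #define FROMSPACE_MASK ((uintptr_t)0xf0000000u)

    void *gc_copy_on_read(void *p);  /* slow path: evacuate and forward */

    static inline void *load_ptr(void **slot)
    {
        void *p = *slot;
        /* Common case: one AND, one compare, one well-predicted branch. */
        if (((uintptr_t)p & FROMSPACE_MASK) == FROMSPACE_BASE)
            p = gc_copy_on_read(p);  /* only fromspace pointers pay more */
        return p;
    }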

A lot of programs don't go through the kind of dramatic switch in access
patterns that that VLSI design program did.  For those, the best layout
of objects is likely their order of creation.  But I've seen lots of
programs that do make major changes in access patterns.  If you have a
hardware read barrier, limited physical memory, and such a program, you
win.

-- Doug