From: Tim Bradshaw
Subject: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <nkjsnocj89o.fsf@tfeb.org>
Does anyone have any figures or pointers to references on how fast
various Lisp systems can allocate ephemeral objects?  What I mean by
this is a situation where the system is allocating a huge number of
relatively small objects which will (essentially) all be garbage at
the time of the first GC, and is allocating enough of these that the
GC gets triggered often enough for its overhead to be measurable.
Obviously this figure would depend on the machine.

I have some figures for a couple of systems but they were measured
with no GC tweaking or vendor input so I don't think it's fair to
report them here (especially as both systems are commercial and one
does significantly better than the other in what may be a very unfair
test, although they both do quite well).

I'm not sure what sort of GC tweaks should be allowed -- definitely
not `turning it off', but I guess anything that I could put inside a
(with-ephemeral-consing ...) style macro and which would not
significantly hurt the performance after the consing frenzy is over.

I'm not really after some kind of `choose system x' figures, rather
I'm interested to know how well actual plausible implementations can
do without heroic efforts.  I guess it would also be cool to know what
the theoretical maximum is -- I suppose for a single huge chunk of
zeroed memory it could take n processor cycles to allocate n words (as
it has to be initialized), always assuming the memory system can keep
up, but that's really a completely other case.

Perhaps there is some literature on this kind of thing (or perhaps I'm
just confused in what I'm asking).  This arose from a real-life
situation where we had this kind of allocation frenzy & I just got
interested in what the limits were.

--tim

From: Joe Marshall
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <8zq4m07o.fsf@content-integrity.com>
Tim Bradshaw <···@tfeb.org> writes:

> Does anyone have any figures or pointers to references on how fast
> various Lisp systems can allocate ephemeral objects?  What I mean by
> this is that if the system is allocating a huge number of relatively
> small objects which will (essentially) all be garbage at the time of
> the first GC, and is allocating enough of these that the GC does get
> triggered often enough that GC overhead is measured.  Obviously this
> figure would depend on the machine.

It would depend on how much RAM you have allocated to ephemeral
space.  Suppose you had 128MB; with 8-byte cons cells you could call
CONS about 16 million times before you filled that up.

A good ephemeral GC would take time roughly proportional to the amount
of storage retained (which is zero) plus a few instructions for
bookkeeping, plus a few instructions to mark the reclaimed pages, but
all in all, I would imagine that a few thousand instructions could do
it.  If you amortize this amount over the cost of consing this much
memory, I think you will find it negligible.

So the GC overhead (for the flip) would have little effect on the rate
of consing.  The rate of consing would be determined by how many
instructions are executed to effect a single cons.  At the least, you
would have to bump the free pointer and write two NILs.  You probably
want to compare the free pointer to the allocation limit, you wouldn't
necessarily have to mark the pages as containing ephemeral pointers,
but you might, and you might encapsulate CONS in a subroutine rather
than inlining it.  You would probably want to check the interrupts
occasionally if you are polling.  You should probably be able to do
this in less than a dozen instructions.

You will, of course, have variations depending on things like memory
latency, OS memory management, etc.

I wrote a little program to test this.  The machine on my desk here
can do a billion cons cells in under a minute, including the 1.7
seconds of GC time (about 57 ns per cons).
I think I have a 177MHz machine, so that would come out to about 10
clock ticks per cons.


> 
> I have some figures for a couple of systems but they were measured
> with no GC tweaking or vendor input so I don't think it's fair to
> report them here (especially as both systems are commercial and one
> does significantly better than the other in what may be a very unfair
> test, although they both do quite well).
> 
> I'm not sure what sort of GC tweaks should be allowed -- definitely
> not `turning it off', but I guess anything that I could put inside a
> (with-ephemeral-consing ...) style macro and which would not
> significantly hurt the performance after the consing frenzy is over.
> 
> I'm not really after some kind of `choose system x' figures, rather
> I'm interested to know how well actual plausible implementations can
> do without heroic efforts.  I guess it would also be cool to know what
> the theoretical maximum is -- I suppose for a single huge chunk of
> zeroed memory it could take n processor cycles to allocate n words (as
> it has to be initialized), always assuming the memory system can keep
> up, but that's really a completely other case.
> 
> Perhaps there is some literature on this kind of thing (or perhaps I'm
> just confused in what I'm asking).  This arose from a real-life
> situation where we had this kind of allocation frenzy & I just got
> interested in what the limits were.
> 
> --tim


-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----==  Over 80,000 Newsgroups - 16 Different Servers! =-----
From: Tim Bradshaw
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <nkjpujgj6ga.fsf@tfeb.org>
Joe Marshall <···@content-integrity.com> writes:

> So the GC overhead (for the flip) would have little effect on the rate
> of consing.  The rate of consing would be determined by how many
> instructions are executed to effect a single cons.  At the least, you
> would have to bump the free pointer and write two NILs.  You probably
> want to compare the free pointer to the allocation limit, you wouldn't
> necessarily have to mark the pages as containing ephemeral pointers,
> but you might, and you might encapsulate CONS in a subroutine rather
> than inlining it.  You would probably want to check the interrupts
> occasionally if you are polling.  You should probably be able to do
> this in less than a dozen instructions.

That was my theory too.  I was curious how close real systems got.
From your measurements it looks like the answer is `quite close'.  Thanks!

--tim
From: Duane Rettig
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <4aeajzvlw.fsf@beta.franz.com>
Joe Marshall <···@content-integrity.com> writes:

> Tim Bradshaw <···@tfeb.org> writes:
> 
> > Does anyone have any figures or pointers to references on how fast
> > various Lisp systems can allocate ephemeral objects?  What I mean by
> > this is that if the system is allocating a huge number of relatively
> > small objects which will (essentially) all be garbage at the time of
> > the first GC, and is allocating enough of these that the GC does get
> > triggered often enough that GC overhead is measured.  Obviously this
> > figure would depend on the machine.

I assume that Tim is asking about the overhead of actually allocating a
cons cell, as opposed to the overhead of garbage-collecting a cons cell.
Even if gc is taken into account, the overhead for the above scenario
shouldn't be much, since many modern "garbage collectors" are misnamed
(they are garbage ignorers, and only actually collect live data in the
newer generations).

> So the GC overhead (for the flip) would have little effect on the rate
> of consing.  The rate of consing would be determined by how many
> instructions are executed to effect a single cons.  At the least, you
> would have to bump the free pointer and write two NILs.

But why bother writing two nils into the cell, when you already have
the objects you want to put into the car and cdr?  It is only necessary
to write the arguments directly at the time of the allocation.

>  You probably
> want to compare the free pointer to the allocation limit, you wouldn't
> necessarily have to mark the pages as containing ephemeral pointers,
> but you might, and you might encapsulate CONS in a subroutine rather
> than inlining it.  You would probably want to check the interrupts
> occasionally if you are polling.  You should probably be able to do
> this in less than a dozen instructions.

This is generally true in Allegro CL, where the instruction count for a
cons is on the order of 10 instructions (unless a page boundary is
crossed).  The best way to show this is to disassemble the primitive
function which does the majority of cons allocation (shown here on
x86 and on hp 64-bit lisps):

x86/linux (8 instructions):

cl-user(2): (disassemble "qcons")
;; disassembly of #("qcons" 1074884083)

;; code start: #x40116df3:
   0: 8b 8f f3 fd movl	ecx,[edi-525]   ; sys::c_gsgc_newconsloc
      ff ff 
   6: 3b 8f f7 fd cmpl	ecx,[edi-521]   ; sys::c_gsgc_newconsend
      ff ff 
  12: 0f 84 23 23 jz	9013            ; "cons+0"
      00 00 
  18: 89 41 ef    movl	[ecx-17],eax
  21: 89 c8       movl	eax,ecx
  23: 89 50 f3    movl	[eax-13],edx
  26: 83 87 f3 fd addl	[edi-525],$8    ; sys::c_gsgc_newconsloc
      ff ff 08 
  33: c3          ret
cl-user(3): 

HP-64 (10 instructions):

cl-user(2): (disassemble "qcons")
;; disassembly of #("qcons" 13835058055285823500)

;; code start: #xc00000000037d80c:
   0: 521f3811             ldd -1016(r16),r31    	sys::c_gsgc_newconsloc
   4: 521d3821             ldd -1008(r16),r29    	sys::c_gsgc_newconsend
   8: 9fbf2030             cmpb,*= r31,r29,40     	lb1,(sys::c_gsgc_newconsloc),(sys::c_gsgc_newconsend)
  12: 37fd0020             ldo 16(r31),r29
  16: 721d3811             std r29,-1016(r16)    	sys::c_gsgc_newconsloc
  20: 0ffa12dd             std r26,-2(s0,r31)
  24: 0ff912cc             std r25,6(s0,r31)
  28: 37fa0000     [ldo]   copy r31,r26   	(sys::c_gsgc_newconsloc)
  32: e840d000             bve (r2)
  36: 34040002     [ldo]   ldi #x1,r4
lb1:
  40: 521d04b0             ldd 600(r16),r29    	cons
  44: eba0d000             bve (r29)
  48: 08000240     [or]    nop
cl-user(3): 

Note that only if the page is exhausted is the more hefty cons primitive
called, which allocates a new page to play in (possibly also causing a
gc).

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Joe Marshall
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <lmu3lrn2.fsf@content-integrity.com>
Duane Rettig <·····@franz.com> writes:

> Joe Marshall <···@content-integrity.com> writes:
>
> > So the GC overhead (for the flip) would have little effect on the rate
> > of consing.  The rate of consing would be determined by how many
> > instructions are executed to effect a single cons.  At the least, you
> > would have to bump the free pointer and write two NILs.
> 
> But why bother writing two nils into the cell, when you already have
> the objects you want to put into the car and cdr?  It is only necessary
> to write the arguments directly at the time of the allocation.

Well, what I meant was ``you'd have to bump the free pointer and write
the car and cdr''.  Obviously you wouldn't write a couple of nils in
there and then turn around and overwrite them with the arguments.


From: Duane Rettig
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <466l7zry6.fsf@beta.franz.com>
Joe Marshall <···@content-integrity.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Joe Marshall <···@content-integrity.com> writes:
> >
> > > So the GC overhead (for the flip) would have little effect on the rate
> > > of consing.  The rate of consing would be determined by how many
> > > instructions are executed to effect a single cons.  At the least, you
> > > would have to bump the free pointer and write two NILs.
> > 
> > But why bother writing two nils into the cell, when you already have
> > the objects you want to put into the car and cdr?  It is only necessary
> > to write the arguments directly at the time of the allocation.
> 
> Well, what I meant was ``you'd have to bump the free pointer and write
> the car and cdr''.  Obviously you wouldn't write a couple of nils in
> there and then turn around and overwrite them with the arguments.

This is indeed true, but actually not quite obvious.  We only get away
with it because the operation is completely atomic; there is no possibility
of a gc occurring between the allocation of and storing into the cons cell.
If the allocation were being done as a separate function, and if there were
a possibility of an interruption occurring immediately after the allocation
but before the storage of the two values, then something would have to be
stored into the cell temporarily, in order not to confuse the gc with
non-lisp values within the new cons cell.  The other way to do this allocation
safely is to not make the new cons cell "visible" to the gc until it is
filled, but that carries with it a re-entrancy problem and the danger of
the same cell being allocated twice.  Of course, making the whole sequence
atomic gets around both of these problems.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Joe Marshall
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <hf4rlosa.fsf@content-integrity.com>
Duane Rettig <·····@franz.com> writes:

> Joe Marshall <···@content-integrity.com> writes:
> 
> > Duane Rettig <·····@franz.com> writes:
> > 
> > > Joe Marshall <···@content-integrity.com> writes:
> > >
> > > > So the GC overhead (for the flip) would have little effect on the rate
> > > > of consing.  The rate of consing would be determined by how many
> > > > instructions are executed to effect a single cons.  At the least, you
> > > > would have to bump the free pointer and write two NILs.
> > > 
> > > But why bother writing two nils into the cell, when you already have
> > > the objects you want to put into the car and cdr?  It is only necessary
> > > to write the arguments directly at the time of the allocation.
> > 
> > Well, what I meant was ``you'd have to bump the free pointer and write
> > the car and cdr''.  Obviously you wouldn't write a couple of nils in
> > there and then turn around and overwrite them with the arguments.
> 
> This is indeed true, but actually not quite obvious.  We only get away
> with it because the operation is completely atomic; there is no possibility
> of a gc occurring between the allocation of and storing into the cons cell.
> If the allocation were being done as a separate function, and if there were
> a possibility of an interruption occurring immediately after the allocation
> but before the storage of the two values, then something would have to be
> stored into the cell temporarily, in order not to confuse the gc with
> non-lisp values within the new cons cell.  The other way to do this allocation
> safely is to not make the new cons cell "visible" to the gc until it is
> filled, but that carries with it a re-entrancy problem and the danger of
> the same cell being allocated twice.  Of course, making the whole sequence
> atomic gets around both of these problems.

You could also arrange for empty pages to be filled with some
innocuous value upon allocation.


From: Duane Rettig
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <41yvvzpgv.fsf@beta.franz.com>
Joe Marshall <···@content-integrity.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Joe Marshall <···@content-integrity.com> writes:
> > 
> > > Duane Rettig <·····@franz.com> writes:
> > > 
> > > > Joe Marshall <···@content-integrity.com> writes:
> > > >
> > > > > So the GC overhead (for the flip) would have little effect on the rate
> > > > > of consing.  The rate of consing would be determined by how many
> > > > > instructions are executed to effect a single cons.  At the least, you
> > > > > would have to bump the free pointer and write two NILs.
> > > > 
> > > > But why bother writing two nils into the cell, when you already have
> > > > the objects you want to put into the car and cdr?  It is only necessary
> > > > to write the arguments directly at the time of the allocation.
> > > 
> > > Well, what I meant was ``you'd have to bump the free pointer and write
> > > the car and cdr''.  Obviously you wouldn't write a couple of nils in
> > > there and then turn around and overwrite them with the arguments.
> > 
> > This is indeed true, but actually not quite obvious.  We only get away
> > with it because the operation is completely atomic; there is no possibility
> > of a gc occurring between the allocation of and storing into the cons cell.
> > If the allocation were being done as a separate function, and if there were
> > a possibility of an interruption occurring immediately after the allocation
> > but before the storage of the two values, then something would have to be
> > stored into the cell temporarily, in order not to confuse the gc with
> > non-lisp values within the new cons cell.  The other way to do this allocation
> > safely is to not make the new cons cell "visible" to the gc until it is
> > filled, but that carries with it a re-entrancy problem and the danger of
> > the same cell being allocated twice.  Of course, making the whole sequence
> > atomic gets around both of these problems.
> 
> You could also arrange for empty pages to be filled with some
> innocuous value upon allocation.

True again.  However, this is another form of your original algorithm, which
was to write nils to car and cdr at the time of the allocation.  It turns out
that these two styles would be similar in execution time - normally the
block-filling technique would be faster, but its speed comes by way of the
cache efficiency it presents.  However, when we are dealing with conses
allocated at a high rate, the page being allocated from is likely to reside
in the cache.  Thus, the cache efficiency is nullified or at least minimized.

However, both of these approaches are less efficient than the approach
which simply stores into the cons cell once, instead of twice, if it
can be done that way.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Joe Marshall
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <r93uk98i.fsf@content-integrity.com>
Duane Rettig <·····@franz.com> writes:

> > You could also arrange for empty pages to be filled with some
> > innocuous value upon allocation.
> 
> True again.  However, this is another form of your original algorithm, which
> was to write nils to car and cdr at the time of the allocation.  It turns out
> that these two styles would be similar in execution time - normally the
> block-filling technique would be faster, but its speed comes by way of the
> cache efficiency it presents.  However, when we are dealing with conses
> allocated at a high rate, the page being allocated from is likely to reside
> in the cache.  Thus, the cache efficiency is nullified or at least minimized.
> 
> However, both of these approaches are less efficient than the approach
> which simply stores into the cons cell once, instead of twice, if it
> can be done that way.

Under `normal' circumstances, it is definitely quicker to do one write
than two, but there are circumstances where this might not be the
case.

1.  A `secure' OS might automatically clean pages before handing them
    to a user process.  In this case, you cannot avoid the `penalty'
    so you might as well reap the `benefit'.

2.  Some OSs may provide an API for `fast zeroing' of a page.


From: Tim Bradshaw
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <nkj66l7m2t0.fsf@tfeb.org>
Duane Rettig <·····@franz.com> writes:

> 
> I assume that Tim is asking about the overhead of actually allocating a
> cons cell, as opposed to the overhead of garbage-collecting a cons cell.
> Even if gc is taken into account, the overhead for the above scenario
> shouldn't be much, since many modern "garbage collectors" are misnamed
> (they are garbage ignorers, and only actually collect live data in the
> newer generations).

Well in real life the program is allocating CLOS objects, but I was
idealizing.  I realise that the actual GC time will be basically zero
because it's all garbage, but I think I was trying to get a handle on
what the cost of actually having the GC run frequently was, as it
takes some small time to trigger it.

For instance the trick of making the first generation be enormous to
reduce GC frequency (which can obviously make the GC cost
asymptotically zero) might not be so good if you can't make it small
again later, perhaps.

But these kinds of issues are all implementation-dependent, obviously.
I think the question I really was trying to answer was `can realistic
Lisp systems allocate as fast (or nearly as fast) as could
realistically be done on the hardware' and the answer to that seems to
be `yes', I think.

As usual my question was probably over-vague...

--tim
From: Joe Marshall
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <vgt6k9nj.fsf@content-integrity.com>
Tim Bradshaw <···@tfeb.org> writes:

> For instance the trick of making the first generation be enormous to
> reduce GC frequency (which can obviously make the GC cost
> asymptotically zero) might not be so good if you can't make it small
> again later, perhaps.

Enormous is relative.  The *entire* address space of the CADR lisp
machine was 256 MB.  I can fit the whole thing (not just the first
generation) in RAM.  In a few years, I expect it will fit in the Level
1 cache.

If I am primarily using Lisp on this machine (which I am), there is no
good reason to not let Lisp use most of the RAM.  And these days,
customers do not balk at buying `enough' RAM.

Under our `normal' load, the ephemeral space is collected a few times
an *hour*, so the asymptotic GC cost is almost unmeasurable. 


From: Tim Bradshaw
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <nkjzoiilmqd.fsf@tfeb.org>
Joe Marshall <···@content-integrity.com> writes:
 
> Enormous is relative.  The *entire* address space of the CADR lisp
> machine was 256 MB.  I can fit the whole thing (not just the first
> generation) in RAM.  In a few years, I expect it will fit in the Level
> 1 cache.
> 
> If I am primarily using Lisp on this machine (which I am), there is no
> good reason to not let Lisp use most of the RAM.  And these days,
> customers do not balk at buying `enough' RAM.
> 
> Under our `normal' load, the ephemeral space is collected a few times
> an *hour*, so the asymptotic GC cost is almost unmeasurable. 

Yes, of course.  The kind of problem I'm talking about is something
like this:

If all the allocation is going to be ephemeral then there's no real
reason not to have the first ephemeral space be almost the whole
physical memory (that's basically my definition of `enormous' for any
given machine).

*But* if I'm wrong about that and a significant amount of stuff
survives the 1st generation then I'm completely screwed: If I'm lucky
I'll page into the ground, if I'm not I'll run out of swap space and
die.

Even if I'm significantly more cautious and have a 1st generation
which is sufficiently small that I'm not going to page too badly even
in the bad case, but I'm still wrong and a whole load of stuff
survives the 1st generation, then I end up with possibly significant
GC pauses as objects get copied.

Really I'm just saying you (I) need to tune the GC, which is hardly
news...

--tim
From: Will Deakin
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <3A26247C.3050001@pindar.com>
Tim wrote:

> Yes, of course.  The kind of problem I'm talking about is something
> like this:
> 
> If all the allocation is going to be ephemeral then there's no real
> reason not to have the first ephemeral space be almost the whole
> physical memory (that's basically my definition of `enormous' for any
> given machine).
> 
> *But* if I'm wrong about that and a significant amount of stuff
> survives the 1st generation then I'm completely screwed: If I'm lucky
> I'll page into the ground, if I'm not I'll run out of swap space and
> die.
Since you're talking about `spaces', are you talking about a copying 
generational collector? In the literature[1] there are a number of 
non-copying generational GCs described -- whether your implementation 
`del giorno' does this I don't know...

Anyway, ephemeral allocation rates of 60-95% have been measured in a 
range of Lisps[2]. IIRC similar or higher rates have been measured in 
Smalltalk, Prolog and some Scheme systems.

:)w

[1] and having done the research I've forgotten the piece of paper on 
which I wrote the references :(

[2] Jones[3] and Wilson[4] quote Zorn and others: Benjamin G. Zorn. 
Comparative Performance Evaluation of Garbage Collection Algorithms. PhD 
thesis, University of California at Berkeley, 1989.

[3] Richard E. Jones.  Garbage Collection: Algorithms for Automatic 
Dynamic Memory Management. Wiley, July 1996.

[4]  Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles. 
Dynamic storage allocation: A survey and critical review. IWMM, volume 
986 of  Lecture Notes in Computer Science, Kinross, Scotland, September 
1995. Springer-Verlag.
From: Tim Bradshaw
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <nkj4s0pu24n.fsf@tfeb.org>
Will Deakin <········@pindar.com> writes:

> With you talking about `spaces' are you are talking about a copying 
> generational collector? In the literature[1] there a number of 
> non-copying generational gc's described -- whether your implementation 
> `del giorno' does this I don't know...
> 

Yes, both of the implementations I'm currently using in anger use a
copying generational GC.

I read some stuff recently (forget where) which was saying that
non-copying GCs might be quite interesting for MP systems, as copying
invalidates everyone's caches.  I'm not sure if this was based on any
measurement though or if it was just a random hypothesis.  Given no
one seems to have serious GCd language implementations that run on
stock-hardware MP machines[1] it was probably just a guess, and thus
probably wrong...  Of course if almost everything is ephemeral as in
my original question then a copying GC really doesn't copy much and so
the cache invalidation question is pretty moot.

--tim


[1] I don't know if Java is serious (or really runs on MP machines),
but its GC doesn't seem to be. Even the flashy Sun enterprise Java or
whatever it is called seems to just have a conventional generational
GC (if I read through the hype correctly).
From: Will Deakin
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <3A26686D.8070800@pindar.com>
Tim wrote:

> I read some stuff recently (forget where) which was saying that
> non-copying GCs might be quite interesting for MP systems, as copying
> invalidates everyone's caches. 
I would see this as very plausible. It would depend on how you indicate 
something can be gc'd tho' -- in the context of high ephemeral 
allocation and a large disparity between cache and memory sizes, a 
system that marks live data would have advantages over a copying system, 
for example. Not that this is saying much.

Hmmmm. Anyway, I've got some more stuff but not where I am -- I'll see 
if I can dig this out tonight and provide a little more coherent response.

> Given no one seems to have serious GCd language implementations that
> run on stock-hardware MP machines...
Looking at the libhoard stuff[2], there are issues with scalable 
threaded and mp memory allocation full stop[1] -- as can be seen in 
Sun's comparison of solaris stock malloc/free, thread malloc/free 
against libhoard[3]

> Of course if almost everything is ephemeral as in my original question
> then a copying GC really doesn't copy much and so the cache invalidation
> question is pretty moot.
I'm not as sure about this, as I could easily conceive of an MP GC that 
would trigger cache invalidation of dead garbage.

> I don't know if Java is serious (or really runs on MP machines),
> but its GC doesn't seem to be. 
I remember a while back muttering my dark suspicions about something 
like this[4]...

:) w

[1] Hoard: A Scalable Memory Allocator for Multithreaded Applications, 
by Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. 
Wilson. (ASPLOS-IX). Cambridge, MA, November 2000. For a pdf of the 
paper www.cs.utexas.edu/users/emery/hoard/asplos.pdf
[2] www.hoard.org
[3] soldc.sun.com/articles/multiproc/multiproc.html
[4] (I can't remember if it was me or not)  ;)
From: Joe Marshall
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <pujd75px.fsf@content-integrity.com>
Tim Bradshaw <···@tfeb.org> writes:

> I read some stuff recently (forget where) which was saying that
> non-copying GCs might be quite interesting for MP systems, as copying
> invalidates everyone's caches.  

Yes.  But considering that newly consed objects cannot be shared
(until some sort of sharing operation happens, of course), you can
use copying GC on the most ephemeral level.


From: Pekka P. Pirinen
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <ixhf4pz6qx.fsf@harlequin.co.uk>
Will Deakin <········@pindar.com> writes:
> In the literature[1] there a number of non-copying generational gc's
> described [...]
> 
> [1] and having done the research I've forgotten the piece of paper on 
> which I wrote the references :(

You've got the most important references, when you've got Jones' book
and Wilson et al's survey.  More recent than those, there's
Johnstone's dissertation (warning: big document):

@phdthesis{john97a, 
title = {Non-Compacting Memory Allocation and Real-Time Garbage Collection},
author = {Mark S. Johnstone},
school = {University of Texas at Austin},
month = dec,
year = 1997,
URL = {ftp://ftp.cs.utexas.edu/pub/garbage/johnstone-dissertation.ps.gz}
}

I don't remember any other recent work.
-- 
Pekka P. Pirinen, Adaptive Memory Management Group, Harlequin Limited
"If you don't look after knowledge, it goes away."
  - Terry Pratchett, The Carpet People
From: Rob Warnock
Subject: Re: Peak (& sustainable) ephemeral allocation rates
Date: 
Message-ID: <902ot2$3prld$1@fido.engr.sgi.com>
Tim Bradshaw  <···@tfeb.org> wrote:
+---------------
| Does anyone have any figures or pointers to references on how fast
| various Lisp systems can allocate ephemeral objects?
+---------------

Appel's book "Compiling with Continuations" has some analysis, and IIRC
also makes reference to some studies by Ungar concerning using a fixed
ephemeral-allocation area roughly the size of the secondary cache on
the machine. That is, rather than the classical "two semi-space" copying
collector (or the generational extension of it), there's a third, fixed
"nursery" space for "generation zero".

All allocation (at least of "small" objects) is done there with a simple
incrementing pointer, and when that space fills up a minor collection is
done that moves all the live data from the nursery into the current active
semi-space of gen#1 (with a more major GC only if/when needed), and you
start allocating again in the fixed nursery. This keeps the area where
furious allocation is going on "hot" in the (secondary) cache, rather
than striding through all of memory.

[Note that Baker's "Cheney on the MTA" ("Cons Should not Cons: II") uses
the C stack as exactly such a fixed nursery area, though with a higher
percentage of "wasted" space (for the unused C stack frames). There's
recently been a long thread in comp.lang.scheme about such things,
especially re the "Chicken" Scheme compiler, whose runtime uses CotMTA.]

Anyway, Appel claims that with typical secondary cache sizes, the amortized
cost of collecting the live data out of the nursery is close to zero...


-Rob

p.s. If you want more precise references, I'll try to dig them out.

-----
Rob Warnock, 31-2-510		····@sgi.com
Network Engineering		http://reality.sgi.com/rpw3/
Silicon Graphics, Inc.		Phone: 650-933-1673
1600 Amphitheatre Pkwy.		PP-ASEL-IA
Mountain View, CA  94043