Garbage Collection Eligibility and portability

From: Matt L
Subject: Garbage Collection Eligibility and portability
Date: Sat, 01 Dec 2007 23:29:32 +0000
Message-ID: <w7m4j.22450$4V6.15680@newssvr14.news.prodigy.net>

There seems to be variance in garbage-collected languages, or in options 
used, that alters exactly when a (strong) variable reference to a memory 
location ceases holding it from collection.

One type generally follows scope - that is, while code is executing in a 
scope where the variable is available the referenced memory is 
ineligible for collection.  (Importantly) This tends to include the 
caller's form for values passed from above.

The other, which I've seen called "Aggressive garbage collection" in 
some articles on this behavior in C#/.Net (possibly misnamed as it 
pertains to how the language interacts with the GC, not the GC itself), 
appears to make a memory location eligible the instant the last 
reference to the location is used.  A caller passing a reference to a 
function does not keep it ineligible, and the receiving function removes 
its hold the instance the last reference is made.

I can see the benefits of "aggressive" in composition of code where 
sizeable intermediate results are passed down into successive phases of 
a computation.

In my own exploration I have found a case where the distinction between 
the two, from the point of view of my library and how it should be used, 
makes a huge difference:

I have tried testing in a couple of lisp variants (CLisp, SBCL and 
Lispworks); but may lack the expertise to properly test them.  I was not 
able to get SBCL/CLisp to exhibit the "aggressive" behavior.

In Lispworks Win32, the following code exhibits either behavior 
depending on whether the functions are compiled at a high speed 
setting/without debug info:


(defun secondary (input-list)
   (format t "input-list length is ~A" (length input-list))
   (terpri)
   (format t "Force collection here")
   (terpri)
   (force-output)
   (break)
   (format t "Don't need input-list here")
   (force-output))

(defun primary ()
   (secondary (make-list 10000000)))


(declaim (optimize (safety 0) (speed 3) (debug 0)))
(compile 'primary)
(compile 'secondary)


In the function "secondary", at the (break) point the code no longer 
needs that huge memory-hogging list.  Cleaning out ram at this point 
will show how the current environment and settings deal with such 
references.

Is there a way to make any given lisp exhibit this behavior, 
surrendering its references the instant it is no longer needed (figure 
it's inherent to how the implementation uses the stack; but thought I'd 
ask)?  Is there a good way to emulate this behavior without requiring 
this compiler behavior or characteristic?

I have a library I'd like to publish as portable but this issue may 
influence its implementation or usage patterns.

Any information on this subject is appreciated.  Any good terminology I 
can search on?  Links to discussion of this topic?

Re: Garbage Collection Eligibility and portability Tim Bradshaw
- Re: Garbage Collection Eligibility and portability Matthew Lamari
Re: Garbage Collection Eligibility and portability Kaz Kylheku
- Re: Garbage Collection Eligibility and portability Gene
Re: Garbage Collection Eligibility and portability Daniel Weinreb
- Re: Garbage Collection Eligibility and portability Matthew Lamari
  - Re: Garbage Collection Eligibility and portability Tim Bradshaw
- Re: Garbage Collection Eligibility and portability George Neuner
- Re: Garbage Collection Eligibility and portability William D Clinger
  - Re: Garbage Collection Eligibility and portability William D Clinger
Re: Garbage Collection Eligibility and portability szergling

From: Tim Bradshaw
Subject: Re: Garbage Collection Eligibility and portability
Date: Sun, 02 Dec 2007 16:46:24 +0000
Message-ID: <5a31d1f0-e8a4-4d2f-a0e2-899269a75f0a@e25g2000prg.googlegroups.com>

On Dec 1, 11:29 pm, Matt L <······@forthebirds.com.com> wrote:

> I have a library I'd like to publish as portable but this issue may
> influence its implementation or usage patterns.

I think that if your code depends on these details then you have
fairly serious implementation issues.  It is never a good idea to
depend on the analysis the compiler does about what is live within a
stack frame, let along when or if the GC runs.  If you are desperate,
you can finesse the liveness analysis by, say:

(let ((x (make-huge-thing)))
  ...
  (setf x nil)
  ...)

From: Matthew Lamari
Subject: Re: Garbage Collection Eligibility and portability
Date: Sun, 02 Dec 2007 19:02:32 +0000
Message-ID: <wiD4j.30141$Pv2.13306@newssvr23.news.prodigy.net>

It has to do more with the wider usage-pattern, not just code internal 
to the library.  "When" the GC runs is an orthogonal issue - this is 
about eligibility if ram runs low.  But thanks, yes, it looks like this 
analysis should not be assumed.

Here's an example.

(let ((fib (some-form-that-builds-fibonacci-sequence)))
    (nth/ 10000 fib))

(Note - in my tests, even without the local variable, the caller still 
holds the generated list as ineligible for collection when passing it on).

The fibonacci is defined in terms of a memoized self-referencing 
lazy-list, and nth/ traverses lazy-lists.  The internal memoization of 
self-referencing lists allows them to proceed with linear performance 
(algorithmically match the iterative solution, not recalculating every 
prior fromscratch); but keeping a reference to the start of the memoized 
list keeps the whole cache around even though it will never be reused. 
With "aggressive", the same call to "nth" holds no significant 
"uncollectable" hold over a large amount of ram.

I have a workaround for this - that the self-referencing list be defined 
so as to have another level of indirection - that its memoized list 
itself only be created on a request for the first element.  This 
construct may also prove useful in other cases to emulate "aggressive" 
garbage collection on single-scanned sequences (man, I really need to 
find the correct terminology on all of this).

 From discussions on IRC, the requirement of a fixed block of ram for 
the fibonacci solution seemed inherent even to the stereotypical haskell 
fibonacci solution on which mine was based (this is not mentioned 
intended to elicit language advocacy/wars/flames, nor to suggest that 
there are not good workarounds).

Tim Bradshaw wrote:
 > On Dec 1, 11:29 pm, Matt L <······@forthebirds.com.com> wrote:
 >
 >> I have a library I'd like to publish as portable but this issue may
 >> influence its implementation or usage patterns.
 >
 > I think that if your code depends on these details then you have
 > fairly serious implementation issues.  It is never a good idea to
 > depend on the analysis the compiler does about what is live within a
 > stack frame, let along when or if the GC runs.  If you are desperate,
 > you can finesse the liveness analysis by, say:
 >
 > (let ((x (make-huge-thing)))
 >   ...
 >   (setf x nil)
 >   ...)

From: Kaz Kylheku
Subject: Re: Garbage Collection Eligibility and portability
Date: Sun, 02 Dec 2007 21:49:00 +0000
Message-ID: <e76ec25e-ee10-4ff4-b55e-62461f75544f@w40g2000hsb.googlegroups.com>

On Dec 1, 3:29 pm, Matt L <······@forthebirds.com.com> wrote:
> There seems to be variance in garbage-collected languages, or in options
> used, that alters exactly when a (strong) variable reference to a memory
> location ceases holding it from collection.
>
> One type generally follows scope - that is, while code is executing in a
> scope where the variable is available the referenced memory is
> ineligible for collection.

The idea of a scope where a lexical variable is available, is only an
abstraction that helps us develop the program. Within a lexical
contour, the lifetime of a variable in fact ends with the last mention
which takes its value. The variable still has a scope which extends to
the end of the contour. However, since there are no references to that
variable there, and none can be inserted after the code is accepted by
the compiler, that piece of scope is fictional.

Garbage collectors work at a lower level.  They will scan the stack
frames for references to an object, as well as all of the relevant
registers of a frozen thread. An object is ineligible for garbage
collection if a reference is found in any of those areas.   After the
last use of a reference variable, it's possible that the reference
still hangs around in some live stack frame location or in a register.
That copy can prevent the reference from being garbage collected.

  (Importantly) This tends to include the
> caller's form for values passed from above.

That's right. So if there is a lexical scope where the last use of a
variable occurs in a function call, which deeply nests, and the last
reference to the object is at the bottom-most activation record, it
may be necessary for that entire chain to pop all the way out before
the object is considered garbage.

> The other, which I've seen called "Aggressive garbage collection" in
> some articles on this behavior in C#/.Net (possibly misnamed as it
> pertains to how the language interacts with the GC, not the GC itself),
> appears to make a memory location eligible the instant the last
> reference to the location is used.  A caller passing a reference to a
> function does not keep it ineligible, and the receiving function removes
> its hold the instance the last reference is made.

However, depending on how this is done, it may introduce performance
penalties.

You don't get more timely GC for nothing.

For instance, suppose that it's done like this: the compiler uses its
variable liveness information to insert additional code which clears
any register or memory location that holds a variable which is no
longer live. Obviously, this costs extra cycles.

It could be done in special cases, like when the last use of a value
is as a downward funarg, and that value was created within that scope
and is known not to have otherwise escaped. It could also be sensitive
to the optimization setting (is size important versus speed).

> I can see the benefits of "aggressive" in composition of code where
> sizeable intermediate results are passed down into successive phases of
> a computation.
>
> In my own exploration I have found a case where the distinction between
> the two, from the point of view of my library and how it should be used,
> makes a huge difference:
>
> I have tried testing in a couple of lisp variants (CLisp, SBCL and
> Lispworks); but may lack the expertise to properly test them.  I was not
> able to get SBCL/CLisp to exhibit the "aggressive" behavior.
>
> In Lispworks Win32, the following code exhibits either behavior
> depending on whether the functions are compiled at a high speed
> setting/without debug info:
>
> (defun secondary (input-list)
>    (format t "input-list length is ~A" (length input-list))
>    (terpri)
>    (format t "Force collection here")
>    (terpri)
>    (force-output)
>    (break)
>    (format t "Don't need input-list here")
>    (force-output))
>
> (defun primary ()
>    (secondary (make-list 10000000)))
>
> (declaim (optimize (safety 0) (speed 3) (debug 0)))
> (compile 'primary)
> (compile 'secondary)
>
> In the function "secondary", at the (break) point the code no longer
> needs that huge memory-hogging list.

But after the (break) point, it quickly returns back to primary, which
just returns. After that, the object is garbage.

It could be a problem in code that doesn't quickly return. Say it uses
the parameter value once, and then doesn't need it any longer, but
goes into a lengthy loop. Perhaps an infinite service loop.

The program could be restructured to avoid the problem.

One obvious thing you could do in this particular case (big list) is
to clobber the list when it's no longer used:

  (rplacd big-list nil)

Now the garbage collector can at least blow away all but the first
cons.

Also, an extension could be provided whereby the programmer could give
the garbage collector a hint that some object is no longer going to be
used. The next time GC is run, the garbage collector would replace
every reference to that object with nil (similar to how weak pointers
lapse) and blow it away.

From: Gene
Subject: Re: Garbage Collection Eligibility and portability
Date: Mon, 03 Dec 2007 04:37:10 +0000
Message-ID: <bb5cfc34-84f1-4ad5-a3c8-e9742e6860a9@e67g2000hsc.googlegroups.com>

On Dec 2, 4:49 pm, Kaz Kylheku <········@gmail.com> wrote:
> On Dec 1, 3:29 pm, Matt L <······@forthebirds.com.com> wrote:
>
> > There seems to be variance in garbage-collected languages, or in options
> > used, that alters exactly when a (strong) variable reference to a memory
> > location ceases holding it from collection.
>
> > One type generally follows scope - that is, while code is executing in a
> > scope where the variable is available the referenced memory is
> > ineligible for collection.
>
> The idea of a scope where a lexical variable is available, is only an
> abstraction that helps us develop the program. Within a lexical
> contour, the lifetime of a variable in fact ends with the last mention
> which takes its value. The variable still has a scope which extends to
> the end of the contour. However, since there are no references to tha
> variable there, and none can be inserted after the code is accepted by
> the compiler, that piece of scope is fictional.
>
> Garbage collectors work at a lower level.  They will scan the stack
> frames for references to an object, as well as all of the relevant
> registers of a frozen thread. An object is ineligible for garbage
> collection if a reference is found in any of those areas.   After the
> last use of a reference variable, it's possible that the reference
> still hangs around in some live stack frame location or in a register.
> That copy can prevent the reference from being garbage collected.
>
>   (Importantly) This tends to include the
>
> > caller's form for values passed from above.
>
> That's right. So if there is a lexical scope where the last use of a
> variable occurs in a function call, which deeply nests, and the last
> reference to the object is at the bottom-most activation record, it
> may be necessary for that entire chain to pop all the way out before
> the object is considered garbage.
>
> > The other, which I've seen called "Aggressive garbage collection" in
> > some articles on this behavior in C#/.Net (possibly misnamed as it
> > pertains to how the language interacts with the GC, not the GC itself),
> > appears to make a memory location eligible the instant the last
> > reference to the location is used.  A caller passing a reference to a
> > function does not keep it ineligible, and the receiving function removes
> > its hold the instance the last reference is made.
>
> However, depending on how this is done, it may introduce performance
> penalties.
>
> You don't get more timely GC for nothing.

Well, actually you can get very close.  The compiler can produce a
static table that shows which pointers in a frame are live as a
function of the return address just below in the stack (this gives the
active function call site).  In this manner the (precise) GC itself
does a tad of extra work, but there's no penalty unless a GC occurs.

A perfect freebie occurs when the register allocator uses the same
register for multiple short-lived variables  and/or skips a callee
save because the corresponding register's live range ends at or before
the call site.

Gene

From: Daniel Weinreb
Subject: Re: Garbage Collection Eligibility and portability
Date: Sun, 02 Dec 2007 17:12:00 +0000
Message-ID: <4752E761.5020204@alum.mit.edu>

I think the systems you are describing as "aggressive" might
be implemented by having the compiler perform lifetime
analysis on variables, and then conveying that information
to where it can be examined at runtime by the GC.  The GC
would be able to tell that certain words on the stack
should not "traced" because they are known to be "dead"
(in the parlance of compiler writers).

I don't know any specific terminology for that.

There isn'tany way to get a given Lisp to act that way.

Usually this would not affect portability.  The only
way I can seriously see it affect portability is if
the delay in returning the garbage could cause Lisp to
run out of memory.  If this really matters, you could
just put a (setq input-list nil) after the first format
call in secondary.

Matt L wrote:
> 
> There seems to be variance in garbage-collected languages, or in options 
> used, that alters exactly when a (strong) variable reference to a memory 
> location ceases holding it from collection.
> 
> One type generally follows scope - that is, while code is executing in a 
> scope where the variable is available the referenced memory is 
> ineligible for collection.  (Importantly) This tends to include the 
> caller's form for values passed from above.
> 
> The other, which I've seen called "Aggressive garbage collection" in 
> some articles on this behavior in C#/.Net (possibly misnamed as it 
> pertains to how the language interacts with the GC, not the GC itself), 
> appears to make a memory location eligible the instant the last 
> reference to the location is used.  A caller passing a reference to a 
> function does not keep it ineligible, and the receiving function removes 
> its hold the instance the last reference is made.
> 
> I can see the benefits of "aggressive" in composition of code where 
> sizeable intermediate results are passed down into successive phases of 
> a computation.
> 
> In my own exploration I have found a case where the distinction between 
> the two, from the point of view of my library and how it should be used, 
> makes a huge difference:
> 
> I have tried testing in a couple of lisp variants (CLisp, SBCL and 
> Lispworks); but may lack the expertise to properly test them.  I was not 
> able to get SBCL/CLisp to exhibit the "aggressive" behavior.
> 
> In Lispworks Win32, the following code exhibits either behavior 
> depending on whether the functions are compiled at a high speed 
> setting/without debug info:
> 
> 
> (defun secondary (input-list)
>   (format t "input-list length is ~A" (length input-list))
>   (terpri)
>   (format t "Force collection here")
>   (terpri)
>   (force-output)
>   (break)
>   (format t "Don't need input-list here")
>   (force-output))
> 
> (defun primary ()
>   (secondary (make-list 10000000)))
> 
> 
> (declaim (optimize (safety 0) (speed 3) (debug 0)))
> (compile 'primary)
> (compile 'secondary)
> 
> 
> In the function "secondary", at the (break) point the code no longer 
> needs that huge memory-hogging list.  Cleaning out ram at this point 
> will show how the current environment and settings deal with such 
> references.
> 
> Is there a way to make any given lisp exhibit this behavior, 
> surrendering its references the instant it is no longer needed (figure 
> it's inherent to how the implementation uses the stack; but thought I'd 
> ask)?  Is there a good way to emulate this behavior without requiring 
> this compiler behavior or characteristic?
> 
> I have a library I'd like to publish as portable but this issue may 
> influence its implementation or usage patterns.
> 
> Any information on this subject is appreciated.  Any good terminology I 
> can search on?  Links to discussion of this topic?
> 
>

From: Matthew Lamari
Subject: Re: Garbage Collection Eligibility and portability
Date: Sun, 02 Dec 2007 19:02:34 +0000
Message-ID: <yiD4j.30142$Pv2.1266@newssvr23.news.prodigy.net>

Daniel Weinreb wrote:
> the delay in returning the garbage could cause Lisp to
> run out of memory.  If this really matters, you could
> just put a (setq input-list nil) after the first format
> call in secondary.

That's the core problem (run out of memory, inability to scale in 
certain directions); but (at least in my test) the CALLER also keeps the 
item ineligible for collection.

The aggressiveness property is very useful, my question was really 
two-fold, how to work  around the absence of this analysis, and if it 
can be recaptured in some way.

I went into detail on what I'm seeing and how it affects what I'm trying 
to do on my other response.

But with early eligibility for garbage collection, code like this 
becomes (theoretically) possible (assume three-gigabyte-list to mean 
"can't have 2 of these in ram").

(let* ((a (make-three-gigabyte-list))
        (b (mapcar #'func-1 a))
        (c (mapcar #'func-2 b)))
   (mapcar #'func-3 c))

But it's beginning to sound like it's better to code so as to not assume 
this compiler behavior.  Even in the implementation where I found it can 
be done, it can only be done when caller and function are compiled in 
no-debug mode on non-generic functions - which may not be a tolerable 
restriction.

From: Tim Bradshaw
Subject: Re: Garbage Collection Eligibility and portability
Date: Sun, 02 Dec 2007 20:50:24 +0000
Message-ID: <2faca3db-9ae1-4430-9fdc-f89b9e37c94e@y5g2000hsf.googlegroups.com>

On Dec 2, 7:02 pm, Matthew Lamari <······@forthebirds.com.com> wrote:

>
> (let* ((a (make-three-gigabyte-list))
>         (b (mapcar #'func-1 a))
>         (c (mapcar #'func-2 b)))
>    (mapcar #'func-3 c))
>
> But it's beginning to sound like it's better to code so as to not assume
> this compiler behavior.  Even in the implementation where I found it can
> be done, it can only be done when caller and function are compiled in
> no-debug mode on non-generic functions - which may not be a tolerable
> restriction.

What you're doing is pretty much analogous to assumign tail-call
elimination.  In fact it's more-or-less exactly the same thing, since
you can almost certainly rephrase these things as tail calls. which
may or may not be eliminated.  If you're writing programs which make
that kind of assumption, well, don't.

From: George Neuner
Subject: Re: Garbage Collection Eligibility and portability
Date: Mon, 03 Dec 2007 02:14:32 +0000
Message-ID: <anl6l3p6jm66qsk988ecoelofp59rklqh6@4ax.com>

On Sun, 02 Dec 2007 17:12:00 GMT, Daniel Weinreb <···@alum.mit.edu>
wrote:

>I think the systems you are describing as "aggressive" might
>be implemented by having the compiler perform lifetime
>analysis on variables, and then conveying that information
>to where it can be examined at runtime by the GC.  The GC
>would be able to tell that certain words on the stack
>should not "traced" because they are known to be "dead"
>(in the parlance of compiler writers).

That's how precise GC can be implemented for statically typed
languages that use untagged values - the call frame contains a map of
active pointers which is updated at points where GC could occur.
Those points depend on the design of the system but in general the map
would have be updated before every function call in case GC is
triggered by a child frame.

>I don't know any specific terminology for that.

There isn't any that I know of either.

George
--
for email reply remove "/" from address

From: William D Clinger
Subject: Re: Garbage Collection Eligibility and portability
Date: Tue, 04 Dec 2007 15:53:23 +0000
Message-ID: <0c347fc7-7668-49c9-8cf1-9ca0b4c0605a@s12g2000prg.googlegroups.com>

Daniel Weinreb wrote:
> I think the systems you are describing as "aggressive" might
> be implemented by having the compiler perform lifetime
> analysis on variables, and then conveying that information
> to where it can be examined at runtime by the GC.  The GC
> would be able to tell that certain words on the stack
> should not "traced" because they are known to be "dead"
> (in the parlance of compiler writers).
>
> I don't know any specific terminology for that.

It's usually called "safe-for-space", but there are several
different properties that go by that name, differing in
their agressiveness [1,2,3].  Indeed, proper tail recursion
is one of the less aggressive safe-for-space properties [2].

Will

[1] Zhong Shao and Andrew W Appel.  Space-efficient closure
representations.  ACM Lisp Pointers VII(3), July-Sept 1994,
pages 150-161.  See also [3].
[2] William D Clinger.  Proper tail recursion and space efficiency.
ACM PLDI, 1988, pages 174-185.
[3] Zhong Shao and Andrew W Appel.  Efficient and safe-for-space
closure conversion.  ACM TOPLAS 22(1), January 2000, pages 129-161.

From: William D Clinger
Subject: Re: Garbage Collection Eligibility and portability
Date: Tue, 04 Dec 2007 16:00:13 +0000
Message-ID: <7aad4dca-b1a4-43ca-bb79-da736af5f806@s19g2000prg.googlegroups.com>

Oops.  Minor correction to the date of a reference, adding a URL:

[2] William D Clinger.  Proper tail recursion and space efficiency.
ACM PLDI, 1998, pages 174-185.
ftp://ftp.ccs.neu.edu/pub/people/will/tail.ps.gz

Will

From: szergling
Subject: Re: Garbage Collection Eligibility and portability
Date: Mon, 03 Dec 2007 04:33:22 +0000
Message-ID: <0555ec0e-e608-4c79-aa95-c5b5a811ea33@s36g2000prg.googlegroups.com>

On Dec 2, 12:29 pm, Matt L <······@forthebirds.com.com> wrote:

> There seems to be variance in garbage-collected languages, or
> in options used, that alters exactly when a (strong) variable
> reference to a memory location ceases holding it from
> collection.
>
> One type generally follows scope - that is, while code is
> executing in a scope where the variable is available the
> referenced memory is ineligible for collection.  (Importantly)
> This tends to include the caller's form for values passed from
> above.

[[ ... ]]

> Any information on this subject is appreciated.  Any good
> terminology I can search on?  Links to discussion of this
> topic?

This hasn't been mentioned, but might be somewhat related:

A DYNAMIC-EXTENT declaration, as far as I understand, hints to
the implementation that some objects may optionally be allocated
on the stack, bypassing gc altogether. Function return (or any
stack unwind) automatically 'reclaims memory' like local
variables in C.

I'm not sure it's useful for large objects though... and it is
only an optional optimisation.