From: William Bland
Subject: Forgetting stuff is better than dying
Date: 
Message-ID: <pan.2005.10.31.17.59.10.962364@gmail.com>
I have a huge hash-table that contains a bunch of information, some of which is
more important than the rest.

The importance of different pieces of information may change over time.

The memory taken up by each piece of information is not predictable (each entry
is an arbitrary-size tree).

The hash-table gets added to over time (not at a constant rate, but
sporadically).

Now the problem. Rather than dying when it runs out of memory, I'd prefer for a
program to look at this hash-table and discard the least-important pieces
of information.

I'm assuming there's no portable way to achieve this.  Have I missed
something?

If there is no portable way, where should I start looking for
implementation-specific stuff?  I'm supposing that there might be a
possibility to install a hook that gets called when a runtime has no more
memory and fails to allocate memory, but my Google skills are weak today
and I can't find anything.

I'm using SBCL but would be interested to hear what's available in other
implementations.

Thanks and best wishes,
	Bill.

From: JP Massar
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <mm2dm11d10ulam7s9j131anlpkc6p6b8pl@4ax.com>
On Mon, 31 Oct 2005 17:59:12 GMT, William Bland
<···············@gmail.com> wrote:

>I have a huge hash-table that contains a bunch of information, some of which is
>more important than the rest.
>
>The importance of different pieces of information may change over time.
>
>The memory taken up by each piece of information is not predictable (each entry
>is an arbitrary-size tree).
>
>The hash-table gets added to over time (not at a constant rate, but
>sporadically).
>
>Now the problem. Rather than dying when it runs out of memory, I'd prefer for a
>program to look at this hash-table and discard the least-important pieces
>of information.
>
>I'm assuming there's no portable way to achieve this.  Have I missed
>something?

Probably not.  But you can try to catch a STORAGE-CONDITION condition,
and if a given implementation doesn't die hideously you might be able
to salvage your hash table.

(You could also save some fairly large block of memory in some
variable and when you get the storage condition immediately release
it,  thereby perhaps giving your implementation a fighting chance
to recover from the out-of-memory problem.)

My guess is many implementations will crash and burn no matter
what you do but at least under some conditions (pun intended...)
I've observed that Allegro can recover.
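Both suggestions together can be sketched in portable CL; the reserve size is arbitrary, CALL-WITH-MEMORY-RESCUE is my own name, and whether a given implementation actually unwinds cleanly from STORAGE-CONDITION is exactly the open question above:

```lisp
;; Sketch of the "reserve block" idea.  Portable CL, but recovery from
;; STORAGE-CONDITION is implementation-dependent and may not work at all.
(defvar *reserve*
  (make-array (* 10 1024 1024) :element-type '(unsigned-byte 8))
  "A rainy-day buffer the handler releases to give the GC some headroom.")

(defun call-with-memory-rescue (thunk rescue)
  "Run THUNK; on STORAGE-CONDITION, drop the reserve, run RESCUE, retry once."
  (handler-case (funcall thunk)
    (storage-condition ()
      (setf *reserve* nil)   ; let the GC reclaim the reserve block
      (funcall rescue)       ; e.g. REMHASH the least-important entries
      (funcall thunk))))
```

RESCUE would be where you walk the hash-table and discard the least-important entries; note it has to avoid consing much itself, since memory is already tight when it runs.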


>
>If there is no portable way, where should I start looking for
>implementation-specific stuff?  I'm supposing that there might be a
>possibility to install a hook that gets called when a runtime has no more
>memory and fails to allocate memory, but my Google skills are weak today
>and I can't find anything.
>
>I'm using SBCL but would be interested to hear what's available in other
>implementations.
>
>Thanks and best wishes,
>	Bill.
From: Steven M. Haflich
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <RbC9f.8628$7h7.374@newssvr21.news.prodigy.com>
JP Massar wrote:

> Probably not.  But you can try to catch a STORAGE-CONDITION condition,
> and if a given implementation doesn't die hideously you might be able
> to salvage your hash table.

Handling storage-condition can be difficult in a multiprocessing
environment.  Handlers are part of the dynamic environment, and
each process has its own dynamic environment.  Your main-line
thread may do nearly all the consing, but it might happen that
storage-condition is signalled on the single cons done by some
event handler in some implementation thread of which your
application isn't even aware.  I imagine that multiprocessing
implementations would want some notion of a global handler, but
I don't know of any implementation that provides this.  It would
be useful for non-process-specific conditions, not only ones like
storage-condition but also ones like user-interrupt.

I imagine Java may have some of the same issues, at least for
storage-condition, but I don't know the details.  (And don't really
_want_ to know them.  I'm much too old to learn new things, and
you can't imagine how many years it took me to learn that.)

> (You could also save some fairly large block of memory in some
> variable and when you get the storage condition immediately release
> it,  thereby perhaps giving your implementation a fighting chance
> to recover from the out-of-memory problem.)

Some implementations may provide a way to deal with memory limits, but
I don't know of anything that reliably allows memory to be "released".
That's not the way Lisp memory is managed.  The fact that you have
maintained no pointer to an object doesn't prove that a reference isn't
still sitting in some dead word in some stack frame, for example.
From: Björn Lindberg
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <9mpwtjr8kvf.fsf@muvclx01.cadence.com>
William Bland <···············@gmail.com> writes:

> On Mon, 31 Oct 2005 13:26:34 -0800, JP Massar wrote:
> 
> > On Mon, 31 Oct 2005 17:59:12 GMT, William Bland
> > <···············@gmail.com> wrote:
> >>
> >>Now the problem. Rather than dying when it runs out of memory, I'd
> >>prefer for a program to look at this hash-table and discard the
> >>least-important pieces of information.
> >>
> >>I'm assuming there's no portable way to achieve this.  Have I missed
> >>something?
> > 
> > Probably not.  But you can try to catch a STORAGE-CONDITION condition,
> > and if a given implementation doesn't die hideously you might be able
> > to salvage your hash table.
> 
> On further investigation, both SBCL and CLisp will happily continue to
> allocate memory on my machine until the Linux kernel's OOM (Out Of Memory)
> killer terminates them.  So I guess the first step is to learn about the
> OOM killer and whether it can be persuaded not to be quite so aggressive.
> 
> Thanks for the ideas though.  I'll keep experimenting...

At least at some point you could turn off the overcommit "feature" of
the Linux kernel via the /proc file system.  Overcommit works on the
principle that most programs don't actually use all the memory they
allocate, so the kernel can hand out more memory than it has, and if
it actually runs out it starts killing processes.  (Or this is my
superficial understanding at least. :-)  Judging from the machine I'm
sitting in front of right now, you could have a look at
/proc/sys/vm/overcommit_memory and set it to 0 unless it already is.


Björn
From: David Golden
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <NmPaf.18214$R5.1796@news.indigo.ie>
Björn Lindberg wrote:

> Judging from the machine I'm sitting in
> front of right now, you could have a look at /proc/sys/vm/overcommit_memory
> and set it to 0 unless it already is.
> 

Hmm. In linux, unless it changed more recently than 2.6.11: (see
Documentation/vm/overcommit-accounting.txt in kernel sources)


0 is "heuristic overcommit", where linux will happily allow some
overcommit unless it, um, doesn't.  It is of course the usual default,
for maximal infuriation.  

1 is "always overcommit". In a way I prefer it to 0, because at least
it's predictably horrific.

2 is "don't overcommit".   This is the only sane setting. 

See /proc/meminfo for details of total allowed memory commit and current
memory commit. 

But IIRC, you might run into problems with 2 on systems with very small
amounts of real memory and swap with certain lisp implementations,
because of, um, less than satisfactory memory allocation handling. IIRC
CMUCL and SBCL used to just merrily allocate the entire virtual address
space or something like that. What happened to the SBCL or CMUCL
"allocate memory as necessary" patches, or am I just out of date and
now one or both have the feature?
From: Christophe Rhodes
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <sqmzkjh3v6.fsf@cam.ac.uk>
David Golden <············@oceanfree.net> writes:

> But IIRC, you might run into problems with 2 on systems with very small
> amounts of real memory and swap with certain lisp implementations,
> because of, um, less than satisfactory memory allocation handling. IIRC
> CMUCL and SBCL used to just merrily allocate the entire virtual address
> space or something like that.

Not quite the entire virtual address space, but yes, with
MAP_NORESERVE -- to inform the kernel that it need not worry about
overcommitting.

> What happened to the SBCL or CMUCL "allocate memory as necessary"
> patches, or am I just out of date and now one or both have the
> feature?

The patches don't work.  Feel free to fix them.

Christophe
From: David Golden
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <Ea8bf.18257$R5.1536@news.indigo.ie>
Christophe Rhodes wrote:
> Not quite the entire virtual address space, but yes, with
> MAP_NORESERVE -- to inform the kernel that it need not worry about
> overcommitting.

Hm. Seems Sun Java does that too... but may not be all roses?
http://lkml.org/lkml/2005/8/6/137

So, if overcommit is off, NORESERVE pages are reserved and count towards
the commit total, at least back in August.

When similar functionality to MAP_NORESERVE was introduced for shmget()
on hallowe'en, similar behaviour was deliberately carried across, too:
http://lkml.org/lkml/2005/10/31/289

> The patches don't work.  Feel free to fix them.

I'd probably need to learn an awful lot more than I know now about CMUCL
or SBCL internals to actually do anything about it, I'm mostly just
whining.  
From: Pascal Bourguignon
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <878xw9plct.fsf@thalassa.informatimago.com>
William Bland <···············@gmail.com> writes:

> I have a huge hash-table that contains a bunch of information, some of which is
> more important than the rest.
>
> The importance of different pieces of information may change over time.
>
> The memory taken up by each piece of information is not predictable (each entry
> is an arbitrary-size tree).
>
> The hash-table gets added to over time (not at a constant rate, but
> sporadically).
>
> Now the problem. Rather than dying when it runs out of memory, I'd prefer for a
> program to look at this hash-table and discard the least-important pieces
> of information.
>
> I'm assuming there's no portable way to achieve this.  Have I missed
> something?
>
> If there is no portable way, where should I start looking for
> implementation-specific stuff?  I'm supposing that there might be a
> possibility to install a hook that gets called when a runtime has no more
> memory and fails to allocate memory, but my Google skills are weak today
> and I can't find anything.
>
> I'm using SBCL but would be interested to hear what's available in other
> implementations.

Indeed, you'll have to implement your own data type, and/or use
extensions.  You could put weak pointers to good use.  In CLISP there
are weak hash-tables, and I'd guess SBCL is not far behind.  When you
put data into a recycling-hash-table of yours, you'd determine whether
an entry is least-important, and if so, hold onto it through a
weak-pointer.  Then when there's not enough memory it will be garbage
collected, and the next time you need it you can check whether it's
still there or has been collected, and act accordingly.
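A concrete sketch of the demotion idea, using SBCL's weak-pointer extension (SB-EXT:MAKE-WEAK-POINTER and friends; CLISP has analogous EXT: functions). *TABLE*, DEMOTE, and LOOKUP are names of my own invention:

```lisp
;; Entries start out strongly referenced; DEMOTE swaps the reference for
;; a weak pointer, after which the GC is free to reclaim the value.
(defvar *table* (make-hash-table :test 'equal))

(defun demote (key)
  "Replace the strong reference for KEY with a weak one."
  (multiple-value-bind (value present-p) (gethash key *table*)
    (when present-p
      (setf (gethash key *table*) (sb-ext:make-weak-pointer value)))))

(defun lookup (key)
  "Return the value for KEY, or NIL if it is absent or was collected."
  (let ((v (gethash key *table*)))
    (if (sb-ext:weak-pointer-p v)
        (sb-ext:weak-pointer-value v)   ; NIL once the object is collected
        v)))
```

The caveat is that once demoted, a value can vanish at any GC, not just under memory pressure, which is the point raised in the follow-ups.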


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d? s++:++ a+ C+++ UL++++ P--- L+++ E+++ W++ N+++ o-- K- w--- 
O- M++ V PS PE++ Y++ PGP t+ 5+ X++ R !tv b+++ DI++++ D++ 
G e+++ h+ r-- z? 
------END GEEK CODE BLOCK------
From: William Bland
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <pan.2005.10.31.19.13.41.604196@gmail.com>
On Mon, 31 Oct 2005 19:55:30 +0100, Pascal Bourguignon wrote:

> William Bland <···············@gmail.com> writes:
> 
>> I have a huge hash-table that contains a bunch of information, some of
>> which is more important than the rest.
[snip]
> 
> Indeed, you'll have to implement your own data type, and/or use
> extensions.  You could put weak pointers to good use.  In CLISP there
> are weak hash-tables, and I'd guess SBCL is not far behind.  When you
> put data into a recycling-hash-table of yours, you'd determine whether
> an entry is least-important, and if so, hold onto it through a
> weak-pointer.  Then when there's not enough memory it will be garbage
> collected, and the next time you need it you can check whether it's
> still there or has been collected, and act accordingly.

Thanks Pascal,

I had thought about weak-pointers, but I have to admit I don't know enough
about them.

My impression was that any object only accessible by weak-pointers would
be eligible for garbage-collection at *any* time.  If that's the case,
this doesn't seem to be quite what I'm after.

I only want to start forgetting stuff when it's absolutely necessary (i.e.
if the system would otherwise fall over and die).  Even the less-important
stuff is still important enough to keep around when that's possible.

Perhaps I've misunderstood the use of weak-pointers though?

Thanks again,
	Bill.
From: Pascal Bourguignon
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <87zmopo4nv.fsf@thalassa.informatimago.com>
William Bland <···············@gmail.com> writes:

> On Mon, 31 Oct 2005 19:55:30 +0100, Pascal Bourguignon wrote:
>
>> William Bland <···············@gmail.com> writes:
>> 
>>> I have a huge hash-table that contains a bunch of information, some of
>>> which is more important than the rest.
> [snip]
>> 
>> Indeed, you'll have to implement your own data type, and/or use
>> extensions.  You could put weak pointers to good use.  In CLISP there
>> are weak hash-tables, and I'd guess SBCL is not far behind.  When you
>> put data into a recycling-hash-table of yours, you'd determine whether
>> an entry is least-important, and if so, hold onto it through a
>> weak-pointer.  Then when there's not enough memory it will be garbage
>> collected, and the next time you need it you can check whether it's
>> still there or has been collected, and act accordingly.
>
> Thanks Pascal,
>
> I had thought about weak-pointers, but I have to admit I don't know enough
> about them.
>
> My impression was that any object only accessible by weak-pointers would
> be eligible for garbage-collection at *any* time.  If that's the case,
> this doesn't seem to be quite what I'm after.
>
> I only want to start forgetting stuff when it's absolutely necessary (i.e.
> if the system would otherwise fall over and die).  Even the less-important
> stuff is still important enough to keep around when that's possible.
>
> Perhaps I've misunderstood the use of weak-pointers though?

You should check what weak-pointer features are provided by your
implementation.  Perhaps they're collected if the next garbage
collection doesn't recover enough memory, or perhaps they're collected
only if no more memory can be grabbed from the OS.

In any case, you can modulate the behavior by choosing when you switch
from a normal reference to a weak-pointer reference (and back, if the
object starts being used again).

You will be implementing and using your own datatype that encapsulates
these decisions, the hash-table, and the weak-pointers.


If you want to decide yourself when least-important objects start to
be collected, you'd have to use lower-level hooks to the garbage
collector, if they're available.  

-- 
"A TRUE Klingon warrior does not comment his code!"
From: Bulent Murtezaoglu
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <87br15v5p2.fsf@p4.internal>
>>>>> "WB" == William Bland <···············@gmail.com> writes:
[...]
    WB> I only want to start forgetting stuff when it's absolutely
    WB> necessary (i.e.  if the system would otherwise fall over and
    WB> die).  Even the less-important stuff is still important enough
    WB> to keep around when that's possible.

Hmm, maybe a quick solution could be cooked up by using a GC hook provided 
by your implementation.  If you could figure out (perhaps by some impl. 
extension to room) whether the disaster is near, you could walk over your
hash table entries and turn the relatively unimportant ones into weak 
pointers (or remhash them).  SBCL seems to offer a post-gc hook:

http://www.sbcl.org/manual/Garbage-Collection.html#Garbage%20Collection

Though a pre-gc hook would be better, I think.

I think this scheme will get you partly there, though you'll need to
tune both the GC triggering and the available memory threshold for it
to work reliably.  You'll also need to be careful about consing in 
your reclamation function.  Lots of details, but it seems it could be
made to work w/o going deep into the internals of your lisp.
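On SBCL that might look like the following, using SB-EXT:*AFTER-GC-HOOKS*; the 90% threshold is arbitrary, EVICT-LEAST-IMPORTANT is a stub standing in for your reclamation routine, and the space-accounting names have moved around between SBCL versions:

```lisp
;; Sketch of the post-GC-hook scheme on SBCL.
(defun evict-least-important ()
  ;; Placeholder: walk the hash-table and REMHASH the least-important
  ;; entries, consing as little as possible.
  nil)

(defun maybe-evict ()
  (when (> (sb-kernel:dynamic-usage)            ; bytes currently in use
           (* 0.9 (sb-ext:dynamic-space-size))) ; of the whole heap
    (evict-least-important)))

(push 'maybe-evict sb-ext:*after-gc-hooks*)
```

A pre-GC hook would indeed be better: with only a post-GC hook there is a window in which one huge allocation can still fail before the hook gets a chance to run.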

cheers,

BM
From: justinhj
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <1130858951.602626.211460@g47g2000cwa.googlegroups.com>
Can you sacrifice access time? If you can change the data structures
you're using, you can manage the problem without getting under the hood.

Perhaps store the items in an array organised as a heap, giving rapid
removal of the least-priority item when you want to add an item to a
full array.

Then build a binary tree that indexes the array, keyed on whatever you
currently hash, so lookups stay fast.
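A minimal sketch of the heap half of that arrangement in portable CL (the `(priority . item)` pair layout and the names are mine; the indexing tree is left out). Lower priority means less important, so HEAP-POP yields the first entry to discard:

```lisp
;; Array-backed min-heap of (priority . item) pairs.
(defun make-heap ()
  (make-array 8 :adjustable t :fill-pointer 0))

(defun heap-push (heap priority item)
  "Add ITEM with PRIORITY, sifting it up to keep the heap ordered."
  (vector-push-extend (cons priority item) heap)
  (loop for i = (1- (fill-pointer heap)) then p
        for p = (floor (1- i) 2)
        while (and (plusp i)
                   (< (car (aref heap i)) (car (aref heap p))))
        do (rotatef (aref heap i) (aref heap p))))

(defun heap-pop (heap)
  "Remove and return the (priority . item) pair with the lowest priority."
  (let ((min (aref heap 0))
        (moved (vector-pop heap)))
    (when (plusp (fill-pointer heap))
      (setf (aref heap 0) moved)
      ;; sift the moved element down to restore the heap property
      (loop with n = (fill-pointer heap)
            for i = 0 then s
            for l = (1+ (* 2 i))
            for r = (1+ l)
            for s = i
            do (when (and (< l n) (< (car (aref heap l)) (car (aref heap s))))
                 (setf s l))
               (when (and (< r n) (< (car (aref heap r)) (car (aref heap s))))
                 (setf s r))
            until (= s i)
            do (rotatef (aref heap i) (aref heap s))))
    min))
```

Both operations are O(log n), versus the O(1) hash lookup being traded away.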

Justin
From: Christian Lynbech
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <87br15z780.fsf@chateau.defun.dk>
I know that ACL signals an out-of-memory exception in certain
situations.  It works something like this: if a GC fails to recover a
certain amount of space, a request is made to the OS for more memory,
and if that fails an exception is raised.  Fixing the situation is then
a simple matter of adding an error handler around the code in question.

I would not be surprised if SBCL did something similar.


------------------------+-----------------------------------------------------
Christian Lynbech       | christian ··@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Alain Picard
Subject: Re: Forgetting stuff is better than dying
Date: 
Message-ID: <87k6fssvot.fsf@memetrics.com>
William Bland <···············@gmail.com> writes:

> I have a huge hash-table that contains a bunch of information, some of which is
> more important than the rest.

I use an "expiring" hash table for this purpose; you specify
a high water mark level, and it starts evicting objects from
the cache based on time of last access.  Requires a multithreaded
lisp.  It's a bit heavyweight... but maybe this is the sort of
thing you were thinking about?
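A single-threaded sketch of that scheme (all names mine; the high-water mark is arbitrary, and a logical clock stands in for real access times — a threaded version would time-stamp and lock):

```lisp
;; Expiring table: entries carry a last-access stamp, and CACHE-PUT
;; evicts the least recently used entry once the high-water mark is hit.
(defvar *high-water* 1000)
(defvar *clock* 0)
(defvar *cache* (make-hash-table :test 'equal))

(defun cache-get (key)
  "Return the value for KEY (or NIL), refreshing its last-access time."
  (let ((entry (gethash key *cache*)))
    (when entry
      (setf (car entry) (incf *clock*))
      (cdr entry))))

(defun cache-put (key value)
  "Store VALUE under KEY, evicting the stalest entry if the cache is full."
  (when (>= (hash-table-count *cache*) *high-water*)
    (let ((oldest-key nil) (oldest-time most-positive-fixnum))
      (maphash (lambda (k v)
                 (when (< (car v) oldest-time)
                   (setf oldest-time (car v) oldest-key k)))
               *cache*)
      (remhash oldest-key *cache*)))
  (setf (gethash key *cache*) (cons (incf *clock*) value))
  value)
```

The linear MAPHASH scan at eviction time is the lazy part; a production version would keep the entries in a priority structure instead.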

Otherwise you're down to seeing if your implementation is willing
to throw some sort of "memory exhausted" error, and wrap everything
with some handler to catch it, clear your cache, gc, and try again
sort of thing.

                --ap