From: Coby Beck
Subject: Memory Exception
Date: 
Message-ID: <ep6T7.94198$oj3.17515067@typhoon.tampabay.rr.com>
Hello Group,

This may seem an odd sort of question, but I need some third-party corroboration
for something I believe to be true--or perhaps I'll learn something important.

I am using LWW to write a server application that exposes a simple socket
command interface to provide some complicated services to Java client code.
When a client connects, I spawn a master process that spawns a slave process to
process data and another listener process to establish another socket
connection on a dynamically allocated port for receiving and sending that data.
What the server does involves generating, compiling and executing code.  I have
done some checking and I don't seem to have any processes left hanging or
streams left unclosed (not a big challenge thanks to UNWIND-PROTECT :)
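
A minimal sketch of the UNWIND-PROTECT pattern mentioned above, assuming the resource is an ordinary stream. The helper name CALL-WITH-OPEN-CONNECTION is hypothetical and is not taken from the poster's server code.

```lisp
(defun call-with-open-connection (stream function)
  "Call FUNCTION on STREAM, then close STREAM, even on a non-local exit."
  (unwind-protect
       (funcall function stream)
    ;; Cleanup form: runs on normal return, on error, and on THROW.
    (close stream)))
```

The cleanup form runs however control leaves the protected form, so a handler process that dies with an error still releases its stream.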

That's just general background because I don't know what could be significant
to answering my question.  I am getting an application crash due to a memory
exception error.  A bug report is in to Xanalys, and I have every confidence
they will respond quickly and with a solution, as they always have.  Generally,
after running on my own machine (which is remote, so network stuff is much
slower) for about an hour, with the number of processes hovering between 20
and 50 and memory usage fluctuating but slowly climbing from 20MB to 175MB,
the app freezes for ~5 minutes and then dies a horrible C-like death.

So what's my question?  Well, my boss is naturally worried (he is also
surrounded by suspicious Java developers) and thinks I should be doing
something more than waiting.  I believe this is out of my own code's domain.  So:

Is there anything that I am doing that could *possibly* cause, or even just
make more likely, such a catastrophic failure?  Are there any pearls of
socket-server wisdom I may not be privy to?  My belief is this kind of error
is _entirely_ in the implementation's domain, but I would like to hear from
more experienced lispers.  Perhaps there is something I can check or monitor so
I can take action or know what to avoid doing.  Any and all advice is welcome.

Again I am taking the appropriate steps with my vendor, but is that all I can
do?  TIA

--
Coby
(remove #\space "coby . beck @ opentechgroup . com")

From: Wade Humeniuk
Subject: Re: Memory Exception
Date: 
Message-ID: <9vj4b9$n0f$1@news3.cadvision.com>
> running on my own machine (which is remote so network stuff is much slower)
> for about an hour with the number of processes hovering between 20 and 50
> and with memory usage fluctuating but slowly climbing from 20MB to 175MB the
> app freezes for ~5 minutes and then dies a horrible C-like death.

How much memory does your machine have (sounds like you might have about
256MB)?  I had a similar problem developing an app that had a large amount of
data (compared to physical memory).  The machine would just thrash away when
doing a full gc (and sometimes crash).  Maybe increasing your memory will
solve the problem for the short term (or forever).  Anyways it would still
be insightful.

The crash might be Windows actually crashing because of problems in the
virtual memory system.  Which Windows are you using?

Wade
From: Coby Beck
Subject: Re: Memory Exception
Date: 
Message-ID: <qZ8T7.96108$oj3.17759131@typhoon.tampabay.rr.com>
"Wade Humeniuk" <········@cadvision.com> wrote in message
·················@news3.cadvision.com...
> > running on my own machine (which is remote so network stuff is much slower)
> > for about an hour with the number of processes hovering between 20 and 50
> > and with memory usage fluctuating but slowly climbing from 20MB to 175MB
> > the app freezes for ~5 minutes and then dies a horrible C-like death.
>
> How much memory does your machine have (sounds like you might have about
> 256)?  I had a similar problem developing an app that had a large amount of
> data (compared to physical memory).  The machine would just thrash away when
> doing a full gc (and sometimes crash).  Maybe increasing your memory will
> solve the problem for the short term (or forever).  Anyways it would still
> be insightful.
>
> The crash might be Windows actually crashing because of problems in the
> virtual memory system.  Which Windows are you using?
>

Windows 2000Pro with 256MB.  I will suggest more memory, thanks.

--
Coby
(remove #\space "coby . beck @ opentechgroup . com")
From: Wade Humeniuk
Subject: Re: Memory Exception
Date: 
Message-ID: <9vjb9i$pcp$1@news3.cadvision.com>
>
> Windows 2000Pro with 256MB.  I will suggest more memory, thanks.


I am no LW GC expert, but looking at the manual for set-gc-parameters:

http://www.xanalys.com/software_tools/reference/lww41/lwref/LWRM-135.html#HEADING135-0

If it is a generation 2 collection problem, maybe you can change some GC
parameters like minimum-for-promote so that objects take longer to be
promoted and are thus collected before they reach generation 2.

Even if you add more memory, you could also lower minimum-for-sweep to make
the GC run more often (before objects get to gen 2).
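
A hedged sketch of what such a tuning call might look like. It assumes set-gc-parameters lives in the HCL package and takes keyword arguments named after the parameters; both numbers are placeholders rather than recommendations, and the manual page linked above is the authority for units and defaults.

```lisp
;; LispWorks only; other implementations skip this form at read time.
;; The values are illustrative placeholders: check the manual for each
;; parameter's units and default before changing anything.
#+lispworks
(hcl:set-gc-parameters :minimum-for-promote 2000
                       :minimum-for-sweep 4000000)
```

Changing one parameter at a time and comparing (room) output under the same load makes it easier to tell which setting had an effect.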

From the manual entry for mark-and-sweep:

http://www.xanalys.com/software_tools/reference/lww41/lwref/LWRM-411.html#MARKER-9-1101

it gives the impression that application-generated gen 2 garbage arises
when longer-lived objects become garbage.  You mentioned that you
generate, compile and execute code on the fly.  When is that code removed
from the system?  Is this the stuff that is living into gen 2?  If it is
never removed, your image will just keep increasing in size.

Also from:

http://www.xanalys.com/software_tools/reference/lww41/lwuser/LWUG_65.HTM#HEADING65-0

interned symbols go into gen 2 automatically.  Are you creating these?
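
A small portable sketch of the difference, assuming generated code needs fresh names. The package GENERATED-CODE-SCRATCH and the three helper functions are hypothetical. An interned symbol stays reachable through its package until it is uninterned; a GENSYM has no home package and becomes garbage once nothing refers to it.

```lisp
(defpackage :generated-code-scratch (:use))

(defun intern-generated-name (n)
  "Intern a symbol for generated function number N; the package now holds it."
  (intern (format nil "TEMP-FN-~D" n) :generated-code-scratch))

(defun release-generated-name (symbol)
  "Drop SYMBOL's function definition and remove it from the scratch package."
  (when (fboundp symbol)
    (fmakunbound symbol))
  (unintern symbol :generated-code-scratch))

(defun count-scratch-symbols ()
  "Number of symbols currently held by the scratch package."
  (let ((count 0))
    (do-symbols (s :generated-code-scratch count)
      (declare (ignorable s))
      (incf count))))
```

If COUNT-SCRATCH-SYMBOLS (or the equivalent for whatever package the server interns into) only ever grows, generated names are accumulating.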

Wade
From: Christopher Stacy
Subject: Re: Memory Exception
Date: 
Message-ID: <uitb7dj3s.fsf@spacy.Boston.MA.US>
Things that you might program which have "undefined consequences"
could include trashing your memory.  (But maybe there's just a bug
with the GC or FFI or sockets interface in the Lisp implementation.)
From: Coby Beck
Subject: Re: Memory Exception
Date: 
Message-ID: <Ra9T7.96204$oj3.17777233@typhoon.tampabay.rr.com>
"Christopher Stacy" <······@spacy.Boston.MA.US> wrote in message
··················@spacy.Boston.MA.US...
> Things that you might program which have "undefined consequences"
> could include trashing your memory.  (But maybe there's just a bug
> with the GC or FFI or sockets interface in the Lisp implementation.)
>

I guess that was an implicit part of the question: are there ways to trash your
memory in Lisp without making special efforts?  I do not intentionally use any
memory-munging stuff.

--
Coby
(remove #\space "coby . beck @ opentechgroup . com")
From: Christopher Stacy
Subject: Re: Memory Exception
Date: 
Message-ID: <uelluent3.fsf@spacy.Boston.MA.US>
>>>>> On Sun, 16 Dec 2001 22:24:49 GMT, Coby Beck ("Coby") writes:

 Coby> "Christopher Stacy" <······@spacy.Boston.MA.US> wrote in message
 Coby> ··················@spacy.Boston.MA.US...
 >> Things that you might program which have "undefined consequences"
 >> could include trashing your memory.  (But maybe there's just a bug
 >> with the GC or FFI or sockets interface in the Lisp implementation.)

 Coby> I guess that was an implicit part of the question, are there
 Coby> ways to trash your memory in lisp without making special
 Coby> efforts?  I do not use any intentionally memory munging stuff.

If you look up the phrase "The consequences are undefined", under
"Error Terminology (1.4.2)", you will see that anywhere in the spec 
it says that, it means exactly that.

The example they give there is assigning or binding a DEFCONSTANT name.
If you do that, a conforming Lisp implementation could do things such as:
ignore the problem and continue like nothing happened, bind some other
variable than you specified or otherwise randomize your bindings,
entirely crash with a memory fault, delete all your files, or look up
your credit card info in Microsoft Wallet, launch a web browser,
connect to Amazon, and order you some books on Java.

However, I think crashing with a memory fault would be the 
most likely thing.   That's just one example, though.

A Lisp implementation that I used long ago on Unix would crash with
a segmentation fault if it executed the following code:
        (FORMAT "Did I forget something~P?" things)
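
The call above omits FORMAT's destination argument, so the control string is taken as the destination and THINGS as the control string. A corrected sketch follows; the function name and the reworded message are illustrative, not the original author's code.

```lisp
(defun forgot-something-message (things)
  "Return a message for THINGS forgotten items, with the destination given."
  ;; NIL as destination makes FORMAT return a string.
  ;; ~D prints the count; ~:P backs up and pluralizes on that same argument.
  (format nil "Did I forget ~D thing~:P?" things))
```

Passing T instead of NIL as the destination would print to standard output rather than return a string.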
From: Marc Battyani
Subject: Re: Memory Exception
Date: 
Message-ID: <F0D3676EBA3B6ED2.0C51D50C1FDD6186.0FE0B373E0EA8B35@lp.airnews.net>
"Coby Beck" <·····@mercury.bc.ca> wrote

> That's just general background because I don't know what could be
> significant to answering my question.  I am getting an application crash
> due to a memory exception error.  A bug report is in to Xanalys, I have
> every confidence they will respond quickly and with a solution as they
> always have.  Generally, after running on my own machine (which is remote
> so network stuff is much slower) for about an hour with the number of
> processes hovering between 20 and 50 and with memory usage fluctuating but
> slowly climbing from 20MB to 175MB the app freezes for ~5 minutes and then
> dies a horrible C-like death.

I had the same problem of memory increasing. It was also in a web service
(Apache+mod_lisp) called by Java & IIS applications. We made tests with 300
simultaneous clients (i.e. 300 Lisp threads).
It is because generation 2 is not GCed by default. You have to
explicitly call mark-and-sweep on gen 2 from time to time.
Xanalys seems to be aware of that and IIRC in LW 4.2 the system code will be
in generation 3 so that gen 2 will be GCed.
With the current version, it takes a long time to do a full GC.
It didn't crash, though.

Marc
From: Coby Beck
Subject: Re: Memory Exception
Date: 
Message-ID: <Ow9T7.96318$oj3.17807909@typhoon.tampabay.rr.com>
"Marc Battyani" <·············@fractalconcept.com> wrote in message
·······················································@lp.airnews.net...
> "Coby Beck" <·····@mercury.bc.ca> wrote
>
> > That's just general background because I don't know what could be
> > significant to answering my question.  I am getting an application
> > crash due to a memory exception error.  A bug report is in to Xanalys,
> > I have every confidence they will respond quickly and with a solution
> > as they always have.  Generally, after running on my own machine
> > (which is remote so network stuff is much slower) for about an hour
> > with the number of processes hovering between 20 and 50 and with
> > memory usage fluctuating but slowly climbing from 20MB to 175MB the
> > app freezes for ~5 minutes and then dies a horrible C-like death.
>
> I had the same problem of memory increasing. It was also in a web service
> (Apache+mod_lisp) called by Java & IIS applications. We made tests with 300
> simultaneous clients (i.e. 300 Lisp threads).
> It is because generation 2 is not GCed by default. You have to
> explicitly call mark-and-sweep gen 2 from time to time.
> Xanalys seems to be aware of that and IIRC in LW 4.2 the system code will be
> in generation 3 so that gen 2 will be GCed.
> With the current version, it takes a long time to do a full GC.
> It didn't crash, though.
>

Thanks, I'll investigate that idea.  For the least impact on server
responsiveness, how often do you recommend making a call like that?

--
Coby
(remove #\space "coby . beck @ opentechgroup . com")
From: Espen Vestre
Subject: Re: Memory Exception
Date: 
Message-ID: <w6lmg2z0yg.fsf@wallace.ws.nextra.no>
"Coby Beck" <·····@mercury.bc.ca> writes:

> Thanks, I'll investigate that idea.  For the least impact on server
> responsiveness, how often do you recommend making a call like that?

In applications that use moderate amounts of memory (but much more
than your 20MB), we run mark-and-sweep on gen. 2 every hour. It's
definitely much more than necessary, but for moderately-sized
applications, the mark-and-sweep operation doesn't take much time
(less than a second if less than 50MB is allocated, I think). For a
more memory-hungry application, I have developed a more "intelligent"
scheme, where memory compaction is done (in small steps, since
compaction runs may take 20 seconds in that huge application) every
time generation 2 is fragmented, and where mark-and-sweep is run only
if the amount of free memory in gen. 2 has decreased by more than
20MB since the last run.

I think you should try the hourly scheme first (run it off your
internal "cron-in-lisp", which you may have implemented by now if
you've worked on lisp server applications for a while ;-)). You should
log ROOM output before and after the GC to a file, and you'll quickly
see if you're running it too frequently.
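
The hourly scheme with ROOM logging could be sketched roughly as follows. All function names and the log file name are hypothetical; the only LispWorks-specific calls are hcl:mark-and-sweep (named elsewhere in this thread) and mp:process-run-function, both guarded so the file still loads elsewhere.

```lisp
(defun room-report ()
  "Capture what ROOM prints to standard output as a string."
  (with-output-to-string (*standard-output*)
    (room)))

(defun default-gen2-gc ()
  "Mark and sweep generation 2 on LispWorks; do nothing elsewhere."
  #+lispworks (hcl:mark-and-sweep 2)
  #-lispworks nil)

(defun gc-with-room-log (log-path &key (gc-function #'default-gen2-gc))
  "Append ROOM output to LOG-PATH before and after calling GC-FUNCTION."
  (with-open-file (log log-path :direction :output
                                :if-exists :append
                                :if-does-not-exist :create)
    (format log "~&;; before GC, universal time ~D~%~A"
            (get-universal-time) (room-report))
    (funcall gc-function)
    (format log "~&;; after GC~%~A~%" (room-report))))

(defun run-periodic-gc (log-path &key (interval 3600))
  "Loop forever: sleep INTERVAL seconds, then run a logged GC."
  (loop
    (sleep interval)
    (gc-with-room-log log-path)))

#+lispworks
(defun start-periodic-gc (&optional (log-path "gc-room.log"))
  "Run the periodic GC loop in its own LispWorks process."
  (mp:process-run-function "periodic gen-2 GC" '()
                           #'run-periodic-gc log-path))
```

Comparing the "before" and "after" entries over a few hours shows how much each run reclaims, which is the signal for lengthening or shortening the interval.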

The people at Xanalys support were really helpful with my questions on
my rather tricky GC issues :-)
-- 
  (espen)
From: Marc Battyani
Subject: Re: Memory Exception
Date: 
Message-ID: <9854B8D87626C0E1.B1F30222C61EB083.5A7EFD9A6BA6F03B@lp.airnews.net>
"Coby Beck" <·····@mercury.bc.ca> wrote
>
> "Marc Battyani" <·············@fractalconcept.com> wrote
> > "Coby Beck" <·····@mercury.bc.ca> wrote
> >
> > > That's just general background because I don't know what could be
> > > significant to answering my question.  I am getting an application
> > > crash due to a memory exception error.  A bug report is in to Xanalys,
> > > I have every confidence they will respond quickly and with a solution
> > > as they always have.  Generally, after running on my own machine
> > > (which is remote so network stuff is much slower) for about an hour
> > > with the number of processes hovering between 20 and 50 and with
> > > memory usage fluctuating but slowly climbing from 20MB to 175MB the
> > > app freezes for ~5 minutes and then dies a horrible C-like death.
> >
> > I had the same problem of memory increasing. It was also in a web
> > service (Apache+mod_lisp) called by Java & IIS applications. We made
> > tests with 300 simultaneous clients (i.e. 300 Lisp threads).
> > It is because generation 2 is not GCed by default. You have to
> > explicitly call mark-and-sweep gen 2 from time to time.
> > Xanalys seems to be aware of that and IIRC in LW 4.2 the system code
> > will be in generation 3 so that gen 2 will be GCed.
> > With the current version, it takes a long time to do a full GC.
> > It didn't crash, though.
> >
>
> Thanks, I'll investigate that idea.  For the least impact on server
> responsiveness, how often do you recommend making a call like that?

First check if this is the problem: execute (room), look at the generation 2
size, perform a (hcl:mark-and-sweep 2), and look again at the generation 2
size.
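
As a listener-level sketch of that check (the function name is hypothetical; on implementations other than LispWorks the collection step is skipped at read time, so only the two ROOM reports are printed):

```lisp
(defun check-generation-2 ()
  "Print ROOM, collect generation 2 on LispWorks, then print ROOM again."
  (room)
  #+lispworks (hcl:mark-and-sweep 2)
  (room)
  (values))
```

If the generation 2 figure drops sharply between the two reports, promoted garbage is the likely cause of the growth.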

Then you can try to tell the GC to do less promotion by playing with the
various GC parameters. The trouble is that it is very application and load
dependent. If you have only a few threads, the normal GC settings will be OK,
but if you have lots of threads you get too much promotion. It's a problem
with generational GCs. Have you looked at the other c.l.l thread on this
subject (GC in servers)?

Marc
From: Alain Picard
Subject: Re: Memory Exception
Date: 
Message-ID: <867krm9pyq.fsf@gondolin.local.net>
Hi Coby!  Long time no talk.

Are you doing any FFI?  That's the first point of paranoia.

And, I'm _suuuure_ you've checked this, but you don't have some
global hash table lying around collecting gensym'ed things in it,
do you?  naaaahhh... :-P

Lastly, is that error of a type for which you can set *break-on-signals*?
Then you could at least see what it was doing before dying.
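
A portable sketch of both approaches, assuming the failure is signaled as a Lisp condition at all (an OS-level memory fault may never reach the condition system). CALL-WITH-CONDITION-LOG and RUN-SERVER are hypothetical names.

```lisp
(defun call-with-condition-log (thunk &optional (log *error-output*))
  "Call THUNK, writing each signaled ERROR to LOG before any handler unwinds."
  (handler-bind ((error (lambda (condition)
                          ;; Returning normally declines, so outer
                          ;; handlers and the debugger still see it.
                          (format log "~&signaled ~S: ~A~%"
                                  (type-of condition) condition))))
    (funcall thunk)))

;; Interactive alternative: enter the debugger at the point of signal,
;; before any handler runs. RUN-SERVER is a hypothetical entry point.
;; (let ((*break-on-signals* 'error))
;;   (run-server))
```

The logging variant suits an unattended server, while *break-on-signals* is more useful in a live listener where the stack can be inspected.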

my best regards,  --ap


-- 
It would be difficult to construe        Larry Wall, in  article
this as a feature.			 <·····················@netlabs.com>