socket problem in CMUCL

From: quasi
Subject: socket problem in CMUCL
Date: Mon, 16 Oct 2006 15:29:38 +0000
Message-ID: <1161012578.670713.97520@i42g2000cwa.googlegroups.com>

Hey folks,

  The CMU-help list seems to be pretty much dead, which is why I am writing
here.  Our website runs on CMUCL + TBNL + mod_lisp2 + apache2.

It has been extremely stable and the simple dynamic page generation
comes close to 95% of apache SSI performance over the network with a load
of over 100 concurrent users.

That said, there has been a major flaw in the system which is now
becoming serious.

The application creates multiple processes with mp:make-process and each
of the processes then opens a connection with a java server (using
trivial-socket:open-stream) to get some data.  This call is wrapped in
with-open-stream so it should close the stream when done.  It's a
simple operation and works well.  But if there are open connections
with the java server and the server dies (or is killed because it hung)
then the lisp goes into continuous GC taking 99% CPU.  This also occurs
when we take the network interface down (as an experiment).  I have used
the forcibly-close-socket code from araneida, and it still does not
help. This is a serious issue because the instability of the java
server causes the lisp to go down.

We have written to the CMU lists but to no avail - are they dead?

Any help/suggestions on this would be really really appreciated (my CTO
is on my ass!).

:-)

thanks,
quasi

From: Raymond Toy
Subject: Re: socket problem in CMUCL
Date: Mon, 16 Oct 2006 17:19:02 +0000
Message-ID: <sxdmz7wuqk9.fsf@rtp.ericsson.se>

>>>>> "quasi" == quasi  <·········@gmail.com> writes:

    quasi> The application creates multiple processes with mp:make-process and each
    quasi> of the processes then opens a connection with a java server (using
    quasi> trivial-socket:open-stream) to get some data.  This call is wrapped in
    quasi> with-open-stream so it should close the stream when done.  It's a
    quasi> simple operation and works well.  But if there are open connections
    quasi> with the java server and the server dies (or is killed because it hung)
    quasi> then the lisp goes into continuous GC taking 99% CPU.  This also occurs
    quasi> when we take the network interface down (as an experiment).  I have used
    quasi> the forcibly-close-socket code from araneida, and it still does not
    quasi> help. This is a serious issue because the instability of the java
    quasi> server causes the lisp to go down.

    quasi> We have written to the CMU lists but to no avail - are they dead?

The lists are not dead.  If no one answers it's usually because no one
knows the answer.

I don't answer because I've never used cmucl's mp stuff, and I rarely
ever use cmucl's socket stuff, so I'm basically clueless about your
questions.

I'm sorry I can't give a better answer.

Ray

From: Carl Shapiro
Subject: Re: socket problem in CMUCL
Date: Mon, 16 Oct 2006 21:48:58 +0000
Message-ID: <ouyslho9bjp.fsf@panix3.panix.com>

"quasi" <·········@gmail.com> writes:

> We have written to the CMU lists but to no avail - are they dead?
>
> any help/suggestions on this would be really really appreciated (my CTO
> is on my ass!).

Suggestion: create a bug-in-the-box and send it to the list.  You can
increase the odds of getting a helpful response if other people can
recreate enough of your environment to reproduce your problem locally.

From: Bernd Schmitt
Subject: Re: socket problem in CMUCL
Date: Mon, 16 Oct 2006 19:36:39 +0000
Message-ID: <4533ce6b$0$9648$9b622d9e@news.freenet.de>

Hello,

On 16.10.2006 17:29, quasi wrote:
> The application creates multiple processes with mp:make-process and each
> of the processes then opens a connection with a java server (using
> trivial-socket:open-stream) to get some data.  This call is wrapped in
> with-open-stream so it should close the stream when done.  It's a
> simple operation and works well.  But if there are open connections
> with the java server and the server dies (or is killed because it hung)
> then the lisp goes into continuous GC taking 99% CPU.  This also occurs
> when we take the network interface down (as an experiment).
I am a novice, but what about a local man-in-the-middle-style
workaround, so that connections are always shut down correctly?
That would buy you time to improve the Lisp code ;)


Ciao,
Bernd

From: Thibault Langlois
Subject: Re: socket problem in CMUCL
Date: Tue, 17 Oct 2006 09:02:12 +0000
Message-ID: <1161075731.978641.203610@m73g2000cwd.googlegroups.com>

On Oct 16, 4:29 pm, "quasi" <·········@gmail.com> wrote:
> Hey folks,
>
>   The CMU-help list seems to be pretty much dead, which is why I am writing
> here.  Our website runs on CMUCL + TBNL + mod_lisp2 + apache2.
>
> It has been extremely stable and the simple dynamic page generation
> comes close to 95% of apache SSI performance over the network with a load
> of over 100 concurrent users.
>
> That said, there has been a major flaw in the system which is now
> becoming serious.
>
> The application creates multiple processes with mp:make-process and each
> of the processes then opens a connection with a java server (using
> trivial-socket:open-stream) to get some data.  This call is wrapped in
> with-open-stream so it should close the stream when done.  It's a
> simple operation and works well.  But if there are open connections
> with the java server and the server dies (or is killed because it hung)
> then the lisp goes into continuous GC taking 99% CPU.

I am not sure I understand your problem, but would it be sufficient for
the lisp process to know that the server is down? In that case, you
could check whether the server's PID still exists.

>   This also occurs
> when we take the network interface down (as an experiment).  I have used
> the forcibly-close-socket code from araneida, and it still does not
> help. This is a serious issue because the instability of the java
> server causes the lisp to go down.
>
> We have written to the CMU lists but to no avail - are they dead?
>
> any help/suggestions on this would be really really appreciated (my CTO
> is on my ass!).
> 
> :-)
> 
> thanks,
> quasi

From: quasi
Subject: Re: socket problem in CMUCL
Date: Thu, 19 Oct 2006 06:51:27 +0000
Message-ID: <1161240687.192131.194530@b28g2000cwb.googlegroups.com>

Madhu wrote:
> Helu
>
> There was an identical problem reported in the same cmucl-general list
> in a long thread very recently, again with trivial-http wrapping calls
> to CMUCL functions. I'll cite two posts from that thread which
> identify and offer solutions to the problem
>
> http://permalink.gmane.org/gmane.lisp.cmucl.general/6049
> http://permalink.gmane.org/gmane.lisp.cmucl.general/6050
>
> If the symptoms seem familiar, the fix should work for you too.

That was us.  But it seems not to be that problem.  The solution did
not fix the problem.
What's weird is that it keeps happening in weird ways. I don't have
enough knowledge of lisp internals to actually have any idea.  I will
just cite the observed cases.

In all the cases we are doing network i/o through multiple parallel
processes.

1)  When the socket count went up to a large number and some error was
thrown.
2)  When the network went down and some error was thrown.
3)  When the network was ok and some error got thrown while parsing the
data received from the network (e.g. parse-integer on a nil value).

The thing is, 3) occurs only rarely.

In all the cases GC starts continuously taking 99% CPU and I can
usually get out of it by being patient and aborting all the restarts
(connecting to lisp directly with attachtty).
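
One way to keep a failure like case 3) from leaving a worker waiting in
the debugger is to catch the error inside the process itself.  The
following is only a sketch under assumptions: SAFE-PARSE-INTEGER,
RUN-GUARDED and START-WORKER are hypothetical names, not code from the
application described here, and the last form assumes a CMUCL build with
multiprocessing (the :mp feature) and MP:MAKE-PROCESS as mentioned in the
original post.

```lisp
;; Hypothetical helper: tolerate NIL or non-numeric input instead of signalling.
(defun safe-parse-integer (text &optional default)
  "Return the integer at the start of TEXT, or DEFAULT if there is none."
  (or (and (stringp text)
           (parse-integer text :junk-allowed t))
      default))

;; Hypothetical wrapper: turn any ERROR into return values so the
;; process finishes normally instead of waiting in the debugger.
(defun run-guarded (thunk)
  "Call THUNK. On an error, return NIL and the condition object."
  (handler-case (funcall thunk)
    (error (c)
      (values nil c))))

;; Assumption: CMUCL with multiprocessing support (the :mp feature).
;; On other implementations the reader skips this form.
#+(and cmu mp)
(defun start-worker (thunk)
  (mp:make-process (lambda () (run-guarded thunk))
                   :name "guarded worker"))
```

Note that :JUNK-ALLOWED also accepts input such as "12abc" and returns 12,
so a stricter check may be wanted if trailing garbage should count as an
error.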

Another twist: yesterday I tried with Allegro 8.0.  It straight out
said that "Scavenger invoked itself" and asked if it should dump core
for debugging.

Do you think it has something to do with the error handling?  Or am I
doing something horribly wrong?

thanks,

> --
> Madhu

From: quasi
Subject: Re: socket problem in CMUCL
Date: Thu, 19 Oct 2006 08:47:19 +0000
Message-ID: <1161247639.217618.241460@b28g2000cwb.googlegroups.com>

quasi wrote:
> In all the cases we are doing network i/o through multiple parallel
> processes.
>
> 1)  When the socket count went up to a large number and some error was
> thrown.
> 2)  When the network went down and some error was thrown.
> 3)  When the network was ok and some error got thrown while parsing the
> data received from the network (e.g. parse-integer on a nil value).
>
> The thing is, 3) occurs only rarely.
>

Let me just add: in case 3) the data has already been received from the
network and the network connection has been closed.  It is parsed only
after that.

From: Madhu
Subject: Re: socket problem in CMUCL
Date: Thu, 19 Oct 2006 21:24:42 +0000
Message-ID: <m3lknc578l.fsf@robolove.meer.net>

Helu

* "quasi"  <························@b28g2000cwb.XXXXXXXXXXXX.com> 

> That was us. 

Oops sorry.

>  But it seems not to be that problem.  The solution did not fix the
> problem.  What's weird is that it keeps happening in weird ways. I
> don't have enough knowledge of lisp internals to actually have any
> idea.  I will just cite the observed cases.

[...]

> In all the cases GC starts continuously taking 99% CPU and I can
> usually get out of it by being patient and aborting all the restarts
> (connecting to lisp directly with attachtty).

I used to get this with a subclassed fd-stream (tcp-stream). When the
gc symptoms showed up, a ^C and "destroy the process" restart would
usually fix things. I believe the problem has not occurred since all
code paths were fixed to end with unix-close being called on the
socket fd. But it may be a rare bug and maybe I'm just lucky in not
seeing it again.
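
The idea of making every code path end with the socket being closed can
be sketched roughly as follows.  This is an illustration under
assumptions, not the tcp-stream code described above: it uses the
portable CLOSE rather than calling unix-close on the fd directly, and
CALL-WITH-CONNECTION, OPEN-FN and BODY-FN are hypothetical names.

```lisp
;; OPEN-FN and BODY-FN are hypothetical: OPEN-FN returns a fresh stream
;; (for example by making the open-stream call from the original post),
;; BODY-FN does the actual I/O on it.
(defun call-with-connection (open-fn body-fn)
  "Call BODY-FN on the stream returned by OPEN-FN, always closing the stream."
  (let ((stream nil))
    (unwind-protect
         (progn
           (setf stream (funcall open-fn))
           (funcall body-fn stream))
      (when stream
        ;; :ABORT T asks CLOSE not to flush pending output, which could
        ;; fail again on a dead peer.  IGNORE-ERRORS keeps a failing
        ;; CLOSE from masking the original error.
        (ignore-errors (close stream :abort t))))))
```

The cleanup form runs whether BODY-FN returns normally, signals an error
that is handled further up, or the process is unwound by a restart.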

--
Madhu

From: Rob Warnock
Subject: Re: socket problem in CMUCL
Date: Thu, 19 Oct 2006 08:40:27 +0000
Message-ID: <zPCdnRdJsLVmpKrYnZ2dnUVZ_oidnZ2d@speakeasy.net>

quasi <·········@gmail.com> wrote:
+---------------
| The application creates multiple processes with mp:make-process and each
| of the processes then opens a connection with a java server (using
| trivial-socket:open-stream) to get some data.  This call is wrapped in
| with-open-stream so it should close the stream when done.  It's a
| simple operation and works well.  But if there are open connections
| with the java server and the server dies (or is killed because it hung)
| then the lisp goes into continuous GC taking 99% CPU. This also occurs
| when we take the network interface down (as an experiment). I have used
| the forcibly-close-socket code from araneida, and it still does not help.
+---------------

Hmmm... You may be running into something that bit me with my
CMUCL-based server several years ago, namely, if the thing on
the other end of a socket goes away, then you'll get a SIGPIPE
the next time you write to it, which by default will call ERROR
and print a message before entering the debugger. But if the
*ERROR-OUTPUT* stream happens to be bound to the very Unix file
descriptor that you just got the SIGPIPE on... Oops!! Infinite
recursion. (Which can cause an infinite string of GCs due to
consing in the signal handler, although the actual CPU saturation
isn't *caused* by the GC per se.) If this is your problem, a
solution is described in some detail in an article I posted here
last December [you may need to click on "Show original" to get
decent formatting]:

    http://groups.google.com/group/comp.lang.lisp/msg/ee284119a99c0d68

But briefly, the trick [due to Dan Barlow] is to *ignore* SIGPIPE,
and let the normal I/O error handling take over. [Writes will result
in EPIPEs, not signals.]
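
A minimal sketch of that idea, which is not the code from the linked
article: it assumes CMUCL's SYSTEM:IGNORE-INTERRUPT and the UNIX:SIGPIPE
constant (check these names against your CMUCL version), guarded by
#+cmu so the form is skipped elsewhere.  WRITE-REQUEST-SAFELY is a
hypothetical helper showing a failed write being handled as an ordinary
condition.

```lisp
;; Assumption: evaluated once at server startup, before sockets are opened.
#+cmu
(system:ignore-interrupt unix:sigpipe)

;; Hypothetical helper: with SIGPIPE ignored, a write to a dead peer
;; fails with an ordinary I/O error (EPIPE), which can be handled here
;; instead of arriving as a signal.
(defun write-request-safely (stream request)
  "Write REQUEST to STREAM. Return T on success, or NIL and the condition."
  (handler-case
      (progn
        (write-line request stream)
        (finish-output stream)
        t)
    (error (c)
      (values nil c))))
```

The handler deliberately does not print anything, since the error stream
may be the very descriptor that just failed.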

Hope that helps...


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607