From: Felix Schlesinger
Subject: sockets and multi-processing
Date: 
Message-ID: <slrnabiia3.ct.fam_Schlesinger@schlesinger.dyndns.org>
Hi

I am writing a small program in cmucl that makes use of the mp: package
(for multi-processing). One of the processes opens a listener on an
inet-socket and the other processes (several of them) open
TCP-connections to the outside, and to the socket opened in the same
program (by the other process).

This is the listener:

(defun make-listener (port func)
  (mp:make-process
   #'(lambda ()
       (let ((socket (ext:create-inet-listener port)))
         (unwind-protect
             (loop  ;; Wait for a new connection
               (mp:process-wait-until-fd-usable socket :input)
               (let ((stream (sys:make-fd-stream
                              (ext:accept-tcp-connection socket)
                              :input t :output nil :buffering :none)))
                 (mp:make-process #'(lambda ()
                                      (funcall func stream)))))
           (ext:close-socket socket))))))

A connection (from another thread) would look like this
"
(defun connect (host port &optional (kind :stream))
  (sys:make-fd-stream (ext:connect-to-inet-socket host port kind) 
                      :input nil :output t :buffering :none))

(print something (connect somewhere))
"

The function called by the listener does something like

(defun connection (stream)
  (mp:process-wait (mp:current-process) #'(lambda (listen stream)))
  (let ((message (read stream)))
  ... (later on it will open some connections itself)

The problem is that when I open several threads (around 20), the
program will at some point just hang ("top" shows it as "connecting"
with very little CPU usage).

I guess that I have to put a process-wait in there somewhere (maybe one
process waits for the stream to open but doesn't give back control so
the other process can open it, or something like that), but I don't
know where or what to wait for.

Thanks for any help.

Ciao
  Felix

From: Joe Marshall
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <Nvku8.28153$%s3.9883035@typhoon.ne.ipsvc.net>
"Felix Schlesinger" <···············@t-online.de> wrote in message
··································@schlesinger.dyndns.org...

I wanted to suggest a few places to look.
I don't know all that much about cmucl, so this could be totally
wrong.

I noticed what appears to be a race condition in your code.

> (mp:process-wait-until-fd-usable socket :input)
> (let ((stream (sys:make-fd-stream
>                 (ext:accept-tcp-connection socket)
>                 :input t :output nil :buffering :none)))

Between the return from MP:PROCESS-WAIT-UNTIL-FD-USABLE
and the call to SYS:MAKE-FD-STREAM, there doesn't appear
to be anything guaranteeing that another process does
not seize the stream first.
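If more than one process is ever going to wait on that same listener,
one fix is to serialize the wait-and-accept.  A minimal sketch, assuming
the MP package has lock primitives along the lines of MP:MAKE-LOCK and
MP:WITH-LOCK-HELD (I'm guessing at the names, so check the documentation):

(defvar *accept-lock* (mp:make-lock "accept"))

(defun guarded-accept (socket)
  ;; Only the lock holder may wait on and accept from the listener, so
  ;; no other process can grab the connection between the wait and the
  ;; accept.
  (mp:with-lock-held (*accept-lock*)
    (mp:process-wait-until-fd-usable socket :input)
    (ext:accept-tcp-connection socket)))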

I'm even less sure about this one, but...

>   (mp:process-wait (mp:current-process) #'(lambda (listen stream)))
>   (let ((message (read stream)))

I assume that you meant #'(lambda () (listen stream))

You are discarding the return value from listen.  Now it could
be the case that CMUCL has a socket interface that is not
very similar to the standard Berkeley socket interface, but if
it *were* similar, then this isn't what you want.

(digression on berkeley sockets)
When you're writing a server, you need to create a socket
for incoming connections.  You call the `listen' function to
tell the OS to associate your socket with a port.  Then you
call the `accept' function to wait for something to happen.

'Accept' usually returns a *new* socket that you will use
for communicating with the client.  The old socket that you
were using for listening is still there listening.

The basic use of `accept' causes your process to hang until
there is input.  This usually isn't what you want in a
multithreaded lisp.  One solution is to poll the socket by
calling `accept' with a timeout of zero seconds.
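In Lisp terms the poll would be a cheap readiness check in front of the
accept.  A rough sketch of what I mean, assuming MP:PROCESS-WAIT takes a
whostate string plus a predicate, and that SYS:WAIT-UNTIL-FD-USABLE
accepts a timeout argument (zero meaning a non-blocking check); both are
guesses on my part:

(defun polling-accept-loop (socket func)
  (loop
    ;; The predicate is a zero-timeout poll; the scheduler keeps running
    ;; the other processes until the listener becomes readable.
    (mp:process-wait "accept"
                     #'(lambda ()
                         (sys:wait-until-fd-usable socket :input 0)))
    (let ((fd (ext:accept-tcp-connection socket)))
      (mp:make-process
       #'(lambda ()
           (funcall func
                    (sys:make-fd-stream fd :input t :output nil
                                        :buffering :none)))))))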

Let me know if any of this helps or if I am totally off target.
From: Arjun Ray
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <7mnjbu0p4dnsunmdk9gnieu7i4htlpmgs9@4ax.com>
In <·······················@typhoon.ne.ipsvc.net>, "Joe Marshall"
<·············@attbi.com> wrote:

| (digression on berkeley sockets) [...]

| The basic use of `accept' causes your process to hang until there is 
| [...] input.  This usually isn't what you want in a multithreaded 
| lisp.  One solution is to poll the socket by calling `accept' with a 
| timeout of zero seconds.

I'm a CL beginner, but I'd be surprised if such busywait polling were
considered an acceptable solution to the problem of 'accept' being a
blocking call.

One general class of solutions in multithreaded programming uses a
dedicated thread to block on accept.  When the call completes, the new
socket is passed off to a separate worker thread, either through a queue
to a pool of such threads or directly to a new one created on the spot.
In this class of solutions, all clients are "accepted" immediately with
the service taking the resource hit (either an unbounded number of
threads or an unbounded number of accepted connections on the working
queue.)  Another class of solutions addresses this resource problem by
having the worker threads individually block on accept, either through a
semaphore discipline, or simultaneously, with the OS farming out work.
(The latter works, for example, on *BSD systems where more than one
process or thread can call accept on the same passive socket, but not on
Solaris, due to its Streams implementation of the Berkeley socket API.) 
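In rough CL (I'm new to the MP package, so treat the operator names as
placeholders), the first class might look something like this, with a
dedicated acceptor feeding a shared queue that the workers drain:

(defvar *pending* '())                            ; accepted fds awaiting a worker
(defvar *pending-lock* (mp:make-lock "pending"))  ; lock constructor name is a guess

(defun acceptor (socket)
  ;; Dedicated process: wait on the listener and hand each accepted
  ;; connection to the worker pool via the queue (LIFO here, for brevity).
  (loop
    (mp:process-wait-until-fd-usable socket :input)
    (let ((fd (ext:accept-tcp-connection socket)))
      (mp:with-lock-held (*pending-lock*)
        (push fd *pending*)))))

(defun worker (func)
  (loop
    ;; Sleep until the acceptor has queued something for us.
    (mp:process-wait "work" #'(lambda () *pending*))
    (let ((fd (mp:with-lock-held (*pending-lock*)
                (when *pending* (pop *pending*)))))
      (when fd
        (funcall func (sys:make-fd-stream fd :input t :output t
                                          :buffering :none))))))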

With so many alternatives possible, polling would be a very inefficient
approach.
From: Joe Marshall
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <fgnu8.28176$%s3.9980983@typhoon.ne.ipsvc.net>
"Arjun Ray" <····@nmds.com.invalid> wrote in message
·······································@4ax.com...
> In <·······················@typhoon.ne.ipsvc.net>, "Joe Marshall"
> <·············@attbi.com> wrote:
>
> | (digression on berkeley sockets) [...]
>
> | The basic use of `accept' causes your process to hang until there is
> | [...] input.  This usually isn't what you want in a multithreaded
> | lisp.  One solution is to poll the socket by calling `accept' with a
> | timeout of zero seconds.
>
> I'm a CL beginner, but I'd be surprised if such busywait polling were
> considered an acceptable solution to the problem of 'accept' being a
> blocking call.

Well, it is and it isn't....

It really depends on the architecture of the multiprocessing
within the lisp.  Some Lisp systems on stock hardware do not
use `native' threads (some OS's don't provide them anyway).
In these Lisp systems, the multitasking is done via an in-process
scheduler that switches between Lisp stacks periodically.
As far as the OS is concerned, though, the Lisp is single-threaded.

In a Lisp system such as this, you simply cannot block for an
accept.

Another thing to consider is whether the Lisp system has
`wait functions'.  A wait function is a function that the
scheduler may invoke to determine if the process is ready
to run.  On the Lisp machine, the wait function was run from
within the dynamic context of the scheduler.  The wait
function was usually run on each scheduling cycle.  On stock
hardware implementations, the scheduler often doesn't have
its own stack.  In that case, the wait function might run
in the dynamic context of whatever stack happened to be
around.

The difficulty with this model is that you cannot easily
tell *why* the process is waiting from the wait function.

Ideally, if all processes are each awaiting an event, the
OS need not schedule the Lisp process until the event
occurs (or until a signal comes in).  Under Unix there
is an OS call named `select' that implements this feature.
You give select a set of descriptors and a timeout and
the OS puts your process to sleep until something interesting
happens on a descriptor, until a timeout happens, or until
a signal occurs.  Under Windows, the analogous call is
`WaitForMultipleObjects'.  So what you'd like to have happen,
when your `listening' thread wants to wait for a connection,
is for the Lisp process to note this and, if nothing else is
going on, do a `select' on that socket.

Unfortunately, this doesn't fit in with the `wait function'
model.  Since a `wait function' can be an arbitrary
piece of code, the lisp process cannot easily determine if
what is being waited on is in fact something that can
be handled by the OS.

There are various workarounds to this, but most involve
giving the scheduler some idea of what file-descriptors
it should be waiting on when it is idle.  At best it is
complicated, and at worst, the scheduler is buggy so it
doesn't quite work all the time.

Now, back to polling.  First of all, it is very simple.
You don't have to know how the scheduler and the OS
interact.  Second, it is more portable.  It ought to
work on different OS and even different Lisp implementations.
Third, if you are polling in the
wait function, this will happen on process switching,
probably on the order of dozens of times a second.
This really isn't very frequent on a machine that runs
at several hundred megahertz.

You are correct that polling is far less efficient
than event-driven scheduling, but I think the efficiency
hit will be in the noise.  The problem with the more
sophisticated solutions that you proposed is that there
are implementations that can't support this kind of
solution without major effort.  I don't know CMUCL so
I couldn't tell you if you could make a blocking call
on a worker thread.  I *do* know that if you have a
process wait, that polling ought to work.

> One general class of solutions in multithreaded programming uses a
> dedicated thread to block on accept.  When the call completes, the new
> socket is passed off to a separate worker thread, either through a queue
> to a pool of such threads or directly to a new one created on the spot.
> In this class of solutions, all clients are "accepted" immediately with
> the service taking the resource hit (either an unbounded number of
> threads or an unbounded number of accepted connections on the working
> queue.)  Another class of solutions addresses this resource problem by
> having the worker threads individually block on accept, either through a
> semaphore discipline, or simultaneously, with the OS farming out work.
> (The latter works, for example, on *BSD systems where more than one
> process or thread can call accept on the same passive socket, but not on
> Solaris, due to its Streams implementation of the Berkeley socket API.)
>
> With so many alternatives possible, polling would be a very inefficient
> approach.
>
From: Arjun Ray
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <mkaqbuguihgb1c9ea9ikunmj8ubg4l4jdb@4ax.com>
In <·······················@typhoon.ne.ipsvc.net>, 
"Joe Marshall" <·············@attbi.com> wrote:

| [On some stock hardware], the multitasking is done via an in-process
| scheduler that switches between Lisp stacks periodically.  As far as 
| the OS is concerned, though, the Lisp is single-threaded.  In a Lisp 
| system such as this, you simply cannot block for an accept.

Well, yes, but such threading libraries in other languages (eg pthreads
for C) seem to do a lot more.  A vital part is a set of wrappers around
system calls, so that threads really "block" in user space (and give the
scheduler a convenient chance to run).  The "real" system calls will be
in nonblocking mode regardless of the semantics at the application
level.

I get the impression that the MP package is at a lower level, exposing a
lot of mechanics that the libraries I'm thinking of would encapsulate.
 
| Another thing to consider is whether the Lisp system has `wait functions'.

Thank you, this was a vital piece of information.  It got me to find
resources like these:

 http://www.franz.com/support/documentation/6.1/doc/multiprocessing.htm
 http://ligwww.epfl.ch/software/ilu/manual_14.html

The material is very accessible, but I find a number of parts more than
a little mystifying.

For instance, the ILU manual has a producer-consumer queue example with
the consumer code looking like this (some comments removed for brevity):

(defun consume ()
  (loop
    ;; Check to see if there is anything on the queue.
    (if (not (queue-empty-p queue))
        ;; There is an item on the queue; pop and print all items.
        (do ()((queue-empty-p queue))
          (fresh-line t)
          (princ "Output: ")
          (prin1 (queue-pop queue))
          (fresh-line t)
          (finish-output t))       
      ;; Queue is empty; check to see if the producer is still alive.
      (if (null (ilu-process:find-process "Producer Process"))   
        (return nil)))     
    ;; Sleep for five seconds; this gives up control immediately
    ;; so some other process can run.
    (sleep 5)
    ))

For the life of me I can't see why this isn't seriously raced, given
that by the time (queue-pop queue) is executed, the result of a prior
(queue-empty-p queue) evaluation could easily be meaningless.  

I must be missing something basic.

| The difficulty with this model is that you cannot easily tell *why* the 
| process is waiting from the wait function.

Until I get the "Aha!" of it, I'd have to say that this looks like a
problem of one's own making.  Maybe there's (code-written-for) Lisp
Machine compatibility at stake, maybe the point is to have to build a
support layer, but the synchronization primitives don't seem to form a
usefully complete set.  In particular, I don't see the point of a wait
function where what seems to be needed is the concept of a condition
variable.  This could be a terminological problem, in that the "wait" in
"process-wait" really feels more like "yield" in systems where "wait"
connotes the P operation.   

| Unfortunately, this doesn't fit in with the `wait function' model.  
| Since a `wait function' can be an arbitrary piece of code, the lisp 
| process cannot easily determine if what is being waited on is in fact 
| something that can be handled by the OS.

It seems to me that the real problem is in the guarantee that the wait
function will be invoked periodically - that is, a polling loop is a
built-in premise of the entire scheme.  But what I find really confusing
is whether a wait function is allowed to block, which it might have to
if its predicate requires examining something under WITH-PROCESS-LOCK.
Ordinarily I would have expected a multi-threading system to run the
scheduler at natural block points (mutex lock, blocking system call,
etc.) but what seems to be the paradigm here is the scheduler running at
more or less arbitrary points with all threads implicitly running in
polling loops.

That said, I think I see now why polling was a natural suggestion!

| Now, back to polling.  First of all, it is very simple. You don't have 
| to know how the scheduler and the OS interact. 

I'm not sure why this is important, in the sense that I don't see how
other paradigms (such as pthreads) are different in this respect. 

| Second, it is more portable.  It ought to work on different OS and 
| even different Lisp implementations.

I'm not sure this is true.  The critical distinction seems to be whether
an OS provides synchronization primitives (and perhaps, signals.)  Even
without them, it seems a lot of unnecessary work to have to allow the
scheduler and wait functions to run at arbitrary - albeit guaranteed -
intervals when everything could be organized around a defined set of
blocking operations.

| Third, if you are polling in the wait function, this will happen on 
| process switching, probably on the order of dozens of times a second.

I don't quite follow, sorry.  Is "process-switching" at the OS level or
at the intraprocess thread level?  How does the scheduler get to run
without a special OS-level service?  
From: Joe Marshall
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <WZcv8.33182$%s3.12117583@typhoon.ne.ipsvc.net>
I re-ordered your text so I could address the issues in
what I hope is a clearer sequence.


"Arjun Ray" <····@nmds.com.invalid> wrote in message
·······································@4ax.com...
> For instance, the ILU manual has a producer-consumer queue example with
> the consumer code looking like this (some comments removed for brevity):
>
> (defun consume ()
>   (loop
>     ;; Check to see if there is anything on the queue.
>     (if (not (queue-empty-p queue))
>         ;; There is an item on the queue; pop and print all items.
>         (do ()((queue-empty-p queue))
>           (fresh-line t)
>           (princ "Output: ")
>           (prin1 (queue-pop queue))
>           (fresh-line t)
>           (finish-output t))
>       ;; Queue is empty; check to see if the producer is still alive.
>       (if (null (ilu-process:find-process "Producer Process"))
>         (return nil)))
>     ;; Sleep for five seconds; this gives up control immediately
>     ;; so some other process can run.
>     (sleep 5)
>     ))
>
> For the life of me I can't see why this isn't seriously raced, given
> that by the time (queue-pop queue) is executed, the result of a prior
> (queue-empty-p queue) evaluation could easily be meaningless.
>
> I must be missing something basic.

Well it isn't as serious a bug as it looks at first.  First,
we assume that this process is the only one removing elements
from the queue.  So if queue-empty-p returns NIL, we know that
there is at least one element that can be removed and that no
one else will get to it first.

But there is a race condition.  Suppose that we have just woken
up from the SLEEP and gone to the top of the LOOP.  We find the
QUEUE is empty and proceed down the false branch of the conditional.
But let us suppose that we are suspended and the producer
process gains control.  It spews a megabyte of text and then exits.
We regain control and find that producer process is dead,
so we exit without having printed the final message.

This bug is unlikely to show up in practice because entering the
loop is synchronized with the scheduler (because of the call to
sleep), and the scheduling quantum is unlikely to be so small as
to expire before we leave the racy code.  But if the machine has
a huge load, it could miss final messages.
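One way to plug that hole, keeping the example's own operators: when the
producer turns out to be dead, drain the queue one last time before
returning, so anything it queued just before exiting still gets printed.

(defun consume ()
  (flet ((drain ()
           (do () ((queue-empty-p queue))
             (fresh-line t)
             (princ "Output: ")
             (prin1 (queue-pop queue))
             (fresh-line t)
             (finish-output t))))
    (loop
      (drain)
      ;; Only exit once the producer is gone AND we have drained whatever
      ;; it managed to queue before it died.
      (when (null (ilu-process:find-process "Producer Process"))
        (drain)
        (return nil))
      (sleep 5))))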

> In <·······················@typhoon.ne.ipsvc.net>,
> "Joe Marshall" <·············@attbi.com> wrote:
>
> | [On some stock hardware], the multitasking is done via an in-process
> | scheduler that switches between Lisp stacks periodically.  As far as
> | the OS is concerned, though, the Lisp is single-threaded.  In a Lisp
> | system such as this, you simply cannot block for an accept.
>
> Well, yes, but such threading libraries in other languages (eg pthreads
> for C) seem to do a lot more.  A vital part is a set of wrappers around
> system calls, so that threads really "block" in user space (and give the
> scheduler a convenient chance to run).  The "real" system calls will be
> in nonblocking mode regardless of the semantics at the application
> level.
>
> I get the impression that the MP package is at a lower level, exposing a
> lot of mechanics that the libraries I'm thinking of would encapsulate.

Yes.  The MP package is at quite a low level.  The original stack-group
model appeared on the CADR lisp machine in the late 70's or early 80's.
The various multiprocessing packages have attempted to emulate this
basic functionality.

> Until I get the "Aha!" of it, I'd have to say that this looks like a
> problem of one's own making.  Maybe there's (code-written-for) Lisp
> Machine compatibility at stake, maybe the point is to have to build a
> support layer, but the synchronization primitives don't seem to form a
> usefully complete set.  In particular, I don't see the point of a wait
> function where what seems to be needed is the concept of a condition
> variable.  This could be a terminological problem, in that the "wait" in
> "process-wait" really feels more like "yield" in systems where "wait"
> connotes the P operation.

The `Aha!' of it is this:  the wait-function model sucks as a
user-level API.  (In fact, it pretty much sucks altogether *unless*
your OS is written in Lisp.)  It's a popular model, but it is the
wrong model.

> It seems to me that the real problem is in the guarantee that the wait
> function will be invoked periodically - that is, a polling loop is a
> built-in premise of the entire scheme.

This is the basic problem with a wait function.

> But what I find really confusing is whether a wait function is
> allowed to block, which it might have to
> if its predicate requires examining something under WITH-PROCESS-LOCK.

To quote Dr. Egon Spengler, ``That would be bad.''

The wait function simply must not block, and ought to run as quickly
as possible.  On the Lisp Machine, if the wait function blocked, the
entire machine would freeze.

> Ordinarily I would have expected a multi-threading system to run the
> scheduler at natural block points (mutex lock, blocking system call,
> etc.) but what seems to be the paradigm here is the scheduler running at
> more or less arbitrary points with all threads implicitly running in
> polling loops.
>
> That said, I think I see now why polling was a natural suggestion!

That's pretty much the state of things.

> | Now, back to polling.  First of all, it is very simple. You don't have
> | to know how the scheduler and the OS interact.
>
> I'm not sure why this is important, in the sense that I don't see how
> other paradigms (such as pthreads) are different in this respect.

It's important because the useful abstraction layer is missing.
In the original example we were talking about sockets.  Under Unix,
the Lisp scheduler ought to be calling `select' when it determines
that the Lisp is idle.  The sockets that are awaiting input ought to
be in the FDSET for select.  Under NT, however, there are a number
of options, ranging from calling `WaitForMultipleObjects' to using
`overlapped I/O'.

It would be nice if you could just call the blocking `accept' function,
but what if that hangs your lisp?  (Don't laugh, it might!)  Well,
you *could* start mucking around to figure out how to get the
Lisp scheduler to understand that you are waiting on a particular
socket, or you could poll.

Personally, I'd consider the former, but I'd advise the latter to
strangers.

> | Second, it is more portable.  It ought to work on different OS and
> | even different Lisp implementations.
>
> I'm not sure this is true.  The critical distinction seems to be whether
> an OS provides synchronization primitives (and perhaps, signals.)  Even
> without them, it seems a lot of unnecessary work to have to allow the
> scheduler and wait functions to run at arbitrary - albeit guaranteed -
> intervals when everything could be organized around a defined set of
> blocking operations.

Agreed, but this is the reality of the situation.

> | Third, if you are polling in the wait function, this will happen on
> | process switching, probably on the order of dozens of times a second.
>
> I don't quite follow, sorry.  Is "process-switching" at the OS level or
> at the intraprocess thread level?  How does the scheduler get to run
> without a special OS-level service?

I'm considering the intraprocess thread level here.  The scheduler
runs when a timer interrupt goes off or the currently running process
yields control.  Scheduling quanta are usually on the order of
1/10th of a second or so.
From: Arjun Ray
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <lv9sbuo6si22570vpd4co1oopevadttgmm@4ax.com>
In <························@typhoon.ne.ipsvc.net>, "Joe Marshall"
<·············@attbi.com> wrote:
| "Arjun Ray" <····@nmds.com.invalid> wrote in message
| ·······································@4ax.com...

| Yes.  The MP package is at quite a low level.  The original stack-group
| model appeared on the CADR lisp machine in the late 70's or early 80's.
| The various multiprocessing packages have attempted to emulate this
| basic functionality.

Is this a matter of supporting a (large?) body of legacy code written to
this model?  Or is there something (idiomatically) appropriate about
continuing with this model in preference to other possibly better ones?
Does it really buy anything in terms of expressiveness? 

| The `Aha!' of it is this:  the wait-function model sucks as a user-level 
| API.  (In fact, it pretty much sucks altogether *unless* your OS is 
| written in Lisp.)  It's a popular model, but it is the wrong model.

Even so, I think it's only fair to try and understand the model on its
own terms.  (Which would lead to issues such as: why is it continuing to
be popular?)

|> But what I find really confusing is whether a wait function is allowed 
|> to block, which it might have to if its predicate requires examining 
|> something under WITH-PROCESS-LOCK.
| 
| To quote Dr. Egon Spengler, ``That would be bad.''
| 
| The wait function simply must not block, and ought to run as quickly
| as possible.  On the Lisp Machine, if the wait function blocked, the
| entire machine would freeze.

I think I have a better understanding now.

It looks like a process-lock is more of a critical section guard (as in
NT) than a binary semaphore or mutex.  That is, acquiring a process-lock
is not a blocking operation; rather, it is advising the scheduler, if by
chance it interrupts, that a critical section (in a WITH-PROCESS-LOCK)
must run to completion.  Likewise, a wait function would also have to
run to completion, and wouldn't need WITH-PROCESS-LOCK to manipulate
shared variables because running in the context of the scheduler ensures
that it can't be interrupted while variables are in inconsistent states.

This model works best when threads *never* really block, only cooperate
actively to yield (or be interruptible) at points where critical section
considerations don't apply.  Blocking I/O is therefore a nasty problem.
Either file descriptors are put in nonblocking mode (and threads spin in
sleep-and-retry loops), or the system supports I/O test predicates ("can
read?", "can write?") which wait functions would evaluate, provided the
associated I/O doesn't itself overstep resource limits.  The latter case
worries me: what if I can't read "enough"?  what if I write "too much"?
These issues are especially relevant to network I/O.

| It would be nice if you could just call the blocking `accept' function, 
| but what if that hangs your lisp?  (Don't laugh, it might!) 

Worrying about blocking I/O strikes me as a gratuitous complication, so
I'm not seeing what the real advantage of the wait function model is,
even in the general case.

| Well, you *could* start mucking around to figure out how to get the 
| Lisp scheduler to understand that you are waiting on a particular 
| socket, or you could poll.  Personally, I'd consider the former, but 
| I'd advise the latter to strangers.

I think it's also the way the MP package would prefer one to code
things.  Thanks very much for the insights.
From: Joe Marshall
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <5zBv8.33861$%s3.13056119@typhoon.ne.ipsvc.net>
"Arjun Ray" <····@nmds.com.invalid> wrote in message
·······································@4ax.com...
> In <························@typhoon.ne.ipsvc.net>, "Joe Marshall"
> <·············@attbi.com> wrote:
> | "Arjun Ray" <····@nmds.com.invalid> wrote in message
> | ·······································@4ax.com...
>
> | Yes.  The MP package is at quite a low level.  The original stack-group
> | model appeared on the CADR lisp machine in the late 70's or early 80's.
> | The various multiprocessing packages have attempted to emulate this
> | basic functionality.
>
> Is this a matter of supporting a (large?) body of legacy code written to
> this model?  Or is there something (idiomatically) appropriate about
> continuing with this model in preference to other possibly better ones?
> Does it really buy anything in terms of expressiveness?

I'm not so sure it is a `large' body of legacy code.  Back when
Lisp machines roamed the earth it was important to be compatible
with them.  So companies like Franz, Lucid, and Harlequin wrote
multitasking packages that were more or less compatible with
the Lisp machine model.

>
> It looks like a process-lock is more of a critical section guard (as in
> NT) than a binary semaphore or mutex.  That is, acquiring a process-lock
> is not a blocking operation; rather, it is advising the scheduler, if by
> chance it interrupts, that a critical section (in a WITH-PROCESS-LOCK)
> must run to completion.  Likewise, a wait function would also have to
> run to completion, and wouldn't need WITH-PROCESS-LOCK to manipulate
> shared variables because running in the context of the scheduler ensures
> that it can't be interrupted while variables are in inconsistent states.

I'm afraid I must have given the wrong impression.

There is usually an operation called WITHOUT-INTERRUPTS (or some such
variant) that attempts to ensure a section of code is not interrupted.
PROCESS-LOCK really is a binary semaphore or mutex.
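Schematically (the exact operator spellings vary between
implementations; I'm using the Franz-style names from the documentation
you found, and the counter and queue variables are just placeholders):

(defvar *counter* 0)
(defvar *queue* '())
(defvar *queue-lock* (mp:make-process-lock :name "queue"))

(defun bump ()
  ;; WITHOUT-INTERRUPTS is a critical-section guard: nothing else runs
  ;; in between, so keep the body short and never block inside it.
  (without-interrupts
    (incf *counter*)))

(defun enqueue (item)
  ;; A process lock is a real mutex: if another process holds it, this
  ;; process blocks (yields to the scheduler) until the lock is free,
  ;; while everything else keeps running.
  (mp:with-process-lock (*queue-lock*)
    (push item *queue*)))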

> Blocking I/O is therefore a nasty problem.
> Either file descriptors are put in nonblocking mode (and threads spin in
> sleep-and-retry loops), or the system supports I/O test predicates ("can
> read?", "can write?") which wait functions would evaluate, provided the
> associated I/O doesn't itself overstep resource limits.  The latter case
> worries me: what if I can't read "enough"?  what if I write "too much"?
> These issues are especially relevant to network I/O.

It isn't quite *that* dismal.  There are I/O test predicates and
the scheduler usually *does* have a close interaction with the
vendor supplied I/O primitives.
From: Felix Schlesinger
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <slrnabjo9c.14c.fam_Schlesinger@schlesinger.dyndns.org>
Joe Marshall wrote
> "Felix Schlesinger" <···············@t-online.de> wrote in message
> I noticed what appears to be a race condition in your code.
> 
>> (mp:process-wait-until-fd-usable socket :input)
>> (let ((stream (sys:make-fd-stream
>>                 (ext:accept-tcp-connection socket)
>>                 :input t :output nil :buffering :none)))
> 
> Between the return from MP:PROCESS-WAIT-UNTIL-FD-USABLE
> and the call to SYS:MAKE-FD-STREAM, there doesn't appear
> to be anything guaranteeing that another process does
> not seize the stream first.

Yep, you're right. I fixed that, thanks.
 
>>   (mp:process-wait (mp:current-process) #'(lambda (listen stream)))
>>   (let ((message (read stream)))
> 
> I assume that you meant #'(lambda () (listen stream))

I did, but it doesn't really matter. The process-wait above isn't
necessary at all. The call to listen just checks whether there is input
in the stream. All the accept-stuff is done in the listener thread
above.

Ciao
  Felix
From: Pierre R. Mai
Subject: Re: sockets and multi-processing
Date: 
Message-ID: <87bscl3n5t.fsf@orion.bln.pmsf.de>
···············@t-online.de (Felix Schlesinger) writes:

> (defun connection (stream)
>   (mp:process-wait (mp:current-process) #'(lambda (listen stream)))
>   (let ((message (read stream)))
>   ... (later on it will open some connections itself)

I think that using listen to wait for the fd to become usable is not
very wise.  If necessary, then use mp:process-wait-until-fd-usable
instead.

That said, read will already block your process, so it should work
just removing that call to process-wait...
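I.e., something along these lines (an untested sketch; the message
handling is elided just as in your original):

(defun connection (stream)
  ;; READ blocks this process until a complete form is available, while
  ;; the scheduler keeps the other processes running, so no explicit
  ;; process-wait is needed here.
  (let ((message (read stream)))
    ;; ... handle MESSAGE, open further connections, etc.
    message))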

What happens if you do that?

> I guess that I have to put a process-wait in there somewhere (maybe one
> process waits for the stream to open but doesn't give back control so
> the other process can open it, or something like that), but I don't
> know where or what to wait for.

FWIW, we've been running our CMU CL-based HTTP/1.1 server for some
time, including benchmarking sessions with more than 100 connections
per second (and more than 100 active connections), and didn't ever run
into deadlocks, so in theory this should just work...

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein