implementing a counting sempahore (allegroCL)

From: Ryan McCormack
Subject: implementing a counting sempahore (allegroCL)
Date: Wed, 25 Jul 2001 15:25:03 +0000
Message-ID: <m38zhdqbe8.fsf@ryan.dotcast.com>

Does anyone have any experience with implementing a semaphore with
Lisp?  I am struggling with whether or not I can do this given the
primitives in the multiprocessing package that comes with AllegroCL.
Any links to examples of semaphore implementations in Lisp would be
greatly appreciated.

Thank you,

Ryan McCormack

Re: implementing a counting sempahore (allegroCL) ···@itasoftware.com
- Re: implementing a counting sempahore (allegroCL) Ryan McCormack
  - Re: implementing a counting sempahore (allegroCL) John Foderaro
    - Re: implementing a counting sempahore (allegroCL) ···@itasoftware.com
  - Re: implementing a counting sempahore (allegroCL) ···@itasoftware.com
    - Re: implementing a counting sempahore (allegroCL) John Foderaro
      - Re: implementing a counting sempahore (allegroCL) ···@itasoftware.com
        Re: implementing a counting sempahore (allegroCL) John Foderaro
        Re: implementing a counting sempahore (allegroCL) ···@itasoftware.com
        Re: implementing a counting sempahore (allegroCL) John Foderaro
        Re: implementing a counting sempahore (allegroCL) ···@itasoftware.com
Re: implementing a counting sempahore (allegroCL) Thor Kristoffersen

From: ···@itasoftware.com
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Wed, 25 Jul 2001 19:17:02 +0000
Message-ID: <7kww3jkh.fsf@itasoftware.com>

Ryan McCormack <········@cis.ohio-state.edu> writes:

> Does anyone have any experience with implementing a semaphore with
> Lisp?  I am struggling with whether or not I can do this given the
> primitives in the multiprocessing package that comes with AllegroCL.
> Any links to examples of semaphore implementations in Lisp would be
> greatly appreciated.

You can do it under Allegro CL, but it can be quite tricky as the
multitasking code in Allegro is quite buggy.

If you told me a bit more about the behavior you are looking for (I
have heard several `canonical' definitions of what a semaphore is), I
can help you out.

> 
> Thank you,
> 
> Ryan McCormack

From: Ryan McCormack
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Wed, 25 Jul 2001 17:28:25 +0000
Message-ID: <m3u200q5om.fsf@ryan.dotcast.com>

···@itasoftware.com writes:

> You can do it under Allegro CL, but it can be quite tricky as the
> multitasking code in Allegro is quite buggy.
> 
> If you told me a bit more about the behavior you are looking for (I
> have heard several `canonical' definitions of what a semaphore is), I
> can help you out.

my semaphore definition

type semaphore = record
        value: integer
        L: list of process
end;

two operations needed (?)

block - suspend the process that invokes it
wakeup(P) - resumes the execution of a blocked process P

semaphore operations

wait(S):
        S.value := S.value-1;
        if S.value < 0 then
        begin
                add this process to S.L;
                block;
        end;

signal(S):
        S.value := S.value+1;
        if S.value <= 0 then
        begin
                remove a process P from S.L;
                wakeup(P);
        end;

note -- this is what I was thinking.  Insted of block and wakeup, I
could use process-wait and a function that checks the process list of
the semaphore.  The function argument of process wait is a function that checks
to see if the process is still on the semaphore process-list.  When
the process is no longer on the list, the function returns true, in
this case, the process wakes up.  This pretty much gets rid of the
need for a wakeup call.. it is just assumed that the process wakes up
when the process is removed.   

-ryan

From: John Foderaro
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Thu, 26 Jul 2001 14:24:30 +0000
Message-ID: <MPG.15c9c7d73622d1749896a2@news.dnai.com>

 If you're using ACL 6.0 you can make use of the Queue object
to do P and V.  

http://www.franz.com/support/documentation/6.0/doc/multiprocessing.htm#queues-1

A queue is overkill for a counting semaphore but if you aren't doing too many
P's and V's then the consing done in the enqueue operation won't be a problem.
enqueue/dequeue are very simple functions that use a process lock to serialize
access to the queue itself.  One could do the same thing to implement a
simple counting semaphore.


 If you want one process to control the starting of other processes you might
consider using the processes' run reasons.   Waking up a process is then 
just adding something to it's run reasons.  This is what AllegroServe
does. (see the function http-accept-thread in main.cl in the source you
can find via  http://opensource.franz.com/aserve )


- john foderaro
  franz inc.

From: ···@itasoftware.com
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Thu, 26 Jul 2001 17:16:56 +0000
Message-ID: <y9pb1ugn.fsf@itasoftware.com>

John Foderaro <···@xspammerx.franz.com> writes:

>  If you want one process to control the starting of other processes you might
> consider using the processes' run reasons.   Waking up a process is then 
> just adding something to it's run reasons.  This is what AllegroServe
> does. (see the function http-accept-thread in main.cl in the source you
> can find via  http://opensource.franz.com/aserve )

The difficulty with doing this under Franz is that there is no way
to atomically modify the run reasons of more than one process at a
time.  Any operation that touches a process's run reasons will cause a
trip through the scheduler, *even* if interrupts and scheduling are
off.

So, for example, you cannot atomically turn off one process and turn
on another, or atomically revoke run reasons and add a process to a
queue.

From: ···@itasoftware.com
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Thu, 26 Jul 2001 13:50:36 +0000
Message-ID: <3d7j3ikz.fsf@itasoftware.com>

Ryan McCormack <········@cis.ohio-state.edu> writes:

> note -- this is what I was thinking.  Insted of block and wakeup, I
> could use process-wait and a function that checks the process list of
> the semaphore.  The function argument of process wait is a function that checks
> to see if the process is still on the semaphore process-list.  When
> the process is no longer on the list, the function returns true, in
> this case, the process wakes up.  This pretty much gets rid of the
> need for a wakeup call.. it is just assumed that the process wakes up
> when the process is removed.   

You'd have to maintain the semaphore count in sync with the list, but
that ought to work.  Here is what I came up with.

(in-package :cl-user)

;;; Franz implements control-c by signalling a condition of class
;;; EXCL:INTERRUPT-SIGNAL.  This inherits from ERROR, so
;;; IGNORE-ERRORS also ignores control-c.  Probably not what was
;;; intended.

(defmacro working-ignore-errors (&rest forms)
  "Working version of CL:IGNORE-ERRORS
   This form does *not* ignore control-c interrupts."
  (let ((block-name (gensym (symbol-name :ignore-errors-block-))))
    `(BLOCK ,block-name
       (HANDLER-BIND ((EXCL:INTERRUPT-SIGNAL #'ERROR)

		      (ERROR (LAMBDA (CONDITION)
			       (RETURN-FROM ,block-name
				 (VALUES NIL CONDITION)))))
	 ,@forms))))

;;; Franz's implementation of unwind-protect is not safe from
;;; asynchronous interrupts and thus not useful in a multitasking
;;; environment.

(defmacro working-unwind-protect (protected-form &rest cleanup-forms)
  "A multitasking-safe version of unwind-protect.  Immune to race
   condition caused by asynchronous interrupts and asynchronous
   process manipulation."
  (let ((interrupts-var (gensym (symbol-name :interrupts-))))
    `(let ((,interrupts-var excl::*without-interrupts*)
	   (excl::*without-interrupts* t))
       (unwind-protect
           (let ((excl::*without-interrupts* ,interrupts-var))
	     ,protected-form)
	 ,@cleanup-forms))))

;;; It is incorrect to attempt to seize a lock when multitasking
;;; is inhibited:  if the lock is not free, it will never become free.
;;; However, Franz's implementation of PROCESS-LOCK does not report
;;; this as an error, but rather it silently turns multitasking
;;; back on in the hope that the lock will be released.
;;;
;;; This, of course, invalidates the contract of WITHOUT-INTERRUPTS,
;;; which is the primary synchronization mechanism for the entire
;;; system.
;;;
;;; The macro-expansion for with-process-lock exacerbates this problem
;;; because it binds EXCL::*WITHOUT-INTERRUPTS* prior to calling
;;; MP:PROCESS-LOCK.  We cannot fix the macro after it has been
;;; expanded, so we do this ugly trick:  we peek up the binding stack
;;; to find the saved value of EXCL::*WITHOUT-INTERRUPTS* and ensure
;;; it was not T prior to being bound.

;;; The way to avoid this error is to make sure you seize all necessary
;;; process-locks BEFORE you call without-interrupts.

(defun were-interrupts-off? ()
  (let ((index (excl::index-in-bindstack 'excl::*without-interrupts*)))
    (when index
      (excl::bindstack-value index))))

(defun check-deadlock (lock &optional (lock-value sys:*current-process*) &rest whatever)
  (declare (ignore whatever))
  (when (and (or (were-interrupts-off?) sys::*disallow-scheduling*)
	     (not (eq (mp:process-lock-locker lock) lock-value)))
    (error "DEADLOCK!  Multitasking is off while trying to sieze lock ~s" lock)))

(eval-when (:load-toplevel :execute)
  (when (member :compiler *features*)
    (let ((excl:*compile-advice* t))
      (excl:defadvice mp:process-lock :before
	(apply #'check-deadlock excl:arglist)))))

;;; Mutex

(defconstant +max-waiting-processes+ 128
  "The maximum number of processes that can wait on a mutex.")

(defstruct (mutex
	    (:constructor make-mutex
			  (name &aux (waiting-processes
				      (make-array +max-waiting-processes+)))))
  (name nil :read-only t)
  (waiting-processes #() :read-only t))

(defun process-has-mutex-p (mutex &optional (process mp:*current-process*))
  (eq (svref (mutex-waiting-processes mutex) 0) process))

(defun find-empty-queue-slot (queue current-position)
  (1+ (or (position-if-not #'null queue
			   :from-end t
			   :end current-position)
	  -1)))

(defun enter-mutex-queue (mutex)
  "Place the current process in the queue of processes awaiting semaphore.

   Returns the index of the process in the queue.  Returns NIL if queue is full."
  ;; The only interlock necessary to prevent races.
  (let* ((waiting-processes (mutex-waiting-processes mutex))
	 (limit (length waiting-processes)))
    (excl:without-interrupts
      (let ((index (find-empty-queue-slot waiting-processes limit)))
	(unless (= index limit)
	  (setf (svref waiting-processes index) mp:*current-process*)
	  index)))))

(defun leave-mutex-queue (mutex)
  (nsubstitute nil mp:*current-process* (mutex-waiting-processes mutex)))

(defun advance-in-queue (mutex current-index)
  (let ((queue (mutex-waiting-processes mutex)))
    (if (zerop current-index)
	0
	(let ((next-index (find-empty-queue-slot queue current-index)))
	  (if (= next-index current-index)
	      current-index
	      (progn
		(setf (svref queue next-index) mp:*current-process*)
		(setf (svref queue current-index) nil)
		next-index))))))

(defun can-advance-p (waiting-processes position)
  (null (svref waiting-processes (1- position))))

(defun acquire-mutex (mutex)
  (when excl::*without-interrupts*
    (error "Cannot wait for mutex when interrupts are off."))
  (when sys::*disallow-scheduling*
    (error "Cannot wait for mutex with scheduling off."))

  (let ((position (enter-mutex-queue mutex)))
    (if (null position)
	(error "Too many processes awating mutex ~s" mutex)
	(wait-for-mutex mutex position))))

(defun wait-for-mutex (mutex position)
  "Attempt to advance in the queue until we are in position zero."
  (let ((whostate (format nil "Mutex ~s" (mutex-name mutex)))
	(queue (mutex-waiting-processes mutex)))
    (assert (eq (svref queue position) mp:*current-process*)) ; sanity
    (loop
	until (zerop position)
	do (mp:process-wait whostate #'can-advance-p queue position)
	   (setq position (advance-in-queue mutex position)))))

(defmacro with-mutex ((mutex) &body body-forms)
  "Acquire mutex around execution of body-forms.  Current process will
   block if mutex is unavailable."
  (let ((mutex-var (gensym (symbol-name :mutex-)))
	(release-p (gensym (symbol-name :release-p-))))
    `(let ((,mutex-var ,mutex)
	   (,release-p nil))
       (working-unwind-protect
	   (progn
	     (unless (process-has-mutex-p ,mutex-var mp:*current-process*)
	       (setq ,release-p t)
	       (acquire-mutex ,mutex-var))
	     ,@body-forms)
	 (when ,release-p
	   (leave-mutex-queue ,mutex-var))))))

(defstruct (semaphore
	    (:constructor
	     make-semaphore
	     (name size &aux
		   (mutex (make-mutex name))
		   (running-processes
		    (make-array size)))))
  (name nil :read-only t)
  (size 0 :read-only t)
  (mutex nil :read-only t)
  (running-processes #() :read-only t))

(defun process-has-semaphore-p (semaphore process)
  (find process (semaphore-running-processes semaphore)))

(defun wait-for-semaphore (semaphore)
  "Scan the set of running processes to see if there are any slots
   open."
  (let ((whostate (format nil "Semaphore ~s" (semaphore-name semaphore)))
	(running-processes (semaphore-running-processes semaphore)))
    ;; When this returns, there is an open slot in the queue.
    (mp:process-wait whostate #'some #'null running-processes)
    (let ((index (position-if #'null running-processes)))
      (assert index)			; sanity check
      ;; take the empty slot
      (setf (svref running-processes index) mp:*current-process*))))

(defun acquire-semaphore (semaphore)
  "Suspend the current process until the semaphore is free."
  (when excl::*without-interrupts*
    (error "Cannot wait for semaphore when interrupts are off."))
  (when sys::*disallow-scheduling*
    (error "Cannot wait for semaphore with scheduling off."))
  (with-mutex ((semaphore-mutex semaphore))
    (wait-for-semaphore semaphore)))

(defun release-semaphore (semaphore)
  "Remove current process from semaphore data structures."
  ;; No mulitprocessing interlock needed here.
  (nsubstitute nil mp:*current-process* (semaphore-running-processes semaphore)))

(defmacro with-semaphore ((semaphore) &body body-forms)
  "Acquire semaphore around execution of body-forms.  Current process will
   block if more than the allowed number of processes have the semaphore."
  (let ((semaphore-var (gensym (symbol-name :semaphore-)))
	(release-p (gensym (symbol-name :release-p-))))
    `(let ((,semaphore-var ,semaphore)
	   (,release-p NIL))
       (working-unwind-protect
	   (progn
	     (unless (process-has-semaphore-p semaphore mp:*current-process*)
	       (setq ,release-p t)
	       (acquire-semaphore semaphore))
	     ,@body-forms)
            (when ,release-p
	      (release-semaphore ,semaphore-var))))))

(defun testit ()
  (let ((semaphore (make-semaphore 'Test 3)))
    (dotimes (i 10)
      (mp:process-run-function (format nil "Semaphore Test ~d" i)
       (lambda (i output)
	 (format output "~&Process ~d waiting for semaphore." i)
	 (with-semaphore (semaphore)
	   (format output "~&Process ~d got semaphore, sleeping." i)
	   (sleep (random 10))
	   (format output "~&Process ~d releasing semaphore." i))
	 (format output "~&Process ~d exiting." i))
       i *standard-output*))))

From: John Foderaro
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Thu, 26 Jul 2001 17:28:06 +0000
Message-ID: <MPG.15c9f2677604f3cb9896a3@news.dnai.com>

In article <············@itasoftware.com>, ···@itasoftware.com says...
> 
> ;;; Franz's implementation of unwind-protect is not safe from
> ;;; asynchronous interrupts and thus not useful in a multitasking
> ;;; environment.
> 

 This statement is false.
 What this statement is based on is the observation
by jrm that if you use with-open-file in a tight loop and 
hit ^C a lot to cause control to throw out of 
with-open-file (to an ignore-errors) that you 
occasionally lose file descriptors.
 The actual problem is that the open function allocates
the file descriptor in C code using an operating 
system function.  When that file descriptor is returned
to Lisp a stream object is allocated to hold it
and then that's returned from Lisp to the caller
of open.   If during the period of time that open
has an allocated file descriptor and hasn't returned
the stream object to the caller the thread doing
that open is interrupted and control is thrown out
of the open function then the file descriptor will
be lost.   It isn't enough that the thread be interrupted
as long as things will work if it's allowed to later resume execution
where it left off.
 Thus to be perfectly safe in a multiprocessing environment
with threads being interrupted at random points
and not being allowed to continue you should
never call the open function like this:
  (setq p (open "foo.html"))
instead it should be

   (without-interrupts (setq p (open "foo.html"))
and that should be done inside an unwind-protect
which has a cleanup form that closes p if it's 
a stream when the cleanup form is invoked.
And you'll want to do the close inside a
without-interrupts as well.

 Basically if you *have to* deal with random process
interrupts that causes control to be thrown rather 
than returned, then you have to code in a very
paranoid way.  If there is any way to avoid taking
these random process interrupts then your programming
task becomes a lot easier.


-john foderaro
 franz inc.

From: ···@itasoftware.com
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Thu, 26 Jul 2001 18:36:04 +0000
Message-ID: <u1zz1qsr.fsf@itasoftware.com>

John Foderaro <···@xspammerx.franz.com> writes:

> In article <············@itasoftware.com>, ···@itasoftware.com says...
> > 
> > ;;; Franz's implementation of unwind-protect is not safe from
> > ;;; asynchronous interrupts and thus not useful in a multitasking
> > ;;; environment.
> > 
> 
>  This statement is false.

I should have been more specific.

The implementation of unwind-protect in Allegro CL does not defer
handling of asynchronous interrupts during the `cleanup' forms of an
unwind-protect.  This means that it is possible for an ill-timed
asynchronous event (control-c or process-kill, for example) to cause
execution of the cleanup forms in an unwind-protect to be aborted.

In a multitasking environment, it is absolutely critical that there be
an unwind-protect mechanism that cannot be asynchronously aborted.
This means that another thread should not be able to cause a non-local
exit within this thread while the cleanup forms are being run.

However, it is *not* necessary to completely disable interrupts or
even process switching.  The mechanism provided by Lucid Common Lisp
(whatever it is called now) is an example of what I mean.  The form
INTERRUPTIONS-DEFERRED-UNWIND-PROTECT would defer processing of
asynchronous inter-process messages, but still allow multitasking to
proceed.  This means that within any particular thread, when the
cleanup forms are started, they will run to completion (unless, of
course, the cleanup forms did a non-local exit themselves).

>  What this statement is based on is the observation
> by jrm that if you use with-open-file in a tight loop and 
> hit ^C a lot to cause control to throw out of 
> with-open-file (to an ignore-errors) that you 
> occasionally lose file descriptors.

I have observed this, and it is related to my statement, but it is not
the crux of the issue. 

(with-open-file (stream "foo") (act-on stream)) 
expands into
(LET ((STREAM (OPEN "foo")) (#:G167900 T))
  (UNWIND-PROTECT
      (MULTIPLE-VALUE-PROG1 (PROGN (ACT-ON STREAM)) (SETQ #:G167900 NIL))
    (WHEN (STREAMP STREAM) (CLOSE STREAM :ABORT #:G167900))))

There are at least two race conditions in this expansion.
First, the stream is opened and bound *before* the unwind-protect is
entered.  An asynchronous interrupt before the unwind-protect would
lose the stream.  A better solution would be this:

(LET ((STREAM nil) (#:G167900 T))
  (UNWIND-PROTECT
      (PROGN (SETQ STREAM (OPEN "foo"))
             (MULTIPLE-VALUE-PROG1 (PROGN (ACT-ON STREAM)) (SETQ #:G167900 NIL)))
    (WHEN (STREAMP STREAM) (CLOSE STREAM :ABORT #:G167900))))

Now it turns out that this particular race condition is not a problem
under Allegro because of the way interrupts are handled.  The Allegro
compiler inserts interrupt polls into the instruction stream at
various points.  The compiler does not insert an interrupt check
between the let binding and the entry to the unwind-protect, so we
won't have any difficulty with the regular expansion.

The second potential race condition is within the cleanup forms
themselves.  An asynchronous interrupt processed before the close of
the stream is completed could cause the stream to remain open.
Again, we have lucked out because the compiler does not insert an
interrupt check before the code to STREAMP.  But now our luck runs
out.

CLOSE is a generic function, and an out-of-line call is made to it.
There are ample sources of interrupt checks, and if a control-c is
processed at one of these checks, it is possible that the stream will
not be closed.

>  The actual problem is that the open function allocates
> the file descriptor in C code using an operating 
> system function.  When that file descriptor is returned
> to Lisp a stream object is allocated to hold it
> and then that's returned from Lisp to the caller
> of open.   If during the period of time that open
> has an allocated file descriptor and hasn't returned
> the stream object to the caller the thread doing
> that open is interrupted and control is thrown out
> of the open function then the file descriptor will
> be lost.   It isn't enough that the thread be interrupted
> as long as things will work if it's allowed to later resume execution
> where it left off.

This is yet *another* source of errors, and perhaps it is a bigger
problem than the interrupt check during the call to close.

>  Thus to be perfectly safe in a multiprocessing environment
> with threads being interrupted at random points
> and not being allowed to continue you should
> never call the open function like this:
>   (setq p (open "foo.html"))
> instead it should be
> 
>    (without-interrupts (setq p (open "foo.html"))
> and that should be done inside an unwind-protect
> which has a cleanup form that closes p if it's 
> a stream when the cleanup form is invoked.
> And you'll want to do the close inside a
> without-interrupts as well.

I use a macro much like this:

(defmacro with-open-file ((stream filespec &rest options) &body body)
  "Like with-open-file, but no race conditions."
  (let ((abort-p             (gensym (symbol-name :abort-p-)))
        (interrupts-off      (gensym (symbol-name :interrupts-off-)))
        (arglist             (mapcar (lambda (option)
                                         (declare (ignore option))
                                         (gensym (symbol-name :argument-)))
                                     options)))
    `(LET ((,stream nil)
           (,abort-p t)
           (,interrupts-off EXCL::*WITHOUT-INTERRUPTS*)
           (EXCL::*WITHOUT-INTERRUPTS* T)
           ,@(mapcar #'list arglist options)) ; eval arguments normally
        (CL:UNWIND-PROTECT
            (MULTIPLE-VALUE-PROG1
                (PROGN (SETQ ,stream (OPEN ,filespec ,@arglist))
                       (LET ((EXCL::*WITHOUT-INTERRUPTS* ,interrupts-off))
                          ,@body))
                 (SETQ ,abort-p NIL))
             (WHEN ,stream
               (CLOSE ,stream :abort ,abort-p))))))

>  Basically if you *have to* deal with random process
> interrupts that causes control to be thrown rather 
> than returned, then you have to code in a very
> paranoid way.  If there is any way to avoid taking
> these random process interrupts then your programming
> task becomes a lot easier.

Since control-c is implemented as a `random' process interrupt, I
don't see a good way to avoid being very paranoid.  Replacing the
standard unwind-protect with 

(defmacro unwind-protect (protected-form &body cleanup-forms)
  (let ((interrupt-enables (gensym (symbol-name :interrupt-enables-))))
    `(LET ((,interrupt-enables EXCL::*WITHOUT-INTERRUPTS*)
	   (EXCL::*WITHOUT-INTERRUPTS* t))
       (CL:UNWIND-PROTECT
	   (LET ((EXCL::*WITHOUT-INTERRUPTS* ,interrupt-enables))
	     ,protected-form)
	 ;; interrupts are off during cleanup
	 ,@cleanup-forms))))

goes a long way toward making the code more robust because you can now
depend upon the cleanup forms running to completion, even if a
control-c is pressed.  Again, it is possible to write a macro that
prevents asynchronous interrupts from aborting the cleanup forms, but
nonetheless allows multitasking to proceed as `normal'.  It is quite
hairy, so I won't include it in this message.

It is true that in a completely unrestrained environment, you cannot
guarantee that the unwind-protect forms will run to completion (after
all, someone could turn off the machine!)  But processing of
asynchronous interrupts can be handled in a much safer way so that
code that runs correctly in a single-threaded environment can continue
to run correctly in a multithreaded one.

From: John Foderaro
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Fri, 27 Jul 2001 14:29:50 +0000
Message-ID: <MPG.15cb1aa5321a6a6b9896a5@news.dnai.com>

In article <············@itasoftware.com>, ···@itasoftware.com says...
> In a multitasking environment, it is absolutely critical that there be
> an unwind-protect mechanism that cannot be asynchronously aborted.
> This means that another thread should not be able to cause a non-local
> exit within this thread while the cleanup forms are being run.
> 

It's not critical or even necessary if you don't allow one thread
to process-interrupt another thread.  You can control whether your
own code calls functions that would cause a non local transfer of
control of another thread.  You can disable the ^C interrupt handling
in your application so it won't do it either (or you can modify
the ^C handler).

I'd be concerned about adding special semantics to the cleanup forms
of an unwind-protect.  They are just lisp expressions and in fact
the guts of a program could be invoked from a call from a cleanup form.
Do you want your whole program to be running in this special
'no outside interference' mode?   What if your cleanup forms
get into trouble?  The only time I've found ^C to be useful
is in debugging when something goes wrong and I want to break
the current execution and see what's happening.  There can be
errors in cleanup forms as well as everywhere else.
 I don't like the interaction of ^C and ignore-errors either but
at the same time I realize that I shouldn't be using 
something so general as ignore-errors
except in very limited cases since it tends to cover up bugs
during development that later bite in deployment.
If there were cases where it really bit me about ignore-errors
and ^C I'd write my own variant of ignore-errors that didn't
ignore ^C's or I'd change what ^C did.


- John Foderaro
  franz inc.

From: ···@itasoftware.com
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Fri, 27 Jul 2001 18:07:30 +0000
Message-ID: <puam1c0t.fsf@itasoftware.com>

John Foderaro <···@xspammerx.franz.com> writes:

> In article <············@itasoftware.com>, ···@itasoftware.com says...
> > In a multitasking environment, it is absolutely critical that there be
> > an unwind-protect mechanism that cannot be asynchronously aborted.
> > This means that another thread should not be able to cause a non-local
> > exit within this thread while the cleanup forms are being run.
> > 
> 
> It's not critical or even necessary if you don't allow one thread
> to process-interrupt another thread.  

Granted.  If an asynchronous abort never occurs, it doesn't matter if
unwind-protect can't protect against it.

> You can control whether your own code calls functions that would
> cause a non local transfer of control of another thread.

Yes, but...

I can't control whether another piece of code might want to
process-kill my process.  Suppose I were using a web server provided
by a third party, and it spawned a worker thread for each request, and
it kept careful track of the amount of time spent in the worker
thread, killing it when a set limit is exceeded.  This seems to me a
very plausible and reasonable design.  However, if my worker thread is
beginning to clean up when the master kills the thread, the cleanup
forms *that are being executed* get aborted, but the other ones
further down the stack do not.

> You can disable the ^C interrupt handling in your application so it
> won't do it either (or you can modify the ^C handler).

Er, yes.  You could tell your clients that pressing control-c during
operation of your program may result in inconsistent state.  I
wouldn't want to use such a system myself, though.  I think it is
reasonable to expect that a system can recover from user-interrupts.

> I'd be concerned about adding special semantics to the cleanup forms
> of an unwind-protect.  

Like what?  Multitasking is already outside the scope of the language,
and therefore requires special semantics.  The standard semantics are
these:

    UNWIND-PROTECT evaluates protected-form and guarantees that
    cleanup-forms are executed before UNWIND-PROTECT exits, whether it
    terminates normally or is aborted by a control transfer of some
    kind.  UNWIND-PROTECT is intended to be used to make sure that
    certain side effects take place after the evaluation of
    protected-form.

If control can be wrested away from UNWIND-PROTECT before it completes
the cleanup forms, then there is no longer a guarantee that the
cleanup forms are executed before the UNWIND-PROTECT exits.  This
clearly violates the intent of UNWIND-PROTECT to ``make sure that
certain side effects take place after the evaluation of [the]
protected form.''

The ability to asynchronously wrest control from a process is a
(common) extension to the language.

    An implementation can have extensions, provided they do not alter
    the behavior of conforming code.

The behavior of UNWIND-PROTECT in the presence of user interrupts is
dramatically different.  I don't want special new semantics, I want
the old ones!

> They are just lisp expressions and in fact
> the guts of a program could be invoked from a call from a cleanup form.

Yes.  The cleanup forms could perform arbitrarily hairy tasks.

However, I would argue that this would probably be indicative of a
design error.

> Do you want your whole program to be running in this special
> 'no outside interference' mode?   What if your cleanup forms
> get into trouble?  

The mode of operation that completely suspends interrupt processing
during cleanup is too draconian.  But Franz had this good idea
concerning interrupts in uninterruptable code:  when the user presses
enough of them, the system prints out a warning and interrupts the
process anyway.

> The only time I've found ^C to be useful is in debugging when
> something goes wrong and I want to break the current execution and
> see what's happening.  There can be errors in cleanup forms as well
> as everywhere else.

Granted.

But first of all, the amount of code in an unwind-protect cleanup form
ought to be fairly limited.  Of course there are exceptions, but if
you are writing an unwind-protect, I would imagine you think it is far
more likely that the body forms would throw than the cleanup forms.

Secondly, if the cleanup forms are in error, then your code will not
be correct whether you run them or not.  However, if your cleanup
forms are *not* in error, then it may make a great deal of difference
if you don't always run them.  (In fact, if cleanup didn't matter, you
wouldn't need an unwind-protect.)

Sure, you can write broken code that will get you in trouble with
unwind-protect.  It will get you in more trouble if the cleanup forms
are not interruptable.  However, broken code is, by definition,
broken.  I'm concerned about the case where the cleanup forms in the
unwind-protect *aren't* broken.  Without the ability to ensure that
they run to completion *each and every time* no matter what happens, I
cannot guarantee that correctly working code will continue to work
correctly in the presence of user interrupts.  In other words, it is
true that Franz's unwind-protect is no worse than having buggy cleanup
forms, but that's no excuse for being unable to guarantee correct
behavior when the cleanup forms are not buggy.

When I debug my code, I work from the assumption that part of the
code-base (perhaps a substantial part) does in fact work correctly.  I
expect the code to continue to perform correctly even if I have to hit
a control-c break and pop back to top level.  Sure the code I am
debugging is broken, but if the rest of the code in the system depends
on the user never hitting a control-c, my debug sessions are going to
become sheer torture:  every time you hit control-c, you would have to
reload the entire machine state to guarantee consistency.

This isn't just a theoretical observation.  The database application
we were developing at ContentIntegrity was quite complicated.  As in
any database application, there is a fair amount of state associated
with a transaction, and when a transaction commits or aborts, this
state must be correctly maintained.  It turned out that a fair amount
of time was spent at the end of a transaction doing the bookkeeping,
so the likelyhood of interrupting the cleanup with a control-c was
higher than usual.  The code that did the final transaction commit or
abort was quite stable (we wouldn't have gotten very far if it were
not), so it was far more important that it be allowed to run to
completion under all circumstances than for it to be easily
interrupted for debugging.

For several months, we were puzzled as to why the system got into an
unusable state as debugging progressed.  The transaction code was
stable, yet we would occasionally end up deadlocked, or unable to use
the database because an exclusive lock had been dropped.  The correct
cleanup code was obviously there in the unwind-protect.  As we
debugged, and occasionally hit control-c to abort some test run, and
popped back to top level, we'd often end up aborting the transaction
cleanup, resulting in a trashed state.  Once we discovered that
unwind-protect was not safe from control-c interrupts, it was
relatively easy to rewrite it so that it was.  This caused the
debugging problems to go away.

From: John Foderaro
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Mon, 30 Jul 2001 15:57:02 +0000
Message-ID: <MPG.15cf23a76cf71f7b9896a6@news.dnai.com>

In article <············@itasoftware.com>, ···@itasoftware.com says...
> Suppose I were using a web server provided
> by a third party, and it spawned a worker thread for each request, and
> it kept careful track of the amount of time spent in the worker
> thread, killing it when a set limit is exceeded.  This seems to me a
> very plausible and reasonable design.

I don't think that it is a good design, it requires that code over
which you have no control be written to a unusually strict standard,
namely the standard that it leave nothing trashed should execution
of the thread cease at any point.  Maybe the author of the web
server can write code to this standard but presumably the web server
is invoking user written code to generate pages and we can't be sure
that the users themselves will write code to this standard.
Even something simple like
	(setq *foo* (nreverse *foo*))
should never be done without shutting off interrupts since if execution
of the thread ceases inside the nreverse the *foo* value is garbage.

As you know Apache on Unix runs as multiple processes.  It would
be bad design to have a daemon running that would just kill off
Apache processes that seem to be eating up cpu time.  Chances are they
are running CGI programs and who knows what state they'll leave
the system in if just killed at an arbitrary point.


> Er, yes.  You could tell your clients that pressing control-c during
> operation of your program may result in inconsistent state.  I
> wouldn't want to use such a system myself, though.  I think it is
> reasonable to expect that a system can recover from user-interrupts.
> 

 I agree that the system should recover from user-interrupts and
most (all?) lisps will as long as you allow the thread to continue
from where the interrupt occurred.   If your code causes a non local
transfer of control on an interrupt thus bypassing code that would
have been run then the system may not recover.
 Yes, Acl's  ignore-errors and ^C interrupt handler can make this happen
to unsuspecting code. Writing your own ignore-errors macro or 
changing the ^C interrupt function can solve this.   Another way to
solve it is to debug one thread from another thread thus you can 
freeze a thread and see what it's doing and then unfreeze it without
ever typing ^C.   
 




> 
> If control can be wrested away from UNWIND-PROTECT before it completes
> the cleanup forms, then there is no longer a guarantee that the
> cleanup forms are executed before the UNWIND-PROTECT exits.  This
> clearly violates the intent of UNWIND-PROTECT to ``make sure that
> certain side effects take place after the evaluation of [the]
> protected form.''
> 

unwind-protect is simply a means of altering normal control flow
in Lisp and thus is like cond or go.  There are few guarantees in Lisp
even when it appears in the cl spec that there are.  
Consider the description of throw:

    throw causes a non-local control transfer to a catch whose tag is eq to tag

We know that if there's an unwind-protect between the throw and the
catch then the cleanup forms of the unwind-protect could well 
transfer control to some place other than the  'catch whose tag is eq to tag'
Does this make the spec wrong?  No, it just means that you have
to interpret the words in the context of things like unwind-protect.

When I read the spec for unwind-protect I see it as simply stating
how control transfer is altered by the unwind-protect form.  You then
have to think of that in the context of the world in which you
run (uniprocessing, multiprocessing, multiprocessing with random
uncontrolled process interrupts).

As I showed in my example above if you really want to live in a world
of malicious process-interrupts that could cause control throw
to change at any point in the code then your real concern should
be functions like nreverse.



> Yes.  The cleanup forms could perform arbitrarily hairy tasks.
> However, I would argue that this would probably be indicative of a
> design error.

one man's design error is another's legacy code.




> 
> But first of all, the amount of code in an unwind-protect cleanup form
> ought to be fairly limited.  

Sounds like you're designing a variant of unwind-protect, one that 
is willing to take extra time to ensure that interrupts don't
get fielded during the cleanup forms.   You wouldn't need to use
this super-unwind-protect everywhere and it would cost more than
the unwind-protect of the CL spec, so it's a great candidate for
writing your own macro.


> When I debug my code, I work from the assumption that part of the
> code-base (perhaps a substantial part) does in fact work correctly.  I
> expect the code to continue to perform correctly even if I have to hit
> a control-c break and pop back to top level.  

Before you do that pop you should really check where you are.  You may be
in an unwind-protect cleanup form or worse yet you may be half way
through modifying some struct or class instance so that it's currently in 
an inconsistent state.  Popping to the top level is going to leave
some part of your system in a weird state. There are just so many 
ways to lose if you randomly stop executing code in a thread.
If you put these super-unwind-protect everywhere they were actually 
needed and then wrote code to back out of any changes in the uwp body were
it interrupted and control thrown out if it then you would have
one huge mass of slow hard to read and very very hard to test code.

It's better to nip the problem at the bud and say: no 
process-interrupts that throw control.   Should you hit ^C during
debugging then it's you the typist's responsibility to check the whole stack
and convince yourself that no harm would be done by popping to top level.


>  Once we discovered that
> unwind-protect was not safe from control-c interrupts, it was
> relatively easy to rewrite it so that it was.  This caused the
> debugging problems to go away.
> 

I"m sorry to hear that you wasted so much time tracking down this 
problem.   It sounds like you now feel that you can hit ^C and 
pop to top level whenever you want.  You might want to check your
code for expressions outside of unwind-protect cleanup forms where
if thread execution ended abruptly at a certain point in your code
(maybe in a system lisp function) then something would be trashed.  
I think that you'll find some.


-john foderaro
 franz inc.

From: ···@itasoftware.com
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Mon, 30 Jul 2001 16:56:23 +0000
Message-ID: <8zh61hl4.fsf@itasoftware.com>

John Foderaro <···@xspammerx.franz.com> writes:

> In article <············@itasoftware.com>, ···@itasoftware.com says...
> > Suppose I were using a web server provided
> > by a third party, and it spawned a worker thread for each request, and
> > it kept careful track of the amount of time spent in the worker
> > thread, killing it when a set limit is exceeded.  This seems to me a
> > very plausible and reasonable design.
> 
> I don't think that it is a good design, it requires that code over
> which you have no control be written to a unusually strict standard,
> namely the standard that it leave nothing trashed should execution
> of the thread cease at any point.  Maybe the author of the web
> server can write code to this standard but presumably the web server
> is invoking user written code to generate pages and we can't be sure
> that the users themselves will write code to this standard.
> Even something simple like
> 	(setq *foo* (nreverse *foo*))
> should never be done without shutting off interrupts since if execution
> of the thread ceases inside the nreverse the *foo* value is garbage.
>
> > When I debug my code, I work from the assumption that part of the
> > code-base (perhaps a substantial part) does in fact work correctly.  I
> > expect the code to continue to perform correctly even if I have to hit
> > a control-c break and pop back to top level.  
> 
> Before you do that pop you should really check where you are.  You may be
> in an unwind-protect cleanup form or worse yet you may be half way
> through modifying some struct or class instance so that it's currently in 
> an inconsistent state.  Popping to the top level is going to leave
> some part of your system in a weird state. There are just so many 
> ways to lose if you randomly stop executing code in a thread.

Just because there are other ways to lose doesn't mean that solving
one particular variant of it is useless.

> If you put these super-unwind-protect everywhere they were actually 
> needed and then wrote code to back out of any changes in the uwp body were
> it interrupted and control thrown out if it then you would have
> one huge mass of slow hard to read and very very hard to test code.

I don't think this is the case.  In my experience, the number of
places you have to protect against interruption are not that
widespread, and can be controlled via locks and other synchronization
functions.  All of the locking code of course needs to use the
`super-unwind-protect'.

> It's better to nip the problem at the bud and say: no 
> process-interrupts that throw control.   Should you hit ^C during
> debugging then it's you the typist's responsibility to check the whole stack
> and convince yourself that no harm would be done by popping to top level.

And if it isn't safe?  This puts a rather high burden on the
programmer.  How can he arrange to have the control-C come in at a
safe time?

Being able to throw control via an interrupt is one of the reasons one
wants to use multitasking.  If every task has to run uninterruptably,
what benefit does one acquire from having process control primitives?

> >  Once we discovered that
> > unwind-protect was not safe from control-c interrupts, it was
> > relatively easy to rewrite it so that it was.  This caused the
> > debugging problems to go away.
> > 
> 
> I"m sorry to hear that you wasted so much time tracking down this 
> problem.   It sounds like you now feel that you can hit ^C and 
> pop to top level whenever you want.  You might want to check your
> code for expressions outside of unwind-protect cleanup forms where
> if thread execution ended abruptly at a certain point in your code
> (maybe in a system lisp function) then something would be trashed.  
> I think that you'll find some.

Well, I'm not working on that code, now.  We didn't have too many
problems with trashed global structure, because we kept our global
state in the database, where it is safe.

Again, I'm not arguing that fixing unwind-protect is a panacea.  There
are many ways to lose.  However, that's not a valid reason to not plug
one nasty gap.  In fact, that is what I did in the semaphore code that
I posted.

If you look at the semaphore code that I posted, you will see that it
doesn't use shared global structure, nor does it modify anything in an
unsafe way outside of a protected context.  If you use the
unwind-protect replacement that the code provides, you can even
control-c or process-interrupt within the semaphore code perfectly
safely.

From: Thor Kristoffersen
Subject: Re: implementing a counting sempahore (allegroCL)
Date: Wed, 25 Jul 2001 18:30:17 +0000
Message-ID: <yznsnfkyi86.fsf@triumph.nr.no>

Ryan McCormack writes:
> Does anyone have any experience with implementing a semaphore with
> Lisp?  I am struggling with whether or not I can do this given the
> primitives in the multiprocessing package that comes with AllegroCL.
> Any links to examples of semaphore implementations in Lisp would be
> greatly appreciated.

I am not familiar with Allegro CL, but LispWorks has so-called mailbox
objects that can be used as general semaphores (reading an object from a
mailbox is equivalent to a P operation and putting an object into a mailbox
is equivalent to a V operation).  Maybe ACL has something similar to
mailboxes?

Thor