From: David L. Rager
Subject: Is semaphore signaling in OpenMCL Threadsafe?
Date: 
Message-ID: <42CD8E91.1050105@cs.utexas.edu>
Hello All,

Sorry that there's not more context for this question, but the code is 
just too long to put up here.

As we know, OpenMCL supports OS-level processes and gives us semaphores 
to help with efficient signaling - woohoo! I'm using these semaphores - 
woohoo!


My question:
Is it possible for an OpenMCL user to call (ccl:signal-semaphore ...) on 
one semaphore but actually have it signal another semaphore (this is 
undesired behavior)?  I.e., is ccl:signal-semaphore "thread-safe"?

I've looked through the OpenMCL source for a definition of 
signal-semaphore, but my non-veteran grep skills are turning up nothing. 
Does anyone know where the definitions of the semaphore functions are?



My Problem (less relevant probably):
When I run my function, it runs certain functions in parallel (like 
identity-burn-x-seconds). At the end of some of these executions, a 
semaphore is signaled. By examining the trace and printed statements below my 
signature, we can see that semaphore #5 is signaled in line 21 but for 
some reason semaphore #3 is *also* signaled. This is extremely weird 
behavior. Semaphore #3 is correctly signaled on line 42.

Each semaphore is stored in a global semaphore-alist, where the key is 
the semaphore number. There is a lock to prevent concurrent access to 
this global. The global semaphore alist was reset with all semaphores 
holding values of 0 before execution.
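
The setup described in this paragraph might look roughly like the sketch below. It is not the code from the original post (which was not included): the names *semaphore-alist*, reset-semaphores, lookup-semaphore, signal-for and wait-on are hypothetical, and it assumes OpenMCL's ccl:make-lock, ccl:with-lock-grabbed, ccl:make-semaphore, ccl:signal-semaphore and ccl:wait-on-semaphore.

```lisp
;; Hypothetical sketch of a lock-protected semaphore registry.
;; All names defined here are illustrative assumptions.
(defvar *semaphore-alist* '()
  "Alist mapping a semaphore number to a semaphore object.")

(defvar *semaphore-alist-lock* (ccl:make-lock "semaphore-alist-lock"))

(defun reset-semaphores (n)
  "Install N fresh semaphores, keyed 1..N, each with a count of 0."
  (ccl:with-lock-grabbed (*semaphore-alist-lock*)
    (setf *semaphore-alist*
          (loop for i from 1 to n
                collect (cons i (ccl:make-semaphore))))))

(defun lookup-semaphore (key)
  "Return the semaphore stored under KEY, or NIL if there is none."
  (ccl:with-lock-grabbed (*semaphore-alist-lock*)
    (cdr (assoc key *semaphore-alist*))))

(defun signal-for (key)
  "Signal the semaphore registered under KEY."
  (ccl:signal-semaphore (lookup-semaphore key)))

(defun wait-on (key)
  "Block until the semaphore registered under KEY has been signaled."
  (ccl:wait-on-semaphore (lookup-semaphore key)))
```

Note that the lock only guards the alist lookup; the signal and wait calls themselves happen outside it, so a waiter never holds the lock while blocked.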

The actual error about having nil as an argument to binary-+ is a result 
of this semaphore confusion, so don't worry about debugging my call to 
binary-+. (For the curious, binary-+ is called from pfoo-2.)

If I figure it out, I'll post back immediately so I don't waste anyone's 
thought cycles.

Many thanks,
David


? (reset-semaphore-list) (time (pfoo-2 3))
#<RECURSIVE-LOCK "semaphore-alist-lock" [ptr @ #x10C3C0] #x75E7666>
?  Calling (PFOO-2 3)
Waiting on 3
  Calling (ACL2::IDENTITY-BURN-X-SECONDS 3)
  Calling (ACL2::IDENTITY-BURN-X-SECONDS 3)
  ACL2::IDENTITY-BURN-X-SECONDS returned 3
Signaling for 2
  Calling (ACL2::PFOO-2 2)
   Calling (ACL2::IDENTITY-BURN-X-SECONDS 2)
  ACL2::IDENTITY-BURN-X-SECONDS returned 3
Signaling for 1
   ACL2::IDENTITY-BURN-X-SECONDS returned 2
   Calling (ACL2::IDENTITY-BURN-X-SECONDS 2)
   ACL2::IDENTITY-BURN-X-SECONDS returned 2
Waiting on 6
  Calling (ACL2::IDENTITY-BURN-X-SECONDS 1)
  Calling (ACL2::IDENTITY-BURN-X-SECONDS 1)
  ACL2::IDENTITY-BURN-X-SECONDS returned 1

Signaling for 5 ;;; Is this the problem?
Received Signal for 3
Waiting on 2
Received Signal for 2
Waiting on 1
Received Signal for 1

Calling (ACL2::PFOO-2 0)
  ACL2::PFOO-2 returned 0
Signaling for 6
Received Signal for 6
Waiting on 5
  Received Signal for 5
  Waiting on 4

  ACL2::IDENTITY-BURN-X-SECONDS returned 1

Signaling for 4
Received Signal for 4

  ACL2::PFOO-2 returned 6
Signaling for 3

 > Error in process listener(2): value NIL is not of the expected type 
NUMBER.
 > While executing: CCL::+-2
 > Type :POP to abort.
Type :? for other options.
1 >

From: Pascal Bourguignon
Subject: Re: Is semaphore signaling in OpenMCL Threadsafe?
Date: 
Message-ID: <87mzoxwz9z.fsf@thalassa.informatimago.com>
"David L. Rager" <·······@cs.utexas.edu> writes:
> Each semaphore is stored in a global semaphore-alist, where the key is
> the semaphore number. There is a lock to prevent concurrent access to
> this global. The global semaphore alist was reset with all semaphores
> holding values of 0 before execution.

I would trace this alist. Perhaps you have the same semaphore as value
for both key 3 and key 5.
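
One way to run that check, as a minimal sketch in portable Common Lisp: the helper shared-values is a hypothetical name, and it only assumes the alist maps keys to semaphore objects as described in the quoted paragraph.

```lisp
;; Hypothetical helper: report keys whose values are the very same object.
(defun shared-values (alist)
  "Return a list of (KEY1 KEY2) pairs whose values in ALIST are EQ."
  (loop for ((k1 . v1) . rest) on alist
        nconc (loop for (k2 . v2) in rest
                    when (eq v1 v2)
                      collect (list k1 k2))))
```

Calling (shared-values the-alist) on a healthy alist should return NIL; a result containing (3 5) would mean keys 3 and 5 point at one semaphore.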

(Irrelevant to your problem:) Why don't you use a vector?  

Not much can be said without the sources...

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.

From: David L. Rager
Subject: Re: Is semaphore signaling in OpenMCL Threadsafe?
Date: 
Message-ID: <42D40BD7.3090308@cs.utexas.edu>
It was a bug fixed in OpenMCL 0.10.3 - *cheer*