From: David Bakhash
Subject: simulations, Lisp, and threads...
Date: 
Message-ID: <cxjr9e7u8tc.fsf@engc.bu.edu>
Hi,

I have a question about simulations of actual physical environments
written in Common Lisp.

It turns out that several Lisp implementations offer threaded
environments.  It's also the case that Lisp runs on operating systems
that work well with clustered systems.

Suppose you're writing a simulation where you have many independent
processes operating in a common "world" (e.g. independent gas
molecules inside a volumetric world).  Suppose that you had ~100 of
these objects, you wanted to write a simulation, and you wanted to
somehow make use of the interesting possibility that this seems to fit
the Lisp-world-and-threads model quite well (based on the major
commercial CL implementations).

Not having done this, I'd just like to know if:

1) CL running on a multi-processor clustered system could efficiently 
   use all the CPUs to divide and conquer the problem.

   In other words, can the threads potentially run on separate CPUs?
   Can having multiple processors benefit a multi-threaded Lisp
   application?

2) If it's possible to use the "Lisp world" model (e.g. access to
   what's going on in each thread, globals, etc.) to simplify the
   simulation code.

   Phrased another way, the traditional way to simulate this
   environment would be to discretize time, and then with each
   timestep, loop through the different objects, have them behave
   according to your model, update the state of the overall system,
   and then increment time again, etc.  Is there a clever way to
   somehow have each of these (mostly identical) objects just become a 
   thread, and live on its own, but participate in the simulation
   simply because it's part of the Lisp world?
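
To make the traditional approach concrete, here's roughly what I mean
(a throwaway sketch; the MOLECULE structure and update rule are just
made up for illustration):

(defstruct molecule
  (x 0.0) (y 0.0) (z 0.0)
  (vx 0.0) (vy 0.0) (vz 0.0))

(defun update-molecule (m dt)
  ;; Advance one object by one timestep (free flight, no interactions).
  (incf (molecule-x m) (* dt (molecule-vx m)))
  (incf (molecule-y m) (* dt (molecule-vy m)))
  (incf (molecule-z m) (* dt (molecule-vz m))))

(defun run-simulation (molecules dt steps)
  ;; The traditional loop: discretize time, update every object, repeat.
  (dotimes (step steps molecules)
    (declare (ignorable step))
    (dolist (m molecules)
      (update-molecule m dt))))

;; e.g. (run-simulation (loop repeat 100 collect (make-molecule :vx (random 1.0)))
;;                      0.01 1000)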

thanks,
dave

From: Gavin E. Gleason
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <87og9br7hj.fsf@hasdrubal.carthage.unm.edu>
David Bakhash wrote: 
>   Phrased another way, the traditional way to simulate this
>   environment would be to discretize time, and then with each
>   timestep, loop through the different objects, have them behave
>   according to your model, update the state of the overall system,
>   and then increment time again, etc.  Is there a clever way to
>   somehow have each of these (mostly identical) objects just become a 
>   thread, and live on its own, but participate in the simulation
>   simply because it's part of the Lisp world?

If you can think of one then you are likely to end up a multimillionaire. 
This is the way that almost all protein folding systems operate, or any 
quantum mechanical simulation for chemical synthesis.  Not to say that 
there isn't a better way, but that it's not obvious. 

	Gavin E. Mendel-Gleason


-- 
"Syntactic sugar causes cancer of the semicolon."
	-Alan Perlis
From: Tim Bradshaw
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <ey3og9bfx1o.fsf@cley.com>
* David Bakhash wrote:

> 1) CL running on a multi-processor clustered system could efficiently 
>    use all the CPUs to divide and conquer the problem.

>    In other words, can the threads potentially run on separate CPUs?
>    Can having multiple processors benefit a multi-threaded Lisp
>    application?

I believe that most existing CL systems treat the heap as a
non-sharable resource, which makes this kind of thing hard to do.  I
think shared heaps in multiprocessor GC'd systems are fairly hard.  I'd
be happy to be proved wrong (either that some major implementations
can share the heap on multiprocessors, or that it's easy...).

> 2) If it's possible to use the "Lisp world" model (e.g. access to
>    what's going on in each thread, globals, etc.) to simplify the
>    simulation code.

>    Phrased another way, the traditional way to simulate this
>    environment would be to discretize time, and then with each
>    timestep, loop through the different objects, have them behave
>    according to your model, update the state of the overall system,
>    and then increment time again, etc.  Is there a clever way to
>    somehow have each of these (mostly identical) objects just become a 
>    thread, and live on its own, but participate in the simulation
>    simply because it's part of the Lisp world?

My intuition is that general-purpose threading models are not really
suitable for this kind of thing.  As far as I know the people who do
big simulation-type code on multiprocessors do some fairly explicit
chopping-up of the task into bits (or, perhaps, have the compiler do
it for them).  Again, I could easily be wrong here.

--tim
From: Christian Lynbech
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <of1z66dh0o.fsf@chl.tbit.dk>
>>>>> "Tim" == Tim Bradshaw <···@cley.com> writes:

Tim> * David Bakhash wrote:
>> 1) CL running on a multi-processor clustered system could efficiently 
>> use all the CPUs to divide and conquer the problem.

>> In other words, can the threads potentially run on separate CPUs?
>> Can having multiple processors benefit a multi-threaded Lisp
>> application?

Tim> I believe that most existing CL systems treat the heap as a
Tim> non-sharable resource, which makes this kind of thing hard to do.  I
Tim> think shared heaps in multiprocessor GC'd systems are fairly hard.  I'd
Tim> be happy to be proved wrong (either that some major implementations
Tim> can share the heap on multiprocessors, or that it's easy...).

The ancestor of rscheme (the precise name escapes me at the moment,
somewhat old, apparently ran on SGI) had some fairly impressive
multiprocessor support. I read an article once, I could possibly dig
up a reference if anybody is interested.

Continuing the randomized search of the associative array in the back
of my head, I remember that Journal of Symbolic Computation once had a
theme number devoted to parallel lisp.

Not that any of the above makes it particularly simpler :-)


---------------------------+--------------------------------------------------
Christian Lynbech          | Ericsson Telebit A/S                       
Fax:   +45 8628 8186       | Fabrikvej 11, DK-8260 Viby J
Phone: +45 8738 2228       | email: ···@tbit.dk --- URL: http://www.tbit.dk
---------------------------+--------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Tim Bradshaw
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <ey3hff2de8t.fsf@cley.com>
* Christian Lynbech wrote:

> The ancestor of rscheme (the precise name escapes me at the moment,
> somewhat old, apparently ran on SGI) had some fairly impressive
> multiprocessor support. I read an article once, I could possibly dig
> up a reference if anybody is interested.

> Continuing the randomized search of the associative array in the back
> of my head, I remember that Journal of Symbolic Computation once had a
> theme number devoted to parallel lisp.

I'd be interested in references, particularly to reasonable
implementations.  

I get the impression that there are several problems:

	lots of CL-based systems that use multiprocessing seem to
	assume that WITHOUT-INTERRUPTS is a reasonable synonym for
	WITH-LOCK (or some such), which fails to be true in a big way
	on multiprocessors.

	Reasonably good error checking may be hard on multiprocessors
	unless you lock almost everything, so Lisp becomes more like C
	(whoops, these two threads tried to change this object at once,
	now it's basically random and you fall off the end of the
	world at some later point).

	Distributed GC is hard and pretty much necessary if you
	don't want to get killed by Amdahl's law (is it Amdahl's? the
	`if 1/n of your task is serial then more than n processors
	don't help' one).

It seems to me that the third is the killer.

(The fourth is probably `no one is willing to pay the vendors for it,
yet').
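
To make the first problem concrete: something like the following is
safe under a uniprocessor scheduler but not on a multiprocessor.
WITHOUT-INTERRUPTS, MAKE-LOCK and WITH-LOCK here are stand-ins for
whatever your implementation actually calls them, so treat this as a
sketch, not portable code.

(defvar *counter* 0)
(defvar *counter-lock* (make-lock "counter"))  ; hypothetical lock constructor

;; Fine on a uniprocessor: turning off interrupts stops the scheduler,
;; so no other Lisp process can get in between the read and the write.
(defun bump-counter-uniprocessor ()
  (without-interrupts                 ; implementation-specific
    (incf *counter*)))

;; On a multiprocessor another thread can be running on a second CPU no
;; matter what you do to interrupts, so real mutual exclusion is needed.
(defun bump-counter-multiprocessor ()
  (with-lock (*counter-lock*)         ; implementation-specific
    (incf *counter*)))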

--tim
From: David Bakhash
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <cxjln4dtant.fsf@engc.bu.edu>
so why is it that every single Lisp I've ever used has chosen not to make
GC a separate thread?  If it were (and I'm sure that this would
complicate the system somewhat), then the global GCs wouldn't be so
severe.

dave
From: Joe Marshall
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <rPBs4.58378$vi4.124434@dfw-read.news.verio.net>
David Bakhash <·····@alum.mit.edu> wrote in message
····················@engc.bu.edu...
> so why is it that every single Lisp I've ever used has chosen not to make
> GC a separate thread?

Perhaps you are making poor choices?

> If it were (and I'm sure that this would
> complicate the system somewhat), then the global GCs wouldn't be so
> severe.

If you want to have scavenging run in parallel to a mutator process,
you need a `read barrier' to keep oldspace pointers out of
`the machine'.  Read barriers are usually considered too
expensive to implement on stock hardware unless there are
serious real-time constraints.  Generational scavenging is usually
implemented atomically, but the time to GC is so small it is
barely noticeable.

Even on a LispM, which had a hardware read barrier, a global
GC is so disk-intensive that it will thrash your virtual memory,
so global GC is pretty painful no matter what you do.
From: Chuck Fry
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <38b36a5a$0$225@nntp1.ba.best.com>
In article <······················@dfw-read.news.verio.net>,
Joe Marshall <·········@alum.mit.edu> wrote:
>If you want to have scavenging run in parallel to a mutator process,
>you need a `read barrier' to keep oldspace pointers out of
>`the machine'.  

Can someone please elaborate on this?  I imagine the read barrier itself
is trivial to implement with the standard virtual memory hardware on
most modern CPUs, but I'm a bit unclear on how it would be used.

>Even on a LispM, which had a hardware read barrier, a global
>GC is so disk-intensive that it will thrash your virtual memory,
>so global GC is pretty painful no matter what you do.

And that was on a VM system that was *tuned* for GCing... the typical
Un*x LRU page replacement algorithm is worse.

 -- Chuck
--
	    Chuck Fry -- Jack of all trades, master of none
 ······@chucko.com (text only please)  ········@home.com (MIME enabled)
Lisp bigot, car nut, photographer, sometime guitarist and mountain biker
The addresses above are real.  All spammers will be reported to their ISPs.
From: Joe Marshall
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <QsTs4.62716$vi4.128995@dfw-read.news.verio.net>
Chuck Fry <······@best.com> wrote in message
···················@nntp1.ba.best.com...
> In article <······················@dfw-read.news.verio.net>,
> Joe Marshall <·········@alum.mit.edu> wrote:
> >If you want to have scavenging run in parallel to a mutator process,
> >you need a `read barrier' to keep oldspace pointers out of
> >`the machine'.
>
> Can someone please elaborate on this?  I imagine the read barrier itself
> is trivial to implement with the standard virtual memory hardware on
> most modern CPUs, but I'm a bit unclear on how it would be used.

The basic idea is this:  we'll model the heap as a directed graph,
compute the transitive closure based on a root set, and discard
anything not reached.  If we want to do this incrementally, we will
be keeping track of 3 sets of nodes:  A) Those nodes that are reachable
from the root and only point to nodes reachable from the root,
B) those nodes that are reachable from the root, but point to nodes
that we have not examined, and C) all the other nodes.

Set A is `newspace', set B is the `scavenge queue', set C is
oldspace.

The scavenger starts by putting the root set in B, and everything
else in C.  As the scavenger runs, it selects a node from B, puts
any children of that node that are in C into B, then moves that node to A.
Once B is exhausted, any nodes left in C are not reachable from A.
The important invariant is that nodes in A *never* point to any node in
C.
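
In code, the bookkeeping looks something like this.  It's a toy model
over explicit sets of nodes, just to pin down the algorithm; a real
collector does the same thing on the heap itself:

(defstruct node name children)

(defun scavenge (roots all-nodes)
  (let ((black (make-hash-table))    ; set A: scanned, points only into A/B
        (grey  (make-hash-table))    ; set B: reachable, not yet scanned
        (white (make-hash-table)))   ; set C: everything else
    (dolist (n all-nodes) (setf (gethash n white) t))
    (dolist (r roots)
      (remhash r white)
      (setf (gethash r grey) t))
    (loop while (plusp (hash-table-count grey))
          do (let ((n (loop for k being the hash-keys of grey return k)))
               ;; Children still in white become grey...
               (dolist (c (node-children n))
                 (when (gethash c white)
                   (remhash c white)
                   (setf (gethash c grey) t)))
               ;; ...and the node itself moves from grey to black.
               (remhash n grey)
               (setf (gethash n black) t)))
    ;; Whatever is left white is unreachable, i.e. garbage.
    (values black white)))

;; Example: a -> b -> c, d unreachable.
;; (let* ((c (make-node :name 'c))
;;        (b (make-node :name 'b :children (list c)))
;;        (a (make-node :name 'a :children (list b)))
;;        (d (make-node :name 'd)))
;;   (scavenge (list a) (list a b c d)))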

If we want to run another process in parallel with this one, we need
to make sure that as we create new nodes and modify existing
ones we don't violate that invariant.

One way to ensure that we never violate the invariant is to note that
we can only modify a small selection of nodes:  those immediately
reachable by the addresses in the CPU registers.  (These nodes
are ``in the machine'', the rest are ``in the heap''.)  If all the nodes
in the machine are in set A or B, then changing the children of a
node can only result in the node still pointing to nodes in set A
or B.  And consing a new node still results in a node in set A
or B.

So if we can guarantee that all nodes ``in the machine'' are always
in set A or B, we can guarantee that the mutator process cannot confuse
the scavenger process.  The way we do that is with a ``read barrier''.
The read barrier examines every pointer we read into the
machine.  If the pointer points to oldspace, we immediately move
the object pointed at into the scavenge queue (set B) before we allow any
further action.

The LMI LispM worked this way:  When a pointer was read,
the top 12 bits (5-6 bits of tag, 6-7 bits of address) were used
to address a RAM. If the tag indicated `pointer', and the address
indicated `oldspace', the machine would trap, move the object to
newspace, then fixup the pointer before resuming.

The problem with standard virtual memory hardware is that it
checks the *address* being written to or read from, but to implement
a read barrier, we need to look at the *data* being read.  As far as
I know, you can't co-opt the vm hardware to do this.  (The write
barrier used for generation scavenging relies on the *address*,
not the data, so you can get the vm hardware to deal with it.)
The alternative is to wrap *every* read from memory with a test
and branch.
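
Sketched out, the wrapped read looks something like this.  POINTERP,
OLDSPACE-P and TRANSPORT are placeholders for the runtime's tag test,
oldspace range check, and copy-to-newspace (or enqueue-for-scavenging)
operation:

(defun barrier-read (object slot)
  ;; The value we are about to load into "the machine".
  (let ((value (slot-value object slot)))
    (if (and (pointerp value)      ; hypothetical: is it a heap pointer?
             (oldspace-p value))   ; hypothetical: does it point into oldspace?
        ;; Trap case: transport the target out of oldspace and store the
        ;; forwarded pointer back, so the mutator never sees an oldspace
        ;; pointer.
        (setf (slot-value object slot) (transport value))  ; hypothetical
        value)))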

Baker's paper goes into much more detail, but omits two important
points:  1. You can only get hard real-time response out of this if
the time to scavenge a single object is bounded.  If all objects
are fixed size, this is trivial, but if an object is an array, it could
be arbitrarily large, and you have to scavenge the whole thing.
2.  Even if everything is a cons cell, and you are guaranteed
a bounded amount of time to scavenge, the upper bound is the
amount of time it takes to read two objects from virtual memory,
which could be as high as two disk accesses.

So since you generally can't guarantee real-time response
anyway, and since wrapping every read would significantly slow
down the code (remember, this means wrapping every read that the
mutator does), there seems little point in attempting incremental
scavenging on stock hardware.  (Of course, if you are running
an embedded system with no disk and need hard real-time
response, then you would want to do this.)

> >Even on a LispM, which had a hardware read barrier, a global
> >GC is so disk-intensive that it will thrash your virtual memory,
> >so global GC is pretty painful no matter what you do.
>
> And that was on a VM system that was *tuned* for GCing... the typical
> Un*x LRU page replacement algorithm is worse.

Um.  The LMI VM wasn't tuned for GC.  I finished up the generational
GC implementation for the Lambda, and when I turned it on, the
performance of my machine plummeted.  As it turned out, a bug in
the page replacement algorithm and the fact that the page table
code didn't scale well (early CADRs had 192K, the Lambda
had 16Meg) interacted with the GC in such a way as to make
every page end up on the same map cache line.  Of course we
fixed this right away, but that was in '86 and LMI disappeared
shortly afterward.
From: Pekka P. Pirinen
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <ixvh3eo3bp.fsf@harlequin.co.uk>
"Joe Marshall" <·········@alum.mit.edu> writes:
> Chuck Fry <······@best.com> wrote:
> > Joe Marshall <·········@alum.mit.edu> wrote:
> > >If you want to have scavenging run in parallel to a mutator process,
> > >you need a `read barrier' to keep oldspace pointers out of
> > >`the machine'.

That's one way, often the best way.  There are other ways of doing
incremental/parallel collection, especially for non-moving collection.

> > Can someone please elaborate on this?  I imagine the read barrier itself
> > is trivial to implement with the standard virtual memory hardware on
> > most modern CPUs, but I'm a bit unclear on how it would be used.
> 
> [deleted good explanation of tri-colour marking with a Baker
> read barrier]
>
> The problem with standard virtual memory hardware is that it
> checks the *address* being written to or read from, but to implement
> a read barrier, we need to look at the *data* being read.  As far as
> I know, you can't co-opt the vm hardware to do this.  [...]
> The alternative is to wrap *every* read from memory with a test
> and branch.

You only need the read barrier on the scavenge queue (set B, the
"grey" set).  This can be done with standard VM hardware, if you're
willing to take the trap even when the pointer being read isn't
oldspace (set C, "white").  Appel, Ellis and Li (Andrew Appel, John
R. Ellis, Kai Li. 1988. Real-time Concurrent Collection on Stock
Multiprocessors.) showed that if you scan the entire page when the
barrier is hit, you can get acceptable performance.  This is what
Functional Developer (formerly known as Harlequin Dylan) uses,
although in practice, most generation 0 collections are so fast, there
are only a few increments and few chances to hit the barrier.

> wrapping every read would significantly slow
> down the code (remember, this means wrapping every read that the
> mutator does), there seems little point in attempting incremental
> scavenging on stock hardware.

Yes, read barriers are usually more expensive than write barriers, and
doing them in the code will result in a large increase in code size
and a significant loss of performance.  However, you can use write
barriers for incremental collection (even read & write barriers, if
you're feeling perverse), and that's feasible without hardware
support.  It's been done for various Lisps and MLs.  See
<URL:http://www.harlequin.com/products/st/mm/bib/gc.html#incremental>
for papers on various techniques.

> > >Even on a LispM, which had a hardware read barrier, a global
> > >GC is so disk-intensive that it will thrash your virtual memory,
> > >so global GC is pretty painful no matter what you do.

This is still true: we need better VM systems.  I have hopes that the
spread of Java will persuade OS designers to include more useful VM
interfaces.
-- 
Pekka P. Pirinen, Adaptive Memory Management Group, Xanalys Inc.
 "Not Politically Correct" is underhanded praise for being mean or
 impolite without good reason.
From: Joe Marshall
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <HAft4.67589$vi4.136862@dfw-read.news.verio.net>
Pekka P. Pirinen <·····@harlequin.co.uk> wrote in message
···················@harlequin.co.uk...
> "Joe Marshall" <·········@alum.mit.edu> writes:
> > Chuck Fry <······@best.com> wrote:
> > > Joe Marshall <·········@alum.mit.edu> wrote:

Below, we'll be referring to three sets,

the "black", newspace, what I called set A before
the "grey", scavenge-queue, what I called B before
and the "white", oldspace, what I called C before

> > > >If you want to have scavenging run in parallel to a mutator process,
> > > >you need a `read barrier' to keep oldspace pointers out of
> > > >`the machine'.

> That's one way, often the best way.  There are other ways of doing
> incremental/parallel collection, especially for non-moving collection.

Right.  You have to be able to identify whether a particular object
is in the "black", "white" or "grey" set.  A copying collector can do
that by examining the range of the pointer, but you could use
explicit mark bits on the objects or some other mechanism.

I think Baker's `treadmill' GC works with non-moving collections.

> You only need the read barrier on the scavenge queue (set B, the
> "grey" set).

You want to prevent non-black pointers ending up in the black set.
So you can put the read-barrier between the scavenge-queue
and the "white" set, or you can put the read-barrier on the scavenge
queue itself.  The former mechanism allows the mutator to manipulate
grey elements, perhaps allowing it to access more data, and
it transports objects on a pointer-by-pointer basis.

> This can be done with standard VM hardware, if you're
> willing to take the trap even when the pointer being read isn't
> oldspace (set C, "white").

This method would trap every time the mutator saw a "grey"
pointer.

> Appel, Ellis and Li (Andrew Appel, John
> R. Ellis, Kai Li. 1988. Real-time Concurrent Collection on Stock
> Multiprocessors.) showed that if you scan the entire page when the
> barrier is hit, you can get acceptable performance.  This is what
> Functional Developer (formerly known as Harlequin Dylan) uses,
> although in practice, most generation 0 collections are so fast, there
> are only a few increments and few chances to hit the barrier.

Ok, but doesn't this mean that you transport the entire page of
objects into newspace?  Wouldn't it retain more storage?

> Yes, read barriers are usually more expensive than write barriers, and
> doing them in the code will result in a large increase in code size
> and a significant loss of performance.  However, you can use write
> barriers for incremental collection (even read & write barriers, if
> you're feeling perverse), and that's feasible without hardware
> support.  It's been done for various Lisps and MLs.  See <URL:http://
> www.harlequin.com/products/st/mm/bib/gc.html#incremental> for papers
> on various techniques.

Write barriers work great, if you can get some co-operation from
the OS.

> > > >Even on a LispM, which had a hardware read barrier, a global
> > > >GC is so disk-intensive that it will thrash your virtual memory,
> > > >so global GC is pretty painful no matter what you do.
>
> This is still true: we need better VM systems.  I have hopes that the
> spread of Java will persuade OS designers to include more useful VM
> interfaces.

No doubt they will.  And they'll probably take the credit for inventing
it in the first place.
From: Pekka P. Pirinen
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <ixd7pls874.fsf@harlequin.co.uk>
"Joe Marshall" <·········@alum.mit.edu> writes:
> Below, we'll be referring to three sets,
> the "black", newspace, what I called set A before
> the "grey", scavenge-queue, what I called B before
> and the "white", oldspace, what I called C before

The three colors are the standard abstract description of tracing GC,
whereas "newspace" etc. are copying collector nomenclature.

> Pekka P. Pirinen <·····@harlequin.co.uk> wrote
> > You only need the read barrier on the scavenge queue (set B, the
> > "grey" set).
> 
> You want to prevent non-black pointers ending up in the black set.

YM white.  Or perhaps it's just naming: when I say "non-black pointer"
I mean "a pointer to a non-black object".

> So you can put the read-barrier between the scavenge-queue
> and the "white" set, or you can put the read-barrier on the scavenge
> queue itself.  The former mechanism allows the mutator to manipulate
> grey elements, perhaps allowing it to access more data, and
> it transports objects on a pointer-by-pointer basis.

Logically, these are more alike than it seems at first glance: in
either case, we're talking about a read-barrier on grey objects, i.e.,
the collector does something special when the mutator tries to read
out of a grey object.  In the former (LispM) mechanism this only
happens if a white pointer is being read; in practice, the LMI
hardware didn't care _where_ the white pointer was being read from,
but it could only have been a grey object (since black ones don't
contain white pointers, by the invariant, and white ones can't be seen
by the mutator).  Different implementation, same logical structure.

> > This can be done with standard VM hardware, if you're
> > willing to take the trap even when the pointer being read isn't
> > oldspace (set C, "white").
> 
> This method would trap every time the mutator saw a "grey"
> pointer.

Every time the mutator tried to read through a grey pointer, yes.

> > Appel, Ellis and Li [...] showed that if you scan the entire page
> > when the barrier is hit, you can get acceptable performance [...]
> 
> Ok, but doesn't this mean that you transport the entire page of
> objects into newspace?  Wouldn't it retain more storage?

No, it scans a page of grey objects (it's a read barrier on grey
implemented by read-protecting the pages we put grey objects on),
transporting all the white objects they pointed to.  Now the grey
objects are black, the read protection can be removed, and the mutator
allowed to complete its read out of this page.  The cost of taking the
protection trap is offset by doing a lot of work for each trap (you
could do additional pages as well).
-- 
Pekka P. Pirinen
 The Memory Management Reference: articles, bibliography, glossary, news
 <URL:http://www.harlequin.com/mm/reference/>
From: Christian Lynbech
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <ofaeksb90x.fsf@chl.tbit.dk>
>>>>> "Chuck" == Chuck Fry <······@best.com> writes:

Chuck> In article <······················@dfw-read.news.verio.net>,
Chuck> Joe Marshall <·········@alum.mit.edu> wrote:
>> If you want to have scavenging run in parallel to a mutator process,
>> you need a `read barrier' to keep oldspace pointers out of
>> `the machine'.  

Chuck> Can someone please elaborate on this?  I imagine the read barrier itself
Chuck> is trivial to implement with the standard virtual memory hardware on
Chuck> most modern CPUs, but I'm a bit unclear on how it would be used.

Trivial yes, but efficient no. This is for much the same reason that
reference counting GCs perform so horribly on standard hardware;
modern CPUs are not optimized for such behaviours. Write barriers
seem more popular, perhaps because programs typically read more than
they write.  All in my not so sophisticated understanding of GC.

I believe Henry Baker has a paper on how to do this. He describes an
incremental GC, but as I remember, it lends itself very well to
running it in a separate thread.

His archive is present at:

        ftp://ftp.netcom.com/pub/hb/hbaker/home.html


>> Even on a LispM, which had a hardware read barrier, a global
>> GC is so disk-intensive that it will thrash your virtual memory,
>> so global GC is pretty painful no matter what you do.

Chuck> And that was on a VM system that was *tuned* for GCing... the typical
Chuck> Un*x LRU page replacement algorithm is worse.

However, having a well-performing, multiprocessor-utilizing
incremental/generational GC would still make the world a better place,
even if it would make a global (and hopefully not so frequent) GC
slower, IMHO.


---------------------------+--------------------------------------------------
Christian Lynbech          | Ericsson Telebit A/S                       
Fax:   +45 8628 8186       | Fabrikvej 11, DK-8260 Viby J
Phone: +45 8738 2228       | email: ···@tbit.dk --- URL: http://www.tbit.dk
---------------------------+--------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Marco Antoniotti
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <lwln4d3y2e.fsf@parades.rm.cnr.it>
David Bakhash <·····@alum.mit.edu> writes:

> so why is it that every single Lisp I've ever used has chosen not to make
> GC a separate thread?  If it were (and I'm sure that this would
> complicate the system somewhat), then the global GCs wouldn't be so
> severe.

Maybe because POSIX threads have not been available until recently?

Cheers

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa
From: ·····@corman.net
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <38b428af.58360573@news>
On 22 Feb 2000 12:22:14 -0500, David Bakhash <·····@alum.mit.edu>
wrote:

>so why is it that every single Lisp I've ever used has chosen not to make
>GC a separate thread?  If it were (and I'm sure that this would
>complicate the system somewhat), then the global GCs wouldn't be so
>severe.
>
>dave
IMHO the separate-threaded GC is highly overrated. In my experience,
several Java environments have had no end of trouble with the GC
thread getting behind the allocator threads. In addition, the
overall performance penalty is much larger for an incremental GC than
for a "stop and copy" generational GC. This might not be true if the GC
thread were running on a totally separate processor, but I cannot
imagine this to be a cost-effective architectural design. In my
opinion, if you really want incremental GC (for real-time reasons)
then it probably is best done as part of the allocation function,
rather than as a separate thread. In this way, if the GC is lagging
allocation, it can devote a few more cycles to catch up.
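
Schematically it looks something like this; the names and the work
ratio are made up, the point is just that the allocator, not a
separate thread, pays for the GC work:

(defvar *gc-debt* 0)
(defparameter *gc-work-ratio* 2)  ; made-up figure: GC work units per word allocated

(defun paced-allocate (nwords allocate-fn scavenge-one-fn)
  ;; ALLOCATE-FN and SCAVENGE-ONE-FN stand in for the runtime's real
  ;; allocator and for "do one increment of GC work" (the latter
  ;; returns NIL when there is nothing left to scavenge).
  (incf *gc-debt* (* nwords *gc-work-ratio*))
  (loop while (plusp *gc-debt*)
        while (funcall scavenge-one-fn)
        do (decf *gc-debt*))
  (funcall allocate-fn nwords))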

A good "stop and copy, generational" GC can do its most common
collections in a few milliseconds, which is certainly suitable for
user interfaces and most computing tasks. Real-time tasks that
require completely bounded pauses have many different options,
including forgoing heap allocation completely (although this is
sometimes tricky in Lisp). Note that even standard C/C++ techniques
such as reference counting can be a problem in real-time situations.
For example, deleting a root object may cause thousands of other
objects to become unreferenced, in a domino-like fashion, which can be
quite inefficient and result in a lengthy pause in computation.

Roger Corman
From: Robert Munyer
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <88sn3n$gh8$4@eve.enteract.com>
In article <···············@engc.bu.edu>,
David Bakhash <·····@alum.mit.edu> wrote:

> Suppose you're writing a simulation where you have many independent
> processes operating in a common "world" (e.g. independent gas
> molecules inside a volumetric world).

If you can afford a used Connection Machine ;-) take a look at
their SIMD version of Lisp (I think it's called *Lisp?).  It's
very interesting for the kind of project you're contemplating.

-- Robert Munyer <······@mcs.com>
From: Paolo Amoroso
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <5lWyOIu=gzHHDZe+PhciF+zxzOP3@4ax.com>
On Mon, 21 Feb 2000 19:03:35 -0600 (CST), Robert Munyer <······@mcs.com>
wrote:

> If you can afford a used Connection Machine ;-) take a look at
> their SIMD version of Lisp (I think it's called *Lisp?).  It's

The CMU CL port of the *Lisp simulator is available at:

  ftp://ftp.csl.sri.com/pub/users/gilham/starlisp.tar.gz


Paolo
-- 
EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/
From: Rainer Joswig
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <rainer.joswig-8AFD72.08123022022000@news.is-europe.net>
In article <············@eve.enteract.com>, Robert Munyer 
<······@mcs.com> wrote:

> In article <···············@engc.bu.edu>,
> David Bakhash <·····@alum.mit.edu> wrote:
> 
> > Suppose you're writing a simulation where you have many independent
> > processes operating in a common "world" (e.g. independent gas
> > molecules inside a volumetric world).
> 
> If you can afford a used Connection Machine ;-) take a look at
> their SIMD version of Lisp (I think it's called *Lisp?).  It's
> very interesting for the kind of project you're contemplating.
> 
> -- Robert Munyer <······@mcs.com>

Also take a look at CM Lisp (Connection Machine Lisp).
-> W. Daniel Hillis. The Connection Machine. MIT Press, Cambridge, Massachusetts, 1985.

Rainer Joswig, ISION Internet AG, Harburger Schlossstraße 1, 
21079 Hamburg, Germany, Tel: +49 40 77175 226
Email: ·············@ision.de , WWW: http://www.ision.de/
From: Robert Monfera
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <38B0DAE2.1A51995D@fisec.com>
David Bakhash wrote:

> 1) CL running on a multi-processor clustered system could
> efficiently use all the CPUs to divide and conquer the problem.

Yes, but only on platforms where Lisp threads are mapped to OS threads.
The only implementations that do that are ACL for Windows, LW for
Windows and Corman Lisp.  Roger said he hasn't tested multiprocessor
execution, but it should work.  Maybe someone has experience with the
other two systems?  (I don't think CL implementations in general perform
GC as a separate thread while other threads run.)  I don't know if Lisp
threads on Unix are not mapped to lightweight OS processes because they
lack some features Windows threading has, or just because Windows
implementations are more recent.

One difference is that things like WITHOUT-INTERRUPTS or
UNWIND-PROTECT may work differently compared to a single-threaded model,
e.g., unwind-protected cleanup forms may not run atomically.  In preemptive
multitasking the OS has full control of stopping a thread and starting a
new one, or (on an MP system) running many in parallel, besides setting
priorities of course.

> 2) If it's possible to use the "Lisp world" model (e.g. access to
>    what's going on in each thread, globals, etc.) to simplify the
>    simulation code.

Yes, implementations provide (non-standardized, but fairly similar)
functions to monitor whether a thread is running a function or waiting
for something, for example.  Check out the sample code in the documentation.

>    Phrased another way, the traditional way to simulate this
>    environment would be to discretize time, and then with each
>    timestep, loop through the different objects, have them behave
>    according to your model, update the state of the overall system,
>    and then increment time again, etc.

Yes (although other simulation methods exist, too).  Depending
on the number of objects, threads may be too coarse for simulation
purposes.  Each thread creation allocates a lot of memory (both in Lisp
and at the OS level), maybe several kilobytes.  Even worse, there's
quite a bit of overhead incurred by thread scheduling in the OS, which
is a lot of work compared to simply looping through the objects yourself.  I
think operating systems are also not prepared to handle hundreds or
thousands of threads (maybe someone knows practical limits)
- for example, Windows 95 is reckoned to leak a few kilobytes of memory
at every thread creation.

Maybe the best you can do to maximize throughput is to create one thread
per processor.
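
Schematically, for the simulation case, something like this.
RUN-IN-THREAD and WAIT-FOR-ALL are stand-ins for your implementation's
thread-creation and join operations (something in the spirit of
mp:process-run-function):

(defun partition (objects nchunks)
  ;; Deal the objects round-robin into NCHUNKS buckets.
  (let ((chunks (make-array nchunks :initial-element nil)))
    (loop for obj in objects
          for i from 0
          do (push obj (aref chunks (mod i nchunks))))
    (coerce chunks 'list)))

(defun step-in-parallel (objects ncpus update-fn dt)
  ;; One worker per CPU; each updates its own chunk for this timestep.
  (let ((workers
          (loop for chunk in (partition objects ncpus)
                collect (run-in-thread            ; hypothetical thread spawn
                         (let ((chunk chunk))     ; fresh binding for the closure
                           (lambda ()
                             (dolist (obj chunk)
                               (funcall update-fn obj dt))))))))
    ;; Barrier: the timestep ends only when every worker has finished.
    (wait-for-all workers)))                      ; hypothetical join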

I think it's a shame that few computers are built with massively
parallel architectures, and even those are mostly custom-programmed, for
research and cloud simulation etc.  You could check out Erlang, a purely
functional language optimized to run in parallel environments.
Hopefully time will come when every non-trivial computer will be
massively parallel.

Robert
From: Lieven Marchand
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <88sa3m$ihe$1@newnews1.news.nl.uu.net>
Robert Monfera <·······@fisec.com> writes:

> Hopefully time will come when every non-trivial computer will be
> massively parallel.

Thinking Machines' CM-x weren't exactly a huge success. They came with
a version of Common Lisp enhanced for parallel execution called *Lisp.

-- 
Lieven Marchand <···@bewoner.dma.be>
If there are aliens, they play Go. -- Lasker
From: Rainer Joswig
Subject: Re: simulations, Lisp, and threads...
Date: 
Message-ID: <rainer.joswig-BA4686.01424922022000@news.is-europe.net>
In article <············@newnews1.news.nl.uu.net>, Lieven Marchand 
<···@bewoner.dma.be> wrote:

> Robert Monfera <·······@fisec.com> writes:
> 
> > Hopefully time will come when every non-trivial computer will be
> > massively parallel.
> 
> Thinking Machines' CM-x weren't exactly a huge success. They came with
> a version of Common Lisp enhanced for parallel execution called *Lisp.

There were conferences on parallel Lisps - at least two
conference books have been published on this topic.
See also "Paralation Lisp" (published by MIT Press, I think),
MultiLisp, TopLevel Lisp, ABCL, ...

Rainer Joswig, ISION Internet AG, Harburger Schlossstraße 1, 
21079 Hamburg, Germany, Tel: +49 40 77175 226
Email: ·············@ision.de , WWW: http://www.ision.de/