From: Dmitrii Pasechnik
Subject: parallel scheme/lisp
Date: 
Message-ID: <t3en66e08p.fsf@duti515a.twi.tudelft.nl>
Are there any attempts other than EULisp (which doesn't seem to be active
any more :-( ) to make Scheme or Lisp run on a parallel machine
(in multiprocessor mode, certainly)?  (I'd like something which runs
on a Cray J90...)

Or any attempts to make an MPI interface for Scheme/Lisp?

Thanks in advance,
Dmitrii
I'd appreciate a cc: to my email: ···········@twi.tudelft.nl

From: Daniel Neri
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <x8fen652lza.fsf@chagall.nada.kth.se>
Dmitrii Pasechnik <····@duti515a.twi.tudelft.nl> writes:

> Are there any attempts other than EULisp (which doesn't seem to be active
> any more :-( ) to make Scheme or Lisp run on a parallel machine
> (in multiprocessor mode, certainly)?  (I'd like something which runs
> on a Cray J90...)

Have a look at Kali (based on Scheme48), from the brave boys (and
girls?) of NECI. I don't have a URL handy at the moment, but AltaVista is
your friend. ;-)

> Or any attempts to make an MPI interface for Scheme/Lisp?

I've thought about hacking Kali to use MPI instead of TCP/IP and thus
make use of the High Performance Switch when running on an IBM SP2
machine. (What do people think of this idea?)

> Thanks in advance,
> Dmitrii

/Daniel
-- 
Daniel Neri <·······@nada.kth.se> (8 bits, please)
Correspondence in Swedish, English or French welcome
From: Sean Doran
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <52zpotngw2.fsf@sean.ebone.net>
Daniel Neri <·······@mumrik.nada.kth.se> writes:

Kali Scheme is cool.

> I don't have a URL handy at the moment

http://www.neci.nj.nec.com/PLS/kali.html

> I've thought about hacking Kali to use MPI instead of TCP/IP and thus
> make use of the High Performance Switch when running on an IBM SP2
> machine. (What do people think of this idea?)

Better yet, generalize it so that you can use whatever
random transport you care to bolt on underneath.

Some sort of channel running through SSH would be kinda
neat, for instance, for doing stuff across the Internet.

While you're hacking Kali, port scsh to it.  --:)

	Sean.
From: William Paul Vrotney
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <vrotneyEHD5Ar.IH7@netcom.com>
In article <··············@duti515a.twi.tudelft.nl> Dmitrii Pasechnik
<····@duti515a.twi.tudelft.nl> writes:

> 
> Are there any attempts other than EULisp (which doesn't seem to be active
> any more :-( ) to make Scheme or Lisp run on a parallel machine
> (in multiprocessor mode, certainly) 

Well, the Connection Machine was programmed in Lisp (*Lisp), and it had
thousands of processor cells.

> (I'd like something which runs
> on a Cray J90...)
> 
> Or any attempts to make an MPI interface for Scheme/Lisp?
> 

There was a Common Lisp called Top Level Common Lisp that ran on a few
platforms and it had parallel evaluation at 4 grain sizes.  I don't remember
if they supported Cray or not.  I recall that it was implemented for one of
the parallel machines at the time.  I also don't know if they are still
around.

The architecture and flexibility of Top Level Common Lisp made a lot of
sense to me when I evaluated it.


-- 

William P. Vrotney - ·······@netcom.com
From: Marco Antoniotti
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <scfg1qlto68.fsf@infiniti.PATH.Berkeley.EDU>
·······@netcom.com (William Paul Vrotney) writes:

> 
> There was a Common Lisp called Top Level Common Lisp that ran on a few
> platforms and it had parallel evaluation at 4 grain sizes.  I don't remember
> if they supported Cray or not.  I recall that it was implemented for one of
> the parallel machines at the time.  I also don't know if they are still
> around.
> 
> The architecture and flexibility of Top Level Common Lisp made a lot of
> sense to me when I evaluated it.
> 

I am no expert in parallel processing.  I suppose "4 grain sizes" has
some formal definition.  Can you briefly explain what it is?

-- 
Marco Antoniotti
==============================================================================
California Path Program - UC Berkeley
Richmond Field Station
tel. +1 - 510 - 231 9472
From: William Paul Vrotney
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <vrotneyEHF2Fr.8BM@netcom.com>
In article <···············@infiniti.PATH.Berkeley.EDU> m a r c o x a @path
. berkeley . edu (Marco Antoniotti) writes:

> 
> ·······@netcom.com (William Paul Vrotney) writes:
> 
> > 
> > There was a Common Lisp called Top Level Common Lisp that ran on a few
> > platforms and it had parallel evaluation at 4 grain sizes.  I don't remember
> > if they supported Cray or not.  I recall that it was implemented for one of
> > the parallel machines at the time.  I also don't know if they are still
> > around.
> > 
> > The architecture and flexibility of Top Level Common Lisp made a lot of
> > sense to me when I evaluated it.
> > 
> 
> I am no expert of parallel processing.  I suppose "4 grain sizes" has
> some formal definition.  Can you briefly explain what it is?
> 

For Top CL it had something to do with the nature and weight of the process.
I don't remember the exact details, but the grain size was something like
(from heavier to lighter):

        4. Distributed process
        3. Multi process
        2. Lightweight process
        1. Hardware process

But this is really from vague memory.  Perhaps Kelly Murray could give a
more formal and accurate definition.




-- 

William P. Vrotney - ·······@netcom.com
From: Kelly Murray
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <610trb$85i$1@vapor.franz.com>
In article <·················@netcom.com>, ·······@netcom.com (William Paul Vrotney) writes:
>> In article <···············@infiniti.PATH.Berkeley.EDU> m a r c o x a @path
>> . berkeley . edu (Marco Antoniotti) writes:
>> 
>> > ·······@netcom.com (William Paul Vrotney) writes:
>> > > There was a Common Lisp called Top Level Common Lisp that ran on a few
>> > > platforms and it had parallel evaluation at 4 grain sizes.  I don't remember
>> > 
>> > I am no expert of parallel processing.  I suppose "4 grain sizes" has
>> > some formal definition.  Can you briefly explain what it is?
>> > 
>> 
>> For Top CL it had something to do with the nature and weight of the process.


The big challenge when doing parallel processing is to actually improve
performance over a serial program.
The difficulty is that parallelism has overhead, and if the overhead is
too big, the program will run *slower* than a serial version.

So one must have a very good understanding of the overhead of parallel
constructs in order to use them effectively.  The four grain sizes were an
attempt to define these overheads and the functionality gained by each
increase in overhead.
I expressed the overhead in units of "function calls" (FC) of the base lisp,
where a function call is the time to call and return from a function.
Of course, this time varies with implementation, but it is about the best
you can do in terms of a high-level language performance unit.
TopCL has three shared-memory parallel constructs, each of which is
roughly an order of magnitude increase in overhead.

 3. Process (4000-10000 FC)  supports I/O, some private address space.
 2. Task    (40-100 FC)      has its own stack, supports timesharing, special bindings
 1. Thread  (4-10 FC)        no stack, semantically runs to completion
 

The fourth level was a Node, which was a distributed memory construct,
but I never really implemented that in TopCL
 (but did work on this initially at UMASS).
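
To make the FC unit concrete: here is a rough, untested sketch of how one
could measure it in any Common Lisp (timer resolution and compiler inlining
will affect the result, so treat the number only as an estimate):

(defun trivial (x) x)

(defun fc-unit (&optional (n 1000000))
  "Approximate seconds per call-and-return of a trivial function."
  (declare (notinline trivial))
  (let ((start (get-internal-run-time)))
    (dotimes (i n)
      (trivial i))
    (/ (- (get-internal-run-time) start)
       (* internal-time-units-per-second n))))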
 
Since I didn't have many users, I can't say with a lot of confidence
whether this all made sense and really was effective.
I did write lots of example programs that ran faster,
and my compiler actually ran in parallel and could compile files up to three
times faster on a multiprocessor machine.

Another key aspect of TopCL was that it supported Futures, which are objects
that represent the return value of a parallel function call and
support implicit synchronization.
I believe Scheme defines a Promise using the same idea.
Futures are a useful construct even without parallelism:
they can essentially be used as "forwarding pointers", or to support
lazy evaluation in a serial system.

E.g.

  (let ((mailer-host (process #'domain-name-lookup "franz.com" "MX")))
     ...
     (smtp:send-mail :host mailer-host :name "kem")
     ...)

Here mailer-host is bound to a future object that can be used like any
other object.  Only when its value is actually needed will the process
needing the value implicitly wait until the value becomes available.
So the smtp:send-mail function does not have to be rewritten, and it can
proceed in parallel with the domain-name lookup until the hostname is needed.

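For what it's worth, here is a rough, untested sketch of the same idea in
portable Common Lisp, with *explicit* forcing (implicit synchronization
needs support from the runtime) and assuming the bordeaux-threads
portability library:

(defstruct (future (:constructor %make-future))
  (lock (bt:make-lock))
  (cv   (bt:make-condition-variable))
  (done nil)
  (value nil))

(defmacro future (&body body)
  "Evaluate BODY in a new thread and return a future for its value."
  (let ((f (gensym "FUTURE")))
    `(let ((,f (%make-future)))
       (bt:make-thread
        (lambda ()
          (let ((result (progn ,@body)))
            (bt:with-lock-held ((future-lock ,f))
              (setf (future-value ,f) result
                    (future-done ,f) t)
              (bt:condition-notify (future-cv ,f))))))
       ,f)))

(defun force-future (f)
  "Block until F has a value, then return it."
  (bt:with-lock-held ((future-lock f))
    (loop until (future-done f)
          do (bt:condition-wait (future-cv f) (future-lock f)))
    (future-value f)))

;; Usage, mirroring the example above (domain-name-lookup and
;; smtp:send-mail are stand-ins, not functions defined here):
;;   (let ((mailer-host (future (domain-name-lookup "franz.com" "MX"))))
;;     ...
;;     (smtp:send-mail :host (force-future mailer-host) :name "kem"))
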
Ok, that's enough already...

-Kelly Murray  ···@franz.com 
From: dan corkill
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <60ucug$g9a@kernighan.cs.umass.edu>
In article <·················@netcom.com>,
William Paul Vrotney <·······@netcom.com> wrote:

>There was a Common Lisp called Top Level Common Lisp that ran on a few
>platforms and it had parallel evaluation at 4 grain sizes.  I don't remember
>if they supported Cray or not.  I recall that it was implemented for one of
>the parallel machines at the time.  I also don't know if they are still
>around.


Technically, Top CL was quite good.  The market for parallel Lisp was
not.  Kelly Murray, the architect of Top CL, is now at Franz.

So it goes....
From: Kelly Murray
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <60vbo6$iqu$1@vapor.franz.com>
In article <··········@kernighan.cs.umass.edu>, ····@cs.umass.edu (dan corkill) writes:
>> In article <·················@netcom.com>,
>> William Paul Vrotney <·······@netcom.com> wrote:
>> 
>> >There was a Common Lisp called Top Level Common Lisp that ran on a few
>> >platforms and it had parallel evaluation at 4 grain sizes.  I don't remember
>> >if they supported Cray or not.  I recall that it was implemented for one of
>> >the parallel machines at the time.  I also don't know if they are still
>> >around.
>> 
>> 
>> Technically, Top CL was quite good.  The market for parallel Lisp was
>> not.  Kelly Murray, the architect of Top CL is now at Franz.
>> 
>> So it goes....
>> 
>> 

Hey guys, thanks for remembering me and my old company!  :)

TopCL ran on 4 different shared-memory multiprocessors:
Sequent (386), Encore (32K), Data General (88K), and Solbourne (SPARC);
an SGI (MIPS) port was in the works when I gave up.

It seemed the only people interested in parallel processing at the time
(1990) were supercomputer geeks, who had about zero intersection
with Lisp geeks, and hence there was no market for the product.

It would be a big win today, as finally shared memory
multiprocessor machines are pretty much a requirement for doing any
heavy server work, like for a webserver for example ;)
MP's *still* haven't made it onto the desktop like I predicted,
and I doubt they will anytime soon either, with Microsoft in
control of desktop OS's and apps, and with the Web,
which relies on a larger server for an app's heavy lifting.

If things go my way and some customers and/or funding appears,
Franz may have a true multiprocessor CL in the not-too-distant future.

-Kelly Murray   ···@franz.com
From: Reginald S. Perry
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <sxl7mbvudfc.fsf@yakko.zso.dec.com>
···@math.ufl.edu (Kelly Murray) writes:

> It seemed the only people interested in parallel processing at the time
> (1990) were supercomputer geeks, who had about zero intersection
> with Lisp geeks, and hence there was no market for the product.
> 
> It would be a big win today, as finally shared memory
> multiprocessor machines are pretty much a requirement for doing any
> heavy server work, like for a webserver for example ;)
> MP's *still* haven't made it onto the desktop like I predicted,
> and I doubt they will anytime soon either, with Microsoft in
> control of desktop OS's and apps, and with the Web,
> which relies on a larger server for an app's heavy lifting.
> 

Keep the hope alive! I put together a kick-butt dual-processor Pentium
desktop system (actually it's a mini-tower sitting on my
desktop). Right now it's running NT, but I plan to also put Linux on
it. Dual Pentium motherboards are pretty cheap now, considering; I got
mine for US$195 plus shipping, and P5 processors are almost dirt cheap,
so I suspect there is a good chance the multiprocessor-loving,
Lisp-hacking population will increase soon. :-)


-Reggie

-------------------
Reginald S. Perry                      e-mail: ·····@zso.dec.com   
Digital Equipment Corporation
Performance Manager Group	               
http://www.UNIX.digital.com/unix/sysman/perf_mgr/

The train to Success makes many stops in the state of Failure.
From: Jeffrey B. Siegal
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <34336414.D5D9C20F@aimnet.com>
Kelly Murray wrote:

> It would be a big win today, as finally shared memory
> multiprocessor machines are pretty much a requirement for doing any
> heavy server work, like for a webserver for example ;)

Not really.  To be sure, there are more reasonably priced multiprocessor machines around
now than ever before.  On the other hand, memory is more and more of a bottleneck (with
memory accesses taking _hundreds_ of cycles), and the (short-term) trend seems to be
away from shared memory multiprocessor systems with more than a small handful of
processors.

It is pretty easy to break up a big web server across separate independent systems, each
with its own memory bus.  Big database servers may be different; I don't know much about
those.
From: Bruno Haible
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <61e1m2$j3h$1@nz12.rz.uni-karlsruhe.de>
Jeffrey B. Siegal <···@aimnet.com> wrote:

> On the other hand, memory is more and more of a bottleneck (with
> memory accesses taking _hundreds_ of cycles)

That was true 10 years ago. Today, memory architectures have good throughput
and fast caches. Here are some figures which were recently posted to
comp.arch:

   Digital PWS 600au:
           Mem type        Capacity        Bandwidth       Latency
           FP Registers    256 Byte        28.8GB/sec      1.67ns  
           L1 Cache        8 KByte         19.2GB/sec      1.67ns
           L2 Cache        96 KByte        9.6 GB/sec      5.0ns
           L3 Cache        2 MByte         873 MB/sec      23.3ns          
           Main Mem        1.5 GByte       1.07 GB/sec     105ns

   SGI Origin 2000:
           Mem type        Capacity        Bandwidth       Latency
           FP Registers    512 Bytes          ?              5 ns
           L1 Cache        32 KB           3.12 GB/s        10 ns
           L2 Cache        4 MB            2.08 GB/s (1)    41 ns (1)
           Main Mem        4 GB            780 MB/s (2)    313 ns

   (1) powering L2 with 130 MHz (default for Origin 2000)
   (2) this is peak, sustained 680 MB/s 

It becomes clear that to make efficient use of such a memory hierarchy,
the programmer has to think about locality of reference. That means
for Lisp in particular:

  - Use vectors, not lists, to store a fixed number of items (unless
    your Lisp implements lists through CDR-coding, which is rare on
    conventional hardware).  (A small sketch follows after this list.)

  - Does your Lisp's GC improve the locality of data in memory, or does
    it make it worse?

  - The Symbolics "resources" technique (which consists in having pools
    of preallocated and reusable objects of a certain type) is obsolete,
    because allocating a fresh object in recently touched memory will
    cause fewer cache misses than reusing an old object not touched for
    a long time.
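
For the first point, a rough, untested sketch (the actual numbers depend
on your implementation, the data size, and the cache sizes):

(defun sum-list (items)
  (loop for x in items sum x))

(defun sum-vector (items)
  (declare (type simple-vector items))
  (loop for x across items sum x))

;; The vector's elements sit in one contiguous block, so traversal
;; touches consecutive cache lines; cons cells may be scattered across
;; the heap.  With big-list standing in for your data, compare
;;   (time (sum-list big-list))
;; with
;;   (time (sum-vector (coerce big-list 'simple-vector)))
;; once the data no longer fits in the caches.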

                           Bruno
From: Rainer Joswig
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <joswig-ya023180001010971330550001@news.lavielle.com>
In article <············@nz12.rz.uni-karlsruhe.de>,
······@ma2s2.mathematik.uni-karlsruhe.de (Bruno Haible) wrote:

> It becomes clear that to make efficient use of such a memory hierarchy,
> the programmer has to think about locality of reference. That means
> for Lisp in particular:
> 
>   - Use vectors, not lists, to store a fixed number of items (unless
>     your Lisp implements lists through CDR-coding, which is rare on
>     conventional hardware).

like on a Symbolics

>   - Does your Lisp's GC improve the locality of data in memory, or does
>     it make it worse?

Symbolics does

>   - The Symbolics "resources" technique (which consists in having pools
>     of preallocated and reusable objects of a certain type) is obsolete,
>     because allocating a fresh object in recently touched memory will
>     cause fewer cache misses than reusing an old object not touched for
>     a long time.

Well, you find a similar tradeoff on a Symbolics between real
and virtual memory. The resource mechanism is used to reduce
work for the GC. Typical situations for resources are
(according to the Genera docs) very large objects (window bitmaps, ...)
allocated at a high rate, or large objects (large network packets, ...)
allocated at a very high rate (-> cache misses may/will happen anyway).

Another use of resources is for objects that are costly to initialize but
cheap to reinitialize (this could be true for objects like
threads, TCP/IP stream objects, ...). Take a webserver that
has to create a thread and a TCP/IP connection for every
incoming request, at a high rate. Getting these objects
often requires complicated communication between the Lisp
system and the OS, which use different memory-management systems.

I wouldn't say resources are obsolete.
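
Roughly the pooling idea, in plain CL (an untested sketch, not the actual
Genera defresource interface, and not thread-safe):

(defvar *packet-pool* '()
  "Deinitialized packet buffers waiting to be reused.")

(defun allocate-packet (size)
  ;; A real resource facility would also match on SIZE and other
  ;; parameters; this sketch assumes all packets are the same size.
  (let ((packet (or (pop *packet-pool*)
                    (make-array size :element-type '(unsigned-byte 8)))))
    (fill packet 0)                     ; cheap reinitialization
    packet))

(defun free-packet (packet)
  (push packet *packet-pool*)
  nil)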

Another question: how would you optimize a GC mechanism for use
on a modern machine architecture with 0.5-4 MB full-speed back-side cache?

-- 
http://www.lavielle.com/~joswig/
From: Jeffrey Mark Siskind
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <xohyb3z6mx3.fsf@qobi.nj.nec.com>
> the better GC for locality might involve static analysis to
> determine object lifetime and reuse space with a minimum number of
> dynamic reachability tests.  hopefully Lisp implementors are looking
> into these issues

Yes they are. Stalin does this.

    Jeff (http://www.neci.nj.nec.com/homepages/qobi)
From: Samuel S. Paik
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <paik-0210970651250001@3.paik.webnexus.com>
In article <············@vapor.franz.com>, ···@franz.com wrote:
>MP's *still* haven't made it onto the desktop like I predicted,
>and I doubt they will anytime soon either, with Microsoft in
>control of desktop OS's and apps

Today, you can buy a 2-processor Pentium/Pentium Pro/Pentium II
motherboard for less than $200 more than the single-processor version.
That isn't much of a price penalty.  Two Pentium Pro 180s and a
dual-processor Pentium Pro motherboard cost less than a Pentium II 266.

Prepackaged systems from PC system vendors tend to be a lot more
expensive because they are configured with a lot more than the
typical desktop, and they are low volume, so they get a hefty markup.

Windows NT supports these dual processors quite well.  Linux and the
*BSDs have support in late "alpha" or "beta".  Presumably BeOS has
support for them.

Sam

-- 
Necessity is the Mother of Improvisation.       | Samuel S. Paik
Laziness and Greed are the Parents of Innovation| ····@webnexus.com
                                                  Speak only for self
From: Craig Brozefsky
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <615sdh$orq@eve.enteract.com>
In article <·····················@3.paik.webnexus.com>,
Samuel S. Paik <····@webnexus.com> wrote:
>
>Prepackaged systems from PC system vendors tend to be a lot more
>expensive because they are configured with a lot more than the
>typical desktop, and they are low volume, so they get a hefty markup.
>
>Windows NT supports these dual processors quite well.  Linux and the
>*BSDs have support in late "alpha" or "beta".  Presumably BeOS has
>support for them.

Actually Linux has had support for SMP Intel boxes for a while, and
several benchmarks of multi-processor webserver setups done by LAN
Times and others have shown that the Linux SMP support actually is
beneficial, while the NT SMP support resulted in some performance
problems from lock contention and IRQ handling.  Linux itself has a
glut of spin locks spread thru the kernel which slowed it down quite a
bit, but the 2.1.X series apparently is going to be getting rid of the
majority of those.

I'm not much of a multiprocessor head, but my understanding is that
the Sparc and Alpha multiprocessor architectures are considerably more
beautiful than the Intel multi-processor architecture, or at least
that's the impression I got from those working on the multi-proc
support for the various ports of Linux.

I can't speak for the BSDs or BeOS.  According to marketing white
papers, though, BeOS does support multiple processors.
-- 
Craig Brozefsky              ·····@onshore.com
onShore Inc.                 http://www.onshore.com/~craig
Development Team             p_priority=PFUN+(p_work/4)+(2*p_cash)
I hear my inside, the mechanized hum of another world - Steely Dan
From: Jeffrey B. Siegal
Subject: Re: parallel scheme/lisp
Date: 
Message-ID: <3437282D.58848907@quiotix.com>
Craig Brozefsky wrote:

> Linux itself has a
> glut of spin locks spread thru the kernel which slowed it down quite a
> bit, but the 2.1.X series apparently is going to be getting rid of the
> majority of those.

Not correct.

Linux 2.0 has SMP support, but there is a single lock for the entire
kernel.  Only one processor can execute in the kernel at one time.
Whether it is "released," "alpha," or "beta" depends on whom you ask.

The Linux 2.1 project (in development) is breaking this up into multiple
locks, and also looking for ways to minimize the need for locking in the
kernel.

Whether or not kernel locking is a problem depends on your application.
Many useful applications are basically CPU-bound in user mode.  For
example, Linux 2.0 shows close to linear performance for a small number of
processors when compiling a large project with make -j.  The single kernel
lock isn't a serious problem in this case.

There are also hardware limitations.  For example, on most multiprocessor
PC's, only one CPU can handle interrupts.  (Actually, I think this is a
combined hardware/software problem, but I don't know the details.) Again,
if your system and application don't generate a huge number of hardware
interrupts, this isn't a problem.