From: Christopher J. Vogt
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <35A37E78.B45A56EC@computer.org>
Robert Swindells wrote:
> 
> Christopher J. Vogt wrote:
> >Tim Bradshaw wrote:
> >>
> >> * Kevin Haddock wrote:
> >>
> >> > I remember reading somewhere that certain hardware features can give
> >> > lisp a big performance boost (e.g. sparc's having data type registers
> >> > associated with data registers).  Can anyone give me a rundown on
> >> > inexpensive/common systems or plug in cards that provide the best bang
> >> > for the buck when running lisp (e.g. how is the power-pc, alpha,
> >> > writable instruction set processors, etc...)
> >>
> >> I think that all recent experience has been that special `Lisp' HW is
> >> actually not worth it.  A few companies tried it, they all died, and
> >> I'm not aware of any case where their HW had a significant advantage
> >> over general-purpose HW for long.
> 
> >If you look at "Performance and Evaluation of Lisp Systems" by Gabriel
> >from 1985, I think that you'll see that the Symbolics computers were 2
> >to 3 times faster than comparable "workstations" from Sun and Apollo
> >on all the benchmarks.  I consider a factor of 2 or 3 to be
> >"significant".
> 
> They were also about 5 times more expensive.

I never said that they weren't more expensive.  The original post was about
hardware for faster Lisp execution.  It seemed to me that the response suggested
that you couldn't get much of a performance boost from special hardware over
conventional hardware.  I refuted this assertion.

>[...]
> 
> >Furthermore, when it comes to floating point, current
> >hardware/software combinations have only recently become faster than
> >Symbolics 1985 hardware/software.  In 1985 the Symbolics 3600+IFU took
> >3.87 seconds on the FFT benchmark.  I recently had an opportunity to
> >use Franz's Beta 5 version of ACL for Windows NT on my 133MHz
> >processor, and got a time of 3.28 seconds.  This really shouldn't be
> >surprising if you understand the issues of type checking, boxing
> >floats, and GC.
> 
> The crossover happened far earlier than this.
> 
> Nobody was going to buy a Symbolics based on how fast it ran FFT. You
> need to look at the times for the larger AI style ones - BROWSE, BOYER
> and TRIANG. On these benchmarks, a 486 is comparable to PSL-CRAY and
> significantly faster than any of the full Common Lisp implementations
> that are listed in the book.
> 
> I think that there is something wrong with how you ran that benchmark.
> My 486DX4-100 will run FFT in 0.29 seconds. My PII-233 takes 0.03s
> using CMUCL, 0.02s using ACL 4.3.

While I might have made a mistake, your timings suggest to me that you have
added type declarations to the original benchmark.  You can get improved
performance from many compilers by adding type declarations to your code.
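To illustrate the difference I mean (a hypothetical sketch, not code from the Gabriel suite; the function names are invented):

```lisp
;; The same loop, undeclared and declared.  Both compute the same sum;
;; only the second licenses a compiler to skip type dispatch and boxing.

(defun sum-undeclared (v)
  ;; Generic arithmetic: each + must dispatch on type and box its result.
  (let ((s 0.0d0))
    (dotimes (i (length v) s)
      (incf s (aref v i)))))

(defun sum-declared (v)
  ;; With declarations, a good compiler can use raw double-float adds.
  (declare (type (simple-array double-float (*)) v)
           (optimize (speed 3)))
  (let ((s 0.0d0))
    (declare (type double-float s))
    (dotimes (i (length v) s)
      (incf s (aref v i)))))
```

Same answer either way; the point is that you are now benchmarking a different compilation strategy.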

> 
> I would be interested to see the times for the last generation of
> dedicated hardware e.g. TI Explorer-2 and XL1200.
> 
> Robert Swindells
> ---------------------------------
> Robert Swindells     - GenRad Ltd

-- 
Christopher J. Vogt - Computer Consultant - Lisp, AI, Graphics, etc.
http://members.home.com/vogt/

From: Tim Bradshaw
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <ey3af6iyotx.fsf@todday.aiai.ed.ac.uk>
* Christopher J Vogt wrote:

> I never said that they weren't more expensive.  The original post
> was about hardware for faster Lisp execution.  It seemed to me that
> the response suggested that you couldn't get much of a performance
> boost from special hardware over conventional hardware.  I refuted
> this assertion.

Actually, my post said `special `Lisp' HW is actually not worth it',
and I was indeed intending this to refer to the fact that the
performance increase from special HW is not economically viable.

I'm quite sure that one could build a special very-high-performance
lisp machine, the same way you can build, say, a special
very-high-performance vector machine for heavy numerical work.  But I
doubt such a machine would justify its cost (neither do vector
machines, arguably, although the case is less clear there as the
target is much clearer).  I'm also quite sure such a machine wouldn't
look too much like the special HW that was built (except possibly the
very late, never-sold LMI machine), as things like the 3600
architecture are crippled at least by varying clock speeds.  

If I wanted to build a lisp supercomputer I'd look very hard at what
compilers really can't do -- for instance, perhaps the nasty (op
fixnum fixnum) -> possibly-a-bignum cases are worth worrying about.  A
fix for that would be a speculative fixnum operation combined with a
parallel typecheck and the ability to back out the fixnum results.  This
(not the type check perhaps) is the kind of thing that superscalar
machines do now of course: it might even be doable simply by relying
on a smarter compiler, already.  My idea is that this would let the
compiler assume `probably a fixnum' and compile good code. If I wanted
to build a *supercomputer* I'd also not muck around with VM and all
the GC implications -- RAM is cheap, MMUs are slow, just put a lot in,
and use the silicon you save on either much smarter caching or just
much more memory bandwidth.
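The speculative-fixnum idea can be sketched in portable CL (a sketch only; on the imagined hardware the typecheck and back-out would happen in parallel with the add, not as branches, and the macro name is invented):

```lisp
;; How a compiler assuming `probably a fixnum' might expand (+ x y):
;; do the cheap fixnum add, and fall back to the generic, bignum-capable
;; path only when a typecheck or overflow check fails.
(defmacro speculative-+ (x y)
  `(let ((a ,x) (b ,y))
     (if (and (typep a 'fixnum) (typep b 'fixnum))
         (let ((r (+ (the fixnum a) (the fixnum b))))
           ;; overflow check; real HW would do this in parallel and
           ;; discard the speculative result on failure
           (if (typep r 'fixnum) r (+ a b)))
         (+ a b))))
```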

[About benchmark times for FP stuff]

> While I might have made a mistake, your timings suggest to me that you have
> added type declarations to the original benchmark.  You can get improved
> performance from many compilers by adding type declarations to your code.

Yes, of course (although not on the Symbolics, at least not for float
declarations).  But adding declarations isn't cheating, it's just
making the benchmark measure something useful.  Otherwise you might as
well just run it interpreted and argue that the modern system is very
slow...

--tim
From: Chuck Fry
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <6o5fo6$32f$1@shell5.ba.best.com>
In article <···············@todday.aiai.ed.ac.uk>,
Tim Bradshaw  <···@aiai.ed.ac.uk> wrote:
>If I wanted to build a lisp supercomputer I'd look very hard at what
>compilers really can't do -- for instance, perhaps the nasty (op
>fixnum fixnum) -> possibly-a-bignum cases are worth worrying about.  A
>fix for that would be a speculative fixnum operation combined with a
>parallel typecheck and the ability to back out the fixnum results.  This
>(not the type check perhaps) is the kind of thing that superscalar
>machines do now of course: it might even be doable simply by relying
>on a smarter compiler, already.  

I've been looking closely at code Harlequin LispWorks generates on
PowerPC, and I find it already does something very similar.

Common Lisp code like (+ (the fixnum x) (the fixnum y)) translates into
a simple register-to-register add instruction, followed by a
branch-if-overflow.  If the branch is taken, an escape routine that does
the fixnum + fixnum -> bignum add is called.  Now obviously the branch
can't happen in parallel, but I don't think this slows things down much
on the PowerPC, since the following instruction in the branch-not-taken
scenario is already being speculatively executed in parallel with the
branch test!
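As a complete function, the source pattern I'm describing looks like this (my own sketch; the function name is made up):

```lisp
;; On LispWorks/PowerPC this reportedly compiles to an inline
;; register-to-register add followed by a branch-if-overflow into an
;; escape routine that does the fixnum + fixnum -> bignum add.
(defun fix+ (x y)
  (declare (fixnum x y))
  (the integer (+ x y)))   ; result may still be a bignum on overflow
```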

Since on modern RISC machines, the processor can run quite a bit faster
than the memory system, I think we're getting into a situation where the
runtime checks necessary in Common Lisp no longer cost very much time
compared to the memory fetches.  This doesn't make naively written CL
code competitive with C, of course, but a careful CL coder with a good
optimizing compiler can write inner loop code that's nearly as fast as
the equivalent C code.

The memory bottleneck has other implications, most importantly in data
structure design.  There has always been a speed penalty for linked
lists over vectors, but modern processors exacerbate it.  On-chip cache
architectures which prefetch an entire cache line on every cache miss
have the effect of making the 2nd and successive accesses to a vector
substantially faster -- 1 CPU clock cycle vs. 1 RAM cycle.

As one example, there is a BIG penalty for using lists of length 2, when
compared to just using conses to hold the same data.  The CDR of the
cons cell will almost always be in the cache whenever the CAR has been
accessed.
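Concretely (a small sketch of the two representations):

```lisp
;; Holding a pair of values: a single cons keeps both words in one cell
;; (and usually one cache line), while a two-element list must chase a
;; pointer to a second cell for its second element.
(defvar *pair*     (cons 'x 'y))   ; one cell:  (X . Y)
(defvar *two-list* (list 'x 'y))   ; two cells: (X . (Y . NIL))

(car *pair*)          ; => X
(cdr *pair*)          ; => Y  -- same cell, almost certainly cached
(second *two-list*)   ; => Y  -- follows CDR to a separate cell
```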

For another example, one project I'm working on depends heavily on a
frame system written in the late '80s, where the frame data were stored
on property lists.  I changed the frame representation to a "property
vector" and got a 10-15% speedup in the overall program by carefully
optimizing the property vector search loop.  Allocation of the vectors
is the weak link in this implementation, since one must either allocate
a larger vector than necessary or reallocate when it grows (I did both).
Even so, overall memory use is no worse than the plist version in
Allegro CL, and quite a bit better in LispWorks, which for some reason
uses 3-word cons cells on PowerPC.
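A minimal "property vector" in this spirit might look like the following (all names here are invented for the sketch; the real system's code differs):

```lisp
;; Keys and values in alternating slots of one simple vector, searched
;; linearly.  Unlike a plist, the whole search walks contiguous memory.
(defun pv-get (pv key)
  (declare (type simple-vector pv) (optimize (speed 3)))
  (loop for i from 0 below (length pv) by 2
        when (eq (svref pv i) key)
          return (svref pv (1+ i))))

;; Compare the classic plist access:
;;   (getf '(:color red :size 3) :size) => 3
(pv-get (vector :color 'red :size 3) :size) ; => 3
```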

When 64-bit CPUs become commonplace, I suspect we'll see a return of the
Lisp machine CDR-coded list representation, for the simple reason that
it takes maximum advantage of cache prefetching.

I've been saying for some time that one of Lisp's biggest problems is
that many Common Lisp users still program as if CL only offered lists
and symbols for data structures.  I would love to see someone write a
Common Lisp textbook that put off using lists until the middle third of
the book!

>							    If I wanted
>to build a *supercomputer* I'd also not muck around with VM and all
>the GC implications -- RAM is cheap, MMUs are slow, just put a lot in,
>and use the silicon you save on either much smarter caching or just
>much more memory bandwidth.

Ah, but VM hardware has other uses: garbage collection, for one;
protection from runaway programs for another.

Also, it's not clear that large amounts of fast RAM will ever be cheap,
nor that bulk RAM will ever be fast.  Note that commodity dynamic RAM
speeds have hardly increased over the last decade, perhaps from 100 ns
to 60 ns access, while processor clock speeds have increased by an order
of magnitude or more.

 -- Chuck
-- 
	    Chuck Fry -- Jack of all trades, master of none
 ······@chucko.com (text only please), ········@home.com (MIME enabled),
		  ······@gateway.idiom.com (SPAM ONLY)
From: Christopher J. Vogt
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <35A93A72.3F95E94D@computer.org>
Chuck Fry wrote:
> 
> 
> I've been saying for some time that one of Lisp's biggest problems is
> that many Common Lisp users still program as if CL only offered lists
> and symbols for data structures.  I would love to see someone write a
> Common Lisp textbook that put off using lists until the middle third of
> the book!

I strongly agree with this!

> 
> >                                                           If I wanted
> >to build a *supercomputer* I'd also not muck around with VM and all
> >the GC implications -- RAM is cheap, MMUs are slow, just put a lot in,
> >and use the silicon you save on either much smarter caching or just
> >much more memory bandwidth.
> 
> Ah, but VM hardware has other uses: garbage collection, for one;
> protection from runaway programs for another.
> 
> Also, it's not clear that large amounts of fast RAM will ever be cheap,
> nor that bulk RAM will ever be fast.  Note that commodity dynamic RAM
> speeds have hardly increased over the last decade, perhaps from 100 ns
> to 60 ns access, while processor clock speeds have increased by an order
> of magnitude or more.

And this is the one place where I see Lisp systems costing more in terms
of both performance and money.  Since every word (potentially) carries an
additional tag byte, a 32-bit system has 25% more bits to move around, in
an area where speed is not improving much and which tends to be the
bottleneck in many applications.  It also needs 25% more memory.  With
64-bit systems some of this could be mitigated, but I still see it as
potentially problematic.


-- 
Christopher J. Vogt - Computer Consultant - Lisp, AI, Graphics, etc.
http://members.home.com/vogt/
From: Christopher J. Vogt
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <35A938BD.BF30BCEA@computer.org>
Tim Bradshaw wrote:
> 
> * Christopher J Vogt wrote:
> 
> > I never said that they weren't more expensive.  The original post
> > was about hardware for faster Lisp execution.  It seemed to me that
> > the response suggested that you couldn't get much of a performance
> > boost from special hardware over conventional hardware.  I refuted
> > this assertion.
> 
> Actually, my post said `special `Lisp' HW is actually not worth it',
> and I was indeed intending this to refer to the fact that the
> performance increase from special HW is not economically viable.

Well, I agree and disagree.  If you are not a chip manufacturer, it may well
be the case that, due to economies of scale, you couldn't produce an economical
chip.  However, if you were Intel or Motorola, you could do it and it wouldn't
cost much.

>[...]
> 
> [About benchmark times for FP stuff]
> 
> > While I might have made a mistake, your timings suggest to me that you have
> > added type declarations to the original benchmark.  You can get improved
> > performance from many compilers by adding type declarations to your code.
> 
> Yes, of course (although not on the Symbolics, at least not for float
> declarations).  But adding declarations isn't cheating, it's just
> making the benchmark measure something useful.  Otherwise you might as
> well just run it interpreted and argue that the modern system is very
> slow...

Well, if you change a benchmark, you now have a different benchmark.  The
Gabriel benchmarks don't have type declarations in them.  If you want to 
create the Bradshaw benchmarks with type declarations, I have no complaint,
but don't claim to be using the Gabriel benchmarks when you (or somebody)
have modified them.

Adding declarations isn't cheating, but you are measuring something different.
The more type declarations you add to your code, the more chance there is of
your program crashing, something Lisp programs without type declarations don't do.

-- 
Christopher J. Vogt - Computer Consultant - Lisp, AI, Graphics, etc.
http://members.home.com/vogt/
From: Raymond Toy
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <4nu34mises.fsf@rtp.ericsson.se>
>>>>> "Christopher" == Christopher J Vogt <····@computer.org> writes:

    Christopher> measuring something different.  The more type
    Christopher> declarations you add to your code, the more chance of
    Christopher> your program crashing, something Lisp programs
    Christopher> without type declarations don't do.

Isn't that (crashing with declarations) a "feature" of your lisp?

CMUCL doesn't normally crash with extra declarations because
(according to the User's Guide), it accepts declarations and treats
them as assertions to be tested at the appropriate places.  You get
pretty fast code that won't crash any more than the undeclared
version. 

All bets are off, of course, if you say (optimize (speed 3) (safety
0)).

Ray
From: Christopher J. Vogt
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <35AA64B0.12C02E72@computer.org>
Raymond Toy wrote:
> 
> >>>>> "Christopher" == Christopher J Vogt <····@computer.org> writes:
> 
>     Christopher> measuring something different.  The more type
>     Christopher> declarations you add to your code, the more chance of
>     Christopher> your program crashing, something Lisp programs
>     Christopher> without type declarations don't do.
> 
> Isn't that (crashing with declarations) a "feature" of your lisp?
> 
> CMUCL doesn't normally crash with extra declarations because
> (according to the User's Guide), it accepts declarations and treats
> them as assertions to be tested at the appropriate places.  You get
> pretty fast code that won't crash any more than the undeclared
> version.
> 
> All bets are off, of course, if you say (optimize (speed 3) (safety
> 0)).

I have not used CMUCL.  The crashing with declarations that I'm talking about
is declaring a parameter to a function as a float, but passing in a bignum, or
declaring an array as float, but storing a bignum.  Often the compiler produces
code that relies on your assertions of type, and if you turn out to be wrong,
you will eventually crash, much as you do using C.  I don't know whether CMUCL
does this.
From: Pierre Mai
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <m3k95gv7gm.fsf@torus.cs.tu-berlin.de>
"Christopher J. Vogt" <····@computer.org> writes:

> I have not used CMUCL.  The crashing with declarations that I'm talking about
> is declaring a parameter to a function as a float, but passing in a bignum, or
> declaring an array as float, but storing a bignum.  Often the compiler produces
> code that relies on your assertions of type, and if you turn out to be wrong,
> you will eventually crash, much as you do using C.  I don't know about CMUCL.

;;; *** Don't forget to edit /var/lib/cmucl/site-init.lisp! ***
CMU Common Lisp Experimental 18a+ release x86-linux 2.2.0 cvs, running on torus
Send bug reports and questions to your local CMU CL maintainer,
or to ··@snoopy.mv.com,
or to ·······@uia.ac.be or ··············@uia.ua.ac.be
or to ··········@cons.org. (prefered)

type (help) for help, (quit) to exit, and (demo) to see the demos

Loaded subsystems:
    Python 1.0, target Intel x86
    CLOS based on PCL version:  September 16 92 PCL (f)
* (defun demo (x) (declare (type double-float x)) (+ x 2.0d0))

DEMO
* (proclaim '(optimize (speed 3)))

EXTENSIONS::%UNDEFINED%
* (compile 'demo)
Compiling LAMBDA (X):

In: LAMBDA (X)
  #'(LAMBDA (X) (DECLARE (TYPE DOUBLE-FLOAT X)) (BLOCK DEMO (+ X 2.0d0)))
Note: Doing float to pointer coercion (cost 13) to "<return value>".

Compiling Top-Level Form:

Compilation unit finished.
  1 note


DEMO
T
NIL
* (demo 12478900010391)


Type-error in KERNEL::OBJECT-NOT-DOUBLE-FLOAT-ERROR-HANDLER:
   12478900010391 is not of type DOUBLE-FLOAT

Restarts:
  0: [ABORT] Return to Top-Level.

Debug  (type H for help)

(DEMO #<unavailable-arg> #<unavailable-arg>)[:EXTERNAL]
Source: #'(LAMBDA (X) (DECLARE (TYPE DOUBLE-FLOAT X)) (BLOCK DEMO (+ X 2.0d0)))
0] 0

* (proclaim '(optimize (speed 3) (safety 0)))

EXTENSIONS::%UNDEFINED%
* (compile 'demo)
Compiling LAMBDA (X):

In: LAMBDA (X)
  #'(LAMBDA (X) (DECLARE (TYPE DOUBLE-FLOAT X)) (BLOCK DEMO (+ X 2.0d0)))
Note: Doing float to pointer coercion (cost 13) to "<return value>".

Compiling Top-Level Form:

Compilation unit finished.
  1 note


DEMO
T
NIL
* (demo 124784841941411)

2.0d0
*

;)

Regs, Pierre.

-- 
Pierre Mai <····@cs.tu-berlin.de>	http://home.pages.de/~trillian/
  "Such is life." -- Fiona in "Four Weddings and a Funeral" (UK/1994)
From: Barry Margolin
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <q6uq1.7$6o3.113175@cam-news-reader1.bbnplanet.com>
In article <··············@rtp.ericsson.se>,
Raymond Toy  <···@rtp.ericsson.se> wrote:
>All bets are off, of course, if you say (optimize (speed 3) (safety
>0)).

If the reason you're using declarations is to allow the compiler to do more
optimization, aren't you likely to use optimization settings like that as
well?  The purpose of the declarations is to allow the compiler to omit the
type-dispatching code.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Cambridge, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
From: Raymond Toy
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <4nk95ha3dp.fsf@rtp.ericsson.se>
>>>>> "Barry" == Barry Margolin <······@bbnplanet.com> writes:

    Barry> In article <··············@rtp.ericsson.se>,
    Barry> Raymond Toy  <···@rtp.ericsson.se> wrote:
    >> All bets are off, of course, if you say (optimize (speed 3) (safety
    >> 0)).

    Barry> If the reason you're using declarations is to allow the compiler to do more
    Barry> optimization, aren't you likely to use optimization settings like that as
    Barry> well?  The purpose of the declarations is to allow the compiler to omit the
    Barry> type dispatching code.

Yes, but I might not want (safety 0).  I think in CMUCL the
declarations do cause the compiler to omit the type dispatching code,
but it does check, at appropriate points where it hasn't proven the
type, that the type does match the declaration.   With (speed 3) and
default safety, I think the type checks are weakened, but not
removed.  With (safety 0), the type checks are removed.

I haven't checked recently, but I think that (speed 3) is virtually as
fast as (speed 3) (safety 0) because the compiler has good type
inference.

Ray
From: Bill Newman
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <wnewmanEw2Gz8.Jqo@netcom.com>
Raymond Toy (···@rtp.ericsson.se) wrote:
: >>>>> "Barry" == Barry Margolin <······@bbnplanet.com> writes:

:     Barry> In article <··············@rtp.ericsson.se>,
:     Barry> Raymond Toy  <···@rtp.ericsson.se> wrote:
:     >> All bets are off, of course, if you say (optimize (speed 3) (safety
:     >> 0)).

:     Barry> If the reason you're using declarations is
:     Barry> to allow the compiler to do more
:     Barry> optimization, aren't you likely to use 
:     Barry> optimization settings like that as
:     Barry> well?  The purpose of the declarations 
:     Barry> is to allow the compiler to omit the
:     Barry> type dispatching code.

: Yes, but I might not want (safety 0).  I think in CMUCL the
: declarations do cause the compiler to omit the type dispatching code,
: but it does check, at appropriate points where it hasn't proven the
: type, that the type does match the declaration.   With (speed 3) and
: default safety, I think the type checks are weakened, but not
: removed.  With (safety 0), the type checks are removed.

: I haven't checked recently, but I think that (speed 3) is virtually as
: fast as (speed 3) (safety 0) because the compiler has good type
: inference.

My experience matches what the CMUCL manual says: the type
declarations make an enormous difference in performance. There is a
further gain in performance going to unsafe code, but in my experience
it's never been as dramatic as the gain from putting in appropriate
declarations.

I haven't looked at the generated code much -- my assembly language
experience stopped with 8086s, 68000s, and VAXes and I may never
understand my Pentium at that level -- but I'll speculate about the
answer to Barry's question. The generated code can be a lot simpler if
it leads off with a few implicit assertions about types (generated
automatically from DECLAREs). After that, all operations need to
handle only the appropriate special cases (e.g. single-float-only,
fixnum-only, or simple-array-only), rather than trying to generate
code that can handle e.g. any combination of bignums and single-floats
and fixnums and double-floats and rationals at any point inside your
function. There may be a few places after the initial type
declarations where the program needs to check again (e.g. making sure
that (INCF X Y) doesn't violate the declared FIXNUM-ness of X) but
even those can probably be handled more efficiently by producing code
to raise an error on overflow rather than by producing code to
construct and propagate the appropriate bignum on overflow.
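The kind of loop I have in mind can be sketched like this (my own example, not from the CMUCL manual):

```lisp
;; After the entry checks implied by the declarations, the body needs
;; only fixnum- and simple-vector-specific code.  A safe compile still
;; needs a residual check that INCF keeps N within its declared type;
;; an unsafe compile can emit a raw add.
(defun count-matches (v item)
  (declare (type simple-vector v) (optimize (speed 3)))
  (let ((n 0))
    (declare (type fixnum n))
    (dotimes (i (length v) n)
      (when (eq (svref v i) item)
        (incf n)))))
```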

  Bill Newman