From: Tim Bradshaw
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <ey3hg0vb6jo.fsf@todday.aiai.ed.ac.uk>
* Christopher J Vogt wrote:

> If you look at "Performance and Evaluation of Lisp Systems" by
> Gabriel from 1985, I think that you'll see that the Symbolics
> computers were 2 to 3 times faster than comparable "workstations"
> from Sun and Apollo on all the benchmarks.  I consider a factor of 2
> or 3 to be "significant".

But they were typically much more expensive, and they didn't keep
their advantage for long.  Certainly when RISC came along, and people
started to think harder about writing compilers for Lisp on RISCy
machines, they were pretty much ground into the dust in terms of
performance.  (I'm writing this as someone who owns several Symbolics
machines, so I'm not just bashing them without knowledge here!)  I
think in fact the crossover was before RISC really arrived -- things
like Sun 3/260s with Lucid were beating 3600s I think.

It may be that the commercial Lisps took a long time to pick up on
better compiler technology (have they yet?) because the market is too
small to justify the investment.  Things like CMUCL have been
producing good code on general-purpose HW for some time now.

> Furthermore, when it comes to floating point, current
> hardware/software combinations have only recently become faster than
> Symbolics 1985 hardware/software.  In 1985 the Symbolics 3600+IFU
> took 3.87 seconds on the FFT benchmark; I recently had an opportunity
> to use Franz's Beta 5 version of ACL for Windows NT on my 133MHz
> processor, and got a time of 3.28 seconds.  This really shouldn't be
> surprising if you understand the issues of type checking, boxing
> floats, and GC.

I think you must be making a mistake in your benchmarks -- perhaps not
declaring enough things or something?  The Symbolics certainly can
produce reasonable FP performance for single-float-only code
(double-float is death-by-consing), but it didn't have particularly
glamorous FP performance except possibly by the standards of other
Lisp systems with very poor FP support.  Something like CMU (or a
commercial lisp I'm sure), with declarations, should be hundreds of
times better than this on a modern machine.
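
(To be concrete about what "with declarations" buys you -- the function
and the exact declarations below are made up for illustration, not taken
from Gabriel's FFT -- this is the kind of inner loop where a compiler
such as CMUCL can keep the floats unboxed and skip generic dispatch
entirely:)

    ;; A minimal sketch, not benchmark code.  With these declarations
    ;; the compiler knows everything is a single-float and can avoid
    ;; boxing and generic arithmetic inside the loop.
    (defun dot-product (x y)
      (declare (type (simple-array single-float (*)) x y)
               (optimize (speed 3) (safety 1)))
      (let ((sum 0.0f0))
        (declare (type single-float sum))
        (dotimes (i (length x) sum)
          (incf sum (* (aref x i) (aref y i))))))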

--tim

From: Christopher J. Vogt
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <35A3841F.8224D3B8@computer.org>
Tim Bradshaw wrote:
> 
> * Christopher J Vogt wrote:
> 
> > If you look at "Performance and Evaluation of Lisp Systems" by
> > Gabriel from 1985, I think that you'll see that the Symbolics
> > computers were 2 to 3 times faster than comparable "workstations"
> > from Sun and Apollo on all the benchmarks.  I consider a factor of 2
> > or 3 to be "significant".
> 
> But they were typically much more expensive, and they didn't keep
> their advantage for long.  Certainly when RISC came along, and people
> started to think harder about writing compilers for Lisp on RISCy
> machines, they were pretty much ground into the dust in terms of
> performance.  (I'm writing this as someone who owns several Symbolics
> machines, so I'm not just bashing them without knowledge here!)  I
> think in fact the crossover was before RISC really arrived -- things
> like Sun 3/260s with Lucid were beating 3600s I think.

The Symbolics machines were more expensive, but a large part of that
is business and marketing.  If you get down to silicon, there is really
very little required to support Lisp in hardware.  Therefore, business
issues aside (and ignoring economies of scale), hardware that supported
Lisp would cost only slightly more than conventional hardware.

> 
> It may be that the commercial Lisps took a long time to pick up on
> better compiler technology (have they yet?) because the market is too
> small to justify the investment.  Things like CMUCL have been
> producing good code on general-purpose HW for some time now.
> 
> > Furthermore, when it comes to floating point, current
> > hardware/software combinations have only recently become faster than
> > Symbolics 1985 hardware/software.  In 1985 the Symbolics 3600+IFU
> > took 3.87 seconds on the FFT benchmark; I recently had an opportunity
> > to use Franz's Beta 5 version of ACL for Windows NT on my 133MHz
> > processor, and got a time of 3.28 seconds.  This really shouldn't be
> > surprising if you understand the issues of type checking, boxing
> > floats, and GC.
> 
> I think you must be making a mistake in your benchmarks -- perhaps not
> declaring enough things or something?  The Symbolics certainly can
> produce reasonable FP performance for single-float-only code
> (double-float is death-by-consing), but it didn't have particularly
> glamorous FP performance except possibly by the standards of other
> Lisp systems with very poor FP support.  Something like CMU (or a
> commercial lisp I'm sure), with declarations, should be hundreds of
> times better than this on a modern machine.

Well, the key here is "with declarations".  The Gabriel benchmarks don't
have declarations.  Yes, you can get a performance boost by adding
declarations to your code, but doing so does come with a price.  The
price you pay is less flexibility, and an increased chance that your
program will "exit" unexpectedly, as in other programming languages,
rather than just entering a "break".
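
For illustration (a made-up example, not code from the benchmarks):
compile something like FAST-SQUARE below with speed up and safety off,
and handing it the wrong type is liable to crash or return garbage,
where the undeclared version just drops you into a break.

    ;; Illustrative only.  With (safety 0) the declaration is trusted,
    ;; so (fast-square "oops") may crash or return nonsense instead of
    ;; signalling an error and entering the debugger.
    (defun fast-square (x)
      (declare (type fixnum x)
               (optimize (speed 3) (safety 0)))
      (* x x))

    ;; No declarations: the same bad call just lands you in a break.
    (defun safe-square (x)
      (* x x))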



> 
> --tim

-- 
Christopher J. Vogt - Computer Consultant - Lisp, AI, Graphics, etc.
http://members.home.com/vogt/
From: ·······@ibm.net
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <35A4545F.7907@ibm.net>
Christopher J. Vogt wrote:
> 
> Tim Bradshaw wrote:
> >
> > * Christopher J Vogt wrote:
> >
> > > If you look at "Performance and Evaluation of Lisp Systems" by
> > > Gabriel from 1985, I think that you'll see that the Symbolics
> > > computers were 2 to 3 times faster than comparable "workstations"
> > > from Sun and Apollo on all the benchmarks.  I consider a factor of 2
> > > or 3 to be "significant".
> >
> > But they were typically much more expensive, and they didn't keep
> > their advantage for long.  Certainly when RISC came along, and people
> > started to think harder about writing compilers for Lisp on RISCy
> > machines, they were pretty much ground into the dust in terms of
> > performance.  (I'm writing this as someone who owns several Symbolics
> > machines, so I'm not just bashing them without knowledge here!)  I
> > think in fact the crossover was before RISC really arrived -- things
> > like Sun 3/260s with Lucid were beating 3600s I think.
> 
> The Symbolics machines were more expensive, but a large part of that
> is business and marketing.  If you get down to silicon, there is really
> very little required to support Lisp in hardware.  Therefore, business
> issues aside (and ignoring economies of scale), hardware that supported
> Lisp would cost only slightly more than conventional hardware.
> 

Do you think that the people who were thinking of
buying them gave a shit about that?

I was at a lot of LISP machine sales pitches
about ten years ago.  The pitches were based
on applications written in LISP.  A lot
of the salespeople were the same people trying to
sell us workstations the year before.  They were now
telling us that we needed a LISP machine to do
any non-toy project.  A year later, the same
applications were available written in C for a
fraction of the cost, running on hardware costing
a fraction of the price of a LISP machine.  My
employers never ended up buying
either alternative.  I wasn't involved in the
ultimate decision to buy or not.

No flames please.  I'm writing my own LISP interpreter
as a hobby project, but the LISP hardware movement
hit the market at the same time as the hype about
AI, and the people marketing it might as well have
been selling you an Apollo (remember them) as a
Symbolics.  The AI hype died down; the LISP machine
hype died considerably earlier.

Please respond to my email since I look at this
newsgroup infrequently.

John Anderson
From: Tim Bradshaw
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <ey390m2yo36.fsf@todday.aiai.ed.ac.uk>
* Christopher J Vogt wrote:

> The Symbolics machines were more expensive, but a large part of that
> is business and marketing.  If you get down to silicon, there is really
> very little required to support Lisp in hardware.  Therefore, business
> issues aside (and ignoring economies of scale), hardware that supported
> Lisp would cost only slightly more than conventional hardware.

Yes, and that hardware would therefore be *very similar to*
conventional hardware.  Because if it's not, then it's going to be
very expensive indeed, because developing a modern high-performance
processor is an enormously expensive business (anyone know what the
budget for Merced or something is?).

[FP benchmarks]

> Well, the key here is "with declarations".  The Gabriel benchmarks don't
> have declarations.  Yes, you can get a performance boost by adding
> declarations to your code, but doing so does come with a price.  The
> price you pay is less flexibility, and an increased chance that your
> program will "exit" unexpectedly, as in other programming languages,
> rather than just entering a "break".

Well, you either need declarations, or you need a very smart
type-inferencing compiler, because you will *not* be able to get
competitive performance with full HW typechecking.  The best case I
can think of is something like this:

	(* a b)	so a and b are some kind of number
	1) start a typecheck, and in parallel issue fixnum and FP mult
	2) if the typecheck is OK, and there was no overflow, then
	   commit whichever was the right one to do.  Otherwise back
	   out to code to do the bignum case.

This can all happen in one instruction, *but* I'm using 3 execution
units to do it (FP, fix, and typecheck).  And of course I'm also not
dealing with the fix/float case.  If on the other hand the compiler
knew the types, I could use two of these execution units to do an
integer and a float operation in parallel (say increment a loop
counter and do the work of the loop at once), and I could use the
silicon saved by not needing the typecheck unit to go towards another
one, so I could get even more parallelism out of the thing.

HW typechecking means either slow clock or less parallelism, and that
means lower performance.
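
(A rough software model of that scheme, purely for illustration -- the
point of the hardware version is that the tag check and both multiplies
happen in the same cycle, where this Lisp sketch has to do them
sequentially.  GENERAL-MULT is just a made-up stand-in for the slow
general-case code:)

    ;; Stand-in for the out-of-line general case (bignums, mixed types).
    (defun general-mult (a b)
      (* a b))

    ;; Makes the dispatch in the scheme above explicit.
    (defun speculative-mult (a b)
      (cond ((and (typep a 'fixnum) (typep b 'fixnum))
             ;; commit the fixnum result unless it overflowed a fixnum,
             ;; in which case back out to the general path
             (let ((r (* a b)))
               (if (typep r 'fixnum) r (general-mult a b))))
            ((and (typep a 'single-float) (typep b 'single-float))
             (* a b))                  ; commit the FP result
            (t (general-mult a b))))   ; mixed or bignum case: punt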

--tim
From: Christopher J. Vogt
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <35AA6389.883A0D8C@computer.org>
Tim Bradshaw wrote:
>
>  [...]
>
> [FP benchmarks]
> 
> > Well, the key here is "with declarations".  The Gabriel benchmarks don't
> > have declarations.  Yes, you can get a performance boost by adding
> > declarations to your code, but doing so does come with a price.  The
> price you pay is less flexibility, and an increased chance that your
> program will "exit" unexpectedly, as in other programming languages,
> rather than just entering a "break".
> 
> Well, you either need declarations, or you need a very smart
> type-inferencing compiler, because you will *not* be able to get
> competitive performance with full HW typechecking.  The best case I
> can think of is something like this:

This is where we disagree.  I believe that you can get equivalent performance
via HW typechecking.

> 
>         (* a b) so a and b are some kind of number
>         1) start a typecheck, and in parallel issue fixnum and FP mult
>         2) if the typecheck is OK, and there was no overflow, then
>            commit whichever was the right one to do.  Otherwise back
>            out to code to do the bignum case.
> 
> This can all happen in one instruction, *but* I'm using 3 execution
> units to do it (FP, fix, and typecheck).  And of course I'm also not
> dealing with the fix/float case.  If on the other hand the compiler
> knew the types, I could use two of these execution units to do an
> integer and a float operation in parallel (say increment a loop
> counter and do the work of the loop at once), and I could use the
> silicon saved by not needing the typecheck unit to go towards another
> one, so I could get even more parallelism out of the thing.

First of all, an FP unit is on the order of 10x more gates/transistors than
an ALU, and an ALU is on the order of 5x more gates/transistors than a
type check unit.  So you are comparing apples to oranges when you compare these
units in terms of HW real estate.  It is on this basis that I claim the 
type check hardware is virtually free: it takes up very little space and you
incur very little in timing penalty.  You just have to do it.  Unfortunately,
the chip manufacturers don't care.

As to ILP, I think you could do as well as anybody else.  You wouldn't have
to do both the fix and the float at the same time.  I can envision schemes
to get around this, without impacting performance very much.

> 
> HW typechecking means either slow clock or less parallelism, and that
> means lower performance.

I don't see HW typechecking slowing the clock appreciably, or delivering
significantly lower performance.

> 
> --tim

-- 
Christopher J. Vogt - Computer Consultant - Lisp, AI, Graphics, etc.
http://members.home.com/vogt/
From: Tim Bradshaw
Subject: Re: common hardware soup-ups?
Date: 
Message-ID: <ey3sok4xt7w.fsf@todday.aiai.ed.ac.uk>
* Christopher J Vogt wrote:
> First of all, an FP unit is on the order of 10x more
> gates/transistors than an ALU, and an ALU is on the order of 5x more
> gates/transistors than a type check unit.  So you are comparing
> apples to oranges when you compare these units in terms of HW real
> estate.  It is on this basis that I claim the type check hardware is
> virtually free: it takes up very little space and you incur very
> little in timing penalty.  You just have to do it.  Unfortunately,
> the chip manufacturers don't care.

This might be right (which isn't meant to mean `I think it isn't', I
just don't know!).  But you come back to the problem of getting people
to do it.  Designing modern high-performance CPUs is brutally
expensive, so someone doing something that isn't mainstream is going
to be in trouble.  Of course, if you could get C/Fortran people to
find a use for a typechecker, you'd be home and dry.

Except (and this is something I just thought of...) you wouldn't
really.  Because if you want HW typechecking and sensible data sizes
(meaning, for numerical stuff, 32-bit singles and 64-bit doubles),
then you need wider words and all sorts of expensive horrors come out
to get you.  Of course on a basically-64-bit machine you could do
competitive single-float stuff but you'd lose horribly for doubles
(as Symbolics machines do).  And single-float often is not enough (I spent
half of the night before last cursing my 3630 because it blows up all
the time with single-float code and I haven't got the weeks it would
take to do the double-float stuff... I'm sure I'll end up running on
CMUCL with no decent graphics output).

> As to ILP, I think you could do as well as anybody else.  You
> wouldn't have to do both the fix and the float at the same time.  I
> can envision schemes to get around this, without impacting
> performance very much.

I'm not sure about this.  If you're not doing type inference then at
least for numerical work you rapidly end up knowing almost nothing
about the types beyond that they're numbers (I think!).  So unless
you're making an assumption of `probably an x' then you need to try
both.  My idea in some earlier post was to have a type-inferencing
compiler which would be willing to generate just this `almost
certainly an x' code, and then an off-to-the-side (hardware) type
checker which would cope with the `not an x after all' case by punting
to slow general-case code.  This would help things like loops where
CMUCL at least stresses about loop variables not being provably
fixnums.
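
(An invented example of the kind of loop I mean -- nothing I've
actually benchmarked.  Compile something like SUM-BELOW with speed
turned up and CMUCL emits efficiency notes because, without a
declaration on N, it can't prove the counter stays a fixnum; `almost
certainly a fixnum' code backed by a hardware check would let it emit
the fast version anyway:)

    ;; Invented example.  Without a declaration on N, the compiler
    ;; can't prove that I stays in fixnum range, so it falls back to
    ;; generic comparison and arithmetic (and says so at speed 3).
    (defun sum-below (n)
      (declare (optimize (speed 3)))
      (let ((s 0))
        (dotimes (i n s)
          (incf s i))))

    ;; Declaring N lets it prove the counter is a fixnum (the sum S
    ;; can still, in principle, grow past fixnum range).
    (defun sum-below/declared (n)
      (declare (type fixnum n) (optimize (speed 3)))
      (let ((s 0))
        (dotimes (i n s)
          (incf s i))))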

I do think that using a compiler with good type support is the right
way to go, and should get you almost all the way to high performance.
Such compilers exist (CMUCL) and work moderately well, and I'm sure
could work very well if a good deal more work was put into them.  It
might be that HW would still get you extra performance, but I suspect
it would not be a great deal extra.  It might, as you said earlier,
get better safety, but, again, CMUCL does quite well here by a
combination of checking type declarations and doing inference to pull
those checks out of the inner loops, allowing them to be compiled with
no checks at all.
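
(Again an invented illustration, not anything from a real benchmark:
with safety left on, a compiler like CMUCL can check the declared type
of V once, at function entry, and the inner loop then compiles with no
per-element checks at all.)

    ;; Invented illustration.  The declared type of V is checked on
    ;; entry; inside the loop the element type is already known, so
    ;; the accesses and the multiply compile without further checks.
    (defun scale! (v factor)
      (declare (type (simple-array single-float (*)) v)
               (type single-float factor)
               (optimize (speed 3) (safety 1)))
      (dotimes (i (length v) v)
        (setf (aref v i) (* factor (aref v i)))))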

--tim