From: Fernando
Subject: Interview with Alan Kay
Date: 
Message-ID: <1109077050.199834.21960@f14g2000cwb.googlegroups.com>
Hi,

I just finished reading an interview with Alan Kay (creator of
Smalltalk). Highly recommended reading (it also mentions Lisp).

http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=273&page=1

Check it out!

From: Holger Duerer
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <87psysg1j6.fsf@ronaldann.demon.co.uk>
>>>>> "Fernando" == Fernando  <···@easyjob.net> writes:

    Fernando> Hi,
    Fernando> I just finished reading an interview with Alan kay
    Fernando> (creator of Smalltalk). Highly recommended reading (it
    Fernando> also mentions Lisp).

    Fernando> http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=273&page=1
I wonder how often this article will still turn up?  :-)

Anyway, here's something I've always wanted to ask about what was said
in that interview:

When talking about the Burroughs machines, this comes up:

<cite>
   Neither Intel nor Motorola nor any other chip company understands
   the first thing about why that architecture was a good idea.

   Just as an aside, to give you an interesting benchmark--on roughly
   the same system, roughly optimized the same way, a benchmark from
   1979 at Xerox PARC runs only 50 times faster today. Moore's law has
   given us somewhere between 40,000 and 60,000 times improvement in
   that time. So there's approximately a factor of 1,000 in efficiency
   that has been lost by bad CPU architectures.

   The myth that it doesn't matter what your processor architecture
   is--that Moore's law will take care of you--is totally false.
</cite>

Can anybody spare some insight into how that architecture was so much
better?

Looking forward to some good stories
        Holger
From: Thomas Gagne
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <WNOdnWDErJNdpobfRVn-sQ@wideopenwest.com>
This is a fairly consistent theme of his, going back to his paper on the 
History of Smalltalk.  He discusses it in the coda 
<http://gagne.homedns.org/~tgagne/contrib/EarlyHistoryST.html#coda>

"One final comment. Hardware is really just software crystallized early. It is 
there to make program schemes run as efficiently as possible. But far too 
often the hardware has been presented as a given and it is up to software 
designers to make it appear reasonable. This has caused low-level techniques 
and excessive optimization to hold back progress in program design. As Bob 
Barton used to say: "Systems programmers are high priests of a low cult."

"One way to think about progress in software is that a lot of it has been 
about finding ways to late-bind, then waging campaigns to convince 
manufacturers to build the ideas into hardware. Early hardware had wired 
programs and parameters; random access memory was a scheme to late-bind them. 
Looping and indexing used to be done by address modification in storage; 
index registers were a way to late-bind. Over the years software designers 
have found ways to late-bind the locations of computations--this led to 
base/bounds registers, segment relocation, page MMUs, migratory processes, and 
so forth. Time-sharing was held back for years because it was "inefficient"-- 
but the manufacturers wouldn't put MMUs on the machines, universities had to 
do it themselves! Recursion late-binds parameters to procedures, but it took 
years to get even rudimentary stack mechanisms into CPUs. Most machines still 
have no support for dynamic allocation and garbage collection and so forth. In 
short, most hardware designs today are just re-optimizations of moribund 
architectures."
From: Tim May
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <220220051046162736%timcmay@removethis.got.net>
In article <······················@wideopenwest.com>, Thomas Gagne
<······@wide-open-west.com> wrote:

...quotes from Alan Kay's interesting interview...

> so forth. Time-sharing was held back for years because it was "inefficient"-- 
> but the manufacturers wouldn't put MMUs on the machines, universities had to 
> do it themselves! Recursion late-binds parameters to procedures, but it took 
> years to get even rudimentary stack mechanisms into CPUs. Most machines still 
> have no support for dynamic allocation and garbage collection and so forth.
> In 
> short, most hardware designs today are just re-optimizations of moribund 
> architectures."


What Alan Kay says is no doubt true, but the real issue has always been
that computer architecture, especially for workstations and PCs, has
been a popularity contest: one architecture wins out.

There are multiple reasons for this, including "most popular gets more
users, hence more software," and "learning curve" (costs of production
lower for higher-volume chips), and, I think most importantly, "limited
desktop space means one machine per desktop."

In the 80s and into the 90s, this meant a Sun workstation or equivalent
RISC/Unix/C machine for engineers and designers, an IBM PC or
equivalent for most office workers, a Macintosh for most graphics
designers or desktop publishers.

It didn't matter if a Forth engine was great for Forth, or a Symbolics
3600 was great for Lisp, or a D-machine was great for Smalltalk: there
just weren't many of these sold.

Similar things happened in minicomputer and mainframe computer markets.

I was at Intel from 1974 to 1986 and saw efforts to introduce new
architectures (432, 960, 860, iWarp, other processors from other
companies, such as Z8000, 32032, Swordfish, etc.). Mostly these efforts
failed. (There are various reasons, but most never got a chance to be
tweaked or fixed, because the "market had spoken.") The customers
wanted x86, despite obvious shortcomings. Niche architectures mostly
died out. And such is the case today, even more so than back then.

(Something that may change this is the "slowing down" of Moore's
observation about doubling rates. For good physics reasons, clock
speeds are not doubling at the rates seen in the past. This may push
architectures in different directions.)

So while there are all sorts of things that _could_ be put into
computer architectures, limited desktop space and the economies of
scale pretty much dictate a slower rate of adding these features.

--Tim May
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m3u0o4v2f8.fsf@linux.droberts.com>
Tim May <·······@removethis.got.net> writes:

> I was at Intel from 1974 to 1986 and saw efforts to introduce new
> architectures (432, 960, 860, iWarp, other processors from other
> companies, such as Z8000, 32032, Swordfish, etc.). Mostly these efforts
> failed. (There are various reasons, but most never got a chance to be
> tweaked or fixed, because the "market had spoken.") The customers
> wanted x86, despite obvious shortcomings. Niche architectures mostly
> died out. And such is the case today, even more so than back then.

Generally, I agree with you on the bulk of this post. One thing I
would tweak would be the statements above. The root of the problem
wasn't that people wanted x86 so much as that they wanted something that
would preserve the high-performance execution of their old
binaries. The market has shown that it will allow people to add things
to the x86 architecture, as long as the installed base of software is
preserved. Intel did this very successfully from the 8086/88 -> 286 ->
386 -> 486 -> Pentium -> Pro -> MMX -> III -> SSE -> SSE2 ->
4. Surprisingly, Intel then forgot the rule and went off and did
Itanic, with consequent adoption results.

AMD, it seems, did not forget the rule, allowing it to add in 64-bit
support, extended register sets, etc., in AMD64.

In any case, I submit that the market would actually not reject such
things as enhanced Lisp support in a processor, as long as that
processor was capable of running standard 16-bit, 32-bit, and (now)
64-bit x86 binaries in addition to the extended capabilities.

> (Something that may change this is the "slowing down" of Moore's
> observation about doubling rates. For good physics reasons, clock
> speeds are not doubling at the rates seen in the past. This may push
> architectures in different directions.)

I agree that this should help, but I think the bigger pressure is not
so much physics as economics. The fact is, Intel has been trying to
figure out how to use all those transistors it has for something that
people will actually buy for quite some time. This was what drove MMX
support and other enhancements initially. Intel's big problem now is
that processors are fast enough for the vast majority of basic office
tasks (word processing, spreadsheets, etc.). That means you just have
downward price pressure to look forward to. I think you'll see all
sorts of transformations over the next few years as things like
advanced crypto, 3D graphics, and other enhancements are added to the
x86 architecture. Unfortunately, while I would love it, I doubt that
Lisp support will be one of them.

--

Dave Roberts
dave at findinglisp dot com
From: Tim May
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <220220051221013878%timcmay@removethis.got.net>
In article <··············@linux.droberts.com>, Dave Roberts
<·····@droberts.com> wrote:

> Tim May <·······@removethis.got.net> writes:
> 
> > I was at Intel from 1974 to 1986 and saw efforts to introduce new
> > architectures (432, 960, 860, iWarp, other processors from other
> > companies, such as Z8000, 32032, Swordfish, etc.). Mostly these efforts
> > failed. (There are various reasons, but most never got a chance to be
> > tweaked or fixed, because the "market had spoken.") The customers
> > wanted x86, despite obvious shortcomings. Niche architectures mostly
> > died out. And such is the case today, even more so than back then.
> 
> Generally, I agree with you on the bulk of this post. One thing I
> would tweak would be the statements above. The root of the problem
> wasn't that people wanted x86 so much, they just wanted something that
> would preserve the high-performance execution of their old
> binaries. The market has shown that it will allow people to add things
> to the x86 architecture, as long as the installed base of software is
> preserved. Intel did this very successfully from the 8086/88 -> 286 ->
> 386 -> 486 -> Pentium -> Pro -> MMX -> III -> SSE -> SSE2 ->
> 4. Surprisingly, Intel then forgot the rule and went off and did
> Itanic, with consequent adoption results.

This is not quite so, from what I saw (knowing some of the
IA64/VLIW/EPIC designers). Firstly, it was expected that the first
IA64s would hit the market around 1998 and that transaction processing
would be a dominant market (and hence x86 binaries less of an issue
than with desktops, as the main competition was non-x86 machines
anyway). This was similar to the 432 experience, and Intel and Siemens
even had a joint venture in the 1980s to develop a highly capable,
fault-tolerant business machine ("BiiN," which wags dubbed "Billions
invested in Nothing," though I think this is too harsh, for various
reasons).

(A lot of groups were coming out with novel chip and system
architectures in the 1980s. Essentially they _all_ failed or were
absorbed into other companies and used with fairly conventional chip
architectures.)

Secondly, by the time the first IA64s hit the market, Xeon performance
had increased enormously (recall that this was during that "sprint"
from 400 MHz machines (PCs, Macs, Suns) to 1 GHz, then 2 GHz, then 3
GHz and beyond, depending on exact architecture, e.g., pipelines).
Thirdly, the initial IA64 was disappointing. The second and third
iterations (McKinley, Madison, etc.) were better, and have scored well
on a bunch of benchmarks, winning some large installations (such as the
"Columbia" supercomputer, currently either #2 or #3 in ranking).

Fourthly, software just hasn't changed much, at least not for most
users. Little demand for 64 bits. I suppose this is the same as your
"legacy x86 code" point.

 In any case, Intel _was_ pursuing 64-bit extensions to the x86, as the
current product announcements show (these are multi-year efforts, and
the work was started a long time before AMD announced parts).

Eventually there _will_ be a major shift to a new ISA. Intel has
expected this for a long time...and has, in my opinion as a long-time
stockholder, generally done the right thing in supporting the x86 while
also attempting to introduce other architectures.

AMD has done well, but is not in nearly the same position to innovate
in ISAs. (Recall they had a well-regarded bipolar bitslice product
line...largely abandoned for the "monoculture" reasons we are
discussing.)

> AMD, it seems, did not forget the rule, allowing it to add in 64-bit
> support, extended register sets, etc., in AMD64.
> 
> In any case, I submit that the market would actually not reject such
> things as enhanced Lisp support in a processor, as long as that
> processor was capable of running standard 16-bit, 32-bit, and (now)
> 64-bit x86 binaries in addition to the extended capabilities.

A fascinating topic. I tend to disagree, for reasons it would take me
too much time and space to get into. Sure, if supporting "Language X"
cost nothing more (to add to the chip), it would not "hurt" sales. But
would Dell or HP or anyone else bother to build a box that used the X
features in _any_ significant way? I doubt it.

And, in fact, I can't see many ways that adding Language X features
could be reasonably done without affecting the die size, the design and
debugging costs, the yields, etc. Perhaps if we had some specific
proposals made?

(For example, adding tag bits. One of the 960 processors had a tag bit
(I think just one, but maybe more). As I recall, this was the 960XA.
Designed to support some capability-based features, along the lines of
IBM's System/38. Related to some Ada work. Boeing was enthusiastic
about having the XA, and designed an avionic system for the 7J7 jumbo.
Which got cancelled. Last I checked, no systems were using the tag
bit(s).)

In fact, going out on a speculative limb here, it could be argued that
"listening to requests from language fans" was precisely what led to
the 432 situation and the recent IA64/EPIC/VLIW efforts.

Could either OO/tag bits or VLIW features be added at little or no cost
to the x86 ISA? I doubt it.

It sure looks to me that the architects look at the usual benchmarks,
for integer and FP performance, for transactions per second, for C++
and Java code, and then tweak architectures with more cache, more
registers, etc. Exotic things to improve "Language X" performance,
where Language X has a small market share, don't get done.


> 
> > (Something that may change this is the "slowing down" of Moore's
> > observation about doubling rates. For good physics reasons, clock
> > speeds are not doubling at the rates seen in the past. This may push
> > architectures in different directions.)
> 
> I agree that this should help, but I think the bigger pressure is not
> so much physics as economics. The fact is, Intel has been trying to
> figure out how to use all those transistors it has for something that
> people will actually buy for quite sometime. 

I think it's much more about physics. By physics I mean heat production
(exceeding 100 W on some of the mid-3 GHz processors, requiring very,
very large heat sinks and costly cooling systems, as seen even in
Apple's dual-2.5 GHz PPC machines, using water-cooling) and
leakage/channel length/dopant variation problems. 

Believe me, Intel would dearly have loved to have rolled out the 4 GHz
chips as planned, and IBM would have loved to get 3 GHz PPC chips out
for Apple to use....Steve Jobs had to eat crow when the faster Macs
didn't appear (and still haven't). So it's not a matter of customers
not knowing what to do with the faster chips, it's that pushing the
lithography down to 90nm is coming with some very serious physics
challenges (more so than some past challenges, which were overcome in
various ways...I worked on some of them, in fact).

And pushing to 65 nm, the next major milestone, looks to present major
challenges (these milestones occur at certain places for some
complicated reasons, involving equipment development, light
wavelengths, optics, bus speeds, etc.).

Meanwhile, customers who had gotten used to trading in their machines
for ones twice as fast (33 MHz --> 66 MHz --> 166 MHz --> 400 MHz --> 1
GHz --> 2 GHz --> 3.2 GHz) are finding that the speeds are now only
creeping up, seemingly topping out in the last couple of years in the
3.5-3.8 GHz range. While no doubt the 4 GHz mark will be crossed, a
definite slowdown in speedup is happening. For excellent reasons.

(I won't even get into exotic technologies, which are a long way away,
IMO.)

Maybe this is a good thing, a hidden blessing. The march of Moore's Law
has made most architectural experimentation into dead-ends.

Crawling even further, much further, out onto that speculative limb, I
think the era of truly distributed systems is finally going to happen,
on desktops I mean (in that we are already using other kinds of
distributed systems). Not just multi-core, with small numbers of cores,
but with hundreds or more processors.

(Note that a lot of folks are saying this. It may finally happen this
time, though.)

Then the paradigm of "actor-oriented" or "side effect-free" functional
programming may become more important. 

--Tim May
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m3ll9guuup.fsf@linux.droberts.com>
Tim May <·······@removethis.got.net> writes:

> This is not quite so, from what I saw (knowing some of the
> IA64/VLIW/EPIC designers). Firstly, it was expected that the first
> IA64s would hit the market around 1998 and that transaction processing
> would be a dominant market (and hence x86 binaries less of an issue
> than with desktops, as the main competition was non-x86 machines
> anyway). This was similar to the 432 experience, and Intel and Siemens
> even had a joint venture in the 1980s to develop a highly capable,
> fault-tolerant business machine ("BiiN," which wags dubbed "Billions
> invested in Nothing," though I think this is too harsh, for various
> reasons).
> 
> (A lot of groups were coming out with novel chip and system
> architectures in the 1980s. Essentially they _all_ failed or were
> absorbed into other companies and used with fairly conventional chip
> architectures.)
> 
> Secondly, by the time the first IA64s hit the market, Xeon performance
> had increased enormously (recall that this was during that "sprint"
> from 400 MHz machines (PCs, Macs, Suns) to 1 GHz, then 2 GHz, then 3
> GHz and beyond, depending on exact architecture, e.g., pipelines).
> Thirdly, the initial IA64 was disappointing. The second and third
> iterations (McKinley, Madison, etc.) were better, and have scored well
> on a bunch of benchmarks, winning some large installations (such as the
> "Columbia" supercomputer, currently either #2 or #3 in ranking).

Yes, I used to work in the lab at HP that had a hand in
Madison. *Very* talented people work there. The product also reflects
their architectural philosophy: *big*, *fast* caches.

> Fourthly, software just hasn't changed much, at least not for most
> users. Little demand for 64 bits. I suppose this is the same as your
> "legacy x86 code" point.

Yes, that's exactly it. Anybody who says that Itanic's competition
was simply non-x86 architectures misses the point. That's very poor
market strategy. In the same way that x86 competed against those other
architectures, it has to compete versus EPIC-based designs, too.

In any case, my comments weren't meant to be anti-Intel or
anti-EPIC/Itanic (though I do continue to call it Itanic ;-). Indeed,
EPIC is a very interesting architecture and is successful technically
on some axes (great floating point). The technical guys who worked on
it are to be commended. The problem is the marketing folks who agreed
to junk backward compatibility. The fact that it hasn't been a
commercial success is, IMO, directly attributable to the fact that it
doesn't run x86 code as fast as Xeon and AMD processors. The fact that
AMD64, on the other hand, has been a (relative, at this point)
commercial success is directly attributable to the fact that it does
run standard 32-bit x86 binaries as fast or faster than other 32-bit
only x86 processors.


>  In any case, Intel _was_ pursuing 64-bit extensions to the x86, as the
> current product announcements show (these are multi-year efforts, and
> the work was started a long time before AMD announced parts).

That's all well and good. I wasn't suggesting that Intel wasn't
pursuing such extensions, only that such efforts were not brought to
market and EPIC was. That was Andy Glew's reason for leaving Intel and
going to AMD, no? (Though the recent news is that he's gone back to
Intel, so perhaps they are putting more effort in this direction.)

> Eventually there _will_ be a major shift to a new ISA. Intel has
> expected this for a long time...and has, in my opinion as a long-time
> stockholder, generally done the right thing in supporting the x86 while
> also attempting to introduce other architectures.

I guess we disagree here. I agree with you that ISAs are not static
and will shift over time, but given the dominant code base is x86 for
the average IT buyer, any changes will have to evolve from here to
there. Note that each market segment may have different
preferences. If you're taking the mainframe market forward, you better
make sure you take all the legacy IBM code into account. Etc. In that
sense, another architecture (something like IBM's Cell, perhaps) might
be able to flourish in a particular new niche (like Playstations), but
it's unlikely such an architecture will ever penetrate the
segments dominated by x86 without providing a very graceful transition
strategy.

> > In any case, I submit that the market would actually not reject such
> > things as enhanced Lisp support in a processor, as long as that
> > processor was capable of running standard 16-bit, 32-bit, and (now)
> > 64-bit x86 binaries in addition to the extended capabilities.
> 
> A fascinating topic. I tend to disagree, for reasons it would take me
> too much time and space to get into. Sure, if supporting "Language X"
> cost nothing more (to add to the chip), it would not "hurt" sales. But
> would Dell or HP or anyone else bother to build a box that used the X
> features in _any_ significant way? I doubt it.
> 
> And, in fact, I can't see many ways that adding Language X features
> could be reasonably done without affecting the die size, the design and
> debugging costs, the yields, etc. Perhaps if we had some specific
> proposals made?

Of course. I'm assuming that basic features could be added without too
much of an end-user cost hit. Indeed, AMD did it with AMD64. Without
knowing details, it's hard to understand whether a particular
extension would make it or not. Basically, I'm just arguing along Alan
Kay's line: I think such improvements could be introduced into
products. The new innovations have to be cost-neutral, however.

> (For example, adding tag bits. One of the 960 processors had a tag bit
> (I think just one, but maybe more). As I recall, this was the 960XA.
> Designed to support some capability-based features, along the lines of
> IBM's System/38. Related to some Ada work. Boeing was enthusiastic
> about having the XA, and designed an avionic system for the 7J7 jumbo.
> Which got cancelled. Last I checked, no systems were using the tag
> bit(s).)

This argument has a couple of problems. First, you're arguing a
negative: the failure of one processor that had tag bits doesn't imply
anything about tag bits. The processor also had all manner of other
differences and its failure doesn't necessarily imply anything about
those other innovations. Secondly, you're sort of proving my point:
without x86 compatibility and neutral cost structure for those
innovations, almost all processors trying to address the x86 market
space will sink (and yes, I realize that the 960 wasn't trying to
compete with the x86 space).

> In fact, going out on a speculative limb here, it could be argued that
> "listening to requests from language fans" was precisely what led to
> the 432 situation and the recent IPA/EPIC/VLIW efforts. 

Hmmm... I'm not sure I see that. Which language fans were arguing for
EPIC or VLIW? Some processor architecture people might have been, but
I don't see the translation to HLL fans.

> Could either OO/tag bits or VLIW features be added at little or no cost
> to the x86 ISA? I doubt it.

I bet you could. AMD added a whole 64-bit architecture there, with new
registers, etc., etc., and still kept the cost structure under
control. A few more instructions for handling tagged data types in the
ALU would likely not add any cost whatsoever. Obviously, VLIW would
(changing the whole pipeline, etc.). Heck, with the way that x86s are
architected these days, you could probably handle some of the tagged
data type instructions with microcode such that they were built out of
slightly larger instructions. Not as efficient as hard-coded
primitives, but probably better than separate primitive instructions
being fed into the fetch unit.

> It sure looks to me that the architects look at the usual benchmarks,
> for integer and FP performance, for transactions per second, for C++
> and Java code, and then tweak architectures with more cache, more
> registers, etc. Exotic things to improve "Language X" performance,
> where Language X has a small market share, don't get done.

Oh, agreed. I wasn't arguing that Lisp *will* be implemented in such
processors. In fact, I said quite the opposite. The fact is, however,
that many such other functionalities *will be* (again, crypto,
graphics, etc.).

> > > (Something that may change this is the "slowing down" of Moore's
> > > observation about doubling rates. For good physics reasons, clock
> > > speeds are not doubling at the rates seen in the past. This may push
> > > architectures in different directions.)
> > 
> > I agree that this should help, but I think the bigger pressure is not
> > so much physics as economics. The fact is, Intel has been trying to
> > figure out how to use all those transistors it has for something that
> > people will actually buy for quite sometime. 
> 
> I think it's much more about physics. By physics I mean heat production
> (exceeding 100 W on some of the mid-3 GHz processors, requiring very,
> very large heat sinks and costly cooling systems, as seen even in
> Apple's dual-2.5 GHz PPC machines, using water-cooling) and
> leakage/channel length/dopant variation problems. 

While all the problems you cite are very real, I think they are
totally orthogonal to this discussion. Again, AMD and even Intel
itself have given us existence proofs that such specialized
functionality can be integrated into x86 CPUs for minimal added cost
and that such functionality will be adopted and used if it provides
value for people. I would cite MMX, AMD64, Altivec in the case of
Motorola/Freescale with the PPC architecture, etc.

--
Dave Roberts
dave at findinglisp dot com
From: Raymond Toy
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <sxdfyznfm1e.fsf@rtp.ericsson.se>
>>>>> "Dave" == Dave Roberts <·····@droberts.com> writes:

    Dave> Tim May <·······@removethis.got.net> writes:
    >> (For example, adding tag bits. One of the 960 processors had a tag bit
    >> (I think just one, but maybe more). As I recall, this was the 960XA.
    >> Designed to support some capability-based features, along the lines of
    >> IBM's System/38. Related to some Ada work. Boeing was enthusiastic
    >> about having the XA, and designed an avionic system for the 7J7 jumbo.
    >> Which got cancelled. Last I checked, no systems were using the tag
    >> bit(s).)

    Dave> This argument as a couple of problems. First, you're arguing a
    Dave> negative: the failure of one processor that had tag bits doesn't imply
    Dave> anything about tag bits. The processor also had all manner of other
    Dave> differences and its failure doesn't necessarily imply anything about
    Dave> those other innovations. Secondly, you're sort of proving my point:

Sparc still has instructions and support for 2 tag bits.  It's still
alive, for some value of alive.

Ray
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m3u0o3b0cg.fsf@linux.droberts.com>
Raymond Toy <···········@ericsson.com> writes:

> Sparc still has instructions and support for 2 tag bits.  It's still
> alive, for some value of alive.

Originally, yes, there were some bits there. Some of the late-1980s
speeches by Bill Joy mention high-level languages like Lisp as being
central to Sun's direction. There is a rumor going around that those
instructions don't actually work in some of the newest SPARC designs,
though. Not sure if that's true or if they have been formally removed
from the latest versions of the architecture.

If anybody can confirm one way or another, I would be interested in
finding out.

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Raymond Toy
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <sxdbrabf6x3.fsf@rtp.ericsson.se>
>>>>> "Dave" == Dave Roberts <···········@remove-findinglisp.com> writes:

    Dave> Raymond Toy <···········@ericsson.com> writes:
    >> Sparc still has instructions and support for 2 tag bits.  It's still
    >> alive, for some value of alive.

    Dave> Originally, yes, there were some bits there. Some of the late 1980
    Dave> speaches by Bill Joy mention high level languages like Lisp as being
    Dave> central to Sun's direction. There is a rumor going around that those
    Dave> instructions don't actually work in some of the newest SPARC designs,
    Dave> though. Not sure if that's true or if they have been formally removed
    Dave> from the latest versions of the architecture.

AFAIK, they still work, and are documented as still working in the
Sparc V9 architecture document.  However, the taddcctv and tsubcctv
instructions are deprecated.  (taddcctv adds 2 tagged numbers and if
the result overflows, or if the tag bits aren't zero, a trap is
generated.)  CMUCL used to use these instructions, but hasn't in quite
a while.
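
In C terms, what taddcctv does in one instruction is roughly the
following (a sketch of my own, assuming two low tag bits with 00 meaning
fixnum, a gcc/clang builtin for the overflow check, and abort() standing
in for the trap):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TAG_MASK 0x3                      /* low two bits hold the type tag */

/* Software stand-in for taddcctv: add two tagged words, "trapping"
   (here: abort()) if either tag is nonzero or the signed add overflows. */
static intptr_t tagged_add(intptr_t a, intptr_t b)
{
    intptr_t sum;
    if (((a | b) & TAG_MASK) != 0)        /* not both fixnums (tag 00) */
        abort();
    if (__builtin_add_overflow(a, b, &sum))
        abort();                          /* fixnum overflow */
    return sum;                           /* result is still a tagged fixnum */
}

int main(void)
{
    intptr_t x = 21 << 2, y = 21 << 2;    /* the fixnum 21, tagged, twice */
    printf("%ld\n", (long)(tagged_add(x, y) >> 2));   /* prints 42 */
    return 0;
}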

Ray
From: Bulent Murtezaoglu
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <87is4jw20l.fsf@p4.internal>
>>>>> "DR" == Dave Roberts <···········@remove-findinglisp.com> writes:

    DR> ... There is a rumor
    DR> going around that those instructions don't actually work in
    DR> some of the newest SPARC designs, though. Not sure if that's
    DR> true or if they have been formally removed from the latest
    DR> versions of the architecture. ...

I was waiting for Duane Rettig to chime in on this, but perhaps
he's too busy to care at the moment.  In any event, when the subject
of tagging came up before, he'd tell us that it isn't necessarily hardware
support for tagging but rather fast user-level traps that'd help CL 
implementors (alongside other GC languages).

Confirming later sparcs messing up tagging
http://groups-beta.google.com/group/comp.sys.sun.hardware/msg/554caec6480bb727
On fast traps helping with SW barriers
http://groups-beta.google.com/group/comp.lang.lisp/msg/953b65239ef234af

(more googling will yield more)

Just another thing to keep in mind while fantasizing about 
architectures.

cheers,

BM
From: Duane Rettig
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <4u0o3t7i5.fsf@franz.com>
Bulent Murtezaoglu <··@acm.org> writes:

> >>>>> "DR" == Dave Roberts <···········@remove-findinglisp.com> writes:
> 
>     DR> ... There is a rumor
>     DR> going around that those instructions don't actually work in
>     DR> some of the newest SPARC designs, though. Not sure if that's
>     DR> true or if they have been formally removed from the latest
>     DR> versions of the architecture. ...
> 
> I was was waiting for Duage Rettig to chime in on this, but perhaps 
> he's too busy to care at the moment.

Heh; yes, this morning I waded through the new google search mechanism,
which I don't much like (or am not yet used to - old dog / new tricks...)
and couldn't easily find the articles that were relevant.  Perhaps the
problem was fading memory, where I had thought the answer I was looking
for was in comp.arch, but as you found below, they were in c.l.l after all.
Thanks for finding them.

>  In any event, when the subject 
> of tagging came up before he'd tell us that it isn't necessarly hardware 
> support for tagging but rather fast user-level traps that'd help CL 
> implementors (alongside other GC languages).

Yes, this is the one hardware feature that I think would fly.  Trying
to do what a Lisp machine would do with respect to tags is not only
too lisp-specific, it is too specific-implementation-of-lisp specific.
As a developer in one of the many General Purpose Hardware lisp
implementations, I can say that any savings to be had by adding tagging
help is not very large, and the tagging help itself (for example, in the
sparcs, which only helps adds and subtracts) doesn't really do that much.

> Confirming later sparcs messing up tagging
> http://groups-beta.google.com/group/comp.sys.sun.hardware/msg/554caec6480bb727
> On fast traps helping with SW barriers
> http://groups-beta.google.com/group/comp.lang.lisp/msg/953b65239ef234af
> 
> (more googling will yield more)

Yes; also google for "alignment traps" and "user level interrupts" in
comp.arch for more background.  It was interesting going back over the
articles I had written and the conversations that resulted, but the
articles were not quite as on-point as the ones you found were.

> Just another thing to keep in mind while fantasizing about 
> architectures.

Fantasies about architectures must include the prospect of that
fantasy's viability in the future; the lesson of the Lisp machines
and the lesson of the Sparc taddcc* instructions tell us that GP
hardware features are more likely to survive than special-purpose
hardware features, and along with that the lisps which only rely
on GP hardware.  So as we fantasize, it would do us good to
consider how those features might be useful to the "popular"
languages as well.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m38y5fapue.fsf@linux.droberts.com>
Duane Rettig <·····@franz.com> writes:

> Bulent Murtezaoglu <··@acm.org> writes:
> >  In any event, when the subject 
> > of tagging came up before he'd tell us that it isn't necessarly hardware 
> > support for tagging but rather fast user-level traps that'd help CL 
> > implementors (alongside other GC languages).
> 
> Yes, this is the one hardware feature that I think would fly.  Trying
> to do what a Lisp machine would do with respect to tags is not only
> too lisp-specific, it is too specific-implementation-of-lisp specific.
> As a developer in one of the many General Purpose Hardware lisp
> implementations, I can say that any savings to be had by adding tagging
> help is not very large, and the tagging help itself (for example, in the
> sparcs, which only helps adds and subtracts) doesn't really do that much.

Duane, I agree that tagging simply for adds/subtracts is really not
very helpful. It sort of makes me wonder why Sun originally put them
in there? Does anybody have any insight there?

Also, when you say that fast user-level traps are helpful, how much of
your statement is directed at the OS versus the hardware? In other
words, given a standard architecture like x86 today, how much could
you do if Linux or Windows wasn't in your way versus requiring new
chips? I'm thinking of an answer in the form of, "X% of the problem is
software vs. Y% hardware."

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m34qg2c2k0.fsf@linux.droberts.com>
Dave Roberts <···········@remove-findinglisp.com> writes:

> Duane, I agree that tagging simply for adds/subtracts is really not
> very helpful. It sort of makes me wonder why Sun originally put them
> in there? Does anybody have any insight there?

I should probably be more clear: these instructions are helpful, but
there are bigger fish to fry before you get to this point of
optimization. I was just surprised that Sun included them without
addressing some of the other issues of dynamic languages first.

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Duane Rettig
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <43bvllp20.fsf@franz.com>
Dave Roberts <···········@remove-findinglisp.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Bulent Murtezaoglu <··@acm.org> writes:
> > >  In any event, when the subject 
> > > of tagging came up before he'd tell us that it isn't necessarly hardware 
> > > support for tagging but rather fast user-level traps that'd help CL 
> > > implementors (alongside other GC languages).
> > 
> > Yes, this is the one hardware feature that I think would fly.  Trying
> > to do what a Lisp machine would do with respect to tags is not only
> > too lisp-specific, it is too specific-implementation-of-lisp specific.
> > As a developer in one of the many General Purpose Hardware lisp
> > implementations, I can say that any savings to be had by adding tagging
> > help is not very large, and the tagging help itself (for example, in the
> > sparcs, which only helps adds and subtracts) doesn't really do that much.
> 
> Duane, I agree that tagging simply for adds/subtracts is really not
> very helpful. It sort of makes me wonder why Sun originally put them
> in there? Does anybody have any insight there?

I found a reference to this: on page 11 of
http://www.cs.ucsb.edu/labs/oocsb/papers/tr94-21.pdf
it was apparently a product of the SOAR (Smalltalk On A Risc) project.
It is interesting that the performance degradation was said to have
been 26% without the instruction; I didn't follow the reference to
see whether it was due to actual experimentation, simulation, or
hypothesis.  Note that this design was done in 1984, and thus specified
before Gabriel's 1984 book on benchmarking Lisp systems; I don't know
what care Smalltalkers gave to performance back then, but if it was
anything like what Gabriel found in Lisp systems...

> Also, when you say that fast user-level traps are helpful, how much of
> your statement is directed at the OS versus the hardware? In other
> words, given a standard architecture like x86 today, how much could
> you do if Linux or Windows wasn't in your way versus requiring new
> chips? I'm thinking of an answer in the form of, "X% of the problem is
> software vs. Y% hardware."

This is an excellent question.  I think that the answer is that there
is a little bit of both, but even more than that; it is a trichotomy,
not a dichotomy; having been a hardware designer myself, I would categorize
the requirements into three separate pieces: hardware, operating systems and
libraries, and language architecture.  The third category might seem
like it is more of a user of the trap than a part of the system design, but
recall that the garbage-collector is in fact a provider for the user (it
sets up an environment within which the user can work without having to
consider who owns particular data, or when to deallocate it - as such,
it is part of the substrate that would be nice to optimize).

As for hardware, almost anything can be done either in hardware or
in software (i.e. Operating system), with tradeoffs for each.  Pushing
more and more to hardware lessens the generality of the change, and
makes it less likely to occur.  Pushing things off on the software
tends to make it more general, but slower.  In some cases this doesn't
matter; consider for example the Alpha architecture, which (at least
at the start) did no denormalized float arithmetic, because such
arithmetic could be done algorithmically, after denormalization traps
caused the trap handler to operate.  As long as the operation occurs
seldom, and the hardware is able to communicate what is required, there
is no perceived problem or system slowdown.  In the case of the Alpha,
float instructions even have a "software" bit which identifies the
instrution as one which the programmer wants the trap handler to emulate
if conditions allow it.

For GC support, the real problem to solve is fast forwarding of
pointers that have moved.  Traps being as expensive as they are,
it might be worthwhile to allow for a mechanism within the MMU itself,
along with data fetching hardware, to allow pages to be set up to
automatically be forwarded _without_ entering a trap.  This would allow
memory reads and stores to be indirected only when the page is being
cleaned out, as opposed to a true indirecting memory set and reference
technology that would _always_ go through an extra indirection; that
would work fine enough, but memory reads would be more than twice
as slow (not only two memory references, but the second one being dependent
on the first, so the pipeline would stall often).  This slowdown can be
tolerated if it is only required on pages that are being worked on by
the gc, but not if it is the basic operation for every memory read and
write.
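
To make the indirection concrete, here is a tiny C sketch of the
per-object forwarding itself (my own illustration, not how any
particular lisp actually lays things out): assume a moved object's first
word has been overwritten with its new address plus a low "forwarded"
bit, and every read goes through a check:

#include <stdint.h>
#include <stdio.h>

#define FORWARDED 0x1           /* low header bit: "this object has moved" */

typedef struct object {
    uintptr_t header;           /* normal header, or new-address | FORWARDED */
    int value;
} object;

/* The read barrier: follow the forwarding pointer if the object moved. */
static object *follow(object *obj)
{
    while (obj->header & FORWARDED)
        obj = (object *)(obj->header & ~(uintptr_t)FORWARDED);
    return obj;
}

int main(void)
{
    object new_copy = { 0, 42 };
    object old_copy = { (uintptr_t)&new_copy | FORWARDED, 0 };
    printf("%d\n", follow(&old_copy)->value);   /* prints 42 via the forward */
    return 0;
}

The whole question above is when you can afford that check: on every
reference, or only on pages the gc happens to be working on.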

The reason I talk about traps rather than this indirection technique
is that it is more likely to occur, because less hardware would be
involved (and it is thus more general).  Operating systems can indeed
do relatively fast trap handling, as long as the context switch isn't
heavyweight (or unless there is no context switch at all).  Saving of
context, including MMX and XMM registers on Pentium and AMD hardware,
is one of the reasons why context switches are in fact so heavyweight.
One possible solution is that Calling Conventions could be created
for such trap handlers which allow for only certain registers to be
used; these could be installed in a non-context-switching first-level
trap-handler, if compiled properly, and user-level manipulation of
the pointers done that way.  I don't know, however, what the security
concerns would be; they are likely to be non-nil...
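
For reference, the page-trap version of this that one can do today on
stock Unix looks roughly like the following minimal sketch (mprotect is
not formally async-signal-safe, and a real collector would consult its
forwarding tables at info->si_addr rather than just unprotecting, but
this is the standard trick):

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *page;
static long  pagesize;

/* On a fault, a collector would map the faulting address to an object and
   fix the pointer; here we just unprotect so the access can retry. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    mprotect(page, pagesize, PROT_READ | PROT_WRITE);
}

int main(void)
{
    pagesize = sysconf(_SC_PAGESIZE);
    page = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    mprotect(page, pagesize, PROT_NONE);   /* "this page may need fixing up" */
    page[0] = 42;                          /* faults once; handler lets it retry */
    printf("%d\n", page[0]);               /* prints 42 */
    return 0;
}

The cost of all the kernel context-saving on that SIGSEGV path is
exactly the overhead a fast user-level trap would avoid.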

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m3hdk0sevq.fsf@linux.droberts.com>
Duane Rettig <·····@franz.com> writes:

> > Also, when you say that fast user-level traps are helpful, how much of
> > your statement is directed at the OS versus the hardware? In other
> > words, given a standard architecture like x86 today, how much could
> > you do if Linux or Windows wasn't in your way versus requiring new
> > chips? I'm thinking of an answer in the form of, "X% of the problem is
> > software vs. Y% hardware."
> 
> This is an excellent question.  I think that the answer is that there
> is a little bit of both, but even more than that; it is a trichotomy,
> not a dichotomy; having been a hardware designer myself, I would categorize
> the requirements into three separate pieces; hardware, operating systems and
> libraries, and language architecture.  The third category might seem
> like it is more of a user of the trap than a part of the system design, but
> recall that the garbage-collector is in fact a provider for the user (it
> sets up an environment within which the user can work without having to
> consider who owns particular data, or when to deallocate it - as such,
> it is part of the substrate that would be nice to optimize).

Right, of course. I'm a hardware/software guy myself, so hit me with
the big stuff. ;-)

> As for hardware, almost anything can be done either in hardware or
> in software (i.e. Operating system), with tradeoffs for each.  Pushing
> more and more to hardware lessens the generality of the change, and
> makes it less likely to occur.  Pushing things off on the software
> tends to make it more general, but slower.  In some case this doesn't
> matter; consider for example the Alpha architecture, which (at least
> at the start) did no denormalized float arithmetic, because such
> arithmetic could be done algortihmically, after denormalization traps
> caused the trap handler to operate.  As long as the operation occurs
> seldom, and the hardware is able to communicate what is required, there
> is no perceived problem or system slowdown.  In the case of the Alpha,
> float instructions even have a "software" bit which identifies the
> instrution as one which the programmer wants the trap handler to emulate
> if conditions allow it.

Yup. That was the general philosophy of all the 1980s RISC
machines. Basically have hardware execute all the typical cases and
trap to software for anything considered "rare." The main problem here
is that "rare" is defined totally by what you measure as your
representative software sample. You *always* get programs that violate
those assumptions by being chock-full of those "rare" instructions,
but are still interesting programs. Most of the early RISC
architectures ended up adding a lot of missing instructions. (I was at
HP working on PA-RISC workstations in the 1989 - 1991 timeframe. The
original PA architecture didn't have integer division instructions,
only a divide-step instruction. Lots of those missing elements were
added back in later as their omission was found to be crippling for
certain workloads.)
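
For anyone who hasn't seen a divide-step architecture: what the compiler
(or a millicode routine) ends up emitting is a shift-and-subtract loop
along these lines -- my own C sketch, not PA-RISC's actual DS semantics --
producing one quotient bit per iteration, which is the step the hardware
instruction accelerates:

#include <stdint.h>
#include <stdio.h>

/* Restoring shift-and-subtract division, one quotient bit per loop pass.
   Assumes divisor != 0. */
static uint32_t udiv32(uint32_t dividend, uint32_t divisor, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((dividend >> i) & 1);   /* bring down the next bit */
        if (r >= divisor) {                     /* one "divide step" */
            r -= divisor;
            q |= 1u << i;
        }
    }
    *rem = r;
    return q;
}

int main(void)
{
    uint32_t r;
    printf("%u rem %u\n", udiv32(1000, 7, &r), r);  /* prints "142 rem 6" */
    return 0;
}

Run 32 of those iterations per integer divide and it's easy to see why
the omission looked harmless on "typical" traces but was crippling for
division-heavy workloads.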

> For GC support, the real problem to solve is fast forwarding of
> pointers that have moved.  

If you could design GC from scratch, would you do a copying collector?
They are popular with current systems, particularly generational GC
systems, but there is clearly overhead there.

> Traps being as expensive as they are,
> it might be worthwhile to allow for a mechanism within the MMU itself,
> along with data fetching hardware, to allow pages to be set up to
> automatically be forwarded _without_ entering a trap.  

Could you do this at page granularity? Most of the copying GCs I have
seen make this a per-object forwarding.

> This would allow
> memory reads and stores to be indirected only when the page is being
> cleaned out, as opposed to a true indirecting memory set and reference
> technology that would _always_ go through an extra indirection; that
> would work fine enough, but memory reads would be more than twice
> as slow (not only two memory references, but second one being dependent
> on the first, so the pipeline would stall often).  This slowdown can be
> tolerated if it is only required on pages that are being worked on by
> the gc, but not if it is the basic operation for every memory read and
> write.

Ah, I think I see. So you set a bit that says that this page is in
old-space. Then, whenever you do a memory read to that page, you
automatically check for a forwarding pointer in hardware and use that
instead. Objects that have moved require the extra
indirection. Objects that have yet to be copied are simply referenced
where they lie. Am I getting that right?

> The reason I talk about traps rather than this indirection technique
> is that it is more likely to occur, because less hardware would be
> involved (and it is thus more general).  Operating systems can indeed
> do relatively fast trap handling, as long as the context switch isn't
> hevyweight (or unless there is no context switch at all).  Saving of
> context, including MMX and XMM registers on Pentium and AMD hardware,
> is one of the reasons why context switches are in fact so heavyweight.
> One possible solution is that Calling Conventions could be created
> for such trap handlers which allow for only certain registers to be
> used; these could be installed in a non-context-switching first-level
> trap-handler, if compiled properly, and user-level manipulation of
> the pointers done that way.  I don't know, however, what the security
> concerns would be; they are likely to be non-nil...

Yup. Optimization frequently cuts across the security boundary. There
are a few ways to make this better, but the cost is always higher than
if you didn't have to worry about it.

Anyway, thanks for the answer.

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Duane Rettig
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <4ll9as7ra.fsf@franz.com>
Dave Roberts <···········@remove-findinglisp.com> writes:

> Duane Rettig <·····@franz.com> writes:

> > For GC support, the real problem to solve is fast forwarding of
> > pointers that have moved.  
> 
> If you could design GC from scratch, would you do a copying collector?
> They are popular with current systems, particularly generational GC
> systems, but there is clearly overhead there.

Well, Generational GCs are not garbage collectors, they are actually
garbage _leavers_.  They collect live data, and leave the garbage.
They work efficiently because of the actual tendency of most garbage
to become unreferenced early in its lifetime.  Therefore, a higher
percentage of garbage is present in a first generation than in any
successive generation, and thus collecting the live data is going
to be more efficient than collecting the garbage and leaving the
live data uncopied.  [Another nicety of collecting live data is that
the "marking" of the data a live no longer needs a bit to be set
for that data; if the data has moved, then that also becomes the
indicator that the data is live.]  Once the number of generations
is high enough, it no longer is as useful to collect data, and
perhaps it becomes less efficient to continue to move live data
around at that time (though it is still easier to identify live
data than dead, so live-data-collection might still be worthwhile
even in older generations, though there are these tradeoffs).  So
yes, when I talk about forwarding, I am assuming a copying portion
of a garbage collector, probably in a very young generation.
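
A toy Cheney-style sketch in C of what "collecting the live data" means
(my illustration only, not any real collector's code); note how the
forwarding pointer written into the old copy is at once the relocation
address and the liveness mark:

#include <stdio.h>

enum { TO_SPACE_OBJS = 1024 };

typedef struct obj {
    struct obj *forward;     /* NULL until copied; then the new address */
    struct obj *child;       /* a single reference field, enough for a sketch */
    int         value;
} obj;

static obj to_space[TO_SPACE_OBJS];
static int  to_top;

/* Copy one object into to-space, or return its existing copy. */
static obj *copy(obj *o)
{
    if (o == NULL)   return NULL;
    if (o->forward)  return o->forward;      /* already collected as live */
    obj *new = &to_space[to_top++];
    *new = *o;
    new->forward = NULL;
    o->forward = new;                        /* forwarding ptr doubles as the mark */
    return new;
}

/* Copy the roots, then Cheney-scan to-space so everything they reference
   follows.  Whatever is never reached is simply left behind. */
static void collect(obj **roots, int nroots)
{
    int scan = to_top;
    for (int i = 0; i < nroots; i++)
        roots[i] = copy(roots[i]);
    for (; scan < to_top; scan++)
        to_space[scan].child = copy(to_space[scan].child);
}

int main(void)
{
    static obj from_space[3] = {
        { NULL, &from_space[1], 1 },         /* root, referencing object 2 */
        { NULL, NULL,           2 },
        { NULL, NULL,           3 },         /* garbage: never copied */
    };
    obj *roots[1] = { &from_space[0] };
    collect(roots, 1);
    printf("live: %d -> %d, copied %d of 3\n",
           roots[0]->value, roots[0]->child->value, to_top);
    return 0;
}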

> > Traps being as expensive as they are,
> > it might be worthwhile to allow for a mechanism within the MMU itself,
> > along with data fetching hardware, to allow pages to be set up to
> > automatically be forwarded _without_ entering a trap.  
> 
> Could you do this at page granularity? Most of the copying GCs I have
> seen make this a per-object forwarding.

Per-object is mandatory.  It is a question of how the forwarding can
be set up.  If a simple trap per-page can be made (which is easy to
do; simply set up the page without read or write access, and when a
read or a write is done, then a trap occurs) then software in the trap
handler can look up current forwards in tables in order to know (per
object) where the forwarded address is.  So at the trap level, it
would be per page, but as far as the application is concerned, it is
just referencing an address and is getting its current location,
regardless of where that is.

If the MMU were sufficiently intricate, and didn't demand to work on
a page basis, then it could be configured to automatically forward
pointers regardless of what page they were on.  This is unlikely to
occur on modern, paged architectures, though.

> > This would allow
> > memory reads and stores to be indirected only when the page is being
> > cleaned out, as opposed to a true indirecting memory set and reference
> > technology that would _always_ go through an extra indirection; that
> > would work fine enough, but memory reads would be more than twice
> > as slow (not only two memory references, but second one being dependent
> > on the first, so the pipeline would stall often).  This slowdown can be
> > tolerated if it is only required on pages that are being worked on by
> > the gc, but not if it is the basic operation for every memory read and
> > write.
> 
> Ah, I think I see. So you set a bit that says that this page is in
> old-space.

Not quite.  You set a bit that says that this page is one which may
have forwarded pointers.  Where the page is is irrelevant, though
it is actually more likely to be in a newspace than an oldspace.

> Then, whenever you do a memory read to that page, you
> automatically check for a forwarding pointer in hardware and use that
> instead. Objects that have moved require the extra
> indirection. Objects that have yet to be copied are simply referenced
> where they lay. Am I getting that right?

Yes.

> > The reason I talk about traps rather than this indirection technique
> > is that it is more likely to occur, because less hardware would be
> > involved (and it is thus more general).  Operating systems can indeed
> > do relatively fast trap handling, as long as the context switch isn't
> > hevyweight (or unless there is no context switch at all).  Saving of
> > context, including MMX and XMM registers on Pentium and AMD hardware,
> > is one of the reasons why context switches are in fact so heavyweight.
> > One possible solution is that Calling Conventions could be created
> > for such trap handlers which allow for only certain registers to be
> > used; these could be installed in a non-context-switching first-level
> > trap-handler, if compiled properly, and user-level manipulation of
> > the pointers done that way.  I don't know, however, what the security
> > concerns would be; they are likely to be non-nil...
> 
> Yup. Optimization frequently cuts across the security boundary. There
> are a few ways to make this better, but the cost is always higher than
> if you didn't have to worry about it.
> 
> Anyway, thanks for the answer.

No problem.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m3acppx0qz.fsf@linux.droberts.com>
Duane Rettig <·····@franz.com> writes:

> Dave Roberts <···········@remove-findinglisp.com> writes:
> 
> > Duane Rettig <·····@franz.com> writes:
> 
> > > For GC support, the real problem to solve is fast forwarding of
> > > pointers that have moved.  
> > 
> > If you could design GC from scratch, would you do a copying collector?
> > They are popular with current systems, particularly generational GC
> > systems, but there is clearly overhead there.
> 
> Well, Generational GCs are not garbage collectors, they are actually
> garbage _leavers_.  They collect live data, and leave the garbage.
> They work efficiently because of the actual tendency of most garbage
> to become unreferenced early in its lifetime.  Therefore, a higher
> percentage of garbage is present in a first generation than in any
> successive generation, and thus collecting the live data is going
> to be more efficient than collecting the garbage and leaving the
> live data uncopied.  [Another nicety of collecting live data is that
> the "marking" of the data a live no longer needs a bit to be set
> for that data; if the data has moved, then that also becomes the
> indicator that the data is live.]  Once the number of generations
> is high enough, it no longer is as useful to collect data, and
> perhaps it becomes less efficient to continue to move live data
> around at that time (though it is still easier to identify live
> data than dead, so live-data-collection might still be worthwhile
> even in older generations, though there are these tradeoffs).  So
> yes, when I talk about forwarding, I am assuming a copying portion
> of a garbage collector, probably in a very young generation.

I think you missed the root of my question. I understand the theory of
generational GCs. I understand why copying collectors are typically
used for young generations. But those collectors are built for
existing machines with a certain set of assumptions behind them. Some
of those assumptions are about the nature of memory allocation (the
young typically die quickly) and some are about the nature of the
standard machines on which these operate (memory access speeds,
paging, etc.). Obviously, the first set of assumptions about the
behavior of garbage is fixed given the language in which we work. The
question is really about changing the assumptions of the second
set. If we could make a machine do what we *want* rather than what
some CPU designer thought was the right thing, could we do better? If
so, would we still use the same algorithms, or would we shift to a new
optimum point on the curve?

> > > Traps being as expensive as they are,
> > > it might be worthwhile to allow for a mechanism within the MMU itself,
> > > along with data fetching hardware, to allow pages to be set up to
> > > automatically be forwarded _without_ entering a trap.  
> > 
> > Could you do this at page granularity? Most of the copying GCs I have
> > seen make this a per-object forwarding.
> 
> Per-object is mandatory.  It is a question of how the forwarding can
> be set up.  If a simple trap per-page can be made (which is easy to
> do; simply set up the page without read or write access, and when a
> read or a write is done, then a trap occurs) then software in the trap
> handler can look up current forwards in tables in order to know (per
> object) where the forwarded address is.  So at the trap level, it
> would be per page, but as far as the application is concerned, it is
> just referencing an address and is getting its current location,
> regardless of where that is.

Right, gotcha.
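
For concreteness, here is a minimal sketch of that per-page trap scheme,
assuming nothing more exotic than POSIX mmap/mprotect/SIGSEGV; the
fix_up_page() function is a hypothetical stand-in for the GC work that
would rewrite any forwarded pointers on the page before the mutator sees
it.  This is a toy, not anyone's actual collector:

#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static long pagesize;

/* Hypothetical: a real collector would rewrite every stale pointer
   stored on this page to its forwarded location here. */
static void fix_up_page(void *page) { (void)page; }

static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    void *page = (void *)((uintptr_t)si->si_addr &
                          ~(uintptr_t)(pagesize - 1));
    (void)sig; (void)ctx;
    fix_up_page(page);
    /* Re-enable access; the faulting load or store is then retried. */
    mprotect(page, (size_t)pagesize, PROT_READ | PROT_WRITE);
}

int main(void)
{
    struct sigaction sa = {0};
    long *p;

    pagesize = sysconf(_SC_PAGESIZE);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    /* Map a page with no access, as the gc would for a page that may
       still hold unforwarded pointers (error checking omitted). */
    p = mmap(NULL, (size_t)pagesize, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    p[0] = 42;              /* traps once, gets fixed up, then proceeds */
    printf("%ld\n", p[0]);
    return 0;
}

The cost profile is exactly as described: one expensive trap the first
time a protected page is touched, and ordinary full-speed loads and
stores afterward.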

> If the MMU were sufficiently intricate, and didn't demand to work on
> a page basis, then it could be configured to automatically forward
> pointers regardless of what page they were on.  This is unlikely to
> occur on modern, paged architectures, though.

Right. The original Lisp machines did some variant of this, right? You
could mark a pointer as special somehow and the microcode would do a
double indirection if it encountered it, I believe.
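
In software on stock hardware the same idea looks roughly like the sketch
below, with a hypothetical tag encoding (the low bits of a word as a type
tag, one tag value reserved to mean "forwarded"); the LispM microcode
effectively did this test on every memory reference for free:

#include <stdint.h>

typedef uintptr_t lispobj;

#define TAG_MASK     ((uintptr_t)0x7)
#define TAG_FORWARD  ((uintptr_t)0x7)   /* hypothetical tag assignment */

/* Every read of a heap slot goes through this check. */
static inline lispobj deref(lispobj *slot)
{
    lispobj v = *slot;
    while ((v & TAG_MASK) == TAG_FORWARD)
        v = *(lispobj *)(v & ~TAG_MASK);  /* follow the forwarding word */
    *slot = v;                            /* snap the chain for next time */
    return v;
}

On stock CPUs that test-and-branch has to be emitted (or carefully
avoided) by the compiler around every reference, which is exactly the
cost the microcoded version hid.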

> > > This would allow
> > > memory reads and stores to be indirected only when the page is being
> > > cleaned out, as opposed to a true indirecting memory set and reference
> > > technology that would _always_ go through an extra indirection; that
> > > would work fine enough, but memory reads would be more than twice
> > > as slow (not only two memory references, but second one being dependent
> > > on the first, so the pipeline would stall often).  This slowdown can be
> > > tolerated if it is only required on pages that are being worked on by
> > > the gc, but not if it is the basic operation for every memory read and
> > > write.
> > 
> > Ah, I think I see. So you set a bit that says that this page is in
> > old-space.
> 
> Not quite.  You set a bit that says that this page is one which may
> have forwarded pointers.  Where the page is is irrelevant, though
> it is actually more likely to be in a newspace than an oldspace.

Okay, right. That's basically what I meant. I was making the jump that
this would be most useful for old pages.

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Duane Rettig
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <4is4doy4g.fsf@franz.com>
Dave Roberts <···········@remove-findinglisp.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Dave Roberts <···········@remove-findinglisp.com> writes:
> > 
> > > Duane Rettig <·····@franz.com> writes:
> > 
> > > > For GC support, the real problem to solve is fast forwarding of
> > > > pointers that have moved.  
> > > 
> > > If you could design GC from scratch, would you do a copying collector?
> > > They are popular with current systems, particularly generational GC
> > > systems, but there is clearly overhead there.
> > 
> > Well, Generational GCs are not garbage collectors, they are actually
> > garbage _leavers_.  They collect live data, and leave the garbage.
> > They work efficiently because of the actual tendency of most garbage
> > to become unreferenced early in its lifetime.  Therefore, a higher
> > percentage of garbage is present in a first generation than in any
> > successive generation, and thus collecting the live data is going
> > to be more efficient than collecting the garbage and leaving the
> > live data uncopied.  [Another nicety of collecting live data is that
> > the "marking" of the data as live no longer needs a bit to be set
> > for that data; if the data has moved, then that also becomes the
> > indicator that the data is live.]  Once the number of generations
> > is high enough, it no longer is as useful to collect data, and
> > perhaps it becomes less efficient to continue to move live data
> > around at that time (though it is still easier to identify live
> > data than dead, so live-data-collection might still be worthwhile
> > even in older generations, though there are these tradeoffs).  So
> > yes, when I talk about forwarding, I am assuming a copying portion
> > of a garbage collector, probably in a very young generation.
> 
> I think you missed the root of my question.

No, I didn't, but in order to really answer it, I would have to
provide a lot of background.  OK, you asked for it...

> I understand the theory of
> generational GCs. I understand why copying collectors are typically
> used for young generations. But those collectors are built for
> existing machines with a certain set of assumptions behind them. Some
> of those assumptions are about the nature of memory allocation (the
> young typically die quickly) and some are about the nature of the
> standard machines on which these operate (memory access speeds,
> paging, etc.). Obviously, the first set of assumptions about the
> behavior of garbage is fixed given the language in which we work.

Agreed.

> The question is really about changing the assumptions of the second
> set.

Yes, and my caution is that we want not to do that.  In a previous
career I was a hardware engineer, and I worked in test engineering
labs.  We built special-purpose equipment for specialized testing
strategies, and yet there was always the requirement that we keep
costs down and reuse high as much as possible - the problem with
not doing so was not that it was too expensive, but that we would
miss dates for production startups.  Thus, even in such highly
specialized situations, we tried as much as possible to stay with
a mainstream of design thought, geared toward the high reuse of
older designs _and_ the brainstorming of potential future uses.
Needless to say that this thinking led to the incorporation of
a lot of software into this design process.  And as little as
we could change hardware, the decisions were usually weighted
toward those choices.

When I started working on Lisp, I joined a Company that believed in
getting the most out of General Purpose hardware, and I joined
it for a reason, so it should be no surprise that I advocate for
a more GP solution, or taking advantage of hardware that can
also be used generally; thus increasing its chances of that
hardware surviving a generation or two of redesigns.

> If we could make a machine do what we *want* rather than what
> some CPU designer thought was the right thing, could we do better? If
> so, would we still use the same algorithms, or would we shift to a new
> optimum point on the curve?

Very likely so, but if it did not also have a local optimum for
non-gc languages, then it would be likely to die the same death
that most special-purpose hardware eventually dies.  

Now, I believe that my view on what constitutes GP hardware might be
surprising, so I will state it simply - it is not the architecture that
makes a computer General Purpose, but its survivability.  Consider,
for example, your own computer; it is actually a special-purpose box
that has a certain set of hardware that runs on it at a certain speed
(likely it is as fast as it can reasonably be clocked), and even if you
just bought it recently, chances are it will be obsolete in 5 years.
Why?  Because you will be buying other boxes (perhaps even with a
different architecture) by then, because your nice, crisp box will
seem so painfully slow by then.  And computer manufacturers keep making
new machines, so the older ones will continue to grow obsolete.
But each generation of architecture has many points in common with its
predecessors, so there is a continuum along which buyers of these
computers are willing to travel.  That is my GP hardware philosophy.

So how does a new feature make it into an architecture _and_ stay?  It
must be in demand by a broad range of users, at least enough to sustain
a critical mass of buyers.  Otherwise, costs of adding the extra usage to
the real-estate of the chip, and the extra design and debug time, make
it less likely that a feature would get in.

Now, one thing that we GC advocates now have going for us is the
newfound acceptability in the marketplace of the concept of
garbage-collection.  It is no longer a dirty word, although there
are still many different gc concepts and theories about how the
colorations can be done either in software or sped up in hardware.
So whether a hardware-based approach will fly (i.e. be incorporated
into silicon and last) depends on how general the concept can be
made, and how much gain it can give to all languages for an appropriate
amount of cost.

So, finally, we come to the answer to your question.  But, (and sorry
about this :-) instead of just giving you an answer, which would only
be an opinion of mine, let me turn your question on its ear and let
me answer you by asking you a question: Given that you want to remove
the copying of objects by a generational gc, what steps might you take
in hardware to maintain the generational nature of the gc, and yet to
provide the information necessary to perform the garbage collection?


> > > > This would allow
> > > > memory reads and stores to be indirected only when the page is being
> > > > cleaned out, as opposed to a true indirecting memory set and reference
> > > > technology that would _always_ go through an extra indirection; that
> > > > would work fine enough, but memory reads would be more than twice
> > > > as slow (not only two memory references, but second one being dependent
> > > > on the first, so the pipeline would stall often).  This slowdown can be
> > > > tolerated if it is only required on pages that are being worked on by
> > > > the gc, but not if it is the basic operation for every memory read and
> > > > write.
> > > 
> > > Ah, I think I see. So you set a bit that says that this page is in
> > > old-space.
> > 
> > Not quite.  You set a bit that says that this page is one which may
> > have forwarded pointers.  Where the page is is irrelevant, though
> > it is actually more likely to be in a newspace than an oldspace.
> 
> Okay, right. That's basically what I meant. I was making the jump that
> this would be most useful for old pages.

I would think the opposite; the more likely place for a forwarding
pointer is to a newly moved object, which is most optimally in objects
just moved from the first to the second generation (and less and less
likely in generations after that).

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Thomas Gagne
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <bdCdnbMtpuTukr7fRVn-oQ@wideopenwest.com>
Duane Rettig wrote:
> <snip> 
> 
> Yes, and my caution is that we want not to do that.  In a previous
> career I was a hardware engineer, and I worked in test engineering
> labs.  We built special-purpose equipment for specialized testing
> strategies, and yet there was always the requirement that we keep
> costs down and reuse high as much as possible - the problem with
> not doing so was not that it was too expensive, but that we would
> miss dates for production startups.  Thus, even in such highly
> specialized situations, we tried as much as possible to stay with
> a mainstream of design thought, geared toward the high reuse of
> older designs _and_ the brainstorming of potential future uses.
> Needless to say that this thinking led to the incorporation of
> a lot of software into this design process.  And as little as
> we could change hardware, the decisions were usually weighted
> toward those choices.
> 
> When I started working on Lisp, I joined a Company that believed in
> getting the most out of General Purpose hardware, and I joined
> it for a reason, so it should be no surprise that I advocate for
> a more GP solution, or taking advantage of hardware that can
> also be used generally; thus increasing its chances of that
> hardware surviving a generation or two of redesigns.
> 

All very relevant, but I don't understand why NVidia and ATI and other graphic 
chip designers seem unencumbered by these restrictions, which seem more 
psychological and economical.  It was recently reported the video game market 
out-earned Hollywood last year.  That's a lot of money made on a lot of 
hardware that didn't exist a few years ago.  Though Moore's law threatens to 
leave the x86 architecture behind, the GPUs are taking more responsibility for 
system throughput.

And people are paying for it.

Like the 80s, there are different chips that aren't all compatible with each 
other.  Each year or so great new products come out that obsolete previous 
ones, not only because they're faster but because of new instructions.  And 
each year people are buying them.  Over the years special slots have been 
added to motherboards to accommodate new video card standards.

But still the x86 can't assist VM-based languages with GCs except by doing 
what it has always done a little faster than it used to.

I don't understand hardware design that well and so must be misinterpreting 
what's happening in the industry.
From: Duane Rettig
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <47jksvcmu.fsf@franz.com>
Thomas Gagne <······@wide-open-west.com> writes:

> Duane Rettig wrote:
> > <snip> Yes, and my caution is that we want not to do that.  In a
> > previous
> 
> > career I was a hardware engineer, and I worked in test engineering
> > labs.  We built special-purpose equipment for specialized testing
> > strategies, and yet there was always the requirement that we keep
> > costs down and reuse high as much as possible - the problem with
> > not doing so was not that it was too expensive, but that we would
> > miss dates for production startups.  Thus, even in such highly
> > specialized situations, we tried as much as possible to stay with
> > a mainstream of design thought, geared toward the high reuse of
> > older designs _and_ the brainstorming of potential future uses.
> > Needless to say that this thinking led to the incorporation of
> > a lot of software into this design process.  And as little as
> > we could change hardware, the decisions were usually weighted
> > toward those choices.
> > When I started working on Lisp, I joined a Company that believed in
> 
> > getting the most out of General Purpose hardware, and I joined
> > it for a reason, so it should be no surprise that I advocate for
> > a more GP solution, or taking advantage of hardware that can
> > also be used generally; thus increasing its chances of that
> > hardware surviving a generation or two of redesigns.
> >
> 
> 
> All very relevant, but I don't understand why NVidia and ATI and other
> graphic chip designers seem unencumbered by these restrictions, which
> seem more psychological and economical.

Precisely.  If you note my description of what constitutes GP hardware,
(and perhaps you'll have to read between the lines - it was late last
night :-) you can see that GPUs fit my description of General Purpose.
And it always has a psychological and economic bent to it.  Look at
the history of mainstream computing - how many GP chips do you know of
that would survive nowadays without a Floating Point Unit?  And yet, only 20
years ago FP units were usually special hardware and treated as separate
from the main processor; Software had to know about these distinctions,
but nowadays such knowledge about the availability and kind of FP unit
is unnecessary, because there is usually only one per architecture
(yes, I know, there are exceptions, such as SSE/SSE2 on x86/x86-64
hardware, but the normal standard uses the x87-style fp unit for x86,
and the SSE/SSE2 for x86-64).  Anyway, in another 5 to 10 years no chip
will likely be manufactured without a GPU - I believe it is the next
generation of GP hardware.  And like floating point, there will likely
be standardization efforts, and eventually people will wonder how
anyone got along without GPU power in the "old" days.

>  It was recently reported the
> video game market out-earned Hollywood last year.  That's a lot of
> money made on a lot of hardware that didn't exist a few years ago.
> Though Moore's law threatens to leave the x86 architecture behind, the GPUs
> are taking more responsibility for system throughput.

Yes, I believe that Moore's Law supports my claim that the box to
which you type (whatever it is and however new) is itself a special
purpose piece of hardware and that you will discard it for another
in a few years.  And that the presence of _so_ much money in the GPU
market _defines_ almost any chip that makes it in that market as GP.

> And people are paying for it.

Yes.  All hardware development is paid for; I believe that the
hardware becomes GP when that payment can be amortized over a large
group of people, rather than one or a few companies footing the bill
for the development of the special-purpose hardware that will not
be sold but used.  Note that with this definition, you can consider
the prototype of a new GP box to itself be special-purpose hardware -
you wouldn't normally _sell_ your prototype, except under some special
arrangements...

> Like the 80s, there are different chips that aren't all compatible
> with each other.  Each year or so great new products come out that
> obsolete previous ones and not only because they're faster but because
> of new instructions.  And each year people are buying them.  Over the
> years special slots have been added to motherboards to accommodate new
> video card standards.

I didn't know the NVidia architecture, so I took a quick look.  Their
website is very helpful, and their hardware architecture looks to be
very forward-thinking (a requirement in order for it to become GP).  I
also checked out their Cg faq (Cg stands for "C for Graphics") here:
http://developer.nvidia.com/object/cg_faq.html  and noted that
 1. They base it on OpenGL
 2. Many people are using it.
 3. They are committed to supporting it on Windows, Linux, and MacOSX
In other words, they seem committed at least to GP software, and to
an upgrade path toward their more specialized hardware (which provides
the continuum I was talking about to make that specialized hardware GP).

> But still the x86 can't assist VM-based languages with GCs except by
> doing what it has always done a little faster than it used to.

Right; as I've been saying in other ways; we might have a better chance
to effect more minor changes in GP hardware or operating systems than
for major blue-sky changes - in order for major changes to occur, there
needs to be a lot of money available to sink into the development, and
I don't see that much money in Lisp and Smalltalk vendors and consortiums.
We two communities are relatively small due to our not being supported
by hardware vendors as a means to their ends.

> I don't understand hardware design that well and so must be
> misinterpreting what's happening in the industry.

It's simple; the industry moves to where the money is.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m34qfwh5jx.fsf@linux.droberts.com>
Duane Rettig <·····@franz.com> writes:

> Thomas Gagne <······@wide-open-west.com> writes:
> > But still the x86 can't assist VM-based languages with GCs except by
> > doing what it has always done a little faster than it used to.
> 
> Right; as I've been saying in other ways; we might have a better chance
> to effect more minor changes in GP hardware or operating systems than
> for major blue-sky changes - in order for major changes to occur, there
> needs to be a lot of money available to sink into the development, and
> I don't see that much money in Lisp and Smalltalk vendors and consortiums.
> We two communities are relatively small due to our not being supported
> by hardware vendors as a means to their ends.

This is exactly right and I think we're in violent agreement here,
Duane. Lisp or Smalltalk won't drive a "GC-coprocessor," DLX, or
whatever we want to call it. This will have to be done by a
combination of Java, .NET, Perl, Python, and, at the end of the list,
Lisp and Smalltalk (choose the order of those depending on whether
you're reading this on comp.lang.lisp or comp.lang.smalltalk ;-). The
question is whether there can be additions to a basic x86 (and by
extension PowerPC) architecture that will provide substantial benefit
for these language problems without undue cost burden.

> > I don't understand hardware design that well and so must be
> > misinterpreting what's happening in the industry.
> 
> It's simple; the industry moves to where the money is.

Always. ;-) Anything else is a hobby, not a business.

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Tim May
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <280220051844137213%timcmay@removethis.got.net>
In article <··············@linux.droberts.com>, Dave Roberts
<···········@remove-findinglisp.com> wrote:

> Duane Rettig <·····@franz.com> writes:

> > 
> > It's simple; the industry moves to where the money is.
> 
> Always. ;-) Anything else is a hobby, not a business.

Even jokingly, not always.

Here's one example. Intel made good use of MAINSAIL for chip design. It
was a boutique language, derived from SAIL, the Stanford AI Lab
language (other than LISP), based largely on Algol. The MAIN part came
from MAchine INdependent.

Anyway, a boutique language supported by a small handful of people.
Last I checked, they were all based out of Bodega Bay, CA, or somesuch
small town. 

The "industry" and the "money" definitely went to more popular and more
trendy languages. 

But still it is possible to make good tools and even make good money
doing something that is not where the money is moving to.

--Tim May
From: Thomas Gagne
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <79KdnUZBzKaKQL7fRVn-qQ@wideopenwest.com>
Duane Rettig wrote:
> Thomas Gagne <······@wide-open-west.com> writes:
> 
<snip>
>>
>>
>>All very relevant, but I don't understand why NVidia and ATI and other
>>graphic chip designers seem unencumbered by these restrictions, which
>>seem more psychological and economical.

Duane, would your response have been different if I had typed ".. seemed 
more psychological *than* economical?"  The reason I say that is, as you 
point out, the GP vendors seem not to have been constrained as much by 
what's already there than by what new features they think gamers will 
pay money for.  And again my point would be my willingness to purchase a 
special-purpose processor if it accelerated the execution of VM-based 
languages like Smalltalk, Lisp, and Java.
From: Duane Rettig
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <47jkrokbw.fsf@franz.com>
Thomas Gagne <······@wide-open-west.com> writes:

> Duane Rettig wrote:
> > Thomas Gagne <······@wide-open-west.com> writes:
> >
> 
> <snip>
> >>
> >>
> >>All very relevant, but I don't understand why NVidia and ATI and other
> >>graphic chip designers seem unencumbered by these restrictions, which
> >>seem more psychological and economical.
> 
> > Precisely.  If you note my description of what constitutes GP
> > hardware,
>
> Duane, would your response have been different if I had typed
> ".. seemed more psychological *than* economical?"  The reason I say
> that is, as you point out, the GP vendors seem not to have been
> constrained as much by what's already there as by what new features
> they think gamers will pay money for.  And again my point would be my
> willingness to purchase a special-purpose processor if it accelerated
> the execution of VM-based languages like Smalltalk, Lisp, and Java.

And if you are able to put together a strong consortium of Smalltalk,
Lisp, and Java users willing to pay for such feature enhancement,
you could probably get a GP vendor to listen to you.

No, my response to an assertion that put psychology over economy
would be disagreement.  Psychology is usually not inconsequential,
but in any organization driven by the bottom line, the bottom
line is always the bottom line...

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m38y58h5wy.fsf@linux.droberts.com>
Duane Rettig <·····@franz.com> writes:

> Dave Roberts <···········@remove-findinglisp.com> writes:
> > I think you missed the root of my question.
> 
> No, I didn't, but in order to really answer it, I would have to
> provide a lot of background.  OK, you asked for it...

Okay, hit me! ;-)

> > The question is really about changing the assumptions of the second
> > set.
> 
> Yes, and my caution is that we want not to do that.  In a previous
> career I was a hardware engineer, and I worked in test engineering
> labs.  We built special-purpose equipment for specialized testing
> strategies, and yet there was always the requirement that we keep
> costs down and reuse high as much as possible - the problem with
> not doing so was not that it was too expensive, but that we would
> miss dates for production startups.  Thus, even in such highly
> specialized situations, we tried as much as possible to stay with
> a mainstream of design thought, geared toward the high reuse of
> older designs _and_ the brainstorming of potential future uses.
> Needless to say that this thinking led to the incorporation of
> a lot of software into this design process.  And as little as
> we could change hardware, the decisions were usually weighted
> toward those choices.
> 
> When I started working on Lisp, I joined a Company that believed in
> getting the most out of General Purpose hardware, and I joined
> it for a reason, so it should be no surprise that I advocate for
> a more GP solution, or taking advantage of hardware that can
> also be used generally; thus increasing its chances of that
> hardware surviving a generation or two of redesigns.

Okay, fair enough. You'll find that I tend toward a lot of thought
experiments along the line of, "How fast could this go, with no
artificial limits?" The people that I work with will recognize these
as my "speed of light" questions. My style tends to be to try to
determine the boundaries of the underlying problem, as shaped by
the actual problem itself and not by any external limitations.

Then, I back things off with the reality that you can never start from
scratch and often have to make accommodation. For your case, for
instance, it certainly makes sense for Franz to concentrate on
standard platforms.

What this personal methodology does, however, is make you aware of how
much performance/cost/etc. you are leaving on the table by choosing a
certain route. It also tends to make you much more aware of where the
pressure points are that cause you to leave things on the table.

So, in short, view my original question as one of those speed-of-light
types, just trying to probe the boundaries of the problem at hand.

> > If we could make a machine do what we *want* rather than what
> > some CPU designer thought was the right thing, could we do better? If
> > so, would we still use the same algorithms, or would we shift to a new
> > optimum point on the curve?
> 
> Very likely so, but if it did not also have a local optimum for
> non-gc languages, then it would be likely to die the same death
> that most special-purpose hardware eventually dies.  

Yes, agreed. The question ultimately is whether there is a
compromise. Again, I'm thinking about something like the original
Pentium MMX instruction set extensions for various media
operations. Call them DLX for Dynamic Language Extensions, for
instance.

> Now, I believe that my view on what constitutes GP hardware might be
> surprising, so I will state it simply - it is not the architecture that
> makes a computer General Purpose, but its survivability.  Consider,
> for example, your own computer; it is actually a special-purpose box
> that has a certain set of hardware that runs on it at a certain speed
> (likely it is as fast as it can reasonably be clocked), and even if you
> just bought it recently, chances are it will be obsolete in 5 years.
> Why?  Because you will be buying other boxes (perhaps even with a
> different architecture) by then, because your nice, crisp box will
> seem so painfully slow by then.  And computer manufacturers keep making
> new machines, so the older ones will continue to grow obsolete.
> But each generation of architecture has many points in common with its
> predecessors, so there is a continuum along which buyers of these
> computers are willing to travel.  That is my GP hardware philosophy.
> 
> So how does a new feature make it into an architecture _and_ stay?  It
> must be in demand by a broad range of users, at least enough to sustain
> a critical mass of buyers.  Otherwise, costs of adding the extra usage to
> the real-estate of the chip, and the extra design and debug time, make
> it less likely that a feature would get in.

Sort of. Realize that there are two interests at work here: those of
the buyers and those of the sellers. The buyers want best
price/performance on the things that they are using today. The sellers
want to keep the wheel of obsolescence spinning as rapidly as possible
such that in five years (and preferably three) you really do want to
upgrade to a new machine. (That said, I'm typing this on a
five-year-old laptop that works perfectly acceptably for the basic
editing tasks that I ask of it. ;-) So, the sellers are motivated to
introduce new capabilities that shake up the status quo simply for the
reason that there needs to be something new to keep on attracting the
old buyers. I could easily see Intel or AMD saying, "Gee, software
development is shifting towards various virtual-machine sorts of
architectures. That's interesting because it gives us something new to
optimize for, but also a threat since it makes us less relevant. Let's
see if we can add a couple features to our products such that they run
those VMs better than other architectures, thereby keeping the
momentum for the underlying hardware moving in our direction." Given
that all those various VM-based languages, Java and .NET, for
instance, use GC, that's probably a good thing for us
Lispers/Smalltalkers/etc.

> Now, one thing that we GC advocates now have going for us is the
> newfound acceptability in the marketplace of the concept of
> garbage-collection.  It is no longer a dirty word, although there
> are still many different gc concepts and theories about how the
> colorations can be done either in software or sped up in hardware.
> So whether a hardware-based approach will fly (i.e. be incorporated
> into silicon and last) depends on how general the concept can be
> made, and how much gain it can give to all languages for an appropriate
> amount of cost.

Right. Agreed fully. I'm starting to think, however, that the time has
come to ask the question. The answer is no longer a strange,
Lisp-specific thing, but something broadly applicable to everybody
running any garbage collected language.

> So, finally, we come to the answer to your question.  But, (and sorry
> about this :-) instead of just giving you an answer, which would only
> be an opinion of mine, let me turn your question on its ear and let
> me answer you by asking you a question: Given that you want to remove
> the copying of objects by a generational gc, what steps might you take
> in hardware to maintain the generational nature of the gc, and yet to
> provide the information necessary to perform the garbage collection?

Hmmm... I don't think I'm qualified to answer. ;-) And that said, I
don't actually want to remove the copying. My question was simply
whether one *would* remove the copying or not. Moving things around in
memory certainly costs time, and it never runs at faster than 1/2 the
speed of your memory bandwidth for anything larger than will fit in
cache. Given that processor speeds have increased tremendously but
memory speeds have not kept pace, it's natural to question whether
this is a problem.
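
To put rough, made-up numbers on that: with, say, 3 GB/s of sustained
memory bandwidth, copying a 10 MB survivor set means reading 10 MB and
writing 10 MB, so the copy alone costs on the order of 20 MB / 3 GB/s,
or about 7 milliseconds per collection, before any of the tracing work
is counted.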

In my own particular area of expertise, networking, we see this all
the time. For instance, many of the networking stack implementations
are jumping through hoops to try to avoid copies for exactly this
reason.

Whether this is an issue in current GC implementations was part of my
question. I don't know that it is. You might tell me that there are
other portions of the GC cycle that dominate and that for most
generational collectors the surviving objects in the Eden set are very
small and thus easy to copy. That's the root of the question. Clearly,
today, it's faster to copy because it makes following pointers and the
rest so much easier, avoids fragmentation, etc. If you could avoid
some of those other problems using hardware, however, the question is
whether you would stick with the same technique. Only somebody who
works closely with a GC implementation could say for sure. ;-)

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Duane Rettig
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <4bra3oknj.fsf@franz.com>
Dave Roberts <···········@remove-findinglisp.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Dave Roberts <···········@remove-findinglisp.com> writes:
> > > I think you missed the root of my question.
> > 
> > No, I didn't, but in order to really answer it, I would have to
> > provide a lot of background.  OK, you asked for it...
> 
> Okay, hit me! ;-)
> 
> > > The question is really about changing the assumptions of the second
> > > set.
> > 
> > Yes, and my caution is that we want not to do that.  In a previous
> > career I was a hardware engineer, and I worked in test engineering
> > labs.  We built special-purpose equipment for specialized testing
> > strategies, and yet there was always the requirement that we keep
> > costs down and reuse high as much as possible - the problem with
> > not doing so was not that it was too expensive, but that we would
> > miss dates for production startups.  Thus, even in such highly
> > specialized situations, we tried as much as possible to stay with
> > a mainstream of design thought, geared toward the high reuse of
> > older designs _and_ the brainstorming of potential future uses.
> > Needless to say that this thinking led to the incorporation of
> > a lot of software into this design process.  And as little as
> > we could change hardware, the decisions were usually weighted
> > toward those choices.
> > 
> > When I started working on Lisp, I joined a Company that believed in
> > getting the most out of General Purpose hardware, and I joined
> > it for a reason, so it should be no surprise that I advocate for
> > a more GP solution, or taking advantage of hardware that can
> > also be used generally; thus increasing its chances of that
> > hardware surviving a generation or two of redesigns.
> 
> Okay, fair enough. You'll find that I tend toward a lot of thought
> experiments along the line of, "How fast could this go, with no
> artificial limits?" The people that I work with will recognize these
> as my "speed of light" questions. My style tends to be to try to
> determine the boundaries of the underlying problem, as shaped by
> the actual problem itself and not by any external limitations.
> 
> Then, I back things off with the reality that you can never start from
> scratch and often have to make accommodation. For your case, for
> instance, it certainly makes sense for Franz to concentrate on
> standard platforms.
> 
> What this personal methodology does, however, is make you aware of how
> much performance/cost/etc. you are leaving on the table by choosing a
> certain route. It also tends to make you much more aware of where the
> pressure points are that cause you to leave things on the table.
> 
> So, in short, view my original question as one of those speed-of-light
> types, just trying to probe the boundaries of the problem at hand.

OK, thinking outside of the box is a Good Thing, but there has to be
a box to think outside of.  When you talk about probing boundaries,
you have to know what those boundaries are, otherwise the concept of
probing them is meaningless.  And as for speed-of-light thinking (which
I take to mean allowing yourself to consider approaching the
unapproachable) you still have to know just what that speed is.
As you approach the speed of light, your mass increases without bound.
How close do you want to get before you give up (or I could ask:
how heavy do you want to get? :-)

As far as GCs go, there should be certain principles that can be
surmised, even without being an expert in GC technology.  And
these basic principles need to be considered when thinking about
the theoretical "anything you want" offer.  For example, if you
were a hardware vendor offering to build me anything I wanted,
my first question would be "how long would you support my
extra feature?".


> > > If we could make a machine do what we *want* rather than what
> > > some CPU designer thought was the right thing, could we do better? If
> > > so, would we still use the same algorithms, or would we shift to a new
> > > optimum point on the curve?
> > 
> > Very likely so, but if it did not also have a local optimum for
> > non-gc languages, then it would be likely to die the same death
> > that most special-purpose hardware eventually dies.  
> 
> Yes, agreed. The question ultimately is whether there is a
> compromise. Again, I'm thinking about something like the original
> Pentium MMX instruction set extensions for various media
> operations. Call them DLX for Dynamic Language Extensions, for
> instance.

It doesn't have to be a compromise if it benefits non-dynamic
languages as well.

> > Now, I believe that my view on what constitutes GP hardware might be
> > surprising, so I will state it simply - it is not the architecture that
> > makes a computer General Purpose, but its survivability.  Consider,
> > for example, your own computer; it is actually a special-purpose box
> > that has a certain set of hardware that runs on it at a certain speed
> > (likely it is as fast as it can reasonably be clocked), and even if you
> > just bought it recently, chances are it will be obsolete in 5 years.
> > Why?  Because you will be buying other boxes (perhaps even with a
> > different architecture) by then, because your nice, crisp box will
> > seem so painfully slow by then.  And computer manufacturers keep making
> > new machines, so the older ones will continue to grow obsolete.
> > But each generation of architecture has many points in common with its
> > predecessors, so there is a continuum along which buyers of these
> > computers are willing to travel.  That is my GP hardware philosophy.
> > 
> > So how does a new feature make it into an architecture _and_ stay?  It
> > must be in demand by a broad range of users, at least enough to sustain
> > a critical mass of buyers.  Otherwise, costs of adding the extra usage to
> > the real-estate of the chip, and the extra design and debug time, make
> > it less likely that a feature would get in.
> 
> Sort of. Realize that there are two interests at work here: those of
> the buyers and those of the sellers. The buyers want best
> price/performance on the things that they are using today. The sellers
> want to keep the wheel of obsolescence spinning as rapidly as possible
> such that in five years (and preferably three) you really do want to
> upgrade to a new machine. (That said, I'm typing this on a
> five-year-old laptop that works perfectly acceptably for the basic
> editing tasks that I ask of it. ;-) So, the sellers are motivated to
> introduce new capabilities that shake up the status quo simply for the
> reason that there needs to be something new to keep on attracting the
> old buyers.

I disagree.  Planned obsolescence is a working concept for technologies
that don't change, but Moore's Law drives buyers to view their purchases
as obsolete all on their own.  The vendors don't have to do much work on
that front; all they need to do is to keep up the breathtaking pace in
order to stay competitive.

> I could easily see Intel or AMD saying, "Gee, software
> development is shifting towards various virtual-machine sorts of
> architectures. That's interesting because it gives us something new to
> optimize for, but also a threat since it makes us less relevant. Let's
> see if we can add a couple features to our products such that they run
> those VMs better than other architectures, thereby keeping the
> momentum for the underlying hardware moving in our direction." Given
> that all those various VM-based languages, Java and .NET, for
> instance, use GC, that's probably a good thing for us
> Lispers/Smalltalkers/etc.

That is certainly possible, but someone will have to make the sales
pitch to them, because they won't tend to figure it out on their
own.  Not because they are stupid, but because languages like Lisp
and Smalltalk aren't very visible to them, business-wise.  There are
already proposals, I believe, for JVM and CLR machines, and perhaps even
some available, but I don't know how advanced they are yet, or whether
they _really_ compete with GP machines yet.

> > Now, one thing that we GC advocates now have going for us is the
> > newfound acceptability in the marketplace of the concept of
> > garbage-collection.  It is no longer a dirty word, although there
> > are still many different gc concepts and theories about how the
> > colorations can be done either in software or sped up in hardware.
> > So whether a hardware-based approach will fly (i.e. be incorporated
> > into silicon and last) depends on how general the concept can be
> > made, and how much gain it can give to all languages for an appropriate
> > amount of cost.
> 
> Right. Agreed fully. I'm starting to think, however, that the time has
> come to ask the question. The answer is no longer a strange,
> Lisp-specific thing, but something broadly applicable to everybody
> running any garbage collected language.

Agreed.

> > So, finally, we come to the answer to your question.  But, (and sorry
> > about this :-) instead of just giving you an answer, which would only
> > be an opinion of mine, let me turn your question on its ear and let
> > me answer you by asking you a question: Given that you want to remove
> > the copying of objects by a generational gc, what steps might you take
> > in hardware to maintain the generational nature of the gc, and yet to
> > provide the information necessary to perform the garbage collection?
> 
> Hmmm... I don't think I'm qualified to answer. ;-) And that said, I
> don't actually want to remove the copying. My question was simply
> whether one *would* remove the copying or not. Moving things around in
> memory certainly costs time, and it never runs at faster than 1/2 the
> speed of your memory bandwidth for anything larger than will fit in
> cache. Given that processor speeds have increased tremendously but
> memory speeds have not kept pace, it's natural to question whether
> this is a problem.
> 
> In my own particular area of expertise, networking, we see this all
> the time. For instance, many of the networking stack implementations
> are jumping through hoops to try to avoid copies for exactly this
> reason.

OK, this tells me the reason for your question, at least.  Please
consider that it is not _copying_, per se, that is of the devil,
because, if you think about it, most of CPU activity is moving
data from one location to another.  What you're _really_ shying
away from, in a paged, virtual memory architecture, with various
levels of caching at different speeds, is that copying blocks of
data smaller than the cache or page size generates a quantum cost
in speed that is associated with paging.  In other words, copying
a word between two pages that haven't been touched for a while is
very expensive, but a copy that comes right after it will cost
very little.  The problem is that it is not _copying_ that is at
fault, but _referencing_ far data at all.

The positive effect that copying has on a gc is that it tends to
automatically group live data together, which thus aids locality of
reference, which reduces the chance that a reference will be to a
location not paged-in or in-cache.  If you do not copy data, then
you save the copying time, but you lose locality of reference,
because what has become garbage in between the live data has tended
to isolate that live data, and to drive it toward the worst-case of
having one datum per cache line or page.  Penny wise, pound foolish.
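
To make the locality point concrete, here is a minimal sketch of a
Cheney-style two-space copy, under simplifying assumptions that are
nobody's real collector: every object has the same shape (two reference
slots), tospace is big enough, and there are no tags, large objects, or
write barriers.  The thing to notice is that the survivors come out
packed contiguously in tospace, in breadth-first order, no matter how
scattered they were in fromspace:

#include <stddef.h>
#include <string.h>

typedef struct obj {
    struct obj *forward;    /* NULL until copied; then its new address */
    struct obj *slot[2];    /* fixed shape: two references per object  */
    long        payload;
} obj;

/* Copy one object into tospace (bump allocation) unless it has already
   been moved, in which case just follow its forwarding pointer. */
obj *copy(obj *o, obj **free_ptr)
{
    obj *new_obj;
    if (o == NULL)
        return NULL;
    if (o->forward != NULL)
        return o->forward;
    new_obj = (*free_ptr)++;
    memcpy(new_obj, o, sizeof *new_obj);
    new_obj->forward = NULL;
    o->forward = new_obj;          /* leave a forwarding pointer behind */
    return new_obj;
}

/* roots[] holds the mutator's live references; tospace must have room
   for everything reachable from them.  On return, the live data sits
   contiguously between tospace and the returned free pointer. */
obj *collect(obj **roots, size_t nroots, obj *tospace)
{
    obj *free_ptr = tospace;
    obj *scan     = tospace;
    size_t i;

    for (i = 0; i < nroots; i++)
        roots[i] = copy(roots[i], &free_ptr);

    while (scan < free_ptr) {      /* Cheney scan: no recursion, no stack */
        scan->slot[0] = copy(scan->slot[0], &free_ptr);
        scan->slot[1] = copy(scan->slot[1], &free_ptr);
        scan++;
    }
    return free_ptr;
}

The garbage is never touched at all, and the improved cache and page
behavior of the mutator afterward is what pays for the copy.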

> Whether this is an issue in current GC implementations was part of my
> question. I don't know that it is. You might tell me that there are
> other portions of the GC cycle that dominate and that for most
> generational collectors the surviving objects in the Eden set are very
> small and thus easy to copy. That's the root of the question. Clearly,
> today, it's faster to copy because it makes following pointers and the
> rest so much easier, avoids fragmentation, etc. If you could avoid
> some of those other problems using hardware, however, the question is
> whether you would stick with the same technique. Only somebody who
> works closely with a GC implementation could say for sure. ;-)

If we were to move back toward non-paged, non-cached memory
architectures, then fragmentation wouldn't cause loss of locality and
thus wouldn't matter, and I would agree that copying wouldn't be
necessary.  But I don't see that hardware trend being reversed anytime
soon.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m3y8d7qih7.fsf@linux.droberts.com>
Duane Rettig <·····@franz.com> writes:

> OK, thinking outside of the box is a Good Thing, but there has to be
> a box to think outside of.  When you talk about probing boundaries,
> you have to know what those boundaries are, otherwise the concept of
> probing them is meaningless.  And as for speed-of-light thinking (which
> I take to mean allowing yourself to consider approaching the
> unapproachable) you still have to know just what that speed is.
> As you approach the speed of light, your mass increases without bound.
> How close do you want to get before you give up (or I could ask:
> how heavy do you want to get? :-)

We're probably stretching this way too far already, but I'll dive in
again... ;-)

You're confusing asking the question "What is c (the speed of light)?"
with "How would I build something to go as fast as c?" I'm just asking
the first, and the answer is more constrained by the fundamental
problem space. I think you are confusing my question for the
second. My response to your question about how heavy I want to get is,
"We'll get around to that in a second. What's the upper bound on my
speed?" After we can answer that, I'll ask, "How close to that upper
bound can I get with a reasonable mass gain (say 200%)?"

Does that make sense? Anyway, any optimization problem I encounter, I
typically find that I can get 50% - 80% of the gain fairly
easily. After that, there are diminishing returns. But you have to
know what you're leaving on the table and be able to compare that with
the remaining effort to reach it to make the appropriate tradeoffs on
where to stop.

> As far as GCs go, there should be certain principles that can be
> surmised, even without being an expert in GC technology.  And
> these basic principles need to be considered when thinking about
> the theoretical "anything you want" offer.  For example, if you
> were a hardware vendor offering to build me anything I wanted,
> my first question would be "how long would you support my
> extra feature?".

Of course, but you're asking that question way too early. My response
back would be, what's your feature? If it's only one transistor over
there in the corner and it's simple for me to maintain and it brings
great speedups to all sorts of codes, well then possibly I'd support
it forever. That's particularly the case if I can attach some great
marketing term to it like "Dynamic Language Extensions" or "Java
Xcellerator Module," such that people now think that whatever I sold
them last year is obsolete. (Bonus points if the marketing term can
reduce to a TLA and has lots of "power letters" like X, R, and G. ;-)
If it requires me to double my die size and it's specific to you, my
response would be, how much money do you have? ;-)

Anyway, I think we pretty well thrashed this subject... ;-)



My bottom line: if the software industry could articulate a few
changes to existing common, general purpose processors (x86, PPC,
etc.) that would speed up codes written in dynamic or
garbage-collected languages, I think the hardware industry would be
receptive to that. Moore's law is starting to slow down and they are
looking for orthogonal optimizations. Sun's Niagara is an example of
that, optimizing for parallel, concurrent threaded workloads rather
than pure straight-line performance. With the continuing rise of Java,
.NET, Perl, Python, and other GC'd, non-C languages (not to mention
the resurgence in Lisp and Smalltalk ;-), I think the case can be made
that hardware companies should at least be looking at what sort of
additional features they can add to their existing GP designs in order
to better run this growing body of code.

The big question for the other side of the house is, if they came to
us and said, "We're ready to do something...," what would we tell them
to do?

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Steven Shaw
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <42209bc3_1@news.iprimus.com.au>
Duane Rettig wrote:
> The reason I talk about traps rather than this indirection technique
> is that it is more likely to occur, because less hardware would be
> involved (and it is thus more general).  Operating systems can indeed
> do relatively fast trap handling, as long as the context switch isn't
> heavyweight (or unless there is no context switch at all).  Saving of
> context, including MMX and XMM registers on Pentium and AMD hardware,
> is one of the reasons why context switches are in fact so heavyweight.

IIRC, Linux uses a flag that is set when the first MMX  operation is 
performed (via a trap). This way, when it's time to context switch, 
saving the MMX registers can be avoided for threads that don't use them 
in their time slice.
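
A sketch of that idea in C, with entirely hypothetical names (this is
not the actual Linux code): the switch path never saves the FP/MMX/XMM
state eagerly, it just re-arms the "FPU unavailable" trap, and only a
thread that actually touches those registers during its slice pays for
the save and restore:

struct thread {
    int  used_fpu_this_slice;   /* set by the trap, cleared at switch-in */
    char fpu_state[512];        /* fxsave-style save area                */
};

/* Placeholder hardware hooks -- in a real kernel these are a few
   instructions each (e.g. toggling CR0.TS and fxsave/fxrstor on x86). */
void save_fpu_state(char *area)    { (void)area; }
void restore_fpu_state(char *area) { (void)area; }
void disable_fpu(void)             { }
void enable_fpu(void)              { }

void context_switch(struct thread *prev, struct thread *next)
{
    if (prev->used_fpu_this_slice)
        save_fpu_state(prev->fpu_state);  /* only if it was touched */
    disable_fpu();                        /* arm the trap for the next user */
    next->used_fpu_this_slice = 0;
    /* ... switch stacks, address space, and so on ... */
}

/* The first FP/MMX instruction a thread executes in its slice traps here. */
void fpu_unavailable_trap(struct thread *current)
{
    enable_fpu();
    restore_fpu_state(current->fpu_state);
    current->used_fpu_this_slice = 1;     /* charge the save at next switch */
}

A real kernel has more bookkeeping (per-CPU state, signals, preemption),
but the shape of the optimization is the same.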
From: David Magda
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <867jktogw8.fsf@number6.magda.ca>
Duane Rettig <·····@franz.com> writes:

> Note that this design was done in 1984, and thus specified before
> Gabriel's 1984 book on benchmarking Lisp systems; I don't know what
> care Smalltalkers gave to performance back then, but if it was
> anything like what Gabriel found in Lisp systems...

The ellipsis ("...") leaves a lot out for those of us not in the
know. Doing a quick search, I found this ACM paper:

http://portal.acm.org/citation.cfm?id=802143

which may be part of what you're referring to. Do you have any
details readily available?

-- 
David Magda <dmagda at ee.ryerson.ca>, http://www.magda.ca/
Because the innovator has for enemies all those who have done well under
the old conditions, and lukewarm defenders in those who may do well 
under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI
From: David Magda
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <863bvhofzf.fsf@number6.magda.ca>
David Magda <··················@ee.ryerson.ca> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > Note that this design was done in 1984, and thus specified before
> > Gabriel's 1984 book on benchmarking Lisp systems; I don't know what
> > care Smalltalkers gave to performance back then, but if it was
> > anything like what Gabriel found in Lisp systems...
> 
> The ellipsis ("...") leaves a lot out for those of us not in the
> know. Doing a quick search, I found this ACM paper:
> 
> http://portal.acm.org/citation.cfm?id=802143
> 
> which may be part of what you're referring to. Do you have any
> details readily available?

The book referred to is available in PDF format at (ISBN
0-262-07093-6):

http://www.dreamsongs.com/Books.html

-- 
David Magda <dmagda at ee.ryerson.ca>, http://www.magda.ca/
Because the innovator has for enemies all those who have done well under
the old conditions, and lukewarm defenders in those who may do well 
under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI
From: Duane Rettig
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <4mztpp19y.fsf@franz.com>
David Magda <··················@ee.ryerson.ca> writes:

> David Magda <··················@ee.ryerson.ca> writes:
> 
> > Duane Rettig <·····@franz.com> writes:
> > 
> > > Note that this design was done in 1984, and thus specified before
> > > Gabriel's 1984 book on benchmarking Lisp systems; I don't know what
> > > care Smalltalkers gave to performance back then, but if it was
> > > anything like what Gabriel found in Lisp systems...
> > 
> > The ellipsis ("...") leaves a lot out for those of us not in the
> > know. Doing a quick search, I found this ACM paper:

Yes, sorry about that; I don't often cross-post, so I had given too
little context.

> > http://portal.acm.org/citation.cfm?id=802143
> > 
> > which may be part of what you're referring to.
> 
> The book referred to is available in PDF format at (ISBN
> 0-262-07093-6):
> 
> http://www.dreamsongs.com/Books.html

Yes, this is the one.  Thanks for finding the reference; I used it
just now since my dead-tree version is at the office.

>    do you have any
> details readily available?

Mostly the numbers are now meaningless, since there are so many
new lisps and most are already optimized to maximize speed for these
benchmarks (though sometimes that becomes a negative for overall
performance).  However, the book is a fascinating read, and after
each benchmark are some quotes and quips by developers and maintainers
of various lisps, starting on p 92 after the description and timings
of the Tak benchmark.  Some of the statements form a good cross-section
of the attitudes of lisp maintainers at the time; they range from
"oops", to "oh yeah, that's easy to fix" to "this is stupid" to
"what a poor programming example" (all paraphrases).

Note when you read this book that the mentions of Franz are not
to the Company for which I work, but to Franz Lisp, which is a
free implementation which was distributed with BSD and which
became very popular in the early '80s (and which has nothing
to do with our current product, Allegro CL).

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Holger Duerer
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <874qg3eywy.fsf@ronaldann.demon.co.uk>
>>>>> "Bulent" == Bulent Murtezaoglu <··@acm.org> writes:
  >>>>> "DR" == Dave Roberts <···········@remove-findinglisp.com> writes:
    [... tags not working in newer Sparcs? ...]

    Bulent> I was waiting for Duane Rettig to chime in on this,
    Bulent> but perhaps he's too busy to care at the moment.  In any
    Bulent> event, when the subject of tagging came up before he'd
    Bulent> tell us that it isn't necessarily hardware support for
    Bulent> tagging but rather fast user-level traps that'd help CL
    Bulent> implementors (alongside other GC languages).

Hmmm.  This has developed into an interesting thread.  But I now get a
bit confused:  Is there some agreement among the experts/knowledgeable
as to what is actually needed to make HW useful for dynamic languages?
Is that actually more hardware specific than I thought?

Coming back to the original quote that sparked this discussion:  What
made these machines at PARC so fast that today's machines are only ~50
times faster?

 [...]

    Holger
From: Christopher Browne
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <3852seF5ko48nU1@individual.net>
After takin a swig o' Arrakan spice grog, Holger Duerer <········@gmx.net> belched out:
>>>>>> "Bulent" == Bulent Murtezaoglu <··@acm.org> writes:
>   >>>>> "DR" == Dave Roberts <···········@remove-findinglisp.com> writes:
>     [... tags not working in newer Sparcs? ...]
>
>     Bulent> I was waiting for Duane Rettig to chime in on this,
>     Bulent> but perhaps he's too busy to care at the moment.  In any
>     Bulent> event, when the subject of tagging came up before he'd
>     Bulent> tell us that it isn't necessarily hardware support for
>     Bulent> tagging but rather fast user-level traps that'd help CL
>     Bulent> implementors (alongside other GC languages).
>
> Hmmm.  This has developed into an interesting thread.  But I now get
> a bit confused: Is there some agreement among the
> experts/knowledgeable as to what is actually needed to make HW
> useful for dynamic languages?  Is that actually more hardware
> specific than I thought?
>
> Coming back to the original quote that sparked this discussion: What
> made these machines at PARC so fast that today's machines are only
> ~50 times faster?

Indeed.  I'm VERY curious about that, and about what kinds of
applications were involved.

It's pretty fair to say that recent enhancements have been getting
squandered on allowing GUIs to add extra "chrome" that is pretty but
well-nigh useless.

On the Microsoft side, we see new architectures allow Office to have
"more chrome" so that it doesn't actually get slower even though it
requires 8x the memory and 20x the CPU that it did a few years ago.

On Linux and such, the same is happening with X/GNOME/KDE, and
Mozilla/OpenOffice.org.

Things _aren't_ faster; they're only barely successful in treading
water.

Mind you, I'm seeing some massive practical speedups of late; we have
database applications at work that are seeing quite stunningly improved
performance when moving from Xeons to Opterons.  Increased cache plays
a part, along with improved memory bandwidth and a move to really
spiffy disk array stuff.  I'm not sure how much of that is really
correlated with anything "dynamic," though...
-- 
output = reverse("moc.liamg" ·@" "enworbbc")
http://linuxdatabases.info/info/slony.html
"It's difficult  to extract sense  from strings, but they're  the only
communication coin we can count on." -- Alan Perlis
From: BR
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <pan.2005.02.24.08.10.44.305633@comcast.net>
On Thu, 24 Feb 2005 04:18:23 +0000, Christopher Browne wrote:

> It's pretty fair to say that recent enhancements have been getting
> squandered on allowing GUIs to add extra "chrome" that is pretty but
> nigh well useless.

That's what GPU's are for.
From: Tim May
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <240220051116024682%timcmay@removethis.got.net>
In article <···············@individual.net>, Christopher Browne
<········@acm.org> wrote:

> After takin a swig o' Arrakan spice grog, Holger Duerer <········@gmx.net>
> belched out:
> >>>>>> "Bulent" == Bulent Murtezaoglu <··@acm.org> writes:
> >   >>>>> "DR" == Dave Roberts <···········@remove-findinglisp.com> writes:
> >     [... tags not working in newer Sparcs? ...]
> >
> >     Bulent> I was waiting for Duane Rettig to chime in on this,
> >     Bulent> but perhaps he's too busy to care at the moment.  In any
> >     Bulent> event, when the subject of tagging came up before he'd
> >     Bulent> tell us that it isn't necessarily hardware support for
> >     Bulent> tagging but rather fast user-level traps that'd help CL
> >     Bulent> implementors (alongside other GC languages).
> >
> > Hmmm.  This has developed into an interesting thread.  But I now get
> > a bit confused: Is there some agreement among the
> > experts/knowledgeable as to what is actually needed to make HW
> > useful for dynamic languages?  Is that actually more hardware
> > specific than I thought?
> >
> > Coming back to the original quote that sparked this discussion: What
> > made these machines at PARC so fast that today's machines are only
> > ~50 times faster?
> 
> Indeed.  I'm VERY curious about that, and what kinds of applications
> were therefore involved.
> 
> It's pretty fair to say that recent enhancements have been getting
> squandered on allowing GUIs to add extra "chrome" that is pretty but
> nigh well useless.

I'm also interested in seeing some benchmarks that support Alan Kay's
original point. (Not that I am doubting his overall genius, etc.)

I recall seeing the Gabriel-type benchmarks for Lisp running on the
various Lisp machines (including the D-machines, which were bipolar and
even ECL, for the Dorado, so that was mighty fast CPU technology even
for 1985). My recollection is that TAK and other benchmarks did indeed
zoom way up with the fast CMOS processors later introduced.

(Which was impressive, given the ~few nanosecond cycle times of ECL in
those days. It would be interesting to see in which year the basic Lisp
and Smalltalk benchmarks ran faster on x86 than on an ECL-based
Dorado.)

However, the CPU technology/memory technology "speed mismatch" has been
widening for decades. Look at the ratio of CPU cycle time to RAM access
time in the mid-80s versus the ratio today, and it's clear why so much
CPU real estate now goes to registers and cache. A cache miss to RAM
that is "only" a factor of 10 (wild guess) faster than the RAM of 20
years ago is very expensive, what with CPU clock rates being some three
orders of magnitude faster today than back then.
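
To put rough numbers on that (these figures are assumptions for
illustration, not measurements), the cost of a miss in CPU cycles is
just the miss latency divided by the cycle time:

  (defun miss-cost-in-cycles (ram-latency-ns clock-ghz)
    ;; ns * (cycles/ns) = cycles lost per trip to RAM
    (* ram-latency-ns clock-ghz))

  (miss-cost-in-cycles 150 0.008)  ; mid-80s: ~150 ns DRAM, 8 MHz CPU => ~1 cycle
  (miss-cost-in-cycles 60 3.0)     ; mid-00s: ~60 ns DRAM, 3 GHz CPU  => 180 cycles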

And, by the way, I don't buy the point that GUIs are involved in the
mismatch. Not for the very apps (Smalltalk, and maybe Lisp) that Alan
Kay was presumably talking about, at least. After all, those are under
their own (the coders') control, especially the CPU-intensive
benchmarks and AI code. And Squeak has a GUI that is not even as
"glitzy" as the GUIs of Xerox D-Machines of that 1980s period. So the
lack of a proper speedup is presumably not blamable on either Microsoft
or Sun or Apple GUIs.

(I'm not talking about any Microsoft products like "Office" or
"Explorer," which may or may not be bloated. But this wasn't really
Alan Kay's point about CPU speeds, as near as I could tell.)

--Tim May


> 
> On the Microsoft side, we see new architectures allow Office to have
> "more chrome" so that it doesn't actually get slower even though it
> requires 8x the memory and 20x the CPU as it did the other year.
> 
> On Linux and such, the same is happening with X/GNOME/KDE, and
> Mozilla/OpenOffice.org.
> 
> Things _aren't_ faster; they're only barely successful in treading
> water.
> 
> Mind you, I'm seeing some massive practical speedups, of late; we have
> database applications at work that are seeing quite stunning improved
> performance when moving from Xeons to Opterons.  Increased cache plays
> along with improved memory bandwidth and a move to really spiffy disk
> array stuff.  I'm not sure how much of that is terribly much
> correlated with anything "dynamic," though...
From: Eliot Miranda
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <d_qTd.525$C47.329@newssvr14.news.prodigy.com>
Holger Duerer wrote:

>>>>>>"Bulent" == Bulent Murtezaoglu <··@acm.org> writes:
> 
>   >>>>> "DR" == Dave Roberts <···········@remove-findinglisp.com> writes:
>     [... tags not working in newer Sparcs? ...]
> 
>     Bulent> I was waiting for Duane Rettig to chime in on this,
>     Bulent> but perhaps he's too busy to care at the moment.  In any
>     Bulent> event, when the subject of tagging came up before he'd
>     Bulent> tell us that it isn't necessarily hardware support for
>     Bulent> tagging but rather fast user-level traps that'd help CL
>     Bulent> implementors (alongside other GC languages).
> 
> Hmmm.  This has developed into an interesting thread.  But I now get a
> bit confused:  Is there some agreement among the experts/knowledgeable
> as to what is actually needed to make HW useful for dynamic languages?
> Is that actually more hardware specific than I thought?

What's needed is a closer relationship between hardware designers and 
dynamic language implementors.  Given that, one has room to experiment,
to invent, and to find out which features work best.  Some people are doing 
this commercially.


> Coming back to the original quote that sparked this discussion:  What
> made these machines at PARC so fast that today's machines are only ~50
> times faster?

Flexibility.  The machines at PARC were microcoded so one had a good 
degree of programmability in the hardware.  One can't add instructions 
to commodity processors.  If one could, then certain code could be made 
to go much faster.

Imagine systems which included field-programmable logic to implement 
things such as associative lookup in hardware.   Or programmability in 
the memory cell. Or...
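
As a toy illustration of the kind of associative lookup meant here (the
names are made up for the sketch, not any particular VM's code): a
dynamic-language runtime spends a lot of time mapping (class, selector)
pairs to methods, which it caches in software roughly like this, and
which programmable hardware could conceivably do in a single step.

  (defvar *method-cache* (make-hash-table :test #'equal))

  (defun cached-lookup (class selector slow-lookup)
    ;; Consult the cache first; fall back to the expensive search and
    ;; remember its answer.
    (let ((key (cons class selector)))
      (or (gethash key *method-cache*)
          (setf (gethash key *method-cache*)
                (funcall slow-lookup class selector)))))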

-- 
_______________,,,^..^,,,____________________________
Eliot Miranda              Smalltalk - Scene not herd
From: BR
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <pan.2005.02.25.04.11.13.775015@comcast.net>
On Thu, 24 Feb 2005 20:41:45 +0000, Eliot Miranda wrote:

> Flexibility.  The machines at PARC were microcoded so one had a good
> degree of programmability in the hardware.  One can't add instructions
> to commodity processors.  If one could then certain codes could be made
> to go much faster.

If memory serves (I'll have to read the manuals again), doesn't the SPARC
have the capability to add to the instruction set (no microcode, mind you)?
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m3d5uose8g.fsf@linux.droberts.com>
Eliot Miranda <······@pacbell.net> writes:

> Flexibility.  The machines at PARC were microcoded so one had a good
> degree of programmability in the hardware.  One can't add instructions
> to commodity processors.  If one could then certain codes could be
> made to go much faster.
> 
> Imagine systems which included field-programmable logic to implement
> things such as associative lookup in hardware.   Or programmability in
> the memory cell. Or...

Hmmm... this tickled some deeply repressed neurons...

Don't most modern x86 processors have some amount of writable
microcode store that is used for "patching" the microcode?

Ah, yes, gotta love Google:

http://www.derkeiler.com/Mailing-Lists/Securiteam/2004-07/0090.html

The security implications are kind of scary, but it does open up some
possibilities.

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: Tim Rowledge
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <e77069434d.rowledge@Gravious.telus.net>
In message <··············@linux.droberts.com>
          Dave Roberts <···········@remove-findinglisp.com> wrote:


> 
> Hmmm... this tickled some deeply repressed neurons...
> 
> Don't most modern x86 processors have some amount of writable
> microcode store that is used for "patching" the microcode store?
Not quite microcode, but recent ARM architectures have a 'TCM' - tightly
coupled memory - which is basically cache without the irritating cache
controller that thinks it always knows best. You can put pretty much
what you want in the TCM and the loading/flushing is under software
control. Of course, that then leaves one with the interesting fun of
deciding _whose_ software gets to control it. I don't care so long as
it's mine, mine, all mine... 

tim
--
Tim Rowledge, ···@sumeru.stanford.edu, http://sumeru.stanford.edu/tim
Useful random insult:- Always loses battles of wits because he's unarmed.
From: lin8080
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <421F14CC.2E9C7409@freenet.de>
Eliot Miranda schrieb:

...
> What's needed is a closer relationship between hardware designers and
> dynamic language implementors.  Given this one has room to experiment to
> invent and find out which features work best.  Some people are doing
> this commercially.

Hi.

Once there was a CPU optimization for Pascal. We still have it today,
but who uses Pascal now? The CPU designers have a hard problem making a
universally usable architecture.

Maybe it is easier to design a specific layout for a small market?
Other layouts are possible (see Java -- perhaps a way of testing how
well such a chip would sell?). But who will take on the financial risk?

The next step is the chip that isn't in a computer box at all, as found
in mobile phones, robots, neural nets, handhelds or multimedia devices
(even notebook chips are different). That is a specific layout, and the
market for it keeps growing, so these chips get cheaper and/or can be
optimized further.

You can compare this situation with the 3D graphics chips and their
history. My guess is the day will come when a motherboard is one
harmonized block of CPU, chipset and 3D chip (meaning compatible with
nothing else), and that is specific to one kind of application: games
(better: simulations). The limits of software are in sight, and cheaper
dedicated solutions (i.e. PlayStations) have been a niche market for
years.

Last, why not create a nice CPU chip for the Lisp languages? It need
only fit into a CPU socket or work as a bus card. That could be a big
step forward for benchmarks and language development, but it could be
hard to work out...

So, look at Lisp: it is not the mainstream in software, and sadly it
hangs on whatever hardware happens to be available. I mean, the time is
good for some Lisp hardware, isn't it? Also, there is a new generation
of young programmers in here, and they want to walk their own way :)

stefan

Ahem: a Lisp-specific chip would look round, like (), right?
From: Edi Weitz
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <uzmxsycud.fsf@agharta.de>
On Fri, 25 Feb 2005 13:06:36 +0100, lin8080 <·······@freenet.de> wrote:

> Once there was a CPU optimization for pascal. We have it till now,
> but who uses pascal today?

I guess there are many, many more Delphi programmers than Lispers.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Julian Stecklina
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <pan.2005.02.25.15.13.37.602775@web.de>
On Fri, 25 Feb 2005 14:43:22 +0100, Edi Weitz wrote:

> On Fri, 25 Feb 2005 13:06:36 +0100, lin8080 <·······@freenet.de> wrote:
> 
>> Once there was a CPU optimization for pascal. We have it till now,
>> but who uses pascal today?
> 
> I guess there are much, much more Delphi programmers than Lispers.

I guess lin8080 meant support for nested stack frames in the x86
architecture. Most assembly language programmers only see this weird ENTER
opcode with parameters, which can be replaced by push ebp / mov ebp,esp or
somesuch...
I have never seen a compiler that makes use of this. And various
optimization manuals seem to be very insistent on avoiding ENTER
altogether...

Regards,
-- 
Julian Stecklina

-- Common Lisp can do what C, C++, Java, PASCAL, PHP, Perl, (you --
-- name it) can do. Here's how:                                  --
--                                                               --
-- http://www.amazon.com/exec/obidos/ASIN/1590592395             --
From: Fabrice Popineau
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <ekf3bmep.fsf@esemetz.metz.supelec.fr>
> On Fri, 25 Feb 2005 13:06:36 +0100, lin8080 <·······@freenet.de>
> wrote:
>> Once there was a CPU optimization for pascal. We have it till now,
>> but who uses pascal today?

> I guess there are much, much more Delphi programmers than Lispers.

And if it is about the RET instruction, then this is not specific to
Pascal: the same calling convention is used everywhere in Windows... in
C/C++!

Fabrice
From: James Graves
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <cvo08g$1n9$1@new7.xnet.com>
lin8080  <·······@freenet.de> wrote:

>Last, why not create a nice CPU-chip for the lisp-languages? Should only
>fit into a CPU socket or work as a bus card. This can do a big forward
>step in benchmarks and language-development, but this can be hard to
>workout...

Or make it even easier to use.  Just have it plug into a USB 2.0 port.

So your Lisp dongle would have its own HD, processor and memory.  All
peripherals (like a display) would be used via the host system.  Maybe
it just runs a VNC type protocol for the display.

For high performance applications, maybe the Lisp dongle would have its
own network interface.

James Graves
From: Gorbag
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <w6NTd.6$5Z3.1@bos-service2.ext.ray.com>
"James Graves" <·······@typhoon.xnet.com> wrote in message
·················@new7.xnet.com...
> lin8080  <·······@freenet.de> wrote:
>
> >Last, why not create a nice CPU-chip for the lisp-languages? Should only
> >fit into a CPU socket or work as a bus card. This can do a big forward
> >step in benchmarks and language-development, but this can be hard to
> >workout...
>
> Or make it even easier to use.  Just have it plug into a USB 2.0 port.
>
> So your Lisp dongle would have its own HD, processor and memory.  All
> peripherials (like a display) would be used via the host system.  Maybe
> it just runs a VNC type protocol for the display.
>
> For high performance applications, maybe the Lisp dongle would have its
> own network interface.
>
> James Graves

Hmm, I think it's been done:

http://www.ai.sri.com/mailing-lists/slug/910630/msg00319.html
From: Nikonos
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <874qg0444s.fsf@abalone.chi-square-works.com>
"Gorbag" <······@invalid.acct> writes:

> "James Graves" <·······@typhoon.xnet.com> wrote in message
> ·················@new7.xnet.com...
>> lin8080  <·······@freenet.de> wrote:
>>
>> >Last, why not create a nice CPU-chip for the lisp-languages? Should only
>> >fit into a CPU socket or work as a bus card. This can do a big forward
>> >step in benchmarks and language-development, but this can be hard to
>> >workout...
>>
>> Or make it even easier to use.  Just have it plug into a USB 2.0 port.
>>
>> So your Lisp dongle would have its own HD, processor and memory.  All
>> peripherials (like a display) would be used via the host system.  Maybe
>> it just runs a VNC type protocol for the display.
>>
>> For high performance applications, maybe the Lisp dongle would have its
>> own network interface.
>>
>> James Graves
>
> Hmm, I think it's been done:
>
> http://www.ai.sri.com/mailing-lists/slug/910630/msg00319.html

Gee, take a look at the date it was posted:)
From: Trent Buck
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <20050226191435.2dc86892@harpo.marx>
Up spake Gorbag:
> > Or make it even easier to use.  Just have it plug into a USB 2.0 port.
> >
> > So your Lisp dongle would have its own HD, processor and memory.  All
> > peripherials (like a display) would be used via the host system.  Maybe
> > it just runs a VNC type protocol for the display.

VNC is *not* a good protocol.  I'd recommend serving both SSH and RDP
from it.  Although come to think of it, I don't know if an open RDP
*server* exists.

> > For high performance applications, maybe the Lisp dongle would have its
> > own network interface.

Yup, sell it as a commodity box, like the routers and print servers you
get on SOHO networks.

Is there even a standardized network-over-USB protocol?  ISTR one for
IEEE1394.

> Hmm, I think it's been done:
> 
> http://www.ai.sri.com/mailing-lists/slug/910630/msg00319.html

For a "Nintendo machine".  Does that mean the Famicom or something?

Whee, Lisp in ZSNES... "Of course I own the ROM!" :-)

-- 
Trent Buck, Student Errant
On two occasions I have been asked [by members of Parliament], `Pray,
Mr. Babbage, if you put into the machine wrong figures, will the right
answers come out?'  I am not able rightly to apprehend the kind of
confusion of ideas that could provoke such a question.
 -- Charles Babbage
From: rush
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <cvum25$q5s$1@ls219.htnet.hr>
"Eliot Miranda" <······@pacbell.net> wrote in message
······················@newssvr14.news.prodigy.com...
> Flexibility.  The machines at PARC were microcoded so one had a good
> degree of programmability in the hardware.  One can't add instructions
> to commodity processors.  If one could then certain codes could be made
> to go much faster.
>
> Imagine systems which included field-programmable logic to implement
> things such as associative lookup in hardware.   Or programmability in
> the memory cell. Or...

I must confess that I had tiny but still real hopes that Transmeta could
play the role of white knight for dynamic languages in these matters. Imagine
that Transmeta had been willing to open up to third-party instruction set
designers. One would get an API to develop a new instruction set and download
it to the Transmeta processor. The new "processor" would probably immediately
run Smalltalk (or, sheesh, Java) code faster, since it would be implemented on
a nice RISC instead of the arcane x86, and one would keep x86 apps and
compatibility. If successful, over time the underlying physical Transmeta
processor could be modified to include the critical support instructions. Or,
if the processor were licensed like the ARM architecture, some manufacturer
might choose to implement a Smalltalk instruction set in hardware. Well, all
this is water under the bridge now anyway.

rush
--
http://www.templatetamer.com/
http://www.folderscavenger.com/
From: John Thingstad
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <opsmm8zqyhpqzri1@mjolner.upc.no>
On 22 Feb 2005 13:51:10 -0800, Dave Roberts <·····@droberts.com> wrote:

Any chance of new chips giving hardware support for garbage collection?

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
From: Dave Roberts
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <m3y8dfb0gz.fsf@linux.droberts.com>
"John Thingstad" <··············@chello.no> writes:

> On 22 Feb 2005 13:51:10 -0800, Dave Roberts <·····@droberts.com> wrote:
> 
> Any chance of new chips giving hardware support for garbage collection?

My sense is, only if Java is the target. That may not be bad, as it
may be usable for other languages if they build a general-purpose
mechanism.

That said, I think it's unlikely that GC will make it in anytime
soon. The best you can probably hope for are a couple of features that
make read/write barriers more efficient.

Also, realize that most custom hardware for languages with GC uses
software to do the actual collecting. Typically, the hardware GC
support just makes it cheap to detect pointer writes and other
GC-related changes to the heap, which a software GC routine then uses
to do its thing.
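
A toy sketch of the software half of that arrangement (illustrative
only, not any particular implementation): a write barrier records which
objects have had pointers stored into them, and the collector later
consults that "remembered set." The recording step is exactly what cheap
hardware assistance would help with.

  (defvar *remembered-set* (make-hash-table :test #'eq))

  (defstruct cell value)

  (defun barriered-set (object new-value)
    ;; Note the mutated object for the next collection, then do the store.
    (setf (gethash object *remembered-set*) t)
    (setf (cell-value object) new-value))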

-- 
Dave Roberts
dave -remove- AT findinglisp DoT com
http://www.findinglisp.com/
From: John Thingstad
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <opsmn1nozcpqzri1@mjolner.upc.no>
On 23 Feb 2005 10:22:20 -0800, Dave Roberts  
<···········@remove-findinglisp.com> wrote:

> "John Thingstad" <··············@chello.no> writes:
>
>> On 22 Feb 2005 13:51:10 -0800, Dave Roberts <·····@droberts.com> wrote:
>>
>> Any chance of new chips giving hardware support for garbage collection?
>
> My sense is, only if Java is the target. That may not be bad, as it
> may be useable for other languages if they do a general-purpose
> mechanism.
>
> That said, I think it's unlikely that GC will make it in anytime
> soon. The best you can probably hope for are a couple of features that
> make read/write barriers more efficient.
>
> Also, realize that most custom hardware for languages with GC use
> software to do the actual collecting. Typically, the hardware GC
> support just detects pointer writes and other GC-related changes to
> the heap easily, which a software GC routine then uses to do its
> thing.
>

Yes. I noticed that Corman Lisp uses memory segments to trap
overrun of storage in its copying collector.
This allows it to allocate storage with a simple addition.
The source code for this garbage collector/heap manager is good reading and  
freely available.
(You get it with the download of Corman Lisp: www.cormanlisp.com)
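
A conceptual sketch of that allocation scheme (not Corman's actual
code): allocation is one addition plus a limit check, and a real system
replaces the explicit check with a guard page so that overrunning the
segment traps into the collector for free.

  (defvar *alloc-pointer* 0)
  (defvar *alloc-limit* (* 64 1024))

  (defun bump-allocate (nbytes)
    ;; Returns the "address" of the new object; the WHEN is the part a
    ;; guard-page trap would do instead.
    (let ((new (+ *alloc-pointer* nbytes)))
      (when (> new *alloc-limit*)
        (error "segment exhausted - time to collect"))
      (prog1 *alloc-pointer*
        (setf *alloc-pointer* new))))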

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
From: Alexander Repenning
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <1109260520.072871.10090@l41g2000cwc.googlegroups.com>
John Thingstad wrote:
> On 22 Feb 2005 13:51:10 -0800, Dave Roberts <·····@droberts.com>
wrote:
>
> Any chance of new chips giving hardware support for garbage collection?

We would be happy if more Lisps would offer Ephemeral Garbage
Collection. We create OpenGL-based 3D simulations in Lisp. Nothing
wrecks a smooth animation/simulation like a non-ephemeral garbage
collection.
From: Kelly Hall
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <ucRSd.8452$Pz7.2980@newssvr13.news.prodigy.com>
Tim May wrote:
> Eventually there _will_ be a major shift to a new ISA. Intel has
> expected this for a long time...and has, in my opinion as a long-time
> stockholder, generally done the right thing in supporting the x86 while
> also attempting to introduce other architectures.

The most obvious data points in this direction are Java and .NET, IMHO.

If a new CPU designed for one or both of these virtual machines could 
run existing Java or .NET applications with comparable performance and 
for less cost, fewer transistors or less heat than the bloated P4, we 
might be able to wean consumers away from the x86.

As for Intel "[doing] the right thing", let me say that the right thing 
for a stockholder and the right thing for the consumer are very very 
different.

> (I won't even get into exotic technologies, which are a long way away,
> IMO.)

I'm surprised you didn't mention the Pentium M, which appears to be the 
only sanity in Intel's catalog.  At 2.1 GHz, the Pentium M is turning in 
system benchmark numbers comparable to 3.x GHz P4 at a quarter of the 
power.  Bummer that Intel is pricing it so high - it's obvious that they 
are keeping the M out of the desktop market so it doesn't kill the P4.

Kelly
From: Holger Duerer
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <87hdk3fuoi.fsf@ronaldann.demon.co.uk>
>>>>> "Kelly" == Kelly Hall <·····@acm.org> writes:

    Kelly> Tim May wrote:
    >> Eventually there _will_ be a major shift to a new ISA. Intel
    >> has expected this for a long time...and has, in my opinion as a
    >> long-time stockholder, generally done the right thing in
    >> supporting the x86 while also attempting to introduce other
    >> architectures.

    Kelly> The most obvious data points in this direction are Java and
    Kelly> .NET, IMHO.

    Kelly> If a new CPU designed for one or both of these virtual
    Kelly> machines could run existing Java or .NET applications with
    Kelly> comparable performance and for less cost, fewer transistors
    Kelly> or less heat than the bloated P4, we might be able to wean
    Kelly> consumers away from the x86.

Does it actually have to be a new design/architecture?  I would have
thought that (if that is reasonably possible -- and I am completely
out of my depth here, which is why I asked the original question) some
addition to the x86 architecture that makes Java and .NET apps run at
speeds closer to native applications would be quickly adopted by Sun and
Microsoft, and would thereby find an easier entrance into the market.

 [...]
 
    Holger
From: Thomas Gagne
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <hOidncRJiIKNFIbfRVn-sA@wideopenwest.com>
Already we're seeing the introduction of dual-core processors, and I remember 
back in the 80s when there was a short burst of activity around co-processor 
boards.  If such a chip was designed that lent itself to VMs, Lisp, Smalltalk, 
Java, etc., and the language vendors supported it, I would likely purchase a 
coprocessor board for my boxes.  One chip doesn't need to do everything.

Also worth noting are the advances being made to graphics chips from NVidia and 
the rest.  They keep adding great enhancements and have found that gamers are a 
profitable market willing to pay extra $$$ for a special feature.  The 
traditional general-purpose CPU makers might take a hint from them.

Dave Roberts wrote:

> 
> 
> Generally, I agree with you on the bulk of this post. One thing I
> would tweak would be the statements above. The root of the problem
> wasn't that people wanted x86 so much, they just wanted something that
> would preserve the high-performance execution of their old
> binaries. The market has shown that it will allow people to add things
> to the x86 architecture, as long as the installed base of software is
> preserved. Intel did this very successfully from the 8086/88 -> 286 ->
> 386 -> 486 -> Pentium -> Pro -> MMX -> III -> SSE -> SSE2 ->
> 4. Surprisingly, Intel then forgot the rule and went off and did
> Itanic, with consequent adoption results.
> 
>snip
From: David Magda
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <86bra5oh8n.fsf@number6.magda.ca>
Thomas Gagne <······@wide-open-west.com> writes:

> around co-processor boards.  If such a chip was designed that lent
> itself to VMs, Lisp, Smalltalk, Java, etc., and the language
> vendors supported it, I would likely purchase a coprocessor board

Sun did have a chip that did Java for a while. It was called picoJava,
I believe. I'm thinking that Moore's Law made the use of a specific
chip less necessary. The general-purpose CPUs were able to keep up
with the bloat (?) in programs fairly well.

-- 
David Magda <dmagda at ee.ryerson.ca>, http://www.magda.ca/
Because the innovator has for enemies all those who have done well under
the old conditions, and lukewarm defenders in those who may do well 
under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI
From: Christopher C. Stacy
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <u8y59fky2.fsf@news.dtpq.com>
David Magda <··················@ee.ryerson.ca> writes:

> Thomas Gagne <······@wide-open-west.com> writes:
> 
> > around co-processor boards.  If such a chip was designed that lent
> > itself to VMs, Lisp, Smalltalk, Java, etc., and the language
> > vendors supported it, I would likely purchase a coprocessor board
> 
> Sun did have a chip that did Java for a while. It was called picoJava,
> I believe. I'm thinking that Moore's Law made the use of a specific
> chip less necessary. The general-purpose CPUs were able to keep up
> with the bloat (?) in programs fairly well.

For a while now, processors have been so fast with respect 
to most program requirements that practically nothing matters.  
Still, people are inordinately consumed with optimizing as close
to the metal as possible (even when the metal is a 14-stage,
microcoded implementation of a crappy architecture).
As Moore's law starts to break down in a little while, 
maybe we can expect some more innovation in actual 
processor architectures.
From: Alexander Repenning
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <1109259977.140141.141080@z14g2000cwz.googlegroups.com>
Dave Roberts wrote:
>[...] Intel's big problem now is
> that processors are fast enough for the vast majority of basic office
> tasks (word processing, spreadsheets, etc.). That means you just have
> downward price pressure to look forward to.

Then again, this was already said about the 8086. When IBM released the
286 it seemed inconceivable to use up all this "awesome" processing
power for word processors etc. When more power becomes available it can
always be used. First while-you-type spell check, then grammar check,
used up CPU to the point where the cursor on a GHz++ machine moves just
as fast - or slow - as it used to on a 1 MHz Z80. Not sure what is next
(story check, fact check?) but there will be stuff to do. The THz-machine
word processor will probably not feel much faster either.

One thing that has changed is that we have moved somewhat from CPU
bound to network bound. Why bother with a super-fast machine when you
have a slow network? The average connection speed is not increasing
fast.

>  I think you'll see all
> sorts of transformations over the next few years as things like
> advanced crypto, 3D graphics, and other enhancements are added to the
> x86 architecture. Unfortunately, while I would love it, I doubt that
> Lisp support will be one of them.

Perhaps one of the largest issues with the Lisp community is a catch-up
mindset: let these other folks - whoever they are - build some stuff,
and then hope that sometime later we, the Lisp developers, can also
benefit from it. How about a more aggressive stance? For instance,
programming GPUs to create programmable pixel and vertex shaders is a
big challenge in 3D and scientific computing. Most developers use some
very low-level programming language (think assembler) to write little
FUNCTIONAL programs. Why not try to push Lisp from follow mode to lead
mode by creating a GPU Lisp subset?
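
As a deliberately tiny sketch of what the front end of such a "GPU Lisp
subset" could look like (everything here is invented for illustration):
compile arithmetic s-expressions into the C-like source text that
shader toolchains consume.

  (defun gpu-expr (form)
    ;; Turn (op a b) trees into infix shader-style text.
    (if (atom form)
        (string-downcase (princ-to-string form))
        (destructuring-bind (op a b) form
          (format nil "(~A ~A ~A)"
                  (gpu-expr a)
                  (string-downcase (symbol-name op))
                  (gpu-expr b)))))

  ;; (gpu-expr '(* (+ x 1.0) y))  =>  "((x + 1.0) * y)"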
From: Gorbag
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <FUMTd.5$5Z3.3@bos-service2.ext.ray.com>
"Alexander Repenning" <·····@cs.colorado.edu> wrote in message
·····························@z14g2000cwz.googlegroups.com...
>
> One thing that has changed is that we have moved somewhat from CPU
> bound to network bound. Why bother with a super-fast machine when you
> have a slow network? The average connection speed is not increasing
> fast.

According to some experts, networking "speeds" are going up faster than CPU
speeds (bandwidth doubling every 9 months). Of course the speed of light is
a constant - latency doesn't improve.

See:
http://www.pcmag.com/article2/0,1759,1163648,00.asp

http://www.ipinfusion.com/newsEvents/newsEvents_inTheNews_chatwchuck.html

http://www.elecdesign.com/Articles/Index.cfm?AD=1&ArticleID=3703
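
Just to put numbers on that "doubling every 9 months" figure (purely
illustrative arithmetic, not data): compounded over a decade it dwarfs
the classic ~18-month doubling.

  (defun growth-factor (months doubling-period)
    (expt 2 (/ months doubling-period)))

  (growth-factor 120 9.0)    ; => ~10300x (bandwidth, per the claim above)
  (growth-factor 120 18.0)   ; => ~102x   (Moore's-law-style doubling)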
From: Tim May
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <250220052015491927%timcmay@removethis.got.net>
In article <·············@bos-service2.ext.ray.com>, Gorbag
<······@invalid.acct> wrote:

> "Alexander Repenning" <·····@cs.colorado.edu> wrote in message
> ·····························@z14g2000cwz.googlegroups.com...
> >
> > One thing that has changed is that we have moved somewhat from CPU
> > bound to network bound. Why bother with a super-fast machine when you
> > have a slow network? The average connection speed is not increasing
> > fast.
> 
> According to some experts, networking "speeds" are going up faster than CPU
> speeds (bandwidth doubling every 9 months). Of course the speed of light is
> a constant - latency doesn't improve.


Maybe network speeds on _certain_ networks.

The best speeds I now get are about 1 Mbps, on certain DSL, cablemodem,
and WiFi lines. (OK, on a couple of cablemodems I get as high as 3-4
Mbps, sometimes.) These are no better than what my friends had in 1992,
when they had a T-1 line to their home.

But most of the time I am on dial-up, courtesy of DSL and cablemodem
only being in certain high-density markets, and courtesy of DirecTV
being both expensive and high-latency. And my dial-up today is about
28-31K, slower than what I had 10 years ago, in a larger town.

So while a certain core may be getting much faster, many in America and
the rest of the world are on network connections much the same as they
were several years ago, even a decade or more ago.

(And in the period in which my network connection speed has gone from
9600 to 28K, my CPU speed has gone from 25 MHz to 1 GHz, and could have
gone a few times higher if I needed it.)

--Tim May
From: lin8080
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <4220459B.96A68C6C@freenet.de>
Gorbag schrieb:
> "Alexander Repenning" <·····@cs.colorado.edu> wrote in message

Hi there.

> > One thing that has changed is that we have moved somewhat from CPU
> > bound to network bound. Why bother with a super-fast machine when you
> > have a slow network? The average connection speed is not increasing
> > fast.

> According to some experts, networking "speeds" are going up faster than CPU
> speeds (bandwidth doubling every 9 months). Of course the speed of light is
> a constant - latency doesn't improve.

> http://www.ipinfusion.com/newsEvents/newsEvents_inTheNews_chatwchuck.html

(cite from article-end)
"To me, it's one of the more fascinating technologies that I've seen in
34 years."
(/cite)

They talk about an NPU (network processor unit) which processes
internet-shipped packets. Very interesting here would be a look at the
packet rate of Lisp traffic... (compiled or not, it is nearly plain
ASCII)

But expanding the bandwidth (as in the other article) is limited by the
available frequencies; otherwise there is less quality (as seen in
digital TV). And if this goes on worldwide, government interest in the
contents will follow... (so add more speed, faster-than-light speed, to
keep it hidden)

When I have a look at Lisp network activity (mod_lisp for Apache and
some others), it looks as if, from the Lisp side, the interest is not
very big, or did I miss something?

Is there a quickly processable compression format in Lisp for
internet-friendly traffic packaging? I do not know of one at the
moment; third-party apps (meaning (u)ffi and co., meaning
time-consuming) exist. Java's class strategy comes to mind, and this
should work better with "lists" -- compressed lists, which are shorter
and more efficient (and not as closed as Java classes, meaning here:
more open to an individual programmer's favorite style).
 But this way it needs a standard interpreter for Lisp, worldwide.
Implementing that is easier with a hardware-based chip (an NPU, see the
article above). (Imagine a Lisp NPU device attached to ordinary
computers -- wow, and this could be done. *hmmm)

And still there is the outstanding question of threading. Maybe the
4-CPU model (today a server) is not possible in the software world?
(,sequence -- I give up). Running more VMs on one CPU is still a
step-by-step cycle tour. There is (adjustable-array (get-length)) and no
(adjustable-sequence (get-process-time) (start (slot-number
return-status))) uuuh

What I suspect about Lisp is that it sits on its history-mountain
throne while the future is always seen from behind (*arg). Please teach
me better. The world is an open process, and my box has only 2 cables
to connect while my brain runs multi-process wishes.
:) :( :( :( BRK.

Anyway, the next step in my eyes is in hardware. My thoughts lead to
massively parallel, as the brain does it. Then speed becomes less
important. Let's say a processor with a variable number of virtual
CPUs/VMs inside and some fixed instruments for process control, and no
shared CPU time.

And remember, this is all done interpreted and with garbage collectors.

stefan
From: lin8080
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <422D8408.7DAEB203@freenet.de>
Gorbag schrieb:
> "Alexander Repenning" <·····@cs.colorado.edu> wrote in message

> > One thing that has changed is that we have moved somewhat from CPU
> > bound to network bound. Why bother with a super-fast machine when you
> > have a slow network? The average connection speed is not increasing

> According to some experts, networking "speeds" are going up faster than CPU
> speeds (bandwidth doubling every 9 months). Of course the speed of light is
> a constant - latency doesn't improve.

Hi there

Now I find the new PlayStation 3 chip on the Sony page. This looks very
interesting: a chip with 9 CPUs and up to 10 times more speed. They say
it will be available at the end of 2005 / beginning of 2006, and should
also work in TVs. 221 mm^2.

What can Lisp do on such a chip? ...oh*

(To prevent Kenny from lift-off: they call it Cell, the Sony Cell) :)

stefan
From: Thomas Gagne
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <WpadnRrEIOoeTq_fRVn-ug@wideopenwest.com>
Maybe the way to handle GC isn't with special instructions but with 
parallelism.  If something's been marked as garbage, why interrupt the 
path of execution to free up memory?

 From 
<http://news.com.com/PlayStation+3+chip+has+split+personality/2100-1043_3-5566340.html?tag=nl>

"The eight "synergistic" processors are a step forward from current 
computing system designs, in which the graphics chip draws pixels and 
the central processor does everything else. The Cell cores have 
media-specific instructions baked in, but they are flexible and smart 
enough to handle nonmedia tasks, said Brian Flachs, an IBM engineer. "It 
represents an important middle ground between graphics processors and 
central processors," he said."

lin8080 wrote:
<snip>
> 
> 
> Hi there
> 
> Now I find the new Playstation-3 Chip on the sony-page. This looks very
> intressting. A Chip with 9 CPUs and up to 10 times more speed. They say
> availible at end of 2005/ beginn 2006, and should be work also in TVs.
> 221 mm^2.
> 
> What can Lisp do on such a Chip? ...oh*
> 
> (To prevent Kenny from lift-off: They call it Cell, Sony-Cell) :)
> 
> stefan
> 
From: lin8080
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <42336A21.81D551CD@freenet.de>
Thomas Gagne schrieb:
> 
> Maybe the way to handle GC isn't with special instructions but with
>>parallelism.  If something's been marked as garbage, why interrupt the
>>path of execution to free up memory?

Yes, why. Speed no longer seems to be an argument, memory seems to be
available in whatever quantity is necessary, and parallel -- hm. I guess
this needs more CPUs and a strategy to manage them.
Some clusters I have seen are not what you would call best of breed, and
threading has evolved into an island there. The near future can
integrate more CPUs into one chip, because this looks like the next
feasible way to raise computing power easily (an architectural mixture
taken from chip history).

Let's say you have 3 transparent TFT screens and try holography (click
one icon place 3x at once); let's say you take 4 CPUs and use them with
one hard disk (access one file 4 times simultaneously); or let's say you
use one symbol name for multiple things. Possible -- maybe, but how far
away from any limit? And real-time processing (no need to store data
anywhere -- meaning throw away the register technique completely, add
the data to the base frequency and clock it down, new-music-style, to
get your program :)).

I saw no common strategy there; it seems everybody works in his back
room and wants to keep it secret.

>  From
> <http://news.com.com/PlayStation+3+chip+has+split+personality/2100-1043_3-5566340.html?tag=nl>

Uh, a lot of stuff; small steps that can be taken next.

> "The eight "synergistic" processors are a step forward from current
> computing system designs, in which the graphics chip draws pixels and
> the central processor does everything else. The Cell cores have
> media-specific instructions baked in, but they are flexible and smart
> enough to handle nonmedia tasks, said Brian Flachs, an IBM engineer. "It
> represents an important middle ground between graphics processors and
> central processors," he said."

Assume you have a pixel map ready for readout to the screen, and this
pixel map also represents your program, ready to be fed to the CPU. In
Lisp you say: a list is program and data; so by analogy you can say: a
pixel map is... a sound wave is... See how far that works with Lisp
(but truly doubly-interpretable structures have disappeared; people
tend to think in trivial terms so that any controller can follow up).

stefan

headexplosion4 &rest
From: Ulrich Hobelmann
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <39ha9fF5svqfvU1@individual.net>
lin8080 wrote:
> 
> Thomas Gagne schrieb:
> 
>>Maybe the way to handle GC isn't with special instructions but with
>>parallelism.  If something's been marked as garbage why interrupt the
>>path of execution to free up memory?
> 
> 
> Yes, why. Speed seems no longer an argument, memory seems availible as
> much as necessary, and parallel -hm. Guess this needs more CPUs and a
> strategy to manage this. 

Sun's newest generation of chips uses something called chip 
multithreading, where stalls in the pipeline (such as for memory access) 
are used to run other threads.  IIRC one processor can run four threads 
at the same time and typical systems use LOTS of CPUs.

I think some implementation of Java also collects garbage concurrently 
in an extra thread, which should work really nicely with this.
From: lin8080
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <42338F97.6513D85F@freenet.de>
Ulrich Hobelmann schrieb:
> lin8080 wrote:
> > Thomas Gagne schrieb:

The seller-side is at:
(get it per e-mail)

Playstation 3 Marketing.
http://www.datagrid.org/
08/03/2005 00:31:26


stefan
From: Yanni Chiu
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <421C2743.D2B144EA@rogers.com>
Tim May wrote:
> 
> I was at Intel from 1974 to 1986 and saw efforts to introduce new
> architectures (432, 960, 860, iWarp, other processors from other
> companies, such as Z8000, 32032, Swordfish, etc.). Mostly these efforts
> failed.

iWarp - I've still got one. It's encased in plastic
and is part of a keychain though. I got it as a
souvenir after working on a C compiler for iWarp.
This darned thing has outlasted my 8086. :)

--yanni
From: Ronald Kirk Kandt
Subject: Re: Interview with Alan Kay
Date: 
Message-ID: <cvl800$rs8$1@nntp1.jpl.nasa.gov>
What I remember about the Burroughs architecture follows.
(1) It was a microprogrammable machine using very wide instructions 
(probably 128 bits).
(2) The microcode was extremely fast in its day (1-2 orders of magnitude 
faster than the macro-architectures of the day).
(3) Burroughs developed an instruction set tailored for four different 
languages: Algol (which the OS was written in), Fortran, Cobol, and another 
language (I can't remember, but it might have been APL).
(4) The machine could interweave instruction sets from different languages 
because it simultaneously maintained the four instruction sets. (I believe 
two bits in a microcode word referenced one of the 4 microcoded instruction 
sets.)
(5) I think it used a tag architecture (like the Symbolics machine later 
used).
(6) It was a stack machine.

I think the important idea to take away from this is the value of tailoring 
a machine architecture to the language (or languages) it is intended to 
support.

"Tim May" <·······@removethis.got.net> wrote in message 
·······························@removethis.got.net...
> In article <······················@wideopenwest.com>, Thomas Gagne
> <······@wide-open-west.com> wrote:
>
> ...quotes from Alan Kay's interesting interview...
>
>> so forth. Time-sharing was held back for years because it was 
>> "inefficient"-- 
>> but the manufacturers wouldn't put MMUs on the machines, universities had 
>> to
>> do it themselves! Recursion late-binds parameters to procedures, but it 
>> took
>> years to get even rudimentary stack mechanisms into CPUs. Most machines 
>> still
>> have no support for dynamic allocation and garbage collection and so 
>> forth.
>> In
>> short, most hardware designs today are just re-optimizations of moribund
>> architectures."
>
>
> What Alan Kay says is no doubt true, but the real issue has always been
> that computer architecture, especially for workstations and PCs, has
> been a popularity contest: one architecture wins out.
>
> There are multiple reasons for this, including "most popular gets more
> users, hence more software," and "learning curve" (costs of production
> lower for higher-volume chips), and, I think most importantly, "limited
> desktop space means one machine per desktop."
>
> In the 80s and into the 90s, this meant a Sun workstation or equivalent
> RISC/Unix/C machine for engineers and designers, an IBM PC or
> equivalent for most office workers, a Macintosh for most graphics
> designers or desktop publishers.
>
> It didn't matter if a Forth engine was great for Forth, or a Symbolics
> 3600 was great for Lisp, or a D-machine was great for Smalltalk: there
> just weren't many of these sold.
>
> Similar things happened in minicomputer and mainframe computer markets.
>
> I was at Intel from 1974 to 1986 and saw efforts to introduce new
> architectures (432, 960, 860, iWarp, other processors from other
> companies, such as Z8000, 32032, Swordfish, etc.). Mostly these efforts
> failed. (There are various reasons, but most never got a chance to be
> tweaked or fixed, because the "market had spoken.") The customers
> wanted x86, despite obvious shortcomings. Niche architectures mostly
> died out. And such is the case today, even more so than back then.
>
> (Something that may change this is the "slowing down" of Moore's
> observation about doubling rates. For good physics reasons, clock
> speeds are not doubling at the rates seen in the past. This may push
> architectures in different directions.)
>
> So while there are all sorts of things that _could_ be put into
> computer architectures, limited desktop space and the economies of
> scale pretty much dictate a slower rate of adding these features.
>
> --Tim May