From: Martin Cracauer
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <1996Oct10.100654.824@wavehh.hanse.de>
····@teleport.com (Brett) writes:

>As a programmer in various other languages (procedural), but a recent LISP
>onlooker, I am now considering LISP more seriously. And I am curious to
>know what other professionals see as the greatest weaknesses of LISP
>(various dialects). I see a lot about its advantages, but wish to peer
>deeper into this language. 

For Common Lisp, I'm unsatisfied mostly with CLOS. Its flexibility
greatly limits possible performance improvements and it is not well
integrated with the rest of the system. A Dylan-like OO system with
sealing and standard types being normal classes would be nicer.

For Scheme, I found many useful implementations for specific tasks,
but not a universal implementation I could base long-running work
on. Maybe the Guile effort will help here. Performance-wise, I'm still
convinced that Scheme is slow by definition (while Common Lisp allows
you to speed things up with declarations etc.).

>Is the code easily readable - code of others, esp? Or does it suffer from 
>problems similar to c's readability? Is it easily packaged into modules 
>of some sort? 

The syntax is one of the best things about Lisp, IMHO. I really like
having a uniform syntax without operator preferences, a syntax to play
games with, and I really like separating things by spaces instead of
assorted other characters (in function argument lists, for example).
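A tiny illustration of what "no operator preferences" buys you (the
operator always comes first, so there is no precedence table to
memorize):

```lisp
;; Infix 2 + 3 * 4 needs precedence rules; prefix notation makes the
;; grouping explicit either way.
(+ 2 (* 3 4))  ; => 14
(* (+ 2 3) 4)  ; => 20
```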

Martin


-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <········@wavehh.hanse.de>  http://www.bik-gmbh.de/~cracauer
"As far as I'm concerned,  if something is so complicated that you can't ex-"
"plain it in 10 seconds, then it's probably not worth knowing anyway"- Calvin

From: Erik Naggum
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <3054013258795813@naggum.no>
[Martin Cracauer]

|   For Common Lisp, I'm unsatisfied mostly with CLOS.  Its flexibility
|   greatly limits possible performance improvements and it is not well
|   integrated with the rest of the system.  A Dylan-like OO system with
|   sealing and standard types being normal classes would be nicer.

it appears that you based your analysis on CLtL2.  if you instead read the
ANSI specification, what you seem to wish for has become true.

for instance, support for international character sets is handled by
subclasses of the class `character'.
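This can be checked directly in any conforming implementation, for
example with `subtypep`:

```lisp
;; ANSI CL defines base-char and extended-char as subtypes that
;; partition the type character.
(subtypep 'base-char 'character)      ; => T
(subtypep 'extended-char 'character)  ; => T
```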

#\Erik
-- 
I could tell you, but then I would have to reboot you.
From: Jim Veitch
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <325E98A0.44CD@franz.com>
Martin Cracauer wrote:
> 
> ····@teleport.com (Brett) writes:
> 
> >As a programmer in various other languages (procedural), but a recent LISP
> >onlooker, I am now considering LISP more seriously. And I am curious to
> >know what other professionals see as the greatest weaknesses of LISP
> >(various dialects). I see a lot about its advantages, but wish to peer
> >deeper into this language.
> 
> For Common Lisp, I'm unsatisfied mostly with CLOS. Its flexibility
> greatly limits possible performance improvements and it is not well
> integrated with the rest of the system. A Dylan-like OO system with
> sealing and standard types being normal classes would be nicer.

A couple of views from my position as working for a vendor.

If one writes benchmarks which do nothing but method calls, then
performance is poorer than C or C++.  But if you write code that does
other things (like an application), then surprising facts emerge:

1. In an optimized CLOS implementation like Allegro CL, CLOS overhead is
typically only 10% or so.  So reducing overhead to 0 might speed up your
application by 10%.  In some cases, like very tight graphics loops, the
overhead becomes significant and sealing would help, but it is quite
possible to drop into procedural code to circumvent these cases.  But
in these cases sealing would be great.

2. Many C++ applications run slower than CLOS applications which do
similar things.  Why?  C++ seals everything and makes it hard to
experiment.  If you didn't do it right the first time, then you never
do it right.  C++ is a "read-only" language.

What are the limitations of CLOS?  Things like not enough experienced
programmers, not enough mainstream acceptance, and the perception that
Lisp is unsuitable appear to have far more effect on the use of Lisp
than the technical disadvantages.  Of course, some of these mean
CLOS/Lisp people can end up total winners in getting things done, since
they don't have the same competition.

Jim.
From: Martin Cracauer
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <1996Oct18.115909.28857@wavehh.hanse.de>
Jim Veitch <···@franz.com> writes:

>Martin Cracauer wrote:
>> 
>> ····@teleport.com (Brett) writes:
>> 
>> >As a programmer in various other languages (procedural), but a recent LISP
>> >onlooker, I am now considering LISP more seriously. And I am curious to
>> >know what other professionals see as the greatest weaknesses of LISP
>> >(various dialects). I see a lot about its advantages, but wish to peer
>> >deeper into this language.
>> 
>> For Common Lisp, I'm unsatisfied mostly with CLOS. Its flexibility
>> greatly limits possible performance improvements and it is not well
>> integrated with the rest of the system. A Dylan-like OO system with
>> sealing and standard types being normal classes would be nicer.

>A couple of views from my position as working for a vendor.

>If one writes benchmarks which do nothing but method calls, then
>performance is poorer than C or C++.  But if you write code that does
>other things (like an application), then surprising facts emerge:

>1. In an optimized CLOS implementation like Allegro CL, CLOS overhead is
>typically only 10% or so.  So reducing overhead to 0 might speed up your
>application by 10%.  In some cases, like very tight graphics loops, the
>overhead becomes significant and sealing would help, but it is quite
>possible to drop into procedural code to circumvent these cases.  But
>in these cases sealing would be great.

The 10% method call overhead number is probably not wrong, but it
holds only under certain assumptions.

I thought a bit about what I feel to be "wrong" with CLOS. It is
certainly not that "CLOS is slow"; rather, "CLOS's flexibility
requires you to use it in the right places".

CLOS is not to Lisp what C++ is to C. In C++, you can (but don't have
to) put almost everything, even the basic functionality of your
application, into classes, objects and member functions. It will slow
things down, but not too much.

This can't be done with CLOS. Would you think it is right to say that
CLOS is exclusively for higher-level objects and that the basic Common
Lisp functionality should be sufficient for the lower levels?

Then, the 10% number is probably right. I happened to have a bad start
with CLOS, using it to model almost everything as CLOS objects, as I
did when I tried to bring my C programs to C++ (I still use C, not C++,
for this, for reasons other than performance).

Obviously, my CLOS code was really bloated and slow. Using CLOS for
tiny objects with frequent, cheap function calls *is* slow.

It's not only method call overhead. When you define a C++ class with
members of simple data types, these are represented unboxed, without
pointers. No CLOS implementation, as far as I know, can represent
slots without at least one pointer indirection. Also, simple C++
classes can usually use static binding, and therefore inlining.
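For contrast, plain Common Lisp structures already get close to the
C++ layout in many implementations; a sketch (whether typed slots are
actually stored unboxed is implementation-dependent):

```lisp
;; Many CL compilers store a typed defstruct slot without boxing and
;; inline the accessor, much like a C struct member.
(defstruct (point (:constructor make-point (x y)))
  (x 0.0d0 :type double-float)
  (y 0.0d0 :type double-float))

;; The CLOS version of the same object goes through the generic
;; slot-access machinery instead.
(defclass cpoint ()
  ((x :initarg :x :accessor cpoint-x)
   (y :initarg :y :accessor cpoint-y)))
```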

>2. Many C++ applications run slower than CLOS applications which do
>similar things.  Why?  C++ seals everything and makes it hard to
>experiment.  If you didn't do it right the first time then you never
>do it right.  C++ is a "read-only" language.

Once your objects are of sufficient complexity that even the C++
representation uses some kind of type tagging, pointer indirections
and virtual functions, CLOS will probably be usable again.

>What are the limitations to CLOS?  things like not enough experienced
>programmers, not enough mainstream acceptance and perception that Lisp
>is

I am probably an inexperienced CLOS programmer. I have had the same
discussion with Ken Anderson of BBN and other people several times,
and I have heard many things about how to speed up CLOS.

I have to admit that I didn't understand many of these techniques; for
others, I'm unsure whether they are what I want.

All in all, I think CLOS is a high-level modeling system that can't be
treated like OO extensions of other languages or like simple OO-only
languages like Java.

It is not trivial to learn to use CLOS right. It's not the unusual
generic functions approach; in fact, I like it. But I felt that to
write CLOS programs with acceptable performance I had to know a lot
about the implementation, to use the right constructs in the right
places. This is true for every program in every language. But to
estimate the runtime behavior of CLOS constructs is more difficult
than for plain Lisp objects. Additionally, Common Lisp implementations
with CLOS are very different from each other, at least when you count
free implementations as well (many have weak CLOS support, but perform
well for basic Common Lisp).

I learned programming mostly from reading other people's sources (and
books, of course). I didn't find much free CLOS code and would
appreciate any pointers. The books I own either teach "how-to-use"
CLOS (not "where-to-use") in situations where I found CLOS to be
inadequate (complex numbers, points, etc.), or they don't use much CLOS
(Graham's and Norvig's books).

>unsuitable appear to have far more effect on the use of Lisp than the
>technical disadvantages.  Of course, some of these mean CLOS/Lisp people
>can end up total winners in getting things done since they don't have
>the same competition.

Here you assume I spoke against Common Lisp as a whole. This is not
the case. In fact, I ended up being very efficient for some projects
*because* I used Lisp. However, I didn't use CLOS and so far I don't
seem to be able to catch up with it. 

In case there's someone with some good pointers to where I could
educate myself, and who isn't too upset about my postings, please let
me know.

Martin
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <········@wavehh.hanse.de>  http://www.bik-gmbh.de/~cracauer
"As far as I'm concerned,  if something is so complicated that you can't ex-"
"plain it in 10 seconds, then it's probably not worth knowing anyway"- Calvin
From: Jim Veitch
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <3268489D.642F@franz.com>
Martin Cracauer wrote:

> >1. In an optimized CLOS implementation like Allegro CL, CLOS overhead is
> >typically only 10% or so.  So reducing overhead to 0 might speed up your
> >application by 10%.  In some cases, like very tight graphics loops, the
> >overhead becomes significant and sealing would help, but it is quite
> >possible to drop into procedural code to circumvent these cases.  But
> >in these cases sealing would be great.

> I thought a bit about what I feel to be "wrong" with CLOS. It is
> certainly not that "CLOS is slow"; rather, "CLOS's flexibility
> requires you to use it in the right places".
> 
> CLOS is not to Lisp what C++ is to C. In C++, you can (but don't have
> to) put almost everything, even the basic functionality of your
> application, into classes, objects and member functions. It will slow
> things down, but not too much.

> This can't be done with CLOS. Would you think it is right to say that
> CLOS is exclusively for higher-level objects and that the basic Common
> Lisp functionality should be sufficient for the lower levels?

It's very implementation dependent.  Implementations do have hacks
to speed things up a lot: for example, in Allegro CL for UNIX if you
define the order of superclasses and guarantee to the compiler you won't
change it, we can compile a CLOS object into a flat array which has
array speed access (it's the fixed-index option).

> Then, the 10% number is probably right. I happened to have a bad start
> with CLOS, using it to model almost everything as CLOS objects, as I
> did when I tried to bring my C programs to C++ (I still use C, not C++,
> for this, for reasons other than performance).
> 
> Obviously, my CLOS code was really bloated and slow. Using CLOS for
> tiny objects with frequent, cheap function calls *is* slow.

Yes.  It depends on the ratio of function calls to other code.  If your
other code is doing quite a lot, CLOS works great, but if you are in
tight loops doing little else but function calling, then it'll be slow.
This is a case where sealing would be great.

> It's not only method call overhead. When you define a C++ class with
> members of simple data types, these are represented unboxed, without
> pointers. No CLOS implementation, as far as I know, can represent
> slots without at least one pointer indirection. Also, simple C++
> classes can usually use static binding, and therefore inlining.

See above comments on fixed-index option.
 
> All in all, I think CLOS is a high-level modeling system that can't be
> treated like OO extensions of other languages or like simple OO-only
> languages like Java.

Java is not too swift either.
 
> But to estimate the
> runtime behavior of CLOS constructs is more difficult than for plain
> Lisp objects. Additionally, Common Lisp implementations with CLOS are
> very different from each other, at least when you count free
> implementations as well (many have weak CLOS support, but perform well
> for basic Common Lisp).

This is true.

> I learned programming mostly from reading other people's sources (and
> books, of course). I didn't find much free CLOS code and would
> appreciate any pointers. The books I own either teach "how-to-use"
> CLOS (not "where-to-use"), in situations where I found CLOS to be
> inadequate (complex numbers, points, etc.), or they don't use much CLOS
> (Graham's and Norvig's books).

Paul Graham's ANSI Common Lisp is a good book with a nice CLOS intro.
But I agree it is one of those areas where practitioners could help
more.  Something at the level of Norvig's book, but about CLOS, would
be great.  People who buy Allegro CL for Windows can get the entire
CLOS-based source for the window system and environment.  But I don't
know too much about public domain high quality CLOS sources.

> Here you assume I spoke against Common Lisp as a whole. This is not
> the case. In fact, I ended up being very efficient for some projects
> *because* I used Lisp. However, I didn't use CLOS and so far I don't
> seem to be able to catch up with it.

I apologize for the misunderstanding.
From: Martin Cracauer
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <1996Oct20.112936.25643@wavehh.hanse.de>
···@intentionally.blank-see.headers writes:

>No runtime dispatch mechanism, not even the C++ mechanism, is fast
>enough to use for "everything".  The only way to be able to give the
>illusion of treating things like numbers just like other objects is
>via type inference, immediate representations, and other optimizations.

>In fact, in practice, the "efficient" C++ dispatch mechanism is not
>significantly faster than the much more dynamic mechanism found in
>Objective-C (or, I believe, IBM's SOM).  But the few extra cycles that
>C++ squeezes out cause lots of headaches when it comes to system
>integration, dynamic configuration, etc.  In particular, OLE and SOM
>are really just workarounds for the inflexible and cumbersome object
>model that C++ has built into it (btw, if you want to know all the
>unpleasant details, take a look at Lippman's "Inside the C++ Object
>Model").

This begins to be off-topic, but the lookup mechanism of ObjC requires
one pointer indirection more than the C++ virtual function
call. Pointer indirections can be very expensive given today's memory
bandwidth/CPU speed relation. 

Also, it was my point that C++ (and Dylan) enable static binding and
inlining. In a language that supports this, you can represent much
"smaller" things using the language's native class definition
mechanism. Obviously, as I wrote in my former posting, such languages
will give people who jump on the OO wagon and try to represent too
much as true objects much better performance. This is what happened
to me.

To prevent misunderstanding: I don't think I have to represent as much
as possible as "true" objects anymore (and I haven't chosen C++ for a
project in a long time). However, I still think this is an explanation
for why people may get a positive impression when they first try C++
as their introduction to OO languages.

>I see little reason why CLOS should, in principle, be slower
>than Objective-C's mechanism.  In practice, of course, because
>CLOS is so feature rich, implementors may have less time to
>make it fast.

Also, as Jim Veitch of Franz wrote, there are actually CLOS
implementations that can represent a CLOS object's slots as a flat
array, so my former claim that slot accesses (not function calls)
have to be slower in CLOS was not right.

I think we shouldn't discuss things this way (theory only)
anymore. Obviously, my impressions need a reality check against modern
commercial CLOS systems (although I'd still prefer to write code that
also runs fast on CMUCL and other free implementations, which can be
done with non-CLOS Common Lisp).

Martin
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <········@wavehh.hanse.de>  http://cracauer.cons.org
From: William Paul Vrotney
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <vrotneyDznMB1.CC@netcom.com>
In article <······················@wavehh.hanse.de> ········@wavehh.hanse.de (Martin Cracauer) writes:

> 
> ···@intentionally.blank-see.headers writes:
> 
> >No runtime dispatch mechanism, not even the C++ mechanism, is fast
> >enough to use for "everything".  The only way to be able to give the
> >illusion of treating things like numbers just like other objects is
> >via type inference, immediate representations, and other optimizations.
> 
> >In fact, in practice, the "efficient" C++ dispatch mechanism is not
> >significantly faster than the much more dynamic mechanism found in
> >Objective-C (or, I believe, IBM's SOM).  But the few extra cycles that
> >C++ squeezes out cause lots of headaches when it comes to system
> >integration, dynamic configuration, etc.  In particular, OLE and SOM
> >are really just workarounds for the inflexible and cumbersome object
> >model that C++ has built into it (btw, if you want to know all the
> >unpleasant details, take a look at Lippman's "Inside the C++ Object
> >Model").
> 
> This begins to be off-topic, but the lookup mechanism of ObjC requires
> one pointer indirection more than the C++ virtual function
> call. Pointer indirections can be very expensive given today's memory
> bandwidth/CPU speed relation. 
> 

Actually, I was always in complete agreement with the original poster's
remarks comparing C++ to Objective-C.  I was also under the impression
that pointer indirection was not very expensive on many modern-day
computers.  So could you please explain what you mean by that last
sentence?



-- 

William P. Vrotney - ·······@netcom.com
From: Martin Cracauer
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <1996Oct22.190840.26842@wavehh.hanse.de>
·······@netcom.com (William Paul Vrotney) writes:

>In article <······················@wavehh.hanse.de> ········@wavehh.hanse.de (Martin Cracauer) writes:

>> 
>> ···@intentionally.blank-see.headers writes:
>> 
>> >No runtime dispatch mechanism, not even the C++ mechanism, is fast
>> >enough to use for "everything".  The only way to be able to give the
>> >illusion of treating things like numbers just like other objects is
>> >via type inference, immediate representations, and other optimizations.
>> 
>> >In fact, in practice, the "efficient" C++ dispatch mechanism is not
>> >significantly faster than the much more dynamic mechanism found in
>> >Objective-C (or, I believe, IBM's SOM).  But the few extra cycles that
>> >C++ squeezes out cause lots of headaches when it comes to system
>> >integration, dynamic configuration, etc.  In particular, OLE and SOM
>> >are really just workarounds for the inflexible and cumbersome object
>> >model that C++ has built into it (btw, if you want to know all the
>> >unpleasant details, take a look at Lippman's "Inside the C++ Object
>> >Model").
>> 
>> This begins to be off-topic, but the lookup mechanism of ObjC requires
>> one pointer indirection more than the C++ virtual function
>> call. Pointer indirections can be very expensive given today's memory
>> bandwidth/CPU speed relation. 
>> 

>Actually, I was always in complete agreement with the original poster's
>remarks comparing C++ to Objective-C.  I was also under the impression that
>pointer indirection was not very expensive on many modern day computers.  So
>could you please explain what you mean by this last sentence.

I mean that in ObjC you have to follow two pointers to memory
locations, and each may or may not be loaded in the CPU cache. This is
for the GNU runtime.

Martin
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <········@wavehh.hanse.de>  http://cracauer.cons.org
From: Chuck Fry
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <1996Oct22.184022.11736@ptolemy-ethernet.arc.nasa.gov>
In article <················@netcom.com>,
William Paul Vrotney <·······@netcom.com> wrote:
>In article <······················@wavehh.hanse.de> ········@wavehh.hanse.de (Martin Cracauer) writes:
>> This begins to be off-topic, but the lookup mechanism of ObjC requires
>> one pointer indirection more than the C++ virtual function
>> call. Pointer indirections can be very expensive given today's memory
>> bandwidth/CPU speed relation. 
>
>Actually, I was always in complete agreement with the original poster's
>remarks comparing C++ to Objective-C.  I was also under the impression that
>pointer indirection was not very expensive on many modern day computers.  So
>could you please explain what you mean by this last sentence.

It's very simple.  CPUs are increasing in speed very rapidly.  By
contrast, main memory has not improved in performance nearly as
rapidly.  80 nanosecond memory worked well with the 25 MHz
processors of a few years ago; today processors are an order of
magnitude faster, but memory performance has hardly improved.

For an example, let's take my wife's Mac clone, a PowerCenter 150, a
midrange personal computer by today's standards.  This machine has a
PowerPC 604 as its central processor, running at 150 MHz.  The 604 can
execute multiple integer instructions in one clock cycle, which is
about 6.6... nanoseconds (ns).  I think memory reference instructions
take at least 3 cycles, and that's when the on-chip cache already has
the data.

Main memory on this machine has an access time rating of 70 ns;
various intervening devices and speed-of-light delays add another few
ns.  By my reckoning, it takes at least 12 processor clock cycles to
retrieve data from main memory, and the bus is busy for some time
after that.  The only other PowerPC instructions that take this long
are floating point divisions.  *That* is the cost to chase a pointer
in today's systems!  And that's in addition to the usual function
calling overhead of saving and restoring registers, etc.

When you are trying hard to squeeze cycles out of an inner loop, that
~100 ns per indirection starts to add up.  Caching can help somewhat,
but only for the most frequently referenced routines.

Like it or not, the implications for Lisp aficionados are obvious.
Modern processors work fastest when memory references are minimized,
and when the remaining memory references are primarily in linear order
to maximize cache hits.  C++ has a performance edge on CLOS in this
respect.

One thing that can help is optimizing out unneeded indirections, both
in source code (e.g. by using vectors or vector-based structures
instead of lists, inlining short functions, etc.) and in the resulting
compiled code.  Here the compiler builders can definitely give us a
boost.
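The vector-versus-list point can be seen at the source level in
standard CL (the cache behavior itself depends on the implementation
and the hardware):

```lisp
;; A specialized vector stores its elements contiguously, so a scan
;; walks memory in linear order.  The equivalent list is a chain of
;; cons cells: one extra pointer to chase per element.
(defun sum-of-vector (v)
  (declare (type (simple-array double-float (*)) v))
  (loop for x across v sum x))

(defun sum-of-list (lst)
  (loop for x in lst sum x))
```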

For example, CLOS code in particular could benefit from partial
evaluation and faster method discrimination strategies that reduced
the number of memory references.  As another example, Symbolics some
years ago provided the Optimize World feature, which performed such
optimizations as replacing indirect function calls through symbols
with direct calls to the corresponding compiled functions.  The old
Lisp Machine CDR-coding hack would be a help too, especially now that
64-bit processors are becoming commonplace.
 -- Chuck
-- 
 Chuck Fry  Work: ······@ptolemy.arc.nasa.gov  Play: ······@best.com
	I alone am responsible for the contents of this post.
	      Keep NASA and Caelum Research out of this.
From: William Paul Vrotney
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <vrotneyDztM4I.Bzp@netcom.com>
In article <······················@ptolemy-ethernet.arc.nasa.gov>
······@kronos.arc.nasa.gov (Chuck Fry ) writes:

> 
> In article <················@netcom.com>,
> William Paul Vrotney <·······@netcom.com> wrote:
> >In article <······················@wavehh.hanse.de> ········@wavehh.hanse.de (Martin Cracauer) writes:
> >> This begins to be off-topic, but the lookup mechanism of ObjC requires
> >> one pointer indirection more than the C++ virtual function
> >> call. Pointer indirections can be very expensive given today's memory
> >> bandwidth/CPU speed relation. 
> >
> >Actually, I was always in complete agreement with the original poster's
> >remarks comparing C++ to Objective-C.  I was also under the impression that
> >pointer indirection was not very expensive on many modern day computers.  So
> >could you please explain what you mean by this last sentence.
> 
> It's very simple.  CPUs are increasing in speed very rapidly.  By
> contrast, main memory has not improved in performance nearly as
> rapidly.  80 nanosecond memory worked well with the 25 MHz
> processors of a few years ago; today processors are an order of
> magnitude faster, but memory performance has hardly improved.
> 
>     ... Stuff about his wife's Mac clone and cpu on chip caches ...

Gee whiz, I already knew about on-chip caches, as I suppose most
people who read this newsgroup do.  I appreciate Mr. Fry's response to
my question, but I feel like he is mixing apples and oranges to make a
point.  I will explain what I mean by this later.  But first, correct
me if I am wrong, keeping apples with apples: would not a pointer
indirection into cache and a non-pointer reference into cache make the
pointer reference only the usual one extra fetch cycle more costly?
And would not a pointer indirection into RAM and a non-pointer
reference into RAM also only be the usual one extra fetch cycle more
costly?  If this is true, and gee, I thought it was, then the
statistics of pointer hits out of CPU cache can be treated as a
separate argument (see below).

> 
> Like it or not, the implications for Lisp aficionados are obvious.
> Modern processors work fastest when memory references are minimized,
> and when the remaining memory references are primarily in linear order
> to maximize cache hits.  C++ has a performance edge on CLOS in this
> respect.
> 
> ... Stuff about using arrays instead of lists and CLOS optimizations ...

This is where I think Mr. Fry is mixing apples and oranges to make a
point.  True, Lisp suffers from this pointer/CPU-cache phenomenon, but
C++ does as well.  Oranges and oranges.  Arrays in Lisp can be created
with objects of a specific type, just like in C++.  C++ function calls
will jump out of cache just like in Lisp.  You can have typed local
variables in Lisp just like in C++.  If you build any pointer
structures in C++, you have the same cache problem that you would have
in Lisp.  Look at any large complex C++ program and you will see a
huge number of char* variables; how are these any different from a
Lisp pointer?

To say that C++ has a performance edge on Lisp (or CLOS) has elements of
truth but it is too simplistic and is not provably true for complex enough
programs.  I won't go into the reasons why this is so since it has been
repeated here so many times it sounds like a broken record.  But one thing
that I believe is easily observed and bears repeating, "The most
inefficiently created working Lisp program is by far more efficient than the
most efficiently created non-working C++ program".  And this has been
observed, at least by me, far more than any noticeable problems with Lisp
pointing out of cache.

Once again, I appreciate Mr Fry's response to my question, and this last
paragraph strays from Mr Fry's response.  But I didn't want to leave the
impression that it is somehow bad to write programs in Lisp simply because
of the cpu cache pointer problem prompted by my stupid question.
-- 

William P. Vrotney - ·······@netcom.com
From: Kelly Murray
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <54jvo6$kuc@sparky.franz.com>
> I thought a bit about what I feel to be "wrong" with CLOS. It is
> certainly not that "CLOS is slow"; rather, "CLOS's flexibility
> requires you to use it in the right places".

Speed is usually a very over-rated criterion,
but nevertheless, sometimes things must be as fast as possible.

My personal opinion is that CLOS *is* too slow...to be used
for everything.  This is very unfortunate, because it means
having to learn two ways of doing things instead of just one,
both defclass and defstruct, which are very different both
syntactically and semantically.

I believe CLOS needs to be changed or extended to allow it to
compile-out dynamic behavior.  I have specific suggestions.

0. "Sealing", which makes everything static and allows compile-time
   optimizations.  The suggestions below are really more limited
   forms of this.

1. Support for "primary" classes, like Dylan,
   which allow fixed slot offsets for a single-inheritance chain
   of primary classes.  This still allows multiple inheritance,
   but the non-primary mixed-in classes use the existing indirect
   lookup for slot accesses. 
   
2. "Direct Slot Accessors", which are not full methods.
   The standard slot accessors are full methods, which in many or most
   cases are simply unneeded overhead, both at runtime and in keeping
   around the meta-data for the methods/generic function.

Let me also recommend people read my most recent CLOS column: _Under_The_Hood_
in the Sept 1996 issue of the Journal of Object Oriented Programming (JOOP).
I described an implementation of CLOS that is much faster than the
typical implementation; it optimizes the common cases for speed at
slightly more expense in the more complex cases.
It is basically much more similar to a C++ implementation!

I also recently did an interesting
comparison between vectors, defstructs, and classes,
all using the exact same basic algorithm, where only the
representations were different.  The algorithm is a b-tree, or more
accurately a balanced 2-3-4 tree.  The test builds up a 20,000 element
structure.  The system does *heavy* slot read/write access,
without any method dispatch.

The vector representation was selected more for space than for speed,
as it uses 2 bits of the integer keys to hold bnode size information,
where the other representations allocate an entire slot for the size
and/or use the run-time tagged "type" of the object to identify the
size.  The vector code would probably be much faster if I used fixnum
declarations for the bit extractions.  The end goal was coding it in
C, but I don't have those numbers; I gave up before it was fully
debugged!

vector:      7.5 seconds, 387K allocated
defstruct:   5.0 seconds, 492K allocated
classes:    11.0 seconds, 753K allocated

I think this shows that ACL 4.3's implementation of slot-accessor methods
is excellent.  In previous tests using ACL 4.2, I showed CLOS was 4 times
slower.  But it is still slower than a defstruct.
I think in an ideal world, one could specify that a CLOS class should
behave like a defstruct, and the app could simply be recompiled.
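
One way to approximate that today is to give both representations the
same constructor and accessor names and select one at compile time, so
switching really is just a recompile.  A hypothetical sketch (not from
the post):

```lisp
;; Push :use-defstruct onto *features* before compiling to get the
;; defstruct representation; otherwise the CLOS class is compiled.
;; Both definitions expose the same NEW-NODE / NODE-SIZE interface.

#+use-defstruct
(defstruct (node (:conc-name node-) (:constructor new-node))
  (size 0 :type fixnum))

#-use-defstruct
(progn
  (defclass node ()
    ((size :accessor node-size :initform 0)))
  (defun new-node () (make-instance 'node)))
```

Client code written against NEW-NODE and NODE-SIZE compiles unchanged
under either setting.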
 

-Kelly Murray  ···@franz.com   http://www.franz.com
From: Richard A. O'Keefe
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <54q2i0$kcc$1@goanna.cs.rmit.edu.au>
·······@netcom.com (William Paul Vrotney) writes:
>Actually, I was always in complete agreement with the original poster's
>remarks comparing C++ to Objective-C.  I was also under the impression that
>pointer indirection was not very expensive on many modern day computers.  So
>could you please explain what you mean by this last sentence.

Consider a typical modern machine.
I have a real machine in mind, but I am not proclaiming its identity,
in case I have some of the details wrong.

The machine is a 200 MHz 4-way superscalar one,
which means that it can issue up to 4 instructions in each cycle,
and each cycle is 5 ns.
Only one of the instructions in each group can be a memory reference.

If a memory reference hits in L1 cache (16k) the result is available
3 cycles later (there are separate 16k caches for I and D references;
but this is a D reference).
If a memory reference misses the L1 cache but hits the L2 cache (512K
or more, typically 1MB), the result is available
8 cycles later.
If a memory reference misses both caches, the result is available
30 cycles later.
There is a load buffer, so the processor doesn't stall when a
memory reference misses, only if you try to use the result before
it is available.

So a pair of instructions like
	load R1, Offset(Base)
	call-indirect R1
where the function pointer is likely to be in L2 cache is likely to
take 5 cycles longer than a direct
	call procedure-name
(it's actually somewhat worse than this).

During those 5 cycles, you _could_ have done 20 other instructions...

And it's worse than that; the instruction prefetch system would _know_
where you were headed if the call were direct, and could start preloading
the I-cache.  With an indirect call, that isn't going to work, so you
will get extra I-cache misses that could have been avoided.
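
The same distinction is visible at the Lisp level: a call through a
function object first loaded from memory (the load / call-indirect pair
above) versus a call to a function the compiler knows by name.  A
minimal sketch, with made-up function names:

```lisp
(defun square (x) (* x x))

;; Direct: SQUARE is known at compile time, so the compiler can emit
;; a direct call (or inline it under suitable declarations).
(defun sum-squares-direct (v)
  (let ((s 0))
    (dotimes (i (length v) s)
      (incf s (square (svref v i))))))

;; Indirect: F must first be loaded, then called through a register --
;; the software analogue of load R1 + call-indirect R1.
(defun sum-squares-indirect (v f)
  (let ((s 0))
    (dotimes (i (length v) s)
      (incf s (funcall f (svref v i))))))
```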

In addition, this machine has a "Prefetch I-space" instruction that can
be used to start the page fault processing early,  so a direct call can
do
	prefetch I-space <procedure name>
	<pass arguments>
	call <procedure name>

This doesn't help very much if the instructions you want aren't in memory;
you wouldn't *believe* how long it takes to read a page (the equivalent
of several million instructions).  What it _does_ help with is the
I-space TLB, so that the MMU can be fetching the page table entry for
the page containing <procedure name> well before it is needed.

There is no "Prefetch I-space Indirect" instruction on this machine...

It's this kind of thing that makes link-time optimisation of method
calls (where the linker knows the complete class hierarchy, so can
generate direct calls for things that _might_ have been overridden
but happened not to be) so attractive.

For modern machines, the two things to know are
 - memories are getting faster, but not as fast as CPUs.  When your
   C compiler wants to know the size of your L1 and L2 caches, you
   know that memory references are a serious problem.
 - CPU designers make the common things fast; it's not that indirect
   calls have got worse, it's that direct calls have been improved a
   lot more.


-- 
Mixed Member Proportional---a *great* way to vote!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.
From: Lyman S. Taylor
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <556n29$sv4@pravda.cc.gatech.edu>
In article <············@goanna.cs.rmit.edu.au>,
Richard A. O'Keefe <··@goanna.cs.rmit.edu.au> wrote:
>·······@netcom.com (William Paul Vrotney) writes:
>>Actually, I was always in complete agreement with the original poster's
>>remarks comparing C++ to Objective-C.  I was also under the impression that
>>pointer indirection was not very expensive on many modern day computers.  So
>>could you please explain what you mean by this last sentence.
>
>Consider a typical modern machine.
>I have a real machine in mind, but I am not proclaiming its identity,
>in case I have some of the details wrong.
>
>The machine is a 200 MHz 4-way superscalar one,
>which means that it can issue up to 4 instructions in each cycle,
>and each cycle is 5 ns.
>Only one of the instructions in each group can be a memory reference.

 Perhaps of note is a recent paper from OOPSLA '96 by Driesen and Holzle,
 <http://www.cs.ucsb.edu/oocsb/papers/oopsla96.shtml>, on the cost of
 virtual function calls in C++.  The CPUs they looked at had a BTB
 (branch target buffer), where the address that a branch at a given PC
 jumps to is stored and used to prefetch.  I think the PPC 604 has this,
 among others.  With speculative execution you're going to need one of
 these.  And without speculative execution, 4-way (and greater)
 superscalar isn't going to buy you much performance.
 [ I'm assuming that the BTB will cache "jsr RX" as a branch. ]

>
>So a pair of instructions like
>	load R1, Offset(Base)
>	call-indirect R1
>where the function pointer is likely to be in L2 cache is likely to
>take 5 cycles longer than a direct
>	call procedure-name
>(it's actually somewhat worse than this).

  Yeah, but if your instruction scheduler outputs code like this, you're
  toast anyway. :-)  You could get back a few cycles by producing

	<instructions unrelated to Base and R1>
	load R1, Offset(Base)
	<instructions unrelated to R1>
	call-indirect R1

   For instance, there is the preloading of the function arguments into
   the appropriate registers, preparing the stack frame, et al.
   Granted, you won't get back the potential 30 on a complete miss, but...
  
   You can also use software pipelining, to perhaps move the load to
   the previous iteration of the loop:

    (loop for i in mumble
          do (blah i))

    In the ith iteration of the loop you start the load for the (i+1)th
    element of the list.  Given enough registers, you can just let that
    one register lie fallow for a while while you finish the computation
    for the ith iteration.  Once you reach the (i+1)th iteration, that
    item will have been "prefetched"... at least to some extent.

>It's this kind of thing that makes link-time optimisation of method
>calls (where the linker knows the complete class hierarchy, so can
>generate direct calls for things that _might_ have been overridden
>but happened not to be) so attractive.

   Yes, this is attractive, but in the paper above the median time spent
   doing virtual dispatch (C++ using only virtual dispatch) was
   approximately 14%.  A performance hit, but it isn't the "end of the
   world" either.  Most real programs have "something" nearby that is
   useful, so if you miss, then the next time you don't miss so badly.

   [ Note that with CC-NUMA machines, even the static-dispatch,
	value-semantics folks have these sorts of problems too.  So
	support will be added to the high-end CPUs of the future to help
	with this; e.g. the R10000 can continue with up to 4 outstanding
	loads.  Although this is usually couched as branching on unknown
	data values, jumping indirect technically falls into that
	classification.  ]
-- 
Lyman S. Taylor           Comment by a professor observing two students 
(·····@cc.gatech.edu)     unconscious at their keyboards:
				"That's the trouble with graduate students.
				 Every couple of days, they fall asleep."
From: Jeff Dalton
Subject: Re: ANSI CLISP: strengths vs. weaknesses?
Date: 
Message-ID: <846731153.9803.0@iiltd.demon.co.uk>
In article <······················@wavehh.hanse.de>,
Martin Cracauer <········@wavehh.hanse.de> wrote:

>Also, as Jim Veitch of Franz wrote, there are actually CLOS versions
>that can represent a CLOS object's slots as a flat array, so my former
>claim that slot accesses (not functions calls) have to be slower in
>CLOS was not right. 

I have never seen any _other_ way of representing slots than a flat
array, though it's usually an array of pointers.  What do you have in
mind?

As for pointers, boxing, etc., most implementations of structure
classes (as in defstruct) support unboxed slot values for fixnums and
the like.  So you don't need an especially fancy CLOS implementation
to get some cases of that sort.
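
The defstruct case can be sketched like this (whether the slots are
actually stored unboxed is implementation-dependent; the :type options
merely make it possible):

```lisp
;; Declared slot types let an implementation store the values
;; unboxed, i.e. without a pointer per slot.  ANSI CL only requires
;; the types to be respected, not any particular representation.
(defstruct particle
  (id 0 :type fixnum)
  (mass 1.0 :type single-float))
```

Reading (particle-mass p) can then fetch the raw float directly in
implementations that take advantage of the declarations.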

-- jd