From: nobody
Subject: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <165b3efa.0403011540.6edea34c@posting.google.com>
I wrote:

>=============================================
>In the following, Lisp == Common Lisp (ANSI):
>---------------------------------------------
>
>1. The fastest Lisp implementations are slow
>   (See any third-party benchmark)
>
>2. Nobody but a small clique of fanatics likes it
>   (Whose existence proves nothing: No matter how odd
>   or perverted the cause, there will be followers)
>
>3. The vast majority of people who study Lisp in
>   school, never want to use it out of their free will
>   later on.
>
>4. Lisp is the most complicated language in the world
>   (It has the biggest standard specification document)
>
>5. However, threads and GUI are not defined by the standard
>
>6. There is no open-source cross-platform native code compiler
>
>7. There is no standard C interface.
>


To which a Lisp fanatic replied, insinuating in an obfuscated
Lisper-like manner:

> What is the definition of "slow"? What particular third-party
> benchmark are we talking about?
>
> One analysis suggests that with the best of Lisp implementations you
> should not accept a speed penality much above 10% relative to C.




I'll call your bluff, mister.

The best known non-stupid (real problem, any algorithm) benchmark
is probably the Coyote Gulch test. There are many languages that it
has been translated into. If you can produce (write or find) and post
a Lisp version that is within 10% of C performance, I will admit
that #1 is incorrect. Otherwise, you will admit that your reply is
misleading propaganda.

Let the platform be Pentium 4, Linux 2.4.x (see #6)

Do you accept the challenge? (yes/no)






I predict that the fastest Lisp version will be 2-20 times
slower than C, and its code size will be *bigger* .

If you remove declaration statements from it, making it a "normal
Lisp program", I predict that it will be about 100 (one hundred)
times slower than C (10 if you are lucky, 1000 if not)

A person not familiar with Lisp might ask: what are these
declarations and why remove them? Well, these are of two kinds:

1. type declarations - these are very different from type
declarations in other languages. If you declare types incorrectly
in C++, the compiler will give you an error. If you declare your
types incorrectly in Lisp, like state that function F
accepts lists of integers, but after some complicated program
logic erroneously give it an integer instead, your program will
*crash*  at runtime, or worse.

2. optimization requests - these are also very different from
"-O3" in other compilers. The latter does not change the program
semantics, but in Lisp, adding
(optimize (speed 3) (safety 0) (debug 0)))
may do so: instead of throwing an exception, the program
might just ignore an error and crash later, or worse.


P.S.

We don't need no obfuscation
We don't need no FORM-CONTROL
No CADR CADAR in the classroom
Hackers, leave them lists alone
Hey! Hackers! Leave them LISPs alone!

Chorus:
All in all, it's just another missing FUNCALL
All in all, purge LISP once and for all!

(See "Why is lisp so weird?" for background)

From: Noah Roberts
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <AZSdne1dvrmpr9ndRVn-ig@scnresearch.com>
nobody wrote:
> I wrote:
> 
> 

I can answer some of these...

>>=============================================
>>In the following, Lisp == Common Lisp (ANSI):
>>---------------------------------------------
>>
>>1. The fastest Lisp implementations are slow
>>  (See any third-party benchmark)

 From the GCL website: "Very efficient. A function call is basically the 
same speed as a C function call, in fact identical to a C function call 
via a pointer."

Of course that is an advertizement to use GCL...but I wouldn't doubt it.
>>
>>2. Nobody but a small clique of fanatics likes it
>>  (Whose existence proves nothing: No matter how odd
>>  or perverted the cause, there will be followers)

Lisp is commonly used in the AI field.  Scheme is used all over the 
place.  Emacs uses lisp to extend a text editor.  The last 2 are not 
Common Lisp, but are lisp dialects.
>>
>>3. The vast majority of people who study Lisp in
>>  school, never want to use it out of their free will
>>  later on.

Who wants to do anything out of their free will?  I think you mean they 
don't want to use it willingly.  Not a good argument I am affraid, most 
students seem to be boobs.  CS students appear to not like programming 
in general, much less something as different as lisp.  College students 
don't like *anything*.

>>
>>4. Lisp is the most complicated language in the world
>>  (It has the biggest standard specification document)
>>
>>5. However, threads and GUI are not defined by the standard

4/5 don't know...  I wouldn't expect a standard to include GUI and/or 
threads since that is very platform dependent.  Most language standards 
do not define these things.
>>
>>6. There is no open-source cross-platform native code compiler

That is where you are definately incorrect: 
http://www.gnu.org/software/gcl/gcl.html

" GCL is the official Common Lisp for the GNU project. Its design makes 
use of the system's C compiler to compile to native object code, 
providing for both good performance and facile portability. GCL 
currently compiles itself and the primary free software Lisp 
applications, Maxima and ACL2, on eleven GNU/Linux architectures (x86 
powerpc s390 sparc arm alpha ia64 hppa m68k mips mipsel), Windows, Sparc 
Solaris, and FreeBSD. On most platforms, GCL can load native object code 
modules directly into its lisp core, where they are preserved in any 
custom lisp images produced via the save-system call."

>>
>>7. There is no standard C interface.
>>

Language Y is not good because it doesn't have a standard interface to 
language X?!

Lisp is a tool, nothing more, nothing less.

-- 
"I'm a war president.  I make decisions here in the Oval Office
  in foreign policy matters with war on my mind." - Bush
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <8765dnadzk.fsf@nyct.net>
Noah Roberts <········@dontemailme.com> writes:

> Lisp is commonly used in the AI field.

And the graphics field (Lord of the Rings, Gulf War I animations, Final
Fantasy models, Super Mario 64) and the gaming field (Jak and Dexter)
and the web services field (White House document server, Motorola fab
monitor) and the chemical field (MDL's Affrent) and the satellite
control field (Deep Space One) and the airline reservation field
(Orbitz, Delta) and the financial field (PricewaterhouseCoopers' audit
planning) and cargo planning for the space station and a Lucent phone
switch and CAD (iCAD, AutoCAD) and electrical engineering (Cadence
Design Planner) and e-commerce (Yahoo stores) and billing (Norway's
largest ISP, Nextra).

Oh, and that's all common lisp.

>>>6. There is no open-source cross-platform native code compiler
>
> That is where you are definately incorrect:
> http://www.gnu.org/software/gcl/gcl.html

Actually, GCL is one of the least "native" compilers for lisp. CMUCL,
SBCL, MCL, ACL, and LW all implement the _same_ Common Lisp language and
compile to native code on a total of dozens of OS/CPU combinations. SBCL
alone can sucessfully target a half-dozen CPUs. CMUCL has targeted a few
more than that in the past.

MCL is proprietary, but the core compiler was open-sourced and OpenMCL
is a viable implementation. ACL and LW are both purely proprietary, but
what does it matter? This is where the supposed deficiency in the
stardard's scope helps Lisp. All that functionality is now portable
across all the compilers, so you can choose an open source
implementation when it suits you and a proprietary implementation when
the added libraries, support agreements, and tunability justify the
cost.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Joona I Palaste
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c21gtg$mrl$1@oravannahka.helsinki.fi>
Could you please stop crossposting these to comp.lang.{c,c++,fortran},
thanks?

-- 
/-- Joona Palaste (·······@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"Holy Banana of this, Sacred Coconut of that, Magic Axolotl of the other."
   - Guardian in "Jinxter"
From: William D Clinger
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <fb74251e.0403040909.1b839709@posting.google.com>
Joona I Palaste <·······@cc.helsinki.fi> pleaded:
> Could you please stop crossposting these to comp.lang.{c,c++,fortran},
> thanks?

Nobody crossposted.

Will
From: Nils Gösche
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87fzcsrsjn.fsf@darkstar.cartan.de>
··················@yahoo.com (nobody) writes:

> I wrote:

[stuff]

He should translate about 350 lines of boring C++ code to Lisp just to
satisfy a troll named �nobody�?  Screw off.

Regards,
-- 
Nils G�sche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID #xEEFBA4AF
From: Tayssir John Gabbour
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <866764be.0403020418.307068a0@posting.google.com>
···@cartan.de (Nils G�sche) wrote in message news:<··············@darkstar.cartan.de>...
> He should translate about 350 lines of boring C++ code to Lisp just to
> satisfy a troll named �nobody�?  Screw off.

Let's not be too hasty, Nils.  This "nobody" fellow was basically
volunteering to write this benchmark, since he has enough time and
knows enough lisp to pen silly songs about it.  He can show us drafts
of his work on a dedicated thread, and people will give him advice. 
The educational value is enormous.

Now, admittedly a benchmark isn't the most interesting program. 
There's no efficient way to show off lisp's error-signalling system,
probably no bignums, data-directed technique, etc.  And an actual
programmer would use multiple complementary languages; I once used
Fortran for a month, which was faster than C in that context, but C
was best suited to printing out results since it was on Unix.

I'm sure he'll be glad to show the world his coding ability.  With the
same courage it took him to make such imperious claims about an entire
language anonymously on usenet.
From: Duane Rettig
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4znazs0bh.fsf@franz.com>
···········@yahoo.com (Tayssir John Gabbour) writes:

> ···@cartan.de (Nils G�sche) wrote in message news:<··············@darkstar.cartan.de>...
> > He should translate about 350 lines of boring C++ code to Lisp just to
> > satisfy a troll named �nobody�?  Screw off.
> 
> Let's not be too hasty, Nils.  This "nobody" fellow was basically
> volunteering to write this benchmark, since he has enough time and
> knows enough lisp to pen silly songs about it.  He can show us drafts
> of his work on a dedicated thread, and people will give him advice. 
> The educational value is enormous.

So are you then saying that nobody has promised to translate this
benchmark?  What an interesting and apropos double-entendre!  And he's
got you going like nobody's business.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Espen Vestre
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <kwr7wbqlid.fsf@merced.netfonds.no>
Duane Rettig <·····@franz.com> writes:

> So are you then saying that nobody has promised to translate this
> benchmark?  What an interesting and apropos double-entendre!  And he's
> got you going like nobody's business.

Nobody is a troll.
Nobody bashes lisp.
It's a beautiful day.
What more could we ask for?
-- 
  (espen)
From: David Steuber
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <m2llmjepnf.fsf@david-steuber.com>
···@cartan.de (Nils G�sche) writes:

> He should translate about 350 lines of boring C++ code to Lisp just to
> satisfy a troll named �nobody�?  Screw off.

I know it is bad to feed the trolls, but is this the benchmark he
speaks of:

  http://www.coyotegulch.com/reviews/intel_comp/intel_gcc_bench2.html

?

Is this a valid performance benchmark for comparing languages?  It
was written to compare C++ compilers.  That seems like a different
thing to do.  Still, number crunching is number crunching I guess.

Until I take a look at it, I don't know if I could write a Lisp
equivalent.  It's not just that I am still a Lisp newbie.  My math
skills have sadly atrophied from lack of use.  I never did an FFT
either.

If this *is* the correct benchmark and it is possible to write a Lisp
program to produce the same results, I may try it for my own
curiosity.  Does Lisp have a timing function that can be run from the
top level like the time utility in Unix?  I assume there must be
something along those lines for profiling code.

In any event, the test codes would have to be run on the same
hardware and OS to be meaningful.  I ain't got no PIV.

-- 
Those who do not remember the history of Lisp are doomed to repeat it,
badly.
From: Lars Brinkhoff
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <853c8rhbyr.fsf@junk.nocrew.org>
David Steuber <·············@verizon.net> writes:
> Does Lisp have a timing function that can be run from the top level
> like the time utility in Unix?

Yes, and it's called "time".
http://clhs.lisp.se/Body/m_time.htm

-- 
Lars Brinkhoff,         Services for Unix, Linux, GCC, HTTP
Brinkhoff Consulting    http://www.brinkhoff.se/
From: John Harrison
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c21a4b$1ob2kv$1@ID-196037.news.uni-berlin.de>
> >
> >4. Lisp is the most complicated language in the world
> >   (It has the biggest standard specification document)
> >

That's factually incorrect, the ANSI lisp standardisation document is a mere
51 pages, the C++ document is 748 pages. I would say that by your measure
lisp is easily the simplest programming language around.

john
From: John Harrison
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c21bsp$1o11sp$1@ID-196037.news.uni-berlin.de>
"John Harrison" <···············@hotmail.com> wrote in message
····················@ID-196037.news.uni-berlin.de...
> > >
> > >4. Lisp is the most complicated language in the world
> > >   (It has the biggest standard specification document)
> > >
>
> That's factually incorrect, the ANSI lisp standardisation document is a
mere
> 51 pages, the C++ document is 748 pages. I would say that by your measure
> lisp is easily the simplest programming language around.

Correction the ANSI Scheme document is 51 pages long. Scheme is the most
interesting dialect of lisp.

john
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <871xobadv1.fsf@nyct.net>
"John Harrison" <···············@hotmail.com> writes:

> Correction the ANSI Scheme document is 51 pages long. Scheme is the most
> interesting dialect of lisp.

As much as Java is the most interesting dialect of C. And the Scheme
standard definitely isn't 51 pages any more.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: John Harrison
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c21emq$1nojvr$1@ID-196037.news.uni-berlin.de>
"Rahul Jain" <·····@nyct.net> wrote in message
···················@nyct.net...
> "John Harrison" <···············@hotmail.com> writes:
>
> > Correction the ANSI Scheme document is 51 pages long. Scheme is the most
> > interesting dialect of lisp.
>
> As much as Java is the most interesting dialect of C. And the Scheme
> standard definitely isn't 51 pages any more.
>

I have IEEE Std 1178-1990, has a more recent standard been released?

john
From: Ray Dillinger
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <404550CD.187EB08F@sonic.net>
Rahul Jain wrote:
> 
> "John Harrison" <···············@hotmail.com> writes:
> 
> > Correction the ANSI Scheme document is 51 pages long. Scheme is the most
> > interesting dialect of lisp.
> 
> As much as Java is the most interesting dialect of C. And the Scheme
> standard definitely isn't 51 pages any more.
> 

Um, I was the committee chair when IEEE1178 was reaffirmed last year. 
This is the main reason I have that document memorized.  It most 
definitely is 51 pages in length.

R5RS has grown a little bigger; I believe it is now 54 or 55 pages.

				Bear
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <873c8mdpt2.fsf@nyct.net>
Ray Dillinger <····@sonic.net> writes:

> Um, I was the committee chair when IEEE1178 was reaffirmed last year. 
> This is the main reason I have that document memorized.  It most 
> definitely is 51 pages in length.
>
> R5RS has grown a little bigger; I believe it is now 54 or 55 pages.

OK, I've usually read the standard in an HTML-ized form. I didn't
realize the print form used such a small font.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Marco Antoniotti
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <KC11c.70$IJ5.51283@typhoon.nyu.edu>
John Harrison wrote:

> "John Harrison" <···············@hotmail.com> wrote in message
> ····················@ID-196037.news.uni-berlin.de...
> 
>>>>4. Lisp is the most complicated language in the world
>>>>  (It has the biggest standard specification document)
>>>>
>>
>>That's factually incorrect, the ANSI lisp standardisation document is a
> 
> mere
> 
>>51 pages, the C++ document is 748 pages. I would say that by your measure
>>lisp is easily the simplest programming language around.
> 
> 
> Correction the ANSI Scheme document is 51 pages long. Scheme is the most
> interesting dialect of lisp.

Couldn't find DEFSTRUCT in the 51 pages.  Not very interesting to me.


Marco
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87n06z1b74.fsf@nyct.net>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> Couldn't find DEFSTRUCT in the 51 pages.  Not very interesting to me.

Strong typing isn't for us mere programmers. Only the language
developers are allowed to do that. :)

"User-defined strong types considered harmful"?

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Christian Lynbech
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <871xob5cpg.fsf@baguette.defun.dk>
>>>>> "Rahul" == Rahul Jain <·····@nyct.net> writes:

Rahul> Marco Antoniotti <·······@cs.nyu.edu> writes:
>> Couldn't find DEFSTRUCT in the 51 pages.  Not very interesting to me.

Rahul> Strong typing isn't for us mere programmers. Only the language
Rahul> developers are allowed to do that. :)

Rahul> "User-defined strong types considered harmful"?

Perhaps we have a candidate for "Greenspuns 9. rule of programming":

        All discussions relating any dialect of lisp to any other
        language will eventually degenerate into bickering between
        the common lisp and scheme communities.

;-)

                             -- Christian
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87oerepxlg.fsf@nyct.net>
Christian Lynbech <·········@defun.dk> writes:

>         All discussions relating any dialect of lisp to any other
>         language will eventually degenerate into bickering between
>         the common lisp and scheme communities.

Well, they ARE the only serious competition out there. (Other than
dylan, in terms of features, but they're a bit lacking in terms of
presence.)

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Howard Ding
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <jy%0c.18376$ko6.215165@attbi_s02>
John Harrison wrote:

> 
> Correction the ANSI Scheme document is 51 pages long. Scheme is the most
> interesting dialect of lisp.
> 

Correction: to some people, Scheme is the most interesting dialect of 
Lisp. :-)
From: Chul-Woong Yang
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <l3X0c.135964$S3.1648914@news.bora.net>
I'm one of Lisp newbie, maybe not fanatic.

I've started learning Lisp because some said that
the execution speed of Lisp can be tolerable, compared to C.
So if your - nobody - claim turns out to be true, then I'll be
very disappointed.

Though, I believe there should be some reasons to anything which many 
guys insist. You may be right. Lisp can be slow. The Lisp application 
written by myself is slow. But they say that Lisp is easy language to 
write slow program, for newbie. So maybe it's my fault. :-(
But as far as I know, most of Lisp fanatics know both C world and Lisp 
world. They - including Eric Raymond - say Lisp is a thing worth study. 
So I keep going.

If you also know both worlds *well*, please share your opinion. When I 
was evaluating the worth of studying new programming language, I was 
eager to see an article like "Lisp sucks! Lisp sh*t! Lisp so slow that 
nobody can use it on production..." from the guy who lived in both 
worlds, that I failed to find.

Do Lisp sucks?
From: Matthew Danish
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <20040302094433.GF31147@mapcar.org>
On Tue, Mar 02, 2004 at 07:52:49AM +0000, Chul-Woong Yang wrote:
> I'm one of Lisp newbie, maybe not fanatic.
> 
> I've started learning Lisp because some said that
> the execution speed of Lisp can be tolerable, compared to C.
> So if your - nobody - claim turns out to be true, then I'll be
> very disappointed.

"nobody" is just a troll.  He is spreading lies to annoy people.  It is
not an uncommon thing in USENET to see people like him.

> Though, I believe there should be some reasons to anything which many 
> guys insist. You may be right. Lisp can be slow. The Lisp application 
> written by myself is slow. But they say that Lisp is easy language to 
> write slow program, for newbie. So maybe it's my fault. :-(
> But as far as I know, most of Lisp fanatics know both C world and Lisp 
> world. They - including Eric Raymond - say Lisp is a thing worth study. 
> So I keep going.
> 
> If you also know both worlds *well*, please share your opinion. When I 
> was evaluating the worth of studying new programming language, I was 
> eager to see an article like "Lisp sucks! Lisp sh*t! Lisp so slow that 
> nobody can use it on production..." from the guy who lived in both 
> worlds, that I failed to find.

The reason why it is easy to write slow code in Lisp is because the
constructs you use are generally much more powerful than a newbie (or a
C programmer) might think.  Consider the + function: in C it can be
turned into one or two assembly instructions, but in Lisp it might
involve a whole function call and type-dispatching.  What this buys you
is the ability to say (+ 1/2 2/3) and get the mathematically exact
result of 7/6 (as well as (+ #C(1 3) 5/3) => #C(8/3 3) and many other
numeric types), things you cannot do easily in C.  Once you start
introducing this functionality into C,  you find that the performance of
the C code is approximately (or worse) that of the Lisp code.  

The situation gets better for Lisp compilers because they can also
perform analysis on code and perhaps find ways to optimize previously
generic operations into very specific ones.  For example, if CMUCL
determines that the variables A and B in (+ A B) always have integer
values in the range 1 to 1000 (or whatever) then it can optimize that
operation as much as the C compiler does.  This is all effort expended
only by the compiler writers, and the benefits are enjoyed by all the
users.  However, to get the same level of expressiveness in C, every
user must do the hard work for himself.  This is a trade-off that is
quite common between high-level languages and low-level languages.

When you start needing the functionality which Lisp provides, such as in
a large, actually useful program, the efficiency gap between Lisp and C
disappears and can even reverse to where the Lisp program outperforms
the C program.  The reason that this might happen is because the code
written by the C programmer which duplicates the functionality of Lisp
is generally not as good, doesn't get any help from the compiler, has no
way to inform the compiler of potential optimizations, has to deal with
lots of annoying details which Lisp would take care of for you, and is
simply unable to provide the same level of flexibility and
expressiveness no matter what you do.  

Perhaps one of the more difficult parts of learning Lisp from a C
background is the realization of the power of the constructs, and
acquiring the knowledge to know where and how to use them.  A good
example of this is closures.  When I explain the concept to a typical C
programmer, they often say something like ``oh, you mean static
variables in functions'' or ``so what?''  Unless the person is versed in
computability theory (and/or the lambda calculus) it is difficult for
them to comprehend at first the staggering power of such a simple idea.
The concept of a closure is a feature which is almost universal to
high-level languages, and when you understand it is hard to imagine
getting by without it.

Many of the advantages that Lisp has over C are also shared by other
high-level languages.  I would say that these days it is no longer
debated whether high-level languages are better for writing complicated
programs than low-level languages.  The only people who argue otherwise
are a small fringe who are absolutely obsessed with every CPU cycle,
even when the benefit of gaining those cycles is far outweighed by the
cost of development.  A more interesting question is the choice between
the various high-level languages: What are the advantages and
disadvantages of using Common Lisp over one of Smalltalk, Self, SML,
OCAML, Scheme, Haskell, Mercury, etc?

I was a C programmer for approximately 8 years before I learned about
Lisp.  In that time I dabbled with assembly and a little C++.  After I
began learning Lisp, for the past four years, I have also learned about
many other languages such as SML, Scheme, Haskell, Smalltalk, Prolog,
Python, Ruby, etc.  It may be that Lisp opened up my mind, or maybe it
was just the right time.  But either way, the nice thing about Lisp is
that it encompasses or can support just about all high-level concepts in
programming languages, and the language still boasts some of the most
"programmer-friendly" environments.  So I continue to return to CL for
the majority of my work.

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
From: Chris Perkins
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <6cb6c81f.0403021032.49af77c9@posting.google.com>
And, let's not forget that Lisp has a compiler built into the
language.  This feature alone can result in massive speed gains over
similar real world C code.

A C program can often be made faster by special case coding option A
vs. option B, especially where A or B is in a speed intensive loop.
For example, if you were writing a blitter you might write a 4
bytes-at-a-time version and a 1 byte-at-a-time version. In these
cases, the options aren't changing during the speed intensive loop,
but they are options that the program needs to handle.

But what if you have lots of options in that loop?  Combinatorial
explosion occurs and you are in trouble.  Is a C programmer going to
produce tens, hundreds, or thousands of versions of a special loop
just to get the maximum speed?  Not typically.  Especially not if it's
going to balloon their executable up by megabytes.

But a Lisp program can take the current set of options and compile
fresh a new tight special cased loop and use it.  Made to order.  So
now the Lisp code has exactly optimized itself to the situation at
hand, and the C code is generalized and its performance suffers.

I know firsthand that situations like this occur in 3D rendering and
real-time special effects, and I'm quite sure there are many other
similar situations.

Chris Perkins
Media Lab, Inc.
From: Jacek Generowicz
Subject: The staggering power of closures
Date: 
Message-ID: <tyfhdx79xjp.fsf_-_@pcepsft001.cern.ch>
Matthew Danish <·······@andrew.cmu.edu> writes:

> Perhaps one of the more difficult parts of learning Lisp from a C
> background is the realization of the power of the constructs, and
> acquiring the knowledge to know where and how to use them.  A good
> example of this is closures.  When I explain the concept to a typical C
> programmer, they often say something like ``oh, you mean static
> variables in functions'' or ``so what?''  Unless the person is versed in
> computability theory (and/or the lambda calculus) it is difficult for
> them to comprehend at first the staggering power of such a simple idea.

I've heard statements to this effect before. Now, I love closures, I
like to believe that I understand them quite well. I find them
natural, I frequently use them, I often get annoyed by their absence
when using languages which don't support them (or support them badly).

Yet, I've never been tempted to call their power "staggering".

My familiarity with computability theory and lambda calculus is less
than intimate ... and repeatedly hearing references to the almost
supernatural power of closures makes me wonder whether I am missing
out on something, or whether it's merely a difference in propensity to
wax lyrical.

I use closures when I want to create functions at runtime, when I want
my functions to hold some state, sometimes even when I want to delay
the evaluation of something ... but I almost don't notice it when I do
these things ... I mostly notice when the language I am forced to use
prevents me from doing it.

Could you show me a usage of closures that would make me shout out
loud "But that's staggering!"?  I guess I'm hoping for people to
submit interesting uses of closures they have come up with, so that we
can all share and enjoy (and learn).

[Time to look up Kaz' super-handler-bind (or something along those
lines - the exact name escapes me right now) macro, which struck me as
cool at the time, but the details of which I have completely forgotten
now, other than that a closure being sent somewhere was central to the
idea.]

Also ... any good examples of where closures could not adequately be
replaced by callable class instances? I'm not talking about CLOS
classes, but your typical common or garden C++ functor, or Python
class with __call__ and the like. (Hi Mike :-)
From: Joe Marshall
Subject: Re: The staggering power of closures
Date: 
Message-ID: <fzcr9v76.fsf@ccs.neu.edu>
Jacek Generowicz <················@cern.ch> writes:

> Also ... any good examples of where closures could not adequately be
> replaced by callable class instances?

Any continuation-passing-style code should do.

> Yet, I've never been tempted to call their power "staggering".
> Could you show me a usage of closures that would make me shout out
> loud "But that's staggering!"?  I guess I'm hoping for people to
> submit interesting uses of closures they have come up with, so that we
> can all share and enjoy (and learn).

Depends on how jaded you are.
From: Rahul Jain
Subject: Re: The staggering power of closures
Date: 
Message-ID: <87llmj2rm2.fsf@nyct.net>
Joe Marshall <···@ccs.neu.edu> writes:

> Jacek Generowicz <················@cern.ch> writes:
>
>> Yet, I've never been tempted to call their power "staggering".
>> Could you show me a usage of closures that would make me shout out
>> loud "But that's staggering!"?  I guess I'm hoping for people to
>> submit interesting uses of closures they have come up with, so that we
>> can all share and enjoy (and learn).
>
> Depends on how jaded you are.

Maybe he finds closures to be so natural and so convenient that it's the
_lack_ of them that he'd find staggering?

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Kenny Tilton
Subject: Re: The staggering power of closures
Date: 
Message-ID: <9321c.2936$Wo2.2107@twister.nyc.rr.com>
Rahul Jain wrote:

> Joe Marshall <···@ccs.neu.edu> writes:
> 
> 
>>Jacek Generowicz <················@cern.ch> writes:
>>
>>
>>>Yet, I've never been tempted to call their power "staggering".
>>>Could you show me a usage of closures that would make me shout out
>>>loud "But that's staggering!"?  I guess I'm hoping for people to
>>>submit interesting uses of closures they have come up with, so that we
>>>can all share and enjoy (and learn).
>>
>>Depends on how jaded you are.
> 
> 
> Maybe he finds closures to be so natural and so convenient that it's the
> _lack_ of them that he'd find staggering?

No, he touched on that and conceded he was wobbled by their absence, 
just not /staggered/.

:)

kenny


-- 
http://tilton-technology.com

Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

Your Project Here! http://alu.cliki.net/Industry%20Application
From: Jacek Generowicz
Subject: Re: The staggering power of closures
Date: 
Message-ID: <tyf4qt79rb8.fsf@pcepsft001.cern.ch>
Kenny Tilton <·······@nyc.rr.com> writes:

> No, he touched on that and conceded he was wobbled by their absence,
> just not /staggered/.

Hmm, I see that you are intent on making your own prediction come true
:-)
From: David Steuber
Subject: Re: The staggering power of closures
Date: 
Message-ID: <m2vflmde5t.fsf@david-steuber.com>
Joe Marshall <···@ccs.neu.edu> writes:

> Jacek Generowicz <················@cern.ch> writes:
> 
> > Yet, I've never been tempted to call their power "staggering".
> > Could you show me a usage of closures that would make me shout out
> > loud "But that's staggering!"?  I guess I'm hoping for people to
> > submit interesting uses of closures they have come up with, so that we
> > can all share and enjoy (and learn).
> 
> Depends on how jaded you are.

Shouldn't you use COND for that?

-- 
Those who do not remember the history of Lisp are doomed to repeat it,
badly.
From: Matthew Danish
Subject: Re: The staggering power of closures
Date: 
Message-ID: <20040302165528.GI31147@mapcar.org>
On Tue, Mar 02, 2004 at 02:40:58PM +0100, Jacek Generowicz wrote:
> Could you show me a usage of closures that would make me shout out
> loud "But that's staggering!"?  I guess I'm hoping for people to
> submit interesting uses of closures they have come up with, so that we
> can all share and enjoy (and learn).

See my other recent reply to Chul-Woong.  That's something that is much
more difficult to do in a language without closures (I've tried).

In addition, I find the fact that closures and function application are
enough to compute anything computable by a Universal Turing Machine to
be pretty staggering too (lambda calculus).

> Also ... any good examples of where closures could not adequately be
> replaced by callable class instances? I'm not talking about CLOS
> classes, but your typical common or garden C++ functor, or Python
> class with __call__ and the like. (Hi Mike :-)

I am not terribly familiar with those sorts of things but I suppose that
any situation which calls for a (mutable) binding to be visible to
several different closures would be difficult to model using said
classes.  It can probably be achieved, though, by adding a layer of
indirection.

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
From: Jacek Generowicz
Subject: Re: The staggering power of closures
Date: 
Message-ID: <tyfznay8gl0.fsf@pcepsft001.cern.ch>
Matthew Danish <·······@andrew.cmu.edu> writes:

> On Tue, Mar 02, 2004 at 02:40:58PM +0100, Jacek Generowicz wrote:
> > Could you show me a usage of closures that would make me shout out
> > loud "But that's staggering!"?  I guess I'm hoping for people to
> > submit interesting uses of closures they have come up with, so that we
> > can all share and enjoy (and learn).
> 
> See my other recent reply to Chul-Woong. 

[ie this one:

<······················@mapcar.org>

http://www.google.com/groups?safe=images&ie=UTF-8&oe=UTF-8&as_umsgid=20040302103002.GG31147%40mapcar.org&lr=&hl=en

]

> > Also ... any good examples of where closures could not adequately be
> > replaced by callable class instances? I'm not talking about CLOS
> > classes, but your typical common or garden C++ functor, or Python
> > class with __call__ and the like. (Hi Mike :-)
> 
> I am not terribly familiar with those sorts of things but I suppose that
> any situation which calls for a (mutable) binding to be visible to
> several different closures would be difficult to model using said
> classes.  It can probably be achieved, though, by adding a layer of
> indirection.

Yup, independent closures which go their separate ways, being sent to
or stored in different parts of the program, but needing to be able to
refer to and update a shared binding.

Now, I can't see a real life situation where I would need this off the
top of my head, but by pointing it out here, you've probably increased
the probability of my thinking of such a solution when the opportunity
next presents itself. Thanks.
From: Joe Marshall
Subject: Re: The staggering power of closures
Date: 
Message-ID: <brnf9l3o.fsf@ccs.neu.edu>
Jacek Generowicz <················@cern.ch> writes:

> Could you show me a usage of closures that would make me shout out
> loud "But that's staggering!"?  

These aren't staggering, but they are pretty cool:

Any control construct whatsover can be built out of closures.  Any
data structure as well.  Even primitive data such as booleans and
numbers can be built out of closures.

If a computer language is missing something important, but it has
closures, you can fill the gap.
From: Jacek Generowicz
Subject: Re: The staggering power of closures
Date: 
Message-ID: <tyfvflm8fys.fsf@pcepsft001.cern.ch>
Joe Marshall <···@ccs.neu.edu> writes:

> Jacek Generowicz <················@cern.ch> writes:
> 
> > Could you show me a usage of closures that would make me shout out
> > loud "But that's staggering!"?  
> 
> These aren't staggering, but they are pretty cool:
> 
> Any control construct whatsover can be built out of closures.  Any
> data structure as well.  Even primitive data such as booleans and
> numbers can be built out of closures.
> 
> If a computer language is missing something important, but it has
> closures, you can fill the gap.

Theoretically, yes.

Have you actually done something like this is practice ?

(Again, maybe I should stress that I don't mean to belittle your
suggestions ... but that if you have an illustrative example then I'd
like to see one, because it would make the idea stick much more
effectively.)
From: Kenny Tilton
Subject: Re: The staggering power of closures
Date: 
Message-ID: <X811c.2929$Wo2.1838@twister.nyc.rr.com>
Jacek Generowicz wrote:
>>variables in functions'' or ``so what?''  Unless the person is versed in
>>computability theory (and/or the lambda calculus) it is difficult for
>>them to comprehend at first the staggering power of such a simple idea.
> 
> 
> I've heard statements to this effect before. Now, I love closures, I
> like to believe that I understand them quite well. I find them
> natural, I frequently use them, I often get annoyed by their absence
> when using languages which don't support them (or support them badly).
> 
> Yet, I've never been tempted to call their power "staggering".

I am having visions of a long thread involving mostly agreement but also 
lots of "yes, that is awesome, but not /staggering/".

This should be fun. :)

kenny

-- 
http://tilton-technology.com

Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

Your Project Here! http://alu.cliki.net/Industry%20Application
From: Frode Vatvedt Fjeld
Subject: Re: The staggering power of closures
Date: 
Message-ID: <2hishni9mu.fsf@vserver.cs.uit.no>
Kenny Tilton <·······@nyc.rr.com> writes:

> I am having visions of a long thread involving mostly agreement but
> also lots of "yes, that is awesome, but not /staggering/".

And don't forget "this can be done almost equally well in any
Turing-complete language".

-- 
Frode Vatvedt Fjeld
From: Kenny Tilton
Subject: Re: The staggering power of closures
Date: 
Message-ID: <J921c.2937$Wo2.2617@twister.nyc.rr.com>
Frode Vatvedt Fjeld wrote:

> Kenny Tilton <·······@nyc.rr.com> writes:
> 
> 
>>I am having visions of a long thread involving mostly agreement but
>>also lots of "yes, that is awesome, but not /staggering/".
> 
> 
> And don't forget "this can be done almost equally well in any
> Turing-complete language".

Thx, I /did/ forget that variant.

I recently showed someone something I had done in ten lines with a macro 
and I swear to God he said: "Big deal! Here is the same thing in my 
language. [shows me same functionality] It's just a pain writing those 
three pages of code..." I think this one will be a staggeringly easy 
convert.

:)

kenny

-- 
http://tilton-technology.com

Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

Your Project Here! http://alu.cliki.net/Industry%20Application
From: Jacek Generowicz
Subject: Re: The staggering power of closures
Date: 
Message-ID: <tyf8yij9rcx.fsf@pcepsft001.cern.ch>
Kenny Tilton <·······@nyc.rr.com> writes:

> I am having visions of a long thread involving mostly agreement but
> also lots of "yes, that is awesome, but not /staggering/".

I'll be deleriously happy even if the thread does not induce me to say
"staggering", as long as it makes me say "hey, that's cool, I hadn't
thought of that before" a few times.
From: Kenny Tilton
Subject: Re: The staggering power of closures
Date: 
Message-ID: <Wm31c.3307$Wo2.2799@twister.nyc.rr.com>
Jacek Generowicz wrote:

> Kenny Tilton <·······@nyc.rr.com> writes:
> 
> 
>>I am having visions of a long thread involving mostly agreement but
>>also lots of "yes, that is awesome, but not /staggering/".
> 
> 
> I'll be deleriously happy even if the thread does not induce me to say
> "staggering", as long as it makes me say "hey, that's cool, I hadn't
> thought of that before" a few times.

Well, it's prezelection season here in the states so i am sensitive to 
the expectations game. :) But in all seriousness, i did not respond 
seriously with what i wanted to say because you had already said you 
liked 'em and missed 'em elsewhere. So the bar really did seem set at 
"stagger".

Anyway, one thing that floors me when I look at closures is that I 
cannot believe it works. I am used to functions being black boxes. How 
the hell does a 1st-class function born of a lambda form get to muck 
with a binding (a) lexically outside the form and (b) dynamically no 
longer existent?! Sure, this is not so much functional as it is "wow, 
how'd they do that?" but it is part of what leaves me weak in the knees.

As for the power, I guess we agree (lambda () ...) is powerful. That 
means I can pass a function around as data to be applied as function by 
other functions. So that lets me program at a higher level. But what can 
these functions do? if they cannot close over variables of the spawning 
function, then they are effectively hard-coded agents, reacting only to 
their arguments. But with closures I am able to spawn different agents 
simply by having their code refer to my local bindings.

It's about time I mentioned Cells. One giant leap for mankind was when 
it turned out the runtime population of model instances itself could be 
mediated by cells. That means there is code all over the place that 
looks like:

  (defmethod make-inspector-view ((thingy standard-object) option)
     (make-instance 'inspector-clos-viewer
        :md-value thingy
        :appearance (c? (this-that (the-other thingy)
                          <tons of other code>
                          (case option .....))))


Without closures, I have to code:

      :appearace (case option
                   (:left (c? (whole-new-subroutine 5)))
                   (:top (c? (whole-new-subroutine 10))))

...and god help me if I was closing over two variables.

The rabbit punch line is that, like I said, I have code like this all 
over the place. My already higher-order function objects conform 
seamlessly to the exigencies of the dynamic spawning moment, reaching a 
yet higher order of expressive power.

And this is probably why CLIM leaves me cold: I don't need no stinkin 
presentation scheme, because closures (Cell rules are just closures) let 
code drape naturally over functional requirements.

Whew, I'm all punched out.

:)

kenny



-- 
http://tilton-technology.com

Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

Your Project Here! http://alu.cliki.net/Industry%20Application
From: Pascal Bourguignon
Subject: Re: The staggering power of closures
Date: 
Message-ID: <87d67vvw48.fsf@thalassa.informatimago.com>
Kenny Tilton <·······@nyc.rr.com> writes:
> Anyway, one thing that floors me when I look at closures is that I
> cannot believe it works. I am used to functions being black boxes. How
> the hell does a 1st-class function born of a lambda form get to muck
> with a binding (a) lexically outside the form and (b) dynamically no
> longer existent?! Sure, this is not so much functional as it is "wow,
> how'd they do that?" but it is part of what leaves me weak in the
> knees.

You have to realize that closures are objects(*) with the insides out, sort of.

(*): "classic" objects, where the data and the methods are both
encapsulated into a black box with messages as only interface.


(defmacro make-serie ((var init) &body increment)
    (let ((vinit (gensym)))
    `(let* ((,vinit ,init)
            (,var nil))
        (lambda (message)
            (case message
              ((:reset)  (setf ,var ,vinit))
              ((:next)   (setf ,var (if ,var (progn ,@increment) ,vinit))))))))


(defparameter s1 (make-serie (n 0) (1+ n)))
(defparameter s2 (make-serie (n 1) (* 2 n)))
(mapcar (lambda (s) (list   (funcall s :next)
                            (funcall s :next)
                            (funcall s :next)
                            (funcall s :reset)
                            (funcall s :next)
                            (funcall s :next)
                            (funcall s :next))) (list s1 s2))
--> ((0 1 2 0 1 2 3) (1 2 4 1 2 4 8))



(defclass serie ()
    ((init      :initarg :init  :accessor init)
     (value     :initform nil   :accessor value)
     (increment :initarg :increment :accessor increment)))

(defmethod reset ((s serie)) (setf (value s) (init s)))
(defmethod next  ((s serie)) 
    (setf (value s) (if (value s) (funcall (increment s) (value s))
                                  (init s))))

(defparameter s1 (make-instance 'serie :init 0 :increment (lambda (n) (1+ n))))
(defparameter s2 (make-instance 'serie :init 1 :increment (lambda (n) (* 2 n))))
(mapcar (lambda (s) (list   (next s)
                            (next s)
                            (next s)
                            (reset s)
                            (next s)
                            (next s)
                            (next s))) (list s1 s2))        

--> ((0 1 2 0 1 2 3) (1 2 4 1 2 4 8))



Or perhaps  I should  say that closures  are objects with  the insides
inside, really encapsulated and protected with the only entry point as
defined and controled by the closure function...


 
> As for the power, I guess we agree (lambda () ...) is powerful. That
> means I can pass a function around as data to be applied as function
> by other functions. So that lets me program at a higher level. But
> what can these functions do? if they cannot close over variables of
> the spawning function, then they are effectively hard-coded agents,
> reacting only to their arguments. But with closures I am able to spawn
> different agents simply by having their code refer to my local
> bindings.


For example, in emacs lisp:

(defun make-closure (var init template)
  (let ((vsym (gensym)))
    (funcall (subst vsym var init))
    (subst vsym var template)))

(setq c (make-closure 'n (lambda () (setq n 0)) (lambda (x) (incf n x) n)))
(list (funcall c 2)(funcall c 2)(funcall c 3))
--> (2 4 7)

In Common-Lisp, with _lexical_ closures:

(setq c (let ((n 0)) (lambda (x) (incf n x) n)))
(list (funcall c 2)(funcall c 2)(funcall c 3))
--> (2 4 7)


So, yes, in a way you have 90% of the closure usefulness in lambda.


The difference may look more  interesting when you capture the lexical
state inside some  other code.  In emacs lisp you  have to provide the
closure  creation  macro  with  the wanted  initialization,  while  in
Common-Lisp  closures, you  just take  what  you need  in the  lexical
environment.

(defun f (p q)
    (let ((x))
        ;; ...
        (let ((y))
            ;; ...
           (let ((z))
               ;; ...
                (lambda () (list p x z))))))

vs:

  ;; ...
                (make-closure '(p_ x_ z_) 
                            (lambda () (setq p_ p x_ x z_ z))
                            (lambda () (list p_ x_ z_))) ;; )))...


> And this is probably why CLIM leaves me cold: I don't need no stinkin
> presentation scheme, because closures (Cell rules are just closures)
> let code drape naturally over functional requirements.

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Karl Pflästerer
Subject: Re: The staggering power of closures
Date: 
Message-ID: <m3n06zj58e.fsf@hamster.pflaesterer.de>
Pascal Bourguignon <- ····@thalassa.informatimago.com wrote:

> For example, in emacs lisp:

> (defun make-closure (var init template)
>   (let ((vsym (gensym)))
>     (funcall (subst vsym var init))
>     (subst vsym var template)))

> (setq c (make-closure 'n (lambda () (setq n 0)) (lambda (x) (incf n x) n)))
> (list (funcall c 2)(funcall c 2)(funcall c 3))
> --> (2 4 7)

> In Common-Lisp, with _lexical_ closures:

> (setq c (let ((n 0)) (lambda (x) (incf n x) n)))
> (list (funcall c 2)(funcall c 2)(funcall c 3))
> --> (2 4 7)

Of course it is a bit nicer in Emacs Lisp if you use lexical-let:

(fset 'c (lexical-let ((n 0)) (lambda (x) (incf n x))))
(list (c 2)(c 2)(c 3))
 =>(2 4 7)


cl.el ist an extremly good lib for (X)Emacs.


   KP

-- 
 `Beware the Jabberwock, my son!
     The jaws that bite, the claws that catch!
  Beware the Jubjub bird, and shun
    The frumious Bandersnatch!'   "Lewis Carroll" "Jabberwocky"
From: Pascal Bourguignon
Subject: Re: The staggering power of closures
Date: 
Message-ID: <87brnebzu6.fsf@thalassa.informatimago.com>
······@12move.de (Karl Pfl�sterer) writes:

> Pascal Bourguignon <- ····@thalassa.informatimago.com wrote:
> 
> > For example, in emacs lisp:
> 
> Of course it is a bit nicer in Emacs Lisp if you use lexical-let:
> 
> (fset 'c (lexical-let ((n 0)) (lambda (x) (incf n x))))
> (list (c 2)(c 2)(c 3))
>  =>(2 4 7)

Indeed. Thank you for the pointer. 
 
> cl.el ist an extremly good lib for (X)Emacs.

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Sebastian Stern
Subject: The Unbearable Staggeringness of Closures (was: The staggering power of closures)
Date: 
Message-ID: <ad7d32de.0403030352.68d0d31d@posting.google.com>
Jacek Generowicz:
> Could you show me a usage of closures that would make me shout out
> loud "But that's staggering!"?

Observe and be staggered:

http://www.niksula.cs.hut.fi/~candolin/scheme/beauty.html

--
Sebastian Stern

"Freedom is the freedom to say (= (+ 2 2) 4). If that is granted, all else follows."
From: Jacek Generowicz
Subject: Re: The Unbearable Staggeringness of Closures (was: The staggering power of closures)
Date: 
Message-ID: <tyfn06y87er.fsf@pcepsft001.cern.ch>
········@yahoo.com (Sebastian Stern) writes:

> Jacek Generowicz:
> > Could you show me a usage of closures that would make me shout out
> > loud "But that's staggering!"?
> 
> Observe and be staggered:
> 
> http://www.niksula.cs.hut.fi/~candolin/scheme/beauty.html

But that's staggering!

(This relates to Unlambda in a similar way to how XML relates to
sexprs, right ? :-)
From: Joel Ray Holveck
Subject: Re: The Unbearable Staggeringness of Closures
Date: 
Message-ID: <87oeqv5a6o.fsf@thor.piquan.org>
>>> Could you show me a usage of closures that would make me shout out
>>> loud "But that's staggering!"?
>> Observe and be staggered:
>> http://www.niksula.cs.hut.fi/~candolin/scheme/beauty.html
> But that's staggering!
> (This relates to Unlambda in a similar way to how XML relates to
> sexprs, right ? :-)

It looks like it.  It seems to be an expansion on the idea of
recursion without named functions that Unlambda is so proud of.  I was
a bit surprised to see the (lambda (loop n) ...) towards the end,
since all the rest is curried.  When I first looked at that code, I
almost expected it to use Church integers.

Here's a CL expression that uses the same idea for recursion in a
lambda, but still uses a few modern conveniences like a built-in
multiplication operator and multiple arguments:

  (lambda (n)
    ((lambda (fn ni)
       (funcall fn fn ni))
     (lambda (self nii)
       (if (= nii 0) 1 (* nii (funcall self self (- nii 1)))))
     n))

Cheers,
joelh
From: Eric Smith
Subject: Re: The staggering power of closures
Date: 
Message-ID: <h57f60thi346hd5inuvqtindro7m3pa1av@4ax.com>
On 02 Mar 2004 14:40:58 +0100, Jacek Generowicz
<················@cern.ch> wrote:

>Yet, I've never been tempted to call their power "staggering".

Common Lisp work tends to be done in long sessions, because you get so
much work done, the momentum glues you to your chair, hour after hour,
and even causes your blood circulation to degrade, from sitting still
too long.  Common Lisp has the power, when you finally get up from
your chair, to make you stagger.

--
My email address is ··········@ISP.net where Name is
eric, Number is 9000, and ISP is earthlink.
From: Matthew Danish
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <20040302125712.GH31147@mapcar.org>
On Mon, Mar 01, 2004 at 03:40:00PM -0800, nobody wrote:
> 2. optimization requests - these are also very different from
> "-O3" in other compilers. The latter does not change the program
> semantics, but in Lisp, adding
> (optimize (speed 3) (safety 0) (debug 0)))
> may do so: instead of throwing an exception, the program
> might just ignore an error and crash later, or worse.

This is actually an interesting point.  Lisp optimization declarations
can change the semantics of programs in error conditions.  The standard
makes a distinction between ``safe'' and ``unsafe'' code and discusses
the differences in behavior in section 1.4.2 Error Terminology.  I would
hardly disqualify optimization declarations from benchmarks, however.
By your argument, all C and C++ programs are disqualified from
benchmarking with or without optimization flags, because those languages
barely have any error checking at all and the program will simply crash
with a mysterious message.  Common Lisp clearly defines how to designate
safe and unsafe code, and gives semantics for both cases.  As far as any
benchmark can be valid, I do not see any reason why to deny Lisp
compilers the ability to elide run-time checks which C compilers do not
even bother to try including.

The fact is that optimization declarations are part of the Common Lisp
language.  That you cannot or refuse to comprehend that fact is not a
valid reason to omit their use in a benchmark.  Real CL code uses them
extensively, so a benchmark without them would not be benchmarking CL.

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
From: Joe Marshall
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <k7239vc5.fsf@ccs.neu.edu>
··················@yahoo.com (nobody) writes:

> A person not familiar with Lisp might ask: what are these
> declarations and why remove them? Well, these are of two kinds:
>
> 1. type declarations - these are very different from type
>     declarations in other languages.  If you declare types
>     incorrectly in C++, the compiler will give you an error.

As you mentioned, Lisp type declarations are different from type
declarations in other languages.  The should be thought of as promises
to the computer about certain things it cannot figure out for itself.
As such, a Lisp type declaration is more akin to a reinterpret cast
than to a variable declaration.

No C++ compiler will give you an error if you have an incorrect
reinterpret cast.  An incorrect cast would likely lead to the OS
aborting your program.

> If you declare your types incorrectly in Lisp, like state that
> function F accepts lists of integers, but after some complicated
> program logic erroneously give it an integer instead, your program
> will *crash* at runtime, or worse.

There are lisp implementations that gracefully handle out-of-bounds
pointers, so this is simply not true.

> 2. optimization requests - these are also very different from
> "-O3" in other compilers. The latter does not change the program
> semantics, but in Lisp, adding
> (optimize (speed 3) (safety 0) (debug 0)))
> may do so: instead of throwing an exception, the program
> might just ignore an error and crash later, or worse.

Changing the optimizations of a correct program should not alter the
semantics.  (Some vendors will change fixnum assumptions.)
From: Christian Lynbech
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ofd67va8ee.fsf@situla.ted.dk.eu.ericsson.se>
>>>>> "nobody" == nobody  <··················@yahoo.com> writes:

nobody> The best known non-stupid (real problem, any algorithm) benchmark
nobody> is probably the Coyote Gulch test. There are many languages that it
nobody> has been translated into.

I'll have a look at it. 

So far I have only found implementations in Fortran, Java and C++
(almabench 1.0.1), do you know of any other language implementations?


------------------------+-----------------------------------------------------
Christian Lynbech       | christian ··@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj03c8ro1va.fsf@five-percent-nation.mit.edu>
Christian Lynbech <·················@ericsson.com> writes:

> >>>>> "nobody" == nobody  <··················@yahoo.com> writes:
> 
> nobody> The best known non-stupid (real problem, any algorithm) benchmark
> nobody> is probably the Coyote Gulch test. There are many languages that it
> nobody> has been translated into.
> 
> I'll have a look at it. 
> 
> So far I have only found implementations in Fortran, Java and C++
> (almabench 1.0.1), do you know of any other language implementations?
> 

[note: crossposts deleted]

If you're going to get involved with this (I'm deeply unclear whether
this is a good plan), and you actually care about interacting with
"nobody", I suggest being careful to agree in advance on the terms of
comparison.

It seems clear that using enough declarations, you can write a CL
program that's close enough to a C program (all the numbers are
double-floats, all the array sizes are known in advance, the small
functions are inlined, etc.) that the differences in performance will
be mostly a function of the quality of the native code
generation/optimization.  So if you agree to the "challenge", what do
you compare against?  I'd be willing to speculate that the best code
CMUCL will put out will be more-or-less as fast as gcc code, but that
would of course make it more-or-less twice as slow as the Intel
compiler code (on Pentium IV).  That shows that Intel was able to
build a better backend generator, but doesn't tell us anything about
the relative intrinsic speed of the languages.

Of course, there are other problems with nobody's challenge.  One is
the choice of 10% as a threshold of "good enough", which is arbitrary.
I write numerical number-crunching programs in CL all the time.  I
might care about 10% performance in a 350 line program that I was
going to use for years.  My common use case is a 500 line program that
I use for a couple of weeks.  Even if it's 25% or 50% slower than C,
that's fine; a factor of 10 is not fine, but fairly minimal
declarations and a collection of macros I reuse from program to
program let me avoid the factor of 10.

The other issue is that once I've bummed my CL program for
maximum speed by writing it like a C program, I've lost all the fun
and most of the advantages of CL.  What would be more interesting to
me (and I might do this if I get the time, which realistically I
won't) would be an attempt to redo these benchmark programs in a
fairly efficient yet Lispy style.  "My version of the benchmark is
only 25% slower than then the C program, but look how much easier it
was to write."  Although looking at the almabench code, I'm not sure
how much opportunity this program really offers.

rif
From: Rolf Wester
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c227rt$m8u$1@nets3.rz.RWTH-Aachen.DE>
rif wrote:

> I might care about 10% performance in a 350 line program that I was
> going to use for years.  My common use case is a 500 line program that
> I use for a couple of weeks.  Even if it's 25% or 50% slower than C,

I use CMUCL for small number crunching programs too, but the fastest
CL program of all I wrote and compared to a C++ equivalent was 50% 
slower. It was a matrix-matrix multiplication using declarations as much
as possible. Most other programs were even slower.

Some months ago I had to implement a kind of a scheduler in C++. I 
prototyped it in CL (CMUCL) and then reimplemented it in C++. The CL 
code without any declarations was 10 times slower compared to the 
C++-code. I didn't try to optimize the CL version it so I can't tell how 
fast it could have been. The main advantage of prototyping was that it 
was much easier to develop the algorithms in CL (dynamic, interactive, 
much easier to use data structures etc.). It probably was faster to do 
it this way instead of trying to develop the algorithms in C++.

Rolf Wester
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4nznaz1cep.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "Rolf" == Rolf Wester <······@ilt.fraunhofer.de> writes:

    Rolf> rif wrote:

    >> I might care about 10% performance in a 350 line program that I was
    >> going to use for years.  My common use case is a 500 line program that
    >> I use for a couple of weeks.  Even if it's 25% or 50% slower than C,

    Rolf> I use CMUCL for small number crunching programs too, but the fastest
    Rolf> CL program of all I wrote and compared to a C++ equivalent was 50%
    Rolf> slower. It was a matrix-matrix multiplication using declarations as
    Rolf> much
    Rolf> as possible. Most other programs were even slower.

Very likely because CMUCL doesn't have specialized 2D arrays so you
get hit with several memory accesses per element.  If you reimpliment
the code using 1D specialized arrays, you'll probably get something a
bit closer.  It probably won't match C speed though.

Ray
From: Christian Lynbech
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87wu633x9t.fsf@baguette.defun.dk>
>>>>> "rif" == rif  <···@mit.edu> writes:

rif> If you're going to get involved with this (I'm deeply unclear whether
rif> this is a good plan), and you actually care about interacting with
rif> "nobody", I suggest being careful to agree in advance on the terms of
rif> comparison.

I am not truly convinced of the wisdom of my endeavour either, but I
think the challenge is fun and I doubt I can make more damage by doing
a poor job than by not trying at all.

rif> Of course, there are other problems with nobody's challenge.  One is
rif> the choice of 10% as a threshold of "good enough", which is arbitrary.

It is however in some sense my own invention. This is the number I
remember from the PFANNKUCH article in the ACM Lisp Pointers.


------------------------+-----------------------------------------------------
Christian Lynbech       | christian ··@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Marco Antoniotti
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <vO71c.71$IJ5.51614@typhoon.nyu.edu>
Christian Lynbech wrote:

>>>>>>"rif" == rif  <···@mit.edu> writes:
> 
> 
> rif> If you're going to get involved with this (I'm deeply unclear whether
> rif> this is a good plan), and you actually care about interacting with
> rif> "nobody", I suggest being careful to agree in advance on the terms of
> rif> comparison.
> 
> I am not truly convinced of the wisdom of my endeavour either, but I
> think the challenge is fun and I doubt I can make more damage by doing
> a poor job than by not trying at all.
> 
> rif> Of course, there are other problems with nobody's challenge.  One is
> rif> the choice of 10% as a threshold of "good enough", which is arbitrary.
> 
> It is however in some sense my own invention. This is the number I
> remember from the PFANNKUCH article in the ACM Lisp Pointers.

You can always answer with a counter request to implement and test in 
C++ some nifty multimethod combination or a full blown CHANGE-CLASS. :) 
  That would be fun :)

Cheers
--
Marco
From: Ray Dillinger
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <40455463.2011B4AE@sonic.net>
Christian Lynbech wrote:
> 
> >>>>> "nobody" == nobody  <··················@yahoo.com> writes:
> 
> nobody> The best known non-stupid (real problem, any algorithm) benchmark
> nobody> is probably the Coyote Gulch test. There are many languages that it
> nobody> has been translated into.
> 
> I'll have a look at it.
> 
> So far I have only found implementations in Fortran, Java and C++
> (almabench 1.0.1), do you know of any other language implementations?
> 

Well, if the overflow and roundoff errors in C, and the CPU cycles
a Lisp program would normally spend detecting them and doing better, 
are important to this test, you may want to use a lisp that has the 
same numeric limitations as C.

If you write this in Stalin, and compile it, it will probably have 
within-a-whisker-of-C performance.  

				Bear
From: nobody
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <165b3efa.0403060625.665be571@posting.google.com>
Christian Lynbech <·················@ericsson.com> wrote in message news:<··············@situla.ted.dk.eu.ericsson.se>...
> >>>>> "nobody" == nobody  <··················@yahoo.com> writes:
>  
>>  [ http://www.coyotegulch.com ]
> 
> I'll have a look at it. 


UPDATE: Lispers try hard, fail miserably, as usual.


I was proven right on all points. The version Lispers declared final (*) :

 * was *twice* the size of C++
   (excluding comments, "data", blank lines, "{", "}" and "#||"
   in both versions - 186 vs 87 lines)

 * ran 2.88 times slower than C++
   (CMUCL vs icc -wX -O3 on Pentium4. ICC can be freely
   downloaded from intel.com BTW. Use rpm2cpio to install on non-RH machines)

 * took days and cooperation of multiple coders just to _optimize_
   (original version was way slower)

 * couldn't compile on some compliant Lisp systems like CLisp
   after the optimizations.

 * was *buggy* !
   (this was no "accident". I predicted how error-prone Lisp would
   get if you try to optimize it:    
   ·································@posting.google.com )


Acknowledgements:

the following people translated the benchmark into Lisp or
contributed optimizations:

Pascal Bourguignon,
Gareth McCaughan,
Raymond Toy,
Christian Lynbech,
Will Hartung,
Marco Antoniotti,
Nils G�sche.
<···@mit.edu> eventually discovered that the "final" Lisp program
returned bogus results.

(*) 
    ···················@g.mccaughan.ntlworld.com
    ···················@thalassa.informatimago.com
From: Brian Mastenbrook
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <060320041015234048%NOSPAMbmastenbNOSPAM@cs.indiana.edu>
UPDATE: Nobody tries hard, fails miserably, as usual.

Nobody was proven wrong on all points. The response it decreed as final
(*) :

* was *twice* as moronic as the average c.l.l post
  (excluding random flames, spam and crossposting).

* took 3.5 times as long to respond
  In fact, nobody analyzed the /wrong/ version of the lisp program,
instead of the well optimized verson of Raymond Toy and myself, which
had been posted hours earlier and beats GCC 3.3 on the Pentium 4.

* needed days just to wrap his feeble little head around the lisp
version and the nature of optional optimization.

* couldn't test on a portable C++ compiler.

* was *buggy* !
  nobody confused a bug in the translation for an actual Lisp bug, and
completely failed to understand the nature of run-time safety
optimization levels. (this was no "accident". Raymond Toy predicted how
error prone his response would be:
···················@edgedsp4.rtp.ericsson.se )

(*)
    ·································@posting.google.com
    ············································@cs.indiana.edu

-- 
Brian Mastenbrook
http://www.cs.indiana.edu/~bmastenb/
From: nobody
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <165b3efa.0403070032.25041009@posting.google.com>
Brian Mastenbrook wrote:


> * was *twice* as moronic as the average c.l.l post
>   (excluding random flames, spam and crossposting).

A Lisp zealot slinging insults, how "unusual" ! You couldn't argue
with the cold facts, so you had to resort to the Lisper tactic.

> * took 3.5 times as long to respond

As long as what? We are not timing me, we are timing benchmark
entries, genius!

>   In fact, nobody analyzed the /wrong/ version of the lisp program,
> instead of the well optimized verson of Raymond Toy and myself

No, I didn't. I used the version R. Toy posted after G. McCaughan
claimed the Lisp version was "good enough". The bug in it was
discovered later, and correcting it only made the Lisp version a bit
bigger and slower, so I don't see what you are whining about.

> * needed days just to wrap his feeble little head around the lisp
> version and the nature of optional optimization.

The optimized version didn't work with CLisp. Period. Be a man and
admit it.

In contrast, the C++ program works with any compliant compiler (And
without any conditional compilation macros)

The Lisp program is 2 times longer than C++, and using the best
compilers for the respective languages, it is 2.88 times slower on
Pentium 4.

You had to manually hoist all variables out of the intensive parts,
because your Lisp compiler couldn't do it for you.

The real issue is, of course, that in trying to optimize it, you made
the program return bogus results.

> * couldn't test on a portable C++ compiler.

I chose ICC. It works on both Windows and Linux. Your Lisp compiler
only works on UNIXes. But since we are using Linux 2.4.x on Pentium 4,
it does not really matter. You seem to be bitter about the freedom to
choose the fastest compiler for one's language.

> * was *buggy* !
>   nobody confused a bug in the translation for an actual Lisp bug, 

Nope. The bug was in your bloated and slow "optimized" Lisp program.
If <···@mit.edu> hadn't discovered it, it would have probably stayed
there.

(What's a "Lisp bug" anyway? A bug in ANSI spec? When did I claim
there was one there?)

Your childish flaming is amusing, as I'm sure it will be to your
future potential employers.

But please keep in mind that real professionals using REAL PROGRAMMING
LANGUAGES are reading this in non-Lisp forums, and to them, a Lisp
bigot like you sounds crazier than a Scientologist nut.

BTW, not a single person posted timings that showed CMUCL to be as
fast as GCC. You must be looking at "user time", which may or may not
include garbage collection, Mr. Lisp Expert.

Did I say your program is 2 times longer, 2.88 times slower, took days
to optimize and gave bogus results? :-P
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <OSH2c.398$Cm3.0@newsread3.news.pas.earthlink.net>
nobody wrote:
> 
> No, I didn't. I used the version R. Toy posted after G. McCaughan

I didn't post any version.  What are you talking about?
> 
> The optimized version didn't work with CLisp. Period. Be a man and
> admit it.

How so?  It's running it as I type this.  (However, I don't expect it to 
finish anytime soon, since Clisp doesn't compile to native code.)

> 
> In contrast, the C++ program works with any compliant compiler (And
> without any conditional compilation macros)
> 

Brian's code is 100% ANSI-CL.  If some compiler can't handle it, it's 
the compiler's fault, not Brian's.

> compilers for the respective languages, it is 2.88 times slower on
> Pentium 4.

With ICC.  Why do you not also whine about how bad gcc is?  Is it not 
equally bad in your eyes?  If not, why not?

Why do you also not complain about Python?  Java?  These languages have 
their own zealots, just like Lisp and C++.

> 
> The real issue is, of course, that in trying to optimize it, you made
> the program return bogus results.

That was a programmer error, since fixed.  What are you complaining 
about?  Never wrote a buggy program?

> Your childish flaming is amusing, as I'm sure it will be to your
> future potential employers.

Be sure to point your potential employers to this thread as well.

> 
> But please keep in mind that real professionals using REAL PROGRAMMING
> LANGUAGES are reading this in non-Lisp forums, and to them, a Lisp
> bigot like you sounds crazier than a Scientologist nut.

All real professionals understand the benefits and shortcomings of other 
languages and can appreciate the goals for other languages may very well 
not be the same goals as the language they're currently using.
> 
> BTW, not a single person posted timings that showed CMUCL to be as
> fast as GCC. You must be looking at "user time", which may or may not
> include garbage collection, Mr. Lisp Expert.

Did you even read the results that Brian posted?  Here is a short clip:

Intel Pentium 4 3.0 GHz

g++ (GCC) 3.3.1 20030626 (Debian prerelease)
g++ -o almabench -march=pentium4 -mcpu=pentium4 -O3 -msse -msse2 -mmmx
almabench.o

real    0m27.374s
user    0m27.360s
sys     0m0.000s


CMUCL 18e:

* (time (almabench:main))
; Compiling LAMBDA NIL:
; Compiling Top-Level Form:

; Evaluation took:
;   26.16 seconds of real time
;   26.16 seconds of user run time
;   0.0 seconds of system run time
;   78,845,426,712 CPU cycles
;   0 page faults and
;   216 bytes consed.
;

SBCL 0.8.5.24

* (time  (almabench:main))

Evaluation took:
                  26.895 seconds of real time
                  26.9 seconds of user run time
                  0.0 seconds of system run time
                  0 page faults and
                  0 bytes consed.


So, SBCL on the same 3.0GHz P4 took 26.895 sec real time (which includes 
everything, including GC time (which is zero in this case)).  GCC 3.3.1 
took 27.37s real time.  CMUCL 18e took 26.16 sec real time.  Looks like 
CMUCL and SBCL were faster than gcc in this case, which was exactly what 
you asked for.  (Yes, you may say icc is faster, but that's not what you 
asked for here.)

Ray
From: Kenny Tilton
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <e_I2c.24239$Wo2.16176@twister.nyc.rr.com>
nobody wrote:

> You had to manually hoist all variables out of the intensive parts,
> because your Lisp compiler couldn't do it for you.

I think you have that backwards. One of the key features and advantages 
of Lisp is that the compiler hoists values out of their necessary binary 
representation into wrappers (boxes) which can be passed around in code 
devoid of tedious type declarations. The alternative is for the 
programmer constantly to be tending to this chore or even grinding to a 
halt when they try to stuff mixed values in an array.

By chance (ha!) you happened to pick a benchmark in which the boxing 
served no value, and then the CL crowd was silly enough to rise to the 
bait. Well, no, they just thought it would be fun to beat gcc at its own 
game, so, yes, they whipped up some astounding declaration tricks to get 
the Lisp compiler effectively to compile C++ instead of Lisp. ie, they 
got it to leave out the boxing, and as a compiler enthusiast surely you 
admire that the Lisp compiler can be so persuaded.

As for the difficulty of persuading the compiler to handle numeric 
values as if at the machine level, yep, it looks a lot like what C++ and 
Java programmers do with every line of code they write.

The scoop is that most code does not look like this benchmark, so it is 
better to pick a programmer-friendly language that is fast for most code 
and then use declarations or the FFI to "Just Call C" if some corner of 
an app has to do a bunch of math /and/ gets hit often enough to become a 
performance issue.

> Your childish flaming is amusing, as I'm sure it will be to your
> future potential employers.

Watch the over-acting. Someone other than me may figure out that you are 
actually a lispnik posing as an anti-lispnik as an excuse to spam 
non-lisp NGs with pro-Lisp propaganda you know will arise from such a 
"troll".

kenneth

-- 
http://tilton-technology.com

Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

Your Project Here! http://alu.cliki.net/Industry%20Application
From: Coby Beck
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c2epl1$sg5$1@otis.netspace.net.au>
"nobody" <··················@yahoo.com> wrote in message
·································@posting.google.com...
> Brian Mastenbrook wrote:
> > * was *twice* as moronic as the average c.l.l post
> >   (excluding random flames, spam and crossposting).
>
> A Lisp zealot slinging insults, how "unusual" ! You couldn't argue
> with the cold facts, so you had to resort to the Lisper tactic.

plonk.

Hey, I can still say that I have nobody in my killfile...

-- 
Coby Beck
(remove #\Space "coby 101 @ big pond . com")
From: Kaz Kylheku
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <cf333042.0403021509.68d991fe@posting.google.com>
··················@yahoo.com (nobody) wrote in message news:<····························@posting.google.com>...
> 1. type declarations - these are very different from type
> declarations in other languages. If you declare types incorrectly
> in C++, the compiler will give you an error. If you declare your
> types incorrectly in Lisp, like state that function F
> accepts lists of integers, but after some complicated program
> logic erroneously give it an integer instead, your program will
> *crash*  at runtime, or worse.

[ blah blah ]

You really are an incredibly stupid troll.

Your problem is here that you don't know much about C++ *or* Lisp. I
would not trust you to put ten lines of C++ together.

The C++ language is chock-full of undefined behaviors. From the
addition of integers that overflows to evaluation order problems like
a[i] = i++.

Even type conversions produce undefined behavior, like coercing -1.0
to the type unsigned int.

The static type system is weak. It's very easy to write a C++ program
that will compile and link to produce an executable, yet contains
serious type mismatch errors.

I'm not even talking about programs that contain type casts! Here is
an example:

   // file foo.h
   int foo(int);

   // file foo.cpp --- missing #include "foo.h"
   double foo(int x) { /*...*/ }

   // file bar.cpp
   #include "foo.h"
   
   int main()
   {
     int x = foo(42);
     return 0;
   }

compiles, links and runs with g++ without a hitch. Many C++
implementations have type safe linkage only with respect to function
parameters, not the return value. Why? Because the implementors don't
care about safety at all. The so called ``type-safe'' linkage is
simply a side effect of supporting function overloading. Get one or
more arguments wrong, and the mangling produces a different symbol,
and so you can't link. Get only the return type wrong, and it links,
type safety be damned.

Structures don't include type enough detailed information for type
safe linkage either.

Suppose that some function takes an argument of ``struct foo''. The
definition of the struct changes, and the module is not recompiled.
This is a mismatch that won't be caught by C++ implementations. Try
this program:

   // foo1.cpp 
   struct foo {
     double y;
   };

   void foo1(foo *f)
   {
     // ...
   }

   // foo2.cpp
   struct foo {
     char *z;
   };

   void foo1(foo *f);

   void foo2(foo *f)
   {
     foo1(f);
   }

I can compile both of these modules and link them with some main() to
make one C++ program. The two incompatible definitions of ``struct
foo'' link without a hitch, and so two parts of the program have a
totally different idea about the same type!
From: Corey Murtagh
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <1078374602.267497@radsrv1.tranzpeer.net>
Kaz Kylheku wrote:

<snip>
> You really are an incredibly stupid troll.

Don't want to be seen to be encouraging trolling, but...

> Your problem is here that you don't know much about C++ *or* Lisp. I
> would not trust you to put ten lines of C++ together.
> 
> The C++ language is chock-full of undefined behaviors. From the
> addition of integers that overflows to evaluation order problems like
> a[i] = i++.

Uh... all of those 'undefined behavior' issues are documented in the 
standard, and should a programmer step into UB land then the programmer 
obviously is at fault.  A truly competent C++ programmer is well aware 
of this and doesn't go around invoking undefined behavior at every 
opportunity.

> Even type conversions produce undefined behavior, like coercing -1.0
> to the type unsigned int.

Which is why range checking type conversions is A Good Idea(tm).  Just 
like sanity checking function parameters and all those other things we do.

> The static type system is weak. It's very easy to write a C++ program
> that will compile and link to produce an executable, yet contains
> serious type mismatch errors.

Sure.  I can do that in a couple of lines of code.  I've also not 
produced that type of error in any production code that's gone out the 
door.  Nor have I yet managed to release code with a buffer overflow bug 
despite years of working with C++.

> I'm not even talking about programs that contain type casts! Here is
> an example:
> 
>    // file foo.h
>    int foo(int);
> 
>    // file foo.cpp --- missing #include "foo.h"
>    double foo(int x) { /*...*/ }

Yep, that's a stupid mistake.  Never done that one myself, not even back 
when I was a newbie.  But then I got into the habit from day one of 
correctly including module headers, etc. so it's not one I'm likely to 
come across *shrug*


Yes, C and C++ have lots of ways you can metaphorically cut your own 
head off.  Yes, a lot of inexperienced programmers have fallen into 
those traps.  Is the language to blame for that or the programmer?

Now can we PLEASE quit it with the interminable "my language is better 
than yours" rants and start acting like adults?

-- 
Corey Murtagh
The Electric Monk
"Quidquid latine dictum sit, altum viditur!"
From: Joe "Nuke Me Xemu" Foster
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <1078379507.393378@news-1.nethere.net>
"Corey Murtagh" <·····@slingshot.no.uce> wrote in message <······················@radsrv1.tranzpeer.net>...

> Yes, C and C++ have lots of ways you can metaphorically cut your own
> head off.  Yes, a lot of inexperienced programmers have fallen into
> those traps.  Is the language to blame for that or the programmer?

The language is indeed to blame, but apparently only when the language
being blamed happens to be BASIC, COBOL, ForTran, or Visual Basic. HTH

--
Joe Foster <mailto:jlfoster%40znet.com>  Sign the Check! <http://www.xenu.net/>
WARNING: I cannot be held responsible for the above        They're   coming  to
because  my cats have  apparently  learned to type.        take me away, ha ha!
From: Christian Lynbech
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87ptbv3woz.fsf@baguette.defun.dk>
>>>>> "nobody" == nobody  <··················@yahoo.com> writes:

nobody> I'll call your bluff, mister.

It is technically not really my bluff as the 10% comes from an actual
benchmark done and reported in a journal (ACM Lisp Pointers).

nobody> The best known non-stupid (real problem, any algorithm) benchmark
nobody> is probably the Coyote Gulch test. There are many languages that it
nobody> has been translated into.

I have now read the accompanying article and that clearly states that
the benchmark is designed to test real/floating-point number crunching
performance. Whether that qualifies as a general applicable "real
problem, any algorithm" kind of benchmarking is not really clear to
me, but number crunching is of course also important. The PFANNKUCH
benchmark (on which my claim was based) is explicitly an integer
manipulation benchmark.

nobody> Do you accept the challenge? (yes/no)

I am still game for the challenge. I will however not have a full
suite of compilers at my disposal. My results will of course also
depend very much on my lisp programming skills and my ability to
understand what the program actually does. 

Finally I should also add that I never intended to claim that all lisp
compilers necessarily would be able to compete with all available
C/C++/Fortran compilers. 

------------------------+-----------------------------------------------------
Christian Lynbech       | christian ··@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <877jy3vvuv.fsf@thalassa.informatimago.com>
··················@yahoo.com (nobody) writes:
> If you remove declaration statements from it, making it a "normal
> Lisp program", I predict that it will be about 100 (one hundred)
> times slower than C (10 if you are lucky, 1000 if not)

We'll remove declarations  in our lisp programs competing  with C when
YOU  remove  the delcarations  from  your  C  programs and  still  get
something aproximately functional.

Note that in C (at least  for compilers old enough), you can avoid all
declaration meaning that all variables will be of type int, as in lisp
we can avoid them all meaning that all variables are of type t.  We'll
see who is faster then.

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Thomas Lindgren
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <m3ptbual2i.fsf@localhost.localdomain>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> ··················@yahoo.com (nobody) writes:
> > If you remove declaration statements from it, making it a "normal
> > Lisp program", I predict that it will be about 100 (one hundred)
> > times slower than C (10 if you are lucky, 1000 if not)
> 
> We'll remove declarations  in our lisp programs competing  with C when
> YOU  remove  the delcarations  from  your  C  programs and  still  get
> something aproximately functional.
> 
> Note that in C (at least  for compilers old enough), you can avoid all
> declaration meaning that all variables will be of type int, as in lisp
> we can avoid them all meaning that all variables are of type t.  We'll
> see who is faster then.

Alternatively require that the code is written to comply with CODE
COMPLETE or WRITING SOLID CODE or whatever the tome du jour on writing
reliable C/C++ code may be.

In a wider context, a friend tells me the big Java-on-legacy system
he's working on spends 95% (sic) of its time in converting between the
data formats of the various "components", so perhaps this kind of
performance discussion is a bit out of touch with reality anyway.

Best,
                        Thomas
-- 
Thomas Lindgren
"It's becoming popular? It must be in decline." -- Isaiah Berlin
 
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87k722c472.fsf@thalassa.informatimago.com>
··················@yahoo.com (nobody) writes:
> I'll call your bluff, mister.
> 
> The best known non-stupid (real problem, any algorithm) benchmark
> is probably the Coyote Gulch test. There are many languages that it
> has been translated into. If you can produce (write or find) and post
> a Lisp version that is within 10% of C performance, I will admit
> that #1 is incorrect. Otherwise, you will admit that your reply is
> misleading propaganda.
> 
> Let the platform be Pentium 4, Linux 2.4.x (see #6)
> 
> Do you accept the challenge? (yes/no)
> 
> 
> 
> 
> 
> 
> I predict that the fastest Lisp version will be 2-20 times
> slower than C, and its code size will be *bigger* .

Here is a naive translation  of almabench from c++ to Common-Lisp.  As
it is, sbcl  and cmucl generate function calls  for generic arithmetic
operators.   I assume  we would  have  to add  (the double-float  ...)
everywhere to get anything of the order of C++ speed.

Actually, I'm  VERY disapointed by sbcl.  It  sometimes gives messages
such as "note: deleting unreachable  code", but here it is not capable
of  detecting   that  there  is   no  side  effect  to   the  function
almabench:main and that  all it ever does is return  0.  I expected it
to  reduce  the  program  to  (defun  main  () 0)  and  run  it  in  0
nanosecond. ;-)



;; ALMABENCH 1.0.1
;; Standard C++ version
;; 
;; A number-crunching benchmark designed for cross-language and vendor
;; comparisons.
;; 
;; Written by Scott Robert Ladd.
;; Naively translated to Common-Lisp by Pascal Bourguignon
;; 
;; No rights reserved. This is public domain software, for use by anyone.
;; 
;; This program calculates the daily ephemeris (at noon) for the years
;; 2000-2099 using an algorithm developed by J.L. Simon, P. Bretagnon, J.
;; Chapront, M. Chapront-Touze, G. Francou and J. Laskar of the Bureau des
;; Longitudes, Paris, France), as detailed in Astronomy & Astrophysics
;; 282, 663 (1994)
;; 
;; Note that the code herein is design for the purpose of testing 
;; computational performance error handling and other such "niceties"
;; is virtually non-existent.
;; 
;; Actual (and oft-updated) benchmark results can be found at:
;; http:www.coyotegulch.com
;; 
;; Please do not use this information or algorithm in any way that might
;; upset the balance of the universe or otherwise cause planets to impact
;; upon one another.

(defpackage "ALMABENCH"
  (:use "COMMON-LISP")
  (:shadow "PI" "T" "POSITION")
  (:export "MAIN"))
(in-package "ALMABENCH")


(defconstant  PI          3.14159265358979323846d0)
(defconstant  J2000      2451545.0d0)
(defconstant  JCENTURY   36525.0d0)
(defconstant  JMILLENIA  365250.0d0)
(defconstant  TWOPI      (* 2.0d0  PI))
(defconstant  A2R        (/ PI  648000.0d0))
(defconstant  R2H        (/ 12.0d0  PI))
(defconstant  R2D        (/ 180.0d0  PI))
(defconstant  GAUSSK     0.01720209895d0)

;; number of days to include in test
(defconstant TEST-LOOPS   20)
(defconstant TEST-LENGTH  36525)

;; sin and cos of j2000 mean obliquity (iau 1976)
(defconstant  sineps  0.3977771559319137d0)
(defconstant  coseps  0.9174820620691818d0)

(defconstant amas
  #( 6023600.0d0  408523.5d0  328900.5d0  3098710.0d0  1047.355d0  3498.5d0  22869.0d0  19314.0d0 ))

;; tables giving the mean keplerian elements, limited to t**2 terms:
;; a       semi-major axis (au)
;; dlm     mean longitude (degree and arcsecond)
;; e       eccentricity
;; pi      longitude of the perihelion (degree and arcsecond)
;; dinc    inclination (degree and arcsecond)
;; omega   longitude of the ascending node (degree and arcsecond)

(defconstant  a
  #2A( (   0.3870983098d0              0d0         0d0 )
       (   0.7233298200d0              0d0         0d0 )
       (   1.0000010178d0              0d0         0d0 )
       (   1.5236793419d0          3d-10        0d0 )
       (   5.2026032092d0      19132d-10  -39d-10)
       (   9.5549091915d0  -0.0000213896d0   444d-10)
       (  19.2184460618d0      -3716d-10  979d-10)
       (  30.1103868694d0     -16635d-10  686d-10)))
       
(defconstant  dlm
  #2A( ( 252.25090552d0  5381016286.88982d0   -1.92789d0 )
       ( 181.97980085d0  2106641364.33548d0    0.59381d0 )
       ( 100.46645683d0  1295977422.83429d0   -2.04411d0 )
       ( 355.43299958d0   689050774.93988d0    0.94264d0 )
       (  34.35151874d0   109256603.77991d0  -30.60378d0 )
       (  50.07744430d0    43996098.55732d0   75.61614d0 )
       ( 314.05500511d0    15424811.93933d0   -1.75083d0 )
       ( 304.34866548d0     7865503.20744d0    0.21103d0 ))) 

(defconstant  e
  #2A( (   0.2056317526d0   0.0002040653d0     -28349d-10 )
       (   0.0067719164d0  -0.0004776521d0      98127d-10 )
       (   0.0167086342d0  -0.0004203654d0  -0.0000126734d0 )
       (   0.0934006477d0   0.0009048438d0     -80641d-10 )
       (   0.0484979255d0   0.0016322542d0  -0.0000471366d0 )
       (   0.0555481426d0  -0.0034664062d0  -0.0000643639d0 )
       (   0.0463812221d0  -0.0002729293d0   0.0000078913d0 )
       (   0.0094557470d0   0.0000603263d0             0d0 )))

(defconstant  pi-a
  #2A( (  77.45611904d0   5719.11590d0    -4.83016d0 )
       ( 131.56370300d0    175.48640d0  -498.48184d0 )
       ( 102.93734808d0  11612.35290d0    53.27577d0 )
       ( 336.06023395d0  15980.45908d0   -62.32800d0 )
       (  14.33120687d0   7758.75163d0   259.95938d0 )
       (  93.05723748d0  20395.49439d0   190.25952d0 )
       ( 173.00529106d0   3215.56238d0   -34.09288d0 )
       (  48.12027554d0   1050.71912d0    27.39717d0 )))

(defconstant  dinc
  #2A( (   7.00498625d0  -214.25629d0    0.28977d0 )
       (   3.39466189d0   -30.84437d0  -11.67836d0 )
       (            0d0   469.97289d0   -3.35053d0 )
       (   1.84972648d0  -293.31722d0   -8.11830d0 )
       (   1.30326698d0   -71.55890d0   11.95297d0 )
       (   2.48887878d0    91.85195d0  -17.66225d0 )
       (   0.77319689d0   -60.72723d0    1.25759d0 )
       (   1.76995259d0     8.12333d0    0.08135d0 )))

(defconstant  omega
  #2A( (  48.33089304d0   -4515.21727d0   -31.79892d0 )
       (  76.67992019d0  -10008.48154d0   -51.32614d0 )
       ( 174.87317577d0   -8679.27034d0    15.34191d0 )
       (  49.55809321d0  -10620.90088d0  -230.57416d0 )
       ( 100.46440702d0    6362.03561d0   326.52178d0 )
       ( 113.66550252d0   -9240.19942d0   -66.23743d0 )
       (  74.00595701d0    2669.15033d0   145.93964d0 )
       ( 131.78405702d0    -221.94322d0    -0.78728d0 ))) 

;; tables for trigonometric terms to be added to the mean elements
;; of the semi-major axes.
(defconstant  kp
  #2A ( ( 69613.0d0   75645.0d0  88306.0d0  59899.0d0  15746.0d0  71087.0d0  142173.0d0   3086.0d0     0.0d0 )
        ( 21863.0d0   32794.0d0  26934.0d0  10931.0d0  26250.0d0  43725.0d0   53867.0d0  28939.0d0     0.0d0 )
        ( 16002.0d0   21863.0d0  32004.0d0  10931.0d0  14529.0d0  16368.0d0   15318.0d0  32794.0d0     0.0d0 )
        (  6345.0d0    7818.0d0  15636.0d0   7077.0d0   8184.0d0  14163.0d0    1107.0d0   4872.0d0     0.0d0 )
        (  1760.0d0    1454.0d0   1167.0d0    880.0d0    287.0d0   2640.0d0      19.0d0   2047.0d0  1454.0d0 )
        (   574.0d0       0.0d0    880.0d0    287.0d0     19.0d0   1760.0d0    1167.0d0    306.0d0   574.0d0 )
        (   204.0d0       0.0d0    177.0d0   1265.0d0      4.0d0    385.0d0     200.0d0    208.0d0   204.0d0 )
        (     0.0d0     102.0d0    106.0d0      4.0d0     98.0d0   1367.0d0     487.0d0    204.0d0     0.0d0 ))) 

(defconstant  ca
  #2A( (       4.0d0     -13.0d0     11.0d0     -9.0d0     -9.0d0     -3.0d0     -1.0d0      4.0d0     0.0d0 )
       (    -156.0d0      59.0d0    -42.0d0      6.0d0     19.0d0    -20.0d0    -10.0d0    -12.0d0     0.0d0 )
       (      64.0d0    -152.0d0     62.0d0     -8.0d0     32.0d0    -41.0d0     19.0d0    -11.0d0     0.0d0 )
       (     124.0d0     621.0d0   -145.0d0    208.0d0     54.0d0    -57.0d0     30.0d0     15.0d0     0.0d0 )
       (  -23437.0d0   -2634.0d0   6601.0d0   6259.0d0  -1507.0d0  -1821.0d0   2620.0d0  -2115.0d0 -1489.0d0 )
       (   62911.0d0 -119919.0d0  79336.0d0  17814.0d0 -24241.0d0  12068.0d0   8306.0d0  -4893.0d0  8902.0d0 )
       (  389061.0d0 -262125.0d0 -44088.0d0   8387.0d0 -22976.0d0  -2093.0d0   -615.0d0  -9720.0d0  6633.0d0 )
       ( -412235.0d0 -157046.0d0 -31430.0d0  37817.0d0  -9740.0d0    -13.0d0  -7449.0d0   9644.0d0     0.0d0 ))) 

(defconstant  sa
  #2A( (     -29.0d0     -1.0d0      9.0d0      6.0d0     -6.0d0      5.0d0      4.0d0      0.0d0     0.0d0 )
       (     -48.0d0   -125.0d0    -26.0d0    -37.0d0     18.0d0    -13.0d0    -20.0d0     -2.0d0     0.0d0 )
       (    -150.0d0    -46.0d0     68.0d0     54.0d0     14.0d0     24.0d0    -28.0d0     22.0d0     0.0d0 )
       (    -621.0d0    532.0d0   -694.0d0    -20.0d0    192.0d0    -94.0d0     71.0d0    -73.0d0     0.0d0 )
       (  -14614.0d0 -19828.0d0  -5869.0d0   1881.0d0  -4372.0d0  -2255.0d0    782.0d0    930.0d0   913.0d0 )
       (  139737.0d0      0.0d0  24667.0d0  51123.0d0  -5102.0d0   7429.0d0  -4095.0d0  -1976.0d0 -9566.0d0 )
       ( -138081.0d0      0.0d0  37205.0d0 -49039.0d0 -41901.0d0 -33872.0d0 -27037.0d0 -12474.0d0 18797.0d0 )
       (       0.0d0  28492.0d0 133236.0d0  69654.0d0  52322.0d0 -49577.0d0 -26430.0d0  -3593.0d0     0.0d0 ))) 

;; tables giving the trigonometric terms to be added to the mean
;;  elements of the mean longitudes.
(defconstant  kq
  #2A( (  3086.0d0  15746.0d0  69613.0d0  59899.0d0  75645.0d0  88306.0d0  12661.0d0  2658.0d0   0.0d0    0.0d0 )
       ( 21863.0d0  32794.0d0  10931.0d0     73.0d0   4387.0d0  26934.0d0   1473.0d0  2157.0d0   0.0d0    0.0d0 )
       (    10.0d0  16002.0d0  21863.0d0  10931.0d0   1473.0d0  32004.0d0   4387.0d0    73.0d0   0.0d0    0.0d0 )
       (    10.0d0   6345.0d0   7818.0d0   1107.0d0  15636.0d0   7077.0d0   8184.0d0   532.0d0  10.0d0    0.0d0 )
       (    19.0d0   1760.0d0   1454.0d0    287.0d0   1167.0d0    880.0d0    574.0d0  2640.0d0  19.0d0 1454.0d0 )
       (    19.0d0    574.0d0    287.0d0    306.0d0   1760.0d0     12.0d0     31.0d0    38.0d0  19.0d0  574.0d0 )
       (     4.0d0    204.0d0    177.0d0      8.0d0     31.0d0    200.0d0   1265.0d0   102.0d0   4.0d0  204.0d0 )
       (     4.0d0    102.0d0    106.0d0      8.0d0     98.0d0   1367.0d0    487.0d0   204.0d0   4.0d0  102.0d0 ))) 

(defconstant  cl
  #2A( (      21.0d0    -95.0d0  -157.0d0    41.0d0    -5.0d0    42.0d0    23.0d0    30.0d0      0.0d0     0.0d0 )
       (    -160.0d0   -313.0d0  -235.0d0    60.0d0   -74.0d0   -76.0d0   -27.0d0    34.0d0      0.0d0     0.0d0 )
       (    -325.0d0   -322.0d0   -79.0d0   232.0d0   -52.0d0    97.0d0    55.0d0   -41.0d0      0.0d0     0.0d0 )
       (    2268.0d0   -979.0d0   802.0d0   602.0d0  -668.0d0   -33.0d0   345.0d0   201.0d0    -55.0d0     0.0d0 )
       (    7610.0d0  -4997.0d0 -7689.0d0 -5841.0d0 -2617.0d0  1115.0d0  -748.0d0  -607.0d0   6074.0d0   354.0d0 )
       (  -18549.0d0  30125.0d0 20012.0d0  -730.0d0   824.0d0    23.0d0  1289.0d0  -352.0d0 -14767.0d0 -2062.0d0 )
       ( -135245.0d0 -14594.0d0  4197.0d0 -4030.0d0 -5630.0d0 -2898.0d0  2540.0d0  -306.0d0   2939.0d0  1986.0d0 )
       (   89948.0d0   2103.0d0  8963.0d0  2695.0d0  3682.0d0  1648.0d0   866.0d0  -154.0d0  -1963.0d0  -283.0d0 ))) 

(defconstant  sl
  #2A( (   -342.0d0    136.0d0   -23.0d0    62.0d0    66.0d0   -52.0d0   -33.0d0    17.0d0      0.0d0     0.0d0 )
       (    524.0d0   -149.0d0   -35.0d0   117.0d0   151.0d0   122.0d0   -71.0d0   -62.0d0      0.0d0     0.0d0 )
       (   -105.0d0   -137.0d0   258.0d0    35.0d0  -116.0d0   -88.0d0  -112.0d0   -80.0d0      0.0d0     0.0d0 )
       (    854.0d0   -205.0d0  -936.0d0  -240.0d0   140.0d0  -341.0d0   -97.0d0  -232.0d0    536.0d0     0.0d0 )
       ( -56980.0d0   8016.0d0  1012.0d0  1448.0d0 -3024.0d0 -3710.0d0   318.0d0   503.0d0   3767.0d0   577.0d0 )
       ( 138606.0d0 -13478.0d0 -4964.0d0  1441.0d0 -1319.0d0 -1482.0d0   427.0d0  1236.0d0  -9167.0d0 -1918.0d0 )
       (  71234.0d0 -41116.0d0  5334.0d0 -4935.0d0 -1848.0d0    66.0d0   434.0d0 -1748.0d0   3780.0d0  -701.0d0 )
       ( -47645.0d0  11647.0d0  2166.0d0  3194.0d0   679.0d0     0.0d0  -244.0d0  -419.0d0  -2531.0d0    48.0d0 ))) 

;;---------------------------------------------------------------------------
;; Normalize angle into the range -pi <= A < +pi.
(defun anpm (aa)
  (declare (type double-float aa))
  (let ((w (mod aa TWOPI)))
    (when (>= (abs w) pi)
      (setf w (- w (if (< aa 0.0d0) (- twopi) twopi))))
    w));;anpm



(deftype int () 'fixnum)
;;---------------------------------------------------------------------------    
;; The reference frame is equatorial and is with respect to the
;;    mean equator and equinox of epoch j2000.
(defun  planetpv (epoch np pv)
  (declare (type (simple-array double-float (2)) epoch)
           (type int np)
           (type (simple-array double-float (2 3)) pv))
  ;; working storage
  (let ((k 0)
        (t 0d0) (da 0d0) (dl 0d0) (de 0d0) (dp 0d0) (di 0d0)
        (doh 0d0) (dmu 0d0) (arga 0d0) (argl 0d0) (am 0d0)
        (ae 0d0) (dae 0d0) (ae2 0d0) (at 0d0) (r 0d0) (v 0d0)
        (si2 0d0) (xq 0d0) (xp 0d0) (tl 0d0) (xsw 0d0)
        (xcw 0d0) (xm2 0d0) (xf 0d0) (ci2 0d0) (xms 0d0)
        (xmc 0d0) (xpxq2 0d0) (x 0d0) (y 0d0) (z 0d0))
    (declare (type int k)
             (type double-float
                   t da dl de dp di doh dmu arga argl am
                   ae dae ae2 at r v si2 xq xp tl xsw
                   xcw xm2 xf ci2 xms xmc xpxq2 x y z))

    ;; time: julian millennia since j2000.
    (setf t  (/ (+ (- (aref epoch 0) J2000) (aref epoch 1))  JMILLENIA))

    ;; compute the mean elements.
    (setf da (+ (aref a np 0)  (* (+ (aref a np 1) (* (aref a np 2) t)) t))
          dl (* (+ (* 3600.0d0 (aref dlm np 0)) (* (+ (aref dlm np 1) (* (aref dlm np 2) t)) t)) A2R)
          de (+ (aref e np 0) (* (+ (aref e np 1) (* (aref e np 2) t)) t))
          dp (anpm (* (+ (* 3600.0d0 (aref pi-a np 0)) (* (+ (aref pi-a np 1) (* (aref pi-a np 2) t)) t)) A2R))
          di (* (+ (* 3600.0d0 (aref dinc np 0)) (* (+ (aref dinc np 1) (* (aref dinc np 2) t)) t)) A2R)
          doh (anpm (* (+ (* 3600.0d0 (aref omega np 0)) (* (+ (aref omega np 1) (* (aref omega np 2) t)) t)) A2R)) )
    ;; apply the trigonometric terms.
    (setf dmu  (* 0.35953620d0  t))

    (do ((k 0 (1+ k)))
        ((<= 8 k))
      (setf arga (* (aref kp np k) dmu)
            argl (* (aref kq np k) dmu))
      (incf da  (* (+ (* (aref ca np k) (cos arga)) (* (aref sa np k)  (sin arga)))  0.0000001d0))
      (incf dl  (* (+ (* (aref cl np k) (cos argl)) (* (aref sl np k)  (sin argl)))  0.0000001d0)))

    (setf arga (* (aref kp np 8) dmu))
    (incf da (* t (+ (* (aref ca np 8) (cos arga)) (* (aref sa np 8) (sin arga)))  0.0000001d0))

    (do ((k 8 (1+ k)))
        ((< 9 k))
      (setf argl (* (aref kq np k) dmu))
      (incf dl (* t (+ (* (aref cl np k) (cos argl)) (* (aref sl np k) (sin argl))) 0.0000001d0)))

    (setf dl (mod dl TWOPI))

    ;; iterative solution of kepler's equation to get eccentric anomaly.
    (setf am (- dl dp)
          ae (+ am (* de (sin am))))

    (block :loop
      (setf k 0)
      (loop
       (setf dae  (/ (- (+ am (* de (sin ae))) ae) (- 1.0d0 (* de (cos ae)))))
       (incf ae dae)
       (incf k)
       (when (or (>= k 10) (< (abs dae) 1d-12))
         (return-from :loop))))
    
    ;; true anomaly.
    (setf ae2 (/ ae  2.0d0)
          at  (* 2.0d0 (atan (* (sqrt (/ (+ 1.0d0  de) (- 1.0d0  de))) (sin ae2))  (cos ae2))))

    ;; distance (au) and speed (radians per day).
    (setf r (* da  (- 1.0d0 (* de (cos ae))))
          v (* GAUSSK (sqrt (/ (+ 1.0d0 (/ 1.0d0 (aref amas np))) (* da da da)))))

    (setf si2   (sin (/ di  2.0d0))
          xq    (* si2 (cos doh))
          xp    (* si2 (sin doh))
          tl    (+ at  dp)
          xsw   (sin tl)
          xcw   (cos tl)
          xm2   (* 2.0d0 (- (* xp xcw) (* xq xsw)))
          xf    (/ da (sqrt (- 1.0d0 (* de de))))
          ci2   (cos (/ di 2.0d0))
          xms   (* (+ (* de (sin dp)) xsw) xf)
          xmc   (* (+ (* de (cos dp)) xcw) xf)
          xpxq2 (* 2.0d0  xp xq))

    ;; position (j2000 ecliptic x,y,z in au).
    (setf x (* r  (- xcw (* xm2 xp)))
          y (* r  (+ xsw (* xm2 xq)))
          z (* r  (* (- xm2) ci2)))

    ;; rotate to equatorial.
    (setf (aref pv 0 0)  x
          (aref pv 0 1) (- (* y coseps) (* z sineps))
          (aref pv 0 2) (+ (* y sineps) (* z coseps)))

    ;; velocity (j2000 ecliptic xdot,ydot,zdot in au/d).
    (setf x (* v  (+ (* (+ -1.0d0  (* 2.0d0  xp  xp))  xms) (* xpxq2  xmc)))
          y (* v  (- (* (- 1.0d0 (* 2.0d0  xq  xq))  xmc) (* xpxq2  xms)))
          z (* v  (* 2.0d0 ci2 (+ (* xp xms) (* xq  xmc)))))

    ;; rotate to equatorial.
    (setf (aref pv 1 0) x
          (aref pv 1 1) (- (* y  coseps) (* z  sineps))
          (aref pv 1 2) (+ (* y  sineps) (* z  coseps)))
    (values)));;planetpv

;;---------------------------------------------------------------------------
;; Computes RA, Declination, and distance from a state vector returned by
;; planetpv.
(defun radecdist (state rdd)
  (declare (type (simple-array double-float (2 3)) state)
           (type (simple-array double-float (3)) rdd))
  ;; distance
  (setf (aref rdd 2) (sqrt (+ (* (aref state 0 0)  (aref state 0 0))
                              (* (aref state 0 1)  (aref state 0 1))
                              (* (aref state 0 2)  (aref state 0 2)))))

  ;; RA
  (setf (aref rdd 0) (atan (aref state 0 1) (* (aref state 0 0) R2H)))
  (when (< (aref rdd 0)  0.0d0) (incf (aref rdd 0) 24.0d0))

  ;; Declination
  (setf (aref rdd 1) (* (asin (/ (aref state 0 2) (aref rdd 2))) R2D))
  (values));;radecdist


;;---------------------------------------------------------------------------
;; Entry point
;; Calculate RA and Dec for noon on every day in 1900-2100
(defun main ()
  (let ((jd (make-array '(2) :element-type 'double-float :initial-element 0d0))
        (pv (make-array '(2 3) :element-type 'double-float :initial-element 0d0))
        (position (make-array '(3) :element-type 'double-float :initial-element 0d0)))
    (declare (type (simple-array double-float (2)) jd)
             (type (simple-array double-float (2 3)) pv)
             (type (simple-array double-float (3)) position))
    (do ((i 0 (1+ i)))
        ((>= i TEST-LOOPS))
      (declare (type int i))
      (setf (aref jd 0) J2000
            (aref jd 1) 0.0d0)
      (do ((n 0 (1+ n)))
          ((>= n TEST-LENGTH))
        (declare (type int n))
        (incf (aref jd 0) 1.0d0)
        (do ((p 0 (1+ p)))
            ((>= p 8))
          (declare (type int p))
          (planetpv jd p pv)
          (radecdist pv position) ))))
  0);;main


;;;; almabench.lisp                   -- 2004-03-02 22:33:18 -- pascal   ;;;;


-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj0r7waswhk.fsf@five-percent-nation.mit.edu>
[crossposts deleted]

A couple of comments/ideas on your code.  If these ideas are bad or
wrong, people should feel free to correct me.

1.  I think you might be well-served to make the constants into 1d
    arrays of specialized arrays of double-floats.  I don't think
    CMUCL/SBCL can specialize the 2d arrays.

2.  For making your type declarations compact, I have found the
    following to be helpful:

(defmacro make-typed-op (name function type)
  `(defmacro ,name (&rest args)
      `(the ,',type (,',function ,@(mapcar #'(lambda (a) 
                                                `(the ,',type ,a)) 
                                           args)))))

(make-typed-op df* * double-float)
(make-typed-op df+ + double-float)
(make-typed-op df- - double-float)
(make-typed-op df/ / double-float)

With these macros, you can just M-x replace + with df+, and so forth.

Cheers,

rif
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87fzcqc0av.fsf@thalassa.informatimago.com>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> ··················@yahoo.com (nobody) writes:
> > I'll call your bluff, mister.
> > 
> > The best known non-stupid (real problem, any algorithm) benchmark
> > is probably the Coyote Gulch test. There are many languages that it
> > has been translated into. If you can produce (write or find) and post
> > a Lisp version that is within 10% of C performance, I will admit
> > that #1 is incorrect. Otherwise, you will admit that your reply is
> > misleading propaganda.
> > 
> > Let the platform be Pentium 4, Linux 2.4.x (see #6)
> > 
> > Do you accept the challenge? (yes/no)
> > 
> > 
> > 
> > 
> > 
> > 
> > I predict that the fastest Lisp version will be 2-20 times
> > slower than C, and its code size will be *bigger* .
> 
> Here is a naive translation  of almabench from c++ to Common-Lisp.  As
> it is, sbcl  and cmucl generate function calls  for generic arithmetic
> operators.   I assume  we would  have  to add  (the double-float  ...)
> everywhere to get anything of the order of C++ speed.

Ok, that was not so hard after all.

Adding these macros:

(defmacro opt-op (op type)
  (let ((cl-op (intern (string op) "COMMON-LISP")))
    `(defmacro ,op (&rest args)
       (labels ((gen
                 (args)
                 (if (null (cdr args))
                   `(the ,(quote ,type) ,(car args))
                   `(the ,(quote ,type) (,(quote ,cl-op) 
                                            (the ,(quote ,type) ,(car args))
                                         ,(gen (cdr args)))))))
         (gen args)))));;opt-op

(opt-op + double-float)
(opt-op - double-float)
(opt-op * double-float)
(opt-op / double-float)


allowed me to make it run  twice as fast as c++ (on Athlon 1200 MHz):

;; SBCL:
* (time (almabench:main))
Evaluation took:
  		 135.384 seconds of real time
  		 96.65 seconds of user run time
  		 3.08 seconds of system run time
  		 0 page faults and
  		 2059355408 bytes consed.

# g++:
$ time ./almabench

real    3m0.195s
user    0m46.470s
sys     0m0.100s



--------------------------------------------------------------------------

;; ALMABENCH 1.0.1
;; Standard C++ version
;; 
;; A number-crunching benchmark designed for cross-language and vendor
;; comparisons.
;; 
;; Written by Scott Robert Ladd.
;; 
;; No rights reserved. This is public domain software, for use by anyone.
;; 
;; This program calculates the daily ephemeris (at noon) for the years
;; 2000-2099 using an algorithm developed by J.L. Simon, P. Bretagnon, J.
;; Chapront, M. Chapront-Touze, G. Francou and J. Laskar of the Bureau des
;; Longitudes, Paris, France), as detailed in Astronomy & Astrophysics
;; 282, 663 (1994)
;; 
;; Note that the code herein is design for the purpose of testing 
;; computational performance error handling and other such "niceties"
;; is virtually non-existent.
;; 
;; Actual (and oft-updated) benchmark results can be found at:
;; http:www.coyotegulch.com
;; 
;; Please do not use this information or algorithm in any way that might
;; upset the balance of the universe or otherwise cause planets to impact
;; upon one another.

(defpackage "ALMABENCH"
  (:use "COMMON-LISP")
  (:shadow "PI" "T" "+" "*" "-" "/")
  (:export "MAIN"))
(in-package "ALMABENCH")



(defmacro opt-op (op type)
  (let ((cl-op (intern (string op) "COMMON-LISP")))
    `(defmacro ,op (&rest args)
       (labels ((gen
                 (args)
                 (if (null (cdr args))
                   `(the ,(quote ,type) ,(car args))
                   `(the ,(quote ,type) (,(quote ,cl-op) 
                                            (the ,(quote ,type) ,(car args))
                                         ,(gen (cdr args)))))))
         (gen args)))));;opt-op

(opt-op + double-float)
(opt-op - double-float)
(opt-op * double-float)
(opt-op / double-float)



(defconstant  PI          3.14159265358979323846d0)
(defconstant  J2000      2451545.0d0)
(defconstant  JCENTURY   36525.0d0)
(defconstant  JMILLENIA  365250.0d0)
(defconstant  TWOPI      (* 2.0d0  PI))
(defconstant  A2R        (/ PI  648000.0d0))
(defconstant  R2H        (/ 12.0d0  PI))
(defconstant  R2D        (/ 180.0d0  PI))
(defconstant  GAUSSK     0.01720209895d0)

;; number of days to include in test
(defconstant TEST-LOOPS   20)
(defconstant TEST-LENGTH  36525)

;; sin and cos of j2000 mean obliquity (iau 1976)
(defconstant  sineps  0.3977771559319137d0)
(defconstant  coseps  0.9174820620691818d0)

(defconstant amas
  #( 6023600.0d0  408523.5d0  328900.5d0  3098710.0d0  1047.355d0  3498.5d0  22869.0d0  19314.0d0 ))

;; tables giving the mean keplerian elements, limited to t**2 terms:
;; a       semi-major axis (au)
;; dlm     mean longitude (degree and arcsecond)
;; e       eccentricity
;; pi      longitude of the perihelion (degree and arcsecond)
;; dinc    inclination (degree and arcsecond)
;; omega   longitude of the ascending node (degree and arcsecond)

(defconstant  a
  #2A( (   0.3870983098d0              0d0         0d0 )
       (   0.7233298200d0              0d0         0d0 )
       (   1.0000010178d0              0d0         0d0 )
       (   1.5236793419d0          3d-10        0d0 )
       (   5.2026032092d0      19132d-10  -39d-10)
       (   9.5549091915d0  -0.0000213896d0   444d-10)
       (  19.2184460618d0      -3716d-10  979d-10)
       (  30.1103868694d0     -16635d-10  686d-10)))
       
(defconstant  dlm
  #2A( ( 252.25090552d0  5381016286.88982d0   -1.92789d0 )
       ( 181.97980085d0  2106641364.33548d0    0.59381d0 )
       ( 100.46645683d0  1295977422.83429d0   -2.04411d0 )
       ( 355.43299958d0   689050774.93988d0    0.94264d0 )
       (  34.35151874d0   109256603.77991d0  -30.60378d0 )
       (  50.07744430d0    43996098.55732d0   75.61614d0 )
       ( 314.05500511d0    15424811.93933d0   -1.75083d0 )
       ( 304.34866548d0     7865503.20744d0    0.21103d0 ))) 

(defconstant  e
  #2A( (   0.2056317526d0   0.0002040653d0     -28349d-10 )
       (   0.0067719164d0  -0.0004776521d0      98127d-10 )
       (   0.0167086342d0  -0.0004203654d0  -0.0000126734d0 )
       (   0.0934006477d0   0.0009048438d0     -80641d-10 )
       (   0.0484979255d0   0.0016322542d0  -0.0000471366d0 )
       (   0.0555481426d0  -0.0034664062d0  -0.0000643639d0 )
       (   0.0463812221d0  -0.0002729293d0   0.0000078913d0 )
       (   0.0094557470d0   0.0000603263d0             0d0 )))

(defconstant  pi-a
  #2A( (  77.45611904d0   5719.11590d0    -4.83016d0 )
       ( 131.56370300d0    175.48640d0  -498.48184d0 )
       ( 102.93734808d0  11612.35290d0    53.27577d0 )
       ( 336.06023395d0  15980.45908d0   -62.32800d0 )
       (  14.33120687d0   7758.75163d0   259.95938d0 )
       (  93.05723748d0  20395.49439d0   190.25952d0 )
       ( 173.00529106d0   3215.56238d0   -34.09288d0 )
       (  48.12027554d0   1050.71912d0    27.39717d0 )))

(defconstant  dinc
  #2A( (   7.00498625d0  -214.25629d0    0.28977d0 )
       (   3.39466189d0   -30.84437d0  -11.67836d0 )
       (            0d0   469.97289d0   -3.35053d0 )
       (   1.84972648d0  -293.31722d0   -8.11830d0 )
       (   1.30326698d0   -71.55890d0   11.95297d0 )
       (   2.48887878d0    91.85195d0  -17.66225d0 )
       (   0.77319689d0   -60.72723d0    1.25759d0 )
       (   1.76995259d0     8.12333d0    0.08135d0 )))

(defconstant  omega
  #2A( (  48.33089304d0   -4515.21727d0   -31.79892d0 )
       (  76.67992019d0  -10008.48154d0   -51.32614d0 )
       ( 174.87317577d0   -8679.27034d0    15.34191d0 )
       (  49.55809321d0  -10620.90088d0  -230.57416d0 )
       ( 100.46440702d0    6362.03561d0   326.52178d0 )
       ( 113.66550252d0   -9240.19942d0   -66.23743d0 )
       (  74.00595701d0    2669.15033d0   145.93964d0 )
       ( 131.78405702d0    -221.94322d0    -0.78728d0 ))) 

;; tables for trigonometric terms to be added to the mean elements
;; of the semi-major axes.
(defconstant  kp
  #2A ( ( 69613.0d0   75645.0d0  88306.0d0  59899.0d0  15746.0d0  71087.0d0  142173.0d0   3086.0d0     0.0d0 )
        ( 21863.0d0   32794.0d0  26934.0d0  10931.0d0  26250.0d0  43725.0d0   53867.0d0  28939.0d0     0.0d0 )
        ( 16002.0d0   21863.0d0  32004.0d0  10931.0d0  14529.0d0  16368.0d0   15318.0d0  32794.0d0     0.0d0 )
        (  6345.0d0    7818.0d0  15636.0d0   7077.0d0   8184.0d0  14163.0d0    1107.0d0   4872.0d0     0.0d0 )
        (  1760.0d0    1454.0d0   1167.0d0    880.0d0    287.0d0   2640.0d0      19.0d0   2047.0d0  1454.0d0 )
        (   574.0d0       0.0d0    880.0d0    287.0d0     19.0d0   1760.0d0    1167.0d0    306.0d0   574.0d0 )
        (   204.0d0       0.0d0    177.0d0   1265.0d0      4.0d0    385.0d0     200.0d0    208.0d0   204.0d0 )
        (     0.0d0     102.0d0    106.0d0      4.0d0     98.0d0   1367.0d0     487.0d0    204.0d0     0.0d0 ))) 

(defconstant  ca
  #2A( (       4.0d0     -13.0d0     11.0d0     -9.0d0     -9.0d0     -3.0d0     -1.0d0      4.0d0     0.0d0 )
       (    -156.0d0      59.0d0    -42.0d0      6.0d0     19.0d0    -20.0d0    -10.0d0    -12.0d0     0.0d0 )
       (      64.0d0    -152.0d0     62.0d0     -8.0d0     32.0d0    -41.0d0     19.0d0    -11.0d0     0.0d0 )
       (     124.0d0     621.0d0   -145.0d0    208.0d0     54.0d0    -57.0d0     30.0d0     15.0d0     0.0d0 )
       (  -23437.0d0   -2634.0d0   6601.0d0   6259.0d0  -1507.0d0  -1821.0d0   2620.0d0  -2115.0d0 -1489.0d0 )
       (   62911.0d0 -119919.0d0  79336.0d0  17814.0d0 -24241.0d0  12068.0d0   8306.0d0  -4893.0d0  8902.0d0 )
       (  389061.0d0 -262125.0d0 -44088.0d0   8387.0d0 -22976.0d0  -2093.0d0   -615.0d0  -9720.0d0  6633.0d0 )
       ( -412235.0d0 -157046.0d0 -31430.0d0  37817.0d0  -9740.0d0    -13.0d0  -7449.0d0   9644.0d0     0.0d0 ))) 

(defconstant  sa
  #2A( (     -29.0d0     -1.0d0      9.0d0      6.0d0     -6.0d0      5.0d0      4.0d0      0.0d0     0.0d0 )
       (     -48.0d0   -125.0d0    -26.0d0    -37.0d0     18.0d0    -13.0d0    -20.0d0     -2.0d0     0.0d0 )
       (    -150.0d0    -46.0d0     68.0d0     54.0d0     14.0d0     24.0d0    -28.0d0     22.0d0     0.0d0 )
       (    -621.0d0    532.0d0   -694.0d0    -20.0d0    192.0d0    -94.0d0     71.0d0    -73.0d0     0.0d0 )
       (  -14614.0d0 -19828.0d0  -5869.0d0   1881.0d0  -4372.0d0  -2255.0d0    782.0d0    930.0d0   913.0d0 )
       (  139737.0d0      0.0d0  24667.0d0  51123.0d0  -5102.0d0   7429.0d0  -4095.0d0  -1976.0d0 -9566.0d0 )
       ( -138081.0d0      0.0d0  37205.0d0 -49039.0d0 -41901.0d0 -33872.0d0 -27037.0d0 -12474.0d0 18797.0d0 )
       (       0.0d0  28492.0d0 133236.0d0  69654.0d0  52322.0d0 -49577.0d0 -26430.0d0  -3593.0d0     0.0d0 ))) 

;; tables giving the trigonometric terms to be added to the mean
;;  elements of the mean longitudes.
(defconstant  kq
  #2A( (  3086.0d0  15746.0d0  69613.0d0  59899.0d0  75645.0d0  88306.0d0  12661.0d0  2658.0d0   0.0d0    0.0d0 )
       ( 21863.0d0  32794.0d0  10931.0d0     73.0d0   4387.0d0  26934.0d0   1473.0d0  2157.0d0   0.0d0    0.0d0 )
       (    10.0d0  16002.0d0  21863.0d0  10931.0d0   1473.0d0  32004.0d0   4387.0d0    73.0d0   0.0d0    0.0d0 )
       (    10.0d0   6345.0d0   7818.0d0   1107.0d0  15636.0d0   7077.0d0   8184.0d0   532.0d0  10.0d0    0.0d0 )
       (    19.0d0   1760.0d0   1454.0d0    287.0d0   1167.0d0    880.0d0    574.0d0  2640.0d0  19.0d0 1454.0d0 )
       (    19.0d0    574.0d0    287.0d0    306.0d0   1760.0d0     12.0d0     31.0d0    38.0d0  19.0d0  574.0d0 )
       (     4.0d0    204.0d0    177.0d0      8.0d0     31.0d0    200.0d0   1265.0d0   102.0d0   4.0d0  204.0d0 )
       (     4.0d0    102.0d0    106.0d0      8.0d0     98.0d0   1367.0d0    487.0d0   204.0d0   4.0d0  102.0d0 ))) 

(defconstant  cl
  #2A( (      21.0d0    -95.0d0  -157.0d0    41.0d0    -5.0d0    42.0d0    23.0d0    30.0d0      0.0d0     0.0d0 )
       (    -160.0d0   -313.0d0  -235.0d0    60.0d0   -74.0d0   -76.0d0   -27.0d0    34.0d0      0.0d0     0.0d0 )
       (    -325.0d0   -322.0d0   -79.0d0   232.0d0   -52.0d0    97.0d0    55.0d0   -41.0d0      0.0d0     0.0d0 )
       (    2268.0d0   -979.0d0   802.0d0   602.0d0  -668.0d0   -33.0d0   345.0d0   201.0d0    -55.0d0     0.0d0 )
       (    7610.0d0  -4997.0d0 -7689.0d0 -5841.0d0 -2617.0d0  1115.0d0  -748.0d0  -607.0d0   6074.0d0   354.0d0 )
       (  -18549.0d0  30125.0d0 20012.0d0  -730.0d0   824.0d0    23.0d0  1289.0d0  -352.0d0 -14767.0d0 -2062.0d0 )
       ( -135245.0d0 -14594.0d0  4197.0d0 -4030.0d0 -5630.0d0 -2898.0d0  2540.0d0  -306.0d0   2939.0d0  1986.0d0 )
       (   89948.0d0   2103.0d0  8963.0d0  2695.0d0  3682.0d0  1648.0d0   866.0d0  -154.0d0  -1963.0d0  -283.0d0 ))) 

(defconstant  sl
  #2A( (   -342.0d0    136.0d0   -23.0d0    62.0d0    66.0d0   -52.0d0   -33.0d0    17.0d0      0.0d0     0.0d0 )
       (    524.0d0   -149.0d0   -35.0d0   117.0d0   151.0d0   122.0d0   -71.0d0   -62.0d0      0.0d0     0.0d0 )
       (   -105.0d0   -137.0d0   258.0d0    35.0d0  -116.0d0   -88.0d0  -112.0d0   -80.0d0      0.0d0     0.0d0 )
       (    854.0d0   -205.0d0  -936.0d0  -240.0d0   140.0d0  -341.0d0   -97.0d0  -232.0d0    536.0d0     0.0d0 )
       ( -56980.0d0   8016.0d0  1012.0d0  1448.0d0 -3024.0d0 -3710.0d0   318.0d0   503.0d0   3767.0d0   577.0d0 )
       ( 138606.0d0 -13478.0d0 -4964.0d0  1441.0d0 -1319.0d0 -1482.0d0   427.0d0  1236.0d0  -9167.0d0 -1918.0d0 )
       (  71234.0d0 -41116.0d0  5334.0d0 -4935.0d0 -1848.0d0    66.0d0   434.0d0 -1748.0d0   3780.0d0  -701.0d0 )
       ( -47645.0d0  11647.0d0  2166.0d0  3194.0d0   679.0d0     0.0d0  -244.0d0  -419.0d0  -2531.0d0    48.0d0 ))) 

;;---------------------------------------------------------------------------
;; Normalize angle into the range -pi <= A < +pi.
(defun anpm (aa)
  (declare (type double-float aa))
  (let ((w (mod aa TWOPI)))
    (when (>= (abs w) pi)
      (setf w (- w (if (< aa 0.0d0) (- twopi) twopi))))
    w));;anpm



(deftype int () 'fixnum)
;;---------------------------------------------------------------------------    
;; The reference frame is equatorial and is with respect to the
;;    mean equator and equinox of epoch j2000.
(defun  planetpv (epoch np pv)
  (declare (type (simple-array double-float (2)) epoch)
           (type int np)
           (type (simple-array double-float (2 3)) pv))
  ;; working storage
  (let ((k 0)
        (t 0d0) (da 0d0) (dl 0d0) (de 0d0) (dp 0d0) (di 0d0)
        (doh 0d0) (dmu 0d0) (arga 0d0) (argl 0d0) (am 0d0)
        (ae 0d0) (dae 0d0) (ae2 0d0) (at 0d0) (r 0d0) (v 0d0)
        (si2 0d0) (xq 0d0) (xp 0d0) (tl 0d0) (xsw 0d0)
        (xcw 0d0) (xm2 0d0) (xf 0d0) (ci2 0d0) (xms 0d0)
        (xmc 0d0) (xpxq2 0d0) (x 0d0) (y 0d0) (z 0d0))
    (declare (type int k)
             (type double-float
                   t da dl de dp di doh dmu arga argl am
                   ae dae ae2 at r v si2 xq xp tl xsw
                   xcw xm2 xf ci2 xms xmc xpxq2 x y z))

    ;; time: julian millennia since j2000.
    (setf t  (/ (+ (- (aref epoch 0) J2000) (aref epoch 1))  JMILLENIA))

    ;; compute the mean elements.
    (setf da (+ (aref a np 0)  (* (+ (aref a np 1) (* (aref a np 2) t)) t))
          dl (* (+ (* 3600.0d0 (aref dlm np 0)) (* (+ (aref dlm np 1) (* (aref dlm np 2) t)) t)) A2R)
          de (+ (aref e np 0) (* (+ (aref e np 1) (* (aref e np 2) t)) t))
          dp (anpm (* (+ (* 3600.0d0 (aref pi-a np 0)) (* (+ (aref pi-a np 1) (* (aref pi-a np 2) t)) t)) A2R))
          di (* (+ (* 3600.0d0 (aref dinc np 0)) (* (+ (aref dinc np 1) (* (aref dinc np 2) t)) t)) A2R)
          doh (anpm (* (+ (* 3600.0d0 (aref omega np 0)) (* (+ (aref omega np 1) (* (aref omega np 2) t)) t)) A2R)) )
    ;; apply the trigonometric terms.
    (setf dmu  (* 0.35953620d0  t))

    (do ((k 0 (1+ k)))
        ((<= 8 k))
      (setf arga (* (aref kp np k) dmu)
            argl (* (aref kq np k) dmu))
      (incf da  (* (+ (* (aref ca np k) (cos arga)) (* (aref sa np k)  (sin arga)))  0.0000001d0))
      (incf dl  (* (+ (* (aref cl np k) (cos argl)) (* (aref sl np k)  (sin argl)))  0.0000001d0)))

    (setf arga (* (aref kp np 8) dmu))
    (incf da (* t (+ (* (aref ca np 8) (cos arga)) (* (aref sa np 8) (sin arga)))  0.0000001d0))

    (do ((k 8 (1+ k)))
        ((< 9 k))
      (setf argl (* (aref kq np k) dmu))
      (incf dl (* t (+ (* (aref cl np k) (cos argl)) (* (aref sl np k) (sin argl))) 0.0000001d0)))

    (setf dl (mod dl TWOPI))

    ;; iterative solution of kepler's equation to get eccentric anomaly.
    (setf am (- dl dp)
          ae (+ am (* de (sin am))))

    (block :loop
      (setf k 0)
      (loop
       (setf dae  (/ (- (+ am (* de (sin ae))) ae) (- 1.0d0 (* de (cos ae)))))
       (incf ae dae)
       (incf k)
       (when (or (>= k 10) (< (abs dae) 1d-12))
         (return-from :loop))))
    
    ;; true anomaly.
    (setf ae2 (/ ae  2.0d0)
          at  (* 2.0d0 (atan (* (sqrt (/ (+ 1.0d0  de) (- 1.0d0  de))) (sin ae2))  (cos ae2))))

    ;; distance (au) and speed (radians per day).
    (setf r (* da  (- 1.0d0 (* de (cos ae))))
          v (* GAUSSK (sqrt (/ (+ 1.0d0 (/ 1.0d0 (aref amas np))) (* da da da)))))

    (setf si2   (sin (/ di  2.0d0))
          xq    (* si2 (cos doh))
          xp    (* si2 (sin doh))
          tl    (+ at  dp)
          xsw   (sin tl)
          xcw   (cos tl)
          xm2   (* 2.0d0 (- (* xp xcw) (* xq xsw)))
          xf    (/ da (sqrt (- 1.0d0 (* de de))))
          ci2   (cos (/ di 2.0d0))
          xms   (* (+ (* de (sin dp)) xsw) xf)
          xmc   (* (+ (* de (cos dp)) xcw) xf)
          xpxq2 (* 2.0d0  xp xq))

    ;; position (j2000 ecliptic x,y,z in au).
    (setf x (* r  (- xcw (* xm2 xp)))
          y (* r  (+ xsw (* xm2 xq)))
          z (* r  (* (- xm2) ci2)))

    ;; rotate to equatorial.
    (setf (aref pv 0 0)  x
          (aref pv 0 1) (- (* y coseps) (* z sineps))
          (aref pv 0 2) (+ (* y sineps) (* z coseps)))

    ;; velocity (j2000 ecliptic xdot,ydot,zdot in au/d).
    (setf x (* v  (+ (* (+ -1.0d0  (* 2.0d0  xp  xp))  xms) (* xpxq2  xmc)))
          y (* v  (- (* (- 1.0d0 (* 2.0d0  xq  xq))  xmc) (* xpxq2  xms)))
          z (* v  (* 2.0d0 ci2 (+ (* xp xms) (* xq  xmc)))))

    ;; rotate to equatorial.
    (setf (aref pv 1 0) x
          (aref pv 1 1) (- (* y  coseps) (* z  sineps))
          (aref pv 1 2) (+ (* y  sineps) (* z  coseps)))
    (values)));;planetpv

;;---------------------------------------------------------------------------
;; Computes RA, Declination, and distance from a state vector returned by
;; planetpv.
(defun radecdist (state rdd)
  (declare (type (simple-array double-float (2 3)) state)
           (type (simple-array double-float (3)) rdd))
  ;; distance
  (setf (aref rdd 2) (sqrt (+ (* (aref state 0 0)  (aref state 0 0))
                              (* (aref state 0 1)  (aref state 0 1))
                              (* (aref state 0 2)  (aref state 0 2)))))

  ;; RA
  (setf (aref rdd 0) (atan (aref state 0 1) (* (aref state 0 0) R2H)))
  (when (< (aref rdd 0)  0.0d0) (incf (aref rdd 0) 24.0d0))

  ;; Declination
  (setf (aref rdd 1) (* (asin (/ (aref state 0 2) (aref rdd 2))) R2D))
  (values));;radecdist


;;---------------------------------------------------------------------------
;; Entry point
;; Calculate RA and Dec for noon on every day in 1900-2100
(defun main ()
  (let ((jd (make-array '(2) :element-type 'double-float :initial-element 0d0))
        (pv (make-array '(2 3) :element-type 'double-float :initial-element 0d0))
        (position (make-array '(3) :element-type 'double-float :initial-element 0d0)))
    (declare (type (simple-array double-float (2)) jd)
             (type (simple-array double-float (2 3)) pv)
             (type (simple-array double-float (3)) position))
    (do ((i 0 (1+ i)))
        ((>= i TEST-LOOPS))
      (declare (type int i))
      (setf (aref jd 0) J2000
            (aref jd 1) 0.0d0)
      (do ((n 0 (1+ n)))
          ((>= n TEST-LENGTH))
        (declare (type int n))
        (incf (aref jd 0) 1.0d0)
        (do ((p 0 (1+ p)))
            ((>= p 8))
          (declare (type int p))
          (planetpv jd p pv)
          (radecdist pv position) ))))
  0);;main


;;;; almabench.lisp                   -- 2004-03-02 23:02:02 -- pascal   ;;;;


-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj0k722kdoq.fsf@five-percent-nation.mit.edu>
 
Ummmm... do you mean half as fast as C++ or twice as fast as C++?  By
user time, it looks like C++ is twice as fast.  By real time, CL's a
little bit faster, but we don't know anything about the load on the
machine.

Of course, you ought to be able to make the CL version quite a bit
faster still --- it shouldn't need to cons very much at all if you
really want to bum it.

rif
From: Joe Marshall
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ishm6ce1.fsf@comcast.net>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> Ok, that was not so hard after all.

As is expected of lisp, changing code is easy!

> allowed me to make it run  twice as fast as c++ (on Athlon 1200 MHz):

Sweet.

-- 
~jrm
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <873c8qby7r.fsf@thalassa.informatimago.com>
Joe Marshall <·············@comcast.net> writes:

> Pascal Bourguignon <····@thalassa.informatimago.com> writes:
> 
> > Ok, that was not so hard after all.
> 
> As is expected of lisp, changing code is easy!
> 
> > allowed me to make it run  twice as fast as c++ (on Athlon 1200 MHz):
> 
> Sweet.

Oops, I meant: to make it run taking only twice the time the C++ program did.
That is, only half as fast...

    96.65 seconds for sbcl, vs. 46.470 seconds for C++


-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj065dm64qb.fsf@five-percent-nation.mit.edu>
Are you planning to hack the code much further?  If you're not, you
might email me the present version and I might take a swing at it.

Cheers,

rif
From: Joe Marshall
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <1xoa6bp2.fsf@comcast.net>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> Joe Marshall <·············@comcast.net> writes:
>
>> Pascal Bourguignon <····@thalassa.informatimago.com> writes:
>> 
>> > Ok, that was not so hard after all.
>> 
>> As is expected of lisp, changing code is easy!
>> 
>> > allowed me to make it run  twice as fast as c++ (on Athlon 1200 MHz):
>> 
>> Sweet.
>
> Oops, I meant: to make it run taking only twice the time the C++ program did.
> That is, only half as fast...

@···@!!


-- 
~jrm
From: Jeff Sandys
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <404DF6E5.513EFC16@juno.com>
Did you include the compile time in both runs?
Or are you comparing raw lisp code 
to compiled c++ code?

Thanks,
Jeff Sandys

Pascal Bourguignon wrote:
> 
> 
> Oops, I meant: to make it run taking only twice the time the C++ program did.
> That is, only half as fast...
> 
>     96.65 seconds for sbcl, vs. 46.470 seconds for C++
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87k71t21h6.fsf@thalassa.informatimago.com>
Jeff Sandys <·······@juno.com> writes:
> > Oops, I meant: to make it run taking only twice the time the C++
> > program did.  That is, only half as fast...
> > 
> >     96.65 seconds for sbcl, vs. 46.470 seconds for C++
> Did you include the compile time in both runs?
> Or are you comparing raw lisp code 
> to compiled c++ code?

That was the  first version. We've much progressed  since, and now are
faster then C++.


-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Joe Marshall
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ishdk33k.fsf@comcast.net>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> Jeff Sandys <·······@juno.com> writes:
>> > Oops, I meant: to make it run taking only twice the time the C++
>> > program did.  That is, only half as fast...
>> > 
>> >     96.65 seconds for sbcl, vs. 46.470 seconds for C++
>> Did you include the compile time in both runs?
>> Or are you comparing raw lisp code 
>> to compiled c++ code?
>
> That was the  first version. We've much progressed  since, and now are
> faster then C++.

I think we should benchmark against the CINT interpreter.


-- 
~jrm
From: Christophe Rhodes
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <sq65dmhkqf.fsf@lambda.dyndns.org>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> allowed me to make it run  twice as fast as c++ (on Athlon 1200 MHz):

Flattering though this claim is, I don't feel that the numbers you've
quoted prove your claim that your code runs twice as fast in sbcl as
the c++ code runs in gcc.

> ;; SBCL:
> * (time (almabench:main))
> Evaluation took:
>   		 135.384 seconds of real time
>   		 96.65 seconds of user run time
>
> # g++:
> $ time ./almabench
>
> real    3m0.195s
> user    0m46.470s

Why this huge discrepancy between real and user times?

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87wu62bx66.fsf@thalassa.informatimago.com>
Christophe Rhodes <·····@cam.ac.uk> writes:

> Pascal Bourguignon <····@thalassa.informatimago.com> writes:
> 
> > allowed me to make it run  twice as fast as c++ (on Athlon 1200 MHz):
> 
> Flattering though this claim is, I don't feel that the numbers you've
> quoted prove your claim that your code runs twice as fast in sbcl as
> the c++ code runs in gcc.
> 
> > ;; SBCL:
> > * (time (almabench:main))
> > Evaluation took:
> >   		 135.384 seconds of real time
> >   		 96.65 seconds of user run time
> >
> > # g++:
> > $ time ./almabench
> >
> > real    3m0.195s
> > user    0m46.470s
> 
> Why this huge discrepancy between real and user times?

Because I had  g++, cmucl, sbcl and clisp running  the same program at
the same time, not mentionning:

- aviplay /e2/movies/wean1.ulib.org/Lectures/Distinguished%20Lectures/2001/07.0%20John%20Mccarthy/4VIDEO/SCS Lecture 5-17.asf
- ····@home
- a couple of downloads,
- etc...

:-)


-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4045461A.2010000@earthlink.net>
Pascal Bourguignon wrote:
> 
> (defmacro opt-op (op type)
>   (let ((cl-op (intern (string op) "COMMON-LISP")))
>     `(defmacro ,op (&rest args)
>        (labels ((gen
>                  (args)
>                  (if (null (cdr args))
>                    `(the ,(quote ,type) ,(car args))
>                    `(the ,(quote ,type) (,(quote ,cl-op) 
>                                             (the ,(quote ,type) ,(car args))
>                                          ,(gen (cdr args)))))))
>          (gen args)))));;opt-op

I personally think this is a bad idea, especially when working with 
double-float numbers, since they're contagious.

Not wure why I'm doing this, but here are some notes on making this faster.


> (defconstant  a
>   #2A( (   0.3870983098d0              0d0         0d0 )
>        (   0.7233298200d0              0d0         0d0 )
>        (   1.0000010178d0              0d0         0d0 )
>        (   1.5236793419d0          3d-10        0d0 )
>        (   5.2026032092d0      19132d-10  -39d-10)
>        (   9.5549091915d0  -0.0000213896d0   444d-10)
>        (  19.2184460618d0      -3716d-10  979d-10)
>        (  30.1103868694d0     -16635d-10  686d-10)))

Not sure how important this is, but all of the constant arrays are 
actually arrays with element-type T.  This is pretty inefficient since 
every number is boxed up.  So you really need to convert these to 
something like

(defconstant a (make-array '(8 3) :element-type 'double-float 
:initial-contents '((...)))
> ;;---------------------------------------------------------------------------
> ;; Normalize angle into the range -pi <= A < +pi.
> (defun anpm (aa)
>   (declare (type double-float aa))
>   (let ((w (mod aa TWOPI)))
>     (when (>= (abs w) pi)
>       (setf w (- w (if (< aa 0.0d0) (- twopi) twopi))))
>     w));;anpm

I don't think this is doing what the comments say because it seems that 
(anpm -27d0) is 4.x.

If you really want A between -pi and pi, then the second value of 
ftruncate would probably be better.  At least in CMUCL, ftruncate is 
very fast.  mod is quite slow.

>     (setf ae2 (/ ae  2.0d0)
>           at  (* 2.0d0 (atan (* (sqrt (/ (+ 1.0d0  de) (- 1.0d0  de))) (sin ae2))  (cos ae2))))

If the compiler knew the arg to sqrt were non-negative, CMUCL would do a 
much better job and use the fsqrt instruction instead of calling out to 
a generic routine.
> 
>     ;; distance (au) and speed (radians per day).
>     (setf r (* da  (- 1.0d0 (* de (cos ae))))
>           v (* GAUSSK (sqrt (/ (+ 1.0d0 (/ 1.0d0 (aref amas np))) (* da da da)))))

Same comment here.
>           xf    (/ da (sqrt (- 1.0d0 (* de de))))

and here too.

>   ;; Declination
>   (setf (aref rdd 1) (* (asin (/ (aref state 0 2) (aref rdd 2))) R2D))

Similarly, if the compiler knew the arg to asin were between -1 and 1, a 
faster function could be used instead of the generic asin.

These changes should make it closer to C++.

To make it even closer, you'll have to make the 2D arrays into 1D arrays 
and access them appropriately, as rif suggests.  This is a deficiency of 
  CMUCL/SBCL which doesn't have multidimensional specialized arrays.

Compiling with safety 0 will help a bit, since that's basically what the 
C++ compiler does.

These might get you within 10% of c++.  Don't really know for sure, though.

Ray
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj01xoa63s8.fsf@five-percent-nation.mit.edu>
Raymond Toy <····@earthlink.net> writes:

> Pascal Bourguignon wrote:
> > (defmacro opt-op (op type)
> >   (let ((cl-op (intern (string op) "COMMON-LISP")))
> >     `(defmacro ,op (&rest args)
> >        (labels ((gen
> >                  (args)
> >                  (if (null (cdr args))
> >                    `(the ,(quote ,type) ,(car args))
> >                    `(the ,(quote ,type) (,(quote ,cl-op)
> > (the ,(quote ,type) ,(car args))
> >                                          ,(gen (cdr args)))))))
> >          (gen args)))));;opt-op
> 
> I personally think this is a bad idea, especially when working with
> double-float numbers, since they're contagious.

What exactly do you mean by this?  I use (a variant of) this technique
basically all the time.  What's the actual problem that we can
encounter if we do this?

rif
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <xRb1c.16501$yZ1.1647@newsread2.news.pas.earthlink.net>
rif wrote:
> Raymond Toy <····@earthlink.net> writes:
> 
> 
>>Pascal Bourguignon wrote:
>>
>>>(defmacro opt-op (op type)
>>>  (let ((cl-op (intern (string op) "COMMON-LISP")))
>>>    `(defmacro ,op (&rest args)
>>>       (labels ((gen
>>>                 (args)
>>>                 (if (null (cdr args))
>>>                   `(the ,(quote ,type) ,(car args))
>>>                   `(the ,(quote ,type) (,(quote ,cl-op)
>>>(the ,(quote ,type) ,(car args))
>>>                                         ,(gen (cdr args)))))))
>>>         (gen args)))));;opt-op
>>
>>I personally think this is a bad idea, especially when working with
>>double-float numbers, since they're contagious.
> 
> 
> What exactly do you mean by this?  I use (a variant of) this technique
> basically all the time.  What's the actual problem that we can
> encounter if we do this?
> 

I guess because double-float are already the most contagious float type, 
so everything will get turned to double-floats.  If you hide this in a 
macro, then it's easy to mess up and, say, do something like multiply a 
double-float and fixnum, but declare the fixnum as a double-float.  Not 
a good idea.

Ray
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj08yiiaakz.fsf@five-percent-nation.mit.edu>
Raymond Toy <····@earthlink.net> writes:

> rif wrote:
> > Raymond Toy <····@earthlink.net> writes:
> >
> >>Pascal Bourguignon wrote:
> >>
> >>>(defmacro opt-op (op type)
> >>>  (let ((cl-op (intern (string op) "COMMON-LISP")))
> >>>    `(defmacro ,op (&rest args)
> >>>       (labels ((gen
> >>>                 (args)
> >>>                 (if (null (cdr args))
> >>>                   `(the ,(quote ,type) ,(car args))
> >>>                   `(the ,(quote ,type) (,(quote ,cl-op)
> >>>(the ,(quote ,type) ,(car args))
> >>>                                         ,(gen (cdr args)))))))
> >>>         (gen args)))));;opt-op
> >>
> >>I personally think this is a bad idea, especially when working with
> >>double-float numbers, since they're contagious.
> > What exactly do you mean by this?  I use (a variant of) this
> > technique
> > basically all the time.  What's the actual problem that we can
> > encounter if we do this?
> >
> 
> I guess because double-float are already the most contagious float
> type, so everything will get turned to double-floats.  If you hide
> this in a macro, then it's easy to mess up and, say, do something like
> multiply a double-float and fixnum, but declare the fixnum as a
> double-float.  Not a good idea.
> 
> Ray

I somewhat get it.  Would you say that the variant I posted, where I
would use df+ to indicate "I want addition, all the inputs are
double-floats, and so is the output," to be safer because it's more
explicit?  In any case, for either variant, if you ran your program
first on a high-safety setting, the type error would get caught for
you, right?

rif
From: Christian Lynbech
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ofptbue2ln.fsf@situla.ted.dk.eu.ericsson.se>
>>>>> "Raymond" == Raymond Toy <····@earthlink.net> writes:


>> (defconstant  a
>> #2A( (   0.3870983098d0              0d0         0d0 )
>> (   0.7233298200d0              0d0         0d0 )
>> (   1.0000010178d0              0d0         0d0 )
>> (   1.5236793419d0          3d-10        0d0 )
>> (   5.2026032092d0      19132d-10  -39d-10)
>> (   9.5549091915d0  -0.0000213896d0   444d-10)
>> (  19.2184460618d0      -3716d-10  979d-10)
>> (  30.1103868694d0     -16635d-10  686d-10)))

Raymond> Not sure how important this is, but all of the constant arrays are
Raymond> actually arrays with element-type T.  This is pretty inefficient since
Raymond> every number is boxed up.  So you really need to convert these to
Raymond> something like

Raymond> (defconstant a (make-array '(8 3) :element-type 'double-float
Raymond> :initial-contents '((...)))

I came up with this for the array constants:

(setq *read-default-float-format* 'double-float)

(defmacro create-double-array (dimensions contents)
  `(make-array
    ',dimensions :element-type 'double-float :initial-contents ',contents))

(defconstant a (create-double-array (8 3)
    ( (   0.3870983098              0         0 ) 
      (   0.7233298200              0         0 ) 
      (   1.0000010178              0         0 ) 
      (   1.5236793419          3e-10         0 ) 
      (   5.2026032092      19132e-10   -39e-10 ) 
      (   9.5549091915  -0.0000213896   444e-10 ) 
      (  19.2184460618      -3716e-10   979e-10 ) 
      (  30.1103868694     -16635e-10   686e-10 ) )))
       
(defconstant dlm (create-double-array (8 3) 
    ( ( 252.25090552  5381016286.88982   -1.92789 ) 
      ( 181.97980085  2106641364.33548    0.59381 ) 
      ( 100.46645683  1295977422.83429   -2.04411 ) 
      ( 355.43299958   689050774.93988    0.94264 ) 
      (  34.35151874   109256603.77991  -30.60378 ) 
      (  50.07744430    43996098.55732   75.61614 ) 
      ( 314.05500511    15424811.93933   -1.75083 ) 
      ( 304.34866548     7865503.20744    0.21103 ) )))

(defconstant e (create-double-array (8 3)
    ( (   0.2056317526   0.0002040653     -28349e-10 ) 
      (   0.0067719164  -0.0004776521      98127e-10 ) 
      (   0.0167086342  -0.0004203654  -0.0000126734 ) 
      (   0.0934006477   0.0009048438     -80641e-10 ) 
      (   0.0484979255   0.0016322542  -0.0000471366 ) 
      (   0.0555481426  -0.0034664062  -0.0000643639 ) 
      (   0.0463812221  -0.0002729293   0.0000078913 ) 
      (   0.0094557470   0.0000603263             0  ) )))

(defconstant pi-a (create-double-array (8 3)
    ( (  77.45611904   5719.11590    -4.83016 ) 
      ( 131.56370300    175.48640  -498.48184 ) 
      ( 102.93734808  11612.35290    53.27577 ) 
      ( 336.06023395  15980.45908   -62.32800 ) 
      (  14.33120687   7758.75163   259.95938 ) 
      (  93.05723748  20395.49439   190.25952 ) 
      ( 173.00529106   3215.56238   -34.09288 ) 
      (  48.12027554   1050.71912    27.39717 ) )))

(defconstant dinc (create-double-array (8 3)
    ( (   7.00498625  -214.25629    0.28977 ) 
      (   3.39466189   -30.84437  -11.67836 ) 
      (            0   469.97289   -3.35053 ) 
      (   1.84972648  -293.31722   -8.11830 ) 
      (   1.30326698   -71.55890   11.95297 ) 
      (   2.48887878    91.85195  -17.66225 ) 
      (   0.77319689   -60.72723    1.25759 ) 
      (   1.76995259     8.12333    0.08135 ) )))

(defconstant omega (create-double-array (8 3)
    ( (  48.33089304   -4515.21727   -31.79892 ) 
      (  76.67992019  -10008.48154   -51.32614 ) 
      ( 174.87317577   -8679.27034    15.34191 ) 
      (  49.55809321  -10620.90088  -230.57416 ) 
      ( 100.46440702    6362.03561   326.52178 ) 
      ( 113.66550252   -9240.19942   -66.23743 ) 
      (  74.00595701    2669.15033   145.93964 ) 
      ( 131.78405702    -221.94322    -0.78728 ) )))

;; tables for trigonometric terms to be added to the mean elements
;; of the semi-major axes.
(defconstant kp (create-double-array (8 9)
    ( ( 69613.0   75645.0  88306.0  59899.0  15746.0  71087.0  142173.0   3086.0     0.0 ) 
      ( 21863.0   32794.0  26934.0  10931.0  26250.0  43725.0   53867.0  28939.0     0.0 ) 
      ( 16002.0   21863.0  32004.0  10931.0  14529.0  16368.0   15318.0  32794.0     0.0 ) 
      (  6345.0    7818.0  15636.0   7077.0   8184.0  14163.0    1107.0   4872.0     0.0 ) 
      (  1760.0    1454.0   1167.0    880.0    287.0   2640.0      19.0   2047.0  1454.0 ) 
      (   574.0       0.0    880.0    287.0     19.0   1760.0    1167.0    306.0   574.0 ) 
      (   204.0       0.0    177.0   1265.0      4.0    385.0     200.0    208.0   204.0 ) 
      (     0.0     102.0    106.0      4.0     98.0   1367.0     487.0    204.0     0.0 ) )))

(defconstant ca (create-double-array (8 9)
    ( (       4.0     -13.0     11.0     -9.0     -9.0     -3.0     -1.0      4.0     0.0 ) 
      (    -156.0      59.0    -42.0      6.0     19.0    -20.0    -10.0    -12.0     0.0 ) 
      (      64.0    -152.0     62.0     -8.0     32.0    -41.0     19.0    -11.0     0.0 ) 
      (     124.0     621.0   -145.0    208.0     54.0    -57.0     30.0     15.0     0.0 ) 
      (  -23437.0   -2634.0   6601.0   6259.0  -1507.0  -1821.0   2620.0  -2115.0 -1489.0 ) 
      (   62911.0 -119919.0  79336.0  17814.0 -24241.0  12068.0   8306.0  -4893.0  8902.0 ) 
      (  389061.0 -262125.0 -44088.0   8387.0 -22976.0  -2093.0   -615.0  -9720.0  6633.0 ) 
      ( -412235.0 -157046.0 -31430.0  37817.0  -9740.0    -13.0  -7449.0   9644.0     0.0 ) )))

(defconstant sa (create-double-array (8 9)
    ( (     -29.0     -1.0      9.0      6.0     -6.0      5.0      4.0      0.0     0.0 ) 
      (     -48.0   -125.0    -26.0    -37.0     18.0    -13.0    -20.0     -2.0     0.0 ) 
      (    -150.0    -46.0     68.0     54.0     14.0     24.0    -28.0     22.0     0.0 ) 
      (    -621.0    532.0   -694.0    -20.0    192.0    -94.0     71.0    -73.0     0.0 ) 
      (  -14614.0 -19828.0  -5869.0   1881.0  -4372.0  -2255.0    782.0    930.0   913.0 ) 
      (  139737.0      0.0  24667.0  51123.0  -5102.0   7429.0  -4095.0  -1976.0 -9566.0 ) 
      ( -138081.0      0.0  37205.0 -49039.0 -41901.0 -33872.0 -27037.0 -12474.0 18797.0 ) 
      (       0.0  28492.0 133236.0  69654.0  52322.0 -49577.0 -26430.0  -3593.0     0.0 ) )))

;; tables giving the trigonometric terms to be added to the mean
;; elements of the mean longitudes.
(defconstant kq (create-double-array (8 10)
    ( (  3086.0  15746.0  69613.0  59899.0  75645.0  88306.0  12661.0  2658.0   0.0    0.0 ) 
      ( 21863.0  32794.0  10931.0     73.0   4387.0  26934.0   1473.0  2157.0   0.0    0.0 ) 
      (    10.0  16002.0  21863.0  10931.0   1473.0  32004.0   4387.0    73.0   0.0    0.0 ) 
      (    10.0   6345.0   7818.0   1107.0  15636.0   7077.0   8184.0   532.0  10.0    0.0 ) 
      (    19.0   1760.0   1454.0    287.0   1167.0    880.0    574.0  2640.0  19.0 1454.0 ) 
      (    19.0    574.0    287.0    306.0   1760.0     12.0     31.0    38.0  19.0  574.0 ) 
      (     4.0    204.0    177.0      8.0     31.0    200.0   1265.0   102.0   4.0  204.0 ) 
      (     4.0    102.0    106.0      8.0     98.0   1367.0    487.0   204.0   4.0  102.0 ) )))

(defconstant cl (create-double-array (8 10)
    ( (      21.0    -95.0  -157.0    41.0    -5.0    42.0    23.0    30.0      0.0     0.0 ) 
      (    -160.0   -313.0  -235.0    60.0   -74.0   -76.0   -27.0    34.0      0.0     0.0 ) 
      (    -325.0   -322.0   -79.0   232.0   -52.0    97.0    55.0   -41.0      0.0     0.0 ) 
      (    2268.0   -979.0   802.0   602.0  -668.0   -33.0   345.0   201.0    -55.0     0.0 ) 
      (    7610.0  -4997.0 -7689.0 -5841.0 -2617.0  1115.0  -748.0  -607.0   6074.0   354.0 ) 
      (  -18549.0  30125.0 20012.0  -730.0   824.0    23.0  1289.0  -352.0 -14767.0 -2062.0 ) 
      ( -135245.0 -14594.0  4197.0 -4030.0 -5630.0 -2898.0  2540.0  -306.0   2939.0  1986.0 ) 
      (   89948.0   2103.0  8963.0  2695.0  3682.0  1648.0   866.0  -154.0  -1963.0  -283.0 ) )))

(defconstant sl (create-double-array (8 10)
    ( (   -342.0    136.0   -23.0    62.0    66.0   -52.0   -33.0    17.0      0.0     0.0 ) 
      (    524.0   -149.0   -35.0   117.0   151.0   122.0   -71.0   -62.0      0.0     0.0 ) 
      (   -105.0   -137.0   258.0    35.0  -116.0   -88.0  -112.0   -80.0      0.0     0.0 ) 
      (    854.0   -205.0  -936.0  -240.0   140.0  -341.0   -97.0  -232.0    536.0     0.0 ) 
      ( -56980.0   8016.0  1012.0  1448.0 -3024.0 -3710.0   318.0   503.0   3767.0   577.0 ) 
      ( 138606.0 -13478.0 -4964.0  1441.0 -1319.0 -1482.0   427.0  1236.0  -9167.0 -1918.0 ) 
      (  71234.0 -41116.0  5334.0 -4935.0 -1848.0    66.0   434.0 -1748.0   3780.0  -701.0 ) 
      ( -47645.0  11647.0  2166.0  3194.0   679.0     0.0  -244.0  -419.0  -2531.0    48.0 ) )))



------------------------+-----------------------------------------------------
Christian Lynbech       | christian ··@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Marco Antoniotti
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <Ddp1c.75$IJ5.52451@typhoon.nyu.edu>
It may also help to, e.g.

(declaim (type (array double-float) a)

etc etc

marco



Christian Lynbech wrote:
>>>>>>"Raymond" == Raymond Toy <····@earthlink.net> writes:
> 
> 
> 
>>>(defconstant  a
>>>#2A( (   0.3870983098d0              0d0         0d0 )
>>>(   0.7233298200d0              0d0         0d0 )
>>>(   1.0000010178d0              0d0         0d0 )
>>>(   1.5236793419d0          3d-10        0d0 )
>>>(   5.2026032092d0      19132d-10  -39d-10)
>>>(   9.5549091915d0  -0.0000213896d0   444d-10)
>>>(  19.2184460618d0      -3716d-10  979d-10)
>>>(  30.1103868694d0     -16635d-10  686d-10)))
> 
> 
> Raymond> Not sure how important this is, but all of the constant arrays are
> Raymond> actually arrays with element-type T.  This is pretty inefficient since
> Raymond> every number is boxed up.  So you really need to convert these to
> Raymond> something like
> 
> Raymond> (defconstant a (make-array '(8 3) :element-type 'double-float
> Raymond> :initial-contents '((...)))
> 
From: Nils Gösche
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ly8yiheqa5.fsf@cartan.de>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> It may also help to, e.g.
> 
> (declaim (type (array double-float) a)

Or even (declaim (type (simple-array double-float (* *)) a))

Regards,
-- 
Nils G�sche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0
From: Will Hartung
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c2597k$1ppivj$1@ID-197644.news.uni-berlin.de>
"Nils "G�sche"" <······@cartan.de> wrote in message
···················@cartan.de...
> Marco Antoniotti <·······@cs.nyu.edu> writes:
>
> > It may also help to, e.g.
> >
> > (declaim (type (array double-float) a)
>
> Or even (declaim (type (simple-array double-float (* *)) a))

Can someone summarize the differences between proclaim, declaim, and declare
for me?

Thanx!

Regards,

Will Hartung
(·····@msoft.com)
From: Matthew Danish
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <20040303194433.GM31147@mapcar.org>
On Wed, Mar 03, 2004 at 11:03:04AM -0800, Will Hartung wrote:
> Can someone summarize the differences between proclaim, declaim, and declare
> for me?

PROCLAIM is a function which processes declarations.  DECLAIM is the
same thing as (EVAL-WHEN (... all times ...) (PROCLAIM ...)).  DECLARE
is used to indicate lexically scoped declarations.

You generally would use PROCLAIM at the REPL, DECLAIM at the top-level
of a file, and DECLARE inside of forms.  DECLAIM, in a file, will only
apply to that file's contents.  PROCLAIM applies globally (after it is
called).

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."
From: Nils Gösche
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <lyr7w9d7x6.fsf@cartan.de>
Matthew Danish <·······@andrew.cmu.edu> writes:

> DECLAIM, in a file, will only apply to that file's contents.

I think this is unspecified, no?

Regards,
-- 
Nils G�sche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0
From: Christophe Rhodes
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <sqznaxfsud.fsf@lambda.dyndns.org>
Matthew Danish <·······@andrew.cmu.edu> writes:

> DECLAIM, in a file, will only apply to that file's contents.

I wouldn't depend on that.

I think what happens, in implementations which provide a compilation
environment distinguished from the evaluator's compile-time
environment, is that the declamation's compile-time effect is placed
in the compilation environment (so that subsequent compilations in the
same compilation environment pick up the optimization qualities; at
the end of the compile, the compilation environment is discarded).

Implementations which do not implement such a distinguished
compilation environment may currently disguise this by binding the
optimization policy with file scope; it's not at all clear that this
is conforming, and probably shouldn't be relied on.

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Nils Gösche
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ly4qt5epct.fsf@cartan.de>
"Will Hartung" <·····@msoft.com> writes:

> "Nils "G�sche"" <······@cartan.de> wrote in message
> ···················@cartan.de...
> > Marco Antoniotti <·······@cs.nyu.edu> writes:
> >
> > > It may also help to, e.g.
> > >
> > > (declaim (type (array double-float) a)
> >
> > Or even (declaim (type (simple-array double-float (* *)) a))
> 
> Can someone summarize the differences between proclaim, declaim, and
> declare for me?

Well, you could check the HyperSpec ;-) As a rule of thumb, never use
PROCLAIM, and use DECLAIM for global declarations, DECLARE for local
ones.

Regards,
-- 
Nils G�sche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0
From: Marco Antoniotti
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <2cx1c.77$IJ5.52773@typhoon.nyu.edu>
Will Hartung wrote:
> "Nils "G�sche"" <······@cartan.de> wrote in message
> ···················@cartan.de...
> 
>>Marco Antoniotti <·······@cs.nyu.edu> writes:
>>
>>
>>>It may also help to, e.g.
>>>
>>>(declaim (type (array double-float) a)
>>
>>Or even (declaim (type (simple-array double-float (* *)) a))
> 
> 
> Can someone summarize the differences between proclaim, declaim, and declare
> for me?

The short story is that PROCLAIM and DECLAIM have global effect. 
PROCLAIM is a function so you can say things like

	(proclaim (list 'type 'foo '(member 42)))


DECLAIM is a macro with nicer syntax and extra compilation effects. 
Although it is not suggested by the standard, you could define DECLAIM as

	(defmacro declaim (type-spec)
	   `(proclaim ',type-spec))

Note that DECLAIM was not in the first CL draft, i.e. CLtL1.

OTOH a DECLARE expression is something that can happen within a larger 
form and its effects are scoped accordingly, as in

	(defun foo (x)
	   (declare (type (member 41 -42) x))
	   (1+ x))

Again, this is the short story.  The long one is in the CLHS.

Cheers
--
Marco
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4nwu61zl6n.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "Nils" == Nils G�sche <······@cartan.de> writes:

    Nils> Marco Antoniotti <·······@cs.nyu.edu> writes:
    >> It may also help to, e.g.
    >> 
    >> (declaim (type (array double-float) a)

    Nils> Or even (declaim (type (simple-array double-float (* *)) a))

Certainly better, but I don't think it's necessary with CMUCL.  I'm
pretty sure the compiler can figure out the type of a by itself, and
it probably gets the dimensions filled in correctly, too.  :-) Allows
the code to drop, maybe, the array index checks.

Ray
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87ad2wb6nx.fsf@thalassa.informatimago.com>
Christian Lynbech <·················@ericsson.com> writes:

> >>>>> "Raymond" == Raymond Toy <····@earthlink.net> writes:
> 
> 
> >> (defconstant  a
> >> #2A( (   0.3870983098d0              0d0         0d0 )
> >> (   0.7233298200d0              0d0         0d0 )
> >> (   1.0000010178d0              0d0         0d0 )
> >> (   1.5236793419d0          3d-10        0d0 )
> >> (   5.2026032092d0      19132d-10  -39d-10)
> >> (   9.5549091915d0  -0.0000213896d0   444d-10)
> >> (  19.2184460618d0      -3716d-10  979d-10)
> >> (  30.1103868694d0     -16635d-10  686d-10)))
> 
> Raymond> Not sure how important this is, but all of the constant arrays are
> Raymond> actually arrays with element-type T.  This is pretty inefficient since
> Raymond> every number is boxed up.  So you really need to convert these to
> Raymond> something like
> 
> Raymond> (defconstant a (make-array '(8 3) :element-type 'double-float
> Raymond> :initial-contents '((...)))
> 
> I came up with this for the array constants:
> 
> (setq *read-default-float-format* 'double-float)
> 
> (defmacro create-double-array (dimensions contents)
>   `(make-array
>     ',dimensions :element-type 'double-float :initial-contents ',contents))

I was not aware that the reader did not built specialized arrays.
Replacing them with:

(defmacro unboxed-array (element-type contents)
  `(make-array  (quote ,(do ((level contents (car level))
                             (dimensions '()))
                            ((atom level) (nreverse dimensions))
                          (push (length level) dimensions)))
                :element-type (quote ,element-type)
                :initial-contents (quote ,contents)))

(defparameter  a
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (   0.3870983098d0              0d0         0d0 )
                   (   0.7233298200d0              0d0         0d0 )
                   (   1.0000010178d0              0d0         0d0 )
                   (   1.5236793419d0          3d-10        0d0 )
                   (   5.2026032092d0      19132d-10  -39d-10)
                   (   9.5549091915d0  -0.0000213896d0   444d-10)
                   (  19.2184460618d0      -3716d-10  979d-10)
                   (  30.1103868694d0     -16635d-10  686d-10))))
;; etc

leads sbcl to reduce execution time down to 63.56 seconds:

* (time (almabench:main))
Evaluation took:
  		 176.834 seconds of real time
  		 63.56 seconds of user run time
  		 2.76 seconds of system run time
  		 0 page faults and
  		 2152939496 bytes consed.

But  it still  conses  a  lot, boxing  and  unboxing double-floats  for
standard  function   arguments.   It  seems  that   using  these  (the
double-float argument) is not useful:

(defmacro sqrt (arg)
  `(the double-float (common-lisp:sqrt (the double-float ,arg))))
(defmacro asin (arg)
  `(the double-float (common-lisp:asin (the double-float ,arg))))
(defmacro mod (num div)
  `(multiple-value-bind (q r) (common-lisp:ftruncate
                               (the double-float ,num)
                               (the double-float ,div))
     (declare (ignore q))
     (the double-float r)));;mod



$ time ./almabench
time ./almabench

real	1m55.474s
user	0m43.960s
sys	0m0.030s


63.56/43.960 = 1.446; we're still 44.6% slower than C++.


-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Will Hartung
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c28j08$1j29eg$1@ID-197644.news.uni-berlin.de>
"Pascal Bourguignon" <····@thalassa.informatimago.com> wrote in message
···················@thalassa.informatimago.com...

> But  it still  conses  a  lot, boxing  and  unboxing double-floats  for
> standard  function   arguments.   It  seems  that   using  these  (the
> double-float argument) is not useful:

I've been following along at home using LW Personal, and I found the same
thing, just heaps and gobs of consing while pushing the doubles around. I
also tried what you did with the functions, to no avail.

I think part of it is what was mentioned someplace else, that our various
functions can work with complex numbers, and sqrt notably can go from
double-float to complex.

But I was getting it for simple references.

For example, for (with your opt-op macros...):

(defun xxx (state rdd)
  (declare (type (simple-array double-float (2 3)) state)
           (type (simple-array double-float (3)) rdd))
  (setf (aref rdd 2) (sqrt  (* (aref state 0 0)  (aref state 0 0))))
  (values))

When I disassemble it, I get 3 boxing operations:
      97:      FF1530360E20     call  [200E3630]       ;
SYSTEM::BOX-DOUBLE-AUX

But it's not clear to me why (I'll spare the entire disassembly listing).

This is in LW 4.1.20 (I know it's not the latest/greatest, but from the
release notes its not clear they've done much in this area on more recent
versions, so...)

When I run the full benchmark, I cons up just insane amounts of stuff.

Regards,

Will Hartung
(·····@msoft.com)
From: Joe Marshall
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ishknl1a.fsf@comcast.net>
"Will Hartung" <·····@msoft.com> writes:

> I've been following along at home using LW Personal, and I found the same
> thing, just heaps and gobs of consing while pushing the doubles around. I
> also tried what you did with the functions, to no avail.

For some reason, Lispworks doesn't seem to pack 2-dimensional float
arrays.  If you change them to one-dimensional it doesn't box anywhere
near as much stuff.

-- 
~jrm
From: Nils Gösche
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87znawdpuw.fsf@darkstar.cartan.de>
"Will Hartung" <·····@msoft.com> writes:

> For example, for (with your opt-op macros...):
> 
> (defun xxx (state rdd)
>   (declare (type (simple-array double-float (2 3)) state)
>            (type (simple-array double-float (3)) rdd))
>   (setf (aref rdd 2) (sqrt  (* (aref state 0 0)  (aref state 0 0))))
>   (values))
> 
> When I disassemble it, I get 3 boxing operations:
>       97:      FF1530360E20     call  [200E3630]       ;
> SYSTEM::BOX-DOUBLE-AUX

You can speed it up a bit:

(defun xxx (state rdd)
  (declare (type (simple-array double-float (2 3)) state)
           (type (simple-array double-float (3)) rdd)
           (optimize speed (float 0) (safety 0) (debug 0)))
  (setf (aref rdd 2) (sqrt  (* (aref state 0 0)  (aref state 0 0))))
  (values))

Regards,
-- 
Nils G�sche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID #xEEFBA4AF
From: Gareth McCaughan
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <878yig6ml9.fsf@g.mccaughan.ntlworld.com>
Pascal Bourguignon wrote:

> I was not aware that the reader did not built specialized arrays.
> Replacing them with:
...
> leads sbcl to reduce execution time down to 63.56 seconds:
...
> * (time (almabench:main))
> Evaluation took:
>   		 176.834 seconds of real time
>   		 63.56 seconds of user run time
>   		 2.76 seconds of system run time
>   		 0 page faults and
>   		 2152939496 bytes consed.
...
> $ time ./almabench
> time ./almabench
> 
> real	1m55.474s
> user	0m43.960s
> sys	0m0.030s
> 
> 
> 63.56/43.960 = 1.446; we're still 44.6% slower than C++.

For what it's worth, on my machine with CMUCL 18e and g++ 3.2.3
I get about 52.4s for C++ and about 59.2s for your Lisp version.
(My machine is a 1GHz Athlon.) I haven't tried any newer version
of g++. I have tried 2.95.4, which is sometimes faster for
number-crunching code, but in this case it was slower. I did
some experimentation looking for compiler options to get g++
to generate fast code, and ended up with
-O3 -fomit-frame-pointer -msse -mmmx -march=athlon .
(It's much faster with -ffast-math, but that's giving up
correctness for speed and I don't think that's allowed.)

So, 13% slower. More than 5s of the Lisp version's time
is spent in GC.

CMUCL's compiler notes indicate that ANPM is giving it some
pain; the (MOD DL TWOPI) calculation seems to be happening
in a suboptimal way because it doesn't know the range of
possible values for DL, and it's having to box and unbox
its argument and return value. Declaring an FTYPE for ANPM
doesn't appear to help (maybe someone more schooled in CMUCL
wizardry can explain why), but declaring it inline brings
the Lisp runtime down to 55.3 seconds, 3.7 of which are GC.

Hmm. The definition of ANPM is wrong, anyway. |fmod| in the
C standard library returns a result with the same sign as
its first argument; FMOD in CL returns a result with the
same sign as its second argument. So:

    (declaim (ftype (function (double-float) double-float) anpm))
    (declaim (inline anpm))
    (defun anpm (aa)
      (declare (type double-float aa))
      (multiple-value-bind (dummy w) (ffloor aa TWOPI)
        (declare (ignore dummy) (type double-float w))
        (if (>= w pi) (- w TWOPI) w)))

(I'm not sure whether the FTYPE declamation actually does
any good) and now the time is 51.6 seconds, including 2.8s
of GC time.

Now, here's something interesting. The result of ANPM is
guaranteed to be between -pi and +pi. Declaring its return
type as (double-float -4d0 4d0) enables CMUCL to elide
some range checks on trig functions that use values
returned from ANPM, reducing the runtime to 51.2 seconds.
Try *that* in C++. :-)

So, we're now a few percent faster than C++. Time to stop,
I think. Further improvement may well be possible.

-- 
Gareth McCaughan
.sig under construc
From: Joe Marshall
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ad2vzxoq.fsf@ccs.neu.edu>
Gareth McCaughan <················@pobox.com> writes:

> For what it's worth, on my machine with CMUCL 18e and g++ 3.2.3
> I get about 52.4s for C++ and about 59.2s for your Lisp version.

> Declaring an FTYPE for ANPM
> doesn't appear to help (maybe someone more schooled in CMUCL
> wizardry can explain why), but declaring it inline brings
> the Lisp runtime down to 55.3 seconds, 3.7 of which are GC.

> (I'm not sure whether the FTYPE declamation actually does
> any good) and now the time is 51.6 seconds, including 2.8s
> of GC time.

> Declaring its return
> type as (double-float -4d0 4d0) enables CMUCL to elide
> some range checks on trig functions that use values
> returned from ANPM, reducing the runtime to 51.2 seconds.

I get 50.16 seconds for C++, 49.21 seconds for MIT-Scheme

Our work here is done.
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <871xo7bxk8.fsf@thalassa.informatimago.com>
Gareth McCaughan <················@pobox.com> writes:
 
> Now, here's something interesting. The result of ANPM is
> guaranteed to be between -pi and +pi. Declaring its return
> type as (double-float -4d0 4d0) enables CMUCL to elide
> some range checks on trig functions that use values
> returned from ANPM, reducing the runtime to 51.2 seconds.
> Try *that* in C++. :-)
> 
> So, we're now a few percent faster than C++. Time to stop,
> I think. Further improvement may well be possible.

Good. 

Now, it would seem to be fair to propose a benchmark for the C++, java
and fortran programmers,  to see if they can match  the power of Lisp,
if they can at all do it.

For example, could they implement an "emacs"?



(Note, in  Objective-C, on NeXT, we  had an example with  a class that
was able  to take  a string containing  some Objective-C or  C source,
that would compile  it, and dynamically load it to be  run in the same
program.  The  compiler is extern  to the program, but  Objective-C is
dynamic  enough to be  able to  load and  instanciate classes  or just
methods of an existing class (in categories) to do (with sophisticated
internal  mechanisms) what  is done  simply and  efficiently  in Lisp.
That's an existence  proof.  Remains to see how  _users_ would program
and  debug stuff like  the following  in Objective-C  !?  :-)  But the
challenge  remains for  C++ and  Fortran programmers!  (I  assume Java
programmers  will  have  it  as  easy  (or  as  hard)  as  Objective-C
programmers since they've got introspection too).



(defun upcase-lisp-region (start end)
  "
DO:      From the start to end, converts to upcase all symbols.
         Does not touch string literals, comments starting with ';' and
         symbols quoted with '|' or with '\'.
"
  (interactive "*r")
  (case-lisp-region start end (function upcase-region))
  (message "Upcase LISP Done.")
  );;upcase-lisp-region


(defun case-lisp-region (start end transform)
  "
DO:      Applies transform on all subregions from start to end that are not
         a quoted character, a quote symbol, a comment (;... or #|...|#), 
         or a string.
"
  (save-excursion
    (goto-char start)
    (while (< (point) end)
      (while (and (< (point) end) (looking-at "\\([^\"#|;\\\\]\\|#[^|]\\)+"))
        (goto-char (match-end 0)))
      (funcall transform start (point))
      (cond
       ((looking-at "\\(\\\\.\\)") ;;   \x        quoted char (in symbol)
        (goto-char (match-end 0)))
       ((looking-at "\\(;.*$\\)") ;;    ;xxx      comment
        (goto-char (match-end 0)))
       ((looking-at "\\(|[^|]*|\\)") ;; |xxx|     quoted symbol
        (goto-char (match-end 0)))
       ((looking-at "\\(#|\\([^|]\\||[^#]\\)*|#\\)") ;;  #|xxx|#   comment
        (goto-char (match-end 0)))
       ((looking-at "\"\\([^\\\\\"]\\|\\\\.\\|\\\\\n\\)*\"") ;; "xx\"x" strings.
        (goto-char (match-end 0))))
      (setq start (point))
      );;while
    ) ;;save-excursion
  );;case-lisp-region


-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Christopher C. Stacy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <uoerbn4es.fsf@news.dtpq.com>
>>>>> On 05 Mar 2004 07:46:47 +0100, Pascal Bourguignon ("Pascal") writes:
 Pascal> For example, could they implement an "emacs"?

I think that's called "GNU Emacs", just to cite one popular implementation.
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87d67rafwy.fsf@thalassa.informatimago.com>
······@news.dtpq.com (Christopher C. Stacy) writes:

> >>>>> On 05 Mar 2004 07:46:47 +0100, Pascal Bourguignon ("Pascal") writes:
>  Pascal> For example, could they implement an "emacs"?
> 
> I think that's called "GNU Emacs", just to cite one popular implementation.

No. GNU Emacs has a lisp bytecode interpreter written in C, and some
editing functions hand-compiled into the C high level assembler for
optimization reason, but it's otherwise mainly written in lisp.

Anyway, the  benchmark feature  I alluded to  was the ability  for the
user  to  dynamically redefine  the  behavior  of  the application  by
editing  and  integrating new  functions  at  run-time.   This is  not
something you usually  see in programs written in  C, Fortran or Java,
at least  until they  apply Greenspun's Tenth  Rule. (That'  should be
called a Law or even a _Principle_, it sounds so fundamental!).



-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Christopher C. Stacy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ufzcnn0ib.fsf@news.dtpq.com>
>>>>> On 05 Mar 2004 08:53:17 +0100, Pascal Bourguignon ("Pascal") writes:

 Pascal> ······@news.dtpq.com (Christopher C. Stacy) writes:
 >> >>>>> On 05 Mar 2004 07:46:47 +0100, Pascal Bourguignon ("Pascal") writes:
 Pascal> For example, could they implement an "emacs"?
 >> 
 >> I think that's called "GNU Emacs", just to cite one popular implementation.

 Pascal> No. GNU Emacs has a lisp bytecode interpreter written in C,

Well, first of all, that's not correct.  GNU Emacs is much more than a
Lisp bytecode interpreter written in C.  The primary editing functions
and data structures of the editor application are implemented in C,
not Lisp.  And Emacs Lisp is not capable of implementing those things
nor itself.  Secondly, I am not sure I fully understand the point of
your challenge in such a way that this will not devolve into some
"Turing complete" argument.

 Pascal> Anyway, the  benchmark feature  I alluded to  was the ability  for the
 Pascal> user  to  dynamically redefine  the  behavior  of  the application  by
 Pascal> editing  and  integrating new  functions  at  run-time.   This is  not
 Pascal> something you usually  see in programs written in  C, Fortran or Java,
 Pascal> at least  until they  apply Greenspun's Tenth  Rule. (That'  should be
 Pascal> called a Law or even a _Principle_, it sounds so fundamental!).

This feature is common in a number of popular editors written in C.
You can use either their own C-like extension language interpreter,
or you can use an external language of your choice to compile your
extension as a DLL that speaks to the editor's extension API 
(which is a more "open" mechanism than what GNU Emacs provides).

These are commercially popular editors (that I hear about from my
non-Emacs non-Unix friends all the time).  I don't remember the names
of these products, they all sound alike to me..."Programmer's Perfect
Editor" or something.  DOS and Windows.  They also support key bindings, 
TAGS, regexps, multiple language definable syntax coloring (C,Lisp,etc.),
"electric" language modes providing arglists, completion, IDE interfaces, 
etc. etc. and also have some nice features that GNU Emacs is missing.

I mean, I'd rather write that stuff, and the extensions, in Lisp.
But you can't tell me that it's otherwise too hard to do it!
It is only epsilon more compelling than preferring to write
any arbitrary program in Lisp.  For languages that include
the ability to dynamically load code (which is a lot of them
these days) that epsilon is a very small since the extension
language could also be the implementation language.

Historically, there have been many "ersatz" (non-extensible) Emacs
clones written in C or other languages, going back to at least 1980.
But I have also seen a number of editors, written in C, that were
dynamically extensible and that even implemented Emacs (the buffers,
windows, point-mark, key rebinding, command set, etc.) I think the
first real Emacs-in-C I used was in 1986 on an early Macintosh.

And of course don't forget that the original EMACS was in TECO,
showing that you don't need Lisp at all for this kind of thing.
It's just easier, according to RMS and some other Emacs implementors.

(And if they re-implement GNU Emacs in Guile, which is Scheme, would
that count as "Lisp"?  Or do Common Lisper's also play the game of
"when it's positive we like say we're a Lisp but when we want to
distance ourselves we claim they're an Algol"?)

And didn't I hear something about an Emacs in JAVA, anyway?
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87y8qecatt.fsf@nyct.net>
······@news.dtpq.com (Christopher C. Stacy) writes:

> For languages that include
> the ability to dynamically load code (which is a lot of them
> these days) that epsilon is a very small since the extension
> language could also be the implementation language.

The ability to dynamically redefine classes and have all existing
instances migrated to the new format is more interesting. Being able to
load code that is guaranteed to crash doesn't really excite me. :)

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Christopher C. Stacy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ufzcmt1yt.fsf@news.dtpq.com>
>>>>> On Sat, 06 Mar 2004 09:24:46 -0500, Rahul Jain ("Rahul") writes:

 Rahul> ······@news.dtpq.com (Christopher C. Stacy) writes:
 >> For languages that include
 >> the ability to dynamically load code (which is a lot of them
 >> these days) that epsilon is a very small since the extension
 >> language could also be the implementation language.

 Rahul> The ability to dynamically redefine classes and have all
 Rahul> existing instances migrated

GNU Emacs does that?
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <878yiddisc.fsf@nyct.net>
······@news.dtpq.com (Christopher C. Stacy) writes:

>>>>>> On Sat, 06 Mar 2004 09:24:46 -0500, Rahul Jain ("Rahul") writes:
>
>  Rahul> ······@news.dtpq.com (Christopher C. Stacy) writes:
>  >> For languages that include
>  >> the ability to dynamically load code (which is a lot of them
>  >> these days) that epsilon is a very small since the extension
>  >> language could also be the implementation language.
>
>  Rahul> The ability to dynamically redefine classes and have all
>  Rahul> existing instances migrated
>
> GNU Emacs does that?

No. Your response was to a comment about dynamic redefinition and
proposing that as a challenge to the "other" languages.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Christopher C. Stacy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <uu111ahc9.fsf@news.dtpq.com>
>>>>> On Sat, 06 Mar 2004 11:47:31 -0500, Rahul Jain ("Rahul") writes:

 Rahul> ······@news.dtpq.com (Christopher C. Stacy) writes:
 >>>>>>> On Sat, 06 Mar 2004 09:24:46 -0500, Rahul Jain ("Rahul") writes:
 >> 
 Rahul> ······@news.dtpq.com (Christopher C. Stacy) writes:
 >> >> For languages that include
 >> >> the ability to dynamically load code (which is a lot of them
 >> >> these days) that epsilon is a very small since the extension
 >> >> language could also be the implementation language.
 >> 
 Rahul> The ability to dynamically redefine classes and have all
 Rahul> existing instances migrated
 >> 
 >> GNU Emacs does that?

 Rahul> No. Your response was to a comment about dynamic redefinition and
 Rahul> proposing that as a challenge to the "other" languages.

How nice of you to tell me what I was responding to.  
I could have sworn that I was responding to a message from Pascal
Bourguignon where he suggested benchmarking programmer efficiency,
using emacs as a specific example.  My first reply asserted that
GNU Emacs is an emacs that is largely written in C. Pascal replied
to me by clarifying that he was interested in "the user to dynamically
redefine the behavior of the application by editing and integrating
new functions at run-time."  I then gave examples of where this has
been done in several other languages specifically for his example 
of implementing emacs.

You then jumped in to advise me as to the subject of my conversation.
Is there something else you would like to say to me?
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87znatbuqk.fsf@nyct.net>
······@news.dtpq.com (Christopher C. Stacy) writes:

> How nice of you to tell me what I was responding to.  
> I could have sworn that I was responding to a message from Pascal
> Bourguignon where he suggested benchmarking programmer efficiency,
> using emacs as a specific example.  My first reply asserted that
> GNU Emacs is an emacs that is largely written in C. Pascal replied
> to me by clarifying that he was interested in "the user to dynamically
> redefine the behavior of the application by editing and integrating
> new functions at run-time."  I then gave examples of where this has
> been done in several other languages specifically for his example 
> of implementing emacs.

The fact that Pascal's request moved from a request for a specific
application to a feature of a language. I was indicating that dynamic
redefinition doesn't only apply to code, but also to data types. The
fact that GNU emacs doesn't support this is irrelevant, as the
discussion had started to veer away from that specific example anyway.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Christopher C. Stacy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <uu111r4gf.fsf@news.dtpq.com>
>>>>> On Sat, 06 Mar 2004 15:12:19 -0500, Rahul Jain ("Rahul") writes:

 Rahul> ······@news.dtpq.com (Christopher C. Stacy) writes:
 >> How nice of you to tell me what I was responding to.  
 >> I could have sworn that I was responding to a message from Pascal
 >> Bourguignon where he suggested benchmarking programmer efficiency,
 >> using emacs as a specific example.  My first reply asserted that
 >> GNU Emacs is an emacs that is largely written in C. Pascal replied
 >> to me by clarifying that he was interested in "the user to dynamically
 >> redefine the behavior of the application by editing and integrating
 >> new functions at run-time."  I then gave examples of where this has
 >> been done in several other languages specifically for his example 
 >> of implementing emacs.

 Rahul> The fact that Pascal's request moved from a request for a specific
 Rahul> application to a feature of a language. I was indicating that dynamic
 Rahul> redefinition doesn't only apply to code, but also to data types. The
 Rahul> fact that GNU emacs doesn't support this is irrelevant, as the
 Rahul> discussion had started to veer away from that specific example anyway.

If it is not relevent, why did you invade the sub-thread to talk about it?
From: Rahul Jain
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87r7w5lclm.fsf@nyct.net>
······@news.dtpq.com (Christopher C. Stacy) writes:

> If it is not relevent, why did you invade the sub-thread to talk about it?

Sigh. It is relevant to the topic at hand, not to the specific example
that was initially given and then broadened.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4neks7z25m.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "Gareth" == Gareth McCaughan <················@pobox.com> writes:

    Gareth> Hmm. The definition of ANPM is wrong, anyway. |fmod| in the
    Gareth> C standard library returns a result with the same sign as
    Gareth> its first argument; FMOD in CL returns a result with the
    Gareth> same sign as its second argument. So:

    Gareth>     (declaim (ftype (function (double-float) double-float) anpm))
    Gareth>     (declaim (inline anpm))
    Gareth>     (defun anpm (aa)
    Gareth>       (declare (type double-float aa))
    Gareth>       (multiple-value-bind (dummy w) (ffloor aa TWOPI)
    Gareth>         (declare (ignore dummy) (type double-float w))
    Gareth>         (if (>= w pi) (- w TWOPI) w)))

Won't ftruncate then do what you want?  The second result will be
negative if it's first arg is.

Ray
From: Gareth McCaughan
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <871xo66cde.fsf@g.mccaughan.ntlworld.com>
Raymond Toy wrote:

> >>>>> "Gareth" == Gareth McCaughan <················@pobox.com> writes:
> 
>     Gareth> Hmm. The definition of ANPM is wrong, anyway. |fmod| in the
>     Gareth> C standard library returns a result with the same sign as
>     Gareth> its first argument; FMOD in CL returns a result with the
>     Gareth> same sign as its second argument. So:
> 
>     Gareth>     (declaim (ftype (function (double-float) double-float) anpm))
>     Gareth>     (declaim (inline anpm))
>     Gareth>     (defun anpm (aa)
>     Gareth>       (declare (type double-float aa))
>     Gareth>       (multiple-value-bind (dummy w) (ffloor aa TWOPI)
>     Gareth>         (declare (ignore dummy) (type double-float w))
>     Gareth>         (if (>= w pi) (- w TWOPI) w)))
> 
> Won't ftruncate then do what you want?  The second result will be
> negative if it's first arg is.

No, because what (ANPM X) returns is meant to be an element
of the equivalence class of X mod 2pi that's in [-pi,pi].
FROUND, however, will do the right thing. So we now have

    (defun anpm (aa)
      (multiple-value-bind (dummy w) (fround aa TWOPI)
        w))

plus whatever declarations seem appropriate. I'm too lazy
to benchmark this, but I expect it'll shave off another 0.3
seconds or something :-).

-- 
Gareth McCaughan
.sig under construc
From: Brian Mastenbrook
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <050320041308066128%NOSPAMbmastenbNOSPAM@cs.indiana.edu>
In article <··············@g.mccaughan.ntlworld.com>, Gareth McCaughan
<················@pobox.com> wrote:

> For what it's worth, on my machine with CMUCL 18e and g++ 3.2.3
> I get about 52.4s for C++ and about 59.2s for your Lisp version.
> (My machine is a 1GHz Athlon.) I haven't tried any newer version
> of g++. I have tried 2.95.4, which is sometimes faster for
> number-crunching code, but in this case it was slower. I did
> some experimentation looking for compiler options to get g++
> to generate fast code, and ended up with
> -O3 -fomit-frame-pointer -msse -mmmx -march=athlon .
> (It's much faster with -ffast-math, but that's giving up
> correctness for speed and I don't think that's allowed.)

That's totally awesome. How does SBCL 0.8.8 compare? (There are
binaries on the SourceForge download site for this release.)

> So, we're now a few percent faster than C++. Time to stop,
> I think. Further improvement may well be possible.

Wait, before you stop: can everyone involve attempt to collect their
optimizations (maybe on a CLiki page) so there is a canonical source
file for people wishing to reproduce this for themselves? This is a
wonderful achievement here and I think it ought to be preserved.

-- 
Brian Mastenbrook
http://www.cs.indiana.edu/~bmastenb/
From: Cesar Rabak
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <404924C7.8000100@acm.org>
Brian Mastenbrook escreveu:
> In article <··············@g.mccaughan.ntlworld.com>, Gareth McCaughan
> <················@pobox.com> wrote:
> 
> 
[snipped]

> 
> 
> Wait, before you stop: can everyone involve attempt to collect their
> optimizations (maybe on a CLiki page) so there is a canonical source
> file for people wishing to reproduce this for themselves? This is a
> wonderful achievement here and I think it ought to be preserved.
> 

In the hope my canvassing helps the cause ;-) I second this initiative.

--
Cesar Rabak
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4n65djyxmz.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "Gareth" == Gareth McCaughan <················@pobox.com> writes:

    Gareth> For what it's worth, on my machine with CMUCL 18e and g++ 3.2.3
    Gareth> I get about 52.4s for C++ and about 59.2s for your Lisp version.
    Gareth> (My machine is a 1GHz Athlon.) I haven't tried any newer version

Very nice.  Could you send or post your current code?  I'm just
curious to see if there's anything else missing or if there's some
obvious deficiencies.

Of course, you realize that the OP will just say, "Look how bad Lisp
is.  It took DAYS to make it run as fast as C++.  The C++ code was
already fast."

:-(

Ray
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87vflj80i8.fsf@thalassa.informatimago.com>
Raymond Toy <···@rtp.ericsson.se> writes:

> >>>>> "Gareth" == Gareth McCaughan <················@pobox.com> writes:
> 
>     Gareth> For what it's worth, on my machine with CMUCL 18e and g++ 3.2.3
>     Gareth> I get about 52.4s for C++ and about 59.2s for your Lisp version.
>     Gareth> (My machine is a 1GHz Athlon.) I haven't tried any newer version
> 
> Very nice.  Could you send or post your current code?  I'm just
> curious to see if there's anything else missing or if there's some
> obvious deficiencies.
> 
> Of course, you realize that the OP will just say, "Look how bad Lisp
> is.  It took DAYS to make it run as fast as C++.  The C++ code was
> already fast."

Here  is  the  latest  source  and the  summary.   Let's  gather  some
statistics...   There is  still a  lot  of consing  done... but  we're
within 17% of C++ super-optimized code.


Processors
==========

    athlon-1200:
    ------------
    model name	: AMD Athlon(tm) Processor
    stepping	: 2
    cpu MHz		: 1200.094
    cache size	: 256 KB

    
    
    usparc-1200 : UltraSPARC, 1200 MHz
    -----------


user run time (seconds) 
=======================

    compiler          athlon-1200  usparc-1200
    -----------------  ------------ -----------
    g++    3.3          39.540  (1)  79.87  (3)
    g++    3.3          44.870  (2)
    -----------------  ------------ -----------
    sbcl   0.8.7        46.1  
    cmucl  18d          47.12 
    ecl    0.9         239.23   (6)
    -----------------  ------------ -----------
    gambit scheme (4)                95.900 (5)
    -----------------  ------------ -----------


    (1) g++ -O3 -march=athlon -msse -msse2 -mmmx 

    (2) g++ -O3 

    (3) g++ -O2 -mcpu=ultrasparc -mtune=ultrasparc

    (4) Gambit-C 4.0alpha8
    
    (5) gcc -fPIC -I/pkgs/gambit/include -O1 -fschedule-insns2 \
            -fno-math-errno -fno-strict-aliasing -mcpu=ultrasparc \
            -mtune=ultrasparc -Wall -W -Wno-unused -c -D___DYNAMIC \
            -D___SINGLE_HOST -o %s.o %s.c -save-temps

    (6) ecl-0.9 uses the following commands to compile it:

         gcc -g -O2 -fstrict-aliasing -Dlinux  -O \
             -I/local/languages/ecl/lib/ecl//h -w -c almabench.c \
             -o almabench.o
         gcc  -shared -o almabench.so -L/local/languages/ecl/lib/ecl/ \
              almabench.o


------------------------------------------------------------------------
;; ALMABENCH 1.0.1
;; Common-Lisp version
;; 
;; A number-crunching benchmark designed for cross-language and vendor
;; comparisons.
;; 
;; Written by Scott Robert Ladd.
;;
;; Naively translated to Common-Lisp by Pascal Bourguignon.
;;
;; Fine tuning with input from comp.lang.lisp readers:
;; Gareth McCaughan, Raymond Toy, Christian Lynbech, Will Hartung,
;; Marco Antoniotti, Nils G�sche,
;;
;; Bradley J Lucier converted it to scheme.
;;
;; (and special thanks to ··················@yahoo.com (nobody)
;;  for challenging Common-Lisp programmers to prove once more that
;;  Common-Lisp compilers can be faster than C++ compilers).
;;
;; No rights reserved. This is public domain software, for use by anyone.
;; 
;; This program calculates the daily ephemeris (at noon) for the years
;; 2000-2099 using an algorithm developed by J.L. Simon, P. Bretagnon, J.
;; Chapront, M. Chapront-Touze, G. Francou and J. Laskar of the Bureau des
;; Longitudes, Paris, France), as detailed in Astronomy & Astrophysics
;; 282, 663 (1994)
;; 
;; Note that the code herein is design for the purpose of testing 
;; computational performance error handling and other such "niceties"
;; is virtually non-existent.
;; 
;; Actual (and oft-updated) benchmark results can be found at:
;; http:www.coyotegulch.com
;; 
;; Please do not use this information or algorithm in any way that might
;; upset the balance of the universe or otherwise cause planets to impact
;; upon one another.


(defpackage "ALMABENCH"
  (:use "COMMON-LISP")
  (:shadow "+" "*" "-" "/" "SQRT" "ASIN" )
  (:export "MAIN"));;ALMABENCH
(in-package "ALMABENCH")



;;----------------------------------------------------------------------
;; Common-Lisp speed optimization

(declaim (optimize (safety 0) ;; run-time error checking            
                   (space  0) ;; both code size and run-time space  
                   (speed  3))) ;; speed of the object code           



(defmacro unboxed-array (element-type contents)
  `(make-array  (quote ,(do ((level contents (car level))
                             (dimensions '()))
                            ((atom level) (nreverse dimensions))
                          (push (length level) dimensions)))
                :element-type (quote ,element-type)
                :initial-contents (quote ,contents)));;unboxed-array



(defmacro opt-op (op type)
  (let ((cl-op (intern (string op) "COMMON-LISP")))
    `(defmacro ,op (&rest args)
       (labels ((gen
                 (args)
                 (if (null (cdr args))
                   (list 'the ',type (car args))
                   (list 'the ',type (list ',cl-op (list 'the ',type (car args))
                                           (gen (cdr args)))))))
         (gen args)))));;opt-op


(opt-op + double-float)
(opt-op - double-float)
(opt-op * double-float)
(opt-op / double-float)


(defmacro sqrt (arg)
  `(the (double-float 0d0) (common-lisp:sqrt (the (double-float 0d0) ,arg))))
(defmacro asin (arg)
  `(the double-float (common-lisp:asin (the (double-float -1d0 1d0) ,arg))))


(declaim (inline fmod))
(defun fmod (num div)
  (declare (double-float num div)
           (optimize speed (space 0)))
  (let ((q (ftruncate (/ num div))))
    (- num (* q div))));;fmod

#||
;; The above fmod seems slightly faster in sbcl !?
(defmacro fmod (num div)
  `(multiple-value-bind (q r) (common-lisp:ftruncate
                               (the double-float ,num)
                               (the double-float ,div))
     (declare (ignore q))
     (the double-float r)));;fmod
||#


;;------------------------------------------------------------------------
(deftype int () '(signed-byte 32))


(declaim (type int test-loops test-length))
(declaim (type double-float %pi j2000 jcentury jmillenia twopi a2r r2h r2d gaussk
               sineps coseps))
(declaim (type (simple-array double-float (8))   amas))
(declaim (type (simple-array double-float (8 3)) a dlm e pi-a dinc omega))
(declaim (type (simple-array double-float (8 9)) kp ca sa))
(declaim (type (simple-array double-float (8 10)) kq cl sl))


;; number of days to include in test
(defconstant TEST-LOOPS   20)
(defconstant TEST-LENGTH  36525)


(defconstant  %PI         3.14159265358979323846d0)
(defconstant  J2000      2451545.0d0)
(defconstant  JCENTURY   36525.0d0)
(defconstant  JMILLENIA  365250.0d0)
(defconstant  TWOPI      (the double-float (* 2.0d0  %PI)))
(defconstant  A2R        (the double-float (/ %PI  648000.0d0)))
(defconstant  R2H        (the double-float (/ 12.0d0  %PI)))
(defconstant  R2D        (the double-float (/ 180.0d0  %PI)))
(defconstant  GAUSSK     0.01720209895d0)

;; sin and cos of j2000 mean obliquity (iau 1976)
(defconstant  sineps  0.3977771559319137d0)
(defconstant  coseps  0.9174820620691818d0)





(defconstant amas
  (unboxed-array
   double-float
   ( 6023600.0d0  408523.5d0  328900.5d0  3098710.0d0
                  1047.355d0  3498.5d0  22869.0d0  19314.0d0 )));;amas

;; tables giving the mean keplerian elements, limited to tt**2 terms:
;; a       semi-major axis (au)
;; dlm     mean longitude (degree and arcsecond)
;; e       eccentricity
;; %pi      longitude of the perihelion (degree and arcsecond)
;; dinc    inclination (degree and arcsecond)
;; omega   longitude of the ascending node (degree and arcsecond)

(defconstant  a
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (   0.3870983098d0              0d0         0d0 )
                   (   0.7233298200d0              0d0         0d0 )
                   (   1.0000010178d0              0d0         0d0 )
                   (   1.5236793419d0          3d-10        0d0 )
                   (   5.2026032092d0      19132d-10  -39d-10)
                   (   9.5549091915d0  -0.0000213896d0   444d-10)
                   (  19.2184460618d0      -3716d-10  979d-10)
                   (  30.1103868694d0     -16635d-10  686d-10))));;a


(defconstant  dlm
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( ( 252.25090552d0  5381016286.88982d0   -1.92789d0 )
                   ( 181.97980085d0  2106641364.33548d0    0.59381d0 )
                   ( 100.46645683d0  1295977422.83429d0   -2.04411d0 )
                   ( 355.43299958d0   689050774.93988d0    0.94264d0 )
                   (  34.35151874d0   109256603.77991d0  -30.60378d0 )
                   (  50.07744430d0    43996098.55732d0   75.61614d0 )
                   ( 314.05500511d0    15424811.93933d0   -1.75083d0 )
                   ( 304.34866548d0     7865503.20744d0    0.21103d0 ))));;dlm


(defconstant  e
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (   0.2056317526d0   0.0002040653d0     -28349d-10 )
                   (   0.0067719164d0  -0.0004776521d0      98127d-10 )
                   (   0.0167086342d0  -0.0004203654d0  -0.0000126734d0 )
                   (   0.0934006477d0   0.0009048438d0     -80641d-10 )
                   (   0.0484979255d0   0.0016322542d0  -0.0000471366d0 )
                   (   0.0555481426d0  -0.0034664062d0  -0.0000643639d0 )
                   (   0.0463812221d0  -0.0002729293d0   0.0000078913d0 )
                   (   0.0094557470d0   0.0000603263d0             0d0 ))));;e


(defconstant  %pi-a
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (  77.45611904d0   5719.11590d0    -4.83016d0 )
                   ( 131.56370300d0    175.48640d0  -498.48184d0 )
                   ( 102.93734808d0  11612.35290d0    53.27577d0 )
                   ( 336.06023395d0  15980.45908d0   -62.32800d0 )
                   (  14.33120687d0   7758.75163d0   259.95938d0 )
                   (  93.05723748d0  20395.49439d0   190.25952d0 )
                   ( 173.00529106d0   3215.56238d0   -34.09288d0 )
                   (  48.12027554d0   1050.71912d0    27.39717d0 ))));;%pi-a


(defconstant  dinc
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (   7.00498625d0  -214.25629d0    0.28977d0 )
                   (   3.39466189d0   -30.84437d0  -11.67836d0 )
                   (            0d0   469.97289d0   -3.35053d0 )
                   (   1.84972648d0  -293.31722d0   -8.11830d0 )
                   (   1.30326698d0   -71.55890d0   11.95297d0 )
                   (   2.48887878d0    91.85195d0  -17.66225d0 )
                   (   0.77319689d0   -60.72723d0    1.25759d0 )
                   (   1.76995259d0     8.12333d0    0.08135d0 ))));;dinc


(defconstant  omega
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (  48.33089304d0   -4515.21727d0   -31.79892d0 )
                   (  76.67992019d0  -10008.48154d0   -51.32614d0 )
                   ( 174.87317577d0   -8679.27034d0    15.34191d0 )
                   (  49.55809321d0  -10620.90088d0  -230.57416d0 )
                   ( 100.46440702d0    6362.03561d0   326.52178d0 )
                   ( 113.66550252d0   -9240.19942d0   -66.23743d0 )
                   (  74.00595701d0    2669.15033d0   145.93964d0 )
                   ( 131.78405702d0    -221.94322d0    -0.78728d0 ))));;omega


;; tables for trigonometric terms to be added to the mean elements
;; of the semi-major axes.
(defconstant  kp
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( ( 69613.0d0   75645.0d0  88306.0d0  59899.0d0  15746.0d0  71087.0d0  142173.0d0   3086.0d0     0.0d0 )
     ( 21863.0d0   32794.0d0  26934.0d0  10931.0d0  26250.0d0  43725.0d0   53867.0d0  28939.0d0     0.0d0 )
     ( 16002.0d0   21863.0d0  32004.0d0  10931.0d0  14529.0d0  16368.0d0   15318.0d0  32794.0d0     0.0d0 )
     (  6345.0d0    7818.0d0  15636.0d0   7077.0d0   8184.0d0  14163.0d0    1107.0d0   4872.0d0     0.0d0 )
     (  1760.0d0    1454.0d0   1167.0d0    880.0d0    287.0d0   2640.0d0      19.0d0   2047.0d0  1454.0d0 )
     (   574.0d0       0.0d0    880.0d0    287.0d0     19.0d0   1760.0d0    1167.0d0    306.0d0   574.0d0 )
     (   204.0d0       0.0d0    177.0d0   1265.0d0      4.0d0    385.0d0     200.0d0    208.0d0   204.0d0 )
     (     0.0d0     102.0d0    106.0d0      4.0d0     98.0d0   1367.0d0     487.0d0    204.0d0     0.0d0 ))));;kp


(defconstant  ca
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (       4.0d0     -13.0d0     11.0d0     -9.0d0     -9.0d0     -3.0d0     -1.0d0      4.0d0     0.0d0 )
     (    -156.0d0      59.0d0    -42.0d0      6.0d0     19.0d0    -20.0d0    -10.0d0    -12.0d0     0.0d0 )
     (      64.0d0    -152.0d0     62.0d0     -8.0d0     32.0d0    -41.0d0     19.0d0    -11.0d0     0.0d0 )
     (     124.0d0     621.0d0   -145.0d0    208.0d0     54.0d0    -57.0d0     30.0d0     15.0d0     0.0d0 )
     (  -23437.0d0   -2634.0d0   6601.0d0   6259.0d0  -1507.0d0  -1821.0d0   2620.0d0  -2115.0d0 -1489.0d0 )
     (   62911.0d0 -119919.0d0  79336.0d0  17814.0d0 -24241.0d0  12068.0d0   8306.0d0  -4893.0d0  8902.0d0 )
     (  389061.0d0 -262125.0d0 -44088.0d0   8387.0d0 -22976.0d0  -2093.0d0   -615.0d0  -9720.0d0  6633.0d0 )
     ( -412235.0d0 -157046.0d0 -31430.0d0  37817.0d0  -9740.0d0    -13.0d0  -7449.0d0   9644.0d0     0.0d0 ))));;ca


(defconstant  sa
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (     -29.0d0     -1.0d0      9.0d0      6.0d0     -6.0d0      5.0d0      4.0d0      0.0d0     0.0d0 )
     (     -48.0d0   -125.0d0    -26.0d0    -37.0d0     18.0d0    -13.0d0    -20.0d0     -2.0d0     0.0d0 )
     (    -150.0d0    -46.0d0     68.0d0     54.0d0     14.0d0     24.0d0    -28.0d0     22.0d0     0.0d0 )
     (    -621.0d0    532.0d0   -694.0d0    -20.0d0    192.0d0    -94.0d0     71.0d0    -73.0d0     0.0d0 )
     (  -14614.0d0 -19828.0d0  -5869.0d0   1881.0d0  -4372.0d0  -2255.0d0    782.0d0    930.0d0   913.0d0 )
     (  139737.0d0      0.0d0  24667.0d0  51123.0d0  -5102.0d0   7429.0d0  -4095.0d0  -1976.0d0 -9566.0d0 )
     ( -138081.0d0      0.0d0  37205.0d0 -49039.0d0 -41901.0d0 -33872.0d0 -27037.0d0 -12474.0d0 18797.0d0 )
     (       0.0d0  28492.0d0 133236.0d0  69654.0d0  52322.0d0 -49577.0d0 -26430.0d0  -3593.0d0     0.0d0 ))));;sa


;; tables giving the trigonometric terms to be added to the mean
;;  elements of the mean longitudes.
(defconstant  kq
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (  3086.0d0  15746.0d0  69613.0d0  59899.0d0  75645.0d0  88306.0d0  12661.0d0  2658.0d0   0.0d0    0.0d0 )
     ( 21863.0d0  32794.0d0  10931.0d0     73.0d0   4387.0d0  26934.0d0   1473.0d0  2157.0d0   0.0d0    0.0d0 )
     (    10.0d0  16002.0d0  21863.0d0  10931.0d0   1473.0d0  32004.0d0   4387.0d0    73.0d0   0.0d0    0.0d0 )
     (    10.0d0   6345.0d0   7818.0d0   1107.0d0  15636.0d0   7077.0d0   8184.0d0   532.0d0  10.0d0    0.0d0 )
     (    19.0d0   1760.0d0   1454.0d0    287.0d0   1167.0d0    880.0d0    574.0d0  2640.0d0  19.0d0 1454.0d0 )
     (    19.0d0    574.0d0    287.0d0    306.0d0   1760.0d0     12.0d0     31.0d0    38.0d0  19.0d0  574.0d0 )
     (     4.0d0    204.0d0    177.0d0      8.0d0     31.0d0    200.0d0   1265.0d0   102.0d0   4.0d0  204.0d0 )
     (     4.0d0    102.0d0    106.0d0      8.0d0     98.0d0   1367.0d0    487.0d0   204.0d0   4.0d0  102.0d0 ))));;kq


(defconstant  cl
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (      21.0d0    -95.0d0  -157.0d0    41.0d0    -5.0d0    42.0d0    23.0d0    30.0d0      0.0d0     0.0d0 )
     (    -160.0d0   -313.0d0  -235.0d0    60.0d0   -74.0d0   -76.0d0   -27.0d0    34.0d0      0.0d0     0.0d0 )
     (    -325.0d0   -322.0d0   -79.0d0   232.0d0   -52.0d0    97.0d0    55.0d0   -41.0d0      0.0d0     0.0d0 )
     (    2268.0d0   -979.0d0   802.0d0   602.0d0  -668.0d0   -33.0d0   345.0d0   201.0d0    -55.0d0     0.0d0 )
     (    7610.0d0  -4997.0d0 -7689.0d0 -5841.0d0 -2617.0d0  1115.0d0  -748.0d0  -607.0d0   6074.0d0   354.0d0 )
     (  -18549.0d0  30125.0d0 20012.0d0  -730.0d0   824.0d0    23.0d0  1289.0d0  -352.0d0 -14767.0d0 -2062.0d0 )
     ( -135245.0d0 -14594.0d0  4197.0d0 -4030.0d0 -5630.0d0 -2898.0d0  2540.0d0  -306.0d0   2939.0d0  1986.0d0 )
     (   89948.0d0   2103.0d0  8963.0d0  2695.0d0  3682.0d0  1648.0d0   866.0d0  -154.0d0  -1963.0d0  -283.0d0 ))));;cl


(defconstant  sl
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (   -342.0d0    136.0d0   -23.0d0    62.0d0    66.0d0   -52.0d0   -33.0d0    17.0d0      0.0d0     0.0d0 )
     (    524.0d0   -149.0d0   -35.0d0   117.0d0   151.0d0   122.0d0   -71.0d0   -62.0d0      0.0d0     0.0d0 )
     (   -105.0d0   -137.0d0   258.0d0    35.0d0  -116.0d0   -88.0d0  -112.0d0   -80.0d0      0.0d0     0.0d0 )
     (    854.0d0   -205.0d0  -936.0d0  -240.0d0   140.0d0  -341.0d0   -97.0d0  -232.0d0    536.0d0     0.0d0 )
     ( -56980.0d0   8016.0d0  1012.0d0  1448.0d0 -3024.0d0 -3710.0d0   318.0d0   503.0d0   3767.0d0   577.0d0 )
     ( 138606.0d0 -13478.0d0 -4964.0d0  1441.0d0 -1319.0d0 -1482.0d0   427.0d0  1236.0d0  -9167.0d0 -1918.0d0 )
     (  71234.0d0 -41116.0d0  5334.0d0 -4935.0d0 -1848.0d0    66.0d0   434.0d0 -1748.0d0   3780.0d0  -701.0d0 )
     ( -47645.0d0  11647.0d0  2166.0d0  3194.0d0   679.0d0     0.0d0  -244.0d0  -419.0d0  -2531.0d0    48.0d0 ))));;sl

;;---------------------------------------------------------------------------
;; Normalize angle into the range -%pi <= A < +%pi.


(declaim (ftype (function (double-float) double-float) anpm))
(declaim (inline anpm))


#||
(defun anpm (aa)
  (declare (type double-float aa))
  (multiple-value-bind (dummy w) (ffloor aa TWOPI)
    (declare (ignore dummy) (type double-float w))
    (if (>= w %pi) (- w TWOPI) w)));;anpm-1
The following seems to be slightly faster on sbcl:
||#

(defun anpm (aa)
  (declare (type double-float aa))
  (let ((w (fmod aa TWOPI)))
    (when (>= (abs w) %pi)
      (setf w (- w (if (< aa 0.0d0) (- twopi) twopi))))
    (values (the double-float w))));;anpm



;;(declaim (ftype (function ((simple-array double-float (2))
;;                           int
;;                           (simple-array double-float (2 3))) double-float) planetpv))



;;---------------------------------------------------------------------------    
;; The reference frame is equatorial and is with respect to the
;;    mean equator and equinox of epoch j2000.
(defun  planetpv (epoch np pv)
  (declare (type (simple-array double-float (2)) epoch)
           (type int np)
           (type (simple-array double-float (2 3)) pv))
  ;; working storage
  (let ((k 0)
        (tt 0d0) (da 0d0) (dl 0d0) (de 0d0) (dp 0d0) (di 0d0)
        (doh 0d0) (dmu 0d0) (arga 0d0) (argl 0d0) (am 0d0)
        (ae 0d0) (dae 0d0) (ae2 0d0) (at 0d0) (r 0d0) (v 0d0)
        (si2 0d0) (xq 0d0) (xp 0d0) (tl 0d0) (xsw 0d0)
        (xcw 0d0) (xm2 0d0) (xf 0d0) (ci2 0d0) (xms 0d0)
        (xmc 0d0) (xpxq2 0d0) (x 0d0) (y 0d0) (z 0d0))
    (declare (type int k)
             (type double-float
                   tt da dl de dp di doh dmu arga argl am
                   ae dae ae2 at r v si2 xq xp tl xsw
                   xcw xm2 xf ci2 xms xmc xpxq2 x y z))

    ;; time: julian millennia since j2000.
    (setf tt  (/ (+ (- (aref epoch 0) J2000) (aref epoch 1))  JMILLENIA))

    ;; compute the mean elements.
    (setf da (+ (aref a np 0)  (* (+ (aref a np 1) (* (aref a np 2) tt)) tt))
          dl (* (+ (* 3600.0d0 (aref dlm np 0)) (* (+ (aref dlm np 1) (* (aref dlm np 2) tt)) tt)) A2R)
          de (+ (aref e np 0) (* (+ (aref e np 1) (* (aref e np 2) tt)) tt))
          dp (anpm (* (+ (* 3600.0d0 (aref %pi-a np 0)) (* (+ (aref %pi-a np 1) (* (aref %pi-a np 2) tt)) tt)) A2R))
          di (* (+ (* 3600.0d0 (aref dinc np 0)) (* (+ (aref dinc np 1) (* (aref dinc np 2) tt)) tt)) A2R)
          doh (anpm (* (+ (* 3600.0d0 (aref omega np 0)) (* (+ (aref omega np 1) (* (aref omega np 2) tt)) tt)) A2R)) )
    ;; apply the trigonometric terms.
    (setf dmu  (* 0.35953620d0  tt))

    (do ((k 0 (1+ k)))
        ((<= 8 k))
      (setf arga (* (aref kp np k) dmu)
            argl (* (aref kq np k) dmu))
      (incf da  (* (+ (* (aref ca np k) (cos arga)) (* (aref sa np k)  (sin arga)))  0.0000001d0))
      (incf dl  (* (+ (* (aref cl np k) (cos argl)) (* (aref sl np k)  (sin argl)))  0.0000001d0)))

    (setf arga (* (aref kp np 8) dmu))
    (incf da (* tt (+ (* (aref ca np 8) (cos arga)) (* (aref sa np 8) (sin arga)))  0.0000001d0))

    (do ((k 8 (1+ k)))
        ((< 9 k))
      (setf argl (* (aref kq np k) dmu))
      (incf dl (* tt (+ (* (aref cl np k) (cos argl)) (* (aref sl np k) (sin argl))) 0.0000001d0)))

    (setf dl (fmod dl TWOPI))

    ;; iterative solution of kepler's equation to get eccentric anomaly.
    (setf am (- dl dp)
          ae (+ am (* de (sin am))))

    (block :loop
      (setf k 0)
      (loop
       (setf dae  (/ (- (+ am (* de (sin ae))) ae) (- 1.0d0 (* de (cos ae)))))
       (incf ae dae)
       (incf k)
       (when (or (>= k 10) (< (abs dae) 1d-12))
         (return-from :loop))))
    
    ;; true anomaly.
    (setf ae2 (/ ae  2.0d0)
          at  (* 2.0d0 (atan (* (sqrt (/ (+ 1.0d0  de) (- 1.0d0  de))) (sin ae2))  (cos ae2))))

    ;; distance (au) and speed (radians per day).
    (setf r (* da  (- 1.0d0 (* de (cos ae))))
          v (* GAUSSK (sqrt (/ (+ 1.0d0 (/ 1.0d0 (aref amas np))) (* da da da)))))

    (setf si2   (sin (/ di  2.0d0))
          xq    (* si2 (cos doh))
          xp    (* si2 (sin doh))
          tl    (+ at  dp)
          xsw   (sin tl)
          xcw   (cos tl)
          xm2   (* 2.0d0 (- (* xp xcw) (* xq xsw)))
          xf    (/ da (sqrt (- 1.0d0 (* de de))))
          ci2   (cos (/ di 2.0d0))
          xms   (* (+ (* de (sin dp)) xsw) xf)
          xmc   (* (+ (* de (cos dp)) xcw) xf)
          xpxq2 (* 2.0d0  xp xq))

    ;; position (j2000 ecliptic x,y,z in au).
    (setf x (* r  (- xcw (* xm2 xp)))
          y (* r  (+ xsw (* xm2 xq)))
          z (* r  (* (- xm2) ci2)))

    ;; rotate to equatorial.
    (setf (aref pv 0 0)  x
          (aref pv 0 1) (- (* y coseps) (* z sineps))
          (aref pv 0 2) (+ (* y sineps) (* z coseps)))

    ;; velocity (j2000 ecliptic xdot,ydot,zdot in au/d).
    (setf x (* v  (+ (* (+ -1.0d0  (* 2.0d0  xp  xp))  xms) (* xpxq2  xmc)))
          y (* v  (- (* (- 1.0d0 (* 2.0d0  xq  xq))  xmc) (* xpxq2  xms)))
          z (* v  (* 2.0d0 ci2 (+ (* xp xms) (* xq  xmc)))))

    ;; rotate to equatorial.
    (setf (aref pv 1 0) x
          (aref pv 1 1) (- (* y  coseps) (* z  sineps))
          (aref pv 1 2) (+ (* y  sineps) (* z  coseps)))
    (values)));;planetpv

;;---------------------------------------------------------------------------
;; Computes RA, Declination, and distance from a state vector returned by
;; planetpv.
(defun radecdist (state rdd)
  (declare (type (simple-array double-float (2 3)) state)
           (type (simple-array double-float (3)) rdd))
  ;; distance
  (setf (aref rdd 2) (the double-float
                       (sqrt (+ (* (aref state 0 0)  (aref state 0 0))
                                (* (aref state 0 1)  (aref state 0 1))
                                (* (aref state 0 2)  (aref state 0 2))))))

  ;; RA
  (setf (aref rdd 0) (the double-float
                       (atan (aref state 0 1) (* (aref state 0 0) R2H))))
  (when (< (aref rdd 0)  0.0d0) (incf (aref rdd 0) 24.0d0))

  ;; Declination
  (setf (aref rdd 1) (the double-float
                       (* (asin (/ (aref state 0 2) (aref rdd 2))) R2D)))
  (values));;radecdist


;;---------------------------------------------------------------------------
;; Entry point
;; Calculate RA and Dec for noon on every day in 1900-2100
(defun main ()
  (let ((jd (make-array '(2) :element-type 'double-float :initial-element 0d0))
        (pv (make-array '(2 3) :element-type 'double-float :initial-element 0d0))
        (position (make-array '(3) :element-type 'double-float :initial-element 0d0)))
    (declare (type (simple-array double-float (2)) jd)
             (type (simple-array double-float (2 3)) pv)
             (type (simple-array double-float (3)) position))
    (do ((i 0 (1+ i)))
        ((>= i TEST-LOOPS))
      (declare (type int i))
      (setf (aref jd 0) J2000
            (aref jd 1) 0.0d0)
      (do ((n 0 (1+ n)))
          ((>= n TEST-LENGTH))
        (declare (type int n))
        (incf (aref jd 0) 1.0d0)
        (do ((p 0 (1+ p)))
            ((>= p 8))
          (declare (type int p))
          (planetpv jd p pv)
          (radecdist pv position) ))))
  0);;main


(time (almabench:main))

;;;; almabench.lisp                   -- 2004-03-05 22:04:37 -- pascal   ;;;;
------------------------------------------------------------------------

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Duane Rettig
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4vflikarh.fsf@franz.com>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> Raymond Toy <···@rtp.ericsson.se> writes:
> 
> > >>>>> "Gareth" == Gareth McCaughan <················@pobox.com> writes:
> > 
> >     Gareth> For what it's worth, on my machine with CMUCL 18e and g++ 3.2.3
> >     Gareth> I get about 52.4s for C++ and about 59.2s for your Lisp version.
> >     Gareth> (My machine is a 1GHz Athlon.) I haven't tried any newer version
> > 
> > Very nice.  Could you send or post your current code?  I'm just
> > curious to see if there's anything else missing or if there's some
> > obvious deficiencies.
> > 
> > Of course, you realize that the OP will just say, "Look how bad Lisp
> > is.  It took DAYS to make it run as fast as C++.  The C++ code was
> > already fast."
> 
> Here  is  the  latest  source  and the  summary.   Let's  gather  some
> statistics...   There is  still a  lot  of consing  done... but  we're
> within 17% of C++ super-optimized code.

Pascal,

I have currently no time to spend on benchmarks, but I took an hour
anyway to modify your code to run on Allegro CL fairly quickly.  I
only did a quick-and-dirty job, and the benchmark does indeed point out
a few areas where I can tweak our lisp to produce better results (as
well as a couple of areas which we had already found for the next
version of our product after 6.2).  I got times in the neighborhood of
37.9 Seconds on an athlong 1.4 ghz machine, with virtually no consing,
which should be a little closer to the g++ times.  However, rather
than post it here directly, I'd prefer to send it to you, so you can
run it on the same machine on which you are running all of your other
benchmarks, in order to get an apples-to-apples comparison.

I assume that your email address is not valid (due to the nature of
the name).  If you would like me to send the modified source to
you, please send me private email to which I can respond.

BTW, if you don't want the source, at least note that you misspelled
%pi-a in the declaim form (it is listed as pi-a).  A good profiler
would have caught that :-)

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj0y8qeznkm.fsf@five-percent-nation.mit.edu>
The most recent version of the code does not give the correct answers
(which I'm defining here to be the valuea given by the C++ version of
the benchmark), at least on my machine.  I can't post code right now,
because the computer I'm doing CL on isn't networked on the weekend,
but I can describe the problems.

1.  In the function radecdist, I believe that instead of:

  ;; RA 
  (setf (aref rdd 0) (the double-float
                       (atan (aref state 0 1) (* (aref state 0 0) R2H))))

you want:

(setf (aref add 0) (the double-float
                     (* (atan (aref state 0 1) (aref state 0 0)) R2H)))

2.  (Much weirder) On my machine, (a P3 1200 MhZ laptop, CMU CVS Head 2003-09-14), 

    (setf x (* r  (- xcw (* xm2 xp)))
          y (* r  (+ xsw (* xm2 xq)))
          z (* r  (* (- xm2) ci2)))

does not do the right thing.  In particular, it seems that for
whatever reason, inside this function, (- xm2) is being interpreted as
xm2, so z is getting a value that is the negative of what it should
get.  The workaround for me was to change (- xm2) in this function to
(- 0.0d0 xm2), at which point (when I also made the fix above) I got
the same numbers as the C++ version.  Is this a known problem with
that version?  I just did a new Linux install today, and that was the
newest version I could find for download...

rif
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87y8qe7flk.fsf@thalassa.informatimago.com>
rif <···@mit.edu> writes:

> The most recent version of the code does not give the correct answers
> (which I'm defining here to be the valuea given by the C++ version of
> the benchmark), at least on my machine.  I can't post code right now,
> because the computer I'm doing CL on isn't networked on the weekend,
> but I can describe the problems.
> 
> 1.  In the function radecdist, I believe that instead of:
> 
>   ;; RA 
>   (setf (aref rdd 0) (the double-float
>                        (atan (aref state 0 1) (* (aref state 0 0) R2H))))
> 
> you want:
> 
> (setf (aref add 0) (the double-float
>                      (* (atan (aref state 0 1) (aref state 0 0)) R2H)))

Indeed.
 

> 2.  (Much weirder) On my machine, (a P3 1200 MhZ laptop, CMU CVS Head 2003-09-14), 
> 
>     (setf x (* r  (- xcw (* xm2 xp)))
>           y (* r  (+ xsw (* xm2 xq)))
>           z (* r  (* (- xm2) ci2)))
> 
> does not do the right thing.  In particular, it seems that for
> whatever reason, inside this function, (- xm2) is being interpreted as
> xm2, so z is getting a value that is the negative of what it should
> get.  The workaround for me was to change (- xm2) in this function to
> (- 0.0d0 xm2), at which point (when I also made the fix above) I got
> the same numbers as the C++ version.  Is this a known problem with
> that version?  I just did a new Linux install today, and that was the
> newest version I could find for download...

Yes, that would be an oversight in my macro.  This works correctly:

(defmacro opt-op (op type)
  (let ((cl-op (intern (string op) "COMMON-LISP")))
    `(defmacro ,op (&rest args)
       (labels
         ((step-ify
           (args)
           (case (length args)
             ((1)  (list 'the ',type (list ',cl-op (car args))))
             ((2)  (list 'the ',type (list ',cl-op (first args) (second args))))
             (otherwise
              (step-ify (cons (list 'the ',type
                                    (list ',cl-op (first args) (second args)))
                              (cddr args)))))))
         (step-ify (mapcar (lambda (arg) (list 'the ',type arg)) args))))))


With these corrections, I get 60  seconds for both sbcl and cmucl, the
same time when expanding:

    (almabench::- a b c d )

as:

(THE DOUBLE-FLOAT
 (-
  (THE DOUBLE-FLOAT
   (- (THE DOUBLE-FLOAT (- (THE DOUBLE-FLOAT A) (THE DOUBLE-FLOAT B)))
    (THE DOUBLE-FLOAT C)))
  (THE DOUBLE-FLOAT D))) ;

or as:

(THE DOUBLE-FLOAT (- (THE DOUBLE-FLOAT A) (THE DOUBLE-FLOAT B)
                     (THE DOUBLE-FLOAT C) (THE DOUBLE-FLOAT D)))



I'll try again when my system'll be less loaded...

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Brian Mastenbrook
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <060320040038543147%NOSPAMbmastenbNOSPAM@cs.indiana.edu>
In article <·············@franz.com>, Duane Rettig <·····@franz.com>
wrote:

> I have currently no time to spend on benchmarks, but I took an hour
> anyway to modify your code to run on Allegro CL fairly quickly.  I
> only did a quick-and-dirty job, and the benchmark does indeed point out
> a few areas where I can tweak our lisp to produce better results (as
> well as a couple of areas which we had already found for the next
> version of our product after 6.2).  I got times in the neighborhood of
> 37.9 Seconds on an athlong 1.4 ghz machine, with virtually no consing,
> which should be a little closer to the g++ times.  However, rather
> than post it here directly, I'd prefer to send it to you, so you can
> run it on the same machine on which you are running all of your other
> benchmarks, in order to get an apples-to-apples comparison.

I'd be interested in seeing your ACL source, as the version that
Raymond Toy and myself have come up with does not fare very well at all
under ACL or LispWorks. But, on the other hand, CMUCL and SBCL reliably
beat GCC on the P4! Nobody's made the mistake of choosing the platform
on which we seem to score best. Nobody should have picked MIPS.

The version of the benchmark that I have been running is at
http://www.cs.indiana.edu/~bmastenb/misc/almabench.lisp . It compiles
without (notable) notes or style-warnings on CMUCL and SBCL. This
example also does a wonderful job of showing where we need an
instruction scheduler, as it would improve the PowerPC results a lot
(at least according to Shark).

SBCL 0.8.8 on a P4 1.6GHz:

Evaluation took:
                 50.213 seconds of real time
                 50.18 seconds of user run time
                 0.02 seconds of system run time
                 0 page faults and
                 0 bytes consed.

gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7)
CPPFLAGS = -march=pentium4 -mcpu=pentium4 -O3 -msse -msse2 -mmmx
./almabench  52.97s user 0.02s system 99% cpu 53.325 total



SBCL 0.8.8 on a G5 2.0GHz:

Evaluation took:
                 42.564 seconds of real time
                 41.97 seconds of user run time
                 0.07 seconds of system run time
                 0 page faults and
                 232 bytes consed.
                 
CMU Common Lisp darwin-exp 2004-01-28-020, running on distral
;   37.67 seconds of real time
;   37.33 seconds of user run time
;   0.11 seconds of system run time
;   0 CPU cycles
;   0 page faults and
;   152 bytes consed.

Note: CMU numbers are suspect due to known floating point bugs in the
PPC backend, until all changes from SBCL have been merged.

LispWorks 4.3 Personal - note, (optimize (float 0)) was used in
addition:

Timing the evaluation of (ALMABENCH:MAIN)

user time    =    227.610
system time  =      0.580
Elapsed time =   0:03:51
Allocation   = 40182207768 bytes
0 Page faults
Calls to %EVAL    31

gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1495)
CPPFLAGS = -fast
./almabench  23.58s user 0.08s system 98% cpu 24.111 total


SBCL 0.8.8.3 on an Athlon 2100+

Evaluation took:
                 29.306 seconds of real time
                 27.49582 seconds of user run time
                 0.035994 seconds of system run time
                 0 page faults and
                 0 bytes consed.

gcc (GCC) 3.3.3 (Debian)
CPPFLAGS = -march=athlon -mcpu=athlon -O3 -msse -mmmx

real    0m27.278s
user    0m26.156s
sys     0m0.032s


450 MHz Ultrasparc II
CMUCL CVS with local mods on 450 MHz Ultrasparc II

; Evaluation took:
;   271.37 seconds of real time
;   268.23 seconds of user run time
;   0.0 seconds of system run time
;   122,107,201,896 CPU cycles
;   0 page faults and
;   160 bytes consed.

gcc 3.3 (Solaris 8) Ultrasparc II
gcc -O6 -mcpu=ultrasparc almabench.cpp -o almabench -lm
./almabench     2m47.081s    2m45.190s     0m0.000s


866 MHz P3:

CVS CVS with local mods (different from solaris) on a 866 MHz P3

; Evaluation took:
;   56.99 seconds of real time
;   53.283 seconds of user run time
;   0.051 seconds of system run time
;   49,226,947,713 CPU cycles
;   0 page faults and
;   152 bytes consed.

SBCL 0.8.8

Evaluation took:
                 62.872 seconds of real time
                 56.034 seconds of user run time
                 0.043 seconds of system run time
                 0 page faults and
                 0 bytes consed.

gcc 3.3.1 (SuSE Linux)  Suse 9.0
gcc -march=pentium3 -mcpu=pentium3 -O3 -msse -mmmx almabench.cpp -o
almabench -lm

./almabench     0m56.668s    0m53.892s     0m0.036s

ACL 6.2

; cpu time (non-gc) 548,770 msec (00:09:08.770) user, 1,090 msec system
; cpu time (gc)     350,540 msec (00:05:50.540) user, 820 msec system
; cpu time (total)  899,310 msec (00:14:59.310) user, 1,910 msec system
; real time  968,757 msec (00:16:08.757)
; space allocation:
;  1 cons cell, 1,807,865,992 other bytes, 0 static bytes

CMUCL 18e, 180MHz MIPS R5000
; Evaluation took:
;   1915.27 seconds of real time
;   1831.2148 seconds of user run time
;   5.309149 seconds of system run time
;   0 CPU cycles
;   0 page faults and
;   200 bytes consed.
; 

MIPSpro Compilers: Version 7.3.1.2m
CPPFLAGS = -O3 -g0
./almabench  214.80s user 0.56s system 96% cpu 3:43.07 total

SBCL 0.8.8, 1GHz G4, 2MB L3 cache:

Evaluation took:
                 101.262 seconds of real time
                 101.02 seconds of user run time
                 0.03 seconds of system run time
                 0 page faults and
                 232 bytes consed.
                 
gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1495)
CPPFLAGS = -fast -mcpu=7450 -mtune=7450
./almabench  63.87s user 0.07s system 99% cpu 1:04.04 total

A suspected AMD Athlon XP 1600+:

Both tests were run multiple times; these results were consistent.

SBCL 0.8.7.55
44.399 seconds of real time
40.29587 seconds of user run time
0.128981 seconds of system run time
0 page faults and
0 bytes consed.

gcc (GCC) 3.3.2
Optimization: -march=athlon -mcpu=athlon -O3
real    0m42.335s
user    0m38.441s
sys     0m0.106s

Intel Pentium 4 3.0 GHz

g++ (GCC) 3.3.1 20030626 (Debian prerelease)
g++ -o almabench -march=pentium4 -mcpu=pentium4 -O3 -msse -msse2 -mmmx
almabench.o

real    0m27.374s
user    0m27.360s
sys     0m0.000s


CMUCL 18e:

* (time (almabench:main))
; Compiling LAMBDA NIL: 
; Compiling Top-Level Form: 

; Evaluation took:
;   26.16 seconds of real time
;   26.16 seconds of user run time
;   0.0 seconds of system run time
;   78,845,426,712 CPU cycles
;   0 page faults and
;   216 bytes consed.
; 

SBCL 0.8.5.24

* (time  (almabench:main))

Evaluation took:
                 26.895 seconds of real time
                 26.9 seconds of user run time
                 0.0 seconds of system run time
                 0 page faults and
                 0 bytes consed.

-- 
Brian Mastenbrook
http://www.cs.indiana.edu/~bmastenb/
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87smgm770a.fsf@thalassa.informatimago.com>
Brian Mastenbrook <····················@cs.indiana.edu> writes:
> I'd be interested in seeing your ACL source, as the version that
> Raymond Toy and myself have come up with does not fare very well at all
> under ACL or LispWorks. But, on the other hand, CMUCL and SBCL reliably
> beat GCC on the P4! Nobody's made the mistake of choosing the platform
> on which we seem to score best. Nobody should have picked MIPS.

Here  are  the  times  I  get  with  sbcl  and  acl-6.2-trial  on  the
almabench.cl Duane sent me, and with Brian's version:


    compiler            athlon-1200
    -----------------  ------------
    g++    3.3          39.540
    -----------------  ------------
    sbcl   0.8.7        46.12       almabench-rettig.cl
    acl    6.2-trial    47.40
    -----------------  ------------
    sbcl   0.8.7        42.57       almabench-mastenbrook.lisp  
    cmucl  18d          43.05
    -----------------  ------------
    Total: between 7.6% and 20% of C++ times.

Most significantly, almabench-rettig.cl does not cons on acl:

; cpu time (non-gc) 47,900 msec user, 120 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  47,900 msec user, 120 msec system
; real time  68,468 msec (00:01:08.468)
; space allocation:
;  1 cons cell, 168 other bytes, 0 static bytes


and  almabench-mastenbrook.lisp   does not cons on sbcl:

Evaluation took:
  		 57.694 seconds of real time
  		 42.57 seconds of user run time
  		 0.08 seconds of system run time
  		 3 page faults and
  		 0 bytes consed.

Evaluation took:
  144.22 seconds of real time
  43.05 seconds of user run time
  0.13 seconds of system run time
  0 page faults and
  0 bytes consed.



almabench-mastenbrook.lisp  at:    
http://www.cs.indiana.edu/~bmastenb/misc/almabench.lisp



almabench-rettig.cl
------------------------------------------------------------------------
;; ALMABENCH 1.0.1
;; Common-Lisp version
;; 
;; A number-crunching benchmark designed for cross-language and vendor
;; comparisons.
;; 
;; Written by Scott Robert Ladd.
;;
;; Naively translated to Common-Lisp by Pascal Bourguignon.
;;
;; Fine tuning with input from comp.lang.lisp readers:
;; Gareth McCaughan, Raymond Toy, Christian Lynbech, Will Hartung,
;; Marco Antoniotti, Nils G�sche,
;;
;; Tuned for ACL by Duane Rettig.
;;
;;
;; Bradley J Lucier converted it to scheme.
;;
;; (and special thanks to ··················@yahoo.com (nobody)
;;  for challenging Common-Lisp programmers to prove once more that
;;  Common-Lisp compilers can be faster than C++ compilers).
;;
;; No rights reserved. This is public domain software, for use by anyone.
;; 
;; This program calculates the daily ephemeris (at noon) for the years
;; 2000-2099 using an algorithm developed by J.L. Simon, P. Bretagnon, J.
;; Chapront, M. Chapront-Touze, G. Francou and J. Laskar of the Bureau des
;; Longitudes, Paris, France), as detailed in Astronomy & Astrophysics
;; 282, 663 (1994)
;; 
;; Note that the code herein is design for the purpose of testing 
;; computational performance error handling and other such "niceties"
;; is virtually non-existent.
;; 
;; Actual (and oft-updated) benchmark results can be found at:
;; http:www.coyotegulch.com
;; 
;; Please do not use this information or algorithm in any way that might
;; upset the balance of the universe or otherwise cause planets to impact
;; upon one another.


(defpackage "ALMABENCH"
  (:use "COMMON-LISP")
  (:shadow "+" "*" "-" "/" "SQRT" "ASIN" )
  (:export "MAIN"));;ALMABENCH
(in-package "ALMABENCH")



;;----------------------------------------------------------------------
;; Common-Lisp speed optimization

(declaim (optimize (safety 0) ;; run-time error checking            
                   (space  0) ;; both code size and run-time space  
                   (speed  3))) ;; speed of the object code           



(defmacro unboxed-array (element-type contents)
  `(make-array  (quote ,(do ((level contents (car level))
                             (dimensions '()))
                            ((atom level) (nreverse dimensions))
                          (push (length level) dimensions)))
                :element-type (quote ,element-type)
                :initial-contents (quote ,contents)));;unboxed-array



(defmacro opt-op (op type)
  (let ((cl-op (intern (string op) "COMMON-LISP")))
    `(defmacro ,op (&rest args)
       (labels ((gen
                 (args)
                 (if (null (cdr args))
                   (list 'the ',type (car args))
                   (list 'the ',type (list ',cl-op (list 'the ',type (car args))
                                           (gen (cdr args)))))))
         (gen args)))));;opt-op


(opt-op + double-float)
(opt-op - double-float)
(opt-op * double-float)
(opt-op / double-float)


(defmacro sqrt (arg)
  `(the (double-float 0d0) (common-lisp:sqrt (the (double-float 0d0) ,arg))))
#-allegro
(defmacro asin (arg)
  `(the double-float (common-lisp:asin (the (double-float -1d0 1d0) ,arg))))

#+allegro
(ff:def-foreign-call (asin "asin") ((dfloat :double))
  :returning :double
  :arg-checking nil
  :call-direct t)

#-allegro
(defmacro datan (float1 float2)
  `(the double-float (atan ,float1 ,float2)))

#+allegro
(ff:def-foreign-call (datan "atan") ((float1 :double) (float2 :double))
  :returning :double
  :arg-checking nil
  :call-direct t)


#-allegro
(declaim (inline fmod))
#-allegro
(defun fmod (num div)
  (declare (double-float num div)
           (optimize speed (space 0)))
  (let ((q (ftruncate (/ num div))))
    (- num (* q div))));;fmod

#+allegro
(defmacro fmod (num div)
  `(float (multiple-value-bind (q r)
	      (common-lisp:truncate
	       (the double-float ,num)
	       (the double-float ,div))
	    (declare (ignore q))
	    (the fixnum r)))
  1.0d0);;fmod

;;------------------------------------------------------------------------
(eval-when (compile load eval)
#-allegro
(deftype int () '(signed-byte 32))
#+allegro
(deftype int () 'fixnum)

(deftype ranged-double ()
  `(double-float #.(cl:- (expt 2.0d0 63)) #.(expt 2.0d0 63)))
)

(declaim (type int test-loops test-length))
(declaim (type double-float %pi j2000 jcentury jmillenia twopi a2r r2h r2d gaussk
               sineps coseps))
(declaim (type (simple-array double-float (8))   amas))
(declaim (type (simple-array double-float (8 3)) a dlm e %pi-a dinc omega))
(declaim (type (simple-array double-float (8 9)) kp ca sa))
(declaim (type (simple-array double-float (8 10)) kq cl sl))


;; number of days to include in test
(defconstant TEST-LOOPS   20)
(defconstant TEST-LENGTH  36525)


(defconstant  %PI         3.14159265358979323846d0)
(defconstant  J2000      2451545.0d0)
(defconstant  JCENTURY   36525.0d0)
(defconstant  JMILLENIA  365250.0d0)
(defconstant  TWOPI      (the double-float (* 2.0d0  %PI)))
(defconstant  A2R        (the double-float (/ %PI  648000.0d0)))
(defconstant  R2H        (the double-float (/ 12.0d0  %PI)))
(defconstant  R2D        (the double-float (/ 180.0d0  %PI)))
(defconstant  GAUSSK     0.01720209895d0)

;; sin and cos of j2000 mean obliquity (iau 1976)
(defconstant  sineps  0.3977771559319137d0)
(defconstant  coseps  0.9174820620691818d0)





(defconstant amas
  (unboxed-array
   double-float
   ( 6023600.0d0  408523.5d0  328900.5d0  3098710.0d0
                  1047.355d0  3498.5d0  22869.0d0  19314.0d0 )));;amas

;; tables giving the mean keplerian elements, limited to tt**2 terms:
;; a       semi-major axis (au)
;; dlm     mean longitude (degree and arcsecond)
;; e       eccentricity
;; %pi      longitude of the perihelion (degree and arcsecond)
;; dinc    inclination (degree and arcsecond)
;; omega   longitude of the ascending node (degree and arcsecond)

(defconstant  a
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (   0.3870983098d0              0d0         0d0 )
                   (   0.7233298200d0              0d0         0d0 )
                   (   1.0000010178d0              0d0         0d0 )
                   (   1.5236793419d0          3d-10        0d0 )
                   (   5.2026032092d0      19132d-10  -39d-10)
                   (   9.5549091915d0  -0.0000213896d0   444d-10)
                   (  19.2184460618d0      -3716d-10  979d-10)
                   (  30.1103868694d0     -16635d-10  686d-10))));;a


(defconstant  dlm
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( ( 252.25090552d0  5381016286.88982d0   -1.92789d0 )
                   ( 181.97980085d0  2106641364.33548d0    0.59381d0 )
                   ( 100.46645683d0  1295977422.83429d0   -2.04411d0 )
                   ( 355.43299958d0   689050774.93988d0    0.94264d0 )
                   (  34.35151874d0   109256603.77991d0  -30.60378d0 )
                   (  50.07744430d0    43996098.55732d0   75.61614d0 )
                   ( 314.05500511d0    15424811.93933d0   -1.75083d0 )
                   ( 304.34866548d0     7865503.20744d0    0.21103d0 ))));;dlm


(defconstant  e
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (   0.2056317526d0   0.0002040653d0     -28349d-10 )
                   (   0.0067719164d0  -0.0004776521d0      98127d-10 )
                   (   0.0167086342d0  -0.0004203654d0  -0.0000126734d0 )
                   (   0.0934006477d0   0.0009048438d0     -80641d-10 )
                   (   0.0484979255d0   0.0016322542d0  -0.0000471366d0 )
                   (   0.0555481426d0  -0.0034664062d0  -0.0000643639d0 )
                   (   0.0463812221d0  -0.0002729293d0   0.0000078913d0 )
                   (   0.0094557470d0   0.0000603263d0             0d0 ))));;e


(defconstant  %pi-a
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (  77.45611904d0   5719.11590d0    -4.83016d0 )
                   ( 131.56370300d0    175.48640d0  -498.48184d0 )
                   ( 102.93734808d0  11612.35290d0    53.27577d0 )
                   ( 336.06023395d0  15980.45908d0   -62.32800d0 )
                   (  14.33120687d0   7758.75163d0   259.95938d0 )
                   (  93.05723748d0  20395.49439d0   190.25952d0 )
                   ( 173.00529106d0   3215.56238d0   -34.09288d0 )
                   (  48.12027554d0   1050.71912d0    27.39717d0 ))));;%pi-a


(defconstant  dinc
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (   7.00498625d0  -214.25629d0    0.28977d0 )
                   (   3.39466189d0   -30.84437d0  -11.67836d0 )
                   (            0d0   469.97289d0   -3.35053d0 )
                   (   1.84972648d0  -293.31722d0   -8.11830d0 )
                   (   1.30326698d0   -71.55890d0   11.95297d0 )
                   (   2.48887878d0    91.85195d0  -17.66225d0 )
                   (   0.77319689d0   -60.72723d0    1.25759d0 )
                   (   1.76995259d0     8.12333d0    0.08135d0 ))));;dinc


(defconstant  omega
  (UNBOXED-ARRAY DOUBLE-FLOAT
                 ( (  48.33089304d0   -4515.21727d0   -31.79892d0 )
                   (  76.67992019d0  -10008.48154d0   -51.32614d0 )
                   ( 174.87317577d0   -8679.27034d0    15.34191d0 )
                   (  49.55809321d0  -10620.90088d0  -230.57416d0 )
                   ( 100.46440702d0    6362.03561d0   326.52178d0 )
                   ( 113.66550252d0   -9240.19942d0   -66.23743d0 )
                   (  74.00595701d0    2669.15033d0   145.93964d0 )
                   ( 131.78405702d0    -221.94322d0    -0.78728d0 ))));;omega


;; tables for trigonometric terms to be added to the mean elements
;; of the semi-major axes.
(defconstant  kp
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( ( 69613.0d0   75645.0d0  88306.0d0  59899.0d0  15746.0d0  71087.0d0  142173.0d0   3086.0d0     0.0d0 )
     ( 21863.0d0   32794.0d0  26934.0d0  10931.0d0  26250.0d0  43725.0d0   53867.0d0  28939.0d0     0.0d0 )
     ( 16002.0d0   21863.0d0  32004.0d0  10931.0d0  14529.0d0  16368.0d0   15318.0d0  32794.0d0     0.0d0 )
     (  6345.0d0    7818.0d0  15636.0d0   7077.0d0   8184.0d0  14163.0d0    1107.0d0   4872.0d0     0.0d0 )
     (  1760.0d0    1454.0d0   1167.0d0    880.0d0    287.0d0   2640.0d0      19.0d0   2047.0d0  1454.0d0 )
     (   574.0d0       0.0d0    880.0d0    287.0d0     19.0d0   1760.0d0    1167.0d0    306.0d0   574.0d0 )
     (   204.0d0       0.0d0    177.0d0   1265.0d0      4.0d0    385.0d0     200.0d0    208.0d0   204.0d0 )
     (     0.0d0     102.0d0    106.0d0      4.0d0     98.0d0   1367.0d0     487.0d0    204.0d0     0.0d0 ))));;kp


(defconstant  ca
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (       4.0d0     -13.0d0     11.0d0     -9.0d0     -9.0d0     -3.0d0     -1.0d0      4.0d0     0.0d0 )
     (    -156.0d0      59.0d0    -42.0d0      6.0d0     19.0d0    -20.0d0    -10.0d0    -12.0d0     0.0d0 )
     (      64.0d0    -152.0d0     62.0d0     -8.0d0     32.0d0    -41.0d0     19.0d0    -11.0d0     0.0d0 )
     (     124.0d0     621.0d0   -145.0d0    208.0d0     54.0d0    -57.0d0     30.0d0     15.0d0     0.0d0 )
     (  -23437.0d0   -2634.0d0   6601.0d0   6259.0d0  -1507.0d0  -1821.0d0   2620.0d0  -2115.0d0 -1489.0d0 )
     (   62911.0d0 -119919.0d0  79336.0d0  17814.0d0 -24241.0d0  12068.0d0   8306.0d0  -4893.0d0  8902.0d0 )
     (  389061.0d0 -262125.0d0 -44088.0d0   8387.0d0 -22976.0d0  -2093.0d0   -615.0d0  -9720.0d0  6633.0d0 )
     ( -412235.0d0 -157046.0d0 -31430.0d0  37817.0d0  -9740.0d0    -13.0d0  -7449.0d0   9644.0d0     0.0d0 ))));;ca


(defconstant  sa
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (     -29.0d0     -1.0d0      9.0d0      6.0d0     -6.0d0      5.0d0      4.0d0      0.0d0     0.0d0 )
     (     -48.0d0   -125.0d0    -26.0d0    -37.0d0     18.0d0    -13.0d0    -20.0d0     -2.0d0     0.0d0 )
     (    -150.0d0    -46.0d0     68.0d0     54.0d0     14.0d0     24.0d0    -28.0d0     22.0d0     0.0d0 )
     (    -621.0d0    532.0d0   -694.0d0    -20.0d0    192.0d0    -94.0d0     71.0d0    -73.0d0     0.0d0 )
     (  -14614.0d0 -19828.0d0  -5869.0d0   1881.0d0  -4372.0d0  -2255.0d0    782.0d0    930.0d0   913.0d0 )
     (  139737.0d0      0.0d0  24667.0d0  51123.0d0  -5102.0d0   7429.0d0  -4095.0d0  -1976.0d0 -9566.0d0 )
     ( -138081.0d0      0.0d0  37205.0d0 -49039.0d0 -41901.0d0 -33872.0d0 -27037.0d0 -12474.0d0 18797.0d0 )
     (       0.0d0  28492.0d0 133236.0d0  69654.0d0  52322.0d0 -49577.0d0 -26430.0d0  -3593.0d0     0.0d0 ))));;sa


;; tables giving the trigonometric terms to be added to the mean
;;  elements of the mean longitudes.
(defconstant  kq
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (  3086.0d0  15746.0d0  69613.0d0  59899.0d0  75645.0d0  88306.0d0  12661.0d0  2658.0d0   0.0d0    0.0d0 )
     ( 21863.0d0  32794.0d0  10931.0d0     73.0d0   4387.0d0  26934.0d0   1473.0d0  2157.0d0   0.0d0    0.0d0 )
     (    10.0d0  16002.0d0  21863.0d0  10931.0d0   1473.0d0  32004.0d0   4387.0d0    73.0d0   0.0d0    0.0d0 )
     (    10.0d0   6345.0d0   7818.0d0   1107.0d0  15636.0d0   7077.0d0   8184.0d0   532.0d0  10.0d0    0.0d0 )
     (    19.0d0   1760.0d0   1454.0d0    287.0d0   1167.0d0    880.0d0    574.0d0  2640.0d0  19.0d0 1454.0d0 )
     (    19.0d0    574.0d0    287.0d0    306.0d0   1760.0d0     12.0d0     31.0d0    38.0d0  19.0d0  574.0d0 )
     (     4.0d0    204.0d0    177.0d0      8.0d0     31.0d0    200.0d0   1265.0d0   102.0d0   4.0d0  204.0d0 )
     (     4.0d0    102.0d0    106.0d0      8.0d0     98.0d0   1367.0d0    487.0d0   204.0d0   4.0d0  102.0d0 ))));;kq


(defconstant  cl
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (      21.0d0    -95.0d0  -157.0d0    41.0d0    -5.0d0    42.0d0    23.0d0    30.0d0      0.0d0     0.0d0 )
     (    -160.0d0   -313.0d0  -235.0d0    60.0d0   -74.0d0   -76.0d0   -27.0d0    34.0d0      0.0d0     0.0d0 )
     (    -325.0d0   -322.0d0   -79.0d0   232.0d0   -52.0d0    97.0d0    55.0d0   -41.0d0      0.0d0     0.0d0 )
     (    2268.0d0   -979.0d0   802.0d0   602.0d0  -668.0d0   -33.0d0   345.0d0   201.0d0    -55.0d0     0.0d0 )
     (    7610.0d0  -4997.0d0 -7689.0d0 -5841.0d0 -2617.0d0  1115.0d0  -748.0d0  -607.0d0   6074.0d0   354.0d0 )
     (  -18549.0d0  30125.0d0 20012.0d0  -730.0d0   824.0d0    23.0d0  1289.0d0  -352.0d0 -14767.0d0 -2062.0d0 )
     ( -135245.0d0 -14594.0d0  4197.0d0 -4030.0d0 -5630.0d0 -2898.0d0  2540.0d0  -306.0d0   2939.0d0  1986.0d0 )
     (   89948.0d0   2103.0d0  8963.0d0  2695.0d0  3682.0d0  1648.0d0   866.0d0  -154.0d0  -1963.0d0  -283.0d0 ))));;cl


(defconstant  sl
  (UNBOXED-ARRAY
   DOUBLE-FLOAT
   ( (   -342.0d0    136.0d0   -23.0d0    62.0d0    66.0d0   -52.0d0   -33.0d0    17.0d0      0.0d0     0.0d0 )
     (    524.0d0   -149.0d0   -35.0d0   117.0d0   151.0d0   122.0d0   -71.0d0   -62.0d0      0.0d0     0.0d0 )
     (   -105.0d0   -137.0d0   258.0d0    35.0d0  -116.0d0   -88.0d0  -112.0d0   -80.0d0      0.0d0     0.0d0 )
     (    854.0d0   -205.0d0  -936.0d0  -240.0d0   140.0d0  -341.0d0   -97.0d0  -232.0d0    536.0d0     0.0d0 )
     ( -56980.0d0   8016.0d0  1012.0d0  1448.0d0 -3024.0d0 -3710.0d0   318.0d0   503.0d0   3767.0d0   577.0d0 )
     ( 138606.0d0 -13478.0d0 -4964.0d0  1441.0d0 -1319.0d0 -1482.0d0   427.0d0  1236.0d0  -9167.0d0 -1918.0d0 )
     (  71234.0d0 -41116.0d0  5334.0d0 -4935.0d0 -1848.0d0    66.0d0   434.0d0 -1748.0d0   3780.0d0  -701.0d0 )
     ( -47645.0d0  11647.0d0  2166.0d0  3194.0d0   679.0d0     0.0d0  -244.0d0  -419.0d0  -2531.0d0    48.0d0 ))));;sl

;;---------------------------------------------------------------------------
;; Normalize angle into the range -%pi <= A < +%pi.


(declaim (ftype (function (double-float) double-float) anpm))
(declaim (inline anpm))

#+allegro
(eval-when (compile)
  (setf (get 'anpm 'sys::immed-args-call)
    '((double-float) double-float)))

#||
(defun anpm (aa)
  (declare (type double-float aa))
  (multiple-value-bind (dummy w) (ffloor aa TWOPI)
    (declare (ignore dummy) (type double-float w))
    (if (>= w %pi) (- w TWOPI) w)));;anpm-1
The following seems to be slightly faster on sbcl:
||#

#-allegro
(defun anpm (aa)
  (declare (type double-float aa))
  (let ((w (fmod aa TWOPI)))
    (when (>= (abs w) %pi)
      (setf w (- w (if (< aa 0.0d0) (- twopi) twopi))))
    (values (the double-float w))));;anpm



#+allegro
(defun anpm (aa)
  (declare (type double-float aa))
  (let ((w (fmod aa TWOPI)))
    (when (>= (abs w) %pi)
      (setq w (- w (if (< aa 0.0d0) (- twopi) twopi))))
    w));;anpm

;;(declaim (ftype (function ((simple-array double-float (2))
;;                           int
;;                           (simple-array double-float (2 3))) double-float) planetpv))



;;---------------------------------------------------------------------------    
;; The reference frame is equatorial and is with respect to the
;;    mean equator and equinox of epoch j2000.
(defun  planetpv (epoch np pv)
  (declare (type (simple-array double-float (2)) epoch)
           (type int np)
           (type (simple-array double-float (2 3)) pv))
  ;; working storage
  (let ((k 0)
        (tt 0d0) (da 0d0) (dl 0d0) (de 0d0) (dp 0d0) (di 0d0)
        (doh 0d0) (dmu 0d0) (arga 0d0) (argl 0d0) (am 0d0)
        (ae 0d0) (dae 0d0) (ae2 0d0) (at 0d0) (r 0d0) (v 0d0)
        (si2 0d0) (xq 0d0) (xp 0d0) (tl 0d0) (xsw 0d0)
        (xcw 0d0) (xm2 0d0) (xf 0d0) (ci2 0d0) (xms 0d0)
        (xmc 0d0) (xpxq2 0d0) (x 0d0) (y 0d0) (z 0d0))
    (declare (type int k)
             (type ranged-double
                   tt da dl de dp di doh dmu arga argl am
                   ae dae ae2 at r v si2 xq xp tl xsw
                   xcw xm2 xf ci2 xms xmc xpxq2 x y z))

    ;; time: julian millennia since j2000.
    (setf tt  (/ (+ (- (aref epoch 0) J2000) (aref epoch 1))  JMILLENIA))

    ;; compute the mean elements.
    (setq da (+ (aref a np 0)  (* (+ (aref a np 1) (* (aref a np 2) tt)) tt))
          dl (* (+ (* 3600.0d0 (aref dlm np 0)) (* (+ (aref dlm np 1) (* (aref dlm np 2) tt)) tt)) A2R)
          de (+ (aref e np 0) (* (+ (aref e np 1) (* (aref e np 2) tt)) tt))
          dp (anpm (* (+ (* 3600.0d0 (aref %pi-a np 0)) (* (+ (aref %pi-a np 1) (* (aref %pi-a np 2) tt)) tt)) A2R))
          di (* (+ (* 3600.0d0 (aref dinc np 0)) (* (+ (aref dinc np 1) (* (aref dinc np 2) tt)) tt)) A2R)
          doh (anpm (* (+ (* 3600.0d0 (aref omega np 0)) (* (+ (aref omega np 1) (* (aref omega np 2) tt)) tt)) A2R)) )
    ;; apply the trigonometric terms.
    (setf dmu  (* 0.35953620d0  tt))

    (do ((k 0 (the int (1+ k))))
        ((<= 8 k))
      (declare (type int k))
      (setf arga (* (aref kp np k) dmu)
            argl (* (aref kq np k) dmu))
      (incf da  (* (+ (* (aref ca np k) (cos arga)) (* (aref sa np k)  (sin arga)))  0.0000001d0))
      (incf dl  (* (+ (* (aref cl np k) (cos argl)) (* (aref sl np k)  (sin argl)))  0.0000001d0)))

    (setf arga (* (aref kp np 8) dmu))
    (incf da (* tt (+ (* (aref ca np 8) (cos arga)) (* (aref sa np 8) (sin arga)))  0.0000001d0))

    (do ((k 8 (the int (1+ k))))
        ((< 9 k))
      (declare (type int k))
      (setf argl (* (aref kq np k) dmu))
      (incf dl (* tt (+ (* (aref cl np k) (cos argl)) (* (aref sl np k) (sin argl))) 0.0000001d0)))

    (setf dl (fmod dl TWOPI))

    ;; iterative solution of kepler's equation to get eccentric anomaly.
    (setf am (- dl dp)
          ae (+ am (* de (sin am))))

    (block :loop
      (setf k 0)
      (loop
       (setf dae  (/ (- (+ am (* de (sin ae))) ae) (- 1.0d0 (* de (cos ae)))))
       (incf ae dae)
       (incf k)
       (when (or (>= k 10) (< (abs dae) 1d-12))
         (return-from :loop))))
    
    ;; true anomaly.
    (setf ae2 (/ ae  2.0d0)
          at  (* 2.0d0 (datan (* (sqrt (/ (+ 1.0d0  de) (- 1.0d0  de))) (sin ae2))  (cos ae2))))

    ;; distance (au) and speed (radians per day).
    (setf r (* da  (- 1.0d0 (* de (cos ae))))
          v (* GAUSSK (sqrt (/ (+ 1.0d0 (/ 1.0d0 (aref amas np))) (* da da da)))))

    (setf si2   (sin (the ranged-double (/ di  2.0d0)))
          xq    (* si2 (cos doh))
          xp    (* si2 (sin doh))
          tl    (+ at  dp)
          xsw   (sin tl)
          xcw   (cos tl)
          xm2   (* 2.0d0 (- (* xp xcw) (* xq xsw)))
          xf    (/ da (sqrt (- 1.0d0 (* de de))))
          ci2   (cos (the ranged-double (/ di 2.0d0)))
          xms   (* (+ (* de (sin dp)) xsw) xf)
          xmc   (* (+ (* de (cos dp)) xcw) xf)
          xpxq2 (* 2.0d0  xp xq))

    ;; position (j2000 ecliptic x,y,z in au).
    (setf x (* r  (- xcw (* xm2 xp)))
          y (* r  (+ xsw (* xm2 xq)))
          z (* r  (* (- xm2) ci2)))

    ;; rotate to equatorial.
    (setf (aref pv 0 0)  x
          (aref pv 0 1) (- (* y coseps) (* z sineps))
          (aref pv 0 2) (+ (* y sineps) (* z coseps)))

    ;; velocity (j2000 ecliptic xdot,ydot,zdot in au/d).
    (setf x (* v  (+ (* (+ -1.0d0  (* 2.0d0  xp  xp))  xms) (* xpxq2  xmc)))
          y (* v  (- (* (- 1.0d0 (* 2.0d0  xq  xq))  xmc) (* xpxq2  xms)))
          z (* v  (* 2.0d0 ci2 (+ (* xp xms) (* xq  xmc)))))

    ;; rotate to equatorial.
    (setf (aref pv 1 0) x
          (aref pv 1 1) (- (* y  coseps) (* z  sineps))
          (aref pv 1 2) (+ (* y  sineps) (* z  coseps)))
    (values)));;planetpv

;;---------------------------------------------------------------------------
;; Computes RA, Declination, and distance from a state vector returned by
;; planetpv.
(defun radecdist (state rdd)
  (declare (type (simple-array double-float (2 3)) state)
           (type (simple-array double-float (3)) rdd))
  ;; distance
  (setf (aref rdd 2) (the double-float
                       (sqrt (+ (* (aref state 0 0)  (aref state 0 0))
                                (* (aref state 0 1)  (aref state 0 1))
                                (* (aref state 0 2)  (aref state 0 2))))))

  ;; RA
  (setf (aref rdd 0) (the double-float
                       (* (datan (aref state 0 1) (aref state 0 0)) R2H)))
  (when (< (aref rdd 0)  0.0d0) (incf (aref rdd 0) 24.0d0))

  ;; Declination
  (setf (aref rdd 1) (the double-float
                       (* (asin (/ (aref state 0 2) (aref rdd 2))) R2D)))
  (values));;radecdist


;;---------------------------------------------------------------------------
;; Entry point
;; Calculate RA and Dec for noon on every day in 1900-2100
(defun main ()
  (let ((jd (make-array '(2) :element-type 'double-float :initial-element 0d0))
        (pv (make-array '(2 3) :element-type 'double-float :initial-element 0d0))
        (position (make-array '(3) :element-type 'double-float :initial-element 0d0)))
    (declare (type (simple-array double-float (2)) jd)
             (type (simple-array double-float (2 3)) pv)
             (type (simple-array double-float (3)) position))
    (do ((i 0 (the int (1+ i))))
        ((>= i TEST-LOOPS))
      (declare (type int i))
      (setf (aref jd 0) J2000
            (aref jd 1) 0.0d0)
      (do ((n 0 (the int (1+ n))))
          ((>= n TEST-LENGTH))
        (declare (type int n))
        (incf (aref jd 0) 1.0d0)
        (do ((p 0 (the int (1+ p))))
            ((>= p 8))
          (declare (type int p))
          (planetpv jd p pv)
          (radecdist pv position) ))))
  0);;main


(time (almabench:main))

------------------------------------------------------------------------

--
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Duane Rettig
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <47jxyxryt.fsf@franz.com>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> Brian Mastenbrook <····················@cs.indiana.edu> writes:
> > I'd be interested in seeing your ACL source, as the version that
> > Raymond Toy and myself have come up with does not fare very well at all
> > under ACL or LispWorks. But, on the other hand, CMUCL and SBCL reliably
> > beat GCC on the P4! Nobody's made the mistake of choosing the platform
> > on which we seem to score best. Nobody should have picked MIPS.

Perhaps we should pick a completely new one, like the AMD64, which, even
with a 32-bit lisp, consistently provides SSE/SSE2 true random-access
float registers.

> Here  are  the  times  I  get  with  sbcl  and  acl-6.2-trial  on  the
> almabench.cl Duane sent me, and with Brian's version:
> 
> 
>     compiler            athlon-1200
>     -----------------  ------------
>     g++    3.3          39.540
>     -----------------  ------------
>     sbcl   0.8.7        46.12       almabench-rettig.cl
>     acl    6.2-trial    47.40
>     -----------------  ------------
>     sbcl   0.8.7        42.57       almabench-mastenbrook.lisp  
>     cmucl  18d          43.05
>     -----------------  ------------
>     Total: between 7.6% and 20% of C++ times.
> 
> Most significantly, almabench-rettig.cl does not cons on acl:

Not bad for an hour-or-so's work, eh?  Well, back to my slogging; I've
got a lisp to get out.   But before I go, I think that the lesson here
is important - although I never took the troll who started this thread
seriously, I definitely took his benchmark seriously.  And although
many here have decried nobody's business, almost to the point of even
coming right out and stating the half-truth that "there's no such thing
as a good benchmark", please note that it is only half the truth; the
other half is that "there's no such thing as a bad benchmark", either;
it behooves us all to take every timing test seriously enough to at
least know what is occurring, and also to know the implementation you
work with well enough to know its strengths and weaknesses.  This
particular benchmark makes strong use of floating-point and trig
functionality, which cmucl (and, I presume, by inheritance, sbcl)
has always done an excellent job with.  I for one have seen a
weakness in our own compilation ability of multiple-valued functions
in the arena where unboxed calls/returns are desired.  I intend to be
considering how to plug up that hole as soon as possible.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Gareth McCaughan
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87wu5y4xht.fsf@g.mccaughan.ntlworld.com>
Pascal Bourguignon wrote:

> Here  is  the  latest  source  and the  summary.   Let's  gather  some
> statistics...   There is  still a  lot  of consing  done... but  we're
> within 17% of C++ super-optimized code.
...
> #||
> (defun anpm (aa)
>   (declare (type double-float aa))
>   (multiple-value-bind (dummy w) (ffloor aa TWOPI)
>     (declare (ignore dummy) (type double-float w))
>     (if (>= w %pi) (- w TWOPI) w)));;anpm-1
> The following seems to be slightly faster on sbcl:
> ||#
> 
> (defun anpm (aa)
>   (declare (type double-float aa))
>   (let ((w (fmod aa TWOPI)))
>     (when (>= (abs w) %pi)
>       (setf w (- w (if (< aa 0.0d0) (- twopi) twopi))))
>     (values (the double-float w))));;anpm

As per my recent reply to Raymond Toy, try the following
instead:

    (declaim (inline anpm)
             (ftype (function (double-float) (double-float -4d0 4d0)) anpm))
    (defun anpm (aa)
      (declare (type double-float aa))
      (multiple-value-bind (dummy w) (fround aa TWOPI)
        (declare (ignore dummy)
                 (type double-float w))
        w))

I suspect that some of those declarations are redundant.

>   (let ((k 0)
>         (tt 0d0) (da 0d0) (dl 0d0) (de 0d0) (dp 0d0) (di 0d0)
>         (doh 0d0) (dmu 0d0) (arga 0d0) (argl 0d0) (am 0d0)
>         (ae 0d0) (dae 0d0) (ae2 0d0) (at 0d0) (r 0d0) (v 0d0)
>         (si2 0d0) (xq 0d0) (xp 0d0) (tl 0d0) (xsw 0d0)
>         (xcw 0d0) (xm2 0d0) (xf 0d0) (ci2 0d0) (xms 0d0)
>         (xmc 0d0) (xpxq2 0d0) (x 0d0) (y 0d0) (z 0d0))
>     (declare (type int k)
>              (type double-float
>                    tt da dl de dp di doh dmu arga argl am
>                    ae dae ae2 at r v si2 xq xp tl xsw
>                    xcw xm2 xf ci2 xms xmc xpxq2 x y z))

If some of those quantities can be given bounds then CMUCL
will be able to compile some of the trig function calls more
efficiently. Declaring the return type of ANPM appropriately
buys a little benefit of this sort. However, I don't have the
stamina to dig into the code and figure out correct bounds,
and maybe it's cheating anyway.

Incidentally, is anyone doing any checking that (each version
of) our Lisp code is actually producing the same answers as
the C++, up to the little differences you'd expect from
different evaluation order and the like?

-- 
Gareth McCaughan
.sig under construc
From: Bradley J Lucier
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c2bl93$ski@arthur.cs.purdue.edu>
In article <··············@g.mccaughan.ntlworld.com>,
Gareth McCaughan  <················@pobox.com> wrote:
>Incidentally, is anyone doing any checking that (each version
>of) our Lisp code is actually producing the same answers as
>the C++, up to the little differences you'd expect from
>different evaluation order and the like?

Yes, I added lines to the C++ and Scheme versions to do a minimal check of
the x component of the position at the end.  We pass.

Brad
From: Barry Fishman
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <m37jxxw5hu.fsf@barry_fishman.att.net>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:
> Here  is  the  latest  source  and the  summary.   Let's  gather  some
> statistics...   There is  still a  lot  of consing  done... but  we're
> within 17% of C++ super-optimized code.

Here is a data point on a AMD_64 someone asked about (But running the
32bit version of SUSE 9.0).

Processor
=========

model name      : AMD Athlon(tm) 64 Processor 3200+
stepping        : 8
cpu MHz         : 2000.682
cache size      : 1024 KB

user run time (seconds) 
=======================

    compiler           athlon-3200+
    -----------------  ------------
    g++    3.3          21.999 (1)
    g++    3.3          26.021 (2)
    -----------------  ------------ -----------
    sbcl   0.8.6        24.24
    cmucl  18e          25.445

    (1) g++ -O3 -march=athlon -msse -msse2 -mmmx 

    (2) g++ -O3 
From: Kenny Tilton
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <mhc2c.16486$Wo2.2042@twister.nyc.rr.com>
Raymond Toy wrote:

>>>>>>"Gareth" == Gareth McCaughan <················@pobox.com> writes:
> 
> 
>     Gareth> For what it's worth, on my machine with CMUCL 18e and g++ 3.2.3
>     Gareth> I get about 52.4s for C++ and about 59.2s for your Lisp version.
>     Gareth> (My machine is a 1GHz Athlon.) I haven't tried any newer version
> 
> Very nice.  Could you send or post your current code?  I'm just
> curious to see if there's anything else missing or if there's some
> obvious deficiencies.
> 
> Of course, you realize that the OP will just say, "Look how bad Lisp
> is.  It took DAYS to make it run as fast as C++.  The C++ code was
> already fast."
> 
> :-(

Cheer up! This particular troll had no redeeming social value and would 
have stuck to its unthinking monotone no matter what anyone said.

But next time, folks, do not let the troll determine the playing field. 
The point about Lisp being close to C in speed is merely our way of 
saying, hey, we are native-compiled (with nice optimizing compilers) 
just like C. The benchmark happened to penalize Lisp for one of its 
strengths, viz, seamlessly coping with all kinds of numeric 
representations at once.

My solution would have been to call the C benchmark via the FFI, making 
the larger point that should some computationally intense task ever 
become a problem, nothing stops a lispnik from coding that bit in C. If 
that bit also is expected to change a lot or is itself hard to get right 
algorithmically, then I might want it in Lisp--and then I just get on 
comp.lang.lisp or call my lisp vendor and ask about declarations, etc.

The point is that using a language that makes me ten times more 
productive cannot be argued against because some chunk of it might have 
to be in C, or because I might have to call some C library.

That benchmark was for comparing C compilers, btw, so again, don't let 
the jerks set the field.

kt

-- 
http://tilton-technology.com

Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

Your Project Here! http://alu.cliki.net/Industry%20Application
From: jan
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <usmgcl7t2.fsf@iprimus.com.au>
Kenny Tilton <·······@nyc.rr.com> writes:

> My solution would have been to call the C benchmark via the FFI,
> making the larger point that should some computationally intense
> task ever become a problem, nothing stops a lispnik from coding that
> bit in C.

Has anyone got benchmark results of C via FFI vs pure C ?

My attempts showed FFI was slower but I didn't go to great lengths to
make the benchmark reliable.

-- 
jan
From: Kenny Tilton
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <0av4c.15958$c73.5405005@twister.nyc.rr.com>
jan wrote:

> Kenny Tilton <·······@nyc.rr.com> writes:
> 
> 
>>My solution would have been to call the C benchmark via the FFI,
>>making the larger point that should some computationally intense
>>task ever become a problem, nothing stops a lispnik from coding that
>>bit in C.
> 
> 
> Has anyone got benchmark results of C via FFI vs pure C ?

:) I was thinking one call to caculate all the results, so I think it 
would disappear in the 50 second noise.

otoh, Cello beats on the FFI like its daddy (all those neat libraries, 
you know) and if that can be tuned I'd love to know about it. 
Fortunately OpenGL has the display list mechanism to collapse a 
sequences of calls into one, after the first time. Just doing that on 
the 3d buttons seemed to boost me from 18 to 22 fps. Mind, you, that was 
on the lighting panel which has almost a hundred 3d frames.

kt

-- 
http://tilton-technology.com

Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

Your Project Here! http://alu.cliki.net/Industry%20Application
From: Christian Lynbech
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87y8q5uvan.fsf@baguette.defun.dk>
>>>>> "jan" == jan  <······@iprimus.com.au> writes:

jan> Has anyone got benchmark results of C via FFI vs pure C ?

It probably depend quite a bit on compiler and possibly also on
platform.

I was experiementing a bit with the ACL version of almabench on
powerpc.

Duanes version contained a few FFI defs like this:

(ff:def-foreign-call (asin "asin") ((dfloat :double))
  :returning :double
  :arg-checking nil
  :call-direct t)

which judging from the arguments probably translates very directly
into a library call and thus should be as fast as any C compiler.

------------------------+-----------------------------------------------------
Christian Lynbech       | christian ··@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: jan
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <uznaibnao.fsf@iprimus.com.au>
jan <······@iprimus.com.au> writes:

> Has anyone got benchmark results of C via FFI vs pure C ?
> 
> My attempts showed FFI was slower but I didn't go to great lengths
> to make the benchmark reliable.

I tried my benchmark again today and FFI is _not_ slower.

Next time I'll double check _before_ posting stupid questions.

-- 
jan
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4nr7w7z5s0.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "Pascal" == Pascal Bourguignon <····@thalassa.informatimago.com> writes:

    Pascal> But  it still  conses  a  lot, boxing  and  unboxing double-floats  for
    Pascal> standard  function   arguments.   It  seems  that   using  these  (the
    Pascal> double-float argument) is not useful:

    Pascal> (defmacro sqrt (arg)
    Pascal>   `(the double-float (common-lisp:sqrt (the double-float ,arg))))
    Pascal> (defmacro asin (arg)
    Pascal>   `(the double-float (common-lisp:asin (the double-float ,arg))))

Perhaps a deficiency in the compiler, since the output assertion
should tell the compiler that the inputs to the function are the right
range.  However doing the following will make sure you get the fast
versions:

    (sqrt (the (double-float 0d0) x))

    (asin (the (double-float -1d0 1d0) x))

    Pascal> (defmacro mod (num div)
    Pascal>   `(multiple-value-bind (q r) (common-lisp:ftruncate
    Pascal>                                (the double-float ,num)
    Pascal>                                (the double-float ,div))
    Pascal>      (declare (ignore q))
    Pascal>      (the double-float r)));;mod

This looks right, and there's not much to be done because sbcl doesn't
have a inline ftruncate, I think.  With a fairly recent cmucl, you can
do something like

    (declaim (inline fmod))
    (defun fmod (num div)
      (declare (double-float num div)
               (optimize speed (space 0)))
      (let ((q (ftruncate (/ num div))))
        (- num (* q div))))

(Untested, but I think that's right).  This will allow the ftruncate
to be inlined, so no boxing of numbers.

Ray
From: Cesar Rabak
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <404921C4.3080503@acm.org>
Pascal Bourguignon escreveu:
> Christian Lynbech <·················@ericsson.com> writes:
> 
[snipped]

> 
> 
> 63.56/43.960 = 1.446; we're still 44.6% slower than C++.
> 
> 

Do you have means for a 'quick and dirty' profiling on the Lisp code?

--
Cesar Rabak
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <877jxy9118.fsf@thalassa.informatimago.com>
Cesar Rabak <······@acm.org> writes:

> Pascal Bourguignon escreveu:
> > Christian Lynbech <·················@ericsson.com> writes:
> >
> [snipped]
> 
> > 63.56/43.960 = 1.446; we're still 44.6% slower than C++.
> >
> 
> Do you have means for a 'quick and dirty' profiling on the Lisp code?

There's a coverage  utility somewhere, but this is  a simple algorithm
with a lot  of arithmetic and some trigonometry.   There's not much to
profile, we  know where time is  spent, the compilers  and the garbage
collector tell us.  

Well,  the compilers  could be  more explicit,  or we  could  read the
compilers to know how to make them generate the best code...

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Cesar Rabak
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <40493503.3080609@acm.org>
Pascal Bourguignon escreveu:
> Cesar Rabak <······@acm.org> writes:
> 
> 
>>Pascal Bourguignon escreveu:
>>
>>>Christian Lynbech <·················@ericsson.com> writes:
>>>
>>
>>[snipped]
>>
>>
>>>63.56/43.960 = 1.446; we're still 44.6% slower than C++.
>>>
>>
>>Do you have means for a 'quick and dirty' profiling on the Lisp code?
> 
> 
> There's a coverage  utility somewhere, but this is  a simple algorithm
> with a lot  of arithmetic and some trigonometry.   There's not much to
> profile, we  know where time is  spent, the compilers  and the garbage
> collector tell us.  
> 
> Well,  the compilers  could be  more explicit,  or we  could  read the
> compilers to know how to make them generate the best code...
> 

I see ;-)
From: Bradley J Lucier
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c26en2$jjr@arthur.cs.purdue.edu>
In article <··············@thalassa.informatimago.com>,
Pascal Bourguignon  <····@thalassa.informatimago.com> wrote:
>Ok, that was not so hard after all.
>
>Adding these macros:

<snip>

>allowed me to make it run  twice as fast as c++ (on Athlon 1200 MHz):
>
>;; SBCL:
>* (time (almabench:main))
>Evaluation took:
>  		 135.384 seconds of real time
>  		 96.65 seconds of user run time
>  		 3.08 seconds of system run time
>  		 0 page faults and
>  		 2059355408 bytes consed.
>
># g++:
>$ time ./almabench
>
>real    3m0.195s
>user    0m46.470s
>sys     0m0.100s

Pascal:

I've been archiving a surprising number of your recent posts because I've found
them interesting.  Since I like to do numeric programming in Scheme (Gambit-C
specifically) I thought I would convert what you did to Scheme and see how
it went.

On a 1200MHz UltraSPARC (sorry, no fast x86 linux boxes readily available),
I get

zada-17% g++ -O2 -o alma almabench.cpp -mcpu=ultrasparc -mtune=ultrasparc
zada-18% time ./alma
8 11.0531 7.10072
79.87u 0.00s 1:20.03 99.8%

for the original (I added a line to output the position vector at the end) and

> (time (main))
11.05307874040291
(time (main))
    96057 ms real time
    95900 ms cpu time (95830 user, 70 system)
    653 collections accounting for 434 ms real time (450 user, 30 system)
    6708840672 bytes allocated
    no minor faults
    no major faults

for the following scheme code using Gambit-C 4.0alpha8 and

"gcc -fPIC -I/pkgs/gambit/include -O1 -fschedule-insns2 -fno-math-errno -fno-strict-aliasing -mcpu=ultrasparc -mtune=ultrasparc -Wall -W -Wno-unused -c -D___DYNAMIC -D___SINGLE_HOST -o %s.o %s.c -save-temps"

as the command to compile the result of the Scheme->C translation.

So it takes almost exactly 20% longer in Scheme than in C++, and I'm pretty
sure that with a bit more pain I could get it to within 10%.  But I'm not that
hungry for pain at the moment.

Brad Lucier

(declare (standard-bindings)(extended-bindings)(block)
         (not safe)(flonum)(not interrupts-enabled))

(define-macro (FIX . body)
  `(let ()
    (declare (fixnum))
    ,@body))


(define  PI          3.14159265358979323846d0)
(define  J2000      2451545.0d0)
(define  JCENTURY   36525.0d0)
(define  JMILLENIA  365250.0d0)
(define  TWOPI      6.283185307179586)
(define  A2R        4.84813681109536e-6)
(define  R2H        3.819718634205488)
(define  R2D        57.29577951308232)
(define  GAUSSK     0.01720209895d0)

;; number of days to include in test
(define TEST-LOOPS 20)
(define TEST-LENGTH 36525)

;; sin and cos of j2000 mean obliquity (iau 1976)
(define  sineps  0.3977771559319137d0)
(define  coseps  0.9174820620691818d0)

(define amas
  '#f64(6023600.0d0  408523.5d0  328900.5d0  3098710.0d0  1047.355d0  3498.5d0  22869.0d0  19314.0d0 ))

;; tables giving the mean keplerian elements, limited to t**2 terms:
;; a       semi-major axis (au)
;; dlm     mean longitude (degree and arcsecond)
;; e       eccentricity
;; pi      longitude of the perihelion (degree and arcsecond)
;; dinc    inclination (degree and arcsecond)
;; omega   longitude of the ascending node (degree and arcsecond)

(define  a
  '#( #f64(   0.3870983098d0              0d0         0d0 )
       #f64(   0.7233298200d0              0d0         0d0 )
       #f64(   1.0000010178d0              0d0         0d0 )
       #f64(   1.5236793419d0          3d-10        0d0 )
       #f64(   5.2026032092d0      19132d-10  -39d-10)
       #f64(   9.5549091915d0  -0.0000213896d0   444d-10)
       #f64(  19.2184460618d0      -3716d-10  979d-10)
       #f64(  30.1103868694d0     -16635d-10  686d-10)))
       
(define  dlm
  '#( #f64( 252.25090552d0  5381016286.88982d0   -1.92789d0 )
       #f64( 181.97980085d0  2106641364.33548d0    0.59381d0 )
       #f64( 100.46645683d0  1295977422.83429d0   -2.04411d0 )
       #f64( 355.43299958d0   689050774.93988d0    0.94264d0 )
       #f64(  34.35151874d0   109256603.77991d0  -30.60378d0 )
       #f64(  50.07744430d0    43996098.55732d0   75.61614d0 )
       #f64( 314.05500511d0    15424811.93933d0   -1.75083d0 )
       #f64( 304.34866548d0     7865503.20744d0    0.21103d0 ))) 

(define  e
  '#( #f64(   0.2056317526d0   0.0002040653d0     -28349d-10 )
       #f64(   0.0067719164d0  -0.0004776521d0      98127d-10 )
       #f64(   0.0167086342d0  -0.0004203654d0  -0.0000126734d0 )
       #f64(   0.0934006477d0   0.0009048438d0     -80641d-10 )
       #f64(   0.0484979255d0   0.0016322542d0  -0.0000471366d0 )
       #f64(   0.0555481426d0  -0.0034664062d0  -0.0000643639d0 )
       #f64(   0.0463812221d0  -0.0002729293d0   0.0000078913d0 )
       #f64(   0.0094557470d0   0.0000603263d0             0d0 )))

(define  pi-a
  '#( #f64(  77.45611904d0   5719.11590d0    -4.83016d0 )
       #f64( 131.56370300d0    175.48640d0  -498.48184d0 )
       #f64( 102.93734808d0  11612.35290d0    53.27577d0 )
       #f64( 336.06023395d0  15980.45908d0   -62.32800d0 )
       #f64(  14.33120687d0   7758.75163d0   259.95938d0 )
       #f64(  93.05723748d0  20395.49439d0   190.25952d0 )
       #f64( 173.00529106d0   3215.56238d0   -34.09288d0 )
       #f64(  48.12027554d0   1050.71912d0    27.39717d0 )))

(define  dinc
  '#( #f64(   7.00498625d0  -214.25629d0    0.28977d0 )
       #f64(   3.39466189d0   -30.84437d0  -11.67836d0 )
       #f64(            0d0   469.97289d0   -3.35053d0 )
       #f64(   1.84972648d0  -293.31722d0   -8.11830d0 )
       #f64(   1.30326698d0   -71.55890d0   11.95297d0 )
       #f64(   2.48887878d0    91.85195d0  -17.66225d0 )
       #f64(   0.77319689d0   -60.72723d0    1.25759d0 )
       #f64(   1.76995259d0     8.12333d0    0.08135d0 )))

(define  omega
  '#( #f64(  48.33089304d0   -4515.21727d0   -31.79892d0 )
       #f64(  76.67992019d0  -10008.48154d0   -51.32614d0 )
       #f64( 174.87317577d0   -8679.27034d0    15.34191d0 )
       #f64(  49.55809321d0  -10620.90088d0  -230.57416d0 )
       #f64( 100.46440702d0    6362.03561d0   326.52178d0 )
       #f64( 113.66550252d0   -9240.19942d0   -66.23743d0 )
       #f64(  74.00595701d0    2669.15033d0   145.93964d0 )
       #f64( 131.78405702d0    -221.94322d0    -0.78728d0 ))) 

;; tables for trigonometric terms to be added to the mean elements
;; of the semi-major axes.
(define  kp
  '#( #f64( 69613.0d0   75645.0d0  88306.0d0  59899.0d0  15746.0d0  71087.0d0  142173.0d0   3086.0d0     0.0d0 )
        #f64( 21863.0d0   32794.0d0  26934.0d0  10931.0d0  26250.0d0  43725.0d0   53867.0d0  28939.0d0     0.0d0 )
        #f64( 16002.0d0   21863.0d0  32004.0d0  10931.0d0  14529.0d0  16368.0d0   15318.0d0  32794.0d0     0.0d0 )
        #f64(  6345.0d0    7818.0d0  15636.0d0   7077.0d0   8184.0d0  14163.0d0    1107.0d0   4872.0d0     0.0d0 )
        #f64(  1760.0d0    1454.0d0   1167.0d0    880.0d0    287.0d0   2640.0d0      19.0d0   2047.0d0  1454.0d0 )
        #f64(   574.0d0       0.0d0    880.0d0    287.0d0     19.0d0   1760.0d0    1167.0d0    306.0d0   574.0d0 )
        #f64(   204.0d0       0.0d0    177.0d0   1265.0d0      4.0d0    385.0d0     200.0d0    208.0d0   204.0d0 )
        #f64(     0.0d0     102.0d0    106.0d0      4.0d0     98.0d0   1367.0d0     487.0d0    204.0d0     0.0d0 ))) 

(define  ca
  '#( #f64(       4.0d0     -13.0d0     11.0d0     -9.0d0     -9.0d0     -3.0d0     -1.0d0      4.0d0     0.0d0 )
       #f64(    -156.0d0      59.0d0    -42.0d0      6.0d0     19.0d0    -20.0d0    -10.0d0    -12.0d0     0.0d0 )
       #f64(      64.0d0    -152.0d0     62.0d0     -8.0d0     32.0d0    -41.0d0     19.0d0    -11.0d0     0.0d0 )
       #f64(     124.0d0     621.0d0   -145.0d0    208.0d0     54.0d0    -57.0d0     30.0d0     15.0d0     0.0d0 )
       #f64(  -23437.0d0   -2634.0d0   6601.0d0   6259.0d0  -1507.0d0  -1821.0d0   2620.0d0  -2115.0d0 -1489.0d0 )
       #f64(   62911.0d0 -119919.0d0  79336.0d0  17814.0d0 -24241.0d0  12068.0d0   8306.0d0  -4893.0d0  8902.0d0 )
       #f64(  389061.0d0 -262125.0d0 -44088.0d0   8387.0d0 -22976.0d0  -2093.0d0   -615.0d0  -9720.0d0  6633.0d0 )
       #f64( -412235.0d0 -157046.0d0 -31430.0d0  37817.0d0  -9740.0d0    -13.0d0  -7449.0d0   9644.0d0     0.0d0 ))) 

(define  sa
  '#( #f64(     -29.0d0     -1.0d0      9.0d0      6.0d0     -6.0d0      5.0d0      4.0d0      0.0d0     0.0d0 )
       #f64(     -48.0d0   -125.0d0    -26.0d0    -37.0d0     18.0d0    -13.0d0    -20.0d0     -2.0d0     0.0d0 )
       #f64(    -150.0d0    -46.0d0     68.0d0     54.0d0     14.0d0     24.0d0    -28.0d0     22.0d0     0.0d0 )
       #f64(    -621.0d0    532.0d0   -694.0d0    -20.0d0    192.0d0    -94.0d0     71.0d0    -73.0d0     0.0d0 )
       #f64(  -14614.0d0 -19828.0d0  -5869.0d0   1881.0d0  -4372.0d0  -2255.0d0    782.0d0    930.0d0   913.0d0 )
       #f64(  139737.0d0      0.0d0  24667.0d0  51123.0d0  -5102.0d0   7429.0d0  -4095.0d0  -1976.0d0 -9566.0d0 )
       #f64( -138081.0d0      0.0d0  37205.0d0 -49039.0d0 -41901.0d0 -33872.0d0 -27037.0d0 -12474.0d0 18797.0d0 )
       #f64(       0.0d0  28492.0d0 133236.0d0  69654.0d0  52322.0d0 -49577.0d0 -26430.0d0  -3593.0d0     0.0d0 ))) 

;; tables giving the trigonometric terms to be added to the mean
;;  elements of the mean longitudes.
(define  kq
  '#( #f64(  3086.0d0  15746.0d0  69613.0d0  59899.0d0  75645.0d0  88306.0d0  12661.0d0  2658.0d0   0.0d0    0.0d0 )
       #f64( 21863.0d0  32794.0d0  10931.0d0     73.0d0   4387.0d0  26934.0d0   1473.0d0  2157.0d0   0.0d0    0.0d0 )
       #f64(    10.0d0  16002.0d0  21863.0d0  10931.0d0   1473.0d0  32004.0d0   4387.0d0    73.0d0   0.0d0    0.0d0 )
       #f64(    10.0d0   6345.0d0   7818.0d0   1107.0d0  15636.0d0   7077.0d0   8184.0d0   532.0d0  10.0d0    0.0d0 )
       #f64(    19.0d0   1760.0d0   1454.0d0    287.0d0   1167.0d0    880.0d0    574.0d0  2640.0d0  19.0d0 1454.0d0 )
       #f64(    19.0d0    574.0d0    287.0d0    306.0d0   1760.0d0     12.0d0     31.0d0    38.0d0  19.0d0  574.0d0 )
       #f64(     4.0d0    204.0d0    177.0d0      8.0d0     31.0d0    200.0d0   1265.0d0   102.0d0   4.0d0  204.0d0 )
       #f64(     4.0d0    102.0d0    106.0d0      8.0d0     98.0d0   1367.0d0    487.0d0   204.0d0   4.0d0  102.0d0 ))) 

(define  cl
  '#( #f64(      21.0d0    -95.0d0  -157.0d0    41.0d0    -5.0d0    42.0d0    23.0d0    30.0d0      0.0d0     0.0d0 )
       #f64(    -160.0d0   -313.0d0  -235.0d0    60.0d0   -74.0d0   -76.0d0   -27.0d0    34.0d0      0.0d0     0.0d0 )
       #f64(    -325.0d0   -322.0d0   -79.0d0   232.0d0   -52.0d0    97.0d0    55.0d0   -41.0d0      0.0d0     0.0d0 )
       #f64(    2268.0d0   -979.0d0   802.0d0   602.0d0  -668.0d0   -33.0d0   345.0d0   201.0d0    -55.0d0     0.0d0 )
       #f64(    7610.0d0  -4997.0d0 -7689.0d0 -5841.0d0 -2617.0d0  1115.0d0  -748.0d0  -607.0d0   6074.0d0   354.0d0 )
       #f64(  -18549.0d0  30125.0d0 20012.0d0  -730.0d0   824.0d0    23.0d0  1289.0d0  -352.0d0 -14767.0d0 -2062.0d0 )
       #f64( -135245.0d0 -14594.0d0  4197.0d0 -4030.0d0 -5630.0d0 -2898.0d0  2540.0d0  -306.0d0   2939.0d0  1986.0d0 )
       #f64(   89948.0d0   2103.0d0  8963.0d0  2695.0d0  3682.0d0  1648.0d0   866.0d0  -154.0d0  -1963.0d0  -283.0d0 ))) 

(define  sl
  '#( #f64(   -342.0d0    136.0d0   -23.0d0    62.0d0    66.0d0   -52.0d0   -33.0d0    17.0d0      0.0d0     0.0d0 )
       #f64(    524.0d0   -149.0d0   -35.0d0   117.0d0   151.0d0   122.0d0   -71.0d0   -62.0d0      0.0d0     0.0d0 )
       #f64(   -105.0d0   -137.0d0   258.0d0    35.0d0  -116.0d0   -88.0d0  -112.0d0   -80.0d0      0.0d0     0.0d0 )
       #f64(    854.0d0   -205.0d0  -936.0d0  -240.0d0   140.0d0  -341.0d0   -97.0d0  -232.0d0    536.0d0     0.0d0 )
       #f64( -56980.0d0   8016.0d0  1012.0d0  1448.0d0 -3024.0d0 -3710.0d0   318.0d0   503.0d0   3767.0d0   577.0d0 )
       #f64( 138606.0d0 -13478.0d0 -4964.0d0  1441.0d0 -1319.0d0 -1482.0d0   427.0d0  1236.0d0  -9167.0d0 -1918.0d0 )
       #f64(  71234.0d0 -41116.0d0  5334.0d0 -4935.0d0 -1848.0d0    66.0d0   434.0d0 -1748.0d0   3780.0d0  -701.0d0 )
       #f64( -47645.0d0  11647.0d0  2166.0d0  3194.0d0   679.0d0     0.0d0  -244.0d0  -419.0d0  -2531.0d0    48.0d0 ))) 

;;---------------------------------------------------------------------------
;; Normalize angle into the range -pi <= A < +pi.

(define (anpm a)
  (- a (* TWOPI (round (/ a TWOPI)))))

(define (fmod a b)
  (- a (* b (##flonum.<-fixnum (##flonum.->fixnum (/ a b))))))

;;---------------------------------------------------------------------------    
;; The reference frame is equatorial and is with respect to the
;;    mean equator and equinox of epoch j2000.
(define (planetpv epoch np pv)

  (let ((kpnp    (vector-ref kp np))
        (kqnp    (vector-ref kq np))
        (canp    (vector-ref ca np))
        (sanp    (vector-ref sa np))
        (clnp    (vector-ref cl np))
        (slnp    (vector-ref sl np)))
    
    
    ;; time: julian millennia since j2000.
    (let ((t  (/ (+ (- (f64vector-ref epoch 0) J2000) (f64vector-ref epoch 1))  JMILLENIA)))
      
      ;; compute the mean elements.
      (let ((da (let ((anp (vector-ref a np)))
                  (+ (f64vector-ref anp 0) 
                     (* (+ (f64vector-ref anp 1)
                           (* (f64vector-ref anp 2)
                              t))
                        t))))
            (dl (let ((dlmnp (vector-ref dlm np)))
                  (* (+ (* 3600.0d0 (f64vector-ref dlmnp 0))
                        (* (+ (f64vector-ref dlmnp 1) 
                              (* (f64vector-ref dlmnp 2)
                                 t))
                           t))
                     A2R)))
            (de (let ((enp (vector-ref e np)))
                  (+ (f64vector-ref enp 0)
                     (* (+ (f64vector-ref enp 1)
                           (* (f64vector-ref enp 2)
                              t))
                        t))))
            (dp (let ((pi-anp (vector-ref pi-a np)))
                  (anpm (* (+ (* 3600.0d0 (f64vector-ref pi-anp 0))
                              (* (+ (f64vector-ref pi-anp 1) 
                                    (* (f64vector-ref pi-anp 2)
                                       t)) 
                                 t))
                           A2R))))
            (di (let ((dincnp (vector-ref dinc np)))
                  (* (+ (* 3600.0d0 (f64vector-ref dincnp 0))
                        (* (+ (f64vector-ref dincnp 1) 
                              (* (f64vector-ref dincnp 2)
                                 t)) 
                           t)) 
                     A2R)))
            (doh (let ((omeganp (vector-ref omega np)))
                   (anpm (* (+ (* 3600.0d0 (f64vector-ref omeganp 0))
                               (* (+ (f64vector-ref omeganp 1)
                                     (* (f64vector-ref omeganp 2) 
                                        t))
                                  t))
                            A2R)))))
        ;; apply the trigonometric terms.
        (let ((dmu  (* 0.35953620d0  t)))
          (let loop1 ((k 0)
                      (da da)
                      (dl dl))
            (if (FIX (< k 8))
                (let ((arga (* (f64vector-ref kpnp k) dmu))
                      (argl (* (f64vector-ref kqnp k) dmu)))
                  (loop1 (FIX (+ k 1))
                         (+ da  (* (+ (* (f64vector-ref canp k) (cos arga))
                                      (* (f64vector-ref sanp k) (sin arga)))
                                   0.0000001d0))
                         (+ dl  (* (+ (* (f64vector-ref clnp k) (cos argl))
                                      (* (f64vector-ref slnp k) (sin argl)))
                                   0.0000001d0))))
                (let ((da (let ((arga (* (f64vector-ref kpnp 8) dmu)))
                            (+ da (* t (+ (* (f64vector-ref canp 8) (cos arga))
                                          (* (f64vector-ref sanp 8) (sin arga)))
                                     0.0000001d0)))))
                  (let loop2 ((k 8)
                              (dl dl))
                    (if (FIX (< k 10))
                        (let ((argl (* (f64vector-ref kqnp k) dmu)))
                          (loop2 (FIX (+ k 1))
                                 (+ dl (* t (+ (* (f64vector-ref clnp k) (cos argl))
                                               (* (f64vector-ref slnp k) (sin argl)))
                                          0.0000001d0))))
                        
                        ;; iterative solution of kepler's equation to get eccentric anomaly.
                        (let* ((dl (fmod dl TWOPI))
                               (am (- dl dp))
                               (ae (+ am (* de (sin am)))))
                          (let loop3 ((k 0)
                                      (ae ae))
                            (let* ((dae  (/ (- (+ am (* de (sin ae))) ae) (- 1.0d0 (* de (cos ae)))))
                                   (ae (+ ae dae)))
                              (if (and (FIX (< k 10))
                                       (>= (abs dae) 1d-12))
                                  (loop3 (FIX (+ k 1))
                                         ae)
                                  (let* ((ae2 (/ ae 2.0))
                                         (at (* 2.0d0 (atan (* (sqrt (/ (+ 1.0d0 de)
                                                                        (- 1.0d0 de)))
                                                               (sin ae2))
                                                            (cos ae2)))))
                                    ;; distance (au) and speed (radians per day).
                                    (let ((r (* da  (- 1.0d0 (* de (cos ae)))))
                                          (v (* GAUSSK (sqrt (/ (+ 1.0d0
                                                                   (/ 1.0d0
                                                                      (f64vector-ref amas np)))
                                                                (* da da da))))))
                                      (let* ((si2   (sin (/ di  2.0d0)))
                                             (xq    (* si2 (cos doh)))
                                             (xp    (* si2 (sin doh)))
                                             (tl    (+ at  dp))
                                             (xsw   (sin tl))
                                             (xcw   (cos tl))
                                             (xm2   (* 2.0d0 (- (* xp xcw) (* xq xsw))))
                                             (xf    (/ da (sqrt (- 1.0d0 (* de de)))))
                                             (ci2   (cos (/ di 2.0d0)))
                                             (xms   (* (+ (* de (sin dp)) xsw) xf))
                                             (xmc   (* (+ (* de (cos dp)) xcw) xf))
                                             (xpxq2 (* 2.0d0  xp xq)))
                                        
                                        ;; position (j2000 ecliptic x,y,z in au).
                                        (let ((x (* r  (- xcw (* xm2 xp))))
                                              (y (* r  (+ xsw (* xm2 xq))))
                                              (z (* r  (* (- xm2) ci2)))
                                              (pv0 (vector-ref pv 0)))
                                          
                                          ;; rotate to equatorial.
                                          (f64vector-set! pv0 0  x)
                                          (f64vector-set! pv0 1 (- (* y coseps) (* z sineps)))
                                          (f64vector-set! pv0 2 (+ (* y sineps) (* z coseps))))
                                        
                                        ;; velocity (j2000 ecliptic xdot,ydot,zdot in au/d).
                                        (let ((x (* v (+ (* (+ -1.0d0 (* 2.0d0 xp xp)) xms) (* xpxq2 xmc))))
                                              (y (* v (- (* (-  1.0d0 (* 2.0d0 xq xq)) xmc) (* xpxq2 xms))))
                                              (z (* v (* 2.0d0 ci2 (+ (* xp xms) (* xq  xmc)))))
                                              (pv1 (vector-ref pv 1)))
                                          
                                          ;; rotate to equatorial.
                                          (f64vector-set! pv1 0  x)
                                          (f64vector-set! pv1 1 (- (* y coseps) (* z sineps)))
                                          (f64vector-set! pv1 2 (+ (* y sineps) (* z coseps)))))))))))))))))))))

;;---------------------------------------------------------------------------
;; Computes RA, Declination, and distance from a state vector returned by
;; planetpv.
(define (radecdist state rdd)
  (let ((state0 (vector-ref state 0)))
    
    (define (square x)
      (* x x))
    
    ;; distance
    (f64vector-set! rdd 2 (sqrt (+ (square (f64vector-ref state0 0))
                                   (square (f64vector-ref state0 1))
                                   (square (f64vector-ref state0 2)))))
    ;; RA
    
    (f64vector-set! rdd 0 (* (atan (f64vector-ref state0 1)
                                   (f64vector-ref state0 0))
                             R2H))
    (if (< (f64vector-ref rdd 0) 0.0)
        (f64vector-set! rdd 0 (+ (f64vector-ref rdd 0) 24.0)))
    
    ;; Declination
    (f64vector-set! rdd 1 (* (asin (/ (f64vector-ref state0 2)
                                      (f64vector-ref rdd 2)))
                             R2D))))

;;---------------------------------------------------------------------------
;; Entry point
;; Calculate RA and Dec for noon on every day in 1900-2100
;;main
(define (main)
  (let ((jd (f64vector 0.0 0.0))
        (pv (vector (f64vector 0.0 0.0 0.0)
                    (f64vector 0.0 0.0 0.0)))
        (position (f64vector 0.0 0.0 0.0)))
    (do ((i 0 (FIX (+ i 1))))
        ((FIX (>= i TEST-LOOPS)))
      (f64vector-set! jd 0 J2000)
      (f64vector-set! jd 1 0.0)
      (do ((n 0 (FIX (+ n 1))))
          ((FIX (>= n TEST-LENGTH)))
        (f64vector-set! jd 0 (+ (f64vector-ref jd 0)
                                1.0))
        (do ((p 0 (FIX (+ p 1))))
            ((FIX (>= p 8)))
          (planetpv jd p pv)
          (radecdist pv position))))
    (pp (f64vector-ref position 0))))
From: Ray Dillinger
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <40475C1F.587CB6D7@sonic.net>
Bradley J Lucier wrote:
> 

> for the original (I added a line to output the position vector at the end) and
> 
> > (time (main))
> 11.05307874040291
> (time (main))
>     96057 ms real time
>     95900 ms cpu time (95830 user, 70 system)
>     653 collections accounting for 434 ms real time (450 user, 30 system)
>     6708840672 bytes allocated
>     no minor faults
>     no major faults
> 
> for the following scheme code using Gambit-C 4.0alpha8 and
> 
> "gcc -fPIC -I/pkgs/gambit/include -O1 -fschedule-insns2 -fno-math-errno -fno-strict-aliasing -mcpu=ultrasparc -mtune=ultrasparc -Wall -W -Wno-unused -c -D___DYNAMIC -D___SINGLE_HOST -o %s.o %s.c -save-temps"
> 
> as the command to compile the result of the Scheme->C translation.
> 
> So it takes almost exactly 20% longer in Scheme than in C++, and I'm pretty
> sure that with a bit more pain I could get it to within 10%.  But I'm not that
> hungry for pain at the moment.
> 
> Brad Lucier


Having a severe look at the algorithm -- it's possible to write this in a way 
that does no allocation. That would shave 434 ms of garbage collection off the
total, and an unknown number of microseconds of memory allocation and indexing. 
I'm not going to bother, but I'm just making the observation. 

				Bear
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj0r7w4ipnz.fsf@five-percent-nation.mit.edu>
By modifying Brian Mastenbrook's version, I've made a slight
additional improvement (between 1 and 3%, averaged over a large number
of runs on two different P4 systems at varying speeds, the improvement
was over the -1d-arrays version, which was faster than the +1d-arrays
version on both systems I have).  The basic idea is that I represent
the 2D-arrays as a 1D-array of 1D-arrays of double-floats.  This is
effective because each call to planetpv only accesses one column
(column np) of all the 2D-arrays.  I modified Brian's +1d-arrays aref
macro to do what I want; to make it work, I also wrapped the most of
the body of planetpv in a new with-array-slice macro, which binds a
variable +a+-slice to the appropriate slice of the variable +a+; the
calls to aref are transformed to access the slice and ignore the first
dimension.

This was a good learning experience for me in terms of understanding
and modifying a moderately complex macro --- thanks Brian, I learned a
bunch from your code.  I think this variant is a little more
error-prone than the +1d-arrays macro Brian wrote, because it requires
the extra wrapping of the with-array-slice.  In both cases, you'd have
to have the name of the array you're aref'ing at compile time in order
to translate the aref call, so neither variant can really be used as
is in code that passes arrays into functions rather than naming them
globally.

So what have we learned?  We confirmed what we pretty much knew: you
can write a C program in CL, at which point the relative speed of your
C and CL versions will depend on the relative quality of the code
generation.  It's worthwhile to know the specifics of your
implementation (for instance, it seems that a substantial amount of
the last-mile speed up for this benchmark on CMUCL came from figuring
out ftruncate could be fast inside a macro, but not inside an inline
function).

What would be even more interesting to me would be an example more
like what I feel I experience anecdotally --- that I write programs
that are a bit slower than C (maybe 25% to 50%), but they're much more
flexible, more abstract, cleaner, easily modifiable, and so on.  I
don't really feel like the CL version of almabench we currently have
shows any of the benefits of CL.

I include my code below in case anyone wants it.

rif

;;;; ALMABENCH 1.0.1
;;;; Common Lisp version

;;; A number-crunching benchmark designed for cross-language and vendor
;;; comparisons.

;;; Written by Scott Robert Ladd.
;;; Naively translated to Common Lisp by:
;;; - Pascal Bourguignon <····@thalassa.informatimago.com>
;;; Improvements suggested via comp.lang.lisp by:
;;; - Christian Lynbech <·················@ericsson.com>
;;; - Marco Antoniotti <·······@cs.nyu.edu>
;;; - Gareth McCaughan <················@pobox.com>
;;; Improved for CMUCL and SBCL by:
;;; - Raymond Toy <···@rtp.ericsson.se>
;;; - Brian Mastenbrook <········@cs.indiana.edu>

;;; No rights reserved. This is public domain software, for use by anyone.

;;; This program calculates the daily ephemeris (at noon) for the years
;;; 2000-2099 using an algorithm developed by J.L. Simon, P. Bretagnon, J.
;;; Chapront, M. Chapront-Touze, G. Francou and J. Laskar of the Bureau des
;;; Longitudes, Paris, France), as detailed in Astronomy & Astrophysics
;;; 282, 663 (1994)

;;; Note that the code herein is design for the purpose of testing 
;;; computational performance error handling and other such "niceties"
;;; is virtually non-existent.

;;; Actual (and oft-updated) benchmark results can be found at:
;;; http://www.coyotegulch.com

;;; Please do not use this information or algorithm in any way that might
;;; upset the balance of the universe or otherwise cause planets to impact
;;; upon one another.

(defpackage :almabench-rif (:use :common-lisp)
  (:shadow :t :sin :cos :aref :defconstant) (:export :main))
(in-package :almabench-rif)

(declaim (optimize (speed 3) (space 0) (safety 0) (debug 0) (compilation-speed 0)))
;; (declaim (optimize (speed 1) (space 0) (safety 2) (debug 3) (compilation-speed 0)))

(defmacro defconstant (&rest args)
  `(eval-when (:compile-toplevel :load-toplevel)
     (cl:defconstant ,@args)))

(defconstant +pi+ 3.14159265358979323846d0)
(defconstant +j2000+ 2451545.0d0)
(defconstant +jcentury+ 36525.0d0)
(defconstant +jmillenia+ 365250.0d0)
(defconstant +twopi+ (* 2.0d0 +pi+))
(defconstant +a2r+ (/ +pi+ 648000.0d0))
(defconstant +r2h+ (/ 12.0d0 +pi+))
(defconstant +r2d+ (/ 180.0d0 +pi+))
(defconstant +gaussk+ 0.01720209895d0)

;; number of days to include in test
(defconstant +test-loops+ 20)
(defconstant +test-length+ 36525)

;; sin and cos of j2000 mean obliquity (iau 1976)
(defconstant +sineps+ 0.3977771559319137d0)
(defconstant +coseps+ 0.9174820620691818d0)

(eval-when (:compile-toplevel :load-toplevel)
  (defun length-2-listp (l) (and (listp l) (eql (length l) 2)))
  (defun lookup (value alist) (cdr (assoc value alist)))

  (let ((array-names nil)
	(slice-syms nil))

    (defmacro defconst-double-array (name dimensions contents)
      (when (length-2-listp dimensions)
	(push
	 (cons name (make-symbol (concatenate 'string (symbol-name name) "-SLICE")))
	 slice-syms)
	
	(push
	 (cons name
	       (lambda (dim1 dim2)
		 (declare (ignore dim1))
		 `(the double-float (cl:aref ,(lookup name slice-syms) (the fixnum ,dim2)))))
	 array-names))
	    
      `(defconstant ,name
	 (if (boundp ',name) (symbol-value ',name)
	   ,(if (length-2-listp dimensions)
		`(make-array 
		  ,(first dimensions)
		  :initial-contents ',(mapcar (lambda (c)
						(let ((l (second dimensions)))
						  (declare (type fixnum l))
						  (make-array l
							      :element-type 'double-float
							      :initial-contents c)))
					      contents))
	      `(make-array
		',dimensions :element-type 'double-float :initial-contents ',contents)))))
    
    (defmacro with-array-slices ((index &rest arrays) &body body)
      `(let (,@(mapcar (lambda (a)
			 
			 `(,(lookup a slice-syms)
			   (the (simple-array double-float) (cl:aref ,a (the fixnum ,index)))))
		       arrays))
	 ,@body))
  
    (defmacro aref (array &rest dims)
      (if (assoc (the symbol array) array-names)
	  (apply (the function (cdr (assoc array array-names))) dims)
	`(cl:aref ,array ,@dims)))
    )
  )
    
    

(defconst-double-array +amas+ 8
  (6023600.0d0 408523.5d0 328900.5d0 3098710.0d0 1047.355d0 3498.5d0 22869.0d0 19314.0d0))

(defconst-double-array +a+ (8 3)
  ((0.3870983098d0 0d0 0d0)
   (0.7233298200d0 0d0 0d0)
   (1.0000010178d0 0d0 0d0)
   (1.5236793419d0 3d-10 0d0)
   (5.2026032092d0 19132d-10 -39d-10)
   (9.5549091915d0 -0.0000213896d0 444d-10)
   (19.2184460618d0 -3716d-10 979d-10)
   (30.1103868694d0 -16635d-10 686d-10)))

(defconst-double-array +dlm+ (8 3) 
  ((252.25090552d0 5381016286.88982d0 -1.92789d0)
   (181.97980085d0 2106641364.33548d0 0.59381d0)
   (100.46645683d0 1295977422.83429d0 -2.04411d0)
   (355.43299958d0 689050774.93988d0 0.94264d0)
   (34.35151874d0 109256603.77991d0 -30.60378d0)
   (50.07744430d0 43996098.55732d0 75.61614d0)
   (314.05500511d0 15424811.93933d0 -1.75083d0)
   (304.34866548d0 7865503.20744d0 0.21103d0)))

(defconst-double-array +e+ (8 3)
  ((0.2056317526d0 0.0002040653d0 -28349d-10)
   (0.0067719164d0 -0.0004776521d0 98127d-10)
   (0.0167086342d0 -0.0004203654d0 -0.0000126734d0)
   (0.0934006477d0 0.0009048438d0 -80641d-10)
   (0.0484979255d0 0.0016322542d0 -0.0000471366d0)
   (0.0555481426d0 -0.0034664062d0 -0.0000643639d0)
   (0.0463812221d0 -0.0002729293d0 0.0000078913d0)
   (0.0094557470d0 0.0000603263d0 0d0)))

(defconst-double-array +pi-a+ (8 3)
  ((77.45611904d0 5719.11590d0 -4.83016d0)
   (131.56370300d0 175.48640d0 -498.48184d0)
   (102.93734808d0 11612.35290d0 53.27577d0)
   (336.06023395d0 15980.45908d0 -62.32800d0)
   (14.33120687d0 7758.75163d0 259.95938d0)
   (93.05723748d0 20395.49439d0 190.25952d0)
   (173.00529106d0 3215.56238d0 -34.09288d0)
   (48.12027554d0 1050.71912d0 27.39717d0)))

(defconst-double-array +dinc+ (8 3)
  ((7.00498625d0 -214.25629d0 0.28977d0)
   (3.39466189d0 -30.84437d0 -11.67836d0)
   (0d0 469.97289d0 -3.35053d0)
   (1.84972648d0 -293.31722d0 -8.11830d0)
   (1.30326698d0 -71.55890d0 11.95297d0)
   (2.48887878d0 91.85195d0 -17.66225d0)
   (0.77319689d0 -60.72723d0 1.25759d0)
   (1.76995259d0 8.12333d0 0.08135d0)))

(defconst-double-array +omega+ (8 3)
  ((48.33089304d0 -4515.21727d0 -31.79892d0)
   (76.67992019d0 -10008.48154d0 -51.32614d0)
   (174.87317577d0 -8679.27034d0 15.34191d0)
   (49.55809321d0 -10620.90088d0 -230.57416d0)
   (100.46440702d0 6362.03561d0 326.52178d0)
   (113.66550252d0 -9240.19942d0 -66.23743d0)
   (74.00595701d0 2669.15033d0 145.93964d0)
   (131.78405702d0 -221.94322d0 -0.78728d0)))

;; tables for trigonometric terms to be added to the mean elements
;; of the semi-major axes.

(defconst-double-array +kp+ (8 9)
  ((69613.0d0 75645.0d0 88306.0d0 59899.0d0 15746.0d0 71087.0d0 142173.0d0 3086.0d0 0.0d0)
   (21863.0d0 32794.0d0 26934.0d0 10931.0d0 26250.0d0 43725.0d0 53867.0d0 28939.0d0 0.0d0)
   (16002.0d0 21863.0d0 32004.0d0 10931.0d0 14529.0d0 16368.0d0 15318.0d0 32794.0d0 0.0d0)
   (6345.0d0 7818.0d0 15636.0d0 7077.0d0 8184.0d0 14163.0d0 1107.0d0 4872.0d0 0.0d0)
   (1760.0d0 1454.0d0 1167.0d0 880.0d0 287.0d0 2640.0d0 19.0d0 2047.0d0 1454.0d0)
   (574.0d0 0.0d0 880.0d0 287.0d0 19.0d0 1760.0d0 1167.0d0 306.0d0 574.0d0)
   (204.0d0 0.0d0 177.0d0 1265.0d0 4.0d0 385.0d0 200.0d0 208.0d0 204.0d0)
   (0.0d0 102.0d0 106.0d0 4.0d0 98.0d0 1367.0d0 487.0d0 204.0d0 0.0d0)))

(defconst-double-array +ca+ (8 9)
  ((4.0d0 -13.0d0 11.0d0 -9.0d0 -9.0d0 -3.0d0 -1.0d0 4.0d0 0.0d0)
   (-156.0d0 59.0d0 -42.0d0 6.0d0 19.0d0 -20.0d0 -10.0d0 -12.0d0 0.0d0)
   (64.0d0 -152.0d0 62.0d0 -8.0d0 32.0d0 -41.0d0 19.0d0 -11.0d0 0.0d0)
   (124.0d0 621.0d0 -145.0d0 208.0d0 54.0d0 -57.0d0 30.0d0 15.0d0 0.0d0)
   (-23437.0d0 -2634.0d0 6601.0d0 6259.0d0 -1507.0d0 -1821.0d0 2620.0d0 -2115.0d0 -1489.0d0)
   (62911.0d0 -119919.0d0 79336.0d0 17814.0d0 -24241.0d0 12068.0d0 8306.0d0 -4893.0d0 8902.0d0)
   (389061.0d0 -262125.0d0 -44088.0d0 8387.0d0 -22976.0d0 -2093.0d0 -615.0d0 -9720.0d0 6633.0d0)
   (-412235.0d0 -157046.0d0 -31430.0d0 37817.0d0 -9740.0d0 -13.0d0 -7449.0d0 9644.0d0 0.0d0)))

(defconst-double-array +sa+ (8 9)
  ((-29.0d0 -1.0d0 9.0d0 6.0d0 -6.0d0 5.0d0 4.0d0 0.0d0 0.0d0)
   (-48.0d0 -125.0d0 -26.0d0 -37.0d0 18.0d0 -13.0d0 -20.0d0 -2.0d0 0.0d0)
   (-150.0d0 -46.0d0 68.0d0 54.0d0 14.0d0 24.0d0 -28.0d0 22.0d0 0.0d0)
   (-621.0d0 532.0d0 -694.0d0 -20.0d0 192.0d0 -94.0d0 71.0d0 -73.0d0 0.0d0)
   (-14614.0d0 -19828.0d0 -5869.0d0 1881.0d0 -4372.0d0 -2255.0d0 782.0d0 930.0d0 913.0d0)
   (139737.0d0 0.0d0 24667.0d0 51123.0d0 -5102.0d0 7429.0d0 -4095.0d0 -1976.0d0 -9566.0d0)
   (-138081.0d0 0.0d0 37205.0d0 -49039.0d0 -41901.0d0 -33872.0d0 -27037.0d0 -12474.0d0 18797.0d0)
   (0.0d0 28492.0d0 133236.0d0 69654.0d0 52322.0d0 -49577.0d0 -26430.0d0 -3593.0d0 0.0d0)))

;; tables giving the trigonometric terms to be added to the mean
;; elements of the mean longitudes.

(defconst-double-array +kq+ (8 10)
  ((3086.0d0 15746.0d0 69613.0d0 59899.0d0 75645.0d0 88306.0d0 12661.0d0 2658.0d0 0.0d0 0.0d0)
   (21863.0d0 32794.0d0 10931.0d0 73.0d0 4387.0d0 26934.0d0 1473.0d0 2157.0d0 0.0d0 0.0d0)
   (10.0d0 16002.0d0 21863.0d0 10931.0d0 1473.0d0 32004.0d0 4387.0d0 73.0d0 0.0d0 0.0d0)
   (10.0d0 6345.0d0 7818.0d0 1107.0d0 15636.0d0 7077.0d0 8184.0d0 532.0d0 10.0d0 0.0d0)
   (19.0d0 1760.0d0 1454.0d0 287.0d0 1167.0d0 880.0d0 574.0d0 2640.0d0 19.0d0 1454.0d0)
   (19.0d0 574.0d0 287.0d0 306.0d0 1760.0d0 12.0d0 31.0d0 38.0d0 19.0d0 574.0d0)
   (4.0d0 204.0d0 177.0d0 8.0d0 31.0d0 200.0d0 1265.0d0 102.0d0 4.0d0 204.0d0)
   (4.0d0 102.0d0 106.0d0 8.0d0 98.0d0 1367.0d0 487.0d0 204.0d0 4.0d0 102.0d0)))

(defconst-double-array +cl+ (8 10)
  ((21.0d0 -95.0d0 -157.0d0 41.0d0 -5.0d0 42.0d0 23.0d0 30.0d0 0.0d0 0.0d0)
   (-160.0d0 -313.0d0 -235.0d0 60.0d0 -74.0d0 -76.0d0 -27.0d0 34.0d0 0.0d0 0.0d0)
   (-325.0d0 -322.0d0 -79.0d0 232.0d0 -52.0d0 97.0d0 55.0d0 -41.0d0 0.0d0 0.0d0)
   (2268.0d0 -979.0d0 802.0d0 602.0d0 -668.0d0 -33.0d0 345.0d0 201.0d0 -55.0d0 0.0d0)
   (7610.0d0 -4997.0d0 -7689.0d0 -5841.0d0 -2617.0d0 1115.0d0 -748.0d0 -607.0d0 6074.0d0 354.0d0)
   (-18549.0d0 30125.0d0 20012.0d0 -730.0d0 824.0d0 23.0d0 1289.0d0 -352.0d0 -14767.0d0 -2062.0d0)
   (-135245.0d0 -14594.0d0 4197.0d0 -4030.0d0 -5630.0d0 -2898.0d0 2540.0d0 -306.0d0 2939.0d0 1986.0d0)
   (89948.0d0 2103.0d0 8963.0d0 2695.0d0 3682.0d0 1648.0d0 866.0d0 -154.0d0 -1963.0d0 -283.0d0)))

(defconst-double-array +sl+ (8 10)
  ((-342.0d0 136.0d0 -23.0d0 62.0d0 66.0d0 -52.0d0 -33.0d0 17.0d0 0.0d0 0.0d0)
   (524.0d0 -149.0d0 -35.0d0 117.0d0 151.0d0 122.0d0 -71.0d0 -62.0d0 0.0d0 0.0d0)
   (-105.0d0 -137.0d0 258.0d0 35.0d0 -116.0d0 -88.0d0 -112.0d0 -80.0d0 0.0d0 0.0d0)
   (854.0d0 -205.0d0 -936.0d0 -240.0d0 140.0d0 -341.0d0 -97.0d0 -232.0d0 536.0d0 0.0d0)
   (-56980.0d0 8016.0d0 1012.0d0 1448.0d0 -3024.0d0 -3710.0d0 318.0d0 503.0d0 3767.0d0 577.0d0)
   (138606.0d0 -13478.0d0 -4964.0d0 1441.0d0 -1319.0d0 -1482.0d0 427.0d0 1236.0d0 -9167.0d0 -1918.0d0)
   (71234.0d0 -41116.0d0 5334.0d0 -4935.0d0 -1848.0d0 66.0d0 434.0d0 -1748.0d0 3780.0d0 -701.0d0)
   (-47645.0d0 11647.0d0 2166.0d0 3194.0d0 679.0d0 0.0d0 -244.0d0 -419.0d0 -2531.0d0 48.0d0)))

;; Normalize angle into the range -pi <= A < +pi.

(defmacro frem (x y)
  `(progn
    (setf q (the double-float (ftruncate (the (double-float -1d6 1d6) (/ ,x ,y)))))
    (- ,x (* q ,y))))

(defmacro anpm (aa)
  (let ((a (gensym)))
    `(let* ((,a ,aa)
            (q (frem ,a +twopi+)))
     (declare (type double-float ,a q))
     (the (double-float -4d0 4d0)
       (if (> (abs q) +pi+)
           (- q (if (< ,a 0d0) ,(- +twopi+) +twopi+))
           q)))))

(defconstant float-bound-low -9.223372036854777d+18)
(defconstant float-bound-high 9.223372036854777d+18)

(defmacro sin (arg)
  `(cl:sin (the (double-float ,float-bound-low ,float-bound-high) ,arg)))

(defmacro cos (arg)
  `(cl:cos (the (double-float ,float-bound-low ,float-bound-high) ,arg)))

;; The reference frame is equatorial and is with respect to the
;; mean equator and equinox of epoch j2000.

(declaim (inline planetpv))
(defun planetpv (epoch np pv)
  (declare (type (simple-array double-float (2)) epoch)
           (type fixnum np)
           (type (simple-array double-float (2 3)) pv))
  ;; time: julian millennia since j2000.

  (with-array-slices (np +a+ +dlm+ +e+ +pi-a+ +dinc+ +omega+ +kp+ +kq+ +ca+ +cl+ +sa+ +sl+)
     (let* ((q 0d0)
	    (t (/ (+ (- (aref epoch 0) +j2000+) (aref epoch 1)) +jmillenia+))
	    ;; compute the mean elements.
	    (da (+ (aref +a+ np 0) (* (+ (aref +a+ np 1) (* (aref +a+ np 2) t)) t)))
	    (dl (* (+ (* 3600.0d0 (aref +dlm+ np 0)) (* (+ (aref +dlm+ np 1) (* (aref +dlm+ np 2) t)) t)) +a2r+))
	    (de (+ (aref +e+ np 0) (* (+ (aref +e+ np 1) (* (aref +e+ np 2) t)) t)))
	    (dp (let ((temp (* (+ (* 3600.0d0 (aref +pi-a+ np 0)) (* (+ (aref +pi-a+ np 1) (* (aref +pi-a+ np 2) t)) t)) +a2r+))) (anpm temp)))
	    (di (* (+ (* 3600.0d0 (aref +dinc+ np 0)) (* (+ (aref +dinc+ np 1) (* (aref +dinc+ np 2) t)) t)) +a2r+))
	    (doh (anpm (* (+ (* 3600.0d0 (aref +omega+ np 0)) (* (+ (aref +omega+ np 1) (* (aref +omega+ np 2) t)) t)) +a2r+)))
	    ;; apply the trigonometric terms.
	    (dmu (* 0.35953620d0 t))
	    (arga 0.0d0) (argl 0.0d0))
       (declare (type double-float q t da dl de dp di doh dmu arga argl))
       
       (do* ((k 0 (1+ k)))
	   ((<= 8 k))
	 (declare (type fixnum k))
	 (setf arga (* (aref +kp+ np k) dmu)
	       argl (* (aref +kq+ np k) dmu))
	 (incf da (* (+ (* (aref +ca+ np k) (cos arga)) (* (aref +sa+ np k) (sin arga))) 0.0000001d0))
	 (incf dl (* (+ (* (aref +cl+ np k) (cos argl)) (* (aref +sl+ np k) (sin argl))) 0.0000001d0)))
    
       (setf arga (* (aref +kp+ np 8) dmu))
       (incf da (* t (+ (* (aref +ca+ np 8) (cos arga)) (* (aref +sa+ np 8) (sin arga))) 0.0000001d0))
       
       (do* ((k 8 (1+ k)))
	   ((< 9 k))
	 (declare (type fixnum k))
	 (setf argl (* (aref +kq+ np k) dmu))
	 (incf dl (* t (+ (* (aref +cl+ np k) (cos argl)) (* (aref +sl+ np k) (sin argl))) 0.0000001d0)))
       
       (setf dl (the double-float (frem dl +twopi+)))
    
       ;; iterative solution of kepler's equation to get eccentric anomaly.
       (let* ((am (- dl dp))
	      (ae (+ am (* de (sin am))))
	      (k 0) (dae 0d0))
	 (declare (type double-float am ae dae)
		  (type fixnum k))
	 
	 (loop
	  (setf dae (/ (- (+ am (* de (sin ae))) ae) (- 1.0d0 (* de (cos ae)))))
	  (incf ae dae)
	  (incf k)
	  (when (or (>= k 10) (< (abs dae) 1d-12))
	    (return)))
      
	 ;; true anomaly.
	 (let* ((ae2 (/ ae 2.0d0))
		(at (* 2.0d0 (atan (* (sqrt (the (double-float 0d0) (/ (+ 1.0d0 de) (- 1.0d0 de)))) (sin ae2)) (cos ae2))))
		;; distance (au) and speed (radians per day).
		(r (* da (- 1.0d0 (* de (cos ae)))))
		(v (* +gaussk+ (sqrt (the (double-float 0d0) (/ (+ 1.0d0 (/ 1.0d0 (aref +amas+ np))) (* da da da))))))
		(si2 (sin (/ di 2.0d0)))
		(xq (* si2 (cos doh)))
		(xp (* si2 (sin doh)))
		(tl (+ at dp))
		(xsw (sin tl))
		(xcw (cos tl))
		(xm2 (* 2.0d0 (- (* xp xcw) (* xq xsw))))
		(xf (/ da (sqrt (the (double-float 0d0) (- 1.0d0 (* de de))))))
		(ci2 (cos (/ di 2.0d0)))
		(xms (* (+ (* de (sin dp)) xsw) xf))
		(xmc (* (+ (* de (cos dp)) xcw) xf))
		(xpxq2 (* 2.0d0 xp xq))
		;; position (j2000 ecliptic x,y,z in au).
		(x (* r (- xcw (* xm2 xp))))
		(y (* r (+ xsw (* xm2 xq))))
		(z (* r (* (- xm2) ci2))))
	   (declare (type double-float ae2 at r v si2 xq xp tl xsw
			  xcw xm2 xf ci2 xms xmc xpxq2 x y z))
	   
	   ;; rotate to equatorial.
	   (setf (aref pv 0 0) x
		 (aref pv 0 1) (- (* y +coseps+) (* z +sineps+))
		 (aref pv 0 2) (+ (* y +sineps+) (* z +coseps+)))
	   
	   ;; velocity (j2000 ecliptic xdot,ydot,zdot in au/d).
	   (setf x (* v (+ (* (+ -1.0d0 (* 2.0d0 xp xp)) xms) (* xpxq2 xmc)))
		 y (* v (- (* (- 1.0d0 (* 2.0d0 xq xq)) xmc) (* xpxq2 xms)))
		 z (* v (* 2.0d0 ci2 (+ (* xp xms) (* xq xmc)))))
	   
	   ;; rotate to equatorial.
	   (setf (aref pv 1 0) x
		 (aref pv 1 1) (- (* y +coseps+) (* z +sineps+))
		 (aref pv 1 2) (+ (* y +sineps+) (* z +coseps+)))
	   (values))))))
  
;; Computes RA, Declination, and distance from a state vector returned by
;; planetpv.

(declaim (inline radecdist))
(defun radecdist (state rdd)
  (declare (type (simple-array double-float (2 3)) state)
           (type (simple-array double-float (3)) rdd))
  ;; distance
  (setf (aref rdd 2) (sqrt (the (double-float 0.0d0)
                             (+ (* (aref state 0 0) (aref state 0 0))
                                (* (aref state 0 1) (aref state 0 1))
                                (* (aref state 0 2) (aref state 0 2))))))

  ;; RA
  (setf (aref rdd 0) (* (atan (aref state 0 1) (aref state 0 0)) +r2h+))
  (when (< (aref rdd 0) 0.0d0) (incf (aref rdd 0) 24.0d0))

  ;; Declination
  (setf (aref rdd 1) (* (asin (the (double-float -1d0 1d0)
                                (/ (aref state 0 2) (aref rdd 2)))) +r2d+))
  (values))

;; Entry point
;; Calculate RA and Dec for noon on every day in 1900-2100

(defun main ()
  (let ((jd (make-array '(2) :element-type 'double-float :initial-element 0d0))
        (pv (make-array '(2 3) :element-type 'double-float :initial-element 0d0))
        (position (make-array '(3) :element-type 'double-float :initial-element 0d0)))
    (dotimes (i +test-loops+)
      (declare (type fixnum i))
      (setf (aref jd 0) +j2000+
            (aref jd 1) 0.0d0)
      (dotimes (n +test-length+)
        (declare (type fixnum n))
        (incf (aref jd 0) 1.0d0)
        ;;(format cl:t "JD = ~A~%" (aref jd 0))
        (dotimes (p 8)
          (declare (type fixnum p))
          (planetpv jd p pv)
          (radecdist pv position)
	  ;; (if (and (= i 0) (= n 0))
	  ;; (format cl:t "~A ~A ~A~%" p (aref position 0) (aref position 1)))
	  ))))
  (values))
From: Thomas F. Burdick
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <xcvwu5w5gjm.fsf@famine.OCF.Berkeley.EDU>
rif <···@mit.edu> writes:

> So what have we learned?  We confirmed what we pretty much knew: you
> can write a C program in CL, at which point the relative speed of your
> C and CL versions will depend on the relative quality of the code
> generation.  It's worthwhile to know the specifics of your
> implementation (for instance, it seems that a substantial amount of
> the last-mile speed up for this benchmark on CMUCL came from figuring
> out ftruncate could be fast inside a macro, but not inside an inline
> function).

We already knew this, but I think examples of how to do this are of
potentially great value.  As more people are drawn to Lisp, there will
be more people who need to microoptimize portions of their code, but
don't know how.  Educationals like this are one way we can get this
knowledge passed on.  Hey, c.l.lisp just made something out of
nothing^[^Dnobody.

> What would be even more interesting to me would be an example more
> like what I feel I experience anecdotally --- that I write programs
> that are a bit slower than C (maybe 25% to 50%), but they're much more
> flexible, more abstract, cleaner, easily modifiable, and so on.  I
> don't really feel like the CL version of almabench we currently have
> shows any of the benefits of CL.

This didn't show any of the benefits of CL wrt C++, but it does show
some of the benefits wrt languages like Squeak Smalltalk or Python.
Sure, it's possible to call out to C or C++ or Fortran, but if you
have working Lisp code, and you know how to micro-optimize for your
implementation, you can spend an hour or so (how long did Duane say he
spent on it?) and get C-ish performance.  And still have portable CL,
and all the safety benefits thereof.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Paul F. Dietz
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ccqdnZirJsy5ItbdRVn-tw@dls.net>
Thomas F. Burdick wrote:

> We already knew this, but I think examples of how to do this are of
> potentially great value.  As more people are drawn to Lisp, there will
> be more people who need to microoptimize portions of their code, but
> don't know how.  Educationals like this are one way we can get this
> knowledge passed on.  Hey, c.l.lisp just made something out of
> nothing^[^Dnobody.

At least one compiler has already been patched to eliminate a
performance snafu revealed by this benchmark.

	Paul
From: Will Hartung
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c2lgmq$1qen6k$1@ID-197644.news.uni-berlin.de>
"Thomas F. Burdick" <···@famine.OCF.Berkeley.EDU> wrote in message
····················@famine.OCF.Berkeley.EDU...
> > What would be even more interesting to me would be an example more
> > like what I feel I experience anecdotally --- that I write programs
> > that are a bit slower than C (maybe 25% to 50%), but they're much more
> > flexible, more abstract, cleaner, easily modifiable, and so on.  I
> > don't really feel like the CL version of almabench we currently have
> > shows any of the benefits of CL.
>
> This didn't show any of the benefits of CL wrt C++, but it does show
> some of the benefits wrt languages like Squeak Smalltalk or Python.
> Sure, it's possible to call out to C or C++ or Fortran, but if you
> have working Lisp code, and you know how to micro-optimize for your
> implementation, you can spend an hour or so (how long did Duane say he
> spent on it?) and get C-ish performance.  And still have portable CL,
> and all the safety benefits thereof.

To clarify, I think it did show some of the benefits of CL. You'll note that
this started with what was a trivial hand conversion of the C++ program.

Once that was done, a lot of the overall program remained the same, with
many of the speed ups happening not within the code itself directly, but
abstractly through macros defined on the outside.

While at some point a bug was introduced (it may well have been there from
the beginning, I don't recall), it went on to show that something like this
could have easily been isolated in a larger program, placed into its own
sandbox for safety and and hammered on to get the desired affect.

You'll note that early on there was little presumption over redefining basic
pieces like +, -, /, and * in order to facilitate optimizing the system. We
changed from 2-D arrays to 1-D arrays with little changes to either the
array defiinitions or the array access!

It also showed how we can specify type ranges, and how the compilers can act
upon them to make any potential improvements. These are semantics that can't
even be communicated in other environments (some yes, but certainly not
all).

It also showed that CL using certain implementations can get down pretty
close to the metal. The beauty of course is that typically in most programs,
you do not have to optimize that far. Yet, after all of that, the actual
code used to get down to the metal is at face value no different from other
general, safer code (the macros shrouded those details).

Finally, with a minor change of the declaim statement, all of the safety
that comes with CL comes back in the blink of a re-compile, so if the
algorithm is not completely thought out or starts misbehaving, that can all
be turned back on and the algorithm debugged (assuming, of course, that the
bug is in the algorithm and not the compiler).

So, I think this is a stellar example of the capabilities of CL.

Regards,

Will Hartung
(·····@msoft.com)
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj0brn5twx9.fsf@five-percent-nation.mit.edu>
"Will Hartung" <·····@msoft.com> writes:

> "Thomas F. Burdick" <···@famine.OCF.Berkeley.EDU> wrote in message
> ····················@famine.OCF.Berkeley.EDU...
> > > What would be even more interesting to me would be an example more
> > > like what I feel I experience anecdotally --- that I write programs
> > > that are a bit slower than C (maybe 25% to 50%), but they're much more
> > > flexible, more abstract, cleaner, easily modifiable, and so on.  I
> > > don't really feel like the CL version of almabench we currently have
> > > shows any of the benefits of CL.
> >
> > This didn't show any of the benefits of CL wrt C++, but it does show
> > some of the benefits wrt languages like Squeak Smalltalk or Python.
> > Sure, it's possible to call out to C or C++ or Fortran, but if you
> > have working Lisp code, and you know how to micro-optimize for your
> > implementation, you can spend an hour or so (how long did Duane say he
> > spent on it?) and get C-ish performance.  And still have portable CL,
> > and all the safety benefits thereof.
> 
> To clarify, I think it did show some of the benefits of CL. You'll note that
> this started with what was a trivial hand conversion of the C++ program.
> 
> Once that was done, a lot of the overall program remained the same, with
> many of the speed ups happening not within the code itself directly, but
> abstractly through macros defined on the outside.
> 
> While at some point a bug was introduced (it may well have been there from
> the beginning, I don't recall), it went on to show that something like this
> could have easily been isolated in a larger program, placed into its own
> sandbox for safety and and hammered on to get the desired affect.
> 
> You'll note that early on there was little presumption over redefining basic
> pieces like +, -, /, and * in order to facilitate optimizing the system. We
> changed from 2-D arrays to 1-D arrays with little changes to either the
> array defiinitions or the array access!
> 
> It also showed how we can specify type ranges, and how the compilers can act
> upon them to make any potential improvements. These are semantics that can't
> even be communicated in other environments (some yes, but certainly not
> all).
> 
> It also showed that CL using certain implementations can get down pretty
> close to the metal. The beauty of course is that typically in most programs,
> you do not have to optimize that far. Yet, after all of that, the actual
> code used to get down to the metal is at face value no different from other
> general, safer code (the macros shrouded those details).
> 
> Finally, with a minor change of the declaim statement, all of the safety
> that comes with CL comes back in the blink of a re-compile, so if the
> algorithm is not completely thought out or starts misbehaving, that can all
> be turned back on and the algorithm debugged (assuming, of course, that the
> bug is in the algorithm and not the compiler).
> 
> So, I think this is a stellar example of the capabilities of CL.
> 
> Regards,
> 
> Will Hartung
> (·····@msoft.com)

Very nice points.  I retract my overly broad statement that this program
does show any of the benefits of CL.

However, I still feel that what I was trying to say has some value.
Looking at this program, it doesn't (to me) represent a huge
improvement over the C version.  It looks to me like a C program
written in CL.  On the other hand, for many programs, including
programs that do a lot of floating point computation, I feel like the
CL program would be substantially shorter, cleaner, and more elegant
than the "equivalent" C program, while *still* being not too much
slower.  We have a lot of features that make this plausible ---
closures and CLOS coming most immediately to mind.  I would find a
well put together example of this sort much more compelling, and this
is more what I meant before.

Cheers,

rif

ps.  As a side note, one of the optimizations in the program was to
state that numbers were in a certain range, and this made the calls to
sin and cos faster and/or less consy.  Can anyone explain what this
was about, and conjecture what gcc or other C compilers do with that?
Intuitively, it doesn't seem to make sense to take sin or cos of very
large double-precision floats at all, because the distance between
adjacent representable numbers becomes large relative to 2*pi, so the
difference between the sin of your number and the sign of the closest
double-float to your number becomes arbitrary.
From: Will Hartung
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c2lne7$1vc8lq$1@ID-197644.news.uni-berlin.de>
"rif" <···@mit.edu> wrote in message
····················@five-percent-nation.mit.edu...

> Very nice points.  I retract my overly broad statement that this program
> does show any of the benefits of CL.
>
> However, I still feel that what I was trying to say has some value.
> Looking at this program, it doesn't (to me) represent a huge
> improvement over the C version.  It looks to me like a C program
> written in CL.

The problem domain in this case is very small, which has some impact on the
overall design. If this was the singular task to be done, then the "C
program in CL" may well have been proper result.

> On the other hand, for many programs, including
> programs that do a lot of floating point computation, I feel like the
> CL program would be substantially shorter, cleaner, and more elegant
> than the "equivalent" C program, while *still* being not too much
> slower.  We have a lot of features that make this plausible ---
> closures and CLOS coming most immediately to mind.  I would find a
> well put together example of this sort much more compelling, and this
> is more what I meant before.

Well, of course, that's the other benefit of CL. In CL we can write C
programs, but you can't write CL programs in C :-).

> ps.  As a side note, one of the optimizations in the program was to
> state that numbers were in a certain range, and this made the calls to
> sin and cos faster and/or less consy.  Can anyone explain what this
> was about, and conjecture what gcc or other C compilers do with that?
> Intuitively, it doesn't seem to make sense to take sin or cos of very
> large double-precision floats at all, because the distance between
> adjacent representable numbers becomes large relative to 2*pi, so the
> difference between the sin of your number and the sign of the closest
> double-float to your number becomes arbitrary.

I think it's potentially to tell the compiler that certain edge conditions
cases will simply not happen:

CL-USER 1 > (sin (sqrt -1))
#C(0.0 1.1752011936438014)

and just because it "doesn't make sense" to send huge floats to sin/cos,
doesn't mean that it shouldn't take into account that it can be used as
such. So, the specifier just helped narrow it down that much more.

Regards,

Will Hartung
(·····@msoft.com)
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <877jxt1n5n.fsf@thalassa.informatimago.com>
rif <···@mit.edu> writes:
> However, I still feel that what I was trying to say has some value.
> Looking at this program, it doesn't (to me) represent a huge
> improvement over the C version.  It looks to me like a C program
> written in CL.  

Right  on!  We  started  with  the C  source  and "mechanically"  hand
translated to  Common-Lisp, without even thinking about  what it does.
All the remaining optimizations  have been done thusly with absolutely
no consideration  of the sematics  or the specific algorithms  of this
program.

> On the other hand, for many programs, including
> programs that do a lot of floating point computation, I feel like the
> CL program would be substantially shorter, cleaner, and more elegant
> than the "equivalent" C program, while *still* being not too much
> slower.  We have a lot of features that make this plausible ---
> closures and CLOS coming most immediately to mind.  I would find a
> well put together example of this sort much more compelling, and this
> is more what I meant before.

An  obvious "optimization" would  be to  denote the  program in  a way
leading  to parallel  execution.   In a  true lisp  re-implementation,
there  would  be  no  loops  in planetpv.   All  processing  would  be
expressed globally with  macros that would generate code  either for a
single processor, or for a  threaded multiprocessor, or for a parallel
machine, or perhaps using  vector instructions. Of course, the "naive"
implementation  mapping directly these  macros to  loops would  not be
faster than  the C++  version, but by  just implementing  these macros
with vectorial  instructions, we could  easily get a Lisp  program ten
times faster than any C++ implementation.

-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4n8yi8ycwa.fsf@edgedsp4.rtp.ericsson.se>
>>>>> "rif" == rif  <···@mit.edu> writes:

    rif> ps.  As a side note, one of the optimizations in the program was to
    rif> state that numbers were in a certain range, and this made the calls to
    rif> sin and cos faster and/or less consy.  Can anyone explain what this
    rif> was about, and conjecture what gcc or other C compilers do with that?
    rif> Intuitively, it doesn't seem to make sense to take sin or cos of very
    rif> large double-precision floats at all, because the distance between
    rif> adjacent representable numbers becomes large relative to 2*pi, so the
    rif> difference between the sin of your number and the sign of the closest
    rif> double-float to your number becomes arbitrary.

For sin/cos, this was mostly to get rid of a compiler note from
sbcl/cmucl on x86.  On these platforms, sin/cos of double-floats is
done by using the sin/cos instructions, but they are only accurate for
numbers below 2^64.  Try (cos (scale-float 1d0 200)) on x86 and try on
sparc.  The sparc answer, which calls out to the C cos function, is
the correct answer for cos(2^200).  The cos function there has some
thousands bits of pi so it can do range reduction accurately.

For ftruncate, the arg is declared to be in the range -1d6 to 1d6.
This may very well be lie---I didn't check.  It allowed the compiler
to use the float->integer and integer->float instructions to do
ftruncate.  Otherwise, a call to an ftruncate the returns a bignum
or a call to a special ftruncate operating on the float number itself
would be needed.

Ray
From: Matthias
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <36w65ddrmq6.fsf@hundertwasser.ti.uni-mannheim.de>
rif <···@mit.edu> writes:

> I feel like the CL program would be substantially shorter, cleaner,
> and more elegant than the "equivalent" C program, while *still*
> being not too much slower.  We have a lot of features that make this
> plausible --- closures and CLOS coming most immediately to mind.  I
> would find a well put together example of this sort much more
> compelling, and this is more what I meant before.

It is interesting that you mention CLOS: I'll be TA for Computer
Graphics class in summer and I'm planning to let the students write a
raytracer as term project.  The language probably will be Scheme (I
haven't completely made up my mind yet).

While prototyping a possible student's solution I found that
performance is adequate or good /unless/ I use CLOS classes. A CLOS
function call seems much (say 4-10 times) more expensive than an
ordinary function call.  Is this what one would expect?  Or did I just
pick a slow CLOS implementation?

Thanks,

 Matthias

(Background: I originally had in mind to provide some basic
vector/matrix operations on 4x4 matrices and homogeneous 3-vectors.
For such small functions the function call overhead is not
negligible.)
From: Ray Dillinger
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <4050B807.6D1C44D9@sonic.net>
Matthias wrote:
> 
> It is interesting that you mention CLOS: I'll be TA for Computer
> Graphics class in summer and I'm planning to let the students write a
> raytracer as term project.  The language probably will be Scheme (I
> haven't completely made up my mind yet).
> 
> While prototyping a possible student's solution I found that
> performance is adequate or good /unless/ I use CLOS classes. A CLOS
> function call seems much (say 4-10 times) more expensive than an
> ordinary function call.  Is this what one would expect?  Or did I just
> pick a slow CLOS implementation?
 
I think raytracing is a good example of an application where speed 
is a serious bottleneck and worth sacrificing some flexibility for, 
and there is overhead in any portable implementation of CLOS calls.
TinyClos, which is probably the CLOS you're using with scheme, is 
an example of a portable implementation of CLOS; it takes longer 
because all arguments to class functions wind up being boxed and 
consed.  Common Lisps, which mostly have native implementations of 
CLOS, frequently avoid this overhead. 

If it's important to stick with the simpler language for pedagogic 
reasons, I'd suggest a scheme with native support for SRFI-9 records 
instead of a scheme with TinyClos.  

Also, unless it's part of the learning exercise, you should provide 
matrix multiplies that are specialized to only use doubles for speed.
Common Lisp provides a portable way to do that with declarations, but 
relatively few schemes support this kind of specialization at all. 
The main exception is Stalin (in which *all* numbers are integers or 
doubles, overflow and roundoff errors be damned).

I usually prefer scheme for flexibility, simplicity, and clean 
semantics, but this is a speed-bottlenecked project and I'd consider 
Common Lisp to be the better tool for it because Common Lisp has 
better support for the kind of specializations you need to achieve 
this speed. 

				Bear
From: Pascal Bourguignon
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <87d67ixjml.fsf@thalassa.informatimago.com>
Matthias <··@spam.pls> writes:
> While prototyping a possible student's solution I found that
> performance is adequate or good /unless/ I use CLOS classes. A CLOS
> function call seems much (say 4-10 times) more expensive than an
> ordinary function call.  Is this what one would expect?  Or did I just
> pick a slow CLOS implementation?


I found  that method  calls were  only 2-3 times  as long  as function
calls in sbcl and clisp:


-------------------- ---------------- ---------------- ----------------
implementation       function call    generic (1 meth) generic (1000 m)
-------------------- ---------------- ---------------- ----------------
clisp    10e6 loops  ratio gen/fun:              2.18            2.21
        Time  (s)               6.85            14.92           15.15
        Space (byte)            0           18,768         133,188
-------------------- ---------------- ---------------- ----------------
sbcl    100e6 loops  ratio gen/fun:              2.10             2.64
        Time  (s)               5.16            10.83            13.61
        Space (byte)           56        1,225,776       16,130,672 
-------------------- ---------------- ---------------- ----------------
g++       1e9 loops  ratio meth/fun:             1.06             1.07
         Time (s)              11.64            12.35            12.45
-------------------- ---------------- ---------------- ----------------

Benchmark run on:
    Mach kernel version: Darwin Kernel Version 6.8:
    1 processor is physically available.
    Processor type: ppc750 (PowerPC 750)
    Primary memory available: 640.00 megabytes.

------------------------------------------------------------------------
#||
(load (compile-file "compare-method-function.lisp"))
(push :test=m *features*)
(load (compile-file "compare-method-function.lisp"))
||#
(defparameter loops 10000000)
(defparameter classes 1000)
#+sbcl (setf loops (* 10 loops))

(defun      fun-x (a b c)         (declare (ignore a b c)))
(defgeneric gen-x (a b c))

(defclass test () ())
(defmethod  gen-x ((a test) b c)  (declare (ignore a b c)))

(defmacro stuff ()
  (let ((res '()))
    (dotimes (i classes)
      (let ((cn (intern (format nil "TEST-~D" i))))
        (push `(defmethod gen-x ((a ,cn) b c) (declare (ignore a b c))) res)
        (push `(defclass ,cn () ()) res)))
    (push 'progn res)
    res));;stuff

#+test=m (stuff)

(let ((o (make-instance 'test)))
  (time (dotimes (i loops) (fun-x o 1 2)))
  (time (dotimes (i loops) (gen-x o 1 2))))

;;;; compare-method-function.lisp     -- 2004-03-06 03:59:40 -- pascal   ;;;;
------------------------------------------------------------------------
/*
    i=0 ; while [ $i -lt 1000 ] ; do
        echo  "class test_$i:public root {public: virtual void gen_x(int b,int c); };"
        echo "void test_$i::gen_x(int b,int c){}"
        i=$(( $i + 1 ))
    done > classes.inc
    g++ -o cmf-1 compare-method-function.c++  -O3
    g++ -o cmf-m compare-method-function.c++  -O3 -Dtest_m
    time ./cmf-1 ; time ./cmf-1 m ; time ./cmf-m m
*/

    static int loops=1000000;

    class root {  public: virtual void gen_x(int b,int c); };
    void root::gen_x(int b,int c){}
    class test:public root { public:  virtual void gen_x(int b,int c); };
    void test::gen_x(int b,int c){}

    void fun_x(root* a,int b,int c){}

#ifdef test_m
#include "classes.inc"
#endif

    int main(int argc,char* argv){
        int j,i;
        root* o=new test;
        if(argv[0]==0){
            for(j=0;j<1000;j++)for(i=0;i<loops;i++){fun_x(o,1,2);}
            return(0);
        }else{
            for(j=0;j<1000;j++)for(i=0;i<loops;i++){o->gen_x(1,2);}
            return(0);}}
        
/*** compare-method-function.c++      -- 2004-03-06 04:19:46 -- pascal   ***/
------------------------------------------------------------------------



-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Matthias
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <36wk71q9gql.fsf@hundertwasser.ti.uni-mannheim.de>
Pascal Bourguignon <····@thalassa.informatimago.com> writes:

> I found  that method  calls were  only 2-3 times  as long  as function
> calls in sbcl and clisp:
> 
> -------------------- ---------------- ---------------- ----------------
> implementation       function call    generic (1 meth) generic (1000 m)
> -------------------- ---------------- ---------------- ----------------
> clisp    10e6 loops  ratio gen/fun:              2.18            2.21
>         Time  (s)               6.85            14.92           15.15
>         Space (byte)            0           18,768         133,188
> -------------------- ---------------- ---------------- ----------------
> sbcl    100e6 loops  ratio gen/fun:              2.10             2.64
>         Time  (s)               5.16            10.83            13.61
>         Space (byte)           56        1,225,776       16,130,672 
> -------------------- ---------------- ---------------- ----------------
> g++       1e9 loops  ratio meth/fun:             1.06             1.07
>          Time (s)              11.64            12.35            12.45
> -------------------- ---------------- ---------------- ----------------

Very interesting.  Thanks for the code.  On my system (cmucl 18d,
linux 2.4.x, P4, 3GHz, 2GB) things look quite differently:

> cmucl   100e6 loops  ratio gen/fun:             16.6              16.6
>         Time  (s)              1.6              23.57             23.57
>         Space (byte)           0     2,399,109,176     2,399,112,544

However, if I remove the ignore declarations and use 

(defun      fun-x (a b c) a) ;;         (declare (ignore a b c)))
(defmethod  gen-x ((a test) b c) a) ;;  (declare (ignore a b c)))

instead, things look more reasonable (although I didn't check that the
compiler didn't optimize away too much):

> cmucl   100e6 loops  ratio gen/fun:              2.0               2.0
>         Time  (s)              1.57              3.07              3.17
>         Space (byte)           0             8,181                0

The scheme implementation I had in mind for the students doesn't work
as fast and faces quite a CLOS penalty (code below):

> chicken 100e6 loops  ratio gen/fun:             14.6 
> scheme  Time  (s)              6.93            101.59

  Matthias

---
(require 'tinyclos)

(define loops 100000000)

(define (fun-x a b c) a)
(define-generic gen-x)

(define <test> (make-class (list <object>) (list)))

(define-method (gen-x (a <test>) b c) a)

(define (run-test)
  (let ((o (make  <test>)))
    (print "function")
    (time (do ((i 0 (+ i 1)))
	      ((= i loops) #t)
	    (fun-x o 1 2)))
    (print "generic")
    (time (do ((i 0 (+ i 1)))
	      ((= i loops) #t)
	    (gen-x o 1 2)))))

(run-test)
From: Bradley J Lucier
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <c2t5u8$4i1@arthur.cs.purdue.edu>
In article <···············@hundertwasser.ti.uni-mannheim.de>,
Matthias  <··@spam.pls> wrote:
>The scheme implementation I had in mind for the students doesn't work
>as fast and faces quite a CLOS penalty (code below):
>
>> chicken 100e6 loops  ratio gen/fun:             14.6 
>> scheme  Time  (s)              6.93            101.59

Meroon is a CLOS-like object system for Scheme with the following restrictions:

1.  Single inheritance, not multiple inheritance.

2.  Compile-time MOP, not runtime MOP.

It is well integrated with Gambit-C and is available from the Gambit web site.

It is somewhat tricky to measure the dispatch versus call time.  Here's the
first program (all programs are compiled):

(declare (standard-bindings)(extended-bindings)(block)(fixnum)(not safe)(not interrupts-enabled))

(define loops 100000000)

(define (fun-x a b c) a)
(define-generic (gen-x (a) b c))

(define-class <test> (Object)
  ())

(define-method (gen-x (a <test>) b c) a)

(define (run-test)
  (pp loops)
  (let ((o (make-<test>)))
    (pp "function")
    (time (do ((i 0 (+ i 1)))
	      ((= i loops) #t)
	    (fun-x o 1 2)))
    (pp "generic")
    (time (do ((i 0 (+ i 1)))
	      ((= i loops) #t)
	    (gen-x o 1 2)))))

(run-test)

which gives the results when compiled of

> (load "dispatch")
100000000
"function"
(time (do ((i 0 (+ i 1))) ((= i loops) #t) (fun-x o 1 2)))
    152 ms real time
    150 ms cpu time (150 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
"generic"
(time (do ((i 0 (+ i 1))) ((= i loops) #t) (gen-x o 1 2)))
    21208 ms real time
    15610 ms cpu time (15600 user, 10 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults

But because of the (block) declaration all calls to fun-x are inlined 
trivially.  So we remove that declaration (allowing the possibility that
fun-x is redefined in another file) and get

> (load "dispatch")
100000000
"function"
(time (do ((i 0 (+ i 1))) ((= i loops) #t) (fun-x o 1 2)))
    3752 ms real time
    3200 ms cpu time (3190 user, 10 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
"generic"
(time (do ((i 0 (+ i 1))) ((= i loops) #t) (gen-x o 1 2)))
    21869 ms real time
    16380 ms cpu time (16360 user, 20 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults

But the dispatch code for Meroon is sitting in the compiled object file
_meroon.o1, so there are at least two inter-module calls and returns for
the object dispatch compared to one intramodule call and return for the
function call.  So we split the function and method definitions to another
file

dispatch1.scm: 

(declare (standard-bindings)(extended-bindings)(fixnum)(not safe)(not interrupts-enabled)(block))

(define (fun-x a b c) a)
(define-generic (gen-x (a) b c))

(define-class <test> (Object)
  ())

(define-method (gen-x (a <test>) b c) a)

dispatch2.scm:

(declare (standard-bindings)(extended-bindings)(fixnum)(not safe)(not interrupts-enabled) (block))

(define loops 100000000)

(define (run-test)
  (pp loops)
  (let ((o (make-<test>)))
    (pp "function")
    (time (do ((i 0 (+ i 1)))
	      ((= i loops) #t)
	    (fun-x o 1 2)))
    (pp "generic")
    (time (do ((i 0 (+ i 1)))
	      ((= i loops) #t)
	    (gen-x o 1 2)))))

(run-test)

which results in

> (load "dispatch2")
100000000
"function"
(time (do ((i 0 (+ i 1))) ((= i loops) #t) (fun-x o 1 2)))
    7763 ms real time
    5820 ms cpu time (5810 user, 10 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
"generic"
(time (do ((i 0 (+ i 1))) ((= i loops) #t) (gen-x o 1 2)))
    25753 ms real time
    19360 ms cpu time (19290 user, 70 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults

So the call to gen-x goes: intermodule call to gen-x in dispatch1.o1;
intermodule call to dispatcher in _meroon.o1; intermodule call to the closure
of the method in dispatch1.o1.

So it takes a little over three times as long to do a method dispatch as it
does to do an intermodule call.

This is on a 2GH Power Mac G5.

Brad
From: felix
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <e36dad49.0403300458.2bb01ab7@posting.google.com>
Matthias <··@spam.pls> wrote in message news:<···············@hundertwasser.ti.uni-mannheim.de>...
> 
> The scheme implementation I had in mind for the students doesn't work
> as fast and faces quite a CLOS penalty (code below):
> 
> > chicken 100e6 loops  ratio gen/fun:             14.6 
> > scheme  Time  (s)              6.93            101.59
> 

It's true that TinyCLOS is not particularly fast. But the times
for the pure function call look suspiciously long. What optimization
options are you using?


cheers,
felix
From: Raymond Toy
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <ijP2c.6129$%06.3094@newsread2.news.pas.earthlink.net>
rif wrote:
>  generation.  It's worthwhile to know the specifics of your
> implementation (for instance, it seems that a substantial amount of
> the last-mile speed up for this benchmark on CMUCL came from figuring
> out ftruncate could be fast inside a macro, but not inside an inline
> function).

I don't believe that is true.  I think if you take Brian's macro and 
made it an inline function, you will find that ftruncate will still be 
inlined in all places where that function is used.

Well, at least that was true for me on the particular version of cmucl 
that I used.

Ray
From: rif
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <wj0znarp6oq.fsf@five-percent-nation.mit.edu>
Raymond Toy <····@earthlink.net> writes:

> rif wrote:
> >  generation.  It's worthwhile to know the specifics of your
> > implementation (for instance, it seems that a substantial amount of
> > the last-mile speed up for this benchmark on CMUCL came from figuring
> > out ftruncate could be fast inside a macro, but not inside an inline
> > function).
> 
> I don't believe that is true.  I think if you take Brian's macro and
> made it an inline function, you will find that ftruncate will still be
> inlined in all places where that function is used.
> 
> Well, at least that was true for me on the particular version of cmucl
> that I used.
> 
> Ray

Well, I'm not sure it's about ftruncate per se.  Maybe it's about
maximum inlining depth, or something else.  If I take Brian's CL
version, and I replace his macro with an inline function, I get a
float to pointer coercion on the return value of the inline function,
and the program slows way down.  This is using the 2003-09-10 build of
CMUCL.

rif
From: Joe Marshall
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <llmbjtde.fsf@ccs.neu.edu>
Has anyone run this through a partial evaluator?
From: pkhuong
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <51184814.0403081318.67ce722b@posting.google.com>
Joe Marshall <···@ccs.neu.edu> wrote in message news:<············@ccs.neu.edu>...
> Has anyone run this through a partial evaluator?
In approximate order of importance: 
1. Please provide minimal quoting
2. The program outputs nothing at all. Any really good optimizer would
simply output a nill program.
3. The program has no input either. Dpending on the amount of
compile-time evaluation, any good partial evaluator will calculate up
to everything itself, and return a program that outputs...nothing.
Again, a nill program.

:)

You really gotta love artificial benchmarks.
From: Joe Marshall
Subject: Re: off-topic: open challenge to Christian Lynbeck
Date: 
Message-ID: <3c8f8br3.fsf@ccs.neu.edu>
Joe Marshall <···@ccs.neu.edu> writes:

> Has anyone run this through a partial evaluator?

Since no one else has, I decided to give it a try yesterday.  The
bottom line is that it looks somewhere around a 10-20% improvement in
performance and that there were some obstacles in using the partial
evaluator.


Details:

The idea here is to speed up the benchmark by using partial evaluation
to specialize the calls to PLANETPV and RADECDIST.  The `main'
function is the benchmark driver, and I didn't consider it as part of
the code.  (Besides, if we specialized main, all the code would
disappear because there is no output.)

I used Jeffrey Mark Siskind's `partial' package available at 
  http://www.ece.purdue.edu/~qobi/software.html

It relies on the code walker from Screamer, so be sure to get that as
well.

The first difficulty is that PARTIAL comes with no documentation.
After poking around I found that it isn't too difficult to use.  To
partially evaluate, the source code must be available to the partial
evaluator.  Partial shadows the usual defining forms like DEFUN,
DEFVAR, DEFMETHOD, etc.  There is a macro `DEFINE-PARTIAL-PACKAGE'
that you can use as a quick and easy way to get these.  You may also
need to tell the partial evaluator about the types of certain
functions or objects that are used, but not defined by your program
and there are macros for these as well.

PARTIAL is a macro that partially evaluates its body and returns the
resultant residue program.  It recursively walks its body and descends
into the functions *called* by the body simulating the effect of the
function calls using actual values where possible and placeholders
when not.

For almabench, each pass through planetpv uses a fixed value of np
(the planet number) and calculates using only that planet's slice of
the arrays.  Each array contains coefficients used to create a
polynomial approximation to a particular property of the planet's
motion, so there is an inner loop that calculates across the contents
of a slice (this is the loop over K).  Then there is an iterative
solution to kepler's equation where a loop is executed up to 10 times
or until the calculated error is less than 10^12.  The results of that
calculation determine the position and velocity for the planet
and those values are translated to ecliptic coordinates.

Since the planet number NP is fixed, it makes sense to partially
evaluate PLANETPV for each planet.

I added these functions:

(defun planetpv-dispatch (epoch np pv)
  (ecase np
    ((0) (planetpv0 epoch pv))
    ((1) (planetpv1 epoch pv))
    ((2) (planetpv2 epoch pv))
    ((3) (planetpv3 epoch pv))
    ((4) (planetpv4 epoch pv))
    ((5) (planetpv5 epoch pv))
    ((6) (planetpv6 epoch pv))
    ((7) (planetpv7 epoch pv))))

(defun planetpv0 (epoch pv) (partial (planetpv epoch 0 pv)))
(defun planetpv1 (epoch pv) (partial (planetpv epoch 1 pv)))
(defun planetpv2 (epoch pv) (partial (planetpv epoch 2 pv)))
(defun planetpv3 (epoch pv) (partial (planetpv epoch 3 pv)))
(defun planetpv4 (epoch pv) (partial (planetpv epoch 4 pv)))
(defun planetpv5 (epoch pv) (partial (planetpv epoch 5 pv)))
(defun planetpv6 (epoch pv) (partial (planetpv epoch 6 pv)))
(defun planetpv7 (epoch pv) (partial (planetpv epoch 7 pv)))

The loops within PLANETPV are unrolled because the partial evaluator
can determine a bound on the number of iterations.  With NP also
known, the AREFS in the loop are replaced with the actual values from
the array.  This turns out not to be that big a deal, however.  With
the appropriate declarations, the compiled code was efficiently
retrieving the values from the arrays.  With the array contents
inlined, the compiled code retrieves the values from the constant
vector associated with the code.  Either way takes the same amount of
time. 

Where the benefit comes is in collapsing some of the computation.  The
higher order terms in many of the arrays is zero.  When these terms
are multiplied in to the calculation, the result is obviously zero, so
the lesser terms in the calculation need not be performed.  (In the
non-partial version the higher order terms are calculated, but the
result is always zero.)  As an example, consider the calculation of DA
for the first planet:

  da =  (+ (aref a np 0)
           (* (+ (aref a np 1)
                 (* (aref a np 2) tt)) tt))

After partial evaluation, this becomes:

  da = (+ 0.3870983098d0
          (* (+ 0.0d0
                (* 0.0d0 tt)) tt))

Which simplifies to   0.3870983098d0
Thus replacing four floating point operations with a load.

In the loop involving K, we can achieve another optimization.  The
original code is this:

  dmu = (* 0.35953620d0 tt)

 arga = (* (aref kp np k) dmu)

   da = (* (+ (* (aref ca np k) (cos arga))
              (* (aref sa np k) (sin arga)))
            0.0000001d0)

Since K and NP are known, this becomes (for example if k=0,np=0)

 arga = (* 69613.0d0 (* 0.35953620d0 tt))

   da = (* (+ (* 4.0d0 (cos (* 69613.0d0 (* 0.35953620d0 tt))))
              (* -29.0d0 (sin (* 69613.0d0 (* 0.35953620d0 tt)))))
            0.0000001d0)

which simplifies to:

   da = (* (+ (* 4.0d0   (cos (* 25028.3934906 tt)))
              (* -29.0d0 (sin (* 25028.3934906 tt))))
            0.0000001d0)

By rewriting the original code to this *less efficient* version:

   da = (+ (* (aref ca np k) (cos arga) 0.0000001d0)
           (* (aref sa np k) (sin arga) 0.0000001d0))

the simplified code becomes:

   da = (+ (* 4.0d-7   (cos (* 25028.3934906 tt)))
           (* -2.9d-6  (sin (* 25028.3934906 tt))))

Saving at least one floating point operation.


These optimizations appear to reduce the runtime by several percent.

I ran into difficulty in a couple of places.  First, partial
evaluation is very memory intensive, so I hit the stack limit on some
of the calculations.  The other difficulties involved issues around
the evaluator itself.  

The partial evaluator does not track side effects, so it can be used
only on pure functional code.  The `C' style of programming where you
bash the same state variables over and over confuse the evaluator, so
I re-wrote the code in a functional style.  However, DO implicitly
side effects as well, so it needed to be rewritten.  Rewriting those
loops to be tail-recursive did the trick.

PLANETPV `returns' its values by modifying an array.  Again, side
effects are an issue, so I first tried just returning the six
interesting values.  Unfortunately, PARTIAL has certain restrictions
on the uses of multiple values so this was a problem.  I modified the
code to return the values as a list.  In theory, this limitation could
be overcome by extending PARTIAL.  The biggest difficulty this caused
was with the implementation of FMOD which uses multiple values.  It
appears that without using multiple values, there is no way to take
MOD of floating point without boxing the argument.

While PARTIAL tracks the types of all expressions, it does not emit
them in the residue program.  This caused all sorts of boxing and
unboxing.  A couple of changes to partial allowed me to emit the
correct declarations where needed.

Finally, there were some places where the compiler absolutely refused
to unbox the resulting code despite all the declarations.  (Some
multiplies of declared floats were nonetheless boxed.)

Despite these difficulties, I estimated the runtime by comparing
against the less efficient version that also boxes everything and
against a hand-specialized version.  The results indicate a several
percent speedup should be obtainable. Since we already exceed C, the
effort involved in extending the partial evaluator doesn't seem worth
it.