From: John M
Subject: Python and Lisp Test
Date: 
Message-ID: <f_CdnbC5fIKujS_eRVn-qg@comcast.com>
This is not a scientific test of any kind and I'm not
an expert on either Python or Lisp.

Having said that, I was interested in getting an idea
about the relative performance of each language when
computing the Fibonacci function.

I tried to come up with equivalent implementations
in each language.  Apologies if I did not succeed.

I was surprised at the result since I had read that
Lisp was the faster language.

Here are the functions in Lisp and Python:

#+:cmu (setf ext:*gc-verbose* nil)

(defun fib2 (n)
  (loop with f = 0 and f1 = 1 and f0 = 0
        initially
            (when (= n 0) (setf f f0))
            (when (= n 1) (setf f f1))
        for i from 2 to n
            do (setf f (+ f1 f0) f0 f1 f1 f)
        finally (return f)))

(time (let ((x (fib2 1000000)))))
(quit)


def fib (n):
    f = 0
    f1 = 1
    f0 = 0
    if n == 0: f = f0
    if n == 1: f = f1
    i = 2
    while i <= n:
        f = f1 + f0
        f0 = f1
        f1 = f
        i += 1
    return f

fib(1000000)

and the results...

% time python fib.py
95.711u 9.363s 1:46.53 98.6%    877+2023k 0+0io 0pf+0w
%
%
% time lisp -load fib
; Loading #p"/home/test/fib.x86f".

; Evaluation took:
;   172.59 seconds of real time
;   169.43845 seconds of user run time
;   0.260876 seconds of system run time
;   434,780,715,848 CPU cycles
;   [Run times include 63.18 seconds GC run time]
;   0 page faults and
;   43,402,286,400 bytes consed.
;
169.444u 0.271s 2:52.61 98.3%   118+2326k 0+0io 0pf+0w


Regards,
-- John

From: David Steuber
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87irt9g756.fsf@david-steuber.com>
Just curious.  Did you try to print the result?  That would be the
most expensive thing to do.

-- 
http://www.david-steuber.com/
The UnBlog: An island of conformity in a sea of quirks.
http://www.david-steuber.com/snippets/Boycott_Sony/
From: John M
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <D4qdnWjl65CatS_eRVn-hQ@comcast.com>
David Steuber wrote:

> Just curious.  Did you try to print the result?  That would be the
> most expensive thing to do.
> 

Yes, printing with Python using:

print fib(1000000)

yields a number with 209989 digits and takes:

134.777u 8.801s 2:30.58 95.3%   906+2183k 0+1io 34pf+0w

Printing with Lisp using (print (fib2 1000000)), the system
runs out of memory with the following:

*A2 gc_alloc_large failed, nbytes=63880.
 CMUCL has run out of dynamic heap space (512 MB).
  You can control heap size with the -dynamic-space-size commandline option.

Imminent dynamic space overflow has occurred:  Only a small amount of
dynamic space is available now. Please note that you will be returned to
the Top-Level without warning if you run out of space while debugging.
From: Ruslan Abdulkhalikov
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135760266.302252.66850@g44g2000cwa.googlegroups.com>
this test can be rewritten a little:
def fib (n):
    f = 0
    f1 = 1
    f0 = 0
    if n == 0: f = f0
    if n == 1: f = f1
    i = 2
    while i <= n:
        f = f1 + f0
        f0 = f1
        f1 = f
        i += 1
    return f
i1 = 0
while i1 < 1000000:
  i = 2
  while i < 10:
    fib(i)
    i += 1
  i1 += 1

and...
      1 (defun fib(n)
      2   (declare (optimize (speed 3) (space 2)))
      3   (labels ((fib-aux (r1 r2 n)
      4              (declare (fixnum n) (fixnum r1) (fixnum r2))
      5              (cond ((eq n 2) (values r1 r2))
      6                    (t (fib-aux r2 (+ r1 r2) (1- n))))))
      7     (multiple-value-bind (r1 r2)
      8       (fib-aux 1 1 n)
      9       r2)))
     10
     11 ;(fib 1000000)
     12 (loop for a from 1 to 1000000
     13       do (loop for a from 2 to 10
     14             do (fib a)))
     15 (quit)

$ time python a.py

real    0m21.845s
user    0m21.844s
sys     0m0.004s

$ time sbcl --load a.lisp
....
real    0m0.434s
user    0m0.425s
sys     0m0.007s

:)
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87hd8txzcd.fsf@qrnik.zagroda>
"Ruslan Abdulkhalikov" <······@gmail.com> writes:

>       4              (declare (fixnum n) (fixnum r1) (fixnum r2))
>       5              (cond ((eq n 2) (values r1 r2))

eq on fixnums is wrong (even though it will work on most implementations).

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: John Thingstad
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <op.s2hajjbvpqzri1@mjolner.upc.no>
On Wed, 28 Dec 2005 05:10:24 +0100, John M <·····@yahoo.com> wrote:

> This is not a scientific test of any kind and I'm not
> an expert on either Python or Lisp.
>
> Having said that, I was interested in getting an idea
> about the relative performance of each language when
> computing the Fibonacci function.
>
> I tried to come up with equivalent implementations
> in each language.  Apologies if I did not succeed.
>
> I was surprised at the result since I had read that
> Lisp was the faster language.
>
> Here are the functions in Lisp and Python:
>

Only if you compile the function first!
lisp
(compile-file ..)
(load ..)

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: Bill Atkins
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135779674.369571.166420@g43g2000cwa.googlegroups.com>
John Thingstad wrote:
>
> Only if you compile the function first!
> lisp
> (compile-file ..)
> (load ..)
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

He did.  Look at the extension of the file he's loading.
From: Raffael Cavallaro
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <2005122802072211272-raffaelcavallaro@pasdespamsilvousplaitmaccom>
On 2005-12-27 23:10:24 -0500, John M <·····@yahoo.com> said:

> I was surprised at the result since I had read that
> Lisp was the faster language.

As other replies have suggested, you need to compile your lisp code 
first, and algorithm and bignum libraries matter quite a bit here.

This version:

(defun fib (x)
  (declare (optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0) (space 3))
           (integer x))
  (if (< x 1) 0
    (let ((n (1- x)))
      (labels ((fib-aux (number)
                 (case number
                   ((0) (values 1 0))
                   ((1) (values 0 1))
                   (otherwise (progn
                                (let ((k (floor number 2)))
                                  (multiple-value-bind (a b) (fib-aux k)
                                    (let* ((aa (* a a))
                                           (bb (* b b))
                                           (ab2 (* 2 a b))
                                           (ab2bb (+ ab2 bb)))
                                      (if (= number (* 2 k))
                                        (values (+ aa bb) ab2bb)
                                        (values ab2bb (+ ab2bb aa bb)))))))))))
        (if (<= n 1) 1
          (progn
            (let ((k (floor n 2)))
              (multiple-value-bind (a b) (fib-aux k)
                (let ((ab (+ a b)))
                  (if (= n (* 2 k))
                    (+ (* ab ab) (* b b))
                    (+ (* ab ab) (* 2 b ab))))))))))))

computes (fib (expt 10 6)) in about 2 seconds on a dual 2 GHz G5
(although only 1 CPU is used by SBCL):

* (time (fib (expt 10 6)))

Evaluation took:
  2.048 seconds of real time
  1.995197 seconds of user run time
  0.010075 seconds of system run time
  0 page faults and
  909,680 bytes consed.
19532821287077577316320149475962563324435429965918733969534051945716252578870156947666419876341501461288795243352202360846255109120195602337440154381151966361569199621256428943033701138278006380027674115279274666698655783793188228320612714975832303348548934895725992307229129019282092643316275217308614600179125820426996599360209593392020051848620284024473431398113674187202038684801753185386211128781082406177413832935545616876064540651259547180291265479428940369816592063610193592913521354103767990829403201557027161153950319759732477821629576316296533566947776632850623452455934606475750259358134434578167676462587885901137272990737294785114480895724561915035070255895291168685500088020132334587472177947814475467920160901706425856293597475465327575757400774320349134287851897953543047345603077650789387672865391667992328174493619915237681495576320853710478597061884387315305823956275608790631078190049751695947097367138917457045552021351233507944033607120305041446852210415650373210679322756258647511914611417360349681217380234224786080292021093192496490409832397066832470544417635125267324552754195016838452060230073949598542792982978312043821157576457876924955833514025221527206624418090032593807536284917966809529711850719137983367887377045991363933395581421203699026161797211325091840023055327607104316478190974300434647793363287601469996128023925829471557316688943339455429292871877487747892042961663565366107960239197021097284729667094273345863447980486339446352116549715072613427682054793209317507988801013041602798250635418234403455874223670128266635693461129461312312838906003654732766024569315151850018328483150645480029978935985161237074046158229354440701748339514575869547491750264542126364262224720600488554625899611904758921012242805428986215946466624785643735722177755498760876859120301185516356689020103446399839773266388890365078416180709154525299275973521395741547772914600879431433915606044582510782351166271892637923313014643880597879468444879060576786297460989627426663569682
474293386740207436559426057944790711930522589315...
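The same family of doubling identities this code relies on, F(2k-1) = F(k-1)^2 + F(k)^2 and F(2k) = F(k) * (2*F(k-1) + F(k)), can be sketched in Python as well. This is a minimal fast-doubling translation of the approach, not the posted code:

```python
def fib(n):
    """Fibonacci by fast doubling: O(log n) bignum multiplications
    instead of the O(n) additions of the naive loop."""
    def fib_pair(k):
        # returns (F(k), F(k+1))
        if k == 0:
            return (0, 1)
        a, b = fib_pair(k // 2)   # a = F(m), b = F(m+1), m = k // 2
        c = a * (2 * b - a)       # F(2m)   = F(m) * (2*F(m+1) - F(m))
        d = a * a + b * b         # F(2m+1) = F(m)^2 + F(m+1)^2
        if k % 2 == 0:
            return (c, d)
        return (d, c + d)
    return fib_pair(n)[0]

print(fib(10))  # -> 55
```

Because the recursion halves k each call, fib(10**6) needs only a few dozen bignum multiplications.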


The Scheme version it is based on:

(define (fib x)
  (if (< x 1) 0
      (let ((n (- x 1)))
        (letrec ((fib-aux
                  (match-lambda
                   (0 (values 1 0))
                   (1 (values 0 1))
                   (n (let ((k (quotient n 2)))
                        (receive (a b) (fib-aux k)
                          (let* ((aa (* a a))
                                 (bb (* b b))
                                 (ab2 (* 2 a b))
                                 (ab2bb (+ ab2 bb)))
                            (if (= n (* 2 k))
                                (values (+ aa bb) ab2bb)
                                (values ab2bb (+ ab2bb aa bb))))))))))
          (if (<= n 1)
              1
              (let ((k (quotient n 2)))
                (receive (a b) (fib-aux k)
                  (let ((ab (+ a b)))
                    (if (= n (* 2 k))
                        (+ (* ab ab) (* b b))
                        (+ (* ab ab) (* 2 b ab)))))))))))

when compiled with Chicken using the gmp library, computes (fib (expt 10 6)) in less than a tenth of a second:

 )   ___
(__/_____) /)   ,    /)
  /       (/      _ (/_   _ __
 /        / )__(_(__/(___(/_/ (_
(______)
Version 2, Build 216 - macosx-unix-gnu-ppc - [ libffi dload ptables ]
(c)2000-2005 Felix L. Winkelmann
#;1> (use fib-aux-cx)
; loading /usr/local/lib/chicken/fib-aux-cx.so ...
; loading /usr/local/lib/chicken/numbers-base.so ...
#;2> (time (fib (expt 10 6)))
   0.089 seconds elapsed
       0 seconds in (major) GC
     105 mutations
      10 minor GCs
       0 major GCs
19532821287077577316320149475962563324435429965918733969534051945716252578870156947666419876341501461288795243352202360846255109120195602337440154381151966361569199621256428943033701138278006380027674115279274666698655783793188228320612714975832303348548934895725992307229129019282092643316275217308614600179125820426996599360209593392020051848620284024473431398113674187202038684801753185386211128781082406177413832935545616876064540651259547180291265479428940369816592063610193592913521354103767990829403201557027161153950319759732477821629576316296533566947776632850623452455934606475750259358134434578167676462587885901137272990737294785114480895724561915035070255895291168685500088020132334587472177947814475467920160901706425856293597475465327575757400774320349134287851897953543047345603077650789387672865391667992328174493619915237681495576320853710478597061884387315305823956275608790631078190049751695947097367138917457045552021351233507944033607120305041446852210415650373210679322756258647511914611417360349681217380234224786080292021093192496490409832397066832470544417635125267324552754195016838452060230073949598542792982978312043821157576457876924955833514025221527206624418090032593807536284917966809529711850719137983367887377045991363933395581421203699026161797211325091840023055327607104316478190974300434647793363287601469996128023925829471557316688943339455429292871877487747892042961663565366107960239197021097284729667094273345863447980486339446352116549715072613427682054793209317507988801013041602798250635418234403455874223670128266635693461129461312312838906003654732766024569315151850018328483150645480029978935985161237074046158229354440701748339514575869547491750264542126364262224720600488554625899611904758921012242805428986215946466624785643735722177755498760876859120301185516356689020103446399839773266388890365078416180709154525299275973521395741547772914600879431433915606044582510782351166271892637923313014643880597879468444879060576786297460989627426663569682
474293386740207436559426057944790711930522589315...

So, compile your lisp code, find better algorithms, and, as suggested by
Bruce Hoult:
"Make sure you're measuring something that is important to you."
From: André Thieme
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135764345.443487.290230@g49g2000cwa.googlegroups.com>
Raffael Cavallaro schrieb:

> On 2005-12-27 23:10:24 -0500, John M <·····@yahoo.com> said:
>
> > I was surprised at the result since I had read that
> > Lisp was the faster language.
>
> As other replies have suggested, you need to compile your lisp code
> first, and algorithm and bignum libraries matter quite a bit here.
>
> This version:
>
> (defun fib (x)
>   (declare (optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0) (space 3))
>            (integer x))
>   (if (< x 1) 0
>     (let ((n (1- x)))
>       (labels ((fib-aux (number)
>                  (case number
>                    ((0) (values 1 0))
>                    ((1) (values 0 1))
>                    (otherwise (progn
>                                 (let ((k (floor number 2)))
>                                   (multiple-value-bind (a b) (fib-aux k)
>                                     (let* ((aa (* a a))
>                                            (bb (* b b))
>                                            (ab2 (* 2 a b))
>                                            (ab2bb (+ ab2 bb)))
>                                       (if (= number (* 2 k))
>                                         (values (+ aa bb) ab2bb)
>                                         (values ab2bb (+ ab2bb aa bb)))))))))))
>         (if (<= n 1) 1
>           (progn
>             (let ((k (floor n 2)))
>               (multiple-value-bind (a b) (fib-aux k)
>                 (let ((ab (+ a b)))
>                   (if (= n (* 2 k))
>                     (+ (* ab ab) (* b b))
>                     (+ (* ab ab) (* 2 b ab))))))))))))
>
> computes fib (expt 10 6) in about 2 seconds on a dual G5 2 GHz
> (although only 1 cpu is used by sbcl):
>
> * (time (fib (expt 10 6)))
>
> Evaluation took:
>   2.048 seconds of real time
>   1.995197 seconds of user run time
>   0.010075 seconds of system run time
>   0 page faults and
>   909,680 bytes consed.

Oh my...
In Python one simply writes down the function and it works pretty fast.
As Emre Sevinc asked a few days ago: do I have to be an expert to get
performance?


André
--
From: Thomas F. Burdick
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <xcvwthpmpbn.fsf@conquest.OCF.Berkeley.EDU>
"André Thieme" <·····························@justmail.de> writes:

> Oh my...
> In Python one simply writes down the function and it works pretty fast.
> As Emre Sevinc asked a few days ago: do I have to be an expert to get
> performance?

What are you smoking?  The Python example given by the OP is just as
slow as the simple CL implementations.  Sure, the example that Raffael
gave is much more frightening than the clear and simple Python and CL
examples, but it also runs about 100x faster (on my G3).  If you write
a simple fib function in CL, it will be shorter and more readable than
the Python, and run at the same speed.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | Free Mumia Abu-Jamal! |
     ,--'    _,'   | Abolish the racist    |
    /       /      | death penalty!        |
   (   -.  |       `-----------------------'
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: André Thieme
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135772090.332826.86300@g44g2000cwa.googlegroups.com>
Thomas F. Burdick schrieb:

> "André Thieme" <·····························@justmail.de> writes:
>
> > Oh my...
> > In Python one simply writes down the function and it works pretty fast.
> > As Emre Sevinc asked a few days ago: do I have to be an expert to get
> > performance?
>
> What are you smoking?  The example Python given by the OP is just as
> slow as the simple CL implementations.  Sure, the example that Rafael
> gave is much more frightening than the clear and simple Python and CL
> examples, but it also runs about 100x faster (on my G3).  If you write
> a simple fib function in CL, it will be shorter and more readable than
> the Python, and run at the same speed.


If the Python code given by the OP is just as slow as the simple CL
code then I missed something, sorry.
I interpreted
% time python fib.py
95.711u 9.363s 1:46.53 98.6%    877+2023k 0+0io 0pf+0w

and

 % time lisp -load fib
; Loading #p"/home/test/fib.x86f".

; Evaluation took:
;   172.59 seconds of real time
;   169.43845 seconds of user run time
;   0.260876 seconds of system run time
;   434,780,715,848 CPU cycles
;   [Run times include 63.18 seconds GC run time]
;   0 page faults and
;   43,402,286,400 bytes consed.
;
169.444u 0.271s 2:52.61 98.3%   118+2326k 0+0io 0pf+0w

as the runtimes, where it seems that ca. 96 seconds is less than ca. 169
seconds of runtime. I have similar results on my machine.
Raffael's version runs of course *much* faster, as you said ca. two
orders of magnitude compared to the simple CL version and ca. 28 times
faster than the Python code.
From: Michael Sullivan
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1h89whd.epy8qb1re8fdiN%mes@panix.com>
André Thieme <·····························@justmail.de> wrote:

> as the runtimes where it seems that ca. 96 seconds is less than ca. 169
> seconds runtime. I have similar results on my machine.
> Raffaels version runs of course *much* faster, as you said ca. two
> orders of magnitude compared to the simple CL version and ca. 28 times
> faster than the Python code.

That was the original poster.  In followups people who chose to compile
the lisp function ended up running about as fast or faster than the
python.  Admittedly, I was expecting better without a lot of gyrations,
but it's not clear that there are any gyrations which make a similar
speedup possible in python.


Michael
From: ··············@hotmail.com
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135789320.543918.136880@g44g2000cwa.googlegroups.com>
Michael Sullivan wrote:
> André Thieme <·····························@justmail.de> wrote:
>
> > as the runtimes where it seems that ca. 96 seconds is less than ca. 169
> > seconds runtime. I have similar results on my machine.
> > Raffaels version runs of course *much* faster, as you said ca. two
> > orders of magnitude compared to the simple CL version and ca. 28 times
> > faster than the Python code.
>
> That was the original poster.  In followups people who chose to compile
> the lisp function ended up running about as fast or faster than the
> python.  Admittedly, I was expecting better without a lot of gyrations,
> but it's not clear that there are any gyrations which make a similar
> speedup possible in python.

Actually, the two Lisp implementations I tried (OpenMCL and SBCL on Mac
OS X, on a G4 Powerbook, both compiled) had substantially worse results
than the python, even with basic declarations (speed 3) (safety 0). I
attribute this to either a less efficient bignum implementation, and/or
some advantage Python has in its garbage collector (Does Python
reference count its bignums? Is "few bignums with very short lifetime"
a case where reference counting helps? Is Python better tuned to Mac OS
X/Unix virtual memory? etc, etc). That said, I only spent about 2
minutes trying speedups, enough to use up my attention span for this
problem.

If I cared about this particular case, these performance issues would
be important feedback to the implementors, and I would probably try
commercial implementations as well. I was actually surprised that
Python did so well.
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87u0ctdkab.fsf@qrnik.zagroda>
"···············@hotmail.com" <············@gmail.com> writes:

> I attribute this to either a less efficient bignum implementation,
> and/or some advantage Python has in its garbage collector (Does
> Python reference count its bignums?

Python uses classic reference counting (plus an extra collector which
detects cycles, but it probably has no effect here) and performs no
special optimizations for bignums. Bignum addition just checks the signs,
allocates the result with the maximum size of the arguments + 1, adds the
digits, and removes leading zeros. All in portable C. It uses 15 bits
per digit.
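The addition just described can be sketched in Python, with 15-bit digits held in little-endian lists. This is a model of the algorithm only, not the actual CPython C source, and the helper names are invented:

```python
SHIFT = 15
BASE = 1 << SHIFT        # 15 bits per digit, as described above
MASK = BASE - 1

def to_digits(n):
    """Non-negative int -> little-endian list of 15-bit digits."""
    digits = [n & MASK]
    n >>= SHIFT
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits

def add_digits(x, y):
    """Schoolbook addition: allocate max(len) + 1 digits, add with
    carry, then remove the leading zero if the carry never appeared."""
    if len(x) < len(y):
        x, y = y, x                  # make x the longer operand
    result = [0] * (len(x) + 1)      # "maximum size of arguments + 1"
    carry = 0
    for i in range(len(x)):
        carry += x[i] + (y[i] if i < len(y) else 0)
        result[i] = carry & MASK
        carry >>= SHIFT
    result[-1] = carry
    while len(result) > 1 and result[-1] == 0:
        result.pop()                 # remove leading zeros
    return result

def from_digits(digits):
    n = 0
    for d in reversed(digits):
        n = (n << SHIFT) | d
    return n
```

For example, from_digits(add_digits(to_digits(123456789), to_digits(987654321))) gives 1111111110; every step in the inner loop touches only machine-word-sized values, which is why the portable C version is already reasonably fast.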

I just measured the time of fib 1000000 in various language implementations
(better of two runs):

            | real | user : system |
            | m:ss |    s :    s   |
------------+------+------+--------+
Haskell/GHC | 2:14 |  134 :    0.3 |
Kogut       | 2:25 |  140 :    4.0 |
KoScheme    | 2:48 |  163 :    5.5 |
Python      | 5:11 |  309 :    0.8 |
Lisp/SBCL   | 5:36 |  272 :   63.9 |

The long system time of SBCL is interesting. Perhaps its GC is not
properly triggered by allocation of large bignums.

GHC uses the GMP library. It sets GMP allocation functions to use
the Haskell heap. The GMP library on x86 uses 32 bits per digit
and critical fragments are written in assembler.

Kogut uses GMP too. It leaves the default digit allocation functions
which use malloc. The GC is informed about the estimated amount of
external memory, so it's triggered earlier.

KoScheme is my toy interpreter of Scheme written in Kogut. It has a
high interpretation overhead; it uses Kogut numbers as the underlying
representation of Scheme numbers. The difference between the times of
Kogut and KoScheme estimates the interpretation overhead, and suggests
that most of Python's time is indeed spent in bignum operations rather
than in interpretation.

This is ghc-6.4.1, python-2.4.2, sbcl-0.9.0; Linux/x86 with 512MB
of RAM.

Here are the programs:

-------- Haskell --------

fib :: Integer -> Integer
fib 0 = 0
fib n = loop 0 1 n
    where
    loop a b n = if n > 1 then loop b (a + b) (n - 1) else b

main = print $ fib 1000000 `mod` 1000

-------- Kogut --------

let Fib n {
   if (n == 0) {0} else =>
   loop 0 1 n [
      a b (n > 1) {again b (a + b) (n - 1)}
      _ b _       {b}
   ]
};

WriteLine (Fib 1000000 %Mod 1000);

-------- Scheme --------

(define (fib n)
  (if (zero? n)
      0
      (let loop ((a 0) (b 1) (n n))
        (if (> n 1)
            (loop b (+ a b) (- n 1))
            b))))

(write (remainder (fib 1000000) 1000))
(newline)

-------- Python --------

def fib(n):
    if n == 0: return 0
    a = 0
    b = 1
    while n > 1:
        c = a + b
        a = b
        b = c
        n = n - 1
    return b

print fib(1000000) % 1000

-------- Lisp --------

(defun fib (n)
  (if (zerop n)
      0
      (do ((a 0 b)
           (b 1 (+ a b)))
          ((zerop (decf n)) b))))

(print (mod (fib 1000000) 1000))

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Juho Snellman
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <slrndr626n.8kt.jsnell@sbz-30.cs.Helsinki.FI>
<······@knm.org.pl> wrote:
> Lisp/SBCL   | 5:36 |  272 :   63.9 |
> 
> The long system time of SBCL is interesting. Perhaps its GC is not
> properly triggered by allocation of large bignums.

On Linux SBCL and CMUCL give memory back to the OS very aggressively
(essentially every bit of memory that the GC frees is munmapped and
then mmaped back in). This has bad performance characteristics on
modern systems, especially on this sort of pathological benchmark.

This is going to be changed for the next SBCL release. Of course that
has also been the intent for the last 5 releases or so, but this time
nothing can go wrong. ;-)

> This is ghc-6.4.1, python-2.4.2, sbcl-0.9.0; Linux/x86 with 512MB
> of RAM.

0.9.0 is a bit old, there's been a fair bit of GC work after that:

        user + system [s]
0.9.0  | 107.1
0.9.8  |  81.5
devel  |  47.2

And for comparison:

Python |  72.3

-- 
Juho Snellman
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87u0csfd41.fsf@qrnik.zagroda>
Juho Snellman <······@iki.fi> writes:

> 0.9.0 is a bit old, there's been a fair bit of GC work after that:
>
>         user + system [s]
> 0.9.0  | 107.1
> 0.9.8  |  81.5
> devel  |  47.2

Indeed. Here is the table after upgrading to sbcl-0.9.8:

            | real | user : system |
            | m:ss |    s :    s   |
------------+------+------+--------+
Haskell/GHC | 2:14 |  134 :    0.3 |
Kogut       | 2:25 |  140 :    4.0 |
KoScheme    | 2:48 |  163 :    5.5 |
Lisp/SBCL   | 4:23 |  202 :   61.0 |
Python      | 5:11 |  309 :    0.8 |

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: ··············@hotmail.com
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135872891.494806.254090@g49g2000cwa.googlegroups.com>
Marcin 'Qrczak' Kowalczyk wrote:
> "···············@hotmail.com" <············@gmail.com> writes:
>
> > I attribute this to either a less efficient bignum implementation,
> > and/or some advantage Python has in its garbage collector (Does
> > Python reference count its bignums?
>
> Python uses classic reference counting (plus an extra collector which
> detects cycles, but it probably has no effect here) and performs no
> special optimizations for bignums. Bignum addition just check signs,
> allocates the result with the maximum size of arguments + 1, adds
> digits, and removes leading zeros. All in portable C. It uses 15 bits
> per digit.

This (and Juho Snellman's comment about SBCL's interaction with the OS)
is informative. Thanks.

On the other hand, I was suspecting that there was some advantage to
reference counting when you have a small number of live objects, in
that it might allow for immediate reclamation within the inner loop, as
opposed to a less efficient scavenging of a lot of garbage when a GC
occurs in the Lisp, with worse memory locality during the computation,
to boot. I really don't know, because I pretty much learned about GC on
a street corner.
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87ek3v4nag.fsf@qrnik.zagroda>
"···············@hotmail.com" <············@gmail.com> writes:

> On the other hand, I was suspecting that there was some advantage to
> reference counting when you have a small number of live objects, in
> that it might allow for immediate reclamation within the inner loop,
> as opposed to a less efficient scavenging of a lot of garbage when a gc
> occurs in the Lisp, with worse memory locality during the compuation,
> to boot.

I don't know. GC statistics from the Kogut program looked strange to
me at first sight, but now they begin to make sense.

My GC has two heaps, young and old. Minor GC moves live data from the
young heap into the old heap. Major GC moves live data from both heaps
into the newly allocated old heap. The minor heap has 64kB by default.

Digits of bignums are allocated with malloc and they are not physically
moved during GC. Allocation or moving of an object with some associated
external memory, like a bignum, artificially brings the end of the heap
closer, so the GC will happen as early as if digits were allocated on
the regular heap. Objects which need primitive finalization are linked
in a list, young and old separately, and dead objects from these lists
are finalized, e.g. freeing the external memory.

The Fibonacci program has 3 large numbers alive at any given time.
In each step one of them is forgotten and a new one is computed.
They all grow at the same uniform speed relative to the number of
steps. There are a million steps, and at the end the numbers take
up 85kB each.

The program runs for about 140s. 14% of the time is spent in the GC.
It performs 513436 garbage collections in total, 9272 of which are
major GCs. At most 0.280MB of memory is alive at any given time;
this includes external memory. 20MB is allocated in total, and 1005MB
is moved by the GC; these numbers don't include external memory.

The count of GCs is very large, yet at the same time they
don't take much time. An average GC takes just 38 microseconds
and moves 2kB of memory.

What really happens near the end of the run is that each number takes
up more external memory than the young heap, so a GC happens after
each step. The number of GCs is more than half of the number of steps!
The young heap contains only one object, the number just allocated,
and it's of course alive. The 20 bytes it takes are moved to the old
heap, but it's accounted for as many kilobytes as its digits really
use. Once every 55 GCs, or 108 steps, the kilobytes add up to the
limit for the old heap, which happens to be a little over 4MB here,
and a major GC is performed, which frees all 108 numbers except the
last three of them.
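This accounting reduces to a toy model (class and method names invented; the 64kB young heap and ~85kB final bignum size are the figures above):

```python
class YoungHeap:
    """Toy model of the minor-heap accounting described above: external
    (malloc'd) bignum digits are charged against the young heap, so a
    collection triggers as early as if the digits lived there."""
    def __init__(self, size=64 * 1024):
        self.size = size
        self.used = 0
        self.minor_gcs = 0

    def allocate(self, header_bytes, external_bytes=0):
        # charge both the heap object and its external payload
        self.used += header_bytes + external_bytes
        if self.used >= self.size:
            self.minor_gcs += 1      # survivors promoted to the old heap
            self.used = 0

heap = YoungHeap()
# Near the end of the run each step allocates a ~85kB bignum (a small
# header plus external digits), so every step triggers a minor GC:
for step in range(100):
    heap.allocate(20, 85 * 1024)
print(heap.minor_gcs)  # -> 100
```

Since 85kB exceeds the 64kB young heap, the charge crosses the limit on every single allocation, matching "a GC happens after each step" above.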

I see two problems:

- 8MB of memory is constantly touched, instead of 120kB with
  refcounting, which increases cache pressure.

- If the program kept a lot of memory allocated besides these numbers,
  it would be much slower, because each major GC would shuffle them
  around (it does shuffle hundreds or thousands of objects, e.g.
  dictionaries of generic functions, but it's not much).

  This would not necessarily happen with another GC algorithm. Even
  having 3 generations instead of 2 would help a lot in this case.

I wonder how much this behavior differs from other GC algorithms.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Thomas Lindgren
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <m364p7bdf7.fsf@localhost.localdomain>
Marcin 'Qrczak' Kowalczyk <······@knm.org.pl> writes:

> What really happens near the end of the run is that each number takes
> up more external memory than the young heap, so a GC happens after
> each step. The number of GCs is more than half of the number of steps!
> The young heap contains only one object, the number just allocated,
> and it's of course alive. The 20 bytes it takes are moved to the old
> heap, but it's accounted for as many kilobytes as its digits really
> use. Once every 55 GCs, or 108 steps, the kilobytes add up to the
> limit for the old heap, which happens to be a little over 4MB here,
> and a major GC is performed, which frees all 108 numbers except the
> last three of them.
> 
> I see two problems:
> 
> - 8MB of memory is constantly touched, instead of 120kB with
>   refcounting, which increases cache pressure.
> 
> - If the program kept a lot of memory allocated besides these numbers,
>   it would be much slower, because each major GC would shuffle them
>   around (it does shuffle hundreds or thousands of objects, e.g.
>   dictionaries of generic functions, but it's not much).
> 
>   This would not necessarily happen with another GC algorithm. Even
>   having 3 generations instead of 2 would help a lot in this case.
> 
> I wonder how much this behavior differs from other GC algorithms.

If minor collections are a problem, maybe you should dynamically
adjust the size of the young generation?  For example, if a minor
collection shows that most of the heap was live, increase the young
generation size.
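A toy version of such a policy might look like this in Python (the thresholds and names are my invention, not a spec of what Thomas proposes):

```python
def adjust_nursery(nursery_kb, live_fraction,
                   grow_above=0.5, shrink_below=0.05,
                   min_kb=32, max_kb=1024):
    """Grow the young generation when minor GCs find mostly-live data
    (promotion is wasted work); shrink it when nearly everything is
    already dead (collections are cheap anyway)."""
    if live_fraction > grow_above:
        return min(nursery_kb * 2, max_kb)
    if live_fraction < shrink_below:
        return max(nursery_kb // 2, min_kb)
    return nursery_kb
```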

Best,
                        Thomas
-- 
Thomas Lindgren
"It's becoming popular? It must be in decline." -- Isaiah Berlin
 
From: Rob Warnock
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <Gs6dnS-_57EnnijeRVn-hw@speakeasy.net>
Marcin 'Qrczak' Kowalczyk  <······@knm.org.pl> wrote:
+---------------
| My GC has two heaps, young and old. Minor GC moves live data from the
| young heap into the old heap. Major GC moves live data from both heaps
| into the newly allocated old heap. The minor heap has 64kB by default.
+---------------

That sounds really small to me. The literature I've read on generational
GCs with a "nursery" [e.g., Ungar, Appel, or especially Wilson et al
<http://www.cs.utexas.edu/users/oops/papers.html#caching-gen-gc>] suggests
making the nursery [your young heap] be roughly the size of the CPU's
secondary cache [but not larger].

Also, be sure you keep the nursery in a fixed location [if you're not
already doing that], since the whole point of that approach is to keep
the memory in the nursery "hot" in the cache.


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <874q4qvcgm.fsf@qrnik.zagroda>
····@rpw3.org (Rob Warnock) writes:

> That sounds really small to me. The literature I've read on generational
> GCs with a "nursery" [e.g., Ungar, Appel, or especially Wilson et al
> <http://www.cs.utexas.edu/users/oops/papers.html#caching-gen-gc>] suggests making
> the nursery [your young heap] be roughly the size of the CPU's secondary
> cache [but not larger].

I have 256kB of cache. While in this non-natural program increasing
the young heap size helps (256kB is best), a more realistic program
(a compiler compiling itself; actually 23 invocations, one for each
module) is fastest with the young heap of 64kB:

nursery|one run |another
    kB |  s     |  s
-------+--------+--------
    32 | 19.517 | 19.373
    48 | 18.861 | 18.947
    64 | 18.648 | 18.683
    96 | 19.684 | 18.690
   128 | 18.982 | 18.858
   192 | 19.190 | 19.207
   256 | 19.576 | 19.315
   384 | 19.289 | 19.321
   512 | 19.234 | 19.238

I doubt it should be made exactly the cache size. Since the program
doesn't exclusively access objects it has just allocated, assuming
some LRU policy most parts of the nursery would be wiped out of the
cache just before they are read back during a GC.

> Also, be sure you keep the nursery in a fixed location [if you're
> not already doing that], since the whole point of that approach is
> to keep the memory in the nursery "hot" in the cache.

It is kept in the same location (unless a single object to be
allocated is larger; external memory like bignum digits and array
contents doesn't count).

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Rob Warnock
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <asGdnYgrw-yh-SvenZ2dnUVZ_tmdnZ2d@speakeasy.net>
Marcin 'Qrczak' Kowalczyk  <······@knm.org.pl> wrote:
+---------------
| ····@rpw3.org (Rob Warnock) writes:
| > That sounds really small to me. The literature I've read on generational
| > GCs with a "nursery" [e.g., Ungar, Appel, or especially Wilson et al...]
| > <http://www.cs.utexas.edu/users/oops/papers.html#caching-gen-gc>]
| > [suggests] making the nursery be roughly the size of the CPU's secondary
| > cache [but not larger].
| 
| I have 256kB of cache. While in this non-natural program increasing
| the young heap size helps (256kB is best), a more realistic program
| (a compiler compiling itself; actually 23 invocations, one for each
| module) is fastest with the young heap of 64kB: [results elided]
+---------------

Very interesting.  On the other hand, your results for each run
vary only 4-5% over the *very* wide range of nursery sizes. I'd
expect to see *much* larger variations [even if the minima were
at nursery sizes other than I would expect], so I'm wondering if
the bottleneck is somewhere else entirely other than the GC or
cache pressure.

+---------------
| I doubt it should be made exactly the cache size. Since the program
| doesn't exclusively access objects it has just allocated, assuming
| some LRU policy most parts of the nursery would be wiped out of the
| cache just before they are read back during a GC.
+---------------

You have a point, but the optimal size probably depends very much
on the size and type of cache. I would expect to see direct-mapped
caches behave very differently than 2- or 4-way set-associative caches.
For the latter, I'd expect nurseries below 1/2 the cache size to
cause almost *no* thrashing with older objects. And for the direct-
mapped case, if the basic generational assumption is correct, even
with the nursery size as large as the cache size there should be
much less thrashing than you might think, since -- again, assuming
most new objects die very young -- at any point in time very little
of the nursery actually contains "live" objects.

+---------------
| > Also, be sure you keep the nursery in a fixed location...
| 
| It is kept in the same location...
+---------------

O.k., just checking...  ;-}  ;-}

+---------------
| (unless a single object to be allocated is larger; external memory
| like bignum digits and array contents doesn't count).
+---------------

Yes, of course. A lot of the papers I've read on nursery schemes
suggest not allocating "large" objects in the nursery at all
[where "large" is 1/64th to 1/8th the size of the nursery or more.]
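That policy is simple to sketch in Python (the fraction is from the range in those papers; the function and names are mine):

```python
def allocation_site(size_bytes, nursery_bytes, large_fraction=1 / 8):
    """Objects at least `large_fraction` of the nursery size bypass
    the nursery and go to a separate large-object space."""
    if size_bytes >= nursery_bytes * large_fraction:
        return "large-object space"
    return "nursery"
```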


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87bqyx2oaj.fsf@qrnik.zagroda>
····@rpw3.org (Rob Warnock) writes:

> Very interesting.  On the other hand, your results for each run
> vary only 4-5% over the *very* wide range of nursery sizes. I'd
> expect to see *much* larger variations [even if the minima were
> at nursery sizes other than I would expect], so I'm wondering if
> the bottleneck is somewhere else entirely other than the GC or
> cache pressure.

I don't know. GC takes 10%-25% of the time. Hmm, I think it used to be
around 10% a few months ago.

The compiler uses lots of small objects with varying lifetime, mostly
short. It basically transforms trees through several stages.

About 80% of objects die before the first GC. There are 0-2 major GCs
per run and hundreds or thousands of minor GCs. A run allocates tens
of megabytes in total and has at most a few MB alive at any given time.

> A lot of the papers I've read on nursery schemes suggest not
> allocating "large" objects in the nursery at all [where "large"
> is 1/64th to 1/8th the size of the nursery or more.]

This would complicate the implementation, because initialization of
a newly allocated object would have to use a write barrier if it's
possible that the object got placed in the old heap.

Heap sizes are settable by environment variables, so I would have to
pick an arbitrary minimal size, to be able to omit write barriers for
smaller sizes.

Several hand-written C fragments would have to be prepared for the
fact that a newly allocated object could be in the old heap. For
example conversion of an array to a list allocates all cons cells
in one step and fills the memory block.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Lars Brinkhoff
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <85mzic2n73.fsf@junk.nocrew.org>
Marcin 'Qrczak' Kowalczyk writes:
> Rob Warnock writes:
> > A lot of the papers I've read on nursery schemes suggest not
> > allocating "large" objects in the nursery at all.
> This would complicate the implementation because initialization of a
> newly allocated object would have to use write barrier if it's
> possible that the object got placed in the old heap.

I believe it's usually suggested to put large objects in a separate
heap, not the old heap.  In that case, no write barrier would be
needed, right?
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87psn8148y.fsf@qrnik.zagroda>
Lars Brinkhoff <·········@nocrew.org> writes:

>> > A lot of the papers I've read on nursery schemes suggest not
>> > allocating "large" objects in the nursery at all.
>> This would complicate the implementation because initialization of a
>> newly allocated object would have to use write barrier if it's
>> possible that the object got placed in the old heap.
>
> I believe it's usually suggested to put large objects in a separate
> heap, not the old heap.  In that case, no write barrier would be
> needed, right?

I know three ways of handling locations which may point to young
objects but aren't themselves contained in young objects: scan them
on every GC, partition them into generations like objects, or use a
write barrier to scan only recently modified locations on minor GCs.
It doesn't matter whether the location is in the old heap or elsewhere.
It must be found somehow during minor GC.
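A minimal Python model of the third option, the write barrier with a remembered set (names are mine; real barriers usually record locations or cards rather than whole objects, but the filtering condition is the same):

```python
class Heap:
    """Toy two-generation heap: a store that might create a pointer
    from outside the young heap into it is recorded in a remembered
    set, so a minor GC scans that set instead of everything old."""
    def __init__(self):
        self.young = set()
        self.old = set()
        self.remembered = set()
        self.fields = {}        # holder -> list of referenced objects

    def store(self, holder, obj):
        self.fields.setdefault(holder, []).append(obj)
        if holder not in self.young and obj in self.young:
            self.remembered.add(holder)     # the write barrier

    def minor_gc_extra_roots(self):
        """Locations a minor GC must scan besides the young heap."""
        return self.remembered
```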

If the whole large object is put on a separate heap (without a header
object in the young heap) and it may refer to young objects without a
write barrier, then something must be designed to avoid scanning all
large objects on every GC.

Large objects can be linked in lists and marked in some header bits,
such that they logically consist of several generations even though
they are not physically moved when promoted to an older generation.

If the payload is allocated separately but the header is in the young
heap, then the payload could be scanned whenever the header is scanned,
i.e. it logically belongs to the same object, taking advantage of the
fact that there are no pointers to the payload other than from the
header, so its lifetime doesn't have to be managed separately. The
only problem is that the object representation has one more level of
indirection (or has it only for some payload sizes), so code accessing
it must be more complex.

                          *       *       *

My GC uses different techniques in different places. It's suboptimal
because it's the only GC I've ever implemented, and thus I wanted it
to be simple so I can understand it well. In particular there are
almost no choices of formats or algorithms depending on object sizes:
each type is implemented in a single way (except that strings are
internally a mixture of ISO-8859-1 and UTF-32).

A major GC is similar to a minor GC: all heap-allocated objects are
moved to a separate place. So at those points the program requires
about twice as much memory as it really uses, and the pause time can be
long. The runtime prefers to perform GC when all threads are waiting
for something.

Some types have payloads allocated separately from the header.
My language has vectors (with the size fixed at construction time),
strings (immutable), and arrays (with two internal fill pointers at
both ends). Arrays have the payload allocated with malloc, so they are
only scanned, not moved, and they don't participate in the doubling of
the amount of memory. Let's not talk about specialized element types.

Since the payload of an array can be reallocated when more elements
are added, I couldn't use a regular write barrier which stores
modified locations. The location might get invalidated, and there
is no way to find it again except by performing a GC or scanning all
locations stored by the write barrier (unless I used some bitmapped
write barrier instead of an array of locations, but this would be
much harder to get right).

So in the case of arrays the modified arrays themselves are remembered,
instead of modified individual locations inside them. By itself
this would lead to scanning the whole array during GC if any of
its elements was modified. So I'm keeping the length of the initial
segment of an array which is known to not point to young objects.
Arrays are more often appended at the end than prepended.
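A Python sketch of that array scheme (names are mine; the real version also has to cope with payload reallocation, which this toy ignores):

```python
class RememberedArray:
    """Remember modified arrays rather than individual slots, plus the
    length of the initial segment known to hold no young pointers, so
    a minor GC scans only the dirty suffix."""
    def __init__(self, items):
        self.items = list(items)
        self.clean_prefix = len(self.items)   # no young pointers yet
        self.dirty = False

    def store(self, index, value, value_is_young):
        self.items[index] = value
        if value_is_young:
            self.dirty = True
            self.clean_prefix = min(self.clean_prefix, index)

    def slots_to_scan(self):
        """What a minor GC must look at for this array."""
        return self.items[self.clean_prefix:] if self.dirty else []
```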

A very similar technique is used for thread stacks. Only the changed
part of each stack is scanned by minor GC. This is distinguished
by marking a bit in a pointer to a frame descriptor in each frame,
relying on the fact that only the topmost frame is mutable: marked
frames except the topmost one must have been pushed earlier than
the last GC, so they can't point to young objects.

Payloads of bignums are also allocated separately (I'm using the
GMP library). They don't have to be scanned at all of course.

Global, statically allocated variables are partitioned into
generations and use a regular write barrier. This includes fields
of statically allocated objects which may point to heap-allocated
objects. This means that a minor GC doesn't have to scan all global
variables, only those from recently initialized modules and those
which have been modified since last GC.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Lars Brinkhoff
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <85oe2r0xpv.fsf@junk.nocrew.org>
Marcin 'Qrczak' Kowalczyk writes:
> Lars Brinkhoff writes:
> > > > A lot of the papers I've read on nursery schemes suggest not
> > > > allocating "large" objects in the nursery at all.

> > > This would complicate the implementation because initialization
> > > of a newly allocated object would have to use write barrier if
> > > it's possible that the object got placed in the old heap.

> > I believe it's usually suggested to put large objects in a
> > separate heap, not the old heap.  In that case, no write barrier
> > would be needed, right?

> If the whole large object is put on a separate heap (without a header
> object in the young heap) and it may refer to young objects without
> a write barrier, then something must be designed to avoid scanning
> all large objects on every GC.

I think the large objects referred to in GC papers usually don't have
any pointers in them.  It would seem that, by mallocing space for such
objects, you do in effect have such a separate heap.
From: Matthias
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <36wlky42o6p.fsf@hundertwasser.ti.uni-mannheim.de>
Marcin 'Qrczak' Kowalczyk <······@knm.org.pl> writes:

> I just measured the time of fib 1000000 in various language implementations
> (better of two runs):
> 
>             | real | user : system |
>             | m:ss |    s :    s   |
> ------------+------+------+--------+
> Haskell/GHC | 2:14 |  134 :    0.3 |
> Kogut       | 2:25 |  140 :    4.0 |
> KoScheme    | 2:48 |  163 :    5.5 |
> Python      | 5:11 |  309 :    0.8 |
> Lisp/SBCL   | 5:36 |  272 :   63.9 |
> 
[...]
> -------- Python --------
> 
> def fib(n):
>     if n == 0: return 0
>     a = 0
>     b = 1
>     while n > 1:
>         c = a + b
>         a = b
>         b = c
>         n = n - 1
>     return b
> 
> print fib(1000000) % 1000

Python 2.2.3 gains about 30% speed if you write the algorithm more concisely:

===
def fib(n):
    if n == 0: return 0
    (a, b) = (0, 1)
    while n > 1:
        (a, b) = (b, a + b)
        n = n - 1
    return b

print fib(1000000) % 1000
===

Best wishes, 

  Matthias
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87bqz0otub.fsf@qrnik.zagroda>
Matthias <··@spam.please> writes:

> Python 2.2.3 gains about 30% speed if you write the algorithm more
> concisely:

I gained only 2.6%, and it's hard to believe that you gained 30%.
There is no difference in bignum operations, and they seem to
dominate here.

The difference between KoScheme and Kogut is about as large as an
interpretation overhead can be, and it's only 16%. Two Python versions
are both interpreted from similar bytecode. Where would the 30%
difference come from?

I would expect Python bignum operations to be about twice as slow as
in an implementation using GMP, because it uses digits half as wide,
and because it's portable C rather than assembler (C doesn't provide
good access to the carry flag).
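The digit-width part of that estimate can be checked with simple arithmetic (the widths are my assumptions: CPython 2.x stores bignums in 15-bit digits, while GMP uses full machine words, 32 bits on the hardware discussed):

```python
def limb_count(bits, limb_width):
    """Digit operations a schoolbook bignum addition needs."""
    return -(-bits // limb_width)     # ceiling division

FIB_1M_BITS = 694_242                 # ~size of fib(1_000_000)
ratio = limb_count(FIB_1M_BITS, 15) / limb_count(FIB_1M_BITS, 32)
# ratio is a bit over 2, consistent with "about twice as slow"
```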

Updated Python:

            | real | user : system |
            | m:ss |    s :    s   |
------------+------+------+--------+
Haskell/GHC | 2:14 |  134 :    0.3 |
Kogut       | 2:25 |  140 :    4.0 |
KoScheme    | 2:48 |  163 :    5.5 |
Lisp/SBCL   | 4:23 |  202 :   61.0 |
Python      | 5:03 |  299 :    1.0 |

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Pascal Costanza
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <41flboF1ejfbeU1@individual.net>
André Thieme wrote:
> Thomas F. Burdick schrieb:
> 
> 
>>"André Thieme" <·····························@justmail.de> writes:
>>
>>>Oh my...
>>>In Python one simply writes down the function and it works pretty fast.
>>>As Emre Sevinc asked a few days ago: do I have to be an expert to get
>>>performance?
>>
>>What are you smoking?  The example Python given by the OP is just as
>>slow as the simple CL implementations.  Sure, the example that Rafael
>>gave is much more frightening than the clear and simple Python and CL
>>examples, but it also runs about 100x faster (on my G3).  If you write
>>a simple fib function in CL, it will be shorter and more readable than
>>the Python, and run at the same speed.
> 
> If the Python code given by the OP is just as slow as the simple CL
> code then I missed something, sorry.
> I interpreted
[...]
> as the runtimes where it seems that ca. 96 seconds is less than ca. 169
> seconds runtime. I have similar results on my machine.
> Raffaels version runs of course *much* faster, as you said ca. two
> orders of magnitude compared to the simple CL version and ca. 28 times
> faster than the Python code.

You are comparing extreme cases. Again, as other posters said, comparing 
the efficiency of implementations of the Fibonacci function is not a 
very interesting case. As has not been said in this thread yet, micro 
benchmarks don't tell you anything about the efficiency characteristics 
of large programs.

Raffael's version is probably (I haven't checked) a highly optimized 
version. The original CL version wasn't optimized at all. You don't have 
to be an expert to do better than that, you only have to be an expert if 
you want to squeeze out the final drop of efficiency. But that's the 
case for any language.

My guess is that if you are relatively well-versed in Python, you will 
probably be able to implement reasonably efficient applications. Again, 
that's the case for most languages. (You have to know something about 
the specific strengths of a language, some general idea about the 
language implementation and something about the typical idioms.)

One important thing, however, is that if efficiency in Python becomes 
really important (for example, for the "hot spots" of your application) 
it seems that you have to switch to C to implement those parts 
efficiently. The nice thing about Common Lisp is that you can stay 
within Common Lisp to turn a non-optimized set of functions into an 
efficient one. I think it's a big advantage that you don't have to 
switch your set of tools when your goals change slightly.


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: Peter Scott
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135793138.853120.72660@f14g2000cwb.googlegroups.com>
Pascal Costanza wrote:
> Raffael's version is probably (I haven't checked) a highly optimized
> version. The original CL version wasn't optimized at all. You don't have
> to be an expert to do better than that, you only have to be an expert if
> you want to squeeze out the final drop of efficiency. But that's the
> case for any language.

Raffael's version uses a much better algorithm, which requires O(log N)
2x2 matrix multiplications instead of O(N) additions. This really pays
off when computing *large* Fibonacci numbers.
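For reference, a fast-doubling sketch of that O(log N) idea in Python (a standard identity-based version, not Raffael's actual code, which uses Lucas-style squaring):

```python
def fib(n):
    """Fast doubling: F(2k) = F(k)*(2*F(k+1) - F(k)) and
    F(2k+1) = F(k)**2 + F(k+1)**2, giving O(log n) bignum multiplies."""
    def pair(k):                 # returns (F(k), F(k+1))
        if k == 0:
            return (0, 1)
        a, b = pair(k // 2)
        c = a * (2 * b - a)      # F(2m)   where m = k // 2
        d = a * a + b * b        # F(2m+1)
        return (d, c + d) if k % 2 else (c, d)
    return pair(n)[0]
```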

-Peter
From: Thomas F. Burdick
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <xcvslscmb1f.fsf@conquest.OCF.Berkeley.EDU>
"André Thieme" <·····························@justmail.de> writes:

> as the runtimes where it seems that ca. 96 seconds is less than ca. 169
> seconds runtime. I have similar results on my machine.
> Raffaels version runs of course *much* faster, as you said ca. two
> orders of magnitude compared to the simple CL version and ca. 28 times
> faster than the Python code.

In the context of two very non-performant solutions for something, I
think that 175% difference in speed is not all that much, particularly
since you're using the worst of the CL examples.  The simplest, the
one using DO, runs on my system, using SBCL, in about 115% the time of
the Python.  The optimized version (running on SBCL) ran in 0.010% the
time of either of them.  Or, run by CLISP, it completes in 0.0010% of
the time of either of the two naive versions.

My point is just that they're close enough in this example.  You can
only tell them apart by looking very carefully at the bad end of the
curve.  (To see something you can tell apart, try running one of the
naive implementations in CLISP).

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | Free Mumia Abu-Jamal! |
     ,--'    _,'   | Abolish the racist    |
    /       /      | death penalty!        |
   (   -.  |       `-----------------------'
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Thomas F. Burdick
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <xcvmzikmaso.fsf@conquest.OCF.Berkeley.EDU>
···@conquest.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> In the context of two very non-performant solutions for something, I
> think that 175% difference in speed is not all that much, particularly
> since you're using the worst of the CL examples.  The simplest, the
> one using DO, runs on my system, using SBCL, in about 115% the time of
> the Python.  The optimized version (running on SBCL) ran in 0.010% the
> time of either of them.  Or, run by CLISP, it completes in 0.0010% of
> the time of either of the two naive versions.

I pasted too hastily from the REPL.  What I meant to write, of course,
was 1% and 0.1%.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | Free Mumia Abu-Jamal! |
     ,--'    _,'   | Abolish the racist    |
    /       /      | death penalty!        |
   (   -.  |       `-----------------------'
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Michael
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <slrndr5vkc.tu.malus42@yahoo.com>
On 2005-12-28, André Thieme <·····························@justmail.de> wrote:
>  Oh my...
>  In Python one simply writes down the function and it works pretty fast.
>  As Emre Sevinc asked a few days ago: do I have to be an expert to get
>  performance?

Depends on what you call an expert. They start teaching computer science
majors the importance of proper algorithm selection at least by the second
year of college.

So if by "expert" you mean a college sophomore then yes, you need to be an
"expert".

:)
From: Raffael Cavallaro
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <2005122818450116807-raffaelcavallaro@pasdespamsilvousplaitmaccom>
On 2005-12-28 05:05:45 -0500, "André Thieme" 
<·····························@justmail.de> said:


> Oh my...
> In Python one simply writes down the function and it works pretty fast.
> As Emre Sevinc asked a few days ago: do I have to be an expert to get
> performance?

Here's a version in common lisp that is nearly as fast but much less scary:

(defun square (x)
 "This just squares an integer."
  (declare (integer x) (optimize (speed 3) (safety 0) (debug 0) 
(compilation-speed 0) (space 0)))
  (the integer (* x x)))

(defun fib (n)
  (declare (integer n) (optimize (speed 3) (safety 0) (debug 0) 
(compilation-speed 0) (space 0)))
  (cond
   ((= n 0) 0)
   ((< n 3) 1)
   ((evenp n) (- (square (fib (+ (truncate n 2) 1)))
                 (square (fib (- (truncate n 2) 1)))))
   (t (+ (square (fib (truncate n 2)))
         (square (fib (+ (truncate n 2) 1)))))))

regards,

Ralph
From: John M
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <2vKdnQ3mA7Y8oy7enZ2dnUVZ_sidnZ2d@comcast.com>
Raffael Cavallaro wrote:

> On 2005-12-28 05:05:45 -0500, "André Thieme"
> <·····························@justmail.de> said:
> 
> 
>> Oh my...
>> In Python one simply writes down the function and it works pretty fast.
>> As Emre Sevinc asked a few days ago: do I have to be an expert to get
>> performance?
> 
> Here's a version in common lisp that is nearly as fast but much less
> scary:
> 
> (defun square (x)
>  "This just squares an integer."
>   (declare (integer x) (optimize (speed 3) (safety 0) (debug 0)
> (compilation-speed 0) (space 0)))
>   (the integer (* x x)))
> 
> (defun fib (n)
>   (declare (integer n) (optimize (speed 3) (safety 0) (debug 0)
> (compilation-speed 0) (space 0)))
>   (cond
>    ((= n 0) 0)
>    ((< n 3) 1)
>    ((evenp n) (- (square (fib (+ (truncate n 2) 1)))
>                  (square (fib (- (truncate n 2) 1)))))
>    (t (+ (square (fib (truncate n 2)))
>          (square (fib (+ (truncate n 2) 1)))))))
> 
> regards,
> 
> Ralph

For the record
a similar Python version:

def square (x): return x*x

def fib (n):
  if n == 0: return 0
  elif n < 3: return 1
  elif n%2:
    return square(fib(n/2)) + square(fib(n/2 + 1))
  else:
    return square(fib(n/2 + 1)) - square(fib(n/2 - 1))

for i in range(11): print fib(i),
fib (1000000)

% time python fib-raffael2.py
0 1 1 2 3 5 8 13 21 34 55
2.871u 0.021s 0:02.99 96.6%     893+2062k 0+0io 0pf+0w

% time lisp -load fib-raffael2.lisp
; Loading #p"/home/test/fib-raffael2.lisp".
0 1 1 2 3 5 8 13 21 34 55
16.730u 0.014s 0:17.41 96.1%    121+2381k 0+0io 0pf+0w

(no printing though)
From: Raffael Cavallaro
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <2005122821302075249-raffaelcavallaro@pasdespamsilvousplaitmaccom>
On 2005-12-28 20:40:47 -0500, John M <·····@yahoo.com> said:

> For the record
> a similar Python version:



For the record, your python version:

rafg5:~/Developer raffaelc$ time python ./fib-john.py
0 1 1 2 3 5 8 13 21 34 55

real    0m3.186s
user    0m2.877s
sys     0m0.158s


and the same in sbcl:

rafg5:~/Developer raffaelc$ sbcl --sysinit /dev/null --userinit 
/dev/null --load "/raflisp/fib-lucas-bench.fasl"
This is SBCL 0.9.4, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
0 1 1 2 3 5 8 13 21 34 55
Evaluation took:
  2.833 seconds of real time
  2.802525 seconds of user run time
  0.016415 seconds of system run time
  0 page faults and
  5,052,104 bytes consed.



So your timing of 17 seconds is way out of line with what others are seeing.

Now let's actually print the result:

rafg5:~/Developer raffaelc$ time python ./fib-john-print.py

[lots of digits elided]

real    0m33.492s
user    0m32.540s
sys     0m0.295s

and in sbcl, with printing:

rafg5:~/Developer raffaelc$ sbcl --sysinit /dev/null --userinit 
/dev/null --load "/raflisp/fib-print-bench.fasl"

[lots of digits elided]

Evaluation took:
  6.472 seconds of real time
  6.365908 seconds of user run time
  0.043454 seconds of system run time
  0 page faults and
  9,335,672 bytes consed.

So sbcl is 5 times as fast here.

Again, what is it that is actually important to you? Do you really need 
to compute large fibonacci numbers? If so, then  get a language 
implementation that has a high performance bignum library.

Finally, please realize that the benefits of lisp are not seen in toy 
benchmarks, but in large systems which take advantage of lisp macros 
and domain specific languages. Many correspondents here see the fact 
that lisp code is also lisp data as the key to the power of lisp. You 
can write powerful macros in the very same language that you write the 
rest of your app in because your macros use the full power of the 
language to manipulate code as data. This is a powerful feature which 
does not really show returns in tiny toy benchmarks like these, but 
only in larger more complex systems. In these larger systems, you end 
up writing your own mini language "on lisp"[1] and solving your problem 
in this new domain specific language.

regards,

Ralph

[1] hence Paul Graham's book title.
From: John M
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <74ednRIWVJz4kCnenZ2dnUVZ_sSdnZ2d@comcast.com>
Raffael Cavallaro wrote:

> On 2005-12-28 20:40:47 -0500, John M <·····@yahoo.com> said:
> 
>> For the record
>> a similar Python version:
> 
> 
> 
> For the record, your python version:
> 
> rafg5:~/Developer raffaelc$ time python ./fib-john.py
> 0 1 1 2 3 5 8 13 21 34 55
> 
> real    0m3.186s
> user    0m2.877s
> sys     0m0.158s
> 
> 
> and the same in sbcl:
> 
> rafg5:~/Developer raffaelc$ sbcl --sysinit /dev/null --userinit
> /dev/null --load "/raflisp/fib-lucas-bench.fasl"
> This is SBCL 0.9.4, an implementation of ANSI Common Lisp.
> More information about SBCL is available at <http://www.sbcl.org/>.
> 
> SBCL is free software, provided as is, with absolutely no warranty.
> It is mostly in the public domain; some portions are provided under
> BSD-style licenses.  See the CREDITS and COPYING files in the
> distribution for more information.
> 0 1 1 2 3 5 8 13 21 34 55
> Evaluation took:
>   2.833 seconds of real time
>   2.802525 seconds of user run time
>   0.016415 seconds of system run time
>   0 page faults and
>   5,052,104 bytes consed.
> 
> 
> 
> So your timing of 17 seconds is way out of line with what others are
> seeing.
> 
> Now let's actually print the result:
> 
> rafg5:~/Developer raffaelc$ time python ./fib-john-print.py
> 
> [lots of digits elided]
> 
> real    0m33.492s
> user    0m32.540s
> sys     0m0.295s
> 
> and in sbcl, with printing:
> 
> rafg5:~/Developer raffaelc$ sbcl --sysinit /dev/null --userinit
> /dev/null --load "/raflisp/fib-print-bench.fasl"
> 
> [lots of digits elided]
> 
> Evaluation took:
>   6.472 seconds of real time
>   6.365908 seconds of user run time
>   0.043454 seconds of system run time
>   0 page faults and
>   9,335,672 bytes consed.
> 
> So sbcl is 5 times as fast here.
> 
> Again, what is it that is actually important to you? Do you really need
> to compute large fibonacci numbers? If so, then  get a language
> implementation that has a high performance bignum library.
> 
> Finally, please realize that the benefits of lisp are not seen in toy
> benchmarks, but in large systems which take advantage of lisp macros
> and domain specific languages. 

What is your view of the Gabriel benchmark suite for measuring performance
across these language implementations?  Is this suite available for Python?

> Many correspondents here see the fact 
> that lisp code is also lisp data as the key to the power of lisp. You
> can write powerful macros in the very same language that you write the
> rest of your app in because your macros use the full power of the
> language to manipulate code as data. This is a powerful feature which
> does not really show returns in tiny toy benchmarks like these, but
> only in larger more complex systems. In these larger systems, you end
> up writing your own mini language "on lisp"[1] and solving your problem
> in this new domain specific language.
> 
> regards,
> 
> Ralph
> 
> [1] hence Paul Graham's book title.
From: Raffael Cavallaro
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <2005122914534950073-raffaelcavallaro@pasdespamsilvousplaitmaccom>
On 2005-12-29 11:23:01 -0500, John M <·····@yahoo.com> said:

> 
> What is your view of the Gabriel benchmark suite for measuring performance
> across these language implementations?  Is this suite available for Python?

<http://www.chez.com/emarsden/downloads/cl-bench.tar.gz>


is a set of common-lisp performance benchmarks along with instructions 
on how to run them for different lisp implementations. They are not 
available so far as I know for Python, but given your skill in 
translating lisp benchmarks into Python I don't think this will be much 
of a problem for you. I would suggest that you find the particular 
benchmarks that interest you (e.g., arrays) and translate them into 
Python for comparison. Again, if you're interested in performance I 
suggest you find an implementation that performs well in the area that 
most concerns you. You can do this by running the cl-bench suite in all 
of the common lisp implementations you have available to you.

Finally, if you're going to run Python benchmarks from the command line:

python ./somebenchmark.py

I suggest you make sure that your common lisp is not loading any init 
files when it starts, especially if you use the shell time command 
rather than a call to common lisp's time function within your benchmark 
itself. Otherwise you'll be timing the load of some init file, which 
your benchmark probably doesn't need or use, rather than the benchmark. 
For example, in sbcl:

rafg5:~/Developer raffaelc$ sbcl --sysinit /dev/null --userinit 
/dev/null --load "/raflisp/fib-lucas-bench.fasl"

which loads no system init file, no user init file, and loads a 
compiled fasl file (not a lisp source file) which includes a call to 
the common lisp time function.


regards,

Ralph
From: John M
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <mcmdnbYPC_y8cS_eRVn-iA@comcast.com>
Raffael Cavallaro wrote:

> On 2005-12-27 23:10:24 -0500, John M <·····@yahoo.com> said:
> 
>> I was surprised at the result since I had read that
>> Lisp was the faster language.
> 
> As other replies have suggested, you need to compile your lisp code
> first, and algorithm and bignum libraries matter quite a bit here.
> 
> This version:
> 
> (defun fib (x)
>   (declare (optimize (speed 3) (safety 0) (debug 0) (compilation-speed
> 0) (space 3))
>            (integer x))
>   (if (< x 1) 0
>     (let ((n (1- x)))
>       (labels ((fib-aux (number)
>                  (case number
>                    ((0) (values 1 0))
>                    ((1) (values 0 1))
>                    (otherwise (progn
>                                 (let ((k (floor number 2)))
>                                   (multiple-value-bind (a b) (fib-aux k)
>                                     (let* ((aa (* a a))
>                                            (bb (* b b))
>                                            (ab2 (* 2 a b))
>                                            (ab2bb (+ ab2 bb)))
>                                       (if (= number (* 2 k))
>                                         (values (+ aa bb) ab2bb)
>                                         (values ab2bb (+ ab2bb aa
>                                         bb)))))))))))
>         (if (<= n 1) 1
>           (progn
>             (let ((k (floor n 2)))
>               (multiple-value-bind (a b) (fib-aux k)
>                 (let ((ab (+ a b)))
>                   (if (= n (* 2 k))
>                     (+ (* ab ab) (* b b))
>                     (+ (* ab ab) (* 2 b ab))))))))))))
> 
> computes fib (expt 10 6) in about 2 seconds on a dual G5 2 GHz
> (although only 1 cpu is used by sbcl):
> 
> * (time (fib (expt 10 6)))
> 
> Evaluation took:
>   2.048 seconds of real time
>   1.995197 seconds of user run time
>   0.010075 seconds of system run time
>   0 page faults and
>   909,680 bytes consed.
>
19532821287077577316320149475962563324435429965918733969534051945716252578870156947666419876341501461288795243352202360846255109120195602337440154381151966361569199621256428943033701138278006380027674115279274666698655783793188228320612714975832303348548934895725992307229129019282092643316275217308614600179125820426996599360209593392020051848620284024473431398113674187202038684801753185386211128781082406177413832935545616876064540651259547180291265479428940369816592063610193592913521354103767990829403201557027161153950319759732477821629576316296533566947776632850623452455934606475750259358134434578167676462587885901137272990737294785114480895724561915035070255895291168685500088020132334587472177947814475467920160901706425856293597475465327575757400774320349134287851897953543047345603077650789387672865391667992328174493619915237681495576320853710478597061884387315305823956275608790631078190049751695947097367138917457045552021351233507944033607120305041446852210415650373210679322756258647511914611417360349681217380234224786080292021093192496490409832397066832470544417635125267324552754195016838452060230073949598542792982978312043821157576457876924955833514025221527206624418090032593807536284917966809529711850719137983367887377045991363933395581421203699026161797211325091840023055327607104316478190974300434647793363287601469996128023925829471557316688943339455429292871877487747892042961663565366107960239197021097284729667094273345863447980486339446352116549715072613427682054793209317507988801013041602798250635418234403455874223670128266635693461129461312312838906003654732766024569315151850018328483150645480029978935985161237074046158229354440701748339514575869547491750264542126364262224720600488554625899611904758921012242805428986215946466624785643735722177755498760876859120301185516356689020103446399839773266388890365078416180709154525299275973521395741547772914600879431433915606044582510782351166271892637923313014643880597879468444879060576786297460989627426663569682
474293386740207436559426057944790711930522589315...
> 
> 
> The scheme version it is based on:
> 
> (define (fib x)
>   (if (< x 1) 0
>       (let ((n (- x 1)))
>         (letrec ((fib-aux
>                   (match-lambda
>                    (0 (values 1 0))
>                    (1 (values 0 1))
>                    (n (let ((k (quotient n 2)))
>                         (receive (a b) (fib-aux k)
>                           (let* ((aa (* a a))
>                                  (bb (* b b))
>                                  (ab2 (* 2 a b))
>                                  (ab2bb (+ ab2 bb)))
>                             (if (= n (* 2 k))
>                                 (values (+ aa bb) ab2bb)
>                                 (values ab2bb (+ ab2bb aa bb))))))))))
>           (if (<= n 1)
>               1
>               (let ((k (quotient n 2)))
>                 (receive (a b) (fib-aux k)
>                   (let ((ab (+ a b)))
>                     (if (= n (* 2 k))
>                         (+ (* ab ab) (* b b))
>                         (+ (* ab ab) (* 2 b ab)))))))))))
> 
> when compiled with chicken using the gmp library, computes (fib (expt
> 10 6)) in less than a tenth of a second:
> 
>  )   ___
> (__/_____) /)   ,    /)
>   /       (/      _ (/_   _ __
>  /        / )__(_(__/(___(/_/ (_
> (______)
> Version 2, Build 216 - macosx-unix-gnu-ppc - [ libffi dload ptables ]
> (c)2000-2005 Felix L. Winkelmann
> #;1> (use fib-aux-cx)
> ; loading /usr/local/lib/chicken/fib-aux-cx.so ...
> ; loading /usr/local/lib/chicken/numbers-base.so ...
> #;2> (time (fib (expt 10 6)))
>    0.089 seconds elapsed
>        0 seconds in (major) GC
>      105 mutations
>       10 minor GCs
>        0 major GCs
>
19532821287077577316320149475962563324435429965918733969534051945716252578870156947666419876341501461288795243352202360846255109120195602337440154381151966361569199621256428943033701138278006380027674115279274666698655783793188228320612714975832303348548934895725992307229129019282092643316275217308614600179125820426996599360209593392020051848620284024473431398113674187202038684801753185386211128781082406177413832935545616876064540651259547180291265479428940369816592063610193592913521354103767990829403201557027161153950319759732477821629576316296533566947776632850623452455934606475750259358134434578167676462587885901137272990737294785114480895724561915035070255895291168685500088020132334587472177947814475467920160901706425856293597475465327575757400774320349134287851897953543047345603077650789387672865391667992328174493619915237681495576320853710478597061884387315305823956275608790631078190049751695947097367138917457045552021351233507944033607120305041446852210415650373210679322756258647511914611417360349681217380234224786080292021093192496490409832397066832470544417635125267324552754195016838452060230073949598542792982978312043821157576457876924955833514025221527206624418090032593807536284917966809529711850719137983367887377045991363933395581421203699026161797211325091840023055327607104316478190974300434647793363287601469996128023925829471557316688943339455429292871877487747892042961663565366107960239197021097284729667094273345863447980486339446352116549715072613427682054793209317507988801013041602798250635418234403455874223670128266635693461129461312312838906003654732766024569315151850018328483150645480029978935985161237074046158229354440701748339514575869547491750264542126364262224720600488554625899611904758921012242805428986215946466624785643735722177755498760876859120301185516356689020103446399839773266388890365078416180709154525299275973521395741547772914600879431433915606044582510782351166271892637923313014643880597879468444879060576786297460989627426663569682
474293386740207436559426057944790711930522589315...
> 
> So, compile your lisp code, find better algorithms, and, as suggested by
> Bruce Hoult:
> "Make sure you're measuring something that is important to you."

FWIW, here is Raffael's algorithm coded in python.
I tried to translate it almost line by line.

from math import floor

def fib_aux (number):
  if number == 0: return (1, 0)
  elif number == 1: return (0, 1)
  else:
    k = floor (number/2)
    a,b = fib_aux (k)
    aa = a*a
    bb = b*b
    ab2 = 2*a*b
    ab2bb = ab2 + bb
    if number == 2*k: return (aa+bb, ab2bb)
    else: return (ab2bb, ab2bb+aa+bb)


def fib (x):
  if x < 1: return 0
  else:
    n = x - 1
    if n <= 1: return 1
    else:
      k = floor (n/2)
      a,b = fib_aux (k)
      ab = a + b
      if n == 2*k: return ab*ab + b*b
      else: return ab*ab + 2*b*ab


fib (1000000)

% time python fib-raffael.py
0.571u 0.000s 0:00.59 96.6%     880+2305k 0+0io 0pf+0w
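[A side note not raised in the thread: John's translation calls floor(number/2),
which bounces the index through a float. Since the indices here stay small this
happens to work, but integer floor division with // is the idiomatic way to keep
everything in integers. A sketch of the same fast-doubling translation using //:]

```python
def fib_aux(number):
    # returns (a, b) as in Raffael's fib-aux, all in integer arithmetic
    if number == 0:
        return (1, 0)
    if number == 1:
        return (0, 1)
    k = number // 2          # integer floor division, no float round trip
    a, b = fib_aux(k)
    aa, bb, ab2 = a * a, b * b, 2 * a * b
    ab2bb = ab2 + bb
    if number == 2 * k:
        return (aa + bb, ab2bb)
    return (ab2bb, ab2bb + aa + bb)

def fib(x):
    if x < 1:
        return 0
    n = x - 1
    if n <= 1:
        return 1
    k = n // 2
    a, b = fib_aux(k)
    ab = a + b
    if n == 2 * k:
        return ab * ab + b * b
    return ab * ab + 2 * b * ab

print(fib(10))  # 55
```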
From: André Thieme
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135805745.418127.131120@g43g2000cwa.googlegroups.com>
John M schrieb:

> FWIW, here is Raffael's algorithm coded in python.
> I tried to translate it almost line by line.
>
> from math import floor
>
> def fib_aux (number):
>   if number == 0: return (1, 0)
>   elif number == 1: return (0, 1)
>   else:
>     k = floor (number/2)
>     a,b = fib_aux (k)
>     aa = a*a
>     bb = b*b
>     ab2 = 2*a*b
>     ab2bb = ab2 + bb
>     if number == 2*k: return (aa+bb, ab2bb)
>     else: return (ab2bb, ab2bb+aa+bb)
>
>
> def fib (x):
>   if x < 1: return 0
>   else:
>     n = x - 1
>     if n <= 1: return 1
>     else:
>       k = floor (n/2)
>       a,b = fib_aux (k)
>       ab = a + b
>       if n == 2*k: return ab*ab + b*b
>       else: return ab*ab + 2*b*ab
>
>
> fib (1000000)
>
> % time python fib-raffael.py
> 0.571u 0.000s 0:00.59 96.6%     880+2305k 0+0io 0pf+0w

This program is three lines shorter than the lisp version but runs
more slowly:

[ fib(1000000) ]
$ time python fibo.py

real    0m0.640s
user    0m0.590s
sys     0m0.006s

And to measure the startup time:

[ fib(1) ]
$ time python fibo.py

real    0m0.035s
user    0m0.009s
sys     0m0.004s


Now Lisp:
[ (fib 1000000) ]
$ time sbcl --noinform --load fibo.fasl
Evaluation took:
  0.308 seconds of real time
  0.305953 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  904,448 bytes consed.

real    0m0.446s
user    0m0.350s
sys     0m0.042s


Not counting Lisp's startup time, the Lisp code runs about
twice as fast as the Python program.
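[An alternative to André's trick of timing a separate fib(1) run to estimate
startup cost, sketched here rather than taken from the thread: time inside the
process with the standard timeit module, which excludes interpreter startup
entirely, much like calling Lisp's time function instead of the shell's.]

```python
import timeit

def fib(n):
    # simple iterative fib, enough for a timing demo
    f0, f1 = 0, 1
    for _ in range(n):
        f0, f1 = f1, f0 + f1
    return f0

# time only the computation, not interpreter startup
seconds = timeit.timeit(lambda: fib(100000), number=3)
print(seconds / 3)
```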
From: Raffael Cavallaro
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <2005122816355016807-raffaelcavallaro@pasdespamsilvousplaitmaccom>
On 2005-12-28 14:48:47 -0500, John M <·····@yahoo.com> said:

> % time python fib-raffael.py
> 0.571u 0.000s 0:00.59 96.6%     880+2305k 0+0io 0pf+0w

Ah, but this version doesn't print the result, which takes the lion's 
share of the time:

Without printing the result:

rafg5:~ raffaelc$ time python ~/Developer/fib-aux.py

real    0m0.809s
user    0m0.576s
sys     0m0.169s

and with printing the result:

rafg5:~ raffaelc$ time python ~/Developer/fib-aux-print.py
19532821287077577316320149475962563324435429965918733969534051945716252578870156947666419876341501461288795243352202360846255109120195602337440154381151966361569199621256428943033701138278006380027674115279274666698655783793188228320612714975832303348548934895725992307229129019282092643316275217308614600179125820426996599...


real    0m30.673s
user    0m30.226s
sys     0m0.267s


or more than four times as long. In sbcl, *with* printing the result:

* (time (format t "~a" (fib (expt 10 6))))
1953282128707757731632014947596256332443542996591873396953405194571625257887015694766641987634150146128879524335220236084625510912019560233744015438115196636156919962125642894303370113827800638002767411527927466669865578379318822832061271497583230334854893489572599230722912901928209264331627521730861460017912582...

Evaluation took:
  5.748 seconds of real time
  5.615717 seconds of user run time
  0.058183 seconds of system run time
  0 page faults and
  5,193,312 bytes consed.


So sbcl computes the result more slowly, but prints it much faster. All 
of which proves what, exactly? That it is important to time something 
which is actually important to you, since synthetic benchmarks almost 
never exercise precisely those bottlenecks that will limit performance 
in your real world work or application.

In the case of large fibonacci numbers you are in large part measuring 
the performance of your implementation's bignum library. The fastest 
lisp times so far in this thread (lisp broadly defined) are for chicken 
scheme because chicken scheme uses the GNU Multiple Precision library 
(GMP), which is a very high-performance bignum library.

Finally, note that the biggest speedups by far came not from having the 
best compiler that generated the most efficient code, nor from arcane 
optimization techniques, but rather from selecting a more clever 
algorithm.

For more on these for fibonacci numbers in particular see:
<http://en.wikipedia.org/wiki/Fibonacci_number_program>
<http://cubbi.com/fibonacci.html>

regards,

Ralph
From: Surendra Singhi
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <64p81xma.fsf@netscape.net>
Raffael Cavallaro <················@pas-d'espam-s'il-vous-plait-mac.com>
writes:

> On 2005-12-27 23:10:24 -0500, John M <·····@yahoo.com> said:
>
>> I was surprised at the result since I had read that
>> Lisp was the faster language.
>
> As other replies have suggested, you need to compile your lisp code
> first, and algorithm and bignum libraries matter quite a bit here.
>
>
> compile your lisp code, find better algorithms, and, as suggested by

Another extremely fast algorithm (obtained by solving the recurrence
relation for the Fibonacci series):

(defconstant phi (/ (1+ (sqrt 5d0)) 2))
(defconstant si (/ (- 1 (sqrt 5d0)) 2))

(defun fib (n)
  (round (/ (- (expt phi n) (expt si n)) (sqrt 5d0))))

With this algorithm you may not get the same result (for n > 37) because of
floating point approximations. 

Any suggestions on how to get an accurate result using the above algorithm?

Cheers.
-- 
Surendra Singhi
http://www.public.asu.edu/~sksinghi/index.html

,----
| Great wits are sure to madness near allied,
| And thin partitions do their bounds divide.
| 
|     (John Dryden, Absalom and Achitophel, 1681)
`----
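[One answer to Surendra's question, sketched here in Python rather than Lisp
(hypothetical code, not from the thread): do the Binet computation exactly in
the ring Z[sqrt 5], representing p + q*sqrt(5) as an integer pair. Because
(1 + sqrt 5)^n = p + q*sqrt(5) with integer p, q, and fib(n) = 2q/2^n, the
result is q >> (n - 1) with no floating point anywhere.]

```python
def mul(x, y):
    # (a + b*s) * (c + d*s) = (ac + 5bd) + (ad + bc)*s, where s = sqrt(5)
    a, b = x
    c, d = y
    return (a * c + 5 * b * d, a * d + b * c)

def fib(n):
    # binary exponentiation of (1 + sqrt 5) in Z[sqrt 5]
    if n == 0:
        return 0
    result, base, e = (1, 0), (1, 1), n
    while e:
        if e & 1:
            result = mul(result, base)
        base = mul(base, base)
        e >>= 1
    p, q = result
    # (1 + sqrt 5)^n = p + q*sqrt(5), so fib(n) = 2q / 2^n = q >> (n - 1)
    return q >> (n - 1)

print(fib(10))  # 55
```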
From: John Thingstad
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <op.s2jtf0i8pqzri1@mjolner.upc.no>
On Thu, 29 Dec 2005 02:49:01 +0100, Surendra Singhi  
<·········@netscape.net> wrote:

> Raffael Cavallaro <················@pas-d'espam-s'il-vous-plait-mac.com>
> writes:
>
>> On 2005-12-27 23:10:24 -0500, John M <·····@yahoo.com> said:
>>
>>> I was surprised at the result since I had read that
>>> Lisp was the faster language.
>>
>> As other replies have suggested, you need to compile your lisp code
>> first, and algorithm and bignum libraries matter quite a bit here.
>>
>>
>> compile your lisp code, find better algorithms, and, as suggested by
>
> Another extremely fast algorithm (can be obtained by solving the  
> recurrence
> relation for Fibonacci series)
>
> (defconstant phi (/ (1+ (sqrt 5d0)) 2))
> (defconstant si (/ (- 1 (sqrt 5d0)) 2))
>
> (defun fib (n)
>   (round (/ (- (expt phi n) (expt si n)) (sqrt 5d0))))
>
> With this algorithm you may not get the same result (for n > 37) because  
> of
> floating point approximations.
>
> Any suggestions on how to get an accurate result using the above algorithm?
>
> Cheers.

Use the binomial series for sqrt 5.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: Raffael Cavallaro
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <2005122923182143658-raffaelcavallaro@pasdespamsilvousplaitmaccom>
On 2005-12-28 20:49:01 -0500, Surendra Singhi <·········@netscape.net> said:


> 
> Another extremely fast algorithm (can be obtained by solving the recurrence
> relation for Fibonacci series)
> 
> (defconstant phi (/ (1+ (sqrt 5d0)) 2))
> (defconstant si (/ (- 1 (sqrt 5d0)) 2))
> 
> (defun fib (n)
>   (round (/ (- (expt phi n) (expt si n)) (sqrt 5d0))))
> 
> With this algorithm you may not get the same result (for n > 37) because of
> floating point approximations.
> Any suggestions on how to get an accurate result using the above algorithm?
> 
> Cheers.

by simply changing the constants thus:

(defconstant root-5 (rationalize (sqrt 5.0d0)))
(defconstant phi (/ (1+ root-5) 2))
(defconstant si (/ (- 1 root-5) 2))


and the body of fib similarly:

(defun fib (n)
  (round (/ (- (expt phi n) (expt si n)) root-5)))

you'll get answers that are accurate up to n = 68.

If you want accurate results for large n, and you follow John 
Thingstad's suggestion, you'll need to compute a fair number of terms 
in the binomial expansion.

BTW, this algorithm is not particularly fast for large n - for example, 
(fib (expt 10 4)) computed with this algorithm takes 5 times as long as 
(fib (expt 10 6)) does with the other fib-aux algorithm - yes, that's 
not a typo, the other algorithm does fib 1 million 5x faster than this 
one does fib ten thousand. I didn't want to wait for this algorithm to 
finish fib 1 million - it was still merrily eating 98% cpu after more 
than 3 minutes...

This algorithm does have the virtue of being quite readable and elegant though.

regards,

Ralph
From: Bruce Hoult
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <bruce-0AFB50.21343628122005@news.clear.net.nz>
In article 
<····································@pasdespamsilvousplaitmaccom>,
 Raffael Cavallaro 
 <················@pas-d'espam-s'il-vous-plait-mac.com> wrote:

> On 2005-12-27 23:10:24 -0500, John M <·····@yahoo.com> said:
> 
> > I was surprised at the result since I had read that
> > Lisp was the faster language.
> 
> As other replies have suggested, you need to compile your lisp code 
> first, and algorithm and bignum libraries matter quite a bit here.
> 
> This version:
> 
> (defun fib (x)
>   (declare (optimize (speed 3) (safety 0) (debug 0) (compilation-speed 
> 0) (space 3))
>            (integer x))
>   (if (< x 1) 0
>     (let ((n (1- x)))
>       (labels ((fib-aux (number)
>                  (case number
>                    ((0) (values 1 0))
>                    ((1) (values 0 1))
>                    (otherwise (progn
>                                 (let ((k (floor number 2)))
>                                   (multiple-value-bind (a b) (fib-aux k)
>                                     (let* ((aa (* a a))
>                                            (bb (* b b))
>                                            (ab2 (* 2 a b))
>                                            (ab2bb (+ ab2 bb)))
>                                       (if (= number (* 2 k))
>                                         (values (+ aa bb) ab2bb)
>                                         (values ab2bb (+ ab2bb aa 
>                                 bb)))))))))))
>         (if (<= n 1) 1
>           (progn
>             (let ((k (floor n 2)))
>               (multiple-value-bind (a b) (fib-aux k)
>                 (let ((ab (+ a b)))
>                   (if (= n (* 2 k))
>                     (+ (* ab ab) (* b b))
>                     (+ (* ab ab) (* 2 b ab))))))))))))
> 
> computes fib (expt 10 6) in about 2 seconds on a dual G5 2 GHz 
> (although only 1 cpu is used by sbcl):

Nice.

Unfortunately Gwydion Dylan's bignum library sucks.  It's written for 
portability (no assembler used, just pure Dylan) and is s l o w :-(  So 
the following translation takes 27 seconds on a Dual G5 2 GHz, despite 
perfect type inferencing in fib-aux :-(

I guess the good news is that the GC time is essentially zero.


module: fast-fib

define method fib(x == 0) 0 end;
define method fib(x == 1) 1 end;

define method fib(x)
  let n = x - 1;
  local method fib-aux(n :: <integer>)
         => (a :: <extended-integer>, b :: <extended-integer>);
          select (n)
            0 => values(#e1, #e0);
            1 => values(#e0, #e1);
            otherwise =>
              begin
                let k = truncate/(n, 2);
                let (a, b) = fib-aux(k);
                let aa = a * a;
                let bb = b * b;
                let ab2 = #e2 * a * b;
                let ab2bb = ab2 + bb;
                if (n = 2 * k)
                  values(aa + bb, ab2bb);
                else
                  values(ab2bb, ab2bb + aa + bb);
                end;
              end;
          end select;
        end method fib-aux;

  let k = truncate/(n, 2);
  let (a, b) = fib-aux(k);
  let ab = a + b;
  if (n = 2 * k)
    ab * ab + b * b;
  else
    ab * ab + 2 * b * ab;
  end;
end fib;

fib(1000000);
//format-out("%=\n", fib(1000000));

-- 
Bruce |  41.1670S | \  spoken |          -+-
Hoult | 174.8263E | /\ here.  | ----------O----------
From: Frank Buss
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1v6io4vg7l0tb.6v6zphlm1847$.dlg@40tude.net>
John M wrote:

> Here are the functions in Lisp and Python:

I can't reproduce this. It is useless anyway, because you are testing the
bignum and GC speed, as Bruce Hoult already said, but here is my result:

merlin:~# time python fib.py

real    9m2.379s
user    9m1.924s
sys     0m0.262s

merlin:~# time lisp -load fib
; Loading #p"/root/fib.lisp".
; Compiling LAMBDA NIL:

; In: LAMBDA NIL

;   (LET (#)
;     )
; Note: Variable X defined but never used.
;
; Compiling Top-Level Form:

; Compilation unit finished.
;   1 note


; Evaluation took:
;   224.95 seconds of real time
;   174.43748 seconds of user run time
;   50.52132 seconds of system run time
;   628,275,946,935 CPU cycles
;   [Run times include 75.7 seconds GC run time]
;   0 page faults and
;   43,466,286,760 bytes consed.
;

real    3m44.977s
user    2m54.448s
sys     0m50.543s

merlin:~# python -V
Python 2.3.5

merlin:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 3
cpu MHz         : 2794.181
cache size      : 1024 KB
physical id     : 0
siblings        : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor
ds_cpl cid
bogomips        : 5521.40

merlin:~# lisp
CMU Common Lisp CVS release-19a 19a-release-20040728 + minimal debian
patches, running on merlin
With core: /usr/lib/cmucl/lisp.core
Dumped on: Wed, 2005-12-28 05:18:06+01:00 on merlin
For support see http://www.cons.org/cmucl/support.html Send bug reports to
the debian BTS.
or to ········@debian.org
type (help) for help, (quit) to exit, and (demo) to see the demos

Loaded subsystems:
    Python 1.1, target Intel x86
    CLOS based on Gerd's PCL 2004/04/14 03:32:47

-- 
Frank Buß, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: tichy
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <dotbhn$on5$1@nemesis.news.tpi.pl>
John M wrote:
> This is not a scientific test of any kind and I'm not
> an expert on either Python or Lisp.
> 
> Having said that, I was interested in getting an idea
> about the relative performance of each language when
> computing the Fibonacci function.
> 
> I tried to come up with equivalent implementations
> in each language.  Apologies if I did not succeed.
> 
> I was surprised at the result since I had read that
> Lisp was the faster language.

 > [ ..... ]

> (defun fib2 (n)
>   (loop with f = 0 and f1 = 1 and f0 = 0
>         initially
>             (when (= n 0) (setf f f0))
>             (when (= n 1) (setf f f1))
>         for i from 2 to n
>             do (setf f (+ f1 f0) f0 f1 f1 f)
>         finally (return f)))
> 

Athlon 1.6 GHz:

SBCL   0.9.7    256 sec
Python 2.4.1    262 sec
CMUCL   19 c    279 sec
CLISP   2.35    500 sec

I think you should compile your lisp 'fib2' first...
(compile 'fib2)

IMHO, your use of LOOP is a little overkill; my version:

(defun fib (n &aux (f (if (zerop n) 0 1))
                    (f1 1)
                    (f0 0))
   (declare (type (integer 0 *) n f f1 f0) (optimize speed))
   (loop repeat (1- n) do (setq f (+ f1 f0) f0 f1 f1 f))
   f)

Regards, Szymon.
From: Brian Downing
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <otqsf.680163$xm3.498558@attbi_s21>
In article <············@nemesis.news.tpi.pl>,
tichy  <···········@o2.pl> wrote:
> > (defun fib2 (n)
> >   (loop with f = 0 and f1 = 1 and f0 = 0
> >         initially
> >             (when (= n 0) (setf f f0))
> >             (when (= n 1) (setf f f1))
> >         for i from 2 to n
> >             do (setf f (+ f1 f0) f0 f1 f1 f)
> >         finally (return f)))
> 
> Imho, your use of LOOP is a little overkill, my version:
> 
> (defun fib (n &aux (f (if (zerop n) 0 1))
>                     (f1 1)
>                     (f0 0))
>    (declare (type (integer 0 *) n f f1 f0) (optimize speed))
>    (loop repeat (1- n) do (setq f (+ f1 f0) f0 f1 f1 f))
>    f)

Isn't Fibonacci sort of the canonical example for the bodyless do?

(defun fib (n)
  (do ((last 0 cur)
       (cur 1 (+ last cur)))
      ((zerop (decf n)) cur)))
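[Brian's bodyless DO is just a two-variable update loop; the same shape in
Python (an editorial sketch, not from the thread), under the same assumption
the DO makes that n >= 1, since (decf n) never reaches zero for n = 0:]

```python
def fib(n):
    # mirrors the DO: last <- cur, cur <- last + cur, done n-1 times
    last, cur = 0, 1
    for _ in range(n - 1):
        last, cur = cur, last + cur
    return cur
```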

-bcd
-- 
*** Brian Downing <bdowning at lavos dot net> 
From: Pascal Costanza
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <41fdlrF1e70luU1@individual.net>
John M wrote:
> This is not a scientific test of any kind and I'm not
> an expert on either Python or Lisp.
> 
> Having said that, I was interested in getting an idea
> about the relative performance of each language when
> computing the Fibonacci function.
> 
> I tried to come up with equivalent implementations
> in each language.  Apologies if I did not succeed.
> 
> I was surprised at the result since I had read that
> Lisp was the faster language.

In Lisp, you have to work for efficiency because by default, Lisp opts 
for safety and flexibility.

> Here are the functions in Lisp and Python:
> 
> #+:cmu (setf ext:*gc-verbose* nil)
> 
> (defun fib2 (n)
>   (loop with f = 0 and f1 = 1 and f0 = 0
>         initially
>             (when (= n 0) (setf f f0))
>             (when (= n 1) (setf f f1))
>         for i from 2 to n
>             do (setf f (+ f1 f0) f0 f1 f1 f)
>         finally (return f)))

Try this:

(defun fib2 (n)
   (declare (integer n))
   (loop repeat (1+ n)
         for f of-type integer = 0 then (+ f0 f1)
         for f0 of-type integer = 0 then f1
         for f1 of-type integer = 1 then f
         finally (return f)))

[Untested. I don't know whether this will substantially improve performance.]

Also don't forget to (declaim (optimize (speed 3))), and maybe play 
around with the other optimization settings.

> (time (let ((x (fib2 1000000)))))

In my experience, using time like this yields suboptimal results. Better 
to do it this way:

(defun run ()
   (fib2 1000000))

(time (run))

This typically improves the measurements. (I have no idea why.)


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: John M
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <wfqdneOCiakMTC_eRVn-iA@comcast.com>
Pascal Costanza wrote:

> John M wrote:
>> This is not a scientific test of any kind and I'm not
>> an expert on either Python or Lisp.
>> 
>> Having said that, I was interested in getting an idea
>> about the relative performance of each language when
>> computing the Fibonacci function.
>> 
>> I tried to come up with equivalent implementations
>> in each language.  Apologies if I did not succeed.
>> 
>> I was surprised at the result since I had read that
>> Lisp was the faster language.
> 
> In Lisp, you have to work for efficiency because by default, Lisp opts
> for safety and flexibility.
> 
>> Here are the functions in Lisp and Python:
>> 
>> #+:cmu (setf ext:*gc-verbose* nil)
>> 
>> (defun fib2 (n)
>>   (loop with f = 0 and f1 = 1 and f0 = 0
>>         initially
>>             (when (= n 0) (setf f f0))
>>             (when (= n 1) (setf f f1))
>>         for i from 2 to n
>>             do (setf f (+ f1 f0) f0 f1 f1 f)
>>         finally (return f)))
> 
> Try this:
> 
> (defun fib2 (n)
>    (declare (integer n))
>    (loop repeat (1+ n)
>          for f of-type integer = 0 then (+ f0 f1)
>          for f0 of-type integer = 0 then f1
>          for f1 of-type integer = 1 then f
>          finally (return f)))
> 
> [Untested. I don't know whether this will substantially improve performance.]
> 
> Also don't forget to (declaim (optimize (speed 3))), and maybe play
> around with the other optimization settings.
> 
>> (time (let ((x (fib2 1000000)))))
> 
> In my experience, using time like this yields suboptimal results. Better
> do that:
> 
> (defun run ()
>    (fib2 1000000))
> 
> (time (run))
> 
> This typically improves the measurements. (I have no idea why.)
> 
> 
> Pascal
> 

Not much change.

% time lisp -load fib-pascal
; Loading #p"/home/test/fib-pascal.x86f".

; Evaluation took:
;   184.54 seconds of real time
;   172.06851 seconds of user run time
;   0.276954 seconds of system run time
;   464,881,586,160 CPU cycles
;   [Run times include 64.86 seconds GC run time]
;   0 page faults and
;   43,402,373,192 bytes consed.
;
172.074u 0.287s 3:04.55 93.3%   124+2439k 0+0io 0pf+0w


Now, making a change in the loop to do nothing:

% time python fib.py
0.254u 0.000s 0:00.42 59.5%     834+1633k 31+0io 19pf+0w

time for lisp:
; Evaluation took:
;   3.86 seconds of real time
;   3.666366 seconds of user run time
;   0.015839 seconds of system run time
;   9,726,282,480 CPU cycles
;   [Run times include 0.08 seconds GC run time]
;   1 page fault and
;   64,000,464 bytes consed.
;
3.682u 0.029s 0:03.90 94.8%     121+2381k 0+0io 1pf+0w


In Raffael's lisp version:

; Evaluation took:
;   1.93 seconds of real time
;   1.850718 seconds of user run time
;   0.0 seconds of system run time
;   4,877,058,816 CPU cycles
;   0 page faults and
;   2,123,256 bytes consed.
;
1.862u 0.010s 0:02.15 86.9%     120+2369k 0+0io 71pf+0w

Mind you that I can still not print the value with any
algorithm in lisp but the python version does give an output.

Which suggests that you have to get away from consing/gc,
which is the nature of lisp programs, to get the speed.

It seems to me, that until the consing/gc becomes insignificant
in the implementation, then you will have the simple (simple to read)
algorithms in lisp be less efficient than similar algorithms 
expressed in other languages that don't do as much consing/gc.

(I'm assuming python doesn't do as much consing but I don't know)
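A quick way to see what is actually dominating the benchmark is to look at how
big the numbers get. This is a sketch in Python (assuming CPython's built-in
arbitrary-precision integers; the function is the same iterative algorithm as
above, just written with tuple assignment):

```python
def fib(n):
    # Iterative Fibonacci, same algorithm as the loop versions above.
    f0, f1 = 0, 1
    for _ in range(n):
        f0, f1 = f1, f0 + f1
    return f0

# fib(n) has roughly 0.209 * n decimal digits, so near n = 1000000 each
# addition operates on numbers over 200,000 digits long -- and every sum
# is a freshly allocated bignum, in Python just as in Lisp.
print(len(str(fib(1000))))  # 209
```

So most of the run time in both versions is multiple-precision addition and
allocation, not loop overhead.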
From: ··············@hotmail.com
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135793213.909355.273720@z14g2000cwz.googlegroups.com>
John M wrote:
>
> Mind you that I can still not print the value with any
> algorithm in lisp but the python version does give an output.
>
> Which suggests that you have to get away from consing/gc,
> which is the nature of lisp programs, to get the speed.
>
> It seems to me, that until the consing/gc becomes insignificant
> in the implementation, then you will have the simple (simple to read)
> algorithms in lisp be less efficient than similar algorithms
> expressed in other languages that don't do as much consing/gc.
>
> (I'm assuming python doesn't do as much consing but I don't know)

You are being far too general in your equation of consing/gc with Lisp.
Python, as far as I know, depends on automatic memory management as
much as Lisp does. "consing" is just Lisp jargon for "total allocated
memory", which should not be confused with "peak memory usage."
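The distinction matters: a program can allocate a huge amount in total while
never holding more than a few kilobytes live. A small illustration in Python
(a sketch using the standard tracemalloc module; exact numbers are
implementation-dependent):

```python
import tracemalloc

tracemalloc.start()
buf = None
for _ in range(1000):
    # Each new buffer replaces the previous one, which becomes garbage.
    buf = bytes(100_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# About 100 MB was allocated over the loop ("consed"), yet peak live
# memory never exceeded roughly two buffers' worth.
print(peak < 1_000_000)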

What you are probably testing is the load which the "bignum" (i.e.
integers too large to fit in a machine word) routines in a *particular
Lisp implementation* places on the memory management/garbage collection
subsystem, and how well the memory management subsystem of the
particular Lisp implementation responds to that load. This is not a
feature of Lisp in general, but of Lisp implementations in particular.

As in all artificial benchmarks, you have to be careful to identify
exactly what is being measured, and how relevant that is to an
application of real interest. I'm willing to bet that few delivered
Lisp applications are bottlenecked by bignum performance.
From: Edi Weitz
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <uhd8tqdgl.fsf@agharta.de>
On Wed, 28 Dec 2005 12:55:27 -0500, John M <·····@yahoo.com> wrote:

> Mind you that I can still not print the value with any algorithm in
> lisp but the python version does give an output.

Huh?  I had no problems having the value printed with LispWorks.

> Which suggests that you have to get away from consing/gc, which is
> the nature of lisp programs, to get the speed.
>
> It seems to me, that until the consing/gc becomes insignificant in
> the implementation, then you will have the simple (simple to read)
> algorithms in lisp be less efficient than similar algorithms
> expressed in other languages that don't do as much consing/gc.

It seems to me that for someone who admitted that he is "not an expert
on either Python or Lisp" a couple of hours ago you come to general
conclusions (based on exactly one contrived example) pretty quickly.

From my experience with Common Lisp in the last years I'd say that
what you claimed above is nonsense.

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: John M
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <Xq2dnUJpVd-SdS_eRVn-pA@comcast.com>
Edi Weitz wrote:

> On Wed, 28 Dec 2005 12:55:27 -0500, John M <·····@yahoo.com> wrote:
> 
>> Mind you that I can still not print the value with any algorithm in
>> lisp but the python version does give an output.
> 
> Huh?  I had no problems having the value printed with LispWorks.

I will try LispWorks.  Thanks

> 
>> Which suggests that you have to get away from consing/gc, which is
>> the nature of lisp programs, to get the speed.
>>
>> It seems to me, that until the consing/gc becomes insignificant in
>> the implementation, then you will have the simple (simple to read)
>> algorithms in lisp be less efficient than similar algorithms
>> expressed in other languages that don't do as much consing/gc.
> 
> It seems to me that for someone who admitted that he is "not an expert
> on either Python or Lisp" a couple of hours ago you come to general
> conclusions (based on exactly one contrived example) pretty quickly.

This example I came across while trying to learn lisp.
Nothing has been contrived or composed by me with any malice. (if that is
what you imply)

Simply, I was trying to learn the loop macro and thought of coding it
in python, since I am slightly more familiar with that language.

It is simply an observation/opinion formed from my limited knowledge.

> 
> From my experience with Common Lisp in the last years I'd say that
> what you claimed above is nonsense.

This claim does nothing toward making me a more informed Lisp programmer.

> 
> Cheers,
> Edi.
> 

Cheers to you as well.
From: tichy
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <douql2$879$1@nemesis.news.tpi.pl>
John M wrote:
> Edi Weitz wrote:
> 
>> On Wed, 28 Dec 2005 12:55:27 -0500, John M <·····@yahoo.com> wrote:
>>
>>> Mind you that I can still not print the value with any algorithm in
>>> lisp but the python version does give an output.
>> Huh?  I had no problems having the value printed with LispWorks.
> 
> I will try LispWorks.  Thanks

I have NO problem with this. Maybe your 'stack size' limit is too small...
(I set it to 'unlimited'):

bash-3.00$ ulimit -a

core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) 32
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) unlimited
cpu time             (seconds, -t) unlimited
max user processes            (-u) 4096
virtual memory        (kbytes, -v) unlimited


Regards, Szymon.
From: Pascal Costanza
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <41g8nnF1e91hdU1@individual.net>
John M wrote:

> This example I came across while trying to learn lisp.
> Nothing has been contrived or composed by me with any malice. (if that is
> what you imply)
> 
> Simply, I was trying to learn the loop macro and thought of coding it
> in python, since I am slightly more familiar with that language.
> 
> It is simply an observation/opinion formed from my limited knowledge.
> 
>> From my experience with Common Lisp in the last years I'd say that
>> what you claimed above is nonsense.
> 
> This claim does nothing toward making me a more informed Lisp programmer.

Don't get sidetracked by benchmarks that don't prove or disprove 
anything. If your goal is to become a better Lisp programmer, it's 
better to focus on large programs and don't worry too much about 
efficiency in the beginning (but make sure that you use a good 
tutorial). Better algorithms and better program organization are more 
important than micro efficiency.


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: Edi Weitz
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <ud5jhq9rr.fsf@agharta.de>
On Wed, 28 Dec 2005 14:31:12 -0500, John M <·····@yahoo.com> wrote:

> Nothing has been contrived or composed by me with any malice. (if
> that is what you imply)

I meant "contrived" in the sense that your example hardly reflects
real-world usage and therefore is more or less irrelevant.

> This claim does nothing toward making me a more informed Lisp
> programmer.

Sure, no claim will do that.  You have to read books and write
programs.  If you haven't done so already take a look at

  <http://gigamonkeys.com/book/>.

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Frank Buss
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1rbew2opffnrg.163svjh6fe88a.dlg@40tude.net>
John M wrote:

> Simply, I was trying to learn the loop macro and thought of coding it
> in python, since I am slightly more familiar with that language.

I have some nice loop examples, if you want to learn the loop macro:

http://www.frank-buss.de/trisoli/trisoli.lisp.txt

The program solves this game:

http://www.frank-buss.de/trisoli/index.html

I think this would be a more real world example. Perhaps you should try it
with bigger boards and implement the same in Python, if you want to compare
the speed of both languages. But I think speed is not so important, because
computers are already very fast and different language implementations
differ only by a factor of 2-5. That is far less than the difference
between, e.g., an O(n^2) algorithm and an O(log(n)) algorithm, and with
Lisp, at least for me, it is easier to write programs and less program
writing time is needed.

-- 
Frank Buß, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: John M
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <gv2dnVgACbP9lS7enZ2dnUVZ_tidnZ2d@comcast.com>
Frank Buss wrote:

> John M wrote:
> 
>> Simply, I was trying to learn the loop macro and thought of coding it
>> in python, since I am slightly more familiar with that language.
> 
> I have some nice loop examples, if you want to learn the loop macro:
> 
> http://www.frank-buss.de/trisoli/trisoli.lisp.txt
> 
> The program solves this game:
> 
> http://www.frank-buss.de/trisoli/index.html
> 
> I think this would be a more real world example. Perhaps you should try it
> with bigger boards and implement the same in Python, if you want to
> compare the speed of both languages. But I think speed is not so
> important, because computers are already very fast and different language
> implementations differ only by a factor of 2-5. That is far less than
> the difference between, e.g., an O(n^2) algorithm and an O(log(n))
> algorithm, and with Lisp, at least for me, it is easier to write
> programs and less program writing time is needed.
> 

Frank, thank you.
Actually it was your post that led me to experiment with LOOP. :)
From: Pascal Costanza
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <41g8crF1dnm81U1@individual.net>
John M wrote:

> It seems to me, that until the consing/gc becomes insignificant
> in the implementation, then you will have the simple (simple to read)
> algorithms in lisp be less efficient than similar algorithms 
> expressed in other languages that don't do as much consing/gc.
> 
> (I'm assuming python doesn't do as much consing but I don't know)

That's quite a conclusion based on a very simplistic experiment.


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: Edi Weitz
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <ulky5qfvr.fsf@agharta.de>
On Tue, 27 Dec 2005 23:10:24 -0500, John M <·····@yahoo.com> wrote:

> This is not a scientific test of any kind and I'm not an expert on
> either Python or Lisp.
>
> Having said that, I was interested in getting an idea about the
> relative performance of each language when computing the Fibonacci
> function.
>
> I tried to come up with equivalent implementations in each language.
> Apologies if I did not succeed.
>
> I was surprised at the result since I had read that Lisp was the
> faster language.

Just to add another data point here are the results on my lowly laptop
using LispWorks 4.4.6 (using your unmodified Lisp code without
any declarations):

  ···@vmware:/tmp$ time python fib.py 

  real    4m13.355s
  user    4m5.330s
  sys     0m6.990s


  CL-USER 4 > (time (fib2 1000000))
  Timing the evaluation of (FIB2 1000000)
  ; Loading fasl file /usr/local/lib/LispWorks/lib/4-4-0-0/load-on-demand/pcl/util/callcoun.ufsl

  user time    =    202.500
  system time  =      0.160
  Elapsed time =   0:03:23
  Allocation   = 44379055328 bytes standard / 0 bytes conses
  1 Page faults

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Bruce Hoult
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <bruce-EBD855.18434328122005@news.clear.net.nz>
In article <······················@comcast.com>,
 John M <·····@yahoo.com> wrote:

> This is not a scientific test of any kind and I'm not
> an expert on either Python or Lisp.
> 
> Having said that, I was interested in getting an idea
> about the relative performance of each language when
> computing the Fibonacci function.

This is pretty silly unless the Fibonacci function or other operations 
on extremely large integers happen to be especially important to you.  
Which is fair enough if they are, but it's hardly a test of general 
language speed.

This is testing not the speed of the language (i.e. of code that you 
yourself write), but the speed of the bignum implementation and/or the 
garbage collector.

This can be shown by making a trivial change to the program that alters 
the running time dramatically:

def fib (n):
    f = 0
    f1 = 1
    f0 = 0
    if n == 0: f = f0
    if n == 1: f = f1
    i = 2
    while i <= n:
        f = f1 + f0
        f0 = f1
        f2 = f
        i += 1
    return f

fib(1000000)


That drops the time on my system from 103 seconds down to 0.55 seconds 
while executing exactly the same number of lines of code containing 
exactly the same arithmetic operations (just with different values).
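The reason the change is so dramatic: with f2 = f in place of f1 = f, the
variable f1 is never updated, so the values stay tiny and no bignum arithmetic
or allocation ever happens. A sketch of the modified loop:

```python
def fib_modified(n):
    f, f1, f0 = 0, 1, 0
    if n == 0: f = f0
    if n == 1: f = f1
    i = 2
    while i <= n:
        f = f1 + f0   # after the first pass, always 1 + 1 = 2
        f0 = f1       # f0 becomes 1 and stays there
        f2 = f        # the changed line: f1 is never reassigned
        i += 1
    return f

# One million iterations of small-integer arithmetic only:
print(fib_modified(1000000))  # 2
```

Same number of additions, same loop, but every operand fits in a machine word.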

Make sure you're measuring something that is important to you.

-- 
Bruce |  41.1670S | \  spoken |          -+-
Hoult | 174.8263E | /\ here.  | ----------O----------
From: Kirk Job Sluder
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <kirk-nospam-90709F.23173028122005@newsclstr02.news.prodigy.com>
In article <······················@comcast.com>,
 John M <·····@yahoo.com> wrote:

> This is not a scientific test of any kind and I'm not
> an expert on either Python or Lisp.
> 
> Having said that, I was interested in getting an idea
> about the relative performance of each language when
> computing the Fibonacci function.

I find it interesting that there is almost always a discussion on the 
python newsgroups debating its speed weighed against other languages.  
My conclusion is that no matter what your language, there is always 
going to be some benchmark to show that some other language is faster on 
that benchmark.

> -- John

-- 
Kirk Job-Sluder
From: André Thieme
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1135868730.993985.284400@g14g2000cwa.googlegroups.com>
Kirk Job Sluder schrieb:

> In article <······················@comcast.com>,
>  John M <·····@yahoo.com> wrote:
>
> > This is not a scientific test of any kind and I'm not
> > an expert on either Python or Lisp.
> >
> > Having said that, I was interested in getting an idea
> > about the relative performance of each language when
> > computing the Fibonacci function.
>
> I find it interesting that there is almost always a discussion on the
> python newsgroups debating its speed weighed against other languages.
> My conclusion is that no matter what your language, there is always
> going to be some benchmark to show that some other language is faster on
> that benchmark.

Did you consider assembler in your conclusion?
One might think that programs written directly in machine language
could have some advantages on the speed of program execution...


André
--
From: Kirk Job Sluder
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <kirk-nospam-F4CE06.12030229122005@newsclstr02.news.prodigy.com>
In article <························@g14g2000cwa.googlegroups.com>,
 "André Thieme" <······························@justmail.de> wrote:

> Kirk Job Sluder schrieb:
> 
> > I find it interesting that there is almost always a discussion on the
> > python newsgroups debating its speed weighed against other languages.
> > My conclusion is that no matter what your language, there is always
> > going to be some benchmark to show that some other language is faster on
> > that benchmark.
> 
> Did you consider assembler in your conclusion?
> One might think that programs written directly in machine language
> could have some advantages on the speed of program execution...

At that point, the discussion always shifts to other criteria such as 
portability and development cycle. Even then, someone 
would always open the can of worms that you can get better performance 
yet by building a better hardware processor for a specific task.

And of course, someone will express the pragmatic and fiscal view that 
many, many programmer hours can be wasted making improvements that are 
trivial from a human perspective.  What you usually end up with is some 
qualitative goal such as, "I want this process to complete before the 
user gets bored waiting for it."  And then you figure out which 
combination of code/language/hardware will do the job without driving 
you crazy.

Incidentally, the fact that machines are used on human timescales is a 
primary reason why I'm skeptical of a Kurzweil-style singularity.  
Hardware gets faster, software gets slower, and human labor stays about 
the same.  

> 
> 
> André
> --

-- 
Kirk Job-Sluder
From: Pascal Costanza
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <41ig6jF1dunkrU1@individual.net>
André Thieme wrote:

> One might think that programs written directly in machine language
> could have some advantages on the speed of program execution...

Not necessarily. Modern processors run best when their pipelines are 
always filled. If I understand correctly, ensuring that that's the case 
is not straightforward and compilers are better at doing this than humans.


Pascal

-- 
My website: http://p-cos.net
Closer to MOP & ContextL:
http://common-lisp.net/project/closer/
From: Ulrich Hobelmann
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <41ih0iF1d9afoU1@individual.net>
Pascal Costanza wrote:
> André Thieme wrote:
> 
>> One might think that programs written directly in machine language
>> could have some advantages on the speed of program execution...
> 
> Not necessarily. Modern processors run best when their pipelines are 
> always filled. If I understand correctly, ensuring that that's the case 
> is not straightforward and compilers are better at doing this than humans.

Interestingly modern languages (or those 40 years old) can benefit from 
these modern architectures and not run much slower than, say, C. 
Instruction-level parallelism means that the CPU can run both a type 
check (or tag check or something like that) and the "actual" operation 
in parallel.  Only those few architectures (such as portable ARM CPUs) 
that aren't superscalar might still be way faster in languages without 
runtime overhead.

-- 
Every decent man is ashamed of the government he lives under.
	H. L. Mencken
From: Rob Thorpe
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1136309508.693359.145960@z14g2000cwz.googlegroups.com>
Ulrich Hobelmann wrote:
> Pascal Costanza wrote:
> > André Thieme wrote:
> >
> >> One might think that programs written directly in machine language
> >> could have some advantages on the speed of program execution...
> >
> > Not necessarily. Modern processors run best when their pipelines are
> > always filled. If I understand correctly, ensuring that that's the case
> > is not straightforward and compilers are better at doing this than humans.
>
> Interestingly modern languages (or those 40 years old) can benefit from
> these modern architectures and not run much slower than, say, C.
> Instruction-level parallelism means that the CPU can run both a type
> check (or tag check or something like that) and the "actual" operation
> in parallel.  Only those few architectures (such as portable ARM CPUs)
> that aren't superscalar might still be way faster in languages without
> runtime overhead.

As Jon says this is not true, for several reasons.

Firstly, regardless of whether or not the tag checking instructions can
be executed in parallel, if they are then other waiting instructions
cannot be executed in parallel.

Secondly, assume the lisp implementation makes no attempt to
parallelise the code, and it's left to the processor.  This means that
the code consists of a tag check followed by a branch and actions based
on the tag check.  In order to run this code in parallel the processor
must successfully speculate the branch target then begin execution of
the code at the other end.  This will often work in commonly used
parts of code though not in uncommonly used ones, and it will only work
on microprocessors that can speculate branches.  Regardless of whether
it works the processor must read in at least 3 instructions: a tag
check, a branch and the working instruction, and probably more in
practice.  This at least requires the decode resources to decode all
those instructions, which in the case without tag checking are free to
work on other instructions.

Thirdly, it is very difficult to make an implementation run a tag check
and an instruction on a lisp atom in parallel.  Imagine the
implementation thinks that the atom is an integer and proceeds on that
basis, it issues the tag check and an instruction based on the premise
that the atom is an integer at the same time.  If the integer operation
encounters an overflow then what can the implementation do about it?
Especially if the tag is found to be something else!  This problem can
be solved, but the solutions are very messy.

Lastly, some architectures use SMT, on these filling one thread with
instructions that contribute little work affects the other running
thread.
From: Ulrich Hobelmann
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <41vspcF1gr951U1@individual.net>
Rob Thorpe wrote:
> Firstly, regardless of whether or not the tag checking instructions can
> be executed in parallel, if they were then other waiting instructions
> cannot be executed in parallel.

But how much stuff is code doing at a given time, in general?  The G4 
and afaik the Athlon XP and the P4 can issue three instructions at a 
time.  The G5 bundles instructions to a block of up to four parallel 
instruction plus optionally a jump (or something like that).  I think 
they're not doing more in parallel, because most code doesn't really do 
more than three instructions at once (in parallel) most of the time.

In those cases where Lisp would check types, I suppose for arithmetic, 
or for functions like NULL or CAR/CDR, you could perform the arithmetic 
in parallel with the type check.  If the check doesn't succeed, throw an 
error and discard the result.  Ok, the CAR/CDR aren't as easy, as a 
wrong fetch instruction could result in a segfault.

> Secondly, assume the lisp implementation makes no attempt to
> parallelise the code, and it's left to the processor.  This means that
> the code consists of a tag check followed by a branch and actions based
> on the tag check.  In order to run this code in parallel the processor
> must successfully speculate the branch target then begin execution of
> the code at the other end.  This will often work in commonly used 
> parts of code though not in uncommonly used ones, and it will only work
> on microprocessors that can speculate branches.  Regardless of whether

Most CPUs speculate branches.  Even the cheapest ones do static branch 
prediction (mostly: forward = not taken); the PPC even allows you to 
tell in the instruction if a branch is likely to be taken.  At least 
every desktop CPU does lots of dynamic prediction as well.

Mostly I think you'd do forward branches to some error handler, so that 
in most cases (where the type check succeeds) no performance is lost. 
Only in the error case does the CPU mispredict.

> it works the processor must read in at least 3 instructions: a tag
> check, a branch and the working instruction, and probably more in
> practice.  This at least requires the decode resources to decode all
> those instructions, which in the case without tag checking are free to
> work on other instructions.

But quite often instructions can't be issued in parallel, such as when 
they depend on the test result, or on some data that's just being 
calculated.  I don't think that filling one of three instruction-issue 
slots hurts a lot.

OTOH even optimizing a typical C function call on PPC (while still 
conforming to standard calling conventions) makes the function call 
noticeably faster.  That is, if you wouldn't do anything within that 
function.  So maybe in some cases the checking does cause a one-cycle 
operation to take two cycles, so overall you have several 10% slowdown.

> Thirdly, it is very difficult to make an implementation run a tag check
> and an instruction on a lisp atom in parallel.  Imagine the
> implementation thinks that the atom is an integer and proceeds on that
> basis, it issues the tag check and an instruction based on the premise
> that the atom is an integer at the same time.  If the integer operation
> encounters an overflow then what can the implementation do about it?

It simply ignores the result calculated so far.

> Especially if the tag is found to be something else!  This problem can
> be solved, but the solutions are very messy.
> 
> Lastly, some architectures use SMT, on these filling one thread with
> instructions that contribute little work affects the other running
> thread.

For Hyperthreading on the P4 it'd probably slow down other threads, yes. 
But HT is only a hack anyway to get some more efficiency out of a not 
too great architecture.

-- 
the bottom line is that a JavaSchool that won't teach C and won't teach
Scheme is not really teaching computer science, either.  --  Joel Spolsky
From: Rob Thorpe
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1136316335.077017.317170@f14g2000cwb.googlegroups.com>
Ulrich Hobelmann wrote:
> Rob Thorpe wrote:
> > Firstly, regardless of whether or not the tag checking instructions can
> > be executed in parallel, if they were then other waiting instructions
> > cannot be executed in parallel.
>
> But how much stuff is code doing at a given time, in general?  The G4
> and afaik the Athlon XP and the P4 can issue three instructions at a
> time.  The G5 bundles instructions to a block of up to four parallel
> instruction plus optionally a jump (or something like that).  I think
> they're not doing more in parallel, because most code doesn't really do
> more than three instructions at once (in parallel) most of the time.

That's true.  In modern microarchitectures there is no reason to have
more parallel instruction resources than are already there because
there are too many dependent instructions.  So, if you have 3 available
and you waste 2 of them doing work that is not strictly speaking
necessary then you get slower execution.

> In those cases where Lisp would check types, I suppose for arithmetic,
> or for functions like NULL or CAR/CDR, you could perform the arithmetic
> in parallel with the type check.  If the check doesn't succeed, throw an
> error and discard the result.

That's often difficult to do because when arithmetic fails it causes a
non-maskable interrupt.  Even when it doesn't it causes a flag to be
set, which must be reset sometime later in order to be used.

> Ok, the CAR/CDR aren't as easy, as a
> wrong fetch instruction could result in a segfault.

Exactly, the same problem occurs.  I think IA64 has a special
instruction for this, something like "load x and don't bug me if it's
not there", but most architectures don't.

> > Secondly, assume the lisp implementation makes no attempt to
> > parallelise the code, and it's left to the processor.  This means that
> > the code consists of a tag check followed by a branch and actions based
> > on the tag check.  In order to run this code in parallel the processor
> > must successfully speculate the branch target then begin execution of
> > the code at the other end.  This will often work in commonly used
> > parts of code though not in uncommonly used ones, and it will only work
> > on microprocessors that can speculate branches.  Regardless of whether
>
> Most CPUs speculate branches.  Even the cheapest ones do static branch
> prediction (mostly: forward = not taken); the PPC even allows you to
> tell in the instruction if a branch is likely to be taken.  At least
> every desktop CPU does lots of dynamic prediction as well.

They do a certain amount yes.  But by adding tag checks you are putting
further strain on resources that could be used for other things.
Filling branch predictors and BTBs with otherwise unnecessary
information.

Tag checking is quite complex. For example, let's say the tag is stored
in the lower bits of a machine word.  The implementation must:

1. Compare the tag against a mask.
2. On the basis of the tag, jump to the relevant piece of code.
3. Remove the tag from the atom before using it (often but not always).

Step 2 can be done in two ways.  There could be a set of conditional
checks and jumps, which is relatively slow, requires lots of
instructions and consumes space in the branch predictor.  Or, the
processor is told to jump to IP + tag.  This trick removes many
conditional branches, but it replaces them with a branch with a target
that cannot be discovered until the branch has executed and retired.
The processor must use its branch target buffer in this situation,
which normally only has very few entries, say 128.
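The low-bit scheme described above can be sketched in Python (a toy model with
hypothetical names and a hypothetical 2-bit tag layout; real implementations
differ in the details, but the steps are the same):

```python
# Toy model of low-bit tagging: 2 tag bits, fixnums tagged 00 so that
# tagged fixnum addition needs no untagging at all.
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1
FIXNUM_TAG = 0b00

def tag_fixnum(n):
    return (n << TAG_BITS) | FIXNUM_TAG

def untag_fixnum(word):
    return word >> TAG_BITS

def generic_add(a, b):
    # Step 1: compare the tags against the mask.
    if (a & TAG_MASK) == FIXNUM_TAG and (b & TAG_MASK) == FIXNUM_TAG:
        # Step 3 is free here: with 00 tags, the sum of two tagged
        # fixnums is already the correctly tagged sum.
        return a + b
    # Step 2: otherwise branch to a slow path (bignums, floats, ...).
    raise TypeError("slow path: not two fixnums")

print(untag_fixnum(generic_add(tag_fixnum(2), tag_fixnum(3))))  # 5
```

Even in the fast path, the check and branch occupy decode and branch-prediction
resources on every operation, which is the cost being discussed here.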

Also, microprocessors cannot speculate down very many branches at once.
At some point the processor must stop speculating because it cannot
store the necessary state.

If you need to execute 3 instructions to get the effect of 1 (and 3 is
an underestimate in most cases), then no matter whether the execution
units can run them in parallel you still need to decode them, and most
processors these days only support 3 instruction decodes in parallel.

> Mostly I think you'd do forward branches to some error handler, so that
> in most cases (where the type check succeeds) no performance is lost.
> Only in the error case does the CPU mispredict.

The error handler though must contain the remainder of the tag checks.

> > it works the processor must read in at least 3 instructions: a tag
> > check, a branch and the working instruction, and probably more in
> > practice.  This at least requires the decode resources to decode all
> > those instructions, which in the case without tag checking are free to
> > work on other instructions.
>
> But quite often instructions can't be issued in parallel, such as when
> they depend on the test result, or on some data that's just being
> calculated.  I don't think that filling one of three instruction-issue
> slots hurts a lot.

No, you need at least two, not three: one for the check and another for
the branch.  Even this is optimistic, as I mentioned above, often you
have to mask out the tag before the atom can be used.  Even if it did
only use one on average it is a significant penalty if many other
instructions are dependent on the tag checks.

> OTOH even optimizing a typical C function call on PPC (while still
> conforming to standard calling conventions) makes the function call
> noticeably faster.  That is, if you wouldn't do anything within that
> function.  So maybe in some cases the checking does cause a one-cycle
> operation to take two cycles, so overall you have several 10% slowdown.

I'm not sure I understand this argument.  Where does the 10% come from?

> > Thirdly, it is very difficult to make an implementation run a tag check
> > and an instruction on a lisp atom in parallel.  Imagine the
> > implementation thinks that the atom is an integer and proceeds on that
> > basis, it issues the tag check and an instruction based on the premise
> > that the atom is an integer at the same time.  If the integer operation
> > encounters an overflow then what can the implementation do about it?
>
> It simply ignores the result calculated so far.

See earlier; unfortunately that's often not possible.

> > Especially if the tag is found to be something else!  This problem can
> > be solved, but the solutions are very messy.
> >
> > Lastly, some architectures use SMT, on these filling one thread with
> > instructions that contribute little work affects the other running
> > thread.
>
> For Hyperthreading on the P4 it'd probably slow down other threads, yes.
>   But HT is only a hack anyway to get some more efficiency out of a not
> too great architecture.

HT on the P4 isn't good, but SMT in general is a different kettle of
fish.  I expect future SMT implementations will be much better.  Though
this is a relatively minor point.

Overall, as Jon said, the proof of the pudding is in the eating.  If
you can show me some lisp code, run on an implementation using tagged
data types, achieving similar performance to the same algorithm
executed using static types (not necessarily in lisp), then I'm
obviously wrong.  (Assuming that the code in question does a sensible
number of type checks and isn't tied up in disk I/O or
multiple-precision arithmetic, like this fibonacci test.)
From: Ulrich Hobelmann
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <420328F1fpb87U1@individual.net>
Rob Thorpe wrote:
>> In those cases where Lisp would check types, I suppose for arithmetic,
>> or for functions like NULL or CAR/CDR, you could perform the arithmetic
>> in parallel with the type check.  If the check doesn't succeed, throw an
>> error and discard the result.
> 
> That's often difficult to do because when arithmetic fails it causes a
> non-maskable interrupt.  Even when it doesn't it causes a flag to be
> set, which must be reset sometime later in order to be used.

Oh ok, I'm not too familiar with arithmetic in assembly and what side 
effects those instructions have...

>> OTOH even optimizing a typical C function call on PPC (while still
>> conforming to standard calling conventions) makes the function call
>> noticeably faster.  That is, if you wouldn't do anything within that
>> function.  So maybe in some cases the checking does cause a one-cycle
>> operation to take two cycles, so overall you have several 10% slowdown.
> 
> I'm not sure I understand this argument.  Where does the 10% come from?

No, I meant some multiple, like 30% or 70% overhead compared to code 
that only does what you want without safety checks.  Sometimes the 
overhead doesn't cause another cycle, but in other cases it does. 
Claims that Lisp takes maybe 50% more time than C to execute make some 
sense under that assumption.  (I haven't run comparable programs in 
different languages, so I can't tell.)

> Overall, as Jon said, the proof of the pudding is in the eating.  If
> you can show me some lisp code run on an implementation using tagged
> data types achieving similar performance to the same algorithm executed
> using static types (not necessarily in lisp) then I'm obviously wrong.

Well, I haven't really looked at concrete Lisp output.  Lisp's main 
strength is in isolating me from low levels, and as long as it's fast 
enough, I don't really care.

> (Assuming that the code in question does a sensible number of type
> checks, it's not tied up in disk I/O or multiple-precision arithmetic
> (like this fibonacci test).)

-- 
the bottom line is that a JavaSchool that won't teach C and won't teach
Scheme is not really teaching computer science, either.  --  Joel Spolsky
From: Rob Thorpe
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1136400204.728773.176820@g47g2000cwa.googlegroups.com>
Ulrich Hobelmann wrote:
> Rob Thorpe wrote:
> >> In those cases where Lisp would check types, I suppose for arithmetic,
> >> or for functions like NULL or CAR/CDR, you could perform the arithmetic
> >> in parallel with the type check.  If the check doesn't succeed, throw an
> >> error and discard the result.
> >
> > That's often difficult to do because when arithmetic fails it causes a
> > non-maskable interrupt.  Even when it doesn't it causes a flag to be
> > set, which must be reset sometime later in order to be used.
>
> Oh ok, I'm not too familiar with arithmetic in assembly and what side
> effects those instructions have...
>
> >> OTOH even optimizing a typical C function call on PPC (while still
> >> conforming to standard calling conventions) makes the function call
> >> noticeably faster.  That is, if you wouldn't do anything within that
> >> function.  So maybe in some cases the checking does cause a one-cycle
> >> operation to take two cycles, so overall you have several 10% slowdown.
> >
> > I'm not sure I understand this argument.  Where does the 10% come from?
>
> No, I meant some multiple, like 30% or 70% overhead compared to code
> that only does what you want without safety checks.  Sometimes the
> overhead doesn't cause another cycle, but in other cases it does.
> Claims that Lisp takes maybe 50% more time than C to execute make some
> sense under that assumption.  (I haven't run comparable programs in
> different languages, so I can't tell.)

I doubt that a lisp implementation that used only tagged types would
perform anywhere near as well as statically typed code.  Not only do
you have the large number of branches to contend with, and their
effects, but the "tag guessing error" code must often be held in the
cache.

There are implementations that run at speeds close to C, such as
Stalin.  SBCL, CMUCL and GCL are all fairly fast at least some of the
time.  But this is because they do type inference and remove the need
to check tags (or have tags) in many cases.

> > Overall, as Jon said, the proof of the pudding is in the eating.  If
> > you can show me some lisp code run on an implementation using tagged
> > data types achieving similar performance to the same algorithm executed
> > using static types (not necessarily in lisp) then I'm obviously wrong.
>
> Well, I haven't really looked at concrete Lisp output.  Lisp's main
> strength is in isolating me from low levels, and as long as it's fast
> enough, I don't really care.

That's right.  I was talking about how the lower layers behave.

If you're interested, write some code that cannot be type-inferred (eg
code that uses "read" from a file), compile it, and look at the
assembly output of a lisp compiler in this case.  It is not pretty and
not simple.
From: Marco Antoniotti
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1136402118.271272.252640@f14g2000cwb.googlegroups.com>
Rob Thorpe wrote:
>
> There are implementations that run at speed close to C code, such as
> Stalin.  SBCL, CMUCL and GCL are all fairly fast at least some of the
> time.  But this is because they do type inference and remove the need
> to check tags (or have tags) in many cases.

AFAIK, only the Python compiler (common to CMUCL and SBCL) does type
inference.  Not the GCL one.  Of course other compilers may do the same
and I may be wrong about the latest GCL.

Cheers
--
Marco
From: Christophe Rhodes
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <sq4q4jii34.fsf@cam.ac.uk>
"Marco Antoniotti" <·······@gmail.com> writes:

> and I may be wrong about the latest GCL.

You are.

Christophe
From: Marco Antoniotti
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1136474436.911372.69090@g47g2000cwa.googlegroups.com>
Christophe Rhodes wrote:
> "Marco Antoniotti" <·······@gmail.com> writes:
>
> > and I may be wrong about the latest GCL.
> 
> You are.

Good to know.

Cheers
--
Marco
From: Thomas F. Burdick
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <xcvd5j8mfmk.fsf@conquest.OCF.Berkeley.EDU>
"Rob Thorpe" <·············@antenova.com> writes:

> Ulrich Hobelmann wrote:
>
> > Most CPUs speculate branches.  Even the cheapest ones do static branch
> > prediction (mostly: forward = not taken); the PPC even allows you to
> > tell in in the instruction if a branch is likely to be taken.  At least
> > every desktop CPU does lots of dynamic prediction as well.
> 
> They do a certain amount yes.  But by adding tag checks you are putting
> further strain on resources that could be used for other things.
> Filling branch predictors and BTBs with otherwise unnecessary
> information.

This is absolutely correct, but it's also not relevant for a lot of
code.  In the fairly common case where the branch predictors, etc,
would not have been filled up by statically typed code, these features
do go some way to removing the runtime cost of dynamic typing.  But it
is true that it's a pipe dream to think that modern architectures are
going to make dynamic typing free.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | Free Mumia Abu-Jamal! |
     ,--'    _,'   | Abolish the racist    |
    /       /      | death penalty!        |
   (   -.  |       `-----------------------'
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: ······@earthlink.net
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <1136915412.703462.79370@o13g2000cwo.googlegroups.com>
Rob Thorpe wrote:
> That's true.  In modern microarchitectures there is no reason to have
> more parallel instruction resources than are already there, because
> there are too many dependent instructions.  So, if you have 3 available
> and you waste 2 of them doing work that is not strictly speaking
> necessary, then you get slower execution.

Only if you could have done something necessary with those instruction
opportunities.  If the necessary instructions can't use those
opportunities because of dependencies, those opportunities can either
be completely wasted or used for something that might help.

> 1. compare the tag against a mask
> 2. On the basis of the tag jump to the relevant piece of code.
> 3. Remove the tag from the atom before using it (often but not always)
>
> Step 2 can be done in two ways.  There could be a set of conditional
> checks and jumps, which is relatively slow, requires lots of
> instructions and consumes space in the branch predictor.
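
As a toy illustration, the three steps quoted above can be sketched in
Python (the 2-bit tag layout and all the names here are made up, not
taken from any particular Lisp implementation):

```python
# Simulate machine words carrying a 2-bit type tag in the low bits,
# with tag 00 meaning "fixnum", roughly as many Lisp systems do.
TAG_MASK   = 0b11
TAG_FIXNUM = 0b00

def tagged_fixnum(i):
    # A fixnum is stored shifted left, leaving tag bits 00 at the bottom.
    return i << 2

def checked_add(a, b):
    # Step 1: compare the tags against a mask.
    if (a & TAG_MASK) != TAG_FIXNUM or (b & TAG_MASK) != TAG_FIXNUM:
        # Step 2: branch to the handler for other types / errors.
        raise TypeError("not a fixnum")
    # Step 3 ("often but not always"): with tag 00 no untagging is
    # needed here, since the sum of two tagged fixnums is the tagged sum.
    return a + b

print(checked_add(tagged_fixnum(2), tagged_fixnum(3)) >> 2)  # untag: 5
```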

"Consumes space in the branch predictor" is not necessarily a
limitation.  The space that is likely to matter is a function of
whether the branches are taken.  If, for a given set of checks, only
one branch is taken (over a reasonable period of time), that's the
only branch that is necessarily consuming significant predictor
resources.  (The state to handle the misprediction recovery for lots
of outstanding branches is a lot smaller than the state to predict
lots of branches, and the "predict not taken" case can use none of the
latter.)

Correctly predicted/consistently not-taken branches are effectively
no-ops, which aren't free, but they may use some resources (issue
opportunities) that would have otherwise been wasted.

This is one place where code reuse can be more expensive than code
copies.  Most user-visible data operations (in lisp) are almost always
working on the same data types, so only one path is dynamically
active.  Sharing type-check code can be a disaster.

> Also, microprocessors cannot speculate down very many branches at once.
>  At some point the processor must stop speculating because it cannot
> store the necessary state.

As I wrote above, that state is relatively small.  Since a processor
doesn't know that a given instruction is a branch until after decode,
which is after some subsequent instructions have been fetched and
possibly started decode, I'd be surprised if many modern
high-performance processors stalled for that reason.  In other words,
I suspect that most can handle "every outstanding instruction is a
branch".

> Overall, as Jon said, the proof of the pudding is in the eating.  If
> you can show me some lisp code run on an implementation using tagged
> data types achieving similar performance to the same algorithm executed
> using static types (not necessarily in lisp) then I'm obviously wrong.
> (Assuming that the code in question does a sensible number of type
> checks, it's not tied up in disk I/O or multiple-precision arithmetic
> (like this fibonacci test).)

You're forgetting cache misses.  Data cache misses are a killer these
days.

-andy
From: Jon Harrop
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <43baaa59$0$63060$ed2e19e4@ptn-nntp-reader04.plus.net>
Ulrich Hobelmann wrote:
> Interestingly modern languages (or those 40 years old) can benefit from
> these modern architectures and not run much slower than, say, C.
> Instruction-level parallelism means that the CPU can run both a type
> check (or tag check or something like that) and the "actual" operation
> in parallel.  Only those few architectures (such as portable ARM CPUs)
> that aren't superscalar might still be way faster in languages without
> runtime overhead.

Don't performance benchmarks point to the contrary?  Run-time type
checks still impose a significant overhead, and people still go to
great lengths to remove them, either by using static typing or by
adding type declarations to help the compilers of 40-year-old
languages.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html
From: Jon Harrop
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <43b4922c$0$63069$ed2e19e4@ptn-nntp-reader04.plus.net>
John M wrote:
> I tried to come up with equivalent implementations
> in each language.  Apologies if I did not succeed.
> 
> I was surprised at the result since I had read that
> Lisp was the faster language.

I reduced the argument from 10^6 to 10^5 to save time and added SML, OCaml
and Mathematica implementations for comparison:

 1.435s MLton
 1.693s Mathematica
 2.073s OCaml
 2.709s Python
 7.813s SBCL
11.140s CMUCL
13.031s CLisp
49.162s MzScheme

Here's my SML implementation for MLton:

  fun fib(c, n, r) : IntInf.int = if r=0 then c else fib(n, c+n, r-1)
  val _ = fib(0, 1, 100000)

You can get the SML to print its answer by changing the second line to:

  val () = print (IntInf.toString (fib(1, 1, 100000)))

The OCaml implementation is uglier because OCaml lacks bigint literals:

  open Big_int
  let rec fib c n r = if r=0 then c else fib n (add_big_int c n) (r-1)
  let _ = fib zero_big_int unit_big_int 100000

The Mathematica version is remarkably fast given the lack of compilation:

  $IterationLimit = Infinity;
  Fib[c_, _, 0] := c
  Fib[c_, n_, r_] := Fib[n, c + n, r - 1]
  AbsoluteTiming[Fib[0, 1, 100000];]

Here's the equivalent Lisp but it isn't significantly faster than yours:

  (defun fib (c n r)
    (if (= r 0) c
      (fib n (+ c n) (- r 1))))
  (time (fib 0 1 100000))
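
An equivalent Python version, written as a loop because CPython does
not eliminate tail calls, would be something like:

```python
# Accumulator-style Fibonacci matching fib(c, n, r) above; the while
# loop stands in for the tail recursion, which CPython won't optimize.
def fib(c, n, r):
    while r > 0:
        c, n, r = n, c + n, r - 1
    return c

print(fib(0, 1, 100000))
```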

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html
From: John
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <cK-dnTIb4uRHtijeRVn-hQ@comcast.com>
Jon Harrop wrote:

> John M wrote:
>> I tried to come up with equivalent implementations
>> in each language.  Apologies if I did not succeed.
>> 
>> I was surprised at the result since I had read that
>> Lisp was the faster language.
> 
> I reduced the argument from 10^6 to 10^5 to save time and added SML, OCaml
> and Mathematica implementations for comparison:
> 
>  1.435s MLton
>  1.693s Mathematica
>  2.073s OCaml
>  2.709s Python
>  7.813s SBCL
> 11.140s CMUCL
> 13.031s CLisp
> 49.162s MzScheme

MLton, OCaml, SBCL and CMUCL are native-code compiled.
Python, CLISP and MzScheme are byte-code interpreted.

I would expect the languages within each group to have similar
performance numbers.

> 
> Here's my SML implementation for MLton:
> 
>   fun fib(c, n, r) : IntInf.int = if r=0 then c else fib(n, c+n, r-1)
>   val _ = fib(0, 1, 100000)
> 
> You can get the SML to print its answer by changing the second line to:
> 
>   val () = print (IntInf.toString (fib(1, 1, 100000)))
> 
> The OCaml implementation is uglier because OCaml lacks bigint literals:
> 
>   open Big_int
>   let rec fib c n r = if r=0 then c else fib n (add_big_int c n) (r-1)
>   let _ = fib zero_big_int unit_big_int 100000
> 
> The Mathematica is remarkably fast given the lack of compilation:
> 
>   $IterationLimit = Infinity;
>   Fib[c_, _, 0] := c
>   Fib[c_, n_, r_] := Fib[n, c + n, r - 1]
>   AbsoluteTiming[Fib[0, 1, 100000];]
> 
> Here's the equivalent Lisp but it isn't significantly faster than yours:
> 
>   (defun fib (c n r)
>     (if (= r 0) c
>       (fib n (+ c n) (- r 1))))
>   (time (fib 0 1 100000))
> 
From: Patrick Frankenberger
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <dp3hqp$kcf$01$1@news.t-online.com>
John wrote:
> MLton, OCaml, SBCL, CMUCL are native-code compiled.
> Python, CLISP, MzScheme are byte-code interpreted.
> 
> I would expect the groups to have similar performance numbers.

native-code vs. byte-code only matters when:
a) the byte-code isn't JIT-compiled
b) the decoding of the byte-code takes time significant in relation to 
the time taken by the execution of the byte-code
c) optimizations across byte-code borders are possible when native-compiling

Your testcase mostly stresses GC and bignum performance, for which
neither b) nor c) is relevant.
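
A rough way to see this: the k-th Fibonacci number is about 0.69*k
bits long, so the n additions cost O(n^2) bit operations in total.
Something like this Python sketch (timings are machine-dependent)
shows the roughly quadratic growth:

```python
import time

def fib(n):
    # Iterative Fibonacci; nearly all the work is bignum addition once
    # the numbers outgrow machine words.
    f0, f1 = 0, 1
    for _ in range(n):
        f0, f1 = f1, f0 + f1
    return f0

for n in (25_000, 50_000, 100_000):
    t0 = time.perf_counter()
    f = fib(n)
    dt = time.perf_counter() - t0
    # doubling n roughly quadruples the time
    print(n, f.bit_length(), round(dt, 3))
```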

HTH,
Patrick
From: John
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <csydnaDKG_6A9SjenZ2dnUVZ_vidnZ2d@comcast.com>
Patrick Frankenberger wrote:

> John wrote:
>> MLton, OCaml, SBCL, CMUCL are native-code compiled.
>> Python, CLISP, MzScheme are byte-code interpreted.
>> 
>> I would expect the groups to have similar performance numbers.
> 
> native-code vs. byte-code only matters when:
> a) the byte-code isn't JIT-compiled
> b) the decoding of the byte-code takes time significant in relation to
> the time taken by the execution of the byte-code
> c) optimizations across byte-code borders are possible when
> native-compiling
> 
> Your testcase mostly stresses GC and bignum performance for which
> neither b) nor c) are relevant.
> 
> HTH,
> Patrick

OK, here's the baseline for bignums in C with GMP:

#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>


void fib(int n, mpz_t result)
{
  int i;
  mpz_t f, f1, f0;
  mpz_init(f);
  mpz_init(f1); mpz_set_ui(f1, 1);
  mpz_init(f0);

  /* keep small-n results in f so the final mpz_set(result, f) is right */
  if (n == 0) mpz_set(f, f0);
  if (n == 1) mpz_set(f, f1);

  for (i = 2; i <= n; i++) {
    mpz_add(f, f1, f0);
    mpz_set(f0, f1);
    mpz_set(f1, f);
  }

  mpz_set(result, f);
  mpz_clear(f0);
  mpz_clear(f1);
  mpz_clear(f);
}

int main()
{
  mpz_t result;
  mpz_init(result);
  fib(1000000, result);
  gmp_printf("%Zd\n", result);
  mpz_clear(result);
  exit(EXIT_SUCCESS);
}


(on FreeBSD)
% gcc -I/usr/local/include -L/usr/local/lib fib.c -lgmp

% time ./a.out

37205802043347225419033684684301719893411568996526838242546875
54.068u 0.000s 0:57.63 93.8%    5+592k 0+0io 0pf+0w
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <87zmmitxa5.fsf@qrnik.zagroda>
Patrick Frankenberger <···············@gmx.net> writes:

> Your testcase mostly stresses GC and bignum performance for which
> neither b) nor c) are relevant.

Indeed. The difference between Kogut and KoScheme is 15%-18%.
They use the same bignum implementation and GC, but one of them
is an interpreter hosted by the other, which is compiled via C.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
From: Jon Harrop
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <43baac10$0$63060$ed2e19e4@ptn-nntp-reader04.plus.net>
Patrick Frankenberger wrote:
> native-code vs. byte-code only matters when:
> a) the byte-code isn't JIT-compiled
> b) the decoding of the byte-code takes time significant in relation to
> the time taken by the execution of the byte-code
> c) optimizations across byte-code borders are possible when
> native-compiling
> 
> Your testcase mostly stresses GC and bignum performance for which
> neither b) nor c) are relevant.

Yes. OCaml bytecode takes only 2.5s, for example.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html
From: Jens Axel Søgaard
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <43b50532$0$47092$edfadb0f@dread15.news.tele.dk>
Jon Harrop wrote:

> 49.162s MzScheme

Where is your mzscheme code, and how did you run it?

-- 
Jens Axel Søgaard
From: Jens Axel Søgaard
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <43b53828$0$47091$edfadb0f@dread15.news.tele.dk>
Jens Axel Søgaard wrote:
> Jon Harrop wrote:
> 
>> 49.162s MzScheme

You must have done something wrong.


I put this in "fib.scm":

(module fib mzscheme
   (define (fib c n r)
     (if (= r 0)
         c
         (fib n (+ c n) (sub1 r))))

   (collect-garbage)
   (display (time (fib 0 1 100000))))

(require fib)
(exit)

And ran it this way:

··@JS51 /c/tmp
$ /c/Programmer/PLT/mzc.exe -z fib.scm
mzc version 300, Copyright (c) 2004-2005 PLT Scheme Inc.
  [output to "fib.zo"]

··@JS51 /c/tmp
$ /c/Programmer/PLT/mzscheme.exe -f fib.zo
Welcome to MzScheme version 300, Copyright (c) 2004-2005 PLT Scheme Inc.
cpu time: 4466 real time: 4507 gc time: 3766
25974069...

This is very far from the 49 seconds you reported (I am on a 1.6GHz
Pentium M).

[I didn't have a C compiler on this machine; otherwise I would have
used mzc to compile it to C instead of to a .zo (which contains
bytecode).]


-- 
Jens Axel Søgaard
From: Jon Harrop
Subject: Re: Python and Lisp Test
Date: 
Message-ID: <43baa981$0$63060$ed2e19e4@ptn-nntp-reader04.plus.net>
Jens Axel Søgaard wrote:
> You must have done something wrong.

Yes. I'm not sure what though.

Doing the same thing on this 900MHz Athlon t-bird under Debian Linux, I get:

$ cat >fib.scm
(module fib mzscheme
   (define (fib c n r)
     (if (= r 0)
         c
         (fib n (+ c n) (sub1 r))))

   (collect-garbage)
   (time (fib 0 1 100000)))

(require fib)
(exit)
$ mzc -z fib.scm
MzScheme compiler (mzc) version 209, Copyright (c) 2004 PLT Scheme, Inc.
 [output to "fib.zo"]
$ time mzscheme -f fib.zo
Welcome to MzScheme version 209, Copyright (c) 2004 PLT Scheme, Inc.
cpu time: 32730 real time: 36142 gc time: 25760

real    0m36.229s
user    0m32.390s
sys     0m0.430s

Any ideas what I'm doing wrong?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/ocaml_for_scientists/chapter1.html