Could the SBCL (or any X Lisp) compiler be optimised to Qi output?

From: Mark Tarver
Subject: Could the SBCL (or any X Lisp) compiler be optimised to Qi output?
Date: Fri, 13 Jul 2007 11:08:55 +0000
Message-ID: <1184324935.528639.65290@n2g2000hse.googlegroups.com>

The recent timings of Qi turbo-E seem to show that this unreleased
version produces list handling
code which is as good as the most heavily optimised code that a
professional Lisp programmer
can produce unless he goes into the dark arts of destructive
programming :).

Despite this, OCaml still has a factor of at least 2 over the code
that Qi turbo-E produces*. I don't think it lies within my power as a
Lisp programmer to significantly better the object code I'm producing
(feel free to argue otherwise, code is below, I'm delighted to accept
suggestions).

However Qi-turbo-E-outputted Lisp is quite rigid and uses a limited
numbers of constructs - about 15 in all.

defun, cons, list, car, cdr, consp, block, tagbody, return, go, eq,
equal, eql, let, if.

** This is a question for Lisp developers - could SBCL (or any CL) be
optimised around these features? Can improved performance be got from
clever implementation of any of these basic features so as to make
Lisp competitive with O'Caml here? **

Mark

* I'm awaiting final confirmation on this - but it looks like this is
about right.

*************************************************************************
The Source Code

(define simplify
  [Op A B] -> (s Op (simplify A) (simplify B))
  A -> A)

(define s
  + M N -> (+ M N)	where (and (number? M) (number? N))
  + 0 F -> F
  + F 0 -> F
  + A [+ B C] -> [+ [+ A B] C]
  * M N -> (* M N)	where (and (number? M) (number? N))
  * 0 F -> 0
  * F 0 -> 0
  * F 1 -> F
  * 1 F -> F
  * A [* B C] -> [* [* A B] C]
  Op A B -> [Op A B])

The Object Code

;; this was hand-edited to lowercase for modern Lisp settings
;; compiler directives issued by hand (Qi has this preset)
;; Qi generated the uppercase (hope no error crept in during editing)
;; it runs under bare metal Lisp - no Qi needed
;; generated by Qi turbo-E (unreleased)

(defun simplify (V191)
  (declare (optimize (speed 3)))
  (block nil
    (tagbody
      (if (consp V191)
          (let ((Cdr202 (cdr V191)))
            (if (consp Cdr202)
                (let ((Cdr201 (cdr Cdr202)))
                  (if (consp Cdr201)
                      (if (null (cdr Cdr201))
                          (return
                           (s (car V191) (simplify (car Cdr202))
                              (simplify (car Cdr201))))
                          (go tag197))
                      (go tag197)))
                (go tag197))))
     tag197
      (return V191))))

(defun s (V192 V193 V194)
  (declare (optimize (speed 3)))
   (block nil
    (tagbody
      (if (eq '+ V192)
          (if (and (numberp V193) (numberp V194))
              (return (the number (+ V193 V194)))
              (if (eql 0 V193) (return V194)
                  (if (eql 0 V194) (return V193)
                      (if (consp V194)
                          (let ((Cdr213 (cdr V194)))
                            (if (eq '+ (car V194))
                                (if (consp Cdr213)
                                    (let ((Cdr212 (cdr Cdr213)))
                                      (if (consp Cdr212)
                                          (if (null (cdr Cdr212))
                                              (return
                                               (cons '+
                                                     (cons
                                                      (list '+ V193
                                                            (car
Cdr213))
                                                      Cdr212)))
                                              (go tag203))
                                          (go tag203)))
                                    (go tag203))
                                (go tag203)))
                          (go tag203))))))
     tag203
      (tagbody
        (if (eq '* V192)
            (if (and (numberp V193) (numberp V194))
                (return (the number (* V193 V194)))
                (if (eql 0 V193) (return 0)
                    (if (eql 0 V194) (return 0)
                        (if (eql 1 V194) (return V193)
                            (if (eql 1 V193) (return V194)
                                (if (consp V194)
                                    (let ((Cdr226 (cdr V194)))
                                      (if (eq '* (car V194))
                                          (if (consp Cdr226)
                                              (let ((Cdr225 (cdr
Cdr226)))
                                                (if (consp Cdr225)
                                                    (if (null (cdr
Cdr225))
                                                        (return
                                                         (cons '*
                                                               (cons
                                                                (list
'* V193
 
(car
 
Cdr226))
 
Cdr225)))
                                                        (go tag214))
                                                    (go tag214)))
                                              (go tag214))
                                          (go tag214)))
                                    (go tag214))))))))
       tag214
        (return (list V192 V193 V194))))))

(setq *expr* '(* X (+ (+ (* 12 0) (+ 23 8)) Y)))

(defun tt (N) (declare (optimize (speed 3))) (gc) (time (test N)))

(defun test (N)
  (declare (optimize (speed 3)))
    (COND ((ZEROP N) 0)
          (T (simplify *expr*) (test (1- N)))))

(mapc 'compile '(tt test simplify s))

;; the benchmark - 10^7 iterations

(tt 10000000)

Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output? jayessay
- Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output? Mark Tarver
  - Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output? jayessay
Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output? David Golden
- Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output? Mark Tarver
Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output? Alex Mizrahi
- Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output? Jon Harrop
  - a vast difference? Mark Tarver
    - Re: a vast difference? Jon Harrop
      - Re: a vast difference? Mark Tarver
        Re: a vast difference? Dan Bensen
        Re: a vast difference? Dan Bensen
        Re: a vast difference? Jon Harrop
        Re: a vast difference? Mark Tarver
        Re: a vast difference? Raffael Cavallaro
        Re: a vast difference? Jon Harrop
        Re: a vast difference? Isaac Gouy
        Re: a vast difference? Waldek Hebisch

From: jayessay
Subject: Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output?
Date: Fri, 13 Jul 2007 16:04:32 +0000
Message-ID: <m34pk8thm7.fsf@sirius.goldenthreadtech.com>

Mark Tarver <··········@ukonline.co.uk> writes:

> *************************************************************************
> The Source Code

....
> The Object Code
....

I'm pretty sure you can do much better than this as your code is not
statically optimized at all.  The approach would likely involve
switching to vectors, fully declaring them and all indexing operations
on them, and (quite possibly) reusing preallocated chunks as much as
"reasonable".  I've done stuff like this and been able to obtain
machine code (check the disassembly) that is pretty close to as good
as you could get[1].

/Jon

1. Mind you, it may take some real work to set up the foundation and
   figure out how best to use it.  So, it is likely this is only worth
   doing if it really means something (for your application, future
   work, etc.)

-- 
'j' - a n t h o n y at romeo/charley/november com

From: Mark Tarver
Subject: Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output?
Date: Fri, 13 Jul 2007 16:26:30 +0000
Message-ID: <1184343990.905527.47510@w3g2000hsg.googlegroups.com>

On 13 Jul, 17:04, jayessay <······@foo.com> wrote:
> Mark Tarver <··········@ukonline.co.uk> writes:
> > *************************************************************************
> > The Source Code
>
> ....> The Object Code
>
> ....
>
> I'm pretty sure you can do much better than this as your code is not
> statically optimized at all.  The approach would likely involve
> switching to vectors, fully declaring them and all indexing operations
> on them, and (quite possibly) reusing preallocated chunks as much as
> "reasonable".  I've done stuff like this and been able to obtain
> machine code (check the disassembly) that is pretty close to as good
> as you could get[1].
>
> /Jon
>
> 1. Mind you, it may take some real work to set up the foundation and
>    figure out how best to use it.  So, it is likely this is only worth
>    doing if it really means something (for your application, future
>    work, etc.)
>
> --
> 'j' - a n t h o n y at romeo/charley/november com

That would really be equivalent to writing a compiler for Qi as a
stand alone language apart from Lisp.  If you get down to the level of
vectors and addresses and such then you might as well compile into C.
Yes; its a lot of work.  No, I wouldn't fancy it!  Some would though.

Mark

From: jayessay
Subject: Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output?
Date: Fri, 13 Jul 2007 16:42:34 +0000
Message-ID: <m3zm20s1ad.fsf@sirius.goldenthreadtech.com>

Mark Tarver <··········@ukonline.co.uk> writes:

> That would really be equivalent to writing a compiler for Qi as a
> stand alone language apart from Lisp.  If you get down to the level of
> vectors and addresses and such then you might as well compile into C.

I disagree with both these points.  There is no reason here that it
should or needs to be apart from Lisp.  And compiling into C, while
providing for possible improvements (e.g., you really can't get down
to the level of "raw addresses" and stay in Lisp) in certain areas,
would lose all the benefits of keeping things in the dynamic
environment and tightly integrated with Lisp.  I'd say both of these
suggestions are losers for most purposes.

/Jon

-- 
'j' - a n t h o n y at romeo/charley/november com

From: David Golden
Subject: Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output?
Date: Fri, 13 Jul 2007 14:43:12 +0000
Message-ID: <fcMli.20842$j7.378465@news.indigo.ie>

Hmm. Does Qi drop consp checks if it can statically prove an arg is 
always a list based on type information? I'm getting the (quite
possibly mistaken) impression Qi emits "dynamically safe" code even
when further optimisation might be possible by emitting "statically
safe, dynamically unsafe" code.

From: Mark Tarver
Subject: Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output?
Date: Fri, 13 Jul 2007 16:23:55 +0000
Message-ID: <1184343835.280607.128360@q75g2000hsh.googlegroups.com>

On 13 Jul, 15:43, David Golden <············@oceanfree.net> wrote:
> Hmm. Does Qi drop consp checks if it can statically prove an arg is
> always a list based on type information? I'm getting the (quite
> possibly mistaken) impression Qi emits "dynamically safe" code even
> when further optimisation might be possible by emitting "statically
> safe, dynamically unsafe" code.

Qi does emit dynamically safe code (though here there is no type
checking done anyhow) and could optimise further by using type
information.  Its doable to eliminate some consp checks that way -
though not totally trivial.

Mark

From: Alex Mizrahi
Subject: Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output?
Date: Fri, 13 Jul 2007 14:36:59 +0000
Message-ID: <46978e11$0$90271$14726298@news.sunsite.dk>

(message (Hello 'Mark)
(you :wrote  :on '(Fri, 13 Jul 2007 11:08:55 -0000))
(

 MT> Despite this, OCaml still has a factor of at least 2 over the code
 MT> that Qi turbo-E produces*. I don't think it lies within my power as a
 MT> Lisp programmer to significantly better the object code I'm producing
 MT> (feel free to argue otherwise, code is below, I'm delighted to accept
 MT> suggestions).

btw, i've checked your code on ACL 8.0 Express.
results are quite same

; cpu time (non-gc) 2,204 msec user, 0 msec system
; cpu time (gc)     2,640 msec user, 0 msec system
; cpu time (total)  4,844 msec user, 0 msec system

(machine is Intel Core 2 Duo 2.1 GHz)

i see here most time is spent on GC.if OCaml has somewhat different 
allocation/gc schemes, it might be not a fair comparison of pattern 
matchers.

; space allocation:
;  60,000,009 cons cells, 1,104 other bytes, 24 static bytes

possibly we can disable GC (and do that for ocaml too, if it's possible 
there) -- that might be a better comparions ;)
i think we can allocate some GBs of RAM on modern computer, so it won't need 
to do GC.

or some GC tuning could optimize things as well. all junk generated after 
one simplification can be eliminated, so ephemeral GC could just discard 
young generation.

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"scorn")

From: Jon Harrop
Subject: Re: Could the SBCL (or any X Lisp) compiler be optimised to Qi output?
Date: Sat, 14 Jul 2007 15:46:04 +0000
Message-ID: <4698f0fa$0$1589$ed2619ec@ptn-nntp-reader02.plus.net>

Alex Mizrahi wrote:
> i see here most time is spent on GC.if OCaml has somewhat different
> allocation/gc schemes, it might be not a fair comparison of pattern
> matchers.

OCaml is vastly faster at allocating. However, it is also vastly faster at
pattern matching...

-- 
Dr Jon D Harrop, Flying Frog Consultancy
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?usenet

From: Mark Tarver
Subject: a vast difference?
Date: Sat, 14 Jul 2007 20:57:46 +0000
Message-ID: <1184446666.091355.212330@r34g2000hsd.googlegroups.com>

On 14 Jul, 16:46, Jon Harrop <····@ffconsultancy.com> wrote:
> Alex Mizrahi wrote:
> > i see here most time is spent on GC.if OCaml has somewhat different
> > allocation/gc schemes, it might be not a fair comparison of pattern
> > matchers.
>
> OCaml is vastly faster at allocating. However, it is also vastly faster at
> pattern matching...
>
> --
> Dr Jon D Harrop, Flying Frog Consultancy
> The OCaml Journalhttp://www.ffconsultancy.com/products/ocaml_journal/?usenet

I've only been able to find a factor of 2 on the example you supplied
compared to the code I generated.

I looked up Lisp vs OCaml on benchmark shootout

http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all

Again not a vast difference.  Where do you get your figures from?

Mark

From: Jon Harrop
Subject: Re: a vast difference?
Date: Sun, 15 Jul 2007 00:21:27 +0000
Message-ID: <46996c8d$0$1631$ed2619ec@ptn-nntp-reader02.plus.net>

Mark Tarver wrote:
> I've only been able to find a factor of 2 on the example you supplied
> compared to the code I generated.

Yes. However, you are comparing the most heavily optimized Lisp
implementations with unoptimized OCaml without compiler flags.

Adding "-inline 0" makes the unoptimized OCaml ~25% faster. Optimizing the
OCaml makes it 2-3 times faster again.

Timing my fastest OCaml against Nathan's Lisp at simplifying a
randomly-generated 12Mb expression, OCaml is 8x faster.

> I looked up Lisp vs OCaml on benchmark shootout
> 
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
> 
> Again not a vast difference.

Short value lifetime distributions are a pivotal feature of this benchmark
and none of the current shootout benchmarks share that property. The ray
tracer did but they deleted it.

> Where do you get your figures from? 

For the efficiency of pattern matching, I looked at the generated code. For
the efficiency of the run-time, this was an important lesson that my ray
tracer benchmark taught me:

  http://www.ffconsultancy.com/languages/ray_tracer/results.html

Look how much faster Juho's unboxing macro makes the final Lisp
implementation. Short value lifetime distributions are a consequence of
idiomatic OCaml programming (heavy use of immutable data structures) but
not of idiomatic Lisp. Consequently, OCaml's run-time is better optimized
for benchmarks like this.

I believe the slow OCaml is faster than the fastest Lisp for two main
reasons:

1. Efficient pattern matching: decision tree optimizations, dispatch tables,
exploit static type information. OCaml's pattern matcher is very heavily
optimized and it will be a lot of work to write something competitive.

2. Efficient run-time: the purely functional style results in rapid
allocation and collection of many short-lived values (AST nodes).

The fast OCaml is faster than the slow OCaml for two main reasons:

3. Static typing: the slow OCaml uses an inferred sum type, which imposes
the performance hit of dynamic typing, whereas the fast OCaml uses an
explicitly-specified sum type.

3. Unboxing: I manually unboxed the data structures quite a bit in the fast
OCaml, to avoid unnecessary temporaries.

This last point is the reason why Stalin and MLton will produce code several
times faster than the idiomatic OCaml.

On the basis of this, I would suggest several optimizations to the Lisp:

1. Factor out the patterns for the +: and *: operators as I did in the
OCaml.

2. Try mutating the s-expr in-place to reduce the stress on the run-time.

3. Try to unbox more aggressively, e.g. by unrolling the simplify function.

You may be interested in the s-expr intermediate representation generated by
OCaml's pattern match compiler:

$ ocamlc -dlambda -c simplify.ml
(seq
  (letrec
    (+:/61
       (function f/62 g/63
         (catch
           (catch
             (if (isint f/62) (exit 3)
               (if (!= (field 0 f/62) 3654863) (exit 3)
                 (let (n/64 (field 1 f/62))
                   (catch
                     (if (isint g/63) (exit 4)
                       (if (!= (field 0 g/63) 3654863) (exit 4)
                         (makeblock 0 3654863
                           (apply (field 0 (global Num!)) n/64
                             (field 1 g/63)))))
                    with (4)
                     (switch n/64
                      case tag 0: (if (!= (field 0 n/64) 0) (exit 3) g/63)
                      default: (exit 3))))))
            with (3)
             (if (isint g/63) (exit 2)
               (let (variant/106 (field 0 g/63))
                 (if (!= variant/106 3254785)
                   (if (!= variant/106 3654863) (exit 2)
                     (let (match/103 (field 1 g/63))
                       (switch match/103
                        case tag 0:
                         (if (!= (field 0 match/103) 0) (exit 2) f/62)
                        default: (exit 2))))
                   (let (match/105 (field 1 g/63))
                     (apply +:/61 (apply +:/61 f/62 (field 0 match/105))
                       (field 1 match/105)))))))
          with (2) (makeblock 0 3254785 (makeblock 0 f/62 g/63)))))
    (setfield_imm 0 (global Simplify!) +:/61))
  (letrec
    (*:/73
       (function f/74 g/75
         (catch
           (catch
             (catch
               (catch
                 (catch
                   (if (isint f/74) (exit 10)
                     (if (!= (field 0 f/74) 3654863) (exit 10)
                       (let (n/76 (field 1 f/74))
                         (catch
                           (if (isint g/75) (exit 11)
                             (if (!= (field 0 g/75) 3654863) (exit 11)
                               (makeblock 0 3654863
                                 (apply (field 5 (global Num!)) n/76
                                   (field 1 g/75)))))
                          with (11)
                           (switch n/76
                            case tag 0:
                             (if (!= (field 0 n/76) 0) (exit 10) (exit 5))
                            default: (exit 10))))))
                  with (10)
                   (if (isint g/75) (exit 9)
                     (if (!= (field 0 g/75) 3654863) (exit 9)
                       (let (match/114 (field 1 g/75))
                         (switch match/114
                          case tag 0:
                           (if (!= (field 0 match/114) 0) (exit 9) (exit 5))
                          default: (exit 9))))))
                with (9)
                 (if (isint f/74) (exit 8)
                   (if (!= (field 0 f/74) 3654863) (exit 8)
                     (let (match/117 (field 1 f/74))
                       (switch match/117
                        case tag 0:
                         (if (!= (field 0 match/117) 1) (exit 8) g/75)
                        default: (exit 8))))))
              with (8)
               (if (isint g/75) (exit 7)
                 (let (variant/123 (field 0 g/75))
                   (if (!= variant/123 3654863)
                     (if (!= variant/123 3855332) (exit 7)
                       (let (match/122 (field 1 g/75))
                         (apply *:/73 (apply *:/73 f/74 (field 0 match/122))
                           (field 1 match/122))))
                     (let (match/120 (field 1 g/75))
                       (switch match/120
                        case tag 0:
                         (if (!= (field 0 match/120) 1) (exit 7) f/74)
                        default: (exit 7)))))))
            with (7) (makeblock 0 3855332 (makeblock 0 f/74 g/75)))
          with (5) [0: 3654863 [0: 0]])))
    (setfield_imm 1 (global Simplify!) *:/73))
  (letrec
    (simplify/87
       (function f/88
         (let (variant/128 (field 0 f/88))
           (if (!= variant/128 3855332)
             (if (>= variant/128 3254786) f/88
               (let (match/124 (field 1 f/88))
                 (apply (field 0 (global Simplify!))
                   (apply simplify/87 (field 0 match/124))
                   (apply simplify/87 (field 1 match/124)))))
             (let (match/125 (field 1 f/88))
               (apply (field 1 (global Simplify!))
                 (apply simplify/87 (field 0 match/125))
                 (apply simplify/87 (field 1 match/125))))))))
    (setfield_imm 2 (global Simplify!) simplify/87))
  0a)

-- 
Dr Jon D Harrop, Flying Frog Consultancy
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?usenet

From: Mark Tarver
Subject: Re: a vast difference?
Date: Sun, 15 Jul 2007 08:49:23 +0000
Message-ID: <1184489363.216811.111450@m3g2000hsh.googlegroups.com>

On 15 Jul, 01:21, Jon Harrop <····@ffconsultancy.com> wrote:
> Mark Tarver wrote:
> > I've only been able to find a factor of 2 on the example you supplied
> > compared to the code I generated.
>
> Yes. However, you are comparing the most heavily optimized Lisp
> implementations with unoptimized OCaml without compiler flags.
>
> Adding "-inline 0" makes the unoptimized OCaml ~25% faster. Optimizing the
> OCaml makes it 2-3 times faster again.
>
> Timing my fastest OCaml against Nathan's Lisp at simplifying a
> randomly-generated 12Mb expression, OCaml is 8x faster.
>
> > I looked up Lisp vs OCaml on benchmark shootout
>
> >http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
>
> > Again not a vast difference.
>
> Short value lifetime distributions are a pivotal feature of this benchmark
> and none of the current shootout benchmarks share that property. The ray
> tracer did but they deleted it.
>
> > Where do you get your figures from?
>
> For the efficiency of pattern matching, I looked at the generated code. For
> the efficiency of the run-time, this was an important lesson that my ray
> tracer benchmark taught me:
>
>  http://www.ffconsultancy.com/languages/ray_tracer/results.html
>
> Look how much faster Juho's unboxing macro makes the final Lisp
> implementation. Short value lifetime distributions are a consequence of
> idiomatic OCaml programming (heavy use of immutable data structures) but
> not of idiomatic Lisp. Consequently, OCaml's run-time is better optimized
> for benchmarks like this.
>
> I believe the slow OCaml is faster than the fastest Lisp for two main
> reasons:
>
> 1. Efficient pattern matching: decision tree optimizations, dispatch tables,
> exploit static type information. OCaml's pattern matcher is very heavily
> optimized and it will be a lot of work to write something competitive.
>
> 2. Efficient run-time: the purely functional style results in rapid
> allocation and collection of many short-lived values (AST nodes).
>
> The fast OCaml is faster than the slow OCaml for two main reasons:
>
> 3. Static typing: the slow OCaml uses an inferred sum type, which imposes
> the performance hit of dynamic typing, whereas the fast OCaml uses an
> explicitly-specified sum type.
>
> 3. Unboxing: I manually unboxed the data structures quite a bit in the fast
> OCaml, to avoid unnecessary temporaries.
>
> This last point is the reason why Stalin and MLton will produce code several
> times faster than the idiomatic OCaml.
>
> On the basis of this, I would suggest several optimizations to the Lisp:
>
> 1. Factor out the patterns for the +: and *: operators as I did in the
> OCaml.
>
> 2. Try mutating the s-expr in-place to reduce the stress on the run-time.
>
> 3. Try to unbox more aggressively, e.g. by unrolling the simplify function.
>
> You may be interested in the s-expr intermediate representation generated by
> OCaml's pattern match compiler:
>
> $ ocamlc -dlambda -c simplify.ml
> (seq
>   (letrec
>     (+:/61
>        (function f/62 g/63
>          (catch
>            (catch
>              (if (isint f/62) (exit 3)
>                (if (!= (field 0 f/62) 3654863) (exit 3)
>                  (let (n/64 (field 1 f/62))
>                    (catch
>                      (if (isint g/63) (exit 4)
>                        (if (!= (field 0 g/63) 3654863) (exit 4)
>                          (makeblock 0 3654863
>                            (apply (field 0 (global Num!)) n/64
>                              (field 1 g/63)))))
>                     with (4)
>                      (switch n/64
>                       case tag 0: (if (!= (field 0 n/64) 0) (exit 3) g/63)
>                       default: (exit 3))))))
>             with (3)
>              (if (isint g/63) (exit 2)
>                (let (variant/106 (field 0 g/63))
>                  (if (!= variant/106 3254785)
>                    (if (!= variant/106 3654863) (exit 2)
>                      (let (match/103 (field 1 g/63))
>                        (switch match/103
>                         case tag 0:
>                          (if (!= (field 0 match/103) 0) (exit 2) f/62)
>                         default: (exit 2))))
>                    (let (match/105 (field 1 g/63))
>                      (apply +:/61 (apply +:/61 f/62 (field 0 match/105))
>                        (field 1 match/105)))))))
>           with (2) (makeblock 0 3254785 (makeblock 0 f/62 g/63)))))
>     (setfield_imm 0 (global Simplify!) +:/61))
>   (letrec
>     (*:/73
>        (function f/74 g/75
>          (catch
>            (catch
>              (catch
>                (catch
>                  (catch
>                    (if (isint f/74) (exit 10)
>                      (if (!= (field 0 f/74) 3654863) (exit 10)
>                        (let (n/76 (field 1 f/74))
>                          (catch
>                            (if (isint g/75) (exit 11)
>                              (if (!= (field 0 g/75) 3654863) (exit 11)
>                                (makeblock 0 3654863
>                                  (apply (field 5 (global Num!)) n/76
>                                    (field 1 g/75)))))
>                           with (11)
>                            (switch n/76
>                             case tag 0:
>                              (if (!= (field 0 n/76) 0) (exit 10) (exit 5))
>                             default: (exit 10))))))
>                   with (10)
>                    (if (isint g/75) (exit 9)
>                      (if (!= (field 0 g/75) 3654863) (exit 9)
>                        (let (match/114 (field 1 g/75))
>                          (switch match/114
>                           case tag 0:
>                            (if (!= (field 0 match/114) 0) (exit 9) (exit 5))
>                           default: (exit 9))))))
>                 with (9)
>                  (if (isint f/74) (exit 8)
>                    (if (!= (field 0 f/74) 3654863) (exit 8)
>                      (let (match/117 (field 1 f/74))
>                        (switch match/117
>                         case tag 0:
>                          (if (!= (field 0 match/117) 1) (exit 8) g/75)
>                         default: (exit 8))))))
>               with (8)
>                (if (isint g/75) (exit 7)
>                  (let (variant/123 (field 0 g/75))
>                    (if (!= variant/123 3654863)
>                      (if (!= variant/123 3855332) (exit 7)
>                        (let (match/122 (field 1 g/75))
>                          (apply *:/73 (apply *:/73 f/74 (field 0 match/122))
>                            (field 1 match/122))))
>                      (let (match/120 (field 1 g/75))
>                        (switch match/120
>                         case tag 0:
>                          (if (!= (field 0 match/120) 1) (exit 7) f/74)
>                         default: (exit 7)))))))
>             with (7) (makeblock 0 3855332 (makeblock 0 f/74 g/75)))
>           with (5) [0: 3654863 [0: 0]])))
>     (setfield_imm 1 (global Simplify!) *:/73))
>   (letrec
>     (simplify/87
>        (function f/88
>          (let (variant/128 (field 0 f/88))
>            (if (!= variant/128 3855332)
>              (if (>= variant/128 3254786) f/88
>                (let (match/124 (field 1 f/88))
>                  (apply (field 0 (global Simplify!))
>                    (apply simplify/87 (field 0 match/124))
>                    (apply simplify/87 (field 1 match/124)))))
>              (let (match/125 (field 1 f/88))
>                (apply (field 1 (global Simplify!))
>                  (apply simplify/87 (field 0 match/125))
>                  (apply simplify/87 (field 1 match/125))))))))
>     (setfield_imm 2 (global Simplify!) simplify/87))
>   0a)
>
> --
> Dr Jon D Harrop, Flying Frog Consultancy
> The OCaml Journalhttp://www.ffconsultancy.com/products/ocaml_journal/?usenet

Well as far as I see your claim rests on two programs; your fast
OCaml on a 12Mb random input and your ray tracer.

The first, according to your own words, is not significantly faster
than your short version on the sample problem I gave.  If you're using
a 12Mb input then probably other considerations are influencing the
matter rather than the efficiency of pattern-matching - space
management being one.

Be that as it may, this is not a convenient benchmark to test (random
expressions being - erm - random). I'm not certain about the
practicality of a 12Mb algebraic expression for other people to
compare.  Even in theorem-proving we rarely generate expressions of
that size - if you did chances are your ATP has bellied up.

You may be right but I think you are making a large claim on a small
basis.  I think the consensus has to be 'unproved' on what I've had
the opportunity to see on the web and to test personally.  Drop the
'vastly' and I'd say 'yes'.

Mark

From: Dan Bensen
Subject: Re: a vast difference?
Date: Sun, 15 Jul 2007 15:50:37 +0000
Message-ID: <f7dfcj$vir$1@wildfire.prairienet.org>

 >> I believe the slow OCaml is faster than the fastest Lisp for two main
 >> reasons:
5.  It's buggy.  It's only testing for fixnums (Ints), not rationals and
     bignums (Ratios and Big_ints).  We need an input expression with
     rationals and large numbers.

 >> I would suggest several optimizations
 >> 1. Factor out the patterns for the +: and *: operators
 >>    as I did in the OCaml.
6.  Our "heavily optimized" Lisp programs are using all-purpose lists
     instead of specialized structs.

Mark Tarver wrote:
 > You may be right but I think you are making a large claim on a small
 > basis.
Do you find that surprising?

-- 
Dan
www.prairienet.org/~dsb/

From: Dan Bensen
Subject: Re: a vast difference?
Date: Mon, 16 Jul 2007 10:39:02 +0000
Message-ID: <f7fhgc$l07$1@wildfire.prairienet.org>

Dan Bensen wrote:
>  >> I believe the slow OCaml is faster than the fastest Lisp for two main
>  >> reasons:
> 5.  It's buggy.  It's only testing for fixnums (Ints), not rationals and
>     bignums (Ratios and Big_ints).

Okay, nevermind that.  There was a typedef hiding in the background,
it just wasn't getting posted most of the time.

-- 
Dan
www.prairienet.org/~dsb/

From: Jon Harrop
Subject: Re: a vast difference?
Date: Sun, 15 Jul 2007 21:47:43 +0000
Message-ID: <469a973c$0$1630$ed2619ec@ptn-nntp-reader02.plus.net>

Mark Tarver wrote:
> Well as far as I see your claim rests on two programs; your fast
> OCaml on a 12Mb random input and your ray tracer.

Yes.

> The first, according to your own words, is not significantly faster
> than your short version on the sample problem I gave.  If you're using
> a 12Mb input then probably other considerations are influencing the
> matter rather than the efficiency of pattern-matching - space
> management being one.

Consed space is roughly the same between the fastest Lisp and slow OCaml.

> Be that as it may, this is not a convenient benchmark to test (random
> expressions being - erm - random). I'm not certain about the
> practicality of a 12Mb algebraic expression for other people to
> compare.  Even in theorem-proving we rarely generate expressions of
> that size - if you did chances are your ATP has bellied up.

Fair enough. How about computing derivatives of x^x wrt x? This OCaml
computes the 10th derivative in 3.15s on this machine:

open Num;;

type expr =
  | Rat of num
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr
  | Pow of expr * expr
  | Log of expr;;

let ( +: ) f g =
  match f, g with
  | Rat(Int 0), p | p, Rat(Int 0) -> p
  | Rat p, Rat q -> Rat(p +/ q)
  | p, q -> Add(p, q);;

let ( *: ) f g =
  match f, g with
  | Rat(Int 0), p | p, Rat(Int 0) -> Rat(Int 0)
  | Rat(Int 1), p | p, Rat(Int 1) -> p
  | Rat p, Rat q -> Rat(p */ q)
  | p, q -> Mul(p, q);;

let ( **: ) f g = Pow(f, g);;

let ( -: ) f g = f +: Rat(Int(-1)) *: g;;
let ( /: ) f g = f *: g **: Rat(Int(-1));;

let rec d f x =
  match f with
  | Var y when x=y -> Rat(Int 1)
  | Rat _ | Var _ -> Rat(Int 0)
  | Add(f, g) -> d f x +: d g x
  | Mul(f, g) -> f *: d g x +: g *: d f x
  | Pow(f, g) -> f **: g *: (g *: d f x /: f +: Log f *: d g x)
  | Log f -> d f x /: f;;

let rec nest n f x = if n=0 then x else nest (n-1) f (f x);;

let f =
  let x = Var "x" in
  x **: x;;

let f' = nest 10 (fun f -> d f "x") f;;

Note that simplification is arbitrarily difficult, so other programs must
implement the exact same reduction rules (in the same order) to guarantee
the same result and be comparable. Looking at the leaf count is a good way
to do this:

let rec leaf_count = function
  | Rat _ | Var _ -> 1
  | Add(f, g) | Mul(f, g) | Pow(f, g) -> leaf_count f + leaf_count g
  | Log f -> leaf_count f;;

Printf.printf "%d leaves\n" (leaf_count f');

I get 27,168,043 leaves.

> You may be right but I think you are making a large claim on a small
> basis.  I think the consensus has to be 'unproved' on what I've had
> the opportunity to see on the web and to test personally.  Drop the
> 'vastly' and I'd say 'yes'.

I'm all for quantifying the "vastly" and keeping this objective.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
The OCaml Journal
http://www.ffconsultancy.com/products/ocaml_journal/?usenet

From: Mark Tarver
Subject: Re: a vast difference?
Date: Mon, 16 Jul 2007 11:34:11 +0000
Message-ID: <1184585651.927389.187540@k79g2000hse.googlegroups.com>

On 15 Jul, 22:47, Jon Harrop <····@ffconsultancy.com> wrote:
> Mark Tarver wrote:
> > Well as far as I see your claim rests on two programs; your fast
> > OCaml on a 12Mb random input and your ray tracer.
>
> Yes.
>
> > The first, according to your own words, is not significantly faster
> > than your short version on the sample problem I gave.  If you're using
> > a 12Mb input then probably other considerations are influencing the
> > matter rather than the efficiency of pattern-matching - space
> > management being one.
>
> Consed space is roughly the same between the fastest Lisp and slow OCaml.
>
> > Be that as it may, this is not a convenient benchmark to test (random
> > expressions being - erm - random). I'm not certain about the
> > practicality of a 12Mb algebraic expression for other people to
> > compare.  Even in theorem-proving we rarely generate expressions of
> > that size - if you did chances are your ATP has bellied up.
>
> Fair enough. How about computing derivatives of x^x wrt x? This OCaml
> computes the 10th derivative in 3.15s on this machine:
>
> open Num;;
>
> type expr =
>   | Rat of num
>   | Var of string
>   | Add of expr * expr
>   | Mul of expr * expr
>   | Pow of expr * expr
>   | Log of expr;;
>
> let ( +: ) f g =
>   match f, g with
>   | Rat(Int 0), p | p, Rat(Int 0) -> p
>   | Rat p, Rat q -> Rat(p +/ q)
>   | p, q -> Add(p, q);;
>
> let ( *: ) f g =
>   match f, g with
>   | Rat(Int 0), p | p, Rat(Int 0) -> Rat(Int 0)
>   | Rat(Int 1), p | p, Rat(Int 1) -> p
>   | Rat p, Rat q -> Rat(p */ q)
>   | p, q -> Mul(p, q);;
>
> let ( **: ) f g = Pow(f, g);;
>
> let ( -: ) f g = f +: Rat(Int(-1)) *: g;;
> let ( /: ) f g = f *: g **: Rat(Int(-1));;
>
> let rec d f x =
>   match f with
>   | Var y when x=y -> Rat(Int 1)
>   | Rat _ | Var _ -> Rat(Int 0)
>   | Add(f, g) -> d f x +: d g x
>   | Mul(f, g) -> f *: d g x +: g *: d f x
>   | Pow(f, g) -> f **: g *: (g *: d f x /: f +: Log f *: d g x)
>   | Log f -> d f x /: f;;
>
> let rec nest n f x = if n=0 then x else nest (n-1) f (f x);;
>
> let f =
>   let x = Var "x" in
>   x **: x;;
>
> let f' = nest 10 (fun f -> d f "x") f;;
>
> Note that simplification is arbitrarily difficult, so other programs must
> implement the exact same reduction rules (in the same order) to guarantee
> the same result and be comparable. Looking at the leaf count is a good way
> to do this:
>
> let rec leaf_count = function
>   | Rat _ | Var _ -> 1
>   | Add(f, g) | Mul(f, g) | Pow(f, g) -> leaf_count f + leaf_count g
>   | Log f -> leaf_count f;;
>
> Printf.printf "%d leaves\n" (leaf_count f');
>
> I get 27,168,043 leaves.
>
> > You may be right but I think you are making a large claim on a small
> > basis.  I think the consensus has to be 'unproved' on what I've had
> > the opportunity to see on the web and to test personally.  Drop the
> > 'vastly' and I'd say 'yes'.
>
> I'm all for quantifying the "vastly" and keeping this objective.
>
> --
> Dr Jon D Harrop, Flying Frog Consultancy
> The OCaml Journalhttp://www.ffconsultancy.com/products/ocaml_journal/?usenet

Ok; I'm off for a break now.  But some observations.

1. Read what I have to say about benchmarks to Frank Buss

http://groups.google.co.uk/group/comp.lang.lisp/browse_frm/thread/6f980492e7aa4ddd/ab6314b6f4ba5e46?hl=en#ab6314b6f4ba5e46

which represents my design philosophy w.r.t. benchmarks.  I seriously
recommend the Gabriel benchmarks as the best way of making your
point.  At least they command enough respect to have been used for
many years by many Lispers.

2. Your OCaml '10th derivative in 3.15s' is v. impressive, but would
not be a motivator to choose OCaml.  People seriously don't compute
27Mb algebraic expressions unless something has gone horribly wrong.
It would be more motivating to choose a smaller derivative (and
iterate it) as a more realistic representation of the sort of problem
a person would want to do.  We've already done a similar problem and
the answer seemed to be that OCaml was 2X the speed of Qi or
thereabouts.  Do you think that this would be different?

3. Probably you need to put the leaf count for the 1st, 2nd etc.
derivative to provide a check for people to ensure that their solution
is on track.  Many will not make it to the 10th derivative.

4. Your 'vastly faster (?)' (I would say 'much faster' because 'vast'
for me signifies something like 100X) might with more accuracy have
been applied to the elegant and intuitive Lisp solution that Andre
Thieme supplied.  Andre came up with a clear natural solution that was
about 8X slower than your OCaml.  Nathan's solution was the result of
hours of optimisation, is fast but is very difficult to read and very
unnatural.

So I think you can rightly claim that OCaml is much faster than Lisp
w.r.t. complex list handling because I don't think that Lisp
programmers would want to write Nathan's code (clever tho' it is).  So
if its typical OCaml vs. typical Lisp then OCaml seems, on this
benchmark, to be much faster.  But you need a few more such
benchmarks.

5. The preceding claim does not carry over to Qi, because the Qi
   code is as short as the OCaml, is quite natural and about as
   fast as Nathan's and Qi runs under Lisp.  However this does not
   refute your original claim.  * But you should know that Lisp
   is par excellence a metaprogramming language and so Lispers view
   whatever we can make in Lisp as part of our Lisp culture.*  Hence
   if we like something we nick it :) - or more often we invent it.

6.  I liked the overlapping patterns stuff (I knew about it before
    but your posts motivated me to do something about it), so I
    nicked it. I hope you understand that this sort of behaviour is
    common in Lisp programmers, which is why a lot of people are shy
    of us.  The bunch on this group are no better I'm afraid ;) -
    thieves the lot of them.  Junior Lisp programmers or programmers
    of other paradigms are generally more honest souls who will
    genuinely buy into a package with an idea and reimburse you for
    it.  Lispers with a lot of form like that Tilton fellow will
    just appropriate your idea in 48 hours and leave you with your
    code.  I'm no better and I'm too old to change.

Mark

From: Raffael Cavallaro
Subject: Re: a vast difference?
Date: Mon, 16 Jul 2007 13:12:30 +0000
Message-ID: <2007071609123050073-raffaelcavallaro@pasdespamsilvousplaitmaccom>

On 2007-07-16 07:34:11 -0400, Mark Tarver <··········@ukonline.co.uk> said:

> I hope you understand that this sort of behaviour is
>     common in Lisp programmers, which is why a lot of people are shy
>     of us.  The bunch on this group are no better I'm afraid ;) -
>     thieves the lot of them.  Junior Lisp programmers or programmers
>     of other paradigms are generally more honest souls who will
>     genuinely buy into a package with an idea and reimburse you for
>     it.  Lispers with a lot of form like that Tilton fellow will
>     just appropriate your idea in 48 hours and leave you with your
>     code.  I'm no better and I'm too old to change.

"If there's something to steal, I steal it."

"Bad artists copy. Great artists steal."

- Pablo Picasso

From: Jon Harrop
Subject: Re: a vast difference?
Date: Mon, 16 Jul 2007 21:24:19 +0000
Message-ID: <469be33e$0$1629$ed2619ec@ptn-nntp-reader02.plus.net>

Mark Tarver wrote:
> 1. Read what I have to say about benchmarks to Frank Buss
> 
>
http://groups.google.co.uk/group/comp.lang.lisp/browse_frm/thread/6f980492e7aa4ddd/ab6314b6f4ba5e46?hl=en#ab6314b6f4ba5e46
> 
> which represents my design philosophy w.r.t. benchmarks.  I seriously
> recommend the Gabriel benchmarks as the best way of making your
> point.  At least they command enough respect to have been used for
> many years by many Lispers.

I have spent some time studying the Gabriel benchmarks (they have almost all
been translated to languages like OCaml already) but I yearn for more
significant benchmarks.

For example, the Boyer benchmark from Gabriel sounded interesting. The
source is 500 lines long in Lisp and 900 lines long in OCaml. But it is all
data and almost all of the time is spent in a single line of code (a map).

Some of the other benchmarks (e.g. tak) are just absurd.

To be frank, I think we have already done better than any of the Gabriel
benchmarks with this symbolic simplifier. I think the ray tracer is also
better. I believe a BASIC interpreter would be another great benchmark
(albeit testing similar things).

If we stray, I would like to compare cl-ppcre to OCaml's Str module and
perhaps do some comparisons of macros as well.

> 2. Your OCaml '10th derivative in 3.15s' is v. impressive, but would
> not be a motivator to choose OCaml.  People seriously don't compute
> 27Mb algebraic expressions unless something has gone horribly wrong.

This is the bread and butter of CASs and specialized programs like FFTW,
which are of huge importance in scientific computing. I appreciate that
this is a specialized domain but it is obviously one close to my heart.

> It would be more motivating to choose a smaller derivative (and
> iterate it)

If by "iterate it" you mean "add redundancy to make it take longer", I
strongly disagree because the results would have no bearing on reality as
they can (and will!) be arbitrarily optimized away by the compiler.
Benchmarks must be as irreducible as real programs, or they is no point in
testing them. Witness the problems the Haskell people have trying to get
their shootout entries to take >0s, for example.

> as a more realistic representation of the sort of problem 
> a person would want to do.

With our work on F#, we recently started to produce products for
Microsoft's .NET framework (aimed at C# and VB programmers). To our
surprise, there were no decent FFT implementations available for .NET at
the time, so we wrote one in C# and productized it:

  http://www.ffconsultancy.com/products/signal_processing_net/?cl

If there is sufficient commercial interest in this product, we'll improve
it. The next step in its development is to write a program that unwinds the
usual FFT inner loops:

     for (i = 1; i < n; i = 2 * i) {
          for (j = 0; j < i; ++j) {
               N wr, wi;
               mcexp(sign * (int)j, 2 * i, wr, wi);
               for (k = j; k < n; k += 2 * i) {
                    N *a0 = a + 2 * k;
                    N *a1 = a0 + 2 * i;
                    N r0, i0, r1, i1, t0, t1, xr, xi;
                    cpy(a0[0], r0); cpy(a0[1], i0);
                    cpy(a1[0], r1); cpy(a1[1], i1);
                    mul(r1, wr, t0); mul(i1, wi, t1); sub(t0, t1, xr);
                    mul(r1, wi, t0); mul(i1, wr, t1); add(t0, t1, xi);
                    add(r0, xr, a0[0]);  add(i0, xi, a0[1]);
                    sub(r0, xr, a1[0]);  sub(i0, xi, a1[1]);
               }
          }
     }

by representing that code as an AST in another (OCaml) program and unrolling
it for different-sized transforms and applying simple symbolic
simplifications. This is essentially what genfft does from the FFTW
distribution.

> We've already done a similar problem and  
> the answer seemed to be that OCaml was 2X the speed of Qi or
> thereabouts.  Do you think that this would be different?

Yes, I do. Running with bytecode profiling gives you usage counts for
branches:

$ ocamlcp nums.cma simplify.ml -o simplify
$ ./simplify
Took 0.536919s
$ ocamlprof simplify.ml
...
let rec ( +: ) f g = (* 300000 *) match f, g with
    | Q (Int 0), e | e, Q (Int 0) -> (* 100000 *) e
    | Q n, Q m -> (* 100000 *) Q (n +/ m)
    | f, Add(g, h) -> (* 0 *) f +: g +: h
    | f, g -> (* 100000 *) Add(f, g);;

let rec ( *: ) f g = (* 200000 *) match f, g with
    | Q (Int 0), e | e, Q (Int 0) -> (* 100000 *) Q (Int 0)
    | Q (Int 1), e | e, Q (Int 1) -> (* 0 *) e
    | Q n, Q m -> (* 0 *) Q (n */ m)
    | f, Mul(g, h) -> (* 0 *) f *: g *: h
    | f, g -> (* 100000 *) Mul(f, g);;

let rec simplify = function
    | Q _ | Var _ as f -> (* 600000 *) f
    | Add(f, g) -> (* 300000 *) simplify f +: simplify g
    | Mul(f, g) -> (* 200000 *) simplify f *: simplify g;;
...

You can see that 4 of the 12 branches are completely unused. So we've
written the code but this particular test isn't using it.

> 3. Probably you need to put the leaf count for the 1st, 2nd etc.
> derivative to provide a check for people to ensure that their solution
> is on track.  Many will not make it to the 10th derivative.

Good idea. Mathematica takes a significant amount of time to compute the
first 15 derivatives and produces much smaller results (it is doing more
simplification):

In[2]:= Timing[LeafCount ·@ Table[D[x^x, {x, i}], {i, 15}]]

Out[2]= {0.030995 Second, {8, 16, 38, 90, 209, 476, 1048, 2277, 4866, 10313,
21668, 45272, 94066, 194646, 401175}}

> 4. Your 'vastly faster (?)' (I would say 'much faster' because 'vast'
> for me signifies something like 100X) might with more accuracy have
> been applied to the elegant and intuitive Lisp solution that Andre
> Thieme supplied.  Andre came up with a clear natural solution that was
> about 8X slower than your OCaml.  Nathan's solution was the result of
> hours of optimisation, is fast but is very difficult to read and very
> unnatural.
> 
> So I think you can rightly claim that OCaml is much faster than Lisp
> w.r.t. complex list handling because I don't think that Lisp
> programmers would want to write Nathan's code (clever tho' it is).  So
> if its typical OCaml vs. typical Lisp then OCaml seems, on this
> benchmark, to be much faster.  But you need a few more such
> benchmarks.

Right. I don't actually think that is the correct conclusion though. I
believe the reason OCaml beats Lisp on this benchmark is that all of the
Lisp solutions are using purely functional data structures, i.e. no
mutation.

The Boyer benchmark from Gabriel uses mutation in the Lisp and is 4x faster
than the OCaml implementation from the OCaml distro's benchmark suite.

I spent some time yesterday optimizing this boyer.ml:

  http://stuff.mit.edu/afs/sipb/project/ocaml/src/current/test/

and managed to make it ~2x slower than the Lisp without sacrificing purity.
I suspect it would be as fast as the Lisp if it were impure, e.g. using a
mutable array to represent the tail of a term instead of an immutable list
and rewriting in-place.

However, this benchmark is seriously flawed in that it is completely
reducible. In constrast, our symbolic simplifier spreads time across more
code and is essentially irreducible when applied to a large external
expression.

So I believe the current implementations of the simplifier provide more
polarized evidence along the same lines of the ray tracer: that purely
functional code is slow in Lisp. This agrees with my impression that
mutation is much more common in real Lisp code than it is in OCaml and its
relatives.

Perhaps a really interesting benchmark would be to allow concurrency and see
how Lisp, OCaml, Haskell and F# perform on a parallel simplifier. This
benchmark would highlight many of the different trade-offs chosen by these
languages and it would be very interesting to see how the different
implementations did.

> 5. The preceding claim does not carry over to Qi, because the Qi
>    code is as short as the OCaml, is quite natural and about as
>    fast as Nathan's and Qi runs under Lisp.  However this does not
>    refute your original claim.  * But you should know that Lisp
>    is par excellence a metaprogramming language and so Lispers view
>    whatever we can make in Lisp as part of our Lisp culture.*  Hence
>    if we like something we nick it :) - or more often we invent it.

Ah, but my original claim was that you cannot nick this performance: it is a
core part of OCaml's design and it relies upon the foundation provided by
the language itself. Your results do not disprove that. Also, F# inherits
the same trade-off from the .NET platform.

I recently wrote GUI Sudoku solvers for articles in the F# and OCaml
Journals. They both use the same Prolog-style algorithm:

let invalid (i, j) (i', j') = i=i' || j=j' || i/n=i'/n && j/n=j'/n

let select p n p' ns = if invalid p p' then List.filter ((<>) n) ns else ns

let cmp (_, a) (_, b) = compare (List.length a) (List.length b)

let add p n sols =
  List.sort cmp (List.map (fun (p', ns) -> p', select p n p' ns) sols)

module Map = Map.Make(struct
                        type t = int * int
                        let compare = compare
                      end)

let rec search f sol = function
  | [] -> f sol
  | (p, ns)::sols ->
      List.iter (fun n -> search f (Map.add p n sol) (add p n sols)) ns

The exact same code runs 5x faster in OCaml than F#. The reason is, again,
that F# is just not optimized for the purely functional style.

> 6.  I liked the overlapping patterns stuff (I knew about it before
>     but your posts motivated me to do something about it), so I
>     nicked it.

Yes: or-patterns. You should also nick guarded patterns and maybe even
active patterns from F#.

>     I hope you understand that this sort of behaviour is 
>     common in Lisp programmers, which is why a lot of people are shy
>     of us.  The bunch on this group are no better I'm afraid ;) -
>     thieves the lot of them.  Junior Lisp programmers or programmers
>     of other paradigms are generally more honest souls who will
>     genuinely buy into a package with an idea and reimburse you for
>     it.  Lispers with a lot of form like that Tilton fellow will
>     just appropriate your idea in 48 hours and leave you with your
>     code.  I'm no better and I'm too old to change.

Yes. This is really awful. I always give a language feature back when I've
finished with it, so its really just "borrowing".

I don't think we can lash up something as good in this case though. After
all, the Haskell and OCaml compilers have probably had man-centuries put
into them and have far more users than any Lisp extensions like Qi.

I would also like to address some of OCaml's deficiencies using its macro
system. OCaml's macros were recently revamped. I started by stealing
unwind-protect from Lisp:

 EXTEND Gram
   expr: LEVEL "top"
   [[ "try"; f=sequence; "finally"; g=expr ->
        <:expr<
          ((function
            | `Val v, g -> g(); v
            | `Exn e, g -> g(); raise e)
             ((try `Val($f$) with e -> `Exn e), (fun () -> $g$)))
        >>]];

but I've since done some interesting things like a memory profiler that
works by rewriting whole programs.

OCaml now has inline lexers as well as parsers, so I would be interested to
compare this with other languages like Lisp as well. Something like a BASIC
or Mathematica interpreter would pull the lexing, parsing, symbolics and
even pretty printing together nicely.

-- 
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?usenet

From: Isaac Gouy
Subject: Re: a vast difference?
Date: Tue, 17 Jul 2007 23:42:24 +0000
Message-ID: <1184715744.890578.235110@i38g2000prf.googlegroups.com>

On Jul 16, 2:24 pm, Jon Harrop <····@ffconsultancy.com> wrote:
-snip-
> Benchmarks must be as irreducible as real programs, or they is no point in
> testing them. Witness the problems the Haskell people have trying to get
> their shootout entries to take >0s, for example.
-snip-

Heavens!

Which Haskell people have been trying to make their shootout entries
take >0s?

From: Waldek Hebisch
Subject: Re: a vast difference?
Date: Wed, 18 Jul 2007 02:13:47 +0000
Message-ID: <f7jt0r$ine$1@z-news.pwr.wroc.pl>

Jon Harrop <···@ffconsultancy.com> wrote:
> 
> Fair enough. How about computing derivatives of x^x wrt x? This OCaml
> computes the 10th derivative in 3.15s on this machine:
> 
<snip>
> 
> Note that simplification is arbitrarily difficult, so other programs must
> implement the exact same reduction rules (in the same order) to guarantee
> the same result and be comparable. Looking at the leaf count is a good way
> to do this:
> 
<snip>
> I get 27,168,043 leaves.
> 

FYI Axiom computes that derivative in about 0.1s and the result is
much smaller.  I do not understand the frase "simplification is
arbitrarily difficult":  it is well known that there are various
formulas which allow you to compute derivatives much faster than
naive use of chain rule and Leibnitz formula followed by
simplification.  So the way you propose seem to be very
unnatural (especially if you want performance).

-- 
                              Waldek Hebisch
·······@math.uni.wroc.pl