From: comp.lang.scheme
Subject: More on as-fast-as-C benchmarks
Date: 
Message-ID: <8e95a1b4-7406-488d-b46d-8d4133409ccc@l39g2000yqn.googlegroups.com>
A few days ago, I posted a comment on those as-fast-as-C Lisp
benchmarks. My results did not match the published data.  In my tests,
SBCL did not prove to be as fast as optimized C; it was almost 2
times slower.  However, I examined the C code and noticed that it
declares an array as global, while the Lisp version creates a local
array and passes it as an argument to a function. I thought that this
could slow down Lisp compared to C. To test my theory, I modified the
C program, adding a function to create the array.

After my modification, the C code became even faster than before. I
simply could not explain it. The modified C code became 3 times faster
than SBCL!  Then I noticed the following loop in the C code:

for (i=0;i<N;i++)
	res[2][i]=(res[0][i]+res[1][i])*res[0][i];

Array res has only 3 rows. Possibly the optimizer was doing things
like storing a pointer to a row in a register. Since there are only
three rows, it would need only three registers to do the trick. I do
not know much about compilers, but I decided to test the hypothesis. I
modified the benchmark, keeping the same number of array references
and the same number of calculations. The number of rows was
increased, so the C optimizer could not perform any tricks.  After the
modification, SBCL and C came practically to a draw! I am using SBCL
1.26 on Windows (I recompiled it from sources, thanks to Pascal
Costanza's help).

============  Here are a few tests for C:

D:\Programs\lisp\bench>cbench
1721

D:\Programs\lisp\bench>cbench.exe
1593

D:\Programs\lisp\bench>cbench.exe
1609

============ The  SBCL test is shown below:

D:\Programs\lisp\bench>sbcl --noinform
* (load "bench.lisp")
STYLE-WARNING: defining (*N*) as a constant, even though the name
follows the usual naming convention (names like *FOO*) for special
variables

T
* (test)

Evaluation took:
  1.657 seconds of real time
  1.656250 seconds of total run time (1.625000 user, 0.031250 system)
  99.94% CPU
  3,948,255,126 processor cycles
  0 bytes consed

NIL
*

---------- C source ------------------------

Here is my version of the benchmark:


#include <stdio.h>
#include <stdlib.h>   /* for malloc */
#include <time.h>


#define N (128*1024)

/* Allocate a row x col matrix as an array of row pointers and zero it. */
double **makearray(int row, int col) {
  double **v;
  int i, j;
  v = (double **) malloc(row * sizeof(double *));
  for (i = 0; i < row; i++) {
    v[i] = (double *) malloc(col * sizeof(double));
  }

  for (i = 0; i < row; i++) {
    for (j = 0; j < col; j++)
      v[i][j] = 0;
  }
  return v;
}

/* Build a 30 x N matrix; rows 3j and 3j+1 hold the inputs,
   rows 3j+2 receive the results computed in rechne(). */
double **bilde_array()
{
    int i, j;
    double **res;
    res = makearray(30, N);
    for (j = 0; j < 10; j++) {
      for (i = 0; i < N; i++) {
        res[j*3][i] = (double) i;
        res[j*3+1][i] = N - i;
      }
    }
    return res;
}

/* res[3j+2] = (res[3j] + res[3j+1]) * res[3j], element-wise. */
double **rechne(double **res)
{
    int i, j;
    for (j = 0; j < 10; j++) {
      for (i = 0; i < N; i++)
        res[j*3+2][i] = (res[j*3][i] + res[j*3+1][i]) * res[j*3][i];
    }
    return res;
}

int main(void)
{
    int i;

    double **res;
    res = bilde_array();
    for (i = 0; i < 200; i++)
       rechne(res);
    /* clock() reports elapsed clock ticks (CLOCKS_PER_SEC per second). */
    printf("%ld\n", (long) clock());
    return 0;
}


------------------- Lisp Source ------------------

(proclaim '(optimize (safety 0)(speed 3)(space 0)
                     (compilation-speed 0)(debug 0)) )

(defconstant *n* #.(* 128 1024))



(defun bilde-array ()
  (let ((res (make-array '(30 #.*n*)
                         :element-type 'double-float)))
    (declare (type (simple-array double-float (30 #.*n*)) res))
    (do ((j 0 (1+ j))) ((>= j 10))
      (dotimes (i #.*n*)
        (declare (type fixnum i))
        (setf (aref res (* 3 j) i) (coerce i 'double-float))
        (setf (aref res (1+ (* 3 j)) i)
              (coerce (- #.*n* i) 'double-float))))
    res))

(defun rechne (res)
  (declare (type (simple-array double-float (30 #.*n*)) res))
  (do ((j 0 (1+ j))) ((>= j 10))
    (dotimes (i #.*n*)
      (declare (type fixnum i))
      (setf (aref res (+ 2 (* j 3)) i)
            (* (+ (aref res (* j 3) i)
                  (aref res (1+ (* j 3)) i))
               (aref res (* j 3) i)))))
  res)

(defun test ()
  (let ((foo (bilde-array)))
    (time (dotimes (i 200) (rechne foo   ))))
  nil)

From: Nicolas Neuss
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <87k5739ez4.fsf@ma-patru.mathematik.uni-karlsruhe.de>
> [...]

Perhaps also the following article is of interest to you:

N. Neuss: On using Common Lisp for Scientific Computing
Proceedings of the CISC Conference 2002, LNCSE, Springer-Verlag (2003).

You can download it (and also the code) from:

[http://www.iwr.uni-heidelberg.de/groups/techsim/people/neuss/publications.html]

(Sorry, these are still at my old Heidelberg site; I have not yet
transferred them to my new site in Karlsruhe.)

However, since that time I have had some discussion with Rob Boyer, who
varied some of the parameters (and also the inner operations) of my
mflop.c program.  He and a colleague (Warren Hunt) managed to construct a
small C routine which ran extremely slowly on some architectures when
compiled with gcc, possibly due to cache aliasing problems.

So one should regard such microtests with some scepticism.  I think that
one can vary the parameters and the setup in such a way that code compiled
with some CL implementations is much faster than the corresponding code
compiled with gcc (and vice versa), at least on some architectures.

Yours, Nicolas
From: comp.lang.scheme
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <bade471f-8563-4973-8650-29e19be4645c@o11g2000yql.googlegroups.com>
On 6 mar, 06:29, Nicolas Neuss <········@math.uni-karlsruhe.de> wrote:
> > [...]
>
> Perhaps also the following article is of interest to you:
>
> N. Neuss: On using Common Lisp for Scientific Computing
> Proceedings of the CISC Conference 2002, LNCSE, Springer-Verlag (2003).
>
> You can download it (and also the code) from:
>
> [http://www.iwr.uni-heidelberg.de/groups/techsim/people/neuss/publicat...]
>
> (Sorry, these are still at my old Heidelberg site, I have not yet
> transfered those to my new site in Karlsruhe.)
>
> However, since that time I had some discussion with Rob Boyer who varied
> some of the parameters (and also the inner operations) of my
> mflop.c-program.  He and a colleague (Warren Hunt) managed to construct a
> small C routine which run extremely slow on some architectures when
> compiled with gcc maybe due to some cache aliasing problems.
>
> So one should regard such microtests with some scepticism.  I think that
> one can vary the parameters and the setup in such a way that code compiled
> with some CL implementations is much faster than the corresponding code
> compiled with gcc (and vice versa), at least on some architectures.
>
> Yours, Nicolas


Hi, Nicolas.
I am sure that your article's cache explanation for C being fast on
small problems, but not so fast on large ones, is the correct one; my
theory about registers is clearly wrong, unless one considers cache
memory as a kind of register. A few comments on two other aspects of
your article:

(1) I am a CS student. I should be in my sophomore year, but since I
failed in all disciplines but functional programming, I was left
behind. It seems that colleges do not adhere to the "no child left
behind" philosophy. In any case, I always wanted to use FEMLISP,
instead of the packages my NA teacher proposes. However, I noticed
that FEMLISP depends on Matlisp, which depends on BLAS and LAPACK. I
am using SBCL on Windows. I discovered that it is very easy to add a
foreign language interface to SBCL. In fact, I added a graphics
interface in minutes, by compiling its C code into a shared library
and doing a (load-shared-object "gui.dll") -- a small sketch of the
pattern appears below. I suppose that it should be easy to port BLAS
and LAPACK to SBCL, and FEMLISP would follow suit. However, many
people have great difficulty dealing with such large open source
packages. Why don't you distribute FEMLISP as an out-of-the-box
package? At least for Windows users, who are not so good at
configuring and compiling stuff.
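
To illustrate the FFI pattern I mean (the library name and the C
function here are hypothetical, just a minimal sketch):

;; Load a shared library and expose one of its C functions to Lisp.
(sb-alien:load-shared-object "gui.dll")

;; void draw_line(int x1, int y1, int x2, int y2);  -- hypothetical
(sb-alien:define-alien-routine "draw_line" sb-alien:void
  (x1 sb-alien:int) (y1 sb-alien:int)
  (x2 sb-alien:int) (y2 sb-alien:int))

;; The C name is lispified, so it is callable as DRAW-LINE:
;; (draw-line 0 0 100 100)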

(2) One thing in your article that drew my attention is your comment
on the verbose declaration method of Common Lisp. I also think it is
very annoying. However, other languages solved this very problem with
type inference, and selective methods for writing numbers. If one
wants type inference to generate specific code for number crunching, s/
he could declare:

(setf *genericity* Nil)

This is the way people do it in Bigloo. After that, the compiler
should infer the types from arithmetic operations. Another option
could be adding the genericity stuff to the proclaim clause. BTW,
Stalin is very smart: it can do the job without any declarations,
needing only a few compiler options.

I believe that the SBCL team could do something about declarations
without going against the standard. In any case, they have already
improved many aspects of Common Lisp, and avoided a head-on collision
with the standard while doing it. For instance, a few tests seem to
show that one does not need to declare things like (the fixnum (+ ...)).
One does not need to compile sources either: (load ...) is enough. The
conclusion is that it may not be that difficult to create a less
cumbersome way of dealing with declarations.

Another option is to add declarations automatically. Users could do
this themselves using a method similar to Scheme soft typing. One
could even steal code snippets from Andrew Wright's system... Since
SBCL is open source, all one needs is to wrap the load function in a
soft typing system. What do you think?
From: Nicolas Neuss
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <87hc263bdf.fsf@ma-patru.mathematik.uni-karlsruhe.de>
"comp.lang.scheme" <········@yahoo.ca> writes:

> (1) I am a CS student. I should be in my sophomore year, but since I
> failed in all disciplines, but functional programming, I was left
> behind. It seems that colleges do not adhere to the "no child left
> behind" philosophy. In any case, I always wanted to use FEMLISP,
> instead of the packages my NA teacher proposes. 

Don't do that.  Use what your teacher proposes which will probably be
Matlab/Octave/Scilab (which I use as well when teaching numerical analysis
to CS students).  CS students usually do not do numerics of PDEs, at least
not in the depth where Femlisp could be of advantage.

> However, I noticed that FEMLISP depends on Matlisp, that depends on BLAS
> and LAPACK.

No, Femlisp does not depend on Matlisp any more, even if I mention Matlisp
as interesting numerical software on the Femlisp homepage.

> I am using SBCL on windows. I discovered that it is very easy to add a
> foreign language interface to SBCL. In fact, I added a graphics interface
> in minutes, by shared-compiling its C-code, and doing a
> (load-shared-object "gui.dll"). I suppose that it should be easy to port
> BLAS and LAPACK to SBCL, and FEMLISP would follow suit.

I did that in the CVS version for some routines (especially eigenvalue
solvers which I needed).  You must have LAPACK installed on your system, of
course.

> However, many people have a strong difficulty in dealing with these large
> open source code. Why don't you distribute FEMLISP in an out of the box
> package? At least for Windows users, who are not so good with configuring
> and compiling stuff.

I guess it should compile with SBCL on Windows, but I did not try.

> (2) One thing in your article that drew my attention is your comment
> on the verbose declaration method of Common Lisp. I also think it is
> very annoying. However, other languages solved this very problem with
> type inference, and selective methods for writing numbers. If one
> wants type inference to generate specific code for number crunching, s/
> he could declare:
>
> (setf *genericity* Nil)
>
> This is the way people do it in Bigloo. 

I would be surprised if Bigloo does more type inference than SBCL.  At
least, the Bigloo code which I have seen up to now looked quite terrible
(IIRC, a lot of function names depended on the argument types).

> [...]
>
> Another option is to add declarations automatically. User could do
> this him/herself using a method similar to Scheme soft typing. One
> could even steal code snippets from Andrew Wright's system... Since
> SBCL is open source, all one needs  is to wrap the load function in a
> soft typing system. What do you think?

We agree that the type inference of CL implementations could be very much
improved (for extreme cases see Qi and ACL2, which achieve type-safe
resp. safe programs for "subsets" of CL), and I would like to have it, too.
However, due to the limited resources of the CL community I do not wait for
this to happen.  Probably, this is also not the most important battlefield,
e.g. portable threads, portable database access, and especially good
integration of other software are much more important for most people.

Nicolas
From: gugamilare
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <c4a0843f-95c3-4adf-8b90-13fafd0be3a4@40g2000yqe.googlegroups.com>
On 6 mar, 12:43, Nicolas Neuss <········@math.uni-karlsruhe.de> wrote:
> However, due to the limited resources of the CL community I do not wait for
> this to happen.  Probably, this is also not the most important battlefield,
> e.g. portable threads, portable database access, and especially good
> integration of other software are much more important for most people.
>
> Nicolas

What is wrong with bordeaux-threads? There are also enough database
libraries you can use.
I don't think it is a good idea to put even more load on the CL
implementations; implementing CLOS, MOP, Gray Streams, etc., is
already enough. You can leave these easily implementable things to the
libraries themselves.
From: Nicolas Neuss
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <87d4cnrttx.fsf@ma-patru.mathematik.uni-karlsruhe.de>
gugamilare <··········@gmail.com> writes:

> What is wrong with bordeaux-threads? There are also enough database
> libraries you can use.  I don't think it is a good idea to put even more
> load over the CL implementations, implementing CLOS, MOP, Gray Streams,
> etc., is already enough. You can leave these easily implementable things
> to the libraries themselves.

Yes, but then the central issue is that those libraries can be
integrated easily into the major CL implementations on several OSes.
The CL community has done a lot of work in recent years wrt library
integration, but my experience is that several things do not work
smoothly out of the box or are at least difficult to set up -
especially if you are not using SBCL on Debian Linux, or if you want
more recent versions of libraries than are available there, etc.

Nicolas
From: gugamilare
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <9e7ece66-75b7-400f-a06e-10f0a519bd60@g38g2000yqd.googlegroups.com>
On 6 mar, 11:05, "comp.lang.scheme" <········@yahoo.ca> wrote:
> I believe that SMCL team could do something about declarations without
> going agains the standard. In any case, they already improved many
> aspect of Common Lisp, avoiding a front crash with the compiler while
> doing it. For instance, a few tests seem to show that one does not
> need to declare things like (the fixnum (+ ...)). One does not need to
> compile sources either: (load ...) is enough. The conclusion is that
> it may not be that difficult to create a less cumbersome method of
> dealing with declaration.

SBCL already does a lot of type inference (or I didn't understand what
you mean) and code substitution (e.g. (expt 2 n) to (ash 1 n)).
Even without type declarations and (declare (optimize ...)), SBCL is
already very fast, since it keeps type checks to the minimum possible.
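
One way to see what the compiler actually does with such forms is to
disassemble a small test function (a minimal sketch; POW2 is a made-up
name and the output depends on the SBCL version):

;; If the substitution happens, the generated code contains a shift
;; rather than a call to a generic EXPT routine.
(defun pow2 (n)
  (declare (type (integer 0 61) n)
           (optimize (speed 3)))
  (expt 2 n))

(disassemble 'pow2)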

On 6 mar, 12:43, Nicolas Neuss <········@math.uni-karlsruhe.de> wrote:
> We agree that the type inference of CL implementations could be very much
> improved (for extreme cases see Qi and ACL2, which achieve type-safe
> resp. safe programs for "subsets" of CL), and I would like to have it, too.
> However, due to the limited resources of the CL community I do not wait for
> this to happen.  Probably, this is also not the most important battlefield,
> e.g. portable threads, portable database access, and especially good
> integration of other software are much more important for most people.

Depends on what you mean by "CL implementations". SBCL and CMUCL
already do this, I don't know very much about CCL, and in ECL type
inference is one of the author's objectives.

If you had asked me, I would say that Scheme has holes in speed, not
CL, since there are no type declarations, which means that if you
want to optimize a function that you know receives only fixnums, you
can't. Or am I wrong?

I am not saying that Scheme doesn't have its own beauty and
simplicity, which CL lacks, and please don't start a flame war
because of this (there are already plenty of them around).

Gustavo.
From: Nicolas Neuss
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <87hc1zru8z.fsf@ma-patru.mathematik.uni-karlsruhe.de>
gugamilare <··········@gmail.com> writes:

> On 6 mar, 12:43, Nicolas Neuss <········@math.uni-karlsruhe.de> wrote:
>> We agree that the type inference of CL implementations could be very
>> much improved (for extreme cases see Qi and ACL2, which achieve
>> type-safe resp. safe programs for "subsets" of CL) [...].
>
> Depends on what you mean by "CL implementations". SBCL and CMUCL already
> do this, don't know very much about CCL, and in ECL, type inference is
> one of the author's objectives.

I know CMUCL/SBCL, but my point stands nevertheless.  Much more than is
actually implemented there is conceivable and useful, e.g.

- Lists (or hash-tables) containing objects only of a certain type

- thorough use of CLOS slot type declarations, e.g. something like
   (defclass test ()                                                                        
      ((x :accessor xx :type  integer :initarg :x)))
   (setf (xx (make-instance 'test)) "Hi")
  should throw an error.

- Warnings (if desired), whenever CLOS methods or functions are not yet
  defined for some combination of parameter types determined at
  compile-time (needing a deeper integration of CLOS and compiler).

- Sealing of classes and generic functions to enhance performance.

- etc.  (And having extreme cases like Qi's type inference and ACL2's
  correctness checks directly available for CL - especially also
  incorporating CLOS - would make many other programming languages look
  rather primitive in comparison.)

Nicolas
From: gugamilare
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <42ebbc45-47f7-4b24-b313-ee18e487811f@h20g2000yqn.googlegroups.com>
On 12 mar, 06:00, Nicolas Neuss <········@math.uni-karlsruhe.de>
wrote:
> gugamilare <··········@gmail.com> writes:
> > On 6 mar, 12:43, Nicolas Neuss <········@math.uni-karlsruhe.de> wrote:
> >> We agree that the type inference of CL implementations could be very
> >> much improved (for extreme cases see Qi and ACL2, which achieve
> >> type-safe resp. safe programs for "subsets" of CL) [...].
>
> > Depends on what you mean by "CL implementations". SBCL and CMUCL already
> > do this, don't know very much about CCL, and in ECL, type inference is
> > one of the author's objectives.
>
> I know CMUCL/SBCL, but my point stands nevertheless.  Much more than is
> actually implemented there is conceivable and useful, e.g.
>
> - Lists (or hash-tables) containing objects only of a certain type

Does this even exist in the Lisp world? I mean, it would be really hard
to check the types if the list is to behave like normal lists, since you
can append a list to the tail of another list. It would be very hard for
both the user and the implementor to control, not to mention it
could easily fall outside the ANSI standard.
>
> - thorough use of CLOS slot type declarations, e.g. something like
>    (defclass test ()                                                                        
>       ((x :accessor xx :type  integer :initarg :x)))
>    (setf (xx (make-instance 'test)) "Hi")
>   should throw an error.

Well, this would be a good thing, I agree with you.
>
> - Warnings (if desired), whenever CLOS methods or functions are not yet
>   defined for some combination of parameter types determined at
>   compile-time (needing a deeper integration of CLOS and compiler).

OK, but only if you ask for it. I don't know exactly how this would
work; it could complain about some method not being defined for the
type 'integer, and every built-in class would generate a warning,
possibly for a method which is never meant to be defined on these
classes. Besides, what would happen if you want a call on some
subclass of a specific class to go directly to the parent class's
method? It is really boring to write:

(defmethod foo ((self bar))
  (call-next-method))

just to avoid the generation of a warning.
>
> - Sealing of classes and generic functions to enhance performance.

If I am not wrong, this existed before in SBCL for optimizing the GC,
but it was dropped because it is not necessary with generational
garbage collectors.
>
> - etc (And having extreme cases like Qi's type-inference and ACL2's
>   correctness checks immediately for CL -especially incorporating also
>   CLOS-, would make look many other programming languages rather primitive
>   in comparison.)
>
I don't really know these languages (a first look into Qi made me not
like it very much).

On 12 mar, 06:09, Nicolas Neuss <········@math.uni-karlsruhe.de>
wrote:
> Yes, but then the central issue is that those libraries can be integrated
> easily in major CL implementations on several OS.  The CL community has
> done a lot of work in recent years wrt library integration, but my
> experience is that several things do not work smoothly out of the box or
> are at least difficult to set up - especially if you are not using SBCL on
> Debian Linux, or if you want more recent version of libraries than are
> available there, etc.
>
Well, if this weren't done in separate libraries, each implementation
would have to deal with each of these problems separately. That would
require MUCH more work. Things not going smoothly is a much bigger
problem in, say, C, than in CL (OK, the comparison is not fair), and
C doesn't put this in the compiler, because having the option of using
database X instead of database Y and vice versa creates competition,
which pushes both databases to improve.

I tend to believe that the burden of implementing the standard + Gray
streams + MOP + FFI + sockets (and maybe something else) is already
big enough; the rest should really be up to the libraries to
implement.

Gustavo.
From: Nicolas Neuss
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <87hc1xrefl.fsf@ma-patru.mathematik.uni-karlsruhe.de>
gugamilare <··········@gmail.com> writes:

>> - Lists (or hash-tables) containing objects only of a certain type
>
> Does this even exists in Lisp world?

I think Qi has this feature.

>> - Warnings (if desired), whenever CLOS methods or functions are not yet
>> defined for some combination of parameter types determined at
>> compile-time (needing a deeper integration of CLOS and compiler).
>
> Ok, but only if you ask for it.

That's why I wrote "if desired".

> I don't know how exactly this would work, it could complain about some
> method not being defined for the type 'integer and every built-in class
> will generate a warning, for possibly a method which is not to be defined
> on these classes.

Why (example)?  I think it should/would work using

1. the information that CLOS method definitions carry anyway
2. additional declarations of the result type of methods (usually not done)
3. explicit type declarations where necessary

(i.e. as an extended CMUCL/SBCL type inference scheme).


> Besides, what would happen if you want to make a
> method for some subclass of specific class to go directly to the parent
> class? It is really boring to write:
>
> (defmethod foo ((self bar))
>   (call-next-method))
>
> just to avoid the generation of a warning.

Of course, if the method for the superclass is there, no warning should be
issued.

>> - Sealing of classes and generic functions to enhance performance.
>
> If I am not wrong, this existed before in SBCL for optimizing the GC,
> but it was dropped for not being necessary to generational garbage
> collectors.

As far as I know this was only introduced for CMUCL by Gerd Moellmann, but
I might be wrong.

Nicolas
From: Juanjo
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <5ecf43a6-1b81-42d0-91e9-02c119bc760f@h5g2000yqh.googlegroups.com>
On Mar 13, 9:54 am, Nicolas Neuss <········@math.uni-karlsruhe.de>
wrote:
> gugamilare <··········@gmail.com> writes:
> >> - Sealing of classes and generic functions to enhance performance.
>
> > If I am not wrong, this existed before in SBCL for optimizing the GC,
> > but it was dropped for not being necessary to generational garbage
> > collectors.
>
> As much as I know this was only introduced for CMUCL by Gerd Moellmann, but
> I might be wrong.

Indeed sealing of classes does not have to do with GC, but with fast
and predictable access to class slots.

This can be done at many levels, one of them being that you cannot
extend a sealed class or, as we do in ECL, that you declare a class to
be sealed and that you cannot change the location of its slots. This
still allows for extending classes and only imposes on the developer
the burden of not doing incompatible changes and forbids multiple
inheritance if it involves more than one sealed class.

The advantage is that you do (DECLARE (TYPE MY-SEALED-CLASS FOO)) and
(SETF (MY-SLOT-ACCESSOR FOO) 2.0) can be inlined and be as fast as in
C++.
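
For reference, here is the shape of the code being described, in plain
portable CL (the ECL-specific sealing declaration itself is not shown,
and the class and accessor names are made up):

(defclass point ()
  ((x :accessor point-x :type double-float :initform 0d0)))

(defun bump (p)
  ;; With POINT sealed (an ECL extension), the accessor call below can
  ;; be compiled down to a direct slot write at a fixed offset.
  (declare (type point p))
  (setf (point-x p) 2.0d0)
  p)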

Juanjo
From: gugamilare
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <6321dde6-0951-4ef3-b780-b96ca621e619@33g2000yqm.googlegroups.com>
On 13 mar, 05:54, Nicolas Neuss <········@math.uni-karlsruhe.de>
wrote:
> gugamilare <··········@gmail.com> writes:
>
> > I don't know how exactly this would work, it could complain about some
> > method not being defined for the type 'integer and every built-in class
> > will generate a warning, for possibly a method which is not to be defined
> > on these classes.
>
> Why (example)?

Well, what I had in mind is defining a class (e.g. foo) and creating a
method (bar) that is only meant to work on subclasses of foo, and
maybe the class foo itself. Then the compiler would complain that the
method bar is not defined for the class x, for each class x it can
find that is not a subclass of foo. Unless it looks only at classes in
the same package? In that case, maybe it would be useful, but I have
never run into that kind of problem.

> I think it should/would work using
>
> 1. the information that CLOS method declarations anyhow have
> 2. additional declaration of the result type for methods (usually not done)
> 3. if necessary type declarations
>
> (i.e. as an extended CMUCL/SBCL type inference scheme).
>
> > Besides, what would happen if you want to make a
> > method for some subclass of specific class to go directly to the parent
> > class? It is really boring to write:
>
> > (defmethod foo ((self bar))
> >   (call-next-method))
>
> > just to avoid the generation of a warning.
>
> Of course, if the method for the superclass is there, no warning should be
> issued.

In the example I gave, either the method bar is not to be defined on
class foo, and a warning "method bar is not defined on class foo" will
be signaled (and I tend to always write the code to suppress
warnings), or the method bar will be defined on class foo and it won't
generate any warnings at all for all subclasses of foo.

I might be missing some use case you consider useful.
Maybe cl-store, for instance? Well, cl-store is one specific case where
there is a method that is intended to be defined for every class, but
that is too specific. And I believe this is such a specific use case
that it's better to use the MOP's features to implement it (generic-
function-methods, method-specializers, class-subclasses, ...). You can
map over the subclasses of T or standard-class and see which don't
have a specialized method for your case; a rough sketch follows.
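
A rough sketch of that idea, assuming SBCL's SB-MOP package (the
function CLASSES-WITHOUT-METHOD is made up for illustration):

;; Return the direct subclasses of BASE that have no method of the
;; generic function GF specialized on them.
(defun classes-without-method (gf base)
  (let ((specialized
          (loop for method in (sb-mop:generic-function-methods gf)
                append (sb-mop:method-specializers method))))
    (loop for class in (sb-mop:class-direct-subclasses (find-class base))
          unless (member class specialized)
            collect class)))

;; Example: which direct subclasses of FOO lack a BAR method?
;; (classes-without-method #'bar 'foo)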
From: Pillsy
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <73e14393-d220-46a9-97e9-ed932d38aa73@d19g2000yqb.googlegroups.com>
On Mar 11, 3:28 pm, gugamilare <··········@gmail.com> wrote:
[...]
> If you have asked me, I would say that Scheme have holes in speed, not
> CL, since there are no type declarations, which means that, if you
> want to optimize a function that you know to receive only fixnums, you
> can't. Or am I wrong?

I'd say you mostly are. The Scheme standard doesn't specify type
declarations the way the CL standard does, but I know that some
implementations[1] provide optional declarations like CL does. The
Scheme community pretty clearly likes leaving a whole lot more up to
the implementors than the CL community does, and conforming CL
implementations can always choose to ignore the type declarations.

Cheers,
Pillsy

[1] Bigloo Scheme, for one.
From: William D Clinger
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <d6609972-d729-46d6-b413-688eaf03bb90@o36g2000yqh.googlegroups.com>
Gustavo wrote:
> If you have asked me, I would say that Scheme have holes in speed, not
> CL, since there are no type declarations, which means that, if you
> want to optimize a function that you know to receive only fixnums, you
> can't. Or am I wrong?

The R6RS specifies standard libraries for fixnum-specific
and flonum-specific arithmetic:

http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-12.html#node_chap_11

Several Scheme compilers infer representation types even
in the absence of representation-specific operations or
declarations, but that kind of optimization will usually
be more effective when the program uses fixnum-specific
and/or flonum-specific operations.

Will
From: Nicolas Neuss
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <87d4clre79.fsf@ma-patru.mathematik.uni-karlsruhe.de>
William D Clinger <········@yahoo.com> writes:

> The R6RS specifies standard libraries for fixnum-specific
> and flonum-specific arithmetic:
>
> http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-12.html#node_chap_11

I cannot say that I like this scheme of naming operations depending on
their argument types.  It looks like the poor man's type inference and
makes code quite ugly.

> Several Scheme compilers infer representation types even
> in the absence of representation-specific operations or
> declarations, but that kind of optimization will usually
> be more effective when the program uses fixnum-specific
> and/or flonum-specific operations.

The same would be valid for CMUCL/SBCL.  But I would very much prefer the
use of CLOS knowledge in the compiler to such naming schemes.

Nicolas
From: Waldek Hebisch
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <gorbd2$8i4$1@z-news.wcss.wroc.pl>
comp.lang.scheme <········@yahoo.ca> wrote:
> A few days ago, I posted a comment on those as-fast-as-C Lisp
> benchmarks. My results did not match the published data.  In my tests,
> SBCL did not prove itself to be as fast as optimized C, but almost 2
> times slower.

That agrees very well with "published data", for example from the
Debian shootout site.  Note: the shootout has many flaws and you
should _really_ carefully examine their data before coming to
any conclusion.  But if you put more weight on data collected
by others than on your own, then you may as well use the shootout
data...

> After the
> modification,  SBCL and C came practically to a draw! I am using SBCL
> 1.26 on Windows (I recompiled it from sources, thanks to Pascal
> Constanza's help).
> 
> 

> double** makearray(int row,int col) {
>   double **v;
>   int i,j;
>   v=(double **)malloc(row*sizeof(double *));
>   for(i=0;i<row;i++){
>     *(v+i)=(double *)malloc(col*sizeof(double));
>   }
> 
>   for(i=0;i<row;i++){
>           for(j=0;j<col;j++)
>       v[i][j]=0;}
>   return(v);
> }
>

 
> 
> (defun bilde-array ()
>   (let ((res (make-array '(30 #.*n*)
>                          :element-type 'double-float
>                          )))
>     (declare (type (simple-array double-float (30 #.*n*)) res) )
>     (do ( (j 0 (1+ j)))  ( (>= j 10))
>     (dotimes (i #.*n*)
>              (declare (type fixnum i))
>              (setf (aref res (* 3 j) i) (coerce i 'double-float))
>              (setf (aref res (1+ (* 3 j)) i) (coerce (- #.*n* i) 'double-
> float))) )
>     res
>     ))
> 


You are comparing apples with oranges.  The quoted C code creates an
array of pointers to arrays, forcing double indirection in
all calculations.  The Lisp code uses a one-dimensional array.

If you want a fair comparison, either use an array of arrays in Lisp
(it will slow it down) or use a one-dimensional array in C; a sketch
of the former is below.

If you want to "prove" that Lisp is as fast as C, then you should
try harder to hide the inefficiency in the C version.
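
For illustration, an array-of-arrays setup in Lisp could look roughly
like this (an untested sketch mirroring the benchmark's BILDE-ARRAY;
it assumes the *N* constant defined earlier in the thread):

;; Each row is its own (simple-array double-float (*)), so every
;; element access goes through the same double indirection as the C
;; pointer version.
(defun bilde-array-of-arrays ()
  (let ((res (make-array 30)))
    (dotimes (k 30)
      (setf (aref res k)
            (make-array #.*n* :element-type 'double-float
                              :initial-element 0d0)))
    (dotimes (j 10)
      (let ((row0 (aref res (* 3 j)))
            (row1 (aref res (1+ (* 3 j)))))
        (declare (type (simple-array double-float (*)) row0 row1))
        (dotimes (i #.*n*)
          (setf (aref row0 i) (coerce i 'double-float)
                (aref row1 i) (coerce (- #.*n* i) 'double-float)))))
    res))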


-- 
                              Waldek Hebisch
·······@math.uni.wroc.pl 
From: comp.lang.scheme
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <86c440ab-ebc5-4750-9ebb-2848647722dd@v15g2000yqn.googlegroups.com>
On 6 mar, 11:19, Waldek Hebisch <·······@math.uni.wroc.pl> wrote:
> comp.lang.scheme <········@yahoo.ca> wrote:
> > A few days ago, I posted a comment on those as-fast-as-C Lisp
> > benchmarks. My results did not match the published data.  In my tests,
> > SBCL did not prove itself to be as fast as optimized C, but almost 2
> > times slower.
>
> That agrees very well with "published data" for example from
> Debian shootout site.  Note: shootout has many flaws and you
> should _really_ carefully examine their data before going to
> any conclusion.  But if you put more weight on data collected
> by other than on your own, then you may as well use shootout
> data...
>
>
>
> > After the
> > modification,  SBCL and C came practically to a draw! I am using SBCL
> > 1.26 on Windows (I recompiled it from sources, thanks to Pascal
> > Constanza's help).
>
> > double** makearray(int row,int col) {
> >   double **v;
> >   int i,j;
> >   v=(double **)malloc(row*sizeof(double *));
> >   for(i=0;i<row;i++){
> >     *(v+i)=(double *)malloc(col*sizeof(double));
> >   }
>
> >   for(i=0;i<row;i++){
> >           for(j=0;j<col;j++)
> >       v[i][j]=0;}
> >   return(v);
> > }
>
> > (defun bilde-array ()
> >   (let ((res (make-array '(30 #.*n*)
> >                          :element-type 'double-float
> >                          )))
> >     (declare (type (simple-array double-float (30 #.*n*)) res) )
> >     (do ( (j 0 (1+ j)))  ( (>= j 10))
> >     (dotimes (i #.*n*)
> >              (declare (type fixnum i))
> >              (setf (aref res (* 3 j) i) (coerce i 'double-float))
> >              (setf (aref res (1+ (* 3 j)) i) (coerce (- #.*n* i) 'double-
> > float))) )
> >     res
> >     ))
>
> You are comparing apples with oranges.  The C code below creates
> array of pointers to arrays, forcing use of double indirection in
> all calculations.  Lisp code uses one dimensional array.
>
> If you want fair comparison either use array of arrays in Lisp
> (will slow it down) or use one dimensional array in C.
>
> If you want to "prove" that Lisp is as fast as C than you should
> try to better hide inefficiency in C version.
>
> --
>                               Waldek Hebisch
> ·······@math.uni.wroc.pl

Hi, Waldek.
As I told you, the original C code was not using pointers to pointers.
I changed the code thinking that this would slow down C. I was amazed
to discover that C with pointers became much faster than the original
one, without pointers (3 times faster). I still cannot understand why.
Anyway, I do not know much about C. However, it seems that C is very
good at pointers.
From: Nicolas Neuss
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <87ocwewlk3.fsf@ma-patru.mathematik.uni-karlsruhe.de>
"comp.lang.scheme" <········@yahoo.ca> writes:

> As I told you, the original C code was not using pointer of pointers.
> I changed the code thinking that this would slow down C. I was amazed
> to discover that C with pointers became much faster than the original
> one, without pointers (3 times faster). I still cannot understand why.

Presumably two-dimensional array access is not optimized well in SBCL.  For
faster operation you can use 1D-array access and do the index calculations
yourself.  Note that you can access a 2D-array as 1D-array with
ROW-MAJOR-AREF.
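
A minimal sketch of the inner loop rewritten that way (assuming the
30 x *N* array and the *N* constant from the benchmark;
RECHNE-ROW-MAJOR is a made-up name):

(defun rechne-row-major (res)
  (declare (type (simple-array double-float (30 #.*n*)) res))
  (dotimes (j 10 res)
    ;; Precompute the row offsets once; ROW-MAJOR-AREF then indexes
    ;; the underlying storage with a single computed index.
    (let ((a (* (* j 3) #.*n*))
          (b (* (+ (* j 3) 1) #.*n*))
          (c (* (+ (* j 3) 2) #.*n*)))
      (declare (type fixnum a b c))
      (dotimes (i #.*n*)
        (setf (row-major-aref res (+ c i))
              (* (+ (row-major-aref res (+ a i))
                    (row-major-aref res (+ b i)))
                 (row-major-aref res (+ a i))))))))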

Nicolas
From: Waldek Hebisch
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <gos0il$kje$1@z-news.wcss.wroc.pl>
comp.lang.scheme <········@yahoo.ca> wrote:
> 
> Hi, Waldek.
> As I told you, the original C code was not using pointer of pointers.
> I changed the code thinking that this would slow down C. I was amazed
> to discover that C with pointers became much faster than the original
> one, without pointers (3 times faster). I still cannot understand why.
> Anyway, I do not know much about C. However, it seems that C is very
> good at pointers.

Yes, actually pointers have only a moderate effect in this case.
By increasing N you have increased data size so that it no longer
fits into the processor caches.
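
For a rough sense of scale (back-of-the-envelope, not measured): with
N = 128*1024 each row is 131072 x 8 bytes, about 1 MB, so the three
rows touched per iteration alone are about 3 MB and the whole 30-row
array about 30 MB, well beyond typical L2 caches; with N = 1024 the
whole array is only about 240 KB and fits easily.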

Below are my results on 2.4 GHz Core 2.  I have tried both your
value of N and smaller N, so that data fits into L2 cache
(for smaller N I compensated increasing number of iterations,
so that total number of operations stays the same).

         N   1024    128*1024
sbcl         2.104    2.228
C pointers   1.484    1.707
C arrays     0.422    1.607

I must admit that I do not understand why there is such a big
difference between the array and pointer versions for smaller N -- at
first glance the code generated for the pointer version looks quite
good.

-- 
                              Waldek Hebisch
·······@math.uni.wroc.pl 
From: Raymond Toy
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <sxd7i2y7spn.fsf@rtp.ericsson.se>
>>>>> "Marco" == Marco Antoniotti <·······@gmail.com> writes:

    Marco> I do not know either why the C compiler (I assume gcc) is faster on
    Marco> one example.
    Marco> But I'd bet that most of the SBCL/CMUCL slowdown is due to this

    Marco> CL-USER 42 > (aref #2A((0 1) (1 0)) 2 2)

    Marco> Error: The subscript 2 exceeds the limit 1 for the second dimension
    Marco> of the array #2A((0 1) (1 0)).
    Marco>   1 (abort) Return to level 0.
    Marco>   2 Return to top loop level 0.

    Marco> Type :b for backtrace, :c <option number> to proceed,  or :? for other
    Marco> options


    Marco> The above is LWM, but bound checks are mandated in CL.

But bounds checking is usually disabled when safety = 0, as is the
case with the test code.

Ray
From: Marco Antoniotti
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <9aa795bc-bafd-4ff8-9e56-843d1446682a@z1g2000yqn.googlegroups.com>
On Mar 9, 2:04 pm, Raymond Toy <···········@stericsson.com> wrote:
> >>>>> "Marco" == Marco Antoniotti <·······@gmail.com> writes:
>
>     Marco> I do not know either why the C compiler (I assume gcc) is faster on
>     Marco> one example.
>     Marco> But I'd bet that most of the SBCL/CMUCL slowdown is due to this
>
>     Marco> CL-USER 42 > (aref #2A((0 1) (1 0)) 2 2)
>
>     Marco> Error: The subscript 2 exceeds the limit 1 for the second dimension
>     Marco> of the array #2A((0 1) (1 0)).
>     Marco>   1 (abort) Return to level 0.
>     Marco>   2 Return to top loop level 0.
>
>     Marco> Type :b for backtrace, :c <option number> to proceed,  or :? for other
>     Marco> options
>
>     Marco> The above is LWM, but bound checks are mandated in CL.
>
> But bounds checking is usually disabled when safety = 0, as is the
> case with the test code.

Ooops, I missed that.

Cheers
--
Marco
From: Raymond Toy
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <sxdk5727dxz.fsf@rtp.ericsson.se>
>>>>> "Waldek" == Waldek Hebisch <·······@math.uni.wroc.pl> writes:

    >> 
    >> (defun bilde-array ()
    >> (let ((res (make-array '(30 #.*n*)
    >> :element-type 'double-float
    >> )))
    >> (declare (type (simple-array double-float (30 #.*n*)) res) )
    >> (do ( (j 0 (1+ j)))  ( (>= j 10))
    >> (dotimes (i #.*n*)
    >> (declare (type fixnum i))
    >> (setf (aref res (* 3 j) i) (coerce i 'double-float))
    >> (setf (aref res (1+ (* 3 j)) i) (coerce (- #.*n* i) 'double-
    >> float))) )
    >> res
    >> ))
    >> 


    Waldek> You are comparing apples with oranges.  The C code below creates
    Waldek> array of pointers to arrays, forcing use of double indirection in
    Waldek> all calculations.  Lisp code uses one dimensional array.

Looks like a 2-D array to me: (make-array '(30 #.*n*) ...)

Does SBCL have specialized 2D arrays?  If not, then each array access
might have to do several memory accesses to fetch/store an array
element.

(Interestingly, with CMUCL on sparc (1.5 GHz), the Lisp version takes 20 sec,
but the C version takes 24 sec.  With CMUCL on x86/linux (2.4 GHz), the Lisp
version takes 4.5 sec, and the C version takes 2.4 sec.  
Not sure what this all means. )

Ray
From: comp.lang.scheme
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <540b0f39-4e32-45fa-b5e0-9134e62dd65a@o11g2000yql.googlegroups.com>
On 6 mar, 14:34, Raymond Toy <···········@stericsson.com> wrote:
> >>>>> "Waldek" == Waldek Hebisch <·······@math.uni.wroc.pl> writes:
>
>     >>
>     >> (defun bilde-array ()
>     >> (let ((res (make-array '(30 #.*n*)
>     >> :element-type 'double-float
>     >> )))
>     >> (declare (type (simple-array double-float (30 #.*n*)) res) )
>     >> (do ( (j 0 (1+ j)))  ( (>= j 10))
>     >> (dotimes (i #.*n*)
>     >> (declare (type fixnum i))
>     >> (setf (aref res (* 3 j) i) (coerce i 'double-float))
>     >> (setf (aref res (1+ (* 3 j)) i) (coerce (- #.*n* i) 'double-
>     >> float))) )
>     >> res
>     >> ))
>     >>
>
>     Waldek> You are comparing apples with oranges.  The C code below creates
>     Waldek> array of pointers to arrays, forcing use of double indirection in
>     Waldek> all calculations.  Lisp code uses one dimensional array.
>
> Looks like a 2-D array to me: (make-array '(30 #.*n*) ...)
>
> Does SBCL have specialized 2D arrays?  If not, then each array access
> might have to do several memory accesses to fetch/store an array
> element.
>
> (Interestingly, with CMUCL on sparc (1.5 GHz), the Lisp version takes 20 sec,
> but the C version takes 24 sec.  With CMUCL on x86/linux (2.4 GHz), the Lisp
> version takes 4.5 sec, and the C version takes 2.4 sec.  
> Not sure what that this all means. )
>
> Ray

Hi, Ray.
I do not know much Lisp, or C. In fact, I started learning Lisp two
weeks ago. Of course, I program well in Clean (a functional language)
and have a good start on Scheme (mostly Stalin). Notwithstanding my
lack of knowledge to give an educated opinion, I agree with you. I
also thought that the author of the benchmark was using 2D arrays. I
did not comment because... well, I do not know Lisp. In any case, if
the Lisp compiler specializes the 2D array automatically, the
benchmarker must live with it. I have the impression that there are C
compilers that optimize pointers, and we need to live with that too.

BTW, I wrote this posting to retract a previous one, where I
claimed that Lisp benchmarks do not deliver what they promise. As an
example, I presented this program and said that C was almost 3 times
faster. It seems that the author was right, and I was wrong. After a
very small modification to prevent the particular compiler I am
using from performing a meaningless optimization, Lisp proved to be as
fast as C, at least for number crunching with a lot of array
references.


> (Interestingly, with CMUCL on sparc (1.5 GHz), the Lisp version takes 20 sec,
> but the C version takes 24 sec.  With CMUCL on x86/linux (2.4 GHz), the Lisp
> version takes 4.5 sec, and the C version takes 2.4 sec.
> Not sure what that this all means. )

I suggest that you increase the array dimensions and size, in order to
see whether this behavior has anything to do with cache optimization,
as suggested in Nicolas' paper. If so, a sufficiently large array will
exhaust the cache. Increase all dimensions by the same amount.
From: Waldek Hebisch
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <gorvbe$jb9$1@z-news.wcss.wroc.pl>
Raymond Toy <···········@stericsson.com> wrote:
> >>>>> "Waldek" == Waldek Hebisch <·······@math.uni.wroc.pl> writes:
> 
>     Waldek> You are comparing apples with oranges.  The C code below creates
>     Waldek> array of pointers to arrays, forcing use of double indirection in
>     Waldek> all calculations.  Lisp code uses one dimensional array.
> 
> Looks like a 2-D array to me: (make-array '(30 #.*n*) ...)
>

Yes, a 2-D array.  Lisp does not have to go through the double
indirection that the C version has.
 
> Does SBCL have specialized 2D arrays?  If not, then each array access
> might have to do several memory accesses to fetch/store an array
> element.
> 
> (Interestingly, with CMUCL on sparc (1.5 GHz), the Lisp version takes 20 sec,
> but the C version takes 24 sec.  With CMUCL on x86/linux (2.4 GHz), the Lisp
> version takes 4.5 sec, and the C version takes 2.4 sec.  
> Not sure what that this all means. )
>

Your x86/linux machine has a much better memory subsystem than the sparc.

-- 
                              Waldek Hebisch
·······@math.uni.wroc.pl 
From: Raymond Toy
Subject: Re: More on as-fast-as-C benchmarks
Date: 
Message-ID: <sxdfxhq76g7.fsf@rtp.ericsson.se>
>>>>> "Waldek" == Waldek Hebisch <·······@math.uni.wroc.pl> writes:

    Waldek> Raymond Toy <···········@stericsson.com> wrote:
    >> >>>>> "Waldek" == Waldek Hebisch <·······@math.uni.wroc.pl> writes:
    >> 
    Waldek> You are comparing apples with oranges.  The C code below creates
    Waldek> array of pointers to arrays, forcing use of double indirection in
    Waldek> all calculations.  Lisp code uses one dimensional array.
    >> 
    >> Looks like a 2-D array to me: (make-array '(30 #.*n*) ...)
    >> 

    Waldek> Yes, 2-D array.  Lisp does not have to go via double indirection
    Waldek> that C version has.

Do you know if SBCL has specialized 2D arrays?  If not, then, assuming
it is still the same as CMUCL, there is a double indirection.  The
first is to get the address of where the array data is stored, the
second is to get the actual element.

Ray