From: Will Deakin
Subject: Copying arrays
Date: 
Message-ID: <85crca$62i$1@nnrp1.deja.com>
Dear All,

is there a `good' way of copying arbitrary simple-arrays?

I have hacked up:

(defun copy-array (a)
  "Returns a copy of the given array"
  (let* ((dimensions (array-dimensions a))
         (size (apply #'* dimensions))
         (c (make-array dimensions))
         (u (make-array size :displaced-to a))
         (v (make-array size :displaced-to c)))
    (dotimes (i size c)
      (setf (aref v i) (aref u i)))))

This uses :displaced-to to flatten the array for copying and so will
handle arbitrary dimensions. Is there a *BETTER* way? Also, this doesn't
handle fill pointers! How can you do that?

Best Regards and a Happy New Year,

:) will


Sent via Deja.com http://www.deja.com/
Before you buy.

From: Bernhard Pfahringer
Subject: Re: Copying arrays
Date: 
Message-ID: <85ct80$oge$1@www.univie.ac.at>
In article <············@nnrp1.deja.com>,
Will Deakin  <········@pindar.com> wrote:
>Dear All,
>
>is there a `good' way of copying arbitrary simple-arrays?
>
>I have hacked up:
>
>(defun copy-array (a)
>  "Returns a copy of the given array"
>  (let* ((dimensions (array-dimensions a))
>         (size (apply #'* dimensions))
>         (c (make-array dimensions))
>         (u (make-array size :displaced-to a))
>         (v (make-array size :displaced-to c)))
>    (dotimes (i size c)
>      (setf (aref v i) (aref u i)))))
>
>This uses :displaced-to to flatten the array for copying and so will
>handle arbitrary dimensions. Is there a *BETTER* way? Also, this doesn't
>handle fill pointers! How can you do that?
>

use ROW-MAJOR-AREF (and ARRAY-TOTAL-SIZE, btw)

(defun copy-array (a)
  "Returns a copy of the given array"
  (let* ((dimensions (array-dimensions a))
	 (size (array-total-size a))
	 (c (make-array dimensions)))
    (dotimes (i size c)
      (setf (row-major-aref c i) (row-major-aref a i)))))

Fill-pointer: check for it, create if present and copy up
to the current value (if that is what you want).
The same goes for specialized arrays, e.g. single-float,
as well as displacements, what have you. Different versions/
semantics of copying are reasonable, and you have to decide
which one's appropriate for your task.

Bernhard
-- 
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for  http://www.ai.univie.ac.at/~bernhard/
Artificial Intelligence          ········@ai.univie.ac.at 
From: Will Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <85fagb$15p$1@nnrp1.deja.com>
Thank you for your help.

As a supplementary question: Is there an easy way to slice arrays?

For example: I would like the values of the second column of the array:

#2A((1 2 3 4) (5 6 7 8) (9 10 11 12))

that is, #(2 6 10), say. Or of the middle row, #(5 6 7 8).

Best Regards,

:) will



From: Barry Margolin
Subject: Re: Copying arrays
Date: 
Message-ID: <1aIe4.6$hm1.95@burlma1-snr2>
In article <············@nnrp1.deja.com>,
Will Deakin  <········@pindar.com> wrote:
>Thank you for your help.
>
>As a supplementary question: Is there an easy way to slice arrays?
>
>For example: I would like the values of the second column of the array:
>
>#2A((1 2 3 4) (5 6 7 8) (9 10 11 12))
>
>that is, #(2 6 10), say. Or of the middle row, #(5 6 7 8).

You can get the middle row by using a displaced array, but there isn't any
way to get a column.

The Lisp Machine has a neat feature called "conformant arrays".  These are
displaced arrays that access a sub-rectangle of another array.  They were
created to assist with the window system implementation, I think, and make
use of hardware or microcode assistance for performance.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Tim Bradshaw
Subject: Re: Copying arrays
Date: 
Message-ID: <ey3u2kkmssi.fsf@cley.com>
* Barry Margolin wrote:

> You can get the middle row by using a displaced array, but there isn't any
> way to get a column.

Note (Will!) that this is pretty much dictated by the way arrays are
stored in memory -- getting at a row is just a matter of finding the
chunk of memory that corresponds to the row, while getting at a column
involves skipping by row-sized offsets between each element.

It would be quite easy to create a `column vector' object which
supported CVREF and (SETF CVREF), by storing a secret reference to the
array, and computing the appropriate ROW-MAJOR-AREF into it.  But
you'd probably find that performance was dreadful because of machine
caching issues.

--tim
From: Marco Antoniotti
Subject: Re: Copying arrays
Date: 
Message-ID: <lwzoublfk7.fsf@parades.rm.cnr.it>
Tim Bradshaw <···@cley.com> writes:

> * Barry Margolin wrote:
> 
> > You can get the middle row by using a displaced array, but there isn't any
> > way to get a column.
> 
> Note (Will!) that this is pretty much dictated by the way arrays are
> stored in memory -- getting at a row is just a matter of finding the
> chunk of memory that corresponds to the row, while getting at a column
> involves skipping by row-sized offsets between each element.
> 
> It would be quite easy to create a `column vector' object which
> supported CVREF and (SETF CVREF), by storing a secret reference to the
> array, and computing the appropriate ROW-MAJOR-AREF into it.  But
> you'd probably find that performance was dreadful because of machine
> caching issues.

How would you implement column addressing in C?  Just curious.

BTW. Some time ago I implemented a little package that allowed you to
automatically define some arrary accessors for displaced arrays, in
order to emulate some C pointer juggling tricks.  Maybe I could add
CVREF to it.  Somehow the new GNUS I installed does not allow me to
send uuencoded text (I get a 'binary in non binary group') and I
haven'e had time to investigate it yet.  So if you want I can send it
to you via email.

Cheers

PS. Any hint about this GNUS setup will be appreciated.

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa
From: Tim Bradshaw
Subject: Re: Copying arrays
Date: 
Message-ID: <ey3ogarms6w.fsf@cley.com>
* Marco Antoniotti wrote:

> How would you implement column addressing in C?  Just curious.

I think you'd do the same thing -- have some object which had a pointer
to the real array and stashed the stride you needed.  In C I guess
you'd need to say cvref(o,i) and so on, in C++ you could overload []?

--tim
From: Will Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <85ks55$3e7$1@nnrp1.deja.com>
  Tim wrote:
> * Marco wrote:
> > How would you implement column addressing in C?  Just curious.
>
> I think you'd do the same thing -- have some object which had a
> pointer to the real array and stashed the stride you needed.  In C I
> guess you'd need to say cvref(o,i) and so on,

I agree. Loosely based on Numerical Recipes in C, I wrote code that
allocates multidimensional arrays: a 1D-array (vector) as a block of
memory with a pointer to the start of the block; a 2D-array as a pointer
to a vector of pointers to vectors, the rows; a 3D-array as a pointer to
a vector of pointers to a vector of pointers. Ad nauseam.

Rows are accessed using `straightforward' pointer arithmetic. A column
vector (for a 2D-array) or pointer to a column vector of pointers (for a
3D-array) could be created. You could even write a macro to
access the values you wanted! Something like: #define cvref(i,o)
(*((i)+nrows*(o)))...or something. Urrgh.

> in C++ you could overload []?

Yup. Or would you overload ()? And write a `nice' API layer to sit on
top of the C-code described above. ;)

Best Regards,

:) will


From: Will Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <85kho1$rcd$1@nnrp1.deja.com>
Marco wrote:

> ...Some time ago I implemented a little package that allowed you
> to automatically define some array accessors for displaced
> arrays...So if you want I can send it to you via email.

Yes please. That would be appreciated,

"mille grazie e migliorissimo riguardi",

:) will


From: Robert Monfera
Subject: Re: Copying arrays
Date: 
Message-ID: <387C957B.E1A40EDF@fisec.com>
BM: You can get the middle row by using a displaced array, but there
BM: isn't any way to get a column.

TB: It would be quite easy to create a `column vector' object which
TB: supported CVREF and (SETF CVREF), by storing a secret reference to
TB: the array, and computing the appropriate ROW-MAJOR-AREF into it.
TB: But you'd probably find that performance was dreadful because of
TB: machine caching issues.

Hi Tim,

Your comments on cache miss penalty made me curious as it would
negatively impact the application I am working on (or conversely, it
would bring an opportunity to further increase its speed :-).

On ACL/Pentium2, I did some testing with simple pieces of code and
(speed 3)(safety 0) that did nothing other than accumulated 0 array
values or set array values to 0 with AREF.

1. 4000x4000 fixnum array traversal both ways: no difference in time (I
guess ACL optimized my nested DOTIMES).

2. Vector of 4000 elements, where each element has 4000 fixnums (to
confuse the compiler): the run-time of the 'bad' order was 25% longer
than that of the other.

3. Vector of 16 elements, where each element has 1000000 fixnums: the
slower run-time was just 2.9% longer than the faster.

As you suggest, in both cases, the faster runtime award always went with
the cache-observant direction.

The most interesting thing, however, is that vectors in vectors allow
you to access elements with SVREF, which halves run-time overall!  As an
added benefit, you can handle much larger arrays this way, and you can
easily sort both rows and columns.

Using SVREF, the cache miss penalty was also 2.9% in case 3.  I'd be
interested in your opinion as to why performance did not degrade
significantly when I had 16 vectors of 1000000 elements (4MB/column).
It seems the P2 either does not flush the entire cache or a cache miss is
not that expensive (operations were light in both cases to amplify cache
effects, only accumulating or writing 0's).

I think CVREF based on 2 dimensional arrays (like case 1) could not
approach the speed of SVREF (case 3), because SVREF does not work on 2
dimensional arrays or displaced arrays/vectors, and I see no way of
writing an acceptably fast user-level CVREF without SVREF.

[Note: on ACL, I got almost identical results if I declared that vectors
are of element type t or fixnum - SVREF even worked on fixnum vectors
(although it would not have to according to the Spec).]

My advice for fast and flexible array operations would be to use SVREF
on vectors of vectors.

Best regards
Robert
From: Bernhard Pfahringer
Subject: Re: Copying arrays
Date: 
Message-ID: <85i64b$1oko$1@www.univie.ac.at>
In article <·················@fisec.com>,
Robert Monfera  <·······@fisec.com> wrote:
>
>My advice for fast and flexible array operations would be to use SVREF
>on vectors of vectors.
>

Not if you want to deal with single- and double-float. Speed-wise properly 
defined AREF beats SVREF all the time (at least in my coding experience).

Bernhard
From: Duane Rettig
Subject: Re: Copying arrays
Date: 
Message-ID: <4zoubtgkd.fsf@beta.franz.com>
Robert Monfera <·······@fisec.com> writes:

> [Note: on ACL, I got almost identical results if I declared that vectors
> are of element type t or fixnum - SVREF even worked on fixnum vectors
> (although it would not have to according to the Spec).]

Careful.  A mistake I always have to remind myself not to make whenever
benchmarking is to check the accuracy of my results.  SVREF may "work"
on fixnum vectors for the purposes of benchmarking, but as soon as you
start increasing speed and decreasing safety, it becomes more likely
that you wind up with inaccurate results without knowing it (because
you've told the compiler to sacrifice safety for speed).

An SVREF when compiled acts like its array argument has the declaration
(type (simple-array t (*)) ...).  Here is a simple run in Allegro CL 5.0.1
that shows the concept above, for both compiled and interpreted svrefs:

USER(1): (defvar *a* (make-array 5 :element-type 'fixnum
                                   :initial-contents '(1 2 3 4 5)))
*A*
USER(2): (svref *a* 0)
Error: Illegal vector object passed to svref: #(1 2 3 4 5)
  [condition type: SIMPLE-ERROR]

Restart actions (select using :continue):
 0: Return to Top Level (an "abort" restart)
 1: Abort #<PROCESS Initial Lisp Listener>
[1] USER(3): :res
USER(4): (defun foo (x) 
            (declare (optimize speed))
            (loop for i from 0 to 4
                  collect (svref x i)))
FOO
USER(5): (compile *)
FOO
NIL
NIL
USER(6): (foo *a*)
(#<Printer Error, obj=#x1: #<SIMPLE-ERROR @ #x205015da>>
 #<Printer Error, obj=#x2: Received signal number 4 (Illegal instruction)>
 #<unknown object of type number 3 @ #x3> 1
 #<Printer Error, obj=#x5: #<SIMPLE-ERROR @ #x20502fba>>)
USER(7): 

I didn't show it in the example above, but removing the optimize
declaration in the definition of foo will also yield an error, as
did the interpreted access attempt.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Tim Bradshaw
Subject: Re: Copying arrays
Date: 
Message-ID: <ey3embmmd5f.fsf@cley.com>
* Robert Monfera wrote:

> On ACL/Pentium2, I did some testing with simple pieces of code and
> (speed 3)(safety 0) that did nothing other than accumulated 0 array
> values or set array values to 0 with AREF.

> 1. 4000x4000 fixnum array traversal both ways: no difference in time (I
> guess ACL optimized my nested DOTIMES).

Are you sure you are measuring the right thing?  I timed this code:


    (defun zfa (a &optional bad-order-p)
      (declare (type (simple-array fixnum (4000 4000))
		     a)
	       (optimize speed
			 (safety 0)))
      (if bad-order-p
	  (dotimes (i 4000 a)
	    (declare (type fixnum i))
	    (dotimes (j 4000)
	      (declare (type fixnum j))
	      (setf (aref a j i) 0)))
	  (dotimes (i 4000 a)
	    (declare (type fixnum i))
	    (dotimes (j 4000)
	      (declare (type fixnum j))
	      (setf (aref a i j) 0)))))



    (defvar *a* (make-array '(4000 4000)
			    :element-type 'fixnum
			    :initial-element 0))


On a Sun Ultra 10.  For ACL 5.0.1 I get about a factor of 2
performance difference between good and bad orders.  For CMUCL I get
about a factor of 4.  The good case for CMUCL is about 3x as fast as
the good ACL case (CMUCL is very good at this kind of code, it's not
3x as fast as ACL in general, in fact it's probably slower).

You need to be very sure that the array type has objects stored
directly in it, otherwise you incur an indirection per access which
may well ensure that you miss the cache almost all the time in either
order.

> The most interesting thing, however, is that vectors in vectors allow
> you to access elements with SVREF, which halves run-time overall!  As an
> added benefit, you can handle much larger arrays this way, and you can
> easily sort both rows and columns.

I would be quite surprised if SVREF is faster in a case you can get
the really good cache behaviour (which I think is basically arrays of
good numerical types). Since SVREF must often follow a pointer on
every access you'll get at least 2 sorts of cache effects all mixed
up, the second (and perhaps dominant) one depending on how large the
objects the array points to are and how good the memory allocator & GC
is at getting them close to each other if they are small.

(I presume that SVREF *doesn't* follow a pointer if the array actually
has fixnums or something in, since they'll still be immediate, but it
will still have to do the tag check I guess.  For something like
*-FLOATS it should lose badly).

> Using SVREF, the cache miss penalty was also 2.9% in case 3.  I'd be
> interested in your opinion as to why performance did not degrade
> significantly when I had 16 vectors of 1000000 elements (4MB/column).
> It seems P2 either does not flush the entire cache or that cache miss is
> not that expensive (operations were light in both cases to amplify cache
> effect, only accumulating or writing 0's).

I'm far from an expert on cache behaviours.  I think machines
typically flush lines at a time, and a line is maybe 16 or 32 bytes?
Cache miss ought to be quite expensive, but there are several layers
of cache between you and main memory of course.  Also, if you can find
other stuff to do to hide the latency you may be able to hide a lot of
the penalty.  There's a whole arcana in this stuff I think, and not
many people (certainly not me!) understand it.

--tim
From: Robert Monfera
Subject: Re: Copying arrays
Date: 
Message-ID: <387E35B5.46F4797C@fisec.com>
Tim Bradshaw wrote:

Thanks for your answer, it was very instructive in terms of cache
behavior.

> * Robert Monfera wrote:
>
> > On ACL/Pentium2, I did some testing with simple pieces of code and
> > (speed 3)(safety 0) that did nothing other than accumulated 0 array
> > values or set array values to 0 with AREF.
>
> > 1. 4000x4000 fixnum array traversal both ways: no difference in time (I
> > guess ACL optimized my nested DOTIMES).
>
> Are you sure you are measuring the right thing?

Not anymore :-)  My code was shorter with the declaration of the array
dimensions, which made a tremendous difference.  I have rerun those
tests and I confirm the ~3x speed difference caused by caching.  The
large difference shows up with both AREF'd fixna and SVREF'd t.  Thanks!

> > The most interesting thing, however, is that vectors in vectors allow
> > you to access elements with SVREF, which halves run-time overall!  As an
> > added benefit, you can handle much larger arrays this way, and you can
> > easily sort both rows and columns.
>
> I would be quite surprised if SVREF is faster in a case you can get
> the really good cache behaviour

According to my measurements (ACL 5.0.1, P2 350MHz, 100MHz bus), SVREF
is still faster, _especially_ with cache-optimized behaviour.

; fixnum / aref version
(defun zfa (a &optional bad-order-p)
      (declare (type (simple-array fixnum (4000 4000)) a)
	       (optimize speed (safety 0)))
      (if bad-order-p
	  (dotimes (i 4000 a)
	    (declare (type fixnum i))
	    (dotimes (j 4000)
	      (declare (type fixnum j))
	      (setf (aref a j i) 0)))
	  (dotimes (i 4000 a)
	    (declare (type fixnum i))
	    (dotimes (j 4000)
	      (declare (type fixnum j))
       (setf (aref a i j) 0)))))

> (defvar *a* (make-array '(4000 4000)
                          :element-type 'fixnum :initial-element 1))
*A*
> (time (zfa *a*))
; cpu time (non-gc) 1,100 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  1,100 msec user, 0 msec system
; real time  1,100 msec
; space allocation:
;  1 cons cell, 0 symbols, 0 other bytes, 0 static bytes
> (time (zfa *a* t))
; cpu time (non-gc) 2,910 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  2,910 msec user, 0 msec system
; real time  2,910 msec
; space allocation:
;  2 cons cells, 0 symbols, 0 other bytes, 0 static bytes

; t / svref version
(defun zfa-rm (a &optional bad-order-p)
  (declare (type (simple-array t (4000)) a)
           (optimize speed (safety 0)))
  (if bad-order-p
      (dotimes (i 4000 a)
        (declare (type fixnum i))
        (dotimes (j 4000)
          (declare (type fixnum j))
          (let ((inner-vector (svref a j)))
            (declare (type (simple-array t (4000)) inner-vector))
            (setf (svref inner-vector i) 0))))
    (dotimes (i 4000 a)
      (declare (type fixnum i))
      (let ((inner-vector (svref a i)))
        (declare (type (simple-array t (4000)) inner-vector))
        (dotimes (j 4000)
          (declare (type fixnum j))
          (setf (svref inner-vector j) 0))))))

> (defvar *b* (make-array 4000))
> (dotimes (i 4000) (setf (aref *b* i)
     (make-array 4000 :element-type t :initial-element 1)))
> (time (zfa-rm *b*))
; cpu time (non-gc) 330 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  330 msec user, 0 msec system
; real time  330 msec ; <-- this was 660ms w/o LET
; space allocation:
;  1 cons cell, 0 symbols, 0 other bytes, 0 static bytes
> (time (zfa-rm *b* t))
; cpu time (non-gc) 2,360 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  2,360 msec user, 0 msec system
; real time  2,360 msec
; space allocation:
;  2 cons cells, 0 symbols, 0 other bytes, 0 static bytes

I have also tested a more interesting example (thus remaining faithful
to the subject), which confirms SVREF's advantage:

(defun vector-copy-fixnum (a b)
  (declare (type (simple-array fixnum (1000000)) a b)
           (optimize (speed 3)(safety 0)(debug 0)(compilation-speed 0)))
  (dotimes (c 1000000)
    (declare (type fixnum c))
    (let ((r (the fixnum (aref a c))))
      (declare (type fixnum r))
      (setf (aref b c) r))))

(defun vector-copy-t (a b)
  (declare (type (simple-array t (1000000)) a b)
           (optimize (speed 3)(safety 0)(debug 0)(compilation-speed 0)))
  (dotimes (c 1000000)
    ;(declare (type fixnum c))
    (let ((r (aref a c)))
      ;(declare (type fixnum r))
      (setf (svref b c) r))))

> (time (vector-copy-fixnum a b))
; cpu time (non-gc) 610 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  610 msec user, 0 msec system
; real time  610 msec
; space allocation:
;  2 cons cells, 0 symbols, 0 other bytes, 0 static bytes
NIL
> (time (vector-copy-t c d))
; cpu time (non-gc) 270 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  270 msec user, 0 msec system
; real time  270 msec
; space allocation:
;  7 cons cells, 0 symbols, 0 other bytes, 0 static bytes
NIL

A third, more real-life test I did was a comparison of a fixnum and a t
version of my merge-sort algorithm.  The t version runs at the same
speed as ACL's SORT (they are likely the same), the fixnum version was
several times slower (probably copying dominated run-time).

SVREF can reach bignums etc., and the vector of vectors model allows
larger tables.  (To bring in a platform-dependent argument, SORT in ACL
is optimized for simple, general vectors, as described by Duane
recently.)

Conclusions:

1. cache-awareness pays off big time
2. for some reason, SVREF looks faster (or I am missing some
declaration)

Thanks again for your enlightening post on cache effects.

Regards,
Robert
From: Will Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <85n1n4$man$1@nnrp1.deja.com>
Dear Robert and Tim

Thank you for this. And to everybody else who has participated in this
discussion.

Best Regards,

:) will

ps: if anybody is interested (which I'm sure you are not since this is
comp.lang.lisp :) here is the C-code I rambled on about. This works
using a `vector of vectors' type thingie. Since the machine I am using
does not have enough memory to allocate the 4000x4000 array I have not
been able to profile this. But when I get the chance I'll profile this C
code and compare the timings with cmucl, clisp and allegro under Linux
on a machine with MORE MEMORY.

/*----------------------------------------------------------------------
Copyright (C) Will Deakin 2000. This is free for anybody to use for
whatever purpose they see fit. If it is broken or doesn't work don't
blame me, let me know and I'll try and fix it. Probably. But I might not
be able to do so (time pressures, incompetence &c). So don't take this
as any kind of warranty. Other than this: do what you want with it*/

#define _POSIX_SOURCE 1
#include <stdlib.h>
#include <stdio.h>
#define NROW 4000
#define NCOL 4000
#define TYPE long int /*to approximate a fixnum*/

TYPE **array(const int nrows, const int ncolumns)
{
  TYPE **r;
  TYPE *data;
  int i,j;
  /*one block: nrows row-pointers followed by the nrows*ncolumns
    elements themselves*/
  r=(TYPE**)malloc((size_t)nrows*sizeof(TYPE*)
                   +(size_t)nrows*ncolumns*sizeof(TYPE));
  if (!r) exit(EXIT_FAILURE);
  /*point each row-pointer at its row within the data block*/
  data=(TYPE*)(r+nrows);
  for (i=0;i<nrows;i++) *(r+i)=data+i*ncolumns;
  for (i=0;i<nrows;i++) {
    for (j=0;j<ncolumns;j++) {
      r[i][j]=(TYPE)1;
    }
  }
  return r;
}

/*this is based on the lisp by tim. The badorder argument is an integer
  that when set to 1 (to approximate a boolean) reverses the order of
  the traversal*/

void zfa(TYPE** m,const int badorder)
{
  int i,j;
  if (badorder==1) {
    for (i=0;i<NROW;i++) {
      for (j=0;j<NCOL;j++) {
	m[j][i]=(TYPE)0;
      }
    }
  } else {
    for (i=0;i<NROW;i++) {
      for (j=0;j<NCOL;j++) {
	m[i][j]=(TYPE)0;
      }
    }
  }
}

void print(TYPE** m)
{
  int i,j;
  for (i=0;i<NROW;i++) {
    for (j=0;j<NCOL;j++) {
      printf("%ld ",m[i][j]); /*As this is C, the format must be changed
                                when TYPE is changed*/
    }
    printf("\n");
  }
}

int main(void)
{
  TYPE **m;
  m=array(NROW,NCOL);
  zfa(m,0);
  print(m);
  free(m);
  return EXIT_SUCCESS;
}



From: Marco Antoniotti
Subject: Re: Copying arrays
Date: 
Message-ID: <lwg0w0234l.fsf@parades.rm.cnr.it>
For what it's worth....

Will Deakin <········@pindar.com> writes:

> Dear Robert and Tim
> 
> Thank you for this. And to everybody else who has participated in this
> discussion.
> 
> Best Regards,
> 
> :) will
> 
> ps: if anybody is interested (which I'm sure you are not since this is
> comp.lang.lisp :) here is the C-code I rambled on about. This works
> using a `vector of vectors' type thingie. Since the machine I am using
> does not have enough memory to allocate the 4000x4000 array I have not
> been able to profile this. But when I get the chance I'll profile this C
> code and compare the timings with cmucl, clisp and allegro under Linux
> on a machine with MORE MEMORY.

The problem is really the C stack.  Assuming a 'long int' is W bytes
long, if you try to allocate a

	long int array[4000][4000]

you will end up eating 16,000,000 * W bytes. If W is 8 (i.e. 'long
int' is 64 bits) you will try to allocate 128,000,000 bytes on the C
stack.  This usually results in a segfault.

So you end up doing the regular Common Lisp thing to allocate the
array as a vector of vectors in the heap (using 'malloc' and friends).

I ran the example (without much thinking) in CMUCL and C (gcc)

In CMUCL on an Ultra 2 you get about 5 seconds for the good order and
about 10 seconds for the bad order case.  GC eats up about 3.5 secs in
both cases.

If you turn off GC, then CMUCL is pretty good, and the time is about
2.39 secs for the good case and about 7.5 secs for the bad case.

With 'gcc -O' the times are very impressive.
Here are the results

# Good case
·······@copernico:cl 75> gcc -O zfa.c
·······@copernico:cl 76> time a.out
0.52u 0.79s 0:01.39 94.2%

# Bad case 
·······@copernico:cl 77> gcc -O zfa.c
·······@copernico:cl 78> time a.out
5.41u 0.80s 0:06.33 98.1%

However, using gcc *without* the '-O' flag, the C program runs in
pretty much the same time as the CMUCL case (albeit always a bit
less).

So: in the CL case, GC plays quite a role (especially with CMUCL on
Solaris, which has a not-so-good GC).

Moreover, it is obvious that GCC has much better optimizations than
CMUCL at this point in history.  I believe this is simply a reality of
the "economy of scale" rather than something that is inherent in CL.

We have to live with it.

Cheers

From: Bernhard Pfahringer
Subject: Re: Copying arrays
Date: 
Message-ID: <85neac$38bg$1@www.univie.ac.at>
In article <··············@parades.rm.cnr.it>,
Marco Antoniotti  <·······@parades.rm.cnr.it> wrote:
>
>For what it's worth....
>
.... STUFF DELETED ....
>
>However, using gcc *without* the '-O' flag, the C program runs in
>pretty much the same time as the CMUCL case (albeit always a bit less).
>
>So: in the CL case, GC plays quite a role (especially with CMUCL on
>Solaris, which has a not-so-good GC).
>
>Moreover, it is obvious that GCC has much better optimizations than
>CMUCL at this point in history.  I believe this is simply a reality of
>the "economy of scale" rather than something that is inherent in CL.
>
>We have to live with it.
>

Well sort of, 
for CMUCL I found ROW-MAJOR-AREF to be quite efficient, the following
is twice as quick as the "good order" version:

(defun zfa-better (a)
   (declare (type (simple-array fixnum (4000 4000)) a)
	    (optimize (speed 3) (safety 0) (compilation-speed 0)))
   (dotimes (i (array-total-size a) a)
      (declare (type fixnum i))
      (setf (row-major-aref a i) 0)))

Unfortunately different implementations optimize differently,
so your "economy of scale" seems appropriate. But on the other
hand don't forget that we will never approach the efficiency
of register-based *p++ due to GC issues in general (though there
may be safe cases for optimization like this one). I am more
than happy to pay this small price for the convenience of having a GC!

Bernhard
From: Robert Monfera
Subject: Re: Copying arrays
Date: 
Message-ID: <387F3FAE.F7FC44FF@fisec.com>
Marco Antoniotti wrote:

> The problem is really the C stack.  Assuming a 'long int' is W bytes
> long, if you try to allocate a
>
>         int array[4000][4000]

Speaking of long ints, what are your thoughts about a mednum data type
in Lisp (native 64-bit integers), or declarations like (integer 0 (expt
2 63)) being optimized?

> In CMUCL on an Ultra 2 you get about 5 seconds for the good order and
> about 10 seconds for the bad order case.  GC eats up about 3.5 secs in
> both cases.

What's the clock frequency of your Sun?  Is the better performance on
Pentium explained by Pentium being faster?  Tim mentioned CMUCL is
superior at number crunching, so the difference is even more surprising.

> If you turn off GC, then CMUCL is pretty good, and the time is about
> 2.39 secs for the good case and about 7.5 secs for the bad case.

How come GC plays a role when no garbage is generated?  Or maybe CMUCL
generates some in this case?  This alone would explain why CMUCL is
slower.

> Moreover, it is obvious that GCC has much better optimizations than
> CMUCL at this point in history.  I believe this is simply a reality of
> the "economy of scale" rather than something that is inherent in CL.

Putting aside Lispy things like bignums or complex data types, a Lisp
compiler could invoke GCC when it is declared so (at 0 safety of
course), and would paste the resulting code in the CL function
definition.  It requires a Lisp to C translation, which may be more
difficult than a Lisp to yet another processor compilation.  This would
turn the economy of scale argument upside down, as advances in C
compiler technology would directly benefit Lisp compilers.  The very
educational and interesting arguments from Duane (Franz) and Howard
Stearns (Eclipse) for and against direct machine compilation were about
overall compilation, whereas here we talk about local optimizations of
tight loops (sorting etc.) working on restricted data types.

This can also be done manually, i.e., you get to insert the assembly
snippet in performance-critical computations (remember explanations from
Duane for ACL and considerations from Erik), but then you should be
familiar with what registers you aren't allowed to touch, and what
protocol you are required to observe.

Best regards
Robert
From: Arvid Grøtting
Subject: Re: Copying arrays
Date: 
Message-ID: <yg11z7kskj7.fsf@pc-arvidg.infotek.no>
* Robert Monfera

> Putting aside Lispy things like bignums or complex data types, a Lisp
> compiler could invoke GCC when it is declared so (at 0 safety of
> course), and would paste the resulting code in the CL function
> definition.  It requires a Lisp to C translation, [...]
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not necessarily.  Someone could write a CL front-end for GCC, hooking
into the (somewhat under-documented last time I looked at it, but that
may very well have changed by now) middle layer after parsing and
before code-generation.

-- 
Arvid

Saddam Hussein Nazi Qaddafi Semtex radar Waco, Texas kibo AK-47
From: Matthew Economou
Subject: Re: Copying arrays
Date: 
Message-ID: <w4o4scggv8x.fsf@nemesis.irtnog.org>
>>>>> "AG" == Arvid Grøtting <······@pc-arvidg.infotek.no> writes:

    AG> Not necessarily.  Someone could write a CL front-end for GCC,
    AG> hooking into the (somewhat under-documented last time I looked
    AG> at it, but that may very well have changed by now) middle
    AG> layer after parsing and before code-generation.

I've looked into it, and it is very difficult without writing the Lisp
front-end in Lisp.  Several other front-ends (Modula-2, Ada) are
written in the language they compile, and there are problems building
a cross-compiler, etc.

-- 
"What though the field be lost? / All is not lost; the unconquerable Will,
 And study of revenge, immortal hate, / And courage never to submit or yield."
   -- Lucifer,
      'Paradise Lost', Milton
From: Raymond Toy
Subject: Re: Copying arrays
Date: 
Message-ID: <4nwvpcimvz.fsf@rtp.ericsson.se>
>>>>> "Robert" == Robert Monfera <·······@fisec.com> writes:

    Robert> Marco Antoniotti wrote:

    >> The problem is really the C stack.  Assuming a 'long int' is W 8-bits
    >> bytes long, if yoy try to allocate a
    >> 
    >> int array[4000][4000]

    Robert> Speaking of long ints, what are your thoughts about a
    Robert> mednum data type in Lisp (native 64-bit integers), or
    Robert> declarations like (integer 0 (expt 2 63)) being optimized?

I believe CMUCL for Alpha supports 64-bit integers, although fixnums
are still 29 bits.  I don't have an Alpha, so I don't know how good
this support is.

    Robert> Putting aside Lispy things like bignums or complex data
    Robert> types, a Lisp compiler could invoke GCC when it is
    Robert> declared so (at 0 safety of course), and would paste the
    Robert> resulting code in the CL function definition.  It requires
    Robert> a Lisp to C translation, which may be more difficult than
    Robert> a Lisp to yet another processor compilation.  This would

GCL does this when it compiles lisp functions.  My experience, though,
has been that CMUCL still does a better job in number crunching than
GCL.  That's probably an artifact of the Lisp-to-C translation in GCL.

Ray
From: Marco Antoniotti
Subject: Re: Copying arrays
Date: 
Message-ID: <lwln5n7gs9.fsf@parades.rm.cnr.it>
Robert Monfera <·······@fisec.com> writes:

> Marco Antoniotti wrote:
> 
> > The problem is really the C stack.  Assuming a 'long int' is W 8-bits
> > bytes long, if yoy try to allocate a
> >
> >         int array[4000][4000]
> 
> Speaking of long ints, what are your thoughts about a mednum data type
> in Lisp (native 64-bit integers), or declarations like (integer 0 (expt
> 2 63)) being optimized?

What do you mean by "being optimized"?  I personally do not have an
opinion.  I would just like the underlying CL to try as hard as possible
to use the best internal representation given the processor
architecture.  If it does not, I can surely live with it :)

> > In CMUCL on an Ultra 2 you get about 5 seconds for the good order and
> > about 10 seconds for the bad order case.  GC eats up about 3.5 secs in
> > both cases.
> 
> What's the clock frequency of your Sun?  Is the better performance on
> Pentium explained by Pentium being faster?  Tim mentioned CMUCL is
> superior at number crunching, so the difference is even more
> surprising.

200 MHz

> > If you turn off GC, then CMUCL is pretty good, and the time is about
> > 2.39 secs for the good case and about 7.5 secs for the bad case.
> 
> How come GC plays a role when no garbage is generated?  Or maybe CMUCL
> generates some in this case?  This alone would explain why CMUCL is
> slower.

The test I ran was the function

	(defun test-zfa (order)
	   (zfa (make-array '(4000 4000) :element-type 'fixnum) order))

when you allocate the array, the (poor) CMUCL/Solaris GC reaches its
threshold often and runs.

> > Moreover, it is obvious that GCC has much better optimizations than
> > CMUCL at this point in history.  I believe this is simply a reality of
> > the "economy of scale" rather than something that is inherent in CL.
> 
> Putting aside Lispy things like bignums or complex data types, a Lisp
> compiler could invoke GCC when it is declared so (at 0 safety of
> course), and would paste the resulting code in the CL function
> definition.  It requires a Lisp to C translation, which may be more
> difficult than a Lisp to yet another processor compilation.  This would
> turn the economy of scale argument upside down, as advances in C
> compiler technology would directly benefit Lisp compilers.  The very
> educating and interesting arguments from Duane (Franz) and Howard
> Stearns (Eclipse) pro and cons of direct machine compilation were about
> overall compilation, whereas here we talk about local optimizations of
> tight loops (sorting etc.) working on restricted data types.

Reading Duane and Howard, I'd say that it is easier said than done. :{

> This can also be done manually, i.e., you get to insert the assembly
> snippet in performance-critical computations (remember explanations from
> Duane for ACL and considerations from Erik), but then you should be
> familiar with what registers you aren't allowed to touch, and what
> protocol you are required to observe.

You are right.  However, I program in CL (actually not: I program in
Java :) ), so why should I have to care about such things?

This is not a rhetorical question.  You can come up with a convincing
argument supporting the need for such interventions.  Actually, Erik
has made the best such arguments I have seen so far.  However, I want
first to be able to write programs fast, then to write fast programs.
Right now, all the major CL implementations allow you to write "fast
enough programs" fast.  AFAIK, no other language environment gives you
this edge.  If you want to write fast programs, then you are better
off using some assembly language (like C/C++ :) ).  If you write "fast
enough" programs and then have the option of manually optimizing your
code by sticking in special code, you are doing the right thing
overall.  At least this is my (unsubstantiated) opinion.

Cheers

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa
From: Robert Monfera
Subject: Re: Copying arrays
Date: 
Message-ID: <3884737C.F1EEB744@fisec.com>
Marco Antoniotti wrote:

> The test I ran was the function
>
>         (defun test-zfa (order)
>            (zfa (make-array '(4000 4000) :element-type 'fixnum) order))
>
> when you allocate the array, the (poor) CMUCL/Solaris gc thresholds
> often and runs.

OK, this explains a few things as to why your CMUCL code ran
significantly longer than the C code (e.g., Tim's original ZFA did not
include array creation).  I looked up results Will posted later, which
showed essentially identical speed for CMUCL code as for GCC -O3 code on
Intel (i.e., CMUCL being relatively faster than in your test).

As for Graham's mantra (write programs fast then fast programs) and good
enough speed, I absolutely agree (see my reference to Erik's postings
about optimizations).

Regards
Robert
From: Will Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <866ltg$nve$1@nnrp1.deja.com>
Robert wrote:

> ...I looked up results Will posted later, which showed essentially
> identical speed for CMUCL code as for GCC -O3 code on Intel
> (i.e., CMUCL being relatively faster than in your test).

Just a small point: in these tests on x86, both acl and cmucl were
faster than gcc for array access, in particular when using svref access.
This can be seen from the gprof results. But I removed the comparison of
memory allocation time.

I would like to carry out these tests on a Solaris box. Is there a cmucl
for Solaris 2.6/2.7? I am happy to build or do some porting work but
this would require somebody pointing me in the right direction.

> As for Graham's mantra (write programs fast then fast programs) and
> good enough speed, I absolutely agree (see my reference to Erik's
> postings about optimizations).

I just kept this in because the sentiment is too good to remove ;)

Best Regards,

:) will


From: Will Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <85nlel$570$1@nnrp1.deja.com>
Thank you.


I've got some even more scary numbers here. These are tests run on an
old Axil under Solaris 2.5 with 64MB (I think it has two 60MHz
processors and a 33MHz bus or something) for 1000x1000 arrays. The
print(m) call in the code was commented out and the (dotimes (i 1000) a)
was changed to (dotimes (i 1000)) so that the output would not affect
the results.

The first set of results is for the `easy' case and the second set for
the `hard' case.

gcc no optimisation 1000x1000
real    0m0.98s user    0m0.60s sys     0m0.33s

real    0m1.32s user    0m0.91s sys     0m0.34s

gcc -O3 optimisation 1000x1000
real    0m0.66s user    0m0.31s sys     0m0.31s

real    0m0.93s user    0m0.58s sys     0m0.30s

Solaris cc no optimisation 1000x1000
real    0m0.96s user    0m0.63s sys     0m0.29s

real    0m1.42s user    0m1.00s sys     0m0.37s

Solaris cc -Ox4 optimisation 1000x1000
real    0m0.65s user    0m0.29s sys     0m0.32s

real    0m1.03s user    0m0.62s sys     0m0.25s

zfa on Allegro CL 4.3 [SPARC; R1] (7/19/99 14:01)

; cpu time (non-gc) 327,390 msec (00:05:27.390) user, 1,000 msec system
; cpu time (gc)     2,610 msec user, 990 msec system
; cpu time (total)  330,000 msec (00:05:30) user, 1,990 msec system
; real time  337,089 msec (00:05:37.089)
; space allocation:
;  54,011,131 cons cells, 4,000,000 symbols, 96,016,528 other bytes

; cpu time (non-gc) 315,520 msec (00:05:15.520) user, 1,480 msec system
; cpu time (gc)     2,780 msec user, 940 msec system
; cpu time (total)  318,300 msec (00:05:18.300) user, 2,420 msec system
; real time  324,831 msec (00:05:24.831)
; space allocation:
;  54,011,132 cons cells, 4,000,000 symbols, 96,016,528 other bytes

zfa-rm Allegro CL 4.3 [SPARC; R1] (7/19/99 14:01)

; cpu time (non-gc) 82,840 msec (00:01:22.840) user, 650 msec system
; cpu time (gc)     3,140 msec user, 250 msec system
; cpu time (total)  85,980 msec (00:01:25.980) user, 900 msec system
; real time  87,443 msec (00:01:27.443)
; space allocation:
;  16,017,011 cons cells, 0 symbols, 64 other bytes

; cpu time (non-gc) 127,130 msec (00:02:07.130) user, 1,040 msec system
; cpu time (gc)     3,120 msec user, 130 msec system
; cpu time (total)  130,250 msec (00:02:10.250) user, 1,170 msec system
; real time  132,695 msec (00:02:12.695)
; space allocation:
;  22,011,012 cons cells, 0 symbols, 64 other bytes

When the print routine was included in the C code, displaying the
array took over 10 minutes on stdout.

I'm sure this is not a fair comparison of acl vs C, inasmuch as the
system in the Lisp case ran out of memory, requiring a lot of swapping
(determined by looking at vmstat &c); the system is bottlenecked very
heavily by IO to disk, and the /tmp area is also too small for this.

This machine is a `broke-dick-dog' if there ever were one. Also the acl
is OLD. It's time to try this at home.

Best Regards,

:) w


From: Robert Monfera
Subject: Re: Copying arrays
Date: 
Message-ID: <387F6029.3CB606EA@fisec.com>
Will Deakin wrote:

[ results elided ]

> I'm sure this is not a fair comparison of acl vs c inasmuch as the
> system in the lisp case ran out of memory, requiring a lot of swapping
> (determined by looking at the vmstat &c and the system is bottle-necked
> very heavily by io to disk the /tmp area is also too small for this).

Yes, it pretty much renders the results meaningless, as there could be
orders of magnitude of difference.  I am somewhat surprised by the
consing (especially if you used the exact same code, which consed
nothing on ACL 5.0.1).  Also, the better response times of zfa-rm could
be explained by the fact that the system had paged in a bigger chunk of
the array.  Maybe if you run it multiple times, the whole 16MB array
gets mapped into memory.

Regards
Robert
From: Quentin Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <85o1r8$em6$1@barcode.tesco.net>
The answer to this is that I am a gurt big wazzock. I did not load the
compiled version of the file, just the interpreted versions and for this I
should be slapped.


Here are some more sensible results from my Linux K6-3 350 at home:

For a 1000x1000 matrix in the order zfa nil, zfa t, zfa-rm nil, zfa-rm t.

gcc 2.95.2

0.10user 0.07system 0:00.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

0.40user 0.03system 0:00.43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

gcc 2.95.2 -O3

0.08user 0.06system 0:00.14elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

0.34user 0.06system 0:00.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

Allegro CL Trial Edition 5.0.1 [Linux/X86] (6/29/99 16:11)

; cpu time (non-gc) 90 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  90 msec user, 0 msec system
; real time  87 msec
; space allocation:
;  1 cons cell, 0 symbols, 0 other bytes, 0 static bytes

; cpu time (non-gc) 350 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  350 msec user, 0 msec system
; real time  350 msec
; space allocation:
;  2 cons cells, 0 symbols, 0 other bytes, 0 static bytes

; cpu time (non-gc) 70 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  70 msec user, 0 msec system
; real time  69 msec
; space allocation:
;  1 cons cell, 0 symbols, 0 other bytes, 0 static bytes

; cpu time (non-gc) 340 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  340 msec user, 0 msec system
; real time  345 msec
; space allocation:
;  2 cons cells, 0 symbols, 0 other bytes, 0 static bytes

CMU Common Lisp release x86-linux 2.4.17  8 January 2000 build 275, running
on leontiadaes

Evaluation took:
  0.07 seconds of real time
  0.07 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.

Evaluation took:
  0.34 seconds of real time
  0.34 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.

Evaluation took:
  0.07 seconds of real time
  0.07 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.

Evaluation took:
  0.32 seconds of real time
  0.32 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.

clisp 1999-07-22

Real time: 1.552807 sec.
Run time: 1.55 sec.
Space: 0 Bytes

Real time: 1.665643 sec.
Run time: 1.66 sec.
Space: 0 Bytes

Real time: 0.982459 sec.
Run time: 0.98 sec.
Space: 0 Bytes

Real time: 1.558619 sec.
Run time: 1.56 sec.
Space: 0 Bytes

Got to go and bath the baby,

Best Regards,

:) will
From: Quentin Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <85q448$mm4$1@epos.tesco.net>
And now the baby is bathed, I have run and included gprof timings for C code
and repeated the timings for acl, clisp and the debian distribution cmucl
for x86.

Firstly: from these tests it appears that optimised native-compiled
Common Lisp code can provide array access times within +/-10% of the
optimised C code. This is well within the margin of error and variation
observed over multiple measurements.

Secondly: array access in Common Lisp to arrays consisting of vectors of
vectors via svref is marginally quicker than `normal' array access via
aref.

Finally: the optimisation approach for the Lisp may be suboptimal (I'm
no optimisation expert). However, given that the Lisp code provides
access about as fast as, or marginally quicker than, the C code, I would
be surprised if the C could be optimised much further.
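For reference, the two access styles being compared can be sketched as
follows. This is my own illustration, not the code that produced the
timings; it assumes the 1000x1000 fixnum arrays used in these tests and
the (simple-vector 1000) declaration mentioned with the CMUCL results.

```lisp
;; `Normal' 2D array access via AREF.
(defun zero-2d (a)
  (declare (type (simple-array fixnum (1000 1000)) a)
           (optimize speed (safety 0)))
  (dotimes (i 1000)
    (dotimes (j 1000)
      (setf (aref a i j) 0))))

;; Vector-of-vectors access via SVREF: fetch the row once,
;; then index into it with a second SVREF.
(defun zero-vov (a)
  (declare (type (simple-vector 1000) a)
           (optimize speed (safety 0)))
  (dotimes (i 1000)
    (let ((row (svref a i)))
      (declare (type (simple-vector 1000) row))
      (dotimes (j 1000)
        (setf (svref row j) 0)))))
```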

Best Regards,

:) will

ps: If anybody is still interested here are the timings.

All the code is as posted previously in this thread. It has been tweaked
so that the routines do not call the print routine in the C code, and so
that the top-level dotimes in the zfa and zfa-rm code returns nil and
not the array.

These timings are for 1000x1000 arrays. All timings were carried out
under Linux 2.2.14 using glibc 2.1.2 on a K6-3 350 PCI-bus i586.

Timings of C array access:
All timings for C execution were carried out using gcc 2.95.2. The results
are paired in column row and row column array access order.

`gcc' and time'd execution:
0.10user 0.07system 0:00.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

`gcc -pg' and output from gprof:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 56.25      0.09     0.09        1 90000.00 90000.00  array
 43.75      0.16     0.07        1 70000.00 70000.00  zfa

`gcc' and time'd execution:
0.40user 0.03system 0:00.43elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

`gcc -pg' and timed output from gprof:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 79.55      0.35     0.35        1 350000.00 350000.00  zfa
 20.45      0.44     0.09        1 90000.00 90000.00  array

`gcc -O2' and time'd execution:
0.08user 0.06system 0:00.14elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swap

`gcc -pg -O2' and output from gprof:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 53.85      0.07     0.07        1 70000.00 70000.00  array
 46.15      0.13     0.06        1 60000.00 60000.00  zfa

`gcc -O2' and time'd execution:
0.36user 0.05system 0:00.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

`gcc -pg -O2' and output from gprof:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 82.50      0.33     0.33        1 330000.00 330000.00  zfa
 17.50      0.40     0.07        1 70000.00 70000.00  array

`gcc -O3' and time'd execution:
NB: this `breaks' gprof

0.08user 0.06system 0:00.14elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

0.34user 0.06system 0:00.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (76major+986minor)pagefaults 0swaps

Timings of Common Lisp array access:
The results are paired in column row and row column array access order, as
above. The first and second results are for array access using aref. The
third and fourth set of results are for vector-of-vector array access using
svref.

Allegro CL Trial Edition 5.0.1 [Linux/X86] (6/29/99 16:11)

; cpu time (non-gc) 80 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  80 msec user, 0 msec system
; real time  86 msec
; space allocation:
;  1 cons cell, 0 symbols, 0 other bytes, 0 static bytes

; cpu time (non-gc) 340 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  340 msec user, 0 msec system
; real time  342 msec
; space allocation:
;  2 cons cells, 0 symbols, 0 other bytes, 0 static bytes

; cpu time (non-gc) 60 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  60 msec user, 0 msec system
; real time  62 msec
; space allocation:
;  1 cons cell, 0 symbols, 0 other bytes, 0 static bytes

; cpu time (non-gc) 330 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  330 msec user, 0 msec system
; real time  345 msec
; space allocation:
;  2 cons cells, 0 symbols, 0 other bytes, 0 static bytes

CMU Common Lisp release x86-linux 2.4.17  8 January 2000 build 275, running
on leontiadaes

Evaluation took:
  0.06 seconds of real time
  0.06 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.

Evaluation took:
  0.34 seconds of real time
  0.34 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.

NB: optimisation uses (declare (type (simple-vector 1000) a))

Evaluation took:
  0.06 seconds of real time
  0.06 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.

Evaluation took:
  0.32 seconds of real time
  0.32 seconds of user run time
  0.0 seconds of system run time
  0 page faults and
  0 bytes consed.

CLISP 1999-07-22 (July 1999)

Real time: 1.527073 sec.
Run time: 1.53 sec.
Space: 0 Bytes

Real time: 1.661797 sec.
Run time: 1.66 sec.
Space: 0 Bytes

Real time: 0.976969 sec.
Run time: 0.98 sec.
Space: 0 Bytes

Real time: 1.558619 sec.
Run time: 1.56 sec.
Space: 0 Bytes
From: Tim Bradshaw
Subject: Re: Copying arrays
Date: 
Message-ID: <ey3wvpc4k5a.fsf@cley.com>
* Will Deakin wrote:
> zfa on Allegro CL 4.3 [SPARC; R1] (7/19/99 14:01)

> ; cpu time (non-gc) 327,390 msec (00:05:27.390) user, 1,000 msec system
> ; cpu time (gc)     2,610 msec user, 990 msec system
> ; cpu time (total)  330,000 msec (00:05:30) user, 1,990 msec system
> ; real time  337,089 msec (00:05:37.089)
> ; space allocation:
> ;  54,011,131 cons cells, 4,000,000 symbols, 96,016,528 other bytes

I'm unclear what you are testing here.  But if this code is consing
*at all* you are dead in the water -- this has consed over 150Mbytes
of stuff.  See my other article posted just now.  My code conses
basically nothing on any system I can run it on (it won't run on clisp
or gcl, because of size limitations on the arrays, I think).

It may be that I've lost track of what is going on, and there are now
a whole bunch of functions called ZFA all doing something different.

Are you doing the `make a column-major object' trick to mash an array
in column-major order?  Then...  No, even then there should only be
4000 of these things; that can't be more than a few Mb.  Something is
badly wrong if you are consing like this.

Finally, my ZFA functions run in about 70Mb working set on Lisp, a
little over 60 for the C (actually I suspect that the extra 10Mb is
lisp environment that could be paged out but I have enough memory that
it isn't being).

--tim
From: Tim Bradshaw
Subject: Re: Copying arrays
Date: 
Message-ID: <ey3embk61d7.fsf@cley.com>
* Marco Antoniotti wrote:

> I ran the example (without much thinking) in CMUCL and C (gcc)

> In CMUCL on an Ultra 2 you get about 5 seconds for the good order and
> about 10 seconds for the bad order case.  GC eats up about 3.5 secs in
> both cases.

> If you turn off GC, then CMUCL is pretty good, and the time is about
> 2.39 secs for the good case and about 7.5 secs for the bad case.

What?  Can you post your code?  My cmucl times are way better than
that.  For this code:

    (defvar *a* (make-array '(4000 4000)
			    :element-type 'fixnum
			    :initial-element 0))

    (defun zfa (a &optional bad-order-p)
      (declare (type (simple-array fixnum (4000 4000))
		     a)
	       (optimize speed
			 (safety 0)))
      (if bad-order-p
	  (dotimes (i 4000 a)
	    (declare (type fixnum i))
	    (dotimes (j 4000)
	      (declare (type fixnum j))
	      (setf (aref a j i) 0)))
	  (dotimes (i 4000 a)
	    (declare (type fixnum i))
	    (dotimes (j 4000)
	      (declare (type fixnum j))
	      (setf (aref a i j) 0)))))

    (defun zfa-rma (a)
      (declare (type (simple-array fixnum (* *)) a)
	       (optimize speed (safety 0)))
      (dotimes (i (array-total-size a) a)
	(declare (type fixnum i))
	(setf (row-major-aref a i) 0)))

I get runtimes of about 0.8-0.9 secs for ZFA and about 0.4-0.5 for
ZFA-RMA.  And no GC overhead. (averaged over 500 runs, ZFA-RMA gets
0.42 or something).

Will's C code comes in at something over a second, but the comparison
is not fair because it has to build the array (although stack
allocation should be very fast).  A modified version which calls zfa
500 times on the same array runs at about .45 secs per iteration, and
that seems to be reasonably asymptotically correct.

So for this code, CMUCL is competitive with gcc, on a sparc, if you
use ROW-MAJOR-AREF, and within a small factor if you don't.  Which
means that it's not really competitive, because you can't always use
ROW-MAJOR-AREF for matrix computations without putting a lot of
overhead into your own code.

Notes: (1) The fixnum case is fairly weird because fixnums will probably
fit into general arrays, so SVREF doesn't lose as badly as it should.
(2) If you have a support contract, Franz can tell you tricks to make
array stuff in acl pretty competitive with CMUCL I think.

--tim
From: Oleg Goldshmidt
Subject: Re: Copying arrays
Date: 
Message-ID: <lvzoucxt94.fsf@NOSPAM.bfr.co.il>
Barry Margolin <······@bbnplanet.com> writes:

> >For example: I would like the values of the second column of the array:
> >
> >#3(((1 2 3 4) (5 6 7 8) (9 10 11 12)))
> >
> >that is, #(2 6 10), say. Or of the middle row, #(5 6 7 8).
> 
> You can get the middle row by using a displaced array, but there isn't any
> way to get a column.

Well, I suppose one can write a custom function that will access the
right elements in a loop, cons up a list, and create an array from
that, e.g. (quick'n'dirty)

(defun get-col (arr3 col)
  (let ((dim (nth 1 (array-dimensions arr3)))
        (lst nil))
    (dotimes (n dim lst)
      (setf lst (cons (aref arr3 0 (- dim n 1) col) lst)))
    (make-array dim :initial-contents lst)))

With the array above denoted as x, (get-col x 1) will give #(2 6 10),
(get-col x 2) will get you #(3 7 11). Specific to 3D arrays, and
imperfect in so many other ways, but it's a start.
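For the common 2D case the same idea can be written without the
intermediate list. This is a hypothetical helper along the same lines,
not code from the thread:

```lisp
(defun array-column (a j)
  "Return column J of the 2D array A as a freshly allocated vector."
  (let* ((rows (array-dimension a 0))
         (col (make-array rows :element-type (array-element-type a))))
    (dotimes (i rows col)
      (setf (aref col i) (aref a i j)))))
```

As Barry notes, a column (unlike a row) cannot be expressed as a
displaced array, because its elements are not contiguous in row-major
storage.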

-- 
Oleg Goldshmidt | BLOOMBERG L.P. (BFM) | ····@bfr.co.il
"... We work by wit, and not by witchcraft;
 And wit depends on dilatory time." - W. Shakespeare.
From: Marc Battyani
Subject: Re: Copying arrays
Date: 
Message-ID: <CA1234D40AD732D6.B579675939EDC3FD.D6B84F19A2B57380@lp.airnews.net>
Will Deakin <········@pindar.com> wrote in message
·················@nnrp1.deja.com...
> is there a `good' way of copying arbitary simple-arrays?
> (defun copy-array (a)
>   "Returns a copy of the given array"
>   (let* ((dimensions (array-dimensions a))
>          (size (apply #'* dimensions))
>          (c (make-array dimensions))
>          (u (make-array size :displaced-to a))
>          (v (make-array size :displaced-to c)))
>     (dotimes (i size c)
>       (setf (aref v i) (aref u i)))))

Instead of :displaced-to, use row-major-aref (§15.2 in the HyperSpec).
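Combining this advice with Bernhard's earlier suggestion about fill
pointers, a copy-array might look like the sketch below (untested, and
handling only the element-type and fill-pointer cases raised in this
thread):

```lisp
(defun copy-array (a)
  "Return a copy of array A, preserving its element type and, for
vectors, any fill pointer.  ROW-MAJOR-AREF copies every element,
both below and above the fill pointer."
  (let ((c (make-array (array-dimensions a)
                       :element-type (array-element-type a)
                       ;; Only vectors can have fill pointers, so this
                       ;; is NIL for multidimensional arrays.
                       :fill-pointer (and (array-has-fill-pointer-p a)
                                          (fill-pointer a)))))
    (dotimes (i (array-total-size a) c)
      (setf (row-major-aref c i) (row-major-aref a i)))))
```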

Marc Battyani
From: Will Deakin
Subject: Re: Copying arrays
Date: 
Message-ID: <85fg33$5hj$1@nnrp1.deja.com>
Thanks.

:) will

