From: Mike G.
Subject: Parallel matrix ops
Date: 
Message-ID: <e202176c-d284-425c-8665-13a018d7ec2c@p69g2000hsa.googlegroups.com>
Hi all,

I was bored today, so I grabbed "matrix.cl" from the CMU lisp
repository and parallelized it.

My basic method was to define a macro, which I call "partition", that
is responsible for partitioning a computation across the rows of a
matrix. It uses (num-rows matrix) and a user-defined special variable
*NUM-THREADS* to 'chunk' the matrix operations into multiple threads.
Functions which can be partitioned must have a special calling form.

(defun foo (arg1 .. argN &optional (start nil) (end nil) (result nil))
    ...)

To use the mechanism, I might do:

(defun foo-matrix-func (matrix1 matrix2
                        &optional (start nil) (end nil) (result nil))
  (if (and (size-of-matrices-good matrix1 matrix2)
           (null start))
      ;; Top-level call: allocate the result, then fan out across threads.
      (let ((result (make-array (array-dimensions matrix1))))
        (partition foo-matrix-func result matrix1 matrix2))
      ;; Worker call: fill in rows START through END of RESULT.
      (loop for row from start to end do
        (loop for col from 0 below (num-cols matrix1) do
          (setf (aref result row col)
                (foo (aref matrix1 row col) (aref matrix2 row col)))))))

My partition macro is included below. All this works well enough. I
get very nearly a 50% speed increase for all operations that I've
tuned (I haven't done Gaussian elimination, obviously). At some point
soon -- likely during my next span of boredom -- I'll empirically
determine the optimal sizes for cutting over from single-threading for
small matrices to multi-threading for large matrices.
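
To give the shape of it, the cutover might look like the sketch below
once a threshold is measured. The threshold variable and its value are
placeholders, not tuned numbers:

;; Hypothetical cutover: below some measured row count, thread startup
;; costs more than it saves, so run the worker loop in this thread.
(defparameter *PARTITION-THRESHOLD* 100) ; placeholder value, to be tuned

(defmacro maybe-partition (func-name res &rest matrices)
  `(if (< (num-rows ,(first matrices)) *PARTITION-THRESHOLD*)
       ;; Small matrix: call the worker directly over all rows.
       (progn (,func-name ,@matrices 0 (1- (num-rows ,(first matrices))) ,res)
              ,res)
       ;; Large matrix: fan out across *NUM-THREADS* as usual.
       (partition ,func-name ,res ,@matrices)))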

Comments?

(defmacro partition (func-name res &rest matrices)
  (let ((rows (gensym))
        (starts (gensym))
        (ends (gensym))
        (threads (gensym))
        (i (gensym))
        (j (gensym))
        (width (gensym))
        (start (gensym))
        (end (gensym)))
    `(let* ((,rows (1- (num-rows ,(first matrices))))   ; index of the last row
            (,width (ceiling (1+ ,rows) *NUM-THREADS*)) ; rows per chunk
            (,starts (loop for i from 0 to ,rows by ,width collect i))
            (,ends (map 'list #'(lambda (x)
                                  (min (+ x (1- ,width)) ,rows))
                        ,starts))
            (,threads nil))

       (do ((,i ,starts (cdr ,i))
            (,j ,ends (cdr ,j)))
           ((null ,i) nil)
         ;; DO steps ,i and ,j by assignment, so the worker lambdas must
         ;; not close over them directly; bind this chunk's bounds first.
         (let ((,start (car ,i))
               (,end (car ,j)))
           (push (sb-thread:make-thread
                  #'(lambda ()
                      (,func-name ,@matrices ,start ,end ,res)))
                 ,threads)))
       (wait-for-threads-to-complete ,threads)
       ,res)))
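
(WAIT-FOR-THREADS-TO-COMPLETE isn't shown; a minimal sketch using
nothing beyond SBCL's thread API would simply join each worker:)

;; Block until every thread in THREADS has finished.
(defun wait-for-threads-to-complete (threads)
  (mapc #'sb-thread:join-thread threads))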

From: Mike G.
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <da91d3ce-4a6f-4f87-94bc-e758ff960441@r60g2000hsc.googlegroups.com>
On Dec 19, 10:46 pm, "Mike G." <···············@gmail.com> wrote:
>
> My partition macro is included below. All this works well enough. I
> get very nearly a 50% speed increase for all operations that I've
> tuned

Just a correction to myself: run-time is nearly halved, so it's
nearly a 100% speed increase. Duh.

-M
From: D Herring
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <69KdnfQsQoeMgfbanZ2dnUVZ_jSdnZ2d@comcast.com>
Mike G. wrote:
> Hi all,
> 
> I was bored today, so I grabbed "matrix.cl" from the CMU lisp
> repository and parallelized it.
...
> My partition macro is included below. All this works well enough. I
> get very nearly a 50% speed increase for all operations that I've
> tuned (I haven't done Gaussian elimination, obviously). At some point
> soon -- likely during my next span of boredom -- I'll empirically
> determine the optimal sizes for cutting over from single-threading for
> small matrices to multi-threading for large matrices.
> 
> Comments?

Yes!

Seriously, Lisp has the basic infrastructure to outperform C/C++ in 
numerics, but the pieces haven't fallen into place yet.

Read this page from a C++ numerics toolkit:
http://www.oonumerics.org/blitz/whatis.html

What on that page wouldn't be easier to do in Lisp?  They're using C++
templates to do what Lisp macros were designed for.  If you haven't
tried it yourself, believe me when I say that sophisticated C++
templates are harder to develop and easier to break than the
equivalent Lisp macros.

The basic goal of Blitz++, VSIPL++, and other C++ template libraries
is to take an expression such as
y = A*x + b
and evaluate it as a single loop instead of the more traditional C
expansion
tmp = A*x
y = tmp + b
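
In Lisp a macro could expand the expression directly into that fused
loop. Written out by hand, the fused version would look something like
this (the function and variable names are just illustrative):

;; One pass over the rows of A: accumulate the dot product for the
;; row and add the corresponding entry of b -- no temporary for A*x.
(defun fused-axpb (a x b y)
  (dotimes (row (array-dimension a 0) y)
    (let ((acc 0))
      (dotimes (col (array-dimension a 1))
        (incf acc (* (aref a row col) (aref x col))))
      (setf (aref y row) (+ acc (aref b row))))))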

SIMD instructions also tend to outperform normal loop constructs.  If 
you're interested, there is raw support for x86 instructions in the 
sb-simd project.
http://common-lisp.net/project/sb-simd/
http://repo.or.cz/w/sbcl/simd.git

- Daniel
From: Mike G.
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <be18d399-1037-4ec4-8119-881c657baf54@x29g2000prg.googlegroups.com>
On Dec 20, 8:51 pm, D Herring <········@at.tentpost.dot.com> wrote:
> Seriously, Lisp has the basic infrastructure to outperform C/C++ in
> numerics, but the pieces haven't fallen into place yet.

Indeed. Do you happen to know of any groups that are trying to
organize a project to make these pieces fall into place? Next time I'm
bored, I'd rather hack some code that someone else has at least a
snowball's chance in hell of using.

> SIMD instructions also tend to outperform normal loop constructs.  If
> you're interested, there is raw support for x86 instructions in the
> sb-simd project.
> http://common-lisp.net/project/sb-simd/
> http://repo.or.cz/w/sbcl/simd.git
>
> - Daniel

Thanks, I found this recently myself -- I'm very much interested in
this. I've been wanting to play with vector operations for some time
now, but apart from some very simple stuff in assembler, never got
much further than a cursory review of the AMD 3DNow! ops on the K6.
I've never done anything with SSE, for example.

I haven't had a chance to try to build the sb-simd stuff yet. Has
anyone done this? Does it apply cleanly against a recent SBCL?

If I can get it working, I think I'll try to add SIMD support to this
little package next. Actually, I think I'll back up and dig out my old
matrix library that did dense and sparse storage and work with that.
From: D Herring
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <z6mdnY-Z7JVc7fbanZ2dnUVZ_qqgnZ2d@comcast.com>
Mike G. wrote:
> On Dec 20, 8:51 pm, D Herring <········@at.tentpost.dot.com> wrote:
>> Seriously, Lisp has the basic infrastructure to outperform C/C++ in
>> numerics, but the pieces haven't fallen into place yet.
> 
> Indeed. Do you happen to know of any groups that are trying to
> organize a project to make these pieces fall into place? Next time I'm
> bored, I'd rather hack some code that someone else has at least a
> snowball's chance in hell of using.

I don't know of any organized project, but there is interest scattered
around the web.

Here's some relevant info:
http://www.cs.berkeley.edu/~mhoemmen/lisp-matrix/www/

An *old* package:
ftp://ftp.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/math/matrix/0.html

MatLisp wraps the BLAS and LAPACK.
http://matlisp.sourceforge.net/

NLisp is a similar system.
http://nlisp.info/

Common-math has some files under numerics/linear-algebra.
http://common-lisp.net/websvn/listing.php?repname=common-math

Cyrus Harmon might be interested.  http://cyrusharmon.com/blog
In particular, see clem, ch-image, and fftw-ffi.

Christophe Rhodes started sb-simd:
http://sourceforge.net/mailarchive/message.php?msg_id=20011012102125.A32093%40cam.ac.uk

I'm interested.

>> SIMD instructions also tend to outperform normal loop constructs.  If
>> you're interested, there is raw support for x86 instructions in the
>> sb-simd project.
>> http://common-lisp.net/project/sb-simd/ 
>> http://repo.or.cz/w/sbcl/simd.git
> 
> Thanks, I found this recently myself -- I'm very much interested in
> this. I've been wanting to play with vector operations for some time
> now, but apart from some very simple stuff in assembler, never got
> much further than a cursory review of the AMD 3DNow! ops on the K6.
> I've never done anything with SSE, for example.
> 
> I havent had a chance to try and build the sb-simd stuff yet. Has
> anyone done this? Does it apply cleanly against a recent SBCL?

Yes, it applies cleanly; no, I don't think it's been tested or used
much in recent history.

> If I can get it working, I think I'll try to add SIMD support to this
> little package next. Actually, I think I'll back up and dig out my old
> matrix library that did dense and sparse storage and work with that.

Sounds like a plan.  If you go down that path, liboil may provide 
useful inspiration.
http://liboil.freedesktop.org/wiki/

other SSE links:
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
http://www.tommesani.com/SSE.html

- Daniel
From: Marco Antoniotti
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <8c51c121-c8e4-4238-985a-4457b20f2615@i12g2000prf.googlegroups.com>
On Dec 21, 8:52 am, D Herring <········@at.tentpost.dot.com> wrote:
> Mike G. wrote:
> > On Dec 20, 8:51 pm, D Herring <········@at.tentpost.dot.com> wrote:
> >> Seriously, Lisp has the basic infrastructure to outperform C/C++ in
> >> numerics, but the pieces haven't fallen into place yet.
>
> > Indeed. Do you happen to know of any groups that are trying to
> > organize a project to make these pieces fall into place? Next time I'm
> > bored, I'd rather hack some code that someone else has at least a
> > snowball's chance in hell of using.
>
> I don't know of any organized project; but there is interest scattered
> around the web.
>
> Here's some relevant info:
> http://www.cs.berkeley.edu/~mhoemmen/lisp-matrix/www/
>
> An *old* package:
> ftp://ftp.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/m...
>
> MatLisp wraps the BLAS and LAPACK.
> http://matlisp.sourceforge.net/
>
> NLisp is a similar system.
> http://nlisp.info/
>
> Common-math has some files under numerics/linear-algebra.
> http://common-lisp.net/websvn/listing.php?repname=common-math
>

Common Math is in a bit of stasis.  I have more code in my personal
repository, but I have had other chores to perform.  In any case, any
help is welcome.

Cheers
--
Marco
From: Mike G.
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <0cd39a5d-7bc2-4b0f-8ec6-c706b0bf0364@s8g2000prg.googlegroups.com>
On Dec 21, 2:52 am, D Herring <········@at.tentpost.dot.com> wrote:
> An *old* package:
> ftp://ftp.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/m...

:) This is the one I multi-threaded.

> MatLisp wraps the BLAS and LAPACK.
> http://matlisp.sourceforge.net/

I used this for a little while - it was OK. It was inconvenient to
have to convert CL arrays to Matlisp arrays, though.

> NLisp is a similar system.
> http://nlisp.info/

This looks neat. It also doesn't look as if it's been updated in a
while. :(

> Yes, it applies cleanly; no, I don't think its been tested or used
> much in recent history.

Well, I have a newly-rolled SBCL (1.0.12) with the sb-simd patches.
I'll play with it for a few days and post here about whether it seems
to be working or not.
From: Albert Krewinkel
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <fwuabo3w5hc.fsf@pc06.inb.uni-luebeck.de>
"Mike G." <···············@gmail.com> writes:

> On Dec 21, 2:52 am, D Herring <········@at.tentpost.dot.com> wrote:
>
>> NLisp is a similar system.
>> http://nlisp.info/
>
> This looks neat. It also doesn't look as if it's been updated in a
> while. :(

Depends on your definition of 'a while'.  The website does lag behind
quite a bit, but the last release was done just one month ago.  Take a
look at their sourceforge project page: http://sf.net/projects/nlisp

Cheers
From: Jason Nielsen
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <Pine.LNX.4.64.0712211300520.8633@taz>
> Here's some relevant info:
> http://www.cs.berkeley.edu/~mhoemmen/lisp-matrix/www/
>
> An *old* package:
> ftp://ftp.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/math/matrix/0.html
>
> MatLisp wraps the BLAS and LAPACK.
> http://matlisp.sourceforge.net/
>
> NLisp is a similar system.
> http://nlisp.info/
>
> Common-math has some files under numerics/linear-algebra.
> http://common-lisp.net/websvn/listing.php?repname=common-math
>
> Cyrus Harmon might be interested.  http://cyrusharmon.com/blog
> In particular, see clem, ch-image, and fftw-ffi.
>
> Christophe Rhodes started sb-simd:
> http://sourceforge.net/mailarchive/message.php?msg_id=20011012102125.A32093%40cam.ac.uk
>

Don't forget Femlisp, which includes a Lisp version of BLAS and some
support for sparse matrices:
http://www.femlisp.org/

Jason
From: Mike G.
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <af1952ee-300c-4c7c-ac76-fe710ac2f47d@r60g2000hsc.googlegroups.com>
On Dec 21, 1:02 pm, Jason Nielsen <····@math.carleton.ca> wrote:
> > Here's some relevant info:
> >http://www.cs.berkeley.edu/~mhoemmen/lisp-matrix/www/
>
> > An *old* package:
> >ftp://ftp.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/m...
>
> > MatLisp wraps the BLAS and LAPACK.
> >http://matlisp.sourceforge.net/
>
> > NLisp is a similar system.
> >http://nlisp.info/
>
> > Common-math has some files under numerics/linear-algebra.
> >http://common-lisp.net/websvn/listing.php?repname=common-math
>
> > Cyrus Harmon might be interested.  http://cyrusharmon.com/blog
> > In particular, see clem, ch-image, and fftw-ffi.
>
> > Christophe Rhodes started sb-simd:
> >http://sourceforge.net/mailarchive/message.php?msg_id=20011012102125....
>
> Don't forget Femlisp, which includes a Lisp version of BLAS and some
> support for sparse matrices:
> http://www.femlisp.org/
>
> Jason

Femlisp uses Matlisp.

Parallelizing Matlisp may be possible; some support for this would
probably need to be pushed down into the Fortran. Not gonna be me ;-P
OTOH, it's been a while since I've used Matlisp and I don't have it
installed to check -- but perhaps there are internal functions in the
package that may allow one to work around it.

-M
From: Nicolas Neuss
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <87odci22bb.fsf@ma-patru.mathematik.uni-karlsruhe.de>
"Mike G." <···············@gmail.com> writes:

> Femlisp uses Matlisp.

Not really.  It implements basic matrix routines in CL, but tries to stay
very Matlisp-compatible.  Matlisp routines can be called, if necessary,
using a simple wrapper.

Nicolas
From: Mike G.
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <cb972745-fc5d-4be6-9951-dc62675179af@s48g2000hss.googlegroups.com>
On Dec 22, 10:10 am, Nicolas Neuss <········@math.uni-karlsruhe.de>
wrote:
> "Mike G." <···············@gmail.com> writes:
> > Femlisp uses Matlisp.
>
> Not really.  It implements basic matrix routines in CL, but tries to stay
> very Matlisp-compatible.  Matlisp routines can be called, if necessary,
> using a simple wrapper.
>
> Nicolas

Heh. You're the author, huh? <pulling foot out of mouth>.

OK - so Matlisp is used for BLAS / LAPACK integration, then? If you
don't want BLAS, then the compatible code in CL is used? I'm just
trying to understand the statement from the webpage, "Matlisp yields
integration of the BLAS and LAPACK libraries" ... It sounds to me like
Matlisp is used, at least in some respect.

Very impressive work, by the way - gnuplot and OpenDX graphical
support -- very, very nice.

Could you offer any suggestions on how to effectively parallelize
Matlisp proper? An ATLAS-based Matlisp with SIMD vectorization would
be nice -- not as nice as having SIMD in Lisp, of course -- but at
least the matrix support would be juicy.

-M
From: Nicolas Neuss
Subject: Re: Parallel matrix ops
Date: 
Message-ID: <87tzm8ph26.fsf@ma-patru.mathematik.uni-karlsruhe.de>
"Mike G." <···············@gmail.com> writes:

> On Dec 22, 10:10 am, Nicolas Neuss <········@math.uni-karlsruhe.de>
> wrote:
>> "Mike G." <···············@gmail.com> writes:
>> > Femlisp uses Matlisp.
>>
>> Not really.  It implements basic matrix routines in CL, but tries to stay
>> very Matlisp-compatible.  Matlisp routines can be called, if necessary,
>> using a simple wrapper.
>>
>> Nicolas
>
> Heh. You're the author, huh? <pulling foot out of mouth>.

Yes:-)

> OK - so Matlisp is used for BLAS / LAPACK integration, then? If you
> don't want BLAS, then the compatible code in CL is used? I'm just
> trying to understand the statement from the webpage, "Matlisp yields
> integration of the BLAS and LAPACK libraries" ... It sounds to me like
> Matlisp is used, at least in some respect.

Femlisp needed Matlisp some years ago.  However, I changed this, because:

- Matlisp is not easily portable (especially because it depends on a
  Fortran compiler and libraries).  It is also not available as a Linux
  package.

- It comes with Fortran libraries which have to be compiled.  IMO, the
  right way would be to link against some binary.

- It is geared towards handling large matrices; the overhead for small
  matrices is very large.

- Femlisp needs only a rather small and simple BLAS/LAPACK subset.  Thus,
  it was feasible to write this subset from scratch in CL.
  
- Matlisp (and standard BLAS/LAPACK) does not provide sparse matrices.
  Efficient sparse matrix code (compressed storage schemes) can be
  implemented along the same lines as the code for full matrices (a
  minimal sketch follows below).
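
For concreteness, here is a minimal sketch of such a compressed-row-
storage (CSR) scheme -- the names and layout are illustrative, not
Femlisp's actual representation:

;; Only the nonzero entries are stored.  ROW-PTR has NROWS+1 entries;
;; row I's nonzeros sit at positions ROW-PTR[I] .. ROW-PTR[I+1]-1 of
;; VALUES, with their column indices in COL-IDX.
(defstruct csr
  values    ; nonzero entries, in row order
  col-idx   ; column index of each entry in VALUES
  row-ptr)  ; index into VALUES where each row starts

(defun csr-times-vector (m x y)
  "Store M * X into the vector Y; same loop shape as the dense code."
  (let ((vals (csr-values m))
        (cols (csr-col-idx m))
        (rows (csr-row-ptr m)))
    (dotimes (i (1- (length rows)) y)
      (let ((acc 0))
        (loop for k from (aref rows i) below (aref rows (1+ i))
              do (incf acc (* (aref vals k) (aref x (aref cols k)))))
        (setf (aref y i) acc)))))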

The remaining note on the Femlisp homepage is meant to refer to Matlisp in
the same sense as it refers to Maxima and other tools: it is nice to have
them available if the need arises.

> Very impressive work, by the way - gnuplot and OpenDX graphical
> support -- very, very nice.

Thanks.

> Could you offer any suggestions on how to effectively parallelize Matlisp
> proper? An ATLAS-based Matlisp with SIMD vectorization would be nice --
> not as nice as having SIMD in Lisp, of course -- but at least the matrix
> support will be juicy.

I am more interested in the higher-level numerical solution of partial
differential equations.  I.e., I would very much like to use some
existing, well-established, and good-performing matrix library, and
would only do something myself at the matrix level because what is
there at the moment does not fit my needs.  Thus, I am not the right
person to answer this question (maybe Mark Hoemmen could tell you more).

However, for (non-shared-memory) parallelization of sparse matrix
operations, one should study how C/C++ libraries like PETSc or DUNE
(www.dune-project.org) attack the problem.  I guess that the first step
would be to implement portable MPI bindings for CL, preferably with a
higher-level interface using the MOP (some metaclass DISTRIBUTED-CLASS).

Merry Christmas,
Nicolas