optimizing diffusion code (was arithmetic speed)

From: Eric Lavigne
Subject: optimizing diffusion code (was arithmetic speed)
Date: Wed, 27 Apr 2005 00:54:18 +0000
Message-ID: <1114563258.327590.86130@f14g2000cwb.googlegroups.com>

>
> Good luck with the project. The c.l.l. crowd loves tuning challenges,
so
> keep them up to date. Is the latest source up in one of the articles?


I will post this email as the start of a new thread. The old thread has
become rather cluttered with multiple parallel conversations and the
old topic name is no longer appropriate.

I put a link to the source in my original post. I update the file every
hour or two.
plaza.ufl.edu/lavigne/
Neutron Diffusion Code 3

The latest version (of the gauss-seidel function) is 6 times faster
than the version I had when I started the thread. It's not yet clear to
me whether this is fast enough, but it is a huge improvement.

> have ACL enterprise with profiling, maybe I could spot something. If
you
> have a standalone source including the test case you are trying to
tune
> (perhaps abbreviated to facilitate repeated experiments) I will fire
it
> up and see what I can see (not that I am a tuning guru).

My latest improvement came from eliminating some unnecessary
destruction/creation of nodes. Keeping the old nodes and just changing
a few values was much faster. It doesn't take a tuning guru to notice
things like this, but it sure took this newbie a long time.

My current set of commands for testing are listed below. It may be time
to change them, though, now that gauss-seidel is a bit faster. It is
likely that the bottleneck has moved, and I need to find it. Probably
the next bottleneck will be expand-flux. It is not called nearly as
often, but it is extremely slow. The t at the end of some lines is
meant to supress output for my convenience, a trick that worked in
clisp but not in LispWorks. Note that gauss-seidel will run 10 times
faster if you run it a second time without first resetting the flux
(with the dotimes stuff), because it will benefit from that fact that
its job has already been done.

(setf flux (make-array '(9 9 9))) t
(setf my-material (make-material :d .9 :v-sigma-f .07 :sigma-a .066))
(dotimes (i (array-dimension flux 0)) (dotimes (j (array-dimension flux
1)) (dotimes (k (array-dimension flux 2)) (setf (aref flux i j k)
(make-node :material my-material :flux 1.0)))))
(setf source (set-source flux 1.5)) t
(compile 'gauss-seidel)
(compile 'phi-calc)
(time (gauss-seidel source flux (/ 40 (1- (array-dimension flux 0)))))

> btw, do you
> have a feel for how fast it would run in C or Fortran or something to
> set the height of the performance bar we are trying to hit?
>

I don't have any numbers for that because my project is a bit unique...
Not only am I the only person doing the assignment in Lisp (Fortran is
the most common, some are using C++), but I am also the only one doing
it in 3D. This is a cylinder problem, and others are modeling that
cylinder in 2D (r,z) cylindrical coordinates (and are still not finding
it easy). I am writing a 3D arbitrary geometry code with the intention
of modeling that cylinder in it later. This makes my code more
complicated (arbitrary geometry) and slower (50-100 times more nodes
due to extra dimension). Also, this problem scales much worse than
linearly with number of nodes because of convergence issues.

The performance bar I am trying to hit is 30 minutes for the whole
program. This would allow a halfway reasonable rate for my
edit-compile-run development cycle. I don't think I'm anywhere close to
this number, since expand-flux alone can run for 30 minutes and is
called more than 20 times in the full program.

> cheers, kenny
> 
>

Re: optimizing diffusion code (was arithmetic speed) Wade Humeniuk
- Re: optimizing diffusion code (was arithmetic speed) Eric Lavigne
  - Re: optimizing diffusion code Edi Weitz
Re: optimizing diffusion code (was arithmetic speed) Eric Lavigne
- Re: optimizing diffusion code Edi Weitz
Re: optimizing diffusion code (was arithmetic speed) Paul Dietz
Re: optimizing diffusion code (was arithmetic speed) Eric Lavigne
Re: optimizing diffusion code Eric Lavigne
Re: optimizing diffusion code (was arithmetic speed) Eric Lavigne
- Re: optimizing diffusion code (was arithmetic speed) Wade Humeniuk
Re: optimizing diffusion code (was arithmetic speed) Sampo Smolander
- Re: optimizing diffusion code (was arithmetic speed) Eric Lavigne
  - Re: optimizing diffusion code (was arithmetic speed) Sampo Smolander
    - Re: optimizing diffusion code (was arithmetic speed) Eric Lavigne
      - Re: optimizing diffusion code (was arithmetic speed) Kenny Tilton
        Re: optimizing diffusion code (was arithmetic speed) Eric Lavigne
        Re: optimizing diffusion code (was arithmetic speed) Larry Clapp
        Re: optimizing diffusion code (was arithmetic speed) Eric Lavigne
Re: optimizing diffusion code Nicolas Neuss

From: Wade Humeniuk
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Wed, 27 Apr 2005 01:19:21 +0000
Message-ID: <tMBbe.1483$3V3.1155@edtnps89>

Eric Lavigne wrote:

> 
> (setf flux (make-array '(9 9 9))) t

Is your flux array just an array of single-float(s)? (or even for that matter
double-float).  If it is then you should probably create flux arrays like

(make-array '(9 9 9) :element-type 'single-float)

and then when you use it in a function add this declaration,

(declare (type (simple-array single-float (* * *)) <var>)

e.g. in gauss-seidal add to the current declaration

(type (simple-array single-float (* * *)) flux)

I am not sure if anything has to be done with copy-array.

Wade

From: Eric Lavigne
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Wed, 27 Apr 2005 01:40:14 +0000
Message-ID: <1114566014.624480.310760@o13g2000cwo.googlegroups.com>

Flux is an array of nodes. A node looks like this:
node
          -flux (single-float)
          -material(struct)
                    D(single-float)
                    v-sigma-f(single-float)
                    sigma-a(single-float)
So I think specifying that it will contain nodes in advance is probably
not a big deal since a pointer to the node will be needed anyway. I do
save memory and reduce copying by making only a few material structs,
with many nodes having pointers to the same material.

Someone recommended switching from single-floats to double-floats.
Apparently they are just as fast and are more precise. My only problem
with taking this advice is that I don't understand the notation. If I
pass 1.0 to a function and use (declare (double-float... on that
argument, is that an error because I passed a single-float and called
it double? Do I need to say 1d0 instead?

Slightly off-topic, I'm trying to use the Lisp Resource Kit, based on
Knoppix. I downloaded an ISO, burned it to CD, but it doesn't boot. How
do I make CDs bootable, and is there a free windows program that I can
use for the job?

From: Edi Weitz
Subject: Re: optimizing diffusion code
Date: Wed, 27 Apr 2005 08:23:45 +0000
Message-ID: <umzrkeige.fsf@agharta.de>

On 26 Apr 2005 18:40:14 -0700, "Eric Lavigne" <············@gmail.com> wrote:

> Slightly off-topic, I'm trying to use the Lisp Resource Kit, based
> on Knoppix. I downloaded an ISO, burned it to CD, but it doesn't
> boot. How do I make CDs bootable,

If the ISO describes a bootable CD than the CD-R you burned should
boot I think.

> and is there a free windows program that I can use for the job?

The Debian FAQ says that this is a freeware program that can burn CDs
from ISOs:

  <http://www.cdburnerxp.se/>

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")

From: Eric Lavigne
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Wed, 27 Apr 2005 22:25:19 +0000
Message-ID: <1114640719.075027.61410@l41g2000cwc.googlegroups.com>

>Can you change your "signature" to mention
> on what webpage your code  (and running hints)
> can be found?  When I read the above, I wanted
> to  time the code myself.

It's on my webpage at
plaza.ufl.edu/lavigne/
Look for Neutron Diffusion Code 3.
Running hints are below it.

Note that I don't blame CMUCL for the poor performance. A compiler
should be measured on its performance with good code and its general
helpfulness with bad code. CMUCL gets better marks on both points. In
benchmarks it does better (with good coders providing the benchmark
code, not newbies like me). When I feed it bad code it runs a bit
slower than CLISP but gives me the comments I need to get better. CLISP
never complained about my mistakes, even when I gave it an integer and
called it a float (fixed this morning when I tried to run it on ACL).
CMUCL is the first compiler I've seen, for any language, that gives
warnings about... your code would be 10 ticks per iteration faster
if... On the other hand, I don't understand why it gives me so many
messages about garbage collection. I would think that printing those
messages would take more time than actually collecting the garbage.

I will add a signature right after this post. Thanks for the suggestion.

From: Edi Weitz
Subject: Re: optimizing diffusion code
Date: Wed, 27 Apr 2005 22:39:43 +0000
Message-ID: <uzmvj6dzk.fsf@agharta.de>

On 27 Apr 2005 15:25:19 -0700, "Eric Lavigne" <············@gmail.com> wrote:

> On the other hand, I don't understand why it gives me so many
> messages about garbage collection. I would think that printing those
> messages would take more time than actually collecting the garbage.

You can turn them off with (setq ext:*gc-verbose* nil).

Add that form to your initialization file (~/.cmucl-init) and you'll
never see those messages again... :)

Cheers,
Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")

From: Paul Dietz
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Wed, 27 Apr 2005 20:25:47 +0000
Message-ID: <d4orvo$39b$1@avnika.corp.mot.com>

Eric Lavigne wrote:
>>What do you use Gauss-Seidel for?  Solving a linear system?  This is
> 
> an
> 
>>extremely bad choice.
>>For performance, you should
>>0. Switch to another iteration, e.g. multigrid
> 
> 
> I am surprised to hear that. My department's proudest accomplishment,
> PENTRAN, uses a similar algorithm, block-jacobi, for solving linear
> systems.
> http://ufttg.nre.ufl.edu/~haghigha/pentran.html
> If there are much better iteration methods, I might end up turning
> this into a dissertation project. I don't know if I have time to learn
> these methods and apply them before my project is due, but either way
> I am excited to hear about these.

Multigrid solvers for the Laplace (diffusion) equation run in time
linear in the number of finest grid points, IIRC.

	Paul

From: Eric Lavigne
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Wed, 27 Apr 2005 23:48:26 +0000
Message-ID: <1114640770.282947.172890@f14g2000cwb.googlegroups.com>

>Can you change your "signature" to mention
> on what webpage your code  (and running hints)
> can be found?  When I read the above, I wanted
> to  time the code myself.

It's on my webpage at
plaza.ufl.edu/lavigne/
Look for Neutron Diffusion Code 3.
Running hints are below it.

Note that I don't blame CMUCL for the poor performance. A compiler
should be measured on its performance with good code and its general
helpfulness with bad code. CMUCL gets better marks on both points. In
benchmarks it does better (with good coders providing the benchmark
code, not newbies like me). When I feed it bad code it runs a bit
slower than CLISP but gives me the comments I need to get better. CLISP
never complained about my mistakes, even when I gave it an integer and
called it a float (fixed this morning when I tried to run it on ACL).
CMUCL is the first compiler I've seen, for any language, that gives
warnings about... your code would be 10 ticks per iteration faster
if... On the other hand, I don't understand why it gives me so many
messages about garbage collection. I would think that printing those
messages would take more time than actually collecting the garbage.

I will add a signature right after this post. Thanks for the suggestion.

From: Eric Lavigne
Subject: Re: optimizing diffusion code
Date: Thu, 28 Apr 2005 00:59:44 +0000
Message-ID: <1114649984.834562.69140@l41g2000cwc.googlegroups.com>

>The bad news is that CMUCL on LRK is running about 20% slower than
> CLISP on winxp. The good news is that it gives me many pages of
> explanations about why it can't optimize my code
I've managed to take care of all of the warnings in phi-calc, except
for one:
  Note: Doing float to pointer coercion (cost 13) to "<return value>"
How do I fix this?

My code is available on my website.
plaza.ufl.edu/lavigne/
Diffusion Code 3

From: Eric Lavigne
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Thu, 28 Apr 2005 01:33:25 +0000
Message-ID: <1114652005.681450.196190@l41g2000cwc.googlegroups.com>

If I want to declare something as a 3 dimensional array of
single-float, how do I do that?

Ordinary variable declarations look like (declare (single-float
source)), but I can't think of how that would be changed to work with
arrays.

From: Wade Humeniuk
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Thu, 28 Apr 2005 03:07:58 +0000
Message-ID: <isYbe.3878$0X6.3385@edtnps90>

Eric Lavigne wrote:
> If I want to declare something as a 3 dimensional array of
> single-float, how do I do that?
> 

(declare (type (simple-array single-float (* * *)) something)

> Ordinary variable declarations look like (declare (single-float
> source)), but I can't think of how that would be changed to work with
> arrays.
>

From: Sampo Smolander
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Thu, 28 Apr 2005 06:07:35 +0000
Message-ID: <d4puj7$j7a$1@oravannahka.helsinki.fi>

Eric Lavigne <············@gmail.com> wrote:
> > What do you use Gauss-Seidel for?  Solving a linear system? 
> > This is an extremely bad choice. For performance, you should
> > 0. Switch to another iteration, e.g. multigrid

> I am surprised to hear that. My department's proudest accomplishment,
> PENTRAN, uses a similar algorithm, block-jacobi, for solving linear
> systems.
> http://ufttg.nre.ufl.edu/~haghigha/pentran.html

  "PENTRAN performs multigroup, anisotropic, discrete ordinates neutral 
   particle transport calculations"

May I just point out, that you are solving the diffusion equation, and
(according to the above web page) PENTRAN is solving the radiative
transport equation. These are two entirely different beasts.

It seems that multigrid techniques are common to speed up numerical
solutions to diffusion equation, and other partial differential equations.
But transport equation is a partial differential-integral equation, and
I don't know whether multigrip techniques would work on it, too.

(You who know, please enlighten us.)

From: Eric Lavigne
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Thu, 28 Apr 2005 06:37:02 +0000
Message-ID: <1114670222.360137.87260@z14g2000cwz.googlegroups.com>

>It seems that multigrid techniques are common to speed up numerical
>solutions to diffusion equation, and other partial differential
equations.
>But transport equation is a partial differential-integral equation,
and
>I don't know whether multigrip techniques would work on it, too.

What is a differential-integral equation?
Diffusion and transport are different beasts, in the sense that
transport has a lot more simultaneous equations. I don't really
understand how multigrid works, though, so maybe there is something
special about diffusion that makes multigrid effective...

From: Sampo Smolander
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Thu, 28 Apr 2005 07:02:57 +0000
Message-ID: <d4q1r1$kr0$1@oravannahka.helsinki.fi>

Eric Lavigne <············@gmail.com> wrote:
> What is a differential-integral equation?

An equation that has both differential and integral terms.

For example, the transport equation is written out as
eq. 2.5, on page 13, in the PENTRAN manual,
http://ufttg.nre.ufl.edu/~haghigha/PENMANV900n.pdf

> I don't really understand how multigrid works, though, so maybe there
> is something special about diffusion that makes multigrid effective...

Then again, googling for 'multigrid radiative transfer' or 'multigrid
radiative transport' seems to give enough reading for more that just one
evening :-)

From: Eric Lavigne
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Fri, 29 Apr 2005 00:05:35 +0000
Message-ID: <1114733135.938137.9350@z14g2000cwz.googlegroups.com>

I discussed this issue with one of PENTRAN's creators. It is applicable
to neutron transport but has three problems:

1) Requires about twice as much memory. One of the reasons for parallel
computing, in addition to the speedup, is that larger problems can be
solved with more computers contributing memory.
2)Complicated (difficult to program).
3)Convergence issues near material boundaries if the materials have
extremely different properties.

On the other hand, he agreed that multigrid was much faster, sometimes
by an order of magnitude. I am under the impression that all of the
above issues are solveable, and it sounds worthwhile to do so. It
sounds like Nicolas Neuss has already taken care of 2 (and possibly 3,
I haven't studied his articles enough yet). I am inclined to ignore 1.
Double the memory for an order of magnitude in speed sounds like a good
deal.

From: Kenny Tilton
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Fri, 29 Apr 2005 00:58:57 +0000
Message-ID: <lFfce.16796$mp6.2500021@twister.nyc.rr.com>

Eric Lavigne wrote:

> I discussed this issue with one of PENTRAN's creators. It is applicable
> to neutron transport but has three problems:
> 
> 1) Requires about twice as much memory. One of the reasons for parallel
> computing, in addition to the speedup, is that larger problems can be
> solved with more computers contributing memory.
> 2)Complicated (difficult to program).
> 3)Convergence issues near material boundaries if the materials have
> extremely different properties.
> 
> On the other hand, he agreed that multigrid was much faster, sometimes
> by an order of magnitude. I am under the impression that all of the
> above issues are solveable, and it sounds worthwhile to do so. It
> sounds like Nicolas Neuss has already taken care of 2 (and possibly 3,
> I haven't studied his articles enough yet). I am inclined to ignore 1.
> Double the memory for an order of magnitude in speed sounds like a good
> deal.
> 

Cool, so you are undertaking a multigrid project? What happened with the 
project due today?

kenny

-- 
Cells? Cello? Cells-Gtk?: http://www.common-lisp.net/project/cells/
Why Lisp? http://lisp.tech.coop/RtL%20Highlight%20Film

"Doctor, I wrestled with reality for forty years, and I am happy to 
state that I finally won out over it." -- Elwood P. Dowd

From: Eric Lavigne
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Fri, 29 Apr 2005 03:56:31 +0000
Message-ID: <1114746991.735043.7040@g14g2000cwa.googlegroups.com>

>Cool, so you are undertaking a multigrid project?
There's a good chance I'll do that later. Not now.

>What happened with the project due today?
I increased the tolerances, trading accuracy for speed so I could get
it done. I still need to update my webpage about the diffusion code's
latest status.

I am starting a new project called CRYSTAL. My dad designed a software
system years ago to support forensics analysts. The idea was to take a
bunch of existing forensics tools, develop a few new ones, and have one
program (CRYSTAL) that would tie all those tools together. CRYSTAL
would also automatically document the progress of an investigation,
improving the quality of reports and the ease of generating them.

He came up with the idea while doing forensics work for the Air Force,
and he arranged for civilian contractors to program it. After a few
years of slow development, the project was cancelled. I never thought
it should take years, but I have ignored it in the past because of the
need for gui development. Peter Seibel's book showed me that gui
development doesn't need to be complicated. I can just turn the whole
program into a webpage, which brings some nice advantages to the table
anyway.

This week, starting Friday or Saturday, I will create my first
prototype of CRYSTAL. This is a big task. Ordinary programmers tried
for years and failed. But I am no ordinary programmer. I am a Lisp
newbie. Wish me luck :-)

From: Larry Clapp
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Fri, 29 Apr 2005 14:39:51 +0000
Message-ID: <slrnd74hp9.vbc.larry@theclapp.ddts.net>

[ posted and mailed ]

In article <······················@g14g2000cwa.googlegroups.com>, Eric Lavigne wrote:
>>Kenny Tilton Wrote:
>>What happened with the project due today?
> I increased the tolerances, trading accuracy for speed so I could get
> it done. I still need to update my webpage about the diffusion code's
> latest status.

I came to this discussion fairly late.  In the event that it's still
relevant, I downloaded your Diffusion 3 code, compiled it under cmucl,
and posted the output at
http://www.theclapp.org/diff-3d.lisp.compile.out.txt.  The actual code
I compiled is at http://www.theclapp.org/diff-3d.lisp retrieved from
http://plaza.ufl.edu/lavigne/diff-3d.lisp at 10:12am edt 4/29/05.

Perhaps people more knowledgeable in cmucl than I am could point you
to problems.  It prints a lot of warnings along the lines of

  ;   (INCF *ITERATION-COUNT-INNER*)
  ; --> LET* 
  ; ==>
  ;   (+ *ITERATION-COUNT-INNER* 1)
  ; Note: Unable to optimize due to type uncertainty:
  ;     The first argument is a NUMBER, not a FLOAT.
  [...]
  ;   (ARRAY-DIMENSION FLUX 0)
  ; Note: Unable to optimize because:
  ;     Array dimensions unknown, must call array-dimension at runtime.

On the other hand, you use clisp, so it might be completely
irrelevant, but I figured it couldn't hurt.

It's also within the realm of possibility that telnetting to
prompt.franz.com might help, at least as far as optimizing your code,
though perhaps not in actually running it.  (prompt.franz.com provides
a text-mode version of Allegro CL Trial Edition v6.2.)

-- Larry

From: Eric Lavigne
Subject: Re: optimizing diffusion code (was arithmetic speed)
Date: Fri, 29 Apr 2005 15:17:13 +0000
Message-ID: <1114787833.014919.126830@l41g2000cwc.googlegroups.com>

>came to this discussion fairly late.  In the event that it's still
>relevant, I downloaded your Diffusion 3 code, compiled it under cmucl,
>and posted the output at

Nicolas Neuss convinced me that I was coming at this problem from the
wrong angle. Since I had a project due the next day, I ignored his
advice, stuck with what I had, and focused on finishing. If and when I
return to this project, I will probably throw away my old code and
start over, so I no longer have reason to continue optimizing the old
code.

From: Nicolas Neuss
Subject: Re: optimizing diffusion code
Date: Thu, 28 Apr 2005 11:39:14 +0000
Message-ID: <87ekcvdtb1.fsf@ortler.iwr.uni-heidelberg.de>

"Eric Lavigne" <············@gmail.com> writes:

> Will do. Also, I have good news. I'm writing this message on my
> temporary linux computer. I am using a knoppix distribution called the
> Linux Resource Kit. On to CMUCL. My next goal is to do some
> performance tests to see if CMUCL gives me much better speed. After
> that I'm going to look into FEMLISP. It might not be practical for me
> because I will have to reinstall it every time I reboot my computer,
> but it might still work out if installation is fast enough.

It should be possible to get Femlisp working on GCL, too.  This is
something which is on my TODO list.  GCL's arithmetic speed might be
acceptable, at least it proved to be so for some simple tests.
Unfortunately, you will encounter other difficulties along this road (which
is why noone has recommended it).  Several useful stuff is not yet
available for GCL, for example SLIME.  [Another option for Windows is
Allegro, of course.  It should be easy to port Femlisp there.]

If you should try out Femlisp, you will note that it is not yet optimized
for speed.  This is mainly a problem of Femlisp's datastructures and only a
little bit a problem of Lisp.  Also one should keep in mind that Femlisp
works on general unstructured meshes, and calculations on structured meshes
are always an order of magnitude faster in any language.  [Using structured
meshes can lead to astonishing performance: it is very well possible to
solve linear systems with 1e6 unknowns arising from discretization of the
2d Laplace equation in less than 1 second (using CL).]

Nicolas.