From: Blake D
Subject: garbage collector pauses, help?
Date: 
Message-ID: <Lf7Jb.98116$pY.96514@fed1read04>
Hello,
   I am new to lisp, and my team is a making a massively multiplayer online
role playing game, similar in concept to asheron's call, everquest, or
ultima online. Of course, much more fun ;).

It needs a client and a server, and I have some questions related to
performance. Our lisp product of choice so far is LispWorks.

Execution speed:
Performance on the server is important. Each server must support hundreds
(or thousands) of players at the same time. I am willing to accept an
overall performance penalty for using lisp. I would be ok with code that
runs at 50% or better of the speed of the same kind of server in c++. Can
this be accomplished in lisp fairly easily?

Pauses:
The server will produce a lot of garbage as players walk across the world
and map parts are loaded and unloaded into memory. It would be hard to
accommodate the server pausing for more than 3-5 seconds at most, ever. I
read the manual about the LispWorks generational garbage collector, and I
was thinking that the server should GC all generations periodically because
it is supposed to run continuously, not ever shut down except for
maintenance. It seems to me that if levels 2 and 3 are occasionally added to
but never GC'ed, they would eventually consume all available memory.

I have no idea yet how much memory the server will be using, but total
memory usage ranging from 300 MB to a few gigabytes would be possible.

Is there any way that garbage collection of all generations can be done
without exceeding my pause limits?

Can garbage collection be done "incrementally" or "in the background"?

Is there anything else about lisp or LispWorks that might prevent me from
meeting these pause and performance requirements?

thank you in advance for any help,
Blake

From: Ray Dillinger
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <3FF517C1.3B2ECA07@sonic.net>
Blake D wrote:
> 
> Hello,
>    I am new to lisp, and my team is a making a massively multiplayer online
> role playing game, similar in concept to asheron's call, everquest, or
> ultima online. Of course, much more fun ;).
> 
> It needs a client and a server, and I have some questions related to
> performance. Our lisp product of choice so far is LispWorks.
> 
> Execution speed:
> Performance on the server is important. Each server must support hundreds
> (or thousands) of players at the same time. I am willing to accept an
> overall performance penalty for using lisp. I would be ok with code that
> runs at 50% or better of the speed of the same kind of server in c++. Can
> this be accomplished in lisp fairly easily?

Yes.  With appropriate optional declarations, Lisp code usually
generates machine code that runs within 5% of the speed of comparable
C code (and on many problems, actually runs faster).  And you can 
decide which code to decorate with optional declarations on the 
basis of profiling.  Even without declarations, it's rarely less 
than 85% as fast as C when compiled. 

> Pauses:
> The server will produce a lot of garbage as players walk across the world
> and map parts are loaded and unloaded into memory. It would be hard to
> accommodate the server pausing for more than 3-5 seconds at most, ever. I
> read the manual about the LispWorks generational garbage collector, and I
> was thinking that the server should GC all generations periodically because
> it is supposed to run continuously, not ever shut down except for
> maintenance. It seems to me that if levels 2 and 3 are occasionally added to
> but never GC'ed, they would eventually consume all available memory.

The idea with generational GC is that you collect a shallow layer frequently
and collect deeper layers occasionally.  So after collecting the most recent
generation of garbage N times (N varies from one implementation to another) 
it would go ahead and collect all generations once. 

That said, I have not observed "pauses" of more than a tenth of a second on 
modern Lisps, and many garbage collectors don't halt the program to work. 
 
> I have no idea yet how much memory the server will be using, but total
> memory usage ranging from 300 MB to a few gigabytes would be possible.
 
> Is there any way that garbage collection of all generations can be done
> without exceeding my pause limits?

I haven't observed pauses of more than a tenth of a second in modern Lisps. 
"Stop-and-copy" garbage collection, which gave rise to longer pauses, is no 
longer a popular implementation strategy. In fact, many modern garbage 
collectors don't stop the program at all in an SMP system. 
 
> Can garbage collection be done "incrementally" or "in the background"?

In principle, yes.  In practice, it depends on the implementation. I don't 
know what the answer is for LispWorks specifically; I'll let someone who 
knows take that up. 
 
					Bear
From: Gareth McCaughan
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87oetmfppf.fsf@g.mccaughan.ntlworld.com>
Ray Dillinger wrote:

> Yes.  With appropriate optional declarations, Lisp code usually
> generates machine code that runs within 5% of the speed of comparable
> C code (and on many problems, actually runs faster).  And you can 
> decide which code to decorate with optional declarations on the 
> basis of profiling.  Even without declarations, it's rarely less 
> than 85% as fast as C when compiled. 

These numbers are better than I've seen with CMU CL, though
I haven't done extensive testing, and CMU CL is generally
reckoned one of the faster implementations. Where do they
come from?

-- 
Gareth McCaughan
.sig under construc
From: Ray Dillinger
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <3FF5C4BC.46B7E089@sonic.net>
Gareth McCaughan wrote:
> 
> Ray Dillinger wrote:
> 
> > Yes.  With appropriate optional declarations, Lisp code usually
> > generates machine code that runs within 5% of the speed of comparable
> > C code (and on many problems, actually runs faster).  And you can
> > decide which code to decorate with optional declarations on the
> > basis of profiling.  Even without declarations, it's rarely less
> > than 85% as fast as C when compiled.
> 
> These numbers are better than I've seen with CMU CL, though
> I haven't done extensive testing, and CMU CL is generally
> reckoned one of the faster implementations. Where do they
> come from?
> 

These numbers *are* what I've seen with CMUCL.  I've been using it 
to do genetic algorithms to find optimal players of turn-based games. 

I got 85% of the speed of my C code, then gave it type declarations
and watched it pop up to 98% of the speed of my C code. Virtually no
time is spent in the Garbage Collector.

What numbers would you consider typical?

				Bear
From: rif
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <wj0wu8agjnc.fsf@five-percent-nation.mit.edu>
Ray Dillinger <····@sonic.net> writes:

> Gareth McCaughan wrote:
> > 
> > Ray Dillinger wrote:
> > 
> > > Yes.  With appropriate optional declarations, Lisp code usually
> > > generates machine code that runs within 5% of the speed of comparable
> > > C code (and on many problems, actually runs faster).  And you can
> > > decide which code to decorate with optional declarations on the
> > > basis of profiling.  Even without declarations, it's rarely less
> > > than 85% as fast as C when compiled.
> > 
> > These numbers are better than I've seen with CMU CL, though
> > I haven't done extensive testing, and CMU CL is generally
> > reckoned one of the faster implementations. Where do they
> > come from?
> > 
> 
> These numbers *are* what I've seen with CMUCL.  I've been using it 
> to do genetic algorithms to find optimal players of turn-based games. 
> 
> I got 85% of the speed of my C code, then gave it type declarations
> and watched it pop up to 98% of the speed of my C code. Virtually no
> time is spent in the Garbage Collector.
> 
> What numbers would you consider typical?
> 
> 				Bear

I do a lot of numerical work in CMUCL --- mostly signal processing and
machine learning.  Very heavy on floating point computations.

Without declarations and inlining, I find that I typically get
programs that're 10 times or more slower than a similar C program.
Doing all my arithmetic generic, and needing to box up the result of
every function that returns a floating point, gives me programs that
are way too slow, even compiled.  Once I add declarations, I feel like
I can usually get within 50% or so of C.  Then I usually have to do
another round of optimization (if I care about making the program
faster), focussing on avoiding excessive consing, and that usually
gets me to within 25% or so.  These numbers are all somewhat rough
estimates.

rif

ps. It took me awhile to get the hang of the optimization, but now I
have a bunch of macros that make it almost transparent --- for
instance, instead of using map, I'll use df-vmap, which tells me that
all the inputs are vectors of double-floats and the output should also
be one, and optionally the output can overwrite one of the input
arguments.  I have a number of other similar macros, so now I feel
like I can write fast programs pretty rapidly, without getting bogged
down declaring everything.  Certainly, I'm happy with where I am right
now on the optimization front --- I feel like CMUCL gives me a good
balance of easily being able to write programs that are fast enough.
I had to debug a piece of C++ code last week, and if it wasn't for
valgrind, I'd still be there.
From: Ray Dillinger
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <3FF63DCF.3D45583D@sonic.net>
rif wrote:
> 
> Ray Dillinger <····@sonic.net> writes:
> 
> > Gareth McCaughan wrote:
> > >
> > > Ray Dillinger wrote:
> > >
> > > > Yes.  With appropriate optional declarations, Lisp code usually
> > > > generates machine code that runs within 5% of the speed of comparable
> > > > C code (and on many problems, actually runs faster).  And you can
> > > > decide which code to decorate with optional declarations on the
> > > > basis of profiling.  Even without declarations, it's rarely less
> > > > than 85% as fast as C when compiled.
> > >
> > > These numbers are better than I've seen with CMU CL, though
> > > I haven't done extensive testing, and CMU CL is generally
> > > reckoned one of the faster implementations. Where do they
> > > come from?
> > >
> >
> > These numbers *are* what I've seen with CMUCL.  I've been using it
> > to do genetic algorithms to find optimal players of turn-based games.
> >
> > I got 85% of the speed of my C code, then gave it type declarations
> > and watched it pop up to 98% of the speed of my C code. Virtually no
> > time is spent in the Garbage Collector.
> >
> > What numbers would you consider typical?
> >
> >                               Bear
> 
> I do a lot of numerical work in CMUCL --- mostly signal processing and
> machine learning.  Very heavy on floating point computations.
> 
> Without declarations and inlining, I find that I typically get
> programs that're 10 times or more slower than a similar C program.
> Doing all my arithmetic generic, and needing to box up the result of
> every function that returns a floating point, gives me programs that
> are way too slow, even compiled.  Once I add declarations, I feel like
> I can usually get within 50% or so of C.  Then I usually have to do
> another round of optimization (if I care about making the program
> faster), focussing on avoiding excessive consing, and that usually
> gets me to within 25% or so.  These numbers are all somewhat rough
> estimates.
> 

Interesting.  

My GA code is essentially a translation of my C code - so I'm not 
using higher order functions like map, and as it happens I do just 
about no consing at all once the population of genomes is initialized 
- the "loser" in each contest is overwritten by an offspring of one 
or both of the two "winners". 

Suppose the speed difference comes down to the fact that I have been 
looking at something that's not, after all, written in a very lispy 
style?

				Bear
From: rif
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <wj0d6a11wsi.fsf@five-percent-nation.mit.edu>
Ray Dillinger <····@sonic.net> writes:

> rif wrote:
> > 
> > Ray Dillinger <····@sonic.net> writes:
> > 
> > > Gareth McCaughan wrote:
> > > >
> > > > Ray Dillinger wrote:
> > > >
> > > > > Yes.  With appropriate optional declarations, Lisp code usually
> > > > > generates machine code that runs within 5% of the speed of comparable
> > > > > C code (and on many problems, actually runs faster).  And you can
> > > > > decide which code to decorate with optional declarations on the
> > > > > basis of profiling.  Even without declarations, it's rarely less
> > > > > than 85% as fast as C when compiled.
> > > >
> > > > These numbers are better than I've seen with CMU CL, though
> > > > I haven't done extensive testing, and CMU CL is generally
> > > > reckoned one of the faster implementations. Where do they
> > > > come from?
> > > >
> > >
> > > These numbers *are* what I've seen with CMUCL.  I've been using it
> > > to do genetic algorithms to find optimal players of turn-based games.
> > >
> > > I got 85% of the speed of my C code, then gave it type declarations
> > > and watched it pop up to 98% of the speed of my C code. Virtually no
> > > time is spent in the Garbage Collector.
> > >
> > > What numbers would you consider typical?
> > >
> > >                               Bear
> > 
> > I do a lot of numerical work in CMUCL --- mostly signal processing and
> > machine learning.  Very heavy on floating point computations.
> > 
> > Without declarations and inlining, I find that I typically get
> > programs that're 10 times or more slower than a similar C program.
> > Doing all my arithmetic generic, and needing to box up the result of
> > every function that returns a floating point, gives me programs that
> > are way too slow, even compiled.  Once I add declarations, I feel like
> > I can usually get within 50% or so of C.  Then I usually have to do
> > another round of optimization (if I care about making the program
> > faster), focussing on avoiding excessive consing, and that usually
> > gets me to within 25% or so.  These numbers are all somewhat rough
> > estimates.
> > 
> 
> Interesting.  
> 
> My GA code is essentially a translation of my C code - so I'm not 
> using higher order functions like map, and as it happens I do just 
> about no consing at all once the population of genomes is initialized 
> - the "loser" in each contest is overwritten by an offspring of one 
> or both of the two "winners". 
> 
> Suppose the speed difference comes down to the fact that I have been 
> looking at something that's not, after all, written in a very lispy 
> style?
> 
> 				Bear

That seems pretty reasonable.

For me, one of the real advantages of CL is that I find that
programming in a functional style often leads to programs that have
fewer bugs.  More specifically, I find I'm often presented with a
signal that needs to go through various modifications, filters, etc.
The "natural" way for me to write this in CL is to have each
processing step generate a new copy of the array.  In C, I'd be more
likely to overwrite at every step.

The CL approach leads to very flexible programs, because if I find
later that I need a given step for a couple of different purposes (I
want to use a given array as input to a new step, and also sum over it
to compute its energy), I don't have to be as careful about
understanding when in the program the array contains the information I
want.

The downside is that I do a lot more consing.  But then, once the
program's working, if this is really the problem (and often it isn't,
often the program is fast enough, and then I really win, because I was
able to write it much faster), I go through and make various steps in
the algorithm start "sharing" arrays.  But usually this is pretty
easy, especially with the macros I have.

Really, I should make a higher level abstraction for doing this kind
of thing that automatically deduces when a given array can be written
over and when I need to cons up a new one...

Cheers,

rif
From: Thomas F. Burdick
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <xcvllopmnx3.fsf@famine.OCF.Berkeley.EDU>
rif <···@mit.edu> writes:

> The downside is that I do a lot more consing.  But then, once the
> program's working, if this is really the problem (and often it isn't,
> often the program is fast enough, and then I really win, because I was
> able to write it much faster), I go through and make various steps in
> the algorithm start "sharing" arrays.  But usually this is pretty
> easy, especially with the macros I have.
> 
> Really, I should make a higher level abstraction for doing this kind
> of thing that automatically deduces when a given array can be written
> over and when I need to cons up a new one...

That's it right there -- that's the thing that got me hooked to Lisp,
and that's the reason that I'll only use Lisp for hard problems[*].  I
don't know a name for the concept, but it's the ability to describe a
problem in an abstract, comprehensible way, and have it execute
efficiently.  It's the impetus behind Literate Programming, AOP, High
Level (mostly-)Functional languages, and on and on.

[*] Even when the end product is in another language, I'll give you
one guess as to what language I developed the program in :-)

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Paolo Amoroso
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87smixpbmv.fsf@plato.moon.paoloamoroso.it>
rif writes:

> ps. It took me awhile to get the hang of the optimization, but now I
> have a bunch of macros that make it almost transparent --- for
> instance, instead of using map, I'll use df-vmap, which tells me that
> all the inputs are vectors of double-floats and the output should also
> be one, and optionally the output can overwrite one of the input
> arguments.  I have a number of other similar macros, so now I feel

Could this be done with compiler macros?


Paolo
-- 
Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film
From: rif
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <wj0hdzddssi.fsf@five-percent-nation.mit.edu>
Paolo Amoroso <·······@mclink.it> writes:

> rif writes:
> 
> > ps. It took me awhile to get the hang of the optimization, but now I
> > have a bunch of macros that make it almost transparent --- for
> > instance, instead of using map, I'll use df-vmap, which tells me that
> > all the inputs are vectors of double-floats and the output should also
> > be one, and optionally the output can overwrite one of the input
> > arguments.  I have a number of other similar macros, so now I feel
> 
> Could this be done with compiler macros?
> 
> 
> Paolo
> -- 
> Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film

I've never used compiler macros, so I'm not quite sure, but looking at
it a little, I don't immediately see why it would help.  The compiler
macro would still need to know the types, and that still means
declarations, right?  Can you give an example of how you think
compiler macros might help in a situation like this?  I (as always)
could be missing something obvious.

rif
From: Paolo Amoroso
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <877k09hrgy.fsf@plato.moon.paoloamoroso.it>
rif writes:

> it a little, I don't immediately see why it would help.  The compiler
> macro would still need to know the types, and that still means
> declarations, right?  Can you give an example of how you think
> compiler macros might help in a situation like this?  I (as always)

If I understand compiler macros correctly, they would allow you to
rewrite ordinary calls to MAP (taking types and declarations into
account) instead of explicitly calling custom mappers such as your
DF-VMAP.


Paolo
-- 
Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film
From: Hannah Schroeter
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <bt7n0c$k43$1@c3po.use.schlund.de>
Hello!

Paolo Amoroso  <·······@mclink.it> wrote:
>rif writes:

>> it a little, I don't immediately see why it would help.  The compiler
>> macro would still need to know the types, and that still means
>> declarations, right?  Can you give an example of how you think
>> compiler macros might help in a situation like this?  I (as always)

>If I understand compiler macros correctly, they would allow you to
>rewrite ordinary calls to MAP (taking types and declarations into
>account) instead of explicitly calling custom mappers such as your
>DF-VMAP.

There's AFAIK no (portable) way to access type inference/declaration
information from the compiler macro expander code. The compiler
macro just gets the form calling the function, just like an ordinary
macro gets the macro call form as argument.

So this seems to be difficult at least.

Kind regards,

Hannah.
From: Gareth McCaughan
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87isjuf5rk.fsf@g.mccaughan.ntlworld.com>
Ray Dillinger wrote:

> > > Yes.  With appropriate optional declarations, Lisp code usually
> > > generates machine code that runs within 5% of the speed of comparable
> > > C code (and on many problems, actually runs faster).  And you can
> > > decide which code to decorate with optional declarations on the
> > > basis of profiling.  Even without declarations, it's rarely less
> > > than 85% as fast as C when compiled.
> > 
> > These numbers are better than I've seen with CMU CL, though
> > I haven't done extensive testing, and CMU CL is generally
> > reckoned one of the faster implementations. Where do they
> > come from?
> 
> These numbers *are* what I've seen with CMUCL.  I've been using it 
> to do genetic algorithms to find optimal players of turn-based games. 
> 
> I got 85% of the speed of my C code, then gave it type declarations
> and watched it pop up to 98% of the speed of my C code. Virtually no
> time is spent in the Garbage Collector.
> 
> What numbers would you consider typical?

I've seen a wide range, from approximately C speed down to something
like 1/4 of C speed (after adding declarations, in both cases). The
latter was on an example someone posted in c.l.l, which -- if I'm
remembering right -- did badly because of suboptimalities in CMU CL's
register allocation. That's all with declarations; without, speeds
can be way way below C speed.

Here's a toy example of what can happen without declarations.
First, some code in C:

    int main(void) {
      int i;
      double s = 1;
      for (i=0; i<10000000; ++i) {
        s += i/s;
      }
      printf("%g\n", s);
    }

which runs in about 0.26 seconds on my machine after compiling
with "gcc -O3". And some corresponding declaration-free code in CL:

    (defun foo ()
      (let ((s 1d0))
        (loop for i from 0 below 10000000 do
          (incf s (/ i s)))
        (format t "~&~A~%" s)))

which runs in about 7.3 seconds on my machine, including
approximately 2.3 seconds of GC. (That's after compilation,
of course. Interpreted, it would be much slower.)

To get decent speed from this function, you need to add
two declarations:

    (defun foo ()
      (let ((s 1d0))
        (declare (type double-float s))
        (loop for i fixnum from 0 below 10000000 do
          (incf s (/ i s)))
        (format t "~&~A~%" s)))

at which point CMU CL does it in about 0.26 seconds,
just like "gcc -O3". It seems to me that CMU CL really
ought to be able to infer that i is a fixnum (because
it is never set other than by the loop, and its limits
are fixna) and then that s is a double-float (by induction),
but it can't :-).

I'm using CMU CL 18e on an Athlon machine running FreeBSD.

-- 
Gareth McCaughan
.sig under construc
From: Thomas F. Burdick
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <xcvsmiylzrm.fsf@famine.OCF.Berkeley.EDU>
Gareth McCaughan <················@pobox.com> writes:

>     (defun foo ()
>       (let ((s 1d0))
>         (loop for i from 0 below 10000000 do
>           (incf s (/ i s)))
>         (format t "~&~A~%" s)))

Note that you're doing two things here: testing CMUCL's ability to
infer the type of S, and picking on its LOOP.  If you use DOTIMES:

  (defun foo ()
    (let ((s 1d0))
      (dotimes (i 10000000)
        (incf s (/ i s)))
      (format t "~&~A~%" s)))

it can figure out the type of I.  I'd file that under CMUCL quirks,
not shortcomings, because it depends on the user's style.

S, however, is a good example of a shortcoming in Python.  It can't
figure out that (/ I S) is a double-float until it knowns the type of
S, for which it needs to know the type of (/ I S).

So, you do need declarations to get decent speed, but only one:

  (defun foo ()
    (let ((s 1d0))
      (dotimes (i 10000000)
        (incf s (the double-float (/ i s))))
      (format t "~&~A~%" s)))

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Marc Battyani
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <bt4qcu$tq1@library2.airnews.net>
"Gareth McCaughan" <················@pobox.com> wrote
> Here's a toy example of what can happen without declarations.
> First, some code in C:
>
>     int main(void) {
>       int i;
>       double s = 1;
>       for (i=0; i<10000000; ++i) {
>         s += i/s;
>       }
>       printf("%g\n", s);
>     }

On my Dell Inspiron P4M 2.2GHz
C : 0.363s
Lispworks 4.3 : 0.390s

The Lispworks inner loop

     142:      81FF0040420F     cmp   edi, F424000
     148:      7C10             jl    L4
L3:  150:      81C300010000     add   ebx, 100
     156:      81FB000A0000     cmp   ebx, A00
     162:      7D85             jge   L1
     164:      EBE6             jmp   L2
L4:  166:      DD45F8           fldl  [ebp-8]
     169:      DC7DF0           fdivrl [ebp-10]
     172:      DC45F8           faddl [ebp-8]
     175:      DD5DF8           fstpl [ebp-8]
     178:      DD45F0           fldl  [ebp-10]
     181:      DD0548A86720     fldl  [2067A848]       ; 1.0
     187:      DEC1             faddp st(1), st
     189:      DD5DF0           fstpl [ebp-10]
     192:      81C700010000     add   edi, 100
     198:      81FF0040420F     cmp   edi, F424000
     204:      7DC8             jge   L3
     206:      EBD6             jmp   L4

Look very good for me!

I just splitted the loop in 2 to stay in the small Lispworks fixnums

(defun foo ()
  (declare (optimize (speed 3)(compilation-speed 0)
                     (float 0)(space 0)(safety 0)(debug 0)))
  (let ((s 1.0d0)(i 0.0d0))
    (declare (type double-float s i))
    (loop for n fixnum from 0 below 10 do
          (loop for m fixnum from 0 below 1000000 do
                (incf s (/ i s))
                (incf i 1.0d0)))
    (format t "~&~A~%" s)))

Marc
From: Christophe Rhodes
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <sqekuix9bj.fsf@lambda.jcn.srcf.net>
> "Gareth McCaughan" <················@pobox.com> wrote
>> Here's a toy example of what can happen without declarations.
                                           ^^^^^^^^^^^^^^^^^^^^
"Marc Battyani" <·············@fractalconcept.com> writes:

> (defun foo ()
>   (declare (optimize (speed 3)(compilation-speed 0)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>                      (float 0)(space 0)(safety 0)(debug 0)))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   (let ((s 1.0d0)(i 0.0d0))
>     (declare (type double-float s i))
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>     (loop for n fixnum from 0 below 10 do
                  ^^^^^^
>           (loop for m fixnum from 0 below 1000000 do
                        ^^^^^^
>                 (incf s (/ i s))
>                 (incf i 1.0d0)))
>     (format t "~&~A~%" s)))

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Marc Battyani
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <bt4ts3$vog@library2.airnews.net>
"Christophe Rhodes" <·····@cam.ac.uk> wrote
> > "Gareth McCaughan" <················@pobox.com> wrote
> >> Here's a toy example of what can happen without declarations.
>                                            ^^^^^^^^^^^^^^^^^^^^
> "Marc Battyani" <·············@fractalconcept.com> writes:
>
> > (defun foo ()
> >   (declare (optimize (speed 3)(compilation-speed 0)
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >                      (float 0)(space 0)(safety 0)(debug 0)))
>                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >   (let ((s 1.0d0)(i 0.0d0))
> >     (declare (type double-float s i))
>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >     (loop for n fixnum from 0 below 10 do
>                   ^^^^^^
> >           (loop for m fixnum from 0 below 1000000 do
>                         ^^^^^^
> >                 (incf s (/ i s))
> >                 (incf i 1.0d0)))
> >     (format t "~&~A~%" s)))

OK, I declare I will read the post before replying next time...
BTW LW needs more of them than CMUCL. (especially compared to Thomas'
example)

Marc
From: Gareth McCaughan
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87ekuieyob.fsf@g.mccaughan.ntlworld.com>
"Marc Battyani" <·············@fractalconcept.com> writes:

> "Gareth McCaughan" <················@pobox.com> wrote
> > Here's a toy example of what can happen without declarations.
> > First, some code in C:
> >
> >     int main(void) {
> >       int i;
> >       double s = 1;
> >       for (i=0; i<10000000; ++i) {
> >         s += i/s;
> >       }
> >       printf("%g\n", s);
> >     }
> 
> On my Dell Inspiron P4M 2.2GHz
> C : 0.363s
> Lispworks 4.3 : 0.390s
> 
> The Lispworks inner loop
> 
>      142:      81FF0040420F     cmp   edi, F424000
>      148:      7C10             jl    L4
> L3:  150:      81C300010000     add   ebx, 100
>      156:      81FB000A0000     cmp   ebx, A00
>      162:      7D85             jge   L1
>      164:      EBE6             jmp   L2
> L4:  166:      DD45F8           fldl  [ebp-8]
>      169:      DC7DF0           fdivrl [ebp-10]
>      172:      DC45F8           faddl [ebp-8]
>      175:      DD5DF8           fstpl [ebp-8]
>      178:      DD45F0           fldl  [ebp-10]
>      181:      DD0548A86720     fldl  [2067A848]       ; 1.0
>      187:      DEC1             faddp st(1), st
>      189:      DD5DF0           fstpl [ebp-10]
>      192:      81C700010000     add   edi, 100
>      198:      81FF0040420F     cmp   edi, F424000
>      204:      7DC8             jge   L3
>      206:      EBD6             jmp   L4
> 
> Look very good for me!
> 
> I just splitted the loop in 2 to stay in the small Lispworks fixnums
> 
> (defun foo ()
>   (declare (optimize (speed 3)(compilation-speed 0)
>                      (float 0)(space 0)(safety 0)(debug 0)))
>   (let ((s 1.0d0)(i 0.0d0))
>     (declare (type double-float s i))
>     (loop for n fixnum from 0 below 10 do
>           (loop for m fixnum from 0 below 1000000 do
>                 (incf s (/ i s))
>                 (incf i 1.0d0)))
>     (format t "~&~A~%" s)))

Here's CMU CL's assembler for the original code with
declarations of S as double-float and I as fixnum, but
no optimize declaration and no type-diddling.

4818D278:       .ENTRY FOO()                 ; (FUNCTION NIL
                                             ;  (OR BASE-STRING NULL))
     290:       POP   DWORD PTR [EBP-8]
     293:       LEA   ESP, [EBP-32]

     296:       TEST  ECX, ECX
     298:       JNE   L4
     29E:       FSTPD FR1                    ; No-arg-parsing entry point
     2A0:       FLD1
     2A2:       FXCH  FR1
     2A4:       XOR   EAX, EAX

;; The inner loop begins here:

     2A6: L0:   MOV   ECX, EAX
     2A8:       SAR   ECX, 2
     2AB:       MOV   [EBP-12], ECX
     2AE:       FSTPD FR0
     2B0:       FILD  [EBP-12]
     2B3:       FDIVD FR1
     2B5:       FADD-STI FR1
     2B7:       SAR   EAX, 2
     2BA:       INC   EAX
     2BB:       MOV   ECX, EAX
     2BD:       SHL   ECX, 1
     2BF:       JO    L6
     2C5:       SHL   ECX, 1
     2C7:       JO    L6
     2CD: L1:   TEST  CL, 3
     2D0:       JNE   L9
     2D6:       MOV   EAX, ECX
     2D8:       CMP   EAX, 40000000
     2DD:       JL    L0

... and what follows is code for the call to FORMAT, error handling
and the like, plus (at L6) overflow-handling code because it isn't
quite sure the loop counter won't overflow :-).

With (declare (optimize (speed 3) (safety 0))), I get this
instead:

481A6968:       .ENTRY FOO()                 ; (FUNCTION NIL NULL)
     980:       POP   DWORD PTR [EBP-8]
     983:       LEA   ESP, [EBP-32]
     986:       FSTPD FR1                    ; No-arg-parsing entry point
     988:       FLD1
     98A:       FXCH  FR1
     98C:       XOR   EAX, EAX

;; The inner loop begins here:

     98E: L0:   MOV   ECX, EAX
     990:       SAR   ECX, 2
     993:       MOV   [EBP-12], ECX
     996:       FSTPD FR0
     998:       FILD  [EBP-12]
     99B:       FDIVD FR1
     99D:       FADD-STI FR1
     99F:       ADD   EAX, 4
     9A2:       CMP   EAX, 40000000
     9A7:       JL    L0

... and what follows is error-handling, FORMAT, etc. That looks
pretty good to me. GCC -O3 produces this instead:

main:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        fld1
        xorl %eax,%eax
        .p2align 2,0x90

;; Start of inner loop:

.L6:
        movl %eax,-4(%ebp)
        fildl -4(%ebp)
        fdiv %st(1),%st
        incl %eax
        faddp %st,%st(1)
        cmpl $9999999,%eax
        jle .L6

... followed, again, by non-inner-loopy stuff. Three
instructions shorter than CMU CL's code.

-- 
Gareth McCaughan
.sig under construc
From: Rahul Jain
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87n094o4u7.fsf@nyct.net>
Gareth McCaughan <················@pobox.com> writes:

> which runs in about 0.26 seconds on my machine after compiling
> with "gcc -O3". And some corresponding declaration-free code in CL:
        ^^^^^^^                          ^^^^^^^^^^^^^^^^
That's not declaration-free. Apples and oranges, my friend. :)

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Gareth McCaughan
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87isjsea33.fsf@g.mccaughan.ntlworld.com>
Rahul Jain <·····@nyct.net> writes:

> Gareth McCaughan <················@pobox.com> writes:
> 
> > which runs in about 0.26 seconds on my machine after compiling
> > with "gcc -O3". And some corresponding declaration-free code in CL:
>         ^^^^^^^                          ^^^^^^^^^^^^^^^^
> That's not declaration-free. Apples and oranges, my friend. :)

That's a matter of terminology. I think the correct generalization
of "declaration-free" is "not requiring any alterations to the
code". Compiling everything -O3 fits that.

-- 
Gareth McCaughan
.sig under construc
From: Rahul Jain
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87isjso37r.fsf@nyct.net>
Gareth McCaughan <················@pobox.com> writes:

> Rahul Jain <·····@nyct.net> writes:
>
>> Gareth McCaughan <················@pobox.com> writes:
>> 
>> > which runs in about 0.26 seconds on my machine after compiling
>> > with "gcc -O3". And some corresponding declaration-free code in CL:
>>         ^^^^^^^                          ^^^^^^^^^^^^^^^^
>> That's not declaration-free. Apples and oranges, my friend. :)
>
> That's a matter of terminology. I think the correct generalization
> of "declaration-free" is "not requiring any alterations to the
> code". Compiling everything -O3 fits that.

The fact remains that you're comparing output that's optmized to output
that's unoptimized and claiming that the unoptimized code is slower.

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Gareth McCaughan
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87d6a0djd2.fsf@g.mccaughan.ntlworld.com>
Rahul Jain <·····@nyct.net> writes:

> Gareth McCaughan <················@pobox.com> writes:
> 
> > Rahul Jain <·····@nyct.net> writes:
> >
> >> Gareth McCaughan <················@pobox.com> writes:
> >> 
> >> > which runs in about 0.26 seconds on my machine after compiling
> >> > with "gcc -O3". And some corresponding declaration-free code in CL:
> >>         ^^^^^^^                          ^^^^^^^^^^^^^^^^
> >> That's not declaration-free. Apples and oranges, my friend. :)
> >
> > That's a matter of terminology. I think the correct generalization
> > of "declaration-free" is "not requiring any alterations to the
> > code". Compiling everything -O3 fits that.
> 
> The fact remains that you're comparing output that's optmized to output
> that's unoptimized and claiming that the unoptimized code is slower.

Yes, that's true. What I didn't mention is that adding
(declare (optimize (speed 3) (safety 0)))
at the start of the function doesn't actually make
much difference. It's the type declarations that do.
With those, even without an OPTIMIZE declaration,
its code runs at about the same speed as C's.

Of course you could also argue that no C code is
declaration-free, because every variable has a type. :-)

-- 
Gareth McCaughan
.sig under construc
From: Rahul Jain
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <8765frl95m.fsf@nyct.net>
Gareth McCaughan <················@pobox.com> writes:

> Yes, that's true. What I didn't mention is that adding
> (declare (optimize (speed 3) (safety 0)))
> at the start of the function doesn't actually make
> much difference. It's the type declarations that do.
> With those, even without an OPTIMIZE declaration,
> its code runs at about the same speed as C's.

Ok, fair enough. I suppose you got lots of notes about type
uncertainties. Also, ISTR that the COMPILE-FILE does a better job of
optimization than COMPILE, but that may just be a result of knowing the
type signature of all functions being called by all other functions
before fully optimizing the code. Not really relevant here, if that's
the case.

> Of course you could also argue that no C code is
> declaration-free, because every variable has a type. :-)

Indeed. :)

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Gareth McCaughan
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <87isjrcqdv.fsf@g.mccaughan.ntlworld.com>
Rahul Jain <·····@nyct.net> writes:

> Gareth McCaughan <················@pobox.com> writes:
> 
> > Yes, that's true. What I didn't mention is that adding
> > (declare (optimize (speed 3) (safety 0)))
> > at the start of the function doesn't actually make
> > much difference. It's the type declarations that do.
> > With those, even without an OPTIMIZE declaration,
> > its code runs at about the same speed as C's.
> 
> Ok, fair enough. I suppose you got lots of notes about type
> uncertainties. Also, ISTR that the COMPILE-FILE does a better job of
> optimization than COMPILE, but that may just be a result of knowing the
> type signature of all functions being called by all other functions
> before fully optimizing the code. Not really relevant here, if that's
> the case.

All done with COMPILE-FILE. But indeed I wouldn't expect it
to make a difference here.

-- 
Gareth McCaughan
.sig under construc
From: Tim Bradshaw
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <fbc0f5d1.0401050451.9520b81@posting.google.com>
Gareth McCaughan <················@pobox.com> wrote in message news:<··············@g.mccaughan.ntlworld.com>...

> That's a matter of terminology. I think the correct generalization
> of "declaration-free" is "not requiring any alterations to the
> code". Compiling everything -O3 fits that.

But the C code was just *full* of very precise type declarations!
From: Gareth McCaughan
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <873caucvn6.fsf@g.mccaughan.ntlworld.com>
Tim Bradshaw wrote:

[I said:]
>> That's a matter of terminology. I think the correct generalization
>> of "declaration-free" is "not requiring any alterations to the
>> code". Compiling everything -O3 fits that.
> 
> But the C code was just *full* of very precise type declarations!

As I pointed out myself elsewhere in the thread.

-- 
Gareth McCaughan
.sig under construc
From: Eric Marsden
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <wzismiy4bt4.fsf@melbourne.laas.fr>
>>>>> "rd" == Ray Dillinger <····@sonic.net> writes:

  rd> Yes.  With appropriate optional declarations, Lisp code usually
  rd> generates machine code that runs within 5% of the speed of comparable
  rd> C code (and on many problems, actually runs faster).  And you can 
  rd> decide which code to decorate with optional declarations on the 
  rd> basis of profiling.  Even without declarations, it's rarely less 
  rd> than 85% as fast as C when compiled.

I agree that some CL implementations have excellent performance, for a
dynamic language, and that profiling and adding declarations can
greatly improve speed, but this is not the first time you have made
comparisons with compiled C that made me raise my eyebrows ("actually
runs faster", "without declarations, 85% of comparable C code"). Do
you have a reference?
  
-- 
Eric Marsden                          <URL:http://www.laas.fr/~emarsden/>
From: Joe Marshall
Subject: Re: garbage collector pauses, help?
Date: 
Message-ID: <4qverwak.fsf@ccs.neu.edu>
"Blake D" <············@yahoo.com> writes:

> Hello,
>    I am new to lisp, and my team is a making a massively multiplayer online
> role playing game, similar in concept to asheron's call, everquest, or
> ultima online.  Of course, much more fun ;).

Is your entire team new to Lisp?

> Execution speed:
> Performance on the server is important. Each server must support hundreds
> (or thousands) of players at the same time. I am willing to accept an
> overall performance penalty for using lisp. I would be ok with code that
> runs at 50% or better of the speed of the same kind of server in c++. Can
> this be accomplished in lisp fairly easily?

No problem.

> Pauses:
> The server will produce a lot of garbage as players walk across the world
> and map parts are loaded and unloaded into memory. It would be hard to
> accommodate the server pausing for more than 3-5 seconds at most, ever. I
> read the manual about the LispWorks generational garbage collector, and I
> was thinking that the server should GC all generations periodically because
> it is supposed to run continuously, not ever shut down except for
> maintenance. It seems to me that if levels 2 and 3 are occasionally added to
> but never GC'ed, they would eventually consume all available memory.
>
> I have no idea yet how much memory the server will be using, but total
> memory usage ranging from 300 MB to a few gigabytes would be possible.
>
> Is there any way that garbage collection of all generations can be done
> without exceeding my pause limits?

It is likely that you can accomplish this, but it is also quite
possible that you will need to tune the GC and pay careful attention
to what you are doing to avoid getting very screwed.

It will depend on your rate of allocation, your rate of retention
(consing stuff that you throw away immeditately is inexpensive),
whether you have any large resources amenable to manual management,
etc.  It sounds obvious, but once you reach equilibrium, you must
deallocate a byte for every byte you allocate.  If you do not, you
will run out of memory.  With the proper tuning, however, and a
good-sized heap, you can run for a long time without having to perform
a major collection.