From: ··········@questions.com
Subject: Speed of memory management in Lispworks
Date: 
Message-ID: <bi6ihtkku2qij28oaourtoctlcd02mhpaq@4ax.com>
I have noticed that even though Lispworks compiled code is very fast in tight
loops, it seems very slow when it involves memory allocation.  It seems to
take about 500 CPU cycles per cons cell, including whatever time that adds to
garbage collection.  In C++ I can allocate and free a memory cell by taking
it off a "free list" and putting it back on, which adds up to just a few CPU
instructions.  I'm confused about why Lispworks takes so long.  What is it
doing?  Is it the garbage collection that takes most of the time?

From: Stephen J. Bevan
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <m366eeg55t.fsf@yahoo.com>
··········@questions.com writes:
> I have noticed that even though Lispworks compiled code is very fast in tight
> loops, it seems very slow when it involves memory allocation.  It seems to
> take about 500 CPU cycles per cons cell, including whatever time that adds to
> garbage collection.  In C++ I can allocate and free a memory cell by taking
> it off a "free list" and putting it back on, which adds up to just a few CPU
> instructions.  I'm confused about why Lispworks takes so long.  What is it
> doing?  Is it the garbage collection that takes most of the time?

I don't know anything about LispWorks memory management but I can say
something about its stablemate MLWorks.  On a SPARC it would take 3
instructions to allocate space for a cons.  IIRC it was somewhere
between 9 and 15 on an ix86 (I've forgotten the exact sequences).
Consequently I would expect LispWorks to have similar numbers.
How did you arrive at the 500 cycles measurement?
From: ··········@questions.com
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <bcaiht06rmof2uaqv30n5ii7h3cf97pbbv@4ax.com>
On Sat, 02 Jun 2001 17:31:11 GMT, ·············@yahoo.com (Stephen J. Bevan)
wrote:

>How did you arrive at the 500 cycles measurement?

From a tight loop that repeatedly makes a short list of small integers and
assigns it to the same variable, thereby abandoning the previous short list.
I no longer have the exact code, because the free version of Lispworks exits
after a few hours.  If you have a better test, and don't have Lispworks
installed, you can post it and I will run it and post the results.  My
calculations took into account how many cons cells my short list had, how
many iterations, and the speed of my CPU.  A tight Lispworks loop without any
consing can take as little as 4 to 5 machine cycles per iteration, which
implies that most of the time consumed is related to the consing.
From: Kent M Pitman
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <sfwbso6n1mj.fsf@world.std.com>
··········@questions.com writes:

> On Sat, 02 Jun 2001 17:31:11 GMT, ·············@yahoo.com (Stephen J. Bevan)
> wrote:
> 
> >How did you arrive at the 500 cycles measurement?
> 
> From a tight loop that repeatedly makes a short list of small integers and
> assigns it to the same variable, thereby abandoning the previous short list.
> I no longer have the exact code, because the free version of Lispworks exits
> after a few hours.  If you have a better test, and don't have Lispworks
> installed, you can post it and I will run it and post the results.

Just wondering if, for example, you disabled multiprocessing so that you 
weren't timing process switches and other background processes.

You should always do mp:without-preemption around such things.  It's 
possible, for example, that in a tight loop that does nothing you run
fast enough to get no process switches, but that when you are consing,
either the extra function call or the slightly longer run time is
enough to trigger a process switch.
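
Something along these lines, say (a sketch only; CONS-LOOP is an
illustrative stand-in, and MP:WITHOUT-PREEMPTION is LispWorks-specific):

```lisp
;; Time a consing loop with the scheduler held off, so process
;; switches don't pollute the measurement.
(defun cons-loop (n)
  (let ((x nil))
    (dotimes (i n)
      (setq x (cons 1 2)))   ; abandon the previous cell each time
    x))

(compile 'cons-loop)

(mp:without-preemption
  (time (cons-loop 1000000)))
```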

> My
> calculations took into account how many cons cells my short list had, how
> many iterations, and the speed of my CPU.  A tight Lispworks loop without any
> consing can take as little as 4 to 5 machine cycles per iteration, which
> implies that most of the time consumed is related to the consing.

This wasn't really the part I was caring about, though I can see why you'd
think I was.  It is important to know this as well, though, in order to have
any useful discussion.  One should always talk about the speed of specific
code, not the speed of code in general, I think.  Different things often
run at different speeds.  That's why there is no single benchmark that
works for all purposes.
From: ··········@questions.com
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <qjhiht8plg47tv4lpi7stu4l6d6tgpe4u5@4ax.com>
On Sat, 2 Jun 2001 19:05:40 GMT, Kent M Pitman <······@world.std.com> wrote:

>You should always do mp:without-preemption around such things.  It's 

Ok, I did that, and it made it 8.8% faster.  I will always do that from now
on.  But in any case, my main question still stands.  What is Lispworks doing
that takes so much time?  I'm not trying to fine-tune the speed, but just to
understand what it could be doing during all that time.  Multiprocessing was
a factor I should have thought of, but it turns out not to be a big factor,
so there must be some other big factor I don't know about.
From: Janis Dzerins
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <878zj8eigv.fsf@asaka.latnet.lv>
··········@questions.com writes:

> On Sat, 2 Jun 2001 19:05:40 GMT, Kent M Pitman <······@world.std.com> wrote:
> 
> >You should always do mp:without-preemption around such things.  It's 
> 
> Ok, I did that, and it made it 8.8% faster.  I will always do that from now
> on.

I don't think you should *always* do that.

> But in any case, my main question still stands.  What is Lispworks doing
> that takes so much time?

Have you looked at disassembly?

-- 
Janis Dzerins

  If a million people say a stupid thing, it's still a stupid thing.
From: Stephen J. Bevan
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <m3zobqe66q.fsf@yahoo.com>
··········@questions.com writes:
> On Sat, 02 Jun 2001 17:31:11 GMT, ·············@yahoo.com (Stephen J. Bevan)
> >How did you arrive at the 500 cycles measurement?
> 
> From a tight loop that repeatedly makes a short list of small integers and
> assigns it to the same variable, thereby abandoning the previous short list.
> I no longer have the exact code, because the free version of Lispworks exits
> after a few hours.  If you have a better test, and don't have
> Lispworks ...

My better test, such as it is, is to try and coax LispWorks to
display the assembly for a function that uses cons so that you can see
exactly what instructions are used -- that's how you would find out
the sequences using MLWorks.  I've no idea if LispWorks in general, or
the particular version you are using, will let you do this though.
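
In standard CL the entry point is DISASSEMBLE; something like this
(function name is illustrative):

```lisp
;; Compile a tiny consing function and inspect the generated code.
(defun cons-once (a b)
  (cons a b))
(compile 'cons-once)
(disassemble 'cons-once)   ; prints implementation-specific assembly
```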
From: Kent M Pitman
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <sfwsnhircfr.fsf@world.std.com>
··········@questions.com writes:

> I have noticed that even though Lispworks compiled code is very fast in tight
> loops, it seems very slow when it involves memory allocation.  It seems to
> take about 500 CPU cycles per cons cell, including whatever time that adds to
> garbage collection.  In C++ I can allocate and free a memory cell by taking
> it off a "free list" and putting it back on, which adds up to just a few CPU
> instructions.  I'm confused about why Lispworks takes so long.  What is it
> doing?  Is it the garbage collection that takes most of the time?

I have no idea if you're right or not, but the first order of business
would be for you to say precisely how you arrived at the 500 CPU cycle
number.
From: ··········@questions.com
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <3jdihtci5hjg2stn8l5uup16ovf0cr3qan@4ax.com>
On Sat, 2 Jun 2001 17:58:32 GMT, Kent M Pitman <······@world.std.com> wrote:

>I have no idea if you're right or not, but the first order of business
>would be for you to say precisely how you arrived at the 500 CPU cycle
>number.

The 500 was from using list to allocate the cons cells.  When I use the
following code, it's reduced by a factor of 2.  But it's still an order of
magnitude more time-consuming than it seems like it should be.

Here are two versions of the loop, with and without the cons:

(defvar *X*)
(defun test1 (n) (dotimes (j n) (dotimes (i 1000) (setq *X* 0))))
(defun test2 (n) (dotimes (j n) (dotimes (i 1000) (setq *X* (cons 1 2)))))
(compile 'test1)
(compile 'test2)

Why not run them yourself and see if you get the same results?  The one
without the cons runs in a small fraction of the time, making it clear that
the time-consumption is related to the cons.
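
If you do run them, the comparison might go like this (iteration
count is arbitrary; at 1000 inner iterations each, 10000 outer loops
is 10 million conses for test2):

```lisp
(time (test1 10000))   ; no consing
(time (test2 10000))   ; 10 million conses
```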
From: Christopher J. Vogt
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <3B1961DE.7C518D57@computer.org>
··········@questions.com wrote:
> 
> On Sat, 2 Jun 2001 17:58:32 GMT, Kent M Pitman <······@world.std.com> wrote:
> 
> >I have no idea if you're right or not, but the first order of business
> >would be for you to say precisely how you arrived at the 500 CPU cycle
> >number.
> 
> The 500 was from using list to allocate the cons cells.  When I use the
> following code, it's reduced by a factor of 2.  But it's still an order of
> magnitude more time-consuming than it seems like it should be.
> 
> Here are two versions of the loop, with and without the cons:
> 
> (defvar *X*)
> (defun test1 (n) (dotimes (j n) (dotimes (i 1000) (setq *X* 0))))
> (defun test2 (n) (dotimes (j n) (dotimes (i 1000) (setq *X* (cons 1 2)))))
> (compile 'test1)
> (compile 'test2)
> 
> Why not run them yourself and see if you get the same results?  The one
> without the cons runs in a small fraction of the time, making it clear that
> the time-consumption is related to the cons.

To determine much of anything, you'd really need to run the same sort of code
on the same machine written in C or some other language for comparison.  Maybe
your machine is paging, maybe it is some cache issue.  It's easy to say what
it *should* be in C, but it is equally easy to say what it should be in Lisp,
and just as easy to be wrong.  Measuring and understanding performance is
generally much more difficult than most people think.
From: Bulent Murtezaoglu
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <87wv6uh2vl.fsf@nkapi.internal>
>>>>> "CJV" == Christopher J Vogt <····@computer.org> writes:
[...]
    CJV> To determine much of anything, you'd really need to run the
    CJV> same sort of code on the same machine written in C or some
    CJV> other language for comparison.  Maybe your machine is paging,
    CJV> maybe it is some cache issue.  It's easy to say what it
    CJV> *should* be in C, but it is equally easy to say what it
    CJV> should be in Lisp, and just as easy to be wrong.  [...]

All good advice.  Might I add that we *DO* have a disassemble facility 
in CL?  It is not too hard to notice a type check or register
allocation issue just by glancing at the disassembly output.
Compilers usually warn about doing type checks, but some remain
silent if the optimizer wasn't asked to try.

cheers,

BM
From: Carl Shapiro
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <ouy1yp2sdad.fsf@panix3.panix.com>
··········@questions.com writes:

                       In C++ I can allocate and free a memory cell by taking
> it off a "free list" and putting it back on, which adds up to just a few CPU
> instructions.  
                 
Lisp can have similar allocation and deallocation semantics when using
resourced storage (pools).
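
A minimal sketch of what resourcing a cons cell might look like
(names are illustrative, not a standard API):

```lisp
;; Keep freed cells on a push-down list and reuse them, so that
;; "allocation" and "deallocation" are each a few instructions,
;; much like a C++ free list.
(defvar *cell-pool* '())

(defun acquire-cell (a d)
  (let ((cell (pop *cell-pool*)))
    (cond (cell (setf (car cell) a
                      (cdr cell) d)
                cell)
          (t (cons a d)))))       ; pool empty: cons a fresh cell

(defun release-cell (cell)
  (push cell *cell-pool*))
```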
From: Lieven Marchand
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <m3u21yfv5r.fsf@localhost.localdomain>
··········@questions.com writes:

> In C++ I can allocate and free a memory cell by taking it off a
> "free list" and putting it back on, which adds up to just a few CPU
> instructions.

If you insist that a general malloc/free or new/delete implementation
will always work with just a few CPU instructions, you need to study
Paul Wilson's review of memory allocators
(ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps).

But yes, your test generates a lot of garbage, and gc will take some
time. Allocation in modern gc implementations is generally faster than
with malloc implementations. In the extreme case, it is an increase of
a pointer + a check whether the pointer has gone out of the range of
the current generation.
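
A toy model of that extreme case (illustrative only; a real collector
does this inline in a few machine instructions):

```lisp
;; The "heap" is a vector and the free pointer an index; allocating
;; NWORDS is one increment plus one range check.
(defvar *region* (make-array 4096))
(defvar *free-pointer* 0)

(defun bump-alloc (nwords)
  (let ((new (+ *free-pointer* nwords)))
    (when (> new (length *region*))
      (error "region full -- a real system would GC or grab a new region"))
    (prog1 *free-pointer*          ; return the old pointer: that's the object
      (setq *free-pointer* new))))
```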

-- 
Lieven Marchand <···@wyrd.be>
Making laws appears to win votes. Enforcing them doesn't. 
See Rule One.         Roger Burton West in the monastery.
From: Pierre R. Mai
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <87n17q4e20.fsf@orion.bln.pmsf.de>
··········@questions.com writes:

> I have noticed that even though Lispworks compiled code is very fast in tight
> loops, it seems very slow when it involves memory allocation.  It seems to
> take about 500 CPU cycles per cons cell, including whatever time that adds to
> garbage collection.  In C++ I can allocate and free a memory cell by taking
> it off a "free list" and putting it back on, which adds up to just a few CPU
> instructions.  I'm confused about why Lispworks takes so long.  What is it
> doing?  Is it the garbage collection that takes most of the time?

You are probably comparing apples and oranges there.

Let's take a look at the following function:

(defun time-cons (x)
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0))
           (fixnum x))
  (let ((res nil))
    (dotimes (i x)
      (declare (fixnum i))
      (dotimes (j 1000)
        (declare (fixnum j))
        (setq res (cons i j))))
    res))

When we time this function on 10000 outer loops (thereby consing 10M
conses, or around 80MB of space), LispWorks takes around 2.9 seconds
on an AMD K6-2/550.  That comes out at around 290ns or less than 160
cycles per iteration.
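
The cycle figure is just arithmetic on those numbers (assuming the
550 MHz clock):

```lisp
;; 10M conses in 2.9 s at 550 MHz:
(let* ((conses (* 10000 1000))                  ; 10 million
       (ns-per-cons (/ (* 2.9 1e9) conses))     ; about 290 ns each
       (cycles-per-cons (* ns-per-cons 0.55)))  ; 550 MHz = 0.55 cycles/ns
  (list ns-per-cons cycles-per-cons))           ; roughly 290 ns, 160 cycles
```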

Now note that the code is doing more than just allocating the conses,
it is also initializing both the car and cdr of those to some fixnum.
To determine the amount of time this takes, we compare the runtime
with the following function:

(defun time-set-cons (x)
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0))
           (fixnum x))
  (let ((res (cons 0 0)))
    (dotimes (i x)
      (declare (fixnum i))
      (dotimes (j 1000)
        (declare (fixnum j))
        (rplaca res i)
        (rplacd res j)))
    res))

This takes 1.1 seconds for 10000 outer loops, or around 110ns of the
290ns we determined earlier.  In reality this influence is likely to
be _greater_, though, since the memory changed in time-set-cons is
always in the 1st level cache, whereas in time-cons we modify an
ever-changing set of conses, and it is likely that the infirmary generation is
larger than the 1st level cache.

Experimentation using arrays of conses to simulate non-locality
effects on the cache take from 1.3s (1000 conses, i.e. a little over
12KB working set, fully inside the 32KB 1st level data-cache), through
1.5s (10000 conses, i.e. 80KB WS inside 2nd level data-cache), to 3.0s
(100000 conses, i.e. 800KB WS outside 2nd level data-cache).

So we will likely have to explain less than 180ns, or under
100 cycles, for the allocation and reclamation of each cons.

If we now take a look at the GC information that LW's extended-time
macro will give us, we can see that we spend nearly 1.3s of 3.1s
runtime in around 420 mark-and-sweep collections of the infirmary
generation.  That seems a tad too long, and I'm certain that tuning
LW's GC parameters would reduce that amount of time.

What remains is between 10ns and 50ns or 6-27 cycles per cons
allocated, which seems reasonable, given that allocation takes little
time in a well designed system.

On CMU CL, which hasn't the best of generational GCs, we get the
following numbers:

					Real	User	Sys	GC
TIME-SET-CONS:				  50	  50	   0	   0
TIME-CONS, 128KB infirmary:		8640	7810	 820	5290
TIME-CONS, default 2MB infirmary:	4160	3430	 730	 940
TIME-CONS, 20MB infirmary:		3440	2460	 970	  70

The system time is mostly because of ZERO page initialization of
memory, which CMU CL does by unmapping and remapping of pages on
Linux.  This hurts performance a bit in this instance, because we
initialize the cons anyway.

As we can see, large infirmary generations can help in the case of
massive ephemeral garbage creation, because of savings in the overhead
of collections, when collections are costly.

To show how much code is needed to implement the non-GC part of
TIME-CONS, here's the disassembly of it on CMU CL (which provided the
easiest to follow disassembly):

480106F0:       .ENTRY TIME-CONS()           ; FUNCTION
     708:       POP   DWORD PTR [EBP-8]
     70B:       LEA   ESP, [EBP-32]
     70E:       MOV   EAX, 671088651         ; No-arg-parsing entry point
     713:       XOR   ECX, ECX
     715:       JMP   L5
     717: L0:   XOR   EBX, EBX
     719:       JMP   L4
     71B: L1:   MOV   BYTE PTR [#x280001D4], 0
     722:       MOV   BYTE PTR [#x280001BC], 4
     729:       MOV   EAX, 8
     72E:       ADD   EAX, [#x8069724]       ; current_region_free_pointer
     734:       CMP   EAX, [#x80696F8]       ; current_region_end_addr
     73A:       JBE   L2
     73C:       CALL  #x80536F8              ; alloc_overflow_eax
     741: L2:   XCHG  EAX, [#x8069724]       ; current_region_free_pointer
     747:       LEA   EAX, [EAX+3]
     74A:       MOV   BYTE PTR [#x280001BC], 0
     751:       CMP   BYTE PTR [#x280001D4], 0
     758:       JEQ   L3
     75A:       BREAK 9                      ; Pending interrupt trap
     75C: L3:   MOV   [EAX-3], ECX
     75F:       MOV   [EAX+1], EBX
     762:       ADD   EBX, 4
     765: L4:   CMP   EBX, 4000
     76B:       JL    L1
     76D:       ADD   ECX, 4
     770: L5:   CMP   ECX, EDX
     772:       JL    L0
     774:       MOV   EDX, EAX
     776:       MOV   ECX, [EBP-8]
     779:       MOV   EAX, [EBP-4]
     77C:       ADD   ECX, 2
     77F:       MOV   ESP, EBP
     781:       MOV   EBP, EAX
     783:       JMP   ECX
     785:       NOP
     786:       NOP
     787:       NOP

In the above, the BYTE PTR stuff 

     71B: L1:   MOV   BYTE PTR [#x280001D4], 0
     722:       MOV   BYTE PTR [#x280001BC], 4

and

     74A:       MOV   BYTE PTR [#x280001BC], 0
     751:       CMP   BYTE PTR [#x280001D4], 0
     758:       JEQ   L3
     75A:       BREAK 9                      ; Pending interrupt trap

is interspersed interrupt handling code and can be ignored.

The inner and outer loops are pretty trivial, and the allocation stuff
is just the following:

     729:       MOV   EAX, 8
     72E:       ADD   EAX, [#x8069724]       ; current_region_free_pointer

We want to allocate 8 bytes and calculate the new value of the free
pointer after allocating 8 bytes.

     734:       CMP   EAX, [#x80696F8]       ; current_region_end_addr
     73A:       JBE   L2
     73C:       CALL  #x80536F8              ; alloc_overflow_eax

We check whether this was more than was available.  If so, we have to
call the overflow handler, which will start GC and/or switch to
another region, etc.  In the normal case we just fall through:

     741: L2:   XCHG  EAX, [#x8069724]       ; current_region_free_pointer

We write back the modified freepointer, and retrieve the old free
pointer, which is the pointer to our allocated cons cell, modulo
pointer tag handling done by:

     747:       LEA   EAX, [EAX+3]

Done.  After the interrupt handling code, we see just the
initialization of our cons (set car and cdr to i and j):

     75C: L3:   MOV   [EAX-3], ECX
     75F:       MOV   [EAX+1], EBX

And the rest of the inner and outer loops.

Running similar benchmarks on ACL we see that we can get as low as
1.2s/65 cycles total time (i.e. allocation + initialization + gc +
loop overhead) per cons, suggesting that that implementation's GC
default parameters are better suited for this scenario, in that the
infirmary GC scheme is fast enough to be called very often with little
overhead, and hence the infirmary generation fits into the 2nd level
cache, making memory accesses quite a bit faster.

Now back to your comparison with C++:

- Note first that on any modern CPU, any instruction that touches
  memory is going to take many cycles, especially if it has to touch
  normal memory (even on a 550 MHz machine, any instruction that
  touches PC100 SDRAM will take more than 10ns, and that doesn't
  include secondary effects from pipe-line stalls, etc.).

- The correct comparison to C/C++ would be to allocate and deallocate
  the cons cell using malloc/free or new/delete respectively.  On said
  AMD, the C version of time-cons takes 5s for 10M conses, the C++
  version takes 6.8s.

- If you allow optimizations based on the knowledge that the conses
  that are allocated become garbage at once, then you can get faster
  C/C++ performance, but you can also gain much faster performance
  from the CL version, as seen by time-set-cons, which on CMU CL
  performs as fast as the C variant of the same thing.  Ditto for
  manual pooling of allocation requests, etc.

- Note that normal allocation patterns have much more intricate
  life-cycle patterns, thereby making trivial manual allocation
  schemes impossible, yielding even higher differences in performance
  between automatic and manual storage reclamation schemes[1].

All in all, this example can give some interesting food for thought
for those interested in the inner workings of GC and implementations,
though it teaches us little about real-world performance levels.

Regs, Pierre.

Footnotes: 
[1]  With thanks to Duane Rettig...

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Rob Warnock
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <9fc7vk$dbdp3$1@fido.engr.sgi.com>
Pierre R. Mai <····@acm.org> wrote:
+---------------
| ...it is likely that the infirmary generation is larger than
| the 1st level cache.
+---------------

Do you really mean "infirmary" here? Doubtless this is a cross-language
difference in slang, but I'm curious as to the connotations of the word
"infirmary" in your native tongue. In English, an "infirmary" is a hospital
or clinic [a place to take care of sick people].

In the GC literature I've read, the slang term I've most often seen
used for a fixed-location ephemeral allocation area is "nursery" --
in English, a place where infant children (or plants) are cared for
(or grown), which they eventually leave (or are transplanted from),
never to return.


-Rob

p.s. In the U.S., the sequence of educational institutions is typically
thus: nursery school (or "day care", if there's no explicit educational
component) -> pre-school (a.k.a. kindergarten) -> grammar school -> middle
school (only in a few places) -> high school -> college -> "post-grad"
(after one's bachelor's degree) -> "post-doc" (after a PhD). [And then
"life", one hopes!]

-----
Rob Warnock, 31-2-510		<····@sgi.com>
SGI Network Engineering		<URL:http://reality.sgi.com/rpw3/>
1600 Amphitheatre Pkwy.		Phone: 650-933-1673
Mountain View, CA  94043	PP-ASEL-IA

[Note: Please don't use ·········@sgi.com or ········@sgi.com ]
From: Pierre R. Mai
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <87n17po4em.fsf@orion.bln.pmsf.de>
····@rigden.engr.sgi.com (Rob Warnock) writes:

> Pierre R. Mai <····@acm.org> wrote:
> +---------------
> | ...it is likely that the infirmary generation is larger than
> | the 1st level cache.
> +---------------
> 
> Do you really mean "infirmary" here? Doubtless this is a cross-language

No, of course not, I did mean nursery, as you point out.  I plead the
late hour of the night, and certain strange language pathways that
sometimes spring up in my brain.  That said, let's see how this might
have come about:

> difference in slang, but I'm curious as to the connotations of the word
> "infirmary" in your native tongue. In English, an "infirmary" is a hospital
> or clinic [a place to take care of sick people].

The picture that I might have had in mind, was that (especially in the
case at hand) most members of the nursery don't survive it, by virtue
of having little vitality left in them, i.e. of them being "infirm" in
some way.  Mortality rates in real nurseries luckily don't exceed 10%
most of the time, hence infirmary and not nursery somehow sprang to mind...

> In the GC literature I've read, the slang term I've most often seen
> used for a fixed-location ephemeral allocation area is "nursery" --
> in English, a place where infant children (or plants) are cared for
> (or grown), which they eventually leave (or are transplanted from),
> never to return.

I've always thought that this was in slight contradiction to the
conjecture underlying ephemeral or generational garbage collectors,
namely that of high infant mortality.  Since this was the sole reason
that ephemeral garbage collectors were thought to be a win at first,
it would be counter-productive to put nurture and care into those
infants.  That is of course in stark contrast to the nurseries in
real-life.

As it later turned out, the conjecture doesn't actually have to hold,
in order for ephemeral GC to be a win, though.

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: romeo bernardi
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <obtS6.27171$vt.256859@news1.tin.it>
Here's the result of a very simple-minded test I did.  I tried a
similar function on LW, Corman Lisp, and Petite Chez Scheme.
(People who think that Scheme is not Lisp please skip the results of
Chez).

It seems that, if the results of this test extend in some measure to
real programs, LW has room for improvement in memory management.

NOTES:

- Corman Lisp did not like LOOP, so I replaced the LOOP expression
  with DOTIMES.

- In all three systems, GET-INTERNAL-RUN-TIME in the CLs and RUN-TIME
  in PCS (which is documented as being equivalent to CL's
  GET-INTERNAL-RUN-TIME) apparently return _real_time_, or something
  very close to it, instead of run time.  Anyway, there were no other
  processes running.

- None of the systems had enough smarts to dead-code away the computation
  being timed.

- The systems were tested as they come out of the box, with no
  particular tuning.

================
;;; CL

(declaim (optimize speed (safety 0) (debug 0)))

(defconstant +hz-cpu+ 701.6e6)

(defun straconsa (n l)
  (let* ((inizio (get-internal-run-time))
         (foo (loop repeat n
                    do (make-list l)))
         (fine (get-internal-run-time))
         (qcons (* n l))
         (tempo (float (/ (- fine inizio) internal-time-units-per-second)))
         (cons-per-sec (/ qcons tempo))
         (cicli-per-cons (* tempo +hz-cpu+ (/ qcons))))
    (declare (ignore foo))
    (format t "~&cons: ~S~%" qcons)
    (format t "~&tempo: ~S~%" tempo)
    (format t "~&cons per sec.: ~S~%" cons-per-sec)
    (format t "~&cicli per cons: ~S~%" cicli-per-cons)))

================
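
For Corman, the timed expression presumably became something like this
(a sketch of the DOTIMES replacement, not the exact code used; N and L
are STRACONSA's parameters):

```lisp
;; LOOP-free equivalent of the timed expression; the return value
;; is irrelevant, only the consing counts.
(defun straconsa-loop (n l)
  (dotimes (i n)
    (make-list l)))
```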

WARNING! Scheme code follows!
Those who can't stand the sight, please skip the next 24 lines.

================
;;; Chez Scheme

(define hz-cpu 701.6e6)

(define internal-time-units-per-second 1000.0)

(define (straconsa n l)
  (let* ((inizio (cpu-time))
         (foo (repeat n
                (make-list l)))
         (fine (cpu-time))
         (qcons (* n l))
         (tempo (/ (- fine inizio) internal-time-units-per-second))
         (cons-per-sec (/ qcons tempo))
         (cicli-per-cons (* tempo hz-cpu (/ qcons))))
    (printf "cons: ~S~%" qcons)
    (printf "tempo: ~S~%" tempo)
    (printf "cons per sec.: ~S~%" cons-per-sec)
    (printf "cicli per cons: ~S~%" cicli-per-cons)))

================

RESULTS
-------
                        Cycles per cons (min/max of a few runs)
LW
(straconsa 1000 1000)     266/343
(straconsa 1000 100000)   1472/1534

Corman
(straconsa 1000 1000)     69/79
(straconsa 1000 100000)   369/379

PCS
(straconsa 1000 1000)     35/42
(straconsa 1000 100000)   74/75

====

Athlon / 701.6 MHz(*) / Windows ME
Lispworks Personal Edition 4.1.20
Corman Lisp 1.42
Petite Chez Scheme Version 6.0a

(*) As reported by a utility called 'cpuz'. Nominally 700 MHz.

--
Pierpaolo Bernardi
From: Pierre R. Mai
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <87puclo570.fsf@orion.bln.pmsf.de>
"romeo bernardi" <········@tin.it> writes:

> It seems that, if the results of this test extends in some measure to
> real programs, LW has room for improvement in memory management.

Like I wrote in my own silly-benchmark-of-the-week article, results
from benchmarks like these are pretty unlikely to give any real
insight into real-world programs, for a number of reasons:

1. Real world programs don't behave like this (generating tons of conses
   which are thrown away the moment they have been created and
   initialized).

2. If they did, it would be trivial to optimize away this sort of waste
   by resourcing the conses in question.  See point 1.

3. Since GC performance is highly correlated with the tuning of the GC
   parameters to the life-time patterns encountered, and given that
   real world programs don't do the thing we are talking about (see
   points 1 and 2), it seems highly likely that any serious
   implementation isn't going to come tuned to this improbable pattern
   out-of-the-box.

I have never looked at tuning GC parameters in LWL/LWW, hence I'll
leave speculations on better parameters to others.

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: romeo bernardi
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <WQyS6.28525$vt.278838@news1.tin.it>
"Pierre R. Mai" <····@acm.org> ha scritto nel messaggio
···················@orion.bln.pmsf.de...
> "romeo bernardi" <········@tin.it> writes:
>
> > It seems that, if the results of this test extends in some measure to
> > real programs, LW has room for improvement in memory management.
>
> Like I wrote in my own silly-benchmark-of-the-week article, results
> from benchmarks like these are pretty unlikely to give any real
> insight into real-world programs, for a number of reasons:

Yes, yes, you are right.  We both know that a CL implementation cannot
be evaluated by a 2-line loop.

I just wanted to test whether lispnewbie's impression had
some fundament in reality. As it happens, this impression
was not without base: at least in this artificial bench, LW
does not do well. Whether this has implications for real
use, clearly requires more research.   8-)


Cheers,
  Pierpaolo
From: Marco Antoniotti
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <y6cy9r8ntsz.fsf@octagon.mrl.nyu.edu>
··········@questions.com writes:

> I have noticed that even though Lispworks compiled code is very fast in tight
> loops, it seems very slow when it involves memory allocation.  It seems to
> take about 500 CPU cycles per cons cell, including whatever time that adds to
> garbage collection.  In C++ I can allocate and free a memory cell by taking
> it off a "free list" and putting it back on, which adds up to just a few CPU
> instructions.

And then get a "free list corrupt" message in RPM that not even

	rpm --rebuilddb

can fix?

I am really, really serious here.  This is the situation I am in right
now.  My RPM db is messed up, and all I am told I can do is
reinstall RPM 3.xx, rebuild the database (meanwhile screwing up the
dependencies of other packages) and hope for the best.

All of this because somebody, at some point, thought it was "faster"
to go the C++ way.

It is true that I am using free software and therefore I shouldn't
complain, but I thought this was a good cautionary tale about
non-GCed languages :)

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Raymond Toy
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <4n4rtwnn7e.fsf@rtp.ericsson.se>
>>>>> "lispnewbie" == lispnewbie  <··········@questions.com> writes:

    lispnewbie> garbage collection.  In C++ I can allocate and free a
    lispnewbie> memory cell by taking it off a "free list" and putting
    lispnewbie> it back on, which adds up to just a few CPU
    lispnewbie> instructions.  I'm confused about why Lispworks takes
    lispnewbie> so long.  What is it

It's been a long time since I looked, but malloc and free aren't all
that cheap.  Looking things up on a free list takes a lot of cycles,
and freeing does too.  At least the fast malloc that comes with XEmacs
looks pretty expensive for malloc and free.

For comparison, on the SPARC port of CMUCL, allocating a cons cell is
a handful of instructions to increment a pointer.  GC is a lot more,
though.


Ray
From: James Hague
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <3b1bdbf5_2@newsfeeds>
Raymond Toy:
>
> It's been a long time since I looked, but malloc and free aren't all
> that cheap.  Looking things up on a free list takes a lot of cycles,
> and freeing does too.  At least the fast malloc that comes with XEmacs
> looks pretty expensive for malloc and free.

malloc was traditionally coded with the assumptions that (a) most
dynamically allocated blocks will not be of a trivial size (i.e. allocating
blocks of 4 or 8 bytes is silly, as the header takes more space), and (b)
within reason, reducing fragmentation is more important than speed of
allocation (the underlying assumption being that calling malloc inside of
real-time loops is misguided).
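The fixed-size free-list scheme the original poster describes (and which
James contrasts with general-purpose malloc) can be sketched in a few
lines of C.  This is a hypothetical illustration, not anyone's actual
implementation: the names `pool_init`, `cell_alloc`, and `cell_free` are
made up, and a real allocator would grow the pool and worry about
thread safety.  The point is just that once the pool exists, allocation
and deallocation are each a pointer pop or push:

```c
#include <stddef.h>

/* Hypothetical fixed-size cell allocator: cells are carved from a
   static pool once, after which alloc/free are a pointer pop/push. */

typedef struct cell {
    struct cell *next;     /* doubles as the free-list link when unused */
    void *car, *cdr;       /* payload when in use */
} cell;

#define POOL_SIZE 1024
static cell pool[POOL_SIZE];
static cell *free_list = NULL;

static void pool_init(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;   /* thread every cell onto the list */
        free_list = &pool[i];
    }
}

static cell *cell_alloc(void)       /* pop: a handful of instructions */
{
    cell *c = free_list;
    if (c != NULL)
        free_list = c->next;
    return c;                       /* NULL when the pool is exhausted */
}

static void cell_free(cell *c)      /* push: likewise cheap */
{
    c->next = free_list;
    free_list = c;
}
```

What this sketch omits is exactly what James describes above: per-block
headers, variable sizes, and anti-fragmentation work, which is where a
general-purpose malloc spends its cycles.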

With OOP, these characteristics changed.  In a patch for Visual C++,
Microsoft incorporated a new memory manager that could improve allocation
speed by something like 40 times.  But you can't assume that every
implementation of malloc is designed to be performance-oriented.

James




From: Marco Antoniotti
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <y6celt0dnns.fsf@octagon.mrl.nyu.edu>
"James Hague" <···········@volition-inc.com> writes:

	...

> With OOP, these characteristics changed.  In a patch for Visual C++,
> Microsoft incorporated a new memory manager that could improve allocation
> speed by something like 40 times.  But you can't assume that every
> implementation of malloc is designed to be performance oriented.

Just recently we ported a piece of code from Solaris/Linux/IRIX to
Windows. The piece of code has worked flawlessly for years under these
UN*X environments.

Then we ported it to Windows and the system started behaving badly in a
random way.  We were compiling the piece of code under VC++.  After a
couple of hours of head scratching, we finally noticed that what
should have been a null pointer was not null.  It turned out that one
allocator function (note that we are working in C) was doing something
like

	zut* z;
        z = (zut*) malloc(sizeof(zut) * N);
	/* happily use 'z' hereafter. */

Now the question: why doesn't VC++'s 'malloc' execute the missing
'memset 0' we had to insert manually everywhere?
Or, why in the world should I worry about these things if I am not
writing a device driver for a network-enabled espresso machine? :)

lazily yours.

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Raymond Toy
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <4nu21wm1az.fsf@rtp.ericsson.se>
>>>>> "Marco" == Marco Antoniotti <·······@cs.nyu.edu> writes:

    Marco> Just recently we ported a piece of code from Solaris/Linux/IRIX to
    Marco> Windows. The piece of code has worked flawlessly for years under these
    Marco> UN*X environments.

    Marco> Then we ported it to Windows and the system started behaving badly in a
    Marco> random way.  We were compiling the piece of code under VC++.  After a
    Marco> couple of hours of head scratching, we finally noticed that what
    Marco> should have been a null pointer was not null.  It turned out that one
    Marco> allocator function (note that we are working in C) was doing something
    Marco> like

    Marco> 	zut* z;
    Marco>         z = (zut*) malloc(sizeof(zut) * N);
    Marco> 	/* happily use 'z' hereafter. */

    Marco> Now the question. Why doesn't VC++ 'malloc' execute the missing
    Marco> 'memset 0' whe had to manually insert everywhere?

But nowhere does the C standard say that a NULL pointer is really 0.
C++ has some other rule that says 0 is equivalent to NULL, but doesn't
actually say NULL is 0.  (I think.  I don't do C++ much anymore.)

(Or perhaps, I don't follow your example and comments.)

Ray
From: Marco Antoniotti
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <y6c66ecdjs1.fsf@octagon.mrl.nyu.edu>
Raymond Toy <···@rtp.ericsson.se> writes:

> >>>>> "Marco" == Marco Antoniotti <·······@cs.nyu.edu> writes:
> 
>     Marco> Just recently we ported a piece of code from Solaris/Linux/IRIX to
>     Marco> Windows. The piece of code has worked flawlessly for years under these
>     Marco> UN*X environments.
> 
>     Marco> Then we ported it to Windows and the system started behaving badly in a
>     Marco> random way.  We were compiling the piece of code under VC++.  After a
>     Marco> couple of hours of head scratching, we finally noticed that what
>     Marco> should have been a null pointer was not null.  It turned out that one
>     Marco> allocator function (note that we are working in C) was doing something
>     Marco> like
> 
>     Marco> 	zut* z;
>     Marco>         z = (zut*) malloc(sizeof(zut) * N);
>     Marco> 	/* happily use 'z' hereafter. */
> 
>     Marco> Now the question. Why doesn't VC++ 'malloc' execute the missing
>     Marco> 'memset 0' whe had to manually insert everywhere?
> 
> But nowhere in the C standard says a NULL pointer is really 0.  C++
> has some other rule that says 0 is equivalent to NULL, but doesn't
> actually say NULL is 0.  (I think.  I don't do C++ much anymore.)
> 
> (Or perhaps, I don't follow your example and comments.)

You are right.  It is just a rant of mine against all the pitfalls you
have to be aware of when the language itself does not help you.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Geoff Summerhayes
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <thpsihg6dq5te4@corp.supernews.com>
"Marco Antoniotti" <·······@cs.nyu.edu> wrote in message ····················@octagon.mrl.nyu.edu...
>
> Raymond Toy <···@rtp.ericsson.se> writes:
>
> > >>>>> "Marco" == Marco Antoniotti <·······@cs.nyu.edu> writes:
> >
> >     Marco> Just recently we ported a piece of code from Solaris/Linux/IRIX to
> >     Marco> Windows. The piece of code has worked flawlessly for years under these
> >     Marco> UN*X environments.
> >
> >     Marco> Then we ported it to Windows and the system started behaving badly in a
> >     Marco> random way.  We were compiling the piece of code under VC++.  After a
> >     Marco> couple of hours of head scratching, we finally noticed that what
> >     Marco> should have been a null pointer was not null.  It turned out that one
> >     Marco> allocator function (note that we are working in C) was doing something
> >     Marco> like
> >
> >     Marco> zut* z;
> >     Marco>         z = (zut*) malloc(sizeof(zut) * N);
> >     Marco> /* happily use 'z' hereafter. */
> >
> >     Marco> Now the question. Why doesn't VC++ 'malloc' execute the missing
> >     Marco> 'memset 0' whe had to manually insert everywhere?
> >
> > But nowhere in the C standard says a NULL pointer is really 0.  C++
> > has some other rule that says 0 is equivalent to NULL, but doesn't
> > actually say NULL is 0.  (I think.  I don't do C++ much anymore.)
> >
> > (Or perhaps, I don't follow your example and comments.)
>
> You are right.  It is just a rant of mine against all the pitfalls you
> have to be aware of when the language itself does not help you.
>

I'd prefer to think of it as a rant against programmers relying on
implementation-dependent behaviour. Just because xyz's lisp does

-(DEFVAR X '(1 2 3 4))
X
-(NREVERSE X)
(4 3 2 1)
-X
(4 3 2 1)

doesn't mean I should skip the SETF.

Geoff
From: Pierre R. Mai
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <877kyrlxpe.fsf@orion.bln.pmsf.de>
Raymond Toy <···@rtp.ericsson.se> writes:

> But nowhere in the C standard says a NULL pointer is really 0.  C++

Yes it does.  See sections 6.3.2.3 and 7.17 of ISO/IEC 9899:1999.  A
null pointer constant is:

       An integer constant expression with  the  value  0,  or
       such  an  expression  cast  to type void *, is called a null
       pointer constant.46)  If a null pointer constant is assigned
       to or compared for equality to a pointer,  the  constant  is
       converted to a pointer of that type.  Such a pointer, called
       a null pointer,  is  guaranteed  to  compare  unequal  to  a
       pointer to any object or function.

I.e. the following code fragment portably tests whether z is a null
pointer:

if (z == 0)
  stop_the_world();

Furthermore it is certain that the following expression is always
true:

 NULL == 0

The thing that is left open is the bit-pattern(s) that represent
null pointers at the memory level, which is relevant in this case to
some degree.

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Hartmann Schaffer
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <slrn9ho4vf.1e2.hs@paradise.nirvananet>
In article <··············@orion.bln.pmsf.de>, Pierre R. Mai wrote:
>Raymond Toy <···@rtp.ericsson.se> writes:
>
>> But nowhere in the C standard says a NULL pointer is really 0.  C++
>
>Yes it does.  See sections 6.3.2.3 and 7.17 of ISO/IEC 9899:1999.  A
>null pointer constant is:

i don't have the standard at hand, but i think the implication of those
sections is that "if the literal '0' is used in a pointer context it is
equivalent to the NULL pointer".  afaict, the main reason for the macro
NULL (usually implemented as a cast of 0 to a suitable pointer type) was
to provide for cases where the bit pattern for the null pointer is not
the machine implementation for the literal 0.

> ...

-- 

hs

----------------------------------------------------------------

"The cheapest pride is national pride.  It demonstrates the lack of
characteristics and achievements you can be proud of.  The worst loser
can have national pride"  - Schopenhauer
From: Pierre R. Mai
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <87wv6q3jly.fsf@orion.bln.pmsf.de>
··@paradise.nirvananet (Hartmann Schaffer) writes:

> In article <··············@orion.bln.pmsf.de>, Pierre R. Mai wrote:
> >Raymond Toy <···@rtp.ericsson.se> writes:
> >
> >> But nowhere in the C standard says a NULL pointer is really 0.  C++
> >
> >Yes it does.  See sections 6.3.2.3 and 7.17 of ISO/IEC 9899:1999.  A
> >null pointer constant is:
> 
> i don't have the standard at hand, but i think the implication of those
> sections is that "if the literal '0' is used in a pointer context it is
> equivalent to the NULL pointer".  afaict, the main reason for the macro
> NULL (usually implemented as a cast of 0 to a suitable pointer type) was
> to provide for cases where the bit pattern for the null pointer is not
> the machine implementation for the literal 0.

Just to close this whole thread out:  Please re-read my postings and
the standard.  NULL is _always_ defined as any kind of null pointer
constant (again see 7.17).  NULL is defined just for aesthetic
reasons.  This whole issue is completely seperate from the
"low-level" representation of null pointers, which can be any kind of
bit pattern, and can vary from pointer type to pointer type.

It was this issue of NULL vs. 0 that I was commenting on, not the
issue of "low-level" representation, which Raymond rightly pointed
out.  In fact I might have quoted a bit too concisely, thereby
misleading others into thinking I was speaking about the low-level
representation of null pointers.

For further information refer to the relevant standards.  The
meta-reason why I repeatedly brought up these unrelated language
issues is that I feel that if we discuss C, we should do it with the
same amount of accuracy that we afford CL.

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Barry Margolin
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <pCUS6.27$5t5.1087@burlma1-snr2>
In article <··············@orion.bln.pmsf.de>,
Pierre R. Mai <····@acm.org> wrote:
>Raymond Toy <···@rtp.ericsson.se> writes:
>
>> But nowhere in the C standard says a NULL pointer is really 0.  C++
>
>Yes it does.  See sections 6.3.2.3 and 7.17 of ISO/IEC 9899:1999.  A
>null pointer constant is:
>
>       An integer constant expression with  the  value  0,  or

That's a null pointer constant, not a null pointer.  He's talking about the
representation, not the syntax.

Assuming the following declarations:

  int *p;
  int i = 0;

You can assign a null pointer to p with:

  p = 0;

But neither:

  p = i;

nor

  memset(&p, 0, sizeof p);

are guaranteed to assign a null pointer to p.  It will work on most
architectures, mainly because the implementors chose to make the
representation mirror the syntax (perhaps because there was lots of old
code that made this assumption), but it's not required by the language.

Let's end this debate now.  This isn't comp.{lang,std}.c, and the issue
gets debated in those groups more than enough.  We don't need to have it
here.

-- 
Barry Margolin, ······@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: George Neuner
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <3b1d1317.500203174@helice>
On 04 Jun 2001 16:34:44 -0400, Raymond Toy <···@rtp.ericsson.se>
wrote:

>But nowhere in the C standard says a NULL pointer is really 0.  C++
>has some other rule that says 0 is equivalent to NULL, but doesn't
>actually say NULL is 0.  (I think.  I don't do C++ much anymore.)

All the standard says is that NULL is a pointer value distinguishable
from all valid pointers.  The value zero just happens to be convenient
on most machines.

George
From: Hrvoje Niksic
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <sxselsyhfhl.fsf@florida.arsdigita.de>
Raymond Toy <···@rtp.ericsson.se> writes:

>    Marco> Now the question. Why doesn't VC++ 'malloc' execute the missing
>    Marco> 'memset 0' whe had to manually insert everywhere?
> 
> But nowhere in the C standard says a NULL pointer is really 0.

The `memset' Marco is talking about has nothing to do with NULL
pointers, really.  From what I gather, the source (wrongly) assumed
that all the memory returned by malloc will be cleared with zeros.

As far as C goes, it is true that a NULL pointer need not have a value
of zero, but it can be represented by `0' in a pointer context.
Therefore:

char *foo = 0;
/* `foo' is now a NULL pointer, and comparing it to NULL or to 0
   will yield true.  However, it doesn't mean that &foo contains an
   all-zero bit pattern.  */

The C definition of NULL pointer is pretty weird, in fact so weird
that few C programmers understand it.  The C FAQ dedicates a whole
chapter to the topic.

For instance, under a strict C definition of NULL pointer, the
standard practice of calling memset(struct_ptr, 0, sizeof(*struct_ptr))
does not guarantee that the pointer members of STRUCT_PTR will be
initialized to NULL!
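Hrvoje's last point can be made concrete with a small C sketch.  The
struct and the two init functions below are hypothetical, invented just
for illustration: the memset version happens to produce null pointers
on machines where a null pointer is all-zero bits (i.e. most of them),
but only the explicit assignment is guaranteed portable by the
standard:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct with a pointer member, to illustrate the
   memset-vs-assignment distinction from the C FAQ. */
struct node {
    int value;
    struct node *next;
};

/* Non-portable: relies on null pointers being all-zero bit patterns. */
static void init_by_memset(struct node *n)
{
    memset(n, 0, sizeof *n);
}

/* Portable: assigns the null pointer constant explicitly, so the
   compiler emits whatever bit pattern a null pointer actually has. */
static void init_explicit(struct node *n)
{
    n->value = 0;
    n->next = NULL;
}
```

On common hardware both functions behave identically; the difference
only bites on architectures where the null pointer representation is
not all-zero bits.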
From: Barry Margolin
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <AsSS6.25$5t5.926@burlma1-snr2>
In article <···············@octagon.mrl.nyu.edu>,
Marco Antoniotti  <·······@cs.nyu.edu> wrote:
>Now the question. Why doesn't VC++ 'malloc' execute the missing
>'memset 0' whe had to manually insert everywhere?

Because malloc() just does memory allocation, not initialization.  If you
want the memory to be zeroed, use calloc() instead of malloc().

Note, however, that memset 0 on a pointer has undefined consequences, as
the other responder mentioned.  If you want to null out a pointer, you have
to assign a null pointer constant to it, not use memset() or calloc().
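Barry's two points side by side, as a hedged C sketch (the helper name
`make_zeroed_ints` is made up for illustration): calloc gives you
all-bits-zero memory where malloc gives you indeterminate memory, and
for plain integer data all-bits-zero really is the value 0.  For
pointer members, per his caveat, you would still assign a null pointer
constant explicitly rather than rely on calloc's zero fill:

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n ints, all guaranteed to read as 0.
   The equivalent malloc(n * sizeof(int)) would leave the contents
   indeterminate. */
static int *make_zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));   /* zero-filled; NULL on failure */
}
```

The caller still owns the memory and must free() it, exactly as with
malloc.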

-- 
Barry Margolin, ······@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Marco Antoniotti
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <y6c3d9gdjin.fsf@octagon.mrl.nyu.edu>
Barry Margolin <······@genuity.net> writes:

> In article <···············@octagon.mrl.nyu.edu>,
> Marco Antoniotti  <·······@cs.nyu.edu> wrote:
> >Now the question. Why doesn't VC++ 'malloc' execute the missing
> >'memset 0' whe had to manually insert everywhere?
> 
> Because malloc() just does memory allocation, not initialization.  If you
> want the memory to be zeroed, use calloc() instead of malloc().
> 
> Note, however, that memset 0 on a pointer has undefined consequences, as
> the other responder mentioned.  If you want to null out a pointer, you have
> to assign a null pointer constant to it, not use memset() or calloc().

Very good point indeed.  Ranting apart, this was a very good piece of
advice.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Pierre R. Mai
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <87k82sklak.fsf@orion.bln.pmsf.de>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> 	zut* z;
>         z = (zut*) malloc(sizeof(zut) * N);
> 	/* happily use 'z' hereafter. */
> 
> Now the question. Why doesn't VC++ 'malloc' execute the missing
> 'memset 0' whe had to manually insert everywhere?

Because zero initializing memory is not part of malloc's contract (see
section 7.20.3.3 of ISO/IEC 9899:1999).  If you want all bits zero
initialized memory, you need to use calloc (section 7.20.3.1).

This is not different from ANSI CL, where e.g. make-array is not
required to initialize the elements of the array it allocates, unless
you specify an initial-element or initial-contents.

> Or, why in the world should I worry about these things if I am not
> writing a device driver for a network-enabled espresso machine? :)

While I agree with you that C requires users to worry about too many
details that the system should worry about instead, I don't think that
in this particular instance C is to blame for the defect, IMHO.

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: George Neuner
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <3b1d15a7.500859357@helice>
On 04 Jun 2001 15:56:39 -0400, Marco Antoniotti <·······@cs.nyu.edu>
wrote:

>Now the question. Why doesn't VC++ 'malloc' execute the missing
>'memset 0' whe had to manually insert everywhere?

Because malloc() is a primitive function upon which you build more
complicated things ... such as allocators that automagically zero
memory.


>Or, why in the world should I worry about these things if I am not
>writing a device driver for a network-enabled espresso machine? :)

If you are about to fill the memory with something else, clearing it
was a waste of time.  'C' is a systems language - it is about
efficiency, not protecting you from yourself.

George
From: Tim Bradshaw
Subject: Re: Speed of memory management in Lispworks
Date: 
Message-ID: <nkjn17okywt.fsf@tfeb.org>
··········@questions.com writes:

> I have noticed that even though Lispworks compiled code is very fast in tight
> loops, it seems very slow when it involves memory allocation.  It seems to
> take about 500 CPU cycles per cons cell, including whatever time that adds to
> garbage collection.  In C++ I can allocate and free a memory cell by taking
> it off a "free list" and putting it back on, which adds up to just a few CPU
> instructions.  I'm confused about why Lispworks takes so long.  What is it
> doing?  Is it the garbage collection that takes most of the time?

This is a kind of belated followup, sorry.

In general a good modern GC can be made to be very fast indeed if
you're willing to tune it.  Although the kind of `application' you
have is pretty non-typical, it can be made very fast in a copying
generational GC by making the first generation very large (if you have
enough memory).  This causes the GC to run very infrequently, and to
do basically no work when it does run (as everything is garbage).

The typical allocation mechanism for such a GC is increment a pointer,
check a bound and initialize.  This is almost definitively fast - it's
really hard to see how any system can allocate much faster than this.
There are actually a couple of tricks - you may be able to avoid the
bounds check in some cases if you know the allocation size; you may be
able to stack allocate (which can avoid the bounds check and also be
more cache-friendly perhaps); and finally you may be able to avoid
initialising for things like arrays of non-pointer types.  Many Lisp
systems can do some or all of these things.  I have persuaded Lisp
systems to allocate at a pretty high fraction of memory bandwidth
without very much tuning.
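The increment-check-initialize allocation Tim describes can be sketched
in C.  This is a toy, not any real collector's code: the nursery is a
static buffer, the names `bump_alloc` and `NURSERY_BYTES` are invented,
and where a real generational GC would trigger a collection on hitting
the bound, this sketch just returns NULL:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator: the fast path of a copying
   generational GC is just pointer increment plus a bounds check. */

#define NURSERY_BYTES (1u << 20)
static uint8_t nursery[NURSERY_BYTES];
static uint8_t *alloc_ptr   = nursery;
static uint8_t *alloc_limit = nursery + NURSERY_BYTES;

static void *bump_alloc(size_t nbytes)
{
    nbytes = (nbytes + 7) & ~(size_t)7;   /* keep 8-byte alignment */
    if ((size_t)(alloc_limit - alloc_ptr) < nbytes)
        return NULL;                      /* real GC would collect here */
    void *p = alloc_ptr;
    alloc_ptr += nbytes;                  /* the whole fast path */
    return p;
}
```

This is also why Tim's tuning advice works: enlarging the first
generation just enlarges the buffer, so the slow path (the actual
collection) runs correspondingly less often.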

There was a thread where this was discussed a while ago (late last
year?).  If you look for something like `allocation' and either my
name or Duane Rettig's in deja (erm, google?) you will probably find
it.

In the specific case of LW I don't know enough about the GC to know
what the issues are.  However, you can get dramatically better
performance by testing with something like (make-array n :element-type
'fixnum), so I suspect what you may be seeing is that MAKE-LIST is not
really optimized; alternatively, if you loop doing CONS each time
you tend to drown in loop overhead.  Unfortunately the machine I have
to hand is a laptop, and it has a funny (slow) memory system like many
laptops, I think, so I don't really trust the numbers I get from it.

--tim