From: Markus Freericks
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <3vssnd$hd0@news.cs.tu-berlin.de>
In article <············@wildcard.demon.co.uk> Cyber Surfer <············@wildcard.demon.co.uk> writes:
> It could be written as:
> 
> class str {
>     int iCount;
>     char achData[1];
> };
> 
> Then you can allocate the extra memory. C++, like C, doesn't stop
> you from doing this.

No, but its wrong, wrong, wrong. Imagine a compiler that uses different
pointer types for differently-sized objects, e.g. a PC compiler.

str* foo=malloc(sizeof(str)+7000)

will give you a near pointer, while

str* foo=malloc(sizeof(str)+70000)

will give you some kind of far pointer, and likely break your program.

The Correct way is

class str {
   int iCount;
   char achData[MAXDATA]; /* whatever maximum size you want */
};

#define alloc_str_of_size(N) (str*)malloc(sizeof(str)-MAXDATA+N)

-- Markus

From: Cyber Surfer
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <807568856snz@wildcard.demon.co.uk>
In article <··········@news.cs.tu-berlin.de>
           ···@cs.tu-berlin.de "Markus Freericks" writes:

> No, but its wrong, wrong, wrong. Imagine a compiler that uses different
> pointer types for differently-sized objects, e.g. a PC compiler.
> 
> str* foo=malloc(sizeof(str)+7000)
> 
> will give you a near pointer, while
> 
> str* foo=malloc(sizeof(str)+70000)
> 
> will give you some kind of far pointer, and likely break your program.

That's why I never use sizeof to find the length of a string. I _know_
when a C/C++ string object is the size the compiler thinks it is, and
when it is not.

It may not be clean, but it works.
 
> The Correct way is
> 
> class str {
>    int iCount;
>    char achData[MAXDATA]; /* whatever maximum size you want */
> };
> 
> #define alloc_str_of_size(N) (str*)malloc(sizeof(str)-MAXDATA+N)

So you still have to know when the true size of the string will be
different to what the compiler thinks it is. I don't this is any more
"correct" than any other version we've seen here. The differences
are of style, and a matter of how much work a programmer is prepared
to do for the compiler.

I prefer to just let a compiler handle all this, but as far as I'm
aware, there's still no "standard" C++ string class. Please correct
me if I'm wrong, as it would be news to me.

Perhaps this is just as well, as we might end up with one that we're
not happy with. At present, if we're lucky enough, we can choose how
to design a string class. It's generally a choice that has been made
for me, if I'm using a C++ framework.
-- 
<URL:http://www.demon.co.uk/community/index.html>
<URL:http://www.enrapture.com/cybes/namaste.html>
"You can never browse enough."
From: Scott Wheeler
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <jywahd@bmtech.demon.co.uk>
In Article <············@wildcard.demon.co.uk> Cyber Surfer  writes:
>I prefer to just let a compiler handle all this, but as far as I'm
>aware, there's still no "standard" C++ string class. Please correct
>me if I'm wrong, as it would be news to me.

See PJ Plauger "The draft ANSI C++ library". This part of the draft can 
be expected not to change significantly. It's implemented in the 
Borland compiler.

Scott
From: Cyber Surfer
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <807820047snz@wildcard.demon.co.uk>
In article <······@bmtech.demon.co.uk>
           ······@bmtech.demon.co.uk "Scott Wheeler" writes:

> See PJ Plauger "The draft ANSI C++ library". This part of the draft can 
> be expected not to change significantly. It's implemented in the 
> Borland compiler.

Is that a book? If so, do you have the ISBN?

Any idea when MS or Symantec will implement it? Sadly, I can't choose
which compiler I use, at the moment, and Borland isn't on the list.

Thanks.
-- 
<URL:http://www.demon.co.uk/community/index.html>
<URL:http://www.enrapture.com/cybes/namaste.html>
"You can never browse enough."
From: Scott Wheeler
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <jywfhu@bmtech.demon.co.uk>
In Article <············@wildcard.demon.co.uk> Cyber Surfer  writes:
>> See PJ Plauger "The draft ANSI C++ library". This part of the draft 
can
>> be expected not to change significantly. It's implemented in the
>> Borland compiler.
>
>Is that a book? 

Yes, a very useful reference, comments on motivation, and an 
implementation ("The ANSI C library", ibid, is also highly 
recommended). It doesn't cover the Standard Template Library, a recent 
addition to the draft, but since I don't know of a PC compiler with the 
extensions to the template functionality required to support the STL, 
this isn't a major loss.

> If so, do you have the ISBN?

At home - I'll bring it in tomorrow if no-one answers first.

>Any idea when MS or Symantec will implement it? Sadly, I can't choose
>which compiler I use, at the moment, and Borland isn't on the list.

I think it is possible to licence Plauger's implementation for 
commercial use (in source form in the book, and available on disk), 
which would be ideal providing MS doesn't choke on the syntax. Other 
than that, I'd suggest implementing the parts of the string class that 
you are actually using, which shouldn't be too time-consuming, and 
extend it to the full spec as needed.

One minor point (probably only of interest if you program for the 
Pacific Rim): in the first draft, strings were defined as a normal 
class. They are now supposed to be defined as a class template, so that 
strings can be produced for wide and long characters. From memory, 
PJP's  implementation is for the original spec, but it is trivial for 
this to become a specialised implementation of the general template. In 
any case, the interface for the programmer hasn't changed 
significantly.

Scott
From: Cyber Surfer
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <807948815snz@wildcard.demon.co.uk>
In article <······@bmtech.demon.co.uk>
           ······@bmtech.demon.co.uk "Scott Wheeler" writes:

> >Is that a book? 
> 
> Yes, a very useful reference, comments on motivation, and an 
> implementation ("The ANSI C library", ibid, is also highly 
> recommended). It doesn't cover the Standard Template Library, a recent 
> addition to the draft, but since I don't know of a PC compiler with the 
> extensions to the template functionality required to support the STL, 
> this isn't a major loss.

Yes, the lack of such a compiler is a major pain. It's made worse
by a lack of _choice_ of compiler, but that's another matter. I can
at least choose which books I read. ;-)
 
> > If so, do you have the ISBN?
> 
> At home - I'll bring it in tomorrow if no-one answers first.

Excellent. Thanks.
 
> >Any idea when MS or Symantec will implement it? Sadly, I can't choose
> >which compiler I use, at the moment, and Borland isn't on the list.
> 
> I think it is possible to licence Plauger's implementation for 
> commercial use (in source form in the book, and available on disk), 
> which would be ideal providing MS doesn't choke on the syntax. Other 
> than that, I'd suggest implementing the parts of the string class that 
> you are actually using, which shouldn't be too time-consuming, and 
> extend it to the full spec as needed.

Well, I'll consider that. It'll depend on how much time it takes,
of course. Not everything that is possible is also practical, but
it's worth a try.
 
> One minor point (probably only of interest if you program for the 
> Pacific Rim): in the first draft, strings were defined as a normal 
> class. They are now supposed to be defined as a class template, so that 
> strings can be produced for wide and long characters. From memory, 
> PJP's  implementation is for the original spec, but it is trivial for 
> this to become a specialised implementation of the general template. In 
> any case, the interface for the programmer hasn't changed 
> significantly.

Thanks.
-- 
<URL:http://www.demon.co.uk/community/index.html>
<URL:http://www.enrapture.com/cybes/namaste.html>
"You can never browse enough."
From: David Boles
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <40djgo$f2v@zeppelin.convex.com>
In article <············@wildcard.demon.co.uk>,
Cyber Surfer  <············@wildcard.demon.co.uk> wrote:
>In article <······@bmtech.demon.co.uk>
>           ······@bmtech.demon.co.uk "Scott Wheeler" writes:
>> addition to the draft, but since I don't know of a PC compiler with the 
>> extensions to the template functionality required to support the STL, 
>> this isn't a major loss.
>
>Yes, the lack of such a compiler is a major pain. It's made worse
>by a lack of _choice_ of compiler, but that's another matter. I can
>at least choose which books I read. ;-)

I'd like to point out that if you had a choice, such a compiler does
indeed exist. In fact it was the PC reference compiler for the STL
implementation: IBM's CSet++ 2.1 or 3.0. I use the public
implementation of STL with it and have had no problems so far. It is
an *extremely* high quality compiler that is very well supported.
If you have to use C++ on a PC, there is no better choice.

Cheers,

Dave Boles
······@convex.com
From: Cyber Surfer
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <808121462snz@wildcard.demon.co.uk>
In article <··········@zeppelin.convex.com>
           ······@news.eng.convex.com "David Boles" writes:

> I'd like to point out that if you had a choice, such a compiler does
> indeed exist. In fact it was the PC reference compiler for the STL
> implementation: IBM's CSet++ 2.1 or 3.0. I use the public
> implementation of STL with it and have had no problems so far. It is
> an *extremely* high quality compiler that is very well supported.
> If you have to use C++ on a PC, there is no better choice.

I'll see what I can do, but at the moment I don't have any choice
over which C/C++ compiler I use. I'm currently using VC++ 1.5 and/or
SC++ 6.1, depending on which machine I use, but an upgrade may be
possible later this year.

Does CSet++ support the Win16 and Win32 platforms?

Thanks.
-- 
<URL:http://www.demon.co.uk/community/index.html>
<URL:http://www.enrapture.com/cybes/namaste.html>
"You can never browse enough."
From: ·······@inmind.com
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <40it8r$jmc@mujibur.inmind.com>
In <············@wildcard.demon.co.uk>, Cyber Surfer <············@wildcard.demon.co.uk> writes:
>
>Does CSet++ support the Win16 and Win32 platforms?
>

Currently, C/Set++ (renamed to VisualAge C++ with version 3.0) supports OS/2,
AIX and a couple of other platforms.  However, the NT port is currently in
progresss (I have the first NT beta) and will be released sometime around the
first of next year.  You can probably get on the beta, the OS/2 beta had two
public releases (the current NT beta is private) that you could pick up for $25.
Even the beta were far better compilers than most released commercial
compilers -- especially the Microsoft compilers which are (in my opinion) basically
junk.

However, the C/Set++ compilers are ALL 32-bit and will not be supporting 16-bit
code.  They will, however, LINK to 16-bit code via "thunks".  The NT compiler
will run on and support both NT and Win95.

Michael Lee Finney -- no relation to IBM other than testing of betas.
From: Cyber Surfer
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <808297290snz@wildcard.demon.co.uk>
In article <··········@mujibur.inmind.com> ·······@inmind.com  writes:

> Currently, C/Set++ (renamed to VisualAge C++ with version 3.0) supports OS/2,
> AIX and a couple of other platforms.  However, the NT port is currently in
> progresss (I have the first NT beta) and will be released sometime around the
> first of next year.  You can probably get on the beta, the OS/2 beta had two
> public releases (the current NT beta is private) that you could pick up for
>  $25.

Well, I'll probably be using Win95 long before I use NT, but if IBM
have yet to add Win95 support, then I can wait for it. It's _very_
unlikely that I'll be using OS/2 in the near future, but it's worth
knowing about.

> Even the beta were far better compilers than most released commercial
> compilers -- especially the Microsoft compilers which are (in my opinion)
>  basically
> junk.

I agree. I try to avoid VC++, which is why I'm curious about CSet++.
Almost all my current Win16 development is done using VC++, but I'm
hoping that'll change soon.
 
> However, the C/Set++ compilers are ALL 32-bit and will not be supporting 16-bit
> code.  They will, however, LINK to 16-bit code via "thunks".  The NT compiler
> will run on and support both NT and Win95.

Ah. Well, I need Win16 _and_ Win32 support at the moment. I'm not sure
if I can use Win32s, altho I'd love to just forget about the Win16 API
and only code for Win32, whilst using Win32s for running my code on Win16.

Sadly, this is not my choice to make. If it was, then I might choose
something like Allegro CL, or Dylan. Perhaps that's _why_ it's not my
choice...
 
> Michael Lee Finney -- no relation to IBM other than testing of betas.

You can never beta test enough. ;-)

Many thanks.
-- 
<URL:http://www.demon.co.uk/community/index.html>
<URL:http://www.enrapture.com/cybes/namaste.html>
"You can never browse enough."
From: Bruce Hoult
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <2890899001@hoult.actrix.gen.nz>
Scott Wheeler <······@bmtech.demon.co.uk> writes:
> Yes, a very useful reference, comments on motivation, and an 
> implementation ("The ANSI C library", ibid, is also highly 
> recommended). It doesn't cover the Standard Template Library, a recent 
> addition to the draft, but since I don't know of a PC compiler with the 
> extensions to the template functionality required to support the STL, 
> this isn't a major loss.

On the Macintosh, the latest versions of both Symantec C++ and Metrowerks
CodeWarrior have come with the STL since about May.

-- Bruce
From: Scott Wheeler
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <jywjri@bmtech.demon.co.uk>
In Article <············@wildcard.demon.co.uk> Cyber Surfer  writes:
>> See PJ Plauger "The draft ANSI C++ library". This part of the draft 
can
>> be expected not to change significantly. It's implemented in the
>> Borland compiler.
>
>Is that a book? If so, do you have the ISBN?

"The Draft Standard C++ Library", PJ Plauger, Prentice Hall, 
0-13-117003-1

Scott
From: Cyber Surfer
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <808086808snz@wildcard.demon.co.uk>
In article <······@bmtech.demon.co.uk>
           ······@bmtech.demon.co.uk "Scott Wheeler" writes:

> "The Draft Standard C++ Library", PJ Plauger, Prentice Hall, 
> 0-13-117003-1

Excellent, thanks. I have some book tokens to use up, so that'll help!
-- 
<URL:http://www.demon.co.uk/community/index.html>
<URL:http://www.enrapture.com/cybes/namaste.html>
"You can never browse enough."
From: David Chase
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <3vtcud$u4@wcap.centerline.com>
Cyber Surfer <············@wildcard.demon.co.uk> writes:
> > It could be written as:
> > 
> > class str {
> >     int iCount;
> >     char achData[1];
> > };
> > 
> > Then you can allocate the extra memory. C++, like C, doesn't stop
> > you from doing this.

But (as the next poster pointed out) this is wrong.

···@cs.tu-berlin.de (Markus Freericks) writes:
> The Correct way is
> 
> class str {
>    int iCount;
>    char achData[MAXDATA]; /* whatever maximum size you want */
> };
> 
> #define alloc_str_of_size(N) (str*)malloc(sizeof(str)-MAXDATA+N)

I think this is also wrong, though it is difficult to find language in the
C++ working paper that says one way or another.  You'll certainly cause
pain for the tools (debuggers, interpreters) that you use.  The "correct"
thing to do is to quit bending the rules and do the second indirection, as
in:

class str {
   int iCount;
   char * achData;
};

If you're going to be accessing the string in any significant way, it's
difficult to imagine that the single second indirection will cause you
any significant loss of efficiency.  Besides which, if you plan to use
C++, you'd be much better off with something like (and this is still
pretty low-level, as interfaces go):

class str {
  int iCount;
  char * achData;
public:
  inline int length() {return iCount;}
  inline char * string() {return achData;}
  str(int n) iCount(n), achData(new char[n]) : { memset(achData, 0, n); }
  ~str() { delete[] achData; }
};

You should really resist the urge to write your own memory allocator,
and instead agitate for better ones from vendors.  Or, find another
vendor.  Or, find one in the public domain written by someone else.

speaking for myself,

David Chase
From: Henry Baker
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <hbaker-0408951324110001@192.0.2.1>
In article <·········@wcap.centerline.com>, ·····@centerline.com (David
Chase) wrote:

> class str {
>    int iCount;
>    char * achData;
> };
> 
> If you're going to be accessing the string in any significant way, it's
> difficult to imagine that the single second indirection will cause you
> any significant loss of efficiency.  Besides which, if you plan to use
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^   <---- We were talking about 3X & greater
> C++, you'd be much better off with something like (and this is still
> pretty low-level, as interfaces go):
> 
> class str {
>   int iCount;
>   char * achData;
> public:
>   inline int length() {return iCount;}
>   inline char * string() {return achData;}
>   str(int n) iCount(n), achData(new char[n]) : { memset(achData, 0, n); }
>   ~str() { delete[] achData; }
> };

David:

You may have missed the start of this thread.  The major complaint is
not in simply _accessing_ the string, although there is definitely a
slowdown associated with that, but with _allocating/deallocating_ the
string.

The Stroustrup/Coplien school advocates using 2 mallocated objects in C++
when one will do in C, and this thread already described many of the
horrors of malloc inefficiencies.

So the point of the posting was that the Stroustrup/Coplien style
_guarantees_ bad performance, unless you are using class-specific
allocation, in which you still get bad performance (although possibly
not quite as egregious), but now difficult-to-maintain code, as well.

> You should really resist the urge to write your own memory allocator,
> and instead agitate for better ones from vendors.  Or, find another
> vendor.  Or, find one in the public domain written by someone else.

This is the point of the thread, but I was simply pointing out that
Stroustrup/Coplien acknowledge that their school has bad allocator
performance, and recommend that users write their own mallocator.

So you have now gone on record as disagreeing with Stroustrup & Coplien.

-- 
www/ftp directory:
ftp://ftp.netcom.com/pub/hb/hbaker/home.html
From: Scott Wheeler
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <jywahs@bmtech.demon.co.uk>
In Article <·······················@192.0.2.1> Henry Baker writes:
>The Stroustrup/Coplien school advocates using 2 mallocated objects in 
>C++ when one will do in C, and this thread already described many of 
>the horrors of malloc inefficiencies.

Only one allocation of heap memory (not *malloc*, please - that's C) 
unless you are using "new" to generate the string, because the top 
level of the object will go on the stack. Since temporaries and 
automatic variables go on the stack, this makes a substantial 
difference.

>So the point of the posting was that the Stroustrup/Coplien style
>_guarantees_ bad performance, 

I've yet to see any hard data posted. The 3x quoted repeatedly here has 
not yet been shown to be more than a guess.

Scott
From: Henry Baker
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <hbaker-0708950804440001@192.0.2.1>
In article <······@bmtech.demon.co.uk>, Scott Wheeler
<······@bmtech.demon.co.uk> wrote:

> In Article <·······················@192.0.2.1> Henry Baker writes:
> >The Stroustrup/Coplien school advocates using 2 mallocated objects in 
> >C++ when one will do in C, and this thread already described many of 
> >the horrors of malloc inefficiencies.
> 
> Only one allocation of heap memory (not *malloc*, please - that's C) 
> unless you are using "new" to generate the string, because the top 
> level of the object will go on the stack. Since temporaries and 
> automatic variables go on the stack, this makes a substantial 
> difference.

Excuse me??  You should read the Stroustrup/Coplien books again.  They
suggest newing/mallocating everything, & putting ref counts in the string
'headers'.  Your scheme could conceivably work, if one were willing to
copy around both the ptr to the string chars _and_ the length field, but
such an implementation would make a string 'handle' bigger than a usual
C++ pointer, and thereby cause a number of other problems with the C++
type system.  Furthermore, one would _not_ want to copy any ref counts
with such a scheme. In any case, I haven't seen this particular variant
recommended in any of the usual books.

> >So the point of the posting was that the Stroustrup/Coplien style
> >_guarantees_ bad performance, 
> 
> I've yet to see any hard data posted. The 3x quoted repeatedly here has 
> not yet been shown to be more than a guess.

The double allocator hit would be 2X by itself, all other things being
equal.  If the potentially greater fragmentation of the free list caused
a lengthening of freelist searches, then this would show up in further reduced
allocator and deallocator performance.

Accessing the string itself is at least 2X slower, due to the double
indirection.

As Paul Wilson pointed out, the 'rounding up' of each object to at least
16 bytes could cost severely in memory hierarchy costs (caching, VM, etc.),
and such costs would be most evident in the case where most strings were
short (i.e., <16 bytes, the most usual case).

Yes, my 3x guess is just a guess, but a well-educated guess based on
lots & lots of experience.

-- 
www/ftp directory:
ftp://ftp.netcom.com/pub/hb/hbaker/home.html
From: Scott Wheeler
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <jywcwc@bmtech.demon.co.uk>
In Article <·······················@192.0.2.1> Henry Baker writes:
>In article <······@bmtech.demon.co.uk>, Scott Wheeler
><······@bmtech.demon.co.uk> wrote:
>
>> In Article <·······················@192.0.2.1> Henry Baker writes:
>> >The Stroustrup/Coplien school advocates using 2 mallocated objects 
in
>> >C++ when one will do in C, and this thread already described many 
of
>> >the horrors of malloc inefficiencies.
>>
>> Only one allocation of heap memory (not *malloc*, please - that's C)
>> unless you are using "new" to generate the string, because the top
>> level of the object will go on the stack. Since temporaries and
>> automatic variables go on the stack, this makes a substantial
>> difference.
>
>Excuse me??  You should read the Stroustrup/Coplien books again.  They
>suggest newing/mallocating everything, & putting ref counts in the 
string
>'headers'.

The ARM does have an imcomplete reference-counted string class (I don't 
have Coplien to hand, or Stroustrop's later books). Of course, this 
means one would expect the number of "new"s per stack-based string 
object created in the program to be between 0 and 2, dependent on the 
program (new, *not* malloc - it simply isn't interchangable with new 
and neither S nor C recommend it here). 

>Your scheme could conceivably work, if one were willing to
>copy around both the ptr to the string chars _and_ the length field, 
>but such an implementation would make a string 'handle' bigger than a 
>usual C++ pointer, and thereby cause a number of other problems with 
>the C++ type system.  

If you are talking about the data representation

  class str {
    int nLen;        // string length
    char *pachData;  // ptr to character array containing data
  };

it's hardly "my" scheme (I use the draft-ANSI string class where 
available), it's perfectly legal C++, doesn't subvert the type system, 
and it works just fine. There is no reason why it should be of the same 
size as a pointer - I'm not proposing some odd cast. You do realise 
that there are the usual set of member functions? I omitted them 
because they weren't relevant to the present discussion.

>Furthermore, one would _not_ want to copy any ref counts
>with such a scheme. In any case, I haven't seen this particular 
>variant recommended in any of the usual books.

No, this is not a reference counted implementation - I though I made 
it clear that nLen is a length count (hence no need to use strlen() 
etc.). BTW, I came across an article a couple of years back (probably 
CUJ) giving a case study of developing and profiling a string class. 
The author passed through this version to a reference counted one, and 
found that his performance decreased. Probably strongly dependent on 
the application, but it does indicate that a non-reference-counted 
version is not necessarily naive.  

>> >So the point of the posting was that the Stroustrup/Coplien style
>> >_guarantees_ bad performance,
>>
>> I've yet to see any hard data posted. The 3x quoted repeatedly here 
has
>> not yet been shown to be more than a guess.
>
>The double allocator hit would be 2X by itself, all other things being
>equal.  If the potentially greater fragmentation of the free list 
>caused a lengthening of freelist searches, then this would show up in 
>further reduced allocator and deallocator performance.

*If* you do a double allocation. The whole point of using a reference 
counted version with copy-on-write is that you are guessing that in 
most cases you are copying an existing string - hence you only need to 
create a new "header" (on the stack) and increment the count in the 
"body" (srep struct in the ARM example) - no heap allocation, and 
follow one pointer.

And *if* the two allocations are equally costly, i.e. both or neither 
cause a page fault.

>Accessing the string itself is at least 2X slower, due to the double
>indirection.

The second point above holds here. 

>Yes, my 3x guess is just a guess, but a well-educated guess based on
>lots & lots of experience.

Why guess when you can measure? My guess, founded on experience 
profiling Noah's project planning software (the product was successful, 
but my company went under), is that you will find it very difficult to 
measure a systematic difference, and that if you *can* prove anything, 
it will be strongly dependent on the application area.

Scott
From: Matthew Kennel
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <408g35$cq1@martha.utk.edu>
Henry Baker (······@netcom.com) wrote:

: The double allocator hit would be 2X by itself, all other things being
: equal.  If the potentially greater fragmentation of the free list caused
: a lengthening of freelist searches, then this would show up in further reduced
: allocator and deallocator performance.

: Accessing the string itself is at least 2X slower, due to the double
: indirection.

: As Paul Wilson pointed out, the 'rounding up' of each object to at least
: 16 bytes could cost severely in memory hierarchy costs (caching, VM, etc.),
: and such costs would be most evident in the case where most strings were
: short (i.e., <16 bytes, the most usual case).

: Yes, my 3x guess is just a guess, but a well-educated guess based on
: lots & lots of experience.

This issue also comes up in numerical matrix libraries.  It's complicated
by the fact that there is also the issue of 2 dimensional indexing.

The questions are:

	*) Should matrix data be stored through an extra indirection? 
	   E.g. have a header and then an internal pointer to the data
           itself?

	*) Should columns vs rows be accessed through "dope vectors"
	   i.e. numerical recipes in C style allocation?

For speed, the answers are both "no".  But such a solution is difficult
to achieve in C++.

Given the first issue (indirect pointer to data) there are optimization
difficulties in trying to get it automatically elided out.

I believe these were discussed in a recent article in _Computers in Physics_,
whose empirical results demonstrated a typical 3x-4x speed reduction
compared to direct-access Fortran style, just as Mr Baker is predicting.

Some of it could be recovered by very smart optimizations but the language
definition and aliasing problems make this difficult.

<plug> I should point out that the standard Sather implementation does things
the "obvious" way advocated by Mr Baker, and puts the size fields of strings
and matrices along with the variable-sized data in one contiguous allocation
unit.

We hope to be a factor of "1" compared to idiomatic C on matrix indexing
(but with dynamically sized matrices who carry their bounds with them) with
the upcoming 1.1 release.  A test doing manual source transformations of the
optimizations intended to be automatic in that release resulted in identical
assembly code being generated for the naive matrix multiply loops for Sather
and native C (compiling using the GNU C compiler).  AIX Fortran generated
code that was about 10% faster.

There will also be Sather matrix classes which call the BLAS libraries for
their underlying computation.  </plug>

: -- T
: www/ftp directory:
: ftp://ftp.netcom.com/pub/hb/hbaker/home.html
From: David Chase
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <405cgj$8ei@wcap.centerline.com>
······@netcom.com (Henry Baker) writes:

> This is the point of the thread, but I was simply pointing out that
> Stroustrup/Coplien acknowledge that their school has bad allocator
> performance, and recommend that users write their own mallocator.
> 
> So you have now gone on record as disagreeing with Stroustrup & Coplien.

These things happen.  I imagine I'll survive the experience.
However -- 

Seeing as how there's been a perfectly adequate 32-bit Unix memory
allocator available from

  ftp.parc.xerox.com:pub/gc/gc.tar.Z

since about 1988, it seems like it would be unwise to write your
own.  (This memory allocator is also a garbage collector, but it is
trival to turn off the garbage collector and use it instead as a
fast, low-overhead malloc-style memory allocator.  At the time, it
was faster than ANY vendor-supplied malloc I could find, and it has
a very low overhead as well).  The fact that this is possible was
mentioned in Boehm and Weiser's Software Practice and Experience
paper (late 1988) where they describe use of garbage collection
techniques to implement a leak detector.

[ This is also available (I think) with real live support as "Great
  Circle" from Geodesic Systems (http://www.geodesic.com, or
  ·····@geodesic.com). ]

Furthermore, even if you have to write a memory allocator, very
reasonable advice on how-to was given as long ago as 1977, and
allocators designed according to that advice are still among the
best available (Norman Nielsen, "Dynamic Memory Allocation in
Computer Simulation", CACM 20:11, November, 1977).  This paper was
no big secret -- I remember being pointed to it in a graduate-level
OS principles class when I was an undergraduate.  It was in that
same class that the badness of roving-fit (and why) were first
described to me as well, so, again, this was not a big secret either.

speaking for myself,

David Chase
From: Brian N. Miller
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <40d33o$bd6@central.bbt.com>
·······@netcom.com (Henry Baker) writes:
|
|> This is the point of the thread, but I was simply pointing out that
|> Stroustrup/Coplien acknowledge that their school has bad allocator
|> performance, and recommend that users write their own mallocator.
|> 
|> So you have now gone on record as disagreeing with Stroustrup & Coplien.

I'm generally a fan of Stroustrup, but his reference counting 
allocation proposal sucks:

1) It expects application programmers to flawlessly obey the reference
   counting protocol at the source code level.  What a pain in the ass.
   There's just to much opportunity for mistakes. 

2) It's not well suited for concurrency.  Garbage collection can be run
   as a background thread.  Reference count triggered deallocation is
   not so readily parallelized.

3) It lacks compaction.  This is a big miss for some applications, and
   so I question its general applicability.

4) Recommending users write their own allocator in desparation is just
   silly.  Bootstrapping the heap is one thing a language's audience
   should almost never have to do.
From: Cyber Surfer
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <807569672snz@wildcard.demon.co.uk>
In article <·········@wcap.centerline.com>
           ·····@centerline.com "David Chase" writes:

> > > Then you can allocate the extra memory. C++, like C, doesn't stop
> > > you from doing this.
> 
> But (as the next poster pointed out) this is wrong.

I've still found it useful. Curiously, one example was in an allocator
for a Lisp heap. I knew _exactly_ how it was going to be used, and if
it was safe enough.

> I think this is also wrong, though it is difficult to find language in the
> C++ working paper that says one way or another.  You'll certainly cause
> pain for the tools (debuggers, interpreters) that you use.  The "correct"

I've never used a C/C++ interpreter, so I'm not worried about that,
and I've not used a debgugger for a few years now. Perhaps when I
upgrade my current C/C++ system, I'll try the debugger and test this.

> You should really resist the urge to write your own memory allocator,
> and instead agitate for better ones from vendors.  Or, find another
> vendor.  Or, find one in the public domain written by someone else.

I needed one with a GC, and as I couldn't get a commercial one, I write
my own. If you know of a C/C++ vendor that includes an alternative to
the malloc/free, new/delete style of allocation, please let me know.
I'll certainly be curious, and if the system includes support for a
platform that I'm developing for, then I may be able to use it.

Thanks.
-- 
<URL:http://www.demon.co.uk/community/index.html>
<URL:http://www.enrapture.com/cybes/namaste.html>
"You can never browse enough."
From: Hans Boehm
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <405i5n$e7r@news.parc.xerox.com>
Cyber Surfer <············@wildcard.demon.co.uk> writes:

>I needed one with a GC, and as I couldn't get a commercial one, I write
>my own. If you know of a C/C++ vendor that includes an alternative to
>the malloc/free, new/delete style of allocation, please let me know.
>I'll certainly be curious, and if the system includes support for a
>platform that I'm developing for, then I may be able to use it.

See ftp://parcftp.xerox.com/pub/gc/gc.html for pointers to a couple
of commercial garbage collectors for C/C++ (as well as a description
of a freely available one).  I know of no commercial C/C++
implementations that include GC support.  However, I believe NAG
Fortran 90 does.

Hans-J. Boehm
(·····@parc.xerox.com)
Standard disclaimer ...
From: Cyber Surfer
Subject: Re: allocator and GC locality (was Re: cost of malloc)
Date: 
Message-ID: <807820651snz@wildcard.demon.co.uk>
In article <··········@news.parc.xerox.com>
           ·····@parc.xerox.com "Hans Boehm" writes:

> See ftp://parcftp.xerox.com/pub/gc/gc.html for pointers to a couple
> of commercial garbage collectors for C/C++ (as well as a description
> of a freely available one).  I know of no commercial C/C++
> implementations that include GC support.  However, I believe NAG
> Fortran 90 does.

I'll make a note of that, in case I ever find a use for Fortran 90,
or any Fortran. Thanks for the C/C++ GC pointers.

I FTP'd your GC last year, but I didn't have time to rewrite the
makefile so I could build it. I'd still like to play with it, someday.

Thanks.
-- 
<URL:http://www.demon.co.uk/community/index.html>
<URL:http://www.enrapture.com/cybes/namaste.html>
"You can never browse enough."