From: Bill Clementson
Subject: Why Arc isn't especially OO
Date: 
Message-ID: <jwBR7.11052$7y.115433@rwcrnsc54>
FYI, Paul Graham has posted a new article on his web site discussing why Arc
isn't especially OO:
http://www.paulgraham.com/noop.html

--
Bill Clementson

From: Tim Bradshaw
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <fbc0f5d1.0112120303.41376b36@posting.google.com>
"Bill Clementson" <···············@yahoo.com> wrote in message news:<·····················@rwcrnsc54>...
> FYI, Paul Graham has posted a new article on his web site discussing why Arc
> isn't especially OO:
> http://www.paulgraham.com/noop.html


He's very smart, isn't he?

I think there's a big backlash happening / going to happen against
OO-as-it-has-become, which really means some terribly restrictive
cultish thing involving obsessive data hiding, restrictive inheritance
protocols making programming unduly difficult (you want to support
this protocol? well, sorry you can't mix in this class that does it,
you have to reimplement it all with nice different bugs), tying
everything into class definitions and so on.  Of course this is
nothing to do with what object oriented programming actually once was,
or with what CLOS is.

So he's worked this out, I guess, and is positioning Arc as non-OO. 
He's missed the point a bit with CLOS, since it turns out that CLOS
isn't OO either (That's OO-the-cult as opposed to being object
oriented...), but maybe that's just anti-CLism, or just a matter of
style (I'm kind of an inverse Paul Graham: I use CLOS all the time,
but I don't think I do OOP).

Having taught CL courses over the last 5-6 years where I've tried to
demonstrate that CL really is OO, even if it has these inconvenient
things like method definitions outside classes and so on, I can see
that now I'll have to reposition it as `not *actually* OO even though
it looks a bit like it'.
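
To make the point concrete, this is the sort of thing I mean -- a
throwaway CLOS fragment (standard stuff, nothing exotic) where the class
and the methods are separate top-level forms:

(defclass point ()
  ((x :initarg :x :reader point-x)
   (y :initarg :y :reader point-y)))

;; The "protocol" is just a generic function; methods get defined
;; wherever it's convenient, not inside the class.
(defgeneric magnitude (thing))

(defmethod magnitude ((p point))
  (sqrt (+ (expt (point-x p) 2) (expt (point-y p) 2))))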

--tim
From: Espen Vestre
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <w6k7vsbsoy.fsf@wallace.ws.nextra.no>
··········@tfeb.org (Tim Bradshaw) writes:

> things like method definitions outside classes and so on, I can see
> that now I'll have to reposition it as `not *actually* OO even though
> it looks a bit like it'.

I can see your point - but isn't it a bit sad to surrender to the cultish
definition of OO? On the other hand - maybe we should start referring
to CLOS programming as "generic function programming" and refer, in
footnotes, to "Cult OO" as "a severely constrained special case of GFP" 
;-)?
-- 
  (espen)
From: Scott McKay
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <A0KR7.13893$5W5.5761440@typhoon.ne.mediaone.net>
"Espen Vestre" <·····@*do-not-spam-me*.vestre.net> wrote in message
···················@wallace.ws.nextra.no...
> ··········@tfeb.org (Tim Bradshaw) writes:
>
> > things like method definitions outside classes and so on, I can see
> > that now I'll have to reposition it as `not *actually* OO even though
> > it looks a bit like it'.
>
> I can see your point - but isn't it a bit sad to surrender to the cultish
> definition of OO? On the other hand - maybe we should start referring
> to CLOS programming as "generic function programming" and refer, in
> footnotes, to "Cult OO" as a "a severely constrained special case of GFP"

I've tried to make the point to Paul that he is setting up a Java-like
strawman of OO, then knocking it down.  So what, big deal.  So
instead of thinking about the useful things OO can provide, he's
throwing out the baby with the bathwater.

Here's a suggestion: stop calling what languages such as Common Lisp
and Dylan provide "object-oriented programming".  God forbid, don't
call it "function-oriented programming", because that will get other
people's knickers in a twist.  Some years ago I suggested
"protocol-oriented programming".
From: Tim Bradshaw
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <fbc0f5d1.0112121049.63cf5805@posting.google.com>
"Scott McKay" <···@mediaone.net> wrote in message news:<·······················@typhoon.ne.mediaone.net>...

> Some years ago I suggested "protocol-oriented
> programming".

I like this, or something with protocol in it, anyway.

--tim
From: Erik Naggum
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <3217166502392064@naggum.net>
* "Scott McKay" <···@mediaone.net>
| Some years ago I suggested "protocol-oriented programming".

  That is probably _the_ most constructive suggestion I have seen.

///
-- 
  The past is not more important than the future, despite what your culture
  has taught you.  Your future observations, conclusions, and beliefs are
  more important to you than those in your past ever will be.  The world is
  changing so fast the balance between the past and the future has shifted.
From: Andreas Bogk
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <87667cpcqs.fsf@teonanacatl.andreas.org>
"Scott McKay" <···@mediaone.net> writes:

> Here's a suggestion: stop calling what languages such as Common Lisp
> and Dylan provide "object-oriented programming".  God forbid, don't
> call it "function-oriented programming", because that will get other
> people's knickers in a twist.  Some years ago I suggested
> "protocol-oriented programming".

There's a solid scientific theory behind what we do, the
lambda& calculus[0].  And it does count as both functional
and object-oriented.  So why hide?

Andreas

[0] For those who don't know, not the regular lambda calculus,
    but extended by multiple dispatch.  See
      http://www.di.ens.fr/~castagna/pub.frametop.html
    Buy the book if you're interested in a calculus explaining
    CLOS and Dylan.

-- 
"In my eyes it is never a crime to steal knowledge. It is a good
theft. The pirate of knowledge is a good pirate."
                                                       (Michel Serres)
From: Scott McKay
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <vanS7.16599$5W5.7005623@typhoon.ne.mediaone.net>
"Andreas Bogk" <·······@andreas.org> wrote in message
···················@teonanacatl.andreas.org...
> "Scott McKay" <···@mediaone.net> writes:
>
> > Here's a suggestion: stop calling what languages such as Common Lisp
> > and Dylan provide "object-oriented programming".  God forbid, don't
> > call it "function-oriented programming", because that will get other
> > people's knickers in a twist.  Some years ago I suggested
> > "protocol-oriented programming".
>
> There's a solid scientific theory behind what we do, the
> lambda& calculus[0].  And it does count as both functional
> and object-oriented.  So why hide?
>

Because Java, e.g., has staked out what it calls "object-oriented"
and the model is broken.  We just disassociate ourselves from
the name.  And anybody who ever attended an old Lisp & Functional
Programming Conference will remember how bitter the fights are
over the word "functional".

Sometimes retreating to find a better point of entry is a better way
to win a battle.

Summary: it's not hiding, it's just politics.
From: Marco Antoniotti
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <y6c1yhxhlr9.fsf@octagon.mrl.nyu.edu>
"Scott McKay" <···@mediaone.net> writes:

> "Andreas Bogk" <·······@andreas.org> wrote in message
> ···················@teonanacatl.andreas.org...
> > "Scott McKay" <···@mediaone.net> writes:
> >
> > > Here's a suggestion: stop calling what languages such as Common Lisp
> > > and Dylan provide "object-oriented programming".  God forbid, don't
> > > call it "function-oriented programming", because that will get other
> > > people's knickers in a twist.  Some years ago I suggested
> > > "protocol-oriented programming".
> >
> > There's a solid scientific theory behind what we do, the
> > lambda& calculus[0].  And it does count as both functional
> > and object-oriented.  So why hide?
> >
> 
> Because Java, e.g., has staked out what it calls "object-oriented"
> and the model is broken.  We just disassociate ourselves with
> the name.  And anybody who ever attended an old Lisp & Functional
> Programming Conference will remember how bitter the fights are
> over the word "functional".
> 
> Sometimes retreating to find a better point of entry is a better way
> to win a battle.
> 
> Summary: it's not hiding, it's just politics.

I wholeheartedly agree.  I would go as far as claiming that a good
time to retreat is when the "hype level" gets too high :)

Anyway, I really like the term "protocol oriented programming". I
would also adopt KMP's "identity oriented programming".

As an aside, there is still the question of "distributed object
oriented programming".  Maybe, "distributed protocol programming" is
what we want?

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Kent M Pitman
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <sfwpu5h6b48.fsf@shell01.TheWorld.com>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> Anyway, I really like the term "protocol oriented programming". I
> would also adopt KMP's "identity oriented programming".

I think the word "programming" is what is really causing the problem.

Programming is an "internal" thing.  That is, it's on what Java calls
"the private side".  But from the outside, it doesn't really matter if
a thing is programmed one way or another, IMO.  That is, you can black
box things as we've done with BUILT-IN-CLASS, and it's not a lot different
than a sealed standard class (a la Dylan).  

From the outside, the issue is whether there is discovery and of what nature
that discovery is.

One might claim that Lisp and Java both have protocols, but that Lisp is
gf-centric and Java is interface-centric.   It's a pity these two aren't
unified, frankly, since each has its virtues.  C++ I don't see as having
protocols, really, and not even gf's, just overloading.

When I refer to "identity oriented", I'm less referring to how one programs
(which I regard as a personal matter) and more referring to the external
way one treats an object they didn't program.

And there's the question of reflectivity.  Can a program ask what another
program has available, or must one consult the doc?
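
For instance, one can ask the running image which methods would apply,
rather than consulting the doc:

  (compute-applicable-methods #'print-object (list 42 *standard-output*))
  ;; => the list of PRINT-OBJECT methods applicable to an integer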

I'd be comfortable saying Lisp is a protocol-oriented, gf-centric, 
call-by-identity, partially reflective, dynamic language.

I'd say Java is a protocol-oriented, class/interface-centric, 
call-mostly-by-identity, partially reflective, half-dynamic/half-static 
language.

I'd say C++ is a protocol-wannabe, class-centric, call-by-this-and-that,
overly static language.

> As an aside, there is still the question of "distributed object
> oriented programming".

For those who want to be dooped.

Though the acronym doop has already been used for dynamic object 
oriented programming and is a good one to stay away from.

> Maybe, "distributed protocol programming" is what we want?

Dunno even what "distributed" contributes here.
From: Marco Antoniotti
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <y6cr8px8wgt.fsf@octagon.mrl.nyu.edu>
Kent M Pitman <······@world.std.com> writes:

> Marco Antoniotti <·······@cs.nyu.edu> writes:
> 
> > Maybe, "distributed protocol programming" is what we want?
> 
> Dunno even what "distributed" contributes here.

This is in reference to an old article by Scott McKay about
"distributed objects" and how they seem to fit well with a current
practice, e.g. in Java.  The issue is whether the gf-centric nature of
CL can be adapted to these "distribution" ideas.

Sorry for the brevity

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.
From: Scott McKay
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <nltS7.17066$5W5.7190472@typhoon.ne.mediaone.net>
"Marco Antoniotti" <·······@cs.nyu.edu> wrote in message
····················@octagon.mrl.nyu.edu...
>
> Kent M Pitman <······@world.std.com> writes:
>
> > Marco Antoniotti <·······@cs.nyu.edu> writes:
> >
> > > Maybe, "distributed protocol programming" is what we want?
> >
> > Dunno even what "distributed" contributes here.
>
> This is in reference to an old article by Scott McKey about
> "distributed objects" and how they seem to fit well with a current
> practice, e.g. in Java.  The issue is whether the gf-centric nature of
> CL can be adapted to these "distribution" ideas.
>

Yeah, for me it's a little easier to get my head around "distributed
objects".  Distributed generic-functions-with-multi-methods makes my
brain hurt.
I've been doing some reading and thinking about this, but I don't have
anything to say that would clarify it.  The Java "enterprise bean" model
actually works quite well for some things, but there are other parts of
it that are hugely confusing.

I found the references to Java Spaces and Kali helpful, if anyone else is
interested.
From: Nils Goesche
Subject: Protocols?  WAS: Why Arc isn't especially OO
Date: 
Message-ID: <lk1yhx4sl8.fsf_-_@pc022.bln.elmeg.de>
Kent M Pitman <······@world.std.com> writes:

> I'd be comfortable saying Lisp is a protocol-oriented, gf-centric, 
> call-by-identity, partially reflective, dynamic language.
> 
> I'd say Java is a protocol-oriented, class/interface-centric, 
> call-mostly-by-identity, partially reflective, half-dynamic/half-static 
> language.
> 
> I'd say C++ is a protocol-wannabe, class-centric, call-by-this-and-that,
> overly static language.

I know (parts of) Java, C++ and CLOS; but could somebody be so kind
and enlighten me about what ``protocol'' means in this context (or
MOP)?  I always thought a protocol is something like Q.931 or X.75 :-)

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Erik Naggum
Subject: Re: Protocols?  WAS: Why Arc isn't especially OO
Date: 
Message-ID: <3217374073660749@naggum.net>
* Nils Goesche <······@cartan.de>
| I know (parts of) Java, C++ and CLOS; but could somebody be so kind
| and enlighten me about what ``protocol'' means in this context (or
| MOP)?  I always thought a protocol is something like Q.931 or X.75 :-)

  Those are _communications_ protocol specifications.  There are many kinds
  of protocols, but they are all concerned with distributed synchronized
  state transition and context-dependent information exchange.  The reasons
  for specifying communications protocols with such heavy-handed machinery
  are (1) that there is high value in conserving bandwidth, and (2) that
  they must be implementable as completely separate projects.  This is not
  different from specifying functions and data types and state transitions
  in a single software project.  For instance, you can only call read-line
  on an open character stream, you can call close only on an open stream,
  etc.  These specifications are entirely equivalent to what you must do in
  a communications protocol.

  In essence, a remote communications peer may be regarded as an "object"
  to which one sends "messages" in order to influence or interrogate its
  state.  The "information hiding" aspect of some object-oriented models
  acutely parallels the design of communicating peers, about which one
  should not know too much and really not force to change state outside of
  the protocol.  What an object or module or whatever exports may be seen
  as the elements of its (interaction) protocol.

  The distinction between a protocol and an API is that the protocol will
  make states and state transitions explicit, while they are usually hidden
  in commentaries and error conditions in API.  Something designed and
  implemented as a _protocol_ will usually be much more orderly, simply
  because the number of states and state transitions are reduced when they
  are made explicit and visible to the programmer.

  E.g., global variables scare many programmers because one may change the
  state of the whole system so easily and it is so hard to restore state.
  A global variable is hard to model accurately, but this is in sharp
  contrast to the special variable, which may be regarded as arguments to
  the functions called inside their bindings, each function propagating the
  entire set of global bindings in effect when the call is made and models
  their unbinding accurately.  However, most protocols that invent variable
  exchange fail to implement nesting behavior properly.  In fact, nesting
  and recursion appear to be uncommon in protocols designed by people who
  prefer a "flat" state space.  Numerous programming techniques would be so
  much more _conscious_ if they were designed as parts of protocols.
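
  To make that reading concrete: a special variable binding behaves like
  an implicit argument handed to everything called inside it, and it is
  "unbound" again on the way out.  A trivial sketch:

    (defvar *output-base* 10)

    (defun show (n)
      (write n :base *output-base*)
      (terpri))

    (let ((*output-base* 16))
      (show 255))                       ; prints FF inside the binding
    (show 255)                          ; prints 255 again outside it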
  
///
-- 
  The past is not more important than the future, despite what your culture
  has taught you.  Your future observations, conclusions, and beliefs are
  more important to you than those in your past ever will be.  The world is
  changing so fast the balance between the past and the future has shifted.
From: Nils Goesche
Subject: Re: Protocols?  WAS: Why Arc isn't especially OO
Date: 
Message-ID: <87y9k3t6pb.fsf@darkstar.cartan>
Erik Naggum <····@naggum.net> writes:

> * Nils Goesche <······@cartan.de>
> | I know (parts of) Java, C++ and CLOS; but could somebody be so kind
> | and enlighten me about what ``protocol'' means in this context (or
> | MOP)?  I always thought a protocol is something like Q.931 or X.75 :-)
> 
>   Those are _communications_ protocol specifications.  There are many kinds
>   of protocols, but they are all concerned with distributed synchronized
>   state transition and context-dependent information exchange.

[snip]

Thank you very much for the detailed explanation.  I printed it
out and have a lot to think about now...

Regards,
-- 
Nils Goesche
Ask not for whom the <CONTROL-G> tolls.

PGP key ID 0xC66D6E6F
From: Frank A. Adrian
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <yoCS7.2261$D%1.273408@news.uswest.net>
Scott McKay wrote:

> Because Java, e.g., has staked out what it calls "object-oriented"

This can be validated by the fact that of 27 papers in this year's OOPSLA 
proceedings, 25 were directly about Java.  The other two were on a VM for UML 
and on a graphical display notion for classes.

> and the model is broken.

And this can be validated by the fact that most of those papers were on 
trying to get Java to do things that Lisp has been able to do for the last 
15 or so years...

faa
From: Kenny Tilton
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <3C1703A8.2F2929AD@nyc.rr.com>
Bill Clementson wrote:
> 
> FYI, Paul Graham has posted a new article on his web site discussing why Arc
> isn't especially OO:
> http://www.paulgraham.com/noop.html
> 

Go figure. OO is a natural form of expression for me. I was doing OO
before the languages were OO. Maybe that is Graham's point, but I think
formalizing my ad hoc pseudo-OO-wannabe techniques was a Good Thing. And
CLOS especially does not really lock you in too badly, especially with
the MOP.

programmers sure do differ on basic stuff. :)

kenny
clinisys
From: Nils Goesche
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <lkpu5k4fnu.fsf@pc022.bln.elmeg.de>
Kenny Tilton <·······@nyc.rr.com> writes:

> Bill Clementson wrote:
> > 
> > FYI, Paul Graham has posted a new article on his web site
> > discussing why Arc isn't especially OO:
> > http://www.paulgraham.com/noop.html
> 
> Go figure. OO is a natural form of expression for me. I was doing OO
> before the languages were OO. Maybe that is Graham's point, but I think
> formalizing my ad hoc pseudo-OO-wannabe techniques was a Good Thing. And
> CLOS especially does not really lock you in too badly, especially with
> the MOP.
> 
> programmers sure do differ on basic stuff. :)

Well, he doesn't really say doing OO is bad, does he?  Just that he
personally never felt the need to use CLOS.  Actually, I sympathize
with most of the articles he writes about his `Arc' language; I don't
like the look of the long `multiple-value-bind' either.  But what I
absolutely, positively don't get is why he thinks all those (IMO)
minor issues justify all the effort of the creation of a new
language.  He could easily write a few macros that would turn CL into
the very language he describes in those articles, or something very
similar.  I guess he has become rich and doesn't know how to use his
time for something useful :-)
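
To illustrate the macro point, the long `multiple-value-bind' gripe is
one line of code away (a toy sketch, not anything Graham has published):

(defmacro mvb (vars form &body body)
  `(multiple-value-bind ,vars ,form ,@body))

;; (mvb (q r) (floor 7 2) (list q r)) => (3 1)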

Regards,
-- 
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9
From: Coby Beck
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <uzNR7.78992$Ga5.12283059@typhoon.tampabay.rr.com>
"Nils Goesche" <······@cartan.de> wrote in message
···················@pc022.bln.elmeg.de...
> Kenny Tilton <·······@nyc.rr.com> writes:
>
> > Bill Clementson wrote:
> > >
> > > FYI, Paul Graham has posted a new article on his web site
> > > discussing why Arc isn't especially OO:
> > > http://www.paulgraham.com/noop.html
> >
> > Go figure. OO is a natural form of expression for me. I was doing OO
> > before the languages were OO. Maybe that is Graham's point, but I think
> > formalizing my ad hoc pseudo-OO-wannabe techniques was a Good Thing. And
> > CLOS especially does not really lock you in too badly, especially with
> > the MOP.
> >
> > programmers sure do differ on basic stuff. :)
>
> Well, he doesn't really say doing OO is bad, does he?  Just that he
> personally never felt the need to use CLOS.  Actually, I sympathize
> with most of the articles he writes about his `Arc' language; I don't
> like the look of the long `multiple-value-bind' either.  But what I
> absolutely, positively don't get is why he thinks all those (IMO)
> minor issues justify all the effort of the creation of a new
> language.  He could easily write a few macros that would turn CL into
> the very language he describes in those articles, or something very
> similar.

This is what struck me very hard too.  But I didn't read it in great detail...

--
Coby
(remove #\space "coby . beck @ opentechgroup . com")
From: Kent M Pitman
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <sfwpu5kxji7.fsf@shell01.TheWorld.com>
"Coby Beck" <·····@mercury.bc.ca> writes:

> "Nils Goesche" <······@cartan.de> wrote in message
> ···················@pc022.bln.elmeg.de...
> > Kenny Tilton <·······@nyc.rr.com> writes:
> >
> > Well, he doesn't really say doing OO is bad, does he?  Just that he
> > personally never felt the need to use CLOS.  Actually, I sympathize
> > with most of the articles he writes about his `Arc' language; I don't
> > like the look of the long `multiple-value-bind' either.  But what I
> > absolutely, positively don't get is why he thinks all those (IMO)
> > minor issues justify all the effort of the creation of a new
> > language.  He could easily write a few macros that would turn CL into
> > the very language he describes in those articles, or something very
> > similar.
> 
> This is what struck me very hard too.  But I didn't read it in great
> detail...

I haven't read it at all, but I can still speak to the issue of this point.
I think it's related to the IF* thing, though I mention that only to show
another instance of a small matter turned big, when other small matters don't,
so that one has more understanding that this is a general issue, not one
specific to Paul.

It comes back to my thesis that languages are political parties and to
the thought experiment raised in my paper
http://world.std.com/~pitman/PS/Lambda.html where I suggest that EVEN
IF you vacated some other language's semantics and made it identical
to Lisp, or vice versa, the two languages might be momentarily at the
same point in design space, but their velocity vector, or perhaps
better, their acceleration (since we're really talking gravitational
force here), is different.  Over time, as long as there ARE two
languages, they would drift, not move forever after in lockstep
evolving the same.  In time, they would be different again.  It's
almost like them snapping back to their original destiny (as I vaguely
recall might have been the thesis in Asimov's The End of Eternity,
where I think they said that you could make local changes in time
without affecting the whole future of time because things would
eventually converge again).  (Ironic, I suppose that the "controlled"
aspect of design might lead to divergence but the "uncontrolled"
aspect of natural forces might lead to convergence.  Hmm. I'll have
to think about that.)  Anyway, my point is that the analysis here
about "small change" presumes you are measuring distance in design
space rather than acceleration due to gravity.  If you measure the
latter, you may find that the difference is much greater, even if the
two things are coincidentally co-located.

In plain English, it isn't where you are but where you're going that
defines whether there might be a need to diverge.  

And unlike in physics, politics uses "negotiated gravitational
attraction" where the quark named "spin" (as in "spin control") is
what makes the difference in who's going to be attracted to whom.

I don't know if that makes it right for people to diverge.  But
hopefully it makes it easier to understand that they did it for
reasons that are perhaps more non-trivial than they seem.  Always
negotiations on getting someone back into the fold should be on the
issue of "direction" not "position", or they will fall on deaf ears.
From: ·······@attbi.com
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <wkd71jsvif.fsf@attbi.com>
Kent M Pitman <······@world.std.com> writes:

> "Coby Beck" <·····@mercury.bc.ca> writes:
> 
> > "Nils Goesche" <······@cartan.de> wrote in message
> > ···················@pc022.bln.elmeg.de...
> > > Kenny Tilton <·······@nyc.rr.com> writes:
> > > <snip ...>
> > > language.  He could easily write a few macros that would turn CL into
> > > the very language he describes in those articles, or something very
> > > similar.
> > 
> > This is what struck me very hard too.  But I didn't read it in great
> > detail...

It could be that this is what will be (has been?) done. Paul Graham references
comments that were made by Jonathan Rees. On Jonathan's home page, under the
link related to jobs he has had, he mentions that he is currently working for a
company called Clear Methods Inc. His comments about the company:

  "a startup still somewhat in stealth mode. 
   We are using the enhanced productivity made possible by Lisp to make 
   tools for web application construction and deployment. The company's 
   Lisp dialect is disguised as an extension of XML, both to obtain 
   excellent XML integration and because of the immense popular prejudice 
   against Lisp."

   http://mumble.net/jar/industry.html

The web page for Clear Methods is a bit Spartan, indicating a Q4/2001 launch
and a slogan "Ending the Software Crisis":
http://www.clearmethods.com/

It is unclear who owns Clear Methods or what their product is. Maybe Arc is
either a "feeler" to gauge potential support or a "teaser" for functionality
that will be released by a company as a commercial product. This is, of course,
just speculation on my part ...

-- 
Bill Clementson
From: Kent M Pitman
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <sfwitbblngz.fsf@shell01.TheWorld.com>
·······@attbi.com writes:

> It is unclear who owns Clear Methods or what their product is. Maybe
> Arc is either a "feeler" to gauge potential support or a "teaser"
> for functionality that will be released by a company as a commercial
> product. This is, of course, just speculation on my part ...

I'm pretty sure that whatever Clear Methods' product is, it isn't
Paul Graham's Arc, if that's what you're hinting at.

Language designers just like to chat among each other, like any other
interest group.
From: ·······@attbi.com
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <wksnaf65b5.fsf@attbi.com>
Kent M Pitman <······@world.std.com> writes:

> ·······@attbi.com writes:
> 
> > It is unclear who owns Clear Methods or what their product is. Maybe
> > Arc is either a "feeler" to gauge potential support or a "teaser"
> > for functionality that will be released by a company as a commercial
> > product. This is, of course, just speculation on my part ...
> 
> I'm pretty sure that whatever Clear Methods' product is, it isn't
> Paul Graham's Arc, if that's what you're hinting at.
 
I wasn't "hinting" at anything; I noticed that there were some
similarities between the stated goals for Arc and the work that Jonathan 
Rees was doing and made a speculative comment on that. If they did happen to 
be collaborating on a product, I would view that as a positive thing as it 
would make more sense to me. I have trouble understanding why anyone would try 
to create a new dialect of Lisp and, if Arc happened to be a precursor to a
product built on Lisp but customized for web site creation, I would find that 
more logical than an attempt to create a new lisp dialect just for the hell of 
it. 

-- 
Bill Clementson
From: Lieven Marchand
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <m33d2fj9v3.fsf@localhost.localdomain>
Kent M Pitman <······@world.std.com> writes:

> It comes back to my thesis that languages are political parties and to
> the thought experiment raised in my paper
> http://world.std.com/~pitman/PS/Lambda.html where I suggest that EVEN
> IF you vacated some other language's semantics and made it identical
> to Lisp, or vice versa, the two languages might be momentarily at the
> same point in design space, but their velocity vector, or perhaps
> better, their acceleration (since we're really talking gravitational
> force here), is different.  Over time, as long as there ARE two
> languages, they would drift, not move forever after in lockstep
> evolving the same.  In time, they would be different again.  It's
> almost like them snapping back to their original destiny 

The C/C++ community did perform a form of your thought experiment,
with one of the original goals of the C++ standardisation committee
being to let ANSI Classic C be as much a subset of ISO C++ as
possible, in which they succeeded remarkably well. (Most differences
are of the trivia kind.) The updated C standard has taken off in a
very different direction and the new C++ standard will probably move
further from the common subset too. And this is with a reasonable
amount of overlap between the two standards communities.

-- 
Lieven Marchand <···@wyrd.be>
She says, "Honey, you're a Bastard of great proportion."
He says, "Darling, I plead guilty to that sin."
Cowboy Junkies -- A few simple words
From: James A. Crippen
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <m3ofl2h3rd.fsf@kappa.unlambda.com>
Nils Goesche <······@cartan.de> writes:

> But what I absolutely, positively don't get is why he thinks all
> those (IMO) minor issues justify all the effort of the creation of
> a new language.  He could easily write a few macros that would
> turn CL into the very language he describes in those articles, or
> something very similar.  I guess he has become rich and doesn't
> know how to use his time for something useful :-)

Now now, don't be too mean (although I'd love to be rich and have time
to tinker with new languages too).  I'd bet that the first
implementation *will* be in a Lisp, but the whole point is to have a
small and easily compiled language.  Scheme is small but not easily
compilable, and CL is large.  Recall that his method of web-service
using Lisp was to spawn processes for each new connection.  Smaller
processes (using smaller languages) spawn more quickly, and take up
less memory.  They also run faster.

I understand his purpose perfectly.  CL is really too large for the
purpose he applied it to.  If you don't need CLOS or hash tables for
instance, then you're losing lots of code space in your binary.
Trimming down CL is a start, but he's taking the chance to tinker with
a new language design, which is fun enough, and might eventually be
useful.

And the Lisp Hacker that has not written their own Lisp-like language
is not the true Lisp Hacker.

'james

-- 
James A. Crippen <·····@unlambda.com> ,-./-.  Anchorage, Alaska,
Lambda Unlimited: Recursion 'R' Us   |  |/  | USA, 61.20939N, -149.767W
Y = \f.(\x.f(xx)) (\x.f(xx))         |  |\  | Earth, Sol System,
Y(F) = F(Y(F))                        \_,-_/  Milky Way.
From: Tim Bradshaw
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <fbc0f5d1.0112140227.473c0aa7@posting.google.com>
·····@unlambda.com (James A. Crippen) wrote in message news:<··············@kappa.unlambda.com>...
> Scheme is small but not easily
> compilable, and CL is large.

CL is large?  What on earth are languages like Java / C++ then?  Oh I
know, there's some theoretical minimal core of Java which is `small'
but I don't think you can count it as a Java system unless it has at
least a Gb of libraries.  Likewise Perl, Python (if not now, soon) &c
&c.

CL is really really small.

And this `it takes a long time to start' stuff.  CLISP starts *really*
fast on my semi-antique Sun.

--tim
From: Gabe Garza
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <adwmxwxt.fsf@kynopolis.org>
·····@unlambda.com (James A. Crippen) writes:

> I understand his purpose perfectly.  CL is really too large for the
> purpose he applied it to.  If you don't need CLOS or hash tables for
> instance, then you're losing lots of code space in your binary.

   The purpose of responding to http requests, or the implementation 
method of spawning a new process for each request?  If you instead
implement http responses using threads (as, e.g., cl-http does) you 
get faster "process" start-up times then spawning a new OS-level process
and the size of the executable (environment, interpreter, compiler, 
delivered app, whatever) and language becomes largely irrelevant.

   Of course, this assumes that Lisp has threads, which, unfortunately,
is arguable.

Gabe Garza
From: ········@acm.org
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <nDqS7.30064$DO.3612592@news20.bellglobal.com>
Gabe Garza <·······@ix.netcom.com> writes:
> ·····@unlambda.com (James A. Crippen) writes:
> > I understand his purpose perfectly.  CL is really too large for the
> > purpose he applied it to.  If you don't need CLOS or hash tables for
> > instance, then you're losing lots of code space in your binary.

> The purpose of responding to http requests, or the implementation
> method of spawning a new process for each request?  If you instead
> implement http responses using threads (as, e.g., cl-http does) you
> get faster "process" start-up times then spawning a new OS-level
> process and the size of the executable (environment, interpreter,
> compiler, delivered app, whatever) and language becomes largely
> irrelevant.

> Of course, this assumes that Lisp has threads, which, unfortunately,
> is arguable.

This points nicely to the "C10K" perspective
<http://www.kegel.com/c10k.html>.

There are multiple strategies, that may even be mixable, and the
"obvious" answers _aren't necessarily correct_.

I'd argue that having a Whopping Huge Shared Binary, in the context of
OSes that cope well with shared binaries, is a Huge Win if you want to
spawn new processes a lot.

If the CLOS or hash table support sits disused in a corner of a Big
Shared Binary, it doesn't do anything to hurt performance.  The same
is true for just about any other bit of "linked-in" stuff you might
point to.

The place where "Big Language Specification" becomes a _problem_ for
spawning a process is when that leads to having to do a lot of
load-time work.  If setting up CLOS requires running lots of load-time
code, and requires allocating a lot of memory, that's a "big lose."

This is exactly the thing that is irritating about making use of
interesting libraries with CL, Perl, Python, Scheme, or, for that
matter, C/C++.

 -> If you use Perl for something, and are using lots of CPAN libs,
    application startup time will include loading up and allocating
    memory to those libs.  The loaded modules are NOT going to be
    shared; if you spawn 15 Perl processes that each load an LDAP
    driver, you may have just 1 copy of the Perl binaries, but you've
    got 15 copies of the LDAP wrappers.

 -> If you run LaTeX, and are always using some custom macro set, that
    macro set has to get loaded, consuming time and memory each time.
    If there are multiple instances, the loaded macros repeat by
    instance; they're not shared.

 -> It takes my CMU/CL instance something like 15 seconds to load in
    the CLG library.  That means that application start time is
    _guaranteed_ to have an initial latency of _at least_ 15 seconds,
    and involves having a bunch of _dynamic_ memory allocation.

 -> Netscape Navigator/Communicator loads up all sorts of font info,
    user info, blah, blah, blah, which leads to bloated start up
    times, and, again, no sharing of the outside-the-binary data.

You don't want to have a lot of respawnings of CL environments;
generating the user's environment seems to be expensive enough that
you don't want to do it too often.

This is where the Apache "mod_lisp" module came in; if loads get heavy
and a single CL instance doesn't cope well enough when hit by lots of
concurrent requests, it would probably be a good thing to spawn
multiple CL instances, and divide requests up amongst them.

.. And I love the serendipity of the occasionally spectacularly
entertaining randomly selected .signature; I couldn't have chosen a
better one if I tried :-).
-- 
(concatenate 'string "cbbrowne" ·@ntlug.org")
http://www.ntlug.org/~cbbrowne/finances.html
"Real concurrency---in which one program actually continues to
function while you call up and use another---is more amazing but of
small use to the average person.  How many programs do you have that
take more than a few seconds to perform any task?"
-- New York Times, 4/25/89
From: Daniel Barlow
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <87pu5hwuh5.fsf@noetbook.telent.net>
········@acm.org writes:

> The place where "Big Language Specification" becomes a _problem_ for
> spawning a process is when that leads to having to do a lot of
> load-time work.  If setting up CLOS requires running lots of load-time
> code, and requires allocating a lot of memory, that's a "big lose."

To be picky: allocation is cheap.  Anonymous memory allocation might
even be as cheap as adjusting the value of the "next-free-word"
pointer.  _Writing_ to that allocated memory (e.g. doing fixups) is
the expensive bit, because then your VM actually has to do some work.

KDE suffered badly from this at one time; due to C++ vtables they have
a zillion relocations in every program, which makes applications 
take an age to start up.  There's a resource about this at

  http://www.suse.de/~bastian/Export/linking.txt


-dan

-- 

  http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources 
From: ········@acm.org
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <J2sS7.48854$Us5.3780282@news20.bellglobal.com>
Daniel Barlow <···@telent.net> writes:
> ········@acm.org writes:
> > The place where "Big Language Specification" becomes a _problem_
> > for spawning a process is when that leads to having to do a lot of
> > load-time work.  If setting up CLOS requires running lots of
> > load-time code, and requires allocating a lot of memory, that's a
> > "big lose."

> To be picky: allocation is cheap.  Anonymous memory allocation might
> even be as cheap as adjusting the value of the "next-free-word"
> pointer.  _Writing_ to that allocated memory (e.g. doing fixups) is
> the expensive bit, because then your VM actually has to do some
> work.

I guess I thought it would go without saying that this wasn't merely
doing a malloc(BIGNUM); I was certainly thinking more along the lines
of (for instance) a CLOS implementation going off and instantiating a
whole horde of (say) metaobject protocol objects.

The pathological case would be the scenario where you build the
(sometimes-often-disputed) "minimal CL core," and that core then loads
in, as a combination of load-time/run-time, the rest of the CL system.

In effect, I'm not at all concerned about the VM issues; I'm concerned
about the work involved computing whatever it is that's getting put
into VM.

> KDE suffered badly from this at one time; due to C++ vtables they
> have a zillion relocations in every program, which makes
> applications take an age to start up.  There's a resource about this
> at

>   http://www.suse.de/~bastian/Export/linking.txt

I've noticed that no KDE application "boots up" quickly; the article
is an interesting cautionary analysis that's worth a read even if
you're not doing C++...

It would probably be a neat thing if they could set up some sort of
"lazy library-loading" scheme so that (for instance) the "MS Word
import module" would get loaded either on demand, when the user
actually asked for it, or based on some sort of "lazy loader" thread
that would load in extra libraries a bit later.  Either approach
represents rather a "hack," and could well be pretty
platform-dependent.  Which might be OK if KDE were exclusively a Linux
thing, but they do try to make sure it runs elsewhere...
-- 
(reverse (concatenate 'string ····················@" "454aa"))
http://www.ntlug.org/~cbbrowne/oses.html
Philosophy is a game with objectives and no rules.
Mathematics is a game with rules and no objectives. 
From: Thomas F. Burdick
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <xcv3d2db5i8.fsf@apocalypse.OCF.Berkeley.EDU>
········@acm.org writes:

>  -> It takes my CMU/CL instance something like 15 seconds to load in
>     the CLG library.  That means that application start time is
>     _guaranteed_ to have an initial latency of _at least_ 15 seconds,
>     and involves having a bunch of _dynamic_ memory allocation.

Why don't you just dump a core with it already loaded?  It *would*
take me a long time to start up Hemlock, Garnet, etc., except that I
only have to do it once per CMUCL upgrade.  I do love my
kitchen-sink.core file :-).
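
Something along these lines, though the exact options vary between
CMUCL versions (treat it as a sketch):

  * (load "garnet-loader")                 ; or whatever you normally load
  * (ext:save-lisp "kitchen-sink.core")

and then start up with `lisp -core kitchen-sink.core'.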

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Christophe Rhodes
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <sqn10iqeyn.fsf@athena.jesus.cam.ac.uk>
···@apocalypse.OCF.Berkeley.EDU (Thomas F. Burdick) writes:

> ········@acm.org writes:
> 
> >  -> It takes my CMU/CL instance something like 15 seconds to load in
> >     the CLG library.  That means that application start time is
> >     _guaranteed_ to have an initial latency of _at least_ 15 seconds,
> >     and involves having a bunch of _dynamic_ memory allocation.
> 
> Why don't you just dump a core with it already loaded?  It *would*
> take me a long time to start up Hemlock, Garnet, etc., except that I
> only have to do it once per CMUCL upgrade.  I do love my
> kitchen-sink.core file :-).

I believe that currently cmucl can't dump a core with unix shared
libraries loaded (or rather, it can, but the library isn't accessible
on restart). I have a vague recollection that there was ongoing work
regarding this problem, but I don't remember who was doing it or how
far they got...

Christophe
From: Erik Naggum
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <3217299021996648@naggum.net>
* James A. Crippen
| I understand his purpose perfectly.  CL is really too large for the
| purpose he applied it to.  If you don't need CLOS or hash tables for
| instance, then you're losing lots of code space in your binary.

  I do not understand this position at all.  Regardless of the "tree
  shaking" issues, "too large language" is simply false.  The concept of a
  language being "too large" is vacuous.  Just because starting up a new
  process is thought to be slow, does not mean that the language is too
  large.  First, startup time has nothing to do with language size.
  Second, code size has nothing to do with startup time.  Third, neither
  has binary size or disk footprint or any of the other numerous idiot
  measures of size.  As a matter of counter-evidential fact, most of the
  Schemes I used before I came to my senses are still way, way slower to
  start up than the Common Lisps I have used, especially with actual code
  being used, and Scheme is about the smallest language you can get.

  Execing a new image is usually quite expensive, but forking is not, at
  least under reasonable Unices, so it should be possible to fork a running
  image of a large binary with almost no overhead.  Reasonable Unices even
  do copy-on-write and will not waste memory needlessly, either.  This
  means that the startup time has become _completely_ irrelevant.

  Furthermore, the one-thread-per-connection model is inefficient compared
  to a tightly integrated I/O loop because a context switch is so much more
  expensive than merely running the same code with several file descriptors.

  Making languages smaller to get faster startup time in a system that has
  decided on the fork and exec model appears to me no smarter or more effective
  than shaving to lose weight.

| And the Lisp Hacker that has not written their own Lisp-like language is
| not the true Lisp Hacker.

  Nor is he who has not seen the folly of the endeavor in time.

///
-- 
  The past is not more important than the future, despite what your culture
  has taught you.  Your future observations, conclusions, and beliefs are
  more important to you than those in your past ever will be.  The world is
  changing so fast the balance between the past and the future has shifted.
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vchgh$jj0$1@news.ost.eltele.no>
Erik Naggum wrote:

> * James A. Crippen
> | I understand his purpose perfectly.  CL is really too large for the
> | purpose he applied it to.  If you don't need CLOS or hash tables for
> | instance, then you're losing lots of code space in your binary.
> 
>   I do not understand this position at all.  Regardless of the "tree
>   shaking" issues, "too large language" is simply false.  The concept of a
>   language being "too large" is vacuous.  Just because starting up a new
>   process is thought to be slow, does not mean that the language is too
>   large.  First, startup time has nothing to do with language size.
>   Second, code size has nothing to do with startup time.  Third, neither
>   has binary size or disk footprint or any of the other numerous idiot
>   measures of size.  As a matter of counter-evidential fact, most of the
>   Schemes I used before I came to my senses are still way, way slower to
>   start up than the Common Lisps I have used, especially with actual code
>   being used, and Scheme is about the smallest language you can get.
> 
>   Execing a new image is usually quite expensive, but forking is not, at
>   least under reasonable Unices, so it should be possible to fork a
>   running
>   image of a large binary with almost no overhead.  Reasonable Unices even
>   do copy-on-write and will not waste memory needlessly, either.  This
>   means that the startup time has become _completely_ irrelevant.
> 
>   Furhtermore, the one-thread-per-connection model is inefficient compared
>   to a tightly integrated I/O loop because a context switch is so much
>   more expensive than merely running the same code with several file
>   descriptors.
> 

I have no experience with server programming in lisp, but I would think 
that a fork-once-in-a-while model would be an efficient method for 
implementing a server in a garbage-collected language.  In the 
fork-once-in-a-while model, you have a "tree-shaked" mother process and 
fork off a pool of processes that handle connections.   Each process can 
handle more than one connection, but when the processes have consed too 
much, you kill them off and fork a new process instead of starting garbage 
collection. 

In this system, you have to explicitly communicate with the mother process 
whenever you want to change global state (session management etc. for a 
http server).  In such a system you could also take all kinds of shortcuts 
with the memory allocator since you don't want to garbage collect in the 
slave process.

Now, what I am saying is that you might be best off by using fork() to 
overcome perceived limitations of garbage collection for implementation of 
server processes.  A server usually receives a request, processes that 
request, delivers a result, and starts from scratch.  Another way to say 
this is that most lisps (correct me if I am wrong) don't support the most 
efficient method for setting up and tearing down memory heaps for server 
applications.  This might be a grave accusation given that I stated that I 
have no experience in this area, but read on :-)

A generational garbage collector probably works fairly well for a server 
application.  Most objects live fairly short, and if you garbage collect 
between requests, you'll get decent performance.  However, as you suggest 
above, the most efficient method to implement most server applications is 
as an event-driven state machine.  In such an implementation, you can 
expect to have, say 200 simultaneous requests.  In this situation, there is 
no point in time where a generational garbage collector will work well.  If 
you garbage collect after a request has finished, you will only reclaim 
0.5% of the local heap.  In order to get decent performance out of the 
generational garbage collector you will have to garbage collect every, say 
200 requests which would reclaim on average 50% of the local heap.  The 
downside of this is that you would use twice the memory compared to a 
server written in C. 

I think a solution to this problem is to support multiple heaps.  Each 
request should use a separate heap that can be torn down after the 
request has been serviced.  This requires a tiny bit of compiler support, 
but I think it would make it possible to implement a server application 
with virtually no memory or garbage collection overhead compared to a 
server written in C.

The operations I think are needed to support multiple heaps are: an 
operation to create heaps with various characteristics, and the ability to 
call a function so that consing happens within the given heap for 
everything that is done within that function (recursively).

Example:
(let ((my-heap (make-heap :no-external-references-allowed)))
  (call-with-heap my-heap #'my-func))

In order to take full advantage of the ability to create heaps with various 
characteristics, I would need a little help from the memory system and the 
compiler.  For instance, I would like to create a heap with the invariant 
that no object outside the heap (except for the evaluation stack) should be 
able to refer to an object inside the heap.  This is a fantastic heap 
because it allows me to allocate objects fast, and it allows me to throw 
away the whole heap after I am done with it - exactly the thing I want for 
server programming.  

The compiler support that I need for this heap is that when I do
(call-with-heap my-heap #'my-func)
above, the compiler should make a copy of all stack variables, and it 
should make all global variables copy-on-write.  That way it is impossible for 
my-func and functions called by it to "communicate" with the outside world. 
To communicate with the outside world, a message passing interface could be 
used, just as we would do in a fork()-based server.

I think this is one cool thing that Arc could implement - and something 
that would make it ideal for server programming.

There are of course lots of additional issues that need to be considered 
before this can work, but I think this idea is sound and can make a real 
difference for a certain class of programs (a class which is getting more 
and more important it seems).

astor
From: Carl Shapiro
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <ouyn10lj27b.fsf@panix3.panix.com>
Alexander Kjeldaas <··········@fast.no> writes:

> I think a solution to this problem is to support multiple heaps.  Each 
> request should use a separate heap that can be teared down after the 
> request has been serviced. This requires a tiny bit of compiler support, 
> but I think it would make it possible to implement a server application 
> with virtually no memory or garbage collection overhead compared to a 
> server written in C.

On the LispM one can create an "area" which is explicitly marked as
temporary and later clear it when you are absolutely certain that you
are finished with the storage.  However, this is an extremely dangerous
way to manually manage memory.  You will find yourself in serious
trouble if anything pointed into that area prior to resetting it.

It is much safer (and simpler) to pool the objects you will be using
for each request.
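
A minimal sketch of the pooling idea (plain portable CL, not the LispM
resource facility itself): reuse request buffers rather than consing a
fresh one per request.

(defvar *buffer-pool* '())

(defun acquire-buffer ()
  (or (pop *buffer-pool*)
      (make-array 4096 :element-type 'character)))

(defun release-buffer (buffer)
  ;; Caller promises nothing else still points at BUFFER.
  (push buffer *buffer-pool*))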
From: Kent M Pitman
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <sfw6679c0m8.fsf@shell01.TheWorld.com>
Carl Shapiro <········@panix.com> writes:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
> > I think a solution to this problem is to support multiple heaps.  Each 
> > request should use a separate heap that can be teared down after the 
> > request has been serviced. This requires a tiny bit of compiler support, 
> > but I think it would make it possible to implement a server application 
> > with virtually no memory or garbage collection overhead compared to a 
> > server written in C.
> 
> On the LispM one can create an "area" which is explicitly marked as
> temporary and later clear it when you are absolutely certain that you
> are finished with the storage.  However, this is a extremely dangerous
> way to manually manage memory.  You will find yourself in serious
> trouble if anything pointed into that area prior to resetting it.
> 
> It is much safer (and simpler) to pool the objects you will be using
> for each request.

Indeed.  The Lisp Machine experimented briefly with the idea of using 
temporary areas for consing during compilation, figuring all the result of
compilation went away after.  The number of bugs this turned up in programs
was extraordinary and they backed out of it, in favor of the resource kind
of model Carl mentions.

Of course, the lispm used a dynamic flag to control consing.  This
means that code that didn't know it was executing in a temporary area
ended up consing into the other heap.  One might say a lexical flag was
better, but then the code would either be specialized for a given heap
(and you'd need duplicate code specialized to other heaps in order to
reuse the code) or else you'd need explicit dataflow in from an outer caller
saying what heap to use and every call to anything at all that consed would
need to take an extra arg saying what area to cons into.  Each of these is
messy.  
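
Roughly, the dynamic-flag approach looks like this -- the names here are
made up for illustration, not the actual lispm symbols:

(defvar *cons-area* :working-storage)

(defmacro with-temporary-area (&body body)
  ;; Everything called inside BODY sees the rebound special variable and
  ;; conses into the temporary area, whether it meant to or not.
  `(let ((*cons-area* :temporary))
     ,@body))

(defun make-node (x y)
  (allocate-in-area *cons-area* x y))   ; hypothetical area-aware allocator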

A good generational GC gives you all of this with none of these
hassles, even without resourcing.  Generational GC is ideally designed
for things like server situations that drop all their pointers when
they are done.  The amount of work required to reclaim storage in such 
situations is very small and the application robustness is MUCH higher.
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vdg9u$s6c$1@news.ost.eltele.no>
Kent M Pitman wrote:
> 
> Indeed.  The Lisp Machine experimented briefly with the idea of using
> temporary areas for consing during compilation, figuring all the result of
> compilation went away after.  The number of bugs this turned up in
> programs was extraordinary and they backed out of it, in favor of the
> resource kind of model Carl mentions.
> 
> Of course, the lispm used a dynamic flag to control consing.  This
> means that code that didn't know it was executing in a temporary area
> ended up consing into the other heap.  One might say a lexical flag was
> better, but then the code would either be specialized for a given heap
> (and you'd need duplicate code specialized to other heaps in order to
> reuse the code) or else you'd need explicit dataflow in from an outer
> caller saying what heap to use and every call to anything at all that
> consed would
> need to take an extra arg saying what area to cons into.  Each of these is
> messy.

It seems that the method of using a temporary area described is indeed 
messy.  You need to make sure that all code called conses in the same heap 
in order to preserve the correctness of the system.  Conceptually this 
means one needs an extra parameter to all functions in the system - the 
argument would indicate which heap should be used. However, since this is a 
part of the lisp system, there are a variety of tricks that can be used to 
virtually eliminate any overhead associated with this "extra" parameter. So 
I don't agree with your last comment.  Having an extra "argument" to every 
function call is not necessarily expensive, nor messy. 

> 
> A good generational GC gives you all of this with none of these
> hassles, even without resourcing.  Generational GC is ideally designed
> for things like server situations that drop all their pointers when
> they are done.  The amount of work required to reclaim storage in such
> situations is very small and the application robustness is MUCH higher.
> 

Could you give an example of a generational GC system which would compete 
with a "C-program" in the scenario described in my previous post:  an 
event-driven server which has 200 outstanding requests at any given time?

astor
From: Scott McKay
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <YhtS7.17056$5W5.7189232@typhoon.ne.mediaone.net>
"Alexander Kjeldaas" <·····@fast.no> wrote in message
·················@news.ost.eltele.no...

> Could you give an example of a generational GC system which would compete
> with a "C-program" in the scenario described in my previous post:  an
> event-driven server which has 200 outstanding requests at any given time?

If the C program and the program using a generational GC both allocate
roughly the same amount of memory and don't have memory leaks, a
modern generational GC will whup the pants off malloc/free.  Why?
Because GC gets to amortize the cost of calling 'free' over many calls
to malloc.

This effect has been amply documented.
From: Erann Gat
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <gat-1412011414490001@eglaptop.jpl.nasa.gov>
In article <·······················@typhoon.ne.mediaone.net>, "Scott
McKay" <···@mediaone.net> wrote:

> If the C program and the program using a generational GC both allocate
> roughly the same amount of memory and don't have memory leaks, a
> modern generational GC will whup the pants off malloc/free.  Why?
> Because GC gets to amortize the cost of calling 'free' over many calls
> to malloc.
> 
> This effect has been amply documented.

References please?

E.
From: Thomas F. Burdick
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <xcv8zc5b97u.fsf@apocalypse.OCF.Berkeley.EDU>
···@jpl.nasa.gov (Erann Gat) writes:

> In article <·······················@typhoon.ne.mediaone.net>, "Scott
> McKay" <···@mediaone.net> wrote:
> 
> > If the C program and the program using a generational GC both allocate
> > roughly the same amount of memory and don't have memory leaks, a
> > modern generational GC will whup the pants off malloc/free.  Why?
> > Because GC gets to amortize the cost of calling 'free' over many calls
> > to malloc.
> > 
> > This effect has been amply documented.
> 
> References please?

Have you done your due diligence here?  If you spent 15, 20 minutes
looking for papers, I'm certain you'd find the references you need.
If you're really curious/doubtful, put in the effort.  If you still
can't find any papers, *then* come back and ask other people to do the
work for you.

BTW, a simple thought experiment should suffice to convince doubters
that this is at least *possible*: When you do manual deallocation, you
recursively trace all the dead objects, tracing and freeing their
children.  A GC only traces the live objects.  Since a properly
behaving gen-gc collects when generation 0 is mostly garbage, it
should do less tracing than manual management.  Of course, this leaves
out a ton of other factors, but since most people's doubts I've heard
about GC efficiency center around tracing...
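
A rough way to put numbers on that (figures invented purely for
illustration, with cost taken as proportional to objects visited):

(defun objects-visited (nursery-size live-fraction)
  "Objects a copying GC touches vs. objects explicit freeing must walk."
  (values (round (* nursery-size live-fraction))
          (round (* nursery-size (- 1 live-fraction)))))

;; (objects-visited 10000 1/10) => 1000, 9000

A nursery that is 90% garbage costs the collector a tenth of the tracing
that freeing each dead object by hand would.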

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Erann Gat
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <gat-1512010024350001@192.168.1.50>
In article <···············@apocalypse.OCF.Berkeley.EDU>,
···@apocalypse.OCF.Berkeley.EDU (Thomas F. Burdick) wrote:

> ···@jpl.nasa.gov (Erann Gat) writes:
> 
> > In article <·······················@typhoon.ne.mediaone.net>, "Scott
> > McKay" <···@mediaone.net> wrote:
> > 
> > > If the C program and the program using a generational GC both allocate
> > > roughly the same amount of memory and don't have memory leaks, a
> > > modern generational GC will whup the pants off malloc/free.  Why?
> > > Because GC gets to amortize the cost of calling 'free' over many calls
> > > to malloc.
> > > 
> > > This effect has been amply documented.
> > 
> > References please?
> 
> Have you done your due diligence here?  If you spent 15, 20 minutes
> looking for papers, I'm certain you'd find the references you need.

Well, I did a Google search on "generational garbage collection malloc
free whup pants" and got no results.  :-)

I left out the "whup pants" part and "I'm feeling lucky" takes you to the
Boehm GC page, where it says that the performance is generally "comparable"
and "may be substantially [better]" in multi-threaded applications.  Not
exactly unqualified support for a general claim of pants-whupping.

But what I was really after was not documentation for the claim per se,
but rather finding out what Scott considered "ample documentation",
whether his claim had any implicit restrictions, and what he meant
(quantitatively) by "whup the pants off."  I thought simply asking for
references would make the inquiry a little more concise, but I guess some
of the intent got lost.

E.
From: Bruce Hoult
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <bruce-1DD51C.13564515122001@news.paradise.net.nz>
In article <····················@eglaptop.jpl.nasa.gov>, 
···@jpl.nasa.gov (Erann Gat) wrote:

> In article <·······················@typhoon.ne.mediaone.net>, "Scott
> McKay" <···@mediaone.net> wrote:
> 
> > If the C program and the program using a generational GC both allocate
> > roughly the same amount of memory and don't have memory leaks, a
> > modern generational GC will whup the pants off malloc/free.  Why?
> > Because GC gets to amortize the cost of calling 'free' over many calls
> > to malloc.
> > 
> > This effect has been amply documented.
> 
> References please?

You don't even need generational GC.  Take any random C program that 
uses malloc/free and link in Boehm and "#define free(p)".  I've done 
this with quite a number of (proprietary) programs and never yet seen a 
performance decrease, and often substantial performance increases.  Not 
to mention bugs and leaks disappearing.

Of course it may just be that Boehm is simply better written than your 
average OS or compiler vendor's malloc/free, but that's hardly an 
argument against using it.

-- Bruce
From: Scott McKay
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <yeKS7.3718$zX1.6108026@typhoon.ne.mediaone.net>
"Scott McKay" <···@mediaone.net> wrote in message
····························@typhoon.ne.mediaone.net...
>
> "Alexander Kjeldaas" <·····@fast.no> wrote in message
> ·················@news.ost.eltele.no...
>
> > Could you give an example of a generational GC system which would compete
> > with a "C-program" in the scenario described in my previous post:  an
> > event-driven server which has 200 outstanding requests at any given time?
>
> If the C program and the program using a generational GC both allocate
> roughly the same amount of memory and don't have memory leaks, a
> modern generational GC will whup the pants off malloc/free.  Why?
> Because GC gets to amortize the cost of calling 'free' over many calls
> to malloc.
>
> This effect has been amply documented.
>

By the way, the site at the company I work at -- www.hotdispatch.com --
is written in Lisp, and we have app servers that listen on 100+ sockets
at a time.  We cycle the servers about once a week, but that's usually
because it's the simplest way to ensure they have all patches loaded, etc.
We don't have memory leak or performance problems.

We're using Xanalys (nee Harlequin) Lisp on Netras.
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vg32p$lg7$1@news.ost.eltele.no>
Scott McKay wrote:

> 
> "Alexander Kjeldaas" <·····@fast.no> wrote in message
> ·················@news.ost.eltele.no...
> 
>> Could you give an example of a generational GC system which would compete
>> with a "C-program" in the scenario described in my previous post:  an
>> event-driven server which has 200 outstanding requests at any given time?
> 
> If the C program and the program using a generational GC both allocate
> roughly the same amount of memory and don't have memory leaks, a
> modern generational GC will whup the pants off malloc/free.  Why?
> Because GC gets to amortize the cost of calling 'free' over many calls
> to malloc.
> 
> This effect has been amply documented.

My point is that a generational GC and a C program cannot possibly 
allocate "roughly the same amount of memory".  Let me repeat the argument 
from my initial posting:

The problem for an event-driven server is that if you have 200 outstanding 
requests at any given time, you have 200 "contexts" where you allocate 
memory.  These contexts are independent of each other and there is no 
single point in time where all of them can be "freed" (remember the 
requirement that you have 200 outstanding requests _at any given time_). If 
you want to keep your memory usage comparable to what a C program would 
use, you will have to GC fairly often - say every 20 requests.  Since only 
10% of the local heap will be freed by the generational GC (20 of 200 
requests), the performance of the generational GC will not be great.

An alternative to GCing often is GCing seldom.  You would GC every 200th 
request.  In this case you would be able to free 50% of the local heap, but 
you would use roughly twice the memory of a C program.  The performance of 
the generational GC is still not great (it only removes 50% of the memory 
it scans).

Or you would GC every 1000th request, and use huge amounts of memory.
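
For concreteness, the arithmetic behind these scenarios fits in one small
function (a back-of-the-envelope model, not a measurement): with R
requests outstanding and a collection every N completed requests, roughly
N of the R+N request contexts sitting in the nursery are dead when the
collector runs.

(defun dead-fraction (outstanding collect-every)
  "Fraction of the nursery that is garbage when the collector runs."
  (/ collect-every (+ outstanding collect-every)))

;; (dead-fraction 200 20)   => 1/11  (the ~10% case: memory close to C)
;; (dead-fraction 200 200)  => 1/2   (50% freed, roughly twice the memory)
;; (dead-fraction 200 1000) => 5/6   (more freed, but much more memory)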

These calculations are not quite fair as a generational GC will have less 
fragmentation than malloc()/free(), but I think the general picture is 
correct.

If you think a generational GC is _extremely_ fast compared to malloc(), and 
that even if it has to collect the local heap 10 times as often for an 
event-driven server as it would for a normal program it would still beat 
malloc/free, then I think you are wrong.  I don't think there is any 
generational GC implementation with a documented performance that good, but 
please prove me wrong.

So I am not saying that a generational GC always sucks, and I am not 
interested in a general discussion about whether generational GC is faster 
than malloc()/free().  I am interested in peculiar problems related to 
server programming.

astor
From: Scott McKay
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <HaPS7.3734$zX1.6167043@typhoon.ne.mediaone.net>
"Alexander Kjeldaas" <··········@fast.no> wrote in message
·················@news.ost.eltele.no...
>
> The problem for an event-driven server is that if you have 200 outstanding
> requests at any given time, you have 200 "contexts" where you allocate
> memory.  These contexts are independent of each other and there is no
> single point in time where all of them can be "freed" (remember the
> requirement that you have 200 outstanding requests _at any given time_).  If
> you want to keep your memory usage comparable to what a C program would
> use, you will have to GC fairly often - say every 20 requests.  Since only
> 10% of the local heap will be freed by the generational GC (20 of 200
> requests), the performance of the generational GC will not be great.
>
> An alternative to GCing often is GCing seldom.  You would GC every 200th
> request.  In this case you would be able to free 50% of the local heap, but
> you would use roughly twice the memory of a C program.  The performance of
> the generational GC is still not great (it only removes 50% of the memory
> it scans).

Your description leaves me completely confused.  It's clearly the case
that a Lisp program can be written that *allocates* the same amount
of memory as a C program that does the same thing, right?  And it's
clearly the case that, if the requests made to the Lisp and the C program
are the same, the same amount of memory will either become garbage
or be released via 'free' at the same time, right?  So the only difference
is exactly when the memory will actually be freed?  The GC will run as
often as necessary to meet certain criteria, such as "x% of memory
should be available for allocation".  And the GC will reclaim all the
garbage (*) when it runs, right?

I just don't think you are setting up a scenario that demonstrates any
difference, except that the GC is likely to have lower overhead than
repeated calls to 'free'.  (OK, check out Paul Wilson's big survey
papers if you want pointers to more information on this.)

If you are really concerned about certain kinds of allocation, you
can also "resource" the objects in questions.

What can I say, I'm one of the authors of a Lisp web server that
meets your criteria (except we usually have about 100 "open"
requests), and it works just fine.

----------
(*) "Reclaim all the garbage", to first approximation.  A precise GC will
reclaim all the garbage.  In practice, a conservative GC will reach a stable
state in which some small percentage of memory will not be reclaimed,
but this is usually "in the noise".
From: Thomas F. Burdick
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <xcvvgf8i1eh.fsf@conquest.OCF.Berkeley.EDU>
"Scott McKay" <···@mediaone.net> writes:

> I just don't think you are setting up a scenario that demonstrates any
> difference, except that the GC is likely to have lower overhead than
> repeated calls to 'free'.  (OK, check out Paul Wilson's big survey
> papers if you want pointers to more information on this.)
> 
> If you are really concerned about certain kinds of allocation, you
> can also "resource" the objects in questions.

Aha, I knew something was bugging me in that line of argument.  If
free() were free (har-har), then the manually-managed program would
have a smaller memory footprint, because it would have 0% garbage at
any given moment.  The GC'd server would have a decently high
percentage of garbage just before collecting.  However, free() is
expensive, so any sane server is going to allocate pools for requests,
and allocate out of that pool within a request.  The
yet-to-be-allocated regions of the pools are garbage.  As are the
recycled pools on the free list.  Certainly these won't be given back
to the system, because they'd just get allocated again.

So the only way for the manually-managed program to have low memory
overhead would be to waste huge numbers of cycles to allocate and free
memory from and to the system.  Does anyone write servers this way?  I
sure hope not.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vjbd4$g15$1@news.ost.eltele.no>
Scott McKay wrote:

> 
> "Alexander Kjeldaas" <··········@fast.no> wrote in message
> ·················@news.ost.eltele.no...
>>
>> The problem for an event-driven server is that if you have 200
>> outstanding requests at any given time, you have 200 "contexts" where you
>> allocate memory.  These contexts are independent of each other and there
>> is no single point in time where all of them can be "freed" (remember the
>> requirement that you have 200 outstanding requests _at any given time_).
>> If you want to keep your memory usage comparable to what a C program would
>> use, you will have to GC fairly often - say every 20 requests.  Since
>> only 10% of the local heap will be freed by the generational GC (20 of
>> 200 requests), the performance of the generational GC will not be great.
>>
>> An alternative to GCing often is GCing seldom.  You would GC every 200th
>> request.  In this case you would be able to free 50% of the local heap, but
>> you would use roughly twice the memory of a C program.  The performance of
>> the generational GC is still not great (it only removes 50% of the memory
>> it scans).
> 
> Your description leaves me completely confused.  It's clearly the case
> that a Lisp program can be written that *allocates* the same amount
> of memory as a C program that does the same thing, right?  

Right

> And it's
> clearly the case that, if the requests made to the Lisp and the C program
> are the same, the same amount of memory will either become garbage
> or be released via 'free' at the same time, right?  So the only difference
> is exactly when the memory will actually be freed?  The GC will run as
> often as necessary to meet certain criteria, such as "x% of memory
> should be available for allocation".  And the GC will reclaim all the
> garbage (*) when it runs, right?
> 

I agree.  However, the assumption behind generational GC is that objects 
that were recently allocated will die fast, and that those that survive 
will live for a long time.  You get good performance by scanning only a 
subset of the heap, the local heap. If you are event-driven, your 
allocation patterns may look more round-robin, and both of the assumptions 
behind generational GC fail more or less. The generational GC 
implementation might have problems with this.  On the other hand, setting 
up a memory area that you allocate all objects from (often called "areas" 
in the C/C++ world), and tearing it down when you are finished with the 
request is a well known technique which is used all over the place where 
server programming happens.  It is an extremely efficient way of handling 
this issue, and I think it would be nice to have some language support for 
this.

> I just don't think you are setting up a scenario that demonstrates any
> difference, except that the GC is likely to have lower overhead than
> repeated calls to 'free'.  (OK, check out Paul Wilson's big survey
> papers if you want pointers to more information on this.)
> 

free() would be a nop.

> If you are really concerned about certain kinds of allocation, you
> can also "resource" the objects in questions.
> 

I would prefer a general solution that works for all allocation, for all 
objects, including allocation done in a library not written with GC quirks in 
mind.  But I agree that resourcing is a good engineering solution that 
overcomes GC problems.

> What can I say, I'm one of the authors of a Lisp web server that
> meets your criteria (except we usually have about 100 "open"
> requests), and it works just fine.


astor
From: Erik Naggum
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <3217559771425033@naggum.net>
* Alexander Kjeldaas <··········@fast.no>
| On the other hand, setting up a memory area that you allocate all objects
| from (often called "areas" in the C/C++ world), and tearing it down when
| you are finished with the request is a well known technique which is used
| all over the place where server programming happens.  It is an extremely
| efficient way of handling this issue, and I think it would be nice to
| have some language support for this.

  You need explicit allocation to make this work.  Common Lisp does not
  have explicit allocation.  C++ does, and therefore needs this dubious
  "feature".  There is in fact _no_ language support in Common Lisp for
  _any_ form of allocation or deallocation.  Think about it.

  In Common Lisps with a generational scavenging garbage collector, such
  "areas" would amount to a "private" newspace, the GC'ing of which would
  copy live objects somewhere else before it got torn down.  Because the
  whole language does allocation at will, the amount of control that would
  benefit from such areas being torn down as an inexpensive operation is
  simply not there.  Whether you garbage collect a private newspace or the
  global newspace thus has no impact on performance (other than to copy any
  living objects once).

| I would prefer a general solution that works for all allocation, for all
| objects, also allocation done in a library not written with GC quirks in
| mind.  But I agree that resourcing is a good engineering solution that
| overcomes GC problems.

  Resourcing is not about "overcoming GC problems".  Its main use is to
  avoid allocating space that is usually freed immediately.  In effect,
  resources are in Common Lisp what "areas" are in C++: privately managed
  allocation.  Normally, however, allocated resources are discarded when
  GC happens, because they are maintained using weak pointers.  This is a
  useful technique only when one expects to allocate a lot of garbage of
  the exact same type.

  It appears that the desire to control memory allocation manually is
  almost a one-bit mental state.  When I recently argued in a C group that
  calling free in short-lived or low-garbage applications was a waste of
  programmer brain cycles, one fairly irrational dude jumped on me and got
  nearly hysterical that I could even suggest such a thing.  Explicit calls
  to free were to him the very mark of a good programmer, and the lack of
  them the mark of the devil.  He argued that some broken version of Windows
  did not free memory allocated by a process unless explicit calls to free
  were made all over the program, and he has yet to realize that the goals
  of (manual) memory management may be obtained by automatic means.  I, on
  the other hand, consider manual memory management to be _really_ stupid,
  and the desire to obtain control over this part of program development by
  forcing every allocation and deallocation to be manual as sheer lunacy.
  I have 15 years of experience writing C applications where manual memory
  allocation was never a problem.  Now it is.  My bit has been inverted.

///
-- 
  The past is not more important than the future, despite what your culture
  has taught you.  Your future observations, conclusions, and beliefs are
  more important to you than those in your past ever will be.  The world is
  changing so fast the balance between the past and the future has shifted.
From: Michael J. Ferrador
Subject: Dataspaces
Date: 
Message-ID: <vYuS7.4671$bs2.1082405@news02.optonline.net>
has anyone had experience porting lisp to (only 2 examples I know)

 PDP-11/ >=40? (I & D (2) space)

 370/XA ((8 bits of) Dataspaces, (1?) Hyperspace)

The dataspace(s) is a 0 to 2^n space, no OS, associated
in the MMU to a regular address space, addressable by
extra bit(s) in the  instruction or addressing mode.

LispMe (scheme) moved some (vector?) datatypes to their own
Motorola DragonBall 68K +/- 32K relative branch segments

> Kent M Pitman wrote:
> >
> > Indeed.  The Lisp Machine experimented briefly with the idea of using
> > temporary areas for consing during compilation, figuring all the result of
> > compilation went away after.  The number of bugs this turned up in
> > programs was extraordinary and they backed out of it, in favor of the
> > resource kind of model Carl mentions.
> >
> > Of course, the lispm used a dynamic flag to control consing.  This
> > means that code that didn't know it was executing in a temporary area
> > ended up consing into the other heap.  One might say a lexical flag was
> > better, but then the code would either be specialized for a given heap
> > (and you'd need duplicate code specialized to other heaps in order to
> > reuse the code) or else you'd need explicit dataflow in from an outer
> > caller saying what heap to use and every call to anything at all that
> > consed would
> > need to take an extra arg saying what area to cons into.  Each of these is
> > messy.
> >
Alexander Kjeldaas <·····@fast.no> wrote in message
·················@news.ost.eltele.no...
> It seems that the method of using a temporary area described is indeed
> messy.  You need to make sure that all code called conses in the same heap
> in order to preserve the correctness of the system.  Conceptually this
> means one needs an extra parameter to all functions in the system - the
> argument would indicate which heap should be used. However, since this is a
> part of the lisp system, there are a variety of tricks that can be used to
> virtually eliminate any overhead associated with this "extra" parameter.  So
> I don't agree with your last comment.  Having an extra "argument" to every
> function call is not necessarily expensive, nor messy.

on 370/XA dataspaces (8 bit select of 31/32 ? bit spaces)
make the type number part of the pointer, to the right space?

or should each namespace / package get its own address space?

---

any other ISA's have this feature, that I didn't know about?

might be on the way out with 64 bit CPU's, but if there is a programming
benefit, how does an 8 / 56 bit mode sound?
From: Kent M Pitman
Subject: Re: Dataspaces
Date: 
Message-ID: <sfwpu5hmmj1.fsf@shell01.TheWorld.com>
"Michael J. Ferrador" <·····@orn.com> writes:

> has anyone had experience porting lisp to (only 2 examples I know)
> 
>  PDP-11/ >=40? (I & D (2) space)
> 
>  370/XA ((8 bits of) Dataspaces, (1?) Hyperspace)
> 
> The dataspace(s) is a 0 to 2^n space, no OS, associated
> in the MMU to a regular address space, addressable by
> extra bit(s) in the  instruction or addressing mode.
> 
> LispMe (scheme) moved some (vector?) datatypes to their own
> Motorola DragonBall 68K +/- 32K reletive branch segments

There may be specific situations for which this works, like the large
data tables one uses when decoding a gif that is not shared with
anyone and that is discarded upon being done with the gif.  But in
general the problem isn't the bits at all.  That is eminently doable.
The problem is the linguistic effort of specifying which program
uses which bits.  Lisp is a language that is SPECIFICALLY not about
keeping track of data layout, and it is completely counter to the
entire design of the language to have to think about this.  It would
perturb every aspect of every program, IMO.  The attempts I have seen
to do this by some hidden/automatic means have failed utterly, and I
think for this reason: The reason it works in C/malloc is because
people want to, for each and every program, think about memory
management.  That works, if you like that kind of thing--the "fast GC"
of malloc/free is paid for in the pain and suffering of every
programming minute.  Not thinking about memory management works for
Lisp and GC; the freedom to program incurs a GC overhead that we try
to keep small but is nevertheless always present.  But hybrids "not
thinking but still doing" are doomed to failure because they expect
magic to happen.  There is no magic in the Lisp approach nor the C
approach.  There is magic in the hybrid because it expects the best of
both worlds doing the work of neither.

> > Kent M Pitman wrote:
> > >
> > > Indeed.  The Lisp Machine experimented briefly with the idea of using
> > > temporary areas for consing during compilation, figuring all the result
> of
> > > compilation went away after.  The number of bugs this turned up in
> > > programs was extraordinary and they backed out of it, in favor of the
> > > resource kind of model Carl mentions.
> > >
> > > Of course, the lispm used a dynamic flag to control consing.  This
> > > means that code that didn't know it was executing in a temporary area
> > > ended up consing into the other heap.  One might say a lexical flag was
> > > better, but then the code would either be specialized for a given heap
> > > (and you'd need duplicate code specialized to other heaps in order to
> > > reuse the code) or else you'd need explicit dataflow in from an outer
> > > caller saying what heap to use and every call to anything at all that
> > > consed would
> > > need to take an extra arg saying what area to cons into.  Each of these
> is
> > > messy.
> > >
> Alexander Kjeldaas <·····@fast.no> wrote in message
> ·················@news.ost.eltele.no...
> > It seems that the method of using a temporary area described is indeed
> > messy.  You need to make sure that all code called conses in the same heap
> > in order to preserve the correctness of the system.  Conceptually this
> > means one needs an extra parameter to all functions in the system - the
> > argument would indicate which heap should be used. However, since this is
> a
> > part of the lisp system, there are a variety of tricks that can be used to
> > virtually eliminate any overhead associated with this "extra" parameter.
> So
> > I don't agree with your last comment.  Having an extra "argument" to every
> > function call is not necessarily expensive, nor messy.
> 
> on 370/XA dataspaces (8 bit select of 31/32 ? bit spaces)
> make the type number part of the pointer, to the right space?

I don't see what this would buy.
 
> or should each namespace / package get it's own address space?

No.  *package* is dynamic information and would cause bad side-effects on
code that was tested with a different prevailing *package*.  If you mean the
package of the symbol that is the definition name, that is arbitrary and would
not be suitable because of sharing of symbols among packages, because packages
share code (e.g., through macros) so that there is no necessary relation between the
package of the function called and the package of the data, and it begs the
question of whose package is active in:
 (let ((*package* (find-package 'alpha)))
   (funcall (let ((*package* 'beta))
              (compile nil 
                       (let ((*package* 'gamma))
                         (read-from-string *some-definition*))))))
where *some-definition* holds perhaps a string like:
 "(delta:some-macro (epsilon:some-function zeta:*some-special* 'eta:some-constant))"
What on earth would it mean to have this function partition its consing
by package?

> any other ISA's have this feature, that I didn't know about?
> 
> might be on the way out with 64 bit CPU's, but if there is a programming
> benefit
> how does an 8 / 56 bit mode sound?

What is all this discussion of bits???  There are a million ways to implement
anything we decide the language should mean.  Our problem is not absence
of low-level substrate to support cool ideas, it is absence of linguistic
clarity to know what is being asked for by the programmer such that we don't
gc an area that might be pointed into by other areas we didn't bother
to check or other such horror stories like that.  It's all well and good
to have hairy bookkeeping schemes but ultimately what you say in the language
has to really clearly keep the programmer from getting into accidental
trouble.  I can't imagine a discussion that significantly relies on any
discussion of bits, bytes, or pointers will improve matters.  The discussion
that needs to happen has to talk about dataflow through explicit parameter
passing, dynamic and static variable bindings, etc., as well as general 
theories of what compilers may know or infer from text in a program, and how
users can plainly express intent in a way that will not get them into trouble.
Bits and bytes ultimately will underly any such implementation, but no 
discussion of bits and bytes seems to me to elucidate such an implementation.

JMO.
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vdf5u$qkd$1@news.ost.eltele.no>
Carl Shapiro wrote:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
>> I think a solution to this problem is to support multiple heaps.  Each
>> request should use a separate heap that can be torn down after the
>> request has been serviced. This requires a tiny bit of compiler support,
>> but I think it would make it possible to implement a server application
>> with virtually no memory or garbage collection overhead compared to a
>> server written in C.
> 
> On the LispM one can create an "area" which is explicitly marked as
> temporary and later clear it when you are absolutely certain that you
> are finished with the storage.  However, this is an extremely dangerous
> way to manually manage memory.  You will find yourself in serious
> trouble if anything pointed into that area prior to resetting it.
> 
> It is much safer (and simpler) to pool the objects you will be using
> for each request.
> 

I think several lisps have the ability to allocate such areas.  However, I 
have not heard of an implementation that has the compiler support 
needed to make it safe to use them.  I am not advocating using areas the 
way you describe them.

astor
From: Carl Shapiro
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <ouy8zc55wfh.fsf@panix3.panix.com>
Alexander Kjeldaas <·····@fast.no> writes:

> I think several lisps have the ability to allocate such areas.  However, I 
> have not heard of an implementation that has the compiler support 
> needed to make it safe to use them.  I am not advocating using areas the 
> way you describe them.

The problems one often runs into with these sorts of allocation
schemes are not easily solved with help from the compiler.  For
example, how do you plan on handling common operations that absolutely
have to side effect the environment such as I/O and logging?  What
about asynchronous operations that hold onto a buffer but may not
complete by the time your form unwinds?  I suppose you could get away
with wrapping only small extents of code with a "cons over here
instead" form.  Nevertheless, I cannot see how one would make
effective use out of this magic feature without submitting to a
painfully rigorous coding convention.  However, if you have given any
thought to solving these screw cases, I would very much like to hear
about it.
From: Kent M Pitman
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <sfwellx7ad6.fsf@shell01.TheWorld.com>
Carl Shapiro <········@panix.com> writes:

> Alexander Kjeldaas <·····@fast.no> writes:
> 
> > I think several lisps have the ability to allocate such areas.  However, I 
> > have not heard of an implementation that has the compiler support 
> > needed to make it safe to use them.  I am not advocating using areas the 
> > way you describe them.
> 
> The problems one often runs into with these sorts of allocation
> schemes are not easily solved with help from the compiler.  For
> example, how do you plan on handling common operations that absolutely
> have to side effect the environment such as I/O and logging?  

Exactly.  Or program-level caching, debugging traces, etc.

> What
> about asynchronous operations that hold onto a buffer but may not
> complete by the time your form unwinds?

Not to mention code to manage keyboard interrupts, synchronous
condition handlers and other closures passed down into this code from
elsewhere in the system, daemon programs like "if-added" daemons that
wake up and run when very general-purpose operations are done, method
combination issues, etc.

> I suppose you could get away
> with wrapping only small extents of code with a "cons over here
> instead" form.  Nevertheless, I cannot see how one would make
> effective use out of this magic feature without submitting to a
> painfully rigorous coding convention.  However, if you have given any
> thought to solving these screw cases, I would very much like to hear
> about it.

Right.

And keep in mind throughout that there is substantial hidden cost in
(a) proving such weird dataflows would work, (b) retooling your system
to use these specialized techniques, (c) discussions within your
company about the fact committing to these techniques now means you
can't do a variety of extensions in straightforward ways and must seek
workarounds, (d) lack of real-time response to customer requests
because working within such specialized schemes can be very tricky to
write and can involve very complex QA to be sure you've gotten it right,
(e) more support dollars to manage because bugs are hard to understand
when reported and hard to track down without good reports, and on and
on.
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vg0tj$kfo$1@news.ost.eltele.no>
Kent M Pitman wrote:

> Carl Shapiro <········@panix.com> writes:
> 
>> Alexander Kjeldaas <·····@fast.no> writes:
>> 
>> > I think several lisps have the ability to allocate such areas. 
>> > However, I have not heard of an implementation that has the compiler
>> > support
>> > needed to make it safe to use them.  I am not advocating using areas
>> > the way you describe them.
>> 
>> The problems one often runs into with these sorts of allocation
>> schemes are not easily solved with help from the compiler.  For
>> example, how do you plan on handling common operations that absolutely
>> have to side effect the environment such as I/O and logging?
> 
> Exactly.  Or program-level caching, debugging traces, etc.
> 

Let me backtrack a little and repeat what I believe we are discussing here. 
As I understand it, you are critical of the idea that a compiler/lisp 
system can provide support for a separate heap (heap A) so that it would be 
impossible to create a pointer in some other heap in the system that would 
point to an object within heap A.  

The above does _not_ mean that one has to avoid side effects.  Let me give 
an analogy which works pretty well for the particular type of heap I used 
in my example (the no-external-references-allowed heap).

Imagine two UNIX processes A and B.  A has forked off B and is B's parent.  
Now B in effect has its own heap which is independent from A's heap.  When 
B reads from its heap, it will in effect read from the same physical pages 
as A's heap.  When B tries to write to its own heap, it will be denied 
this.  The OS will copy the page B tried to write to and make it B's page. 
If B wants to change A's heap, it can do so, but it has to communicate with 
A using some kind of RPC protocol.  Using RPC, B can change A's heap at 
will, but it cannot do one thing - it cannot make A's heap refer to an 
object in B's heap.  Similarly, A can change B's heap.

This analogy is not quite right - Unix processes execute independently of 
each other.  Please ignore that for a moment and imagine that you could 
have two processes which executed as one.  Process A would call process B 
using RPC, and B could call A.

When you do an RPC call, arguments are marshalled.  This is what I meant 
by a "message passing interface" that would have to be used when 
communicating between heaps.  Say a function is executing in a 
no-external-references-allowed heap context and wants to call another 
function that should use the main heap context.  The call would have to be 
made in a manner such that arguments would be "marshalled" and no 
references to the no-external-references-allowed heap could get through.  
Such an "RPC call" doesn't need to be a normal call, you would explicitly 
specify when you wanted to modify the global heap.

It is possible to imagine problems doing such a call if the argument is a 
very complex structure.  You could get into "deep copy" and other 
problems.  It is not necessary to make this complicated.  A system could 
only allow simple objects to be passed as arguments: numbers, characters, 
and strings would probably be enough.

This means that certain parts of the system such as I/O and logging might 
have an "RPC" wrapper so that all parts of the system had the same view of 
the global state, but I don't see that as a major problem.
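
A minimal sketch of that marshalling rule (the names CALL-IN-MAIN-HEAP,
SIMPLE-OBJECT-P, and LOG-MESSAGE are invented for illustration; the check
merely stands in for whatever a real cross-heap call would do):

(defun simple-object-p (x)
  "True for objects allowed to cross a heap boundary in this sketch."
  (typep x '(or number character string)))

(defun call-in-main-heap (function &rest args)
  "Call FUNCTION on ARGS after checking that no argument could smuggle a
pointer out of the no-external-references-allowed heap."
  (dolist (arg args)
    (unless (simple-object-p arg)
      (error "~S cannot be marshalled across a heap boundary." arg)))
  (apply function args))

;; e.g. logging from request code running in its own heap:
;; (call-in-main-heap #'log-message "request ~D finished" 17)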

>> What
>> about asynchronous operations that hold onto a buffer but may not
>> complete by the time your form unwinds?

I am not sure I understand this particular problem.

> 
> Not to mention code to manage keyboard interrupts, synchronous
> condition handlers and other closures passed down into this code from
> elsewhere in the system, daemon programs like "if-added" daemons that
> wake up and run when very general-purpose operations are done, method
> combination issues, etc.
> 

Keyboard interrupts: Can they not run as any other code in the main heap 
context?

Condition handlers: don't let anything other than simple objects be passed 
as arguments to conditions that cross a "heap-boundary" (a place in the 
call-stack where one function uses one heap context and another function 
uses another heap context).  Or else you must make sure you set up a 
handler within the no-external-references-allowed heap.  In most cases, 
this would be a pretty natural thing to do.

Closures passed down: You are not allowed to pass anything but simple 
objects as arguments when calling a function which uses another heap 
context.

Daemon programs: A closure created in a no-external-references-allowed heap 
could be called as long as you only pass simple objects as arguments or 
have a way of marshalling the arguments.

I am not sure what issues related to method combination you refer to.

>> I suppose you could get away
>> with wrapping only small extents of code with a "cons over here
>> instead" form.  Nevertheless, I cannot see how one would make
>> effective use out of this magic feature without submitting to a
>> painfully rigorous coding convention.  However, if you have given any
>> thought to solving these screw cases, I would very much like to hear
>> about it.
> 
> Right.
> 
> And keep in mind throughout that there is substantial hidden cost in
> (a) proving such weird dataflows would work, (b) retooling your system
> to use these specialized techniques, (c) discussions within your
> company about the fact committing to these techniques now means you
> can't do a variety of extensions in straightforward ways and must seek
> workarounds, (d) lack of real-time response to customer requests
> because working within such specialized schemes can be very tricky to
> write and can involve very complex QA to be sure you've gotten it right,
> (e) more support dollars to manage because bugs are hard to understand
> when reported and hard to track down without good reports, and on and
> on.

I don't think something like this would be added to a commercial lisp in 
the near future :-)  I am discussing this within the context of Arc as 
something that would be cool to have, and something that I think would be 
especially well suited for server programming - which is something Arc aims 
for, and something I believe generational GC has problems with under high 
system loads (200 concurrent requests).

astor
From: Carl Shapiro
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <ouysnaccbk0.fsf@panix3.panix.com>
Alexander Kjeldaas <··········@fast.no> writes:

> This means that certain parts of the system such as I/O and logging might 
> have an "RPC" wrapper so that all parts of the system had the same view of 
> the global state, but I don't see that as a major problem.

Having to marshal arguments in and out of your private heap world
seems like an awful lot of overhead, very likely involving more
computation than a "nursery" generation scavenge.  Furthermore,
call-backs into the code running in the special heap context would seem
to be fraught with peril.

> >> What
> >> about asynchronous operations that hold onto a buffer but may not
> >> complete by the time your form unwinds?
> 
> I am not sure I understand this particular problem.

Queued I/O operations require one to retain buffers that can stay
active for long periods of time until an I/O completes.  Usually,
call-backs are used to signal the completion of an operation.  Without
having to marshal data out of your magic heap world to issue an I/O
request, how would you propose to safely and efficiently accommodate
the large buffers needed for these operations and the call-backs made
into your process?

> I am not sure what issues related to method combination you refer to.

I do not know what Kent had in mind here, but I can foresee situations
where somebody advises a function or sticks an :AROUND method on a
generic function which you may not know about and all sorts of
horrible things happening.

Your system may make sense if you have complete control over all of
the code that would be called in these special heap extents.  However,
allocation schemes like this make it incredibly difficult to
incorporate library code into your application and also add a lot of
"context" to your application.  Comprehending such code will likely
have a high cognitive cost.

> I don't think something like this would be added to a commercial lisp in 
> the near future :-)  I am discussing this within the context of Arc as 
> something that would be cool to have, and something that I think would be 
> especially well suited for server programming - which is something Arc aims 
> for, and something I believe generational GC has problems with under high 
> system loads (200 concurrent requests).

People have offered empirical evidence that a generational collector
is not a limiting factor for these sorts of server applications.  Do
you have any real evidence that such a collector has problems under
"high system loads"?
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vj9l2$fnq$1@news.ost.eltele.no>
Carl Shapiro wrote:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
>> This means that certain parts of the system such as I/O and logging might
>> have an "RPC" wrapper so that all parts of the system had the same view
>> of the global state, but I don't see that as a major problem.
> 
> Having to marshal arguments in and out of your private heap world
> seems like an awful lot of overhead, very likely involving more
> computation than a "nursery" generation scavenge.  Furthermore,
> call-backs into the code running in the special heap context would seem
> to be fraught with peril.
> 

How marshalling is done depends on how you implement this, but I would 
think that a few simple pointer comparisons to check that each argument 
points into a pointer-free heap could suffice. 

I think you can accept that some function calls are a bit slow in order to 
avoid GCing. 

It would be impossible to register a call-back to a closure which was 
created in the no-external-references-allowed heap because that would mean 
that the main heap had a pointer into this heap.  I therefore think 
call-backs would have to be implemented by giving registering enough 
information externally so that the call-back would be callable using 
call-with-heap.  Something like:

;; CALL-WITH-HEAP, *MAIN-HEAP-ID*, and *CURRENT-HEAP-ID* are part of the
;; hypothetical multiple-heap facility under discussion, not existing CL.

(defvar *heaps* (make-hash-table))   ; maps a heap id to its heap

(defvar *callbacks* nil)             ; pending call-backs, held in the main heap

(defun register-callback (fun heapid args)
  ;; Runs in the main heap, so the closure consed here lives there too.
  (push #'(lambda () 
            (call-with-heap (gethash heapid *heaps*) fun args))
        *callbacks*))

(defun call-back-in-some-heap (arg)
  ...)

(defun executed-in-some-heap ()
  ;; Passes a plain function plus simple data, not a closure, so no pointer
  ;; into this heap escapes to the main heap.
  (call-with-heap (gethash *main-heap-id* *heaps*) #'register-callback 
                  (list #'call-back-in-some-heap *current-heap-id* 123)))

Here I would need call-with-heap to pass a reference to a function as an 
argument, but that is ok since it isn't a pointer into the 
no-external-references-allowed heap.  A closure on the other hand, could 
not work.

>> >> What
>> >> about asynchronous operations that hold onto a buffer but may not
>> >> complete by the time your form unwinds?
>> 
>> I am not sure I understand this particular problem.
> 
> Queued I/O operations require one to retain buffers that can stay
> active for long periods of time until an I/O completes.  Usually,
> call-backs are used to signal the completion of an operation.  Without
> having to marshal data out of your magic heap world to issue an I/O
> request, how would you propose to safely and efficiently accommodate
> the large buffers needed for these operations and the call-backs made
> into your process?
> 

Call-backs could be made as described above.  If you need global buffers 
you need to marshall all access to these buffers.  However, sometimes you 
don't need to update the global state all the time.  In a lot of cases, you 
can buffer and only once in a while update global state. For instance, a 
heap that would execute code related to a query could build the request 
locally in its heap, and flush it directly out on the network socket 
without it ever being visible to the main heap. 

>> I am not sure what issues related to method combination you refer to.
> 
> I do not know what Kent had in mind here, but I can foresee situations
> where somebody advises a function or sticks and :AROUND method on a
> generic function which you may not know about and all sorts of
> horrible things happening.
> 



> Your system may make sense if you have complete control over all of
> the code that would be called in these special heap extents.  However,
> allocation schemes like this make it incredibly difficult to
> incorporate library code into your application add also add a lot of
> "context" to your application.  Comprehending such code will likely
> have a high cognitive cost.
> 

On the contrary, I think the alternative would be difficult. 
Incorporating library code into a program which uses resource pooling to 
avoid GCing would be difficult.  If the library code assumes that the 
generational GC works very well, but in your server that assumption doesn't 
hold because you are servicing 200 requests simultaneously, then what do 
you do?

When you incorporate libraries, you could use three strategies:

o For libraries that don't have global state, you don't need to do anything.

o For libraries that have global state, but where the global state is 
initialized during startup and then never changed, you do nothing.

o For libraries that have global state, and where the global state is 
changing all the time, you wrap each function in a call-with-heap wrapper.  
If this somehow isn't possible, then tough luck, don't use call-with-heap 
in your application.

Wrt. cognitive cost, it is very hard to tell what the cognitive cost would 
be.  I think it would be minimal as I think most of this call-with-heap 
stuff could be hidden away.  Also consider the cognitive cost of 
maintaining your own resource pooling because you have to work your way 
around the garbage collector.

>> I don't think something like this would be added to a commercial lisp in
>> the near future :-)  I am discussing this within the context of Arc as
>> something that would be cool to have, and something that I think would be
>> especially well suited for server programming - which is something Arc
>> aims for, and something I believe generational GC has problems with under
>> high system loads (200 concurrent requests).
> 
> People have offered empirical evidence that a generational collector
> is not a limiting factor for these sorts of server applications.  Do
> you have any real evidence that such a collector has problems under
> "high system loads"?

The evidence that I've seen said that the generational collector worked 
great for a server because most of the memory created during a request was 
freed when GC was run.  I am totally in on that.  My theory is that a 
generational GC will work great, but will lose compared to a "C program" 
in either CPU cycles spent per byte reclaimed, or in bytes of memory used 
when the number of concurrent requests is increased.  I don't see how this 
is wrong, and I haven't seen arguments why I am wrong, or empirical 
evidence contrary to this.

"high system loads" is not just a fancy expression taken out of thin air.  
What specifically happens is that you are working on many problems 
simultaneously.  Each problem solving activity produces lots of 
garbage. You even have specific points in time when basically all memory 
you used to solve a problem is garbage.  But the system as a whole produces 
a steady stream of garbage, and there is no way you can tell your 
generational GC to only look for garbage among the memory that was 
allocated by a problem solving activity that was recently completed.

astor
From: Duane Rettig
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <43d2d3aro.fsf@beta.franz.com>
Alexander Kjeldaas <··········@fast.no> writes:

> I have no experience with server programming in lisp, but I would think 
> that a fork-once-in-a-while model would be an efficient method for 
> implementing a server in a garbage-collected language.  In the 
> fork-once-in-a-while model, you have a "tree-shaked" mother process and 
> fork off a pool of processes that handle connections.   Each process can 
> handle more than one connection, but when the processes have consed too 
> much, you kill them off and fork a new process instead of starting garbage 
> collection. 

We've had many discussions at Franz about the many aspects of server
technology that can be implemented.  The "poor man's global-gc" (i.e.
throw the lisp away and start over) is a serious idea.  It is essentially
an extrapolation of the idea that the most efficient garbage-collectors
tend not to collect garbage, but instead they leave the garbage behind
and collect the good stuff!

Throwing away a lisp process is one way to do gc globally in a generational
situation, but there are other ways.  We've had good measurements so far
from a new concept we introduced into Allegro CL 6.0, where we allow you
to "close" sections of oldest heap space so that the global-gc doesn't
bother trying to collect from those areas.  In essence, you would build
your application, globally gc it and size the areas as low as possible,
and then "close" the application so that it never gets globally-gc'd.
Then, when the application creates new workspaces and they become tenured
to oldspaces, those "open" oldspaces are the only ones that get globally
gc'd.  Such a global-gc is much faster than gc-ing the whole oldspace.

Another feature of closed old areas is that once such a heap is dumped
out, it can be shared by multiple program invocations on any operating
system which allows copy-on-write mapping.  The reason for this is that
those closed old-areas tend not to change at all, and so the "copy"
portion of the copy-on-write is invoked less.  This means more efficient
startup times for second, third, ... lisps, and more sharing of the
heap (as well, of course, as the obvious sharing of the shared-libraries
and the text section of the main).  All in all, it makes it feasible
to get very good efficiency by using multiple lisp invocations.

> In this system, you have to explicitly communicate with the mother process 
> whenever you want to change global state (session management etc. for a 
> http server).  In such a system you could also take all kinds of shortcuts 
> with the memory allocator since you don't want to garbage collect in the 
> slave process.

Yes, such communication is needed for coordination of multiple processes.
We've come up with a new Lisp RPC module, which allows one to hide
most of the low-level communication grunge, and instead allows communication
via lisp calls.  It was based on technology that was developed for
efficient communication with Java.  The developer who did this figured
that since we were doing these great things to communicate with other
languages, it would also be nice to have multiple Lisp processes be able
to talk to each other at a high level as well!

> Now, what I am saying is that you might be best off by using fork() to 
> overcome perceived limitations of garbage collection for implementation of 
> server processes.  A server usually receives a request, processes that 
> request, delivers a result, and starts from scratch.  Another way to say 
> this is that most lisps (correct me if I am wrong) don't support the most 
> efficient method for setting up and tearing down memory heaps for server 
> applications.  This might be a grave accusation given that I stated that I 
> have no experience in this area, but read on :-)

Well, yes, this is a grave accusation.  It may be true, but before you
make such statements, you should solidify your definition of "the most
efficient method for setting up and tearing down memory heaps for server
applications".  I'm not sure if you are figuring that it's simply
obvious what this most efficient method is, or if you are leading to
your conclusion in this article about multiple heaps, but it is not
clear to me what "the most efficient method" means to you.

> A generational garbage collector probably works fairly well for a server 
> application.  Most objects live fairly short, and if you garbage collect 
> between requests, you'll get decent performance.  However, as you suggest 
> above, the most efficient method to implement most server applications is 
> as an event-driven state machine.  In such an implementation, you can 
> expect to have, say 200 simultaneous requests.  In this situation, there is 
> no point in time where a generational garbage collector will work well.  If 
> you garbage collect after a request has finished, you will only reclaim 
> 0.5% of the local heap.  In order to get decent performance out of the 
> generational garbage collector you will have to garbage collect every, say 
> 200 requests which would reclaim on average 50% of the local heap.  The 
> downside of this is that you would use twice the memory compared to a 
> server written in C. 

The tuning of a generational gc is an art, and we tend to take them on
a case-by-case basis.  We also try to make such art available to
programmers who will need it without bogging others down in details,
by providing as much control as possible with good defaults.  In order
to test your theory, we would have to have a real test case.  I think
that it is dangerous to make generalizations as above without actually
trying things out; you may be surprised in both directions - some
assumptions you've made about how bad things will be will be wrong, and
some assumptions about how good things will be will be wrong.  It
largely depends on the application itself, and on which of the
application's parameters "hits the wall" before others.

> I think a solution to this problem is to support multiple heaps.

 [ ...]

Others have posted some history in response to the idea of multiple heaps.
In addition, a fundamental deterrent to having multiple heaps is 
a Fragmentation Problem: in a linear address-space which is almost
full nowadays (i.e. in 32-bit architectures), it is hard to predict
the best sizes for such multiple-space memory segments because such
address-space is so precious.  For example, in a threaded environment,
where each thread has its own stack, how many threads can you reasonably
fit into your process?  You might be surprised at how fast the address
space fills up just to accommodate stacks.  Perhaps 64-bits will give us
some breathing room for a generation or so...
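
To put rough numbers on it (these are purely illustrative):

;; 1000 threads, 1MB reserved per thread stack, roughly 3GB of usable
;; address space in a 32-bit process:
(let ((threads 1000)
      (stack-bytes (* 1024 1024))
      (usable-bytes (* 3 1024 1024 1024)))
  (float (/ (* threads stack-bytes) usable-bytes)))
;; => ~0.33 -- about a third of the address space is spoken for before
;; the heap itself has claimed anything.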

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vg5h0$lu6$1@news.ost.eltele.no>
Duane Rettig wrote:

> We've had many discussions at Franz about the many aspects of server
> technology that can be implemented.  The "poor man's global-gc" (i.e.
> throw the lisp away and start over) is a serious idea.  It is essentially
> an extrapolation of the idea that the most efficient garbage-collectors
> tend not to collect garbage, but instead they leave the garbage behind
> and collect the good stuff!
> 
> Throwing away a lisp process is one way to do gc globally in a
> generational
> situation, but there are other ways.  We've had good measurements so far
> from a new concept we introduced into Allegro CL 6.0, where we allow you
> to "close" sections of oldest heap space so that the global-gc doesn't
> bother trying to collect from those areas.  In essence, you would build
> your application, globally gc it and size the areas as low as possible,
> and then "close" the application so that it never gets globally-gc'd.
> Then, when the application creates new workspaces and they become tenured
> to oldspaces, those "open" oldspaces are the only ones that get globally
> gc'd.  Such a global-gc is much faster than gc-ing the whole oldspace.
> 
> Another feature of closed old areas is that once such a heap is dumped
> out, it can be shared by multiple program invocations on any operating
> system which allows copy-on-write mapping.  The reason for this is that
> those closed old-areas tend not to change at all, and so the "copy"
> portion of the copy-on-write is invoked less.  This means more efficient
> startup times for second, third, ... lisps, and more sharing of the
> heap (as well, of course, as the obvious sharing of the shared-libraries
> and the text section of the main).  All in all, it makes it feasible
> to get very good efficiency by using multiple lisp invocations.
> 

A closed old area seems like a good idea, but maybe a more radical idea can 
get even more radical improvements :-).  Even if you never have to GC the 
old area it seems that you still have problems with the amount of garbage 
you can reclaim in the local heap each time you GC.

>> Now, what I am saying is that you might be best off by using fork() to
>> overcome perceived limitations of garbage collection for implementation
>> of
>> server processes.  A server usually receives a request, processes that
>> request, delivers a result, and starts from scratch.  Another way to say
>> this is that most lisps (correct me if I am wrong) don't support the most
>> efficient method for setting up and tearing down memory heaps for server
>> applications.  This might be a grave accusation given that I stated that
>> I have no experience in this area, but read on :-)
> 
> Well, yes, this is a grave accusation.  It may be true, but before you
> make such statements, you should solidify your definition of "the most
> efficient method for setting up and tearing down memory heaps for server
> applications".  I'm not sure if you are figuring that it's simply
> obvious what this most efficient method is, or if you are leading to
> your conclusion in this article about multiple heaps, but it is not
> clear to me what "the most efficient method" means to you.
> 

We are discussing server applications, and I am thinking of a model where a 
request comes in, is processed, and a result is returned.  A little global 
state might be updated in the process, but most memory consed is never 
again referenced, and can be freed when we finish with our request.

The most efficient method to handle memory efficiently in this situation, 
assuming memory is a fairly scarce resource, is to have each request use a 
separate pool of memory, and reclaim memory from that pool when the request 
finishes.

> 
> The tuning of a generational gc is an art, and we tend to take them on
> a case-by-case basis.  We also try to make such art available to
> programmers who will need it without bogging others down in details,
> by providing as much control as possible with good defaults.  In order
> to test your theory, we would have to have a real test case.  I think
> that it is dangerous to make generalizations as above without actually
> trying things out; you may be surprised in both directions - some
> assumptions you've made about how bad things will be will be wrong, and
> some assumptions about how good things will be will be wrong.  It
> largely depends on the application itself, and on which of the
> application's parameters "hits the wall" before others.
> 

It is hard to argue against this :-) I don't have any numbers except for 
the back-of-an-envelope calculations posted initially, and I am not 
prepared to implement this idea to prove my point - at least not while this 
thread is active :-)

> Others have posted some history in response to the idea of multiple heaps.
> In addition, a fundamental deterrent to having multiple heaps is
> a Fragmentation Problem: in a linear address-space which is almost
> full nowadays (i.e. in 32-bit architectures), it is hard to predict
> the best sizes for such multiple-space memory segments because such
> address-space is so precious.  For example, in a threaded environment,
> where each thread has its own stack, how many threads can you reasonably
> fit into your process?  You might be surprised at how fast the address
> space fills up just to accommodate stacks.  Perhaps 64-bits will give us
> some breathing room for a generation or so...
> 

I know all too well that this is a problem, but a heap doesn't have to use 
a linear address space.

astor
From: Duane Rettig
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <4d71f52a3.fsf@beta.franz.com>
Alexander Kjeldaas <··········@fast.no> writes:

> A closed old area seems like a good idea, but maybe a more radical idea can 
> get even more radical improvements :-).  Even if you never have to GC the 
> old area it seems that you still have problems with the amount of garbage 
> you can reclaim in the local heap each time you GC.

And what amount of garbage is that?  You've been offering up several sets
of numbers in various places on this thread, but you haven't shown us the
assumptions that lead you to think that any particular amount
of consing is needed.  Perhaps you think it is obvious that lisp must cons
a certain amount. Or perhaps you believe that resourcing is a Bad Thing
in a garbage-collected environment (it is not a bad thing, if done carefully).

> >> Now, what I am saying is that you might be best off by using fork() to
> >> overcome perceived limitations of garbage collection for implementation
> >> of
> >> server processes.  A server usually receives a request, processes that
> >> request, delivers a result, and starts from scratch.  Another way to say
> >> this is that most lisps (correct me if I am wrong) don't support the most
> >> efficient method for setting up and tearing down memory heaps for server
> >> applications.  This might be a grave accusation given that I stated that
> >> I have no experience in this area, but read on :-)
> > 
> > Well, yes, this is a grave accusation.  It may be true, but before you
> > make such statements, you should solidify your definition of "the most
> > efficient method for setting up and tearing down memory heaps for server
> > applications".  I'm not sure if you are figuring that it's simply
> > obvious what this most efficient method is, or if you are leading to
> > your conclusion in this article about multiple heaps, but it is not
> > clear to me what "the most efficient method" means to you.
> > 
> 
> We are discussing server applications, and I am thinking of a model where a 
> request comes in, is processed, and a result is returned.  A little global 
> state might be updated in the process, but most memory consed is never 
> again referenced, and can be freed when we finish with our request.

Whether memory used by a request is again referenced or not is entirely
up to you as the application programmer.

> The most efficient method to handle memory efficiently in this situation, 
> assuming memory is a fairly scarce resource, is to have each request use a 
> separate pool of memory, and reclaim memory from that pool when the request 
> finishes.

Yes.  Others have mentioned resourcing in various ways on this thread.
Where are you going with this?

> > The tuning of a generational gc is an art, and we tend to take them on
> > a case-by-case basis.  We also try to make such art available to
> > programmers who will need it without bogging others down in details,
> > by providing as much control as possible with good defaults.  In order
> > to test your theory, we would have to have a real test case.  I think
> > that it is dangerous to make generalizations as above without actually
> > trying things out; you may be surprised in both directions - some
> > assumptions you've made about how bad things will be will be wrong, and
> > some assumptions about how good things will be will be wrong.  It
> > largely depends on the application itself, and on which of the
> > application's parameters "hits the wall" before others.
> > 
> 
> It is hard to argue against this :-) I don't have any numbers except for 
> the back-of-an-envelope calculations posted initially, and I am not 
> prepared to implement this idea to prove my point - at least not while this 
> thread is active :-)

I'm actually not quite sure what your point is in this thread.  I get a
general feeling that you are dubious about gc being applicable to servers,
but could you please state what your point is more clearly?

> > Others have posted some history in response to the idea of multiple heaps.
> > In addition, a fundamental deterrent to having multiple heaps is
> > a Fragmentation Problem: in a linear address-space which is almost
> > full nowadays (i.e. in 32-bit architectures), it is hard to predict
> > the best sizes for such multiple-space memory segments because such
> > address-space is so precious.  For example, in a threaded environment,
> > where each thread has its own stack, how many threads can you reasonably
> > fit into your process?  You might be surprised at how fast the address
> > space fills up just to accommodate stacks.  Perhaps 64-bits will give us
> > some breathing room for a generation or so...
> > 
> 
> I know all too well that this is a problem, but a heap doesn't have to use 
> a linear address space.

What other kind of address space might it use?  At the lowest level, all
memory addresses are linear.  And as such, one must choose maximum sizes
for each of them, which means that they will either hit those maximums or
waste the address space reserved for them.  One can implement virtual
addressability, but the more flexible the addressing, the more it costs
in efficiency.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vj4le$dq4$1@news.ost.eltele.no>
Duane Rettig wrote:

> 
> And what amount of garbage is that?  You've been offering up several sets
> of numbers in various places on this thread, but you haven't shown us the
> assumptions that lead you to think that any particular
> amount
> of consing is needed.  Perhaps you think it is obvious that lisp must cons
> a certain amount. Or perhaps you believe that resourcing is a Bad Thing
> in a garbage-collected environment (it is not a bad thing, if done
> carefully).
> 

I assume that a certain amount of consing is needed to service a request.

I have not considered handling resourcing, pooling objects etc. because 
they would be engineering tricks to work around the GC, and if done 
extensively to reduce consing, would lead to the kinds of bugs that GC is 
there to avoid in the first place. 

I assume that given the choice between doing resourcing and not doing 
resourcing, and that both approaches would be comparable in speed, you 
would choose not to do resourcing.

>> 
>> We are discussing server applications, and I am thinking of a model where
>> a
>> request comes in, is processed, and a result is returned.  A little
>> global state might be updated in the process, but most memory consed is
>> never again referenced, and can be freed when we finish with our request.
> 
> Whether memory used by a request is again referenced or not is entirely
> up to you as the application programmer.
> 

I assume that most memory consed will never again be referenced.  I think 
this was supported by at least one posting in this thread.

>> The most efficient method to handle memory efficiently in this situation,
>> assuming memory is a fairly scarce resource, is to have each request use
>> a separate pool of memory, and reclaim memory from that pool when the
>> request finishes.
> 
> Yes.  Others have mentioned resourcing in various ways on this thread.
> Where are you going with this?
> 

AFAIK, there is no way to reclaim memory from one of these memory pools 
when a request has finished except by working around the GC system and doing 
manual pooling of resources.  I am suggesting that there should be a 
general way of dealing with this situation, and I have explained and given 
some examples of how such a language feature might work.

> 
> I'm actually not quite sure what your point is in this thread.  I get a
> general feeling that you are dubious about gc being applicable to servers,
> but could you please state what your point is more clearly?
> 

I think I have stated this several times now, but doing it once more won't 
hurt: 

You have an event-driven server.  It has lots of outstanding requests, say 
200. Each request uses resources, say 1MB.  Most of these resources will be 
released when the request finishes, say 95%.  Using a generational GC 
implementation, it seems you generally have the option of choosing between 
three evils:

1) GC after each request finishes: The GC would be able to free 1MB, but 
would have to scan 200MB of data, assuming that all objects belonging to 
outstanding requests were located in the initial generation.  This is not a 
very effective GC - only .5% of the memory scanned is freed.  [If some 
objects belonging to outstanding requests are not located in the initial 
generation, this GC would on average scan less memory, but wouldn't be able 
to free as much memory (in bytes) on average.]

2) GC after every, say, 200 requests: The GC would be able to free 200MB, 
and would have to scan 400MB to free the 200.  This is pretty effective - 
50% of the memory scanned is freed.  However, this means that one would 
have to use 400MB of memory.

3) Don't use the GC system and do pooling of objects internally.  This is 
the good engineering solution, but is complex.  Taking pooling to the 
extreme, this also leads to exactly the kind of bugs GC helps you avoid. 
This requires discipline, might have problems with external libraries etc.

Both of the GC solutions have problems.  Solution 1 uses a lot of CPU 
scanning memory, solution 2 uses a lot of memory.  So you go for solution 3 
- avoiding the GC system.

I think a language that was meant to be used for "web programming" could go 
the extra mile to have a more general solution to this that doesn't involve 
the programmer having to avoid the GC system to get comparable speed to a C 
program.

My proposed "solution" is to have heaps as first class objects, and to be 
able to evaluate a function in the context of a heap.  From a performance 
point of view, I also think this is a worthwhile goal because it makes it 
possible to use the most efficient allocator, period. You don't need the 
overhead of tracing garbage memory like C programs do, and you don't need 
the overhead of tracing live objects like GC does.

>> > Others have posted some history in response to the idea of multiple
>> > heaps. In addition, a fundamental deterrent to having multiple heaps is
>> > a Fragmentation Problem: in a linear address-space which is almost
>> > full nowadays (i.e. in 32-bit architectures), it is hard to predict
>> > the best sizes for such multiple-space memory segments because such
>> > address-space is so precious.  For example, in a threaded environment,
>> > where each thread has its own stack, how many threads can you
>> > reasonably
>> > fit into your process?  You might be surprised at how fast the address
>> > space fills up just to accommodate stacks.  Perhaps 64-bits will give us
>> > some breathing room for a generation or so...
>> > 
>> 
>> I know all too well that this is a problem, but a heap doesn't have to
>> use a linear address space.
> 
> What other kind of address space might it use?  At the lowest level, all
> memory addresses are linear.  And as such, one must choose maximum sizes
> for each of them, which means that they will either hit those maximums
> or waste the address space reserved for them.  One can implement virtual
> addressability, but the more flexible the addressing, the more it costs
> in efficiency.
> 

Let me rephrase that: A heap does not have to consist of a contiguous 
address range.  If a heap consists of individual pages, you don't have much 
of a fragmentation problem.  On the other hand, stacks, which usually need 
to be contiguous, are more of a problem.

astor
From: Duane Rettig
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <4wuzmu1zk.fsf@beta.franz.com>
Alexander Kjeldaas <··········@fast.no> writes:

> Duane Rettig wrote:
> 
> > 
> > And what amount of garbage is that?  You've been offering up several sets
> > of numbers in various places on this thread, but you haven't shown us the
> > assumptions that lead you to think that any particular
> > amount
> > of consing is needed.  Perhaps you think it is obvious that lisp must cons
> > a certain amount. Or perhaps you believe that resourcing is a Bad Thing
> > in a garbage-collected environment (it is not a bad thing, if done
> > carefully).
> > 
> 
> I assume that a certain amount of consing is needed to service a request.

Why do you assume that?

In Common Lisp, some functions cons by definition.  In Common Lisp
implementations, some functions which are not defined to cons will
cons simply because they are inefficiently implemented.  Many operators
do not (or need not) cons at all.

So why do you try to tune your hypothetical server application by
trying to manage garbage, when you haven't even considered not
generating the garbage in the first place?

> > I'm actually not quite sure what your point is in this thread.  I get a
> > general feeling that you are dubious about gc being applicable to servers,
> > but could you please state what your point is more clearly?
> > 
> 
> I think I have stated this several times now, but doing it once more won't 
> hurt: 

I assume that because you've stated the same thing yet again, you
must not have gotten my hint that it wasn't what I was asking for.
So let's break it down explicitly:

> You have an event-driven server.

Granted.

> It has lots of outstanding requests, say 200.

Granted.

> Each request uses resources, say 1MB.

Why?  Why not half a MB, or, better yet, 1kb?

>  Most of these resources will be released when the request finishes,
> say 95%.

Why?  Why not 100%?  Why not 0%?

>  Using a generational GC 
> implementation, it seems you generally have the option of choosing between 
> three evils:

Until you answer the two issues above, and define your assumptions
for your conclusions, it makes no sense to discuss what the GC is
going to do with it.

"Doctor, Doctor, my gc hurts when I cons uncontrollably."
"Well, don't do that."


-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vo6ch$f35$1@news.ost.eltele.no>
Duane Rettig wrote:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
>> 
>> I assume that a certain amount of consing is needed to service a request.
> 
> Why do you assume that?
> 
> In Common Lisp, some functions cons by definition.  In Common Lisp
> implementations, some functions which are not defined to cons will
> cons simply because they are inefficiently implemented.  Many operators
> do not (or need not) cons at all.
> 
> So why do you try to tune your hypothetical server application by
> trying to manage garbage, when you haven't even considered not
> generating the garbage in the first place?
> 

That is true, I have not considered not generating garbage because I 
thought it was pretty obvious that it can't be done given the assumption in 
the next paragraph that you cut away.  The paragraph reads:

I have not considered handling resourcing, pooling objects etc. because 
they would be engineering tricks to work around the GC, and if done 
extensively to reduce consing, would lead to the kinds of bugs that GC is 
there to avoid in the first place. 

Now back to your question.  Consider an extremely simple example: A request 
comes in to fetch a certain file from the disk and write it to a socket.  
You start an asynchronous I/O command.  To set up this command you need a 
buffer to store the data you have read, thus you have to allocate it on the 
heap.  I don't see any other way to do this.
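
For instance, a straightforward (synchronous rather than asynchronous)
version of that handler might look like the following, with a fresh
buffer consed per request; socket-stream is assumed to be a binary
stream obtained from whatever socket library is in use:

(defun serve-file (path socket-stream)
  ;; One fresh 64KB buffer per request -- this is the consing in question.
  (let ((buffer (make-array 65536 :element-type '(unsigned-byte 8))))
    (with-open-file (in path :element-type '(unsigned-byte 8))
      (loop for n = (read-sequence buffer in)
            until (zerop n)
            do (write-sequence buffer socket-stream :end n)))))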

>> 
>> I think I have stated this several times now, but doing it once more
>> won't hurt:
> 
> I assume that because you've stated the same thing yet again, you
> must not have gotten my hint that it wasn't what I was asking for.

I think your "hints" indicate that we might have a basic disagreement about 
what constitutes an event-driven program.  Therefore, I suggest that you say 
what is on your mind instead of hinting, as that serves no purpose other 
than to make this thread long and (for me at least) uninteresting.

> So let's break it down explicitly:
> 
> >> You have an event-driven server.
> 
> Granted.
> 
>> It has lots of outstanding requests, say 200.
> 
> Granted.
> 
>> Each request uses resources, say 1MB.
> 
> Why?  Why not half a MB, or, better yet, 1kb?
> 

Some servers might require 1MB per request, some might require 1kb per 
request.  See extremely simple example of event-driven server above.

>>  Most of these resources will be released when the request finishes,
>> say 95%.
> 
> Why?  Why not 100%?  Why not 0%?
> 

If you need temporary storage to compute the result needed, and your 
temporary results do not benefit from caching, they will become garbage 
after the request finishes.  Consider an IRC server or some server which 
pushes non-cacheable data.  

> 
> Until you answer the two issues above, and define your assumptions
> for your conclusions, it makes no sense to discuss what the GC is
> going to do with it.
> 
> "Doctor, Doctor, my gc hurts when I cons uncontrollably."
> "Well, don't do that."
> 

Please tell me how.

astor
From: Erik Naggum
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <3217698843229235@naggum.net>
* Alexander Kjeldaas <·····@fast.no>
| I have not considered handling resourcing, pooling objects etc. because
| they would be engineering tricks to work around the GC, and if done
| extensively to reduce consing, would lead to the kinds of bugs that GC is
| there to avoid in the first place.

  This is an incredibly unwarranted assumption.  Do you still think in C++
  when you are discussing memory management in Common Lisp?  It is so wrong
  to think of these techniques as "engineering tricks" and especially to do
  so "to work around the GC".  The same condescending comments would apply
  to nreverse and other non-consing functions.  There is a difference
  between being aware of and trying to prevent wanton waste and thinking
  that whoever cleans up after you has problems so you had better not drop
  any garbage around.  Good programmers are concerned about wanton waste of
  any resource, and not because they "work around" CPU speeds or GC, but
  because they are good practitioners of the art of programming.

  Reading this discussion sounds just like a documentary I saw the other
  day on the coral reefs of Australia.  The narrator spoke French with
  mostly English words and usually English pronunciation like most French
  do, but retained French idioms, choice of latinate words, and where the
  words looked too much like a French word, chose French pronunciation,
  etc, basically largely unintelligible unless one knows what the French
  would have meant had it actually been in French.  In other words, I think
  you are speaking Common Lisp with _very_ heavy accent.  That does not
  mean that a lot of the issues you raise are invalid or uninteresting,
  only that they look very much like someone's C++ perspective and may not
  actually be relevant to Common Lisp.

///
-- 
  The past is not more important than the future, despite what your culture
  has taught you.  Your future observations, conclusions, and beliefs are
  more important to you than those in your past ever will be.  The world is
  changing so fast the balance between the past and the future has shifted.
From: Duane Rettig
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <44rmn7emn.fsf@beta.franz.com>
[sorry, this response got fairly long]:

Alexander Kjeldaas <·····@fast.no> writes:

> Duane Rettig wrote:
> 
> > So why do you try to tune your hypothetical server application by
> > trying to manage garbage, when you haven't even considered not
> > generating the garbage in the first place?
> > 
> 
> That is true, I have not considered not generating garbage because I 
> thought it was pretty obvious that it can't be done given the assumption in 
> the next paragraph that you cut away.  The paragraph reads:
> 
> I have not considered handling resourcing, pooling objects etc. because 
> they would be engineering tricks to work around the GC, and if done 
> extensively to reduce consing, would lead to the kinds of bugs that GC is 
> there to avoid in the first place. 

Erik Naggum has responded to this, and explains effectively why I left
out this above paragraph.  However, the fact that you have reinstated
it explains why you haven't gotten my hints.  I will reveal all, see
below.

> Now back to your question.  Consider an extremely simple example: A request 
> comes in to fetch a certain file from the disk and write it to a socket.  
> You start an asynchronous I/O command.  To set up this command you need a 
> buffer to store the data you have read, thus you have to allocate it on the 
> heap.  I don't see any other way to do this.

I will come back to this example scenario after I have dealt with the
more important issue of breaking you free from your assumptions.  See
below.

> >> I think I have stated this several times now, but doing it once more
> >> won't hurt:
> > 
> > I assume that because you've stated the same thing yet again, you
> > must not have gotten my hint that it wasn't what I was asking for.
> 
> I think your "hints" indicate that we might have a basic disagreement about 
> what constitutes an event-driven program.

So disappointing; you were so close!  You have all of the elements to
break free from your assumptions.

>            Therefore, I suggest that you say 
> what is on your mind instead of hinting as that serves no purpose other 
> than to make this thread long and (for me at least) uninteresting.

Whenever I encounter a person who has strong opinions that are based
on incorrect assumptions, I try to lead that person to discover his
incorrect assumptions for himself, by asking questions that lead him
to discover the contradictions in his assumptions.  This way, the
learning which is accomplished is more permanent (because he has
"discovered" it through his own epiphany) than if I just told him
the truth, which might lead him to either not buy into that truth
completely, or to in fact reject it entirely just _because_ I had
said it.

No, our disagreement has nothing to do with event-driven programs.
And the conversation has not been about Garbage Collection, though
it appears clear to me that that is what you are concentrating on.
But Garbage Collection only occurs when there is garbage to collect,
so I have been attempting to raise your thought processes to a level
somewhat above the level of garbage :-)  So to me, the whole
conversation has really been about memory management and especially
about _consing_.

The term "consing" is a somewhat overloaded term.  The Glossary in
the CL spec gives us a hint at this by giving the definition of the
verb "cons" (i.e. to allocate cons cells), and then to give an idiom
(to allocate _any_ lisp objects).  In practice, we also tend to use
the term as a mental abbreviation which essentially means "to
allocate garbage".  I call this an abbreviation because in reality
it is ludicrous to allocate garbage, and indeed any object that
lisp allocates starts out as non-garbage even if it becomes garbage
soon after it was allocated.  Perhaps a better definition for this
last idiomatic usage is "to allocate needlessly".  This should not
be confused with allocating ephemerally, which is not quite the
same in my mind.  Ephemeral consing is not needless, by programmer
proclamation: some level of consing might be acceptable in an
application, and this may be proclaimed as not being needless, in
which case it is ephemeral.

So how do we reduce needless allocation?  You keep going back to
garbage collection, but the collection side of memory management
is only one side of the discussion.  Instead, let's track an object
through its lifetime.

- Most lisp objects are born, have a lifetime, and then die, after
which they usually become subject to the garbage-collector if
and when it runs.

This rule is not hard and fast, due to the usages of these objects
in the lisp.  Because lisp environments are usually created by
cloning other lisp environments, an object may have not been
born in this invocation of lisp; it is part of the system and thus
effectively has no birth.  And since some caching of data might occur
during the startup phase of an application, caches may be built,
and those data associated with those caches effectively never
die (assuming the caches are not subsequently cleared).

Let's go back to the normal case, where an object is born, lives
out a life of (hopefully) useful work, and then dies.  How many
times an object is used for potentially different purposes is
completely up to the programmer.  All that is necessary is that
a handle is kept to that object, and it will not die, and is thus
not subject to the garbage-collector.  Resourcing facilities are
one way to give objects multiple uses. 

Now, a purist might say that the object can only be born for a
single purpose, and must never be used for any other work.  However,
CL is not a "pure" language, and pretty much lets you do what you want
to do, including resourcing.  In fact, I think most vendors offer
resource facilities, and at least use them internally.  In Allegro CL,
we have a resource facility that does a good job of reusing objects,
and although it is not documented, we provide informal documentation
to customers who request it.
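
A minimal, portable sketch of the idea -- not Allegro's facility, and
ignoring thread-safety -- might look like:

(defvar *buffer-pool* '())              ; free list of reusable buffers

(defun allocate-buffer ()
  (or (pop *buffer-pool*)
      (make-array 65536 :element-type '(unsigned-byte 8))))

(defun free-buffer (buffer)
  (push buffer *buffer-pool*)
  (values))

(defmacro with-buffer ((var) &body body)
  ;; Check a buffer out of the pool for the extent of BODY, then return it.
  `(let ((,var (allocate-buffer)))
     (unwind-protect (progn ,@body)
       (free-buffer ,var))))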

Your own writing places you at some level into the non-reuse camp -
if we revisit your statement:

> I have not considered handling resourcing, pooling objects etc. because 
> they would be engineering tricks to work around the GC, and if done 
> extensively to reduce consing, would lead to the kinds of bugs that GC is 
> there to avoid in the first place. 

There are two aspects of this paragraph I want to visit, and although
Erik identified the paragraph as being C/C++ oriented and advised you
not to think that way, I'd like to take the opposite approach and
show you how thinking in C/C++ terms will refute your claims.

- First, you say that "they would be engineering tricks to work
around the GC".  OK.  Let's look at an example of resourcing in C,
which thus will not have anything to do with GC.

The example is malloc/free.  This is a classic resourcing mechanism.
It has many different forms, implementing many different algorithms,
most of which pull memory blocks from a pre-allocated pool, and when
that pool is exhausted, it asks for more memory from the system.
Most malloc/free implementations never give memory back to the
system, so in that sense there is no GC involved.  Some malloc
and/or free implementations do coalescing of adjacent blocks of
memory, but I also do not consider that a GC, since blocks are
not considered for coalescing unless free is explicitly called.
There may be combinations of GC-ing mallocs or malloc-ing GCs,
but neither have anything in particular to do with each other
fundamentally.

 - Second, you say that resourcing might "lead to the kinds of bugs
that GC is there to avoid".  Two comments on this:

1. It is clear that resourcing carries with it pitfalls, and a good
programmer will try to avoid those pitfalls.  This is as true in
C/C++ as it is in Lisp.

2. A garbage collector's primary purpose is to remove garbage
(i.e. dead objects). Even though the most efficient modern gc's
do not really physically collect garbage, but instead collect
live data, the name "garbage collector" is still useful in
describing its primary goal, which is to remove garbage.
Because of this, a garbage collector generally has no dealings
with live data, and thus has little to do with resourcing
mechanisms (although gc and resource-mechanisms might cooperate
with each other for efficiency).  Thus, GC does not exist to
avoid resourcing or its potential bugs.

Having said #2, I must also add that GC does tend to reduce the
need for resourcing.  In fact, it is possible to use no resourcing
at all for most applications, and to count on GC taking up the slack.
Some purists might take that idea and run with it, creating some kind
of programming taboo on resourcing in a GC-ing environment.  I do not
agree with such a taboo.  Resourcing is another tool, just like GC.
And although modern GCs can do some fairly impressive things in
memory management, it is not the be-all and end-all, and there are
times when resourcing can allow things to be done which GC cannot
easily do.

> > So let's break it down explicitly:
> > 
> >> You have an event-driven server.
> > 
> > Granted.
> > 
> >> It has lots of outstanding requests, say 200.
> > 
> > Granted.
> > 
> >> Each request uses resources, say 1MB.
> > 
> > Why?  Why not half a MB, or, better yet, 1kb?
> > 
> 
> Some servers might require 1MB per request, some might require 1kb per 
> request.  See extremely simple example of event-driven server above.
> 
> >>  Most of these resources will be released when the request finishes,
> >> say 95%.
> > 
> > Why?  Why not 100%?  Why not 0%?
> > 
> 
> If you need temporary storage to compute the result needed, and your 
> temporary results do not benefit from caching, they will become garbage 
> after the request finishes.  Consider an IRC server or some server which 
> pushes non-cacheable data.  
> 
> > 
> > Until you answer the two issues above, and define your assumptions
> > for your conclusions, it makes no sense to discuss what the GC is
> > going to do with it.
> > 
> > "Doctor, Doctor, my gc hurts when I cons uncontrollably."
> > "Well, don't do that."
> > 
> 
> Please tell me how.

OK.  If we break up allocations into three kinds, we can deal with each
of them individually.

 1. Caching: As the application is used, objects of certain kinds are
allocated, but only once.  This builds up the state of a complex
application, and configures it to its current usage.  A possible
use for caching in the server example is that if information must
be generated as to how to handle a particular kind of request, it
might be generated once, cached, and then used thereafter when similar
requests come in.  As the application runs, the caching, if
well-designed, reaches a steady-state, after which normal operation
will no longer add to the caching, and special circumstances which
add to the caching occur infrequently.

 2. Resourcing: Workspace buffers can be resourced.  Streams can
be resourced.  Wherever the steady-state reached for #1 above might
cause the need for unknown numbers of objects to be required,
resourcing might be appropriate.  In our own multiprocessing
implementation, we resource a structure called a clock-event,
and chains of such clock-events keep track of scheduler
interruptions, even though it is impossible to predict
how many of these will be needed at any one time.
If the resourcing mechanism is well-designed, it will not work
against the garbage-collector.  Our own internal resource
mechanism uses a weak vector to store objects that have been
allocated at one time but have then been freed.  Thus, when the
garbage-collector is copying new objects into effectively the
next generation, it does not bother with these objects, but
leaves them behind instead as garbage.  Objects that are still in
use are obviously copied, and when put back, become freed resources
for the next use.

 3. Ephemeral consing: If in the course of operation some ephemeral
consing is done, and heavy use is being made of resourcing, then that
ephemeral consing should be reduced.  If it can be reduced to close to
zero, then it will increase the efficiency of the resourcing.

Reducing ephemeral consing can be done in a couple of ways:

 a. Consider rewriting for stack-consing.  This can be extremely
efficient, since stack-allocated objects are usually very easy to
free.  As with other means of memory management, stack-consing can
be dangerous.  But it also carries a big win with it.  (See the short
example after item c, below.)

 b. Consider rewriting for resourcing.  This would of course change the
face of this ephemeral data to look like #2, above.

 c. Consider rewriting for no consing.  Using a space profiler, you can
find out where lisp objects are being consed and rewrite to produce the
same result with no consing.  Sometimes this requires a request to the
vendor, if the function doing the consing is a system function that need
not cons.
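
To make (a) concrete, here is a small example; whether the list really is
stack-allocated depends on the implementation honoring the declaration:

(defun sum-of-squares (a b c)
  ;; The temporary list has dynamic extent, so a compiler that honors
  ;; the declaration may allocate it on the stack and free it on return.
  (let ((tmp (list (* a a) (* b b) (* c c))))
    (declare (dynamic-extent tmp))
    (reduce #'+ tmp)))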


So finally, here is your specific request.

> Now back to your question.  Consider an extremely simple example: A request 
> comes in to fetch a certain file from the disk and write it to a socket.  
> You start an asynchronous I/O command.  To set up this command you need a 
> buffer to store the data you have read, thus you have to allocate it on the 
> heap.  I don't see any other way to do this.

Actually, I really don't think you need a buffer to store the data, other
than any buffering that is provided either by the file stream or socket
stream.  So let's consider these two resources.  If your stream system
allows you to resource file and socket streams, and provides for the
resourcing of the buffers used in each stream (either as part of the
resourcing of the streams or as a separate resource), then no consing need
be done for any of this operation.  In fact, it is likely that an upper
limit on the number of such requests that can be simultaneously handled
here is less likely to be memory, and more likely to be the operating
system's limit on the total number of open sockets or on the number
of threads that can be started at a time.
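
As a sketch, the resourced version of the file-to-socket example might
look like this (portable CL, socket-stream again assumed to be a binary
stream, and the pool not made thread-safe):

(defvar *io-buffers* '())               ; reusable 64KB buffers

(defun call-with-io-buffer (function)
  (let ((buffer (or (pop *io-buffers*)
                    (make-array 65536 :element-type '(unsigned-byte 8)))))
    (unwind-protect (funcall function buffer)
      (push buffer *io-buffers*))))

(defun send-file (path socket-stream)
  ;; In the steady state this conses nothing for the copy itself: the
  ;; buffer is checked out of the pool and put back afterwards.
  (call-with-io-buffer
   (lambda (buffer)
     (with-open-file (in path :element-type '(unsigned-byte 8))
       (loop for n = (read-sequence buffer in)
             until (zerop n)
             do (write-sequence buffer socket-stream :end n))))))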

Have I now been explicit enough?

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: ········@acm.org
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <po9T7.4226$Yq5.412704@news20.bellglobal.com>
Alexander Kjeldaas <··········@fast.no> writes:
> You have an event-driven server.  It has lots of outstanding
> requests, say 200. Each request uses resources, say 1MB.  Most of
> these resources will be released when the request finishes, say 95%.
> Using a generational GC implementation, it seems you generally have
> the option of choosing between three evils:

> 1) GC after each request finishes: The GC would be able to free 1MB,
> but would have to scan 200MB of data, assuming that all objects
> belonging to outstanding requests were located in the initial
> generation.  This is not a very effective GC - only .5% of the
> memory scanned is freed.  [If some objects belonging to outstanding
> requests are not located in the initial generation, this GC would on
> average scan less memory, but wouldn't be able to free as much
> memory (in bytes) on average.]

> 2) GC after every, say, 200 requests: The GC would be able to free
> 200MB, and would have to scan 400MB to free the 200.  This is pretty
> effective - 50% of the memory scanned is freed.  However, this means
> that one would have to use 400MB of memory.

> 3) Don't use the GC system and do pooling of objects internally.
> This is the good engineering solution, but is complex.  Taking
> pooling to the extreme, this also leads to exactly the kind of bugs
> GC helps you avoid.  This requires discipline, might have problems
> with external libraries etc.

> Both of the GC solutions have problems.  Solution 1 uses a lot of
> CPU scanning memory, solution 2 uses a lot of memory.  So you go for
> solution 3 - avoiding the GC system.

You don't seem to be using Option #2 to its fullest potential.

It _is_ a given, irrespective of strategy, that there needs to be at
least 200MB around to support the expected data structures.

- Supposing you've got a comfortable 256MB around, that would permit
  doing a GC run roughly every 50 requests.  The efficiency isn't
  spectular; that results in about 20% of the scanned memory getting
  freed.  It's better than just getting 0.5%; it's not as good as 50%;
  it may still be _OK_.

I suppose it would be theoretically ideal to be able to tag heaps to
indicate some form of namespace association; that would mean that
you'd spawn a heap for each session, and then (in theory) be able to
efficiently reap that heap at the end.  The result there would be
along the lines that you'd have each session inside something like:

  (with-heap (s "blah")
    (initialization s)
    (code s))

Which would expand (roughly) to:

 (let ((s (make-heap "blah")))
    (unwind-protect
        (progn
          (initialization s)
          (code s))
      (gc s)))
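
Written out as a macro (MAKE-HEAP and GC remaining the hypothetical
primitives used above), that's roughly:

(defmacro with-heap ((var name) &body body)
  `(let ((,var (make-heap ,name)))
     (unwind-protect
         (progn ,@body)
       (gc ,var))))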

That still leaves open the issue of how you make sure that all the
memory that gets allocated within that environment gets associated
with the heap.

Furthermore, it leaves open the issue of how you make sure that some
objects are NOT allocated in that heap, as they need to get kept for
other purposes.

It's theoretically nice enough to create some sort of "enclosure"
where you indicate that you're allocating inside one heap or another:

  (with-heap (s)
    (setf x '(1 2 3)))
  (with-heap (u)
    (setf y '(4 5 6)))

It still pulls in some tough questions:
 -> What's the scope of this declaration?  Lexical?  Dynamic?
 -> What about places where I want "global" definition?
      Do I have to spatter the code with huge numbers of WITH-HEAP
      declarations?

That last question is _the_ question: Do I have to spatter the code
with lots of WITH-HEAP declarations?  More declarations means more
chances to get it _wrong_.

I'm not sure but that throwing an extra 50MB at the application, and
GCing every 50 iterations, isn't a reasonable golden mean...
-- 
(reverse (concatenate 'string ········@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/spreadsheets.html
Rules of the Evil  Overlord  #48.  "I will  treat  any beast  which  I
control through magic or technology with respect and kindness. Thus if
the control is ever broken, it will not immediately come after me
for revenge." <http://www.eviloverlord.com/>
From: Pierre R. Mai
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <87ellu7ku1.fsf@orion.bln.pmsf.de>
Alexander Kjeldaas <··········@fast.no> writes:

> I think I have stated this several times now, but doing it once more won't 
> hurt: 
> 
> You have an event-driven server.  It has lots of outstanding requests, say 
> 200. Each request uses resources, say 1MB.  Most of these resources will be 
> released when the request finishes, say 95%.  Using a generational GC 
> implementation, it seems you generally have the option of choosing between 
> three evils:

The remaining 5% of memory will have to be freed pretty soon, too, or you
will have no memory left in less than 15 minutes on a 4GB machine,
assuming 100 requests per second.  ;)

> 1) GC after each request finishes: The GC would be able to free 1MB, but 
> would have to scan 200MB of data, assuming that all objects belonging to 
> outstanding requests were located in the initial generation.  This is not a 
> very effective GC - only .5% of the memory scanned is freed.  [If some 
> objects belonging to outstanding requests are not located in the initial 
> generation, this GC would on average scan less memory, but wouldn't be able 
> to free as much memory (in bytes) on average.]

This makes the assumption that the 1MB used by each request will all
have lifetimes that exactly match the lifetime of each request, and
there will not be any other temporary memory allocations.  Those seem
like very artificial assumptions, which match no server I have ever
implemented.

> 2) GC after every, say, 200 requests: The GC would be able to free 200MB, 
> and would have to scan 400MB to free the 200.  This is pretty effective - 
> 50% of the memory scanned is freed.  However, this means that one would 
> have to use 400MB of memory.

Even in that worst case, 200MB additional overhead over the theoretical
best case of exact memory management seems a very good trade to me.

> 3) Don't use the GC system and do pooling of objects internally.  This is 
> the good engineering solution, but is complex.  Taking pooling to the 
> extreme, this also leads to exactly the kind of bugs GC helps you avoid. 
> This requires discipline, might have problems with external libraries etc.

Since you assume easily manageable object lifetimes (all per-request
data lives exactly as long as the request), pooling will not lead to
significant management problems.  Of course, in reality, object
lifetimes will be more complex, but that will also mean that the GC
will be more efficient.

> Both of the GC solutions have problems.  Solution 1 uses a lot of CPU 
> scanning memory, solution 2 uses a lot of memory.  So you go for solution 3 
> - avoiding the GC system.

I don't see the problem with using "lots" of memory.  400 MB is not
lots of memory, for a server that has at least a 200MB working set
anyway.

> I think a language that was meant to be used for "web programming" could go 
> the extra mile to have a more general solution to this that doesn't involve 
> the programmer having to avoid the GC system to get comparable speed to a C 
> program.

I fail to see how C achieves the theoretical speed you seem to
presume.  You completely fail to account for the higher allocation
overhead that malloc has, and the overall higher management cost of a
malloc and free based memory manager:  In order to realize your
theoretical minimum of 200MB working set, you will have to keep
fragmentation to a low level.  Realistically I'd expect serious
amounts of fragmentation (unless you only allocate same-sized
objects), which will likely increase the 200MB theoretical figure to a
somewhat larger practical figure.  Furthermore, unless the 1MB is just
in one or two large objects, allocation and freeing overhead will
increase even more.

Given that your theoretical server app will have to do something with
the 200MB working set that it constantly has, and assuming a
throughput of > 10 requests/sec, I think that even a moderatly
inefficient GC will account for less than 10% of overall system
performance.  Take a silly little example, which creates 10 processes
per second, each of which allocates 512KB of memory (unspecialized
array, hence the GC has to trace its contents!), sets one cell, sleeps
10 seconds, and returns the referenced value of that cell.  All in all
around 100 processes are live at each point in time, with a total of
1000 processes being created/destroyed.  With an untuned, mediocre
conservative generational GC, 10MB nursery generation, and a GC
manually triggered every 10 processes, memory consumption ranges from
50 to 100 MB, and the GC runs less than 10 seconds of the 112 seconds
the test runs.  And that's on a lowly AMD K6-2/550, which furthermore
has a very slow memory subsystem, compared to todays workstations, let
alone current server machines.

Figures like this are in the noise...
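
A portable approximation of that little test, using bordeaux-threads and
trivial-garbage in place of whatever implementation produced the figures
above (so the numbers will certainly differ), might be sketched as:

(defun run-test (&key (total 1000) (per-second 10) (array-size 131072))
  ;; Spawn PER-SECOND threads per second, TOTAL in all.  Each allocates an
  ;; unspecialized array (~512KB on a 32-bit lisp), touches one cell,
  ;; sleeps 10 seconds, and returns.  A full gc is requested every 10 spawns.
  (let ((threads '()))
    (dotimes (i total)
      (push (bt:make-thread
             (lambda ()
               (let ((v (make-array array-size)))
                 (setf (aref v 0) t)
                 (sleep 10)
                 (aref v 0))))
            threads)
      (when (zerop (mod (1+ i) 10))
        (trivial-garbage:gc :full t))
      (sleep (/ 1 per-second)))
    (mapc #'bt:join-thread threads)
    (values)))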

> My proposed "solution" is to have heaps as first class objects, and to be 
> able to evaluate a function in the context of a heap.  From a performance 
> point of view, I also think this is a worthwhile goal because it makes it 
> possible to use the most efficient allocator, period. You don't need the 
> overhead of tracing garbage memory like C programs do, and you don't need 
> the overhead of tracing live objects like GC does.

How do you propose to keep the function from creating references
between different heaps?  If you can't then the GC will have to take
references from other heaps into the currently GCed heap into
account.  This will mean that the GC will have to scan all of those
other heaps, in order to GC one heap.  Since the function heaps can't
be marked read-only (like older generations in generational GCs), you
can't use the trick of not scanning unchanged heaps.

So in order for your trick to increase efficiency, you will at the
very least have to prevent references between the "per-request" heaps.  But in
order to do this, you will also have to prevent the request processors
from creating references in the shared global heaps to their own
heaps, since that could be used to create inter-request-heap
references.

So in effect you have prevented each request processor from creating
durable state, which makes such a system pretty useless...

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Bruce Hoult
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <bruce-1BF35E.14060717122001@news.paradise.net.nz>
In article <··············@orion.bln.pmsf.de>, "Pierre R. Mai" 
<····@acm.org> wrote:

> > 2) GC after every, say, 200 requests: The GC would be able to free 
> > 200MB, 
> > and would have to scan 400MB to free the 200.  This is pretty effective 
> > - 
> > 50% of the memory scanned is freed.  However, this means that one would 
> > have to use 400MB of memory.
> 
> > Even in that worst case, 200MB additional overhead over the theoretical
> best case of exact memory management, seems a very good trade to me.

Especially given that:

- this is presumably a for-profit organisation
- 256 MB of PC133 RAM costs NZ$75 (US$30)
- programmers capable of correctly manually managing memory cost 
$50+/hour in salary alone
From: Thomas F. Burdick
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <xcv6679b74h.fsf@apocalypse.OCF.Berkeley.EDU>
Alexander Kjeldaas <··········@fast.no> writes:

> I think a solution to this problem is to support multiple heaps.  Each 
> request should use a separate heap that can be torn down after the 
> request has been serviced. This requires a tiny bit of compiler support, 
> but I think it would make it possible to implement a server application 
> with virtually no memory or garbage collection overhead compared to a 
> server written in C.

I agree with the idea that each request should get its own pool.
Apache does its memory management this way, and it's a pretty
reasonable thing to do for the C world.  I'm working on a (server)
application (in Lisp) that manages memory in a similar way.  Its
"pools", though, are really ephemeral generations.  In a special case,
they can have no roots, and "collection" just frees the whole pool
without checking for pointers from older generations.

> The operations I think are needed to support multiple heaps are: an 
> operation to create heaps with various characteristics, and the ability to 
> call a function so that consing happens within the given heap for 
> everything that is done within that function (recursively).

That's not how we're doing it.  We have a dynamic variable that
contains the current pool, and the ability to make new pools.
Primarily, we use this to better identify generations (at the end of
servicing a request, there's almost no memory retained -- but this is
a function of the dynamic contours of the program, not necessarily of
the number of bytes consed).  This isn't managed directly by the
programer, though.  Conceptually, I put the primitives in the same
realm as GO: they're a powerful, lowlevel tools that I only want to
use when building the tools that I'll actually use to write the
application.
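
In rough outline the shape is something like the following, with
MAKE-POOL and FREE-POOL standing in for the low-level primitives (they
are not shown here):

(defvar *current-pool* nil)             ; the pool new allocations are charged to

(defmacro with-request-pool (&body body)
  ;; Give BODY its own pool for the dynamic extent of a request, then
  ;; release the whole pool once the request is done.
  `(let ((*current-pool* (make-pool)))
     (unwind-protect (progn ,@body)
       (free-pool *current-pool*))))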

> Example:
> (let ((my-heap (make-heap :no-external-references-allowed)))
>   (call-with-heap my-heap #'my-func))

So, what you're trying to express here is that everything you're
allocating will have dynamic-extent.  Funny, my compiler recognizes
this by looking at dynamic-extent declarations.  And, yeah, they're a
little bit of a pain to type all the time, but that's good.  I want
the compiler to assume that everything is indefinite extent, unless I
promise it otherwise.  Forcing the programmer to think before making
promises as grand as limiting the extent of an object is IMO good.
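For reference, the standard way to make that promise (FORMAT-REPLY here is
just a stand-in for whatever the request handler actually does):

(defun serve-request (request)
  (let ((scratch (make-string 4096)))
    ;; Promise to the compiler: SCRATCH does not escape SERVE-REQUEST,
    ;; so it may be stack-allocated.  Breaking the promise is undefined
    ;; behaviour, which is why it should not be made lightly.
    (declare (dynamic-extent scratch))
    (format-reply scratch request)))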

> In order to take full advantage of the ability to create heaps with various 
> characteristics, I would need a little help from the memory system and the 
> compiler.  For instance, I would like to create a heap with the invariant 
> that no object outside the heap (except for the evaluation stack) should be 
> able to refer to an object inside the heap.  This is a fantastic heap 
> because it allows me to allocate objects fast, and it allows me to throw 
> away the whole heap after I am done with it - exactly the thing I want for 
> server programming.  

I agree, but I think you're moving into dangerously manual territory
with the way you're doing this.  And, FWIW, you really don't need a
new language for this.  A new compiler, probably, but there's nothing
you want that can't be done with simple extensions to CL.

> The compiler support that I need for this heap is that when I do
> (call-with-heap my-heap #'my-func)
> above, the compiler should make a copy of all stack variables, and it 
> should make all global variables copy-on-write.  That way it is impossible for 
> my-func and functions called by it to "communicate" with the outside world. 
> To communicate with the outside world, a message passing interface could be 
> used, just as we would do in a fork()-based server.
> 
> I think this is one cool thing that Arc could implement - and something 
> that would make it ideal for server programming.

fork(), shmork().  C servers need to kill their children off from time
to time because they're full of memory leaks.  We shouldn't need to
deal with this in Lisp.  All the overhead of OS preemption is *waaay*
too much to pay.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vg9qn$mm1$1@news.ost.eltele.no>
Thomas F. Burdick wrote:

>> Example:
>> (let ((my-heap (make-heap :no-external-references-allowed)))
>>   (call-with-heap my-heap #'my-func))
> 
> So, what you're trying to express here is that everything you're
> allocating will have dynamic-extent.

I think the example is somewhat wrong. Since a server is often event-driven,
you want to keep the heap around until you are done with it.  The state
needed to deal with multiple concurrent queries cannot be stored on a
single unified stack.

I even think destruction of such extra heaps could be done by the GC [no,
that doesn't mean we're back to square one, because the GC would only have
to GC a tiny local heap compared to what it would look like if
everything were allocated there].

Example:

(defvar *active-query-heaps* (make-hash-table))  ; query-id -> heap

(defun handle-query (cmd)
  ...)

(defun handle-io (cmd)
  (declare (simple-string cmd))
  (let* ((heap (gethash (queryid cmd) *active-query-heaps*))
         (result (call-with-heap heap #'handle-query cmd)))
    (declare (simple-string result))
    (when (query-finished result)
      (remhash (queryid cmd) *active-query-heaps*)  ; drop last reference to HEAP
      ;; GC of tiny "main" local heap will detect dead heap and free 
      ;; the resulting memory without having to scan that heap.
      (gc))))

>  Funny, my compiler recognizes
> this by looking at dynamic-extent declarations.  And, yeah, they're a
> little bit of a pain to type all the time, but that's good.  I want
> the compiler to assume that everything is indefinate extent, unless I
> promise it otherwise.  Forcing the programmer to think before making
> promises as grand as limiting the extent of an object is IMO good.
> 

dynamic-extent looks interesting, but I have a few problems with it:
First, it looks unsafe to use.  You promise to limit the extent of an
object, but will the system catch your errors for you?  If it does, at
what price?  (In my scheme I am not suggesting that objects be given
limited extent.)  Second, it seems that you must use a threaded model,
since you can't use a single unified stack to hold information if you use
an event-driven design.  Third, the Lisp implementation probably needs to
deal with non-linear stacks, on a 32-bit implementation at least.

astor
From: Daniel Barlow
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <87heqtwifr.fsf@noetbook.telent.net>
Alexander Kjeldaas <··········@fast.no> writes:

> I think a solution to this problem is to support multiple heaps.  Each 
> request should use a separate heap that can be torn down after the 
> request has been serviced. This requires a tiny bit of compiler support, 
> but I think it would make it possible to implement a server application 
> with virtually no memory or garbage collection overhead compared to a 
> server written in C.

This seems so obvious that there is probably something horribly wrong
with it, but what about a generational GC with a per-request nursery?
When the request is done, you GC that heap, copying the live data to
the next generation.  The cost of a copying GC is mostly related to
what you keep rather than what you leave behind, so in the case that
you throw everything away, this is probably not going to hurt too
much.  If when you come to measure performance you find that you're
not throwing as much away as you wanted to, you have to fix things so
that you are - i.e. the system may get slower, but it remains safe.
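Sketched as a macro (WITH-REQUEST-NURSERY and the two primitives are
invented names; the point is just that safety stays the GC's problem,
not the program's):

(defmacro with-request-nursery (&body body)
  "Allocate BODY's consing in a private nursery; on exit, copy whatever
is still live into the next generation and discard the rest.  The cost
is proportional to the survivors, not to the total amount consed."
  `(progn
     (push-nursery)
     (unwind-protect (progn ,@body)
       (collect-and-pop-nursery))))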

Certainly it's going to require less compiler support than making all
global variables copy-on-write.  And you still have the option, when
you need it, of manipulating global state.

I have a feeling that I've read a paper (could have been Henry Baker?)
describing something similar where he puts the nursery on the stack
and transports it to safety when each function exits.  It was a few
years ago and I may be misremembering, though.

-dan

-- 

  http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources 
From: Bruce Hoult
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <bruce-378BE4.12060615122001@news.paradise.net.nz>
In article <··············@noetbook.telent.net>, Daniel Barlow 
<···@telent.net> wrote:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
> > I think a solution to this problem is to support multiple heaps.  Each
> > request should use a separate heap that can be torn down after the
> > request has been serviced. This requires a tiny bit of compiler support,
> > but I think it would make it possible to implement a server application
> > with virtually no memory or garbage collection overhead compared to a
> > server written in C.

This is actually how a number of Macintosh programs I've written (in C++ 
or Pascal) work.

On classic MacOS (nearly) the whole of memory is a big heap, and the OS 
creates sub-heaps for each Application running.  You can do the same 
yourself.  Allocate some memory using NewPtr(), initialize it as a heap, 
set it to be the current heap, allocate away, and then when you're done 
just set the current heap back to the main Application heap and 
DisposePtr() your temporary heap.  You can even have a number of them at
the same time -- e.g. per request or per thread -- and just switch the
current heap each time you change threads or requests.  Instead of 
disposing of an old heap, you might just reinitialize it.  It's all easy.

Of course you don't get any help from GC (what GC?) but it's no worse 
than any other malloc/free style programming.


> I have a feeling that I've read a paper (could have been Henry Baker?)
> describing something similar where he puts the nursery on the stack
> and transports it to safety when each function exits.  It was a few
> years ago and I may be misremembering, though.

The "Chicken" Scheme compiler uses this technique.  The code is CPS 
converted and translated to C, so it consists of a collection of C 
functions that *never* return and therefore use the C stack in a 
continually growing, linear manner.  The C stack is the GC nursery and 
when the stack overflows a GC is done and then the stack is unwound 
using longjmp().  For best performance the C stack is limited to a 
fixed size determined experimentally by a test program when you install 
Chicken.  On my 700 MHz Athlon it chooses 8 KB.  On an 867 MHz G4 Mac it 
chooses 128 KB.

-- Bruce
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vg3ku$ljp$1@news.ost.eltele.no>
Daniel Barlow wrote:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
>> I think a solution to this problem is to support multiple heaps.  Each
>> request should use a separate heap that can be torn down after the
>> request has been serviced. This requires a tiny bit of compiler support,
>> but I think it would make it possible to implement a server application
>> with virtually no memory or garbage collection overhead compared to a
>> server written in C.
> 
> This seems so obvious that there is probably something horribly wrong
> with it, but what about a generational GC with a per-request nursery.
> When the request is done, you GC that heap, copying the live data to
> the next generation.  The cost of a copying GC is mostly related to
> what you keep rather than what you leave behind, so in the case that
> you throw everything away, this is probably not going to hurt too
> much.  If when you come to measure performance you find that you're
> not throwing as much away as you wanted to, you have to fix things so
> that you are - i.e. the system may get slower, but it remains safe.
> 
> Certainly it's going to require less compiler support than making all
> global variables copy-on-write.  And you still have the option, when
> you need it, of manipulating global state.
> 

I think the basic problem with this approach, which would be great if it
could be implemented effectively, is that you need to trace references
across all the local heaps.  You will have problems GCing just one heap.
So while I think this would be great, I am not sure it is possible to
implement.

> I have a feeling that I've read a paper (could have been Henry Baker?)
> describing something similar where he puts the nursery on the stack
> and transports it to safety when each function exits.  It was a few
> years ago and I may be misremembering, though.
> 

astor
From: Alexander Kjeldaas
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <9vlpdi$fnr$1@news.ost.eltele.no>
Alexander Kjeldaas wrote:

> Daniel Barlow wrote:
> 
>> Alexander Kjeldaas <··········@fast.no> writes:
>> 
>>> I think a solution to this problem is to support multiple heaps.  Each
>>> request should use a separate heap that can be torn down after the
>>> request has been serviced. This requires a tiny bit of compiler support,
>>> but I think it would make it possible to implement a server application
>>> with virtually no memory or garbage collection overhead compared to a
>>> server written in C.
>> 
>> This seems so obvious that there is probably something horribly wrong
>> with it, but what about a generational GC with a per-request nursery.
>> When the request is done, you GC that heap, copying the live data to
>> the next generation.  The cost of a copying GC is mostly related to
>> what you keep rather than what you leave behind, so in the case that
>> you throw everything away, this is probably not going to hurt too
>> much.  If when you come to measure performance you find that you're
>> not throwing as much away as you wanted to, you have to fix things so
>> that you are - i.e. the system may get slower, but it remains safe.
>> 
>> Certainly it's going to require less compiler support than making all
>> global variables copy-on-write.  And you still have the option, when
>> you need it, of manipulating global state.
>> 
> 
> I think the basic problem with this approach, which would be great if it
> could be implemented effectively, is that you need to trace references
> across all the local heaps.  You will have problems GCing just one heap.
> So while I think this would be great, I am not sure it is possible to
> implement.

Thinking a bit more about this problem, it might actually be implementable. 

Say you have the global heap, and several local heaps, one per query.  In 
order to GC a single local heap without having to trace all the other local 
heaps, you have to know about all references from other local heaps, and 
from the global heap.  However, it doesn't seem like it is possible to end 
up with a reference into our local heap from another local heap that will
not be caught by a write-barrier on the global heap.  In order to get
another local heap to refer to an object we have created, we first have to
change some "global" variable that can be seen by both heaps, and this 
"global" variable could/would be protected by a write-barrier.

If this actually works, implementing call-with-heap would basically involve 
registering the arguments to the function as new roots to the calling heap.

Interesting.

astor
From: Lars Lundback
Subject: Re: Why Arc isn't especially OO
Date: 
Message-ID: <3c1c4cfc.4096726@news1.telia.com>
On Fri, 14 Dec 2001 06:10:24 GMT, Erik Naggum <····@naggum.net> wrote:

>* James A. Crippen
>| I understand his purpose perfectly.  CL is really too large for the
>| purpose he applied it to.  If you don't need CLOS or hash tables for
>| instance, then you're losing lots of code space in your binary.
>
>  I do not understand this position at all.  Regardless of the "tree
>  shaking" issues, "too large language" is simply false.  The concept of a
>  language being "too large" is vacuous.  Just because starting up a new
>  process is thought to be slow, does not mean that the language is too
>  large.  First, startup time has nothing to do with language size.
>  Second, code size has nothing to do with startup time.  Third, neither
>  has binary size or disk footprint or any of the other numerous idiot
>  measures of size.  As a matter of counter-evidential fact, most of the
>  Schemes I used before I came to my senses are still way, way slower to
>  start up than the Common Lisps I have used, especially with actual code
>  being used, and Scheme is about the smallest language you can get.

I would agree on all the technical points you make. Absolutely. What
was once thought to be big is today no longer so. And when CL appeared
at last, very much longed-for, it was far better than the Franz
Lisp and Interlisp I had been using. Add to that the fact that CL is a
stable standard, and it becomes impossible to advocate for new "Lisps".

However, I think some of you miss the more personal "feel" of CL's size
that some people (perhaps?) have. This feeling may be based not on
code size but on the internal complexity of CL implementations. In short,
one may feel a need for total mastery of all the internals of an
implementation, and then a complete CL implementation becomes too much
to handle. This feeling is rational because it is personal, but it is
justifiable only when you remain in a personal "Lisp" grotto.

It could be that Arc is meant to be implemented "on top of itself",
allowing a tighter integration with the OS, or even becoming a part of
it. That's the only reason I can see from my limited view. Graham says
in his Arc FAQ that Arc is intended for bottom-up programming. Aren't
all programming languages exactly that? But why yet another Lispish
language must be invented escapes me.

There is this "Modernizing Common Lisp: Recommended Extensions"
document, written at MIT in 1999. I wonder if something came out of
it? If not, it sadly reflects how very difficult it is to handle
united efforts at reshaping and extending CL without sacrificing the
current standard. 

>  Execing a new image is usually quite expensive, but forking is not, at
>  least under reasonable Unices, so it should be possible to fork a running
>  image of a large binary with almost no overhead.  Reasonable Unices even
>  do copy-on-write and will not waste memory needlessly, either.  This
>  means that the startup time has become _completely_ irrelevant.
>
>  Furthermore, the one-thread-per-connection model is inefficient compared
>  to a tightly integrated I/O loop because a context switch is so much more
>  expensive than merely running the same code with several file descriptors.
>
>  Making languages smaller to get faster startup time in a system that has
>  decided on the fork and exec model appears to me no smarter or effective
>  than shaving to lose weight.
>
>| And the Lisp Hacker that has not written their own Lisp-like language is
>| not the true Lisp Hacker.
>
>  Nor is he who has not seen the folly of the endeavor in time.

Oh, I am a fool all right, but only in the sense that I spend private
time somewhere in the lowest cellars of a CL implementation, and I do
it only for fun. But inventing a Lisp-like language - no Sir, I would
do that only after a very extensive analysis of the need for it.

Lars