Re: Eval (Was: What's a treeshaker?)

From: ···@sef-pmax.slisp.cs.cmu.edu
Subject: Re: Eval (Was: What's a treeshaker?)
Date: Wed, 03 Aug 1994 03:32:19 +0000
Message-ID: <31n343$me2@cantaloupe.srv.cs.cmu.edu>

    From: ····@ichips.intel.com (Mike Haertel)
    
    Everybody *says* the facilities you don't use don't get paged in, and
    then uses this as an excuse to provide creeping featurism from hell.
    The creepy features probably don't get used terribly often, and certainly
    not in the greatest generality, but likely just often enough to keep
    them continually paged in.

    I've never seen a lisp system yet that didn't have a huge working set.
    Not even when I used a VAX running BSD with 1K pages.
    
The standard CMU CL core image weighs in at something like 20 Mbytes.  If
you are running a straightforward application -- something like a
neural-net simulator -- only a small fraction of this comes into working
memory.  The compiler, inspector, debugging tools, Hemlock editor, doc
strings, bignum stuff, and all your creeping features stay paged out.  I
haven't measured the working set, but it is dominated by the data
structures used by the application.

During normal running, if you're not doing I/O, there is no consing, so the
GC stays paged out as well.  But I admit that this is unusual -- most
applications will call in the GC every so often.

If you have big pages under your favorite OS, you will get some extraneous
stuff on the same pages with stuff you need.  However, if the Lisp has been
carefully packed so that related stuff is adjacent in virtual memory, the
amount of junk you page in should be much less than the amount you get by
linking in libraries just to get a few functions in them.
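
The packing argument can be illustrated with a toy model. The sketch below is not a measurement of any real Lisp image: the page size, the function sizes, and the names (fn0 ... fn63) are all assumptions chosen for illustration. It lays out the same 64 functions in two address orders and counts how many pages are touched when the application uses only 8 of them.

```python
PAGE_SIZE = 4096  # bytes; an assumed page size, not taken from any real system


def pages_touched(layout, used, page_size=PAGE_SIZE):
    """Return the set of page numbers touched when the `used` objects are accessed.

    layout: list of (name, size_in_bytes) pairs in address order.
    used:   set of names the application actually references.
    """
    pages = set()
    offset = 0
    for name, size in layout:
        if name in used:
            first = offset // page_size
            last = (offset + size - 1) // page_size
            pages.update(range(first, last + 1))
        offset += size
    return pages


# 64 hypothetical functions of 512 bytes each; the application uses every 8th one.
functions = [(f"fn{i}", 512) for i in range(64)]
used = {f"fn{i}" for i in range(0, 64, 8)}

# Scattered layout: the used functions are spread evenly through the image.
scattered = functions

# Packed layout: the used functions are placed next to each other at the start.
packed = ([f for f in functions if f[0] in used]
          + [f for f in functions if f[0] not in used])

scattered_pages = len(pages_touched(scattered, used))
packed_pages = len(pages_touched(packed, used))
print("scattered:", scattered_pages, "pages; packed:", packed_pages, "pages")
```

In this model the scattered layout touches eight pages and the packed layout touches one, although the same 4 KB of code is used in both cases.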

In our system, the vast majority of the initial core image is pure, and is
shared among all the Lisp jobs you may have running.  (A few versions of
Unix make this sharing impossible, unfortunately.)
    
    In the "all the world's linked in" model of software development, you
    don't have to go to any hassle to use J. Random Feature from the Foo
    library.  It's already there.  By contrast, if you have to explicitly
    link the Foo library, there's a hassle factor, and perhaps also a moment
    of guilt, that makes you think about whether you really need the feature.
    Thus, although the "VM can replace explicit linker control" mindset is
    certainly theoretically true, I believe in practice it is doomed
    to fail because of psychological factors.

Wow!  You really believe that it is A Good Thing to make it a hassle to use
pre-existing facilities so that people will have to stop and think about
whether they really want to pay for each routine?  Maybe it's best not to
program in a high-level language at all -- we don't want to hide all those
low-level decisions that just might cost you a few cycles or a few words.

I prefer to make it as easy as possible for the programmer to use any
pre-existing facilities.  We should also provide some tools so that people
who want to obsess about working set can find out how much things are going
to cost them.
    
    IMHO the "all the world's linked in" model is great for rapid prototyping,
    [which Lisp excels at] but falls flat on its face in production software
    development and delivery environments.
    
Well, we are going the other way in Dylan, but we certainly are going to
make it as easy as possible to pull in all your favorite creeping features.
I guess that in lieu of hassles, Gwydion could have a mode that administers
a mild electric shock every time you load a new library, explicitly or
implicitly, with voltage proportional to the size of the code.

-- Scott

===========================================================================
Scott E. Fahlman			Internet:  ····@cs.cmu.edu
Principal Research Scientist		Phone:     412 268-2575
School of Computer Science              Fax:       412 681-5739
Carnegie Mellon University		Latitude:  40:26:46 N
5000 Forbes Avenue			Longitude: 79:56:55 W
Pittsburgh, PA 15213
===========================================================================

From: Dan Weinreb
Subject: Re: Eval (Was: What's a treeshaker?)
Date: Thu, 04 Aug 1994 14:45:18 +0000
Message-ID: <DLW.94Aug4144518@butterball.odi.com>

In article <··········@cantaloupe.srv.cs.cmu.edu> ···@sef-pmax.slisp.cs.cmu.edu writes:

> If you have big pages under your favorite OS, you will get some extraneous
> stuff on the same pages with stuff you need. However, if the Lisp has been
> carefully packed so that related stuff is adjacent in virtual memory, the
> amount of junk you page in should be much less than the amount you get by
> linking in libraries just to get a few functions in them.

Unfortunately it's usually hard to get accurate quantitative
information about this kind of thing. It's hard to know whether your
packing is really careful or not. The timesharing systems generally
don't even provide the crucial hooks that would allow you to have a
metering tool to see just which pages are being faulted. At least,
that has been my own experience.

> Wow! You really believe that it is A Good Thing to make it a hassle to use
> pre-existing facilities so that people will have to stop and think about
> whether they really want to pay for each routine? Maybe it's best not to
> program in a high-level language at all -- we don't want to hide all those
> low-level decisions that just might cost you a few cycles or a few words.

I think the real problem is that it's so difficult to get a real and
useful measurement of what you are paying to use a facility. It might
not be just a few cycles or a few words; it might be a lot of disk
operations, particularly given the way a small increase in working set
can sometimes lead to a large increase in total realtime, particularly
if you're near one of those "knees in the curve".
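
One way to see such a knee is a small simulation. The sketch below is a toy model under stated assumptions: strict LRU replacement, a fixed number of physical frames, and a program that simply loops over its working set. Real pagers and real programs behave less cleanly, and the names (lru_faults, FRAMES, PASSES) are made up for the example.

```python
from collections import OrderedDict


def lru_faults(num_frames, working_set, passes):
    """Count page faults for a loop that cycles over `working_set` pages.

    Assumes strict LRU replacement with `num_frames` physical frames.
    """
    resident = OrderedDict()  # page -> True, ordered from least to most recently used
    faults = 0
    for _ in range(passes):
        for page in range(working_set):
            if page in resident:
                resident.move_to_end(page)        # mark as most recently used
            else:
                faults += 1
                if len(resident) >= num_frames:
                    resident.popitem(last=False)  # evict the least recently used page
                resident[page] = True
    return faults


FRAMES = 100
PASSES = 50

fits = lru_faults(FRAMES, 100, PASSES)  # working set exactly fits in memory
over = lru_faults(FRAMES, 101, PASSES)  # working set is one page too large

print("faults when it fits:", fits)
print("faults one page over:", over)
```

In this model a working set that fits takes only the 100 initial faults, while a working set one page larger faults on every single access (5050 faults), because LRU always evicts the page that the loop needs next.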

Making it all much more complicated and hard to measure is that the
performance degradation can depend not only on the exact history of
your program session, but even on how many other users or daemons or
whatever are executing on the system, and exactly what they're doing.
This difficulty in obtaining useful quantitative cost information
forces everyone to use very vague, intuitive notions of cost; a very
unsatisfactory way to make engineering decisions.

I'm afraid I don't have much of a proposal for fixing this, other than
enhancing the operating system to let you see its paging decisions
(within security constraints, etc), and given those, get tools that
let you see what's happening, how you might recluster your stuff, etc.
That's not easy, either.
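
As a rough sketch of the coarse information some systems do expose, the example below reads the per-process fault counters that Unix-like systems report through getrusage, using Python's standard resource module. It yields only counts, not which pages faulted, so it falls well short of the metering tool described above. The helper names (fault_counts, measure, touch_memory) and the buffer size are assumptions for illustration, and some platforms may report zero for these fields.

```python
import resource


def fault_counts():
    """Return (minor_faults, major_faults) for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def measure(fn):
    """Run fn() and return the (minor, major) faults charged while it ran."""
    before = fault_counts()
    fn()
    after = fault_counts()
    return after[0] - before[0], after[1] - before[1]


def touch_memory(n_bytes=16 * 1024 * 1024, stride=4096):
    """Allocate a buffer and write one byte per assumed 4 KB page."""
    buf = bytearray(n_bytes)
    for i in range(0, n_bytes, stride):
        buf[i] = 1


minor, major = measure(touch_memory)
print("minor faults:", minor, "major faults:", major)
```

The numbers vary from run to run and with whatever else the machine is doing, which is the measurement problem described above.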

This is particularly important given that a major goal of Dylan is to
avoid the classic Lisp problem of applications that are slow because
(inter alia) the working set is so big.

From: Martin Rodgers
Subject: Re: Eval (Was: What's a treeshaker?)
Date: Thu, 04 Aug 1994 14:58:34 +0000
Message-ID: <776012314snz@wildcard.demon.co.uk>

In article <··········@cantaloupe.srv.cs.cmu.edu>
           ···@sef-pmax.slisp.cs.cmu.edu  writes:

> Well, we are going the other way in Dylan, but we certainly are going to
> make it as easy as possible to pull in all your favorite creeping features.
> I guess that in lieu of hassles, Gwydion could have a mode that administers
> a mild electric shock every time you load a new library, explicitly or
> implicitly, with voltage proportional to the size of the code.

I like it! I just wish Microsoft would use something like that. I used
to use a Sage II, and I remember reading about how developers were using
it to write software for the Apple II, running the P-system. They found
that apps that ran fine on the Sage II ran slowly on the Apple II.

The reason was the same as the one we're discussing here. The Sage II
had more RAM, and used a lot of it for a RAM disk, so that "paging"
was far less expensive or noticeable than on the slower machine. A mild
electric shock to remind the programmers using a Sage II might have
encouraged them to take more care.

We have a similar situation today, where most users have less powerful
machines than the developers of the apps they use. I'm not saying that's
a bad thing, but I _am_ concerned about history repeating itself.

-- 
Future generations are relying on us
It's a world we've made - Incubus	
We're living on a knife edge, looking for the ground -- Hawkwind

From: Stephen P. Smith
Subject: eval
Date: Mon, 08 Aug 1994 20:55:01 +0000
Message-ID: <326635$mbf@chnews.intel.com>

In article <··········@cantaloupe.srv.cs.cmu.edu> ···@sef-pmax.slisp.cs.cmu.edu writes:
>     From: ····@ichips.intel.com (Mike Haertel)
>     
>     In the "all the world's linked in" model of software development, you
>     don't have to go to any hassle to use J. Random Feature from the Foo
>     library.  It's already there.  By contrast, if you have to explicitly
>     link the Foo library, there's a hassle factor, and perhaps also a moment
>     of guilt, that makes you think about whether you really need the feature.
>     Thus, although the "VM can replace explicit linker control" mindset is
>     certainly theoretically true, I believe in practice it is doomed
>     to fail because of psychological factors.
> 
> Wow!  You really believe that it is A Good Thing to make it a hassle to use
> pre-existing facilities so that people will have to stop and think about
> whether they really want to pay for each routine?  Maybe it's best not to
> program in a high-level language at all -- we don't want to hide all those
> low-level decisions that just might cost you a few cycles or a few words.
> 
> I prefer to make it as easy as possible for the programmer to use any
> pre-existing facilities.  We should also provide some tools so that people
> who want to obsess about working set can find out how much things are going
> to cost them.
>     

Yes, the first argument above reminds me of my early days in the
70's when we used to reason that our local CDC Cybers were
clearly superior in design and performance because they eschewed
VM altogether.  After all, you really should have to worry about
memory to use it correctly, right?

Do you really want your lusers worrying about whether their
machine happens to have VBRUN666.DLL, etc., on it to load your
latest and greatest software creation?

However, the key ideas in all of this seem to be:

 o A consistent set of well-designed primitives in libraries,
 o Version control of those libraries,
 o A good development environment that allows reasoned choices 
   in utilizing library components.

Of course, the first point is the most important, no matter how
libraries are implemented.  It is an implementation question
(important, nonetheless) if those libraries are linked in
dynamically or pre-linked into the virtual image.

It seems to me that the idea of external libraries better
represents the idea of frameworks of objects interacting to
achieve certain behaviors (pattern matching, file i/o, or
whatever behavior level we deem appropriate to design in).  I
think this is what Mike Haertel was arguing.

External libraries certainly lead to the proliferation of
multiple libraries with overlapping behavior, which is one of
the things that standardized CL helped to reduce (but certainly
not eliminate).  Of course, the standardization process could
as well have happened at the library level.

Actually, I believe that the correct thinking about all of this,
given a pure OO approach, is the ability to call out library
routines purely by some fundamental object address.  If that
address happens to be in core, you get it, if it happens to be
in your VM image, it gets paged in, if it happens to be in a
local .DLL, it gets linked in, if it happens to be in release on
the network, it gets RPCed, etc.  Thus, if you will,
names/addresses of routines replace VM addresses.
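
A minimal sketch of that lookup-by-name idea follows. Everything in it is hypothetical: the Resolver class, the "module:attr" naming scheme, and module_loader are inventions for illustration, with Python's importlib standing in for "a local .DLL gets linked in". A network loader that returns an RPC stub would slot into the same chain.

```python
import importlib


class Resolver:
    """Call routines by name, wherever they currently live (hypothetical design)."""

    def __init__(self):
        self._in_core = {}   # name -> callable that is already loaded
        self._loaders = []   # ordered fallbacks, each mapping name -> callable or None

    def register(self, name, fn):
        self._in_core[name] = fn

    def add_loader(self, loader):
        self._loaders.append(loader)

    def call(self, name, *args, **kwargs):
        fn = self._in_core.get(name)
        if fn is None:
            # Not in core: try each loader in turn and cache the first hit.
            for loader in self._loaders:
                fn = loader(name)
                if fn is not None:
                    self._in_core[name] = fn
                    break
            else:
                raise LookupError(f"no provider for routine {name!r}")
        return fn(*args, **kwargs)


def module_loader(name):
    """Treat 'module:attr' names as routines in a locally installed library."""
    module_name, _, attr = name.partition(":")
    if not attr:
        return None
    try:
        return getattr(importlib.import_module(module_name), attr)
    except (ImportError, AttributeError, ValueError):
        return None


resolver = Resolver()
resolver.register("double", lambda x: 2 * x)   # already "in core"
resolver.add_loader(module_loader)             # loaded on first use

in_core_result = resolver.call("double", 21)
loaded_result = resolver.call("math:sqrt", 16.0)
print(in_core_result, loaded_result)
```

The caller writes the same call whether the routine was already present or had to be fetched. The cost of the fetch is hidden, which is exactly the property being argued over in this thread.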

Dr. Stephen P. Smith
Not speaking for Intel; There is very little Intel Inside this
post.

From: Jeff Dalton
Subject: Re: eval
Date: Fri, 12 Aug 1994 18:16:30 +0000
Message-ID: <CuFpFI.5ou@cogsci.ed.ac.uk>

In article <··········@chnews.intel.com> <······@lox.ch.intel.com> (Stephen P. Smith) writes:
>In article <··········@cantaloupe.srv.cs.cmu.edu> ···@sef-pmax.slisp.cs.cmu.edu writes:
>>     From: ····@ichips.intel.com (Mike Haertel)
>>     
>>     In the "all the world's linked in" model of software development, you
>>     don't have to go to any hassle to use J. Random Feature from the Foo
>>     library.  It's already there.  By contrast, if you have to explicitly
>>     link the Foo library, there's a hassle factor, [...]
>> 
>> Wow!  You really believe that it is A Good Thing to make it a hassle to use
>> pre-existing facilities so that people will have to stop and think about
>> whether they really want to pay for each routine?  Maybe it's best not to
>> program in a high-level language at all -- we don't want to hide all those
>> low-level decisions that just might cost you a few cycles or a few words.
>> 
>> I prefer to make it as easy as possible for the programmer to use any
>> pre-existing facilities.  We should also provide some tools so that people
>> who want to obsess about working set can find out how much things are going
>> to cost them.

What exactly is the hassle?  In Common Lisp, I may have to say
use-package before I can use something even if it's already
linked in.  And I already have to "hassle" to get all of my
own files loaded.  So I'm going to have defsystem and calls to
use-package anyway.  Surely it's possible to have a system
that uses external libraries and requires no more than that.

>External libraries certainly lead to the proliferation of
>multiple libraries with overlapping behavior, which is one of
>the things that standardized CL helped to reduce (but certainly
>not eliminate).  Of course, the standardization process could
>as well have happened at the library level.

Just so.

>Actually, I believe that the correct thinking about all of this,
>given a pure OO approach, is the ability to call out library
>routines purely by some fundamental object address.  If that
>address happens to be in core, you get it, if it happens to be
>in your VM image, it gets paged in, if it happens to be in a
>local .DLL, it gets linked in, if it happens to be in release on
>the network, it gets RPCed, etc.  Thus, if you will,
>names/addresses of routines replace VM addresses.

Sort of like fancy auto-loading...

-- jeff