From: Tom McDougal
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <D5E7Et.7HL@midway.uchicago.edu>
In article <·····················@orca.uchicago.edu> timoshenko, 
········@cs.uchicago.edu writes:
>I understand that allowing EVAL would make tree-shaking impossible,
>(or at least "fruitless" :-) since you would not be able to predict in
>advance which portions of the language would be needed at run time.
>I'm curious what complexities have to be addressed even if you 
>rule out EVAL.

(Hi Tim.)

Not just EVAL, but also FUNCALL (and maybe APPLY).

For example, an evil programmer could write something like:

 (let* ((input (get-some-user-input))            ; arbitrary user input
        (function-name (read-a-fn-name input)))  ; yields some symbol
   (funcall function-name))                      ; could call anything

Any function at all might be invoked.


A less evil programmer could write

  (let ((function-name (choose-function-from-assoc-list)))
    (funcall function-name <args>))

The problem for the tree-shaker in the latter case is realizing
that the assoc list contains the names of functions that might
be called.
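
For instance (all the names here are made up), the function names can
live in a data structure that no call site ever mentions, so a naive
reference analysis sees the handlers as dead code:

  ;; handlers reachable only through the alist below
  (defun handle-add (x y) (+ x y))
  (defun handle-sub (x y) (- x y))

  (defparameter *dispatch-alist*
    '((:add . handle-add)     ; the names are data, invisible
      (:sub . handle-sub)))   ; to static reference analysis

  (defun dispatch (key &rest args)
    (let ((function-name (cdr (assoc key *dispatch-alist*))))
      (apply function-name args)))  ; (dispatch :add 1 2) => 3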

I'm just making this stuff up but it seems reasonable. :-)

-- 
Tom McDougal    University of Chicago Artificial Intelligence
     ···············@cs.uchicago.edu 
     http://cs-www.uchicago.edu/~mcdougal
     PP-RH

From: Dave Yost
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <3k3cke$u6@Yost.com>
In article <·····················@orca.uchicago.edu> timoshenko,
········@cs.uchicago.edu writes:
>I understand that allowing EVAL would make tree-shaking impossible,
>(or at least "fruitless" :-) since you would not be able to predict in
>advance which portions of the language would be needed at run time.
>I'm curious what complexities have to be addressed even if you
>rule out EVAL.

I say why worry about eval, funcall, apply, and
whatever.  If you're doing a tree shake, it's because
you want to let go of a lot of stuff, and you are
claiming that anything you really want to be there at
runtime is referenced somewhere, say by keeping a
global list of the things you want available at runtime
whether they're clearly referenced in your code or not.

I could imagine someone programming a spreadsheet which
would translate user formulas and/or macros into lisp for
eval-uation and compilation to machine code.  The functions and
objects which must be available at runtime to serve the
user could simply be held in a list so they don't get
shaken off.
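
Such a list could be as simple as this (the variable name is made up);
merely holding the function objects is enough to keep them live
through the shake:

  (defparameter *keep-at-runtime*
    ;; everything the translated formulas might call
    (list #'format #'read-from-string #'expt #'reduce))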

Thus, a tree shaker could be free to be very aggressive and
even a little dumb, with some burden on the programmer.

Do any tree shakers work this way?

 Dave Yost
     @    .com
From: Harley Davis
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <DAVIS.95Mar14124022@passy.ilog.fr>
In article <·········@Yost.com> ····@Yost.com (Dave Yost) writes:

   In article <·····················@orca.uchicago.edu> timoshenko,
   ········@cs.uchicago.edu writes:
   >I understand that allowing EVAL would make tree-shaking impossible,
   >(or at least "fruitless" :-) since you would not be able to predict in
   >advance which portions of the language would be needed at run time.
   >I'm curious what complexities have to be addressed even if you
   >rule out EVAL.

   I say why worry about eval, funcall, apply, and
   whatever.  If you're doing a tree shake, it's because
   you want to let go of a lot of stuff, and you are
   claiming that anything you really want to be there at
   runtime is referenced somewhere, say by keeping a
   global list of the things you want available at runtime
   whether they're clearly referenced in your code or not.

   Thus, a tree shaker could be free to be very aggressive and
   even a little dumb, with some burden on the programmer.

   Do any tree shakers work this way?

Well, we don't have a tree-shaker in Ilog Talk, but we do compute the
transitive closure for module dependencies when creating libraries or
executables; this achieves the same goal in a cleaner way. (You build
up rather than tear down.)  The problem with EVAL and SYMBOL-FUNCTION
still comes up.  (There is no problem with FUNCALL and APPLY; they
take objects as arguments and never names.)
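
Computing such a closure is simple; here is an illustrative sketch in
Common Lisp over a dependency alist (not Ilog Talk's actual code, and
the names are invented):

  (defun modules-needed (root deps)
    ;; DEPS is an alist of (module . modules-it-uses).
    ;; Returns ROOT plus everything it reaches, depth-first.
    (let ((seen '()))
      (labels ((visit (m)
                 (unless (member m seen)
                   (push m seen)
                   (mapc #'visit (cdr (assoc m deps))))))
        (visit root))
      seen))

  ;; (modules-needed 'app '((app ui math) (ui base) (math base) (base)))
  ;; => (MATH BASE UI APP)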

We agree with Dave Yost.  We believe that module dependencies,
determined at compile time, should cover an application's entire needs.
The cases where this is insufficient are rare and often a symptom of
an underlying design deficiency.  In the (very) few remaining cases,
the system provides escape hooks to go outside the module system.  For
instance, you can manually import a module into another, ignoring the
static dependency detection done by the module analyzer.

You can get more information about the Talk module system from our
LFP'94 conference paper "Talking about Modules and Delivery",
available from Ilog's web server at <URL:http://www.ilog.com/>.

-- Harley Davis

-- 

------------------------------------------------------------------------------
Harley Davis                            net: ·····@ilog.fr
ILOG S.A.                               tel: +33 1 46 63 66 66
2 Avenue Galliéni, BP 85                fax: +33 1 46 63 15 82
94253 Gentilly Cedex, France            url: http://www.ilog.com/

           Ilog Talk information: ····@ilog.com
From: CHRISTOPHER ELIOT
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <3k6u6u$k42@kernighan.cs.umass.edu>
Wouldn't it be a better approach to start with a tiny kernel and
only load what is needed, instead of loading everything and then
trying to get rid of what you don't need?

I'm probably wrong, but it would seem easy enough for Digitool
to produce a "minimal" version of MCL with just enough stuff around
to be able to do a FASLOAD.  Then they could build the rest
in layers on top of that, with the programmer allowed to
mix and match packages to load. The compiler would not have
to be distributed in this form, only as part of a completely
loaded development environment.

As a guess, the following packages might be big enough to be
worth leaving out, and *might* be separable enough to keep separate.

Fred.
The compiler.
Bignums.
Much of the fancy math stuff (atanh etc.)
CLOS
VIEWS/WINDOWS (so you could build a small headless server.)
All debug support.
Etc.

The objective would be to break up the 1500K base MCL
application into about a dozen major pieces plus a kernel.
Then you might be able to build an MCL app that would be
around 800K-1000K, instead of about a 2000K minimum.

On the other hand, would it really be worth it? You won't
get a 200K application out of MCL no matter what; it just was
not designed for that purpose. (Try XLISP instead, which was.)
And any major program is going to need all of the functionality
that MCL supplies anyhow. After all, the real value of Lisp is
when you are building very large, and very complex programs,
which is, I think, the future of programming.

-Chris Eliot
From: Steven Ritter
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <ojNmH_y00WBNM3t4pU@andrew.cmu.edu>
Excerpts from netnews.comp.lang.lisp.mcl: 15-Mar-95 Re: why is
tree-shaking hard? by CHRISTOPHER ·····@cs.uma 
>  After all, the real value of Lisp is
> when you are building very large, and very complex programs,
> which is, I think, the future of programming.
> 

No, I think the future of programming involves building very large and
very complex user environments which are constructed out of small,
relatively simple components (like OpenDoc parts or whatever OLE calls
them). In Lisp, each part incurs Lisp's overhead. In Dylan, parts can
share the runtime libraries. In the example given earlier, you can have
10 little, hello-world-size Dylan parts taking up less than 2 meg.

It may be possible to build a Lisp that relies on runtime libraries in
this way (and, if so, I hope someone does it) but as of now, Lisp is not
practical in a compound-document world.

Steve
From: Mark Johnson
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <3k7hso$ge9@cat.cis.Brown.EDU>
In article <··················@andrew.cmu.edu>,
Steven Ritter  <·····@andrew.cmu.edu> wrote:
>It may be possible to build a Lisp that relies on runtime libraries in
>this way (and, if so, I hope someone does it) but as of now, Lisp is not
>practical in a compound-document world.

It may interest you to know that Chez Scheme for SVR4 (a Unix system)
uses shared libraries to do exactly this.  Very handy on a multi-processor
platform!

Mark



-- 
Mark Johnson
Cognitive and Linguistic Sciences, Box 1978
Brown University, Providence, RI 02912 
(401) 863 1670, fax (401) 863 2255
From: Scott McLoughlin
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <Ts412c1w165w@sytex.com>
Steven Ritter <·····@andrew.cmu.edu> writes:

> It may be possible to build a Lisp that relies on runtime libraries in
> this way (and, if so, I hope someone does it) but as of now, Lisp is not
> practical in a compound-document world.
> 
> Steve
> 

Howdy,

Has anyone done any research on doing this, i.e., splitting Common Lisp
into a kernel + libraries?  The idea is mentioned/suggested many times
here on this newsgroup. What goes in the kernel? What goes in the
libraries? What are the "bootstrap" issues? Is the design better
facilitated by the addition of more primitives (I'm thinking of
the SEQUENCES rat's nest) specialized to certain data types? How
is debugging "folded" into a system? Etc. etc. etc. etc. etc.

If such work has not been done rigorously, maybe someone can squeeze
a Masters/PhD thesis out of it ;-)  Inquiring minds want to know.

=============================================
Scott McLoughlin
Conscious Computing
=============================================
From: CHRISTOPHER ELIOT
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <3k9k97$eul@kernighan.cs.umass.edu>
In article <············@sytex.com> ····@sytex.com (Scott McLoughlin) writes:
>Steven Ritter <·····@andrew.cmu.edu> writes:
>
>> It may be possible to build a Lisp that relies on runtime libraries in
>> this way (and, if so, I hope someone does it) but as of now, Lisp is not
>> practical in a compound-document world.
>> 
>> Steve
>> 
>
>Howdy,
>
>Has anyone done any research on doing this, e.i., splitting Common Lisp
>into a kernel + libraries.  The idea is mentioned/suggested many times
>here on this newsgroup. What goes in the kernel. 

In the Lisp Machine, runtime type checking was driven down to
the hardware level. Each memory word had extra type bits, and
all data access (except via certain special subprimitives) was
checked in parallel by the hardware, with a trap if the type
was wrong. I think that a standardized data format, runtime type
checking, and garbage collection could be built into the
hardware/operating system level in a language-independent way.

There are many proposals for building future operating systems
in a "modular" object oriented way with good object oriented
user interface support. That might eliminate the need for much
of the MCL view/window/menu mechanism.

Most applications can probably live without FRED and the compiler.
Beyond this, I suspect it gets very hard. I would guess that
most implementations are meta-circular and that a kernel+libraries
implementation would be inefficient.

-Chris Eliot


From: Shriram Krishnamurthi
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <3kdmh9$sph@larry.rice.edu>
·····@cs.umass.edu (CHRISTOPHER ELIOT) writes:

> I think that a standardized data format, runtime type checking and
> garbage collection could be built into the hardware/operating
> system level in a language-independent way.

Except that by doing this, you lose out on a valuable kind of compiler
optimization, namely the elimination of run-time tags through static
analysis.  Such tools exist (e.g., Soft Scheme, which does this for
Scheme programs), and preliminary investigations seem to show that they
can be of some use -- on larger programs, we have noticed a speed-up of
at least about 5%, and we're studying what factors affect these
figures.  Other tools (which conduct flow analysis) might produce
better results; we're looking into it.
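
Soft Scheme infers the types automatically; purely as an illustration
of the same payoff, here is the hand-declared Common Lisp analogue,
where declarations let the compiler drop the tag checks and the
generic-arithmetic dispatch:

  (defun dot3 (ax ay az bx by bz)
    (declare (fixnum ax ay az bx by bz)
             (optimize (speed 3) (safety 0)))
    ;; compiles to raw fixnum multiplies and adds, no tag tests
    ;; (assumes the results stay in fixnum range)
    (the fixnum (+ (* ax bx) (* ay by) (* az bz))))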

'shriram
From: Alex Crain
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <3khhbm$2t4@ashnet.clark.net>
In article <··········@larry.rice.edu>,
Shriram Krishnamurthi <·······@europa.cs.rice.edu> wrote:
>·····@cs.umass.edu (CHRISTOPHER ELIOT) writes:
>
>> I think that a standardized data format, runtime type checking and
>> garbage collection could be built into the hardware/operating
>> system level in a language-independent way.
>
>Except that by doing this, you lose out on a valuable kind of compiler
>optimization that is possible, namely the elimination of run-time tags
>through static analysis.

	Not true. Hardware type checking must take one of two forms:

	(a) a zero-penalty implementation, implemented at the page-
	protection level and triggering a HW interrupt on failure.

	(b) implementation as a hardware instruction, which can be
	scheduled like any other instruction.

	In the first case, run-time tagging need not be eliminated by
	the compiler, since there is no performance penalty anyway.

	In the second case, the system is simply offering to do in
	hardware what was previously done in software, so the old rules
	still apply.

	The problem that I see with hardware checking is with inheritance.
	Object systems change the semantics of (TYPE-OF ...) to
	(CLASS-OF ...) and (SUPERCLASS-OF ...), which in turn requires
	the OS to implement a global class structure. While this is
	not necessarily a bad thing, it's a far trip from the rather
	trivial type-checking idea that started this thread.

-- 
#
#						Alex Crain
#						ASH Engineering Inc.
#						····@ashnet.clark.net
From: Dave Dyer
Subject: Re: reducing lisp programs' size
Date: 
Message-ID: <ddyerD5Jnzr.5E8@netcom.com>
It is true that pound-for-pound, lisp requires more real memory, more real disk,
and more CPU to get the same actual results from a running program.  The trick
is to get a running program.

For my money, efforts to reduce a completed lisp program's size by
treeshaking, or fasloading, or making shared libraries are not worth
the effort in modern virtual memory environments.  

Fasloading is essentially sleight of hand - it merely moves the memory
to someplace you don't notice it so easily.

Treeshaking has real effect, but risks introducing bugs, has little
effect on performance, and marginal effect on size.

Building shared libraries can yield real benefits, *if* there are several
lisp programs running to do the sharing.  Most places, it's a struggle
to get the first lisp program on the run list.

It is wonderful to be able to download trivially small programs with
wonderful functionality, but that isn't Lisp's niche.

-- 
My home page: ftp://ftp.netcom.com/pub/dd/ddyer/home.html
or try http://www.triple-i.com/~ddyer/home.html
From: Chris Reedy
Subject: Re: reducing lisp programs' size
Date: 
Message-ID: <creedy-1603951745060001@clreedy-mac.mitre.org>
In article <···············@netcom.com>, ·····@netcom.com wrote:

> It is true that pound-for-pound, lisp requires more real memory, more
> real disk, and more CPU to get the same actual results from a running
> program.  The trick is to get a running program.
> 
> For my money, efforts to reduce a completed lisp program's size by
> treeshaking, or fasloading, or making shared libraries are not worth
> the effort in modern virtual memory environments.  
> 
> [more comments deleted]

The problem with "modern virtual memory environments" is that they require
real disk space and real I/O capacity to support the virtual memory
mechanism.  This usually isn't a problem for a single user workstation
with reasonably large amounts of real memory and a large disk.  This will
be a problem if you have a system with a number of diskless nodes and/or
nodes with small disks that can't allocate large swap spaces.  And, even
though additional disk space for a single machine may be inexpensive,
additional disk space for a large number of machines (100s to 1000s) can
add up to real money.

  Chris

The above opinions are my own and not MITRE's.
Chris Reedy, Workstation System Engineering Center, Z667
The MITRE Corporation, 7525 Colshire Drive, McLean, VA 22102-3481
Email: ······@mitre.org  Phone: (703) 883-7183  FAX: (703) 883-6991
From: Kelly Murray
Subject: Re: reducing lisp programs' size
Date: 
Message-ID: <3kd5ju$mn9@no-names.nerdc.ufl.edu>
In article <·······················@clreedy-mac.mitre.org>, ······@mitre.org (Chris Reedy) writes:
|> In article <···············@netcom.com>, ·····@netcom.com wrote:
|> 
|> > It is true that pound-for-pound, lisp requires more real memory, more
|> > real disk, and more CPU to get the same actual results from a running
|> > program.  The trick is to get a running program.
|> > 
|> > For my money, efforts to reduce a completed lisp program's size by
|> > treeshaking, or fasloading, or making shared libraries are not worth
|> > the effort in modern virtual memory environments.  
|> > 
|> > [more comments deleted]
|> 
|> The problem with "modern virtual memory environments" is that they require
|> real disk space and real I/O capacity to support the virtual memory
|> mechanism.  This usually isn't a problem for a single user workstation
|> with reasonably large amounts of real memory and a large disk.  This will
|> be a problem if you have a system with a number of diskless nodes and/or
|> nodes with small disks that can't allocate large swap spaces.  And, even
|> though additional disk space for a single machine may be inexpensive,
|> additional disk space for a large number of machines (100s to 1000s) can
|> add up to real money.
|> 

These factors are true, but I think the bigger issue is a little different.
Many commercial environments have a small number of big machines that
service lots of users, who typically interface to the big machine
with X terminals or PC's. In that environment, a program that uses "only 4mb"
of real memory (and wants 20mb of vm) multiplies out quite a bit when you
have 10-100 users on the system running the program: 100 users x 4mb is
400mb of real memory for one little program.

So you can't really write a little calendar program that every secretary 
or data-entry clerk working at XYZ insurance company can all use at once.

An OS shared-library for the code portions seems at least a minimum 
way to deal with this, but I think we can do even better. 

-- 
-Kelly Murray  (···@prl.ufl.edu) <a href="http://www.prl.ufl.edu">
-University of Florida Parallel Research Lab </a> 
From: Rob MacLachlan
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <3k4gat$5sa@cantaloupe.srv.cs.cmu.edu>
I don't know what others have done, but our tentative effort at tree-shaking
(no longer supported) completely ignored the possibility of functions becoming
accessible via INTERN, READ, etc.  Even so, tree-shaking was only marginally
effective (producing applications that were still several megabytes).

The problem was that a lot of CL was implicitly used by the "run-time system".
In particular, unless you totally gave up the possibility of debugging, any
error could land you in the debugger, which used read, print, format, sequence
functions, and any other hairy CL feature you care to think of.  Also, tree
shaking can never eliminate support for things like bignums.

What our "tree-shaker" did was simply destroy the package system, and then do a
GC.  Any symbols remaining were put back in the package they were originally
in, allowing debugging.  It was kind of amusing to note that you could still
evaluate just about any CL expression that randomly came to mind without
calling any of the functions in the two-thirds of the image that was missing.

b.t.w., when you look at the distributed CMU CL image, a whopping fraction is
the compiler and PCL.  By simply not loading some packages and byte-compiling
others, we get our "runtime" image, which is 40% the size of the standard
image.  This has all of CLtL1 except general EVAL and COMPILE, and isn't that
much bigger than what you get from non-aggressive tree-shaking.

You can definitely get better control over size by not loading the stuff you
don't need.  One way that this has been done in Lisp is by "autoloading", but
this isn't a complete solution, since it still requires as much or more disk
space, and may result in worse sharability in multi-user environments.

It's hard to beat the idea of libraries which are explicitly loaded when
needed (though tree shaking may still be useful if only a fraction of a
library is used).

  Rob
From: Barry Margolin
Subject: Re: why is tree-shaking hard?
Date: 
Message-ID: <3k58bs$4r8@tools.near.net>
In article <··········@cantaloupe.srv.cs.cmu.edu> ···@cs.cmu.edu (Rob MacLachlan) writes:
>You can definitely get better control over size by not loading the stuff you
>don't need.  One way that this has been done in Lisp is by "autoloading", but
>this isn't a complete solution, since it still requires as much or more disk
>space, and may result in worse sharability in multi-user environments.

Multics Maclisp used memory mapping (via the dynamic linker) to access the
code sections of compiled files loaded with LOAD.  It seems that a similar
mechanism could be implemented in modern Unix Lisps.
-- 
Barry Margolin
BBN Planet Corporation, Cambridge, MA
······@near.net