One Man's Language Comparison for Estimating Codebase Size Reduction

From: David McClain
Subject: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 06:31:26 +0000
Message-ID: <eajP7.420$p97.331175@news.uswest.net>

I just finished my experiment to reduce the size of a fielded application by
recoding it in either Lisp or OCaml. I had early indications that, aside
from pure ease of programming in these HLLs, the overall code base would be
drastically reduced (5x to 6x). That is certainly true if you count all the
source code needed to produce the application, but an honest, impartial
comparison of the lines I actually had to write, of non-reusable,
application-specific code, shows somewhat disappointing results on this basis
alone.

The application is a system network server that performs recursive prefix
mappings of file pathnames, including environment variable substitutions.
This is a variation on the system provided by the Sprite experimental OS
developed at UCB by John Ousterhout et al. in the late 1980s and early
1990s.
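
As a rough illustration of the kind of mapping described above, here is a minimal Python sketch. It assumes that a mapping is a table from path prefix to replacement, that the longest matching prefix wins, that rewriting repeats until no prefix matches, and that variables are written as $NAME. None of these details, nor the names expand_env and map_path, come from the actual server.

```python
import re


def expand_env(path, env):
    """Replace each $NAME with env[NAME]; unknown names are left untouched."""
    return re.sub(r"\$(\w+)", lambda m: env.get(m.group(1), m.group(0)), path)


def map_path(path, prefixes, env, max_depth=16):
    """Rewrite the longest matching prefix repeatedly until none applies.

    Assumption: prefixes are compared as plain strings, so "/proj" would
    also match "/project"; a real mapper would compare path components.
    """
    path = expand_env(path, env)
    for _ in range(max_depth):
        match = max((p for p in prefixes if path.startswith(p)),
                    key=len, default=None)
        if match is None:
            return path
        path = expand_env(prefixes[match] + path[len(match):], env)
    # Guard against mapping tables that keep rewriting forever.
    raise ValueError("prefix mappings did not converge")
```

With the hypothetical table {"/proj": "/data/$USER/proj", "/data": "//server/share"} and USER set to "dm", the path "/proj/a.txt" is rewritten in two steps to "//server/share/dm/proj/a.txt".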

The existing version was coded in M$ VC++ making heavy use of STL. It is a
COM/OLE process server based on M$ ATL. All three versions retain the
machine-generated ATL wrapper code for this COM/OLE behavior -- I only needed to
write a few lines of IDL to produce the basic skeleton, and all three
versions use identical stuff here...

For the application specific coding, the scores are:

Existing App:
C/C++ = 1106 LOC

Lisp Version:
C/C++ = 284 LOC, Lisp = 798 LOC  --> Total = 1082 LOC

OCaml Version:
C/C++ = 284 LOC, Lisp = 58 LOC, OCaml = 453 LOC --> Total = 888

These LOC counts do *not* include blank lines and comment only lines.

On the basis of code-base size reduction, these results are nearly a tie.
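
For reference, the stated totals translate into the reduction factors below. This is only a sketch of the arithmetic, using the totals exactly as quoted above; the variable names are illustrative.

```python
# Totals as stated in the post (application-specific LOC only).
totals = {"C/C++": 1106, "Lisp": 1082, "OCaml": 888}

baseline = totals["C/C++"]
# Reduction factor = existing C/C++ LOC divided by the rewritten total.
factors = {name: round(baseline / loc, 2) for name, loc in totals.items()}

for name, factor in factors.items():
    print(f"{name}: {factor}x")
```

Both rewrites land well short of the 5x to 6x reduction that was anticipated.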

But on the basis of ease of programming, I have to award Lisp first,
followed by OCaml, and distantly trailed by C/C++. The reasons for this are:

1. Lisp is a huge language with nearly everything you need already built in.
But it produces very bulky DLL's -- on the order of 15 MBytes.

2. OCaml is as terse as Lisp, or even slightly terser, but needs a fair
amount of additional support code written to cover the application's
needs. Some of this is in C/C++ (very little), but most of it provides
things like unwind-protect, generalized string handling, and
generalized list operations. It produces very fast runtime code (not needed
here) and quite reasonably sized DLL's -- about 300 KBytes (50x smaller than
Lisp!!)

3. C/C++, making heavy use of classes and STL, is nearly unreadable, took a
long time to program, and is frightening to revisit after some time away
from it (1 year or more since original writing). C/C++ retains the
capability to utilize Unicode (FWIW -- I don't really need it), but it was
written with some embedded bugs that I found only when I was able to remain
at the abstract levels permitted by HOL's.

Both the Lisp and OCaml versions were written in the course of 2-3 hours.
Writing the C/C++ version took the better part of 1 week. Prior to that I
had written experimental versions in Lisp and had more than a year of
playing with the system to get an understanding of the needed algorithms.

I will say that both Lisp and OCaml allowed me to spot some errors in the
C/C++ implementation, fix those errors, and add some extra capability (about
20 LOC in both Lisp and OCaml for the extra stuff).  I estimate the time
needed to go back and refamiliarize myself with STL and the internal
architecture of the existing application -- in order to fix the bugs I
discovered and add the additional capabilities -- would be several days.

I find it remarkable that OCaml has a slight edge on Lisp for terseness of
expression. OCaml has a highly expressive syntax and you can say quite a lot
in a few keystrokes. Lisp tends to be more wordy, use longer identifiers,
and the code is quite a bit sparser for semantic content over a given number
of LOC.

This is as close as I can come to providing an honest, impartial, comparison
of these languages for the purpose of rewriting existing code to be more
maintainable, robust, and correct. I definitely think the effort is
worthwhile, but not entirely for the reasons I had originally anticipated.

Cheers,

- David McClain, Sr. Scientist, Raytheon Systems Co., Tucson, AZ

From: Duane Rettig
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 07:51:36 +0000
Message-ID: <4elmaw1dj.fsf@beta.franz.com>

"David McClain" <······@qwest.net> writes:

> Both the Lisp and OCaml versions were written in the course of 2-3 hours.
> Writing the C/C++ version took the better part of 1 week. Prior to that I
> had written experimental versions in Lisp and had more than a year of
> playing with the system to get an understanding of the needed algorithms.

I'm not actually suggesting that you do this, but to be _completely_ fair
and impartial to C++ you would probably want to rewrite the application
from scratch in C++.  There have been numerous discussions on this and
other language newsgroups regarding rewriting code, and although it is
extremely hard to make real comparisons due to the preferences and
expertise that each developer has in his own language coupled with his
raw programming expertise, one sense that I do get from such discussions
is that complete rewrites tend to go faster (and run faster and smaller)
than original designs.

Of course, I would understand completely if you declined :-)

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 17:32:35 +0000
Message-ID: <1SsP7.731$Lj1.89859@news.uswest.net>

> Both the Lisp and OCaml versions were written in the course of 2-3 hours.
> Writing the C/C++ version took the better part of 1 week. Prior to that I
> had written experimental versions in Lisp and had more than a year of
> playing with the system to get an understanding of the needed algorithms.

One of the reasons I stated this was to show that I already knew, when I
wrote the C/C++ version, where I was headed...

But I think a significant showing here is that despite that knowledge, C/C++
forced me to work at such a low syntactic level that I got lost in those
language details and lost sight of the overall algorithm. I think that
staying in a HOL is *very* important for this very reason!

- DM

From: Duane Rettig
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 17:48:53 +0000
Message-ID: <4g06psgl6.fsf@beta.franz.com>

"David McClain" <······@qwest.net> writes:

> > Both the Lisp and OCaml versions were written in the course of 2-3 hours.
> > Writing the C/C++ version took the better part of 1 week. Prior to that I
> > had written experimental versions in Lisp and had more than a year of
> > playing with the system to get an understanding of the needed algorithms.
> 
> One of the reasons I stated this was to show that I already knew, when I
> wrote the C/C++ version, where I was headed...

Yes, I can see that I misunderstood the timeline.  It looks from re-reading
this paragraph that all three of the versions _were_ rewrites, after having
experimented with the algorithms in Lisp.

> But I think a significant showing here is that despite that knowledge, C/C++
> forced me to work at such a low syntactic level that I got lost in those
> language details and lost sight of the overall algorithm. I think that
> staying in a HOL is *very* important for this very reason!

In light of this and of your methodology, 1 week vs 2-3 hours is quite
an impressive ratio!

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)

From: Kaz Kylheku
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 19:13:18 +0000
Message-ID: <iluP7.25419$nm3.1159358@news1.rdc1.bc.home.com>

In article <···················@news.uswest.net>, David McClain wrote:
>> Both the Lisp and OCaml versions were written in the course of 2-3 hours.
>> Writing the C/C++ version took the better part of 1 week. Prior to that I
>> had written experimental versions in Lisp and had more than a year of
>> playing with the system to get an understanding of the needed algorithms.
>
>One of the reasons I stated this was to show that I already knew, when I
>wrote the C/C++ version, where I was headed...

What is this language ``C/C++''? Did you write this program in C or in C++,
or some mixture?

>But I think a significant showing here is that despite that knowledge, C/C++
>forced me to work at such a low syntactic level that I got lost in those
>language details and lost sight of the overall algorithm.

You lost sight of your algorithm in only 800+ lines of code?

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:07:46 +0000
Message-ID: <NwzP7.274$Ku6.404945@news.uswest.net>

> You lost sight of your algorithm in only 800+ lines of code?

Yes, let me point out that I develop code very quickly, as it is not my main
function. I am a physicist with better than average understanding of
computing, computer languages, etc. But unlike many on this list,
programming is only a side venture for me. I do it as needed to support our
research and data collection activities.

Specifically, what I lost sight of, during the C++ development (the language
is C++, but I state C/C++ because sometimes there are mixtures of the
two...) was the possibility that a request could arrive at the server that
seeks to reparent a hierarchical mapper to either itself or some other
mapper of which it is already a parent. That would have produced an
"improper" list in Lisp parlance and should have been disallowed.

Seeing that kind of possibility was very easy in a HOL, but not so easy when
you are fretting about with STL and C++ syntactic gymnastics... My whole
point here was that the syntax was so distracting that one easily loses
sight of the higher purposes.
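
The check that was missed can be sketched in a few lines of Python. The Mapper class and its methods are hypothetical stand-ins and are not taken from any of the three implementations; the point is only that a reparent request has to be refused when the new parent is the mapper itself or one of its descendants.

```python
class Mapper:
    """Hypothetical stand-in for one node in the mapper hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield this mapper's parent, grandparent, and so on."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def reparent(self, new_parent):
        """Attach self under new_parent, refusing requests that form a cycle."""
        if new_parent is self:
            raise ValueError(f"{self.name} cannot be its own parent")
        # If self appears above new_parent, new_parent is a descendant of self.
        if new_parent is not None and any(
            a is self for a in new_parent.ancestors()
        ):
            raise ValueError(
                f"{new_parent.name} is already a descendant of {self.name}"
            )
        self.parent = new_parent
```

Walking up from the proposed parent costs time proportional to the depth of the hierarchy, which is cheap next to the damage a circular chain would do to every later lookup.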

- DM

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 07:10:36 +0000
Message-ID: <YKjP7.423$p97.343736@news.uswest.net>

....ahem...
I use a fairly open style of C/C++ coding, so if I reject lines containing
only an open or close curly brace, or a close curly followed by a semicolon,
then the counts revise downward to the following:

Existing C/C++ App:  LOC = 838
Lisp Version LOC = 1008
OCaml Version LOC = 814

... making the results even more of a tie overall on the basis of LOC.
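
The counting rule used here can be sketched as follows. This is a simplified Python illustration, not the tool actually used for these numbers: it assumes line comments with a single configurable prefix and does not handle block comments.

```python
import re

# A line holding only "{", "}" or "};" (with optional surrounding whitespace).
BRACE_ONLY = re.compile(r"^\s*(\{|\};?)\s*$")


def count_loc(source, comment_prefix="//", skip_brace_only=True):
    """Count lines that are not blank, not comment-only, and not brace-only."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(comment_prefix):
            continue  # comment-only line (line comments only)
        if skip_brace_only and BRACE_ONLY.match(line):
            continue  # lone brace, or "};"
        count += 1
    return count
```

Passing skip_brace_only=False gives the first style of count, and the default gives the revised one; for Lisp or OCaml sources the comment prefix would have to be changed accordingly.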

- DM

From: Alain Picard
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 11:10:52 +0000
Message-ID: <86zo4xapmr.fsf@gondolin.local.net>

"David McClain" <······@qwest.net> writes:

> then the counts revise downward to the following:
> 
> Existing C/C++ App:  LOC = 838
> Lisp Version LOC = 1008
> OCaml Version LOC = 814

Are there really _still_ people using LOC to compare languages ?!
If so, at the very least, hit us with programs with > 100kLoc 
before making any comparison, if you want them to be meaningful...

FWIW, the code base I'm working on now has 50,000 lines of Lisp,
including all white space, comments, and test harnesses.  It implements
3 separate distributed processes which cooperate together to do some
fairly fancy stuff.  I've written modules of that order of size in C++
which implemented _one_ subsystem of a telco app, which did almost
nothing, compared to my present app.  I'm still completely amazed
at how _small_ our codebase is, wrt how much functionality it provides. 

Let's just say that DEFMACRO, READ, and not having to write your
own parser/lexer/compiler can make for quite a bit of savings.  :-)

But seriously, the only interesting thing about the LOC measurement of a program
is that there is a strong correlation between LOC and number of bugs,
which is why high level languages are nice.

-- 
It would be difficult to construe        Larry Wall, in  article
this as a feature.			 <·····················@netlabs.com>

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 17:06:06 +0000
Message-ID: <dtsP7.540$Lj1.83466@news.uswest.net>

Well, not every program is a mega liner... But if you get enough programs in
your inventory the burden can become quite great...

I did this study in part because I have more than 300 KLOC to tend to for
just one aspect of our work. I find that to be an oppressive amount of code
to protect against bit-rot.

So anything to shrink this codebase would be quite welcomed!!!

- DM

From: Robert Monfera
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:09:10 +0000
Message-ID: <3C0EC50A.20809@fisec.com>

CL has a lot of tools for making significant changes and refactoring 
without breaking everything.  Of course, the problem domain has to be 
sufficiently complex to gain from this.

CL lends itself to generic programming, where you don't implement 
customer requests - you look behind them, search for commonality or 
analogous things in your code and anticipate similar things coming.

The Lisp tools for this are the world's best object system (CLOS+MOP), 
macros, keyword arguments, exception handling, safe runtime typing, 
functional programming and interactivity.  One can learn ideas from the 
AMOP book to help program in a generic way, as if doing aspect-oriented 
programming.

In a module I worked on, I achieved a significant code shrink _and_ easy 
extensibility by separating the following aspects, sets of meta-functions:
- object definitions and declarations
- logic (calculations)
- validations, rollbacks
- caching
- interfacing

Of course, multiple inheritance and multi-arg dispatch help within these 
areas, too.

Not only did the code become much smaller, but incremental changes also have a 
different impact.  In the previous code, the mix-up of the aspects meant 
that time required to add features would have grown almost 
quadratically.  Now sometimes it is more like just enabling some 
constellation of things.

Besides the decrease of the overall code size, the aspect-oriented 
separation keeps the core logic and class hierarchy very compact, 
unburdened from housekeeping duties.

As few clients want solutions for easy problems, the tasks tend to be 
complex, and they stream in continuously.  It's a dynamic world, so 
understanding requirements quickly and seeing beyond is an advantage. 
Therefore I believe that one can be a vastly more successful implementor 
by being or trying to become an _expert_ in the problem domain, besides 
using abstraction tools.  One person with two hats on may be immensely 
more productive than one developer and one domain expert, not the least 
for the ease of co-ordination, communication, resource allocation and 
dependency scheduling within one's head.

Robert

David McClain wrote:

 
> So anything to shrink this codebase would be quite welcomed!!!

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:24:32 +0000
Message-ID: <vMzP7.293$Ku6.416198@news.uswest.net>

Yes, I have been following AMOP now for several years. I am quite intrigued
at its possibilities. I did download the AMOP from Xerox PARC a few years
ago, but I haven't honestly done anything more than to borrow some of its
macrology ideas.

I will look more deeply at it now...

- DM

From: Duane Rettig
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 18:37:13 +0000
Message-ID: <4bshdsecm.fsf@beta.franz.com>

Alain Picard <·······@optushome.com.au> writes:

> "David McClain" <······@qwest.net> writes:
> 
> 
> > then the counts revise downward to the following:
> > 
> > Existing C/C++ App:  LOC = 838
> > Lisp Version LOC = 1008
> > OCaml Version LOC = 814
> 
> Are there really _still_ people using LOC to compare languages ?!
> If so, at the very least, hit us with programs with > 100kLoc 
> before making any comparison, if you want them to be meaningful...

I don't think that is necessary.  The meaning which can be drawn
from LOC measurements depends on what support is given to the numbers,
not the volume.

David, would you be willing to explain these numbers further, either
by assessing your own coding style in each of these three languages,
or better yet by making the code available?

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:00:32 +0000
Message-ID: <0qzP7.270$Ku6.401712@news.uswest.net>

Hi, yes... I will make all the code analysed in this test, in all three
languages, available to anyone desiring a closer look. I welcome your input
on this subject.

Just send me a note asking for the test zip containing these sources...

- DM

From: Kenny Tilton
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 21:45:26 +0000
Message-ID: <3C0E9628.D97DE9EE@nyc.rr.com>

Duane Rettig wrote:
> 
> > "David McClain" <······@qwest.net> writes:
> >
> >
> > > then the counts revise downward to the following:
> > >
> > > Existing C/C++ App:  LOC = 838
> > > Lisp Version LOC = 1008
> > > OCaml Version LOC = 814
> >
> 
> David, would you be willing to explain these numbers further, either
> by assessing your own coding style in each of these three languages,
> or better yet by making the code available?

I was curious about seeing the code, too. But is the Mystery of The
Missing (C++) LOC explained by the declared incomprehensibility of the
code? As someone noted, losing comprehensibility in 800 LOC is
impressive(ly damning). 

How long would the C++ version be if it were comprehensible? I think it
was stated that the low C++ count was achieved thru use of the STL, and
that using the STL is what makes the code hard to follow. So to become
clear the C++ code needs to lose the STL and put on LOC weight--the
Missing LOC.

Mind you, the only thing that matters now is how fast a developer can
deliver how much functionality. LOC (or some attempted measure of
functionality) is necessary only where "how much functionality" varies
between implementations. Here functionality is held constant, so LOC can
be ignored.

kenny
clinisys

From: Samir Sekkat
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 19:31:08 +0000
Message-ID: <MPG.16789343b307a25798968d@news.t-online.de>

In article <··············@gondolin.local.net>, ·······@optushome.com.au 
says...
> "David McClain" <······@qwest.net> writes:
> 
> 
> > then the counts revise downward to the following:
> > 
> > Existing C/C++ App:  LOC = 838
> > Lisp Version LOC = 1008
> > OCaml Version LOC = 814
> 
> Are there really _still_ people using LOC to compare languages ?!
> If so, at the very least, hit us with programs with > 100kLoc 
> before making any comparison, if you want them to be meaningful...
> 
> FWIW, the code base I'm working on now has 50,000 lines of Lisp,
> including all white space, comments, and test harnesses.  It implements
> 3 separate distributed processes which cooperate together to do some
> fairly fancy stuff.  I've written modules of that order of size in C++
> which implemented _one_ subsystem of a telco app, which did almost
> nothing, compared to my present app.  I'm still completely amazed
> at how _small_ our codebase is, wrt how much functionality it provides. 

well said.

Last time a potential investor asked me how large our code base was, 
I answered that it was totally irrelevant, because there are still 
people out there who think that small code size means less functionality.

They just don't know CL :-)

Samir

From: Erik Naggum
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 20:33:49 +0000
Message-ID: <3216573228097060@naggum.net>

* Alain Picard <·······@optushome.com.au>
| Are there really _still_ people using LOC to compare languages ?!
| If so, at the very least, hit us with programs with > 100kLoc 
| before making any comparison, if you want them to be meaningful...

  Well, this may restrict us from seeing some fairly useful measures.  A
  few years ago, I took over a project that had barely worked.  It was
  about 30,000 lines of C code, and amazingly stupid C code at that.  It
  took me three months just to figure out what it was _really_ doing, then
  two months to reimplement the entire system in 2000 lines of Common Lisp
  code.  As functionality grew and bad functionality was taken out, it grew
  and shrunk, but after about two years, it was still less than 3000 lines
  of code and now offered services that they had been wanting for at least
  a decade.  The C system had basically tried to invent multithreading and
  interprocess communication, which Allegro CL offered as the baseline.
  (This was before the Java craze and multithreading actually worked, and
  was one of the first, if not the first, Linux-based Allegro CL systems in
  production use.)  I would like to believe that this is the kind of edge
  that Common Lisp can give to _small_ software systems.

///
-- 
  The past is not more important than the future, despite what your culture
  has taught you.  Your future observations, conclusions, and beliefs are
  more important to you than those in your past ever will be.  The world is
  changing so fast the balance between the past and the future has shifted.

From: Carl Shapiro
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 14:48:26 +0000
Message-ID: <ouyvgflen9h.fsf@panix3.panix.com>

"David McClain" <······@qwest.net> writes:

[...]

> 1. Lisp is a huge language with nearly everything you need already built in.
> But it produces very bulky DLL's -- on the order of 15 MBytes.

Did you deliver your DLL with or without performing some sort of
tree-shaking or delivery optimization of your Lisp image?

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 17:15:59 +0000
Message-ID: <wCsP7.614$Lj1.85921@news.uswest.net>

Yes,... the tree shaker shakes too much from the Lisp at this time so that
the result can't start up at DLL load time. That is what gives me 15 MB
images. When the tree shaker is turned on hard, I have some indication that
the DLL image size could shrink to 3 MB.

- DM

From: Marco Antoniotti
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 17:25:46 +0000
Message-ID: <y6citblk291.fsf@octagon.mrl.nyu.edu>

"David McClain" <······@qwest.net> writes:

> Yes,... the tree shaker shakes too much from the Lisp at this time so that
> the result can't start up at DLL load time. That is what gives me 15 MB
> images. When the tree shaker is turned on hard, I have some indication that
> the DLL image size could shrink to 3 MB.

I think one of the things to take into account is that for sizable
programs, the footprint is probably bound to be comparable anyway
across languages.

I do not have hard numbers to support this (who does?) but it seems
reasonable to assume.

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 17:37:56 +0000
Message-ID: <5XsP7.766$Lj1.91255@news.uswest.net>

Frankly, I don't consider image footprint or runtime speed (within reason)
to be factors worthy of comparison here...

We live in a virtual memory world with large memories, large disk spaces,
and with page-frame demand loading of code. It wouldn't matter how large a
DLL is, because only the working portions need to be paged into memory. Of
course, this presumes some reasonable degree of code and data locality.

But if a program gets too large -- get a bigger disk. If it runs too
slowly -- get a faster computer.

My real quest here was to shrink the burden of code maintenance against
inevitable bit-rot. I didn't exactly achieve that through LOC reduction
here. But I did demonstrate ease of coding, and ease of maintenance, when
the application was coded in a HOL like Lisp.

- DM

From: Wade Humeniuk
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 19:47:13 +0000
Message-ID: <9ultj7$3ud$1@news3.cadvision.com>

Since you are using LW it is probable that you are just not specifying
delivery keywords correctly in the delivered image.  At level 5 you have to
add some keywords that are not obvious.  Perhaps you could post your
delivery file for some analysis?

It is very unlikely that LW is removing some of your actual application or
supporting code.

Wade


"David McClain" <······@qwest.net> wrote in message
························@news.uswest.net...
> Yes,... the tree shaker shakes too much from the Lisp at this time so that
> the result can't start up at DLL load time. That is what gives me 15 MB
> images. When the tree shaker is turned on hard, I have some indication that
> the DLL image size could shrink to 3 MB.
>
> - DM
>
>

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:17:11 +0000
Message-ID: <CFzP7.287$Ku6.411610@news.uswest.net>

Yes, my code is available to all...

I ran into difficulties with the shaker excising portions of the FLI at all
levels above level 0. Level 1 manages to report the problem at DLL load
time, but by the time you get to level 5, there is no error reporting
available. So my test was run with a level 0 delivery.  No doubt I could
futz around with all the keywords and find some combo that would work.

But that has no bearing on whether this test succeeds or not for codebase
shrinkage. In fact, that is why I never even mentioned the particular Lisp
in the original posting. All of the application code is in CL. A few lines
(100?) are devoted to COM/OLE argument unmarshalling.

But I consider a variation of 100 lines to be in the noise... The question
was, does using a HOL significantly (e.g., 5x to 10x) shrink the size of
code to be maintained? I found, in this particular test case,
that it does not.

Does that mean one should just drop the idea of recoding in a HOL?
Absolutely not!  It simply means that I cannot go to my management with a
claim of codebase shrinkage as an important reason for moving to a HOL.
There are many more reasons, as anyone on this list knows, for going ahead
with HOL recoding anyway.

- DM

From: Carl Shapiro
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 19:24:36 +0000
Message-ID: <ouyy9khlbbf.fsf@panix3.panix.com>

"David McClain" <······@qwest.net> writes:

> Yes,... the tree shaker shakes too much from the Lisp at this time so that
> the result can't start up at DLL load time. That is what gives me 15 MB
> images. When the tree shaker is turned on hard, I have some indication that
> the DLL image size could shrink to 3 MB.

I think I am having some trouble following you here.  Are you
suggesting that your tree shaker is excising parts of your Lisp image
which are actually used by your program?

You've mentioned that you don't care too much about the DLL size, but
I am still curious as to how tree shaking has negatively impacted the
fitness of your application.

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:11:32 +0000
Message-ID: <jAzP7.279$Ku6.407893@news.uswest.net>

> I think I am having some trouble following you here.  Are you
> suggesting that your tree shaker is excising parts of your Lisp image
> which are actually used by your program?

What I am suggesting is that the tree shaker appears to have some problems.
It appears to excise portions of the foreign language interface that are
apparently needed at DLL load time, well ahead of any point where I could
possibly intervene and catch the error.

I am not uncomfortable with this particular quirk, as a new version of the
lisp is about to be introduced. I assume, if we decided to go this
direction, that working with the vendor would ameliorate this kind of
problem.

So in total, the tree shaking problems, and the size of the DLL, have no
bearing on my opinion of the applicability of Lisp to my codebase reduction
needs.

- DM

From: Robert Monfera
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:16:30 +0000
Message-ID: <3C0EC6C5.4080108@fisec.com>

Unless the DLL runs in a mobile phone, why is it important?  15MB RAM 
costs exactly one dollar.

Robert

Carl Shapiro wrote:

> "David McClain" <······@qwest.net> writes:
> 
> [...]
> 
> 
>>1. Lisp is a huge language with nearly everything you need already built in.
>>But it produces very bulky DLL's -- on the order of 15 MBytes.
>>
> 
> Did you deliver your DLL with or without performing some sort of
> tree-shaking or delivery optimization of your Lisp image?
>

From: ········@acm.org
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:57:55 +0000
Message-ID: <DgAP7.22544$iF3.2085992@news20.bellglobal.com>

Robert Monfera <·······@fisec.com> writes:
> Unless the DLL runs in a mobile phone, why is it important?  15MB
> RAM costs exactly one dollar.

.. Unless it's RAM for my firewall machine, in which case 15MB of RAM
costs, um, about $53.  (IBM IntelliStation, of Pentium Pro vintage...)

And I'd have to go with an estimate of 
   (float (* 15 (/ 45 256)))
or about $2.64.

Off by a little over 1 binary order of magnitude, though I _certainly_
agree with the base point, which is that if 256MB of RAM costs under
$50, many performance problems may be easily solved by applying $50 to
them...
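
The same estimate as a small Python sketch, assuming the flat price of $45 per 256 MB implied by the Lisp expression above (the function name and defaults are made up for illustration):

```python
import math

def ram_cost(megabytes, price_usd=45.0, per_megabytes=256.0):
    """Cost of `megabytes` of RAM at a flat price per block of memory."""
    return megabytes * price_usd / per_megabytes

cost = ram_cost(15)
print(f"15 MB costs about ${cost:.2f}")          # 15 MB costs about $2.64
# How far off is "exactly one dollar", in powers of two?
print(f"log2(cost / $1) = {math.log2(cost):.2f}")
```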
-- 
(concatenate 'string "aa454" ·@freenet.carleton.ca")
http://www.cbbrowne.com/info/finances.html
"Anyway I know how to not be bothered by consing on the fly."
-- Dave Moon

From: Kaz Kylheku
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 03:19:14 +0000
Message-ID: <SsBP7.27528$nm3.1239391@news1.rdc1.bc.home.com>

In article <················@fisec.com>, Robert Monfera wrote:
>Unless the DLL runs in a mobile phone, why is it important?  15MB RAM 
>costs exactly one dollar.

Awesome! I have $1000 to burn, let's put 15,000 megs of RAM into
my computer!  :)

From: Will Deakin
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 12:21:07 +0000
Message-ID: <3C0F62B3.9010603@hotmail.com>

Kaz Kylheku wrote:

> Awesome! I have $1000 to burn, let's put 15,000 megs of RAM into
> my computer!  :)

Since this seems a tad excessive, could you give me $500 and then 
get 7500M of RAM...

:)w

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 18:18:18 +0000
Message-ID: <UwtP7.822$Lj1.101600@news.uswest.net>

For the record...

I should point out that these tests were performed in LWW 4.1.20.

As a Lisp "guru" coworker has properly pointed out, the shrinkage would be
far more remarkable had the Lisp provided a COM/OLE interface. In that case
the entire C/C++ wrapper could be elided, and in fact, all the machine
generated C/C++ wrappers for M$ ATL could be discarded.

Then the total code base size of the Lisp app would be on the order of 1
KLOC, as compared to an overall size of about 3500 LOC for the ATL C/C++
version.

I understand that both ALC 6.1 and Corman Lisp provide COM/OLE capabilities.
I hear rumors (?) that LWW 4.2 will too.

- DM

From: Marco Antoniotti
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 19:14:44 +0000
Message-ID: <y6c7ks1eaxn.fsf@octagon.mrl.nyu.edu>

"David McClain" <······@qwest.net> writes:

> For the record...
> 
> I should point out that these tests were performed in LWW 4.1.20.
> 
> As a Lisp "guru" coworker has properly pointed out, the shrinkage would be
> far more remarkable had the Lisp provided a COM/OLE interface. In that case
> the entire C/C++ wrapper could be elided, and in fact, all the machine
> generated C/C++ wrappers for M$ ATL could be discarded.
> 
> Then the total code base size of the Lisp app would be on the order of 1
> KLOC, as compared to an overall size of about 3500 LOC for the ATL C/C++
> version.
> 
> I understand that both ALC 6.1 and Corman Lisp provide COM/OLE capabilities.
> I hear rumors (?) that LWW 4.2 will too.

Time for a standardization effort?

Cheers

-- 
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group        tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                 fax  +1 - 212 - 995 4122
New York, NY 10003, USA                 http://bioinformatics.cat.nyu.edu
                    "Hello New York! We'll do what we can!"
                           Bill Murray in `Ghostbusters'.

From: Kaz Kylheku
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 19:16:29 +0000
Message-ID: <houP7.25421$nm3.1159358@news1.rdc1.bc.home.com>

In article <····················@news.uswest.net>, David McClain wrote:
>For the record...
>
>I should point out that these tests were performed in LWW 4.1.20.
>
>As a Lisp "guru" coworker has properly pointed out, the shrinkage would be
>far more remarkable had the Lisp provided a COM/OLE interface.

COM and OLE are not part of C++, so C++ cannot be said to provide these
interfaces either.

>In that case
>the entire C/C++ wrapper could be elided, and in fact, all the machine
>generated C/C++ wrappers for M$ ATL could be discarded.

If you are using COM, ATL and all that stuff, you are not programming
in standard C++, but in a bastardized platform-specific dialect involving
a whole lot of proprietary component technology.

If you don't reveal this upfront, then your comparison results are misleading.

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:22:07 +0000
Message-ID: <eKzP7.291$Ku6.414602@news.uswest.net>

I believe I did state in the original post that I used M$ VC++ and its ATL
frameworks.

...from the original posting...
---------------------------
The existing version was coded in M$ VC++ making heavy use of STL. It is a
COM/OLE process server based on M$ ATL. All three versions retain a machine
generated ATL wrapper code for this COM/OLE behavior -- I only needed to
write a few lines of IDL to produce the basic skeleton, and all three
versions use identical stuff here...
-----------------------------

Look, nobody is trying to give any language a "black eye" here. I simply
tried an experiment to verify some initial indications of drastic codebase
size reductions when moving to Lisp. I did not find that to be the case in
this particular instance.

For the record, I happen to like all of C, C++, Lisp, OCaml, and many more.
I have enormous respect for compiler and language writers. I have been a
compiler writer several times now myself. I know the blood, sweat, and
tears, that go into producing a language system for computing. I don't think
anyone sets out to develop a "bad" language. As it happens, sometimes, some
ideas are not very good, and others are extremely good. I have yet to find
one language that manages to hit the mark on every possible comparison
point. I don't believe that is even possible.  But nearly every language
invented has some special strengths -- if nothing more than to teach what
not to do the next time around.

- DM

From: Sashank Varma
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:24:17 +0000
Message-ID: <sashank.varma-0512011924170001@129.59.212.53>

In article <····················@news.uswest.net>, "David McClain"
<······@qwest.net> wrote:

>Look, nobody is trying to give any language a "black eye" here. I simply
>tried an experiment to verify some initial indications of drastic codebase
>size reductions when moving to Lisp. I did not find that to be the case in
>this particular instance.

I've been reading along and find your experiences/experiment
interesting.

From: ········@acm.org
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 01:58:30 +0000
Message-ID: <ahAP7.15205$BS1.2094681@news20.bellglobal.com>

"David McClain" <······@qwest.net> writes:
> Look, nobody is trying to give any language a "black eye" here. I
> simply tried an experiment to verify some initial indications of
> drastic codebase size reductions when moving to Lisp. I did not find
> that to be the case in this particular instance.

Hopefully few will jump to a "knee-jerk response" of assuming that
your knee is heading towards somebody's eye...

Your observations have been pretty even-handed, and it's clear enough
that you're not simply trying to bash any of the languages.  

It might be that there is some "silver bullet of expressivity" that
you could have pulled out for one or another of the languages that
would have led to _massive_ reductions in program size; absent
evidence, it's not fair to assume that to be the case.
-- 
(reverse (concatenate 'string ·············@" "sirhc"))
http://www.cbbrowne.com/info/unix.html
Rules  of  the  Evil  Overlord  #89.  "After  I  capture  the  hero's
superweapon, I  will not immediately  disband my legions and  relax my
guard because I believe whoever holds the weapon is unstoppable. After
all,   the  hero  held   the  weapon   and  I   took  it   from  him."
<http://www.eviloverlord.com/>

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 02:23:46 +0000
Message-ID: <3EAP7.308$Ku6.459156@news.uswest.net>

> It might be that there is some "silver bullet of expressivity" that

Yes, please, anyone! If there is such a "silver bullet" let me know!!!

Cheers,

- DM

From: Tim Bradshaw
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Tue, 11 Dec 2001 14:48:09 +0000
Message-ID: <fbc0f5d1.0112110648.7bc9459f@posting.google.com>

"David McClain" <······@qwest.net> wrote in message news:<····················@news.uswest.net>...
> 
> Then the total code base size of the Lisp app would be on the order of 1
> KLOC, as compared to an overall size of about 3500 LOC for the ATL C/C++
> version.

One thing that occurs to me is that all these numbers are probably too
close to zero to be really useful: you may be just measuring small
fluctuations which don't really affect what a big program will look
like.

Here's an example I have.  I'm campaigning to write a program in CL
which would otherwise be implemented in some combination of python or
perl.  There are a couple of things that it needs to do which can be
done very concisely in perl or (I suppose) python.  Unfortunately one
of these (do something like popen) is also not available in portable
CL (this is not a suggestion that CL should have popen), although it
is available in practice in all implementations I care about.

So I have some utility code, some of which is portable CL and some
(not much) of which is implementation-specific code.  I guess there's
currently 200 lines of this stuff, and maybe there will eventually be
a little more. There will definitely be more if I have to run in more
than one implementation, so I am maybe losing to C++ here, if it is
actually possible to write compiler/OS independent C++ code, which it
may be, if you are very careful.  I'm *not* losing to perl or python
since I don't have the option of another implementation there.

Anyway, back to the point: my null application is now 200 lines bigger
than the null application in perl or python.   Even if Lisp halves my
LOC I have to write over 400 lines of perl/python before my CL
application is winning in LOC.  This is kind of close to your
1,000-line ballpark.  But my program is probably going to be 10,000
lines, not 1,000, and at that point my zero-point cost is really in
the noise.
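
A rough sketch of that break-even arithmetic in Python, assuming a fixed 200-line utility overhead and a 2x LOC reduction as in the example above (the function names are hypothetical, and the 10,000 figure is read here as the size of the script version):

```python
def cl_lines(script_lines, overhead=200, reduction=2.0):
    """Estimated CL size: fixed utility code plus the reduced application code."""
    return overhead + script_lines / reduction

def break_even(overhead=200, reduction=2.0):
    """Script size n at which overhead + n / reduction == n."""
    return overhead * reduction / (reduction - 1)

print(break_even())        # 400.0 lines of perl/python before CL pulls ahead
print(cl_lines(1_000))     # 700.0
print(cl_lines(10_000))    # 5200.0
# Share of the CL total taken up by the fixed overhead at that size:
print(f"{200 / cl_lines(10_000):.1%}")   # 3.8%
```

At 1,000 lines the fixed overhead is a large share of the CL total; at 10,000 it is a few percent, which is the sense in which it ends up in the noise.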

Now, I claim anyway that LOC comparisons are largely not interesting,
but even if they are interesting then they may only be interesting for
really substantial programs.

--tim

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Tue, 11 Dec 2001 23:39:44 +0000
Message-ID: <4OwR7.159$a47.156906@news.uswest.net>

Yes, I am beginning to feel this way myself. I still believe there is much
to be gained in going to CL for this kind of work. But for little service
modules like this DLL I used for my test, it is difficult to describe its
operation, in any language, using fewer lines of code than we saw in that
test.

> Now, I claim anyway that LOC comparisons are largely not interesting,
> but even if they are interesting then they may only be interesting for
> really substantial programs.

I would say LOC comparisons are interesting to me only from the perspective
of how much source I must wade through to get an understanding of its
operation, and how complex a "minor" change might be. Other than that, I
tend to agree wholeheartedly.

BTW, good luck in prevailing with CL! I am not a perl nor python programmer,
though I have played with python for at least a few hours. It appears that
both the perl and python worlds are in a state of constant flux at this
time. cf, ···········@ai.mit.edu

- DM

From: Christopher Stacy
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 11:30:10 +0000
Message-ID: <ud71tc3b1.fsf@spacy.Boston.MA.US>

I'm not a great believer in LOC comparisons in general,
but if I were doing a cross-language comparison I would
think carefully about whether to count bare structural
punctuation lines (such as "}"s in C).   The indentation
and line-break decisions in Lisp are analogous to that;
both languages have other issues with how much to cram
on one line.   Also, as you noted, Lisp programs tend
to use longer identifiers, which those programmers
consider to be a positive feature, even though it affects
the indentation decisions and thereby increases the LOC count.

As I am sure you know, LOC statistics is a very debatable metric in
general.  I can't think of any time I've ever worried about how many
lines of code a Lisp program is, but I get more worried about large
programs in most other languages because I know they are almost
guaranteed to be harder to maintain, improve, or rewrite.
That is, to compute the value of a specific property in the
comparisons, I multiply the LOC by other terms whose coefficients 
are based on the selected language.

From: David McClain
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 17:12:38 +0000
Message-ID: <kzsP7.588$Lj1.84907@news.uswest.net>

Yep... I also recounted the Lisp discarding lines containing only closing
parens... And it shrinks the Lisp counts too. But so what, these counts are
so close as to be a tie.
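
For anyone who wants to repeat that kind of recount, here is a minimal Python sketch. It is not the counter used for the numbers above; it assumes comment-only lines start with a single prefix (";" for Lisp, "//" for C++) and ignores block comments, and the function name and sample text are invented for illustration:

```python
def count_loc(source, comment_prefix=";", skip_closers=False):
    """Count lines that are neither blank nor comment-only.

    With skip_closers=True, lines holding only closing delimiters
    such as ")" or "}" are discarded as well.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefix):
            continue  # blank or comment-only line
        if skip_closers and all(ch in ")]}" or ch.isspace() for ch in stripped):
            continue  # structural punctuation only
        count += 1
    return count

sample = """\
;; add two numbers
(defun add (a b)
  (+ a b)
)

(print (add 1 2))
"""

print(count_loc(sample))                     # 4
print(count_loc(sample, skip_closers=True))  # 3
```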

That C/C++ achieved these low LOC counts is due entirely to the cryptic and
arcane use of massive language overloading (ie., unreadable code) and
intensive use of STL (ie., unreadable code). So while C/C++ *can* show
impressively low LOC counts on the same order as Lisp or OCaml, once it
does, you have to realize that it was done in a write-only language.

I *really* wanted to drop the LOC on my back!! I wasn't able to do that with
this one test case. But is there still reason to recode in Lisp?
ABSOLUTELY!!!

- DM

From: Alain Picard
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 10:54:29 +0000
Message-ID: <86vgfkaaai.fsf@gondolin.local.net>

"David McClain" <······@qwest.net> writes:

> Yep... I also recounted the Lisp discarding lines containing only closing
> parens... And it shrinks the Lisp counts too. 

WHAT?!  You have lines containing _only_ closing parens in lisp? 

Heresy! Burn him!     

:-)

                        Alain "))))))))" Picard

-- 
It would be difficult to construe        Larry Wall, in  article
this as a feature.			 <·····················@netlabs.com>

From: Marc Battyani
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 11:54:11 +0000
Message-ID: <BC509B6316F74CC6.88B528E779C0ED4F.1757712AC26FCE91@lp.airnews.net>

"David McClain" <······@qwest.net> wrote
> I just finished my experiment to reduce the size of a fielded application by
> recoding in either of Lisp or OCaml. I had early indications that, aside
> from pure ease of programming in these HLL's, the overall code base would be
> drastically reduced (5x to 6x). That is certainly true if you count all the
> source code needed to produce the application, but an honest, impartial,
> comparison of the lines I actually had to write, of non-reusable,
> application specific code shows somewhat disappointing results on this basis
> alone.
>
> The application is a system network server that performs recursive prefix
> mappings of file pathnames, including environment variable substitutions.
> This is a variation on the system provided by the Sprite experimental OS
> developed at UCB by John Ousterhout, et. al. in the late 1980's and early
> 1990's.
>
> The existing version was coded in M$ VC++ making heavy use of STL. It is a
> COM/OLE process server based on M$ ATL. All three versions retain a machine
> generated ATL wrapper code for this COM/OLE behavior -- I only needed to
> write a few lines of IDL to produce the basic skeleton, and all three
> versions use identical stuff here...
>
> For the application specific coding, the scores are:
>
> Existing App:
> C/C++ = 1106 LOC
>
> Lisp Version:
> C/C++ = 284 LOC, Lisp = 798 LOC  --> Total = 1082 LOC
>
> OCaml Version:
> C/C++ = 284 LOC, Lisp = 58 LOC, OCaml = 453 LOC --> Total = 888
>
> These LOC counts do *not* include blank lines and comment only lines.
>
> On the basis of code-base size reduction, these results are nearly a tie.
>
> But on the basis of ease of programming, I have to award Lisp first,
> followed by OCaml, and distantly trailed by C/C++. The reasons for this are:
>
> 1. Lisp is a huge language with nearly everything you need already built in.
> But it produces very bulky DLL's -- on the order of 15 MBytes.
>
> 2. OCaml is equally terse as Lisp, or even slightly better, but needs a fair
> amount of additional support routines written, to cover the application
> needs. Some of this is in C/C++ (very little) but most has to do with
> providing things like unwind-protect, generalized string handling,
> generalized list operations. It produces very fast runtime code (not needed
> here) and quite reasonably sized DLL's -- about 300 KBytes (50x smaller than
> Lisp!!)
>
> 3. C/C++, making heavy use of classes and STL is nearly unreadable, took a
> long time to program, and is frightening to revisit after some time away
> from it (1 year or more since original writing). C/C++ retains the
> capability to utilize Unicode (FWIW -- I don't really need it), but it was
> written with some embedded bugs that I found only when I was able to remain
> at the abstract levels permitted by HOL's.
>
> Both the Lisp and OCaml versions were written in the course of 2-3 hours.
> Writing the C/C++ version took the better part of 1 week. Prior to that I
> had written experimental versions in Lisp and had more than a year of
> playing with the system to get an understanding of the needed algorithms.
>
> I will say that both Lisp and OCaml allowed me to spot some errors in the
> C/C++ implementation, fix those errors, and add some extra capability (about
> 20 LOC in both Lisp and OCaml for the extra stuff).  I estimate the time
> needed to go back and refamiliarize myself with STL and the internal
> architecture of the existing application -- in order to fix the bugs I
> discovered and add the additional capabilities -- would be several days.
>
> I find it remarkable that OCaml has a slight edge on Lisp for terseness of
> expression. OCaml is a highly expressive syntax and you can say quite a lot
> in a few keystrokes. Lisp tends to be more wordy, use longer identifiers,
> and the code is quite a bit sparser for semantic content over a given number
> of LOC.
>
> This is as close as I can come to providing an honest, impartial, comparison
> of these languages for the purpose of rewriting existing code to be more
> maintainable, robust, and correct. I definitely think the effort is
> worthwhile, but not entirely for the reasons I had originally anticipated.

I think the LOC ratio between Lisp and C++/Java is highly dependent on
several factors, such as:

Code size: Very small applications like "Hello world" will have a ratio of
1. The LOC count for small applications is quite noisy: if you need a
function that exists in one language but not in the other, you have to add
a few tens of LOC to provide it. Code size also grows more slowly in Lisp as
the problem size grows.

Complexity: The more complex the problem, the higher the ratio. If
your problem only uses iteration and string manipulation then you won't have
the same advantage as if you use Lisp power points like macros, closures,
after/before/around methods, multiple arg dispatch, MOP, etc...

People writing the program: In this case it's the same person, but
statistically people using languages like Lisp write better programs than
C++/Java/VB guys. ;-) (No flames please, I said statistically....)

I generally have a 5 to 10 LOC reduction factor for medium size programs (10K
to 100K C++ LOC).
Recently I rewrote a "web service"  from Java using tomcat to Lisp using
mod_lisp. The LOC reduction was from 9000+ to 805! OK this case is a little
bit extreme but it's a real life one.... (BTW the Lisp web service is
faster, more reliable and has better error handling and logging.)

Anyway as you have noted, LOC is only one factor. Development time,
reliability, maintainability, ease of modification, and performance are more
important ones though more difficult to evaluate.

Marc

From: Duane Rettig
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 16:58:26 +0000
Message-ID: <4k7w1six9.fsf@beta.franz.com>

"Marc Battyani" <·············@fractalconcept.com> writes:

> I generally have a 5 to 10 LOC reduction factor for medium size programs (10K
> to 100K C++ LOC).
> Recently I rewrote a "web service"  from Java using tomcat to Lisp using
> mod_lisp. The LOC reduction was from 9000+ to 805! OK this case is a little
> bit extreme but it's a real life one.... (BTW the Lisp web service is
> faster, more reliable and has better error handling and logging.)

If this was a one-directional rewrite, I would tend to discount the
ratio somewhat due to the tendency for code redesigns/rewrites to
always result in reduction in size, an increase in speed and reliability,
and higher readability, whether there is a language change or not.
The improvements due to Lisp are not known, because the natural
rewrite improvements have not yet been factored out.

If you were then to rewrite your Java program (i.e. at a time after you
had rewritten the program in Lisp) and you measured LOC and reliability,
then I would tend to believe your numbers more, whether the Java LOC number
were 1000, 2000, or 8000.

(As I told David McClain, I'm not actually asking you to do this, but
I'm making a point about how language comparisons might be improved
in general).

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)

From: Thomas F. Burdick
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 21:46:29 +0000
Message-ID: <xcvher59w7e.fsf@apocalypse.OCF.Berkeley.EDU>

Duane Rettig <·····@franz.com> writes:

> If this was a one-directional rewrite, I would tend to discount the
> ratio somewhat due to the tendency for code redesigns/rewrites to
> always result in reduction in size, an increase in speed and reliability,
> and higher readability, whether there is a language change or not.
> The improvements due to Lisp are not known, because the natural
> rewrite improvements have not yet been factored out.
> 
> If you were then to rewrite your Java program (i.e. at a time after you
> had rewritten the program in Lisp) and you measured LOC and reliability,
> then I would tend to believe your numbers more, whether the Java LOC number
> were 1000, 2000, or 8000.

There's always the flip side: being forced to rewrite a Lisp system in
C++, and having a LOC explosion.  This happened to me once, and it
sucked.  Plus, the C++ was slower and had a couple weird bugs I
couldn't reliably reproduce / figure out (almost certainly on account
of it including an ad-hoc, buggy implementation of some parts of CL
I'd made heavy use of :)

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Raymond Toy
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Thu, 06 Dec 2001 16:52:07 +0000
Message-ID: <4nzo4w1ebs.fsf@rtp.ericsson.se>

>>>>> "Thomas" == Thomas F Burdick <···@apocalypse.OCF.Berkeley.EDU> writes:

    Thomas> Duane Rettig <·····@franz.com> writes:
    >> If this was a one-directional rewrite, I would tend to discount the
    >> ratio somewhat due to the tendency for code redesigns/rewrites to
    >> always result in reduction in size, an increase in speed and reliability,
    >> and higher readability, whether there is a language change or not.
    >> The improvements due to Lisp are not known, because the natural
    >> rewrite improvements have not yet been factored out.
    >> 
    >> If you were then to rewrite your Java program (i.e. at a time after you
    >> had rewritten the program in Lisp) and you measured LOC and reliability,
    >> then I would tend to believe your numbers more, whether the Java LOC number
    >> were 1000, 2000, or 8000.

    Thomas> There's always the flip side: being forced to rewrite a Lisp system in
    Thomas> C++, and having a LOC explosion.  This happened to me once, and it
    Thomas> sucked.  Plus, the C++ was slower and had a couple weird bugs I
    Thomas> couldn't reliably reproduce / figure out (almost certainly on account
    Thomas> of it including an ad-hoc, buggy implementation of some parts of CL
    Thomas> I'd made heavy use of :)

I am (was?) in a similar situation.  I have lots of Lisp code
implementing the physical layer of a system, and I figured it would
take at least a month to redo the thing in C/C++ and re-verify
everything yet again.

I'm pretty sure it will take fewer lines of C (the stuff being lots of
simple bit manipulations), but it will also be vastly less extensible;
extensibility is not needed now, but it was very useful back then.

Fortunately, I convinced my colleagues that this is a waste of time,
and hooked up a socket interface between my Lisp system and the other
system that wanted to use it.  Works great, and the Lisp side isn't
even the bottleneck, which I thought it might be because I wasn't very
careful about speed.

I might eventually be forced to change it, but maybe by then I'll have
the entire system in Lisp so redoing it would be out of the question.
I hope.

Ray

From: Marc Battyani
Subject: Re: One Man's Language Comparison for Estimating Codebase Size Reduction
Date: Wed, 05 Dec 2001 17:40:06 +0000
Message-ID: <EB909539B78AC28D.36F30DB8B9387932.B2E015050603114B@lp.airnews.net>

"Duane Rettig" <·····@franz.com> wrote
> "Marc Battyani" <·············@fractalconcept.com> writes:
>
> > I generally have a 5 to 10 LOC reduction factor for medium size programs (10K
> > to 100K C++ LOC).
> > Recently I rewrote a "web service"  from Java using tomcat to Lisp using
> > mod_lisp. The LOC reduction was from 9000+ to 805! OK this case is a little
> > bit extreme but it's a real life one.... (BTW the Lisp web service is
> > faster, more reliable and has better error handling and logging.)
>
> If this was a one-directional rewrite, I would tend to discount the
> ratio somewhat due to the tendency for code redesigns/rewrites to
> always result in reduction in size, an increase in speed and reliability,
> and higher readability, whether there is a language change or not.
> The improvements due to Lisp are not known, because the natural
> rewrite improvements have not yet been factored out.

In fact my terminology was wrong. I wrote the web service in Lisp from the
specs. I did not rewrite it from the Java sources, and I'm not the Java
programmer (and not a Java programmer...).

> If you were then to rewrite your Java program (i.e. at a time after you
> had rewritten the program in Lisp) and you measured LOC and reliability,
> then I would tend to believe your numbers more, whether the Java LOC number
> were 1000, 2000, or 8000.
>
> (As I told David McClain, I'm not actually asking you to do this, but
> I'm making a point about how language comparisons might be improved
> in general).

Yes please don't! ;-)

BTW I'm downloading Tomcat and the IBM Java environment to do some Java/Lisp
web service benchmarks next week. I would be interested to know if anybody
has already done such benchmarks, and which HTTP test application to use.

Marc