From: k p c
Subject: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <1994Jul22.081215.11816@ptolemy-ethernet.arc.nasa.gov>
I posted a query for a fast-loading lisp to replace awk.  A current
application of such a language is to transform Unix ASCII data files
from one format to another; a possible future application is to
program a PDA.  The ideal implementation would be an ANSI Common Lisp
that loads in zero time, runs infinitely fast, uses zero memory, and
has a regexp library.  Compromises include such things as scheme +
bawk.scm.
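For concreteness, here is a hypothetical instance of the awk idiom I
want to replace (the record format and threshold are invented for
illustration):

```shell
# Convert colon-separated records (name:dept:salary) to tab-separated
# output, keeping only records whose third field exceeds a threshold.
printf 'ann:eng:70\nbob:ops:40\n' |
    awk -F: '$3 > 50 { print $1 "\t" $3 }'
```

The lisp replacement would have to start up, do this, and exit in
roughly the time awk does.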

Here are my results so far, first for scheme and then for other lisps.
In each section, I describe some statistically insignificant results
with implementations.  Finally, I end with why I chose lisp for now.

Comments are welcome, via email preferred.

1.  Scheme.

I don't speak scheme and I do like Common Lisp, but I wanted to "let
the data speak for themselves", so I downloaded and tried

	o scheme48, stk, scm/hobbit, vscm, elk, siod, bigloo,
	o pseudoscheme,
	o slib, bawk.scm, and
	o various other packages that promised regexps or CL
	  compatibility.

I also wrote the author of scsh, visited various FAQs and web pages
including a standard page, and skimmed SICP.  I haven't delved into
call/cc, seriously benchmarked, or written real programs in scheme.

What have I learned about scheme?

	o schemers are enthusiastic.

		o I got a big response from schemers, including two
		  implementers and a repository maintainer; none from
		  Common Lispers; one from an ISO Lisper who politely
		  offered to sell me his implementation for $3000; and
		  one from rms about elisp.

	o useful scheme programs are not likely to be portable.

		o bawk.scm looked promising, so I tried it in
		  scheme48.  Mistake!  I then tried it in scm, for
		  which it was written.  Still crashed.  (The bawk
		  author was kind enough to tell me that he is willing
		  to look into why it didn't work in scm, but probably
		  I simply had not compiled scm correctly.)  stk at
		  first didn't like its extension, then crashed when I
		  tried to spoof the extension.  I wish I could, for
		  example, run bawk in scheme48 or stk or scm or elk
		  interchangeably.

		o slib, like bawk, promises to do most of what I need,
		  and promises portability, but it, too, was not
		  portable for a greenhorn.  I tried the distribution
		  slib in scheme48, but access-scheme48 was unbound.
		  (access-scheme-48 was bound: typo?)  I tried the
		  slib that comes with scheme48, but it didn't load.

		  I'm used to this sort of thing in elisp, so I know
		  how it is, and I hardly expect perfection from a
		  changing scheme scene, but I thought that I might be
		  able to use slib as a turnkey package.  This isn't a
		  big disappointment, just a practical impediment.

	o I like many things about scheme, but:

		o Part of why a CL subset or a scheme compatibility
		  library would be more attractive than a spartan
		  scheme is that I would be able to port code more
		  easily among scheme and other lisps.

		o I was hoping that slib would have something like the
		  nice subset of Common Lisp that Dave Gillespie's
		  excellent cl.el Common Lisp compatibility package in
		  elisp provides.  A cltl2 loop subset would be nice,
		  for example.  (I've heard that loop is about the
		  most unschemely thing there is, but I do use it.  de
		  gustibus.)

My impression at this point is that scheme is currently better suited
to metaprogramming, pedagogy, and computer language research than to
very fast, small, practical, portable, high level string- and
OS-oriented tasks.  On c.l.s I see a tension between the small, fast,
practical lisp crowd and the conceptually elegant, pedagogical lisp
crowd.  I sympathize with both perspectives and perhaps a schism is in
order sometime where there is substance.  I'd like to see any lisp
widely used on a PDA, on the web, etc.; what particular lisp it is is
of much less importance, even if I do have preferences.

Scheme implementations:

This is by no means an exhaustive comparison, just quick, preliminary,
possibly wrong impressions, offered in the hope that they tell people
more about implementations they haven't tried than the FAQ does.  I
recommend trying these yourself instead of relying heavily on my
comments, since the implementations are changing (and perhaps my
comments themselves will change them).

	o elk.  People said it would be slow but it's probably faster
	  by now (just guessing).  Installation is pretty clean,
	  candid, and controllable.  Has perhaps a longer history to
	  it than scheme48 and stk?  It claims to have an
	  incremental/generational garbage collector.  Floating point
	  works great; bignums work great too.  No ratios.  Dumping
	  works nicely
	  (strangely, 1.4MB new image with 'record; the original image
	  is 352KB).

	o stk compiled pretty cleanly.  Numbers show up in exponential
	  notation (elk numbers go on forever; maybe this is
	  configurable).  No ratios.  X stuff is widely talked about.

	o scheme48 compiled cleanly, is a complete environment with
	  things like a trace mechanism, and looks wizzy as a result.
	  I was thrown by the way it emailed my address to MIT when I
	  compiled it.  (Nice idea, amusing in its completeness, but I
	  prefer controlling when my userid sends email.)  It's a
	  virtual machine, so it might be comparatively slow, but I
	  have done no benchmarks.  Floating point numbers and bignums
	  are extremely slow if they are long, but they work.  Has
	  ratios, format, and packages.  Used in web agent work, as in
	  sending small programs around and executing them on web
	  servers.

	  Will it load quickly enough to be used in pipelines a lot? A
	  la:

		cat ·@ |
			grep abc |
			myscheme48 -e '(foo)' |
			sort -$weirdoptions |
			myscheme48 -e '(bar)'

	  The scsh description says that pipelines might be a drawback,
	  but since scsh is written in scheme48 and scsh or some
	  derivation of it is likely to gain popularity, there might
	  be pressure on scheme48 to work out a method of fast loading
	  (and faster exact arithmetic?).  And scsh has potential to
	  do useful things (I wonder if perhaps it will have too many
	  features, but that is a minor quibble).

	o scm is reputed to be fast.  Didn't compile cleanly and
	  fully, probably because I am a disliker of single-letter
	  languages and didn't pore over the tweaks.  Everybody uses
	  it.  No compiler?  But hobbit works with it.  A small fee
	  for certain types of redistribution, I think.  Ratio syntax
	  abbreviates division.  There exists a configure script for
	  scm, but I found out about it after compiling it.

	o pseudoscheme didn't load in the version of fcl4.[12] that I
	  tried.  I didn't pursue it since I need a fast-loading lisp
	  and I was only trying it to see if it would be useful in
	  debugging or learning scheme.

	o vscm never finished compiling.  As with scm, I did not
	  pursue it.  Doesn't mean there isn't some tweak I missed
	  that would make it compile cleanly; just means it's not
	  totally turnkey the way elk, stk and scheme48 almost are.

	o bigloo, similarly etc.

	o cscheme (c->scheme?  MIT scheme?).  I didn't try it or them,
	  since I didn't see them in the repository (which doesn't
	  mean they are not there).  Somebody said that MIT scheme is
	  only twice as slow as a single letter language.

I like the first 3 implementations the best since they compiled
cleanly (again, the others might also if the right tweaks are
performed, or on different machines than my Sparc 2 running SunOS).  I
can afford to be this picky because the whole purpose of this exercise
is to get away from syntax-oriented languages and to save time.  On
the other hand, scm was recommended by several people.

2.  Lisp:

WRT lisp, I downloaded WCL and benchmarked:

	o WCL, allegro 4.[12], cmulisp, and emacs 19.24.

Each has its advantages.  I'm still benchmarking them for load time,
fp arithmetic, and string matching speed.  I don't have widely useful,
statistically significant results to share, but preliminary results
are interesting in that

	o cmulisp really can be very fast compiled,
	o some lisps scale well with the number of loop iterations
	  performed while others sometimes do not scale well, and
	o elisp can be surprisingly competitive.

(Of course, elisp differs from Common Lisp.)  I have not yet tried
CLiCC, clisp, or GCL.  Of these, I will probably try clisp first,
since it seems to start quickly.  I hope that at least two free
conforming portable Common Lisps survive and flourish.

3.  Likely direction:

Scheme looks fun, but I'm not after fun as much as portability with my
present lisps, certain features, and arrant, bullheaded, unapologetic
speed.

Eventually I will probably learn scheme, but I have decided that
unless it is substantially faster than EL or CL, I will stick with
lisp for the time being.  While I will relax one desideratum for
another, there's no point in a language that's almost as slow as CL
and not compatible with it.  I can't be implementing the equivalent of
(foo) all the time; I'm beyond the point of being so interested in a
new dialect that I'm willing to port back and forth a lot.

I have not done serious scheme benchmarking because I have not yet
gotten used to tail recursion enough to be certain of the significance
of the results as compared to, e.g., loop.  I would like to see
results comparing scheme to lisp, but I have written nothing that
would do so in a fashion representative of the style I would use in
both.  I don't have time to be messing around with scheme more than I
have unless I have some assurance that it is substantially quicker
than lisp.

It would be wonderful if there were a web page dedicated to
benchmarking results for all lisps, including schemes.

(Note to the confused (or latently inflammable :-)):

I might seem to be pulling in two directions, wanting speed and
wanting some degree of CL compatibility.  I don't think that this is
really a strong dichotomy; not only are some CLs approaching good
speed (e.g. compiled cmulisp), but I am willing to give up most of CL
for this purpose, if I get a fast subset in return.  Features I can do
without in a lisp for this purpose include primitive redefinition,
multiple values, fill pointers, conditions, packages, a complete
reader (except perhaps #+), CLOS, old-fashioned constructs like prog,
and a tracer.  elisp's efficiency does not seem to suffer for cl.el,
for example, and sometimes it is even improved by it (e.g. with its
use of compiler macros or defsubst).)

I hope this helps somebody.  I welcome comments, email preferred.

--
···@ptolemy.arc.nasa.gov.  AI, multidisciplinary neuroethology, info filtering.
The FDA attacks tryptophan, melatonin, etc. to increase their and MDs' monopoly
on permission to buy medicine.  This is power; power corrupts.  Health matters.
1speech 2arms 3quarter 4search 5process 6jury 7jury 8excessive 9people 10people

From: Bryan O'Sullivan
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <CtC4oJ.LsK@dcs.gla.ac.uk>
···@ptolemy.arc.nasa.gov (k p c) writes:

Some notes:

> [Scheme 48 i]s a virtual machine, so it might be comparatively slow,
> but I have done no benchmarks.

I find that it generally runs at about the same speed as (if not a
little faster than) SCM.  Numerical performance is, as you note, a dog
at the moment, but the FP end at least should improve soon.  Loading
source files is a little slow because of the compilation overhead, but
once you've got a fixed image dumped, you're laughing.

> Will it load quickly enough to be used in pipelines a lot?

It takes about 0.2 seconds to fire up and load a 1.2 megabyte heap
image on a moderately loaded SPARC 10, as against 0.1 seconds to get a
less rich environment going in SCM on an unloaded Alpha.  So the
numbers work out approximately the same (excuse the apples-to-oranges
comparison), and compare poorly against about 0.03 seconds to load perl
on both machines.
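(If anyone wants to reproduce numbers like these, the crude approach
is just to time do-nothing startups in a loop; the command below is a
placeholder, not the exact invocation I used:)

```shell
# Time n do-nothing startups of an interpreter; substitute the command
# under test (e.g. "perl -e 1" or a scheme48 image) for "/bin/sh -c :".
n=20
start=$(date +%s)
i=0
while [ "$i" -lt "$n" ]; do
    /bin/sh -c : >/dev/null 2>&1
    i=$((i + 1))
done
end=$(date +%s)
echo "total: $((end - start))s for $n startups"
```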

> scm is reputed to be fast.  Didn't compile cleanly and fully,
> probably because I am a disliker of single-letter languages and didn't
> pore over the tweaks.

There is, as you mention, a configure script which makes the whole
thing go smoothly in plug-and-play manner (across all the platforms
I've tried, at least, including MIPS ``more broken than a very broken
thing'' RISC/os).

SCM is covered by the GPL, so there's no fee *required*.

	<b

[Followups to comp.lang.scheme]
-- 
Bryan O'Sullivan              Will herd cats for food.  3270: my life are yow.
Computing Science Department  Email: ········@maths.tcd.ie,  ···@dcs.gla.ac.uk
University of Glasgow         World Wide Wuh:  http://www.scrg.cs.tcd.ie/~bos/
From: Oliver Laumann
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <30o4s8$oe@news.cs.tu-berlin.de>
Thank you for the interesting survey.


Let me add a few remarks with respect to Elk:

> 	o People said it would be slow but it's probably faster 
> 	  by now (just guessing).

It's still slow when compared to Scheme implementations that employ
some form of byte-code compilation.  (It's faster than Emacs-Lisp,
though.)  I considered tight integration with C/C++ code and with X and
UNIX more important for an extension language than high execution
speed; time-critical functions can always be recoded in C.

>         Installation is pretty clean, candid, and controllable.
>         Has perhaps a longer history to it than scheme48 and stk?

Elk was first published in fall 1989.

> 	  Dumping works nicely (strangely, 1.4MB new image with 'record;
>         the original image is 352KB).

The newly-created executable file must, of course, include the Scheme
heap.  The sizes reported by ls(1) and size(1) are misleading, though--
the `dump' primitive seeks over the unused parts of the heap (i.e. at
least one half) when creating the new a.out.  As a result, what you see
is a sparse a.out file (at least in systems using the BSD-style a.out
format, such as SunOS 4.x).
From: Ken Anderson
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <KANDERSO.94Jul22165241@wheaton.bbn.com>
In article <······················@ptolemy-ethernet.arc.nasa.gov> ···@ptolemy.arc.nasa.gov (k p c) writes:

1.  I appreciate your effort and frustration; don't give up.  Bringing up
every scheme or lisp all at once, especially if you are mixing and
matching libraries, just may not be as easy as we'd like, and perhaps
interoperability is something we should work on.

2.  If you were running a Lisp shell, load time would not be such an issue;
however, since you are probably interacting with other Unix things, you
can't eliminate the load time issue either.  There are a lot of things,
like EMACS and XMOSAIC, that we tend to start only once a day, say.

3.  There is an interesting shell, called es, that you can get from
ftp.white.toronto.edu and that has many scheme-like characteristics, such
as closures.  It does not have scheme-like syntax, however, or Tk, or
other things you might want.  But it is a model of a scheme-like shell
that does not have quoting hell.

k
--
Ken Anderson 
Internet: ·········@bbn.com
BBN ST               Work Phone: 617-873-3160
10 Moulton St.       Home Phone: 617-643-0157
Mail Stop 6/4a              FAX: 617-873-2794
Cambridge MA 02138
USA
From: Stephen J Bevan
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <BEVAN.94Jul23122641@lemur.cs.man.ac.uk>
In article <······················@ptolemy-ethernet.arc.nasa.gov> ···@ptolemy.arc.nasa.gov (k p c) writes:
                   ...
		   o bawk.scm looked promising, so i tried it in
		     scheme48.  Mistake!  I then tried it in scm, for
		     which it was written.  Still crashed.  (The bawk
		     author was kind enough to tell me that he is willing
		     to look into why it didn't work in scm, but probably
		     I simply had not compiled scm correctly.)
                   ...

I think it is more likely that there is a problem with bawk.  I've
never got around to fixing the known problems because a) I didn't
think anyone was using it, since until kpc's email nobody had ever
emailed me about it, and b) I tend to use mawk.
From: ozan s. yigit
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <OZ.94Jul24011256@nexus.yorku.ca>
i am still curious why anyone would want to replace awk with anything
else to do those things awk specialises in. does life get any easier when
awk's loop-over-input-lines-and-pattern-match-those-lines-and-break-those
matching-lines-into-delimited-chunks-and-do-something-with-them
model is subsumed by something more powerful?
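(that model, written out, is just one line of awk; e.g., with a
made-up pattern and input:)

```shell
# awk's implicit main loop: read each input line, test it against
# each pattern, and on a match act on the auto-split fields.
printf 'alpha 1\nbeta 2\nalbum 3\n' |
    awk '/^al/ { print $2 }'
```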

oz
From: Stephen J Bevan
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <BEVAN.94Jul24090559@lemur.cs.man.ac.uk>
In article <················@nexus.yorku.ca> ··@nexus.yorku.ca (ozan s. yigit) writes:
   i am still curious why anyone would want to replace awk with anything
   else to do those things awk specialises in.

I can't speak for others but I can say why I'm using Scheme for some
problems.  I generally use AWK (old habits die hard), but for some
tasks I've found the available data structures in AWK limiting.  In
one program I needed to read in lots of lines of the form :-

  a b c
  d
  a 
  e
  a b
  ...

and generate C code that would match the patterns against some data.
The standard read-a-line+print-result method of processing would
result in code which performed repeated tests (i.e. if the input was
"a", then "a b c" and "d" would be tested before getting to "a").  A
trie seemed a natural way to store the data and get the optimal
pattern match, but I couldn't figure out how to represent a trie in
AWK.  So I used Scheme (mawk to read in the data, avl-trie to store &
walk data).  If anyone can suggest an AWK solution (with or without
using a trie) that produces an optimal match I'd be interested in
seeing it.
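(For concreteness, the nearest thing I could imagine fakes the trie's
node set with composite string keys; that gives membership tests, but
not the ordered walk I needed for code generation, so take it as a
sketch rather than a solution:)

```shell
# Sketch: store each pattern's token sequence, joined with SUBSEP, in
# awk's associative arrays.  "node" holds every prefix on the path;
# "leaf" marks complete patterns.  END probes two lookups.
printf 'a b c\nd\na\ne\na b\n' |
    awk '{
        p = ""
        for (i = 1; i <= NF; i++) {
            p = (p == "") ? $i : p SUBSEP $i
            node[p] = 1
        }
        leaf[p] = 1
    }
    END {
        k = "a" SUBSEP "b"; print (k in leaf)
        k = "b" SUBSEP "c"; print (k in leaf)
    }'
```

This prints 1 then 0 ("a b" was inserted, "b c" was not), but it does
not give the prefix-ordered traversal the generated C needs.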
From: Don Bennett
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <DPB.94Jul25105241@pedernales.sgi.com>
> i am still curious why anyone would want to replace awk with anything
> else to do those things awk specialises in.

You can do those same things in perl, and it comes with a
debugger.

Don
From: Nicolas M Williams
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <1994Jul25.204513.10875@cs.rit.edu>
In article <················@nexus.yorku.ca> ··@nexus.yorku.ca (ozan s. yigit) writes:
>i am still curious why anyone would want to replace awk with anything
>else to do those things awk specialises in. does life get any easier when
>awk's loop-over-input-lines-and-pattern-match-those-lines-and-break-those
>matching-lines-into-delimited-chunks-and-do-something-with-them
>model is subsumed by something more powerful?

Because awk is not always enough and because the traditional
shell/sed/grep/awk/tail/head/uniq/etc... scripts are not very efficient
(which is fine most of the time as well, but *not* always). (Of course,
lisp is often not much more efficient either; IMHO it is good to have
lots of variety, and a good programmer is one who will make the right
decision as to what language tools to use after bearing in mind all the
trade-offs).

Now mind you, I don't think lisp is the answer either; heck, why is perl
so popular? Simple: situations where lisp is better than perl for Unix
text handling problems are not very common at all; if anything Icon
would be the language of choice when perl is not enough. The only
problems I have with Icon are that it has grown to 1/2M for the
executor and does not have lambda expressions, but otherwise it is an
*excellent* language that any lisp programmer should feel comfortable
with, especially when it comes to processing text (very typical in Unix
environments...).

>oz

Nick
From: Brent Benson
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <BRENT.BENSON.94Jul26081801@jade.csd.harris.com>
[I apologize for the lack of Lisp-substance in this article.]

ozan s. yigit writes:
# 
# i am still curious why anyone would want to replace awk with anything
# else to do those things awk specialises in. does life get any easier when
# awk's loop-over-input-lines-and-pattern-match-those-lines-and-break-those
# matching-lines-into-delimited-chunks-and-do-something-with-them
# model is subsumed by something more powerful?
# 

Perl is just such a language.  While I'm a Lisper at heart, I prefer
perl over awk for l-o-i-l-a-p-m-t-l-a-b-t-m-l-i-d-c-a-d-s-w-t
problems.  In contrast to awk, which has undocumented limits on almost
everything, perl has very few built-in limits.  In addition, awk is
very brittle.  One of my colleagues at work had something resembling
the following in his .signature for a few months:

You don't have awk?  Use this simple shell emulation:
#!/bin/sh
#
echo awk: bailing out at source line 1
exit 2

--
Brent Benson                     
Harris Computer Systems
From: ···@mitech.com
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <64.2e365d95@mitech.com>
In article <················@nexus.yorku.ca>, ··@nexus.yorku.ca (ozan s. yigit) writes:
> i am still curious why anyone would want to replace awk with anything
> else to do those things awk specialises in.

Maybe these people don't want to use pipes or direct output to
temporary files, or need to get that extra 20% speed improvement.
I can't understand it either, because we are constantly using awk at 
MITECH to preprocess data into easily LISP-readable format for 
further processing in commercial applications. In fact the "staged"
nature of things is a confidence builder in mission critical situations,
whatever the cost in computer resources.

Although I can honestly say that a "callable awk" would be convenient:
an API into a subroutine library instead of a program, for when you want
to apply a preparsed awk script to a string that is already in memory
and send output to another memory-resident string.
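In the meantime the shell approximation is to push the in-memory
string through a pipe (the script and data here are hypothetical):

```shell
# Poor man's "callable awk": apply a pre-written awk script to a
# string held in a shell variable, capturing the result in another.
input='x 1
y 2'
script='{ sum += $2 } END { print sum }'
result=$(printf '%s\n' "$input" | awk "$script")
echo "$result"
```

which prints 3, at the cost of a fork and two extra copies of the data.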
From: Thomas M. Breuel
Subject: Re: Results of "fast-loading lisp to replace awk" query
Date: 
Message-ID: <TMB.94Jul29165147@arolla.idiap.ch>
In article <················@nexus.yorku.ca> ··@nexus.yorku.ca (ozan s. yigit) writes:
|i am still curious why anyone would want to replace awk with anything
|else to do those things awk specialises in. does life get any easier when
|awk's loop-over-input-lines-and-pattern-match-those-lines-and-break-those
|matching-lines-into-delimited-chunks-and-do-something-with-them
|model is subsumed by something more powerful?

Much as I like "awk" and "perl" for getting work done, when it comes
to manipulating most kinds of non-trivial data structure (union-find,
priority queues, trees, geometric objects, etc.), they are horrible.

For some of the work I do, it would indeed be nice to have a language
with powerful data abstraction facilities _and_ the file scanning
conveniences of "awk" or "perl" available.  I wouldn't mind that
language being a Lisp-like language (I'm often using CommonLisp for
that purpose right now).  But I fear Perl 5 is ultimately going to win
for those applications.

				Thomas.