From: Paolo Amoroso
Subject: Sexp-based, text-generating little languages
Date: 
Message-ID: <KEnAPDrU1E8pHuxhiayZf7FjrHLe@4ax.com>
I recently read this paper by Richard Waters:

  "Using the New Common Lisp Pretty Printer"
  Some Useful Lisp Algorithms: Part 2
  MERL Technical Report 93-17, August 1993
  http://www.merl.com/reports/TR93-17/

It was an "aha!" experience for me, and I highly recommend it. I realized
how much work this functionality can potentially save. I love large
languages with fatty specifications :)

This led me to think about some related issues. Suppose you design one of
those sexp-based, text-output-generating little languages, such as the
popular HTML generators written in Lisp. I can think of at least the
following implementation options:

1) Implement an interpreter/compiler that executes/compiles to output
   generating Lisp forms.

2) Implement little language constructs as macros that expand into output
   generating Lisp forms.

3) Keep little language code as list structures as much as possible, and
   use the pretty printer for output generation.
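To make these concrete, option #2 might look something like the following
sketch (a hypothetical macro of my own, not any existing HTML library),
which recursively expands a tag form into output-generating FORMAT calls:

```lisp
(defmacro html (form &optional (stream '*standard-output*))
  "Expand (TAG child...) into code that writes HTML to STREAM."
  (if (atom form)
      `(princ ,form ,stream)            ; text content
      (destructuring-bind (tag &rest body) form
        `(progn
           (format ,stream "<~(~A~)>" ',tag)
           ,@(mapcar (lambda (child) `(html ,child ,stream)) body)
           (format ,stream "</~(~A~)>" ',tag)))))

(html (:p "Hello, " (:b "world")))
;; prints <p>Hello, <b>world</b></p>
```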

I have a few questions:

- Are there other options?
- Can option #2 be considered a special case of #1?
- In which cases is #3 preferable to other options?


Paolo
-- 
EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
http://www.paoloamoroso.it/ency/README
[http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/]

From: Bill Clementson
Subject: Re: Sexp-based, text-generating little languages
Date: 
Message-ID: <wky9fhehcb.fsf@attbi.com>
Paolo Amoroso <·······@mclink.it> writes:

> I recently read this paper by Richard Waters:
> 
>   "Using the New Common Lisp Pretty Printer"
>   Some Useful Lisp Algorithms: Part 2
>   MERL Technical Report 93-17, August 1993
>   http://www.merl.com/reports/TR93-17/

Interesting paper - thanks for the link.

> It was an "aha!" experience for me, and I highly recommend it. I realized
> how much work this functionality can potentially save. I love large
> languages with fatty specifications :)
> 
> This led me to think about some related issues. Suppose you design one of
> those sexp-based, text-output-generating little languages, such as the
> popular HTML generators written in Lisp. I can think of at least the
> following implementation options:
> 
> 1) Implement an interpreter/compiler that executes/compiles to output
>    generating Lisp forms.
> 
> 2) Implement little language constructs as macros that expand into output
>    generating Lisp forms.

I guess that a variation on #1 & #2 would be implementing little language
constructs as macros that expand into non-Lisp output. An example would be the
8051 macro language sample that Frode Fjeld did: www.cs.uit.no/~frodef/picl.lisp
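A toy version of that idea (hypothetical mnemonics and formatting, not
Frode's actual code) would be a macro that expands into code printing
assembly-style text instead of doing any Lisp computation:

```lisp
(defmacro emit-asm (&body instructions)
  "Expand each (MNEMONIC . OPERANDS) form into code printing one line
of (made-up) assembly text."
  `(progn
     ,@(mapcar (lambda (ins)
                 `(format t "    ~(~A~)~{ ~(~A~)~}~%"
                          ',(first ins) ',(rest ins)))
               instructions)))

(emit-asm (mov a 32) (inc a) (ret))
;; prints:
;;     mov a 32
;;     inc a
;;     ret
```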

> 3) Keep little language code as list structures as much as possible, and
>    use the pretty printer for output generation.

The above paper was the first time I've seen this approach taken. Neat idea.
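As far as I can tell, the trick is roughly this (my own toy
reconstruction, not the paper's code): install a print function in a
pprint dispatch table so that plain list structure prints as HTML:

```lisp
;; Copy the standard dispatch table so we don't disturb normal printing.
(defvar *html-table* (copy-pprint-dispatch))

(defun print-html (stream form)
  (destructuring-bind (tag &rest body) form
    (format stream "<~(~A~)>" tag)
    (dolist (item body)
      (if (consp item)
          (print-html stream item)      ; nested element
          (princ item stream)))         ; text content
    (format stream "</~(~A~)>" tag)))

;; Dispatch on lists whose head is one of a few known tags.
(set-pprint-dispatch '(cons (member :html :body :p :b))
                     #'print-html 0 *html-table*)

(let ((*print-pretty* t)
      (*print-pprint-dispatch* *html-table*))
  (prin1 '(:p "Hello, " (:b "world"))))
;; prints <p>Hello, <b>world</b></p>
```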

> I have a few questions:
> 
> - Are there other options?

4) The Genetic Programming books by Koza demonstrate using a combination of 
   algorithms (function sets), rules (terminal sets) and iterative processing 
   to produce an sexp-based output program that solves a problem. 

5) Pymacs allows Emacs users to extend Emacs with Python using an sexp-based 
   extension to elisp. (Incidentally, it also allows Python users to call into Emacs). 
   http://www.iro.umontreal.ca/~pinard/pymacs/pymacs.tar.gz
   This isn't the same as your other examples but demonstrates an sexp-based way
   of representing a non-sexp language.

> - Can option #2 be considered a special case of #1?
> - In which cases is #3 preferable to other options?

--
Bill Clementson
From: Paolo Amoroso
Subject: Re: Sexp-based, text-generating little languages
Date: 
Message-ID: <a6bCPAHpgaZVo+BKUmOkyPzD2CGg@4ax.com>
On Sun, 21 Apr 2002 05:43:06 GMT, Bill Clementson <·······@attbi.com>
wrote:

> Paolo Amoroso <·······@mclink.it> writes:
[...]
> > This led me to think about some related issues. Suppose you design one of
> > those sexp-based, text-output-generating little languages, such as the
> > popular HTML generators written in Lisp. I can think of at least the
> > following implementation options:
[...]
> > - Are there other options?
> 
> 4) The Genetic Programming books by Koza demonstrate using a combination of 
>    algorithms (function sets), rules (terminal sets) and iterative processing 
>    to produce an sexp-based output program that solves a problem. 

If I understand things correctly, this approach solves a different problem:
it searches for a sexp-based program that solves a given problem. I instead
referred to a possibly more mundane problem: generating text output once
you already know how it should look.


Paolo
-- 
EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
http://www.paoloamoroso.it/ency/README
[http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/]
From: Bill Clementson
Subject: Re: Sexp-based, text-generating little languages
Date: 
Message-ID: <wkk7r0l89o.fsf@attbi.com>
Paolo Amoroso <·······@mclink.it> writes:

> On Sun, 21 Apr 2002 05:43:06 GMT, Bill Clementson <·······@attbi.com>
> wrote:
> 
> > Paolo Amoroso <·······@mclink.it> writes:
> [...]
> > > This led me to think about some related issues. Suppose you design one of
> > > those sexp-based, text-output-generating little languages, such as the
> > > popular HTML generators written in Lisp. I can think of at least the
> > > following implementation options:
> [...]
> > > - Are there other options?
> > 
> > 4) The Genetic Programming books by Koza demonstrate using a combination of 
> >    algorithms (function sets), rules (terminal sets) and iterative processing 
> >    to produce an sexp-based output program that solves a problem. 
> 
> If I understand things correctly, this approach solves a different problem:
> it searches for a sexp-based program that solves a given problem. I instead
> referred to a possibly more mundane problem: generating text output once
> you already know how it should look.

Koza's program is sexp-based and text-output generating. You are right that
it is conceptually different from the other examples, but I thought it was an
interesting alternative approach (and I thought that was what you were asking
for). Instead of taking one form of syntax and generating another, it takes 
algorithms and success rules and generates an sexp-based program.

--
Bill Clementson
From: Paul Tarvydas
Subject: Re: Sexp-based, text-generating little languages
Date: 
Message-ID: <k2ix8.47126$VLV.33139@news01.bloor.is.net.cable.rogers.com>
"Paolo Amoroso" <·······@mclink.it> wrote in message
·································@4ax.com...
> I recently read this paper by Richard Waters:
>
>   "Using the New Common Lisp Pretty Printer"
>   Some Useful Lisp Algorithms: Part 2
...
> I have a few questions:
>
> - Are there other options?
> - Can option #2 be considered a special case of #1?
> - In which cases is #3 preferable to other options?


You might be interested in checking out TXL http://www.txl.ca/.  It started
out as a tool to experiment with language extensions.  You feed it a loose
grammar and a set of (functional) tree-transformation rules.  It parses,
transforms, unparses and pretty-prints the result.  Other tree-transforming
grammar tools exist (e.g. ANTLR).

The book "Effective Tcl/Tk Programming" (Harrison, McLennan) promotes using
a fact-base style, where relations ("facts" - think Prolog) are stored in a
hashtable.  The relations are directly executable Tcl commands (which wows
everyone who hasn't seen Lisp).  The book, of course, promotes using
fact-bases for GUI work, but I've seen it done for compiler work.  For
example, instead of building some sort of data structure as you
parse/analyze incoming code, you simply add facts to the fact base.  The
equivalent of a "data structure traversal" (e.g. to emit code) is done by
querying the fact-base.  Fact bases are cool, because you can use existing
facts to infer new relationships and add them to the factbase, deriving some
very powerful information.  The inferences are written declaratively
(functionally) and are, hence, easier to write correctly (and with less code).
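In Lisp the fact-base style fits in a few lines (a minimal sketch of the
idea, mine rather than the book's Tcl code; the fact vocabulary is made up):

```lisp
;; Facts are plain lists; QUERY pattern-matches with ? as a wildcard.
(defvar *facts* '())

(defun assert-fact (fact)
  (push fact *facts*))

(defun query (pattern)
  "Return all facts matching PATTERN, where ? matches anything."
  (remove-if-not
   (lambda (fact)
     (and (= (length fact) (length pattern))
          (every (lambda (p f) (or (eq p '?) (equal p f)))
                 pattern fact)))
   *facts*))

;; Parsing adds facts; "traversal" is just querying.
(assert-fact '(defines main returns int))
(assert-fact '(defines helper returns void))
(query '(defines ? returns int))
;; => ((DEFINES MAIN RETURNS INT))
```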

The programming language Icon (from the inventors of SNOBOL) is a simple and
powerful string manipulator.

I've always liked the power of regexps to do source transformation, but
regexps collapse when you need to keep "state information" (doing certain
things across more than one line, dealing with languages that contain
structure, etc).  This is what parsers are for (most parsers are, literally,
state machines).  But, most parsers suck at source-to-source transformation
for a couple of reasons:

(a) They often use a class of strict grammar (e.g. LALR) which tends to make
things cumbersome to describe (esp. if you just want to do something
slightly harder than what you can do in Perl or Tcl).  TXL solves this
problem by using a backtracking parser - it accepts all sorts of ambiguous
and quick-to-throw-together grammars.  On today's machines, backtracking
(nondeterminism) ain't a problem.

(b) You often need to build a scanner, as well as a parser.

(c) You have to overspecify the grammar.  Even if all you want to do is to
find some simple pattern, say function names in declarations in C programs,
you have to create a set of tokens that covers all of C, and you have to
specify grammar rules for all of the guts of C programs.  There's no way to
say "just skip to the next matching brace".

(d) Worst of all, parsers tend to use tokenization that discards whitespace
and comments.  You cannot write "the unity transform" with most compiler
tools (even TXL has this problem) - the output can never equal the input.

I fooled with these concepts once and hacked a tool together in LispWorks
that looks like a scanner/parser tool and practically solves the unity
transform problem.  It has a class of tokens called "noise" which are
maintained within the concrete parse tree, but are "invisible" to the parser
and the rule-walker.  The unparser sees and re-emits the noise tokens (the
unparser is 11 lines of Lisp code - we're not talking about a difficult concept
:-).  The experimental tool also provides a way to declare many grammars at
once - so if you want to say "skip to the next matching brace", you write a
grammar that acts on braces and simply ignores all other tokens, and switch
to that grammar at the appropriate point in the parse.  It ain't perfect,
but it straddles the gap between regexp stuff and parser stuff.
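The noise-token idea is roughly this (a reconstruction from memory, not
the actual tool's representation):

```lisp
(defstruct token kind text)

(defun noise-p (tok)
  "Whitespace and comments are kept in the tree but hidden from the parser."
  (member (token-kind tok) '(:whitespace :comment)))

(defun significant (tokens)
  "What the parser and rule-walker actually see."
  (remove-if #'noise-p tokens))

(defun unparse (tree stream)
  "Re-emit every token, noise included, so output can equal input."
  (dolist (node tree)
    (if (token-p node)
        (write-string (token-text node) stream)
        (unparse node stream))))
```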

pt