From: Sean Champ
Subject: "parse" vs. "read"
Date: 
Message-ID: <bkp6p8$84o$1@chessie.cirr.com>
Hello.

(disclaim
   Maybe this question will seem too general, but I have to start somewhere.
   So, here goes.)


What are people's thoughts/experiences regarding the choice of:

  1) modifying the lisp-reader's syntax table (reader-macros, .. etc?)
  
versus

  2) building a parser

...for reading something, with a Lisp image, that was written in a
non-Lisp language?



My perusal of cl-xml was fairly shallow, and happened a while ago, but it
looked like it /might/ have been using approach #1 for reading XML.

Even if it wasn't doing {that}, exactly (it was exporting symbols like <
and > and such, iirc... I realize, now, that this doesn't mean it was using
the lisp reader in the way I meant with approach #1), the possibility got
me wondering about it.

I think, without having tried it yet, that I would really prefer the work
of modifying the reader's syntax table to that of having it chew on some
BNF and then ... doing whatever a parser-builder does to get something like
XML parsed.

The latter approach -- lexing a grammar description with something,
and building functions, on top of the lisp reader, that process the input[1]
-- seems like it could take less time, from the start of it to the final
"parser" (given a tool like Zebu, for example, and a suitable,
machine-readable grammar description), but I wonder if the former
approach would result in a more efficient thing, which isn't as much a
/parser/ as it is, perhaps, an /interpreter/.


Or does this just sound like rubbish?




Footnotes:

[1] My understanding of "how parsing works, or can work" is still limited.
    That may be evident here.



-- 
Sean Champ
gimbal #at# sdf.lonestar.org
SDF Public Access UNIX System - http://sdf.lonestar.org

From: Joe Marshall
Subject: Re: "parse" vs. "read"
Date: 
Message-ID: <smmnv3x9.fsf@ccs.neu.edu>
Sean Champ <······@sdf.lonestar.org> writes:

> Hello.
>
> (disclaim
>    Maybe this question will seem too general, but I have to start somewhere.
>    So, here goes.)
>
>
> what are people's thoughts/experiences in regards to the choice of:
>
>   1) modifying the lisp-reader's syntax table (reader-macros, .. etc?)
>   
> versus
>
>   2) building a parser
>
> ...for reading something, with a Lisp image, that was written in  in a
> non-lisp language?

Do the latter.

Hacking the readtable is good if you want to augment the lisp syntax, but
basically keep it the same.  It isn't a general parser, though.

One trick I've used, however, is to write a simple parser using
something like Lex and have it simply emit a lisp-readable token stream.
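A minimal sketch of this trick, with the Lex half replaced by a hand-rolled Common Lisp lexer for a toy tag language (the function name and token shapes here are invented for illustration, not from any real system): the lexer prints each token in a form CL:READ can consume, so the parsing half of the job can lean on the standard reader.

```lisp
(defun lex-to-readable (input &optional (out *standard-output*))
  "Emit a Lisp-readable token stream for a toy tag language like <foo>text</foo>."
  (with-input-from-string (in input)
    (loop for ch = (read-char in nil nil)
          while ch
          do (case ch
               ;; a tag: read up to the closing > and emit an :open or :close token
               (#\< (let ((name (with-output-to-string (s)
                                  (loop for c = (read-char in nil nil)
                                        until (or (null c) (char= c #\>))
                                        do (write-char c s)))))
                      (if (and (plusp (length name)) (char= (char name 0) #\/))
                          (format out "(:close ~s) " (subseq name 1))
                          (format out "(:open ~s) " name))))
               ;; anything else: character data up to the next <
               (t (unread-char ch in)
                  (let ((text (with-output-to-string (s)
                                (loop for c = (read-char in nil nil)
                                      until (or (null c) (char= c #\<))
                                      do (write-char c s)
                                      finally (when c (unread-char c in))))))
                    (format out "(:text ~s) " text)))))))

;; (lex-to-readable "<a>hi</a>")
;; prints: (:open "a") (:text "hi") (:close "a")
```

Anything downstream can now just call READ in a loop on that stream instead of touching characters at all.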
From: james anderson
Subject: Re: "parse" vs. "read"
Date: 
Message-ID: <3F703B5C.3F604307@setf.de>
Sean Champ wrote:
> 
> Hello.
> 
> (disclaim
>    Maybe this question will seem too general, but I have to start somewhere.
>    So, here goes.)
> 
> what are people's thoughts/experiences in regards to the choice of:
> 
>   1) modifying the lisp-reader's syntax table (reader-macros, .. etc?)
> 
> versus
> 
>   2) building a parser
> 
> ...for reading something, with a Lisp image, that was written in  in a
> non-lisp language?
> 
> my perusal of cl-xml was [] shallow, and happened a while ago, but it
> looked like it /might/ have been using approach #1 for reading XML.
> 

cl-xml uses approach 2. the kernel of the xml parser is a collection of
functions which implement an atn state machine. these functions are generated
by compiling a version of the w3c xml bnf which is slightly modified to
express recursive phrases in more tractable forms. the same mechanism is used
for the xml path and xml query parsers.
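cl-xml's atn functions are generated rather than hand-written, but the flavor of "grammar phrase -> function" can be sketched by hand. this toy (all names invented here; it is not cl-xml code) parses the rule element ::= (:open name) content* (:close name) over an already-lexed token list, each function returning a parse tree and the remaining tokens, or nil on failure:

```lisp
(defun parse-element (tokens)
  "element ::= (:open name) content* (:close name)"
  (let ((head (first tokens)))
    (when (and (consp head) (eq (first head) :open))
      (let ((name (second head))
            (remaining (rest tokens))
            (children '()))
        ;; content*: greedily parse child content until no rule matches
        (loop (multiple-value-bind (tree more) (parse-content remaining)
                (if tree
                    (progn (push tree children) (setf remaining more))
                    (return))))
        ;; the close tag must match the open tag's name
        (let ((tail (first remaining)))
          (when (and (consp tail) (eq (first tail) :close)
                     (equal (second tail) name))
            (values (list* :element name (nreverse children))
                    (rest remaining))))))))

(defun parse-content (tokens)
  "content ::= (:text ...) | element"
  (let ((head (first tokens)))
    (cond ((and (consp head) (eq (first head) :text))
           (values head (rest tokens)))
          ((and (consp head) (eq (first head) :open))
           (parse-element tokens))
          (t nil))))

;; (parse-element '((:open "a") (:text "hi") (:close "a")))
;; => (:ELEMENT "a" (:TEXT "hi")), NIL
```

a bnf compiler mechanizes exactly this translation, one function (or atn state) per phrase, instead of writing each by hand.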

> even [though cl-xml] wasn't doing {that}, [at all] (it was exporting symbols like < and
>  > and such, iirc... 

these symbols serve as lexical tokens.

there was a single file in earlier releases, which did use the readtable
approach to read a subset of xml, but it's been missing for a while. the lexer
for the xml-path parser was reader-based. 

>    I realize, now, that this doesn't mean it was using the
> lisp reader in such a way as I meant with approach #1) .. the possibility
> "got me to wondering about it".
> 
> I think, without having tried it yet, that I would really prefer the work,
> of modifying the reader's syntaxt table, to that of having it chew on some
> BNF 

the only thing which does bnf-chewing in cl-xml is in the
"xml:code;atn-parser;" directory. upon perusal, one will recognize a bnf compiler.

>   and then ... doing whatever a parser-builder does to get something like XML
> parsed.

which can be used to generate parsers for a large subset of bnf-amenable
languages. it does not handle general direct left-recursion, and deeply
recursive input may require either careful bnf formulation or an
appropriately large stack.

> 
> The latter approach -- lexxing a grammar-description with something,
> and building functions, on top of the lisp reader, that process the input[1]
> -- seems  like it could take less time, from the start of it to the final
> "parser" (given a tool like Zebu, for example, and a suitable,
> machine-readable grammar description) but I wonder if the former
> approach would result in a more efficient thing, which isn't as much a
> /parser/ as it is, perhaps, an /interpreter/.
> 

the cl-xml parser compiler is the equivalent of zebu. i didn't know of zebu at
the time. 

in any case, one might consider that

- many languages have context-sensitive lexical properties. to account for
them, one would have to swap read tables.

- if the reader macros in the read tables are the standard ones, then,
depending on the nature of the data which the parser generates, use of
temporary storage may be an issue.

- if the reader macros aren't the standard ones, then the read tables are
being used to map (character x character) -> function only, for which purpose
the application may allow alternative implementations.
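the first point can be made concrete with a two-table sketch (all names invented here; this is not how cl-xml works): the same character reads differently depending on which table is bound to *readtable*, so a context-sensitive language forces the lexer to swap tables as it crosses context boundaries.

```lisp
;; two readtables for two lexical contexts. in *content-rt* the
;; character ? is a terminating macro character that yields a token;
;; in *attribute-rt* it keeps its standard constituent syntax.
(defparameter *content-rt* (copy-readtable))
(defparameter *attribute-rt* (copy-readtable))

(set-macro-character #\?
                     (lambda (stream char)
                       (declare (ignore stream char))
                       :content-token)
                     nil *content-rt*)

(defun read-in-context (string context)
  "read STRING under the readtable selected by CONTEXT."
  (let ((*readtable* (ecase context
                       (:content *content-rt*)
                       (:attribute *attribute-rt*))))
    (read-from-string string)))

;; (read-in-context "?" :content)   => :CONTENT-TOKEN
;; (read-in-context "?" :attribute) => the ordinary symbol ?
```

every context boundary in the grammar becomes a rebinding site, which is the bookkeeping the point above alludes to.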

on the other hand, it has been observed that, if one follows approach 1, one
can avoid slow, buggy reimplementations of system facilities, which one might
be tempted to undertake in pursuit of approach 2. perhaps that has something
to recommend it.

...
From: Steven M. Haflich
Subject: Re: "parse" vs. "read"
Date: 
Message-ID: <3F724275.4000805@alum.mit.edu>
Sean Champ wrote:

> I think, without having tried it yet, that I would really prefer the work,
> of modifying the reader's syntaxt table, to that of having it chew on some
> BNF and then ... doing whatever a parser-builder does to get something like XML
> parsed. 

This is a curious statement because you don't give any basis for it.

A readtable provides the facility to read a character and dispatch to
a user-specified function that can do arbitrary things, including reading
more characters.  A general parser (LALR or otherwise) is a program that
reads characters and dispatches (depending on state) to action functions
and a new state.

I fail to see a difference except in details.  Therefore using the Lisp
reader doesn't gain you anything.  Using the Lisp reader and hacking a
readtable is useful if your target language resembles Lisp in enough places
that the standard readtable provides you useful behavior.  To the contrary,
XML has almost no overlap with standard Lisp syntax and the Lisp readtable,
and only a few places where Lisp syntax is even _close_ to XML syntax.
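That dispatch is easy to see in miniature. Here is the usual textbook readtable hack (the variable name is invented for illustration): a character is bound to a function that reads more characters, exactly as described above, by delegating back to the reader.

```lisp
;; a macro character that reads more characters: [ collects forms up
;; to the matching ], using the reader itself for the recursion.
(defparameter *bracket-readtable* (copy-readtable))

(set-macro-character #\[
                     (lambda (stream char)
                       (declare (ignore char))
                       (read-delimited-list #\] stream t))
                     nil *bracket-readtable*)

;; ] must terminate tokens, the same way ) does:
(set-syntax-from-char #\] #\) *bracket-readtable*)

;; (let ((*readtable* *bracket-readtable*))
;;   (read-from-string "[1 2 3]"))
;; => (1 2 3)
```

The "useful behavior" you inherit for free here is tokenization of numbers and symbols between the brackets; for a target language that doesn't tokenize like Lisp, that inheritance is exactly what you don't get.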

BTW, parsing XML conformantly is _hard_ because the specification has
all sorts of nooks and crannies in it to hold the melted syntax -- wait!
I'm confusing the XML specification with a 40-year-old TV commercial for
a particular brand of breakfast English muffin.  It's an easy mistake
to make...