From: Henrik Motakef
Subject: Processing (SG|X)ML
Date: 
Message-ID: <87adgalp7e.fsf@interim.henrik-motakef.de>
Hi,

I know that this is kind of a heretical question here, but anyway:

I'm looking for a library to work with XML. I know that there are a
few options around (CL-XML, CLLIB xml, UncommonXML, some Allegro stuff
that I don't know the name of), but none of them seem to fit my needs.

It would be very useful if I could access "low-level" information
about the syntactic structure of the processed documents, like what
appears in which storage unit, the character encoding used in them,
unexpanded character/general entities, internal DTD structure
including parameter entities, conditional sections (included or not),
internal subsets etc. That is, the XML Infoset actually is /not/
enough for what I want to do. I want to modify documents on a physical
level, leaving features like the ones mentioned intact.

On the other hand, high-level access, up to XML namespaces, would be
handy sometimes, too. But I guess it's way easier to build that myself
upon a low-level library than the other way around.

Cross-CL-Implementation portability would be a huge plus. I'm
currently using CMUCL/SBCL on Unix, and would not like having to
switch. Users of the stuff I intend to write with it might well have
different preferences. (I guess Unicode handling will be a huge
problem here.)

My impression is that these kinds of things are not really popular
among XML folks, so maybe I'll have more luck with an older(?) SGML
package. However, I also don't know of /any/ SGML impl for CL whatsoever,
although I'm fairly sure there have to be some. Since my input
documents will most often be XML, it would be good if the library
either supported WebSGML already or were easily hackable to do so.

Any hints are highly appreciated.

tia
Henrik

From: Marco Antoniotti
Subject: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <AXa9a.1$qc.160@typhoon.nyu.edu>
Hi

IMHO we have a different problem altogether here.

We miss a portable and Lispy Lex/Yacc pair.

Please do not refer me to Baker's paper.  It does not cut it (even with 
my hacks :) ).

We need something along the lines of ANTLR in Java.  But it should be Lispy.

E.g. I want something with the following interface.

	define-grammar name (&optional args) &optional options

	define-rule (name grammar &key options) lambda-list LHS ==> RHS


Where DEFINE-RULE causes the grammar and the parser to be modified 
dynamically.

The main entry point would be a generic function

	parse stream grammar &key &allow-other-keys

where STREAM could be a string or a stream etc etc.
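
Just to make the intent concrete, here is a toy sketch of the surface
machinery.  This is pure vapor, of course; every name below is made up:

(defvar *grammars* (make-hash-table))   ; grammar name -> rule table

(defmacro define-grammar (name (&rest args) &rest options)
  (declare (ignore args options))
  `(setf (gethash ',name *grammars*) (make-hash-table)))

(defmacro define-rule ((name grammar &key options) lambda-list lhs arrow rhs)
  ;; Installing the rule in the grammar's table at load time is what
  ;; would make the grammar (and hence the parser) dynamically
  ;; modifiable.  A real implementation would compile LHS ==> RHS
  ;; instead of just recording it.
  (declare (ignore options arrow))
  `(setf (gethash ',name (gethash ',grammar *grammars*))
         (lambda ,lambda-list (list ',lhs '==> ',rhs))))

(defgeneric parse (stream grammar &key &allow-other-keys))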

... and now I hide my hand, after having thrown the stone.

Cheers


--
Marco Antoniotti


Henrik Motakef wrote:

> Hi,
>
> I know that this is kind of a heretical question here, but anyway:
>
> I'm looking for a library to work with XML. I know that there are a
> few options around (CL-XML, CLLIB xml, UncommonXML, some Allegro stuff
> that I don't know the name of), but none of them seem to fit my needs.
>
> It would be very useful if I could access "low-level" information
> about the syntactic structure of the processed documents, like what
> appears in which storage unit, the character encoding used in them,
> unexpanded character/general entities, internal DTD structure
> including parameter entities, conditional sections (included or not),
> internal subsets etc. That is, the XML Infoset actually is /not/
> enough for what I want to do. I want to modify documents on a physical
> level, leaving features like the ones mentioned intact.
>
> On the other hand, high-level access, up to XML namespaces, would be
> handy sometimes, too. But I guess it's way easier to build that myself
> upon a low-level library than the other way around.
>
> Cross-CL-Implementation portability would be a huge plus. I'm
> currently using CMUCL/SBCL on Unix, and would not like having to
> switch. Users of the stuff I intend to write with it might well have
> different preferences. (I guess Unicode handling will be a huge
> problem here.)
>
> My impression is that these kinds of things are not really popular
> among XML folks, so maybe I'll have more luck with an older(?) SGML
> package. However, I also don't know of /any/ SGML impl for CL whatsoever,
> although I'm fairly sure there have to be some. Since my input
> documents will most often be XML, it would be good if the library
> either supported WebSGML already or were easily hackable to do so.
>
> Any hints are highly appreciated.
>
> tia
> Henrik
From: Edward O'Connor
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <dd1y1mk78j.fsf@oecpc11.ucsd.edu>
> We miss a portable and Lispy Lex/Yacc pair.

Why not use ZEBU?


Ted

-- 
Edward O'Connor
·······@soe.ucsd.edu
From: Marco Antoniotti
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <ERb9a.6$qc.196@typhoon.nyu.edu>
Edward O'Connor wrote:

> >We miss a portable and Lispy Lex/Yacc pair.
>
>
> Why not use ZEBU?


Not Lispy enough and, last I checked, it does not have a nice license. 
Has this changed recently?

Cheers

--
marco
From: ozan s yigit
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <vi4llzuv3in.fsf@blue.cs.yorku.ca>
Marco Antoniotti <·······@cs.nyu.edu> writes:


> We miss a portable and Lispy Lex/Yacc pair.

IMS will clinger has something like this that he uses in teaching, but i
saw only some grammars for it, not the whole tool, so i may be assuming
more than it actually has. i don't even know if he would want to
distribute it.

another possibility is to take rechenberg/mossenbock's coco and write
a lisp version with lispish syntax. [there are many versions of coco
around, eg. http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/Java/
and the code is by and large penetrable]

oz
-- 
you take a banana, you get a lunar landscape. -- j. van wijk
From: Peter Seibel
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <m3u1ei8w9r.fsf@localhost.localdomain>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> Hi
> 
> IMHO we have a different problem altogether here.
> 
> We miss a portable and Lispy Lex/Yacc pair.
> 
> Please do not refer me to Baker's paper.  It does not cut it (even
> with my hacks :) ).

How does it not cut it? I'm interested because I started from that
paper and ended up with something that I found more usable than ANTLR.
And I've spent a fair bit of time wrestling with ANTLR in the past. I
was going to post the code here but it seems to have suffered a bit of
bit rot in the past few weeks while I've been away from it. (It's not
really done, etc. etc. but I was pretty close to getting it to do
things that I never was able to wrap my head around ANTLR well enough
to do. I hope to get back to it soon.)

It did, however, get somewhat away from Baker's approach in that I
decided not to screw around with read macros, mostly because I haven't
quite wrapped my head around *them* yet. But the grammar language is
quite concise. For instance, a simple grammar for arithmetic
expressions looks like this:

  (defchartype digit     '(satisfies digit-char-p))
  (defchartype term-op   '(member #\+ #\-))
  (defchartype factor-op '(member #\/ #\*))
  
  (defprod number      (+ digit))
  (defprod expression  term (* term-op term))
  (defprod term        factor (* factor-op factor))
  (defprod factor      (/ number (#\( expression #\))))
  
  (defparser arithmetic expression)

Which then defines a parsing function 'arithmetic' that allows you to
say:

  (arithmetic "(3+4)*5")

Which will tell you whether the string parses correctly. If you
actually want to get a parse tree, you currently need to annotate your
grammar, something like this:

  (defun atoi (char) (parse-integer (string char)))

  (defun get-sym (char) (find-symbol (string char) "COMMON-LISP"))
  
  (defprod number     
    (+ (^ digit (+ (* 10 (or number 0)) (atoi digit)))))
  
  (defprod expression
    (^ term) (* (^ (term-op term)
                     (list (get-sym term-op)
                           expression
                           term))))


  (defprod term
    (^ factor) (* (^ (factor-op factor)
                       (list (get-sym factor-op)
                             term
                             factor))))

  (defprod factor
    (/ (^ number) (#\( (^ expression) #\))))

  (defparser arithmetic (^ expression))

Where the '^' character means capture the value. There's also some
punning going on between the names of the productions and a variable
that will be bound to the currently captured value, which is how, for
instance, the 'number' production can build up its value one digit at
a time. In this case, the 'arithmetic' function will return two
values: t or nil to indicate whether the parse succeeded, and (in this
case) the s-exp built up during the parse. When I was last working on
it, I was trying to come up with a clever way to minimize the amount
of annotation one needed to do to a plain grammar (such as the first
one) while still getting something more useful than just a boolean out
of your parsing function.

For the moment it only parses strings, but making it capable of
parsing from streams would probably not be too terrible. (Though I
chose to parse from strings because I decided that all the things that
*I* care about parsing will easily fit into memory, and I save a fair
bit of complexity by not having to worry about how to do backtracking
on streams.)

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

  The intellectual level needed   for  system design is  in  general
  grossly  underestimated. I am  convinced  more than ever that this
  type of work is very difficult and that every effort to do it with
  other than the best people is doomed to either failure or moderate
  success at enormous expense. --Edsger Dijkstra
From: Gabe Garza
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <87fzq27cir.fsf@ix.netcom.com>
Peter Seibel <·····@javamonkey.com> writes:

> Marco Antoniotti <·······@cs.nyu.edu> writes:
> 
> > Hi
> > 
> > IMHO we have a different problem altogether here.
> > 
> > We miss a portable and Lispy Lex/Yacc pair.
> > 
> > Please do not refer me to Baker's paper.  It does not cut it (even
> > with my hacks :) ).
> 
> How does it not cut it? 

  o Having to manually eliminate left recursion is incredibly annoying
  
  o Giving up a lot of useful macro characters is also annoying

  o Having to use a language that emacs can't indent well is annoying

  o The grammar has to be expressed on characters, not streams of 
    tokens, which is sometimes annoying

  o It's not quite write-only, but it can get really close...
    
Please note I think it's annoying, not useless. :)  I use it pretty
much any time I have to deal with non-trivial text in Common Lisp.

Gabe Garza
From: Peter Seibel
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <m3isuyv74q.fsf@localhost.localdomain>
Gabe Garza <·······@ix.netcom.com> writes:

> Peter Seibel <·····@javamonkey.com> writes:
> 
> > Marco Antoniotti <·······@cs.nyu.edu> writes:
> > 
> > > Hi
> > > 
> > > IMHO we have a different problem altogether here.
> > > 
> > > We miss a portable and Lispy Lex/Yacc pair.
> > > 
> > > Please do not refer me to Baker's paper.  It does not cut it (even
> > > with my hacks :) ).
> > 
> > How does it not cut it? 

FWIW:

>   o Having to manually eliminate left recursion is incredibly annoying

My system still suffers from this same problem. I haven't thought too
much about how or whether to fix it. Mostly I was waiting until I had
written some more grammars to see how much of a pain it really was. I
*have* been bitten by it already, though.
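
For example, the natural left-recursive way to write the expression
production from my earlier post:

  (defprod expression (/ (expression term-op term) term))

would recurse forever in this kind of parser; it has to be rewritten
by hand into the iterative form I actually used:

  (defprod expression term (* term-op term))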

>   o Giving up a lot of useful macro characters is also annoying

So, I avoid this by being too chicken to use reader macros. ;-)

>   o Having to use a language that emacs can't indent well is annoying

I think mine indents fine. It's just regular s-exps. (Well I did have
to tweak my cl-indent a bit for it to grok my new definitional macros.
But that's pretty normal, it seems.)

>   o The grammar has to be expressed on characters, not streams of 
>     tokens, which is sometimes annoying

Mine, like ANTLR, starts from characters and then builds up, but even
more seamlessly. Some productions match particular characters, and
other productions compose them into higher-level productions. So if
you're trying to parse something that was designed to be lexed and
then parsed (e.g. Java source code), you can write a set of productions
that map to the lexer you would write in lex, and then compose those
into a parser that refers to them.

>   o It's not quite write-only, but it can get really close...

In addition to being too chicken to use reader macros, I also had a
hard time reading the Meta language; *I* think my language is more
readable but I would, wouldn't I.

Anyway, I think I'll go see if I can tidy up the bit rot so I can post
some code.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

  The intellectual level needed   for  system design is  in  general
  grossly  underestimated. I am  convinced  more than ever that this
  type of work is very difficult and that every effort to do it with
  other than the best people is doomed to either failure or moderate
  success at enormous expense. --Edsger Dijkstra
From: Marco Antoniotti
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <tWp9a.20$qc.740@typhoon.nyu.edu>
Thanks, but what you write is just the beginning.

The most annoying thing about it is that you do not have good built-in 
and standardized backtracking facilities.  These are necessary when you 
build recursive descent parsers.

You could come up with them using HANDLER-CASE and RESTARTS.  The 
machinery is there.  Of course, as soon as you do that you find yourself 
having to move state data around. (Hint: call/cc :) ).
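
A rough sketch of what I mean, just the skeleton (PARSE-FAILURE and
the way alternatives are represented are made up for illustration):

(define-condition parse-failure (error) ())

(defun try-alternatives (position &rest parsers)
  ;; Each parser gets the same starting position and either returns a
  ;; result or signals PARSE-FAILURE; on failure we fall through
  ;; (i.e. backtrack) to the next alternative.
  (dolist (parser parsers (error 'parse-failure))
    (handler-case (return (funcall parser position))
      (parse-failure () nil))))

Notice that the starting POSITION has to be threaded through by hand;
that is the state-moving I am complaining about.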


Cheers


Gabe Garza wrote:

> Peter Seibel  writes:
>
>
> >Marco Antoniotti  writes:
> >
> >
> >>Hi
> >>
> >>IMHO we have a different problem altogether here.
> >>
> >>We miss a portable and Lispy Lex/Yacc pair.
> >>
> >>Please do not refer me to Baker's paper.  It does not cut it (even
> >>with my hacks :) ).
> >
> >How does it not cut it?
>
>
>   o Having to manually eliminate left recursion is incredibly annoying
>
>   o Giving up a lot of useful macro characters is also annoying
>
>   o Having to use a language that emacs can't indent well is annoying
>
>   o The grammar has to be expressed on characters, not streams of
>     tokens, which is sometimes annoying
>
>   o It's not quite write-only, but it can get really close...
>
> Please note I think it's annoying, not useless. :)  I use it pretty
> much any time I have to deal with non-trivial text in Common Lisp.
>
> Gabe Garza
>
From: Joe Marshall
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <wujdzr4x.fsf@ccs.neu.edu>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> Thanks, but what you write is just the beginning.
> 
> The most annoying thing about it is that you do not have good built-in
> and standardized backtracking facilities.  These are necessary when
> you build recursive descent parsers.
> 
> 
> You could come up with them using HANDLER-CASE and RESTARTS.  The
> machinery is there.  Of course, as soon as you do that you find
> yourself having to move state data around. (Hint: call/cc :) ).

You can use continuation-passing-style in Common Lisp.  It works quite
well:


(defun parse-digit (parse-state success failure)
  (let ((probe (current-character parse-state)))
    (if (digitp probe)
        (funcall success (digit-value probe) (next-state parse-state))
        (funcall failure parse-state))))

;; Gobble all the leading whitespace.  Invoke SUCCESS when there is at
;; least one whitespace character, invoke FAILURE when there are none.

(defun parse-leading-whitespace (parse-state success failure)
  (if (whitespacep (current-character parse-state))
      ;; After the first whitespace character, SUCCESS is installed as
      ;; the "failure" continuation, so the first non-whitespace
      ;; character then succeeds with the state after the whitespace.
      (parse-leading-whitespace (next-state parse-state) success success)
      (funcall failure parse-state)))
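
A hypothetical call site, just to show the shape (MAKE-PARSE-STATE is
assumed):

(parse-digit (make-parse-state "42")
             (lambda (value state)      ; success: got a digit
               (format t "digit: ~D~%" value)
               state)
             (lambda (state)            ; failure: not a digit
               (format t "no digit~%")
               state))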
From: Marco Antoniotti
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <qbq9a.23$qc.740@typhoon.nyu.edu>
Joe Marshall wrote:

> Marco Antoniotti  writes:
>
>
> >Thanks, but what you write is just the beginning.
> >
> >The most annoying thing about it is that you do not have good built-in
> >and standardized backtracking facilities.  These are necessary when
> >you build recursive descent parsers.
> >
> >
> >You could come up with them using HANDLER-CASE and RESTARTS.  The
> >machinery is there.  Of course, as soon as you do that you find
> >yourself having to move state data around. (Hint: call/cc :) ).
>
>
> You can use continuation-passing-style in Common Lisp.

Yes. That is what I meant. :) However, none of this is available in the 
Baker-style code out there.

Cheers

marco






>   It works quite
> well:
>
>
> (defun parse-digit (parse-state success failure)
>   (let ((probe (current-character parse-state)))
>     (if (digitp probe)
>         (funcall success (digit-value probe) (next-state parse-state))
>         (funcall failure parse-state))))
>
> ;; Gobble all the leading whitespace.  Invoke SUCCESS when there is at
> ;; least one whitespace character, invoke FAILURE when there are none.
>
> (defun parse-leading-whitespace (parse-state success failure)
>   (if (whitespacep (current-character parse-state))
>       (parse-leading-whitespace (next-state parse-state) success success)
>       (funcall failure parse-state)))
From: Russell McManus
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <878yvmh6xa.fsf@thelonious.dyndns.org>
I have some Baker-style code that I use for chopping up lisp
expressions.  It compiles into fast code, and it's pretty powerful.  I
mention it here because I think that a lot of the problems with Baker's
META language come from the macro character business, and it might be
interesting to show an alternative syntax for expressing META-style
parsers.

My variation on the Baker style parser uses s-exprs instead of macro
characters.  I think that this makes the resulting mini-language
easier to both read and write.  See below for two examples.

Match expressions are described by this grammar:

;;; <match-expr>
;;;   :: <atom>                             uses eql
;;;      | (:! <lisp-expr>)                 run lisp code
;;;      | (<match-expr-list>)              match sublist
;;;      | (:or <match-expr-list>)          logical or
;;;      | (:and <match-expr-list>)         sequence, logical and
;;;      | (:* <match-expr)                 0 or more matches
;;;      | (:* <match-expr>)                0 or more matches
;;;      | (:type <type-spec> <var> <code>) type check and binding
;;;      | (:opt <match-expr>)              optional match
;;;
;;; <match-expr-list> :: <match-expr>*


Here are two examples of what you can do with the match compiler:

(deftype form () '(not (member nil)))

;;; matches a scheme-style let block, and returns three values; the
;;; list of bound variables, the list of initial bindings, and the
;;; list of body forms.
(defun match-let (x)
  (let (vars inits body-forms)
    (and (match (let ((:* ((:type symbol var (push var vars))
                           (:type form init (push init inits)))))
                  (:* (:type form f (push f body-forms))))
                x)
         (values (reverse vars) (reverse inits) (reverse body-forms)))))


(match-let '(let ((a 1) (b 2)) (+ a b) "foo!"))
   => (A B) 
      (1 2)
      ((+ A B) "foo!")


;;; matches a cond expression, and returns the equivalent sequence
;;; of nested if expressions.
(defun cond-convert (x)
  (labels ((convert (clauses)
             (if (null clauses) 
                 'nil
               (let* ((clause (car clauses))
                      (test (car clause))
                      (body (cdr clause)))
                 `(if ,test ,body ,(convert (cdr clauses)))))))
    (let (body-forms clauses)
      (and (match
            (cond (:* ((:type form test) 
                       (:* (:type form body (push body body-forms)))
                       (:! 
                        (push (cons test (cons 'progn (reverse body-forms))) clauses)
                        (setf body-forms nil)))))
            x)
           (convert (reverse clauses))))))


(cond-convert '(cond ((= 1 2) "foo" "bar")
                     ((= 3 4) "other" "stuff")))

  => (IF (= 1 2) (PROGN "foo" "bar") (IF (= 3 4) (PROGN "other" "stuff") NIL)) 


With the above changes, I find the Baker approach very convenient (I
use the matcher primarily for code transformations).  Email me if you
want to see the match compiler itself.

-russ


Marco Antoniotti <·······@cs.nyu.edu> writes:

> Joe Marshall wrote:
> 
> > Marco Antoniotti  writes:
> >
> >
> > >Thanks, but what you write is just the beginning.
> > >
> > >The most annoying thing about it is that you do not have good built-in
> > >and standardized backtracking facilities.  These are necessary when
> > >you build recursive descent parsers.
> > >
> > >
> > >You could come up with them using HANDLER-CASE and RESTARTS.  The
> > >machinery is there.  Of course, as soon as you do that you find
> > >yourself having to move state data around. (Hint: call/cc :) ).
> >
> >
> > You can use continuation-passing-style in Common Lisp.
> 
> Yes. That is what I meant. :) However, none of this is available in
> the Baker-style code out there.
> 
> Cheers
> 
> marco
> 
> 
> 
> 
> 
> 
> >   It works quite
> > well:
> >
> >
> > (defun parse-digit (parse-state success failure)
> >   (let ((probe (current-character parse-state)))
> >     (if (digitp probe)
> >         (funcall success (digit-value probe) (next-state parse-state))
> >         (funcall failure parse-state))))
> >
> > ;; Gobble all the leading whitespace.  Invoke SUCCESS when there is at
> > ;; least one whitespace character, invoke FAILURE when there are none.
> >
> > (defun parse-leading-whitespace (parse-state success failure)
> >   (if (whitespacep (current-character parse-state))
> >       (parse-leading-whitespace (next-state parse-state) success success)
> >       (funcall failure parse-state)))
From: Peter Seibel
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <m38yvm8bvm.fsf@localhost.localdomain>
Russell McManus <···············@yahoo.com> writes:

> I have some Baker style code that I use for chopping up lisp
> expressions. It compiles into fast code, and it's pretty powerful. I
> mention it here because I think that a lot the problems with Baker's
> META language is the macro character business, and it might be
> interesting to show an alternative syntax to express META style
> parsers.

Heh. That's the same conclusion (that the macro chars are confusing) I
came to when I was working on my own META-inspired parsing scheme,
which I mentioned a few days back in this thread. Anyway, I dusted off
the rotten bits and put mine up on the web. (Seems everybody's got one
of these things lying around, these days!) Mine lets you write
grammars like the one below, which parses simple arithmetic
expressions and returns a tree of symbols and numbers that can then
(in this case) be passed to EVAL to compute the answer. The @- and
^-expressions are escapes that let you capture values. There's a fair
bit of punning between production names and variables that get created
for you in the macro expansion. And, of course, none of it is
documented yet.

The full parser generator, which is still a work in progress, is at:

  <http://www.gigamonkeys.com/lisp/parser.lisp>

while this parser is at:

  <http://www.gigamonkeys.com/lisp/math-parser.lisp>

A more complicated parser (lexer, actually) is a grammar for tokenizing
Java source code:

  <http://www.gigamonkeys.com/lisp/java-lexer.lisp>

I'm still working on building a full Java parser on top of that lexer
which will probably occasion some changes to the parser framework.

-Peter

(defpackage "MATH-PARSER"
  (:use "COMMON-LISP" "PARSER")
  (:export "ARITHMETIC" "CALCULATOR"))

(in-package "MATH-PARSER")

(defchartype digit     '(satisfies digit-char-p))
(defchartype term-op   '(member #\+ #\-))
(defchartype factor-op '(member #\/ #\*))
  
(defprod number (d)
  (+ (^ (@ digit (setq d (digit-char-p digit)))
        (+ (* 10 (or number 0)) d))))

(defprod expression ()
  ((^ term)
   (* (^ (term-op term)
         (list (find-symbol (string term-op) "COMMON-LISP")
               expression term)))))

(defprod term ()
  ((^ factor)
   (* (^ (factor-op factor)
         (list (find-symbol (string factor-op) "COMMON-LISP")
               term factor)))))

(defprod factor ()
  (/ (^ number)
     (#\( (^ expression) #\))))
  
(defparser arithmetic (^ expression))

(defun calculator (string)
  "Parse the given string and evaluate it as an arithmetic expression."
  (multiple-value-bind (ok tree) (arithmetic string)
    (when ok (eval tree))))
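
So, assuming the grammar behaves as described, (calculator "(3+4)*5")
should return 35.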



-- 
Peter Seibel                                      ·····@javamonkey.com

  The intellectual level needed   for  system design is  in  general
  grossly  underestimated. I am  convinced  more than ever that this
  type of work is very difficult and that every effort to do it with
  other than the best people is doomed to either failure or moderate
  success at enormous expense. --Edsger Dijkstra
From: Joerg Hoehle
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <u4r67v0be.fsf@dont.t-systems.UCE.spam.no.com>
Marco Antoniotti <·······@cs.nyu.edu> writes:
> > You can use continuation-passing-style in Common Lisp.
> Yes. That is what I meant. :) However, none of this is available in the 
> Baker-style code out there.

This cannot be true since Baker himself wrote papers about
continuation passing style (IIRC in papers on the fringe problem,
iterators, or linear variables, or ...)

> >   It works quite well:
> > (defun parse-digit (parse-state success failure)
> >   (let ((probe (current-character parse-state)))
> >     (if (digitp probe)
> >         (funcall success (digit-value probe) (next-state parse-state))
> >         (funcall failure parse-state))))

I don't think so, except for toy examples.  You'll soon hit stack
depth limits on larger data. All these FUNCALLs tend to pile up on the
stack. How far you can go largely depends on your Lisp system.

Here's a story: once, I wrote a sieve of Eratosthenes using lazy lists
(aka streams) in CLISP. Then I rewrote it in Scheme, using Peter
Norvig's Scheme-in-CL compiler (from his PAIP book).

Thanks to Scheme's mandatory tail-call-optimization (or whatever the
correct name is), *many* more primes were computed using
Scheme-under-CLISP than using CLISP directly, before the
implementation raised a stack overflow error.

Regards,
	Jörg Höhle
Telekom/T-Systems Technology Center
From: Joe Marshall
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <65qns6jt.fsf@ccs.neu.edu>
Joerg Hoehle <············@dont.t-systems.UCE.spam.no.com> writes:

> > Joe Marshall wrote:
> > >   It works quite well:
> > > (defun parse-digit (parse-state success failure)
> > >   (let ((probe (current-character parse-state)))
> > >     (if (digitp probe)
> > >         (funcall success (digit-value probe) (next-state parse-state))
> > >         (funcall failure parse-state))))
> 
> I don't think so, except for toy examples.  You'll soon hurt stack
> depth limits on larger data.  All these FUNCALL tend to pile up on the
> stack.  How far you can go largely depends on your Lisp system.

It is true that you *could* run out of stack, but there are mitigating
factors.  Not every piece of the parser needs to be written in
continuation-passing-style.  The moment you hit a `normal' return, the
non-optimized tail-calls will be discarded.  Alternatively, most major
lisps can be persuaded to do tail-call optimization given the
appropriate declarations.  Finally, you could rewrite the parse-digit
example as follows:

(defun parse-digit (parse-state)
  (let ((probe (current-character parse-state)))
    (if (digitp probe)
        (values t (next-state parse-state) (digit-value probe))
        (values nil parse-state nil))))

and call it thus:

 (multiple-value-bind (successp state value)
     (parse-digit parse-state)
   (if successp
       ....success condition....
       ....failure condition....))

This saves stack space at the expense of testing the condition twice.

I don't use CPS code all that frequently in Common Lisp, but sometimes
it is just the right tool and it is worth figuring out how to ensure
it will not overflow the stack.
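
One more option, not shown above: a trampoline.  Each step returns a
thunk instead of calling the next continuation directly, and a driver
loop bounces until a non-function value comes back, so the stack never
grows.  A sketch:

(defun drive (thunk)
  ;; Keep invoking thunks until a final (non-function) value appears.
  ;; (Assumes the final result is not itself a function.)
  (loop while (functionp thunk)
        do (setf thunk (funcall thunk))
        finally (return thunk)))

(defun parse-leading-whitespace (parse-state success failure)
  (if (whitespacep (current-character parse-state))
      ;; Return a thunk rather than making the tail call ourselves.
      (lambda ()
        (parse-leading-whitespace (next-state parse-state) success success))
      (lambda () (funcall failure parse-state))))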
From: Bob Riemenschneider
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <ce15782d.0303121655.7e1c08ef@posting.google.com>
Marco Antoniotti <·······@cs.nyu.edu> wrote in message news:<···············@typhoon.nyu.edu>... 
> The most annoying thing about it is that you do not have good built-in 
> and standardized backtracking facilities.  These are necessary when you 
> build recursive descent parsers.

We've been doing recursive descent parsing using a (slightly
optimized) version of the PAIP code for Prolog grammar rules in Lisp.
While not exactly maximally efficient, this turns out to be plenty
fast enough for our applications.

                                                          -- rar
From: Ray Blaak
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <ullzuh16x.fsf@telus.net>
Marco Antoniotti <·······@cs.nyu.edu> writes:
> We miss a portable and Lispy Lex/Yacc pair.
> 
> Please do not refer me to Baker's paper.  It does not cut it (even with 
> my hacks :) ).

Just roll your own recursive descent parsers. When written right, the code
tends to be no more complicated than grammar files *and* more expressive,
since you can easily code up special cases, error corrections, etc.

I used to use nice parsing tools. They were great. Except that I had to deal
with product versioning/licensing issues and bugs.

Then I started doing things directly. The result was fewer dependencies on
other parsing tools and the same order of complexity in terms of the work I
had to do. In fact things were simpler, because usually one writes a grammar
file, and then one writes code that interfaces with/implements each rule of
the grammar.

When you write your own recursive descent parser directly, this "double work"
collapses into a directly usable parse tree in the appropriate language and
data structure of your choice. Furthermore, the code implementing the parser
tends to look like a grammar file and be just as understandable.

I haven't looked back since.
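
To make that concrete: a rule like

  term -> factor (("*" | "/") factor)*

comes out looking almost like the grammar itself (a sketch only;
PARSE-FACTOR and the token representation are assumed, and both
functions return the tree plus the remaining tokens):

(defun parse-term (tokens)
  ;; term -> factor (("*" | "/") factor)*
  (multiple-value-bind (tree rest) (parse-factor tokens)
    (loop while (member (first rest) '(* /))
          do (let ((op (pop rest)))
               (multiple-value-bind (rhs rest2) (parse-factor rest)
                 (setf tree (list op tree rhs)
                       rest rest2))))
    (values tree rest)))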

--
Cheers,                                        The Rhythm is around me,
                                               The Rhythm has control.
Ray Blaak                                      The Rhythm is inside me,
·····@telus.net                                The Rhythm has my soul.
From: Rob Warnock
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <28ecnbm8eJ3OKfijXTWc-g@speakeasy.net>
Ray Blaak  <·····@telus.net> wrote:
+---------------
| Marco Antoniotti <·······@cs.nyu.edu> writes:
| > We miss a portable and Lispy Lex/Yacc pair.
| > Please do not refer me to Baker's paper.  It does not cut it...
| 
| Just roll your own recursive descent parsers. When written right, the code
| tends to be no more complicated than grammar files *and* more expressive,
| since you can easily code up special cases, error corrections, etc.
+---------------

I completely agree with this, with one small extension borrowed from
the front-ends of the BLISS compiler family (~30 yrs ago), useful in
those cases where it makes sense:

Yes, use recursive descent for the "major" or "outer" syntax -- say, in
the case of an Algol-like programming language, the control structures
and declarations -- but consider using simple operator precedence for
infix arithmetic expressions. Simple operator precedence can easily be
nested inside a recursive descent parser by assigning very high precedence
to "keywords" in the outer syntax and left-brackets/parens/etc., and by
assigning very low precedence to "stoppers" (right-brackets/parens/commas/
semicolons/EOF).

Compared to a "pure" recursive descent parser, when parsing the most
common infix arithmetic expressions (and especially with syntax like
that of C which has so many levels of precedence!) using simple operator
precedence eliminates the hordes of dummy interior parse nodes you would
otherwise end up with that do nothing but maintain the default operator
grouping priority.

[Though you don't want to use operator precedence for the *whole* parser,
either, since that has other (sometimes more serious) problems. For most
languages, the combination-style parser is easier/better than either alone.]
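
Here's the shape of the thing in CL, stripped down to a toy (a token
list instead of a real lexer; all names made up): operators carry
precedences, and everything else (keywords, right brackets, EOF)
defaults to zero, so the expression parser simply stops when it hits
a "stopper".

(defparameter *precedence* '((+ . 1) (- . 1) (* . 2) (/ . 2)))

(defun op-precedence (token)
  ;; Operators get their table entry; anything else is a "stopper"
  ;; with precedence 0.
  (or (cdr (assoc token *precedence*)) 0))

(defun parse-expression (tokens min-prec)
  ;; TOKENS is a list of operands and operator symbols.  Returns two
  ;; values: the parse tree and the unconsumed tokens.  Call it with
  ;; MIN-PREC = 1; recursing with precedence+1 makes the operators
  ;; left-associative.
  (let ((lhs (pop tokens)))
    (loop for op = (first tokens)
          while (>= (op-precedence op) min-prec)
          do (pop tokens)
             (multiple-value-bind (rhs rest)
                 (parse-expression tokens (1+ (op-precedence op)))
               (setf lhs (list op lhs rhs)
                     tokens rest)))
    (values lhs tokens)))

(parse-expression '(1 + 2 * 3 - 4) 1)
=> (- (+ 1 (* 2 3)) 4)

Note that no dummy term/factor nodes ever get built.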

If this isn't clear I can dig up some sample code (though I'm afraid
it will be in Scheme, that being the language I used it in most
recently)...


-Rob

p.s. One more trick from the BLISS compilers: Instead of a one-token
lookahead, they used a *four*-token "lexical window" (the main interface
between the lexer & the parser), with the lexeme (token) slots labelled
SYM, DEL, FUTSYM, & FUTDEL (where FUT == "future", that is, "next").
Only "operators" (including keywords and delimiters) went into the *DEL
slots, and only "values" (source constants, variables, or reduced
parse-tree nodes representing values) went into the SYM slot(s). Calling
the lexer (named "RUND" -- Read Until Next Delimiter) shifted the window,
filling the FUTSYM (if it could) and FUTDEL slots from the program text
stream. (Given the BLISS syntax [very close to LL(1)], it was possible
for a SYM slot to sometimes be null [e.g., adjacent operators], but never
a DEL slot.)

Sounds complicated, but actually it's easier to *do* than describe!  ;-}  ;-}

-----
Rob Warnock, PP-ASEL-IA		<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Steven E. Harris
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <87el5ld2bi.fsf@harris.sdo.us.ray.com>
····@rpw3.org (Rob Warnock) writes:

> One more trick from the BLISS compilers: Instead of a one-token
> lookahead, they used a *four*-token "lexical window" (the main
> interface between the lexer & the parser), with the lexeme (token)
> slots labelled SYM, DEL, FUTSYM, & FUTDEL (where FUT == "future",
> that is, "next").]

Sorry to reply to only an ancillary topic of your post, but this
sketch piqued my interest. Can you provide a brief sketch of how some
sample data would get scanned into and moved through these four token
slots?

-- 
Steven E. Harris        :: ········@raytheon.com
Raytheon                :: http://www.raytheon.com
From: Ray Blaak
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <uof4psmwt.fsf@telus.net>
····@rpw3.org (Rob Warnock) writes:
> p.s. One more trick from the BLISS compilers: Instead of a one-token
> lookahead, they used a *four*-token "lexical window" [...]

In the parsers I write, one can do arbitrary lookahead whenever it is
needed. Usually one is asking the tokenizer for the next token, but sometimes
you can ask if the next N tokens are of a particular form, can backtrack to
any point, etc.

This is what I meant by having more expressive parsers when things are done
directly: since you are doing everything completely in your programming
language of choice instead of a grammar file, you can write arbitrary code for
any exceptional situations.

-- 
Cheers,                                        The Rhythm is around me,
                                               The Rhythm has control.
Ray Blaak                                      The Rhythm is inside me,
·····@telus.net                                The Rhythm has my soul.
From: Joe Marshall
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <1y1l3k77.fsf@ccs.neu.edu>
Ray Blaak <·····@telus.net> writes:

> Marco Antoniotti <·······@cs.nyu.edu> writes:
> > We miss a portable and Lispy Lex/Yacc pair.
> > 
> > Please do not refer me to Baker's paper.  It does not cut it (even with 
> > my hacks :) ).
> 
> Just roll your own recursive descent parsers. 

I've had a lot of luck with this approach as well when META didn't do
what I wanted.

You might want to look at `packrat parsing'

@inproceedings{ ford02icfp,
    author = "Bryan Ford",
    title = "Packrat Parsing:
                Simple, Powerful, Lazy, Linear Time",
    booktitle = "Proceedings of the 2002 International Conference
                on Functional Programming",
    year = "2002",
    month = "Oct",
    url = "citeseer.nj.nec.com/534158.html" }

or the parser-tools collection of PLT Scheme which provides a LEX/YACC
facility.
From: Ray Blaak
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <uof4pey21.fsf@telus.net>
Joe Marshall <···@ccs.neu.edu> writes:
> You might want to look at `packrat parsing'
> 
> @inproceedings{ ford02icfp,
>     author = "Bryan Ford",
>     title = "Packrat Parsing:
>                 Simple, Powerful, Lazy, Linear Time",
>     booktitle = "Proceedings of the 2002 International Conference
>                 on Functional Programming",
>     year = "2002",
>     month = "Oct",
>     url = "citeseer.nj.nec.com/534158.html" }

Very interesting. Thanks for the reference.

--
Cheers,                                        The Rhythm is around me,
                                               The Rhythm has control.
Ray Blaak                                      The Rhythm is inside me,
·····@telus.net                                The Rhythm has my soul.
From: Ng Pheng Siong
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <b449oa$4cd$1@reader01.singnet.com.sg>
According to Marco Antoniotti  <·······@cs.nyu.edu>:
> We miss a portable and Lispy Lex/Yacc pair.
> 
> Please do not refer me to Baker's paper.  It does not cut it (even with 
> my hacks :) ).
> 
> We need something along the lines of ANTLR in Java.  But it should be Lispy.

I tried the Meta approach. I tried writing a recursive-descent parser
directly. Gave up after half a day.

Then I installed Prolog and tried writing DCG rules. So far it's been very
easy; the grammar rules are almost a direct translation from the
specification. I simply have my parser spit out s-expressions to be
consumed by my existing Lisp stuff.

I notice there are Prolog/DCG implementations in Lisp. Given my copious
spare time, I'll try those next.

(In my dreams I have long-running Lisp and Prolog processes talking to each
other using the mod_lisp protocol, trading Prolog clauses and Lisp
s-expressions. Real Soon Now!)


-- 
Ng Pheng Siong <····@netmemetic.com> 

http://firewall.rulemaker.net  -+- Improve Your Firewall Operation ROI
http://www.post1.com/home/ngps -+- Open Source Python Crypto & SSL
From: Fred Gilham
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <u7n0k96exp.fsf@snapdragon.csl.sri.com>
> I tried the Meta approach. I tried writing a recursive-descent
> parser directly. Gave up after half a day.
> 
> Then I installed Prolog and tried writing DCG rules. So far it's
> been very easy; the grammar rules are almost a direct translation
> from the specification. I simply have my parser spit out
> s-expressions to be consumed by my existing Lisp stuff.
> 
> I notice there are Prolog/DCG implementations in Lisp. Given my
> copious spare time, I'll try those next.

We use Peter Norvig's stuff to do this.

We wind up with grammar files that look like the following:

      ....

(rule (formal-parameters-or-epsilon (formal-parameters-or-epsilon ?fpl)) -->
      (:word |(|)
      (formal-parameter-list ?fpl)
      (:word |)|))
(rule (formal-parameters-or-epsilon ()) -->
      (:word |(|)
      (:word |)|))
(rule (formal-parameters-or-epsilon ()) --> :epsilon)

(rule (formal-parameter-list (?fp . ?fps)) -->
      (formal-parameter ?fp)
      (:word |,|)
      (formal-parameter-list ?fps))
(rule (formal-parameter-list (formal-parameter-list ?fp)) -->
      (formal-parameter ?fp))


(rule (formal-parameter (formal-parameter ?idml)) -->
      (id-or-meta-list ?idml)
      (:word |:|)
      (syntactic-class ?sc))

      ....

Once you get the hang of it, it's not hard.  It can be tedious to
debug, though.

-- 
Fred Gilham                                        ······@csl.sri.com
The spam folder --- you will never find a more wretched hive of scum
and villainy.  We must be cautious.
From: Bob Riemenschneider
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <ce15782d.0303121708.3a8c117e@posting.google.com>
Fred Gilham <······@snapdragon.csl.sri.com> wrote in message news:<··············@snapdragon.csl.sri.com>...
> > I tried the Meta approach. I tried writing a recursive-descent
> > parser directly. Gave up after half a day.
> > 
> > Then I installed Prolog and tried writing DCG rules. So far it's
> > been very easy; the grammar rules are almost a direct translation
> > from the specification. I simply have my parser spit out
> > s-expressions to be consumed by my existing Lisp stuff.
> > 
> > I notice there are Prolog/DCG implementations in Lisp. Given my
> > copious spare time, I'll try those next.
> 
> We use Peter Norvig's stuff to do this.
> 
> We wind up with grammar files that look like the following:
> 
>       ....
> 
> (rule (formal-parameters-or-epsilon (formal-parameters-or-epsilon ?fpl)) -->
>       (:word |(|)
>       (formal-parameter-list ?fpl)
>       (:word |)|))
> (rule (formal-parameters-or-epsilon ()) -->
>       (:word |(|)
>       (:word |)|))
> (rule (formal-parameters-or-epsilon ()) --> :epsilon)
> 
> (rule (formal-parameter-list (?fp . ?fps)) -->
>       (formal-parameter ?fp)
>       (:word |,|)
>       (formal-parameter-list ?fps))
> (rule (formal-parameter-list (formal-parameter-list ?fp)) -->
>       (formal-parameter ?fp))
> 
> 
> (rule (formal-parameter (formal-parameter ?idml)) -->
>       (id-or-meta-list ?idml)
>       (:word |:|)
>       (syntactic-class ?sc))
> 
>       ....
> 
> Once you get the hang of it, it's not hard.  It can be tedious to
> debug, though.


Actually, it's not too hard to modify the Norvig stuff (or Prolog
grammar rules) so that the parsing keeps track of the farthest it ever
got in the token stream.  This gives you debugging comparable to what
you get with standard parser generators (i.e., "everything looks okay
to here, but I couldn't figure out how to go any farther").
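
The modification is tiny; the idea is just a high-water mark that
every token-consuming step updates (names hypothetical):

(defvar *high-water-mark* 0)

(defun advance (position)
  ;; Call this everywhere a rule consumes a token.
  (setf *high-water-mark* (max *high-water-mark* (1+ position)))
  (1+ position))

(defun report-failure (tokens)
  (format t "~&Parse failed; everything looked okay up to token ~D (~S).~%"
          *high-water-mark* (nth *high-water-mark* tokens)))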

                                                           -- rar
From: Andreas Hinze
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <3E663120.838067B2@smi.de>
Hi,
Maybe I don't understand the problem, but there is a nice (and fast ;-)
lexer in Michael Parker's CLAWK package. It is very portable (I tested it on
CMUCL, CLISP, LWW & Corman) and produces (as Michael said) output that is
compatible with parsers from the LispWorks parser generator tool.

And lalr.lisp is (obviously) an LALR parser generator from the CMU repository
(if I remember right). I haven't used it until now, just played with the
demo in the source (btw. if anyone has experience with this code please let
me know).

Sincerly
AHz



Marco Antoniotti wrote:
> 
> Hi
> 
> IMHO we have a different problem altogether here.
> 
> We miss a portable and Lispy Lex/Yacc pair.
> 
> Please do not refer me to Baker's paper.  It does not cut it (even with
> my hacks :) ).
> 
> We need something along the lines of ANTLR in Java.  But it should be Lispy.
> 
> E.g. I want something with the following interface.
> 
>         define-grammar name (&optional args) &optional options
> 
>         define-rule (name grammar &key options) lambda-list LHS ==> RHS
> 
> Where DEFINE-RULE causes the grammar and the parser to be modified
> dynamically.
> 
> The main entry point would be a generic function
> 
>         parse stream grammar &key &allow-other-keys
> 
> where STREAM could be a string or a stream etc etc.
> 
> ... and now I hide my hand, after having thrown the stone.
> 
> Cheers
> 
> --
> Marco Antoniotti
> 
> Henrik Motakef wrote:
> 
> > Hi,
> >
> > I know that this is kind of a heretical question here, but anyway:
> >
> > I'm looking for a library to work with XML. I know that there are a
> > few options around (CL-XML, CLLIB xml, UncommonXML, some Allegro stuff
> > that I don't know the name of), but none of them seem to fit my needs.
> >
> > It would be very useful if I could access "low-level" information
> > about the syntactic structure of the processed documents, like what
> > appears in which storage unit, the character encoding used in them,
> > unexpanded character/general entities, internal DTD structure
> > including parameter entities, conditional sections (included or not),
> > internal subsets etc. That is, the XML Infoset actually is /not/
> > enough for what I want to do. I want to modify documents on a physical
> > level, leaving features like the ones mentioned intact.
> >
> > On the other hand, high-level access, up to XML namespaces, would be
> > handy sometimes, too. But I guess it's way easier to build that myself
> > upon a low-level library than the other way around.
> >
> > Cross-CL-Implementation portability would be a huge plus. I'm
> > currently using CMUCL/SBCL on Unix, and would not like having to
> > switch. Users of the stuff I intend to write with it might well have
> > different preferences. (I guess Unicode handling will be a huge
> > problem here.)
> >
> > My impression is that these kinds of things are not really popular
> > among XML folks, so maybe I'll have more luck with an older(?) SGML
> > package. However, I also don't know of /any/ SGML impl for CL whatsoever,
> > although I'm fairly sure there have to be some. Since my input
> > documents will most often be XML, it would be good if the library
> > either supported WebSGML already or were easily hackable to do so.
> >
> > Any hints are highly appreciated.
> >
> > tia
> > Henrik
From: Nils Goesche
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <ly65qx3e9h.fsf@cartan.de>
Andreas Hinze <···@smi.de> writes:

> Maybe I don't understand the problem, but there is a nice (and fast
> ;-) lexer in Michael Parker's CLAWK package. It is very portable (I
> tested it on CMUCL, CLISP, LWW & Corman) and produces (as Michael
> said) output that is compatible with parsers from the LispWorks
> parser generator tool.

Indeed, they work nicely together.  A simple example:

(defpackage "CALC"
  (:use "COMMON-LISP" "LEXER" "PARSERGEN")
  (:export "CALC"))

(in-package "CALC")

(deflexer lex
  ("[:digit:]+" (return (values 'number (num %0))))
  ("\\+" (return 'plus))
  ("-" (return 'minus))
  ("\\*" (return 'times))
  ("/" (return 'div))
  ("\\(" (return 'lparen))
  ("\\)" (return 'rparen))
  ("[:space:]+"))

(defparser parse
  ((expr sum) $1)
  ((sum sum plus mult) `(+ ,$1 ,$3))
  ((sum sum minus mult) `(- ,$1 ,$3))
  ((sum mult) $1)
  ((mult mult times atom) `(* ,$1 ,$3))
  ((mult mult div atom) `(/ ,$1 ,$3))
  ((mult atom) $1)
  ((atom minus atom) `(- ,$2))
  ((atom plus atom) `(+ ,$2))
  ((atom number) $1)
  ((atom lparen sum rparen) $2))

(defun calc ()
  (eval (parse (lex (read-line)))))

CL-USER 29 > (calc:calc)
-(2 + 3 * -5) * 4 / 13
4

IMHO lexing is one of the (maybe exceptional) cases where regular
expressions are simply the Right Thing.

Regards,
-- 
Nils Gösche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0
From: Marco Antoniotti
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <5VL9a.52$qc.1310@typhoon.nyu.edu>
Hi

I know about CLAWK.  It is a very good package.

Alas, it is only one piece of the puzzle.  DEFPARSER works on LW, not 
on, say, MCL.

Cheers




Nils Goesche wrote:

> Andreas Hinze  writes:
>
>
> >Maybe I don't understand the problem, but there is a nice (and fast
> >;-) lexer in Michael Parker's CLAWK package. It is very portable (I
> >tested it on CMUCL, CLISP, LWW & Corman) and produces (as Michael
> >said) output that is compatible with parsers from the LispWorks
> >parser generator tool.
>
>
> Indeed, they work nicely together.  A simple example:
>
> (defpackage "CALC"
>   (:use "COMMON-LISP" "LEXER" "PARSERGEN")
>   (:export "CALC"))
>
> (in-package "CALC")
>
> (deflexer lex
>   ("[:digit:]+" (return (values 'number (num %0))))
>   ("\\+" (return 'plus))
>   ("-" (return 'minus))
>   ("\\*" (return 'times))
>   ("/" (return 'div))
>   ("\\(" (return 'lparen))
>   ("\\)" (return 'rparen))
>   ("[:space:]+"))
>
> (defparser parse
>   ((expr sum) $1)
>   ((sum sum plus mult) `(+ ,$1 ,$3))
>   ((sum sum minus mult) `(- ,$1 ,$3))
>   ((sum mult) $1)
>   ((mult mult times atom) `(* ,$1 ,$3))
>   ((mult mult div atom) `(/ ,$1 ,$3))
>   ((mult atom) $1)
>   ((atom minus atom) `(- ,$2))
>   ((atom plus atom) `(+ ,$2))
>   ((atom number) $1)
>   ((atom lparen sum rparen) $2))
>
> (defun calc ()
>   (eval (parse (lex (read-line)))))
>
> CL-USER 29 > (calc:calc)
> -(2 + 3 * -5) * 4 / 13
> 4
>
> IMHO lexing is one of the (maybe exceptional) cases where regular
> expressions are simply the Right Thing.
>
> Regards,
From: Nils Goesche
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <lyy93s1ght.fsf@cartan.de>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> I know about CLAWK.  It is a very good package.
> 
> Alas, it is only one piece of the puzzle.  DEFPARSER works on LW,
> not on, say, MCL.

Yes, that's a pity; but IMHO DEFLEXER already solves the hard part.
Once you've got a stream of tokens, writing a parser is a piece of
cake.  Hand-written ad hoc lexers, OTOH, tend to be slow, consing
monsters...

Regards,
-- 
Nils Gösche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x0655CFA0
From: Andreas Hinze
Subject: Re: We need a Lex/Yacc for Common Lisp (Re: Processing (SG|X)ML)
Date: 
Message-ID: <3E67AC32.A6005059@smi.de>
But what about LALR.LISP from the CMU repository? As I wrote in my last
mail, it seems to handle the parsing job.
Best
AHz

Marco Antoniotti wrote:
> 
> Hi
> 
> I know about CLAWK.  It is a very good package.
> 
> Alas, it is only one piece of the puzzle.  DEFPARSER works on LW, not
> on, say, MCL.
> 
> Cheers
> 
> Nils Goesche wrote:
> 
> > Andreas Hinze  writes:
> >
> >
> > >Maybe I don't understand the problem, but there is a nice (and fast
> > >;-) lexer in Michael Parker's CLAWK package. It is very portable (I
> > >tested it on CMUCL, CLISP, LWW & Corman) and produces (as Michael
> > >said) output that is compatible with parsers from the LispWorks
> > >parser generator tool.
> >
> >
> > Indeed, they work nicely together.  A simple example:
> >
> > (defpackage "CALC"
> >   (:use "COMMON-LISP" "LEXER" "PARSERGEN")
> >   (:export "CALC"))
> >
> > (in-package "CALC")
> >
> > (deflexer lex
> >   ("[:digit:]+" (return (values 'number (num %0))))
> >   ("\\+" (return 'plus))
> >   ("-" (return 'minus))
> >   ("\\*" (return 'times))
> >   ("/" (return 'div))
> >   ("\\(" (return 'lparen))
> >   ("\\)" (return 'rparen))
> >   ("[:space:]+"))
> >
> > (defparser parse
> >   ((expr sum) $1)
> >   ((sum sum plus mult) `(+ ,$1 ,$3))
> >   ((sum sum minus mult) `(- ,$1 ,$3))
> >   ((sum mult) $1)
> >   ((mult mult times atom) `(* ,$1 ,$3))
> >   ((mult mult div atom) `(/ ,$1 ,$3))
> >   ((mult atom) $1)
> >   ((atom minus atom) `(- ,$2))
> >   ((atom plus atom) `(+ ,$2))
> >   ((atom number) $1)
> >   ((atom lparen sum rparen) $2))
> >
> > (defun calc ()
> >   (eval (parse (lex (read-line)))))
> >
> > CL-USER 29 > (calc:calc)
> > -(2 + 3 * -5) * 4 / 13
> > 4
> >
> > IMHO lexing is one of the (maybe exceptional) cases where regular
> > expressions are simply the Right Thing.
> >
> > Regards,
From: Robert Braddock
Subject: Re: Processing (SG|X)ML
Date: 
Message-ID: <slrnb6cvkb.uo7.xethair@ogre.concordant-thought.com>
In article <··············@interim.henrik-motakef.de>, Henrik Motakef wrote:
> My impression is that these kinds of things are not really popular
> among XML folks, so maybe I'll have more luck with an older(?) SGML
> package. However, I also don't know of /any/ SGML impl for CL whatsoever,
> although I'm fairly sure there have to be some. Since my input

I'm not any kind of authority, but I don't know of any CL SGML parsers.  I
think your best bet would be to use the OpenSP SGML/XML parser, either by
linking into the C++ library or by adding the information you want to the
onsgmls tool's output and reading that into your lisp.

If you'd like to throw some ideas around on the latter, or generalize the
ESIS idea into something for needs like yours, that would be fun so feel
free to write me.  

--
Robert Braddock

(And if anyone in the DC area would like to hire someone interested in
 Lisp who thinks SGML is "fun," well, that's probably just me, y'know... ;)
From: Lieven Marchand
Subject: Re: Processing (SG|X)ML
Date: 
Message-ID: <87u1egrw2t.fsf@wyrd.be>
Robert Braddock <······@concordant-thought.com> writes:

> I'm not any kind of authority, but I don't know of any CL SGML parsers.  I
> think your best bet would be to use the OpenSP SGML/XML parser, either by
> linking into the C++ library or by adding the information you want to the
> onsgmls tool's output and reading that into your lisp.

There's also the XML parser at www.cl-xml.org.

-- 
And I'm wilder than her and it drives her out of her mind
I guess she thought that she was just one of a kind
But she's a summer storm and I'm a hurricane
One just blows through town, one blows the town away
From: Markus Fix
Subject: Re: Processing (SG|X)ML
Date: 
Message-ID: <3E677C5A.8010503@otop.de>
Henrik,

you might want to have a look at STIL:

ftp://ftp.th-darmstadt.de/pub/text/sgml/stil/README

-fix
From: james anderson
Subject: processing xml [was Re: Processing (SG|X)ML]
Date: 
Message-ID: <3E6B5EC8.BAF27E4B@setf.de>
hello;

Henrik Motakef wrote:
> 
> Hi,
> 
> I know that this is kind of a heretical question here, but anyway:
> 
> I'm looking for a library to work with XML. I know that there are a
> few options around (CL-XML, CLLIB xml, UncommonXML, some Allegro stuff
> that I don't know the name of), but none of them seem to fit my needs.
> 
> It would be very useful if I could access "low-level" information
> about the syntactic structure of the processed documents, like what
> appears in which storage unit, the character encoding used in them,
> unexpanded character/general entities, internal DTD structure
> including parameter entities, conditional sections (included or not),
> internal subsets etc. That is, the XML Infoset actually is /not/
> enough for what I want to do. I want to modify documents on a physical
> level, leaving features like the ones mentioned intact.
> 

to which of the above does cl-xml not afford access?

> On the other hand, high-level access, up to XML namespaces, would be
> handy sometimes, too. But I guess it's way easier to build that myself
> upon a low-level library than the other way around.

it depends. my conclusion, from observing the travails of the java-based
xml-world, is no, that's a bad idea. by default, cl-xml will intern
universal names to symbols or name instances, including the dtd components.

as i understand the things on your list, 0.918 can provide you with all
information except the storage unit. a distinction between document scope
and entity scope was not introduced until later. if it matters, the changes
are minor. as to the other items, by default cl-xml intends to construct an
infoset-"conform" document model. one can turn that off and make it leave
things unexpanded, but i have to admit i don't use it that way, so it has
not been tested in that mode. if you want, you can use it selectively as an
event-oriented parser and capture whatever information you need.

> 
> Cross-CL-Implementation portability would be a huge plus. I'm
> currently using CMUCL/SBCL on Unix, and would not like having to
> switch. Users of the stuff I intend to write with it might well have
> different preferences. (I guess Unicode handling will be a huge
> problem here.)
> 

cl-xml 0.918 runs in mcl, allegro, lispworks.
i'm not sure of cmucl status. from time to time i get mail from cmucl users
about problems. i write to them with suggestions, but i never hear back, so
i don't know the actual compatibility status.
i observe that sbcl is ported to openmcl, so i could look at compatibility there.

if you intend to work with unicode, i'd be very interested in what you're doing. i'm trying to figure out how to handle scalar values > (unsigned-byte 16) for
the next release. one suggestion has been to use private use code points and register the (unsigned-byte 32) scalar value and/or the surrogate pair. i'm looking
for feedback on how people would need to decode the result subsequent processing.

...
From: Henrik Motakef
Subject: Re: processing xml [was Re: Processing (SG|X)ML
Date: 
Message-ID: <877kb84j1u.fsf@interim.henrik-motakef.de>
james anderson <··············@setf.de> writes:

> > It would be very useful if I could access "low-level" information
> > about the syntactic structure of the processed documents, like what
> > appears in which storage unit, the character encoding used in them,
> > unexpanded character/general entities, internal DTD structure
> > including parameter entities, conditional sections (included or not),
> > internal subsets etc. That is, the XML Infoset actually is /not/
> > enough for what I want to do. I want to modify documents on a physical
> > level, leaving features like the ones mentioned intact.
> > 
> 
> to which of the above does cl-xml not afford access?

I don't really know, given I haven't got it running yet. :-)

That said, from looking at cl-xml's documentation and parts of the
code, I think it's probably the way to go. It looks a /lot/ better
than the other options. (The fact that it actually /has/ documentation
alone makes it stand out.)


> as i understand the things on your list, 0.918 can provide you with
> all information except the storage unit. a distinction between
> document scope and entity scope was not introduced until later. if
> it matters, the changes are minor. 

Is this by any chance included in the patches you uploaded recently?

> cl-xml 0.918 runs in mcl, allegro, lispworks.  i'm not sure of cmucl
> status. from time to time i get mail from cmucl users about
> problems. i write to them with suggestions, but i never hear back,
> so i don't know the actual compatibility status.

I'm just trying to get it to work with CMUCL. So far, I have
successfully compiled the atn-parser (but not tested it yet) and fixed
some stuff in the xqdm module; both problems were due to MOP issues
(standard-class vs. pcl::standard-class etc.). I'm currently
struggling with xqdm-conditions.lisp; apparently CMUCL can't find the
slots of conditions created with the defWFC etc. macros, for some
reason.

Let's see how far I get. I'll keep you informed.

Regards
Henrik
From: Adam Warner
Subject: Re: processing xml [was Re: Processing (SG|X)ML
Date: 
Message-ID: <pan.2003.03.10.02.29.29.511130@consulting.net.nz>
Hi james anderson,

> if you intend to work with unicode, i'd be very interested in what
> you're doing. i'm trying to figure out how to handle scalar values >
> (unsigned-byte 16) for the next release. one suggestion has been to use
> private use code points and register the (unsigned-byte 32) scalar value
> and/or the surrogate pair. i'm looking for feedback on how people would
> need to decode the result subsequent processing.

I've been thinking about using fixnums for storing Unicode characters.
Boxing is not needed because the full code point range, supplementary
characters included, only needs 21 bits:
http://oss.software.ibm.com/icu/docs/papers/unicode_wchar_t.html

   Unicode was designed from the ground up, and it is not byte based. It
   specifies a code point range from 0 to 0x10ffff, so that a data type
   for single Unicode characters needs to be at least 21 bits wide.

This still leaves room for growth on 32-bit platforms. Being able to
compare unique fixnums should be much more efficient than trying to
support extraction and comparison of double-width UTF-16 encoded
characters. Supporting double-width UTF-16 could break random access to
individual characters unless one moved to 32-bit encoding.
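
Concretely, something like this is all I mean (a minimal sketch;
MAKE-USTRING and UCHAR= are invented names, nothing
implementation-specific):

   ;; one fixnum per Unicode scalar value; #x10FFFF needs only
   ;; 21 bits, well within a 32-bit fixnum
   (defun make-ustring (code-points)
     (make-array (length code-points)
                 :element-type 'fixnum
                 :initial-contents code-points))

   (defun uchar= (s1 i1 s2 i2)
     ;; constant-time random access, and EQL suffices because
     ;; the elements are fixnums, which compare by value
     (eql (aref s1 i1) (aref s2 i2)))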

I suspect quite fast Unicode support could be built into platforms like
CMUCL without having to rewrite the internal structures. A double-quote
macro could be written to construct strings in the new type. And the
functions that operate on strings could be progressively rewritten to test
and create the new type when desired.
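
As for the double-quote macro: rather than rebinding #\" itself, a
dispatching variant is easier to sketch (building on MAKE-USTRING
above, names still invented):

   ;; #u"..." reads an ordinary string and converts it; the reader
   ;; upcases dispatch sub-characters, hence #\U here
   (set-dispatch-macro-character
    #\# #\U
    (lambda (stream subchar arg)
      (declare (ignore subchar arg))
      (make-ustring (map 'list #'char-code (read stream t nil t)))))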

Regards,
Adam
From: Joe Marshall
Subject: Re: processing xml [was Re: Processing (SG|X)ML
Date: 
Message-ID: <7kb7thq9.fsf@ccs.neu.edu>
"Adam Warner" <······@consulting.net.nz> writes:

> I've been thinking about using fixnums for storing Unicode characters.

This is certainly a reasonable approach.  (See Unicode Standard Annex
#19, UTF-32)

> Boxing is not needed because the supplementary character set only needs to
> be 21 bits wide:
> http://oss.software.ibm.com/icu/docs/papers/unicode_wchar_t.html
> 
>    Unicode was designed from the ground up, and it is not byte based. It
>    specifies a code point range from 0 to 0x10ffff, so that a data type
>    for single Unicode characters needs to be at least 21 bits wide.

There are a couple of things to consider.  First, most western scripts
have fewer than 256 characters, so your encoding is likely to be
rather sparse.  Second, some OS's use UCS-2 (similar to UTF-16)
internally.  There may be performance issues if you are converting
between UTF-32 and UCS-2.
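
The per-character surrogate math for the UTF-16 case is simple enough;
the cost is in doing it across every string that crosses the boundary.
A sketch (function names invented, no error checking):

   ;; UTF-32 scalar value -> UTF-16 code unit(s)
   (defun utf16-units (cp)
     (if (< cp #x10000)
         (list cp)
         (let ((v (- cp #x10000)))
           (list (+ #xD800 (ash v -10))            ; high surrogate
                 (+ #xDC00 (logand v #x3FF))))))   ; low surrogate

   ;; and back again
   (defun utf16-pair->cp (high low)
     (+ #x10000 (ash (- high #xD800) 10) (- low #xDC00)))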
 
From: Adam Warner
Subject: Re: processing xml [was Re: Processing (SG|X)ML
Date: 
Message-ID: <pan.2003.03.10.22.52.55.403506@consulting.net.nz>
Hi Joe Marshall,

>> I've been thinking about using fixnums for storing Unicode characters.
> 
> This is certainly a reasonable approach.  (See Unicode Standard Annex
> #19, UTF-32)
> 
>> Boxing is not needed because the supplementary character set only needs
>> to be 21 bits wide:
>> http://oss.software.ibm.com/icu/docs/papers/unicode_wchar_t.html
>> 
>>    Unicode was designed from the ground up, and it is not byte based.
>>    It specifies a code point range from 0 to 0x10ffff, so that a data
>>    type for single Unicode characters needs to be at least 21 bits
>>    wide.
> 
> There are a couple of things to consider.  First, most western scripts
> have fewer than 256 characters, so your encoding is likely to be rather
> sparse.

Indeed. One could consider using internal UTF-8 encoding with 8-bit
strings instead of fixnums, at the loss of constant-time access to any
character (character access would have access properties similar to
lists). Text in most western scripts would turn out only about 1.1
times larger.

Better might be to promote an 8-bit string to fixnum storage only when
needed, using some form of contagion. The 8-bit strings should return
the same char-code values as the equivalent unicode strings.
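
A rough sketch of the decoding half of that promotion (UTF8-DECODE is
an invented name; just the arithmetic, no validation of the input):

   (defun utf8-decode (octets)
     ;; vector of UTF-8 octets -> vector of fixnum code points
     (let ((out '()) (i 0) (n (length octets)))
       (loop while (< i n) do
         (let* ((b (aref octets i))
                (len (cond ((< b #x80) 1)      ; 0xxxxxxx
                           ((< b #xE0) 2)      ; 110xxxxx
                           ((< b #xF0) 3)      ; 1110xxxx
                           (t 4)))             ; 11110xxx
                (cp (if (= len 1) b (ldb (byte (- 7 len) 0) b))))
           ;; fold in the six payload bits of each continuation octet
           (loop for k from 1 below len do
             (setf cp (logior (ash cp 6)
                              (ldb (byte 6 0) (aref octets (+ i k))))))
           (push cp out)
           (incf i len)))
       (make-array (length out) :element-type 'fixnum
                   :initial-contents (nreverse out))))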

> Second, some OS's use UCS-2 (similar to UTF-16) internally.  There may
> be performance issues if you are converting between UTF-32 and UCS-2.

If an operating system uses UCS-2 internally it doesn't support the
supplementary character set[1]. I don't think Unicode should be
implemented in a Lisp today without supporting the full character set.

Regards,
Adam

[1] <http://www.cl.cam.ac.uk/~mgk25/unicode.html>:

   "Unicode" originally implied that the encoding was UCS-2 and it
   initially didn't make any provisions for characters outside the BMP
   (U+0000 to U+FFFF). When it became clear that more than 64k characters
   would be needed for certain special applications (historic alphabets
   and ideographs, mathematical and musical typesetting, etc.), Unicode
   was turned into a sort of 21-bit character set with possible code
   points in the range U-00000000 to U-0010FFFF. The 2×1024 surrogate
   characters (U+D800 to U+DFFF) were introduced into the BMP to allow
   1024×1024 non-BMP characters to be represented as a sequence of two
   16-bit surrogate characters. This way UTF-16 was born, which represents
   the extended "21-bit" Unicode in a way backwards compatible with UCS-2.
From: Adam Warner
Subject: Re: Supporting all of Unicode [was Re: processing xml [was Re: Processing (SG|X)ML]
Date: 
Message-ID: <pan.2003.03.11.01.09.23.388591@consulting.net.nz>
It has been expressed to me privately that (unsigned-byte 21) would be
non-portable. That's not what I was proposing. Fixnums are massively
optimised on all Lisp implementations. On 32-bit systems these fixnums
contain enough free bits to store every character code. To make the code
fast on 64-bit systems one could just check the size of the fixnum and
store two characters per fixnum when space permits.
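
Roughly what I mean by two per fixnum (PACK2 and UNPACK2 invented for
illustration):

   ;; two 21-bit scalar values in one (64-bit) fixnum
   (defun pack2 (a b)
     (logior (ash a 21) b))

   (defun unpack2 (packed)
     (values (ash packed -21)            ; first character
             (ldb (byte 21 0) packed)))  ; second character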

I consider it far more important that all rare codes be handled correctly
rather than ignoring the consequences of them appearing. UTF-16 will break
constant-time random access to characters since one character per unit is
now a fiction. Furthermore, using fixnums would allow one to actually use
eql to compare if two characters are the same (well, perhaps only on
32-bit systems).

If constant-time random access is not a concern then UTF-16 is also
ridiculously excessive. UTF-8 should be the internal format in such
circumstances. But I don't think Lispers want to be working on strings
like they work on lists instead of vectors.

BTW there are already cool definitions in the supplementary character set:

Mathematical Alphanumeric Symbols
Range: 1D400–1D7FF
http://www.unicode.org/charts/PDF/U1D400.pdf

It's a pity the glyphs aren't freely available as part of Unicode :-)

Regards,
Adam
From: Sam Steingold
Subject: Re: Processing (SG|X)ML
Date: 
Message-ID: <m3adg4l1h6.fsf@loiso.podval.org>
> * In message <··············@interim.henrik-motakef.de>
> * On the subject of "Processing (SG|X)ML"
> * Sent on 04 Mar 2003 23:52:21 +0100
> * Honorable Henrik Motakef <··············@web.de> writes:
>
> I'm looking for a library to work with XML. I know that there are a
> few options around (CL-XML, CLLIB xml, UncommonXML, some Allegro stuff
> that I don't know the name of), but none of them seem to fit my needs.

what is wrong with CLLIB/XML?

-- 
Sam Steingold (http://www.podval.org/~sds) running RedHat8 GNU/Linux
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.palestine-central.com/links.html>
The program isn't debugged until the last user is dead.
From: Henrik Motakef
Subject: Re: Processing (SG|X)ML
Date: 
Message-ID: <87heac2phi.fsf@interim.henrik-motakef.de>
Sam Steingold <···@gnu.org> writes:

> > I'm looking for a library to work with XML. I know that there are a
> > few options around (CL-XML, CLLIB xml, UncommonXML, some Allegro stuff
> > that I don't know the name of), but none of them seem to fit my needs.
> 
> what is wrong with CLLIB/XML?

First of all, it just doesn't do the kinds of things I was looking
for, as far as I can tell - it just seems to provide a tree-based,
DOM-like API, and it appears to be non-validating, so it's likely not
of much use when messing with DTDs. Of course, that is hard to tell,
given that there seems to be no real documentation and the code is not
exactly readable either.

Other problems include the fact that it's GPLed, which could well
become a problem for me, since I tend to use some libraries with
GPL-incompatible licenses (you offer to work out a solution in cases
where this is a problem, but it still seems an unneeded risk, or at
least an inconvenience, given that there are alternatives with less
problematic license terms), and the fact that it's released as an
unversioned zip file together with lots of completely unrelated stuff.

Not to mention that it has some kind of namespace support without
reporting namespace-wellformedness violations (for example
redefinition of xmlns:xml), reports some wellformedness constraint
violations as "end-of-file" errors (parameter entities in the internal
subset inside markup declarations), doesn't report others at all (for
example '<' in attribute values or references to undefined entities),
doesn't do attribute value normalization, and apparently ignores the
XML declaration completely, etc. - so it plainly doesn't qualify as a
conformant XML processor.
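
For reference, attribute value normalization for CDATA attributes just
means replacing each whitespace character with a space (XML 1.0 sec.
3.3.3). In Lisp terms, roughly (a sketch, name invented):

   (defun normalize-attribute-value (string)
     ;; CDATA case only; tokenized types additionally collapse runs
     ;; of spaces and trim leading/trailing spaces
     (substitute-if #\Space
                    (lambda (c) (member c '(#\Tab #\Newline #\Return)))
                    string))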

Regards
Henrik