From: Kaelin Colclasure
Subject: Pragmatic parsing macros (aka. META)
Date: 
Message-ID: <9I3y3.56$L4.6750@newsin1.ispchannel.com>
I've been playing around with Henry Baker's META macros (as in
ftp://ftp.netcom.com/pub/hb/hbaker/Prag-Parse.html). While I find his
definitions quite concise, I just couldn't resist trying to play around
with the concept a bit in an effort a) to better understand it and b) to
come up with a set of definitions that don't wierd-out auto-indentation
in emacs. :-)

I'm reasonably comfortable with simple macro definitions in Common Lisp,
but this is my first exposure to reader-macros. [Perhaps this is why I
feel inclined to eliminate them from the code.] My feeble attempts to
get a "normal" macro to do what Baker's reader macros do have thus far
yielded only consternation.

The crucial function called my META's macros is #'compileit:

(defun compileit (x)
  (typecase x
    (meta
     (ecase (meta-char x)
       (#\! (meta-form x))
       (#\[ `(and ,@(mapcar #'compileit (meta-form x))))
       (#\{ `(or  ,@(mapcar #'compileit (meta-form x))))
       (#\$ `(not (do () ((not ,(compileit (meta-form x)))))))
       (··@ (let ((f (meta-form x))) `(match-type ,(car f) ,(cadr f))))))
    (t
     `(match ,x))))

This function is called via a simple macro:

(defmacro matchit (x) (compileit x))

The 'meta type is a defstruct that gets instantiated in various forms
by the reader macros. So the form that goes into x gets transformed by
the reader into a recursive data structure. Very neat. So, all I have
to do is build an equivalent data structure from a normal macro and pass
it to #'compileit -- and then return the result as the macroexpansion...

This is where I am foundering. I can't figure out how to write a macro
that does this. I have tried, but I end up passing unevaluated forms to
#'compileit instead of the 'meta structure.

<Questions>
Is this one of those obscure cases where it's necessary to call #'eval?
Or do you really *have* to use reader macros to do this sort of thing?
</Questions>

Below are two definitions for #'parse-query -- one using Baker's syntax
and one using something more palatable to emacs (but as yet not
successfully implemented).

;;; Working definition (manually indented), using Baker's META macros:
(defun parse-query (string &optional (index 0) (end (length string)))
  (let (c
        (name (make-array 255 :element-type 'character :fill-pointer 0))
        (value (make-array 4096 :element-type 'character :fill-pointer 0))
        (vars nil))
    (matchit
     $[ #\& $[ @(value-char c) !(progn
                                  (setf (char name (fill-pointer name)) c)
                                  (incf (fill-pointer name))) ]
        #\= $[ @(value-char c) !(progn
                                  (setf (char value (fill-pointer value)) c)
                                  (incf (fill-pointer value)))]
        !(progn
           (push (cons (copy-seq name) (copy-seq value)) vars)
           (setf (fill-pointer name) 0)
           (setf (fill-pointer value) 0)) ])
    (when (eql index end)
      vars)))

;;; Non-working definition, using a syntax emacs tolerates (and I prefer):
(defun parse-query (string &optional (index 0) (end (length string)))
  (let (c
        (name (make-array 255 :element-type 'character :fill-pointer 0))
        (value (make-array 4096 :element-type 'character :fill-pointer 0))
        (vars nil))
    (with-match-operators
        $(seq #\& $(seq @(value-char c)
                        !(progn (setf (char name (fill-pointer name)) c)
                                (incf (fill-pointer name))))
              #\= $(seq @(value-char c)
                        !(progn (setf (char value (fill-pointer value)) c)
                                (incf (fill-pointer value))))
              !(progn (push (cons (copy-seq name) (copy-seq value)) vars)
                      (setf (fill-pointer name) 0)
                      (setf (fill-pointer value) 0))))
    (when (eql index end)
      vars)))

From: T. Kurt Bond
Subject: Re: Pragmatic parsing macros (aka. META)
Date: 
Message-ID: <m31zcm17rf.fsf@localhost.localdomain>
"Kaelin Colclasure" <······@everest.com> writes:

> I've been playing around with Henry Baker's META macros (as in
> ftp://ftp.netcom.com/pub/hb/hbaker/Prag-Parse.html). While I find his
> definitions quite concise, I just couldn't resist trying to play around
> with the concept a bit in an effort a) to better understand it and b) to
> come up with a set of definitions that don't wierd-out auto-indentation
> in emacs. :-)
> 
> I'm reasonably comfortable with simple macro definitions in Common Lisp,
> but this is my first exposure to reader-macros. [Perhaps this is why I
> feel inclined to eliminate them from the code.] My feeble attempts to
> get a "normal" macro to do what Baker's reader macros do have thus far
> yielded only consternation.

I've played around a little bit with META, and in the process added a
fully parenthesized syntax.  Here's my version:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;;; vmeta.lsp -- verbose META.  A work in progress.

#|;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

  You can also find this code at http://access.mountain.net/~tkb/software.html

  This code was inspired by Henry G. Baker's article "Pragmatic
  Parsing in Common Lisp", ACM LISP Pointers IV,2 (April-June
  1991),3-15.  However, any errors are mine; inefficencies too.  There
  have been some minor changes and additions from the code in that
  article. You can download the original paper from
  <URL:ftp://ftp.netcom.com/pub/hb/hbaker/home.html>.

  Comments to ···@access.mountain.net.

  Description of forms:

    match exp => generalized-boolean

      This form applies the matching expression `exp' to the value of
      `sequence', starting at `index' and ending at `end'.  It returns
      true if the expression matched.

      The user must lexically bind `sequence' to the sequence to parse,
      `index' to where the parse should start, and `end' to where the
      parse should end.  The implementation of the matching languag
      uses those variables for its own purposes, so the user should
      not assign to them (or even reference them if they do not know
      what they are doing.).

      Example:

        (deftype alphanumeric () '(satisfies alphanumericp))
        (deftype not-space () '(not (eql #\space)))
        (defun match-imap-status (sequence
                                  &optional (index 0) (end (length sequence))
                                  &aux id status rest)
               (and
                (match [%(id ^(@alphanumeric 1 nil))
                        #\space
                        %(status ^(@not-space 1 nil))
                        {[#\space %(rest ^(@character))] []}])
                (values t id status rest)))
        (match-imap-status "a001 OK Interesting response.")
        ;;=> T ; "a001" ; "OK" ; "Interesting response."
        (match-imap-status "bogusdatahere")
        ;;=> NIL

    match-expr s ([start [end]]) exp => generalized-boolean

      This form binds `sequence' to the value of s and applies the
      matching expression `exp' to `sequence'.  If `start' is
      specified `index' is bound to its value; otherwise `index' is
      bound to zero.  If `end' is specified, `end' is bound to its
      value; otherwise `end' is bound to the length of the sequence.

      Example:

        (let (id status rest)
          (and
           (match-expr "a002 BAD Another interesting response." ()
                       [%(id ^(@alphanumeric 1 nil))
                         #\space
                         %(status ^(@not-space 1 nil))
                         {[#\space %(rest ^(@character))] []}])
           (values t id status rest)))
         ;;=> T ; "a002" ; "BAD" ; "Another interesting response."



  Description of matching expression syntax:

    Non-terminals are in lower case.  The symbol `...' indicates zero or
    more of the preceding expression. 

    [exp ...] 
    (SEQ exp ...)
        Matches every one of `exp ...'.

    {exp ...]
    (ALT exp ...)
        Matches exactly one of `exp ...'

    @typepred
    @(typepred var)
    (type typepred)
    (type typepred var)
        Matches if, given the current `element', (typep element typepred).
        If `var' was specified, it is assigned the value of `element'.

    $exp
    ^(exp)
    ^(exp min)
    ^(exp min max)
    (star exp)
    (star exp min)
    (star exp min max)

        Matches zero or more of `exp'.  If min is specified by itself,
        matches exactly `min' of `exp'.  If min and max are specified,
        matches at least `min' and at most `max' of `exp'.  Min and max
        can be NIL, in which case they do not limit (Thus, (star exp 3 nil)
        matches at least 3 of `exp', and possibly more.)

    (not exp)
        Doesn't match if `exp' does.

    %(var exp)
    (name var exp)
        Matches `exp' and sets `var' to the matching subsequence.

    (push var exp)
        Matches `exp' and pushes the matching subsequence onto `var'.

    (end)
        Matches if at the end of the sequence..

  Note:

    Despite the use of `sequence' as a variable name, I haven't
    actually tried using this to parse anything but lists.  At least
    the `name' ecase branch, `match-literal' and `match-type' would
    have to change for that.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;|#

;;; Implementation

;; I dropped back to a fully parenthesized form so I could work with
;; the internals and not have to worry about the read-table/syntax
;; issues at the same time.  When I finished with that I added the 
;; syntax from the paper back, setting read macro characters to translate
;; into the parenthesized form.  That way I can use this in Lisps that don't
;; have read macros, but have the terse and convenient syntax in
;; those that do.

;; Baker's compileit.
;; * I thought about adding (start), like (end) below, but since this doesn't
;;   do searching it might not be useful.
;; * esc tries to have no effect on the matching, which requires keeping
;;   track of whether we are in a sequence or not.  Pred is just like
;;   Baker's `!'.
;; * I wonder if it is worth it to worry about name capture?
(defun compile-match (x)
  (labels ((helper (x in-seq-p)
            (cond 
             ((and (listp x) (symbolp (car x)))
              (ecase (car x)
                (esc `(progn ,@(cdr x) ,in-seq-p))
                (pred (cadr x))
                (seq `(and ,@(mapcar #'(lambda (x) (helper x t))
                                          (cdr x))))
                (alt `(or ,@(mapcar #'(lambda (x) (helper x nil))
                                             (cdr x))))
                ;; This is complicated by the fact that @typepred results in
                ;; (type . typepred)
                (type (if (symbolp (cdr x))
                          `(match-type ,(cdr x) nil)
                          `(match-type ,(cadr x) ,(caddr x))))
                ;; Ugh.  This would be simple if we only allowed (star x):
                ;;   (star `(not (do () ((not (helper (cadr x) in-seq-p))))))
                (star (match-star (cddr x)   ; min-specified
                                  (cdddr x)  ; max-specified
                                  (caddr x)  ; min
                                  (cadddr x) ; max
                                  (helper (cadr x) in-seq-p) ;matcher
                                  x          ; for error messages.
                                  ))
                (not `(not ,(helper (cadr x) in-seq-p)))
                (name `(let* ((start index)
                              (val ,(helper (caddr x) in-seq-p))
                              (last index))
                         (if val
                             (setq ,(cadr x) (subseq sequence start last)))
                         val))
                (push `(let* ((start index)
                              (val ,(helper (caddr x) in-seq-p))
                              (last index))
                         (if val
                             (push (subseq sequence start last)
                                   ,(cadr x)))
                         val))
                (end '(= index end))
                ))
             (t `(match-literal ,x)))))
    (helper x nil)))

;; This is only so complicated because we allow
;;   (star x) to match zero or more Xs,
;;   (star x min) to match exactly MIN Xs,
;;   (star x min nil) to match at least MIN Xs,
;;   (star x nil max) to match at most MAX Xs, and
;;   (star x min max) to match at least MIN and at most MAX Xs.
(defun match-star (min-specified max-specified min max matcher x)
  (cond
   ;; No min or max specified, match any number.
   ((not (or min max))
    `(not (do () ((not ,matcher)))))
   ;; Min specified, but no max specified, so
   ;; match exactly min.
   ((and min (not max-specified))
    `(do ((i 0 (1+ i)))
         ((or (> i ,min) (not ,matcher))
          (and (= i ,min)))))

   ;; Min specified, but max is nil, so match
   ;; at least min.
   ((and min max-specified (not max))
    `(do ((i 0 (1+ i)))
         ((not ,matcher)
          (>= i ,min))))

   ;; Min specified, max specified, so match
   ;; at least min (none if min was null) and at
   ;; most most max.
   ((and min-specified max-specified max)
    (if (not min)
        (setq min 0))
    `(do ((i 0 (1+ i)))
         ((or (= i ,max) (not ,matcher))
          (and (>= i ,min) (<= i ,max)))))

   (t
    (error "~S is not a valid star form." x))))

;; Baker's matchit.
(defmacro match (x) (compile-match x))

;; Baker's match, for strings.
(defmacro match-literal (x)
  (etypecase x
   (character
    `(when (and (< index end) (eql (char sequence index) ',x))
       (incf index)))
   (string
    `(let ((old-index index))           ; 'old-index' is a *lexical* variable.
       (or (and ,@(map 'list #'(lambda (c) `(match-literal ,c)) x))
           (progn (setq index old-index) nil))))))

;; Baker's match-type, for strings.
(defmacro match-type (x v)
  `(when (and (< index end) (typep (char sequence index) ',x))
     ,(if v `(setq ,v (char sequence index)))
     (incf index)))

;; This allows using match as an expression without wrapping it in a
;; function.
(defmacro match-expr (sequence (&optional (index nil)
                                        (end nil))
                             expr)
  `(let* ((sequence ,sequence)
          (index ,(if index index 0))
          (end   ,(if end end '(length sequence))))
     ,(compile-match expr)))

;;; End of implementation of match.

;;; read macro characters.

;; Here are the translations
;;   esc:  no short form -> (esc exp)
;;   pred: !exp -> (pred exp)
;;   seq:  [exp ...] -> (seq exp ...)
;;   alt:  {exp ...} -> (alt exp ...)
;;   type: @(type-name [var]) -> (type type-name [var])
;;   star: $exp -> (star exp)
;;         ^(exp [min [max]]) -> (star exp [min [max]])
;;   not:  no short form -> (not exp)
;;   name: %(var exp) -> (name var exp)
;;   push: no short form -> (push var exp)
;;   end:  no short form -> (end)

(set-macro-character #\!
  #'(lambda (s c) (declare (ignore c)) `(pred ,(read s))))
(set-macro-character #\[
  #'(lambda (s c) (declare (ignore c)) `(seq ,@(read-delimited-list #\] s t))))
(set-macro-character #\{
  #'(lambda (s c) (declare (ignore c)) `(alt ,@(read-delimited-list #\} s t))))
(mapc #'(lambda (c) (set-macro-character c (get-macro-character #\) nil)))
      '(#\] #\}))
(set-macro-character ··@
  #'(lambda (s c) (declare (ignore c)) `(type ,@(read s))))
(set-macro-character #\$
  #'(lambda (s c) (declare (ignore c)) `(star ,(read s))))
(set-macro-character #\^
  #'(lambda (s c) (declare (ignore c)) `(star ,@(read s))))
(set-macro-character #\%
  #'(lambda (s c) (declare (ignore c)) `(name ,@(read s))))

;;; End of read macro characters.

;;; End of file
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

-- 
T. Kurt Bond, ···@access.mountain.net
From: Marco Antoniotti
Subject: Re: Pragmatic parsing macros (aka. META)
Date: 
Message-ID: <lwbtbpvdj4.fsf@copernico.parades.rm.cnr.it>
"Kaelin Colclasure" <······@everest.com> writes:

> I've been playing around with Henry Baker's META macros (as in
> ftp://ftp.netcom.com/pub/hb/hbaker/Prag-Parse.html). While I find his
> definitions quite concise, I just couldn't resist trying to play around
> with the concept a bit in an effort a) to better understand it and b) to
> come up with a set of definitions that don't wierd-out auto-indentation
> in emacs. :-)
> 
> I'm reasonably comfortable with simple macro definitions in Common Lisp,
> but this is my first exposure to reader-macros. [Perhaps this is why I
> feel inclined to eliminate them from the code.] My feeble attempts to
> get a "normal" macro to do what Baker's reader macros do have thus far
> yielded only consternation.
> 
> The crucial function called my META's macros is #'compileit:
> 
> (defun compileit (x)
>   (typecase x
>     (meta
>      (ecase (meta-char x)
>        (#\! (meta-form x))
>        (#\[ `(and ,@(mapcar #'compileit (meta-form x))))
>        (#\{ `(or  ,@(mapcar #'compileit (meta-form x))))
>        (#\$ `(not (do () ((not ,(compileit (meta-form x)))))))
>        (··@ (let ((f (meta-form x))) `(match-type ,(car f) ,(cadr f))))))
>     (t
>      `(match ,x))))

The #\$ reader macro breaks something in MCL.  You need to check the
READTABLE definition as well.

Cheers

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa