From: rif
Subject: lisp parsing question
Date: 
Message-ID: <wj04qmh638i.fsf@five-percent-nation.mit.edu>
I have a language defined in BNF form.  For instance:

<rule1> = foo bar <rule2>
<rule2> = quux munge <rule3> | quux munge <rule4>
<rule3> = argl
<rule4> = bargl

I get a string, like "foo bar quux bargl", and I want to get from this
to something like "(<rule1> foo bar (<rule2> quux munge (<rule4>
bargl)))".  There are no recursive rules --- all sentences are finite
length.  Each sentence will have a unique parse.  I believe this
implies the language is "regular" in CS terms, and could in theory be
recognized by a regular expression/finite state machine.  Is there a
freely-available Lisp tool that makes this easy?

My sense is that CL-PPCRE is too "low-level", in that I don't think I
can specify patterns recursively.  Something like META or ZEBU seem
like overkill in that they're designed to parse a larger class of
languages.  Any thoughts?

Cheers,

rif

From: Matthew Danish
Subject: Re: lisp parsing question
Date: 
Message-ID: <20040902001749.GT8087@mapcar.org>
On Wed, Sep 01, 2004 at 06:20:13PM -0400, rif wrote:
> 
> I have a language defined in BNF form.  For instance:
> 
> <rule1> = foo bar <rule2>
> <rule2> = quux munge <rule3> | quux munge <rule4>
> <rule3> = argl
> <rule4> = bargl
> 
> I get a string, like "foo bar quux bargl", and I want to get from this
> to something like "(<rule1> foo bar (<rule2> quux munge (<rule4>
> bargl)))".  There are no recursive rules --- all sentences are finite
> length.  Each sentence will have a unique parse.  I believe this
> implies the language is "regular" in CS terms, and could in theory be
> recognized by a regular expression/finite state machine.  Is there a
> freely-available Lisp tool that makes this easy?
> 
> My sense is that CL-PPCRE is too "low-level", in that I don't think I
> can specify patterns recursively.  Something like META or ZEBU seem
> like overkill in that they're designed to parse a larger class of
> languages.  Any thoughts?
> 
> Cheers,
> 
> rif

;; recursive descent parser generator

(defpackage #:simple-parser
  (:use #:cl)
  (:nicknames #:sp)
  (:export #:define-simple-parser
           #:make-list-monad
           #:make-text-stream-monad))

(in-package #:simple-parser)

(defmethod token-match-p (a b)
  (equal a b))

(defun expect (get-next what succ fail)
  (multiple-value-bind (tok get-next2)
      (funcall get-next)
    (if (token-match-p tok
                       what)
        (funcall succ get-next2)
        (funcall fail))))

(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun rule-name-p (name)
    (and (symbolp name) (char= #\< (char (string name) 0))))

  (defun compile-rule-branch (rule-name rule succ fail &optional accum)
    (cond ((null rule)
           `(funcall ,succ (list ',rule-name ,@accum) get-next))
          ((rule-name-p (first rule))
           (let ((res (gensym "RESULT")))
             `(,(first rule)
               get-next
               #'(lambda (,res get-next)
                   ,(compile-rule-branch rule-name
                                         (rest rule)
                                         succ fail
                                         (append accum (list res))))
               ,fail)))
          (t
           `(expect get-next ',(first rule)
             #'(lambda (get-next)
                 ,(compile-rule-branch rule-name 
                                       (rest rule)
                                       succ fail
                                       (append accum (list `',(first rule)))))
             ,fail))))

  (defun compile-rule (rule-name rule succ fail)
    (cond ((null rule)
           `(funcall ,fail))
          (t
           (let ((fail-fn (gensym "FAIL")))
             `(flet ((,fail-fn ()
                      ,(compile-rule rule-name 
                                     (rest rule)
                                     succ 
                                     fail)))
               ,(compile-rule-branch rule-name
                                     (first rule)
                                     succ
                                     `(function ,fail-fn))))))))

(defmacro define-simple-parser (name rules)
  `(defun ,name (get-next)
    (flet ((top-succ (res get-next)
             (declare (ignore get-next))
             res)
           (top-fail ()
             nil))
      (labels 
          ,(loop for (rule-name . rule) in rules
                 collect `(,rule-name (get-next succ fail)
                           ,(compile-rule
                             rule-name rule 'succ 'fail)))
        (,(first (first rules))
          get-next
          #'top-succ
          #'top-fail)))))

(defun make-list-monad (&rest list)
  #'(lambda ()
      (values (first list)
              (apply 'make-list-monad (rest list)))))

(defun make-text-stream-monad (stream)
  (let ((char (read-char stream nil nil))
        (new-monad nil))
    #'(lambda ()
        (unless new-monad 
          (setf new-monad (make-text-stream-monad stream)))
        (values char new-monad))))

(define-simple-parser lang1
    ((<rule1> ("foo" "bar" <rule2>))
     (<rule2> ("quux" "munge" <rule3>)
              ("quux" "munge" <rule4>))
     (<rule3> ("argl"))
     (<rule4> ("bargl"))))

(define-simple-parser lang2
    ((<rule1> (#\c <rule2> #\t)
              (#\d <rule3> #\g))
     (<rule2> (#\a)
              (#\o)
              (#\u))
     (<rule3> (#\i)
              (#\o))))


;; (lang1 (make-list-monad "foo" "bar" "quux" "munge" "bargl"))
;; (lang2 (make-text-stream-monad (make-string-input-stream "dog")))

-- 
;;;; Matthew Danish -- user: mrd domain: cmu.edu
;;;; OpenPGP public key: C24B6010 on keyring.debian.org
From: TLOlczyk
Subject: Re: lisp parsing question
Date: 
Message-ID: <de3dj09i608kkut0mrnnlpdmt0fb7b1h7i@4ax.com>
On 01 Sep 2004 18:20:13 -0400, rif <···@mit.edu> wrote:

>
>I have a language defined in BNF form.  For instance:
>
><rule1> = foo bar <rule2>
><rule2> = quux munge <rule3> | quux munge <rule4>
><rule3> = argl
><rule4> = bargl
>
>I get a string, like "foo bar quux bargl", and I want to get from this
>to something like "(<rule1> foo bar (<rule2> quux munge (<rule4>
>bargl)))".  There are no recursive rules --- all sentences are finite
>length.  Each sentence will have a unique parse.  I believe this
>implies the language is "regular" in CS terms, and could in theory be
>recognized by a regular expression/finite state machine.  Is there a
>freely-available Lisp tool that makes this easy?
>
>My sense is that CL-PPCRE is too "low-level", in that I don't think I
>can specify patterns recursively.  Something like META or ZEBU seem
>like overkill in that they're designed to parse a larger class of
>languages.  Any thoughts?
>
>Cheers,
>
>rif
Look at combinatorial parsing.



The reply-to email address is ··········@yahoo.com.
This is an address I ignore.
To reply via email, remove 2002 and change yahoo to
interaccess,

**
Thaddeus L. Olczyk, PhD

There is a difference between
*thinking* you know something,
and *knowing* you know something.
From: Petter Gustad
Subject: Re: lisp parsing question
Date: 
Message-ID: <m3pt558133.fsf@scimul.dolphinics.no>
rif <···@mit.edu> writes:

> I have a language defined in BNF form.  For instance:
> 
> <rule1> = foo bar <rule2>
> <rule2> = quux munge <rule3> | quux munge <rule4>
> <rule3> = argl
> <rule4> = bargl
> 
> I get a string, like "foo bar quux bargl", and I want to get from this
> to something like "(<rule1> foo bar (<rule2> quux munge (<rule4>
> bargl)))".  There are no recursive rules --- all sentences are finite

Personally I don't find zebu overkill for this task. The
foo-domain.lisp file created from the below example is 588 bytes, the
foo.tab file is 1251 bytes. The zebu grammar descriptions maps pretty
well into your BNF:

foo.zb
   (:name "foo" 
     :grammar "null-grammar" 
     :package "foo")
   
   (defrule RULE1
     := ("foo" "bar" RULE2)
     :build (list '<rule1> 'foo 'bar RULE2))
   
   (defrule RULE2
     := ("quux" "munge" RULE3)
     :build (list '<rule2> 'quux 'munge RULE3)
     := ("quux" "munge" RULE4)
     :build (list '<rule2> 'quux 'munge RULE4))
   
   (defrule RULE3
     := ("argl")
     :build (list '<rule3> 'argl))
   
   (defrule RULE4
     := ("bargl")
     :build (list '<rule4> 'bargl))

foo.cl

  (unless (find-package "ZEBU")
    (asdf:operate 'asdf:load-op :zebu-compiler))
  (unless (find-package "foo")
    (make-package "foo"))
  
  (defparameter *load-dirname* (pathname-directory *load-truename*))
  
  (zb:zebu-compile-file 
   (make-pathname :name "foo" :type "zb" :directory *load-dirname*))
  
  (zb:zebu-load-file  
   (make-pathname :name "foo" :type "tab" :directory *load-dirname*))
  
  (princ (zb:read-parser "foo bar quux munge bargl" :grammar (zb:find-grammar "foo")))


Loading this results in:

   (<RULE1> FOO BAR (<RULE2> QUUX MUNGE (<RULE4> BARGL)))

Petter

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?