From: Brian Kendig
Subject: Halp!  Here's a sentence parser; would you help me debug it?
Date: 
Message-ID: <1992May28.213542.23587@Princeton.EDU>
Remember the parser I said about a week or so ago that I was working
on?  Well, it's working now... almost.

I'm at wit's end for why it's behaving the way it is, and I don't have
very good debugging tools to use on it.  Therefore I'm going to make
use of the fact that the Usenet has some really good Lisp programmers
and plenty of people with plenty of time (do these two sets
intersect?), and post my code here in the hopes that someone can
(a) figure it out, (b) help me debug it, and/or (c) help me make it
better.

It's a sort of poor-man's natural language parser.  Enter

	(parse '(the old man) '(np))

and it'll return a list properly identifying the determiner, the
adjective, and the noun.  Take a look in the grammar and words-list to
see what it knows; it follows a top-down approach to parsing (you have
to tell it what to try to parse the words as, and it'll either return
a tree of the parsed-text or nil if it can't parse it).

The problem I'm having is this: The function returns a list,

	(parse-tree words-left)

because sometimes I've gotta know what words are left over when I'm
done parsing something.  This is so when I parse a sentence, which is
just '(np vp), I can parse the np and put the results of the list into
parentheses like (s (np ...) (vp ...)).  I could do it all linearly,
(that is, expand '(np vp) into '(determiner adjective noun vp)), but
then there'd be no way for me to tell where the np ended so I couldn't
group the output together nicely.

But, depending on the order of the rules in my grammar, sometimes
trying to (parse '(the orange man liked the old woman) '(s)) works
correctly and parses everything, but other times it just returns whose
first element is the np parsed correctly, but whose second element is
the vp as the "words left over" after parsing the np.

I'm not at all sure what's going wrong.  I probably have too many or
too few parenthesis around something somewhere, but I can't find them.
I'm at wit's end right now and tearing my hair out.  =:-0

This code is by me for my own amusement.  If you need me to explain
anything else about it, just holler!  And feel free to hack it as much
as you want as you play with it.  :) If I _do_ get this thing working,
I'll post it for the benefit of anybody who wants it, no restrictions.

Thanks!

     << Brian >>


(in-package 'user)

(defparameter *grammar*
  '((S   -> (NP1 VP1))
    (NP1 -> (NP))
    (NP1 -> (NP PP+))
    (NP  -> (Determiner Noun))
    (NP  -> (Determiner A+ Noun))
    (NP  -> (Pronoun))
    (NP  -> (Proper-noun))
    (VP1 -> (VP))
    (VP1 -> (VP PP+))
    (VP  -> (Verb))
    (VP  -> (Verb NP))
    (PP+ -> (PP PP+))
    (PP  -> (Preposition NP))
    (A+  -> (Adjective))
    (A+  -> (Adjective A+))))

(defparameter *word-types*
  '(noun proper-noun pronoun determiner adjective verb adverb preposition))

(defparameter *words*
  '((noun
     man ball woman table orange saw saws noun verb)
    (proper-noun
     John Mary)
    (pronoun
     I you he she it me him her)
    (determiner
     the a an)
    (adjective
     big little old young blue green orange)
    (verb
     hit took saw liked saws)
    (adverb
     quickly)
    (preposition
     with for at on by of in)))

(defun same-part-of-speech (word part)
  (member word (rest (assoc part *words*))))

(defun rules-about (what)
  (mapcar #'third
	  (remove-if-not #'(lambda (x) (equal (car x) what)) *grammar*)))

(defun tree-of (result) (first result))
(defun words-left (result) (second result))

(defun parse (words what-to-parse-as)
  (cond ((null words) nil)
	((null what-to-parse-as) (list nil words))
	((member (first what-to-parse-as) *word-types*)

	 ; Are we looking for a word of a specific part of speech?
	 (when (same-part-of-speech (first words) (first what-to-parse-as))
	       (if (null (rest what-to-parse-as))

		   ; No more words to parse
		   (list (list (list (first what-to-parse-as)
				     (first words)))
			 (rest words))

		 ; More words to parse
		 (let ((result (parse (rest words) (rest what-to-parse-as))))
		   (when result
			 (list (cons
				(list (first what-to-parse-as) (first words))
				(tree-of result))
			       (words-left result)))))))

	; We've got a rule we've got to break down
	(t (mapcan #'(lambda (rule)
		       (let ((result1 (parse words rule)))
			 (when result1
			       (if (null (words-left result1))

				   ; No words left to parse, so this parsed ok
				   (when (null (rest what-to-parse-as))
					 (list (list
						(cons (first what-to-parse-as)
						      (tree-of result1)))))

				 ; More words -- check them to see if parsed ok
				 (let ((result2
					(parse (words-left result1)
					       (rest what-to-parse-as))))
				   (when result2
					 (list
					  (cons (cons
						 (first what-to-parse-as)
						 (tree-of result1))
						(tree-of result2))
					  (words-left result2))))))))

		   (rules-about (first what-to-parse-as))))))


;The general approach: (not entirely what's up there, but close)
;(is the 'needed' a specific word type (noun, verb, adj, adv, art, pro)? if so,
;    (is the first word that word type? if so,
;	(parse everything after that word) (otherwise return nil))
;(otherwise, get possible configurations (in the grammar) of what's needed)
;(for each config
;     (for each part of the config, keep parsing with what's left)
;     (if nothing's left when we're done, then keep this config?)


-- 
| Brian S. Kendig       --/\-- Tri     ········@phoenix.Princeton.EDU, @PUCC
| Computer Science BSE  |/  \| Quad  You gave your life to become the person
| Princeton University  /____\ clubs    you are right now.  Was it worth it?

From: Peter Norvig - Sun Labs Boston
Subject: Re: Halp!  Here's a sentence parser; would you help me debug it?
Date: 
Message-ID: <105ef6INNrf1@seven-up.East.Sun.COM>
Here's a simple version from my book "Paradigms of AI Programming."
Other versions which do semantics and preferences, and which
parse much faster (using memoization), are available.  
You can ftp them from unix.sri.com in the directory pub/norvig.

;;;; -*- Mode: Lisp; Syntax: Common-Lisp -*-
;;;; Code from Paradigms of AI Programming
;;;; Copyright (c) 1991 Peter Norvig

;;;; File syntax1.lisp: The PSG-based natural language parser.
;;;; This is the more efficient version of the non-semantic parser,
;;;; which handles unknown words as described in Section 19.4.
;;;; Remember to use a grammar, as in (use *grammar4*)
;;;; Then do, for example, (parser '(The man saw the orange saw))

(in-package "USER")

(defvar *grammar* nil "The grammar used by GENERATE.")

(defstruct (rule (:type list)) lhs -> rhs)

(defstruct (parse) "A parse tree and a remainder." tree rem)

;; Trees are of the form: (lhs . rhs)
(defun new-tree (cat rhs) (cons cat rhs))
(defun tree-lhs (tree) (first tree))
(defun tree-rhs (tree) (rest tree))

(defun parse-lhs (parse) (tree-lhs (parse-tree parse)))

(defparameter *open-categories* '(N V A Name)
  "Categories to consider for unknown words")

(defun lexical-rules (word)
  "Return a list of rules with word on the right hand side."
  (or (find-all word *grammar* :key #'rule-rhs :test #'equal)
      (mapcar #'(lambda (cat) `(,cat -> ,word)) *open-categories*)))

(defun rules-starting-with (cat)
  "Return a list of rules where cat starts the rhs."
  (find-all cat *grammar* 
            :key #'(lambda (rule) (first-or-nil (rule-rhs rule)))))

(defun complete-parses (parses)
  "Those parses that are complete (have no remainder)."
  (find-all-if #'null parses :key #'parse-rem))

(defun parse (words)
  "Bottom-up parse, returning all parses of any prefix of words."
  (unless (null words)
    (mapcan #'(lambda (rule)
                (extend-parse (rule-lhs rule) (list (first words))
                              (rest words) nil))
            (lexical-rules (first words)))))

(defun extend-parse (lhs rhs rem needed)
  "Look for the categories needed to complete the parse."
  (if (null needed)
      ;; If nothing needed, return parse and upward extensions
      (let ((parse (make-parse :tree (new-tree lhs rhs) :rem rem)))
        (cons parse
              (mapcan
                #'(lambda (rule)
                    (extend-parse (rule-lhs rule)
                                  (list (parse-tree parse))
                                  rem (rest (rule-rhs rule))))
                (rules-starting-with lhs))))
      ;; otherwise try to extend rightward
      (mapcan
        #'(lambda (p)
            (if (eq (parse-lhs p) (first needed))
                (extend-parse lhs (append1 rhs (parse-tree p))
                              (parse-rem p) (rest needed))))
        (parse rem))))

(defun append1 (items item)
  "Add item to end of list of items."
  (append items (list item)))


(defun parser (words)
  "Return all complete parses of a list of words."
  (mapcar #'parse-tree (complete-parses (parse words))))

(defun use (grammar)
  "Switch to a new grammar."
  (length (setf *grammar* grammar)))


;;; Grammars

(defparameter *grammar3*
  '((Sentence -> (NP VP))
    (NP -> (Art Noun))
    (VP -> (Verb NP))
    (Art -> the) (Art -> a)
    (Noun -> man) (Noun -> ball) (Noun -> woman) (Noun -> table)
    (Noun -> noun) (Noun -> verb)
    (Verb -> hit) (Verb -> took) (Verb -> saw) (Verb -> liked)))

(defparameter *grammar4*
  '((S -> (NP VP))
    (NP -> (D N))
    (NP -> (D A+ N))
    (NP -> (NP PP))
    (NP -> (Pro))
    (NP -> (Name))
    (VP -> (V NP))
    (VP -> (V))
    (VP -> (VP PP))
    (PP -> (P NP))
    (A+ -> (A))
    (A+ -> (A A+))
    (Pro -> I) (Pro -> you) (Pro -> he) (Pro -> she)
    (Pro -> it) (Pro -> me) (Pro -> him) (Pro -> her)
    (Name -> John) (Name -> Mary)
    (A -> big) (A -> little) (A -> old) (A -> young)
    (A -> blue) (A -> green) (A -> orange) (A -> perspicuous)
    (D -> the) (D -> a) (D -> an)
    (N -> man) (N -> ball) (N -> woman) (N -> table) (N -> orange)
    (N -> saw) (N -> saws) (N -> noun) (N -> verb)
    (P -> with) (P -> for) (P -> at) (P -> on) (P -> by) (P -> of) (P -> in)
    (V -> hit) (V -> took) (V -> saw) (V -> liked) (V -> saws)))
From: Peter Norvig - Sun Labs Boston
Subject: Re: Halp!  Here's a sentence parser; would you help me debug it?
Date: 
Message-ID: <105gb8INNrgo@seven-up.East.Sun.COM>
Its been pointed out to me that my last posting neglected to include
some utility functions:

(defun find-all (item sequence &rest args &key (test #'eql) test-not
		      &allow-other-keys)
  "Find all those elements of sequence that match item."
  (if test-not
      (apply #'remove item sequence :test (complement test-not) args)
      (apply #'remove item sequence :test (complement test) args)))

(setf (symbol-function 'find-all-if) #'remove-if-not)

#-CLtL2
(defun complement (fn)
  "Return a function that returns true when FN returns false, and vice versa."
  #'(lambda (&rest args) (not (apply fn args))))
From: Jon Callas
Subject: Re: Halp!  Here's a sentence parser; would you help me debug it?
Date: 
Message-ID: <1992Jun9.183438.27148@nntpd.lkg.dec.com>
By the bye, since Peter can't say this, I will.

Buy his book. It's really good. There's a lot of good stuff on parsing in it.

	Jon