From: ··········@gmail.com
Subject: Pattern matching on trees
Date: 
Message-ID: <1114049318.370826.241100@o13g2000cwo.googlegroups.com>
Hello,

I've come across the following problem: I need some way of pattern
matching trees.  I am currently working with parse trees, but I think
the biggest use for this would be XML trees.

So consider the following tree:
(setf tree '(S (NP Mary) (VP saw (NP a tree)))

For something simple, I'd like to be able to say
(query '(NP 1) tree)

and get back ((1 . Mary)) and ((1 . a tree))

Of course, I'd also like something like
(quey '((NP 1) (VP 2 NP 3)))

and get back ((1. Mary) (2 . saw) (3 . a tree))

Of course, a lot more complex patterns could be imagined, where
restrictions could be made on what's matched (eg "Give me all the
subtrees that start with NP and have only atomic members" or "give me
all the grandchildren of something that [query]")

Sorry if this isn't clear; it's not at all clear in my head what the
optimal way of doing this is.  In many ways, this seems to be a request
for a lispish version of XPath, though I'm not sure if XPath has all
the capabilities needed.

Do you know if such a package exists?  I've homebrewed something, but
it's
a) quite simplistic and only tailored for my needs
b) horribly (I mean O(2^n)) inefficient.

The problem seems to be isomorphic to regexps on strings (and in fact
could be reduced to it), so the same techniques as used for regexps
could be used for a speedup.  However, something like this seems so
natural for lisp, I'm pretty sure something must exist.

Pointers?

Alex

From: Sashank Varma
Subject: Re: Pattern matching on trees
Date: 
Message-ID: <1114064293.688020.290660@z14g2000cwz.googlegroups.com>
··········@gmail.com wrote:

> I've come across the following problem: I need some way of pattern
> matching trees.  I am currently working with parse trees, but I think
> the biggest use for this would be XML trees.

The way you're thinking reminds me of computational linguistics
algorithms for parsing through unification.

If you have access to Norvig's "Paradigms of Artificial Intelligence
Programming", read the NLP chapters and grab the Lisp code at
http://www.norvig.com/paip/README.html.

You might also want to consult Jurafsky and Martin's "Speech and
Language Processing," which develops algorithms in generic pseudocode.

However, I should add that what you described doesn't exactly match
unification. I think this reflects a confusion on your part but it may
well be that you require something different.
From: Marco Antoniotti
Subject: Re: Pattern matching on trees
Date: 
Message-ID: <ADP9e.3$mi7.22925@typhoon.nyu.edu>
It does not do exactly what you want, but it will be an obvious building 
block.


http://common-lisp.net/project/cl-unification/


Cheers
--
Marco






··········@gmail.com wrote:
> Hello,
> 
> I've come across the following problem: I need some way of pattern
> matching trees.  I am currently working with parse trees, but I think
> the biggest use for this would be XML trees.
> 
> So consider the following tree:
> (setf tree '(S (NP Mary) (VP saw (NP a tree)))
> 
> For something simple, I'd like to be able to say
> (query '(NP 1) tree)
> 
> and get back ((1 . Mary)) and ((1 . a tree))
> 
> Of course, I'd also like something like
> (quey '((NP 1) (VP 2 NP 3)))
> 
> and get back ((1. Mary) (2 . saw) (3 . a tree))
> 
> Of course, a lot more complex patterns could be imagined, where
> restrictions could be made on what's matched (eg "Give me all the
> subtrees that start with NP and have only atomic members" or "give me
> all the grandchildren of something that [query]")
> 
> Sorry if this isn't clear; it's not at all clear in my head what the
> optimal way of doing this is.  In many ways, this seems to be a request
> for a lispish version of XPath, though I'm not sure if XPath has all
> the capabilities needed.
> 
> Do you know if such a package exists?  I've homebrewed something, but
> it's
> a) quite simplistic and only tailored for my needs
> b) horribly (I mean O(2^n)) inefficient.
> 
> The problem seems to be isomorphic to regexps on strings (and in fact
> could be reduced to it), so the same techniques as used for regexps
> could be used for a speedup.  However, something like this seems so
> natural for lisp, I'm pretty sure something must exist.
> 
> Pointers?
> 
> Alex
> 
From: Mark Tarver
Subject: Re: Pattern matching on trees
Date: 
Message-ID: <1114154518.241760.247970@f14g2000cwb.googlegroups.com>
Sorry if this appears twice; I'm still acclimatising to
Google Groups beta version.

OK; I'm not quite certain of your syntax for 'query', but the following
may help.  It is written in Qi - a GPL-based language that runs
under CLisp that I announced recently. It has pattern-matching built
into it - see www.lambdassociates.org for details.

Part of your problem is an instance of a more general case which is
the problem of searching for the an object or sublist of a list that
has
a property in a nested list .  We can write this as a higher-order
function
in Qi that uses backtracking.

(define find
    F X -> X	where (F X)
    F [X | Y] <- (find F X)
    F [X | Y] -> (find F Y)
    _ _ -> #\Escape)

This says

"If F applies to X, then return X, else if not try to search the head
 X of the list [X | Y] for a match and if you find a match return it
else if #\Escape is returned then backtrack and look in the tail else
return #\Escape in all other cases."

This is a bit of a mouthful, but see chapter 7 on Non-Determinism
in 'Functional Programming in Qi' for <- and the semantics of
backtracking in Qi. (www.lambdassociates.org/webbook/chap7.htm).

Here's a test.

(1-) (find number? [a b c [r t 67 u]])
67

(2-) (find number? [a b c [4] [r t 67 u]])
4

(3-) (find number? [a b c [d e f]])
#\Escape

(4-) (define little_list?
         [X Y Z] -> true
         _ -> false)

(5-) (find little_list? [a b c d e])
[c d e]

OK, now you have most of what you need.  So you write a little driver.
Your query function becomes a higher-order function that takes
your match function and your tree, applies the match and yanks the
content out of what it finds.

(define query
  Match Tree -> (let Find (find Match Tree)
                        (if (= #\Escape Find)
                            "query failure!"
                            (yank-content Find))))

(define yank-content
   [Category | Content] -> Content)

Now we need a match function; well here is something dumb and simple.
You can be more creative than this.

(define my-simple-matcher?
   Category [Category | _] -> true
   _ _ -> false)

Off we go.

\Set *test* to a tree.\

(5-) (set *test* [Sent [NP [N [Name Mark]]]
                      [VP [Vtrans needs] [NP [Det a] [N haircut]] ]])
[Sent [NP [N [Name Mark]]] [VP [Vtrans needs] [NP [Det a] [N haircut]]
]]

\Create a lambda function containing my matcher and give
 it to 'query' to use. Look for the first Vtrans.\

(6-) (query (/. Node (my-simple-matcher? Vtrans Node)) (value *test*))
[needs]

\Look for the first Adj.\
(7-) (query (/. Node (my-simple-matcher? Adj Node)) (value *test*))
"query failure!"

There are some clever chaps in this group who can show you how to
do this in Lisp too.

For other aspects of your post, you need something like the Prolog
bagof to collect all objects that have a certain property in a nested
list.
That's not too hard, but I'll leave it there.

hope this helps

Mark


··········@gmail.com wrote:
> Hello,
>
> I've come across the following problem: I need some way of pattern
> matching trees.  I am currently working with parse trees, but I think
> the biggest use for this would be XML trees.
>
> So consider the following tree:
> (setf tree '(S (NP Mary) (VP saw (NP a tree)))
>
> For something simple, I'd like to be able to say
> (query '(NP 1) tree)
>
> and get back ((1 . Mary)) and ((1 . a tree))
>
> Of course, I'd also like something like
> (quey '((NP 1) (VP 2 NP 3)))
>
> and get back ((1. Mary) (2 . saw) (3 . a tree))
>
> Of course, a lot more complex patterns could be imagined, where
> restrictions could be made on what's matched (eg "Give me all the
> subtrees that start with NP and have only atomic members" or "give me
> all the grandchildren of something that [query]")
>
> Sorry if this isn't clear; it's not at all clear in my head what the
> optimal way of doing this is.  In many ways, this seems to be a
request
> for a lispish version of XPath, though I'm not sure if XPath has all
> the capabilities needed.
>
> Do you know if such a package exists?  I've homebrewed something, but
> it's
> a) quite simplistic and only tailored for my needs
> b) horribly (I mean O(2^n)) inefficient.
>
> The problem seems to be isomorphic to regexps on strings (and in fact
> could be reduced to it), so the same techniques as used for regexps
> could be used for a speedup.  However, something like this seems so
> natural for lisp, I'm pretty sure something must exist.
> 
> Pointers?
> 
> Alex