Recently I've been trying to extract facts from an English language book
and I've been amazed at just how difficult this is. Of course, being a
programmer by trade, I've tried developing a program to do the work for
me. Haven't had a lot of success.
I was thinking that probably somebody has already started and/or
finished such code, and that it may be in the public domain. Can anyone
point me to such?
TIA, Steve Graham
Steve, what you propose to do is not entirely clear, but it sounds like
the holy grail of Natural Language Processing. We have devised some
pretty good techniques for classifying text content and retrieving
broadly-defined categories of information, but you aren't going to find
any simple code that processes large amounts of text with anything
approaching the accuracy of humans. It would probably be much more
cost-effective just to read the book and write down the facts that seem
relevant to you. Unfortunately, natural language does not by itself
contain all of the information necessary to recover its meaning. When
you read a book (or any text), the author is only helping you to
annotate information that largely exists inside your head. Unlike
computer code, natural language is extremely ambiguous, and we are only
able to understand text and conversations because our vast store of
knowledge helps us to disambiguate. Unless you can construct a program
that already knows a lot about the subject of the text, it will become
hopelessly confused by all the possible different interpretations that
words and phrases can have.
Could you be a little more specific about the types of "facts" you
propose to extract? What is it that you want to do with them?
Steve Graham wrote:
> Recently I've been trying to extract facts from an English language book
> and I've been amazed at just how difficult this is. Of course, being a
> programmer by trade, I've tried developing a program to do the work for
> me. Haven't had a lot of success.
>
> I was thinking that probably somebody has already started and/or
> finished such code, and that it may be in the public domain. Can anyone
> point me to such?
>
> TIA, Steve Graham
Steve Graham <·········@comcast.net> writes:
> I was thinking that probably somebody has already started and/or
> finished such code, and that it may be in the public domain. Can
> anyone point me to such?
Welcome to the world of Computational Linguistics! I used to be a
CL/semantics Ph.D. student/researcher/teacher before I left academia
and started hacking. The short answer is that this is incredibly
difficult. In fact, my personal view is that real high-quality CL is
tightly tied to solving the general problem of Artificial
Intelligence, i.e. something I don't expect to see in my own
lifetime.
However, there's a lot of fun to do without solving all the problems.
If you want to look at natural language parsing from a programmer's
viewpoint, have a look at Norvig's Paradigms of Artificial Intelligence
Programming. It's really fun and easy to understand for anyone who has
some Lisp experience. If you'd like to understand the mathematical
underpinnings of parsers and automata, I recommend Hopcroft and
Ullman's classic, now in a revised edition:
http://www.aw-bc.com/catalog/academic/product/0,4096,0201441241,00.html
Have fun!
--
(espen)
There are several ways to go about parsing English.
One is a Petri net; go to www.norvig.com and check out the example code
there.
Another site you might want to check out is alicebot.org.
Lots of ideas there. I don't think a version of Alice for Lisp has been
released yet.
The creators want to clean up the code and remove the web server code
first.
On Fri, 12 Mar 2004 04:26:14 GMT, Steve Graham <·········@comcast.net>
wrote:
> Recently I've been trying to extract facts from an English language book
> and I've been amazed at just how difficult this is. Of course, being a
> programmer by trade, I've tried developing a program to do the work for
> me. Haven't had a lot of success.
>
> I was thinking that probably somebody has already started and/or
> finished such code, and that it may be in the public domain. Can anyone
> point me to such?
>
> TIA, Steve Graham
--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Well my example was fairly vague, wasn't it? If I was to parse the
following verse
I, NEPHI, having been born of goodly parents, therefore I was taught
somewhat in all the learning of my father; and having seen many
afflictions in the course of my days, nevertheless, having been highly
favored of the Lord in all my days; yea, having had a great knowledge of
the goodness and the mysteries of God, therefore I make a record of my
proceedings in my days.
I could glean the following:
1. Nephi had goodly parents
2. He was taught in the learning of his father
3. He saw many afflictions
4. He was highly favored of the Lord
5. He had a great knowledge of the goodness of God
6. He had a great knowledge of the mysteries of God
7. He mad a record of his proceedings
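For what it's worth, here is a crude sketch (in Python here, though the idea transfers directly to Lisp) of the kind of shallow pattern matching that can glean a few of these facts. The patterns are hand-written for this one verse and would fail on almost any other text, which illustrates just how shallow the approach is:

```python
import re

verse = ("I, NEPHI, having been born of goodly parents, therefore I was "
         "taught somewhat in all the learning of my father; and having seen "
         "many afflictions in the course of my days, nevertheless, having "
         "been highly favored of the Lord in all my days; yea, having had a "
         "great knowledge of the goodness and the mysteries of God, "
         "therefore I make a record of my proceedings in my days.")

# Hand-written patterns keyed to this verse's "having X-ed" constructions.
# Each maps a surface pattern to a crude subject-predicate "fact".
patterns = [
    (r"having been born of (\w+) parents",       "had {0} parents"),
    (r"having seen many (\w+)",                  "saw many {0}"),
    (r"having been highly favored of the (\w+)", "was highly favored of the {0}"),
    (r"I make a record of my (\w+)",             "made a record of his {0}"),
]

def extract_facts(text):
    facts = []
    for pattern, template in patterns:
        m = re.search(pattern, text)
        if m:
            facts.append("Nephi " + template.format(*m.groups()))
    return facts

for fact in extract_facts(verse):
    print(fact)
```

Even here, the program only finds what the patterns anticipate; the conjunction in "the goodness and the mysteries of God" is exactly the kind of attachment ambiguity that no regular expression can sensibly resolve.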
Steve
---
Steve Graham wrote:
> Recently I've been trying to extract facts from an English language book
> and I've been amazed at just how difficult this is. Of course, being a
> programmer by trade, I've tried developing a program to do the work for
> me. Haven't had a lot of success.
>
> I was thinking that probably somebody has already started and/or
> finished such code, and that it may be in the public domain. Can anyone
> point me to such?
>
> TIA, Steve Graham
Steve Graham wrote:
> Well my example was fairly vague, wasn't it? If I was to parse the
> following verse
>
> I, NEPHI, having been born of goodly parents, therefore I was taught
> somewhat in all the learning of my father; and having seen many
> afflictions in the course of my days, nevertheless, having been highly
> favored of the Lord in all my days; yea, having had a great knowledge of
> the goodness and the mysteries of God, therefore I make a record of my
> proceedings in my days.
>
> I could glean the following:
> 1. Nephi had goodly parents
> 2. He was taught in the learning of his father
Would that be his father, or our Father?
> 3. He saw many afflictions
> 4. He was highly favored of the Lord
> 5. He had a great knowledge of the goodness of God
> 6. He had a great knowledge of the mysteries of God
He said he had a great knowledge of "the goodness", and of the mysteries
of God. Parsing that to mean "the goodness of God" could be tricky.
> 7. He mad[e] a record of his proceedings
No, he makes a record of his proceedings. How did you infer the past tense?
You could also glean that all people born of handsome and/or unusually
large parents are taught somewhat in the learnings of their/our
father/Father, and that people who know (or think they know) the
mysteries of God tend to wax autobiographical.
There's also the issue of what students of fiction call the "unreliable
narrator" problem -- If not all statements are (or can be) true, how
does the computer decide which ones are false?
Seriously, it is my understanding that the current state of the art in
computational linguistics has enough difficulty parsing series of
simple, well formed sentences whose words are being used in one of their
mainstream senses. Parsing run-on sentences full of esoteric usage and
dodgy syntax would be a mighty challenge indeed.
--
Cameron MacKinnon
Toronto, Canada
Cameron MacKinnon <··········@clearspot.net> writes:
> Seriously, it is my understanding that the current state of the art in
> computational linguistics has enough difficulty parsing series of
> simple, well formed sentences whose words are being used in one of
> their mainstream senses. Parsing run-on sentences full of esoteric
> usage and dodgy syntax would be a mighty challenge indeed.
I have a problem doing that myself, never mind writing a program to
do it. The programming language would not even be an issue for me.
The problem itself is just too hard for me to solve.
--
Those who do not remember the history of Lisp are doomed to repeat it,
badly.
> (dwim x)
NIL
You might want to start browsing from
http://www.cl.cam.ac.uk/~asa28/useful_semiotics_research_links.htm
There is a link grammar parser at
http://www.link.cs.cmu.edu/link/
which I've heard good things about, but never personally used.
Pete
> From: Steve Graham <·········@comcast.net>
> Recently I've been trying to extract facts from an English language
> book and I've been amazed at just how difficult this is. Of course,
> being a programmer by trade, I've tried developing a program to do
> the work for me. Haven't had a lot of success.
Because it is ***EXTREMELY*** difficult to figure out how to do this
task, either the way we do it or some new way. Many people have spent
their entire professional lives working on the problem without much
success. The algorithm probably has to figure out what frame
(environment) it's in, in order to understand what meanings of words
are most likely, and the program needs to know most of the standard
scripts (joseki) in that frame in order to fill in the gaps that aren't
stated explicitly. For example, "She went in, found an empty table, sat
down, and started looking at the menu." would make sense in a
restaurant. Left unsaid is the fact she was looking at the menu in
order to decide what food to order, and that next a waiter would
probably come over and ask her what she'd like, and she'd pick some
appetizer or drink from the menu, order it (tell the waiter what she
had selected), waiter would walk away and come back a few minutes later
with the requested item, put it on the table, walk away, and she'd
start eating that delivered item. But there are so many variations on
the script within that single frame: She might be waiting for her date
to show up. She might need to use the toilet before ordering. She might
not like the prices and decide to leave without ordering. The program
would need to have a way to turn all those sequences of words into
precise meaning.
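This frame-and-script idea (essentially a Schank-style script) can be sketched as a data structure: a script is an ordered list of expected events in a frame, and locating a sentence's events in the script lets the program fill in the unstated earlier steps. The sketch below (Python, with an invented, hard-coded restaurant script) covers only the single example and none of the variations:

```python
# A Schank-style "script": an ordered list of expected events in a frame.
# Understanding a sentence = locating its events in the script; everything
# before the last matched step is assumed to have happened implicitly.
RESTAURANT_SCRIPT = [
    "enter",
    "find table",
    "sit down",
    "read menu",
    "order food",
    "waiter brings food",
    "eat",
    "pay",
    "leave",
]

def implied_steps(mentioned):
    """Return the script steps implied but not stated, given the
    steps a text explicitly mentions."""
    last = max(RESTAURANT_SCRIPT.index(m) for m in mentioned)
    return [s for s in RESTAURANT_SCRIPT[:last + 1] if s not in mentioned]

# "She went in, found an empty table, sat down, and started looking at
# the menu." mentions four consecutive steps, so nothing is implied yet;
# but if the text jumps ahead to "eat", ordering and delivery are implied.
print(implied_steps(["enter", "find table", "sit down", "read menu", "eat"]))
# -> ['order food', 'waiter brings food']
```

The hard part, of course, is everything this sketch assumes away: recognizing which frame applies, mapping free text onto script steps, and handling the endless variations within the frame.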
If any natural-language-understanding researchers feel the inclination,
I would like them to set up WebServer applications that demonstrate the
innards of their particular methodology. For example, if the Web user
types in "She asked the waiter if there were any specials of the day."
the n.l.u. server might tag each ambiguous word with a specific
meaning, either in natural language or in program-innards data
structures, to illustrate how the program understands the frame and the
special meanings of that word in that frame. It might also generate a
syntactic parse (like a "sentence diagram" you might have learned in
high-school English class, but more precise in structure). With lots of
such n.l.u.-fragment demo-servers available, we might be able to browse
them to get an idea of which such fragments represent the state of the art, and
some good researcher might figure out how to put some of the pieces
together to yield tomorrow's state-of-the-art n.l.u. software.
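The tagged output such a demo server might produce could look something like the sketch below; the frame name and the sense inventory are entirely invented for illustration:

```python
# Invented sense inventory: each ambiguous word maps frame -> sense gloss.
SENSES = {
    "specials": {
        "restaurant": "dishes offered today at a featured price",
        "television": "one-off broadcast programs",
    },
    "waiter": {
        "restaurant": "person who serves food to customers",
    },
}

def tag(sentence, frame):
    """Tag each known ambiguous word with its sense in the given frame."""
    tags = {}
    for word in sentence.lower().replace(".", "").split():
        if word in SENSES and frame in SENSES[word]:
            tags[word] = SENSES[word][frame]
    return tags

tagged = tag("She asked the waiter if there were any specials of the day.",
             frame="restaurant")
for word, sense in tagged.items():
    print(f"{word}: {sense}")
```

A real n.l.u. server would of course have to infer the frame itself rather than be handed it as a parameter, which is where the hard research lives.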
Appx. ten years ago some research center was seeking employees to code
massive quantities of common knowledge into databases that might
eventually be used by n.l.u. software. I haven't heard of such efforts
recently. I wonder whatever happened to those projects (other than the
fact that the recession probably ended their funding).
Robert Maas wrote:
> Appx. ten years ago some research center was seeking employees to code
> massive quantities of common knowledge into databases that might
> eventually be used by n.l.u. software. I haven't heard of such efforts
> recently. I wonder whatever happened to those projects (other than the
> fact that the recession probably ended their funding).
There's a company called Cycorp that's been doing this for some time;
to the best of my knowledge they're still at it. I think their
database of facts is intended for more than "just" NL applications.
--
Gareth McCaughan
.sig under construc
On Sun, 14 Mar 2004 23:33:42 -0800, <··········@YahooGroups.Com> wrote:
Have you heard of AIML, an XML format for expressing English responses?
Try www.alicebot.org and follow references from there.
The version I use here is a Java version, but a Lisp version can be
tried at the above address.
They expect to come out with a public source release when they have
separated the HTTP server code from the AI code and cleaned it up.
Some time this year, perhaps.
The nice thing is that there are a lot of precoded rules and general
knowledge, so you don't have to work from scratch. You can bind
sentences to shell commands and have access to a JavaScript-like
language.
I have just played with it a little, but it seems workable.
It is a generalization of the same methodology used by Eliza.
Hope this is useful.
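Since AIML generalizes Eliza's method, the core of that method fits in a few lines: wildcard pattern rules paired with templated responses, plus a fallback. The rules below are made up for illustration (shown in Python; AIML expresses the same idea in XML):

```python
import re

# Eliza/AIML-style rules: a wildcard pattern and a response template.
# First matching rule wins; captured text is spliced into the template.
RULES = [
    (r"I need (.*)",       "Why do you need {0}?"),
    (r"I am (.*)",         "How long have you been {0}?"),
    (r"(.*) mother (.*)",  "Tell me more about your family."),
]

def respond(line):
    for pattern, template in RULES:
        m = re.match(pattern, line, re.IGNORECASE)
        if m:
            return template.format(*m.groups())
    return "Please go on."  # fallback when no rule matches

print(respond("I need a natural language parser"))
# -> Why do you need a natural language parser?
```

This is keyword matching, not understanding; it produces plausible-looking conversation precisely because the templates never commit to any meaning.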
Good luck
>> From: Steve Graham <·········@comcast.net>
>> Recently I've been trying to extract facts from an English language
>> book and I've been amazed at just how difficult this is. Of course,
>> being a programmer by trade, I've tried developing a program to do
>> the work for me. Haven't had a lot of success.
--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Thanks to all those who shared their knowledge/experience. Guess I
shouldn't feel so bad that a database programmer had a few problems with
this domain.
Steve Graham
---
Steve Graham wrote:
> Recently I've been trying to extract facts from an English language book
> and I've been amazed at just how difficult this is. Of course, being a
> programmer by trade, I've tried developing a program to do the work for
> me. Haven't had a lot of success.
>
> I was thinking that probably somebody has already started and/or
> finished such code, and that it may be in the public domain. Can anyone
> point me to such?
>
> TIA, Steve Graham