From: Rand Sobriquet
Subject: Re: large scale pattern searching
Date: 
Message-ID: <1e249696.0212140225.598a6fad@posting.google.com>
> I'm trying to write a lisp library to store and manage rules and
> predicates on a large scale. Ideally the knowledge base would be
> able to handle millions of predicates and unify a pattern against
> them in a time largely proportional to the length and complexity
> of the pattern and the number of matches, rather than the total
> number of predicates in the knowledge base.
> 
> Before I go too much further with this, does anyone know of any
> other code which is available to do this? I've already looked at
> FramerD but it doesn't really suit my needs. Presumably Cyc has
> something like this under the hood, but the source code is not
> available for it (yet?).
> 
> Thanks in advance for any pointers.

Andrew,

sv0f has already pointed you to Lisa, but I'm not sure whether you're
aware of Xanalys's KnowledgeWorks:

http://www.lispworks.com/reference/lww42/KW-W/html/kwprolog-w-75.htm#pgfId-891609

This link points to KnowledgeWorks's Meta Rule Protocol, which may
allow you to write a linear matching criterion (if I understand you
correctly).

I think it will be A LOT EASIER to use KnowledgeWorks, which comes
with Xanalys Enterprise (and, I believe, can be layered on top of
Professional), than to write a comparable lisp library.


Wow, millions of predicates. For real?
Rand
From: Andrew Smith
Subject: Re: large scale pattern searching
Date: 
Message-ID: <u08fta.i6f.ln@192.168.1.111>
Rand Sobriquet wrote:
>>I'm trying to write a lisp library to store and manage rules and
>>predicates on a large scale. Ideally the knowledge base would be
>>able to handle millions of predicates and unify a pattern against
>>them in a time largely proportional to the length and complexity
>>of the pattern and the number of matches, rather than the total
>>number of predicates in the knowledge base.

... snip

> sv0f has already pointed you to Lisa, but I'm not sure whether you're
> aware of Xanalys's KnowledgeWorks:
> 
> http://www.lispworks.com/reference/lww42/KW-W/html/kwprolog-w-75.htm#pgfId-891609
> 
> This link points to KnowledgeWorks's Meta Rule Protocol, which may
> allow you to write a linear matching criterion (if I understand you
> correctly).
> 
> I think it will be A LOT EASIER to use KnowledgeWorks, which comes
> with Xanalys Enterprise (and, I believe, can be layered on top of
> Professional), than to write a comparable lisp library.

Thanks for the info about KnowledgeWorks, but I've actually finished it
since my original post. I've spent about a month of spare time on it
(and loved every minute of it ;-), and it all just fell into place this
week.

Also, I would really like to be able to release this as open source, so
it would be a bad idea to tie it to any proprietary solutions.

Re Lisa: I should have mentioned that I had a good look at it, and it
was curiosity about that kind of thing that prompted me to try writing
this. According to my (albeit limited) reading, the Rete algorithm on
which Lisa is based doesn't scale well beyond a thousand rules or so and
is very memory hungry. I didn't actually try it to see, though, so
hopefully that's a misconception.

> Wow, millions of predicates. For real?

Yes, for real. At the moment I'm using a test knowledge base of 575,000
predicates derived from WordNet 1.7. The following is the output of a
test run that merges two queries: one for predicates whose third element
is the literal |cat|, and one for predicates whose third element is a
list headed by |cat| [clisp 2.30 on Debian Linux 3.0, Athlon 1.4GHz +
1GB DDR RAM]:

----------
(time (print-source (predicate::merge-source
                        (predicate-source *facts* '(?x ?y |cat|))
                        (predicate-source *facts* '(?x ?y (|cat| ?z))))))

(DERIVED-FROM-ADJECTIVE (|feline| 1) |cat|)
(IDEA-OF N1789046 |cat|)
(IDEA-OF N1794924 (|cat| 1))
(IDEA-OF N2597014 (|cat| 2))
(IDEA-OF N2598927 (|cat| 3))
(IDEA-OF N8130567 (|cat| 4))
(IDEA-OF N8330370 (|cat| 5))
(IDEA-OF V59854 (|cat| 6))
(IDEA-OF V1117567 (|cat| 7))

Real time: 0.002463 sec.
Run time: 0.01 sec.
Space: 18544 Bytes
----------

The uncompressed size of the clisp memory image is 53MB, which compares
favourably with the 17MB of raw text data it was built from, so I think
it should scale OK. Don't ask me what the algorithm is; I just made it
up as I went along.
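
To give a flavour of the general idea, though, here is a toy sketch of
one textbook way to make retrieval cost track the pattern and its
matches rather than the knowledge base: index every non-variable atom
of every predicate under the path at which it occurs, then answer a
query by scanning only the smallest candidate list for its constants.
This is purely illustrative (all the names are invented for this post),
not the code behind the numbers above:

----------
;; Toy path index.  Every non-variable atom in a fact is indexed under
;; the list of positions leading to it, so a query touches only the
;; facts that share its constants, never the whole knowledge base.

(defconstant +fail+ :fail)

(defun variable-p (x)
  "Pattern variables are symbols whose names start with ?."
  (and (symbolp x)
       (plusp (length (symbol-name x)))
       (char= (char (symbol-name x) 0) #\?)))

(defun atom-paths (term &optional path)
  "Return (PATH . ATOM) pairs for each non-variable atom in TERM."
  (cond ((variable-p term) nil)
        ((atom term) (list (cons (reverse path) term)))
        (t (loop for sub in term
                 for i from 0
                 append (atom-paths sub (cons i path))))))

(defvar *index* (make-hash-table :test #'equal)
  "Maps (path . atom) keys to the list of facts containing that atom.")

(defun add-fact (fact)
  (dolist (key (atom-paths fact) fact)
    (push fact (gethash key *index*))))

(defun candidates (pattern)
  "Smallest indexed fact list for PATTERN's constants, or :ALL."
  (let ((best :all))
    (dolist (key (atom-paths pattern) best)
      (let ((facts (gethash key *index*)))
        (when (or (eq best :all)
                  (< (length facts) (length best)))
          (setf best facts))))))

(defun unify (pat term &optional (bindings '((:t . :t))))
  "Minimal one-way unifier: returns bindings, or +FAIL+ on mismatch."
  (cond ((eq bindings +fail+) +fail+)
        ((variable-p pat)
         (let ((b (assoc pat bindings)))
           (cond ((null b) (acons pat term bindings))
                 ((equal (cdr b) term) bindings)
                 (t +fail+))))
        ((and (consp pat) (consp term))
         (unify (cdr pat) (cdr term)
                (unify (car pat) (car term) bindings)))
        ((equal pat term) bindings)
        (t +fail+)))

(defun match-pattern (pattern)
  "All indexed facts unifying with PATTERN.  A pattern with no
constants at all would need a full scan, which is not shown here."
  (let ((facts (candidates pattern)))
    (if (eq facts :all)
        (error "all-variable pattern: full scan needed")
        (loop for fact in facts
              unless (eq (unify pattern fact) +fail+)
                collect fact))))
----------

With the facts loaded via ADD-FACT, (match-pattern '(?x ?y (|cat| ?z)))
consults only the facts indexed under |cat| at path (2 0), so its cost
depends on the handful of matches rather than on the 575,000 predicates.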

Currently working on an efficient way to import and export knowledge
bases so they don't have to reside entirely in RAM and can be mixed
together dynamically (a naive starting point is sketched below). This is
eventually intended to help with natural language processing, but at
this stage it is mainly just a curiosity. Will release the code as soon
as I've finished futzing with it.
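
For the import/export side, the simplest thing that could possibly work
is to print and read the predicates as s-expressions, one form at a
time; a sketch, again with invented names, assuming every fact is a
READable form:

----------
;; Naive save/load for a list of facts, one printed form per line.
;; WITH-STANDARD-IO-SYNTAX keeps the printed forms readable back in.

(defun save-facts (facts path)
  (with-open-file (out path :direction :output :if-exists :supersede)
    (with-standard-io-syntax
      (dolist (fact facts)
        (print fact out)))))

(defun load-facts (path)
  (with-open-file (in path)
    (with-standard-io-syntax
      (loop for fact = (read in nil in)  ; stream doubles as EOF marker
            until (eq fact in)
            collect fact))))
----------

Since the loader reads one form at a time, the same loop could feed
facts straight into an index, or filter them when mixing knowledge
bases, without the whole file ever residing in RAM.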