From: David E. Young
Subject: Lightweight search engine for Lisp
Date: 
Message-ID: <ThdD8.57119$YQ1.26697660@typhoon.southeast.rr.com>
Greetings. I've a project involving some "natural language"-like search
requirements. In other words, some UI component, possibly a browser, will
allow free-text entry; the software takes this text and searches against a
"database" for phrases containing some or all of the words in the text. Just
like a web search engine I suppose. The domain is restricted; phrases will
be specific to a single discipline. For example, let's say we've a database
containing these entries:

P-51D; fixed-wing, single-seat military aircraft
P-47D; fixed-wing, single-seat military aircraft
B-17G; fixed-wing, multi-seat military aircraft
Cessna 170; fixed-wing, multi-seat civilian aircraft
Bell UH-1B; rotary-wing, multi-seat military aircraft
...

Typical queries, along with their results, might be:

"fixed-wing, single-seat" => P-51D, P-47D
"multi-seat military" => B-17G, Bell UH-1B
"civilian" => Cessna 170
...
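One way to read "some or all of the words" is to score each entry by how many query words it shares and keep the best-scoring entries. The brute-force Python sketch below does that over the sample entries. The tokenizing rule (lower case, hyphenated terms such as "multi-seat" kept whole) and the names ENTRIES, words and best_matches are assumptions for illustration, not part of the original requirements.

```python
import re

# The sample entries from the post, as (name, description) pairs.
ENTRIES = [
    ("P-51D", "fixed-wing, single-seat military aircraft"),
    ("P-47D", "fixed-wing, single-seat military aircraft"),
    ("B-17G", "fixed-wing, multi-seat military aircraft"),
    ("Cessna 170", "fixed-wing, multi-seat civilian aircraft"),
    ("Bell UH-1B", "rotary-wing, multi-seat military aircraft"),
]


def words(text):
    """Lower-cased words of TEXT; hyphenated terms stay whole."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def best_matches(query, entries):
    """Names of the entries sharing the most words with QUERY.

    Returns an empty list when no entry shares any word with the query.
    """
    q = words(query)
    scored = [(len(q & words(desc)), name) for name, desc in entries]
    top = max((score for score, _ in scored), default=0)
    return [name for score, name in scored if score == top and top > 0]
```

With these assumptions, best_matches("multi-seat military", ENTRIES) returns ["B-17G", "Bell UH-1B"], and the other two example queries give the results listed above as well.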

I'm looking for Lisp code that will get me started here. I see that CLiki
contains a search engine and have downloaded it to have a look. The engine
needs to be "lightweight"; I'd like to target this software for machines
like handheld PDAs (Palm, ipaq). Right now I'm considering CLisp for the
Lisp environment, but haven't done enough research to know if CLisp is the
proper choice. I'll also be needing a lightweight web server for the app;
perhaps AllegroServe if it's small enough or some home-grown solution; don't
know yet.

I'm playing in a space I haven't been before (web stuff, search engines,
etc.), so apologies if anything here seems silly. I appreciate
suggestions/recommendations.

Cheers,
--
------------------------------------------
David E. Young
········@computer.org
http://lisa.sourceforge.net

"Those who expect to reap the blessings of liberty
 must undergo the fatigues of supporting it."
  -- Thomas Paine

"But all the world understands my language."
  -- Franz Joseph Haydn (1732-1809)

From: Gabe Garza
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <8z6qxwg1.fsf@anubis.kynopolis.org>
"David E. Young" <·······@nc.rr.com> writes:

> Greetings. I've a project involving some "natural language"-like search
> requirements. In other words, some UI component, possibly a browser, will
> allow free-text entry; the software takes this text and searches against a
> "database" for phrases containing some or all of the words in the text. Just
> like a web search engine I suppose. The domain is restricted; phrases will
> be specific to a single discipline. For example, let's say we've a database
> containing these entries:

This is obviously something that could get very complicated if it had
to handle huge datasets using reasonable resource amounts, but if
*very* lightweight (the dataset will be in the 10s of megabytes) is
OK, and it's easy to split the records into a list of "words", then
there's always the ultranaive approach (I've used this for "full text
searches" in small applications and it seems to work just fine...)

For each record, insert each word of the record into a hash table that
maps words to lists of records that contain the word.
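A minimal Python sketch of that indexing step, assuming each record is a name plus a short description and that a "word" is a lower-cased run of letters and digits, with hyphenated terms such as "fixed-wing" kept whole. It uses Python sets rather than Lisp lists, so a record is stored only once per word; in Common Lisp the table would come from MAKE-HASH-TABLE with :TEST #'EQUAL, since the keys are strings. RECORDS, tokenize and build_index are illustrative names.

```python
import re
from collections import defaultdict

# Sample records from the original post: name -> descriptive phrase.
RECORDS = {
    "P-51D": "fixed-wing, single-seat military aircraft",
    "P-47D": "fixed-wing, single-seat military aircraft",
    "B-17G": "fixed-wing, multi-seat military aircraft",
    "Cessna 170": "fixed-wing, multi-seat civilian aircraft",
    "Bell UH-1B": "rotary-wing, multi-seat military aircraft",
}


def tokenize(text):
    """Lower-case TEXT and split it into words, keeping hyphenated words whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def build_index(records):
    """Map each word to the set of record names whose phrase contains it."""
    index = defaultdict(set)
    for name, phrase in records.items():
        for word in tokenize(phrase):
            index[word].add(name)
    return index


index = build_index(RECORDS)
print(sorted(index["civilian"]))  # ['Cessna 170']
```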

Simple searches are easy: just look up the word in the hash table and return
the records that contain it.  More complicated lookups are also easy: to search
for word W1 and word W2, just take the INTERSECTION of the list of records
containing W1 and the list of records containing W2; for 'W1 or W2', just
take the UNION of the list of records containing W1 and the list of records
containing W2.
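The AND/OR lookups can be sketched the same way. This hypothetical Python example starts from a small hand-built index (word to set of record names) and uses set intersection and union in place of Common Lisp's INTERSECTION and UNION on lists; a word missing from the index simply contributes an empty set.

```python
# A tiny hand-built index (word -> set of record names), assumed to come from
# an indexing pass over the records.
INDEX = {
    "fixed-wing": {"P-51D", "P-47D", "B-17G", "Cessna 170"},
    "single-seat": {"P-51D", "P-47D"},
    "multi-seat": {"B-17G", "Cessna 170", "Bell UH-1B"},
    "military": {"P-51D", "P-47D", "B-17G", "Bell UH-1B"},
    "civilian": {"Cessna 170"},
}


def search_and(index, words):
    """Records containing every word: intersect the per-word record sets."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()


def search_or(index, words):
    """Records containing at least one word: union the per-word record sets."""
    return set().union(*(index.get(w, set()) for w in words))
```

For example, search_and(INDEX, ["multi-seat", "military"]) gives {"B-17G", "Bell UH-1B"}, matching the "multi-seat military" query from the original post.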

This isn't *that* space hungry, either.  The size of the hash table is
just going to be the number of distinct words in all the records, and
the size of the "lists of records" is just going to be however many
cons cells it takes to represent the list.
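To make the space estimate concrete: the table has one entry per distinct word, and the record lists together hold one element per (word, record) pair, assuming each record is listed at most once under a given word. A small Python helper that counts both, applied to a hypothetical three-word index:

```python
def index_size(index):
    """Return (distinct words, total postings) for a word -> records index.

    The first number is the hash table's entry count; the second is the total
    length of all record lists, i.e. roughly one cons cell per (word, record)
    pair in a list-based Lisp version.
    """
    return len(index), sum(len(records) for records in index.values())


# Hypothetical example index built from two short records.
example = {
    "fixed-wing": ["P-51D", "Cessna 170"],
    "military": ["P-51D"],
    "civilian": ["Cessna 170"],
}
print(index_size(example))  # (3, 4)
```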

I'm sure to people who actually know what they're doing this is a
pretty asinine way of doing it, but if your needs are as simple as you
say maybe that'll be OK. A simple implementation takes about as many
characters as I've taken to explain it. ;)

Gabe Garza
From: David E. Young
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <PbhD8.65176$gd5.26923475@typhoon.southeast.rr.com>
Yes, this is what I'd hoped I would learn. At this time the datasets look
like they'll certainly be within the 10s of megabytes, and I really like the
idea of an algorithm such as Gabe proposes.

I don't know if CLisp runs on PDAs; haven't thought that far yet. I'm
prototyping on a desktop machine at this point just to see if the project is
feasible. Maybe I'll need to look at one of the embedded Lisps. Frankly, the
world of PDA development is new to me also so I've some learning to do there
as well. But, a workable algorithm to do data retrieval is a good place to
start.

Don't know about a real database. At this time I'd say that's out, since a
goal of the project is simplicity; requiring a wireless LAN to access a
database is probably excessive at this point. And yes, Fernando, it is indeed
Information Retrieval.

Perhaps others will weigh in on this topic; thanks for the information
received thus far.

Cheers,
From: lin8080
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <3CE57B7D.C93E5509@freenet.de>
"David E. Young" wrote:

> I don't know if CLisp runs on PDAs; haven't thought that far yet. I'm
> prototyping on a desktop machine at this point just to see if the project is
> feasible. Maybe I'll need to look at one of the embedded Lisps. Frankly, the
> world of PDA development is new to me also so I've some learning to do there
> as well. But, a workable algorithm to do data retrieval is a good place to
> start.

http://www.winikoff.net/palm/dev.html
there should also be an interesting links page there:

  pippy       beta Python project for Palm
  lispme      Scheme for Palm, with graphics
  PalmLog     Prolog-like interpreter for Palm
  OnboardC    C compiler that runs on the Palm
  poplet      JavaScript for Palm
  ...

DB2 Everyplace  links IBM's DB2 to handheld applications (Windows CE and
Psion devices)

hope it helps
stefan

There is something going on with Java.
From: Fernando Rodríguez
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <clsqduklq5j7g2g7b6bo653fq9igpqfs4j@4ax.com>
On Sat, 11 May 2002 18:20:35 GMT, "David E. Young" <·······@nc.rr.com> wrote:

>Greetings. I've a project involving some "natural language"-like search
>requirements. In other words, some UI component, possibly a browser, will
>allow free-text entry; the software takes this text and searches against a
>"database" for phrases containing some or all of the words in the text. Just
>like a web search engine I suppose. The domain is restricted; phrases will
>be specific to a single discipline. For example, let's say we've a database
>containing these entries:

This is actually Information Retrieval and not NLP. :-) 

If you are planning to implement the 'search-engine like' db (in Lisp or any
other language), you'd better forget it; it's _really_ complicated.

You could use a Prolog-like db if you are _absolutely_ sure that the number of
records will always be very small.

Given your problem description, it doesn't look like you need a full-text
database (the 'search-engine like' db) at all. You would only need it if the
records were mostly large chunks of text that you wanted to index word by
word, such as web pages, journal articles, books, etc. Consider a full-text
database if you are planning to do something similar to a concordance.

I would use an SQL database and let the user build an SQL query with the
GUI: combo boxes for fixed-wing, rotary-wing, etc. IMHO this makes more sense,
since the 'vocabulary' of your db will be very restricted. Of course, you can
also write an interpreter for a specific query language that translates to
SQL, but I don't think it's worth the trouble.
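A minimal sketch of the combo-box approach, using SQLite through Python's standard sqlite3 module as a stand-in for "an SQL database" (Fernando does not name one). The schema, with one column per attribute a combo box would offer, and the names ALLOWED_COLUMNS and query_from_combos are hypothetical. Each combo box contributes one equality test; values are passed as query parameters, and column names are checked against a whitelist because SQL parameters cannot stand in for identifiers.

```python
import sqlite3

# Hypothetical schema: one column per attribute a combo box would offer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aircraft (name TEXT, wing TEXT, seats TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO aircraft VALUES (?, ?, ?, ?)",
    [
        ("P-51D", "fixed-wing", "single-seat", "military"),
        ("P-47D", "fixed-wing", "single-seat", "military"),
        ("B-17G", "fixed-wing", "multi-seat", "military"),
        ("Cessna 170", "fixed-wing", "multi-seat", "civilian"),
        ("Bell UH-1B", "rotary-wing", "multi-seat", "military"),
    ],
)

ALLOWED_COLUMNS = {"wing", "seats", "role"}


def query_from_combos(conn, selections):
    """Return aircraft names matching {column: chosen value}.

    Combos left unset (None) are not constrained.  Column names are checked
    against a whitelist; values go through '?' parameters.
    """
    clauses, params = [], []
    for column, value in selections.items():
        if value is None:
            continue
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT name FROM aircraft"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY rowid"  # keep insertion order for predictable output
    return [row[0] for row in conn.execute(sql, params)]
```

With this data, query_from_combos(conn, {"seats": "multi-seat", "role": "military"}) returns ["B-17G", "Bell UH-1B"].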

>I'm looking for Lisp code that will get me started here. I see that CLiki
>contains a search engine and have downloaded it to have a look. The engine
>needs to be "lightweight"; I'd like to target this software for machines

I really don't think this is the best solution for you, but if you want to go
the full-text db way, I suggest you read this book: 'Managing Gigabytes'.

>like handheld PDAs (Palm, ipaq). Right now I'm considering CLisp for the

Would the db be on the PDA or on some remote system? Is CLisp available for
Palm? O:-)

-----------------------
Fernando Rodriguez
From: Paolo Amoroso
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <6OPfPITDUZe12rR78KnjpXfqH9g6@4ax.com>
On Sat, 11 May 2002 18:20:35 GMT, "David E. Young" <·······@nc.rr.com>
wrote:

> I'm looking for Lisp code that will get me started here. I see that CLiki
> contains a search engine and have downloaded it to have a look. The engine
> needs to be "lightweight"; I'd like to target this software for machines
> like handheld PDAs (Palm, ipaq). Right now I'm considering CLisp for the

Note that no Common Lisp implementation is available for Palm OS devices.


Paolo
-- 
EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
http://www.paoloamoroso.it/ency/README
[http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/]
From: David E. Young
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <FxYD8.82518$gd5.30634695@typhoon.southeast.rr.com>
Yes, that's what it looks like. What about Windows CE devices? Anybody have
any info regarding Lisp on those platforms? Thanks...

From: Jochen Schmidt
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <abpmr2$35f$1@rznews2.rrze.uni-erlangen.de>
David E. Young wrote:

> Yes, that's what it looks like. What about Windows CE devices? Anybody
> have any info regarding Lisp on those platforms? Thanks...

I don't know about Windows CE, but as for Linux:

http://web.njit.edu/~rxt1077/clisp-maxima-zaurus.html

This site shows that CLISP runs on the Sharp Zaurus, an ARM-based PDA that
runs Linux.

ciao,
Jochen

--
http://www.dataheaven.de
From: Paolo Amoroso
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <KXDiPFuBEnGB6GHdv4ntDawW2HNG@4ax.com>
On Tue, 14 May 2002 00:05:57 GMT, "David E. Young" <·······@nc.rr.com>
wrote:

> Yes, that's what it looks like. What about Windows CE devices? Anybody have
> any info regarding Lisp on those platforms? Thanks...

CLISP, together with Maxima, was ported to a handheld device, but I can't
remember its name right now. You may try Google with "clisp maxima handheld
port", or something like that.


Paolo
From: Marc Mertens
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <pan.2002.05.15.19.07.16.976807.308@vt4.net>
Try the following link, which has a port of GCL and Maxima to the Pocket PC.
I have not tried it myself.

http://www.rainer-keuchel.de/wince/gcl-ce.html

Marc Mertens
On Wed, 15 May 2002 16:22:03 +0000, Paolo Amoroso wrote:


> On Tue, 14 May 2002 00:05:57 GMT, "David E. Young" <·······@nc.rr.com>
> wrote:
> 
>> Yes, that's what it looks like. What about Windows CE devices? Anybody
>> have any info regarding Lisp on those platforms? Thanks...
> 
> CLISP, together with Maxima, was ported to a handheld device, but I
> can't remember its name right now. You may try Google with "clisp maxima
> handheld port", or something like that.
> 
> 
> Paolo
From: Peter Wiehe
Subject: Re: Lightweight search engine for Lisp
Date: 
Message-ID: <ec028e1f.0205200747.5d7050f8@posting.google.com>
David, you have very little memory to work with (on a PDA), so you want a
simple algorithm. You only spoke about searching for a phrase (not about
needing SQL specifically, etc.). Maybe just a simple text search, even
without regular expressions, would do for you?! If you have influence over
the format of the data file (and organize it cleverly), you can perhaps make
this simple search faster and smarter. If that was TOO simple, forget it :)

Regards
Peter Wiehe
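A sketch of that kind of plain text search in Python, assuming a hypothetical data file with one record per line: a line is kept if it contains every query word as a plain substring (no regular expressions, no index). The names simple_search and DATA_LINES are illustrative. Note that substring matching is looser than word matching; "wing" would also match "fixed-wing".

```python
def simple_search(lines, query):
    """Return the lines that contain every query word as a plain substring.

    One pass over the data, case-insensitive, no regular expressions.
    Commas in the query are treated as word separators.
    """
    terms = query.lower().replace(",", " ").split()
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        lowered = text.lower()
        if all(term in lowered for term in terms):
            hits.append(text)
    return hits


# Hypothetical data "file" with one record per line.
DATA_LINES = [
    "P-51D; fixed-wing, single-seat military aircraft\n",
    "Cessna 170; fixed-wing, multi-seat civilian aircraft\n",
    "Bell UH-1B; rotary-wing, multi-seat military aircraft\n",
]
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.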