www.fast-index.com

From: Nick Levine
Subject: www.fast-index.com
Date: Wed, 20 Jun 2001 09:10:15 +0000
Message-ID: <c2e2603d.0106200110.4f27fdac@posting.google.com>

Hi.

I'd be interested in people's opinions of this site. I've already
mailed about it to Lispweb and the response, which was most useful,
included the suggestion that I also post to CLL.

I originally built this site with a possible commercial demo in mind;
the demo never materialised so I converted the site into something
more "academic", bought a domain, started the server and left it
running. It's been up for a few months but doesn't get much traffic,
so I'd appreciate it if everyone would visit the site simultaneously
and then I can see how it stands up to load. (Ha!)

My intentions in producing this site (other than the commercial
possibilities) were:
i) to explore the limits of web presentation without use of graphics
ii) to write a search engine and discover where the pain lay
iii) to do it all in lisp (well, almost all)

The end results have been fun to play with. The code needs a little
cleaning up, which I hope to do in the next month or so, after which I
will be happy to open-source it. If anyone thinks there's any mileage
in me doing this please let me know, otherwise I'll find something
else to do with my time.

Best regards,

- nick

[Apologies if you're seeing this for a second time. I originally
posted from apu.ac.uk, but it doesn't appear to have worked so I'm
trying Google now.]

From: Reini Urban
Subject: Re: www.fast-index.com
Date: Fri, 22 Jun 2001 16:14:42 +0000
Message-ID: <3b336d5c.1407538463@news.tu-graz.ac.at>

Nick Levine wrote:
>I'd be interested in people's opinions of this site. I've already
>mailed about it to Lispweb and the response, which was most useful,
>included the suggestion that I also post to CLL.
>
>The end results have been fun to play with. The code needs a little
>cleaning up, which I hope to do in the next month or so, after which I
>will be happy to open-source it. If anyone thinks there's any mileage
>in me doing this please let me know, otherwise I'll find something
>else to do with my time.

I like it. Especially the verb forms. "forget" finds "forgotten" and so
on. not just a silly agrep or soundex grep.

Did you compare it to lambda-vista or the usual popular indexers around,
benchmark wise? The features I can see, but not the numbers.

How many KB text is there?

Why not mod_lisp? (okay, the new comparable mod_lisp 2.0 just arrived
now)
-- 
Reini Urban
http://xarch.tu-graz.ac.at/autocad/news/faq/autolisp.html

From: Nick Levine
Subject: Re: www.fast-index.com
Date: Mon, 25 Jun 2001 09:46:00 +0000
Message-ID: <c2e2603d.0106250146.28fc72a7@posting.google.com>

······@x-ray.at (Reini Urban) wrote in message news:<···················@news.tu-graz.ac.at>...
> Nick Levine wrote:
> >I'd be interested in people's opinions of this site. I've already
> >mailed about it to Lispweb and the response, which was most useful,
> >included the suggestion that I also post to CLL.
> >
> >The end results have been fun to play with. The code needs a little
> >cleaning up, which I hope to do in the next month or so, after which I
> >will be happy to open-source it. If anyone thinks there's any mileage
> >in me doing this please let me know, otherwise I'll find something
> >else to do with my time.

Looks like nobody did. I'll archive it when my current hosting
arrangement ceases at the end of July.

> I like it. Especially the verb forms. "forget" finds "forgotten" and so
> on. not just a silly agrep or soundex grep.

Thanks.

I spent a couple of days messing with a download of Webster's
dictionary and hauling out irregular verbs etc. This projects 6000
words like "forgotten" into around 2000 like "forget". I always
intended to put Shakespeare onto the site as well but the thought of
getting the grammar right horrified me so I backed off.

But most of it is handled really successfully by a dead simple lookup
list:

(defparameter *simple-endings*
  '(("e"    . "")
    ("ed"   . "")
    ("ing"  . "")
    ("ings" . "")
    ("ies"  . "y")
    ("ves"  . "f")
    ("ches" . "ch")
    ("xes"  . "x")
    ("ses"  . "s")
    ("ier"  . "y")
    ("er"   . "")
    ("ers"  . "")
    ("iest" . "y")
    ("est"  . "")
    ("ests" . "")
    ("es"   . "")
    ("s"    . "")))

It only rarely does something daft (but when it does the result is
embarrassing).
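
For illustration, here is a minimal sketch of how a table like this
might be applied. It is not the site's actual code: the post does not
say how the list is consulted, so the function name
STRIP-SIMPLE-ENDING, the longest-match rule and the MIN-STEM guard are
all assumptions, and the table is an abbreviated copy so that the
snippet loads on its own.

```lisp
;; Abbreviated copy of the endings table (same pair format as above).
(defparameter *sample-endings*
  '(("ing"  . "")
    ("ed"   . "")
    ("ies"  . "y")
    ("ches" . "ch")
    ("es"   . "")
    ("s"    . "")))

;; Hypothetical helper, not from the original post.
;; Assumptions: the longest matching ending wins, and at least
;; MIN-STEM characters must remain in front of the ending.
(defun strip-simple-ending (word &key (endings *sample-endings*)
                                      (min-stem 2))
  "Return WORD with its longest matching ending replaced, or WORD itself."
  (let ((best nil))
    (dolist (entry endings)
      (let* ((ending (car entry))
             (stem-length (- (length word) (length ending))))
        (when (and (>= stem-length min-stem)
                   ;; Compare ENDING against the tail of WORD.
                   (string= ending word :start2 stem-length)
                   (or (null best)
                       (> (length ending) (length (car best)))))
          (setf best entry))))
    (if best
        (concatenate 'string
                     (subseq word 0 (- (length word) (length (car best))))
                     (cdr best))
        word)))
```

Under these assumptions (strip-simple-ending "walking") gives "walk",
"ponies" gives "pony" and "churches" gives "church". The daft cases
show up just as easily: "bus" comes back as "bu".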

> Did you compare it to lambda-vista or the usual popular indexers around,
> benchmark wise? The features I can see, but not the numbers.
> 
> How many KB text is there?

Currently: 11 volumes containing a total of 4.3 MB text. Approx
750,000 words, of which 18000 are (barring plurals etc) distinct.

I could probably double the library - if I could be bothered to -
without stretching the machine. More than that and I might want to
think about better compression for the indexing information (currently
4 or 5 times the size of the text itself). What I have in mind is a
couple of days' work at most, and would reduce the index to the size
of the text.

> Why not mod_lisp? (okay, the new comparable mod_lisp 2.0 just arrived
> now)

It wasn't around when I did the work. I would probably have used it,
because it meets my needs. OTOH, the C layer I wrote only took a
couple of hours to get working, so it wasn't that important.

- nick