"You meant" search software?

From: Douglas M. Auclair
Subject: "You meant" search software?
Date: Wed, 19 Feb 2003 22:29:49 +0000
Message-ID: <49f7a00.0302191429.536a649c@posting.google.com>

Dear all, 

I need to do searches on strings that may not be a perfect match, e.g.
that are close but different because of misspellings or because of a
juxtaposition of letters or only a part of the string.

I have heard about libraries that do this (I believe some Lisps and
Prolog systems replace syntactically/semantically incorrect symbols
with correct ones automatically), but haven't found any references or
(more importantly) source code on the 'net.  Does someone have a
reference to this kind of software?

Thanks, 
Doug Auclair

Re: "You meant" search software? Basile STARYNKEVITCH
Re: "You meant" search software? Jeff Greif
Re: "You meant" search software? Carl Shapiro
- Re: Awesome search stuff! Douglas M. Auclair

From: Basile STARYNKEVITCH
Subject: Re: "You meant" search software?
Date: Wed, 19 Feb 2003 22:43:30 +0000
Message-ID: <q5r3cmjriwd.fsf@hector.lesours>

>>>>> "Douglas" == Douglas M Auclair <········@hotmail.com> writes:

    Douglas> Dear all, I need to do searches on strings that may not
    Douglas> be a perfect match, e.g.  that are close but different
    Douglas> because of misspellings or because of a juxtaposition of
    Douglas> letters or only a part of the string.

The agrep program do this. But it is not coded in Lisp.

ftp://ftp.cs.arizona.edu/agrep/

For an Ocaml version see
http://cristal.inria.fr/~xleroy/software.html#agrep


-- 

Basile STARYNKEVITCH         http://starynkevitch.net/Basile/ 
email: basile<at>starynkevitch<dot>net 
alias: basile<at>tunes<dot>org 
8, rue de la Fa�encerie, 92340 Bourg La Reine, France

From: Jeff Greif
Subject: Re: "You meant" search software?
Date: Thu, 20 Feb 2003 00:39:53 +0000
Message-ID: <ttV4a.169805$tq4.4124@sccrnsc01>

You might search for DWIM ("do what I mean") which was a popular name for
this kind of user assistance.  There was quite a bit of research and
implementation in this area in the 80's and perhaps earlier.  People worked
out heuristics for determining the most likely kind of typos based on
ergonomics and keyboard layout (transpositions, touch typing with the wrong
hand, nearby characters, etc) as well as common kinds of misspellings, and
could thus generate lists of possible items you might have meant, and
possibly filter those lists against symbols already known.  There certainly
were lisp implementations of some of these (I remember discussing the design
of one in some company that I worked for and is long since gone.)  Here
(http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?DWIM )Interlisp DWIM is
mentioned, along with an amusing anecdote.

If you're looking for approximate string matching algorithms in general, Dan
Gusfield's book "Algorithms on Strings, Trees and Sequences" might be
helpful.

"Douglas M. Auclair" <········@hotmail.com> wrote in message
································@posting.google.com...
> Dear all,
>
> I need to do searches on strings that may not be a perfect match, e.g.
> that are close but different because of misspellings or because of a
> juxtaposition of letters or only a part of the string.

From: Carl Shapiro
Subject: Re: "You meant" search software?
Date: Thu, 20 Feb 2003 07:53:24 +0000
Message-ID: <ouy8ywb75hn.fsf@panix3.panix.com>

········@hotmail.com (Douglas M. Auclair) writes:

> I need to do searches on strings that may not be a perfect match, e.g.
> that are close but different because of misspellings or because of a
> juxtaposition of letters or only a part of the string.

As a goof I wrote a "fuzzy" apropos which may be useful to you.  It
matches strings and symbols below a certain edit distance threshold
using the classic Levenshtein algorithm.

	ftp://ftp.hlrz.com/pub/fuzzy-apropos.lisp

The implementation of the Levenshtein algorithm is not optimal for
repeated comparisons.  Fortunately, that would not be hard to correct.

From: Douglas M. Auclair
Subject: Re: Awesome search stuff!
Date: Thu, 20 Feb 2003 21:17:22 +0000
Message-ID: <49f7a00.0302201317.56d77af6@posting.google.com>

Dear all who responded,

Y'all're AWESOME!  Thank you for the suggestions and the source code
implementations.

Sincerely,
Doug Auclair