Dear all,
I need to do searches on strings that may not be a perfect match, e.g.
that are close but different because of misspellings or because of a
juxtaposition of letters or only a part of the string.
I have heard about libraries that do this (I believe some Lisps and
Prolog systems replace syntactically/semantically incorrect symbols
with correct ones automatically), but haven't found any references or
(more importantly) source code on the 'net. Does someone have a
reference to this kind of software?
Thanks,
Doug Auclair
>>>>> "Douglas" == Douglas M Auclair <········@hotmail.com> writes:
Douglas> Dear all, I need to do searches on strings that may not
Douglas> be a perfect match, e.g. that are close but different
Douglas> because of misspellings or because of a juxtaposition of
Douglas> letters or only a part of the string.
The agrep program do this. But it is not coded in Lisp.
ftp://ftp.cs.arizona.edu/agrep/
For an Ocaml version see
http://cristal.inria.fr/~xleroy/software.html#agrep
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net
alias: basile<at>tunes<dot>org
8, rue de la Fa�encerie, 92340 Bourg La Reine, France
You might search for DWIM ("do what I mean") which was a popular name for
this kind of user assistance. There was quite a bit of research and
implementation in this area in the 80's and perhaps earlier. People worked
out heuristics for determining the most likely kind of typos based on
ergonomics and keyboard layout (transpositions, touch typing with the wrong
hand, nearby characters, etc) as well as common kinds of misspellings, and
could thus generate lists of possible items you might have meant, and
possibly filter those lists against symbols already known. There certainly
were lisp implementations of some of these (I remember discussing the design
of one in some company that I worked for and is long since gone.) Here
(http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?DWIM )Interlisp DWIM is
mentioned, along with an amusing anecdote.
If you're looking for approximate string matching algorithms in general, Dan
Gusfield's book "Algorithms on Strings, Trees and Sequences" might be
helpful.
"Douglas M. Auclair" <········@hotmail.com> wrote in message
································@posting.google.com...
> Dear all,
>
> I need to do searches on strings that may not be a perfect match, e.g.
> that are close but different because of misspellings or because of a
> juxtaposition of letters or only a part of the string.
········@hotmail.com (Douglas M. Auclair) writes:
> I need to do searches on strings that may not be a perfect match, e.g.
> that are close but different because of misspellings or because of a
> juxtaposition of letters or only a part of the string.
As a goof I wrote a "fuzzy" apropos which may be useful to you. It
matches strings and symbols below a certain edit distance threshold
using the classic Levenshtein algorithm.
ftp://ftp.hlrz.com/pub/fuzzy-apropos.lisp
The implementation of the Levenshtein algorithm is not optimal for
repeated comparisons. Fortunately, that would not be hard to correct.