Re: matching strings with probabilities

From: Volkan YAZICI
Subject: Re: matching strings with probabilities
Date: Fri, 27 Jun 2008 12:55:21 +0000
Message-ID: <4f1018d9-18cc-4242-975a-9c3983607262@j22g2000hsf.googlegroups.com>

On Jun 27, 3:35 pm, Francogrex <······@grex.org> wrote:
> Hello, I have a question about matching strings.
> Suppose I have the following strings:
> "tets"
> "estt"
> "rtes7"
> "gstes"
> "tes5t"
> Is there a straightforward and simple Lisp procedure to determine how
> related each string is to the reference string "test", for example to
> say that "tets" is similar to "test" with a probability of 0.9 or
> something of that sort? Thanks

What you are looking for is a string distance metric. There are many
algorithms to choose from, but Levenshtein (edit) distance is the most
commonly used one: it counts the single-character insertions,
deletions and substitutions needed to turn one string into the other.
IIRC there are other distance metric implementations on cliki[1]; see
util.lisp in bk-tree[2] for a pretty well optimized one.
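
To make that concrete, here is a minimal, unoptimized sketch of
Levenshtein distance. The names LEVENSHTEIN and SIMILARITY are mine
and are not taken from util.lisp. SIMILARITY is only one possible way
(my assumption) of scaling the distance into a 0..1 score; it is not a
probability in any strict sense.

```lisp
;; Sketch only: LEVENSHTEIN and SIMILARITY are hypothetical names,
;; not the functions shipped in bk-tree's util.lisp.
(defun levenshtein (a b)
  "Return the number of single-character insertions, deletions and
substitutions needed to turn string A into string B."
  (let* ((n (length b))
         (prev (make-array (1+ n)))
         (curr (make-array (1+ n))))
    ;; Row 0: the empty prefix of A becomes B[0..j) with j insertions.
    (dotimes (j (1+ n))
      (setf (aref prev j) j))
    (dotimes (i (length a))
      ;; Column 0: A[0..i] becomes the empty string with i+1 deletions.
      (setf (aref curr 0) (1+ i))
      (dotimes (j n)
        (setf (aref curr (1+ j))
              (min (1+ (aref prev (1+ j)))   ; deletion
                   (1+ (aref curr j))        ; insertion
                   (+ (aref prev j)          ; substitution or match
                      (if (char= (char a i) (char b j)) 0 1)))))
      ;; The row just filled becomes the previous row for the next I.
      (rotatef prev curr))
    (aref prev n)))

(defun similarity (a b)
  "Scale the edit distance into [0, 1]; 1 means the strings are equal.
The scaling by the longer length is an assumption, not a standard."
  (let ((longest (max (length a) (length b))))
    (if (zerop longest)
        1
        (- 1 (/ (levenshtein a b) longest)))))
```

With these definitions, (levenshtein "test" "tes5t") is 1 and
(levenshtein "test" "tets") is 2, because plain Levenshtein counts a
swap of two adjacent characters as two substitutions. If transpositions
should cost 1, Damerau-Levenshtein is the variant to look at.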

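BTW, if you will be running such similarity lookups often against the
same set of strings, a bk-tree[2] can speed them up considerably: it
indexes the strings by their mutual distances, so a query does not
have to be compared against every stored string.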
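
A rough sketch of the idea, in case it helps. This is not the code
from the bk-tree library; BK-NODE, BK-INSERT, BK-SEARCH and
EDIT-DISTANCE are hypothetical names, and the metric is passed in as a
function so any true metric could be used.

```lisp
;; Sketch only: none of these names come from the bk-tree library.
(defun edit-distance (a b)
  "Plain Levenshtein distance between strings A and B."
  (let* ((n (length b))
         (prev (make-array (1+ n)))
         (curr (make-array (1+ n))))
    (dotimes (j (1+ n)) (setf (aref prev j) j))
    (dotimes (i (length a))
      (setf (aref curr 0) (1+ i))
      (dotimes (j n)
        (setf (aref curr (1+ j))
              (min (1+ (aref prev (1+ j)))
                   (1+ (aref curr j))
                   (+ (aref prev j)
                      (if (char= (char a i) (char b j)) 0 1)))))
      (rotatef prev curr))
    (aref prev n)))

(defstruct (bk-node (:constructor make-bk-node (word)))
  word
  (children '()))   ; alist of (distance . child-node)

(defun bk-insert (node word distance)
  "Add WORD below NODE, using the metric function DISTANCE."
  (let ((d (funcall distance word (bk-node-word node))))
    (unless (zerop d)               ; distance 0: word is already stored
      (let ((child (cdr (assoc d (bk-node-children node)))))
        (if child
            (bk-insert child word distance)
            (push (cons d (make-bk-node word))
                  (bk-node-children node))))))
  node)

(defun bk-search (node query tolerance distance)
  "Return all words stored under NODE within TOLERANCE of QUERY."
  (let ((d (funcall distance query (bk-node-word node)))
        (hits '()))
    (when (<= d tolerance)
      (push (bk-node-word node) hits))
    ;; Triangle inequality: only subtrees whose edge label lies in
    ;; [d - tolerance, d + tolerance] can contain a match.
    (loop for (k . child) in (bk-node-children node)
          when (<= (- d tolerance) k (+ d tolerance))
            do (setf hits
                     (nconc (bk-search child query tolerance distance)
                            hits)))
    hits))

;; Build a tree from the strings in the question.
(defparameter *tree*
  (let ((root (make-bk-node "tets")))
    (dolist (w '("estt" "rtes7" "gstes" "tes5t") root)
      (bk-insert root w #'edit-distance))))
```

(bk-search *tree* "test" 1 #'edit-distance) then returns ("tes5t"),
the only stored string within one edit of "test". The pruning only
pays off with a real metric and a small tolerance; with a large
tolerance almost every subtree has to be visited anyway.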

Regards.

[1] http://cliki.net/
[2] http://cliki.net/bk-tree