Hi folks,
I'm trying to implement a phonetic string matching algorithm called Phonix
(a variant of the classic Soundex algorithm). The first part of Phonix
involves several dozen character substitutions. For example, all instances
of ths subsequence "lough" are replaced with "low".
How would one go about making this type of swap? Because the subsequences
are different lengths, the simple solutions (e.g., replace) won't work.
It seems to me one has to resort to setf and subseq, but even this does not
quite work in one step:
Given: "Fairclough"
> (setf foo "Fairclough")
"Fairclough"
> (setf (subseq foo (search "lough" foo)) "low")
"Fairclowgh"
Is there a better way to do this? Is the only solution to build a
replacement string from scratch, rather than substituting subsequences?
Thanks.
-bruce
"Bruce L. Lambert" <········@uic.edu> writes:
> I'm trying to implement a phonetic string matching algorithm called Phonix
> (a variant of the classic Soundex algorithm). The first part of Phonix
> involves several dozen character substitutions. ...
>
> Is there a better way to do this? Is the only solution to build a
> replacement string from scratch, rather than substituting subsequences?
Bruce,
Your basic problem is that strings aren't the right choice of data
structure, given the operation you want to perform. If you really need
ELT to be O(1), then you can either make a new string and forget about
sharing structure or use an array of characters with a fill-pointer so
that you can adjust the length and shift the end of the string around as
required. If you can settle for ELT being O(n), using lists of characters
instead of strings makes reusing structure in the substitution
straightforward and much less expensive.
-- rar
This question has reminded me of something I was gonna ask when I
first moved from elisp to CL:
Is there a library out there for REGEXP string manipulations in Common
Lisp? that would be useful. If that existed, then Bruce would be
set too.
I wonder if Eric Naggum knows anything, since he's involved with
trying to make a CLEmacs, and since the regexp stuff is internal, I
suppose he's had to deal with this issue.
dave
From: John Atwood
Subject: Re: regexps and string replace (Help Eric!)
Date:
Message-ID: <6klhhl$mpg$1@news.NERO.NET>
David Bakhash <·····@bu.edu> wrote:
>Is there a library out there for REGEXP string manipulations in Common
>Lisp? that would be useful. If that existed, then Bruce would be
>set too.
Try:
http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/match/0.html
John
In article <············@news.NERO.NET> ·······@ice.CS.ORST.EDU (John Atwood) writes:
David Bakhash <·····@bu.edu> wrote:
>Is there a library out there for REGEXP string manipulations in Common
>Lisp? that would be useful. If that existed, then Bruce would be
>set too.
Try:
http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/match/0.html
There is a better lisp regexp matcher at
ftp://ftp.digitool.com/pub/mcl/contrib/
Sunil
In article <···············@clairmont.cc.gatech.edu>,
·······@clairmont.cc.gatech.edu (Sunil Mishra) writes:
> There is a better lisp regexp matcher at
> ftp://ftp.digitool.com/pub/mcl/contrib/
>
> Sunil
Ah, which one?
Mike McDonald
·······@mikemac.com
In article <···················@news.teleport.com> ·······@mikemac.com (Mike McDonald) writes:
In article <···············@clairmont.cc.gatech.edu>,
·······@clairmont.cc.gatech.edu (Sunil Mishra) writes:
> There is a better lisp regexp matcher at
> ftp://ftp.digitool.com/pub/mcl/contrib/
>
> Sunil
Ah, which one?
Just look for the string regex, there is a version 1.01 which is the
latest. The package might be unsupported though.
Sunil
From: David Bakhash
Subject: Re: regexps and string replace (Help Eric!)
Date:
Message-ID: <cxjn2c0ope1.fsf@hawk.bu.edu>
·······@clairmont.cc.gatech.edu (Sunil Mishra) writes:
> Try:
> http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/match/0.html
>
>
> There is a better lisp regexp matcher at
> ftp://ftp.digitool.com/pub/mcl/contrib/
Somehow I don't see how this is better, especially in terms of
user-interface. Is it a matter of speed? It definately appears to be
less efficent and less intuitive than the other, which is based on
strings, and thus resembles Perl/Emacs.
dave
In article <···············@hawk.bu.edu> David Bakhash <·····@bu.edu> writes:
Somehow I don't see how this is better, especially in terms of
user-interface. Is it a matter of speed? It definately appears to be
less efficent and less intuitive than the other, which is based on
strings, and thus resembles Perl/Emacs.
dave
Suddenly I wonder if we are talking about the same program...
ftp://ftp.digitool.com/pub/mcl/contrib/regexp1.01.sit.hqx
I haven't used it in a while, but as I remember it, one can either match a
string to an expression, or compile an expression into a function that can
then be used for matching. From the CMU archive, I had at one point
downloaded nregex. (I was not looking for something that was a front end
for the unix utility, and hopefully for something that would compile.) I
found the program somewhat broken, with much less support for expressions
than the one in the digitool archives.
(As I look again at what had been suggested...)
Nope, there isn't anything new there that I might have missed.
Sunil
From: Jim Veitch
Subject: Re: regexps and string replace (Help Eric!)
Date:
Message-ID: <356EF662.1F90@franz.com>
David Bakhash wrote:
>
> This question has reminded me of something I was gonna ask when I
> first moved from elisp to CL:
>
> Is there a library out there for REGEXP string manipulations in Common
> Lisp? that would be useful. If that existed, then Bruce would be
> set too.
>
> I wonder if Eric Naggum knows anything, since he's involved with
> trying to make a CLEmacs, and since the regexp stuff is internal, I
> suppose he's had to deal with this issue.
>
> dave
Franz has a regexp package which you can get from Franz' FTP site
ftp://ftp.franz.com/pub/regexp which runs in Allegro CL 4.3.x under UNIX
and ACL 3.0.2 under Windows.
We have integrated this into the product in our new Allegro CL 5.0
release, which runs under both UNIX and Windows.
Jim Veitch
David Bakhash <·····@bu.edu> writes:
[...]
> Is there a library out there for REGEXP string manipulations in Common
> Lisp? that would be useful.
[...]
> dave
While not exactly regular expressions, the META technique described in
Henry G. Baker's article "Pragmatic Parsing in Common Lisp", ACM LISP
POinters IV,2 (April-June 1991),3-15 (available for browsing or
download from ftp://ftp.netcom.com/pub/hb/hbaker/home.html) is, in my
experience, quite useful and flexible, though a little less terse than
the conventional syntax for regular expressions. Here's the abstract
of his paper:
We review META, a classic technique for building recursive descent
parsers, that is both simple and efficient. While META does not handle
all possible regular or context-free grammars, it handles a
surprisingly large fraction of the grammars encountered by Lisp
programmers. We show how META can be used to parse streams, strings
and lists--including Common Lisp's hairy lambda expression parameter
lists. Finally, we compare the execution time of this parsing method
to the built-in methods of Common Lisp.
META parsers can be used for pulling strings apart, just like regular
expressions.
The paper includes an actual implementation.
--
T. Kurt Bond, ···@access.mountain.net
David Bakhash <·····@bu.edu> wrote:
> Is there a library out there for REGEXP string manipulations in Common
> Lisp? that would be useful.
Several implementations have this built-in or as an add-on. For example,
CLISP has it as a separate package. To have it included, you have to
build the `full' (not the `base') binaries when you downloaded a binary
distribution of CLISP.
Bruno http://clisp.cons.org/
Bruce Lambert asks:
For example, all instances of ths subsequence "lough" are replaced with
"low". How would one go about making this type of swap? Because the
subsequences are different lengths, the simple solutions (e.g., replace)
won't work. ... Is there a better way to do this? Is the only solution to
build a replacement string from scratch, rather than substituting
subsequences?
I find that this has come up for me several times, so I wrote a string
utility that does the equivalent of SUBSTITUTE:
(defun string-substitute (string substring replacement-string)
(let ((substring-length (length substring))
(last-end 0)
(new-string ""))
(do ((next-start
(search substring string)
(search substring string :start2 last-end)))
((null next-start)
(concatenate 'string new-string (subseq string last-end)))
(setq new-string
(concatenate 'string
new-string
(subseq string last-end next-start)
replacement-string))
(setq last-end (+ next-start substring-length)))))
Hope this does the trick for you.
Larry