From: Bruce L. Lambert
Subject: replacing subsequences of different lengths
Date: 
Message-ID: <01bd8a75$66fff8a0$dd4df880@ry.pharm.uic.edu>
Hi folks,

I'm trying to implement a phonetic string matching algorithm called Phonix
(a variant of the classic Soundex algorithm). The first part of Phonix
involves several dozen character substitutions. For example, all instances
of ths subsequence "lough" are replaced with "low".

How would one go about making this type of swap? Because the subsequences
are different lengths, the simple solutions (e.g., replace) won't work.

It seems to me one has to resort to setf and subseq, but even this does not
quite work in one step:

Given: "Fairclough"

> (setf foo "Fairclough")

"Fairclough"

> (setf (subseq foo (search "lough" foo)) "low")

"Fairclowgh"

Is there a better way to do this? Is the only solution to build a
replacement string from scratch, rather than substituting subsequences?

Thanks.

-bruce

From: Bob Riemenschneider
Subject: Re: replacing subsequences of different lengths
Date: 
Message-ID: <tpk976m3i8.fsf@violet.csl.sri.com>
"Bruce L. Lambert" <········@uic.edu> writes:

> I'm trying to implement a phonetic string matching algorithm called Phonix
> (a variant of the classic Soundex algorithm). The first part of Phonix
> involves several dozen character substitutions. ...
> 
> Is there a better way to do this? Is the only solution to build a
> replacement string from scratch, rather than substituting subsequences?

Bruce,

Your basic problem is that strings aren't the right choice of data
structure, given the operation you want to perform.  If you really need
ELT to be O(1), then you can either make a new string and forget about
sharing structure or use an array of characters with a fill-pointer so
that you can adjust the length and shift the end of the string around as
required.  If you can settle for ELT being O(n), using lists of characters
instead of strings makes reusing structure in the substitution
straightforward and much less expensive.

                                                        -- rar
From: David Bakhash
Subject: regexps and string replace (Help Eric!)
Date: 
Message-ID: <cxjogwhoe9v.fsf_-_@hawk.bu.edu>
This question has reminded me of something I was gonna ask when I
first moved from elisp to CL:

Is there a library out there for REGEXP string manipulations in Common 
Lisp?  that would be useful.  If that existed, then Bruce would be
set too.

I wonder if Eric Naggum knows anything, since he's involved with
trying to make a CLEmacs, and since the regexp stuff is internal, I
suppose he's had to deal with this issue.

dave
From: John Atwood
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <6klhhl$mpg$1@news.NERO.NET>
David Bakhash  <·····@bu.edu> wrote:
>Is there a library out there for REGEXP string manipulations in Common 
>Lisp?  that would be useful.  If that existed, then Bruce would be
>set too.

Try:
http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/match/0.html


John
From: Sunil Mishra
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <efy1ztdoyp9.fsf@clairmont.cc.gatech.edu>
In article <············@news.NERO.NET> ·······@ice.CS.ORST.EDU (John Atwood) writes:

   David Bakhash  <·····@bu.edu> wrote:
   >Is there a library out there for REGEXP string manipulations in Common 
   >Lisp?  that would be useful.  If that existed, then Bruce would be
   >set too.

   Try:
   http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/match/0.html


There is a better lisp regexp matcher at
ftp://ftp.digitool.com/pub/mcl/contrib/

Sunil
From: Mike McDonald
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <kjCb1.515$58.265698@news.teleport.com>
In article <···············@clairmont.cc.gatech.edu>,
	·······@clairmont.cc.gatech.edu (Sunil Mishra) writes:

> There is a better lisp regexp matcher at
> ftp://ftp.digitool.com/pub/mcl/contrib/
> 
> Sunil

  Ah, which one?

  Mike McDonald
  ·······@mikemac.com
From: Sunil Mishra
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <efyzpfyntw0.fsf@clairmont.cc.gatech.edu>
In article <···················@news.teleport.com> ·······@mikemac.com (Mike McDonald) writes:

   In article <···············@clairmont.cc.gatech.edu>,
	   ·······@clairmont.cc.gatech.edu (Sunil Mishra) writes:

   > There is a better lisp regexp matcher at
   > ftp://ftp.digitool.com/pub/mcl/contrib/
   > 
   > Sunil

     Ah, which one?

Just look for the string regex, there is a version 1.01 which is the
latest. The package might be unsupported though.

Sunil
From: David Bakhash
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <cxjn2c0ope1.fsf@hawk.bu.edu>
·······@clairmont.cc.gatech.edu (Sunil Mishra) writes:

>    Try:
>    http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/match/0.html
> 
> 
> There is a better lisp regexp matcher at
> ftp://ftp.digitool.com/pub/mcl/contrib/

Somehow I don't see how this is better, especially in terms of
user-interface.  Is it a matter of speed?  It definately appears to be 
less efficent and less intuitive than the other, which is based on
strings, and thus resembles Perl/Emacs.

dave
From: Sunil Mishra
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <efywwb0n2rc.fsf@clairmont.cc.gatech.edu>
In article <···············@hawk.bu.edu> David Bakhash <·····@bu.edu> writes:

   Somehow I don't see how this is better, especially in terms of
   user-interface.  Is it a matter of speed?  It definately appears to be 
   less efficent and less intuitive than the other, which is based on
   strings, and thus resembles Perl/Emacs.

   dave

Suddenly I wonder if we are talking about the same program...

ftp://ftp.digitool.com/pub/mcl/contrib/regexp1.01.sit.hqx

I haven't used it in a while, but as I remember it, one can either match a
string to an expression, or compile an expression into a function that can
then be used for matching. From the CMU archive, I had at one point
downloaded nregex. (I was not looking for something that was a front end
for the unix utility, and hopefully for something that would compile.) I
found the program somewhat broken, with much less support for expressions
than the one in the digitool archives.

(As I look again at what had been suggested...)

Nope, there isn't anything new there that I might have missed.

Sunil
From: Jim Veitch
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <356EF662.1F90@franz.com>
David Bakhash wrote:
> 
> This question has reminded me of something I was gonna ask when I
> first moved from elisp to CL:
> 
> Is there a library out there for REGEXP string manipulations in Common
> Lisp?  that would be useful.  If that existed, then Bruce would be
> set too.
> 
> I wonder if Eric Naggum knows anything, since he's involved with
> trying to make a CLEmacs, and since the regexp stuff is internal, I
> suppose he's had to deal with this issue.
> 
> dave

Franz has a regexp package which you can get from Franz' FTP site
ftp://ftp.franz.com/pub/regexp which runs in Allegro CL 4.3.x under UNIX
and ACL 3.0.2 under Windows.

We have integrated this into the product in our new Allegro CL 5.0
release, which runs under both UNIX and Windows.

Jim Veitch
From: T. Kurt Bond
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <m3g1hrrb7w.fsf@localhost.localdomain>
David Bakhash <·····@bu.edu> writes:

[...]
> Is there a library out there for REGEXP string manipulations in Common 
> Lisp?  that would be useful.
[...]
> dave

While not exactly regular expressions, the META technique described in
Henry G. Baker's article "Pragmatic Parsing in Common Lisp", ACM LISP
POinters IV,2 (April-June 1991),3-15 (available for browsing or
download from ftp://ftp.netcom.com/pub/hb/hbaker/home.html) is, in my
experience, quite useful and flexible, though a little less terse than
the conventional syntax for regular expressions.  Here's the abstract
of his paper:

   We review META, a classic technique for building recursive descent
   parsers, that is both simple and efficient. While META does not handle
   all possible regular or context-free grammars, it handles a
   surprisingly large fraction of the grammars encountered by Lisp
   programmers. We show how META can be used to parse streams, strings
   and lists--including Common Lisp's hairy lambda expression parameter
   lists. Finally, we compare the execution time of this parsing method
   to the built-in methods of Common Lisp.

META parsers can be used for pulling strings apart, just like regular
expressions.

The paper includes an actual implementation.

-- 
T. Kurt Bond, ···@access.mountain.net
From: Bruno Haible
Subject: Re: regexps and string replace (Help Eric!)
Date: 
Message-ID: <6kvd6p$bis@news.u-bordeaux.fr>
David Bakhash <·····@bu.edu> wrote:
> Is there a library out there for REGEXP string manipulations in Common 
> Lisp?  that would be useful.

Several implementations have this built-in or as an add-on. For example,
CLISP has it as a separate package. To have it included, you have to
build the `full' (not the `base') binaries when you downloaded a binary
distribution of CLISP.

                       Bruno                     http://clisp.cons.org/
From: Larry Hunter
Subject: Re: replacing subsequences of different lengths
Date: 
Message-ID: <rbu369dvp3.fsf@work.nlm.nih.gov>
Bruce Lambert asks:
  
  For example, all instances of ths subsequence "lough" are replaced with
  "low".  How would one go about making this type of swap? Because the
  subsequences are different lengths, the simple solutions (e.g., replace)
  won't work. ...  Is there a better way to do this? Is the only solution to
  build a replacement string from scratch, rather than substituting
  subsequences?

I find that this has come up for me several times, so I wrote a string
utility that does the equivalent of SUBSTITUTE:

(defun string-substitute (string substring replacement-string)
  (let ((substring-length (length substring))
        (last-end 0)
        (new-string ""))
    (do ((next-start
          (search substring string)
          (search substring string :start2 last-end)))
        ((null next-start)
         (concatenate 'string new-string (subseq string last-end)))
      (setq new-string
            (concatenate 'string
                         new-string
                         (subseq string last-end next-start)
                         replacement-string))
      (setq last-end (+ next-start substring-length)))))
  
Hope this does the trick for you.

Larry