From: Chris Capel
Subject: Replacing subsequences
Date: 
Message-ID: <10rj649bqbpgre9@corp.supernews.com>
I had a need just now to replace one substring in my string with a different
substring, of a different length (for quoting parameters to SQL queries,
incidentally). It appears that accomplishing this isn't straightforward in
CL. Below is what I came up with. Does anyone have a better way than this?
You'd figure CL would have something that does this, but I couldn't find
anything. Maybe it's time to read the standard cover-to-cover.

Chris Capel

(defun replace-sequence (array old new)
  (do ((new-array (make-array (length array)
                              :adjustable t
                              :element-type (array-element-type array)))
       (new-array-position 0 (+ new-array-position (length old)))
       (array-position (search old array)
                       (search old array :start2 (1+ array-position)))
       (prev-array-position 0 array-position)
       (difference (if (> 0 #1=(- (length new) (length old))) nil #1#)))
      ((not array-position) (progn (replace new-array array
                                            :start1 new-array-position
                                            :start2 prev-array-position)
                                   new-array))
    (replace new-array array
             :start1 new-array-position
             :start2 prev-array-position :end2 array-position)
    (incf new-array-position (- array-position prev-array-position))
    (when difference
      (adjust-array new-array (+ (length new-array) difference)))
    (replace new-array new :start1 new-array-position)))

From: Thomas A. Russ
Subject: Re: Replacing subsequences
Date: 
Message-ID: <ymioeh258fd.fsf@sevak.isi.edu>
Chris Capel <······@iba.nktech.net> writes:

> 
> I had a need just now to replace one substring in my string with a different
> substring, of a different length (for quoting parameters to SQL queries,
> incidentally). It appears that accomplishing this isn't straightforward in
> CL. Below is what I came up with. Does anyone have a better way than this?
> You'd figure CL would have something that does this, but I couldn't find
> anything. Maybe it's time to read the standard cover-to-cover.

If I need to do something like that, I usually end up using a string
stream and writing various substrings to it, splicing in the substituted
values.

The advantage is that the string maintenance is nice and clean.

One drawback is that the setup overhead and consing is not trivial, so
it is often worthwhile to first do a search on the string to see if the
substring to be replaced even occurs.  It adds an additional pass
through the string, but if cases where no substitution is needed, it can
provide a big speedup.  That makes it perhaps useful in a general
utility than in special purpose code where you know or expect to be
doing the substitution.

Herewith is my entry:

(defun string-substitute (string old new)
  (if (search old string)
    (with-output-to-string (out)
      (loop with old-len = (length old)
            for start = 0 then (+ end old-len)
            as end = (search old string :start2 start)
	    do (write-string (subseq string start end) out)
               (if end
                  (write-string new out)
	          (loop-finish))))
    string))


-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Chris Capel
Subject: Re: Replacing subsequences
Date: 
Message-ID: <10rjslg81svg2e0@corp.supernews.com>
Thomas A. Russ wrote:

> Chris Capel <······@iba.nktech.net> writes:
> 
>> 
>> I had a need just now to replace one substring in my string with a
>> different substring, of a different length (for quoting parameters to SQL
>> queries, incidentally). It appears that accomplishing this isn't
>> straightforward in CL. Below is what I came up with. Does anyone have a
>> better way than this? You'd figure CL would have something that does
>> this, but I couldn't find anything. Maybe it's time to read the standard
>> cover-to-cover.
> 
> If I need to do something like that, I usually end up using a string
> stream and writing various substrings to it, splicing in the substituted
> values.
> 
> The advantage is that the string maintenance is nice and clean.

Very elegant, I agree. I'm of the opinion, however, that you write it once,
and forget about it. So do it well, make it fast, and you're set
forevermore.

Chris Capel
From: Thomas F. Burdick
Subject: Re: Replacing subsequences
Date: 
Message-ID: <xcvfz2ej623.fsf@conquest.OCF.Berkeley.EDU>
Chris Capel <······@iba.nktech.net> writes:

> Thomas A. Russ wrote:
> 
> > Chris Capel <······@iba.nktech.net> writes:
> > 
> >> 
> >> I had a need just now to replace one substring in my string with a
> >> different substring, of a different length (for quoting parameters to SQL
> >> queries, incidentally). It appears that accomplishing this isn't
> >> straightforward in CL. Below is what I came up with. Does anyone have a
> >> better way than this? You'd figure CL would have something that does
> >> this, but I couldn't find anything. Maybe it's time to read the standard
> >> cover-to-cover.
> > 
> > If I need to do something like that, I usually end up using a string
> > stream and writing various substrings to it, splicing in the substituted
> > values.
> > 
> > The advantage is that the string maintenance is nice and clean.
> 
> Very elegant, I agree. I'm of the opinion, however, that you write it once,
> and forget about it. So do it well, make it fast, and you're set
> forevermore.

Thomas Russ' is both efficient (assuming an efficient w-o-t-string),
and, very importantly, it scales nicely to large strings.  Here's what
I get on sbcl: On small strings with small replacements, your is about
30% faster; on large strings with large replacements, it's many orders
of magnitude slower.  When you're evaluating stuff for library use,
don't make the mistake of only considering best-case behavior!
From: Jason Kantz
Subject: Re: Replacing subsequences
Date: 
Message-ID: <1102705110.854129.3620@c13g2000cwb.googlegroups.com>
You could be better off using a library like CL-PPCRE.
http://www.weitz.de/cl-ppcre/#regex-replace
From: Chris Capel
Subject: Re: Replacing subsequences
Date: 
Message-ID: <10rjt49o1rp3re6@corp.supernews.com>
Jason Kantz wrote:

> You could be better off using a library like CL-PPCRE.
> http://www.weitz.de/cl-ppcre/#regex-replace

Wow. That's about seven times as fast as mine.

Wow.

Chris Capel
From: Wade Humeniuk
Subject: Re: Replacing subsequences
Date: 
Message-ID: <xblud.17061$U47.12976@clgrps12>
Chris Capel wrote:

> I had a need just now to replace one substring in my string with a different
> substring, of a different length (for quoting parameters to SQL queries,
> incidentally). It appears that accomplishing this isn't straightforward in
> CL. Below is what I came up with. Does anyone have a better way than this?
> You'd figure CL would have something that does this, but I couldn't find
> anything. Maybe it's time to read the standard cover-to-cover.
> 

(defun replace-sequence (array old new &key (test #'eql))
   (let ((new-array (make-array (length array)
                                :adjustable t
                                :fill-pointer 0
                                :element-type (array-element-type array))))
     (loop with index = 0
           while (< index (length array))
           if (and (funcall test (elt array index) (elt old 0))
                   (search old array :start2 index :end2 (+ index (length old)) :test test))
           do
           (loop for new-index from 0 below (length new)
                 do (vector-push-extend (elt new new-index) new-array))
           (incf index (length old))
           else do
           (vector-push-extend (elt array index) new-array)
           (incf index))
     new-array))

CL-USER 9 > (replace-sequence "test spot the dog" "dog" "the dogs"
                               :test #'char=)
"test spot the the dogs"

CL-USER 10 > (replace-sequence "test spot the dog" "the" "many"
                               :test #'char=)
"test spot many dog"

CL-USER 11 >

It is a little confusing if you want this function to work on all
sequences (in which case it needs some work or change the name)
or just arrays.

Wade
From: Chris Capel
Subject: Re: Replacing subsequences
Date: 
Message-ID: <10rjs36gks46j94@corp.supernews.com>
Wade Humeniuk wrote:

> Chris Capel wrote:
> 
>> I had a need just now to replace one substring in my string with a
>> different substring, of a different length (for quoting parameters to SQL
>> queries, incidentally). It appears that accomplishing this isn't
>> straightforward in CL. Below is what I came up with. Does anyone have a
>> better way than this? You'd figure CL would have something that does
>> this, but I couldn't find anything. Maybe it's time to read the standard
>> cover-to-cover.
>> 
> 
> (defun replace-sequence (array old new &key (test #'eql))
>    (let ((new-array (make-array (length array)
>                                 :adjustable t
>                                 :fill-pointer 0
>                                 :element-type (array-element-type
>                                 :array))))
>      (loop with index = 0
>            while (< index (length array))
>            if (and (funcall test (elt array index) (elt old 0))
>                    (search old array :start2 index :end2 (+ index (length
>                    old)) :test test))
>            do
>            (loop for new-index from 0 below (length new)
>                  do (vector-push-extend (elt new new-index) new-array))
>            (incf index (length old))
>            else do
>            (vector-push-extend (elt array index) new-array)
>            (incf index))
>      new-array))

Hmm. That's pretty fast--faster than my original code--but not as fast as
this:

(defun replace-sequence (array old new)
  (let ((hits (list (- 0 (length old))))
        (n-hits 0))
    (let (hit (pos 0))
      (while (setf hit (search old array :start2 pos))
        (push hit hits)
        (incf n-hits)
        (setf pos (1+ hit)))
      (setf hits (nreverse hits)))
    (let* ((difference (- (length new) (length old)))
           (new-array (make-array (+ (length array)
                                     (* difference n-hits))
                                  :element-type (array-element-type array)))
           (new-array-position 0))
      (docdr (x hits)
        (let ((start (+ (car x) (length old)))
              (next (cadr x)))
          (replace new-array array
                   :start1 new-array-position
                   :start2 start :end2 next)
          (when next
            (incf new-array-position (- next start))
            (replace new-array new
                     :start1 new-array-position)
            (incf new-array-position (length new)))))
      new-array)))

Yours is a bit easier to read, because of the fill-pointers, though. (I
could probably use loop, but I don't know if that would help much, and then
I'd have to learn it.) It'd be nice if I could have some sort of
fill-pointer that let me append sequences to my sequence at the fill
pointer position. Then I could get rid of some counters and do something
nicer than REPLACE.

Oh, and in case you want to test this, you'll probably need these:

(defmacro while (eval-form &body body)
  (with-gensyms (ret)
    `(do ((,ret))
         ((not ,eval-form) ,ret)
       (setf ,ret
             (progn
               ,@body)))))

(defmacro docdr ((var list) &body body)
  `(do ((,var ,list (cdr ,var)))
       ((null ,var))
     ,@body))

Worth the price of avoiding loop, I say. Speaking of which, I could probably
include collecting in the first part of my code above, so that it becomes

    (loop with hit = -1 then (search old array :start2 (1+ hit))
          while hit
          do collect hit into hits
          (incf n-hits))

My first loop! Thanks to Peter Seibel's book for a quick reference. It's an
improvement of two lines. Also, considerably faster than my WHILE. I wonder
where the speed boost is coming from. But ... I shudder at the sheer
loopiness of it. (*groan*) The second section of my function could probably
be replaced with some sort of loop equivalent. Anyone care to take a shot? 

I wonder if there's a way to make COLLECT available separately from any sort
of iteration facilities. It'd require some masterful macrology, I do
believe. But I seem to remember a package simulating pointers to setf'able
places that would help out nicely in the exercise.

> It is a little confusing if you want this function to work on all
> sequences (in which case it needs some work or change the name)
> or just arrays.

It'd probably need to be able to create a blank sequence of the same format
as the input (nil in the case of a list). It'd probably be best to provide
two sections of the body that worked only on vectors and lists,
respectively. Unless you can figure out how this abstraction applies to any
array. Is there anything besides lists and vectors to worry about?

Chris Capel
From: Thomas A. Russ
Subject: Re: Replacing subsequences
Date: 
Message-ID: <ymillc56cw6.fsf@sevak.isi.edu>
Chris Capel <······@iba.nktech.net> writes:

> (defun replace-sequence (array old new)
>   (let ((hits (list (- 0 (length old))))
>         (n-hits 0))
>     (let (hit (pos 0))
>       (while (setf hit (search old array :start2 pos))
>         (push hit hits)
>         (incf n-hits)
>         (setf pos (1+ hit)))
>       (setf hits (nreverse hits)))

Hmmm.  This is interesting.  You seem to allow for overlapping
replacements, since each match position is incremented by 1 instead
of the length of OLD.   Do you get what you expect from the 
following call:

   (replace-sequence "aaaaWWWWaaaWWWaa" "aa" "bb")

BTW, when I tried the above, I got an error....

>     (let* ((difference (- (length new) (length old)))
>            (new-array (make-array (+ (length array)
>                                      (* difference n-hits))
>                                   :element-type (array-element-type array)))
>            (new-array-position 0))
>       (docdr (x hits)
>         (let ((start (+ (car x) (length old)))
>               (next (cadr x)))
>           (replace new-array array
>                    :start1 new-array-position
>                    :start2 start :end2 next)
>           (when next
>             (incf new-array-position (- next start))
>             (replace new-array new
>                      :start1 new-array-position)
>             (incf new-array-position (length new)))))
>       new-array)))

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Wade Humeniuk
Subject: Re: Replacing subsequences
Date: 
Message-ID: <4Roud.18041$Ya4.7826@edtnps84>
Chris Capel wrote:

> 
> It'd probably need to be able to create a blank sequence of the same format
> as the input (nil in the case of a list). It'd probably be best to provide
> two sections of the body that worked only on vectors and lists,
> respectively. Unless you can figure out how this abstraction applies to any
> array. Is there anything besides lists and vectors to worry about?
> 

Like Frank's version, a generalized sequence version:

(defun replace-sequence (seq old new &key (start 0) (end nil) (test #'eql))
   (if (< start (or end (length seq)))
     (let ((position (search old seq :start2 start :end2 end)))
       (if position
           (concatenate (type-of seq)
                        (subseq seq start position)
                        new
                        (replace-sequence seq
                                          old new
                                          :start (+ position (length old))
                                          :end end
                                          :test test))
         (subseq seq start)))
     (subseq seq start)))

Wade
From: Dominic Robinson
Subject: Re: Replacing subsequences
Date: 
Message-ID: <1103200038.057927.286450@z14g2000cwz.googlegroups.com>
How about this variation which recurses rather than collecting the
hits:

;(defun replace-sequence (string old new)
;  (let ((old-length (length old))
;        (new-length (length new)))
;    (labels ((replacer (pos new-pos)
;               (let ((match-pos (search old string :start2 pos)))
;                 (if match-pos
;                     (let* ((match-offset (- match-pos pos))
;                            (next-pos (+ match-pos old-length))
;                            (next-new-pos (+ new-pos match-offset
new-length))
;                            (new-string (replacer next-pos
next-new-pos)))
;                       (replace new-string string
;                                :start1 new-pos
;                                :start2 pos :end2 match-pos)
;                       (replace new-string new
;                                :start1 (+ new-pos match-offset)))
;                   (if (= pos 0)
;                       string
;                     (let ((new-string (make-array (+ new-pos (-
(length string) pos)) :element-type (array-element-type string))))
;                       (replace new-string string
;                                :start1 new-pos
;                                :start2 pos)))))))
;             (replacer 0 0))))

Overlapping replacements are not handled as the matching pass only sees
the original string.

Apologies for the ;s - but at least they stop google groups eating the
indentation

Dominic Robinson
From: Wade Humeniuk
Subject: Re: Replacing subsequences
Date: 
Message-ID: <iFlud.17065$U47.4254@clgrps12>
There are a few bugs in the code.

A better version is:

(defun replace-sequence (array old new &key (test #'eql))
   (assert (> (length old) 0))
   (let ((new-array (make-array (length array)
                                :adjustable t
                                :fill-pointer 0
                                :element-type (array-element-type array))))
     (loop with index = 0
           while (< index (length array))
           if (search old array :start2 index
                      :end2 (min (length array) (+ index (length old)))
                      :test test)
           do
           (loop for new-index from 0 below (length new)
                 do (vector-push-extend (elt new new-index) new-array))
           (incf index (length old))
           else do
           (vector-push-extend (elt array index) new-array)
           (incf index))
     new-array))

Wade
From: Bulent Murtezaoglu
Subject: Re: Replacing subsequences
Date: 
Message-ID: <87653a9lbi.fsf@p4.internal>
Perhaps something like:

(defun replace-string (string old new)
  (let ((end-of-first (search old string)))
    (if (not end-of-first) string ;; nothing to replace
	(concatenate 'string 
		     (subseq string 0 end-of-first)
		     new
		     (subseq string (+ end-of-first (length old)))))))

This _is_ consy, and not general purpose (we are assuming old occurs 
at most once in string), but that's what I'd do first.

cheers,

BM
  
From: Frank Buss
Subject: Re: Replacing subsequences
Date: 
Message-ID: <cpd2an$ggv$1@newsreader2.netcologne.de>
Chris Capel <······@iba.nktech.net> wrote:

> Below is what I came up with. 

looks like it doesn't work correctly. I've tested it with LispWorks:

CL-USER > (replace-sequence "abc" "b" "123")
"a1bc^P"

A function, which works with every sequence (I hope):

(defun replace-sequence (seq old new &key (start 0))
  (let ((seq-len (length seq))
        (old-len (length old))
        (new-len (length new)))
    (if (or (< (- seq-len start) old-len) (= old-len 0))
        (copy-seq seq)
      (if (search seq old :start1 start :end1 (+ start old-len))
          (replace-sequence 
           (concatenate (type-of seq) 
                        (subseq seq 0 start)
                        new 
                        (subseq seq (+ start old-len)))
           old 
           new 
           :start (+ start new-len))
        (replace-sequence seq old new :start (1+ start))))))

CL-USER : 2 > (replace-sequence "" "" "")
""

CL-USER : 2 > (replace-sequence "abc" "b" "123")
"a123c"

CL-USER : 2 > (replace-sequence "Hello universe!" "universe" "world")
"Hello world!"

CL-USER : 2 > (replace-sequence '(1 2 3 4 5) '(3 4) nil)
(1 2 5)

CL-USER : 2 > (replace-sequence '#(1 2 3 4 5) '#(2 3) '#(10 11 12))
#(1 10 11 12 4 5)

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: David Sletten
Subject: Re: Replacing subsequences
Date: 
Message-ID: <MOzud.2126$gd.1827@twister.socal.rr.com>
Frank Buss wrote:


> A function, which works with every sequence (I hope):
> 
> (defun replace-sequence (seq old new &key (start 0))
>   (let ((seq-len (length seq))
>         (old-len (length old))
>         (new-len (length new)))
>     (if (or (< (- seq-len start) old-len) (= old-len 0))
>         (copy-seq seq)
>       (if (search seq old :start1 start :end1 (+ start old-len))
>           (replace-sequence 
>            (concatenate (type-of seq)
                           ^^^^^^^^^^^^^

This doesn't seem to be portable. In LispWorks:
(type-of "foo") => SIMPLE-TEXT-STRING
But in CLISP and SBCL:
(type-of "foo") => (SIMPLE-BASE-STRING 3)

So your code works as long as the NEW sequence is the same length as the 
OLD:
(replace-sequence "Is this not pung?" "this" "that") => "Is that not pung?"

But if the lengths are different you violate the type specifier:
(replace-sequence "abc" "b" "123") =>
*** - sequence type forces length 3, but result has length 5

I think you need something like:
(concatenate (typecase seq (string 'string) (vector 'vector)) ...)

David Sletten
From: Frank Buss
Subject: Re: Replacing subsequences
Date: 
Message-ID: <cpkqbb$r3$1@newsreader2.netcologne.de>
David Sletten <·····@slytobias.com> wrote:

> But if the lengths are different you violate the type specifier:
> (replace-sequence "abc" "b" "123") =>
> *** - sequence type forces length 3, but result has length 5
> 
> I think you need something like:
> (concatenate (typecase seq (string 'string) (vector 'vector)) ...)

but if I create my own sequence type, I have to add it to this typecase 
list. How can I write a concatenate, which works for all sequence types, 
and which perhaps creates a (SIMPLE-BASE-STRING 5) in CLISP, if the inputs 
are (SIMPLE-BASE-STRING 2) and (SIMPLE-BASE-STRING 3)?

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Barry Margolin
Subject: Re: Replacing subsequences
Date: 
Message-ID: <barmar-D7AE53.19293813122004@comcast.dca.giganews.com>
In article <···········@newsreader2.netcologne.de>,
 Frank Buss <··@frank-buss.de> wrote:

> David Sletten <·····@slytobias.com> wrote:
> 
> > But if the lengths are different you violate the type specifier:
> > (replace-sequence "abc" "b" "123") =>
> > *** - sequence type forces length 3, but result has length 5
> > 
> > I think you need something like:
> > (concatenate (typecase seq (string 'string) (vector 'vector)) ...)
> 
> but if I create my own sequence type, I have to add it to this typecase 
> list. How can I write a concatenate, which works for all sequence types, 
> and which perhaps creates a (SIMPLE-BASE-STRING 5) in CLISP, if the inputs 
> are (SIMPLE-BASE-STRING 2) and (SIMPLE-BASE-STRING 3)?

CL doesn't provide a way for you to create new sequence types, so it 
naturally doesn't provide a way to extend the sequence functions, either.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From: Chris Capel
Subject: Re: Replacing subsequences
Date: 
Message-ID: <10rlci08f45516d@corp.supernews.com>
Chris Capel wrote:

> You'd figure CL would have something that does this,
> but I couldn't find anything.

Well, I guess it doesn't. How did something like this not make it into the
standard? Did people just not use strings back then? Every other language I
know of has something to do this, that works at least for strings, built
in. Was the ANSI committee simply too pressed for time?

Chris Capel
From: Jeff
Subject: Re: Replacing subsequences
Date: 
Message-ID: <R0Fud.641853$mD.355027@attbi_s02>
Chris Capel wrote:

> Chris Capel wrote:
> 
> > You'd figure CL would have something that does this,
> > but I couldn't find anything.
> 
> Well, I guess it doesn't. How did something like this not make it
> into the standard? Did people just not use strings back then? Every
> other language I know of has something to do this, ...

What languages? 

Aside from scripting languages that have some form of regular
expression parsing (eg, Perl, Python), I don't know of any [compiled]
language that can replace sections of a string of different lengths. It
is just easier in some languages to find the string you want replaced,
and easier to concatenate your new string with the previous characters
that you want to keep.

Jeff M.

-- 
http://www.retrobyte.org
··············@gmail.com
From: Jeff
Subject: Re: Replacing subsequences
Date: 
Message-ID: <ujtud.219163$HA.71769@attbi_s01>
Chris Capel wrote:

> I had a need just now to replace one substring in my string with a
> different substring, of a different length 

If you want something that's quick to just "drop in and use", might I
suggest using my regular expression package (url below). It allows very
quick and simple replacing of text in a string:

(re:replace-all "Some random string" 
                #/\s(\a+)$/
                (format nil "~D" (length $1)))

 ==> "Some random 6"

-- 
http://www.retrobyte.org
··············@gmail.com
From: David Sletten
Subject: Re: Replacing subsequences
Date: 
Message-ID: <Jozud.2125$gd.1228@twister.socal.rr.com>
Chris Capel wrote:
> I had a need just now to replace one substring in my string with a different
> substring, of a different length (for quoting parameters to SQL queries,
> incidentally). It appears that accomplishing this isn't straightforward in
> CL. Below is what I came up with. Does anyone have a better way than this?
> You'd figure CL would have something that does this, but I couldn't find
> anything. Maybe it's time to read the standard cover-to-cover.
> 
> Chris Capel
> 
> (defun replace-sequence (array old new)
>   (do ((new-array (make-array (length array)
>                               :adjustable t
>                               :element-type (array-element-type array)))
>        (new-array-position 0 (+ new-array-position (length old)))
>        (array-position (search old array)
>                        (search old array :start2 (1+ array-position)))
>        (prev-array-position 0 array-position)
>        (difference (if (> 0 #1=(- (length new) (length old))) nil #1#)))
>       ((not array-position) (progn (replace new-array array
>                                             :start1 new-array-position
>                                             :start2 prev-array-position)
>                                    new-array))
>     (replace new-array array
>              :start1 new-array-position
>              :start2 prev-array-position :end2 array-position)
>     (incf new-array-position (- array-position prev-array-position))
>     (when difference
>       (adjust-array new-array (+ (length new-array) difference)))
>     (replace new-array new :start1 new-array-position)))
> 
May I suggest that whichever implementation you choose, you follow the 
convention of parameter order in SUBST and SUBSTITUTE? Namely,
(defun replace-sequence (new old seq) ...)

David Sletten