From: ·······@sandia.gov
Subject: Problem With Replacing Strings in a File
Date: 
Message-ID: <u8yqyidqj.fsf@sandia.gov>
Hi All,

(Relevant info: clisp 2.30, Windows NT 4.0 SP6)

I'm trying to replace an arbitrary number of (multi-line) strings in a
file.  Below I have my current attempt (called replace-strings-in-file),
but it has a problem.  First, check out the code:

(defun replace-string (old new string)
  "Return a copy of STRING.  All occurrences of OLD are replaced with NEW."
  (do ((result "")
       (position (search old string)
                 (search old string :start2 (+ position (length old))))
       (old-position 0))
      ((not position) (concatenate 'string result
                                   (subseq string old-position)))
    (setf result (concatenate 'string result
                              (subseq string old-position position) new))
    (setf old-position (+ position (length old)))
    ))

(defun replace-strings-in-file (target-dir target-file subst-list)
  "Destructively replace strings within TARGET-FILE."
  ;;SUBST-LIST is an associative list with the following format:
  ;;((old1 new1) (old2 new2) ... (oldN newN))
  ;;where each occurrence of oldX is replaced with newX.
  (let* ((target-file-path (concatenate 'string target-dir target-file))
         (backup-file-path (concatenate 'string target-file-path ".bak"))
         (mod-file-data))
    (if (not (probe-file backup-file-path))
        (rename-file target-file-path backup-file-path))
    (with-open-file (mod-file backup-file-path :direction :input)
      (setf mod-file-data (make-string (file-length mod-file)))
      (read-sequence mod-file-data mod-file))
    (dolist (subst-elt subst-list)
      (setf mod-file-data (replace-string (first subst-elt)
                                          (second subst-elt)
                                          mod-file-data)))
    (with-open-file (mod-file target-file-path :direction :output
                              :if-exists :overwrite
                              :if-does-not-exist :create)
      (write-sequence mod-file-data mod-file))
    ))

As a test, I put the following text in the file H:\test.txt:
This is
the test file
of mine.

Then I call replace-strings-in-file like so:

(replace-strings-in-file "H:\\" "test.txt"
  `((,(format nil "the test file~%")
     ,(format nil (concatenate 'string
                               "the modified~%"
                               "new and improved test file~%")))))

I get the following results in the H:\test.txt file:
This is
the modified
new and improved test file
of mine.e.

Notice the additional "e." at the end.  I've tried other text in the
test file and it appears that there are always some number of the last
characters of the input file that get repeated at the end of the
resulting file.  It also appears that the number of extra characters is
related to the number of lines in the input test file.

Thanks,
Tad

From: Greg Menke
Subject: Re: Problem With Replacing Strings in a File
Date: 
Message-ID: <m37k6ixqk6.fsf@europa.pienet>
·······@sandia.gov writes:

> Hi All,
> 
> (Relevant info: clisp 2.30, Windows NT 4.0 SP6)
> 
> I'm trying to replace an arbitrary number of (multi-line) strings in a
> file.  Below I have my current attempt (called replace-strings-in-file),
> but it has a problem.  First, checkout the code:
<snip>
> 
> As a test, I put the following text in the file H:\test.txt:
> This is
> the test file
> of mine.
> 
> I get the following results in the H:\test.txt file:
> This is
> the modified
> new and improved test file
> of mine.e.
> 
> Notice the additional "e." at the end.  I've tried other text in the
> test file and it appears that there are always some number of the last
> characters of the input file that get repeated at the end of the
> resulting file.  It also appears that the number of extra characters are
> related to the number of lines in the input test file.

It seems to work OK in clisp under Linux.  Perhaps you might try
:rename-and-delete instead of :overwrite?  Also, it's generally better
(easier too, I think) to use the pathname-related functions to assemble
filenames instead of doing the string concatenation yourself.  I've
also found LOOP preferable to DO, a construct I've never really
understood.
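
As a rough sketch of the pathname suggestion (not code from this
thread): TARGET-PATH, BACKUP-PATH and WRITE-TEXT are made-up names, and
the directory is assumed to be a string that ends in a directory
separator.  Note that the backup here is named test.bak rather than
test.txt.bak.

```lisp
(defun target-path (target-dir target-file)
  "Merge the file name TARGET-FILE into the directory TARGET-DIR."
  (merge-pathnames target-file (pathname target-dir)))

(defun backup-path (path)
  "Return PATH with its file type replaced by \"bak\"."
  (make-pathname :type "bak" :defaults path))

(defun write-text (path text)
  "Write TEXT to PATH, replacing an existing file instead of
overwriting it in place."
  (with-open-file (out path :direction :output
                            :if-exists :rename-and-delete
                            :if-does-not-exist :create)
    (write-string text out))
  path)
```

With these, (backup-path (target-path "H:\\" "test.txt")) would stand in
for the two CONCATENATE calls in the original function.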

Gregm
From: Andreas Hinze
Subject: Re: Problem With Replacing Strings in a File
Date: 
Message-ID: <3F16593F.3D87220C@smi.de>
Curious, it works here even with clisp 2.30 & WinNT SP 6:

This is
the modified
new and improved test file
of mine.

Regards
AHz
From: ··········@YahooGroups.Com
Subject: Re: Problem With Replacing Strings in a File
Date: 
Message-ID: <REM-2003jul26-005@Yahoo.Com>
{{Date: 16 Jul 2003 11:27:16 -0600
  From: ·······@sandia.gov
  I'm trying to replace an arbitrary number of (multi-line) strings in
  a file.}}

Two general suggestions:

[1] When scanning through a large string in one pass, making changes as
you go along, don't concatenate the stuff to the left and the stuff to
the right over and over, which requires copying the same data many
times.  Use this method instead, which copies the data only twice:
(1) Call (make-string-output-stream) to accumulate the pieces of
copied original text and new replacement text in sequence.
(2) Call (write-string ... :start ... :end ...) as needed to copy
pieces from the old string to the new string, or to copy anything you
already have as string data, such as from an assoc or other table
lookup, and (format ...) as needed to write any constant pieces that
are directly in your code via program logic such as case.
(3) Call get-output-stream-string at the end to retrieve the entire
string you've built, so you can then (write-string ...) it to the
actual file.

If you need to re-substitute stuff you've already gotten from pieces of
old and new text, then of course the above won't work. Even so,
multiple passes of the above may be more efficient than multiple passes
of multiple concatenates.
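
A minimal sketch of method [1], assuming OLD is a non-empty string.
REPLACE-STRING-STREAMED is a made-up name, and WITH-OUTPUT-TO-STRING is
used as shorthand for the make-string-output-stream /
get-output-stream-string pair:

```lisp
(defun replace-string-streamed (old new string)
  "Return a copy of STRING with every occurrence of OLD replaced by NEW.
OLD is assumed to be non-empty."
  (with-output-to-string (out)
    (loop with old-length = (length old)
          ;; START is where the not-yet-copied part of STRING begins.
          for start = 0 then (+ position old-length)
          for position = (search old string :start2 start)
          while position
          ;; Copy the unchanged text before the match, then the new text.
          do (write-string string out :start start :end position)
             (write-string new out)
          ;; Copy whatever follows the last match.
          finally (write-string string out :start start))))
```

Each piece of STRING is written to the stream once, so the work no
longer grows with the number of matches times the length of the result.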

[2] When updating a file, don't use overwrite mode at all, ever!!
Instead, write output to a temporary new file, then rename old file out
of the way to a backup then rename temporary file to main file. I wrote
a simple utility I call "roll-files" which does this sort of thing
easily: Before starting to write the temporary file, it makes sure the
temporary file doesn't exist. After finishing the temporary file, it
recurses on a list of filenames from temporary name through main name
to successive older backups, looking as deep as necessary to find a
nonexistent file, then renaming into that hole on the way back up the
recursion.
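
This is not the roll-files utility itself, just a simplified sketch of
the same idea.  WRITE-FILE-SAFELY is a made-up name; it keeps only one
backup (any older .bak is deleted), and it assumes PATH has a file type
other than "tmp" or "bak".

```lisp
(defun write-file-safely (path text)
  "Write TEXT to a temporary file next to PATH, move the old PATH out of
the way to a .bak file, then rename the temporary file to PATH."
  (let* ((path (merge-pathnames path))
         (temp (make-pathname :type "tmp" :defaults path))
         (backup (make-pathname :type "bak" :defaults path)))
    ;; Make sure no temporary file is left over from an earlier run.
    (when (probe-file temp)
      (error "Temporary file ~A already exists." temp))
    (with-open-file (out temp :direction :output :if-exists :error)
      (write-string text out))
    ;; Only touch the old file once the new one is completely written.
    (when (probe-file path)
      (when (probe-file backup)
        (delete-file backup))
      (rename-file path backup))
    (rename-file temp path)
    path))
```

If the program dies while writing, only the temporary file is
incomplete; the main file is still the old, intact version.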

I suspect the few characters from the old file left over at the end of
the new file come from overwrite mode: the new file is shorter than the
old file, but somehow the effective file length isn't trimmed to
exactly the new file size.  Instead it is expressed as a whole number
of words, with the last word padded out.  Normally the padding would be
zero bytes, but in overwrite mode those extra bytes aren't zero, so you
get the new file plus those extra nonzero bytes from the old file.
But the biggest reason not to use overwrite mode is what happens if
your program crashes halfway through writing the new file on top of the
old file. The extra overhead of making a new directory slot for the
temporary file and doing all those renames is trivial compared to the
pain if your file is trashed by an incomplete overwrite. If your disk
is so full there's no room for the temporary file, hence no room for
any backup either, get a bigger disk or swap some no-longer-needed file
to another medium, don't risk trashing your important file!
From: Joerg-Cyril Hoehle
Subject: Re: Problem With Replacing Strings in a File
Date: 
Message-ID: <uadaouz41.fsf@T-Systems.com>
·······@sandia.gov writes:
>  It also appears that the number of extra characters are
> related to the number of lines in the input test file.

This is an indication that you may be running into line-termination
woes, as commonly seen on DOS systems.

> (Relevant info: clisp 2.30, Windows NT 4.0 SP6)
On DOS/MS-Windows systems, the line terminator is usually CRLF,
i.e. it takes two bytes.

>       (setf mod-file-data (make-string (file-length mod-file)))
>       (read-sequence mod-file-data mod-file))

That's it: FILE-LENGTH returns a byte count, but READ-SEQUENCE may
read fewer elements than that number of bytes, because each CRLF byte
pair is converted into a single #\Newline character.  The tail of the
string is then never filled in, yet it still gets written out.

Check the result of read-sequence and make a shorter array out of it,
if need be.
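
A minimal sketch of that check, with a made-up function name
READ-FILE-TRIMMED.  It keeps the MAKE-STRING / READ-SEQUENCE approach
from the original code and only adds the trimming step:

```lisp
(defun read-file-trimmed (path)
  "Return the contents of the text file PATH as a string."
  (with-open-file (in path :direction :input)
    (let* ((buffer (make-string (file-length in)))
           ;; READ-SEQUENCE returns the index of the first element of
           ;; BUFFER that it did not fill.
           (end (read-sequence buffer in)))
      ;; Drop the unfilled tail, if there is one.
      (subseq buffer 0 end))))
```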

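I've been using arrays with a fill-pointer for this exact purpose:
(let* ((length (file-length ...))
       (mod-file-data ...))   ; a string made with :fill-pointer t
  ;; READ-SEQUENCE takes the sequence and the stream; it returns the
  ;; index of the first element it did not fill.
  (setf (fill-pointer mod-file-data)
        (read-sequence mod-file-data mod-file)))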

Please look up adjustable arrays and fill-pointers in the CLHS.
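
For reference, one way the fill-pointer variant might look when written
out in full.  This is only a sketch, and READ-FILE-WITH-FILL-POINTER is
a made-up name:

```lisp
(defun read-file-with-fill-pointer (path)
  "Return the contents of the text file PATH as a string with a fill
pointer set to the number of characters actually read."
  (with-open-file (in path :direction :input)
    (let ((buffer (make-array (file-length in)
                              :element-type 'character
                              :fill-pointer t)))
      ;; Shrink the active length to what READ-SEQUENCE filled in.
      (setf (fill-pointer buffer) (read-sequence buffer in))
      buffer)))
```

Sequence functions such as SEARCH, SUBSEQ, CONCATENATE and
WRITE-SEQUENCE only see the elements below the fill pointer, so the
rest of the original code can stay as it is.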

There are several ways your algorithm could be improved (if needed),
but that was not your question. Other posters have pointed at some
improvement paths.

Regards,
	Joerg Hoehle
T-Systems International ITC-Security