A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files

From: gnuist006
Subject: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 00:03:15 +0000
Message-ID: <b00bb831.0301131603.34e9704c@posting.google.com>

Here is the type of lines I have in a file:

junk  label="junk1/junk2/junk3/.../junkn/" more junk

I want to find every line that has

label="..."

pattern

and then I want to replace every / by _ inside the
quotes.

For the purposes of continuity, I want the script to look like this:

cat file |
sed commands |
awk commands |

etc.

I do not care if it is all sed or awk, or in what order.

Note that the junk is usually alphanumeric with dots etc., but no slashes,
so it can be represented by [^/]* if / is not special, or with the /
escaped otherwise. There may be other /'s on the line outside the double
quotes that start with label=, and they must not be changed.

This problem is about making changes only to the part of a line that
matches a regexp. It is not about making changes throughout a whole line
that contains a match. That is what makes it difficult for me.
The other difficulty is that the number of slashes inside the double
quotes is not fixed; otherwise I would use tagged expressions.

I hope you enjoy this problem. I posted it only after wrestling with it
for a while.

gnuist.

BTW, I can do this kind of operation inside emacs. However, I do not know
how to write a lisp script, or how to automatically load 100s of files
one after another in emacs, run a function on them, store their output
in another file, then close those buffers and go on to the next file.
I would like as many approaches to this problem as possible, i.e.:
sed
awk
lisp script
lisp in emacs

From: Christopher J. White
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 03:46:03 +0000
Message-ID: <m2vg0stmv8.fsf@bluesteel.grierwhite.com>

Here's my quick perl solution...

Save as "junk.pl", then run as "junk.pl <file1> <file2> <file3> ... ". 
Each file is renamed to <file>.old, and the output is written to <file>.

Note, this assumes that there are no double quotes in the label
expression to be manipulated.  I can't see how you'd determine the
end of the expression if this isn't true (unless double quotes
might be quoted with a backslash or something).  

#!/usr/bin/perl
use strict;
use warnings;

foreach my $outfile (@ARGV)
{
    my $infile = $outfile . ".old";

    print "infile: $infile\n";
    print "outfile: $outfile\n";

    rename $outfile, $infile or die "cannot rename $outfile: $!\n";

    open my $in,  '<', $infile  or die "cannot open $infile: $!\n";
    open my $out, '>', $outfile or die "cannot open $outfile: $!\n";

    while (my $line = <$in>) {
        # greedy (.*) means only the last label="..." on a line is changed
        if ($line =~ /^(.*)label=\"([^\"]*)\"(.*)$/)
        {
            my ($s1, $s2, $s3) = ($1, $2, $3);
            $s2 =~ s/\//_/g;    # every / inside the quotes becomes _
            print {$out} $s1 . "label=\"" . $s2 . "\"" . $s3 . "\n";
        }
        else
        {
            print {$out} $line;
        }
    }

    close $in;
    close $out;
}

From: gnuist006
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 08:42:05 +0000
Message-ID: <b00bb831.0301140042.4dff4b4e@posting.google.com>

The main group for followups to this is comp.unix.shell
or gnu.emacs.help, depending on the type of solution.
I hope I have chosen the relevant groups for cross-posting.
----

Even though the problem posed in this thread is still

*** UNSOLVED ***

let me extend my gratitude to Mr Christofer and Friedrich
for their kind attempts to answer this. I hope for some
more help tonight.

Mr Friedrich's reply uses perl. I am not familiar with
this language. Also I prefer not to generate very many 
intermediate files since I have to process a large number
of them. I also want to use bash/sed/awk instead of
perl.

I can write a bash wrapper loop but what I do not know
here is how to implement the core logic of replacing an
indefinite number of forward-slashes within a pattern
of interest.



On the other hand for a lisp based solution 
I CAN write a macro or a lisp function to do the
core logic in lisp inside emacs by myself using narrow 
and widen or transient mode. But here what I do not know 
is how to load one file after another and then save it
to a new name and close that buffer. Please just show me
how to do a bunch of files in this way. I can generate the
file names like this in bash:

for i in `du -a directory | grep file.txt | sed to remove some junk from du`; do

core logic

done
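A sketch of such a wrapper loop, assuming the file names can come from find rather than from parsing du output, and that each file should be rewritten in place through a single temporary file. The function name process_tree and its argument order are made up for illustration; FILTER can be any of the sed or awk commands in this thread. It assumes file names contain no newlines.

```shell
#!/bin/sh
# process_tree DIR NAME FILTER [ARGS...]   (hypothetical helper)
# Runs FILTER on every regular file called NAME under DIR and replaces
# the file with the filter's output, using one temporary file at a time.
process_tree() {
    dir=$1
    name=$2
    shift 2
    find "$dir" -type f -name "$name" | while IFS= read -r f; do
        # Only overwrite the original if the filter succeeded.
        if "$@" < "$f" > "$f.tmp"; then
            mv "$f.tmp" "$f"
        else
            rm -f "$f.tmp"
        fi
    done
}
```

Usage: process_tree directory file.txt FILTER..., for example with one of the awk programs from the replies as FILTER. The filter reads the file on its own stdin, so it does not consume the list of names coming from find.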


Thanks a lot and do not forget to have fun working on
this problem. It is a little out of the way.

Gnuist


BTW there is a disjoint thread on this subject in comp.unix.shell
and there have been no useful replies as you can see.

From: Friedrich Dominicus
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 10:23:58 +0000
Message-ID: <874r8cjagx.fsf@fbigm.here>

·········@hotmail.com (gnuist006) writes:

> 
> Mr Friedrich's reply uses perl. 
Definitely not. It's Common Lisp.


> 
> On the other hand for a lisp based solution 
> I CAN write a macro or a lisp function to do the
> core logic in lisp inside emacs by myself using narrow 
> and widen or transient mode. But here what I do not know 
> is how to load one file after another and then save it
> to a new name and close that buffer. Please just show me
> how to do a bunch of files in this way. I can generate the
> file names like this in bash:
Well, extending my solution to more files is easy:

(mapc #'(lambda (file)
          ;; rename the generated file afterwards if needed
          (q-2003-01-14 (namestring file)))
      (directory "pattern"))

That's all

Doing that all in Emacs Lisp isn't much more difficult.


> 
> for i in `du -a directory | grep file.txt | sed to remove some junk
> from du`; do
To get a file listing, use directory in Common Lisp; in Emacs Lisp
it's directory-files.

But I *strongly* suggest you post where you expect an answer. If it
is a shell problem, use some .shell group; if it's Emacs Lisp, use some
Emacs newsgroup; and if you want Common Lisp, post here.

Friedrich

From: Friedrich Dominicus
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 10:37:30 +0000
Message-ID: <87wul8hv9x.fsf@fbigm.here>

·········@hotmail.com (gnuist006) writes:

> 
> Mr Friedrich's reply uses perl.
Please check again. 
> this language. Also I prefer not to generate very many 
> intermediate files since I have to process a large number
> of them. I also want to use bash/sed/awk instead of
> perl.
No problem: just use one intermediate file and rename it as you go.

> 
> 
> On the other hand for a lisp based solution 
> I CAN write a macro or a lisp function to do the
> core logic in lisp inside emacs by myself using narrow 
> and widen or transient mode. But here what I do not know 
> is how to load one file after another and then save it
> to a new name and close that buffer. Please just show me
> how to do a bunch of files in this way. I can generate the
> file names like this in bash:
> 
> for i in `du -a directory | grep file.txt | sed to remove some junk from du`; do
> 
Extending my solution to your needs is simple, and you do not need any
bash programming for it. Doing the same in Emacs Lisp is not terribly
difficult either and does not need any shell programming. I suggest
you decide what you want and post where you expect that answer: if it
is shell scripting, use some shell group; if it is Emacs, some Emacs
group. I have posted a solution which is on-topic in this group, so
feel free to use it or not. But do not ask here for shell solutions.

Friedrich

From: Wayne Throop
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 22:10:59 +0000
Message-ID: <1042582259@sheol.org>

: ·········@hotmail.com (gnuist006)
: Even though the problem posed in this thread is still
: 
: *** UNSOLVED ***
: 
: let me extend my gratitude to Mr Christofer and Friedrich for their
: kind attempts to answer this.  I hope for some more help tonight. 

    perl -pe 's-\b(label="[^"]*")- ((($x=$1) =~ s./._.g),$x) -ge'

: I can write a bash wrapper loop but what I do not know here is how to
: implement the core logic of replacing an indefinite number of
: forward-slashes within a pattern of interest. 

    s/\//_/g

: I also want to use bash/sed/awk instead of perl. 

Why?

Oh, well.  You'd think this bit of awk-wardness would work
by analogy with the perl above:

    awk '{gsub("label=\"[^\"]*\"", gensub("\/","_","g","&"));print}'

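but it doesn't: gensub() (a gawk-only function) is evaluated once, before
gsub() runs, on the literal string "&", so it just returns "&" and gsub()
replaces each match with itself.  Hrm.  Maybe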

    awk '{
        s=$0;
        m=match(s,"label=\"[^\"]*\"");
        if(m){
            pre =substr(s,1,RSTART-1);
            inf =substr(s,RSTART,RLENGTH);
            post=substr(s,RSTART+RLENGTH);
            gsub("/","_",inf);
            s= pre inf post;
        }
        print s;
    }'

Yeah, that works, at least on the cases I tested.  The perl is a tiny
bit cleaner, though; the perl version handles multiple labels on a line,
and the \b ensures the "label" isn't part of a larger word.  Both of
which are a bit tricky to do in awk.  Doable, just not as easy. 
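For what it's worth, both cases can be handled in awk with a loop over match(): walk the line left to right, rewrite each label="..." segment, and leave a segment alone when the character just before "label" is a letter, digit or underscore (a rough stand-in for perl's \b). This is a sketch under the same no-embedded-quotes assumption as the other answers; the function name fix_all_labels is made up for illustration, and it sticks to POSIX awk features (no gensub).

```shell
#!/bin/sh
# fix_all_labels: turn / into _ inside every label="..." on each line
# (hypothetical helper; reads the named files or stdin).
fix_all_labels() {
    awk '{
        out = ""; rest = $0
        # Consume the line one label="..." segment at a time.
        while (match(rest, /label="[^"]*"/)) {
            pre = substr(rest, 1, RSTART - 1)
            inf = substr(rest, RSTART, RLENGTH)
            # Rough \b check: skip xlabel="...", my_label="...", etc.
            if ((out pre) !~ /[A-Za-z0-9_]$/)
                gsub("/", "_", inf)
            out = out pre inf
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$@"
}
```

For example, 'a/b label="x/y" c/d xlabel="p/q" label="m/n/o"' becomes 'a/b label="x_y" c/d xlabel="p/q" label="m_n_o"': the slashes outside the quotes and the xlabel= value are untouched.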

Plus which, the perl looks more like line noise,
which is cool, and promotes job security.


Wayne Throop   ·······@sheol.org   http://sheol.org/throopw

From: Friedrich Dominicus
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 07:13:56 +0000
Message-ID: <874r8ckxu3.fsf@fbigm.here>

·········@hotmail.com (gnuist006) writes:

> Here is the type of lines I have in a file:
> 
> junk  label="junk1/junk2/junk3/.../junkn/" more junk
> 
> I want to find every line that has
> 
> label="..."
> 
> pattern
> 
> and then I want to replace every / by _ inside the
> quotes.
> 
> For the purposes of continuity, I want the script to look like this:
> 
> cat file |
> sed commands |
> awk commands |
you do not need cat for just one file.

> 
> etc.
> 
> I do not care if it is all sed or awk, or in what order.
Well, you posted to c.l.lisp, so here's a Lisp solution for one file. It
is left to you to expand it to more files (which is not too difficult).
(defun q-2003-01-14 (in-file)
  (let ((out-file (concatenate 'string (subseq in-file 0 
                                               (position #\. in-file :from-end t)) ".out")))
    (with-open-file (out out-file :direction :output 
                         :if-does-not-exist :create
                         :if-exists :supersede)
      (clawk:for-file-lines (in-file)
        (when (clawk:match clawk:$0 "label=\"([^\"]*)\"")
          (let* ((submatch (aref clawk:*regs* 1))
                 (substr (subseq clawk:$0 (car submatch) (cdr submatch))))
            (replace clawk:$0 
                     (pregexp-replace* "/" substr "_")
                     :start1 (car submatch)
                     :end1 (cdr submatch))))
        (princ clawk:$0 out)
        (terpri out) 
        (values)))))
              

With this file 
junk label="junk1/junk2/junk3/junk4" other stuff
other junk label="j1/j2/j3" other stuff and much more
nothing
label="/t1/t2/t3"

I got this result
junk label="junk1_junk2_junk3_junk4" other stuff
other junk label="j1_j2_j3" other stuff and much more
nothing
label="_t1_t2_t3"

Quite nice scripting with Common Lisp IMHO :)

Regards
Friedrich

From: ······@lksejb.lks.agilent.com
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 18:18:01 +0000
Message-ID: <uadi34mue.fsf@lksejb.lks.agilent.com>

·········@hotmail.com (gnuist006) writes:

> Here is the type of lines I have in a file:
> 
> junk  label="junk1/junk2/junk3/.../junkn/" more junk
> 
> I want to find every line that has
> 
> label="..."
> 
> pattern
> 
> and then I want to replace every / by _ inside the
> quotes.
> 
> For the purposes of continuity, I want the script to look like this:
> 
> cat file |
> sed commands |
> awk commands |
> 
> etc.
> 
> I do not care if it is all sed or awk, or in what order.

Can we assume that the only double quotes are those surrounding the
label string?  If so, you can use that to split the input file into
three files, something like:

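# cut passes lines containing no " through whole, so such lines come out
# repeated three times, joined by quotes; every line needs a label="...".
cut -f1 -d\" file > file1
cut -f2 -d\" file > file2
cut -f3 -d\" file > file3
sed 's/\//_/g' < file2 > file2.new
paste -d\" file1 file2.new file3 > outputfile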

I didn't test this, but it might work...

-- 
Eric Backus
R&D Design Engineer
Agilent Technologies, Inc.
425-335-2495 Tel

From: Stefan Monnier
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 19:34:02 +0000
Message-ID: <5llm1ncyqd.fsf@rum.cs.yale.edu>

>>>>> "ericjb" == ericjb  <······@lksejb.lks.agilent.com> writes:
>> Here is the type of lines I have in a file:
>> junk  label="junk1/junk2/junk3/.../junkn/" more junk
>> I want to find every line that has
>> label="..."
>> pattern
>> and then I want to replace every / by _ inside the
>> quotes.

sed '/label=".*"/s|/|_|g'
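Restricting the substitution to lines that contain label="..." still rewrites every slash on those lines, including any outside the quotes. One common sed idiom for an indefinite number of slashes is a loop: replace the first slash that follows label=" and a run of non-quote, non-slash characters, then branch back with t until nothing changes. The sketch below wraps that idiom in a shell function; the name fix_labels is made up for illustration. Like the other answers it assumes the label text contains no double quotes, and it does not check that "label" is a whole word (xlabel="..." would be changed too).

```shell
#!/bin/sh
# fix_labels: replace every / with _ inside each label="..." on a line,
# leaving slashes elsewhere alone (hypothetical helper; files or stdin).
fix_labels() {
    # :a   loop label
    # s    turn the first slash inside a label="..." into _ ; [^"/]*
    #      stops at the closing quote, so later slashes never match
    # ta   if that substitution succeeded, go back to :a
    sed -e ':a' -e 's|\(label="[^"/]*\)/|\1_|' -e 'ta' "$@"
}
```

Using | as the s delimiter avoids having to escape the / characters in both the regex and the bracket expression.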

>> cat file |
>> sed commands |
>> awk commands |

I don't know if the Useless Use of Cat Award is still up for grabs,
so I'd recommend you don't bother running for it and just use
redirection instead:

  sed commands <file |
  awk commands |


-- Stefan


PS: This has nothing to do with Lisp, Emacs, or GNU, so I redirected
    the discussion to comp.unix.shell.

From: Kaz Kylheku
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 23:19:48 +0000
Message-ID: <cf333042.0301141519.369b7d90@posting.google.com>

·········@hotmail.com (gnuist006) wrote in message news:<····························@posting.google.com>...
> cat file |

Doh!

> BTW, I can do this kind of operation inside emacs. However, I do not know
> how to write an lisp script.

In an October 2002 thread that you (as gnuist007) started under the
subject line ``On refining regexp by adding exceptions systematically'',
and in the November thread ``Lambda Calculus and it [sic] relation to
LISP'', you received some comments regarding your use of newsgroups.
This would be a good time to re-read some of them.

From: Alan Mackenzie
Subject: Re: A very simple question on SED or AWK for a GURU, possibly a lisp script or emacs batch processing of many files
Date: Tue, 14 Jan 2003 22:54:19 +0000
Message-ID: <re420b.v5.ln@acm.acm>

gnuist006 <·········@hotmail.com> wrote on 13 Jan 2003 16:03:15 -0800:
> Here is the type of lines I have in a file:

> junk  label="junk1/junk2/junk3/.../junkn/" more junk

> I want to find every line that has

> label="..."

> pattern

> and then I want to replace every / by _ inside the
> quotes.

Sounds like awk could be your tool of choice.  Using gawk:

gawk 'BEGIN {FS = "\""; OFS = "\""}; /label="/ {gsub("/", "_", $2)}; {print}' file

(or something very like it) will do the job.  Note:  I haven't tested
this.  The solution assumes that the "junk" at the beginning of each line
doesn't contain any "s.

I would guess that alternative solutions, whether in Emacs lisp or
perl or whatever would be much longer than this one-liner.  The
newsgroup comp.lang.awk might be a better place to ask such questions.
Alternatively, email me if the above gawk program doesn't "quite" work,
or you want me to explain it.

> gnuist.

-- 
Alan Mackenzie (Munich, Germany)
Email: ····@muuc.dee; to decode, wherever there is a repeated letter
(like "aa"), remove half of them (leaving, say, "a").