From: Jeremy
Subject: Excluding items with Regular Expression
Date: 
Message-ID: <ef54c5b3.0204082042.31bd4350@posting.google.com>
Hello,

I am working on a project to identify tag lines in a file.  The
general pattern is an alpha-numeric string, followed by a dash (-),
followed by subsequent alpha-numeric strings.  There are lines in the
file that fit the pattern, but should not be included in the list. 
Does anyone have any idea how to exclude a specific subset within a
regular expression?

To help illustrate the problem, from the following list of possible
matches:

PROGRAM-ID.
DATE-WRITTEN.
DATE-COMPILED.
SOURCE-COMPUTER.
OBJECT-COMPUTER.
INPUT-OUTPUT
FILE-CONTROL.
WORKING-STORAGE
Z000-CARD-ERROR.
Z000-CARD-ERROR-EXIT.

Only Z000-CARD-ERROR. should be selected.  The following RE
selects the Z000-CARD-ERROR. line and excludes the Z000-CARD-ERROR-EXIT.
line.  The trouble is that it also picks up all the preceding lines...

RE: "\(^[0-9A-Z]+\-\([^E]\|E\([^X]\|X\([^I]\|I\([^T]\|T[^\.]\)\)\)\)\)\([^
]\)*\.$"

Any advice is appreciated!

Mahalo,
Jeremy

From: Michael Parker
Subject: Re: Excluding items with Regular Expression
Date: 
Message-ID: <9f023346.0204090817.10e3c045@posting.google.com>
···············@hotmail.com (Jeremy) wrote in message news:<····························@posting.google.com>...
> Hello,
> 
> I am working on a project to identify tag lines in a file.  The
> general pattern is an alpha-numeric string, followed by a dash (-),
> followed by subsequent alpha-numeric strings.  There are lines in the
> file that fit the pattern, but should not be included in the list. 
> Does anyone have any idea how to exclude a specific subset within a
> regular expression?

At first blush, all the lines you give match this informal pattern.
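
A minimal Python sketch of that observation. It assumes one plausible reading of the verbal description: uppercase alphanumerics, then one or more dash-separated alphanumeric groups, with the trailing period optional because two of the sample lines lack it. The pattern `TAG_LIKE` is an illustration, not Jeremy's own RE.

```python
import re

# Assumed reading of the description: ALNUM, then one or more "-ALNUM"
# groups, with an optional trailing period.
TAG_LIKE = re.compile(r"^[0-9A-Z]+(?:-[0-9A-Z]+)+\.?$")

# The sample lines from the original post.
SAMPLE = """
PROGRAM-ID. DATE-WRITTEN. DATE-COMPILED. SOURCE-COMPUTER. OBJECT-COMPUTER.
INPUT-OUTPUT FILE-CONTROL. WORKING-STORAGE Z000-CARD-ERROR. Z000-CARD-ERROR-EXIT.
""".split()

matches = [line for line in SAMPLE if TAG_LIKE.match(line)]
print(len(matches), "of", len(SAMPLE))  # 10 of 10
```
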
 
> To help illustrate the problem, from the following list of possible
> matches:
> 
> PROGRAM-ID.
> DATE-WRITTEN.
> DATE-COMPILED.
> SOURCE-COMPUTER.
> OBJECT-COMPUTER.
> INPUT-OUTPUT
> FILE-CONTROL.
> WORKING-STORAGE
> Z000-CARD-ERROR.
> Z000-CARD-ERROR-EXIT.
> 
> Only Z000-CARD-ERROR. should be selected.  The following RE
> selects the Z000-CARD-ERROR. line and excludes the Z000-CARD-ERROR-EXIT.
> line.  The trouble is that it also picks up all the preceding lines...
> 
> RE: "\(^[0-9A-Z]+\-\([^E]\|E\([^X]\|X\([^I]\|I\([^T]\|T[^\.]\)\)\)\)\)\([^
> ]\)*\.$"
> 
> Any advice is appreciated!

Since all of these lines fit your verbal description of the pattern,
what really makes "Z000-CARD-ERROR." special?

If that's the only one you want to pick up, then how about the pattern
"^Z000-CARD-ERROR\\."?

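In Python, for example, the same anchored literal can be written as a raw string. The doubled backslash above is presumably string-literal escaping, so the regex itself is `^Z000-CARD-ERROR\.`. A minimal sketch, run against the sample lines from the post:

```python
import re

SAMPLE = """
PROGRAM-ID. DATE-WRITTEN. DATE-COMPILED. SOURCE-COMPUTER. OBJECT-COMPUTER.
INPUT-OUTPUT FILE-CONTROL. WORKING-STORAGE Z000-CARD-ERROR. Z000-CARD-ERROR-EXIT.
""".split()

# Anchored literal: the line must begin with exactly "Z000-CARD-ERROR."
EXACT = re.compile(r"^Z000-CARD-ERROR\.")

selected = [line for line in SAMPLE if EXACT.match(line)]
print(selected)  # ['Z000-CARD-ERROR.']
```

Because the pattern requires a period immediately after `ERROR`, `Z000-CARD-ERROR-EXIT.` is not selected.
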
If there are many more, why not just enumerate them in your pattern?
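
A sketch of the enumeration approach in Python, assuming the wanted tags are kept in a list. The helper `enumerated_pattern` and the contents of `WANTED` are hypothetical, chosen only for illustration; `re.escape` keeps any metacharacters in the names literal.

```python
import re

def enumerated_pattern(names):
    """Build one anchored regex that accepts exactly the listed tags,
    each followed by a period at end of line."""
    # re.escape makes each name match literally, including its dashes.
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(rf"^(?:{alternatives})\.$")

# Hypothetical list of wanted tags, for illustration only.
WANTED = ["Z000-CARD-ERROR", "FILE-CONTROL"]

SAMPLE = """
PROGRAM-ID. DATE-WRITTEN. DATE-COMPILED. SOURCE-COMPUTER. OBJECT-COMPUTER.
INPUT-OUTPUT FILE-CONTROL. WORKING-STORAGE Z000-CARD-ERROR. Z000-CARD-ERROR-EXIT.
""".split()

pattern = enumerated_pattern(WANTED)
selected = [line for line in SAMPLE if pattern.match(line)]
print(selected)  # ['FILE-CONTROL.', 'Z000-CARD-ERROR.']
```
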

If there are many many more, then you need to be more specific in your
description of what you're trying to recognize.
From: Joe Marshall
Subject: Re: Excluding items with Regular Expression
Date: 
Message-ID: <o_Cs8.15959$%s3.5494652@typhoon.ne.ipsvc.net>
"Jeremy" <···············@hotmail.com> wrote in message
·································@posting.google.com...
> Hello,
>
> I am working on a project to identify tag lines in a file.  The
> general pattern is an alpha-numeric string, followed by a dash (-),
> followed by subsequent alpha-numeric strings.  There are lines in the
> file that fit the pattern, but should not be included in the list.

How do you (as a human) tell the difference?

> Does anyone have any idea how to exclude a specific subset within a
> regular expression?

You shouldn't try to use a regular expression for anything too
complicated.  If you have one expression that correctly selects
all potential `tag lines' and another that correctly rejects
the ones that matched by accident, then perhaps you should
do it in two passes.
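
A minimal two-pass sketch in Python. Both patterns are assumptions: `SELECT` is one reading of "tag-shaped line", and `REJECT` is a hypothetical reject list built from the header names in the sample plus anything ending in `-EXIT.`.

```python
import re

SAMPLE = """
PROGRAM-ID. DATE-WRITTEN. DATE-COMPILED. SOURCE-COMPUTER. OBJECT-COMPUTER.
INPUT-OUTPUT FILE-CONTROL. WORKING-STORAGE Z000-CARD-ERROR. Z000-CARD-ERROR-EXIT.
""".split()

# Pass 1 (assumed pattern): anything shaped like a period-terminated tag line.
SELECT = re.compile(r"^[0-9A-Z]+(?:-[0-9A-Z]+)+\.$")

# Pass 2 (hypothetical reject list): the fixed header names from the sample,
# or any line ending in "-EXIT.".
REJECT = re.compile(
    r"^(?:PROGRAM-ID|DATE-WRITTEN|DATE-COMPILED|SOURCE-COMPUTER"
    r"|OBJECT-COMPUTER|FILE-CONTROL)\.$"
    r"|-EXIT\.$"
)

def tag_lines(lines):
    """Keep lines that pass the selecting pattern and fail the rejecting one."""
    candidates = [line for line in lines if SELECT.match(line)]
    return [line for line in candidates if not REJECT.search(line)]

print(tag_lines(SAMPLE))  # ['Z000-CARD-ERROR.']
```

Keeping the two patterns separate means each stays simple, and the reject list can grow without touching the selecting pattern.
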
From: Dorai Sitaram
Subject: Re: Excluding items with Regular Expression
Date: 
Message-ID: <a8v11l$72i$1@news.gte.com>
In article <····························@posting.google.com>,
Jeremy <···············@hotmail.com> wrote:
>Hello,
>
>I am working on a project to identify tag lines in a file.  The
>general pattern is an alpha-numeric string, followed by a dash (-),
>followed by subsequent alpha-numeric strings.  There are lines in the
>file that fit the pattern, but should not be included in the list. 
>Does anyone have any idea how to exclude a specific subset within a
>regular expression?
>
>To help illustrate the problem, from the following list of possible
>matches:
>
>PROGRAM-ID.
>DATE-WRITTEN.
>DATE-COMPILED.
>SOURCE-COMPUTER.
>OBJECT-COMPUTER.
>INPUT-OUTPUT
>FILE-CONTROL.
>WORKING-STORAGE
>Z000-CARD-ERROR.
>Z000-CARD-ERROR-EXIT.
>
>Only Z000-CARD-ERROR. should be selected.  The following RE
>selects the Z000-CARD-ERROR. line and excludes the Z000-CARD-ERROR-EXIT.
>line.  The trouble is that it also picks up all the preceding lines...
>
>RE: "\(^[0-9A-Z]+\-\([^E]\|E\([^X]\|X\([^I]\|I\([^T]\|T[^\.]\)\)\)\)\)\([^
>]\)*\.$"
>
>Any advice is appreciated!

You have not stated, in English, which patterns
you want to match and which you want to reject.  All
you have stated is that you want to match

Z000-CARD-ERROR.

but not

Z000-CARD-ERROR-EXIT.

But it doesn't look like you want
"^Z000-CARD-ERROR\.$", which matches exactly the first
string and nothing else.

If you want to stop a pattern "pat" from matching
where it would otherwise match but is followed by
"-EXIT", you could use a negative lookahead:

"pat(?!-EXIT)"
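
A Python sketch of the lookahead (Python's `re` module supports `(?!...)`). `LITERAL` instantiates "pat" as the tag name itself; `NAIVE` and `GENERAL` are assumed generalizations added for illustration, showing a backtracking pitfall and one way around it.

```python
import re

SAMPLE = """
PROGRAM-ID. DATE-WRITTEN. DATE-COMPILED. SOURCE-COMPUTER. OBJECT-COMPUTER.
INPUT-OUTPUT FILE-CONTROL. WORKING-STORAGE Z000-CARD-ERROR. Z000-CARD-ERROR-EXIT.
""".split()

# "pat(?!-EXIT)" with pat as the literal tag name: the lookahead succeeds
# only if the text right after the name is not "-EXIT".
LITERAL = re.compile(r"^Z000-CARD-ERROR(?!-EXIT)")

# Pitfall (assumed general pat): [0-9A-Z-]+ greedily swallows "...-EXIT",
# so the lookahead then sees only "." and succeeds anyway.
NAIVE = re.compile(r"^[0-9A-Z-]+(?!-EXIT)")

# Assumed generalization: put the lookahead at the start of the line so it
# inspects the whole line, then require the general tag shape.
GENERAL = re.compile(r"^(?!.*-EXIT\.$)[0-9A-Z]+(?:-[0-9A-Z]+)+\.$")

literal_hits = [line for line in SAMPLE if LITERAL.match(line)]
general_hits = [line for line in SAMPLE if GENERAL.match(line)]

print(literal_hits)                                # ['Z000-CARD-ERROR.']
print(bool(NAIVE.match("Z000-CARD-ERROR-EXIT.")))  # True
print(general_hits)  # every period-terminated line except the -EXIT one
```

`GENERAL` still accepts header lines such as `PROGRAM-ID.`: a lookahead removes only what you tell it to remove, so the criterion for a real tag line still has to be stated.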