From: Slobodan Blazeski
Subject: Anybody willing to share their cl-ppcre regex for validating email 	address
Date: 
Message-ID: <7408afe6-b8e0-436b-80d1-b06c114eb060@h11g2000prf.googlegroups.com>
I'm adding a new type, email address, and I want to validate it with
cl-ppcre.
I just started using regexes and found plenty of email validators on
the web, but I would prefer to see what Lispers use.

thanks
Slobodan

From: Brian Adkins
Subject: Re: Anybody willing to share their cl-ppcre regex for validating 	email address
Date: 
Message-ID: <ac8f5571-893a-493f-97eb-418ca00e476f@c4g2000hsg.googlegroups.com>
On Jan 3, 4:58 pm, Slobodan Blazeski <·················@gmail.com>
wrote:
> I'm adding a new type - email address and I want to make validation
> with cl-ppcre .
> I just started using the regexes and  I found a plenty of those email
> validators on web  but I would prefer to see what lispers use.
>
> thanks
> Slobodan

I read the following article recently but haven't had time to port it
to something other than PHP yet. You may find some useful ideas in it:
http://www.linuxjournal.com/article/9585

I've been using the following regex which is far from perfect but
hasn't caused me any great problems:
^([-\w+.]+)@(([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(([-\w]+\.)+[a-zA-Z]{2,4}))$

Brian Adkins
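
A minimal sketch of how the regex above might be used from cl-ppcre. It assumes Quicklisp is available to load the library; the names *simple-email-regex* and simple-email-p are hypothetical and not from the thread. The main point is that every backslash of the regex has to be doubled inside a Lisp string literal.

```lisp
;; Assumption: Quicklisp is installed. Load this file with LOAD (or paste it
;; into the REPL) so that the cl-ppcre package exists before it is referenced.
(ql:quickload "cl-ppcre" :silent t)

;; The regex from the post as a Lisp string: each regex backslash is written
;; twice, because the Lisp reader consumes one level of escaping.
(defparameter *simple-email-regex*
  (cl-ppcre:create-scanner
   "^([-\\w+.]+)@(([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3})|(([-\\w]+\\.)+[a-zA-Z]{2,4}))$"))

(defun simple-email-p (address)
  "Hypothetical helper: T if ADDRESS matches *SIMPLE-EMAIL-REGEX*, else NIL."
  (and (cl-ppcre:scan *simple-email-regex* address) t))
```

With this pattern, an address such as fred+lists@example.com is accepted, while one under a longer top-level domain such as .museum is rejected by the {2,4} bound.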
From: Vassil Nikolov
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <snwwsqq5ic2.fsf@luna.vassil.nikolov.names>
Slobodan Blazeski <·················@gmail.com> writes:
> ...
> I just started using the regexes and  I found a plenty of those email
> validators on web  but I would prefer to see what lispers use.

  It very much matters what exactly you require (if you are just
  experimenting, then it probably doesn't matter at all).  For
  example, do you need to err on the side of caution, or on the side
  of tolerance?

  Generally speaking, validating the domain part is fairly easy (but
  note that e.g. top-level domains are not necessarily up to four
  characters long); validating the local part is harder.

  (When I needed such a regex some time ago, I rolled my own.  If
  nothing else, that was easier than verifying that someone else's
  regex did what I wanted.)

  ---Vassil.

-- 
Bound variables, free programmers.
From: Christopher Browne
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <60wsqp31k8.fsf@dba2.int.libertyrms.com>
Vassil Nikolov <···············@pobox.com> writes:
> Slobodan Blazeski <·················@gmail.com> writes:
>> ...
>> I just started using the regexes and  I found a plenty of those email
>> validators on web  but I would prefer to see what lispers use.
>
>   It very much matters what exactly you require (if you are just
>   experimenting, then it probably doesn't matter at all).  For
>   example, do you need to err on the side of caution, or on the side
>   of tolerance?
>
>   Generally speaking, validating the domain part is fairly easy (but
>   note that e.g. top-level domains are not necessarily up to four
>   characters long); validating the local part is harder.

ICANN has approved TLDs .museum and .travel that are both > 4
characters long...

>   (When I needed such a regex some time ago, I rolled my own.  If
>   nothing else, that was easier than verifying that someone else's
>   regex did what I wanted.)

I've always found that if I can split the address into pieces, it's
easier and safer to validate those pieces...

Notably, an email address will consist of at least the following
"major chunks":

  [local identity section]@[optional 3rd-or-higher level domain].[2nd level domain].[TLD]

OR
  [local identity section]@[IP address] 
      (which is in poor form, which I'd be liable to reject)

You could apply some more-or-less strict rules to the 2nd level
domain, and potentially some rather strict rules to the TLD.

 - For instance with the TLD, there is a limited set of TLDs of length
   >=3, which might be readily encoded; something like:

   (net|com|org|info|name|aero|museum|biz|arpa|asia|coop|edu|gov|int|jobs|mil|mobi|pro|tel|cat)

 - There is a periodically varying set of 2-letter country TLDs that
   *might* be encoded by a regexp that distinguishes between those that
   do and don't exist.

   This changes frequently enough that I'd rather go with matching
   them against: [a-zA-Z][a-zA-Z]

For the TLDs, ICANN periodically approves new ones, most recently,
.asia, so that there is some riskiness to encoding this very tightly.

How tolerant to be is indeed a big question here.  

I'd be inclined to be "pretty tolerant," because intolerance is liable
to lead to the code needing to get revved a lot, and not for
particularly valuable reasons.
-- 
output = ("cbbrowne" ·@" "linuxdatabases.info")
http://linuxdatabases.info/info/finances.html
:FATAL ERROR -- VECTOR OUT OF HILBERT SPACE
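
The split-then-validate idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: Quicklisp is used to load cl-ppcre, the function names are hypothetical, the local part is deliberately left unchecked, and the domain rule is a "pretty tolerant" one (any alphabetic TLD of two or more letters) rather than a fixed list.

```lisp
;; Assumption: Quicklisp is installed; load with LOAD or paste into the REPL.
(ql:quickload "cl-ppcre" :silent t)

(defun split-address (address)
  "Split ADDRESS at its last @. Return local part and domain as two values,
or NIL if either piece would be empty or there is no @ at all."
  (let ((at (position #\@ address :from-end t)))
    (when (and at (plusp at) (< (1+ at) (length address)))
      (values (subseq address 0 at) (subseq address (1+ at))))))

(defun plausible-domain-p (domain)
  "Tolerant check: one or more dot-terminated labels, then an alphabetic
TLD of at least two letters. IP addresses are rejected by this rule."
  (and (cl-ppcre:scan "^([A-Za-z0-9-]+\\.)+[A-Za-z]{2,}$" domain) t))

(defun plausible-address-p (address)
  "Hypothetical top-level check; only the domain is validated here."
  (multiple-value-bind (local domain) (split-address address)
    (and local (plausible-domain-p domain))))
```

Because the TLD rule only requires letters, newly approved TLDs pass without any change to the code.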
From: Rob Warnock
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <1rqdnZwiRfi8GuLanZ2dnUVZ_ournZ2d@speakeasy.net>
Christopher Browne  <········@ca.afilias.info> wrote:
+---------------
| How tolerant to be is indeed a big question here.  
| 
| I'd be inclined to be "pretty tolerant," because intolerance is liable
| to lead to the code needing to get revved a lot, and not for
| particularly valuable reasons.
+---------------

And then there's also that venerable Internet protocol design maxim:

   Be strict in what you emit, and liberal in what you accept.


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Vassil Nikolov
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <snwodc02wbp.fsf@luna.vassil.nikolov.names>
····@rpw3.org (Rob Warnock) writes:
> ...
>    Be strict in what you emit, and liberal in what you accept.

  Quite, but apply it carefully to the case at hand.  Consider that
  one possible use of validation of e-mail addresses is when they are
  collected and entered into a repository (e.g. an address book, to
  catch typos etc.)---the "accept" part, and then these addresses,
  typically just as they are, would be used to send e-mail---the
  "emit" part.  I don't think there is a one-size-fits-all decision
  here; much depends, for example, on whether and how delivery
  failures caused by invalid addresses are handled.  (It is probably
  not worth it to analyze this in the abstract, though.)

  ---Vassil.


From: Edi Weitz
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <uk5mqr1fe.fsf@agharta.de>
On Thu, 3 Jan 2008 13:58:01 -0800 (PST), Slobodan Blazeski <·················@gmail.com> wrote:

> I'm adding a new type - email address and I want to make validation
> with cl-ppcre .  I just started using the regexes and I found a
> plenty of those email validators on web but I would prefer to see
> what lispers use.

Go to the next library and get a copy of Jeffrey Friedl's "Mastering
Regular Expressions".  It contains a 6600-byte regex which spans
several pages.  Friedl says it doesn't cover all possible email
addresses, but it should be pretty OK... :)

Edi.

-- 

European Common Lisp Meeting, Amsterdam, April 19/20, 2008

  http://weitz.de/eclm2008/

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Slobodan Blazeski
Subject: Re: Anybody willing to share their cl-ppcre regex for validating 	email address
Date: 
Message-ID: <f81a748d-00fa-442f-ad6c-62ec34c03df4@e23g2000prf.googlegroups.com>
On Jan 3, 11:26 pm, Edi Weitz <········@agharta.de> wrote:
> On Thu, 3 Jan 2008 13:58:01 -0800 (PST), Slobodan Blazeski <·················@gmail.com> wrote:
> > I'm adding a new type - email address and I want to make validation
> > with cl-ppcre .  I just started using the regexes and I found a
> > plenty of those email validators on web but I would prefer to see
> > what lispers use.
>
> Go to the next library and get a copy of Jeffrey Friedl's "Mastering
> Regular Expressions".  It contains a 6600-byte regex which spans
> several pages.  Friedl says it doesn't cover all possible email
> addresses, but it should be pretty OK... :)
>
> Edi.

Something like this? http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
Besides, I already got Jeffrey E. F. Friedl's book a week ago; that's what
I'm using to learn regular expressions, together with cl-ppcre and
Regex Coach.  :)

What do you suggest instead of regex validation, besides sending mail to
it?

Slobodan
From: Edi Weitz
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <u3ateqzu9.fsf@agharta.de>
On Thu, 3 Jan 2008 14:48:12 -0800 (PST), Slobodan Blazeski <·················@gmail.com> wrote:

> Something like this http://ex-parrot.com/~pdw/Mail-RFC822-Address.html

Not sure if that's the same one.  Looks so short... :)

> beside I already got Jeffrey E. F. Friedl book a week ago, that's
> what I'm using to learn regular expressions together with cl-ppcre
> and regex coach.  :)

Good.  Make sure to download the latest version of Regex Coach which I
released yesterday.

> What do you suggest instead of regex validation beside sending mail
> to it?

I think regex validation is fine and Friedl was overdoing it a bit, of
course.  I usually use some ad-hoc regex which (kind of) works.
(Although ISTR I once or twice had false positives and I had to change
the regex to make it a bit more permissive.)

Edi.

From: Michael Weber
Subject: Re: Anybody willing to share their cl-ppcre regex for validating 	email address
Date: 
Message-ID: <6fd1fae8-a737-42fa-84d0-5624a057769c@x69g2000hsx.googlegroups.com>
On Jan 4, 12:00 am, Edi Weitz <········@agharta.de> wrote:
> I usually use some ad-hoc regex which (kind of) works.
> (Although ISTR I once or twice had false positives and I had to change
> the regex to make it a bit more permissive.)

The tragedy is that many developers cook something like that up
without reading the relevant RFCs, probably thinking "Yeah, I know
what an email address looks like".  Except that they did not know that,
e.g., #\+ or #\- are valid characters.  And I am not even talking
about _full_ email addresses, with ()-comments and all.

Does your ad-hoc regex accept ···········@example.com? :)


Cheers,
Michael
From: Slobodan Blazeski
Subject: Re: Anybody willing to share their cl-ppcre regex for validating 	email address
Date: 
Message-ID: <23d7f602-d958-48e8-bee0-57927c21184b@s12g2000prg.googlegroups.com>
It seems that even Jeffrey Friedl's regex, copied from Joost's link
http://search.cpan.org/src/RJBS/Email-Valid-0.179/lib/Email/Valid.pm
is far from perfect.
Here's an example fed with the data from
http://www.linuxjournal.com/article/9585:

PPCRE> (defun valid (email)
	 (let ((scanner (create-scanner rg1)))
	   (scan scanner email)))
VALID

PPCRE> (valid ·········@blazeski.com")
0
21
#(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
NIL NIL
  NIL NIL NIL NIL NIL)
#(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
NIL NIL
  NIL NIL NIL NIL NIL)

PPCRE> (defvar *valid-emails*
	 (list ·····@us.ibm.com"
               ······@···@example.com"
               ········@example.com"
               "Fred\\ ······@example.com"
               ·············@example.com"
               ······@·····@example.com"
               "\"Fred ········@example.com"
               ·····························@example.com"
               ·········@example.com"
               ·············@example.com"
               ··········@example.com"
               ·············@example.com"
               ············@example.com"
               "Doug\\ \\\"Ace\\\"\\ ······@example.com"
               "\"Doug \\\"Ace\\\" ····@example.com"))
*VALID-EMAILS*
PPCRE> (loop for email in *valid-emails* when (not (valid email)) collect email)
(········@example.com") ; ideally this should be nil

PPCRE> (defvar *invalid-emails*
	 (list ····@···@example.com"
               ········@···@example.com"
               ······@example.com"
               ·@example.com"
               ·····@"
               ·····@example.com"
               ······@example.com"
               ·····@example.com"
               ·····@example.com"
               ·········@example.com"
               "\"Doug \"Ace\" ····@example.com"
               "Doug\\ \\\"Ace\\\"\\ ····@example.com"
               "hello ·····@example.com"
               ·······@f.sc.ot.t.f.i.tzg.era.l.d."))

PPCRE> (loop for email in *invalid-emails* when (valid email) collect email)
(····@···@example.com" ········@···@example.com" ·····@example.com"
 ·····@example.com" ·········@example.com" "\"Doug \"Ace\" L.
··@example.com"
 "hello ·····@example.com" ·······@f.sc.ot.t.f.i.tzg.era.l.d.") ;
; ideally this should be nil

cheers
Slobodan
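
One thing that may be worth checking in a test like the one above: cl-ppcre's scan reports a match anywhere in the target string, so a pattern string that carries no anchors of its own will accept any input that merely contains a matching substring. Whether that explains the results here is not confirmed in the thread. The sketch below uses a deliberately tiny stand-in pattern, not Friedl's regex, and the helper names are hypothetical; it assumes Quicklisp is available.

```lisp
;; Assumption: Quicklisp is installed; load with LOAD or paste into the REPL.
(ql:quickload "cl-ppcre" :silent t)

(defun anchored (regex-string)
  "Wrap REGEX-STRING in a non-capturing group between ^ and $ so that it
has to match the target string from beginning to end."
  (concatenate 'string "^(?:" regex-string ")$"))

;; A toy pattern for illustration only; it is far too strict for real use.
(defparameter *toy-regex* "[a-z]+@[a-z]+\\.com")

(defun loose-valid-p (email)
  "Unanchored: true if any substring of EMAIL matches *TOY-REGEX*."
  (and (cl-ppcre:scan *toy-regex* email) t))

(defun strict-valid-p (email)
  "Anchored: true only if all of EMAIL matches *TOY-REGEX*."
  (and (cl-ppcre:scan (anchored *toy-regex*) email) t))
```

With the unanchored version, an input with leading junk such as "hello fred@example.com" is accepted; the anchored version rejects it.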
From: Robert Uhl
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <m37iiptf8y.fsf@latakia.dyndns.org>
Slobodan Blazeski <·················@gmail.com> writes:
> PPCRE> (defvar *valid-emails*
> 	 (list ·····@us.ibm.com"
>                ······@···@example.com"

According to RFC 2822 <http://tools.ietf.org/html/rfc2822> that's not a
valid email address: the local part of an address can contain a letter,
number or !#$%&'*+-/=?^_`{|}~ _or_ it can be a quoted string.  So
·····@····@example.com would be valid, but ····@···@example.com ain't.

>                ········@example.com"

Ditto; ·······@example.com would be valid but ·····@example.com isn't.

>                "Fred\\ ······@example.com"

Ditto.

>                ·············@example.com"

Ditto.

>                ······@·····@example.com"
>                "\"Fred ········@example.com"
>                ·····························@example.com"
>                ·········@example.com"
>                ·············@example.com"
>                ··········@example.com"
>                ·············@example.com"

Some nice examples of valid email addresses most validators fail.  The
bastards.

>                "Doug\\ \\\"Ace\\\"\\ ······@example.com"

That's invalid; it should be "Doug \"Ace\" ·······@example.com

You forgot valid addresses of the form ···@[···@baz!quux\]quuux]

*grin*

Invalid addresses:

>                ·······@f.sc.ot.t.f.i.tzg.era.l.d."))

That's valid, no?  If there were a valid top-level domain 'd,' anyway.

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
What should be written on the point end of a claymore?
'This end tae enemy.'                  --Mike Andrews
From: Vassil Nikolov
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <snwzlvl49vq.fsf@luna.vassil.nikolov.names>
Slobodan Blazeski <·················@gmail.com> writes:
> ...
>                ·····@example.com"
>                ·····@example.com"
>                ·········@example.com"

  I share the general sentiment, but why are the above three listed as
  invalid?  Is there an obscure constraint on the local part in some
  RFCs that I am missing?

  ---Vassil.


From: Geoffrey Summerhayes
Subject: Re: Anybody willing to share their cl-ppcre regex for validating 	email address
Date: 
Message-ID: <76c4b12c-cd26-4edf-9785-c298beea462f@y5g2000hsf.googlegroups.com>
On Jan 4, 3:24 pm, Vassil Nikolov <···············@pobox.com> wrote:
> Slobodan Blazeski <·················@gmail.com> writes:
> > ...
> >                ·····@example.com"
> >                ·····@example.com"
> >                ·········@example.com"
>
>   I share the general sentiment, but why are the above three listed as
>   invalid?  Is there an obscure constraint on the local part in some
>   RFCs that I am missing?

RFC 822:

local-part  =  word *("." word)
word        =  atom / quoted-string
atom        =  1*<any CHAR except specials, SPACE and CTLs>
quoted-string = <"> *(qtext/quoted-pair) <">
specials    =  "(" / ")" / "<" / ">" / ·@"  ; Must be in quoted-
            /  "," / ";" / ":" / "\" / <">  ;  string, to use
            /  "." / "[" / "]"              ;  within a word.

The '.' is defined as a separator and there has to be something
on either side.
Mind you, it looks to me that ··@foo.com and ·········@foo.com
would both be legal.

---
Geoff
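
The unquoted case of the grammar quoted above (atoms separated by single dots) can be approximated with a regex along these lines. This is a sketch, not a full RFC 822 parser: quoted-strings are not handled, the names are hypothetical, and Quicklisp is assumed for loading cl-ppcre. The character class lists the printable ASCII characters that are not specials, which is my reading of the atom rule.

```lisp
;; Assumption: Quicklisp is installed; load with LOAD or paste into the REPL.
(ql:quickload "cl-ppcre" :silent t)

;; One run of atom characters, followed by zero or more groups of a single
;; "." and another run of atom characters. Inside the character class,
;; "^" is not first and "-" is last, so both are literal.
(defparameter *dotted-atoms*
  (cl-ppcre:create-scanner
   "^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*$"))

(defun unquoted-local-part-p (local-part)
  "Hypothetical helper: T if LOCAL-PART consists of atoms joined by single
dots, with no leading, trailing, or doubled dot."
  (and (cl-ppcre:scan *dotted-atoms* local-part) t))
```

A local part that starts or ends with a dot, or contains two dots in a row, fails this check, which matches the "something on either side" reading.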
From: Vassil Nikolov
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <snwbq813wo7.fsf@luna.vassil.nikolov.names>
Geoffrey Summerhayes <·······@gmail.com> writes:

> On Jan 4, 3:24 pm, Vassil Nikolov <···············@pobox.com> wrote:
>> Slobodan Blazeski <·················@gmail.com> writes:
>> > ...
>> >                ······@example.com"
>> >                ······@example.com"
>> >                ··········@example.com"
>>
>>   I share the general sentiment, but why are the above three listed as
>>   invalid?  Is there an obscure constraint on the local part in some
>>   RFCs that I am missing?
>
> RFC 822:
>
> local-part  =  word *("." word)
> word        =  atom / quoted-string
> atom        =  1*<any CHAR except specials, SPACE and CTLs>

  And not an obscure part, either.  Should have checked that myself.
  Thanks.

> ...
> Mind you, it looks to me that ··@foo.com and ·········@foo.com
> would both be legal.

  Me, too.  I seem to recall, though, that some MUAs don't work (well
  or at all) with e-mail addresses containing quotes.  Of course, "is
  this a valid e-mail address" is not exactly the same question as
  "can I send mail to it"...

  ---Vassil.


From: Tony Garnock-Jones
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <NMadnWrqlbUKWuPanZ2dnUVZ8vGdnZ2d@eclipse.net.uk>
Geoffrey Summerhayes wrote:
> RFC 822:

Does 2822 make things any simpler?

Tony
From: Rob Warnock
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <M4WdnZ6ypp_SHuLanZ2dnUVZ_tCrnZ2d@speakeasy.net>
Tony Garnock-Jones  <·······················@google.easily> wrote:
+---------------
| Geoffrey Summerhayes wrote:
| > RFC 822:
| 
| Does 2822 make things any simpler?
+---------------

Possibly a little bit, but not in any significant way in the area
I think you're asking about. RFC 2822 introduced the "dot-atom"
production which simplified the description of when periods were
allowed in unquoted local-parts:

    Some of the structured header field bodies also allow the period
    character (".", ASCII value 46) within runs of atext. An additional
    "dot-atom" token is defined for those purposes.
    ...
    atom            =       [CFWS] 1*atext [CFWS]
    dot-atom        =       [CFWS] dot-atom-text [CFWS]
    dot-atom-text   =       1*atext *("." 1*atext)

    Both atom and dot-atom are interpreted as a single unit, comprised of
    the string of characters that make it up.  Semantically, the optional
    comments and FWS surrounding the rest of the characters are not part
    of the atom; the atom is only the run of atext characters in an atom,
    or the atext and "." characters in a dot-atom.

and then "addr-spec" was tweaked to use "dot-atom":

    3.4.1. Addr-spec specification
    An addr-spec is a specific Internet identifier that contains a
    locally interpreted string followed by the at-sign character (·@",
    ASCII value 64) followed by an Internet domain.  The locally
    interpreted string is either a quoted-string or a dot-atom.  If the
    string can be represented as a dot-atom (that is, it contains no
    characters other than atext characters or "." surrounded by atext
    characters), then the dot-atom form SHOULD be used and the
    quoted-string form SHOULD NOT be used. Comments and folding white
    space SHOULD NOT be used around the ·@" in the addr-spec.


    addr-spec       =       local-part ·@" domain
    local-part      =       dot-atom / quoted-string / obs-local-part
    domain          =       dot-atom / domain-literal / obs-domain
    domain-literal  =       [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]
    dcontent        =       dtext / quoted-pair
    dtext           =       NO-WS-CTL /     ; Non white space controls
			    %d33-90 /       ; The rest of the US-ASCII
			    %d94-126        ;  characters not including "[",
					    ;  "]", or "\"

Also, the "route" syntax in a "addr-spec" was deprecated, see
"4.4 Obsolete Addressing" and the "obs-angle-addr" production.

Finally, "CFWS" [comment and/or folding-white-space] was removed
from being allowed around the dots within a "word" and around the
·@" between a "local-part" and a "domain". [I think. If I'm reading
"4.4" correctly.]

Unfortunately, while these simplifications apply to what you may *send*,
they do *NOT* apply to what you must still be prepared to *receive*:

    3.1. Introduction
    ...
    In some of the definitions, there will be nonterminals whose names
    start with "obs-".  These "obs-" elements refer to tokens defined in
    the obsolete syntax in section 4.  In all cases, these productions
    are to be ignored for the purposes of generating legal Internet
    messages and MUST NOT be used as part of such a message.  However,
    when interpreting messages, these tokens MUST be honored as part of
    the legal syntax.  In this sense, section 3 defines a grammar for
    generation of messages, with "obs-" elements that are to be ignored,
    while section 4 adds grammar for interpretation of messages.

This means that you must still be prepared to *parse* all the
old, ugly syntax, which means that it's really no simplification
at all, practically speaking. (*sigh*)


-Rob

From: Robert Uhl
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <m3sl1crsau.fsf@latakia.dyndns.org>
Vassil Nikolov <···············@pobox.com> writes:

> Slobodan Blazeski <·················@gmail.com> writes:
>> ...
>>                ·····@example.com"
>>                ·····@example.com"
>>                ·········@example.com"
>
>   I share the general sentiment, but why are the above three listed as
>   invalid?  Is there an obscure constraint on the local part in some
>   RFCs that I am missing?

If I read the pseudo-BNF in the RFC properly, names may include anything
but a dot, followed by a dot, followed by anything but a dot (this
pattern may repeat); thus they can neither start nor end with a dot, nor
contain two consecutive dots.

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
That was possibly the only time in my entire career that somebody actually
noticed and cared to say that they were impressed by something I'd achieved.
These days, I could turn lead into gold, and I'd be whinged at because they
wanted platinum.                                            --Peter Corlett
From: Rob Warnock
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <ccudnYvVV-DyOR3anZ2dnUVZ_gqdnZ2d@speakeasy.net>
Robert Uhl  <·········@NOSPAMgmail.com> wrote:
+---------------
| If I read the pseudo-BNF in the RFC properly...
+---------------

Slight aside: What you call "pseudo-BNF" is actually called
"Augmented BNF" (ABNF), a well-known, well-defined extension
of basic BNF. As RFC 2822 "1.2.2. Syntactic notation" notes:

    ... Augmented Backus-Naur Form (ABNF) notation specified in
    [RFC2234] for the formal definitions of the syntax of messages.

RFC 2234 is a "refactoring" of earlier definitions of ABNF, which
was originally specified in RFC 733, copied in RFC 822, and used
in several other RFCs, each of which contained its own definition
of it [though they tended to be pretty much identical]. The main
difference between the ABNF of RFC 2234 and earlier versions is
that the ABNF of RFC 2234 is formally defined both in English
and again *in* ABNF, rather than in just English as in RFC 733.

[Note: RFC 4234 (in 2005) superseded RFC 2234 to correct a
number of minor typos and formatting infelicities. Also see
<http://en.wikipedia.org/wiki/Augmented_Backus-Naur_form>.]

So it's hardly "pseudo", any more than the "Modified BNF" used
in the ANSI Common Lisp Standard is:

    http://alu.org/HyperSpec/Body/sec_1-4-1-2.html


-Rob

p.s. There is a rich history of extensions to BNF, such as ISO-14977
EBNF <http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form>.

From: Edi Weitz
Subject: Re: Anybody willing to share their cl-ppcre regex for validating  email address
Date: 
Message-ID: <ud4sh493o.fsf@agharta.de>
On Fri, 4 Jan 2008 10:51:36 -0800 (PST), Slobodan Blazeski <·················@gmail.com> wrote:

> Seems that even Jeffrey Friedl's regex copied from Joost link
> http://search.cpan.org/src/RJBS/Email-Valid-0.179/lib/Email/Valid.pm
> is far from perfect.

Did you make sure that the backslashes were interpreted correctly when
you pasted the regex into your Lisp?  And whitespace?

Edi.

From: Slobodan Blazeski
Subject: Re: Anybody willing to share their cl-ppcre regex for validating 	email address
Date: 
Message-ID: <0fdd76db-acd2-460b-b175-5c31152846b9@q77g2000hsh.googlegroups.com>
On Jan 4, 9:41 pm, Edi Weitz <········@agharta.de> wrote:
> On Fri, 4 Jan 2008 10:51:36 -0800 (PST), Slobodan Blazeski <·················@gmail.com> wrote:
> > Seems that even Jeffrey Friedl's regex copied from Joost link
> >http://search.cpan.org/src/RJBS/Email-Valid-0.179/lib/Email/Valid.pm
> > is far from perfect.
>
> Did you make sure that the backslashes were interpreted correctly when
> you pasted the regex into your Lisp?  And whitespace?
>
> Edi.

I repeated the tests in a clean environment with a restarted SBCL, but the
results are the same.
This time I pasted the string from http://search.cpan.org/src/RJBS/Email-Valid-0.179/lib/Email/Valid.pm
into a file and set the regex variable like this:
;; Read the file line by line and join the lines (newlines are dropped).
;; WITH-OPEN-FILE closes the stream; IN is NIL if the file does not exist.
(defparameter regex
  (with-open-file (in "/home/bobi/regex" :if-does-not-exist nil)
    (apply #'concatenate 'string
           (when in
             (loop for line = (read-line in nil)
                   while line collect line)))))
Everything else was the same, besides the package designator; I worked in the
cl-user package this time:

Slobodan

CL-USER> (defun valid (email)
           (let ((scanner (cl-ppcre:create-scanner regex)))
             (cl-ppcre:scan scanner email)))
VALID
CL-USER> (valid ·····@gmail.com")
0
14
#()
#()
CL-USER> (defvar *valid-emails*
         (list ·····@us.ibm.com"
               ······@···@example.com"
               ········@example.com"
               "Fred\\ ······@example.com"
               ·············@example.com"
               ······@·····@example.com"
               "\"Fred ········@example.com"
               ·····························@example.com"
               ·········@example.com"
               ·············@example.com"
               ··········@example.com"
               ·············@example.com"
               ············@example.com"
               "Doug\\ \\\"Ace\\\"\\ ······@example.com"
               "\"Doug \\\"Ace\\\" ····@example.com"))
*VALID-EMAILS*
CL-USER> (loop for email in *valid-emails* when (not (valid email)) collect email)
(········@example.com")
CL-USER> (defvar *invalid-emails*
         (list ····@···@example.com"
               ········@···@example.com"
               ······@example.com"
               ·@example.com"
               ·····@"
               ·····@example.com"
               ······@example.com"
               ·····@example.com"
               ·····@example.com"
               ·········@example.com"
               "\"Doug \"Ace\" ····@example.com"
               "Doug\\ \\\"Ace\\\"\\ ····@example.com"
               "hello ·····@example.com"
               ·······@f.sc.ot.t.f.i.tzg.era.l.d."))


*INVALID-EMAILS*
CL-USER> (loop for email in *invalid-emails* when (valid email) collect email)
(····@···@example.com" ········@···@example.com" ·····@example.com"
 ·····@example.com" ·········@example.com" "\"Doug \"Ace\" L.
··@example.com"
 "hello ·····@example.com" ·······@f.sc.ot.t.f.i.tzg.era.l.d.")
CL-USER> (set-difference '(········@example.com")
			 ' (········@example.com") :test 'equalp)
NIL
CL-USER> (set-difference '(····@···@example.com" "abc\\\
·@···@example.com" ·····@example.com"
 ·····@example.com" ·········@example.com" "\"Doug \"Ace\" L.
··@example.com"
 "hello ·····@example.com" ·······@f.sc.ot.t.f.i.tzg.era.l.d.")
			   '(····@···@example.com" ········@···@example.com"
·····@example.com"
 ·····@example.com" ·········@example.com" "\"Doug \"Ace\" L.
··@example.com"
 "hello ·····@example.com" ·······@f.sc.ot.t.f.i.tzg.era.l.d.") :test
#'equalp)
NIL
From: Joost Diepenmaat
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <87d4siy15r.fsf@zeekat.nl>
Edi Weitz <········@agharta.de> writes:

> On Thu, 3 Jan 2008 13:58:01 -0800 (PST), Slobodan Blazeski <·················@gmail.com> wrote:
>
>> I'm adding a new type - email address and I want to make validation
>> with cl-ppcre .  I just started using the regexes and I found a
>> plenty of those email validators on web but I would prefer to see
>> what lispers use.
>
> Go to the next library and get a copy of Jeffrey Friedl's "Mastering
> Regular Expressions".  It contains a 6600-byte regex which spans
> several pages.  Friedl says it doesn't cover all possible email
> addresses, but it should be pretty OK... :)
>

I was going to point to that too. Also, the source of the Email::Valid
perl module appears to contain the regular expression from that book:

http://search.cpan.org/src/RJBS/Email-Valid-0.179/lib/Email/Valid.pm

Joost.
From: Robert Uhl
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <m3k5mptqqk.fsf@latakia.dyndns.org>
Slobodan Blazeski <·················@gmail.com> writes:

> I just started using the regexes and I found a plenty of those email
> validators on web but I would prefer to see what lispers use.

Email addresses can't really be validated with a regular
expression--they require a push-down automaton, not a finite state
automaton.  You can find some regular expressions on the web, but
they're necessarily incomplete and will reject valid email addresses.

The best practical way to check that an email address is correct is to
send email to it; if it's not correct, it'll bounce.

Whatever you do, don't reject '+' in the local portion of the email
address, or gnomes will eat your children.

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
So I was reading Twelfth Night ... and would you believe that the I LOVE YOU
hoax is the exact same trick Shakespeare uses to point out what an arrogant,
self-absorbed fool Malvolio is?                           --Julia McKinnell
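
To illustrate the push-down point made above: RFC 822 comments are written in parentheses and may nest, so checking them means tracking nesting depth, which a true regular expression cannot do for arbitrary depth. The sketch below is plain Common Lisp and only checks that parentheses balance while honoring backslash escapes; it ignores quoted strings, where parentheses are literal, and the function name is hypothetical.

```lisp
(defun balanced-comments-p (string)
  "True if every ( in STRING is closed by a later ) and no ) appears
without an open (. A backslash escapes the character that follows it."
  (let ((depth 0)
        (escaped nil))
    (loop for ch across string
          do (cond (escaped (setf escaped nil))          ; skip escaped char
                   ((char= ch #\\) (setf escaped t))     ; start of an escape
                   ((char= ch #\() (incf depth))         ; comment opens
                   ((char= ch #\))                       ; comment closes
                    (when (minusp (decf depth))
                      (return-from balanced-comments-p nil)))))
    ;; Balanced only if all comments are closed and no escape is dangling.
    (and (zerop depth) (not escaped))))
```

The depth counter plays the role of the stack; a finite-state matcher has no place to keep it once the nesting exceeds whatever depth was written into the pattern.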
From: Vassil Nikolov
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <snwve6949bs.fsf@luna.vassil.nikolov.names>
Robert Uhl <·········@NOSPAMgmail.com> writes:
> Email addresses can't really be validated with a regular
> expression--they require a push-down automaton, not a finite state
> automaton.  You can find some regular expressions on the web, but
> they're necessarily incomplete and will reject valid email address.

  Well, in the modern day a regex is more than an FSM-equivalent
  regular expression, but sometimes it travels long and winding
  roads...

> The best practical way to check that an email address is correct is to
> send email to it; if it's not correct, it'll bounce.

  Unless one has a future e-mail address at hand (domain registered,
  mail server not running yet), though.  (Or one happens to hit
  maintenance downtime, or ...)

  ---Vassil.

-- 
Bound variables, free programmers.
From: Robert Uhl
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <m3odc0rrw0.fsf@latakia.dyndns.org>
Vassil Nikolov <···············@pobox.com> writes:
>
>> Email addresses can't really be validated with a regular
>> expression--they require a push-down automaton, not a finite state
>> automaton.  You can find some regular expressions on the web, but
>> they're necessarily incomplete and will reject valid email address.
>
>   Well, in the modern day a regex is more than an FSM-equivalent
>   regular expression, but sometimes it travels long and winding
>   roads...

If it's not equivalent to a finite state machine, it's not a regular
expression.  That's the _definition_ of a regular expression!

I'll grant your point, that in practical use 'regular expression' has
come to mean 'any of a class of things resembling regular expressions.'

Still, I don't think even Perl's regexps can match the full range of
valid email addresses.  Maybe they can, of course.
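
As a small illustration of the gap between the two terms (it says
nothing about email addresses themselves): backreferences let a regex
engine match languages that are not regular. The strings consisting of
n copies of "a", then "b", then the same n copies of "a" form such a
language, yet a backreference matches them. The sketch below uses
Python's re module as a stand-in for Perl's engine; the names MIRRORED
and is_mirrored are invented for the example.

```python
import re

# (a+) captures a run of one or more "a"; \1 demands the same run again after "b".
MIRRORED = re.compile(r"(a+)b\1")


def is_mirrored(s: str) -> bool:
    """Return True if s is n copies of "a", then "b", then n copies of "a" (n >= 1)."""
    return MIRRORED.fullmatch(s) is not None
```

A finite state machine would have to remember an unbounded n to
recognise this language, so the pattern above is a regex in the loose
sense but not a regular expression in the strict one.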

>> The best practical way to check that an email address is correct is to
>> send email to it; if it's not correct, it'll bounce.
>
>   Unless one has a future e-mail address at hand (domain registered,
>   mail server not running yet), though.  (Or one happens to hit
>   maintenance downtime, or ...)

If it's valid in the future but not now, it's not valid yet, is it?
There are already protocols for how long to wait to deliver an email
upon a failure; just as a human being can expect a hard,
not-trying-anymore bounce to be an honest-to-goodness invalid address,
so too can a machine.

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
Some people are born blind, others are born crippled, and some are born
Americans.  One should not be held responsible for what is essentially an
accident of birth.                                        --Harald Horgen
From: Vassil Nikolov
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <snw8x342ccb.fsf@luna.vassil.nikolov.name>
Robert Uhl <·········@NOSPAMgmail.com> writes:

> Vassil Nikolov <···············@pobox.com> writes:
>> ...
>>   Well, in the modern day a regex is more than an FSM-equivalent
                               ^^^^^
>>   regular expression, but sometimes it travels long and winding
     ^^^^^^^^^^^^^^^^^^
>>   roads...
>
> If it's not equivalent to a finite state machine, it's not a regular
> expression.  That's the _definition_ of a regular expression!
>
> I'll grant your point, that in practical use 'regular expression' has
> come to mean 'any of a class of things resembling regular expressions.'

  My point is terminological: that the optimal resolution seems to be
  to treat "regex(p)" as a term different from "regular expression".

> ...
>>> The best practical way to check that an email address is correct is to
>>> send email to it; if it's not correct, it'll bounce.
>>
>>   Unless one has a future e-mail address at hand (domain registered,
>>   mail server not running yet), though.  (Or one happens to hit
>>   maintenance downtime, or ...)
>
> If it's valid in the future but not now, it's not valid yet is it?

  It may very well be appropriate to enter it (say, in an address
  book) now, without waiting for it(s maildrop) to become available,
  at which time of entry the (lexical) validation is done.
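
  A purely lexical check of this kind needs no network access, so it
  can run at entry time. The following is a minimal Python sketch that
  errs on the side of tolerance, including accepting '+' in the local
  part as urged earlier in the thread. Everything in it is an
  assumption made for illustration: the name looks_like_email, the rule
  that the domain is two or more dot-separated labels of letters,
  digits and hyphens, and the decision to reject whitespace in the
  local part. It deliberately rejects some valid forms (quoted local
  parts containing spaces, IP-literal domains) and accepts many
  addresses that will never deliver.

```python
import re

# One domain label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# Assumed rule: at least two labels separated by dots.
_DOMAIN = re.compile(rf"{_LABEL}(?:\.{_LABEL})+")


def looks_like_email(address: str) -> bool:
    """Tolerant lexical check; does not contact any mail server."""
    # Split on the LAST "@" so that an "@" inside the local part is tolerated.
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return False  # no "@" at all, or nothing before it
    if any(ch.isspace() for ch in local):
        return False  # assumed restriction: no whitespace in the local part
    return _DOMAIN.fullmatch(domain) is not None
```

  For example, looks_like_email("user+tag@example.com") is accepted,
  while "user@localhost" is rejected by the two-label assumption.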

  ---Vassil.


-- 
Bound variables, free programmers.
From: Paul Wallich
Subject: Re: Anybody willing to share their cl-ppcre regex for validating email  address
Date: 
Message-ID: <flp0cv$aig$1@reader2.panix.com>
Vassil Nikolov wrote:
> Robert Uhl <·········@NOSPAMgmail.com> writes:
> 
>> Vassil Nikolov <···············@pobox.com> writes:
>>> ...
>>>   Well, in the modern day a regex is more than an FSM-equivalent
>                                ^^^^^
>>>   regular expression, but sometimes it travels long and winding
>      ^^^^^^^^^^^^^^^^^^
>>>   roads...
>> If it's not equivalent to a finite state machine, it's not a regular
>> expression.  That's the _definition_ of a regular expression!
>>
>> I'll grant your point, that in practical use 'regular expression' has
>> come to mean 'any of a class of things resembling regular expressions.'
> 
>   My point is terminological: that the optimal resolution seems to be
>   to treat "regex(p)" as a term different from "regular expression".
> 
>> ...
>>>> The best practical way to check that an email address is correct is to
>>>> send email to it; if it's not correct, it'll bounce.
>>>   Unless one has a future e-mail address at hand (domain registered,
>>>   mail server not running yet), though.  (Or one happens to hit
>>>   maintenance downtime, or ...)
>> If it's valid in the future but not now, it's not valid yet is it?
> 
>   It may very well be appropriate to enter it (say, in an address
>   book) now, without waiting for it(s maildrop) to become available,
>   at which time of entry the (lexical) validation is done.

And of course, in general sending test email to an address may cause 
side effects (challenge-response sequences, sender blacklisting etc) 
that the validator is not prepared to deal with.

paul
From: Slobodan Blazeski
Subject: Re: Anybody willing to share their cl-ppcre regex for validating 	email address
Date: 
Message-ID: <332fcfd2-8570-446c-bb27-8641b299119d@e23g2000prf.googlegroups.com>
On Jan 4, 7:01 pm, Robert Uhl <·········@NOSPAMgmail.com> wrote:
> Slobodan Blazeski <·················@gmail.com> writes:

> The best practical way to check that an email address is correct is to
> send email to it; if it's not correct, it'll bounce.
I know, but sometimes that's not an option.
>
> Whatever you do, don't reject '+' in the local portion of the email
> address, or gnomes will eat your children.
Underpants gnomes :)
cheers
Slobodan