From: David Bakhash
Subject: regexps...
Date: 
Message-ID: <cxjiuai2lbo.fsf@acs5.bu.edu>
Allegro CL has a facility to do regexp matching.  However, their regexps lack
some of the prettifications that are found in other implementations of regexp
matchers, such as Perl.  Things like "\\(this\\|that\\)" in Allegro are simply
"(this|that)" in Perl.  I hate having to quote `|' and `(' stuff b/c they come
up so much more than their literals do.  Also, they break the simple rule
that when you backslashify a character, that gets you the literal.
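For the grouping and alternation operators, the quoting difference is mechanical. Below is a minimal sketch in C (the language of the library code discussed later in this thread). It assumes a hypothetical function, perl_to_emacs, which is not part of Allegro CL or any library. A converter only has to flip which spelling of ( ) and | is the operator and which is the literal. The sketch deliberately ignores character classes, so a '(' inside [...] would be mis-converted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical converter: flip the sense of quoting for ( ) and |.
     Perl-style   (   )   |   \(  \)  \|
     Emacs-style  \(  \)  \|  (   )   |
   Every other character and every other backslash escape is copied
   unchanged. Bracket expressions are NOT special-cased. `out` must hold
   at least 2*strlen(in)+1 bytes. Returns out. */
char *perl_to_emacs(const char *in, char *out)
{
    char *o = out;
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        char c = *in++;
        if (c == '\\' && *in != '\0') {
            char next = *in++;
            if (next == '(' || next == ')' || next == '|') {
                *o++ = next;            /* Perl \( is a literal ( */
            } else {
                *o++ = '\\';            /* other escapes pass through */
                *o++ = next;
            }
        } else if (c == '(' || c == ')' || c == '|') {
            *o++ = '\\';                /* bare ( is a Perl operator */
            *o++ = c;
        } else {
            *o++ = c;
        }
    }
    *o = '\0';
    return out;
}
```

With this, the Perl-style (this|that) becomes \(this\|that\), which is what the Lisp string "\\(this\\|that\\)" denotes.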

Worse even is that the (very useful) `?' (e.g. in Perl) doesn't exist, so
something like "(hello)?" would end up looking like "\\(\\(hello\\)\\|\\)" (I
think).  What makes this all even more difficult is that if there were a
converter to change nice-looking Perl regexps into Lisp ones, then you'd
probably have to re-parenthesize the regexp, and that would change the
numbering of the groups (the \N back-references) as a result.
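Rewriting X? as \(X\|\) opens one extra group in front of X. Every back-reference to a group at or after that position therefore has to be incremented. The sketch below makes this concrete. It uses two hypothetical helpers, count_emacs_groups and shift_backrefs, and assumes the single-digit \1..\9 references of Emacs-style syntax.

```c
#include <assert.h>
#include <string.h>

/* Count the groups in an Emacs-style pattern: each "\(" opens one. */
int count_emacs_groups(const char *re)
{
    int n = 0;
    assert(re != NULL);
    for (; *re != '\0'; re++) {
        if (re[0] == '\\' && re[1] != '\0') {
            if (re[1] == '(')
                n++;
            re++;                       /* skip the escaped character */
        }
    }
    return n;
}

/* Add `delta` to every back-reference \1..\9 numbered >= `first`, e.g.
   after inserting `delta` new groups in front of group `first`. `out`
   needs strlen(in)+1 bytes (the output is never longer than the input).
   Returns 0, or -1 if a reference would go past \9. */
int shift_backrefs(const char *in, int first, int delta, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] >= '1' && in[1] <= '9') {
            int k = in[1] - '0';
            if (k >= first)
                k += delta;
            if (k > 9)
                return -1;
            *out++ = '\\';
            *out++ = (char)('0' + k);
            in += 2;
        } else if (in[0] == '\\' && in[1] != '\0') {
            *out++ = *in++;             /* copy other escapes whole */
            *out++ = *in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return 0;
}
```

For example, Perl's (hello)? \1 becomes \(\(hello\)\|\) \2. The wrapper group now has number 1, so the old \1 must become \2.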

I would like to know if anyone has written a converter, reader macro, or
anything to facilitate writing these regexps with Allegro CL.

If I had a converter, I imagine the best way to use it would be to make a
reader macro that would be used as follows:

(defvar *phone-regexp*
  #R"(\\([0-9][0-9][0-9]\\) ?)?[0-9][0-9][0-9]-?[0-9][0-9][0-9][0-9]")

Of course, even better would be if someone implemented the {n,m} thing, as
well as some more little helpers like \d (which is equivalent to [0-9]).  Such
a macro would be really useful, and would make this stuff a whole lot easier.
I don't think it's such a simple task, though, if it's done right.
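The \d and exact-count cases are the easy part, because they are purely textual expansions. The sketch below uses a hypothetical function, expand_shorthands. It assumes the repeated atom is a single character, a backslash escape, or a [...] class, but not a group. It rejects {n,m}.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Append n bytes to out, always leaving room for the final NUL. */
static int put(char *out, size_t *len, size_t outsz, const char *s, size_t n)
{
    if (*len + n >= outsz)
        return -1;
    memcpy(out + *len, s, n);
    *len += n;
    return 0;
}

/* Expand \d -> [0-9] and ATOM{n} -> n copies of ATOM. Returns 0, or -1
   on bad/unsupported syntax or if the result does not fit in outsz. */
int expand_shorthands(const char *in, char *out, size_t outsz)
{
    size_t len = 0, atom = 0;           /* out[atom..len) is the last atom */
    assert(in != NULL && out != NULL && outsz > 0);
    while (*in != '\0') {
        if (*in == '{') {
            char *end;
            long n = strtol(in + 1, &end, 10);
            size_t alen = len - atom;
            if (end == in + 1 || *end != '}' || n < 0 || alen == 0)
                return -1;              /* also rejects {n,m} */
            if (n == 0)
                len = atom;             /* ATOM{0} matches nothing */
            for (long i = 1; i < n; i++)    /* ATOM is already there once */
                if (put(out, &len, outsz, out + atom, alen) != 0)
                    return -1;
            atom = len;                 /* a count cannot itself be repeated */
            in = end + 1;
        } else if (*in == '\\') {
            if (in[1] == '\0')
                return -1;
            atom = len;
            if (put(out, &len, outsz, in[1] == 'd' ? "[0-9]" : in,
                    in[1] == 'd' ? 5 : 2) != 0)
                return -1;
            in += 2;
        } else if (*in == '[') {
            const char *p = in + 1;
            if (*p == '^') p++;
            if (*p == ']') p++;         /* a leading ']' is a literal member */
            p = strchr(p, ']');
            if (p == NULL)
                return -1;
            atom = len;
            if (put(out, &len, outsz, in, (size_t)(p - in) + 1) != 0)
                return -1;
            in = p + 1;
        } else {
            atom = len;
            if (put(out, &len, outsz, in, 1) != 0)
                return -1;
            in++;
        }
    }
    out[len] = '\0';
    return 0;
}
```

For example, \d{3}-?\d{4} expands to [0-9][0-9][0-9]-?[0-9][0-9][0-9][0-9], which is the tail of the phone regexp above. {n,m} is harder. Without `?`, each of the m-n optional copies would need its own \(X\|\) group, and that brings back the renumbering problem.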

thanks,
dave

From: Eric Marsden
Subject: Re: regexps...
Date: 
Message-ID: <wzilnfeges5.fsf@mail.dotcom.fr>
>>>>> "db" == David Bakhash <·····@bu.edu> writes:

  db> If I had a converter, I imagine the best way to use it would be
  db> to make a reader macro that would be used as follows:
  db> 
  db> (defvar *phone-regexp*
  db>   #R"(\\([0-9][0-9][0-9]\\)?)?[0-9][0-9][0-9]-?[0-9][0-9][0-9][0-9]")

Are you familiar with Structural Regular Expressions, proposed by Olin
Shivers? Your example would be written (modulo my errors)

  (let ((digit (/ "0" "9")))
    (seq (? (= 3 digit)) (= 3 digit) (? "-") (= 4 digit)))


   <URL:http://www.ai.mit.edu/~shivers/sre.txt>
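The point of the SRE approach is that a regexp is a data structure. Literal text never has to be quoted by hand; only the renderer knows which characters are special. The C sketch below illustrates that idea; it is neither the SRE library nor its syntax, and the Node type and rx_* names are hypothetical. It builds the phone pattern as a tree, renders it to a POSIX extended regexp, and checks the result with the standard regcomp/regexec.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

typedef enum { LIT, RANGE, SEQ, OPT, REP } Kind;

typedef struct Node {
    Kind kind;
    const char *text;                /* LIT: literal text, quoted on render */
    char lo, hi;                     /* RANGE: inclusive range (plain chars) */
    int count;                       /* REP: exact repetition count */
    const struct Node *kids[4];      /* SEQ: parts; OPT/REP: kids[0] only */
    int nkids;
} Node;

/* Append the POSIX ERE text for n to buf at *len; -1 on overflow. */
static int render(const Node *n, char *buf, size_t size, size_t *len)
{
    int w = 0;
    switch (n->kind) {
    case LIT:
        for (const char *p = n->text; *p != '\0'; p++) {
            if (strchr(".[()*+?{|^$\\", *p) != NULL) {   /* ERE specials */
                if (*len + 1 >= size) return -1;
                buf[(*len)++] = '\\';
            }
            if (*len + 1 >= size) return -1;
            buf[(*len)++] = *p;
        }
        buf[*len] = '\0';
        return 0;
    case RANGE:
        w = snprintf(buf + *len, size - *len, "[%c-%c]", n->lo, n->hi);
        break;
    case SEQ:
        for (int i = 0; i < n->nkids; i++)
            if (render(n->kids[i], buf, size, len) != 0) return -1;
        return 0;
    case OPT:
    case REP:
        if (*len + 1 >= size) return -1;
        buf[(*len)++] = '(';
        if (render(n->kids[0], buf, size, len) != 0) return -1;
        w = (n->kind == OPT)
            ? snprintf(buf + *len, size - *len, ")?")
            : snprintf(buf + *len, size - *len, "){%d}", n->count);
        break;
    }
    if (w < 0 || (size_t)w >= size - *len) return -1;
    *len += (size_t)w;
    return 0;
}

int rx_render(const Node *n, char *buf, size_t size)
{
    size_t len = 0;
    assert(size > 0);
    buf[0] = '\0';
    return render(n, buf, size, &len);
}

/* 1 if all of s matches n, 0 if not, -1 on error. */
int rx_matches_whole(const Node *n, const char *s)
{
    char body[256], pat[260];
    regex_t re;
    int rc;
    if (rx_render(n, body, sizeof body) != 0) return -1;
    snprintf(pat, sizeof pat, "^(%s)$", body);
    if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The phone pattern: (? "(" (= 3 digit) ")" (? " ")) (= 3 digit) (? "-") (= 4 digit) */
static const Node digit  = { .kind = RANGE, .lo = '0', .hi = '9' };
static const Node d3     = { .kind = REP, .count = 3, .kids = { &digit }, .nkids = 1 };
static const Node d4     = { .kind = REP, .count = 4, .kids = { &digit }, .nkids = 1 };
static const Node lparen = { .kind = LIT, .text = "(" };
static const Node rparen = { .kind = LIT, .text = ")" };
static const Node space  = { .kind = LIT, .text = " " };
static const Node dash   = { .kind = LIT, .text = "-" };
static const Node opt_sp = { .kind = OPT, .kids = { &space }, .nkids = 1 };
static const Node opt_dh = { .kind = OPT, .kids = { &dash }, .nkids = 1 };
static const Node area   = { .kind = SEQ, .kids = { &lparen, &d3, &rparen, &opt_sp }, .nkids = 4 };
static const Node opt_ar = { .kind = OPT, .kids = { &area }, .nkids = 1 };
static const Node phone  = { .kind = SEQ, .kids = { &opt_ar, &d3, &opt_dh, &d4 }, .nkids = 4 };
```

Unlike the SRE version above, this tree keeps the literal parentheses and optional space around the area code from the original #R example. Those parentheses are written as plain "(" and ")" and are only escaped when rendered.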
  
-- 
~$ universe -G 6.67e-11 -e 1.602e-19 -h 6.626e-34 &
From: Marco Antoniotti
Subject: Re: regexps...
Date: 
Message-ID: <lwr9p5urjm.fsf@copernico.parades.rm.cnr.it>
Eric Marsden <········@mail.dotcom.fr> writes:

> >>>>> "db" == David Bakhash <·····@bu.edu> writes:
> 
>   db> If I had a converter, I imagine the best way to use it would be
>   db> to make a reader macro that would be used as follows:
>   db> 
>   db> (defvar *phone-regexp*
>   db>   #R"(\\([0-9][0-9][0-9]\\)?)?[0-9][0-9][0-9]-?[0-9][0-9][0-9][0-9]")
> 
> Are you familiar with Structural Regular Expressions, proposed by Olin
> Shivers? Your example would be written (modulo my errors)
> 
>   (let ((digit (/ "0" "9")))
>     (seq (? (= 3 digit)) (= 3 digit) (? "-") (= 4 digit)))
> 
> 
>    <URL:http://www.ai.mit.edu/~shivers/sre.txt>
>   

They need a port to Common Lisp. Plus some of the operators are not
Common Lisp friendly.

Any taker? :)

Cheers

-- 
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - 06 68 10 03 17, fax. +39 - 06 68 80 79 26
http://www.parades.rm.cnr.it/~marcoxa
From: Thomas A. Russ
Subject: Re: regexps...
Date: 
Message-ID: <ymi90belyim.fsf@sevak.isi.edu>
David Bakhash <·····@bu.edu> writes:

> 
> Allegro CL has a facility to do regexp matching.  However, their regexps lack
> some of the prettifications that are found in other implementations of regexp
> matchers, such as Perl.  Things like "\\(this\\|that\\)" in Allegro are simply
> "(this|that)" in Perl.  I hate having to quote `|' and `(' stuff b/c they come
> up so much more than their literals do.  Also, they break the simple rule
> that when you backslashify a character, that gets you the literal.

I would guess that this is a legacy of Emacs style regular expressions.
I imagine the desire was to have compatibility with Emacs rather than
Perl, possibly because more Lisp programmers use Emacs?

NB.  You could always use your own regexp package in Lisp.  I have
updated and modified the nregex package from the CMU archives.  (I know,
I should resubmit the changes...)  It seems to have much of what you
want in the way of things like \d, although it does use Emacs syntax.
However, since you get the source code, you could easily change the
parser it uses to flip the sense of quoting.  (Email me if you want the
code).


One nice thing about that package is that it nicely exploits Lisp
features.  What the package produces is a function to recognize a
particular regular expression.  You compile and save this function, and
can therefore very efficiently determine whether a given string
matches.  This is very convenient if you are mapping over a large set of
objects to find matches.
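The same compile-once, match-many pattern exists in C. The sketch below substitutes POSIX regcomp/regexec, which compile a pattern into a regex_t rather than into a Lisp function as nregex does. The name count_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count how many of the n strings contain a match for `pat` (POSIX ERE).
   The pattern is compiled once and the compiled form is reused for
   every string. Returns -1 if the pattern does not compile. */
long count_matches(const char *pat, const char *const *items, size_t n)
{
    regex_t re;
    long hits = 0;
    assert(pat != NULL && (items != NULL || n == 0));
    if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, items[i], 0, NULL, 0) == 0)
            hits++;
    regfree(&re);
    return hits;
}
```

Compiling is the expensive step. Hoisting it out of the loop is what makes mapping over a large set of objects cheap.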

-- 
Thomas A. Russ,  USC/Information Sciences Institute          ···@isi.edu    
From: Reini Urban
Subject: Re: regexps...
Date: 
Message-ID: <3726bf0c.75649728@judy>
I do it with Perl.
You could always link to or call the "simple" Spencer regexp library,
from GNU glibc for example, but then you only have the searcher.
CLISP, for example, does this.

I also favor Perl's regexps, especially the simple search/replace on
strings with s///, and the non-greedy quantifiers, which don't always try
for the longest match. And the syntax is quite short.
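Building s/// on top of a search-only interface is straightforward. The sketch below uses POSIX regcomp/regexec, not the Spencer library itself. Its hypothetical subst_all inserts the replacement literally, with no $1 expansion. POSIX matching is leftmost-longest and has no non-greedy quantifiers, so a.*b on "aXbYb c" consumes "aXbYb", where Perl's a.*?b would stop at "aXb".

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Roughly  $s =~ s/PAT/REP/g  using only a searcher (POSIX regexec).
   Writes the result to out (outsz bytes). Returns the number of
   substitutions, or -1 if PAT does not compile or out is too small. */
int subst_all(const char *pat, const char *rep, const char *s,
              char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t len = 0, rest, rlen = strlen(rep);
    int count = 0, eflags = 0;

    assert(out != NULL && outsz > 0);
    if (regcomp(&re, pat, REG_EXTENDED) != 0)
        return -1;
    while (regexec(&re, s, 1, &m, eflags) == 0) {
        size_t pre = (size_t)m.rm_so, end = (size_t)m.rm_eo;
        if (len + pre + rlen + 1 > outsz) {
            regfree(&re);
            return -1;
        }
        memcpy(out + len, s, pre);              /* text before the match */
        memcpy(out + len + pre, rep, rlen);     /* the replacement */
        len += pre + rlen;
        count++;
        if (end == pre) {                       /* empty match: */
            if (s[end] == '\0') {               /* at end of string, stop */
                s += end;
                break;
            }
            if (len + 2 > outsz) {
                regfree(&re);
                return -1;
            }
            out[len++] = s[end++];              /* else keep one char */
        }
        s += end;
        eflags = REG_NOTBOL;                    /* '^' must not match mid-string */
    }
    regfree(&re);
    rest = strlen(s);
    if (len + rest + 1 > outsz)
        return -1;
    memcpy(out + len, s, rest + 1);             /* tail, including the NUL */
    return count;
}
```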

I'm not that good with the ACL FFI for calling perl.dll or perl.so, so
I do it inside a C++ module which is called by my Lisp, but either way
it is quite simple.
"Advanced Perl Programming" by Sriram Srinivasan, chapter "Embedding
Perl", and perlembed.pod and perlcall.pod are pretty good at
describing this.

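In C:
static PerlInterpreter *my_perl;
char *embedding[] = { "", "-e", "0", NULL };  /* an empty script */
my_perl = perl_alloc();
perl_construct(my_perl);
perl_parse(my_perl, NULL, 3, embedding, NULL);
perl_run(my_perl);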

and then
int perl_eval_va(char *str,            // input string
                 [char *type, *arg,]*  // type and buffer pairs for the
                                       // return values, like "i", &i
                 NULL)
which returns the number of result values: -1 on failure,
or 1 on success with one return value.
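The [type, buffer]* ... NULL convention is an ordinary C variadic argument list. The sketch below does not call Perl at all. unpack_results is a hypothetical stand-in that only shows how such a list is walked: it stores already-available result strings into the caller's buffers. Because NULL may expand to a plain 0, callers should pass the terminator as (char *)NULL and cast the buffers to void *.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: walk (type, buffer) pairs up to a (char *)NULL terminator,
   storing vals[0], vals[1], ... into the buffers. Type "i" converts to an
   int (buffer is int *); "s" stores the string pointer (const char **).
   Stops after nvals values. Returns the number stored, -1 on a bad type. */
int unpack_results(const char *const *vals, int nvals, ...)
{
    va_list ap;
    char *type;
    int stored = 0;

    assert(nvals >= 0);
    va_start(ap, nvals);
    while (stored < nvals && (type = va_arg(ap, char *)) != NULL) {
        void *buf = va_arg(ap, void *);
        if (strcmp(type, "i") == 0) {
            *(int *)buf = atoi(vals[stored]);
        } else if (strcmp(type, "s") == 0) {
            *(const char **)buf = vals[stored];
        } else {
            va_end(ap);
            return -1;                  /* unknown type code */
        }
        stored++;
    }
    va_end(ap);
    return stored;
}
```

A call then looks like unpack_results(vals, 2, "i", (void *)&i, "s", (void *)&s, (char *)NULL), mirroring the "i", &i pairs described above.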

"s///" may return the number of changes, or
it may be used destructively like this:
	"$s =~ s/foo/bar/"
which needs setting and getting the string value $s via
  set_str(char *var, char *value) and
  int get_str(char *var, char **value)
You can grab any number of strings by using side effects inside the
s/// operator.

The obvious problem, though, is preallocating the return buffer.

Then, at last:
perl_destruct(my_perl);
perl_free(my_perl);
for each instance of the interpreter, or more simply perl_close().

The probability that a shared Perl library is already on your machine
is quite high: not so on Win32, but very likely on other machines.
For Win32 you can grab any precompiled perl.dll.

---
Reini Urban
http://xarch.tu-graz.ac.at/autocad/news/faq/autolisp.html
From: Joe Nall
Subject: Re: regexps...
Date: 
Message-ID: <3731FFB1.A1FA16D6@nall.com>
Martin Cracauer wrote:
> 
> David Bakhash <·····@bu.edu> writes:
> Exactly. The code by Henry Spencer (which was used for perl, if I'm not
> mistaken) shows a lot of thought that has been put into it. Language
> issues aside, this is great code.
The current Perl code is only remotely related to Henry Spencer's
original code. It has a lot of additional functionality (e.g. non-greedy
regexps). It is also deeply intertwined with the guts of the Perl
interpreter.
There is a Perl compatible regexp package called pcre by Philip Hazel
<····@cam.ac.uk> that would be a better starting spot.
ftp://ftp.cus.cam.ac.uk/pub/software/programs/pcre/

Joe Nall
From: David Bakhash
Subject: Re: regexps...
Date: 
Message-ID: <cxj7lqk2vcd.fsf@acs5.bu.edu>
Joe Nall <···@nall.com> writes:

> There is a Perl compatible regexp package called pcre by Philip Hazel
> <····@cam.ac.uk> that would be a better starting spot.
> ftp://ftp.cus.cam.ac.uk/pub/software/programs/pcre/

some questions:

1) has this package been tested against others in terms of speed?  How does it
   compare?
2) has anyone written Lisp interfaces to the functionality in there (e.g. like
   a foreign function interface, such as Allegro's FFI)?

dave