From: Barry Margolin
Subject: Re: regular expressions
Date: 
Message-ID: <63otod$f6a@pasilla.bbnplanet.com>
In article <·············@WINTERMUTE.eagle>,
SDS  <···········@cctrading.com> wrote:
>What is the reason for the absence of regular expression matching from
>LISP (it is not mentioned in either CLtL2 or CLHS)? Even C has it :-) (and
>Scheme too!).

I don't think C has it; I think you're confusing the C language standard
with the POSIX OS standard.  AFAIK, the only languages that have it as
standard features are some string-processing languages like SNOBOL and
scripting languages like Perl, TCL, and AWK.

As Kent said, at the time CL was being developed, regular expressions
weren't heavily used, except in specialized environments like text
editors.  The built-in features of CL are intended to be very general; you
don't see many features applicable just to a small class of applications.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Cambridge, MA
Support the anti-spam movement; see <http://www.cauce.org/>
Please don't send technical questions directly to me, post them to newsgroups.

From: Espen Vestre
Subject: Re: regular expressions
Date: 
Message-ID: <w6g1pbhjpq.fsf@gromit.nextel.no>
Barry Margolin <······@bbnplanet.com> writes:

> As Kent said, at the time CL was being developed, regular expressions
> weren't heavily used, except in specialized environments like text
> editors.  The built-in features of CL are intended to be very general; you
> don't see many features applicable just to a small class of applications.

Yes, but it has sometimes struck me that the string datatype should
have gotten some more care; the number of string-handling functions
in the standard is relatively small.  For instance, there's no
structure-sharing substring function (although it's very easy to
write one using MAKE-ARRAY with :displaced-to).
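
A minimal sketch of such a function, assuming the standard MAKE-ARRAY
keywords :displaced-to and :displaced-index-offset.  The name
SHARED-SUBSTRING and the sample variables are hypothetical, not anything
defined by the standard:

```lisp
;; Hypothetical helper (not in the standard): returns a string that shares
;; storage with STRING instead of copying the characters.
(defun shared-substring (string start &optional (end (length string)))
  (make-array (- end start)
              :element-type (array-element-type string)
              :displaced-to string
              :displaced-index-offset start))

;; COPY-SEQ gives a fresh string, so nothing here touches a literal.
(defparameter *text* (copy-seq "regular expressions"))

;; Characters 8 (inclusive) to 18 (exclusive) of *TEXT*, with no copying.
(defparameter *view* (shared-substring *text* 8 18))
```

Here *VIEW* reads as "expression", but it is a displaced array rather
than a simple string, so it still depends on *TEXT*.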

--

  Espen Vestre

From: Kent M Pitman
Subject: Re: regular expressions
Date: 
Message-ID: <sfwd8kf4f0n.fsf@world.std.com>
Espen Vestre <··@nextel.no> writes:

> Barry Margolin <······@bbnplanet.com> writes:
>
> > The built-in features of CL are intended to be very general; you
> > don't see many features applicable just to a small class of applications.
>
> Yes, but it did strike me sometimes that the string datatype should
> have gotten some more care, the number of string-handling functions
> in the standard is relatively minimal.

Actually, it got an enormous amount of care but you may be looking in the
wrong place.  The issue is that there is almost no function that is applicable
to strings which is not also applicable to sequences if written correctly,
so since strings are sequences, you want to be using the sequence functions.
All of those are probably most commonly used for strings, but there are a
wide variety of other applications that can use them.  And in some
implementations, using good type declarations causes more efficient functions 
to get called--but the functions have the same semantics so we didn't give
them specialized names.
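
As a small illustration (the variable name *SAMPLE* is made up for the
example), these are all standard sequence functions applied to a string,
and the last two forms show the same functions working on a list and a
vector:

```lisp
(defparameter *sample* "regular expressions")

(position #\Space *sample*)   ; index of the first space
(search "exp" *sample*)       ; index where the substring "exp" begins
(count #\e *sample*)          ; how many times #\e occurs
(remove #\s *sample*)         ; a fresh string with every #\s dropped
(subseq *sample* 0 7)         ; a fresh copy of the first seven characters

;; The same functions on other kinds of sequence.
(position 3 '(1 2 3))         ; list
(count 0 #(0 1 0 2))          ; vector
```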

Honestly, while regexps are missing, I haven't heard very many people
say CL lacks for string functions (e.g., compared to other things lacking).
It's already criticized for being a huge language and if it were to grow,
I doubt this would be most people's choice of the part to focus on.
Lisp has been used for many, many years to write high-quality text editors 
and the features needed to do that are pretty much well-understood and
were heavily scrutinized to be present even in the first CLTL.

> For instance, there's no structure-sharing substring-function 
> (although it's very easy to write one using a make-array with
> :displaced-to).

Your observation that there is no nsubstring function is correct.  There
is only SUBSEQ and it has no companion NSUBSEQ.  I'm not convinced this
is bad, though.  I personally have only ever gotten injured when I've used
string sharing--it's very hard to debug.  And also, it varies by 
implementation, but it's easy to confuse yourself and think you are being
efficient when you are not.  The array header on a string may be large 
enough that a "shared" substring takes more space than a "non-shared" copy
unless you have a fair amount of data.  Don't get me wrong--there are
CLEARLY cases where a sharable substring is the right thing computationally,
but the decision not to add another "attractive nuisance" to the language
is not obviously a bad one.  I personally am glad the way you get there is
obscure enough that most people don't mistakenly find it and use it 
thinking they are being clever when they're not.  I bet there would be lots
more REALLY obscure program bugs running around if we had made this easy.
By raising the "activation energy" needed to discover how to do this, we
filter out people who are just stumbling around and tend to get people who
know what they are doing ... at least a bit.  And that's perhaps good.
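
To make the debugging hazard concrete, here is a sketch (all variable
names are invented for the example) that compares a displaced substring
with a SUBSEQ copy after the parent string is modified:

```lisp
;; A fresh, modifiable parent string.
(defparameter *parent* (copy-seq "hello world"))

;; A "shared" substring: displaced onto characters 6..10 of *PARENT*.
(defparameter *shared*
  (make-array 5
              :element-type (array-element-type *parent*)
              :displaced-to *parent*
              :displaced-index-offset 6))

;; An ordinary copy of the same characters.
(defparameter *copy* (subseq *parent* 6 11))

;; Modify the parent some time later, perhaps far away in the program.
(setf (char *parent* 6) #\W)
```

After the SETF, *SHARED* reads "World" while *COPY* still reads "world".
Nothing at the point of the SETF hints that *SHARED* has changed too.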

So at the meta level especially, I take issue with your claim that the
absence of some feature you can think of (and that could be simply
invoked) automatically implies that "care" was not applied.

In the real world, there are guns and there are guns with "safety locks".
It'd be easier to provide just guns and one could observe this as an
"obvious omission" that thwarts efficiency.  But that neglects what
we might sum up as "other adverse effects".  I hope the parallel to
NSUBSTRING here is obvious...

From: Barry Margolin
Subject: Re: regular expressions
Date: 
Message-ID: <63q9im$a34@tools.bbnplanet.com>
In article <··············@gromit.nextel.no>,
Espen Vestre  <··@nextel.no> wrote:
>Yes, but it did strike me sometimes that the string datatype should
>have gotten some more care, the number of string-handling functions
>in the standard is relatively minimal.  For instance, there's no
>structure-sharing substring-function (although it's very easy to
>write one using a make-array with :displaced-to).

Rather than providing a shared substring, the language has almost all the
string-handling functions take :START and :END keywords.  Perhaps it's not
the most elegant way to do it, but it effectively provides the same
functionality.  It's also likely to be more
efficient (less consing and indirection), so it's more appropriate as a
primitive language feature.  If you want a structure-sharing substring,
it's trivial to implement it yourself, but it would be difficult to
replicate the entire string section of the language with versions that took
explicit ranges if the only primitive provided were a structure-sharing
substring.
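
For example, each of these standard functions works on a range of the
string without any substring being built first (the variable name *LINE*
is just for illustration):

```lisp
(defparameter *line* "regular expressions")

;; Compare only characters 8..17 of *LINE* against "expression".
(string= "expression" *line* :start2 8 :end2 18)

;; Search within a range; the result is an index into the whole string.
(position #\s *line* :start 8 :end 18)

;; Count occurrences from index 8 to the end.
(count #\e *line* :start 8)

;; Upcase only the given range; the result is a fresh full-length string.
(string-upcase *line* :start 8 :end 18)
```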
