why emacs lisp's regex has 2-steps escapes?

From: ······@gmail.com
Subject: why emacs lisp's regex has 2-steps escapes?
Date: Wed, 09 Jul 2008 10:29:03 +0000
Message-ID: <0add1712-a31e-4499-9523-955c49126d8f@x41g2000hsb.googlegroups.com>

emacs regex has an odd peculiarity in that it needs a lot of backslashes.
More specifically, a string first needs to be properly escaped, then
the result is passed to the regex engine.

For example, suppose you have this text “Sin[x] + Sin[y]” and you need
to capture the x or y.

In emacs i need to write the string
“\\(\\[[a-z]\\]\\)”
to get the actual regex
“\(\[[a-z]\]\)”.
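The two steps can be observed directly. This is a small sketch (not part of the original post; the constant name `my-sin-regexp` is made up for illustration) that can be evaluated in *scratch* in GNU Emacs:

```elisp
;; Step 1: the Lisp reader processes the string literal, turning each
;; "\\" in the source into a single backslash in the string object.
(defconst my-sin-regexp "\\(\\[[a-z]\\]\\)"
  "Regexp capturing a bracketed lowercase letter such as [x].")

;; The string the regex engine receives is \(\[[a-z]\]\):
(length my-sin-regexp)                  ; => 13

;; Step 2: the regex engine reads \( \) as a group and \[ \] as
;; literal brackets.
(let ((text "Sin[x] + Sin[y]"))
  (when (string-match my-sin-regexp text)
    (list (match-beginning 0) (match-string 1 text))))
;; => (3 "[x]")
```

The literal between the quotes is 17 characters long: four of its eight backslashes exist only for the reader.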

Here's a somewhat typical but long regex for matching an HTML image tag:

(search-forward-regexp "<img +src=\"\\([^\"]+\\)\" +alt=\"\\([^\"]+\\)?\" +width=\"\\([0-9]+\\)\" +height=\"\\([0-9]+\\)\" ?>" nil t)

The toothpick syndrome gets crazy, making the already difficult regex
syntax impossible to read and hard to code.
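For comparison, here is a sketch of the same image-tag pattern written with GNU Emacs's `rx` macro, which builds the regexp string, backslashes included, from Lisp forms. Nobody in this 2008 thread mentions `rx`; the name `my-img-regexp` and the sample tag are assumptions for illustration:

```elisp
;; Same structure as the hand-escaped string: four groups for
;; src, alt (optional), width and height.
(defconst my-img-regexp
  (rx "<img" (one-or-more " ")
      "src=\"" (group (one-or-more (not (any "\"")))) "\""
      (one-or-more " ")
      "alt=\"" (zero-or-one (group (one-or-more (not (any "\""))))) "\""
      (one-or-more " ")
      "width=\"" (group (one-or-more (any "0-9"))) "\""
      (one-or-more " ")
      "height=\"" (group (one-or-more (any "0-9"))) "\""
      (zero-or-one " ") ">")
  "Regexp for a simple <img> tag; groups are src, alt, width, height.")

;; Usage: search a buffer exactly as with the hand-escaped string.
(with-temp-buffer
  (insert "<img src=\"cat.png\" alt=\"a cat\" width=\"40\" height=\"30\">")
  (goto-char (point-min))
  (when (search-forward-regexp my-img-regexp nil t)
    (list (match-string 1) (match-string 2)
          (match-string 3) (match-string 4))))
;; => ("cat.png" "a cat" "40" "30")
```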

My question is, why does elisp's regex have this 2-step process? Is this
a design decision, or did it just happen that way historically?

Second question: can't elisp provide something like a “regex-string”
wrapper function that automatically takes care of the quoting? I can't
see how this might be difficult.

Thanks.

  Xah
∑ http://xahlee.org/

☄

Re: why emacs lisp's regex has 2-steps escapes? Barry Margolin
- Re: why emacs lisp's regex has 2-steps escapes? Johan Bockgård
  - Re: why emacs lisp's regex has 2-steps escapes? Vassil Nikolov
- Re: why emacs lisp's regex has 2-steps escapes? ······@gmail.com
Re: why emacs lisp's regex has 2-steps escapes? Pascal J. Bourguignon
Re: why emacs lisp's regex has 2-steps escapes? Pascal J. Bourguignon
Re: why emacs lisp's regex has 2-steps escapes? Joseph Brenner
- Re: why emacs lisp's regex has 2-steps escapes? Vassil Nikolov
Re: why emacs lisp's regex has 2-steps escapes? ······@gmail.com
- Re: why emacs lisp's regex has 2-steps escapes? Joost Diepenmaat
- Re: why emacs lisp's regex has 2-steps escapes? ·······@poczta.onet.pl
  - Re: why emacs lisp's regex has 2-steps escapes? David Kastrup
    - Re: why emacs lisp's regex has 2-steps escapes? Matthias Buelow

From: Barry Margolin
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Wed, 09 Jul 2008 13:13:03 +0000
Message-ID: <barmar-420682.09130309072008@newsgroups.comcast.net>

In article 
<····································@x41g2000hsb.googlegroups.com>,
 ·······@gmail.com" <······@gmail.com> wrote:

> My question is, why does elisp's regex have this 2-step process? Is this
> a design decision, or did it just happen that way historically?

Just history.  They adopted string notation from Common Lisp, which uses 
backslash as the escape.  And they adopted the standard Unix regex 
notation, which also uses backslash as the escape.

> 
> Second question: can't elisp provide something like a “regex-string”
> wrapper function that automatically takes care of the quoting? I can't
> see how this might be difficult.

It can't be done using a function, because literal parsing is done at 
read time.  It would have to be done using a new string syntax.  Perhaps 
something like:

#/regex/

would make sense.
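Barry's read-time point can be checked in GNU Emacs with a short sketch (an illustration, not from the thread). In a string literal, a backslash before a character that is not a string escape is simply dropped by the reader, so no function called later can tell which characters were meant as regexp operators:

```elisp
;; "\(" and "\|" are not string escapes, so the reader drops the
;; backslash: both literals below read as the same string.
(equal "\(a\|b\)" "(a|b)")              ; => t

;; Any function only ever sees that already-read string, in which the
;; parentheses and the bar are ordinary characters matched literally.
(string-match "\(a\|b\)" "b")           ; => nil
(string-match "\(a\|b\)" "(a|b)")       ; => 0

;; The doubled form keeps one backslash per operator after reading.
(string-match "\\(a\\|b\\)" "b")        ; => 0
```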

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

From: Johan Bockgård
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Wed, 09 Jul 2008 17:13:24 +0000
Message-ID: <yoijvdzf2d6z.fsf@remote5.student.chalmers.se>

Barry Margolin <······@alum.mit.edu> writes:

> It can't be done using a function, because literal parsing is done at
> read time. It would have to be done using a new string syntax. Perhaps
> something like:
>
> #/regex/
>
> would make sense.

For the record,

  ``SXEmacs has Python-style raw strings. It greatly reduces "backslashitis"
    when writing those hairy regexps. :-)

    Normal regexp: "\\(?:^\\|[^\\]\\)\\(?:\\\\\\\\\\)*\\(\\\\[@A-Za-z]+\\)"
    Raw string regexp: #r"\(?:^\|[^\]\)\(?:\\\\\)*\(\\[@A-Za-z]+\)"

    XEmacs 21.5 now has raw strings.''

http://www.sxemacs.org/
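A sketch of what the quoted "hairy" regexp appears to do, based on my own reading of it rather than anything stated on the SXEmacs page: it finds a TeX-style control word such as \alpha, while skipping a backslash that is itself escaped by a preceding backslash. It uses the normal (doubled) form, since GNU Emacs has no #r syntax; `my-tex-cs-regexp` is a made-up name:

```elisp
;; The "normal" form quoted above.  After reading, the regexp is
;;   \(?:^\|[^\]\)\(?:\\\\\)*\(\\[@A-Za-z]+\)
(defconst my-tex-cs-regexp
  "\\(?:^\\|[^\\]\\)\\(?:\\\\\\\\\\)*\\(\\\\[@A-Za-z]+\\)")

(let ((s "x \\alpha"))                  ; text: x \alpha
  (when (string-match my-tex-cs-regexp s)
    (match-string 1 s)))                ; => "\\alpha"

;; Two backslashes form an escaped backslash, so no control word:
(string-match my-tex-cs-regexp "\\\\alpha")    ; => nil (text: \\alpha)

;; An escaped backslash followed by a real control word:
(let ((s "\\\\\\beta"))                 ; text: \\\beta
  (when (string-match my-tex-cs-regexp s)
    (match-string 1 s)))                ; => "\\beta"
```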

-- 
[sic]

From: Vassil Nikolov
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 03:43:04 +0000
Message-ID: <snzfxqil7zr.fsf@luna.vassil.nikolov.name>

[ refraining from cross-posting ]

On Wed, 09 Jul 2008 19:13:24 +0200, ············@dd.chalmers.se (Johan Bockgård) quoted:
| ...
|     XEmacs 21.5 now has raw strings.

  If this had to be done, I wonder if it would have been a good idea
  to also make things like the following work:

    #r'foo "bar" baz' (i.e. "foo \"bar\" baz")
    #r[foo 'bar ["baz"] quux' quuux] (i.e. "foo 'bar [\"baz\"] quux' quuux")

  In any case, these are like Common Lisp strings, but with a
  non-disappearing single-escape character (were they all worth it?):

    (coerce #r"\n" 'list)
    => (?\\ ?n)

    (coerce #r"\\" 'list)
    => (?\\ ?\\)

    (coerce #r"\"" 'list)
    => (?\\ ?\")

    (coerce #r"\\"" 'list)
    Syntax error: Unbalanced parentheses
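GNU Emacs has no #r"..." syntax, so the same character lists need ordinary literals with doubled backslashes. A sketch, using `string-to-list` in place of the Common Lisp style `coerce` above:

```elisp
;; Each backslash that should survive reading must be doubled, and a
;; double quote inside the string needs its own backslash.
(string-to-list "\\n")                  ; => (92 110), i.e. (?\\ ?n)
(string-to-list "\\\\")                 ; => (92 92), i.e. (?\\ ?\\)
(string-to-list "\\\"")                 ; => (92 34), i.e. (?\\ ?\")
```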

  ---Vassil.


-- 
Peius melius est.  ---Ricardus Gabriel.

From: ······@gmail.com
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Sat, 12 Jul 2008 07:16:21 +0000
Message-ID: <ca938016-28f8-425e-b563-0449b605152c@25g2000hsx.googlegroups.com>

Thanks, Barry Margolin. Also thanks, Johan Bockgård.

  Xah
∑ http://xahlee.org/

☄
--------
xah lee wrote:

« My question is, why does elisp's regex have this 2-step process? Is
this a design decision, or did it just happen that way historically?  »

Barry Margolin wrote:
> Just history.  They adopted string notation from Common Lisp, which uses
> backslash as the escape.  And they adopted the standard Unix regex
> notation, which also uses backslash as the escape.

Xah wrote:
«Second question: can't elisp provide something like a “regex-string”
wrapper function that automatically takes care of the quoting? I can't
see how this might be difficult.»

Barry wrote:
> It can't be done using a function, because literal parsing is done at
> read time.  It would have to be done using a new string syntax.  Perhaps
> something like:
>
> #/regex/

Thanks, Barry. Also thanks, Johan Bockgård.

  Xah
∑ http://xahlee.org/

☄

From: Pascal J. Bourguignon
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Wed, 09 Jul 2008 10:51:46 +0000
Message-ID: <7cd4lncou5.fsf@pbourguignon.anevia.com>

·······@gmail.com" <······@gmail.com> writes:

> emacs regex has an odd peculiarity in that it needs a lot of backslashes.
> More specifically, a string first needs to be properly escaped, then
> the result is passed to the regex engine.
>
> For example, suppose you have this text “Sin[x] + Sin[y]” and you need
> to capture the x or y.
>
> In emacs i need to write the string
> “\\(\\[[a-z]\\]\\)”
> to get the actual regex
> “\(\[[a-z]\]\)”.
>
> Here's a somewhat typical but long regex for matching an HTML image tag:
>
> (search-forward-regexp "<img +src=\"\\([^\"]+\\)\" +alt=\"\\([^\"]+\\)?\" +width=\"\\([0-9]+\\)\" +height=\"\\([0-9]+\\)\" ?>" nil t)
>
> The toothpick syndrome gets crazy, making the already difficult regex
> syntax impossible to read and hard to code.
>

Copy the following expression into *scratch* and put your cursor after it:

(let ((regexp (read-from-minibuffer "Enter the regexp: "))
      (string "Sin[x] + Sin[y]"))
  (insert "regexp: " regexp "\n" "string: " string "\n"
          "string-match: " (format "%d" (string-match regexp string)) "\n"))

and type C-x C-e.
Then enter the regular expression \(\[[a-z]\]\) and type RET.
It will insert the following into the buffer:

regexp: \(\[[a-z]\]\)
string: Sin[x] + Sin[y]
string-match: 3  

Moral: there is absolutely NO double backslash in the regexp itself.  It's a figment of your imagination.

Another proof that there is no double backslash:

(let ((test "\\a\\b"))
  (insert (format "%c %c %c %c\n" (aref test 0) (aref test 1) (aref test 2) (aref test 3))))
C-x C-e

inserts:
\ a \ b

and not:
\ \ a \

> My question is, why does elisp's regex have this 2-step process? Is this
> a design decision, or did it just happen that way historically?

The Emacs regexp syntax involves a double backslash only to match a backslash:

  (string-match "\\\\" "abc\\def") --> 3

Otherwise, there is only one backslash in each occurrence:

  (string-match "\\." "abc.def") --> 3
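Pascal's two examples, with the string lengths added, as a sketch for *scratch*: the regexp itself contains a doubled backslash only where it has to match a literal backslash.

```elisp
;; A regexp that matches one literal backslash is two backslashes,
;; written as four in the string literal.
(length "\\\\")                         ; => 2, the regexp is \\
(string-match "\\\\" "abc\\def")        ; => 3, the text is abc\def

;; Other constructs have a single backslash in the regexp itself.
(length "\\.")                          ; => 2, the regexp is \.
(string-match "\\." "abc.def")          ; => 3
(string-match "\\." "abcXdef")          ; => nil, \. matches only a dot
```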

> Second question: can't elisp provide something like a “regex-string”
> wrapper function that automatically takes care of the quoting? I can't
> see how this might be difficult.

It would not be simple, because there are no reader macros in Emacs
Lisp.

You would have to change either the syntax of regular expressions, to
use some other character, for example double-quote:

 ".     would be the regexp to match a single dot
 "( ")  would match a group, etc.

Then you would write:  "\"(a\"|\".\")"  ; look ma! no double-slash!
(but double-quotes...)

Or you could try another character that doesn't need escaping in
string literals. Let's say ^:

   "^(a^|^.^)"    matches a or dot anywhere.

Or, if you changed the Emacs reader, you could have it use a character
other than backslash to escape double-quote and the escape character
in strings.  Let's use ~:

     "\(~"a~"\|~\\.\)"  would match "a" or \.
     (insert  "\(~"a~"\|\.\)") would insert: \("a"\|\.\)
You would have to write ~n to insert a newline in your string literals...
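Of these variants, the ^ one shows that a plain function can do the translation once the regexp escape character is no longer the string escape character. A minimal sketch: `my-caret-regexp` is a hypothetical helper, not an Emacs function, and it assumes ^ is never needed as a start-of-line anchor.

```elisp
;; Hypothetical helper: write the regexp with ^ as its escape
;; character (no escaping needed in a string literal) and turn every
;; ^ into a backslash before use.  Assumption: ^ is never an anchor.
(defun my-caret-regexp (s)
  "Return a copy of S with every ^ replaced by a backslash."
  (subst-char-in-string ?^ ?\\ s))

(my-caret-regexp "^(a^|^.^)")                       ; => "\\(a\\|\\.\\)"
(string-match (my-caret-regexp "^(a^|^.^)") "x.y")  ; => 1
(string-match (my-caret-regexp "^(a^|^.^)") "xyz")  ; => nil
```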

-- 
__Pascal Bourguignon__

From: Pascal J. Bourguignon
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Wed, 09 Jul 2008 12:30:17 +0000
Message-ID: <7c1w23ck9y.fsf@pbourguignon.anevia.com>

···@informatimago.com (Pascal J. Bourguignon) writes:
> character in strings.  Let's use ~
>
>      "\(~"a~"\|~\\.\)"  would match "a" or \.

Sorry, a typo here. It should be:

       "\(~"a~"\|\\\.\)"  would match "a" or \.


-- 
__Pascal Bourguignon__

From: Joseph Brenner
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 19:52:45 +0000
Message-ID: <87k5ftldo2.fsf@kzsu.stanford.edu>

·······@gmail.com" <······@gmail.com> writes:

> emacs regex has an odd peculiarity in that it needs a lot of backslashes.

Yes, I've made the same complaint.  You haven't even mentioned what
I would call the really bad problem: some backslashes need to be
doubled up when you stash them in strings, but *others don't*:
if you turn "\t" or "\n" into "\\t" or "\\n" you'll break the regexp.
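A sketch of Joseph's point in GNU Emacs (illustration only): \t in a string literal is a string escape that produces a real TAB, so it must stay single.

```elisp
;; "\t" is a string escape: the reader produces a real TAB character,
;; which the regex engine then matches literally.
(string-match "a\tb" "x a\tb")          ; => 2

;; Doubled, the regex engine receives backslash-t.  Emacs regexps have
;; no \t construct, and a backslash before an ordinary character
;; matches just that character, so this looks for "atb".
(string-match "a\\tb" "x a\tb")         ; => nil
(string-match "a\\tb" "atb")            ; => 0
```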

> My question is, why does elisp's regex have this 2-step process? Is this
> a design decision, or did it just happen that way historically?

I hypothesize that "(" and ")" need to be escaped because lisp hackers
think in terms of meta-hacking lisp, but I don't know that that's true.

As other people have pointed out, the problem is an interaction
between two different sets of design decisions, one concerning
regexps, the other concerning strings.

> Second question: can't elisp provide something like a “regex-string”
> wrapper function that automatically takes care of the quoting? I can't
> see how this might be difficult.

I've had the same thought (after making the same complaint):

  http://obsidianrook.com/devnotes/whinery/elisp-regexps.html

No, I don't think it would be all that hard to write a
"regexp-whack-off" function that would do the escaping for
you, but getting anyone else to use it might be difficult:
it's the sort of thing where you don't see the need for it
until you've learned to dance around it.
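One way to get such a helper without new reader syntax is to take the regexp from the minibuffer, which the Lisp reader never touches, and print it back out as a literal. A sketch; `my-insert-regexp-literal` is a hypothetical name, not Joseph's code or an existing command:

```elisp
(defun my-insert-regexp-literal (regexp)
  "Read REGEXP from the minibuffer and insert it as a string literal."
  ;; Text typed at an "s" prompt is not processed by the Lisp reader,
  ;; so the regexp is typed with single backslashes.
  (interactive "sRegexp (single backslashes): ")
  ;; `prin1-to-string' escapes every backslash and double quote.
  (insert (prin1-to-string regexp)))

;; Non-interactive check: the argument below is the regexp \(this\|that\).
(with-temp-buffer
  (my-insert-regexp-literal "\\(this\\|that\\)")
  (buffer-string))
;; The buffer then holds the text "\\(this\\|that\\)", quotes included.
```

Interactively, M-x my-insert-regexp-literal followed by typing \(this\|that\) at the prompt inserts the same doubled literal at point.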

From: Vassil Nikolov
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Thu, 10 Jul 2008 23:07:10 +0000
Message-ID: <snz3amhl4o1.fsf@luna.vassil.nikolov.name>

On Thu, 10 Jul 2008 12:52:45 -0700, Joseph Brenner <····@kzsu.stanford.edu> said:
|| My question is, why does elisp's regex have this 2-step process? Is this
|| a design decision, or did it just happen that way historically?

| I hypothesize that "(" and ")" need to be escaped because lisp hackers
| think in terms of meta-hacking lisp, but I don't know that that's true.

  A trade-off is inevitable, but neither of the two possible options
  has an absolute advantage over the other, i.e. each is appropriate
  in the corresponding context:

  (a) ed(1) users are expected to need to match parentheses more often
      than to use backreferences, so with regexes, ed(1) style, a
      parenthesis is literal and a subexpression delimiter needs to be
      tagged with a backslash;

  (b) Perl users are expected to need to capture subexpressions more
      often than to match parentheses [*], so in PCREs a parenthesis
      is a subexpression marker by itself and needs to be escaped to
      be treated literally.

  _________
  [*] parentheses are rarely used for structure in files where each
      line represents a record of single-character-separated fields
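A small illustration (not from the thread) that GNU Emacs regexps follow convention (a):

```elisp
;; A bare parenthesis is literal in an Emacs regexp.
(string-match "f(x)" "y = f(x)")              ; => 4

;; \( \) delimit a subexpression; here the regexp is f(\([a-z]\)).
(let ((s "y = f(x)"))
  (when (string-match "f(\\([a-z]\\))" s)
    (match-string 1 s)))                      ; => "x"
```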

| As other people have pointed out, the problem is an interaction
| between two different sets of design decisions, one concerning
| regexps, the other concerning strings.

  In other words, overloading the same character (the backslash) with
  two different functions, an escape character and a functional
  marker.  Arguably, the Right Thing would be to use two different
  characters for these two different purposes; but worse is better...

  ---Vassil.


-- 
Peius melius est.  ---Ricardus Gabriel.

From: ······@gmail.com
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Fri, 11 Jul 2008 22:45:00 +0000
Message-ID: <eae66784-3515-4b1d-9cd0-39b5ce468afe@34g2000hsh.googlegroups.com>

Hi David.

r u, like, trying 2 b a dumbass, or u trying 2 b divisive?

i can play a round of game with u or 2.

In this thread, so far there r 2 morons. One is Pascal J Bourguignon
(congratulations Pascal!). The other is you. Congrats.

          *          *          *

sometimes i wonder why there are these fucking morons slaving in
newsgroups. Are they like, having nothing to do? Yes, i think that is
the reason. I too, have nothing to do. But at least i think i have
some redeeming qualities. While, on the other hand, these massive
number of fuckheads that slave in newsgroups, although perhaps
suffering from loneliness to a lesser degree than me, but, their
knowledge, moral quality, philosophical outlook, depth of humour, is
at a level equivalent to skateboard toting teens. Thus, the behavior,
actions, of these morons at boredom, is rather insufferable.

          *          *          *

there are, of course, quite a few common lisp morons here. After
interacting with them a while, u sometimes realize they are no
different than unix morons they despise. They just wear different
pants. These morons at least really believed their dickless opinions
and mushy drivels when they tech geek in computing forums online.
However, folks, look at this case here today. What is David doing??
Does he actually really not understand the subject of this thread?
Could he have really misread? Surely he's been an emacs lisp developer for
years, capable of grasping lisp subject matters, i think. So,
is he, intentionally trying to be a fuckhead?? Like, he's having a bad
day maybe? The third possibility, is that he is just one of the
billions of walking joe, shits all over as they carry about thru life
without care or capable of control. With this interpretation, then
David probably didn't pay attention to this thread, and simply just
injected his say into a thread, and that's it. It's, like, who gives a
shit?

Like, one plus one is three a-ok, yes no a-ok. LOLZ and wtf? who gives
a shit?!? u can do it with regexp-opt and regexp-quote.

In my online forum posting career, since about 1991 with CompuServe
and AppleLink days, especially since about 1998 when i started to get
into contact with the unix industry programmers, i noticed there is
this type of morons, this class. It originally began with the unix
morons, perl morons, as i classified. But no, it's not just they. When
i got in contact with Mac community, where u can smell fashion and
elegance in their air, there are quite a lot of morons too, among them
known as Mac fanatics. The typical Mac fanatics are in general dumber
by the numbers, although their demeanor is holier-than-thou with their
more-expensive-than-thou hardware. But then in Python community, which
i made first contact in 2005, moron masses i witnessed in them too of
the type. In lisp community, morons of this type you see too...
(though, i have to say, oddly you dont see much of these type of
morons in Java communities. Umm. Perhaps because java programmers tend
to be more suits) i think i dunno what i'm trying to say here am a
bit dizzy atm n cant be bothered to put out thoughts in a focused way
or make it to my writings. But distinctly these morons impressed me,
and i wanted to make sure that it is these particular class of morons
i'm currently trying to write out. Ok, back to topic... these morons.
morons... Ok, i think originally what i was trying to say, was that i
realized...

kk, i've been wondering, why are these so many this type morons? My
final thought and solution. My final answer to this wonderment, is the
last interpretation of the paragraph about David's case.  The essence,
that characterizes or qualifies this moron class's behavior, is just
because that's what they are. They shit thru life. They're careless.
Happy-go-lucky. Most human animals are like this, actually. Me,
myself, with intense introspection and capabilities, austerity and
asceticism, magnanimity and grandiloquence of self love, realized that
i'm not like most people in being a lone genius. Not that i'm a
wunderkind or the greatest, but my kind is sparse.

stupid people offend me. people with low IQ offend me. Sloppiness and
slouchers offend me. Low aspiration, low lifers, witless shits offend
me. Well they dont rly offend me, just that i find little interest in
them and look down on them. O, Jail me! O, my passion for the world
and human animals. (n pussy) O. It began to OFFEND me, when these
morons begin to be hateful. Bingo, that's most of these tech geeking
morons who slaves in programing newsgroups, are. Dickless cocks.

What is it they want in life? well you get married and have kids and
die. An inevitability. (there's a glimmer of immortality with biology
on the horizon, however) But what they want? What are they thinking?
What is it, they care? Sure, we dont want pain, we dont want hunger.
We want to have money, and all. But the tech geeking morons, what r
they getting out of their existence? What does, for example, David in this
thread, who injects his irrelevance with irreverence in the middle,
want to achieve?

Is it the chatting? The socialization? The simple fact of smashing a
wine glass and have another human ilk respond, underwritten by empathy
and pleasure. Like, rubbing elbows, schmooze, have a laugh. Having a
good time. I, being what i am, dont see the humour. There is no
knowledge; there is no art. But they can go about their tech drivel
and i go about weighty fantasies in math or human outcome. But these
morons impinge on me, when they thrust forward their male nature for a
pissing fight with ignorance and hatefulness.

u see,

  Xah
∑ http://xahlee.org/

☄

On Jul 11, 2:35 am, David Kastrup <····@gnu.org> wrote:
> Joseph Brenner <····@kzsu.stanford.edu> writes:
> > David Kastrup <····@gnu.org> writes:
> >> ·······@gmail.com" <······@gmail.com> writes:
>
> >>> Second question: can't elisp provide something like a “regex-string”
> >>> wrapper function that automatically takes care of the quoting? I can't
> >>> see how this might be difficult.
>
> >> Do you mean something like regexp-quote?
>
> >> regexp-quote is a built-in function in `C source code'.
>
> >> (regexp-quote STRING)
>
> >> Return a regexp string which matches exactly STRING and nothing else.
>
> > Actually, no, that isn't at all the kind of thing we're talking about.
>
> > The problem is not how to literally match on "\(this\|that\)", the
> > problem is how to both enter and then convert a string like
> > "\(this\|that\)", into "\\(this\\|that\\)", so that it'll match
> > "this" or "that".
>
> (regexp-opt '("this" "that")) => "\\(?:th\\(?:at\\|is\\)\\)"
>
> --
> David Kastrup
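A sketch of what the two functions David mentions do in GNU Emacs. Neither one converts a regexp typed with single backslashes into a string literal. The exact output of `regexp-opt` may differ between Emacs versions, so the example checks what it matches rather than the string itself:

```elisp
;; regexp-quote escapes regexp special characters in a literal string.
(regexp-quote "a.b*")                   ; => "a\\.b\\*"

;; regexp-opt builds an optimized alternation of literal strings.
(let ((re (regexp-opt '("this" "that"))))
  (list (string-match re "this")
        (string-match re "that")
        (string-match re "thus")))      ; => (0 0 nil)
```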

From: Joost Diepenmaat
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Sat, 12 Jul 2008 01:02:13 +0000
Message-ID: <87lk07diei.fsf@zeekat.nl>

·······@gmail.com" <······@gmail.com> writes:

> Hi David.
>
> r u, like, trying 2 b a dumbass, or u trying 2 b divisive?

  [ ... ]

> sometimes i wonder why there are these fucking morons slaving in
> newsgroups. Are they like, having nothing to do? Yes, i think that is
> the reason. I too, have nothing to do. But at least i think i have
> some redeeming qualities.

I thought your redeeming qualities were good enough, but this post
confirms that I should have known better.

*plonk*

-- 
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/

From: ·······@poczta.onet.pl
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Sat, 12 Jul 2008 23:47:30 +0000
Message-ID: <8647f2c1-5baa-4f63-837c-f297c9c7f435@z66g2000hsc.googlegroups.com>

On 12 Lip, 00:45, ·······@gmail.com" <······@gmail.com> wrote:
> r u, like, trying 2 b a dumbass, or u trying 2 b divisive?
(...)

Your nasty humor is the most hilarious thing I have read in a while,
really.
Yes, you guessed, I'm a moron too, a Java one, by the way.
I think you'd be delighted with our moronic sense of humor :D

Keep kool,
--
José A. Romero L.
joseito (at) poczta (dot) onet (dot) pl
"We who cut mere stones must always be envisioning cathedrals."
(Quarry worker's creed)

From: David Kastrup
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Sun, 13 Jul 2008 08:03:02 +0000
Message-ID: <854p6umcsp.fsf@lola.goethe.zz>

·······@poczta.onet.pl writes:

> On 12 Lip, 00:45, ·······@gmail.com" <······@gmail.com> wrote:
>> r u, like, trying 2 b a dumbass, or u trying 2 b divisive?
> (...)
>
> Your nasty humor is the most hilarious thing I have read in a while,
> really.

I suppose you mean "humor" as in

     3. State of mind, whether habitual or temporary (as formerly
        supposed to depend on the character or combination of the
        fluids of the body); disposition; temper; mood; as, good
        humor; ill humor.
        [1913 Webster]
  
              Examine how your humor is inclined,
              And which the ruling passion of your mind.
                                                    --Roscommon.
        [1913 Webster]
  
              A prince of a pleasant humor.         --Bacon.
        [1913 Webster]
  
              I like not the humor of lying.        --Shak.
        [1913 Webster]
  

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

From: Matthias Buelow
Subject: Re: why emacs lisp's regex has 2-steps escapes?
Date: Mon, 14 Jul 2008 15:59:36 +0000
Message-ID: <6e1bf8F4qomhU1@mid.dfncis.de>

David Kastrup wrote:

> I suppose you mean "humor" as in
[...]

I think "humus" would be a more appropriate term here.