From: Adam Warner
Subject: Lisp editor that can parse strings
Date: 
Message-ID: <pan.2003.01.12.09.39.13.585504@consulting.net.nz>
Hi all,

I'm sick of hitting Emacs' (+ ilisp) broken string parsing, which mucks up
text entry, subsequent syntax highlighting and indentation, e.g.:

"
(defun
    Why am I typing indented? Because Emacs thinks this is code!"

It appears that the regexp parser interprets a ( or [ at the start of a
line as Lisp code even though I am clearly in the midst of a string.
Compare with:

"
 (defun
Emacs no longer thinks the string is Lisp code because I put a space
before the opening bracket"

I last hit this issue quoting the LGPL (extract):

"Copyright (C) 1991, 1999 Free Software Foundation, Inc. 59 Temple Place,
Suite 330, Boston, MA  02111-1307  USA Everyone is permitted to copy and
distribute verbatim copies of this license document, but changing it is
not allowed.

[This is the first released version of the Lesser GPL.  It also counts
 as the successor of the GNU Library Public License, version 2, hence the
 version number 2.1.]

Preamble

The licenses for most software are designed to take away your freedom to
share and change it. By contrast, the GNU General Public Licenses are
intended to guarantee your freedom to share and change free software--to
make sure the software is free for all its users."

The problem in the above string is the line starting with [. Because of
this character placement the paragraph starting with "The licenses" is not
highlighted as a string. If such strings are in the middle of code one
finds that subsequent indentation is broken, e.g.

(list "
[string
 "
why-is-there-no-indent?)

A: Because Emacs thinks the second " starts instead of concludes a string.

Has anyone fixed Emacs string parsing? Or does anyone know of an editor
that can correctly parse Lisp strings, syntax highlight and reindent code?

I am not looking for any way to trick Emacs into interpreting a string as
a string [e.g. by inserting a backslash before an opening bracket or by
using syntax such as #.(format nil "~%[string~%")]. I simply want to be
able to use an editor that understands basic Lisp syntax.

Thanks,
Adam

From: Gerd Moellmann
Subject: Re: Lisp editor that can parse strings
Date: 
Message-ID: <86bs2my422.fsf@gerd.free-bsd.org>
"Adam Warner" <······@consulting.net.nz> writes:

> Has anyone fixed Emacs string parsing?

C-h v open-paren-in-column-0-is-defun-start RET

in Emacs 21.
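
For reference, a minimal sketch of switching that heuristic off from an
init file. It assumes Emacs 21 or later (where the variable exists) and
that the line goes in ~/.emacs. Whether it cures a particular
highlighting problem depends on the mode in use.

```elisp
;; Assumption: placed in ~/.emacs.  Tells Emacs not to assume that an
;; open paren in column 0 marks the start of a defun.
(setq open-paren-in-column-0-is-defun-start nil)
```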
From: Adam Warner
Subject: Re: Lisp editor that can parse strings
Date: 
Message-ID: <pan.2003.01.12.14.29.39.156740@consulting.net.nz>
Hi Gerd Moellmann,

>> Has anyone fixed Emacs string parsing?
> 
> C-h v open-paren-in-column-0-is-defun-start RET
> 
> in Emacs 21.

Thanks for the tip. BTW you are the first person to mention this setting
in any newsgroup according to Google Groups!:
http://groups.google.com/groups?q=%22open-paren-in-column-0-is-defun-start%22

The square bracket example I gave still breaks the string recognition.
Square brackets appear to define a vector type in Emacs Lisp:
http://www.gnu.org/manual/elisp-manual-21-2.8/html_node/elisp_34.html

I'll ask about this behaviour in the Emacs newsgroup.

Many thanks,
Adam
From: Erik Naggum
Subject: Re: Lisp editor that can parse strings
Date: 
Message-ID: <3251374102450229@naggum.no>
* "Adam Warner" <······@consulting.net.nz>
| Thanks for the tip. BTW you are the first person to mention this setting
| in any newsgroup according to Google Groups!

  Do you depend on others to read the documentation (and maybe the
  source code) and post news articles about it?  Is the way to make
  people read the Emacs documentation to post it all regularly to a
  lot of newsgroups and on lots and lots of web sites so google can
  find it and people can search the Net instead of their own computer?

-- 
Erik Naggum, Oslo, Norway

Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.
From: Paul O'Donnell
Subject: Re: Lisp editor that can parse strings
Date: 
Message-ID: <pan.2003.02.10.22.09.43.287980@rogers.com>
On Sun, 12 Jan 2003 15:28:22 +0000, Erik Naggum wrote:

> * "Adam Warner" <······@consulting.net.nz>
> | Thanks for the tip. BTW you are the first person to mention this setting in
> | any newsgroup according to Google Groups!
> 
>   Do you depend on others to read the documentation (and maybe the source code)
>   and post news articles about it?  Is the way to make people read the Emacs
>   documentation to post it all regularly to a lot of newsgroups and on lots and
>   lots of web sites so google can find it and people can search the Net instead
>   of their own computer?

You sound like an asshole, Erik. Having a bad day? Adam was just asking a
question; he took the initiative to try to find a solution by searching the
Usenet articles. That is why they are archived, and sometimes it is easier to
find exactly what you are looking for in Google than it is in the manual. Open
source is about sharing code, sharing information and sharing experiences so
that we can all benefit, and there is nothing wrong with participating in the
dialogue and reading the archives. If we all kept our knowledge to ourselves we
would stay stuck in the dark ages with you. Thanks to all who have helped
me, either directly or through archived information. I am happy to share what
I know if it helps others go up the learning curve a little faster so they can
produce more.

Paul
From: Thomas F. Burdick
Subject: Re: Lisp editor that can parse strings
Date: 
Message-ID: <xcvwulanjdm.fsf@conquest.OCF.Berkeley.EDU>
"Adam Warner" <······@consulting.net.nz> writes:

> I simply want to be able to use an editor that understands basic
> Lisp syntax.

Which is exactly what Emacs doesn't do.  You could write a new
Lisp-mode based on parsing the content of the buffer, or you could
switch to Hemlock, which doesn't use braindamaged regexps for
everything.  Of course, you'll be tied into a particular platform then
(cmucl), but I think that's the general tradeoff -- live with Emacs,
or use each vendor's Emacs-alike.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Christian Lynbech
Subject: Re: Lisp editor that can parse strings
Date: 
Message-ID: <of3cnx30q2.fsf@situla.ted.dk.eu.ericsson.se>
The problem is that Emacs' indenter does not parse the entire file but
instead uses heuristics based on the immediate context and that is
easily fooled as you have experienced. 

However the alternative is not better, IMHO, as I have tried that in
modes for other programming languages. The wait incurred by the
reparsing of large portions of a large file for innocent changes quickly
becomes intolerable. Nor would I ever want to use a syntax directed
editor (or mode) in which you can only edit in terms of legitimate
syntactic constructs (a simple way to keep a parse of the syntax up to
date).

I have not found adding a backslash in front of parenthesis constructs
to be much of a hassle. These kinds of problems only seem to occur in
documentation strings.

------------------------+-----------------------------------------------------
Christian Lynbech       | Ericsson Telebit, Skanderborgvej 232, DK-8260 Viby J
Phone: +45 8938 5244    | email: ·················@ted.ericsson.se
Fax:   +45 8938 5101    | web:   www.ericsson.com
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)
From: Tim Bradshaw
Subject: Re: Lisp editor that can parse strings
Date: 
Message-ID: <ey3r8bh4251.fsf@cley.com>
* Christian Lynbech wrote:
> However the alternative is not better, IMHO, as I have tried that in
> modes for other programming languages. The wait incurred by the
> reparsing of large portions of a large file for innocent changes quickly
> becomes intolerable. Nor would I ever want to use a syntax directed
> editor (or mode) in which you can only edit in terms of legitimate
> syntactic constructs (a simple way to keep a parse of the syntax up to
> date).

This is only because these modes are badly written.  There is no need
to reparse the whole buffer, or even large portions of it in almost
all cases - you just need to keep tabs on where the changes occur and
what the parsing state is there.  PSGML, which has a horrible parsing
job to do and does a reasonably good job of it, is perfectly fine for
even quite large SGML files.
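
A rough sketch of what keeping the parsing state could look like in
Emacs Lisp, using the built-in parse-partial-sexp, which scans with the
buffer's syntax table rather than guessing from column 0.  The my-
function names are hypothetical, and this is only an illustration, not
how font-lock or PSGML is actually implemented.

```elisp
(defun my-in-string-p (pos)
  "Return non-nil if POS lies inside a string.
Parses from the start of the buffer, so a bracket in column 0 does not
change the answer."
  (save-excursion
    ;; Element 3 of the parser state is non-nil inside a string.
    (nth 3 (parse-partial-sexp (point-min) pos))))

(defun my-resume-parse (cached-pos cached-state pos)
  "Continue a parse up to POS from CACHED-POS, whose state is CACHED-STATE.
Only the text between CACHED-POS and POS is scanned."
  (save-excursion
    (parse-partial-sexp cached-pos pos nil nil cached-state)))
```

A mode could remember the state at, say, the start of the visible
window and call my-resume-parse after each change instead of rescanning
everything before it.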

Emacs's `paren in first column is beginning of defun' trick might have
been a reasonable compromise on a vax, but it's just annoyingly stupid
now.

--tim
From: Adam Warner
Subject: If you can't beat 'em... [was Re: Lisp editor that can parse strings]
Date: 
Message-ID: <pan.2003.01.14.10.37.09.320604@consulting.net.nz>
Hi Tim Bradshaw,

> Emacs's `paren in first column is beginning of defun' trick might have
> been a reasonable compromise on a vax, but it's just annoyingly stupid
> now.

I've spent a few hours coming to understand how the font-lock mode etc.
works but didn't figure out how to improve its reliability. I was
searching in vain trying to find how square brackets were being matched. I
finally realised that the regular expression \s( also matches square
brackets in lisp mode.

The .el sources are very tidy and wonderfully commented. I can't be as
enthusiastic for some of the design decisions.

At this stage to work around the string parsing problem I've created a
function to escape any region and make it font-lock friendly:

(defun escape-and-font-lock-friendly-region (start end)
  "Escapes any backslashes and double quotes within the region and inserts
   semantically neutral backslashes at the start of any line that includes
   a bracket so Emacs' font-lock doesn't get confused."
  (interactive "r")
  (save-excursion
    (goto-char start)
    (while (search-forward "\\" end t)
      (replace-match "\\\\" t t))
     
    (goto-char start)
    (while (search-forward "\"" end t)
      (replace-match "\\\"" t t))

    ;;"In Emacs Lisp, the delimiters for lists and vectors (`()' and `[]')
    ;; are classified as parenthesis characters."
    (goto-char start)
    (while (re-search-forward "^\\s(" end t)
      (backward-char)
      (insert "\\")
      (forward-line))))

People should read this to find out how to bind a function to a key
combination: http://www.geek-girl.com/emacs/faq/113.html

Regards,
Adam
From: Ingvar Mattsson
Subject: Re: If you can't beat 'em... [was Re: Lisp editor that can parse strings]
Date: 
Message-ID: <87wul7pz8r.fsf@gruk.tech.ensign.ftech.net>
"Adam Warner" <······@consulting.net.nz> writes:

> Hi Tim Bradshaw,
> 
> > Emacs's `paren in first column is beginning of defun' trick might have
> > been a reasonable compromise on a vax, but it's just annoyingly stupid
> > now.
> 
> I've spent a few hours coming to understand how the font-lock mode etc.
> works but didn't figure out how to improve its reliability. I was
> searching in vain trying to find how square brackets were being matched. I
> finally realised that the regular expression \s( also matches square
> brackets in lisp mode.

I think \s( matches any character with a "(" syntax class (open of
some sort, a two-position slot in the buffer syntax table, with the
second being the close character).
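
A small sketch to illustrate the point.  The function and variable
names are made up for the example, and the syntax table is built by
hand rather than taken from any particular Lisp mode, so treat it as an
assumption about how such a mode might be configured.

```elisp
(defun my-matches-open-class-p (string table)
  "Return non-nil if STRING starts with a character of open syntax in TABLE."
  (with-temp-buffer
    (set-syntax-table table)
    (insert string)
    (goto-char (point-min))
    (looking-at "\\s(")))

;; Assumption: a fresh table in which [ and ] are declared a matching pair.
(defvar my-demo-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?\[ "(]" table)   ; open class, closed by ]
    (modify-syntax-entry ?\] ")[" table)   ; close class, opened by [
    table))
```

With that table, (my-matches-open-class-p "[string" my-demo-table)
should be non-nil, while " [string" (leading space) should give nil.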

//Ingvar
-- 
(defmacro fakelambda (args &body body) `(labels ((me ,args ,@body)) #'me))
(funcall (fakelambda (a b) (if (zerop (length a)) b (format nil "~a~a" 
 (aref a 0) (me b (subseq a 1))))) "Js nte iphce" "utaohrls akr")
From: Adam Warner
Subject: Re: If you can't beat 'em... [was Re: Lisp editor that can parse strings]
Date: 
Message-ID: <pan.2003.01.14.11.06.32.782037@consulting.net.nz>
Oops,

> (defun escape-and-font-lock-friendly-region (start end)
>   "Escapes any backslashes and double quotes within the region and inserts
>    semantically neutral backslashes at the start of any line that includes
>    a bracket so Emacs' font-lock doesn't get confused."
>   (interactive "r")
>   (save-excursion
>     (goto-char start)
>     (while (search-forward "\\" end t)
>       (replace-match "\\\\" t t))
>      
>     (goto-char start)
>     (while (search-forward "\"" end t)
>       (replace-match "\\\"" t t))
> 
>     ;;"In Emacs Lisp, the delimiters for lists and vectors (`()' and `[]')
>     ;; are classified as parenthesis characters."
>     (goto-char start)
>     (while (re-search-forward "^\\s(" end t)
>       (backward-char)
>       (insert "\\")
>       (forward-line))))

Just noticed a bug. Every time a backslash is inserted the end of the
region grows by one. Hopefully this is correct, if not efficient:

(defun escape-and-font-lock-friendly-region (start end)
  "Escapes any backslashes and double quotes within the region and inserts
   semantically neutral backslashes at the start of any line that begins
   with an open bracket so Emacs' font-lock doesn't get confused."
  (interactive "r")
  (save-excursion
    ;; Each replacement or insertion adds one character, so END is bumped
    ;; by one every time.  Plain setq is used instead of incf so that the
    ;; cl package is not needed.
    (goto-char start)
    (while (search-forward "\\" end t)
      (replace-match "\\\\" t t)
      (setq end (1+ end)))

    (goto-char start)
    (while (search-forward "\"" end t)
      (replace-match "\\\"" t t)
      (setq end (1+ end)))

    ;;"In Emacs Lisp, the delimiters for lists and vectors (`()' and `[]')
    ;; are classified as parenthesis characters."
    (goto-char start)
    (while (re-search-forward "^\\s(" end t)
      (backward-char)
      (insert "\\")
      (setq end (1+ end))
      (forward-line))))

Regards,
Adam
From: Ray Blaak
Subject: Re: If you can't beat 'em... [was Re: Lisp editor that can parse strings]
Date: 
Message-ID: <ubs2igakl.fsf@telus.net>
"Adam Warner" <······@consulting.net.nz> writes:
> I've spent a few hours coming to understand how the font-lock mode etc.
> works but didn't figure out how to improve its reliability. I was
> searching in vain trying to find how square brackets were being matched. I
> finally realised that the regular expression \s( also matches square
> brackets in lisp mode.
> 
> The .el sources are very tidy and wonderfully commented. I can't be as
> enthusiastic for some of the design decisions.

You can always override things so that you have complete control. The Delphi
mode (e.g. delphi-mode.el, which I wrote), for example, does not
use regular expressions at all for font-lock coloring. Instead one "tokenizes"
and explicitly skips matching groups as needed.

The font lock set up looks like:

(defconst delphi-font-lock-defaults
  '(nil ; We have our own fontify routine, so keywords don't apply.
    t ; Syntactic fontification doesn't apply.
    nil ; Don't care about case since we don't use regexps to find tokens.
    nil ; Syntax alists don't apply.
    nil ; Syntax begin movement doesn't apply
    (font-lock-fontify-region-function . delphi-fontify-region)
    (font-lock-verbose . delphi-fontifying-progress-step))
  "Delphi mode font-lock defaults. Syntactic fontification is ignored.")

I found I didn't like Emacs' notion of syntax specification: too C-centric,
too cryptic, too hacky, too fragile. So I stepped around it and did it the way
I thought it should be done.

--
Cheers,                                        The Rhythm is around me,
                                               The Rhythm has control.
Ray Blaak                                      The Rhythm is inside me,
·····@telus.net                                The Rhythm has my soul.
From: Adam Warner
Subject: Re: Lisp editor that can parse strings
Date: 
Message-ID: <pan.2003.01.13.10.37.35.178777@consulting.net.nz>
Hi Christian Lynbech,

> The problem is that Emacs' indenter does not parse the entire file but
> instead uses heuristics based on the immediate context and that is
> easily fooled as you have experienced.

BTW I soon discovered that open-paren-in-column-0-is-defun-start is also
temperamental and can break just by deleting whitespace in the code:
http://groups.google.com/groups?selm=pan.2003.01.12.15.17.41.201561%40consulting.net.nz

The only response so far is in the vein of "just rewrite the legal syntax".

> However the alternative is not better, IMHO, as I have tried that in
> modes for other programming languages. The wait incurred by the
> reparsing of large portions of a large file for innocent changes quickly
> becomes intolerable.

I would suggest the implementation is poor. With the speed of computers
today and the right algorithms even a larger body of code should be able
to be understood in close to real time. Especially Lisp code.
Understanding what character exits a string (with an escape character
proviso) is not difficult: If the start of a string is properly detected
then there is simply no excuse for detecting the end of the string
with less than 100% accuracy. Strings are the easiest objects to parse in
Lisp. If an editor cannot parse them it is either a ten minute bug fix or
an ongoing indication of fundamental design problems.
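
To make that concrete, here is a rough sketch of such a scanner in
Emacs Lisp.  The function name is hypothetical, and it deliberately
ignores comments, character literals such as #\" and |...| symbol
quoting, so it only illustrates the string rule itself.

```elisp
(defun my-string-spans (text)
  "Return a list of (START . END) index pairs for each string in TEXT.
START is the index of the opening double quote and END is the index just
past the closing one.  Inside a string a backslash escapes the next
character.  An unterminated string runs to the end of TEXT."
  (let ((i 0)
        (len (length text))
        (start nil)
        (spans nil))
    (while (< i len)
      (let ((ch (aref text i)))
        (cond
         ;; Inside a string, skip the character after a backslash.
         ((and start (eq ch ?\\))
          (setq i (1+ i)))
         ;; A double quote either opens or closes a string.
         ((eq ch ?\")
          (if start
              (progn (push (cons start (1+ i)) spans)
                     (setq start nil))
            (setq start i)))))
      (setq i (1+ i)))
    (when start
      (push (cons start len) spans))
    (nreverse spans)))
```

The scanner never looks at what column a bracket is in, so a [ or ( at
the start of a line inside a string makes no difference to it.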

The main point of highlighting/parsing of Lisp code in an editor is to
give visual clues and allow automatic indentation of code. If the parser
continually gives false visual clues and can't even understand elementary
default syntax it is broken and sometimes more trouble than it's worth.

I suspect there are real-time, accurate syntax highlighting XML editors
available. If it could be done with XML it could be done with Lisp.

> Nor would I ever want to use a syntax directed editor (or mode) in which
> you can only edit in terms of legitimate syntactic constructs (a simple
> way to keep a parse of the syntax up to date).

Personally I'd love the opportunity to use an accurate parser written in
Lisp that also provides a good way to extend the syntax recognition. After
Thomas' comment I'll have another look at Hemlock. It sounds like a
decade(s) old editor can better parse Lisp than Emacs 21.

> I have not found adding a backslash in front of parenthesis constructs to
> be much of a hassle. These kinds of problems only seem to occur in
> documentation strings.

In what I'm doing it's a hassle. But even if it wasn't I despair at the
general attitude that broken design is good enough. We don't think it's OK
if 5% of the time a Lisp interpreter or compiler doesn't detect the end of
a string. Yet it's OK if the editor can't. And we tell others to rewrite
their code just to work around the editor. And we wax lyrical about how
this fuzziness is actually a good thing because one can extend the
language's syntax and some things remain as broken as they already were.

I'm writing documents as Lisp code. When I paste in a huge chunk of text I
expect that the code should continue to be properly highlighted if I have
not made a mistake in the syntax. Yet I can't rely upon this.

Furthermore I'm also using my triple-double-quote macro that has no
backslash escaping. It's great to use for verbatim pasting in of any text
(even when it includes backslashes and double quotes). While I don't
expect Emacs to always get the parsing right without extending the syntax
recognition (e.g. unbalanced double quotes will leave Emacs in the wrong
state) I shouldn't have to insert backslashes in what was default legal
string syntax. And in fact I can't because they then appear in the output.

If Emacs' Lisp parsing was already on a solid foundation I could feel
confident that it would be a small step to add accurate
triple-double-quote string parsing.

Regards,
Adam