From: ······@gmail.com
Subject: tutorial: writing a emacs major mode for syntax coloring
Date: 
Message-ID: <98c9657a-903e-4635-8d17-e01fd0858661@r10g2000prf.googlegroups.com>
• How To Write A Emacs Major Mode For Syntax Coloring
  http://xahlee.org/emacs/elisp_syntax_coloring.html

plain text version follows
------------------------
How To Write A Emacs Major Mode For Syntax Coloring

Xah Lee, 2008-11

This page gives a practical example of writing an Emacs major mode to
do syntax coloring for your own language. You should have at least a
few months of experience coding Emacs Lisp.

The Problem

Your company uses its own in-house language. You want to write a major
mode for that language, so that the keywords of the language will be
highlighted.

Solution

Suppose your language source code looks like this:

Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]

You want the words “Sin”, “Cos”, “Sum”, colored as functions, and “Pi”
and “Infinity” colored as constants.

Here's how you define the mode:

(setq myKeywords
 '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
   ("Pi\\|Infinity" . font-lock-constant-face)
  )
)

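;; the third argument is the mode name shown in the mode line;
;; it is required, and the body forms come after the doc string
(define-derived-mode math-lang-mode fundamental-mode "math-lang"
  "Major mode for coloring the keywords of math-lang."
  (setq font-lock-defaults '(myKeywords)))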

The string “"Sin\\|Cos\\|Sum"” is a regex. “font-lock-function-name-face”
is a predefined variable that holds the default face used for function
names.

The “define-derived-mode” form defines your mode, named math-lang-mode,
based on fundamental-mode (which is the most basic mode).
The line (setq font-lock-defaults '(myKeywords)) tells emacs that when
your mode is active, the syntax coloring should be set according to
your keywords.

That's all there is to it. Now, when you invoke “math-lang-mode”,
emacs will syntax color the buffer's text. (You must have
font-lock-mode on; if not, do “Alt+x font-lock-mode”.) Here's what it
looks like:

Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]
O My GOD, Emacs is beautiful!

(info "(elisp)Font Lock Mode")
(info "(elisp)Major Modes")
(info "(elisp)Faces for Font Lock")
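One way to check the coloring without opening a file is to fontify a temporary buffer and read back the face. The following is only a sketch: the names my-demo-math-keywords, my-demo-math-mode and my-demo-math-face-at are made up for illustration, and it assumes an Emacs recent enough to have “font-lock-ensure” (Emacs 25 or later).

```elisp
;; Hypothetical demo names; not part of the tutorial's own code.
(defvar my-demo-math-keywords
  '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
    ("Pi\\|Infinity" . font-lock-constant-face))
  "Font-lock keywords for the demo mode.")

(define-derived-mode my-demo-math-mode fundamental-mode "demo-math"
  "Demo major mode that colors a few math words."
  (setq font-lock-defaults '(my-demo-math-keywords)))

(defun my-demo-math-face-at (text word)
  "Fontify TEXT in a temporary buffer and return the face on WORD.
WORD must occur in TEXT."
  (with-temp-buffer
    (insert text)
    (my-demo-math-mode)
    ;; Force fontification now; font-lock is normally lazy.
    (font-lock-ensure)
    (goto-char (point-min))
    (search-forward word)
    (get-text-property (match-beginning 0) 'face)))

;; Example calls:
;; (my-demo-math-face-at "Sin[x]^2 + Pi" "Sin")
;; (my-demo-math-face-at "Sin[x]^2 + Pi" "Pi")
```

The first call should return font-lock-function-name-face and the second font-lock-constant-face. Text that matches no keyword, such as “x”, has no face property.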

Hundreds Of Keywords

Typically, a language may have hundreds of keywords. Elisp provides a
way to generate a regex from a list of keywords.

Suppose you are writing a mode for the Linden Scripting Language,
which has close to six hundred keywords. Here's an example of how to
code it.

;; define several class of keywords
(defvar mylsl-keywords
  '("break" "default" "do" "else" "for" "if" "return" "state" "while")
  "LSL keywords.")

(defvar mylsl-types
  '("float" "integer" "key" "list" "rotation" "string" "vector")
  "LSL types.")

(defvar mylsl-constants
  '("ACTIVE" "AGENT" "ALL_SIDES" "ATTACH_BACK")
  "LSL constants.")

(defvar mylsl-events
  '("at_rot_target" "at_target" "attach")
  "LSL events.")

(defvar mylsl-functions
  '("llAbs" "llAcos" "llAddToLandBanList" "llAddToLandPassList")
  "LSL functions.")

In the above, we first define several lists, each holding one class of
keywords in the language. Note that the keyword lists above are
truncated. Each list can have hundreds of keywords.

;; create the regex string for each class of keywords
(defvar mylsl-keywords-regexp (regexp-opt mylsl-keywords 'words))
(defvar mylsl-type-regexp (regexp-opt mylsl-types 'words))
(defvar mylsl-constant-regexp (regexp-opt mylsl-constants 'words))
(defvar mylsl-event-regexp (regexp-opt mylsl-events 'words))
(defvar mylsl-functions-regexp (regexp-opt mylsl-functions 'words))

In the above, we generate the regex for each keyword class, using the
built-in function “regexp-opt”. We gave regexp-opt an optional second
argument, “'words”. This makes the regex match whole words only, so
that a keyword contained inside a longer word will not be
highlighted. (For example, “for” is usually a looping keyword, but if
you have a user-created function named “inform”, you don't want part
of that word colored as “for”.)

(info "(elisp)Regexp Functions")
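To see what the “'words” argument changes, the generated regex can be tried against a few strings. This is a small sketch with made-up names (my-demo-words-regexp, my-demo-symbols-regexp, my-demo-match-p). It also tries “'symbols”, which regexp-opt accepts in Emacs 24 and later, as one possible way to keep “state” from matching inside “state_entry”. The tutorial's own code uses only “'words”.

```elisp
;; Hypothetical demo names; the word list is shortened for illustration.
(defvar my-demo-words-regexp (regexp-opt '("for" "state") 'words)
  "Matches the demo keywords as whole words.")

(defvar my-demo-symbols-regexp (regexp-opt '("for" "state") 'symbols)
  "Matches the demo keywords as whole symbols.")

(defun my-demo-match-p (regexp text)
  "Return t if REGEXP matches somewhere in TEXT, else nil.
Uses the standard syntax table so the result does not depend on
the mode of the current buffer."
  (with-syntax-table (standard-syntax-table)
    (let ((case-fold-search nil))
      (and (string-match-p regexp text) t))))

(my-demo-match-p my-demo-words-regexp "for (i = 0;")      ; t
(my-demo-match-p my-demo-words-regexp "inform(x)")        ; nil
(my-demo-match-p my-demo-words-regexp "state_entry()")    ; t, "_" ends a word
(my-demo-match-p my-demo-symbols-regexp "state_entry()")  ; nil
```

In the standard syntax table “_” is a symbol character but not a word character, which is why the two regexes disagree on “state_entry”. A mode with its own syntax table may behave differently.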

;; clear memory
(setq mylsl-keywords nil)
(setq mylsl-types nil)
(setq mylsl-constants nil)
(setq mylsl-events nil)
(setq mylsl-functions nil)

In the above, we clear the lists to save memory, since we don't need
them anymore.

;; create the list for font-lock.
;; each class of keyword is given a particular face
(setq mylsl-font-lock-keywords
  `(
    (,mylsl-type-regexp . font-lock-type-face)
    (,mylsl-constant-regexp . font-lock-constant-face)
    (,mylsl-event-regexp . font-lock-builtin-face)
    (,mylsl-functions-regexp . font-lock-function-name-face)
    (,mylsl-keywords-regexp . font-lock-keyword-face)
    ;; note: order above matters. “mylsl-keywords-regexp” goes last
    ;; because otherwise the keyword “state” in the function
    ;; “state_entry” would be highlighted.
))

In the above, we create a list in preparation to feed it to
“font-lock-defaults”.

Note that font-lock applies the list on a first-come, first-served
basis: once a piece of text has been given a face, later entries will
not change it. So the order of your list is important. Make sure the
entries that match the shortest text go last. (This won't fix all
cases where a keyword matches part of another keyword. If your
language has a lot of such keywords, you need to use other forms to
solve this problem. (info "(elisp)Search-based Fontification"))
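The effect of the ordering can be observed directly. The sketch below fontifies the text “state_entry” with two hypothetical keyword lists that differ only in order, and returns the faces found on the “s” and on the “_”. All names starting with my-demo- are made up for this illustration, and “font-lock-ensure” is assumed to be available (Emacs 25 or later).

```elisp
;; Hypothetical demo lists: same two entries, opposite order.
(defvar my-demo-specific-first
  '(("state_entry" . font-lock-builtin-face)
    ("state" . font-lock-keyword-face))
  "Longer match listed first.")

(defvar my-demo-general-first
  '(("state" . font-lock-keyword-face)
    ("state_entry" . font-lock-builtin-face))
  "Shorter match listed first.")

(defun my-demo-face-list (keywords-symbol)
  "Fontify \"state_entry\" using the list named by KEYWORDS-SYMBOL.
Return the faces on the \"s\" and on the \"_\" as a two-element list."
  (with-temp-buffer
    (insert "state_entry")
    (setq-local font-lock-defaults (list keywords-symbol))
    (font-lock-ensure)
    (list (get-text-property 1 'face)     ; the "s" of "state"
          (get-text-property 6 'face))))  ; the "_" after "state"

(my-demo-face-list 'my-demo-specific-first)
;; => (font-lock-builtin-face font-lock-builtin-face)
(my-demo-face-list 'my-demo-general-first)
;; => (font-lock-keyword-face nil)
```

In the second case the “state” entry colors the first five characters, and the “state_entry” entry is then skipped because part of its match already has a face.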

The “`( ,a ,b ...)” form is lisp's backquote syntax. It quotes the
list like “'” does, except that elements preceded by a “,” are
evaluated and replaced by their values.
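As a small illustration, a backquoted list can be compared with the same list built by explicit calls to “list” and “cons”. The variable names here are hypothetical and the word list is shortened.

```elisp
;; Hypothetical demo names, for illustration only.
(defvar my-demo-type-regexp (regexp-opt '("float" "integer") 'words)
  "Regex for two demo type names.")

;; The "," makes my-demo-type-regexp get evaluated; the face symbol
;; stays quoted.
(defvar my-demo-with-backquote
  `((,my-demo-type-regexp . font-lock-type-face)))

;; The same structure spelled out with list and cons.
(defvar my-demo-with-list
  (list (cons my-demo-type-regexp 'font-lock-type-face)))

(equal my-demo-with-backquote my-demo-with-list) ; t
```

Either way, the car of the entry is the regex string itself, not the symbol my-demo-type-regexp.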

Finally, we define our mode like this:

;; define the mode
(define-derived-mode mylsl-mode fundamental-mode
  "lsl mode"
  "Major mode for editing LSL (Linden Scripting Language)..."
  ;; ...

  ;; code for syntax highlighting
  (setq font-lock-defaults '((mylsl-font-lock-keywords)))

  ;; clear memory
  (setq mylsl-keywords-regexp nil)
  (setq mylsl-type-regexp nil)
  (setq mylsl-constant-regexp nil)
  (setq mylsl-event-regexp nil)
  (setq mylsl-functions-regexp nil)

  ;; ...
)

In the above, we based our mode on fundamental-mode, which is the most
basic mode. If you are actually writing a mode for LSL, it makes sense
to base it on c-mode, since the syntax is similar. Basing it on a
similar language's mode will save you time in coding many features,
such as handling comments and indentation.

Also, the above code only covers syntax coloring. A full featured
major mode will also have commands to handle comments, indentation,
keyword completion, function documentation lookup, function templates,
graphical menus, or any other features.

  Xah
∑ http://xahlee.org/

☄

From: ······@corporate-world.lisp.de
Subject: Re: tutorial: writing a emacs major mode for syntax coloring
Date: 
Message-ID: <e9789274-2754-4201-85cc-5022e8e00540@k19g2000yqg.googlegroups.com>
On Nov 21, 8:51 am, ·······@gmail.com" <······@gmail.com> wrote:
> ;; create the list for font-lock.
> ;; each class of keyword is given a particular face
> (setq mylsl-font-lock-keywords
>   `(
>     (,mylsl-type-regexp . font-lock-type-face)
>     (,mylsl-constant-regexp . font-lock-constant-face)
>     (,mylsl-event-regexp . font-lock-builtin-face)
>     (,mylsl-functions-regexp . font-lock-function-name-face)
>     (,mylsl-keywords-regexp . font-lock-keyword-face)
>     ;; note: order above matters. “mylsl-keywords-regexp” goes last
>     ;; because otherwise the keyword “state” in the function
>     ;; “state_entry” would be highlighted.
> ))

...

>
>
> The “`( ,a ,b ...)” is a lisp special syntax to evaluate parts of
> element inside the list. Inside the paren, elements preceded by a “,”
> will be evaluated.

I don't believe it. This guy is in the 'cons business' and using
"irregular" Lisp syntax.
After years of posting to this newsgroup about how bad this is he
uses:

* "irregular" Lisp syntax
* "irregular" Lisp comments
* "cons business"

At least he could have tried to get rid of some of the "irregular"
syntax:

(setq mylsl-font-lock-keywords
      (list (cons mylsl-type-regexp        (quote font-lock-type-face))
            (cons mylsl-constant-regexp    (quote font-lock-constant-face))
            (cons mylsl-event-regexp       (quote font-lock-builtin-face))
            (cons mylsl-functions-regexp   (quote font-lock-function-name-face))
            (cons mylsl-keywords-regexp    (quote font-lock-keyword-face))
            ;; note: order above matters. “mylsl-keywords-regexp” goes
            ;; last because otherwise the keyword “state” in the function
            ;; “state_entry” would be highlighted.
            ))

Still there is the 'cons business' and the irregular comment syntax.
How much more would he be productive, if he also would drop those???
From: ······@gmail.com
Subject: Re: tutorial: writing a emacs major mode for syntax coloring
Date: 
Message-ID: <ea20afa0-0f73-4a11-9f34-eebedfcad29b@z27g2000prd.googlegroups.com>
On Nov 21, 12:05 pm, ·······@corporate-world.lisp.de"
<······@corporate-world.lisp.de> wrote:

> I don't believe it. This guy is in the 'cons business' and using
> "irregular" Lisp syntax.
> After years of posting to this newsgroup about how bad this is he
> uses:
>
> * "irregular" Lisp syntax
> * "irregular" Lisp comments
> * "cons business"
>
> At least he could have tried to get rid of some of the "irregular"
> syntax:
>
> (setq mylsl-font-lock-keywords
>       (list (cons mylsl-type-regexp        (quote font-lock-type-face))
>             (cons mylsl-constant-regexp    (quote font-lock-constant-face))
>             (cons mylsl-event-regexp       (quote font-lock-builtin-face))
>             (cons mylsl-functions-regexp   (quote font-lock-function-name-face))
>             (cons mylsl-keywords-regexp    (quote font-lock-keyword-face))
>             ;; note: order above matters. “mylsl-keywords-regexp” goes
>             ;; last because otherwise the keyword “state” in the function
>             ;; “state_entry” would be highlighted.
>             ))
>
> Still there is the 'cons business' and the irregular comment syntax.
> How much more would he be productive, if he also would drop those???

Dear Rainer,

you see, your problem is one of those fanatics who refuse to see.

I have written an FAQ about how having vectors, hashes, and lists does
not solve the cons problem.

• Fundamental Problems of Lisp
  http://xahlee.org/UnixResource_dir/writ/lisp_problems.html

See the second section of “Frequently Asked Questions” near the bottom
of page.

Here's an excerpt:

Q: If you don't like cons, lisp has arrays and hashmaps, too.

A: Suppose there's a lang called gisp. In gisp, there's cons but also
fons. Fons are just like cons except that they have 3 cells, with car,
cbr, cdr. Now, gisp is an old lang, and fons are deeply rooted in the
lang. Every 100 lines of code or so you'll see a use of fons and car,
cbr, cdr, or one of caar, cdar, cbbar, cdbbar, etc. You get annoyed
by this. You, as a critic, complain that fons is bad. But then some
gisp fanatics retort by saying: “If you don't like fons, gisp has
cons, too.”

You see, “having something too” does not solve the problem of
pollution. Sure, you can use just cons in gisp, but in every lib or
other people's code you encounter, there's an invasion of fons with
its cbbar, cdbbar, cbbbr. The problem created by fons cannot be solved
by “having cons too”.

------------------------

Now, in your post you argue that having a regular form of the syntax
should quell criticism of syntax irregularity.

Not so. The issue is similar to the one in the FAQ cited above about
the cons problem.

But also note that not all of the special irregular syntax in lisp has
a corresponding regular form.

Further, even if all special irregular syntax had a corresponding
regular form, many criticisms of the irregular syntax would still
stand. (Perhaps now it's time to reread my article carefully.)

This message is posted to: comp.lang.lisp, comp.emacs,
comp.lang.scheme, comp.lang.functional .

  Xah
∑ http://xahlee.org/

☄
From: Asgeir
Subject: Re: tutorial: writing a emacs major mode for syntax coloring
Date: 
Message-ID: <20081121203515.0ced31e6@sigmund>
Nice and very useful, thanks a lot guy.
-- 
Asgeir