Hi,
I am trying to design a language interpreter which takes text --> s-
expressions --> lisp. The first transformation is reasonably well
understood, and I suspect I have enough knowledge to know how to go
about it, but the second causes me some difficulty because I am
unsure how to construct a parser generator for trees (I would prefer
this since I expect my language specification to change as I
proceed). Are there existing tools which can walk and match s-
expressions efficiently, or, in the worst case, some references on
how to parse tree grammars so I can generate my own?
Thanks in advance,
Carter.
On 31 Aug, 19:24, Carter <···········@gmail.com> wrote:
http://www.lambdassociates.org/webbook/chap8.htm
might be useful - though it's Qi, not Lisp, you can adapt the approach.
Mark
Carter <···········@gmail.com> writes:
Once you have S-expressions, you have lisp. There's nothing else to do.
You can directly interpret or compile it.
C/USER[16]> (defun my-compile (s-exp)
              (if (atom s-exp)
                  (typecase s-exp
                    (symbol `((fetch-push ,s-exp)))
                    (t      `((push-lit ,s-exp))))
                  (case (first s-exp)
                    ((progn) `(,@(mapcan (function my-compile) (rest s-exp))))
                    ((if) (let ((cond (my-compile (second s-exp)))
                                (then (my-compile (third s-exp)))
                                (else (my-compile (fourth s-exp))))
                            `(,@cond
                              (bfalse ,(1+ (length then)))
                              ,@then
                              ,@(when else
                                  `((branch ,(length else))
                                    ,@else)))))
                    ((setq) `(,@(my-compile (third s-exp))
                              (pop-store ,(second s-exp))))
                    (otherwise
                     `(,@(mapcan (function my-compile) (rest s-exp))
                       (jsr ,(first s-exp)))))))
MY-COMPILE
MY-COMPILE
C/USER[17]> (dolist (ins (my-compile '(progn
                                        (setq a 2)
                                        (setq b (* 2 a))
                                        (setq c 1)
                                        (setq d (- (* b b) (* 4 a c)))
                                        (if (<= 0 d)
                                            (print (list (/ (- (- b) (sqrt d)) 2 a)
                                                         (/ (+ (- b) (sqrt d)) 2 a)))
                                            (print "none")))))
              (print ins))
(PUSH-LIT 2)
(POP-STORE A)
(PUSH-LIT 2)
(FETCH-PUSH A)
(JSR *)
(POP-STORE B)
(PUSH-LIT 1)
(POP-STORE C)
(FETCH-PUSH B)
(FETCH-PUSH B)
(JSR *)
(PUSH-LIT 4)
(FETCH-PUSH A)
(FETCH-PUSH C)
(JSR *)
(JSR -)
(POP-STORE D)
(PUSH-LIT 0)
(FETCH-PUSH D)
(JSR <=)
(BFALSE 19)
(FETCH-PUSH B)
(JSR -)
(FETCH-PUSH D)
(JSR SQRT)
(JSR -)
(PUSH-LIT 2)
(FETCH-PUSH A)
(JSR /)
(FETCH-PUSH B)
(JSR -)
(FETCH-PUSH D)
(JSR SQRT)
(JSR +)
(PUSH-LIT 2)
(FETCH-PUSH A)
(JSR /)
(JSR LIST)
(JSR PRINT)
(BRANCH 2)
(PUSH-LIT "none")
(JSR PRINT)
NIL
C/USER[18]>
--
__Pascal Bourguignon__ http://www.informatimago.com/
CAUTION: The mass of this product contains the energy equivalent of
85 million tons of TNT per net ounce of weight.
Thanks. Perhaps my terminology is a bit off. Basically what I
envision is something that more resembles an AST expressed as a Lisp
list, where a lot of the operations are not yet defined. I could
obviously just interpret this form, but I would prefer to compile it
into more efficient Lisp code/s-expressions.
Carter <···········@gmail.com> writes:
Well, if you need to attach attributes to the nodes of the syntax
tree, you can indeed define structures or objects for them and
translate the s-expressions into an AST.
And if you want to generate s-expressions instead of LAP, there's no
problem with that either.
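For instance, here is a minimal sketch of that translation; the struct
and function names are editorial illustrations, not anything from the
thread:

```lisp
;; Hypothetical AST node with an attribute slot for annotations
;; (e.g. source positions).  Sketch only -- all names are made up.
(defstruct ast-node
  operator      ; symbol at the head of the s-expression
  children      ; list of ast-node structures or leaf values
  attributes)   ; plist for whatever later passes need

(defun sexp->ast (s-exp)
  "Translate a nested list into a tree of AST-NODE structures."
  (if (atom s-exp)
      s-exp
      (make-ast-node
       :operator (first s-exp)
       :children (mapcar #'sexp->ast (rest s-exp)))))
```

So (sexp->ast '(if (< x 0) (- x) x)) yields an AST-NODE whose children
are again nodes or atoms, and the attributes slot is free for whatever
bookkeeping the compiler wants.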
So what's the question?
--
__Pascal Bourguignon__ http://www.informatimago.com/
"Remember, Information is not knowledge; Knowledge is not Wisdom;
Wisdom is not truth; Truth is not beauty; Beauty is not love;
Love is not music; Music is the best." -- Frank Zappa
Perhaps an example of what I mean would be the best way to explain it.
I envision my list language to be useful for programming, and it might
look something like this:
(program A (parent A B C) ((int x) (int y) (int z))
(
((int fib) ( (int x) (int y) ) ( <code > )
((int fact) ( (int n) ) ( (* n (fact (- n 1)))))
)
)
Or some such. This isn't really quite Lisp. So I think I need to have
a tree grammar which specifies the structures, and to write a parser
generator to generate the code I want. If I also want error reporting
I will have to associate character indexes with the various
constructs in this language. The question I have is: what techniques
exist for parsing such structures, and which work best?
I am not sure simple recursive descent works. Can something like Qi-
Yacc accept lists as input?
Regards,
Carter.
On 31 Aug, 23:22, Carter <···········@gmail.com> wrote:
Yes - Qi-YACC reads lists and is designed to do so. The whole read-
and-parse routine of Qi II is written in Qi-YACC, saving me a great
deal of space. You just have to be aware of common pitfalls like
left-recursion.
Generally the best way to begin is to write down the BNF of your
object language L as a right-recursive grammar. Once you have done
that, it's easy to generate a recognisor for L, i.e. a program that
recognises all and only the grammatical sentences of L. Thus

<sentence> := <assignment> | <goto>
<goto> := goto <symbol>

becomes

(defcc <sentence>
  <assignment>;
  <goto>;)

(defcc <goto>
  goto <symbol>;)
When your recognisor is working, you attach semantic actions to the
production rules of your grammar, e.g.

(defcc <sentence>
  <assignment> := (parse-my-assignment <assignment>);
  <goto> := (parse-my-goto <goto>);)

If you use mainstream Lisp, you can still follow this development
scheme, but your code will be longer.
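To make the comparison concrete, a hand-rolled Common Lisp analogue of
the recogniser above might look like this; the function names are
editorial inventions, not Qi's:

```lisp
;; Hand-written recogniser over a list of tokens, equivalent to the
;; <sentence> grammar above.  Sketch only; <assignment> was left
;; unspecified in the grammar, so its rule is a stub here.
(defun goto-p (tokens)
  "Recognise: goto <symbol>."
  (and (eq (first tokens) 'goto)
       (symbolp (second tokens))
       (null (cddr tokens))))

(defun assignment-p (tokens)
  "Stub for the unspecified <assignment> rule."
  (declare (ignore tokens))
  nil)

(defun sentence-p (tokens)
  (or (assignment-p tokens)
      (goto-p tokens)))

;; (sentence-p '(goto loop)) => T
```

Each grammar rule becomes one function over the token list, which is
exactly the boilerplate defcc generates for you.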
Here is a chunk of the Prolog compiler in Qi II, written in Qi-YACC.
It reads a Prolog program in Edinburgh syntax as a list of characters
and converts the whole thing into a list structure.
(defcc <horn_clauses>
  <optional_whitespace> <horn_clause> <horn_clauses>
    := [<horn_clause> | <horn_clauses>];
  <optional_whitespace> := [];)

(defcc <horn_clause>
  <head> <colon-dash> <body> #\. := [<head> :- <body>];
  <head> #\. := [<head> :- []];)

(defcc <head>
  <predicate> #\( <terms> #\) := [<predicate> | <terms>];
  <predicate> := [<predicate>];)

(defcc <colon-dash>
  <whitespaces> #\: #\-;
  #\: #\-;)

(defcc <optional_whitespace>
  <whitespace> <optional_whitespace>;
  <e> := [];)
The new Qi-YACC in Qi II will generate compilers that incorporate
sensible error messages when the input does not parse, citing the
part of the input where the parsing breaks down.
(1-) (m-prolog "foo(X,Y) : g(X,Y).")
syntax error in Prolog here:
foo(X,Y) : g(X,Y).
Lisp is more widely understood than Qi-YACC, so it's easier to get
help. Qi itself is better understood than Qi-YACC and, because of its
pattern-directed list handling, quite concise - though not as concise
for parsing as Qi-YACC.
Mark
On Sep 1, 7:34 am, Mark Tarver <··········@ukonline.co.uk> wrote:
Thanks Mark. I doubt I will be using Qi-Yacc at this point, but
perhaps I could recycle the algorithm.
On Sep 1, 12:22 am, Carter <···········@gmail.com> wrote:
> (program A (parent A B C) ((int x) (int y) (int z))
>
> (
> ((int fib) ( (int x) (int y) ) ( <code > )
> ((int fact) ( (int n) ) ( (* n (fact (- n 1)))))
> )
>
> )
>
> Or some such. This isnt really quite lisp. So I need to I think have a
> tree grammar which specifies the structures and write a parser
> generator to generate the code I want.
My approach for CLPython is to implement compilation as macros. Maybe
that could work here too:
(program A (parent A B C) ((int x) (int y) (int z))
  (funcdef int fib
           ((int x) (int y))
           ("code1" "code2"))
  (funcdef int fact
           ((int n))
           ((* n (fact (- n 1))))))

(defmacro program (name parents vars &body body)
  `(... ,@body))

(defmacro funcdef (type name vars &body body)
  `(... ,@body))
The source type declarations could be included in the expansion as
type declarations or check-type forms.
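As a sketch of the check-type variant (everything here is a
hypothetical illustration, not CLPython's actual expansion):

```lisp
;; Hypothetical expansion: a typed funcdef becomes a defun whose body
;; first checks the declared parameter types.  Mapping source types
;; like INT onto Lisp types is an assumption of this sketch.
(defmacro funcdef (type name typed-args &body body)
  (declare (ignore type))
  (let ((args (mapcar #'second typed-args)))   ; ((int n)) => (n)
    `(defun ,name ,args
       ,@(mapcar (lambda (ta)
                   `(check-type ,(second ta)
                                ,(if (eq (first ta) 'int) 'integer t)))
                 typed-args)
       ,@body)))

;; (funcdef int fact ((int n))
;;   (if (<= n 1) 1 (* n (fact (- n 1)))))
;; (fact 5) => 120, (fact "x") signals a TYPE-ERROR
```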
> If I also want error reporting I will also have to associate
> character indexes with various constructs in this language.
Keeping the association between source text location and the
resulting Lisp form seems not so simple, but would be nice to have.
Currently in CLPython only errors detected by the lexer report the
source line, as the lexer keeps count of newlines.
> The question I have is how best or what techniques exist for parsing such structures?
I'm not saying the above is better than the alternatives mentioned in
this thread, but at least for me it's working very well.
- Willem
On Aug 31, 11:24 am, Carter <···········@gmail.com> wrote:
Tangential to your issue, but note that lisp's s-expression syntax is
irregular, which makes writing a parser for it non-trivial.
For details, see:
★ Fundamental Problems of Lisp
http://xahlee.org/UnixResource_dir/writ/lisp_problems.html
★ The Concepts and Confusions of Prefix, Infix, Postfix and Fully
Nested Notations
http://xahlee.org/UnixResource_dir/writ/notations.html
Also, in another recent post, I detailed an isomorphism between the
tree data structure and a tree in textual form (like sexp). This
isomorphism has a significant impact on the issues of parsing and
much of the lang's power. I haven't yet put that post up on my
website.
What follows is part of the essay from the first article above,
followed by the post on the isomorphism between tree data structure
and tree syntax.
---------------------------
Fundamental Problems of Lisp
Xah Lee, 2008-07
In this article, we discuss two problems that are rooted in lisp. One
is the irregularity in its often-cited regular syntax. The other is
the language's use of “cons” for lists.
Syntax Irregularities
The Lisp family of languages, in particular Common Lisp, Scheme Lisp,
and Emacs Lisp, is well known for its syntactic regularity, namely
that “everything” is of the form “(f x1 x2 ...)”. However, it is
little talked about that there are several irregularities in its
syntax. Here are some examples of the syntax irregularity.
The comment syntax of semicolon to end of line “;”.
The dotted notation for cons cell “(1 . 2)”.
The single quote syntax used to hold evaluation, e.g. “'(1 2 3)”.
The backquote and comma syntax used to hold but evaluate parts of
expression, e.g. “(setq x 1) (setq myVariableAndValuePair `(x ,x))”.
The “,@” for inserting a list as elements into another list. e.g.
“(setq myListX (list 1 2)) (setq myListY (list 3 4)) (setq myListXY
`(,@ myListX ,@ myListY))”
There are various others in Common Lisp and Scheme Lisp, for example
the chars “#” and “#|”. Scheme's R6RS has introduced a few new ones.
In the following, we detail how these irregularities hamper the power
of the regular syntax, and some powerful features and language
developments that lisp may have missed because of them.
Confusing
Lisp's irregular syntax is practically confusing. For example, the
difference between “(list 1 2 3)”, “'(1 2 3)”, and “(quote (1 2 3))”
is a frequently asked question. The uses of “` , ,@” etc. are
esoteric. If all these semantics used the regular syntactical form
“(f args)”, much confusion would be reduced, and people would
understand and use these features better. For example, “'(1 2 3)”
might be changed to “(' 1 2 3)”, and
(setq myListXY `(,@ myListX ,@ myListY))
could have been:
(setq myListXY (eval-parts (splice myListX) (splice myListY)))
or, with sugar syntax for typing convenience:
(setq myListXY (` (,@ myListX) (,@ myListY)))
Syntax-Semantics Correspondence
A regular nested syntax has a one-to-one correspondence to the
language's abstract syntax tree, and to a large extent the syntax
corresponds to the language's semantics. The irregularities in the
syntax break this correspondence.
For example, programmers can pretty much tell what a piece of source
code “(f x1 x2 x3 ...)” does just by reading the name that appears as
the first element in the paren. By contrast, in syntax-soup languages
such as Java or Perl, the programmer must be familiar with each of
their tens of syntactical forms (e.g. “if (...) {...}”, “for (...;
...; ...) {...}”, “(some? this: that)”, “x++”, “myList = [1, 2, 3]”,
etc.). As an example, if lisp's “'(1 2 3)” were actually
“(quote 1 2 3)”, or the shortcut form “(' 1 2 3)”, it would be much
easier to understand.
Source Code Transformation
Lisp relies on a regular nested syntax. Because of this regularity,
the syntax allows transformation of the source code by a simple
lexical scan. This has powerful ramifications (lisp's macros are one
example). For example, since the syntax is regular, one could easily
have alternative, easier-to-read syntaxes as a layer (the concept was
somewhat known in early lisp as M-expressions). Mathematica took
advantage of this (probably independently of lisp's influence), so
that you really have easy-to-read syntax yet fully retain the
advantages of the regular form. In lisp history, such layers have
been tried here and there in various forms and langs (CGOL↗, Dylan↗),
but never caught on, due largely to social happenings. Part of the
reason is political, and part is lispers' sensitivity to criticism of
the nested parens.
In lisp communities, it is widely recognized that lisp's regular
syntax has the property that “code is data; data is code”. However,
what that means exactly is usually not clearly understood in the lisp
community. Here is its defining characteristic:
A regular nested syntax makes it possible to do source code
transformations trivially with a lexical scan.
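In Common Lisp terms, the point can be made concrete in a few lines;
this is an editorial illustration, not part of the original essay:

```lisp
;; Code is data: a program is an ordinary nested list, so it can be
;; transformed with ordinary list operations.  Illustrative sketch.
(defparameter *code*
  '(if (<= 0 d) (print "real roots") (print "none")))

;; Rename the operator PRINT to WRITE-LINE throughout, by a plain
;; tree walk -- no parser needed, just list traversal:
(defun rename-op (tree old new)
  (cond ((eq tree old) new)
        ((atom tree) tree)
        (t (mapcar (lambda (sub) (rename-op sub old new)) tree))))

;; (rename-op *code* 'print 'write-line)
;; => (IF (<= 0 D) (WRITE-LINE "real roots") (WRITE-LINE "none"))
```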
The benefits of a regular syntax have become widely recognized since
the mid-2000s, through the XML language. XML, due to its strict
syntactical regularity, has spawned many transformation technologies
such as XSLT, XQuery, STX, etc. See XML transformation languages↗.
Automatic, Uniform, Universal Source Code Display
One of the advantages of a pure, fully functional syntax is that a
programmer should never need to format his source code (i.e. pressing
tab, return) while coding, saving hundreds of hours of labor, guides,
tutorials, advice, publications, and editor tools devoted to what's
known as “coding style conventions”, because the editor can reformat
the source code on the fly based on a simple lexical scan.
Because lisp's syntax has lots of nested parentheses, source code
formatting is much more labor-intensive than in syntax-soup languages
such as Perl, even when using a dedicated lisp editor such as emacs
that contains a large number of editing commands for nested syntax.
The lisp community established a particular way of formatting lisp
code, as exhibited in emacs's lisp modes and in written guides of
conventions. The entrenchment of this convention further erodes any
possibility and awareness of automatic, uniform, universal
formatting. (The uniform and universal part of the advantage is
exhibited by Python.)
As an example, the Mathematica language features a pure nested syntax
similar to lisp's but without the irregularities. So, in that
language, since version 3, released in 1996, the source code in its
editor has been automatically formatted on the fly as the programmer
types, much in the same way paragraphs have been automatically
wrapped in word processors since the early 1990s.
Note the phrase “automatic, uniform, universal source code display”.
By “automatic”, it means that any text editor can format your code on
the fly or on request, and this feature can be trivially implemented.
By “uniform”, it means there is one simple, mechanical heuristic to
determine a canonical way to format any lisp code for human-readable
display. By “universal” is meant that all programmers will recognize
and become habituated to this one canonical way, as a standard. (They
can of course set their editor to display it in other ways.)
The “uniform” and “universal” aspects are a well-known property of
Python source code. The reason Python source code has such uniform
and universal display formatting is that formatting is worked into
the language's semantics: the meaning of the code depends on the
formatting (i.e. where you press tab and return). But also note that
Python source code is not, and cannot be, automatically formatted,
precisely because the semantics and the formatting are tied together.
A strictly regular nested syntax, such as Mathematica's, can be, and
has been, since 1996. Lisp, despite its syntax irregularities, I
think can still have automatic formatting, at least to a large,
practical extent. Once lisp has automatic on-the-fly formatting
(think of an emacs fill-sexp or auto-fill-sexp), lisp code will
achieve uniform and universal source code display.
The advantage of having automatic, uniform, universal source code
display for a language is that it gets rid of the hundreds of hours
of labor, tools, guides, and arguments about how one should format
one's code. (This is partly the situation of Python already.) But
more importantly, having such properties will actually have an impact
on how programmers code in the language, i.e. what kind of idioms
they choose to use, what type of comments they put in code, and
where. This, in turn, influences the evolution of the language, i.e.
what kind of functions or features are added to it. For some detail
on this aspect, see: The Harm of Manual Code Formating
Syntax As Markup Language
One of the powers of such a pure nested syntax is that you can build
layers on top of it, so that the source code can function as markup
for conventional mathematical notation (i.e. MathML) and/or as a
word-processor-like file that can contain structures and images (e.g.
Microsoft Office Open XML↗), yet lose practically nothing.
This was done in Mathematica in 1996 with the release of Mathematica
version 3. (Think of XML, with its uniform nested syntax and its
diverse use as a markup lang; some people are now adding
computational semantics to it, i.e. a computer language with the
syntax of xml, e.g. O:XML↗. You can think of Mathematica as going the
other way: starting with a computer lang with a regular nested
syntax, then adding new but inert keywords to it with markup
semantics. The compiler just treats these inert keywords like comment
syntax when doing computation. When the source code is read by an
editor, the editor takes the markup keywords as structural or
stylistic representation, with titles, chapter headings, tables,
images, animations, hyperlinks, typeset math expressions (e.g. think
of MathML↗), etc. The non-marked-up keywords are shown as
one-dimensional textual source code, just as source code is normally
shown in most languages.)
Frequently Asked Questions
You say that lisp syntax irregularities “reduce such syntax's power”.
What do you mean by “syntax's power”?
Here are some concrete examples of what I mean by the power of a
syntax.
In lisp, comments are written with the char “;” running to the end of
the line. Note that this does not allow nested comments. So, for
example, if you have multi-line code and you want to comment it all
out, you have to prepend each line with a semicolon. However, if you
have a nested comment syntax, you can just bracket the block of code
to comment it out. This is a simple, perhaps trivial, example of the
“power of a syntax”.
In Python, the formatting is part of the lang's syntax. Some
programmers may not like it, but it is well accepted that, due to
Python's syntax, Python code is very easy to read, and it has largely
done away with programmer preferences and arguments about code
formatting. This is an example of the power of a syntax.
Let me give another, different example. In Perl's syntax, a
function's arguments often do not need parens around them. For
example, “print (3);” and “print 3;” are the same thing. This is an
example of the power of syntax, considered as flexibility or a saving
of typing, for good or bad. Similarly, in javascript for example, the
ending semicolon is optional at the end of a line. (For sample Perl
and Python code, see Xah's Perl and Python Tutorial.)
In Mathematica, the language has a syntax system such that you can
use fully regular nested notation (e.g. “f[g[x]]”), postfix notation
(e.g. “x//g//f”), prefix notation (e.g. “f@g@x”), or infix notation
(e.g. “1~Plus~2”) for ANY function in the language, and you can mix
all of the above. (Prefix and postfix are by nature limited to
functions with just 1 arg, and infix notation is by nature limited to
functions with 2 args.) This is an example of the power of syntax.
(For a detailed explanation of Mathematica's syntax and a comparison
to lisp's, see: The Concepts and Confusions of Prefix, Infix, Postfix
and Fully Functional Notations.)
In general, a computer lang has a syntax. The syntax, as text written
from left to right, has various properties and characteristics: ease
of input (think of APL as a counterexample), succinctness (e.g. Perl,
APL), variability (Perl, Mathematica), readability (Python),
familiarity (C, Java, Javascript, ...), 2-dimensional notation (e.g.
traditional math notation) (Mathematica), ease of parsing (lisp),
regularity (APL, Mathematica, Lisp, XML), flexibility (Mathematica),
etc. Basically, you can look at a syntax, at programmers' need to
type it, and at how the textual structure maps to the semantic space,
from many perspectives. The good qualities, such as ease of input,
ease of reading, ease of parsing, ease of transformation, simplicity,
and flexibility, can be considered elements of the syntax's power.
As an example of a syntax with little power, think of a lang using
just the symbols “0” and “1” as its sole char set.
Many of lisp's sugar syntaxes are designed to reduce nested parens.
Why use a more consistent, but more annoying, sugar syntax?
Ultimately, you have to ask why lispers advocate the nested syntax in
the first place.
If lispers love the nested syntax, then the argument that there
should be no irregularities has merit. If lispers think occasional
irregularities of non-parenthesized syntax are good, then there's the
question of how many, and in what form. You might as well introduce
“++i” for “(setq i (1+ i))”.
--------------------------------------------------------------
2008-08-23
someone wrote:
«
(setf bin-tree '(4 (2 1 3) (6 5 7)))
Thus in my view the only fringe nodes (leaves) are 1, 3, 5, and 7.
»
Yes, you are right.
Also note that in a language where there's an isomorphism between the
syntax and the nested list, there's a concept of head.
For example, in pseudo lisp:
(list (list 1 3) (list 5 7))
In this list, 1, 3, 5, 7 are leaves, each having index {1,1}, {1,2},
{2,1}, {2,2}.
Now, the three appearances of “list” are non-leaf nodes in the tree,
having indexes {0}, {1,0}, {2,0}. These positions are called Head in
Mathematica, and the notion of head also appears in lisp.
Now, consider this pseudo lisp: (4 (2 1 3) (6 5 7)), which is closer
to what you gave above. Now, the element at index {0} is 4, at {1,0}
is 2, and at {2,0} is 6.
The whole expression itself is still a tree, or a nested list. In
this way, we can see that there is an isomorphism between the textual
representation of a tree and what is actually considered a list
datatype in the lisp-like language.
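As an editorial sketch (not from the original post), such index paths
can be resolved in Common Lisp in a few lines; the name part is
borrowed from Mathematica's Part:

```lisp
;; Look up a node by a Mathematica-style index path, where index 0 is
;; the head (first element) of a nested list.  Illustrative sketch.
(defun part (tree path)
  (if (null path)
      tree
      (part (nth (first path) tree) (rest path))))

;; (part '(4 (2 1 3) (6 5 7)) '(0))   => 4
;; (part '(4 (2 1 3) (6 5 7)) '(1 0)) => 2
;; (part '(4 (2 1 3) (6 5 7)) '(2 0)) => 6
```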
So, suppose you invented a lisp language where there's no cons, but
the symbol “list” represents a list datatype (the implementation of
the language may actually just be a linked list of conses). So, in
this language, the expression
(list (list 1 3) (list 5 7))
would represent a list datatype. However, the expression
(4 (2 1 3) (6 5 7))
would not be a list datatype; yet the expression's structure is
identical to the previous one, and still a tree. In this language,
when the head of an expression does not have a valid definition, such
as being an integer, it is simply left unevaluated.
So, in this lang, both
(list (list 1 3) (list 5 7)) and (4 (2 1 3) (6 5 7))
are valid expressions, and of identical structure. The expression
itself represents a tree. The two expressions can easily be
transformed into each other by simply replacing the atom “list” with
an integer, or vice versa (e.g. replacing by pattern matching, or
actually applying a function to the head positions).
Now, the thing about languages with a pure nested syntax is that the
head itself can be a nested expression. For example, you can have
((f x) 1 2 3)
So, when you have an expression such as
(x (y 1 3) (z 5 7))
the elements at indexes {0}, {1,0}, {2,0}, i.e. the x, y, z, need not
be atoms themselves. They can be arbitrary expressions (trees).
So this means that in this language the heads, i.e. the non-leaf
nodes of a tree, can hold data, not just the leaves.
In most langs that support nested lists, such as perl, php, python,
only the leaf nodes hold data. But as you can see above, in this lang
with regular nested syntax, not only leaf nodes can hold data; any
node, including non-leaf nodes (heads), can hold arbitrarily nested
data.
As an intermediate illustration: in XML, the non-leaf nodes can hold
a limited amount of data, called attributes.
Actually, in langs such as perl, php, python, javascript, you can
have a nested list where the non-leaf nodes also hold arbitrary data.
All you have to do is consider the first element of a list as the
non-leaf node (i.e. lisp's “head”).
In langs such as perl etc., assuming the 1st element of a list is a
non-leaf node is not a problem. But in a lang with a purely nested
syntax, by assuming the 1st element of a list is a non-leaf node, I
think you necessarily introduce a more complex model of
interpretation if you still want an isomorphism between the syntax,
the tree, and the list datatype.
--------------------------
Now, in lisp, the cons business, combined with the fact that its
syntax is irregular, has truly fucked up the beauty and power of list
processing.
The problem of cons is well known. For example, your surprise at the
various definitions of leaf is just one frequent puzzle. The
irregular syntax, such as
'(1 2 3)
(quote (1 2 3))
(list 1 2 3)
adds more complexity to the cons problem. For example, if you read
again the above exposition about the isomorphism between the purely
nested uniform syntax, tree, and list datatype, and try to apply it
to Common Lisp, Emacs Lisp, or Scheme Lisp, you'll find all sorts of
problems, and really see how lisp is crippled, in the sense that it
could have been far more consistent, simple, and powerful.
The above pseudo-lisp explanation is based on my knowledge of
Mathematica; i.e., it is basically how Mathematica is.
Further readings:
• Trees
http://xahlee.org/tree/tree.html
• “The Concepts and Confusions of Prefix, Infix, Postfix and Fully
Functional Notations”
http://xahlee.org/UnixResource_dir/writ/notations.html
• Fundamental Problems of Lisp
http://xahlee.org/UnixResource_dir/writ/lisp_problems.html
Xah
∑ http://xahlee.org/
☄