backquote question

From: Bakul Shah
Subject: backquote question
Date: Fri, 05 Dec 2008 21:32:59 +0000
Message-ID: <49399E0B.2070409@bitblocks.com>

Consider the following code.

0: (define x '(1 2 3))  ; or (defvar x '(1 2 3)) in CL
1: `(0 . ,x)
2: `#(0 ,x)
3: `#(0 . ,x)

Expressions 1 & 2 were handled correctly by all the lisp
implementations I tried but they disagreed on expression 3.
guile, scm, ypsilon returned #(0 1 2 3), while others either
complained or returned something wrong like #(0 unquote x),
or spectacularly wrong such as #(0 (1 . 1) (1 2 3))!

Now I don't care to know which implementations work right,
but I do want to know if expression 3 is a valid Scheme (or
CL) expression.  My reading of Scheme R5RS and the CL
Hyperspec says it is legal (but if you argue it is not, then
expression 1 should also be illegal).
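
For reference, here is a minimal sketch of what expressions 1 and 2 produce under the R5RS quasiquote rules, written as equal? checks. The binding of x is the one from line 0; the checks say nothing about expression 3.

```scheme
(define x '(1 2 3))

;; Expression 1: the unquoted value becomes the tail of the list,
;; so the result is the same as (cons 0 x).
(equal? `(0 . ,x) '(0 1 2 3))      ; should be #t

;; Expression 2: the unquoted value is inserted as a single element,
;; so the vector has two elements, the second being the list itself.
(equal? `#(0 ,x) '#(0 (1 2 3)))    ; should be #t
```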

From: leppie
Subject: Re: backquote question
Date: Fri, 05 Dec 2008 21:55:02 +0000
Message-ID: <072f753d-0f6a-46d0-93fc-13245f6937f8@c1g2000yqg.googlegroups.com>

On Dec 5, 11:32 pm, Bakul Shah <············@bitblocks.com> wrote:
> Now I don't care to know which implementations work right,
> but I do want to know if expression 3 is a valid Scheme (or
> CL) expression.  My reading of Scheme R5RS and the CL
> Hyperspec says it is legal (but if you argue it is not, then
> expression 1 should also be illegal).

Expression 3 is a lexical error. A vector has no '.' operator (or
whatever you wish to call it).

From: Ray Dillinger
Subject: Re: backquote question
Date: Fri, 05 Dec 2008 22:13:26 +0000
Message-ID: <4939a78e$0$95518$742ec2ed@news.sonic.net>

Bakul Shah wrote:

> Consider the following code.

> 0: (define x '(1 2 3))  ; or (defvar x '(1 2 3)) in CL
> 1: `(0 . ,x)
> 2: `#(0 ,x)
> 3: `#(0 . ,x)

> Expressions 1 & 2 were handled correctly by all the lisp
> implementations I tried but they disagreed on expression 3.
...
> Now I don't care to know which implementations work right,
> but I do want to know if expression 3 is a valid Scheme (or
> CL) expression.  My reading of Scheme R5RS and the CL
> Hyperspec says it is legal (but if you argue it is not, then
> expression 1 should also be illegal).

In scheme, arrays start with #( and end with ).  The only 
things arrays may contain are values.  A dot by itself does 
not represent any value.  It's not a number, a symbol, a 
string, a boolean, or a list.  

There is a dot by itself in list syntax, but lists are not 
arrays.  Expression 3 does not represent a legal value.  The 
implementations that complained about a syntax error are the 
most correct. 
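
If the goal is the vector #(0 1 2 3), there are notations that stay within R5RS. The sketch below assumes the same binding of x as in the quoted post and shows two of them; it is an illustration, not the only way to write it.

```scheme
(define x '(1 2 3))

;; Option 1: splice the list's elements into the vector template.
(equal? `#(0 ,@x) '#(0 1 2 3))                  ; should be #t

;; Option 2: build the dotted list first, then convert it to a vector.
(equal? (list->vector `(0 . ,x)) '#(0 1 2 3))   ; should be #t
```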

                                Bear

From: Xah Lee
Subject: Re: backquote question
Date: Fri, 05 Dec 2008 22:22:01 +0000
Message-ID: <fcd836f8-b7b8-42ff-850c-427fe69ed17d@35g2000pry.googlegroups.com>

On Dec 5, 1:32 pm, Bakul Shah <············@bitblocks.com> wrote:
> Consider the following code.
>
> 0: (define x '(1 2 3))  ; or (defvar x '(1 2 3)) in CL
> 1: `(0 . ,x)
> 2: `#(0 ,x)
> 3: `#(0 . ,x)
>
> Expressions 1 & 2 were handled correctly by all the lisp
> implementations I tried but they disagreed on expression 3.
> guile, scm, ypsilon returned #(0 1 2 3), while others either
> complained or returned something wrong like #(0 unquote x),
> or spectacularly wrong such as #(0 (1 . 1) (1 2 3))!
>
> Now I don't care to know which implementations work right,
> but I do want to know if expression 3 is a valid Scheme (or
> CL) expression.  My reading of Scheme R5RS and the CL
> Hyperspec says it is legal (but if you argue it is not, then
> expression 1 should also be illegal).

Your question is a FAQ.

Others have answered your question.

See here for a general explanation of the situation:

• Fundamental Problems of Lisp
  http://xahlee.org/UnixResource_dir/writ/lisp_problems.html

Here is an excerpt of the section “Syntax Irregularities”.

-------------------------------------------

Syntax Irregularities

The Lisp family of languages, in particular Common Lisp, Scheme Lisp,
and Emacs Lisp, is well known for its syntactic regularity, namely that
“everything” is of the form “(f x1 x2 ...)”. However, it is little
discussed that there are several irregularities in the syntax. Here
are some examples.

• The comment syntax of semicolon to end of line, “;”.
• The dotted notation for cons cells, “(1 . 2)”.
• The single quote syntax used to hold evaluation, e.g. “'(1 2 3)”.
• The backquote and comma syntax used to hold but evaluate parts of an
  expression, e.g. “(setq x 1) (setq myVariableAndValuePair `(x ,x))”.
• The “,@” for inserting a list as elements into another list, e.g.
  “(setq myListX (list 1 2)) (setq myListY (list 3 4))
  (setq myListXY `(,@ myListX ,@ myListY))”.
• There are various others in Common Lisp or Scheme Lisp, for example
  the chars “#” and “#|”. Scheme's R6RS has introduced a few new ones.

In the following, i detail how these irregularities hamper the power
of regular syntax, and some powerful features and language
developments that lisp has missed that may be due to them.

Confusing

Lisp's irregular syntax is confusing in practice. For example, the
difference between “(list 1 2 3)”, “'(1 2 3)”, and “(quote (1 2 3))” is a
frequently asked question. The uses of “`”, “,”, “,@” etc. are esoteric.
If all these semantics used the regular syntactic form “(f args)”, then
much confusion would be avoided and people would understand and use
these features better. For example, “'(1 2 3)” might be changed to
“(' 1 2 3)”, and

(setq myListXY `(,@ myListX ,@ myListY))

could have been:

(setq myListXY (eval-parts (splice myListX) (splice myListY)))

or with sugar syntax for typing convenience:

(setq myListXY (` (,@ myListX) (,@ myListY)))
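
As a small illustration of the frequently asked question mentioned above, the sketch below uses Scheme (R5RS) rather than the Emacs Lisp of the excerpt, which is an assumption of this example. It shows that the quote and backquote characters are reader abbreviations for ordinary list forms, while list is a plain procedure call.

```scheme
;; 'datum is only a reader abbreviation for (quote datum).
(equal? '(1 2 3) (quote (1 2 3)))   ; should be #t
(car ''a)                           ; the symbol quote

;; (list 1 2 3) is an ordinary procedure call that builds a fresh list
;; with the same contents as the literal.
(equal? (list 1 2 3) '(1 2 3))      ; should be #t

;; Backquote and comma are abbreviations as well.
(car '`(x ,y))                      ; the symbol quasiquote
```
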

Syntax-Semantics Correspondence

A regular nested syntax has a one-to-one correspondence to the
language's abstract syntax tree, and to a large extent the syntax has
some correspondence to the language's semantics. The irregularities in
syntax break this correspondence.

For example, programmers can pretty much tell what a piece of source code
“(f x1 x2 x3 ...)” does just by reading the name that appears as the first
element in the parens. By contrast, in syntax soup languages such as
Java and Perl, the programmer must be familiar with each of the tens of
syntactic forms (e.g. “if (...) {...}”, “for (...; ...; ...)
{...}”, “(some? this: that)”, “x++”, “myList = [1, 2, 3]” etc.). As an
example, if lisp's “'(1 2 3)” were actually “(quote 1 2 3)” or the shortcut
form “(' 1 2 3)”, then it would be much easier to understand.

Source Code Transformation

Lisp relies on a regular nested syntax. Because of this regularity,
source code can be transformed by a simple lexical scan. This has
powerful ramifications (lisp's macros are one example). For example,
since the syntax is regular, one could easily have alternative,
easier to read syntaxes as a layer on top (the concept was known in
early lisp as M-expressions). Mathematica took this advantage
(probably independently of lisp's influence), so that you really have
an easy to read syntax, yet fully retain the advantages of the regular
form. In lisp history, such layers have been tried here and there in
various forms or langs (CGOL, Dylan), but never caught on, largely due
to social happenings. Part of the reason is political, and part is
lispers' sensitivity to criticism of the nested parens.

In lisp communities, it is widely recognized that lisp's regular
syntax has the property that “code is data; data is code”. However,
what exactly that means is usually not clearly understood in the
lisp community. Here is its defining characteristic:

A regular nested syntax makes it possible to do source code
transformations trivially with a lexical scan.

The benefits of a regular syntax have become widely recognized since
the mid 2000s through XML. XML, due to its strict syntactic
regularity, has given rise to many transformation technologies such
as XSLT, XQuery, STX etc. See XML transformation languages.

Automatic, Uniform, Universal, Source Code Display

One of the advantages of a pure, fully functional syntax is that a
programmer should never need to format his source code (i.e. pressing
tabs, returns) while coding, saving the hundreds of hours of labor,
guides, tutorials, advice, publications, and editor tools spent on
what's known as “coding style convention”, because the editor can
reformat the source code on the fly based on a simple lexical scan.

Because lisp's syntax has lots of nested parentheses, source code
formatting is much more labor intensive than in syntax soup languages
such as Perl, even when using a dedicated lisp editor such as emacs
that contains a large number of editing commands for nested syntax.

The lisp community has established a particular way of formatting lisp
code, as exhibited in emacs's lisp modes and written guides of
conventions. The recognition of such a convention further erodes any
possibility and awareness of automatic, uniform, universal
formatting. (The uniform and universal part of the advantage is
exhibited by Python, for example.)

As an example, the Mathematica language features a pure nested syntax
similar to lisp's but without irregularities. So, in that language,
since version 3 released in 1996, the source code in its editor is
automatically formatted on the fly as the programmer types, much in the
same way paragraphs have been automatically wrapped in word processors
since the early 1990s.

Note the phrase “automatic, uniform, universal, source code display”.
By “automatic”, it is meant that any text editor can format your code
on the fly or by request, and that this feature can be trivially
implemented. By “uniform”, it is meant that there is one simple and
mechanical heuristic to determine a canonical way to format any lisp
code for human-readable display. By “universal”, it is meant that all
programmers will recognize and be habituated to this one canonical way,
as a standard. (They can of course set their editor to display it in
other ways.)

The “uniform” and “universal” aspect is a well-known property of
Python's source code. The reason Python's source code has such
uniform and universal display formatting is that it is worked into
the language's semantics, i.e. the semantics of the code depends on
the formatting (i.e. where you press tabs and returns). But also note
that Python's source code is not and cannot be automatically formatted,
precisely because the semantics and the formatting are tied together. A
strictly regular nested syntax, such as Mathematica's, can be, and has
been since 1996. Lisp, despite its syntax irregularities, can i think
still have automatic formatting, at least to a large, practical
extent. Once lisp has automatic on-the-fly formatting (think of it as
emacs commands named fill-sexp and auto-fill-sexp), lisp code will
achieve uniform and universal source code formatting display.

The advantage of having an automatic, uniform, universal source code
display for a language is that it gets rid of the hundreds of hours
spent on the labor, tools, guides, and arguments about how one should
format his code. (This is partly the situation with Python already.)
But more importantly, having such properties will actually have an
impact on how programmers code in the language, i.e. what kind of idioms
they choose to use, what type of comments they put in code, and where.
This further influences the evolution of the language, i.e. what kind
of functions or features are added to the lang. For some detail on
this aspect, see: The Harm of Manual Code Formatting

Syntax As Markup Language

One of the powers of such a pure nested syntax is that you can build up
layers on top of it, so that the source code can function as markup of
conventional mathematical notation (i.e. MathML) and/or as a
word-processing-like file that can contain structures and images (e.g.
Microsoft Office Open XML), yet lose practically nothing.

This was done in Mathematica in 1996 with the release of Mathematica
version 3. (Think of XML, its uniform nested syntax, and its diverse
use as a markup lang; some people are now adding computational
semantics to it, i.e. a computer language with the syntax of XML, e.g.
O:XML. You can think of Mathematica as going the other way: starting
with a computer lang with a regular nested syntax, then adding new but
inert keywords to it with markup semantics. The compiler just
treats these inert keywords like comment syntax when doing computation.
When the source code is read by an editor, the editor takes the markup
keywords for structural or stylistic representation, with titles,
chapter headings, tables, images, animations, hyperlinks, typeset math
expressions (e.g. think of MathML) etc. The non-marked-up keywords are
shown as one-dimensional textual source code, just like source code is
normally shown in most languages.)

Frequently Asked Questions

You say that lisp syntax irregularities “reduce such syntax's power”.
What do you mean by “syntax's power”?

Here are some concrete examples of what i mean by power of syntax.

Many languages, such as Perl, have a comment syntax of a special
char running to end of line. For example, in Perl, Python, Ruby, Bash,
and Windows PowerShell, the special char is “#”. Note that this does not
allow nested comments. So, for example, if you have multi-line code and
you want to comment it all out, you have to prepend the comment char to
each line. However, if you have block comment syntax, you can
just bracket the block of code to comment it out.

Of course, these langs may also have block comment syntax, and even if
not, programming text editors often have features to add the comment
char to a block of lines. However, in the context of analyzing a
particular syntax, the line-based comment syntax is inferior to the
block-based one with respect to what the syntax can do. This is a
simple, perhaps trivial, example of “power of a syntax”.

In Python, the formatting is part of the lang's syntax. Some
programmers may not like it, but it is well accepted that Python code
is very easy to read, and it has largely done away with programmer
preferences and arguments about code formatting. This is an example of
power of a syntax.

Let me give another, different example. In Perl, a function's
arguments often do not need to have parens around them. For
example, “print (3);” and “print 3;” are the same thing. This is an
example of power of syntax. Similarly, in javascript, the
ending semicolon is optional when it is at the end of a line.

In Mathematica, the language has a syntax system such that you can use
fully regular nested notation (e.g. “f[g[x]]”), postfix notation (e.g.
“x//g//f”), prefix notation (e.g. “f@g@x”), and infix notation (e.g.
“1~Plus~2” instead of “Plus[1,2]”), for ANY function in the language,
and you can mix all of the above. (Prefix and postfix by nature are
limited to functions with just 1 arg, and infix notation by nature is
limited to functions with 2 args.) This is an example of power of
syntax. (For detail, see: The Concepts and Confusions of Prefix,
Infix, Postfix and Fully Nested Notations)

In general, a computer lang has a syntax. The syntax, as text written
from left to right, has various properties and characteristics: ease
of input (think of APL as a counterexample), succinctness (e.g. Perl,
APL), variability (Perl, Mathematica), readability (Python),
familiarity (C, Java, Javascript, ...), 2-dimensional notation (e.g.
traditional math notation) (Mathematica), ease of parsing (lisp),
regularity (APL, Mathematica, Lisp, XML), flexibility (Mathematica),
etc. Basically, you can look at a syntax, the programmer's need to type
it, and how the textual structure can be mapped to the semantic
space, from many perspectives. The good qualities, such as ease of
input, ease of reading, ease of parsing, ease of transformation,
simplicity, flexibility, etc., can be considered elements of the
syntax's power.

As an example of a syntax of little power, think of a lang using just the
symbols “0” and “1” as its sole char set.

Many of lisp's sugar syntaxes are designed to reduce nested parens. Why
use a more consistent, but more annoying sugar syntax?

Ultimately, you have to ask why lispers advocate nested syntax in the
first place.

If lispers love the nested syntax, then the argument that there
should not be irregularities has merit. If lispers think occasional
irregularities of non-parenthesized syntax are good, then there's the
question of how many, or in what form. You might as well introduce “++i”
for “(setq i (1+ i))”.

  Xah
∑ http://xahlee.org/


From: Kaz Kylheku
Subject: Re: backquote question
Date: Sat, 06 Dec 2008 00:50:38 +0000
Message-ID: <20081221012900.532@gmail.com>

["Followup-To:" header set to comp.lang.lisp.]
On 2008-12-05, Bakul Shah <············@bitblocks.com> wrote:
> Consider the following code.
>
> 0: (define x '(1 2 3))  ; or (defvar x '(1 2 3)) in CL

I will take door #2. :)

> 1: `(0 . ,x)
> 2: `#(0 ,x)
> 3: `#(0 . ,x)
>
> Expressions 1 & 2 were handled correctly by all the lisp
> implementations I tried but they disagreed on expression 3.

Expression 3 is garbage in Common Lisp. The backquote is irrelevant.

The description of the Sharpsign Left-Parenthesis (2.4.8.3) read syntax doesn't
say anything about the support for dotted syntax.

In particular, the definition doesn't say that the vector material is read as
an ordinary list notation.  That is to say, in Common Lisp, a vector literal is
not simply a list literal with a hash mark stuck in front of it, denoting that
a list literal should be recursively read and then coerced to a vector object
by the reader.  It's a syntax in its own right.  And that syntax doesn't define
the dot behavior anywhere.  Consequently, a conscientious implementor will
treat this as a syntax error (unless this is somehow seen as an opportunity to
provide a spectacularly useful extension). A careful programmer will regard
this as undefined behavior.

The Common Lisp backquote description explicitly defines the backquoted #(...)
syntax as a special variant, and not simply as recursion over the list cases.

The vector case is described in 2.4.6 like this:

  * `#(x1 x2 x3 ... xn) may be interpreted to mean
    (apply #'vector `(x1 x2 x3 ... xn)).

Note that for backquoted lists, there are these three separately-defined cases:

  `(x1 x2 ... xn . atom) 
  `(x1 x2 ... xn)
  `(x1 x2 ... xn . ,form)

For backquoted vectors, there is only that one case.

If there was any intent to support dot notation in backquoted vectors, then
2.4.6 would have cases for that.
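
The reading of the vector case quoted above can be mirrored in Scheme, purely as an analogy. The sketch below assumes R5RS and uses Scheme's vector procedure in place of #'vector; it is not taken from either standard's text. It shows that the element-wise and spliced forms have an obvious meaning under this reading, while a dotted tail has no counterpart in it.

```scheme
(define x '(1 2 3))

;; A backquoted vector behaves like "apply vector to the backquoted
;; list with the same elements".
(equal? `#(0 ,x)  (apply vector `(0 ,x)))    ; should be #t
(equal? `#(0 ,@x) (apply vector `(0 ,@x)))   ; should be #t
```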

> Hyperspec says it is legal (but if you argue it is not, then

Chapter and verse?

> expression 1 should also be illegal).

So in other words, because dot notation in vectors makes no sense, the usage
should be banned from lists?

Huh?

From: Bakul Shah
Subject: Re: backquote question
Date: Sat, 06 Dec 2008 20:39:28 +0000
Message-ID: <493AE300.9040807@bitblocks.com>

Kaz Kylheku wrote:
> Expression 3 is garbage in Common Lisp. The backquote is irrelevant.
> 
> The description of the Sharpsign Left-Parenthesis (2.4.8.3) read syntax doesn't
> say anything about the support for dotted syntax.

Right. I missed looking there. Thanks for pointing out the section.

From: William D Clinger
Subject: Re: backquote question
Date: Mon, 08 Dec 2008 01:14:53 +0000
Message-ID: <47526c85-a117-4910-867b-fc6ff2f17c29@j39g2000yqn.googlegroups.com>

Bakul Shah wrote:
> guile, scm, ypsilon returned #(0 1 2 3), while others either
> complained or returned something wrong like #(0 unquote x),
> or spectacularly wrong such as #(0 (1 . 1) (1 2 3))!

In R5 Scheme, `#(0 . ,x) is an error, which means
portable code is not allowed to use the notation,
and implementations are allowed to do whatever they
like if they see it; complaining about it is widely
considered to be the nicest thing an implementation
can do, but that's just a quality-of-implementation
issue.

In R6 Scheme, `#(0 . ,x) is illegal, both as an
expression and as an input to the read and get-datum
procedures.  Chapter 4 of the R6RS says (in the last
paragraph before section 4.1) "an implementation
must not extend the lexical or datum syntax in any
way, with one exception...", and `#(0 . ,x) does
not qualify for the exception.  If the notation
appears in the input to get-datum or read, then
R6RS library document 8.2.9 says "an exception
with condition types &lexical and &i/o-read is
raised"; according to R6RS 5.3, that means the
exception *must* be raised, which (according to
R6RS chapter 2) means that raising the exception
is an absolute requirement of the R6RS.
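
The requirement described above could be probed with a small R6RS top-level program along the following lines. This is a sketch under stated assumptions: try-read is a hypothetical helper name, and what gets printed depends on the implementation under test. On the reading given above, a conforming implementation would be expected to take the lexical-violation branch.

```scheme
(import (rnrs))

;; Read one datum from a string and report how the reader reacted.
;; try-read is a hypothetical helper, not part of any standard library.
(define (try-read text)
  (guard (e ((lexical-violation? e) 'lexical-violation)
            (else 'other-condition))
    (get-datum (open-string-input-port text))))

(display (try-read "`#(0 . ,x)"))
(newline)
```

If the reader accepts the notation as an extension, the program prints whatever datum was read instead of one of the two symbols.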

Hence guile and scm, being implementations of the
R4RS or R5RS, are allowed to do whatever they like
when they encounter the illegal notation.  Ypsilon,
however, being an implementation of the R6RS,
*must* raise an exception, so Ypsilon's failure
to raise the exception is a definite bug.

Note, however, that this applies only when you
are executing an R6RS top-level program.  Since
the R6RS forbids REPLs (this is yet another
consequence of the R6RS's absolute requirements),
any REPL that Ypsilon may provide lies completely
outside the purview of the R6RS.  That means you
and I cannot use the R6RS to reason about what is
or is not a bug in Ypsilon's REPL (or any other
REPL).

HTH.

Will