Backquote implementation questions.

From: Kaz Kylheku
Subject: Backquote implementation questions.
Date: Fri, 14 Mar 2003 22:09:45 +0000
Message-ID: <cf333042.0303141409.bbf02e9@posting.google.com>

I have written an implementation of backquote and it's time to spit
and polish it. My intention is to replace the existing backquote
implementation in CLISP.

I have a few questions.

1) 

Firstly, unquoting is supposed to work in vectors, notated by #(...),
but not required to work in arrays.

CLISP hacks it by giving access to the backquote level special
variable to the #A notation reader; the reader dynamically binds the
backquote level to zero, and so any occurrences of unquotes in the
array syntax produce an error.

My approach is to eliminate such a dependency; instead, the unquote
syntax sets a flag, and we check the flag, and the type of the object
that was read. If it's not a list or a vector (other than a string),
and an unquote has occurred, then signal an error.
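
A minimal sketch of that check, assuming the unquote reader macros set
a flag as they run. The names *UNQUOTE-SEEN*, BACKQUOTABLE-OBJECT-P and
CHECK-BACKQUOTED-OBJECT are made up for illustration; they are not
CLISP internals.

```lisp
;; Hypothetical flag: the , and ,@ reader macros would set it to T.
(defvar *unquote-seen* nil)

(defun backquotable-object-p (object)
  "True for lists and for vectors other than strings."
  (or (listp object)
      (and (vectorp object) (not (stringp object)))))

(defun check-backquoted-object (object)
  "Return OBJECT, or signal an error if an unquote was seen inside
an object that backquote does not descend into."
  (when (and *unquote-seen* (not (backquotable-object-p object)))
    (error "Unquote inside an object of type ~S." (type-of object)))
  object)
```

Note that this predicate also accepts bit vectors and one-dimensional
#A arrays, since both satisfy VECTORP.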

The problem is: how do you distinguish #1A(1 2 3) from #(1 2 3)?

My gut instinct is to just let this slide; just allow unquotes to work
over one dimensional arrays.
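
To illustrate why the two are hard to tell apart once read, here is a
small sketch using the standard #1A(...) spelling for a rank-one array.
COMPARE-READ-FORMS is a hypothetical helper, not part of any
implementation.

```lisp
(defun compare-read-forms (text-1 text-2)
  "Read one object from each string and report whether both are
vectors and whether they are EQUALP."
  (let ((a (read-from-string text-1))
        (b (read-from-string text-2)))
    (list (vectorp a) (vectorp b) (equalp a b))))

(compare-read-forms "#1A(1 2 3)" "#(1 2 3)")  ; => (T T T)
```

After READ returns, nothing in the object records which notation
produced it, so a check that runs on the object alone cannot separate
the two cases.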

2)

Related to the backquoting of arrays; what is a sufficient test to
detect that a #( ... ) object has been read?

I currently just have this simple-minded approach: if it's vectorp,
but not stringp, then it's game. One-dimensional #A arrays count as
vectorp, as I observed earlier. Is there anything else I should watch
out for? It just occurred to me that I forgot about the #*10010101
notation for bit vectors; consequently, my backquote mangles them
through a conversion from vector to list to vector again: `#*1001 -->
#(1 0 0 1). Oopsies!

Can I deal with this in a general way by recording the TYPE-OF the vector
object? Then I can just treat all vectors through the backquote
expansion by converting them to lists; afterward, use COERCE to put
them into exactly the original type. But the problem is that the
number of elements can change if you use splicing in a `#(...)
notation. I can perhaps just test whether the type specifier is a list,
and in that case extract the type symbol from the first position. It seems
better just to have an exhaustive list of specific vector subtypes
that can come out of the various read notations, and avoid doing any
treatment on those objects.
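
One possible way around the length problem, offered as a sketch rather
than a recommendation: record ARRAY-ELEMENT-TYPE instead of TYPE-OF,
since the element type carries no length. REBUILD-VECTOR is a
hypothetical name, and the sketch assumes the expansion has already
produced the final list of elements.

```lisp
(defun rebuild-vector (original elements)
  "Return a fresh vector holding ELEMENTS (a list) with the same
element type as ORIGINAL. The length comes from ELEMENTS, so
splicing may grow or shrink the result."
  (make-array (length elements)
              :element-type (array-element-type original)
              :initial-contents elements))

(rebuild-vector #*1001 '(1 0 0 1 1))  ; a bit vector of length 5
(rebuild-vector #(a b) '(a b c))      ; a general vector of length 3
```

The catch is that a spliced element which does not fit the element
type, such as a symbol in a bit vector, is an error at the point where
the vector is rebuilt.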

3)

Secondly, in CLISP, you can have a splicing unquote at the end of a
list whose expression does not expand to a list! You get a dotted
object back, for instance:

   `(foo ,@3) -->  (FOO . 3)

This seems like an extension since the spec says that the ,@ argument
must produce a list object. Should I preserve this behavior, or just
let it signal an error that 3 is not a list?

On the other hand, the APPEND function does support an atom as its
last argument, and the translation of `(a . ,b) can just produce
(append (list 'a) b). If `(1 ,@3) produces (append (list '1) 3), that
leads to (1 . 3) in a straightforward way. Ditto (list* 1 3). So it
makes a certain amount of sense for the splicing unquote to work in
the last position of a form for an atom.
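
The APPEND and LIST* behavior mentioned above is standard and can be
checked directly; these forms do not involve backquote at all:

```lisp
;; The last argument of APPEND may be any object; it becomes the tail.
(append (list 'foo) 3)           ; => (FOO . 3)
(append (list 'foo) (list 1) 3)  ; => (FOO 1 . 3)

;; LIST* likewise uses its last argument as the tail.
(list* 1 3)                      ; => (1 . 3)
```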

But it's also noteworthy that this ( ... ,@X) behavior is exactly the
same as the dotted notation  ( ... . ,X)  which is explicitly required
by the spec. So it doesn't achieve anything new.
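
For comparison, the dotted form is portable, so code that wants an
atom in the tail can be written without relying on ,@ accepting a
non-list. DOTTED-TAIL is a hypothetical name used only for this
example.

```lisp
(defun dotted-tail (x)
  "Build a list whose tail is the value of X, using dotted unquote."
  `(foo . ,x))

(dotted-tail 3)       ; => (FOO . 3)
(dotted-tail '(a b))  ; => (FOO A B)
```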