From: Paul F. Dietz
Subject: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <C5Odndo2_LqhsUXcRVn-qg@dls.net>
In the CL standard, constituent characters with constituent trait
invalid are described by (2.1.4.3):

   Characters with the constituent trait invalid cannot ever appear
   in a token except under the control of a single escape character.
   If an invalid character is encountered while an object is being
   read, an error of type reader-error is signaled.

However, the reader algorithm (section 2.2) indicates that such
characters can also be escaped by multiple escape (vertical bars).
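
To make the conflict concrete, here is a minimal sketch in Common Lisp. It assumes a conforming implementation with the standard readtable and *package* bound to CL-USER; the variable names *single* and *multiple* are illustrative only. Space carries the constituent trait invalid, so the first token escapes it the way 2.1.4.3 explicitly allows, while the second relies only on the multiple-escape handling in the reader algorithm of 2.2.

```lisp
;; Space carries the constituent trait invalid (see 2.1.4.2).
;; Single escape: explicitly sanctioned by 2.1.4.3.
(defparameter *single* (read-from-string "ab\\ c"))
;; Multiple escape: sanctioned only by the reader algorithm in 2.2.
(defparameter *multiple* (read-from-string "|AB C|"))

;; Under 2.2 both tokens denote the symbol named "AB C"; a strict
;; reading of 2.1.4.3 would seem to require the second call to
;; READ-FROM-STRING to signal a READER-ERROR instead.
(format t "~S ~S ~S~%"
        (symbol-name *single*)
        (symbol-name *multiple*)
        (eq *single* *multiple*))
```
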

How should I resolve this contradiction, and what was the intended
behavior?

	Paul

From: Duane Rettig
Subject: Re: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <4wtuvfn5h.fsf@franz.com>
"Paul F. Dietz" <·····@dls.net> writes:

> In the CL standard, constituent characters with constituent trait
> invalid are described by (2.1.4.3):
> 
>    Characters with the constituent trait invalid cannot ever appear
>    in a token except under the control of a single escape character.
>    If an invalid character is encountered while an object is being
>    read, an error of type reader-error is signaled.
> 
> However, the reader algorithm (section 2.2) indicates that such
> characters can also be escaped by multiple escape (vertical bars).
> 
> How should I resolve this contradiction, and what was the intended
> behavior?

I think 1.5.1.4.1 is your friend here.  How you interpret "more
specific" might be different than how I interpret it.  It seems,
though, that the first-mentioned paragraph, 2.1.4.3, is the more
specific and thus applies.  It might be hard for the invalid
constituent characters to be part of a name anyway, since if they
became alphabetic per the reader algorithm, how do they print?
The only questions that would remain for me are why the explicit
exclusivity is given in 2.1.4.3 in the first place, and also, what
should happen when _both_ single-escape and multiple-escape are
present?

Nasty, non-intuitive business.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Paul F. Dietz
Subject: Re: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <41D8A03C.2060201@dls.net>
Duane Rettig wrote:
> "Paul F. Dietz" <·····@dls.net> writes:
> 
> 
>>In the CL standard, constituent characters with constituent trait
>>invalid are described by (2.1.4.3):
>>
>>   Characters with the constituent trait invalid cannot ever appear
>>   in a token except under the control of a single escape character.
>>   If an invalid character is encountered while an object is being
>>   read, an error of type reader-error is signaled.
>>
>>However, the reader algorithm (section 2.2) indicates that such
>>characters can also be escaped by multiple escape (vertical bars).
>>
>>How should I resolve this contradiction, and what was the intended
>>behavior?
> 
> 
> I think 1.5.1.4.1 is your friend here.  How you interpret "more
> specific" might be different than how I interpret it.  It seems,
> though, that the first-mentioned paragraph, 2.1.4.3, is the more
> specific and thus applies.  It might be hard for the invalid
> constituent characters to be part of a name anyway, since if they
> became alphabetic per the reader algorithm, how do they print?
> The only questions that would remain for me are why the explicit
> exclusivity is given in 2.1.4.3 in the first place, and also, what
> should happen when _both_ single-escape and multiple-escape are
> present?

Two comments:

(1) I've tried to use the 1.5.1.4.1 argument before in another context,
and it was argued that it doesn't apply if, as here, one of the
two interpretations is not an exceptional situation (this was with
clisp).

(2) *Every* lisp I've tried this on, with no exceptions, accepts
multiple-escaped invalid characters, instead of signalling a read
error.  This includes Allegro.  What's more, whitespace characters
have the invalid character trait, so this would also appear to
rule out using | ... | for symbol names containing spaces -- and
there are examples in the standard that do just that (yes, I know
examples are not normative.)
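
A probe along these lines can be run in any implementation. This is a sketch only: READ-MULTIPLE-ESCAPED is a hypothetical helper, and the code assumes the implementation supports the semi-standard character names Tab, Backspace, and Rubout. It returns the symbol name read from |a<char>b|, or :READER-ERROR if the reader rejects the token, which is what a strict reading of 2.1.4.3 would seem to demand.

```lisp
(defun read-multiple-escaped (char)
  "Read |a<CHAR>b| and return the resulting symbol name,
or :READER-ERROR if the reader signals a READER-ERROR."
  (handler-case
      (symbol-name (read-from-string (format nil "|a~Cb|" char)))
    (reader-error () :reader-error)))

;; All four characters carry the constituent trait invalid.
(dolist (char (list #\Space #\Tab #\Backspace #\Rubout))
  (let ((result (read-multiple-escaped char)))
    (format t "~S: ~A~%" char
            (if (eq result :reader-error)
                "reader-error"
                (format nil "accepted, name length ~D" (length result))))))
```
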

	Paul
From: Duane Rettig
Subject: Re: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <4sm5jfd9a.fsf@franz.com>
"Paul F. Dietz" <·····@dls.net> writes:

> Duane Rettig wrote:
> > "Paul F. Dietz" <·····@dls.net> writes:
> >
> 
> >>In the CL standard, constituent characters with constituent trait
> >>invalid are described by (2.1.4.3):
> >>
> >>   Characters with the constituent trait invalid cannot ever appear
> >>   in a token except under the control of a single escape character.
> >>   If an invalid character is encountered while an object is being
> >>   read, an error of type reader-error is signaled.
> >>
> >>However, the reader algorithm (section 2.2) indicates that such
> >>characters can also be escaped by multiple escape (vertical bars).
> >>
> >>How should I resolve this contradiction, and what was the intended
> >>behavior?
> > I think 1.5.1.4.1 is your friend here.  How you interpret "more
> > specific" might be different than how I interpret it.  It seems,
> > though, that the first-mentioned paragraph, 2.1.4.3, is the more
> > specific and thus applies.  It might be hard for the invalid
> > constituent characters to be part of a name anyway, since if they
> > became alphabetic per the reader algorithm, how do they print?
> > The only questions that would remain for me are why the explicit
> > exclusivity is given in 2.1.4.3 in the first place, and also, what
> > should happen when _both_ single-escape and multiple-escape are
> > present?
> 
> Two comments:
> 
> (1) I've tried to use the 1.5.1.4.1 argument before in another context,
> and it was argued that it doesn't apply if, as here, one of the
> two interpretations is not an exceptional situation (this was with
> clisp).

This doesn't follow from the example itself in 1.5.1.4.1.1 (yes,
being an example, not part of the spec, per 1.4.3, but interesting
in and of itself).  The example shows a conflict between an
"exceptional situation" and an explicitly undefined situation.
Is an explicitly undefined situation an "exceptional situation"?
Well, unfortunately, the term "Exceptional Situation" is not given
a definition.  However, the closest thing that comes to defining
the term is 1.4.4.10, which describes the section of a dictionary
entry by that name.  All cases that are described as showing up or
not showing up in that section are described as being explicitly
detected, handled, or signalled by some code.  So it is clear in at
least this case (i.e. 1.5.1.4.1) that the term "exceptional
situation" is not being used to describe the same thing as an
undefined situation.  Let's then back off and try construing the
term: what is "exceptional" in its generic sense?  It is a situation
where the rule is not followed; i.e. "the exception to the rule".
But if we take that point of view, then _any_ situation where the
spec describes two conflicting behaviors is in fact describing
an exception.  At any rate, it is hard to argue definitionally
using an undefined concept.

I'd be interested to hear in what situation the argument against
1.5.1.4.1 was used...

> (2) *Every* lisp I've tried this on, with no exceptions, accepts
> multiple-escaped invalid characters, instead of signalling a read
> error.  This includes Allegro.  What's more, whitespace characters
> have the invalid character trait, so this would also appear to
> rule out using | ... | for symbol names containing spaces -- and
> there are examples in the standard that do just that (yes, I know
> examples are not normative.)

Right.  But not only that, consider what Allegro CL does for the
Backspace character in these situations (the reason for my bringing
up the question about combinations).  I have verified that each symbol
shown is indeed what it appears to be; there are no hidden characters
in the strings:

CL-USER(1): '|abc|
|ac|
CL-USER(2): '|ab\c|
|abc|
CL-USER(3): 'ab\c
ABC
CL-USER(4): 

Finally, note that a single-escaped space prints with multiple-escape
syntax, regardless of the value of *print-readably*; I suspect that
all lisps do this in order to maintain print/read integrity for this
common case at least:

CL-USER(1): 'ab\ c
|AB C|
CL-USER(2): '|AB C|
|AB C|
CL-USER(3): 
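
The print/read integrity point can be checked without depending on any one printer's choice of escape syntax. A minimal sketch, assuming a conforming implementation with *package* bound to CL-USER; PRINTED-FORM is a hypothetical helper. The printed text may differ between implementations, but either form should read back as the same symbol whatever *print-readably* is bound to.

```lisp
(defun printed-form (symbol readably)
  "Return SYMBOL as PRIN1 prints it with *PRINT-READABLY* bound to READABLY."
  (let ((*print-readably* readably))
    (prin1-to-string symbol)))

(let ((sym (read-from-string "ab\\ c")))   ; the symbol named "AB C"
  (dolist (readably '(nil t))
    (let ((printed (printed-form sym readably)))
      ;; The exact escape syntax is implementation-dependent; only the
      ;; round trip through READ-FROM-STRING is checked here.
      (format t "*print-readably* ~S: ~A round-trips: ~A~%"
              readably printed (eq sym (read-from-string printed))))))
```
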

I repeat:

| Nasty, non-intuitive business.


From: Paul F. Dietz
Subject: Re: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <5_6dnczQBPHoXEXcRVn-qQ@dls.net>
Duane Rettig wrote:

>>(1) I've tried to use the 1.5.1.4.1 argument before in another context,
>>and it was argued that it doesn't apply if, as here, one of the
>>two interpretations is not an exceptional situation (this was with
>>clisp).
> 
> 
> This doesn't follow from the example itself in 1.5.1.4.1.1 (yes,
> being an example, not part of the spec, per 1.4.3, but interesting
> in and of itself).  The example shows a conflict between an
> "exceptional situation" and an explicitly undefined situation.
> Is an explicitly undefined situation an "exceptional situation"?
> Well, unfortunately, the term "Exceptional Situation" is not given
> a definition.  However, the closest thing that comes to defining
> the term is 1.4.4.10, which describes the section of a dictionary
> entry by that name.  All cases that are described as showing up or
> not showing up in that section are described as being explicitly
> detected, handled, or signalled by some code.  So it is clear in at
> least this case (i.e. 1.5.1.4.1) that the term "exceptional
> situation" is not being used to describe the same thing as an
> undefined situation.

There are examples of 'is undefined' showing up in the
Exceptional Situations section of dictionary entries.  See:
WITH-HASH-TABLE-ITERATOR, GET-OUTPUT-STREAM-STRING,
WITH-OUTPUT-TO-STRING, PPRINT-EXIT-IF-LIST-EXHAUSTED, FORMAT,
ED, COMPILE, COMPILER-MACRO-FUNCTION, MACRO-FUNCTION, THE,
TYPEP, FMAKUNBOUND, LOOP-FINISH, SLOT-BOUNDP, SLOT-MAKUNBOUND,
WITH-ACCESSORS, WITH-SLOTS, DEFSTRUCT, WITH-PACKAGE-ITERATOR,
and UNEXPORT.

> I'd be interested to hear in what situation the argument against
> 1.5.1.4.1 was used...

I unfortunately don't remember what it was.


> Finally, note that a single-escaped space prints with multiple-escape
> syntax, regardless of the value of *print-readably*; I suspect that
> all lisps do this in order to maintain print/read integrity for this
> common case at least:
> 
> CL-USER(1): 'ab\ c
> |AB C|
> CL-USER(2): '|AB C|
> |AB C|
> CL-USER(3): 

Not LispWorks.

CL-USER 1 > '|AB C|
AB\ C

	Paul
From: Duane Rettig
Subject: Re: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <4oeg7ez0s.fsf@franz.com>
"Paul F. Dietz" <·····@dls.net> writes:

> Duane Rettig wrote:
> 
> >>(1) I've tried to use the 1.5.1.4.1 argument before in another context,
> >>and it was argued that it doesn't apply if, as here, one of the
> >>two interpretations is not an exceptional situation (this was with
> >>clisp).
> > This doesn't follow from the example itself in 1.5.1.4.1.1 (yes,
> > being an example, not part of the spec, per 1.4.3, but interesting
> > in and of itself).  The example shows a conflict between an
> > "exceptional situation" and an explicitly undefined situation.
> > Is an explicitly undefined situation an "exceptional situation"?
> > Well, unfortunately, the term "Exceptional Situation" is not given
> > a definition.  However, the closest thing that comes to defining
> > the term is 1.4.4.10, which describes the section of a dictionary
> > entry by that name.  All cases that are described as showing up or
> > not showing up in that section are described as being explicitly
> > detected, handled, or signalled by some code.  So it is clear in at
> > least this case (i.e. 1.5.1.4.1) that the term "exceptional
> > situation" is not being used to describe the same thing as an
> > undefined situation.
> 
> There are examples of 'is undefined' showing up in the
> Exceptional Situations section of dictionary entries.  See:
> WITH-HASH-TABLE-ITERATOR, GET-OUTPUT-STREAM-STRING,
> WITH-OUTPUT-TO-STRING, PPRINT-EXIT-IF-LIST-EXHAUSTED, FORMAT,
> ED, COMPILE, COMPILER-MACRO-FUNCTION, MACRO-FUNCTION, THE,
> TYPEP, FMAKUNBOUND, LOOP-FINISH, SLOT-BOUNDP, SLOT-MAKUNBOUND,
> WITH-ACCESSORS, WITH-SLOTS, DEFSTRUCT, WITH-PACKAGE-ITERATOR,
> and UNEXPORT.

Yeah, I guess so.  It still seems like a weak argument for such
a narrow view of a phrase that isn't defined.  The closest I could
find was a definition for the term "situation", which is simply the
evaluation of a form in a specific environment.  So maybe an
exceptional situation could be an extremely good form :-)

> > I'd be interested to hear in what situation the argument against
> > 1.5.1.4.1 was used...
> 
> I unfortunately don't remember what it was.
> 
> 
> > Finally, note that a single-escaped space prints with multiple-escape
> > syntax, regardless of the value of *print-readably*; I suspect that
> > all lisps do this in order to maintain print/read integrity for this
> > common case at least:
> > 
> > CL-USER(1): 'ab\ c
> > |AB C|
> > CL-USER(2): '|AB C|
> > |AB C|
> > CL-USER(3):
> 
> 
> Not LispWorks.
> 
> CL-USER 1 > '|AB C|
> AB\ C

I would guess, though, that if you typed 'ab\ c or 'AB\ C to
LispWorks, you'd also get a consistent result.

From: Kalle Olavi Niemitalo
Subject: Re: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <87hdlzhrnw.fsf@Astalo.kon.iki.fi>
Duane Rettig <·····@franz.com> writes:

> CL-USER(1): '|abc|
> |ac|

Is this done by the reader, or by the tty interface?
(I.e. would reading from a file or a string have the same result?)
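
One way to take the terminal out of the picture is to splice the control character into a string and read from that. A sketch, assuming ASCII-compatible character codes so that (code-char 8) is Backspace; NAME-CHAR-CODES is a hypothetical helper.

```lisp
(defun name-char-codes (source)
  "Read one symbol from the string SOURCE; return the char codes of its name."
  (map 'list #'char-code (symbol-name (read-from-string source))))

(let ((bs (code-char 8)))                 ; ASCII Backspace (assumed)
  ;; Multiple escape: case is preserved and the Backspace is kept.
  (print (name-char-codes (format nil "|ab~Cc|" bs)))    ; => (97 98 8 99)
  ;; Single escape: only the Backspace is escaped; a, b, c are upcased.
  (print (name-char-codes (format nil "ab\\~Cc" bs))))   ; => (65 66 8 67)
```

No line editing is involved, so any difference from the REPL transcript would point at the terminal rather than the reader.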
From: Duane Rettig
Subject: Re: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <41xd2cxkb.fsf@franz.com>
Kalle Olavi Niemitalo <···@iki.fi> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > CL-USER(1): '|abc|
> > |ac|
> 
> Is this done by the reader, or by the tty interface?
> (I.e. would reading from a file or a string have the same result?)

Ah, yes; good catch:

CL-USER(1): (shell "cat foo.cl")

(defparameter *w* 'abc)
(defparameter *x* '|abc|)
(defparameter *y* 'ab\c)
(defparameter *z* '|ab\c|)

0
CL-USER(2): :ld foo.cl
; Loading /tmp_mnt/net/gemini/home/duane/foo.cl
CL-USER(3): *w*
ABC
CL-USER(4): *x*
|abc|
CL-USER(5): *y*
ABC
CL-USER(6): *z*
|abc|
CL-USER(7): 

From: Pascal Bourguignon
Subject: Re: Constituent characters with constituent trait 'invalid' -- multiple escape ok?
Message-ID: <87ekh1a5cl.fsf@thalassa.informatimago.com>
Duane Rettig <·····@franz.com> writes:

> Kalle Olavi Niemitalo <···@iki.fi> writes:
> 
> > Duane Rettig <·····@franz.com> writes:
> > 
> > > CL-USER(1): '|abc|
> > > |ac|
> > 
> > Is this done by the reader, or by the tty interface?
> > (I.e. would reading from a file or a string have the same result?)
> 
> Ah, yes; good catch:
> 
> CL-USER(1): (shell "cat foo.cl")
> 
> (defparameter *w* 'abc)
> (defparameter *x* '|abc|)
> (defparameter *y* 'ab\c)
> (defparameter *z* '|ab\c|)
> 
> 0
> CL-USER(2): :ld foo.cl
> ; Loading /tmp_mnt/net/gemini/home/duane/foo.cl
> CL-USER(3): *w*
> ABC
> CL-USER(4): *x*
> |abc|
> CL-USER(5): *y*
> ABC
> CL-USER(6): *z*
> |abc|
> CL-USER(7): 

Same question applies to cat. Try: (shell "cat -A foo.cl")
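
Within Lisp itself, a rough analogue of cat -A is to show a symbol name with control characters in caret notation. This is a sketch with a hypothetical helper, VISIBLE-NAME; unlike cat -A it only handles codes 0 to 31 and 127 (assuming ASCII-compatible codes), and it adds no $ end-of-line markers or M- notation.

```lisp
(defun visible-name (symbol)
  "Return SYMBOL's name with ASCII control characters in caret notation,
e.g. Backspace as ^H and Rubout as ^?, roughly as cat -A shows them."
  (with-output-to-string (out)
    (loop for ch across (symbol-name symbol)
          for code = (char-code ch)
          do (cond ((< code 32)
                    (write-char #\^ out)
                    (write-char (code-char (+ code 64)) out))
                   ((= code 127) (write-string "^?" out))
                   (t (write-char ch out))))))

;; An uninterned symbol whose name hides a Backspace between b and c.
(print (visible-name (make-symbol (format nil "ab~Cc" (code-char 8)))))
```

Applied to *x* or *z* in the session above, this would show directly whether those names contain a Backspace, independent of how the terminal renders them.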

-- 
__Pascal_Bourguignon__               _  Software patents are endangering
()  ASCII ribbon against html email (o_ the computer industry all around
/\  1962:DO20I=1.100                //\ the world http://lpf.ai.mit.edu/
    2001:my($f)=`fortune`;          V_/   http://petition.eurolinux.org/