Reader problem, : disappears

From: André Næss
Subject: Reader problem, : disappears
Date: Wed, 14 May 2003 02:16:26 +0000
Message-ID: <b9s8t8$a0s$1@maud.ifi.uio.no>

I'm playing around with the Lisp reader in CMUCL, and I've run into a
strange problem. I've made < a macro-char, using something that starts like
this:

(set-macro-character 
 #\<
 #'(lambda (stream char)
   ;; At this point, I start doing stuff with stream

However, when I get to the commented point I have a small piece of code
which uses peek-char to look at the stream and decide what to do. If I use
read-from-string on something like "<p <:id para-1> A paragraph>",
everything works just swell. When entering the second < I see the : using
peek-char, and I can do what I want to do (which you've probably guessed by
now is to determine if what I'm looking at is an attribute or a tag).

But when I then go to type the above directly into the interactive lisp, the
: simply disappears. If I print both char and the result from peek-char I
get the values
<
i
when the reader enters the <:id para-1> tag, but I was expecting
<
:
Also, if I type
<p <::id para-1> A paragraph>
the result is precisely the same as when I used read-from-string, so only
the first : is being kidnapped.

I'm guessing the : has some sort of special meaning (it is, after all, used
for keyword arguments), but I'm fairly new to Lisp so I don't quite
understand what's going on. I also find it very strange that
read-from-string behaves differently from typing the same text directly
into the interactive session. The reader is being used in both cases, right?

I would be very grateful for some clarification :)

André Næss

From: Christian Lynbech
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 15:38:04 +0000
Message-ID: <ofsmrh1qub.fsf@situla.ted.dk.eu.ericsson.se>

>>>>> "André" == André Næss <·······················@ifi.uio.no> writes:

André> I'm guessing the : has some sort of special meaning

Yep.

I am not quite able to decipher what is going wrong for you; more
context is needed, but let me list some suspicions (warning: I am no
readtable expert, but I have played with it some).

First of all you need to be aware of the distinction between
terminating and non-terminating macro characters:

    (read-from-string "aa#aa") => AA#AA   ; # is non-terminating

    (read-from-string "aa'aa") => AA      ; ' is terminating
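
A minimal sketch of the same distinction with a user-defined macro
character. The names MARKER-READER and READ-ALL-WITH are made up for
this illustration, and #\! is used only because it has no standard
reader syntax; everything happens on a private copy of the standard
readtable.

```lisp
(defun marker-reader (stream char)
  "Macro function that consumes nothing further and returns :MARKER."
  (declare (ignore stream char))
  :marker)

(defun read-all-with (non-terminating-p string)
  "Read every object in STRING with #\! installed as a macro character."
  (let ((*readtable* (copy-readtable nil)))      ; fresh standard syntax
    (set-macro-character #\! #'marker-reader non-terminating-p)
    (with-input-from-string (in string)
      ;; The stream itself serves as a unique end-of-file marker.
      (loop for obj = (read in nil in)
            until (eq obj in)
            collect obj))))

(read-all-with nil "aa!bb")   ; terminating     => (AA :MARKER BB)
(read-all-with t   "aa!bb")   ; non-terminating => (AA!BB)
(read-all-with t   "aa !bb")  ; at token start  => (AA :MARKER BB)
```

A non-terminating macro character only triggers at the start of a
token; inside a token it is read like any other constituent.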

Secondly you should remember that `read' may be doing rather
sophisticated stuff before it returns to you. In

    (read-from-string "aa:aa")

you get an error from inside the reader because it has grabbed the
token `aa:aa' and is trying to look up the package `aa' before
returning the symbol.

Lastly, as a slight variation on the above theme, you should remember
that `read' is recursive such that

    (read-from-string "((aa aa) (bb bb))")

actually does a lot of reading before returning, so a new macro
character function must be sure not to get confused by such nested
recursion or by being invoked several times.

Generally, staring at the hyperspec entries for `set-macro-character'
and friends will be of great help.

Here is a snippet of code that turns a number of characters into
terminating macro characters turning the char into the equivalent
symbol (I am in no way claiming that this is a recommended way of
achieving the effect):

    (defvar *mib-readtable* (copy-readtable))
    (defun standalone-char-reader (s c) 
      (declare (ignore s)) 
      (let ((b (make-string 1 :initial-element c)))
        (intern b)))
    (map nil 
      #'(lambda (c) 
          (set-macro-character c #'standalone-char-reader nil *mib-readtable*))
      "{}[];:,")

such that

    (let ((*readtable* *mib-readtable*)) 
      (read-from-string "(aa;bb cc:dd {abc} [a])"))

will yield

    (AA |;| BB CC |:| DD { ABC } [ A ])


------------------------+-----------------------------------------------------
Christian Lynbech       | christian ··@ defun #\. dk
------------------------+-----------------------------------------------------
Hit the philistines three times over the head with the Elisp reference manual.
                                        - ·······@hal.com (Michael A. Petonic)

From: André Næss
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 16:18:50 +0000
Message-ID: <b9tq8i$d3v$1@maud.ifi.uio.no>

Christian Lynbech:

>>>>>> "André" == André Næss <·······················@ifi.uio.no> writes:
> 
> André> I'm guessing the : has some sort of special meaning
> 
> Yep.
> 
> I am not quite able to decipher what is going wrong for you, more
> context is needed, but let me list some suspicions (warning: I am no
> readtable expert, but I have played with it some).

After a lot more struggling it seems that the error is with code I have
which tries to lock on to attributes. My convention is that attributes
start with '<:' (since tags start with just '<').

The problem is that to find out whether something is an attribute or a tag
I have to read the '<' and then use peek-char to look at what might be a :.
At that point the '<' may have to be unread, and it seems that this is
where something goes wrong. I've been staring at my code for hours now and
can't figure it out. I'm starting to wonder whether unread-char fails in
some rare cases when reading interactive input, as opposed to reading from
a string, but I'm not sure about this, as I have the same problem with
input coming from a file (using load). I've also found that it's not just
the : which disappears; it's always the char after the one which is
supposed to be unread, so the problem is somewhere in this area ... :)

I'm just gonna have to stare at it a little longer... I'll try to see if I
can create an example which visualizes the problem without all my
supporting code, but right now my entire reader is broken :P

Thanks for the input. Your code actually taught me a couple of tricks which
even if they don't solve my problem do help improve my code :) (I'm very
much a newbie!)

André Næss

From: Tim Bradshaw
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 16:49:10 +0000
Message-ID: <ey3ptml1njt.fsf@cley.com>

* André Næss wrote:

> Problem is, to find out if something is an attribute or a tag I have
> to read the '<' then use peek-char to look at what might be a
> :. What might happen now is that '<' must be unread, and it seems
> that this is where something goes wrong.

If you are doing what it sounds like, namely reading #\<, peeking at
#\: and then trying to unread *both*, you can't.  You can only unread
one character, including ones you just peek at.  See the definition of
UNREAD-CHAR.

--tim

From: André Næss
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 18:39:59 +0000
Message-ID: <b9u2h5$dij$1@maud.ifi.uio.no>

Tim Bradshaw:

> * André Næss wrote:
> 
>> Problem is, to find out if something is an attribute or a tag I have
>> to read the '<' then use peek-char to look at what might be a
>> :. What might happen now is that '<' must be unread, and it seems
>> that this is where something goes wrong.
> 
> If you are doing what it sounds like, namely reading #\<, peeking at
> #\: and then trying to unread *both*, you can't.  You can only unread
> one character, including ones you just peek at.  See the definition of
> UNREAD-CHAR.

Hmm... If the start of the stream is '<:attr', I do a read-char and store
the result in a variable c, so c has the value #\<. I then do peek-char on
the stream (which is now ':attr') and find a ':'. Are you saying that at
this point I can't unread the '<', i.e. (unread-char c stream)? Does a
peek-char make it impossible to unread the last character you read using
read-char? I tried reading the manual entry but I'm not sure I understand
it, and I also believe I have done this successfully in other attempts.

André Næss

From: Fred Gilham
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 20:25:13 +0000
Message-ID: <u7bry5p97a.fsf@snapdragon.csl.sri.com>

From the Hyperspec on unread-char:

    Invoking peek-char or read-char commits all previous
    characters. The consequences of invoking unread-char on any
    character preceding that which is returned by peek-char (including
    those passed over by peek-char that has a non-nil peek-type) are
==> unspecified. In particular, the consequences of invoking
==> unread-char after peek-char are unspecified.

Clearly your example is invoking unspecified behavior.  So even if it
works sometimes, it isn't guaranteed to work all the time.  I believe
there was a recent claim in this newsgroup that one possible outcome
of invoking unspecified behavior was the emission of maleficent
supernatural beings from one's nasal passages.

-- 
Fred Gilham                                   ······@csl.sri.com
I was storing data in every conceivable way, including keeping a chain
of sound waves running between the speaker and the microphone. There
was no more memory left to be had....

From: Kent M Pitman
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 18:47:13 +0000
Message-ID: <sfwk7ctgyby.fsf@shell01.TheWorld.com>

André Næss <·······················@ifi.uio.no> writes:

> Tim Bradshaw:
> 
> > * André Næss wrote:
> > 
> >> Problem is, to find out if something is an attribute or a tag I have
> >> to read the '<' then use peek-char to look at what might be a
> >> :. What might happen now is that '<' must be unread, and it seems
> >> that this is where something goes wrong.
> > 
> > If you are doing what it sounds like, namely reading #\<, peeking at
> > #\: and then trying to unread *both*, you can't.  You can only unread
> > one character, including ones you just peek at.  See the definition of
> > UNREAD-CHAR.
> 
> Hmm... If the start of the stream is '<:attr', I do a read-char and store it
> in a variable c. So c has the value #\<. I then do peek-char on the stream
> (which is now ':attr'), and find a ':'. Are you saying that at this point I
> can't unread the '<'? 

Yes.

> i.e.: (unread-char c stream) Does a peek-char make it
> impossible to unread the last character you read using read-char?

Yes, it's stream-specific, but it's allowed to and often does.

peek-char is permitted to be implemented as read-char+unread-char,
and a stream doesn't have to have more than a one-character putback buffer.

> I tried
> reading the manual entry but I'm not sure if I understand it, and I also
> believe I have done this successfully in other attempts.

Now you're trying to derive language semantics from implementation.
It may work in some string-stream implementations, for example, if all
that unread-char does is decrement a string index.  It may even work on
some streams if there's a "current buffer" and unread-char can work by
just backing up an index in that buffer.  But at the beginning of the
buffer, it isn't going to "recall the previous buffer", so it might fail
sometimes and succeed sometimes.  It depends on the implementation.
You, as a user, are not allowed to rely on your belief that it will be
implemented a certain way.
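
To make the one-character putback buffer concrete, here is a toy model
of such a stream. This is only a sketch under stated assumptions: the
TOY- names are invented for the illustration, it is not how CMUCL or
any other implementation is actually written, and it simply lets the
second putback overwrite the first, which is one of the outcomes the
standard leaves open.

```lisp
(defstruct toy-stream
  (text "" :type string)   ; the remaining input lives in a string
  (pos 0 :type fixnum)     ; index of the next fresh character
  (slot nil))              ; the single putback slot (assumption)

(defun toy-read-char (s)
  "Return the putback character if there is one, else the next fresh one."
  (cond ((toy-stream-slot s)
         (prog1 (toy-stream-slot s)
           (setf (toy-stream-slot s) nil)))
        ((< (toy-stream-pos s) (length (toy-stream-text s)))
         (prog1 (char (toy-stream-text s) (toy-stream-pos s))
           (incf (toy-stream-pos s))))
        (t nil)))

(defun toy-unread-char (c s)
  ;; Only one slot: a second putback silently overwrites the first.
  (setf (toy-stream-slot s) c)
  nil)

(defun toy-peek-char (s)
  "PEEK-CHAR modelled as READ-CHAR followed by UNREAD-CHAR."
  (let ((c (toy-read-char s)))
    (when c (toy-unread-char c s))
    c))

(defun demo-lost-char ()
  (let* ((s (make-toy-stream :text "<:id para-1>"))
         (c (toy-read-char s))        ; #\<
         (next (toy-peek-char s)))    ; #\: now sits in the slot
    (toy-unread-char c s)             ; overwrites the #\:
    (list next (toy-read-char s) (toy-read-char s))))

(demo-lost-char)   ; => (#\: #\< #\i)
```

In this model the character after the unread one is the one that goes
missing, which matches the symptom described earlier in the thread.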

From: André Næss
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 19:42:15 +0000
Message-ID: <b9u65t$doo$1@maud.ifi.uio.no>

Kent M Pitman:

> peek-char is permitted to be implemented as read-char+unread-char,
> and a stream doesn't have to have more than a one-character putback
> buffer.

Ok I see.

>> I tried
>> reading the manual entry but I'm not sure if I understand it, and I also
>> believe I have done this successfully in other attempts.
> 
> Now you're trying to derive language semantics from implementation.
> It may work in some string-stream implementations, for example, if all
> that unread-char does is decrement a string index.  It may even work on
> some streams if there's a "current buffer" and unread-char can work by
> just backing up an index in that buffer.  But at the beginning of the
> buffer, it isn't going to "recall the previous buffer", so it might fail
> sometimes and succeed sometimes.  It depends on the implementation.
> You, as a user, are not allowed to rely on your belief that it will be
> implemented a certain way.

Now I see, but it's a real shame because it simplifies my reader a lot. I
guess I'll just have to figure out a different solution then :(

André Næss

From: Kent M Pitman
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 19:44:59 +0000
Message-ID: <sfwel31fh38.fsf@shell01.TheWorld.com>

André Næss <·······················@ifi.uio.no> writes:

> [multi-char putback onto streams]
> No I see, but it's a real shame because it simplifies my reader a lot. I
> guess I'll just have to figure out a different solution then :(

Make your own stream class for which you guarantee the putback behavior:
one that has arbitrary (or even just n-character, for some acceptable
value of n) putback buffering.  Then always do your parsing through such
a stream.

From: André Næss
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 21:10:16 +0000
Message-ID: <b9ubat$e5h$1@maud.ifi.uio.no>

Kent M Pitman:

> André Næss <·······················@ifi.uio.no> writes:
> 
>> [multi-char putback onto streams]
>> No I see, but it's a real shame because it simplifies my reader a lot. I
>> guess I'll just have to figure out a different solution then :(
> 
> Make your own stream for which you guarantee the putback behavior by
> allocating your own stream that has arbitrary (or even just
> n-character, for some acceptable value of n) putback buffering, and
> then always do your parsing through such a stream.

Currently I've simply made '<' a macro-character, so whenever I find a '<'
in the input string I use my own routine. All other data is handled by the
default reader, so I can type things interactively, read from a file or
read from a string, without doing anything special which is really neat.

If I was to use my own stream I wouldn't be able to use the internal reader
directly? Or am I missing something here? I mean if I type something
interactively then it's already on a default stream, and I can't change
this? (Or I imagine I can -- this being lisp and all -- but it would not be
as simple as the current solution?).

André Næss

From: Kent M Pitman
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 21:21:32 +0000
Message-ID: <sfwznlp6x7n.fsf@shell01.TheWorld.com>

André Næss <·······················@ifi.uio.no> writes:

> If I was to use my own stream I wouldn't be able to use the internal reader
> directly? Or am I missing something here? I mean if I type something
> interactively then it's already on a default stream, and I can't change
> this? (Or I imagine I can -- this being lisp and all -- but it would not be
> as simple as the current solution?).

That depends on whether the stream would do a complete read and leave the
reader in a consistent state.  Let's say you make a stream that permits
two characters of putback.  At the end, it had better not have more than one
character in it, and the parser you call had better tell the stream it's
done, so that the stream can give the underlying stream the last character.

If those constraints are satisfied, nothing keeps you from doing:

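 (defun dispatch-handler (stream char arg)
   ;; CHAR and ARG are part of the dispatch-macro calling convention
   ;; but are not needed here.
   (declare (ignore char arg))
   (let ((encapsulated-stream 
           (make-instance 'my-stream :underlying-stream stream)))
     (prog1 (parse-using-my-stream encapsulated-stream)
            (evacuate-stream-to-underlying-stream encapsulated-stream))))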

The two operations PARSE-USING-MY-STREAM and
EVACUATE-STREAM-TO-UNDERLYING-STREAM are ones you'd have to write.  And
I've assumed you either know how to make a stream, or can look it up.
It's implementation dependent anyway, but most implementations support
either Gray Streams or Simple Streams or both, and those protocols are
ones for which doc and examples are available.
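
As a portable illustration of the bookkeeping such a stream needs, here
is a sketch that uses a plain structure instead of a real stream class.
The PUSHBACK structure and the PB- functions are hypothetical names,
and because this is not a Gray or Simple stream it cannot be handed to
READ; it only shows the putback buffer and the final "evacuate" step.

```lisp
(defstruct (pushback (:constructor make-pushback (underlying)))
  underlying        ; the real input stream
  (buffer '()))     ; pushed-back characters, most recent first

(defun pb-read-char (pb)
  "Return the next character, or NIL at end of input."
  (if (pushback-buffer pb)
      (pop (pushback-buffer pb))
      (read-char (pushback-underlying pb) nil nil)))

(defun pb-unread-char (char pb)
  "Push CHAR back; any number of characters may be pushed back."
  (push char (pushback-buffer pb))
  nil)

(defun pb-peek-char (pb)
  (let ((c (pb-read-char pb)))
    (when c (pb-unread-char c pb))
    c))

(defun pb-evacuate (pb)
  "Hand at most one leftover character back to the underlying stream.
Only valid if that character was the last one read from it."
  (let ((left (pushback-buffer pb)))
    (when (rest left)
      (error "~D characters still buffered; only one can be returned."
             (length left)))
    (when left
      (unread-char (pop (pushback-buffer pb)) (pushback-underlying pb)))
    nil))

(defun demo-two-char-putback ()
  (with-input-from-string (in "<:id para-1>")
    (let* ((pb (make-pushback in))
           (c (pb-read-char pb))        ; #\<
           (next (pb-peek-char pb)))    ; #\:
      (pb-unread-char c pb)             ; safe: the buffer holds both
      (list next (pb-read-char pb) (pb-read-char pb) (pb-read-char pb)))))

(demo-two-char-putback)   ; => (#\: #\< #\: #\i)
```

Here the #\: survives the second putback because the wrapper, not the
underlying stream, is responsible for remembering both characters.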

From: André Næss
Subject: Re: Reader problem, : disappears
Date: Thu, 15 May 2003 07:43:54 +0000
Message-ID: <b9vges$h1r$2@maud.ifi.uio.no>

Kent M Pitman:

> André Næss <·······················@ifi.uio.no> writes:
> 
>> If I was to use my own stream I wouldn't be able to use the internal
>> reader directly? Or am I missing something here? I mean if I type
>> something interactively then it's already on a default stream, and I
>> can't change this? (Or I imagine I can -- this being lisp and all -- but
>> it would not be as simple as the current solution?).
> 
> That depends on whether the stream would do a complete read and leave the
> reader in a consistent state.  Let's say you make a reader that permits
> two characters of putback.  At the end, it better not have more than one
> character in it, and the parser you call had better tell the stream it's
> done, so that the stream can give the underlying stream the last
> character.
> 
> If those constraints are satisfied, nothing keeps you from doing:
> 
>  (defun dispatch-handler (stream char arg)
>    (let ((encapsulated-stream
>            (make-instance 'my-stream :underlying-stream stream)))
>      (prog1 (parse-using-my-stream encapsulated-stream)
>             (evacuate-stream-to-underlying-stream encapsulated-stream))))
> 
> These two operations are ones you'd have to write.  And I've assumed you
> either know how to make a stream, or can look it up.  It's implementation
> dependent anyway, but most implementations support either Gray Streams
> or Simple Streams or both, and those protocols are ones for which
> doc and examples are available.

Ok thanks, I'll look into it.

André Næss

From: Barry Margolin
Subject: Re: Reader problem, : disappears
Date: Wed, 14 May 2003 18:49:03 +0000
Message-ID: <zcwwa.16$of5.702@paloalto-snr1.gtei.net>

In article <············@maud.ifi.uio.no>,
André Næss  <·······················@ifi.uio.no> wrote:
>Hmm... If the start of the stream is '<:attr', I do a read-char and store it
>in a variable c. So c has the value #\<. I then do peek-char on the stream
>(which is now ':attr'), and find a ':'. Are you saying that at this point I
>can't unread the '<'? i.e.: (unread-char c stream) Does a peek-char make it
>impossible to unread the last character you read using read-char? I tried
>reading the manual entry but I'm not sure if I understand it, and I also
>believe I have done this successfully in other attempts.

The CLHS says "The consequences of invoking UNREAD-CHAR on any character
preceding that which is returned by PEEK-CHAR ... are unspecified.  In
particular, the consequences of invoking UNREAD-CHAR after PEEK-CHAR are
unspecified."

Basically, you should treat PEEK-CHAR as equivalent to doing READ-CHAR
followed by UNREAD-CHAR.  So calling UNREAD-CHAR after PEEK-CHAR is like
calling UNREAD-CHAR twice in a row, which is not permitted.

-- 
Barry Margolin, ··············@level3.com
Genuity Managed Services, a Level(3) Company, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Christian Lynbech
Subject: Re: Reader problem, : disappears
Date: Thu, 15 May 2003 12:31:58 +0000
Message-ID: <ofznlo5r29.fsf@situla.ted.dk.eu.ericsson.se>

>>>>> "André" == André Næss <·······················@ifi.uio.no> writes:

André> What might happen now is that '<' must be unread

Perhaps you should attack this part. Could you explain more about what
the reader should return for "<aaa" and "<:bbb" and why the unread is
necessary?

To illustrate, suppose that I wanted both tags and attributes to parse
as symbols but in two distinct packages, I could do a reader macro
like:

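(defun standalone-char-reader (s c) 
  (declare (ignore c)) 
  ;; #\< has already been consumed by the reader, so a single
  ;; peek-char is enough to tell attributes from tags; nothing
  ;; needs to be unread.
  (if (eql (peek-char t s) #\:)
      (progn 
        (read-char s)               ; drop the #\:
        (intern (symbol-name (read s nil nil t)) (find-package :attributes)))
      (intern (symbol-name (read s nil nil t)) (find-package :tags))))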

(set-macro-character #\< #'standalone-char-reader nil *mib-readtable*)

(let ((*readtable* *mib-readtable*))
  (ignore-errors (make-package :tags))
  (ignore-errors (make-package :attributes))
  (list (read-from-string "<aaa") (read-from-string "<:bbb")))

Do not forget that the Common Lisp reader is very powerful but not
without limitations! 

In particular, if you feel a need to backtrack perhaps you are trying
to do something that is too complicated to live entirely in the
reader.

Either you should try to merge the various actions into one or you
should make the reader a lexer and leave the sophisticated parsing for
a real parser. In other words, rather than teaching the reader about
the full differences between tags and attributes you could have one reader
macro turning "<:" into "( :attribute", "<[^:]" into "( :tag" and ">"
into ")" such that

    (read-from-string "<aaa>")  => ( :TAG AAA )
    (read-from-string "<:bbb>")  => ( :ATTRIBUTE BBB )

and then let a real parser (e.g. a Baker style parser) postprocess the
output of the reader.

That was a very contrived example (and not very easily implementable)
but hopefully indicative of what I am thinking about.

                               -- CHLY

From: André Næss
Subject: Re: Reader problem, : disappears
Date: Thu, 15 May 2003 13:12:51 +0000
Message-ID: <ba03nj$idm$1@maud.ifi.uio.no>

Christian Lynbech:

>>>>>> "André" == André Næss <·······················@ifi.uio.no> writes:
> 
> André> What might happen now is that '<' must be unread
> 
> Perhaps you should attack this part. Could you explain more about what
> the reader should return of "<aaa" and "<:bbb" and why the unread is
> necessary?
> 
> To illustrate, suppose that I wanted both tags and attributes to parse
> as symbols but in two distinct packages, I could do a reader macro
> like:
> 
> (defun standalone-char-reader (s c)
>   (declare (ignore c))
>   (if (eql (peek-char t s) #\:)
>       (progn
>         (read-char s)
>         (intern (symbol-name (read s nil nil t)) (find-package :attributes)))
>       (intern (symbol-name (read s nil nil t)) (find-package :tags))))
> 
> (set-macro-character #\< #'standalone-char-reader nil *mib-readtable*)
> 
> (let ((*readtable* *mib-readtable*))
>   (ignore-errors (make-package :tags))
>   (ignore-errors (make-package :attributes))
>   (list (read-from-string "<aaa") (read-from-string "<:bbb")))

I'm not sure I understand what's going on here; I haven't really looked at
packages yet.

> Do not forget that the Common Lisp reader is very powerful but not
> without limitations!
> 
> In particular, if you feel a need to backtrack perhaps you are trying
> to do something that is too complicated to live entirely in the
> reader.
> 
> Either you should try to merge the various actions into one or you
> should make the reader a lexer and leave the sophisticated parsing for
> a real parser. In other words, rather than teaching the reader about
> the full differences between tags and attributes you could have one reader
> macro turning "<:" into "( :attribute", "<[^:]" into "( :tag" and ">"
> into ")" such that
> 
>     (read-from-string "<aaa>")  => ( :TAG AAA )
>     (read-from-string "<:bbb>")  => ( :ATTRIBUTE BBB )
> 
> and then let a real parser (eg. a Baker style parser) postprocess the
> output of the reader.
> 
> That was a very contrived example (and not very easily implementable)
> but hopefully indicative of what I am thinking about.
> 
>                                -- CHLY

Ok what I'm trying to do isn't very complicated (I think). I want basically
what your example shows, <aaa> is a tag, <:bbb> is an attribute. Then I
have constructs like this:

<p <:class simple> A paragraph>

But I also allow these:

<p <b Text in bold>>
<p A paragraph>
<p (when (eq (language 'english)) <:class "english">)>
<p 2 + 2 equals (+ 2 2)>

Currently my reader is invoked when a '<' is seen. At that point I produce
a tag if what is now at the start of the stream is something other than a
':', and an attribute otherwise, so the first thing that happens when a '<'
is seen is that I go on to parse it as an attribute or a tag. For
simplicity I wanted this as the only entry point, as it made the rest of
the process very simple. But because attributes are attributes of tags and
not content, I have to handle them specially. So after I've found that
something is a tag, I go into a loop where I look at the beginning of the
next element, and if it starts with <: I invoke the reader to parse it. To
be able to do this I have to put the '<' back on the stream, so that the
reader will read the '<', enter my procedure, see the : and parse the
attribute. This allows me to use the same procedure to parse both
attributes and tags, which seemed simple at first.

The reader itself simply produces a macro call, either a (make-tag ...) or
a (make-attribute ...) form. The problem with attributes is that they have
to become part of the enclosing tag, so I can't close the tag until I'm
done with them. This all works now as long as I use read-from-string,
clearly because the implementation there allows me to use the unread
trick, but as Kent Pitman and others pointed out I'm relying on unspecified
behavior, so I have to fix it.
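
One possible restructuring that avoids unread-char altogether is
sketched below. It is an assumption-laden illustration, not the actual
code: READ-ANGLE, ATTRIBUTE-FORM-P and MAKE-ANGLE-READTABLE are made-up
names, and the argument layout of the make-tag and make-attribute forms
is guessed. The idea is to let read-delimited-list read everything up
to the closing '>' (nested '<' elements re-enter the macro function
through the reader itself) and to separate attributes from content only
afterwards.

```lisp
(defun attribute-form-p (x)
  "True if X is a form produced for a <:name ...> element."
  (and (consp x) (eq (first x) 'make-attribute)))

(defun read-angle (stream char)
  (declare (ignore char))
  ;; #\< is already consumed; one PEEK-CHAR tells attribute from tag.
  (let ((attribute-p (eql (peek-char nil stream t nil t) #\:)))
    (when attribute-p
      (read-char stream t nil t))            ; drop the #\:
    (let ((items (read-delimited-list #\> stream t)))
      (if attribute-p
          `(make-attribute ,@items)
          ;; Attributes are pulled out of the children after the fact.
          (let ((attrs (remove-if-not #'attribute-form-p (rest items)))
                (body (remove-if #'attribute-form-p (rest items))))
            `(make-tag ,(first items) ,attrs ,@body))))))

(defun make-angle-readtable ()
  (let ((rt (copy-readtable nil)))
    (set-macro-character #\< #'read-angle nil rt)
    ;; Give #\> the syntax of #\) so that it terminates tokens.
    (set-macro-character #\> (get-macro-character #\) nil) nil rt)
    rt))

(let ((*readtable* (make-angle-readtable)))
  (read-from-string "<p <:class simple> A paragraph>"))
;; => (MAKE-TAG P ((MAKE-ATTRIBUTE CLASS SIMPLE)) A PARAGRAPH)
```

One trade-off of this sketch: with '<' and '>' as terminating macro
characters, symbols such as < and >= can no longer be typed while that
readtable is in effect.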

I guess I could implement my own stream as proposed by Kent Pitman, but
the more I think about it, the more I suspect I might get by if I
restructure my code a little. Currently I'm struggling a bit because of the
many backticks and commas and the fact that this is the first time I've
written macros. There's also the well-known lack of time :)

I have implemented all of this using a preprocessor, but I reckon that doing
it all inside the Lisp reader would be a lot simpler and cleaner, and so
far that has been true. It has also proven to be a great exercise in
Lisp.

André Næss