read-time conditions

From: Erik Naggum
Subject: read-time conditions
Date: Fri, 09 Apr 1999 00:00:00 +0000
Message-ID: <3132649180002795@naggum.no>

  in a protocol I have designed, I use straight Lisp forms for the syntax,
  and use the Common Lisp reader to handle the input.  to make this secure
  and simple, I use a special readtable with limited capabilities.  to
  avoid a _lot_ of reimplementation, I handle conditions that crop up
  during parsing, and most of them are easy to handle correctly.  however,
  there is no restriction on the number of lines in a form, so I have to be
  able to continue to collect lines.  to ensure syntactic integrity of the
  form, I bind *READ-SUPPRESS* to T and try to read it.  if this succeeds
  without error, at least the syntactic structure of the form is intact,
  but if an END-OF-FILE error occurs, there is no meaningful distinction
  between a missing internal delimiter and a missing outermost delimiter,
  and I'm not sure I want there to be, but it means that I can't stop
  reading more lines when a human would clearly see an unterminated string
  and a properly terminated list.  this is annoying, since the system will
  just sit there and wait for more input until it times out.  however, one
  problem remains completely unsolved.

  most errors during reading can be dealt with by preventing them from
  happening in the first place.  some can't, by virtue of the integrated
  parser for symbols and numbers.  despite the rather simple protocol, I
  have to deal with package errors and numeric errors signaled in this
  reader function.  the errors are signaled in cases like these:

    FOO:BAR		(not external)
    FOO::BAR		(no package)
    34/0		(division by zero)
    1s999		(range error)

  I think an error message should be as complete as possible, so I'd like
  to report which unacceptable character string caused the error, but this
  has turned out to be excessively hard.  also, if there are more errors in
  the same list, only the first can be reported, because it's hard to move
  past individual elements.

  so, my question: should the reader signal errors of type READER-ERROR,
  only (which may be augmented to contain the actual string with the
  problem), or is it free _not_ to catch any other error that might be
  signaled by a function it calls to construct an object?

#:Erik
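
A minimal sketch of the *READ-SUPPRESS* completeness check described above. The function name FORM-STATUS and its three return keywords are hypothetical, and the sketch uses the current readtable rather than the restricted one the post mentions. It also shows the limitation raised in the post: an unterminated string and an unterminated list both surface as END-OF-FILE.

```lisp
(defun form-status (text)
  "Classify TEXT as :COMPLETE, :INCOMPLETE, or :MALFORMED without building objects."
  (handler-case
      (let ((*read-suppress* t)   ; parse the syntax only, construct nothing
            (*read-eval* nil))    ; never evaluate #. forms from the wire
        (with-input-from-string (in text)
          (read in t nil))        ; eof-error-p is true, so EOF is signaled
        :complete)
    ;; Input ran out before the form was closed: keep collecting lines.
    (end-of-file () :incomplete)
    ;; Any other syntax problem the reader itself reports.
    (reader-error () :malformed)))
```

A caller would append another line and retry for as long as the result is :INCOMPLETE, which is also why input such as (a "b) keeps the caller waiting.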


From: Howard R. Stearns
Subject: Re: read-time conditions
Date: Fri, 09 Apr 1999 00:00:00 +0000
Message-ID: <370E2C0D.EF8168FE@elwood.com>

I vaguely recall some wizard telling me that the CL reader algorithm is
designed to be very flexible in allowing extensions to Lisp expression
syntax so that things being read as Lisp code can support the new syntax
-- it is not intended to be flexible enough to handle using the CL
reader for purposes other than reading Lisp code.

If you accept this explanation, then, although I'm not completely sure,
I suspect that the current behavior, in which the reader does not have to
capture other errors, is acceptable.  This is because, for the purpose of
reading Lisp code, I'm not convinced that a divide by zero that occurs
during reading is really better reported as a reader-error.  (You are
welcome to disagree.  I don't feel that strongly about this aspect.)

However, while I can accept that the above explanation does reflect the
intent of the design, I don't accept that this is a good thing in
hindsight.  I think that, having given people a tool which everyone finds
convenient to co-opt for other purposes, we ought to just give up and
accept the fact that there is a need for some utility which provides the
capabilities that people need.

Towards this end, I hope that J13 will consider what, if anything, can
be done to make the CL reader more general and suitable for all the
purposes people seem to use it for.  

For example, one very simple thing which I found useful was to have a
(setf) READTABLE-PARSE function which, when nil, returns unparsed
strings for each token read instead of interning symbols and trying to
read numbers.  This alone gives me 90% of what I want.

A more complex change would be a protocol whereby the function which
actually implements the parser is available directly, and it would allow
different specifications of BNF (or whatever) to be created by the user
and plugged in.  A sort of parse-table analogous to read-table.  This
gives me the other 90% of what I want. (!)
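
A rough sketch of what unparsed tokens buy you, written in portable Common Lisp. It is not the READTABLE-PARSE extension mentioned above: RAW-TOKENS and BAD-TOKENS are hypothetical helpers that split on whitespace only and ignore parentheses, strings, and escapes. With the raw string kept, every offending token can be reported, not just the first.

```lisp
(defun raw-tokens (text)
  "Split TEXT on whitespace and return each token as an unparsed string."
  (with-input-from-string (in text)
    ;; PEEK-CHAR with T skips whitespace and yields NIL at end of input.
    (loop while (peek-char t in nil nil)
          collect (with-output-to-string (out)
                    (loop for char = (peek-char nil in nil nil)
                          while (and char
                                     (not (member char
                                                  '(#\Space #\Tab #\Newline))))
                          do (write-char (read-char in) out))))))

(defun bad-tokens (text)
  "Return the tokens in TEXT that the reader refuses to turn into objects."
  (remove-if-not
   (lambda (token)
     (handler-case
         (let ((*read-eval* nil))
           (read-from-string token) ; interns symbols as a side effect
           nil)
       ;; Whatever the condition type, the string is still in hand.
       (error () t)))
   (raw-tokens text)))
```

For example, (raw-tokens "FOO:BAR 34/0") returns the two strings untouched, and BAD-TOKENS can then name each one the reader rejects.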


From: Vassil Nikolov
Subject: Re: read-time conditions
Date: Fri, 09 Apr 1999 00:00:00 +0000
Message-ID: <7eln4e$3on$1@nnrp1.dejanews.com>

In article <················@naggum.no>,
  Erik Naggum <····@naggum.no> wrote:
(...)
>   so, my question: should the reader signal errors of type READER-ERROR,
>   only (which may be augmented to contain the actual string with the
>   problem), or is it free _not_ to catch any other error that might be
>   signaled by a function it calls to construct an object?

Would it be sufficient if the reader is defined to do the following:
(A) always use the _most specific_ condition type;
(B) before signalling the error, examine any handlers for that
    condition, and signal only if none are available?

In this way, in the normal case (no handlers) the reader would do its
normal signalling of errors (preventing garbage from being read (as far
as it can)), while the programmer would be able to `take over' if
necessary.

Apart from that, I agree with Howard Stearns that it would be
good to have some sort of hook into the reader, after an
extended token is consumed but before the reader attempts to
turn it into a number or symbol.  (The information about the way
the reader would treat the token (integer, float, symbol with
or without a package marker, invalid, etc.) could be made available
to the hook.  (I believe that it is cheap to determine this
in the course of accumulating the extended token.))

As to the reader evolving into a full-fledged parser, there is
something in CLtL to the effect that it was deliberately decided
not to go for such complexity, although such implementations had
been in use.  (Sorry, don't have the book handy.)

Vassil Nikolov <········@poboxes.com> www.poboxes.com/vnikolov
(You may want to cc your posting to me if I _have_ to see it.)
   LEGEMANVALEMFVTVTVM  (Ancient Roman programmers' adage.)


From: Bruno Haible
Subject: Re: read-time conditions
Date: Wed, 14 Apr 1999 00:00:00 +0000
Message-ID: <7f30lr$pfb$1@news.u-bordeaux.fr>

#\Erik wrote:
>
>    FOO:BAR		(not external)
>    FOO::BAR		(no package)
>    34/0		(division by zero)
>    1s999		(range error)
>
>  so, my question: should the reader signal errors of type READER-ERROR,
>  only ... ?

In the first two cases, it's debatable whether a READER-ERROR or a
PACKAGE-ERROR is signalled. In the last two cases, however, it's definitely
a READER-ERROR, as per CLHS section 2.3.1.1.

             Bruno                            http://clisp.cons.org/~haible/
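
One way to see what a given implementation actually signals for the four tokens is sketched below. READ-CONDITION-TYPE is a hypothetical helper and the results are implementation-dependent. Unless a package named FOO happens to exist, both FOO tokens fail for lack of a package rather than for a non-external symbol.

```lisp
(defun read-condition-type (text)
  "Return the type of the error READ-FROM-STRING signals on TEXT, or NIL if none."
  (handler-case
      (let ((*read-eval* nil))
        (read-from-string text)
        nil)
    ;; TYPE-OF gives the most specific type the implementation chose.
    (error (c) (type-of c))))

;; Print one line per token from the original post.
(dolist (token '("FOO:BAR" "FOO::BAR" "34/0" "1s999"))
  (format t "~&~10A ~S~%" token (read-condition-type token)))
```

Testing the caught condition with (typep c 'reader-error) instead of printing its type would show directly whether an implementation follows the reading given above.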