I just started playing with the reader since I need to create a CSV or
ARFF reader that treats all data in the rows as symbols. I'm not clear
on how to make a chosen character signal the end of the list. Ideally I'd
be able to read
1,2,3.14
ab,foo bar, zot
into the following list: ((1 2 3.14) (ab |foo bar| zot))
After spending hours playing with the readtable and trying to
manipulate #\newline to signal the termination of the list read, I'm
still at square one.
Any help is extremely appreciated.
Adam Ingram-Goble <·······@cs.pdx.edu> writes:
> Ideally I'd be able to read
> 1,2,3.14
> ab,foo bar, zot
>
> into the following list: ((1 2 3.14) (ab |foo bar| zot))
Not | zot|?
This seems to do at least part of the job:
;; Make a custom readtable as a copy of the standard one.
(defvar *csv-readtable* (copy-readtable nil))
;; READ-DELIMITED-LIST requires a non-whitespace character.
(set-syntax-from-char #\Newline #\) *csv-readtable*)
;; Commas separate elements in lists.
(set-syntax-from-char #\, #\Space *csv-readtable*)
;; Spaces are just graphic characters without case.
(set-syntax-from-char #\Space #\- *csv-readtable*)
;; Then, read lines like this.
(let ((*readtable* *csv-readtable*))
  (read-delimited-list #\Newline))
However, there are problems with this approach:
* Many characters in *csv-readtable* still have Lisp-specific
meanings. For example, a semicolon begins a comment. I
suppose these could be disabled by calling SET-SYNTAX-FROM-CHAR
in a loop over all characters.
* If a sequence of characters looks like a number, the reader
will treat it as such. You wrote you wanted everything to
become symbols.
* "foo:bar" becomes a symbol in a FOO package.
* "foo,,,bar" is treated as "foo,bar".
* Reaching the end of the file will signal an error. You can
handle that, but getting an error from such a necessary event
is still wrong.
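To see a couple of these problems concretely, here is a minimal sketch that rebuilds only the newline and comma changes from the readtable above (the space change is left out) and reads one line from a string stream. READ-CSV-LINE is a hypothetical helper name, not anything standard:

```lisp
;; Readtable with only the newline and comma changes from above.
(defvar *csv-readtable*
  (let ((table (copy-readtable nil)))
    ;; Newline gets the syntax of #\), so READ-DELIMITED-LIST can stop on it.
    (set-syntax-from-char #\Newline #\) table)
    ;; Commas become whitespace, i.e. mere separators.
    (set-syntax-from-char #\, #\Space table)
    table))

(defun read-csv-line (stream)
  "Hypothetical helper: read one line of STREAM as a list of Lisp objects."
  (let ((*readtable* *csv-readtable*))
    (read-delimited-list #\Newline stream)))

(with-input-from-string (in (format nil "1,foo,,,bar~%"))
  (read-csv-line in))
;; => (1 FOO BAR)
```

The 1 comes back as an integer rather than a symbol, the empty fields between the commas simply vanish, and the names are upcased by the standard readtable case.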
For these reasons, I think you should implement the parser
without using the Lisp reader. You could for example call
READ-LINE, split the result at commas (using one of the
sequence-splitting functions amply discussed here), and INTERN
each part in a package of your choosing. Or, you could read
one character at a time with READ-CHAR and VECTOR-PUSH-EXTEND
them into an adjustable string.
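A minimal sketch of the READ-LINE approach might look like this. The package name CSV-DATA and the helpers SPLIT-AT-COMMAS and READ-CSV-SYMBOLS are made up for illustration, and the splitting is deliberately naive (no CSV quoting rules):

```lisp
;; Hypothetical package to hold the symbols read from CSV data.
(defpackage :csv-data (:use))

(defun split-at-commas (line)
  "Return the substrings of LINE between commas.
Empty fields are kept, so \"foo,,bar\" gives three fields."
  (loop for start = 0 then (1+ end)
        for end = (position #\, line :start start)
        collect (subseq line start end)
        while end))

(defun read-csv-symbols (stream &optional (package :csv-data))
  "Read STREAM line by line until end of file, interning every
comma-separated field as a symbol in PACKAGE."
  (loop for line = (read-line stream nil nil)
        while line
        collect (mapcar (lambda (field) (intern field package))
                        (split-at-commas line))))

(with-input-from-string (in (format nil "1,2,3.14~%ab,foo bar, zot~%"))
  (read-csv-symbols in))
```

End of file is just the NIL returned by READ-LINE, not an error. Since INTERN does no case conversion, fields keep their case and spacing: the second line yields symbols named "ab", "foo bar" and " zot" (leading space included). Run each field through STRING-TRIM first if your files need that, or if they have CRLF line endings.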
On Sat, 10 Apr 2004 13:01:14 -0700, Adam Ingram-Goble <·······@cs.pdx.edu> wrote:
> I just started playing with the reader since I need to create a csv
> or arff reader that treats all data in the rows as symbols. I'm not
> clear on how to use any character to signal the end of the
> list. Ideally I'd be able to read
> 1,2,3.14
> ab,foo bar, zot
>
> into the following list: ((1 2 3.14) (ab |foo bar| zot))
>
> After spending hours playing with the readtable and trying to
> manipulate #\newline to signal the termination of the list read, I'm
> still at square one.
>
> Any help is extremely appreciated.
Here's a recent thread about reading CSV data, might be helpful:
<http://www.google.com/groups?threadm=3fdc241b%241%40news.starhub.net.sg>
Edi.
Adam Ingram-Goble wrote:
> that treats all data in the rows as symbols.
If this is what you want to do, it is not possible with the CL reader.
Check the ANS sections 2.2 and 2.3. The CL reader algorithm is split
into two phases, documented in these two sections. The first documents
how the readtable guides the reader into splitting the input stream into
potential tokens. The second documents how potential tokens are resolved
into objects.
In Common Lisp the parsing of potential tokens is exquisitely customizable
via the readtable. The interpretation of tokens is not customizable at all
(except for the obvious effects of global variables such as *package* and
*read-base*).
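As a small illustration of that exception: the readtable decides where a token ends, but what the token means is fixed by the standard, and *READ-BASE* is one of the few knobs. READ-IN-BASE is just a hypothetical wrapper:

```lisp
(defun read-in-base (string base)
  "Read one object from STRING with *READ-BASE* bound to BASE."
  (let ((*read-base* base))
    (values (read-from-string string))))

(read-in-base "ff" 16)  ; => 255, the token is a hexadecimal integer
(read-in-base "ff" 10)  ; => FF, the same token is now a symbol
(read-in-base "10." 16) ; => 10, a trailing dot always means decimal
```

No readtable setting can make "3.14" come back as a symbol; the rules that turn it into a number live in the token-interpretation phase.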
There have been some implementations in which the interpretation of tokens
was customizable, but this never made it into the ANS. Be careful to
understand the difference between the operation of these reader phases, or
else you will be forever confused in your attempts to make the reader solve
all your problems. The CL reader wasn't intended to solve all needs for
customized input. It was intended to solve the problem of reading Lisp
sexpr notation. Other input program needs are best solved by writing a
computer program that implements them.
"Steven M. Haflich" <·················@alum.mit.edu> writes:
> There have been some implementations in which the interpretation of
> tokens was customizable, but this never made it into the ANS. Be
> careful to understand the difference between the operation of these
> reader phases, or else you will be forever confused in your attempts
> to make the reader solve all your problems. The CL reader wasn't
> intended to solve all needs for customized input. It was intended to
> solve the problem of reading Lisp sexpr notation. Other input
> program needs are best solved by writing a computer program that
> implements them.
Just to add to that good advice: if you do go the route of writing
your own code to read some format not amenable to lexing/parsing by
the Lisp reader, it's quite likely that you can still use Lisp's
symbols and package machinery (cf. INTERN, MAKE-PACKAGE et al.) to
represent name-like things in the input. Thus you automatically get a
system for managing different namespaces (packages) and also the easy
ability to annotate your names with other information, should the need
arise, using the symbols' plists.
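A minimal sketch of that idea, assuming a package named ARFF-NAMES (a hypothetical name) to hold the names found in the input. The :TYPE property and its :NUMERIC value are likewise only illustrative:

```lisp
;; A package of its own keeps input names apart from program symbols.
(defpackage :arff-names (:use))

(defun input-name (string)
  "Intern STRING (case preserved) as a symbol in ARFF-NAMES."
  (intern string :arff-names))

;; The same string always maps to the same symbol...
(eq (input-name "temperature") (input-name "temperature")) ; => T

;; ...so information attached via the plist is found again later.
(setf (get (input-name "temperature") :type) :numeric)
(get (input-name "temperature") :type) ; => :NUMERIC
```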
-Peter
--
Peter Seibel ·····@javamonkey.com
Lisp is the red pill. -- John Fraser, comp.lang.lisp
Thank you all for the help. I'm still working on my reader
modifications (trying to add ARFF format support), but I got the CSV
reading to work as needed.
adam