Input with Common Lisp?

From: Dirk Zoller
Subject: Input with Common Lisp?
Date: Tue, 22 Apr 1997 00:00:00 +0000
Message-ID: <5jiasa$rrv@rts1.rtsffm.com>

Hello together,

in Common Lisp I found a nice and powerful way to do formatted output.
The (format) function.

Then I wanted to read in a List of data dumped by someone else from his
PC based dBase-like data base management tool. It has lines containing
comma-separated numbers and strings. All lines look alike but the first
which contains field names instead of data.

	"Monday",123456,12.34
	"Tuesday",4321,87.54
	etc, several thousand lines.

Now I didn't find any equally simple and capable input method
to read in these data. Just like if C had printf but no scanf.

I ended up transforming the data with awk into lisp-like lists

	("Monday" 123456 12.34)
	etc, thousands of lines

and read these in with the lisp reader. The result was extremely slow,
it took minutes to read the file (clisp on Linux but that's not the
point here). There must be better methods. What am I missing?

Thanks for your hints.

Dirk

From: Howard R. Stearns
Subject: Re: Input with Common Lisp?
Date: Thu, 01 May 1997 00:00:00 +0000
Message-ID: <3368BF47.6102D7B7@elwoodcorp.com>

Dirk Zoller wrote:
> 
> Hello together,
> 
> in Common Lisp I found a nice and powerful way to do formatted output.
> The (format) function.
> 
> Then I wanted to read in a List of data dumped by someone else from his
> PC based dBase-like data base management tool. It has lines containing
> comma-separated numbers and strings. All lines look alike but the first
> which contains field names instead of data.
> 
>         "Monday",123456,12.34
>         "Tuesday",4321,87.54
>         etc, several thousand lines.
> 
> Now I didn't find any equally simple and capable input method
> to read in these data. Just like if C had printf but no scanf.
> 
> I ended up transforming the data with awk into lisp-like lists
> 
>         ("Monday" 123456 12.34)
>         etc, thousands of lines
> 
> and read these in with the lisp reader. The result was extremely slow,
> it took minutes to read the file (clisp on Linux but that's not the
> point here). There must be better methods. What am I missing?
> 
> Thanks for your hints.
> 
> Dirk

I never saw any responses to this, so I thought I'd pass along some belated
comments.

1. If you haven't already figured it out, the canonical way to read a file of a
given format (in any language) is not to try to read the entire file with a
single library function call, but to iterate over the records of the file,
reading one record at a time, and terminating on the first record that fails to
be read (i.e. at end-of-file, or perhaps because of some other failure).

A function to read one record of your file might be something like:

;; Returns EOF-VALUE at end of file, otherwise the three fields as
;; multiple values.
(defun read-my-record (stream eof-value)
  (let ((day (read stream nil eof-value)))
    (if (eql eof-value day)
        eof-value
        (values day
                (read-my-number stream "day")
                (read-my-number stream "integer")))))

;; Each field after the first must be preceded by a comma.
(defun read-my-number (stream previous-field-name)
  (unless (char= #\, (read-char stream t nil t))
    (error "Missing comma after ~a." previous-field-name))
  (read stream t nil t))
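
The record-at-a-time loop described in point 1 might be sketched as follows.
MAP-RECORDS is a hypothetical helper, not part of Common Lisp. It assumes the
record reader takes a stream and an end-of-file marker and returns either that
marker or the fields as multiple values, which is the contract READ-MY-RECORD
is meant to follow.

```lisp
(defun map-records (function stream record-reader)
  "Call FUNCTION on the fields of each record read from STREAM by
RECORD-READER, stopping at end of file."
  (let ((eof (list :eof)))              ; fresh object, cannot occur in the data
    (loop
      (let ((fields (multiple-value-list
                     (funcall record-reader stream eof))))
        (when (eq (first fields) eof)   ; reader reported end of file
          (return))
        (apply function fields)))))
```

For the file in question one would presumably skip the header line first with
(read-line stream) and then call something like
(map-records #'process-record stream #'read-my-record), where PROCESS-RECORD is
a placeholder for whatever three-argument function consumes a record. Note that
MULTIPLE-VALUE-LIST conses a short list per record, in case that matters.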

A variation on this might be to not use READ, but to instead create your own
specialized functions using READ-CHAR and READ-SEQUENCE.  This may or may not be
faster than READ.
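
As a rough illustration of the READ-CHAR variation, here is a minimal sketch of
a specialized reader for one kind of field. READ-UNSIGNED-INTEGER is a
hypothetical name, and the sketch assumes fields are plain unsigned decimal
integers with no sign, whitespace, or radix prefix. It accumulates the value
digit by digit, so it builds no intermediate string.

```lisp
(defun read-unsigned-integer (stream)
  "Consume leading decimal digits from STREAM and return their value,
or NIL if the next character is not a digit (or at end of file)."
  (let ((value nil))
    (loop for ch = (peek-char nil stream nil nil)   ; NIL at end of file
          while (and ch (digit-char-p ch))
          do (setf value (+ (* 10 (or value 0))
                            (digit-char-p (read-char stream)))))
    value))
```

The delimiter is left unread, so the caller can still check for the comma with
READ-CHAR as READ-MY-NUMBER does. Whether this beats READ would have to be
measured on the implementation at hand.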

In any case, reading a whole record into a list creates a lot of garbage, unless
of course you actually want the stuff to be in a list.

2. For more free-format parsing, Common Lisp provides a more general mechanism in
the form of a customizable reader.  You can specify that a comma, for example, is
to be considered whitespace, and read one field at a time rather than one line at
a time (perhaps in a nested loop).  You can also create several different
readtables for parsing different kinds of fields, and then have the parser change
the readtable to the kind for the next field after it completes the parsing of a
single field.
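
A minimal sketch of the comma-as-whitespace idea, using only standard readtable
functions. The names MAKE-CSV-READTABLE, *CSV-READTABLE* and READ-CSV-RECORD are
made up for this example, and the sketch assumes every record has the same
number of non-empty fields (with comma treated as whitespace, an empty field
such as ",," simply disappears).

```lisp
(defun make-csv-readtable ()
  "Return a copy of the standard readtable in which comma is whitespace."
  (let ((table (copy-readtable nil)))          ; NIL means the standard readtable
    (set-syntax-from-char #\, #\Space table)   ; comma now reads like a space
    table))

(defvar *csv-readtable* (make-csv-readtable))

(defun read-csv-record (stream field-count)
  "Read FIELD-COUNT fields from STREAM and return them as a list,
or NIL at end of file."
  (let ((*readtable* *csv-readtable*)
        (*read-eval* nil))                     ; do not evaluate #. forms in data
    (let ((first (read stream nil stream)))    ; the stream itself marks EOF
      (if (eq first stream)
          nil
          (cons first
                (loop repeat (1- field-count)
                      collect (read stream)))))))
```

On the sample data, (read-csv-record stream 3) would return a list like
("Monday" 123456 12.34) for the first data line. It returns a list only for
brevity; reading the fields into variables one at a time would work the same way.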

3. Another variation on each of the above is to read in a line at a time using
READ-LINE, and then parse the fields from the resulting string rather than from
the file directly.  I do not believe this really gets you anything, though,
except for more garbage to be collected.
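
For comparison, a sketch of the line-at-a-time variation. SPLIT-AT-COMMAS is a
hypothetical helper, and it assumes no field contains a comma inside its quotes.
Each SUBSEQ allocates a fresh string, which is the extra garbage mentioned above.

```lisp
(defun split-at-commas (line)
  "Return the comma-separated substrings of LINE as a list of strings."
  (loop with start = 0
        for comma = (position #\, line :start start)
        collect (subseq line start comma)   ; COMMA is NIL for the last field
        while comma
        do (setf start (1+ comma))))
```

A caller would obtain LINE from (read-line stream nil nil) and then convert the
pieces itself, for instance with PARSE-INTEGER for the integer field and
(string-trim "\"" field) for the quoted string.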

4. The Common Lisp standard kind of wimped out on defining a general file
parser/scanner - and probably with good reason: there is little agreement on
"the right way" to do it.  There are plenty of tools available for parsing and
pattern matching, and they all work differently.

For example, many years ago I built a "double object oriented" input/output
system for building translators to and from CAD and other systems that used both
ASCII and binary files.  There were reader and writer objects defined for each
file format, that each supported a fixed set of methods for reading a float,
writing a label, etc.  A language allowed data to be transmitted differently than
the format/printf/scanf style.  For example, rather than:

  (with-open-file (stream "file.iges" ...)
    (format stream "~/IGES:integer/, ~/IGES:float/;"
                    integer-value float-value))

one wrote:

  (with-writer (IGES "file.iges" ...) 
     (write-env (:integer integer-value)
                (:float float-value))) etc.

Using various macro tricks, it SEEMED like the writer object had dynamic
scope, and this permitted a lot more inheritance between operations.  (I also
preferred having my values near my operators, rather than having them following
the format control string in a list.)
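
To make the shape of such a design concrete, here is a very small sketch in
CLOS. It is not the system described above: every name in it (WRITER,
IGES-WRITER, WRITE-FIELD, *WRITER*, WITH-STREAM-WRITER, and this WRITE-ENV) is
invented for illustration, the output syntax is a placeholder rather than real
IGES, and it uses a genuine special variable where the original only appeared
to have dynamic scope.

```lisp
(defclass writer ()
  ((output :initarg :output :reader writer-output)))   ; destination stream

(defclass iges-writer (writer) ())    ; one subclass per file format

(defgeneric write-field (writer kind value)
  (:documentation "Write VALUE as a field of type KIND in WRITER's format."))

(defmethod write-field ((w iges-writer) (kind (eql :integer)) value)
  (format (writer-output w) "~D," value))

(defmethod write-field ((w iges-writer) (kind (eql :float)) value)
  (format (writer-output w) "~F;" value))

(defvar *writer*)                     ; the writer currently in effect

(defmacro with-stream-writer ((class stream) &body body)
  "Run BODY with *WRITER* bound to a new CLASS instance writing to STREAM."
  `(let ((*writer* (make-instance ',class :output ,stream)))
     ,@body))

(defmacro write-env (&rest fields)
  "Each element of FIELDS is (KIND VALUE); write them in order."
  `(progn
     ,@(loop for (kind value) in fields
             collect `(write-field *writer* ,kind ,value))))
```

With these definitions, a form such as
(with-stream-writer (iges-writer *standard-output*)
  (write-env (:integer 42) (:float 2.5)))
dispatches each field to the method for the current format, and supporting
another format means adding a subclass and its methods.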

A second set of objects corresponded to the entities represented in the file
(usually geometry) and shadowed the geometric objects within our system.  These
used the generic operators to read or write their own data to and from the file.

Using this, an incredibly small amount of code enabled us to automatically write
arbitrary geometric models (including wireframe prismatics, surfaces and solids)
with annotation to IGES, ComputerVision (two formats), Autocad, Intergraph,
Catia, Unigraphics, and 3D-Systems and read geometry from IGES, Catia,
Unigraphics and ComputerVision.

Conclusion: No, there isn't a scanf.  (Many people think there shouldn't have
been a format, either.)  However, Lisp is designed for building your own
languages.  Chances are that if you do it right, then a large project (or
multiple product lines) will end up using less code with your own specialized
parsing system than if you relied on some built-in scanning system that did simple
problems more easily, but didn't really solve your problem properly.