I/O libraries?

From: Svein Ove Aas
Subject: I/O libraries?
Date: Mon, 19 Apr 2004 17:12:13 +0000
Message-ID: <VKTgc.7630$px6.107782@news2.e.nsc.no>

Judging by the fairly involved conversation about reading from strings in
another thread, and the code that was produced, there is a fairly severe
lack of support for I/O in Lisp.

Sure, you managed to write a "safe-read" function eventually, but the mere
fact that this was necessary shows that standard support is lacking.


As such, I'm going to have to ask two questions:

- First, are there any I/O libraries around that would handle this properly?
- Second, what would be the procedure for making this part of the CL
standard? Is anyone actually working on it?

I'm writing an application that will take arbitrary input from the internet,
and I'd appreciate it if someone a bit more experienced than I am were to
think about security.

From: Christophe Rhodes
Subject: Re: I/O libraries?
Date: Mon, 19 Apr 2004 17:18:13 +0000
Message-ID: <sqzn97lwdm.fsf@lambda.dyndns.org>

Svein Ove Aas <··············@brage.info> writes:

> Judging by the fairly involved conversation about reading from
> strings in another thread, and the code that was produced, there is
> a fairly severe lack of support for I/O in Lisp.
>
> Sure, you managed to write a "safe-read" function eventually, but
> the mere fact that this was necessary shows that standard support
> is lacking.

READ is not what I would describe as an I/O primitive.  Look up
READ-BYTE and READ-SEQUENCE.  (Also be slightly more conservative in
future about drawing conclusions from incomplete data).

> I'm writing an application that will take arbitrary input from the
> internet, and I'd appreciate it if someone a bit more experienced
> than I am were to think about security.

Typical arbitrary input from the internet is not usually parsed as
lisp forms.

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)

From: Svein Ove Aas
Subject: Re: I/O libraries?
Date: Mon, 19 Apr 2004 17:54:22 +0000
Message-ID: <qmUgc.7635$px6.108040@news2.e.nsc.no>

Christophe Rhodes wrote:

> Svein Ove Aas <··············@brage.info> writes:
> 
>> Judging by the fairly involved conversation about reading from
>> strings in another thread, and the code that was produced, there is
>> a fairly severe lack of support for I/O in Lisp.
>>
>> Sure, you managed to write a "safe-read" function eventually, but
>> the mere fact that this was necessary shows that standard support
>> is lacking.
> 
> READ is not what I would describe as an I/O primitive.  Look up
> READ-BYTE and READ-SEQUENCE.  (Also be slightly more conservative in
> future about drawing conclusions from incomplete data).
> 
>> I'm writing an application that will take arbitrary input from the
>> internet, and I'd appreciate it if someone a bit more experienced
>> than I am were to think about security.
> 
> Typical arbitrary input from the internet is not usually parsed as
> lisp forms.

It's more a case of me not knowing any way of parsing numbers other than by
using (read).

Then again, the application on the other end is also a Lisp application, so
lisp forms might be useful.

From: Christopher C. Stacy
Subject: Re: I/O libraries?
Date: Mon, 19 Apr 2004 19:13:29 +0000
Message-ID: <uhdvfwzl1.fsf@news.dtpq.com>

>>>>> On Mon, 19 Apr 2004 19:54:22 +0200, Svein Ove Aas ("Svein") writes:

 Svein> Christophe Rhodes wrote:
 >> Svein Ove Aas <··············@brage.info> writes:
 >> 
 >>> Judging by the fairly involved conversation about reading from
 >>> strings in another thread, and the code that was produced, there is
 >>> a fairly severe lack of support for I/O in Lisp.

You don't know very much about Lisp, to say the least.
Rather than judging from a conversation that you can't 
quite follow because you lack the background and understand
none of the basics, perhaps you should read some books or
take some courses or something before drawing your conclusions.

 >>> 
 >>> Sure, you managed to write a "safe-read" function eventually, but
 >>> the mere fact that this was necessary shows that standard support
 >>> is lacking.
 >> 
 >> READ is not what I would describe as an I/O primitive.  Look up
 >> READ-BYTE and READ-SEQUENCE.  (Also be slightly more conservative in
 >> future about drawing conclusions from incomplete data).
 >> 
 >>> I'm writing an application that will take arbitrary input from the
 >>> internet, and I'd appreciate it if someone a bit more experienced
 >>> than I am were to think about security.
 >> 
 >> Typical arbitrary input from the internet is not usually parsed as
 >> lisp forms.

 Svein> It's more a case of me not knowing any way of parsing numbers
 Svein> than by using (read).

Accept the input as a string and call PARSE-INTEGER on it.
(If you need to parse some floating point syntax, there are 
libraries that are available for that, like in all languages.)
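
A minimal sketch of that advice, assuming the input has already arrived as
a string. The helper name PARSE-INTEGER-OR-NIL is hypothetical (it is not
part of the standard); it relies only on the documented behaviour that
PARSE-INTEGER signals a PARSE-ERROR on malformed input.

```lisp
;; Hypothetical helper, not part of ANSI CL: return the integer that TEXT
;; represents, or NIL when TEXT is not a well-formed decimal integer.
(defun parse-integer-or-nil (text)
  (handler-case (parse-integer text)
    ;; PARSE-INTEGER signals PARSE-ERROR unless :JUNK-ALLOWED is true.
    (parse-error () nil)))

(parse-integer-or-nil "42")       ; first value: 42
(parse-integer-or-nil "  -17 ")   ; first value: -17 (surrounding whitespace is ignored)
(parse-integer-or-nil "12abc")    ; NIL
(parse-integer-or-nil "")         ; NIL
```

Nothing here goes through READ, so reader macros in the input are never
interpreted.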

 Svein> Then again, the application on the other end is also a Lisp
 Svein> application, so lisp forms might be useful.

If it's going to manipulate numbers, then perhaps numbers might be useful.

From: Svein Ove Aas
Subject: Re: I/O libraries?
Date: Mon, 19 Apr 2004 20:32:08 +0000
Message-ID: <kGWgc.7665$px6.108621@news2.e.nsc.no>

Christopher C. Stacy wrote:

>>>>>> On Mon, 19 Apr 2004 19:54:22 +0200, Svein Ove Aas ("Svein") writes:
> 
>  Svein> Christophe Rhodes wrote:
>  >> Svein Ove Aas <··············@brage.info> writes:
>  >> 
>  >>> Judging by the fairly involved conversation about reading from
>  >>> strings in another thread, and the code that was produced, there is
>  >>> a fairly severe lack of support for I/O in Lisp.
> 
> You don't know very much about Lisp, to say the least.
> Rather than judging from a conversation that you can't
> quite follow because you lack the background and understand
> none of the basics, perhaps you should read some books or
> take some courses or something before drawing your conclusions.

Maybe I come over as more judgemental than I really am; if so, I'm sorry.[1]
I *am* doing those things you suggest (well, except for the courses; none
seem to be available here), but I'm also impatient enough not to wade
through all the books I've got now to find out how to read numbers.

In any case, the (possibly unclear) main thrust of my OP was to ask about
the standardization process, to which I feel I've gotten a good answer via
IRC; thanks for taking the time to correct me, though; that's what I was
looking for, really.

>  >>> 
>  >>> Sure, you managed to write a "safe-read" function eventually, but
>  >>> the mere fact that this was necessary shows that standard support
>  >>> is lacking.
>  >> 
>  >> READ is not what I would describe as an I/O primitive.  Look up
>  >> READ-BYTE and READ-SEQUENCE.  (Also be slightly more conservative in
>  >> future about drawing conclusions from incomplete data).

I was already aware of those. I wasn't aware of parse-integer, though. I
think I'm going to go off and write an equivalent of C scanf in any case.

> If it's going to manipulate numbers, then perhaps numbers might be useful.

Numbers might be useful, yes, but...

Eval is going to play a role in the client application, and I need to figure
out how to serialize generic objects. (The server is trusted.)

I think I'll keep that as my own puzzle, though.

-- 
[1]: I did in fact say what I did specifically so people could correct my
misconceptions. You've certainly managed that splendidly.

From: ··········@YahooGroups.Com
Subject: Re: I/O libraries?
Date: Sat, 08 May 2004 20:26:49 +0000
Message-ID: <REM-2004may08-001@Yahoo.Com>

> From: Svein Ove Aas <··············@brage.info>
> I'm also impatient enough not to wade through all the books I've got
> now to find out how to read numbers.

I agree with you. I am a strong proponent of access to nicely-indexed
reference information, as opposed to telling somebody they have to
memorize the whole body of knowledge and then just draw from that
memory without having access to any reference materials.

With only what's presently available, you should learn how to
effectively use the online Web-based ANSI-CL specification and one or
more of the more nicely organized CL tutorials. But they aren't
organized in the way that makes it obvious where to find things. Let me
see how I fare at looking up the info you or somebody originally asked,
namely how to parse an integer from its text representation.
Start at google.com, search for: ansi common lisp specification
First match is:    Linkname: Common Lisp HyperSpec (TM)
        URL: http://www.lisp.org/HyperSpec/FrontMatter/index.html
Going into the table of contents, and clicking on the chapter on Numbers,
and then into 12.2 The Numbers Dictionary
and then into    System Class INTEGER
and then into the See also item:
Section 2.3.2 (Constructing Numbers from Tokens)
Nope, that seems to be a dead-end, telling about syntax for numbers
when (READ) is parsing input, but not anything specific about how to
parse a number when you know that's what you want.
Backing up to the Numbers Dictionary and browsing forward, eventually
we reach:    Linkname: CLHS: Function PARSE-INTEGER
        URL: http://www.lisp.org/HyperSpec/Body/fun_parse-integer.html
The key text is:
   parse-integer parses an integer in the specified radix from the
   substring of string delimited by start and end.
   parse-integer expects an optional sign (+ or -) followed by a
   non-empty sequence of digits to be interpreted in the specified radix.
   Optional leading and trailing whitespace[1] is ignored.
So if that task is what you need accomplished, then you know you've
found the right function, and you can read the spec in more detail and
do some experiments to see how it works in practice with your favorite
CL implementation. Thanks to the interactive read-eval-print loop, such
simple experiments are painless.
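
The kind of experiment meant here might look like the following, typed at
the read-eval-print loop. Only the first return value is shown in the
comments; PARSE-INTEGER also returns a second value (the index where
parsing stopped).

```lisp
;; Keyword arguments described in the spec text quoted above.
(parse-integer "ff" :radix 16)              ; first value: 255
(parse-integer "x=123;" :start 2 :end 5)    ; first value: 123
(parse-integer "+7")                        ; first value: 7
(parse-integer "  99  ")                    ; first value: 99

;; With :JUNK-ALLOWED true, trailing garbage no longer signals an error,
;; and a string with no leading digits yields NIL instead.
(parse-integer "12 apples" :junk-allowed t) ; first value: 12
(parse-integer "apples" :junk-allowed t)    ; first value: NIL
```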

The above search took a while before I found the right spot. (I already
knew parse-integer was the goal, but pretended not to know that and
searched for anything that looked like it might parse an integer from a
string, as you might have if you had tried to find the info in this
way. I can imagine it might sometimes be frustrating to traverse the
hyperspec like that hoping to find the info you seek.) I would like
there to be a more organized way to find that same info, where you
specify what data type you have to start with, in this case a string,
and what data type you want as output, in this case an integer, and it
immediately tells you all the functions likely to be of use to you. In
this case it would tell you first about parse-integer, then later about
read which can parse not just integers but a wide range of data types,
and you pick which level of generality is most appropriate for your
particular task.

> the (possibly unclear) main thrust of my OP was to ask about the
> standardization process ...

If all you really need to do is include, in some program you're writing,
the capability of converting a string representation of an integer into
an actual integer (FIXNUM or BIGNUM), I'm wondering why you even care
about side issues such as the standardization process? It takes years
to propose a change in the standard, get it approved, and then wait
until various implementations adopt the changes. Isn't that rather a
detour from what you really need to accomplish not ten or twenty years
in the future but *today*??

> I wasn't aware of parse-integer

Did you look at the hyper-spec, as I demonstrated above, trying to find
if there was such a function in ANSI-CL, and if so the name of such
function and details about how to use it? Or did you not even believe
there would be such a function defined, so not bother to look for any
such? Or did you not know such a hyper-spec even existed?

> I think I'm going to go off and write an equivalent of C scanf in any
> case.

If you do so, be sure to make it directly compatible with the FORMAT
function in CL, which uses ~ instead of % to start formatting commands.

By the way, a couple months ago when I was trying to use sscanf in C to
parse a comma-delimited list I found it to be such a royal pain that I
gave up and wrote my own comma-finding algorithm and then passed just
the individual substrings (between commas) to sscanf.

How do you propose to emulate the "feature" of C that you can pass
memory addresses as parameters to a function such as scanf or sscanf,
and it will clobber those memory locations with the parsed values,
without bothering to do any type checking, often producing totally
garbage results that can crash your program with runtime exception? CL
doesn't allow you to do anything like that. To return lots of values,
either you must actually return multiple values in the CL sense and
then collect them via MULTIPLE-VALUE-BIND or MULTIPLE-VALUE-SETQ etc.
(or less nice, but more Java-like, return a VECTOR of the results), or
to make it syntax-similar to C/PASCAL code you must make your scanf or
sscanf a macro that accepts places for all the side-effected arguments
and does the appropriate setf on each. Before you start actual coding,
have you decided which method you'll choose? For a first attempt, I
might suggest having each individual-field parser a separate function,
such as sscanf-parse-integer sscanf-parse-float sscanf-parse-string
etc. and collect the result from each such call via vector-push-extend,
and then just return the resultant vector when all done. Later you can
write a macro that simply calls that version of sscanf and does all the
SETFs from the individual elements of the vector, slightly less
efficient than theoretically possible because of creating the vector
and discarding it shortly, but Java does that all the time and Java is
popular so who's to complain on your first major programming attempt?
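
A rough sketch of that first attempt, under stated assumptions: fields are
separated by a single delimiter character, only integer and string fields
are handled (float parsing is left out because the standard has no
PARSE-FLOAT), and every name below (SPLIT-ON, SSCANF-PARSE-FIELD,
SSCANF-FIELDS, SSCANF-SETF) is hypothetical rather than an existing
library.

```lisp
(defun split-on (delimiter text)
  "Return a list of the substrings of TEXT separated by the character DELIMITER."
  (loop with start = 0
        for pos = (position delimiter text :start start)
        collect (subseq text start pos)
        while pos
        do (setf start (1+ pos))))

(defun sscanf-parse-field (type text)
  "Parse one field of TEXT. TYPE is :INTEGER or :STRING."
  (ecase type
    (:integer (parse-integer text))       ; signals PARSE-ERROR on bad input
    (:string (string-trim " " text))))

(defun sscanf-fields (types text &optional (delimiter #\,))
  "Parse the DELIMITER-separated fields of TEXT according to TYPES.
Returns an adjustable vector of the parsed values."
  (let ((results (make-array 0 :adjustable t :fill-pointer 0)))
    (loop for type in types
          for field in (split-on delimiter text)
          do (vector-push-extend (sscanf-parse-field type field) results))
    results))

(defmacro sscanf-setf (types text &rest places)
  "Call SSCANF-FIELDS, then SETF each of PLACES to the corresponding element."
  (let ((results (gensym "RESULTS")))
    `(let ((,results (sscanf-fields ,types ,text)))
       (setf ,@(loop for place in places
                     for i from 0
                     append (list place `(aref ,results ,i))))
       ,results)))

;; Usage: the macro gives the C-like "assign into variables" surface.
(let (id name)
  (sscanf-setf '(:integer :string) "42, Svein" id name)
  (list id name))                         ; => (42 "Svein")
```

Unlike C's sscanf, a malformed integer field here signals an error instead
of leaving a variable holding garbage.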

> Numbers might be useful, yes, but...
> Eval is going to play a role in the client application, and I need to
> figure out how to serialize generic objects. (The server is trusted.)

I would *hope* the server is trusted! Otherwise, all your files,
including the source of the program you're writing, are wormfood from
the start. Still, you should keep backups in a more secure place just
in case your server ever gets compromised.

Why exactly do you need EVAL in your application? Why not control
everything yourself and only use APPLY or FUNCALL to call a known and
trusted function on already-computed arguments (parameters)? Whatever
is passed to EVAL will need to be recursively traversed to check to
make sure no dangerous functions (such as DELETE-FILE) are called, so
why not have the traversal algorithm actually perform all the APPLYs
that would be done later by EVAL anyway? That would also have the
advantage that you can check the actual (evaluated) arguments to make
sure you aren't calling (APPLY/FUNCALLing) some function on arguments
that are specifically dangerous even if the function in other cases is
safe to call, for example if you allow your remote/untrusted user to
create some temporary disk files and later delete them, you can check
the actual argument to DELETE-FILE to make sure it refers to one of
that user's own files and not anyone else's files, rather than forbid
DELETE-FILE to ever be used.

If you allow the user to specify anything like (MAPCAR ...), that would
be a pain to either traverse first then pass to EVAL or manually
traverse and APPLY/FUNCALL, but still the latter seems less likely for
you to overlook something and create a security leak. (If the traversal
function has a bug where it merely returns from some level without
doing anything below that point, the former approach will leave a BIG
security leak where all lower-level code is passed to EVAL without ever
being checked, whereas the latter approach will merely fail to execute
any of that unchecked code at all.)
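
A minimal sketch of the second approach (traverse and APPLY, never EVAL),
assuming forms consist only of numbers, strings, and calls to functions on
a whitelist. The names *ALLOWED-FUNCTIONS* and RESTRICTED-EVAL are
hypothetical, and the whitelist contents are only an illustration.

```lisp
;; Hypothetical whitelist; anything not listed here is refused.
(defparameter *allowed-functions* '(+ - * max min list length))

(defun restricted-eval (form)
  "Evaluate FORM, permitting only literals and calls to whitelisted functions."
  (cond ((or (numberp form) (stringp form))
         form)
        ((and (consp form)
              (symbolp (first form))
              (member (first form) *allowed-functions*))
         ;; Arguments are evaluated by this same checker, so nothing
         ;; reaches APPLY without having been inspected first.
         (apply (first form) (mapcar #'restricted-eval (rest form))))
        (t
         (error "Form not permitted: ~S" form))))

(restricted-eval '(+ 1 (* 2 3)))        ; => 7
;; (restricted-eval '(delete-file "x")) signals an error; DELETE-FILE is never called.
```

Because the traversal itself performs the calls, a bug that skips part of
the tree leaves that part unexecuted rather than unchecked. A per-function
check on the evaluated arguments (for example, on the pathname given to a
permitted DELETE-FILE) could be added just before the APPLY.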

From: Svein Ove Aas
Subject: Re: I/O libraries?
Date: Sun, 09 May 2004 23:31:18 +0000
Message-ID: <P9znc.406$RL3.5393@news2.e.nsc.no>

··········@YahooGroups.Com wrote:

>> Numbers might be useful, yes, but...
>> Eval is going to play a role in the client application, and I need to
>> figure out how to serialize generic objects. (The server is trusted.)
> 
> I would *hope* the server is trusted! Otherwise, all your files,
> including the source of the program you're writing, are wormfood from
> the start. Still, you should keep backups in a more secure place just
> in case your server ever gets compromised.
> 
> Why exactly do you need EVAL in your application? Why not control
> everything yourself and only use APPLY or FUNCALL to call a known and
> trusted function on already-computed arguments (parameters)? Whatever

It's taking the place of automated patches, basically. Seems to work fine.

No, I'm not using eval on *my* machines, just on the clients; those will
just have to trust me. The server had *better* not get compromised, but
as it happens the client Lisp is running in a sandbox, so they'd be ok.

(Interpreted CL-subset inside interpreted Python... it's *ridiculously*
slow, but even a slowdown of 1000x - which is the real number - is
acceptable in this case. I'm going to have to think of something smarter
before long, though.)

From: Matthew Danish
Subject: Re: I/O libraries?
Date: Sat, 08 May 2004 21:35:52 +0000
Message-ID: <20040508213552.GU25328@mapcar.org>

On Mon, Apr 19, 2004 at 10:32:08PM +0200, Svein Ove Aas wrote:
> I think I'm going to go off and write an equivalent of C scanf in any case.

You might want to take a look at http://www.cliki.net/format-setf.

-- 
; Matthew Danish <·······@andrew.cmu.edu>
; OpenPGP public key: C24B6010 on keyring.debian.org
; Signed or encrypted mail welcome.
; "There is no dark side of the moon really; matter of fact, it's all dark."

From: Steven M. Haflich
Subject: Re: I/O libraries?
Date: Tue, 20 Apr 2004 05:01:37 +0000
Message-ID: <R82hc.53187$VA.52097@newssvr25.news.prodigy.com>

I copied the body of your message to a file /tmp/foo and then executed
the following code:

  cl-user(8): (pprint (with-open-file (f "/tmp/foo")
                        (loop as x = (read f nil f)
                              until (eq x f)
                              collect x)))
  Error: Comma not inside a backquote. [file position = 89]
    [condition type: reader-error]

Please correct your input...

The serious point here is that the CL read function is not designed
for all arbitrary input tasks.  It is designed primarily so Lisp programs
can read Lisp sexprs, and allows arbitrary control over the separation
between and collection of tokens.  However, it allows no control over the
interpretation of tokens.

If you have some input task that isn't the reading of Lisp sexprs, you
have to write a computer program (or "subroutine") to accomplish this
need.  Common Lisp is exactly like C and C++ and Basic and Pascal and
FORTH and FORTRAN and ALGOL and PERL and Scheme and Python and 6801
assembly language and lots of other languages in this regard.

From: ··········@YahooGroups.Com
Subject: Re: I/O libraries?
Date: Fri, 07 May 2004 18:24:35 +0000
Message-ID: <REM-2004may07-001@Yahoo.Com>

> From: Svein Ove Aas <··············@brage.info>
> there is a fairly severe lack of support for I/O in Lisp.

I don't know what that's supposed to mean. Most languages have the
ability to read/write text as sequences of characters and binary data
as sequences of bytes. LISP also has the ability to read/write most
kinds of data, such that if you read in and write out you get exactly
the same text as you started except for some cleanups such as flushing
leading zeroes of numbers, and if you write out and read back in you
get something isomorphic to the original object. That beats most other
languages, including Java (which I'm comparing to CL in another
thread).
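
A small illustration of that round trip, using only standard printer and
reader functions. The helper name ROUND-TRIP is hypothetical, and the
sketch assumes the default reader and printer settings are in effect.

```lisp
;; Hypothetical helper: print OBJECT readably to a string, then read it back.
(defun round-trip (object)
  (values (read-from-string (prin1-to-string object))))

;; Write out, read back in: the copy is structurally equal to the original.
(let ((object (list 1 "two" #\3 'four #(6 7))))
  (equalp object (round-trip object)))            ; => T

;; Read in, write out: the text survives except for cleanups such as
;; dropped leading zeroes.
(prin1-to-string (read-from-string "007"))        ; => "7"
```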

> Sure, you managed to write a "safe-read" function eventually, but the
> mere fact that this was necessary shows that standard support is
> lacking.

Standard support for LISP has traditionally been for a single user on
his/her own machine having as much power as possible. LISP has done a
fine job in that frame. The same approach works fine for CGI applications that
don't allow remote (untrusted) users to pass data through READ or
READ-FROM-STRING. All READing is of the programmer's own source files
and data files, so safe-read isn't needed there either.

The two applications where safe-read is needed are:

http://www.google.com/groups?selm=c54dq5%248kk%241%40reader1.panix.com
Apparently a desire to sit a naive user down at a LISP environment and
have that user run your LISP application and not have it crash into the
break package due to some unanticipated error when parsing a number
from a string. Or maybe just a desire to encapsulate the error recovery
from such an error into a compact unit to make it easier to write the
rest of the program. It's not clear from the original article where the
string containing representation of number or garbage came from in the
first place, so I'm just guessing.

My own interest in writing CGI applications, where user input (in a
HTML FORM) will be parsed by my application to produce s-expressions.
(I mentioned my two specific applications are: CAI for teaching people
about LISP, and an s-expression-based RPC = Remote Procedure Call.)

So I think it's not unreasonable to expect a few lines of extra code,
beyond that needed for single-user-on-own-machine-hacking-away, might
be appropriate to handle these specialized applications, and not a
reason to condemn CL for not *already* providing this extra capability
in the standard.
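
For a sense of scale, here is one possible shape for those few lines. This
is a sketch only, not the code from the other thread; the name
SAFE-READ-FROM-STRING is hypothetical. It assumes that disabling #. and
trapping reader errors is the goal, and it does not address symbol
interning or very large inputs.

```lisp
;; Hypothetical helper: read one object from TEXT with read-time
;; evaluation (#.) disabled. Returns NIL on malformed input, which means
;; a literal NIL in the input cannot be told apart from a failure.
(defun safe-read-from-string (text)
  (with-standard-io-syntax              ; known readtable, base 10, etc.
    (let ((*read-eval* nil))            ; "#.(...)" now signals a READER-ERROR
      (handler-case (values (read-from-string text))
        (error () nil)))))

(safe-read-from-string "(1 2 3)")       ; => (1 2 3)
(safe-read-from-string "#.(+ 1 2)")     ; => NIL, the form is never evaluated
(safe-read-from-string "(1 2")          ; => NIL, end of input inside a list
```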

> are there any I/O libraries around that would handle this properly?

If it's just a few lines of code, as the examples in the other thread
have indicated, why would you need a whole I/O library for it?

> what would be the procedure for making this part of the CL standard?

That's premature, IMO. First we need to try it and see what "best
practice" would be. Then eventually decide whether it really needs to
be added to the standard, or merely provided as a free add-on that
works on all platforms and is generally available.

> I'm writing an application that will take arbitrary input from the
> internet, and I'd appreciate it if someone a bit more experienced
> than I am were to think about security.

You need to state more clearly the situation you are in. Are you
writing a daemon that will run continuously for days at a time and
will require manual intervention to re-start if it ever crashes? Or are
you writing a CGI application that is re-started for each user and if it
crashes whenever the user enters grossly bad input that merely aborts
that one user's HTTP session so the user presses the back button on
his/her browser and tries better input? In the former case, crashes are
a pain, but in the latter crashes are not very important, only actual
security breaches are worth serious worry. (Yes it'd be nice to avoid
crashes, generate nice meaningful error messages to user instead, but
that's a minor issue compared to security breaches.)