Hello,
I want to process binary data sent by an HTML form (using the POST method)
like this:
<form ACTION=http://xxxx.cgi METHOD=POST
enctype='multipart/form-data'>
<input NAME="zipfile" TYPE="FILE" SIZE=40 >
</form>
The program should read binary data from standard input, but read-byte
signals an error if the stream was not opened with :element-type set to
unsigned-byte.
How can I solve this problem?
Thibault
From: Tim Daly, Jr.
Subject: Re: read-byte and *standard-input*
Date:
Message-ID: <87n0i6m8sg.fsf@tenkan.org>
·················@di.fc.ul.pt (Thibault Langlois) writes:
> Hello,
>
> I want to process binary data sent by an html form (using post method)
> like this:
>
> <form ACTION=http://xxxx.cgi METHOD=POST
> enctype='multipart/form-data'>
> <input NAME="zipfile" TYPE="FILE" SIZE=40 >
> </form>
>
> The program should read binary data from standard input but read-byte
> gives an error if the stream is not open with :element-type set to
> unsigned-byte.
>
> How can I solve this problem ?
>
> Thibault
Well, the first thing that comes to mind is (CHAR-CODE (READ-CHAR)).
That's ugly, might not be correct in all cases, and won't perform
well -- but it ought to get the job done.
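A minimal sketch of that idea, for illustration only. The helper name
READ-OCTETS-VIA-CHARS is hypothetical, and the code assumes the
implementation hands back every input byte as a character whose
CHAR-CODE is below 256, which is not guaranteed everywhere:

```lisp
;; Sketch only: READ-OCTETS-VIA-CHARS is a hypothetical helper name.
(defun read-octets-via-chars (n &optional (stream *standard-input*))
  "Read up to N characters from STREAM and return their codes as octets."
  (let ((buf (make-array n :element-type '(unsigned-byte 8))))
    (dotimes (i n buf)
      (let ((c (read-char stream nil nil)))
        ;; Stop early at end of file and return only what was read.
        (unless c
          (return (subseq buf 0 i)))
        ;; Assumes the character code fits in 8 bits.
        (setf (aref buf i) (char-code c))))))

;; Usage on a string stream instead of the real standard input:
(with-input-from-string (s "hi!")
  (read-octets-via-chars 3 s))   ; => #(104 105 33)
```

Whether the codes really match the raw bytes depends on the external
format and line-terminator handling of the character stream.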
-Tim
--
> * In message <···························@posting.google.com>
> * On the subject of "read-byte and *standard-input*"
> * Sent on 1 May 2003 16:33:38 -0700
> * Honorable ·················@di.fc.ul.pt (Thibault Langlois) writes:
>
> The program should read binary data from standard input but read-byte
> gives an error if the stream is not open with :element-type set to
> unsigned-byte.
this is normal: you can only use READ-BYTE on binary streams, so you
have to open your input stream in binary mode or change the stream
element type before doing READ-BYTE:
$ ./clisp -q -norc bin-stdio.lisp < bin-stdio.lisp
*STANDARD-INPUT*=#<INPUT BUFFERED FILE-STREAM CHARACTER #P"/dev/fd/0" @1>
*STANDARD-INPUT*=#<INPUT BUFFERED FILE-STREAM (UNSIGNED-BYTE 8) #P"/dev/fd/0">
read 5 bytes:
#(40 102 111 114 109)="(form"
$ cat bin-stdio.lisp
(format t "~s=~s~%" '*standard-input* *standard-input*)
(setf (stream-element-type *standard-input*) '(unsigned-byte 8))
(format t "~s=~s~%" '*standard-input* *standard-input*)
(setq buf (make-array 5 :element-type '(unsigned-byte 8)))
(format t "read ~:d bytes:~%~s=~s~%" (read-sequence buf *standard-input*)
        buf (convert-string-from-bytes buf charset:utf-8))
(quit)
$
if you would like to test this interactively, when *STANDARD-INPUT*
points to the terminal stream, you have to play a small trick:
(with-open-file (*standard-input* "/dev/fd/0"
                 :element-type '(unsigned-byte 8))
  (print (read-byte *standard-input*)))
h
==> 104
--
Sam Steingold (http://www.podval.org/~sds) running RedHat9 GNU/Linux
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.palestine-central.com/links.html>
What garlic is to food, insanity is to art.
Hi Thibault Langlois,
> Hello,
>
> I want to process binary data sent by an html form (using post method)
> like this:
>
> <form ACTION=http://xxxx.cgi METHOD=POST
> enctype='multipart/form-data'>
> <input NAME="zipfile" TYPE="FILE" SIZE=40 >
> </form>
>
> The program should read binary data from standard input but read-byte
> gives an error if the stream is not open with :element-type set to
> unsigned-byte.
>
> How can I solve this problem ?
First you should always tell people your implementation. If it's CLISP the
developers made a decision that one is not allowed to read binary data from
STDIN. I think the decision is unfortunate and it affects the efficiency
of CLISP for CGI programming. The workaround is to read the data using
a faithfully reproducing character set with Unix end-of-lines. This will
result in a character stream that is byte identical to the binary stream.
ISO-8859-1 is a good choice for the faithfully reproducing character set.
Details of how to get this working are set out in the CL Cookbook:
<http://cl-cookbook.sourceforge.net/io.html>
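
Once the data has been read as characters through such a one-to-one
charset, turning it back into octets is a plain mapping. A minimal
sketch, with hypothetical function names, assuming CHAR-CODE and
CODE-CHAR agree with ISO-8859-1 for codes 0 to 255 (true of the common
implementations, but not promised by the standard):

```lisp
;; Sketch only: both function names are hypothetical.
(defun latin-1-string-to-octets (string)
  "Map each character of STRING to its code; codes must be below 256."
  (map '(vector (unsigned-byte 8)) #'char-code string))

(defun octets-to-latin-1-string (octets)
  "Map each octet to the character with that code."
  (map 'string #'code-char octets))
```

The mapping itself loses nothing; any loss happens earlier, when the
stream decodes bytes or translates line terminators.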
Regards,
Adam
Adam Warner <······@consulting.net.nz> wrote:
+---------------
| Hi Thibault Langlois,
| > The program should read binary data from standard input but read-byte
| > gives an error if the stream is not open with :element-type set to
| > unsigned-byte.
|
| First you should always tell people your implementation. If it's CLISP the
| developers made a decision that one is not allowed to read binary data from
| STDIN.
+---------------
Not just CLISP, CMUCL complains as well:
> (read-byte *standard-input*)
#<Stream for Standard Input> is not a binary input stream.
Restarts:
0: [ABORT] Return to Top-Level.
+---------------
| I think the decision is unfortunate and it affects the efficiency
| of CLISP for CGI programming. The workaround is to read the data using
| a faithfully reproducing character set with Unix end-of-lines. This will
| result in a character stream that is byte identical to the binary stream.
| ISO-8859-1 is a good choice for the faithfully reproducing character set.
|
| Details of how to get this working are set out in the CL Cookbook:
| <http://cl-cookbook.sourceforge.net/io.html>
+---------------
Well, maybe, though AFAICT that URL says nothing about "faithful input";
it only talks about "faithful output".
But Adam's main point is correct: You need to specify your implementation.
For example, in CMUCL-18e there are (at least) two ways to work around this:
1. Use the CMUCL-specific function SYSTEM:READ-N-BYTES with *STANDARD-INPUT*
as the stream argument [lines tagged "T:" are typed input]:
> (defvar *buf* (make-array 10 :element-type '(unsigned-byte 8)))
*BUF*
> *buf*
#(0 0 0 0 0 0 0 0 0 0)
> (system:read-n-bytes *standard-input* *buf* 0 6 nil)
T: hello!
6
> *buf*
#(104 101 108 108 111 33 0 0 0 0)
>
2. Use the CMUCL-specific function SYSTEM:MAKE-FD-STREAM, since Unix
standard input is always file descriptor #0:
> (with-open-stream (s (system:make-fd-stream
                        (unix:unix-dup 0)
                        :element-type '(unsigned-byte 8)))
    (loop for i = (read-byte s)
          collect i
          until (= i 10)))
T: hello, there!
(104 101 108 108 111 44 32 116 104 101 114 101 33 10)
>
Or some combination of both...
-Rob
p.s. Why did I use (UNIX:UNIX-DUP 0) as the fd instead of just 0?
Well... When you close an fd-stream [which WITH-OPEN-STREAM will do,
of course], CMUCL will also close the underlying Unix file descriptor,
which if it were the "real" fd #0 would cause subsequent reads to its
own *STANDARD-INPUT* to fail and then you'd be in a world of hurt.
[About the only way out at that point is to call (UNIX:UNIX-EXIT).]
-----
Rob Warnock, PP-ASEL-IA <····@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
····@rpw3.org (Rob Warnock) wrote in message news:<······················@speakeasy.net>...
> Adam Warner <······@consulting.net.nz> wrote:
> +---------------
> | Hi Thibault Langlois,
> | > The program should read binary data from standard input but read-byte
> | > gives an error if the stream is not open with :element-type set to
> | > unsigned-byte.
> |
> | First you should always tell people your implementation. If it's CLISP the
> | developers made a decision that one is not allowed to read binary data from
> | STDIN.
> +---------------
>
> Not just CLISP, CMUCL complains as well:
>
> > (read-byte *standard-input*)
>
> #<Stream for Standard Input> is not a binary input stream.
> Restarts:
> 0: [ABORT] Return to Top-Level.
>
In Lispworks too:
CL-USER 1 > (read-byte *standard-input*)
Error: No applicable methods for #<STANDARD-GENERIC-FUNCTION
STREAM:STREAM-READ-BYTE 20787A72> with args (#<EDITOR::RUBBER-STREAM
#<EDITOR:BUFFER CAPI interactive-pane 2>>)
1 (continue) Call #<STANDARD-GENERIC-FUNCTION
STREAM:STREAM-READ-BYTE 20787A72> again
2 (abort) Return to level 0.
3 Return to top loop level 0.
Type :b for backtrace, :c <option number> to proceed, or :? for other
options
> +---------------
> | I think the decision is unfortunate and it affects the efficiency
> | of CLISP for CGI programming. The workaround is to read the data using
> | a faithfully reproducing character set with Unix end-of-lines. This will
> | result in a character stream that is byte identical to the binary stream.
> | ISO-8859-1 is a good choice for the faithfully reproducing character set.
> |
> | Details of how to get this working are set out in the CL Cookbook:
> | <http://cl-cookbook.sourceforge.net/io.html>
> +---------------
>
> Well, maybe, though AFAICT that URL says nothing about "faithful input";
> it only talks about "faithful output".
>
> But Adam's main point is correct: You need to specify your implementation.
> For example, in CMUCL-18e there are (at least) two ways to work around this:
>
I thought it was not relevant, but it is. The spec does not say what
kind of stream *standard-input* should be. If I understand correctly, it is
up to the implementation.
I am using LW and CLISP. Neither implementation allows read-byte
on *standard-input*.
It would be very handy to be able to convert a (possibly open) stream:
(let ((s (make-binary-stream *standard-input*)))
  (read-byte s))
or something like this (which fails on clisp):
[4]> (SETF (STREAM-ELEMENT-TYPE *standard-input*) '(UNSIGNED-BYTE 8))
*** - (SETF STREAM-ELEMENT-TYPE) on #<IO TERMINAL-STREAM> is illegal
1. Break [5]>
This is a small test for Tim's solution:
(defun my-ugly-test ()
  (print (lisp-implementation-type))
  (print (lisp-implementation-version))
  (let* ((file (progn
                 (with-open-file (f "/tmp/test.bin"
                                    :direction :output
                                    :element-type 'unsigned-byte
                                    :if-exists :supersede)
                   (loop for b from 0 to 255 do (write-byte b f)))
                 "/tmp/test.bin"))
         (v-code (with-open-file (f file :direction :input)
                   (loop for c = (read-char f nil nil)
                         when c collect (char-code c)
                         while c)))
         (v-byte (with-open-file (f file :direction :input
                                         :element-type 'unsigned-byte)
                   (loop for c = (read-byte f nil nil)
                         when c collect c
                         while c))))
    (remove-if #'(lambda (a) (= (first a) (rest a)))
               (mapcar #'cons v-byte v-code))))
Clisp:
[7]> (my-ugly-test)
"CLISP"
"2.27 (released 2001-07-17) (built 3214131121) (memory 3250927427)"
((13 . 10))
In CLISP, if the byte is 13, the code returned by
(char-code (read-char stream)) for this byte is 10. If the byte is 10,
the value returned is also 10.
Why?
How can I differentiate bytes 10 and 13 with CLISP?
Allegro and Lispworks are ok:
Allegro:
USER(4): (my-ugly-test)
"Allegro CL Trial Edition"
"5.0 [Linux/X86] (8/29/98 10:57)"
NIL
Lispworks:
CL-USER 57 > (my-ugly-test)
"LispWorks"
"4.2.6"
NIL
So, Tim's solution works at least with lispworks and allegro.
Unfortunately I use clisp for cgi scripts...
Thibault
> * In message <····························@posting.google.com>
> * On the subject of "Re: read-byte and *standard-input*"
> * Sent on 2 May 2003 09:03:04 -0700
> * Honorable ·················@di.fc.ul.pt (Thibault Langlois) writes:
>
> "CLISP"
> "2.27 (released 2001-07-17) (built 3214131121) (memory 3250927427)"
this is rather old...
> in clisp, if byte is 13 the code returned by (char-code (read-char
> stream)) for this byte is 10. if byte is 10 the value returned is 10.
> Why ? How can I differentiate bytes 10 and 13 with clisp ?
CLISP READ-CHAR reads both bytes 10 (LF) and 13 (CR) as #\Newline,
which has CHAR-CODE of 10.
Consider a file foo.dos created like this:
(with-open-file (s "foo.dos" :direction :output
                   :element-type '(unsigned-byte 8))
  (write-sequence
   (mapcar #'char-code
           '(#\f #\o #\o #\Newline #\b #\a #\r #\Return #\Newline))
   s))
now, what should
(with-open-file (s "foo.dos" :direction :input
                   :element-type 'character
                   :external-format :dos)
  (read-line s))
return?
You might reasonably argue that the right string to return is
"foo\nbar". Unfortunately, "\n" (== (code-char 10)) is #\Newline in
CLISP, so READ-LINE would return a string with an embedded newline,
which, if not outright non-compliant, would be quite surprising to a
user.
Because of this problem, CLISP reads CR, LF and CRLF as #\Newline.
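To make the consequence concrete, here is a small model of that
normalization applied to a vector of bytes. It is only an illustration
of the rule stated above, not CLISP's actual reader code, and the
function name NORMALIZE-LINE-ENDS is hypothetical:

```lisp
;; Sketch only: models "CR, LF and CRLF all become #\Newline (code 10)".
(defun normalize-line-ends (octets)
  "Return a list of codes with CR, LF and CRLF each collapsed to 10."
  (loop with n = (length octets)
        with i = 0
        while (< i n)
        collect (let ((b (aref octets i)))
                  (cond ((and (= b 13)
                              (< (1+ i) n)
                              (= (aref octets (1+ i)) 10))
                         ;; CRLF: consume both bytes, yield one newline.
                         (incf i 2)
                         10)
                        ((= b 13)
                         ;; Lone CR: yield a newline.
                         (incf i)
                         10)
                        (t
                         ;; Anything else, including LF, passes through.
                         (incf i)
                         b)))))

(normalize-line-ends #(13))   ; => (10)
(normalize-line-ends #(10))   ; => (10)
```

Since bytes 13 and 10 both come out as 10, the original byte cannot be
recovered from the character stream, which is what the test reported.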
--
Sam Steingold (http://www.podval.org/~sds) running RedHat9 GNU/Linux
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.palestine-central.com/links.html>
If brute force does not work, you are not using enough.
"Thibault Langlois" <·················@di.fc.ul.pt> wrote
> In Lispworks too:
> CL-USER 1 > (read-byte *standard-input*)
>
> Error: No applicable methods for #<STANDARD-GENERIC-FUNCTION
> STREAM:STREAM-READ-BYTE 20787A72> with args (#<EDITOR::RUBBER-STREAM
> #<EDITOR:BUFFER CAPI interactive-pane 2>>)
In LW it works for file and socket streams, where it's useful. It doesn't make
much sense to do this with an editor input stream.
I asked Xanalys for this functionality because I needed it for cl-pdf and web
applications with mod_lisp, which have mixed text/binary contents. (It's now
in the standard LW.) I think bivalent streams should be present in all
implementations, as they are really needed these days.
Marc