read-byte and *standard-input*

From: Thibault Langlois
Subject: read-byte and *standard-input*
Date: Thu, 01 May 2003 23:33:38 +0000
Message-ID: <1fff13b3.0305011533.7d5532a@posting.google.com>

Hello,

I want to process binary data sent by an HTML form (using the POST method),
like this:

<form ACTION="http://xxxx.cgi" METHOD=POST
enctype='multipart/form-data'>
<input NAME="zipfile" TYPE="FILE" SIZE=40 >
</form>

The program should read binary data from standard input, but READ-BYTE
signals an error if the stream was not opened with :element-type set to
(unsigned-byte 8).

How can I solve this problem?

Thibault

From: Tim Daly, Jr.
Subject: Re: read-byte and *standard-input*
Date: Fri, 02 May 2003 01:33:51 +0000
Message-ID: <87n0i6m8sg.fsf@tenkan.org>

·················@di.fc.ul.pt (Thibault Langlois) writes:

> Hello,
> 
> I want to process binary data sent by an html form (using post method)
> like this:
> 
> <form ACTION=http://xxxx.cgi METHOD=POST
> enctype='multipart/form-data'>
> <input NAME="zipfile" TYPE="FILE" SIZE=40 >
> </form>
> 
> The program should read binary data from standard input but read-byte
> gives an error if the stream is not open with :element-type set to
> unsigned-byte.
> 
> How can I solve this problem ?
> 
> Thibault

Well, the first thing that comes to mind is (CHAR-CODE (READ-CHAR)).
That's ugly, might not be correct in all cases (it depends on how the
stream decodes bytes into characters), and won't perform well -- but it
ought to get the job done.
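Tim's (CHAR-CODE (READ-CHAR)) idea can be wrapped up as a pair of helpers. This is a minimal sketch, assuming the stream's external format maps every byte to exactly one character with the same code (for example ISO-8859-1 with no line-ending translation); READ-BYTE-VIA-CHAR and READ-BYTES-VIA-CHAR are hypothetical names, not part of any implementation.

```lisp
;;; Hypothetical helpers emulating READ-BYTE on a character stream.
;;; ASSUMPTION: each byte decodes to one character whose CHAR-CODE
;;; equals the byte value (e.g. ISO-8859-1, no CR/LF translation).

(defun read-byte-via-char (stream &optional (eof-error-p t) eof-value)
  "Read one character from STREAM and return its CHAR-CODE as an octet.
Signals an error if the code does not fit in 8 bits, which means the
stream's external format is not a one-byte-per-character encoding."
  (let ((char (read-char stream eof-error-p nil)))
    (cond ((null char) eof-value)
          ((< (char-code char) 256) (char-code char))
          (t (error "Character ~S does not map to a single byte." char)))))

(defun read-bytes-via-char (stream count)
  "Read up to COUNT octets from STREAM into a fresh (UNSIGNED-BYTE 8)
vector with a fill pointer, stopping early at end of file."
  (let ((buffer (make-array count :element-type '(unsigned-byte 8)
                                  :fill-pointer 0)))
    (loop for i from 0 below count
          for byte = (read-byte-via-char stream nil nil)
          while byte
          do (vector-push byte buffer))
    buffer))
```

In a CGI script one might call (read-bytes-via-char *standard-input* n), with n taken from the CONTENT_LENGTH environment variable (how to read it is implementation-specific). The result is only trustworthy if the assumption above holds for the implementation's default encoding.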

-Tim

--

From: Sam Steingold
Subject: Re: read-byte and *standard-input*
Date: Fri, 02 May 2003 15:52:27 +0000
Message-ID: <m3d6j12vmz.fsf@loiso.podval.org>

> * In message <···························@posting.google.com>
> * On the subject of "read-byte and *standard-input*"
> * Sent on 1 May 2003 16:33:38 -0700
> * Honorable ·················@di.fc.ul.pt (Thibault Langlois) writes:
>
> The program should read binary data from standard input but read-byte
> gives an error if the stream is not open with :element-type set to
> unsigned-byte.

this is normal: you can only use READ-BYTE on binary streams, so you
have to open your input stream in binary mode or change the stream
element type before calling READ-BYTE:

$ ./clisp -q -norc bin-stdio.lisp < bin-stdio.lisp
*STANDARD-INPUT*=#<INPUT BUFFERED FILE-STREAM CHARACTER #P"/dev/fd/0" @1>
*STANDARD-INPUT*=#<INPUT BUFFERED FILE-STREAM (UNSIGNED-BYTE 8) #P"/dev/fd/0">
read 5 bytes:
#(40 102 111 114 109)="(form"
$ cat bin-stdio.lisp
(format t "~s=~s~%" '*standard-input* *standard-input*)
(setf (stream-element-type *standard-input*) '(unsigned-byte 8))
(format t "~s=~s~%" '*standard-input* *standard-input*)
(defparameter *buf* (make-array 5 :element-type '(unsigned-byte 8)))
(format t "read ~:d bytes:~%~s=~s~%" (read-sequence *buf* *standard-input*)
        *buf* (convert-string-from-bytes *buf* charset:utf-8))
(quit)
$

if you would like to test this interactively, when *STANDARD-INPUT*
points to the terminal stream (whose element type cannot be changed),
you have to play a small trick:

(with-open-file (*standard-input* "/dev/fd/0"
                    :element-type '(unsigned-byte 8))
   (print (read-byte *standard-input*)))
h
==> 104 


-- 
Sam Steingold (http://www.podval.org/~sds) running RedHat9 GNU/Linux
<http://www.camera.org> <http://www.iris.org.il> <http://www.memri.org/>
<http://www.mideasttruth.com/> <http://www.palestine-central.com/links.html>
What garlic is to food, insanity is to art.

From: Adam Warner
Subject: Re: read-byte and *standard-input*
Date: Fri, 02 May 2003 05:05:03 +0000
Message-ID: <pan.2003.05.02.05.05.02.93117@consulting.net.nz>

Hi Thibault Langlois,

> Hello,
> 
> I want to process binary data sent by an html form (using post method)
> like this:
> 
> <form ACTION=http://xxxx.cgi METHOD=POST
> enctype='multipart/form-data'>
> <input NAME="zipfile" TYPE="FILE" SIZE=40 >
> </form>
> 
> The program should read binary data from standard input but read-byte
> gives an error if the stream is not open with :element-type set to
> unsigned-byte.
> 
> How can I solve this problem ?

First, you should always tell people which implementation you are using.
If it's CLISP, the developers decided that one is not allowed to read
binary data from STDIN. I think the decision is unfortunate, and it hurts
the efficiency of CLISP for CGI programming. The workaround is to read the
data using a faithfully reproducing character set (one in which every byte
value maps to exactly one character and back) with Unix end-of-lines. This
will result in a character stream that is byte-for-byte identical to the
binary stream. ISO-8859-1 is a good choice for the faithfully reproducing
character set.
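The "faithfully reproducing character set" property can be checked on whole buffers: decoding octets as ISO-8859-1 gives one character per octet with the same code, so encoding the string again yields the original bytes. A minimal sketch, assuming the implementation's character codes 0-255 coincide with ISO-8859-1 (true of Unicode-based Lisps, not guaranteed by ANSI CL); the function names are hypothetical. On a real stream the external format must also leave line endings untouched, which is why the Unix end-of-line setting matters.

```lisp
;;; Hypothetical helpers illustrating the ISO-8859-1 round trip.
;;; ASSUMPTION: (CODE-CHAR n) for 0 <= n < 256 is the ISO-8859-1
;;; character n, as in Unicode-based implementations.

(defun octets-to-latin1-string (octets)
  "Decode a sequence of octets as ISO-8859-1, one character per octet."
  (map 'string #'code-char octets))

(defun latin1-string-to-octets (string)
  "Encode STRING back to octets; every character must have a code below 256."
  (map '(vector (unsigned-byte 8)) #'char-code string))
```

Round-tripping all 256 byte values through these two functions returns the same list, which is exactly the "byte identical" claim above, minus any line-ending conversion a stream might add.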

Details of how to get this working are set out in the CL Cookbook:
<http://cl-cookbook.sourceforge.net/io.html>

Regards,
Adam

From: Rob Warnock
Subject: Re: read-byte and *standard-input*
Date: Fri, 02 May 2003 09:32:26 +0000
Message-ID: <WKycnbrGnNy3pi-jXTWc-g@speakeasy.net>

Adam Warner <······@consulting.net.nz> wrote:
+---------------
| Hi Thibault Langlois,
| > The program should read binary data from standard input but read-byte
| > gives an error if the stream is not open with :element-type set to
| > unsigned-byte.
| 
| First you should always tell people your implementation. If it's CLISP the
| developers made a decision that one is not allowed to read binary data from
| STDIN.
+---------------

Not just CLISP, CMUCL complains as well:

	> (read-byte *standard-input*)

	#<Stream for Standard Input> is not a binary input stream.
	Restarts:
	  0: [ABORT] Return to Top-Level.

+---------------
| I think the decision is unfortunate and it affects the efficiency
| of CLISP for CGI programming. The workaround is to read the data using
| a faithfully reproducing character set with Unix end-of-lines. This will
| result in a character stream that is byte identical to the binary stream.
| ISO-8859-1 is a good choice for the faithfully reproducing character set.
| 
| Details of how to get this working are set out in the CL Cookbook:
| <http://cl-cookbook.sourceforge.net/io.html>
+---------------

Well, maybe, though AFAICT that URL says nothing about "faithful input";
it only talks about "faithful output".

But Adam's main point is correct: You need to specify your implementation.
For example, in CMUCL-18e there are (at least) two ways to work around this:

1. Use the CMUCL-specific function SYSTEM:READ-N-BYTES with *STANDARD-INPUT*
   as the stream argument [lines tagged "T:" are typed input]:

	> (defvar *buf* (make-array 10 :element-type '(unsigned-byte 8)))
	*BUF*
	> *buf*
	#(0 0 0 0 0 0 0 0 0 0)
	> (system:read-n-bytes *standard-input* *buf* 0 6 nil)
   T:	hello!
	6
	> *buf*
	#(104 101 108 108 111 33 0 0 0 0)
	>

2. Use the CMUCL-specific function SYSTEM:MAKE-FD-STREAM, since Unix
   standard input is always file descriptor #0:

	> (with-open-stream (s (system:make-fd-stream
				(unix:unix-dup 0)
				:element-type '(unsigned-byte 8)))
	    (loop for i = (read-byte s)
                  collect i
                  until (= i 10)))
   T:	hello, there!
	(104 101 108 108 111 44 32 116 104 101 114 101 33 10)
	> 

Or some combination of both...

-Rob

p.s. Why did I use (UNIX:UNIX-DUP 0) as the fd instead of just 0?
Well... When you close an fd-stream [which WITH-OPEN-STREAM will do,
of course], CMUCL will also close the underlying Unix file descriptor,
which if it were the "real" fd #0 would cause subsequent reads to its
own *STANDARD-INPUT* to fail and then you'd be in a world of hurt.
[About the only way out at that point is to call (UNIX:UNIX-EXIT).]

-----
Rob Warnock, PP-ASEL-IA		<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Thibault Langlois
Subject: Re: read-byte and *standard-input*
Date: Fri, 02 May 2003 16:03:04 +0000
Message-ID: <1fff13b3.0305020803.3e8f554f@posting.google.com>

····@rpw3.org (Rob Warnock) wrote in message news:<······················@speakeasy.net>...
> Adam Warner <······@consulting.net.nz> wrote:
> +---------------
> | Hi Thibault Langlois,
> | > The program should read binary data from standard input but read-byte
> | > gives an error if the stream is not open with :element-type set to
> | > unsigned-byte.
> | 
> | First you should always tell people your implementation. If it's CLISP the
> | developers made a decision that one is not allowed to read binary data from
> | STDIN.
> +---------------
> 
> Not just CLISP, CMUCL complains as well:
> 
> 	> (read-byte *standard-input*)
> 
> 	#<Stream for Standard Input> is not a binary input stream.
> 	Restarts:
> 	  0: [ABORT] Return to Top-Level.
> 

In LispWorks too:
CL-USER 1 > (read-byte *standard-input*)

Error: No applicable methods for #<STANDARD-GENERIC-FUNCTION
STREAM:STREAM-READ-BYTE 20787A72> with args (#<EDITOR::RUBBER-STREAM
#<EDITOR:BUFFER CAPI interactive-pane 2>>)
  1 (continue) Call #<STANDARD-GENERIC-FUNCTION
STREAM:STREAM-READ-BYTE 20787A72> again
  2 (abort) Return to level 0.
  3 Return to top loop level 0.

Type :b for backtrace, :c <option number> to proceed,  or :? for other
options


> +---------------
> | I think the decision is unfortunate and it affects the efficiency
> | of CLISP for CGI programming. The workaround is to read the data using
> | a faithfully reproducing character set with Unix end-of-lines. This will
> | result in a character stream that is byte identical to the binary stream.
> | ISO-8859-1 is a good choice for the faithfully reproducing character set.
> | 
> | Details of how to get this working are set out in the CL Cookbook:
> | <http://cl-cookbook.sourceforge.net/io.html>
> +---------------
> 
> Well, maybe, though AFAICT that URL says nothing about "faithful input";
> it only talks about "faithful output".
> 
> But Adam's main point is correct: You need to specify your implementation.
> For example, in CMUCL-18e there are (at least) two ways to work around this:
>

I thought it was not relevant, but it is. The spec does not say what
kind of stream *standard-input* should be; if I understand correctly,
it is up to the implementation.
I am using LW and CLISP. Neither implementation allows READ-BYTE
on *standard-input*.
It would be very handy to be able to convert an already open stream,
with something like a hypothetical MAKE-BINARY-STREAM:

(let ((s (make-binary-stream *standard-input*)))
   (read-byte s))

or to do something like this (which fails in CLISP):
[4]> (SETF (STREAM-ELEMENT-TYPE *standard-input*) '(UNSIGNED-BYTE 8))
 
*** - (SETF STREAM-ELEMENT-TYPE) on #<IO TERMINAL-STREAM> is illegal
1. Break [5]>

This is a small test for Tim's solution:

(defun my-ugly-test ()
   (print (lisp-implementation-type))
   (print (lisp-implementation-version))
   (let* ((file (progn
                  ;; write every byte value 0..255 once
                  (with-open-file (f "/tmp/test.bin" :direction :output
                                     :element-type '(unsigned-byte 8)
                                     :if-exists :supersede)
                     (loop for b from 0 to 255 do (write-byte b f)))
                     "/tmp/test.bin"))
          ;; codes obtained via READ-CHAR (Tim's suggestion)
          (v-code (with-open-file (f file :direction :input)
                     (loop for c = (read-char f nil nil)
                           when c collect (char-code c)
                           while c)))
          ;; the same file read as raw octets
          (v-byte (with-open-file (f file :direction :input
                                   :element-type '(unsigned-byte 8))
                     (loop for c = (read-byte f nil nil)
                           when c collect c
                           while c))))
      ;; keep only the (byte . code) pairs that disagree
      (remove-if #'(lambda (a) (= (first a) (rest a)))
           (mapcar #'cons v-byte v-code))))

Clisp:

[7]> (my-ugly-test)
 
"CLISP"
"2.27 (released 2001-07-17) (built 3214131121) (memory 3250927427)"
((13 . 10))

In CLISP, if the byte is 13, the code returned by (char-code (read-char
stream)) for that byte is 10; if the byte is 10, the value returned is
also 10. Why?
How can I differentiate bytes 10 and 13 with CLISP?

Allegro and LispWorks are OK:

Allegro:

USER(4): (my-ugly-test)
 
"Allegro CL Trial Edition"
"5.0 [Linux/X86] (8/29/98 10:57)"
NIL

Lispworks:

CL-USER 57 > (my-ugly-test)

"LispWorks" 
"4.2.6" 
NIL

So Tim's solution works at least with LispWorks and Allegro.
Unfortunately, I use CLISP for CGI scripts...

Thibault

 
From: Sam Steingold
Subject: Re: read-byte and *standard-input*
Date: Fri, 02 May 2003 16:49:33 +0000
Message-ID: <m31xzh2szu.fsf@loiso.podval.org>

> * In message <····························@posting.google.com>
> * On the subject of "Re: read-byte and *standard-input*"
> * Sent on 2 May 2003 09:03:04 -0700
> * Honorable ·················@di.fc.ul.pt (Thibault Langlois) writes:
>
> "CLISP"
> "2.27 (released 2001-07-17) (built 3214131121) (memory 3250927427)"

this is rather old...

> in clisp, if byte is 13 the code returned by (char-code (read-char
> stream)) for this byte is 10. if byte is 10 the value returned is 10.
> Why ?  How can I differentiate bytes 10 and 13 with clisp ?

CLISP READ-CHAR reads both bytes 10 (LF) and 13 (CR) as #\Newline,
which has CHAR-CODE of 10.

Consider a file foo.dos created like this:

(with-open-file (s "foo.dos" :direction :output
                 :element-type '(unsigned-byte 8))
  (write-sequence
   (mapcar #'char-code
           '(#\f #\o #\o #\Newline #\b #\a #\r #\Return #\Newline))
  s))

now, what should

(with-open-file (s "foo.dos" :direction :input
                 :element-type 'character
                 :external-format :dos)
  (read-line s))

return?

You might reasonably argue that the right string to return is
"foo\nbar" (i.e., "foo", a linefeed, then "bar").  Unfortunately, "\n"
(== (code-char 10)) is #\Newline in CLISP, so READ-LINE would return a
string with an embedded newline, which, if not outright non-compliant,
would be quite surprising to a user.

Because of this problem, CLISP reads CR, LF and CRLF as #\Newline.
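A toy model of that rule (not CLISP's actual code) makes the consequence concrete: a lone CR, a lone LF and a CRLF pair all decode to one #\Newline with code 10, so once the characters are produced the original byte can no longer be recovered. FOLD-LINE-ENDINGS is a hypothetical name.

```lisp
;;; Toy model of the input rule described above (not CLISP's source):
;;; a lone CR (13), a lone LF (10) and a CRLF pair all become one
;;; #\Newline, whose CHAR-CODE is 10.
(defun fold-line-endings (octets)
  "Return the list of character codes that a CR/LF/CRLF-folding
decoder would produce from the list of octets OCTETS."
  (let ((result '()))
    (loop while octets
          do (let ((byte (pop octets)))
               (cond ((= byte 13)
                      ;; CR: also swallow an immediately following LF,
                      ;; so that a CRLF pair yields a single newline.
                      (when (and octets (= (first octets) 10))
                        (pop octets))
                      (push 10 result))
                     ;; LF and every other byte pass through unchanged.
                     (t (push byte result)))))
    (nreverse result)))
```

For example, (fold-line-endings '(102 13 98 10 99 13 10 100)) returns (102 10 98 10 99 10 100): bytes 13 and 10 both end up as 10, and the CRLF pair shrinks to a single element. This is consistent with the ((13 . 10)) result Thibault's test reported, and with reading the data as (UNSIGNED-BYTE 8) being the reliable way to keep the two apart.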

-- 
If brute force does not work, you are not using enough.

From: Marc Battyani
Subject: Re: read-byte and *standard-input*
Date: Fri, 02 May 2003 19:23:03 +0000
Message-ID: <ABDABA5D7AE00810.2124640825C881E2.3221B2496A2A2D6C@lp.airnews.net>

"Thibault Langlois" <·················@di.fc.ul.pt> wrote

> In Lispworks too:
> CL-USER 1 > (read-byte *standard-input*)
>
> Error: No applicable methods for #<STANDARD-GENERIC-FUNCTION
> STREAM:STREAM-READ-BYTE 20787A72> with args (#<EDITOR::RUBBER-STREAM
> #<EDITOR:BUFFER CAPI interactive-pane 2>>)

In LW, READ-BYTE works on file and socket streams, which is where it's
useful. It doesn't make much sense to do this with an editor input stream.
I asked Xanalys for this functionality because I needed it for cl-pdf and
for web applications with mod_lisp, which have mixed text/binary contents.
(It's now in standard LW.) I think bivalent streams (streams that accept
both character and byte I/O) should be present in all implementations, as
they are really needed these days.
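The idea of a bivalent stream can be illustrated with a toy in-memory reader whose byte reads and character reads share one position, so a program can read text lines (say, MIME part headers) and then take the binary body that follows. This is only a sketch under stated assumptions, not LispWorks' implementation: characters are decoded as ISO-8859-1, one octet each, and all names are hypothetical.

```lisp
;;; Toy bivalent reader over an octet vector (hypothetical names).
;;; ASSUMPTION: characters are ISO-8859-1, i.e. (CODE-CHAR octet).
(defstruct (octet-source (:constructor make-octet-source (octets)))
  (octets #() :type vector)
  (index 0 :type fixnum))

(defun source-read-octet (source)
  "Return the next octet of SOURCE and advance, or NIL at the end."
  (let ((pos (octet-source-index source))
        (octets (octet-source-octets source)))
    (when (< pos (length octets))
      (setf (octet-source-index source) (1+ pos))
      (aref octets pos))))

(defun source-read-char (source)
  "Return the next octet decoded as a character, or NIL at the end."
  (let ((octet (source-read-octet source)))
    (and octet (code-char octet))))

(defun source-read-line (source)
  "Read characters up to a LF (octet 10), dropping a trailing CR."
  (let ((chars '()))
    (loop for octet = (source-read-octet source)
          until (or (null octet) (= octet 10))
          do (push (code-char octet) chars))
    (when (and chars (char= (first chars) (code-char 13)))
      (pop chars))
    (coerce (nreverse chars) 'string)))
```

Because all three readers advance the same index, a CRLF-terminated text line can be followed by raw octets without losing or re-decoding any bytes, which is the property mixed text/binary protocols need.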

Marc