Corman Lisp and binary files

From: Mark Carter
Subject: Corman Lisp and binary files
Date: Sat, 08 Apr 2006 10:48:39 +0000
Message-ID: <44379502$0$15789$14726298@news.sunsite.dk>

I'm trying to process a binary file using Corman Lisp; but I'm just not 
getting any loving. I stripped down my code to a test script:


(defmacro wif (stream filename &rest body)
   `(with-open-file (,stream ,filename :direction :input
                     ;; '(unsigned-byte 8) doesn't work in Corman
                     :element-type 'character)
      ,@body))

(defun main (filename)
   (wif fin filename
        (let ((buffer (make-array 4096
				 ;;:initial-element #\
				 :element-type 'character
				 ))
		  (bytes-read-sum 0))
	      (loop with bytes-read do
		    (setf bytes-read (read-sequence  buffer fin))
		    until (= 0 bytes-read)
		    do
		    (setf bytes-read-sum (+ bytes-read-sum bytes-read))
		    		
	      )
	      (write-line (format nil   "Total bytes ~A" bytes-read-sum))
	      )
	    ))

(main "somefile")


I've tried all sorts of combinations of array and input types, but the 
number of bytes apparently read is less than the actual number of bytes 
in the file. I've tried to pin it down to something like the number of 
CR or LF in a file, but nothing seems to correspond. Can anyone shed 
some light on the mystery?

From: Ken Tilton
Subject: Re: Corman Lisp and binary files
Date: Sat, 08 Apr 2006 13:55:04 +0000
Message-ID: <ZgPZf.3$K43.0@fe10.lga>

Mark Carter wrote:
> I'm trying to process a binary file using Corman Lisp; but I'm just not 
> getting any loving. I stripped down my code to a test script:
> 
> 
> (defmacro wif (stream filename &rest body)
>   `(with-open-file (,stream ,filename :direction :input
>                 :element-type   'character ; '(unsigned-byte 8) ;doesn't 
> work in Corman
>                 )
>            ,@body))
> 
> (defun main (filename)
>   (wif fin filename
>        (let ((buffer (make-array 4096
>                  ;;:initial-element #\
>                  :element-type 'character
>                  ))
>           (bytes-read-sum 0))
>           (loop with bytes-read do
>             (setf bytes-read (read-sequence  buffer fin))
>             until (= 0 bytes-read)
>             do
>             (setf bytes-read-sum (+ bytes-read-sum bytes-read))
>                    
>           )
>           (write-line (format nil   "Total bytes ~A" bytes-read-sum))
>           )
>         ))
> 
> (main "somefile")
> 
> 
> I've tried all sort of combinations of array and input types, but the 
> number of bytes apparently read is less than the actual number of bytes 
> in the file. I've tried to pin it down to something like the number of 
> CR or LF in a file, but nothing seems to correspond. Can anyone shed 
> some light on the mystery?

No. <g> But as a crazy stab, have you tried telling read-sequence how 
many bytes to read each time? If there is a problem with Corman, that 
might be a workaround.

Btw, you say the numbers do not correspond without saying how they do 
not. You should give that info up. Is it off by one byte? 100k? How big 
/is/ the file. How do you know? etc etc.

Finally, a loop note:

    (loop for bytes-read = (r-s b f) then (r-s b f)
	while (plusp bytes-read)
	summing bytes-read)
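
Written out in full, the loop above might look like the sketch below. `count-elements` is a hypothetical name, not something from the thread. A `for x = form` clause without `then` re-evaluates the form on every iteration, so the repeated `then` form can be dropped.

```lisp
;; Hypothetical helper: sum the counts returned by READ-SEQUENCE
;; until it reports that nothing more was read.
(defun count-elements (stream buffer)
  (loop for n = (read-sequence buffer stream)
        while (plusp n)
        summing n))

;; Usage with an in-memory character stream and a 4-element buffer:
;; (with-input-from-string (s "hello world")
;;   (count-elements s (make-string 4)))
;; => 11
```

The same function should work on a file stream, provided the buffer's element type matches the stream's.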

ken

-- 
Cells: http://common-lisp.net/project/cells/

"Have you ever been in a relationship?"
    Attorney for Mary Winkler, confessed killer of her
    minister husband, when asked if the couple had
    marital problems.

From: Mark Carter
Subject: Re: Corman Lisp and binary files
Date: Sat, 08 Apr 2006 18:32:57 +0000
Message-ID: <443801d4$0$15790$14726298@news.sunsite.dk>

Ken Tilton wrote:
> 
> 
> Mark Carter wrote:
> 
>> I'm trying to process a binary file using Corman Lisp; but I'm just 
>> not getting any loving. I stripped down my code to a test script:
>>
>>
>> (defmacro wif (stream filename &rest body)
>>   `(with-open-file (,stream ,filename :direction :input
>>                 :element-type   'character ; '(unsigned-byte 8) 
>> ;doesn't work in Corman
>>                 )
>>            ,@body))
>>
>> (defun main (filename)
>>   (wif fin filename
>>        (let ((buffer (make-array 4096
>>                  ;;:initial-element #\
>>                  :element-type 'character
>>                  ))
>>           (bytes-read-sum 0))
>>           (loop with bytes-read do
>>             (setf bytes-read (read-sequence  buffer fin))
>>             until (= 0 bytes-read)
>>             do
>>             (setf bytes-read-sum (+ bytes-read-sum bytes-read))
>>                              )
>>           (write-line (format nil   "Total bytes ~A" bytes-read-sum))
>>           )
>>         ))
>>
>> (main "somefile")
>>
>>
>> I've tried all sort of combinations of array and input types, but the 
>> number of bytes apparently read is less than the actual number of 
>> bytes in the file. I've tried to pin it down to something like the 
>> number of CR or LF in a file, but nothing seems to correspond. Can 
>> anyone shed some light on the mystery?
> 
> 
> No. <g> But as a crazy stab, have you tried telling read-sequence how 
> many bytes to read each time? If there is a problem with Corman, that 
> might be a workaround.

I'll investigate.

> Btw, you say the numbers do not correspond without saying how they do 
> not. You should give that info up. Is it off by one byte? 100k? How big 
> /is/ the file. How do you know? etc etc.

I use the dir command - which gives an accurate reading of the number of 
bytes used. diff confirms a discrepancy. The original test file is 
289,349 bytes long. The file output by Lisp is 288,875 bytes, a 
difference of 474 bytes. The original file has 769 LFs, and 1248 CRs.

Dabbling with some numbers, Lisp reports that it thinks it wrote 288106 
bytes. Now, 288875-288106 is 769, which kinda indicates that Corman is 
dropping the LFs in its count. Not sure this proves anything yet, though.

The disappointing thing is that I had originally written a Pascal 
program to do what I wanted in probably about half an hour (and I only 
decided to pick up on Pascal about a week ago), and decided to translate 
it to Lisp to see how I got on. And now I know. And the Pascal program 
works fine.

From: Mark Carter
Subject: Re: Corman Lisp and binary files
Date: Sat, 08 Apr 2006 19:33:22 +0000
Message-ID: <44380ffd$0$15788$14726298@news.sunsite.dk>

Mark Carter wrote:

> The disappointing thing is that I had originally written a Pascal 
> program to do what I wanted in probably about half an hour (and I only 
> decided to pick up on Pascal about a week ago), and decided to translate 
> it to Lisp to see how I got on. And now I know. And the Pascal program 
> works fine.

After extensive fiddling, I've managed to get it to work. Basically, one 
needs to make everything an unsigned-byte. So, something like:

(defmacro wif (stream filename &rest body)
   `(with-open-file (,stream ,filename :direction :input
			    :element-type   'unsigned-byte

			    )
		   ,@body))

(defmacro wof (stream filename &rest body)
   `(with-open-file (,stream ,filename :direction :output
			    :element-type 'unsigned-byte
			    :if-exists :overwrite)
		   ,@body))

(defun main (filename)
   (wif fin filename
        (wof fout (string+ filename ".rot")
	    (let ((buffer (make-array 4096
				      :initial-element 0
				      :element-type 'unsigned-byte
				      ))
		  (bytes-read-sum 0))
	      (loop with bytes-read do
		    (setf bytes-read (read-sequence  buffer fin))
		    until (= 0 bytes-read)
		    do
		    (setf bytes-read-sum (+ bytes-read-sum bytes-read))
		    (setf buffer (loop for b in (coerce buffer 'list)
				       collect (transform b)))
		    ;(check (= (length buffer) bytes-read))
		    (write-sequence buffer fout :end bytes-read)
		
	      )
	      (write bytes-read-sum)
	      )
	    ))
   (write-line "finished")
)

Took me long enough to figure it all out - and the Lisp code isn't as 
fast as the Pascal version, either.
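
The loop above converts the buffer to a list and conses a fresh list on every pass, which is a plausible reason for the slowness. A minimal sketch of an in-place alternative follows. The `transform` shown is only a stand-in (the real one was not posted), `transform-file` is a hypothetical name, and the element type `(unsigned-byte 8)` is an assumption; under Corman the plain `unsigned-byte` used above may be needed instead.

```lisp
;; Stand-in for the unposted TRANSFORM: flips every bit of an octet.
(defun transform (b)
  (logxor b #xFF))

;; Hypothetical name. Copies IN-NAME to OUT-NAME, applying TRANSFORM
;; to each octet in place, and returns the number of octets processed.
(defun transform-file (in-name out-name)
  (with-open-file (in in-name :element-type '(unsigned-byte 8))
    (with-open-file (out out-name :direction :output
                                  :element-type '(unsigned-byte 8)
                                  :if-exists :supersede)
      (let ((buffer (make-array 4096 :element-type '(unsigned-byte 8))))
        (loop for n = (read-sequence buffer in)
              while (plusp n)
              do (dotimes (i n)
                   (setf (aref buffer i) (transform (aref buffer i))))
                 (write-sequence buffer out :end n)
              sum n)))))
```

`:if-exists :supersede` replaces an existing output file, whereas `:overwrite` modifies it in place and would leave stale trailing bytes if the old file were longer.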

From: Ken Tilton
Subject: Re: Corman Lisp and binary files
Date: Sat, 08 Apr 2006 23:05:18 +0000
Message-ID: <OkXZf.100$1C4.70@fe09.lga>

Mark Carter wrote:
> Mark Carter wrote:
> 
>> The disappointing thing is that I had originally written a Pascal 
>> program to do what I wanted in probably about half an hour (and I only 
>> decided to pick up on Pascal about a week ago), and decided to 
>> translate it to Lisp to see how I got on. And now I know. And the 
>> Pascal program works fine.
> 
> 
> After extensive fiddling, I've managed to get it to work. Basically, one 
> needs to make everything an unsigned-byte. So, something like:
> 
> (defmacro wif (stream filename &rest body)
>   `(with-open-file (,stream ,filename :direction :input
>                 :element-type   'unsigned-byte
> 
>                 )
>            ,@body))
> 
> (defmacro wof (stream filename &rest body)
>   `(with-open-file (,stream ,filename :direction :output
>                 :element-type 'unsigned-byte
>                 :if-exists :overwrite)
>            ,@body))
> 
> (defun main (filename)
>   (wif fin filename
>        (wof fout (string+ filename ".rot")
>         (let ((buffer (make-array 4096
>                       :initial-element 0
>                       :element-type 'unsigned-byte
>                       ))
>           (bytes-read-sum 0))
>           (loop with bytes-read do
>             (setf bytes-read (read-sequence  buffer fin))
>             until (= 0 bytes-read)
>             do
>             (setf bytes-read-sum (+ bytes-read-sum bytes-read))
>             (setf buffer (loop for b in (coerce buffer 'list)
>                        collect (transform b)))
>             ;(check (= (length buffer) bytes-read))
>             (write-sequence buffer fout :end bytes-read)
>        
>           )
>           (write bytes-read-sum)
>           )
>         ))
>   (write-line "finished")
> )
> 
> Took me long enough to figure it all out - and the Lisp code isn't as 
> fast as the Pascal version, either.

So that is your language comparison test? Copying a binary file byte by 
byte? I think we found The Real Problem(tm).

:)

ken

From: Mark Carter
Subject: Re: Corman Lisp and binary files
Date: Sat, 08 Apr 2006 23:18:19 +0000
Message-ID: <443844b6$0$15791$14726298@news.sunsite.dk>

Ken Tilton wrote:

> So that is your language comparison test? Copying a binary file byte by 
> byte? 

What I posted was test code. I stripped out the processing. Whilst I 
didn't want to do much processing, I found the process of simply 
reading and writing a binary file to be quite tortuous.

From: Ken Tilton
Subject: Re: Corman Lisp and binary files
Date: Sun, 09 Apr 2006 01:50:29 +0000
Message-ID: <FLZZf.448$K43.319@fe10.lga>

Mark Carter wrote:
> Ken Tilton wrote:
> 
>> So that is your language comparison test? Copying a binary file byte 
>> by byte? 
> 
> 
> What I posted was test code. I stripped out the processing. Whilst I 
> didn't want to do much processing, I found the the process of simply 
> reading and writing a binary file to be quite tortuous.

I was not concerned about the code, I was concerned about the language 
comparison drawn from the exercise. Speaking of which, wouldn't C be 
even better?

ken

From: Mark Carter
Subject: Re: Corman Lisp and binary files
Date: Sun, 09 Apr 2006 07:50:24 +0000
Message-ID: <4438bcba$0$15795$14726298@news.sunsite.dk>

Ken Tilton wrote:

> Speaking of which, wouldn't C be 
> even better?

I wanted to experiment with Pascal.

From: Joe Marshall
Subject: Re: Corman Lisp and binary files
Date: Mon, 10 Apr 2006 16:21:23 +0000
Message-ID: <1144686083.759309.238500@v46g2000cwv.googlegroups.com>

Mark Carter wrote:
> Ken Tilton wrote:
>
> > Speaking of which, wouldn't C be
> > even better?
> 
> I wanted to experiment with Pascal.

Constanza or Bourguignon?

From: Ken Tilton
Subject: Re: Corman Lisp and binary files
Date: Mon, 10 Apr 2006 20:52:16 +0000
Message-ID: <GAz_f.46$3D7.37@fe09.lga>

Joe Marshall wrote:
> Mark Carter wrote:
> 
>>Ken Tilton wrote:
>>
>>
>>>Speaking of which, wouldn't C be
>>>even better?
>>
>>I wanted to experiment with Pascal.
> 
> 
> Constanza or Bourguignon?
> 

Pascal or Beef?

kt

From: Pascal Costanza
Subject: Re: Corman Lisp and binary files
Date: Mon, 10 Apr 2006 17:36:51 +0000
Message-ID: <49vjdjFqvaqcU1@individual.net>

Joe Marshall wrote:
> Mark Carter wrote:
> 
>>Ken Tilton wrote:
>>
>>>Speaking of which, wouldn't C be
>>>even better?
>>
>>I wanted to experiment with Pascal.
> 
> Constanza or Bourguignon?

Costanza, not Constanza.

Pascal

-- 
3rd European Lisp Workshop
July 3-4 - Nantes, France - co-located with ECOOP 2006
http://lisp-ecoop06.bknr.net/

From: Joe Marshall
Subject: Re: Corman Lisp and binary files
Date: Mon, 10 Apr 2006 20:32:25 +0000
Message-ID: <1144701145.730242.167700@v46g2000cwv.googlegroups.com>

Pascal Costanza wrote:
> Joe Marshall wrote:
> > Mark Carter wrote:
> >
> >>Ken Tilton wrote:
> >>
> >>>Speaking of which, wouldn't C be
> >>>even better?
> >>
> >>I wanted to experiment with Pascal.
> >
> > Constanza or Bourguignon?
>
> Costanza, not Constanza.

Mea culpa.  I tried to check the spelling on both but I guess I failed.

My apologies.

From: Pascal Costanza
Subject: Re: Corman Lisp and binary files
Date: Mon, 10 Apr 2006 21:32:32 +0000
Message-ID: <4a017fFq6ilcU3@individual.net>

Joe Marshall wrote:
> Pascal Costanza wrote:
> 
>>Joe Marshall wrote:
>>
>>>Mark Carter wrote:
>>>
>>>>Ken Tilton wrote:
>>>>
>>>>>Speaking of which, wouldn't C be
>>>>>even better?
>>>>
>>>>I wanted to experiment with Pascal.
>>>
>>>Constanza or Bourguignon?
>>
>>Costanza, not Constanza.
> 
> Mea culpa.  I tried to check the spelling on both but I guess I failed.
> 
> My apologies.

No problem. I am doomed because of this name. Almost everyone gets it 
wrong. So don't worry...

See http://www.pascalconstanza.de/ ;)


Pascal

From: Joe Marshall
Subject: Re: Corman Lisp and binary files
Date: Tue, 11 Apr 2006 16:04:12 +0000
Message-ID: <1144771452.647329.108190@e56g2000cwe.googlegroups.com>

Pascal Costanza wrote:
>
> No problem. I am doomed because of this name. Almost everyone gets it
> wrong. So don't worry...
>
> See http://www.pascalconstanza.de/ ;)

Since I wanted to spell your name right, I googled it.
The first URL, p-cos.net didn't help, but the second URL was
http://www.pascalconstanza.de/

I didn't click on it, I just assumed that you'd registered your name
with the correct spelling.

From: Pascal Costanza
Subject: Re: Corman Lisp and binary files
Date: Tue, 11 Apr 2006 16:45:28 +0000
Message-ID: <4a24p9Fr50d8U1@individual.net>

Joe Marshall wrote:
> Pascal Costanza wrote:
> 
>>No problem. I am doomed because of this name. Almost everyone gets it
>>wrong. So don't worry...
>>
>>See http://www.pascalconstanza.de/ ;)
> 
> Since I wanted to spell your name right, I googled it.
> The first URL, p-cos.net didn't help, but the second URL was
> http://www.pascalconstanza.de/
> 
> I didn't click on it, I just assumed that you'd registered your name
> with the correct spelling.

Oh dear... ;)

But wait a moment: How can you google for the correct spelling of my 
name without typing my name?!?


Pascal

From: Pascal Bourguignon
Subject: Re: Corman Lisp and binary files
Date: Tue, 11 Apr 2006 22:02:28 +0000
Message-ID: <87u08zlqiz.fsf@thalassa.informatimago.com>

Pascal Costanza <··@p-cos.net> writes:

> Joe Marshall wrote:
>> Pascal Costanza wrote:
>> 
>>>No problem. I am doomed because of this name. Almost everyone gets it
>>>wrong. So don't worry...
>>>
>>>See http://www.pascalconstanza.de/ ;)
>> Since I wanted to spell your name right, I googled it.
>> The first URL, p-cos.net didn't help, but the second URL was
>> http://www.pascalconstanza.de/
>> I didn't click on it, I just assumed that you'd registered your name
>> with the correct spelling.
>
> Oh dear... ;)
>
> But wait a moment: How can you google for the correct spelling of my
> name without typing my name?!?

Google is smart!

If it doesn't find anything with the correct spelling, it returns
results with the wrong spelling.

It even looks at the contents of the page first, not at the URL!

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"You question the worthiness of my code? I should kill you where you
stand!"

From: Shyamal Prasad
Subject: Re: Corman Lisp and binary files
Date: Sun, 09 Apr 2006 06:07:51 +0000
Message-ID: <871ww72stx.fsf@turtle.local>

>>>>> "Mark" == Mark Carter <··@privacy.net> writes:

    Mark> Mark Carter wrote:
    >> The disappointing thing is that I had originally written a
    >> Pascal program to do what I wanted in probably about half an
    >> hour (and I only decided to pick up on Pascal about a week
    >> ago), and decided to translate it to Lisp to see how I got
    >> on. And now I know. And the Pascal program works fine.

    Mark> After extensive fiddling, I've managed to get it to
    Mark> work. Basically, one needs to make everything an
    Mark> unsigned-byte. So, something like:

I'm no expert, but I believe you've hit upon what is, IMHO, one of the
few areas where Common Lisp shows its age: it has a built-in bias
towards "text" files as opposed to "binary" files. Support for binary
files is very implementation dependent, and it is hard to write any
efficient code using completely portable code. The CL standard
actually requires that text files do CR/LF conversion (I suspect this
was normal in pre-Unix days, but even I'm too young to know for
sure).

You basically hit on the solution: you need to find a type for which
the stream reader will not attempt to convert between the character
type and the actual character coding in the file. You discovered that
the type unsigned-byte does the trick in Corman (and, actually, also
in SBCL). I do not believe there is a truly portable way to write this
code in CL (because there is no concept of an 8 bit byte in CL, but I
might be wrong there). Actually, in SBCL we would usually write the
type as '(unsigned-byte 8).

    Mark> Took me long enough to figure it all out 

You did hit one of the most frustrating gaps in CL in my opinion, so
just getting it solved counts as a bit of a victory :)

    Mark> - and the Lisp code isn't as fast as the Pascal version,
    Mark> either.

I tried copying a large file (1,597,440,000 bytes)

·······@turtle:~/gen$ time cp bulk copy

real    1m51.128s
user    0m0.144s
sys     0m11.745s

and in my emacs SBCL buffer I get:

Evaluation took:
  116.272 seconds of real time
  9.988 seconds of user run time
  12.93 seconds of system run time
  0 page faults and
  21,825,752 bytes consed.
1597440000

The Lisp was only about five seconds slower than the Unix cp command (of
course, cp was paged in, SBCL was already running, but that is at most
another second or so). So I'm really not sure what you are complaining
about.....it's not the language that is the problem.

Also, as far as your complaint later in the thread that the Lisp
version was harder to read than the Pascal - that is, IMHO, mostly
your own doing. Here is the code I wrote to test:

(defun read-file-size (in-file out-file)
  (with-open-file
   (istream in-file :direction :input :element-type 'unsigned-byte)

   (with-open-file
    (ostream out-file :direction :output :if-exists :supersede
	     :element-type 'unsigned-byte)

    (let ((buffer (make-array 4096 :element-type
			      (stream-element-type istream))))
      (do ((n 0 (+ n r)) r)
	  ((eql r 0) n)
	(setf r (read-sequence buffer istream))
	(if (> r 0 )
	    (write-sequence buffer ostream :end r)))))))

It is only hard to read if you've never done Lisp before,
methinks...oh, and by the way, this is the first bit of CL code I've
written in over 12 years so please give me a break if I'm doing
something un-stylish. I've been doing C, C++, Java and Perl (and emacs
lisp) for the last decade or so since I left college, so I'm in much
the same position as you.

Cheers!
Shyamal

From: Zach Beane
Subject: Re: Corman Lisp and binary files
Date: Sun, 09 Apr 2006 11:51:02 +0000
Message-ID: <m3slonj7bt.fsf@unnamed.xach.com>

Shyamal Prasad <·············@verizon.net> writes:

> I'm no expert, but I believe you've hit upon what is, IMHO, one of the
> few areas where common lisp shows its age: it has a built in bias
> towards "text" files as opposed to "binary" files. Support for binary
> files is very implementation dependent, and it is hard to write any
> efficient code using completely portable code.

It is in fact easy to process binary files with Common Lisp;
inexperienced fumblings should not be mistaken for actual
difficulty. 

See also:

   http://www.gigamonkeys.com/book/practical-parsing-binary-files.html

Zach

From: Shyamal Prasad
Subject: Re: Corman Lisp and binary files
Date: Sun, 09 Apr 2006 16:50:58 +0000
Message-ID: <87wtdy1z4i.fsf@turtle.local>

>>>>> "Zach" == Zach Beane <····@xach.com> writes:

    Zach> Shyamal Prasad <·············@verizon.net> writes:
    >> I'm no expert, but I believe you've hit upon what is, IMHO, one
    >> of the few areas where common lisp shows its age: it has a
    >> built in bias towards "text" files as opposed to "binary"
    >> files. Support for binary files is very implementation
    >> dependent, and it is hard to write any efficient code using
    >> completely portable code.

    Zach> It is in fact easy to process binary files with Common Lisp;
    Zach> inexperienced fumblings should not be mistaken for actual
    Zach> difficulty.

Agreed. 

I was just referring to CL as a standard. Or rather, CLtL and the
HyperSpec, since I've never read the actual standard. The language
specification seems extremely standoffish with regard to processing
binary streams: there is no "8 bit byte" type data, the only
operations on binary streams are one unit at a time read/write, and
converting from codes to characters is full of provisos. It all seems
to hint at a time before the "all files are fundamentally sequences of
8 bit bytes" view of the world that most people have today. 

It can all seem overwhelming to a newbie. I've had CLTL and Paul
Graham's "ANSI Common Lisp" on my shelf for years (dusty
unfortunately). I find that if you don't already know how to read
binary files, it can be very hard to learn how to do it. Which I
believe is what the OP ran into.

    Zach> See also:

    Zach>
    Zach> http://www.gigamonkeys.com/book/practical-parsing-binary-files.html

I've seen that book in stores, I guess that one chapter shows me it
really is a good one :) It still seems implementation specific
though(?).

But it still leaves my question open: are types like
'unsigned-byte and '(unsigned-byte 8) largely portable amongst modern
CL implementations?

Cheers!
Shyamal

From: Zach Beane
Subject: Re: Corman Lisp and binary files
Date: Sun, 09 Apr 2006 18:21:48 +0000
Message-ID: <m3mzeuk3sz.fsf@unnamed.xach.com>

Shyamal Prasad <·············@verizon.net> writes:

> I was just refering to CL as a standard. Or rather, cltl and the
> hyperspec since I've never read the actual standard. The language
> specification seems extremely stand offish with regards to processing
> binary streams: there is no "8 bit byte" type data, 

That's not true. (UNSIGNED-BYTE 8) is such a data type.

> the only operations on binary streams are one unit at a time
> read/write,

Also not true. READ-SEQUENCE works fine on binary streams.
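
As a minimal sketch of that point, the function below reads a whole file into a vector of octets with one READ-SEQUENCE call. `file-octets` is a hypothetical name, and the sketch assumes the implementation accepts `(unsigned-byte 8)` as a stream element type, which the standard does not strictly guarantee.

```lisp
;; Hypothetical helper: return the contents of PATHNAME as a vector
;; of octets. FILE-LENGTH on a binary stream is measured in stream
;; elements, so it gives the octet count here.
(defun file-octets (pathname)
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let* ((octets (make-array (file-length in)
                               :element-type '(unsigned-byte 8)))
           (end (read-sequence octets in)))
      ;; END is the index of the first element not filled in.
      (subseq octets 0 end))))
```

Comparing `(length (file-octets "somefile"))` with the size reported by the operating system is one way to check that no CR or LF octets are being dropped or translated.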

Zach

From: Rob Warnock
Subject: Re: Corman Lisp and binary files
Date: Mon, 10 Apr 2006 06:47:32 +0000
Message-ID: <yt-dnQMmHegZYqTZnZ2dnUVZ_u-dnZ2d@speakeasy.net>

Zach Beane  <····@xach.com> wrote:
+---------------
| Shyamal Prasad <·············@verizon.net> writes:
| > I was just refering to CL as a standard. Or rather, cltl and the
| > hyperspec since I've never read the actual standard.
+---------------

For all practical purposes, the HyperSpec (CLHS) *is* "the actual
standard" [well, to be precise, mechanically-derived from the same
TeX sources the printed ANSI Standard was mechanically-derived from].
You can have confidence in the CLHS.

CLtL & CLtL2, on the other hand, are another matter entirely...

+---------------
| > The language specification seems extremely stand offish with regards
| > to processing binary streams: there is no "8 bit byte" type data, 
| 
| That's not true. (UNSIGNED-BYTE 8) is such a data type.
+---------------

So we don't confuse Shyamal *too* much, we should point out that
while most implementations do provide (UNSIGNED-BYTE 8), a random
implementation need not implement it except as a subset of a larger
type. Similarly, most implementations *don't* provide an exact
(UNSIGNED-BYTE 5) type, but some random implementation might.
Since he's interested in what specialized array types will
"do the right thing" with READ-SEQUENCE, we should mention
using UPGRADED-ARRAY-ELEMENT-TYPE to see how some specific
integer subtype behaves:

    cmucl> (upgraded-array-element-type '(unsigned-byte 5))

    (UNSIGNED-BYTE 8)
    cmucl> (make-array 10 :element-type '(unsigned-byte 5))

    #(0 0 0 0 0 0 0 0 0 0)
    cmucl> (type-of *)

    (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (10))
    cmucl> 

And similarly:

    cmucl> (with-open-file (s "foo" :element-type '(unsigned-byte 5))
	     (describe s))
    #<Stream for file "foo"> is a structure of type FD-STREAM.
    ...[trimmed]...
    ELEMENT-SIZE: 1.
    ELEMENT-TYPE: (UNSIGNED-BYTE 8).
    FD: 6.
    BUFFERING: :FULL.
    ...[trimmed]...
    cmucl> 

So if you tried to read 5-bit bytes with CMUCL you would silently
get 8-bit bytes instead.

That said, (UNSIGNED-BYTE 8) *is* probably the type you want to use
for binary streams of 8-bit bytes on most CL implementations.
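
A program can make the same check at run time rather than at the REPL. The sketch below uses only standard functions; the name `exact-array-element-type-p` is hypothetical.

```lisp
;; Hypothetical helper: true when arrays requested with element type
;; TYPE are stored with exactly that type, i.e. upgrading does not
;; widen it. Only the primary value of SUBTYPEP is consulted.
(defun exact-array-element-type-p (type)
  (let ((upgraded (upgraded-array-element-type type)))
    (and (subtypep upgraded type)
         (subtypep type upgraded)
         t)))

;; (exact-array-element-type-p '(unsigned-byte 8)) is true on most
;; implementations; for '(unsigned-byte 5) the answer varies.
```

This describes array storage only; whether a file stream opened with the same element type reads the bytes you expect is still up to the implementation.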


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Shyamal Prasad
Subject: Re: Corman Lisp and binary files
Date: Tue, 11 Apr 2006 03:09:46 +0000
Message-ID: <87lkuc24me.fsf@turtle.local>

>>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:

    Rob> So if you tried to read 5-bit bytes with CMUCL you would
    Rob> silently get 8-bit bytes instead.

Thanks for a very educational post.

With SBCL I see


* (upgraded-array-element-type '(unsigned-byte 5))

(UNSIGNED-BYTE 7)
* (upgraded-array-element-type '(unsigned-byte 8))

(UNSIGNED-BYTE 8)

which made me start wondering what would happen if I used an
:element-type of '(unsigned-byte 7) with read-sequence. Any clues?
(Yes, I will try it but I'd be curious to hear).

Cheers!
Shyamal

From: Pascal Bourguignon
Subject: Re: Corman Lisp and binary files
Date: Tue, 11 Apr 2006 13:01:24 +0000
Message-ID: <87irpgmfkr.fsf@thalassa.informatimago.com>

Shyamal Prasad <·············@verizon.net> writes:

>>>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:
>
>     Rob> So if you tried to read 5-bit bytes with CMUCL you would
>     Rob> silently get 8-bit bytes instead.
>
> Thanks for a very educational post.
>
> With SBCL I see
>
>
> * (upgraded-array-element-type '(unsigned-byte 5))
>
> (UNSIGNED-BYTE 7)
> * (upgraded-array-element-type '(unsigned-byte 8))
>
> (UNSIGNED-BYTE 8)
>
> which made me start wondering what would happen if I used an
> :element-type of '(unsigned-byte 7) with read-sequence. Any clues?
> (Yes, I will try it but I'd be curious to hear).

Obviously, it is implementation dependent.

If the implementation chooses to allow reading random files as
(unsigned-byte 7) files, I would expect an exception when reading any
octet with the most significant bit set.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

The world will now reboot.  don't bother saving your artefacts.

From: ··············@hotmail.com
Subject: Re: Corman Lisp and binary files
Date: Tue, 11 Apr 2006 22:38:26 +0000
Message-ID: <1144795106.118333.37930@v46g2000cwv.googlegroups.com>

Pascal Bourguignon wrote:
> Shyamal Prasad <·············@verizon.net> writes:
>
> >>>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:
> >
> >     Rob> So if you tried to read 5-bit bytes with CMUCL you would
> >     Rob> silently get 8-bit bytes instead.
> >
> > Thanks for a very educational post.
> >
> > With SBCL I see
> >
> >
> > * (upgraded-array-element-type '(unsigned-byte 5))
> >
> > (UNSIGNED-BYTE 7)
> > * (upgraded-array-element-type '(unsigned-byte 8))
> >
> > (UNSIGNED-BYTE 8)
> >
> > which made me start wondering what would happen if I used an
> > :element-type of '(unsigned-byte 7) with read-sequence. Any clues?
> > (Yes, I will try it but I'd be curious to hear).
>
> Obviously, it is implementation dependant.
>
> If the implementation chooses to allow reading random files as
> (unsigned-byte 7) files, I would expect an exception when reading any
> octet with the most significant bit set.

You could also, I believe, have implementations that read in 8-bit
files and decode 8 7-bit integers from every 7 8-bit octets in the
file. No bit pattern need be rejected. This might depend on the file
system in question.
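
To illustrate the packing described above, here is a minimal sketch that unpacks 7-bit integers from a vector of octets. The bit order (most significant bit first) and the name `unpack-7-bit` are assumptions; nothing in the standard or in this thread fixes either, and a real implementation could pack the bits differently.

```lisp
;; Hypothetical helper: decode 7-bit integers, MSB first, from a
;; vector of octets. Bits left over at the end (fewer than 7) are
;; dropped.
(defun unpack-7-bit (octets)
  (let ((acc 0)      ; bits read but not yet emitted
        (nbits 0)    ; number of valid bits in ACC
        (result '()))
    (loop for octet across octets
          do (setf acc (logior (ash acc 8) octet))
             (incf nbits 8)
             (loop while (>= nbits 7)
                   do (decf nbits 7)
                      (push (ldb (byte 7 nbits) acc) result)
                      (setf acc (ldb (byte nbits 0) acc))))
    (nreverse result)))

;; Seven octets of #xFF hold exactly eight 7-bit values of 127:
;; (unpack-7-bit #(255 255 255 255 255 255 255))
;; => (127 127 127 127 127 127 127 127)
```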

I believe there are two separate issues here, in danger of being
confused.

1) upgraded-array-element-types, which are essentially compact or
efficient representations *in memory* which might or might not be
supported by an implementation. E.g. 32-bit integers useful for OS
interaction.

2) binary representation of different byte sizes on files/streams
(including files which might be located on servers elsewhere in the
network, running completely different OS's).

#1 is not generally needed for making programs portable across Lisp
implementations, but is useful for making efficient programs on a fixed
combination of Lisp implementation, OS, and machine.

#2 is not generally needed for useful work on a fixed combination of
Lisp implementation, OS, and machine, but is in theory crucial to
getting things to work when file servers, etc., are heterogeneous.

Lisp environments might have very different levels of support between
#1 and #2, and the existence of separate array types for different bit
widths in #1 does not (as far as I know) actually imply any particular
behavior in case #2.
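
A small sketch of how the two questions can be asked separately. The
function names DESCRIBE-BYTE-SUPPORT and STREAM-TYPE-FOR are made up
for illustration; the results are implementation-dependent, and
STREAM-TYPE-FOR may signal an error where the implementation does not
support the requested stream element type.

```lisp
;; Issue #1: what array representation does this implementation use
;; in memory for (unsigned-byte BITS)?
(defun describe-byte-support (bits)
  (let ((requested `(unsigned-byte ,bits)))
    (list :requested requested
          :upgraded (upgraded-array-element-type requested)
          :actual (array-element-type
                   (make-array 4 :element-type requested
                                 :initial-element 0)))))

;; Issue #2: what element type does a stream opened on PATHNAME with
;; (unsigned-byte BITS) report?  Nothing ties this to the answer above.
(defun stream-type-for (pathname bits)
  (with-open-file (in pathname :element-type `(unsigned-byte ,bits))
    (stream-element-type in)))
```

The only portable guarantee for #1 is that the upgraded type is a
supertype of the requested one.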

For Lisp, there is a lot of historical baggage (before my time) having
to do with supporting architectures and file servers of various byte
widths (such as 36-bit binary words and 6-bit character encodings), but
the standard does not mandate a particular way in which this must be
accomplished.

Today, with UNIX/Windows and "every byte is 8 bits, possibly with some
semi-standard character encoding" ruling the world, the world is far
less heterogeneous. One might expect implementations to agree in
practice, but not because the Common Lisp standard is mandating it.

From: Pascal Bourguignon
Subject: Re: Corman Lisp and binary files
Date: Tue, 11 Apr 2006 22:59:52 +0000
Message-ID: <87ek03lnvb.fsf@thalassa.informatimago.com>

···············@hotmail.com" <············@gmail.com> writes:
>> > * (upgraded-array-element-type '(unsigned-byte 5))
>> >
>> > (UNSIGNED-BYTE 7)
>> > * (upgraded-array-element-type '(unsigned-byte 8))
>> >
>> > (UNSIGNED-BYTE 8)
>> >
>> > which made me start wondering what would happen if I used an
>> > :element-type of '(unsigned-byte 7) with read-sequence. Any clues?
>> > (Yes, I will try it but I'd be curious to hear).
>>
>> Obviously, it is implementation-dependent.
>>
>> If the implementation chooses to allow reading random files as
>> (unsigned-byte 7) files, I would expect an exception when reading any
>> octet with the most significant bit set.
>
> You could also, I believe, have implementations that read in 8-bit
> files and decode 8 7-bit integers from every 7 8-bit octets in the
> file. No bit pattern need be rejected. This might depend on the file
> system in question.

Well there's a difficulty: CLHS specifies that FILE-LENGTH returns a
number of element-type units.  Since in general file systems specify
file sizes in bytes, you can have some uncertainty for small
element-types.  That's why clisp and other implementations use a
header.
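
To make the uncertainty concrete, here is a sketch that lists every
element count consistent with a given size in octets, assuming the
elements are packed back to back with no header and padded out to a
whole octet. POSSIBLE-ELEMENT-COUNTS is a hypothetical name, and the
packing scheme is an assumption, not a description of clisp.

```lisp
;; Hypothetical helper: which element counts K would occupy exactly
;; OCTETS octets, if K elements of BITS-PER-ELEMENT bits each were
;; packed contiguously and rounded up to a whole number of octets?
(defun possible-element-counts (octets bits-per-element)
  (loop for k from 0 to (floor (* 8 octets) bits-per-element)
        when (= (ceiling (* k bits-per-element) 8) octets)
          collect k))
```

A one-octet file of bits could hold anywhere from 1 to 8 elements, and
a seven-octet file of tightly packed 7-bit elements could hold 7 or 8,
so the octet count alone does not determine the length.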

Granted, in the case of a 7-bit element-type mapped to an 8-bit file, and
other similar cases, it would be possible to avoid the header, and
just lose some filler bits, without ambiguity.

> Today, with UNIX/Windows and "every byte is 8 bits, possibly with some
> semi-standard character encoding" ruling the world, the world is far
> less heterogeneous. One might expect implementations to agree in
> practice, but not because the Common Lisp standard is mandating it.

Yes, it would be good if there was a more restrictive "substandard"
for some important subsets of platforms.  We can keep the lax CL
standard for implementations that target specialty hardware.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Our users will know fear and cower before our software! Ship it!
Ship it and let them flee like the dogs they are!"

From: ··············@hotmail.com
Subject: Re: Corman Lisp and binary files
Date: Wed, 12 Apr 2006 00:22:52 +0000
Message-ID: <1144801372.589255.14520@z34g2000cwc.googlegroups.com>

Pascal Bourguignon wrote:
> ···············@hotmail.com" <············@gmail.com> writes:

> >
> > You could also, I believe, have implementations that read in 8-bit
> > files and decode 8 7-bit integers from every 7 8-bit octets in the
> > file. No bit pattern need be rejected. This might depend on the file
> > system in question.
>
> Well there's a difficulty: CLHS specifies that FILE-LENGTH returns a
> number of element-type units.  Since in general file systems specify
> file sizes in bytes, you can have some uncertainty for small
> element-types.  That's why clisp and other implementations use a
> header.

FILE-LENGTH does not have to be consistent with any other tool, nor
even with any other Lisp implementation on the same (combination of)
platforms. It can also bail and return NIL.

Not that any of these possibilities are likely to be what the user
"really wants."

> Granted, in the case of a 7-bit element-type mapped to an 8-bit file, and
> other similar cases, it would be possible to avoid the header, and
> just lose some filler bits, without ambiguity.
>
> > Today, with UNIX/Windows and "every byte is 8 bits, possibly with some
> > semi-standard character encoding" ruling the world, the world is far
> > less heterogeneous. One might expect implementations to agree in
> > practice, but not because the Common Lisp standard is mandating it.
>
> Yes, it would be good if there was a more restrictive "substandard"
> for some important subsets of platforms.  We can keep the lax CL
> standard for implementations that target specialty hardware.

Perhaps. But someday, we might hope, Windows and UNIX are going to be
long-forgotten "specialty" platforms, but Lisp will still be around.
Text/binary interchange might yet converge to more robust standards.

On the other hand, 36-bit bytes and SIXBIT are probably never coming
back, outside of toy emulators for recreational use.

From: Pascal Bourguignon
Subject: Re: Corman Lisp and binary files
Date: Wed, 12 Apr 2006 00:50:48 +0000
Message-ID: <87acarliqf.fsf@thalassa.informatimago.com>

···············@hotmail.com" <············@gmail.com> writes:
>> Yes, it would be good if there was a more restrictive "substandard"
>> for some important subsets of platforms.  We can keep the lax CL
>> standard for implementations that target specialty hardware.
>
> Perhaps. But someday, we might hope, Windows and UNIX are going to be
> long-forgotten "specialty" platforms, but Lisp will still be around.
> Text/binary interchange might yet converge to more robust standards.
>
> On the other hand, 36-bit bytes and SIXBIT are probably never coming
> back, outside of toy emulators for recreational use.

Perhaps we'll have 84-bit words, 21-bit bytes some day?

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The rule for today:
Touch my tail, I shred your hand.
New rule tomorrow.

From: Rob Warnock
Subject: Re: Corman Lisp and binary files
Date: Wed, 12 Apr 2006 03:49:22 +0000
Message-ID: <V-KdnSCDTNJf5aHZRVn-vw@speakeasy.net>

Pascal Bourguignon  <······@informatimago.com> wrote:
+---------------
| ···············@hotmail.com" <············@gmail.com> writes:
| > Perhaps. But someday, we might hope, Windows and UNIX are going to be
| > long-forgotten "specialty" platforms, but Lisp will still be around.
| > Text/binary interchange might yet converge to more robust standards.
| >
| > On the other hand, 36-bit bytes and SIXBIT are probably never coming
| > back, outside of toy emulators for recreational use.
| 
| Perhaps we'll have 84-bit words, 21-bit bytes some day?
+---------------

The PDP-10 is dead!  Long live the PDP-10!  LDB & DPB forever!!!


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Luís Oliveira
Subject: Re: Corman Lisp and binary files
Date: Sun, 09 Apr 2006 17:27:27 +0000
Message-ID: <m2y7yehd6o.fsf@deadspam.com>

Shyamal Prasad <·············@verizon.net> writes:
> But it still leaves open my question: are types like
> 'unsigned-byte and '(unsigned-byte 8) largely portable amongst modern
> CL implementations?

http://www.lispworks.com/documentation/HyperSpec/Body/t_unsgn_.htm

-- 
Luís Oliveira
luismbo (@) gmail (.) com
Equipa Portuguesa do Translation Project
http://www.iro.umontreal.ca/translation/registry.cgi?team=pt

From: Shyamal Prasad
Subject: Re: Corman Lisp and binary files
Date: Sun, 09 Apr 2006 18:56:57 +0000
Message-ID: <87slom1tb0.fsf@turtle.local>

>>>>> "Luís" == Luís Oliveira <·············@deadspam.com> writes:

    Luís> Shyamal Prasad <·············@verizon.net> writes:

    >> But it still leaves open my question: are types like
    >> 'unsigned-byte and '(unsigned-byte 8) largely portable amongst
    >> modern CL implementations?

    Luís> http://www.lispworks.com/documentation/HyperSpec/Body/t_unsgn_.htm

>>>>> "Zach" == Zach Beane <·············@deadspam.com> writes:

    >> binary streams: there is no "8 bit byte" type data, 

    Zach> That's not true. (UNSIGNED-BYTE 8) is such a data type.

    >> the only operations on binary streams are one unit at a time
    >> read/write,

    Zach> Also not true. READ-SEQUENCE works fine on binary streams.

Thanks for setting me straight on this.

Cheers!
Shyamal
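
For reference, a minimal sketch of READ-SEQUENCE on a binary stream,
counting the octets in a file. COUNT-OCTETS and the 4096-element
buffer are illustrative choices, and it assumes the implementation
accepts (unsigned-byte 8) as a stream element type, which the original
poster reported was not the case in his Corman Lisp setup.

```lisp
;; Hypothetical helper: count the octets in the file named by PATHNAME
;; by reading it in chunks through a binary stream.
(defun count-octets (pathname)
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let ((buffer (make-array 4096 :element-type '(unsigned-byte 8)))
          (total 0))
      ;; READ-SEQUENCE returns the index of the first element it did
      ;; not fill, which here is the number of octets read this round.
      (loop for n = (read-sequence buffer in)
            until (zerop n)
            do (incf total n))
      total)))
```

Called as (count-octets "somefile"), this should agree with the size
the operating system reports on platforms where files are sequences of
8-bit bytes.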