Actual bit patterns used by write-byte and read-byte?

From: Peter Seibel
Subject: Actual bit patterns used by write-byte and read-byte?
Date: Sat, 14 Aug 2004 21:41:47 +0000
Message-ID: <m31xi94cvh.fsf@javamonkey.com>

Does the language standard say anything about the actual bit patterns
that will be written to disk if I open a file stream with
:element-type '(unsigned-byte 8) and call write-byte? Similarly, what
are we guaranteed about how read-byte interprets the "bytes" it reads?
(I assume in reality that on all contemporary operating systems
streams with :element-type (unsigned-byte 8) operate on octets.) How
about for larger sizes of unsigned-byte, e.g. 16 or 32? Big endian?
Little endian? Up to the implementation?

-Peter


-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp


From: Pascal Bourguignon
Subject: Re: Actual bit patterns used by write-byte and read-byte?
Date: Sat, 14 Aug 2004 22:08:52 +0000
Message-ID: <87hdr5cqyz.fsf@thalassa.informatimago.com>

Peter Seibel <·····@javamonkey.com> writes:

> Does the language standard say anything about the actual bit patterns
> that will be written to disk if I open a file stream with
> :element-type '(unsigned-byte 8) and call write-byte? Similarly, what
> are we guaranteed about how read-byte interprets the "bytes" it reads?
> (I assume in reality that on all contemporary operating systems
> streams with :element-type (unsigned-byte 8) operate on octets.) How
> about for larger sizes of unsigned-byte, e.g. 16 or 32? Big endian?
> Little endian? Up to the implementation?

I've not seen anything about it in CLHS, so I'd guess it's
implementation (and OS) dependent.

If you're running on a system where the byte is 6 bits, you may have
difficulty interchanging an (unsigned-byte 8) file with a system
where the byte is 9 bits.  You can only hope that your implementations
and your OS tools do the right thing.
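
A minimal sketch of how one might check this on a given implementation:
write a few octets through an (unsigned-byte 8) stream and read them
back. The helper name and the path argument are hypothetical. A clean
round trip only shows that the implementation agrees with itself; the
standard does not promise what other programs will see on disk, although
on contemporary systems it is normally the same octets.

```lisp
;; Hypothetical helper: write a list of octets to PATH, then read them back.
(defun octet-round-trip (octets path)
  (with-open-file (out path :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
    (dolist (octet octets)
      (write-byte octet out)))
  (with-open-file (in path :direction :input
                           :element-type '(unsigned-byte 8))
    ;; READ-BYTE returns NIL at end of file when EOF-ERROR-P is NIL.
    (loop for octet = (read-byte in nil nil)
          while octet
          collect octet)))
```

Comparing the file with an external hex dump tool is the remaining step
if interchange with other programs matters.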

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Our enemies are innovative and resourceful, and so are we. They never
stop thinking about new ways to harm our country and our people, and
neither do we.

From: David Steuber
Subject: Re: Actual bit patterns used by write-byte and read-byte?
Date: Wed, 18 Aug 2004 04:17:04 +0000
Message-ID: <87zn4t2i7z.fsf@david-steuber.com>

Peter Seibel <·····@javamonkey.com> writes:

> Does the language standard say anything about the actual bit patterns
> that will be written to disk if I open a file stream with
> :element-type '(unsigned-byte 8) and call write-byte? Similarly, what
> are we guaranteed about how read-byte interprets the "bytes" it reads?
> (I assume in reality that on all contemporary operating systems
> streams with :element-type (unsigned-byte 8) operate on octets.) How
> about for larger sizes of unsigned-byte, e.g. 16 or 32? Big endian?
> Little endian? Up to the implementation?

This question has some relevance to a hack I'm working on to write
TIFF 6.0 compliant files.  I'm only going to be writing MM byte order
files, so I have the joy of figuring out how to make sure I write out
the bytes in the right order.  In C I would be using bit masks and the
right shift (>>) operator.  I'm not sure of the best way to do it in Lisp
yet.

TIFF also supports a FillOrder field that says which order the bits
go in bytes (which are assumed to be 8 bits by the TIFF standard that
I have).  For bilevel images (I'm only interested in RGB at present),
fill order in Lisp would be nice to know (although I suppose a
bit-vector can ensure the order you want with write-sequence).

I actually kind of wish I found some code for writing TIFF files on
Cliki.

-- 
An ideal world is left as an exercise to the reader.
   --- Paul Graham, On Lisp 8.1

From: Pascal Bourguignon
Subject: Re: Actual bit patterns used by write-byte and read-byte?
Date: Wed, 18 Aug 2004 13:17:43 +0000
Message-ID: <87u0v07fgo.fsf@thalassa.informatimago.com>

David Steuber <·····@david-steuber.com> writes:

> Peter Seibel <·····@javamonkey.com> writes:
> 
> > Does the language standard say anything about the actual bit patterns
> > that will be written to disk if I open a file stream with
> > :element-type '(unsigned-byte 8) and call write-byte? Similarly, what
> > are we guaranteed about how read-byte interprets the "bytes" it reads?
> > (I assume in reality that on all contemporary operating systems
> > streams with :element-type (unsigned-byte 8) operate on octets.) How
> > about for larger sizes of unsigned-byte, e.g. 16 or 32? Big endian?
> > Little endian? Up to the implementation?
> 
> This question has some relevance to a hack I'm working on to write
> TIFF 6.0 compliant files.  I'm only going to be writing MM byte order
> files, so I have the joy of figuring out how to make sure I write out
> the bytes in the right order.  In C I would be using bit masks and the
> right shift (>>) operator.  I'm not sure of the best way to do it in Lisp
> yet.

I'd use LDB and DPB. 

    (LDB (BYTE size offset) biginteger)   [ 4 parens in lisp ]
==   (biginteger>>offset)&((1<<size)-1)   [ 6 parens in C... ]

Therefore, to write 32 bits in big endian:

    (write-byte (ldb (byte 8 24) 32bits) out)
    (write-byte (ldb (byte 8 16) 32bits) out)
    (write-byte (ldb (byte 8  8) 32bits) out)
    (write-byte (ldb (byte 8  0) 32bits) out)
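
The four calls above can also be folded into a loop. This is only a
sketch; the names u32-be-octets and write-u32-be are hypothetical, and
the stream is assumed to have been opened with :element-type
'(unsigned-byte 8).

```lisp
;; Hypothetical helper: the four octets of a 32-bit value,
;; most significant octet first (big endian).
(defun u32-be-octets (value)
  (loop for position from 24 downto 0 by 8
        collect (ldb (byte 8 position) value)))

;; Hypothetical helper: write VALUE to an (unsigned-byte 8) stream
;; in big-endian order.  Returns VALUE.
(defun write-u32-be (value stream)
  (dolist (octet (u32-be-octets value) value)
    (write-byte octet stream)))
```

For example, (u32-be-octets #x4D4D002A) returns (77 77 0 42), which is
the "MM" marker followed by the 16-bit number 42 in big-endian order.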


> TIFF also supports a FillOrder field that says which order the bits
> go in bytes (which are assumed to be 8 bits by the TIFF standard that
> I have).  For bilevel images (I'm only interested in RGB at present),
> fill order in Lisp would be nice to know (although I suppose a
> bit-vector can ensure the order you want with write-sequence).

MU! 

I'd not bet on it.  It all depends on the hardware!  On a given
computer, this is actually totally meaningless given the definition of
BYTE.  The question arises only when transmitting the bits on a wire,
and the PHYSICAL protocols define the required bit order. It's up to
the electronics engineers to ensure the bits are transmitted and
received in the right order. They usually do.

 
> I actually kind of wish I found some code for writing TIFF files on
> Cliki.


From: Pascal Bourguignon
Subject: Re: Actual bit patterns used by write-byte and read-byte?
Date: Wed, 18 Aug 2004 15:44:40 +0000
Message-ID: <87acws78nr.fsf@thalassa.informatimago.com>

Ingvar <······@hexapodia.net> writes:
> Actually, I think this is so one can accommodate different types of
> display hardware.
> 
> Imagine, if you will, that you have a "pixel on/pixel off" screen with
> 8-bit wide writeable addresses, with each byte being "from screen left
> to screen right".
> 
> We then have (I'll just show the 3 leftmost octets) two different
> possible ways (well, more, but only two that make "lots" of sense, I
> am willingly ignoring the bit layout for the ZX Spectrum video RAM
> here) of mapping bits to pixels (bit numbers correspond to 2^n):
> 
> | octet1 | octet2 | octet3 |
>  76543210 76543210 76543210
> or
>  01234567 01234567 01234567
> 
> Thus, naively blasting in bytes to make "5 bits on, one off" (for a
> dashed straight line, say) would require, in one case, writing the
> values 251 239 190 and in the other 223 247 125.

Yes, but my point is that you won't solve it with a bit vector, and
cannot even know for sure how an (unsigned-byte 8) is stored bitwise,
because it all depends on the hardware.
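
To make the two layouts concrete at the integer level, here is a small
sketch that packs a row of pixels into octets under either convention.
The name pack-pixels and the keywords :msb-first and :lsb-first are
hypothetical. It says nothing about how the hardware stores the bits; it
only computes which integer values you would have to write.

```lisp
;; Hypothetical helper: pack a list of pixels (0 or 1, leftmost first)
;; into octets.  FILL-ORDER is :MSB-FIRST (leftmost pixel goes to bit 7)
;; or :LSB-FIRST (leftmost pixel goes to bit 0).
(defun pack-pixels (pixels fill-order)
  (loop while pixels
        collect (let ((octet 0))
                  (dotimes (i 8 octet)
                    ;; Pad the last octet with 0 when the pixels run out.
                    (let ((pixel (or (pop pixels) 0)))
                      (setf (ldb (byte 1 (ecase fill-order
                                           (:msb-first (- 7 i))
                                           (:lsb-first i)))
                                 octet)
                            pixel))))))

;; 24 pixels of "five on, one off".
(defparameter *dashed-line*
  (loop repeat 4 append (list 1 1 1 1 1 0)))

;; (pack-pixels *dashed-line* :msb-first) => (251 239 190)
;; (pack-pixels *dashed-line* :lsb-first) => (223 247 125)
```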

On a given hardware, you may know that you have to store the bits in a
given permutation.  If you work at the byte level (instead of the bit
vector level), then you have some hope of getting the same behavior from
several Common Lisp implementations on the same hardware (because
hopefully they'll store bytes in the same way, using the hardware
primitives).

The fastest solution, and for byte permutations one that is not very
memory hungry, would be to use a mapping vector:

;; Reverse the order of the 8 bits in BYTE by swapping adjacent bits,
;; then adjacent pairs of bits, then the two nibbles.
(defun bit-reverse (byte)
  (setf byte (logior (logand #x55 (ash byte -1)) (logand #xaa (ash byte  1))))
  (setf byte (logior (logand #x33 (ash byte -2)) (logand #xcc (ash byte  2))))
  (setf byte (logior (logand #x0f (ash byte -4)) (logand #xf0 (ash byte  4))))
  byte)

;; 256-entry lookup table: element I holds the bit-reversed value of I.
(defparameter invert-bits
  (do ((table (make-array '(256) :element-type '(unsigned-byte 8)))
       (i 0 (1+ i)))
      ((<= 256 i) table)
    (setf (aref table i) (bit-reverse i))))

Then use: (aref invert-bits byte) to reverse the order of bits in byte.
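
One way to gain confidence in a shift-and-mask routine like this is to
compare it against a slow, obvious version on all 256 inputs. The
sketch below is such a cross-check; naive-bit-reverse and
bit-reversal-p are hypothetical names, not part of the code above.

```lisp
;; Hypothetical reference implementation: move bit I to bit (7 - I),
;; one bit at a time.
(defun naive-bit-reverse (octet)
  (let ((result 0))
    (dotimes (i 8 result)
      (when (logbitp i octet)
        (setf result (logior result (ash 1 (- 7 i))))))))

;; True when FUNCTION agrees with the reference on all 256 octets.
(defun bit-reversal-p (function)
  (loop for octet below 256
        always (= (funcall function octet)
                  (naive-bit-reverse octet))))
```

With the definitions from this post loaded, (bit-reversal-p #'bit-reverse)
is the check to run.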


From: Vassil Nikolov
Subject: Re: Actual bit patterns used by write-byte and read-byte?
Date: Thu, 19 Aug 2004 02:36:19 +0000
Message-ID: <lzzn4rhn18.fsf@janus.vassil.nikolov.names>

Pascal Bourguignon <····@mouse-potato.com> writes:

> [...]
> I'd use LDB and DPB. 
>
>     (LDB (BYTE size offset) biginteger)   [ 4 parens in lisp ]
> ==   (biginteger>>offset)&((1<<size)-1)   [ 6 parens in C... ]
>
> Therefore, to write 32 bits in big endian:
>
>     (write-byte (ldb (byte 8 24) 32bits) out)
>     (write-byte (ldb (byte 8 16) 32bits) out)
>     (write-byte (ldb (byte 8  8) 32bits) out)
>     (write-byte (ldb (byte 8  0) 32bits) out)

  And DPB when reading?

  Perhaps one should be careful with an implementation where a fixnum
  is less than 32 bits, since depositing the most significant 8 bits
  first might cause unnecessary consing.  (Depends on compiler
  optimizations, of course.)
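
  A sketch of what reading with DPB might look like, with the four
  octets read first and the least significant one deposited first.
  The names octets-to-u32-be and read-u32-be are hypothetical, and
  the stream is assumed to deliver (unsigned-byte 8) elements.

```lisp
;; Hypothetical helper: combine four octets, given most significant
;; first, into one integer.  The low octets are deposited first, so
;; only the last DPB can push the value past 24 bits.
(defun octets-to-u32-be (b3 b2 b1 b0)
  (let ((value 0))
    (setf value (dpb b0 (byte 8  0) value))
    (setf value (dpb b1 (byte 8  8) value))
    (setf value (dpb b2 (byte 8 16) value))
    (dpb b3 (byte 8 24) value)))

;; Hypothetical helper: read a big-endian 32-bit value from an
;; (unsigned-byte 8) stream.  LET* guarantees the read order.
(defun read-u32-be (stream)
  (let* ((b3 (read-byte stream))
         (b2 (read-byte stream))
         (b1 (read-byte stream))
         (b0 (read-byte stream)))
    (octets-to-u32-be b3 b2 b1 b0)))
```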

  ---Vassil.

-- 
Vassil Nikolov <········@poboxes.com>

Hollerith's Law of Docstrings: Everything can be summarized in 72 bytes.

From: David Steuber
Subject: Re: Actual bit patterns used by write-byte and read-byte?
Date: Fri, 20 Aug 2004 04:42:24 +0000
Message-ID: <87llga1kun.fsf@david-steuber.com>

Vassil Nikolov <········@poboxes.com> writes:

>   Perhaps one should be careful with an implementation where a fixnum
>   is less than 32 bits, since depositing the most significant 8 bits
>   first might cause unnecessary consing.  (Depends on compiler
>   optimizations, of course.)

Is consing zeros the same as consing nothing?

My current TIFF writing code is linked to in another thread.  While it
is rather ugly, it does sort of work.


From: Vassil Nikolov
Subject: Re: Actual bit patterns used by write-byte and read-byte?
Date: Mon, 23 Aug 2004 04:38:22 +0000
Message-ID: <lzfz6etqo1.fsf@janus.vassil.nikolov.names>

David Steuber <·····@david-steuber.com> writes:

> Vassil Nikolov <········@poboxes.com> writes:
>
>>   Perhaps one should be careful with an implementation where a fixnum
>>   is less than 32 bits, since depositing the most significant 8 bits
>>   first might cause unnecessary consing.  (Depends on compiler
>>   optimizations, of course.)
>
> Is consing zeros the same as consing nothing?

  Since 0 is a fixnum, producing it won't involve consing [1].  My
  point was about 32-bit values outside of the fixnum range (many
  implementations have fewer-than-32-bit fixnums).  If a 32-bit value
  is produced from four 8-bit values by calling DPB four times, if the
  result is a bignum, and if the DPB call for the most significant
  byte is made first, then four bignums will be consed [2].  The
  result will be correct, of course, but performance might not be as
  good as possible.

  [1] To be extremely precise, that depends on the implementation, but
      in practice one wouldn't encounter implementations so bad that
      they would cons a 0.

  [2] Unless a (really) good optimizing compiler does the right thing.
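
  The effect of the deposit order can be made visible without
  looking at the allocator, by recording how many bits each
  intermediate value needs.  This is only an illustrative sketch;
  dpb-intermediate-lengths is a hypothetical name, and whether a
  given length means a bignum depends on the implementation's fixnum
  size.

```lisp
;; Hypothetical helper: deposit OCTETS (a list of four, most
;; significant first) into 0, visiting the bit positions in the order
;; given by POSITIONS, and collect the INTEGER-LENGTH of the value
;; after each DPB.
(defun dpb-intermediate-lengths (octets positions)
  (let ((value 0))
    (loop for position in positions
          do (setf value (dpb (nth (- 3 (floor position 8)) octets)
                              (byte 8 position)
                              value))
          collect (integer-length value))))

;; Most significant octet first: every intermediate needs 32 bits.
;; (dpb-intermediate-lengths '(222 173 190 239) '(24 16 8 0))
;;   => (32 32 32 32)
;; Least significant octet first: only the final value needs 32 bits.
;; (dpb-intermediate-lengths '(222 173 190 239) '(0 8 16 24))
;;   => (8 16 24 32)
```

  On an implementation whose fixnums cannot hold a 32-bit value but
  can hold a 24-bit one, the first order would produce four values
  outside the fixnum range and the second only one.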

  ---Vassil.
