Byte swapping streams

From: ·············@planet.nl
Subject: Byte swapping streams
Date: Mon, 27 Aug 2007 13:36:39 +0000
Message-ID: <1188221799.835416.178780@w3g2000hsg.googlegroups.com>

Hi all,

I have written some tools to read/write astronomical data formats
using CMUCL. For these formats, it is necessary to byte swap the data
before storing or after reading on little endian machines. With
Allegro and CMUCL, I can, and do, use READ-VECTOR/WRITE-VECTOR from
SIMPLE-STREAMS. Is there something comparable available in Lispworks
or SBCL?

Regards,

Michiel Brentjens

From: Zach Beane
Subject: Re: Byte swapping streams
Date: Mon, 27 Aug 2007 13:50:39 +0000
Message-ID: <m37inhaxxs.fsf@unnamed.xach.com>

·············@planet.nl writes:

> Hi all,
>
> I have written some tools to read/write astronomical data formats
> using CMUCL. For these formats, it is necessary to byte swap the data
> before storing or after reading on little endian machines. With
> Allegro and CMUCL, I can, and do, use READ-VECTOR/WRITE-VECTOR from
> SIMPLE-STREAMS. Is there something comparable available in Lispworks
> or SBCL?

I use READ-SEQUENCE and WRITE-SEQUENCE (or just READ-BYTE and
WRITE-BYTE) on (unsigned-byte 8) streams, and assemble integers from
octets myself.

Some file formats, like ESRI GIS shapefiles, mix big- and
little-endian data in the same file, and sometimes in the same
header. Fun stuff.
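
A minimal sketch of that approach, not Zach's actual code: pull octets
off the stream with READ-BYTE and shift them into place. The names
READ-UINT32-BE and READ-UINT32-LE are made up for illustration, and the
stream is assumed to have element type (unsigned-byte 8).

```lisp
;; Hypothetical helpers; STREAM must be an (unsigned-byte 8) binary stream.
(defun read-uint32-be (stream)
  "Read four octets and assemble them most significant octet first."
  (let ((n 0))
    (dotimes (i 4 n)
      ;; Shift what we have up by one octet and append the new one.
      (setf n (logior (ash n 8) (read-byte stream))))))

(defun read-uint32-le (stream)
  "Read four octets and assemble them least significant octet first."
  (let ((n 0))
    (dotimes (i 4 n)
      ;; Octet I lands at bit position 8*I.
      (setf (ldb (byte 8 (* 8 i)) n) (read-byte stream)))))
```

With both readers to hand, a format that mixes byte orders just means
calling whichever one the field at hand requires.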

Zach

From: ·············@planet.nl
Subject: Re: Byte swapping streams
Date: Mon, 27 Aug 2007 14:34:39 +0000
Message-ID: <1188225279.558182.143350@19g2000hsx.googlegroups.com>

On Aug 27, 3:50 pm, Zach Beane <····@xach.com> wrote:
> ·············@planet.nl writes:
> > Hi all,
>
> > I have written some tools to read/write astronomical data formats
> > using CMUCL. For these formats, it is necessary to byte swap the data
> > before storing or after reading on little endian machines. With
> > Allegro and CMUCL, I can, and do, use READ-VECTOR/WRITE-VECTOR from
> > SIMPLE-STREAMS. Is there something comparable available in Lispworks
> > or SBCL?
>
> I use READ-SEQUENCE and WRITE-SEQUENCE (or just READ-BYTE and
> WRITE-BYTE) on (unsigned-byte 8) streams, and assemble integers from
> octets myself.
>
> Some file formats, like ESRI GIS shapefiles, mix big- and
> little-endian data in the same file, and sometimes in the same
> header. Fun stuff.
>
> Zach

I have been implementing a similar sort of "compatibility layer" in
SBCL myself. 16- and 32-bit integers are no problem. However, things
get interesting when you want to read large quantities of IEEE 32-bit
and 64-bit floating point numbers *efficiently* (kind of a requirement
when processing data files bigger than 1 GB). I have stolen and
optimized a bit of code from IEEE-FLOATS
(http://common-lisp.net/project/ieee-floats/), but reading and
byte-swapping 64-bit numbers on a 32-bit machine using SBCL 1.0.5 in
particular is not nearly as fast as the CMUCL 19c READ-VECTOR
implementation. The bit I came up with for 64-bit floating point
numbers follows:


;; Stolen from IEEE-FLOATS...
(defun decode-float64 (bits)
   "Decode a 64-bit IEEE 754 bit pattern. Infinities and NaNs are not handled."
   (declare (optimize (speed 3) (safety 1))
            (type (unsigned-byte 64) bits))
   (let* ((sign (ldb (byte 1 63) bits))
          (exponent (ldb (byte 11 52) bits))
          (significand (ldb (byte 52 0) bits)))
     (if (zerop exponent)
         ;; Denormal: no hidden bit, effective exponent is 1.
         (setf exponent 1)
         ;; Normal number: restore the implicit leading 1 bit.
         (setf (ldb (byte 1 52) significand) 1))
     (unless (zerop sign) (setf significand (- significand)))
     (scale-float (float significand 1.0d0) (- exponent 1075))))

(defun make-unsigned-byte-64 (low high)
  (declare (optimize (speed 3) (safety 1))
           (type (unsigned-byte 32) low high))
  (let ((n 0))
    (setf (ldb (byte 32 0) n) low
          (ldb (byte 32 32) n) high)
    n))

(defun double-float<-octets (octets &key endian-swap)
  (declare (optimize (speed 3) (safety 0))
           (type (simple-array (unsigned-byte 8) (8)) octets))
  (let ((low 0) (high 0))
    (declare (type (unsigned-byte 32) low high))
    (if endian-swap
        ;; Big-endian input: octet 0 is the most significant octet.
        ;; START may step past 0 before the end test, so allow -8.
        (loop :for start :of-type (integer -8 24) :from (- 32 8) :downto 0 :by 8
              :for ix :below 4 :do
              (setf (ldb (byte 8 start) low) (aref octets (+ 4 ix)))
              (setf (ldb (byte 8 start) high) (aref octets ix)))
        ;; Little-endian input: octet 0 is the least significant octet.
        ;; START may step past 24 before the end test, so allow 32.
        (loop :for start :of-type (integer 0 32) :from 0 :to 24 :by 8
              :for ix :below 4 :do
              (setf (ldb (byte 8 start) low) (aref octets ix))
              (setf (ldb (byte 8 start) high) (aref octets (+ ix 4)))))
    (decode-float64 (make-unsigned-byte-64 low high))))

(defun read-vector-double-float (vector stream &key endian-swap)
  "stream must be an (unsigned-byte 8) binary stream. This should be a
more-or-less drop-in version of read-vector"
  (declare (optimize (speed 3) (safety 1))
           (type (simple-array double-float (*)) vector)
           (type stream stream))
  (let ((octets (make-array 8 :element-type '(unsigned-byte 8))))
    (declare (type (simple-array (unsigned-byte 8) (8)) octets))
    (loop :for ix :below (length vector)
          :for bytes-read = (read-sequence octets stream)
          :summing bytes-read :into bytes :of-type (unsigned-byte 32)
          ;; Only decode complete 8-octet groups; a short read at end of
          ;; file would otherwise mix in stale octets from the last pass.
          :when (= bytes-read 8) :do
          (setf (aref vector ix)
                (the double-float
                     (double-float<-octets octets :endian-swap endian-swap)))
          :finally (return bytes))))
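
One variation that might be worth trying, as an unbenchmarked sketch:
fetch many doubles' worth of octets per READ-SEQUENCE call instead of
eight octets at a time. All names below (BITS-TO-DOUBLE,
OCTETS-TO-DOUBLE, READ-DOUBLES-BUFFERED, the CHUNK parameter) are
hypothetical, and the byte order is given explicitly with :BIG-ENDIAN
rather than as a swap flag. It still builds a 64-bit integer per
number, so it does nothing about bignum arithmetic on a 32-bit
machine; it only cuts down the number of stream calls.

```lisp
;; Hypothetical names throughout; a sketch, not a tuned implementation.
(defun bits-to-double (bits)
  "Decode a 64-bit IEEE 754 pattern. Infinities and NaNs are not handled."
  (let ((sign (ldb (byte 1 63) bits))
        (exponent (ldb (byte 11 52) bits))
        (significand (ldb (byte 52 0) bits)))
    (if (zerop exponent)
        (setf exponent 1)                        ; denormal: no hidden bit
        (setf (ldb (byte 1 52) significand) 1))  ; normal: restore hidden bit
    (let ((magnitude (scale-float (float significand 1.0d0)
                                  (- exponent 1075))))
      (if (zerop sign) magnitude (- magnitude)))))

(defun octets-to-double (octets offset big-endian)
  "Assemble the 8 octets starting at OFFSET into a double-float."
  (let ((bits 0))
    (dotimes (i 8)
      (let ((octet (aref octets (+ offset i))))
        (if big-endian
            (setf bits (logior (ash bits 8) octet))
            (setf (ldb (byte 8 (* 8 i)) bits) octet))))
    (bits-to-double bits)))

(defun read-doubles-buffered (vector stream &key big-endian (chunk 4096))
  "Fill VECTOR from STREAM, an (unsigned-byte 8) stream, reading up to
CHUNK doubles per READ-SEQUENCE call. Returns the number of complete
doubles stored."
  (let ((buffer (make-array (* 8 chunk) :element-type '(unsigned-byte 8)))
        (filled 0))
    (loop :while (< filled (length vector)) :do
      (let* ((wanted (min chunk (- (length vector) filled)))
             (octets-read (read-sequence buffer stream :end (* 8 wanted)))
             (complete (floor octets-read 8)))
        (dotimes (i complete)
          (setf (aref vector (+ filled i))
                (octets-to-double buffer (* 8 i) big-endian)))
        (incf filled complete)
        ;; A short read means end of file; stop rather than spin.
        (when (< complete wanted) (return))))
    filled))
```

As a quick sanity check of the decoder, (bits-to-double
#x3FF0000000000000) evaluates to 1.0d0 and (bits-to-double
#xC000000000000000) to -2.0d0.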


Michiel