I frequently need to work with binary data files. Writing them with Lisp
is no problem. Reading them has posed some challenges. I think I've boiled
one of them down to a very simple case that I'm hoping one of you can
enlighten me on.
Suppose I want to read the following 16-bit integer: E8FD (little-endian).
As an (unsigned-byte 16) it is easily read with:
(defun read-u2 (in &optional (type :little-endian))
"read a 2-byte integer"
(let ((value 0))
(declare (type (unsigned-byte 16) value))
(case type
(:big-endian
(setf (ldb (byte 8 8) value) (read-byte in))
(setf (ldb (byte 8 0) value) (read-byte in)))
(otherwise
(setf (ldb (byte 8 0) value) (read-byte in))
(setf (ldb (byte 8 8) value) (read-byte in))))
value))
(with-open-file (fs "/home/jcunningham/turner/cnvt-to-seqb/int-test.bin"
:element-type '(unsigned-byte 8)
:direction :input)
(print (read-u2 fs)))
(modified from P. Siebel's development in PCL).
But if I want to read it as a signed integer by changing the declare to
(signed-byte 16) I get the following error:
==> The value 65000 is not of type (SIGNED-BYTE 16).
[Condition of type TYPE-ERROR]
It seems that internally the ldb isn't just setting bits in the
(signed-byte 16) value, but doing some bounds checking as well.
Now, I know I could open the file as a (signed-byte 8), but then when I
encountered some (unsigned-byte 16) binary field of sufficient size in the
file (both signed and unsigned fields are present), it will die there for
the same reason.
What I want to be able to do is twiddle high-byte in this two-byte
signed-word without it objecting to what the actual bits are.
Is there a way to do this? (byte-wise - doing it one bit at a time would
be horribly inefficient).
Thanks!
-Jeff
jkc <·······@makewavs.com> writes:
> I frequently need to work with binary data files. Writing them with Lisp
> is no problem. Reading them has posed some challenges. I think I've boiled
> one of them down to a very simple case that I'm hoping one of you can
> enlighten me on.
>
> Suppose I want to read the following 16-bit integer: E8FD (little-endian).
> As an (unsigned-byte 16) it is easily read with:
>
> (defun read-u2 (in &optional (type :little-endian))
> "read a 2-byte integer"
> (let ((value 0))
> (declare (type (unsigned-byte 16) value))
> (case type
> (:big-endian
> (setf (ldb (byte 8 8) value) (read-byte in))
> (setf (ldb (byte 8 0) value) (read-byte in)))
> (otherwise
> (setf (ldb (byte 8 0) value) (read-byte in))
> (setf (ldb (byte 8 8) value) (read-byte in))))
> value))
>
> (with-open-file (fs "/home/jcunningham/turner/cnvt-to-seqb/int-test.bin"
> :element-type '(unsigned-byte 8)
> :direction :input)
> (print (read-u2 fs)))
>
> (modified from P. Siebel's development in PCL).
>
>
> But if I want to read it as a signed integer by changing the declare to
> (signed-byte 16) I get the following error:
>
> ==> The value 65000 is not of type (SIGNED-BYTE 16).
> [Condition of type TYPE-ERROR]
>
> It seems that internally the ldb isn't just setting bits in the
> (signed-byte 16) value, but doing some bounds checking as well.
>
> Now, I know I could open the file as a (signed-byte 8), but then when I
> encountered some (unsigned-byte 16) binary field of sufficient size in the
> file (both signed and unsigned fields are present), it will die there for
> the same reason.
>
> What I want to be able to do is twiddle high-byte in this two-byte
> signed-word without it objecting to what the actual bits are.
>
> Is there a way to do this? (byte-wise - doing it one bit at a time would
> be horribly inefficient).
Here's what I have in zpb-ttf to read some of the signed & unsigned
values from a TrueType file:
(defun read-uint16 (stream)
(loop repeat 2
for value = (read-byte stream)
then (logior (ash value 8) (read-byte stream))
finally (return value)))
(defun read-int16 (stream)
(let ((result (read-uint16 stream)))
(if (logbitp 15 result)
(1- (- (logandc2 #xFFFF result)))
result)))
(defun read-uint8 (stream)
(read-byte stream))
(defun read-int8 (stream)
(let ((result (read-byte stream)))
(if (logbitp 7 result)
(1- (- (logandc2 #xFF result)))
result)))
Zach
On Sat, 04 Aug 2007 21:00:48 -0400, Zach Beane wrote:
> [quoted text muted]
Thanks to both of you, I have a much better idea what's going on now. I've
coded up some test functions and was playing around with optimization. It
seems like since the types are known that it should be possible to exploit
that. But everything I tried in the way of coercion actually made things
*much slower*.
Is there any way to make this go faster?
(defun read-uint16 (stream)
"Read (unsigned-byte 16) from (unsigned-byte 8) stream "
(declare (optimize (speed 3) (safety 0) (debug 0)))
(logior (ash (read-byte stream) 8) (read-byte stream)))
(defun read-int16 (stream)
"Read (signed-byte 16) from (unsigned-byte 8) stream "
(declare (optimize (speed 3) (safety 0) (debug 0)))
(let ((result (read-uint16 stream)))
(declare (type (unsigned-byte 16) result))
(if (logbitp 15 result)
(1- (- (logandc2 #xFFFF result)))
result)))
(let ((N 10000000))
;; write a test file
(with-open-file (fs "lots-of-int16s.bin"
:element-type '(signed-byte 16)
:direction :output :if-exists :supersede)
(dotimes (i N)
(write-byte (- (random 64000) 32000) fs)))
(with-open-file (fs "lots-of-int16s.bin"
:element-type '(unsigned-byte 8)
:direction :input)
(time
(let ((acc 0))
(dotimes (i N)
(incf acc (read-int16 fs)))
(print acc))))
The compile optimization warnings are:
; file: /tmp/filetukRZJ.lisp
; in: DEFUN READ-UINT16
; (ASH (READ-BYTE STREAM) 8)
;
; note: forced to do full call
; unable to do inline ASH (cost 2) because:
; The first argument is a INTEGER, not a FIXNUM.
; The result is a (VALUES INTEGER &OPTIONAL), not a (VALUES FIXNUM &REST T).
; unable to do inline ASH (cost 3) because:
; The first argument is a INTEGER, not a (UNSIGNED-BYTE 32).
; The result is a (VALUES INTEGER &OPTIONAL), not a (VALUES
; (UNSIGNED-BYTE 32)
; &REST T).
; etc.
; (LOGIOR (ASH (READ-BYTE STREAM) 8) (READ-BYTE STREAM))
;
; note: forced to do static-fun Two-arg-ior (cost 53)
; unable to do inline fixnum arithmetic (cost 2) because:
; The first argument is a INTEGER, not a FIXNUM.
; The second argument is a INTEGER, not a FIXNUM.
; The result is a (VALUES INTEGER &OPTIONAL), not a (VALUES FIXNUM &REST T).
; unable to do inline (signed-byte 32) arithmetic (cost 3) because:
; The first argument is a INTEGER, not a (SIGNED-BYTE 32).
; The second argument is a INTEGER, not a (SIGNED-BYTE 32).
; The result is a (VALUES INTEGER &OPTIONAL), not a (VALUES
; (SIGNED-BYTE 32) &REST
; T).
; etc.
Thanks.
--jeff
jkc <·······@makewavs.com> writes:
> On Sat, 04 Aug 2007 21:00:48 -0400, Zach Beane wrote:
>
>> [quoted text muted]
>
> Thanks to both of you, I have a much better idea what's going on now. I've
> coded up some test functions and was playing around with optimization. It
> seems like since the types are known that it should be possible to exploit
> that. But everything I tried in the way of coercion actually made things
> *much slower*.
>
> Is there any way to make this go faster?
Dunno, sorry. I've never needed to read enough of these integers where
I had to care.
Zach
jkc wrote:
> On Sat, 04 Aug 2007 21:00:48 -0400, Zach Beane wrote:
>
>> [quoted text muted]
>
> Thanks to both of you, I have a much better idea what's going on now. I've
> coded up some test functions and was playing around with optimization. It
> seems like since the types are known that it should be possible to exploit
> that. But everything I tried in the way of coercion actually made things
> *much slower*.
>
> Is there any way to make this go faster?
I don't have the energy to test this now, but would you see a speedup
if you used arrays instead of single integers? What if you declare
the functions to be inline? I'm guessing that boxing individual
values is costing a large portion of your time.
- Daniel
jkc <·······@makewavs.com> wrote:
> On Sat, 04 Aug 2007 21:00:48 -0400, Zach Beane wrote:
>
>> [quoted text muted]
>
> Thanks to both of you, I have a much better idea what's going on now. I've
> coded up some test functions and was playing around with optimization. It
> seems like since the types are known that it should be possible to exploit
> that.
The types aren't known, as should be obvious from the optimization
notes printed by the compiler. Since the stream given to read-uint16
could have an element-type other than (u-b 8), the compiler can't
assume that READ-BYTE returns an octet, but rather must be prepared to
deal with it returning an unbounded integer:
> ; (ASH (READ-BYTE STREAM) 8)
> ;
> ; note: forced to do full call
> ; unable to do inline ASH (cost 2) because:
> ; The first argument is a INTEGER, not a FIXNUM.
> But everything I tried in the way of coercion actually made things
> *much slower*.
Just declaring the types properly [e.g. by using (the (unsigned-byte 8)
(read-byte ...))] certainly shouldn't make things slower. When you say
coercion, do you really mean using CL:COERCE? That's generally not an
operator you should be using for optimization.
--
Juho Snellman
From: Dimiter "malkia" Stanev
Subject: Re: Question about working with mixed signed integers
Date:
Message-ID: <46B8BCA1.1030908@gmail.com>
On LispWorks, I was able to get it from 1.5s to 1.1s on my machine:
(declaim (optimize (speed 3) (safety 0) (debug 0) (space 0)
(compilation-speed 0)
#+lispworks (hcl:fixnum-safety 0)))
(declaim (ftype (function (stream) (unsigned-byte 16)) read-uint16)
(ftype (function (stream) (signed-byte 16)) read-int16)
(inline read-uint16 read-int16))
(defun read-uint16 (stream)
"Read (unsigned-byte 16) from (unsigned-byte 8) stream "
(declare (optimize (speed 3) (safety 0) (debug 0)))
(logior (ash (read-byte stream) 8) (read-byte stream)))
(defun read-int16 (stream)
"Read (signed-byte 16) from (unsigned-byte 8) stream "
(declare (optimize (speed 3) (safety 0) (debug 0)))
(let ((result (read-uint16 stream)))
(declare (type (unsigned-byte 16) result))
(if (logbitp 15 result)
(1- (- (logandc2 #xFFFF result)))
result)))
(defun test ()
(let ((N 10000000))
;; write a test file
(with-open-file (fs "lots-of-int16s.bin"
:element-type '(signed-byte 16)
:direction :output :if-exists :supersede)
(dotimes (i N)
(write-byte (- (random 64000) 32000) fs)))
(with-open-file (fs "lots-of-int16s.bin"
:element-type '(unsigned-byte 8)
:direction :input)
(time
(let ((acc 0))
(dotimes (i N)
(incf acc (read-int16 fs)))
(print acc))))))
#-nil
(test)
As some other poster said, making it to read from an array, rather from
stream might be faster (it depends on how quick read-byte is
implemented, and probably whether open-file allows something like
setting the internal buffer size).
Here is the main loop in x86 assembly, after Lispworks optimizations:
L6: 430: B501 moveb ch, 1
432: 8B45E8 move eax, [ebp-18]
435: FF1574F21820 call [2018F274] ; READ-BYTE
441: 8945E0 move [ebp-20], eax
444: C165E008 shl [ebp-20], 8
448: B501 moveb ch, 1
450: 8B45E8 move eax, [ebp-18]
453: FF1574F21820 call [2018F274] ; READ-BYTE
459: 8B7DE0 move edi, [ebp-20]
462: 0BF8 or edi, eax
464: 89FB move ebx, edi
466: 81E300000200 and ebx, 20000
472: 83FB00 cmp ebx, 0
475: 740E je L7
477: 83F7FF xor edi, -1
480: 81E7FCFF0300 and edi, 3FFFC
486: F7DF neg edi
488: 83EF04 sub edi, 4
L7: 491: 8B5DF8 move ebx, [ebp-8]
494: 03FB add edi, ebx
496: 897DF8 move [ebp-8], edi
499: 8B7DFC move edi, [ebp-4]
502: 81FFFC596202 cmp edi, 26259FC
508: 0F8D95010000 jge L24
514: 8B7DFC move edi, [ebp-4]
517: 83C704 add edi, 4
520: 897DFC move [ebp-4], edi
523: EBA1 jmp L6
That's not a bad code (ahem it cuold be optimized even better, but the
real bottleneck is "read-byte").
On Tue, 07 Aug 2007 11:40:33 -0700, Dimiter "malkia" Stanev wrote:
> On LispWorks, I was able to get it from 1.5s to 1.1s on my machine:
>
On SBCL under Slime I was able to go from 1.2s to about 0.5s using your
optimizations. Pretty slick. This is very instructive for me on the
subject of optimization.
Any idea how one can eliminate the compile warnings? E.g.,
; --> BLOCK LOGIOR
; ==>
; (LOGIOR
; (LOGIOR
; (LOGIOR (LOGIOR (LOGIOR # #) (ASH # 24)) (ASH (READ-BYTE STREAM) 16))
; (ASH (READ-BYTE STREAM) 8))
; (READ-BYTE STREAM))
;
; note: forced to do static-fun Two-arg-ior (cost 53)
; unable to do inline fixnum arithmetic (cost 2) because:
; The first argument is a INTEGER, not a FIXNUM.
; The second argument is a INTEGER, not a FIXNUM.
; The result is a (VALUES (UNSIGNED-BYTE 64) &OPTIONAL), not a (VALUES
; FIXNUM
; &REST T).
; unable to do inline (signed-byte 32) arithmetic (cost 3) because:
; The first argument is a INTEGER, not a (SIGNED-BYTE 32).
; The second argument is a INTEGER, not a (SIGNED-BYTE 32).
; The result is a (VALUES (UNSIGNED-BYTE 64) &OPTIONAL), not a (VALUES
; (SIGNED-BYTE
; 32)
; &REST T).
; etc.
Regards,
--Jeff
jkc <·······@makewavs.com> writes:
> I frequently need to work with binary data files. Writing them with Lisp
> is no problem. Reading them has posed some challenges. I think I've boiled
> one of them down to a very simple case that I'm hoping one of you can
> enlighten me on.
>
> Suppose I want to read the following 16-bit integer: E8FD (little-endian).
> As an (unsigned-byte 16) it is easily read with:
>
> (defun read-u2 (in &optional (type :little-endian))
> "read a 2-byte integer"
> (let ((value 0))
> (declare (type (unsigned-byte 16) value))
> (case type
> (:big-endian
> (setf (ldb (byte 8 8) value) (read-byte in))
> (setf (ldb (byte 8 0) value) (read-byte in)))
> (otherwise
> (setf (ldb (byte 8 0) value) (read-byte in))
> (setf (ldb (byte 8 8) value) (read-byte in))))
> value))
>
> (with-open-file (fs "/home/jcunningham/turner/cnvt-to-seqb/int-test.bin"
> :element-type '(unsigned-byte 8)
> :direction :input)
> (print (read-u2 fs)))
>
> (modified from P. Siebel's development in PCL).
>
>
> But if I want to read it as a signed integer by changing the declare to
> (signed-byte 16) I get the following error:
>
> ==> The value 65000 is not of type (SIGNED-BYTE 16).
> [Condition of type TYPE-ERROR]
>
> It seems that internally the ldb isn't just setting bits in the
> (signed-byte 16) value, but doing some bounds checking as well.
>
> Now, I know I could open the file as a (signed-byte 8), but then when I
> encountered some (unsigned-byte 16) binary field of sufficient size in the
> file (both signed and unsigned fields are present), it will die there for
> the same reason.
>
> What I want to be able to do is twiddle high-byte in this two-byte
> signed-word without it objecting to what the actual bits are.
>
> Is there a way to do this? (byte-wise - doing it one bit at a time would
> be horribly inefficient).
Indeed, given that lisp negative integers have an infinite number of
non-null bits! Try: (ldb (byte 8 100000) -3)
or any even bigger value for the position...
So what you need to do, is to read your 2-complement number as a
positive lisp number, and then convert it to a signed integer:
(defun unsigned-to-signed/2-complement (x width)
(declare (integer x width))
(let ((maxpos+1 (expt 2 (1- width))))
(if (< x maxpos+1) x (- x (* 2 maxpos+1)))))
* (UNSIGNED-TO-SIGNED/2-COMPLEMENT 32000 16)
32000
* (UNSIGNED-TO-SIGNED/2-COMPLEMENT 64000 16)
-1536
* (+ (UNSIGNED-TO-SIGNED/2-COMPLEMENT 64000 16)
(UNSIGNED-TO-SIGNED/2-COMPLEMENT 1536 16))
0
*
--
__Pascal Bourguignon__ http://www.informatimago.com/
READ THIS BEFORE OPENING PACKAGE: According to certain suggested
versions of the Grand Unified Theory, the primary particles
constituting this product may decay to nothingness within the next
four hundred million years.