Question about working with mixed signed integers

From: jkc
Subject: Question about working with mixed signed integers
Date: Sun, 05 Aug 2007 01:01:05 +0000
Message-ID: <pan.2007.08.05.01.01.04.314261@makewavs.com>

I frequently need to work with binary data files. Writing them with Lisp
is no problem. Reading them has posed some challenges. I think I've boiled
one of them down to a very simple case that I'm hoping one of you can
enlighten me on. 

Suppose I want to read the following 16-bit integer: E8FD (little-endian).
As an (unsigned-byte 16) it is easily read with: 

(defun read-u2 (in &optional (type :little-endian))
  "read a 2-byte integer"
  (let ((value 0))
    (declare (type (unsigned-byte 16) value))
    (case type
      (:big-endian
       (setf (ldb (byte 8 8) value) (read-byte in))
       (setf (ldb (byte 8 0) value) (read-byte in)))
      (otherwise
       (setf (ldb (byte 8 0) value) (read-byte in))
       (setf (ldb (byte 8 8) value) (read-byte in))))
    value))

(with-open-file (fs "/home/jcunningham/turner/cnvt-to-seqb/int-test.bin"
		    :element-type '(unsigned-byte 8)
		    :direction :input)
  (print (read-u2 fs)))

(modified from P. Siebel's development in PCL). 


But if I want to read it as a signed integer by changing the declare to
(signed-byte 16) I get the following error:

==> The value 65000 is not of type (SIGNED-BYTE 16).
   [Condition of type TYPE-ERROR]

It seems that internally the ldb isn't just setting bits in the
(signed-byte 16) value, but doing some bounds checking as well. 

Now, I know I could open the file as a (signed-byte 8), but then when I
encountered some (unsigned-byte 16) binary field of sufficient size in the
file (both signed and unsigned fields are present), it will die there for
the same reason. 

What I want to be able to do is twiddle high-byte in this two-byte
signed-word without it objecting to what the actual bits are. 

Is there a way to do this? (byte-wise - doing it one bit at a time would
be horribly inefficient). 

Thanks!

-Jeff

Re: Question about working with mixed signed integers Zach Beane
- Re: Question about working with mixed signed integers jkc
  - Re: Question about working with mixed signed integers Zach Beane
  - Re: Question about working with mixed signed integers D Herring
  - Re: Question about working with mixed signed integers Juho Snellman
  - Re: Question about working with mixed signed integers Dimiter "malkia" Stanev
    - Re: Question about working with mixed signed integers jkc
Re: Question about working with mixed signed integers Pascal Bourguignon

From: Zach Beane
Subject: Re: Question about working with mixed signed integers
Date: Sun, 05 Aug 2007 01:00:48 +0000
Message-ID: <m3ps22kd6n.fsf@unnamed.xach.com>

jkc <·······@makewavs.com> writes:

> I frequently need to work with binary data files. Writing them with Lisp
> is no problem. Reading them has posed some challenges. I think I've boiled
> one of them down to a very simple case that I'm hoping one of you can
> enlighten me on. 
>
> Suppose I want to read the following 16-bit integer: E8FD (little-endian).
> As an (unsigned-byte 16) it is easily read with: 
>
> (defun read-u2 (in &optional (type :little-endian))
>   "read a 2-byte integer"
>   (let ((value 0))
>     (declare (type (unsigned-byte 16) value))
>     (case type
>       (:big-endian
>        (setf (ldb (byte 8 8) value) (read-byte in))
>        (setf (ldb (byte 8 0) value) (read-byte in)))
>       (otherwise
>        (setf (ldb (byte 8 0) value) (read-byte in))
>        (setf (ldb (byte 8 8) value) (read-byte in))))
>     value))
>
> (with-open-file (fs "/home/jcunningham/turner/cnvt-to-seqb/int-test.bin"
> 		    :element-type '(unsigned-byte 8)
> 		    :direction :input)
>   (print (read-u2 fs)))
>
> (modified from P. Siebel's development in PCL). 
>
>
> But if I want to read it as a signed integer by changing the declare to
> (signed-byte 16) I get the following error:
>
> ==> The value 65000 is not of type (SIGNED-BYTE 16).
>    [Condition of type TYPE-ERROR]
>
> It seems that internally the ldb isn't just setting bits in the
> (signed-byte 16) value, but doing some bounds checking as well. 
>
> Now, I know I could open the file as a (signed-byte 8), but then when I
> encountered some (unsigned-byte 16) binary field of sufficient size in the
> file (both signed and unsigned fields are present), it will die there for
> the same reason. 
>
> What I want to be able to do is twiddle high-byte in this two-byte
> signed-word without it objecting to what the actual bits are. 
>
> Is there a way to do this? (byte-wise - doing it one bit at a time would
> be horribly inefficient). 

Here's what I have in zpb-ttf to read some of the signed & unsigned
values from a TrueType file:

  (defun read-uint16 (stream)
    (loop repeat 2
          for value = (read-byte stream)
            then (logior (ash value 8) (read-byte stream))
          finally (return value)))

  (defun read-int16 (stream)
    (let ((result (read-uint16 stream)))
      (if (logbitp 15 result)
          (1- (- (logandc2 #xFFFF result)))
          result)))

  (defun read-uint8 (stream)
    (read-byte stream))

  (defun read-int8 (stream)
    (let ((result (read-byte stream)))
      (if (logbitp 7 result)
          (1- (- (logandc2 #xFF result)))
          result)))

Zach

From: jkc
Subject: Re: Question about working with mixed signed integers
Date: Sun, 05 Aug 2007 23:16:46 +0000
Message-ID: <pan.2007.08.05.23.16.42.841622@makewavs.com>

On Sat, 04 Aug 2007 21:00:48 -0400, Zach Beane wrote:

> [quoted text muted]

Thanks to both of you, I have a much better idea what's going on now. I've
coded up some test functions and was playing around with optimization. It
seems like since the types are known that it should be possible to exploit
that. But everything I tried in the way of coercion actually made things
*much slower*. 

Is there any way to make this go faster?

(defun read-uint16 (stream)
  "Read (unsigned-byte 16) from (unsigned-byte 8) stream "
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (logior (ash (read-byte stream) 8) (read-byte stream)))

(defun read-int16 (stream)
  "Read (signed-byte 16) from (unsigned-byte 8) stream "
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (let ((result (read-uint16 stream)))
    (declare (type (unsigned-byte 16) result))
    (if (logbitp 15 result)
	(1- (- (logandc2 #xFFFF result)))
	result)))

(let ((N 10000000))
  ;; write a test file
  (with-open-file (fs "lots-of-int16s.bin" 
		      :element-type '(signed-byte 16) 
  		      :direction :output :if-exists :supersede)
    (dotimes (i N)
      (write-byte (- (random 64000) 32000) fs)))

  (with-open-file (fs "lots-of-int16s.bin" 
		      :element-type '(unsigned-byte 8)
		      :direction :input) 
    (time
     (let ((acc 0))
       (dotimes (i N)
	 (incf acc (read-int16 fs)))
       (print acc))))

The compile optimization warnings are:

; file: /tmp/filetukRZJ.lisp
; in: DEFUN READ-UINT16
;     (ASH (READ-BYTE STREAM) 8)
; 
; note: forced to do full call
;       unable to do inline ASH (cost 2) because:
;       The first argument is a INTEGER, not a FIXNUM.
;       The result is a (VALUES INTEGER &OPTIONAL), not a (VALUES FIXNUM &REST T).
;       unable to do inline ASH (cost 3) because:
;       The first argument is a INTEGER, not a (UNSIGNED-BYTE 32).
;       The result is a (VALUES INTEGER &OPTIONAL), not a (VALUES
;                                                          (UNSIGNED-BYTE 32)
;                                                          &REST T).
;       etc.

;     (LOGIOR (ASH (READ-BYTE STREAM) 8) (READ-BYTE STREAM))
; 
; note: forced to do static-fun Two-arg-ior (cost 53)
;       unable to do inline fixnum arithmetic (cost 2) because:
;       The first argument is a INTEGER, not a FIXNUM.
;       The second argument is a INTEGER, not a FIXNUM.
;       The result is a (VALUES INTEGER &OPTIONAL), not a (VALUES FIXNUM &REST T).
;       unable to do inline (signed-byte 32) arithmetic (cost 3) because:
;       The first argument is a INTEGER, not a (SIGNED-BYTE 32).
;       The second argument is a INTEGER, not a (SIGNED-BYTE 32).
;       The result is a (VALUES INTEGER &OPTIONAL), not a (VALUES
;                                                          (SIGNED-BYTE 32) &REST
;                                                          T).
;       etc.

Thanks.

--jeff

From: Zach Beane
Subject: Re: Question about working with mixed signed integers
Date: Mon, 06 Aug 2007 13:19:35 +0000
Message-ID: <m3643skdg8.fsf@unnamed.xach.com>

jkc <·······@makewavs.com> writes:

> On Sat, 04 Aug 2007 21:00:48 -0400, Zach Beane wrote:
>
>> [quoted text muted]
>
> Thanks to both of you, I have a much better idea what's going on now. I've
> coded up some test functions and was playing around with optimization. It
> seems like since the types are known that it should be possible to exploit
> that. But everything I tried in the way of coercion actually made things
> *much slower*. 
>
> Is there any way to make this go faster?

Dunno, sorry. I've never needed to read enough of these integers where
I had to care.

Zach

From: D Herring
Subject: Re: Question about working with mixed signed integers
Date: Mon, 06 Aug 2007 23:29:38 +0000
Message-ID: <puGdndY-O86vMyrbnZ2dnUVZ_oKhnZ2d@comcast.com>

jkc wrote:
> On Sat, 04 Aug 2007 21:00:48 -0400, Zach Beane wrote:
> 
>> [quoted text muted]
> 
> Thanks to both of you, I have a much better idea what's going on now. I've
> coded up some test functions and was playing around with optimization. It
> seems like since the types are known that it should be possible to exploit
> that. But everything I tried in the way of coercion actually made things
> *much slower*. 
> 
> Is there any way to make this go faster?

I don't have the energy to test this now, but would you see a speedup 
if you used arrays instead of single integers?  What if you declare 
the functions to be inline?  I'm guessing that boxing individual 
values is costing a large portion of your time.

- Daniel

From: Juho Snellman
Subject: Re: Question about working with mixed signed integers
Date: Tue, 07 Aug 2007 07:25:06 +0000
Message-ID: <slrnfbg7ii.jfc.jsnell@sbz-30.cs.Helsinki.FI>

jkc <·······@makewavs.com> wrote:
> On Sat, 04 Aug 2007 21:00:48 -0400, Zach Beane wrote:
>
>> [quoted text muted]
>
> Thanks to both of you, I have a much better idea what's going on now. I've
> coded up some test functions and was playing around with optimization. It
> seems like since the types are known that it should be possible to exploit
> that.

The types aren't known, as should be obvious from the optimization
notes printed by the compiler. Since the stream given to read-uint16
could have an element-type other than (u-b 8), the compiler can't
assume that READ-BYTE returns an octet, but rather must be prepared to
deal with it returning an unbounded integer:

> ;     (ASH (READ-BYTE STREAM) 8)
> ; 
> ; note: forced to do full call
> ;       unable to do inline ASH (cost 2) because:
> ;       The first argument is a INTEGER, not a FIXNUM.

> But everything I tried in the way of coercion actually made things
> *much slower*. 

Just declaring the types properly [e.g. by using (the (unsigned-byte 8)
(read-byte ...))] certainly shouldn't make things slower. When you say
coercion, do you really mean using CL:COERCE? That's generally not an
operator you should be using for optimization.

-- 
Juho Snellman

From: Dimiter "malkia" Stanev
Subject: Re: Question about working with mixed signed integers
Date: Tue, 07 Aug 2007 18:40:33 +0000
Message-ID: <46B8BCA1.1030908@gmail.com>

On LispWorks, I was able to get it from 1.5s to 1.1s on my machine:

(declaim (optimize (speed 3) (safety 0) (debug 0) (space 0) 
(compilation-speed 0)
                    #+lispworks (hcl:fixnum-safety 0)))

(declaim (ftype (function (stream) (unsigned-byte 16)) read-uint16)
          (ftype (function (stream) (signed-byte 16)) read-int16)
          (inline read-uint16 read-int16))

(defun read-uint16 (stream)
   "Read (unsigned-byte 16) from (unsigned-byte 8) stream "
   (declare (optimize (speed 3) (safety 0) (debug 0)))
   (logior (ash (read-byte stream) 8) (read-byte stream)))

(defun read-int16 (stream)
   "Read (signed-byte 16) from (unsigned-byte 8) stream "
   (declare (optimize (speed 3) (safety 0) (debug 0)))
   (let ((result (read-uint16 stream)))
     (declare (type (unsigned-byte 16) result))
     (if (logbitp 15 result)
	(1- (- (logandc2 #xFFFF result)))
	result)))

(defun test ()
   (let ((N 10000000))
     ;; write a test file
     (with-open-file (fs "lots-of-int16s.bin"
                         :element-type '(signed-byte 16)
                         :direction :output :if-exists :supersede)
       (dotimes (i N)
         (write-byte (- (random 64000) 32000) fs)))

     (with-open-file (fs "lots-of-int16s.bin"
                         :element-type '(unsigned-byte 8)
                         :direction :input)
       (time
        (let ((acc 0))
          (dotimes (i N)
            (incf acc (read-int16 fs)))
          (print acc))))))

#-nil
(test)

As some other poster said, making it to read from an array, rather from 
stream might be faster (it depends on how quick read-byte is 
implemented, and probably whether open-file allows something like 
setting the internal buffer size).

Here is the main loop in x86 assembly, after Lispworks optimizations:

L6:  430:      B501             moveb ch, 1
      432:      8B45E8           move  eax, [ebp-18]
      435:      FF1574F21820     call  [2018F274]       ; READ-BYTE
      441:      8945E0           move  [ebp-20], eax
      444:      C165E008         shl   [ebp-20], 8
      448:      B501             moveb ch, 1
      450:      8B45E8           move  eax, [ebp-18]
      453:      FF1574F21820     call  [2018F274]       ; READ-BYTE
      459:      8B7DE0           move  edi, [ebp-20]
      462:      0BF8             or    edi, eax
      464:      89FB             move  ebx, edi
      466:      81E300000200     and   ebx, 20000
      472:      83FB00           cmp   ebx, 0
      475:      740E             je    L7
      477:      83F7FF           xor   edi, -1
      480:      81E7FCFF0300     and   edi, 3FFFC
      486:      F7DF             neg   edi
      488:      83EF04           sub   edi, 4
L7:  491:      8B5DF8           move  ebx, [ebp-8]
      494:      03FB             add   edi, ebx
      496:      897DF8           move  [ebp-8], edi
      499:      8B7DFC           move  edi, [ebp-4]
      502:      81FFFC596202     cmp   edi, 26259FC
      508:      0F8D95010000     jge   L24
      514:      8B7DFC           move  edi, [ebp-4]
      517:      83C704           add   edi, 4
      520:      897DFC           move  [ebp-4], edi
      523:      EBA1             jmp   L6

That's not a bad code (ahem it cuold be optimized even better, but the 
real bottleneck is "read-byte").

From: jkc
Subject: Re: Question about working with mixed signed integers
Date: Wed, 08 Aug 2007 23:49:09 +0000
Message-ID: <pan.2007.08.08.23.49.07.414331@makewavs.com>

On Tue, 07 Aug 2007 11:40:33 -0700, Dimiter "malkia" Stanev wrote:

> On LispWorks, I was able to get it from 1.5s to 1.1s on my machine:
> 

On SBCL under Slime I was able to go from 1.2s to about 0.5s using your
optimizations. Pretty slick. This is very instructive for me on the
subject of optimization. 

Any idea how one can eliminate the compile warnings? E.g., 

; --> BLOCK LOGIOR 
; ==>
;   (LOGIOR
;    (LOGIOR
;     (LOGIOR (LOGIOR (LOGIOR # #) (ASH # 24)) (ASH (READ-BYTE STREAM) 16))
;     (ASH (READ-BYTE STREAM) 8))
;    (READ-BYTE STREAM))
; 
; note: forced to do static-fun Two-arg-ior (cost 53)
;       unable to do inline fixnum arithmetic (cost 2) because:
;       The first argument is a INTEGER, not a FIXNUM.
;       The second argument is a INTEGER, not a FIXNUM.
;       The result is a (VALUES (UNSIGNED-BYTE 64) &OPTIONAL), not a (VALUES
;                                                                     FIXNUM
;                                                                     &REST T).
;       unable to do inline (signed-byte 32) arithmetic (cost 3) because:
;       The first argument is a INTEGER, not a (SIGNED-BYTE 32).
;       The second argument is a INTEGER, not a (SIGNED-BYTE 32).
;       The result is a (VALUES (UNSIGNED-BYTE 64) &OPTIONAL), not a (VALUES
;                                                                     (SIGNED-BYTE
;                                                                      32)
;                                                                     &REST T).
;       etc.

Regards,

--Jeff

From: Pascal Bourguignon
Subject: Re: Question about working with mixed signed integers
Date: Sun, 05 Aug 2007 01:17:15 +0000
Message-ID: <87ps223hlw.fsf@voyager.informatimago.com>

jkc <·······@makewavs.com> writes:

> I frequently need to work with binary data files. Writing them with Lisp
> is no problem. Reading them has posed some challenges. I think I've boiled
> one of them down to a very simple case that I'm hoping one of you can
> enlighten me on. 
>
> Suppose I want to read the following 16-bit integer: E8FD (little-endian).
> As an (unsigned-byte 16) it is easily read with: 
>
> (defun read-u2 (in &optional (type :little-endian))
>   "read a 2-byte integer"
>   (let ((value 0))
>     (declare (type (unsigned-byte 16) value))
>     (case type
>       (:big-endian
>        (setf (ldb (byte 8 8) value) (read-byte in))
>        (setf (ldb (byte 8 0) value) (read-byte in)))
>       (otherwise
>        (setf (ldb (byte 8 0) value) (read-byte in))
>        (setf (ldb (byte 8 8) value) (read-byte in))))
>     value))
>
> (with-open-file (fs "/home/jcunningham/turner/cnvt-to-seqb/int-test.bin"
> 		    :element-type '(unsigned-byte 8)
> 		    :direction :input)
>   (print (read-u2 fs)))
>
> (modified from P. Siebel's development in PCL). 
>
>
> But if I want to read it as a signed integer by changing the declare to
> (signed-byte 16) I get the following error:
>
> ==> The value 65000 is not of type (SIGNED-BYTE 16).
>    [Condition of type TYPE-ERROR]
>
> It seems that internally the ldb isn't just setting bits in the
> (signed-byte 16) value, but doing some bounds checking as well. 
>
> Now, I know I could open the file as a (signed-byte 8), but then when I
> encountered some (unsigned-byte 16) binary field of sufficient size in the
> file (both signed and unsigned fields are present), it will die there for
> the same reason. 
>
> What I want to be able to do is twiddle high-byte in this two-byte
> signed-word without it objecting to what the actual bits are. 
>
> Is there a way to do this? (byte-wise - doing it one bit at a time would
> be horribly inefficient). 

Indeed, given that lisp negative integers have an infinite number of
non-null bits!  Try: (ldb (byte 8 100000) -3)
or any even bigger value for the position...

So what you need to do, is to read your 2-complement number as a
positive lisp number, and then convert it to a signed integer:

  (defun unsigned-to-signed/2-complement (x width) 
    (declare (integer x width))
    (let ((maxpos+1 (expt 2 (1- width))))
	 (if (< x maxpos+1) x (- x (* 2 maxpos+1)))))


* (UNSIGNED-TO-SIGNED/2-COMPLEMENT 32000 16)

32000
* (UNSIGNED-TO-SIGNED/2-COMPLEMENT 64000 16)

-1536
* (+ (UNSIGNED-TO-SIGNED/2-COMPLEMENT 64000 16)
     (UNSIGNED-TO-SIGNED/2-COMPLEMENT  1536 16))

0
* 



-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

READ THIS BEFORE OPENING PACKAGE: According to certain suggested
versions of the Grand Unified Theory, the primary particles
constituting this product may decay to nothingness within the next
four hundred million years.