Bits and bytes in CL

From: Oyvin Halfdan Thuv
Subject: Bits and bytes in CL
Date: Tue, 08 Nov 2005 13:46:06 +0000
Message-ID: <slrndn1b0u.lhs.oyvinht@apollo.orakel.ntnu.no>

Does anyone have a reccomodation of efficiently coercing a bit-vector into an 
integer?

I have a bit string consisting of (maybe) 8 bits representing one number, and
(maybe) 16 bits representing another number. That is, they are both represented
using the same bit string.

E.g. the bit string #*001100111000000000000001 should coerce to the values 51
and 32769.

I guess I could do this (perhaps slowly) using subseq, and afterwards calulating
each integer by recursively summing up the (expt index 2), but is there a
fast built-in way of doing this?

-- 
Oyvin

Re: Bits and bytes in CL Petter Gustad
Re: Bits and bytes in CL Carl Taylor
- Re: Bits and bytes in CL Oyvin Halfdan Thuv
  - Re: Bits and bytes in CL Petter Gustad
    - Re: Bits and bytes in CL Carl Taylor
      - Re: Bits and bytes in CL Petter Gustad
        Re: Bits and bytes in CL Carl Taylor
        Re: Bits and bytes in CL Juho Snellman
        Re: Bits and bytes in CL Marcin 'Qrczak' Kowalczyk
        Re: Bits and bytes in CL Petter Gustad
        Re: Bits and bytes in CL Oyvin Halfdan Thuv
        Re: Bits and bytes in CL Petter Gustad

From: Petter Gustad
Subject: Re: Bits and bytes in CL
Date: Tue, 08 Nov 2005 14:47:07 +0000
Message-ID: <87acgfp578.fsf@filestore.home.gustad.com>

Oyvin Halfdan Thuv <·······@nospam.orakel.ntnu.no> writes:

> Does anyone have a reccomodation of efficiently coercing a
> bit-vector into an integer?

Maybe:

(defun bvi (bv)
  "bit-vector to integer conversion"
  (loop for bit across bv and
        i from 0
        sum (ash bit i))) 

(ldb (byte 8 0) (bvi #*001100111000000000000001)) -> 204
(ldb (byte 16 8) (bvi #*001100111000000000000001)) -> 32769

Petter
-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

From: Carl Taylor
Subject: Re: Bits and bytes in CL
Date: Tue, 08 Nov 2005 17:51:18 +0000
Message-ID: <qA5cf.61568$zb5.53870@bgtnsc04-news.ops.worldnet.att.net>

Oyvin Halfdan Thuv wrote:
> Does anyone have a reccomodation of efficiently coercing a bit-vector
> into an integer?
>
> I have a bit string consisting of (maybe) 8 bits representing one
> number, and (maybe) 16 bits representing another number. That is,
> they are both represented using the same bit string.
>
> E.g. the bit string #*001100111000000000000001 should coerce to the
> values 51 and 32769.

(defun coerce-bit-vector-to-integer (in-bit-vector)
   (let ((b-pos -1))
    (reduce #'+
            (the simple-bit-vector in-bit-vector)
            :from-end t
            :key #'(lambda (b)
                        (incf b-pos)
                        (ash b b-pos)))))


CL-USER 26 > (ldb (byte 8 16) (coerce-bit-vector-to-integer 
#*001100111000000000000001))
51

CL-USER 27 > (ldb (byte 16 0) (coerce-bit-vector-to-integer 
#*001100111000000000000001))
32769

Carl Taylor

From: Oyvin Halfdan Thuv
Subject: Re: Bits and bytes in CL
Date: Tue, 08 Nov 2005 18:52:13 +0000
Message-ID: <slrndn1sut.lhs.oyvinht@apollo.orakel.ntnu.no>

In article <·····················@bgtnsc04-news.ops.worldnet.att.net>,
Carl Taylor wrote:
> Oyvin Halfdan Thuv wrote:
>> Does anyone have a reccomodation of efficiently coercing a bit-vector
>> into an integer?
>>
>> I have a bit string consisting of (maybe) 8 bits representing one
>> number, and (maybe) 16 bits representing another number. That is,
>> they are both represented using the same bit string.
>>
>> E.g. the bit string #*001100111000000000000001 should coerce to the
>> values 51 and 32769.
> 
> (defun coerce-bit-vector-to-integer (in-bit-vector)
>    (let ((b-pos -1))
>     (reduce #'+
>             (the simple-bit-vector in-bit-vector)
>             :from-end t
>             :key #'(lambda (b)
>                         (incf b-pos)
>                         (ash b b-pos)))))

Thanks, this is quite fast! 

-- 
Oyvin

From: Petter Gustad
Subject: Re: Bits and bytes in CL
Date: Tue, 08 Nov 2005 19:43:53 +0000
Message-ID: <87lkzzvsau.fsf@filestore.home.gustad.com>

Oyvin Halfdan Thuv <·······@nospam.orakel.ntnu.no> writes:


> Thanks, this is quite fast! 

Very nice use of reduce, but it conses a bit. Further, it collects the
bits in the wrong order (unless my brain is in reverse mode today).
The bit-vector you supplied:

#*001100111000000000000001

corresponds to the hexadecimal number #x8001cc. 

CL-USER> #x8001cc
8389068
CL-USER> (bvi #*001100111000000000000001)
8389068
CL-USER> (coerce-bit-vector-to-integer #*001100111000000000000001)
3375105

I don't know which implementation you are using, but using CMUCL on my
old 200MHz Pentium Pro machine I get:

CL-USER> (lisp-implementation-type)
"CMU Common Lisp"
CL-USER> (lisp-implementation-version)
"CVS release-18e-branch + minimal debian patches"
CL-USER> (time (dotimes (n 100000) (coerce-bit-vector-to-integer  #*001100111000000000000001)))
; Compiling LAMBDA NIL: 
; Compiling Top-Level Form: 

; Evaluation took:
;   4.37 seconds of real time
;   3.44 seconds of user run time
;   0.63 seconds of system run time
;   869,560,677 CPU cycles
;   [Run times include 0.45 seconds GC run time]
;   0 page faults and
;   39,200,256 bytes consed.
; 
NIL
CL-USER> (time (dotimes (n 100000) (bvi  #*001100111000000000000001)))
; Compiling LAMBDA NIL: 
; Compiling Top-Level Form: 

; Evaluation took:
;   0.72 seconds of real time
;   0.73 seconds of user run time
;   0.0 seconds of system run time
;   142,779,745 CPU cycles
;   0 page faults and
;   64 bytes consed.
; 
NIL
CL-USER> 

BTW, I added the same optimization to bvi:

(defun bvi (bv)
  "bit-vector to integer conversion"
  (loop for bit across (the simple-bit-vector bv) and
        i from 0
        sum (ash bit i))) 


Petter

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

From: Carl Taylor
Subject: Re: Bits and bytes in CL
Date: Wed, 09 Nov 2005 00:35:08 +0000
Message-ID: <0vbcf.35645$qk4.3328@bgtnsc05-news.ops.worldnet.att.net>

Petter Gustad wrote:
> Oyvin Halfdan Thuv <·······@nospam.orakel.ntnu.no> writes:
>
>
>> Thanks, this is quite fast!
>
> Very nice use of reduce, but it conses a bit.

There's no consing in LispWorks for the <reduce> in this  function *if it's 
compiled*:

CL-USER 24 > (time (coerce-bit-vector-to-integer 
#*11100101010000100101000100111))
Timing the evaluation of (COERCE-BIT-VECTOR-TO-INTEGER 
#*11100101010000100101000100111)

user time    =      0.000
system time  =      0.000
Elapsed time =   0:00:00
Allocation   = 128 bytes standard / 0 bytes conses
0 Page faults
480791079

>...... Further, it collects the
> bits in the wrong order (unless my brain is in reverse mode today).

No, to parce or convert a binary *integer* one must proceed right to left, 
i.e. the zeroth bit of a binary number is the rightmost bit. Yet the zeroth 
bit of a *bit vector* is the leftmost.  Hence the <:from-end t> keyword 
argument to obtain the results expected by the OP, i.e 51 and not 204.

Carl Taylor

From: Petter Gustad
Subject: Re: Bits and bytes in CL
Date: Wed, 09 Nov 2005 10:57:01 +0000
Message-ID: <87pspam6ma.fsf@filestore.home.gustad.com>

"Carl Taylor" <··········@att.net> writes:

> Petter Gustad wrote:
>> Oyvin Halfdan Thuv <·······@nospam.orakel.ntnu.no> writes:
>>
>>
>>> Thanks, this is quite fast!
>>
>> Very nice use of reduce, but it conses a bit.
>
> There's no consing in LispWorks for the <reduce> in this  function *if
> it's compiled*:

Seems like I have to buy a commercial compiler soon. Did you specify
any optimize declarations? 

; Compilation finished in 0:00:00.
; Loading #p"/raid0/home/petter/lisp/bitvector.x86f".

CL-USER> (time (dotimes (n 100000) (coerce-bit-vector-to-integer  #*001100111000000000000001)))
; Compiling LAMBDA NIL: 
...
;   39,200,256 bytes consed.

>>...... Further, it collects the
>> bits in the wrong order (unless my brain is in reverse mode today).
>
> No, to parce or convert a binary *integer* one must proceed right to
> left, i.e. the zeroth bit of a binary number is the rightmost bit. Yet
> the zeroth bit of a *bit vector* is the leftmost.  Hence the

But that is the opposite bit-ordering of Common Lisp bit oriented
functions, where bit zero is the least significant bit. When you talk
about left and right here I assume you refer to the way arrays are
*printed* by the REPL.


(loop for i below 24 when (logbitp i #x8001cc) collect i)
-> (2 3 6 7 8 23)

(loop for i below 24 when (eq (ldb (byte 1 i) #x8001cc) 1) collect i)
-> (2 3 6 7 8 23)

or to extract the least significant byte, from bit 0 and 8 bits
towards the most significant bit:

(format nil "~X" (ldb (byte 8 0) #x8001cc))
-> "CC"

Doing the same thing using bit-vectors:

(loop for i below 24 when (eq (bit #*001100111000000000000001 i) 1) collect i)
(2 3 6 7 8 23)

(format nil "~X" (bvi (subseq #*001100111000000000000001 0 8)))
-> "CC"

Going back to �yvins message I see that he also uses big-endian bit
ordering in order to get the values he listed. Personally, I prefer to
use the same bit-ordering as the bit oriented Common Lisp functions.

Of course one can use any bit-order (even middle-out) as long as you
know where the bits are. 

Petter
-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

From: Carl Taylor
Subject: Re: Bits and bytes in CL
Date: Wed, 09 Nov 2005 16:21:45 +0000
Message-ID: <tmpcf.38164$qk4.37061@bgtnsc05-news.ops.worldnet.att.net>

Petter Gustad wrote:

> Seems like I have to buy a commercial compiler soon. Did you specify
> any optimize declarations?

No, I just did a copy/paste of the function into the REPL followed by a
(compile *) at the next prompt.

> CL-USER> (time (dotimes (n 100000) (coerce-bit-vector-to-integer
> #*001100111000000000000001))) ; Compiling LAMBDA NIL:
> ...
> ;   39,200,256 bytes consed.

You have collected consing data for <dotimes> as well as for the coerce
function! <dotimes> is a macro that expands into various type checks,
conditional limit checks, and system calls, i.e. a lot of overhead.

>> No, to parce or convert a binary integer one must proceed right to
>> left, i.e. the zeroth bit of a binary number is the rightmost bit.
>> Yet the zeroth bit of a bit vector is the leftmost.  Hence the
>
> But that is the opposite bit-ordering of Common Lisp bit oriented
> functions, where bit zero is the least significant bit. When you talk
> about left and right here I assume you refer to the way arrays are
> printed by the REPL.

But that was the problem to be solved. As you say below the OP decided to
have his bit vectors isomorphic to binary numbers, for #*1101 to match up
with #b1101. This certainly is the natural way to go.  It would be very
confusing to handle a situation where #*1011 is meant to represent the value
of #b1101.

> Going back to �yvins message I see that he also uses big-endian bit
> ordering in order to get the values he listed. Personally, I prefer to
> use the same bit-ordering as the bit oriented Common Lisp functions.
>
> Of course one can use any bit-order (even middle-out) as long as you
> know where the bits are.

Why make life more difficult for yourself?

Carl Taylor

From: Juho Snellman
Subject: Re: Bits and bytes in CL
Date: Wed, 09 Nov 2005 17:25:09 +0000
Message-ID: <slrndn4c7l.v7t.jsnell@sbz-30.cs.Helsinki.FI>

<··········@att.net> wrote:
> Petter Gustad wrote:
>> CL-USER> (time (dotimes (n 100000) (coerce-bit-vector-to-integer
>> #*001100111000000000000001))) ; Compiling LAMBDA NIL:
>> ...
>> ;   39,200,256 bytes consed.
> 
> You have collected consing data for <dotimes> as well as for the coerce
> function!

So you prefer the superior accuracy of measuring just one iteration of
a function that will have an average run-time a orders of magnitude
lower than the timer resolution?

For what it's worth, there are reasonable implementation strategies
for TIME which don't give exact consing figures, but for example ones
rounded to nearest 4KB or 8KB. So doing just one call to a low-consing
function is also pretty worthless for measuring consing.

> <dotimes> is a macro that expands into various type checks,
> conditional limit checks, and system calls, i.e. a lot of overhead.

I would hope that any CL implementation compiles a DOTIMES as trivial
as that to near-optimal code, unless interpreted or compiled with
extereme debug settings. Type checks? Sounds unlikely, considering the
iteration count is constant. Consing, system calls? Pure insanity. The
overhead imposed by the approximately three machine instructions
required for the DOTIMES will be just noise compared to the amount of
work that even a heroically optimized version of the reduce-based
COERCE-BIT-VECTOR-TO-INTEGER would do.

Quiz time! What *two* things are responsible for the consing for the
function (reproduced below) on CMUCL? What would be the minimal
changes to this code to make it non-consing? (Hint: REDUCE isn't
open-coded, and CMUCL doesn't do any dynamic-extent optimizations by
default).

(defun coerce-bit-vector-to-integer (in-bit-vector)
   (let ((b-pos -1))
    (reduce #'+
            (the simple-bit-vector in-bit-vector)
            :from-end t
            :key #'(lambda (b)
                        (incf b-pos)
                        (ash b b-pos)))))

S
P
O
I
L
E
R

S
P
A
C
E

#'+ is a variadic function. When it's funcalled by REDUCE, a
&rest-list of one element has to be consed. With dynamic-extent
support this could be allocated on the stack, but without it'll have
to be done from the heap. Direct calls are transformed into a
&rest-less form by the compiler, and not affected. A similar issue
exists for other functions with &rest-lists, for example cases like
(SORT FOO :TEST #'>). To "fix", just replace the #'+ with 
(lambda (a b) (+ a b)).

The second (and less noticeable) cause for consing is that the :key
lambda closes over b-pos, and is then passed to REDUCE. A
non-dynamic-extent closure requires heap allocation. Simplest
workaround that I can see (and a rather bad one) would be declaring
b-pos as special.

-- 
Juho Snellman
"Premature profiling is the root of all evil."

From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Bits and bytes in CL
Date: Wed, 09 Nov 2005 16:30:05 +0000
Message-ID: <87r79p4wdu.fsf@qrnik.zagroda>

"Carl Taylor" <··········@att.net> writes:

> As you say below the OP decided to have his bit vectors isomorphic
> to binary numbers, for #*1101 to match up with #b1101. This
> certainly is the natural way to go.

It's not natural because the notation for bit vectors proceeds with
increasing indices, while the notation for binary numbers proceeds
with decreasing indices.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From: Petter Gustad
Subject: Re: Bits and bytes in CL
Date: Wed, 09 Nov 2005 17:25:53 +0000
Message-ID: <87slu5lom6.fsf@filestore.home.gustad.com>

"Carl Taylor" <··········@att.net> writes:

>> CL-USER> (time (dotimes (n 100000) (coerce-bit-vector-to-integer
>> #*001100111000000000000001))) ; Compiling LAMBDA NIL:
>> ...
>> ;   39,200,256 bytes consed.
>
> You have collected consing data for <dotimes> as well as for the coerce
> function! <dotimes> is a macro that expands into various type checks,
> conditional limit checks, and system calls, i.e. a lot of overhead.

That would be the same for the bvi function as well, which only
reported 64 bytes consed (actually 0 in CMUCL 19a).
From my earlier message:

CL-USER> (time (dotimes (n 100000) (bvi  #*001100111000000000000001)))
...
;   64 bytes consed.

CL-USER> (macroexpand-1 '(dotimes (n 100000) (coerce-bit-vector-to-integer  #*001100111000000000000001)))
(DO ((N 0 (1+ N)))
    ((>= N 100000) NIL)
  (DECLARE (TYPE (INTEGER 0 100000) N))
  (COERCE-BIT-VECTOR-TO-INTEGER #*001100111000000000000001))

CL-USER> (macroexpand-1 '(dotimes (n 100000) (bvi  #*001100111000000000000001)))
(DO ((N 0 (1+ N)))
    ((>= N 100000) NIL)
  (DECLARE (TYPE (INTEGER 0 100000) N))
  (BVI #*001100111000000000000001))

I would assume the DO would expand into the same for both of them as
well.

> This certainly is the natural way to go. It would be very confusing
> to handle a situation where #*1011 is meant to represent the value
> of #b1101.

I find this very unnatural since this is the opposite of the
bit-ordering of Common Lisp . If I was to populate the bit-vector
above and an integer I could do:


(let ((bv (make-array 4 :element-type 'bit))
      (word 0))
  (dolist (bitno '(0 2 3))
    (setf (bit bv bitno) 1)
    (setf (ldb (byte 1 bitno) word) 1)))


As supposed to:

(let ((bv (make-array 4 :element-type 'bit))
      (word 0))
  (dolist (bitno '(0 2 3))
    (setf (bit bv (- 3 bitno)) 1)
    (setf (ldb (byte 1 bitno) word) 1)))


It's a matter of taste, but I find it more natural to use the same
bit-ordering as the other Common Lisp functions. The order in wich the
REPL prints arrays is less relevant.

Petter

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

From: Oyvin Halfdan Thuv
Subject: Re: Bits and bytes in CL
Date: Wed, 09 Nov 2005 18:53:40 +0000
Message-ID: <slrndn4hdk.bdm.oyvinht@apollo.orakel.ntnu.no>

In article <··············@filestore.home.gustad.com>, Petter Gustad wrote:
> "Carl Taylor" <··········@att.net> writes:
> 
>>> CL-USER> (time (dotimes (n 100000) (coerce-bit-vector-to-integer
>>> #*001100111000000000000001))) ; Compiling LAMBDA NIL:
>>> ...
>>> ;   39,200,256 bytes consed.
>>
>> You have collected consing data for <dotimes> as well as for the coerce
>> function! <dotimes> is a macro that expands into various type checks,
>> conditional limit checks, and system calls, i.e. a lot of overhead.
> 
> That would be the same for the bvi function as well, which only
> reported 64 bytes consed (actually 0 in CMUCL 19a).
> From my earlier message:
> 
> CL-USER> (time (dotimes (n 100000) (bvi  #*001100111000000000000001)))
> ...
> ;   64 bytes consed.
> 
> CL-USER> (macroexpand-1 '(dotimes (n 100000) (coerce-bit-vector-to-integer  #*001100111000000000000001)))
> (DO ((N 0 (1+ N)))
>     ((>= N 100000) NIL)
>   (DECLARE (TYPE (INTEGER 0 100000) N))
>   (COERCE-BIT-VECTOR-TO-INTEGER #*001100111000000000000001))
> 
> CL-USER> (macroexpand-1 '(dotimes (n 100000) (bvi  #*001100111000000000000001)))
> (DO ((N 0 (1+ N)))
>     ((>= N 100000) NIL)
>   (DECLARE (TYPE (INTEGER 0 100000) N))
>   (BVI #*001100111000000000000001))
> 
> I would assume the DO would expand into the same for both of them as
> well.
> 
>> This certainly is the natural way to go. It would be very confusing
>> to handle a situation where #*1011 is meant to represent the value
>> of #b1101.
> 
> I find this very unnatural since this is the opposite of the
> bit-ordering of Common Lisp . If I was to populate the bit-vector
> above and an integer I could do:
> 
> 
> (let ((bv (make-array 4 :element-type 'bit))
>       (word 0))
>   (dolist (bitno '(0 2 3))
>     (setf (bit bv bitno) 1)
>     (setf (ldb (byte 1 bitno) word) 1)))
> 
> 
> As supposed to:
> 
> (let ((bv (make-array 4 :element-type 'bit))
>       (word 0))
>   (dolist (bitno '(0 2 3))
>     (setf (bit bv (- 3 bitno)) 1)
>     (setf (ldb (byte 1 bitno) word) 1)))
> 
> 
> It's a matter of taste, but I find it more natural to use the same
> bit-ordering as the other Common Lisp functions. The order in wich the
> REPL prints arrays is less relevant.

This expanded to a very interesting discussion. I didn't know that CL used 
bits ordering non-proportional to binary numbers. The reason I went for big-
endian representation in the first place was the ease of debugging (I happen
to print out and look at the bit sequences from time to time, and I am used
to reading them that way). 

Thanks for the tips anyway. Both of you.

-- 
Oyvin

From: Petter Gustad
Subject: Re: Bits and bytes in CL
Date: Wed, 09 Nov 2005 21:32:45 +0000
Message-ID: <878xvx1p8i.fsf@parish.home.gustad.com>

Oyvin Halfdan Thuv <·······@not-spam.orakel.ntnu.no> writes:

> > It's a matter of taste, but I find it more natural to use the same
> > bit-ordering as the other Common Lisp functions. The order in wich the
> > REPL prints arrays is less relevant.
> 
> This expanded to a very interesting discussion. I didn't know that CL used 
> bits ordering non-proportional to binary numbers. 

I'm not sure what you mean by "non-proportional to binary numbers",
but if the bit-ordering was big-endian what would be the bit number of
the least significant bit of an arbitrarily large (or variable size)
integer? How would you write a function to return the least
significant bit?

Using little-endian bit ordering you call the least significant bit 0
and move towards the more significant end. You can do stuff like:

CL-USER> (expt 2 257)
231584178474632390847141970017375815706539969331281128078915168015826259279872
CL-USER> (logbitp 257 (expt 2 257))
T
CL-USER> (ldb (byte 3 255) (expt 2 257))
4

>The reason I went for big-endian representation in the first place
>was the ease of debugging (I happen to print out and look at the bit
>sequences from time to time, and I am used to reading them that way).

Why not convert the bit-vector to integer before printing:

(let ((bv (make-array 258 :element-type 'bit)))
  (setf (bit bv 257) 1)
  (format t "~,,' ,4:B~%" (bvi bv))
  (format t "~,,' ,4:X~%" (bvi bv)))

10 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
2 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Or make a function to print the bits in reversed order.

Petter

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?