From: Jason Kantz
Subject: Byte Manipulation
Date: 
Message-ID: <wkk8jxj1m5.fsf@kantz.com>
Hi, 

I will be making some data structures that will be manipulated as
bits and bytes and written to a stream.  I've noticed two ways to do
this ... 1. use integers and functions like byte, ldb, and dpb
... 2. make bit-vectors and use sequence operations.  Are there
obvious advantages/disadvantages to either that someone can point out?
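
For concreteness, here's a rough sketch of the two styles I mean
(toy values, nothing real):

;; Style 1: an integer plus BYTE/LDB/DPB.
(let ((header 0))
  (setf header (dpb 5 (byte 4 0) header))  ; low 4 bits := 5
  (setf header (dpb 3 (byte 2 4) header))  ; next 2 bits := 3
  (ldb (byte 4 0) header))                 ; => 5

;; Style 2: a bit-vector plus sequence operations.
(let ((header (make-array 8 :element-type 'bit :initial-element 0)))
  (setf (sbit header 0) 1)
  (subseq header 0 4))                     ; => #*1000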

From: Joe Marshall
Subject: Re: Byte Manipulation
Date: 
Message-ID: <ifxs4.58235$vi4.123159@dfw-read.news.verio.net>
Bit vectors are likely to have more overhead than integers
when the integer is a fixnum.  If you are reading single
bytes and bits, you'll probably get more performance and
less consing with integers.

If you are using enough bits to make the integer spill over
into a bignum, then the overhead should be more or
less the same.  I'd probably use the sequence operations
in this case.  I find the sequence operations to be more
powerful and less confusing than the integer ones.
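
For example (the spill point varies by implementation, so take the
exact types with a grain of salt):

(type-of most-positive-fixnum)       ; FIXNUM, or an equivalent INTEGER type
(type-of (1+ most-positive-fixnum))  ; a bignum type, spelled per implementation
;; Reading a single byte out of a fixnum shouldn't cons at all:
(ldb (byte 8 0) 12345)               ; => 57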

Jason Kantz <·····@kantz.com> wrote in message
···················@kantz.com...
> Hi,
>
> I will be making some data structures that will be manipulated as
> bits and bytes and written to a stream.  I've noticed two ways to do
> this ... 1. use integers and functions like byte, ldb, and dpb
> ... 2. make bit-vectors and use sequence operations.  Are there
> obvious advantages/disadvantages to either that someone can point out?
>
From: Robert Munyer
Subject: Re: Byte Manipulation
Date: 
Message-ID: <88unce$go4$1@eve.enteract.com>
In article <··············@kantz.com>,
Jason Kantz <·····@kantz.com> wrote:

> I will be making some data structures that will be manipulated
> as bits and bytes and written to a stream.  I've noticed two ways
> to do this ... 1. use integers and functions like byte, ldb, and
> dpb ... 2. make bit-vectors and use sequence operations.

If you'll only be using your bits as truth values, or as one-bit
numbers, then bit-vectors are fine.  But if you'll ever want to
interpret some groups of bits as n-bit numbers, where n > 1,
and you don't have a specific need for any sequence operations,
I think just using integers instead of vectors would be a win.


> Are there obvious advantages/disadvantages to either that someone
> can point out?

Bit-vectors have only one truly fundamental advantage (i.e.
something that integers can't do at all).  That's the ability to
have a fixed length that's independent of the bits it contains.
For example, if I just give you the number 60, you don't have any
way to know if I meant it as an 8-bit number, or a 19-bit number,
or whatever.  But if I give you the bit-vector #*00111100, you know
that I meant it as a vector of length 8.  To get the same effect
with integers, you'd have to create a data structure (perhaps a
cons) and store the length as a separate small integer.
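
A quick illustration: INTEGER-LENGTH can only tell you how many bits
the value happens to need, not how many you meant.

(integer-length 60)  ; => 6 -- was it 8 bits?  19?  No way to tell.
(length #*00111100)  ; => 8 -- the vector carries its length with it.
(cons 60 8)          ; e.g. (value . width), if you go the integer route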

If you don't need that particular feature (built-in, data-independent,
fixed length), your decision is just a matter of speed and convenience,
because you could get every other feature just by converting back
and forth when appropriate.

-- Robert Munyer <······@mcs.com>
From: Tim Bradshaw
Subject: Re: Byte Manipulation
Date: 
Message-ID: <ey3itzhauhj.fsf@cley.com>
* Robert Munyer wrote:

> If you'll only be using your bits as truth values, or as one-bit
> numbers, then bit-vectors are fine.  But if you'll ever want to
> interpret some groups of bits as n-bit numbers, where n > 1,
> and you don't have a specific need for any sequence operations,
> I think just using integers instead of vectors would be a win.

Note there's a significant disadvantage to using integers, especially
if you end up using bignums, which is that you can't modify them.  So
if I want to set some bit to zero, I have to make a copy of the whole
object.  Of course an SSC (sufficiently smart compiler) might, in
theory, be able to notice that the original number was not referred to
elsewhere, and modify it in place, but I wonder if anyone actually
does that.

Of course this is only a big problem if you have bignums and you want
to modify them.
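
To make that concrete (the consing is implementation-dependent, of
course, but this is the typical picture):

;; Integers: "clearing" bit 70 builds a whole new bignum; X is untouched.
(let ((x (1- (expt 2 100))))
  (dpb 0 (byte 1 70) x))

;; Bit-vectors: the same change mutates in place, no consing.
(let ((v (make-array 100 :element-type 'bit :initial-element 1)))
  (setf (sbit v 70) 0)
  v)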

--tim
From: Robert Monfera
Subject: Re: Byte Manipulation
Date: 
Message-ID: <38B30120.37F210E2@fisec.com>
Tim Bradshaw wrote:

> Note there's a significant disadvantage to using integers, especially
> if you end up using bignums, which is that you can't modify them.  So
> if I want to set some bit to zero, I have to make a copy of the whole
> object.

I see your point when the number is a bignum.

Some compilers seem to track the number of references to a number as
they analyze the lexical environment or nested numerical expressions.
If a number is _unboxed_ (for example, when you accumulate numbers into
a variable inside a loop, or perform bit operations), these compilers
already generate code that avoids consing (copying).  Or are you
suggesting that they could use assembly instructions that directly
operate on a memory byte without having to fetch (copy) it into a
register, perform the operation, and store it back?

In case I misunderstood you, could you give an example?

Robert
From: Tim Bradshaw
Subject: Re: Byte Manipulation
Date: 
Message-ID: <ey33dqkb1oh.fsf@cley.com>
* Robert Monfera wrote:

> I see your point when the number is a bignum.

I think the only interesting case is when the numbers are bignums.  If
they're fixnums (in the usual implementation, I guess; I don't know if
that's even defined anywhere), then you don't cons anyway, since the
number is represented immediately.

> Some compilers seem to know the number of references to a number as they
> analyze the lexical environment or nested numerical expressions.  If a
> number is _unboxed_ (for example, you accumulate numbers into a variable
> inside a loop, or perform bit operations), these compilers generate code
> that avoids consing (copying) already.  Or are you suggesting that they
> could use assembly instructions that directly operate on a memory byte
> without having to fetch (copy) it into a register, perform the operation
> and store back?

No, I wasn't suggesting anything fancy (I don't think modern machines
support memory-memory ops anyway typically, do they?).  All I meant
was something like this:

        (let ((x (1- (expt 2 100))))
          (let ((y (logand x (1- (expt 2 50)))))
            ...))

(Here X and Y are meant to be obviously bignums.)  X is not altered
by this operation, so there needs to be a whole bunch of copying going
on.  Now, perhaps a very smart compiler could spot that X was not used
in the ... bit, and actually reuse its storage for Y, but I wonder if
any do.

This is completely different from the bit-array case, where you can
explicitly modify arrays in a way that should be obviously cons-free.
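
For instance, the bit-array functions take an optional result
argument, so you can even overwrite one of the inputs:

;; Passing A as the result array writes the answer into A's storage.
(let ((a (make-array 8 :element-type 'bit
                       :initial-contents '(1 1 1 1 0 0 0 0)))
      (b (make-array 8 :element-type 'bit
                       :initial-contents '(1 0 1 0 1 0 1 0))))
  (bit-and a b a)
  a)                 ; => #*10100000, no fresh array consed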

--tim
From: Robert Monfera
Subject: Re: Byte Manipulation
Date: 
Message-ID: <38B3F7FE.BE2F16B3@fisec.com>
Tim Bradshaw wrote:

> I think the only interesting case is when the numbers are bignums.  If
> they're fixnums (in the usual implementation, I guess; I don't know if
> that's even defined anywhere), then you don't cons anyway, since the
> number is represented immediately.

If we look at your example or this one:

(defun reuse (a x)
  (declare (optimize (safety 0) speed)
           (type fixnum x)
           (type (simple-array (signed-byte 32) (*)) a))
  ;; Every intermediate value here fits in a fixnum.
  (setf (aref a 1)
        (+ 10 (* 2 (1- (logand x 127)))))
  (values))

consing (memory allocation for Y) or additional register usage would
still be performed by a sufficiently naive compiler, even for fixnums.
(I meant to include unboxed floats, BTW.)  The reason you are right in
practical terms is that fixnums or floats would normally be held whole
in registers.

The only reason I bring up this suspicion is that I think the analysis
you describe below is already performed by compilers, to a certain
extent at least, because it's almost the same as the commonly used
register coloring.  The compiler tries to reuse registers that hold
values no longer needed, in an effort to minimize memory access and
usage.

The difference is whether we talk about a compiler not smart enough to
trace lexical variable use vs. a sufficiently smart compiler with
suboptimal bignum handling.  I prefer the latter :-)

> No, I wasn't suggesting anything fancy (I don't think modern machines
> support memory-memory ops anyway typically, do they?).  All I meant
> was something like this:
>
>         (let ((x (1- (expt 2 100))))
>           (let ((y (logand x (1- (expt 2 50)))))
>             ...))
>
> (Here X and Y are meant to be obviously bignums.)  X is not altered
> by this operation, so there needs to be a whole bunch of copying going
> on.  Now, perhaps a very smart compiler could spot that X was not used
> in the ... bit, and actually reuse its storage for Y, but I wonder if
> any do.

In any case, this code is equivalent to this:

(let ((y (logand (1- (expt 2 100)) (1- (expt 2 50))))) ...)

where (1- (expt 2 100)) and (1- (expt 2 50)) are constants, but if you
were subtracting from values unknown at compile time, I'd expect three
bignum allocations generated _inside_ the expression.

> This is completely different from the bit-array case, where you can
> explicitly modify arrays in a way that should be obviously cons-free.

Because bignums are probably implemented as some sequence of
quasi-fixnums or bit-vectors, there may be some unsupported,
implementation-dependent way to achieve this effect with bignums.

Robert
From: Tim Bradshaw
Subject: Re: Byte Manipulation
Date: 
Message-ID: <ey3ya8banzx.fsf@cley.com>
* Robert Monfera wrote:

> consing (memory allocation for Y) or additional register usage would
> still be performed by a sufficiently naive compiler, even for fixnums.
> (I meant to include unboxed floats, BTW.)  The reason you are right in
> practical terms is that fixnums or floats would normally be held whole
> in registers.

Unless you mean floats (which are not fixnums), I'd be kind of
surprised if there is consing going on for any fixnum-only operation.
It's kind of a definition of fixnums that they don't cons (yes, I
know, it's not really a definition, but it works), because they are
represented immediately.  I don't care about additional register
usage; that's way too implementation-dependent to be interesting
(unless you have an implementation where registers are
heap-allocated!).

I think you are still not understanding what I'm saying (but I may be
wrong, and I may also be saying it very badly).

I'm talking about objects which you need to `change', in the sense
that given some object x you want an object y which is like it but not
quite.

If the objects have mutation operations on them, then I have the
option of altering x in place.  If they don't, I need to make a whole
new object which is suitably different from the original object.

Numbers do not have mutation operations in CL.  Bit-vectors do.

If the objects in question are heap-allocated (like bignums are
generally), then if there are no mutation operations, I must consume
heap to do this change operation.

If the objects are *not* heap allocated (like fixnums generally), then
I do not need to consume heap to do this, even when there are no
mutation operations.

However, it might be the case that smart compilers can avoid this heap
allocation in certain special cases, namely when they can prove that
the original object is now unreachable, and thus reuse its storage.

As I said, I would be reasonably surprised if any compilers do this
for general bignums.  Some compilers can do something which looks
similar, which is to represent some small bignums immediately inside
functions.  But that's not the same trick really.  (CMUCL can do this
for 32bit numbers and also for complexes I think.)  However I'm not an
implementor and I could easily be wrong.

Another way you can avoid the heap allocation would be by
stack-allocating the objects instead of heap-allocating them.
Some compilers definitely support this for some types; I don't know
about bignums, though.
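
Something like this, say, for a bit-vector (whether the implementation
actually puts V on the stack is entirely up to it):

(defun scratch-bits ()
  (let ((v (make-array 64 :element-type 'bit :initial-element 0)))
    (declare (dynamic-extent v))  ; permission to stack-allocate V
    (setf (sbit v 3) 1)
    (count 1 v)))                 ; V must not escape this function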

--tim
From: Robert Monfera
Subject: Re: Byte Manipulation
Date: 
Message-ID: <38B2EF5A.347141A1@fisec.com>
Robert Munyer wrote:

> Bit-vectors have only one truly fundamental advantage (i.e.
> something that integers can't do at all).  That's the ability to
> have a fixed length that's independent of the bits it contains.
> For example, if I just give you the number 60, you don't have any
> way to know if I meant it as an 8-bit number, or a 19-bit number,
> or whatever.  But if I give you the bit-vector #*00111100, you know
> that I meant it as a vector of length 8.  To get the same effect
> with integers, you'd have to create a data structure (perhaps a
> cons) and store the length as a separate small integer.

Or you could use an integer and adopt the convention of a leading
sentinel 1 bit just above the data bits, which has obvious advantages
and disadvantages.
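
A minimal sketch of that convention, reusing the 8-bit 60 from above:

(logior (ash 1 8) 60)      ; => 316, i.e. #b100111100 -- sentinel + data
(1- (integer-length 316))  ; => 8, the width is recoverable
(ldb (byte 8 0) 316)       ; => 60, and so is the value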

Another thing you can do nicely with bit-vectors is extracting
arbitrary-length chunks at arbitrary positions, which is useful for
compression schemes, for example.  (It can be done easily with
arbitrary-length bignums and masking, but it wouldn't be too efficient
:-)
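
For example (note that LDB counts from the least-significant bit, so
the positions are mirrored relative to the printed vector):

(subseq #*00111100 2 7)      ; => #*11110, one sequence call
(ldb (byte 5 1) #b00111100)  ; => 30, i.e. #b11110, via masking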

Robert