why are tags and data packed together?

From: ···········@my-deja.com
Subject: why are tags and data packed together?
Date: Mon, 15 Nov 1999 00:00:00 +0000
Message-ID: <80nj8e$93$1@nnrp1.deja.com>

Why do modern lisp compiler implementations tend to pack data and tag
into a single 32 bit value?  For example in my implementation a fixnum
is often encoded as bits [31:2], and bits [0:1] are busy saying "I'm a
fixnum".

Why don't compilers store the tag bits in a seperate place, leaving all
32 bits for data, and thus reducing code on multiplies etc since the
data needn't be unpacked?

It seems like it would be better to store the tags of several data items
in a seperate 32 bit quantity, perhaps up to 8 of them packed into a
single word.


Sent via Deja.com http://www.deja.com/
Before you buy.

Re: why are tags and data packed together? Johan Kullstam
Re: why are tags and data packed together? Tim Bradshaw
- Re: why are tags and data packed together? Fernando D. Mato Mira
Re: why are tags and data packed together? Barry Margolin
- Re: why are tags and data packed together? Roger Corman
  - Re: why are tags and data packed together? Barry Margolin
Re: why are tags and data packed together? Harley Davis
- Re: why are tags and data packed together? Rob Warnock

From: Johan Kullstam
Subject: Re: why are tags and data packed together?
Date: Sun, 14 Nov 1999 00:00:00 +0000
Message-ID: <m2wvrkleg7.fsf@sophia.axel.nom>

···········@my-deja.com writes:

> Why do modern lisp compiler implementations tend to pack data and tag
> into a single 32 bit value?  For example in my implementation a fixnum
> is often encoded as bits [31:2], and bits [0:1] are busy saying "I'm a
> fixnum".

because data always carries its type around with it (except when
declarations and optimizations make it superfluous).  you need the
bits saying "i'm a fixnum".  most lisp vendors/implementors make a
design decision to pack this into a single word rather than using two
words.  99% of the time, this is the right thing.

> Why don't compilers store the tag bits in a seperate place, leaving all
> 32 bits for data, and thus reducing code on multiplies etc since the
> data needn't be unpacked?

the tag bits need to be manipulated (stored, moved, copied &c) with
the data.  manipulating two words is more expensive than manipulating
one.

i look forward to the day when 64 bit processors become inexpensive.
i think this will be a big help to languages with heavy types like
lisp.

> It seems like it would be better to store the tags of several data items
> in a seperate 32 bit quantity, perhaps up to 8 of them packed into a
> single word.

btw CMUCL can do 32 bit integers.  arrays could also have a global
type tag.

-- 
J o h a n  K u l l s t a m
[········@ne.mediaone.net]
Don't Fear the Penguin!

From: Tim Bradshaw
Subject: Re: why are tags and data packed together?
Date: Mon, 15 Nov 1999 00:00:00 +0000
Message-ID: <ey3k8nkgmqw.fsf@lostwithiel.tfeb.org>

* david morse wrote:

> It seems like it would be better to store the tags of several data items
> in a seperate 32 bit quantity, perhaps up to 8 of them packed into a
> single word.

It seems to me that a strategy like this would be problematic for a
few reasons: How do you know where these other tag words *are* for an
arbitrary word?  Once you have that, then you need to do two loads in
the general case to find out if something is a fixnum, say, which is
bad.  You can win if the objects that share a tag word are used at the
same time, so then you need only a small overhead to load the tag word
once; but that's kind of hard in general I expect. Or perhaps (only
for small objects like fixnums) you could have little blocks of tag
word + n numbers, which all fitted in a cache line so they all got
loaded, effectively, anyway.

The other point is that, for the small number of types where the type
information is directly in the word, removing it is usually a mask
and/or shift, which should be incredibly cheap, certainly much cheaper
than a memory access, and perhaps even cheaper than an extra register
full of tag info (though modern machines, other than x86, seem to have
ludicrous numbers of registers!).  For all other types, you have a
separate word of type info anyway.

--tim

From: Fernando D. Mato Mira
Subject: Re: why are tags and data packed together?
Date: Mon, 15 Nov 1999 00:00:00 +0000
Message-ID: <38302514.5BC910C5@iname.com>

Tim Bradshaw wrote:

> The other point is that, for the small number of types where the type
> information is directly in the word, removing it is usually a mask
> and/or shift, which should be incredibly cheap, certainly much cheaper
> than a memory access, and perhaps even cheaper than an extra register

You might also tag fixnums in such a way that addition and substraction are
performed directly w/o extra bit fiddling instructions.

--
Fernando D. Mato Mira
Real-Time SW Eng & Networking
Advanced Systems Engineering Division
CSEM
Jaquet-Droz 1                   email: matomira AT acm DOT org
CH-2007 Neuchatel                 tel:       +41 (32) 720-5157
Switzerland                       FAX:       +41 (32) 720-5720

www.csem.ch      www.vrai.com     ligwww.epfl.ch/matomira.html

From: Barry Margolin
Subject: Re: why are tags and data packed together?
Date: Mon, 15 Nov 1999 00:00:00 +0000
Message-ID: <qZWX3.49$6v5.955@burlma1-snr2>

In article <···········@nnrp1.deja.com>,  <···········@my-deja.com> wrote:
>Why do modern lisp compiler implementations tend to pack data and tag
>into a single 32 bit value?  For example in my implementation a fixnum
>is often encoded as bits [31:2], and bits [0:1] are busy saying "I'm a
>fixnum".
>
>Why don't compilers store the tag bits in a seperate place, leaving all
>32 bits for data, and thus reducing code on multiplies etc since the
>data needn't be unpacked?

In many cases, these bits are free.  Most objects in Lisp are represented
using pointers, and these pointers often point to data blocks that are at
least 4 or 8 bytes long; if you align them on 4- or 8-byte boundaries, you
have several low-order bits that are effectively unused, so they can be
coopted to hold type information.

An alternative scheme used in some Lisp implementations is BIBOP: Big Bag
Of Pages.  In this system, each page of memory is assigned to hold a
particular type of data, and there's a table indicating what each page's
data type is.  So that high-order bits of the pointers are indirect type
tags.  One drawback of this is that it can lead to more memory
fragmentation and paging.  Also, you still have to grab some bits for
immediate data types; however, since they're high-order bits, it's common
to dedicate something like 0000 and 1111 for negative integers, so no
masking or shifting is needed during arithmetic.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Roger Corman
Subject: Re: why are tags and data packed together?
Date: Mon, 15 Nov 1999 00:00:00 +0000
Message-ID: <38307ca7.1111195374@nntp.best.com>

On Mon, 15 Nov 1999 16:59:03 GMT, Barry Margolin
<······@bbnplanet.com> wrote:

>In article <···········@nnrp1.deja.com>,  <···········@my-deja.com> wrote:
>>Why do modern lisp compiler implementations tend to pack data and tag
>>into a single 32 bit value?  For example in my implementation a fixnum
>>is often encoded as bits [31:2], and bits [0:1] are busy saying "I'm a
>>fixnum".
Just to add a little to what Barry and others already pointed out:

If you use the lower two or three unused bits of heap addresses as a
tag, the cost at run time is probably zero, because you normally add
some offset to the pointer dereference anyway, and the extra bits can
be managed entirely at compile time by adjusting the generated code.
On Intel at least, there is no cost associated with having an offset
vs. not having an offset.

As for fixnums, if you choose a tag of zero, in the low bits, for
fixnums, then addtion, subtraction and comparisons all work without
any tag manipulation (either to the operands or to the result).
Multiply and divide each require one shift to one of the operands
prior to the operation, and the result is already tagged. This is
extremely cheap, and you aren't going to do any better any way I could
think of.

The biggest problem, in my opinion, is people's natural idea that
everything should take up 16 or 32 or 64 bits. This manifests itself
in standards that lisps wants to be compatibile with (like IEEE
floats) and makes it tricky to fit tags into those sizes. There is
really no reason ints need to be 32 bits, but a lot of programmers
probably have a gut feeling that they should. This is manifested in
Java, where the language specifies the size of everything (at 16, 32
or 64 bits, of course) making tagged implementations of Java much
trickier to build.

I don't wee any reason that a 64-bit machine fundamentally changes the
issues here, either. You still need tags, and are still not
necessarily going to want to double the size of a piece of data to
have a tag. I picture on a 64-bit machine you would have either 61-bit
fixnums or 29 or 30-bit fixnums, because you would want to use as much
of the memory for data as possible. Having a 32-bit fixnum and 2-bit
tag seems like it is unnecessarily wasting 30 bits. Of course I assume
addresses will be larger than 32-bits, and the same kind of game would
be played with those.

Roger Corman

From: Barry Margolin
Subject: Re: why are tags and data packed together?
Date: Mon, 15 Nov 1999 00:00:00 +0000
Message-ID: <gx0Y3.83$6v5.1937@burlma1-snr2>

In article <···················@nntp.best.com>,
Roger Corman <·····@xippix.com> wrote:
>I don't wee any reason that a 64-bit machine fundamentally changes the
>issues here, either. You still need tags, and are still not
>necessarily going to want to double the size of a piece of data to
>have a tag. I picture on a 64-bit machine you would have either 61-bit
>fixnums or 29 or 30-bit fixnums, because you would want to use as much
>of the memory for data as possible. Having a 32-bit fixnum and 2-bit
>tag seems like it is unnecessarily wasting 30 bits. Of course I assume
>addresses will be larger than 32-bits, and the same kind of game would
>be played with those.

IIRC, Symbolics only ported Open Genera to the Alpha processor because it
was 64 bits, allowing them to fit the LispM tagged words into Alpha 64-bit
words.

While this is quite wasteful of space, 64-bit machines provide time
efficiencies that offset this somewhat.  There are instructions to load
these 64-bit words in one step, instructions to manipulate them in large
registers, and 64-bit data paths that move it all more efficiently around
the processor.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Harley Davis
Subject: Re: why are tags and data packed together?
Date: Mon, 15 Nov 1999 00:00:00 +0000
Message-ID: <38306ea5$0$231@newsreader.alink.net>

<···········@my-deja.com> wrote in message
················@nnrp1.deja.com...
> Why do modern lisp compiler implementations tend to pack data and tag
> into a single 32 bit value?  For example in my implementation a fixnum
> is often encoded as bits [31:2], and bits [0:1] are busy saying "I'm a
> fixnum".
>
> Why don't compilers store the tag bits in a seperate place, leaving all
> 32 bits for data, and thus reducing code on multiplies etc since the
> data needn't be unpacked?
>
> It seems like it would be better to store the tags of several data items
> in a seperate 32 bit quantity, perhaps up to 8 of them packed into a
> single word.

John Rose's Scheme implementation "esh" actually did this by passing an
extra type arg with all arguments passed around and stored.  It was cool
because it let all 32 bits be used for value/pointers which aided his goal
of simplifying interoperability with C, but it did require 2x the space for
argument passing and introduced the obvious complexities in maintaining the
link between objects and their types.  Still, I think it was an interesting
tradeoff.

-- Harley

From: Rob Warnock
Subject: Re: why are tags and data packed together?
Date: Tue, 16 Nov 1999 00:00:00 +0000
Message-ID: <80qlgv$r0t9@fido.engr.sgi.com>

Harley Davis <·············@nospam.museprime.com> wrote:
+---------------
| <···········@my-deja.com> wrote"
| > Why do modern lisp compiler implementations tend to pack data and tag
| > into a single 32 bit value?
| 
| John Rose's Scheme implementation "esh" actually did this by passing an
| extra type arg with all arguments passed around and stored.
+---------------

"Elk", as distributed[*],  does this, too.

David, you've gotten lots of good answers, but if you *really* want a
heavy dose of this stuff, be sure to read:

  Gudeman. "Representing Type Information in Dynamically Typed Languages"
  <URL:ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/typeinfo.ps.gz>

It goes into a large number of variations on how to "wrap" native machine
values into dynamically-typed values and vice-versa, including low-level
efficiency tradeoffs between them. E.g., Gudeman refers to the "esh"/"Elk"
method as "double wrappers", whereas SCM, for example, uses "multi-stage
tagging".


-Rob

[*] Just for fun (and as a basis for another project), I hacked up Elk-3.0
to *not* use double wrappers, but a more classic tagged-pointer scheme.
Didn't seem to affect runtime much, though it *definitely* lowered memory
consumption. Changes to garbage collection seemed to have much more impact...

-----
Rob Warnock, 8L-846		····@sgi.com
Applied Networking		http://reality.sgi.com/rpw3/
Silicon Graphics, Inc.		Phone: 650-933-1673
1600 Amphitheatre Pkwy.		FAX: 650-933-0511
Mountain View, CA  94043	PP-ASEL-IA