From: Paul Wilson
Subject: heaps of numbers (tagged immediate float formats & float usage)
Date: 
Message-ID: <1990Mar5.220724.3718@Neon.Stanford.EDU>
[ continuing a thread from comp.arch ]

FLAME-RETARDANT NOTE:

Despite some positive and (very) negative responses about non-IEEE 
low-precision floats, that's not really what I'm talking about here.  
I'm *not* sacrificing any precision or range, I'm just special-casing 
the representation and handling of the most common case, for performance
reasons.

>-->    [ I wrote: ]
>--> I'm looking for info that would be useful in implementing short
>--> floating point numbers for Scheme/Lisp/Smalltalk/Icon etc.
>--> 
>--> The basic idea is that I want short (1-word) floating point numbers to
>--> fit in one 32-bit word, *with* a runtime tag included.  My plan is
>--> [...shoehorning floats within the usual range into one word, and heap-
>-->     allocating those that won't fit into the 1-word format... ]
>--> Of course, this means I have to sacrifice a few bits of a standard 32-bit
>--> float to make it fit.
>
>Standard 32-bit floats have 1 sign bit, 8 exponent bits and 23 (+1) bits of
>mantissa.  This gives an exponent range of 10**-38 to 10**+38, and a
>precision of about 7 decimal digits.
>
>For each bit of exponent you drop you cut the range in half -- 4 bit exponents
>only leave a range of .004 to 511.  Each four bits of mantissa you drop will
>lose you slightly more than 1 decimal digit.

I don't get this.  It seems to me that if 8 bits gives you a range of 10**-38
to 10**+38, then 4 bits ought to give you half as many orders of magnitude,
or about 10**-19 to 10**+19.  That's a LOT smaller/bigger than just
.004 to 511.  Maybe you lose a little because of the sign bit or something,
but I'd guess you could express the large majority of numbers that occur
in *most* programs.  (There would certainly be a significant minority of
programs for which this would break down, though, as a couple of astro guys
have pointed out to me.)

>Furthermore, anything you do will mean that the FP hardware won't work with
>your number format.  And, no, you couldn't fake it, truncate it or trick it
>to get "approximately right" results from a chain of calculations "most of
>the time".  Your users would shoot you.

I'm not planning to use approximations, ever, except in the sense that IEEE
formats are themselves approximations.  (In particular, I wouldn't try to
truncate mantissas _at_all_.) I'm just doing a little compression
of common ranges of data.  As with Lisp "infinite precision"
integers, numbers that wouldn't fit into the shoehorned
one-word format would be represented differently -- as pointers to
full-sized objects (with headers added) on the heap.  They could overflow
into doubles if they need the extra range.  You could also use doubles
for your temporaries to keep from losing any precision or causing unnecessary
over/underflows, and then squish things down to short floats before storing
them into the heap (iff they'll fit).
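
To make the shoehorning concrete, here is an untested C sketch of one
possible packing.  The layout is invented purely for illustration (1 sign
bit, a 6-bit excess-31 exponent, the full 23-bit mantissa, and a 2-bit tag
in the low bits) -- not a claim about what my implementation will do:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Invented layout: 1 sign bit | 6-bit excess-31 exponent |
   23-bit mantissa | 2-bit tag.  The tag value is arbitrary. */
#define FLOAT_TAG 0x2u

/* Try to squeeze an IEEE single into a tagged word.  Returns 1 on
   success, 0 when the number has to go to the heap instead. */
int pack_short_float(float f, uint32_t *out)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* view the IEEE bit pattern */

    if (bits == 0) {                    /* +0.0 packs trivially      */
        *out = FLOAT_TAG;
        return 1;
    }

    /* Re-bias the exponent from excess-127 to excess-31.  Only 1..62
       fit (0 is reserved for zero); denormals, infinities, NaNs, and
       out-of-range exponents all fall through to the heap case. */
    int32_t e = (int32_t)((bits >> 23) & 0xFFu) - 127 + 31;
    if (e < 1 || e > 62)
        return 0;

    *out = (bits & 0x80000000u)         /* sign                      */
         | ((uint32_t)e << 25)          /* squeezed exponent         */
         | ((bits & 0x7FFFFFu) << 2)    /* full mantissa             */
         | FLOAT_TAG;
    return 1;
}

float unpack_short_float(uint32_t w)
{
    if ((w & ~3u) == 0)                 /* the packed +0.0           */
        return 0.0f;
    uint32_t bits = (w & 0x80000000u)                      /* sign   */
                  | ((((w >> 25) & 0x3Fu) + 127 - 31) << 23)
                  | ((w >> 2) & 0x7FFFFFu);             /* mantissa  */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main(void)                          /* round-trip a few values   */
{
    float tests[] = { 0.0f, 1.0f, -3.75f, 1e9f, 1e-9f, 1e20f };
    for (int i = 0; i < 6; i++) {
        uint32_t w;
        if (pack_short_float(tests[i], &w))
            printf("%g -> %08X -> %g\n", (double)tests[i],
                   (unsigned)w, (double)unpack_short_float(w));
        else
            printf("%g -> heap float\n", (double)tests[i]);
    }
    return 0;
}

Note that nothing is approximated: a float either round-trips exactly
through the short format or gets heap-allocated at full width.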

The idea is to make the shoehorned format transparent to the user.  The
potential hitch is that if numbers frequently overflow, you get a
performance hit.  (You not only have to spend the space and time of
allocating several words on the heap, but with a generational gc you 
have to do at least a handful more instructions, to check whether you're 
creating a pointer into a younger generation than the one you're storing into.  
Nonpointer objects are therefore much nicer.)
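
For concreteness, the extra work on a pointer store looks roughly like
this untested C sketch (the two-generation test and the remembered set
are invented stand-ins, not any particular collector's):

#include <stdint.h>
#include <stddef.h>

typedef uintptr_t lispobj;

#define TAG_MASK 0x3u
#define PTR_TAG  0x0u          /* invented: low tag 00 means "pointer" */

static uintptr_t young_start, young_end;   /* young-generation bounds  */
static lispobj  *remembered[1024];         /* toy remembered set       */
static size_t    n_remembered;

static int is_young(uintptr_t addr)
{
    return addr >= young_start && addr < young_end;
}

/* Store `value` into `*slot`.  Immediates -- shoehorned floats
   included -- skip the barrier entirely; only a pointer stored from
   an old object into a young one must be remembered for the
   scavenger. */
void write_slot(lispobj *slot, lispobj value)
{
    *slot = value;
    if ((value & TAG_MASK) != PTR_TAG)
        return;                 /* nonpointer: no extra work at all */
    if (is_young(value) && !is_young((uintptr_t)slot))
        remembered[n_remembered++] = slot;
}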

My users may kill me for the performance hit, if they have programs that
allocate lots of floats that end up on the heap.  (This is the bane of
generation scavenging.)  In particular, non-temporary double floats are 
always going to have to be allocated on the heap.  I've got ideas for 
dealing with them, too, but that would require a better compiler than I
feel like writing at the moment, or the use of ugly declarations.

It's been pointed out to me that a lot of crunch-intensive programs use
normalization for really big/small numbers to keep from overflowing the IEEE 
float format.  I'm wondering whether this would work to keep things within a 
4-bit exponent range.  If you're normalizing anyway, do you lose big by
being constrained to an even smaller range?  (If necessary, I can sacrifice
as few as two bits of my floats, but four would be easier.  Assuming 8-bit
exponents are standard, that leaves 4 to 6 bits before overflowing the 
shoehorned format.)

If this worked out well in practice, a guideline "small float"
format could be established, and libraries could be written that were
careful to avoid this performance hit.  (Implementations that didn't use
immediate floats -- or whose immediate floats didn't have enough range -- 
could still run the same code _and_get_the_exact_same_results_, but might
not perform as well.  You could translate your FORTRAN programs to Scheme,
without losing big because of precision problems.)

The question is this -- how many bits should be in the "minimal" range
for immediate floats?  It would be nice to leave several bits for the
tag scheme, so that you could distinguish between more immediate types
(e.g., you might also have immediate complex numbers and ratios, whose
components were two 13-bit ints packed into the word with a 6-bit tag.)
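
E.g., an untested C sketch of the immediate-ratio packing (the tag value
and the field order are invented):

#include <stdint.h>

#define RATIO_TAG 0x2Du        /* invented 6-bit tag value */

/* Pack numerator/denominator into bits 19-31 and 6-18; returns 0 if
   either is outside the signed 13-bit range -4096..4095 (in which
   case a heap-allocated ratio is needed instead). */
int pack_ratio(int32_t num, int32_t den, uint32_t *out)
{
    if (num < -4096 || num > 4095 || den < -4096 || den > 4095)
        return 0;
    *out = ((uint32_t)(num & 0x1FFF) << 19)
         | ((uint32_t)(den & 0x1FFF) << 6)
         | RATIO_TAG;
    return 1;
}

/* Sign-extend a 13-bit field (relies on arithmetic right shift of
   signed ints, which every compiler I know of provides). */
static int32_t sext13(uint32_t v)
{
    return (int32_t)(v << 19) >> 19;
}

void unpack_ratio(uint32_t w, int32_t *num, int32_t *den)
{
    *num = sext13((w >> 19) & 0x1FFFu);
    *den = sext13((w >> 6) & 0x1FFFu);
}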

John Levine sez that the VAX has a tiny little immediate float format,
with a three-bit mantissa and three-bit exponent, which suffices for a lot
of small constants.  (e.g., 1, 15, 1/2, 3/4)  He suggests similarly optimizing
a few special cases of heap floats (perhaps just zero).  Apparently these 
little VAX floats account for a big fraction of compiled constants.  (John 
also suggested a flat table-lookup technique for packing and unpacking 
floats, combining the tag checks & range overflow checks, and even
the checks for special cases.)
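
One way to read the table-lookup idea, as an untested C sketch (it reuses
the invented excess-31 layout from the sketch above):

#include <stdint.h>
#include <string.h>

#define NO_FIT 0xFFu

static uint8_t exp_table[256];    /* biased-127 exp -> biased-31 exp */

void init_exp_table(void)
{
    for (int e = 0; e < 256; e++) {
        int re = e - 127 + 31;    /* re-bias from excess-127          */
        exp_table[e] = (re >= 1 && re <= 62) ? (uint8_t)re : NO_FIT;
    }
}

/* The fit test and the exponent translation are now one indexed
   load; zero is handled as its own special case, as suggested. */
int pack_with_table(float f, uint32_t *out)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if (bits == 0) { *out = 0x2u; return 1; }  /* +0.0: tag alone   */
    uint8_t re = exp_table[(bits >> 23) & 0xFFu];
    if (re == NO_FIT)
        return 0;                              /* heap-allocate     */
    *out = (bits & 0x80000000u)                /* sign              */
         | ((uint32_t)re << 25)                /* 6-bit exponent    */
         | ((bits & 0x7FFFFFu) << 2)           /* mantissa          */
         | 0x2u;                               /* 2-bit tag         */
    return 1;
}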

Anybody got any *dynamic* float distribution data?  I'm more worried 
about what gets stored into heap objects than what occurs in arithmetic 
expressions.  (I assume a decent compiler will just use normal 
representations for intermediate results that the gc never 
sees, and that compile-time literals can be stored somewhere in standard 
IEEE format.)  I assume that the dynamic distribution will not be nearly 
as friendly as the static one, but I'd guess that it's still pretty modal. 

Actually, I'd be interested in any kind of empirical information on float
frequencies -- static, quasi-static, or dynamic.  (By "static" I mean 
compile-time, by "dynamic" I mean what gets operated on the most, and by
"quasi-static" I mean what the distribution of floats _in_memory_ is if 
you make a snapshot of memory at random intervals.)  Data for 
languages like C and FORTRAN is fine, since I wouldn't want to assume
people would keep writing the same kinds of programs in Lisp/Smalltalk,
etc. that they do now.


Thanks,

   Paul


Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   ······@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 

From: Jim Giles
Subject: Re: heaps of numbers (tagged immediate float formats & float usage)
Date: 
Message-ID: <14261@lambda.UUCP>
From article <····················@Neon.Stanford.EDU>, by ······@carcoar.Stanford.EDU (Paul Wilson):
> [...]
> I don't get this.  It seems to me that if 8 bits gives you a range of 10**-38
> to 10**+38, then 4 bits ought to give you half as many orders of magnitude,
> or about 10**-19 to 10**+19.  [...]

The number of bits in the exponent scales as the _LOG_ (base 2) of the
exponent range.  So, 8 bits gives an exponent range of (about) 2^-127 thru
2^127 - that is, about 10^-39 thru 10^39.  A 4 bit exponent only gives
an exponent range of about 2^-15 thru 2^15 - that is, a little more than
10^-5 thru 10^5.

J. Giles
From: Paul Wilson
Subject: CORRECTION to: heaps of numbers (tagged immediate floats)
Date: 
Message-ID: <1990Mar6.015230.20068@Neon.Stanford.EDU>
Thanks to Carl Lowenstein, Jim Giles and Herman Rubin for pointing out
my misunderstanding of the IEEE float format.  I had not realized that
the exponent is really interpreted with a double exponentiation -- that
changes things a bit.  (Or maybe two bits.)

(I seem to recall that the only floating point format I ever had to learn
used a power of a fixed number, and I thought IEEE would be the same.)

So it looks like each marginal bit would be more important than I thought,
favoring stealing _very_few_ bits from the exponent. 2**(2**4) would
only give you a range of 1/32K to 32K.  But 2**(2**5) would give you
a range from 1/2nano- to 2giga- which seems pretty reasonable.  And 2**(2**6)
goes from pretty seriously small to pretty seriously big (by my standards).

I interpret this to mean that I can steal 2 bits from the 8-bit exponent, 
leaving 6, or maybe 3, leaving 5.

It would be awkward to use fewer than two bits for primary tags, given that
you probably want separate tags for pointers, immediate ints, and other
immediates.  So it looks like the question comes down to this:  is the
1/2nano- to 2giga- range enough for the large majority of floats that
get stored into memory, or should I go with the less convenient scheme 
of only stealing 2 bits?  In the latter case I'd have to use up one of our 
four primary (2-bit low-) tags, but we could live with it.

Any empirical info relevant to this tradeoff would be greatly appreciated.

And if I've got it wrong somehow, please point it out.  (As I see it, my
real assumption is this:  the upper 2 or 3 non-sign bits of an IEEE short
are usually zero.  If I've misunderstood FP representation or
distributions, and this is not true, let me know.)

   -- Paul

Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   ······@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 
From: Jeff Kenton
Subject: Re: CORRECTION to: heaps of numbers (tagged immediate floats)
Date: 
Message-ID: <11313@encore.Encore.COM>
From article <·····················@Neon.Stanford.EDU>, by ······@carcoar.Stanford.EDU (Paul Wilson):
> 
> And if I've got it wrong somehow, please point it out.  (As I see it, my
> real assumption is this:  the upper 2 or 3 non-sign bits of an IEEE short
> are usually zero.  If I've misunderstood FP representation or
> distributions, and this is not true, let me know.)
> 

Unfortunately for your purposes, the exponent is "biased" by 127.  This means
that the representation of normalized exponents runs from 1 to 254 (smallest
to largest; 0 and 255 are reserved for zeros/denormals and infinities/NaNs).
All floats >= 2.0 have the high bit of the exponent set.  Yet another gotcha.
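
You can see the bias directly by dumping the exponent fields of a few
everyday values (quick C hack):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print the exponent byte and the top three non-sign bits of some
   ordinary floats -- the bias puts them nowhere near zero. */
int main(void)
{
    float samples[] = { 0.001f, 0.5f, 1.0f, 2.0f, 100.0f, 1e9f };
    for (int i = 0; i < 6; i++) {
        uint32_t bits;
        memcpy(&bits, &samples[i], sizeof bits);
        printf("%10g   exponent byte %3u   top 3 non-sign bits %u%u%u\n",
               (double)samples[i],
               (unsigned)((bits >> 23) & 0xFFu),
               (unsigned)((bits >> 30) & 1u),
               (unsigned)((bits >> 29) & 1u),
               (unsigned)((bits >> 28) & 1u));
    }
    return 0;
}

For these magnitudes the top three non-sign bits come out 011 or 100 --
never 000.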

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
      jeff kenton  ---	temporarily at ·······@pinocchio.encore.com	 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
From: Tim Moore
Subject: Re: CORRECTION to: heaps of numbers (tagged immediate floats)
Date: 
Message-ID: <1990Mar5.221226.4841@hellgate.utah.edu>
In article <·····················@Neon.Stanford.EDU> ······@carcoar.Stanford.EDU (Paul Wilson) writes:

>So it looks like each marginal bit would be more important than I thought,
>favoring stealing _very_few_ bits from the exponent. 2**(2**4) would
>only give you a range of 1/32K to 32K.  But 2**(2**5) would give you
>a range from 1/2nano- to 2giga- which seems pretty reasonable.  And 2**(2**6)
>goes from pretty seriously small to pretty seriously big (by my standards).
>
>I interpret this to mean that I can steal 2 bits from the 8-bit exponent, 
>leaving 6, or maybe 3, leaving 5.
>
>It would be awkward to use less than two bits for primary tags, given that
>you probably want separate tags for pointers, immediate ints, and other
>immediates.  So it looks like the question comes down to this:  is the
>1/2nano- to 2giga- range enough for the large majority of floats that
>get stored into memory, or should I go with the less convenient scheme 
>of only stealing 2 bits?  In the latter case I'd have to use up one of our 
>four primary (2-bit low-) tags, but we could live with it.
>

I've been thinking about implementing short floats for Utah Common
Lisp. I think it would be better to steal bits from the mantissa than
the exponent. With our low tags scheme, a simple mask operation would
restore a short float to an IEEE 32-bit float with the least
significant bits of the mantissa zeroed, allowing the floating point
hardware to do its thing.
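
Roughly this, as an untested sketch (the tag value is arbitrary):

#include <stdint.h>
#include <string.h>

#define SFLOAT_TAG 0x3u        /* invented 2-bit tag value */

/* Tagging: the two low mantissa bits are simply overwritten by the
   tag, so packing can never overflow -- you pay in precision (2 bits
   of mantissa) instead of range. */
uint32_t tag_short_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bits & ~3u) | SFLOAT_TAG;
}

/* Untagging: one mask and the word is a valid IEEE single again,
   ready for the FP hardware. */
float untag_short_float(uint32_t w)
{
    uint32_t bits = w & ~3u;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}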

The original post by Paul Wilson discussed the problem of deciding
when to use short floats vs. when to use long floats to avoid losing
precision. This isn't an issue in Common Lisp (I'm reading this in
comp.lang.lisp); the rules of floating point contagion (promotion) are
defined by the language, and the user can choose the reader's default
float type. 

Tim Moore                     ·····@cs.utah.edu {bellcore,hplabs}!utah-cs!moore
"Ah, youth. Ah, statute of limitations."
		-John Waters
From: ········@gorgo.ifi.unizh.ch
Subject: Re: CORRECTION to: heaps of numbers (tagged immediate floats)
Date: 
Message-ID: <1990Mar7.165256.28867@gorgo.ifi.unizh.ch>
···@lambda.UUCP (Jim Giles) writes:
> ...A 4 bit exponent only gives an exponent range of about 2^-15 thru
> 2^15 - that is, a little more than 10^-5 thru 10^5.

······@carcoar.Stanford.EDU (Paul Wilson) writes:
> Thanks to Carl Lowenstein, Jim Giles and Herman Rubin for pointing out
> my misunderstanding of the IEEE float format.  I had not realized that
> the exponent is really interpreted with a double exponentiation -- that
> changes things a bit.  (Or maybe two bits.)
>
> (I seem to recall that the only floating point format I ever had to learn
> used a power of a fixed number, and I thought IEEE would be the same.)
>
> So it looks like each marginal bit would be more important than I thought,
> favoring stealing _very_few_ bits from the exponent. 2**(2**4) would
> only give you a range of 1/32K to 32K.  But 2**(2**5) would give you
> a range from 1/2nano- to 2giga- which seems pretty reasonable.  And 2**(2**6)
> goes from pretty seriously small to pretty seriously big (by my standards).
>
> I interpret this to mean that I can steal 2 bits from the 8-bit exponent,
> leaving 6, or maybe 3, leaving 5.

····@tygra.UUCP (David Conrad) writes:
> Well, he said he could use as little as 2 bits, so with 6 bits of
> exponent the range should be 2^63 thru 2^-63 or roughly 10^19 thru 10^-19,
> probably sufficient for his purposes.

Please, folks.  It's so simple.

- With four bits you can represent sixteen values, e.g. -8 to 7.

- An exponent between -8 and 7, together with a mantissa between 0.5 and
  (almost) 1, covers the range 0.5 * 2^-8 to (almost) 2^7. (This assumes
  you don't represent denormalized numbers; they don't make sense for the
  application being discussed.)

- Alternatively, an exponent between -7 and 8 would cover 0.5 * 2^-7 to
  (almost) 2^8. You can fine-tune the range by defining how you interpret
  the sixteen four-bit-combinations.

So, five bits gives approximately 1/64K to 64K. Six bits (not five) gives
the nano to giga range. Seven bits gives approximately 10^-19 to 10^19.
And eight bits approximately 10^-38 to 10^38, as we all know.

Besides, there is no "double exponentiation" involved. The constant "two"
is simply raised to the power indicated by the exponent. But having one
more exponent bit gives you twice as many possible exponent values: The
exponent range is DOUBLED, which means the floating-point number range is
SQUARED.
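
If you want to check the arithmetic, this little C program prints the
ranges (the symmetric split of exponent values around zero is just one
choice, per the fine-tuning remark above):

#include <stdio.h>
#include <math.h>

int main(void)
{
    for (int n = 4; n <= 8; n++) {
        int half = 1 << (n - 1);            /* 2^(n-1) exponent values
                                               on each side of zero   */
        double lo = 0.5 * pow(2.0, -half);  /* 0.5 * 2^-2^(n-1)       */
        double hi = pow(2.0, half - 1);     /* (almost) 2^(2^(n-1)-1) */
        printf("%d exponent bits: about %.3g to %.3g\n", n, lo, hi);
    }
    return 0;
}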

But I like Paul Wilson's idea of special-casing the common values.

---
Daniel Schaerer, University of Zurich/Switzerland
········@ifi.unizh.ch
From: David Conrad
Subject: Re: heaps of numbers (tagged immediate float formats & float usage)
Date: 
Message-ID: <74@tygra.UUCP>
>
>The number of bits in the exponent scales as the _LOG_ (base 2) of the
>exponent range.  So, 8 bits gives an exponent range of (about) 2^-127 thru
>2^127 - that is, about 10^-39 thru 10^39.  A 4 bit exponent only gives
>an exponent range of about 2^-15 thru 2^15 - that is, a little more than
>10^-5 thru 10^5.
>
>J. Giles

Well, he said he could sacrifice as few as 2 bits, so with 6 bits of
exponent the range should be 2^63 thru 2^-63 or roughly 10^19 thru 10^-19,
probably sufficient for his purposes.
--
David R. Conrad
·················@um.cc.umich.edu
···············@sharkey.cc.umich.edu

-- 
=  CAT-TALK Conferencing Network, Prototype Computer Conferencing System  =
-  1-800-825-3069, 300/1200/2400/9600 baud, 8/N/1. New users use 'new'    - 
=  as a login id.    <<Redistribution to GEnie PROHIBITED!!!>>>           =
E-MAIL Address: ···············@sharkey.cc.umich.edu