Internal representation of fixnums

From: An Airplane
Subject: Internal representation of fixnums
Date: Thu, 09 Dec 1999 00:00:00 +0000
Message-ID: <385039B7.629DED4E@vatican.va>

i was looking at the compiled code produced by MCL, and i noticed something i don't quite
understand. Could someone explain this?

i defined a useless function:

(defun kl ()
    666)

and disassembled it, which produced this:

  (TWNEI NARGS 0)
  (MFLR LOC-PC)
  (BLA .SPSAVECONTEXTVSP)
  (LWZ NARGS 331 RNIL)
  (TWGTI NARGS 0)
  (LI ARG_Z '666)
  (BA .SPPOPJ)

when i disassembled it in MacsBug (low-level debugger) i got this:

            03D704C4   twnei      r8,0x0000                               | 0F080000
            03D704C8   mflr       r15                       ; LR = 0x0008 | 7DE802A6
            03D704CC   bla        0x00F09758                              | 48F0975B
            03D704D0   lwz        r8,0x014B(RTOC)                         | 8102014B
            03D704D4   twgti      r8,0x0000                               | 0D080000
            03D704D8   li         r23,0x0A68                              | 3AE00A68
            03D704DC   ba         0x00F09750                              | 48F09752

in the arguments to the li, 0x0A68 is (* 666 4). why is the number four times larger?

also, isn't that a lot of code just to return a fixnum?

Re: Internal representation of fixnums Barry Margolin
- Re: Internal representation of fixnums Robert Monfera
  - Re: Internal representation of fixnums Duane Rettig
    - Re: Internal representation of fixnums Robert Monfera
      - Re: Internal representation of fixnums Frode Vatvedt Fjeld
        Re: Internal representation of fixnums Tim Bradshaw
        Re: Internal representation of fixnums Robert Monfera
        Re: Internal representation of fixnums Tim Bradshaw
    - Re: Internal representation of fixnums Pekka P. Pirinen
- Re: Internal representation of fixnums Duane Rettig

From: Barry Margolin
Subject: Re: Internal representation of fixnums
Date: Thu, 09 Dec 1999 00:00:00 +0000
Message-ID: <b6X34.35$c72.2123@burlma1-snr2>

In article <·················@vatican.va>,
An Airplane  <····@vatican.va> wrote:
>in the arguments to the li, 0x0A68 is (* 666 4). why is the number four
>times larger?

A common way to represent Lisp data types is to use the low-order 2 bits
for the type tags.  All the boxed types are big enough that they need at
least 4-byte alignment, so you can just mask off these bits to get a
pointer to the object.  The type bits 00 and 11 are used for positive and
negative fixnums, respectively; this allows you to do addition and
subtraction of them without having to mask off the type tags.  But it also
means that the numbers are shifted up by two bits, which is the same as
multiplication by 4.

>also, isn't that a lot of code just to return a fixnum?

I think most of it was checking that you supplied the correct number of
arguments.  Try recompiling it with safety 0 and speed 3 to see if that
stuff is left out.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Robert Monfera
Subject: Re: Internal representation of fixnums
Date: Fri, 10 Dec 1999 00:00:00 +0000
Message-ID: <385120F8.9DFFED8E@fisec.com>

Barry Margolin wrote:

> In article <·················@vatican.va>,
> An Airplane  <····@vatican.va> wrote:
> >in the arguments to the li, 0x0A68 is (* 666 4). why is the number four
> >times larger?
>
> A common way to represent Lisp data types is to use the low-order 2 bits
> for the type tags.

In LispWorks, it is 8 bits, which leaves you with 24-bit fixna.

Robert

From: Duane Rettig
Subject: Re: Internal representation of fixnums
Date: Fri, 10 Dec 1999 00:00:00 +0000
Message-ID: <44sdqagkp.fsf@beta.franz.com>

Robert Monfera <·······@fisec.com> writes:

> Barry Margolin wrote:
> 
> > In article <·················@vatican.va>,
> > An Airplane  <····@vatican.va> wrote:
> > >in the arguments to the li, 0x0A68 is (* 666 4). why is the number four
> > >times larger?
> >
> > A common way to represent Lisp data types is to use the low-order 2 bits
> > for the type tags.
> 
> In LispWorks, it is 8 bits, which leaves you with 24-bit fixna.

Someone from Harlequin will have to comment, but I do not think
that this is correct.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)

From: Robert Monfera
Subject: Re: Internal representation of fixnums
Date: Fri, 10 Dec 1999 00:00:00 +0000
Message-ID: <38514255.B902FE14@fisec.com>

Duane Rettig wrote:
>
> Robert Monfera <·······@fisec.com> writes:
>
> > Barry Margolin wrote:
> >
> > > In article <·················@vatican.va>,
> > > An Airplane  <····@vatican.va> wrote:
> > > >in the arguments to the li, 0x0A68 is (* 666 4). why is the number four
> > > >times larger?
> > >
> > > A common way to represent Lisp data types is to use the low-order 2 bits
> > > for the type tags.
> >
> > In LispWorks, it is 8 bits, which leaves you with 24-bit fixna.
>
> Someone from Harlequin will have to comment, but I do not think
> that this is correct.

In general, I have a hard time finding information on how types and
objects in CL are represented, (un)boxed, and what their space occupancy
is.  I use maximum-sized arrays that mostly hold fixnum values, and some
5% bignums, that's why I looked this.  I thought at least I understood
this one:

CL-USER 79 > most-positive-fixnum
8388607

CL-USER 80 > (1+ (log (1+ most-positive-fixnum) 2))
24.0

Is it not an implication that fixnums are 24-bit values (of which one is
sign)?
To give a fuller picture though, when using declarations, it will use
(signed-byte 32) - but I can just as well declare signed-byte or
unsigned-byte myself.

Any clarification is appreciated.

Robert

From: Frode Vatvedt Fjeld
Subject: Re: Internal representation of fixnums
Date: Sun, 12 Dec 1999 00:00:00 +0000
Message-ID: <2hhfhoj6m6.fsf@dslab7.cs.uit.no>

Robert Monfera <·······@fisec.com> writes:

> To give a fuller picture though, when using declarations, it will
> use (signed-byte 32) - but I can just as well declare signed-byte or
> unsigned-byte myself.

My understanding of lisp integer types is that an integer object is
("is" as in "stored as".. [is there a lisp term for this?]) either a
fixnum or a bignum. Declaring a variable to be for example
(unsigned-byte 32) will not buy you anything (performance-wise),
because this will be upgraded to bignum when the lisp implementation's
fixnum type is less than 32 bits.

The (sole?) exception to this rule is when you have specialized
arrays, where all elements are of the same type, and the elements'
type information is (may be) stored only in the array object (not the
element objects). In such cases, implementations are typically able to
store (unsigned-byte 32)-objects as "true" 32-bits objects, for
example. This is what UPGRADED-ARRAY-ELEMENT-TYPE will inform you
about. But when such elements are extracted by AREF, they are
immediately coerced to fixnums or bignums ("boxed"), according to
value (although when AREF is used to designate a place, an optimizing
compiler may be able to skip the boxing).

> I use maximum-sized arrays that mostly hold fixnum values, and some
> 5% bignums, that's why I looked this.

If the values of those 5% all fit in 32 bits, a specialized array of
element-type (unsigned-byte 32) would require a little bit less
storage than a general array (one word less per bignum), at the cost
of the array not being able to hold elemets with values outside the
32-bit range.

But all you really need to do (or know) is to inform lisp about the
range of values (integer <min-value> <max-value>) for the
:element-type, and lisp will figure out for itself if it can use a
specialized array or not.

-- 
Frode Vatvedt Fjeld

From: Tim Bradshaw
Subject: Re: Internal representation of fixnums
Date: Sun, 12 Dec 1999 00:00:00 +0000
Message-ID: <ey3d7sbsx0t.fsf@cley.com>

* Frode Vatvedt Fjeld wrote:

> My understanding of lisp integer types is that an integer object is
> ("is" as in "stored as".. [is there a lisp term for this?]) either a
> fixnum or a bignum. Declaring a variable to be for example
> (unsigned-byte 32) will not buy you anything (performance-wise),
> because this will be upgraded to bignum when the lisp implementation's
> fixnum type is less than 32 bits.

It can buy you something in several cases.  for a start, within a
function a Lisp system may be able to represent things without tags,
and thus may be able to deal with bigger numbers with immediate
representations: CMUCL can do this.  Secondly if you declare something
unsigned it might be able to deduce useful things about it (for
instance if it's used as an array index, or if you take SQRT of it).
Thirdly, in a slightly variant case, where you are able to declare
two types which are substantially smaller than a fixnum, then the
system may be able to deduce that combinations of them will still fit
in a fixnum and thus compile better code -- since (+ fixnum fixnum)
may not be a fixnum, but (+ (unsigned-byte 16) (unsigned-byte 16)) is
(assuming 20-something bit fixnums).

If you have CMUCL you can see many of these things in action.

--tim

From: Robert Monfera
Subject: Re: Internal representation of fixnums
Date: Sun, 12 Dec 1999 00:00:00 +0000
Message-ID: <38545A74.B63ACCF1@fisec.com>

Frode Vatvedt Fjeld wrote:

> My understanding of lisp integer types is that an integer object is
> ("is" as in "stored as".. [is there a lisp term for this?]) either a
> fixnum or a bignum. Declaring a variable to be for example
> (unsigned-byte 32) will not buy you anything (performance-wise),
> because this will be upgraded to bignum when the lisp implementation's
> fixnum type is less than 32 bits.

While it does not necessarily have to be the case (as Tim points out),
it does seem to be the common practice amongst implementations other
than CMUCL.

(defun test ()
  (declare (optimize (speed 3) (safety 0) (fixnum-safety 0)))
  (let* ((a (* most-positive-fixnum 2))
         (b (* most-positive-fixnum 2))
         (c (time (the (unsigned-byte 32) (+ a b)))))
       (declare (type (unsigned-byte 32) a b c))
       (> c most-positive-fixnum)))
TEST

> (test)

0.0 seconds used.
Standard Allocation 0 bytes.
Fixlen Allocation 0 bytes.
NIL

If I delete the fixnum-safety optimization, or _all_ declarations except
(safety 0)(speed 3), it will give the correct answer, but the addition
will use up an amazing 16 bytes (probably because it will box two
numbers, make bignums and add them.

I think the reasons behind it are not fundamental but incidental (as
CMUCL reportedly demonstrates the possibility).

To avoid garbage, it seems I am better off writing some specialised
arithmetic routines for 'mediumnum' numbers by using two fixnums.

> The (sole?) exception to this rule is when you have specialized
> arrays, where all elements are of the same type, and the elements'
> type information is (may be) stored only in the array object (not the
> element objects).

Maybe it's even true for structures and objects, as you can declare the
type of their slots?  (OK, this is wishful thinking.)

> In such cases, implementations are typically able to
> store (unsigned-byte 32)-objects as "true" 32-bits objects, for
> example. This is what UPGRADED-ARRAY-ELEMENT-TYPE will inform you
> about. But when such elements are extracted by AREF, they are
> immediately coerced to fixnums or bignums ("boxed"), according to
> value (although when AREF is used to designate a place, an optimizing
> compiler may be able to skip the boxing).
>
> > I use maximum-sized arrays that mostly hold fixnum values, and some
> > 5% bignums, that's why I looked this.

> If the values of those 5% all fit in 32 bits, a specialized array of
> element-type (unsigned-byte 32) would require a little bit less
> storage than a general array (one word less per bignum), at the cost
> of the array not being able to hold elemets with values outside the
> 32-bit range.

Yes, I'm considering a separate 'mednum' or bignum table table for these
outliers.

Does anyone know by the way if SVREF is safe on fixnum or 32-bit
vectors?  ANSI CL does not require SVREF to work on anything other than
simple (generalized) vectors, but LispWorks seems to get along with
fixnum vectors.

Thanks for your comments.

Robert

From: Tim Bradshaw
Subject: Re: Internal representation of fixnums
Date: Mon, 13 Dec 1999 00:00:00 +0000
Message-ID: <ey3902zrttx.fsf@cley.com>

* Robert Monfera wrote:
> Does anyone know by the way if SVREF is safe on fixnum or 32-bit
> vectors?  ANSI CL does not require SVREF to work on anything other than
> simple (generalized) vectors, but LispWorks seems to get along with
> fixnum vectors.

svref is only good for simple general vectors.  So no, it's not safe
on anything else.  AREF should be just as fast in almost any
reasonable Lisp if you declare enough types.

--tim

From: Pekka P. Pirinen
Subject: Re: Internal representation of fixnums
Date: Fri, 10 Dec 1999 00:00:00 +0000
Message-ID: <ix9032a90t.fsf@gaspode.cam.harlequin.co.uk>

Duane Rettig <·····@franz.com> writes:
> Robert Monfera <·······@fisec.com> writes:
> > Barry Margolin wrote:
> > > A common way to represent Lisp data types is to use the low-order 2 bits
> > > for the type tags.
> > 
> > In LispWorks, it is 8 bits, which leaves you with 24-bit fixna.
> 
> Someone from Harlequin will have to comment, but I do not think
> that this is correct.

It's not quite correct.  LispWorks for Windows has 24-bit fixnums,
Unix versions have 30 bits.

On LispWorks, boxed types use the two or three lowest bits of the
pointer for tags, but immediate types use more bits in a complicated
way, particularly on Windows, where the representation was optimized
for type checking speed (as opposed to, say, large integer arithmetic
speed).
-- 
Pekka P. Pirinen
Adaptive Memory Management Group, Harlequin Limited
<html><body><center><blink><h1>12:00</h1></blink></center></body></html>

From: Duane Rettig
Subject: Re: Internal representation of fixnums
Date: Fri, 10 Dec 1999 00:00:00 +0000
Message-ID: <466y6agq3.fsf@beta.franz.com>

Barry Margolin <······@bbnplanet.com> writes:

> In article <·················@vatican.va>,
> An Airplane  <····@vatican.va> wrote:
> >in the arguments to the li, 0x0A68 is (* 666 4). why is the number four
> >times larger?
> 
> A common way to represent Lisp data types is to use the low-order 2 bits
> for the type tags.  All the boxed types are big enough that they need at
> least 4-byte alignment, so you can just mask off these bits to get a
> pointer to the object.  The type bits 00 and 11 are used for positive and
> negative fixnums, respectively; this allows you to do addition and
> subtraction of them without having to mask off the type tags.  But it also
> means that the numbers are shifted up by two bits, which is the same as
> multiplication by 4.

Oops, this isn't quite correct.  On many of the lisps on GP hardware,
the low two bits do specify the "fixnum" tag, which is always the
bits #b00 (not #b00 and #b11).  Since they are the ls-bits, the sign
of the number makes no difference, and shifting by two or multiplying
by four always introduces 0 bits at the low end.

Also, another clarification: these implementations tend to in fact use
three bits for tags, and align lisp objects on 16-byte boundaries.  To
get the apparent two-bit fixnum tag, we take two tags, #b000 for even
fixnums and #b100 for odd fixnums.  This leaves six other tags for some
of the type codes, and one tag is usually set aside as an "other" tag
which says that the type of the object is a byte within the object
itself.

> >also, isn't that a lot of code just to return a fixnum?
> 
> I think most of it was checking that you supplied the correct number of
> arguments.  Try recompiling it with safety 0 and speed 3 to see if that
> stuff is left out.

That is probably the case.  In Allegro CL, the difference is the
elimination of both the argument checking and the interrupt-sync
check.  The last three instructions are, respectively, setting the
number-of-values-returned count, loading the caller's function
object, and the actual return.  I suppose that on the powerpc
this combination could be done by jumping to a "return-1-value"
code segment, but on other architectures such a jump would cause
an extra instruction execution _and_ a pipeline flush...

USER(1): (defun kl ()
    666)
KL
USER(2): (disassemble 'kl)
;; disassembly of #<Function (:ANONYMOUS-LAMBDA 3) @ #x404f1132>
;; formals: 

;; code start: #x404f1104:
   0: 0c300000             twlgti r16,0   	"number of args"
   4: 8252ffff             lwz r18,-1(r18)
   8: 38600a68     [addi]  lil r3,2664
  12: 3a000001     [addi]  lil r16,1
  16: 82e10008             lwz r23,8(r1)
  20: 4e800020             blr
USER(3): (declaim (optimize speed (safety 0) (debug 0)))
T
USER(4): (disassemble 'kl)
;; disassembly of #<Function (:ANONYMOUS-LAMBDA 4) @ #x404f3afa>
;; formals: 

;; code start: #x404f3ad4:
   0: 38600a68     [addi]  lil r3,2664
   4: 3a000001     [addi]  lil r16,1
   8: 82e10008             lwz r23,8(r1)
  12: 4e800020             blr
USER(5): 

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)