From: David Gudeman
Subject: Summary: Representations of Dynamic Type Information
Date: 
Message-ID: <38557@optima.cs.arizona.edu>
Well, thanks for all of the responses.  I got several new ideas, and
got reminded of several things I had forgotten to include.  It would
be a little impractical for me to "summarize" the entire report here,
since it is 26 pages long.  However, there is a fairly complete draft
available (at 26 pages it ought to be fairly complete :-), and if
anyone is interested in reading it, it is available in compressed
postscript by anonymous ftp from cs.arizona.edu in the file
"/usr/ftp/janus/jc/tags.ps.Z".  If anyone has trouble getting this or
needs an uncompressed file, or wants the latex source, let me know.  I
haven't had time to discuss all of the ideas I got from the net, so
I've included draft notes in the text to give a hint about these (and
I've tried to credit the people who wrote me).  A few of the draft
notes are essentially the remainders of my outline.  Since I have to
go on to other work, the report is probably going to remain in this
unfinished state for a while.

For those who are curious, but not curious enough to read a 26-page
report, here is a rough outline of the methods (this does not include
all of the tricks and special cases)

tagged-words (type information is in the machine word)
  tag fields (word is broken up into tag and data fields)
    tag field in high end (most-significant bits) of word
       use tag of all zeros for one type to avoid tagging cost
       negative ints get a tag of all ones, non-negative ints
                                  get a tag of all zeros
       use sign bit for one type
         use sign-bit = 0 for one type and optimize another type by
                                  giving it the tag of all ones in the
                                  high end and tagging by negation.
    tag field in low end of word
      use n-bit tags to represent pointers data aligned on n-byte boundaries
                                  this allows tagging without shifting
        use a tag of all zeros to avoid tagging and untagging
      use tags that contain only one non-zero bit to make testing faster
      use all zeros to optimize integer arithmetic
      optimize integer arithmetic by adding/subtracting tag
                                  after subtraction/addition
    tag field in both ends of word
       various combinations of tricks from the other two options
  partitioning by pattern (type is encoded in the representation of the value,
                           no boxing/unboxing is done, usually words are
                           partitioned into ranges of numbers they rep.)
    simple range tests to identify types
    segments with statically determined types (a segment is a range of
                           numbers that can be identified by an
                           initial field in the bit-patterns used to
                           represent them).
    segments with dynamically determined types
      use BIBOP (table indexed by segments) to identify types
      identify type in the segment itself
        first word of segment
        last word of segment
      on a segmented architecture like the 80x86, one location has
                           many addresses so the (machine) segment
                           part of the address can be set to the
                           (representation) segment of the data type
                           being represented, and the offset can be
                           set in such a way that any object can be in
                           any machine segment.
      on machines that ignore the high bits of pointer, these bits can
                           be used for free boxing/unboxing
  make everything a legal IEEE float, using the NaN bit patterns to
                           encode all other types of data.

object-pointers (untagged pointers refering to self-identifying blocks
                 on the heap)
  combinations of this scheme with the tagged-word scheme

descriptors (two-word data elements divided into a type word and a
             value word)
  encoding a qualifier (address+length representation of a sequence)
                           in a descriptor
  encoding a cons cell in a descriptor
  direct representation of floats

segregated type codes (type information is kept elsewhere)
  type information for globals kept in a global area
  type information for locals kept in stack frame
  type information kept in header of aggregate objects

Thanks to Paul Tarau, Richard Fateman, Mikael Pettersson, Nick
Thompson, Andrew Wright, Benjamin Goldberg, Aubrey Jaffer, Eric
Benson, Marc Brandis, Kelvin Nilsen, Stavros Macrakis, Alexandr
Kopilovich, Pekka Pirinen, William Griswold, David Keppel,
Marc-Michael Brandis, Mark Tillotson, Richard Brooksby,
·····@magpie.linknet.com, Hintermeier Claus, Hendrik Boom, and Tim
Lindholm for your help.
-- 
					David Gudeman
·······@cs.arizona.edu
From: Kellom{ki Pertti
Subject: Re: Summary: Representations of Dynamic Type Information
Date: 
Message-ID: <PK.93Apr30122126@talitiainen.cs.tut.fi>
In article <·····@optima.cs.arizona.edu> ·······@cs.arizona.edu (David Gudeman) writes:
   ... it is available in compressed
   postscript by anonymous ftp from cs.arizona.edu in the file
   "/usr/ftp/janus/jc/tags.ps.Z". 

It seems that the correct path is "/janus/jc/tags.ps.Z". At least I
found it there, and it is currently on the way to the printer...
Thanks for putting it online!
--
Pertti Kellom\"aki (TeX format)  #       These opinions are mine, 
  Tampere Univ. of TeXnology     #              ALL MINE !
      Software Systems Lab       #  (but go ahead and use them, if you like)