tagged/untagged data in lisp system programming

From: Jürgen Böhm
Subject: tagged/untagged data in lisp system programming
Date: Fri, 15 Dec 2006 17:19:06 +0000
Message-ID: <eluldn$766$03$1@news.t-online.com>

Hi,

let us suppose one wants to define a lisp dialect and use it to write an
operating system for a new processor (on which no previous lisp
implementation exists).

In particular, all device drivers, the garbage collection system and the
processing of strings and arrays shall be written in this lisp itself.

The processor shall be a 32 bit word architecture but shall be able to
access halfwords and bytes by address (32 bit address space for bytes).

The lisp system shall have the usual tagged data types, that is, either
pointers to the data (for conses, vectors and strings) or tagged
immediate data fitting into a 32 bit cell (characters and fixnums with,
say, 30 or 24 bits).

Now it seems unavoidable to me to have untagged data too, in order to
implement the above-named primitive operations (GC, string-ops, vector-ops,
device drivers) in lisp itself. For example, when marking in the GC
phase (supposing there is some kind of marking to be done) one may have to
execute an "or" operation which applies to all 32 bits of a
machine word. I call this an "untagged-or" (u-or), in contrast to the
"internal-or" working on tagged integers, which applies only to the 24 or
30 bits holding their data.
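
To make the distinction concrete, here is a minimal sketch in Common Lisp. It
assumes a layout that is not fixed anywhere in this post: a 2-bit tag in the
low bits of the word, tag value 0 for fixnums, and a 30-bit payload. Machine
words are modelled as plain integers, and make-tagged-fixnum and
tagged-fixnum-value are hypothetical helper names.

```lisp
;; Assumed layout: the low 2 bits of a 32-bit word hold the tag, the
;; upper 30 bits hold the payload. The tag values are invented here.
(defconstant +tag-bits+ 2)
(defconstant +fixnum-tag+ #b00)

(defun make-tagged-fixnum (n)
  "Pack a non-negative 30-bit integer N into a tagged 32-bit word."
  (logior (ash (ldb (byte 30 0) n) +tag-bits+) +fixnum-tag+))

(defun tagged-fixnum-value (word)
  "Extract the 30-bit payload of a tagged fixnum word."
  (ash word (- +tag-bits+)))

(defun u-or (x y)
  "Untagged OR: combines all 32 bits of two raw machine words."
  (ldb (byte 32 0) (logior x y)))

(defun internal-or (x y)
  "Tagged OR: combines only the payloads and re-tags the result."
  (make-tagged-fixnum (logior (tagged-fixnum-value x)
                              (tagged-fixnum-value y))))
```

Under these assumptions, (u-or (make-tagged-fixnum 5) #b01) yields a word
whose tag bits are no longer those of a fixnum, which is exactly what
internal-or can never produce and what a mark bit would need.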

Thinking everything through, one comes to some elementary functions
(1)

u-put-word (i val)
u-put-byte (i val)
u-get-word (i)
u-get-byte (i)

u-or (x y) ... etc.

which work on untagged arguments (just 32 bit words) and return such.
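
A rough sketch of how the accessors in (1) might behave, simulated in Common
Lisp over a byte vector instead of real hardware. The *memory* variable, its
size and the little-endian byte order are assumptions of this sketch only.

```lisp
;; Simulated byte-addressed memory standing in for the real machine.
;; Size and little-endian byte order are assumptions of this sketch.
(defvar *memory* (make-array 64 :element-type '(unsigned-byte 8)
                                :initial-element 0))

(defun u-get-byte (i)
  "Read the byte at byte address I."
  (aref *memory* i))

(defun u-put-byte (i val)
  "Store the low 8 bits of VAL at byte address I."
  (setf (aref *memory* i) (ldb (byte 8 0) val)))

(defun u-get-word (i)
  "Read the 32-bit word starting at byte address I (little-endian)."
  (logior (u-get-byte i)
          (ash (u-get-byte (+ i 1)) 8)
          (ash (u-get-byte (+ i 2)) 16)
          (ash (u-get-byte (+ i 3)) 24)))

(defun u-put-word (i val)
  "Store the low 32 bits of VAL at byte address I (little-endian)."
  (dotimes (k 4 val)
    (u-put-byte (+ i k) (ldb (byte 8 (* 8 k)) val))))
```

In a real system these would compile to single load and store instructions;
the point here is only that arguments and results are full, untagged words.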

It is not too difficult to program a simple garbage collector and the
other constructing and referencing primitives in lisp using only the
functions in (1), and this should be true for device drivers also.

But now a problem appears: This untagged data ends up on the
stack or in the heap and confuses the garbage collector, which has no
way to distinguish it from pointers or tagged immediate data.

So it seems that the part of the operating system that makes use of
untagged data cannot be written in the same lisp as the "higher" parts of
the system. In particular, it seems one has to impose a restriction on
the subset of the full lisp used for this part - let's call this subset
lisp': in lisp' there can be no automatic garbage collection;
instead everything to be kept during GC has to be handled explicitly
(maybe pushed explicitly onto a distinguished list). (Alternatively one
could include compiler options such as "declaim untagged", to let the
compiler take care of it.)

This is the point my thoughts have reached over the last weeks, during
which I have searched the net a lot for information about these issues,
but I have the impression that I may nevertheless be heading
in the wrong direction. In particular I know too little about the Symbolics
lisp machines - I just read a statement in a comp.lang.lisp discussion
that on these machines there was only tagged data. Is this true, and is
there a simple way to build an "all tagged" lisp + processor in which
all primitives can be done (reasonably efficiently) with tagged data only?

Greetings

Jürgen

-- 
Jürgen Böhm                                            www.aviduratas.de
"At a time when so many scholars in the world are calculating, is it not
desirable that some, who can, dream ?"  R. Thom

Re: tagged/untagged data in lisp system programming Tim Bradshaw
Re: tagged/untagged data in lisp system programming Joe Marshall
Re: tagged/untagged data in lisp system programming Kaz Kylheku
Re: tagged/untagged data in lisp system programming Barry Margolin
Re: tagged/untagged data in lisp system programming Rainer Joswig
- Re: tagged/untagged data in lisp system programming Joe Marshall
  - Re: tagged/untagged data in lisp system programming Rainer Joswig
- Re: tagged/untagged data in lisp system programming Jürgen Böhm
  - Re: tagged/untagged data in lisp system programming Joe Marshall
Re: tagged/untagged data in lisp system programming Wade Humeniuk
- Re: tagged/untagged data in lisp system programming Pierre THIERRY
  - Re: tagged/untagged data in lisp system programming Wade Humeniuk
    - Re: tagged/untagged data in lisp system programming Wade Humeniuk

From: Tim Bradshaw
Subject: Re: tagged/untagged data in lisp system programming
Date: Fri, 15 Dec 2006 21:52:06 +0000
Message-ID: <1166219526.706713.132940@n67g2000cwd.googlegroups.com>

Jürgen Böhm wrote:


> This is the point to which my thoughts during the last weeks - where I
> have searched a lot for information about these issues on the net - have
> grown, but I have the impression, that maybe I am nevertheless heading
> in the wrong direction. Especially I know too little about the Symbolics
> lisp-machines - I just read a statement in a comp.lang.lisp discussion,
> that on these machines there was only tagged data. Is this true, and is
> there a simple way to build an "all tagged" lisp + processor in which
> all primitives can be done (reasonably efficient) with tagged data only?

It's important to remember that the Symbolics Lisp machines were not
32-bit systems.  They were 36-bit (L and G machines) or 40-bit (44?)
(Ivory) machines.

From: Joe Marshall
Subject: Re: tagged/untagged data in lisp system programming
Date: Fri, 15 Dec 2006 22:13:19 +0000
Message-ID: <1166220799.778448.320460@79g2000cws.googlegroups.com>

Jürgen Böhm wrote:
> Hi,
>
> let us suppose one wants to define a lisp dialect and use it to write an
> operating system for a new processor (on which no previous lisp
> implementation exists).
>
> So it seems, that the part of the operating system that is making use of
> untagged data cannot be written in the same lisp as the "higher" parts of
> the system. Especially, it seems, one has to impose the restriction that
> in this subset of the full lisp to be provided - let's call this subset
> lisp' - that in lisp' there can be no automatic garbage collection but
> instead everything to be kept during GC has to be explicitly handled
> (maybe pushed explicitly on a distinguished list). (Alternatively one
> could include compiler options "declaim untagged" or so, to let the
> compiler take care of it)

Basically right.

> This is the point to which my thoughts during the last weeks - where I
> have searched a lot for information about these issues on the net - have
> grown, but I have the impression, that maybe I am nevertheless heading
> in the wrong direction. Especially I know too little about the Symbolics
> lisp-machines - I just read a statement in a comp.lang.lisp discussion,
> that on these machines there was only tagged data. Is this true, and is
> there a simple way to build an "all tagged" lisp + processor in which
> all primitives can be done (reasonably efficient) with tagged data only?

The Lisp machines had a microcoded execution engine that worked `below'
the Lisp level.  The hardware was designed to support a tagged data
model.  The earlier lisp machines handled untagged data at the
microcode level and were careful not to mix tagged and untagged data.

It would be reasonable to build an all-tagged processor and memory
system these days.  Back in the 80's it was considered extravagant.

From: Kaz Kylheku
Subject: Re: tagged/untagged data in lisp system programming
Date: Sat, 16 Dec 2006 04:29:46 +0000
Message-ID: <1166243386.358709.119130@73g2000cwn.googlegroups.com>

Jürgen Böhm wrote:
> Thinking everything through, one comes to some elementary functions
> (1)
>
> u-put-word (i val)
> u-put-byte (i val)
> u-get-word (i)
> u-get-byte (i)

Yes, of course you need escape hatches to gain access to the machine in
order to implement the substrate. Even kernels written in C still have
to use some assembly language, even though C types are not tagged.

How will you write an interrupt handler which saves the machine context
and then dispatches Lisp code? How will you disable interrupts around a
critical section? How will you atomically compare and swap a word, or
perform a load-linked/store-conditional? How will you do memory
barriers? How will you call firmware routines and extract their
results?

What you can do is design an assembler that has Lisp syntax and write
those things in that assembler. Then when you have the Lisp system up
and running, you can port that assembler to it, perhaps as an extension
of its dialect. So then it becomes self hosting in the sense that the
Lisp implementation that forms the basis of that system can accept and
process the assembly-language files along with Lisp files to
re-generate its own bootable image.

> u-or (x y) ... etc.
>
> which work on untagged arguments (just 32 bit words) and return such.
>
> It is not too difficult to program a simple garbage collector and the
> other constructing and referencing primitives in lisp using only the
> functions in (1), and this should be true for device drivers also.
>
> But now a problem appears: This untagged data gets its place on the
> stack or in the heap and irritates the garbage collector who has no
> possibility to discriminate it from pointers or tagged immediate data.

This problem only occurs in garbage collectors that are introduced into
memory management systems that are hostile to garbage collection
and for various reasons cannot be upgraded to support garbage
collection properly.  Such garbage collectors, which have to make safe
assumptions, are called conservative. For instance, if you want to
introduce garbage collection in a C program to hunt down malloced
memory, you have this problem. Because of the way C works, pointers to
the memory can end up in data areas whose structure your garbage
collector cannot understand. So you cannot even solve the root
reference problem correctly: discovering where in the static areas of
the program pointers to objects exist.

On the other hand, the garbage collector for your Lisp system doesn't
have to blindly search through memory, not knowing whether any given
word is a pointer, or just some integer value that looks like one. It
can have precise knowledge about where the root objects are, and
knowledge about the structure of memory in general.

The unsafe code which manipulates raw machine words is careful to use
its own data areas which are not subject to garbage collection. It is
carefully written and its use is minimized. Moreover you can even
introduce a security scheme whereby programs written with these unsafe
extensions can only be translated by a privileged compiler and executed
by a privileged user. If the compiler is run without privilege, it
diagnoses the use of these language extensions and rejects them.

> So it seems, that the part of the operating system that is making use of
> untagged data cannot be written in the same lisp as the "higher" parts of
> the system.

Moreover, the part of the system which switches context between threads
or handles an interrupt also cannot be.

> Especially, it seems, one has to impose the restriction that
> in this subset of the full lisp to be provided - let's call this subset
> lisp' - that in lisp' there can be no automatic garbage collection but
> instead everything to be kept during GC has to be explicitly handled
> (maybe pushed explicitly on a distinguished list). (Alternatively one
> could include compiler options "declaim untagged" or so, to let the
> compiler take care of it)

Note that implementations of Lisp which are hosted on various operating
systems commonly link to libraries on those operating systems. Yet,
somehow, their garbage collectors know not to mess with the data
structures managed by those libraries.

A Lisp linked to your C library doesn't try to garbage collect the
``FILE *stdout'' structure, wherever it is.

That's because it's not just roving over arbitrary memory, looking for
objects! It knows about the data areas where Lisp objects live and
where garbage collection takes place, and stays away from other areas
where ``foreign'' objects live.

If a foreign object is to be treated within Lisp, there are various
approaches, one of which is to wrap a Lisp object around it which
carries the pointer as an opaque value. That Lisp object carries the
type tag and lives in the garbage-collected heap, so the value stored
in it doesn't have to be tagged. The garbage collector knows that in
that type of object, the opaque field is something to stay away from
and not chase it as an object reference.
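
As an illustration of that last approach, here is a toy sketch in Common
Lisp. FOREIGN-POINTER, CHILDREN and REACHABLE are hypothetical names invented
for this example, and the traversal is a stand-in for a real mark phase, not
a description of any particular implementation.

```lisp
;; FOREIGN-POINTER is a hypothetical wrapper type: the object itself is a
;; normal heap object, but its ADDRESS slot holds a raw machine word.
(defstruct foreign-pointer
  (address 0 :type (unsigned-byte 32)))

(defun children (object)
  "Return the Lisp objects a toy collector should trace from OBJECT."
  (typecase object
    (cons (list (car object) (cdr object)))
    (foreign-pointer '())               ; opaque field: never chased
    (simple-vector (coerce object 'list))
    (t '())))

(defun reachable (root)
  "Collect every object reachable from ROOT, skipping opaque fields."
  (let ((seen '())
        (stack (list root)))
    (loop while stack
          do (let ((object (pop stack)))
               (unless (member object seen)
                 (push object seen)
                 (dolist (child (children object))
                   (push child stack)))))
    seen))
```

The wrapper is found and kept alive like any other object, while the raw
address inside it is never interpreted as a reference.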

From: Barry Margolin
Subject: Re: tagged/untagged data in lisp system programming
Date: Sat, 16 Dec 2006 03:14:46 +0000
Message-ID: <barmar-488B8D.22144615122006@comcast.dca.giganews.com>

In article <···············@news.t-online.com>,
 Jürgen Böhm <······@gmx.net> wrote:

> 
> Hi,
> 
> let us suppose one wants to define a lisp dialect and use it to write an
> operating system for a new processor (on which no previous lisp
> implementation exists).
> 
> Especially all device drivers, the garbage collecting system and the
> processing of strings and arrays shall be written in this lisp itself.
> 
> The processor shall be a 32 bit word architecture but shall be able to
> access halfwords and bytes by address (32 bit address space for bytes).
> 
> The lisp system shall have the usual tagged data types, that is either
> pointers to the data (for conses, vector and strings) and tagged
> immediate data, fitting into a 32 bit cell (characters and fixnums with,
> say 30 or 24 bit).
> 
> Now to me it seems unavoidable, to have untagged data too, to implement
> the above-named primitive operations (GC, string-ops, vector-ops,
> device driver) in lisp itself. For example, doing marking in the GC
> phase (supposing there is some kind of marking to be done) one has to
> execute maybe an "or"-operation which applies to all 32 bits of a
> machine word. I call this an "untagged-or" (u-or) in contrast to the
> "internal-or" working on tagged integers, applying only to their data
> holding 24 or 30 bits.
> 
> Thinking everything through, one comes to some elementary functions
> (1)
> 
> u-put-word (i val)
> u-put-byte (i val)
> u-get-word (i)
> u-get-byte (i)
> 
> u-or (x y) ... etc.
> 
> which work on untagged arguments (just 32 bit words) and return such.
> 
> It is not too difficult to program a simple garbage collector and the
> other constructing and referencing primitives in lisp using only the
> functions in (1), and this should be true for device drivers also.
> 
> But now a problem appears: This untagged data gets its place on the
> stack or in the heap and irritates the garbage collector who has no
> possibility to discriminate it from pointers or tagged immediate data.

The solution, although it's not pretty, is to use TWO Lisp objects 
rather than just one to hold the 32-bit data.  I.e.

(u-put-word loc-high loc-low val-high val-low)
(u-get-word loc-high loc-low) => val-high val-low
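
A small sketch of how such an interface might look, in Common Lisp. The hash
table standing in for memory and the JOIN-HALVES helper are assumptions made
for illustration; the point is that every value crossing the interface is a
16-bit quantity that fits comfortably in a tagged fixnum.

```lisp
;; Simulated memory: a hash table from 32-bit address to 32-bit word.
;; On a real system this would be raw memory; the table is a stand-in.
(defvar *words* (make-hash-table))

(defun join-halves (high low)
  "Combine two 16-bit halves into one 32-bit quantity."
  (logior (ash (ldb (byte 16 0) high) 16)
          (ldb (byte 16 0) low)))

(defun u-put-word (loc-high loc-low val-high val-low)
  "Store the word given as two halves at the address given as two halves."
  (setf (gethash (join-halves loc-high loc-low) *words*)
        (join-halves val-high val-low))
  (values val-high val-low))

(defun u-get-word (loc-high loc-low)
  "Return the word at the given address as two 16-bit fixnums."
  (let ((word (gethash (join-halves loc-high loc-low) *words* 0)))
    (values (ldb (byte 16 16) word)
            (ldb (byte 16 0) word))))
```

In the simulation the joined word exists only inside the primitives; on real
hardware that step would happen in registers, out of sight of the GC.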

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***

From: Rainer Joswig
Subject: Re: tagged/untagged data in lisp system programming
Date: Fri, 15 Dec 2006 22:19:46 +0000
Message-ID: <C1A8DE12.66065%joswig@lisp.de>

On 15.12.2006 at 18:19, "Jürgen Böhm" <······@gmx.net> wrote in
···············@news.t-online.com:

> 
> Hi,
> 
> let us suppose one wants to define a lisp dialect and use it to write an
> operating system for a new processor (on which no previous lisp
> implementation exists).
> 
> Especially all device drivers, the garbage collecting system and the
> processing of strings and arrays shall be written in this lisp itself.
> 
> The processor shall be a 32 bit word architecture but shall be able to
> access halfwords and bytes by address (32 bit address space for bytes).
> 
> The lisp system shall have the usual tagged data types, that is either
> pointers to the data (for conses, vector and strings) and tagged
> immediate data, fitting into a 32 bit cell (characters and fixnums with,
> say 30 or 24 bit).
> 
> Now to me it seems unavoidable, to have untagged data too, to implement
> the above-named primitive operations (GC, string-ops, vector-ops,
> device driver) in lisp itself. For example, doing marking in the GC
> phase (supposing there is some kind of marking to be done) one has to
> execute maybe an "or"-operation which applies to all 32 bits of a
> machine word. I call this an "untagged-or" (u-or) in contrast to the
> "internal-or" working on tagged integers, applying only to their data
> holding 24 or 30 bits.

Look at the TI Explorer's microprocessor. It was a 32bit processor
with all that stuff.

> 
> Thinking everything through, one comes to some elementary functions
> (1)
> 
> u-put-word (i val)
> u-put-byte (i val)
> u-get-word (i)
> u-get-byte (i)
> 
> u-or (x y) ... etc.
> 
> which work on untagged arguments (just 32 bit words) and return such.
> 
> It is not too difficult to program a simple garbage collector and the
> other constructing and referencing primitives in lisp using only the
> functions in (1), and this should be true for device drivers also.
> 
> But now a problem appears: This untagged data gets its place on the
> stack or in the heap and irritates the garbage collector who has no
> possibility to discriminate it from pointers or tagged immediate data.

There is no such thing as THE garbage collector. On the Symbolics (and
related machines) you can work without any GC (using the large address space
until it fills up, then rebooting). You also have the choice
of different GCs (incremental, ephemeral, mark&sweep, ...).

The MIT derived Lisp Machine operating systems had all kinds
of memory management schemes. Several Garbage Collectors were
among them. The memory was divided into areas with different
memory management strategies. Some areas were not
automatically managed by a GC. Areas were used for specific
purposes. There are areas for strings, for symbols, for
bitmaps, ... You can also create your own areas and define
some memory management for those. The OS also provides
"resources", which are manually managed pools of objects.
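
To give a feeling for the idea of a manually managed pool, here is a minimal
sketch in Common Lisp. It does not reproduce the actual Lisp Machine resource
facility; POOL, ALLOCATE-FROM-POOL and RETURN-TO-POOL are hypothetical names
used only to show manual reuse of objects.

```lisp
;; A hypothetical, minimal object pool. It only illustrates manual reuse
;; and is not the Lisp Machine resource interface.
(defstruct pool
  constructor        ; function of no arguments that makes a fresh object
  (free '()))        ; objects currently available for reuse

(defun allocate-from-pool (pool)
  "Reuse a free object if one exists, otherwise construct a new one."
  (if (pool-free pool)
      (pop (pool-free pool))
      (funcall (pool-constructor pool))))

(defun return-to-pool (pool object)
  "Hand OBJECT back; the caller promises not to use it afterwards."
  (push object (pool-free pool))
  object)
```

Objects handed back this way are recycled instead of becoming garbage, at
the price of the usual manual-management risk of using an object after it
has been returned.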

> 
> So it seems, that the part of the operating system that is making use of
> untagged data cannot be written in the same lisp as the "higher" parts of
> the system. Especially, it seems, one has to impose the restriction that
> in this subset of the full lisp to be provided - let's call this subset
> lisp' - that in lisp' there can be no automatic garbage collection but
> instead everything to be kept during GC has to be explicitly handled
> (maybe pushed explicitly on a distinguished list). (Alternatively one
> could include compiler options "declaim untagged" or so, to let the
> compiler take care of it)
> 
> This is the point to which my thoughts during the last weeks - where I
> have searched a lot for information about these issues on the net - have
> grown, but I have the impression, that maybe I am nevertheless heading
> in the wrong direction. Especially I know too little about the Symbolics
> lisp-machines - I just read a statement in a comp.lang.lisp discussion,
> that on these machines there was only tagged data. Is this true, and is
> there a simple way to build an "all tagged" lisp + processor in which
> all primitives can be done (reasonably efficient) with tagged data only?

There was not only tagged data on Symbolics systems. For example my MacIvory
has an interface to non-tagged data in the Mac OS. Also
it could talk to hardware (Nubus, Ethernet, graphics cards,
...) with their ideas of data formats.

With the Symbolics OS it was easier to work with non-tagged data,
since the processor could work directly with 32-bit words (the
words were 36 or 40 bits wide on the Symbolics). So when
you read or wrote a 32-bit word from the outside, it would fit
into a Lisp Machine word.


The TI Lisp Machines were probably the first machines
to use the NuBus (before Apple) and they could
talk over the NuBus to all kinds of cards
(memory, DSP, ...). Their later processor was the
microprocessor mentioned above, which should be a good
thing for you to study. There is a lot
of documentation for the TI Explorer online, and much
of it describes internals, such as the memory management
system.


> 
> Greetings
> 
> Jürgen

From: Joe Marshall
Subject: Re: tagged/untagged data in lisp system programming
Date: Fri, 15 Dec 2006 22:34:49 +0000
Message-ID: <1166222089.281206.245600@f1g2000cwa.googlegroups.com>

Rainer Joswig wrote:
>
> The TI Lisp Machines were probably the first machines
> to use the Nubus (before Apple) and they could
> talk over the NuBus to all kinds of cards
> (memory, DSP, ...).

The NuBus was originally designed by Steve Ward's group for the
NuMachine.  TI was involved somehow and ended up with the NuBus.  When
TI decided to build the Explorer, they did a technology transfer with
LMI.  The LMI Lambda got the NuBus and TI got the Lambda architecture.
I think the LMI Lambda predates the Explorer slightly.

Just some trivia.

From: Rainer Joswig
Subject: Re: tagged/untagged data in lisp system programming
Date: Fri, 15 Dec 2006 23:01:53 +0000
Message-ID: <C1A8E7F1.66070%joswig@lisp.de>

On 15.12.2006 at 23:34, "Joe Marshall" <··········@gmail.com> wrote
in ························@f1g2000cwa.googlegroups.com:

> 
> Rainer Joswig wrote:
>> 
>> The TI Lisp Machines were probably the first machines
>> to use the Nubus (before Apple) and they could
>> talk over the NuBus to all kinds of cards
>> (memory, DSP, ...).
> 
> The NuBus was originally designed by Steve Ward's group for the
> NuMachine.  TI was involved somehow and ended up with the NuBus.  When
> TI decided to build the Explorer, they did a technology transfer with
> LMI.  The LMI Lambda got the NuBus and TI got the Lambda architecture.
> I think the LMI Lambda predates the Explorer slightly.
> 
> Just some trivia.
> 

This also reads kind of interesting:

http://www.ti.com/corp/docs/company/history/watson.htm

From: Jürgen Böhm
Subject: Re: tagged/untagged data in lisp system programming
Date: Fri, 15 Dec 2006 23:22:14 +0000
Message-ID: <elvami$ibs$03$1@news.t-online.com>

Rainer Joswig wrote:
> On 15.12.2006 at 18:19, "Jürgen Böhm" <······@gmx.net> wrote in
> ···············@news.t-online.com:
> 
>>
>> But now a problem appears: This untagged data gets its place on the
>> stack or in the heap and irritates the garbage collector who has no
>> possibility to discriminate it from pointers or tagged immediate data.
> 
> There is no such thing as THE garbage collector. On the Symbolics (and related)
> you can work without any GC (using the large address space
> until it fills up and then reboot). You also have the choice
> of different GCs (incremental, ephemeral, mark&sweep, ...).
> 

In speaking of *the* garbage collector, I did not intend to say that there
could be only one active in the system, although in my planned system
there will of course be only one at first, for reasons of simplicity. My point
is better expressed in the following paragraph:

>> So it seems, that the part of the operating system that is making use of
>> untagged data cannot be written in the same lisp as the "higher" parts of
>> the system. Especially, it seems, one has to impose the restriction that
>> in this subset of the full lisp to be provided - let's call this subset
>> lisp' - that in lisp' there can be no automatic garbage collection but
>> instead everything to be kept during GC has to be explicitly handled
>> (maybe pushed explicitly on a distinguished list). (Alternatively one
>> could include compiler options "declaim untagged" or so, to let the
>> compiler take care of it)
>>

Or one could say it more pictorially: This untagged data smuggles itself
in as "dirt" into the "clean heaven of Lisp tagged data". There it
constantly threatens the integrity of all system functions which expect only
tagged data and can reject wrongly typed arguments only under the
assumption that they are correctly tagged. The only way to cope with this
problem seems to me to be to supplement the compiler with a limited amount of
"type inference", aided by suitable "declaims", so that it can distinguish
between tagged and untagged data and construct all internal
data structures in such a way that untagged data becomes "tagged by
compilation". A memory model for such a Lisp might (just for example) be
to have only blocks of the type

(1)

|header1|n1|n2|tagdata(1..n1)|untagdata(1..n2)|

where header1, n1, n2 are 32 bit words, followed by n1 tagdata words and n2
untagdata words. It would be the compiler's task to compile the
Lisp code into correct writers, accessors and modifiers of these
blocks. Of course cons cells with tagged data content are a special case
of (1), which could be given a more compact representation.
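
Just to illustrate layout (1), a small sketch in Common Lisp that models a
block as a simple vector. MAKE-BLOCK, BLOCK-TAGGED-WORDS and
BLOCK-UNTAGGED-WORDS are hypothetical names; a real system would of course
work on raw memory and not on Lisp vectors.

```lisp
;; A block is modelled as #(header n1 n2 tagged... untagged...).
;; The vector model is an assumption made only for this sketch.
(defun make-block (header tagged untagged)
  "Build a block from a header word and two sequences of words."
  (concatenate 'simple-vector
               (list header (length tagged) (length untagged))
               tagged
               untagged))

(defun block-tagged-words (blk)
  "Return the N1 tagged words, the only part a collector has to trace."
  (let ((n1 (svref blk 1)))
    (subseq blk 3 (+ 3 n1))))

(defun block-untagged-words (blk)
  "Return the N2 raw words, which a collector must skip."
  (let ((n1 (svref blk 1))
        (n2 (svref blk 2)))
    (subseq blk (+ 3 n1) (+ 3 n1 n2))))
```

With the two counts in the header, a collector can decide what to trace
without looking at the bit patterns of the untagged words at all.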

Is this method (implicit tagging by the compiler) the method chosen in
the Lisps of the Lisp machines, or were they - on the contrary - able to
formulate all the basic algorithms, including garbage collection - be it
done by one routine or in an arbitrarily complex way - with tagged data
operations alone?

Greetings

Jürgen

-- 
Jürgen Böhm                                            www.aviduratas.de
"At a time when so many scholars in the world are calculating, is it not
desirable that some, who can, dream ?"  R. Thom

From: Joe Marshall
Subject: Re: tagged/untagged data in lisp system programming
Date: Sat, 16 Dec 2006 01:24:13 +0000
Message-ID: <1166232253.880142.76080@n67g2000cwd.googlegroups.com>

Jürgen Böhm wrote:
>
> Or one could say it more pictorially: This untagged data smuggles itself
> in as "dirt" into the "clean heaven of Lisp tagged data". There it
> always threatens the integrity of all system functions which expect only
> tagged data and can reject wrongly typed arguments only under the
> assumption they are correctly tagged. The only way to keep up with this
> problem seems to me to supplement the compiler with a limited amount of
> "type inferencing" aided by suitable "declaims" so that it can decide
> between tagged and untagged data and construct all internal
> datastructures in a way, that untagged data becomes "tagged by
> compilation".

It helps to have compiler support for this sort of programming, but you
can get pretty far without too much *special* support if you have a
reasonably good model of what the compiler is going to do with your
data.  For the most part, tagged and untagged data are handled opaquely
(as simple 32-bit quantities) until you get to the primitive functions.
 For instance, suppose you had these functions:

(defun foo (x y)
  (if (> x 0)
      (bar y)
      (baz y)))

(defun bar (object) (car object))

The function FOO doesn't do anything to Y but pass it as an argument to
another function.  The compiler *ought* to be able to compile that into
a push instruction.  (OK, you can't *prove* it will or portably ensure
this, but if you have control over the compiler you can arrange for
this to reliably be the case.)  It doesn't matter if Y is boxed or not,
it just gets pushed.

Now when we get to BAR, we have a different story.  The call to CAR
will have to check the type of OBJECT (barring funniness in
declarations, etc.)

At the level of `normal' code, all objects are tagged (boxed).  You
don't need any special compiler support.  At the GC level, you can get
away with a certain level of mixed boxed and unboxed use because the GC
won't interrupt itself.  (We assume the GC is carefully written and
won't leave unboxed data around to confuse itself on the next
iteration.)

> A memory model for such a Lisp might (just for example) be
> to have only blocks of the type
>
> (1)
>
> |header1|n1|n2|tagdata(1..n1)|untagdata(1..n2)|
>
> where header1, n1, n2 are 32 bit words and n1 tagdata words and n2
> untagdata words follow. It would be the compiler's task to compile the
> Lisp code to correct writers, accessors and modifiers of these
> blocks. Of course cons cells with tagged data content are a special case
> of (1), which could be given a more compact representation.
>
> Is this method (implicitly tagging by the compiler) the method chosen in
> the Lisps of the Lisp-machines or were they - on the contrary - able to
> formulate all the basic algorithms, including garbage collecting - be it
> done by one routine or in an arbitrary complex way - with tagged data
> operations alone?

With the appropriate `subprimitives', yes.  It requires care, but it is
doable.

From: Wade Humeniuk
Subject: Re: tagged/untagged data in lisp system programming
Date: Sat, 16 Dec 2006 02:05:37 +0000
Message-ID: <RvIgh.76407$hn.30663@edtnps82>

Jürgen Böhm wrote:
> Hi,
> 
> let us suppose one wants to define a lisp dialect and use it to write an
> operating system for a new processor (on which no previous lisp
> implementation exists).
> 
> Especially all device drivers, the garbage collecting system and the
> processing of strings and arrays shall be written in this lisp itself.
> 
> The processor shall be a 32 bit word architecture but shall be able to
> access halfwords and bytes by address (32 bit address space for bytes).
> 
> The lisp system shall have the usual tagged data types, that is either
> pointers to the data (for conses, vector and strings) and tagged
> immediate data, fitting into a 32 bit cell (characters and fixnums with,
> say 30 or 24 bit).
> 
> Now to me it seems unavoidable, to have untagged data too, to implement
> the above-named primitive operations (GC, string-ops, vector-ops,
> device driver) in lisp itself. For example, doing marking in the GC
> phase (supposing there is some kind of marking to be done) one has to
> execute maybe an "or"-operation which applies to all 32 bits of a
> machine word. I call this an "untagged-or" (u-or) in contrast to the
> "internal-or" working on tagged integers, applying only to their data
> holding 24 or 30 bits.
> 

Take the easy road.  No 32-bit cells, only 64-bit cells.  32-bits as value,
32-bits as tag.  As it has been pointed out it would be great to have a
36-bit, 40-bit or 72-bit processor.  If you can allow yourself the luxury
of not worrying about USING memory then there is no problem.
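
A quick sketch of such a cell in Common Lisp, with the 64-bit cell modelled
as an integer. Putting the tag in the upper half, the tag numbering and the
names MAKE-CELL, CELL-TAG and CELL-VALUE are all assumptions made for
illustration.

```lisp
;; Hypothetical cell format: upper 32 bits are the tag, lower 32 bits the
;; value. The tag numbering is invented for this sketch.
(defconstant +cell-tag-fixnum+ 1)
(defconstant +cell-tag-raw-word+ 2)

(defun make-cell (tag value)
  "Pack a 32-bit TAG and a 32-bit VALUE into one 64-bit cell."
  (logior (ash (ldb (byte 32 0) tag) 32)
          (ldb (byte 32 0) value)))

(defun cell-tag (cell)
  "Return the 32-bit tag of CELL."
  (ldb (byte 32 32) cell))

(defun cell-value (cell)
  "Return the full 32-bit value of CELL, with no bits lost to the tag."
  (ldb (byte 32 0) cell))
```

With this format even a raw 32-bit machine word such as #xFFFFFFFF can sit
in the heap as a properly tagged object.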

Wade

From: Pierre THIERRY
Subject: Re: tagged/untagged data in lisp system programming
Date: Sat, 16 Dec 2006 02:17:27 +0000
Message-ID: <elvkvn$453$1@biggoron.nerim.net>

On Sat, 16 Dec 2006 02:05:37 +0000, Wade Humeniuk wrote:
> Take the easy road.  No 32-bit cells, only 64-bit cells.  32-bits as
> value, 32-bits as tag.  As it has been pointed out it would be great
> to have a 36-bit, 40-bit or 72-bit processor.

Is it so important to have 32-bit values? Why not use a fraction of the
32-bit word in a 32-bit architecture, in the same way as the 36, 40, 44
or 72-bit architectures did? Is the problem that the size of the value has
to be a power of 2?

Curiously,
Pierre



-- 
···········@levallois.eu.org
OpenPGP 0xD9D50D8A

From: Wade Humeniuk
Subject: Re: tagged/untagged data in lisp system programming
Date: Sat, 16 Dec 2006 02:58:21 +0000
Message-ID: <hhJgh.72974$rv4.18255@edtnps90>

Pierre THIERRY wrote:
> On Sat, 16 Dec 2006 02:05:37 +0000, Wade Humeniuk wrote:
>> Take the easy road.  No 32-bit cells, only 64-bit cells.  32-bits as
>> value, 32-bits as tag.  As it has been pointed out it would be great
>> to have a 36-bit, 40-bit or 72-bit processor.
> 
> Is it so important to have 32-bit values? Why not use a fraction of the
> 32-bit word in a 32-bit architecture, in the same way as the 36, 40, 44
> or 72-bit architectures did? Is the problem that the size of the value has
> to be a power of 2?
> 

Well....

Many data types are expressed in powers of 2.

32-bit Red-Green-Blue-Alpha Colour. Think OpenGL vectors.
32-bit Machine Addresses
32-bit CRCs
8-bit ASCII
16-bit Unicode

To interact with computer hardware, it's best to use its
natural bit sizing.  Since very few machines are designed
to handle adding 23-bit integers, it becomes a problem to
pick something unnaturally sized.

Why worry about wasting space?  Spread out, enjoy the luxury
of massive amounts of bit storage.  Why is it necessary to
fight and make one's life difficult?

Wade

From: Wade Humeniuk
Subject: Re: tagged/untagged data in lisp system programming
Date: Sat, 16 Dec 2006 04:09:15 +0000
Message-ID: <LjKgh.73998$rv4.46205@edtnps90>

I am reminded that the power-of-2-cult has enslaved the
Hardware and Compiler Designers, making one dependent on the
other.  Overthrowing the 9-bit-byte-multics and 6-bit-byte-cdc
kingdoms.  It must be the siren song of procreative DOUBLING
that has wrecked so many upon the rocks.

Wade