From: Vassil Nikolov
Subject: economical keywords?
Date: 
Message-ID: <82624.vnikolov@math.acad.bg>
Recently Kent Pitman mentioned his concern that a lot
of keywords are created in CLOS programming:

  On Wed, 1 Apr 1998 22:25:51 -0800, 
  Kent M Pitman  <······@world.std.com> wrote in comp.lang.lisp:
  (...)
  >Actually, an area I've started to recently feel worse about is the use
  >of keyword symbols as slot initargs, etc.  When one does:
  >
  > (defclass foo () 
  >   ((width :initarg :width :accessor width)))
  >
  >I'm starting to feel like we're building up a LOT of symbols like :WIDTH
  >here that are utterly gratuitous, all so we can say
  >  (make-instance 'foo :width 3)
  >instead of
  >  (make-instance 'foo 'width 3)
  (...)

In addition to having gone slightly mad over such issues already,
I disregarded his advice to ignore him and thought about keyword
use.  Since keywords are used quite a lot, they should be cheap,
so that one doesn't hesitate over their use, and what could make
them not so cheap would be all that memory they would take up.

Now, as everybody knows, a symbol would normally need five machine
words (plus perhaps 1-2 more for system data).  Of these five,
two are redundant (value and package), and two other---function
definition and p-list---are rarely used, if at all, since keywords
are almost exclusively used as tags (markers, labels, whatever).

This leaves us with the need for just one machine word---a pointer
to the name---to represent a typical keyword.  Assuming that symbol
names are `interned' across packages, in the usual case of slots
using the same name as the initarg/initform keyword, introducing
a keyword would bring little or none memory overhead if an `immediate'
representation of keywords is used.  It seems quite likely that
a typical implementation would be able to allocate a value for
the tag bits in a reference, indicating that the rest is a pointer
to the name of the keyword; if not, then just one additional word
would suffice.

What would be the price to pay:
(A) some comlication in the code that dispatches on the tag bits,
    and some more complication in the code for handling symbols
    (adding a special case), which doesn't look like much overhead
    to me;
(B) slowing down of setting the p-list or the function definition
    (the first time), when an `economical' keyword has to be coerced
    into a normal symbol; this would be particularly severe if
    the `keywords as immediate values' are implemented since all
    memory will have to be scanned (as if full GC is taking place)
    and all references to the keyword update; but this would
    happen *very rarely*.

What would be gained:
assuming 5 words per normal symbol, a fully immediate representation
for keywords, and `interning' of symbol names (so that in the case
of CL-USER::WIDTH and :WIDTH the string "WIDTH" is stored only once),
if we imagine a program with 500 classes (would that be a big
application, or merely a medium-sized one?), 20 slots per class on
the average, and 1.5 initarg/initform keywords per slot on the
average, we would have savings of 75,000 words, or 300,000 bytes on
a 32-bit machine, plus better sleep for the Lisp programmer.

I wonder if anyone has found this optimisation appropriate,
and implemented it?

Best regards,
Vassil.



Vassil Nikolov         <········@math.acad.bg>       (+359-2) 713-3813
Department of Information Research                       also 713-3818
Institute of Mathematics and Informatics        fax: (+359-2) 9713649
Acad. G. Bonchev, block 8, Sofia 1113, Bulgaria

From: Kelly Murray
Subject: Re: economical keywords?
Date: 
Message-ID: <6gmb7r$icv$1@news2.franz.com>
Yes, thought has been given over optimizing away keywords
as immediate values.  They essentially ARE immediate values
in the sense that EQ is used to compare them, so the print-name
isn't accessed when they are used at runtime, for example when
parsing a keyword argument list in a function call.
The memory savings in eliminating the function, value, and plist
slots for keywords are miniscule and not worth any extra complication 
in the implementation to special-case them.

In any case, Common Lisp *requires* they have a symbol-value,
symbol-plist and even symbol-function.  The Franz CORBA interface
implemented by Lewis Stille uses keywords as generic functions (yuck)
since they are in a global package that avoids package problems.
Some of us hate that, but Common Lisp allows it. e.g. (:move-object obj x y)

If you give up Common Lisp conformance, all the allocated memory
for keyword symbols can be eliminated in a runtime system that
doesn't allow #'intern, since just their unique addresses would used,
which don't actually point to any memory allocated for the symbol.
They are essentially only immediate values with a unique immediate 
type tag, which is true already (e.g. symbols have #b111 as the low-three bits)

A related and more generalized implementation strategy 
of the keyword issues is to represent ALL symbols as
just unique-tagged immediate values, rather than an address to
its 6-word memory block (must be 8byte aligned)
In this implementation all symbol slots
would be accessed indirectly using hashtables.
(symbol-name 'foo) would be effectively implemented as (gethash 'foo *symbol-names*)  
and (get 'foo 'bar) would be (getf (gethash 'foo *symbol-properties*) 'bar)

They are many advantages to this.
One being that the "address" of a symbol would never change improving GC.
The second is memory savings, as most symbols don't have both
a value, plist, and functions.  The third and perhaps most important
is locality, as the function cells of symbols will be grouped together 
in memory rather than scattered all over in symbol space.
In my view of a pure-CLOS environment where all functions are generic,
this works perfectly --- the symbol's address is a key into the
dispatch table of the first argument class.  No need for a "function cell".
e.g. (move-object object x y) ==>
  (funcall (gethash 'move-object object->dispatch) object x y)

-Kelly Murray  ···@franz.com  



In article <··············@math.acad.bg>, "Vassil Nikolov" <········@math.acad.bg> writes:
>> ....
>> This leaves us with the need for just one machine word---a pointer
>> to the name---to represent a typical keyword.  Assuming that symbol
>> names are `interned' across packages, in the usual case of slots
>> using the same name as the initarg/initform keyword, introducing
>> a keyword would bring little or none memory overhead if an `immediate'
>> representation of keywords is used.  It seems quite likely that
>> a typical implementation would be able to allocate a value for
>> the tag bits in a reference, indicating that the rest is a pointer
>> to the name of the keyword; if not, then just one additional word
>> would suffice.
>> ...
From: Kent M Pitman
Subject: Re: economical keywords?
Date: 
Message-ID: <sfwg1jklvv0.fsf@world.std.com>
···@math.ufl.edu (Kelly Murray) writes:

> In any case, Common Lisp *requires* they have a symbol-value,
> symbol-plist and even symbol-function. 

It requires that these slots be implemented by these accessors;
however, it does not require that the storage scheme used physically
allocate space for unused slots.  For example, symbol value cells
could be stored elsewhere (in a hash table, for example); nothing
requires the cell to be allocated in the symbol block.  We spent more
time than you can imagine arguing about whether people would make the
specific unintended assumption that these had to be physically present
in the symbol block.  I was forbidden by my compatriots from using the
term "slots" (which I'd have preferred to use) because people feared
that it would require them to waste space.  Personally, I don't think
slots even in standard objects are REQUUIRED to waste space when the
implementation has a theory about how to beat that space waste, so I
thought it a dumb restriction, but I abided.

It's an issue over which philosophers have debated forever--questions
of what knowledge people "really know" (i.e., would have physical
representation in the brain if you cut it open and accounted for
the neurons) and what they "create on demand" (like the Star Trek
holodecks) to give the illusion of it having been there all along.

I also think it's silly to talk about it being "physically present in
the symbol block" since depending on your processor, it might be on
different disk pages, or various compiler or memory tricks might cause
words at different addresses to go to places that are not physically
contiguous in memory.  Some of the tricks we use are low-level and
some are high level.  All I care is that the defined abstractions
work--not at all about how Lisp lays out memory I'm not able to test.

- - - - - 

Btw, we had talked about the idea of making keywords not be symbols
but some other type.  It was toward the end of the initial flurry
putting the 1984 spec together, as I recall.  I wouldn't mind seeing
a return to that.  If keywords were not symbols, but more like a
special subtype of string, they could be symbol names.  That might
be kinda cool.  Ah, but people do so hate having their code broken.
From: Bruno Haible
Subject: Re: economical keywords?
Date: 
Message-ID: <6gvqqq$gmq$1@nz12.rz.uni-karlsruhe.de>
Kent M Pitman <······@world.std.com> wrote:
>
> Btw, we had talked about the idea of making keywords not be symbols
> but some other type.

That's exactly what ILOG Talk implements. And indeed it allows a very
compact representation of a keyword: just a single slot, which contains
the keyword's name.

                    Bruno
                    <······@ilog.fr>             http://clisp.cons.org/~haible/
From: Jason Trenouth
Subject: Re: economical keywords?
Date: 
Message-ID: <355c4e1d.1290074375@newshost>
On 14 Apr 1998 14:11:38 GMT, ······@clisp.cons.org (Bruno Haible) wrote:

> Kent M Pitman <······@world.std.com> wrote:
> >
> > Btw, we had talked about the idea of making keywords not be symbols
> > but some other type.
> 
> That's exactly what ILOG Talk implements. And indeed it allows a very
> compact representation of a keyword: just a single slot, which contains
> the keyword's name.

And Dylan's symbols are similarly lightweight interned strings.

__Jason