(simple-vector 20) vs (vector single-float 20)

From: Mark Kantrowitz
Subject: (simple-vector 20) vs (vector single-float 20)
Date: Mon, 17 May 1993 20:13:12 +0000
Message-ID: <C76tI0.1uu.1@cs.cmu.edu>

Which is more efficient? 
	(simple-array single-float (20))
	(vector single-float 20)
	(simple-vector 20)
I'd think that the first one would be the most efficient, but svref
works only with the latter. I'm running in Lucid 4.0.1 and Allegro 4.1.

--mark

Re: (simple-vector 20) vs (vector single-float 20) ············@cs.cmu.edu
Re: (simple-vector 20) vs (vector single-float 20) Kelly Murray
- Re: (simple-vector 20) vs (vector single-float 20) Bruce Krulwich
  - Re: (simple-vector 20) vs (vector single-float 20) Barry Margolin
  - Re: (simple-vector 20) vs (vector single-float 20) Kelly Murray
    - Re: (simple-vector 20) vs (vector single-float 20) ············@cs.cmu.edu

From: ············@cs.cmu.edu
Subject: Re: (simple-vector 20) vs (vector single-float 20)
Date: Tue, 18 May 1993 17:07:56 +0000
Message-ID: <gfyFTg600jePIRn8M6@cs.cmu.edu>

······@GLINDA.OZ.CS.CMU.EDU (Mark Kantrowitz) writes:
> Which is more efficient? 
>         (simple-array single-float (20))
>         (vector single-float 20)
>         (simple-vector 20)
> I'd think that the first one would be the most efficient, but svref
> works only with the latter. I'm running in Lucid 4.0.1 and Allegro 4.1.
> 
> --mark

SVREF is a special flavor of AREF that only works with SIMPLE-VECTORs.
It is not necessarily faster or slower then AREF on other kinds of
arrays.  In fact, in CMUCL, (svref v x) is converted into (aref (the
simple-vector v) x) by the compiler.

Your three type declarations all indicate a different kind of vector.
Which is more effecient depends on what kind of vector your
implementation of common lisp does best.

>         (simple-array single-float (20))

This indicates that you have a simple array of single floats, with one
dimension of 20.  The one dimension means that it is a vector (the
definition of vector).  The simple means that it is not necessarily
adjustable, does not have a fill pointer, and is not displaced.  The
single-float means that you will only ever put single floats in it, so
if the implementation supports a special kind of array for
single-floats it can use it.

>         (vector single-float 20)

This is just shorthand for (array single-float (20)), which is the
same as the first, except for the simple.  As this is not necessarily
simple, each time the array is accessed, it will have to be checked
for displacement, fill pointers, etc.

>         (simple-vector 20)

This is shorthand for (simple-array t (20)).  The difference between
this and the first one is that it says that the array has to be able
to hold anything, so the compiler cannot use a specialized array type.

Note: simple-vector does not mean ``a vector that is simple.''  It
means ``a vector of T that is simple.''  Even though simple-array just
means ``an array that is simple'' and indicates nothing about the
element type.  [Can you say ``Common Lisp is consistent''?  If so, you
lie.]

Anyway, in CMUCL the first is the most effecient, then the third, then
the second.  Specifically, if you were to say:

	(+ (the single-float (aref v 0)) 1.0)

with v declared with each of these types, the resultant code would be:

For (simple-array single-float (20)):
	load 32 bits directly from the vector into the fp unit
	[the single-float check can be optimized away]
	load the constant 1.0 into the fp unit
	do the add

For (vector single-float 20):
	call the out-of-line aref routine
		(can only inline accesses to simple arrays)
	unbox the heap allocated result
	[the single-float check can be optimized away]
	load the constant 1.0 into the fp unit
	do the add

The out-of-line aref routine has to:
	check the type, notice that it really is simple.
	do the access
	copy the value into the heap
	return a pointer to the heap allocated float

For (simple-vector 20):
	load the 32-bit descriptor from the vector
	check to make sure it really is a single-float
		[not done if safety=0]
	load the 32-bit single-float from the heap into the fp unit
	load 1.0 as a constant
	do the add.

So the ``simple'' saves you a lot, and the ``single-float'' saves you
some consing and an extra indirection.

I would assume most other lisps are the same way.

-William Lott
CMU Common Lisp Group

From: Kelly Murray
Subject: Re: (simple-vector 20) vs (vector single-float 20)
Date: Tue, 18 May 1993 16:32:12 +0000
Message-ID: <1tb32cINNmrl@no-names.nerdc.ufl.edu>

In article <············@cs.cmu.edu>, ······@GLINDA.OZ.CS.CMU.EDU (Mark Kantrowitz) writes:
|> Which is more efficient? 
|> 	(simple-array single-float (20))
|> 	(vector single-float 20)
|> 	(simple-vector 20)
|> I'd think that the first one would be the most efficient, but svref
|> works only with the latter. I'm running in Lucid 4.0.1 and Allegro 4.1.
|> 
|> --mark

The answer is heavily dependent on the underlying Lisp implementation,
so you probably need a answer from both Lucid and Franz.
It also on what you are trying to be efficient about.

A (simple-array single-float (20)) is probably identical to
the (vector single-float 20), as the compiler probably converts both
forms into the same thing.  The advantage of this form is that
the system has the option of putting the float-bits directly in the array, so you
save memory and indirection when accessing the floats, and hopefully
the compiler can recognize this and generate floating point arithmetic code
that operates on elements of the array directly.
So a (setf (aref float-array 10) (+ 1.0 (aref float-array 10)))
would compile into really efficient inline code like:
 (let ((float-bits-pointer "&"(aref float-array 10)))
    (float-add3 float-bits-pointer 1.0 float-bits-pointer))

If the compiler wasn't smart enough, it could end up being less efficient
to operate on the elements, since the system would "box" or cons new
lisp floats on all array accesses to operate on the values.
So you get something like:
  (let ((lisp-float-object (cons-float "&"(aref float-array 10)))
    (setf (aref float-array 10) (uncons-float (+ 1.0 lisp-float-object))))

A (simple-vector 20) and SVREF will always hold "boxed" floats, so
you don't need to cons them up on access, or uncons them when storing them.
But then the compiler does not have the option of generating real efficient
inline code. However, if the values are mostly read-only, then this may
actually be more efficient, since it is faster to access the boxed values
and move them around (i.e. pass as parameters, store in other places).

The only true way to find out is to write test programs and experiment.
If you look at disassembled compiler output, you can likely determine whether
the floats get boxed up or not.

-Kelly

From: Bruce Krulwich
Subject: Re: (simple-vector 20) vs (vector single-float 20)
Date: Wed, 19 May 1993 09:40:00 +0000
Message-ID: <KRULWICH.93May19094000@zowie.ils.nwu.edu>

···@prl.ufl.edu (Kelly Murray) writes:

> A (simple-array single-float (20)) is probably identical to the (vector
> single-float 20), as the compiler probably converts both forms into the same
> thing.  The advantage of this form is that the system has the option of
> putting the float-bits directly in the array, so you save memory and
> indirection when accessing the floats, and hopefully the compiler can
> recognize this and generate floating point arithmetic code that operates on
> elements of the array directly.

Can anyone comment on which major LISP implementations perform optimizations
like this?  How about folding structures directly into vectors?  How about
floats or structures into structures?  Etc.

Thanks!

Bruce Krulwich
········@ils.nwu.edu

From: Barry Margolin
Subject: Re: (simple-vector 20) vs (vector single-float 20)
Date: Fri, 21 May 1993 16:12:42 +0000
Message-ID: <1tiv1qINN8t4@early-bird.think.com>

In article <······················@zowie.ils.nwu.edu> ········@zowie.ils.nwu.edu (Bruce Krulwich) writes:
>···@prl.ufl.edu (Kelly Murray) writes:
>
>> A (simple-array single-float (20)) is probably identical to the (vector
>> single-float 20), as the compiler probably converts both forms into the same
>> thing.  The advantage of this form is that the system has the option of
>> putting the float-bits directly in the array, so you save memory and
>> indirection when accessing the floats, and hopefully the compiler can
>> recognize this and generate floating point arithmetic code that operates on
>> elements of the array directly.
>
>Can anyone comment on which major LISP implementations perform optimizations
>like this?

Lucid implements floating point arrays using immediate data rather than
boxed floats.

>	     How about folding structures directly into vectors?  How about
>floats or structures into structures?  Etc.

Most Common Lisps implement non-:TYPEd structures as vectors, with a flag
so that they structures can be distinguished from other types of vectors.
Are you asking whether they optimize the case where all the slots specify
:TYPE SINGLE-FLOAT and generate a vector with unboxed single floats?  I
haven't heard of implementations that do that.

Or are you asking about optimizing (vector structure-type) so that the
structure elements are laid out in the vector (as in conventional languages
like PL/I, Pascal, and C) rather than generating a vector of pointers to
separate heap-allocated structures?  I don't think this would be valid,
because Lisp structures have first-class status as objects.  If the program
contains:

	(setq foo (aref vector 0))
	(setf (aref vector 0) bar)

The second assignment must not affect the object that FOO references.

The same is true about structure elements that are other structures.  The
:INCLUDE option provides a limited form of this, though.
-- 
Barry Margolin
System Manager, Thinking Machines Corp.

······@think.com          {uunet,harvard}!think!barmar

From: Kelly Murray
Subject: Re: (simple-vector 20) vs (vector single-float 20)
Date: Fri, 28 May 1993 17:47:06 +0000
Message-ID: <1u5j6sINNrqg@no-names.nerdc.ufl.edu>

In article <··················@cs.cmu.edu>, ············@cs.cmu.edu writes:
|> ···@prl.ufl.edu (Kelly Murray) writes:
|> > In article <··················@cs.cmu.edu>, ············@cs.cmu.edu writes:
|> > [talks about float slots in structures in CMUCL]

|> ···@prl.ufl.edu (Kelly Murray) writes:
|> > A better approach that would waste an additional word per float,
|> > but should give better speed, would be to embed the float objects
|> > into the structure directly.
|> > So you would get:
|> > 
|> >         (defstruct ship
|> >           name
|> >           (x 0.0 :type single-float)
|> >           (y 0.0 :type single-float))
|> >         (make-ship :name 'o-fools :x 1.0 :y 2.0)
|> > 
|> >  results in the following in memory:
|> >  
|> >      ship:
|> >         +------------------------------+
|> >         | structure header, w/ length  |
|> >         +------------------------------+
|> >         | pointer to class info        |
|> >         +------------------------------+
|> >         | pointer to o-fools symbol    |
|> >  1.0    +------------------------------+
|> >  -----> | float header                 |
|> >  obj    +------------------------------+
|> >         | bits for 1.0                 |
|> >         +------------------------------+
|> >  -----> | float header                 |
|> >         +------------------------------+
|> >         | bits for 2.0                 |
|> >         +------------------------------+
|> > 
|> > Thus, you can access the float bits directly in the structure, but at the
|> > same time, you can get a real lisp-float object without actually consing
|> > by constructing a pointer into the structure.
|> 
|> But then if you said:
|> 
|> 	(let ((orig-x (ship-x ship)))
|> 	  (setf (ship-x ship) (+ ship-x-speed orig-x))
|> 	  orig-x)
|> 
|> Then the returned value would be incorrect because orig-x would still
|> point at the structure slot which just got modified.  I guess you
|> could do it for read-only slots, but the benifits would be
|> sufficiently minimal to not be worth the effort.
|> 

I realized this was a problem after I posted.  I was wondering if someone
would catch it.  It is a fatal flaw in the design.  
And as you pointed out, with generational-gc, the intermediate value consing
may not be that expensive.

|> > This would require the garbage collector to be aware of this type of
|> > embedding.  The easy fix for scanning purposes would be to add a slot
|> > to the class info that identifies where the "lisp" slots end, with the rest
|> > of the slots being "bits".  Thus, you store all the lisp objects in
|> > the top of the structure, and put all the raw bits at the end.
|> > So for the above example, the ship-class-info would have a field with
|> > the value 1, indicating that only the first 32bit slot in the structure
|> > is a lisp object, the rest the GC can ignore.
|> 
|> Moving all the non-descriptor slot to the end breaks down in light of
|> :include.  If the including structure defines additional descriptor
|> slots, then you couldn't line up the shared slots and keep all the
|> non-descriptor slots at the end.

Yes, this is also a problem.

|> 
|> But you don't really need this information.  The garbage collector has
|> to scan each slot anyway, so it just checks for embedded-float
|> headers, and if it finds one, it skips the next slot.

This is only true if the float-header bits can be distinguished from normal
lisp objects, which is not true for my lisp.  If this is true for CMULisp,
then there is no need store floats at the end of the instance, or for that
matter, in a seperate indirect array, to avoid GC confusion.
The embedded design does add another word to each float storage, though.

Let me suggest another alternative.  Have a bit mask for each slot in the
structure, identifying whether the slot is a lisp-object, or raw bits.
You didn't mention it, but let me point out that both seperate array design and 
this one would also allow more than just floats to be stored directly,
such as 8bit-sized slots, etc.
The seperate array design could pack things in better, since this bitmask
design would require a more rigid 32bit alignment.

         (defstruct (plane (:include ship))
           (pilot :type symbol)
           (z 0.0 :type single-float)
           (a 0   :type (unsigned-byte 16))
           (b 0   :type (unsigned-byte 16))
          )
 plane:
        +------------------------------+
        | structure header, w/ length  |
        +------------------------------+
        | pointer to class info        | -->  class has a slot-bitmap 011011
        +------------------------------+
        | pointer to o-fools symbol    | lisp object
        +------------------------------+
        | bits for 1.0                 | nonlisp object (no more header)
        +------------------------------+ 
        | bits for 2.0                 | nonlisp
        +------------------------------+
        | pilot slot                   | lisp object
        +------------------------------+
        | z slot 0.0                   | nonlisp
        +------------------------------+ 
        | a slot       |   b slot      | nonlisp
        +------------------------------+ 

GC now has to consult the class bitmap to do a scan of the object.
This might be a problem for GC overhead,
and it might be better to allocate a header slot in the instance with a 
one-word bitmap for instances with less than 32 slots, or actually use
some upper bits on the length field of the structure header.

Did I forget anything William? :)

-Kelly

From: ············@cs.cmu.edu
Subject: Re: (simple-vector 20) vs (vector single-float 20)
Date: Sat, 29 May 1993 09:19:56 +0000
Message-ID: <og1mew600jePNCYpAv@cs.cmu.edu>

···@prl.ufl.edu (Kelly Murray) writes:
> In article <··················@cs.cmu.edu>, ············@cs.cmu.edu writes: 
> |> But you don't really need this information.  The garbage collector has
> |> to scan each slot anyway, so it just checks for embedded-float
> |> headers, and if it finds one, it skips the next slot.
> 
> This is only true if the float-header bits can be distinguished from normal
> lisp objects, which is not true for my lisp.  If this is true for CMULisp,
> then there is no need store floats at the end of the instance, or for that
> matter, in a seperate indirect array, to avoid GC confusion.
> The embedded design does add another word to each float storage, though.

All the header ``types'' we use are disjoint from regular types, so
all we need to know about any particular word is whether it is a lisp
object (i.e. has a type tag) or a raw 32-bit value.

But the extra header word per non-descriptor slot would be rather
expensive, espically when there are alternate solutions, as you point
out:

> Let me suggest another alternative.  Have a bit mask for each slot in the
> structure, identifying whether the slot is a lisp-object, or raw bits.
> You didn't mention it, but let me point out that both seperate array
> design and this one would also allow more than just floats to be
> stored directly, such as 8bit-sized slots, etc.

We currently use allocate single-float, double-float, and
(unsigned-byte 32) types in the indirect array.  It would be trivial
to add support for other non-descriptor objects.

For types that fit inside a fixnum (e.g. 8-bit bytes) we just leave
them in as regular slots because they don't have to be boxed.

> The seperate array design could pack things in better, since this bitmask
> design would require a more rigid 32bit alignment.

Actually, the bitmap idea can work along with packing.  Just interpret
each bit to indicate whether or not a word (not slot) is a descriptor
object or raw bits.  Then if you have two adjacent (unsigned-byte 16)
slots, you could allocate them in the same word and just mark that
whole word as raw.

> GC now has to consult the class bitmap to do a scan of the object.
> This might be a problem for GC overhead,

Yes, the extra processing overhead at collection time is the main cost
of this method.

> and it might be better to allocate a header slot in the instance with a 
> one-word bitmap for instances with less than 32 slots, or actually use
> some upper bits on the length field of the structure header.

Actually, it probably would be better just to stay consistent.  Each
structure needs the pointer to the class info, so tucking the bitmap
in there would only impose a small size overhead to each class
description.  Having a special short-structure representation could
easily cause problems for :include if the included structure was
short and the including structure ended up too big.

Another related idea that is commonly used in static langauges, is to
generate a class specific scavenger function for each structure type
that has information about which slots need to be scavenged hard wired
into it.  Then the garbage collector just needs to extract the pointer
to this routine from the class-info and call it.  If there are lots of
non-descriptor slots, then this can be a big win because there is *no*
cost for them, unlike with the bitmap where you have to scan the
bitmap checking each bit.

-William Lott
CMU Common Lisp Group