From: Jacek Generowicz
Subject: float to pointer coercion (cost 13)
Date: 
Message-ID: <tyffyy16zit.fsf@lxplus068.cern.ch>
Consider the following four functions and their corresponding CMUCL
compiler notes:

;; add floats, no return type declaration
(defun foo (a b)
  (declare (optimize (speed 3) (debug 0) (safety 0))
           (single-float a b))
  (+ a b))

Note: Doing float to pointer coercion (cost 13) to "<return value>".


;; as above + return type declaration
(defun foo (a b)
  (declare (optimize (speed 3) (debug 0) (safety 0))
           (single-float a b))
  (the single-float (+ a b)))

Note: Doing float to pointer coercion (cost 13) to "<return value>".



;; add ints, no return type declaration
(defun foo (a b)
  (declare (optimize (speed 3) (debug 0) (safety 0))
           (fixnum a b))
  (+ a b))

Note: Doing signed word to integer coercion (cost 20) to "<return value>".



;; as above + return type declaration
(defun foo (a b)
  (declare (optimize (speed 3) (debug 0) (safety 0))
           (fixnum a b))
  (the fixnum (+ a b)))

;; No notes issued at all


Is there a way of avoiding the float to pointer coercion cost for
returned values?

From: Joe Marshall
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <vf6xb59e.fsf@ccs.neu.edu>
Jacek Generowicz <················@cern.ch> writes:

> Is there a way of avoiding the float to pointer coercion cost for
> returned values?

I'm not a CMUCL expert, but I can't see how it could avoid boxing the
result.  It can get away with it for fixnums because they can squeeze
the tag into the low-order bits, but there is no room in a float.
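
To put rough numbers on that (the values shown are what a 32-bit CMUCL
would report; other implementations differ):

;; A 32-bit fixnum keeps the low two bits free for the tag, leaving
;; 30 bits of (signed-byte 30) payload:
(integer-length most-positive-fixnum)   ; => 29 on 32-bit CMUCL
;; An IEEE single-float already needs the full 32 bits (1 sign bit,
;; 8 exponent bits, 23 stored fraction bits), so there is nowhere
;; left to put a tag.  FLOAT-DIGITS reports 24 because the precision
;; includes the implicit hidden bit:
(float-digits 1.0f0)                    ; => 24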
From: Christophe Rhodes
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <sqwtrdqln6.fsf@cam.ac.uk>
Jacek Generowicz <················@cern.ch> writes:

> Is there a way of avoiding the float to pointer coercion cost for
> returned values?

Answering the question you asked: no.

What is happening here?  Let's consider the (+ <fixnum> <fixnum>) case
first.  When the return value of the addition is not declared, cmucl
(quite rightly) cannot deduce that the return value is a fixnum --
because the range of the addition is twice the fixnum range.  However,
because of the representation of integers, it can deduce that twice
the fixnum range nevertheless fits into a 32-bit register.  So cmucl
compiles the addition as something like

  shr $1, 2
  shr $2, 2
  add $3, $1, $2

so that register $3 has a machine-integer representation of the
addition's result.  Unfortunately, that machine-integer representation
does not correspond to the lisp-level representation; since objects
(rather than places or variables) in lisp are typed, some space in the
machine word must be taken up for this type information; in cmucl's
case, a small integer is indicated by having the lowest two bits of
the machine word be zero, with the remaining 30 bits holding the value
as a (signed-byte 30).

Therefore, there is a chance that after this addition $3 will contain
something that cannot be shifted left twice without losing
information; if so, cmucl will have to cons up a bignum to return the
value (which will go to an arbitrary caller, remember); this is the
expensive thing that the compiler warns you about.
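
To make the overflow concrete (standard Common Lisp, any
implementation):

;; The sum of two fixnums need not be a fixnum, so an undeclared
;; result may have to be consed up as a bignum:
(typep (+ most-positive-fixnum most-positive-fixnum) 'fixnum)  ; => NIL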

In the case where the return form is (the fixnum (+ <fixnum> <fixnum>)),
the declaration is a promise to the compiler that the addition will not
exceed the fixnum range.  In that case, the compiler is free to implement
the addition as
  add $3, $1, $2
directly, and use the contents of $3 as the return value.

Now, what about the floats?  Well, similarly, the kernel of the
function looks a little like

  fmov $f1, [$1+1]
  fmov $f2, [$2+1]
  fadd $f3, $f1, $f2

but now there is no way, no matter how the return value is declared,
to treat the floating point value in $f3 as a lisp return value.
Therefore, whether the return is declared or not, cmucl will have to
do something like

  mov  [$3-7], 14  ; single float tag
  fmov [$3+1], $f3

to generate a lisp-level return value.

To forestall the question you might have meant to ask: you can
maintain computational efficiency when using a function that emits one
of these notes on compilation by declaring it inline[*].  If the
function is inlined, then the system does not need
_at that call site_ to generate a lisp value; if the situation is
right, it can preserve the low-level machine representation for the
next stage in the calculation.  The function definition itself might
continue to emit the compiler note; it might be possible to quiet the
compiler with a suitable inhibit-warnings declaration.
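
A minimal sketch of the inline approach (F+ and CALLER are
illustrative names; the definition of F+ may still draw the note, but
the inlined call sites need not box the intermediate result):

(declaim (inline f+))
(defun f+ (a b)
  (declare (optimize (speed 3) (safety 0))
           (single-float a b))
  (+ a b))

(defun caller (x y z)
  (declare (optimize (speed 3) (safety 0))
           (single-float x y z))
  ;; With F+ inlined here, the intermediate single-float can stay in
  ;; a float register; only CALLER's own return value is boxed.
  (f+ (f+ x y) z))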

Christophe

[*] or another technique: compiler macros, block compilation, or
defining the worker function with flet around its caller.
From: Jacek Generowicz
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <m2u0mh9nbj.fsf@genera.local>
Thanks for your careful reply.

Christophe Rhodes <·····@cam.ac.uk> writes:

> but now there is no way, no matter how the return value is declared,
> to treat the floating point value in $f3 as a lisp return value.

I guess I was too optimistic in interpreting the CMUCL manual where it
mentions that it provides non-descriptor representations of
single-float. I was hoping that if both the caller and callee agreed
that the type was guaranteed to be one which has a non-descriptor
representation, then things could be arranged for this representation
to be used to return the value. But now I think I see that this is not
really possible.

> To forestall the question you might have meant to ask: you can
> maintain computational efficiency when you use a function which on
> being compiled emits one of these notes by declaring it to be
> inline[*].

Indeed, that was going to be the next question.
From: Duane Rettig
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <4ekdlniqy.fsf@franz.com>
Jacek Generowicz <················@cern.ch> writes:

> Thanks for your careful reply.
> 
> Christophe Rhodes <·····@cam.ac.uk> writes:
> 
> > but now there is no way, no matter how the return value is declared,
> > to treat the floating point value in $f3 as a lisp return value.
> 
> I guess I was too optimistic in interpreting the CMUCL manual where it
> mentions that it provides non-descriptor representations of
> single-float. I was hoping that if both the caller and callee agreed
> that the type was guaranteed to be one which has a non-descriptor
> representation, then things could be arranged for this representation
> to be used to return the value. But now I think I see that this is not
> really possible.

Allegro CL does in fact provide for "non-descriptor" representations
of both single float and double float.  It is called our "immediate-args"
facility, and it applies to return values as well as arguments.  It
applies not only to single-floats and double-floats but to machine
integers as well; if you have just overflowed the fixnum range, the
result need not be boxed up into a bignum; it can just be compiled as
an immediate, as long as it stays within the 32-bit range (or 64-bit,
for the 64-bit versions).

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: rif
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <wj0sm21vubb.fsf@five-percent-nation.mit.edu>
> Allegro CL does in fact provide for "non-descriptor" representations
> of both single float and double float.  It is called our "immediate-args"
> facility, and it applies to return values as well as arguments.  It
> also not only applies to single-floats and double-floats, but machine
> integers as well; if you have just overflowed a fixnum range, the
> result need not be boxed up into a bignum; it can just be compiled as
> immediate, as long as it stays within the 32 bit range (or 64-bit, for
> 64-bit versions).

May I ask how this works?  Do you compile two versions of the
function, one with immediate-args and one without, and decide which
one to use at compile-time, depending on whether or not the calling
function knows the return type?

rif
From: Duane Rettig
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <4aco8orbs.fsf@franz.com>
rif <···@mit.edu> writes:

> > Allegro CL does in fact provide for "non-descriptor" representations
> > of both single float and double float.  It is called our "immediate-args"
> > facility, and it applies to return values as well as arguments.  It
> > also not only applies to single-floats and double-floats, but machine
> > integers as well; if you have just overflowed a fixnum range, the
> > result need not be boxed up into a bignum; it can just be compiled as
> > immediate, as long as it stays within the 32 bit range (or 64-bit, for
> > 64-bit versions).
> 
> May I ask how this works?  Do you compile two versions of the
> function, one with immediate-args and one without, and decide which
> one to use at compile-time, depending on whether or not the calling
> function knows the return type?

Nope.  The function is compiled once, but it is given two entry
points.  The "normal" entry point is actually at the beginning of
the code vector, and it jumps to a hook function, whose job it is
to unbox the arguments according to the spec and then "call" the
original function again, but to the immediate-args entry point.
Once the function is done, it returns to the hook function, which
boxes up the result.  When a function is compiled that knows about
the immediate-function, it places its arguments according to the calling
standard, and calls the function at its secondary (immediate-args)
entry point.  It is also responsible for handling any unboxed value
coming back, since the function will return directly to the caller.

Note that there are lots of caveats to a system like this, and the
specification method is pretty bogus, which is why we haven't
exported the functionality (though we give unofficial documentation
to any customer who asks for it).  Here is an example on the amd64
(I notice that I need to do some optimizing - there is no need for
all the register movement in the floating point instructions):

CL-USER(1): (setf (get 'foo 'sys::immed-args-call)
              '((double-float double-float) double-float))
((DOUBLE-FLOAT DOUBLE-FLOAT) DOUBLE-FLOAT)
CL-USER(2): (compile (defun foo (x y)
                       (declare (optimize speed (safety 0)) (double-float x y))
                       (+ x y)))
FOO
NIL
NIL
CL-USER(3): (disassemble *)
;; disassembly of #<Function FOO>
;; formals: X EXCL::DF_PLACE-HOLDER Y EXCL::DF_PLACE-HOLDER

;; code start: #x10009348f8:
   0: 41 ff a7 3f 06 jmp	*[r15+1599]  ; SYS::IMMED-ARG-HOOK
      00 00 
   7: f2 44 0f 10 e8 movsd	xmm13,xmm0
  12: f2 44 0f 10 e1 movsd	xmm12,xmm1
  17: f2 45 0f 58 ec addsd	xmm13,xmm12
  22: f2 41 0f 10 c5 movsd	xmm0,xmm13
  27: 4c 8b 74 24 10 movq	r14,[rsp+16]
  32: c3             ret
  33: 90             nop
CL-USER(4): 

So the "normal" entry point is at offset 0, and the "immediate" entry
is at offset 7 on this architecture.  Note that there should be only
one floating point instruction, an "addsd xmm0,xmm1", but that should be
easy to fix.

When the garbage collector encounters one of these functions on the stack,
it has a spec list of argument kinds (one of single-float, double-float,
machine-integer, or lisp) and it will skip over the slots that aren't
lisp slots (if it didn't do this, there would be problems for bit
patterns that happened to look like lisp objects).

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: rif
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <wj0fyy0x6dm.fsf@five-percent-nation.mit.edu>
Duane Rettig <·····@franz.com> writes:

> > May I ask how this works?  Do you compile two versions of the
> > function, one with immediate-args and one without, and decide which
> > one to use at compile-time, depending on whether or not the calling
> > function knows the return type?
> 
> Nope.  The function is compiled once, but it is given two entry
> points.  The "normal" entry point is actually at the beginning of
> the code vector, and it jumps to a hook function, whose job it is
> to unbox the arguments according to the spec and then "call" the
> original function again, but to the immediate-args entry point.
> Once the function is done, it returns to the hook function, which
> boxes up the result.  When a function is compiled that knows about
> the immediate-function, it places its arguments according to the calling
> standard, and calls the function at its secondary (immediate-args)
> entry point.  It is also responsible for handling any unboxed value
> coming back, since the function will return directly to the caller.

That's totally sweet.

rif
From: Kent M Pitman
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <uzmw3o32u.fsf@nhplace.com>
Duane Rettig <·····@franz.com> writes:

> > May I ask how this works?  Do you compile two versions of the
> > function, one with immediate-args and one without, and decide which
> > one to use at compile-time, depending on whether or not the calling
> > function knows the return type?
> 
> Nope.  The function is compiled once, but it is given two entry
> points.  The "normal" entry point is actually at the beginning of
> the code vector, and it jumps to a hook function, whose job it is
> to unbox the arguments according to the spec and then "call" the
> original function again, but to the immediate-args entry point.
> Once the function is done, it returns to the hook function, which
> boxes up the result.  When a function is compiled that knows about
> the immediate-function, it places its arguments according to the calling
> standard, and calls the function at its secondary (immediate-args)
> entry point.  It is also responsible for handling any unboxed value
> coming back, since the function will return directly to the caller.

I assume this technique was borrowed from MACLISP, which did a similar
thing.  Or maybe it came from some contemporary hack in the "Franz Lisp",
which I have always assumed is some conceptual if not actual relative 
of Allegro, though oddly I never did the work to cement their connection
in my brain.

Every time I think of this magic calling sequence, it reminds me of a
funny story that... ah, well, why should I repeat myself? Google knows...

http://groups-beta.google.com/group/comp.lang.lisp/msg/08d9bcb39f41765a
From: Duane Rettig
Subject: Re: float to pointer coercion (cost 13)
Date: 
Message-ID: <4sm1vcqxr.fsf@franz.com>
Kent M Pitman <······@nhplace.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > > May I ask how this works?  Do you compile two versions of the
> > > function, one with immediate-args and one without, and decide which
> > > one to use at compile-time, depending on whether or not the calling
> > > function knows the return type?
> > 
> > Nope.  The function is compiled once, but it is given two entry
> > points.  The "normal" entry point is actually at the beginning of
> > the code vector, and it jumps to a hook function, whose job it is
> > to unbox the arguments according to the spec and then "call" the
> > original function again, but to the immediate-args entry point.
> > Once the function is done, it returns to the hook function, which
> > boxes up the result.  When a function is compiled that knows about
> > the immediate-function, it places its arguments according to the calling
> > standard, and calls the function at its secondary (immediate-args)
> > entry point.  It is also responsible for handling any unboxed value
> > coming back, since the function will return directly to the caller.
> 
> I assume this technique was borrowed from MACLISP, which did a similar
> thing.

No, I didn't realize that Maclisp used that technique.

>  Or maybe it came from some contemporary hack in the "Franz Lisp",
> which I have always assumed is some conceptual if not actual relative 
> of Allegro, though oddly I never did the work to cement their connection
> in my brain.

It's good that you didn't, because the only thing they have had in
common is the developers who have worked on them.  Franz Lisp largely
mirrored Maclisp, and its original purpose was to provide an underpinning
for the copy of Macsyma that Dr. Fateman brought with him to Berkeley.
Allegro CL was started from scratch (though its sequence functions came
essentially from Spice Lisp code) and had very little moved over from
Franz Lisp, except that the Flavors module was ported to Allegro CL
and then stayed there (where it remains today).

> Every time I think of this magic calling sequence, it reminds me of a
> funny story that... ah, well, why should I repeat myself? Google knows...
> 
> http://groups-beta.google.com/group/comp.lang.lisp/msg/08d9bcb39f41765a

I didn't realize that MacLisp had that kind of implementation. It looks
like it has the same redefinition issues that our immed-args system has.
Also, your farther-out idea seems as though, if carried to the extreme,
it would lead to a combinatorial explosion.  But the idea looks
strikingly similar to a more user-driven style that we are all used
to - defstruct boa-constructors.  With such multiple (but limited)
"entry points" (i.e. functions) describable by the user, a compiler
could indeed do such an optimization.
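
For reference, a boa-constructor is the standard defstruct facility
for describing extra, differently-specified entry points to the same
construction code (names below are illustrative):

(defstruct (point (:constructor make-point)           ; keyword constructor
                  (:constructor make-point-xy (x y)))  ; BOA constructor
  (x 0.0 :type single-float)
  (y 0.0 :type single-float))

(make-point :x 1.0 :y 2.0)   ; one entry point
(make-point-xy 1.0 2.0)      ; another, positional entry point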

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182