From: Peter Seibel
Subject: References to symbols in functions?
Date: 
Message-ID: <m3hdstzoxj.fsf@javamonkey.com>
So I was thinking about how function calls work and how a function
such as foo:

  (defun foo () (bar))

when compiled needs to contain a reference to the symbol object BAR.
I.e. it can't just jump to the address of the BAR function because BAR
may get redefined. So my question is this: are the references to
symbols in a compiled function just another kind of object reference
when it comes to garbage collection? That is, if the garbage collector
moves the symbol object, does the pointer to it from within the
function get dealt with in the same way? The more I think about it, the
more it seems that would be the natural way to do it, but who knows.
Also I imagine this might differ from implementation to implementation
so I'd be interested to hear about different implementation
techniques. Implementors?

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Tim Bradshaw
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <ey38ye5jza2.fsf@cley.com>
* Peter Seibel wrote:
> So I was thinking about how function calls work and how a function
> such as foo:

>   (defun foo () (bar))

> when compiled needs to contain a reference to the symbol object BAR.
> I.e. it can't just jump to the address of the BAR function because BAR
> may get redefined. 

Actually, it does not need to refer to the symbol in several
circumstances, which are (I think) at least: if it's a self-recursive
call and there is no NOTINLINE declaration, or if it's a call to a
function defined in the same file (compilation unit?) which has no
NOTINLINE declaration.  The details are in the spec, in the compilation
section.
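
For instance (a sketch; whether a given implementation actually elides
the symbol reference in these cases is up to the implementation):

    ;; Both definitions in one file: the compiler may turn the
    ;; self-call in FACT, and FACT's call to DOUBLE, into direct
    ;; jumps that never go through the symbols' function cells.
    (defun double (x) (* x 2))

    (defun fact (n)
      (if (<= n 1)
          (double 1)
          (* n (fact (1- n)))))     ; self-recursive call

    ;; NOTINLINE withdraws that license: in subsequently compiled
    ;; code, calls to DOUBLE must go through the function cell, so
    ;; redefinitions are always seen.
    (declaim (notinline double))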

--tim
From: Christophe Rhodes
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <sqfz8dh3e0.fsf@cam.ac.uk>
Tim Bradshaw <···@cley.com> writes:

> if it's a call to a function defined in the same file (compilation
> unit?) which has no NOTINLINE declaration.

File.  There is scope in the standard for extending
WITH-COMPILATION-UNIT in this direction, but it emphatically must not
happen by default.  Does any implementation actually have this
extension?
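
(For reference, the standard macro only groups compilations -- it
delays end-of-compilation actions, such as undefined-function
warnings, until the whole body is done; the extension discussed here
would additionally license direct calls across the files in the unit:)

    (with-compilation-unit ()
      (compile-file "a.lisp")
      (compile-file "b.lisp"))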

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
From: Tim Bradshaw
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <ey38ye4rrsj.fsf@cley.com>
* Christophe Rhodes wrote:
> File.  There is scope in the standard for extending
> WITH-COMPILATION-UNIT in this direction, but it emphatically must not
> happen by default.  Does any implementation actually have this
> extension?

It would be nice if they did.

This trick occurred to me in a moment of late-night-wakefulness:

    ;;; File A
    ;;;

    (defun foo/main (...)
      ...)

    (declaim (inline foo))

    (defun foo (...)
      ;; FOO can fast-call FOO/MAIN without going through 
      ;; the function cell of the symbol
      (foo/main ...))

    ;;; End of file A

    ;;; File B, compiled after A is compiled & loaded

    (defun bar (...)
      ... (foo ...))

    ;;; End of file B

Now, I think that BAR can inline FOO but *can't* inline FOO/MAIN;
since FOO can `fast-call' FOO/MAIN, and FOO is inlined in BAR,
I think BAR is licensed to fast-call it too.  Obviously it's
implementation-dependent whether it actually does so.

I was originally thinking that an INLINE declaration licenses
fast-calls as well since, well, if you can inline something you
obviously are not expected to go through the function cell.  But an
INLINE declaration also licenses inlining, of course, which may well
not be what you want because of code-bloat issues.  This (slightly
horrible) trick might sort-of be equivalent to a potential FAST-CALL
declaration.
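
To make it concrete, with an invented one-argument signature (the
(...) above left abstract):

    ;;; File A
    (defun foo/main (x)
      (+ x 1))                      ; stand-in body

    (declaim (inline foo))
    (defun foo (x)
      ;; FOO can fast-call FOO/MAIN without going through
      ;; the function cell of the symbol
      (foo/main x))

    ;;; File B, compiled after A is compiled & loaded
    (defun bar (x)
      (foo x))                      ; FOO inlined here => fast-call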

--tim
From: Thomas Schilling
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <2kfg7oF1nkgmU1@uni-berlin.de>
Peter Seibel wrote:

> So I was thinking about how function calls work and how a function
> such as foo:
> 
>   (defun foo () (bar))
> 
> when compiled needs to contain a reference to the symbol object BAR.
> I.e. it can't just jump to the address of the BAR function because BAR
> may get redefined. So my question is this: are the references to
> symbols in a compiled function just another kind of object reference
> when it comes to garbage collection? That is, if the garbage collector
> moves the symbol object does the pointer to it from within the
> function get dealt with in the same way. The more I think about it the
> more it seems that would be the natural way to do it but who knows.
> Also I imagine this might differ from implementation to implementation
> so I'd be interested to hear about different implementation
> techniques. Implementors?

AFAICT most (all?) Lisp implementations simply use indirect calls, i.e. the
symbol's function entry is at a fixed location in memory. So if you
redefine the function, you just write the address of the new function
into that entry.
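
The indirection is visible at the language level (standard CL, so --
barring inlining -- this behaves the same in any implementation):

    (defun bar () 1)
    (defun foo () (bar))

    (foo)                                   ; => 1

    ;; FOO's code refers to BAR's function cell, not to a fixed code
    ;; address, so a redefinition takes effect on FOO's next call:
    (setf (symbol-function 'bar) (lambda () 2))
    (foo)                                   ; => 2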

Then you have these tricks with several entry points, doing different
degrees of arg parsing. I.e. if a function calls itself recursively it can
typically use the non-parsing entry point. I think these kinds of
entry points are marked with "hairy" somewhere in their names.

One advantage of local function definitions is that they can't be redefined,
so the compiler need not use indirect calls (and need not check for legal
args at runtime, so they're pure C-like calls).

One final note (to make the following SBCL disassembly easier to read):
where implementations do differ is in the calling convention, i.e. many
Lisps don't use CALL/RET. As you see below, SBCL uses pure jumps.

Un(?)fortunately SBCL already performed some optimisation in that simple
code, but at least the disassembly of #'foo is useful. So here's its
output:

CL-USER> (defun bar () 1)
BAR
CL-USER> (defun foo () (bar))
FOO
CL-USER> (disassemble #'foo)
; 092EA3E2:       MOV EAX, [#x92EA3B8]        ; #<FDEFINITION object for BAR>
                                              ; no-arg-parsing entry point
;       E8:       XOR ECX, ECX
;       EA:       PUSH DWORD PTR [EBP-8]
;       ED:       JMP DWORD PTR [EAX+5]
;       F0:       BREAK 10                    ; error trap
;       F2:       BYTE #X02
;       F3:       BYTE #X16                   ; INVALID-ARG-COUNT-ERROR
;       F4:       BYTE #X4D                   ; ECX
; 
NIL
CL-USER> (disassemble #'bar)
; 092AD222:       MOV EDX, 4                  ; no-arg-parsing entry point
;       27:       MOV ECX, [EBP-8]
;       2A:       MOV EAX, [EBP-4]
;       2D:       ADD ECX, 2
;       30:       MOV ESP, EBP
;       32:       MOV EBP, EAX
;       34:       JMP ECX
;       36:       NOP
;       37:       NOP
;       38:       BREAK 10                    ; error trap
;       3A:       BYTE #X02
;       3B:       BYTE #X16                   ; INVALID-ARG-COUNT-ERROR
;       3C:       BYTE #X4D                   ; ECX
; 
NIL

And for examining return-value policies, this might be quite
informative:

CL-USER> (defun foo () (bar) nil)
STYLE-WARNING: redefining FOO in DEFUN
FOO
CL-USER> (disassemble #'foo)
; 09B6C5AA:       MOV EDX, ESP                ; no-arg-parsing entry point
;       AC:       SUB ESP, 12
;       AF:       MOV EAX, [#x9B6C580]        ; #<FDEFINITION object for BAR>
;       B5:       XOR ECX, ECX
;       B7:       MOV [EDX-4], EBP
;       BA:       MOV EBP, EDX
;       BC:       CALL DWORD PTR [EAX+5]
;       BF:       MOV ESP, EBX
;       C1:       MOV EDX, 83886091
;       C6:       MOV ECX, [EBP-8]
;       C9:       MOV EAX, [EBP-4]
;       CC:       ADD ECX, 2
;       CF:       MOV ESP, EBP
;       D1:       MOV EBP, EAX
;       D3:       JMP ECX
;       D5:       NOP
;       D6:       NOP
;       D7:       NOP
;       D8:       BREAK 10                    ; error trap
;       DA:       BYTE #X02
;       DB:       BYTE #X16                   ; INVALID-ARG-COUNT-ERROR
;       DC:       BYTE #X4D                   ; ECX
; 
NIL

(It pushes, in effect, a different return address before calling #'bar.)

BTW: Your approach with the garbage collector would, theoretically, work.
But you'd need a mechanism to tell the GC that an object has moved and that
the old one is discardable, and then let it--immediately, without
interruption--find all references to the old location. So it could become
really slow.  (And it probably wouldn't give much of a speed advantage,
while complicating the implementation, because we'd now have to rely on a
certain GC feature.)
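
For what it's worth, a toy sketch of the usual approach (forwarding
pointers; no real implementation works exactly like this):

    ;; A moved object leaves a forwarding pointer at its old
    ;; location; references found later (e.g. inside code objects)
    ;; are updated by chasing that pointer.
    (defstruct gc-obj data forward)

    (defun gc-move (o)
      "Copy O to its new home, leaving a forwarding pointer."
      (or (gc-obj-forward o)
          (setf (gc-obj-forward o)
                (make-gc-obj :data (gc-obj-data o)))))

    (defun gc-fixup (ref)
      "Return REF's current home, following any forwarding pointer."
      (or (gc-obj-forward ref) ref))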

any comments welcome

-ts
-- 
     ,,       
    \../   /  <<< The LISP Effect
   |_\\ _==__         
__ | |bb|   | _________________________________________________
From: Paul F. Dietz
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <q-ednU1lJ_WHMH_dRVn-sw@dls.net>
Thomas Schilling wrote:

> One final note (for better reading the following SBCL disassemble output).
> Where implementations do differ is the calling convention. I.e. many lisps
> don't use CALL/RET. As you see below SBCL uses pure jumps.

I seem to recall that CALL/RET on the x86 is specially optimized
in recent processors (for branch prediction, for example).  I wonder
if avoiding it is actually causing a performance hit.

	Paul
From: Duane Rettig
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <47jtpozf2.fsf@franz.com>
"Paul F. Dietz" <·····@dls.net> writes:

> Thomas Schilling wrote:
> 
> > One final note (for better reading the following SBCL disassemble output).
> > Where implementations do differ is the calling convention. I.e. many lisps
> > don't use CALL/RET. As you see below SBCL uses pure jumps.
> 
> I seem to recall that CALL/RET on the x86 is specially optimized
> in recent processors (for branch prediction, for example).  I wonder
> if avoiding it is actually causing a performance hit.

Likely so.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Ari Johnson
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <AQAEc.3583$nc.3421@fed1read03>
Paul F. Dietz wrote:

> Thomas Schilling wrote:
> 
>> One final note (for better reading the following SBCL disassemble 
>> output).
>> Where implementations do differ is the calling convention. I.e. many 
>> lisps
>> don't use CALL/RET. As you see below SBCL uses pure jumps.
> 
> 
> I seem to recall that CALL/RET on the x86 is specially optimized
> in recent processors (for branch prediction, for example).  I wonder
> if avoiding it is actually causing a performance hit.

I thought I saw a CALL in that code.  Are you sure the JMP wasn't just 
SBCL doing tail-call elimination in (defun foo () (bar))?
From: Alexander Kjeldaas
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <40E2CCEA.8080906@fast.no>
Paul F. Dietz wrote:
> Thomas Schilling wrote:
> 
>> One final note (for better reading the following SBCL disassemble 
>> output).
>> Where implementations do differ is the calling convention. I.e. many 
>> lisps
>> don't use CALL/RET. As you see below SBCL uses pure jumps.
> 
> 
> I seem to recall that CALL/RET on the x86 is specially optimized
> in recent processors (for branch prediction, for example).  I wonder
> if avoiding it is actually causing a performance hit.
> 

From
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22054.pdf

"The return address stack optimizes CALL/RET instruction pairs by 
storing the return address of each CALL within a nested series of 
subroutines and supplying a return address as the predicted target 
address of the corresponding RET instruction"

But you get similar benefits by using normal branch prediction logic. 
The advantage of the stack is that you never get a collision in the 
table (because it's a stack and not a cache). However, if you have all 
your routines aligned, and you have branches at fixed offsets from the 
beginning of the routines (argument parsing, for example), you can get 
in trouble with the branch predictor.

From the same document:
"The AMD Athlon processor implements a two-way, 2048-entry branch 
prediction table. The branch prediction table stores prediction 
information that is used for predicting the direction of conditional 
branches".

Two-way means you don't want conditional branches at fixed, aligned offsets.

astor
From: Duane Rettig
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <4y8m5njfu.fsf@franz.com>
Alexander Kjeldaas <··········@fast.no> writes:

> Paul F. Dietz wrote:
> > Thomas Schilling wrote:
> >
> 
> >> One final note (for better reading the following SBCL disassemble
> >> output).
> 
> >> Where implementations do differ is the calling
> >> convention. I.e. many lisps
> 
> >> don't use CALL/RET. As you see below SBCL uses pure jumps.
> > I seem to recall that CALL/RET on the x86 is specially optimized
> 
> > in recent processors (for branch prediction, for example).  I wonder
> > if avoiding it is actually causing a performance hit.
> >
> 
> 
> From
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22054.pdf
> 
> "The return address stack optimizes CALL/RET instruction pairs by
> storing the return address of each CALL within a nested series of
> subroutines and supplying a return address as the predicted target
> address of the corresponding RET instruction"
> 
> 
> But you get similar benefits by using normal branch prediction
> logic. The advantage of the stack is that you never get a collision in
> the table (because it's a stack and not a cache). However, if you have
> all your routines aligned, and you have branches at  fixed offsets
> from the beginning of the routines (argument parsing for example), you
> can get in trouble with the branch predictor.

I'm not sure what you mean by "get into trouble".  If you mean "lose the
optimization that would otherwise be done", then that would be correct,
but saying it that way might also lead you to conclude that there is a
huge hit in trying to obtain the optimization and failing.  To my
knowledge, there is not, because the branch prediction stack is always
tried, and so a mismatch merely results in the unoptimized case.

>  From the same document:
> "The AMD Athlon processor implements a two-way, 2048-entry branch
> prediction table. The branch prediction table stores prediction
> information that is used for predicting the direction of conditional
> branches".
> 
> 
> Two-way means you don't want conditional branches at fixed, aligned offsets.

Huh?  Explain that, please.

My AMD optimization guide suggests that branches be aligned in such a
way that they don't cross 16-byte boundaries, which suggests that
alignment of branches should be considered at worst neutral, and at
best optimized.  And at least one AMD developer (actually, a SuSE
Linux developer) recommended to me that branch targets be aligned to
16 bytes, especially when used in tight loops.  These seem to
contradict your statement.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Duane Rettig
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <4brj1ozfu.fsf@franz.com>
Thomas Schilling <······@yahoo.de> writes:

> Peter Seibel wrote:
> 
> > So I was thinking about how function calls work and how a function
> > such as foo:
> > 
> >   (defun foo () (bar))
> > 
> > when compiled needs to contain a reference to the symbol object BAR.
> > I.e. it can't just jump to the address of the BAR function because BAR
> > may get redefined. So my question is this: are the references to
> > symbols in a compiled function just another kind of object reference
> > when it comes to garbage collection? That is, if the garbage collector
> > moves the symbol object does the pointer to it from within the
> > function get dealt with in the same way. The more I think about it the
> > more it seems that would be the natural way to do it but who knows.
> > Also I imagine this might differ from implementation to implementation
> > so I'd be interested to hear about different implementation
> > techniques. Implementors?
> 
> AFAICT most (all?) lisp implementations simply use indirect calls. i.e. the
> place of the symbol function entry is a fixed location in memory. So if you
> redefine the function you just write the address of the new function in the
> entry at the symbol table.

Allegro CL does not do this.  We actually use relative offsets and keep two
registers, one for a "global-table" (like a GOT in many unixen or the TOC
in AIX) which holds many symbols that are globally used, and a function
register that holds local data.  Code then references data in these areas
using normal register/offset references.

We consider the fact that our code vectors have _no_ actual addresses
to be a feature.  Functions can be surprisingly similar; a typical
Allegro CL image with its application can share a surprising number of
code vectors, and this helps it to scale for larger and larger apps
and to maintain locality-of-reference for a longer period.

About 12 years ago, when I first did our AIX port, I tried porting some
concepts over from the Cray port I had done a couple of years before
that - Symbols, when they were tenured (survived ephemeral gcs) were
placed into areas which did not then move them ever again.  This
allowed me to fix code vectors to actually manufacture the symbols'
addresses in memory, and to reduce three chained memory references
down to two parallel (and thus non-chained) memory references to get
the function and its start address (I also made symbols slightly
larger and copied the current start address into the symbol to get
this parallelism).

This technique worked to speed up calls, but not as much as I had
hoped; even though the "slower" calls were chained, there was enough
distance between references, _and_ the call-code was always in cache
because it was the only such code ever used, so that there were only
a couple of percentage points speedup (where I had expected at least
10 - 15%).  In the end, what really killed this concept was the
non-scalability of this style; code vectors could never be merged
(since they now contained or could contain hard memory addresses),
and the few percent speedup was lost due to the loss of
locality-of-reference from having so many more code vectors, even with
the cache-poor machines back then.  I eventually killed that port
and went back to our traditional, more scalable style (see below).

> Then you have these tricks with several entry points, doing different degree
> of arg parsing. I.e. if a function calls itself recursively it can
> typically use the non-parsing entry point. I think these kind of
> entry-points are marked with some "hairy" in their names.

We don't tend to identify non-parsing entry points, though conceptually
they tend to be there for self-calls.  Instead, we simply remove entry
sequences that are not necessary for specific optimization settings.

> One advantage of local function definitions is that they can't be redefined,
> so the compiler need not use indirect calls (and need not check for legal
> args at runtime, so they're pure C-like calls).

I think it was Tim that already answered when this is allowed, and when not.

> One final note (for better reading the following SBCL disassemble output).
> Where implementations do differ is the calling convention. I.e. many lisps
> don't use CALL/RET. As you see below SBCL uses pure jumps.

I shudder at this thought.  Modern machines have "branch prediction"
stacks, which pair off calls and rets and make these specialized
jumps much more efficient than just a jump instruction.  I see in
the sbcl code below that the last touching of the ecx register is two
instructions earlier than the jmp, so this may or may not be enough time
to allow enough on-the-fly prediction to remove most of the instruction
pipeline stalls.  However, I can't imagine not using the hardware speed
assistance when it is available....

> Un(?)fortunately SBCL already performed some optimisation in that simple
> code, but at least the disassembly of #'foo is useful. So here's its
> output:

I intersperse Allegro CL equivalent code.  I gave a (speed 3), (debug 0)
optimization level, to show what seems the equivalent of this sbcl code.

> CL-USER> (defun bar () 1)
> BAR
> CL-USER> (defun foo () (bar))
> FOO
> CL-USER> (disassemble #'foo)
> ; 092EA3E2:       MOV EAX, [#x92EA3B8]        ; #<FDEFINITION object for BAR>
>                                               ; no-arg-parsing entry point
> ;       E8:       XOR ECX, ECX
> ;       EA:       PUSH DWORD PTR [EBP-8]
> ;       ED:       JMP DWORD PTR [EAX+5]
> ;       F0:       BREAK 10                    ; error trap
> ;       F2:       BYTE #X02
> ;       F3:       BYTE #X16                   ; INVALID-ARG-COUNT-ERROR
> ;       F4:       BYTE #X4D                   ; ECX
> ; 
> NIL
> CL-USER> (disassemble #'bar)
> ; 092AD222:       MOV EDX, 4                  ; no-arg-parsing entry point
> ;       27:       MOV ECX, [EBP-8]
> ;       2A:       MOV EAX, [EBP-4]
> ;       2D:       ADD ECX, 2
> ;       30:       MOV ESP, EBP
> ;       32:       MOV EBP, EAX
> ;       34:       JMP ECX
> ;       36:       NOP
> ;       37:       NOP
> ;       38:       BREAK 10                    ; error trap
> ;       3A:       BYTE #X02
> ;       3B:       BYTE #X16                   ; INVALID-ARG-COUNT-ERROR
> ;       3C:       BYTE #X4D                   ; ECX
> ; 
> NIL

Our versions are as follows.  Note that in #'foo, 'BAR is taken
from the function object as an offset from the function register
(esi, in this case):

CL-USER(8): (compile (defun foo () (bar)))
FOO
NIL
NIL
CL-USER(9): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: [none]
;; constant vector:
0: BAR

;; code start: #x10b10c9c:
   0: e3 02       jcxz	4
   2: cd 61       int	$97             ; SYS::TRAP-ARGERR
   4: 80 7f cb 00 cmpb	[edi-53],$0     ; SYS::C_INTERRUPT-PENDING
   8: 74 02       jz	12
  10: cd 64       int	$100            ; SYS::TRAP-SIGNAL-HIT
  12: 8b 5e 12    movl	ebx,[esi+18]    ; BAR
  15: b1 00       movb	cl,$0
  17: ff e7       jmp	*edi
  19: 90          nop
CL-USER(10): 

Note that if I call a much-used function like #'list, it is
taken from the global table, which is always available to us
for free because we always center the global table around NIL
(register edi in the x86 case).  Note also that the tail-call
is a jmp, which still doesn't throw off the branch-prediction
stack: the jmp doesn't touch the prediction stack, so the stack
still holds the return address that will next be RET'd to...

CL-USER(28): (compile (defun foo (x y) (list x y)))
FOO
NIL
NIL
CL-USER(29): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: [none]

;; code start: #x10b75f04:
   0: 83 f9 02    cmpl	ecx,$2
   3: 74 02       jz	7
   5: cd 61       int	$97             ; SYS::TRAP-ARGERR
   7: 80 7f cb 00 cmpb	[edi-53],$0     ; SYS::C_INTERRUPT-PENDING
  11: 74 02       jz	15
  13: cd 64       int	$100            ; SYS::TRAP-SIGNAL-HIT
  15: 33 c9       xorl	ecx,ecx
  17: b1 02       movb	cl,$2
  19: ff 67 2f    jmp	*[edi+47]       ; LIST
CL-USER(30): 

> And for examination of return value policies this might be quite
> informative:
> 
> CL-USER> (defun foo () (bar) nil)
> STYLE-WARNING: redefining FOO in DEFUN
> FOO
> CL-USER> (disassemble #'foo)
> ; 09B6C5AA:       MOV EDX, ESP                ; no-arg-parsing entry point
> ;       AC:       SUB ESP, 12
> ;       AF:       MOV EAX, [#x9B6C580]        ; #<FDEFINITION object for BAR>
> ;       B5:       XOR ECX, ECX
> ;       B7:       MOV [EDX-4], EBP
> ;       BA:       MOV EBP, EDX
> ;       BC:       CALL DWORD PTR [EAX+5]

Note that since this call instruction isn't paired off with a ret
instruction at the called code, the prediction stack becomes
mismatched, and could actually slow down any intermediate non-lisp
code (C libraries and the like, depending on the depth of the
prediction stack).

> ;       BF:       MOV ESP, EBX
> ;       C1:       MOV EDX, 83886091
> ;       C6:       MOV ECX, [EBP-8]
> ;       C9:       MOV EAX, [EBP-4]
> ;       CC:       ADD ECX, 2
> ;       CF:       MOV ESP, EBP
> ;       D1:       MOV EBP, EAX
> ;       D3:       JMP ECX
> ;       D5:       NOP
> ;       D6:       NOP
> ;       D7:       NOP
> ;       D8:       BREAK 10                    ; error trap
> ;       DA:       BYTE #X02
> ;       DB:       BYTE #X16                   ; INVALID-ARG-COUNT-ERROR
> ;       DC:       BYTE #X4D                   ; ECX
> ; 
> NIL
> 
> (It, kind of, pushes a different return address, before calling #'bar.)

And here is ours; similar, but again, we use ret to return.

CL-USER(24): (compile (defun foo () (bar) nil))
FOO
NIL
NIL
CL-USER(25): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: [none]
;; constant vector:
0: BAR

;; code start: #x10b6d384:
   0: 55          pushl	ebp
   1: 8b ec       movl	ebp,esp
   3: 56          pushl	esi
   4: 83 ec 24    subl	esp,$36
   7: e3 02       jcxz	11
   9: cd 61       int	$97             ; SYS::TRAP-ARGERR
  11: 80 7f cb 00 cmpb	[edi-53],$0     ; SYS::C_INTERRUPT-PENDING
  15: 74 02       jz	19
  17: cd 64       int	$100            ; SYS::TRAP-SIGNAL-HIT
  19: 8b 5e 12    movl	ebx,[esi+18]    ; BAR
  22: b1 00       movb	cl,$0
  24: ff d7       call	*edi
  26: 8b c7       movl	eax,edi
  28: f8          clc
  29: c9          leave
  30: 8b 75 fc    movl	esi,[ebp-4]
  33: c3          ret
CL-USER(26): 

Finally, here is an example of code-vector sharability; the
code-vectors in this example are not shared, because I am using
a lisp that doesn't have any shared-codevectors.  But if you
note that the actual bytes in the code-vectors are precisely
the same, these two code-vectors can be shared, when placed
into a .pll file (our version of a pure/read-only/shared space).
For clarity, I added (safety 0) to the optimize declamation:

CL-USER(13): (compile (defun foo () (bar)))
FOO
NIL
NIL
CL-USER(14): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: [none]
;; constant vector:
0: BAR

;; code start: #x10b1507c:
   0: 8b 5e 12    movl	ebx,[esi+18]    ; BAR
   3: b1 00       movb	cl,$0
   5: ff e7       jmp	*edi
   7: 90          nop
CL-USER(15): 


CL-USER(16): (compile (defun bas () (bam)))
Warning: While compiling these undefined functions were referenced: BAM.
BAS
NIL
NIL
CL-USER(17): (disassemble 'bas)
;; disassembly of #<Function BAS>
;; formals: [none]
;; constant vector:
0: BAM

;; code start: #x10b17524:
   0: 8b 5e 12    movl	ebx,[esi+18]    ; BAM
   3: b1 00       movb	cl,$0
   5: ff e7       jmp	*edi
   7: 90          nop
CL-USER(18): 

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Helmut Eller
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <m2n02l7xux.fsf@stud3.tuwien.ac.at>
Duane Rettig <·····@franz.com> writes:

[lots of interesting stuff deleted]
> Allegro CL does not do this.  We actually use relative offsets and keep two
> registers, one for a "global-table" (like a GOT in many unixen or the TOC
> in AIX) which holds many symbols that are globally used, and a function
> register that holds local data.  Code then references data in these areas
> using normal register/offset references.
[...]
> Our versions are as follows.  Note that in #'foo, 'BAR is taken
> from the function object as an offset from the function register
> (esi, in this case):
[...]
> CL-USER(9): (disassemble 'foo)
> ;; disassembly of #<Function FOO>
> ;; formals: [none]
> ;; constant vector:
> 0: BAR
>
> ;; code start: #x10b10c9c:
>    0: e3 02       jcxz	4
>    2: cd 61       int	$97             ; SYS::TRAP-ARGERR
>    4: 80 7f cb 00 cmpb	[edi-53],$0     ; SYS::C_INTERRUPT-PENDING
>    8: 74 02       jz	12
>   10: cd 64       int	$100            ; SYS::TRAP-SIGNAL-HIT
>   12: 8b 5e 12    movl	ebx,[esi+18]    ; BAR
>   15: b1 00       movb	cl,$0
>   17: ff e7       jmp	*edi
>   19: 90          nop
> CL-USER(10): 

And edi is the pointer to the "global-table" right?  So you move 'BAR
(or the function for BAR) to ebx and then jump to some code in the
first slot of the global-table.  I assume that code performs some
magic and eventually jumps to BAR via ebx.

First question: don't you have poor branch target prediction in the
code at edi[0]?  Many calls go through that code and most of the time
with different values for ebx; the BTB will not work efficiently.

Second question: wouldn't it be faster to inline the code in edi[0]
instead of sharing it?  The code would get a bit larger but the BTB
could work better.

Helmut.
From: Duane Rettig
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <4pt7gonzp.fsf@franz.com>
Helmut Eller <········@stud3.tuwien.ac.at> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> [lots of interesting stuff deleted]
> > ;; code start: #x10b10c9c:
> >    0: e3 02       jcxz	4
> >    2: cd 61       int	$97             ; SYS::TRAP-ARGERR
> >    4: 80 7f cb 00 cmpb	[edi-53],$0     ; SYS::C_INTERRUPT-PENDING
> >    8: 74 02       jz	12
> >   10: cd 64       int	$100            ; SYS::TRAP-SIGNAL-HIT
> >   12: 8b 5e 12    movl	ebx,[esi+18]    ; BAR
> >   15: b1 00       movb	cl,$0
> >   17: ff e7       jmp	*edi
> >   19: 90          nop
> > CL-USER(10): 
> 
> And edi is the pointer to the "global-table" right?

Right.  It is also the NIL register, and in the case of x86 (and
a few other architectures) it is also the "trampoline register".

>  So you move 'BAR
> (or the function for BAR) to ebx and then jump to some code in the
> first slot of the global-table.  I assume that code performs some
> magic and eventually jumps to BAR via ebx.

Yes; the trampoline also allows multiple calling-style options, including
stack-overflow detection (always on), call-counting (without
having to recompile any code), possible stepping and/or tracing of
function starts, and saving of arguments that are normally passed in
registers [in Allegro CL on the x86, the first two arguments are
passed in eax and edx, respectively, and they do not get saved
anywhere unless the argument-saving option is specified.  This
results in much faster code in general, but it also reduces
debuggability, since the debugger doesn't know where the called
function has stored the values, if at all - the :args command allows
you to decide for the trampoline to arrange to store these register
arguments, at a slight cost in speed, in a known location.  C compilers
give you this option as well, but it generally tends to be a compiler
option only, so code you want to debug must be recompiled e.g. with -g]


> First question: don't you have poor branch target prediction in the
> code at edi[0]?

Not any worse than any other code.

> Many calls go through that code and most of the time
> with different values for ebx; the BTB will not work efficiently.

BTBs tend to be most useful in loops.  I'm not sure what the
characteristics would be for calls.  Note that the taken/not-taken
prediction is easy (these calls are always taken).  However, since
the BTB is of limited size, I don't think it scales well.  I'm not
an expert on these, though.

The branch prediction _stack_ is primarily used to speed up returns,
and it does so reliably as long as the stack is kept synchronized, and
as long as the recursion depth is not larger than the stack.

> Second question: wouldn't it be faster to inline the code in edi[0]
> instead of sharing it?  The code would get a bit larger but the BTB
> could work better.

But how well does the BTB work anyway, in the face of a copying
garbage-collector?  In our code generation, the branches _to_ *edi
are always predictable and never change, thus allowing them to be
zero cost.  Also, the code at *edi is always in cache, because it is
used so often.  If we were to inline the call code, then we would have
to deal with hits on code size itself (reducing the likelihood that the
code is in the cache), as well as whole BTB invalidation phases in the
code execution immediately after gcs.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Helmut Eller
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <m2fz8c949y.fsf@stud3.tuwien.ac.at>
Duane Rettig <·····@franz.com> writes:

> But how well does the BTB work anyway, in the face of a copying
> garbage-collector? 

Hard to tell without benchmarking.  Intuitively I would say that
frequently executed functions and code objects are in the older
generations and so rarely moved/updated that it hasn't much impact on
BTB performance.

> In our code generation, the branches _to_ *edi
> are always predictable and never change, thus allowing them to be
> zero cost.  Also, the code at *edi is always in cache, because it is
> used so often.  If we were to inline the call code, then we would have
> to deal with hits on code size itself (reducing the likelihood that the
> code is in the cache), as well as whole BTB invalidation phases in the
> code execution immediately after gcs.

My idea was to inline only the final jump, i.e. instead of writing 
 
   jmp *edi

use something like

   call *edi
   jmp *[esi+<offset>]

Most of the code would still be shared, but the call site gets its own
entry in the BTB and a good chance that the target will be predicted
correctly.  Of course it's impossible to say if this is actually faster
without benchmarking, so I'd better stop here :-)

Helmut.
From: Duane Rettig
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <4lli4ogxm.fsf@franz.com>
Helmut Eller <········@stud3.tuwien.ac.at> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > But how well does the BTB work anyway, in the face of a copying
> > garbage-collector? 
> 
> Hard to tell without benchmarking.  Intuitively I would say that
> frequently executed functions and code objects are in the older
> generations and so rarely moved/updated that it hasn't much impact on
> BTB performance.
> 
> > In our code generation, the branches _to_ *edi
> > are always predictable and never change, thus allowing them to be
> > zero cost.  Also, the code at *edi is always in cache, because it is
> > used so often.  If we were to inline the call code, then we would have
> > to deal with hits on code size itself (reducing the likelihood that the
> > code is in the cache), as well as whole BTB invalidation phases in the
> > code execution immediately after gcs.
> 
> My idea was to inline only the final jump, i.e. instead of writing 
>  
>    jmp *edi
> 
> use something like
> 
>    call *edi
>    jmp *[esi+<offset>]
> 
> Most of the code would still be shared, but the call site gets its own
> entry in the BTB and a good chance that the target will be predicted
> correctly.  Of course it's impossible to say if this is actually faster
> without benchmarking, so I'd better stop here :-)

This is an intriguing idea.  It would bring back logistical problems
for our profiler, which looks at the function register to determine what
function the timer hit has stopped on, so if it stopped on the jmp
instruction, it would have to throw that hit away.  It is also completely
unintuitive that adding yet _another_ jump instruction would result in
more efficient branching.  But it certainly seems worth looking into.

AMD's BTB has 2048 entries.  Do we know what the P4 has?

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Paul Dietz
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <40E32F43.B746BA66@motorola.com>
Helmut Eller wrote:
> 
> Duane Rettig <·····@franz.com> writes:
> 
> > But how well does the BTB work anyway, in the face of a copying
> > garbage-collector?
> 
> Hard to tell without benchmarking.  Intuitively I would say that
> frequently executed functions and code objects are in the older
> generations and so rarely moved/updated that it hasn't much impact on
> BTB performance.

Wouldn't the instruction cache be invalidated after code
was moved?  That would (I suspect) dominate the effect on the BTB.

	Paul
From: Duane Rettig
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <4hdssoecp.fsf@franz.com>
Paul Dietz <············@motorola.com> writes:

> Helmut Eller wrote:
> > 
> > Duane Rettig <·····@franz.com> writes:
> > 
> > > But how well does the BTB work anyway, in the face of a copying
> > > garbage-collector?
> > 
> > Hard to tell without benchmarking.  Intuitively I would say that
> > frequently executed functions and code objects are in the older
> > generations and so rarely moved/updated that it hasn't much impact on
> > BTB performance.
> 
> Wouldn't the instruction cache be invalidated after code
> was moved?  That would (I suspect) dominate the effect on the BTB.

Helmut's point is that older code tends to be moved less than new
code.  And it is indeed true that I've seen in our profiler
otherwise-unexplained hits on the very first instruction of a code
vector (which is what would be seen if the BTB failed).  I suspect that
in small applications where gcs are rare (especially benchmarks :-)
it might make a sizeable difference.

Anyway, a BTB miss invalidates that entry, whether the miss was due
to gc moving code or jumping from a too-general funnelling point.
As I told Helmut, my interest is piqued.

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Christopher C. Stacy
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <ueknxj5pz.fsf@news.dtpq.com>
>>>>> On Wed, 30 Jun 2004 17:35:18 GMT, Helmut Eller ("Helmut") writes:
 Helmut> And edi is the pointer to the "global-table" right?

Edi's always pointing at various useful things.
From: Thomas Schilling
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <2kgbliF21vrcU1@uni-berlin.de>
Duane Rettig wrote:

>> One final note (for better reading the following SBCL disassemble
>> output). Where implementations do differ is the calling convention. I.e.
>> many lisps don't use CALL/RET. As you see below SBCL uses pure jumps.
> 
> I shudder at this thought.  Modern machines have "branch prediction"
> stacks, which pairs off calls and rets and which causes these specialized
> jumps to be much more efficient than just a jump instruction.  I see in
> the sbcl code below that the last touching of the ecx register is two
> instructions earlier than the jmp, so this may or may not be enough time
> to allow enough on-the-fly prediction to remove most of the instruction
> pipeline stalls.  However, I can't imagine not using the hardware speed
> assistance when it is available....

I'm sorry for stating such nonsense.

SBCL *does* use at least CALL, as the following shows.  (I had already
wondered how you could get the instruction pointer onto the stack on an
x86 without using CALL.)

CL-USER> (defun fac (x)
           (if (< x 2)
               1
               (* x (fac (1- x)))))
FAC
CL-USER> (disassemble #'fac)
; 095EA916:       MOV EDX, [EBP-12]           ; no-arg-parsing entry point
;       19:       MOV EDI, 8
;       1E:       CALL #x1000380              ; GENERIC-<
;       23:       MOV ESP, EBX
;       25:       CMP EDX, 83886091
;       2B:       JNE L1
;       2D:       MOV EDX, [EBP-12]
;       30:       MOV EDI, 4
;       35:       CALL #x10001FA              ; GENERIC--
;       3A:       MOV ESP, EBX
;       3C:       MOV EBX, ESP
;       3E:       SUB ESP, 12
;       41:       MOV EAX, [#x95EA8E8]        ; #<FDEFINITION object for FAC>
;       47:       MOV ECX, 4
;       4C:       MOV [EBX-4], EBP
;       4F:       MOV EBP, EBX
;       51:       CALL DWORD PTR [EAX+5]
;       54:       MOV ESP, EBX
;       56:       MOV EDI, EDX
;       58:       MOV EDX, [EBP-12]
;       5B:       CALL #x1000266              ; GENERIC-*
;       60:       MOV ESP, EBX
;       62: L0:   MOV ECX, [EBP-8]
;       65:       MOV EAX, [EBP-4]
;       68:       ADD ECX, 2
;       6B:       MOV ESP, EBP
;       6D:       MOV EBP, EAX
;       6F:       JMP ECX
;       71: L1:   MOV EDX, 4
;       76:       JMP L0
;       78:       BREAK 10                    ; error trap
;       7A:       BYTE #X02
;       7B:       BYTE #X16                   ; INVALID-ARG-COUNT-ERROR
;       7C:       BYTE #X4D                   ; ECX
; 
NIL

Though I can't find a RET, just the JMP ECX at 6F, where ECX is a value
from the stack that looks like the return address. Obviously predefined
functions are called directly.

What's interesting is that the recursion is an indirect call. Hmmm ...
Possibly to allow redefinition of running recursive functions.

-- 
     ,,       
    \../   /  <<< The LISP Effect
   |_\\ _==__         
__ | |bb|   | _________________________________________________
From: Joe Marshall
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <acyl2enp.fsf@ccs.neu.edu>
Thomas Schilling <······@yahoo.de> writes:

> AFAICT most (all?) lisp implementations simply use indirect calls. i.e. the
> place of the symbol function entry is a fixed location in memory. So if you
> redefine the function you just write the address of the new function in the
> entry at the symbol table.

MIT Scheme uses UUO-links.  Each code block has a `jump table' that
contains the address of the functions that are called.  When a code
block is created or loaded, the jump table points to the linker.  When
the linker is invoked, it finds the function entry point and puts that
in the jump table (it `snaps' the UUO-link).  When you redefine a
function, all callers of the function are patched up.
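A rough model of that snapping, with invented names (not MIT Scheme's
actual code): a caller's jump-table entry starts out pointing at a
linker stub, which resolves the target on first call and patches the
entry in place; redefinition then patches all snapped callers:

```python
# Hypothetical sketch of UUO-link "snapping".  A jump-table entry
# begins as a linker stub; the first call through it resolves the
# target, records the call site, and overwrites the entry.

definitions = {"bar": lambda: "hello"}
callers_of = {"bar": []}          # snapped call sites, for later patching

def make_linker_stub(table, index, name):
    def stub(*args):
        target = definitions[name]
        table[index] = target                  # snap the link
        callers_of[name].append((table, index))
        return target(*args)
    return stub

jump_table = [None]               # one caller's jump table
jump_table[0] = make_linker_stub(jump_table, 0, "bar")

jump_table[0]()                   # first call goes through the linker
snapped = jump_table[0] is definitions["bar"]  # now a direct reference

def redefine(name, fn):
    definitions[name] = fn
    for table, index in callers_of[name]:
        table[index] = fn         # patch every snapped caller

redefine("bar", lambda: "world")
```

After snapping, calls are direct; the cost of indirection is paid only
on the first call or after a redefinition.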

The LMI K-machine had some special hardware for doing this lazily.

> Then you have these tricks with several entry points, doing different degree
> of arg parsing. I.e. if a function calls itself recursively it can
> typically use the non-parsing entry point. I think these kind of
> entry-points are marked with some "hairy" in their names.

UUO-links can take advantage of special entry points.
From: Thomas Schilling
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <2kgc9rF21vrcU2@uni-berlin.de>
Joe Marshall wrote:

> Thomas Schilling <······@yahoo.de> writes:
> 
>> AFAICT most (all?) lisp implementations simply use indirect calls. i.e.
>> the place of the symbol function entry is a fixed location in memory. So
>> if you redefine the function you just write the address of the new
>> function in the entry at the symbol table.
> 
> MIT Scheme uses UUO-links.  Each code block has a `jump table' that
> contains the address of the functions that are called.  When a code
> block is created or loaded, the jump table points to the linker.  When
> the linker is invoked, it finds the function entry point and puts that
> in the jump table (it `snaps' the UUO-link).  When you redefine a
> function, all callers of the function are patched up.
> 
> The LMI K-machine had some special hardware for doing this lazily.
> 
>> Then you have these tricks with several entry points, doing different
>> degree of arg parsing. I.e. if a function calls itself recursively it can
>> typically use the non-parsing entry point. I think these kind of
>> entry-points are marked with some "hairy" in their names.
> 
> UUO-links can take advantage of special entry points.

Yes, I had already thought about this approach being possible. Maybe it
doesn't give you that much of a performance improvement. (I.e. is
prediction of an indirect call as fast as for a direct call?)

-- 
     ,,       
    \../   /  <<< The LISP Effect
   |_\\ _==__         
__ | |bb|   | _________________________________________________
From: Peter Seibel
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <m38ye5yxmv.fsf@javamonkey.com>
Thomas Schilling <······@yahoo.de> writes:

> Peter Seibel wrote:
>
>> So I was thinking about how function calls work and how a function
>> such as foo:
>> 
>>   (defun foo () (bar))
>> 
>> when compiled needs to contain a reference to the symbol object BAR.
>> I.e. it can't just jump to the address of the BAR function because BAR
>> may get redefined. So my question is this: are the references to
>> symbols in a compiled function just another kind of object reference
>> when it comes to garbage collection? That is, if the garbage collector
>> moves the symbol object does the pointer to it from within the
>> function get dealt with in the same way. The more I think about it the
>> more it seems that would be the natural way to do it but who knows.
>> Also I imagine this might differ from implementation to implementation
>> so I'd be interested to hear about different implementation
>> techniques. Implementors?
>
> AFAICT most (all?) lisp implementations simply use indirect calls.
> i.e. the place of the symbol function entry is a fixed location in
> memory. So if you redefine the function you just write the address
> of the new function in the entry at the symbol table.

So are you saying the compiled function simply contains the raw
address of the entry points of the other functions it calls? Then how
are those fixed addresses managed? What happens if I define a function
BAR, define FOO that calls BAR. Then unintern both the FOO and BAR
symbols and then define new BAR and FOO functions (i.e. with new
symbols as names)? Is there some way to eventually recycle fixed
symbol function entry addresses of the original functions?

> BTW: Your approach with the garbage collector would, theoretically,
> work. But you'd need a mechanism to tell the gc that an object has
> moved, the old one is discardable and let it then--immediately,
> without interruption--look for all references to the old location.
> So it could become really slow. (And it probably wouldn't give too
> much speed advantage, while complicating the implementation because
> we now have to rely on a certain gc feature.)

But don't garbage collectors already have to do this for all other
kinds of objects. Any copying garbage collector (including
generational collectors) has to be able to move objects, right? Why
should function objects be treated differently?

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Duane Rettig
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <43c4doyzt.fsf@franz.com>
Peter Seibel <·····@javamonkey.com> writes:

> Thomas Schilling <······@yahoo.de> writes:
> 
> > Peter Seibel wrote:
> >
> >> So I was thinking about how function calls work and how a function
> >> such as foo:
> >> 
> >>   (defun foo () (bar))
> >> 
> >> when compiled needs to contain a reference to the symbol object BAR.
> >> I.e. it can't just jump to the address of the BAR function because BAR
> >> may get redefined. So my question is this: are the references to
> >> symbols in a compiled function just another kind of object reference
> >> when it comes to garbage collection? That is, if the garbage collector
> >> moves the symbol object does the pointer to it from within the
> >> function get dealt with in the same way. The more I think about it the
> >> more it seems that would be the natural way to do it but who knows.
> >> Also I imagine this might differ from implementation to implementation
> >> so I'd be interested to hear about different implementation
> >> techniques. Implementors?
> >
> > AFAICT most (all?) lisp implementations simply use indirect calls.
> > i.e. the place of the symbol function entry is a fixed location in
> > memory. So if you redefine the function you just write the address
> > of the new function in the entry at the symbol table.
> 
> So are you saying the compiled function simply contains the raw
> address of the entry points of the other functions it calls? Then how
> are those fixed addresses managed? What happens if I define a function
> BAR, define FOO that calls BAR. Then unintern both the FOO and BAR
> symbols and then define new BAR and FOO functions (i.e. with new
> symbols as names)? Is there some way to eventually recycle fixed
> symbol function entry addresses of the original functions?

In the case of Allegro CL, the function object is distinct from the
code vector it uses (which might also be used by other functions).
So the containers of specific addresses are done via the function
objects, not the code vectors.  Code vectors, just like other lisp
objects, might or might not move, depending on what space they are
in, and so the gc must allow for the potential movement of not
only code vectors, but addresses within those code vectors (i.e.,
a return address stored on the stack when that code called another
function).

> > BTW: Your approach with the garbage collector would, theoretically,
> > work. But you'd need a mechanism to tell the gc that an object has
> > moved, the old one is discardable and let it then--immediately,
> > without interruption--look for all references to the old location.
> > So it could become really slow. (And it probably wouldn't give too
> > much speed advantage, while complicating the implementation because
> > we now have to rely on a certain gc feature.)
> 
> But don't garbage collectors already have to do this for all other
> kinds of objects. Any copying garbage collector (including
> generational collectors) has to be able to move objects, right? Why
> should function objects be treated differently?

In Allegro CL, the gc is what is doing the moving of all objects,
so there is no need to inform the gc what it already knows it is
doing.  Code start and return addresses are only treated specially
in the sense that they are random pointers somewhere into the middle
of a lisp object, whereas other lisp pointers to be forwarded tend
to be tagged pointers to the objects themselves.  Otherwise, the
treatment is the same as for other lisp objects.
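The forwarding of such interior pointers can be sketched abstractly
(addresses and layout invented for illustration): find the moved
object that contains the address, then rebase the address by the same
displacement the object moved:

```python
# Hypothetical sketch: when a copying GC moves a code vector, a pointer
# into its middle (e.g. a return address saved on the stack) is
# forwarded by locating the containing object and keeping the offset.

# (old_start, length, new_start) for each moved code vector
moved_objects = [(0x1000, 0x80, 0x5000)]

def forward_interior(addr):
    for old_start, length, new_start in moved_objects:
        if old_start <= addr < old_start + length:
            return new_start + (addr - old_start)  # same offset, new base
    return addr                                    # object did not move

return_address = 0x1024   # points into the middle of the code vector
```

The only extra work compared to a tagged pointer is the containment
search; the relocation itself is the same base-plus-offset adjustment.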

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Peter Seibel
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <m3wu1pxbq9.fsf@javamonkey.com>
Duane Rettig <·····@franz.com> writes:

> In the case of Allegro CL, the function object is distinct from the
> code vector it uses (which might also be used by other functions).
> So the containers of specific addresses are done via the function
> objects, not the code vectors.

Is the code vector the thing that actually contains the machine code?
And the function objects contain references to the symbols or
functions they call? I'm sort of lost. Can you explain how the code
generated for:

  (defun foo () (bar))

finds the current definition for the function named BAR (assuming BAR
is declared NOTINLINE, etc.) when FOO is called.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Duane Rettig
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <4u0wsoqqr.fsf@franz.com>
Peter Seibel <·····@javamonkey.com> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > In the case of Allegro CL, the function object is distinct from the
> > code vector it uses (which might also be used by other functions).
> > So the containers of specific addresses are done via the function
> > objects, not the code vectors.
> 
> Is the code vector the thing that actually contains the machine code?

Yes.

> And the function objects contain references to the symbols or
> functions they call?

Yes.

> I'm sort of lost.

Doesn't sound like it to me.

> Can you explain how the code generated for:
> 
>   (defun foo () (bar))
> 
> finds the current definition for the function named BAR (assuming BAR
> is declared NOTINLINE, etc.) when FOO is called.

Sure.  The inspector and disassemble are your friends, here - with
(declaim (OPTIMIZE (SAFETY 0) (SPACE 1) (SPEED 3) (DEBUG 0))) to
reduce clutter:

CL-USER(9): (compile (defun foo () (bar)))
Warning: While compiling these undefined functions were referenced: BAR.
FOO
NIL
NIL
CL-USER(10): #'foo
#<Function FOO>
CL-USER(11): :i *
A NEW #<Function FOO>
  lambda-list: NIL
   0 excl-type ----> Bit field: #x08
   1 flags --------> Bit field: #xa8
   2 start --------> Bit field: #x71b18734
   3 hash ---------> Bit field: #x00009874
   4 symdef -------> The symbol FOO
   5 code ---------> short simple CODE vector (8) = #(24203 45330 65280 ...)
   6 formals ------> The symbol NIL
   7 cframe-size --> fixnum 0 [#x00000000]
   8 immed-args ---> fixnum 0 [#x00000000]
   9 locals -------> fixnum 0 [#x00000000]
  10 <constant> ---> The symbol BAR
CL-USER(12): 

Note that the constant 'BAR is in #'FOO, as is a code
vector of length 8 [we use two-byte representations, similar to
(unsigned-byte 16) for code-vectors on all platforms, simply for
consistency.  This sometimes results in nops on x86, whose
granularity is one byte, and requires two code-vector
elements per instruction on RISC versions, whose instructions
are always 32 bits wide]:

CL-USER(12): :i 5
A NEW short simple CODE vector (8) @ #x71b18742
   0-> The field #x5e8b
   1-> The field #xb112
   2-> The field #xff00
   3-> The field #x90e7
   4-> The field #x0000
   5-> The field #x0000
   6-> The field #x0000
   7-> The field #x0000
CL-USER(13): 

This corresponds to the bytes in the disassembler output, below:

CL-USER(13): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: [none]
;; constant vector:
0: BAR

;; code start: #x71b18734:
   0: 8b 5e 12    movl	ebx,[esi+18]    ; BAR
   3: b1 00       movb	cl,$0
   5: ff e7       jmp	*edi
   7: 90          nop
CL-USER(14): 

The actual call is done at *edi, but the symbol is gotten
by accessing 'BAR out of esi (which has #'FOO) and doing the
jump (or call).  We always use ebx to name the function being
called, in case the symbol is not fboundp, in which case an
appropriate error can be constructed.
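The layout above can be modeled at a high level (invented names, not
Allegro CL's actual representation): a function object pairs a
shareable code vector with its own constant vector, and the code
fetches the symbol out of the constants and calls through the symbol's
current function cell, so redefining BAR takes effect immediately:

```python
# Hypothetical model of the function-object layout described above.
# Each symbol has a function cell; FOO's code fetches the symbol BAR
# from FOO's own constant vector and calls through that cell.

symbol_function = {}              # each symbol's function cell

class FunctionObject:
    def __init__(self, code, constants):
        self.code = code          # shareable "code vector"
        self.constants = constants
    def __call__(self, *args):
        return self.code(self, *args)   # code gets its function object

def foo_code(self):
    callee = self.constants[0]          # "movl ebx,[esi+18]  ; BAR"
    return symbol_function[callee]()    # "jmp *edi"

foo = FunctionObject(foo_code, ["BAR"])

symbol_function["BAR"] = lambda: 1
first = foo()
symbol_function["BAR"] = lambda: 2      # redefine BAR
second = foo()
```

Because the code only ever names BAR through the constant vector, the
same code object could serve any function whose constants differ.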

-- 
Duane Rettig    ·····@franz.com    Franz Inc.  http://www.franz.com/
555 12th St., Suite 1450               http://www.555citycenter.com/
Oakland, Ca. 94607        Phone: (510) 452-2000; Fax: (510) 452-0182   
From: Peter Seibel
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <m3k6xoyhp5.fsf@javamonkey.com>
Duane Rettig <·····@franz.com> writes:

> Peter Seibel <·····@javamonkey.com> writes:
>
>> Duane Rettig <·····@franz.com> writes:
>> 
>> > In the case of Allegro CL, the function object is distinct from the
>> > code vector it uses (which might also be used by other functions).
>> > So the containers of specific addresses are done via the function
>> > objects, not the code vectors.
>> 
>> Is the code vector the thing that actually contains the machine code?
>
> Yes.
>
>> And the function objects contain references to the symbols or
>> functions they call?
>
> Yes.
>
>> I'm sort of lost.
>
> Doesn't sound like it to me.
>
>> Can you explain how the code generated for:
>> 
>>   (defun foo () (bar))
>> 
>> finds the current definition for the function named BAR (assuming BAR
>> is declared NOTINLINE, etc.) when FOO is called.
>
> Sure.  The inspector and disassemble are your friends, here - with
> (declaim (OPTIMIZE (SAFETY 0) (SPACE 1) (SPEED 3) (DEBUG 0))) to
> reduce clutter:

Heh. I tried disassembling but it didn't occur to me to inspect the
function object. Duh. I guess my Lisp Fu is weaker than I thought.

> CL-USER(9): (compile (defun foo () (bar)))
> Warning: While compiling these undefined functions were referenced: BAR.
> FOO
> NIL
> NIL
> CL-USER(10): #'foo
> #<Function FOO>
> CL-USER(11): :i *
> A NEW #<Function FOO>
>   lambda-list: NIL
>    0 excl-type ----> Bit field: #x08
>    1 flags --------> Bit field: #xa8
>    2 start --------> Bit field: #x71b18734
>    3 hash ---------> Bit field: #x00009874
>    4 symdef -------> The symbol FOO
>    5 code ---------> short simple CODE vector (8) = #(24203 45330 65280 ...)
>    6 formals ------> The symbol NIL
>    7 cframe-size --> fixnum 0 [#x00000000]
>    8 immed-args ---> fixnum 0 [#x00000000]
>    9 locals -------> fixnum 0 [#x00000000]
>   10 <constant> ---> The symbol BAR
> CL-USER(12): 
>
> Note that the constant 'BAR is in #'FOO, as is a code
> vector of length 8 [we use two-byte representations, similar to
> (unsigned-byte 16) for code-vectors on all platforms, simply for
> consistency.  This sometimes results in nops on x86, whose
> granularity is one byte, and requires two code-vector
> elements per instruction on RISC versions, whose instructions
> are always 32 bits wide]:
>
> CL-USER(12): :i 5
> A NEW short simple CODE vector (8) @ #x71b18742
>    0-> The field #x5e8b
>    1-> The field #xb112
>    2-> The field #xff00
>    3-> The field #x90e7
>    4-> The field #x0000
>    5-> The field #x0000
>    6-> The field #x0000
>    7-> The field #x0000
> CL-USER(13): 
>
> This corresponds to the bytes in the disassembler output, below:
>
> CL-USER(13): (disassemble 'foo)
> ;; disassembly of #<Function FOO>
> ;; formals: [none]
> ;; constant vector:
> 0: BAR
>
> ;; code start: #x71b18734:
>    0: 8b 5e 12    movl	ebx,[esi+18]    ; BAR
>    3: b1 00       movb	cl,$0
>    5: ff e7       jmp	*edi
>    7: 90          nop
> CL-USER(14): 
>
> The actual call is done at *edi, but the symbol is gotten
> by accessing 'BAR out of esi (which has #'FOO) and doing the
> jump (or call).  We always use ebx to name the function being
> called, in case the symbol is not fboundp, in which case an
> appropriate error can be constructed.

Okay, let me try to translate that into something more my speed (as I
don't really know x86 assembly)--you jump to the code pointed to by
the address in the edi register, which is some built-in
function-calling code that's part of the runtime. That code knows to
look in the ebx register for the address of the symbol that names the
function to be called and from there can find the function object
associated with that symbol. And the first instruction in FOO's code
moves that address into ebx from the address in esi + 18 (bytes?
words?) where esi always holds the address of the function object
currently being called.

So now to loop back around to my original question--which I think you
answered but I want to make sure I understood--in Allegro does the
address of the symbol BAR ever change, e.g. when the GC moves things
around? And if so, does the address that is part of #'FOO get fixed just
like any other address that changes as a result of the GC? I think the
answers are "yes" and "yes". And finally, is it also the case that
code vectors themselves never have to change as a result of GC, i.e.
they always refer to things by some offset from the start of the
function object? I assume, since you talked about sharing code
vectors, that this also means closures find their captured environment
via the function object. Is that right?

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Frode Vatvedt Fjeld
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <2h7jtp8ic4.fsf@vserver.cs.uit.no>
Peter Seibel <·····@javamonkey.com> writes:

> are the references to symbols in a compiled function just another
> kind of object reference when it comes to garbage collection?

This is the case in my CL implementation, and I believe it's the
standard thing to do.

I have been trying to think of other ways to do this so as to avoid
some of the indirections when calling some function by name, but any
such scheme is likely to require special-casing in the GC, and some
intricacies in (setf symbol-function), and all in all I'm not at all
convinced it's even possible to achieve this without accepting some
serious penalties one way or another.

-- 
Frode Vatvedt Fjeld
From: Joe Marshall
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <eknx2eyw.fsf@ccs.neu.edu>
Peter Seibel <·····@javamonkey.com> writes:

> So I was thinking about how function calls work and how a function
> such as foo:
>
>   (defun foo () (bar))
>
> when compiled needs to contain a reference to the symbol object BAR.
> I.e. it can't just jump to the address of the BAR function because BAR
> may get redefined.  So my question is this: are the references to
> symbols in a compiled function just another kind of object reference
> when it comes to garbage collection?  

Sort of.  Since symbols are interned, you can put them in a special
area where they don't need to be moved.  Then you can ignore them.
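A toy sketch of that idea (everything here is invented for
illustration): interned symbols are allocated in a static area that is
never compacted, so their "addresses" stay stable and the collector
can skip rewriting symbol references embedded in compiled code:

```python
# Hypothetical sketch: symbols live in a non-moving ("static") area.
# intern returns a stable index standing in for a stable address;
# interning the same name twice yields the same slot.

static_area = []                  # never compacted or moved

def intern(name, table={}):
    if name not in table:
        static_area.append(name)
        table[name] = len(static_area) - 1   # a stable "address"
    return table[name]

a = intern("BAR")
b = intern("BAR")                 # same symbol, same slot
```

The cost, as discussed below, is that an uninterned symbol's slot is
never reclaimed.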
From: Peter Seibel
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <m3smcdxahs.fsf@javamonkey.com>
Joe Marshall <···@ccs.neu.edu> writes:

> Peter Seibel <·····@javamonkey.com> writes:
>
>> So I was thinking about how function calls work and how a function
>> such as foo:
>>
>>   (defun foo () (bar))
>>
>> when compiled needs to contain a reference to the symbol object BAR.
>> I.e. it can't just jump to the address of the BAR function because BAR
>> may get redefined.  So my question is this: are the references to
>> symbols in a compiled function just another kind of object reference
>> when it comes to garbage collection?  
>
> Sort of.  Since symbols are interned, you can put them in a special
> area where they don't need to be moved.  Then you can ignore them.

So does this mean--in implementations that work the way you
describe--that even if a symbol is uninterned and all references to it
dropped it will continue to take up space in memory?

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Barry Margolin
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <barmar-7953A0.12531630062004@comcast.dca.giganews.com>
In article <··············@javamonkey.com>,
 Peter Seibel <·····@javamonkey.com> wrote:

> Joe Marshall <···@ccs.neu.edu> writes:
> 
> > Peter Seibel <·····@javamonkey.com> writes:
> >
> >> So I was thinking about how function calls work and how a function
> >> such as foo:
> >>
> >>   (defun foo () (bar))
> >>
> >> when compiled needs to contain a reference to the symbol object BAR.
> >> I.e. it can't just jump to the address of the BAR function because BAR
> >> may get redefined.  So my question is this: are the references to
> >> symbols in a compiled function just another kind of object reference
> >> when it comes to garbage collection?  
> >
> > Sort of.  Since symbols are interned, you can put them in a special
> > area where they don't need to be moved.  Then you can ignore them.
> 
> So does this mean--in implementations that work the way you
> describe--that even if a symbol is uninterned and all references to it
> dropped it will continue to take up space in memory?

Yes.  Since uninterning is such an uncommon thing, the potential waste 
is very minor, and the benefits to keeping them stationary may outweigh 
this cost.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From: Peter Seibel
Subject: Re: References to symbols in functions?
Date: 
Message-ID: <m3oen1x9ic.fsf@javamonkey.com>
Barry Margolin <······@alum.mit.edu> writes:

> In article <··············@javamonkey.com>,
>  Peter Seibel <·····@javamonkey.com> wrote:
>
>> Joe Marshall <···@ccs.neu.edu> writes:
>> 
>> > Peter Seibel <·····@javamonkey.com> writes:
>> >
>> >> So I was thinking about how function calls work and how a function
>> >> such as foo:
>> >>
>> >>   (defun foo () (bar))
>> >>
>> >> when compiled needs to contain a reference to the symbol object BAR.
>> >> I.e. it can't just jump to the address of the BAR function because BAR
>> >> may get redefined.  So my question is this: are the references to
>> >> symbols in a compiled function just another kind of object reference
>> >> when it comes to garbage collection?  
>> >
>> > Sort of.  Since symbols are interned, you can put them in a special
>> > area where they don't need to be moved.  Then you can ignore them.
>> 
>> So does this mean--in implementations that work the way you
>> describe--that even if a symbol is uninterned and all references to
>> it dropped it will continue to take up space in memory?
>
> Yes.  Since uninterning is such an uncommon thing, the potential waste 
> is very minor, and the benefits to keeping them stationary may outweigh 
> this cost.

Yup. I can see that. Just wanted to make sure I understood where the
tradeoff was being made. Thanks.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp