How are closures stored?

From: Edi Weitz
Subject: How are closures stored?
Date: Thu, 28 Nov 2002 11:17:48 +0000
Message-ID: <87adjuuen7.fsf@bird.agharta.de>

Hi!

As I'm currently working on an application which creates a lot of
closures I was wondering how much space they would need. It was my
original assumption (knowing virtually nothing about how CL is
implemented internally) that different closures created by the same
piece of code will share a lot of, if not all, (machine) code and it
should thus be able to store them efficiently. That is, I expected
that basically only the closed-over environments had to be stored
while the closure "itself", i.e. the code, would just be a jump to a
common starting point.

However, I wasn't sure (after all, a function which created a closure
might be modified afterwards and whatever), googled around a bit and
found this older c.l.l. posting which seemed to imply that for each
(?) closure the code proper is put on the heap:

  <http://groups.google.com/groups?selm=166%40utah-orion.UUCP>

So, I started experimenting a little bit and came up with this one:

  (defun foo (i)
    (lambda (x)
      (+ x i)))

  (defun bar (n)
    (let ((a (make-array n)))
      (ext:gc :full t)
      (room nil)
      (loop for i of-type fixnum below n
            do (setf (aref a i) (foo i)))
      (ext:gc :full t)
      (room nil)))

After compiling these functions I got (in CMUCL):

  * (bar 1000)
  Dynamic Space Usage:     1,357,792 bytes.
  Read-Only Space Usage:  17,666,672 bytes.
  Static Space Usage:      2,636,088 bytes.
  Control Stack Usage:           484 bytes.
  Binding Stack Usage:            96 bytes.
  The current dynamic space is 0.
  Garbage collection is currently enabled.
  Dynamic Space Usage:     1,366,344 bytes.
  Read-Only Space Usage:  17,666,672 bytes.
  Static Space Usage:      2,636,088 bytes.
  Control Stack Usage:           452 bytes.
  Binding Stack Usage:            96 bytes.
  The current dynamic space is 0.
  Garbage collection is currently enabled.
  * (float (/ (- 1366344 1357792) 1000))
  8.552

If my na�ve approach of measuring space with ROOM and EXT:GC is
correct, then CMUCL needs about 8 Bytes per closure. 8 Bytes is
definitely less than the whole machine code an object like (FOO I)
would occupy, so I think this confirms my initial assumption.

Furthermore, when looking at the disassemblies of different closures,
say (FOO 1) and (FOO 2), they seem to have /exactly/ the same code
(see attachment below). In LispWorks (also attached) the code is not
exactly the same but all closures seem to jump to a shared block of
code. All of this seems to imply that closures are stored rather
efficiently with respect to space (in these two implementations).

So, my question is, is my assumption about how closures are stored
basically right, i.e. can I assume that, in current CL
implementations, closures are stored efficiently and I will basically
only have to worry about the space used by the closed-over
environments?

(I think it is clear from my wording that I'm aware of the fact that
this is implementation-dependent. But there are only so many
implementations out there and maybe they share a common strategy. I
wouldn't bother to learn about different ways to do this either, maybe
in older Lisps or in CS theory.)

Thanks in advance,
Edi.


  * (let ((f1 (foo 1)) (f2 (foo 2))) (disassemble f1) (disassemble f2))
  48078FA1:       ADD   [EAX], AL
        A3:       ADD   [EAX], AL
        A5:       ADD   [EAX], AL
        A7:       ADD   [ESI+46], BH
        AA:       ADD   [EAX], AL
        AC:       ROR   BYTE PTR [EDI+739335], 0
        B3:       SUB   [EDI-683142066], BH
        B9:       DEC   ESI
        BA:       ADC   AL, 72
        BC:       INC   EBX
        BD:       LEA   EDX, [ESI]
        BF:       DEC   EAX
        C0:       POP   DWORD PTR [EBP-8]
        C3:       LEA   ESP, [EBP-32]
        C6:       MOV   EDI, [EAX+7]
        C9:       CMP   ECX, 4
        CC:       JNE   L0
        CE:       MOV   [EBP-12], EDX
        D1:       MOV   EDX, [EBP-12]          ; No-arg-parsing entry point
        D4:       CALL  #x100001C8
        D9:       MOV   ESP, EBX
        DB:       MOV   ECX, [EBP-8]
        DE:       MOV   EAX, [EBP-4]
        E1:       ADD   ECX, 2
        E4:       MOV   ESP, EBP
        E6:       MOV   EBP, EAX
        E8:       JMP   ECX
        EA:       NOP
        EB:       NOP
        EC:       NOP
        ED:       NOP
        EE:       NOP
        EF:       NOP
        F0:       BREAK 10                     ; Error trap
        F2:       BYTE  #x02
        F3:       BYTE  #x19                   ; INVALID-ARGUMENT-COUNT-ERROR
        F4:       BYTE  #x4D                   ; ECX
        F5: L0:   BREAK 10                     ; Error trap
        F7:       BYTE  #x02
        F8:       BYTE  #x19                   ; INVALID-ARGUMENT-COUNT-ERROR
        F9:       BYTE  #x4D                   ; ECX
  48078FA1:       ADD   [EAX], AL
        A3:       ADD   [EAX], AL
        A5:       ADD   [EAX], AL
        A7:       ADD   [ESI+46], BH
        AA:       ADD   [EAX], AL
        AC:       ROR   BYTE PTR [EDI+739335], 0
        B3:       SUB   [EDI-683142066], BH
        B9:       DEC   ESI
        BA:       ADC   AL, 72
        BC:       INC   EBX
        BD:       LEA   EDX, [ESI]
        BF:       DEC   EAX
        C0:       POP   DWORD PTR [EBP-8]
        C3:       LEA   ESP, [EBP-32]
        C6:       MOV   EDI, [EAX+7]
        C9:       CMP   ECX, 4
        CC:       JNE   L0
        CE:       MOV   [EBP-12], EDX
        D1:       MOV   EDX, [EBP-12]          ; No-arg-parsing entry point
        D4:       CALL  #x100001C8
        D9:       MOV   ESP, EBX
        DB:       MOV   ECX, [EBP-8]
        DE:       MOV   EAX, [EBP-4]
        E1:       ADD   ECX, 2
        E4:       MOV   ESP, EBP
        E6:       MOV   EBP, EAX
        E8:       JMP   ECX
        EA:       NOP
        EB:       NOP
        EC:       NOP
        ED:       NOP
        EE:       NOP
        EF:       NOP
        F0:       BREAK 10                     ; Error trap
        F2:       BYTE  #x02
        F3:       BYTE  #x19                   ; INVALID-ARGUMENT-COUNT-ERROR
        F4:       BYTE  #x4D                   ; ECX
        F5: L0:   BREAK 10                     ; Error trap
        F7:       BYTE  #x02
        F8:       BYTE  #x19                   ; INVALID-ARGUMENT-COUNT-ERROR
        F9:       BYTE  #x4D                   ; ECX


  CL-USER 38 > (disassemble (foo 1))
  2065BD0A:
         0:      BFF4BC6520       move  edi, 2065BCF4
         5:      E906BD0000       jmp   20667A1A
  20667A1A:
         0:      80FD01           cmpb  ch, 1
         3:      7527             jne   L2
         5:      3B25BC150020     cmp   esp, [200015BC]  ; T
        11:      761F             jbe   L2
        13:      55               push  ebp
        14:      89E5             move  ebp, esp
        16:      50               push  eax
        17:      50               push  eax
        18:      897DF8           move  [ebp-8], edi
        21:      8B7DF8           move  edi, [ebp-8]
        24:      8B7F04           move  edi, [edi+4]
        27:      89F9             move  ecx, edi
        29:      0A4DFC           orb   cl, [ebp-4]
        32:      750F             jne   L3
        34:      8B45FC           move  eax, [ebp-4]
        37:      03C7             add   eax, edi
        39:      7008             jo    L3
  L1:   41:      FD               std   
        42:      C9               leave 
        43:      C3               ret   
  L2:   44:      E877E39DFF       call  20045DC2         ; #<function 20045DC2>
  L3:   49:      89F8             move  eax, edi
        51:      FF75FC           push  [ebp-4]
        54:      E8557D4E00       call  20B4F7AA         ; #<function 20B4F7AA>
        59:      EBEC             jmp   L1
        61:      90               nop   
  NIL

  CL-USER 39 > (disassemble (foo 2))
  2065DAA2:
         0:      BF8CDA6520       move  edi, 2065DA8C
         5:      E96E9F0000       jmp   20667A1A
  20667A1A:
         0:      80FD01           cmpb  ch, 1
         3:      7527             jne   L2
         5:      3B25BC150020     cmp   esp, [200015BC]  ; T
        11:      761F             jbe   L2
        13:      55               push  ebp
        14:      89E5             move  ebp, esp
        16:      50               push  eax
        17:      50               push  eax
        18:      897DF8           move  [ebp-8], edi
        21:      8B7DF8           move  edi, [ebp-8]
        24:      8B7F04           move  edi, [edi+4]
        27:      89F9             move  ecx, edi
        29:      0A4DFC           orb   cl, [ebp-4]
        32:      750F             jne   L3
        34:      8B45FC           move  eax, [ebp-4]
        37:      03C7             add   eax, edi
        39:      7008             jo    L3
  L1:   41:      FD               std   
        42:      C9               leave 
        43:      C3               ret   
  L2:   44:      E877E39DFF       call  20045DC2         ; #<function 20045DC2>
  L3:   49:      89F8             move  eax, edi
        51:      FF75FC           push  [ebp-4]
        54:      E8557D4E00       call  20B4F7AA         ; #<function 20B4F7AA>
        59:      EBEC             jmp   L1
        61:      90               nop   
  NIL

Re: How are closures stored? Roger Corman
Re: How are closures stored? Hallvard B Furuseth
Re: How are closures stored? Alexander Kjeldaas
- Re: How are closures stored? Edi Weitz
  - Re: How are closures stored? Thomas F. Burdick
Re: How are closures stored? William D Clinger

From: Roger Corman
Subject: Re: How are closures stored?
Date: Fri, 29 Nov 2002 08:30:42 +0000
Message-ID: <3de724d3.1867951862@news.sf.sbcglobal.net>

On 28 Nov 2002 12:17:48 +0100, Edi Weitz <···@agharta.de> wrote:

>Hi!
>
>As I'm currently working on an application which creates a lot of
>closures I was wondering how much space they would need. It was my
>original assumption (knowing virtually nothing about how CL is
>implemented internally) that different closures created by the same
>piece of code will share a lot of, if not all, (machine) code and it
>should thus be able to store them efficiently. That is, I expected
>that basically only the closed-over environments had to be stored
>while the closure "itself", i.e. the code, would just be a jump to a
>common starting point.

I think your analysis and conclusions are correct--any reasonable implementation
would not make a copy of the code for each closure (unless the code for a
closure turned out to be very small). A closure is just a structure with two
components, the code and the environment. Only the environment differs per
instance. There is no reason not to share the code.

If a function gets redefined that doesn't matter. The old definition will be
hanging off old closures, while new closures will get the new definition. In
this case there will be two pieces of code, but this has nothing to do with the
closures per se. It is just caused by the redefinition.

From: Hallvard B Furuseth
Subject: Re: How are closures stored?
Date: Fri, 29 Nov 2002 08:44:21 +0000
Message-ID: <HBF.20021129etlo@bombur.uio.no>

Edi Weitz <···@agharta.de> writes:
> found this older c.l.l. posting which seemed to imply that for each
> (?) closure the code proper is put on the heap:
> 
>   <http://groups.google.com/groups?selm=166%40utah-orion.UUCP>

I think he either meant the _address_ of the code is stored on the heap,
or he was confused and was thinking of closures that are implemented as
small functions (on the heap) which just load environment pointer(s)
into a register or two and then jump to the body of the closure's code.
If you think "C" language for a moment, the advantage of this is that a
closure can be treated as a normal function which you can take the
address of and jump to, with no extra magic.  I think.

-- 
Hallvard

From: Alexander Kjeldaas
Subject: Re: How are closures stored?
Date: Fri, 29 Nov 2002 11:03:49 +0000
Message-ID: <as7gsd$k32$1@news.broadnet.no>

Edi Weitz wrote:

>   CL-USER 38 > (disassemble (foo 1))
>   2065BD0A:
>          0:      BFF4BC6520       move  edi, 2065BCF4
>          5:      E906BD0000       jmp   20667A1A

I think this is your closure.  Load the pointer to the environment (into edi 
- 5 bytes). Jump to the code (5 bytes).  This is 10 bytes of code.  
Additional bytes is needed at 2065BCF4 for the environment I would think - 
probably at least 4 bytes.

I tried your code, and in my tests it used about 16 bytes per closure when 
foo and bar were compiled.  If the above is how a closure is compiled, I do 
not understand how you can get to the 8 bytes/closure figure.

astor

From: Edi Weitz
Subject: Re: How are closures stored?
Date: Fri, 29 Nov 2002 15:48:24 +0000
Message-ID: <8765ugbcmv.fsf@bird.agharta.de>

Alexander Kjeldaas <··········@fast.no> writes:

> Edi Weitz wrote:
> 
> >   CL-USER 38 > (disassemble (foo 1))
> >   2065BD0A:
> >          0:      BFF4BC6520       move  edi, 2065BCF4
> >          5:      E906BD0000       jmp   20667A1A
> 
> I think this is your closure.  Load the pointer to the environment
> (into edi - 5 bytes). Jump to the code (5 bytes).  This is 10 bytes
> of code.  Additional bytes is needed at 2065BCF4 for the environment
> I would think - probably at least 4 bytes.
> 
> I tried your code, and in my tests it used about 16 bytes per
> closure when foo and bar were compiled.  If the above is how a
> closure is compiled, I do not understand how you can get to the 8
> bytes/closure figure.

This was measured with CMUCL. Either CMUCL uses a different way to
store the closure or my way of measuring this (using (EXT:GC :FULL T)
and (ROOM) - see my original posting) wasn't correct. In LW I'd use
HCL:MARK-AND-SWEEP but I can't seem to get any meaningful results. How
did you measure this? (Just curious.)

Cheers,
Edi.

From: Thomas F. Burdick
Subject: Re: How are closures stored?
Date: Fri, 29 Nov 2002 19:59:46 +0000
Message-ID: <xcv3cpkrvt9.fsf@famine.OCF.Berkeley.EDU>

Edi Weitz <···@agharta.de> writes:

> Alexander Kjeldaas <··········@fast.no> writes:
> 
> > Edi Weitz wrote:
> > 
> > >   CL-USER 38 > (disassemble (foo 1))
> > >   2065BD0A:
> > >          0:      BFF4BC6520       move  edi, 2065BCF4
> > >          5:      E906BD0000       jmp   20667A1A
> > 
> > I think this is your closure.  Load the pointer to the environment
> > (into edi - 5 bytes). Jump to the code (5 bytes).  This is 10 bytes
> > of code.  Additional bytes is needed at 2065BCF4 for the environment
> > I would think - probably at least 4 bytes.
> > 
> > I tried your code, and in my tests it used about 16 bytes per
> > closure when foo and bar were compiled.  If the above is how a
> > closure is compiled, I do not understand how you can get to the 8
> > bytes/closure figure.
> 
> This was measured with CMUCL. Either CMUCL uses a different way to
> store the closure or my way of measuring this (using (EXT:GC :FULL T)
> and (ROOM) - see my original posting) wasn't correct. In LW I'd use
> HCL:MARK-AND-SWEEP but I can't seem to get any meaningful results. How
> did you measure this? (Just curious.)

With a large enough N for your BAR function, it shouldn't matter, but
remember, CMUCL on x86 has a conservative GC, so there's some
precision that will probably be missing there.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: William D Clinger
Subject: Re: How are closures stored?
Date: Sun, 01 Dec 2002 21:33:29 +0000
Message-ID: <b84e9a9f.0212011333.2797707@posting.google.com>

Edi Weitz found:
> found this older c.l.l. posting which seemed to imply that for each
> (?) closure the code proper is put on the heap:
> 
>   <http://groups.google.com/groups?selm=166%40utah-orion.UUCP>

I don't believe that article says that each closure has a separate
copy of the code.  The author of that article is correct when he says
that preserving an entire environment in each closure can waste space,
and observes that it is better to preserve only the relevant parts of
the environment.  This is part of what people who talk about "safe
for space complexity" are talking about.

> So, my question is, is my assumption about how closures are stored
> basically right, i.e. can I assume that, in current CL
> implementations, closures are stored efficiently and I will basically
> only have to worry about the space used by the closed-over
> environments?

Yes.

Will