Hi!
As I'm currently working on an application which creates a lot of
closures I was wondering how much space they would need. It was my
original assumption (knowing virtually nothing about how CL is
implemented internally) that different closures created by the same
piece of code will share a lot of, if not all, (machine) code and it
should thus be able to store them efficiently. That is, I expected
that basically only the closed-over environments had to be stored
while the closure "itself", i.e. the code, would just be a jump to a
common starting point.
However, I wasn't sure (after all, a function which created a closure
might be modified afterwards and whatever), googled around a bit and
found this older c.l.l. posting which seemed to imply that for each
(?) closure the code proper is put on the heap:
<http://groups.google.com/groups?selm=166%40utah-orion.UUCP>
So, I started experimenting a little bit and came up with this one:
(defun foo (i)
(lambda (x)
(+ x i)))
(defun bar (n)
(let ((a (make-array n)))
(ext:gc :full t)
(room nil)
(loop for i of-type fixnum below n
do (setf (aref a i) (foo i)))
(ext:gc :full t)
(room nil)))
After compiling these functions I got (in CMUCL):
* (bar 1000)
Dynamic Space Usage: 1,357,792 bytes.
Read-Only Space Usage: 17,666,672 bytes.
Static Space Usage: 2,636,088 bytes.
Control Stack Usage: 484 bytes.
Binding Stack Usage: 96 bytes.
The current dynamic space is 0.
Garbage collection is currently enabled.
Dynamic Space Usage: 1,366,344 bytes.
Read-Only Space Usage: 17,666,672 bytes.
Static Space Usage: 2,636,088 bytes.
Control Stack Usage: 452 bytes.
Binding Stack Usage: 96 bytes.
The current dynamic space is 0.
Garbage collection is currently enabled.
* (float (/ (- 1366344 1357792) 1000))
8.552
If my na�ve approach of measuring space with ROOM and EXT:GC is
correct, then CMUCL needs about 8 Bytes per closure. 8 Bytes is
definitely less than the whole machine code an object like (FOO I)
would occupy, so I think this confirms my initial assumption.
Furthermore, when looking at the disassemblies of different closures,
say (FOO 1) and (FOO 2), they seem to have /exactly/ the same code
(see attachment below). In LispWorks (also attached) the code is not
exactly the same but all closures seem to jump to a shared block of
code. All of this seems to imply that closures are stored rather
efficiently with respect to space (in these two implementations).
So, my question is, is my assumption about how closures are stored
basically right, i.e. can I assume that, in current CL
implementations, closures are stored efficiently and I will basically
only have to worry about the space used by the closed-over
environments?
(I think it is clear from my wording that I'm aware of the fact that
this is implementation-dependent. But there are only so many
implementations out there and maybe they share a common strategy. I
wouldn't bother to learn about different ways to do this either, maybe
in older Lisps or in CS theory.)
Thanks in advance,
Edi.
* (let ((f1 (foo 1)) (f2 (foo 2))) (disassemble f1) (disassemble f2))
48078FA1: ADD [EAX], AL
A3: ADD [EAX], AL
A5: ADD [EAX], AL
A7: ADD [ESI+46], BH
AA: ADD [EAX], AL
AC: ROR BYTE PTR [EDI+739335], 0
B3: SUB [EDI-683142066], BH
B9: DEC ESI
BA: ADC AL, 72
BC: INC EBX
BD: LEA EDX, [ESI]
BF: DEC EAX
C0: POP DWORD PTR [EBP-8]
C3: LEA ESP, [EBP-32]
C6: MOV EDI, [EAX+7]
C9: CMP ECX, 4
CC: JNE L0
CE: MOV [EBP-12], EDX
D1: MOV EDX, [EBP-12] ; No-arg-parsing entry point
D4: CALL #x100001C8
D9: MOV ESP, EBX
DB: MOV ECX, [EBP-8]
DE: MOV EAX, [EBP-4]
E1: ADD ECX, 2
E4: MOV ESP, EBP
E6: MOV EBP, EAX
E8: JMP ECX
EA: NOP
EB: NOP
EC: NOP
ED: NOP
EE: NOP
EF: NOP
F0: BREAK 10 ; Error trap
F2: BYTE #x02
F3: BYTE #x19 ; INVALID-ARGUMENT-COUNT-ERROR
F4: BYTE #x4D ; ECX
F5: L0: BREAK 10 ; Error trap
F7: BYTE #x02
F8: BYTE #x19 ; INVALID-ARGUMENT-COUNT-ERROR
F9: BYTE #x4D ; ECX
48078FA1: ADD [EAX], AL
A3: ADD [EAX], AL
A5: ADD [EAX], AL
A7: ADD [ESI+46], BH
AA: ADD [EAX], AL
AC: ROR BYTE PTR [EDI+739335], 0
B3: SUB [EDI-683142066], BH
B9: DEC ESI
BA: ADC AL, 72
BC: INC EBX
BD: LEA EDX, [ESI]
BF: DEC EAX
C0: POP DWORD PTR [EBP-8]
C3: LEA ESP, [EBP-32]
C6: MOV EDI, [EAX+7]
C9: CMP ECX, 4
CC: JNE L0
CE: MOV [EBP-12], EDX
D1: MOV EDX, [EBP-12] ; No-arg-parsing entry point
D4: CALL #x100001C8
D9: MOV ESP, EBX
DB: MOV ECX, [EBP-8]
DE: MOV EAX, [EBP-4]
E1: ADD ECX, 2
E4: MOV ESP, EBP
E6: MOV EBP, EAX
E8: JMP ECX
EA: NOP
EB: NOP
EC: NOP
ED: NOP
EE: NOP
EF: NOP
F0: BREAK 10 ; Error trap
F2: BYTE #x02
F3: BYTE #x19 ; INVALID-ARGUMENT-COUNT-ERROR
F4: BYTE #x4D ; ECX
F5: L0: BREAK 10 ; Error trap
F7: BYTE #x02
F8: BYTE #x19 ; INVALID-ARGUMENT-COUNT-ERROR
F9: BYTE #x4D ; ECX
CL-USER 38 > (disassemble (foo 1))
2065BD0A:
0: BFF4BC6520 move edi, 2065BCF4
5: E906BD0000 jmp 20667A1A
20667A1A:
0: 80FD01 cmpb ch, 1
3: 7527 jne L2
5: 3B25BC150020 cmp esp, [200015BC] ; T
11: 761F jbe L2
13: 55 push ebp
14: 89E5 move ebp, esp
16: 50 push eax
17: 50 push eax
18: 897DF8 move [ebp-8], edi
21: 8B7DF8 move edi, [ebp-8]
24: 8B7F04 move edi, [edi+4]
27: 89F9 move ecx, edi
29: 0A4DFC orb cl, [ebp-4]
32: 750F jne L3
34: 8B45FC move eax, [ebp-4]
37: 03C7 add eax, edi
39: 7008 jo L3
L1: 41: FD std
42: C9 leave
43: C3 ret
L2: 44: E877E39DFF call 20045DC2 ; #<function 20045DC2>
L3: 49: 89F8 move eax, edi
51: FF75FC push [ebp-4]
54: E8557D4E00 call 20B4F7AA ; #<function 20B4F7AA>
59: EBEC jmp L1
61: 90 nop
NIL
CL-USER 39 > (disassemble (foo 2))
2065DAA2:
0: BF8CDA6520 move edi, 2065DA8C
5: E96E9F0000 jmp 20667A1A
20667A1A:
0: 80FD01 cmpb ch, 1
3: 7527 jne L2
5: 3B25BC150020 cmp esp, [200015BC] ; T
11: 761F jbe L2
13: 55 push ebp
14: 89E5 move ebp, esp
16: 50 push eax
17: 50 push eax
18: 897DF8 move [ebp-8], edi
21: 8B7DF8 move edi, [ebp-8]
24: 8B7F04 move edi, [edi+4]
27: 89F9 move ecx, edi
29: 0A4DFC orb cl, [ebp-4]
32: 750F jne L3
34: 8B45FC move eax, [ebp-4]
37: 03C7 add eax, edi
39: 7008 jo L3
L1: 41: FD std
42: C9 leave
43: C3 ret
L2: 44: E877E39DFF call 20045DC2 ; #<function 20045DC2>
L3: 49: 89F8 move eax, edi
51: FF75FC push [ebp-4]
54: E8557D4E00 call 20B4F7AA ; #<function 20B4F7AA>
59: EBEC jmp L1
61: 90 nop
NIL
On 28 Nov 2002 12:17:48 +0100, Edi Weitz <···@agharta.de> wrote:
>Hi!
>
>As I'm currently working on an application which creates a lot of
>closures I was wondering how much space they would need. It was my
>original assumption (knowing virtually nothing about how CL is
>implemented internally) that different closures created by the same
>piece of code will share a lot of, if not all, (machine) code and it
>should thus be able to store them efficiently. That is, I expected
>that basically only the closed-over environments had to be stored
>while the closure "itself", i.e. the code, would just be a jump to a
>common starting point.
I think your analysis and conclusions are correct--any reasonable implementation
would not make a copy of the code for each closure (unless the code for a
closure turned out to be very small). A closure is just a structure with two
components, the code and the environment. Only the environment differs per
instance. There is no reason not to share the code.
If a function gets redefined that doesn't matter. The old definition will be
hanging off old closures, while new closures will get the new definition. In
this case there will be two pieces of code, but this has nothing to do with the
closures per se. It is just caused by the redefinition.
Edi Weitz <···@agharta.de> writes:
> found this older c.l.l. posting which seemed to imply that for each
> (?) closure the code proper is put on the heap:
>
> <http://groups.google.com/groups?selm=166%40utah-orion.UUCP>
I think he either meant the _address_ of the code is stored on the heap,
or he was confused and was thinking of closures that are implemented as
small functions (on the heap) which just load environment pointer(s)
into a register or two and then jump to the body of the closure's code.
If you think "C" language for a moment, the advantage of this is that a
closure can be treated as a normal function which you can take the
address of and jump to, with no extra magic. I think.
--
Hallvard
Edi Weitz wrote:
> CL-USER 38 > (disassemble (foo 1))
> 2065BD0A:
> 0: BFF4BC6520 move edi, 2065BCF4
> 5: E906BD0000 jmp 20667A1A
I think this is your closure. Load the pointer to the environment (into edi
- 5 bytes). Jump to the code (5 bytes). This is 10 bytes of code.
Additional bytes is needed at 2065BCF4 for the environment I would think -
probably at least 4 bytes.
I tried your code, and in my tests it used about 16 bytes per closure when
foo and bar were compiled. If the above is how a closure is compiled, I do
not understand how you can get to the 8 bytes/closure figure.
astor
Alexander Kjeldaas <··········@fast.no> writes:
> Edi Weitz wrote:
>
> > CL-USER 38 > (disassemble (foo 1))
> > 2065BD0A:
> > 0: BFF4BC6520 move edi, 2065BCF4
> > 5: E906BD0000 jmp 20667A1A
>
> I think this is your closure. Load the pointer to the environment
> (into edi - 5 bytes). Jump to the code (5 bytes). This is 10 bytes
> of code. Additional bytes is needed at 2065BCF4 for the environment
> I would think - probably at least 4 bytes.
>
> I tried your code, and in my tests it used about 16 bytes per
> closure when foo and bar were compiled. If the above is how a
> closure is compiled, I do not understand how you can get to the 8
> bytes/closure figure.
This was measured with CMUCL. Either CMUCL uses a different way to
store the closure or my way of measuring this (using (EXT:GC :FULL T)
and (ROOM) - see my original posting) wasn't correct. In LW I'd use
HCL:MARK-AND-SWEEP but I can't seem to get any meaningful results. How
did you measure this? (Just curious.)
Cheers,
Edi.
Edi Weitz <···@agharta.de> writes:
> Alexander Kjeldaas <··········@fast.no> writes:
>
> > Edi Weitz wrote:
> >
> > > CL-USER 38 > (disassemble (foo 1))
> > > 2065BD0A:
> > > 0: BFF4BC6520 move edi, 2065BCF4
> > > 5: E906BD0000 jmp 20667A1A
> >
> > I think this is your closure. Load the pointer to the environment
> > (into edi - 5 bytes). Jump to the code (5 bytes). This is 10 bytes
> > of code. Additional bytes is needed at 2065BCF4 for the environment
> > I would think - probably at least 4 bytes.
> >
> > I tried your code, and in my tests it used about 16 bytes per
> > closure when foo and bar were compiled. If the above is how a
> > closure is compiled, I do not understand how you can get to the 8
> > bytes/closure figure.
>
> This was measured with CMUCL. Either CMUCL uses a different way to
> store the closure or my way of measuring this (using (EXT:GC :FULL T)
> and (ROOM) - see my original posting) wasn't correct. In LW I'd use
> HCL:MARK-AND-SWEEP but I can't seem to get any meaningful results. How
> did you measure this? (Just curious.)
With a large enough N for your BAR function, it shouldn't matter, but
remember, CMUCL on x86 has a conservative GC, so there's some
precision that will probably be missing there.
--
/|_ .-----------------------.
,' .\ / | No to Imperialist war |
,--' _,' | Wage class war! |
/ / `-----------------------'
( -. |
| ) |
(`-. '--.)
`. )----'
Edi Weitz found:
> found this older c.l.l. posting which seemed to imply that for each
> (?) closure the code proper is put on the heap:
>
> <http://groups.google.com/groups?selm=166%40utah-orion.UUCP>
I don't believe that article says that each closure has a separate
copy of the code. The author of that article is correct when he says
that preserving an entire environment in each closure can waste space,
and observes that it is better to preserve only the relevant parts of
the environment. This is part of what people who talk about "safe
for space complexity" are talking about.
> So, my question is, is my assumption about how closures are stored
> basically right, i.e. can I assume that, in current CL
> implementations, closures are stored efficiently and I will basically
> only have to worry about the space used by the closed-over
> environments?
Yes.
Will