Trying to understand "disassemble"

From: Francogrex
Subject: Trying to understand "disassemble"
Date: Fri, 23 Jan 2009 21:41:05 +0000
Message-ID: <f8c1b9d1-b046-4861-a1f4-7972c8ef6812@v5g2000prm.googlegroups.com>

I'm trying to understand how disassemble works. Here is a test in 2
implementations:

* (defun add (x y) (+ x y))
ADD
* (disassemble 'add)

in SBCL

; 23C06A25:       8B55F4           MOV EDX, [EBP-12]          ; no-arg-parsing entry point
;       28:       8B7DF0           MOV EDI, [EBP-16]
;       2B:       E890973FFE       CALL #x220001C0            ; GENERIC-+
;       30:       7302             JNB L0
;       32:       8BE3             MOV ESP, EBX
;       34: L0:   8D65F8           LEA ESP, [EBP-8]
;       37:       F8               CLC
;       38:       8B6DFC           MOV EBP, [EBP-4]
;       3B:       C20400           RET 4
;       3E:       CC0A             BREAK 10                   ; error trap
;       40:       02               BYTE #X02
;       41:       18               BYTE #X18                  ; INVALID-ARG-COUNT-ERROR
;       42:       4D               BYTE #X4D                  ; ECX
;

in Corman
;Disassembling from address #xD72A70:
;#x0: 55             push    ebp
;#x1: 8BEC           mov     ebp,esp
;#x3: 57             push    edi
;#x4: 83F902         cmp     ecx,00002h
;#x7: 7406           je      000F
;#x9: FF96C4100000   call    near dword ptr [esi+0000010C4h]      ; Call COMMON-LISP::%WRONG-NUMBER-OF-ARGS
;#xF: FF750C         push    dword ptr [ebp+0000Ch]
;#x12: 8B4508         mov     eax,[ebp+00008h]
;#x15: 5A             pop     edx
;#x16: A807           test    al,007h
;#x18: 750B           jne     0025
;#x1A: F6C207         test    dl,007h
;#x1D: 7506           jne     0025
;#x1F: 03C2           add     eax,edx
;#x21: 7108           jno     002B
;#x23: 2BC2           sub     eax,edx
;#x25: FF9604180000   call    near dword ptr [esi+000001804h]      ; Call COMMON-LISP::%PLUS_EAX_EDX

;#x2B: B901000000     mov     ecx,0001
;#x30: 8BE5           mov     esp,ebp
;#x32: 5D             pop     ebp
;#x33: C3             ret


Does this mean that if I take any of the code above and use a
Microsoft assembler I would be able to make a standalone program from
this function (or any other function in CL)?

From: Brian
Subject: Re: Trying to understand "disassemble"
Date: Fri, 23 Jan 2009 21:47:37 +0000
Message-ID: <ba9f6e41-68d6-46fa-8a3d-c87990fe313b@g39g2000pri.googlegroups.com>

On Jan 23, 3:41 pm, Francogrex <······@grex.org> wrote:
> Does this mean that if I take any of the code above and use a
> microsoft assembler I would be able to make a standalone program from
> this function (or anyone in CL)?
No.  The disassemble function is provided to allow the user to see how
the compiler is compiling their code.
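
A minimal sketch of that use, assuming only standard Common Lisp:
DISASSEMBLE prints its listing to *STANDARD-OUTPUT* and returns NIL, so
the text can be captured as a string for inspection. DISASSEMBLY-STRING
is a hypothetical helper name, not part of the standard.

```lisp
;; ADD is the function from the original post.
(defun add (x y) (+ x y))

;; Hypothetical helper (not a standard function): capture what
;; DISASSEMBLE prints instead of letting it go to the terminal.
(defun disassembly-string (function-designator)
  "Return the text DISASSEMBLE prints for FUNCTION-DESIGNATOR."
  (with-output-to-string (*standard-output*)
    (disassemble function-designator)))

;; The contents of the listing are implementation-dependent;
;; only the fact that something is printed is portable.
(format t "~A~%" (disassembly-string 'add))
```

The listing is meant for a human reader; nothing in the standard
promises that it can be fed back to an assembler.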

From: Dimiter "malkia" Stanev
Subject: Re: Trying to understand "disassemble"
Date: Fri, 23 Jan 2009 22:44:15 +0000
Message-ID: <gldh83$f4n$1@news.motzarella.org>

I can't speak for every implementation, but in principle you can; in 
practice, 99.9999% of the time, you can't.

Normally a C compiler comes with an ABI standard (Application Binary 
Interface) - it's basically a specification of how you call a 
function, how you pass the arguments (through the stack, registers, 
or both, and in which order), etc. It also says which registers need 
to be preserved across calls, and which can be used freely.

For example take a look at the ARM ABI:
http://www.arm.com/products/DevTools/ABI.html

There is no ABI for Lisps - e.g. having LispWorks as a DLL and Allegro 
as a DLL won't do any good - they can't call each other directly, as 
there is no established low-level protocol between them (such as an ABI).

But granted, this is what gives Lisp compilers the freedom to optimize 
in areas that don't even exist in the standard C language - for example, 
how closures are stored, how non-local gotos are done, exception 
handling, multiple values, optional arguments, garbage collection, 
boxing, unboxing, etc., and most importantly deep and shallow binding.

As "C" is more or less a machine-level language, it does not differ much 
from how most of the CPUs out there work, and that's why such 
interoperability is possible - e.g. a library or DLL compiled with one 
"C" implementation talking to another - even if each compiler might do 
things a little bit differently, there is almost always a compatibility 
mode (e.g. in gcc - do it the way msvc does in mingw, or do it the way 
the ARM ABI says).

But you know, a "C" ABI only goes as far as the hardware and the 
compilers being used are similar - for example, an ABI won't help you on 
the PlayStation 3, where you have PPU code calling SPU code, or when 
calling from one machine to another over the network, etc.

In such cases, solutions like CFFI, IDL, or anything else that packs up 
the arguments, the environment (if needed) and the function call, and 
then calls the other side, would work.

So in short - the disassembly is useful to check whether the Lisp 
compiler is doing alright. I'm mostly using it to cross-check certain 
calculations vs. "C" code - I'm usually not worried about the code size, 
but about the consing, or any time I see a GENERIC-+ call in a function 
that usually operates on one and the same type across large data - there 
is definitely room for improvement.
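
That check can be automated, roughly. The sketch below assumes the 
implementation annotates its listing with the routine's name, as the 
SBCL output in this thread does; DISASSEMBLY-MENTIONS-P and 
CALLS-ROUTINE-P are hypothetical helper names, and a plain text search 
is only a heuristic.

```lisp
;; Hypothetical helper: case-insensitive substring test on a listing.
(defun disassembly-mentions-p (name listing)
  "True if the string NAME occurs anywhere in the string LISTING."
  (not (null (search name listing :test #'char-equal))))

;; Hypothetical helper: capture the DISASSEMBLE output for a function
;; and look for NAME in it. Whether a name such as "GENERIC-+" shows
;; up at all depends on the implementation and its version.
(defun calls-routine-p (name function-designator)
  (disassembly-mentions-p
   name
   (with-output-to-string (*standard-output*)
     (disassemble function-designator))))

;; Example use, after ADD has been defined:
;;   (calls-routine-p "GENERIC-+" 'add)
```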

Just try this:

* (defun add (x y)
    (declare (fixnum x y)
             (optimize (speed 3) (safety 0) (space 0) (debug 0)))
    (the fixnum (+ x y)))
STYLE-WARNING: redefining ADD in DEFUN

ADD
* (disassemble 'add)

; 23C1598A:       01FA             ADD EDX, EDI               ; no-arg-parsing entry point
;       8C:       8D65F8           LEA ESP, [EBP-8]
;       8F:       F8               CLC
;       90:       8B6DFC           MOV EBP, [EBP-4]
;       93:       C20400           RET 4
;
NIL
*

As you can see, SBCL assumes that the first argument is in EDX (or EDI), 
the second in EDI (or EDX), and the result is in EDX - also, the function 
cleans up the stack itself on return (RET 4), kind of like the "Pascal" 
calling convention, instead of leaving that to the caller (as in "C").

But you would get totally different results in other compilers.
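
When comparing listings across compilers, it helps to record which 
Lisp produced each one. A small sketch using only standard functions; 
IMPLEMENTATION-BANNER and REPORT-DISASSEMBLY are hypothetical helper 
names.

```lisp
;; Hypothetical helper: describe the Lisp and machine in one line.
;; Any of the three standard functions may return NIL.
(defun implementation-banner ()
  "One-line description of the Lisp and machine producing a listing."
  (format nil ";; ~A ~A (~A)"
          (lisp-implementation-type)
          (lisp-implementation-version)
          (machine-type)))

;; Hypothetical helper: print the banner, then the listing.
(defun report-disassembly (function-designator)
  (format t "~&~A~%" (implementation-banner))
  (disassemble function-designator))

;; DISASSEMBLE also accepts a function object, so no DEFUN is needed.
(report-disassembly (lambda (x y) (+ x y)))
```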

From: D Herring
Subject: Re: Trying to understand "disassemble"
Date: Fri, 23 Jan 2009 22:46:04 +0000
Message-ID: <497a48a8$0$3340$6e1ede2f@read.cnntp.org>

Francogrex wrote:
> I'm trying to understand how disassemble works. Here is a test in 2
> implementations:
...
> Does this mean that if I take any of the code above and use a
> microsoft assembler I would be able to make a standalone program from
> this function (or anyone in CL)?

Not directly
- disassemble doesn't have to respect any external assembler's syntax
- it doesn't show the environment (libraries in memory, stack, 
registers, etc.) required to successfully call the code

But the general premise (that you see the raw assembly) should be true.

- Daniel

From: Thomas A. Russ
Subject: Re: Trying to understand "disassemble"
Date: Sat, 24 Jan 2009 00:31:55 +0000
Message-ID: <ymiskn98ryc.fsf@blackcat.isi.edu>

D Herring <········@at.tentpost.dot.com> writes:

> Francogrex wrote:
> > I'm trying to understand how disassemble works. Here is a test in 2
> > implementations:
> ...
> > Does this mean that if I take any of the code above and use a
> > microsoft assembler I would be able to make a standalone program from
> > this function (or anyone in CL)?
> 
> Not directly
> - disassemble doesn't have to respect any external assembler's syntax
> - it doesn't show the environment (libraries in memory, stack,
> registers, etc.) required to successfully call the code
> 
> But the general premise (that you see the raw assembly) should be
> true.

But there is the additional complication of Lisp implementations that
don't compile to native code, but rather to an intermediate byte-code
for a virtual machine.

In such an implementation I would expect DISASSEMBLE to show the byte
code and not assembler.
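
Note that the standard predicate does not tell the two cases apart. A
minimal sketch, assuming only standard Common Lisp (ENSURE-COMPILED is
a hypothetical helper name): COMPILED-FUNCTION-P says whether a
function has been compiled, but not whether the result is byte code or
native code. Only the DISASSEMBLE listing shows that.

```lisp
(defun f (x) (+ x 1))

;; Hypothetical helper: compile the function named NAME unless it is
;; already compiled, and return the function object.
(defun ensure-compiled (name)
  (unless (compiled-function-p (fdefinition name))
    (compile name))
  (fdefinition name))

;; True in both byte-code and native-code implementations.
(format t "~A~%" (compiled-function-p (ensure-compiled 'f)))

;; The listing itself is what differs between them.
(disassemble 'f)
```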

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: D Herring
Subject: Re: Trying to understand "disassemble"
Date: Sat, 24 Jan 2009 01:12:56 +0000
Message-ID: <497a6b15$0$3339$6e1ede2f@read.cnntp.org>

Thomas A. Russ wrote:
> D Herring <········@at.tentpost.dot.com> writes:
> 
>> Francogrex wrote:
>>> I'm trying to understand how disassemble works. Here is a test in 2
>>> implementations:
>> ...
>>> Does this mean that if I take any of the code above and use a
>>> microsoft assembler I would be able to make a standalone program from
>>> this function (or anyone in CL)?
>> Not directly
>> - disassemble doesn't have to respect any external assembler's syntax
>> - it doesn't show the environment (libraries in memory, stack,
>> registers, etc.) required to successfully call the code
>>
>> But the general premise (that you see the raw assembly) should be
>> true.
> 
> But there is the additional complication of lisp implementations that
> don't compile to native code, but rather to an intermediate byte-code
> for a virtual machine.
> 
> In such an implementation I would expect DISASSEMBLE to show the byte
> code and not assembler.
> 

Eh.  For a dead language, there sure are a bunch of options.

# clisp
[1]> (defun f (x) (+ x 1))
F
[2]> (disassemble #'f)

Disassembly of function F
1 required argument
0 optional arguments
No rest parameter
No keyword parameters
3 byte-code instructions:
0     (LOAD&PUSH 1)
1     (CALLS2 177)                        ; 1+
3     (SKIP&RET 2)
NIL
[3]>

Details...  You win just because nobody built the hardware yet.  ;)

- Daniel