SBCL 0.9.11 and 32-bit arithmetic

From: Tord Kallqvist Romstad
Subject: SBCL 0.9.11 and 32-bit arithmetic
Date: Sat, 01 Apr 2006 11:42:01 +0000
Message-ID: <gqk3bgxikty.fsf@europa.uio.no>

I'm trying to get started using SBCL 0.9.11 on an Intel iMac (on my
old Mac I used OpenMCL, which unfortunately doesn't run on Intel
Macs). Section 5.2 of the SBCL manual shows a way to do fast 32-bit
arithmetic with SBCL.  The following function is given as an example:

(defun i (x y)
  (declare (type (unsigned-byte 32) x y))
  (ldb (byte 32 0) (logxor x (lognot y))))

According to the manual, this is supposed to compile to native 32-bit
machine arithmetic on x86 CPUs.  This sounds cool.  The problem is
that it doesn't seem to work:

CL-USER> (disassemble 'i)
; 11B65DDA:       8BC6             MOV EAX, ESI               ; no-arg-parsing entry point
;      DDC:       F7D0             NOT EAX
;      DDE:       8BCB             MOV ECX, EBX
;      DE0:       31C1             XOR ECX, EAX
;      DE2:       F7C1000000E0     TEST ECX, 3758096384
;      DE8:       7522             JNE L1
;      DEA:       8D148D00000000   LEA EDX, [ECX*4]
;      DF1: L0:   8D65F8           LEA ESP, [EBP-8]
;      DF4:       F8               CLC
;      DF5:       8B6DFC           MOV EBP, [EBP-4]
;      DF8:       C20400           RET 4
;      DFB:       90               NOP
;      DFC:       90               NOP
;      DFD:       90               NOP
;      DFE:       90               NOP
;      DFF:       90               NOP
;      E00:       0F0B0A           BREAK 10                   ; error trap
;      E03:       02               BYTE #X02
;      E04:       18               BYTE #X18                  ; INVALID-ARG-COUNT-ERROR
;      E05:       0D               BYTE #X0D                  ; EAX
;      E06:       0F0B0A           BREAK 10                   ; error trap
;      E09:       02               BYTE #X02
;      E0A:       18               BYTE #X18                  ; INVALID-ARG-COUNT-ERROR
;      E0B:       4D               BYTE #X4D                  ; ECX
;      E0C: L1:   7907             JNS L2
;      E0E:       BA0A020000       MOV EDX, 522
;      E13:       EB05             JMP L3
;      E15: L2:   BA0A010000       MOV EDX, 266
;      E1A: L3:   C6057C02000804   MOV BYTE PTR [#x800027C], 4  ; unboxed_region
;      E21:       B810000000       MOV EAX, 16
;      E26:       030590783100     ADD EAX, [#x317890]        ; boxed_region
;      E2C:       3B0594783100     CMP EAX, [#x317894]        ; boxed_region
;      E32:       7607             JBE L4
;      E34:       E81BAF4AEE       CALL #x10D54               ; alloc_overflow_eax
;      E39:       EB09             JMP L5
;      E3B: L4:   890590783100     MOV [#x317890], EAX        ; boxed_region
;      E41:       83E810           SUB EAX, 16
;      E44: L5:   8910             MOV [EAX], EDX
;      E46:       8D5007           LEA EDX, [EAX+7]
;      E49:       894AFD           MOV [EDX-3], ECX
;      E4C:       C6057C02000800   MOV BYTE PTR [#x800027C], 0  ; unboxed_region
;      E53:       803D9402000800   CMP BYTE PTR [#x8000294], 0  ; unboxed_region
;      E5A:       7403             JEQ L6
;      E5C:       0F0B09           BREAK 9                    ; pending interrupt trap
;      E5F: L6:   EB90             JMP L0

Adding an (optimize (speed 3) (safety 0)) declaration doesn't change
anything.

Does anyone know what is wrong here?  Do I have to do something
special to enable the modular arithmetic optimizer?
 
-- 
Tord Romstad


From: Christophe Rhodes
Subject: Re: SBCL 0.9.11 and 32-bit arithmetic
Date: Sat, 01 Apr 2006 13:53:08 +0000
Message-ID: <sqpsk1cshn.fsf@cam.ac.uk>

Tord Kallqvist Romstad <·······@math.uio.no> writes:

> I'm trying to get started using SBCL 0.9.11 on an Intel iMac (on my
> old Mac I used OpenMCL, which unfortunately doesn't run on Intel
> Macs). Section 5.2 of the SBCL manual shows a way to do fast 32-bit
> arithmetic with SBCL.  The following function is given as an example:
>
> (defun i (x y)
>   (declare (type (unsigned-byte 32) x y))
>   (ldb (byte 32 0) (logxor x (lognot y))))
>
> According to the manual, this is supposed to compile to native 32-bit
> machine arithmetic on x86 CPUs.  This sounds cool.  The problem is
> that it doesn't seem to work:

It does, actually.

> CL-USER> (disassemble 'i)
> ; 11B65DDA:       8BC6             MOV EAX, ESI               ; no-arg-parsing entry point
> ;      DDC:       F7D0             NOT EAX       
> ;      DDE:       8BCB             MOV ECX, EBX
> ;      DE0:       31C1             XOR ECX, EAX

This bit is the native 32-bit arithmetic: (lognot y) is the second
line, (logxor x <result>) is the fourth.  By this stage, we have
computed the answer to your function.  However...

> ;      DE2:       F7C1000000E0     TEST ECX, 3758096384
> ;      DE8:       7522             JNE L1
> ;      DEA:       8D148D00000000   LEA EDX, [ECX*4]
> ;      DF1: L0:   8D65F8           LEA ESP, [EBP-8]
> ;      DF4:       F8               CLC
> ;      DF5:       8B6DFC           MOV EBP, [EBP-4]
> ;      DF8:       C20400           RET 4

This bit (and the consing sequence that I've snipped, down at L1) is
_returning_ the 32-bit value.  The caller of I must get a tagged lisp
object back, rather than a raw 32-bit value; since (LOGXOR X (LOGNOT
Y)) can have any 32-bit value, the return sequence must test the raw
32-bit value to see if it fits in a fixnum, and otherwise must
allocate a bignum for it.
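
To make that return sequence concrete, here is a rough sketch in
portable Common Lisp of the decision it appears to make.  It assumes
the 32-bit x86 layout suggested by the disassembly: two low tag bits
for fixnums (hence LEA EDX, [ECX*4]), and the constants 266 and 522
read as one-word and two-word bignum headers.  The names
RETURN-REPRESENTATION and TAG-AS-FIXNUM are hypothetical, chosen for
illustration; they are not SBCL internals.

```lisp
;; Illustrative model only: mirrors the checks visible in the
;; disassembly, assuming 32-bit words with two low fixnum tag bits.
(defconstant +fixnum-overflow-mask+ #xE0000000) ; the TEST operand, 3758096384

(defun return-representation (raw)
  "Classify how a raw 32-bit result would be handed back to the caller."
  (declare (type (unsigned-byte 32) raw))
  (cond
    ;; Top three bits clear: the value survives a two-bit left shift,
    ;; so the JNE L1 branch is not taken.
    ((zerop (logand raw +fixnum-overflow-mask+)) :fixnum)
    ;; Bit 31 set: assumed to need the larger header (522).
    ((logbitp 31 raw) :two-word-bignum)
    ;; Otherwise the JNS L2 path: assumed smaller header (266).
    (t :one-word-bignum)))

(defun tag-as-fixnum (raw)
  "Shift left by two bits, as LEA EDX, [ECX*4] does."
  (declare (type (unsigned-byte 29) raw))
  (ash raw 2))
```

Under these assumptions, (return-representation (expt 2 29)) is the
first input that leaves the fixnum path, which is why a function
returning an arbitrary 32-bit value cannot avoid the check.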

So, why is this useful?  Well, if you're just calling your I function
on random input, it isn't; however, if you have slightly longer
arithmetic sequences, as are found in cryptographic or hashing
functions, or if you are storing the result of your short arithmetic
sequence into an array of results, then the inner loop or bottleneck
of the routine is made to take much less time than if the arithmetic
were generic.
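
As a hedged illustration of the array case, here is a minimal sketch
that applies the same operation as I elementwise and stores each
result into a specialized (unsigned-byte 32) vector.  XOR-NOT-INTO is
a hypothetical name, and the sketch assumes XS and YS are at least as
long as DST.  Because the results go straight into a specialized
array, nothing needs to be tagged or boxed inside the loop when the
modular arithmetic optimization applies; whether a given SBCL version
does so is best confirmed with DISASSEMBLE.

```lisp
(defun xor-not-into (dst xs ys)
  "Store (logxor x (lognot y)), truncated to 32 bits, for each element pair."
  (declare (type (simple-array (unsigned-byte 32) (*)) dst xs ys)
           (optimize (speed 3)))
  (dotimes (k (length dst) dst)
    ;; Same expression as in I; the LDB keeps the result in 32 bits,
    ;; so it can be written to DST as a raw machine word.
    (setf (aref dst k)
          (ldb (byte 32 0)
               (logxor (aref xs k) (lognot (aref ys k)))))))
```

A call might look like this, with all three vectors created using
:element-type '(unsigned-byte 32):

```lisp
(let ((dst (make-array 3 :element-type '(unsigned-byte 32) :initial-element 0))
      (xs  (make-array 3 :element-type '(unsigned-byte 32)
                         :initial-contents '(0 #xFFFFFFFF 5)))
      (ys  (make-array 3 :element-type '(unsigned-byte 32)
                         :initial-contents '(0 0 5))))
  (xor-not-into dst xs ys))
```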

Christophe

From: Didier Verna
Subject: Re: SBCL 0.9.11 and 32-bit arithmetic
Date: Sat, 01 Apr 2006 13:59:58 +0000
Message-ID: <muxmzf5ieg1.fsf@uzeb.lrde.epita.fr>

Christophe Rhodes <·····@cam.ac.uk> wrote:

> It does, actually.
>
>> CL-USER> (disassemble 'i)
>> ; 11B65DDA:       8BC6             MOV EAX, ESI               ; no-arg-parsing entry point
>> ;      DDC:       F7D0             NOT EAX       
>> ;      DDE:       8BCB             MOV ECX, EBX
>> ;      DE0:       31C1             XOR ECX, EAX
>
> This bit is the native 32-bit arithmetic: (lognot y) is the second
> line, (logxor x <result>) is the fourth.  By this stage, we have
> computed the answer to your function.  However...
>
>> ;      DE2:       F7C1000000E0     TEST ECX, 3758096384
>> ;      DE8:       7522             JNE L1
>> ;      DEA:       8D148D00000000   LEA EDX, [ECX*4]
>> ;      DF1: L0:   8D65F8           LEA ESP, [EBP-8]
>> ;      DF4:       F8               CLC
>> ;      DF5:       8B6DFC           MOV EBP, [EBP-4]
>> ;      DF8:       C20400           RET 4
>
> This bit (and the consing sequence that I've snipped, down at L1) is
> _returning_ the 32-bit value. The caller of I must get a tagged lisp object
> back, rather than a raw 32-bit value; since (LOGXOR X (LOGNOT Y)) can have
> any 32-bit value, the return sequence must test the raw 32-bit value to see
> if it fits in a fixnum, and otherwise must allocate a bignum for it.

This makes me wonder: what would you advise for becoming familiar with
reading and understanding disassembled compiled Lisp code (especially for
somebody not at all fluent in assembler)? I'm aware of the page on this
matter at the CMUCL web site, but it is really short.


Thanks.

-- 
Didier Verna, ······@lrde.epita.fr, http://www.lrde.epita.fr/~didier

EPITA / LRDE, 14-16 rue Voltaire   Tel.+33 (1) 44 08 01 85
94276 Le Kremlin-Bicêtre, France   Fax.+33 (1) 53 14 59 22   ······@xemacs.org

From: Tord Kallqvist Romstad
Subject: Re: SBCL 0.9.11 and 32-bit arithmetic
Date: Sat, 01 Apr 2006 17:01:52 +0000
Message-ID: <gqkodzlgrgf.fsf@europa.uio.no>

Christophe Rhodes <·····@cam.ac.uk> writes:

> This bit (and the consing sequence that I've snipped, down at L1) is
> _returning_ the 32-bit value.  The caller of I must get a tagged lisp
> object back, rather than a raw 32-bit value; since (LOGXOR X (LOGNOT
> Y)) can have any 32-bit value, the return sequence must test the raw
> 32-bit value to see if it fits in a fixnum, and otherwise must
> allocate a bignum for it.

I see.  Thanks a lot for the explanation!  

-- 
Tord Romstad