From: Waldek Hebisch
Subject: Re: Optimize a function for speed
Date: Fri, 27 Feb 2009 03:54:21 +0000
Message-ID: <go7o5d$iqi$1@z-news.wcss.wroc.pl> Francogrex <······@grex.org> wrote:
> Hello, I have this run function but it's going really slow and I would
> like it to go much faster. I thought of hash-tables but how can we fit
> array structures into hash tables? It would be a learning experience
> for me if someone can give me a few hints on how to optimize this for
> speed. I am using SBCL. Thanks
>
On my machine (using sbcl-1.0.16) your original version takes 42.842s
1) As noted in Dimiter post access to nth element of the mlist is
takinig a lot of time. Using array instead of mlist reduced time to
1.615s
2) Using (proclaim '(optimize (speed 3) (safety 0))) to generate fast
but unsafe code reduced time to 1.105s
3) You have a big array where you store integes and floats -- such
array has siginificant cost. Since apparently you do not need
the array later the simplest thing is to eliminate array completly
and just use a bunch of variables to keep current result. This
reduced time to 0.384s
4) You use global variables and use almost no declarations. Using let
to introduce local variables and adding a few declarations reduced
time to 0.03s -- about 1300 times faster that original version.
Of course, if you need your mtable later you will have to keep it,
but al least you should you specialized integer array for indices
and separate float array for values. Also, indices are easily
computable -- recomputiong indices may be cheaper then storing them
up in the array.
If you use 32-bit machine and need big n you should replace
"the fixnum" in the progrma below by "the (signed byte 64)"
(but on 64-bit machine the second version gave me slower code).
Also, if you need better accuracy replace 'single-float' by
'double-float' and adjust constants (use 1.0d0 instead of 1.0, etc).
Anyway, my version below:
;; The while macro
(defmacro while (condition &rest body)
(let ((var (gensym)))
`(do ((,var nil (progn ,@body)))
((null ,condition) ,var))))
(defvar mlist (loop repeat 4000 collect (random 1.0))) ;;I need to use
;;; even 10000 repeats or more
;;The function to optimize
(defun run (mlist)
(let (
(n (1- (length mlist)))
(amlist (coerce (cdr mlist) '(simple-array single-float (*))))
(maxi 0)
(maxj 0)
(curPm 0.0)
(Pm 0.0)
(i 0))
(while (<= i (1- n))
(let (
( sP 0.0)
(j (1+ i)))
(declare (fixnum i j n))
(declare (single-float sP Pm))
(while (< j n)
(setf sP (+ sP (aref amlist (1- j))))
(setf curPm (if (< sP 1) (/ 1 (* (float (the fixnum (* (- j i)
(+ (- n j) (1+ i))))) sP)) 1.0))
(when (> curPm Pm) (setf
Pm curPm
maxi i
maxj j))
(incf j))
(incf i)))
(setf mtable2 (list maxi maxj Pm)))
)
(time (run mlist))
--
Waldek Hebisch
·······@math.uni.wroc.pl From: Dimiter "malkia" Stanev
Subject: Re: Optimize a function for speed
Date: Fri, 27 Feb 2009 09:27:18 +0000
Message-ID: <dc11482c-487e-49e4-af17-55032068a01e@v35g2000pro.googlegroups.com> FYI: A little more optimized version, based on Waldeks:
(defmacro while (condition &rest body)
(let ((var (gensym)))
`(do ((,var nil (progn ,@body)))
((null ,condition) ,var))))
(defvar mlist (loop repeat 4000 collect (random 1.0e0))) ;;I need to
use
;;; even 10000 repeats or more
;;The function to optimize
(defun run (mlist)
(declare (optimize (speed 3) (space 0) (debug 0) (compilation-speed
0) (safety 0)
#+lispworks (fixnum-safety 0)
#+lispworks (float 0)))
(let ((n (1- (length mlist)))
(amlist (coerce (cdr mlist) '(simple-array single-float
(*))))
(maxi 0)
(maxj 0)
(curPm 0.0e0)
(Pm 0.0e0)
(i 0))
(declare (fixnum i n maxi maxj))
(declare (single-float curPm Pm))
(declare ((simple-array single-float (*)) amlist))
(while (< i n)
(let ((sP 0.0e0)
(j (+ i 1)))
(declare (fixnum j))
(declare (single-float sP))
(while (< j n)
(setf sP (+ sP (aref amlist (- j 1))))
(setf curPm
(if (< sP 1.0e0)
(/ 1.0e0
(* (coerce (* (- j i) (+ (- n j) (+ i
1))) 'single-float)
sP))
1.0e0))
(when (> curPm Pm)
(setf Pm curPm
maxi i
maxj j))
(incf j))
(incf i)))
(values Pm maxi maxj)))
(time (run mlist))
; Under Lispworks 32-bit 5.1.2 (Mac OS X, PowerBook 2.6 Ghz Intel Core
2 Duo 4GB RAM 667 Mhz DDR2 SDRAM)
; User time = 0.031
; System time = 0.000
; Elapsed time = 0.031
; Allocation = 16060 bytes
; 0 Page faults
; Note that this time includes the coercion to vector, and I was lazy
to verify how much time it actually took.
; Ideally your data would be prepared in most suitable formats
(specialized arrays)
; So from 48 seconds, went to 0.031s - that's close to 1500 speedup
from the original version
Here is btw, the disassembly from Lispworks, look at this beauty (Yes
I know it's not the most optimized code, and Intel C/C++ would beat
it's pants, but for such a high-level language as Lisp this is simply
amazing):
And the thing is, a peephole optimizer would need just the generate
the bytecodes to optimize it even more (for example jge L8, jumping
back to L2 and adding 4 to a register).
And other lisp compilers might generate even better code than this...
CL-USER 19 > (disassemble 'run)
200C415A:
0: 55 push ebp
1: 89E5 move ebp, esp
3: 50 push eax
4: 50 push eax
5: 50 push eax
6: 50 push eax
7: FF35F4330020 push [200033F4] ; SYSTEM:*
%DYNAMIC-ENVIRONMENT
13: FF3530D76921 push [2169D730] ; MLIST
19: 681FD76921 push 2169D71F ; MLIST
24: 8925F4330020 move [200033F4], esp ; SYSTEM:*
%DYNAMIC-ENVIRONMENT
30: 890530D76921 move [2169D730], eax ; MLIST
36: 8B0530D76921 move eax, [2169D730] ; MLIST
42: B501 moveb ch, 1
44: FF152CE50320 call [2003E52C] ; LENGTH
50: 89C6 move esi, eax
52: 83EE04 sub esi, 4
55: 8975FC move [ebp-4], esi
58: 8B3D30D76921 move edi, [2169D730] ; MLIST
64: 83FF56 cmp edi, 56
67: 7403 je L1
69: 8B7F07 move edi, [edi+7]
L1: 72: 57 push edi
73: B502 moveb ch, 2
75: B865CC0F20 move eax, 200FCC65 ; (SIMPLE-ARRAY
SINGLE-FLOAT (*))
80: FF15B4761F20 call [201F76B4] ;
SYSTEM::COERCE-TO-VECTOR
86: 8945F0 move [ebp-10], eax
89: C745F800000000 move [ebp-8], 0
96: C745F400000000 move [ebp-C], 0
103: F30F102DB42E0F20 movss xmm5, [200F2EB4] ; 0.0
111: 33FF xor edi, edi
; BEGIN MAIN LOOP
L2: 113: 3B7DFC cmp edi, [ebp-4]
116: 0F8D80000000 jge L7
122: F30F1035B42E0F20 movss xmm6, [200F2EB4] ; 0.0
130: 8D5F04 lea ebx, [edi+4]
L3: 133: 3B5DFC cmp ebx, [ebp-4]
136: 0F8DB3000000 jge L8
142: 89DA move edx, ebx
144: 83EA04 sub edx, 4
147: 8B75F0 move esi, [ebp-10]
150: F30F10641605 movss xmm4, [esi+5+edx]
156: F30F58F4 addss xmm6, xmm4
160: F30F1025C42E0F20 movss xmm4, [200F2EC4] ; 1.0
168: 0F2FE6 comiss xmm4, xmm6
171: 7643 jbe L6
173: 89D8 move eax, ebx
175: 2BC7 sub eax, edi
177: 8B55FC move edx, [ebp-4]
180: 2BD3 sub edx, ebx
182: 8D4F04 lea ecx, [edi+4]
185: 03D1 add edx, ecx
187: C1FA02 sar edx, 2
190: 0FAFD0 imul edx, eax
193: C1FA02 sar edx, 2
196: F30F2ADA cvtsi2ss xmm3, edx
200: F30F59DE mulss xmm3, xmm6
204: F30F1025C42E0F20 movss xmm4, [200F2EC4] ; 1.0
212: F30F5EE3 divss xmm4, xmm3
L4: 216: 0F2FE5 comiss xmm4, xmm5
219: 760E jbe L5
221: F20F10EC movsd xmm5, xmm4
225: 89FA move edx, edi
227: 8955F8 move [ebp-8], edx
230: 89DA move edx, ebx
232: 8955F4 move [ebp-C], edx
L5: 235: 83C304 add ebx, 4
238: EB95 jmp L3
L6: 240: F30F1025C42E0F20 movss xmm4, [200F2EC4] ; 1.0
248: EBDE jmp L4
; END MAIN LOOP, BUT TAKE A LOOK ALSO AT Label L8
L7: 250: 83EC08 sub esp, 8
253: C7042486080000 move [esp], 886
260: F30F116C2404 movss [esp+4], xmm5
266: B501 moveb ch, 1
268: FF15CC0F1420 call [20140FCC] ; SYSTEM::RAW-
FAST-BOX-SINGLE
274: 8B7DF8 move edi, [ebp-8]
277: 8B5DF4 move ebx, [ebp-C]
280: 893D64300020 move [20003064], edi ; SYSTEM:*
%MULTIPLE-VALUE-0
286: 891D68300020 move [20003068], ebx ; T
292: FC cld
293: C7056030002000030000 move [20003060], 300 ; SYSTEM:*
%MULTIPLE-VALUE-COUNT
303: 8B25F4330020 move esp, [200033F4] ; SYSTEM:*
%DYNAMIC-ENVIRONMENT
309: 5E pop esi
310: 8F4611 pop [esi+11]
313: 8F05F4330020 pop [200033F4] ; SYSTEM:*
%DYNAMIC-ENVIRONMENT
319: C9 leave
320: C3 ret
L8: 321: 83C704 add edi, 4
324: E928FFFFFF jmp L2
329: 90 nop
NIL From: Raffael Cavallaro
Subject: Re: Optimize a function for speed
Date: Fri, 27 Feb 2009 18:19:05 +0000
Message-ID: <66f698a5-9f87-4c98-bc72-39ed2fe1b027@q11g2000yqh.googlegroups.com> On Feb 27, 4:27 am, "Dimiter \"malkia\" Stanev" <······@gmail.com>
wrote:
> FYI: A little more optimized version, based on Waldeks:
>
And here it is using loop instead of while-incf pair:
(defun run (mlist)
(declare (optimize (speed 3) (space 0) (debug 0) (compilation-speed
0) (safety 0)
#+lispworks (fixnum-safety 0)
#+lispworks (float 0)))
(let ((n (1- (length mlist)))
(amlist (coerce (cdr mlist) '(simple-array single-float
(*))))
(maxi 0)
(maxj 0)
(curPm 0.0e0)
(Pm 0.0e0))
(declare (fixnum n maxi maxj))
(declare (single-float curPm Pm))
(declare ((simple-array single-float (*)) amlist))
(loop for i fixnum below n do
(let ((sP 0.0e0))
(declare (single-float sP))
(loop for j fixnum from (1+ i) below n do
(setf sP (+ sP (aref amlist (- j 1))))
(setf curPm
(if (< sP 1.0e0)
(/ 1.0e0
(* (coerce (* (- j i) (+ (- n j) (+ i 1)))
'single-float)
sP))
1.0e0))
(when (> curPm Pm)
(setf Pm curPm
maxi i
maxj j)))))
(values Pm maxi maxj))) From: Alex Mizrahi
Subject: Re: Optimize a function for speed
Date: Fri, 27 Feb 2009 17:40:16 +0000
Message-ID: <49a8258a$0$90269$14726298@news.sunsite.dk> DmS> Here is btw, the disassembly from Lispworks,
why does it use XMM registers, simple FPU ain't cool anymore? From: William James
Subject: Re: Optimize a function for speed
Date: Fri, 27 Feb 2009 20:15:48 +0000
Message-ID: <go9hlk01rbj@enews2.newsguy.com> Dimiter "malkia" Stanev wrote:
> FYI: A little more optimized version, based on Waldeks:
>
> (defmacro while (condition &rest body)
> (let ((var (gensym)))
> `(do ((,var nil (progn ,@body)))
> ((null ,condition) ,var))))
>
> (defvar mlist (loop repeat 4000 collect (random 1.0e0))) ;;I need to
> use
> ;;; even 10000 repeats or more
> ;;The function to optimize
>
> (defun run (mlist)
> (declare (optimize (speed 3) (space 0) (debug 0) (compilation-speed
> 0) (safety 0)
> #+lispworks (fixnum-safety 0)
> #+lispworks (float 0)))
> (let ((n (1- (length mlist)))
> (amlist (coerce (cdr mlist) '(simple-array single-float
> (*))))
> (maxi 0)
> (maxj 0)
> (curPm 0.0e0)
> (Pm 0.0e0)
> (i 0))
> (declare (fixnum i n maxi maxj))
> (declare (single-float curPm Pm))
> (declare ((simple-array single-float (*)) amlist))
> (while (< i n)
> (let ((sP 0.0e0)
> (j (+ i 1)))
> (declare (fixnum j))
> (declare (single-float sP))
> (while (< j n)
> (setf sP (+ sP (aref amlist (- j 1))))
> (setf curPm
> (if (< sP 1.0e0)
> (/ 1.0e0
> (* (coerce (* (- j i) (+ (- n j) (+ i
> 1))) 'single-float)
> sP))
> 1.0e0))
> (when (> curPm Pm)
> (setf Pm curPm
> maxi i
> maxj j))
> (incf j))
> (incf i)))
> (values Pm maxi maxj)))
>
> (time (run mlist))
>
> ; Under Lispworks 32-bit 5.1.2 (Mac OS X, PowerBook 2.6 Ghz Intel Core
> 2 Duo 4GB RAM 667 Mhz DDR2 SDRAM)
> ; User time = 0.031
For comparison, here's a version that uses doubles instead of singles
and takes 0.082 seconds on a 2GHz laptop using the F# interpreter.
No type annotations are used.
#nowarn "62" ;;
let gen = new System.Random() ;;
let mlist = Array.init 4000 (fun _ -> gen.NextDouble() ) ;;
let run mlist =
let mlist = Array.sub mlist 1 (pred (Array.length mlist)) in
let n = Array.length mlist and
sp = ref 0.0 and
pm = ref 0.0 and
cur_pm = ref 0.0 and
maxi = ref 0 and
maxj = ref 0 in
for i = 0 to pred n do
sp := 0.0;
for j = succ i to pred n do
sp := !sp + mlist.[pred j] ;
cur_pm :=
if !sp >= 1.0 then 1.0 else
1.0 / ( float_of_int ((j - i) * (n - j + i + 1)) * !sp) ;
if !cur_pm > !pm then
( pm := !cur_pm ;
maxi := i;
maxj := j ) ;
done ;
done ;
printf "%d %d %f\n" !maxi !maxj !pm ;;
let timer = System.Diagnostics.Stopwatch.StartNew () in
run mlist;
printf "\n%d milliseconds\n" timer.ElapsedMilliseconds; From: Marco Antoniotti
Subject: Re: Optimize a function for speed
Date: Fri, 27 Feb 2009 20:53:08 +0000
Message-ID: <b8a2f83f-e598-4180-809e-78307e6f2cd6@b16g2000yqb.googlegroups.com> On Feb 27, 9:15 pm, "William James" <> wrote:
> Dimiter "malkia" Stanev wrote:
> > FYI: A little more optimized version, based on Waldeks:
>
> > (defmacro while (condition &rest body)
> > (let ((var (gensym)))
> > `(do ((,var nil (progn ,@body)))
> > ((null ,condition) ,var))))
>
> > (defvar mlist (loop repeat 4000 collect (random 1.0e0))) ;;I need to
> > use
> > ;;; even 10000 repeats or more
> > ;;The function to optimize
>
> > (defun run (mlist)
> > (declare (optimize (speed 3) (space 0) (debug 0) (compilation-speed
> > 0) (safety 0)
> > #+lispworks (fixnum-safety 0)
> > #+lispworks (float 0)))
> > (let ((n (1- (length mlist)))
> > (amlist (coerce (cdr mlist) '(simple-array single-float
> > (*))))
> > (maxi 0)
> > (maxj 0)
> > (curPm 0.0e0)
> > (Pm 0.0e0)
> > (i 0))
> > (declare (fixnum i n maxi maxj))
> > (declare (single-float curPm Pm))
> > (declare ((simple-array single-float (*)) amlist))
> > (while (< i n)
> > (let ((sP 0.0e0)
> > (j (+ i 1)))
> > (declare (fixnum j))
> > (declare (single-float sP))
> > (while (< j n)
> > (setf sP (+ sP (aref amlist (- j 1))))
> > (setf curPm
> > (if (< sP 1.0e0)
> > (/ 1.0e0
> > (* (coerce (* (- j i) (+ (- n j) (+ i
> > 1))) 'single-float)
> > sP))
> > 1.0e0))
> > (when (> curPm Pm)
> > (setf Pm curPm
> > maxi i
> > maxj j))
> > (incf j))
> > (incf i)))
> > (values Pm maxi maxj)))
>
> > (time (run mlist))
>
> > ; Under Lispworks 32-bit 5.1.2 (Mac OS X, PowerBook 2.6 Ghz Intel Core
> > 2 Duo 4GB RAM 667 Mhz DDR2 SDRAM)
> > ; User time = 0.031
>
> For comparison, here's a version that uses doubles instead of singles
> and takes 0.082 seconds on a 2GHz laptop using the F# interpreter.
>
> No type annotations are used.
>
> #nowarn "62" ;;
>
> let gen = new System.Random() ;;
> let mlist = Array.init 4000 (fun _ -> gen.NextDouble() ) ;;
>
> let run mlist =
> let mlist = Array.sub mlist 1 (pred (Array.length mlist)) in
> let n = Array.length mlist and
> sp = ref 0.0 and
> pm = ref 0.0 and
> cur_pm = ref 0.0 and
> maxi = ref 0 and
> maxj = ref 0 in
> for i = 0 to pred n do
> sp := 0.0;
> for j = succ i to pred n do
> sp := !sp + mlist.[pred j] ;
> cur_pm :=
> if !sp >= 1.0 then 1.0 else
> 1.0 / ( float_of_int ((j - i) * (n - j + i + 1)) * !sp) ;
> if !cur_pm > !pm then
> ( pm := !cur_pm ;
> maxi := i;
> maxj := j ) ;
> done ;
> done ;
> printf "%d %d %f\n" !maxi !maxj !pm ;;
>
> let timer = System.Diagnostics.Stopwatch.StartNew () in
> run mlist;
> printf "\n%d milliseconds\n" timer.ElapsedMilliseconds;
Shouldn't you be working on something else? There are two pending
code requests on you. You can also try the last one in F# if you
decided to drop Ruby in the last few days. :)
Cheers
--
Marco