I am interested in the speed of a Common Lisp function isqrt
for extremely big bignums.
I compared three functions:
fast-isqrt-mid1.....given below
fast-isqrt-rec......given below
isqrt...............a Common Lisp function
These functions should return the same value.
I tried it on the following three environments:
MACL.........Macintosh Allegro Common Lisp 1.3.2 on Mac SE
accelerated with 25 MHz 68020
lucid........Lucid Sun Common Lisp 4 on SPARCstation IPC
akcl.........Austin Kyoto Common Lisp 1.530 on SPARCstation IPC
I tried to calculate isqrt of:
(* 2 (expt 10 2000)).....to get 1000 digits of square root of 2
(* 2 (expt 10 6000)).....to get 3000 digits of square root of 2
(* 2 (expt 10 20000))....to get 10000 digits of square root of 2
Here is the result.
*** For 1000 digits ***
fast-isqrt-mid1 fast-isqrt-rec isqrt
MACL 1.250 sec 0.317 sec 174 sec
lucid 2.72 sec 0.63 sec 11.91 sec
akcl 2.400 sec 0.567 sec 2.483 sec
*** For 3000 digits ***
fast-isqrt-mid1 fast-isqrt-rec isqrt
MACL 12.233 sec 2.450 sec 4503 sec
lucid 27.68 sec 5.45 sec 102.12 sec
akcl 24.567 sec 5.100 sec 24.567 sec
*** For 10000 digits ***
fast-isqrt-mid1 fast-isqrt-rec isqrt
MACL 142 sec 35 sec very long
lucid 327.75 sec 80.31 sec 1126.10 sec
akcl 273.917 sec 64.367 sec 275.783 sec
Looking at this table,
(1) 25 MHz 68020 Macintosh can be as twice as faster than SPARCstation IPC
as far as it is concerned with division of bignums by bignums.
(2) akcl is a little bit faster than lucid at least as far as it is
concerned with division of bignums by bignums.
(3) isqrt = fast-isqrt-mid1 on akcl ?
I appreciate very much if you would evaluate the following forms
(isqrt-time-comparison (* 2 (expt 10 2000)) 1 1)
(isqrt-time-comparison (* 2 (expt 10 6000)) 1 1)
(isqrt-time-comparison (* 2 (expt 10 20000)) 1 1)
on a different environment, and e-mail it to me. The function
isqrt-time-comparison is given below.
Also, if you have a faster function, would you inform me?
Thank you.
---
(defun fast-isqrt-simple (n &aux init-value iterated-value)
"argument n must be a non-negative integer."
(cond
((zerop n)
0)
(t
(setq init-value 1)
; do iteration once
(setq iterated-value (floor (+ init-value (floor n init-value)) 2))
(setq init-value iterated-value)
(loop
; do iteration, same as above
(setq iterated-value (floor (+ init-value (floor n init-value)) 2))
(if (>= iterated-value init-value) (return init-value))
(setq init-value iterated-value)))))
(defun fast-isqrt-mid1 (n &aux n-len init-value iterated-value)
"argument n must be a non-negative integer"
(cond
((> n 24) ; theoretically (> n 0) ,i.e., n-len > 0
(setq n-len (integer-length n))
(if (evenp n-len)
(setq init-value (ash 1 (ash n-len -1)))
(setq init-value (ash 2 (ash n-len -1))))
(loop
(setq iterated-value (ash (+ init-value (floor n init-value)) -1))
(if (not (< iterated-value init-value))
(return init-value)
(setq init-value iterated-value))))
((> n 15) 4)
((> n 8) 3)
((> n 3) 2)
((> n 0) 1)
((> n -1) 0)
(t nil)))
(defun fast-isqrt-rec (n &aux n-len-quarter n-half n-half-isqrt
init-value iterated-value)
"argument n must be a non-negative integer"
(cond
((> n 24) ; theoretically (> n 7) ,i.e., n-len-quarter > 0
(setq n-len-quarter (ash (integer-length n) -2))
(setq n-half (ash n (- (ash n-len-quarter 1))))
(setq n-half-isqrt (fast-isqrt-rec n-half))
(setq init-value (ash (1+ n-half-isqrt) n-len-quarter))
(loop
(setq iterated-value (ash (+ init-value (floor n init-value)) -1))
(if (not (< iterated-value init-value))
(return init-value)
(setq init-value iterated-value))))
((> n 15) 4)
((> n 8) 3)
((> n 3) 2)
((> n 0) 1)
((> n -1) 0)
(t nil)))
(defun isqrt-time-comparison (n1 how-many iteration-times &aux n2)
(setq n2 (+ n1 how-many))
(print "do nothing")
(time
(let ((n n1))
(loop
(if (not (< n n2)) (return))
(dotimes (i-t iteration-times)
)
(incf n))))
(print "fast-isqrt-mid1")
(time
(let ((n n1))
(loop
(if (not (< n n2)) (return))
(dotimes (i-t iteration-times)
(fast-isqrt-mid1 n))
(incf n))))
(print "fast-isqrt-rec")
(time
(let ((n n1))
(loop
(if (not (< n n2)) (return))
(dotimes (i-t iteration-times)
(fast-isqrt-rec n))
(incf n))))
(print "isqrt")
(time
(let ((n n1))
(loop
(if (not (< n n2)) (return))
(dotimes (i-t iteration-times)
(isqrt n))
(incf n))))
nil)
----
Akira Kurihara
······@tansei.cc.u-tokyo.ac.jp
School of Mathematics
Japan Women's University
Mejirodai 2-8-1, Bunkyo-ku
Tokyo 112
Japan
03-3943-3131 ext-7243
I have done an evaluation this "isqrt" benchmark on CMU CL systems here at CMU.
I discovered that the results being computed were wrong due to bugs in
division. After these problems had been solved, the times were significantly
longer than those reported Miles, but still much faster than those of
commercial implementations. (generally 2-5x)
I'd like the remark about the apparent inconsistency of high CMU bignum
primitive performance v.s. poor performance on the built-in ISQRT.
Really there is no inconsistency. The bignum implementation has consistently
*not* been tuned. When we first got it running, it was already much faster
than the commercial implementations.
As to why CMU is faster, it's hard to say for sure. It isn't algorithmic
cleverness; we implemented Knuth fairly directly. My guess is that the biggest
factor is compiler technology: the CMU bignum code is written in Lisp using a
few extra primitives like 32 x 32 -> 64. Unlike the memory-to-memory
operations described in the Lucid bignum paper, these are register-to-register
operations. Python efficiently open-codes 32 bit integer operations (even
using the standard CL arithmetic/logic functions.)
In the case of division (which these benchmarks are intensive with), it is also
possible that we have implemented the 64 / 32 divide more efficiently. This
must be done as an assembly routine, since neither MIPS nor SPARC have an
extended divide instruction.
So what is the significance of this benchmark (and of bignum performance in
general)? Well, that obviously depends on how much you use bignums. These
sorts of large speedups for CMU are not seen for more conventional
"symbol-crunching" Lisp programs (20% is more common...) But if you are
working on mathematical or scientific software, you will like CMU CL's dramatic
superiority in bignum and floating-point arithmetic.
The fact that CMU bignums are written in Lisp and are public domain also makes
it easy for people doing serious work with extended-precision arithmetic to
efficiently add any new primitives they need.
As you may have inferred from Miles's message, CMU CL is up on SUNOS Sparcs.
However, we are not ready to release it yet. We will post to comp.lang.lisp.
Raw data:
The Lucid numbers were copied from the net. The SparcStation 1+ numbers were
done at CMU using allegro 4.0.1. Both CMU and Allegro numbers are elapsed
(real) time, not any sort of "CPU" time.
1000 digits
system\test mid1 rec
ds5000 CMU 0.38 0.09
" Lucid 1.34 0.32
ds3100 CMU 0.56 0.14
" Lucid 2.34 0.55
ss1+ CMU 0.77 0.27
" Allegro 1.65 0.38
3000 digits
system\test mid1 rec
ds5000 CMU 3.39 0.69
" Lucid 13.7 2.70
ds3100 CMU 4.88 1.02
" Lucid 23.9 4.71
ss1+ CMU 5.37 1.15
" Allegro 16.6 3.29
10000 digits
system\test mid1 rec
ds5000 CMU 38.7 10.5
" Lucid 161 39.6
ds3100 CMU 56.9 16.6
" Lucid 284 69.5
ss1+ CMU 64.5 17.6
" Allegro 198 48.2
Robert A. MacLachlan (···@cs.cmu.edu)