isqrt

From: Akira Kurihara
Subject: isqrt
Date: Wed, 05 Jun 1991 18:32:11 +0000
Message-ID: <676362388.61@egsgate.FidoNet.Org>

I am interested in the speed of a Common Lisp function isqrt
for extremely big bignums.

I compared three functions:

fast-isqrt-mid1.....given below
fast-isqrt-rec......given below
isqrt...............a Common Lisp function

These functions should return the same value.
I tried it on the following three environments:

MACL.........Macintosh Allegro Common Lisp 1.3.2 on Mac SE
             accelerated with 25 MHz 68020
lucid........Lucid Sun Common Lisp 4 on SPARCstation IPC
akcl.........Austin Kyoto Common Lisp 1.530 on SPARCstation IPC

I tried to calculate isqrt of:

(* 2 (expt 10 2000)).....to get  1000 digits of square root of 2
(* 2 (expt 10 6000)).....to get  3000 digits of square root of 2
(* 2 (expt 10 20000))....to get 10000 digits of square root of 2

Here is the result.

*** For 1000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL          1.250 sec           0.317 sec        174     sec
lucid         2.72  sec           0.63  sec         11.91  sec
akcl          2.400 sec           0.567 sec          2.483 sec

*** For 3000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL         12.233 sec           2.450 sec       4503     sec
lucid        27.68  sec           5.45  sec        102.12  sec
akcl         24.567 sec           5.100 sec         24.567 sec

*** For 10000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL        142     sec          35     sec         very long
lucid       327.75  sec          80.31  sec       1126.10  sec
akcl        273.917 sec          64.367 sec        275.783 sec

Looking at this table,
(1) 25 MHz 68020 Macintosh can be as twice as faster than SPARCstation IPC
as far as it is concerned with division of bignums by bignums.
(2) akcl is a little bit faster than lucid at least as far as it is
concerned with division of bignums by bignums.
(3) isqrt = fast-isqrt-mid1 on akcl ?

I appreciate very much if you would evaluate the following forms

(isqrt-time-comparison (* 2 (expt 10 2000)) 1 1)

(isqrt-time-comparison (* 2 (expt 10 6000)) 1 1)

(isqrt-time-comparison (* 2 (expt 10 20000)) 1 1)

on a different environment, and e-mail it to me. The function
isqrt-time-comparison is given below.

Also, if you have a faster function, would you inform me?

Thank you.

---

(defun fast-isqrt-simple (n &aux init-value iterated-value)
  "argument n must be a non-negative integer."
  (cond
   ((zerop n)
    0)
   (t
    (setq init-value 1)
    ; do iteration once
    (setq iterated-value (floor (+ init-value (floor n init-value)) 2))
    (setq init-value iterated-value)
    (loop
      ; do iteration, same as above
      (setq iterated-value (floor (+ init-value (floor n init-value)) 2))
      (if (>= iterated-value init-value) (return init-value))
      (setq init-value iterated-value)))))


(defun fast-isqrt-mid1 (n &aux n-len init-value iterated-value)
  "argument n must be a non-negative integer"
  (cond
   ((> n 24)		; theoretically (> n 0) ,i.e., n-len > 0
    (setq n-len (integer-length n))
    (if (evenp n-len)
      (setq init-value (ash 1 (ash n-len -1)))
      (setq init-value (ash 2 (ash n-len -1))))
    (loop
      (setq iterated-value (ash (+ init-value (floor n init-value)) -1))
      (if (not (< iterated-value init-value))
        (return init-value)
        (setq init-value iterated-value))))
   ((> n 15) 4)
   ((> n  8) 3)
   ((> n  3) 2)
   ((> n  0) 1)
   ((> n -1) 0)
   (t nil)))


(defun fast-isqrt-rec (n &aux n-len-quarter n-half n-half-isqrt
                       init-value iterated-value)
  "argument n must be a non-negative integer"
  (cond
   ((> n 24)		; theoretically (> n 7) ,i.e., n-len-quarter > 0
    (setq n-len-quarter (ash (integer-length n) -2))
    (setq n-half (ash n (- (ash n-len-quarter 1))))
    (setq n-half-isqrt (fast-isqrt-rec n-half))
    (setq init-value (ash (1+ n-half-isqrt) n-len-quarter))
    (loop
      (setq iterated-value (ash (+ init-value (floor n init-value)) -1))
      (if (not (< iterated-value init-value))
        (return init-value)
        (setq init-value iterated-value))))
   ((> n 15) 4)
   ((> n  8) 3)
   ((> n  3) 2)
   ((> n  0) 1)
   ((> n -1) 0)
   (t nil)))


(defun isqrt-time-comparison (n1 how-many iteration-times &aux n2)
  (setq n2 (+ n1 how-many))
  (print "do nothing")
  (time
   (let ((n n1))
     (loop
       (if (not (< n n2)) (return))
       (dotimes (i-t iteration-times)
         )
       (incf n))))
  (print "fast-isqrt-mid1")
  (time
   (let ((n n1))
     (loop
       (if (not (< n n2)) (return))
       (dotimes (i-t iteration-times)
         (fast-isqrt-mid1 n))
       (incf n))))
  (print "fast-isqrt-rec")
  (time
   (let ((n n1))
     (loop
       (if (not (< n n2)) (return))
       (dotimes (i-t iteration-times)
         (fast-isqrt-rec n))
       (incf n))))
  (print "isqrt")
  (time
   (let ((n n1))
     (loop
       (if (not (< n n2)) (return))
       (dotimes (i-t iteration-times)
         (isqrt n))
       (incf n))))
  nil)

----

Akira Kurihara

······@tansei.cc.u-tokyo.ac.jp

School of Mathematics
Japan Women's University
Mejirodai 2-8-1, Bunkyo-ku
Tokyo 112
Japan
03-3943-3131 ext-7243

From: Rob MacLachlan
Subject: Re: isqrt
Date: Thu, 13 Jun 1991 00:31:22 +0000
Message-ID: <1991Jun13.003122.19863@cs.cmu.edu>

I have done an evaluation this "isqrt" benchmark on CMU CL systems here at CMU.
I discovered that the results being computed were wrong due to bugs in
division.  After these problems had been solved, the times were significantly
longer than those reported Miles, but still much faster than those of
commercial implementations.  (generally 2-5x)

I'd like the remark about the apparent inconsistency of high CMU bignum
primitive performance v.s. poor performance on the built-in ISQRT.
Really there is no inconsistency.  The bignum implementation has consistently
*not* been tuned.  When we first got it running, it was already much faster
than the commercial implementations.

As to why CMU is faster, it's hard to say for sure.  It isn't algorithmic
cleverness; we implemented Knuth fairly directly.  My guess is that the biggest
factor is compiler technology: the CMU bignum code is written in Lisp using a
few extra primitives like 32 x 32 -> 64.  Unlike the memory-to-memory
operations described in the Lucid bignum paper, these are register-to-register
operations.  Python efficiently open-codes 32 bit integer operations (even
using the standard CL arithmetic/logic functions.)

In the case of division (which these benchmarks are intensive with), it is also
possible that we have implemented the 64 / 32 divide more efficiently.  This
must be done as an assembly routine, since neither MIPS nor SPARC have an
extended divide instruction.

So what is the significance of this benchmark (and of bignum performance in
general)?  Well, that obviously depends on how much you use bignums.  These
sorts of large speedups for CMU are not seen for more conventional
"symbol-crunching" Lisp programs (20% is more common...)  But if you are
working on mathematical or scientific software, you will like CMU CL's dramatic
superiority in bignum and floating-point arithmetic.

The fact that CMU bignums are written in Lisp and are public domain also makes
it easy for people doing serious work with extended-precision arithmetic to
efficiently add any new primitives they need.

As you may have inferred from Miles's message, CMU CL is up on SUNOS Sparcs.
However, we are not ready to release it yet.  We will post to comp.lang.lisp.


Raw data:

The Lucid numbers were copied from the net.  The SparcStation 1+ numbers were
done at CMU using allegro 4.0.1.  Both CMU and Allegro numbers are elapsed
(real) time, not any sort of "CPU" time.

1000 digits

system\test	mid1	rec
ds5000 CMU	0.38	0.09
   "   Lucid	1.34	0.32
ds3100 CMU	0.56	0.14
   "   Lucid	2.34	0.55	
ss1+  CMU	0.77	0.27
   "  Allegro	1.65	0.38

3000 digits

system\test	mid1	rec
ds5000 CMU	3.39	0.69
   "   Lucid   13.7	2.70
ds3100 CMU	4.88	1.02
   "   Lucid   23.9	4.71
ss1+ CMU	5.37	1.15
   " Allegro   16.6	3.29

10000 digits

system\test	mid1	rec
ds5000 CMU	38.7	10.5
   "   Lucid   161	39.6
ds3100 CMU	56.9	16.6
   "   Lucid   284	69.5
ss1+  CMU	64.5	17.6
   "  Allegro  198	48.2

  Robert A. MacLachlan (···@cs.cmu.edu)