From: David Gadbois
Subject: Re: isqrt
Date: 
Message-ID: <19910606055232.5.GADBOIS@KISIN.ACA.MCC.COM>
    Date: 5 Jun 91 13:32:11 GMT
    From: ······@tansei.cc.u-tokyo.ac.jp (Akira Kurihara)


    I am interested in the speed of a Common Lisp function isqrt
    for extremely big bignums.

I added the results of running the test on a Symbolics machine.  The
results for the builtin ISQRT are surprising.  While I normally run
screaming from numeric code, I had a look at the source, and it uses one
of those clever bit-twiddling tricks that appears to calculate the
result in O( (INTEGER-LENGTH n) ) time.  One of the "numerical recipes"
books probably has an explanation of it.

    I compared three functions:

    fast-isqrt-mid1.....given below
    fast-isqrt-rec......given below
    isqrt...............a Common Lisp function

    These functions should return the same value.
    I tried it on the following three environments:

    MACL.........Macintosh Allegro Common Lisp 1.3.2 on Mac SE
		 accelerated with 25 MHz 68020
    lucid........Lucid Sun Common Lisp 4 on SPARCstation IPC
    akcl.........Austin Kyoto Common Lisp 1.530 on SPARCstation IPC
    XL1200.......Symbolics XL1200 running Genera 8.0.2

    I tried to calculate isqrt of:

    (* 2 (expt 10 2000)).....to get  1000 digits of square root of 2
    (* 2 (expt 10 6000)).....to get  3000 digits of square root of 2
    (* 2 (expt 10 20000))....to get 10000 digits of square root of 2

    Here is the result.

    *** For 1000 digits ***

	       fast-isqrt-mid1     fast-isqrt-rec         isqrt
    MACL          1.250 sec           0.317 sec        174       sec
    lucid         2.72  sec           0.63  sec         11.91    sec
    akcl          2.400 sec           0.567 sec          2.483   sec
    XL1200        0.757 sec           0.178 sec          0.00122 sec

    *** For 3000 digits ***

	       fast-isqrt-mid1     fast-isqrt-rec         isqrt
    MACL         12.233 sec           2.450 sec       4503       sec
    lucid        27.68  sec           5.45  sec        102.12    sec
    akcl         24.567 sec           5.100 sec         24.567   sec
    XL1200        7.257 sec           1.484 sec          0.00450 sec

    *** For 10000 digits ***

	       fast-isqrt-mid1     fast-isqrt-rec         isqrt
    MACL        142     sec          35     sec         very long
    lucid       327.75  sec          80.31  sec       1126.10   sec
    akcl        273.917 sec          64.367 sec        275.783  sec
    XL1200       84.780 sec          20.910 sec          0.0120 sec

    Looking at this table,

    (1) 25 MHz 68020 Macintosh can be as twice as faster than
    SPARCstation IPC as far as it is concerned with division of bignums
    by bignums.

I have found that MCL (nee MACL) is a really nice second-generation
Common Lisp.  In most of the cases I have tested, it does better than
the UNIX-based Lisps.  Plus, you can get a decent development platform
for under $10K.  The ISQRT implementation is evidently a disappointment,
though.

--David Gadbois

From: Mark Adler
Subject: Re: isqrt
Date: 
Message-ID: <23210@shlump.lkg.dec.com>
Here are the results I got on a DECsystem 3100 using Lucid Common Lisp.
By the way, when I compiled the code in production mode,
the isqrt time went to zero.  I assume that the compiler "optimized" the
calculation out of existence, presumably since the value was not used.
(This probably also explains the Syumbolics results reported in close to
zero time.)


;;; Lucid Common Lisp/DECsystem
;;; Development Environment Version 4.0, 12 November 1990
;;; Copyright (C) 1985, 1986, 1987, 1988, 1989, 1990 by Lucid, Inc.
;;; All Rights Reserved
;;;
;;; This software product contains confidential and trade secret information
;;; belonging to Lucid, Inc.  It may not be copied for any reason other than
;;; for archival and backup purposes.
;;;
;;; Lucid Common Lisp is a trademark of Lucid, Inc.
;;; DECsystem is a trademark of Digital Equipment Corp.

Lisp> (isqrt-time-comparison (* 2 (expt 10 2000)) 1 1)

"do nothing" 
Elapsed Real Time = 0.03 seconds
Total Run Time    = 0.00 seconds
User Run Time     = 0.00 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =        840

"fast-isqrt-mid1" 
Elapsed Real Time = 2.46 seconds
Total Run Time    = 2.36 seconds
User Run Time     = 2.34 seconds
System Run Time   = 0.02 seconds
Process Page Faults    =          2
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =     21,616

"fast-isqrt-rec" 
Elapsed Real Time = 0.59 seconds
Total Run Time    = 0.55 seconds
User Run Time     = 0.55 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =     10,256

"isqrt" 
Elapsed Real Time = 12.18 seconds
Total Run Time    = 11.68 seconds
User Run Time     = 11.54 seconds
System Run Time   = 0.14 seconds
Process Page Faults    =         57
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed = 10,502,096
There were 20 ephemeral GCs
NIL
Lisp> (isqrt-time-comparison (* 2 (expt 10 6000)) 1 1)

"do nothing" 
Elapsed Real Time = 0.00 seconds
Total Run Time    = 0.00 seconds
User Run Time     = 0.00 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =      2,496

"fast-isqrt-mid1" 
Elapsed Real Time = 24.14 seconds
Total Run Time    = 23.91 seconds
User Run Time     = 23.87 seconds
System Run Time   = 0.04 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =     74,088

"fast-isqrt-rec" 
Elapsed Real Time = 4.77 seconds
Total Run Time    = 4.72 seconds
User Run Time     = 4.71 seconds
System Run Time   = 0.01 seconds
Process Page Faults    =          0
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =     30,872

"isqrt" 
Elapsed Real Time = 97.92 seconds (1 minute, 37.92 seconds)
Total Run Time    = 96.18 seconds (1 minute, 36.18 seconds)
User Run Time     = 95.85 seconds (1 minute, 35.85 seconds)
System Run Time   = 0.33 seconds
Process Page Faults    =        105
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed = 93,533,248
There were 179 ephemeral GCs
NIL
Lisp> (isqrt-time-comparison (* 2 (expt 10 20000)) 1 1)

"do nothing" 
Elapsed Real Time = 0.00 seconds
Total Run Time    = 0.00 seconds
User Run Time     = 0.00 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          0
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =      8,312

"fast-isqrt-mid1" 
Elapsed Real Time = 291.34 seconds (4 minutes, 51.34 seconds)
Total Run Time    = 285.04 seconds (4 minutes, 45.04 seconds)
User Run Time     = 283.89 seconds (4 minutes, 43.89 seconds)
System Run Time   = 1.16 seconds
Process Page Faults    =         22
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =    262,072
There was 1 ephemeral GC

"fast-isqrt-rec" 
Elapsed Real Time = 70.17 seconds (1 minute, 10.17 seconds)
Total Run Time    = 69.63 seconds (1 minute, 9.63 seconds)
User Run Time     = 69.54 seconds (1 minute, 9.54 seconds)
System Run Time   = 0.09 seconds
Process Page Faults    =          3
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =    113,392

"isqrt" 
Elapsed Real Time = 1068.20 seconds (17 minutes, 48.20 seconds)
Total Run Time    = 1055.11 seconds (17 minutes, 35.11 seconds)
User Run Time     = 1052.86 seconds (17 minutes, 32.86 seconds)
System Run Time   = 2.25 seconds
Process Page Faults    =        381
Dynamic Bytes Consed   =    223,568
Ephemeral Bytes Consed = 1,035,828,048
There were 1989 ephemeral GCs
NIL
Lisp> 
From: Mark Adler
Subject: Re: isqrt
Date: 
Message-ID: <23216@shlump.lkg.dec.com>
I've added DS-3100 and DS-5000 to the list.


    MACL.........Macintosh Allegro Common Lisp 1.3.2 on Mac SE
		 accelerated with 25 MHz 68020
    lucid........Lucid Sun Common Lisp 4 on SPARCstation IPC
    akcl.........Austin Kyoto Common Lisp 1.530 on SPARCstation IPC
    XL1200.......Symbolics XL1200 running Genera 8.0.2
    DS-3100......Lucid Common Lisp 4.0 on DEC DS-3100
    DS-5000......Lucid Common Lisp 4.0 on DEC DS-5000


*** For 1000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL          1.250 sec           0.317 sec        174     sec
lucid         2.72  sec           0.63  sec         11.91  sec
akcl          2.400 sec           0.567 sec          2.483 sec
XL1200        0.757 sec           0.178 sec          0.00122 sec
DS-3100	      2.34  sec            .55  sec         11.54  sec
DS-5000       1.34  sec            .32  sec          6.57  sec

*** For 3000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL         12.233 sec           2.450 sec       4503     sec
lucid        27.68  sec           5.45  sec        102.12  sec
akcl         24.567 sec           5.100 sec         24.567 sec
XL1200        7.257 sec           1.484 sec          0.00450 sec
DS-3100	     23.87  sec           4.71  sec         95.85  sec
DS-5000      13.69  sec           2.70  sec         54.10  sec


*** For 10000 digits ***

           fast-isqrt-mid1     fast-isqrt-rec         isqrt
MACL        142     sec          35     sec         very long
lucid       327.75  sec          80.31  sec       1126.10  sec
akcl        273.917 sec          64.367 sec        275.783 sec
XL1200       84.780 sec          20.910 sec          0.0120 sec
DS-3100	    283.89  sec          69.54  sec       1052.86  sec    
DS-5000     161.16  sec          39.61  sec        605.57  sec
From: Bruce R. Miller
Subject: Re: isqrt
Date: 
Message-ID: <2885210567@ARTEMIS.cam.nist.gov>
In article <························@KISIN.ACA.MCC.COM>, David Gadbois writes: 
>     Date: 5 Jun 91 13:32:11 GMT
>     From: ······@tansei.cc.u-tokyo.ac.jp (Akira Kurihara)
> 
>     I am interested in the speed of a Common Lisp function isqrt
>     for extremely big bignums.
> 
> I added the results of running the test on a Symbolics machine.  The
> results for the builtin ISQRT are surprising.  While I normally run
> screaming from numeric code, I had a look at the source, and it uses one
> of those clever bit-twiddling tricks that appears to calculate the
> result in O( (INTEGER-LENGTH n) ) time.  One of the "numerical recipes"
> books probably has an explanation of it.

Actually, one of the compiler books probably has an explanation of it!

I did the same computations (but using Genera 8.0.1, in case it matters)
and got essentially the same results as you did.
I mailed the results directly to Akira -- with one change:

Apparently the symbolics knows that isqrt has no `real' side-effects and
optimizes it out -- the disassembled code reveals no call to isqrt --
and I doubt that isqrt is stashed in microcode! 
To trick the compiler, I changed each timing body to 
  (setq result n)
  (setq result (fast-isqrt-mid1 n))
  ...
  (setq result (isqrt n))
and redid the tests.  Everything was the same except the isqrt timings
which now are essentially the same as fast-isqrt-mid1 (which _is_ "one of
those clever bit-twiddling tricks" !). 

bruce
······@cam.nist.gov
From: John B. Boyland
Subject: Re: isqrt
Date: 
Message-ID: <14007@pasteur.Berkeley.EDU>
In article <··········@ARTEMIS.cam.nist.gov>, ······@FS1.cam.nist.gov
(Bruce R. Miller) writes:
|> 
|> In article <························@KISIN.ACA.MCC.COM>, David
Gadbois writes: 
|> >     Date: 5 Jun 91 13:32:11 GMT
|> >     From: ······@tansei.cc.u-tokyo.ac.jp (Akira Kurihara)
|> > 
|> >     I am interested in the speed of a Common Lisp function isqrt
|> >     for extremely big bignums.
|> > 
|> > I added the results of running the test on a Symbolics machine.  The
|> > results for the builtin ISQRT are surprising. [...]
|> 
|> [...]
|> Apparently the symbolics knows that isqrt has no `real' side-effects and
|> optimizes it out [...]

I'm relieved to hear this: I spent a few hours finding a method that runs
about twice as fast as fast-isqrt-rec and was disappointed that it was
rendered worse than obsolete.

I have a [lovely?, no!] proof, which unfortunately cannot fit in
this margin :-), for this algorithm.  Essentially it works by noting that
this every recursive call of fast-isqrt-rec does two or three iterations
and that the last iteration is never useful (it only verifies the answer).
The three iteration case needs to be checked for but this only requires
a multiply of numbers only half as long.

For allegro on a DS3100, I get the comparison:
(I don't have the figures for the other versions, I sent them to Akira already)

		fast-isqrt-rec	faster-isqrt-rec
1000		    .884              .517
3000		   7.30              4.02
10000            107.               43.8

Here's the function.  throw stones at it and make it break.
(Who knows, maybe it has subtle bugs?)
(It uses a lot of code from the old fast-isqrt-rec)

---------------------------------------------------------------------

(defun faster-isqrt-rec (n &aux n-len-quarter n-half n-half-isqrt
			     init-value q r iterated-value)
  "argument n must be a non-negative integer"
  (cond
   ((> n 24)		; theoretically (> n 7) ,i.e., n-len-quarter > 0
    (setq n-len-quarter (ash (integer-length n) -2))
    (setq n-half (ash n (- (ash n-len-quarter 1))))
    (setq n-half-isqrt (faster-isqrt-rec n-half))
    (setq init-value (ash (1+ n-half-isqrt) n-len-quarter))
    (multiple-value-setq (q r) (floor n init-value))
    (setq iterated-value (ash (+ init-value q) -1))
    (if (eq (logbitp 0 q) (logbitp 0 init-value)) ; same sign
	;; average is exact and we need to test the result
	(let ((m (- iterated-value init-value)))
	  (if (> (* m m) r)
	      (- iterated-value 1)
	      iterated-value))
	;; average was not exact, we take value
	iterated-value))
   ((> n 15) 4)
   ((> n  8) 3)
   ((> n  3) 2)
   ((> n  0) 1)
   ((> n -1) 0)
   (t nil)))
-------------------------------------------------------------------

John Boyland
·······@sequoia.berkeley.edu