Hoi,
I've been trying for a while to get an efficient implementation of the
Miller-Rabin algorithm (a probabilistic pseudoprime test).
After profiling my implementation I found that one of the most critical
parts is my expt-mod function, which calculates (mod (expt base exponent) n)
more efficiently than doing it directly.
Here it is:
(defun expt-mod (number exponent modulus)
  (loop :with result = 1
        :for i of-type fixnum :from 0 :below (integer-length exponent)
        :for sqr = number :then (mod (* sqr sqr) modulus)
        :when (logbitp i exponent) :do
          (setf result (mod (* result sqr) modulus))
        :finally (return result)))
How can I make this function faster?
I would also welcome completely different ideas on implementing this
modular exponentiation.
If someone is interested in the rest of the code, I can upload it to my
web page.
Regards,
Jochen
http://www.dataheaven.de
Jochen Schmidt <···@dataheaven.de> writes:
> I would also welcome completely different ideas on implementing this
> modular exponentiation.
>
Your method seems to require at least N+1 multiplications. You can do
better with Knuth's Algorithm A in 4.6.3. Another useful discussion of
this problem is in Chapter 14 of the Handbook of Applied Cryptography,
which you can download at http://www.cacr.math.uwaterloo.ca/hac/.
--
Lieven Marchand <···@wyrd.be>
Glaðr ok reifr skyli gumna hverr, unz sinn bíðr bana.
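[For reference, a minimal sketch of the left-to-right binary method, one of
the square-and-multiply variants discussed in Knuth 4.6.3 and HAC chapter 14.
The function name is made up for illustration; this is not Knuth's Algorithm A
verbatim, just the same family of idea scanning the exponent from the top bit
down.]

```lisp
;; Left-to-right binary exponentiation: scan the exponent from its most
;; significant bit downward, squaring the accumulator at every step and
;; multiplying in the base only when the current bit is set.
(defun expt-mod-l2r (base exponent modulus)
  (let ((result 1)
        (base (mod base modulus)))
    (loop :for i :of-type fixnum
          :from (1- (integer-length exponent)) :downto 0
          :do (setf result (mod (* result result) modulus))
          :when (logbitp i exponent)
            :do (setf result (mod (* result base) modulus)))
    result))
```

One nicety of the left-to-right order is that the multiply step always uses
the original (reduced) base, which stays small even when the accumulator is a
full-width bignum.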
Lieven Marchand wrote:
> Jochen Schmidt <···@dataheaven.de> writes:
>
>> I would also welcome completely different ideas on implementing this
>> modular exponentiation.
>>
>
> Your method seems to require at least N+1 multiplications. You can do
> better with Knuth's Algorithm A in 4.6.3. Another useful discussion of
> this problem is in Chapter 14 of the Handbook of Applied Cryptography,
> which you can download at http://www.cacr.math.uwaterloo.ca/hac/.
I don't have Knuth's book - can you give me some pointers to where I can
find something about it? Or possibly a name I can search for, e.g.
"Montgomery Exponentiation" or "Sliding Window Exponentiation".
Thanks,
Jochen
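[For what it's worth, the 2^k-ary (fixed-window) method, a simpler relative
of the sliding-window technique Jochen asks about, can be sketched as below.
The function name and the window width of 4 are illustrative choices, not
tuned values.]

```lisp
;; Fixed 4-bit window (2^k-ary) modular exponentiation: precompute
;; base^0 .. base^(2^k - 1) mod m once, then consume the exponent k bits
;; at a time.  Each window costs k squarings plus one table multiply,
;; which saves multiplications over bit-at-a-time methods for big
;; exponents.
(defun expt-mod-window (base exponent modulus &optional (k 4))
  (let* ((base (mod base modulus))
         (table (make-array (ash 1 k))))
    ;; table[i] = base^i mod modulus
    (setf (aref table 0) 1)
    (loop :for i :from 1 :below (ash 1 k)
          :do (setf (aref table i)
                    (mod (* (aref table (1- i)) base) modulus)))
    (let ((result 1))
      (loop :for pos :from (* k (ceiling (integer-length exponent) k))
                     :downto k :by k
            :for digit = (ldb (byte k (- pos k)) exponent)
            :do (loop :repeat k
                      :do (setf result (mod (* result result) modulus)))
                (setf result (mod (* result (aref table digit)) modulus)))
      result)))
```

A true sliding window additionally skips over runs of zero bits and only
tables odd powers, but the fixed-window version already shows the core
trade-off: table space against fewer modular multiplies.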
Jochen Schmidt <···@dataheaven.de> writes:
> Lieven Marchand wrote:
>
> > Jochen Schmidt <···@dataheaven.de> writes:
> >
> >> I would also welcome completely different ideas on implementing this
> >> modular exponentiation.
> >>
> >
> > Your method seems to require at least N+1 multiplications. You can do
> > better with Knuth's Algorithm A in 4.6.3. Another useful discussion of
> > this problem is in Chapter 14 of the Handbook of Applied Cryptography,
> > which you can download at http://www.cacr.math.uwaterloo.ca/hac/.
>
> I don't have Knuth's book - can you give me some points where I can find
> something about it? Or possibly a name I can search for e. g "Montgomery
> Exponentiation" or "Exponentiation by Window Sliding".
Knuth, The Art of Computer Programming, Vol.2 Seminumerical
algorithms. You can get by with the Handbook for practical
implementation.
--
Lieven Marchand <···@wyrd.be>
Glaðr ok reifr skyli gumna hverr, unz sinn bíðr bana.
Jochen Schmidt <···@dataheaven.de> writes:
> Hoi,
> I've tried a while to get an efficient implementation of the miller-rabin
> algorithm (probabilistic pseudoprime test).
> After profiling my implementation I found out that one of the most critical
> parts is my expt-mod function which calculates (mod (expt base exponent) n)
> in a more efficient manner than directly.
>
> Here is it:
>
> (defun expt-mod (number exponent modulus)
> (loop :with result = 1
> :for i of-type fixnum :from 0 :below (integer-length exponent)
> :for sqr = number :then (mod (* sqr sqr) modulus)
> :when (logbitp i exponent) :do
> (setf result (mod (* result sqr) modulus))
> :finally (return result)))
>
> How can I make this function faster?
More declarations? On cmucl, this:
(defun his-expt-mod (number exponent modulus)
  (declare (fixnum number exponent modulus)
           (optimize (speed 3) (safety 0)))
  (setf number (mod number modulus))
  (loop :with result :of-type fixnum = 1
        :for i :of-type fixnum :from 0 :below (integer-length exponent)
        :for sqr = number :then (mod (the fixnum (* sqr sqr)) modulus)
        :when (logbitp i exponent) :do
          (setf result (mod (the fixnum (* result sqr)) modulus))
        :finally (return result)))
seems to be about 4 times faster than the above. Of course, you then
have to be certain that none of the operations will overflow for the
parameters you pass in (basically this means (* modulus modulus) is a
fixnum).
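[That precondition can be checked up front; a trivial helper (hypothetical
name) that a caller could assert before using the fixnum-declared version:]

```lisp
;; With (safety 0) an overflowing fixnum multiply silently produces
;; garbage, so verify that modulus^2 still fits in a fixnum, which
;; bounds every intermediate product (mod ... modulus) can feed in.
(defun fixnum-safe-modulus-p (modulus)
  (typep (* modulus modulus) 'fixnum))
```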
> I would also welcome completely different ideas on implementing this
> modular exponentiation.
I don't know of any radically different way of doing it; I'd write:
(defun my-expt-mod (number exponent modulus)
  (declare (fixnum number exponent modulus)
           (optimize (speed 3) (safety 0)))
  (setf number (mod number modulus))
  (let ((result 1))
    (declare (fixnum result))
    (loop :while (not (zerop exponent))
          :if (oddp exponent)
            :do (setf result
                      (mod (the fixnum (* result number)) modulus))
          :do (setf number (mod (the fixnum (* number number)) modulus)
                    exponent (truncate exponent 2)))
    result))
which seems to be a little quicker (on my machine, with my lisp, with
this phase of the moon, etc), but it's basically doing the same thing.
Cheers,
M.
--
First time I've gotten a programming job that required a drug
test. I was worried they were going to say "you don't have enough
LSD in your system to do Unix programming". -- Paul Tomblin
-- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html
Michael Hudson wrote:
> Jochen Schmidt <···@dataheaven.de> writes:
> More declarations? On cmucl, this:
>
> (defun his-expt-mod (number exponent modulus)
> (declare (fixnum number exponent modulus)
> (optimize (speed 3) (safety 0)))
> (setf number (mod number modulus))
> (loop :with result :of-type fixnum = 1
> :for i :of-type fixnum :from 0 :below (integer-length exponent)
> :for sqr = number :then (mod (the fixnum (* sqr sqr)) modulus)
> :when (logbitp i exponent) :do
> (setf result (mod (the fixnum (* result sqr)) modulus))
> :finally (return result)))
>
> seems to be about 4 times faster than the above. Of course, you then
> have to be certain that none of the operations will overflow for the
> parameters you pass in (basically this means (* modulus modulus) is a
> fixnum).
Yes, sure, but as this function is used for finding big primes and for
cryptographic algorithms like RSA, ElGamal and Diffie-Hellman, it would not
make much sense to cut the arguments to fixnum length.
The encryption exponent of RSA, e.g., could easily be a number > 1e+400.
Maybe the prime calculation cannot be optimized much further...
The implementation I posted earlier is two times faster than a former
implementation that used (floor exponent 2) to "walk" through the exponent.
I'll post some more code below, so maybe the wizards among you can give me
some hints on making it better.
>> I would also welcome completely different ideas on implementing this
>> modular exponentiation.
>
> I don't know of any radically different way of doing it; I'd write:
>
> (defun my-expt-mod (number exponent modulus)
> (declare (fixnum number exponent modulus)
> (optimize (speed 3) (safety 0)))
> (setf number (mod number modulus))
> (let ((result 1))
> (declare (fixnum result))
> (loop :while (not (zerop exponent))
> :if (oddp exponent)
> :do (setf result
> (mod (the fixnum (* result number)) modulus))
> :do (setf number (mod (the fixnum (* number number)) modulus)
> exponent (truncate exponent 2)))
> result))
>
> which seems to be a little quicker (on my machine, with my lisp, with
> this phase of the moon, etc), but it's basically doing the same thing.
I have not tried it yet, but one of my implementations used (floor ...) in a
similar manner to your (truncate ...). That implementation was slower
and consed _much_ more. As we are working with bignums here, such an
operation can easily cons heavily. That is why I tried not to change the
exponent at all and to only logbitp over it.
Here are some of the other functions:
;; The odd primes below 2000
(defparameter *primeset*
  (coerce (make-array 302
                      :element-type 'fixnum
                      :initial-contents
                      '(3 5 7 11 13 17 19 23 29 31 37 41
                        43 47 53 59 61 67 71 73 79 83 89 97 101 103
                        107 109 113 127 131 137 139 149 151 157 163
                        167 173 179 181 191 193 197 199 211 223 227
                        229 233 239 241 251 257 263 269 271 277 281
                        283 293 307 311 313 317 331 337 347 349 353
                        359 367 373 379 383 389 397 401 409 419 421
                        431 433 439 443 449 457 461 463 467 479 487
                        491 499 503 509 521 523 541 547 557 563 569
                        571 577 587 593 599 601 607 613 617 619 631
                        641 643 647 653 659 661 673 677 683 691 701
                        709 719 727 733 739 743 751 757 761 769 773
                        787 797 809 811 821 823 827 829 839 853 857
                        859 863 877 881 883 887 907 911 919 929 937
                        941 947 953 967 971 977 983 991 997 1009 1013
                        1019 1021 1031 1033 1039 1049 1051 1061 1063
                        1069 1087 1091 1093 1097 1103 1109 1117 1123
                        1129 1151 1153 1163 1171 1181 1187 1193 1201
                        1213 1217 1223 1229 1231 1237 1249 1259 1277
                        1279 1283 1289 1291 1297 1301 1303 1307 1319
                        1321 1327 1361 1367 1373 1381 1399 1409 1423
                        1427 1429 1433 1439 1447 1451 1453 1459 1471
                        1481 1483 1487 1489 1493 1499 1511 1523 1531
                        1543 1549 1553 1559 1567 1571 1579 1583 1597
                        1601 1607 1609 1613 1619 1621 1627 1637 1657
                        1663 1667 1669 1693 1697 1699 1709 1721 1723
                        1733 1741 1747 1753 1759 1777 1783 1787 1789
                        1801 1811 1823 1831 1847 1861 1867 1871 1873
                        1877 1879 1889 1901 1907 1913 1931 1933 1949
                        1951 1973 1979 1987 1993 1997 1999))
          '(simple-array fixnum (302))))
;;; Test if n is a prime
(defun primep (n &key (trials 15.))
  (let ((+n (abs n)))
    (cond ((< +n 2) nil)
          ((= +n 2) t)
          ((= +n 3) t)
          ;; ((and (> +n 100)
          ;;       (not (= 1 (gcd +n 223092870)))) ;; = (* 2 3 5 7 11
          ;;                                       ;;      13 17 19 23)
          ;;  nil)
          ((and (> +n 2000)
                (not-primep +n)) nil)
          (t (multiple-value-bind (r s) (%calc-r-s +n)
               (loop repeat trials
                     for a of-type integer = (+ (random (- +n 3)) 2)
                     never (not (%miller-rabin-test +n a r s))))))))
In this function the candidate prime n is first divided by the primes in
*primeset* via the function (not-primep ...). If it is divisible by one of
these primes, it is a composite number, and therefore we can conclude for
sure that n is not prime. This check is done to speed up the whole
process...
The commented-out lines contain a similar approach that uses the (gcd ...)
with a product of several small primes.
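[The gcd approach can be sketched in a few lines; 223092870 is the product
of the primes 2 through 23, so a nontrivial gcd with it proves the candidate
has a small factor. The helper name is made up for illustration.]

```lisp
;; A single gcd against the primorial 223092870 = 2*3*5*7*11*13*17*19*23
;; replaces nine trial divisions.  The (> n 23) guard keeps the small
;; primes themselves from being flagged as composite.
(defun small-factor-p (n)
  (and (> n 23)
       (/= 1 (gcd n 223092870))))
```

Whether this beats trial division in practice depends on how fast bignum gcd
is relative to bignum mod in the implementation, which is presumably why the
lines were left commented out pending measurement.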
;;; Cheap primality test: divide n by all primes in *primeset*
(defun not-primep (n)
  (declare (type (simple-array fixnum (*)) *primeset*))
  (loop :for prime :of-type fixnum :across *primeset*
        :thereis (zerop (mod n prime))))
;;; Calculate r and s so that n-1 = 2^s * r, with r odd
(defun %calc-r-s (n)
  (loop :with n-1 of-type integer = (1- n)
        :for s of-type fixnum :upfrom 0
        :until (logbitp s n-1)
        :finally (return (values (ash n-1 (- s)) s))))
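[A worked example of that decomposition, written standalone so it does not
depend on the function above: for n = 561, n-1 = 560 = 2^4 * 35, so r = 35
and s = 4.]

```lisp
;; Recompute the n-1 = 2^s * r split for n = 561 by hand: s is the
;; index of the lowest set bit of 560, and r is 560 shifted right by s.
(let* ((n-1 560)
       (s (loop :for i :of-type fixnum :upfrom 0
                :when (logbitp i n-1) :return i))
       (r (ash n-1 (- s))))
  (list r s))
```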
;;; Return nil if n is a composite number and t if it is (probably) a prime
(defun %miller-rabin-test (n a r s)
  (let ((y (expt-mod a r n))
        (n-1 (1- n)))
    (if (and (/= y 1)
             (/= y n-1))
        (loop :for j of-type fixnum :upfrom 1
              :while (and (<= j (1- s))
                          (/= y n-1))
              :do (setf y (mod (* y y) n))
              :never (= y 1)
              :finally (return (= y n-1)))
        t)))
;;; Search for the next prime iteratively
(defun find-prime (n &key (base 2) (test #'primep))
  (let ((number (let ((rnd (random (expt base n))))
                  (if (oddp rnd)
                      rnd
                      (1+ rnd)))))
    (loop for prime of-type integer = number then (+ prime 2)
          until (funcall test prime)
          finally (return prime))))
Here are two runs of the cmucl profiler:
Seconds | Consed | Calls | Sec/Call | Name:
------------------------------------------------------
27.050 | 1,582,254 | 27 | 1.00185 | EXPT-MOD
0.199 | -1,475,120 | 91 | 0.00219 | NOT-PRIMEP
0.020 | 59,614 | 27 | 0.00073 | %MILLER-RABIN-TEST
0.019 | 64,644 | 91 | 0.00021 | PRIMEP
0.000 | 1,026 | 13 | 0.00000 | %CALC-R-S
0.000 | 13,612 | 1 | 0.00000 | FIND-PRIME
------------------------------------------------------
27.288 | 246,030 | 250 | | Total
Seconds | Consed | Calls | Sec/Call | Name:
------------------------------------------------------
108.489 | 2,700,042 | 107 | 1.01392 | EXPT-MOD
0.737 | -2,959,914 | 547 | 0.00135 | NOT-PRIMEP
0.043 | 311,204 | 547 | 0.00008 | PRIMEP
0.039 | 188,924 | 107 | 0.00036 | %MILLER-RABIN-TEST
0.004 | 63,294 | 2 | 0.00222 | FIND-PRIME
0.000 | 8,098 | 79 | 0.00000 | %CALC-R-S
------------------------------------------------------
109.313 | 311,648 | 1,389 | | Total
The times depend on how far the first random prime candidate is from the
next prime, but because of the fairly even distribution of the primes, the
times to find a 1024-bit prime are mostly between 30 and 50 seconds on my
AMD K6-2 300. The CMUCL profiler seems to show clearly that the work lies
in EXPT-MOD, which seems to need around a second per call.
Jochen Schmidt wrote:
...
> The times depend on how wide the first random candidate-prime is away from
> the next prime, but because of the uniform distribution of the primes, the
> times to find a 1024 bit prime are mostly between 30 and 50 seconds on my
> AMD-K6-2 300. The cmucl profiler seems to show clearly that the work lies
> in EXPT-MOD which seems to need around a second/call.
For an alternative bignum multiplication implementation for
CMUCL using Karatsuba multiplication and scaling better for
large numbers see:
http://www2.cons.org:8000/ftp-area/cmucl/experimental/bignum-mult.lisp
This may double performance, but for better performance you'll
need to write a custom bignum expt-mod. See the above code for
an example of writing bignum code for CMUCL.
Regards
Douglas Crosher
"Douglas T. Crosher" <···@scieneer.com> writes:
> For an alternative bignum multiplication implementation for
> CMUCL using Karatsuba multiplication and scaling better for
> large numbers see:
>
> http://www2.cons.org:8000/ftp-area/cmucl/experimental/bignum-mult.lisp
>
> This may double performance, but for better performance you'll
> need to write a custom bignum expt-mod. See the above code for
> an example of writing bignum code for CMUCL.
Don't be too sure about the performance benefit in general
circumstances. There's a paper about the Lucid implementation in an
ACM proceedings volume that describes their bignum implementation with the
classical algorithms and says that they rejected the asymptotically
faster methods because of a sort of Zipf's law: almost all
integers are fixnums; of those that are not, most are double
the size of fixnums, etc.
--
Lieven Marchand <···@wyrd.be>
Glaðr ok reifr skyli gumna hverr, unz sinn bíðr bana.
>>>>> "Lieven" == Lieven Marchand <···@wyrd.be> writes:
Lieven> "Douglas T. Crosher" <···@scieneer.com> writes:
>> For an alternative bignum multiplication implementation for
>> CMUCL using Karatsuba multiplication and scaling better for
>> large numbers see:
>>
>> http://www2.cons.org:8000/ftp-area/cmucl/experimental/bignum-mult.lisp
>>
>> This may double performance, but for better performance you'll
>> need to write a custom bignum expt-mod. See the above code for
>> an example of writing bignum code for CMUCL.
Lieven> Don't be too sure about the performance benefit in general
Lieven> circumstances. There's a paper about the Lucid implementation in an
Lieven> ACM proceeding that describes their bignum implementation with the
Lieven> classical algorithms and says that they rejected the asymptotically
Lieven> faster methods because of a sort of Zipf's law that almost all
Lieven> integers are fixnums, of those that are not the most part are double
Lieven> the size of fixnums etc.
I don't think Douglas was saying to use this method for all bignum
multiplies---just for the really big bignums. In fact, I'm pretty
sure the code uses the standard multiply for "small" numbers and then
switches to Karatsuba multiplication only when the numbers are big
enough, about 512 or 1024 bits or larger.
Ray
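[The threshold switch Ray describes can be sketched on ordinary integers,
with the caveat that the real CMUCL code works directly on the underlying
digit vectors. The cutoff value is illustrative, not tuned, and nonnegative
inputs are assumed.]

```lisp
;; Threshold-switched Karatsuba: below the cutoff fall back to the
;; built-in multiply; above it, split each factor in half and compute
;; three recursive products instead of four, using
;;   x*y = z2*2^(2h) + z1*2^h + z0
;; where z1 = (x1+x0)(y1+y0) - z2 - z0.
(defun karatsuba (x y &optional (cutoff 512))
  (if (or (< (integer-length x) cutoff)
          (< (integer-length y) cutoff))
      (* x y)                           ; "schoolbook" base case
      (let* ((half (floor (max (integer-length x) (integer-length y)) 2))
             (x1 (ash x (- half))) (x0 (ldb (byte half 0) x))
             (y1 (ash y (- half))) (y0 (ldb (byte half 0) y))
             (z2 (karatsuba x1 y1 cutoff))
             (z0 (karatsuba x0 y0 cutoff))
             (z1 (- (karatsuba (+ x1 x0) (+ y1 y0) cutoff) z2 z0)))
        (+ (ash z2 (* 2 half)) (ash z1 half) z0))))
```

Since the base case is the implementation's own multiply, small operands pay
only the cost of one integer-length comparison, which is exactly the point of
the Zipf's-law argument above.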
Re:
> From: Raymond Toy <···@rtp.ericsson.se>
> >>>>> "Lieven" == Lieven Marchand <···@wyrd.be> writes:
>
> Lieven> "Douglas T. Crosher" <···@scieneer.com> writes:
> >> For an alternative bignum multiplication implementation for
> >> CMUCL using Karatsuba multiplication and scaling better for
> >> large numbers see:
> >>
> >> http://www2.cons.org:8000/ftp-area/cmucl/experimental/bignum-mult.lisp
> >>
> >> This may double performance, but for better performance you'll
> >> need to write a custom bignum expt-mod. See the above code for
> >> an example of writing bignum code for CMUCL.
>
> Lieven> Don't be too sure about the performance benefit in general
> Lieven> circumstances. There's a paper about the Lucid implementation in an
> Lieven> ACM proceeding that describes their bignum implementation with the
> Lieven> classical algorithms and says that they rejected the asymptotically
> Lieven> faster methods because of a sort of Zipf's law that almost all
> Lieven> integers are fixnums, of those that are not the most part are double Lieven> the size of fixnums etc.
>
> I don't think Douglas was saying to use this method for all bignum
> multiplies---just for the really big bignums. In fact, I'm pretty
> sure the code use the standard multiply for "small" numbers and then
> switches to Karatsuba multiplication only when the numbers are big
> enough, about 512 or 1024 bits or larger.
>
> Ray
I recently wrote a bignum package that was inspired by the article
I think you're referring to for the next version of Gambit-C, a
Scheme implementation. The bignum package itself is written in
Scheme.
Based on some timing tests, we use the following algorithms for bignum
multiplication, which are good, but probably not optimal on any
particular machine. Assuming arguments of roughly the same size, we
use
1. Naive multiplication for < 1400 bit arguments;
2. Karatsuba decomposition for arguments between 1400 and 6800 bits;
3. FFT for arguments between 6800 and $2^{23}$ bits;
4. Karatsuba again for larger arguments.
The upper limit on the FFT implementation is due to limits in accuracy
with 64-bit IEEE double-precision arithmetic. The Scheme FFT code is based
on an older version of Ooura's FFT code in C; Ooura's C code runs only
about 25% faster than the Scheme version.
With this package one gets the following timings on a 500 MHz Alpha
21264 with a 100 MByte heap:
> (time (##exact-int.width (expt 3 100000)))
(time (##exact-int.width (expt 3 100000)))
80 ms real time
80 ms cpu time (75 user, 5 system)
no collections
3114952 bytes allocated
196 minor faults
no major faults
158497
> (time (##exact-int.width (expt 3 1000000)))
(time (##exact-int.width (expt 3 1000000)))
1143 ms real time
1143 ms cpu time (1099 user, 44 system)
no collections
28958712 bytes allocated
2744 minor faults
13 major faults
1584963
> (time (##exact-int.width (expt 3 10000000)))
(time (##exact-int.width (expt 3 10000000)))
22007 ms real time
22014 ms cpu time (21562 user, 452 system)
4 collections accounting for 31 ms real time (5 user, 26 system)
351461592 bytes allocated
27979 minor faults
no major faults
15849626
> (time (##exact-int.width (expt 3 100000000)))
(time (##exact-int.width (expt 3 100000000)))
703743 ms real time
698924 ms cpu time (686080 user, 12844 system)
135 collections accounting for 1128 ms real time (173 user, 955 system)
10708136536 bytes allocated
760961 minor faults
10 major faults
158496251
Brad Lucier
···@cs.purdue.edu (Bradley J Lucier) writes:
> I recently wrote a bignum package that was inspired by the article
> I think you're referring to for the next version of Gambit-C, a
> Scheme implementation. The bignum package itself is written in
> Scheme.
The paper I was thinking about is by Jon L. White in Proceedings of
the 1986 ACM conference on LISP and functional programming:
Reconfigurable, retargetable bignums: a case study in efficient,
portable Lisp system building
> Based on some timing tests, we use the following algorithms for bignum
> multiplication, which are good, but probably not optimal on any
> particular machine. Assuming arguments of roughly the same size, we
> use
>
> 1. Naive multiplication for < 1400 bit arguments;
> 2. Karatsuba decompositions for arguments between 1400 and 6800 bits.
> 3. FFT for arguments between 6800 and $2^{23}$ bits.
> 4. Karatsuba again for larger arguments.
>
> The upper limit on the FFT implementation is due to limits in accuracy
> with 64-bit IEEE double-precision arithmetic. The Scheme FFT code is based
> on an older version of Ooura's fft code in C; Ooura's C code runs only
> about 25% faster than the Scheme version.
That sounds about right. The topic has come up in sci.math regarding
what the people who compute pi to 60 million decimals and
computational number theorists use. I believe Bob Silverman gave a list
very similar to yours but deja^Wgoogle has lost the post.
--
Lieven Marchand <···@wyrd.be>
Glaðr ok reifr skyli gumna hverr, unz sinn bíðr bana.
>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
Jochen> Have not tried it yet, but one of my implementations used (floor ...) in a
Jochen> similar manner as you used (truncate ...). This implementation was slower
Jochen> and consed _much_ more. As we are working with bignums here, such an
Jochen> operation might easily cons heavily. That is why I tried not to change the
Jochen> exponent at all and only (logbitp'ing ) over it.
That depends on the implementation of logbitp. I know that in earlier
versions of CMUCL, logbitp consed a lot because it basically did a big
ash and logand to figure out whether the given bit was set or not. Newer
versions (18c?) now do the obvious thing and extract just the desired
bit, which is faster for big bignums.
Ray
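[Conceptually, the cheap single-bit test Ray describes is just an ldb of a
one-bit byte; a one-line sketch with a made-up name:]

```lisp
;; Equivalent to (logbitp i n) for nonnegative n: pull out the single
;; bit at position i and compare it to 1.  A good implementation indexes
;; the bignum digit containing bit i directly instead of shifting the
;; whole number.
(defun bit-set-p (n i)
  (= 1 (ldb (byte 1 i) n)))
```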