Hoi,
I've been trying for a while to get an efficient implementation of the
Miller-Rabin algorithm (a probabilistic pseudoprime test).
After profiling my implementation I found that one of the most critical
parts is my expt-mod function, which calculates (mod (expt base exponent) n)
more efficiently than doing it directly.
Here it is:
(defun expt-mod (number exponent modulus)
  (loop :with result = 1
        :for i of-type fixnum :from 0 :below (integer-length exponent)
        :for sqr = number :then (mod (* sqr sqr) modulus)
        :when (logbitp i exponent) :do
          (setf result (mod (* result sqr) modulus))
        :finally (return result)))
How can I make this function faster?
I would also welcome completely different ideas on implementing this
modular exponentiation.
If someone is interested in the rest of the code, I can upload it to my
web page.
Regards,
Jochen
http://www.dataheaven.de
Jochen Schmidt <···@dataheaven.de> writes:
> I would also welcome completely different ideas on implementing this
> modular exponentiation.
>
Your method seems to require at least N+1 multiplications. You can do
better with Knuth's Algorithm A in 4.6.3. Another useful discussion of
this problem is in Chapter 14 of the Handbook of Applied Cryptography,
which you can download at http://www.cacr.math.uwaterloo.ca/hac/.
--
Lieven Marchand <···@wyrd.be>
Glaðr ok reifr skyli gumna hverr, unz sinn bíðr bana.
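[For reference, a minimal sketch of the left-to-right binary method, one of
the square-and-multiply variants discussed in Knuth 4.6.3 and HAC chapter 14.
The function name is made up for illustration; this is not Knuth's Algorithm A
verbatim, just the same family of idea scanning the exponent from the top bit
down.]

```lisp
;; Left-to-right binary exponentiation: scan the exponent from its most
;; significant bit downward, squaring the accumulator at every step and
;; multiplying in the base only when the current bit is set.
(defun expt-mod-l2r (base exponent modulus)
  (let ((result 1)
        (base (mod base modulus)))
    (loop :for i :of-type fixnum
          :from (1- (integer-length exponent)) :downto 0
          :do (setf result (mod (* result result) modulus))
          :when (logbitp i exponent)
            :do (setf result (mod (* result base) modulus)))
    result))
```

One nicety of the left-to-right order is that the multiply step always uses
the original (reduced) base, which stays small even when the accumulator is a
full-width bignum.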
Lieven Marchand wrote:
> Jochen Schmidt <···@dataheaven.de> writes:
>
>> I would also welcome completely different ideas on implementing this
>> modular exponentiation.
>>
>
> Your method seems to require at least N+1 multiplications. You can do
> better with Knuth's Algorithm A in 4.6.3. Another useful discussion of
> this problem is in Chapter 14 of the Handbook of Applied Cryptography,
> which you can download at http://www.cacr.math.uwaterloo.ca/hac/.
I don't have Knuth's book - can you give me some pointers to where I can
find something about it? Or possibly a name I can search for, e.g.
"Montgomery Exponentiation" or "Sliding Window Exponentiation".
Thanks,
Jochen
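[For what it's worth, the 2^k-ary (fixed-window) method, a simpler relative
of the sliding-window technique Jochen asks about, can be sketched as below.
The function name and the window width of 4 are illustrative choices, not
tuned values.]

```lisp
;; Fixed 4-bit window (2^k-ary) modular exponentiation: precompute
;; base^0 .. base^(2^k - 1) mod m once, then consume the exponent k bits
;; at a time.  Each window costs k squarings plus one table multiply,
;; which saves multiplications over bit-at-a-time methods for big
;; exponents.
(defun expt-mod-window (base exponent modulus &optional (k 4))
  (let* ((base (mod base modulus))
         (table (make-array (ash 1 k))))
    ;; table[i] = base^i mod modulus
    (setf (aref table 0) 1)
    (loop :for i :from 1 :below (ash 1 k)
          :do (setf (aref table i)
                    (mod (* (aref table (1- i)) base) modulus)))
    (let ((result 1))
      (loop :for pos :from (* k (ceiling (integer-length exponent) k))
                     :downto k :by k
            :for digit = (ldb (byte k (- pos k)) exponent)
            :do (loop :repeat k
                      :do (setf result (mod (* result result) modulus)))
                (setf result (mod (* result (aref table digit)) modulus)))
      result)))
```

A true sliding window additionally skips over runs of zero bits and only
tables odd powers, but the fixed-window version already shows the core
trade-off: table space against fewer modular multiplies.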
Jochen Schmidt <···@dataheaven.de> writes:
> Lieven Marchand wrote:
>
> > Jochen Schmidt <···@dataheaven.de> writes:
> >
> >> I would also welcome completely different ideas on implementing this
> >> modular exponentiation.
> >>
> >
> > Your method seems to require at least N+1 multiplications. You can do
> > better with Knuth's Algorithm A in 4.6.3. Another useful discussion of
> > this problem is in Chapter 14 of the Handbook of Applied Cryptography,
> > which you can download at http://www.cacr.math.uwaterloo.ca/hac/.
>
> I don't have Knuth's book - can you give me some points where I can find
> something about it? Or possibly a name I can search for e. g "Montgomery
> Exponentiation" or "Exponentiation by Window Sliding".
Knuth, The Art of Computer Programming, Vol.2 Seminumerical
algorithms. You can get by with the Handbook for practical
implementation.
--
Lieven Marchand <···@wyrd.be>
Glaðr ok reifr skyli gumna hverr, unz sinn bíðr bana.
Jochen Schmidt <···@dataheaven.de> writes:
> Hoi,
> I've tried a while to get an efficient implementation of the miller-rabin
> algorithm (probabilistic pseudoprime test).
> After profiling my implementation I found out that one of the most critical
> parts is my expt-mod function which calculates (mod (expt base exponent) n)
> in a more efficient manner than directly.
>
> Here is it:
>
> (defun expt-mod (number exponent modulus)
> (loop :with result = 1
> :for i of-type fixnum :from 0 :below (integer-length exponent)
> :for sqr = number :then (mod (* sqr sqr) modulus)
> :when (logbitp i exponent) :do
> (setf result (mod (* result sqr) modulus))
> :finally (return result)))
>
> How can I make this function faster?
More declarations? On cmucl, this:
(defun his-expt-mod (number exponent modulus)
  (declare (fixnum number exponent modulus)
           (optimize (speed 3) (safety 0)))
  (setf number (mod number modulus))
  (loop :with result :of-type fixnum = 1
        :for i :of-type fixnum :from 0 :below (integer-length exponent)
        :for sqr = number :then (mod (the fixnum (* sqr sqr)) modulus)
        :when (logbitp i exponent) :do
          (setf result (mod (the fixnum (* result sqr)) modulus))
        :finally (return result)))
seems to be about 4 times faster than the above. Of course, you then
have to be certain that none of the operations will overflow for the
parameters you pass in (basically this means (* modulus modulus) is a
fixnum).
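[That precondition can be checked up front; a trivial helper (hypothetical
name) that a caller could assert before using the fixnum-declared version:]

```lisp
;; With (safety 0) an overflowing fixnum multiply silently produces
;; garbage, so verify that modulus^2 still fits in a fixnum, which
;; bounds every intermediate product (mod ... modulus) can feed in.
(defun fixnum-safe-modulus-p (modulus)
  (typep (* modulus modulus) 'fixnum))
```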
> I would also welcome completely different ideas on implementing this
> modular exponentiation.
I don't know of any radically different way of doing it; I'd write:
(defun my-expt-mod (number exponent modulus)
  (declare (fixnum number exponent modulus)
           (optimize (speed 3) (safety 0)))
  (setf number (mod number modulus))
  (let ((result 1))
    (declare (fixnum result))
    (loop :while (not (zerop exponent))
          :if (oddp exponent)
            :do (setf result
                      (mod (the fixnum (* result number)) modulus))
          :do (setf number (mod (the fixnum (* number number)) modulus)
                    exponent (truncate exponent 2)))
    result))
which seems to be a little quicker (on my machine, with my lisp, with
this phase of the moon, etc), but it's basically doing the same thing.
Cheers,
M.
--
First time I've gotten a programming job that required a drug
test. I was worried they were going to say "you don't have enough
LSD in your system to do Unix programming". -- Paul Tomblin
-- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html
Michael Hudson wrote:
> Jochen Schmidt <···@dataheaven.de> writes:
> More declarations? On cmucl, this:
>
> (defun his-expt-mod (number exponent modulus)
> (declare (fixnum number exponent modulus)
> (optimize (speed 3) (safety 0)))
> (setf number (mod number modulus))
> (loop :with result :of-type fixnum = 1
> :for i :of-type fixnum :from 0 :below (integer-length exponent)
> :for sqr = number :then (mod (the fixnum (* sqr sqr)) modulus)
> :when (logbitp i exponent) :do
> (setf result (mod (the fixnum (* result sqr)) modulus))
> :finally (return result)))
>
> seems to be about 4 times faster than the above. Of course, you then
> have to be certain that none of the operations will overflow for the
> parameters you pass in (basically this means (* modulus modulus) is a
> fixnum).
Yes, sure, but as this function is used for finding big primes and for
cryptographic algorithms like RSA, ElGamal and Diffie-Hellman, it would not
make much sense to cut the arguments to fixnum length.
The encryption exponent of RSA, e.g., could easily be a number > 1e+400.
Maybe the prime calculation cannot be optimized much further...
The implementation I posted earlier is two times faster than a former
implementation that used (floor exponent 2) to "walk" through the exponent.
I'll post some more code below, so maybe the wizards among you can give me
some hints on making it better.
>> I would also welcome completely different ideas on implementing this
>> modular exponentiation.
>
> I don't know of any radically different way of doing it; I'd write:
>
> (defun my-expt-mod (number exponent modulus)
> (declare (fixnum number exponent modulus)
> (optimize (speed 3) (safety 0)))
> (setf number (mod number modulus))
> (let ((result 1))
> (declare (fixnum result))
> (loop :while (not (zerop exponent))
> :if (oddp exponent)
> :do (setf result
> (mod (the fixnum (* result number)) modulus))
> :do (setf number (mod (the fixnum (* number number)) modulus)
> exponent (truncate exponent 2)))
> result))
>
> which seems to be a little quicker (on my machine, with my lisp, with
> this phase of the moon, etc), but it's basically doing the same thing.
I have not tried it yet, but one of my implementations used (floor ...) in a
similar manner to your (truncate ...). That implementation was slower
and consed _much_ more. As we are working with bignums here, such an
operation can easily cons heavily. That is why I tried not to change the
exponent at all and to only logbitp over it.
Here are some of the other functions:
;; The odd primes below 2000
(defparameter *primeset*
  (coerce (make-array 302
                      :element-type 'fixnum
                      :initial-contents
                      '(3 5 7 11 13 17 19 23 29 31 37 41
                        43 47 53 59 61 67 71 73 79 83 89 97 101 103
                        107 109 113 127 131 137 139 149 151 157 163
                        167 173 179 181 191 193 197 199 211 223 227
                        229 233 239 241 251 257 263 269 271 277 281
                        283 293 307 311 313 317 331 337 347 349 353
                        359 367 373 379 383 389 397 401 409 419 421
                        431 433 439 443 449 457 461 463 467 479 487
                        491 499 503 509 521 523 541 547 557 563 569
                        571 577 587 593 599 601 607 613 617 619 631
                        641 643 647 653 659 661 673 677 683 691 701
                        709 719 727 733 739 743 751 757 761 769 773
                        787 797 809 811 821 823 827 829 839 853 857
                        859 863 877 881 883 887 907 911 919 929 937
                        941 947 953 967 971 977 983 991 997 1009 1013
                        1019 1021 1031 1033 1039 1049 1051 1061 1063
                        1069 1087 1091 1093 1097 1103 1109 1117 1123
                        1129 1151 1153 1163 1171 1181 1187 1193 1201
                        1213 1217 1223 1229 1231 1237 1249 1259 1277
                        1279 1283 1289 1291 1297 1301 1303 1307 1319
                        1321 1327 1361 1367 1373 1381 1399 1409 1423
                        1427 1429 1433 1439 1447 1451 1453 1459 1471
                        1481 1483 1487 1489 1493 1499 1511 1523 1531
                        1543 1549 1553 1559 1567 1571 1579 1583 1597
                        1601 1607 1609 1613 1619 1621 1627 1637 1657
                        1663 1667 1669 1693 1697 1699 1709 1721 1723
                        1733 1741 1747 1753 1759 1777 1783 1787 1789
                        1801 1811 1823 1831 1847 1861 1867 1871 1873
                        1877 1879 1889 1901 1907 1913 1931 1933 1949
                        1951 1973 1979 1987 1993 1997 1999))
          '(simple-array fixnum (302))))
;;; Test if n is a prime
(defun primep (n &key (trials 15.))
  (let ((+n (abs n)))
    (cond ((< +n 2) nil)
          ((= +n 2) t)
          ((= +n 3) t)
          ;; ((and (> +n 100)
          ;;       (not (= 1 (gcd +n 223092870)))) ;; = (* 2 3 5 7 11
          ;;                                       ;;      13 17 19 23)
          ;;  nil)
          ((and (> +n 2000)
                (not-primep +n)) nil)
          (t (multiple-value-bind (r s) (%calc-r-s +n)
               (loop repeat trials
                     for a of-type integer = (+ (random (- +n 3)) 2)
                     never (not (%miller-rabin-test +n a r s))))))))
In this function the candidate prime n is first divided by the primes in
*primeset* via the function (not-primep ...). If it is divisible by one of
these primes, it is a composite number, and therefore we can conclude for
sure that n is not prime. This check is done to speed up the whole
process...
The commented-out lines contain a similar approach that uses the (gcd ...)
with a product of several small primes.
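[The gcd approach can be sketched in a few lines; 223092870 is the product
of the primes 2 through 23, so a nontrivial gcd with it proves the candidate
has a small factor. The helper name is made up for illustration.]

```lisp
;; A single gcd against the primorial 223092870 = 2*3*5*7*11*13*17*19*23
;; replaces nine trial divisions.  The (> n 23) guard keeps the small
;; primes themselves from being flagged as composite.
(defun small-factor-p (n)
  (and (> n 23)
       (/= 1 (gcd n 223092870))))
```

Whether this beats trial division in practice depends on how fast bignum gcd
is relative to bignum mod in the implementation, which is presumably why the
lines were left commented out pending measurement.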
;;; Cheap primality test: divide n by all primes in *primeset*
(defun not-primep (n)
  (declare (type (simple-array fixnum (*)) *primeset*))
  (loop :for prime :of-type fixnum :across *primeset*
        :thereis (zerop (mod n prime))))
;;; Calculate r and s so that n-1 = 2^s * r, with r odd
(defun %calc-r-s (n)
  (loop :with n-1 of-type integer = (1- n)
        :for s of-type fixnum :upfrom 0
        :until (logbitp s n-1)
        :finally (return (values (ash n-1 (- s)) s))))
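[A worked example of that decomposition, written standalone so it does not
depend on the function above: for n = 561, n-1 = 560 = 2^4 * 35, so r = 35
and s = 4.]

```lisp
;; Recompute the n-1 = 2^s * r split for n = 561 by hand: s is the
;; index of the lowest set bit of 560, and r is 560 shifted right by s.
(let* ((n-1 560)
       (s (loop :for i :of-type fixnum :upfrom 0
                :when (logbitp i n-1) :return i))
       (r (ash n-1 (- s))))
  (list r s))
```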
;;; Return nil if n is a composite number and t if it is (probably) a prime
(defun %miller-rabin-test (n a r s)
  (let ((y (expt-mod a r n))
        (n-1 (1- n)))
    (if (and (/= y 1)
             (/= y n-1))
        (loop :for j of-type fixnum :upfrom 1
              :while (and (<= j (1- s))
                          (/= y n-1))
              :do (setf y (mod (* y y) n))
              :never (= y 1)
              :finally (return (= y n-1)))
        t)))
;;; Search for the next prime iteratively
(defun find-prime (n &key (base 2) (test #'primep))
  (let ((number (let ((rnd (random (expt base n))))
                  (if (oddp rnd)
                      rnd
                      (1+ rnd)))))
    (loop for prime of-type integer = number then (+ prime 2)
          until (funcall test prime)
          finally (return prime))))
Here are two runs of the cmucl profiler:
Seconds | Consed | Calls | Sec/Call | Name:
------------------------------------------------------
27.050 | 1,582,254 | 27 | 1.00185 | EXPT-MOD
0.199 | -1,475,120 | 91 | 0.00219 | NOT-PRIMEP
0.020 | 59,614 | 27 | 0.00073 | %MILLER-RABIN-TEST
0.019 | 64,644 | 91 | 0.00021 | PRIMEP
0.000 | 1,026 | 13 | 0.00000 | %CALC-R-S
0.000 | 13,612 | 1 | 0.00000 | FIND-PRIME
------------------------------------------------------
27.288 | 246,030 | 250 | | Total
Seconds | Consed | Calls | Sec/Call | Name:
------------------------------------------------------
108.489 | 2,700,042 | 107 | 1.01392 | EXPT-MOD
0.737 | -2,959,914 | 547 | 0.00135 | NOT-PRIMEP
0.043 | 311,204 | 547 | 0.00008 | PRIMEP
0.039 | 188,924 | 107 | 0.00036 | %MILLER-RABIN-TEST
0.004 | 63,294 | 2 | 0.00222 | FIND-PRIME
0.000 | 8,098 | 79 | 0.00000 | %CALC-R-S
------------------------------------------------------
109.313 | 311,648 | 1,389 | | Total
The times depend on how far the first random prime candidate is from the
next prime, but because of the fairly even distribution of the primes, the
times to find a 1024-bit prime are mostly between 30 and 50 seconds on my
AMD K6-2 300. The CMUCL profiler seems to show clearly that the work lies
in EXPT-MOD, which seems to need around a second per call.
Jochen Schmidt wrote:
...
> The times depend on how wide the first random candidate-prime is away from
> the next prime, but because of the uniform distribution of the primes, the
> times to find a 1024 bit prime are mostly between 30 and 50 seconds on my
> AMD-K6-2 300. The cmucl profiler seems to show clearly that the work lies
> in EXPT-MOD which seems to need around a second/call.
For an alternative bignum multiplication implementation for
CMUCL using Karatsuba multiplication and scaling better for
large numbers see:
http://www2.cons.org:8000/ftp-area/cmucl/experimental/bignum-mult.lisp
This may double performance, but for better performance you'll
need to write a custom bignum expt-mod. See the above code for
an example of writing bignum code for CMUCL.
Regards
Douglas Crosher
"Douglas T. Crosher" <···@scieneer.com> writes:
> For an alternative bignum multiplication implementation for
> CMUCL using Karatsuba multiplication and scaling better for
> large numbers see:
>
> http://www2.cons.org:8000/ftp-area/cmucl/experimental/bignum-mult.lisp
>
> This may double performance, but for better performance you'll
> need to write a custom bignum expt-mod. See the above code for
> an example of writing bignum code for CMUCL.
Don't be too sure about the performance benefit in general
circumstances. There's a paper about the Lucid implementation in an
ACM proceedings volume that describes their bignum implementation with the
classical algorithms and says that they rejected the asymptotically
faster methods because of a sort of Zipf's law: almost all
integers are fixnums; of those that are not, most are double
the size of fixnums, etc.
--
Lieven Marchand <···@wyrd.be>
Glaðr ok reifr skyli gumna hverr, unz sinn bíðr bana.
>>>>> "Lieven" == Lieven Marchand <···@wyrd.be> writes:
Lieven> "Douglas T. Crosher" <···@scieneer.com> writes:
>> For an alternative bignum multiplication implementation for
>> CMUCL using Karatsuba multiplication and scaling better for
>> large numbers see:
>>
>> http://www2.cons.org:8000/ftp-area/cmucl/experimental/bignum-mult.lisp
>>
>> This may double performance, but for better performance you'll
>> need to write a custom bignum expt-mod. See the above code for
>> an example of writing bignum code for CMUCL.
Lieven> Don't be too sure about the performance benefit in general
Lieven> circumstances. There's a paper about the Lucid implementation in an
Lieven> ACM proceeding that describes their bignum implementation with the
Lieven> classical algorithms and says that they rejected the asymptotically
Lieven> faster methods because of a sort of Zipf's law that almost all
Lieven> integers are fixnums, of those that are not the most part are double
Lieven> the size of fixnums etc.
I don't think Douglas was saying to use this method for all bignum
multiplies---just for the really big bignums. In fact, I'm pretty
sure the code uses the standard multiply for "small" numbers and then
switches to Karatsuba multiplication only when the numbers are big
enough, about 512 or 1024 bits or larger.
Ray
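[The threshold switch Ray describes can be sketched on ordinary integers,
with the caveat that the real CMUCL code works directly on the underlying
digit vectors. The cutoff value is illustrative, not tuned, and nonnegative
inputs are assumed.]

```lisp
;; Threshold-switched Karatsuba: below the cutoff fall back to the
;; built-in multiply; above it, split each factor in half and compute
;; three recursive products instead of four, using
;;   x*y = z2*2^(2h) + z1*2^h + z0
;; where z1 = (x1+x0)(y1+y0) - z2 - z0.
(defun karatsuba (x y &optional (cutoff 512))
  (if (or (< (integer-length x) cutoff)
          (< (integer-length y) cutoff))
      (* x y)                           ; "schoolbook" base case
      (let* ((half (floor (max (integer-length x) (integer-length y)) 2))
             (x1 (ash x (- half))) (x0 (ldb (byte half 0) x))
             (y1 (ash y (- half))) (y0 (ldb (byte half 0) y))
             (z2 (karatsuba x1 y1 cutoff))
             (z0 (karatsuba x0 y0 cutoff))
             (z1 (- (karatsuba (+ x1 x0) (+ y1 y0) cutoff) z2 z0)))
        (+ (ash z2 (* 2 half)) (ash z1 half) z0))))
```

Since the base case is the implementation's own multiply, small operands pay
only the cost of one integer-length comparison, which is exactly the point of
the Zipf's-law argument above.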
Re:
> From: Raymond Toy <···@rtp.ericsson.se>
> >>>>> "Lieven" == Lieven Marchand <···@wyrd.be> writes:
>
> Lieven> "Douglas T. Crosher" <···@scieneer.com> writes:
> >> For an alternative bignum multiplication implementation for
> >> CMUCL using Karatsuba multiplication and scaling better for
> >> large numbers see:
> >>
> >> http://www2.cons.org:8000/ftp-area/cmucl/experimental/bignum-mult.lisp
> >>
> >> This may double performance, but for better performance you'll
> >> need to write a custom bignum expt-mod. See the above code for
> >> an example of writing bignum code for CMUCL.
>
> Lieven> Don't be too sure about the performance benefit in general
> Lieven> circumstances. There's a paper about the Lucid implementation in an
> Lieven> ACM proceeding that describes their bignum implementation with the
> Lieven> classical algorithms and says that they rejected the asymptotically
> Lieven> faster methods because of a sort of Zipf's law that almost all
> Lieven> integers are fixnums, of those that are not the most part are double Lieven> the size of fixnums etc.
>
> I don't think Douglas was saying to use this method for all bignum
> multiplies---just for the really big bignums. In fact, I'm pretty
> sure the code use the standard multiply for "small" numbers and then
> switches to Karatsuba multiplication only when the numbers are big
> enough, about 512 or 1024 bits or larger.
>
> Ray
I recently wrote a bignum package that was inspired by the article
I think you're referring to for the next version of Gambit-C, a
Scheme implementation. The bignum package itself is written in
Scheme.
Based on some timing tests, we use the following algorithms for bignum
multiplication, which are good, but probably not optimal on any
particular machine. Assuming arguments of roughly the same size, we
use
1. Naive multiplication for < 1400 bit arguments;
2. Karatsuba decomposition for arguments between 1400 and 6800 bits;
3. FFT for arguments between 6800 and $2^{23}$ bits;
4. Karatsuba again for larger arguments.
The upper limit on the FFT implementation is due to limits in accuracy
with 64-bit IEEE double-precision arithmetic. The Scheme FFT code is based
on an older version of Ooura's FFT code in C; Ooura's C code runs only
about 25% faster than the Scheme version.
With this package one gets the following timings on a 500 MHz Alpha
21264 with a 100 MByte heap:
> (time (##exact-int.width (expt 3 100000)))
(time (##exact-int.width (expt 3 100000)))
80 ms real time
80 ms cpu time (75 user, 5 system)
no collections
3114952 bytes allocated
196 minor faults
no major faults
158497
> (time (##exact-int.width (expt 3 1000000)))
(time (##exact-int.width (expt 3 1000000)))
1143 ms real time
1143 ms cpu time (1099 user, 44 system)
no collections
28958712 bytes allocated
2744 minor faults
13 major faults
1584963
> (time (##exact-int.width (expt 3 10000000)))
(time (##exact-int.width (expt 3 10000000)))
22007 ms real time
22014 ms cpu time (21562 user, 452 system)
4 collections accounting for 31 ms real time (5 user, 26 system)
351461592 bytes allocated
27979 minor faults
no major faults
15849626
> (time (##exact-int.width (expt 3 100000000)))
(time (##exact-int.width (expt 3 100000000)))
703743 ms real time
698924 ms cpu time (686080 user, 12844 system)
135 collections accounting for 1128 ms real time (173 user, 955 system)
10708136536 bytes allocated
760961 minor faults
10 major faults
158496251
Brad Lucier
···@cs.purdue.edu (Bradley J Lucier) writes:
> I recently wrote a bignum package that was inspired by the article
> I think you're referring to for the next version of Gambit-C, a
> Scheme implementation. The bignum package itself is written in
> Scheme.
The paper I was thinking about is by Jon L. White in Proceedings of
the 1986 ACM conference on LISP and functional programming:
Reconfigurable, retargetable bignums: a case study in efficient,
portable Lisp system building
> Based on some timing tests, we use the following algorithms for bignum
> multiplication, which are good, but probably not optimal on any
> particular machine. Assuming arguments of roughly the same size, we
> use
>
> 1. Naive multiplication for < 1400 bit arguments;
> 2. Karatsuba decompositions for arguments between 1400 and 6800 bits.
> 3. FFT for arguments between 6800 and $2^{23}$ bits.
> 4. Karatsuba again for larger arguments.
>
> The upper limit on the FFT implementation is due to limits in accuracy
> with 64-bit IEEE double-precision arithmetic. The Scheme FFT code is based
> on an older version of Ooura's fft code in C; Ooura's C code runs only
> about 25% faster than the Scheme version.
That sounds about right. The topic has come up in sci.math regarding
what the people who compute pi to 60 million decimals and
computational number theorists use. I believe Bob Silverman gave a list
very similar to yours but deja^Wgoogle has lost the post.
--
Lieven Marchand <···@wyrd.be>
Glaðr ok reifr skyli gumna hverr, unz sinn bíðr bana.
>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
Jochen> Have not tried it yet, but one of my implementations used (floor ...) in a
Jochen> similar manner as you used (truncate ...). This implementation was slower
Jochen> and consed _much_ more. As we are working with bignums here, such an
Jochen> operation might easily cons heavily. That is why I tried not to change the
Jochen> exponent at all and only (logbitp'ing ) over it.
That depends on the implementation of logbitp. I know that in earlier
versions of CMUCL, logbitp consed a lot because it basically did a big
ash and logand to figure out whether the given bit was set or not. Newer
versions (18c?) now do the obvious thing and extract just the desired
bit, which is faster for big bignums.
Ray
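[Conceptually, the cheap single-bit test Ray describes is just an ldb of a
one-bit byte; a one-line sketch with a made-up name:]

```lisp
;; Equivalent to (logbitp i n) for nonnegative n: pull out the single
;; bit at position i and compare it to 1.  A good implementation indexes
;; the bignum digit containing bit i directly instead of shifting the
;; whole number.
(defun bit-set-p (n i)
  (= 1 (ldb (byte 1 i) n)))
```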