Alain Picard wrote:
>
> Fellow lispers,
>
> I needed a simple, in-lisp (i.e. no FFI) encryption for an
> application, so I wrote the following. Perhaps it will be
> of some use to someone else.
>
> I'm posting this here because I don't have a public FTP site, and
> I'm not too sure where would be a good place to send this to.
> Certainly, either the CCLAN or CCLOCC (or both!) people are welcome
> to incorporate this into their offerings, if they desire.
>
> My thanks to Pierre Mai for posting an excellent MD5 implementation
> in lisp, from which I learned many things which went into this code.
I could offer to include it on my "Cryptography in Common Lisp" site
at http://www.dataheaven.de if you want.
It is rather minimalistic, but it will certainly grow in the future. I've
contributed my "in-lisp" RSA implementation, some math utilities, and my
binding to CL-SSL.
Some months ago I worked on a version of ARC4 (a stream cipher), but I
have to look it up on my hard disk first.
I remember there were implementations of RIPEMD and SHA too, but I don't
remember the details.
ciao,
Jochen
--
http://www.dataheaven.de
Jochen Schmidt <···@dataheaven.de> writes:
> I could offer to include it on my "Cryptography in Common Lisp" site
> at http://www.dataheaven.de if you want.
This is great :-)
I just implemented both blowfish and RSA myself (mostly because I
needed something small and light which could be cross-platform
portable without having to depend on external libraries), so I don't
need any of them right now, but whenever I might find some spare time,
I'll have a look and see if I can contribute anything at all to what
you guys have already done!
--
(espen)
Espen Vestre wrote:
> Jochen Schmidt <···@dataheaven.de> writes:
>
>> I could offer to include it on my "Cryptography in Common Lisp" site
>> at http://www.dataheaven.de if you want.
>
> This is great :-)
>
> I just implemented both blowfish and RSA myself (mostly because I
> needed something small and light which could be cross-platform
> portable without having to depend on external libraries), so I don't
> need any of them right now, but whenever I might find some spare time,
> I'll have a look and see if I can contribute anything at all to what
> you guys have already done!
I really would be interested in seeing your modular exponentiation
function, as this is the major bottleneck in my RSA implementation.
I think a fast, low-level-tuned modular exponentiation function would be
a good addition to CL's numeric function repertoire. Many cryptographic
algorithms make use of it. It is possible to implement it fast by using
techniques like Montgomery multiplication, which enables fast modular
exponentiation.
ciao,
Jochen
--
http://www.dataheaven.de
(defun expt-mod-n (a x n)
  (let ((result 1))
    (while (> x 0)
      (if (oddp x)
          (setf result (mod (* result a) n)))
      (setf x (ash x -1))
      (setf a (mod (* a a) n)))
    result))
In article <············@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
<···@dataheaven.de> wrote:
> Espen Vestre wrote:
>
> > Jochen Schmidt <···@dataheaven.de> writes:
> >
> >> I could offer to include it on my "Cryptography in Common Lisp" site
> >> at http://www.dataheaven.de if you want.
> >
> > This is great :-)
> >
> > I just implemented both blowfish and RSA myself (mostly because I
> > needed something small and light which could be cross-platform
> > portable without having to depend on external libraries), so I don't
> > need any of them right now, but whenever I might find some spare time,
> > I'll have a look and see if I can contribute anything at all to what
> > you guys have already done!
>
> I really would be interested in seeing your modular exponentiation
> function, as this is the major bottleneck in my RSA implementation.
>
> I think a fast, low-level-tuned modular exponentiation function would be
> a good addition to CL's numeric function repertoire. Many cryptographic
> algorithms make use of it. It is possible to implement it fast by using
> techniques like Montgomery multiplication, which enables fast modular
> exponentiation.
>
> ciao,
> Jochen
>
> --
> http://www.dataheaven.de
Erann Gat wrote:
> (defun expt-mod-n (a x n)
>   (let ((result 1))
>     (while (> x 0)
>       (if (oddp x)
>           (setf result (mod (* result a) n)))
>       (setf x (ash x -1))
>       (setf a (mod (* a a) n)))
>     result))
I rewrote it using (loop while (> x 0) ...)
because while is not part of ANSI CL.
I also added declarations.
(defun expt-mod-n (a x n)
  (declare (optimize (speed 3) (safety 0)))
  (let ((result 1))
    (loop while (> x 0)
          do (if (oddp x)
                 (setf result (mod (* result a) n)))
             (setf x (ash x -1))
             (setf a (mod (* a a) n)))
    result))
I compared it to mine:
(defun expt-mod (n exponent modulus)
  "As (mod (expt n exponent) modulus), but more efficient."
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0)))
  (loop with result = 1
        for i of-type fixnum from 0 below (integer-length exponent)
        for sqr = n then (mod (* sqr sqr) modulus)
        when (logbitp i exponent)
          do (setf result (mod (* result sqr) modulus))
        finally (return result)))
Of course the result of INTEGER-LENGTH can be bigger than a fixnum, but
not in the cryptographic algorithms I used it for.
A more general CL implementation could test the ranges of its parameters
first and then choose the best (and, more importantly, a correct)
implementation.
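A quick sanity check of the LOGBITP version against the naive definition
(restated here without the declarations, so it runs anywhere):

```lisp
;; The same square-and-multiply loop as above, minus declarations,
;; checked against the naive (mod (expt n e) m).
(defun expt-mod (n exponent modulus)
  "As (mod (expt n exponent) modulus), but more efficient."
  (loop with result = 1
        for i from 0 below (integer-length exponent)
        for sqr = n then (mod (* sqr sqr) modulus)
        when (logbitp i exponent)
          do (setf result (mod (* result sqr) modulus))
        finally (return result)))

(expt-mod 2 10 1000)                          ; => 24
(= (expt-mod 5 117 19) (mod (expt 5 117) 19)) ; => T
```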
The difference in LispWorks 4.2.6 was significant (with mine being faster).
Here is my reasoning why:
I think the use of LOGBITP and INTEGER-LENGTH enables the implementation
to reduce consing here. The ASH will allocate a new bignum per cycle in
your code, while incrementing the bit counter I in mine will not.
There are two optimizations I see which would make modular exponentiation
faster. One is reusing allocated bignums: the result does not have to be
allocated over and over, and the same goes for SQR. RESULT, SQR, and the
intermediate results are never bigger than twice the size of the modulus.
It would be nice if one could preallocate two "bignum buffers" (for SQR
and the result) and then let the arithmetic operations use those buffers.
I'm not sure how far dataflow optimization can deduce such reuse of
bignums, or whether any implementation does something like that.
The other optimization would be to use Montgomery representation for the
numbers. This would mean having a conversion step at the beginning, and
then using Montgomery multiplication instead of (mod (* ...) modulus).
Once the result is calculated, one converts it back from Montgomery
representation to normal representation. This allows one to leave out
the MODs.
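To make the idea concrete, here is a rough, purely illustrative sketch of
Montgomery reduction in portable CL. A serious version would operate on
the bignum digit vectors directly; the names EGCD, MONT-CONTEXT, REDC,
and MONT-EXPT-MOD are made up for this sketch, and the modulus must be
odd:

```lisp
(defun egcd (a b)
  "Extended gcd: returns (values g x y) with g = a*x + b*y."
  (if (zerop b)
      (values a 1 0)
      (multiple-value-bind (q r) (floor a b)
        (multiple-value-bind (g x y) (egcd b r)
          (values g y (- x (* q y)))))))

(defun mont-context (n)
  "For odd N, returns k, R = 2^k > N, and N' = -N^-1 mod R."
  (let* ((k (integer-length n))
         (r (ash 1 k)))
    (multiple-value-bind (g ninv) (egcd n r)
      (declare (ignore g))
      (values k r (mod (- ninv) r)))))

(defun redc (x n k r nprime)
  "Montgomery reduction: X * R^-1 mod N, assuming 0 <= X < R*N."
  (let* ((m (logand (* (logand x (1- r)) nprime) (1- r))) ; (x*N') mod R
         (u (ash (+ x (* m n)) (- k))))                   ; (x + m*N)/R
    (if (>= u n) (- u n) u)))

(defun mont-expt-mod (a e n)
  "A^E mod N via Montgomery multiplication; N must be odd."
  (multiple-value-bind (k r nprime) (mont-context n)
    (let ((abar (mod (* a r) n))    ; A in Montgomery form
          (result (mod r n)))       ; 1 in Montgomery form
      (loop for i from (1- (integer-length e)) downto 0
            do (setf result (redc (* result result) n k r nprime))
            when (logbitp i e)
              do (setf result (redc (* result abar) n k r nprime)))
      (redc result n k r nprime))))  ; back to normal form
```

The LOGAND/ASH pair in REDC replaces the division by the modulus;
whether that actually wins in a given CL depends on how much those
operations cons.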
ciao,
Jochen
--
http://www.dataheaven.de
In article <············@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
<···@dataheaven.de> wrote:
> I rewrote it using (loop while (> x 0) ...)
> because while is not part of ANSI CL.
Yes, but it should be ;-)
> I also added declarations.
>
> (defun expt-mod-n (a x n)
>   (declare (optimize (speed 3) (safety 0)))
>   (let ((result 1))
>     (loop while (> x 0)
>           do (if (oddp x)
>                  (setf result (mod (* result a) n)))
>              (setf x (ash x -1))
>              (setf a (mod (* a a) n)))
>     result))
>
> I compared it to mine:
>
> (defun expt-mod (n exponent modulus)
>   "As (mod (expt n exponent) modulus), but more efficient."
>   (declare (optimize (speed 3) (safety 0) (space 0) (debug 0)))
>   (loop with result = 1
>         for i of-type fixnum from 0 below (integer-length exponent)
>         for sqr = n then (mod (* sqr sqr) modulus)
>         when (logbitp i exponent)
>           do (setf result (mod (* result sqr) modulus))
>         finally (return result)))
>
> Of course the result of integer-length can be bigger than a fixnum but
> not in application to the cryptographic algorithms I used it for.
> A more general CL implementation can test the ranges of its parameters
> first and then choose the best (and more important: a correct)
> implementation.
>
> The difference in LispWorks 4.2.6 was significant. (With mine being faster).
Indeed. In MCL the difference is a factor of 10. Zowie!
> Here is my reasoning why this is so:
> I think that the use of logbitp and integer-length enables the
> implementation to reduce consing here. The ASH will allocate a new bignum
> per cycle in your code while increasing the bitcounter i in mine will not.
Sounds plausible.
E.
In article <············@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
<···@dataheaven.de> wrote:
> The other optimization would be to use Montgomery representation for the
> numbers. This would mean having a conversion step at the beginning, and
> then using Montgomery multiplication instead of (mod (* ...) modulus).
> Once the result is calculated, one converts it back from Montgomery
> representation to normal representation. This allows one to leave out
> the MODs.
In C, it is certainly true that the Montgomery representation speeds up
modular exponentiation dramatically. In Common Lisp, it's more murky.
Montgomery representation lets you do modular multiplies with shifts
instead of divides for the modulus operation. But in Common Lisp, shifts
of bignums typically cons, and may not be much faster than divides.
It depends on your implementation.
Other optimizations are worth trying too. For example, the Karatsuba
formulation is a good way to optimize multiplication of large numbers in
general, regardless of modulo operations (Montgomery only applies to
modular multiplication). Karatsuba is elegantly recursive and
substitutes two additions for one of the four needed sub-multiplications
at each step. (Do a Google search on Karatsuba for more information).
Karatsuba can multiply in O[N^1.585] operations instead of O[N^2]
operations. CMUCL already uses Karatsuba internally for large bignum
multiplications, but other CLs may not.
Squaring is another avenue for optimization. It's possible to formulate
squaring to be faster than general multiplication, and since
exponentiation requires squaring, speeding up squaring speeds up
exponentiation.
See http://members.tripod.com/careybloodworth/karatsuba.htm
for a good description of how to go one step further and combine the
Karatsuba reformulation trick with squaring, which makes squaring faster
still.
While it's possible to code all these optimizations in Lisp--and it's
an enlightening exercise to do so--such an approach will probably not
help the speed of your code much unless you're using a very smart
compiler that knows how to avoid excess consing.
If you have LAP available, however, they could make a real difference.
--
Shannon Spires
Shannon Spires wrote:
> In article <············@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
> <···@dataheaven.de> wrote:
>
>> The other optimization would be to use Montgomery representation for the
>> numbers. This would mean having a conversion step at the beginning, and
>> then using Montgomery multiplication instead of (mod (* ...) modulus).
>> Once the result is calculated, one converts it back from Montgomery
>> representation to normal representation. This allows one to leave out
>> the MODs.
>
> In C, it is certainly true that the Montgomery representation speeds up
> modular exponentiation dramatically. In Common Lisp, it's more murky.
> Montgomery representation lets you do modular multiplies with shifts
> instead of divides for the modulus operation. But in Common Lisp, shifts
> of bignums typically cons, and may not be much faster than divides.
> It depends on your implementation.
Yes, of course. I actually meant to implement Montgomery multiplication
using the low-level means of particular implementations (for example,
access to the bignum vector).
> Other optimizations are worth trying too. For example, the Karatsuba
> formulation is a good way to optimize multiplication of large numbers in
> general, regardless of modulo operations (Montgomery only applies to
> modular multiplication). Karatsuba is elegantly recursive and
> substitutes two additions for one of the four needed sub-multiplications
> at each step. (Do a Google search on Karatsuba for more information).
> Karatsuba can multiply in O[N^1.585] operations instead of O[N^2]
> operations. CMUCL already uses Karatsuba internally for large bignum
> multiplications, but other CLs may not.
>
> Squaring is another avenue for optimization. It's possible to formulate
> squaring to be faster than general multiplication, and since
> exponentiation requires squaring, speeding up squaring speeds up
> exponentiation.
Hm... this may be interesting too - I remember reading something about
that in the Handbook of Applied Cryptography.
> See http://members.tripod.com/careybloodworth/karatsuba.htm
> for a good description of how to go one step further and combine the
> Karatsuba reformulation trick with squaring, which makes squaring faster
> still.
>
> While it's possible to code all these optimizations in Lisp--and it's
> an enlightening exercise to do so--such an approach will probably not
> help the speed of your code much unless you're using a very smart
> compiler that knows how to avoid excess consing.
Yes, as I said, one has to solve the consing problem first. A
WITH-BIGNUM-BUFFERS macro which enables the compiler to reuse the bignum
buffers could be a good idea (combined with dataflow analysis in the
body of the macro).
> If you have LAP available, however, they could make a real difference.
What's LAP?
ciao,
Jochen
--
http://www.dataheaven.de
In article <···········@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
<···@dataheaven.de> wrote:
> Shannon Spires wrote:
> > Squaring is another avenue for optimization. It's possible to formulate
> > squaring to be faster than general multiplication, and since
> > exponentiation requires squaring, speeding up squaring speeds up
> > exponentiation.
>
> Hm... this may be interesting too - I remember reading something about
> that
> the Handbook of applied Cryptography.
Here's why. Pardon me if parts of this are too elementary, but I want
to be clear for the benefit of readers who might not have studied
multiplication algorithms.
Notice that if you think of bignum multiplication as
polynomial multiplication (which is what scalar multiplication
really is at core), you could express (* A B) as the two-digit
polynomials
A2r + A1
times
B2r + B1
where r is your base or digit size. For computation, it's usually most
convenient to make r a power of 2. (When one writes an ordinary decimal
number, r is just ten and it and the "+" signs are implicit.)
So multiplication is then of course
A2B2r^2 + A1B2r + B1A2r + A1B1 [1]
which is four sub-multiplies (A2B2, A1B2, B1A2, and A1B1) plus some
additions plus some shifts if we choose r to be a power of 2.
Thus our sub-multiplications (presumably the most expensive
sub-operations) are O[n^2], where n is the number of digits. One could
make r smaller and increase the number of digits; the multiplication is
still O[n^2]. But I chose two digits for reasons that will become clear.
Notice that each of these four multiplications could be reformulated
exactly the same way with a new r that is half the length of the old r.
Thus multiplication itself is inherently recursive. This buys you no
performance; it's just an interesting observation at this point,
especially when you contrast it with the manifestly non-recursive
algorithm we were all taught for multiplication in grade school.
Now let's examine squaring by the same mechanism.
(* A A) becomes
A2r + A1
times
A2r + A1
which is
A2A2r^2 + A1A2r + A1A2r + A1A1
or
A2^2r^2 + 2A1A2r + A1^2 [2]
aha! Only 3 multiplications now, and one fewer addition. Thus squaring
is already faster than multiplication.
But we're just getting started.
By now, your recursion detector should be lit up bright red, and you've
noticed that two of these multiplications are recursive calls to our
square routine itself. The third is a general-purpose multiply that we
cannot do anything about (just yet). So not only do we need 3/4 as many
sub-multiplies, but two-thirds of _those_ need only 3/4 as many
sub-sub-multiplies, etc.
Thus, without any Karatsuba tricks, we can formulate squaring
recursively to be faster than general multiplication. Providing we can
do so without the recursion itself adding too much overhead.
The key further insights of the Karatsuba mechanism are that
a) You can reformulate [1] above to have one fewer multiplication than
it has now, and if you implement it recursively, each sub-multiply gets
the same advantage. Now, recursively expressing general multiplication
_does_ buy you something. To wit:
A2B2(r^2+r) + (A2-A1)(B1-B2)r + A1B1(r+1) [3]
b) You can reformulate [2] above to make all _three_ sub-multiplies into
recursive square calls, which makes things even faster than either [2]
or [3] if you're squaring:
A2^2(r^2+r) - ((A2-A1)^2)r + A1^2(r+1) [4]
Karatsuba imparts overhead of its own in the form of additional
additions, shifts, and recursive calls. Reminiscent of Quicksort, it
typically pays off only on fairly large numbers and one switches
to ordinary multiplications in the leaf nodes of the recursion tree.
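The recursions [3] and [4] can be written down directly in CL. The
following toy version splits bignums with LDB and ASH just to show the
shape of the recursion; a real implementation would work on digit
vectors, and the 256-bit cutoff is only a placeholder, not a tuned
value:

```lisp
(defparameter *karatsuba-cutoff* 256)   ; bits; placeholder threshold

(defun kmul (a b)
  "A*B via [3]: A2B2(r^2+r) + (A2-A1)(B1-B2)r + A1B1(r+1)."
  (let ((n (max (integer-length a) (integer-length b))))
    (if (or (minusp a) (minusp b) (< n *karatsuba-cutoff*))
        (* a b)                          ; leaf: ordinary multiply
        (let* ((k (ash n -1))            ; split point: r = 2^k
               (a1 (ldb (byte k 0) a)) (a2 (ash a (- k)))
               (b1 (ldb (byte k 0) b)) (b2 (ash b (- k)))
               (hi  (kmul a2 b2))
               (lo  (kmul a1 b1))
               (mid (kmul (- a2 a1) (- b1 b2))))
          (+ (ash hi (* 2 k)) (ash (+ hi mid lo) k) lo)))))

(defun ksquare (a)
  "A^2 via [4]: A2^2(r^2+r) - ((A2-A1)^2)r + A1^2(r+1)."
  (let ((n (integer-length a)))
    (cond ((minusp a) (ksquare (- a)))   ; (-A)^2 = A^2
          ((< n *karatsuba-cutoff*) (* a a))
          (t (let* ((k (ash n -1))
                    (a1 (ldb (byte k 0) a)) (a2 (ash a (- k)))
                    (hi  (ksquare a2))
                    (lo  (ksquare a1))
                    (mid (ksquare (- a2 a1))))
               (+ (ash hi (* 2 k)) (ash (- (+ hi lo) mid) k) lo))))))
```

Note that two of KSQUARE's three sub-multiplies are recursive square
calls, which is exactly the structural advantage described above.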
The potential speed boost of squaring is why I wish Common Lisp
had included #'SQUARE in the spec, although it probably only makes
much difference to cryptographers. Perhaps even better, implementers
might consider including a compiler macro to detect the (* foo foo)
case.
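A sketch of that detection follows. Since portable code may not define a
compiler macro on CL:* itself, BN* is a hypothetical stand-in operator,
and SQUARE stands in for a tuned squaring routine:

```lisp
(defun square (x)
  (* x x))           ; imagine a specialized bignum squaring here

(defun bn* (&rest args)
  (apply #'* args))  ; illustrative stand-in for *

;; Rewrites (bn* x x) into (square x) at compile time when both
;; arguments are the same variable; otherwise leaves the form alone.
(define-compiler-macro bn* (&whole form &rest args)
  (if (and (= (length args) 2)
           (symbolp (first args))
           (eq (first args) (second args)))
      `(square ,(first args))
      form))
```

An implementation doing the same on its own * would get the squaring
speedup in ordinary user code without any change to the spec.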
> > If you have LAP available, however, they could make a real difference.
>
> Whats LAP?
Lisp Assembly Programming. Some CL implementations (e.g. MCL and OpenMCL
and some others) allow you to write assembler using Lisp notation.
--
Shannon Spires
From: Raymond Toy
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date:
Message-ID: <4nd6u7pmz2.fsf@rtp.ericsson.se>
>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
>> Other optimizations are worth trying too. For example, the Karatsuba
>> formulation is a good way to optimize multiplication of large numbers in
>> general, regardless of modulo operations (Montgomery only applies to
>> modular multiplication). Karatsuba is elegantly recursive and
>> substitutes two additions for one of the four needed sub-multiplications
>> at each step. (Do a Google search on Karatsuba for more information).
>> Karatsuba can multiply in O[N^1.585] operations instead of O[N^2]
>> operations. CMUCL already uses Karatsuba internally for large bignum
>> multiplications, but other CLs may not.
No, CMUCL doesn't use Karatsuba. I wrote one, but it was never
incorporated into the sources, mostly because it only speeds up
multiplication when the numbers are 512 or 1024 bits or more. That was
as fast as I could get the Lisp implementation to go with CMUCL-specific
stuff.
Ray
Raymond Toy wrote:
>>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
>
> >> Other optimizations are worth trying too. For example, the
> >> Karatsuba formulation is a good way to optimize multiplication of
> >> large numbers in general, regardless of modulo operations
> >> (Montgomery only applies to modular multiplication). Karatsuba is
> >> elegantly recursive and substitutes two additions for one of the
> >> four needed sub-multiplications at each step. (Do a Google search
> >> on Karatsuba for more information). Karatsuba can multiply in
> >> O[N^1.585] operations instead of O[N^2] operations. CMUCL already
> >> uses Karatsuba internally for large bignum multiplications, but
> >> other CLs may not.
>
> No, CMUCL doesn't use Karatsuba. I wrote one, but it was never
> incorporated into the sources, mostly because it only speeds up
> multiplication when the numbers are 512 or 1024 bits or more. That was
> as fast as I could get the Lisp implementation to go with CMUCL-specific
> stuff.
While your post was certainly informative, I only wanted to note that it
was not I who wrote the paragraph you quoted - it was Shannon.
ciao,
Jochen
--
http://www.dataheaven.de
From: Raymond Toy
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date:
Message-ID: <4nbs9rnvfm.fsf@rtp.ericsson.se>
>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
Jochen> Raymond Toy wrote:
>>>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
>>
>> >> Other optimizations are worth trying too. For example, the
>> >> Karatsuba formulation is a good way to optimize multiplication of
>> >> large numbers in general, regardless of modulo operations
>> >> (Montgomery only applies to modular multiplication). Karatsuba is
>> >> elegantly recursive and substitutes two additions for one of the
>> >> four needed sub-multiplications at each step. (Do a Google search
>> >> on Karatsuba for more information). Karatsuba can multiply in
>> >> O[N^1.585] operations instead of O[N^2] operations. CMUCL already
>> >> uses Karatsuba internally for large bignum multiplications, but
>> >> other CLs may not.
>>
>> No, CMUCL doesn't use Karatsuba. I wrote one, but it was never
>> incorporated into the sources, mostly because it only speeds up
>> multiplication when the numbers are 512 or 1024 bits or more. That was
>> as fast as I could get the Lisp implementation to go with CMUCL-specific
>> stuff.
Jochen> While your post was certainly informative I only wanted to note that it was
Jochen> not me who wrote the paragraph you quoted - It was Shannon.
Sorry, I obviously snipped the wrong part. Mea culpa.
Ray
Jochen Schmidt <···@dataheaven.de> writes:
> I really would be interested in seeing your modular exponentiation
> function, as this is the major bottleneck in my RSA implementation.
It's not fast. It's the main bottleneck in my implementation, too.
> I think a fast, low-level-tuned modular exponentiation function would be
> a good addition to CL's numeric function repertoire. Many cryptographic
> algorithms make use of it. It is possible to implement it fast by using
> techniques like Montgomery multiplication, which enables fast modular
> exponentiation.
I whole-heartedly agree!
--
(espen)