From: Jochen Schmidt
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <af6ifo$dgm$1@rznews2.rrze.uni-erlangen.de>
Alain Picard wrote:

> 
> Fellow lispers,
> 
> I needed a simple, in-lisp (i.e. no FFI) encryption for an
> application, so I wrote the following.  Perhaps it will be
> of some use to someone else.
> 
> I'm posting this here because I don't have a public FTP site, and
> I'm not too sure where would be a good place to send this to.
> Certainly, either the CCLAN or CCLOCC (or both!) people are welcome
> to incorporate this into their offerings, if they desire.
> 
> My thanks to Pierre Mai for posting an excellent MD5 implementation
> in lisp, from which I learned many things that went into this code.

I could offer to include it on my "Cryptography in Common Lisp" site
at http://www.dataheaven.de if you want.

It is rather minimalistic but it will certainly grow in the future. I've 
contributed my "in-lisp" RSA implementation, some math utilities, and my 
binding to CL-SSL.

Some months ago I worked on a version of ARC4 (a stream cipher), but I'll
have to look it up on my hard disk first.

I remember there were implementations of RIPEMD and SHA too, but I don't 
remember the details.

ciao,
Jochen

--
http://www.dataheaven.de

From: Roland Kaufmann
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <tl23cvcy9ab.fsf@space.at>
A Common Lisp implementation of ARC4 can be found on
  http://www.xs4all.nl/~cg/ciphersaber/

                                best regards
                                    Roland Kaufmann
From: Espen Vestre
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <kw7kknii6b.fsf@merced.netfonds.no>
Jochen Schmidt <···@dataheaven.de> writes:

> I could offer to include it on my "Cryptography in Common Lisp" site
> at http://www.dataheaven.de if you want.

This is great :-) 

I just implemented both blowfish and RSA myself (mostly because I
needed something small and light which could be cross-platform
portable without having to depend on external libraries), so I don't
need any of them right now, but whenever I might find some spare time,
I'll have a look and see if I can contribute anything at all to what
you guys have already done!

-- 
  (espen)
From: Jochen Schmidt
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <afan3a$k3m$1@rznews2.rrze.uni-erlangen.de>
Espen Vestre wrote:

> Jochen Schmidt <···@dataheaven.de> writes:
> 
>> I could offer to include it on my "Cryptography in Common Lisp" site
>> at http://www.dataheaven.de if you want.
> 
> This is great :-)
> 
> I just implemented both blowfish and RSA myself (mostly because I
> needed something small and light which could be cross-platform
> portable without having to depend on external libraries), so I don't
> need any of them right now, but whenever I might find some spare time,
> I'll have a look and see if I can contribute anything at all to what
> you guys have already done!

I would really be interested in seeing your modular exponentiation function,
as this is the major bottleneck in my RSA implementation.

I think a fast, low-level-tuned modular exponentiation function would be a 
good enhancement to CL's numeric function repertoire. Many cryptographic 
algorithms make use of it. It is possible to implement it fast by using 
techniques like "Montgomery multiplication", which enables fast modular 
exponentiation.

ciao,
Jochen

--
http://www.dataheaven.de
From: Erann Gat
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <gat-2506021628310001@k-137-79-50-101.jpl.nasa.gov>
(defun expt-mod-n (a x n)
  (let ( (result 1) )
    (while (> x 0)
      (if (oddp x)
        (setf result (mod (* result a) n)))
      (setf x (ash x -1))
      (setf a (mod (* a a) n)))
    result))

In article <············@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
<···@dataheaven.de> wrote:

> Espen Vestre wrote:
> 
> > Jochen Schmidt <···@dataheaven.de> writes:
> > 
> >> I could offer to include it on my "Cryptography in Common Lisp" site
> >> at http://www.dataheaven.de if you want.
> > 
> > This is great :-)
> > 
> > I just implemented both blowfish and RSA myself (mostly because I
> > needed something small and light which could be cross-platform
> > portable without having to depend on external libraries), so I don't
> > need any of them right now, but whenever I might find some spare time,
> > I'll have a look and see if I can contribute anything at all to what
> > you guys have already done!
> 
> I would really be interested in seeing your modular exponentiation function,
> as this is the major bottleneck in my RSA implementation.
> 
> I think a fast, low-level-tuned modular exponentiation function would be a 
> good enhancement to CL's numeric function repertoire. Many cryptographic 
> algorithms make use of it. It is possible to implement it fast by using 
> techniques like "Montgomery multiplication", which enables fast modular 
> exponentiation.
> 
> ciao,
> Jochen
> 
> --
> http://www.dataheaven.de
From: Jochen Schmidt
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <afbtqb$aju$1@rznews2.rrze.uni-erlangen.de>
Erann Gat wrote:

> (defun expt-mod-n (a x n)
>   (let ( (result 1) )
>     (while (> x 0)
>       (if (oddp x)
>         (setf result (mod (* result a) n)))
>       (setf x (ash x -1))
>       (setf a (mod (* a a) n)))
>     result))

I rewrote it using (loop while (> x 0) ...)
because while is not part of ANSI CL.
I also added declarations.

(defun expt-mod-n (a x n)
  (declare (optimize (speed 3) (safety 0)))
  (let ((result 1))
    (loop while (> x 0)
          do (if (oddp x)
                 (setf result (mod (* result a) n)))
          (setf x (ash x -1))
          (setf a (mod (* a a) n)))
    result))

I compared it to mine:

(defun expt-mod (n exponent modulus)
  "As (mod (expt n exponent) modulus), but more efficient."
  (declare (optimize (speed 3) (safety 0) (space 0) (debug 0))) 
  (loop with result = 1
        for i of-type fixnum from 0 below (integer-length exponent)
        for sqr = n then (mod (* sqr sqr) modulus)
        when (logbitp i exponent) do
        (setf result (mod (* result sqr) modulus))
        finally (return result)))

Of course the result of integer-length can be bigger than a fixnum, though
not for the cryptographic algorithms I used it with.
A more general CL implementation can test the ranges of its parameters 
first and then choose the best (and, more importantly, a correct) 
implementation.
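In any case, such a function is easy to sanity-check against the naive
definition. A snippet like the following (repeating the LOOP version
without the declarations, so that it stands alone) is enough:

```lisp
;; Self-contained sanity check for the fast modular exponentiation
;; above, compared against the naive (mod (expt a x) n).
(defun expt-mod (n exponent modulus)
  "As (mod (expt n exponent) modulus), but without huge intermediates."
  (loop with result = 1
        for i from 0 below (integer-length exponent)
        for sqr = n then (mod (* sqr sqr) modulus)
        when (logbitp i exponent)
          do (setf result (mod (* result sqr) modulus))
        finally (return result)))

(dolist (args '((2 10 1000) (7 560 561) (3 100000 65537)))
  (destructuring-bind (a x n) args
    (assert (= (mod (expt a x) n) (expt-mod a x n)))))
```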

The difference in LispWorks 4.2.6 was significant. (With mine being faster).

Here is my reasoning why this is so:
I think the use of logbitp and integer-length enables the 
implementation to reduce consing here. The ASH will allocate a new bignum 
per cycle in your code, while incrementing the bit counter i in mine will not.

There are two optimizations I see which would make modular exponentiation 
faster. One is reusing allocated bignums. The result does not have to be 
allocated over and over, and the same goes for sqr. result, sqr, and the 
intermediate results are never bigger than twice the size of the modulus.
It would be nice if one could preallocate two "bignum buffers" (for sqr and 
the result) and then let the arithmetic operations use those buffers.
I'm not sure how far dataflow optimization can deduce such reuse of bignums, 
or whether any implementation does something like that.

The other optimization would be to use Montgomery representation for the 
numbers. This would mean having a conversion step at the beginning and then 
using "Montgomery multiplication" instead of (mod (* ...) modulus). Once the 
result is calculated, one converts it back from Montgomery representation to 
normal representation. This allows one to leave out the MODs.
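To make this concrete, here is a rough sketch in portable CL of how such a
Montgomery exponentiation could look. All function names are mine, the
modulus must be odd, and since this still goes through ordinary bignum
arithmetic it only illustrates the structure - a serious version would
work on the bignum digits directly:

```lisp
(defun mod-inverse (a m)
  "Inverse of A modulo M, via the extended Euclidean algorithm."
  (let ((old-r a) (r m) (old-s 1) (s 0))
    (loop until (zerop r)
          do (let ((q (floor old-r r)))
               (psetf old-r r
                      r (- old-r (* q r))
                      old-s s
                      s (- old-s (* q s)))))
    (mod old-s m)))

(defun montgomery-mul (a b n n-prime k)
  "REDC(A * B) for A, B in Montgomery form w.r.t. R = 2^K.
N-PRIME is -N^-1 mod 2^K."
  (let* ((product (* a b))
         (m (logand (* product n-prime) (1- (ash 1 k)))) ; mod 2^k via mask
         (u (ash (+ product (* m n)) (- k))))            ; exact division by 2^k
    (if (>= u n) (- u n) u)))

(defun expt-mod-montgomery (base exponent n)
  "Modular exponentiation with an odd modulus N, in Montgomery form."
  (let* ((k (integer-length n))
         (r (ash 1 k))                          ; R = 2^k > N
         (n-prime (mod (- (mod-inverse n r)) r))
         (mbase (mod (* base r) n))             ; BASE in Montgomery form
         (result (mod r n)))                    ; 1 in Montgomery form
    (loop for i from 0 below (integer-length exponent)
          do (when (logbitp i exponent)
               (setf result (montgomery-mul result mbase n n-prime k)))
             (setf mbase (montgomery-mul mbase mbase n n-prime k)))
    (montgomery-mul result 1 n n-prime k)))     ; convert back out
```

The only MODs left are in the setup; the inner loop uses masking and
shifting instead.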

ciao,
Jochen

--
http://www.dataheaven.de
From: Erann Gat
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <gat-2606020933220001@192.168.1.50>
In article <············@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
<···@dataheaven.de> wrote:

> I rewrote it using (loop while (> x 0) ...)
> because while is not part of ANSI CL.

Yes, but it should be ;-)

> I also added declarations.
> 
> (defun expt-mod-n (a x n)
>   (declare (optimize (speed 3) (safety 0)))
>   (let ((result 1))
>     (loop while (> x 0)
>           do (if (oddp x)
>                  (setf result (mod (* result a) n)))
>           (setf x (ash x -1))
>           (setf a (mod (* a a) n)))
>     result))
> 
> I compared it to mine:
> 
> (defun expt-mod (n exponent modulus)
>   "As (mod (expt n exponent) modulus), but more efficient."
>   (declare (optimize (speed 3) (safety 0) (space 0) (debug 0))) 
>   (loop with result = 1
>         for i of-type fixnum from 0 below (integer-length exponent)
>         for sqr = n then (mod (* sqr sqr) modulus)
>         when (logbitp i exponent) do
>         (setf result (mod (* result sqr) modulus))
>         finally (return result)))
> 
> Of course the result of integer-length can be bigger than a fixnum, though
> not for the cryptographic algorithms I used it with.
> A more general CL implementation can test the ranges of its parameters 
> first and then choose the best (and, more importantly, a correct) 
> implementation.
> 
> The difference in LispWorks 4.2.6 was significant. (With mine being faster).

Indeed.  In MCL the difference is a factor of 10.  Zowie!

> Here is my reasoning why this is so:
> I think the use of logbitp and integer-length enables the 
> implementation to reduce consing here. The ASH will allocate a new bignum 
> per cycle in your code, while incrementing the bit counter i in mine will not.

Sounds plausible.

E.
From: Shannon Spires
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <svspire-93A12E.20402728062002@news.sandia.gov>
In article <············@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt 
<···@dataheaven.de> wrote:

> The other optimization would be to use Montgomery representation for the 
> numbers. This would mean having a conversion step at the beginning and 
> then using "Montgomery multiplication" instead of (mod (* ...) modulus). 
> Once the result is calculated, one converts it back from Montgomery 
> representation to normal representation. This allows one to leave out 
> the MODs.

In C, it is certainly true that the Montgomery representation speeds up 
modular exponentiation dramatically. In Common Lisp, it's more murky. 
Montgomery representation lets you do modular multiplies with shifts 
instead of divides for the modulus operation. But in Common Lisp, shifts 
of bignums typically cons, and may not be much faster than divides.
It depends on your implementation.

Other optimizations are worth trying too. For example, the Karatsuba 
formulation is a good way to optimize multiplication of large numbers in 
general, regardless of modulo operations (Montgomery only applies to 
modular multiplication). Karatsuba is elegantly recursive and 
substitutes two additions for one of the four needed sub-multiplications 
at each step. (Do a Google search on Karatsuba for more information). 
Karatsuba can multiply in O[N^1.585] operations instead of O[N^2] 
operations. CMUCL already uses Karatsuba internally for large bignum 
multiplications, but other CLs may not. 

Squaring is another avenue for optimization. It's possible to formulate 
squaring to be faster than general multiplication, and since 
exponentiation requires squaring, speeding up squaring speeds up 
exponentiation.

See http://members.tripod.com/careybloodworth/karatsuba.htm
for a good description of how to go one step further and combine the 
Karatsuba reformulation trick with squaring, which makes squaring faster 
still.

While it's possible to code all these optimizations in Lisp--and it's
an enlightening exercise to do so--such an approach will probably not
help the speed of your code much unless you're using a very smart 
compiler that knows how to avoid excess consing.

If you have LAP available, however, they could make a real difference.

--
Shannon Spires
From: Jochen Schmidt
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <afk0f1$3s$1@rznews2.rrze.uni-erlangen.de>
Shannon Spires wrote:

> In article <············@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt
> <···@dataheaven.de> wrote:
> 
>> The other optimization would be to use Montgomery representation for the
>> numbers. This would mean having a conversion step at the beginning and
>> then using "Montgomery multiplication" instead of (mod (* ...) modulus).
>> Once the result is calculated, one converts it back from Montgomery
>> representation to normal representation. This allows one to leave out
>> the MODs.
> 
> In C, it is certainly true that the Montgomery representation speeds up
> modular exponentiation dramatically. In Common Lisp, it's more murky.
> Montgomery representation lets you do modular multiplies with shifts
> instead of divides for the modulus operation. But in Common Lisp, shifts
> of bignums typically cons, and may not be much faster than divides.
> It depends on your implementation.

Yes, of course. I actually meant implementing Montgomery multiplication 
using the low-level facilities of particular implementations (for example, 
direct access to the bignum digit vector). 

> Other optimizations are worth trying too. For example, the Karatsuba
> formulation is a good way to optimize multiplication of large numbers in
> general, regardless of modulo operations (Montgomery only applies to
> modular multiplication). Karatsuba is elegantly recursive and
> substitutes two additions for one of the four needed sub-multiplications
> at each step. (Do a Google search on Karatsuba for more information).
> Karatsuba can multiply in O[N^1.585] operations instead of O[N^2]
> operations. CMUCL already uses Karatsuba internally for large bignum
> multiplications, but other CLs may not.
> 
> Squaring is another avenue for optimization. It's possible to formulate
> squaring to be faster than general multiplication, and since
> exponentiation requires squaring, speeding up squaring speeds up
> exponentiation.

Hm... this may be interesting too - I remember reading something about that
in the Handbook of Applied Cryptography.

> See http://members.tripod.com/careybloodworth/karatsuba.htm
> for a good description of how to go one step further and combine the
> Karatsuba reformulation trick with squaring, which makes squaring faster
> still.
> 
> While it's possible to code all these optimizations in Lisp--and it's
> an enlightening exercise to do so--such an approach will probably not
> help the speed of your code much unless you're using a very smart
> compiler that knows when how to avoid excess consing.

Yes, as I said, one has to solve the consing problem first. A 
WITH-BIGNUM-BUFFERS macro which enables the compiler to reuse bignum 
buffers could be a good idea (combined with dataflow analysis in the body 
of the macro).

> If you have LAP available, however, they could make a real difference.

What's LAP?

ciao,
Jochen

--
http://www.dataheaven.de
From: Shannon Spires
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <svspire-89B478.14501630062002@news.nmia.com>
In article <···········@rznews2.rrze.uni-erlangen.de>, Jochen Schmidt 
<···@dataheaven.de> wrote:

> Shannon Spires wrote:
> > Squaring is another avenue for optimization. It's possible to formulate
> > squaring to be faster than general multiplication, and since
> > exponentiation requires squaring, speeding up squaring speeds up
> > exponentiation.
> 
> Hm... this may be interesting too - I remember reading something about 
> that in the Handbook of Applied Cryptography.

Here's why. Pardon me if parts of this are too elementary, but I want
to be clear for the benefit of readers who might not have studied
multiplication algorithms.

Notice that if you think of bignum multiplication as
polynomial multiplication (which is what scalar multiplication
really is at its core), you could express (* A B) as the two-digit
polynomials

A2r + A1
times
B2r + B1

where r is your base or digit size. For computation, it's usually most 
convenient to make r a power of 2. (When one writes an ordinary decimal 
number, r is just ten, and both it and the "+" signs are implicit.)

So multiplication is then of course

A2B2r^2 + A1B2r + B1A2r + A1B1           [1]

which is four sub-multiplies (A2B2, A1B2, B1A2, and A1B1) plus some 
additions plus some shifts if we choose r to be a power of 2.

Thus our sub-multiplications (presumably the most expensive 
sub-operations) are O[n^2], where n is the number of digits. One could 
make r smaller and increase the number of digits; the multiplication is 
still O[n^2]. But I chose two digits for reasons that will become clear.

Notice that each of these four multiplications could be reformulated 
exactly the same way with a new r that is half the length of the old r. 
Thus multiplication itself is inherently recursive. This buys you no 
performance; it's just an interesting observation at this point, 
especially when you contrast it with the manifestly non-recursive
algorithm we were all taught for multiplication in grade school.

Now let's examine squaring by the same mechanism.
(* A A) becomes

A2r + A1
times
A2r + A1

which is 

A2A2r^2 + A1A2r + A1A2r + A1A1
or

A2^2r^2 + 2A1A2r + A1^2                   [2]

Aha! Only 3 multiplications now, and one fewer addition. Thus squaring 
is already faster than multiplication.

But we're just getting started.

By now, your recursion detector should be lit up bright red, and you've 
noticed that two of these multiplications are recursive calls to our 
square routine itself. The third is a general-purpose multiply that we 
cannot do anything about (just yet). So not only do we need 3/4 as many 
sub-multiplies, but two-thirds of _those_ need only 3/4 as many 
sub-sub-multiplies, etc.

Thus, without any Karatsuba tricks, we can formulate squaring 
recursively to be faster than general multiplication, provided we can 
do so without the recursion itself adding too much overhead.

The key further insights of the Karatsuba mechanism are that

a) You can reformulate [1] above to have one fewer multiplication than 
it has now, and if you implement it recursively, each sub-multiply gets 
the same advantage. Now, recursively expressing general multiplication 
_does_ buy you something. To wit:
   
   A2B2(r^2+r) + (A2-A1)(B1-B2)r + A1B1(r+1)       [3]

b) You can reformulate [2] above to make all _three_ sub-multiplies into 
recursive square calls, which makes things even faster than either [2]
or [3] if you're squaring:

   A2^2(r^2+r) - ((A2-A1)^2)r + A1^2(r+1)          [4]

Karatsuba imparts overhead of its own in the form of additional 
additions, shifts, and recursive calls. Reminiscent of Quicksort, it 
typically pays off only on fairly large numbers and one switches
to ordinary multiplications in the leaf nodes of the recursion tree.
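To make the recursion concrete, here is a sketch of [3] and [4] in
portable CL. The names and the cutoff value are mine; as noted, where
the crossover against the builtin * lies is implementation-dependent,
and in practice it can be quite high:

```lisp
(defparameter *karatsuba-cutoff* 512
  "Bit length below which we fall back on the builtin *.")

(defun karatsuba (a b)
  "Multiply via formula [3]: A2B2(r^2+r) + (A2-A1)(B1-B2)r + A1B1(r+1)."
  (cond ((minusp a) (- (karatsuba (- a) b)))   ; keep the digit split
        ((minusp b) (- (karatsuba a (- b))))   ; working on nonnegatives
        (t
         (let ((size (max (integer-length a) (integer-length b))))
           (if (< size *karatsuba-cutoff*)
               (* a b)
               (let* ((half (ash size -1))     ; r = 2^half
                      (mask (1- (ash 1 half)))
                      (a1 (logand a mask)) (a2 (ash a (- half)))
                      (b1 (logand b mask)) (b2 (ash b (- half)))
                      (high (karatsuba a2 b2))                 ; A2B2
                      (low  (karatsuba a1 b1))                 ; A1B1
                      (mid  (karatsuba (- a2 a1) (- b1 b2))))  ; (A2-A1)(B1-B2)
                 ;; regroup [3] as A2B2 r^2 + (A2B2 + mid + A1B1) r + A1B1
                 (+ (ash high (* 2 half))
                    (ash (+ high mid low) half)
                    low))))))))

(defun karatsuba-square (a)
  "Square via formula [4]: A2^2(r^2+r) - ((A2-A1)^2)r + A1^2(r+1)."
  (let* ((a (abs a))                           ; A^2 = (-A)^2
         (size (integer-length a)))
    (if (< size *karatsuba-cutoff*)
        (* a a)
        (let* ((half (ash size -1))
               (a1 (logand a (1- (ash 1 half))))
               (a2 (ash a (- half)))
               (high (karatsuba-square a2))         ; A2^2
               (low  (karatsuba-square a1))         ; A1^2
               (mid  (karatsuba-square (- a2 a1)))) ; (A2-A1)^2
          ;; regroup [4] as A2^2 r^2 + (A2^2 + A1^2 - mid) r + A1^2
          (+ (ash high (* 2 half))
             (ash (- (+ high low) mid) half)
             low)))))
```

Note that all three sub-multiplies in karatsuba-square are recursive
calls to itself, exactly as [4] promises.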

The potential speed boost of squaring is why I wish Common Lisp
had included #'square in the spec, although it probably only makes
much difference to cryptographers. Perhaps even better, implementers
might consider including a compiler macro that detects the
(* foo foo) case.
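As a sketch of what that could look like (the names are mine; note that
portable user code is not allowed to put a compiler macro on CL:*, so
this shadows * in its own package - an implementer could do the same
thing on the builtin directly):

```lisp
(defpackage #:fast-arith
  (:use #:cl)
  (:shadow #:*)
  (:export #:* #:square))
(in-package #:fast-arith)

(defun square (x)
  ;; Placeholder: an optimized squaring routine would go here.
  (cl:* x x))

(defun * (&rest args)
  "Drop-in replacement for CL:*, so the compiler macro has a home."
  (apply #'cl:* args))

(define-compiler-macro * (&whole form &rest args)
  ;; Rewrite (* foo foo) as (square foo) -- only when the argument is
  ;; a symbol, so we never duplicate a side-effecting form.
  (if (and (= (length args) 2)
           (symbolp (first args))
           (eq (first args) (second args)))
      `(square ,(first args))
      form))
```

Calls like (* x x) in compiled code then go through square, while
everything else falls through to the normal multiply.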

> > If you have LAP available, however, they could make a real difference.
> 
> What's LAP?

Lisp Assembly Programming. Some CL implementations (e.g. MCL and OpenMCL 
and some others) allow you to write assembler using Lisp notation.

--
Shannon Spires
From: Raymond Toy
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <4nd6u7pmz2.fsf@rtp.ericsson.se>
>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:

    >> Other optimizations are worth trying too. For example, the Karatsuba
    >> formulation is a good way to optimize multiplication of large numbers in
    >> general, regardless of modulo operations (Montgomery only applies to
    >> modular multiplication). Karatsuba is elegantly recursive and
    >> substitutes two additions for one of the four needed sub-multiplications
    >> at each step. (Do a Google search on Karatsuba for more information).
    >> Karatsuba can multiply in O[N^1.585] operations instead of O[N^2]
    >> operations. CMUCL already uses Karatsuba internally for large bignum
    >> multiplications, but other CLs may not.

No, CMUCL doesn't use Karatsuba.  I wrote one, but it was never
incorporated into the sources, mostly because it only speeds up
multiplication if the numbers are 512 or 1024 bits or more.  That was
as fast as I could get the Lisp implementation to go with
CMUCL-specific stuff.

Ray
From: Jochen Schmidt
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <afq595$kh8$1@rznews2.rrze.uni-erlangen.de>
Raymond Toy wrote:

>>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
> 
>     >> Other optimizations are worth trying too. For example, the
>     >> Karatsuba formulation is a good way to optimize multiplication of
>     >> large numbers in general, regardless of modulo operations
>     >> (Montgomery only applies to modular multiplication). Karatsuba is
>     >> elegantly recursive and substitutes two additions for one of the
>     >> four needed sub-multiplications at each step. (Do a Google search
>     >> on Karatsuba for more information). Karatsuba can multiply in
>     >> O[N^1.585] operations instead of O[N^2] operations. CMUCL already
>     >> uses Karatsuba internally for large bignum multiplications, but
>     >> other CLs may not.
> 
> No, CMUCL doesn't use Karatsuba.  I wrote one, but it was never
> incorporated into the sources, mostly because it only speeds up
> multiplication if the numbers are 512 or 1024 bits or more.  That was
> as fast as I could get the Lisp implementation to go with
> CMUCL-specific stuff.

While your post was certainly informative, I only wanted to note that it was 
not me who wrote the paragraph you quoted - it was Shannon.

ciao,
Jochen

--
http://www.dataheaven.de
From: Raymond Toy
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <4nbs9rnvfm.fsf@rtp.ericsson.se>
>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:

    Jochen> Raymond Toy wrote:
    >>>>>>> "Jochen" == Jochen Schmidt <···@dataheaven.de> writes:
    >> 
    >> >> Other optimizations are worth trying too. For example, the
    >> >> Karatsuba formulation is a good way to optimize multiplication of
    >> >> large numbers in general, regardless of modulo operations
    >> >> (Montgomery only applies to modular multiplication). Karatsuba is
    >> >> elegantly recursive and substitutes two additions for one of the
    >> >> four needed sub-multiplications at each step. (Do a Google search
    >> >> on Karatsuba for more information). Karatsuba can multiply in
    >> >> O[N^1.585] operations instead of O[N^2] operations. CMUCL already
    >> >> uses Karatsuba internally for large bignum multiplications, but
    >> >> other CLs may not.
    >> 
    >> No, CMUCL doesn't use Karatsuba.  I wrote one, but it was never
    >> incorporated into the sources, mostly because it only speeds up
    >> multiplication if the numbers are 512 or 1024 bits or more.  That was
    >> as fast as I could get the Lisp implementation to go with
    >> CMUCL-specific stuff.

    Jochen> While your post was certainly informative, I only wanted to note that it was 
    Jochen> not me who wrote the paragraph you quoted - it was Shannon.

Sorry, I obviously snipped the wrong part.  Mea culpa.

Ray
From: Espen Vestre
Subject: Re: The blowfish encryption algorithm, in CL [CODE]
Date: 
Message-ID: <kwznxifpbx.fsf@merced.netfonds.no>
Jochen Schmidt <···@dataheaven.de> writes:

> I would really be interested in seeing your modular exponentiation function,
> as this is the major bottleneck in my RSA implementation.

It's not fast. It's the main bottleneck in my implementation, too.

> I think a fast, low-level-tuned modular exponentiation function would be a 
> good enhancement to CL's numeric function repertoire. Many cryptographic 
> algorithms make use of it. It is possible to implement it fast by using 
> techniques like "Montgomery multiplication", which enables fast modular 
> exponentiation.

I whole-heartedly agree!

-- 
  (espen)