From: William Bland
Subject: Analysis of different algorithms (rather OT)
Date: 
Message-ID: <pan.2003.01.23.12.33.55.32488@abstractnonsense.com>
Hello all,
	I posted this to c.l.s recently but haven't had a reply yet.
I know it's fairly off-topic for both newsgroups - sorry!


My current implementation of the prime number sieve creates a list of
numbers from 2 to n, then takes the current head of the list, removes
all multiples of it from the tail of the list, and loops until the
tail is empty (building up a list of primes less than n in the process).
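
Roughly, in Common Lisp (an untested sketch of the idea, not my actual
code):

(defun sieve-by-division (n)
  ;; Take the head of the list as the next prime and remove its
  ;; multiples from the tail; repeat until the list is empty.
  (let ((numbers (loop for i from 2 to n collect i))
        (primes '()))
    (loop while numbers
          do (let ((p (pop numbers)))
               (push p primes)
               (setf numbers
                     (remove-if (lambda (m) (zerop (mod m p)))
                                numbers))))
    (nreverse primes)))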

I know division is expensive, so I was wondering if the following would
be more efficient:

Build a list of pairs (i b) where i is an integer and b is a boolean.
Initially we have all numbers from 2 to n, and all the booleans are #t.

We have an outer loop, j from 2 to n:
	Inside the loop, we have an inner loop k from 2*j to n, incrementing by j:
		Inside this loop we just set the boolean in the k'th pair to #f.

Once we have finished, we simply collect all the numbers in those pairs whose
boolean is #t.

I'm pretty sure this method works (although I haven't tried implementing
it yet - just did some pen-and-paper examples and it seems to work).
I'm not sure how to decide which is faster though (never done this kind of
thing before).
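
For reference, here's the kind of thing I have in mind, as an untested
sketch (using a vector of flags indexed by the number itself, rather than
a literal list of (i b) pairs):

(defun sieve-by-marking (n)
  ;; flag at index i stays T while i is still a candidate prime
  (let ((flags (make-array (1+ n) :initial-element t)))
    (loop for j from 2 to n
          do (loop for k from (* 2 j) to n by j
                   ;; cross out every multiple of j
                   do (setf (aref flags k) nil)))
    ;; collect the numbers whose flag survived
    (loop for i from 2 to n
          when (aref flags i) collect i)))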

My old method did something like O(n^2) divisions (actually I think it's
less than that but I don't know how much less).

The new method accesses each pair something like O(n^2) times (again, it's
actually less but I can't see a good way to estimate it).

Assuming accessing pairs from a list is faster than doing division...
and that I wasn't too far wrong on both of those O(n^2)'s...

Hmm.  I don't think I've really got the hang of this analysis business -
can somebody step in and stop me embarrassing myself further?! ;-)

Best wishes,
		Bill.

PS: To anyone who is worried this might be a homework problem, it's not.
I got my Ph.D. 3 years ago, but not in computing.  I'm teaching myself
this stuff from books in my spare time.

From: Nicolas Neuss
Subject: Re: Analysis of different algorithms (rather OT)
Date: 
Message-ID: <87fzrkt1d2.fsf@ortler.iwr.uni-heidelberg.de>
"William Bland" <····@abstractnonsense.com> writes:

> We have an outer loop, j from 2 to n:
> 	Inside the loop, we have an inner loop k from 2*j to n, incrementing by j:
> 		Inside this loop we just set the boolean in the k'th pair to #f.

A much more reasonable optimization of Eratosthenes' sieve is to see that
it is sufficient to loop for j from 2 upto (sqrt n): every composite m <= n
has a divisor <= sqrt(m) <= sqrt(n), so it is crossed out by then.  Then
the complexity of a straightforward array implementation is
O(n*logarithmic-terms(n)).
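
For concreteness, such an array implementation might look like this
(untested sketch):

(defun sieve (n)
  ;; crossing out multiples of every j up to sqrt(n) suffices
  (let ((composite (make-array (1+ n) :element-type 'bit
                               :initial-element 0)))
    (loop for j from 2 to (isqrt n)
          do (loop for k from (* 2 j) to n by j
                   do (setf (sbit composite k) 1)))
    (loop for i from 2 to n
          when (zerop (sbit composite i)) collect i)))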

Nicolas.
From: William Bland
Subject: Re: Analysis of different algorithms (rather OT)
Date: 
Message-ID: <pan.2003.01.23.14.48.27.760840@abstractnonsense.com>
On Thu, 23 Jan 2003 14:57:45 +0100, Nicolas Neuss wrote:

> "William Bland" <····@abstractnonsense.com> writes:
> 
>> We have an outer loop, j from 2 to n:
>> 	Inside the loop, we have an inner loop k from 2*j to n, incrementing by j:
>> 		Inside this loop we just set the boolean in the k'th pair to #f.
> 
> A much more reasonable optimization of Eratosthenes' sieve is to see that
> it is sufficient to loop for j from 2 upto (sqrt n).  Then the complexity
> of a straightforward array implementation is O(n*logarithmic-terms(n)).
> 
> Nicolas.

Oops, yes I *meant* to go up to (sqrt n)!

How do you get O(n*logarithmic-terms(n))?

Best wishes,
		Bill.
From: Nicolas Neuss
Subject: Re: Analysis of different algorithms (rather OT)
Date: 
Message-ID: <87y95bswvg.fsf@ortler.iwr.uni-heidelberg.de>
"William Bland" <····@abstractnonsense.com> writes:

> On Thu, 23 Jan 2003 14:57:45 +0100, Nicolas Neuss wrote:
> 
> > "William Bland" <····@abstractnonsense.com> writes:
> > 
> >> We have an outer loop, j from 2 to n:
> >> 	Inside the loop, we have an inner loop k from 2*j to n, incrementing by j:
> >> 		Inside this loop we just set the boolean in the k'th pair to #f.
> > 
> > A much more reasonable optimization of Eratosthenes' sieve is to see that
> > it is sufficient to loop for j from 2 upto (sqrt n).  Then the complexity
> > of a straightforward array implementation is O(n*logarithmic-terms(n)).
> > 
> > Nicolas.
> 
> Oops, yes I *meant* to go up to (sqrt n)!
> 
> How do you get O(n*logarithmic-terms(n))?
> 
> Best wishes,
> 		Bill.

The costs for the innermost loop can be bounded by

O(\sum_{k=1}^\sqrt{n}  n/k) = O(n \sum_{k=1}^\sqrt{n} 1/k) =
O(n \log \sqrt{n}) = O(n \log n)

Introducing some estimates on prime density from number theory, this bound
can be improved, but not by much (something like n \log \log n, IIRC).

Nicolas.
From: Nicolas Neuss
Subject: Re: Analysis of different algorithms (rather OT)
Date: 
Message-ID: <87ptqnsrgf.fsf@ortler.iwr.uni-heidelberg.de>
Nicolas Neuss <·············@iwr.uni-heidelberg.de> writes:

> The costs for the innermost loop can be bounded by
> 
> O(\sum_{k=1}^\sqrt{n}  n/k) = O(n \sum_{k=1}^\sqrt{n} 1/k) =
> O(n \log \sqrt{n}) = O(n \log n)
> 
> Introducing some estimates on prime density from number theory, this bound
> can be improved, but not by much (something like n \log \log n, IIRC).
> 
> Nicolas.

And looking again at these lines, I see that the O(n \log n) complexity
bound, at least, remains valid if you "loop for j from 2 upto n", since
the full sum \sum_{j=2}^n n/j is still O(n \log n).  So there is not much
gained by going only upto (sqrt n), contrary to what I indicated in my
first mail.  The major part of the work is done at small j.

Nicolas.
From: Geoffrey Summerhayes
Subject: Re: Analysis of different algorithms (rather OT)
Date: 
Message-ID: <_0%X9.8063$5K5.828710@news20.bellglobal.com>
"Nicolas Neuss" <·············@iwr.uni-heidelberg.de> wrote in message ···················@ortler.iwr.uni-heidelberg.de...
> Nicolas Neuss <·············@iwr.uni-heidelberg.de> writes:
>
> > The costs for the innermost loop can be bounded by
> >
> > O(\sum_{k=1}^\sqrt{n}  n/k) = O(n \sum_{k=1}^\sqrt{n} 1/k) =
> > O(n \log \sqrt{n}) = O(n \log n)
> >
> > Introducing some estimates on prime density from number theory, this bound
> > can be improved, but not by much (something like n \log \log n, IIRC).
> >
> > Nicolas.
>
> And looking again at these lines, I see that the O(n \log n) complexity
> bound, at least, remains valid if you "loop for j from 2 upto n", since
> the full sum \sum_{j=2}^n n/j is still O(n \log n).  So there is not much
> gained by going only upto (sqrt n), contrary to what I indicated in my
> first mail.  The major part of the work is done at small j.

Just a thought, ugly code follows...

(defun prime-p (n found-primes)
  ;; Trial-divide N by the primes found so far, stopping at sqrt(N).
  (let ((root (sqrt n)))
    (dolist (x found-primes t)
      (cond ((> x root) (return t))
            ((zerop (mod n x)) (return nil))))))

(defun generate-primes (n)
  ;; Keep a tail pointer so each new prime is appended in O(1) and
  ;; PRIME-P can reuse the growing list.
  (let* ((list (list 2))
         (tail list))
    (loop for x from 3 upto n by 2
          when (prime-p x list)
          do (progn
               (rplacd tail (cons x nil))
               (setf tail (cdr tail))))
    list))
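
A quick sanity check:

  (generate-primes 30)  =>  (2 3 5 7 11 13 17 19 23 29)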

--
Geoff
From: Nicolas Neuss
Subject: Re: Analysis of different algorithms (rather OT)
Date: 
Message-ID: <87k7gussbj.fsf@ortler.iwr.uni-heidelberg.de>
"Geoffrey Summerhayes" <·······@NhOoStPmAaMil.com> writes:

> Just a thought, ugly code follows...
> 
> (defun prime-p(n found-primes)
>   (let ((root (sqrt n)))
>     (dolist (x found-primes t)
>       (cond ((> x root) (return t))
>             ((zerop (mod n x)) (return nil))))))
> 
> (defun generate-primes(n)
>   (let* ((list (list 2))
>          (tail list))
>     (loop for x from 3 upto n by 2
>           when (prime-p x list)
>           do (progn
>                (rplacd tail (cons x nil))
>                (setf tail (cdr tail))))
>     list))

At first sight this may look like an improvement, but it probably isn't.
The reason is that prime density decays only very slowly: IIRC there are
something like n/ln(n) primes below n, and each candidate p has to be
trial-divided by the primes upto sqrt(p).  I assume that a careful analysis
would yield something like an O(n^(3/2)*logarithmic-terms(n)) operation
count.

Nicolas.
From: Geoffrey Summerhayes
Subject: Re: Analysis of different algorithms (rather OT)
Date: 
Message-ID: <mluY9.13703$5K5.1106674@news20.bellglobal.com>
"Nicolas Neuss" <·············@iwr.uni-heidelberg.de> wrote in message ···················@ortler.iwr.uni-heidelberg.de...
> "Geoffrey Summerhayes" <·······@NhOoStPmAaMil.com> writes:
>
> > Just a thought, ugly code follows...
> >
> > (defun prime-p(n found-primes)
> >   (let ((root (sqrt n)))
> >     (dolist (x found-primes t)
> >       (cond ((> x root) (return t))
> >             ((zerop (mod n x)) (return nil))))))
> >
> > (defun generate-primes(n)
> >   (let* ((list (list 2))
> >          (tail list))
> >     (loop for x from 3 upto n by 2
> >           when (prime-p x list)
> >           do (progn
> >                (rplacd tail (cons x nil))
> >                (setf tail (cdr tail))))
> >     list))
>
> At first sight this may look like an improvement, but it probably isn't.
> The reason is that prime density decays only very slowly: IIRC there are
> something like n/ln(n) primes below n, and each candidate p has to be
> trial-divided by the primes upto sqrt(p).  I assume that a careful analysis
> would yield something like an O(n^(3/2)*logarithmic-terms(n)) operation
> count.

This is one of the reasons I distrust big O: it tells you about asymptotic
behaviour but hides the constants.  What's the use of an O(1) procedure if
it takes 5*10^49 years to run?  A bare-bones sieve without division is
faster: without optimization, in one implementation the following found
the primes upto n=1000000 in almost 1/10th the time.

(defun sieve-primes (n)
  "Return a simple-bit-vector of size n+1 in which the bit at each
prime index is set and the bit at each non-prime index is clear."
  (let ((sieve (make-array (+ n 1)
                           :element-type 'bit
                           :initial-element 1)))
    (setf (sbit sieve 0) 0
          (sbit sieve 1) 0)
    (flet ((clear-sieve (i)
             ;; clear every proper multiple of i
             (loop for x from (+ i i) upto n by i
                   do (setf (sbit sieve x) 0))))
      (loop for x from 2 upto n
            when (= 1 (sbit sieve x))
            do (clear-sieve x))
      sieve)))
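
To get the primes back out as a list, something like this works
(SIEVE->LIST is just a throwaway helper, not part of the code above):

(defun sieve->list (sieve)
  ;; collect every index >= 2 whose bit is still set
  (loop for i from 2 below (length sieve)
        when (= 1 (sbit sieve i))
        collect i))

(sieve->list (sieve-primes 30))  =>  (2 3 5 7 11 13 17 19 23 29)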