From: John Thingstad
Subject: set intersection O(n) using ordering
Date: 
Message-ID: <op.szzctpiopqzri1@mjolner.upc.no>
On my last post yesterday I said set intersection could be done in O(n) is  
the
elements are ordered.
Here is a example.
It is primitive and has only been made to work with lists of integers.

However, it should be fairly easy to extend to general elements of the  
same type.
Obviously creating a set is O(n log n).
There is no overhead in memory and time when doing a intersection though.


(defun make-set (list)
   (sort list #'<))

(defun set-intersection (set1 set2)
   (when (or (= (length set1) 0) (= (length set2) 0))
     (return-from set-intersection nil))
   (let ((result-set nil))
     (block intersect
       (macrolet ((advance (pointer)
                    `(progn
                      (setf ,pointer (rest ,pointer))
                      (when (eq ,pointer nil) (return-from intersect  
nil)))))
         (loop
           (cond
            ((< (first set1) (first set2))
             (advance set1))
            ((< (first set2) (first set1))
             (advance set2))
            (t
             (push (first set1) result-set)
             (advance set1)
             (advance set2))))))
     (setf result-set (nreverse result-set))))

The defintion I use for set is:

(defun numlist-p (candidate) (and (listp candidate) (every #'integerp  
candidate)))

(defun sorted-p (candidate)
   (when (< (length candidate) 2)
     (return-from sorted-p t))
   (let ((current-list candidate) (trailing-list (rest candidate)))
     (loop
       (when (eq trailing-list nil)
         (return-from sorted-p t))
       (when (> (first current-list) (first trailing-list))
         (return-from sorted-p nil))
       (setf current-list trailing-list trailing-list (rest  
trailing-list)))))

(defun set-p (candidate) (and (numlist-p candidate) (sorted-p candidate)))

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

From: Louis Theran
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <1131552410.660295.226790@g43g2000cwa.googlegroups.com>
John Thingstad wrote:

> (defun set-intersection (set1 set2)
>    (when (or (= (length set1) 0) (= (length set2) 0))
>      (return-from set-intersection nil))
>    (let ((result-set nil))
>      (block intersect
>        (macrolet ((advance (pointer)
>                     `(progn
>                       (setf ,pointer (rest ,pointer))
>                       (when (eq ,pointer nil) (return-from intersect
> nil)))))
>          (loop
>            (cond
>             ((< (first set1) (first set2))
>              (advance set1))
>             ((< (first set2) (first set1))
>              (advance set2))
>             (t
>              (push (first set1) result-set)
>              (advance set1)
>              (advance set2))))))
>      (setf result-set (nreverse result-set))))

This strikes me as overly complicated.  Here is a simpler one.

(defun sorted-intersection (r s)
  (loop with l = (merge 'list r s '<=)
        for i in l
        for j in (cdr l)
        while j
        when (= i j) collect i))

If you are willing to accept a randomized algorithm, then something
like

(defun fast-intersection (l r)
  (loop
    with h = (make-hash-table :test 'eql)
    for i in (append l r)
    when (> (incf (gethash i h 0)) 1)
    collect i))

works too.

^L
From: Kaz Kylheku
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <1131565330.002914.3830@g44g2000cwa.googlegroups.com>
John Thingstad wrote:
> On my last post yesterday I said set intersection could be done in O(n) is
> the
> elements are ordered.

That must be so, since it can also be done if they are not ordered. The
test for set membership can be an O(1) operation (think hashtables).

You iterate over one set, collecting all elements which are present in
the other.

> Obviously creating a set is O(n log n).

Or O(n) if you just collect into an unordered bag such as a hash.
From: Louis Theran
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <1131566779.036016.260890@o13g2000cwo.googlegroups.com>
Kaz Kylheku wrote:

> That must be so, since it can also be done if they are not ordered. The
> test for set membership can be an O(1) operation (think hashtables).

The context here is deterministic algorithms.

^L
From: John Thingstad
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <op.szzougmlpqzri1@mjolner.upc.no>
On Wed, 09 Nov 2005 20:42:10 +0100, Kaz Kylheku <········@gmail.com> wrote:

> John Thingstad wrote:
>> On my last post yesterday I said set intersection could be done in O(n)  
>> is
>> the
>> elements are ordered.
>
> That must be so, since it can also be done if they are not ordered. The
> test for set membership can be an O(1) operation (think hashtables).
>
> You iterate over one set, collecting all elements which are present in
> the other.
>
>> Obviously creating a set is O(n log n).
>
> Or O(n) if you just collect into an unordered bag such as a hash.
>

You are missing the dicussion we had on this before under
Re: Why do Python, Java and Perl have so many libraries and CL does not?
(Yeah.. irrelevant topic header)

The idea is that these data sets are stored in lists.

William Bland had already offered a hash table approach.
The idea is that these list's are very long and that
there is a large overhead in heap usage and thus GC due to this.
My algorithm uses only one pass and allocates memory only
for the result.

Marcio Antionitti also seems to have missed this.
He allocates a new array using merge.
Also he needs two passes.

I have optimized for intersection at the cost of setting up
the set. The idea being that these set's are used in many
intersections.


-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: Marco Antoniotti
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <Iqtcf.74$pa3.24570@typhoon.nyu.edu>
John Thingstad wrote:
> On Wed, 09 Nov 2005 20:42:10 +0100, Kaz Kylheku <········@gmail.com> wrote:
> 
>> John Thingstad wrote:
>>
>>> On my last post yesterday I said set intersection could be done in 
>>> O(n)  is
>>> the
>>> elements are ordered.
>>
>>
>> That must be so, since it can also be done if they are not ordered. The
>> test for set membership can be an O(1) operation (think hashtables).
>>
>> You iterate over one set, collecting all elements which are present in
>> the other.
>>
>>> Obviously creating a set is O(n log n).
>>
>>
>> Or O(n) if you just collect into an unordered bag such as a hash.
>>
> 
> You are missing the dicussion we had on this before under
> Re: Why do Python, Java and Perl have so many libraries and CL does not?
> (Yeah.. irrelevant topic header)
> 
> The idea is that these data sets are stored in lists.
> 
> William Bland had already offered a hash table approach.
> The idea is that these list's are very long and that
> there is a large overhead in heap usage and thus GC due to this.
> My algorithm uses only one pass and allocates memory only
> for the result.
> 
> Marcio Antionitti also seems to have missed this.
> He allocates a new array using merge.
> Also he needs two passes.

Never did this.

I was talking about something else altogether.

> 
> I have optimized for intersection at the cost of setting up
> the set. The idea being that these set's are used in many
> intersections.
> 

I.e. you tailored your solution to a particular application, which is 
what should be done every time.

My application requires a lot of calls to an intersection function for 
relatively large sets which I choose to represent as lists (as I do not 
need a "search" function in the set).  My points were that (1) I got 
bitten by CL:INTERSECTION not guaranteeing at least expected O(N) time, 
and (2) writing an hash table based intersection function which gave me 
O(N) expected time performance solved my problems.

Cheers
--
marco
From: John Thingstad
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <op.szzs97ocpqzri1@mjolner.upc.no>
On Wed, 09 Nov 2005 21:59:19 +0100, Marco Antoniotti <·······@cs.nyu.edu>  
wrote:

> Never did this.
>
> I was talking about something else altogether.

Sorry! I was talkin abous someone else altogether. Lois Theran
Must have got my wires crossed.

>
> My application requires a lot of calls to an intersection function for  
> relatively large sets which I choose to represent as lists (as I do not  
> need a "search" function in the set).  My points were that (1) I got  
> bitten by CL:INTERSECTION not guaranteeing at least expected O(N) time,  
> and (2) writing an hash table based intersection function which gave me  
> O(N) expected time performance solved my problems.

Yes. I think we both figured that out in the end.

My point was that there are two ways of going about this
each with it's own advantages.
This was the option that wasn't covered by Blake.


-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: Louis Theran
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <1131573507.483000.139410@f14g2000cwb.googlegroups.com>
John Thingstad wrote:

> Sorry! I was talkin abous someone else altogether. Lois Theran

You are having a little trouble with names today. :)

^L
From: Cameron MacKinnon
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <VfqdnQ5-j57Bpe_eRVn-hg@rogers.com>
John Thingstad wrote:
> (defun set-intersection (set1 set2)
>   (when (or (= (length set1) 0) (= (length set2) 0))

(or (null set1) (null set2))
Think of one null set and one 10,000 element set.

>     (return-from set-intersection nil))
>   (let ((result-set nil))
>     (block intersect
...
From: Louis Theran
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <1131560558.793690.52470@g44g2000cwa.googlegroups.com>
Cameron MacKinnon wrote:

> Think of one null set and one 10,000 element set.

It's still linear time.  Do you have an algorithm more output sensitive
than the size of the smaller set?

^L
From: Cameron MacKinnon
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <b7WdnYB_05l83e_eRVn-rw@rogers.com>
Louis Theran wrote:
> Cameron MacKinnon wrote:
> 
> 
>>Think of one null set and one 10,000 element set.
> 
> 
> It's still linear time.  Do you have an algorithm more output sensitive
> than the size of the smaller set?

I don't really understand what you're saying here. My point was that the 
null set check should be constant time, but in John's code it was 
linear. You seem happy with this suboptimal aspect of the code as 
presented. Maybe we could improve it by checking the length of the 
shorter set first?  :-)
From: Louis Theran
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <1131562652.133886.173180@g44g2000cwa.googlegroups.com>
Cameron MacKinnon wrote:
> Louis Theran wrote:
> > Cameron MacKinnon wrote:
> >
> >
> >>Think of one null set and one 10,000 element set.
> >
> >
> > It's still linear time.  Do you have an algorithm more output sensitive
> > than the size of the smaller set?
>
> I don't really understand what you're saying here.

My algorithm looks like this asymptotically:

  Preprecessing time: n*log n
  Intersection time: n

With your improvement, we get to:

  Preprecessing time: n*log n
  Intersection time: size of smaller set

Ideally we would like something like:

  Proprocessing time: n*log n
  Intersection time: size of intersection + log n

since we are doing super-linear preprocessing work and wasting linear
space in the process.  I haven't thought much about it, so I don't know
what's possible.  In real problems you often get a little more
structure that allows for smarter algorithms.

> My point was that the
> null set check should be constant time, but in John's code it was
> linear.

I admit to not wading through his code, since it seemed overly
complicated for the job.  Sorry.

^L
From: John Thingstad
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <op.szzjkzbqpqzri1@mjolner.upc.no>
On Wed, 09 Nov 2005 18:55:08 +0100, Cameron MacKinnon  
<··········@clearspot.net> wrote:

> John Thingstad wrote:
>> (defun set-intersection (set1 set2)
>>   (when (or (= (length set1) 0) (= (length set2) 0))
>
> (or (null set1) (null set2))
> Think of one null set and one 10,000 element set.
>
>>     (return-from set-intersection nil))
>>   (let ((result-set nil))
>>     (block intersect
> ...

urm..yes.. The original code had a
(check-type set1 (set-p set1))
(check-type set2 (set-p set2))

If you look at the code for set-p you will see that it needs to
be of type list and thus the only special case is '()
I removed it for efficiency reasons and for clearity but
, yes, in the I gave you it assumes YOU assure it is a legal
set-p.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
From: John Thingstad
Subject: Re: set intersection O(n) using ordering
Date: 
Message-ID: <op.szzli1espqzri1@mjolner.upc.no>
On Wed, 09 Nov 2005 19:13:37 +0100, John Thingstad  
<··············@chello.no> wrote:

Oops I should check before I write..

(deftype set () `(satisfies set-p))

(defun setintersection (set1 set2)
   (check-type set1 set)
   (check-type set2 set)
   ...

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/