From: Oyvin Halfdan Thuv
Subject: Join of sets in CL
Date: 
Message-ID: <7ozmxs68ow.fsf@apollo.orakel.ntnu.no>
CL has a few set functions (member, union, intersection) and these are of
great use. However, while trying to implement the apriori (data-mining)
algrotitm by Agrawar/Srikant (look at this document for more information:
http://www.almaden.ibm.com/software/quest/Publications/papers/vldb94.pdf), I
need to implement join. As far as I can tell it is a natural join, as known
from database theory, but I'm not 100% sure on this. Therefore I kindly ask
anyone mastering the art of set theory to answer how to implement join in
CL.

The sets refered to as an example in the article are {{1 2 3} {1 2 4} {1 3 4}
{1 3 5} {2 3 4}}. This set (of sets) is joined by itself (for whatever that
means) and the result should be {{1 2 3 4} {1 3 4 5}}.

-- 
Best regards,
Oyvin

From: ···············@yahoo.com
Subject: Re: Join of sets in CL
Date: 
Message-ID: <1109353252.914770.32500@z14g2000cwz.googlegroups.com>
Please tell us what "join" and "natural join" mean.  For example, why
is the answer to the example not {{1 2 3 4 5}}?
Once we know what they mean, it will be fun to implement the functions
in CL.
From: Barry Margolin
Subject: Re: Join of sets in CL
Date: 
Message-ID: <barmar-AAE576.19434825022005@comcast.dca.giganews.com>
In article <·······················@z14g2000cwz.googlegroups.com>,
 ···············@yahoo.com wrote:

> Please tell us what "join" and "natural join" mean.  For example, why
> is the answer to the example not {{1 2 3 4 5}}?
> Once we know what they mean, it will be fun to implement the functions
> in CL.

I haven't read it, but his original post gave the URL to an article that 
supposedly explains what he's trying to do.

Also, if you know anything about relational databases, "join" is a 
well-known term there.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From: Barry Margolin
Subject: Re: Join of sets in CL
Date: 
Message-ID: <barmar-976382.19460025022005@comcast.dca.giganews.com>
In article <··············@apollo.orakel.ntnu.no>,
 Oyvin Halfdan Thuv <·····@remove.spam.oyvins.net> wrote:

> CL has a few set functions (member, union, intersection) and these are of
> great use. However, while trying to implement the apriori (data-mining)
> algrotitm by Agrawar/Srikant (look at this document for more information:
> http://www.almaden.ibm.com/software/quest/Publications/papers/vldb94.pdf), I
> need to implement join. As far as I can tell it is a natural join, as known
> from database theory, but I'm not 100% sure on this. Therefore I kindly ask
> anyone mastering the art of set theory to answer how to implement join in
> CL.

If this is database theory, the individual items in a join are not sets, 
they're tuples.  

> The sets refered to as an example in the article are {{1 2 3} {1 2 4} {1 3 4}
> {1 3 5} {2 3 4}}. This set (of sets) is joined by itself (for whatever that
> means) and the result should be {{1 2 3 4} {1 3 4 5}}.

A difference between sets and tuples is that tuples allow duplicates.  
E.g. one of the tuples in the set could be {1 1 3}, but as a set that 
would be {1 3}.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
From: John Thingstad
Subject: Re: Join of sets in CL
Date: 
Message-ID: <opsmrq9zu0pqzri1@mjolner.upc.no>
On 25 Feb 2005 15:00:31 +0100, Oyvin Halfdan Thuv  
<·····@remove.spam.oyvins.net> wrote:

> CL has a few set functions (member, union, intersection) and these are of
> great use. However, while trying to implement the apriori (data-mining)
> algrotitm by Agrawar/Srikant (look at this document for more information:
> http://www.almaden.ibm.com/software/quest/Publications/papers/vldb94.pdf),  
> I
> need to implement join. As far as I can tell it is a natural join, as  
> known
> from database theory, but I'm not 100% sure on this. Therefore I kindly  
> ask
> anyone mastering the art of set theory to answer how to implement join in
> CL.
>
> The sets refered to as an example in the article are {{1 2 3} {1 2 4} {1  
> 3 4}
> {1 3 5} {2 3 4}}. This set (of sets) is joined by itself (for whatever  
> that
> means) and the result should be {{1 2 3 4} {1 3 4 5}}.
>

Not sure quite what it means but this produses the given
result given the opening list.

(defvar list '((1 2 3) (1 2 4) (1 3 4) (1 3 5) (2 3 4)))

(defun join-pair (one two)
   (remove-duplicates
    (merge 'list (copy-list one) (copy-list two) #'eql)))

(defun join (list)
   (loop for first = (car list) then (car list)
         and second = (cadr list) then (cadr list)
         until (or (eq first nil) (eq second nil))
         do (setf list (cddr list))
         collect (join-pair first second)))

(join list)

=> ((1 2 3 4) (1 3 4 5))

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
From: Harald Hanche-Olsen
Subject: Re: Join of sets in CL
Date: 
Message-ID: <pcozmxs5qgl.fsf@shuttle.math.ntnu.no>
+ Oyvin Halfdan Thuv <·····@remove.spam.oyvins.net>:

| CL has a few set functions (member, union, intersection) and these are of
| great use. However, while trying to implement the apriori (data-mining)
| algrotitm by Agrawar/Srikant (look at this document for more information:
| http://www.almaden.ibm.com/software/quest/Publications/papers/vldb94.pdf), I
| need to implement join. As far as I can tell it is a natural join, as known
| from database theory, but I'm not 100% sure on this. Therefore I kindly ask
| anyone mastering the art of set theory to answer how to implement join in
| CL.
| 
| The sets refered to as an example in the article are {{1 2 3} {1 2 4} {1 3 4}
| {1 3 5} {2 3 4}}. This set (of sets) is joined by itself (for whatever that
| means) and the result should be {{1 2 3 4} {1 3 4 5}}.

From a quick look at the referenced pdf, I think that this "join"
operation is not so much an operation on sets as on ordered sets.
Notice how each set in the example is listed with their elements in
increasing order.  If I'm right, after ordering we're really being
asked to work on a set of lists, say

  {(1 2 3) (1 2 4) (1 3 4) (1 3 5) (2 3 4)}

(in quasi-Lisp notation).  Then we're looking for pairs in which all
but the final element agree.  One case is (1 2 3) and (1 2 4).  So we
take the union of their elements, in order: (1 2 3 4).  The only other
case it (1 3 4) and (1 3 5), together yielding (1 3 4 5).

Now the strategy seems clear: We hash on the keys (1 2) (1 2) (1 3) (1
3) (2 4), then iterate through the hash table.  Whenever a bucket
holds n>1 items, include the n(n-1)/2 items obtained by pairing them
up.

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- Debating gives most of us much more psychological satisfaction
  than thinking does: but it deprives us of whatever chance there is
  of getting closer to the truth.  -- C.P. Snow
From: Oyvin Halfdan Thuv
Subject: Re: Join of sets in CL
Date: 
Message-ID: <7owtsvh2ec.fsf@apollo.orakel.ntnu.no>
Harald Hanche-Olsen <······@math.ntnu.no> writes:

> + Oyvin Halfdan Thuv <·····@remove.spam.oyvins.net>:

> | The sets refered to as an example in the article are {{1 2 3} {1 2 4} {1 3 4}
> | {1 3 5} {2 3 4}}. This set (of sets) is joined by itself (for whatever that
> | means) and the result should be {{1 2 3 4} {1 3 4 5}}.
> 
> From a quick look at the referenced pdf, I think that this "join"
> operation is not so much an operation on sets as on ordered sets.

I believe so. "Items within an itemset are kept in lexicographic order", is
stated in the article.

> Notice how each set in the example is listed with their elements in
> increasing order.  If I'm right, after ordering we're really being
> asked to work on a set of lists, say
> 
>   {(1 2 3) (1 2 4) (1 3 4) (1 3 5) (2 3 4)}
> 
> (in quasi-Lisp notation).  Then we're looking for pairs in which all
> but the final element agree.  One case is (1 2 3) and (1 2 4).  So we
> take the union of their elements, in order: (1 2 3 4).  The only other
> case it (1 3 4) and (1 3 5), together yielding (1 3 4 5).

I found this:
http://www.togaware.com/datamining/survivor/Algorithm0.html (pseudocode).
The page is partially broken, so I'll post the interesting part here:

| Generate-Candidates(f_k):
| 
| candidates <- NIL
| for u <Element-Of> f_k, v <Element-Of> f_k, u < v:
|         if u_{1:-1} = v_{1:-1}:
|                 c <- u <Union> v_-1
|                 s <- Subsets(c)
|                 if Length(Filter(<lambda>x : x <Element-Of> f_k, s)) =
|                                                                  Length(s):
|                         candidates.append(c)
| return candidates

f_k is a set of item sets of length k, so u and v are item sets. It seems
that only the first element is compared here, before a union of the sets
are made. This makes sense though, because the arguments that are applied
to this funcion are sets of larger and larger size. They're not nessercarily
sets of size 3.

Since all sets are of the same size, the comparison u < v must refer to
lexicographical order.

The filter is supposed to remove any candidate that does not contain all
subsets of c, so to avoid looking into "uninteresting" relations later in
the algrorithm.

So if I understand you correctly, you are right, exept that we're hashing on
one element, not on pairs. So, thanks for your comments, I'll try to 
implement this in CL.

-- 
Oyvin