From: Ira Baxter
Subject: How to do set-equality comparisons in CommonLisp?
Date: 
Message-ID: <BAXTER.93Dec29183022@bandit.austin.slcs.slb.com>
I have two big sets (several thousand elements), represented as lists
with no duplicate elements, each list having arbitrary elements (well,
structures) in some arbitrary order.  How do I efficiently determine
if the contents of both lists are identical?

1) I can check that the result of set-exclusive-or on the two list is
null, but that goes through the extra work and consumes extra storage
to actually compute the set difference.  All I want is T or F.

2) I could insert the contents of the first list into a hash table,
and then delete everything found from the second list from it.  An
element from the second set that is missing from the hash table tells
me "not equal.".  A non-empty hash table when done tells me "not
equal". This suffers from much the same problem as 1.
At least I know it is linear time in the size of the sets.

3) I also thought of sorting of the lists, and comparing for EQUAL,
but I can't think of an appropriate ordering operator (sxhash?
Nah... no gaurantee that two different elements don't have same hash
code, which means sort order may be unstable, which makes
sort-then-compare unreliable).  This is N ln N time.

Any ideas?

[How do the built-in functions INTERSECTION and UNION work?
For LUCID, they seem to take linear time.]

--------------------------------------------------------------------
Ira Baxter   email: ······@austin.slcs.slb.com   ph: 1-512-331-3714
Schlumberger Laboratory for Computer Science
8311 North RR 620   [for snail mail, use PO Box 200015 instead]
Austin, Texas, 78720

--
--------------------------------------------------------------------
Ira Baxter   email: ······@austin.slcs.slb.com   ph: 1-512-331-3714
Schlumberger Laboratory for Computer Science
8311 North RR 620   [for snail mail, use PO Box 200015 instead]
Austin, Texas, 78720

From: David Loewenstern
Subject: Re: How to do set-equality comparisons in CommonLisp?
Date: 
Message-ID: <CJ2Bx5.96J@cbnewsl.cb.att.com>
In article <····················@bandit.austin.slcs.slb.com>, ······@bandit.austin.slcs.slb.com (Ira Baxter) writes:
> 
> 
> I have two big sets (several thousand elements), represented as lists
> with no duplicate elements, each list having arbitrary elements (well,
> structures) in some arbitrary order.  How do I efficiently determine
> if the contents of both lists are identical?

Have you considered adding a dummy slot to the structures?  Yes, it
adds an extra word per structure.  But then, you can simply mark all
the elements of the first list, check for an element in the second
list that's unmarked, then reverse the procedure.  
Time: O(length(list1)+length(list2))

These opinions are shareware.  If you like the product,
please send your $0.02 to
               David Loewenstern
<········@paul.rutgers.edu || ·················@att.com>
From: CHRISTOPHER ELIOT
Subject: Re: How to do set-equality comparisons in CommonLisp?
Date: 
Message-ID: <5JAN199418092187@cs.umass.edu>
In article <··········@cbnewsl.cb.att.com>, ·····@hogpb.att.com (David Loewenstern) writes...
>In article <····················@bandit.austin.slcs.slb.com>, ······@bandit.austin.slcs.slb.com (Ira Baxter) writes:
>> 
>> 
>> I have two big sets (several thousand elements), represented as lists
>> with no duplicate elements, each list having arbitrary elements (well,
>> structures) in some arbitrary order.  How do I efficiently determine
>> if the contents of both lists are identical?

You can sort both lists [ O(N log(N)) ] and then check for exact
equality [ O(N) ] provided there is any key to sort them on.

This is significantly better than the dumb element by element check
which is O(n*N) of course.

You can also use:

(and (subsetp S1 S2 :key #'your-key)
     (subsetp S2 S1 :key #'your-key))

If subsetp is implemented well this might also be efficient.

Chris Eliot
Umass/Amherst
From: Jens Franzen
Subject: Re: How to do set-equality comparisons in CommonLisp?
Date: 
Message-ID: <1994Jan6.073915.16800@newsserver.rrzn.uni-hannover.de>
Another possibility is to write your own function doing the following:

1. Store the elements of set1 in hashtable
2. Iterate over set2 and remove all elements in the hash table

The sets are not equal if 
a. an element of set2 is not in the hashtable
or
b. the hash table is not empty after the iteration over set2 is finished

The efficiency depends on the determination of the hash key and the table 
management, ignoring this it may be O(2n).


--------------------------------------------------------------------------------
Dipl.-Ing. Jens Franzen                    Tel. +49-511-762-5039
Laboratorium fuer Informationstechnologie  Fax. +49-511-762-5051
University of Hannover                     Telex. 923868 unihn d
Schneiderberg 32                           Email
D-3000  Hannover                           ·······@scoop.ims.uni-hannover.dbp.de
Fed. Rep. of Germany
--------------------------------------------------------------------------------
From: Patrick A. O'Donnell
Subject: Re: How to do set-equality comparisons in CommonLisp?
Date: 
Message-ID: <PAO.94Jan6102700@soggy-fibers.ai.mit.edu>
In article <··········@cbnewsl.cb.att.com>, ·····@hogpb.att.com (David Loewenstern) writes...
>In article <····················@bandit.austin.slcs.slb.com>, ······@bandit.austin.slcs.slb.com (Ira Baxter) writes:
>> 
>> 
>> I have two big sets (several thousand elements), represented as lists
>> with no duplicate elements, each list having arbitrary elements (well,
>> structures) in some arbitrary order.  How do I efficiently determine
>> if the contents of both lists are identical?

If you can use O(N) extra space, make a hash table; store each element
of the first list in the table; test each element of the second for
existence in the table; if all of the second set's elements are in
there and the sets are of the same length, they're identical.
O(N) time.

		- Patrick A. O'Donnell
		  ···@ai.mit.edu
From: Markus Fischer
Subject: Re: How to do set-equality comparisons in C
Date: 
Message-ID: <1994Jan6.160152.11303@unix.brighton.ac.uk>
In article ················@cs.umass.edu, ·····@cs.umass.edu (CHRISTOPHER ELIOT) writes:
> In article <··········@cbnewsl.cb.att.com>, ·····@hogpb.att.com (David Loewenstern) writes...
> >In article <····················@bandit.austin.slcs.slb.com>, ······@bandit.austin.slcs.slb.com (Ira Baxter) writes:
> >> 
> >> 
> >> I have two big sets (several thousand elements), represented as lists
> >> with no duplicate elements, each list having arbitrary elements (well,
> >> structures) in some arbitrary order.  How do I efficiently determine
> >> if the contents of both lists are identical?
> 
> You can sort both lists [ O(N log(N)) ] and then check for exact
> equality [ O(N) ] provided there is any key to sort them on.
> 
> This is significantly better than the dumb element by element check
> which is O(n*N) of course.
> 
> You can also use:
> 
> (and (subsetp S1 S2 :key #'your-key)
>      (subsetp S2 S1 :key #'your-key))
> 
> If subsetp is implemented well this might also be efficient.
 
(null (set-difference s1 s2))
might also be worthwile testing.
But as it turns out, (equal (sort s1 #'>) (sort s2 #'>)) 
outperformed the other approaches by an order of a magnitude
on both of the lisp platforms I have access to. (Test sequences
had 1000 different fixnums in it).
 
Markus Fischer (···@itri.bton.ac.uk)
Information Technology Research Institute - University of Brighton
Lewes Road, Brighton BN2 4AT, UK  Tel: +44 273 642916  Fax: 606653