From: Paul Reiners
Subject: Q about Lisp union
Date: 
Message-ID: <9dd75922-7d7c-4a48-af7e-d03e09518cf2@t54g2000hsg.googlegroups.com>
As I thought I understood it, Lisp union implements set union. So, I
thought that

  (union '(a a) '(a b))

would return

  (a b)

And, in fact, it does. This led me to think that

  (union '(a a) '(b))

would also return

  (A B)

However, it returns:

  (A A B)

Why does the fact that there is one less 'a in the second argument in
the second call mean that there is one more 'a in the result? This is
definitely not set-theoretic union. Why is the behavior of Lisp union
defined this way?

Here is the sequence of calls again:

  [1]> (union '(a a) '(a b))
  (A B)
  [2]> (union '(a a) '(b))
  (A A B)

From: Paul Reiners
Subject: Re: Q about Lisp union
Date: 
Message-ID: <a6376be0-3640-41e3-843b-616278402dad@m44g2000hsc.googlegroups.com>
On Mar 26, 8:25 am, Paul Reiners <············@gmail.com> wrote:
> As I thought I understood it, Lisp union implements set union. So, I
> thought that
>
>   (union '(a a) '(a b))
>
> would return
>
>   (a b)
>
> And, in fact, it does. This led me to think that
>
>   (union '(a a) '(b))
>
> would also return
>
>   (A B)
>
> However, it returns:
>
>   (A A B)
>
> Why does the fact that there is one less 'a in the second argument in
> the second call mean that there is one more 'a in the result? This is
> definitely not set-theoretic union. Why is the behavior of Lisp union
> defined this way?
>
> Here is the sequence of calls again:
>
>   [1]> (union '(a a) '(a b))
>   (A B)
>   [2]> (union '(a a) '(b))
>   (A A B)

It turns out that if there are duplicate entries in either of the two
lists, the behavior is undefined as to whether there should be
duplicate entries in the union:

If there is a duplication between list-1 and list-2, only one of the
duplicate instances will be in the result. If either list-1 or list-2
has duplicate entries within it, the redundant entries might or might
not appear in the result. [my emphasis]

--- from http://www.lisp.org/HyperSpec/Body/fun_unioncm_nunion.html

This is from the ANSI Common Lisp standard.

The fact that they leave the behavior of union in this case undefined
in the standard is surprising to me.  The fact that SBCL and GNU CLISP
implement it the way they do is just bizarre to me.
From: Thomas A. Russ
Subject: Re: Q about Lisp union
Date: 
Message-ID: <ymiabklry90.fsf@blackcat.isi.edu>
Paul Reiners <············@gmail.com> writes:

> As I thought I understood it, Lisp union implements set union. So, I
> thought that
> 
>   (union '(a a) '(a b))
>   (union '(a a) '(b))

These are illegal inputs to UNION.  Union is allowed to assume that the
inputs to it are proper sets, so you are not supposed to pass '(a a) as
an argument.  Internally, the algorithm can do whatever it wants to
process the union based on the assumption that it has gotten sets.  It
does not need to check for duplicates in its input.

One could imagine a simple algorithm that tries to be a little bit
clever by starting with the larger set and then only testing the smaller
set for membership.  Perhaps that is what your implementation does.

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Pascal Bourguignon
Subject: Re: Q about Lisp union
Date: 
Message-ID: <87wsnpqgj3.fsf@thalassa.informatimago.com>
···@sevak.isi.edu (Thomas A. Russ) writes:

> Paul Reiners <············@gmail.com> writes:
>
>> As I thought I understood it, Lisp union implements set union. So, I
>> thought that
>> 
>>   (union '(a a) '(a b))
>>   (union '(a a) '(b))
>
> These are illegal inputs to UNION.  Union is allowed to assume that the
> inputs to it are proper sets, so you are not supposed to pass '(a a) as
> an argument.  

Absolutely not.

CLHS is extremely clear about that.  Either input list can contain
duplicates, and CLHS explicitely  specifies that in this case the
result may contain duplicates or not.

> Internally, the algorithm can do whatever it wants to
> process the union based on the assumption that it has gotten sets.  

There is no specific assumption.  CL:UNION is merely specified to return:

      "a list that contains every element that occurs in either
      list-1 or list-2."

> It does not need to check for duplicates in its input.

Indeed, as specified, it doesn't need to.

> One could imagine a simple algorithm that tries to be a little bit
> clever by starting with the larger set and then only testing the smaller
> set for membership.  Perhaps that is what your implementation does.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d? s++:++ a+ C+++ UL++++ P--- L+++ E+++ W++ N+++ o-- K- w--- 
O- M++ V PS PE++ Y++ PGP t+ 5+ X++ R !tv b+++ DI++++ D++ 
G e+++ h+ r-- z? 
------END GEEK CODE BLOCK------
From: Scott Burson
Subject: Re: Q about Lisp union
Date: 
Message-ID: <56c6ff3f-0398-4480-9bac-a2b4e0226c05@i29g2000prf.googlegroups.com>
If you want to do set operations in Lisp -- and particularly if you
want to do them efficiently on large sets -- permit me to recommend my
FSet library:

http://common-lisp.net/project/fset/

It uses binary trees to do union and similar operations in linear
time.  It also permits structures like sets of sets, maps from sets to
something else, etc. to be manipulated very elegantly.

-- Scott
From: Ken Tilton
Subject: Re: Q about Lisp union
Date: 
Message-ID: <47ead457$0$15194$607ed4bc@cv.net>
Scott Burson wrote:
> If you want to do set operations in Lisp -- and particularly if you
> want to do them efficiently on large sets -- permit me to recommend my
> FSet library:
> 
> http://common-lisp.net/project/fset/
> 
> It uses binary trees to do union and similar operations in linear
> time.  It also permits structures like sets of sets, maps from sets to
> something else, etc. to be manipulated very elegantly.

haha, sounds hot and I actually have an application for that if I decide 
to tear back into one of those ITA puzzles in anger.

now just keep pounding away at these yobbos. :)

c,k

-- 
http://smuglispweeny.blogspot.com/
http://www.theoryyalgebra.com/

"In the morning, hear the Way;
  in the evening, die content!"
                     -- Confucius
From: Pascal J. Bourguignon
Subject: Re: Q about Lisp union
Date: 
Message-ID: <7ciqz9sfvl.fsf@pbourguignon.anevia.com>
Paul Reiners <············@gmail.com> writes:

> As I thought I understood it, Lisp union implements set union. So, I
> thought that
>
>   (union '(a a) '(a b))
>
> would return
>
>   (a b)
>
> And, in fact, it does. This led me to think that
>
>   (union '(a a) '(b))
>
> would also return
>
>   (A B)
>
> However, it returns:
>
>   (A A B)
>
> Why does the fact that there is one less 'a in the second argument in
> the second call mean that there is one more 'a in the result? 

Why do you care?


(defun set-equalp (x y) (and (subsetp x y) (subsetp y x)))
(set-equalp '(a a b) '(a b)) --> true


> This is
> definitely not set-theoretic union. 

Not at all. This is definitely set-theoretic. You have to distinguish
the sets from their representations.  


Note that you just CANNOT distinguish the two sets (a a b) and (a b).
Try it:

(defun try-to-distinguish (x y)
   (dolist (element x)
      (assert (member element y)))
   (dolist (element y)
      (assert (member element x)))
   (values 'undistinguishable-sets x y))

(try-to-distinguish '(a a b) '(a b)) 
--> UNDISTINGUISHABLE-SETS ;
    (A A B) ;
    (A B)


And remember that length is defined on lists, not on sets!

(defun cardinal (set)   (length (remove-duplicates set)))

(= (cardinal '(a a b)) (cardinal '(a b))) --> true



> Why is the behavior of Lisp union
> defined this way?

Because this is more efficient.

CL set-functions represent sets as _lists_ of (possibly duplicated)
elements.

If you want to normalize your representations of set, you can always
use REMOVE-DUPLICATES

(defun normalized-union (x y) (remove-duplicates (union x y)))


-- 
__Pascal Bourguignon__
From: Paul Reiners
Subject: Re: Q about Lisp union
Date: 
Message-ID: <575c707f-42fd-48f3-80bb-0c231d4dd62f@a23g2000hsc.googlegroups.com>
On Mar 26, 9:45 am, ····@informatimago.com (Pascal J. Bourguignon)
wrote:
> Paul Reiners <············@gmail.com> writes:

> (defun set-equalp (x y) (and (subsetp x y) (subsetp y x)))
> (set-equalp '(a a b) '(a b)) --> true
>
> > This is
> > definitely not set-theoretic union.
>
> Not at all. This is definitely set-theoretic. You have to distinguish
> the sets from their representations.

Yes, you're right.  If I'm willing to accept '(a a) as representing
the set '(a) when I use it as an input to union, I should be just as
willing to accept '(a a b) as representing the set '(a b) when I
receive it back from union.
From: Michael Livshin
Subject: Re: Q about Lisp union
Date: 
Message-ID: <877ifpa7vf.fsf@colinux.kakpryg.net.cmm>
Paul Reiners <············@gmail.com> writes:

> Why does the fact that there is one less 'a in the second argument in
> the second call mean that there is one more 'a in the result? This is
> definitely not set-theoretic union. Why is the behavior of Lisp union
> defined this way?

because UNION expects sets as input, and '(A A) is not a set.

> Here is the sequence of calls again:
>
>   [1]> (union '(a a) '(a b))
>   (A B)

here, the daemons that flew out of your nose were benevolent.

>   [2]> (union '(a a) '(b))
>   (A A B)

... and then there were only nasty ones left.

cheers,
--m
From: Pascal J. Bourguignon
Subject: Re: Q about Lisp union
Date: 
Message-ID: <7cej9xsf2h.fsf@pbourguignon.anevia.com>
Michael Livshin <······@cmm.kakpryg.net> writes:

> Paul Reiners <············@gmail.com> writes:
>
>> Why does the fact that there is one less 'a in the second argument in
>> the second call mean that there is one more 'a in the result? This is
>> definitely not set-theoretic union. Why is the behavior of Lisp union
>> defined this way?
>
> because UNION expects sets as input, and '(A A) is not a set.

Indeed. '(A A) reads a form.  
It evaluates to something that prints as: (A A) and that is a list.
It just happens that CL uses lists to _represent_ sets.

The two lists (A A B) and (A B) are obviously different.
But (A B) and (B A) are also different lists.

However, these three lists (A A B), (A B) and (B A) are all
representants of the _same_ set: {A, B}.

Note that in mathematics, {a,a,b}={a,b}={b,a}, since we also write
_representations_ of mathematical sets.

-- 
__Pascal Bourguignon__
From: John Thingstad
Subject: Re: Q about Lisp union
Date: 
Message-ID: <op.t8ms3p2nut4oq5@pandora.alfanett.no>
P� Wed, 26 Mar 2008 14:25:11 +0100, skrev Paul Reiners  
<············@gmail.com>:

> As I thought I understood it, Lisp union implements set union. So, I
> thought that ...

The set isssue has been discussed by others.
I thought I'd mention remove-duplicates.

--------------
John Thingstad
From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: Q about Lisp union
Date: 
Message-ID: <rem-2008mar27-004@yahoo.com>
> From: Paul Reiners <············@gmail.com>
> As I thought I understood it, Lisp union implements set union.

Only if that function is given two sets to begin with.
If you give it something that's not a set (for either parameter),
then what it's supposed to do is undefined. IMO if safety and debug
setting is maximum and speed and space setting is zero, it *should*
carefully check for Garbage In (invalid parameter) before doing
what it's supposed to do, and if you gave it something that wasn't
a set then it *should* signal an error and refuse to even try to
return a value. If it doesn't signal an error in this case, you
could complain to the company that wrote the Lisp system you're
using, or you could write a wrapper function that **does** do the
Garbage In check-and-signal-error.
(defun paranoid-union (a b)
  (unless (is-set a) (error "First parameter not a set:~%~S" a))
  (unless (is-set b) (error "Second parameter not a set:~%~S" a))
  (union a b))
(defun is-set (s)
  (not (has-duplicate-elements s)))
(defun has-duplicate-elements (s) ;order length^2 time
  (loop for (elem . tail) on s
        do (if (member elem tail) (return t))))
(defun has-duplicate-elements (s) ;much faster for large sets
  (let ((ht (make-hash-table))) ;Defaults EQL
    (loop for elem in s
          do (if (gethash elem ht) (return t) (setf (gethash elem ht) t)))))
;Exercise for reader to include keyword parameters too.

> So, I thought that
>  (union '(a a) '(a b))
> would ...
STOP ALREADY!!

Don't just read the name of a function and guess what it does.
Instead, read the documentation for it and try to understand what it says.
<http://www.lispworks.com/documentation/HyperSpec/Body/f_unionc.htm>
   union list-1 list-2 &key key test test-not => result-list
   ... If either list-1 or list-2 has duplicate entries within it,
   the redundant entries might or might not appear in the result. ...

(describe 'union) in CMUCL doesn't give that warning. The HyperSpec
is better for new functions you haven't used before.

  [1]> (union '(a a) '(a b))
  (A B)
  [2]> (union '(a a) '(b))
  (A A B)
Remember the words of the HyperSpec:
   "the redundant entries might or might not appear in the result"
You are observing exactly that behaviour!! What more could you ask for?
(Well IMO it *could* have been specified as being allowed to throw
 an exception, signal an error, if debug/safety is high and
 speed/storage is low, but the wind didn't blow that way.)

Question to Kent: If anyone ever finds a mistake in the CLHS, would
you edit the Web page to correct the mistake?

Followup question to Kent: Would you, like Knuth, pay $10 or $25 to
anyone finding a mistake?
From: Russell McManus
Subject: Re: Q about Lisp union
Date: 
Message-ID: <87prtc3oky.fsf@thelonious.cl-user.org>
Not really relevant to the original poster's question, but am I the
only one who ever thought that CL's implementation of set operations
on top of the list data type is sad?

In my view, this is a symptom of a larger hole in the standard: there
is no sorted collection datatype like a heap.  If there was, then the
set operations would naturally be defined in terms of it, rather than
list[1].

This omission is #1 on my list of things missing from CL.

-russ


[1] Or perhaps in addition to the list datatype.
From: Scott Burson
Subject: Re: Q about Lisp union
Date: 
Message-ID: <fa4a885d-9ef4-4a90-90a0-5881d5077601@c19g2000prf.googlegroups.com>
On Mar 29, 7:58 pm, Russell McManus <···············@yahoo.com> wrote:
> Not really relevant to the original poster's question, but am I the
> only one who ever thought that CL's implementation of set operations
> on top of the list data type is sad?
>
> In my view, this is a symptom of a larger hole in the standard: there
> is no sorted collection datatype like a heap.  If there was, then the
> set operations would naturally be defined in terms of it, rather than
> list[1].
>
> This omission is #1 on my list of things missing from CL.
>
> -russ
>
> [1] Or perhaps in addition to the list datatype.

I have endeavored to rectify this omission.  Please have a look:

  http://common-lisp.net/project/fset/

-- Scott