From: Jedrzej Nasiadek
Subject: simple exercise from ACL
Date: 
Message-ID: <9hdh3m$ps3$2@sunsite.icm.edu.pl>
Hi,

I'm a lisp beginner, I'm learning from Paul Grahams ACL book.
I've got a question about lisp style of coding - there is an exercise
in this book:
   Write a version of "union" that preserves the order of the elements
   in the original lists:

   > (new-union '(a b c) '(b a d))
   (A B C D)

I solved it as below:
(defun new-union (l1 l2)
  (let ((res nil))
    (dolist (obj (append l1 l2))
            (setf res (adjoin obj res)))
    (reverse res)))

Is there a better, more proper, or simply more elegant way to do it according
to lisp-style-of-thinking?

bye

-- 
[  Yen  *  Jedrzej Nasiadek  *  J.Nasiadek_at_ia.pw.edu.pl  ]

From: Kent M Pitman
Subject: Re: simple exercise from ACL
Date: 
Message-ID: <sfwn16twqiw.fsf@world.std.com>
Jedrzej Nasiadek <····@sig.below.pl> writes:

[I'll assume this isn't homework.]

> Hi,
>
> I'm a lisp beginner,

Welcome.

> I'm learning from Paul Grahams ACL book.
> I've got a question about lisp style of coding - there is an exercise
> in this book:
>    Write a version of "union" that preserves the order of the elements
>    in the original lists:
> 
>    > (new-union '(a b c) '(b a d))
>    (A B C D)
> 
> I solved it as below:
> (defun new-union (l1 l2)
>   (let ((res nil))
>     (dolist (obj (append l1 l2))
>             (setf res (adjoin obj res)))
>     (reverse res)))
> 
> Is there a better, more proper, or simply more elegant way to do it according
> to lisp-style-of-thinking?

It depends on what you're out to achieve.  Maybe rather than saying
"better" or "more proper" let's just look at some things you can change
and get different elegance/speed effects.

First, when you dolist down the list (append l1 l2), you're consing a
list for no purpose other than to walk over it.  Another way to write
this is

 (defun new-union (l1 l2)
   (let ((result '()))
     (flet ((adjoin1 (item) (setf result (adjoin item result))))
       (declare (dynamic-extent #'adjoin1)) ;no "upward closure" required
       (mapc #'adjoin1 l1)
       (mapc #'adjoin1 l2)
       (reverse result))))

So here you are basically avoiding doing the work and consing of the append
and still traversing the lists l1 and l2 in order.  [Some implementations
may not do the dynamic-extent declaration correctly, and so may cons a 
closure for the FLET.  Even so, they will probably only cons 2 such closures,
one for each use of #'adjoin, and you may still get more efficiency.  If you
were utterly paranoid, you could refer to #'adjoin1 only once and then put
the value in a LET.  However, I recommend just ignoring this issue or
sending a bug report if  you find it conses gratuitously.
(You could also just repeat the SETF code in two explicit dolist forms,
one for l1 and one for l2.)

Second, you yourself are building the return list RESULT.  No one else.
There is no bad effect created in such a case, and there are many benefits,
to using NREVERSE instead of REVERSE.  That recycles the same storage as
you built up with ADJOIN in order to "make" the returned object.

Third, personally, I always use SETQ instead of SETF when I'm just assigning
variables. SETQ is conceptually more primitive.  However, others will tell
you they prefer to uniformly use SETF and never SETQ.  It's just a personal
style issue you should know about.

Fourth, you'll note in the code above, I replaced (LET ((RESULT NIL)) ...)
with (LET ((RESULT '())) ...).  '() and 'NIL and NIL all evaluate to the
same object.  However, as a point of style, I and others use '() to refer
to the intended  use of a list, 'NIL to refer to the intended use of a 
symbol, and NIL to refer to the constant.

Fifth, there are recursive solutions.  I'm glad to see you didn't do one.
Common Lisp, unlike Scheme, does not reliably remove stack on tail-calls,
so had you recursed down the tail, there are examples for which you'd have
run out of stack.  This implementation you picked with DOLIST is better
for CL's semantics.

Sixth, the iterative use of ADJOIN makes this take O(N^2) time, where N is
the length of the list.  You can cut down the time to O(N) in some special
cases with more work, and lists of all symbols are one such case.  This is
mostly of interest for sets of very long lists.  e.g.,

 (defun new-union (l1 l2)
   (macrolet ((do-l1-and-l2 ((var) &body forms)
                `(progn (dolist (,var l1) ,@forms)
                        (dolist (,var l2) ,@forms))))
     (let ((tag (gensym)) (result '()))
       (unwind-protect (do-l1-and-l2 (element)
                         (unless (get element tag)
                           (setf (get element tag) t)
                           (push element result)))
         (do-l1-and-l2 (element)
           (remprop element tag)))
       (nreverse result))))

You might do this if you knew L1 and/or L2 were very long and their plists
were very short (which usually plists are).

Seventh, there are also things you know jsut by inspection of the problem
like that if l1 is really a set, you can take it for granted that it is in
order and dup free.  So actually just

 (defun new-union (l1 l2)
   (let ((tag (gensym)) (extras '()))
      (unwind-protect (progn (dolist (element l1)
                               (setf (get element tag) t))
                             (dolist (element l2)
                               (unless (get element tag)
                                 ;; No need to mark this since it was 
                                 ;; either already marked, or is not 
                                 ;; duplicated in l2
                                 (push element extras))))
        (dolist (element l1)
          (remprop element tag)))
      (append l1 (nreverse extras))))

Of course, you could factor back out the special marking hack with the gensyms
and turn this one back into a use of adjoin, for the case where you were
not known to have all symbols so couldn't use property lists to efficiently
notate whether an element had been seen.

This trick does work with GETHASH instead of GET for some other situations.
I'll leave that as an exercise, too.  Whether it's profitable depends on
the speed of your hash tables and the lengths of the lists.
From: Jedrzej Nasiadek
Subject: Re: simple exercise from ACL
Date: 
Message-ID: <3b3fac14@news.astercity.net>
Kent M Pitman <······@world.std.com> wrote:

Thanks a lot for your time and for such an indepth explanation!
It was very helpful.

> [I'll assume this isn't homework.]

Nope - I'm learning lisp just for myself.
 
> Third, personally, I always use SETQ instead of SETF when I'm just assigning
> variables. SETQ is conceptually more primitive.  However, others will tell
> you they prefer to uniformly use SETF and never SETQ.  It's just a personal
> style issue you should know about.

So - just for being sure - though SETQ is more primitive there is no
performance gain in using it (for assigning variables)?

> Fifth, there are recursive solutions.  I'm glad to see you didn't do one.
> Common Lisp, unlike Scheme, does not reliably remove stack on tail-calls,
> so had you recursed down the tail, there are examples for which you'd have
> run out of stack.  This implementation you picked with DOLIST is better
> for CL's semantics.

It's good to know such a thing - as I browsed several examples of lisp code
It seemed to me there is some kind of obsession of writing recursive
functions everywhere it's possible. 

bye,

-- 
[  Yen  *  Jedrzej Nasiadek  *  J.Nasiadek_at_ia.pw.edu.pl  ]
From: Kent M Pitman
Subject: Re: simple exercise from ACL
Date: 
Message-ID: <sfwsnggupho.fsf@world.std.com>
Jedrzej Nasiadek <····@sig.below.pl> writes:

> Kent M Pitman <······@world.std.com> wrote:
> 
> Thanks a lot for your time and for such an indepth explanation!
> It was very helpful.

:-)

> > Third, personally, I always use SETQ instead of SETF when I'm just
> > assigning variables. SETQ is conceptually more primitive.
> > However, others will tell you they prefer to uniformly use SETF
> > and never SETQ.  It's just a personal style issue you should know
> > about.
> 
> So - just for being sure - though SETQ is more primitive there is no
> performance gain in using it (for assigning variables)?

No. Probably I'm just stuck in my ways as an old-timer.  There used to be an
issue in the very early implementations of Lisp with SETQ being more efficient
in the interpreter, but even that has been resolved.  Mostly people don't run
interpreted code any more, but even when they do it's not as inefficient as
you might guess.  Just using SETF for everything will probably do just fine
for you, and will be in many ways conceptually simpler.