From: Kevin Rodgers
Subject: random selection w/o replacement
Date: 
Message-ID: <1991Jul25.152843.28948@colorado.edu>
I'm trying to randomly select n > 1 distinct elements from a sequence.
The most straightforward way would be to randomly select the first
element, then randomly select the second from a sequence from which the
first element had been REMOVEd, and so on.  However, I need to execute
this operation many, many times during my simulation program, and the
lists I am using aren't short, so I'm looking for an approach that
avoids allocating so much temporary storage.

I've discovered a method that works for n = 2, but I don't know how it
could be generalized for n > 2:

(defun select-random-pair (sequence)
  (let* ((length (length sequence))
	 (elt-1 (elt sequence (random length)))
	 (elt-2 (elt sequence
		     (mod (+ (position elt-1 sequence :test #'eq)
			     (random (- length 1))
			     1)
			  length))))
    (values elt-1 elt-2)))

The idea is that after the first element has been selected, the
remaining length - 1 elements are contiguously indexed from its
position, modulo the length of the sequence.  The problem is, after the
second element has been selected, the remaining length - 2 elements
aren't necessarily contiguously indexed, because the sequence has been
divided into 3 subsequences.

Any suggestions from the net.lisp.wizards?  Thanks,
-- 
Kevin Rodgers                           ········@cs.colorado.edu
Department of Computer Science          (303) 492-8425
University of Colorado			GO BUFFS!
Boulder CO 80309-0430

From: Scott "TCB" Turner
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1991Jul25.181747.17371@aero.org>
Interesting question.  Let me be the first at my site to post my poor 
and probably incorrect answers.

If the number of items you're selecting is << than the length of the
sequence, then the following is simple and probably okay:

(defun randum-elements (n eseq)
  (do ((len (length eseq))
       (result nil))
      ((= n (length result)) result)
      (pushnew (elt eseq (random len)) result)))

Of course, this *may* take a while to finish.  This can be improved by
doing the member test explictly, thereby avoiding taking the length of
result on each iteration.  I leave that as an exercise for the reader
:-).

A better algorithm is to copy the sequence into an array (fast, no
garbage), randomize the array (slow?, no garbage), and then take the
first N values out of the array (fast, no garbage).  Provided you
declare your array globally, this conserves consing.  However,
scrambling the array isn't trivial.  I tend to swap the 0th element
with a random element a large number of times, but there must be
better algorithms.  I'm not even sure the swapping algorithm results
in a random ordering.

						-- Scott Turner
From: James A Anderson
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1991Jul25.185900.6952@athena.mit.edu>
it is proposed:
" A better algorithm is to copy the sequence into an array (fast, no
 garbage), randomize the array (slow?, no garbage), and then take the
 first N values out of the array (fast, no garbage). ... "

you could simply sort the original sequence using an ordering predicate
which return t or nil randomly.
From: Jacques Duthen
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1991Jul26.152204.17727@ircam.fr>
In article <·····················@athena.mit.edu> ······@athena.mit.edu (James A Anderson) writes:
>>" A better algorithm is to copy the sequence into an array (fast, no
>> garbage), randomize the array (slow?, no garbage), and then take the
>> first N values out of the array (fast, no garbage). ... "
>
>you could simply sort the original sequence using an ordering predicate
>which return t or nil randomly.

Nop!
If you do that, you get a *biased* random permutation, which depends on the
sorting algorithm.
Imagine the Bobble algorithm: the first element has few chances to change
its position.
I tried with Macintosh Allegro Common Lisp 1.3. Here is the result:

(rp-stats 'nrp 8 1024)
#2A((151 178 162 157 120 109 71 76)
    (68 86 143 190 161 132 109 135)
    (71 105 144 162 165 131 139 107)
    (70 81 123 109 132 130 196 183)
    (55 94 121 116 130 150 180 178)
    (49 101 101 116 148 190 163 156)
    (67 104 96 121 133 169 157 177)
    (493 275 134 53 35 13 9 12))

This shows that the 8th element has 493/1024~=1/2 chance to be moved to the 1st
position and 275/1024~1/4 to be moved to 2nd position, etc.

For those who are interested, here is the code:

(defun nrp (l)
  "Not random permutation"
  (sort l #'(lambda (x y) (declare (ignore x y)) (zerop (random 2)))))

(defun permutn-random (list)
  "Returns a destructive random permutation of <list>."
  (let ((result ()) (length (length list)))
    (nconc list list)
    (dotimes (i length)
      (setq list (nthcdr (random length) list)
            result (rplacd (prog1 (cdr list) (rplacd list (cddr list))) result)
            length (1- length)))
    result))

;Note the indispensible (1- length), otherwise the first element is random,
;but the second has more chances to the next one than any other one, etc.

(defun stats (ll)
  (let ((v (make-array (list (length (car ll)) (length (car ll)))
                       :initial-element 0)))
    (mapc
     #'(lambda (l)
         (do ((l l (cdr l))
              (i 0 (1+ i)))
             ((null l))
           (incf (aref v (car l) i))))
     ll)
    v))

(defun rp-stats (fun length nb-times)
  (let ((ll ()))
    (dotimes (i nb-times)
      (push (funcall fun (make-list1 length)) ll))
    (stats ll)))

(rp-stats 'permutn-random 8 1024)
#2A((126 116 146 148 117 123 120 128)
    (130 141 120 124 102 128 137 142)
    (123 120 130 129 119 141 135 127)
    (135 138 129 126 148 124 107 117)
    (134 137 119 126 122 128 130 128)
    (110 135 152 122 131 131 119 124)
    (138 112 113 124 148 124 138 127)
    (128 125 115 125 137 125 138 131))

This one is better ! (even if the test is not complete (adjacent pairs, etc.))
From: ·······@csgrad.cs.vt.edu
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1434@creatures.cs.vt.edu>
[stuff about the bias involved in sorting based on a random comparison operator]

Not to mention that sorting involves a much lower efficiency than is inherent
in the problem.  The solution I posted a couple messages ago, with Barry
Margolin's correction that I should have meant a indexed-from-1 array,
is O(n).

Joe
--
_______________________________________________________________________
 Joseph W. Lavinus, Virginia Tech      email: ·······@csgrad.cs.vt.edu
     "You can't do that in horizontal mode." -- TeX error message
From: ·······@csgrad.cs.vt.edu
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1428@creatures.cs.vt.edu>
As far as randomly shuffling an array, I did a wee bit of research into this
once, and it seems that the best way, statistically speaking, is to swap
the 0th element with a randomly picked element a[i] | 1 <= i <= n, then
swap the 1st element with a[i] | 2 <= i <= n, then the 3rd, and so on.

Joe
--
_______________________________________________________________________
 Joseph W. Lavinus, Virginia Tech      email: ·······@csgrad.cs.vt.edu
     "You can't do that in horizontal mode." -- TeX error message
From: Barry Margolin
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1991Jul25.204912.5809@Think.COM>
In article <····@creatures.cs.vt.edu> ·······@csgrad.cs.vt.edu () writes:
>As far as randomly shuffling an array, I did a wee bit of research into this
>once, and it seems that the best way, statistically speaking, is to swap
>the 0th element with a randomly picked element a[i] | 1 <= i <= n, then
>swap the 1st element with a[i] | 2 <= i <= n, then the 3rd, and so on.

I think you have some fencepost errors in this algorithm.  Perhaps the
reference you looked at uses one-based indexing>?  The algorithm you
described can never result in any element being left where it started.  It
can also try to swap an element with a[n], which doesn't exist in a
zero-based array system (indices of an n-element array range from 0 to
n-1).
-- 
Barry Margolin, Thinking Machines Corp.

······@think.com
{uunet,harvard}!think!barmar
From: Dan Hoey
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <742@zogwarg.etl.army.mil>
The key point here is to manage the non-replacement by exchanging the
selected elements with elements at one end of the sequence.  That way the
unselected elements form a contiguous subsequence, which you can index
into with (random number-remaining).  That way, if your sequence is a
vector, you take time linear in the number selected.

If you have to do this a lot of times, you can use the vector over and
over again.  It just gets more and more randomized.

And of course if you select all the elements, you've got a shuffle.

(defun select-without-replacement (n eseq &aux (len (length eseq)))
  (dotimes (used n (subseq eseq (- len n)))
    (rotatef (elt eseq (random (- len used)))
	     (elt eseq (- len used 1)))))

>if the number of items you're selecting is << than the length of the
>sequence, then the following is simple and probably okay:

>(defun randum[sic]-elements (n eseq)
>  (do ((len (length eseq))
>       (result nil))
>      ((= n (length result)) result)
>      (pushnew (elt eseq (random len)) result)))

The I'th element into result will take Theta(I + I^2/LEN) time on
average, so the total time will be Theta(N^2 + N^3/LEN).  Provided, of
course, that ESEQ is a vector.

>... This can be improved by doing the member test explic[i]tly,
>thereby avoiding taking the length of result on each iteration....

Pretty minimal speedup, since pushnew (or member) has to look at the
whole list anyway.

>A better algorithm is to copy the sequence into an array (fast, no
>garbage), randomize the array (slow?, no garbage), and then take the
>first N values out of the array (fast, no garbage).

As I've shown, you only need to randomize the part of the array you
are going to use.

>I tend to swap the 0th element with a random element a large number
>of times....

Now that is just wrong.  Worse, that is a common beginner's mistake
for shuffling a deck of cards, and you should get over it before you
start giving advice on the subject.  Your method will randomize the
first element of the array, but you will never randomize the rest of
the array (unless LEN<=2).  It's easy to see you can't get a random
shuffle, because the the probability of each outcome is a terminating
fraction in base LEN arithmetic.

The way to shuffle a deck of card is to--guess what--select without
replacement (until you reach the end then stop).  The above function
will do it.  In some languages it's easier to collect the selected
items at the beginning of the array, and that's a reader's exercise.
But remember: if you keep calling random with the same argument, you
should start thinking about how that has got to lose.

Followups belong in an algorithms group: this is not language-specific.
I guess sci.math.num-analysis is about it.

Dan Hoey
····@AIC.NRL.Navy.Mil
From: David F. Skoll
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <dfs.680467760@data>
In <······················@colorado.edu> ········@cs.Colorado.EDU
(Kevin Rodgers) writes:

>I'm trying to randomly select n > 1 distinct elements from a sequence.
>The most straightforward way would be to randomly select the first
>element, then randomly select the second from a sequence from which the
>first element had been REMOVEd, and so on.  However, I need to execute
>this operation many, many times during my simulation program, and the
>lists I am using aren't short, so I'm looking for an approach that
>avoids allocating so much temporary storage.

Here's a sketch of one possible algorithm, with an implementation given
at the end.

1 - If it's OK to destructively modify the list, fine.  If not, make a
copy of the list that you can destructively modify.  Call the working
list LST.

2 - Choose a random element from LST.

3 - Destructively swap the first element from LST with the element just
chosen in step 3.

4 - Set LST to (cdr LST).

5 - Reduce by 1 the count of elements remaining in LST

6 - Go back to 2

This requires additional storage only for the templist pointer, and one
location for doing the swap in step 4.  It should be quite fast, too.

Implementation example:  Note that you could come up with a better way
to avoid the evaluation of the slow (nth) function; I was just too lazy,
and this is for illustration only. :-)

(defun n-random-elements (lst n)

 "Given a list, return a list consisting of n random elements picked from the
  list without replacement.  In case of an error, returns NIL.  Destructively
  modifies the original list by permuting the elements.  If this is a problem,
  make a copy of the original list and pass the copy to this function."

  (declare (type fixnum n))
  (prog* (
	  (ret nil)
	  (len (length lst))
	  (chosen 0)
	 )
    (declare (type fixnum chosen len))

; Check for pathalogical cases.
    (when (or (< n 1) (> n len)) (return nil))

    (dotimes (i n ret)
      (setq chosen (random len))                ; Pick an element
      (setq ret (cons (nth chosen lst) ret))    ; Add it to return list
      (psetf (car lst)        (nth chosen lst)
             (nth chosen lst) (car lst))        ; Swap chosen, car
      (setq lst (cdr lst))                      ; Remove car from list
      (setq len (- len 1))                      ; Adjust length
    )
    (return ret)
  )
)

--
David F. Skoll
From: David Loewenstern
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1991Jul25.190727.18672@cbnewsl.cb.att.com>
In article <······················@colorado.edu> ········@cs.Colorado.EDU (Kevin Rodgers) writes:
>
>I'm trying to randomly select n > 1 distinct elements from a sequence.
>The most straightforward way would be to randomly select the first
>element, then randomly select the second from a sequence from which the
>first element had been REMOVEd, and so on.  However, I need to execute
>this operation many, many times during my simulation program, and the
>lists I am using aren't short, so I'm looking for an approach that
>avoids allocating so much temporary storage.

My first approach would be the following:

(DEFUN Choose-Subseq (seq n)
  (LET* ((copy (COPY-SEQ (COERCE seq 'array)))
         (last (LENGTH copy)))
    (LOOP repeat n
          as position = (RANDOM last)
	  collect (AREF copy position)
	  do (SETF (AREF copy position) (AREF copy (DECF last))))))
	
These opinions are shareware.  If you like the product,
please send your $0.02 to
               David Loewenstern
	      <·····@homxc.att.com>
From: David F. Skoll
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <dfs.680472563@reg>
In <······················@colorado.edu> ········@cs.Colorado.EDU
(Kevin Rodgers) writes:

>I'm trying to randomly select n > 1 distinct elements from a sequence.
>The most straightforward way would be to randomly select the first
>element, then randomly select the second from a sequence from which the
>first element had been REMOVEd, and so on.  However, I need to execute
>this operation many, many times during my simulation program, and the
>lists I am using aren't short, so I'm looking for an approach that
>avoids allocating so much temporary storage.

Here's a sketch of one possible algorithm, with an implementation given
at the end.

1 - If it's OK to destructively modify the list, fine.  If not, make a
copy of the list that you can destructively modify.  Call the working
list LST.

2 - Choose a random element from LST.

3 - Destructively swap the first element from LST with the element just
chosen in step 3.

4 - Set LST to (cdr LST).

5 - Reduce by 1 the count of elements remaining in LST

6 - Go back to 2

This requires additional storage only for the templist pointer, and one
location for doing the swap in step 4.  It should be quite fast, too.

Implementation example:  Note that you could come up with a better way
to avoid the evaluation of the slow (nth) function; I was just too lazy,
and this is for illustration only. :-)

(defun n-random-elements (lst n)

 "Given a list, return a list consisting of n random elements picked from the
  list without replacement.  In case of an error, returns NIL.  Destructively
  modifies the original list by permuting the elements.  If this is a problem,
  make a copy of the original list and pass the copy to this function."

  (declare (type fixnum n))
  (prog* (
	  (ret nil)
	  (len (length lst))
	  (chosen 0)
	 )
    (declare (type fixnum chosen len))

; Check for pathalogical cases.
    (when (or (< n 1) (> n len)) (return nil))

    (dotimes (i n ret)
      (setq chosen (random len))                ; Pick an element
      (setq ret (cons (nth chosen lst) ret))    ; Add it to return list
      (psetf (car lst)        (nth chosen lst)
             (nth chosen lst) (car lst))        ; Swap chosen, car
      (setq lst (cdr lst))                      ; Remove car from list
      (setq len (- len 1))                      ; Adjust length
    )
    (return ret)
  )
)

--
David F. Skoll
From: Dan Rosenfeld
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <danr.680481567@melange>
········@cs.Colorado.EDU (Kevin Rodgers) writes:


>I'm trying to randomly select n > 1 distinct elements from a sequence.
>The most straightforward way would be to randomly select the first
>element, then randomly select the second from a sequence from which the
>first element had been REMOVEd, and so on.  However, I need to execute
>this operation many, many times during my simulation program, and the
>lists I am using aren't short, so I'm looking for an approach that
>avoids allocating so much temporary storage.

>-- 
>Kevin Rodgers                           ········@cs.colorado.edu
>Department of Computer Science          (303) 492-8425
>University of Colorado			GO BUFFS!
>Boulder CO 80309-0430


Here's one possibility:

(defun select-random-elements (sequence n)
  (do* (;; Points to the beginning of the range of the array
	;; still 'active'.
	(fill-ptr 0 (1+ fill-ptr))
	(length (length sequence))
	temp 
	random-index)
       ((= n fill-ptr)
	;; Return an sequence containing the selected elements.
	(subseq sequence 0 n))
      ;; Save the element at the location that the randomly selected
      ;; element will be moved to.
      (setf temp (elt elements fill-ptr))
      ;; ** Generate a random index into the range of "elements"
      ;; not containing previously selected elements. **
      (setf random-index (+ fill-ptr (random (- length fill-ptr))))
      (setf (elt elements fill-ptr) (elt sequence random-index))
      (setf (elt elements random-index) temp)))


The idea, in brief, is as follows.  The sequence is separated into two
ranges: one at the beginning of the sequence, containing all the
elements selected thus far, and the rest of the sequence containing
elements not selected.  To select each element, and index is generated
into the to the 'rest' part of the array mentioned above.  This
element at this index is moved to the first range by swapping it with
the element just past the current end of the first range.  The bottom
range is extended and the process repeats until 'n' elements have been
selected.

Two caveats: this piece of code assumes that n < length(sequence), and
that there isn't some other part of your program that cares whether
the elements of sequence get scrambled.  

Hope this helps.

Dan Rosenfeld

[····@autodesk.com]
From: Mark Kantrowitz
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1991Jul25.233824.9011@cs.cmu.edu>
In article <······················@colorado.edu> ········@cs.Colorado.EDU (Kevin Rodgers) writes:
>
>I'm trying to randomly select n > 1 distinct elements from a sequence.
>The most straightforward way would be to randomly select the first
>element, then randomly select the second from a sequence from which the
>first element had been REMOVEd, and so on.  However, I need to execute
>this operation many, many times during my simulation program, and the
>lists I am using aren't short, so I'm looking for an approach that
>avoids allocating so much temporary storage.

If your sequence isn't an array, elt and position could potentially
have to scan the entire sequence.

From your example it appears you want an offline algorithm; i.e., you
know n ahead of time.

I'd suggest something along the lines of the following. Let m be the
length of your sequence. Choose a random number between 1 and 2^m
which has n bits on. Consider this number as a bit-vector which
indicates which elements of the sequence have been selected, and
gather the elements. It should be possible to write a gather routine
which runs in O(m) time by scanning down the sequence and the
bit-vector in parallel. On a vector-processor it will be even faster.

The key is generating a random number between 1 and 2^m with n bits
on. I'm certain that there's an elegant mathematical way of generating
such a number, but I can't think of it offhand. If all else fails, you
could generate n random numbers to index into the array and turn the
bits on, with special provisions for repeating when there are
collisions.

Another possibility is to scan down your sequence, and for each
element, flip a coin which has probability n/m of generating a heads
and keep the element if the coin toss was a heads. The expected number
of heads is n, but you'll possibly need to flip additional coins if
less than n heads appear, and discard extra coins tosses if more than
n heads appear.

For example,

(defun select-n (sequence n)
  ;; Length runs in linear time if the sequence is a list, constant
  ;; time if it is a vector with fill-pointer. Also, the variables
  ;; selected-elements and unselected-elements should ideally be
  ;; vectors, and we should use vector-push (or vector-push-extend) 
  ;; instead of push to add elements to them. Note that this is
  ;; not guaranteed to return the selected elements in the same order 
  ;; in which they appear in the original sequence, which may or may
  ;; not be desirable.
  (let ((length (length sequence)) 
	(count 0)
	(selected-elements nil)
	(unselected-elements nil))
    ;; Destructive modification of the original sequence could 
    ;; result in less consing.
    (cond ((>= n length) 
	   ;; n is greater than the length of the sequence, 
	   ;; so return the entire sequence.
	   sequence)
	  (t
	   ;; n is less than the length of the sequence, so choose each
	   ;; element with probability n/length
	   (dolist (element sequence)
	     (cond ((< (random length) n)
		    (incf count)
		    (push element selected-elements))
		   (t
		    (push element unselected-elements))))
	   ;; We've selected count elements, which is potentially
	   ;; different from n. Fix this up:
	   (cond ((= count n)
		  selected-elements)
		 ((> count n)
		  (select-n selected-elements n))
		 ((< count n)
		  (append selected-elements
			  (select-n unselected-elements (- n count)))))))))

--mark
From: Jacques Duthen
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1991Jul26.145953.16955@ircam.fr>
In article <······················@colorado.edu> ········@cs.Colorado.EDU (Kevin Rodgers) writes:
>I'm trying to randomly select n > 1 distinct elements from a sequence.
>The most straightforward way would be to randomly select the first
>element, then randomly select the second from a sequence from which the
>first element had been REMOVEd, and so on.  However, I need to execute
>this operation many, many times during my simulation program, and the
>lists I am using aren't short, so I'm looking for an approach that
>avoids allocating so much temporary storage.

What do you mean by "randomly select" ?
In the example you wrote, you return multiple values.
Do you want to collect the "selected" elements ?
Do you want to avoid to put them into a list or a vector or a sequence ?
Do you want to get them in big multiple values ?

Or is your problem that you don't want to make a copy of the sequence
to work on ?
In this case, as you say you do it many times, it might not be a problem:
You can copy your different sequences, using always the same temporary cons
cells instead of allocating new cells each time and giving them to the gc.

(defvar *freelist* nil)
(defun cons (car cdr)
  (if (null *freelist*)
    (cons car cdr)
    (prog1
      (rplaca *freelist* car)
      (rplacd *freelist*
        (prog1 cdr (setf *freelist* (cdr *freelist*)))))))
(defun freelist (l)
  (setf *freelist* (nconc l *freelist*)))

Then you can use any random permutation algorithm

(defun permutn-random (list)
  "Returns a destructive random permutation of <list>."
  (let ((result ()) (length (length list)))
    (nconc list list)
    (dotimes (i length)
      (setq list (nthcdr (random length) list)
            result (rplacd (prog1 (cdr list) (rplacd list (cddr list))) result)
            length (1- length)))
    result))

Note the indispensible (1- length), otherwise the first element is random,
but the second has more chances to be the next one than any other one, etc.

							[jack]
Note that the bench-marks made by Lee Boynton (NeXT) show that it's more
efficient to let the incremental garbage collector to its job than trying
to help him !
From: David Wallace
Subject: Re: random selection w/o replacement
Date: 
Message-ID: <1550002@hpdtczb.HP.COM>
Jon Bentley also discusses this problem on page 118 of Programming Pearls,
referencing Algorithm S from Knuth volume 2, section 3.4.2.
Basically, you visit each element in order and select it with probability
S/R, if you need to select S more numbers with R remaining.

Thus if you have a 5 element sequence and want 2 of them:

Select 1st element with probability 2/5.
Select 2nd element with probability 2/4 if you didn't choose 1st, otherwise
select it with probability 1/4.
Select 3rd element with probability 2/3 if you have chosen 0 so far, 1/3 if
you have chosen 1, 0/3 if you have chosen 2.
Etc.

This selects each possible subset with equal probability, and only requires
2 passes down the list (one to compute length, one to pick the elements).
If you want all permutations of the subsets, you could always permute the
subset randomly after generating it.

Dave W.		(·············@hpdtl.ctgsc.hp.com)