list of numbers in randomised order - prg. struggles at large (~10^5) numbers

From: Richard Smith
Subject: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sat, 25 Mar 2006 11:13:36 +0000
Message-ID: <m2ek0qeq9x.fsf@82-35-236-132.cable.ubr01.enfi.blueyonder.co.uk>

Here is my lisp program for generating a list of numbers from 1 to
given number, but presented in random order:

; (progn (compile 'my-num-series-rand-ord)(compile 'myh-num-series-rand-ord))

(defun my-num-series-rand-ord (max)
  "randomise order of a list of numbers 1 to max"
  (myh-num-series-rand-ord
   (let ((my-l ()))(dotimes (i max) (push (+ i 1) my-l))(nreverse my-l))
   ()))

(defun myh-num-series-rand-ord (the-l out-l)
  "randomise order of a list"
  (if the-l
      (let ((num (nth (random (length the-l)) the-l)))
	(myh-num-series-rand-ord (remove num the-l)(push num out-l)))
    out-l))


I wanted it to randomise 1 to 88799.
However, at numbers above about 20000 it doesn't seem to ever return.

Question 1 - why is this?

Question 2 - Am I doing what I want to do in the right way?  I want to
randomise the order of a text list of family names so I can do
database work and present demonstration versions without confidential
data in them.
So:
- take a number which is the number of lines in the text file.
- produce a list of numbers 1 to number of lines, but in randomised
  order
- use a unix shell script to list the lines of the file in the order
  of the index
"while read i; do sed -n ${i}P ordered_orig/name_first_female.txt;
  done < randomise_name_list_order/index_random_female_names.txt".
(with so many lines, wanted to do job in steps)

Question 3
(apart from the issue of using "loop", which I haven't looked at)
Is the code OK?


Other code:

This will return a list with the same elements as the supplied
list, but in randomised order (well, I hope that's what it does!)...

(defun list-rand-ord (the-l)
  "randomise order of a list"
  (h-list-rand-ord the-l ()))

(defun h-list-rand-ord (the-l out-l)
  "randomise order of a list"
  ; (format t "In: ~A~%Out: ~A~%" the-l out-l)   ; debug
  (if the-l
      (let ((rn (random (length the-l))))
	(h-list-rand-ord
	 (but-nth rn the-l)
	 (push (nth rn the-l) out-l)))
    out-l))

(defun but-nth (n lst)
  "the inverse of nth - returns a list minus the nth value
negative n test not needed as n from random number gen."
  (cond ((= n 0) (cdr lst))
	(t (append (subseq lst 0 n)
		   (nthcdr (1+ n) lst)))))


Any use?

Richard Smith

Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers David Sletten
- Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Christian Haselbach
  - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers David Sletten
    - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Christian Haselbach
- Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Frank Buss
  - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Marcin 'Qrczak' Kowalczyk
  - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Marc Battyani
    - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Juho Snellman
  - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Rob Warnock
    - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Peter Seibel
      - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Gareth McCaughan
        Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Peter Seibel
        Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Alexander Schmolck
        Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Peter Seibel
        Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Christian Haselbach
        Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Gareth McCaughan
      - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Marcin 'Qrczak' Kowalczyk
        Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Peter Seibel
        Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Marcin 'Qrczak' Kowalczyk
        Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Marcin 'Qrczak' Kowalczyk
    - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Alexander Schmolck
- Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers Espen Vestre
  - Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers David Sletten

From: David Sletten
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sat, 25 Mar 2006 12:38:30 +0000
Message-ID: <aRaVf.11956$w86.5166@tornado.socal.rr.com>

Richard Smith wrote:

> Here is my lisp program for generating a list of numbers from 1 to
> given number, but presented in random order:
> 
> ; (progn (compile 'my-num-series-rand-ord)(compile 'myh-num-series-rand-ord))
> 
> (defun my-num-series-rand-ord (max)
>   "randomise order of a list of numbers 1 to max"
>   (myh-num-series-rand-ord
>    (let ((my-l ()))(dotimes (i max) (push (+ i 1) my-l))(nreverse my-l))
>    ()))
> 
> (defun myh-num-series-rand-ord (the-l out-l)
>   "randomise order of a list"
>   (if the-l
>       (let ((num (nth (random (length the-l)) the-l)))
> 	(myh-num-series-rand-ord (remove num the-l)(push num out-l)))
>     out-l))
> 
> 
> I wanted it to randomise 1 to 88799.
> However, at numbers above about 20000 it doesn't seem to ever return.
> 
> Question 1 - why is this?
> 

Allow me to rewrite this in a more conventional style, and then I'll 
show you a more efficient way to do this.

It's kind of odd to place your LET form inside the call to the helper 
function. Normally you'd place that call inside the LET (the scope of 
MY-L) after the DOTIMES that initialized the list. However, I think it's 
cleaner to separate things entirely. Note how DOTIMES has an optional 
3rd result form 
(http://www.lispworks.com/documentation/HyperSpec/Body/m_dotime.htm):
(defun make-numlist (max)
   (let ((result '()))
     (dotimes (i max (nreverse result))
       (push (1+ i) result))))

This is adequate, but it's extremely straightforward using LOOP:
(defun make-numlist (max)
   (loop for i from 1 upto max collect i))

Next, we get this:
(defun randomize (in out)
   (if (endp in)
       out
       (let ((num (nth (random (length in)) in)))
         (randomize (remove num in) (cons num out)))) )

(randomize (make-numlist 40) '()) => (35 15 28 36 39 26 20 27 19 3 22 7 
38 1 2 23 40 24 18 9 21 34 30 4 13 12 6 16 5
  29 17 37 25 8 14 33 32 11 10 31)


> Question 2 - Am I doing what I want to do in the right way?  I want to
> randomise the order of a text list of family names so I can do
> database work and present demonstration versions without confidential
> data in them.
> So:
> - take a number which is the number of lines in the text file.
> - produce a list of numbers 1 to number of lines, but in randomised
>   order
> - use a unix shell script to list the lines of the file in the order
>   of the index
> "while read i; do sed -n ${i}P ordered_orig/name_first_female.txt;
>   done < randomise_name_list_order/index_random_female_names.txt".
> (with so many lines, wanted to do job in steps)
> 
> Question 3
> (apart from the issue of using "loop", which I haven't looked at)
> Is the code OK?
> 

Unfortunately, the RANDOMIZE function is terribly inefficient (both 
yours and mine). On each invocation you call NTH, LENGTH, and REMOVE 
which must each traverse the list linearly (at least partially). Of 
course the input IN is shrinking as RANDOMIZE proceeds, but this is 
still a lot of unnecessary work.

Instead, I would suggest using a vector, which allows random access, and 
the Fisher-Yates shuffle algorithm (see Knuth Vol. 2 Pg. 145). First, we 
redefine MAKE-NUMLIST:
(defun make-numvector (max)
   (let ((v (make-array max)))
     (dotimes (i max v)
       (setf (aref v i) (1+ i)))) )

Then we shuffle the vector in place:
(defun shuffle (v)
   (loop for j from (1- (length v)) downto 1
         for k = (random (1+ j))
         do (rotatef (aref v j) (aref v k)))
   v)

(shuffle (make-numvector 40)) => #(3 13 17 16 23 15 33 4 36 11 31 21 25 
34 26 8 27 20 9 30 14 24 6 22 32 29 40 19 10 18 35 1 37 39 28 12 38 5 2 7)

Using SBCL I get:
* (time (shuffle (make-numvector 80000)))

Evaluation took:
   0.202 seconds of real time
   0.095459 seconds of user run time
   0.010836 seconds of system run time
   0 page faults and
   3,196,776 bytes consed.

> 
> Other code:
> 
> This will return a list with the same elements as the supplied
> list, but in randomised order (well, I hope that's what it does!)...
> 
> (defun list-rand-ord (the-l)
>   "randomise order of a list"
>   (h-list-rand-ord the-l ()))
> 
> (defun h-list-rand-ord (the-l out-l)
>   "randomise order of a list"
>   ; (format t "In: ~A~%Out: ~A~%" the-l out-l)   ; debug
>   (if the-l
>       (let ((rn (random (length the-l))))
> 	(h-list-rand-ord
> 	 (but-nth rn the-l)
> 	 (push (nth rn the-l) out-l)))
>     out-l))
> 
> (defun but-nth (n lst)
>   "the inverse of nth - returns a list minus the nth value
> negative n test not needed as n from random number gen."
>   (cond ((= n 0) (cdr lst))
> 	(t (append (subseq lst 0 n)
> 		   (nthcdr (1+ n) lst)))))
> 
> 
> Any use?
> 
> Richard Smith

Here's my version of BUT-NTH:
(defun transfer (l i)
   "Remove the ith element from a list, returning the element and the 
new list."
   (labels ((transfer-aux (l i result)
              (cond ((endp l) (error "Could not transfer element"))
                    ((zerop i) (values (car l)
                                       (nconc (nreverse result)
                                              (cdr l))))
                    (t (transfer-aux (cdr l)
                                     (1- i)
                                     (cons (car l) result)))) ))
     (transfer-aux l i '())))

(transfer (make-numlist 20) 12) =>
13
(1 2 3 4 5 6 7 8 9 10 11 12 14 15 16 17 18 19 20)

Aloha,
David Sletten

P.S. Your function names could be a bit more perspicuous.

From: Christian Haselbach
Subject: Re: list of numbers in randomised order - prg. struggles at large   (~10^5) numbers
Date: Sat, 25 Mar 2006 13:45:21 +0000
Message-ID: <e03hhj$pvu$1@online.de>

David Sletten schrieb:

> Here's my version of BUT-NTH:
> (defun transfer (l i)
>   "Remove the ith element from a list, returning the element and the new 
> list."
>   (labels ((transfer-aux (l i result)
>              (cond ((endp l) (error "Could not transfer element"))
>                    ((zerop i) (values (car l)
>                                       (nconc (nreverse result)
>                                              (cdr l))))
>                    (t (transfer-aux (cdr l)
>                                     (1- i)
>                                     (cons (car l) result)))) ))
>     (transfer-aux l i '())))

I think this is a faster, but destroyes the input:
(defun but-nth (n l)
   "Returns a list with the n-th element removed. Destroyes the input list."
   (cond
     ((>= n (length l)) (values nil l))
     ((= 0 n) (values (car l) (cdr l)))
     (t (let* ((lp (nthcdr (1- n) l))
	      (x (cadr lp)))
	 (setf (cdr lp) (cddr lp))
	 (values x l)))))

Are there any other drawbacks to this solution. I'm pretty new to lisp.

Regards,
Christian

From: David Sletten
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 01:39:59 +0000
Message-ID: <PhmVf.12022$w86.8288@tornado.socal.rr.com>

Christian Haselbach wrote:

> David Sletten schrieb:
> 
>> Here's my version of BUT-NTH:
>> (defun transfer (l i)
>>   "Remove the ith element from a list, returning the element and the 
>> new list."
>>   (labels ((transfer-aux (l i result)
>>              (cond ((endp l) (error "Could not transfer element"))
>>                    ((zerop i) (values (car l)
>>                                       (nconc (nreverse result)
>>                                              (cdr l))))
>>                    (t (transfer-aux (cdr l)
>>                                     (1- i)
>>                                     (cons (car l) result)))) ))
>>     (transfer-aux l i '())))
> 
> 
> I think this is a faster, but destroyes the input:
> (defun but-nth (n l)
>   "Returns a list with the n-th element removed. Destroyes the input list."
>   (cond
>     ((>= n (length l)) (values nil l))
>     ((= 0 n) (values (car l) (cdr l)))
>     (t (let* ((lp (nthcdr (1- n) l))
>           (x (cadr lp)))
>      (setf (cdr lp) (cddr lp))
>      (values x l)))))
> 
> Are there any other drawbacks to this solution. I'm pretty new to lisp.
> 

Yours does appear faster. It's actually quite clever. The inital LENGTH 
test appears unfortunate at first since you must traverse the entire 
list and then the traversal occurs again with NTHCDR when the test 
fails. But I don't see any way around it.

Of course, you've recognized that the function is destructive. That's 
fine if you know what you're doing, but it slows down if you do this:
(defun but-nth-1 (n l)
   "Returns a list with the n-th element removed. Destroyes the input list."
   (cond
     ((>= n (length l)) (values nil l))
     ((= 0 n) (values (car l) (cdr l)))
     (t (let* ((l (copy-seq l))
               (lp (nthcdr (1- n) l))
               (x (cadr lp)))
      (setf (cdr lp) (cddr lp))
      (values x l)))))

I wrote a few more versions, none of which match the speed of your 
destructive version. The first 2 use a structure called a "tconc", which 
maintains a pointer to the end of the growing list:
(defun transfer-1 (l n)
   (let* ((result (list '()))
          (tail result))
     (do ((list (cdr l) (cdr list))
          (elt (car l) (car list))
          (i 0 (1+ i)))
         ((= i n) (setf (cdr tail) list) (values elt (cdr result)))
       (if (endp list)
           (error "Could not transfer element")
           (setf tail (setf (cdr tail) (list elt)))) )))

(defun transfer-2 (l n)
   (let* ((accum (list '()))
          (tail accum))
     (loop for elt in l
           for cons on (rest l)
           for i from 0 below n
           do (setf tail (setf (cdr tail) (list elt)))
           finally (cond ((< i (1- n)) (error "Could not transfer element"))
                         (t (setf (cdr tail) cons)
                            (return (values elt (cdr accum)))) ))))

(defun transfer-3 (l n)
   (loop for elt in l
         for cons on (rest l)
         for i from 0 below n
         collect elt into result
         finally (return (values elt (nconc result cons)))) )

The semantics of TRANSFER-3 are not quite right (let's just say it 
behaves differently). If (>= n (length l)), then the final element is 
removed.

I still haven't decided what the correct behavior should be in this case:
* (transfer '() 0)

debugger invoked on a SIMPLE-ERROR: Could not transfer element

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
   0: [ABORT] Exit debugger, returning to top level.

(TRANSFER NIL 0)
0] :a

* (transfer-1 '() 0)

NIL
NIL
* (transfer '(()) 0)

NIL
NIL
* (transfer-1 '(()) 0)

NIL
NIL

Aloha,
David Sletten

From: Christian Haselbach
Subject: Re: list of numbers in randomised order - prg. struggles at large   (~10^5) numbers
Date: Sun, 26 Mar 2006 14:36:05 +0000
Message-ID: <e068sn$r5g$1@online.de>

David Sletten wrote:

> Yours does appear faster. It's actually quite clever. The inital LENGTH 
> test appears unfortunate at first since you must traverse the entire 
> list and then the traversal occurs again with NTHCDR when the test 
> fails. But I don't see any way around it.

I have not thought about the cost of LENGTH, because I have to program 
in Java in my day-work. And in Java the common list implementations know
their length. Of course, lists are a little bit different in Lisp.

However, the following implementation avoids LENGTH:
(defun but-nth-v3 (n l)
   (handler-case
       (if (zerop n)
	  (values (car l) (cdr l))
	  (let* ((lp (nthcdr (1- n) l))
		 (x (cadr lp)))
	    (setf (cdr lp) (cddr lp))
	    (values x l)))
     (error () (values nil l))))

This also handles the case where the function is called with an empty 
list, using the same semantics where a list is provided, but n is 
greater than the length of the list.

Is this handler-case too general? I.e., could there somehow be a reason 
an error is raised not due to an invalid cdr or car call?

> I still haven't decided what the correct behavior should be in this case:
> * (transfer '() 0)

Well, IMHO it should be the same as asking for an element out of range.
I think one should either raise an error or return nil as the element 
and the unmodified list. The last option has the disadvantage that you 
cannot tell whether nil was returned because the request was out of 
range or theres really a nil at the n-th position.

Ciao Christian

From: Frank Buss
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sat, 25 Mar 2006 14:00:34 +0000
Message-ID: <1oep2y1w0mamw.164eksyu1y1ms.dlg@40tude.net>

David Sletten wrote:

> Instead, I would suggest using a vector, which allows random access, and 
> the Fisher-Yates shuffle algorithm (see Knuth Vol. 2 Pg. 145). First, we 
> redefine MAKE-NUMLIST:
> (defun make-numvector (max)
>    (let ((v (make-array max)))
>      (dotimes (i max v)
>        (setf (aref v i) (1+ i)))) )
> 
> Then we shuffle the vector in place:
> (defun shuffle (v)
>    (loop for j from (1- (length v)) downto 1
>          for k = (random (1+ j))
>          do (rotatef (aref v j) (aref v k)))
>    v)

I think this is the fastest algorithm.

Another nice one I've read in this newsgroup some time ago (I don't
remember where, but I remember the concept) :

(defun randomize (limit)
  (mapcar #'car
          (sort (loop for i from 0 below limit
                      collect (cons i (random 1.0)))
                #'(lambda (x y) (< (cdr x) (cdr y))))))

The shuffle-part has the advantage, that it works with lists, too.

-- 
Frank Buss, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Marcin 'Qrczak' Kowalczyk
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sat, 25 Mar 2006 15:49:43 +0000
Message-ID: <87lkuyy160.fsf@qrnik.zagroda>

Frank Buss <··@frank-buss.de> writes:

> Another nice one I've read in this newsgroup some time ago (I don't
> remember where, but I remember the concept) :
>
> (defun randomize (limit)
>   (mapcar #'car
>           (sort (loop for i from 0 below limit
>                       collect (cons i (random 1.0)))
>                 #'(lambda (x y) (< (cdr x) (cdr y))))))

It costs O(N log N) time, while O(N) is archievable.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From: Marc Battyani
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sat, 25 Mar 2006 18:46:49 +0000
Message-ID: <i5-dnaY8SIEVErjZRVnyjg@giganews.com>

"Frank Buss" <··@frank-buss.de> wrote

> (defun randomize (limit)
>  (mapcar #'car
>          (sort (loop for i from 0 below limit
>                      collect (cons i (random 1.0)))
>                #'(lambda (x y) (< (cdr x) (cdr y))))))
>

Here is a variation of the one I use in the Common Lisp Directory to
randomize the lists of items:

(defun randomize (limit)
  (sort (loop for i from 1 to limit collect i)
         #'> :key #'(lambda(x) (random 1000))))

Not that efficient either, but shorter and fast enough.

CL-USER 35 > (time (randomize 100000))
Timing the evaluation of (randomize 100000)

user time    =      0.234
system time  =      0.000
Elapsed time =   0:00:01
Allocation   = 16040 bytes standard / 1108041 bytes conses

Marc

The Common Lisp Directory: http://www.cl-user.net

From: Juho Snellman
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sat, 25 Mar 2006 19:32:09 +0000
Message-ID: <slrne2b6lp.d3a.jsnell@sbz-30.cs.Helsinki.FI>

Marc Battyani <·············@fractalconcept.com> wrote:
> (defun randomize (limit)
>   (sort (loop for i from 1 to limit collect i)
>          #'> :key #'(lambda(x) (random 1000))))
> 
> Not that efficient either, but shorter and fast enough.

But not something to use if you need well-distributed results, since
there's no guarantee that the key function gets called only once for
each element. For example, the positions in a 10-element list that the
number 9 ends up in over 100000 trials:

CL-USER> (let ((plist nil))
           (dotimes (i 100000)
             (incf (getf plist (position 9 (randomize 10)) 0)))
           (dotimes (i 10)
             (format t "~a: ~a~%" i (getf plist i))))
0: 24976
1: 24971
2: 12717
3: 7446
4: 4877
5: 3582
6: 2624
7: 2127
8: 6699
9: 9981

(Your mileage might vary based on the sort algorithm used by the
implementation.)

-- 
Juho Snellman

From: Rob Warnock
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 02:26:20 +0000
Message-ID: <QuudnQK5xuFRZrjZnZ2dnUVZ_tqdnZ2d@speakeasy.net>

Frank Buss  <··@frank-buss.de> wrote:
+---------------
| David Sletten wrote:
| > Instead, I would suggest using a vector, which allows random access,
| > and the Fisher-Yates shuffle algorithm (see Knuth Vol. 2 Pg. 145).
...
| > (defun shuffle (v)
| >    (loop for j from (1- (length v)) downto 1
| >          for k = (random (1+ j))
| >          do (rotatef (aref v j) (aref v k)))
| >    v)
| 
| I think this is the fastest algorithm.
| 
| Another nice one I've read in this newsgroup some time ago
| (I don't remember where, but I remember the concept) :
| 
| (defun randomize (limit)
|   (mapcar #'car
|           (sort (loop for i from 0 below limit
|                       collect (cons i (random 1.0)))
|                 #'(lambda (x y) (< (cdr x) (cdr y))))))
+---------------

The pages <http://www.nist.gov/dads/HTML/fisherYatesShuffle.html>
and <http://www.nist.gov/dads/HTML/perfectShuffle.html> contain
comments on & references to why you don't want to do this. Briefly;

    Note: Attaching random tags then sorting (see permutation)
    may not be a perfect shuffle: if tags may be duplicated, a
    stable sort has a greater chance of producing permutations
    with some items in the original order.


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Peter Seibel
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 13:40:15 +0000
Message-ID: <m2psk91g06.fsf@gigamonkeys.com>

····@rpw3.org (Rob Warnock) writes:

> Frank Buss  <··@frank-buss.de> wrote:
> +---------------
> | David Sletten wrote:
> | > Instead, I would suggest using a vector, which allows random access,
> | > and the Fisher-Yates shuffle algorithm (see Knuth Vol. 2 Pg. 145).
> ...
> | > (defun shuffle (v)
> | >    (loop for j from (1- (length v)) downto 1
> | >          for k = (random (1+ j))
> | >          do (rotatef (aref v j) (aref v k)))
> | >    v)
> | 
> | I think this is the fastest algorithm.
> | 
> | Another nice one I've read in this newsgroup some time ago
> | (I don't remember where, but I remember the concept) :
> | 
> | (defun randomize (limit)
> |   (mapcar #'car
> |           (sort (loop for i from 0 below limit
> |                       collect (cons i (random 1.0)))
> |                 #'(lambda (x y) (< (cdr x) (cdr y))))))
> +---------------
>
> The pages <http://www.nist.gov/dads/HTML/fisherYatesShuffle.html>
> and <http://www.nist.gov/dads/HTML/perfectShuffle.html> contain
> comments on & references to why you don't want to do this. Briefly;
>
>     Note: Attaching random tags then sorting (see permutation)
>     may not be a perfect shuffle: if tags may be duplicated, a
>     stable sort has a greater chance of producing permutations
>     with some items in the original order.


Here's a fairly simple yet efficient algorithm for shuffling lists,
based on a post from comp.lang.functional by Ben Rudiak-Gould. It
should produce a perfect shuffle.

  (defun shuffle-list (list)
    "Return a shuffled copy of list. Based on a post by Ben
  Rudiak-Gould in comp.lang.functional, Message-ID:
  <··································@4ax.com>"
    (if (or (null list) (null (cdr list)))
        list
        (let (heads tails)
          (dolist (item list (nconc (shuffle-list heads) (shuffle-list tails)))
            (if (zerop (random 2))
                (push item heads)
                (push item tails))))))

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Gareth McCaughan
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 15:07:52 +0000
Message-ID: <87hd5ls1fn.fsf@g.mccaughan.ntlworld.com>

Peter Seibel wrote:

> Here's a fairly simple yet efficient algorithm for shuffling lists,
> based on a post from comp.lang.functional by Ben Rudiak-Gould. It
> should produce a perfect shuffle.
> 
>   (defun shuffle-list (list)
>     "Return a shuffled copy of list. Based on a post by Ben
>   Rudiak-Gould in comp.lang.functional, Message-ID:
>   <··································@4ax.com>"
>     (if (or (null list) (null (cdr list)))
>         list
>         (let (heads tails)
>           (dolist (item list (nconc (shuffle-list heads) (shuffle-list tails)))
>             (if (zerop (random 2))
>                 (push item heads)
>                 (push item tails))))))

Impossible. It only ever uses coin-flips -- (RANDOM 2) -- and it
uses only a finite number of them, so the probability of getting
any given permutation must have a power of 2 as its denominator.
Therefore it can't be right when n! isn't a power of 2; in other
words, even for n=3 it can't be right.

-- 
Gareth McCaughan
.sig under construc

From: Peter Seibel
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 16:15:21 +0000
Message-ID: <m2irq118tj.fsf@gigamonkeys.com>

Gareth McCaughan <················@pobox.com> writes:

> Peter Seibel wrote:
>
>> Here's a fairly simple yet efficient algorithm for shuffling lists,
>> based on a post from comp.lang.functional by Ben Rudiak-Gould. It
>> should produce a perfect shuffle.
>> 
>>   (defun shuffle-list (list)
>>     "Return a shuffled copy of list. Based on a post by Ben
>>   Rudiak-Gould in comp.lang.functional, Message-ID:
>>   <··································@4ax.com>"
>>     (if (or (null list) (null (cdr list)))
>>         list
>>         (let (heads tails)
>>           (dolist (item list (nconc (shuffle-list heads) (shuffle-list tails)))
>>             (if (zerop (random 2))
>>                 (push item heads)
>>                 (push item tails))))))
>
> Impossible. It only ever uses coin-flips -- (RANDOM 2) -- and it
> uses only a finite number of them, so the probability of getting
> any given permutation must have a power of 2 as its denominator.
> Therefore it can't be right when n! isn't a power of 2; in other
> words, even for n=3 it can't be right.

Well, I know you're a math guy (and I'm not). You may want to check
out the original post (referenced in the docstring above) and see if
his informal proof convinces you. In the meantime here's some science
(easier than math):

UTILITIES> (defun testit (iters) 
	     (let ((h (make-hash-table :test 'equal)))
	       (loop repeat iters do (incf (gethash (shuffle-list '(1 2 3)) h 0)))
	       (loop for k being the hash-keys of h 
		  do (format t "~a -> ~d (~f)~%" k (gethash k h) (/ (gethash k h) iters)))))
TESTIT
UTILITIES> (testit 10000)
(3 2 1) -> 1640 (0.164)
(3 1 2) -> 1669 (0.1669)
(2 3 1) -> 1684 (0.1684)
(1 2 3) -> 1630 (0.163)
(2 1 3) -> 1710 (0.171)
(1 3 2) -> 1667 (0.1667)
NIL
UTILITIES> (testit 100000)
(3 2 1) -> 16864 (0.16864)
(3 1 2) -> 16685 (0.16685)
(2 3 1) -> 16750 (0.1675)
(1 2 3) -> 16597 (0.16597)
(2 1 3) -> 16514 (0.16514)
(1 3 2) -> 16590 (0.1659)
NIL
UTILITIES> (testit 1000000)
(3 2 1) -> 166762 (0.166762)
(3 1 2) -> 166398 (0.166398)
(2 3 1) -> 166702 (0.166702)
(1 2 3) -> 166460 (0.16646)
(2 1 3) -> 166728 (0.166728)
(1 3 2) -> 166950 (0.16695)
NIL

Sure looks to me like it's heading toward an even distribution.

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Alexander Schmolck
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 15:58:03 +0000
Message-ID: <yfs8xqxb3lg.fsf@oc.ex.ac.uk>

Gareth McCaughan <················@pobox.com> writes:

> Peter Seibel wrote:
> 
> > Here's a fairly simple yet efficient algorithm for shuffling lists,
> > based on a post from comp.lang.functional by Ben Rudiak-Gould. It
> > should produce a perfect shuffle.
> > 
> >   (defun shuffle-list (list)
> >     "Return a shuffled copy of list. Based on a post by Ben
> >   Rudiak-Gould in comp.lang.functional, Message-ID:
> >   <··································@4ax.com>"
> >     (if (or (null list) (null (cdr list)))
> >         list
> >         (let (heads tails)
> >           (dolist (item list (nconc (shuffle-list heads) (shuffle-list tails)))
> >             (if (zerop (random 2))
> >                 (push item heads)
> >                 (push item tails))))))
> 
> Impossible. It only ever uses coin-flips -- (RANDOM 2) -- and it
> uses only a finite number of them,

Maybe I'm missing something, but I think that's incorrect (the number is
potentially infinite as this is in fact not an algorithm in the proper sense
-- it's not guranteed to terminate).

There is another bug though -- contrary to the docstring, it doesn't always
return a copy.

'as

From: Peter Seibel
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 16:21:47 +0000
Message-ID: <m2acbd18it.fsf@gigamonkeys.com>

Alexander Schmolck <··········@gmail.com> writes:

> Gareth McCaughan <················@pobox.com> writes:
>
>> Peter Seibel wrote:
>> 
>> > Here's a fairly simple yet efficient algorithm for shuffling lists,
>> > based on a post from comp.lang.functional by Ben Rudiak-Gould. It
>> > should produce a perfect shuffle.
>> > 
>> >   (defun shuffle-list (list)
>> >     "Return a shuffled copy of list. Based on a post by Ben
>> >   Rudiak-Gould in comp.lang.functional, Message-ID:
>> >   <··································@4ax.com>"
>> >     (if (or (null list) (null (cdr list)))
>> >         list
>> >         (let (heads tails)
>> >           (dolist (item list (nconc (shuffle-list heads) (shuffle-list tails)))
>> >             (if (zerop (random 2))
>> >                 (push item heads)
>> >                 (push item tails))))))
>> 
>> Impossible. It only ever uses coin-flips -- (RANDOM 2) -- and it
>> uses only a finite number of them,
>
> Maybe I'm missing something, but I think that's incorrect (the number is
> potentially infinite as this is in fact not an algorithm in the proper sense
> -- it's not guranteed to terminate).
>
> There is another bug though -- contrary to the docstring, it doesn't always
> return a copy.

Ah, you mean in the case where it's called with a one-element list? Good point.

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Christian Haselbach
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 16:43:14 +0000
Message-ID: <e06gb4$8gn$1@online.de>

Alexander Schmolck schrieb:
> Gareth McCaughan <················@pobox.com> writes:
> 
> 
>>Peter Seibel wrote:
>>
>>
>>>Here's a fairly simple yet efficient algorithm for shuffling lists,
>>>based on a post from comp.lang.functional by Ben Rudiak-Gould. It
>>>should produce a perfect shuffle.
>>>
>>>  (defun shuffle-list (list)
>>>    "Return a shuffled copy of list. Based on a post by Ben
>>>  Rudiak-Gould in comp.lang.functional, Message-ID:
>>>  <··································@4ax.com>"
>>>    (if (or (null list) (null (cdr list)))
>>>        list
>>>        (let (heads tails)
>>>          (dolist (item list (nconc (shuffle-list heads) (shuffle-list tails)))
>>>            (if (zerop (random 2))
>>>                (push item heads)
>>>                (push item tails))))))
>>
>>Impossible. It only ever uses coin-flips -- (RANDOM 2) -- and it
>>uses only a finite number of them,
> 
> 
> Maybe I'm missing something, but I think that's incorrect (the number is
> potentially infinite as this is in fact not an algorithm in the proper sense
> -- it's not guranteed to terminate).

The definitions for algorithm that I know of do not require termination. 
I have a hard time understanding the above let-dolist combination, but 
looking at the Haskell implementation in the referenced posting, it is 
not guaranteed to terminate. The propability for getting a 
non-terminating run is 0, but they do exist.

However, the runs should tend to use about n*log(n) coin-flips, 
considering that the splitting into two sublists is most likely to 
result in nearly equally long lists. Therefore, every permutation might 
be possible, but some are very unlikely to get, assuming that we need 
n^2 coin-flips to get every combination, as Gareth pointed out.

Ciao Christian

From: Gareth McCaughan
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Fri, 07 Apr 2006 00:14:41 +0000
Message-ID: <87odzeuu23.fsf@g.mccaughan.ntlworld.com>

Alexander Schmolck wrote (several days ago; I've been busy...):

> Gareth McCaughan <················@pobox.com> writes:
> 
>> Peter Seibel wrote:
>> 
>>> Here's a fairly simple yet efficient algorithm for shuffling lists,
>>> based on a post from comp.lang.functional by Ben Rudiak-Gould. It
>>> should produce a perfect shuffle.
>>> 
>>>   (defun shuffle-list (list)
>>>     "Return a shuffled copy of list. Based on a post by Ben
>>>   Rudiak-Gould in comp.lang.functional, Message-ID:
>>>   <··································@4ax.com>"
>>>     (if (or (null list) (null (cdr list)))
>>>         list
>>>         (let (heads tails)
>>>           (dolist (item list (nconc (shuffle-list heads) (shuffle-list tails)))
>>>             (if (zerop (random 2))
>>>                 (push item heads)
>>>                 (push item tails))))))
>> 
>> Impossible. It only ever uses coin-flips -- (RANDOM 2) -- and it
>> uses only a finite number of them,
> 
> Maybe I'm missing something, but I think that's incorrect (the number is
> potentially infinite as this is in fact not an algorithm in the proper sense
> -- it's not guranteed to terminate).

Hm. You're right; I misread the code. In fact, all that's
necessary to break my argument is for the number of coin-flips
used not to be bounded. I should have written "uses only a
bounded number of them"; if I'd been careful about that,
perhaps I'd have checked my reading of the code more
carefully! Apologies to anyone who was foolish enough to
believe me.

But, indeed, this one isn't guaranteed to terminate -- though,
with an idealized RNG, it would terminate with probability 1.

-- 
Gareth McCaughan
.sig under construc

From: Marcin 'Qrczak' Kowalczyk
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 16:09:35 +0000
Message-ID: <87slp5nq68.fsf@qrnik.zagroda>

Peter Seibel <·····@gigamonkeys.com> writes:

> Here's a fairly simple yet efficient algorithm for shuffling lists,
> based on a post from comp.lang.functional by Ben Rudiak-Gould. It
> should produce a perfect shuffle.

I compared it with the classic algorithm: pick a random element below
n, swap it with (n-1)th, decrement n, loop. On a 10-element sequence
the nice algorithm was 3 times slower than the classic, on 100 elements -
5 times slower, on 1000 elements - 8 times slower, on 10000 elements -
9 times slower, on 100000 elements - 11 times slower.

Of course this depends on the overhead of random number generator:
it gets worse with slower generators, because they must generate many
bools instead of fewer integers, assuming that generating a random
bool requires about the same work as generating an integer from a
reasonable range, but uses only one bit of a random word. This was
implemented in my language, the generator was Mersenne Twister.

Another advantage of the classic algorithm is that it can be implemented
lazily, such that taking only a few elements of the result is cheap.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From: Peter Seibel
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 18:00:40 +0000
Message-ID: <m21wwp13y0.fsf@gigamonkeys.com>

Marcin 'Qrczak' Kowalczyk <······@knm.org.pl> writes:

> Peter Seibel <·····@gigamonkeys.com> writes:
>
>> Here's a fairly simple yet efficient algorithm for shuffling lists,
>> based on a post from comp.lang.functional by Ben Rudiak-Gould. It
>> should produce a perfect shuffle.
>
> I compared it with the classic algorithm: pick a random element below
> n, swap it with (n-1)th, decrement n, loop. On a 10-element sequence
> the nice algorithm was 3 times slower than the classic, on 100 elements -
> 5 times slower, on 1000 elements - 8 times slower, on 10000 elements -
> 9 times slower, on 100000 elements - 11 times slower.

Even when the data structure is a singly-linked list. It seems like
walking down the list to get to the random element is going to be
pretty expensive but I must admit I didn't actually compare the
performance of the code I posted against a list-based implementation
of Fisher-Yates.

-Peter

-- 
Peter Seibel           * ·····@gigamonkeys.com
Gigamonkeys Consulting * http://www.gigamonkeys.com/
Practical Common Lisp  * http://www.gigamonkeys.com/book/

From: Marcin 'Qrczak' Kowalczyk
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 19:01:07 +0000
Message-ID: <87mzfd58uk.fsf@qrnik.zagroda>

Peter Seibel <·····@gigamonkeys.com> writes:

>> I compared it with the classic algorithm: pick a random element below
>> n, swap it with (n-1)th, decrement n, loop. On a 10-element sequence
>> the nice algorithm was 3 times slower than the classic, on 100 elements -
>> 5 times slower, on 1000 elements - 8 times slower, on 10000 elements -
>> 9 times slower, on 100000 elements - 11 times slower.
>
> Even when the data structure is a singly-linked list. It seems like
> walking down the list to get to the random element is going to be
> pretty expensive but I must admit I didn't actually compare the
> performance of the code I posted against a list-based implementation
> of Fisher-Yates.

In the classic version each shuffle started with a list, converted
it to a vector, permuted it, and converted it to a list again. The
nice version worked with lists exclusively. These conversions used
10% of the time.



-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From: Marcin 'Qrczak' Kowalczyk
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 19:23:14 +0000
Message-ID: <87y7yx3t99.fsf@qrnik.zagroda>

Peter Seibel <·····@gigamonkeys.com> writes:

>> I compared it with the classic algorithm: pick a random element below
>> n, swap it with (n-1)th, decrement n, loop. On a 10-element sequence
>> the nice algorithm was 3 times slower than the classic, on 100 elements -
>> 5 times slower, on 1000 elements - 8 times slower, on 10000 elements -
>> 9 times slower, on 100000 elements - 11 times slower.
>
> Even when the data structure is a singly-linked list. It seems like
> walking down the list to get to the random element is going to be
> pretty expensive but I must admit I didn't actually compare the
> performance of the code I posted against a list-based implementation
> of Fisher-Yates.

In the classic version each shuffle started with a list, converted
it to a vector, permuted it, and converted it to a list again. The
nice version worked with lists exclusively. These conversions used
10% of the time.

Perhaps in a language implementation with faster random function
it would look differently. I wonder how much language-dependent
are such comparisons. Technical details follow.

In this implementation conversion of a vector to a list uses a single
big allocation, filling it with cons cells. Similarly, conversion of
a list to a vector doesn't need a write barrier for filling the vector
because it's done atomically wrt. GC. OTOH swapping two vector elements
uses a software write barrier.

The nice version prepends lists one element at a time, so these cons
cells are allocated individually. It uses immutable lists, no NCONC,
but appending lists uses one big allocation and fills it with cons
cells.

The core of the random generator is implemented in C. Computing a
random number is dispatched wrt. the random state, which amounts to
a hash table lookup and some function calls.

-- 
   __("<         Marcin Kowalczyk
   \__/       ······@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

From: Alexander Schmolck
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sun, 26 Mar 2006 23:20:10 +0000
Message-ID: <yfsveu0aj4l.fsf@oc.ex.ac.uk>

····@rpw3.org (Rob Warnock) writes:

> The pages <http://www.nist.gov/dads/HTML/fisherYatesShuffle.html>
> and <http://www.nist.gov/dads/HTML/perfectShuffle.html> contain
> comments on & references to why you don't want to do this. Briefly;
> 
>     Note: Attaching random tags then sorting (see permutation)
>     may not be a perfect shuffle: if tags may be duplicated, a
>     stable sort has a greater chance of producing permutations
>     with some items in the original order.

I'm not sure one really has to worry to much about this as long as one uses
something with enough states -- (random 1.0d0), say.

Assuming the RNG is good I'd randomly guess you should get >2^50 or so
uniformly distributed, distinct numbers (if your RNG is crap, you can't reach
most permutations even for fairly small n anyway). Let x=2^50 and n=1e6, the
length of your list to shuffle. So you got

p = x^n 

possibilites for the n-random numbers you associate with the list-items.
Conversely there are

u = x * (x-1) * ... * (x-n+1)

ways to obtain n unique random samples. So the probability that you will have
at least one duplicate (i.e a non-perfect shuffle) is

1 - u/p

let u' = (x-n+1)^n then this is obviously bounded from above by

1 - u'/p =  1 - ((x-n+1)/(x))^n = 1 - (1-(n+1)/x)^n 
         ~= 1 - (1 - 1e6/2^50)^1e6 ~= 1 - 0.999 = 0.0001

This doesn't look at all terrible to me, to be honest -- after all even if
you've got a small number of duplicates in 1e6 random numbers, it's not like
the algorithm will catastrophically fail (and if you've got more then 1e6
items, you'd rather use something more efficient anyway).

I wouldn't generally advise people to trust my mathematical reasoning (esp. at
1am), but some quick experiments with (random 1d0) and (random 1s0) seem to
back the conclusion up.

If I'm too worry about crappy algorithms, BTW then surely cmucl's
remove-duplicates seems like a more worthy object:

 (length (remove-duplicates (loop for c below 10000 collect (random 1.0))))

How long does this take in your lisp of choice? If you increase by a factor of
10? On my (not terribly new) version of cmucl this soon falls over because it
didn't apparently occur to anyone to use a hashtable for the (majority of)
cases where it's feasible and N isn't quite small, so that it runs in
quadratic time.

Is that fixed in newer cmucls or alternatively what's the chance of having a
patch accepted?

'as

From: Espen Vestre
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sat, 25 Mar 2006 23:49:50 +0000
Message-ID: <m1acbeqe3l.fsf@vestre.net>

David Sletten <·····@slytobias.com> writes:

> Using SBCL I get:
> * (time (shuffle (make-numvector 80000)))
> 
> Evaluation took:
>    0.202 seconds of real time
>    0.095459 seconds of user run time
>    0.010836 seconds of system run time
>    0 page faults and
>    3,196,776 bytes consed.

Hmm - I guess the make-numvector stuff makes up a lot of the
exetution time here?

Tested on LW on a my PowerBook:

CL-USER 22 > (setf test (make-numvector 800000))
(...)
CL-USER 23 > (length (time (shuffle test)))
Timing the evaluation of (SHUFFLE TEST)

user time    =      0.424
system time  =      0.008
Elapsed time =   0:00:00
Allocation   = 6240 bytes
0 Page faults
800000

I use a function very similar to your in my utility library, but
I prefer to use ELT instead of AREF. It gives me a performance hit
(about 750ms vs. 450ms for the 800000 element array), but it's
nice to have a general sequence function.
-- 
  (espen)

From: David Sletten
Subject: Re: list of numbers in randomised order - prg. struggles at large (~10^5) numbers
Date: Sat, 25 Mar 2006 23:56:16 +0000
Message-ID: <AMkVf.12010$w86.6524@tornado.socal.rr.com>

Espen Vestre wrote:

> David Sletten <·····@slytobias.com> writes:
> 
> 
>>Using SBCL I get:
>>* (time (shuffle (make-numvector 80000)))
>>
>>Evaluation took:
>>   0.202 seconds of real time
>>   0.095459 seconds of user run time
>>   0.010836 seconds of system run time
>>   0 page faults and
>>   3,196,776 bytes consed.
> 
> 
> Hmm - I guess the make-numvector stuff makes up a lot of the
> exetution time here?
> 
> Tested on LW on a my PowerBook:
> 
> CL-USER 22 > (setf test (make-numvector 800000))
> (...)
> CL-USER 23 > (length (time (shuffle test)))

Ha! I forgot to wrap the call to TIME inside LENGTH! Obviously I only 
showed a small part of the output...

David Sletten