Interesting clustering/permutations problem

From: Eli Bendersky
Subject: Interesting clustering/permutations problem
Date: Wed, 03 Jul 2002 17:14:52 +0000
Message-ID: <3D23310C.6EF993CD@in.the.newsgroup>

    Hi all,

I have a problem I want to solve in Lisp (needed for a
project I'm writing, in Lisp !).

Say there is a pool of 9 numbers. X takes a number,
then Y takes a number, then X takes a number, then
Y takes a number, etc (maximum - X has 5 numbers
and Y has 4). I want to list all the permutations of the
numbers in X & Y's hands, w/o necessarily reaching
the final state. The permutations are represented as
follows:

((a b ...) (c d ...))

A list of 2 lists, X's numbers and Y's numbers

For example, the following are legal permutations:

((1 5 2) (6 7))

((1) (nil))

((4 5 1 2 7) (9 8 6 3))

The following are illegal:

((1 2) (4 1))    <---  both have 1

((1) (2 3))     <--- Y can't have more numbers than X

((1 2 4) (5))    <--- X can have at most 1 number more than Y (because
it goes in turns)

Hope I'm making it clear enough, so far...

I implemented a function which does what I want, eg returns
a list of all legal permutations, here it is:

(defun partition (s)
  (loop
    for i in (possible-moves s)
    collect (next-state :X s i)
    append
    (loop
       with new-state = (next-state :X s i)
       for j in (possible-moves new-state)
       collect (next-state :Y new-state j)
       when (not (null (possible-moves (next-state :Y new-state j))))
       append (partition (next-state :Y new-state j)))))

Where possible-moves and next-state are defined as follows:

(defun possible-moves (state)
  (loop for i from 1 to 9
        unless (or (member i (first state))
                   (member i (second state)))
        collect i))

(defun next-state (player state move)
  "returns new state after making the indicated move by the indicated
player (:X or :O)"
  (let ((X-moves (first state))
        (O-moves (second state)))
    (if move
      (list (if (eq player :X)
              (sort (copy-list (cons move X-moves)) #'>)
              X-moves)
            (if (eq player :O)
              (sort (copy-list (cons move O-moves)) #'>)
              O-moves))
      state)))



It works, but runs too slow, probably because of the recursion
in the double loop.

Any ideas on a more efficient function/algorithm to achieve the same ?

Thanks in advance

--
Eli Bendersky - http://www.geocities.com/spur4444/

comp.lang.c++ FAQ: http://www.parashift.com/c++-faq-lite/
comp.lang.c FAQ  : http://www.eskimo.com/~scs/C-faq/top.html

Re: Interesting clustering/permutations problem Eli Bendersky
Re: Interesting clustering/permutations problem Kenny Tilton
Re: Interesting clustering/permutations problem Gareth McCaughan
- Re: Interesting clustering/permutations problem [slightly OT] Eli Bendersky
  - Re: Interesting clustering/permutations problem [slightly OT] Gareth McCaughan
Re: Interesting clustering/permutations problem Kenny Tilton
- Re: Interesting clustering/permutations problem Eli Bendersky
  - Re: Interesting clustering/permutations problem Kenny Tilton
  - Re: Interesting clustering/permutations problem Eli Bendersky
    - Re: Interesting clustering/permutations problem Kenny Tilton

From: Eli Bendersky
Subject: Re: Interesting clustering/permutations problem
Date: Wed, 03 Jul 2002 17:16:10 +0000
Message-ID: <3D23315A.B7AD6CC1@in.the.newsgroup>

Sorry, replace all occurences of :O with :Y

--
Eli Bendersky - http://www.geocities.com/spur4444/

comp.lang.c++ FAQ: http://www.parashift.com/c++-faq-lite/
comp.lang.c FAQ  : http://www.eskimo.com/~scs/C-faq/top.html

From: Kenny Tilton
Subject: Re: Interesting clustering/permutations problem
Date: Wed, 03 Jul 2002 20:28:59 +0000
Message-ID: <3D2360B3.84D3A2C8@nyc.rr.com>

Eli Bendersky wrote:
> (defun partition (s)
>   (loop
>     for i in (possible-moves s)
>     collect (next-state :X s i)
>     append
>     (loop
>        with new-state = (next-state :X s i)
>        for j in (possible-moves new-state)
>        collect (next-state :Y new-state j)
>        when (not (null (possible-moves (next-state :Y new-state j))))
>        append (partition (next-state :Y new-state j)))))

Bit of a waste in the last two lines, yes? I see (next-state :Y
new-state j) twice, the first result being discarded after the test only
to be recalculated if the test succeeds. Looks like you execute
(next-state :X s i) twice as well.

Overall, how about a bit vector instead of a list to encode the states?
Should eliminate a lot of consing, sorting and list-searching.

I never LOOP, so I might be missing something else.

-- 

 kenny tilton
 clinisys, inc
 ---------------------------------------------------------------
""Well, I've wrestled with reality for thirty-five years, Doctor, 
  and I'm happy to state I finally won out over it.""
                                                  Elwood P. Dowd

From: Gareth McCaughan
Subject: Re: Interesting clustering/permutations problem
Date: Wed, 03 Jul 2002 22:41:36 +0000
Message-ID: <slrnai6vcv.g3s.Gareth.McCaughan@g.local>

Eli Bendersky wrote:
>     Hi all,
> 
> I have a problem I want to solve in Lisp (needed for a
> project I'm writing, in Lisp !).
[SNIP: he wants all legal partial tic-tac-toe games,
each represented as a list of two lists: X's moves
and O's -- er, Y's -- moves.]

Try this.

(defun enumerate (p n used result)

  "The first N elements of the vector P contain distinct
  numbers from 1 to 9. USED is an integer in which bits
  1..9 indicate which digits have been used (1 for used,
  0 for unused).

  Push onto RESULT a representation of every partial game
  beginning with those moves in 'shuffled order' -- where
  Y's moves come first, then X's. Return RESULT at the end.

  On exit from this function, the first N elements of P
  are unchanged. Later elements of P may have been changed."

  (push (translate p n) result)
  (when (< n 9)
    (loop for next from 1 to 9
          for bit = 2 then (ash bit 1) do
      (when (zerop (logand used bit))
        (setf (aref p n) next)
        (setf result (enumerate p (1+ n) (logior used bit) result)))))
  result)

(defun translate (p n)

  "The first N elements of the vector P contain distinct
  integers from 1 to 9. Return a list of two lists,
  the first containing the first (FLOOR N 2) elements
  and the second containing the rest."

  (let ((x nil) (y nil))
    (loop for i from 0 below n do
      (if (oddp i)
        (push (aref p i) y)
        (push (aref p i) x)))
    (list x y)))

(defun enumerate-all-moves ()
  "Enumerate everything, from the starting position onwards."
  (enumerate (make-array 9) 0 0 nil))

On my box (1GHz Athlon, FreeBSD, CMU CL 18d) this takes about
10 seconds, about half of which is GC and most of the rest of
which is in TRANSLATE. The final result occupies several tens
of megabytes, so the GC overhead isn't very surprising. I bet
ACL does better; its GC is quite good.

-- 
Gareth McCaughan  ················@pobox.com
.sig under construc

From: Eli Bendersky
Subject: Re: Interesting clustering/permutations problem [slightly OT]
Date: Thu, 04 Jul 2002 06:31:24 +0000
Message-ID: <3D23EBBC.C17718F4@in.the.newsgroup>

Gareth McCaughan wrote:

> [SNIP: he wants all legal partial tic-tac-toe games,
> each represented as a list of two lists: X's moves
> and O's -- er, Y's -- moves.]

<snip>

> On my box (1GHz Athlon, FreeBSD, CMU CL 18d) this takes about
> 10 seconds, about half of which is GC and most of the rest of
> which is in TRANSLATE. The final result occupies several tens
> of megabytes, so the GC overhead isn't very surprising. I bet
> ACL does better; its GC is quite good.

Hi Gareth,

Indeed, your tic-tac-toe guess is right. I presented the problem
as a clustering one to avoid over-complication (of magic-square
tic-tac-toe representation, etc).

Perhaps you can help me understand why it takes so long...
According to my calculations, there are a 3025 ways in total
to make such division, so it shouldn't take more than a few
tens of K's of memory !

--
Eli Bendersky - http://www.geocities.com/spur4444/

comp.lang.c++ FAQ: http://www.parashift.com/c++-faq-lite/
comp.lang.c FAQ  : http://www.eskimo.com/~scs/C-faq/top.html

From: Gareth McCaughan
Subject: Re: Interesting clustering/permutations problem [slightly OT]
Date: Fri, 05 Jul 2002 22:00:53 +0000
Message-ID: <slrnaic5ol.15sl.Gareth.McCaughan@g.local>

Eli Bendersky wrote:

> Perhaps you can help me understand why it takes so long...
> According to my calculations, there are a 3025 ways in total
> to make such division, so it shouldn't take more than a few
> tens of K's of memory !

The list of positions has almost a million elements.
Much more than 3025. However, this is treating positions
as different when they have the same pieces in the same
places, provided they arose via different move orders.
This probably isn't what you want, but it's what (as I
understood) you asked for. :-)

If you want to enumerate positions instead of game histories,
here's one approach.

(defun enumerate (x-so-far y-so-far next nx ny result)
  "Push onto RESULT every list with the following properties,
  finally returning RESULT.
  (1) The list has two elements X and Y, both lists.
  (2) The length of X is equal to that of Y, or one greater.
  (3) Some tail of X equals X-SO-FAR and some tail of Y equals Y-SO-FAR.
  (4) X and Y are disjoint.
  (5) The elements of X and Y are distinct integers satisfying (<= 1 n 9).
  (6) The elements of X and Y each appear in ascending order.
  (7) The elements in X and Y which aren't in X-SO-FAR and Y-SO-FAR
      are at most NEXT.
  Precondition: X-SO-FAR and Y-SO-FAR satisfy all of those
  conditions. Furthermore, NX and NY are the lengths of those
  lists."

  (when (<= ny nx (1+ ny))
    (push (list x-so-far y-so-far) result))

  (loop for actual-next from 1 upto next do
    (setq result
          (enumerate (cons actual-next x-so-far) y-so-far
                     (1- actual-next)
                     (1+ nx) ny
                     result))
    (setq result
          (enumerate x-so-far (cons actual-next y-so-far)
                     (1- actual-next)
                     nx (1+ ny)
                     result)))

  result)

(defun check (x y)
  (let ((nx (length x))
        (ny (length y)))
    (assert (<= ny nx (1+ ny)))
    (assert (= (length (remove-duplicates (append x y)))
               (+ nx ny)))
    (assert (every (lambda (a) (<= 1 a 9)) x))
    (assert (every (lambda (a) (<= 1 a 9)) y))))

(let ((all-positions (enumerate nil nil 9 0 0 nil)))
  (loop for (x y) in all-positions do (check x y))
  (length all-positions))

... which, by the way, returns 6046. How sure are you about
your figure of 3025? What exactly is it the number of?

You could save some work by checking for the possibility that
NX and NY are so far apart that filling all the remaining spaces
wouldn't bring them back into balance, but the function takes
about 0.01 seconds as it is, and it's possible that the overhead
of checking the condition would outweigh the savings.

You ought to consider, by the way, whether you actually need to
enumerate the positions at all...

-- 
Gareth McCaughan  ················@pobox.com
.sig under construc

From: Kenny Tilton
Subject: Re: Interesting clustering/permutations problem
Date: Thu, 04 Jul 2002 03:03:13 +0000
Message-ID: <3D23BD19.632C1400@nyc.rr.com>

I looked at the code some more, found some stuff. Not rigorously
verified, but:

This was my timing of your code under ACL5:

; cpu time (non-gc) 9,916 msec user, 0 msec system
; cpu time (gc)     143,174 msec (00:02:23.174) user, 0 msec system
; cpu time (total)  153,090 msec (00:02:33.090) user, 0 msec system
; real time  153,090 msec (00:02:33.090)
; space allocation:
;  69,876,746 cons cells, 0 symbols, 73,960 other bytes, 0 static bytes

Switching the loop appends to nconcs (I think that's OK):

; cpu time (non-gc) 10,073 msec user, 0 msec system
; cpu time (gc)     47,170 msec user, 0 msec system
; cpu time (total)  57,243 msec user, 0 msec system
; real time  57,232 msec
; space allocation:
;  62,965,251 cons cells, 0 symbols, 68,848 other bytes, 0 static bytes

Removing the sort from the new state gen (why sort? if nec, do at end,
not after each insertion):

; cpu time (non-gc) 5,176 msec user, 0 msec system
; cpu time (gc)     26,149 msec user, 0 msec system
; cpu time (total)  31,325 msec user, 0 msec system
; real time  31,325 msec
; space allocation:
;  22,223,161 cons cells, 0 symbols, 34,600 other bytes, 0 static bytes

removing the copy-list from new state gen (looks unnecessary):

; cpu time (non-gc) 4,547 msec user, 0 msec system
; cpu time (gc)     5,858 msec user, 0 msec system
; cpu time (total)  10,405 msec user, 0 msec system
; real time  10,405 msec
; space allocation:
;  12,130,893 cons cells, 0 symbols, 14,824 other bytes, 0 static bytes

eliminating possible-moves (just loop from 1 to 9 testing the state):

; cpu time (non-gc) 4,465 msec user, 0 msec system
; cpu time (gc)     552 msec user, 0 msec system
; cpu time (total)  5,017 msec user, 0 msec system
; real time  5,017 msec
; space allocation:
;  10,157,053 cons cells, 0 symbols, 160 other bytes, 0 static bytes

etc etc... bit vectors would be next.

-- 

 kenny tilton
 clinisys, inc
 ---------------------------------------------------------------
""Well, I've wrestled with reality for thirty-five years, Doctor, 
  and I'm happy to state I finally won out over it.""
                                                  Elwood P. Dowd

From: Eli Bendersky
Subject: Re: Interesting clustering/permutations problem
Date: Thu, 04 Jul 2002 06:00:53 +0000
Message-ID: <3D23E495.9647FCC8@in.the.newsgroup>

Thanks for all the good suggestions !

I am indeed writing a Tic-Tac-Toe learning
program. I need all possible states because I want
to run a value-iteration algorithm on them.

A follow-up question: how do you time the code and
print all these useful statistics about its run-time and
memory consumption ?

The lisp I'm using is:

GNU CLISP 2.28 (released 2002-03-03)

--
Eli Bendersky - http://www.geocities.com/spur4444/

comp.lang.c++ FAQ: http://www.parashift.com/c++-faq-lite/
comp.lang.c FAQ  : http://www.eskimo.com/~scs/C-faq/top.html

From: Kenny Tilton
Subject: Re: Interesting clustering/permutations problem
Date: Thu, 04 Jul 2002 06:23:07 +0000
Message-ID: <3D23EB81.2DC5281E@nyc.rr.com>

Eli Bendersky wrote:
> A follow-up question: how do you time the code and
> print all these useful statistics about its run-time and
> memory consumption ?

 (time <form>)

That's in the CL spec. I use ACL, never had the pleasure of CLISP.

btw, shouldn't permutations be pruned at wins? And stalemates? And
howseabout milking the symmetry (starting at any of the four corners is
equivalent)?

-- 

 kenny tilton
 clinisys, inc
 ---------------------------------------------------------------
""Well, I've wrestled with reality for thirty-five years, Doctor, 
  and I'm happy to state I finally won out over it.""
                                                  Elwood P. Dowd

From: Eli Bendersky
Subject: Re: Interesting clustering/permutations problem
Date: Thu, 04 Jul 2002 06:21:41 +0000
Message-ID: <3D23E975.CEC0B81C@in.the.newsgroup>

Eli Bendersky wrote:

> A follow-up question: how do you time the code and
> print all these useful statistics about its run-time and
> memory consumption ?

If I am using the time macro, I get results
of the form:

Real time: 73.92294 sec.
Run time: 73.68 sec.
Space: 250416720 Bytes
GC: 49, GC time: 5.5 sec.

What does each field mean ? What is the difference here between Real
time and
Run time, and what is the number 49 after GC ?

Thanks in advance

--
Eli Bendersky - http://www.geocities.com/spur4444/

comp.lang.c++ FAQ: http://www.parashift.com/c++-faq-lite/
comp.lang.c FAQ  : http://www.eskimo.com/~scs/C-faq/top.html

From: Kenny Tilton
Subject: Re: Interesting clustering/permutations problem
Date: Thu, 04 Jul 2002 14:22:19 +0000
Message-ID: <3D245BD1.DF096196@nyc.rr.com>

Collapsing the duplicate next-state calls has it down to:

; cpu time (non-gc) 3,446 msec user, 0 msec system
; cpu time (gc)     230 msec user, 0 msec system
; cpu time (total)  3,676 msec user, 0 msec system
; real time  3,675 msec
; space allocation:
;  4,932,058 cons cells, 0 symbols, 64 other bytes, 0 static bytes

Eli Bendersky wrote:
> 
> If I am using the time macro, I get results
> of the form:
> 
> Real time: 73.92294 sec.
> Run time: 73.68 sec.
> Space: 250416720 Bytes
> GC: 49, GC time: 5.5 sec.
> 
> What does each field mean ? What is the difference here between Real
> time and
> Run time, and what is the number 49 after GC ?

Implementation dependent, so check the CLisp doc. Do you suppose that
"49" is the count of how many GCs got kicked off? And perhaps in a
multi-tasking setup the diff between real time and run time would show
you how much time went to processes other than the one being timed;
maybe real time is wall-clock elapsed time and runtime is the fraction
of that during which the timed process had the CPU. Just guessing.

-- 

 kenny tilton
 clinisys, inc
 ---------------------------------------------------------------
""Well, I've wrestled with reality for thirty-five years, Doctor, 
  and I'm happy to state I finally won out over it.""
                                                  Elwood P. Dowd