From: Jim Newton
Subject: searching for the largest array
Date: 
Message-ID: <2ku5vtF6dib2U1@uni-berlin.de>
hi everyone, i have an algorithm problem ... wondering if anyone has
any clever ideas.  I can (and have) solved it in a very exhaustive
way, but maybe there are some optimizations or different approaches.

You are given a list of 2d-points.  for simplicity assume that all x
and y values are integers.  Starting with a given point, find the
largest array of points in the set for which the given point is the
bottom left most point.

E.g.,  S = a set of points {(x1,y1) (x2,y2) (x3,y3) ....}

Of all the possible (dx,dy,n,m) such that (x1 + i*dx, y1 + j*dy)
is an element of S for all 0<=i<n and 0<=j<m, which such (dx,dy,n,m)
maximizes n*m?

graphically:

8     *         *
7         *
6 *   *   *
5 *
4 * * * *   *   *
3   *   *
2 * * * *   *   *
1 *   *
0 * * * *   *   *
   0 1 2 3 4 5 6 7

In this set the largest array containing (0,0) has
(dx=2,dy=2,n=2,m=4) because for i=0,1, and for j=0,1,2,3
every (0+2*i,0+2*j) is in the set.

8     *         *
7         *
6 @   @
5 *
4 @ * @ *   *   *
3   *   *
2 @ * @ *   *   *
1 *   *
0 @ * @ *   *   *
   0 1 2 3 4 5 6 7

There are larger arrays, but none containing (0,0).

Furthermore, if there is more than one solution, i'd
like to take one that is the squarest.  I.e.,
minimize abs(dx-dy).  I can sacrifice this step
as long as the solution is deterministic.

The way i've solved it so far is to take all the x
values and all the y values and generate a unique
list of all possible dx and dy.  Then for each pair
(dx,dy) solve for find all (n,m) taking he one that
maximizes (n*m), never examining an (n,m) for which
n*m is less than the current maximum.

Still this is an n^3 order search.

Is there a way to search the space somehow in a way
of decreasing n*m so that when we find the first one
we know it is the maximum?  Or does someone have
an even better idea?

-jim

From: Gareth McCaughan
Subject: Re: searching for the largest array
Date: 
Message-ID: <873c45fs92.fsf@g.mccaughan.ntlworld.com>
Jim Newton <·····@rdrop.com> writes:

> You are given a list of 2d-points.  for simplicity assume that all x
> and y values are integers.  Starting with a given point, find the
> largest array of points in the set for which the given point is the
> bottom left most point.
> 
> E.g.,  S = a set of points {(x1,y1) (x2,y2) (x3,y3) ....}
> 
> Of all the possible (dx,dy,n,m) such that (x1 + i*dx, y1 + j*dy)
> is an element of S for all 0<=i<n and 0<=j<m, which such (dx,dy,n,m)
> maximizes n*m?
...
> Furthermore, if there is more than one solution, i'd
> like to take one that is the squarest.  I.e.,
> minimize abs(dx-dy).  I can sacrifice this step
> as long as the solution is deterministic.
...
> The way i've solved it so far is to take all the x
> values and all the y values and generate a unique
> list of all possible dx and dy.  Then for each pair
> (dx,dy) solve for find all (n,m) taking he one that
> maximizes (n*m), never examining an (n,m) for which
> n*m is less than the current maximum.
> 
> Still this is an n^3 order search.

Sounds like more than n^3 to me. For each (dx,dy) you
may need to examine on the order of n^2 points, each of
which you can do in (on average) constant time, and the
number of possible (dx,dy) seems like it would be of
order n^2 too. Am I missing something?

How about the following?

Your main data structures are (1) a set of maximal
arrays found so far -- that's maximal in the sense
that you can't increase m or n without introducing
points you haven't processed yet -- and (2) a queue
of points still to be processed. I'll call these
ARRAYS and QUEUE. QUEUE will actually hold records
of the form (x,y,dx,dy), and we'll pull them out
in lexicographic order of, say, (x+y,x,y,dx+dy).

You need to be able to add elements to ARRAYS, and
remove them, rapidly. A hash table can do those
things in constant time on average; its worst case
may be bad, but basically never happens. There are
tree-based data structures that can do insertion and
deletion in time log(n); I think some fancy data
structures can do it in amortized constant time
or thereabouts, though I seem to remember that
the constants are unpleasantly large in practice.

You need to be able to add elements to QUEUE and
to remove them in order. So you need a priority
queue. There are plenty of structures that let you
do insertion and deletion in time of order
log(size of queue).

To begin with, set ARRAYS to contain just one array,
the one containing only (0,0). (For simplicity I'm
assuming that (0,0) is your bottom left point.) And
for each (dx,0) in the set, add (dx,0,dx,0) to QUEUE;
for each (0,dy), add (0,dy,0,dy).

Now, as long as QUEUE isn't empty ...

  - take an element (x,y,dx,dy) out of QUEUE,
    along with any other elements that start (x,y).
    (The ordering we chose for QUEUE guarantees
    that they're consecutive.)
  - take the arrays (x-dx,y-dy,dx,dy) out of ARRAYS
  - put each (x,y,dx,dy) into ARRAYS
  - for each dx such that (x+dx,y) is in the set,
    put (x+dx,y,dx,dy) into QUEUE. Likewise for
    (x,y+dy,dx,dy).

Finally, find the element of ARRAYS for which
x*y/(dx*dy) is maximal. (You can keep track of that
incrementally, but there's no way that can be more
efficient than just searching at the end.)

The time this takes is proportional to the number
of different things that ever go into ARRAYS, times
some relatively small logarithmic thing that depends
on the performance characteristics of the data structures
you choose. In the worst case, there can be rather a lot
of things in ARRAYS; if your set of points is very dense,
it can be as many as the sum over all (x,y) of d(x)d(y)
where d(x) is the number of divisors of x. That's
(sum over all x of d(x))^2, which is of order n^2 (log n)^2.
As I said, there's probably another log n factor there
because your set and priority queue operations will take
more than constant time. So the whole thing ought to
work in time n^2 (log n)^3 or thereabouts, at worst.
It will be much better when the set of points is sparse
and when there aren't many arrays in it.

-- 
Gareth McCaughan
.sig under construc
From: Jim Newton
Subject: Re: searching for the largest array
Date: 
Message-ID: <2llqq2Fds6bcU1@uni-berlin.de>
thanks gareth, i have not completely understood your suggestion
yet but something i've thought of but cannot quite get my
head around is the following.  for example.

if the max x  = 100
        max y  = 200

and i have identified an array dx=5,dy=10,rows=20,cols=10, rows*cols=200
then if i'm looking at dx=6,dy=17 then i know i don't need to examine it
because even if there are the maximum number of rows and columns 
100/6=16 by 200/17=11, then 16*11 = 176 < 200. I.e., even if dx=6,dy=17
identifies an array then it is not useful because it cannot generate
more than 200 hits.

is this a useful optimization? or am i spending extra time checking for
a situation that is not going to occure very often?

-jim


Gareth McCaughan wrote:
> Jim Newton <·····@rdrop.com> writes:
> 
> 
>>You are given a list of 2d-points.  for simplicity assume that all x
>>and y values are integers.  Starting with a given point, find the
>>largest array of points in the set for which the given point is the
>>bottom left most point.
>>
>>E.g.,  S = a set of points {(x1,y1) (x2,y2) (x3,y3) ....}
>>
>>Of all the possible (dx,dy,n,m) such that (x1 + i*dx, y1 + j*dy)
>>is an element of S for all 0<=i<n and 0<=j<m, which such (dx,dy,n,m)
>>maximizes n*m?
> 
> ...
> 
>>Furthermore, if there is more than one solution, i'd
>>like to take one that is the squarest.  I.e.,
>>minimize abs(dx-dy).  I can sacrifice this step
>>as long as the solution is deterministic.
> 
> ...
> 
>>The way i've solved it so far is to take all the x
>>values and all the y values and generate a unique
>>list of all possible dx and dy.  Then for each pair
>>(dx,dy) solve for find all (n,m) taking he one that
>>maximizes (n*m), never examining an (n,m) for which
>>n*m is less than the current maximum.
>>
>>Still this is an n^3 order search.
> 
> 
> Sounds like more than n^3 to me. For each (dx,dy) you
> may need to examine on the order of n^2 points, each of
> which you can do in (on average) constant time, and the
> number of possible (dx,dy) seems like it would be of
> order n^2 too. Am I missing something?
> 
> How about the following?
> 
> Your main data structures are (1) a set of maximal
> arrays found so far -- that's maximal in the sense
> that you can't increase m or n without introducing
> points you haven't processed yet -- and (2) a queue
> of points still to be processed. I'll call these
> ARRAYS and QUEUE. QUEUE will actually hold records
> of the form (x,y,dx,dy), and we'll pull them out
> in lexicographic order of, say, (x+y,x,y,dx+dy).
> 
> You need to be able to add elements to ARRAYS, and
> remove them, rapidly. A hash table can do those
> things in constant time on average; its worst case
> may be bad, but basically never happens. There are
> tree-based data structures that can do insertion and
> deletion in time log(n); I think some fancy data
> structures can do it in amortized constant time
> or thereabouts, though I seem to remember that
> the constants are unpleasantly large in practice.
> 
> You need to be able to add elements to QUEUE and
> to remove them in order. So you need a priority
> queue. There are plenty of structures that let you
> do insertion and deletion in time of order
> log(size of queue).
> 
> To begin with, set ARRAYS to contain just one array,
> the one containing only (0,0). (For simplicity I'm
> assuming that (0,0) is your bottom left point.) And
> for each (dx,0) in the set, add (dx,0,dx,0) to QUEUE;
> for each (0,dy), add (0,dy,0,dy).
> 
> Now, as long as QUEUE isn't empty ...
> 
>   - take an element (x,y,dx,dy) out of QUEUE,
>     along with any other elements that start (x,y).
>     (The ordering we chose for QUEUE guarantees
>     that they're consecutive.)
>   - take the arrays (x-dx,y-dy,dx,dy) out of ARRAYS
>   - put each (x,y,dx,dy) into ARRAYS
>   - for each dx such that (x+dx,y) is in the set,
>     put (x+dx,y,dx,dy) into QUEUE. Likewise for
>     (x,y+dy,dx,dy).
> 
> Finally, find the element of ARRAYS for which
> x*y/(dx*dy) is maximal. (You can keep track of that
> incrementally, but there's no way that can be more
> efficient than just searching at the end.)
> 
> The time this takes is proportional to the number
> of different things that ever go into ARRAYS, times
> some relatively small logarithmic thing that depends
> on the performance characteristics of the data structures
> you choose. In the worst case, there can be rather a lot
> of things in ARRAYS; if your set of points is very dense,
> it can be as many as the sum over all (x,y) of d(x)d(y)
> where d(x) is the number of divisors of x. That's
> (sum over all x of d(x))^2, which is of order n^2 (log n)^2.
> As I said, there's probably another log n factor there
> because your set and priority queue operations will take
> more than constant time. So the whole thing ought to
> work in time n^2 (log n)^3 or thereabouts, at worst.
> It will be much better when the set of points is sparse
> and when there aren't many arrays in it.
> 
From: Gareth McCaughan
Subject: Re: searching for the largest array
Date: 
Message-ID: <87y8li1ejy.fsf@g.mccaughan.ntlworld.com>
Jim Newton <·····@rdrop.com> writes:

> thanks gareth, i have not completely understood your suggestion
> yet but something i've thought of but cannot quite get my
> head around is the following.  for example.
> 
> if the max x  = 100
>         max y  = 200
> 
> and i have identified an array dx=5,dy=10,rows=20,cols=10, rows*cols=200
> then if i'm looking at dx=6,dy=17 then i know i don't need to examine it
> because even if there are the maximum number of rows and columns
> 100/6=16 by 200/17=11, then 16*11 = 176 < 200. I.e., even if dx=6,dy=17
> identifies an array then it is not useful because it cannot generate
> more than 200 hits.
> 
> is this a useful optimization? or am i spending extra time checking for
> a situation that is not going to occure very often?

Dunno. It's easy enough to implement that you can probably afford
to try it both ways and see whether it helps on typical datasets.

-- 
Gareth McCaughan
.sig under construc
From: Mark McConnell
Subject: Re: searching for the largest array
Date: 
Message-ID: <d3aed052.0407180439.506930b7@posting.google.com>
Jim Newton <·····@rdrop.com> wrote in message news:<··············@uni-berlin.de>...
> hi everyone, i have an algorithm problem ... wondering if anyone has
> any clever ideas.  I can (and have) solved it in a very exhaustive
> way, but maybe there are some optimizations or different approaches.
> 
> You are given a list of 2d-points.  for simplicity assume that all x
> and y values are integers.  Starting with a given point, find the
> largest array of points in the set for which the given point is the
> bottom left most point.
> 
> E.g.,  S = a set of points {(x1,y1) (x2,y2) (x3,y3) ....}
> 
> Of all the possible (dx,dy,n,m) such that (x1 + i*dx, y1 + j*dy)
> is an element of S for all 0<=i<n and 0<=j<m, which such (dx,dy,n,m)
> maximizes n*m?
> 
> [snip]

Not a solution, but an idea... It would be good to sort your points
lexicographically.  That is, sort them by less-than on the x's, and
sort by less-than on the y's whenever there is a tie in the x's.  This
sorting is inexpensive compared to the rest of the algorithm.  It
could help you avoid looking at points you don't have to.  Maybe even
make two sorted copies of the points, with the second copy sorted by
y's first and then x's.

[By the way, did you write a paper in the late 80s on the
finitely-generated subgroups of a torus?  I have a vague memory of a
"Jim Newton" writing such a paper, and the topics are similar.]
From: Jim Newton
Subject: Re: searching for the largest array
Date: 
Message-ID: <2m02ahFhcjmuU1@uni-berlin.de>
no the torus paper was not mine.  i was still doing undergrad work
during the 80.  i did not know what a finite subgroup was probably.

-jim

Mark McConnell wrote:
> Jim Newton <·····@rdrop.com> wrote in message news:<··············@uni-berlin.de>...
> 
>>hi everyone, i have an algorithm problem ... wondering if anyone has
>>any clever ideas.  I can (and have) solved it in a very exhaustive
>>way, but maybe there are some optimizations or different approaches.
>>
>>You are given a list of 2d-points.  for simplicity assume that all x
>>and y values are integers.  Starting with a given point, find the
>>largest array of points in the set for which the given point is the
>>bottom left most point.
>>
>>E.g.,  S = a set of points {(x1,y1) (x2,y2) (x3,y3) ....}
>>
>>Of all the possible (dx,dy,n,m) such that (x1 + i*dx, y1 + j*dy)
>>is an element of S for all 0<=i<n and 0<=j<m, which such (dx,dy,n,m)
>>maximizes n*m?
>>
>>[snip]
> 
> 
> Not a solution, but an idea... It would be good to sort your points
> lexicographically.  That is, sort them by less-than on the x's, and
> sort by less-than on the y's whenever there is a tie in the x's.  This
> sorting is inexpensive compared to the rest of the algorithm.  It
> could help you avoid looking at points you don't have to.  Maybe even
> make two sorted copies of the points, with the second copy sorted by
> y's first and then x's.
> 
> [By the way, did you write a paper in the late 80s on the
> finitely-generated subgroups of a torus?  I have a vague memory of a
> "Jim Newton" writing such a paper, and the topics are similar.]