How to implement these conventional data structures?

From: Wang Yin
Subject: How to implement these conventional data structures?
Date: Thu, 23 Oct 2003 07:36:06 +0000
Message-ID: <m3u1605rtl.fsf@wangyin.com>

Hi, 

Now I'm going to write some realistic LISP programs.  I want to
implement a minimum spanning tree (MST) algorithm on a point set in
the plane.

I'll first use a plane sweep algorithm by Steven Fortune to generate a
Delaunay triangulation on the point set. This is a graph that contains
an MST.  And then I find an MST on it with Tarjan's O(n) MST algorithm
on planar graphs.

I have implemented this with advanced data structures in C some days
ago. With pains of course. I'm now shifing to LISP because I have much
more complex algorithms built upon this MST. I cannot haddle the
complexity with C any further, with myself.

Now I'm writing the Common Lisp code, but I met some decision problems
on data structures. Thanks to Lisp, I can make decision at the last
stage and I can change my bottom data structures at any time :)

But the final decision have to be made eventurally. I must implemente
the following data structures.


1. I need to input a large set of points in the plane with coordinate
   information. I'm now storing them in lists. I must sort them
   according to their coordinates. "On Lisp" says "avoid CONSes". I
   wonder if the sort functions comes with Common Lisp is efficient
   with list? But if I use arrays, how do I enlarge the array when I
   add more points to the set?

2. An O(log n) search time structure for the wavefront list. It keeps
   the bisectors in some order and I must be able to find or delete a
   wavefront node in O(log n) time. And I must be able to search for
   the predecessor and successor of a bisector in contant time. In C
   I've tried a kind of hashed list and AVL trees, redblack trees,
   among them the hashed list gives best performance with medium size
   data sets and threaded RB trees becomes the best when the data set
   becomes huge. I've found some RB tree and N-Ary tree code in Common
   Lisp. But I don't know what kind gives the best performance here.

3. A priority queue for use in Tarjan's MST algorithm. I implemented
   this with a heap in C. But I don't know if it's proper to implement
   a heap with Common Lisp's array?

4. I also plan to implement Kruskal's MST algorithm. So I need
   disjoint sets. When implementing disjoint sets, I used a field
   "set-link" in my point structure with defstruct. Then I can find
   what set a node is in, and I can merge the sets in constant time
   (actully reverse Ackerman's function). It works of course, but I
   think this method builds the "set" structure into my nodes and
   breaks the modularity somewhat. I wonder if there is a more modular
   way to do this with the same efficiency?


Thanks. All kinds of suggestions is welcome :)


-- 
Yin Wang,
EDA Lab,
Deparment of Computer Science and Technology,
Tsinghua University,
100084
Beijing China

Re: How to implement these conventional data structures? Madhu
- Re: How to implement these conventional data structures? Madhu
Re: How to implement these conventional data structures? Gareth McCaughan
Re: How to implement these conventional data structures? Wang Yin
- Re: How to implement these conventional data structures? Marco Antoniotti
Re: How to implement these conventional data structures? Alain Picard
Re: How to implement these conventional data structures? Robert St. Amant

From: Madhu
Subject: Re: How to implement these conventional data structures?
Date: Thu, 23 Oct 2003 16:19:24 +0000
Message-ID: <xfead7ssz8z.fsf@shelby.cs.unm.edu>

Helu
* Wang Yin  <··············@wangyin.com> :
| Hi, 
|
| 1. I need to input a large set of points in the plane with coordinate
|    information. I'm now storing them in lists. I must sort them
|    according to their coordinates. "On Lisp" says "avoid CONSes". I
|    wonder if the sort functions comes with Common Lisp is efficient
|    with list? But if I use arrays, how do I enlarge the array when I
|    add more points to the set?
VECTOR-PUSH-EXTEND
ADJUST-ARRAY
and MAKE-ARRAY with the DISPLACED-TO keyword 
<http://www.lispworks.com/reference/HyperSpec/Body/f_adjust.htm>
are all options.

I've often gotten by just fine using CL:SORT on lists of conses to
keep points sorted. Mileage varies depending on how often you call
SORT and how reasonable your lisp implementaion' version's SORT
function is.

|
| 2. An O(log n) search time structure for the wavefront list. It keeps
|    the bisectors in some order and I must be able to find or delete a
|    wavefront node in O(log n) time. And I must be able to search for
|    the predecessor and successor of a bisector in contant time. In C
|    I've tried a kind of hashed list and AVL trees, redblack trees,
|    among them the hashed list gives best performance with medium size
|    data sets and threaded RB trees becomes the best when the data set
|    becomes huge. I've found some RB tree and N-Ary tree code in Common
|    Lisp. But I don't know what kind gives the best performance here.

Again I'm not sure about the performance for your problem, but there
is an option of using a TREAP data structure here (based onrandomized
search trees). It is mostly O(log N) operations which I think suit
your complexity requirements. I have CL code I'd be glad to share if
you are willing to try it out :>

| 3. A priority queue for use in Tarjan's MST algorithm. I implemented
|    this with a heap in C. But I don't know if it's proper to implement
|    a heap with Common Lisp's array?

(I seriously  doubt it is forbidden!)

| 4. I also plan to implement Kruskal's MST algorithm. So I need
|    disjoint sets. When implementing disjoint sets, I used a field
|    "set-link" in my point structure with defstruct. Then I can find
|    what set a node is in, and I can merge the sets in constant time
|    (actully reverse Ackerman's function). It works of course, but I
|    think this method builds the "set" structure into my nodes and
|    breaks the modularity somewhat. I wonder if there is a more modular
|    way to do this with the same efficiency?

This sounds like a job for the UNION-FIND algorithm.  Marco Antonietti
has implemented this data structure in CL and it is available from the
CLOCC project from soresforge. (http://clocc.sourceforge.net)

Regards
Madhu

From: Madhu
Subject: Re: How to implement these conventional data structures?
Date: Thu, 23 Oct 2003 17:10:31 +0000
Message-ID: <xfe7k2vubg8.fsf@shelby.cs.unm.edu>

(Corrcting what I posted)
* Madhu  <···············@shelby.cs.unm.edu> :
| Helu
| * Wang Yin  <··············@wangyin.com> :
<snip>
| |    with list? But if I use arrays, how do I enlarge the array when I
| |    add more points to the set?
| VECTOR-PUSH-EXTEND
| ADJUST-ARRAY
| <http://www.lispworks.com/reference/HyperSpec/Body/f_adjust.htm>
| are all options.

| and MAKE-ARRAY with the DISPLACED-TO keyword 
|
isnt. the DISPLACED-TO keyword will not enlarge the array. (It'll do the
opposite). Sorry.

(I also meant to note that you could check Marco's UNION-FIND API and
compare it , and see if you thought that was modular)

Regards
Madhu

From: Gareth McCaughan
Subject: Re: How to implement these conventional data structures?
Date: Thu, 23 Oct 2003 09:39:45 +0000
Message-ID: <87he20jnry.fsf@g.mccaughan.ntlworld.com>

Wang Yin wrote:

> Now I'm going to write some realistic LISP programs.  I want to
> implement a minimum spanning tree (MST) algorithm on a point set in
> the plane.
...
> 1. I need to input a large set of points in the plane with coordinate
>    information. I'm now storing them in lists. I must sort them
>    according to their coordinates. "On Lisp" says "avoid CONSes". I
>    wonder if the sort functions comes with Common Lisp is efficient
>    with list? But if I use arrays, how do I enlarge the array when I
>    add more points to the set?

ADjustable arrays. See, e.g., ADJUST-ARRAY and VECTOR-PUSH-EXTEND
in the HyperSpec. You should check whether your implementation
extends generously enough under the cover to give you linear or
near-linear performance when repeatedly extending arrays; if not,
you may want to write a little wrapper around VECTOR-PUSH-EXTEND
that always extends the list by, say, a factor of 2 or 1.5 or
something. (It doesn't need to be difficult, because VECTOR-PUSH-EXTEND
helpfully takes an optional argument saying the minimum amount to
extend by if the vector must be extended.)

> 2. An O(log n) search time structure for the wavefront list. It keeps
>    the bisectors in some order and I must be able to find or delete a
>    wavefront node in O(log n) time. And I must be able to search for
>    the predecessor and successor of a bisector in contant time. In C
>    I've tried a kind of hashed list and AVL trees, redblack trees,
>    among them the hashed list gives best performance with medium size
>    data sets and threaded RB trees becomes the best when the data set
>    becomes huge. I've found some RB tree and N-Ary tree code in Common
>    Lisp. But I don't know what kind gives the best performance here.

So try them :-). Seriously: it will depend on the implementation
details (both of your Lisp implementation and of the data
structure).

> 3. A priority queue for use in Tarjan's MST algorithm. I implemented
>    this with a heap in C. But I don't know if it's proper to implement
>    a heap with Common Lisp's array?

I don't see any reason why not.

> 4. I also plan to implement Kruskal's MST algorithm. So I need
>    disjoint sets. When implementing disjoint sets, I used a field
>    "set-link" in my point structure with defstruct. Then I can find
>    what set a node is in, and I can merge the sets in constant time
>    (actully reverse Ackerman's function). It works of course, but I
>    think this method builds the "set" structure into my nodes and
>    breaks the modularity somewhat. I wonder if there is a more modular
>    way to do this with the same efficiency?

Probably not "with the same efficiency", but obviously
you can do

    (defstruct set-node
      actual-object
      parent
      set-size)

or whatever it is you need for the disjoint set operations,
which separates "set" stuff from "point" stuff cleanly.
It means following one more pointer any time you have a
SET-NODE and want a point. Against that, if you're doing
set stuff and point stuff at different times then not
having the two intermingled so much may be a bit more
cache-friendly. It seems unlikely that this will make
much difference to the time your program takes.

-- 
Gareth McCaughan
.sig under construc

From: Wang Yin
Subject: Re: How to implement these conventional data structures?
Date: Fri, 24 Oct 2003 06:37:42 +0000
Message-ID: <m3k76vi1jd.fsf@wangyin.com>

Thank you very much!

I've tried the union-find package. It's what I wanted.
I've never thought of seperate unions from my data structures before.
I learned a good abstraction method :)

What's more, I need a good structure for graphs.  It'll have vertices
and edges. It must be fast to traverse the graph. It must be fast to
list all edges of a vertex. And it must be fast to add and delete
edges and vertices. 

In C, I used to use double linked list edge structures which contains
two pointers to the two end points. And another 4 pointers to next and
previous edges of the two end points.

Can you suggest a good graph representation in LISP?


-- 
Yin Wang,
EDA Lab,
Deparment of Computer Science and Technology,
Tsinghua University,
100084
Beijing China

From: Marco Antoniotti
Subject: Re: How to implement these conventional data structures?
Date: Fri, 24 Oct 2003 16:26:06 +0000
Message-ID: <xocmb.63$KR3.17421@typhoon.nyu.edu>

Wang Yin wrote:

> Thank you very much!
> 
> I've tried the union-find package. It's what I wanted.
> I've never thought of seperate unions from my data structures before.
> I learned a good abstraction method :)

Shameless plug :)  There is a UNION-FIND package in the CLOCC :) 
http://clocc.sf.net.

Cheers
--
Marco

From: Alain Picard
Subject: Re: How to implement these conventional data structures?
Date: Fri, 24 Oct 2003 14:56:16 +0000
Message-ID: <87fzhik7lb.fsf@memetrics.com>

Wang Yin <··@wangyin.com> writes:

>
> 1. I need to input a large set of points in the plane with coordinate
>    information. I'm now storing them in lists. I must sort them
>    according to their coordinates. "On Lisp" says "avoid CONSes". I
>    wonder if the sort functions comes with Common Lisp is efficient
>    with list? But if I use arrays, how do I enlarge the array when I
>    add more points to the set?

What _On Lisp_ means by "avoiding CONSes" is probably
avoiding consING.  This is an advanced topic, so you shouln't
worry too much about it initially.  One day you'll notice
that long chains of MAPCAR, REMOVE-IF etc tend to run slowly...

For a structure used as a _container_, a list (or tree) 
may be just fine.  There is no gross inefficiency in Lisp's SORT
on lists (be aware that it is destructive, however).

If you don't need O(1) access to the elements, lists are fine.
For example, if you're basically going to be mapping across
(or recursively searching down tree like) lists, lists are fine.
If you do needs such access, you can make adjustable arrays which
you can extend.

>
> 2. An O(log n) search time structure for the wavefront list. It keeps
>    the bisectors in some order and I must be able to find or delete a
>    wavefront node in O(log n) time. And I must be able to search for
>    the predecessor and successor of a bisector in contant time. In C

Okay, you lost me in the precise details of your required algorithm,
but by the description it sounds like you're trying to keep "pointers"
to other parts of the structure; typically you'd do that by storing
small structs in the container; e.g.

(defstruct node
  predecessor
  successor
  payload)

etc.

> Thanks. All kinds of suggestions is welcome :)

Happy hacking!  I'm sure you'll enjoy it once you wrap your head around
it, and you won't regret your choice.

-- 
It would be difficult to construe        Larry Wall, in  article
this as a feature.			 <·····················@netlabs.com>

From: Robert St. Amant
Subject: Re: How to implement these conventional data structures?
Date: Tue, 28 Oct 2003 16:09:31 +0000
Message-ID: <lpnu15tmjic.fsf@haeckel.csc.ncsu.edu>

Wang Yin <··@wangyin.com> writes:

> 3. A priority queue for use in Tarjan's MST algorithm. I implemented
>    this with a heap in C. But I don't know if it's proper to implement
>    a heap with Common Lisp's array?

Sure.  Also, you might be able to track down existing implementations
rather than implementing it all from scratch.  I use
priority-queue.lisp from the EKSL utils suite, which Google should be
able to find for you.

-- 
Rob St. Amant
http://www4.ncsu.edu/~stamant