Need something like a "fuzzy" hash table

From: Greg Menke
Subject: Need something like a "fuzzy" hash table
Date: Tue, 31 Dec 2002 21:35:30 +0000
Message-ID: <m3u1gtdg7x.fsf@europa.pienet>

Hi,

I'm working on a program that counts the number of 3,4,5,... sided
figures formed when some quantity of lines of random slope and
position are dropped onto a plane.  My current approach is to
accumulate the lines & intersections, then run thru the dataset
counting the figures.  Its not an elegant approach, but such things
are fun in Lisp anyway.  As expected, it doesn't take many lines to
make the dataset very large.

I keep a list of intersections for each line.  As I add each new line,
I compute its intersections with all the lines already dropped, adding
new intersection records where required.  Each intersection record is
exactly one object, referenced by the lines that share the point.  I
chose to use a list so I could avoid the vector resizing overhead.

I have to keep the intersection entries in sequence within each line
so I can navigate down the list when counting the figures.

Right now, I binary search the lists to find the positions where new
intersection records are to be inserted- this becomes slow as one
would expect.

I'd like to use something like a hash table, keyed to the coordinates
of the intersection.  However, when adding a new intersection, I have
to search for the nearest existing intersection point on the line so I
can insert before or after it as the case may be.  This requires some
kind of next/previous operation and "nearness" search that it doesn't
seem hash tables readily support.

A one record/node tree has obvious overhead problems (I need minimum
overhead so space is left for the intersection records themselves),
not to mention balancing issues to keep it efficient.

I've made a pretty good start on an n-ary tree, which looks like it
should work OK (once I find that dumb rebalancing bug...), but I was
wondering if I've missed some other more clever approach to the
problem.

As I see it, I need somthing that will allow me to accumulate a sparse
set of keys, efficiently handle disordered if not random searches with
"nearness" instead of "equivalence" or "equal" as the criteria,
disordered inserts, and lastly, preserve next/previous order at all
times.  Perhaps the ordering requirement could be deferred to a
"post-indexing" state.

A database is a natural choice, but I'd like to keep this in-core for
the sake of Lisp practice if nothing else.

Thanks for any pointers,

Greg Menke

Re: Need something like a "fuzzy" hash table Joe Marshall
- Re: Need something like a "fuzzy" hash table Greg Menke
Re: Need something like a "fuzzy" hash table Erann Gat
Re: Need something like a "fuzzy" hash table Lieven Marchand
- Re: Need something like a "fuzzy" hash table Steven M. Haflich
  - Re: Need something like a "fuzzy" hash table Jeff Greif
Re: Need something like a "fuzzy" hash table Dave Bakhash
- Re: Need something like a "fuzzy" hash table Greg Menke

From: Joe Marshall
Subject: Re: Need something like a "fuzzy" hash table
Date: Tue, 31 Dec 2002 21:46:28 +0000
Message-ID: <1y3xkgjv.fsf@ccs.neu.edu>

Greg Menke <··········@toadmail.com> writes:

> Right now, I binary search the lists to find the positions where new
> intersection records are to be inserted- this becomes slow as one
> would expect.

I'm not sure I understand how one would `binary search' a list.  The
method that pops into my mind is something like this:
  (let* ((midpoint (floor (length list) 2))
         (probe    (elt list midpoint)))
    (if (> key probe)
         (....search last half of list....)
         (....search first half of list ....)))

But that would be far slower than even a linear search!

> As I see it, I need somthing that will allow me to accumulate a sparse
> set of keys, efficiently handle disordered if not random searches with
> "nearness" instead of "equivalence" or "equal" as the criteria,
> disordered inserts, and lastly, preserve next/previous order at all
> times.  Perhaps the ordering requirement could be deferred to a
> "post-indexing" state.

How about a red-black tree?

From: Greg Menke
Subject: Re: Need something like a "fuzzy" hash table
Date: Tue, 31 Dec 2002 22:24:27 +0000
Message-ID: <m3d6nhomhw.fsf@europa.pienet>

Joe Marshall <···@ccs.neu.edu> writes:

> Greg Menke <··········@toadmail.com> writes:
> 
> > Right now, I binary search the lists to find the positions where new
> > intersection records are to be inserted- this becomes slow as one
> > would expect.
> 
> I'm not sure I understand how one would `binary search' a list.  The
> method that pops into my mind is something like this:
>   (let* ((midpoint (floor (length list) 2))
>          (probe    (elt list midpoint)))
>     (if (> key probe)
>          (....search last half of list....)
>          (....search first half of list ....)))
> 
> But that would be far slower than even a linear search!

Thats pretty much what I did.  Actually my question didn't express it
well- each intersection record keeps a reference to the previous and
next points.  The list items themselves are not ordered, the links
between them provide the order.  So its more of a doubly-linked list
that happens to be contained in a list.  I search by traversing the
links and simply add to the head.  Sorry for not being more clear...

> > As I see it, I need somthing that will allow me to accumulate a sparse
> > set of keys, efficiently handle disordered if not random searches with
> > "nearness" instead of "equivalence" or "equal" as the criteria,
> > disordered inserts, and lastly, preserve next/previous order at all
> > times.  Perhaps the ordering requirement could be deferred to a
> > "post-indexing" state.
> 
> How about a red-black tree?

Or an n-ary tree.  I was wondering about taking advantage of some
subtle property of hash tables or something else built in.

Gregm

From: Erann Gat
Subject: Re: Need something like a "fuzzy" hash table
Date: Wed, 01 Jan 2003 03:27:45 +0000
Message-ID: <gat-3112021928310001@192.168.1.51>

In article <··············@europa.pienet>, Greg Menke
<··········@toadmail.com> wrote:

> A database is a natural choice, but I'd like to keep this in-core for
> the sake of Lisp practice if nothing else.

Just by way of clarification, the use of a database does not necessarily
imply that the information is not "in core."  Many databases let you
create in-memory tables, and even if a table is on disk, a good database
will cache the relevant parts "in core" anyway.

E.

From: Lieven Marchand
Subject: Re: Need something like a "fuzzy" hash table
Date: Tue, 31 Dec 2002 23:33:06 +0000
Message-ID: <87bs31darx.fsf@wyrd.be>

Greg Menke <··········@toadmail.com> writes:

> As I see it, I need somthing that will allow me to accumulate a sparse
> set of keys, efficiently handle disordered if not random searches with
> "nearness" instead of "equivalence" or "equal" as the criteria,
> disordered inserts, and lastly, preserve next/previous order at all
> times.  Perhaps the ordering requirement could be deferred to a
> "post-indexing" state.
> 
> A database is a natural choice, but I'd like to keep this in-core for
> the sake of Lisp practice if nothing else.

From what little I remember of computational geometry, you might be
able to use quad trees. In any case, I'm fairly sure this is a well
studied problem so a library visit might be helpfull.

-- 
People don't bore me. I like people. - Really? All of them? - All of them. -
Even the creepy ones? - Nobody's creepy from the inside, Hazel.  Some of them 
are sad, and some of them hurt, and some of them think they're the only real 
thing in the whole world. But they're not creepy.         --  Hazel and Death

From: Steven M. Haflich
Subject: Re: Need something like a "fuzzy" hash table
Date: Wed, 01 Jan 2003 01:17:45 +0000
Message-ID: <3E1241E6.8040208@alum.mit.edu>

Lieven Marchand wrote:

> From what little I remember of computational geometry, you might be
> able to use quad trees. In any case, I'm fairly sure this is a well
> studied problem so a library visit might be helpfull.

Indeed, yes!  There is probably a quad tree or something like
it inside any screen UI that needs to implement mouse sensitivity
against a large number of objects.  The implementation would
generally be two btrees, with objects linked into both according
to their {x,y} ordinates.  Given such a structure, code can easily
answer questions like "Find all objects within some delta of
{x0,y0}".

You (Greg) might possibly find example code inside free clim, for
example, if you can find the sources...

From: Jeff Greif
Subject: Re: Need something like a "fuzzy" hash table
Date: Wed, 01 Jan 2003 17:05:16 +0000
Message-ID: <gdFQ9.190921$qF3.13292@sccrnsc04>

I don't think a quad-tree is needed, since each line keeps a list of its
intersections with other lines.  The ordering of these intersections can be
done based on x- or y-coordinate alone.  A quad-tree or one of the
alternatives developed for variant problems would be useful if you wanted to
keep all the intersections for all the lines in a single data structure.
Such a data structure might simplify the polygon-finding algorithm or at
least speed it up.

To keep the ordered list of intersections with a single line, a heap or
red-black tree would probably do the job nicely.

Jeff

From: Dave Bakhash
Subject: Re: Need something like a "fuzzy" hash table
Date: Fri, 03 Jan 2003 14:57:05 +0000
Message-ID: <c29lm2246ym.fsf@no-knife.mit.edu>

Greg Menke <··········@toadmail.com> writes:

> I'd like to use something like a hash table, keyed to the coordinates
> of the intersection.  However, when adding a new intersection, I have
> to search for the nearest existing intersection point on the line so I
> can insert before or after it as the case may be.  This requires some
> kind of next/previous operation and "nearness" search that it doesn't
> seem hash tables readily support.

The thing that comes to my mind are skiplists:

 http://mit.edu/cadet/www/skiplists.pdf

hope that helps.

dave

From: Greg Menke
Subject: Re: Need something like a "fuzzy" hash table
Date: Fri, 03 Jan 2003 18:42:47 +0000
Message-ID: <m3adiioz14.fsf@europa.pienet>

Dave Bakhash <·····@alum.mit.edu> writes:

> Greg Menke <··········@toadmail.com> writes:
> 
> > I'd like to use something like a hash table, keyed to the coordinates
> > of the intersection.  However, when adding a new intersection, I have
> > to search for the nearest existing intersection point on the line so I
> > can insert before or after it as the case may be.  This requires some
> > kind of next/previous operation and "nearness" search that it doesn't
> > seem hash tables readily support.
> 
> The thing that comes to my mind are skiplists:
> 
>  http://mit.edu/cadet/www/skiplists.pdf

Skiplists are pretty neat.  I've been concentrating on a tree with
vectors of items in each node, mostly to minimize per-item overhead.
I'll look at them again, rebalancing the tree is kind of annoying...

Gregm