Reducing the size of a Hash table....

From: Kenneth P. Turvey
Subject: Reducing the size of a Hash table....
Date: Tue, 21 Jul 1998 00:00:00 +0000
Message-ID: <slrn6r8drs.i5g.kturvey@www.sprocketshop.com>

I am writing (or continuing to write) a program to solve a Rubic's cube
given any initial state.  A method of improving the performance of this
search was suggested by David Lee Winston Miller in his paper Solving
Rubic's Cube Using the Bestfast Algorithm and Profile tables
(http://www.sunyit.edu/~millerd1/RUBIK.HTM).  

I have avoided referencing the source code so please forgive me if some
of my questions would be answered there.  His implementation is in
prolog so much of it may not be applicable in any case.  

The problem I am running into involves a large hash table that is used
to improve the heuristic for the search.  The hash table has entries
that relate cube configurations to their distance from a solved cube for
small distances.  

(My cube looks like this)  ->  (It is six moves from a solution) 

I would like to be able to store as many of these entries as possible in
the hash table for obvious reasons.  

The problems involved are:

1) Each cube configuration is a member of a set of 24 cubes
which are obviously equidistant from the solution (based on the 24
possible orientations of the cube).  I would like to be able to have one
entry in the hash table to represent all 24 of these cubes, but this
requires manipulating the cube and taking the hash for each
configuration.  Is there any other way to do this?  

2) Upon getting the 24 hashes, how may I compare them?  Looking at the
hyperspec there doesn't seem to be a specific type type for the return
value of gethash.  Have I missed something?  I would like to be able to
select the configuration with the lowest (or highest) hash value and
only use this one in the table.  

3) How may I dump the table to disk for future use?  I don't want to
have to recalculate all these hash values every time.  Is there anyway
to do this?  

Thanks, 

I am relatively new to Lisp (considering the size of the language), so
please forgive me if this is obvious.  I have never used hash tables
before in a program of greater than "hello world" complexity.  

-- 
Kenneth P. Turvey <·······@pug1.SprocketShop.com> 

Anyone who challenges the prevailing orthodoxy finds himself silenced
with surprising effectiveness.  A genuinely unfashionable opinion is
almost never given a fair hearing.
	-- George Orwell

Re: Reducing the size of a Hash table.... Sunil Mishra
Re: Reducing the size of a Hash table.... Barry Margolin
- Re: Reducing the size of a Hash table.... Kenneth P. Turvey

From: Sunil Mishra
Subject: Re: Reducing the size of a Hash table....
Date: Tue, 21 Jul 1998 00:00:00 +0000
Message-ID: <efyemvfi8i1.fsf@cleon.cc.gatech.edu>

·······@www.sprocketshop.com (Kenneth P. Turvey) writes:

> I am writing (or continuing to write) a program to solve a Rubic's cube
> given any initial state.  A method of improving the performance of this
> search was suggested by David Lee Winston Miller in his paper Solving
> Rubic's Cube Using the Bestfast Algorithm and Profile tables
> (http://www.sunyit.edu/~millerd1/RUBIK.HTM).  

Never heard of the bestfast algorithm. Perhaps you mean best first search?
Anyway, the particular algorithm is irrelevant, since you question is
applicable to any kind of search you might employ.

> I have avoided referencing the source code so please forgive me if some
> of my questions would be answered there.  His implementation is in
> prolog so much of it may not be applicable in any case.  
> 
> The problem I am running into involves a large hash table that is used
> to improve the heuristic for the search.  The hash table has entries
> that relate cube configurations to their distance from a solved cube for
> small distances.  
> 
> (My cube looks like this)  ->  (It is six moves from a solution) 
> 
> I would like to be able to store as many of these entries as possible in
> the hash table for obvious reasons.  
> 
> The problems involved are:
> 
> 1) Each cube configuration is a member of a set of 24 cubes
> which are obviously equidistant from the solution (based on the 24
> possible orientations of the cube).  I would like to be able to have one
> entry in the hash table to represent all 24 of these cubes, but this
> requires manipulating the cube and taking the hash for each
> configuration.  Is there any other way to do this?  
> 
> 2) Upon getting the 24 hashes, how may I compare them?  Looking at the
> hyperspec there doesn't seem to be a specific type type for the return
> value of gethash.  Have I missed something?  I would like to be able to
> select the configuration with the lowest (or highest) hash value and
> only use this one in the table.  
> 
> 3) How may I dump the table to disk for future use?  I don't want to
> have to recalculate all these hash values every time.  Is there anyway
> to do this?  

This is a neat problem, and I haven't quite thought of these issues before,
so my answer may not be the best.

I don't think that hash tables are the most appropriate data structure for
your problem. Either that, or you have to redefine the concept of a state.

What I mean is that hash tables can only compare using eq, eql, equal and
equalp. Your program however requires you to make a completely different,
domain specific, type of comparison, for detecting equivalent cube
states. One solution then is to get rid of hash tables, and define your own
data structure for maintaining equivalent states. This is probably the best
solution if you want to maximize efficiency and all that.

The other alternative I can think of is to translate the state into some
canonical form, such that you are guaranteed that all equivalent states go
to the same canonical form. This can then be compared within a hash table,
and you will be able to pick the right operator given the state. The reason
this approach is inferior is that you will have to spend time and memory
doing a translation to a canonical form, rather than being able to compare
directly through your data structure.

The canonicalization solution suggests another variant of
rewrite-your-data-structure. You could write a hashing function that given
the state is able to compute the hash value of the canonical form of the
state. So two equivalent states would be guaranteed to generate the same
hash value.

Good luck!

Sunil

From: Barry Margolin
Subject: Re: Reducing the size of a Hash table....
Date: Tue, 21 Jul 1998 00:00:00 +0000
Message-ID: <Bx2t1.36$DF4.1268341@cam-news-reader1.bbnplanet.com>

In article <······················@www.sprocketshop.com>,
Kenneth P. Turvey <·······@www.sprocketshop.com> wrote:
>1) Each cube configuration is a member of a set of 24 cubes
>which are obviously equidistant from the solution (based on the 24
>possible orientations of the cube).  I would like to be able to have one
>entry in the hash table to represent all 24 of these cubes, but this
>requires manipulating the cube and taking the hash for each
>configuration.  Is there any other way to do this?  

Perhaps there's some way you can compute a canonical form for a cube
configuration representation.  This will transform all the equivalent
configurations into the same representation, which you would then hash.

>2) Upon getting the 24 hashes, how may I compare them?  Looking at the
>hyperspec there doesn't seem to be a specific type type for the return
>value of gethash.  Have I missed something?  I would like to be able to
>select the configuration with the lowest (or highest) hash value and
>only use this one in the table.  

I'm not sure what you mean by "specific type for the return value".
GETHASH returns whatever you stored in the hash table for that key.

If you're trying to find the internal hash code that the hash table uses,
that's not available as a standard interface (Symbolics CL actually uses
association arrays that were searched sequentially in its implementation of
small hash tables).  You can call SXHASH, but it may or may not use the
same algorithm as an EQUAL hash table (SXHASH has some requirements that
don't necessarily apply to the internal hash function of a table).

>3) How may I dump the table to disk for future use?  I don't want to
>have to recalculate all these hash values every time.  Is there anyway
>to do this?

The only portable way to dump a hash table is to dump the keys and values.
Recreating it will require recomputing the hash values of the keys.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Cambridge, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.

From: Kenneth P. Turvey
Subject: Re: Reducing the size of a Hash table....
Date: Tue, 21 Jul 1998 00:00:00 +0000
Message-ID: <slrn6r9uf4.kdb.kturvey@www.sprocketshop.com>

On Tue, 21 Jul 1998 15:33:53 GMT, Barry Margolin <······@bbnplanet.com> wrote:
>In article <······················@www.sprocketshop.com>,
>
>Perhaps there's some way you can compute a canonical form for a cube
>configuration representation.  This will transform all the equivalent
>configurations into the same representation, which you would then hash.

This I've tried to do, but there isn't an obvious solution.  I have some
ideas that work for the vast majority of cubes that essentially set up a
priority for the various faces of the cube.  Many cubes, however, do end up 
having equivalent priorities for more than one face using any system
I've devised.  

>I'm not sure what you mean by "specific type for the return value".
>GETHASH returns whatever you stored in the hash table for that key.
>
>If you're trying to find the internal hash code that the hash table uses,
>that's not available as a standard interface (Symbolics CL actually uses
>association arrays that were searched sequentially in its implementation of
>small hash tables).  You can call SXHASH, but it may or may not use the
>same algorithm as an EQUAL hash table (SXHASH has some requirements that
>don't necessarily apply to the internal hash function of a table).

I was referring to the actual internal hash code used by the
implementation.  I was afraid this was the case.  Maybe using SXHASH
would be the way to go.  I could reimplement hash tables that would
also allow dumping to disk.  I'll think about it. 

>The only portable way to dump a hash table is to dump the keys and values.
>Recreating it will require recomputing the hash values of the keys.

Well, I guess that is an improvement over doing all of the work
required, but I was hoping for a more elegant solution.  

Thanks for all your help, I really do appreciate it, 

-- 
Kenneth P. Turvey <·······@pug1.SprocketShop.com> 

The optimist thinks this is the best of all possible worlds. The
pessimist fears it is true.	
	-- Robert Oppenheimer