Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?

From: dpapathanasiou
Subject: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Fri, 03 Nov 2006 18:43:25 +0000
Message-ID: <1162579405.362079.80040@k70g2000cwa.googlegroups.com>

I have a hash table I want to serialize as an S-expression using PRIN1.

Since PRIN1 does not preserve the hash contents to the output file (it
writes just the hash handle, e.g. "#<EQUALP hash table, 11723 entries
{581579ED}>"), I've written a simple routine to convert the hash to a
two dimensional array instead:

(defun hash-to-2D-array (hash)
  (let ((elems (hash-table-count hash)))
    (when (> elems 0)
      (let ((index (make-array (list elems 2)))
	    (counter 0))
	(maphash #'(lambda (k v)
		     (setf (aref index counter 0) k)
		     (setf (aref index counter 1) v)
		     (incf counter))
		 hash)
	index))))

Because the hash table is simple (each key is a string and each value
is a list of strings), using PRIN1 on the resulting array does
serialize all the content.

Although it's functional, this conversion method is slow: for example,
even for hash tables containing around 25,000 entries, it takes several
seconds to produce the array.

Is there a better way of writing (hash-to-2D-array) or is there a
better way to serialize a simple hash table without having to convert
to an array first?

Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? Zach Beane
- Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? dpapathanasiou
Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? Pascal Bourguignon
- Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? dpapathanasiou
  - Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? Otto Diesenbacher
    - Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? dpapathanasiou
  - Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? Pascal Bourguignon
    - Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? dpapathanasiou
      - Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? Pascal Bourguignon
  - Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? Szymon
  - Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)? Rob Thorpe

From: Zach Beane
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Fri, 03 Nov 2006 20:40:42 +0000
Message-ID: <m3y7qsuuut.fsf@unnamed.xach.com>

"dpapathanasiou" <···················@gmail.com> writes:

> Although it's functional, this conversion method is slow: for example,
> even for hash tables containing around 25,000 entries, it takes several
> seconds to produce the array.

It seems likely to me that producing the array is very fast, but
printing it (especially if *PRINT-PRETTY* is T) will take some time.

What happens if you do something like:

  (progn (hash-to-2D-array hash) nil)

Is it faster than just (hash-to-2D-array hash)?

Zach

From: dpapathanasiou
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Fri, 03 Nov 2006 21:15:16 +0000
Message-ID: <1162588515.939080.224470@m73g2000cwd.googlegroups.com>

Zach Beane wrote:
> What happens if you do something like:
>
>   (progn (hash-to-2D-array hash) nil)
>
> Is it faster than just (hash-to-2D-array hash)?

Zach, that was brilliant!

You're right, the array conversion is fast, but calling PRIN1 with a
large array object causes the delay.

Although *PRINT-PRETTY* was set to T, I didn't think it mattered in
prod because I run all my complied code in -quiet mode.

So even without changing the default *PRINT-PRETTY* value (though I
need to research that a bit more), wrapping the PRIN1 call around
(progn ... nil) made a *huge* difference in performance.

Many thanks for pointing it out.

From: Pascal Bourguignon
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Fri, 03 Nov 2006 19:58:12 +0000
Message-ID: <877iycnvzf.fsf@thalassa.informatimago.com>

"dpapathanasiou" <···················@gmail.com> writes:

> I have a hash table I want to serialize as an S-expression using PRIN1.
>
> Since PRIN1 does not preserve the hash contents to the output file (it
> writes just the hash handle, e.g. "#<EQUALP hash table, 11723 entries
> {581579ED}>"), I've written a simple routine to convert the hash to a
> two dimensional array instead:
>
> (defun hash-to-2D-array (hash)
>   (let ((elems (hash-table-count hash)))
>     (when (> elems 0)
>       (let ((index (make-array (list elems 2)))
> 	    (counter 0))
> 	(maphash #'(lambda (k v)
> 		     (setf (aref index counter 0) k)
> 		     (setf (aref index counter 1) v)
> 		     (incf counter))
> 		 hash)
> 	index))))
>
> Because the hash table is simple (each key is a string and each value
> is a list of strings), using PRIN1 on the resulting array does
> serialize all the content.
>
> Although it's functional, this conversion method is slow: for example,
> even for hash tables containing around 25,000 entries, it takes several
> seconds to produce the array.
>
> Is there a better way of writing (hash-to-2D-array) or is there a
> better way to serialize a simple hash table without having to convert
> to an array first?

I don't understand.  Do you need to print a hash table, or do you need
to convert it to a 2D array?

(defpackage "MINE" (:use "CL"))

(defun mine::hash-table (&key size test rehash-size rehash-threshold elements)
  (let ((table (make-hash-table :size size :test test
                                 :rehash-size rehash-size
                                 :rehash-threshold rehash-threshold)))
     (dolist (item elements table)
       (setf (gethash (first item) table) (second item)))))

(defmethod print-object ((self hash-table) stream)
  (format stream "#.(MINE::HASH-TABLE :SIZE ~D :TEST (FUNCTION ~S) ~%~
                ~&                    :REHASH-SIZE ~A :REHASH-THRESHOLD ~A~%~
                ~&   :ELEMENTS '("
                 (hash-table-count self) (hash-table-test self)
                 (hash-table-rehash-size self) (hash-table-rehash-threshold self))
  (maphash (lambda (k v) (format stream "~%(~S ~S)" k v)) self)
  (format stream "))")
  self)


(print-object (hash-table :size 3 :test (function equal)
                          :rehash-size 1.5 :rehash-threshold 0.75 
                          :elements '((a 1) (b 2) (c 3))) t)
prints:
#.(MINE::HASH-TABLE :SIZE 3 :TEST (FUNCTION EXT:FASTHASH-EQUAL) 
                    :REHASH-SIZE 1.5S0 :REHASH-THRESHOLD 0.75s0
   :ELEMENTS '(
(C 3)
(B 2)
(A 1)))
returns:
#S(HASH-TABLE :TEST EXT:FASTHASH-EQUAL (C . 3) (B . 2) (A . 1))

Of course, the easiest way would be to use the right implementation...


[53]> (let ((h (mine::hash-table :size 25000 :test (function EXT:FASTHASH-EQUAL) 
                    :rehash-size 1.5s0 :rehash-threshold 0.75s0
   :elements (loop :for i :from 0 :below 25000 :collect (list (format nil "~R" i) i)))))
(time (prog1 nil (with-output-to-string (out) (print-object h out)))))
Real time: 2.427929 sec.
Run time: 2.416151 sec.
Space: 44662496 Bytes
GC: 21, GC time: 0.832054 sec.
NIL
[54]> 


2.5 seconds not even compiled!

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
        Un chat errant
se soulage
        dans le jardin d'hiver
                                        Shiki

From: dpapathanasiou
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Fri, 03 Nov 2006 20:17:57 +0000
Message-ID: <1162585077.236399.16840@b28g2000cwb.googlegroups.com>

Pascal Bourguignon wrote:
> I don't understand.  Do you need to print a hash table, or do you need
> to convert it to a 2D array?

The goal is to be able to serialize the hash table to a file, so that
it can be read back into memory as the same hash table object at a
later time, either by the original process or a completely different
process.

The nice thing about converting the hash to an array is that when I
read the file containing the array (i.e. the file created by passing
the array to PRIN1), it is loaded and recongized immediately as an
array object -- of course, I then have to convert the array to a hash
table to get back to the point I want.

So while I recognize that your suggestion will indeed print the hash
contents, what happens when I try to read that output back into memory?

I'll still need some kind of conversion function to get me back to the
original hash table object, correct?

From: Otto Diesenbacher
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Fri, 03 Nov 2006 22:32:14 +0000
Message-ID: <87psc440wh.fsf@gmail.com>

"dpapathanasiou" <···················@gmail.com> writes:

> The goal is to be able to serialize the hash table to a file, so that
> it can be read back into memory as the same hash table object at a
> later time, either by the original process or a completely different
> process.
[...]
> I'll still need some kind of conversion function to get me back to the
> original hash table object, correct?

just for serialization and deserialization - do you know:
http://common-lisp.net/project/cl-store/

        okflo      

-- 
Otto Karl Florian Diesenbacher

From: dpapathanasiou
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Sat, 04 Nov 2006 17:52:12 +0000
Message-ID: <1162662732.628062.202840@h48g2000cwc.googlegroups.com>

Otto Diesenbacher wrote:
> just for serialization and deserialization - do you know:
> http://common-lisp.net/project/cl-store/

Thanks for mentioning this -- I'll definitely try it.

From: Pascal Bourguignon
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Fri, 03 Nov 2006 23:18:27 +0000
Message-ID: <87d584qfuk.fsf@thalassa.informatimago.com>

"dpapathanasiou" <···················@gmail.com> writes:

> Pascal Bourguignon wrote:
>> I don't understand.  Do you need to print a hash table, or do you need
>> to convert it to a 2D array?
>
> The goal is to be able to serialize the hash table to a file, so that
> it can be read back into memory as the same hash table object at a
> later time, either by the original process or a completely different
> process.
>
> The nice thing about converting the hash to an array is that when I
> read the file containing the array (i.e. the file created by passing
> the array to PRIN1), it is loaded and recongized immediately as an
> array object -- of course, I then have to convert the array to a hash
> table to get back to the point I want.
>
> So while I recognize that your suggestion will indeed print the hash
> contents, what happens when I try to read that output back into memory?
>
>
> I'll still need some kind of conversion function to get me back to the
> original hash table object, correct?

Perhaps you should read my previous post better...

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.

From: dpapathanasiou
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Sat, 04 Nov 2006 17:24:59 +0000
Message-ID: <1162661099.671583.8510@f16g2000cwb.googlegroups.com>

Pascal Bourguignon wrote:
> Perhaps you should read my previous post better...

Well, ever since I saw this post of yours (and I'm sure you were
probably joking):
http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/88a36fb2c239a44e/4c95766a8bdda347?lnk=gst&q=&rnum=63#4c95766a8bdda347
I'm less likely to read your replies very thoroughly.

From: Pascal Bourguignon
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Sat, 04 Nov 2006 18:10:45 +0000
Message-ID: <8764dvytei.fsf@thalassa.informatimago.com>

"dpapathanasiou" <···················@gmail.com> writes:

> Pascal Bourguignon wrote:
>> Perhaps you should read my previous post better...
>
> Well, ever since I saw this post of yours (and I'm sure you were
> probably joking):
> http://groups.google.com/group/comp.lang.lisp/browse_thread/thread/88a36fb2c239a44e/4c95766a8bdda347?lnk=gst&q=&rnum=63#4c95766a8bdda347
> I'm less likely to read your replies very thoroughly.

This post was written to motivate, on the contrary, more VERY throughful reading.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

THIS IS A 100% MATTER PRODUCT: In the unlikely event that this
merchandise should contact antimatter in any form, a catastrophic
explosion will result.

From: Szymon
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Sat, 04 Nov 2006 13:57:10 +0000
Message-ID: <eii68i$fll$1@atlantis.news.tpi.pl>

dpapathanasiou wrote:

> I'll still need some kind of conversion function to get me back to the
> original hash table object, correct?

It's already in Pascal's code.

From: Rob Thorpe
Subject: Re: Most efficient way of converting a hash table to a 2D array (or other easily serializable object)?
Date: Mon, 06 Nov 2006 12:50:19 +0000
Message-ID: <1162817419.299570.4360@i42g2000cwa.googlegroups.com>

dpapathanasiou wrote:
> Pascal Bourguignon wrote:
> > I don't understand.  Do you need to print a hash table, or do you need
> > to convert it to a 2D array?
>
> The goal is to be able to serialize the hash table to a file, so that
> it can be read back into memory as the same hash table object at a
> later time, either by the original process or a completely different
> process.
>
> The nice thing about converting the hash to an array is that when I
> read the file containing the array (i.e. the file created by passing
> the array to PRIN1), it is loaded and recongized immediately as an
> array object -- of course, I then have to convert the array to a hash
> table to get back to the point I want.
>
> So while I recognize that your suggestion will indeed print the hash
> contents, what happens when I try to read that output back into memory?
>
>
> I'll still need some kind of conversion function to get me back to the
> original hash table object, correct?

Pascal's code provides this.  It serializes the hash table to a list
which becomes a hash-table again when read.

I agree with you about treating Pascal's suggestions with some care,
occasionally stuff he says is crazy, but mostly he knows what he's
talking about.