Benchmarks: mapping lists vs. hash-tables (longish)

From: Len Charest
Subject: Benchmarks: mapping lists vs. hash-tables (longish)
Date: Thu, 17 Sep 1992 18:40:02 +0000
Message-ID: <1992Sep17.184002.7949@jpl-devvax.jpl.nasa.gov>

I have a large (~1000) set of objects that I want to map over. The order of the mapping is irrelevant, but the speed of the mapping is critical. What type of aggregate storage should I use for the set? 

Since the set of objects is dynamic, I'll limit the storage options to list and hash-table. So I'm really asking, Which data type is faster for mapping, list or hash-table--not in general but specifically for my application?

I'm assuming that the answer depends on the Lisp implementation I'm using, so I'd like to write some benchmarking code to test mapping performance of lists and hash-tables. Unfortunately, I'm next to ignorant about how to properly benchmark such performance. What should the code do? What should the state of the garbage collector be? The state of the Lisp image (fresh vs. 'stale')? 

Below I've inculded some code I wrote to test list-vs-hash-table mapping performance in my Lisp environment (Lucid 4.0.2 on Sparc2). Before testing I hypothesized that hash-table mapping would be faster, since hash-table storage would be implemented as consecutive memory locations (a blatant assumption) and locality of reference would ensure speedier performance than lists. Comments on the results?

Notes: code was compiled in 'development' mode of compiler (compilation-speed=3, speed=2, safety=3, space=0); compiled code was executed in a fresh Lisp image; the ephemeral GC was on.

----------------- benchmark code -----------------
(in-package "USER")

(defvar *list*)
(defvar *table*)
(defvar *size*)

(defun init-test (size) 
  (setq *size* size
	*list* nil
	*table* (make-hash-table :size *size*
				 :test #'eq))
  (dotimes (i *size*)
    (push (gensym) *list*))
  (dotimes (i *size*)
    (let ((x (gensym)))
      (setf (gethash x *table*) x))))

(defun run-test ()
  (format t "~%Size: ~d~%List:~%" *size*)
  (time (dotimes (i 1000)
	  (mapc #'ignore *list*)))
  (format t "~%Hash-table:~%")
  (time (dotimes (i 1000)
	  (maphash #'ignore *table*)))
  (fresh-line)
  (values)) 

----------------- results ------------------------
;;; Loading binary file "/home/users/charest/lisp/table-vs-list.sbin"
#P"/home/users/charest/lisp/table-vs-list.sbin"
> (init-test 100)
NIL
> (run-test)
Size: 100
List:
Elapsed Real Time = 0.12 seconds
Total Run Time    = 0.10 seconds
User Run Time     = 0.10 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          3
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =          0

Hash-table:
Elapsed Real Time = 0.30 seconds
Total Run Time    = 0.30 seconds
User Run Time     = 0.30 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          1
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =          0
> (init-test 1000)
NIL
> (run-test)
Size: 1000
List:
Elapsed Real Time = 1.02 seconds
Total Run Time    = 0.99 seconds
User Run Time     = 0.99 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          0
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =          0

Hash-table:
Elapsed Real Time = 2.92 seconds
Total Run Time    = 2.92 seconds
User Run Time     = 2.92 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          0
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =          0
> (init-test 10000)
NIL
> (run-test)
Size: 10000
List:
Elapsed Real Time = 16.37 seconds
Total Run Time    = 16.36 seconds
User Run Time     = 16.36 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          0
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =          0

Hash-table:
Elapsed Real Time = 32.06 seconds
Total Run Time    = 31.95 seconds
User Run Time     = 31.95 seconds
System Run Time   = 0.00 seconds
Process Page Faults    =          0
Dynamic Bytes Consed   =          0
Ephemeral Bytes Consed =          0

..................................................
                                  Len Charest, Jr.
                 JPL Artificial Intelligence Group
                          ·······@aig.jpl.nasa.gov

Re: Benchmarks: mapping lists vs. hash-tables (longish) Barry Margolin
Re: Benchmarks: mapping lists vs. hash-tables (longish) Andy Wilson
- Re: Benchmarks: mapping lists vs. hash-tables (longish) Len Charest
  - Re: Benchmarks: mapping lists vs. hash-tables (longish) Barry Margolin

From: Barry Margolin
Subject: Re: Benchmarks: mapping lists vs. hash-tables (longish)
Date: Thu, 17 Sep 1992 22:59:06 +0000
Message-ID: <19b2jqINNs8b@early-bird.think.com>

In article <·····················@jpl-devvax.jpl.nasa.gov> ·······@aig.jpl.nasa.gov writes:
>Since the set of objects is dynamic, I'll limit the storage options to
>list and hash-table. So I'm really asking, Which data type is faster for
>mapping, list or hash-table--not in general but specifically for my
>application?

>Below I've inculded some code I wrote to test list-vs-hash-table mapping
>performance in my Lisp environment (Lucid 4.0.2 on Sparc2). Before testing
>I hypothesized that hash-table mapping would be faster, since hash-table
>storage would be implemented as consecutive memory locations (a blatant
>assumption) and locality of reference would ensure speedier performance
>than lists. Comments on the results?

For mapping, I would expect lists and arrays to be faster than hash tables.
Hash tables are optimized for lookup, not sequential traversal.  Hash
tables elements are *not* usually stored sequentially; hash tables work
best when the capacity of the table is significantly larger than the number
of elements, so that collisions are rare.  It therefore takes more work to
find the next element of the hash table.  Also, MAPHASH calls the function
with two arguments (the key and value) while MAPC only calls with one
argument, so the function call overhead will be greater.

Why do you not expect list elements to be stored as consecutive memory
locations?  Most CONS implementations allocate cons cells consecutively
from the heap, so allocating a list in a tight loop like yours is likely to
result in good locality; if you instead do lots of other computation in
between pushing elements onto the list you're likely to get performance
more representative of that in actual programs.  However, copying garbage
collectors (which are used in most modern Lisp implementations) usually
cause related objects to be brought together even if they weren't created
consecutively, resulting in good locality again.

P.S. Please limit your lines to under 80 characters.  It's tough to read
lines that wrap in the middle of words, and some people use news readers
that don't wrap at all.
-- 
Barry Margolin
System Manager, Thinking Machines Corp.

······@think.com          {uunet,harvard}!think!barmar

From: Andy Wilson
Subject: Re: Benchmarks: mapping lists vs. hash-tables (longish)
Date: Thu, 17 Sep 1992 23:18:15 +0000
Message-ID: <19b3nnINNq6h@early-bird.think.com>

In article <·····················@jpl-devvax.jpl.nasa.gov>, ·······@Aig.Jpl.Nasa.Gov (Len Charest) writes:

...

|> 
|> Below I've inculded some code I wrote to test list-vs-hash-table mapping performance in my Lisp environment (Lucid 4.0.2 on Sparc2). Before testing I hypothesized that hash-table mapping would be faster, since hash-table storage would be implemented as co|> nsecutive memory locations (a blatant assumption) and locality of reference would ensure speedier performance than lists. Comments on the results?

In most Lisp implementations,  hash tables are longer than the number of
elements stored,  so the timing for the MAPHASH operation includes the overhead
of mapping over the empty buckets.  There's a tradeoff between space usage
and access speed.  Optimizing for space would result in faster mapping.  Since
it is important that hash tables be fast,  MAPHASH is a non-optimal way to map over a collection.

|> 
|> Notes: code was compiled in 'development' mode of compiler (compilation-speed=3, speed=2, safety=3, space=0); compiled code was executed in a fresh Lisp image; the ephemeral GC was on.

Also, for benchmarking,  use the production mode of the compiler (compilation-speed = 0).

From: Len Charest
Subject: Re: Benchmarks: mapping lists vs. hash-tables (longish)
Date: Fri, 18 Sep 1992 18:12:30 +0000
Message-ID: <1992Sep18.181230.8993@jpl-devvax.jpl.nasa.gov>

In article <············@early-bird.think.com>, ···@Think.COM (Andy Wilson) writes:
|> In article <·····················@jpl-devvax.jpl.nasa.gov>, ·······@Aig.Jpl.Nasa.Gov (Len Charest) writes:
|> |> Notes: code was compiled in 'development' mode of compiler 
|> |> (compilation-speed=3, speed=2, safety=3, space=0); compiled code 
|> |> was executed in a fresh Lisp image; the ephemeral GC was on.
|> 
|> Also, for benchmarking,  use the production mode of the compiler 
|> (compilation-speed = 0).
|> 

Why? We run our application using code compiled in development mode.
Shouldn't I benchmark under the same conditions?
..................................................
                                  Len Charest, Jr.
                 JPL Artificial Intelligence Group
                          ·······@aig.jpl.nasa.gov

From: Barry Margolin
Subject: Re: Benchmarks: mapping lists vs. hash-tables (longish)
Date: Fri, 18 Sep 1992 22:06:26 +0000
Message-ID: <19djt2INNdmj@early-bird.think.com>

In article <·····················@jpl-devvax.jpl.nasa.gov> ·······@aig.jpl.nasa.gov writes:
>In article <············@early-bird.think.com>, ···@Think.COM (Andy Wilson) writes:
>|> Also, for benchmarking,  use the production mode of the compiler 
>|> (compilation-speed = 0).
>Why? We run our application using code compiled in development mode.
>Shouldn't I benchmark under the same conditions?

Yes.  But if you're concerned about performance, why aren't you using the
optimizing compiler to generate your application code?
-- 
Barry Margolin
System Manager, Thinking Machines Corp.

······@think.com          {uunet,harvard}!think!barmar