Generating code which compiles to a jmphash

From: ···········@yahoo.com
Subject: Generating code which compiles to a jmphash
Date: Mon, 24 Mar 2008 23:20:29 +0000
Message-ID: <1e48c01d-add0-417f-a5e3-1e07b2033de4@s13g2000prd.googlegroups.com>

Hi,
  I have a number of small utility functions which I would like to
compile
to JMPHASHs (to steal a term from clisp's (disassemble 'foo) output.
Feel free to correct my terminology.)  These functions are called
several million times.  I can and have transformed small, clear
functions
into ugly, hard to maintain case statements.

  A little background:  I'm coding a puzzle solver in my spare time,
for the challenge and as a way to learn lisp.  One part of the solver
is an end-game hash, which takes quite a while to compute.
Turning nice code into ugly code has made the code significantly
faster.

  A simple response of "learn to write macros" probably would not
be inappropriate.  On the other hand, I am at least starting to see
patterns in the code which I know can be abstracted away with macros,
and I'm hoping that a little shove in the right direction will make
that easier.

At the risk of too much information, I'll provide two examples.
Feel free to ignore the first one in your responses.  The first
is included as better example of the kind of transformation
I am interested in, but the second is fairly trivial, and a good
jumping off point for me.

(defconstant dirs '(up down left right))

(defun all-moves ()

  (loop for dir in dirs
     append
       (loop for i from 0 to 4
          collect (list i dir))))

(defun move-to-int (move)
  (+ (first move) (* 5 (position (second move) dirs))))

> (nth 12 (all-moves))
       => (2 left)
> (map 'list 'move-to-int (all-moves))
       => (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19)

Re: Generating code which compiles to a jmphash ···········@yahoo.com
- Re: Generating code which compiles to a jmphash Pascal Bourguignon
- Re: Generating code which compiles to a jmphash Thomas A. Russ
  - Re: Generating code which compiles to a jmphash Edi Weitz
    - Re: Generating code which compiles to a jmphash Thomas A. Russ
      - Re: Generating code which compiles to a jmphash Kent M Pitman
        Re: Generating code which compiles to a jmphash Pascal J. Bourguignon
        Re: Generating code which compiles to a jmphash ···········@yahoo.com
        Re: Generating code which compiles to a jmphash ···········@yahoo.com
        Re: Generating code which compiles to a jmphash Pascal J. Bourguignon
Re: Generating code which compiles to a jmphash Gene
- Re: Generating code which compiles to a jmphash ···········@yahoo.com
Re: Generating code which compiles to a jmphash Pascal Bourguignon
Re: Generating code which compiles to a jmphash vanekl

From: ···········@yahoo.com
Subject: Re: Generating code which compiles to a jmphash
Date: Mon, 24 Mar 2008 23:41:15 +0000
Message-ID: <341542d4-5b9a-4241-b7fb-52334e615571@s8g2000prg.googlegroups.com>

Oops.  I accidentially hit send before completing that post...

The problem with the above code is the function move-to-int.
Instead of calling (position ...) every time, I would like code
which compiles to a jmphash, as if I wrote it using case statements.


Here is the second example.

(defun old-char+ (char)
  (if (char= char #\9)
      #\A
      (int-char (1+ (char-int char)))))
vs
(defun char+ (char)
  (case char
    (#\1 #\2) (#\2 #\3) (#\3 #\4) (#\4 #\5) (#\5 #\6) (#\6 #\7)
    (#\7 #\8) (#\8 #\9) (#\9 #\A) (#\A #\B) (#\B #\C) (#\C #\D)
    (#\D #\E) (#\E #\F) (#\F #\G) (#\G #\H) (#\H #\I) (t   #\*)))

char+ results in a modest 5% improvement in code speed over old-char+.
I really don't care so much about that, or about finding the best
version of char+, but am interested in how you would write code
which given some range of inputs will compile to something similar
to the compiled code for char+.

Thanks.  Sorry for the split post.

From: Pascal Bourguignon
Subject: Re: Generating code which compiles to a jmphash
Date: Tue, 25 Mar 2008 00:39:22 +0000
Message-ID: <87r6dzsklh.fsf@thalassa.informatimago.com>

···········@yahoo.com writes:

> Oops.  I accidentially hit send before completing that post...
>
> The problem with the above code is the function move-to-int.
> Instead of calling (position ...) every time, I would like code
> which compiles to a jmphash, as if I wrote it using case statements.
>
>
> Here is the second example.
>
> (defun old-char+ (char)
>   (if (char= char #\9)
>       #\A
>       (int-char (1+ (char-int char)))))
> vs
> (defun char+ (char)
>   (case char
>     (#\1 #\2) (#\2 #\3) (#\3 #\4) (#\4 #\5) (#\5 #\6) (#\6 #\7)
>     (#\7 #\8) (#\8 #\9) (#\9 #\A) (#\A #\B) (#\B #\C) (#\C #\D)
>     (#\D #\E) (#\E #\F) (#\F #\G) (#\G #\H) (#\H #\I) (t   #\*)))
>
> char+ results in a modest 5% improvement in code speed over old-char+.
> I really don't care so much about that, or about finding the best
> version of char+, but am interested in how you would write code
> which given some range of inputs will compile to something similar
> to the compiled code for char+.
>
> Thanks.  Sorry for the split post.

Ok.

First, this kind of optimization will depend highly on the compiled
used.  The above CASE will generate faster code on clisp.  On sbcl,
with (declaim (optimize (speed 3))) you can get a faster char+ with
the following.


The JMPHASH is generated at the discretion of the compiler.  When it's
generated, it's probably the fastest.  You cannot really count on it.
I mean, there could be some non lexical items influencing the compiler
to generate different code, like a sequence of tests and jumps.

If you want to ensure O(1) processing of the various cases, while
maintaining portability, you must do it explicitely.  Fortunately, in
your case it's easy to do:


(cll:require "http://groups.google.com/group/comp.lang.lisp/msg/5218731887d2baf4") 
;-)


(defun gen-char-next (name order otherwise)
  `(defun ,name (char)
     (code-char , (destructuring-bind (minc . maxc)
                      (reduce (extremum (function <) :key (function char-code)) order
                              :initial-value (cons (aref order 0) (aref order 0)))
                    (let* ((base (char-code minc))
                           (size (- (char-code maxc)
                                    (char-code minc))) ; no offby1, maxc has no succ.
                           (vect  (make-array size :element-type 'integer
                                                   :initial-element (char-code otherwise)))
                           (vcode (gensym)))
                      (loop
                         :for i :from 0 :below (1- (length order))
                         :do (setf (aref vect (- (char-code (aref order i)) base))
                                   (char-code (aref order (1+ i)))))
                      `(let ((,vcode (char-code char)))
                         (if (<= ,base ,vcode ,(+ base size -1))
                             (aref ',vect (- ,vcode ,base))
                             ',(char-code otherwise))))))))

(gen-char-next 'char+ "123456789ABCDEFGHI" #\*)
-->
(DEFUN CHAR+ (CHAR)
 (CODE-CHAR
  (LET ((#1=#:G16919 (CHAR-CODE CHAR)))
   (IF (<= 49 #1# 72)
    (AREF '#(50 51 52 53 54 55 56 57 65 42 42 42 42 42 42 42 66 67 68 69 70 71 72 73)
     (- #1# 49))
    '42))))



Yep, by the way, you don't really need to learn about macros.  Just
plain normal lisp functions.  The only thing you need to know about
macros is:

(defmacro generate-char-next (name order otherwise) 
   (gen-char-next name order otherwise))

(generate-char-next char+ "123456789ABCDEFGHI" #\*)



Of course, substitution of expression will work the same in macros as
in functions, so if you prefer to write:

(defmacro generate-char-next (name order otherwise) 
   `(defun ...))

and skip the (defun gen-char-next ...), your call.



(loop :for i :from (char-code #\0) :to  (char-code #\Z) 
      :do (when (zerop (mod i 4)) (terpri))
      :do (princ " ")
      :do (prin1 `(,(code-char i) --> ,(char+ (code-char i)))))
prints:
 (#\0 --> #\*) (#\1 --> #\2) (#\2 --> #\3) (#\3 --> #\4)
 (#\4 --> #\5) (#\5 --> #\6) (#\6 --> #\7) (#\7 --> #\8)
 (#\8 --> #\9) (#\9 --> #\A) (#\: --> #\*) (#\; --> #\*)
 (#\< --> #\*) (#\= --> #\*) (#\> --> #\*) (#\? --> #\*)
 (··@ --> #\*) (#\A --> #\B) (#\B --> #\C) (#\C --> #\D)
 (#\D --> #\E) (#\E --> #\F) (#\F --> #\G) (#\G --> #\H)
 (#\H --> #\I) (#\I --> #\*) (#\J --> #\*) (#\K --> #\*)
 (#\L --> #\*) (#\M --> #\*) (#\N --> #\*) (#\O --> #\*)
 (#\P --> #\*) (#\Q --> #\*) (#\R --> #\*) (#\S --> #\*)
 (#\T --> #\*) (#\U --> #\*) (#\V --> #\*) (#\W --> #\*)
 (#\X --> #\*) (#\Y --> #\*) (#\Z --> #\*)
--> NIL


With:

(defun test () 
  (loop :repeat 1000000 
        :do (loop :for ch :across "0123456789:;<=>·@ABCDEFGHIJKLMNOPQRSTUVWXYZ" 
                  :do (char+ ch))))

and the CASE char+ above gives on my system:
  3.17 seconds of real time
  3.140196 seconds of user run time

while the AREF char+ gives:
  2.124 seconds of real time
  2.120133 seconds of user run time

I guess you could get even a faster function with (declaim (optimize
(safety 0) (debug 0)))...


With clisp, the CASE char+ is faster. You can use #+ in the macro, to
generate the best code on the various target platforms.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"You question the worthiness of my code? I should kill you where you
stand!"

From: Thomas A. Russ
Subject: Re: Generating code which compiles to a jmphash
Date: Tue, 25 Mar 2008 16:52:49 +0000
Message-ID: <ymiy786oie6.fsf@blackcat.isi.edu>

···········@yahoo.com writes:

> Oops.  I accidentially hit send before completing that post...
> 
> The problem with the above code is the function move-to-int.
> Instead of calling (position ...) every time, I would like code
> which compiles to a jmphash, as if I wrote it using case statements.
> 
> 
> Here is the second example.
> 
> (defun old-char+ (char)
>   (if (char= char #\9)
>       #\A
>       (int-char (1+ (char-int char)))))

Not quite sure what INT-CHAR/CHAR-INT do, but I assume they are similar
to CODE-CHAR?  In any case, this makes some unwarranted assumptions
about the encoding of the character set, namely that the characters for
the digits are contiguous and so are the (upper case) letters.  There
have historically been encodings for which this is not true (see, for
example EBCDIC).

For a nice, encoding independent method, you could do something like the
following (not necessarily efficient) method:

(defun universal-char+ (char &optional (base 36))
  (let ((*print-base* base))
    (char (princ-to-string (1+ (digit-char-p char base))) 0)))


CL-USER> (universal-char+ #\7)
#\8
CL-USER> (universal-char+ #\j)
#\k
CL-USER> (universal-char+ #\J)
#\k

> vs
> (defun char+ (char)
>   (case char
>     (#\1 #\2) (#\2 #\3) (#\3 #\4) (#\4 #\5) (#\5 #\6) (#\6 #\7)
>     (#\7 #\8) (#\8 #\9) (#\9 #\A) (#\A #\B) (#\B #\C) (#\C #\D)
>     (#\D #\E) (#\E #\F) (#\F #\G) (#\G #\H) (#\H #\I) (t   #\*)))
> 
> char+ results in a modest 5% improvement in code speed over old-char+.
> I really don't care so much about that, or about finding the best
> version of char+, but am interested in how you would write code
> which given some range of inputs will compile to something similar
> to the compiled code for char+.

Well, the easiest specific answer would be to write a macro that
generates the case body that you want.

(defmacro build-char-case (name min max)
  (let ((*print-base* max))
     `(defun ,name (char)
        (case char
           ,@(loop for v from min below max
                   as ch = (char (princ-to-string v) 0)
                   collect `(,ch ,(universal-char+ ch)))
           (t #\*)))))

Try 

  (macroexpand-1 '(build-char-case char+ 1 18))

Insert char-upcase as desired.


-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Edi Weitz
Subject: Re: Generating code which compiles to a jmphash
Date: Tue, 25 Mar 2008 18:33:24 +0000
Message-ID: <uabkm8xhn.fsf@agharta.de>

On 25 Mar 2008 09:52:49 -0700, ···@sevak.isi.edu (Thomas A. Russ) wrote:

> Not quite sure what INT-CHAR/CHAR-INT do, but I assume they are
> similar to CODE-CHAR?

CHAR-INT is a standard Common Lisp function:

  http://www.lispworks.com/documentation/HyperSpec/Body/f_char_i.htm

Edi.

-- 

European Common Lisp Meeting, Amsterdam, April 19/20, 2008

  http://weitz.de/eclm2008/

Real email: (replace (subseq ·········@agharta.de" 5) "edi")

From: Thomas A. Russ
Subject: Re: Generating code which compiles to a jmphash
Date: Tue, 25 Mar 2008 22:47:01 +0000
Message-ID: <ymitziuo1zu.fsf@blackcat.isi.edu>

Edi Weitz <········@agharta.de> writes:

> On 25 Mar 2008 09:52:49 -0700, ···@sevak.isi.edu (Thomas A. Russ) wrote:
> 
> > Not quite sure what INT-CHAR/CHAR-INT do, but I assume they are
> > similar to CODE-CHAR?
> 
> CHAR-INT is a standard Common Lisp function:
> 
>   http://www.lispworks.com/documentation/HyperSpec/Body/f_char_i.htm

OK. Learn a new CL function every once-in-a-while.

I do note, though that since "If character has no implementation-defined
attributes, the results of char-int and char-code are the same.",
CHAR-INT might be an even worse choice than CHAR-CODE to use for this
purpose.

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Kent M Pitman
Subject: Re: Generating code which compiles to a jmphash
Date: Wed, 26 Mar 2008 02:47:17 +0000
Message-ID: <uiqzakxqi.fsf@nhplace.com>

···@sevak.isi.edu (Thomas A. Russ) writes:

> Edi Weitz <········@agharta.de> writes:
> 
> > On 25 Mar 2008 09:52:49 -0700, ···@sevak.isi.edu (Thomas A. Russ) wrote:
> > 
> > > Not quite sure what INT-CHAR/CHAR-INT do, but I assume they are
> > > similar to CODE-CHAR?
> > 
> > CHAR-INT is a standard Common Lisp function:
> > 
> >   http://www.lispworks.com/documentation/HyperSpec/Body/f_char_i.htm
> 
> 
> OK. Learn a new CL function every once-in-a-while.
> 
> I do note, though that since "If character has no implementation-defined
> attributes, the results of char-int and char-code are the same.",
> CHAR-INT might be an even worse choice than CHAR-CODE to use for this
> purpose.

[The following is offered out of context of the specific thread here.
 I don't know what a jmphash is.  I'm just commenting generally on the
 char-code vs char-int thing.]

In case it helps, for LispWorks...

(char-int #\control-a)  => 131137
(char-code #\control-a) => 65
(char-bits #\control-a) => 1

Then again...

(make-char #\a 16) => #\Shift-\a
(make-char #\A 16) => #\Shift-A

This could confuse some people, I suppose.  It's one of those things
where if it were a joke you'd say "you have to have been there".

[Of course, you (TAR) were there, and a lot of this that follows is probably
no surprise.  But it may be at least a little to others who are newer 
on the scene.]

It might be easier to understand if I did:

(format nil "~b" (char-int #\control-a)) =>     "100000000001000001"
(format nil "~b" (char-int #\Shift-\a )) => "1000000000000001100001"
(format nil "~b" (char-int #\Shift-A  )) => "1000000000000001000001"

That is, the function of the "shift" bit (and the control bit and
so on) is to modify a particular underlying character, so that a
particular gesture can be referred to.  But, as with the pathname
system, it had so many different systems to accommodate, and they
did not agree, so the set of things it ended up doing was overly 
expansive, and is probably best ignored for most complicated things.

One last oddity, still specific to LispWorks, though many systems have this
issue: ^A is the notation often used for old-style controlification, like
on ASCII keyboards, where ^I is tab, ^J is linefeed, ^L is form, ^M is return
because in ASCII those are the locations of TAB (09), LF (10), FF (12), 
and CR (13), for example.  In ASCII, code 01 (SOH) is referred to in that
notation as ^A, and is distinct from the modern control-A, mostly because
on the old keyboards, only certain characters were allowed to controlified;
the others just mapped onto those by having the old-style control key just
xor the character you pressed with #b11111 (31.) to get the resulting code.
So the letter A and the letter a both turned into just 1 (ASCII SOH) when 
you xor'd it with 31.  Because people didn't like that "folding", a control
bit was later added and it co-resided with the old scheme, allowing oddities
like the ability to say control-^A, which incidentally some keyboards could
generate (and Emacs wanted to be able to dispatch on) even though it makes
no sense on modern keyboards ... in part because codes 1-31 on those keyboards
were sometimes (as on the LispM) re-purposed to be graphical characters.

  #\               => #\SOH
  #\control-^a       => #\Control-SOH
  #\control-shift-^a => #\Shift-Control-SOH

  (format nil "~b" (char-int #\^A))               =>                      "1"
  (format nil "~b" (char-int #\control-^A))       =>     "100000000000000001"
  (format nil "~b" (char-int #\control-shift-^A)) => "1000100000000000000001"

If this all sounds like BS, it's because you're in the wrong character set.
Backspace (BS) is code 8, and actually was the character Lambda on the LispM.

Probably that's too long to make crisp sense and too brief to be
really useful, so maybe it doesn't help anyone know what these
characters do now.  But hopefully you can see that it was very useful
in transition among character sets from the old to the new to be able
to represent all the different modalities... it's increasingly
vestigial now, but I bet there are still stray uses of this and that,
which is why it's abstracted out so that people don't constantly rely
on assumptions about individual bits.

From: Pascal J. Bourguignon
Subject: Re: Generating code which compiles to a jmphash
Date: Wed, 26 Mar 2008 08:55:07 +0000
Message-ID: <7cbq51uaok.fsf@pbourguignon.anevia.com>

Kent M Pitman <······@nhplace.com> writes:
> [The following is offered out of context of the specific thread here.
>  I don't know what a jmphash is.  I'm just commenting generally on the
>  char-code vs char-int thing.]

JMPHASH is the clisp VM instruction used to implement CASE.  

Basically, a  (go (aref arg2 (gethash arg0 arg1)))


C/USER[145]> (disassemble (compile nil (lambda (x) (case x ((1) 'one) ((2) 'two) ((3 4 5)  'more) (otherwise 'much)))))

Disassembly of function NIL
(CONST 0) = #S(HASH-TABLE :TEST EXT:STABLEHASH-EQ (5 . 7) (4 . 7) (3 . 7) (2 . 4) (1 . 1))
(CONST 1) = ONE
(CONST 2) = TWO
(CONST 3) = MORE
(CONST 4) = MUCH
1 required argument
0 optional arguments
No rest parameter
No keyword parameters
14 byte-code instructions:
0     (LOAD 1)
1     (JMPHASH 0 L13 L4 L7 #1=L10 #1# #1#)
4     L4
4     (CONST 1)                           ; ONE
5     (SKIP&RET 2)
7     L7
7     (CONST 2)                           ; TWO
8     (SKIP&RET 2)
10    L10
10    (CONST 3)                           ; MORE
11    (SKIP&RET 2)
13    L13
13    (CONST 4)                           ; MUCH
14    (SKIP&RET 2)
NIL

-- 
__Pascal Bourguignon__

From: ···········@yahoo.com
Subject: Re: Generating code which compiles to a jmphash
Date: Wed, 26 Mar 2008 10:25:32 +0000
Message-ID: <be1d1181-b6a6-4e64-acec-5aade2e6feeb@e10g2000prf.googlegroups.com>

As far as optimizations go, the particular form of these functions
does not matter too much.  They are all pretty fast compared to
the code they are embedded in.

In other places, I've cut down on consing by a couple orders of
magnitude.
That, plus some algorithmic changes have made the bulk of the code run
in about 1/5 of the original time.  I have been profiling
aggressively, mostly as a way to learn a bit more about lisp,
and about my particular environment.

From: ···········@yahoo.com
Subject: Re: Generating code which compiles to a jmphash
Date: Wed, 26 Mar 2008 11:21:18 +0000
Message-ID: <295a44c5-5e9c-4b46-9270-af044c98190c@e23g2000prf.googlegroups.com>

Here's the code I'm using currently:

(defmacro make-case (domain range key)
  `(case ,key
     ,@(mapcar #'list domain range)))

(defmacro char-ops (numbs chars)
  `(progn
     (defun c2n (c) (make-case ,chars ,numbs c))
     (defun n2c (n) (make-case ,numbs ,chars n))
     (defun c+1 (c) (make-case ,(butlast chars) ,(rest chars) c))
     (defun c-1 (c) (make-case ,(rest chars) ,(butlast chars) c))
     (defun -c  (c) (make-case ,chars ,(reverse chars) c))))
(I have removed a couple functions for *cough* brevity.)

The actual characters used in the conversions are not meant
for human consumption.  As far as portability goes,
I'm guessing that the above code is fine.  What matters
is the ability to change the representation without too much fuss.

Initially, I was using:
(char-ops
  (-6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6)
  (#\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M))

But I grabbed the output from:
(loop for i from -6 to 6 collect (int-char (random 256)))

and changed to:
(char-ops
 (-6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6)
 (#\Nak #\LATIN_CAPITAL_LETTER_E_WITH_GRAVE #\U0089
        #\LATIN_SMALL_LETTER_I_WITH_GRAVE #\? #\#
        #\LATIN_CAPITAL_LETTER_E_WITH_CIRCUMFLEX #\f #\$
        #\Etx #\U009A #\INVERTED_QUESTION_MARK #\U0084))

I'm guessing this is not portable, but the loop
to generate a random set of characters is.
I don't expect to use it too many times, so I'm not
interested in insuring uniqueness at the moment,
other than by eye.


And here's how I'm using the code:

I have 10^5 to 10^7 objects of the form:
  (setq thing ((-3 2 1 0 1) (0 1 0 2 0) ... (-1 2 0 4 0)))
where each object is a unique arrangement of integers
between -n and n inclusive.  (I used 6 in the example above.)
I don't have all such objects, and don't have a simple
ordering technique which would allow me to basically
never create the objects in the first place, but
generate them from an index if they become relevant.

If I do (setf (gethash thing h) whatever) on the list "thing" above,
I get 80% to 90% collision.  If I convert those things
to strings, with each character representing one integer
from the list, and hash on the string, I get about
1% percent collision rate.

This is the first time I've used hash tables, so I
really don't know a thing about tuning them.   Using
random character sets cuts collisions in half again:
0.5% collisions, with a maximum of around 5 collisions
per key.  I have no idea yet if that has a measurable
impact.

Anyway, thanks for the continuing education.

From: Pascal J. Bourguignon
Subject: Re: Generating code which compiles to a jmphash
Date: Wed, 26 Mar 2008 12:42:12 +0000
Message-ID: <7cve39slln.fsf@pbourguignon.anevia.com>

···········@yahoo.com writes:
> And here's how I'm using the code:
>
> I have 10^5 to 10^7 objects of the form:
>   (setq thing ((-3 2 1 0 1) (0 1 0 2 0) ... (-1 2 0 4 0)))
> where each object is a unique arrangement of integers
> between -n and n inclusive.  (I used 6 in the example above.)
> I don't have all such objects, and don't have a simple
> ordering technique which would allow me to basically
> never create the objects in the first place, but
> generate them from an index if they become relevant.
>
> If I do (setf (gethash thing h) whatever) on the list "thing" above,
> I get 80% to 90% collision.  

Not surprizing.  clisp doesn't walk very long a list to compute it's
sxhash...

If you consider integers in intervals [-N .. N], you could map them to
[0 .. 2N], and then a thing is like a list of integers in [0 .. 2N^p], 

(let ((n 6))
  (mapcar (lambda (integer)
            (loop
               :for digit :in integer
               :for i = (+ digit n) :then (+ (* 2 n i) (+ digit n))
               :finally (return i)))
      '((-3 2 1 0 1) (0 1 0 2 0) ... (-1 2 0 4 0))))

--> (77119 137478 ... 118494)

Flat lists like this would hash better.

But of course, you can reiterate the transformation, considering each
element of the flat list as a digit in [0..2N^P].

Then single integers would hash very nicely, unless they've got a
common modulo to the hash table size...

Now of course, you'd have to take into account the time complexity of
this transformation, but on the other hand, you don't have to create
the object in the first place, since what I've shown you is just a way
to index them.  And of course, you can recover the list of list of
small integers from these big integers since the transformation is
bijective.

> If I convert those things
> to strings, with each character representing one integer
> from the list, and hash on the string, I get about
> 1% percent collision rate.

> This is the first time I've used hash tables, so I
> really don't know a thing about tuning them.   Using
> random character sets cuts collisions in half again:
> 0.5% collisions, with a maximum of around 5 collisions
> per key.  I have no idea yet if that has a measurable
> impact.
>
> Anyway, thanks for the continuing education.

-- 
__Pascal Bourguignon__

From: Gene
Subject: Re: Generating code which compiles to a jmphash
Date: Tue, 25 Mar 2008 01:07:39 +0000
Message-ID: <a563c373-82f6-4a0c-ae88-7e2c00b75e14@b64g2000hsa.googlegroups.com>

On Mar 24, 7:20 pm, ···········@yahoo.com wrote:
> Hi,
>   I have a number of small utility functions which I would like to
> compile
> to JMPHASHs (to steal a term from clisp's (disassemble 'foo) output.
> Feel free to correct my terminology.)  These functions are called
> several million times.  I can and have transformed small, clear
> functions
> into ugly, hard to maintain case statements.
>
>   A little background:  I'm coding a puzzle solver in my spare time,
> for the challenge and as a way to learn lisp.  One part of the solver
> is an end-game hash, which takes quite a while to compute.
> Turning nice code into ugly code has made the code significantly
> faster.
>
>   A simple response of "learn to write macros" probably would not
> be inappropriate.  On the other hand, I am at least starting to see
> patterns in the code which I know can be abstracted away with macros,
> and I'm hoping that a little shove in the right direction will make
> that easier.
>
> At the risk of too much information, I'll provide two examples.
> Feel free to ignore the first one in your responses.  The first
> is included as better example of the kind of transformation
> I am interested in, but the second is fairly trivial, and a good
> jumping off point for me.
>
> (defconstant dirs '(up down left right))
>
> (defun all-moves ()
>
>   (loop for dir in dirs
>      append
>        (loop for i from 0 to 4
>           collect (list i dir))))
>
> (defun move-to-int (move)
>   (+ (first move) (* 5 (position (second move) dirs))))
>
> > (nth 12 (all-moves))
>        => (2 left)
> > (map 'list 'move-to-int (all-moves))
>
>        => (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19)

On Mar 24, 7:41 pm, ···········@yahoo.com wrote:
> Oops.  I accidentially hit send before completing that post...
>
> The problem with the above code is the function move-to-int.
> Instead of calling (position ...) every time, I would like code
> which compiles to a jmphash, as if I wrote it using case statements.
>
> Here is the second example.
>
> (defun old-char+ (char)
>   (if (char= char #\9)
>       #\A
>       (int-char (1+ (char-int char)))))
> vs
> (defun char+ (char)
>   (case char
>     (#\1 #\2) (#\2 #\3) (#\3 #\4) (#\4 #\5) (#\5 #\6) (#\6 #\7)
>     (#\7 #\8) (#\8 #\9) (#\9 #\A) (#\A #\B) (#\B #\C) (#\C #\D)
>     (#\D #\E) (#\E #\F) (#\F #\G) (#\G #\H) (#\H #\I) (t   #\*)))
>
> char+ results in a modest 5% improvement in code speed over old-char+.
> I really don't care so much about that, or about finding the best
> version of char+, but am interested in how you would write code
> which given some range of inputs will compile to something similar
> to the compiled code for char+.
>
> Thanks.  Sorry for the split post.

The 5% difference isn't surprising.  Unless the compiler is very
clever, it is going to expand the case into an "if then else" chain
that is not going to run much faster than the (position ) call.

As the lists of values get long, hash tables will eventually be
faster.  You can do something like:

CL-USER> (defun hashify (fn domain &key (test #'eql))
	   (let ((hash (make-hash-table :test test)))
	     (dolist (val domain hash)
	       (setf (gethash val hash) (funcall fn val)))))

CL-USER> (defun old-char+ (char)
	  (if (char= char #\9)
	      #\A
	      (code-char (1+ (char-code char)))))

CL-USER> (setq h (hashify #'old-char+ '(#\1 #\2 #\3 #\4 #\5 #\6 #\7 #
\8 #\9
				        #\A #\B #\C #\D #\E #\F #\G #\H #\I)))/
CL-USER> (gethash #\1 h)
#\2
T

If you are really hard-over on the case idea, then

CL-USER> (defmacro casify (fn domain case-key)
	   `(case ,case-key
	     ,@(mapcar #'(lambda (x) (list x (funcall fn x)))
		      domain)))
CASIFY
CL-USER> (macroexpand-1 '(casify 1+ (1 2 3) foo))
(CASE FOO (1 2) (2 3) (3 4))
T

From: ···········@yahoo.com
Subject: Re: Generating code which compiles to a jmphash
Date: Tue, 25 Mar 2008 05:16:53 +0000
Message-ID: <a71b935c-695d-49bc-afd2-756f5b586f05@i12g2000prf.googlegroups.com>

Thank you both.  You've given me plenty to process.
If you are curious, here is how I plan to use your suggestions:

char+ is one of about 10 functions which transform data.
char+ transforms data within a string representation.
Several other functions transform data between representations,
i.e., between strings and nested lists.

The details of the string representation are irrelevant.
I don't care which characters get used.  1-9 and A-I are
holdovers from using (write-to-string num :base 36).
Having said that, I do want the flexibility to modify
the transformation through a single operation, rather
than through rewriting 10 functions.

Data in the string representation is used as a hash key.
Hashing objects in the other representation produces
hash tables with 90% collisions, most of which concentrate
into a single key.  Hashing in the string representation
so far has produced very nice hash tables, but I would
like to be able to quickly change the transformation
to see the effect on the quality of tables.  When they
grow by 3 orders of magnitude I may need to do something
quite different.

As far as generating code which compiles to a particular form,
I was definitely thinking about the wrong problem.

From: Pascal Bourguignon
Subject: Re: Generating code which compiles to a jmphash
Date: Mon, 24 Mar 2008 23:38:53 +0000
Message-ID: <87ve3bsnea.fsf@thalassa.informatimago.com>

···········@yahoo.com writes:

> Hi,
>   I have a number of small utility functions which I would like to
> compile
> to JMPHASHs (to steal a term from clisp's (disassemble 'foo) output.
> Feel free to correct my terminology.)  These functions are called
> several million times.  I can and have transformed small, clear
> functions
> into ugly, hard to maintain case statements.
>
>   A little background:  I'm coding a puzzle solver in my spare time,
> for the challenge and as a way to learn lisp.  One part of the solver
> is an end-game hash, which takes quite a while to compute.
> Turning nice code into ugly code has made the code significantly
> faster.
>
>   A simple response of "learn to write macros" probably would not
> be inappropriate.  On the other hand, I am at least starting to see
> patterns in the code which I know can be abstracted away with macros,
> and I'm hoping that a little shove in the right direction will make
> that easier.
>
> At the risk of too much information, I'll provide two examples.
> Feel free to ignore the first one in your responses.  The first
> is included as better example of the kind of transformation
> I am interested in, but the second is fairly trivial, and a good
> jumping off point for me.

I don't get it.  What's your question?


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"You cannot really appreciate Dilbert unless you read it in the
original Klingon"

From: vanekl
Subject: Re: Generating code which compiles to a jmphash
Date: Tue, 25 Mar 2008 19:34:54 +0000
Message-ID: <fsbk4j$g5g$1@aioe.org>

···········@yahoo.com wrote:
> Hi,
>   I have a number of small utility functions which I would like to
> compile
> to JMPHASHs (to steal a term from clisp's (disassemble 'foo) output.
> Feel free to correct my terminology.)  These functions are called
> several million times.  I can and have transformed small, clear
> functions
> into ugly, hard to maintain case statements.
> 
>   A little background:  I'm coding a puzzle solver in my spare time,
> for the challenge and as a way to learn lisp.  One part of the solver
> is an end-game hash, which takes quite a while to compute.
> Turning nice code into ugly code has made the code significantly
> faster.
> 
>   A simple response of "learn to write macros" probably would not
> be inappropriate.  On the other hand, I am at least starting to see
> patterns in the code which I know can be abstracted away with macros,
> and I'm hoping that a little shove in the right direction will make
> that easier.
> 
> At the risk of too much information, I'll provide two examples.
> Feel free to ignore the first one in your responses.  The first
> is included as better example of the kind of transformation
> I am interested in, but the second is fairly trivial, and a good
> jumping off point for me.
> 
> (defconstant dirs '(up down left right))
> 
> (defun all-moves ()
> 
>   (loop for dir in dirs
>      append
>        (loop for i from 0 to 4
>           collect (list i dir))))
> 
> (defun move-to-int (move)
>   (+ (first move) (* 5 (position (second move) dirs))))
> 
>> (nth 12 (all-moves))
>        => (2 left)
>> (map 'list 'move-to-int (all-moves))
>        => (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19)
> 

Five optimization tips.

One: I hope you have first exhausted the potentially more
problematic areas you may have in your code that can give
dramatic speed-ups. Most notably, minimize consing, and reuse
common data structures that are heavily used or require a great
deal of time to construct or generate excessive consing
(perhaps even reusing the data structure across program invocations).

Two: optimizing what you think may be the problem rarely gives
marked results. Instead, optimize what profiling says is the problem,
paying special attention to the ratio of gc to user time.

Three: speeding up one portion of your code by unrolling loops or
case statements may slow other portions down if they are pushed out of
cache. Micro-optimizing without measuring total performance may mask
these affects.

Four: it's easy to forget that code compilation can be affected
with compilation hints, such as the "optimize," "type," and "inline"
declarations.

Five: try to minimize I/O when optimizing since this tends to
cloud the picture. System calls are slow.