Re: Slow Loop (alternatives in lisp?)

From: Rainer Joswig
Subject: Re: Slow Loop (alternatives in lisp?)
Date: Sat, 16 Aug 2008 14:33:31 +0000
Message-ID: <joswig-87CDF5.16333016082008@news-europe.giganews.com>

In article 
<····································@k7g2000hsd.googlegroups.com>,
 Francogrex <······@grex.org> wrote:

> Hello, I'm trying to imitate the behaviour of the pivot-table in excel
> where you take a list of items and another list of their values and
> you sum similar ones together (see toy example below). I have a list
> of 30000 items and their associated values and in excel using a pivot-
> table the computation is done instantaneously (less than 2 seconds)
> while the procedure I wrote in lisp will take about 12 hours !(I give
> an example of only 15 items below, this goes fast of course because
> only 15 items, but the 30,000 will take an estimate of about 12 hours;
> I never reached that far because around 5 hours I give up). Do you
> know why? Is there a way to enhance the procedure and make it as fast
> as the pivot table? Thanks
> 
> 
> ;;Toy example on only 15 items:
> ;; ls is the list and counter is the associated values
> (defvar ls '("a" "b" "c" "b" "a" "f" "e" "g" "h" "k" "z" "k" "r" "u"
> "f"))
> (defvar counter '(1 5 8 7 14 8 3 7 9 4 3 21 5 7 9))
> ;;while macro similar to dotimes.
> (defmacro while (condition &rest body)
> 	       (let ((var (gensym)))
> 		 `(do ((,var nil (progn ,@body)))
> 		      ((null ,condition) ,var))))
> ;;Tabulate like the pivot table.
> 	    (setf i 0)
> 	    (while (< i (length ls))
> 	      (setf j (+ i 1))
> 	      (while (< j (length ls))
> 		(if (AND (equal (nth i ls) (nth j ls)) (not (equal (nth j ls)
> 'indic)))
> 		    (AND (setf (nth i counter)
> 			       (+ (nth j counter) (nth i counter)))
> 			 (AND (setf (nth j ls) 'indic) (setf (nth j counter) 'indic))))
> (incf j))
> 	      (pprint i)(incf i))
> ;;Remove the indicators.
> 	    (delete 'indic ls)
> 	    (delete 'indic counter)
> ;; printing ls and counter now gives the summed values.

First you should try to format your code so that we can read it:


(defvar ls '("a" "b" "c" "b" "a" "f" "e" "g" "h" "k" "z" "k" "r" "u" "f"))
(defvar counter '(1 5 8 7 14 8 3 7 9 4 3 21 5 7 9))


;;while macro similar to dotimes.
(defmacro while (condition &rest body)
  (let ((var (gensym)))
    `(do ((,var nil (progn ,@body)))
         ((null ,condition) ,var))))

;; Tabulate like the pivot table.
(setf i 0)
(while (< i (length ls))
  (setf j (+ i 1))
  (while (< j (length ls))
    (if (AND (equal (nth i ls) (nth j ls))
             (not (equal (nth j ls) 'indic)))
        (AND (setf (nth i counter)
                   (+ (nth j counter) (nth i counter)))
             (AND (setf (nth j ls) 'indic) (setf (nth j counter) 'indic))))
    (incf j))
  (pprint i)
  (incf i))

;;Remove the indicators.
(delete 'indic ls)
(delete 'indic counter)

Let's have a look:

* variables defined by defvar should be written as *ls* and *counter* .

* there is already a WHILE in Common Lisp. No need to invent a new one:

(loop while (foo-p) do .... )

* don't write code like this. Write functions you can try out.

* understand when to use lists and when arrays

* using multiple calls to LENGTH on lists in Loops is expensive

* using random access on lists is expensive. The cheapest 
  access you get on the first element.

* don't modify literal lists. They are constant data.

You may want to read some beginner's book on Lisp and list processing.


Here is a 'solution' that is a bit more complicated, since
it uses a hash-table. An alternative would have been
to use an assoc list. But the solution is 'fast' for
long lists. 

(defun distribution1 (items values test)
  (let ((table (make-hash-table :test test)))
    (loop for item in items
          for value in values
          do (incf (gethash item table 0) value))
    (let ((items-list nil))
      (maphash (lambda (item sum-value)
                 (push (cons item sum-value) items-list))
               table)
      (sort items-list #'> :key #'cdr))))

An example call:

CL-USER 58 > (distribution1 '("a" "b" "c" "b" "a" "f" "e" "g" "h" "k" "z" "k" "r" "u" "f")
                            '(1 5 8 7 14 8 3 7 9 4 3 21 5 7 9)
                            #'equal)
(("k" . 25) ("f" . 17) ("a" . 15) ("b" . 12) ("h" . 9) ("c" . 8) ("g" . 7) ("u" . 7) ("r" . 5) ("e" . 3) ("z" . 3))

You can see that the loop goes once over the input lists. It iterates
of both lists in one loop. For every element in list ITEMS
the hashtable will be updated by incrementing by the value.

Then it walks with MAPHASH once over the hashtable and builds a new
list of item sum-value. This is almost our result.
Here the result is sorted, so the items with the largest
sum-values are first.

So, it walks once first to last (which is efficient in Lisp)
over the lists. The update of the hash-table is fast, too.
Then the result list is created. and the result list is sorted.

-- 
http://lispm.dyndns.org/

From: Rainer Joswig
Subject: Re: Slow Loop (alternatives in lisp?)
Date: Sat, 16 Aug 2008 19:58:29 +0000
Message-ID: <joswig-C6F75A.21582916082008@news-europe.giganews.com>

In article 
<····································@j22g2000hsf.googlegroups.com>,
 Francogrex <······@grex.org> wrote:

> On Aug 16, 4:33�pm, Rainer Joswig <······@lisp.de> wrote:
> > ...You may want to read some beginner's book on Lisp and list processing...
> 
> Thanks to all you guys helping me find solutions and being patient
> with me. I am reading several books including 2 Paul Graham books:
> "ANSI common lisp" (done) and "on lisp" (I am still reading it).
> Seibel-"Practical Common Lisp" (half-way through) Touretzky-"CL: A
> Gentle Introduction to Symbolic Computation" (done) and
> Norvig-"Paradigms of AI programming" (the book was actually a gift
> from a friend; I still find it very difficult, but was the one that
> attracted me to Common Lisp in the 1st place).

That's why I'm mentioning it. If you want to learn driving then
'On Lisp' will explain you how to be a stunt driver and
drive on two, one or even zero wheels.

ANSI CL is somehow useful to get an overview. Touretzky is probably
the most useful text for learning basics (coding algorithms in Lisp).

Check out also this book:

http://www.cse.buffalo.edu/pub/WWW/faculty/shapiro/Commonlisp/

It has , as its names says, a very interactive approach.

Writing lots of little functions and trying out alternative
versions is important. I took also some algorithms book
and coded the stuff that was in there.

Writing code and posting it here, like you do, is also
a good approach. But check out some basic texts about symbolic
programming. Reading SICP and trying the examples in either
Scheme or Common Lisp is also recommended.

> Reading is fun and very
> useful but I have to keep practicing (doing the exercises) to become
> better. I admit I have still to shake off the damned Cpp and Pascal
> habits. I think I would have been better quicker in CL if I came to it
> without knowing any of those other languages...

This effect has been observed long ago:

 Anyone could learn Lisp in one day, except that if they already 
 new Fortran, it would take three days.

 Marvin Minsky

-- 
http://lispm.dyndns.org/