Wide-finder optimisation

From: Robert Uhl
Subject: Wide-finder optimisation
Date: Tue, 02 Oct 2007 05:34:34 +0000
Message-ID: <m3fy0ujd05.fsf@latakia.dyndns.org>

Just read this blog entry:

  http://www.method-combination.net/blog/archives/2007/09/29/wide-finder-in-lisp.html

Essentially, the fellow implemented Bray's Wide Finder
(http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in
Common Lisp.  The remarkable thing is that the Lisp version runs about
as fast (or as slowly) as the Ruby version.  This is kinda surprising,
given that Ruby is the canonically slow language and that SBCL is
compiled.  I figured it'd be an interesting exercise in optimisation.
The code is below:

  (defun run (&rest logs)
  (let ((counts (make-hash-table :test #'equal)))
    (dolist (filename logs)
      (with-open-file (stream filename :direction :input
				       :external-format :latin-1)
		      (loop for line = (read-line stream nil stream)
			    until (eq line stream)
			    do (cl-ppcre:register-groups-bind 
				(match)
				("GET /ongoing/When/\\d{3}x/(\\d{4}/\\d{2}/\\d{2}/[^ .]+) " line)
				(incf (gethash match counts 0))))))
    (loop for key being the hash-keys of counts
	  collect key into keys
	  finally (map nil #'(lambda (x)
			       (format t "~D: ~A~%" (gethash x counts) x))
		       (subseq (sort keys #'>
				     :key #'(lambda (x) (gethash x counts)))
			       0 10)))))

Looking at it, it's majorly straightforward: initialise a hash table;
read a file line-by-line; look for a regexp and increment the match's
hash table entry when found; later, print everything out.

In fact, I can't really think how to improve it.  Is it just the case
that SBCL's IO is slow?  I think I noticed the same thing when working
on some SPOJ entries--I couldn't read files in the time given, which was
kinda annoying.

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
Very often many things are said by the Holy Scriptures and in it many
names are used not in a literal sense...those who have minds
understand this.                      --St. Isaac the Syrian

Re: Wide-finder optimisation Juho Snellman
Re: Wide-finder optimisation Hannes Klas
- Re: Wide-finder optimisation Edi Weitz
Re: Wide-finder optimisation Vityok
- Re: Wide-finder optimisation John Thingstad

From: Juho Snellman
Subject: Re: Wide-finder optimisation
Date: Tue, 02 Oct 2007 08:33:01 +0000
Message-ID: <slrnfg40ht.sso.jsnell@sbz-30.cs.Helsinki.FI>

Robert Uhl <·········@NOSPAMgmail.com> wrote:
> Just read this blog entry:
>
>   http://www.method-combination.net/blog/archives/2007/09/29/wide-finder-in-lisp.html
>
> Essentially, the fellow implemented Bray's Wide Finder
> (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in
> Common Lisp.  The remarkable thing is that the Lisp version runs about
> as fast (or as slowly) as the Ruby version.  This is kinda surprising,
> given that Ruby is the canonically slow language and that SBCL is
> compiled.

It doesn't seem particularily surprising to me, considering that
reading in log files and matching the lines to regexps is a bread and
butter use case of scripting languages. Of course a scripting language
is going to be optimized for that case. On this program I expect
practically all the time will be spent in libraries written in C, with
very little time spent actually executing Ruby.

> In fact, I can't really think how to improve it.  Is it just the
> case that SBCL's IO is slow?

As far as I know the performance of READ-LINE in SBCL is pretty good
when compared to other Lisp implementations. But it is fundamentally
going to be slower than just reading in a bunch of raw bytes in C,
since the character IO pipeline will end up copying the data around
multiple times between buffers, transcoding from the native external
format to the internal utf-32 presentation, etc.

-- 
Juho Snellman

From: Hannes Klas
Subject: Re: Wide-finder optimisation
Date: Tue, 02 Oct 2007 13:46:12 +0000
Message-ID: <87y7eld3yz.fsf@rz.fh-augsburg.de>

Robert Uhl <·········@NOSPAMgmail.com> writes:


>   (defun run (&rest logs)
>   (let ((counts (make-hash-table :test #'equal)))
>     (dolist (filename logs)
>       (with-open-file (stream filename :direction :input
> 				       :external-format :latin-1)
> 		      (loop for line = (read-line stream nil stream)
> 			    until (eq line stream)
> 			    do (cl-ppcre:register-groups-bind 
> 				(match)
> 				("GET /ongoing/When/\\d{3}x/(\\d{4}/\\d{2}/\\d{2}/[^ .]+) " line)
> 				(incf (gethash match counts 0))))))

Looks like you're building the RegEx for every line in the file. Try
building it before, it might make a huge difference.

From: Edi Weitz
Subject: Re: Wide-finder optimisation
Date: Tue, 02 Oct 2007 13:58:47 +0000
Message-ID: <utzp9zkh4.fsf@agharta.de>

On Tue, 02 Oct 2007 15:46:12 +0200, Hannes Klas <···@rz.fh-augsburg.de> wrote:

> Looks like you're building the RegEx for every line in the file. Try
> building it before, it might make a huge difference.

The compiler macros in CL-PPCRE should take care of that
automatically.

  CL-USER 1 > (trace ppcre:create-scanner)
  (CL-PPCRE:CREATE-SCANNER)

  CL-USER 2 > (defun foo (n)
                (dotimes (i n)
                  (ppcre:register-groups-bind ((#'parse-integer number))
                      ("(\\d+) bottles of beer" (format nil "~A bottles of beer" i))
                    (print number))))
  FOO

  CL-USER 3 > (compile *)
  0 CL-PPCRE:CREATE-SCANNER > ...
    >> CL-PPCRE::REGEX : "(\\d+) bottles of beer"
    1 CL-PPCRE:CREATE-SCANNER > ...
      >> CL-PPCRE::REGEX                 : (:GROUP (:SEQUENCE (:REGISTER (:GREEDY-REPETITION 1 NIL :DIGIT-CLASS)) " bottles of beer"))
      >> CL-PPCRE::CASE-INSENSITIVE-MODE : NIL
      >> CL-PPCRE::MULTI-LINE-MODE       : NIL
      >> CL-PPCRE::SINGLE-LINE-MODE      : NIL
      >> CL-PPCRE::DESTRUCTIVE           : T
    1 CL-PPCRE:CREATE-SCANNER < ...
      << VALUE-0 : #<Closure (CL-PPCRE::CREATE-SCANNER-AUX . 7) 200B8482>
      << VALUE-1 : NIL
  0 CL-PPCRE:CREATE-SCANNER < ...
    << VALUE-0 : #<Closure (CL-PPCRE::CREATE-SCANNER-AUX . 7) 200B8482>
    << VALUE-1 : NIL
  FOO
  NIL
  NIL

  CL-USER 4 > (foo 5)

  0 
  1 
  2 
  3 
  4 
  NIL

Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")

From: Vityok
Subject: Re: Wide-finder optimisation
Date: Thu, 22 Nov 2007 07:43:44 +0000
Message-ID: <3b616c39-25c0-4bfe-9157-3d270a861fe4@o42g2000hsc.googlegroups.com>

On 2 öÏ×, 07:34, Robert Uhl <·········@NOSPAMgmail.com> wrote:
> Just read this blog entry:
>
>  http://www.method-combination.net/blog/archives/2007/09/29/wide-finde...
>
> Essentially, the fellow implemented Bray'sWideFinder
> (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in
> Common Lisp.  The remarkable thing is that the Lisp version runs about
> as fast (or as slowly) as the Ruby version.  This is kinda surprising,
> given that Ruby is the canonically slow language and that SBCL is
> compiled.  I figured it'd be an interesting exercise in optimisation.

According to the latest benchmarks (as of 22 Nov, 07:40 UTC),
the Perl version is the fastest.[1]  It just interesting to see if it
is
possible to optimize the Common Lisp version similarly to the
optimizations
done in the fastest implementations.

[1] http://www.tbray.org/ongoing/When/200x/2007/10/30/WF-Results

With best regards,

Victor

http://vityok.org.ua

From: John Thingstad
Subject: Re: Wide-finder optimisation
Date: Thu, 22 Nov 2007 08:29:19 +0000
Message-ID: <op.t16pu5e5ut4oq5@pandora.alfanett.no>

På Thu, 22 Nov 2007 08:43:44 +0100, skrev Vityok <······@ua.fm>:

> On 2 Жов, 07:34, Robert Uhl <·········@NOSPAMgmail.com> wrote:
>> Just read this blog entry:
>>
>>  http://www.method-combination.net/blog/archives/2007/09/29/wide-finde...
>>
>> Essentially, the fellow implemented Bray'sWideFinder
>> (http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder) in
>> Common Lisp.  The remarkable thing is that the Lisp version runs about
>> as fast (or as slowly) as the Ruby version.  This is kinda surprising,
>> given that Ruby is the canonically slow language and that SBCL is
>> compiled.  I figured it'd be an interesting exercise in optimisation.
>
> According to the latest benchmarks (as of 22 Nov, 07:40 UTC),
> the Perl version is the fastest.[1]  It just interesting to see if it
> is
> possible to optimize the Common Lisp version similarly to the
> optimizations
> done in the fastest implementations.
>
> [1] http://www.tbray.org/ongoing/When/200x/2007/10/30/WF-Results
>
> With best regards,
>
> Victor
>
> http://vityok.org.ua

Seems the I/O is the culprit. CL's stream libraries are inefficient. I  
think CLISP has the fastest streams and thus provides the best  
performance. (I'm not sure I mostly use LispWorks)
The best "optimation" is to set *print-pretty* to nil. (Many  
implementations have it nil per default.)
There is a flaw in the design of the pretty printer that makes common  
operations very slow.
Appart from that if you are using clsql then when connecting use :pool t  
:if-exists :old to pool connections and reuse them rather than make new  
all the time. A connection to MySQL will probaly run faster than one to  
pgsql. As you probaly gathered the speed of the libraries are more  
importand than the speed of the compiled code in most CRUD (Create Read  
Update Delete) App's.

--------------
John Thingstad