From: emnistal
Subject: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <45e2ab8f-202d-4801-93c9-caa5286618da@a2g2000prm.googlegroups.com>
Can someone please clue me in on what I am doing wrong? I am using SBCL
1.0.20.

I am iterating through a large file (159M) looking for lines that
begin with a particular string. When I run the following at the REPL:

(iterate:iter (iterate:for line in-file "myfile" using 'read-line)
	 while line
	 do (when (cl-ppcre:scan "^my-regexp" line)
	      (iterate:collect line)))

It is reasonably quick, about 4 seconds to completion.

When I wrap it in a function:

(defun iterate-grep-collect (regexp filename)
    (iterate:iter (iterate:for line in-file filename :using 'read-line)
	 while line
	 do (when (cl-ppcre:scan regexp line)
	      (iterate:collect line))))

and execute:

(iterate-grep-collect "^my-regexp" "myfile")

It slows way down, i.e. about 145 seconds to completion. Any thoughts
would be appreciated.

As a side note, when I (use-package :iterate) I get lots of name
conflicts between symbols in the iterate package and the
common-lisp-user package. Is there any workaround besides defining the
necessary package every time?

-- EMN

From: Rainer Joswig
Subject: Re: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <joswig-8F0C54.01282308102008@news-europe.giganews.com>
In article 
<····································@a2g2000prm.googlegroups.com>,
 emnistal <········@gmail.com> wrote:

> Can someone please clue me in on what I am doing wrong? I am using SBCL
> 1.0.20.
> 
> I am iterating through a large file (159M) looking for lines that
> begin with a particular string. When I run the following at the REPL:
> 
> (iterate:iter (iterate:for line in-file "myfile" using 'read-line)
> 	 while line
> 	 do (when (cl-ppcre:scan "^my-regexp" line)
> 	      (iterate:collect line)))
> 
> It is reasonably quick, about 4 seconds to completion.
> 
> When I wrap it in a function:
> 
> (defun iterate-grep-collect (regexp filename)
>     (iterate:iter (iterate:for line in-file filename :using 'read-line)
> 	 while line
> 	 do (when (cl-ppcre:scan regexp line)
> 	      (iterate:collect line))))
> 
> and execute:
> 
> (iterate-grep-collect "^my-regexp" "myfile")
> 
> It slows way down, i.e. about 145 seconds to completion. Any thoughts
> would be appreciated.
> 
> As a side note, when I (use-package :iterate) I get lots of name
> conflicts between symbols in the iterate package and the
> common-lisp-user package. Is there any workaround besides defining the
> necessary package every time?
> 
> -- EMN


Let me guess: you need to compile the regexp once and
then use a precompiled version in each iteration?

-- 
http://lispm.dyndns.org/
From: Tim Bradshaw
Subject: Re: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <ddf5e628-df2c-43e2-b7fa-89d828f79036@e2g2000hsh.googlegroups.com>
On Oct 7, 11:36 pm, emnistal <········@gmail.com> wrote:

>
> It slows way down, i.e. about 145 seconds to completion. Any thoughts
> would be appreciated.

Wild guess: does the REPL compile forms you type at it?  If so it is
probably compiling only the *call* to the function, not the function.
So compiling the function may well help.

(I can't see that the loop overhead would be as high as you are seeing
when running interpreted, but it may well be that if you compile the
function it can do clever things with precompiling the regexp (but
perhaps not, if you are passing it in as a string - you may need to do
some kind of semi-explicit make-a-compiled-regexp thing in the
function).)

--tim
From: Brian
Subject: Re: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <26b2dfb2-a820-4c81-a05e-ba3925ea3f90@q5g2000hsa.googlegroups.com>
On Oct 8, 4:45 am, Tim Bradshaw <··········@tfeb.org> wrote:
> Wild guess: does the REPL compile forms you type at it?  If so it is
> probably compiling only the *call* to the function, not the function.
> So compiling the function may well help.
If the call is in a time body, it is going to be compiled first.

It looks like as of SBCL 1.0.19 at least that functions aren't
automatically compiled anymore:
* (defun factorial (x)
(if (zerop x) 1 (* x (factorial (1- x)))))

FACTORIAL
* (compiled-function-p 'factorial)

NIL

So that is probably the problem.
From: Leandro Rios
Subject: Re: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <gcier0$5oo$1@registered.motzarella.org>
Brian wrote:
> On Oct 8, 4:45 am, Tim Bradshaw <··········@tfeb.org> wrote:
>> Wild guess: does the REPL compile forms you type at it?  If so it is
>> probably compiling only the *call* to the function, not the function.
>> So compiling the function may well help.
> If the call is in a time body, it is going to be compiled first.
> 
> It looks like as of SBCL 1.0.19 at least that functions aren't
> automatically compiled anymore:
> * (defun factorial (x)
> (if (zerop x) 1 (* x (factorial (1- x)))))
> 
> FACTORIAL
> * (compiled-function-p 'factorial)
> 
> NIL
> 
> So that is probably the problem.

CL-USER> (compiled-function-p #'factorial)
T
CL-USER>
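
The two transcripts differ only in the argument: 'factorial is a symbol, while #'factorial is the function object it names. A symbol is never a compiled function, so the first test returns NIL regardless of how the function was defined. A minimal sketch in standard Common Lisp, reusing the factorial definition from above:

```lisp
(defun factorial (x)
  (if (zerop x) 1 (* x (factorial (1- x)))))

;; A symbol is not a function object, so this is NIL whether or not
;; FACTORIAL has been compiled.
(compiled-function-p 'factorial)

;; Ask about the function object itself. The answer depends on the
;; implementation and its evaluator settings.
(compiled-function-p #'factorial)
(compiled-function-p (fdefinition 'factorial))
```

The last two forms are the ones that actually say whether the definition was compiled.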
From: emnistal
Subject: Re: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <f79f8fb2-c9e7-44e3-9728-b98e5c5ea461@w24g2000prd.googlegroups.com>
On Oct 8, 8:07 am, Leandro Rios <··················@gmail.com> wrote:
> Brian wrote:
>
>
>
> > On Oct 8, 4:45 am, Tim Bradshaw <··········@tfeb.org> wrote:
> >> Wild guess: does the REPL compile forms you type at it?  If so it is
> >> probably compiling only the *call* to the function, not the function.
> >> So compiling the function may well help.
> > If the call is in a time body, it is going to be compiled first.
>
> > It looks like as of SBCL 1.0.19 at least that functions aren't
> > automatically compiled anymore:
> > * (defun factorial (x)
> > (if (zerop x) 1 (* x (factorial (1- x)))))
>
> > FACTORIAL
> > * (compiled-function-p 'factorial)
>
> > NIL
>
> > So that is probably the problem.
>
> CL-USER> (compiled-function-p #'factorial)
> T
> CL-USER>

Compiling the function takes, maybe, a second off the run time. I
think Rainer is on to it. After reading the CL-PPCRE hints again it
looks like Dr. Weitz recommends against the kind of function I am
using. The question is how do I force cl-ppcre:scan to compile the
regular expression only once on the first pass? defconstant?
From: Thomas A. Russ
Subject: Re: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <ymitzbnhsdh.fsf@blackcat.isi.edu>
emnistal <········@gmail.com> writes:

> After reading the CL-PPCRE hints again it
> looks like Dr. Weitz recommends against the kind of function I am
> using. The question is how do I force cl-ppcre:scan to compile the
> regular expression only once on the first pass? defconstant?

You don't do it that way.

Instead you (manually) create a scanner and then perhaps even compile
the resulting function.

Perhaps something like:

  (let ((scanner  (cl-ppcre:create-scanner my-regexp)))
    (iterate ....
        (scan scanner line)))

or 

  (let ((scanner (compile nil (cl-ppcre:create-scanner my-regexp))))
    ...)


-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: emnistal
Subject: Re: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <349e1497-8fe8-4aeb-86a1-ac96a7882217@x16g2000prn.googlegroups.com>
On Oct 8, 12:08 pm, ····@sevak.isi.edu (Thomas A. Russ) wrote:
> emnistal <········@gmail.com> writes:
> > After reading the CL-PPCRE hints again it
> > looks like Dr. Weitz recommends against the kind of function I am
> > using. The question is how do I force cl-ppcre:scan to compile the
> > regular expression only once on the first pass? defconstant?
>
> You don't do it that way.
>
> Instead you (manually) create a scanner and then perhaps even compile
> the resulting function.
>
> Perhaps something like:
>
>   (let ((scanner  (cl-ppcre:create-scanner my-regexp)))
>     (iterate ....
>         (scan scanner line)))
>
> or
>
>   (let ((scanner (compile nil (cl-ppcre:create-scanner my-regexp))))
>     ...)
>
> --
> Thomas A. Russ,  USC/Information Sciences Institute

Thank you for the help. Creating the scanner outside the loop did the
trick.

(defun iterate-grep-collect (regexp filename)
   (let ((scanner (cl-ppcre:create-scanner regexp)))
       (iterate:iter (iterate:for line in-file filename :using 'read-line)
            while line
            do (when (cl-ppcre:scan scanner line)
                 (iterate:collect line)))))

Runs in about 4.8 seconds on a 159M file. There is no real time
difference between the interpreted and compiled function.
From: Thomas F. Burdick
Subject: Re: Speed Comparison - Iterate at the REPL vs. Iterate Wrapped in a Function
Date: 
Message-ID: <4dc507aa-b38c-48b4-adc5-9a07c5b6eb81@m74g2000hsh.googlegroups.com>
On Oct 8, 6:05 pm, emnistal <········@gmail.com> wrote:
> On Oct 8, 8:07 am, Leandro Rios <··················@gmail.com> wrote:
>
>
>
> > Brian wrote:
>
> > > On Oct 8, 4:45 am, Tim Bradshaw <··········@tfeb.org> wrote:
> > >> Wild guess: does the REPL compile forms you type at it?  If so it is
> > >> probably compiling only the *call* to the function, not the function.
> > >> So compiling the function may well help.
> > > If the call is in a time body, it is going to be compiled first.
>
> > > It looks like as of SBCL 1.0.19 at least that functions aren't
> > > automatically compiled anymore:
> > > * (defun factorial (x)
> > > (if (zerop x) 1 (* x (factorial (1- x)))))
>
> > > FACTORIAL
> > > * (compiled-function-p 'factorial)
>
> > > NIL
>
> > > So that is probably the problem.
>
> > CL-USER> (compiled-function-p #'factorial)
> > T
> > CL-USER>
>
> Compiling the function takes, maybe, a second off the run time. I
> think Rainer is on to it. After reading the CL-PPCRE hints again it
> looks like Dr. Weitz recommends against the kind of function I am
> using. The question is how do I force cl-ppcre:scan to compile the
> regular expression only once on the first pass? defconstant?

Compile it with create-scanner outside your loop.
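
A likely reason the REPL form was fast while the function was slow: as far as the CL-PPCRE documentation describes it, scan has a compiler macro that builds the scanner once, at load time, when the regex argument is a constant string. When the regex arrives in a variable, that shortcut cannot apply and a new scanner is built on every call. A minimal sketch of both cases, assuming CL-PPCRE is already loaded; the names starts-with-my-regexp-p and make-line-matcher are hypothetical and not part of any library:

```lisp
;; Assumes CL-PPCRE is already loaded (for example via ASDF).

;; Constant regex: when this function is compiled, CL-PPCRE's compiler
;; macro on SCAN can create the scanner once instead of on every call.
;; (Hypothetical helper name.)
(defun starts-with-my-regexp-p (line)
  (cl-ppcre:scan "^my-regexp" line))

;; Variable regex: create the scanner once and close over it, so each
;; call to the returned function only runs the match.
;; (Hypothetical helper name.)
(defun make-line-matcher (regexp)
  (let ((scanner (cl-ppcre:create-scanner regexp)))
    (lambda (line)
      (cl-ppcre:scan scanner line))))

;; Usage: keep only the strings that match.
(remove-if-not (make-line-matcher "^my-regexp")
               '("my-regexp first" "something else" "my-regexp second"))
```

The closure form is handy when the same pattern is applied in several loops, since the scanner is built once and can be passed around like any other function.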