From: Robert Uhl
Subject: Fast Run-time String Interpolation
Date: 
Message-ID: <m3ejubeq1p.fsf@NOSPAMgmail.com>
After many months of not doing anything, I'm working on my Common Lisp
port of blosxom.  I've got it in a roughly working version, but I've
found a few bottlenecks.  One was searching the disk for entries--I
managed to write an optimised version of DIRECTORY which did what I
needed much more quickly than SBCL's built-in version.  But now I'm
facing a tougher one, and figured perhaps someone could be of help.

The way that blosxom works is that it reads in snippets of text
(templates) which contain variable references of the form
$[plugin::]variable.  As a template is used, the value of the variable
is substituted in.  The approach I'm taking is that rather than use
actual variable references, I'm getting the values from a hash
table--but this is abstracted away, so it doesn't really matter.

My first naive verbatim port of the Perl code was this:

  (defun interpolate (template)
    (unless template
      (return-from interpolate ""))
    (loop
     with last = 0
     for (s  e) = (multiple-value-list
                    (scan *template-variable-re* template :start last))
     while s
     while (< last (length template))
     do (let* ((var (intern (string-upcase (subseq template (1+ s)
  						 e))))
  	     (val (get-template-var var)))
  	(setf last (+ s (length  val)))
  	(if val
  	    (setf template (concatenate 'string
  					  (subseq template 0 s)
  					  val
  					  (when (< e (length template))
  					    (subseq template e))))
  	    (incf last))))
    template)

I played around with various type declarations, but couldn't get it
significantly faster.  Then I thought that perhaps it'd make sense to
take advantage of the highly optimised CL-PPCRE, so I wrote this:

  (defun interpolate-2 (template)
    (unless template
      (return-from interpolate-2 ""))
    (regex-replace-all
     "(\\$\\w+(?:::)?\\w*)" 
     template
     (lambda (str start end match-start match-end
  	    register-start register-end)
       (declare (ignorable start end register-start register-end))
       (or (get-template-var (intern (string-upcase
  				    (subseq str (1+ match-start)
  					    match-end))))
           (subseq str match-start match-end)))))

One would think that this should be faster, but it is in fact almost
twice as slow.  Could it be due to the overhead of a function call for
each match?

Once again, I tried various optimisations with (optimize (speed 3) (size
0) (safety 0) (debug 0)) and tossing in type declarations until the
compiler quite warning that it could do better, but to no avail.

My question is two-fold: first, can anyone advise a better string
interpolation algorithm?  Second, could anyone help me optimise what I
have here?

Oh, and I had one other idea: in INTERPOLATE, instead of concatenating
every time through, instead append up a list of strings (every other
string being the original template text and the remainder the
interpolated values), then finally doing an (apply #'concatenate
list-o-strings) at the end.  Remarkably, that was as slow as
INTERPOLATE-2.

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
Sometimes, I think the French were put on earth for no other reason than
to give Germany an over-inflated sense of their military prowess. Other
times, I can't come up with any reason at all.         --Burt Prelutsky

From: Tayssir John Gabbour
Subject: Re: Fast Run-time String Interpolation
Date: 
Message-ID: <1158426689.853309.196700@h48g2000cwc.googlegroups.com>
Robert Uhl wrote:
> I played around with various type declarations, but couldn't get it
> significantly faster.  Then I thought that perhaps it'd make sense to
> take advantage of the highly optimised CL-PPCRE, so I wrote this:
>
>   (defun interpolate-2 (template)
>     (unless template
>       (return-from interpolate-2 ""))
>     (regex-replace-all
>      "(\\$\\w+(?:::)?\\w*)"
>      template
>      (lambda (str start end match-start match-end
>   	    register-start register-end)
>        (declare (ignorable start end register-start register-end))
>        (or (get-template-var (intern (string-upcase
>   				    (subseq str (1+ match-start)
>   					    match-end))))
>            (subseq str match-start match-end)))))

If your lisp implementation doesn't support compiler macros, then
putting the string
"(\\$\\w+(?:::)?\\w*)"
in your code verbatim may be very slow due to the overhead of parsing
it each time. It may be preferable to do something like:

;; Completely untested; off the top of my head.
(defparameter *blah*
  (create-scanner "(\\$\\w+(?:::)?\\w*)"))

(Excuse me if I misunderstand; gotta run to the store.)


MfG,
Tayssir
From: Robert Uhl
Subject: Re: Fast Run-time String Interpolation
Date: 
Message-ID: <m33baremrs.fsf@NOSPAMgmail.com>
"Tayssir John Gabbour" <···········@yahoo.com> writes:
>
> If your lisp implementation doesn't support compiler macros, then
> putting the string
> "(\\$\\w+(?:::)?\\w*)"
> in your code verbatim may be very slow due to the overhead of parsing
> it each time. It may be preferable to do something like:
>
> ;; Completely untested; off the top of my head.
> (defparameter *blah*
>   (create-scanner "(\\$\\w+(?:::)?\\w*)"))
>
> (Excuse me if I misunderstand; gotta run to the store.)

That's in fact what I do--there's a parameter *template-var-re*.  But I
discovered that it's un-needed; SBCL apparently supports compiler macros.

-- 
Robert Uhl <http://public.xdi.org/=ruhl>
Ah, those were the days. Burn anything that would burn, and try to burn
anything else.  We melted a lot of tent poles.
        -- Gus Hartmann, on being a Boy Scout
From: Zach Beane
Subject: Re: Fast Run-time String Interpolation
Date: 
Message-ID: <m3irjnof26.fsf@unnamed.xach.com>
Robert Uhl <·········@NOSPAMgmail.com> writes:

> My question is two-fold: first, can anyone advise a better string
> interpolation algorithm?

The model used by HTML-TEMPLATE is very nice.

Zach
From: Stephen Compall
Subject: Re: Fast Run-time String Interpolation
Date: 
Message-ID: <2kXOg.35784$PM1.25703@fe04.news.easynews.com>
Robert Uhl wrote:
> My question is two-fold: first, can anyone advise a better string
> interpolation algorithm?  Second, could anyone help me optimise what I
> have here?

Precompute the real TEMPLATE from the original file before performing
substitutions.  I did one of these when I was learning Lisp, and it
simply answered Lisp forms to be compiled in a PROGN that would write
the template with substitutions (which were inline Lisp forms).

Since your vars are symbols, you can also precompute by keeping a
simple list, with strings written out verbatim and symbols looked up
as vars.  This can be extended in an OO fashion by adding classes,
each with its own write semantic given vars.

(defun expand-template (template stream)
  (dolist (subtemplate template)
    (expand-template-on subtemplate stream)))

(defmethod expand-template-on ((self string) stream)
  (write-string self stream))

(defmethod expand-template-on ((self symbol) stream)
  (princ (get-template-var self) stream))

-- 
Stephen Compall
http://scompall.nocandysw.com/blog
From: Wade Humeniuk
Subject: Re: Fast Run-time String Interpolation
Date: 
Message-ID: <s5%Og.14691$cz3.9411@edtnps82>
Robert Uhl wrote:
> After many months of not doing anything, I'm working on my Common Lisp
> port of blosxom.  I've got it in a roughly working version, but I've
> found a few bottlenecks.  One was searching the disk for entries--I
> managed to write an optimised version of DIRECTORY which did what I
> needed much more quickly than SBCL's built-in version.  But now I'm
> facing a tougher one, and figured perhaps someone could be of help.
> 
> The way that blosxom works is that it reads in snippets of text
> (templates) which contain variable references of the form
> $[plugin::]variable.  As a template is used, the value of the variable
> is substituted in.  The approach I'm taking is that rather than use
> actual variable references, I'm getting the values from a hash
> table--but this is abstracted away, so it doesn't really matter.
> 

This may give you few ideas, the example gets the value from the
top-level symbol's value (symbol-value ...)

(defun interpolate (template)
   (with-output-to-string (os)
     (with-input-from-string (is template)
       (handler-case
           (loop for c = (read-char is)
                 if (char= c #\$) do
                    (write (symbol-value (read-preserving-whitespace is)) :stream os)
                 else do (write-char c os))
         (end-of-file () nil)))))

CL-USER 20 > (defvar test 10)
TEST

CL-USER 21 > (defvar test::test20 20)
TEST::TEST20

CL-USER 22 > (interpolate "$test")
"10"

CL-USER 23 > (interpolate "other stuff $test::test20 and it 10 ")
"other stuff 20 and it 10 "

CL-USER 24 >

Wade
From: Holger Schauer
Subject: Re: Fast Run-time String Interpolation
Date: 
Message-ID: <yxz3bakch1k.fsf@elendil.holgi.priv>
On 4763 September 1993, Robert Uhl wrote:
> My first naive verbatim port of the Perl code was this:
>   (defun interpolate (template)
>     (unless template
>       (return-from interpolate ""))
>     (loop
>      with last = 0
>      for (s  e) = (multiple-value-list
>                     (scan *template-variable-re* template :start last))

I think one culprit is this part: Regardless of where you want to
start from (i.e. what is the current value of last), you'll go over
the template from 0 to last.

>      while s
>      while (< last (length template))

And that's of course horrible, too: Every time you're looping, you'll
compute the length of the template anew, scanning the entire template!

>      do (let* ((var (intern (string-upcase (subseq template (1+ s)
>   						 e))))
>   	     (val (get-template-var var)))
>   	(setf last (+ s (length  val)))
>   	(if val
>   	    (setf template (concatenate 'string
>   					  (subseq template 0 s)
>   					  val
>   					  (when (< e (length template))

And here, the complete template will even be scanned a third time.
Actually, Wade's propasal essentially overcomes these problems, as it
tries to walk the string only once.

>   (defun interpolate-2 (template)
>     (unless template
>       (return-from interpolate-2 ""))
>     (regex-replace-all
>      "(\\$\\w+(?:::)?\\w*)" 
>      template
>      (lambda (str start end match-start match-end
[..]

> One would think that this should be faster, but it is in fact almost
> twice as slow.  Could it be due to the overhead of a function call for
> each match?

I don't think so. Could it be that your regular expression is too
general, so that the lambda expression is called in too many cases? 
But I have next to no experience with CL-PPCRE, so there might be
another problem.

> My question is two-fold: first, can anyone advise a better string
> interpolation algorithm?

Wade's proposal is the way that I would have went, too.

Holger

-- 
---          http://www.coling.uni-freiburg.de/~schauer/            ---
"It took writing a PhD dissertation on the topic to convince me
 otherwise.  So I'll hereby admit it: Mrs. Shellenbarger, you were
 right!"          -- Mike Maxwell in comp.ai.nat-lang