From: Conrad
Subject: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <0c5fa38d-9b39-4558-871d-b9554ac16faf@x35g2000hsb.googlegroups.com>
Hi guys/gals:

Suppose I want to build a big string out of smaller parts in CL in a
fully functional way...

For example, suppose that I want to build a string representing an SVG
file (this would be the same for any string-based data structure
though)

The obvious way to do this is to create functions for SVG rectangles,
ellipses, etc first. These would all return string representations of
those objects. Next, one would write a 'build-svg function that might
look like this:

(defun build-svg (shapes)
  (concatenate (build-svg-header)
               (apply #'concatenate (mapcar (lambda (shape)
                                              (case (car chape)
                                                (rectangle (apply
#'rectangle (cdr shape)))
                                                (ellipse (apply
#'ellipse (cdr shape)))
                                                ...))
                                            shapes))
               (build-svg-footer)))

We've all written code like this before, building big strings from
small strings in an iterative fashion. But as all of you would
probably agree, building big strings like this using nested
concatenations is inefficient, because each new concatenation might
require unnecessary allocation operations as the string fragments are
connected together.

In lazy languages, such as Haskell, I believe the inefficiency of
these concatenations is usually not an issue. But the majority of
languages, including CL, require a less naive approach to string
building, using either "string builders" or some kind of stream-based
approach.

Consequently, a more appropriate way to build strings in CL, if I
understand the problem correctly, would be as follows:

(defun build-svg (shapes)
  (with-output-to-string
      (princ (build-svg-header))
      (mapc (lambda (shape)
              (case (car chape)
                (rectangle (princ (apply #'rectangle (cdr shape))))
                (ellipse (princ (apply #'ellipse (cdr shape))))
                ...))
            shapes)
      (princ (build-svg-footer))))

Here, we use the properties of output streams to collect the
individual strings through 'princ and then collect the results with
'with-output-to-string. This approach is a much more efficient way of
iteratively accumulating small strings- but this code is, of course,
not functional.

My question is therefore as follows: What are the "best practices" for
accumulating strings in a purely functional way in CL? Is it to just
concatenate strings as I do in my first example, despite the
inefficiency? Or, is there a way to do write pure FP CL that has an
improved performance profile, as shown in my second example?

I suspect the answer is "No", but I don't know for sure...

Thanks in advance for any insights/advice you may have into pure FP CL
string accumulation :-)

From: Paul Donnelly
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <87od5r36r6.fsf@plap.localdomain>
Conrad <······@gmail.com> writes:

> Hi guys/gals:
>
> Suppose I want to build a big string out of smaller parts in CL in a
> fully functional way...
>
> For example, suppose that I want to build a string representing an SVG
> file (this would be the same for any string-based data structure
> though)
>
> The obvious way to do this is to create functions for SVG rectangles,
> ellipses, etc first. These would all return string representations of
> those objects. Next, one would write a 'build-svg function that might
> look like this:
>
> (defun build-svg (shapes)
>   (concatenate (build-svg-header)
>                (apply #'concatenate (mapcar (lambda (shape)
>                                               (case (car chape)
>                                                 (rectangle (apply
> #'rectangle (cdr shape)))
>                                                 (ellipse (apply
> #'ellipse (cdr shape)))
>                                                 ...))
>                                             shapes))
>                (build-svg-footer)))
>
> We've all written code like this before, building big strings from
> small strings in an iterative fashion. But as all of you would
> probably agree, building big strings like this using nested
> concatenations is inefficient, because each new concatenation might
> require unnecessary allocation operations as the string fragments are
> connected together.

I doubt

(reduce (lambda (s1 s2) (concatenate 'string s1 s2)) strings)

is going to be problematic. For example, rendering CL data structures to
a SVG file would be hard to do too slowly. Why not write this nearly all
of the time and use WITH-OUTPUT-TO-STRING for the very few cases where
extra speed is needed (if it is actually faster... have you profiled
these)?
From: Rainer Joswig
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <joswig-175CF8.09350424062008@news-europe.giganews.com>
In article <··············@plap.localdomain>,
 Paul Donnelly <·············@sbcglobal.net> wrote:

> Conrad <······@gmail.com> writes:
> 
> > Hi guys/gals:
> >
> > Suppose I want to build a big string out of smaller parts in CL in a
> > fully functional way...
> >
> > For example, suppose that I want to build a string representing an SVG
> > file (this would be the same for any string-based data structure
> > though)
> >
> > The obvious way to do this is to create functions for SVG rectangles,
> > ellipses, etc first. These would all return string representations of
> > those objects. Next, one would write a 'build-svg function that might
> > look like this:
> >
> > (defun build-svg (shapes)
> >   (concatenate (build-svg-header)
> >                (apply #'concatenate (mapcar (lambda (shape)
> >                                               (case (car chape)
> >                                                 (rectangle (apply
> > #'rectangle (cdr shape)))
> >                                                 (ellipse (apply
> > #'ellipse (cdr shape)))
> >                                                 ...))
> >                                             shapes))
> >                (build-svg-footer)))
> >
> > We've all written code like this before, building big strings from
> > small strings in an iterative fashion. But as all of you would
> > probably agree, building big strings like this using nested
> > concatenations is inefficient, because each new concatenation might
> > require unnecessary allocation operations as the string fragments are
> > connected together.
> 
> I doubt
> 
> (reduce (lambda (s1 s2) (concatenate 'string s1 s2)) strings)
> 
> is going to be problematic. For example, rendering CL data structures to
> a SVG file would be hard to do too slowly. Why not write this nearly all
> of the time and use WITH-OUTPUT-TO-STRING for the very few cases where
> extra speed is needed (if it is actually faster... have you profiled
> these)?

In case one has a bunch of strings and wants to concatenate
them, to use REDUCE is not that clever.

WITH-OUTPUT-TO-STRING is not that clever, either.

It is more clever to:

* allocate the result string (since you concatenate strings,
  one knows the lengths)

* copy each string into the result string at the correct position

-- 
http://lispm.dyndns.org/
From: Paul Donnelly
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <87k5gf2rbp.fsf@plap.localdomain>
Rainer Joswig <······@lisp.de> writes:

> In article <··············@plap.localdomain>,
>  Paul Donnelly <·············@sbcglobal.net> wrote:
>
>> Conrad <······@gmail.com> writes:
>> 
>> > Hi guys/gals:
>> >
>> > Suppose I want to build a big string out of smaller parts in CL in a
>> > fully functional way...
>> >
>> > For example, suppose that I want to build a string representing an SVG
>> > file (this would be the same for any string-based data structure
>> > though)
>> >
>> > The obvious way to do this is to create functions for SVG rectangles,
>> > ellipses, etc first. These would all return string representations of
>> > those objects. Next, one would write a 'build-svg function that might
>> > look like this:
>> >
>> > (defun build-svg (shapes)
>> >   (concatenate (build-svg-header)
>> >                (apply #'concatenate (mapcar (lambda (shape)
>> >                                               (case (car chape)
>> >                                                 (rectangle (apply
>> > #'rectangle (cdr shape)))
>> >                                                 (ellipse (apply
>> > #'ellipse (cdr shape)))
>> >                                                 ...))
>> >                                             shapes))
>> >                (build-svg-footer)))
>> >
>> > We've all written code like this before, building big strings from
>> > small strings in an iterative fashion. But as all of you would
>> > probably agree, building big strings like this using nested
>> > concatenations is inefficient, because each new concatenation might
>> > require unnecessary allocation operations as the string fragments are
>> > connected together.
>> 
>> I doubt
>> 
>> (reduce (lambda (s1 s2) (concatenate 'string s1 s2)) strings)
>> 
>> is going to be problematic. For example, rendering CL data structures to
>> a SVG file would be hard to do too slowly. Why not write this nearly all
>> of the time and use WITH-OUTPUT-TO-STRING for the very few cases where
>> extra speed is needed (if it is actually faster... have you profiled
>> these)?
>
> In case one has a bunch of strings and wants to concatenate
> them, to use REDUCE is not that clever.

That's not exactly the metric by which I judge code. REDUCE is clear and
probably good in most situations (I preferred it over APPLY since there
could potentially be a lot of strings -- although I'm aware that many
Lisps have a very high CALL-ARGUMENTS-LIMIT).

> WITH-OUTPUT-TO-STRING is not that clever, either.
>
> It is more clever to:
>
> * allocate the result string (since you concatenate strings,
>   one knows the lengths)
>
> * copy each string into the result string at the correct position
>
> -- 
> http://lispm.dyndns.org/
From: Rainer Joswig
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <joswig-8F19FA.11295024062008@news-europe.giganews.com>
In article <··············@plap.localdomain>,
 Paul Donnelly <·············@sbcglobal.net> wrote:

> Rainer Joswig <······@lisp.de> writes:
> 
> > In article <··············@plap.localdomain>,
> >  Paul Donnelly <·············@sbcglobal.net> wrote:
> >
> >> Conrad <······@gmail.com> writes:
> >> 
> >> > Hi guys/gals:
> >> >
> >> > Suppose I want to build a big string out of smaller parts in CL in a
> >> > fully functional way...
> >> >
> >> > For example, suppose that I want to build a string representing an SVG
> >> > file (this would be the same for any string-based data structure
> >> > though)
> >> >
> >> > The obvious way to do this is to create functions for SVG rectangles,
> >> > ellipses, etc first. These would all return string representations of
> >> > those objects. Next, one would write a 'build-svg function that might
> >> > look like this:
> >> >
> >> > (defun build-svg (shapes)
> >> >   (concatenate (build-svg-header)
> >> >                (apply #'concatenate (mapcar (lambda (shape)
> >> >                                               (case (car chape)
> >> >                                                 (rectangle (apply
> >> > #'rectangle (cdr shape)))
> >> >                                                 (ellipse (apply
> >> > #'ellipse (cdr shape)))
> >> >                                                 ...))
> >> >                                             shapes))
> >> >                (build-svg-footer)))
> >> >
> >> > We've all written code like this before, building big strings from
> >> > small strings in an iterative fashion. But as all of you would
> >> > probably agree, building big strings like this using nested
> >> > concatenations is inefficient, because each new concatenation might
> >> > require unnecessary allocation operations as the string fragments are
> >> > connected together.
> >> 
> >> I doubt
> >> 
> >> (reduce (lambda (s1 s2) (concatenate 'string s1 s2)) strings)
> >> 
> >> is going to be problematic. For example, rendering CL data structures to
> >> a SVG file would be hard to do too slowly. Why not write this nearly all
> >> of the time and use WITH-OUTPUT-TO-STRING for the very few cases where
> >> extra speed is needed (if it is actually faster... have you profiled
> >> these)?
> >
> > In case one has a bunch of strings and wants to concatenate
> > them, to use REDUCE is not that clever.
> 
> That's not exactly the metric by which I judge code.

Which metric? Basic algorithmic complexity for simple
algorithms that can be easily be coded in o(n) should not be
made worse.

If one writes in LIsp, then one may write lots of little building blocks.
Make sure that they scale, because much of the rest of the software
will depend on them.

> REDUCE is clear and
> probably good in most situations (I preferred it over APPLY since there
> could potentially be a lot of strings -- although I'm aware that many
> Lisps have a very high CALL-ARGUMENTS-LIMIT).

Alan Perlis said:

 A LISP programmer knows the value of everything, but the cost of nothing.


(defun not-so-clever-concatenate (strings)
  (reduce #'concatenate2 strings))

(defun concatenate2 (s1 s2)
  (format t "~% concatenating ~a and ~a" s1 s2)
  (let ((result (concatenate 'string s1 s2)))
      (format t "~% > to ~a" result)
      result))

CL-USER 2 > (not-so-clever-concatenate '("ab" "cd" "ef" "gh" "ij" "kl" "mn" "op" "qr" "st" "uv" "wx" "yz"))

 concatenating ab and cd
 > to abcd
 concatenating abcd and ef
 > to abcdef
 concatenating abcdef and gh
 > to abcdefgh
 concatenating abcdefgh and ij
 > to abcdefghij
 concatenating abcdefghij and kl
 > to abcdefghijkl
 concatenating abcdefghijkl and mn
 > to abcdefghijklmn
 concatenating abcdefghijklmn and op
 > to abcdefghijklmnop
 concatenating abcdefghijklmnop and qr
 > to abcdefghijklmnopqr
 concatenating abcdefghijklmnopqr and st
 > to abcdefghijklmnopqrst
 concatenating abcdefghijklmnopqrst and uv
 > to abcdefghijklmnopqrstuv
 concatenating abcdefghijklmnopqrstuv and wx
 > to abcdefghijklmnopqrstuvwx
 concatenating abcdefghijklmnopqrstuvwx and yz
 > to abcdefghijklmnopqrstuvwxyz

"abcdefghijklmnopqrstuvwxyz"


Looks downright stupid to me. If you wanted to design a string
concatenate that is creating the most amount of work,
above is a good candidate.





> 
> > WITH-OUTPUT-TO-STRING is not that clever, either.
> >
> > It is more clever to:
> >
> > * allocate the result string (since you concatenate strings,
> >   one knows the lengths)
> >
> > * copy each string into the result string at the correct position
> >
> > -- 
> > http://lispm.dyndns.org/

-- 
http://lispm.dyndns.org/
From: Paul Donnelly
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <877ice3ggp.fsf@plap.localdomain>
Rainer Joswig <······@lisp.de> writes:

> In article <··············@plap.localdomain>,
>  Paul Donnelly <·············@sbcglobal.net> wrote:
>
>> Rainer Joswig <······@lisp.de> writes:
>> 
>> > In article <··············@plap.localdomain>,
>> >  Paul Donnelly <·············@sbcglobal.net> wrote:
>> >
>> >> Conrad <······@gmail.com> writes:
>> >> 
>> >> > Hi guys/gals:
>> >> >
>> >> > Suppose I want to build a big string out of smaller parts in CL in a
>> >> > fully functional way...
>> >> >
>> >> > For example, suppose that I want to build a string representing an SVG
>> >> > file (this would be the same for any string-based data structure
>> >> > though)
>> >> >
>> >> > The obvious way to do this is to create functions for SVG rectangles,
>> >> > ellipses, etc first. These would all return string representations of
>> >> > those objects. Next, one would write a 'build-svg function that might
>> >> > look like this:
>> >> >
>> >> > (defun build-svg (shapes)
>> >> >   (concatenate (build-svg-header)
>> >> >                (apply #'concatenate (mapcar (lambda (shape)
>> >> >                                               (case (car chape)
>> >> >                                                 (rectangle (apply
>> >> > #'rectangle (cdr shape)))
>> >> >                                                 (ellipse (apply
>> >> > #'ellipse (cdr shape)))
>> >> >                                                 ...))
>> >> >                                             shapes))
>> >> >                (build-svg-footer)))
>> >> >
>> >> > We've all written code like this before, building big strings from
>> >> > small strings in an iterative fashion. But as all of you would
>> >> > probably agree, building big strings like this using nested
>> >> > concatenations is inefficient, because each new concatenation might
>> >> > require unnecessary allocation operations as the string fragments are
>> >> > connected together.
>> >> 
>> >> I doubt
>> >> 
>> >> (reduce (lambda (s1 s2) (concatenate 'string s1 s2)) strings)
>> >> 
>> >> is going to be problematic. For example, rendering CL data structures to
>> >> a SVG file would be hard to do too slowly. Why not write this nearly all
>> >> of the time and use WITH-OUTPUT-TO-STRING for the very few cases where
>> >> extra speed is needed (if it is actually faster... have you profiled
>> >> these)?
>> >
>> > In case one has a bunch of strings and wants to concatenate
>> > them, to use REDUCE is not that clever.
>> 
>> That's not exactly the metric by which I judge code.
>
> Which metric? Basic algorithmic complexity for simple
> algorithms that can be easily be coded in o(n) should not be
> made worse.
>
> If one writes in LIsp, then one may write lots of little building blocks.
> Make sure that they scale, because much of the rest of the software
> will depend on them.
>
>> REDUCE is clear and
>> probably good in most situations (I preferred it over APPLY since there
>> could potentially be a lot of strings -- although I'm aware that many
>> Lisps have a very high CALL-ARGUMENTS-LIMIT).
>
> Alan Perlis said:
>
>  A LISP programmer knows the value of everything,

Including my time. I think there's an appropriate time to do what you
suggest, but there's also an appropriate time not to bother.
From: Rainer Joswig
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <joswig-CFF9BD.19373024062008@news-europe.giganews.com>
In article <··············@plap.localdomain>,
 Paul Donnelly <·············@sbcglobal.net> wrote:

> Rainer Joswig <······@lisp.de> writes:
> 
> > In article <··············@plap.localdomain>,
> >  Paul Donnelly <·············@sbcglobal.net> wrote:
> >
> >> Rainer Joswig <······@lisp.de> writes:
> >> 
> >> > In article <··············@plap.localdomain>,
> >> >  Paul Donnelly <·············@sbcglobal.net> wrote:
> >> >
> >> >> Conrad <······@gmail.com> writes:
> >> >> 
> >> >> > Hi guys/gals:
> >> >> >
> >> >> > Suppose I want to build a big string out of smaller parts in CL in a
> >> >> > fully functional way...
> >> >> >
> >> >> > For example, suppose that I want to build a string representing an SVG
> >> >> > file (this would be the same for any string-based data structure
> >> >> > though)
> >> >> >
> >> >> > The obvious way to do this is to create functions for SVG rectangles,
> >> >> > ellipses, etc first. These would all return string representations of
> >> >> > those objects. Next, one would write a 'build-svg function that might
> >> >> > look like this:
> >> >> >
> >> >> > (defun build-svg (shapes)
> >> >> >   (concatenate (build-svg-header)
> >> >> >                (apply #'concatenate (mapcar (lambda (shape)
> >> >> >                                               (case (car chape)
> >> >> >                                                 (rectangle (apply
> >> >> > #'rectangle (cdr shape)))
> >> >> >                                                 (ellipse (apply
> >> >> > #'ellipse (cdr shape)))
> >> >> >                                                 ...))
> >> >> >                                             shapes))
> >> >> >                (build-svg-footer)))
> >> >> >
> >> >> > We've all written code like this before, building big strings from
> >> >> > small strings in an iterative fashion. But as all of you would
> >> >> > probably agree, building big strings like this using nested
> >> >> > concatenations is inefficient, because each new concatenation might
> >> >> > require unnecessary allocation operations as the string fragments are
> >> >> > connected together.
> >> >> 
> >> >> I doubt
> >> >> 
> >> >> (reduce (lambda (s1 s2) (concatenate 'string s1 s2)) strings)
> >> >> 
> >> >> is going to be problematic. For example, rendering CL data structures to
> >> >> a SVG file would be hard to do too slowly. Why not write this nearly all
> >> >> of the time and use WITH-OUTPUT-TO-STRING for the very few cases where
> >> >> extra speed is needed (if it is actually faster... have you profiled
> >> >> these)?
> >> >
> >> > In case one has a bunch of strings and wants to concatenate
> >> > them, to use REDUCE is not that clever.
> >> 
> >> That's not exactly the metric by which I judge code.
> >
> > Which metric? Basic algorithmic complexity for simple
> > algorithms that can be easily be coded in o(n) should not be
> > made worse.
> >
> > If one writes in LIsp, then one may write lots of little building blocks.
> > Make sure that they scale, because much of the rest of the software
> > will depend on them.
> >
> >> REDUCE is clear and
> >> probably good in most situations (I preferred it over APPLY since there
> >> could potentially be a lot of strings -- although I'm aware that many
> >> Lisps have a very high CALL-ARGUMENTS-LIMIT).
> >
> > Alan Perlis said:
> >
> >  A LISP programmer knows the value of everything,
> 
> Including my time. I think there's an appropriate time to do what you
> suggest, but there's also an appropriate time not to bother.

No, its the wrong attitude to write sloppy code and to not bother.
I do sometimes, too. But when we discuss here at length about
a certain problem, we should not end and promote the worst choice.

It is also useful to get a algorithm design 'mostly' right
while developing the code. It REALLY gets expensive in
maintenance to get rid of those wrong choices. That's where
my time gets valuable. I like to do new stuff and not
maintain my sloppy code and hunt for naive code.

-- 
http://lispm.dyndns.org/
From: Paul Donnelly
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <87zlpa45cz.fsf@plap.localdomain>
Rainer Joswig <······@lisp.de> writes:

> In article <··············@plap.localdomain>,
>  Paul Donnelly <·············@sbcglobal.net> wrote:
>
>> Rainer Joswig <······@lisp.de> writes:
>> 
>> > In article <··············@plap.localdomain>,
>> >  Paul Donnelly <·············@sbcglobal.net> wrote:
>> >
>> >> Rainer Joswig <······@lisp.de> writes:
>> >> 
>> >> > In article <··············@plap.localdomain>,
>> >> >  Paul Donnelly <·············@sbcglobal.net> wrote:
>> >> >
>> >> >> Conrad <······@gmail.com> writes:
>> >> >> 
>> >> >> > Hi guys/gals:
>> >> >> >
>> >> >> > Suppose I want to build a big string out of smaller parts in CL in a
>> >> >> > fully functional way...
>> >> >> >
>> >> >> > For example, suppose that I want to build a string representing an SVG
>> >> >> > file (this would be the same for any string-based data structure
>> >> >> > though)
>> >> >> >
>> >> >> > The obvious way to do this is to create functions for SVG rectangles,
>> >> >> > ellipses, etc first. These would all return string representations of
>> >> >> > those objects. Next, one would write a 'build-svg function that might
>> >> >> > look like this:
>> >> >> >
>> >> >> > (defun build-svg (shapes)
>> >> >> >   (concatenate (build-svg-header)
>> >> >> >                (apply #'concatenate (mapcar (lambda (shape)
>> >> >> >                                               (case (car chape)
>> >> >> >                                                 (rectangle (apply
>> >> >> > #'rectangle (cdr shape)))
>> >> >> >                                                 (ellipse (apply
>> >> >> > #'ellipse (cdr shape)))
>> >> >> >                                                 ...))
>> >> >> >                                             shapes))
>> >> >> >                (build-svg-footer)))
>> >> >> >
>> >> >> > We've all written code like this before, building big strings from
>> >> >> > small strings in an iterative fashion. But as all of you would
>> >> >> > probably agree, building big strings like this using nested
>> >> >> > concatenations is inefficient, because each new concatenation might
>> >> >> > require unnecessary allocation operations as the string fragments are
>> >> >> > connected together.
>> >> >> 
>> >> >> I doubt
>> >> >> 
>> >> >> (reduce (lambda (s1 s2) (concatenate 'string s1 s2)) strings)
>> >> >> 
>> >> >> is going to be problematic. For example, rendering CL data structures to
>> >> >> a SVG file would be hard to do too slowly. Why not write this nearly all
>> >> >> of the time and use WITH-OUTPUT-TO-STRING for the very few cases where
>> >> >> extra speed is needed (if it is actually faster... have you profiled
>> >> >> these)?
>> >> >
>> >> > In case one has a bunch of strings and wants to concatenate
>> >> > them, to use REDUCE is not that clever.
>> >> 
>> >> That's not exactly the metric by which I judge code.
>> >
>> > Which metric? Basic algorithmic complexity for simple
>> > algorithms that can be easily be coded in o(n) should not be
>> > made worse.
>> >
>> > If one writes in LIsp, then one may write lots of little building blocks.
>> > Make sure that they scale, because much of the rest of the software
>> > will depend on them.
>> >
>> >> REDUCE is clear and
>> >> probably good in most situations (I preferred it over APPLY since there
>> >> could potentially be a lot of strings -- although I'm aware that many
>> >> Lisps have a very high CALL-ARGUMENTS-LIMIT).
>> >
>> > Alan Perlis said:
>> >
>> >  A LISP programmer knows the value of everything,
>> 
>> Including my time. I think there's an appropriate time to do what you
>> suggest, but there's also an appropriate time not to bother.
>
> No, its the wrong attitude to write sloppy code and to not bother.
> I do sometimes, too. But when we discuss here at length about
> a certain problem, we should not end and promote the worst choice.

Fair enough. Using REDUCE turned out to be slower than I expected (the
need for continued allocation?). I was reacting mostly against your use
of the word "clever", which carries the bad connotation to me of writing
code that's more than the situation calls for -- but in retrospect, I
think your solution would be completely warranted more often than I
thought. Maybe I'll listen to my elders next time?

> It is also useful to get a algorithm design 'mostly' right
> while developing the code. It REALLY gets expensive in
> maintenance to get rid of those wrong choices. That's where
> my time gets valuable. I like to do new stuff and not
> maintain my sloppy code and hunt for naive code.
>
> -- 
> http://lispm.dyndns.org/
From: Rainer Joswig
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <joswig-810AAA.09465925062008@news-europe.giganews.com>
In article <··············@plap.localdomain>,
 Paul Donnelly <·············@sbcglobal.net> wrote:

> Fair enough. Using REDUCE turned out to be slower than I expected (the
> need for continued allocation?).

There are several factors that make the version of REDUCE with
concatenate slow:

* function call overhead of calling concatenate

* concatenate needs to allocate a new result array

* concatenate has to determine the right result type

* concatenate copies element by element the contents of the arrays,
  not only for l + m times

* concatenate gets called n-1 times and each time
  it has to copy the elements of the intermediate string
  and the next  string from the list to reduce into a new string.


CL-USER 4 > (test (list "ab" "cd" "ef" "gh" "ij" "km" "no" "pq" "rs" "tu" "vw" "xy")) 
"abcdefghijkmnopqrstuvwxy"
154

So, instead of just 26 characters, we now have 154 times moved a character.

Make the list four times longer:

CL-USER 6 > (test (list "ab" "cd" "ef" "gh" "ij" "km" "no" "pq" "rs" "tu" "vw" "xy"
            "ab" "cd" "ef" "gh" "ij" "km" "no" "pq" "rs" "tu" "vw" "xy"
            "ab" "cd" "ef" "gh" "ij" "km" "no" "pq" "rs" "tu" "vw" "xy"
            "ab" "cd" "ef" "gh" "ij" "km" "no" "pq" "rs" "tu" "vw" "xy"))
"abcdefghijkmnopqrstuvwxyabcdefghijkmnopqrstuvwxyabcdefghijkmnopqrstuvwxyabcdefghijkmnopqrstuvwxy"
2350

We now have moved characters 2350 times - instead of 104 times. It gets
costly to append many strings.


But this is nothing that should surprise us. Many Common Lisp implementations
are coming with source code you can look at. For example
if you are using something Emacs-like (Slime/Emacs, Zmacs, Fred, ...)
then type concatenate and press M-. If source for your Lisp
is available, the editor will show you the places where CONCATENATE
is defined. In Slime you get a two-buffer split window with the
source locations on top and a text buffer below. You can move
in the source locations with the cursor and pressing RETURN
will display the source in the buffer below.
This means you can look at all the tricks that the implementors
use and you can learn from it. M-. on symbols brings up the source.
If it is a symbol that is defined by your Lisp implementation,
then it will show you this source.


> I was reacting mostly against your use
> of the word "clever", which carries the bad connotation to me of writing
> code that's more than the situation calls for -- but in retrospect, I
> think your solution would be completely warranted more often than I
> thought. Maybe I'll listen to my elders next time?

-- 
http://lispm.dyndns.org/
From: Pascal J. Bourguignon
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <7cbq1pn6wc.fsf@pbourguignon.anevia.com>
Rainer Joswig <······@lisp.de> writes:

> In article <··············@plap.localdomain>,
>  Paul Donnelly <·············@sbcglobal.net> wrote:
>
>> Fair enough. Using REDUCE turned out to be slower than I expected (the
>> need for continued allocation?).
>
> There are several factors that make the version of REDUCE with
> concatenate slow:

There is also one factor that can make the version using REDUCE with
CONCATENATE fast:

  * DEFINE-COMPILER-MACRO

------------------------------                ------------------------------                    
Without the compiler macro:                   Same source, using REDUCE CONCATENATE, 
                                              with the compiler macro:
------------------------------                ------------------------------                    
52  (CALL1 8)  ; NOT-SO-CLEVER-CONCATENATE    52  (CALL1 8)  ; SMARTER-CONCATENATE-STRINGS      
                                                                                                
                                                                                                
                                                                                                
                                                                                                
concatenate 10 strings                        concatenate 10 strings                            
Real time: 3.7E-5 sec.                        Real time: 2.0E-5 sec.                            
Run time: 0.0 sec.                            Run time: 0.0 sec.                                
Space: 10944 Bytes                            Space: 2016 Bytes                                 
                                                                                                
                                                                                                
                                                                                                
concatenate 100 strings                       concatenate 100 strings                           
Real time: 0.00182 sec.                       Real time: 9.2E-5 sec.                            
Run time: 0.0 sec.                            Run time: 0.0 sec.                                
Space: 1011384 Bytes                          Space: 20016 Bytes                                
                                                                                                
                                                                                                
                                                                                                
concatenate 1000 strings                      concatenate 1000 strings                          
Real time: 0.304861 sec.                      Real time: 7.92E-4 sec.                           
Run time: 0.303314 sec.                       Run time: 0.0 sec.                                
Space: 100115784 Bytes                        Space: 200016 Bytes                               
GC: 14, GC time: 0.129991 sec.                                                                  
                                                                                                
                                                                                                
                                              concatenate 10000 strings                         
concatenate 10000 strings                     Real time: 0.007825 sec.                          
Real time: 33.39584 sec.                      Run time: 0.009999 sec.                           
Run time: 33.391155 sec.                      Space: 2000016 Bytes                              
Space: 10001159784 Bytes
GC: 1588, GC time: 15.975617 sec.
------------------------------                ------------------------------                


That's another strength of Lisp, that most concerns are orthogonal to
the others.  

You don't manage memory because there is a garbage collector to do it
for you, and specialists garbage collector designers to do a better
garbage collector for your application if the standard one isn't good
enough (it's already better than what you could do yourself).

Well, you don't manage complexity either, because there are hooks in
the compiler, so you can have specialists algorithmicians do it for
you.  Just write your pseudo-code, and have them, if needed, write the
define-compiler-macro needed to make your application fly.

Or is it expecting too much?  ;-)



Now the question is whether writting:

(defun smarter-concatenate-strings (strings)
  (loop
     :with result = (make-string (loop :for s :in strings :sum (length s)))
     :for s :in strings
     :for start = 0 :then (+ start (length s))
     :do (replace result s :start1 start)
     :finally (return result)))

is really more complex / harder than writting:

(defun not-so-clever-concatenate (strings)
  (reduce (lambda (a b) (concatenate 'string a b)) strings))

Perhaps not, the more so when it should already be in your utility package.



Notice also that SBCL with-output-to-string is extremely well optimized, so 

(defun concatenate-strings-with-io (strings)
  (with-output-to-string (*standard-output*) (dolist (s strings) (princ s))))

is actually faster than smarter-concatenate-string in sbcl (while consing more)..


---8<----------(test.lisp)-------------------------------------------------------

(unintern 'not-so-clever-concatenate) ; delete the compiler macro.

(defun concatenate2 (s1 s2)
  (concatenate 'string s1 s2))

(defun not-so-clever-concatenate (strings)
  (reduce #'concatenate2 strings))


(defun test ()
  (loop
     :repeat 4
     :for i = 10 :then (* i 10)
     :do (format t "~2%") (finish-output) (ext:gc)
     :do (format t "~2%concatenate ~D strings~%" i)
     :do (let ((s (make-list i :initial-element (make-string 200 :initial-element #\a))))
           (time (not-so-clever-concatenate s)))))

(mapc 'compile '(concatenate2 not-so-clever-concatenate test))

(format t "------------------------------~%")
#+clisp (format t "~A~%" (find-if (lambda (line) (regexp:match  "^.*CALL1 8.*$" line))
                                  (split-sequence:split-sequence #\newline (with-output-to-string (*standard-output*) (disassemble 'test)))))
(test)
(format t "------------------------------~%")



(defun concatenate2 (s1 s2)
  (concatenate 'string s1 s2))

(defun not-so-clever-concatenate (strings)
  (reduce #'concatenate2 strings))

(defun smarter-concatenate-strings (strings)
  (loop
     :with result = (make-string (loop :for s :in strings :sum (length s)))
     :for s :in strings
     :for start = 0 :then (+ start (length s))
     :do (replace result s :start1 start)
     :finally (return result)))

(define-compiler-macro not-so-clever-concatenate (strings)
  `(smarter-concatenate-strings ,strings))

(defun test ()
  (loop
     :repeat 4
     :for i = 10 :then (* i 10)
     :do (format t "~2%") (finish-output) (ext:gc)
     :do (format t "~2%concatenate ~D strings~%" i)
     :do (let ((s (make-list i :initial-element (make-string 200 :initial-element #\a))))
           (time (not-so-clever-concatenate s)))))

(mapc 'compile '(concatenate2 not-so-clever-concatenate smarter-concatenate-strings test))

#+clisp (format t "~A~%" (find-if (lambda (line) (regexp:match  "^.*CALL1 8.*$" line))
                                  (split-sequence:split-sequence #\newline (with-output-to-string (*standard-output*) (disassemble 'test)))))
(test)
(format t "------------------------------~%")



(defun test ()
  (loop
     :repeat 4
     :for i = 10 :then (* i 10)
     :do (format t "~2%") (finish-output) (ext:gc)
     :do (format t "~2%concatenate ~D strings~%" i)
     :do (let ((s (make-list i :initial-element (make-string 200 :initial-element #\a))))
           (time (smarter-concatenate-strings s)))))

(mapc 'compile '(test))

#+clisp (format t "~A~%" (find-if (lambda (line) (regexp:match  "^.*CALL1 8.*$" line))
                                  (split-sequence:split-sequence #\newline (with-output-to-string (*standard-output*) (disassemble 'test)))))
(test)
(format t "------------------------------~%")


---8<----------(test.lisp)-------------------------------------------------------



-- 
__Pascal Bourguignon__
From: Rainer Joswig
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <joswig-894FEB.12504125062008@news-europe.giganews.com>
In article <··············@pbourguignon.anevia.com>,
 ···@informatimago.com (Pascal J. Bourguignon) wrote:

...

> You don't manage memory because there is a garbage collector to do it

I do manage memory/allocation/reuse/... sometimes.

> for you, and specialists garbage collector designers to do a better
> garbage collector for your application if the standard one isn't good
> enough (it's already better than what you could do yourself).

Managing memory with a GC is just one technique. Then for many applications
one also needs to tune the GC for best performance.

> Well, you don't manage complexity either, because there are hooks in
> the compiler, so you can have specialists algorithmicians do it for
> you.  Just write your pseudo-code, and have them, if needed, write the
> define-compiler-macro needed to make your application fly.
> 
> Or is it expecting too much?  ;-)

Yes.

> Now the question is whether writting:
> 
> (defun smarter-concatenate-strings (strings)
>   (loop
>      :with result = (make-string (loop :for s :in strings :sum (length s)))
>      :for s :in strings
>      :for start = 0 :then (+ start (length s))
>      :do (replace result s :start1 start)
>      :finally (return result)))
> 
> is really more complex / harder than writting:
> 
> (defun not-so-clever-concatenate (strings)
>   (reduce (lambda (a b) (concatenate 'string a b)) strings))

I don't think so.

> Perhaps not, the more so when it should already be in your utility package.

Right.

> Notice also that SBCL with-output-to-string is extremely well optimized, so 
> 
> (defun concatenate-strings-with-io (strings)
>   (with-output-to-string (*standard-output*) (dolist (s strings) (princ s))))
> 
> is actually faster than smarter-concatenate-string in sbcl (while consing more)..

Any numbers? How is it done?

...

> 
> 
> 
> (defun concatenate2 (s1 s2)
>   (concatenate 'string s1 s2))
> 
> (defun not-so-clever-concatenate (strings)
>   (reduce #'concatenate2 strings))
> 
> (defun smarter-concatenate-strings (strings)
>   (loop
>      :with result = (make-string (loop :for s :in strings :sum (length s)))
>      :for s :in strings
>      :for start = 0 :then (+ start (length s))
>      :do (replace result s :start1 start)
>      :finally (return result)))
> 
> (define-compiler-macro not-so-clever-concatenate (strings)
>   `(smarter-concatenate-strings ,strings))

I don't see the point of the compiler-macro here. compiler-macros are useful, I agree.
But in this case I'd just redefine the original function and that's it. 

The compiler macro gets more useful when it can choose between several implementations
based on some compile-time knowledge.

-- 
http://lispm.dyndns.org/
From: Pascal J. Bourguignon
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <7c7icdn2up.fsf@pbourguignon.anevia.com>
Rainer Joswig <······@lisp.de> writes:
>> Notice also that SBCL with-output-to-string is extremely well optimized, so 
>> 
>> (defun concatenate-strings-with-io (strings)
>>   (with-output-to-string (*standard-output*) (dolist (s strings) (princ s))))
>> 
>> is actually faster than smarter-concatenate-string in sbcl (while consing more)..
>
> Any numbers? How is it done?


I've not read the source, but I assume it's because the allocation is
very fast, and the reallocation "faster".  I'd bet it'd be slower if
we were accumulating into two or more strings streams at the same
time.  Also, I suppose having a fast RAM bus and a big L2 cache helps
not slowing it down.



Using REPLACE                               Using WITH-OUTPUT-TO-STRING
(smarter-concatenate-strings)               (concatenate-strings-with-io)
------------------------------              ------------------------------
concatenate 10 strings                      concatenate 10 strings                              
Evaluation took:                            Evaluation took:                                    
  0.0 seconds of real time                    0.0 seconds of real time                          
  0.0 seconds of user run time                0.0 seconds of user run time                      
  0.0 seconds of system run time              0.0 seconds of system run time                    
  0 calls to %EVAL                            0 calls to %EVAL                                  
  0 page faults and                           0 page faults and                                 
  1,792 bytes consed.                         15,024 bytes consed.                              
                                                                                                
                                                                                                
                                                                                                
                                                                                                
concatenate 100 strings                     concatenate 100 strings                             
Evaluation took:                            Evaluation took:                                    
  0.001 seconds of real time                  0.0 seconds of real time                          
  0.0 seconds of user run time                0.0 seconds of user run time                      
  0.0 seconds of system run time              0.0 seconds of system run time                    
  0 calls to %EVAL                            0 calls to %EVAL                                  
  0 page faults and                           0 page faults and                                 
  80,032 bytes consed.                        173,376 bytes consed.                             
                                                                                                
                                                                                                
                                                                                                
                                                                                                
concatenate 1000 strings                    concatenate 1000 strings                            
Evaluation took:                            Evaluation took:                                    
  0.004 seconds of real time                  0.001 seconds of real time                        
  0.003333 seconds of user run time           0.0 seconds of user run time                      
  0.0 seconds of system run time              0.0 seconds of system run time                    
  0 calls to %EVAL                            0 calls to %EVAL                                  
  0 page faults and                           0 page faults and                                 
  800,032 bytes consed.                       1,608,288 bytes consed.                           
                                                                                                
                                                                                                
                                                                                                
                                                                                                
concatenate 10000 strings                   concatenate 10000 strings                           
Evaluation took:                            Evaluation took:                                    
  0.041 seconds of real time                  0.017 seconds of real time                        
  0.039998 seconds of user run time           0.003333 seconds of user run time                 
  0.0 seconds of system run time              0.013332 seconds of system run time               
  0 calls to %EVAL                            [Run times include 0.003 seconds GC run time.]    
  0 page faults and                           0 calls to %EVAL                                  
  8,000,032 bytes consed.                     0 page faults and                                 
------------------------------                21,098,688 bytes consed.                          
                                            ------------------------------                      

-- 
__Pascal Bourguignon__
From: Paul Donnelly
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <87r6algv1s.fsf@plap.localdomain>
Rainer Joswig <······@lisp.de> writes:

> In article <··············@plap.localdomain>,
>  Paul Donnelly <·············@sbcglobal.net> wrote:
>
>> Fair enough. Using REDUCE turned out to be slower than I expected (the
>> need for continued allocation?).
>
> There are several factors that make the version of REDUCE with
> concatenate slow:
>
> * function call overhead of calling concatenate
>
> * concatenate needs to allocate a new result array
>
> * concatenate has to determine the right result type
>
> * concatenate copies element by element the contents of the arrays,
>   not only for l + m times

Ah, there's my mistake. I totally missed that; thanks for explaining
this.
From: Matthew D Swank
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <dO88k.169$g32.18@newsfe07.lga>
On Tue, 24 Jun 2008 11:29:50 +0200, Rainer Joswig wrote:

> In article <··············@plap.localdomain>,
>  Paul Donnelly <·············@sbcglobal.net> wrote:
> 
>> Rainer Joswig <······@lisp.de> writes:
>> 
>> > In article <··············@plap.localdomain>,
>> >  Paul Donnelly <·············@sbcglobal.net> wrote:
>> >
>> >> Conrad <······@gmail.com> writes:
>> >> 
>> >> > Hi guys/gals:
>> >> >
>> >> > Suppose I want to build a big string out of smaller parts in CL in
>> >> > a fully functional way...
>> >> >
>> >> > For example, suppose that I want to build a string representing an
>> >> > SVG file (this would be the same for any string-based data
>> >> > structure though)
>> >> >
>> >> > The obvious way to do this is to create functions for SVG
>> >> > rectangles, ellipses, etc first. These would all return string
>> >> > representations of those objects. Next, one would write a
>> >> > 'build-svg function that might look like this:
>> >> >
>> >> > (defun build-svg (shapes)
>> >> >   (concatenate (build-svg-header)
>> >> >                (apply #'concatenate (mapcar (lambda (shape)
>> >> >                                               (case (car chape)
>> >> >                                                 (rectangle (apply
>> >> > #'rectangle (cdr shape)))
>> >> >                                                 (ellipse (apply
>> >> > #'ellipse (cdr shape)))
>> >> >                                                 ...))
>> >> >                                             shapes))
>> >> >                (build-svg-footer)))
>> >> >
>> >> > We've all written code like this before, building big strings from
>> >> > small strings in an iterative fashion. But as all of you would
>> >> > probably agree, building big strings like this using nested
>> >> > concatenations is inefficient, because each new concatenation
>> >> > might require unnecessary allocation operations as the string
>> >> > fragments are connected together.
>> >> 
>> >> I doubt
>> >> 
>> >> (reduce (lambda (s1 s2) (concatenate 'string s1 s2)) strings)
>> >> 
>> >> is going to be problematic. For example, rendering CL data
>> >> structures to a SVG file would be hard to do too slowly. Why not
>> >> write this nearly all of the time and use WITH-OUTPUT-TO-STRING for
>> >> the very few cases where extra speed is needed (if it is actually
>> >> faster... have you profiled these)?
>> >
>> > In case one has a bunch of strings and wants to concatenate them, to
>> > use REDUCE is not that clever.
>> 
>> That's not exactly the metric by which I judge code.
> 
> Which metric? Basic algorithmic complexity for simple algorithms that
> can be easily be coded in o(n) should not be made worse.
> 
> If one writes in LIsp, then one may write lots of little building
> blocks. Make sure that they scale, because much of the rest of the
> software will depend on them.
> 
>> REDUCE is clear and
>> probably good in most situations (I preferred it over APPLY since there
>> could potentially be a lot of strings -- although I'm aware that many
>> Lisps have a very high CALL-ARGUMENTS-LIMIT).
> 
> Alan Perlis said:
> 
>  A LISP programmer knows the value of everything, but the cost of
>  nothing.
> 
> 
> (defun not-so-clever-concatenate (strings)
>   (reduce #'concatenate2 strings))
> 
> (defun concatenate2 (s1 s2)
>   (format t "~% concatenating ~a and ~a" s1 s2) (let ((result
>   (concatenate 'string s1 s2)))
>       (format t "~% > to ~a" result)
>       result))
> 
> CL-USER 2 > (not-so-clever-concatenate '("ab" "cd" "ef" "gh" "ij" "kl"
> "mn" "op" "qr" "st" "uv" "wx" "yz"))
> 
>  concatenating ab and cd
>  > to abcd
>  concatenating abcd and ef
>  > to abcdef
>  concatenating abcdef and gh
>  > to abcdefgh
>  concatenating abcdefgh and ij
>  > to abcdefghij
>  concatenating abcdefghij and kl
>  > to abcdefghijkl
>  concatenating abcdefghijkl and mn
>  > to abcdefghijklmn
>  concatenating abcdefghijklmn and op
>  > to abcdefghijklmnop
>  concatenating abcdefghijklmnop and qr
>  > to abcdefghijklmnopqr
>  concatenating abcdefghijklmnopqr and st
>  > to abcdefghijklmnopqrst
>  concatenating abcdefghijklmnopqrst and uv
>  > to abcdefghijklmnopqrstuv
>  concatenating abcdefghijklmnopqrstuv and wx
>  > to abcdefghijklmnopqrstuvwx
>  concatenating abcdefghijklmnopqrstuvwx and yz
>  > to abcdefghijklmnopqrstuvwxyz
> 
> "abcdefghijklmnopqrstuvwxyz"
> 
> 
> Looks downright stupid to me. If you wanted to design a string
> concatenate that is creating the most amount of work, above is a good
> candidate.
> 
>

This has more to do with the fact that vectors aren't very amenable to 
manipulation using functional idioms, than the some sort of inherent 
"stupidity" in visualizing sequence operations in terms of recursive 
relations.  Also, even though the functional version creates a lot of new 
vectors, aren't the intermediate strings available for immediate garbage 
collection?  

We have all these great tools in lisp that allow the programmer to think 
at the level of the problem and not at the level of the machine, but 
people still get punished in this group for not thinking like C 
programmers.  Make your code modular, profile when you have problems, and 
save the ugly code for optimization.

Matt
From: John Thingstad
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <op.uc9hazxxut4oq5@pandora.alfanett.no>
P� Tue, 24 Jun 2008 17:55:53 +0200, skrev Matthew D Swank  
<··················@gmail.com>:


>
> This has more to do with the fact that vectors aren't very amenable to
> manipulation using functional idioms, than the some sort of inherent
> "stupidity" in visualizing sequence operations in terms of recursive
> relations.  Also, even though the functional version creates a lot of new
> vectors, aren't the intermediate strings available for immediate garbage
> collection?
>

Fact remains that printing to a character stream is a lot more efficient  
because of the huge overhead of copying going on in the reduce concatenate  
version.
Better still it scales better as each class can provide it's own  
print-object.
So both from a design and a efficiency view streams are the better choice.

--------------
John Thingstad
From: Matthew D Swank
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <hCe8k.2036$8n4.569@newsfe06.lga>
On Tue, 24 Jun 2008 18:26:49 +0200, John Thingstad wrote:

> På Tue, 24 Jun 2008 17:55:53 +0200, skrev Matthew D Swank
> <··················@gmail.com>:
> 
> 
> 
>> This has more to do with the fact that vectors aren't very amenable to
>> manipulation using functional idioms, than the some sort of inherent
>> "stupidity" in visualizing sequence operations in terms of recursive
>> relations.  Also, even though the functional version creates a lot of
>> new vectors, aren't the intermediate strings available for immediate
>> garbage collection?
>>
>>
> Fact remains that printing to a character stream is a lot more efficient
> because of the huge overhead of copying going on in the reduce
> concatenate version.
> Better still it scales better as each class can provide it's own
> print-object.
> So both from a design and a efficiency view streams are the better
> choice.
> 

I actually like the stream method, but the original poster was asking 
about efficient, functional string building.  The short answer probably 
should have been "Don't bother; strings are vectors, and it's not worth 
it".  

The nice thing about using streams for functional programmers, is that it 
makes string building look like i/o (Haskell and ML) or message passing 
(Termite and  Erlang) which functional programmers are used to sequencing 
anyway.

Does anyone know how does Clojure handles string building with its 
functional collections?
  

Matt

-- 
"You do not really understand something unless you can explain it to your 
grandmother." -- Albert Einstein.
From: Rich Hickey
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <43fa4d9d-cffc-4a9c-a15d-af722bcb5bda@y21g2000hsf.googlegroups.com>
On Jun 24, 6:32 pm, Matthew D Swank <··················@gmail.com>
wrote:
> On Tue, 24 Jun 2008 18:26:49 +0200, John Thingstad wrote:
> > På Tue, 24 Jun 2008 17:55:53 +0200, skrev Matthew D Swank
> > <··················@gmail.com>:
>
> >> This has more to do with the fact that vectors aren't very amenable to
> >> manipulation using functional idioms, than the some sort of inherent
> >> "stupidity" in visualizing sequence operations in terms of recursive
> >> relations.  Also, even though the functional version creates a lot of
> >> new vectors, aren't the intermediate strings available for immediate
> >> garbage collection?
>
> > Fact remains that printing to a character stream is a lot more efficient
> > because of the huge overhead of copying going on in the reduce
> > concatenate version.
> > Better still it scales better as each class can provide it's own
> > print-object.
> > So both from a design and a efficiency view streams are the better
> > choice.
>
> I actually like the stream method, but the original poster was asking
> about efficient, functional string building.  The short answer probably
> should have been "Don't bother; strings are vectors, and it's not worth
> it".
>
> The nice thing about using streams for functional programmers, is that it
> makes string building look like i/o (Haskell and ML) or message passing
> (Termite and  Erlang) which functional programmers are used to sequencing
> anyway.
>
> Does anyone know how does Clojure handles string building with its
> functional collections?
>

On Jun 24, 6:32 pm, Matthew D Swank <··················@gmail.com>
wrote:
> On Tue, 24 Jun 2008 18:26:49 +0200, John Thingstad wrote:
> > På Tue, 24 Jun 2008 17:55:53 +0200, skrev Matthew D Swank
> > <··················@gmail.com>:
>
> >> This has more to do with the fact that vectors aren't very amenable to
> >> manipulation using functional idioms, than the some sort of inherent
> >> "stupidity" in visualizing sequence operations in terms of recursive
> >> relations.  Also, even though the functional version creates a lot of
> >> new vectors, aren't the intermediate strings available for immediate
> >> garbage collection?
>
> > Fact remains that printing to a character stream is a lot more efficient
> > because of the huge overhead of copying going on in the reduce
> > concatenate version.
> > Better still it scales better as each class can provide it's own
> > print-object.
> > So both from a design and a efficiency view streams are the better
> > choice.
>
> I actually like the stream method, but the original poster was asking
> about efficient, functional string building.  The short answer probably
> should have been "Don't bother; strings are vectors, and it's not worth
> it".
>
> The nice thing about using streams for functional programmers, is that it
> makes string building look like i/o (Haskell and ML) or message passing
> (Termite and  Erlang) which functional programmers are used to sequencing
> anyway.
>
> Does anyone know how does Clojure handles string building with its
> functional collections?
>

There are many efficient options:

You can, as recommended here for CL, use print functions.

str is the Clojure string-izing function, and takes a rest arg,

    (apply str strings)

will work, and efficiently use a stream-like string builder
internally. Clojure's variadic fns have no argument limit.

You can use concat, which is lazy in Clojure and thus makes no
realized interim copies.

You can, in a recursive manner, use a Clojure vector, which is a
persistent data structure that supports constant-time append.

Rich
From: Thomas A. Russ
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <ymiwskd5wih.fsf@blackcat.isi.edu>
OK, in all of this, I didn't see any mention of just piling on the
arguments to CONCATENATE directly.  It is defined with an &rest argument
list, so one could just do:

    (apply #'concatenate 'string strings)

and then let the implmentation worry about doing the concatenation
efficiently.  If you can write the form directly, perhaps via a macro,
then you also make possible compiler-macro optimizations.  I note that
Allegro CL has a compiler macro for CONCATENATE.

Now, the only drawback to the APPLY approach is that one is limited to
CALL-ARGUMENTS-LIMIT parameters, but if that is going to be a problem
for the application, then one clearly can't go with a naive
implementation, since the O(n^2) complexity will kill you with large
numbers of strings to concatenate.

I suppose my preferred answer to the original question would be that it
is silly to insist upon a functional programming solution to this
problem.  One shouldn't have to work inside self-imposed straight
jackets when there are better approaches available using other
paradigms.


-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: Pascal J. Bourguignon
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <87hcbhjra9.fsf@hubble.informatimago.com>
···@sevak.isi.edu (Thomas A. Russ) writes:

> OK, in all of this, I didn't see any mention of just piling on the
> arguments to CONCATENATE directly.  It is defined with an &rest argument
> list, so one could just do:
>
>     (apply #'concatenate 'string strings)

No, this is wrong.

You have to write:

(if (< (+ 2 (length strings)) call-arguments-limit)
    (apply (function concatenate) 'string strings)
    (smarter-concatenate-string strings))

;; Well you mentionned call-arguments-limit, ok.


> I suppose my preferred answer to the original question would be that it
> is silly to insist upon a functional programming solution to this
> problem.  One shouldn't have to work inside self-imposed straight
> jackets when there are better approaches available using other
> paradigms.

I'm not sure, but I heard functionnal programming guys invented monads
to be able to keep track of state, so I guess we could still use SETF
or REPLACE and call it purely functionnal, with an implicit monad.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Pour moi, la grande question n'a jamais �t�: �Qui suis-je? O� vais-je?� 
comme l'a formul� si adroitement notre ami Pascal, mais plut�t: 
�Comment vais-je m'en tirer?� -- Jean Yanne
From: Rainer Joswig
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <joswig-A85D5B.19105424062008@news-europe.giganews.com>
In article <················@newsfe07.lga>,
 Matthew D Swank <··················@gmail.com> wrote:

> > (defun not-so-clever-concatenate (strings)
> >   (reduce #'concatenate2 strings))
> > 
> > (defun concatenate2 (s1 s2)
> >   (format t "~% concatenating ~a and ~a" s1 s2) (let ((result
> >   (concatenate 'string s1 s2)))
> >       (format t "~% > to ~a" result)
> >       result))
> > 
> > CL-USER 2 > (not-so-clever-concatenate '("ab" "cd" "ef" "gh" "ij" "kl"
> > "mn" "op" "qr" "st" "uv" "wx" "yz"))
> > 
> >  concatenating ab and cd
> >  > to abcd
> >  concatenating abcd and ef
> >  > to abcdef
> >  concatenating abcdef and gh
> >  > to abcdefgh
> >  concatenating abcdefgh and ij
> >  > to abcdefghij
> >  concatenating abcdefghij and kl
> >  > to abcdefghijkl
> >  concatenating abcdefghijkl and mn
> >  > to abcdefghijklmn
> >  concatenating abcdefghijklmn and op
> >  > to abcdefghijklmnop
> >  concatenating abcdefghijklmnop and qr
> >  > to abcdefghijklmnopqr
> >  concatenating abcdefghijklmnopqr and st
> >  > to abcdefghijklmnopqrst
> >  concatenating abcdefghijklmnopqrst and uv
> >  > to abcdefghijklmnopqrstuv
> >  concatenating abcdefghijklmnopqrstuv and wx
> >  > to abcdefghijklmnopqrstuvwx
> >  concatenating abcdefghijklmnopqrstuvwx and yz
> >  > to abcdefghijklmnopqrstuvwxyz
> > 
> > "abcdefghijklmnopqrstuvwxyz"
> > 
> > 
> > Looks downright stupid to me. If you wanted to design a string
> > concatenate that is creating the most amount of work, above is a good
> > candidate.
> > 
> >
> 
> This has more to do with the fact that vectors aren't very amenable to 
> manipulation using functional idioms, than the some sort of inherent 
> "stupidity" in visualizing sequence operations in terms of recursive 
> relations.  Also, even though the functional version creates a lot of new 
> vectors, aren't the intermediate strings available for immediate garbage 
> collection?  
> 
> We have all these great tools in lisp that allow the programmer to think 
> at the level of the problem and not at the level of the machine, but 
> people still get punished in this group for not thinking like C 
> programmers.  Make your code modular, profile when you have problems, and 
> save the ugly code for optimization.
> 
> Matt

That's the type of code that gave Lisp a bad reputation for being
slow. We should have learned to do better by now. Lisp is not magically
saving us from choosing the right algorithms with the best
complexity. All these inefficiencies will add up and let
the software burn useless cycles. We would keep the GC busy
and the combination of bad implementations does not scale.
We would be beaten by it soon one leaves toy examples and all
these time wasters add up.

We are not even talking about 'optimization' here. It is
just about **not** increasing the algorithmic complexity.

 http://en.wikipedia.org/wiki/Algorithmic_complexity

 http://en.wikipedia.org/wiki/Big_O_notation

That's one of the things students should learn in computer
science courses very early - to choose the right algorithms
for the data structures. It has nothing to do if you
write in C or Lisp or something else. The complexity
of algorithms and their implementations is independent
of the programming language. Also nobody punishes anybody here.
This newsgroup should help people write better code.
Encouraging people to use totally naive algorithms hurts every
user. One looks at it and can easily see what the complexity
of the implementation is and see that this is bad. Lisp code
possibly survives us - do we really want concatenate a list of strings
like that? After 50 years of Lisp? I hope not.

-- 
http://lispm.dyndns.org/
From: Edi Weitz
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <uzlpbkzks.fsf@agharta.de>
On Tue, 24 Jun 2008 09:35:04 +0200, Rainer Joswig <······@lisp.de> wrote:

> In case one has a bunch of strings and wants to concatenate them, to
> use REDUCE is not that clever.
>
> WITH-OUTPUT-TO-STRING is not that clever, either.
>
> It is more clever to:
>
> * allocate the result string (since you concatenate strings, one
> knows the lengths)
>
> * copy each string into the result string at the correct position

It should be noted that you can allocate a string of the right length
and then use WITH-OUTPUT-TO-STRING with this string as its target.
The string has to have a fill pointer, though, and a specialized
solution will likely still be a bit faster, but in many cases
WITH-OUTPUT-TO-STRING is more elegant.

Edi.

-- 

Lisp is not dead, it just smells funny.

Real email: (replace (subseq ·········@agharta.de" 5) "edi")
From: Rainer Joswig
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <joswig-00AC95.11175624062008@news-europe.giganews.com>
In article <·············@agharta.de>, Edi Weitz <········@agharta.de> 
wrote:

> On Tue, 24 Jun 2008 09:35:04 +0200, Rainer Joswig <······@lisp.de> wrote:
> 
> > In case one has a bunch of strings and wants to concatenate them, to
> > use REDUCE is not that clever.
> >
> > WITH-OUTPUT-TO-STRING is not that clever, either.
> >
> > It is more clever to:
> >
> > * allocate the result string (since you concatenate strings, one
> > knows the lengths)
> >
> > * copy each string into the result string at the correct position
> 
> It should be noted that you can allocate a string of the right length
> and then use WITH-OUTPUT-TO-STRING with this string as its target.
> The string has to have a fill pointer, though, and a specialized
> solution will likely still be a bit faster, but in many cases
> WITH-OUTPUT-TO-STRING is more elegant.

But then you have to go though the STREAM output machinery...

> 
> Edi.

-- 
http://lispm.dyndns.org/
From: Brian
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <9c0a7649-2c69-43e7-99bb-524798ca5f4b@z72g2000hsb.googlegroups.com>
On Jun 24, 2:35 am, Rainer Joswig <······@lisp.de> wrote:
> It is more clever to:
>
> * allocate the result string (since you concatenate strings,
>   one knows the lengths)
>
> * copy each string into the result string at the correct position
>
> --http://lispm.dyndns.org/
A macro to do that isn't so hard to write:

(defmacro adding-strings (&body body)
  "Returns the concatenation of all strings that ADD-STRING was called
on in the body of the form."
  (let ((list (gensym))
        (end (gensym))
        (string (gensym))
        (count (gensym)))
    `(let* ((,list (list nil))
            (,end ,list)
            ,string)
       (flet ((add-string (string)
                (setf (cdr ,end) (cons string nil)
                      ,end (cdr ,end))))
         ,@body)
       (setf ,string (make-string (reduce #'+ (cdr ,list) :key
#'length)))
       (let ((,count 0))
         (dolist (str (cdr ,list))
           (setf (subseq ,string ,count) str)
           (incf ,count (length str))))
       ,string)))
From: Rainer Joswig
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <joswig-88B759.00045025062008@news-europe.giganews.com>
In article 
<····································@z72g2000hsb.googlegroups.com>,
 Brian <··············@gmail.com> wrote:

> On Jun 24, 2:35�am, Rainer Joswig <······@lisp.de> wrote:
> > It is more clever to:
> >
> > * allocate the result string (since you concatenate strings,
> > � one knows the lengths)
> >
> > * copy each string into the result string at the correct position
> >
> > --http://lispm.dyndns.org/
> A macro to do that isn't so hard to write:
> 
> (defmacro adding-strings (&body body)
>   "Returns the concatenation of all strings that ADD-STRING was called
> on in the body of the form."
>   (let ((list (gensym))
>         (end (gensym))
>         (string (gensym))
>         (count (gensym)))
>     `(let* ((,list (list nil))
>             (,end ,list)
>             ,string)
>        (flet ((add-string (string)
>                 (setf (cdr ,end) (cons string nil)
>                       ,end (cdr ,end))))
>          ,@body)
>        (setf ,string (make-string (reduce #'+ (cdr ,list) :key
> #'length)))
>        (let ((,count 0))
>          (dolist (str (cdr ,list))
>            (setf (subseq ,string ,count) str)
>            (incf ,count (length str))))
>        ,string)))



From the MIT CADR Lisp Machine source from 1980 (probably written earlier):

http://www.unlambda.com/lisp/mit.page

 >lispm2>string.LISP.54

; STRING-APPEND concatenates any number of strings.
; It will copy a single string.  Numbers and symbols are coerced.
; Actually, it will concatenate any type of arrays, making the result the same
; type as its first argument.

(DEFUN STRING-APPEND (&REST STRINGS &AUX (LENGTH 0) STRING COERCED (I 0))
    (DOLIST (S STRINGS)
      (SETQ LENGTH (+ LENGTH (STRING-LENGTH S))))
    (SETQ STRING (MAKE-ARRAY NIL
              (IF (ARRAYP (CAR STRINGS)) (ARRAY-TYPE (CAR STRINGS))
             'ART-STRING)
              LENGTH))
    (DOLIST (S STRINGS)
      (COND ((NUMBERP S)
        (ASET S STRING I)
        (SETQ I (1+ I)))
       (T (SETQ COERCED (STRING S))
          (COPY-ARRAY-PORTION COERCED 0 (SETQ LENGTH (ARRAY-ACTIVE-LENGTH COERCED))
               STRING I (SETQ I (+ I LENGTH))))))
    STRING)


Now that's not Common Lisp, but similar (Lisp Machine Lisp).

-- 
http://lispm.dyndns.org/
From: ···············@gmail.com
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <de00e9b3-1881-4eed-a21e-746faa80ab4d@a9g2000prl.googlegroups.com>
On Jun 24, 7:08 am, Conrad <······@gmail.com> wrote:
> Hi guys/gals:
>
> Suppose I want to build a big string out of smaller parts in CL in a
> fully functional way...
>
> For example, suppose that I want to build a string representing an SVG
> file (this would be the same for any string-based data structure
> though)
>
> The obvious way to do this is to create functions for SVG rectangles,
> ellipses, etc first. These would all return string representations of
> those objects. Next, one would write a 'build-svg function that might
> look like this:
>
> (defun build-svg (shapes)
>   (concatenate (build-svg-header)
>                (apply #'concatenate (mapcar (lambda (shape)
>                                               (case (car chape)
>                                                 (rectangle (apply
> #'rectangle (cdr shape)))
>                                                 (ellipse (apply
> #'ellipse (cdr shape)))
>                                                 ...))
>                                             shapes))
>                (build-svg-footer)))
>
> We've all written code like this before, building big strings from
> small strings in an iterative fashion. But as all of you would
> probably agree, building big strings like this using nested
> concatenations is inefficient, because each new concatenation might
> require unnecessary allocation operations as the string fragments are
> connected together.
>
> In lazy languages, such as Haskell, I believe the inefficiency of
> these concatenations is usually not an issue. But the majority of
> languages, including CL, require a less naive approach to string
> building, using either "string builders" or some kind of stream-based
> approach.
>
> Consequently, a more appropriate way to build strings in CL, if I
> understand the problem correctly, would be as follows:
>
> (defun build-svg (shapes)
>   (with-output-to-string
>       (princ (build-svg-header))
>       (mapc (lambda (shape)
>               (case (car chape)
>                 (rectangle (princ (apply #'rectangle (cdr shape))))
>                 (ellipse (princ (apply #'ellipse (cdr shape))))
>                 ...))
>             shapes)
>       (princ (build-svg-footer))))
>
> Here, we use the properties of output streams to collect the
> individual strings through 'princ and then collect the results with
> 'with-output-to-string. This approach is a much more efficient way of
> iteratively accumulating small strings- but this code is, of course,
> not functional.
>
> My question is therefore as follows: What are the "best practices" for
> accumulating strings in a purely functional way in CL? Is it to just
> concatenate strings as I do in my first example, despite the
> inefficiency? Or, is there a way to do write pure FP CL that has an
> improved performance profile, as shown in my second example?
>
> I suspect the answer is "No", but I don't know for sure...
>
> Thanks in advance for any insights/advice you may have into pure FP CL
> string accumulation :-)

There is a somewhat related discussion on generating build commands
("gcc ..." etc.) here:
http://groups.google.com/group/comp.lang.lisp/msg/1d5fdf6b1dd0f0ea

The idea is to keep everything as lists as long as possible
and convert it to a string only when needed (e.g. writing to a
file). As the list can contain any data type like symbols, strings
numbers etc. its very efficient in space and speed. E.g. a list
for a compiler command could be:

'("abc" 1 #\Space 23 #P"file.txt")

Here are some useful methods that I have used based on
recommendations in the above thread.

(defun interleave (lst &key (pre "") (post #\Space))
  "(interleave '('a' 'b' 'c' 'd') ' ') => ('a' ' ' 'b' ' ' 'c' ' '
'd')"
  (loop for first = t then nil
        for item in lst
        collect pre
        collect item
        collect post))

The pre and post can be used to add prefix or suffix.

(defun lst->str (lst &key (pre "") (post #\Space))
  (format nil "~{~A~^~}" (interleave lst :pre pre :post post)))

I found format to be a very convenient way to convert these
list to string just before sending them to a stream.

--
Parth
From: Scott Burson
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <09d6f099-d0b9-4e87-8f8e-d8c2a7a29f36@h1g2000prh.googlegroups.com>
On Jun 23, 7:08 pm, Conrad <······@gmail.com> wrote:
> Thanks in advance for any insights/advice you may have into pure FP CL
> string accumulation :-)

If all you're doing is accumulation, the other suggestions here will
suffice, but if you want to do FP in CL more generally, my FSet
library can be very handy:

http://common-lisp.net/project/fset/

FSet gives you: pure sets with log-time membership testing, log-time
insertion, and linear-time union, intersection, and set-difference;
pure maps with log-time lookup and update; and pure sequences with log-
time concatenation and subsequence operations (and I'm just hitting
the highlights here; there are many more operations defined).  All
these types are implemented with heterogeneous binary trees for space
efficiency.

-- Scott
From: Stephen Compall
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <m21w2nldbj.fsf@nocandy.dyndns.org>
Conrad <······@gmail.com> writes:
> My question is therefore as follows: What are the "best practices" for
> accumulating strings in a purely functional way in CL? Is it to just
> concatenate strings as I do in my first example, despite the
> inefficiency? Or, is there a way to do write pure FP CL that has an
> improved performance profile, as shown in my second example?

Define your own string type, say, fstring, where an fstring is:

1. A string; or
2. A simple vector of fstrings.

(declaim (inline concatenate-fstring))
(defun concatenate-fstring (&rest strings)
  (apply #'vector strings))

(defun fstring->string (fstring)
  (with-output-to-string output
    (labels ((descend (fstring)
               (typecase fstring
                 (string (write-string fstring))
                 (otherwise (loop for sub across fstring
                                  do (descend sub))))))
      (descend fstring))))

This has the nice property of isolating the side-effecty code, and the
less nice property of overflowing the stack if your fstrings get too
deep.

If your types need to be context-free you could wrap all this up in CLOS
with a suitable PRINT-OBJECT so PRINC et al would work.

-- 
But you know how reluctant paranormal phenomena are to reveal
themselves when skeptics are present. --Robert Sheaffer, SkI 9/2003
From: Conrad
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <c9469b61-ff20-412e-80b9-e0d3a8971ec7@x35g2000hsb.googlegroups.com>
Thank you everyone for the great answers! Clearly a lot of different
opinions on this issue, since it's hard to balance simplicity,
performance and FP-ness for this case. Looks like I'll be spending
some time with, er, 'time.
From: Peter Gijsels
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <7a6af997-4e2a-43df-9982-6b166de5c753@z72g2000hsb.googlegroups.com>
On 24 jun, 04:08, Conrad <······@gmail.com> wrote:
> My question is therefore as follows: What are the "best practices" for
> accumulating strings in a purely functional way in CL? Is it to just
> concatenate strings as I do in my first example, despite the
> inefficiency? Or, is there a way to do write pure FP CL that has an
> improved performance profile, as shown in my second example?

You could use a rope implementation (google for "Ropes: an Alternative
to Strings").
It has efficient concatenation, subrope selection and element
selection (log n), and has a purely functional interface.

I had a first shot at implementing ropes in Common Lisp:
http://common-lisp.net/project/cl-rope/
Any feedback is welcome.

Regards,
Peter
From: Rob Warnock
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <_pOdnYe646ss9v7VnZ2dnUVZ_ojinZ2d@speakeasy.net>
Peter Gijsels  <·············@gmail.com> wrote:
+---------------
| You could use a rope implementation (google for "Ropes: an Alternative
| to Strings").
+---------------

Is this at all similar to the "cords" package that is (was?)
supplied with the Boehm-Demers-Weiser Conservative GC package?


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Peter Gijsels
Subject: Re: Best practices for efficient FP string building in CL?
Date: 
Message-ID: <056d82b0-1ffc-47ee-a792-46a05200188c@59g2000hsb.googlegroups.com>
On Jun 26, 12:13 pm, ····@rpw3.org (Rob Warnock) wrote:
> Peter Gijsels  <·············@gmail.com> wrote:
> +---------------
> | You could use a rope implementation (google for "Ropes: an Alternative
> | to Strings").
> +---------------
>
> Is this at all similar to the "cords" package that is (was?)
> supplied with the Boehm-Demers-Weiser Conservative GC package?

Yes. The cords package is a C reimplementation by Hans Boehm of an
idea implemented in the Cedar Xerox environment. He also made a C++
implementation: the SGI STL rope.