So what is the right way to write string.join?

From: Ron Garret
Subject: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 09:11:34 +0000
Message-ID: <rNOSPAMon-62C438.01113418112007@news.gha.chartermi.net>

Leaving aside the format puzzle I posted earlier today, it occurred to 
me that writing string.join in Lisp the "right way" is surprisingly 
non-trivial.  After giving up on format as being too cute, I tried:

(defun join (l &optional (delim ""))
  (apply 'concatenate 'string (cdr (loop for i in l
     collect delim collect (princ-to-string i)))))

but that fails if l is longer than the arglist limit (or whatever that's 
called). Concatenating incrementally becomes an O(n^2) solution, which 
is bad.  (The fact that concatenate takes the things to concatenate as a 
restarg is starting to look like a bad design decision.)
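
For reference, the limit mentioned above is the standard constant CALL-ARGUMENTS-LIMIT, which the standard only guarantees to be at least 50. The sketch below is illustrative and not part of the original post; the names fits-in-one-call-p and join-incrementally are made up for this example. It shows a check against that limit, plus the incremental alternative, which has no length limit but recopies the accumulated string at every step.

```lisp
;; Hypothetical helper: can N-ARGS arguments be passed in a single call?
;; CALL-ARGUMENTS-LIMIT is an exclusive upper bound on the argument count.
(defun fits-in-one-call-p (n-args)
  (< n-args call-arguments-limit))

;; Hypothetical incremental join: no limit on the list length, but each
;; CONCATENATE copies everything accumulated so far, hence O(n^2) copying.
(defun join-incrementally (strings &optional (delim ""))
  (if (null strings)
      ""
      (reduce (lambda (acc s) (concatenate 'string acc delim s))
              strings)))

;; (join-incrementally '("a" "b" "c") ", ")
;; => "a, b, c"
```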

So what is the Right Way to write this function in Lisp, where Right Way 
is defined as the way that produces reasonable runtime performance with 
no arbitrary limit on the size of the input list without writing too 
much code (for some reasonable value of "too much").  Should I use a 
string output stream?  An adjustable vector of characters with a fill 
pointer?

rg

From: Damien Kick
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 10:28:51 +0000
Message-ID: <13k04v4e11fr858@corp.supernews.com>

Ron Garret wrote:
> Leaving aside the format puzzle I posted earlier today, it occurred to 
> me that writing string.join in Lisp the "right way" is surprisingly 
> non-trivial.

Previously, on comp.lang.lisp: <http://tinyurl.com/2w3lgr>.

From: Alan Crowe
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 16:19:07 +0000
Message-ID: <86pry7r0g4.fsf@cawtech.freeserve.co.uk>

Damien Kick <·····@earthlink.net> writes:

> Ron Garret wrote:
> > Leaving aside the format puzzle I posted earlier today, it occurred to 
> > me that writing string.join in Lisp the "right way" is surprisingly 
> > non-trivial.
> 
> Previously, on comp.lang.lisp: <http://tinyurl.com/2w3lgr>.

I remember that thread. I learnt a LOOP idiom there.

(loop for (thing . more-to-come) on thing-list ...)

thing gets set to each item on thing-list, and you can tell
when you are on the last one because more-to-come is nil
instead of being a cons.
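
A minimal sketch of the idiom in isolation (the function name describe-positions is invented here purely for illustration): each tail of the list is destructured into its first element and the rest, so the final element is the one whose rest is NIL.

```lisp
;; Hypothetical demo: pair every element with a flag that is true
;; only when nothing follows it in the list.
(defun describe-positions (thing-list)
  (loop for (thing . more-to-come) on thing-list
        collect (list thing (null more-to-come))))

;; (describe-positions '(a b c))
;; => ((A NIL) (B NIL) (C T))
```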

I concluded that although naive code rambles on for 10 or 20
lines, it tells a strong, clear story, making it easy enough
to read.

(defun join (word-list separator-string)
  "separate the words in the list with the separator string"
  ;; We start by calculating the size of the final string
  ;; and allocate a big enough string at the beginning
  (let* ((gap (length separator-string))
         (buffer (make-string
                  ;; MAX guards the empty list, where the gap term is negative
                  (max 0
                       (+ (loop for word in word-list
                                sum (length word))
                          (* gap (- (length word-list) 1)))))))
    ;; Work forwards filling the buffer alternately
    ;; from the word list and the separator string
    (loop for (word . more) on word-list
          and start = 0 then (+ start (length word) gap)
          do (replace buffer word
                      :start1 start)
             (when more (replace buffer separator-string
                                 :start1 (+ start (length word))))
          finally (return buffer))))

Alan Crowe
Edinburgh
Scotland

From: Thomas F. Burdick
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 13:13:42 +0000
Message-ID: <2ff4499b-e017-4524-baf9-1c924a23c451@w73g2000hsf.googlegroups.com>

On Nov 18, 10:11 am, Ron Garret <·········@flownet.com> wrote:

> (defun join (l &optional (delim ""))
>   (apply 'concatenate 'string (cdr (loop for i in l
>      collect delim collect (princ-to-string i)))))
>

 [...]

> So what is the Right Way to write this function in Lisp, where Right Way
> is defined as the way that produces reasonable runtime performance with
> no arbitrary limit on the size of the input list without writing too
> much code (for some reasonable value of "too much").  Should I use a
> string output stream?  An adjustable vector of characters with a fill
> pointer?

Yes, one of the two.  Which is more efficient depends on the
implementation, but the string output stream solution is definitely
more readable:

(defun join (strings &optional (delim ""))
  (with-output-to-string (out)
    (loop for (string . more?) on strings
          do (write-string string out)
          when more? do (write-string delim out))))
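
For comparison, here is a rough sketch of the other option from the question, an adjustable vector of characters with a fill pointer. It is an illustration only, not code from this thread, and the name join-with-fill-pointer is invented. Note that the result is a string with a fill pointer rather than a simple-string; COPY-SEQ on it would yield a fresh simple one at the cost of one more copy.

```lisp
;; Hypothetical sketch: grow one adjustable string with VECTOR-PUSH-EXTEND.
(defun join-with-fill-pointer (strings &optional (delim ""))
  (let ((out (make-array 0 :element-type 'character
                           :adjustable t
                           :fill-pointer 0)))
    (flet ((append-string (s)
             ;; Push the characters one by one; the vector grows as needed.
             (loop for ch across s
                   do (vector-push-extend ch out))))
      (loop for (s . more?) on strings
            do (append-string s)
            when more? do (append-string delim)))
    out))

;; (join-with-fill-pointer '("one" "two" "three") ";")
;; => "one;two;three"
```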

From: Espen Vestre
Subject: Re: So what is the right way to write string.join?
Date: Mon, 19 Nov 2007 10:13:07 +0000
Message-ID: <m1oddqtufg.fsf@gazonk.vestre.net>

"Thomas F. Burdick" <········@gmail.com> writes:

> Yes, one of the two.  Which is more efficient depends on the
> implementation, but the string output stream solution is definitely
> more readable:
>
> (defun join (strings &optional (delim ""))
>   (with-output-to-string (out)
>     (loop for (string . more?) on strings
>           do (write-string string out)
>           when more? do (write-string delim out))))

I vote for that one. Very readable and very efficient. 

Using #'write-string instead of e.g. #'princ can save quite a bit of
cpu time - just a few weeks ago I optimized some old code by doing
that replacement.
-- 
  (espen)

From: Russell McManus
Subject: Re: So what is the right way to write string.join?
Date: Mon, 19 Nov 2007 14:03:23 +0000
Message-ID: <87oddqqqms.fsf@thelonious.cl-user.org>

(defun string-join (list &optional (delim #\,))
  (unless list (return-from string-join ""))
  (let ((out (make-array (1- (loop for s in list sum (1+ (length s)))) 
			 :element-type 'character))
	(start (length (car list))))
    (setf (subseq out 0 (length (car list))) (car list))
    (dolist (s (cdr list))
      (setf (aref out start) delim
	    (subseq out (+ 1 start) (+ 1 start (length s))) s
	    start (+ 1 start (length s))))
    out))

Example code using a single string allocation.  I have not profiled
the above code.

-russ

From: Espen Vestre
Subject: Re: So what is the right way to write string.join?
Date: Tue, 20 Nov 2007 08:57:40 +0000
Message-ID: <m1wssds397.fsf@gazonk.vestre.net>

Russell McManus <···············@yahoo.com> writes:

> (defun string-join (list &optional (delim #\,))
>   (unless list (return-from string-join ""))
>   (let ((out (make-array (1- (loop for s in list sum (1+ (length s)))) 
> 			 :element-type 'character))
> 	(start (length (car list))))
>     (setf (subseq out 0 (length (car list))) (car list))
>     (dolist (s (cdr list))
>       (setf (aref out start) delim
> 	    (subseq out (+ 1 start) (+ 1 start (length s))) s
> 	    start (+ 1 start (length s))))
>     out))
>
> Example code using a single string allocation.  I have not profiled
> the above code.

Nice. You can speed it up quite a bit, and reduce the allocation to
a fraction, by replacing the make-array with a make-string! Your version
is a little less general than the version Thomas posted, though, since
the delimiter is just a character.

Here's a version of your function that uses make-string and allows
general delimiters (strings, characters and symbols):

(defun string-join (list &optional (delim #\,))
  (unless list (return-from string-join ""))
  (unless (stringp delim)
    (setf delim (string delim)))
  (let* ((dlen (length delim))
         (start (length (car list)))
         (out (make-string (+ start (loop for s in (cdr list) sum (+ dlen (length s)))))))
    (setf (subseq out 0 start) (car list))
    (dolist (s (cdr list))
      (setf (subseq out start (+ start dlen)) delim
	    (subseq out (+ start dlen) (+ start dlen (length s))) s
	    start (+ start dlen (length s))))
    out))
-- 
  (espen)

From: John Thingstad
Subject: Re: So what is the right way to write string.join?
Date: Tue, 20 Nov 2007 09:41:01 +0000
Message-ID: <op.t123unjlut4oq5@pandora.alfanett.no>

On Tue, 20 Nov 2007 09:57:40 +0100, Espen Vestre <·····@vestre.net> wrote:

> Russell McManus <···············@yahoo.com> writes:
>
>> (defun string-join (list &optional (delim #\,))
>>   (unless list (return-from string-join ""))
>>   (let ((out (make-array (1- (loop for s in list sum (1+ (length s))))
>> 			 :element-type 'character))
>> 	(start (length (car list))))
>>     (setf (subseq out 0 (length (car list))) (car list))
>>     (dolist (s (cdr list))
>>       (setf (aref out start) delim
>> 	    (subseq out (+ 1 start) (+ 1 start (length s))) s
>> 	    start (+ 1 start (length s))))
>>     out))
>>
>> Example code using a single string allocation.  I have not profiled
>> the above code.
>
> Nice. You can speed it up quite a bit, and reduce the allocation to
> a fraction, by replacing the make-array with a make-string! Your version
> is a little less general than the version Thomas posted, though, since
> the delimiter is just a character.
>
> Here's a version of your function that uses make-string and allows
> general delimiters (strings, characters and symbols):
>
> (defun string-join (list &optional (delim #\,))
>   (unless list (return-from string-join ""))
>   (unless (stringp delim)
>     (setf delim (string delim)))
>   (let* ((dlen (length delim))
>          (start (length (car list)))
>          (out (make-string (+ start (loop for s in (cdr list) sum (+ dlen (length s)))))))
>     (setf (subseq out 0 start) (car list))
>     (dolist (s (cdr list))
>       (setf (subseq out start (+ start dlen)) delim
> 	    (subseq out (+ start dlen) (+ start dlen (length s))) s
> 	    start (+ start dlen (length s))))
>     out))


errm.. Yes it's fast enough, but at what cost? This looks more like
inefficient C code..

my favorite is

(defun string-join (delimitter &rest strings)
   (with-output-to-string (out)
     (loop for (element . more?) on strings do
           (write-string element out)
           (when more?
             (write-string delimitter out)))))

CL-USER 1 > (string-join ";" "one" "two" "three")
"one;two;three"


Which looks Lispy and gives a good balance between performance and
readability.


--------------
John Thingstad

From: Russell McManus
Subject: Re: So what is the right way to write string.join?
Date: Tue, 20 Nov 2007 15:29:39 +0000
Message-ID: <8763zxq6jg.fsf@thelonious.cl-user.org>

"John Thingstad" <·······@online.no> writes:

> errm.. Yes it's fast enough, but at what cost? This looks more like
> inefficient C code..

The cost is a few lines of code, which is worth it if this is a
heavily used library function.

-russ

From: Espen Vestre
Subject: Re: So what is the right way to write string.join?
Date: Tue, 20 Nov 2007 10:05:15 +0000
Message-ID: <m1sl31s04k.fsf@gazonk.vestre.net>

"John Thingstad" <·······@online.no> writes:

> errm.. Yes it's fast enough, but at what cost? This looks more like
> inefficient C code..
>
> my favorite is

That's almost my favourite too (I prefer a list parameter and not
&rest, see Thomas's version in this thread), but if you should happen
to write code that needs a highly optimized string-joining function,
it's utterly stupid to refuse to do so just because one of your
utility functions is starting to look a little ugly.
-- 
  (espen)

From: Thomas F. Burdick
Subject: Re: So what is the right way to write string.join?
Date: Tue, 20 Nov 2007 15:42:54 +0000
Message-ID: <41138075-55d1-4124-b4ee-172c79b5c0bf@e10g2000prf.googlegroups.com>

On Nov 20, 11:05 am, Espen Vestre <·····@vestre.net> wrote:
> "John Thingstad" <·······@online.no> writes:
> > errm.. Yes it's fast enough, but at what cost? This looks more like
> > inefficient C code..
>
> > my favorite is
>
> That's almost my favourite too (I prefer a list parameter and not
> &rest, see Thomas's version in this thread), but if you should happen
> to write code that needs a highly optimized string-joining function,
> it's utterly stupid to refuse to do so just because one of your
> utility functions is starting to look a little ugly.

What are you optimizing for?  For what it's worth, I'm pretty sure
that mine is faster than yours -- when given many strings -- and that
yours uses less memory than mine.  At least on SBCL.  I haven't
actually tested the exact code in question, but that's based on
experience.

From: Espen Vestre
Subject: Re: So what is the right way to write string.join?
Date: Tue, 20 Nov 2007 16:15:00 +0000
Message-ID: <m1abp8sxkr.fsf@gazonk.vestre.net>

"Thomas F. Burdick" <········@gmail.com> writes:

> What are you optimizing for?  For what it's worth, I'm pretty sure
> that mine is faster than yours -- when given many strings -- and that
> yours uses less memory than mine.  At least on SBCL.  I haven't
> actually tested the exact code in question, but that's based on
> experience.

Well, I actually prefer your version... But on LW on the mac,
Russell's version is more than twice as fast and uses less than half
the memory if I try it on a huge list of words (the contents of
/usr/share/dict/words). But now that I take a second look, I see that
my generalization of Russell's code to allow for general string
delimiters is *not* faster than yours, and uses just 0.007% less
memory ;-)

(I always use string streams for this kind of code myself)
-- 
  (espen)

From: Maciej Katafiasz
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 09:57:36 +0000
Message-ID: <fhp2ag$2kh$1@news.net.uni-c.dk>

On Sun, 18 Nov 2007 01:11:34 -0800, Ron Garret wrote:

> Leaving aside the format puzzle I posted earlier today, it occurred to
> me that writing string.join in Lisp the "right way" is surprisingly
> non-trivial.  After giving up on format as being too cute, I tried:
> 
> (defun join (l &optional (delim ""))
>   (apply 'concatenate 'string (cdr (loop for i in l
>      collect delim collect (princ-to-string i)))))
> 
> but that fails if l is longer than the arglist limit (or whatever that's
> called). Concatenating incrementally becomes an O(n^2) solution, which
> is bad.  (The fact that concatenate takes the things to concatenate as a
> restarg is starting to look like a bad design decision.)
> 
> So what is the Right Way to write this function in Lisp, where Right Way
> is defined as the way that produces reasonable runtime performance with
> no arbitrary limit on the size of the input list without writing too
> much code (for some reasonable value of "too much").  Should I use a
> string output stream?  An adjustable vector of characters with a fill
> pointer?

Here's an INTERSPERSE, operating on lists, stolen from cl-weblocks. It's 
pretty fast, I found it to be faster than my FORMAT hacking when I tried 
it at one point, although it's recursive and I didn't really try it on 
huge lists. Probably specialising it for simple-strings and doing the 
(coerce (concatenate 'simple-string (intersperse ...)) 'simple-string) 
mumbo jumbo in one step would make it an order of magnitude or three 
faster.

(defun intersperse (list delimeter &key (last delimeter))
  "Intersperses a list with a delimeter.

If 'last' is specified, it will be used for the last delimeter,
instead of 'delimeter'.

\(intersperse '(1 2 3 4 5) 0)
=> (1 0 2 0 3 0 4 0 5)"
  (cond
    ((null list) list)
    ((null (cdr list)) list)
    ((null (cddr list)) (list (car list)
			      last
			      (cadr list)))
    (t (cons (car list)
	     (cons delimeter
		   (intersperse (cdr list) delimeter :last last))))))

Cheers,
Maciej

From: Maciej Katafiasz
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 11:45:04 +0000
Message-ID: <fhp8k0$3vl$1@news.net.uni-c.dk>

A little benchmarking:

(defun webl-join (strings &optional (delim " "))
  ;; Intersperse as given in my previous post
  (apply #'concatenate 'string (intersperse strings delim)))

(require :iterate)
(use-package :iter)

;; My own join, specialised for strings
(defun mthr-join (strings &optional (delim " "))
  (declare (optimize speed)
	   (type simple-string delim))
  (let ((dlen (length delim)))
    (declare (type integer dlen))
    (iter (with (the simple-string str) =
		(make-string (iter (for s in strings)
				   (sum (+ dlen (length (the simple-string s))) into lens)
				   (finally (return (- lens dlen))))))
	  (for (the simple-string del) initially "" then delim)
	  (for (the integer dl) initially 0 then dlen)
	  (for (the simple-string s) in strings)
	  (replace str del :start1 pos)
	  (replace str s :start1 (+ pos dl))
	  (sum (+ dl (length s)) into pos)
	  (finally (return str)))))

(defun pascal-join (list &optional (delim " "))
   (with-output-to-string (s)
     (when list
       (format s "~A" (first list))
       (dolist (element (rest list))
         (format s "~A~A" delim element)))))

(require :cl-interpol)
(cl-interpol:enable-interpol-syntax)
(defun mw-join (l &optional (cl-interpol:*list-delimiter* ""))
  #?"@{l}")


CL-USER> (setf *seq* (loop for i from 1 upto 100 collecting "asd"))

CL-USER> (time (dotimes (i 100000) (webl-join *seq* " ")))
Evaluation took:
  11.208 seconds of real time
  10.792675 seconds of user run time
  0.136008 seconds of system run time
  [Run times include 0.64 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  1,127,956,184 bytes consed.
NIL

CL-USER> (time (dotimes (i 100000) (mthr-join *seq* " ")))
Evaluation took:
  3.889 seconds of real time
  3.868242 seconds of user run time
  0.004 seconds of system run time
  [Run times include 0.1 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  160,789,064 bytes consed.
NIL

CL-USER> (time (dotimes (i 100000) (mw-join *seq* " ")))
Evaluation took:
  161.991 seconds of real time
  153.5616 seconds of user run time
  3.012188 seconds of system run time
  [Run times include 18.888 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  34,929,182,312 bytes consed.
NIL

CL-USER> (time (dotimes (i 100000) (pascal-join *seq* " ")))
Evaluation took:
  178.864 seconds of real time
  170.63867 seconds of user run time
  2.544159 seconds of system run time
  [Run times include 20.597 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  36,521,204,088 bytes consed.
NIL

Arguably, my version[1] is the ugliest, but I didn't try very hard to 
minimise it, I stopped after making it fast :). Somewhat surprisingly, 
webl-join conses about 7x as many bytes, but is less than 3x slower.

Cheers,
Maciej

[1] Actually it's stolen from #lisp, where I happened to cause a major 
string catenation flame yesterday... I just rewrote it for the particular 
case of join and using ITERATE.

From: Marco Antoniotti
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 18:41:22 +0000
Message-ID: <54cf3376-cc56-49f3-b5e0-45df4861d5b3@b15g2000hsa.googlegroups.com>

On Nov 18, 12:45 pm, Maciej Katafiasz <········@gmail.com> wrote:
> A little benchmarking:
>
> (defun webl-join (strings &optional (delim " "))
>   ;; Intersperse as given in my previous post
>   (apply #'concatenate 'string (intersperse strings delim)))
>
> (require :iterate)
> (use-package :iter)
>
> ;; My own join, specialised for strings
> (defun mthr-join (strings &optional (delim " "))
>   (declare (optimize speed)
>            (type simple-string delim))
>   (let ((dlen (length delim)))
>     (declare (type integer dlen))
>     (iter (with (the simple-string str) =
>                 (make-string (iter (for s in strings)
>                                    (sum (+ dlen (length (the simple-string s))) into lens)
>                                    (finally (return (- lens dlen))))))
>           (for (the simple-string del) initially "" then delim)
>           (for (the integer dl) initially 0 then dlen)
>           (for (the simple-string s) in strings)
>           (replace str del :start1 pos)
>           (replace str s :start1 (+ pos dl))
>           (sum (+ dl (length s)) into pos)
>           (finally (return str)))))
>
> (defun pascal-join (list &optional (delim " "))
>    (with-output-to-string (s)
>      (when list
>        (format s "~A" (first list))
>        (dolist (element (rest list))
>          (format s "~A~A" delim element)))))
>
> (require :cl-interpol)
> (cl-interpol:enable-interpol-syntax)
> (defun mw-join (l &optional (cl-interpol:*list-delimiter* ""))
>   #?"@{l}")
>
> CL-USER> (setf *seq* (loop for i from 1 upto 100 collecting "asd"))
>
> CL-USER> (time (dotimes (i 100000) (webl-join *seq* " ")))
> Evaluation took:
>   11.208 seconds of real time
>   10.792675 seconds of user run time
>   0.136008 seconds of system run time
>   [Run times include 0.64 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   1,127,956,184 bytes consed.
> NIL
>
> CL-USER> (time (dotimes (i 100000) (mthr-join *seq* " ")))
> Evaluation took:
>   3.889 seconds of real time
>   3.868242 seconds of user run time
>   0.004 seconds of system run time
>   [Run times include 0.1 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   160,789,064 bytes consed.
> NIL
>
> CL-USER> (time (dotimes (i 100000) (mw-join *seq* " ")))
> Evaluation took:
>   161.991 seconds of real time
>   153.5616 seconds of user run time
>   3.012188 seconds of system run time
>   [Run times include 18.888 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   34,929,182,312 bytes consed.
> NIL
>
> CL-USER> (time (dotimes (i 100000) (pascal-join *seq* " ")))
> Evaluation took:
>   178.864 seconds of real time
>   170.63867 seconds of user run time
>   2.544159 seconds of system run time
>   [Run times include 20.597 seconds GC run time.]
>   0 calls to %EVAL
>   0 page faults and
>   36,521,204,088 bytes consed.
> NIL
>
> Arguably, my version[1] is the ugliest, but I didn't try very hard to
> minimise it, I stopped after making it fast :). Somewhat surprisingly,
> webl-join conses about 7x as many bytes, but is less than 3x slower.
>
> Cheers,
> Maciej
>
> [1] Actually it's stolen from #lisp, where I happened to cause a major
> string catenation flame yesterday... I just rewrote it for the particular
> case of join and using ITERATE.

What is the running time without ITERATE but using LOOP?  It may be
that various implementations optimize LOOP better.

Cheers

Marco

From: Maciej Katafiasz
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 20:53:06 +0000
Message-ID: <fhq8nh$3vl$2@news.net.uni-c.dk>

On Sun, 18 Nov 2007 10:41:22 -0800, Marco Antoniotti wrote:

>> [1] Actually it's stolen from #lisp, where I happened to cause a major
>> string catenation flame yesterday... I just rewrote it for the
>> particular case of join and using ITERATE.
> 
> What is the running time without ITERATE but using LOOP?  It may be that
> various implementations optimize LOOP better.

Not sure, I don't have an equivalent version using LOOP. The code I was 
modelling it after was an implementation of strcat, not join, so it'd 
need a bit of tweaking.

Cheers,
Maciej

From: Pascal Costanza
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 23:46:28 +0000
Message-ID: <5qc16kFv0g9sU1@mid.individual.net>

Maciej Katafiasz wrote:
> A little benchmarking:
> 

Under what circumstances is join a performance-critical operation?


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/

From: Juho Snellman
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 23:41:46 +0000
Message-ID: <87lk8vce9x.fsf@vasara.proghammer.com>

Maciej Katafiasz <········@gmail.com> writes:
> A little benchmarking:
[...]
> (defun pascal-join (list &optional (delim " "))
>    (with-output-to-string (s)
>      (when list
>        (format s "~A" (first list))
>        (dolist (element (rest list))
>          (format s "~A~A" delim element)))))

From the performance viewpoint, this version should either be binding
*print-pretty* to NIL or (better) using WRITE-STRING directly.
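
A sketch of both suggestions, for illustration only. The names join-format and join-write-string are invented here; the first keeps the FORMAT calls but binds *PRINT-PRETTY* to NIL around them, the second assumes every element is already a string and goes through WRITE-STRING.

```lisp
;; Hypothetical variant 1: same FORMAT calls, pretty printer switched off.
(defun join-format (list &optional (delim " "))
  (let ((*print-pretty* nil))
    (with-output-to-string (s)
      (when list
        (format s "~A" (first list))
        (dolist (element (rest list))
          (format s "~A~A" delim element))))))

;; Hypothetical variant 2: WRITE-STRING bypasses the printer entirely,
;; so LIST and DELIM must contain strings only.
(defun join-write-string (list &optional (delim " "))
  (with-output-to-string (s)
    (when list
      (write-string (first list) s)
      (dolist (element (rest list))
        (write-string delim s)
        (write-string element s)))))
```

The first variant still accepts non-string elements such as numbers; the second trades that generality for less work per element.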

-- 
Juho Snellman

From: Pascal Costanza
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 11:00:56 +0000
Message-ID: <5qakb8Fudei5U1@mid.individual.net>

Ron Garret wrote:
> Leaving aside the format puzzle I posted earlier today, it occurred to 
> me that writing string.join in Lisp the "right way" is surprisingly 
> non-trivial.  After giving up on format as being too cute, I tried:
> 
> (defun join (l &optional (delim ""))
>   (apply 'concatenate 'string (cdr (loop for i in l
>      collect delim collect (princ-to-string i)))))
> 
> but that fails if l is longer than the arglist limit (or whatever that's 
> called). Concatenating incrementally becomes an O(n^2) solution, which 
> is bad.  (The fact that concatenate takes the things to concatenate as a 
> restarg is starting to look like a bad design decision.)
> 
> So what is the Right Way to write this function in Lisp, where Right Way 
> is defined as the way that produces reasonable runtime performance with 
> no arbitrary limit on the size of the input list without writing too 
> much code (for some reasonable value of "too much").  Should I use a 
> string output stream?  An adjustable vector of characters with a fill 
> pointer?

(defun join (list &optional (delim ""))
   (with-output-to-string (s)
     (when list
       (format s "~A" (first list))
       (dolist (element (rest list))
         (format s "~A~A" delim element)))))

?!?


Pascal

-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/

From: Phil Bewig
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 23:13:34 +0000
Message-ID: <3381aee6-87cc-456d-9f30-26528efd4448@e4g2000hsg.googlegroups.com>

On Nov 18, 3:11 am, Ron Garret <·········@flownet.com> wrote:
> Leaving aside the format puzzle I posted earlier today, it occurred to
> me that writing string.join in Lisp the "right way" is surprisingly
> non-trivial.  After giving up on format as being too cute, I tried:
>
> (defun join (l &optional (delim ""))
>   (apply 'concatenate 'string (cdr (loop for i in l
>      collect delim collect (princ-to-string i)))))
>
> but that fails if l is longer than the arglist limit (or whatever that's
> called). Concatenating incrementally becomes an O(n^2) solution, which
> is bad.  (The fact that concatenate takes the things to concatenate as a
> restarg is starting to look like a bad design decision.)
>
> So what is the Right Way to write this function in Lisp, where Right Way
> is defined as the way that produces reasonable runtime performance with
> no arbitrary limit on the size of the input list without writing too
> much code (for some reasonable value of "too much").  Should I use a
> string output stream?  An adjustable vector of characters with a fill
> pointer?
>
> rg

Consider this algorithm, which I learned from Roberto Ierusalimschy:

The pieces of the string are accumulated in a stack of strings, with
the invariant that no string can sit atop a larger string (just like
the Tower of Hanoi, where no disk can sit atop a larger disk); the
stack initially contains only a single element, which is "" (the null
string).  Iterate through the pieces to be concatenated, adding each
piece to the top of the stack.  Any time a string is larger than one
below, concatenate the two into a single string.  Then check if the
newly-combined string is larger than the one below it, and so on; a
single insertion may cause a cascade of concatenations through the
stack.  When the input is exhausted, allocate a single string long
enough to hold all the pieces and copy the pieces into the single long
string.  This algorithm is O(n log n).
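
The description above can be sketched roughly as follows. This is one reading of it, not code from the post: the name tower-join is invented, the stack starts empty rather than holding "", and delimiters are left out (for a join they would simply be interleaved with the pieces before the call).

```lisp
;; Hypothetical sketch of the stack-of-strings concatenation.
(defun tower-join (pieces)
  (let ((stack '()))                    ; the CAR is the top of the stack
    (dolist (piece pieces)
      (push piece stack)
      ;; Cascade: merge while the top is longer than the entry below it.
      (loop while (and (cdr stack)
                       (> (length (first stack)) (length (second stack))))
            do (let ((top (pop stack))
                     (below (pop stack)))
                 ;; BELOW was pushed earlier, so it goes first.
                 (push (concatenate 'string below top) stack))))
    ;; Final pass: one allocation, then copy from the bottom of the stack up.
    (let ((result (make-string (reduce #'+ stack :key #'length)))
          (pos 0))
      (dolist (s (reverse stack) result)
        (replace result s :start1 pos)
        (incf pos (length s))))))

;; (tower-join '("a" "bb" "ccc" "d"))
;; => "abbcccd"
```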

From: Sohail Somani
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 23:46:40 +0000
Message-ID: <A940j.7742$Ji6.6140@edtnps89>

On Sun, 18 Nov 2007 15:13:34 -0800, Phil Bewig wrote:
> When the input is exhausted, allocate a single string long
> enough to hold all the pieces and copy the pieces into the single long
> string.  This algorithm is O(n log n).

Interesting. So what is n in this case?

-- 
Sohail Somani
http://uint32t.blogspot.com

From: Szymon 'tichy'
Subject: Re: So what is the right way to write string.join?
Date: Sun, 18 Nov 2007 21:39:21 +0000
Message-ID: <fhqbmg$f5u$1@atlantis.news.tpi.pl>

Hi.

Why not: http://groups.google.com/group/comp.lang.lisp/msg/d2ff9480322ffbfe

(defun join-strings (strings &key (delimiter ","))
    (collect-append 'string (spread (catenate #Z(0) (series 1))
                                    (scan strings)
                                    (string delimiter))))

Regards, Szymon.

From: Alan Crowe
Subject: Re: So what is the right way to write string.join?
Date: Mon, 19 Nov 2007 17:09:08 +0000
Message-ID: <86bq9qnowb.fsf@cawtech.freeserve.co.uk>

Ron Garret <·········@flownet.com> writes:

> Leaving aside the format puzzle I posted earlier today, it occurred to 
> me that writing string.join in Lisp the "right way" is surprisingly 
> non-trivial. 
> 
> So what is the Right Way to write this function in Lisp, where Right Way 
> is defined as the way that produces reasonable runtime performance with 
> no arbitrary limit on the size of the input list without writing too 
> much code (for some reasonable value of "too much").  Should I use a 
> string output stream?  An adjustable vector of characters with a fill 
> pointer?
> 
I think that the Right Way involves taking advantage of the
fact that you can calculate the size of the final string to
allocate a buffer and fill it. Then you have no lingering
worries about porting it and hitting a performance problem
because the new implementation has naive but conforming code.

(defun join (word-list separator)
  (if (endp word-list) ""
    (let ((buffer (make-string (+ (reduce #'+ word-list :key #'length)
                                  (* (length separator)
                                     (- (length word-list) 1)))))
          (pointer 0))
      (flet ((write-in (string)
               (replace buffer string :start1 pointer)
               (incf pointer (length string))))
        (write-in (first word-list))
        (dolist (word (rest word-list) buffer)
          (write-in separator)
          (write-in word))))))

Alan Crowe
Edinburgh
Scotland

From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: So what is the right way to write string.join?
Date: Tue, 20 Nov 2007 02:30:31 +0000
Message-ID: <rem-2007nov19-013@yahoo.com>

> From: Ron Garret <·········@flownet.com>
> ... it occurred to me that writing string.join in Lisp the "right
> way" is surprisingly non-trivial. ... Should I use a string
> output stream?

Sure. That's basically the obvious way to do it. What's the purpose
of a string output stream? To accumulate all the pieces of a
string, just as if you had written all those pieces to a file then
read it back in, except without the overhead of system I/O combined
with reading it all back in. If all you're going to do is write the
result to a file, then just use regular I/O. If all you're going to
do is process it further internally (and then finally write the
result to file or do something else) then you want a string output
stream. If you need both an in-memory string and a file with a
backup copy of the contents, then it's probably more efficient to
build it as a string output stream then write it all in one gulp to
disk, rather than write it piecewise to disk then read it back in
in a gulp.

Extra bonus note: If you are building the pieces sequentially, or
they are given pre-computed in a list, as here, it's easy. If you
are going to build the pieces in some random sequence, such as by
exploring a tree and returning pieces for each branch of the tree,
then you might first build a tree structure of all the pieces, then
flatten that tree by left-first depth-first traversal, which reduces
it to the problem we already solved here.
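
A small sketch of that tree case, assuming the pieces are strings held in nested lists. The names write-pieces and tree-to-string are invented for illustration. Instead of flattening first, it walks the tree left to right, depth first, and writes each leaf straight to a string output stream.

```lisp
;; Hypothetical helper: write every string leaf of TREE to STREAM.
(defun write-pieces (tree stream)
  (etypecase tree
    (string (write-string tree stream))
    ;; NIL is a list too, so empty branches are skipped silently.
    (list (dolist (branch tree)
            (write-pieces branch stream)))))

(defun tree-to-string (tree)
  (with-output-to-string (out)
    (write-pieces tree out)))

;; (tree-to-string '("a" ("b" ("c" "d")) "e"))
;; => "abcde"
```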

From: Alain Picard
Subject: Re: So what is the right way to write string.join?
Date: Tue, 20 Nov 2007 09:31:18 +0000
Message-ID: <87y7ctw9eh.fsf@memetrics.com>

Ron Garret <·········@flownet.com> writes:

> So what is the Right Way to write this function in Lisp, where Right
> Way is defined as the way that produces reasonable runtime
> performance with no arbitrary limit on the size of the input list
> without writing too much code (for some reasonable value of "too
> much").

Well, now that you're opening the field for arbitrary code,
I'll chime in:

I have, stashed away in a file that
always gets included, utilities which look like this:

(defun weave-between (fn1 fn2 source)
  "Apply fn1 to each element of SOURCE, weaving calls to fn2 between each.
   E.g. (weave-between #'princ #'(lambda (&rest foo) (princ \" , \")) '(1 2 3))
     prints: 1 , 2 , 3"
  (loop for (elem . rest) on source
	do
	(funcall fn1 elem)
	(when rest
	  (funcall fn2 elem))))

So I would write join something like this:

(defun join (l &optional (delim ""))
  (with-output-to-string (stream)
    (flet ((print-obj (obj)
	     (princ obj stream))
	   (print-sep (obj)
	     (declare (ignore obj))
	     (princ delim stream)))
      (declare (dynamic-extent #'print-obj #'print-sep))
      (weave-between #'print-obj #'print-sep l))))

Interestingly, the doc-string on weave-between was written
years ago, and it uses JOIN as its typical example; yet
I never (until now) felt the need to write JOIN as part
of my utilities.  How odd.

                        --ap

-- 
Please read about why Top Posting
is evil at: http://en.wikipedia.org/wiki/Top-posting
and http://www.dickalba.demon.co.uk/usenet/guide/faq_topp.html

Please read about why HTML in email is evil at: http://www.birdhouse.org/etc/evilmail.html