Re: how to efficiently concatenate strings?

From: Erik Naggum
Subject: Re: how to efficiently concatenate strings?
Date: Fri, 03 Jul 1998 00:00:00 +0000
Message-ID: <3108455196865392@naggum.no>

* kp gores
| in a loop reading a file char-by-char into current-char i do :

  consider reading the whole file into a string with READ-SEQUENCE, then
  extract strings from it with SUBSEQ, or with displaced arrays if the
  strings are longish and copying them would only create garbage.
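
  a minimal sketch of that approach, under some assumptions: the function
  names FILE-TO-STRING and DISPLACED-SUBSTRING are made up for illustration,
  and FILE-LENGTH is treated only as an upper bound on the character count,
  since its relation to the number of characters depends on the
  implementation and the external format.

```lisp
(defun file-to-string (pathname)
  "Read the entire file at PATHNAME into one string."
  (with-open-file (in pathname :direction :input)
    ;; FILE-LENGTH may overestimate the number of characters (e.g. with
    ;; multi-byte encodings), so trim to what READ-SEQUENCE actually filled.
    (let* ((buffer (make-string (file-length in)))
           (end (read-sequence buffer in)))
      (if (= end (length buffer))
          buffer
          (subseq buffer 0 end)))))

(defun displaced-substring (string start end)
  "Return a string that shares storage with STRING from START to END."
  ;; No characters are copied; the result is a window onto STRING.
  (make-array (- end start)
              :element-type (array-element-type string)
              :displaced-to string
              :displaced-index-offset start))
```

  SUBSEQ gives an independent copy, while the displaced array stays tied
  to the big string and keeps it alive for as long as the substring lives.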

| to collect the characters into  a string current-token.

  well, one option is to collect the characters into a list, and finally
  (coerce <list> 'string), but that may seem wasteful to many.
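
  a minimal sketch of the list-collecting option.  it uses COERCE to turn
  the list of characters into a string, since CONCATENATE expects sequences
  rather than individual characters.  the name READ-TOKEN-VIA-LIST and the
  choice of space, tab and newline as delimiters are assumptions made for
  the example.

```lisp
(defun read-token-via-list (stream)
  "Collect characters up to a delimiter or end of file into a fresh string.
The delimiter itself is consumed."
  (let ((chars '()))
    (loop for char = (read-char stream nil nil)
          while (and char (not (member char '(#\Space #\Tab #\Newline))))
          do (push char chars))
    ;; PUSH builds the list backwards, so reverse it before converting.
    (coerce (nreverse chars) 'string)))
```

  every character costs one cons cell here, which is the waste referred
  to above.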

  another option is to create an adjustable string with a fill pointer (use
  MAKE-ARRAY) and use VECTOR-PUSH-EXTEND to deposit characters into the
  string as you read them.  for extra optimization, you can reuse the
  buffer and reset the fill pointer after you have copied out and returned
  the string you're interested in.
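
  a minimal sketch of the fill-pointer approach, including the buffer
  reuse.  the names *TOKEN-BUFFER* and READ-TOKEN, the initial size of 64,
  and the delimiter set are assumptions made for the example.

```lisp
(defvar *token-buffer*
  (make-array 64 :element-type 'character :adjustable t :fill-pointer 0)
  "Reusable scratch buffer; VECTOR-PUSH-EXTEND grows it on demand.")

(defun read-token (stream)
  "Read characters up to a delimiter or end of file; return a fresh string.
The delimiter itself is consumed."
  ;; Reset the fill pointer; the underlying storage is kept and reused.
  (setf (fill-pointer *token-buffer*) 0)
  (loop for char = (read-char stream nil nil)
        while (and char (not (member char '(#\Space #\Tab #\Newline))))
        do (vector-push-extend char *token-buffer*))
  ;; SUBSEQ copies only the active part (up to the fill pointer), so the
  ;; caller gets a string that survives the next reset.
  (subseq *token-buffer* 0))
```

  a single global buffer is not safe if several threads or nested calls
  read tokens at once; a buffer per reader would avoid that.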

  incidentally, I think a buffering protocol would be very useful in Common
  Lisp streams, such that one could put a "mark" in a buffer and extract
  the string from the mark to the current read point.  this would have
  saved me a lot of hassle in copying strings from input files, considering
  that a significant cost of some file read operations is in the copying
  of the characters, and optimizing for that cost can lead to weird code.
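
  to make the idea concrete, here is a purely hypothetical sketch of
  mark-and-extract over an in-memory buffer.  nothing like this exists in
  standard Common Lisp streams; MARKED-BUFFER and the BUFFER-* functions
  are invented names, and a real protocol would also have to deal with
  refilling the buffer from the file.

```lisp
(defstruct (marked-buffer (:constructor make-marked-buffer (text)))
  (text "" :type string)     ; characters already read into the buffer
  (position 0 :type fixnum)  ; current read point
  (mark 0 :type fixnum))     ; start of the region of interest

(defun buffer-read-char (buffer)
  "Return the next character and advance, or NIL at the end of the buffer."
  (let ((pos (marked-buffer-position buffer))
        (text (marked-buffer-text buffer)))
    (when (< pos (length text))
      (setf (marked-buffer-position buffer) (1+ pos))
      (char text pos))))

(defun buffer-set-mark (buffer)
  "Remember the current read point as the mark."
  (setf (marked-buffer-mark buffer) (marked-buffer-position buffer)))

(defun buffer-extract (buffer)
  "Copy the characters between the mark and the current read point."
  (subseq (marked-buffer-text buffer)
          (marked-buffer-mark buffer)
          (marked-buffer-position buffer)))
```

  the characters are copied once, at extraction time, rather than once
  per character as they are read.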

#:Erik
-- 
  http://www.naggum.no/spam.html is about my spam protection scheme and how
  to guarantee that you reach me.  in brief: if you reply to a news article
  of mine, be sure to include an In-Reply-To or References header with the
  message-ID of that message in it.  otherwise, you need to read that page.

From: Howard R. Stearns
Subject: buffering protocol (was Re: how to efficiently concatenate strings?)
Date: Mon, 06 Jul 1998 00:00:00 +0000
Message-ID: <35A115DC.68F5@elwood.com>

Erik Naggum wrote:
>   incidentally, I think a buffering protocol would be very useful in Common
>   Lisp streams, such that one could put a "mark" in a buffer and extract
>   the string from the mark to the current read point.  this would have
>   saved me a lot of hassle in copying strings from input files, considering
>   that a significant cost of some file read operations is in the copying
>   of the characters, and optimizing for that cost can lead to weird code.
> 

I'd also like to see such a thing.  From my point of view, it would be
an extension to "CLOS streams", and also provide a MOP hook into "the
reader algorithm" (i.e. the code that all Lisp implementations have
that implements the reader algorithm, and is called by such functions as
READ).

In my view, one measure of the quality of the design of such a protocol
would be that the protocol itself would not assume extended characters
(i.e. could be implemented efficiently in base-char-only Lisps), but
could be easily extended by the implementation or users to support BOTH
one-to-one Lisp <-> external-format mappings such as extended-char <->
UCS-2/UCS-4 AND one-to-many mappings such as extended-char <->
multi-byte/UTF-8.  (The latter probably incorporating some
double-buffering protocol.)

Any ideas?