From: Erik Naggum
Subject: Re: how to efficiently concatenate strings?
Date:
Message-ID: <3108455196865392@naggum.no>
* kp gores
| in a loop reading a file char-by-char into current-char i do :
consider reading the whole file into a string with READ-SEQUENCE, then
extract strings from it with SUBSEQ or with displaced arrays if they are
longish and would only create garbage.
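A minimal sketch of that suggestion, for illustration only. The function names READ-FILE-INTO-STRING and DISPLACED-SUBSTRING are hypothetical, and the code assumes FILE-LENGTH is at least the number of characters in the file, which holds in common implementations but may overshoot with multi-byte external formats (hence the trim at the end).

```lisp
(defun read-file-into-string (path)
  "Return the contents of the file at PATH as a single string."
  (with-open-file (in path :direction :input)
    ;; Assumption: FILE-LENGTH is an upper bound on the character count.
    (let* ((buffer (make-string (file-length in)))
           ;; READ-SEQUENCE returns the index of the first element it
           ;; did not fill in.
           (end (read-sequence buffer in)))
      (if (= end (length buffer))
          buffer
          (subseq buffer 0 end)))))

(defun displaced-substring (string start end)
  "Return a string that shares storage with STRING from START to END.
No characters are copied, so changes to STRING show through."
  (make-array (- end start)
              :element-type (array-element-type string)
              :displaced-to string
              :displaced-index-offset start))
```

SUBSEQ gives an independent copy and suits short tokens; the displaced version avoids the copy for long extracts, at the price of keeping the whole file string alive for as long as any extract is referenced.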
| to collect the characters into a string current-token.
well, one option is to collect the characters into a list, and finally
(coerce <list> 'string), but that may seem wasteful to many.
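A sketch of the list-collecting option, assuming a token ends at a space, tab, newline, or end of file (the original loop is not shown, so that delimiter rule and the name READ-TOKEN-VIA-LIST are made up for illustration). It uses COERCE for the final step because CONCATENATE expects sequences as arguments, not individual characters.

```lisp
(defun read-token-via-list (stream)
  "Read characters from STREAM up to whitespace or end of file and
return them as a fresh string."
  (let ((chars '()))
    (loop for ch = (read-char stream nil nil)
          ;; Hypothetical delimiter set; adjust to the real token syntax.
          while (and ch (not (member ch '(#\Space #\Tab #\Newline))))
          do (push ch chars))
    ;; The characters were pushed in reverse order.
    (coerce (nreverse chars) 'string)))
```

For example, (with-input-from-string (in "foo bar") (read-token-via-list in)) reads the first token and leaves the stream positioned after the space.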
another option is to create an adjustable string with a fill pointer (use
MAKE-ARRAY) and use VECTOR-PUSH-EXTEND to deposit characters into the
string as you read them. for extra optimization, you can reuse the
buffer and reset the fill pointer after you have copied out and returned
the string you're interested in.
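A sketch of the fill-pointer approach with buffer reuse, under the same assumed delimiter rule (space, tab, newline, or end of file). The names *TOKEN-BUFFER* and READ-TOKEN-VIA-BUFFER are hypothetical, and the initial size of 64 is arbitrary.

```lisp
(defvar *token-buffer*
  (make-array 64 :element-type 'character
                 :adjustable t
                 :fill-pointer 0)
  "Scratch string reused across calls; grows on demand.")

(defun read-token-via-buffer (stream)
  "Read characters from STREAM up to whitespace or end of file and
return them as a fresh string."
  ;; Reset the fill pointer instead of allocating a new buffer.
  (setf (fill-pointer *token-buffer*) 0)
  (loop for ch = (read-char stream nil nil)
        ;; Hypothetical delimiter set; adjust to the real token syntax.
        while (and ch (not (member ch '(#\Space #\Tab #\Newline))))
        do (vector-push-extend ch *token-buffer*))
  ;; SUBSEQ honors the fill pointer, so only the active part is copied.
  (subseq *token-buffer* 0))
```

Only the returned copy is allocated per token. A shared special variable like this is not safe across threads; a per-call or per-stream buffer would be needed there.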
incidentally, I think a buffering protocol would be very useful in Common
Lisp streams, such that one could put a "mark" in a buffer and extract
the string from the mark to the current read point. this would have
saved me a lot of hassle in copying strings from input files, considering
that a significant cost of some file read operations is in the copying
of the characters, and optimizing for that cost can lead to weird code.
#:Erik
--
http://www.naggum.no/spam.html is about my spam protection scheme and how
to guarantee that you reach me. in brief: if you reply to a news article
of mine, be sure to include an In-Reply-To or References header with the
message-ID of that message in it. otherwise, you need to read that page.
From: Howard R. Stearns
Subject: buffering protocol (was Re: how to efficiently concatenate strings?)
Date:
Message-ID: <35A115DC.68F5@elwood.com>
Erik Naggum wrote:
> incidentally, I think a buffering protocol would be very useful in Common
> Lisp streams, such that one could put a "mark" in a buffer and extract
> the string from the mark to the current read point. this would have
> saved me a lot of hassle in copying strings from input files, considering
> that a significant cost of some file read operations is in the copying
> of the characters, and optimizing for that cost can lead to weird code.
>
I'd also like to see such a thing. From my point of view, it would be
an extension to "CLOS streams", and also provide a MOP hook into "the
reader algorithm" (i.e. the code that all Lisp implementations have
that implements the reader algorithm, and is called by such functions as
READ).
In my view, one measure of the quality of the design of such a protocol
would be that the protocol itself would not assume extended characters
(i.e. could be implemented efficiently in base-char-only Lisps), but
could be easily extended by the implementation or users to support BOTH
one-to-one Lisp <-> external-format mappings such as extended-char <->
UCS-2/UCS-4 AND one-to-many mappings such as extended-char <->
multi-byte/UTF-8. (The latter probably incorporating some
double-buffering protocol.)
Any ideas?