From: Bruce Feist
Subject: Text Processing
Date: 
Message-ID: <723262882.AA00000@blkcat.UUCP>
I'm open to suggestions on how to process text files in Common LISP for lexical
analysis.  Right now, I'm doing it character-by-character with read-char and
then using concat to combine the characters I read into tokens; is there a
better way?  How much more efficient would it be to use read-line and then
separate the tokens myself?
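
In case it helps, here's roughly what my loop looks like (simplified;
TOKEN-DELIMITER-P stands in for my real boundary test, and I've spelled
concat as CONCATENATE):

    (defun read-token (stream)
      ;; Accumulate characters into a string until a delimiter or end
      ;; of file; this conses a fresh string on every character read.
      (let ((buffer ""))
        (loop for char = (read-char stream nil nil)
              while (and char (not (token-delimiter-p char)))
              do (setq buffer (concatenate 'string buffer (string char))))
        buffer))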

Thanks.

Bruce
From: Rob MacLachlan
Subject: Re: Text Processing
Date: 
Message-ID: <BzBK91.FLG.1@cs.cmu.edu>
In article <723262882.AA00000@blkcat.UUCP> ···········@f615.n109.z1.fidonet.org (Bruce Feist) writes:
>I'm open to suggestions on how to process text files in Common LISP for
>lexical analysis.  Right now, I'm doing it character-by-character with
>read-char and then using concat to combine the characters I read into tokens;
>is there a better way?

If you are doing:
    (setq buffer (concatenate 'string buffer (string char)))
you would do better to preallocate a string buffer, grow it as needed, and
take SUBSEQ at the end of accumulation.
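
A sketch of what I mean, assuming STREAM is your open input stream
(DELIMITERP is made up; any boundary test will do):

    (let ((buffer (make-array 64 :element-type 'character
                              :adjustable t :fill-pointer 0)))
      (loop for char = (read-char stream nil nil)
            while (and char (not (delimiterp char)))
            do (vector-push-extend char buffer))
      ;; VECTOR-PUSH-EXTEND grows the buffer as needed; SUBSEQ copies
      ;; just the filled portion out as a fresh simple string.
      (subseq buffer 0))

This allocates one fresh string per token rather than one per character.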

>How much more efficient would it be to use read-line and then separate the
>tokens myself?

It depends on the implementation's relative costs for stream operations and
string allocation.  I would guess that in implementations with generational
GC, the READ-LINE strategy could well be faster.  Most implementations have
internal operations to read a large number of characters into a preallocated
buffer, which could get you the best of both worlds at some cost in
portability.
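
For example, the READ-LINE version might look something like this
(DELIMITERP again stands in for the real boundary test, and tokens that
span line breaks are not handled):

    (defun tokenize-line (line)
      ;; Collect the non-delimiter runs in LINE as fresh strings,
      ;; scanning with indices instead of consing per character.
      (let ((tokens '())
            (start 0))
        (loop
          (setq start (or (position-if-not #'delimiterp line :start start)
                          (return (nreverse tokens))))
          (let ((end (or (position-if #'delimiterp line :start start)
                         (length line))))
            (push (subseq line start end) tokens)
            (setq start end)))))

    (defun lex-file (pathname)
      (with-open-file (stream pathname :direction :input)
        (loop for line = (read-line stream nil nil)
              while line
              nconc (tokenize-line line))))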

  Rob