I apologize if this is a common topic, but I could not find anything in the FAQ.
I am writing an application that is well suited to functional programming but I have come to a rather basic obstacle. One of the things that I need to do is read in data from a stream and write it out to a file, until I reach a predefined delimiter. The data could be binary and arbitrarily large.
I've tried looping read-line (which actually works nicely here, because the delimiter in this data always follows a newline). It runs a little slow, but not too bad. My problem with that method is that read-line strips the newline, which makes it unsuitable for binary data. It's also rather useless if the data is, for example, all binary zeros :)
So I have a nice, accurate solution with a looping read-char. I can handle the extra complexity it adds to the code, but I cannot handle the extra I/O overhead. Even the most basic "do" loop with only a read-char and a write-char takes several seconds or minutes to read and write a 5 MB file. Add in the overhead of searching for a text pattern and you can see where I'm having problems.
I've been hacking with lisp on and off for a couple of years now, so it pains me to ask such a basic question, but here it is:
What is the most efficient solution for basic file IO?
--
Nicholas Harbour, ···············@yahoo.com
"Nearly half of all people are below average."
Nicholas Harbour <···············@yahoo.com> writes:
> I apologize if this is a common topic, but I could not find anything in the FAQ.
>
> I am writing an application that is well suited to functional
> programming but I have come to a rather basic obstacle. One of the
> things that I need to do is read in data from a stream and write it
> out to a file, until I reach a predefined delimiter. The data could
> be binary and arbitrarily large.
>
> I've tried looping read-line [...]
[ Yuck! I don't even do that in Perl! (not that I haven't been tempted) ]
>
> So I have a nice accurate solution with a looping read-char.
If it's binary input, my first suggestion is that you use binary
streams, not character streams.
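A minimal sketch of what that means in practice (filenames here are placeholders): open both ends with an explicit :element-type, and use read-byte/write-byte, the binary analogues of read-char/write-char.

```lisp
;; Byte-at-a-time binary copy -- correct for arbitrary binary data,
;; though still slow without buffering (see below).
(with-open-file (in "input.dat"
                    :direction :input
                    :element-type '(unsigned-byte 8))
  (with-open-file (out "output.dat"
                       :direction :output
                       :element-type '(unsigned-byte 8)
                       :if-exists :supersede)
    (loop for byte = (read-byte in nil nil)
          while byte
          do (write-byte byte out))))
```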
> I can handle the added complexity it adds to the code, but I cannot
> however handle the added IO overhead this adds. Even the most basic
> "do" loop with only a read-char and a write-char will take several
> seconds or minutes to read and write a 5MB file. Add in the
> overhead of searching for a text pattern and you can see where I am
> having problems.
So, use buffering. Combining binary streams and READ-SEQUENCE on
(vector (unsigned-byte 32)) objects should get you the efficient I/O
you need.
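Something along these lines (a sketch, assuming octet-sized elements and a hypothetical copy-stream function; the buffer size is arbitrary):

```lisp
;; Buffered copy: read a chunk at a time with READ-SEQUENCE and write
;; it back out with WRITE-SEQUENCE.  READ-SEQUENCE returns the index
;; just past the last element filled, so a zero return means EOF.
(defun copy-stream (in out &optional (buffer-size 65536))
  (let ((buffer (make-array buffer-size
                            :element-type '(unsigned-byte 8))))
    (loop for end = (read-sequence buffer in)
          until (zerop end)
          do (write-sequence buffer out :end end))))
```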
As for combining efficiency with ease of use, if you haven't checked
out Gray streams, you should. And if you're searching for a text
pattern and you're concerned with efficiency, hopefully you've built a
search tree or something similar, and you're reading in your data in
appropriate chunks.
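As a rough illustration of chunked reading plus delimiter search (a hypothetical sketch, not a complete solution): scan each buffer with SEARCH and write out everything before the match. For brevity this ignores the case where the delimiter straddles a chunk boundary, which a real implementation must handle, e.g. by carrying the last (1- (length delimiter)) bytes over into the next read.

```lisp
;; Copy bytes from IN to OUT until DELIMITER (a sequence of octets)
;; is found within a chunk; returns the match position, or NIL at EOF.
(defun copy-until-delimiter (in out delimiter
                             &optional (buffer-size 65536))
  (let ((buffer (make-array buffer-size
                            :element-type '(unsigned-byte 8))))
    (loop for end = (read-sequence buffer in)
          until (zerop end)
          do (let ((pos (search delimiter buffer :end2 end)))
               (cond (pos
                      (write-sequence buffer out :end pos)
                      (return pos))
                     (t
                      (write-sequence buffer out :end end)))))))
```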
--
/|_ .-----------------------.
,' .\ / | No to Imperialist war |
,--' _,' | Wage class war! |
/ / `-----------------------'
( -. |
| ) |
(`-. '--.)
`. )----'
···@famine.OCF.Berkeley.EDU (Thomas F. Burdick) writes:
> Nicholas Harbour <···············@yahoo.com> writes:
>
> > I can handle the added complexity it adds to the code, but I cannot
> > however handle the added IO overhead this adds. Even the most basic
> > "do" loop with only a read-char and a write-char will take several
> > seconds or minutes to read and write a 5MB file. Add in the
> > overhead of searching for a text pattern and you can see where I am
> > having problems.
>
> So, use buffering. Combining binary streams and READ-SEQUENCE on
> (vector (unsigned-byte 32)) objects should get you the efficient I/O
> you need.
Two things:
1. It might be better to use a simple-array with the needed element type.
2. The fact that 32-bit bytes should be used is just an example, right?
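For byte-oriented file data, (unsigned-byte 8) is usually the element type you want, and MAKE-ARRAY with no adjustability or fill pointer gives you a simple-array, which lets the compiler open-code the buffer accesses:

```lisp
;; A specialized octet buffer -- in most implementations this is a
;; (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (65536)).
(make-array 65536 :element-type '(unsigned-byte 8))
```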
--
Janis Dzerins
If million people say a stupid thing, it's still a stupid thing.