I'm looking for advice on how to do faster I/O from CL. I have an
application (a simulation) that generates significant amounts of data
(such as several parameters at several up to a hundred million data
points).
Profiling has shown that my program spends most of its time in a
function which does little more than (format) several floating point and
integer numbers to an output stream.
Another test has shown that reading data using (read) is comparably
slow.
My I/O needs are really quite simple, I just need to write (and read)
rows composed of floats and integers. I guess both format and read are
overkill for that.
What would be the best way to do the I/O faster?
--J.
Jan Rychter <···@rychter.com> writes:
> I'm looking for advice on how to do faster I/O from CL. I have an
> application (a simulation) that generates significant amounts of data
> (such as several parameters at several up to a hundred million data
> points).
>
> Profiling has shown that my program spends most of its time in a
> function which does little more than (format) several floating point and
> integer numbers to an output stream.
>
> Another test has shown that reading data using (read) is comparably
> slow.
>
> My I/O needs are really quite simple, I just need to write (and read)
> rows composed of floats and integers. I guess both format and read are
> overkill for that.
>
> What would be the best way to do the I/O faster?
- buffers
- binary I/O
For example:
(defparameter seq (make-array '(1000)
                              :element-type '(SIGNED-BYTE 32)
                              :initial-element 123456789))

;; Write the 1000-element vector 1000 times (4,000,000 bytes in all).
(time
 (with-open-file (out "/tmp/file.dat"
                      :direction :output
                      :if-exists :supersede :if-does-not-exist :create
                      :element-type '(SIGNED-BYTE 32))
   (dotimes (i 1000)
     (write-sequence seq out))))

;; Read the data back into the same vector. Signal an error rather
;; than create an empty file if the input is missing.
(time
 (with-open-file (in "/tmp/file.dat"
                     :direction :input
                     :if-does-not-exist :error
                     :element-type '(SIGNED-BYTE 32))
   (dotimes (i 1000)
     (read-sequence seq in))))
gives the following times with clisp-2.30 on a 1200 MHz Athlon (writing first, then reading):
Real time: 0.564498 sec.
Run time: 0.43 sec.
Space: 94324 Bytes
Real time: 1.631381 sec.
Run time: 1.16 sec.
Space: 12094324 Bytes
GC: 12, GC time: 0.68 sec.
[······@thalassa tmp]$ od -x /tmp/file.dat
0000000 cd15 075b cd15 075b cd15 075b cd15 075b
*
17204400
For floating-point numbers, use: integer-decode-float
(multiple-value-bind (signif expon sign) (integer-decode-float f)
(* (scale-float signif expon) sign))
== f
--
__Pascal_Bourguignon__ http://www.informatimago.com/
----------------------------------------------------------------------
Do not adjust your mind, there is a fault in reality.
Pascal Bourguignon wrote:
>
>
>
> For floating-point numbers, use: integer-decode-float
>
> (multiple-value-bind (signif expon sign) (integer-decode-float f)
> (* (scale-float signif expon) sign))
>
> == f
>
I have some problems with this example. Using CMUCL I get an error message.
integer-decode-float returns three integers but scale-float expects a
float and an integer. Am I wrong?
Rolf Wester
Rolf Wester <······@ilt.fhg.de> writes:
> I have some problems with this example. Using CMUCL I get an error message.
> integer-decode-float returns three integers but scale-float expects a
> float and an integer. Am I wrong?
The fastest way to dump binary data to disk (and retrieve it) under
cmucl is through the following code, posted some time ago by Pierre
Mai on cmucl-help.
It uses the compiler's fasl (fast loading) facility:
(defvar *internal-value-passer*)

(defun bindump (object filename)
  (let ((file (c::open-fasl-file (pathname filename) nil t)))
    (unwind-protect
         (let ((c::*coalesce-constants* nil)
               (c::*dump-only-valid-structures* nil)
               (c::*cold-load-dump* t))
           (c::dump-fop 'lisp::fop-normal-load file)
           (c::dump-object `(setq *internal-value-passer* (quote ,object)) file)
           (c::dump-fop 'lisp::fop-eval-for-effect file))
      (c::close-fasl-file file nil)))
  t)

(defun binload (filename)
  (let ((*internal-value-passer* nil))
    (load filename :verbose nil :print nil)
    *internal-value-passer*))
The object argument to bindump can be pretty much anything. I've tried
it with structs, lists of vectors, symbols, etc., and it just works.
And is /FAST/.
Regards,
Mario.
Mario S. Mommer <········@yahoo.com> writes:
> Rolf Wester <······@ilt.fhg.de> writes:
> > I have some problems with this example. Using CMUCL I get an error message.
> > integer-decode-float returns three integers but scale-float expects a
> > float and an integer. Am I wrong?
>
> The fastest way to dump binary data to disk (and retrieving it) under
> cmucl is through the following code, posted some time ago by Pierre
> Mai on cmucl-help.
>
> It uses the compiler's fasl (fast loading) facility:
>
> (defvar *internal-value-passer*)
>
> (defun bindump (object filename)
> (let ((file (c::open-fasl-file (pathname filename) nil t)))
> (unwind-protect
> (let ((c::*coalesce-constants* nil)
> (c::*dump-only-valid-structures* nil)
> (c::*cold-load-dump* t))
> (c::dump-fop 'lisp::fop-normal-load file)
> (c::dump-object `(setq *internal-value-passer* (quote ,object)) file)
> (c::dump-fop 'lisp::fop-eval-for-effect file))
> (c::close-fasl-file file nil)))
> t)
>
> (defun binload (filename)
> (let ((*internal-value-passer* nil))
> (load filename :verbose nil :print nil)
> *internal-value-passer*))
>
> The object argument to bindump can be pretty much anything. I've tried
> with structs, lists of vectors and symbols etc. it just works.
>
> And is /FAST/.
And NON portable.
I admit that I replaced decode-float by integer-decode-float blindly
in the example found in CLHS. You could convert the first integer to
a float without loss of precision, because it's the mantissa of a
float.
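A minimal sketch of the corrected round trip, assuming the goal is to get back a float equal to the original. The helper names float-to-integers and integers-to-float and the prototype argument are made up for illustration; only integer-decode-float, float and scale-float are standard.

```lisp
;; Hypothetical helper names; only INTEGER-DECODE-FLOAT, FLOAT and
;; SCALE-FLOAT come from the standard.
(defun float-to-integers (f)
  "Return the significand, exponent and sign of F as three integer values."
  (integer-decode-float f))

(defun integers-to-float (signif expon sign &optional (prototype 1.0d0))
  "Rebuild a float of PROTOTYPE's format from the three integers."
  ;; SCALE-FLOAT wants a float as its first argument, so convert the
  ;; integer significand with FLOAT before scaling it by 2^EXPON.
  (* sign (scale-float (float signif prototype) expon)))
```

With these, (multiple-value-call #'integers-to-float (float-to-integers f)) should compare = to f for a double-float f; pass 1.0f0 as the prototype to get a single-float back.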
--
__Pascal_Bourguignon__ http://www.informatimago.com/
Jan Rychter <···@rychter.com> writes:
> I'm looking for advice on how to do faster I/O from CL. I have an
> application (a simulation) that generates significant amounts of data
> (such as several parameters at several up to a hundred million data
> points).
>
> Profiling has shown that my program spends most of its time in a
> function which does little more than (format) several floating point and
> integer numbers to an output stream.
>
Kevin Rosenberg has recently written some articles about this topic at
<URL:http://b9.com/> ("Avoiding Allegro's Format", "A Bottleneck").
These should help at the output side.
>
> Another test has shown that reading data using (read) is comparably
> slow.
>
> My I/O needs are really quite simple, I just need to write (and read)
> rows composed of floats and integers. I guess both format and read are
> overkill for that.
>
Use specific functions: parse-integer instead of read when you know an
integer is coming, and check if your implementation supplies a way to
parse floats (most do, AFAIK).
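As a rough illustration for the integer case only, here is a sketch that walks a line with parse-integer instead of calling read. The function name parse-integer-row is hypothetical, and it assumes fields are separated by spaces or tabs; floats would still need an implementation-specific or hand-written parser.

```lisp
;; Hypothetical helper: parse a whole row of integers without READ.
(defun parse-integer-row (line)
  "Parse whitespace-separated decimal integers from the string LINE."
  (let ((result '())
        (pos 0)
        (end (length line)))
    (loop
      ;; Skip separators between fields.
      (loop while (and (< pos end)
                       (member (char line pos) '(#\Space #\Tab)))
            do (incf pos))
      (when (>= pos end)
        (return (nreverse result)))
      ;; :JUNK-ALLOWED makes PARSE-INTEGER stop at the first character
      ;; that is not part of the number and return that index as its
      ;; second value.
      (multiple-value-bind (value next)
          (parse-integer line :start pos :junk-allowed t)
        (unless value
          (error "No integer at position ~D in ~S" pos line))
        (push value result)
        (setf pos next)))))
```

Reading each row with read-line and handing it to a function like this avoids invoking the full Lisp reader for every number.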
Hope that helps,
Rudi
Rudi Schlatte <·········@ist.tu-graz.ac.at> writes:
> Jan Rychter <···@rychter.com> writes:
>
> > I'm looking for advice on how to do faster I/O from CL. I have an
> > application (a simulation) that generates significant amounts of data
> > (such as several parameters at several up to a hundred million data
> > points).
> >
> > Profiling has shown that my program spends most of its time in a
> > function which does little more than (format) several floating point and
> > integer numbers to an output stream.
> >
>
> Kevin Rosenberg has recently written some articles about this topic at
> <URL:http://b9.com/> ("Avoiding Allegro's Format", "A Bottleneck").
> These should help at the output side.
The macro "formatter" may come in handy - it creates a
function that is equivalent to a format string. E.g., instead of
(format t "~a, ~a~%" "hello" "world")
you'd use
(format t (formatter "~a, ~a~%") "hello" "world")
With SBCL, (macroexpand '(formatter "~a, ~a~%")) gives
#'(LAMBDA
      (STREAM
       &OPTIONAL
       (#:FORMAT-ARG-2478
        (ERROR 'SB-FORMAT::FORMAT-ERROR
               :COMPLAINT
               "required argument missing"
               :CONTROL-STRING
               "~a, ~a~%"
               :OFFSET
               1))
       (#:FORMAT-ARG-2479
        (ERROR 'SB-FORMAT::FORMAT-ERROR
               :COMPLAINT
               "required argument missing"
               :CONTROL-STRING
               "~a, ~a~%"
               :OFFSET
               5))
       &REST SB-FORMAT::ARGS)
    (BLOCK NIL
      (PRINC #:FORMAT-ARG-2478 STREAM)
      (WRITE-STRING ", " STREAM)
      (PRINC #:FORMAT-ARG-2479 STREAM)
      (TERPRI STREAM))
    SB-FORMAT::ARGS)
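For the row-writing case, a small sketch of how this might be used. The names *row-writer* and write-rows are invented for the example, and the two-integer row layout is an assumption. Whether it is actually faster depends on the implementation, since some compilers already process literal control strings at compile time.

```lisp
;; The control string is turned into a function once, when this form
;; is compiled or loaded, instead of being interpreted on every call.
(defparameter *row-writer* (formatter "~D ~D~%"))

(defun write-rows (rows stream)
  "Write each two-element list of integers in ROWS to STREAM as one line."
  (dolist (row rows)
    ;; FORMAT accepts a FORMATTER function in place of a control string.
    (format stream *row-writer* (first row) (second row))))
```

Calling (write-rows '((1 2) (3 4)) *standard-output*) writes one line per row.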
--
Raymond Wiker Mail: ·············@fast.no
"Jan Rychter" <···@rychter.com> wrote in message ···················@tnuctip.rychter.com...
> I'm looking for advice on how to do faster I/O from CL. I have an
> application (a simulation) that generates significant amounts of data
> (such as several parameters at several up to a hundred million data
> points).
...
>
> What would be the best way to do the I/O faster?
I have some questions.
Which CL Implementation are you using?
Do you wish to stay within ANSI CL functionality, or
are you willing to use implementation specific extensions?
Is this data being read at your app's start-up, or is it
somehow doing it all the time? Why are you saving and
then reading the data?
Wade
>>>>> "Wade" == Wade Humeniuk <····@nospam.nowhere> writes:
Wade> "Jan Rychter" <···@rychter.com> wrote in message
Wade> ···················@tnuctip.rychter.com...
>> I'm looking for advice on how to do faster I/O from CL. I have an
>> application (a simulation) that generates significant amounts of
>> data (such as several parameters at several up to a hundred million
>> data points).
Wade> ...
>>
>> What would be the best way to do the I/O faster?
Wade> I have some questions.
Wade> Which CL Implementation are you using?
CMUCL-18e.
Wade> Do you wish to stay within ANSI CL functionality, or are you
Wade> willing to use implementation specific extensions?
I'd prefer to stay with ANSI CL, if that's possible. If not, I'm willing
to trade portability for performance (but that's always a painful choice
to make).
Wade> Is this data being read at your app's start-up, or is it somehow
Wade> doing it all the time?
I probably should have mentioned that I need the data to be read by
other applications as well (such as code written in Mathematica). So it
pretty much has to be numbers represented in ASCII, or I'll have to
implement (and debug) binary I/O.
Actually, binary I/O could be an option, but probably only if I had a
library that would service one of existing scientific formats (such as
HDF or FITS, both of which Mathematica can read). However, googling for
"HDF format lisp" didn't produce much interesting stuff.
Wade> Why are you saving and then reading the data?
I'm running a simulation and I can't afford to hold everything in
memory. However, after the simulation is done, I can afford to read
*some* of the generated data back and process it.
BTW, thanks to everybody who gave answers and suggestions in this
thread.
--J.
Jan Rychter <···@rychter.com> writes:
> I probably should have mentioned that I need the data to be read by
> other applications as well (such as code written in Mathematica). So it
> pretty much has to be numbers represented in ASCII, or I'll have to
> implement (and debug) binary I/O.
Do IEEE floating point units provide a float to ASCII hardware
operation? I doubt it. So converting between floating points and
ASCII is necessarily a costly operation (well, in terms of
computation, not of I/O).
First try with buffering: instead of outputting directly from the
format, put the result in a buffer and write out the buffer.
If that's not fast enough, then it'll probably mean that the problem
comes from the conversion and that you need to do binary I/O.
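A sketch of what the buffering experiment could look like, using a string output stream as the buffer. The function name write-rows-buffered and the rows-per-chunk keyword are hypothetical, and rows are assumed to be lists of numbers. Many implementations already buffer file streams internally, so any gain has to be measured.

```lisp
;; Hypothetical helper: collect formatted rows in memory and hand them
;; to the real stream in large pieces.
(defun write-rows-buffered (rows stream &key (rows-per-chunk 1000))
  "Format ROWS into a string buffer and write it to STREAM in chunks."
  (let ((buffer (make-string-output-stream))
        (pending 0))
    (dolist (row rows)
      ;; One line per row, fields separated by single spaces.
      (format buffer "~{~A~^ ~}~%" row)
      (when (>= (incf pending) rows-per-chunk)
        ;; GET-OUTPUT-STREAM-STRING returns the accumulated text and
        ;; clears the buffer for the next chunk.
        (write-string (get-output-stream-string buffer) stream)
        (setf pending 0)))
    ;; Flush whatever is left over.
    (write-string (get-output-stream-string buffer) stream)
    (values)))
```

If timing this shows no improvement over writing directly, that points at the number-to-text conversion rather than the stream as the bottleneck.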
> Actually, binary I/O could be an option, but probably only if I had a
> library that would service one of existing scientific formats (such as
> HDF or FITS, both of which Mathematica can read). However, googling for
> "HDF format lisp" didn't produce much interesting stuff.
The integers obtained from integer-decode-float should allow for a very
portable binary format.
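To make that concrete, here is one possible layout, purely as an illustration: 8 octets of significand, 2 octets of biased exponent and 1 octet of sign, all big-endian, on an (unsigned-byte 8) stream. The layout, the bias of 32768 and all four function names are assumptions made up for this sketch. It is not IEEE 754 and not a format that other tools read out of the box.

```lisp
(defun write-uint (value nbytes stream)
  "Write the non-negative integer VALUE as NBYTES big-endian octets."
  (loop for shift from (* 8 (1- nbytes)) downto 0 by 8
        do (write-byte (ldb (byte 8 shift) value) stream)))

(defun read-uint (nbytes stream)
  "Read NBYTES big-endian octets and return them as one integer."
  (let ((value 0))
    (dotimes (i nbytes value)
      (setf value (logior (ash value 8) (read-byte stream))))))

(defun write-double (f stream)
  "Write the double-float F as significand, biased exponent and sign."
  (multiple-value-bind (signif expon sign) (integer-decode-float f)
    (write-uint signif 8 stream)
    ;; Bias the exponent so that it is stored as a non-negative number.
    (write-uint (+ expon 32768) 2 stream)
    (write-uint (if (minusp sign) 1 0) 1 stream)))

(defun read-double (stream)
  "Read back a double-float written by WRITE-DOUBLE."
  (let* ((signif (read-uint 8 stream))
         (expon (- (read-uint 2 stream) 32768))
         (negative (= 1 (read-uint 1 stream))))
    ;; Convert the significand to a float first; SCALE-FLOAT needs one.
    (* (if negative -1 1)
       (scale-float (float signif 1.0d0) expon))))
```

At 11 octets per number this is larger than a raw 8-octet double, so it trades space for portability between Lisp implementations.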
> Wade> Why are you saving and then reading the data?
>
> I'm running a simulation and I can't afford to hold everything in
> memory. However, after the simulation is done, I can afford to read
> *some* of the generated data back and process it.
>
> BTW, thanks to everybody who gave answers and suggestions in this
> thread.
>
> --J.
--
__Pascal_Bourguignon__ http://www.informatimago.com/