File I/O slowness

From: Jan Rychter
Subject: File I/O slowness
Date: Tue, 03 Jun 2003 07:41:55 +0000
Message-ID: <m2u1b7txoa.fsf@tnuctip.rychter.com>

I'm looking for advice on how to do faster I/O from CL. I have an
application (a simulation) that generates significant amounts of data
(such as several parameters at several up to a hundred million data
points).

Profiling has shown that my program spends most of its time in a
function which does little more than (format) several floating point and
integer numbers to an output stream.

Another test has shown that reading data using (read) is comparably
slow.

My I/O needs are really quite simple, I just need to write (and read)
rows composed of floats and integers. I guess both format and read are
overkill for that.

What would be the best way to do the I/O faster?

--J.

Re: File I/O slowness Pascal Bourguignon
- Re: File I/O slowness Rolf Wester
  - Re: File I/O slowness Mario S. Mommer
    - Re: File I/O slowness Pascal Bourguignon
      - Re: File I/O slowness Mario S. Mommer
Re: File I/O slowness Rudi Schlatte
- Re: File I/O slowness Raymond Wiker
Re: File I/O slowness Wade Humeniuk
- Re: File I/O slowness Jan Rychter
  - Re: File I/O slowness Pascal Bourguignon

From: Pascal Bourguignon
Subject: Re: File I/O slowness
Date: Tue, 03 Jun 2003 08:56:05 +0000
Message-ID: <87vfvnczfe.fsf@thalassa.informatimago.com>

Jan Rychter <···@rychter.com> writes:

> I'm looking for advice on how to do faster I/O from CL. I have an
> application (a simulation) that generates significant amounts of data
> (such as several parameters at several up to a hundred million data
> points).
> 
> Profiling has shown that my program spends most of its time in a
> function which does little more than (format) several floating point and
> integer numbers to an output stream.
> 
> Another test has shown that reading data using (read) is comparably
> slow.
> 
> My I/O needs are really quite simple, I just need to write (and read)
> rows composed of floats and integers. I guess both format and read are
> overkill for that.
> 
> What would be the best way to do the I/O faster?

- buffers
- binary I/O


For example:

(defparameter seq (make-array '(1000)
                              :element-type '(SIGNED-BYTE 32)
                              :initial-element 123456789));;seq

(time
  (with-open-file (out "/tmp/file.dat"
                       :direction :output
                       :if-exists :supersede :if-does-not-exist :create
                       :element-type '(SIGNED-BYTE 32))
    (dotimes (i 1000)
      (write-sequence seq out))))


(time
  (with-open-file (in  "/tmp/file.dat"
                       :direction :input
                       :if-does-not-exist :error
                       :element-type '(SIGNED-BYTE 32))
    (dotimes (i 1000)
      (read-sequence seq in))))

gives the following times in clisp-2.30/Athlon 1200 MHz

Real time: 0.564498 sec.
Run time: 0.43 sec.
Space: 94324 Bytes

Real time: 1.631381 sec.
Run time: 1.16 sec.
Space: 12094324 Bytes
GC: 12, GC time: 0.68 sec.

[······@thalassa tmp]$ od -x /tmp/file.dat 
0000000 cd15 075b cd15 075b cd15 075b cd15 075b
*
17204400



For floating-point numbers, use: integer-decode-float

 (multiple-value-bind (signif expon sign) (integer-decode-float f)
   (* (scale-float signif expon) sign))

 ==  f



-- 
__Pascal_Bourguignon__                   http://www.informatimago.com/
----------------------------------------------------------------------
Do not adjust your mind, there is a fault in reality.

From: Rolf Wester
Subject: Re: File I/O slowness
Date: Tue, 03 Jun 2003 10:04:13 +0000
Message-ID: <bbhrqt$l1r$1@nets3.rz.RWTH-Aachen.DE>

Pascal Bourguignon wrote:
> 
> 
> 
> For floating-point numbers, use: integer-decode-float
> 
>  (multiple-value-bind (signif expon sign) (integer-decode-float f)
>    (* (scale-float signif expon) sign))
> 
>  ==  f
> 

I have some problems with this example. Using CMUCL I get an error message.
integer-decode-float returns three integers but scale-float expects a 
float and an integer. Am I wrong?

Rolf Wester

From: Mario S. Mommer
Subject: Re: File I/O slowness
Date: Tue, 03 Jun 2003 10:15:27 +0000
Message-ID: <fzisrna2m8.fsf@cupid.igpm.rwth-aachen.de>

Rolf Wester <······@ilt.fhg.de> writes:
> I have some problems with this example. Using CMUCL I get an error message.
> integer-decode-float returns three integers but scale-float expects a
> float and an integer. Am I wrong?

The fastest way to dump binary data to disk (and retrieving it) under
cmucl is through the following code, posted some time ago by Pierre
Mai on cmucl-help.

It uses the compiler's fasl (fast loading) facility:

(defvar *internal-value-passer*)

(defun bindump (object filename)
  (let ((file (c::open-fasl-file (pathname filename) nil t)))
    (unwind-protect
        (let ((c::*coalesce-constants* nil)
              (c::*dump-only-valid-structures* nil)
              (c::*cold-load-dump* t))
          (c::dump-fop 'lisp::fop-normal-load file)
          (c::dump-object `(setq *internal-value-passer* (quote ,object)) file)
          (c::dump-fop 'lisp::fop-eval-for-effect file))
      (c::close-fasl-file file nil)))
  t)

(defun binload (filename)
  (let ((*internal-value-passer* nil))
    (load filename :verbose nil :print nil)
    *internal-value-passer*))

The object argument to bindump can be pretty much anything. I've tried
with structs, lists of vectors and symbols etc. it just works.

And is /FAST/.

Regards,
        Mario.

From: Pascal Bourguignon
Subject: Re: File I/O slowness
Date: Wed, 04 Jun 2003 12:25:27 +0000
Message-ID: <878ysif2rs.fsf@thalassa.informatimago.com>

Mario S. Mommer <········@yahoo.com> writes:

> Rolf Wester <······@ilt.fhg.de> writes:
> > I have some problems with this example. Using CMUCL I get an error message.
> > integer-decode-float returns three integers but scale-float expects a
> > float and an integer. Am I wrong?
> 
> The fastest way to dump binary data to disk (and retrieving it) under
> cmucl is through the following code, posted some time ago by Pierre
> Mai on cmucl-help.
> 
> It uses the compiler's fasl (fast loading) facility:
> 
> (defvar *internal-value-passer*)
> 
> (defun bindump (object filename)
>   (let ((file (c::open-fasl-file (pathname filename) nil t)))
>     (unwind-protect
>         (let ((c::*coalesce-constants* nil)
>               (c::*dump-only-valid-structures* nil)
>               (c::*cold-load-dump* t))
>           (c::dump-fop 'lisp::fop-normal-load file)
>           (c::dump-object `(setq *internal-value-passer* (quote ,object)) file)
>           (c::dump-fop 'lisp::fop-eval-for-effect file))
>       (c::close-fasl-file file nil)))
>   t)
> 
> (defun binload (filename)
>   (let ((*internal-value-passer* nil))
>     (load filename :verbose nil :print nil)
>     *internal-value-passer*))
> 
> The object argument to bindump can be pretty much anything. I've tried
> with structs, lists of vectors and symbols etc. it just works.
> 
> And is /FAST/.

And NON portable.  

I admit that I replaced decode-float by integer-decode-float blindly
in the example found in CLHS. You could convert the first integer to
a float without loss of precision, because it's the mantissa of a
float.
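
A minimal sketch of that corrected round trip, using only standard
functions. The helper names float-parts and parts-float are made up
here for illustration and do not come from CLHS:

```lisp
;; Hypothetical helpers, invented for this sketch.
(defun float-parts (f)
  "Return the integer significand, exponent and sign of the float F as a list."
  (multiple-value-list (integer-decode-float f)))

(defun parts-float (parts prototype)
  "Rebuild a float in the same format as PROTOTYPE from PARTS."
  (destructuring-bind (signif expon sign) parts
    ;; FLOAT converts the integer significand exactly, because it fits in
    ;; the mantissa; SCALE-FLOAT needs a float as its first argument.
    (* sign (scale-float (float signif prototype) expon))))
```

For example, (parts-float (float-parts 3.14d0) 1d0) should give back
3.14d0. The three integers are what would go into a binary file.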


From: Mario S. Mommer
Subject: Re: File I/O slowness
Date: Fri, 06 Jun 2003 08:15:27 +0000
Message-ID: <fz3cinaag0.fsf@cupid.igpm.rwth-aachen.de>

Pascal Bourguignon <····@thalassa.informatimago.com> writes:
> Mario S. Mommer <········@yahoo.com> writes:
> > 
> > And is /FAST/.
> 
> And NON portable.  

Here's a portable version, based on another idea by Pierre Mai
<······································@cons.org/msg00547.html>.
Please excuse the rather lame temporary filename generation :)

(defpackage #:faslstore
  (:export #:bindump #:binload)
  (:nicknames #:fs)
  (:use :cl))

(in-package #:faslstore)

(defparameter *hook* nil)

(defun gentempname nil
  "Generate a rather unlikely filename."
  (format nil "~Afaslize.lisp" (get-universal-time)))

(defun bindump (data fname)
  (let ((tmp (gentempname)))
    (setq *hook* data)
    (with-open-file (str tmp
                         :direction :output
                         :if-exists :supersede) ;:error)
      (format str "(in-package #:faslstore)~%~
                     (let (c)~%~
                       (defun returner nil~%~
                       (cond ((not c) (setf c t) '#.*hook*)~%~
                             (t nil))))~%"))
    (compile-file tmp :output-file fname)
    (delete-file tmp)))

(defun returner nil nil)

(defun binload (fname)
  (load fname)
  (returner))

From: Rudi Schlatte
Subject: Re: File I/O slowness
Date: Tue, 03 Jun 2003 09:48:31 +0000
Message-ID: <8765nnze34.fsf@semmel.constantly.at>

Jan Rychter <···@rychter.com> writes:

> I'm looking for advice on how to do faster I/O from CL. I have an
> application (a simulation) that generates significant amounts of data
> (such as several parameters at several up to a hundred million data
> points).
>
> Profiling has shown that my program spends most of its time in a
> function which does little more than (format) several floating point and
> integer numbers to an output stream.
>

Kevin Rosenberg has recently written some articles about this topic at
<URL:http://b9.com/> ("Avoiding Allegro's Format", "A Bottleneck").
These should help at the output side.

>
> Another test has shown that reading data using (read) is comparably
> slow.
>
> My I/O needs are really quite simple, I just need to write (and read)
> rows composed of floats and integers. I guess both format and read are
> overkill for that.
>

Use specific functions: parse-integer instead of read when you know an
integer is coming, and check if your implementation supplies a way to
parse floats (most do, AFAIK).
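
A rough sketch of the integer side, assuming rows of
whitespace-separated integers. The function name parse-integer-row is
made up for this example, and floats are not handled, since ANSI CL
has no standard float parser:

```lisp
;; Hypothetical helper: parse a line of whitespace-separated integers
;; without going through the Lisp reader.
(defun parse-integer-row (line)
  "Return the list of integers found in LINE, e.g. \"12 -7 300\"."
  (let ((result '())
        (start 0)
        (end (length line)))
    (loop
      ;; With :JUNK-ALLOWED T, PARSE-INTEGER skips leading whitespace, stops
      ;; at the first non-digit, and returns NIL instead of signalling an
      ;; error when no integer is found.
      (multiple-value-bind (value next)
          (parse-integer line :start start :end end :junk-allowed t)
        (when (null value)
          (return (nreverse result)))
        (push value result)
        (setf start next)))))
```

Note that this stops silently at the first token that is not an
integer, so it only suits data you wrote yourself.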

Hope that helps,

Rudi
-- 
whois DRS1020334-NICAT                    http://constantly.at/pubkey.gpg.asc
     Key fingerprint = C182 F738 6B9A 83AF 9C25  62D9 EFAE 45A6 9A69 0867

From: Raymond Wiker
Subject: Re: File I/O slowness
Date: Tue, 03 Jun 2003 10:08:52 +0000
Message-ID: <86r86bscaz.fsf@raw.grenland.fast.no>

Rudi Schlatte <·········@ist.tu-graz.ac.at> writes:

> Jan Rychter <···@rychter.com> writes:
> 
> > I'm looking for advice on how to do faster I/O from CL. I have an
> > application (a simulation) that generates significant amounts of data
> > (such as several parameters at several up to a hundred million data
> > points).
> >
> > Profiling has shown that my program spends most of its time in a
> > function which does little more than (format) several floating point and
> > integer numbers to an output stream.
> >
> 
> Kevin Rosenberg has recently written some articles about this topic at
> <URL:http://b9.com/> ("Avoiding Allegro's Format", "A Bottleneck").
> These should help at the output side.

        The macro "formatter" may come in handy - it creates a
function that is equivalent to a format string. E.g., instead of 

(format t "~a, ~a~%" "hello" "world")

you'd use

(format t (formatter "~a, ~a~%") "hello" "world")

With SBCL, (macroexpand '(formatter "~a, ~a~%")) gives 

#'(LAMBDA
      (STREAM
       &OPTIONAL
       (#:FORMAT-ARG-2478
        (ERROR 'SB-FORMAT::FORMAT-ERROR
               :COMPLAINT
               "required argument missing"
               :CONTROL-STRING
               "~a, ~a~%"
               :OFFSET
               1))
       (#:FORMAT-ARG-2479
        (ERROR 'SB-FORMAT::FORMAT-ERROR
               :COMPLAINT
               "required argument missing"
               :CONTROL-STRING
               "~a, ~a~%"
               :OFFSET
               5))
       &REST SB-FORMAT::ARGS)
    (BLOCK NIL
      (PRINC #:FORMAT-ARG-2478 STREAM)
      (WRITE-STRING ", " STREAM)
      (PRINC #:FORMAT-ARG-2479 STREAM)
      (TERPRI STREAM))
    SB-FORMAT::ARGS)
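
Applied to rows of numbers, this could look like the sketch below.
The function name write-row and the row layout (one integer, one
float) are assumptions for the example. Whether it is actually faster
depends on the implementation, since some compilers already
pre-process constant control strings:

```lisp
;; Hypothetical example: the row layout is made up for illustration.
(defun write-row (stream index value)
  "Write the integer INDEX and the float VALUE as one text line to STREAM."
  ;; FORMATTER is expanded at compile time, so the control string does not
  ;; have to be parsed again on every call.
  (format stream (formatter "~D ~F~%") index value))
```

Calling (write-row *standard-output* 3 1.5) prints the two numbers
separated by a space, followed by a newline.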


-- 
Raymond Wiker                        Mail:  ·············@fast.no
Senior Software Engineer             Web:   http://www.fast.no/
Fast Search & Transfer ASA           Phone: +47 23 01 11 60
P.O. Box 1677 Vika                   Fax:   +47 35 54 87 99
NO-0120 Oslo, NORWAY                 Mob:   +47 48 01 11 60

Try FAST Search: http://alltheweb.com/

From: Wade Humeniuk
Subject: Re: File I/O slowness
Date: Tue, 03 Jun 2003 14:59:47 +0000
Message-ID: <DJ2Da.8904$6f3.1666389@news1.telusplanet.net>

"Jan Rychter" <···@rychter.com> wrote in message ···················@tnuctip.rychter.com...
> I'm looking for advice on how to do faster I/O from CL. I have an
> application (a simulation) that generates significant amounts of data
> (such as several parameters at several up to a hundred million data
> points).
...
> 
> What would be the best way to do the I/O faster?

I have some questions.

Which CL Implementation are you using?

Do you wish to stay within ANSI CL functionality, or
are you willing to use implementation specific extensions?

Is this data being read at your apps start-up, or is it
somehow doing it all the time?  Why are you saving and
then reading the data?

Wade

From: Jan Rychter
Subject: Re: File I/O slowness
Date: Tue, 03 Jun 2003 21:31:07 +0000
Message-ID: <m21xyasvag.fsf@tnuctip.rychter.com>

>>>>> "Wade" == Wade Humeniuk <····@nospam.nowhere> writes:
 Wade> "Jan Rychter" <···@rychter.com> wrote in message
 Wade> ···················@tnuctip.rychter.com...
 >> I'm looking for advice on how to do faster I/O from CL. I have an
 >> application (a simulation) that generates significant amounts of
 >> data (such as several parameters at several up to a hundred million
 >> data points).
 Wade> ...
 >>
 >> What would be the best way to do the I/O faster?

 Wade> I have some questions.

 Wade> Which CL Implementation are you using?

CMUCL-18e.

 Wade> Do you wish to stay within ANSI CL functionality, or are you
 Wade> willing to use implementation specific extensions?

I'd prefer to stay with ANSI CL, if that's possible. If not, I'm willing
to trade portability for performance (but that's always a painful choice
to make).

 Wade> Is this data being read at your apps start-up, or is it somehow
 Wade> doing it all the time?  

I probably should have mentioned that I need the data to be read by
other applications as well (such as code written in Mathematica). So it
pretty much has to be numbers represented in ASCII, or I'll have to
implement (and debug) binary I/O.

Actually, binary I/O could be an option, but probably only if I had a
library that would service one of existing scientific formats (such as
HDF or FITS, both of which Mathematica can read). However, googling for
"HDF format lisp" didn't produce much interesting stuff.

 Wade> Why are you saving and then reading the data?

I'm running a simulation and I can't afford to hold everything in
memory. However, after the simulation is done, I can afford to read
*some* of the generated data back and process it.

BTW, thanks to everybody who gave answers and suggestions in this
thread.

--J.

From: Pascal Bourguignon
Subject: Re: File I/O slowness
Date: Wed, 04 Jun 2003 12:30:51 +0000
Message-ID: <874r36f2is.fsf@thalassa.informatimago.com>

Jan Rychter <···@rychter.com> writes:
> I probably should have mentioned that I need the data to be read by
> other applications as well (such as code written in Mathematica). So it
> pretty much has to be numbers represented in ASCII, or I'll have to
> implement (and debug) binary I/O.

Do IEEE floating point units provide a float to ASCII hardware
operation? I doubt it. So converting between floating-point numbers
and ASCII is necessarily a costly operation (well, in terms of
computation, not of I/O).

First try with buffering: instead of outputting directly from the
format, put the result in a buffer and write out the buffer.

If that's not fast enough, then it'll probably mean that the problem
comes from the conversion and that you need to do binary I/O.
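
A minimal sketch of the buffering idea, assuming rows of one integer
and one float each. The name write-rows-buffered and the row layout
are invented for this example. With a hundred million points you
would flush the buffer in chunks rather than build one huge string:

```lisp
;; Hypothetical sketch: ROWS is a list of (integer float) lists.
(defun write-rows-buffered (rows path)
  "Format all ROWS into one string, then write it to PATH in a single call.
Returns the number of characters written."
  (let ((text (with-output-to-string (buffer)
                (dolist (row rows)
                  (format buffer "~D ~F~%" (first row) (second row))))))
    (with-open-file (out path :direction :output
                              :if-exists :supersede
                              :if-does-not-exist :create)
      ;; One WRITE-STRING call instead of one stream operation per number.
      (write-string text out))
    (length text)))
```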

> Actually, binary I/O could be an option, but probably only if I had a
> library that would service one of existing scientific formats (such as
> HDF or FITS, both of which Mathematica can read). However, googling for
> "HDF format lisp" didn't produce much interesting stuff.

The number obtained from integer-decode-float should allow for a very
portable binary format.
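
One way such a format could look, as a sketch only. The layout (an
8-octet significand followed by a 2-octet exponent, both little-endian
two's complement) and all function names are invented for this
example, and the sign of negative zero is not preserved:

```lisp
;; Hypothetical layout, invented for this sketch: each double-float takes
;; 10 octets, a signed 8-octet significand and a signed 2-octet exponent.
(defun write-int (n octets out)
  "Write the integer N to OUT as OCTETS little-endian two's-complement octets."
  (dotimes (i octets)
    (write-byte (ldb (byte 8 (* 8 i)) n) out)))

(defun read-int (octets in)
  "Read a little-endian two's-complement integer of OCTETS octets from IN."
  (let ((n 0))
    (dotimes (i octets)
      (setf n (dpb (read-byte in) (byte 8 (* 8 i)) n)))
    ;; A set top bit means the value is negative.
    (if (logbitp (1- (* 8 octets)) n)
        (- n (ash 1 (* 8 octets)))
        n)))

(defun write-doubles (floats path)
  "Write the list of double-floats FLOATS to the binary file PATH."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :if-does-not-exist :create
                            :element-type '(unsigned-byte 8))
    (dolist (f floats)
      (multiple-value-bind (signif expon sign) (integer-decode-float f)
        (write-int (* sign signif) 8 out)
        (write-int expon 2 out)))))

(defun read-doubles (path count)
  "Read COUNT double-floats back from the binary file PATH."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (loop repeat count
          collect (let* ((signif (read-int 8 in))
                         (expon (read-int 2 in)))
                    (scale-float (float signif 1d0) expon)))))
```

Writing single octets in a fixed order avoids depending on how an
implementation lays out wider stream elements on disk, at the price of
more calls per number.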

>  Wade> Why are you saving and then reading the data?
> 
> I'm running a simulation and I can't afford to hold everything in
> memory. However, after the simulation is done, I can afford to read
> *some* of the generated data back and process it.
> 
> BTW, thanks to everybody who gave answers and suggestions in this
> thread.
> 
> --J.
