From: Software Scavenger
Subject: Big files in memory
Date: 
Message-ID: <a6789134.0112081618.728cef96@posting.google.com>
What is the easiest way, using LWW or ACL, to read a 20-megabyte file
into memory and establish random access to it as if it were an array?

From: Wade Humeniuk
Subject: Re: Big files in memory
Date: 
Message-ID: <9uud63$6hg$1@news3.cadvision.com>
Since it is Windows I would suggest using the win32 api to do a file mapping
(using the FFI)  into memory and then release it when you are done.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/file
map_79wn.asp

Wade


----- Original Message -----
From: "Software Scavenger" <··········@mailandnews.com>
Newsgroups: comp.lang.lisp
Sent: Saturday, December 08, 2001 5:18 PM
Subject: Big files in memory


> What is the easiest way, using LWW or ACL, to read a 20-megabyte file
> into memory and establish random access to it as if it were an array?
From: Kent M Pitman
Subject: Re: Big files in memory
Date: 
Message-ID: <sfw667hdw7n.fsf@shell01.TheWorld.com>
··········@mailandnews.com (Software Scavenger) writes:

> What is the easiest way, using LWW or ACL, to read a 20-megabyte file
> into memory and establish random access to it as if it were an array?

As a rule, CL is not a language in which it is a good idea to
overconstrain the answer to your question.  Is your desire to achieve
a particular solution to the question of random access, excluding
other possible solutions, or do you just assume that reading a file
into memory is the only way to do this?  If you could just open the
file in random access mode and move around in it and access various
parts of it, would that be sufficient?  CL is a language about "what"
you want to do, not "how" to do it.  (Things in Lisp give you
functionality, but little info on how that functionality occurs.  C,
by contrast, is a language that is very about "how" but not really
about "what".  C programs just do as they're told with little regard
as to what you're trying to achieve.)
From: Software Scavenger
Subject: Re: Big files in memory
Date: 
Message-ID: <a6789134.0112090108.4ecc776a@posting.google.com>
Kent M Pitman <······@world.std.com> wrote in message news:<···············@shell01.TheWorld.com>...

> As a rule, CL is not a language in which it is a good idea to
> overconstrain the answer to your question.  Is your desire to achieve
> a particular solution to the question of random access, excluding
> other possible solutions, or do you just assume that reading a file
> into memory is the only way to do this?  If you could just open the

Random access to a file which might be cached by the operating system
is one possible solution, but I'm also interested in solutions that
give better random access speed.  Having the data in memory directly
accessible by the program would be an attractive alternative, because
it wouldn't have the overhead of operating system calls.
From: Thomas F. Burdick
Subject: Re: Big files in memory
Date: 
Message-ID: <xcvd71o7nk0.fsf@conquest.OCF.Berkeley.EDU>
··········@mailandnews.com (Software Scavenger) writes:

> Kent M Pitman <······@world.std.com> wrote in message news:<···············@shell01.TheWorld.com>...
> 
> > As a rule, CL is not a language in which it is a good idea to
> > overconstrain the answer to your question.  Is your desire to achieve
> > a particular solution to the question of random access, excluding
> > other possible solutions, or do you just assume that reading a file
> > into memory is the only way to do this?  If you could just open the
> 
> Random access to a file which might be cached by the operating system
> is one possible solution, but I'm also interested in solutions that
> give better random access speed.  Having the data in memory directly
> accessible by the program would be an attractive alternative, because
> it wouldn't have the overhead of operating system calls.

In that case, why don't you just read the whole file into an array and
access it exactly like an array.

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'                               
From: Michael J. Ferrador
Subject: Re: Big files in memory - simple cases, optimization, correctness
Date: 
Message-ID: <GrLQ7.70911$MX1.11321865@news02.optonline.net>
Correctness (we can still talk about that in Lisp, right?
 should I still pursue The Little/Seasoned Schemers to learn this aspect?
 which other aspects?)

> > Kent M Pitman <······@world.std.com> wrote in message
news:<···············@shell01.TheWorld.com>...
> >
> > > As a rule, CL is not a language in which it is a good idea to
> > > overconstrain the answer to your question.  Is your desire to achieve
> > > a particular solution to the question of random access, excluding
> > > other possible solutions, or do you just assume that reading a file
> > > into memory is the only way to do this?  If you could just open the

The Sixth Commandment (TLS) - simplify only after the function is correct
probably has a corrollary - optimise only after the function is correct

> ··········@mailandnews.com (Software Scavenger) writes:
> > Random access to a file which might be cached by the operating system
> > is one possible solution, but I'm also interested in solutions that
> > give better random access speed.  Having the data in memory directly
> > accessible by the program would be an attractive alternative, because
> > it wouldn't have the overhead of operating system calls.

Write it using the vendor proven or at least tested STD-IO (is this C
pollution in my mind?)
Make sure the rest of your program is correct.

Then write your, FILE-AS-LIST (-ARRAY -VECTOR? maybe using the same STREAMS
interface?) But then you've got a second set of functions to prove correct.

Thomas F. Burdick <···@conquest.OCF.Berkeley.EDU> wrote in message
····················@conquest.OCF.Berkeley.EDU...
> In that case, why don't you just read the whole file into an array and
> access it exactly like an array.

you need to use STD-IO to read the file into the array (vector?), anyway
make sure you have practice using it, before your "simple pointer".

---

20M doesn't sound big, If it was a >n bits (32 - 4G) file
in a n bit environment running on a 2n bit CPU
(64 - all RISC, Intel always last) with OS high-mem-map-files...
From: Pierre R. Mai
Subject: Re: Big files in memory
Date: 
Message-ID: <87d71ojqqr.fsf@orion.bln.pmsf.de>
··········@mailandnews.com (Software Scavenger) writes:

> Random access to a file which might be cached by the operating system
> is one possible solution, but I'm also interested in solutions that
> give better random access speed.  Having the data in memory directly
> accessible by the program would be an attractive alternative, because
> it wouldn't have the overhead of operating system calls.

Using mmap is usually not going to be the best way to achieve high
random access speeds:  With mmap the operating system has to guess
your access patterns just from the isolated page faults it sees,
with far less knowledge than your application has, so it will often
do just the wrong thing.  If you are going for high-speed, you should
just read in the whole file (or suitable segments of it) in one go,
and do random access in memory, which is going to be far more
efficient.  For write access, flush out pages at regular intervals, or
synchronisation points (or implement some more complicated form of
transaction system, as required by your application).  With that
approach you might even overlap loading/writing and processing using
threads or async-io.

What mmap offers you is the ability to treat file-storage like memory,
at a commensurate cost in speed.  The problem is it offers that
directly only to languages whose native array datastructures are just
chunks of memory.

While CL implementations can usually use mmap and the generated
mappings through their FFIs, this doesn't offer the same level of
integration (and often speed as well) as CL's native datastructures
do.  E.g. in CMU CL you might use the following:

(use-package :alien)
(use-package :c-call)

(declaim (inline mmap))
(def-alien-routine "mmap" (* t)
  (start (* t) :in)
  (length unix:size-t :in)
  (prot int :in)
  (flags int :in)
  (fd int :in)
  (offset unix:off-t :in))

(declaim (inline munmap))
(def-alien-routine "munmap" int
  (start (* t) :in)
  (length unix:size-t :in))

(defun count-newlines (pathname)
  (with-open-file (stream pathname)
    (let* ((length (file-length stream))
           (map (mmap nil length #x01 #x02 (kernel::fd-stream-fd stream) 0))
           (fm (cast map (* unsigned-char))))
      (unwind-protect
          (do ((count 0)
               (newline (char-code #\Newline))
               (index 0 (1+ index)))
              ((= index length) count)
            (when (= (deref fm index) newline)
              (incf count)))
        (munmap map length)))))

As you can see you have to use DEREF to access elements of the
mapping, so you'll have to copy portions to CL datastructures in order
to use things like the sequence and string functions.

It is possible to implement specialized ARRAYs that are backed by
mmap, and integrate the FFI/external-format frobbing stuff, so that
the elements of the array/file make sense to foreign code in the end.
But in order to do that in a form that allows those arrays being used
in all operations where memory-backed ARRAYs make sense, you'll need
low-level support from your implementation.

You can of course also wrap your own generic code library that unifies
mmap-based FFI arrays, and normal arrays, but that will mean
duplicating most of the relevant CL library stuff.

Don't know if any implementations currently support such a beast,
though I heard that ACL has some support for mmap.

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Duane Rettig
Subject: Re: Big files in memory
Date: 
Message-ID: <4itbfjx06.fsf@beta.franz.com>
"Pierre R. Mai" <····@acm.org> writes:

> ··········@mailandnews.com (Software Scavenger) writes:
> 
> > Random access to a file which might be cached by the operating system
> > is one possible solution, but I'm also interested in solutions that
> > give better random access speed.  Having the data in memory directly
> > accessible by the program would be an attractive alternative, because
> > it wouldn't have the overhead of operating system calls.
> 
> Using mmap is usually not going to be the best way to achieve high
> random access speeds:  With mmap the operating system has to guess
> your access patterns just from the isolated page faults it sees,
> with far less knowledge than your application has, so it will often
> do just the wrong thing.  If you are going for high-speed, you should
> just read in the whole file (or suitable segments of it) in one go,
> and do random access in memory, which is going to be far more
> efficient.

I have to disagree, here.  For sparse data access and modification
on a large file, a memory-mapped approach will be much more efficient.
Contrast the reading of a whole 20 Mb file into memory, accessing parts
of that file, and (possibly) writing the file back out again, as opposed
to mapping the file into memory, modifying selected portions of the
memory, and synching the file.  The former always transfers 40 Mb of
data, while the latter transfers as few as 2 pages of data (on the
order of 10s to 100s of kbytes).

>  For write access, flush out pages at regular intervals, or
> synchronisation points (or implement some more complicated form of
> transaction system, as required by your application).  With that
> approach you might even overlap loading/writing and processing using
> threads or async-io.
> 
> What mmap offers you is the ability to treat file-storage like memory,
> at a commensurate cost in speed.  The problem is it offers that
> directly only to languages whose native array datastructures are just
> chunks of memory.

There are a couple of issues here.  Mmap is not portable - Windows at
least does not have it (though Windows has equivalent capabilities).
Also, while I agree that diving down to the memory-addressing level
is not desirable in CL, why should we resign ourselves to having to
do this, when we can instead bring all of the power of CL to be brought
to bear on such otherwise unstructured data?  I am, of course, alluding
to streams extensions in CL to implement mapped-files, which is desirable
because it allows the file to be treated in the same way that CL normally
treats a file, with access through a stream.  Any structure on top of
that is the same whether the file is mapped or buffered, and is simply
up to the user.

> While CL implementations can usually use mmap and the generated
> mappings through their FFIs, this doesn't offer the same level of
> integration (and often speed as well) as CL's native datastructures
> do.  E.g. in CMU CL you might use the following:

 [example and explanations elided ...]

> Don't know if any implementations currently support such a beast,
> though I heard that ACL has some support for mmap.

More generally, mapped-file streams.  Actually, CMUCL has such
support as well, though it is only a month old.  The last known
site for Paul Foley's simple-streams extensions for CMUCL (of
which mapped-files were a major concern of his) is:

http://www.actrix.gen.nz/users/mycroft/cl.html

I assume that Paul will inform us if the location changes.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Lieven Marchand
Subject: Re: Big files in memory
Date: 
Message-ID: <m38zcbm2f2.fsf@localhost.localdomain>
Duane Rettig <·····@franz.com> writes:

> "Pierre R. Mai" <····@acm.org> writes:
> 
> > Using mmap is usually not going to be the best way to achieve high
> > random access speeds:  With mmap the operating system has to guess
> > your access patterns just from the isolated page faults it sees,
> > with far less knowledge than your application has, so it will often
> > do just the wrong thing.  If you are going for high-speed, you should
> > just read in the whole file (or suitable segments of it) in one go,
> > and do random access in memory, which is going to be far more
> > efficient.
> 
> I have to disagree, here.  For sparse data access and modification
> on a large file, a memory-mapped approach will be much more efficient.
> Contrast the reading of a whole 20 Mb file into memory, accessing parts
> of that file, and (possibly) writing the file back out again, as opposed
> to mapping the file into memory, modifying selected portions of the
> memory, and synching the file.  The former always transfers 40 Mb of
> data, while the latter transfers as few as 2 pages of data (on the
> order of 10s to 100s of kbytes).

You leave out Pierre's caveat about suitable segments. The relative
efficiency of combinations of fopen+fseek+madvise versus mmap is
fairly complex and highly system and implementation dependent.

For example, in the Linux kernel there is a hidden cost in using mmap
of setting up the appropriate vm page tables, doing a page table flush
and later on having to scan those page tables to determine which pages
are dirty and have to be written back. For 2 pages with an madvise
that tells the kernel not to prefetch, mmap might be slower than
fseek+write.

-- 
Lieven Marchand <···@wyrd.be>
She says, "Honey, you're a Bastard of great proportion."
He says, "Darling, I plead guilty to that sin."
Cowboy Junkies -- A few simple words
From: Duane Rettig
Subject: Re: Big files in memory
Date: 
Message-ID: <4heqyeocj.fsf@beta.franz.com>
Lieven Marchand <···@wyrd.be> writes:

> Duane Rettig <·····@franz.com> writes:
> 
> > "Pierre R. Mai" <····@acm.org> writes:
> > 
> > > Using mmap is usually not going to be the best way to achieve high
> > > random access speeds:  With mmap the operating system has to guess
> > > your access patterns just from the isolated page faults it sees,
> > > with far less knowledge than your application has, so it will often
> > > do just the wrong thing.  If you are going for high-speed, you should
> > > just read in the whole file (or suitable segments of it) in one go,
> > > and do random access in memory, which is going to be far more
> > > efficient.
> > 
> > I have to disagree, here.  For sparse data access and modification
> > on a large file, a memory-mapped approach will be much more efficient.
> > Contrast the reading of a whole 20 Mb file into memory, accessing parts
> > of that file, and (possibly) writing the file back out again, as opposed
> > to mapping the file into memory, modifying selected portions of the
> > memory, and synching the file.  The former always transfers 40 Mb of
> > data, while the latter transfers as few as 2 pages of data (on the
> > order of 10s to 100s of kbytes).
> 
> You leave out Pierre's caveat about suitable segments.

Well, yes, I did - I took at face value the original poster's
desire to have _random_ access, and Pierre's suitable-segment
caveat removes this capability.  And although the suitable-segment
style does greatly decrease the time used over the whole-file-read
approach, there is a much larger cognitive overhead of trying to
predict what blocks of data are needed for any given run of the
application.  And although one might be able to _approach_ the
mmap style in speed by accurate prediction, why bother, when mmap
is faster (see below) and also so much easier because there is
no prediction involved?

> The relative
> efficiency of combinations of fopen+fseek+madvise versus mmap is
> fairly complex and highly system and implementation dependent.
> 
> For example, in the Linux kernel there is a hidden cost in using mmap
> of setting up the appropriate vm page tables, doing a page table flush
> and later on having to scan those page tables to determine which pages
> are dirty and have to be written back. For 2 pages with an madvise
> that tells the kernel not to prefetch, mmap might be slower than
> fseek+write.

Did you actually try this?

[Readers, some of the following is more C/operating-system related,
and C source is presented at the end, so you may want to skip some
or all of it since it has less to do with Lisp than earlier parts of
this thread]

I got curious when you posted your above paragraph, because indeed I
agree that mmap has some overheads that read doesn't have (though my
instincts were telling me that read's overhead was higher).  So I
threw together a quick experiment to tell me what kinds of overheads
we were dealing with.  Granted, both the suitable-segment approach and
the mmap approach are going to be far more efficient than the
whole-file-read approach, and the timing differences between the two
more efficient approaches are going to be hardly measurable.  But I
wanted to see if I would be surprised with the results, and it turns
out I was not surprised.  The three programs I created are
 - readtry, which just reads the given file into an array in memory,
 - partialtry, which implements the suitable-segment approach by
   reading one int from the same file, modifying that word, and
   writing it back out, and
 - maptry, which mmaps the file, modifies one word, and does a
   synchronous msync to ensure that it got all the way out to the
   file.

I tried this result on a concatenation of several copies of a
vmlinux (just to cons up a large file for the experiment).
It was done on a linux with a 2.2.19 kernel.

Resultant transcript:

% ls -l bar
-rw-rw-r--   1 duane    fi        4743784 Dec 10 12:44 bar
% time readtry bar
User   clock ticks elapsed: 0
System clock ticks elapsed: 2
User   clock ticks elapsed (child): 0
System clock ticks elapsed (child): 0
0.000u 0.020s 0:05.26 0.3%	0+0k 0+0io 85pf+0w
% time partialtry bar
User   clock ticks elapsed: 0
System clock ticks elapsed: 0
User   clock ticks elapsed (child): 0
System clock ticks elapsed (child): 0
0.000u 0.000s 0:00.03 0.0%	0+0k 0+0io 84pf+0w
% time maptry bar
User   clock ticks elapsed: 0
System clock ticks elapsed: 0
User   clock ticks elapsed (child): 0
System clock ticks elapsed (child): 0
0.000u 0.000s 0:00.00 0.0%	0+0k 0+0io 86pf+0w
% 

Note that the last two are both much more efficient than
the first one, and both have mostly zero times.  However,
despite having two more pagefaults than partialtry, maptry
is indeed virtually at zero for all of the timings, including
total program run time.  [One of the extra pagefaults can be
attributed to the fstat call in both readtry and maptry, which
when partialtry started out with the unecessary call it added
one pagefault to it's result, so I took it out to be fair to
partialtry]

Experiments with an almost-50Mb file gave similar results.

These results may not convince you, since the timings are so
close to zero, but they do convince me that my original gut
instinct was not wrong.

Feel free to play with the C sources below my signature, and
to convince yourself as well, or to demonstrate to me any of
the methodologies in my experiment which are wrong.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)


Sources are given below for the three approaches.
Compile each with "cc -o <file> <file>.c"

readtry.c:

=====

#include <stdio.h>
#include <sys/types.h>
#include <sys/times.h>
#include <sys/stat.h>
#include <sys/fcntl.h>

int
main(int argc, char *argv[])
{
    struct tms time1, time2;
    struct stat stat;
    int fd, size;
    caddr_t addr;

    if (argc<2){
	printf("Need a file name\n");
	exit(1);
    }
    fd = open(argv[1], O_RDWR);
    if(fstat(fd, &stat)) {
	printf("Can't stat %s\n", argv[1]);
	exit(1);
    }
    size = stat.st_size;
    if((addr=(caddr_t)malloc(size))==NULL){
	printf("malloc failed\n");
	exit(1);
    }

/* start the timing: */
    times(&time1);

    read(fd, addr, size);

/* End the timing: */
    times(&time2);

    printf("User   clock ticks elapsed: %d\n", time2.tms_utime - time1.tms_utime);
    printf("System clock ticks elapsed: %d\n", time2.tms_stime - time1.tms_stime);
    printf("User   clock ticks elapsed (child): %d\n", time2.tms_cutime - time1.tms_cutime);
    printf("System clock ticks elapsed (child): %d\n", time2.tms_cstime - time1.tms_cstime);

}
=====

partialtry.c:

=====

#include <stdio.h>
#include <sys/types.h>
#include <sys/times.h>
#include <sys/stat.h>
#include <sys/fcntl.h>

int
main(int argc, char *argv[])
{
    struct tms time1, time2;
    struct stat stat;
    int fd;
    caddr_t addr;

    if (argc<2){
	printf("Need a file name\n");
	exit(1);
    }
    fd = open(argv[1], O_RDWR);
    if((addr=(caddr_t)malloc(sizeof(int)))==NULL){
	printf("malloc failed\n");
	exit(1);
    }
    
/* start the timing: */
    times(&time1);

    lseek(fd, 10000, SEEK_SET);
    read(fd, addr, sizeof(int));
    *(int*)addr = 0;
    lseek(fd, 10000, SEEK_SET);
    write(fd, addr, sizeof(int));

/* End the timing: */
    times(&time2);

    printf("User   clock ticks elapsed: %d\n", time2.tms_utime - time1.tms_utime);
    printf("System clock ticks elapsed: %d\n", time2.tms_stime - time1.tms_stime);
    printf("User   clock ticks elapsed (child): %d\n", time2.tms_cutime - time1.tms_cutime);
    printf("System clock ticks elapsed (child): %d\n", time2.tms_cstime - time1.tms_cstime);

}
=====

maptry.c:

=====

#include <stdio.h>
#include <sys/types.h>
#include <sys/times.h>
#include <sys/stat.h>
#include <sys/fcntl.h>
#include <sys/mman.h>

int
main(int argc, char *argv[])
{
    struct tms time1, time2;
    struct stat stat;
    int fd, size;
    caddr_t addr;

    if (argc<2){
	printf("Need a file name\n");
	exit(1);
    }
    fd = open(argv[1], O_RDWR);
    if(fstat(fd, &stat)) {
	printf("Can't stat %s\n", argv[1]);
	exit(1);
    }
    size = stat.st_size;
    
/* start the timing: */
    times(&time1);

    if((addr=(caddr_t)mmap(0, size, PROT_READ|PROT_WRITE,
			   MAP_PRIVATE, fd, 0)) == (caddr_t)-1) {
	printf("mmap failed\n");
	exit(1);
    }

    *(int*)(addr+10000) = 0;

    msync((void *)addr, size, MS_SYNC);

/* End the timing: */
    times(&time2);

    printf("User   clock ticks elapsed: %d\n", time2.tms_utime - time1.tms_utime);
    printf("System clock ticks elapsed: %d\n", time2.tms_stime - time1.tms_stime);
    printf("User   clock ticks elapsed (child): %d\n", time2.tms_cutime - time1.tms_cutime);
    printf("System clock ticks elapsed (child): %d\n", time2.tms_cstime - time1.tms_cstime);

}
=====
From: Lieven Marchand
Subject: Re: Big files in memory
Date: 
Message-ID: <m3r8q1hdfk.fsf@localhost.localdomain>
Duane Rettig <·····@franz.com> writes:

> Lieven Marchand <···@wyrd.be> writes:
> 
> > The relative
> > efficiency of combinations of fopen+fseek+madvise versus mmap is
> > fairly complex and highly system and implementation dependent.
>
> Did you actually try this?

I've done such measurements in a number of real world situations on
different flavours of Unix and I would stand by the above rather than
'mmap is always going to be faster than read'. mmap has another
advantage that it is in many cases a more elegant way to program, if
you don't have to make the file longer or shorter, but when you're
talking about speed, measure.
 
> I tried this result on a concatenation of several copies of a
> vmlinux (just to cons up a large file for the experiment).
> It was done on a linux with a 2.2.19 kernel.
> 
> Experiments with an almost-50Mb file gave similar results.

On a machine with how much RAM? I don't run the 2.2 series any more
but ISTR that early 2.2 kernels had a markedly different behaviour in
read ahead for mmap versus read if the mapped file was a fair bit
larger than available RAM, say 4 or 5 times larger. It's possible they
changed that in later 2.2 kernels though.

> These results may not convince you, since the timings are so
> close to zero, but they do convince me that my original gut
> instinct was not wrong.

I'm convinced for that application and version of the OS. Note that
e.g. the apache tuning guide describes the variable MMAP_THRESHOLD and
states that on some systems, mmap-ing small files is slower than
reading them.

dejagoogle has a lot to say about this but these are some of the more
relevant ones:

Linus' own view on it:

http://groups.google.com/groups?hl=en&threadm=fa.igkk18v.d2s78o%40ifi.uio.no&rnum=4&prev=/groups%3Fq%3Dmmap%2Bread%2Bperformance%26hl%3Den

This was an exciting time with the 2.4.x VM madness for low values of
x where mmap suffered big time:

http://groups.google.com/groups?hl=en&threadm=linux.kernel.3A62C5F0.80C0E8B5%40sw.com.sg&rnum=12&prev=/groups%3Fq%3Dmmap%2Bread%2Bperformance%26hl%3Den%26start%3D10%26sa%3DN

and this is an application/OS combo where it makes no difference:

http://groups.google.com/groups?hl=en&threadm=87adyor98z.fsf%40muon.robnet.com&rnum=19&prev=/groups%3Fq%3Dmmap%2Bread%2Bperformance%26hl%3Den%26start%3D10%26sa%3DN

> Feel free to play with the C sources below my signature, and
> to convince yourself as well, or to demonstrate to me any of
> the methodologies in my experiment which are wrong.

I don't think it will make much difference, but you should close() and
munmap() where appropriate. Especially write() may return without
having written anything to disc. Since you do msync, munmap is less
important but can take some time, especially since the implementation
of vm mappings in the 2.3 kernel has oscillated between AVL-trees with
higher initial overhead but better scalability and lists which are
faster for small number of mappings but get n^2 behaviour. The
difference on something like the Electric Fence malloc debugger which
did a mapping for every malloced piece of memory was significant.

Perhaps we should move this to comp.unix.programmer or something like
that ;-)

-- 
Lieven Marchand <···@wyrd.be>
She says, "Honey, you're a Bastard of great proportion."
He says, "Darling, I plead guilty to that sin."
Cowboy Junkies -- A few simple words
From: Pierre R. Mai
Subject: Re: Big files in memory
Date: 
Message-ID: <87ofl5sdq6.fsf@orion.bln.pmsf.de>
Duane Rettig <·····@franz.com> writes:

> > You leave out Pierre's caveat about suitable segments.
> 
> Well, yes, I did - I took at face value the original poster's
> desire to have _random_ access, and Pierre's suitable-segment
> caveat removes this capability.  And although the suitable-segment

It does not!  Random access does not mean that access addresses have
to be generated with a scintillation counter to guarantee
non-predictable access patterns.

> style does greatly decrease the time used over the whole-file-read
> approach, there is a much larger cognitive overhead of trying to
> predict what blocks of data are needed for any given run of the
> application.  And although one might be able to _approach_ the
> mmap style in speed by accurate prediction, why bother, when mmap
> is faster (see below) and also so much easier because there is
> no prediction involved?

If you can really predict access patterns (at least partially) you
will be _faster_ than mmap, which can't do so, unless your kernel or
stdlib have needlessly inefficient implementations of read, but a fast
implementation of mmap.  Historically that has been the case for a
number of Unices, where read copied the read data lots of times,
before it ended up in its destination.  But modern implementations of
read don't suffer from such problems.

In the near-worst case, i.e. random unpredictable accesses, you will
approach the speed of mmap, crossing it for certain degenerate cases,
where mmap's ability to use the hardware read/write-barriers for
access detection can give it a certain edge over mechanisms that can't
do so.

I have never questioned the ease of use of mmap.  I'm questioning the
mystical belief in the superior performance of mmap over read/write,
regardless of circumstances.

> I got curious when you posted your above paragraph, because indeed I
> agree that mmap has some overheads that read doesn't have (though my
> instincts were telling me that read's overhead was higher).  So I

The overhead is one thing.  The other is that the application has in
many _real_ cases an obvious advantage in knowledge of access
patterns, since completely unpredictable, random accesses are just not
the norm.  The application can use that knowledge to anticipate needed
data, which mmap can't.  The important factor in the performance of
random access processing of files is _LATENCY_!!!!  Being able to
prefetch data in the background, _before_ it is needed, and in as
large as possible chunks, is a _huge_ advantage.  mmap can usually not
do this intelligently, because it cannot reliably predict access
patterns from the little information it receives through the page
faults.  Hence it usually falls back to some common heuristics, which
unless they perfectly match your application will not help you much.

An application can often predict future accesses much better, because
it has semantic knowledge of the data being processed and knowledge of
the algorithm processing it.  It can therefore prefetch data, and/or
generate larger continuous requests, etc.

> threw together a quick experiment to tell me what kinds of overheads

I fail to see what your quick experiment shows, except that trivial
workloads take zero time on today's hardware.

When I worry about the performance of random file access, I'm not
talking about files that trivially fit into the buffer/page cache,
with one or two word-size accesses.  Any realistic approach will not
take noticeably more than a second here, so why worry about performance
at all?

I'm talking about file I/O that actually continually touches the
harddisk, where disk latency will dominate, and where the application
does something more interesting in the process.  That is the sort of
problem where worrying about random access performance actually
matters, and that's where you will find that mmap often is not the
superior solution.

If performance isn't such a pressing concern, I'm all for using
whatever is simplest to use, and mmap can be that for many algorithms
with random-access, in languages where it is integrated with normal
in-memory data structures.  OTOH reading and writing the file in one
go, and working from memory in between, can be similarly fast; if your
file is smaller than real memory and you make a significant number of
accesses, it will usually approach mmap performance, if not surpass
it.  But again, performance isn't really a pressing concern here.

The reason I regularly advise people not to focus solely on mmap is
that there is this common misunderstanding that mmap is per-se faster
than explicit read/write-style access, and that is IMHO not a
well-founded belief.

What is true is that many people write hideously suboptimal read/write
code, because writing good code in the face of non-trivial algorithms
is hard.  They then switch over to mmap, which of course performs
better than their code.  They then conclude that mmap is per-se faster
than read/write.  That is IMHO wrong.

I'd say the situation of mmap vs. (aio_)read/write is _in principle_
similar to GC vs. manual memory management.  But I/O patterns are
often less intricate than memory management patterns, and disk access
time is often a bigger part of total execution time than memory
management, with
much worse worst case scenarios.  So there are more cases where doing
"manual" management of I/O wins against "automatic" I/O management.

> These results may not convince you, since the timings are so
> close to zero, but they do convince me that my original gut
> instinct was not wrong.

You are right, the test cases didn't convince me at all.  There is
also the problem that mmap with MAP_PRIVATE will not modify the
on-disk file at all, so the maptry version is only a page read, more
or less.

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Lieven Marchand
Subject: Re: Big files in memory
Date: 
Message-ID: <m3k7vtts0l.fsf@localhost.localdomain>
"Pierre R. Mai" <····@pmsf.de> writes:

> The overhead is one thing.  The other is that the application has in
> many _real_ cases an obvious advantage in knowledge of access
> patterns, since completely unpredictable, random accesses are just not
> the norm.  The application can use that knowledge to anticipate needed
> data, which mmap can't.  

To be fair to mmap, there is madvise() which can be used for some of
the standard access patterns.

-- 
Lieven Marchand <···@wyrd.be>
She says, "Honey, you're a Bastard of great proportion."
He says, "Darling, I plead guilty to that sin."
Cowboy Junkies -- A few simple words
From: Duane Rettig
Subject: My conclusions (Re: Big files in memory)
Date: 
Message-ID: <47krp3ect.fsf_-_@beta.franz.com>
Duane Rettig <·····@franz.com> writes:

I wanted to post some conclusions on this thread, because there are
a couple of open issues which I'd like to revisit, including some
statements I myself made which I want to clarify.

- First, the original poster wants random access into files.  He can
have that.  He wants to know the best way in LWW or A[llegro ]CL.
The answer: use the standard Common Lisp I/O functions.  In a
different branch of this thread, I stated that he probably didn't
want to read the whole file into memory.  I think that for this
thread branch, Pierre Mai and Lieven Marchand would agree with
me on that point.  Any of the CL read/write functions, plus
file-position, should be usable and efficient if implemented
well, and should always be much more efficient than reading the
whole (big) file into memory and accessing the memory as an array.
This is indeed verified by my benchmark, whether you open the
file with

(open "foo" :direction :io :if-exists :overwrite)

or with

(open "foo" :direction :io :mapped t :if-exists :overwrite)

using Allegro CL's mapping extension for simple-streams.
With this small difference in the above open calls, and bearing
in mind file-growth caveats for mapped files, all other programming
remains exactly the same, whichever way you open the files.
I think it's a Good Thing if one can program in such an
implementation-independent way.


- I also recognize the hacky nature of my benchmark, and Pierre's
correction to use MAP_SHARED instead of MAP_PRIVATE.  This
brought the maptry and partialtry benchmarks to the point where
the difference was not measurable (though they were so close to
zero anyway that the only real conclusion that could be drawn
was that both were obviously much faster than readtry,
the version which read the whole file into memory).


- Where we got off was when I reacted to Pierre's assertion that
mmap was inefficient.  I realize now that he was reacting to the
(incorrect) assumption that many people make that mmap is much
faster than read/write/seek operations.  I regret a statement I
made saying that mmap actually did appear faster (but that was
based on a bad benchmark) and although I had been surprised at
such a result (I had expected mmap and read/write/seek to be
equivalent), I should not have drawn the erroneous conclusion
without checking it out more thoroughly.

Having said that, I now stand by my original instincts which say
that mmap and read/write/seek programming are essentially
equivalent in efficiency, in contrast to reading in a whole file
and then working from the memory.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Kent M Pitman
Subject: Re: My conclusions (Re: Big files in memory)
Date: 
Message-ID: <sfwadwlbqml.fsf@shell01.TheWorld.com>
Duane Rettig <·····@franz.com> writes:

> ... This is indeed verified by my benchmark, whether you open the
> file with
> 
> (open "foo" :direction :io :if-exists :overwrite)
> 
> or with
> 
> (open "foo" :direction :io :mapped t :if-exists :overwrite)
> 
> using Allegro CL's mapping extension for simple-streams.

Or with 

(open "foo" :direction :io :mapped t :if-exists :overwrite :allow-other-keys t)

so you can use the same keywords for everyone.
From: Martin Cracauer
Subject: Re: My conclusions (Re: Big files in memory)
Date: 
Message-ID: <9vdv3c$2hev$1@counter.bik-gmbh.de>
Duane Rettig <·····@franz.com> writes:

>Having said that, I now stand by my original instincts which say
>that mmap and read/write/seek programming are essentially
>equivalent in efficiency, in contrast to reading in a whole file
>and then working from the memory.

Look up the comments by Linux kernel developers (among them Linus) and
FreeBSD kernel developers (among them John Dyson and Matt Dillon) on
what they think of the increasing trend to use mmap instead of
read/write/seek.

Essentially, they post big warnings that the mmap solution will
usually only run faster as long as it has the machine to itself, which
unfortunately is the typical benchmark environment.

However, excessive mmap()ing adds a lot of context-switching overhead,
and a page fault is a very expensive operation.  The read/write/seek
solution has big overhead as well, mostly because of the number of
system calls, which will usually make the mmap solution look faster at
first glance.  However, on a loaded production machine the
read/write/seek solution will basically behave the same as on an
unloaded one, while many mmap()ing processes will put a more than
proportional load on the system.  Also, modern multi-CPU kernels can
service read/write/seek I/O in parallel, while there is no such way
for a memory page fault: the faulting process is blocked, and the
other CPUs may sit idle.

Another disadvantage of mmap() is that the behaviour of the OS may not
be easy to predict.  In particular, OSes usually do useful readahead
for read/write and can tune it based on application behaviour, while
useful readahead for mmap()'ed regions is not obvious, and it is very
hard to gather the data needed to make an educated guess and tune it
at runtime.

As a personal comment, I have seen choices between mmap and
read/write.  Note the absence of seek().  These guys were too lazy or
too stupid to actually seek in the file, so their choice was between
mmap()ing all of it or just looping over all of it.  Of course, the
data is not preprocessed for locality.  They get what they deserve:
fast apps in benchmarks, and production machines grinding to a halt.

Martin
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <········@bik-gmbh.de> http://www.bik-gmbh.de/~cracauer/
FreeBSD - where you want to go. Today. http://www.freebsd.org/
From: ········@acm.org
Subject: Re: My conclusions (Re: Big files in memory)
Date: 
Message-ID: <E6wS7.42603$%26.4322756@news20.bellglobal.com>
········@counter.bik-gmbh.de (Martin Cracauer) writes:
> As a personal comment, I have seen choices between mmap and
> read/write.  Note the absence of seek().  These guys were too lazy
> or too stupid to actually seek in the file, so their choice was
> between mmap()ing all of it or just looping over all of it.  Of
> course, the data is not preprocessed for locality.  They get what
> they deserve: fast apps in benchmarks, and production machines
> grinding to a halt.

Strange.  That sounds remarkably equivalent to the assertion that
"Lisp is a language that is just used for processing lists," which has
a tendency to be true when used in teaching courses.

"Yup; we've got this problem, you see...  We've got this Lisp program
that's Really Slow.  We've got this list that all the data gets put
into, and as it gets bigger, it takes longer and longer to run the
program..."
-- 
(reverse (concatenate 'string ····················@" "454aa"))
http://www.ntlug.org/~cbbrowne/linux.html
"Funny, the  only thing that makes me  go Keanu about Microsoft is the
fact  that they are constantly  behind the times  and yet  claim to be
innovating." -- Steve Lamb <········@despair.rpglink.com>
From: Kent M Pitman
Subject: Re: Big files in memory
Date: 
Message-ID: <sfwd71pw0r4.fsf@shell01.TheWorld.com>
Kent M Pitman <······@world.std.com> writes:

> ··········@mailandnews.com (Software Scavenger) writes:
> 
> > What is the easiest way, using LWW or ACL, to read a 20-megabyte file
> > into memory and establish random access to it as if it were an array?
> 
> As a rule, CL is not a language in which it is a good idea to
> overconstrain the answer to your question.  Is your desire to achieve
> a particular solution to the question of random access, excluding
> other possible solutions, or do you just assume that reading a file
> into memory is the only way to do this?  If you could just open the
> file in random access mode and move around in it and access various
> parts of it, would that be sufficient?  CL is a language about "what"
> you want to do, not "how" to do it.  (Things in Lisp give you
> functionality, but little info on how that functionality occurs.  C,
> by contrast, is a language that is very about "how" but not really
> about "what".  C programs just do as they're told with little regard
> as to what you're trying to achieve.)

For example, here's a stupid made-up example of using FILE-POSITION on
a stream open in :IO mode.  Whether that's implemented as mapping the
file into memory or not is, for many applications, an implementational
artifact.  Surely there are times that it matters that you have a
particular memory management model and exacting implementation, but
for a lot of situations, it doesn't.  In general, you shouldn't begin
by overconstraining yourself unless you have some reason to believe
you need to.

(with-open-file (stream "sample-data.binary"
                        :direction :io
                        :element-type '(unsigned-byte 16))
 (let ((bound 250) (sample-size 15))
    ;; Allocate some file space
    (dotimes (i (* 2 bound))
      (write-byte 0 stream)
      (write-byte 0 stream))
    ;; Go back and fill it in backwards, not because this is the
    ;; best way, but "just because we can"
    (loop for i from (1- bound) downto 0
          do (file-position stream (* 2 i))
             (write-byte (isqrt i) stream)
             (write-byte (* i i) stream))
    ;; Check that we really did fill stuff in as we expected, and
    ;; that we can get it back in arbitrary order.
    (dotimes (i sample-size)
      (let ((n1 (random bound)))
        (file-position stream (* 2 n1))
        (format t "~&isqrt(~3,' D) = ~2,' D  square(~3,' D) = ~6,' D~%" 
                n1 (read-byte stream)
                n1 (read-byte stream))))))
isqrt( 47) =  6  square( 47) =   2209
isqrt(103) = 10  square(103) =  10609
isqrt(130) = 11  square(130) =  16900
isqrt(110) = 10  square(110) =  12100
isqrt(193) = 13  square(193) =  37249
isqrt( 66) =  8  square( 66) =   4356
isqrt(204) = 14  square(204) =  41616
isqrt(113) = 10  square(113) =  12769
isqrt( 29) =  5  square( 29) =    841
isqrt(140) = 11  square(140) =  19600
isqrt(104) = 10  square(104) =  10816
isqrt(242) = 15  square(242) =  58564
isqrt( 93) =  9  square( 93) =   8649
isqrt(  8) =  2  square(  8) =     64
isqrt(232) = 15  square(232) =  53824
=> NIL

One advantage of the above strategy is that it will port to systems
that can't buffer 20MB in memory, and may even run on systems that
don't have 20MB of RAM, since those systems are permitted to use
buffering instead.
From: Gareth McCaughan
Subject: Re: Big files in memory
Date: 
Message-ID: <slrna15fn3.18vf.Gareth.McCaughan@g.local>
"Software Scavenger" wrote:

> What is the easiest way, using LWW or ACL, to read a 20-megabyte file
> into memory and establish random access to it as if it were an array?

READ-SEQUENCE.

-- 
Gareth McCaughan  ················@pobox.com
.sig under construc
From: Duane Rettig
Subject: Re: Big files in memory
Date: 
Message-ID: <4n10rjytk.fsf@beta.franz.com>
··········@mailandnews.com (Software Scavenger) writes:

> What is the easiest way, using LWW or ACL, to read a 20-megabyte file
> into memory and establish random access to it as if it were an array?

You probably don't want to read the file into memory (as you've also
indicated in another article) but to access the file as directly as
the operating system will make possible.  The best way to do this is
to open the file as a memory-mapped file.

In the simple-streams model implemented by Allegro CL with versions 6.0
or 6.1, you can open the file, e.g.:

  (open "foo" :direction :io :mapped t :if-exists :overwrite)

On a file which has read/write permissions, this gives you a stream
on which you can do random access via file-position, and reading and
writing via the other CL functions; these operations are performed
directly on the file (through the memory mapping) without ever
actually transferring the whole file.
 
There are some caveats with mapped files.  Most operating systems do
mapping differently, and different techniques in synchronizing the
memory pages with the mapped files might cause temporary mismatch
between what you have written and what appears in the file to other
processes (but you can use force-output or finish-output to do the
synchronization if you're unsure).  Also, most operating systems do
not allow the automatic extension of mapped files, so if you write
off the end of the file you're likely to get a write error (or an eof
on a read).  The default simple-streams design takes that into account
and doesn't do any file extension, but it might be possible, given
any particular operating system which allows it, to subclass
mapped-file-simple-stream and to write your own automatic file
extender.  Such a subclass would, however, be non-portable across
operating systems.

Speaking of portability, Paul Foley told me that allowing mapped files
is one of the major reasons why he started porting simple-streams for
CMUCL.  It looks like it was the most fully developed part of that
implementation (as of November 1, when I grabbed a snapshot of it).

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)