From: Tim Bradshaw
Subject: Re: Testing for the existence of a file
Date: 
Message-ID: <cgvg7e$9on@odah37.prod.google.com>
David Steuber wrote:
> I want to test for the existence of a file named either foo.data or
> foo.data.gz.  To do this, I created a wild pathname.  PROBE-FILE
> doesn't like wild pathnames so I tried DIRECTORY instead.  DIRECTORY
> worked for foo.data but failed for foo.data.gz.
> [...]
> Is there a simple way to use a wild pathname to find both foo.data and
> foo.data.gz?  Or do I have to construct a pathname for each?

The unwashed multitudes are already descending on you to explain how
the inability to do this portably shows how completely rubbish the CL
pathname system is.  I'm sure they're right.

However, doing things this way is bad for a much more practical reason:
speed.  Doing something like DIRECTORY on a typical modern system
probably involves enumerating the names in the directory and then
matching them against the wildcard.  That sounds easy but imagine if
the directory is (a) large, (b) being accessed over a network, and (c)
being accessed from a platform some implementations of which have
rather deficient performance for large directories.  In other words
Linux and NFS.

An approach which is more likely to work well in practice is to
construct the names you want, and then do a PROBE-FILE for each of
them.  Using NFS for instance, this can be done by a single NFS_LOOKUP
call for each file rather than a sequence of NFS_READDIR calls, which
is likely to be much more efficient in the presence of large
directories. `Much more efficient' can mean fractions of a second
rather than, in bad cases, minutes (yes, really!).
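
A minimal sketch of the probe-each-name approach, assuming the
candidate names are already known.  FIRST-EXISTING-FILE is a
hypothetical helper name, not a standard function:

```lisp
;; Sketch only: FIRST-EXISTING-FILE is a hypothetical helper.
(defun first-existing-file (candidates)
  "Return the truename of the first entry in CANDIDATES that exists, or NIL.
CANDIDATES is a list of namestrings or pathnames, none of them wild."
  ;; SOME stops at the first non-NIL result, so at most one PROBE-FILE
  ;; is done per candidate and the directory is never enumerated.
  (some #'probe-file candidates))

;; Looks for foo.data first, then foo.data.gz, relative to
;; *DEFAULT-PATHNAME-DEFAULTS*.
(first-existing-file '("foo.data" "foo.data.gz"))
```

PROBE-FILE signals an error when given a wild pathname, so each
candidate has to be a complete, non-wild name.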

Of course the UM will point out that you can't portably construct
these `multiple extension' file names, because the CL pathname system
is completely rubbish.  Well, you can use strings and/or
PARSE-NAMESTRING, which works quite well, I find.
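
Something along these lines is one way to build the names as strings;
CANDIDATE-NAMES is a hypothetical helper, and how a name like
"foo.data.gz" is split into name and type after PARSE-NAMESTRING is up
to the implementation:

```lisp
;; Sketch only: CANDIDATE-NAMES is a hypothetical helper.
(defun candidate-names (base suffixes)
  "Return a list of namestrings made by appending each of SUFFIXES to BASE."
  (mapcar (lambda (suffix) (concatenate 'string base suffix))
          suffixes))

(candidate-names "foo.data" '("" ".gz"))
;; => ("foo.data" "foo.data.gz")

;; Parse each string into a pathname and probe it.  Only the primary
;; value of PARSE-NAMESTRING (the pathname) is used here.
(some #'probe-file
      (mapcar #'parse-namestring
              (candidate-names "foo.data" '("" ".gz"))))
```

Working with whole strings avoids having to decide which part of
"foo.data.gz" is the pathname type.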

--tim
From: David Steuber
Subject: Re: Testing for the existence of a file
Date: 
Message-ID: <4ddd570c.0408301633.76803e01@posting.google.com>
"Tim Bradshaw" <··········@tfeb.org> wrote in message news:<··········@odah37.prod.google.com>...

> The unwashed multitudes are already descending on you to explain how
> the inability to do this portably shows how completely rubbish the CL
> pathname system is.  I'm sure they're right.

I don't think I'll argue the point.  Although I did find the functions
ensure-directories-exist, user-homedir-pathname, and merge-pathnames
useful.  Granted, I've just tried them on OpenMCL so far and haven't
seen if they work on SBCL running on Linux.  Here's hoping.
 
> However, doing things this way is bad for a much more practical reason:
> speed.  Doing something like DIRECTORY on a typical modern system
> probably involves enumerating the names in the directory and then
> matching them against the wildcard.  That sounds easy but imagine if
> the directory is (a) large, (b) being accessed over a network, and (c)
> being accessed from a platform some implementations of which have
> rather deficient performance for large directories.  In other words
> Linux and NFS.
> 
> An approach which is more likely to work well in practice is to
> construct the names you want, and then do a PROBE-FILE for each of
> them.  Using NFS for instance, this can be done by a single NFS_LOOKUP
> call for each file rather than a sequence of NFS_READDIR calls, which
> is likely to be much more efficient in the presence of large
> directories. `Much more efficient' can mean fractions of a second
> rather than, in bad cases, minutes (yes, really!).

Hmm.  I didn't think of the cost of calling DIRECTORY.  For what I'm
doing, that could be a serious bottleneck as there will be a few
thousand files when I'm done.  EXT3 is dog slow when there are lots of
files in a directory, no question.

Thanks for the replies.

And thank you, Verio, for conking out on me while I was reading news
via NNTP.