Behavior of #'directory

From: Randall Randall
Subject: Behavior of #'directory
Date: Sat, 22 May 2004 05:39:14 +0000
Message-ID: <43619482.0405212139.77fcf821@posting.google.com>

In the CLHS, the function DIRECTORY is said to provide a list of files 
which match a pathspec.  One would expect that a function to list 
a directory would return all entries (or at least all files) in a directory.  
CMUCL and ACL both do this.  However, the CLHS goes on to say "If 
the pathspec is not wild, the resulting list will contain either zero or 
one elements", which would seem to indicate that a string specified 
path would need to end in a star to get an actual listing of the directory, 
if that were supported at all.  If no star, one should either get the 
directory itself in a list of one, or NIL.  As mentioned above, CMUCL 
and ACL do not do this, but do the (actually more useful) thing of 
returning a list of files in the specified directory, rather than a list of 
files which match the pathspec.

OpenMCL and CLISP return NIL for a directory without a wildcard 
component, *even though the directory exists*.  Shouldn't they 
both return a list of one path, the directory itself?  As mentioned 
above, both ACL and CMUCL return a list which seems to presume 
that directories have implicit wildcards when string specified (I didn't 
attempt to test this with (make-pathname ...), since that's far more 
complex than just using Unix paths as strings).

Is it really the case that all four of these implementations are broken 
in one of two ways for this function, or am I missing something? :)

--
Randall Randall <·······@randallsquared.com>
I'll make a Usenet .sig later.

From: Paul F. Dietz
Subject: Re: Behavior of #'directory
Date: Sat, 22 May 2004 11:03:16 +0000
Message-ID: <W72dnTBmQ9bprjLdRVn-hg@dls.net>

Randall Randall wrote:

> OpenMCL and CLISP return NIL for a directory without a wildcard 
> component, *even though the directory exists*.  Shouldn't they 
> both return a list of one path, the directory itself?

This makes the assumption that a directory is a file, which is
not necessarily true.

	Paul

From: Christophe Rhodes
Subject: Re: Behavior of #'directory
Date: Sat, 22 May 2004 09:55:23 +0000
Message-ID: <sq65aookfo.fsf@cam.ac.uk>

·······@randallsquared.com (Randall Randall) writes:

> (I didn't attempt to test this with (make-pathname ...), since
> that's far more complex than just using Unix paths as strings).

(non-logical) namestring parsing is implementation-dependent, so
you're just hiding the complexity and adding to it.

> Is it really the case that all four of these implementations are broken 
> in one of two ways for this function, or am I missing something? :)

The pathname specification is sufficiently ambiguous that more-or-less
any behaviour satisfies the standard.

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)

From: Barry Margolin
Subject: Re: Behavior of #'directory
Date: Sat, 22 May 2004 21:05:34 +0000
Message-ID: <barmar-D2E51A.17053422052004@comcast.dca.giganews.com>

In article <····························@posting.google.com>,
 ·······@randallsquared.com (Randall Randall) wrote:

> OpenMCL and CLISP return NIL for a directory without a wildcard 
> component, *even though the directory exists*.  Shouldn't they 
> both return a list of one path, the directory itself?

No, they should return a list of one path, the file named in the 
pathspec, or a list of zero paths if there is no such file.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

From: Christophe Rhodes
Subject: Re: Behavior of #'directory
Date: Sat, 22 May 2004 22:29:46 +0000
Message-ID: <sqoeog14f9.fsf@cam.ac.uk>

Barry Margolin <······@alum.mit.edu> writes:

> In article <····························@posting.google.com>,
>  ·······@randallsquared.com (Randall Randall) wrote:
>
>> OpenMCL and CLISP return NIL for a directory without a wildcard 
>> component, *even though the directory exists*.  Shouldn't they 
>> both return a list of one path, the directory itself?
>
> No, they should return a list of one path, the file named in the 
> pathspec, or a list of zero paths if there is no such file.

I'd just like to say that this is only one of several possible
interpretations of DIRECTORY's specified behaviour, given the
requirements in WILD-PATHNAME-P and PATHNAME-MATCH-P -- it happens to
be the interpretation I feel most natural, but there's certainly
nothing in particular that is overwhelmingly compelling about it.

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)

From: Randall Randall
Subject: Re: Behavior of #'directory
Date: Sun, 23 May 2004 07:28:42 +0000
Message-ID: <43619482.0405222328.39876841@posting.google.com>

Christophe Rhodes <·····@cam.ac.uk> wrote in message news:<··············@cam.ac.uk>...
> Barry Margolin <······@alum.mit.edu> writes:
> 
> > In article <····························@posting.google.com>,
> >  ·······@randallsquared.com (Randall Randall) wrote:
> >
> >> OpenMCL and CLISP return NIL for a directory without a wildcard 
> >> component, *even though the directory exists*.  Shouldn't they 
> >> both return a list of one path, the directory itself?
> >
> > No, they should return a list of one path, the file named in the 
> > pathspec, or a list of zero paths if there is no such file.
> 
> I'd just like to say that this is only one of several possible
> interpretations of DIRECTORY's specified behaviour, given the
> requirements in WILD-PATHNAME-P and PATHNAME-MATCH-P -- it happens to
> be the interpretation I feel most natural, but there's certainly
> nothing in particular that is overwhelmingly compelling about it.

I dunno, it just doesn't seem natural at all to me, for 
a function called "directory".  If it were called, say, 
"probe-file", that behavior would make sense.  

In any case, it isn't difficult to define functions that 
do what I expect from a set of pathname functions.  I was 
simply surprised that no implementation seemed to conform 
to my first impression of the CLHS.  It's interesting
that CMUCL and ACL do what I would expect such a function 
to do, without doing what the spec seems to say. :)

--
Randall Randall
maybe later, I'll have a .sig

From: Barry Margolin
Subject: Re: Behavior of #'directory
Date: Sun, 23 May 2004 09:39:37 +0000
Message-ID: <barmar-8A604C.05393723052004@comcast.dca.giganews.com>

In article <····························@posting.google.com>,
 ·······@randallsquared.com (Randall Randall) wrote:

> Christophe Rhodes <·····@cam.ac.uk> wrote in message 
> news:<··············@cam.ac.uk>...
> > Barry Margolin <······@alum.mit.edu> writes:
> > 
> > > In article <····························@posting.google.com>,
> > >  ·······@randallsquared.com (Randall Randall) wrote:
> > >
> > >> OpenMCL and CLISP return NIL for a directory without a wildcard 
> > >> component, *even though the directory exists*.  Shouldn't they 
> > >> both return a list of one path, the directory itself?
> > >
> > > No, they should return a list of one path, the file named in the 
> > > pathspec, or a list of zero paths if there is no such file.
> > 
> > I'd just like to say that this is only one of several possible
> > interpretations of DIRECTORY's specified behaviour, given the
> > requirements in WILD-PATHNAME-P and PATHNAME-MATCH-P -- it happens to
> > be the interpretation I feel most natural, but there's certainly
> > nothing in particular that is overwhelmingly compelling about it.
> 
> I dunno, it just doesn't seem natural at all to me, for 
> a function called "directory".  If it were called, say, 
> "probe-file", that behavior would make sense.  

It seems perfectly natural to me.  DIRECTORY lists all the files that 
match the given specification.  A file spec that doesn't contain any 
wildcards is a degenerate case that just matches a single filename 
exactly.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

From: Peter Seibel
Subject: Re: Behavior of #'directory
Date: Sun, 23 May 2004 15:36:45 +0000
Message-ID: <m3isen6tpu.fsf@javamonkey.com>

Christophe Rhodes <·····@cam.ac.uk> writes:

> Barry Margolin <······@alum.mit.edu> writes:
>
>> In article <····························@posting.google.com>,
>>  ·······@randallsquared.com (Randall Randall) wrote:
>>
>>> OpenMCL and CLISP return NIL for a directory without a wildcard 
>>> component, *even though the directory exists*.  Shouldn't they 
>>> both return a list of one path, the directory itself?
>>
>> No, they should return a list of one path, the file named in the 
>> pathspec, or a list of zero paths if there is no such file.
>
> I'd just like to say that this is only one of several possible
> interpretations of DIRECTORY's specified behaviour, given the
> requirements in WILD-PATHNAME-P and PATHNAME-MATCH-P -- it happens to
> be the interpretation I feel most natural, but there's certainly
> nothing in particular that is overwhelmingly compelling about it.

FWIW, when I was writing my chapter on pathnames I sat down and did a
fair bit of analysis of how different implementations implement the
various pathname-related functions. I'd second Christophe's sentiment
that there are several self-consistent interpretations of the standard
even when applied to the same filesystem (Unix). Below are some notes
I wrote for myself about the choices that implementors have to make,
each of which leads them off into a particular area of the space of
possible implementations. They may or may not be interesting to folks
trying to make sense of things. (Note, I use namestrings with *'s in
them as a shorthand for the pathname with :wild in the appropriate
component. I understand that the parsing of namestrings is also
implementation dependent.)

* NIL's to :WILD in DIRECTORY's argument

The first choice implementors have to make is whether NIL components
in the pathname passed to DIRECTORY should be converted to :wild.

There are two reasons to do this conversion:

  - The standard says that the implementation of PATHNAME-MATCH-P,
    while implementation dependent "should be consistent with
    DIRECTORY". It also explicitly states that PATHNAME-MATCH-P
    converts "missing components" (presumably meaning NIL components)
    to :WILD. Many people take these two requirements together to mean
    that DIRECTORY also upgrades missing components to :WILD.

  - Unix users, used to ls, may well expect (directory "/tmp/foo/") to
    return a list of files in the directory /tmp/foo, not a list of
    the one file, /tmp/foo/ itself, that matches this pathname.

On the other hand there are some reasons to not do this conversion:

  - By not upgrading NILs to :WILD, it is possible to use DIRECTORY to
    get a list of just directories. (directory "/tmp/foo/") will
    return either a list of one item (if the file exists and is a
    directory) or NIL and (directory "/tmp/foo/*/") will return a list
    of immediate subdirectories of /tmp/foo/. 
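
As a rough illustration of the conversion being discussed, here is a
minimal sketch of a helper that upgrades NIL name and type components
to :WILD before a pathname is handed to DIRECTORY. The function name
UPGRADE-NIL-TO-WILD is hypothetical; it is not part of the standard or
of any implementation mentioned here, and it leaves the version
component alone for simplicity.

```lisp
;; Hypothetical helper, for illustration only.
(defun upgrade-nil-to-wild (pathname)
  "Return a copy of PATHNAME whose NIL name and type are replaced by :WILD."
  (let ((p (pathname pathname)))
    (make-pathname :name (or (pathname-name p) :wild)
                   :type (or (pathname-type p) :wild)
                   :defaults p)))

;; Build the argument from components rather than a namestring, since
;; namestring parsing is implementation-dependent.
(defparameter *foo-directory*
  (make-pathname :directory '(:absolute "tmp" "foo")))

;; An implementation that does the conversion behaves roughly as if the
;; caller had written:
;;   (directory (upgrade-nil-to-wild *foo-directory*))
```

An implementation that does not do the conversion treats the same
argument as naming exactly one thing, the directory itself.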

A related issue:

  - 19.2.3 states: "Except as explicitly specified otherwise, for
    functions that manipulate or inquire about files in the file
    system, the pathname argument to such a function is merged
    with *default-pathname-defaults* before accessing the file system
    (as if by merge-pathnames)." Since DIRECTORY is clearly a function
    that "inquire[s] about files in the file system" it almost
    certainly should be following this rule. Depending on whether or
    not there are NILs in *default-pathname-defaults*, this merge may
    make the issue of converting NILs moot.
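
To make the merging point concrete, the sketch below shows how the
name and type seen by the file system depend on the defaults. The
helper NAME-AND-TYPE-AFTER-MERGE is a hypothetical name used only for
this illustration; it passes the defaults explicitly instead of
rebinding *default-pathname-defaults*.

```lisp
;; Hypothetical helper, for illustration only.
(defun name-and-type-after-merge (spec defaults)
  "Return the name and type of SPEC after merging it with DEFAULTS."
  (let ((merged (merge-pathnames spec defaults)))
    (list (pathname-name merged) (pathname-type merged))))

;; A directory-only pathspec: its name and type are NIL.
(defparameter *spec*
  (make-pathname :directory '(:absolute "tmp" "foo")))

;; Defaults that carry a name and type fill in the NILs, so after the
;; merge there is nothing left to upgrade to :WILD.
(name-and-type-after-merge
 *spec*
 (make-pathname :directory '(:absolute "home") :name "scratch" :type "txt"))

;; Defaults with NIL name and type leave the NILs in place, so the
;; question of upgrading them is still open.
(name-and-type-after-merge
 *spec*
 (make-pathname :directory '(:absolute "home")))
```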

* "Directory Normal Form"

Another choice implementors have to make is what form of pathname to
use to represent the names of directories. In particular this choice
has to be addressed when implementing DIRECTORY. There appear to be
three implementation strategies in practice:

  - Return pathnames representing directories in "directory normal
    form" with all the name elements in the directory component and
    NIL or :UNSPECIFIC in the name and type components. SBCL, CMUCL,
    Lispworks, and OpenMCL do this by default. Allegro does it if
    passed a :directories-are-files argument of nil.

  - Return pathnames representing directories just like any other
    files, with the last name element in the name (and possibly type)
    component and the parent directory names in the directory
    component. Allegro by default does this. OpenMCL does if :??? is
    t.

  - Only return directories when given a wild card with NIL name and
    type components and return them in directory normal form. CLISP is
    the only implementation I know of that uses this strategy.

The advantage of returning directory normal form pathnames for
directories is that it is possible to tell which pathnames returned by
directory are actually names of directories without using an
implementation-specific PROBE-DIRECTORY function. It also makes it
marginally easier to then obtain a listing of those subdirectories as
the name is already in the form (modulo perhaps needing to fill in the
name and type with :wild) required to pass to DIRECTORY.

The only disadvantage that I can see is that it requires DIRECTORY to
stat each file in order to create the pathname. Which seems like a
pretty small marginal cost given that it already has to do i/o to get
the directory listing (opendir and readdir).
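
For readers who want to see the two representations side by side, here
is a minimal sketch of converting a "file form" pathname into directory
normal form. All three function names are hypothetical. The sketch
assumes a non-wild pathname and a Unix-style "name.type" spelling for
the last path element, and it works purely on pathname components
without touching the file system.

```lisp
;; Hypothetical helpers, for illustration only.
(defun component-present-p (value)
  "True when VALUE is neither NIL nor :UNSPECIFIC."
  (and value (not (eql value :unspecific))))

(defun directory-normal-form-p (pathname)
  "True when PATHNAME has no name and no type component."
  (and (not (component-present-p (pathname-name pathname)))
       (not (component-present-p (pathname-type pathname)))))

(defun to-directory-normal-form (pathname)
  "Move the name and type of PATHNAME onto the end of its directory component."
  (let* ((p (pathname pathname))
         (name (pathname-name p))
         (type (pathname-type p)))
    (if (directory-normal-form-p p)
        p
        (make-pathname
         ;; Assumption: the last element is spelled "name.type" when a
         ;; type is present, and just "name" otherwise.
         :directory (append (or (pathname-directory p) (list :relative))
                            (list (if (component-present-p type)
                                      (format nil "~A.~A" name type)
                                      name)))
         :name nil
         :type nil
         :defaults p))))
```

With these definitions, a pathname with directory (:absolute "tmp"
"foo") and name "bar" becomes one with directory (:absolute "tmp" "foo"
"bar") and NIL name and type.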

A related question is what form of pathname PROBE-FILE returns when
given the name of a directory. Most implementations will accept either
#p"/tmp/foo" or #p"/tmp/foo/" as an argument to PROBE-FILE if
/tmp/foo/ is in fact a directory. However they differ in whether the
pathname returned--assuming the directory exists--is simply the same
pathname passed to the function or is converted to directory normal
form. Returning a name in DNF again has the advantage of indicating
not only that the file exists but that it is a directory. Not all
implementations that return DNF names from DIRECTORY return them from
PROBE-FILE. (CLISP is uniquely picky: (probe-file "/tmp/foo") signals
an error because the file in the filesystem is a directory not a file
and (probe-file "/tmp/foo/") signals an error because the pathname has
no name component.)

* PATHNAME-MATCH-P consistent with DIRECTORY

As noted above, the dictionary entry for PATHNAME-MATCH-P states that
the matching rules for PATHNAME-MATCH-P "should be consistent with
DIRECTORY". While "consistent" is not defined, either in this context
or globally, the obvious interpretation is that all the pathnames
returned by DIRECTORY for a given wild pathname should be considered
to match that same wild pathname by PATHNAME-MATCH-P.

While it seems that the choice of whether or not DIRECTORY upgrades
NIL to :WILD would have an effect on this issue, it actually doesn't.
Implementations that don't upgrade NILs in DIRECTORY's argument will
simply return fewer pathnames from DIRECTORY than implementations that
do. But all those pathnames should have either NIL or possibly
:UNSPECIFIC in the components that are NIL in the wild pathname. And
thus those pathnames should match against the same pathname when
passed to PATHNAME-MATCH-P after the NILs are converted to :WILD.

The thing that does break the PATHNAME-MATCH-P/DIRECTORY consistency
is the choice to return the names of directories in directory normal
form. For instance given the wild pathname #p"/tmp/foo/*.*", if
DIRECTORY returns #p"/tmp/foo/bar/" a literal component-by-component
match will return false because the wild pathname has a directory
component (:absolute "tmp" "foo") while the directory component of the
returned pathname is (:absolute "tmp" "foo" "bar").

Assuming that we agree that returning directory pathnames in DNF is
desirable, it might seem there is no way out of breaking
PATHNAME-MATCH-P/DIRECTORY consistency. However I think there is a
solution: since the rules for PATHNAME-MATCH-P are
implementation-dependent there's nothing that says the obvious
component-by-component matching rule is the only legal one. It seems
perfectly reasonable to say that #p"/tmp/foo/bar/" *does* match
#p"/tmp/foo/*.*" on the grounds that #p"/tmp/foo/bar/" is, for some
purposes anyway, equivalent to #p"/tmp/foo/bar". The algorithm for
PATHNAME-MATCH-P could be to check component-by-component first and if
that doesn't succeed and if the name and type components of the
non-wild argument are both NIL or :UNSPECIFIC, try again after moving
the last element of the directory component into the name and type as
appropriate. Note that this is still a purely syntactic rule; it does
not depend on the actual state of the filesystem. It does, however,
take advantage of knowledge of the file system semantics. Which seems
well within the bounds of "implementation dependent" behavior. No
implementations that I know of take this approach, however, preferring
to either break the PATHNAME-MATCH-P/DIRECTORY consistency rule or to
return directory names in non-DNF form or even not return them at all.
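
The two-step matching rule described above might be sketched as
follows. This is only an illustration under stated assumptions, not
how any implementation actually works: every function name is
hypothetical, the first step is a deliberately simple stand-in for a
real component-by-component matcher (it compares directory components
with EQUAL and does not handle wild directories), and the name/type
split on the last dot assumes Unix-style file names.

```lisp
;; Hypothetical helpers, for illustration only.
(defun component-matches-p (value wild)
  "True if VALUE matches WILD; NIL and :WILD in WILD match anything."
  (or (member wild '(nil :wild))
      (equal value wild)))

(defun literal-match-p (pathname wild)
  "Simplified component-by-component match (no wild directories)."
  (and (equal (pathname-directory pathname) (pathname-directory wild))
       (component-matches-p (pathname-name pathname) (pathname-name wild))
       (component-matches-p (pathname-type pathname) (pathname-type wild))))

(defun directory-as-file (pathname)
  "Move the last directory element of PATHNAME into its name and type."
  (let* ((dir (pathname-directory pathname))
         (last-element (car (last dir)))
         ;; Split on the last dot, but treat a leading dot as part of
         ;; the name rather than as a name/type separator.
         (dot (let ((i (position #\. last-element :from-end t)))
                (and i (plusp i) i))))
    (make-pathname :directory (butlast dir)
                   :name (if dot (subseq last-element 0 dot) last-element)
                   :type (if dot (subseq last-element (1+ dot)) nil)
                   :defaults pathname)))

(defun dnf-aware-match-p (pathname wild)
  "Try a literal match first; if it fails and PATHNAME is in directory
normal form, retry with the last directory element moved into name/type."
  (or (literal-match-p pathname wild)
      (and (member (pathname-name pathname) '(nil :unspecific))
           (member (pathname-type pathname) '(nil :unspecific))
           (stringp (car (last (pathname-directory pathname))))
           (literal-match-p (directory-as-file pathname) wild))))
```

Like the rule it illustrates, this is purely syntactic: nothing here
consults the file system.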

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Christophe Rhodes
Subject: Re: Behavior of #'directory
Date: Sun, 23 May 2004 19:30:00 +0000
Message-ID: <sqoeofhrgn.fsf@cam.ac.uk>

Peter Seibel <·····@javamonkey.com> writes:

> Below are some notes I wrote for myself about the choices that
> implementors have to make, each of which leads them off into a
> particular area of the space of possible implementations.

Peter, thank you very much for this.  I hope that there's space for
this as an appendix to your book, but even if not, I look forward to
using the Message-ID of that in future. :-)

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)