Two :UNSPECIFIC puzzles

From: Richard M Kreuter
Subject: Two :UNSPECIFIC puzzles
Date: Tue, 25 Sep 2007 00:45:11 +0000
Message-ID: <87r6knd148.fsf@progn.net>

Hello,

Here are two related puzzles about the consequences of an
implementation permitting :UNSPECIFIC as a type component.  I wonder
if people have any opinions about these.

Puzzle (1)

Assume that PATHNAME is some pathname that's printable and that its
host is such that :UNSPECIFIC is permitted for the type (e.g.,
PATHNAME is not a logical pathname).  Consider this program:

(let ((*print-readably* t))
  (print (make-pathname :type nil :defaults pathname))
  (print (make-pathname :type :unspecific :defaults pathname)))

If this program doesn't error, we should see two #P"..." sequences.

Now 19.2.2.2.3.1 says

    If a pathname is converted to a namestring, the symbols ‘nil’ and
    :unspecific cause the field to be treated as if it were empty.
    That is, both ‘nil’ and :unspecific cause the component not to
    appear in the namestring.

This means that the two pathnames above must have STRING-EQUAL or
maybe even STRING= namestrings, right?  But the dictionary entry for
*PRINT-READABLY* requires that what's printed will read back in as
similar pathnames, which isn't possible, since the same namestring
parsed twice can't yield different pathnames.
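
A minimal sketch for trying puzzle (1) on a given implementation.  The
function names and the BASE argument are hypothetical, introduced only
for illustration; whether :UNSPECIFIC is accepted as a type at all is
implementation-dependent, so every step is wrapped in IGNORE-ERRORS.

```lisp
;; Illustrative sketch only; the function names are hypothetical.

(defun type-variant-namestrings (base)
  "Return two values: the namestring of BASE with type NIL and with type
:UNSPECIFIC.  A value is NIL if the pathname could not be built or named."
  (flet ((name-with-type (type)
           (ignore-errors
            (namestring (make-pathname :type type :defaults base)))))
    (values (name-with-type nil)
            (name-with-type :unspecific))))

(defun readable-round-trip-p (pathname)
  "True if PATHNAME prints readably and reads back EQUAL to itself.
Returns NIL if printing or reading signals an error."
  (ignore-errors
   ;; WITH-STANDARD-IO-SYNTAX binds *PRINT-READABLY* to T.
   (with-standard-io-syntax
     (let ((text (prin1-to-string pathname)))
       (equal pathname (values (read-from-string text)))))))
```

If both namestrings come back as the same string, at most one of the two
pathnames can pass READABLE-ROUND-TRIP-P, which is the conflict described
above.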

Puzzle (2)

The dictionary entry for ENOUGH-NAMESTRING requires that the following
holds for all possible values of PATHNAME and DEFAULTS:

(merge-pathnames (enough-namestring pathname defaults) defaults)
  = (merge-pathnames (parse-namestring pathname nil defaults) defaults)

where the equal sign is supposed to represent equivalence of the
pathnames on either side.

Now, suppose PATHNAME is a pathname that has :UNSPECIFIC for the type,
and DEFAULTS is a pathname with a type other than :UNSPECIFIC.  Then

(parse-namestring pathname nil defaults) => PATHNAME
(pathname-type (merge-pathnames pathname defaults)) => :UNSPECIFIC

and so

(pathname-type (merge-pathnames (parse-namestring pathname nil defaults)
                                defaults))
=> :UNSPECIFIC

So the pathname on the right-hand-side of the equation has :UNSPECIFIC
for the type.

Now suppose that

(enough-namestring pathname defaults)

returns some string, and (from 19.2.2.2.3.1) the type component will
not appear in that string.  However, if the parse implicit in calling
MERGE-PATHNAMES returns a pathname whose type is NIL, then
MERGE-PATHNAMES will merge in DEFAULTS's type, which is, by
stipulation, not :UNSPECIFIC.  So the left-hand side of the equation
ends up with a type other than :UNSPECIFIC, and the two sides can't be
equivalent.
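
The identity in puzzle (2) can be checked mechanically.  This is a rough
sketch: the function name is hypothetical, and EQUAL is used here as a
stand-in for the "equivalence" the dictionary entry talks about.

```lisp
;; Illustrative sketch only; ENOUGH-NAMESTRING-IDENTITY-P is a
;; hypothetical name, and EQUAL approximates pathname equivalence.
(defun enough-namestring-identity-p (pathname defaults)
  "True if the ENOUGH-NAMESTRING identity holds for PATHNAME and DEFAULTS."
  (equal (merge-pathnames (enough-namestring pathname defaults) defaults)
         (merge-pathnames (parse-namestring pathname nil defaults) defaults)))

;; The case from the puzzle: a type of :UNSPECIFIC against defaults that
;; carry a real type.  IGNORE-ERRORS covers implementations that reject
;; :UNSPECIFIC as a type.
(defun unspecific-case-holds-p ()
  (ignore-errors
   (enough-namestring-identity-p
    (make-pathname :name "foo" :type :unspecific)
    (make-pathname :name "bar" :type "lisp"))))
```

What UNSPECIFIC-CASE-HOLDS-P returns depends on how the implementation
parses a namestring with no type, so no result is claimed here.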

Any thoughts?

--
RmK

From: Kent M Pitman
Subject: Re: Two :UNSPECIFIC puzzles
Date: Tue, 25 Sep 2007 04:12:33 +0000
Message-ID: <ubqbrfkni.fsf@nhplace.com>

Richard M Kreuter <·······@progn.net> writes:

> This means that the two pathnames above must have STRING-EQUAL or
> maybe even STRING= namestrings, right?  But the dictionary entry for
> *PRINT-READABLY* requires that what's printed will read back in as
> similar pathnames, which isn't possible, since the same namestring
> parsed twice can't yield different pathnames.

*PRINT-READABLY* does not guarantee you can print all things.  In
fact, a good and reasonable interpretation is that the one of these
pathnames that cannot be printed re-readably ought indeed to signal an
error during printing.

> Puzzle (2)
> 
> The dictionary entry for ENOUGH-NAMESTRING requires that the following
> holds for all possible values of PATHNAME and DEFAULTS:
> 
> (merge-pathnames (enough-namestring pathname defaults) defaults)
>   = (merge-pathnames (parse-namestring pathname nil defaults) defaults)
> 
> where the equal sign is supposed to represent equivalence of the
> pathnames on either side.
> 
> Now, suppose PATHNAME is a pathname that has :UNSPECIFIC for the type,
> and DEFAULTS is a pathname with a type other than :UNSPECIFIC.  Then
> 
> (parse-namestring pathname nil defaults) => PATHNAME
> (pathname-type (merge-pathnames pathname defaults)) => :UNSPECIFIC
> 
> and so
> 
> (pathname-type (merge-pathnames (parse-namestring pathname nil defaults)))
> => :UNSPECIFIC
> 
> So the pathname on the right-hand-side of the equation has :UNSPECIFIC
> for the type.
> 
> Now suppose that
> 
> (enough-namestring pathname defaults)
> 
> returns some string, and (from 19.2.2.2.3.1) the type component will
> not appear in that string.  However, if the parse implicit in calling
> MERGE-PATHNAMES returns a pathname whose type is NIL, then
> MERGE-PATHNAMES will merge in DEFAULT's type, which is, by
> stipulation, not :UNSPECIFIC.
> 
> Any thoughts?

Sounds like [and I didn't reason this through super-carefully tonight
as I'm skimming this, so I'm prepared to be debunked] this line of
reasoning (correctly, IMO) requires, by implication, that type=NIL
is the one that PARSE-NAMESTRING should not return.

I'm pretty sure chapter 19 was highest on my list of chapters to
rewrite if I'd had more time before we closed work on the spec.  It
was seriously in need of more work, I freely admit.  And it's not
surprising there are odd puzzles.  The problem is that all the
rewriting I did had to merely untangle wording but not change any
meaning--I wasn't at liberty to create new semantics (whether
allegedly better or not), only to expose existing semantics more
clearly.  Only the committee could, by vote, change the meaning... and
usually such a change involved being able to sort through the
vagaries of the much adored but sometimes-hard-to-puzzle-out CLTL
from which the original definition of the language was derived before
we modified it to be ANSI CL.

[As far as I'm concerned, it'd be fine if there were no parse operation
that returned type=NIL since that particular representation is really
a meta-expression that refers to a partially formed pathname that has not
been reified into concrete platform syntax.]

In fact, though, if you want to dive deeper into how this complete
mess arose, the problem comes that a lot of this stuff was taken from
the Lisp Machine, which had dealt with this in great detail, but the
committee did not want to accept all of what the LispM did, hoping
somehow that by leaving stuff out it would seem like a "conservative
result".  (I'm characterizing this based on personal observation, of
course, and this is therefore just my personal opinion about what
drove the motivation.  You won't be able to find documents of record
that support, other than by circumstantial evidence, that this is what
happened. But I think if you see it that way, you'll have at least one
coherent way of understanding the perils of committee results.)  On
the Lisp Machine, there was a character which was represented by the
character ASCII ^W (Control-W) and which in the Lisp Machine character
set (derived from the SAIL, or Stanford AI Lab character set) appeared
as either a left-and-right-pointing-arrow (single horizontal line,
arrows at both ends) or two arrows, one left-pointing and one
right-pointing, in the same character.  That character (which I'll
denote by <->) was used to denote an absent component, and on the LispM
when you saw such a file it printed as #P"foo.<->" meaning, in effect,
#.(cl:make-pathname :name "FOO" :type :unspecific :case :common).  The
LispM had made a critical distinction that CL left out, in which there
was a string-for-host operation distinct from namestring, so that you
could separately request CL's notation from the platform's native
notation.  And on the LispM, #P was defined to show the CL notation,
not the native notation, while the parsing operations were explicit
about whether you were parsing that notation or the host notation.
Because CL kind of blurs these, the rest is a bit murky on a few edge
cases.  I think the LispM was also more specific about what was
allowed to be :unspecific, and ANSI CL chose to be more vague.

Fortunately, while these things do sometimes keep implementors and 
language lawyers up late, I think mostly, in practice, they are not a 
huge problem for most users.

All just my opinion.  Sorry it's all so hastily written.  It's late and
I need to get to sleep.

From: Richard M Kreuter
Subject: Re: Two :UNSPECIFIC puzzles
Date: Wed, 26 Sep 2007 04:48:22 +0000
Message-ID: <87myvac9rd.fsf@progn.net>

Kent M Pitman <······@nhplace.com> writes:
> Richard M Kreuter <·······@progn.net> writes:
>
>> This means that the two pathnames above must have STRING-EQUAL or
>> maybe even STRING= namestrings, right?  But the dictionary entry for
>> *PRINT-READABLY* requires that what's printed will read back in as
>> similar pathnames, which isn't possible, since the same namestring
>> parsed twice can't yield different pathnames.
>
> *PRINT-READABLY* does not guarantee you can print all things.  In
> fact, a good and reasonable interpretation is that the one of these
> pathnames that cannot be printed re-readably ought indeed to signal an
> error during printing.

Okay.

>> Puzzle (2)
>> 
>> The dictionary entry for ENOUGH-NAMESTRING requires that the following
>> holds for all possible values of PATHNAME and DEFAULTS:
>> 
>> (merge-pathnames (enough-namestring pathname defaults) defaults)
>>   = (merge-pathnames (parse-namestring pathname nil defaults) defaults)
<snip>
> Sounds like [and I didn't reason this through super-carefully tonight
> as I'm skimming this, so I'm prepared to be debunked] this line of
> reasoning (correctly, IMO) requires, by implication, that type=NIL
> is the one that PARSE-NAMESTRING should not return.

This is actually what I'm interested in finding a justification for;
it also happens that no implementation I've tested does this.

However, I think an implementation could conform to the requirements
of the two puzzles by stipulating that for one of NIL or :UNSPECIFIC,
a pathname with that value for the type won't print while
*PRINT-READABLY* is true and that ENOUGH-NAMESTRING has a corner case
when the first argument is a pathname with :UNSPECIFIC for the type.

> [As far as I'm concerned, it'd be fine if there were no parse operation
> that returned type=NIL since that particular representation is really
> a meta-expression that refers to a partially formed pathname that has not
> been reified into concrete platform syntax.]

IMO the pathnames system would be vastly easier to explain and
understand if it were the case that physical pathnames with NIL for
any components didn't correspond to namestrings or denote filenames.
But as I mention, I can't find an implementation that does this: they
all seem to parse "foo" (as a namestring or as a filename) into a
pathname with NIL for the type, and it's not clear that they fail to
conform for doing so.  

However, it's not quite possible to say that no parse operation can
return a pathname with NIL for the type: logical pathname namestrings
with no type must parse to such pathnames by 19.3.2.1.  But note that
that means that if *DEFAULT-PATHNAME-DEFAULTS* is #P"SYS:" then

(merge-pathnames "foo" "bar.lisp") => "FOO.LISP"  ;give or take lettercase

But if *DEFAULT-PATHNAME-DEFAULTS* is some Unix pathname and the lack
of an extension parses as :UNSPECIFIC for Unix hosts, then

(merge-pathnames "foo" "bar.lisp") => "foo" 

And that's a little weird, I think.  Or maybe it isn't; my intuitions
are shot.
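
To separate the merging rule from the parsing question, here is a small
sketch that builds the pathnames with MAKE-PATHNAME instead of parsing
"foo".  MERGED-TYPE is a hypothetical helper, not anything from the spec.

```lisp
;; Illustrative sketch only; MERGED-TYPE is a hypothetical helper.
(defun merged-type (type defaults)
  "Return the type component after merging a pathname named foo, whose
type is TYPE, with DEFAULTS."
  (pathname-type
   (merge-pathnames (make-pathname :name "foo" :type type) defaults)))

;; A NIL type counts as missing, so it is filled in from DEFAULTS:
;;   (merged-type nil (make-pathname :name "bar" :type "lisp"))
;; An :UNSPECIFIC type counts as supplied, so on implementations that
;; accept it the defaults' type should not be merged in:
;;   (ignore-errors
;;    (merged-type :unspecific (make-pathname :name "bar" :type "lisp")))
```

The two results above therefore come down to which value the parser
picks for a namestring with no type.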

> Fortunately, while these things do sometimes keep implementors and 
> language lawyers up late, I think mostly, in practice, they are not a 
> huge problem for most users.

I don't think the issues I've mentioned have kept enough implementors
up late: all implementations I can find that permit :UNSPECIFIC for
the type fail to meet the constraints of both the puzzles in my
previous post, and the reason for this, IMO, is that the CLHS doesn't
give implementors or users any model for how pathnames are really
supposed to work.  For example, I think any of these would be valid
interpretations of the roles of NIL and :UNSPECIFIC for the type (not
an exhaustive enumeration):

[A] Filenames with no type are always represented by pathnames with
    :UNSPECIFIC for the type.  A pathname with NIL for the type does
    not denote a filename.

[B] Pathnames with either :UNSPECIFIC or NIL for the type denote
    filenames with no type.  Users may construct pathnames with either
    value for the type, but functions that return pathnames that
    denote filenames that lack types (e.g., PARSE-NAMESTRING,
    DIRECTORY) always have :UNSPECIFIC for the type.

[C] Pathnames with either :UNSPECIFIC or NIL for the type denote
    filenames with no type.  Users may construct pathnames with either
    value for the type, but functions that return pathnames that
    denote filenames that lack types always have NIL for the type.

(I've switched from talking about namestrings to talking about the
denotations of pathnames because that's the important thing.  From the
available evidence, I can't tell whether Symbolics's model was [A] or
[B].) For each pair of interpretations, you can find programs that
will work under one and not the other.  For example:

;; Should work under [B] and [C] but not [A].
(probe-file (make-pathname :name "FOO"
                           :type nil
                           :case :common
                           :defaults (user-homedir-pathname)))

;; Should work under [A] and [B], but probably not under [C].  Of
;; course nobody would do this, but the point is that interpretation
;; [C] makes *DEFAULT-PATHNAME-DEFAULTS* trickier to use safely.
(let ((*default-pathname-defaults* (make-pathname
                                    :type "SOME-EXTREMELY-IMPROBABLE-TYPE"
                                    :case :common)))
  (probe-file (pathname "/etc/passwd")))

As far as I can tell, most implementations behave as if they operated
under interpretation [C].  But interpretation [C] has the consequence
that the user's use of *DEFAULT-PATHNAME-DEFAULTS* must be constrained
to prevent allowing a type to be merged into, say, the pathnames
returned by calls to DIRECTORY.  In fact, under interpretation [C],
#.(cl:make-pathname :name "foo" :type :unspecific) is just some weird
variant of #.(cl:make-pathname :name "foo" :type nil) with different
merging behavior, whose only significant purpose is to let you rename
"foo.txt" to "foo".

> All just my opinion.  Sorry it's all so hastily written.  It's late and
> I need to get to sleep.

Well thanks for the feedback.

--
RmK

From: Kent M Pitman
Subject: Re: Two :UNSPECIFIC puzzles
Date: Wed, 26 Sep 2007 12:01:42 +0000
Message-ID: <ups05bpp5.fsf@nhplace.com>

Richard M Kreuter <·······@progn.net> writes:

I'm going to speak here only to my personal sense of what either was
intended or what I think personally, not to what the spec may say --
which could well conflict, for all I know, it being messy on some of
these points.  Take all of this as my personal opinion, not a definitive
reading of the spec, and not even my personal reading of the spec.

> For example, I think any of these would be valid
> interpretations of the roles of NIL and :UNSPECIFIC for the type (not
> an exhaustive enumeration):
> 
> [A] Filenames with no type are always represented by pathnames with
>     :UNSPECIFIC for the type.  A pathname with NIL for the type does
>     not denote a filename.

I think not all pathnames correspond to filenames.  Some pathnames exist
for the purpose of merging.  I think NIL is, or should be, a filler only
valid in such a pathname.  I think a filename that corresponds to a file
with no type should have :UNSPECIFIC in it so that changing its type 
requires manually setting the type or merging with it a left-hand-side
filename on MERGE-PATHNAMES that also has a non-NIL type.

So I think I agree with [A].

> [B] Pathnames with either :UNSPECIFIC or NIL for the type denote
>     filenames with no type.  Users may construct pathnames with either
>     value for the type, but functions that return pathnames that
>     denote filenames that lack types (e.g., PARSE-NAMESTRING,
>     DIRECTORY) always have :UNSPECIFIC for the type.

I don't think this captures what I think, because of what I said about
"pathname" vs "filename" above.  I don't think there are, or should
be, any actual files in a file system with type NIL.  If the file
supports not having a type, I think Common Lisp should represent that
with :UNSPECIFIC.

However, I personally partition "functions that return pathnames" into
two categories you don't have here, and if you make it be only one 
category you blur two important things:  functions that manipulate 
filenames without consulting the file system and functions that do
consult the file system.  The latter category can return :UNSPECIFIC
because of a PROBE-FILE kind of thing, the former category cannot know
and should must[*] return NIL, unless :UNSPECIFIC was used explicitly
by the person doing the construction in a MAKE-PATHNAME call, or unless
the function is PARSE-NAMESTRING, which must use :UNSPECIFIC and never
NIL (but only because it's been put in the de facto
position of doing the operation on most systems that I'd rather call
something like PARSE-NATIVE-NAMESTRING).  [Although if the parse was
done on a machine that does not allow :UNSPECIFIC in the first place,
such as TOPS-20, NIL would be ok because there would be no ambiguity
that the user had typed only a partial filename--in effect, the file
system itself has the notion of partial files, which Unix can be argued
not to have--just a sometimes-aggressive notion of taking full filenames
and sometimes making them more-full, which is not the same thing.
Since foo.bar.baz.quux is a filename on Unix, too, you can't say it's
added type fields every time.]  But, for example, a
(PROBE-FILE (MAKE-PATHNAME :NAME "foo" :TYPE NIL)) might well return
a :TYPE :UNSPECIFIC if the pathname defaults contained no type and a
discovery was made in the file system that there was a file that had
no type and no operation was used that forced a type. (There's a weird
middle case of doing a LOAD of a file like that where you're saying the
default type is NIL and where both "foo" and "foo.lisp" exist, and I think
it's been intended as a policy question for the implementation whether
"foo.<->", that is, name="FOO",type=NIL can find "foo" or whether it can
only find "foo.lisp", though personally I think just probing :UNSPECIFIC
first if that's what you want, just as you might probe :FASL first is
a good idea.)
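
A rough sketch of the "probe :UNSPECIFIC first" idea, for anyone who
wants to try it.  PROBE-UNTYPED-FIRST and its DEFAULT-TYPE argument are
hypothetical, and the :UNSPECIFIC probe is wrapped in IGNORE-ERRORS
because not every implementation accepts :UNSPECIFIC as a type.

```lisp
;; Illustrative sketch only; PROBE-UNTYPED-FIRST is a hypothetical name.
(defun probe-untyped-first (pathname default-type)
  "Probe PATHNAME with type :UNSPECIFIC first, then with DEFAULT-TYPE.
Return the truename of the first file found, or NIL if neither exists."
  (or (ignore-errors
       (probe-file (make-pathname :type :unspecific :defaults pathname)))
      (probe-file (make-pathname :type default-type :defaults pathname))))

;; Example call (the result depends on what is in the file system):
;;   (probe-untyped-first (make-pathname :name "foo") "lisp")
```

This makes the policy explicit in user code instead of leaving it to
whatever the implementation does with a NIL type.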

[*] By the phrase "should must", I didn't make a typo, but rather I
    mean "should have to".  Since I do this occasionally without thinking,
    I should say here, while I am thinking about it, that I like
    double-modals as a way of talking about certain hypotheticals, and
    I'm trying to use them more to get them to gain currency even though
    I don't know if they are classically thought grammatical.

> [C] Pathnames with either :UNSPECIFIC or NIL for the type denote
>     filenames with no type.  Users may construct pathnames with either
>     value for the type, but functions that return pathnames that
>     denote filenames that lack types always have NIL for the type.

I think this is the ultimate intent--to have two ways of representing a
potential name of a file, one which gets filled later on merging and one
that does not. (Modulo the messy policy choice mentioned parenthetically at
the end of my remarks on [B] above, which I personally think.)

> (I've switched from talking about namestrings to talking about the
> denotations of pathnames because that's the important thing.  From the
> available evidence, I can't tell whether Symbolics's model was [A] or
> [B].) For each pair of interpretations, you can find programs that
> will work under one and not the other.  For example:

My intuitions and biases were originally formed by the use of the
Symbolics system over many years since it was known to work on a great
many file systems invisibly and seamlessly interacting with all of
them, and I liked that, but I haven't used it regularly in quite some
time, so I'd have to double-check.  If I get some time, I'll try to
cross-check, but I think there are people on this group who may also
want to chime in and save me the trouble.

> ;; Should work under [B] and [C] but not [A].

I don't know that I agree.  I think it depends on whether 
USER-HOMEDIR-PATHNAME has NIL's in its type.  I don't think it has
to contain a valid name of a file, since not all file systems even
have directories that are filenames and also since even those that
do often have two forms of pathname that represent them.

> (probe-file (make-pathname :name "FOO"
>                            :type nil
>                            :case :common

[Not related, but:
 I think it was stupid that CL didn't make :CASE :COMMON the default.
 But oh well... no one seems to like that mode, probably because it
 uses uppercase as the interchange case, so maybe no one else does.
 I think it also would have been terrible to use lowercase as the 
 interchange case because it would have meant that people didn't even
 realize they were using it and got confused when it actually had to
 do something. At least presently it's usually visually distinctive
 and invites discussion about why that is.]

>                            :defaults (user-homedir-pathname)))
> 
> 
> ;; Should work under [A] and [B], but probably not under [C].  Of
> ;; course nobody would do this, but the point is that interpretation
> ;; [C] makes *DEFAULT-PATHNAME-DEFAULTS* trickier to use safely.
> (let ((*default-pathname-defaults* (make-pathname
>                                     :type "SOME-EXTREMELY-IMPROBABLE-TYPE"
>                                     :case :common)))
>   (probe-file (pathname "/etc/passwd")))

Well, I think it's reasonable for *default-pathname-defaults* to
always start with :UNSPECIFIC in its type on a host that permits there to be
an unspecific type.

But it's also reasonable for the pathname parser (if you're making it
overload as the native file system's filename string parser) to assume
you're giving it a hosted filename, and so to always return
:UNSPECIFIC in that case, so that there's not an issue because both
pathname and probe-file would return type=:UNSPECIFIC.

> As far as I can tell, most implementations behave as if they operated
> under interpretation [C].  But interpretation [C] has the consequence
> that the user's use of *DEFAULT-PATHNAME-DEFAULTS* must be constrained
> to prevent allowing a type to be merged into, say, the pathnames
> returned by calls to DIRECTORY.

So that means such a constraint may be implied in that choice; it
doesn't have to mean you can't make that choice.  The reason I point
this out is detailed in my summary point at the end so it's easier to
pick out for someone skimming.

>  In fact, under interpretation [C],
> #.(cl:make-pathname :name "foo" :type :unspecific) is just some weird
> variant of #.(cl:make-pathname :name "foo" :type nil) with different
> merging behavior, whose only significant purpose is to let you rename
> "foo.txt" to "foo".

I think this is the original motivation.

Though you never asked about the Unix filename called .login ... that one 
certainly makes a mess, too. ;)

> > All just my opinion.  Sorry it's all so hastily written.  It's late and
> > I need to get to sleep.
> 
> Well thanks for the feedback.

Not a problem.

Final thoughts...

The spec is a sort of an agreed-upon compromise (like a treaty almost)
among vendors, who ultimately each said "I can live with these words".
The words may have been construed by each individually differently,
and the fact of that flexibility may be the reason the spec came into
being.  That is, if you'd been more rigorous all you'd have is
non-agreement, not an agreed-upon thing everyone did the same.  That's
how politics and political treaties work.  So make sure you keep that
in mind and that you don't let the fact that there are multiple
readings that are internally consistent but mutually incompatible
trick you into thinking that it's a necessary choice that it all is
portable.  It was much worse in CLTL, and there were many compromises
that brought the community MUCH closer together in ANSI CL.  But one
of the compromises was not "we'll all break our implementations and do
something different".

Moreover, in the end, the way we finally closed work on the standard
was not to say "this is perfect" but to say "vendors are already
effectively using this to get work done and users seem happy and we
should just publish it".  So that means, I think, it mostly works.  I
think it's useful to publish these edge cases and for everyone to know
about them.  New implementations should try to be compatible with
existing ones unless they have a strong reason not to, I think, just
to minimize hassle for everyone.  And #+/#- and other tools are there
to help people over the hump of minor differences that remain after
everyone has done their best to get along and some disagreement
remains.