From: Clayton Weaver
Subject: command line options are a grammar
Date: 
Message-ID: <Pine.SUN.3.96.1000624103527.6070E-100000@eskimo.com>
That's pretty much what it comes down to. The robust program
recognizes which elements of that grammar are order independent and which
parts are dependent on ordering in the command line. If two
space-separated tokens don't make sense as individual arguments,
do they make sense as a file or folder or other persistent
storage object, regardless of whether they show up aggregated into a
single argument object or as discrete argument objects when the program
starts?

If the answer is still ambiguous, then the option grammar can't solve it,
and the program either reports an error or takes whatever its default
action is in such cases.

I suggest that the limitations of the option grammars in common
existing programs (like touch) do not make it useless by definition for
a program to inspect its argument strings for multiple arguments.

The question for the parser is "What tokenization renders this entire
command line a valid expression in my option grammar?" When there is more
than one such tokenization,
some ambiguities can be resolved by testing the filesystem. If a command
line ambiguity arising from white space and quoting can't be
resolved, so be it, but that conclusion can't be reached a priori
for any program on the basis of multiple arguments in a single argument
object.
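
As a rough illustration of that question, here is a minimal Python sketch,
not anyone's actual parser. The toy grammar (flags -p, -pattern and -f, each
taking one value, followed by operands) and all of the function names are
assumptions made up for the example. It tries each argument both whole and
split on whitespace, and keeps the tokenizations that are valid expressions
in the grammar.

```python
from itertools import product

# Hypothetical toy grammar: these flags each take exactly one value.
TAKES_VALUE = {"-p", "-pattern", "-f"}


def candidate_tokenizations(argv):
    """Yield every token list obtained by keeping each argument whole
    or splitting it on whitespace."""
    choices = []
    for arg in argv:
        parts = arg.split()
        options = [[arg]]            # the argument as the shell delivered it
        if len(parts) > 1:
            options.append(parts)    # the same argument split on whitespace
        choices.append(options)
    for combo in product(*choices):
        yield [tok for group in combo for tok in group]


def is_valid(tokens):
    """True if tokens match (FLAG VALUE)* OPERAND*, where an operand
    is any token that does not start with '-'."""
    i = 0
    while i < len(tokens) and tokens[i] in TAKES_VALUE:
        if i + 1 >= len(tokens):
            return False             # flag with no value after it
        i += 2
    return all(not t.startswith("-") for t in tokens[i:])


def valid_tokenizations(argv):
    """Return all candidate tokenizations that the toy grammar accepts."""
    return [t for t in candidate_tokenizations(argv) if is_valid(t)]


if __name__ == "__main__":
    print(valid_tokenizations(["-pattern s.*t? -f /path/to/here"]))
    print(valid_tokenizations(["-p", "s.*t? /path/to/here"]))
```

With the single argument "-pattern s.*t? -f /path/to/here" only the
four-token split is accepted. With "-p" followed by "s.*t? /path/to/here"
both readings are accepted, which is the kind of ambiguity that has to be
settled some other way.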

If you use a pathname with

  /chars -chars/whatever

in it, that would seem to be inviting disaster. I'm less interested in
handling that case without failing than the case of a new user doing
something like

  "-pattern s.*t? -f /path/to/here"

and not having it fail unnecessarily, since my command args routine will
split that into four tokens and my program will figure out what the user
wants.

If the user instead passes

  "-p s.*t? /path/to/here*"

the token on the end could be part of the pattern. But if a program
is smart, it will check to see if "/path/to/here" (last token on
the command line up to the metacharacter) exists in the filesystem, since
otherwise the input would have to be stdin. If that path doesn't exist,
if there isn't anything on stdin, if no earlier arguments are
resolvable as input names, that is the end of possibilities for a correct
set of command args. Failing to split the argument object at all would
fail even in the first case, where the probable intent is easy for the
program to guess.
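
The filesystem check described above might look something like the
following Python sketch. The function names, the set of metacharacters,
and the three outcome labels are assumptions for illustration only. The
existence test is passed in as a parameter so the logic can be tried
without touching a real filesystem.

```python
import os
import re

# Assumed set of shell-style metacharacters that end the literal prefix.
GLOB_META = re.compile(r"[*?\[]")


def literal_prefix(token):
    """Return the part of token before the first metacharacter."""
    match = GLOB_META.search(token)
    return token if match is None else token[:match.start()]


def classify_last_token(token, other_input_available, exists=os.path.exists):
    """Guess the role of the last token on the command line.

    other_input_available: True if stdin has data or an earlier argument
    already resolved as an input name.
    """
    prefix = literal_prefix(token)
    if prefix and exists(prefix):
        return "input"      # the prefix names something in the filesystem
    if other_input_available:
        return "pattern"    # input comes from elsewhere; token joins the pattern
    return "error"          # no reading yields a correct set of command args


if __name__ == "__main__":
    print(classify_last_token("/path/to/here*", other_input_available=False))
```

This only covers the single trailing-token case from the example; a real
program would have to apply the same test to every candidate tokenization.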

If the program is possibly creating files rather than just reading
them, that changes the semantics of what is a valid command line
according to the grammar, without affecting the issue of whether
splitting an argument on field separators can come up with a valid command
according to the option grammar.

If it's not possible to definitively handle all possible ambiguities of
this nature, it's still possible to handle some of them and win sometimes
even for a relatively shell-clueless user.

Field separators for shell command lines should not be allowed in
path/folder names. Space/tab/newline as field separators was broken
from day one, because the designers did not anticipate user-friendly
folder names being translated to paths. "My stuff" is a perfectly
reasonable name for a folder, and it should be for a directory or file,
too. Argument separators for command lines should use an artificial
character that does not occur in the human-language names of things,
invented just for the purpose and not used for anything else.

But we're well beyond that now in the installed software base, so what
is the lowest level of user cluefulness in shell command line quoting that
a command line parser can accommodate?

The answer in terms of argument splitting on existing field separator
characters is context-dependent, not absolute.

Regards,

Clayton Weaver
<·············@eskimo.com>
(Seattle)

"Everybody's ignorant, just in different subjects."  Will Rogers

From: vsync
Subject: Re: command line options are a grammar
Date: 
Message-ID: <87bt0qj3me.fsf@quadium.net>
Clayton Weaver <······@eskimo.com> writes:

> the token on the end could be part of the pattern. But if a program
> is smart, it will check to see if "/path/to/here" (last token on
> the command line up to the metacharacter) exists in the filesystem, since
> otherwise the input would have to be stdin. If that path doesn't exist,
> if there isn't anything on stdin, if no earlier arguments are
> resolvable as input names, that is the end of possibilities for a correct
> set of command args. Failing to split the argument object at all would
> fail even in the first case, where the probable intent is easy for the
> program to guess.

This is wrong.  You're talking about an absolutely huge duplication of
effort here, as every single program has to do the same type of checks.
Incidentally, what you're describing is quite similar to MS-DOS, where 
the shell had no logic or pattern matching whatsoever.  MS-DOS was
both a pain to use and a pain to program in, and there was absolutely
no consistency between Program A and Program B.

> folder names translated to paths. "My stuff" is a perfectly
> reasonable name for a folder, and it should be for a directory or file,
> too. Argument separators for command lines should use an artifical

It is.  I can easily create any kind of directory I want, for example:

   mkdir foo\ bar
or mkdir 'foo bar'
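
To see how both spellings reach the program, here is a small Python
sketch. It uses the standard shlex module, which only approximates POSIX
shell word splitting, so treat it as an illustration rather than a
description of any particular shell. The helper name shell_tokens is made
up for the example.

```python
import shlex


def shell_tokens(line):
    """Split a command line roughly the way a POSIX shell would."""
    return shlex.split(line)


# Backslash-escaped space and single quotes give the same argument list.
escaped = shell_tokens(r"mkdir foo\ bar")
quoted = shell_tokens("mkdir 'foo bar'")
unquoted = shell_tokens("mkdir foo bar")

if __name__ == "__main__":
    print(escaped)    # ['mkdir', 'foo bar']
    print(quoted)     # ['mkdir', 'foo bar']
    print(unquoted)   # ['mkdir', 'foo', 'bar']
```

In the first two cases mkdir receives one argument containing a space; in
the unquoted case it receives two separate arguments.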

What it comes down to is that a good shell is incredibly powerful, but 
also quite complex.  If this confuses the user, they are free to use
one of the many GUIs out there, or a simpler shell.  But they lose
some of the flexibility.  If the user wants to use a system properly,
there is going to be some learning involved.  There's no way around
it.

-- 
vsync
http://quadium.net/ - last updated Fri Jun 23 23:28:05 MDT 2000
Orjner.