From: ····@arrakis.es
Subject: Need advice on implementing logical pathnames
Date: 
Message-ID: <8qf9gn$b14$1@nnrp1.deja.com>
Hi,

I am adding support for logical pathnames in ECLS. It will be
a very light implementation --everything regarding pathnames
is <=9kb -- but I hope the only omission will be :wild-inferiors.

What I wanted to know is whether I truly understood the ANSI spec.
Briefly, the parser distinguishes between logical pathnames and
physical pathnames by the leading hostname.

If there is a hostname it tries parsing a logical pathname. Thus,
if the hostname is not a logical one or if the format is not
that of a logical pathname it fails.

"AAA:AAA;AA;*.*.*" ---> logical
"AAA:aaa;aa;*.*.*" ---> error, there are lowercase characters
"BBB:AAA;AAA/*.*.*" ---> error, character / is not valid

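A minimal sketch of this dispatch in Common Lisp. The names
*LOGICAL-HOSTS*, *LOGICAL-CHARS* and CLASSIFY-NAMESTRING are
hypothetical, the host list is made up to fit the examples, and the
check is character-level only rather than the full grammar:

```lisp
;; Hypothetical registry of logical hosts; a real implementation would
;; consult whatever LOGICAL-PATHNAME-TRANSLATIONS has defined.
(defparameter *logical-hosts* '("AAA" "BBB"))

;; Characters that may follow the host in a logical namestring:
;; word characters plus the wildcard and the ; and . delimiters.
(defparameter *logical-chars* "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-*;.")

(defun classify-namestring (string)
  "Return :LOGICAL or :PHYSICAL, or signal an error for a string that
names a logical host but contains characters outside the grammar."
  (let* ((colon (position #\: string))
         (host (and colon (subseq string 0 colon))))
    (cond ((not (member host *logical-hosts* :test #'equal))
           ;; No host, or a host that is not a known logical one.
           :physical)
          ((every (lambda (ch) (find ch *logical-chars*))
                  (subseq string (1+ colon)))
           :logical)
          (t
           (error "~S is not a valid logical pathname namestring." string)))))
```

With these definitions the three strings above come out as :LOGICAL,
an error, and an error; anything without a registered logical host is
classified as :PHYSICAL.
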
If there is no leading hostname (or the user supplied no host
to #'parse-namestring), it defaults to parsing a unix
pathname, allowing all characters:

"/usr/aaa/bb.b.b" ---> unix with pathname-name "bb", and
pathname-type "b.b"

"aaa:aaa.b.c" ---> unix with pathname-name = "aaa:aaa" and
pathname-type "b.c"

"/usr/aa aaa; a.bb.b" ---> unix with pathname-name = "aa aaa; a"
and pathname-type="bb.b"
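
The splitting rule these examples imply (directory up to the last
slash, name up to the first dot, type after it) could be sketched as
follows. PARSE-UNIX-NAMESTRING is a hypothetical name, and returning
the directory as a plain string is a simplification for illustration:

```lisp
(defun parse-unix-namestring (string)
  "Return three values: directory (up to and including the last slash,
or NIL), name (up to the first dot of the file part), and type (the
text after that dot, or NIL)."
  (let* ((slash (position #\/ string :from-end t))
         (file (if slash (subseq string (1+ slash)) string))
         (dot (position #\. file)))
    (values (and slash (subseq string 0 (1+ slash)))
            (if dot (subseq file 0 dot) file)
            (and dot (subseq file (1+ dot))))))

;; (parse-unix-namestring "/usr/aaa/bb.b.b")
;; => "/usr/aaa/", "bb", "b.b"
```

A file such as ".emacs" would come out with an empty name under this
rule, which an implementation may want to treat specially.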

Pathname translation is simple: two pathnames are matched character
by character, except for wildcards, which take any number of characters.
Matching is not maximal, in the sense that the character after a wildcard
determines the end of the section it matches. For example
"abcefgdefdeee" against "abc*d*" gives
    Wildcard 1 = "efg"
    Wildcard 2 = "efdeee"
instead of
    Wildcard 1 = "efgdef"
    Wildcard 2 = "eee"
Besides, expressions with wildcards cannot go beyond a
directory/name/type delimiter. Thus, "aa*;bb" cannot match
"aabc;cc;bb".

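A minimal sketch of the non-maximal matching for a single component.
MATCH-WILD is a hypothetical name. One assumption goes beyond the
description above: each wildcard tries the shortest run first and
backtracks if the rest of the pattern then fails to match:

```lisp
(defun match-wild (pattern source &optional (p 0) (s 0))
  "Match SOURCE against PATTERN, where * matches any run of characters.
Returns two values: the list of substrings matched by each *, and T on
success; NIL and NIL on failure."
  (cond ((= p (length pattern))
         ;; Pattern exhausted: succeed only if the source is too.
         (values nil (= s (length source))))
        ((char= (char pattern p) #\*)
         ;; Try the shortest candidate first, then longer ones.
         (loop for end from s to (length source)
               do (multiple-value-bind (more ok)
                      (match-wild pattern source (1+ p) end)
                    (when ok
                      (return (values (cons (subseq source s end) more)
                                      t))))
               finally (return (values nil nil))))
        ((and (< s (length source))
              (char= (char pattern p) (char source s)))
         ;; Literal character: must match exactly.
         (match-wild pattern source (1+ p) (1+ s)))
        (t (values nil nil))))

;; (match-wild "abc*d*" "abcefgdefdeee")
;; => ("efg" "efdeee"), T
```

Applied to one directory, name, or type component at a time, this also
keeps a wildcard from crossing a delimiter.
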
For each field of a pathname (directory, name, type), a collection
of sections matched with wildcards is created, and then each section
replaces a corresponding wildcard in the same section of the
template. For instance

(translate-pathname "aa/bb/c.d.e" "a*/*/c.*.*" "/usr/foo/*/k/*/c.*.*")
=> "/usr/foo/a/k/bb/c.d.e"

but

(translate-pathname "aa/bb/c.d.e" "a*/*/c.*.*" "/usr/foo/k/*/*.*.*")
=> ERROR
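
The substitution step for one field might look like the sketch below.
FILL-WILD is a hypothetical name, the field is handled as a plain
string, and treating any mismatch in the number of wildcards as an
error is an assumption based on the second example:

```lisp
(defun fill-wild (template matches)
  "Replace each * in TEMPLATE with the next string from MATCHES.
Signals an error unless TEMPLATE has exactly as many * as MATCHES."
  (let ((remaining matches))
    (with-output-to-string (out)
      (loop for ch across template
            do (cond ((char/= ch #\*)
                      (write-char ch out))
                     ((null remaining)
                      (error "Template ~S has more wildcards than the source."
                             template))
                     (t
                      (write-string (pop remaining) out))))
      ;; Leftover matches mean the template had too few wildcards.
      (when remaining
        (error "Template ~S has fewer wildcards than the source."
               template)))))

;; (fill-wild "/usr/foo/*/k/*/" '("a" "bb"))
;; => "/usr/foo/a/k/bb/"
```

In the failing example the directory template "/usr/foo/k/*/" has one
wildcard for two matched sections, so this sketch would signal the
second error.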

I would like to know what you think about this simple set of
rules. Are they too confusing? Do they violate ANSI?

TIA
    Juanjo

--
Juan Jose Garcia Ripoll www: http://www.arrakis.es/~worm/index.html
Dpto. de Matematicas job: ········@ind-cr.uclm.es
E.T.S.I. Industriales home: ····@arrakis.es
Univ. de Castilla-La Mancha, Ciudad Real E-13071 (Spain)


Sent via Deja.com http://www.deja.com/
Before you buy.

From: Marco Antoniotti
Subject: Re: Need advice on implementing logical pathnames
Date: 
Message-ID: <y6cya0kbjkt.fsf@octagon.mrl.nyu.edu>
····@arrakis.es writes:

> Hi,
> 
> I am adding support for logical pathnames in ECLS. It will be
> a very light implementation --everything regarding pathnames
> is <=9kb -- but I hope the only omission will be :wild-inferiors.

You can look at the preliminary implementation by Mark Kantrowitz in
the AI.Repository.

I would also look at the implementation in CMUCL.  It is written in CL
and it is rather portable AFAIU.


> What I wanted to know is whether I truly understood the ANSI spec.
> Briefly, the parsers distinguishes between logical pathnames and
> physical pathnames for the leading hostname.
> 
> If there is a hostname it tries parsing a logical pathname. Thus,
> if the hostname is not a logical one or if there format is not
> the one of a logical pathname it fails.
> 
> "AAA:AAA;AA;*.*.*" ---> logical
> "AAA:aaa;aa;*.*.*" ---> error, there are lowercase characters

Why should the second one be erroneous?

> "BBB:AAA;AAA/*.*.*" ---> error, character / is not valid
> 

I would also plan ahead and add some special code to recognize DOS
names that start with a single letter device name.

Cheers

-- 
Marco Antoniotti =============================================================
NYU Bioinformatics Group			 tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                          fax  +1 - 212 - 995 4122
New York, NY 10003, USA				 http://galt.mrl.nyu.edu/valis
             Like DNA, such a language [Lisp] does not go out of style.
			      Paul Graham, ANSI Common Lisp
From: ····@arrakis.es
Subject: Re: Need advice on implementing logical pathnames
Date: 
Message-ID: <8qfvvj$5fr$1@nnrp1.deja.com>
In article <···············@octagon.mrl.nyu.edu>,
  Marco Antoniotti <·······@cs.nyu.edu> wrote:
>
> ····@arrakis.es writes:
> > I am adding support for logical pathnames in ECLS. It will be
> > a very light implementation --everything regarding pathnames
> > is <=9kb -- but I hope the only omission will be :wild-inferiors.
> You can look at the preliminary implementation by Mark Kantrowitz in
> the AI.Repository.

There is a version of that package for ECLS, but it is far too
big and the logical-pathname type is not a built in one. Instead
I need to work on the C code of ECLS to add support for logical
pathnames in the core library.

I also tried to figure out how MK's implementation works and seem
to recall it deviates from ANSI in some respects (see below).

> > If there is a hostname it tries parsing a logical pathname. Thus,
> > if the hostname is not a logical one or if there format is not
> > the one of a logical pathname it fails.
> >
> > "AAA:AAA;AA;*.*.*" ---> logical
> > "AAA:aaa;aa;*.*.*" ---> error, there are lowercase characters
>
> Why should thie second one be erroneous?

I was assuming that those strings were supposed to be parsed
as logical pathnames. Thus, following the "Filenames" section
of the ANSI specification

[...]
host ::= word
directory ::= word | wildcard-word | wild-inferiors-word
name ::= word | wildcard-word
[...]
wildcard-word--one or more asterisks, uppercase letters, digits, and
hyphens, including at least one asterisk, with no two asterisks adjacent.
word--one or more uppercase letters, digits, and hyphens.
[...]

Lowercase letters are not allowed and that is not a valid
logical-pathname namestring.
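
The two definitions quoted above translate fairly directly into
predicates. This is only a sketch: *WORD-CHARS*, LOGICAL-WORD-P and
LOGICAL-WILDCARD-WORD-P are hypothetical names. Note also that the
standard's section on lowercase letters in a logical pathname
namestring (19.3.1.1.7) says lowercase letters are translated to
uppercase when parsing, so applying STRING-UPCASE before the test may
be the more compatible policy:

```lisp
(defparameter *word-chars* "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-")

(defun logical-word-p (string)
  "True if STRING is one or more uppercase letters, digits and hyphens."
  (and (plusp (length string))
       (every (lambda (ch) (find ch *word-chars*)) string)))

(defun logical-wildcard-word-p (string)
  "True if STRING consists of asterisks and word characters, has at
least one asterisk, and has no two asterisks adjacent."
  (and (find #\* string)
       (not (search "**" string))
       (every (lambda (ch) (or (char= ch #\*) (find ch *word-chars*)))
              string)))

;; Strict reading:          (logical-word-p "aaa")                => NIL
;; Upcasing before testing: (logical-word-p (string-upcase "aaa")) => T
```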

This is a confusing point (at least for me :) In CLISP logical
pathnames can include lowercase characters but they are converted
to uppercase. In CMUCL I seem to recall that lowercase characters
are also allowed and they are preserved (This is just my memory of
CMUCL -- my local debian version refuses to run right now)

I would like to adopt a more or less compatible policy in case
the standard would be a problem. Any hints about ACL or LW's
behavior?

> I would also plan ahead and add some special code to recognize DOS
> names that start with a single letter device name.

Thanks, I should have thought about this. I am afraid I am a bit
too focused on unix these days.

Juanjo

From: Marco Antoniotti
Subject: Re: Need advice on implementing logical pathnames
Date: 
Message-ID: <y6c8zski1dq.fsf@octagon.mrl.nyu.edu>
····@arrakis.es writes:

> In article <···············@octagon.mrl.nyu.edu>,
>   Marco Antoniotti <·······@cs.nyu.edu> wrote:
> >
> > ····@arrakis.es writes:
> > > I am adding support for logical pathnames in ECLS. It will be
> > > a very light implementation --everything regarding pathnames
> > > is <=9kb -- but I hope the only omission will be :wild-inferiors.
> > You can look at the preliminary implementation by Mark Kantrowitz in
> > the AI.Repository.
> 
> There is a version of that package for ECLS, but it is far too
> big and the logical-pathname type is not a built in one. Instead
> I need to work on the C code of ECLS to add support for logical
> pathnames in the core library.

Yep.  You need that.  Have you worked on adding Conditions?  I think I
have some old code around that added conditions to an old version of ECL.

> 
> I also tried to figure out how MK's implementation works and seem
> to recall it deviates from ANSI in some aspects (see below).

I guess so.  That version predates ANSI.

> > > If there is a hostname it tries parsing a logical pathname. Thus,
> > > if the hostname is not a logical one or if there format is not
> > > the one of a logical pathname it fails.
> > >
> > > "AAA:AAA;AA;*.*.*" ---> logical
> > > "AAA:aaa;aa;*.*.*" ---> error, there are lowercase characters
> >
> > Why should thie second one be erroneous?
> 
> I was assuming that those strings were supposed to be parsed
> as logical pathnames. Thus, following the "Filenames" section
> of the ANSI specification
> 
> [...]
> host ::=!word
> directory ::=!word | !wildcard-word | !wild-inferiors-word
> name ::=!word | !wildcard-word
> [...]
> wildcard-word--one or more asterisks, uppercase letters, digits, and
> hyphens, including at least one asterisk, with no two asterisks adjacent.
> word--one or more uppercase letters, digits, and hyphens.
> [...]
> 
> Lowercase letters are not allowed and that is not a valid
> logical-pathname namestring.

Ooops.  You are right.

> 
> This is a confusing point (at least for me :) In CLISP logical
> pathnames can include lowercase characters but they are converted
> to uppercase.

CLISP logical pathnames are not quite conformant.

> In CMUCL I seem to recall that lowercase characters
> are also allowed and they are preserved (This is just my memory of
> CMUCL -- my local debian version refuses to run right now)

Yep.  They are pretty much so.

Cheers

From: Juan Jose Garcia Ripoll
Subject: Re: Need advice on implementing logical pathnames
Date: 
Message-ID: <87bsxf5slw.fsf@ind-cr.uclm.es>
Marco Antoniotti <·······@cs.nyu.edu> writes:

> ····@arrakis.es writes:
> 
> > In article <···············@octagon.mrl.nyu.edu>,
> >   Marco Antoniotti <·······@cs.nyu.edu> wrote:
> > >
> > > ····@arrakis.es writes:
> > > > I am adding support for logical pathnames in ECLS. It will be
> > > > a very light implementation --everything regarding pathnames
> > > > is <=9kb -- but I hope the only omission will be :wild-inferiors.
> > > You can look at the preliminary implementation by Mark Kantrowitz in
> > > the AI.Repository.
> > 
> > There is a version of that package for ECLS, but it is far too
> > big and the logical-pathname type is not a built in one. Instead
> > I need to work on the C code of ECLS to add support for logical
> > pathnames in the core library.
> 
> Yep.  You need that.

I have finally opted to have two different syntaxes, the standard one
for logical pathnames, and a W3C-like one for physical pathnames:
	[device:][//hostname][directory/][name][.type]
This way, under MS-DOS one could introduce "a", "b", "c", etc. as
devices, and space is left for further extension of ECLS with
user-defined protocols ("ftp", "http", etc.).

>  Have you worked on adding Conditions?  I think I
> have some old code around that added conditions to an old version of ECL.

I haven't yet, and I have no clue, since I have never used them
before. I would be very grateful if you could send me your
code to have a starting point. Indeed I am quite concerned
about error signaling right now, and have incorporated
some significant improvements, which include better backtraces
and correctable errors wherever possible. That way
        > (make-list 'a :initial-element 'b)
        Correctable error: A is not of type (INTEGER 0 *).
                           Signalled by MAKE-LIST.
        If continued: Enter new value.
        ;;; Warning: Clearing input from *debug-io*
        Broken at MAKE-LIST.
        >> :b
        Backtrace: > MAKE-LIST
        >> :continue
        Enter new value> 10
        (B B B B B B B B B B)
        >
The jump to the debugger may be suppressed by binding *break-enable*
to nil.
        > (setq *break-enable* nil)
        NIL
        > (make-list 'a :initial-element 'b)
        Correctable error: A is not of type (INTEGER 0 *).
                           Signalled by MAKE-LIST.
        Aborting:
        >
I suppose that conditions would also introduce some uniformity in
the error signaling and the messages that users see.
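
For comparison, the same kind of correctable error can be expressed
with the standard condition system. This is a sketch in portable
Common Lisp, not the ECLS mechanism shown above; CHECKED-MAKE-LIST is
a hypothetical wrapper used only for illustration:

```lisp
(defun checked-make-list (size &key initial-element)
  "Like MAKE-LIST, but offers a USE-VALUE restart while SIZE is invalid."
  (loop until (typep size '(integer 0 *))
        do (setf size
                 (restart-case
                     (error 'type-error :datum size
                                        :expected-type '(integer 0 *))
                   (use-value (new)
                     :report "Enter new value."
                     :interactive (lambda ()
                                    (format *query-io* "Enter new value> ")
                                    (list (read *query-io*)))
                     new))))
  (make-list size :initial-element initial-element))

;; A handler can pick the restart without entering the debugger:
(handler-bind ((type-error (lambda (c)
                             (declare (ignore c))
                             (use-value 10))))
  (checked-make-list 'a :initial-element 'b))
;; => (B B B B B B B B B B)
```

Without the handler, the TYPE-ERROR reaches the debugger, where the
USE-VALUE restart is offered interactively.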

Regards

	Juanjo

From: Pekka P. Pirinen
Subject: Re: Need advice on implementing logical pathnames
Date: 
Message-ID: <ixu2avqdo1.fsf@harlequin.co.uk>
Juan Jose Garcia Ripoll <········@ind-cr.uclm.es> writes:
> I have finally opted to have two different syntaxes, the standard one
> for logical pathnames, and a w3c-like for physical pathnames:
> 	[device:][//hostname][directory/][name][.type]
> this way under MSDOS one could introduce "a", "b", "c", etc as
> devices, and space is left for further extension of ECLS with user
> defined protocols ("ftp", "http", etc).

That's an interesting idea, but it doesn't fit into CL easily.  You
need a delimiter between hostname and directory; in URLs, it's slash,
with the result that a URL with a host can't have a relative directory
path.  This is OK for URLs, but CL pathnames can be like that:
(MAKE-PATHNAME :HOST "foo" :DIRECTORY '(:RELATIVE "bar")).  I think
there isn't a requirement that all physical pathnames must have a
printed representation, but it's not a bad idea.

Furthermore, what does an "ftp" protocol pathname for C:\AUTOEXEC.BAT
look like?  Where does the "C" go?

Presumably you wouldn't allow the "file" protocol syntax; that would
confuse the matter further.

The standard suggests that physical namestrings use "conventions
customary for the file system in which the named file resides".  You
are free to say that you define an abstract w3c-like file system
overlaying the physical file systems.  However, as a practical matter,
programs will have to talk to end users who are familiar with the
conventions of that OS and unfamiliar with yours.  If the Lisp
implementation doesn't use a filename syntax similar to the OS syntax,
all the programs will be hard to use, or have to include filename
parsers and printers for use on ECLS.

Also, the design of the pathname system is intended to support variant
syntaxes in the manner of Genera: If you put the hostname first, you
can then parse the rest of the pathname in the syntax of the operating
system of that host.  Again, you don't have to do that, beyond
supporting logical hosts, but it's a possible extension.  See the
examples in the standard.
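
As a rough sketch of that host-first dispatch: look up the text before
the first colon in a table of hosts and hand the remainder to that
host's parser. *HOST-PARSERS*, REGISTER-HOST-PARSER and
PARSE-WITH-HOST are hypothetical names, and the "VAX" host in the
usage comment is made up:

```lisp
;; Hypothetical table from host name to a parser for that host's syntax.
(defvar *host-parsers* (make-hash-table :test #'equal))

(defun register-host-parser (host parser)
  "Associate HOST with PARSER, a function of one string argument."
  (setf (gethash host *host-parsers*) parser))

(defun parse-with-host (string default-parser)
  "If STRING starts with a registered host followed by a colon, hand the
rest to that host's parser; otherwise hand all of STRING to
DEFAULT-PARSER."
  (let* ((colon (position #\: string))
         (parser (and colon
                      (gethash (subseq string 0 colon) *host-parsers*))))
    (if parser
        (funcall parser (subseq string (1+ colon)))
        (funcall default-parser string))))

;; (register-host-parser "VAX" (lambda (rest) (list :vms rest)))
;; (parse-with-host "VAX:[DIR]FILE.TXT" #'identity)
;; => (:VMS "[DIR]FILE.TXT")
```

An unregistered prefix, such as a DOS drive letter, is left for the
default parser to interpret.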

For integrating various file transfer protocols into the pathname
system, a better approach might be along the lines of Genera's Generic
Network System.  The network system knows about the hosts and what
protocols they talk; the application program doesn't have to, unless
it really wants to.  (A full set of Symbolics documentation is almost
essential equipment for a Lisp implementor. :-)
-- 
Pekka P. Pirinen, Adaptive Memory Management Group, Harlequin Limited
If at first you don't succeed, call it version 1.0.
From: Reini Urban
Subject: Re: Need advice on implementing logical pathnames
Date: 
Message-ID: <39d9065e.202936467@judy>
Pekka P. Pirinen wrote:
>-- 
>Pekka P. Pirinen, Adaptive Memory Management Group, Harlequin Limited
                                                     ^^^^^^^^^^^^^^^^^
Hah! I like that sig.
From: ····@arrakis.es
Subject: Re: Need advice on implementing logical pathnames
Date: 
Message-ID: <8rc4cb$8u1$1@nnrp1.deja.com>
In article <··············@harlequin.co.uk>,
  ·····@harlequin.co.uk (Pekka P. Pirinen) wrote:
> Juan Jose Garcia Ripoll <········@ind-cr.uclm.es> writes:
> > I have finally opted to have two different syntaxes, the standard one
> > for logical pathnames, and a w3c-like for physical pathnames:
> >  [device:][//hostname][directory/][name][.type]
> > this way under MSDOS one could introduce "a", "b", "c", etc as
> > devices, and space is left for further extension of ECLS with user
> > defined protocols ("ftp", "http", etc).
>
> That's an interesting idea, but it doesn't fit into CL easily.  You
> need a delimiter between hostname and directory;

That was a typo. It should be
 [device:][[//hostname]/][directory/][name][.type]
and now it is consistent. The current implementation identifies
"ftp", "file", and single letters as devices, not as protocols.
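
A sketch of a parser for this corrected syntax, under a few
assumptions: DEVICE-NAME-P and PARSE-PHYSICAL are hypothetical names,
a colon counts as the device marker only if it comes before the first
slash, the type starts at the first dot of the file part, and the
result is a plain property list rather than a pathname object:

```lisp
(defun device-name-p (string)
  "True for a single letter or one of the names treated as devices."
  (or (and (= (length string) 1) (alpha-char-p (char string 0)))
      (member string '("ftp" "file") :test #'string-equal)))

(defun parse-physical (string)
  "Return a plist with :DEVICE, :HOST, :DIRECTORY, :NAME and :TYPE."
  (let* ((colon (position #\: string))
         (slash (position #\/ string))
         (prefix (and colon
                      (or (null slash) (< colon slash))
                      (subseq string 0 colon)))
         (device (and prefix (device-name-p prefix) prefix))
         (tail (if device (subseq string (1+ colon)) string))
         (host nil))
    ;; Optional //hostname, ended by the next slash.
    (when (and (>= (length tail) 2) (string= "//" tail :end2 2))
      (let ((end (or (position #\/ tail :start 2) (length tail))))
        (setf host (subseq tail 2 end)
              tail (subseq tail end))))
    (let* ((last-slash (position #\/ tail :from-end t))
           (file (if last-slash (subseq tail (1+ last-slash)) tail))
           (dot (position #\. file)))
      (list :device device
            :host host
            :directory (and last-slash (subseq tail 0 (1+ last-slash)))
            :name (if dot (subseq file 0 dot) file)
            :type (and dot (subseq file (1+ dot)))))))

;; (parse-physical "c:/autoexec.bat")
;; => (:DEVICE "c" :HOST NIL :DIRECTORY "/" :NAME "autoexec" :TYPE "bat")
```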

In the implementation I was working on, "file" and single letters
were understood by the streams handler and I planned to add
hooks so that one could program things like "ftp" and "http".

> Furthermore, what does an "ftp" protocol pathname for C:\AUTOEXEC.BAT
> look like?  Where does the "C" go?

They could not be mixed together. I assumed that if one was to really
use "ftp", the protocol should be supplied, and the corresponding hook
should be responsible for finding all the required information in
the pathname. Perhaps "ftp://some.microsoft.ftp.server/C:/AUTOEXEC.BAT"

It is difficult to take into account systems one never works with :)
I installed windows-nt today for the first time in four years, just
for the sake of porting ecls to it.

> The standard suggests that physical namestrings use "conventions
> customary for the file system in which the named file resides".  You
> are free to say that you define an abstract w3c-like file system
> overlaying the physical file systems.  However, as a practical matter,
> programs will have to talk to end users who are familiar with the
> conventions of that OS and unfamiliar with yours.  If the Lisp
> implementation doesn't use a filename syntax similar to the OS syntax,
> all the programs will be hard to use, or have to include filename
> parsers and printers for use on ECLS.

Nop. The default syntax parsed unix and dos pathnames ok. There
could be problems with Macs and VMS, though.

> Also, the design of the pathname system is intended to support variant
> syntaxes in the manner of Genera: If you put the hostname first, you
> can then parse the rest of the pathname in the syntax of the operating
> system of that host.  Again, you don't have to do that, beyond
> supporting logical hosts, but it's a possible extension.  See the
> examples in the standard.

That is a very good point. It would also save me a lot of work since
I could just ship a default parser and let the user hook theirs in.
Besides, I found today an addendum to the ANSI spec which suggests
what you say and makes the behavior of PARSE-NAMESTRING less
confusing and easier to implement:

http://www.xanalys.com/software_tools/reference/HyperSpec/Issues/iss258-writeup.html

> For integrating various file transfer protocols into the pathname
> system, a better approach might be along the lines of Genera's Generic
> Network System.  The network system knows about the hosts and what
> protocols they talk; the application program doesn't have to, unless
> it really wants to.  (A full set of Symbolics documentation is almost
> essential equipment for a Lisp implementor. :-)

I am still looking for one of those books. I hoped they could be
on-line but it seems I am out of luck.

Regards

   Juanjo

From: Pekka P. Pirinen
Subject: Re: Need advice on implementing logical pathnames
Date: 
Message-ID: <ixd7hi2bma.fsf@harlequin.co.uk>
····@arrakis.es writes:
> In article <··············@harlequin.co.uk>,
>   ·····@harlequin.co.uk (Pekka P. Pirinen) wrote:
> Nop. The default syntax parsed unix and dos pathnames ok. There
> could be problems with Macs and VMS, though.

Even with backslashes in DOS names?  Although, if that's the only
issue, it would not be very hard for programmers to program around it.

> > Also, the design of the pathname system is intended to support variant
> > syntaxes in the manner of Genera: If you put the hostname first, you
> > can then parse the rest of the pathname in the syntax of the operating
> > system of that host. [...]
> 
> That is a very good point. I would also save me a lot of work since
> I could just ship a default parser and let the user hook theirs in.

Or perhaps implementors porting ECLS to other platforms.
Unfortunately, it's not quite that simple.

> Besides I found today an addendum to the ANSI spec which suggests
> what you say and makes the behavior of PARSE-NAMESTRING less
> confusing and easier to implement
> 
> http://www.xanalys.com/software_tools/reference/HyperSpec/Issues/iss258-writeup.html

It's basically a good idea, but as Barry Margolin points out in the
writeup, defining how it should work with DOS and VMS names that
already contain colons is not straightforward.  I suppose that's why
that proposal was not adopted in the standard.  I do think it can be
made to work; you just need to work out a consistent and practical set
of rules.  The concept of a default host needs to be made explicit
(perhaps the host component of *DEFAULT-PATHNAME-DEFAULTS*, although
that would have repercussions all over).
-- 
<--/---\----^-78-cols---------------------Your-Name-here-------Something----->
 / Small \4 rows  .sig 1.0beta  *<Perth ·····@address.here  witty some dead
/  ASCII  \ |   {Minimize whitespace}    Profession here   guy once said here
\ Picture / v  Whatever you like here   Company name here  (But witty! Hear?)