From: Thaddeus L Olczyk
Subject: Normalising Pathnames and file names.
Date: 
Message-ID: <3c5566f5.78142734@nntp.interaccess.com>
There are several ways that I would like to normalise path/filenames.
They are:
1) Since the file may be on either Windows or Unix, and since Windows
    isn't particular about whether / or \ is the pathname separator,
    I would like to change \ to /.
2) I want to remove all occurrences of two simultaneous /. E.g.
    "///" -> "/".
3) I want to remove all occurrences of ".". (Remember that "." can
    occur at the end or start of a name. At the end I just want to
    remove it. At the start I want the absolute pathname.)
4) I want to remove all occurrences of ".." which also means that
    I want to remove the directory preceding "..".
5) If the name is local, I want to make it absolute.

One warning. "." and ".." can legally occur inside path/filenames.
So you have to be careful that it represents a directory.

I can do this relatively easily using regex's but if it's easy to do
in Lisp without regex's I would prefer that. 
Thanks.

From: Tim Bradshaw
Subject: Re: Normalising Pathnames and file names.
Date: 
Message-ID: <ey3d6zuwi7n.fsf@cley.com>
* Thaddeus L Olczyk wrote:
> 3) I want to remove all occurrences of ".". (Remember that "." can
>     occur at the end or start of a name. At the end I just want to
>     remove it. At the start I want the absolute pathname.)

Actually I suspect you mean you want to elide /./, and if you see ./
at the start it is a *relative* pathname, not absolute.

> 4) I want to remove all occurrences of ".." which also means that
>     I want to remove the directory preceding "..".

You can't always do this.  ../foo for instance.

> 5) If the name is local, I want to make it absolute.

What does `local' mean?

> One warning. "." and ".." can legally occur inside path/filenames.
> So you have to be careful that it represents a directory.

// is also dangerous because it means things on Windows - //foo/bar
(or really \\foo\bar) is a UNC (?) pathname meaning share bar on host
foo or something. You need to be careful about eliding it, anyway.

--tim
From: Thaddeus L Olczyk
Subject: Re: Normalising Pathnames and file names.
Date: 
Message-ID: <3c5581b9.84994531@nntp.interaccess.com>
On 28 Jan 2002 16:31:08 +0000, Tim Bradshaw <···@cley.com> wrote:

>* Thaddeus L Olczyk wrote:
>> 3) I want to remove all occurrences of ".". (Remember that "." can
>>     occur at the end or start of a name. At the end I just want to
>>     remove it. At the start I want the absolute pathname.)
>
>Actually I suspect you mean you want to elide /./, and if you see ./
>at the start it is a *relative* pathname, not absolute.
>
When I said I want the absolute pathname if it appears at the start,
I meant I want the function to return an absolute pathname. Yes, I
know that is redundant, but the statement needed some explanation
here.

>> 4) I want to remove all occurrences of ".." which also means that
>>     I want to remove the directory preceding "..".
>
>You can't always do this.  ../foo for instance.
>
Again I want the absolute pathname.

>> 5) If the name is local, I want to make it absolute.
>
>What does `local' mean?
relative

>
>> One warning. "." and ".." can legally occur inside path/filenames.
>> So you have to be careful that it represents a directory.
>
>// is also dangerous because it means things on Windows - //foo/bar
>(or really \\foo\bar) is a UNC (?) pathname meaning share bar on host
>foo or something. You need to be careful about eliding it, anyway.
>
Hmm. Yes. If it appears in the middle there is no problem. Just
another complication.
From: Tim Bradshaw
Subject: Re: Normalising Pathnames and file names.
Date: 
Message-ID: <ey37kq2wfsn.fsf@cley.com>
* Thaddeus L Olczyk wrote:
>> 
> Again I want the absolute pathname.

In that case your function needs at least one more argument: the
(absolute) pathname to merge with.

--tim
From: Lieven Marchand
Subject: Re: Normalising Pathnames and file names.
Date: 
Message-ID: <m3ofjetndc.fsf@localhost.localdomain>
Tim Bradshaw <···@cley.com> writes:

> // is also dangerous because it means things on Windows - //foo/bar
> (or really \\foo\bar) is a UNC (?) pathname meaning share bar on host
> foo or something. You need to be careful about eliding it, anyway.

It can also mean something on Unix. According to the POSIX spec, //
indicates the beginning of an implementation-defined namespace. So if
a Unix decides to make //foo/bar different from /foo/bar, they're
entitled to. It'll probably break a lot of programs though.
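
A minimal sketch of what being careful could look like, assuming the POSIX
rule that exactly two leading slashes may be special while three or more
behave like one. The name collapse-slashes is made up for this illustration
and is not taken from any library.

```lisp
(defun collapse-slashes (path)
  "Collapse every run of / in PATH to a single /, except that a leading
run of exactly two slashes is preserved."
  (let* ((lead (or (position-if (lambda (ch) (char/= ch #\/)) path)
                   (length path)))        ; number of leading slashes
         (keep-double (= lead 2))
         (previous-slash nil))
    (with-output-to-string (out)
      ;; Emit one extra / up front; the loop below emits the other one.
      (when keep-double
        (write-char #\/ out))
      (loop for ch across path
            do (unless (and previous-slash (char= ch #\/))
                 (write-char ch out))
               (setf previous-slash (char= ch #\/))))))
```

Whether //foo/bar really differs from /foo/bar on a given system is still
up to that system; this only avoids destroying the distinction.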

-- 
Lieven Marchand <···@wyrd.be>
She says, "Honey, you're a Bastard of great proportion."
He says, "Darling, I plead guilty to that sin."
Cowboy Junkies -- A few simple words
From: Pierre R. Mai
Subject: Re: Normalising Pathnames and file names.
Date: 
Message-ID: <87elka1h12.fsf@orion.bln.pmsf.de>
Tim Bradshaw <···@cley.com> writes:

> // is also dangerous because it means things on Windows - //foo/bar
> (or really \\foo\bar) is a UNC (?) pathname meaning share bar on host
> foo or something. You need to be careful about eliding it, anyway.

IIRC then // at the start of a pathname is reserved for
"implementation" defined extensions by the relevant POSIX standard, so
this is something you'll have to be careful about if you want to stay
within POSIX as well...

Regs, Pierre.

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein
From: Kaz Kylheku
Subject: Re: Normalising Pathnames and file names.
Date: 
Message-ID: <p4i58.26092$nb.1339146@news1.calgary.shaw.ca>
In article <··············@orion.bln.pmsf.de>, Pierre R. Mai wrote:
>Tim Bradshaw <···@cley.com> writes:
>
>> // is also dangerous because it means things on Windows - //foo/bar
>> (or really \\foo\bar) is a UNC (?) pathname meaning share bar on host
>> foo or something. You need to be careful about eliding it, anyway.
>
>IIRC then // at the start of a pathname is reserved for
>"implementation" defined extensions by the relevant POSIX standard, so
>this is something you'll have to be careful about if you want to stay
>within POSIX as well...

Thanks for pointing this out. Note that Cygwin for instance uses this to
escape drive letter names (ugh). I didn't know that this was a place in
POSIX for extensions. Have a copy of Draft 7 handy so I will look this up.
From: Ian Wild
Subject: Re: Normalising Pathnames and file names.
Date: 
Message-ID: <3C56631C.B97A05F6@cfmu.eurocontrol.be>
Tim Bradshaw wrote:
> 
> * Thaddeus L Olczyk wrote:

> > One warning. "." and ".." can legally occur inside path/filenames.
> > So you have to be careful that it represents a directory.
> 
> // is also dangerous because it means things on Windows - //foo/bar
> (or really \\foo\bar) is a UNC (?) pathname meaning share bar on host
> foo or something. You need to be careful about eliding it, anyway.


Speaking of Windows, at least some variants of Win32
also treat "..." as an abbreviation of "../.." (and
similarly for higher numbers of dots).  Caused at least
one web server to leak files it had been asked not
to hand out willy nilly...
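
One way to guard against this, sketched under the assumption that the path
has already been split into components: flag any component made up of three
or more dots and let the caller decide what to do with it. Both function
names here are hypothetical.

```lisp
(defun dots-only-p (component)
  "True if COMPONENT is non-empty and consists solely of . characters."
  (and (plusp (length component))
       (every (lambda (ch) (char= ch #\.)) component)))

(defun suspicious-dots-p (component)
  "True for ..., ...., and longer runs of dots. The ordinary . and ..
components are not flagged."
  (and (dots-only-p component)
       (> (length component) 2)))
```

A name such as "...a" is not flagged, since it is an ordinary file name.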
From: Kaz Kylheku
Subject: Re: Normalising Pathnames and file names.
Date: 
Message-ID: <itf58.19265$jb.954215@news2.calgary.shaw.ca>
In article <·················@nntp.interaccess.com>, Thaddeus L Olczyk wrote:
>There are several ways that I would like to normalise path/filenames.
>They are:
>1) Since the file may be on either Windows or Unix, and since Windows
>    isn't particular about whether / or \ is the pathname separator,
>    I would like to change \ to /.
>2) I want to remove all occurrences of two simultaneous /. E.g.
>    "///" -> "/".
>3) I want to remove all occurrences of ".". (Remember that "." can
>    occur at the end or start of a name. At the end I just want to
>    remove it. At the start I want the absolute pathname.)
>4) I want to remove all occurrences of ".." which also means that
>    I want to remove the directory preceding "..".
>5) If the name is local, I want to make it absolute.

I just put out a freeware program which has a canonicalization function
for POSIX paths. It doesn't do 1 or 5, for the following reasons.
Windows paths are different enough from POSIX paths that this naive
treatment won't do. For example if you change \\host\ to //host/ and
then suppress repeated slashes, you end up with something incorrect.

Requirement 5 entails making a call to the operating system to discover
the path to the current working directory.  There is no need to have this
in the canonicalization function.  If it's a relative path, the caller
can just prepend the current working directory path to that path
and invoke your canonicalization routine, right?

/home/tolczyk + ../rel/path =  /home/tolczyk/../rel/path
                            => /home/rel/path

Also how do you know that the path is intended to be relative to the
current working directory, anyway? Maybe it's relative to somewhere else.
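
A minimal sketch of that caller-side step, assuming Unix-style paths. The
name absolutize is made up, and the base directory is passed in explicitly
instead of being queried from the operating system, so the caller decides
what the path is relative to.

```lisp
(defun absolutize (path base)
  "Return PATH unchanged if it starts with /. Otherwise prepend BASE,
which is assumed to be an absolute directory path."
  (if (and (plusp (length path))
           (char= (char path 0) #\/))
      path
      ;; Trim trailing slashes from BASE so exactly one / joins the parts.
      (concatenate 'string (string-right-trim "/" base) "/" path)))

;; (absolutize "../rel/path" "/home/tolczyk")
;; gives "/home/tolczyk/../rel/path", ready for canonicalization.
```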

On the other hand, another requirement you might want to add is to
resolve symbolic links.  (I don't do this in my function, but the
BSD library function realpath() does it).

Which brings up another point: there may be a path canonicalization
function in your system library, which may be worth interfacing with
over FFI, provided it has the semantics you want.

>One warning. "." and ".." can legally occur inside path/filenames.
>So you have to be careful that it represents a directory.

Only if you do something silly, like use regular expressions. ;)

>I can do this relatively easily using regex's but if it's easy to do
>in Lisp without regex's I would prefer that. 

I actually hacked up an initial solution to this using regex's, and then
realized that it was overly complicated, and slow. Getting all the cases
right required hacks, such as protecting a leading / by doubling it up
and then removing it later. And of course, regular expressions cannot
handle the removal of .. because that requires a pushdown automaton.
So at best you can do it iteratively, by repeatedly reprocessing the
path with a regex for matching a component followed by .. until
they are all gone.
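
To show what that iterative reprocessing amounts to, here is a sketch that
does the same thing on a list of components instead of with a regex. This is
not the regex version I hacked up; cancel-up-iteratively is a name invented
for the illustration, and it assumes "." and empty components are already
gone.

```lisp
(defun cancel-up-iteratively (components)
  "Repeatedly remove the first NAME/.. pair from COMPONENTS, a list of
strings, until no such pair is left. Leading .. components that have
nothing to cancel against are kept."
  (loop
    ;; Find the first position where a real name is followed by "..".
    (let ((pos (loop for (a b) on components
                     for i from 0
                     when (and b (string= b "..") (string/= a ".."))
                       return i)))
      (if pos
          ;; Drop the pair and rescan from the start.
          (setf components (append (subseq components 0 pos)
                                   (nthcdr (+ pos 2) components)))
          (return components)))))
```

Each pass rescans the list from the beginning, which is the repeated work
that a single pass with a stack avoids.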

(defun canonicalize-path (path)
  "Simplifies a POSIX path by eliminating . components, splicing out as many ..
components as possible, and condensing multiple slashes. A trailing slash is
guaranteed to be preserved, if it follows something that could be a file or
directory.  Two values are returned, the simplified path and a boolean value
which is true if there are any .. components that could not be spliced out.
Copyright 2002, Kaz Kylheku.  Permission granted to copy, modify and
use for any purpose."
  (let ((split-path (split-fields path "/"))
        uncanceled-up)

    ;; First, if the path has at least two components,
    ;; replace a leading empty one with the symbol :root
    ;; and a trailing empty one with :dir. These indicate a
    ;; leading and trailing /
    (when (> (length split-path) 1)
      (when (string= (first split-path) "")
        (setf (first split-path) :root))
      (when (string= (first (last split-path)) "")
        (setf (first (last split-path)) :dir)))

    ;; Next, squash out all of the . and empty components,
    ;; and replace .. components with the :up symbol.
    ;; string= accepts symbols too, so :root and :dir pass through.
    (setf split-path (mapcan #'(lambda (item)
                                 (cond
                                   ((string= item "") nil)
                                   ((string= item ".") nil)
                                   ((string= item "..") (list :up))
                                   (t (list item))))
                             split-path))
    (let (folded-path)
      ;; Now, we use a pushdown automaton to reduce the .. paths
      ;; The remaining stack is the reversed path.
      (dolist (item split-path)
        (case item
          ((:up)
           (case (first folded-path)
             ((:root)) ;; /.. is just /, so do nothing
             ((:up nil) (push item folded-path) (setf uncanceled-up t))
             (otherwise (pop folded-path))))
          ((:dir)
           (case (first folded-path)
             ((:root :up nil)) ;; no name to attach the slash to
             (otherwise (push (format nil "~a/" (pop folded-path))
                              folded-path))))
          (otherwise
           (push item folded-path))))
      (setf split-path (nreverse folded-path)))

    ;; If there are 0 components, add a "." component.
    ;; Otherwise, if the path starts with :root, remove it
    ;; and add a / to the front of the next component (if any).
    (if (zerop (length split-path))
        (push "." split-path)
        (when (eq (first split-path) :root)
          (pop split-path)
          (push (format nil "/~a" (or (pop split-path) "")) split-path)))

    ;; Map remaining symbols back to strings
    (setf split-path (mapcar #'(lambda (item)
                                 (case item
                                   ((:up) "..")
                                   (otherwise item)))
                             split-path))

    ;; Convert back to text
    (values (reduce #'(lambda (x y) (format nil "~a/~a" x y)) split-path)
            uncanceled-up)))

(defun split-fields (in-string delim-char-string)
"Split string into rigid fields based on delimiter characters. 
Each individual delimiter character separates two fields.
Example:  (split-fields \":c:#a\" \":#\") ==> (\"\" \"c\" \"\" \"a\")
Copyright 2002, Kaz Kylheku.  Permission granted to copy, modify and
use for any purpose."
  (let (list (token ""))
    (dotimes (index (length in-string)) 
      (let ((ch (aref in-string index)))
        (cond 
          ((find ch delim-char-string)
           (push token list)
           (setf token ""))
          (t (setf token (format nil "~a~a" token ch))))))
    (push token list)
    (nreverse list)))

(defun split-words (in-string delim-char-string)
"Munge sequences of delimiter characters. The pieces
in between are returned as a list of strings.
Example:  (split-words \" a b cde f \"  \" \") ==> (\"a\" \"b\" \"cde\" \"f\")
Copyright 2002, Kaz Kylheku.  Permission granted to copy, modify and
use for any purpose."
  (let (list (token "") (state :parsing-delim))
    (dotimes (i (length in-string))
      (let ((ch (aref in-string i)))
        (if (not (find ch delim-char-string))
          (progn (setf token (format nil "~a~a" token ch))
                 (setf state :parsing-token))
          (when (eq state :parsing-token)
            (push token list)
            (setf token "")
            (setf state :parsing-delim)))))
    (when (not (equal token ""))
      (push token list))
    (nreverse list)))