From: Bud
Subject: file-find
Date: 
Message-ID: <OiKZ4sGe#GA.185@upnetnews02.moswest.msn.net>
Has anyone written a simple find-file in LISP? I want to search all hard
drives, directories, and subdirectories for a file.

From: Sunil Mishra
Subject: Re: file-find
Date: 
Message-ID: <efyww02zv9r.fsf@whizzy.cc.gatech.edu>
"Bud" <········@email.msn.com> writes:

> Has anyone written a simple find-file in LISP? I want to search all hard
> drives, directories, and subdirectories for a file.

To find all lisp files in my home directory:

(directory "~/**/*.lisp")

(Not tested, but I'm pretty sure this will work fine.)
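
For the original question (one particular file rather than every lisp
file), the same idea can be written without relying on "~" in a namestring.
This is only a sketch: WILD-INFERIORS-PATTERN is a made-up helper name, and
whether DIRECTORY honours :WILD-INFERIORS depends on the implementation.

```lisp
;; Hypothetical helper: build a pattern matching NAME.TYPE at any depth
;; below ROOT. MERGE-PATHNAMES appends the relative directory
;; (:RELATIVE :WILD-INFERIORS) to ROOT's own directory component.
(defun wild-inferiors-pattern (root name type)
  (merge-pathnames
   (make-pathname :directory '(:relative :wild-inferiors)
                  :name name
                  :type type)
   root))

;; Possible use (the result depends on the implementation's DIRECTORY):
;; (directory (wild-inferiors-pattern (user-homedir-pathname) "notes" "lisp"))
```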

To find out more about lisp pathnames, check one of these two sites:

http://www.harlequin.com/books/HyperSpec/FrontMatter/Starting-Points.html
http://www-cgi.cs.cmu.edu/afs/cs.cmu.edu/project/ai-repository/ai/html/cltl/cltl2.html

It takes a little time to understand fully all the facilities lisp
provides, but it's well worth it.

Sunil
From: Christopher R. Barry
Subject: Re: file-find
Date: 
Message-ID: <874sn67ogd.fsf@2xtreme.net>
Sunil Mishra <·······@whizzy.cc.gatech.edu> writes:

> "Bud" <········@email.msn.com> writes:
> 
> > Has anyone written a simple find-file in LISP? I want to search all hard
> > drives, directories, and subdirectories for a file.
> 
> To find all lisp files in my home directory:
> 
> (directory "~/**/*.lisp")
> 
> (Not tested, but I'm pretty sure this will work fine.)

It's implementation dependent whether that will work or not. It does
not work with CMU Common Lisp, and is _incredibly_ slow (the better
part of a minute compared to the fraction of a second "find" takes)
with Allegro CL. Additionally, it will fail with Allegro CL if any of
your files have a ":" in them, for Allegro will identify the part of
the file-name before the ":" as a host-name and will give an error.

The only way to do this portably is to test whether each name in a
directory is a file or a directory, and move into it and repeat if
it's a directory. There is no DIRECTORYP function in ANSI CL, but the
last time the CL file stuff was discussed I believe someone posted a
portable one.
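
A rough sketch of that walk follows. It is not the function that was posted
before; DIRECTORY-PATHNAME-P and WALK-FILES are names made up here. It
assumes that DIRECTORY, given a wild name and type, also returns
subdirectories, and returns them in directory form (no name or type). Some
implementations behave that way and others do not, so treat it as a
starting point only. It also does nothing about symbolic links that point
back up the tree.

```lisp
;; Assumption: a pathname with neither name nor type denotes a directory.
(defun directory-pathname-p (pathname)
  (flet ((absent-p (component)
           (member component '(nil :unspecific))))
    (and (absent-p (pathname-name pathname))
         (absent-p (pathname-type pathname)))))

;; Call HANDLER on every file below DIRECTORY, descending into each
;; entry that looks like a directory.
(defun walk-files (directory handler)
  (dolist (entry (directory (merge-pathnames
                             (make-pathname :name :wild :type :wild)
                             directory)))
    (if (directory-pathname-p entry)
        (walk-files entry handler)
        (funcall handler entry))))

;; Possible use: print every file named "notes" under the home directory.
;; (walk-files (user-homedir-pathname)
;;             (lambda (file)
;;               (when (equal (pathname-name file) "notes")
;;                 (print file))))
```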

Christopher
From: Sunil Mishra
Subject: Re: file-find
Date: 
Message-ID: <efyvhfmzlrx.fsf@whizzy.cc.gatech.edu>
······@2xtreme.net (Christopher R. Barry) writes:

> Sunil Mishra <·······@whizzy.cc.gatech.edu> writes:
> 
> > "Bud" <········@email.msn.com> writes:
> > 
> > > Has anyone written a simple find-file in LISP? I want to search all hard
> > > drives, directories, and subdirectories for a file.
> > 
> > To find all lisp files in my home directory:
> > 
> > (directory "~/**/*.lisp")
> > 
> > (Not tested, but I'm pretty sure this will work fine.)
> 
> It's implementation dependent whether that will work or not. It does
> not work with CMU Common Lisp, and is _incredibly_ slow (the better
> part of a minute compared to the fraction of a second "find" takes)
> with Allegro CL. Additionally, it will fail with Allegro CL if any of
> your files have a ":" in them, for Allegro will identify the part of
> the file-name before the ":" as a host-name and will give an error.

Arguably these are implementation issues. Speed certainly is. If ACL is
braindead enough to attempt to interpret the pathname-string of a file that
is found, then it alone is to blame. But looking at the original post
again, my solution is *very* faulty at a higher level (discussed below).

> The only way to do this portably is to test whether each name in a
> directory is a file or a directory, and move into it and repeat if
> it's a directory. There is no DIRECTORYP function in ANSI CL, but the
> last time the CL file stuff was discussed I believe someone posted a
> portable one.

This may not be sufficient to deal with the speed issue, especially if you
want to compare find with lisp.

Find has the following properties that allow it additional speed:

1. It does not collect by default. In other words, whatever replacement we
   want in lisp should take a handler, not attempt to construct a list.
2. It does not build up a pathname structure. Lisp deals with pathnames,
   which is nice for higher level operations. But I suspect that the amount 
   of structure wasted in building pathnames (especially for files that may 
   not even be relevant) is what is responsible for slowing down DIRECTORY.
3. It can take many, many more arguments to narrow down the scope of the
   search.
4. It does not have to be file system independent.

Of these, the first three we can hope to do something about. The fourth has 
to be a design decision. It may in fact be counter-productive to have a
file-system independent find.

The biggest slow-down, I'm willing to bet, would be from 2, and to deal
with it effectively I suspect the only way would be to try to go around the
lisp functionality. In other words, given a find command, I would like to:

0. Divide all the criteria presented into those that require parsing, and
   those that do not.
1. Get all the entries in the directory, along with the associated
   data. Trivially eliminate all the entries that do not meet the criteria.
2. Parse the entry. We know it is a file or a directory, so we should be
   able to ignore all the nonsense about hosts and pathnames. Additionally
   the underlying OS should also be able to distinguish between file and
   directory (and link and ...) which is necessary for getting find to
   work. This should further clarify how the entry should be parsed. Then
   construct a pathname taking the start directory as the default. This
   ought to result in a *huge* saving in time.
3. Apply the pathname-dependent constraints.
4. For those that satisfy all constraints, call the handler.
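
The steps above can be sketched roughly as follows. Everything here is an
assumption made for illustration: FIND-FILES, SPLIT-FILE-NAME and
SUBDIRECTORY are invented names, and LIST-ENTRIES stands in for whatever
implementation-specific call returns the raw entries of a directory as
(name-string . kind) pairs, with kind being :FILE or :DIRECTORY. ANSI CL has
no such call, so the caller must supply one.

```lisp
;; Step 2 helper: split "foo.lisp" at the last dot, with no host parsing.
(defun split-file-name (string)
  (let ((dot (position #\. string :from-end t)))
    (if (and dot (plusp dot))
        (values (subseq string 0 dot) (subseq string (1+ dot)))
        (values string nil))))

;; Pathname of the subdirectory NAME of DIRECTORY.
(defun subdirectory (directory name)
  (make-pathname :directory (append (or (pathname-directory directory)
                                        '(:relative))
                                    (list name))
                 :name nil :type nil :defaults directory))

;; LIST-ENTRIES is a caller-supplied function (hypothetical, see above).
(defun find-files (start list-entries handler
                   &key (name-test (constantly t))
                        (pathname-test (constantly t)))
  (dolist (entry (funcall list-entries start))
    (destructuring-bind (name . kind) entry
      (cond ((eq kind :directory)
             (find-files (subdirectory start name) list-entries handler
                         :name-test name-test
                         :pathname-test pathname-test))
            ;; Step 1: cheap test on the raw string, before any parsing.
            ((funcall name-test name)
             ;; Step 2: build the pathname, defaulting to the directory.
             (multiple-value-bind (base type) (split-file-name name)
               (let ((pathname (make-pathname :name base :type type
                                              :defaults start)))
                 ;; Steps 3 and 4: pathname-dependent test, then handler.
                 (when (funcall pathname-test pathname)
                   (funcall handler pathname)))))))))
```

Since the handler is called per file, nothing is collected unless the
handler chooses to collect, and a name such as "Ezekial-13:18" is never run
through the namestring parser.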

A similar mechanism would be necessary to get acceptable speed from
DIRECTORY. I bet CMUCL and ACL first obtain the entire string for the file
pathname, and then call PATHNAME on it. If that is the case, using the
portable DIRECTORYP and DIRECTORY will be at least as slow as using
DIRECTORY as I had. We know a lot about the file already when we know how
to parse part of the path, and reusing this information will get us better
speed and correctness.

Sunil
From: Erik Naggum
Subject: Re: file-find
Date: 
Message-ID: <3131576049019458@naggum.no>
* ······@2xtreme.net (Christopher R. Barry)
| Additionally, it will fail with Allegro CL if any of your files have a
| ":" in them, for Allegro will identify the part of the file-name before
| the ":" as a host-name and will give an error.

  which version are you using?  _years_ have passed since I reported and
  Franz Inc fixed that problem.  there's a reason software has version
  identifiers, you know.  honest people do not fail to include them.

#:Erik
From: Christopher R. Barry
Subject: Re: file-find
Date: 
Message-ID: <871zia6m9z.fsf@2xtreme.net>
Erik Naggum <····@naggum.no> writes:

> * ······@2xtreme.net (Christopher R. Barry)
> | Additionally, it will fail with Allegro CL if any of your files have a
> | ":" in them, for Allegro will identify the part of the file-name before
> | the ":" as a host-name and will give an error.
> 
>   which version are you using?  _years_ have passed since I reported and
>   Franz Inc fixed that problem.  there's a reason software has version
>   identifiers, you know.  honest people do not fail to include them.

I have a file named "Ezekial-13:18" in a bible-notes subdirectory of
my home directory.

Allegro CL Trial Edition 5.0 [Linux/X86] (8/29/98 10:57)
Copyright (C) 1985-1998, Franz Inc., Berkeley, CA, USA.  All Rights Reserved.
;; Optimization settings: safety 1, space 1, speed 1, debug 2.
;; For a complete description of all compiler switches given the
;; current optimization settings evaluate (EXPLAIN-COMPILER-SETTINGS).
USER(1): (directory "~/**/*.lisp")
Error: host "Ezekial-13" not found in (sys:hosts.cl)
  [condition type: SIMPLE-ERROR]

Restart actions (select using :continue):
 0: :TRY-AGAIN
 1: Return to Top Level (an "abort" restart)
[1] USER(2): 

Christopher
From: Erik Naggum
Subject: Re: file-find
Date: 
Message-ID: <3131614305273030@naggum.no>
* ······@2xtreme.net (Christopher R. Barry)
| I have a file named "Ezekial-13:18" in a bible-notes subdirectory of
| my home directory.

  I checked with an unpatched 5.0, and the error is still there.  it turns
  out I had made a fix long ago that avoided this situation entirely.  I
  have now filed a new bug report, on which you have been copied.

#:Erik
From: Gareth McCaughan
Subject: Re: file-find
Date: 
Message-ID: <86hfr4a9z0.fsf@g.pet.cam.ac.uk>
Christopher Barry wrote:

> I have a file named "Ezekial-13:18" in a bible-notes subdirectory of
> my home directory.
...
> USER(1): (directory "~/**/*.lisp")
> Error: host "Ezekial-13" not found in (sys:hosts.cl)
>   [condition type: SIMPLE-ERROR]

Maybe it just noticed that you misspelled "Ezekiel" and thought
it should warn you. :-)

-- 
Gareth McCaughan       Dept. of Pure Mathematics & Mathematical Statistics,
·····@dpmms.cam.ac.uk  Cambridge University, England.
From: Christopher R. Barry
Subject: Re: file-find
Date: 
Message-ID: <87g16o58lc.fsf@2xtreme.net>
Gareth McCaughan <·····@dpmms.cam.ac.uk> writes:

> Maybe it just noticed that you misspelled "Ezekiel" and thought
> it should warn you. :-)

Yeah.

Christopher