From: Peter Seibel
Subject: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <m365dqirqw.fsf@javamonkey.com>
I'm guessing that on some historically important filesystem the
natural way to get the length of a file was the Lisp way: opening it
and getting the information from the stream. If so, what was that OS?

I only ask because it may seem strange to folks used to Unix-centric
languages where that information is available without opening the
file.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp

From: Artie Gold
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <c1rh5d$1mia83$1@ID-219787.news.uni-berlin.de>
Peter Seibel wrote:
> I'm guessing that on some historically important filesystem the
> natural way to get the length of a file was the Lisp way: opening it
> and getting the information from the stream. If so, what was that OS?
> 
> I only ask because it may seem strange to folks used to Unix-centric
> languages where that information is available without opening the
> file.
> 

Actually that's a misconception; in standard C (a language whose roots 
are undoubtedly `unix-centric') the only way to get the length of a file 
(without using some platform specific library) is to `read it and count'.

HTH,
--ag

-- 
Artie Gold -- Austin, Texas

"Yeah. It's an urban legend. But it's a *great* urban legend!"
From: David Steuber
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <m23c8ugysd.fsf@david-steuber.com>
Artie Gold <·········@austin.rr.com> writes:

> Peter Seibel wrote:
> > I'm guessing that on some historically important filesystem the
> > natural way to get the length of a file was the Lisp way: opening it
> > and getting the information from the stream. If so, what was that OS?
> > I only ask because it may seem strange to folks used to Unix-centric
> > languages where that information is available without opening the
> > file.
> > 
> 
> Actually that's a misconception; in standard C (a language whose roots
> are undoubtedly `unix-centric') the only way to get the length of a
> file (without using some platform specific library) is to `read it and
> count'.

It sure doesn't look that way to the user.  In C, the stat function
does the job.  Admittedly it is defined in sys/stat.h.

-- 
   One Emacs to rule them all.  One Emacs to find them,
   One Emacs to take commands and to the keystrokes bind them,

All other programming languages wish they were Lisp.
From: Artie Gold
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <c1t5d4$1kj0oj$1@ID-219787.news.uni-berlin.de>
David Steuber wrote:
> Artie Gold <·········@austin.rr.com> writes:
> 
> 
>>Peter Seibel wrote:
>>
>>>I'm guessing that on some historically important filesystem the
>>>natural way to get the length of a file was the Lisp way: opening it
>>>and getting the information from the stream. If so, what was that OS?
>>>I only ask because it may seem strange to folks used to Unix-centric
>>>languages where that information is available without opening the
>>>file.
>>>
>>
>>Actually that's a misconception; in standard C (a language whose roots
>>are undoubtedly `unix-centric') the only way to get the length of a
>>file (without using some platform specific library) is to `read it and
>>count'.
> 
> 
> It sure doesn't look that way to the user.  In C, the stat function
> does the job.  Admittedly it is defined in sys/stat.h.
> 
Right. It's *not* part of C. It's part of the interface supplied by a 
certain class of operating systems. What it *looks* like is irrelevant.

<OT>
Does anyone know if there was ever a Lisp machine that hosted a C 
implementation <gasp>?
</OT>

Cheers,
--ag
From: Martti Halminen
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <pan.2004.02.29.17.29.55.444573@kolumbus.fi>
On Sun, 29 Feb 2004 10:52:49 -0600, Artie Gold wrote:

> <OT>
> Does anyone know if there was ever a Lisp machine that hosted a C 
> implementation <gasp>?
> </OT>

Don't know about the others, but at least Symbolics did. Pascal, Fortran,
Ada and Prolog, too, IIRC.

-- 
From: Pierpaolo BERNARDI
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <AfC0c.42226$gk.1797150@news3.tin.it>
"Artie Gold" <·········@austin.rr.com> ha scritto nel messaggio ····················@ID-219787.news.uni-berlin.de...
> David Steuber wrote:
> > Artie Gold <·········@austin.rr.com> writes:

> <OT>
> Does anyone know if there was ever a Lisp machine that hosted a C 
> implementation <gasp>?
> </OT>

The Zeta-C compiler (C on Symbolics), written by Scott L. Burson,
was recently put in the public domain by its author.

I don't have a URL handy; the distribution is called ZETA-C-PD.tgz.

P.
From: Rahul Jain
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <877jy4ty0y.fsf@nyct.net>
"Pierpaolo BERNARDI" <··················@hotmail.com> writes:

> The Zeta C compiler, C on symbolics, written by Scott L. Burson, 
> has been recently put in the public domain by its author.  

Wow. That's gotta be some cool lisp code to look at.

> I don't have an url handy, the distribution is called ZETA-C-PD.tgz.

It's at www.spies.com/~aek/explorer/zeta-c/ (no surprise there :).

-- 
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87oerizalg.fsf@thalassa.informatimago.com>
Artie Gold <·········@austin.rr.com> writes:

> Peter Seibel wrote:
> > I'm guessing that on some historically important filesystem the
> > natural way to get the length of a file was the Lisp way: opening it
> > and getting the information from the stream. If so, what was that OS?
> > I only ask because it may seem strange to folks used to Unix-centric
> > languages where that information is available without opening the
> > file.
> >
> 
> Actually that's a misconception; in standard C (a language whose roots
> are undoubtedly `unix-centric') the only way to get the length of a
> file (without using some platform specific library) is to `read it and
> count'.

Peter did  not ask about languages  (like C), but  about systems (like
unix). In unix, the file size  is an attribute of the file (inode) and
is accessible without having to open the file.  You don't even have to
have read  or search access  rights on the  file you want to  know the
size!  You just have to have read access on a directory where an entry
for that file (inode) is stored.




FILE-LENGTH:

    file-length returns the length of stream, or nil if the length
    cannot be determined.

    For a binary file, the length is measured in units of the element
    type of the stream.


The standard  is totally uninformative  about the definition of  a non
binary file length.  Is that the  number of bytes?  Is that the number
of characters?   Is that the  number of records? Are  there non-binary
files that are not character files?   We only know that when applied to
binary  files, they  have a  unit.  But  would that  be the  number of
elements one  can read  from  the  file? Or  would  that  be the  size
allocated to the file?  Or some other random value?


The page on FILE-POSITION gives some constraints on file positions, but
nothing really relates file positions to file lengths.  For example,
when you open a file, you could very well have:

    (with-open-file (in "sample-file-name" :direction :input)
        (assert (= (file-position in) (length "sample-file-name"))))

or:

    (with-open-file (in "sample-file-name" :direction :input)
        (assert (= (file-position in) (* 3 (file-length in)))))

While the differences in file  positions are specified (= 1 for binary
byte, >=  1 for character), there  is not enough  information to infer
anything about file-length and file-position at the end of the file.





People like  me have a  length.  When people  like me have  peanuts in
their  pockets,  their length  is  measured  in  units of  apple.  For
example, I've  got two peanuts in my  left pocket, and my  length is 6
apples.  But  my friend has no  peanuts, and his length  (he still has
one), is only 5 nano seconds.



-- 
__Pascal_Bourguignon__                     http://www.informatimago.com/
There is no worse tyranny than to force a man to pay for what he doesn't
want merely because you think it would be good for him.--Robert Heinlein
http://www.theadvocates.org/
From: Artie Gold
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <c1t4oe$1m4agg$1@ID-219787.news.uni-berlin.de>
Pascal Bourguignon wrote:
> Artie Gold <·········@austin.rr.com> writes:
> 
> 
>>Peter Seibel wrote:
>>
>>>I'm guessing that on some historically important filesystem the
>>>natural way to get the length of a file was the Lisp way: opening it
>>>and getting the information from the stream. If so, what was that OS?
>>>I only ask because it may seem strange to folks used to Unix-centric
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>languages where that information is available without opening the
    ^^^^^^^^^
>>>file.
>>>
>>
>>Actually that's a misconception; in standard C (a language whose roots
>>are undoubtedly `unix-centric') the only way to get the length of a
>>file (without using some platform specific library) is to `read it and
>>count'.
> 
> 
> Peter did  not ask about languages  (like C), but  about systems (like
> unix). In unix, the file size  is an attribute of the file (inode) and
> is accessible without having to open the file.  You don't even have to
> have read  or search access  rights on the  file you want to  know the
> size!  You just have to have read access on a directory where an entry
> for that file (inode) is stored.
> 
[snip]

See above; he *did* ask about languages. And my response was directed 
toward that.

Your response, on the other hand, was merely informative and 
enlightening, covering the entire subject.

;-)

Cheers,
--ag

From: Tim Bradshaw
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <ey31xodzncw.fsf@cley.com>
* Pascal Bourguignon wrote:

> Peter did not ask about languages (like C), but about systems (like
> unix). In unix, the file size is an attribute of the file (inode)
> and is accessible without having to open the file.  You don't even
> have to have read or search access rights on the file you want to
> know the size!  You just have to have read access on a directory
> where an entry for that file (inode) is stored.

This is wrong.  You need execute permission on the directory: read
permission will let you have the names of the files but no other
information about them.

--tim
From: Steven M. Haflich
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <_ad0c.18372$eE2.6725@newssvr29.news.prodigy.com>
Peter Seibel wrote:
> I'm guessing that on some historically important filesystem the
> natural way to get the length of a file was the Lisp way: opening it
> and getting the information from the stream. If so, what was that OS?

Think about what the integer returned by file-length means.  Then think
about what the integer printed by Unix "ls -l" means.  They mean quite
different things.

> I only ask because it may seem strange to folks used to Unix-centric
> languages where that information is available without opening the
> file.
From: Peter Seibel
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <m3u11ah67y.fsf@javamonkey.com>
"Steven M. Haflich" <·················@alum.mit.edu> writes:

> Peter Seibel wrote:
> > I'm guessing that on some historically important filesystem the
> > natural way to get the length of a file was the Lisp way: opening it
> > and getting the information from the stream. If so, what was that OS?
> 
> Think about what the integer returned by file-length means.  Then think
> about what the integer printed by Unix "ls -l" means.  They mean quite
> different things.

I don't follow you. At least on GNU/Linux the integer printed in the
fourth column of the output of "ls -l" is the size of the file in
bytes. Based on previous discussions of FILE-LENGTH here, my
understanding is that is what FILE-LENGTH returns.

Unless you are telling me that there actually are Lisp implementations
where FILE-LENGTH measures the length of the file in element-type of
the stream (which, as I'm sure you know would require reading the
whole file for some character encodings) I don't see how those numbers
mean anything different at all. At any rate here are the numbers I get
from Allegro:

CL-USER> (with-open-file (out "/tmp/utf8test.txt"
                              :direction :output 
                              :element-type 'character
                              :external-format :utf8)
            (loop for code from 0 below char-code-limit
                  when (code-char code)
                  do (format out "~c" (code-char code)) and count t))
                   
65536
CL-USER> (with-open-file (out "/tmp/utf8test.txt" :element-type 'character :external-format :utf8) (file-length out))
194434
CL-USER> (with-open-file (out "/tmp/utf8test.txt" :element-type '(unsigned-byte 8)) (file-length out))
194434

And here's what I get from ls -l:


[·····@xeon lisp-book]$ ls -l /tmp/utf8test.txt 
-rw-rw-r--    1 peter    peter      194434 Feb 28 20:07 /tmp/utf8test.txt


-Peter

From: Bruno Haible
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <c1t79b$43j$1@laposte.ilog.fr>
Peter Seibel  <·····@javamonkey.com> wrote:
>> Think about what the integer returned by file-length means.  Then think
>> about what the integer printed by Unix "ls -l" means.  They mean quite
>> different things.
>
> I don't follow you. At least on GNU/Linux the integer printed in the
> fourth column of the output of "ls -l" is the size of the file in
> bytes.

"Size" can mean either
  - N: the number of bytes you can read() from a file,
  - M: the maximum allowed file position to which you can lseek(),
  - L: the st_size value, shown by "ls -l".

On DOS and Woe32 systems, you are accustomed to N < M = L, due to the fact
that the OS may convert CR/LF to LF when reading bytes.

But even on Linux you have files where N < M. The /proc/<pid>/maps are
an example. There is no requirement that a read() call which returns
'count' bytes of data increases the file position by exactly 'count'.
I'm not sure whether L should be = N or = M in this case; in the
/proc/<pid>/maps Linus decided to set it to 0.

                      Bruno
From: Peter Seibel
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <m3hdx9hert.fsf@javamonkey.com>
·····@clisp.org (Bruno Haible) writes:

> Peter Seibel  <·····@javamonkey.com> wrote:
> >> Think about what the integer returned by file-length means.  Then think
> >> about what the integer printed by Unix "ls -l" means.  They mean quite
> >> different things.
> >
> > I don't follow you. At least on GNU/Linux the integer printed in the
> > fourth column of the output of "ls -l" is the size of the file in
> > bytes.
> 
> "Size" can mean either
>   - N: the number of bytes you can read() from a file,
>   - M: the maximum allowed file position to which you can lseek(),
>   - L: the st_size value, shown by "ls -l".

Okay, but doesn't FILE-LENGTH still return L? By experimentation I see
that using different sizes of unsigned-byte causes it to return
different answers, at least in Allegro. But as my previous post
showed, it doesn't do that with characters which is the case where
there is another meaning of "length" that is not a simple matter of
arithmetic. When I open a stream with '(unsigned-byte 32) instead of
'(unsigned-byte 8) the length is indeed 1/4 of the value returned by
ls -l but it appears to return just as fast regardless of the length
of the file so I suspect it is doing a stat and dividing st_size by 4,
not actually reading the file.
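
The arithmetic guessed at above can be checked with a small sketch. It
assumes a Unix-like system with a writable /tmp; the file name and the
two helper functions are made up for illustration. The standard only
says the length of a binary file is measured in units of the stream's
element type, so the second value is what one would expect from an
implementation that divides the byte size by 4, not something the
standard spells out.

```lisp
;; Hypothetical scratch file; any writable path would do.
(defparameter *length-test-file* "/tmp/file-length-test.bin")

(defun write-octets (path count)
  "Write COUNT zero octets to PATH, replacing any existing file."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :element-type '(unsigned-byte 8))
    (dotimes (i count)
      (write-byte 0 out))))

(defun length-in-elements (path element-type)
  "Return FILE-LENGTH of PATH when opened with ELEMENT-TYPE."
  (with-open-file (in path :direction :input :element-type element-type)
    (file-length in)))

;; Eight octets on disk, then the same file measured in two units.
(write-octets *length-test-file* 8)
(list (length-in-elements *length-test-file* '(unsigned-byte 8))
      (length-in-elements *length-test-file* '(unsigned-byte 32)))
```

If the implementation derives its answer from the byte size, as the
timing described above suggests, the list should come back as (8 2).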

-Peter

From: Tim Bradshaw
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <ey3smgty85v.fsf@cley.com>
* Peter Seibel wrote:

> Okay, but doesn't FILE-LENGTH still return L? By experimentation I
> see that using different sizes of unsigned-byte causes it to return
> different answers, at least in Allegro. But as my previous post
> showed, it doesn't do that with characters which is the case where
> there is another meaning of "length" that is not a simple matter of
> arithmetic. 

Because the OS makes it very expensive to compute the real length in
characters. Essentially the only way you can do so on a Unix or
Windows system is to read the whole file and see how long it is.
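
A minimal sketch of that "read the whole file and see" approach in
portable Common Lisp. The function name and the scratch file are made
up, and the :UTF-8 external format name used in the demo is an
assumption: external format names are implementation-defined (the
Allegro session earlier in the thread spelled it :UTF8).

```lisp
(defun count-characters (path &key (external-format :default))
  "Count the characters in PATH by reading every one of them."
  (with-open-file (in path :direction :input
                           :element-type 'character
                           :external-format external-format)
    ;; READ-CHAR returns NIL at end of file, which ends the loop.
    (loop for ch = (read-char in nil nil)
          while ch
          count t)))

;; Demo: three characters, one of which (code 233, e-acute) needs two
;; octets in UTF-8.  The path is a hypothetical scratch file.
(defparameter *char-count-test-file* "/tmp/char-count-test.txt")

(with-open-file (out *char-count-test-file* :direction :output
                                            :if-exists :supersede
                                            :external-format :utf-8)
  (write-string (coerce (list #\a (code-char 233) #\b) 'string) out))

(count-characters *char-count-test-file* :external-format :utf-8)
```

The cost is linear in the size of the file, which is exactly the
expense being described; a byte-oriented FILE-LENGTH on the same file
would report one more than the character count.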

--tim
From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87eksdz6yg.fsf@thalassa.informatimago.com>
Peter Seibel <·····@javamonkey.com> writes:

> ·····@clisp.org (Bruno Haible) writes:
> 
> > Peter Seibel  <·····@javamonkey.com> wrote:
> > >> Think about what the integer returned by file-length means.  Then think
> > >> about what the integer printed by Unix "ls -l" means.  They mean quite
> > >> different things.
> > >
> > > I don't follow you. At least on GNU/Linux the integer printed in the
> > > fourth column of the output of "ls -l" is the size of the file in
> > > bytes.
> > 
> > "Size" can mean either
> >   - N: the number of bytes you can read() from a file,
> >   - M: the maximum allowed file position to which you can lseek(),
> >   - L: the st_size value, shown by "ls -l".
> 
> Okay, but doesn't FILE-LENGTH still return L? By experimentation I see
> that using different sizes of unsigned-byte causes it to return
> different answers, at least in Allegro. But as my previous post
> showed, it doesn't do that with characters which is the case where
> there is another meaning of "length" that is not a simple matter of
> arithmetic. When I open a stream with '(unsigned-byte 32) instead of
> '(unsigned-byte 8) the length is indeed 1/4 of the value returned by
> ls -l but it appears to return just as fast regardless of the length
> of the file so I suspect it is doing a stat and dividing st_size by 4,
> not actually reading the file.


On CLISP (which is a good European, universalist software and knows
about UTF-8, Chinese, Ethiopian, Korean, and all kinds of scripts ;-):

[65]> (let
    ((path "/local/users/pascal/src/miscellaneous/tests/misc/UTF-8-demo.utf-8"))
  (with-open-file (in path :direction :input)
    (do ((i 0 (1+ i))
         (ch (read-char in nil nil)(read-char in nil nil)))
        ((null ch)
         (format t "~&clisp says it read               ~6D characters~%" i)))
    (format t "~&clisp says last file position is ~6D~%" (file-position in))
    (format t "~&clisp says file length        is ~6D~%" (file-length in)))
  (with-open-file (in path :direction :input
                      :element-type '(unsigned-byte 32))
    (format t "~&clisp says binary file length is ~6D~%" (file-length in)))
  (format t "~&unix says file size           is ~6D~%"
          (multiple-value-bind (res stat) (linux:|stat| path)
            (if (zerop res)
              (linux:|stat-st_size| stat)
              0))))
clisp says it read                 7627 characters
clisp says last file position is  14058
clisp says file length        is  14058
clisp says binary file length is   3514
unix says file size           is  14058
NIL



On SBCL (which is a bad American, closed-minded, imperialist, pure
ASCII software and knows only about 8-bit characters :-)):

* (let
    ((path "/local/users/pascal/src/miscellaneous/tests/misc/UTF-8-demo.utf-8"))
  (with-open-file (in path :direction :input)
    (do ((i 0 (1+ i))
         (ch (read-char in nil nil)(read-char in nil nil)))
        ((null ch)
         (format t "~&sbcl says it read               ~6D characters~%" i)))
    (format t "~&sbcl says last file position is ~6D~%" (file-position in))
    (format t "~&sbcl says file length        is ~6D~%" (file-length in)))
  (with-open-file (in path :direction :input
                      :element-type '(unsigned-byte 32))
    (format t "~&sbcl says binary file length is ~6D~%" (file-length in)))
  (format t "~&unix says file size          is ~6D~%"
          (multiple-value-bind (s a b c d u g f size) (sb-unix:unix-stat path)
            (declare (ignore a b c d u g f))
            (if s size 0))))
sbcl says it read                14058 characters
sbcl says last file position is  14058
sbcl says file length        is  14058
sbcl says binary file length is   3514
unix says file size          is  14058


Since the standard does not say what a file length is, any answer is good.
Note that (mod 14058 4) == 2.




I'd say that the safest bet would be to use:

 (with-open-file (in path :direction :input :element-type '(unsigned-byte 8))
    (file-length in))

to get the unix file size, at least on any sane implementation. But if
you want any guarantee, you had better use FFI and stat(2).
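
Wrapped up as functions, the form above might look like the sketch
below. Both function names are made up, and the whole thing rests on
the assumption just stated: that FILE-LENGTH on an (unsigned-byte 8)
stream reports the size in octets, which the standard does not promise.

```lisp
(defun file-size-in-octets (path)
  "Return FILE-LENGTH of PATH opened as a stream of 8-bit bytes.
Assumes the implementation reports this in octets."
  (with-open-file (in path :direction :input
                           :element-type '(unsigned-byte 8))
    (file-length in)))

(defun total-size-in-octets (paths)
  "Sum FILE-SIZE-IN-OCTETS over the list PATHS; 0 for an empty list."
  (reduce #'+ paths :key #'file-size-in-octets))
```

Each call opens and closes the file, so this costs more than a single
stat(2), and it needs read permission on the file itself.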

    
From: Tim Bradshaw
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <ey3wu65y8b0.fsf@cley.com>
* Peter Seibel wrote:

> I don't follow you. At least on GNU/Linux the integer printed in the
> fourth column of the output of "ls -l" is the size of the file in
> bytes. Based on previous discussions of FILE-LENGTH here, my
> understanding is that is what FILE-LENGTH returns.

ls -l does indeed tell you the length in bytes.  This is very often
not useful.  If you've ever used an obscure OS called `Windows' (or
its predecessor, `MSDOS'), you'll know that the byte length of a text
file generally does not correspond to the length of the file in
characters as read by most Lisp implementations, because the
end-of-line sequence, which reads as one character, is represented in
the file as two.

Even stranger: there are countries (yes, really) which not only need
more than 7 bit characters to represent their alphabets, but need more
than *8*.  Worse, there's more than one of these countries, with
mutually incompatible alphabets.  You might need 16 or 32 bits to
represent a character.  Of course, you might then encode data in files
to keep it reasonably compact.  The length of the file in bytes then
bears almost no relation to its length in characters.

Of course, FILE-LENGTH generally doesn't solve this problem either,
but it has a better chance of providing the right answer than ls -l:
on a system which kept better metadata about files (lengths in octets,
characters &c), then once the file is opened there is probably enough
information to return a meaningful length.

--tim
From: Peter Seibel
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <m34qt9gxp3.fsf@javamonkey.com>
Tim Bradshaw <···@cley.com> writes:

> Of course, FILE-LENGTH generally doesn't solve this problem either,
> but it has a better chance of providing the right answer than ls -l:
> on a system which kept better metadata about files (lengths in
> octets, characters &c), then once the file is opened there is
> probably enough information to return a meaningful length.

Which brings me back to my original question: is there or was there
some operating system which worked this way--where the natural way to
get the length of a file required performing the same operation one
performed to read data from the file?

-Peter

From: Christopher C. Stacy
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <ud67xfdzk.fsf@news.dtpq.com>
>>>>> On Mon, 01 Mar 2004 01:31:20 GMT, Peter Seibel ("Peter") writes:

 Peter> Tim Bradshaw <···@cley.com> writes:
 >> Of course, FILE-LENGTH generally doesn't solve this problem either,
 >> but it has a better chance of providing the right answer than ls -l:
 >> on a system which kept better metadata about files (lengths in
 >> octets, characters &c), then once the file is opened there is
 >> probably enough information to return a meaningful length.

 Peter> Which brings me back to my original question: is there or was
 Peter> there some operating system which worked this way--where the
 Peter> natural way to get the length of a file required performing
 Peter> the same operation one performed to read data from the file?

The operating systems of the 1970/80s often had file systems whose
directories knew a file's character size.  Just not Unix and DOS.
You did not need to open the file and read it in order to guess.  
But the Lisp implementation might want to open up the file and peek 
at the first few bytes in order to discover that it was using some
particular character encoding (of known uniform size characters).

I think the LispM might have done that under some circumstances, 
but I can't really remember.  (I was thinking of the extended
character set with font encodings understood by ZMACS, but hmmm,
weren't those variable?  So I don't think that's right...)

However, you could also place arbitrary metadata in the LispM's native
filesystem directories.  (Directory entries could have a plist).

Obviously once you're talking about newline encoding or other 
variable-length randomly occurring characters, you just have to 
count them up unless the file system already counted them up.
I think the file system (aka "Record Management System") on
VMS would keep track for you, if you told it that each line
was a "record".
From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87k725xm1z.fsf@thalassa.informatimago.com>
Peter Seibel <·····@javamonkey.com> writes:

> Tim Bradshaw <···@cley.com> writes:
> 
> > Of course, FILE-LENGTH generally doesn't solve this problem either,
> > but it has a better chance of providing the right answer than ls -l:
> > on a system which kept better metadata about files (lengths in
> > octets, characters &c), then once the file is opened there is
> > probably enough information to return a meaningful length.
> 
> Which brings me back to my original question: is there or was there
> some operating system which worked this way--where the natural way to
> get the length of a file required performing the same operation one
> performed to read data from the file?

That's the wrong question. (The obvious answer being yes).

The right question is: what the hell does Common Lisp understand
FILE-LENGTH to be, exactly???



Now, on old systems, where there is no notion of files as streams of
bytes, you could have, for example, 80-column Hollerith card image
files.  On sectors of 512 bytes (not _so_ old systems, just systems
with an _old_ heritage; I'm thinking in this case of the ICL S25),
they would store 6 records of 80 characters and leave 32 bytes per
sector either lost or for file system usage (file ID, deleted flags,
next/previous pointers, for example).  So, when you consider such a
mess^W file, what do you call its size?

    - the number of 512-byte sectors allocated?
    - the number of non-deleted "cards"?
    - the number of characters?

How do you count these characters?  On a punched card, where there
are no holes you have a space character (a null character =
space).  So you usually have 80 characters per line/card.  But a
program that wanted to process variable-sized lines could write a
line termination character and ignore all characters after it
(a terminator that might well not appear if all 80 columns were used).

Obviously, to count the characters,  or even to count the cards, you'd
have to open the file and read it.  

AFAIK, unix was the first system to have file systems with a notion of
inode.  I may  be wrong here, but I've got  the impression that before
unix, you could have directories  with some meta data, but usually you
had to "open" the files to  get their properties. On unix you can read
all the  properties, the whole  inode, with stat(2) without  having to
open the file.


From: Christopher C. Stacy
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <uvflpdwx8.fsf@news.dtpq.com>
>>>>> On 01 Mar 2004 04:50:48 +0100, Pascal Bourguignon ("Pascal") writes:

 Pascal> AFAIK, unix was the first system to have file systems with a notion of
 Pascal> inode.  I may  be wrong here, but I've got  the impression that before
 Pascal> unix, you could have directories  with some meta data, but usually you
 Pascal> had to "open" the files to  get their properties. On unix you can read
 Pascal> all the  properties, the whole  inode, with stat(2) without  having to
 Pascal> open the file.

Where in the hell do people get such crazy ideas?
From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87brngxqft.fsf@thalassa.informatimago.com>
······@news.dtpq.com (Christopher C. Stacy) writes:

> >>>>> On 01 Mar 2004 04:50:48 +0100, Pascal Bourguignon ("Pascal") writes:
> 
>  Pascal> AFAIK, unix was the first system to have file systems with
>  Pascal> a notion of inode.  I may  be wrong here, but I've got  the
>  Pascal> impression that before unix, you could have directories
>  Pascal> with some meta data, but usually you had to "open" the
>  Pascal> files to  get their properties. On unix you can read all
>  Pascal> the  properties, the whole  inode, with stat(2) without
>  Pascal> having to open the file.
> 
> Where in the hell do people get such crazy ideas?

AFAIK = As Far As I Know.

Perhaps I don't know enough.  For sure, I don't know everything.

From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87ad31z6nw.fsf@thalassa.informatimago.com>
Tim Bradshaw <···@cley.com> writes:

> * Peter Seibel wrote:
> 
> > I don't follow you. At least on GNU/Linux the integer printed in the
> > fourth column of the output of "ls -l" is the size of the file in
> > bytes. Based on previous discussions of FILE-LENGTH here, my
> > understanding is that is what FILE-LENGTH returns.
> 
> ls -l does indeed tell you the length in bytes.  This is very often
> not useful.  

NO.  It is  very useful.  The need to  open the file to get  a size is
harmful because  it implies  a specific assumption  about what  is wanted.
Sorry, but  I NEVER need  to know how  many characters are in  a file.
Even  journalists generally  don't need  to know  how  many characters
they've typed, they just need to know how many WORDS they've got.

But, I often need to know how many BYTES a file takes, because I often
need to copy files to make backups and it's important to be able to
efficiently sum up the byte sizes of several files.

The way the file system API is defined in Common-Lisp implies that you
cannot write system tools such as a simple backup program in portable
Common-Lisp. You must rely on an OS-specific layer.
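
For what it's worth, the byte total itself can probably be obtained with
standard functions only, at the cost of opening every file. A minimal
sketch, assuming 8-bit bytes and that FILE-LENGTH on an (UNSIGNED-BYTE 8)
stream reports the octet count; TOTAL-OCTETS is a hypothetical name, not
part of any standard:

```lisp
(defun total-octets (paths)
  "Sum the lengths, in octets, of the files named by PATHS.
Each file is opened as a binary stream; no OS-specific layer is used."
  (loop for path in paths
        sum (with-open-file (in path :direction :input
                                     :element-type '(unsigned-byte 8))
              ;; FILE-LENGTH may return NIL when the length cannot be
              ;; determined; this sketch counts such files as 0.
              (or (file-length in) 0))))
```

This does not remove the objection: every file still has to be opened,
which is exactly what a stat(2)-style call avoids.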


From: Tim Bradshaw
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <ey3r7w8dsy8.fsf@cley.com>
* Pascal Bourguignon wrote:
> NO.  It is  very useful.  The need to  open the file to get  a size is
> nefast because  it implies  a specific assumption  on what  is wanted.
> Sorry, but  I NEVER need  to know how  many characters are in  a file.
> Even  journalists generally  don't need  to know  how  many characters
> they've typed, they just need to know how many WORDS they've got.

So, for instance, you never want to efficiently read a file into a
string.  OK.

--tim
From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87ad2q2mcw.fsf@thalassa.informatimago.com>
Tim Bradshaw <···@cley.com> writes:

> * Pascal Bourguignon wrote:
> > NO.  It is  very useful.  The need to  open the file to get  a size is
> > nefast because  it implies  a specific assumption  on what  is wanted.
> > Sorry, but  I NEVER need  to know how  many characters are in  a file.
> > Even  journalists generally  don't need  to know  how  many characters
> > they've typed, they just need to know how many WORDS they've got.
> 
> So, for instance, you never want to efficiently read a file into a
> string.  OK.

When I want to load a whole file in memory, either I memory map it, or
I gather it as a list of lines.  But this is not the point.



We could define abstract attributes for files.  Perhaps we need high
level attributes such as the number of characters, but then specify them
explicitly and precisely:


#+COMMON-LISP-2010 "

FILE-SIZE: return the number of bytes in the file.

            (file-size path) === (with-open-file (in path :direction :input
                                               :element-type 'unsigned-byte)
                                    (do ((size 0 (1+ size))
                                         (byte (read-byte in nil nil)
                                               (read-byte in nil nil)))
                                        ((null byte) size)))


FILE-CHARACTER-COUNT: return the number of characters in the file.

   (file-character-count path) === (with-open-file (in path :direction :input
                                                    :element-type 'character)
                                    (do ((size 0 (1+ size))
                                         (char (read-char in nil nil)
                                               (read-char in nil nil)))
                                        ((null char) size)))


FILE-LINE-COUNT: return the number of lines in the file.

        (file-line-count path) === (with-open-file (in path :direction :input
                                                    :element-type 'character)
                                    (do ((size 0 (1+ size))
                                         (line (read-line in nil nil)
                                               (read-line in nil nil)))
                                        ((null line) size)))



These functions would be defined on  the PATH of the files because the
file system may very well keep their values as meta data.  If not, the
implementation would open and read the file.

"
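
A runnable sketch of the three proposed functions, using only standard
operators. The names follow the proposal above; the choice of
(UNSIGNED-BYTE 8) as the byte type and the fast path through FILE-LENGTH
are assumptions of this sketch, and the character and line counts depend
on the implementation's default external format:

```lisp
(defun file-size (path)
  "Number of octets in the file at PATH."
  (with-open-file (in path :direction :input
                           :element-type '(unsigned-byte 8))
    ;; Ask the stream first; fall back to counting bytes if it cannot say.
    (or (file-length in)
        (loop while (read-byte in nil nil) count t))))

(defun file-character-count (path)
  "Number of characters READ-CHAR delivers from the file at PATH."
  (with-open-file (in path :direction :input :element-type 'character)
    (loop while (read-char in nil nil) count t)))

(defun file-line-count (path)
  "Number of lines READ-LINE delivers from the file at PATH."
  (with-open-file (in path :direction :input :element-type 'character)
    (loop while (read-line in nil nil) count t)))
```

An implementation with access to file system metadata could replace any
of these bodies without changing the interface.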



To finish with your question:

> So, for instance, you never want to efficiently read a file into a
> string.  OK.

anyway  the current  COMMON-LISP  standard DOES  NOT SPECIFY  ANYTHING
WORTHWHILE as result for FILE-LENGTH, when the file has been opened with
a character (non-binary) element-type.


    file-length returns the length of stream, or nil if the length
    cannot be determined.

    For a binary file, the length is measured in units of the element
    type of the stream.

so if  you want to "efficiently" read  a file into a  string, you will
have in any case to read it twice:



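(defun efficient-read-file-into-a-string (path)
  "Return the contents of the file at PATH as a string, reading it twice."
  ;; First pass: count the characters so the string can be sized exactly.
  (let ((string (make-string
                 (with-open-file (in path :direction :input
                                          :element-type 'character)
                   (do ((size 0 (1+ size))
                        (char (read-char in nil nil)
                              (read-char in nil nil)))
                       ((null char) size))))))
    ;; Second pass: fill the preallocated string.
    (with-open-file (in path :direction :input :element-type 'character)
      (read-sequence string in))
    string))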




But I bet that it will be more efficient to read the file only once
and extend the string as needed.  I could even apply this _heuristic_:
allocate a string that can hold as many characters as there are bytes
in the file, read the characters from the file, extending the string
if ever needed. When the file has been read, reduce the size of the
string to fit.
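
A minimal sketch of that heuristic, assuming 8-bit bytes and that the
byte count can be had from FILE-LENGTH on a binary stream.
READ-FILE-INTO-STRING-ONCE is a hypothetical name, and the fallback for
"more characters than bytes" is only there for completeness:

```lisp
(defun read-file-into-string-once (path)
  "Read the file at PATH into a string with a single character pass.
The string is preallocated from the byte count, then trimmed to fit."
  (let* ((estimate (with-open-file (in path :direction :input
                                            :element-type '(unsigned-byte 8))
                     (or (file-length in) 0)))
         (buffer (make-string estimate)))
    (with-open-file (in path :direction :input :element-type 'character)
      (let ((end (read-sequence buffer in)))
        (cond
          ;; Fewer characters than bytes (multi-byte encoding): trim.
          ((< end estimate) (subseq buffer 0 end))
          ;; Buffer exactly full and nothing left to read: done.
          ((null (peek-char nil in nil nil)) buffer)
          ;; More characters than estimated: extend by reading the rest.
          (t (with-output-to-string (out)
               (write-string buffer out)
               (loop for char = (read-char in nil nil)
                     while char
                     do (write-char char out)))))))))
```

The file is still opened twice, but its contents are read only once; the
first open merely asks for the length.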

From: Tim Bradshaw
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <fbc0f5d1.0403100226.65a39e9d@posting.google.com>
Pascal Bourguignon <····@thalassa.informatimago.com> wrote in message news:<··············@thalassa.informatimago.com>...

> anyway  the current  COMMON-LISP  standard DOES  NOT SPECIFY  ANYTHING
> WORTHWHILE as result for FILE-LENGTH, when the file has been opened with
> a character (non-binary) element-type.
> 

It's good that it doesn't specify anything useful, because if it did
this would make it a very expensive operation on many current OSs for
character streams.  However it leaves open the *possibility* of
returning a useful result if, for instance, OSs which keep this kind
of data were ever to appear. I kind of like that, but I guess that's
because I'm a human being, not a computer scientist.
From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87ishczie9.fsf@thalassa.informatimago.com>
··········@tfeb.org (Tim Bradshaw) writes:

> Pascal Bourguignon <····@thalassa.informatimago.com> wrote in message news:<··············@thalassa.informatimago.com>...
> 
> > anyway  the current  COMMON-LISP  standard DOES  NOT SPECIFY  ANYTHING
> > WORTHWHILE as result for FILE-LENGTH, when the file has been opened with
> > a character (non-binary) element-type.
> > 
> 
> It's good that it doesn't specify anything useful, because if it did
> this would make it a very expensive operation on many current OSs for
> character streams.  However it leaves open the *possibility* of
> returning a useful result if, for instance, OSs which keep this kind
> of data were ever to appear. I kind of like that, but I guess that's
> because I'm a human being, not a computer scientist.

Yep. If you were a computer programmer, you'd be lazy.

The problem with this kind of specification is that it gives you (a
little) more work.  Since you cannot count on conforming
implementations to provide the service, you have to implement it
yourself anyway.  The little extra work is the time you spend reading
the feature specification and realizing that it's useless.


From: Ray Dillinger
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <40454215.AB4AD44B@sonic.net>
Tim Bradshaw wrote:
> 

> Even stranger: there are countries (yes, really) which not only need
> more than 7 bit characters to represent their alphabets, but need more
> than *8*.  Worse, there's more than one of these countries, with
> mutually incompatible alphabets.  You might need 16 or 32 bits to
> represent a character.  Of course, you might then encode data in files
> to keep it reasonably compact.  The length of the file in bytes then
> bears almost no relation to its length in characters.
> 
> Of course, FILE-LENGTH generally doesn't solve this problem either,
> but it has a better chance of providing the right answer than ls -l:
> on a system which kept better metadata about files (lengths in octets,
> characters &c), then once the file is opened there is probably enough
> information to return a meaningful length.

Oh, it's even stranger than that; recently I've been involved in a 
spirited debate as to whether a "character" is actually a single 
unicode codepoint, or a base codepoint plus nondefective sequence 
of zero or more combining codepoints. 

And my conclusion is that when you ask "read-char" for a character 
on a unicode-enabled system, it ought to give you the latter, not
the former.  Likewise characters in strings and character values 
should be unicode sequences.  
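
To make the difference concrete, here is a rough sketch that counts base
characters while skipping combining marks. It assumes a Lisp whose
characters are Unicode code points, and it only recognizes the Combining
Diacritical Marks block (U+0300 to U+036F); real grapheme segmentation
has many more rules. Both function names are hypothetical:

```lisp
(defun combining-mark-p (char)
  "True if CHAR lies in the Combining Diacritical Marks block.
This is a deliberate simplification of the Unicode rules."
  (<= #x0300 (char-code char) #x036F))

(defun base-character-count (string)
  "Count the characters of STRING that are not combining marks.
For nondefective sequences this approximates the 'base plus combining
sequence' notion of a character."
  (count-if-not #'combining-mark-p string))
```

For "e" followed by U+0301 and then "x", LENGTH reports 3 while
BASE-CHARACTER-COUNT reports 2.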

And FILE-LENGTH?  If you want the length in readable characters you 
have to open it and read it.  If you want the length in octets, take
what the operating system gives you. 

				Bear
From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87k721begh.fsf@thalassa.informatimago.com>
Ray Dillinger <····@sonic.net> writes:
> Oh, it's even stranger than that; recently I've been involved in a 
> spirited debate as to whether a "character" is actually a single 
> unicode codepoint, or a base codepoint plus nondefective sequence 
> of zero or more combining codepoints. 
> 
> And my conclusion is that when you ask "read-char" for a character 
> on a unicode-enabled system, it ought to give you the latter, not
> the former.  Likewise characters in strings and character values 
> should be unicode sequences.  

Indeed.
 
> And FILE-LENGTH?  If you want the length in readable characters you 
> have to open it and read it.  If you want the length in octets, take
> what the operating system gives you. 

And what's worse,  I don't think it's prudent  (portable) (ie, I think
there exists some system where it's  not possible) to open a text file
as binary or a  binary file as a text file.  That  is, when you have a
text file you cannot use FILE-LENGTH to get the number of bytes in it.
Ok, I'm  not concerned  since I use  unix and  don't even hope  to get
anything better before long, if ever.


From: Tim Bradshaw
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <ey3n06wds72.fsf@cley.com>
* Ray Dillinger wrote:

> And FILE-LENGTH?  If you want the length in readable characters you 
> have to open it and read it.  If you want the length in octets, take
> what the operating system gives you. 

Only on systems which don't know this information.  For systems which
do, it may well be enough to tell it how you want to read the file (or
let the system tell you how the file is encoded), and then just ask.
Progress is possible.

--tim
From: Ray Dillinger
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <404A20FF.33240CAD@sonic.net>
Tim Bradshaw wrote:
> 
> * Ray Dillinger wrote:
> 
> > And FILE-LENGTH?  If you want the length in readable characters you
> > have to open it and read it.  If you want the length in octets, take
> > what the operating system gives you.
> 
> Only on systems which don't know this information.  For systems which
> do, it may well be enough to tell it how you want to read the file (or
> let the system tell you how the file is encoded), and then just ask.
> Progress is possible.
> 
Are there any filesystems that allow us to "decorate" file entries 
with information like unicode grapheme (base char plus combining sequence) 
counts?

				Bear
From: Pascal Bourguignon
Subject: Re: Why does FILE-LENGTH take a stream rather a pathname?
Date: 
Message-ID: <87ad2t7fp3.fsf@thalassa.informatimago.com>
Ray Dillinger <····@sonic.net> writes:
> Are there any filesystems that allow us to "decorate" file entries 
> with information like unicode grapheme (base char plus combining sequence) 
> counts?

The details of the mechanism don't matter.  (You could use resource
forks on MacHFS, the attributes that I believe some MS-Windows-NT file
systems have, or just store it in a file named
".${filename}.attributes" or whatever).

The  problem is to  keep the  consistency of  this attribute  with the
contents of the file.

Either the system maintains it itself, automatically and
authoritatively, or you (the applications) will have to read the file
anyway to count the characters or the graphemes.
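
A sketch of the application-maintained variant, to show where the
consistency problem bites. It assumes a sidecar file named by appending
".attributes" to the file name (a hypothetical convention) and uses
FILE-WRITE-DATE as the staleness check, which has only one-second
resolution and so can miss a quick rewrite:

```lisp
(defun attributes-path (path)
  "Name of the sidecar file for PATH (hypothetical naming convention)."
  (concatenate 'string (namestring path) ".attributes"))

(defun count-characters (path)
  "Count characters the slow way, by reading the whole file."
  (with-open-file (in path :direction :input :element-type 'character)
    (loop while (read-char in nil nil) count t)))

(defun cached-character-count (path)
  "Return the character count of PATH, using the sidecar when it is fresh."
  (let ((stamp (file-write-date path))
        (cache (attributes-path path)))
    (or
     ;; Trust the stored count only if it was taken at the same write date.
     (with-open-file (in cache :direction :input :if-does-not-exist nil)
       (when in
         (let ((entry (let ((*read-eval* nil)) (read in nil nil))))
           (when (and (consp entry) (eql (first entry) stamp))
             (second entry)))))
     ;; Otherwise read the file after all, and refresh the sidecar.
     (let ((count (count-characters path)))
       (with-open-file (out cache :direction :output :if-exists :supersede)
         (print (list stamp count) out))
       count))))
```

Nothing stops another program from changing the file without touching
the sidecar, which is why only the system itself can keep such an
attribute authoritative.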


Note that nowadays there is so much RAM that we usually load whole
files (or even whole databases!) into RAM before working on them, so
it should not be too costly to count the characters each time we
"open" a file.  That's when you realize that Multics segments are what
we really need (I've known this since 1984/MacOS), instead of
open/read/write/close.  Happily, we can use memory mapping on modern
unix systems.  Unfortunately, there's not much support for memory
mapping in Lisp.  How would one memory map and garbage collect "file"
segments?

This should be an area  of some experimentation and of standardization
for Common-Lisp-2010 !
