Hello,
I am learning Lisp and have been lurking here for some time now.
There was an interesting post (for a novice like me) about loading a
file quickly, so that each line is appended to a list. Actually this was
the wrong way: somebody pointed out that it leads to quadratic time
consumption (doubling the file size quadruples the time), so the
solution was to put new lines at the beginning and reverse the list
after finishing.
(I am wondering whether one of those list-copying functions would be an
alternative, which is why I ask:)
I would like to reread those articles, but I cannot find them via
Google Groups. Does somebody remember the thread's name?
Thanks,
Bernd
P.S.: vacation is over, the sun starts shining again and I am still not
finished with my Lisp book - grrr
--
https://gna.org/projects/mipisti - (microscope) picture stitching
T_a_k_e__c_a_r_e__o_f__y_o_u_r__R_I_G_H_T_S.
P_r_e_v_e_n_t__L_O_G_I_C--P_A_T_E_N_T_S
http://www.ffii.org, http://www.nosoftwarepatents.org
Bernd Schmitt <··················@gmx.net> writes:
> Hello,
>
> I am learning lisp and so I am lurking here for some time now.
>
> There was an interesting post (for a novice like me) about fast
> loading a file, so that each line would be appended to a
> list. Actually this was the wrong way, somebody pointed out that this
> would lead to quadratic time consumption if file size would doubled,
> so the solution had been to put new lines at the beginning and reverse
> the list after finishing.
Note that instead of using append or nconc, you could write this:

    (defstruct (managed-list (:conc-name ml-)) head tail)

    (defun ml-enqueue (ml item)
      (if (null (ml-head ml))
          (setf (ml-head ml) (list item)
                (ml-tail ml) (ml-head ml))
          (setf (cdr (ml-tail ml)) (list item)
                (ml-tail ml) (cdr (ml-tail ml))))
      ml)

    (let ((l (make-managed-list)))
      (print (ml-enqueue l :a))
      (print (ml-enqueue l :b))
      (print (ml-enqueue l :c))
      (ml-head l))

    #S(MANAGED-LIST :HEAD (:A) :TAIL (:A))
    #S(MANAGED-LIST :HEAD (:A :B) :TAIL (:B))
    #S(MANAGED-LIST :HEAD (:A :B :C) :TAIL (:C))
    --> (:A :B :C)

This temporarily uses a little more memory than the push/nreverse
scheme (O(1): a constant of 2, for the head and tail slots), and takes
the same time. If the list is bigger than the cache, it might be
slightly more efficient, however, since nreverse has to walk the whole
list again and so reload the cache several times.
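
For comparison, here is a minimal sketch of the two approaches mentioned at the start of the thread: appending each line to the end of the list versus pushing onto the front and reversing once. The function names READ-LINES-APPEND and READ-LINES-PUSH are made up for this illustration, and both take an already-open character stream.

```lisp
;; Hypothetical names, for illustration only.

(defun read-lines-append (stream)
  "Quadratic: APPEND copies the whole accumulated list on every line."
  (let ((lines '()))
    (loop for line = (read-line stream nil nil)
          while line
          do (setf lines (append lines (list line))))
    lines))

(defun read-lines-push (stream)
  "Linear: PUSH conses onto the front; NREVERSE fixes the order once."
  (let ((lines '()))
    (loop for line = (read-line stream nil nil)
          while line
          do (push line lines))
    (nreverse lines)))

;; Usage on a string stream, so no file is needed:
(with-input-from-string (s (format nil "a~%b~%c~%"))
  (read-lines-push s))   ; => ("a" "b" "c")
```

Both return the lines in file order; only the amount of work per line differs.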
> (I am wondering, if a use of one of those list-copying functions would
> be an alternative, but therefore:)
> I would like to reread those articles, but I can not find it via
> google-groups. Does somebody remember the articles thread name?
--
__Pascal Bourguignon__ http://www.informatimago.com/
The rule for today:
Touch my tail, I shred your hand.
New rule tomorrow.
Bernd Schmitt <··················@gmx.net> wrote:
+---------------
| There was an interesting post (for a novice like me) about fast loading
| a file, so that each line would be appended to a list. Actually this was
| the wrong way, somebody pointed out that this would lead to quadratic
| time consumption if file size would doubled, so the solution had been to
| put new lines at the beginning and reverse the list after finishing.
+---------------
If you're willing to use a simple LOOP without necessarily
understanding it completely at this point in your learning, ;-}
the following [from my "random utilities" junkbox] has linear
behavior in most CL implementations:

    (defun file-lines (path)
      "Sucks up an entire file from PATH into a list of freshly-allocated
      strings, returning two values: the list of strings and the number of
      lines read."
      (with-open-file (s path)
        (loop for line = (read-line s nil nil)
          while line
          collect line into lines
          counting t into line-count
          finally (return (values lines line-count)))))

With CMUCL-19a on FreeBSD 4.10 on a 1.855 GHz Mobile Athlon, the
above takes a hair over one second to read in an 11637220-byte file
of 266478 lines.
And if, instead of a list of lines, you suck up the whole thing into
a single string [for later manipulation with CHAR or SUBSEQ]:

    (defun file-string (path)
      "Sucks up an entire file from PATH into a freshly-allocated string,
      returning two values: the string and the number of bytes read."
      (with-open-file (s path)
        (let* ((len (file-length s))
               (data (make-string len)))
          (values data (read-sequence data s)))))

then on the same machine/file/etc. as above, it goes ~20 times as fast
(0.05 seconds of real time).
-Rob
-----
Rob Warnock <····@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
Rob Warnock wrote:
> Bernd Schmitt <··················@gmx.net> wrote:
> +---------------
> | There was an interesting post (for a novice like me) about fast loading
> | a file, so that each line would be appended to a list. Actually this was
> | the wrong way, somebody pointed out that this would lead to quadratic
> | time consumption if file size would doubled, so the solution had been to
> | put new lines at the beginning and reverse the list after finishing.
> +---------------
I finally found it: "lists of lists (newbie)" from (Stephen Ramsay),
http://groups.google.com/group/comp.lang.lisp/browse_frm/thread/1f4af3e1b03b3cc7/ecc5cb97415e7a09?tvc=1&q=with-open-file#ecc5cb97415e7a09
> If you're willing to use a simple LOOP without necessarily
> understanding it completely at this point in your learning, ;-}
You are right. Therefore I would like to ask some novice questions about
your code.
> the following [from my "random utilities" junkbox] has linear
> behavior in most CL implementations:
>
> (defun file-lines (path)
> "Sucks up an entire file from PATH into a list of freshly-allocated
> strings, returning two values: the list of strings and the number of
> lines read."
> (with-open-file (s path)
> (loop for line = (read-line s nil nil)
why is this indented like this?
> while line
> collect line into lines
> counting t into line-count
> finally (return (values lines line-count)))))
Is this a kind of pseudo-code (otherwise I would be missing some parens)?
If this is too novice-like, please excuse me and just give me the
keyword to google/cliki/pcl/...
Thank you,
Bernd
Bernd Schmitt <··················@gmx.net> wrote:
+---------------
| Rob Warnock wrote:
| > the following [from my "random utilities" junkbox] has linear
| > behavior in most CL implementations:
| >
| > (defun file-lines (path)
| > "Sucks up an entire file from PATH into a list of freshly-allocated
| > strings, returning two values: the list of strings and the number of
| > lines read."
| > (with-open-file (s path)
| > (loop for line = (read-line s nil nil)
| why is this indented like this?
+---------------
Well, it *wasn't*, when I posted it! ;-} ;-}
It was indented like this, with the LOOP indented two spaces inside
the WITH-OPEN-FILE (which was in turn two spaces inside the DEFUN):

    (defun file-lines (path)
      "Sucks up an entire file from PATH into a list of freshly-allocated
      strings, returning two values: the list of strings and the number of
      lines read."
      (with-open-file (s path)
        (loop for line = (read-line s nil nil)
          while line
          collect line into lines
          counting t into line-count
          finally (return (values lines line-count)))))

Customarily, the bodies of forms which define things or establish
bindings are indented two spaces in from the beginning of the enclosing
form. The above follows this for the bodies of DEFUN and WITH-OPEN-FILE.
LOOP indenting is slightly different, since it has a more COBOL-like
syntax which uses unevaluated symbols [but not necessarily from the
KEYWORD package!] as syntax markers or "LOOP keywords". In the above
LOOP, the syntax markers used are FOR, =, WHILE, COLLECT, INTO, COUNTING,
and FINALLY. [It is this non-Lispy embedded syntax that made me include
in my previous posting the caveat: "If you're willing to use a simple
LOOP without necessarily understanding it completely..."]
Opinions differ on proper style for LOOP indenting. A LOOP form may
contain binding/initializing/stepping sub-forms as well as "action"/
collecting/counting/summing/terminating sub-forms. Some people [and
the editors they use] like to indent the entire body of the LOOP by
five spaces, to line up with (typically) the first binding/initializing/
stepping sub-form, like this:

    (loop for line = (read-line s nil nil)
          while line
          collect line into lines
          counting t into line-count
          finally (return (values lines line-count)))

Others [yours truly among them] prefer to [and have configured their
editors to automate] indenting the binding/initializing/stepping
sub-forms five spaces, but then backing out to a two-space indent
for the "action"/collecting/counting/summing/terminating sub-forms,
like so:

    (loop for line = (read-line s nil nil)
      while line
      collect line into lines
      counting t into line-count
      finally (return (values lines line-count)))

This causes the indenting to resemble that of other binding and
iteration forms in the language.
Whatever style you choose, try to stick to it [though, where it
makes sense to do so, not slavishly] so the readers of your code
don't wonder why some of your code looks one way and some another.
[Otherwise, they may try to read meaning that isn't there into the
entrails of your LOOPs!]
+---------------
| > while line
| > collect line into lines
| > counting t into line-count
| > finally (return (values lines line-count)))))
| is this a kind of pseudo-code (otherwise i would miss some parens)?
+---------------
Yes, one might well say that!! ;-} ;-}
It is widely agreed by both those who love the Common Lisp LOOP
macro and those who hate it that LOOP is one of the *least* "Lispy"
parts of the CL standard -- it is truly an example of using a macro
to embed a different, application-specific language inside normal CL.
Nevertheless, it provides concise, powerful expressiveness for a
large percentage of the iteration tasks one runs into in practice.
Some resources for when you're ready to dig into it further:
Tutorial:
http://gigamonkeys.com/book/macros-standard-control-constructs.html
[the section near the bottom called "The Mighty Loop"]
http://gigamonkeys.com/book/loop-for-black-belts.html
[the whole chapter]
The Standard [well, the CLHS, based on the ANSI CL Standard]:
http://www.lispworks.com/documentation/HyperSpec/Body/06_a.htm
"6.1 The LOOP Facility"
http://www.lispworks.com/documentation/HyperSpec/Body/06_aab.htm
"6.1.1.2 Loop Keywords"
http://www.lispworks.com/documentation/HyperSpec/Body/m_loop.htm
"Macro LOOP"
-Rob
Rob Warnock wrote:
> Bernd Schmitt <··················@gmx.net> wrote:
[indentation quoting error]
> Well, it *wasn't*, when I posted it! ;-} ;-}
So it was Mozilla's error - I really should learn how to use Emacs for
Usenet ;)
> Some resources for when you're ready to dig into it further:
>
> Tutorial:
> http://gigamonkeys.com/book/macros-standard-control-constructs.html
> [the section near the bottom called "The Mighty Loop"]
> http://gigamonkeys.com/book/loop-for-black-belts.html
> [the whole chapter]
>
> The Standard [well, the CLHS, based on the ANSI CL Standard]:
> http://www.lispworks.com/documentation/HyperSpec/Body/06_a.htm
> "6.1 The LOOP Facility"
> http://www.lispworks.com/documentation/HyperSpec/Body/06_aab.htm
> "6.1.1.2 Loop Keywords"
> http://www.lispworks.com/documentation/HyperSpec/Body/m_loop.htm
> "Macro LOOP"
Many thanks, I will do so. I just got past destructive functions in the
Lisp book I am currently reading (nconc, ... -> loop is another 100
pages away :). I should focus on reading before posting next time...
Thank you,
Bernd
From: Edi Weitz
Subject: Re: Can not find older posting: Reading files (fast)
Date:
Message-ID: <uy86mozhs.fsf@agharta.de>
On Sun, 28 Aug 2005 05:00:34 -0500, ····@rpw3.org (Rob Warnock) wrote:
> Well, it *wasn't*, when I posted it! ;-} ;-}
You used tabs which very often leads to this kind of confusion. IMHO
literal tabs should be avoided in source code.
(setq-default indent-tabs-mode nil)
Cheers,
Edi.
--
Lisp is not dead, it just smells funny.
Real email: (replace (subseq ·········@agharta.de" 5) "edi")
Edi Weitz <········@agharta.de> wrote:
+---------------
| ····@rpw3.org (Rob Warnock) wrote:
| > Well, it *wasn't*, when I posted it! ;-} ;-}
|
| You used tabs which very often leads to this kind of confusion.
+---------------
Ouch!! (*blush!*) I didn't know that, but looking at the file copy
of it, I see you're quite correct. Thanks for the catch!
+---------------
| IMHO literal tabs should be avoided in source code.
| (setq-default indent-tabs-mode nil)
+---------------
I use "vi" for composing news articles, and while I am generally
careful to never type tabs in source code, occasionally "vi" will
"helpfully" insert them anyway if I do a ">%" ["shift right to
matching bracket"] and some of the shifted text ends up being on or
past a tab stop. ·@·····@^!# (*grumph!*)
I'll see if I can find a setting that stops that...
Thanks,
-Rob
From: Christopher C. Stacy
Subject: Re: Can not find older posting: Reading files (fast)
Date:
Message-ID: <uzmr19rmo.fsf@news.dtpq.com>
····@rpw3.org (Rob Warnock) writes:
> Edi Weitz <········@agharta.de> wrote:
> +---------------
> | ····@rpw3.org (Rob Warnock) wrote:
> | > Well, it *wasn't*, when I posted it! ;-} ;-}
> |
> | You used tabs which very often leads to this kind of confusion.
> +---------------
>
> Ouch!! (*blush!*) I didn't know that, but looking at the file copy
> of it, I see you're quite correct. Thanks for the catch!
>
> +---------------
> | IMHO literal tabs should be avoided in source code.
> | (setq-default indent-tabs-mode nil)
> +---------------
>
> I use "vi" for composing news articles, and while I am generally
> careful to never type tabs in source code, occasionally "vi" will
> "helpfully" insert them anyway if I do a ">%" ["shift right to
> matching bracket"] and some of the shifted text ends up being on or
> past a tab stop. ·@·····@^!# (*grumph!*)
>
> I'll see if I can find a setting that stops that...
I use Emacs, and I try to remember to run the
command [m-x untabify] over the code region.
On 9216 day of my life Christopher C. Stacy wrote:
> I use Emacs, and I try to remember to run the
> command [m-x untabify] over the code region.
What about adding untabify to message-send-hook?

    (add-hook
     'message-send-hook
     '(lambda ()
        (message-goto-body)
        (untabify (point) (point-max))))

--
Ivan Boldyrev
Outlook has performed an illegal operation and will be shut down.
If the problem persists, contact the program vendor.
From: Robert Uhl
Subject: Re: Can not find older posting: Reading files (fast)
Date:
Message-ID: <m3acj031np.fsf@4dv.net>
····@rpw3.org (Rob Warnock) writes:
>
> I use "vi" for composing news articles, and while I am generally
> careful to never type tabs in source code, occasionally "vi" will
> "helpfully" insert them anyway if I do a ">%" ["shift right to
> matching bracket"] and some of the shifted text ends up being on or
> past a tab stop. ·@·····@^!# (*grumph!*)
>
> I'll see if I can find a setting that stops that...
tmp=`which vi`; rm $tmp; ln -s `which emacs` $tmp
*grin*
--
Robert Uhl <http://public.xdi.org/=ruhl>
All words have not a single meaning but a swarm of them, like bees
around a hive. And like that swarm, changing its position
ever-so-slightly with each wingbeat, the word's meanings change a little
with each use upon the tongue or the page. --Maureen O'Brien
Rob Warnock wrote:
>
> And if, instead of a list of lines, you suck up the whole thing into
> a single string [for later manipulation with CHAR or SUBSEQ]:
>
> (defun file-string (path)
> "Sucks up an entire file from PATH into a freshly-allocated string,
> returning two values: the string and the number of bytes read."
> (with-open-file (s path)
> (let* ((len (file-length s))
> (data (make-string len)))
> (values data (read-sequence data s)))))
>
> then on the same machine/file/etc. as above, it goes ~20 times as fast
> (0.05 seconds of real time).
According to "~& ~:[?~;~:*~S~]: ~:[?~;~:*~S~] -> ~:[?~;~:*~S~]~%" [1],
this function is not portable:
"But this almost certainly will not work reliably. file-length will
almost certainly tell you the length of the file in octets, not
characters, and if the encoding is not trivial, this will mean that the
string allocated will be the wrong length (typically it will be too
long). To see why this is likely to be true, consider how you would make
things work `right' on a Unix-like system: since the file is actually
just a sequence of octets - there is no useful metadata - then in order
for file-length to calculate the character length of the file, it would
have to read the whole file, decoding it into characters. So in order to
work, this code has to read the whole file twice."
The following solution is offered:

    (defun snarf-file (file)
      ;; encoding-resistant file reader. You can't use FILE-LENGTH
      ;; because in the presence of variable-length encodings (and DOS
      ;; linefeed conventions) the length of a file can bear little resemblance
      ;; to the length of the string it corresponds to. Reading each line
      ;; like this wastes a bunch of space but does solve the encoding
      ;; issues.
      (with-open-file (in file :direction :input)
        (loop for read = (read-line in nil nil)
              while read
              for i upfrom 1
              collect read into lines
              sum (length read) into len
              finally (return
                       (let ((huge (make-string (+ len i))))
                         (loop with pos = 0
                               for line in lines
                               for len = (length line)
                               do (setf (subseq huge pos) line
                                        (aref huge (+ pos len)) #\Newline
                                        pos (+ pos len 1))
                               finally (return huge)))))))

Hope that helps.
[1] http://www.tfeb.org/lisp/obscurities.html
drewc
--
Drew Crampsie
drewc at tech dot coop
"Never mind the bollocks -- here's the sexp's tools."
-- Karl A. Krueger on comp.lang.lisp
drewc <·····@rift.com> wrote:
+---------------
| Rob Warnock wrote:
| > (defun file-string (path)
| > "Sucks up an entire file from PATH into a freshly-allocated string,
| > returning two values: the string and the number of bytes read."
| > (with-open-file (s path)
| > (let* ((len (file-length s))
| > (data (make-string len)))
| > (values data (read-sequence data s)))))
|
| According to [ <http://www.tfeb.org/lisp/obscurities.html> ] ...
+---------------
Thanks for the ref!
+---------------
| ...this function is not portable :
|
| "But this almost certainly will not work reliably. file-length will
| almost certainly tell you the length of the file in octets, not
| characters...
+---------------
Hmmm... O.k., I'll agree with the non-portability in general, but
it *might* be slightly more portable than Tim's page suggests. ;-}
According to the CLHS:
FILE-LENGTH returns the length of stream, or NIL if the length
cannot be determined.
For a binary file, the length is measured in units of the
element type of the stream.
and refers one to OPEN, which says:
element-type---a type specifier for recognizable subtype of
CHARACTER; or a type specifier for a finite recognizable subtype
of INTEGER; or one of the symbols SIGNED-BYTE, UNSIGNED-BYTE, or
:DEFAULT. The default is CHARACTER.
And 13.1.4.1 "Graphic Characters" says that:
#\Backspace, #\Tab, #\Rubout, #\Linefeed, #\Return, and #\Page,
if they are supported by the implementation, are non-graphic.
But 2.1.3 "Standard Characters" only requires that the non-graphic
characters #\Space and #\Newline are supported.
So I guess it really boils down to whether in a given implementation
#\Return exists as a CHARACTER, and what happens when you READ-CHAR
a stream containing one, since READ-SEQUENCE is defined that way:
READ-SEQUENCE is identical in effect to iterating over the
indicated subsequence and reading one element at a time from
stream and storing it into sequence, but may be more efficient
than the equivalent loop. An efficient implementation is more
likely to exist for the case where the sequence is a vector with
the same element type as the stream.
Note that this is *not* the same as asking whether:

    (= (length (file-string "foo"))
       (with-open-file (s "foo")
         (loop for line = (read-line s nil nil)
           while line
           sum (1+ (length line)))))
    ==> T

This clearly might be false on platforms where #\Newline is externally
represented as <CR><LF>, but if #\Return is a (non-graphic) CHARACTER
on those machines, then the following might still be true even if the
above is false:

    (= (length (file-string "foo"))
       (with-open-file (s "foo")
         (loop for char = (read-char s nil nil)
           while char
           count t)))

Note that the former returns NIL on CMUCL under Unix when given a file
containing ASCII NULs (a .tar.gz! ;-} ) but the latter still returns T.
It would be interesting to know whether the latter also returns T on
MS/DOS or Windows platforms, and for which CL implementations.
-Rob
Rob Warnock wrote:
> drewc <·····@rift.com> wrote:
> +---------------
> | Rob Warnock wrote:
> | > (defun file-string (path)
> | > "Sucks up an entire file from PATH into a freshly-allocated string,
> | > returning two values: the string and the number of bytes read."
> | > (with-open-file (s path)
> | > (let* ((len (file-length s))
> | > (data (make-string len)))
> | > (values data (read-sequence data s)))))
> |
> | According to [ <http://www.tfeb.org/lisp/obscurities.html> ] ...
> +---------------
>
> Thanks for the ref!
>
> +---------------
> | ...this function is not portable :
> |
> | "But this almost certainly will not work reliably. file-length will
> | almost certainly tell you the length of the file in octets, not
> | characters...
> +---------------
>
> Hmmm... O.k., I'll agree with the non-portability in general, but
> it *might* be slightly more portable than Tim's page suggests. ;-}
The real complaint he has is that, in the presence of multibyte
encodings, your string is going to be longer than the number of
characters it contains. This could cause some subtle bugs.
The following code serves to illustrate the problem, in a Unicode-
enabled SBCL, reading a UTF-8 encoded file:

CL-USER>
(defun slurp-stream (file-stream)
  (with-output-to-string (datum)
    (let ((buffer (make-array 4096 :element-type 'character)))
      (loop for bytes-read = (read-sequence buffer file-stream)
            do (write-sequence buffer datum :start 0 :end bytes-read)
            while (= bytes-read 4096)))))
SLURP-STREAM
CL-USER> (with-open-file (s "/home/drewc/utf8_sample.html"
                          :external-format :utf8)
           (values (file-length s) (length (slurp-stream s))))
11085
10279

FWIW, I use the following function from Marco Baringer's Arnesi library,
which is somewhere between your function and the one Tim suggests in
speed and memory usage, and to my eyes seems quite portable:

    (defun read-string-from-file (pathname &key (buffer-size 4096)
                                                (element-type 'character))
      "Return the contents of @var{pathname} as a string."
      (with-input-from-file (file-stream pathname)
        (with-output-to-string (datum)
          (let ((buffer (make-array buffer-size :element-type element-type)))
            (loop for bytes-read = (read-sequence buffer file-stream)
                  do (write-sequence buffer datum :start 0 :end bytes-read)
                  while (= bytes-read buffer-size))))))

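
WITH-INPUT-FROM-FILE is not a standard CL macro, so the function above needs the library that defines it. The same chunked-copy idea can be sketched with standard operators only. The name READ-STRING-FROM-PATH and the :EXTERNAL-FORMAT keyword argument are assumptions of this sketch, not Arnesi's API.

```lisp
;; Hypothetical standard-only variant of the chunked reader above.
(defun read-string-from-path (pathname &key (buffer-size 4096)
                                            (external-format :default))
  "Return the contents of PATHNAME as a string, copied in chunks of
BUFFER-SIZE characters.  Never consults FILE-LENGTH."
  (with-open-file (in pathname :external-format external-format)
    (with-output-to-string (out)
      (let ((buffer (make-string buffer-size)))
        ;; READ-SEQUENCE returns the index of the first element it did
        ;; not fill; a short read means end of file was reached.
        (loop for n = (read-sequence buffer in)
              do (write-sequence buffer out :end n)
              while (= n buffer-size))))))
```

Because the result string grows as characters are decoded, its length is the number of characters actually read, whatever the encoding.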
hth,
drewc
From: Edi Weitz
Subject: Re: Can not find older posting: Reading files (fast)
Date:
Message-ID: <uwtm5b269.fsf@agharta.de>
On Sun, 28 Aug 2005 22:46:18 -0500, ····@rpw3.org (Rob Warnock) wrote:
> Hmmm... O.k., I'll agree with the non-portability in general, but it
> *might* be slightly more portable than Tim's page suggests. ;-}
>
> [analysis snipped]
You're most likely aware of this but just for the record: The real
problem in this case is the fact that the file in question could be
encoded in UTF-8 or whatever. No matter how #\Return is treated by
the implementation it's still impossible to determine the string
length of the file from the octet length without reading through the
whole file. (Doesn't matter for CMUCL where characters are 8-bit but
it does for LispWorks, AllegroCL, SBCL, CLISP and probably others.)
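
One way to keep the single READ-SEQUENCE call and still get a string of the right length is to treat FILE-LENGTH only as an upper bound and trim to the count READ-SEQUENCE returns. This is a minimal sketch with a made-up name, FILE-STRING-TRIMMED. It assumes FILE-LENGTH on a character stream returns a number no smaller than the character count, which holds when it reports octets and every character occupies at least one octet, but which the standard does not promise.

```lisp
;; Hypothetical variant of FILE-STRING; FILE-LENGTH is only an upper bound.
(defun file-string-trimmed (path &key (external-format :default))
  "Return two values: the contents of PATH as a string of exactly the
number of characters read, and that number."
  (with-open-file (s path :external-format external-format)
    (let* ((data (make-string (file-length s)))
           ;; END is the index of the first element READ-SEQUENCE did
           ;; not fill, i.e. the number of characters actually decoded.
           (end (read-sequence data s)))
      (values (subseq data 0 end) end))))
```

The trade-off is one extra copy in SUBSEQ and a temporarily oversized buffer, in exchange for not reading line by line.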
Cheers,
Edi.