From: Josef Eschgfaeller
Subject: Reading files
Date: 
Message-ID: <37524D0B.773796E5@felix.unife.it>
Kent Pitman wrote:

> The code you use uses an O(n^2) string copying, so it's not surprising
> that it's slow.

> Even without doing anything else, you could switch it to
> an O(n) algorithm (where n is the number of lines) by doing:

Thanks. I applied your algorithm for a new function which is now very
quick also for big files. But I have now another O(n^2) algorithm:

  (defun read-file-as-list-of-rows (name)
  (with-open-file (file name :direction :input)
  (do ((row (read-line file) (read-line file nil 'file-end))
  (rows nil (nconc rows (list row))))
  ((eql row 'file-end) rows))))

It becomes sufficiently quick in this way:

  (defun read-file-as-vector-of-rows (name)
  (with-open-file (file name :direction :input)
  (do ((row (read-line file) (read-line file nil 'file-end))
  (v (make-array 11820)) (k 0 (+ 1 k)))
  ((eql row 'file-end) v) (setf (aref v k) row))))
  
But how can I know that my file has 11819 rows? I could divide
file-length by 80, say. Is there a way to access the result of
(system "wc ...") from Lisp? Also the setf seems ugly.

> There's also zero justification for this being a macro.

And here? For symmetry also this should become a function:

  (defmacro write-file (name &rest a)
  `(with-open-file (file ,name :direction :output)
  (format file ,@a)))

Sorry for so simple questions. But your advices were very useful.

J. Eschgfaeller

From: Steve Gonedes
Subject: Re: Reading files
Date: 
Message-ID: <m2aeula7jt.fsf@KludgeUnix.com>
Josef Eschgfaeller <···@felix.unife.it> writes:

< Thanks. I applied your algorithm for a new function which is now very
< quick also for big files. But I have now another O(n^2) algorithm:
<
<   (defun read-file-as-list-of-rows (name)
<   (with-open-file (file name :direction :input)
<   (do ((row (read-line file) (read-line file nil 'file-end))
<   (rows nil (nconc rows (list row))))
<   ((eql row 'file-end) rows))))

It might be faster and easier to read/write using loop; probably
depends on what you think about using loop.

(defun read-file-as-list-of-rows (file)
  (with-open-file (input file :direction :input)
   (loop for line = (read-line input nil nil)
      while line
         collect line)))

< It becomes sufficiently quick in this way:
<
<   (defun read-file-as-vector-of-rows (name)
<   (with-open-file (file name :direction :input)
<   (do ((row (read-line file) (read-line file nil 'file-end))
<   (v (make-array 11820)) (k 0 (+ 1 k)))
<   ((eql row 'file-end) v) (setf (aref v k) row))))
<
< But how can I know that my file has 11819 rows? I could divide
< file-length by 80, say. Is there a way to access the result of
< (system "wc ...") from Lisp? Also the setf seems ugly.

There is no reliable or recommended way to statically determine such a
limit. You can use a `fill-pointer' and an adjustable vector instead.

(defun rfvr (file)
 (let ((vector (make-array 1024
                 :initial-element nil
                 :fill-pointer 0
                 :adjustable t)))
  (with-open-file (input file :direction :input)
   (do ((line (read-line input nil nil) (read-line input nil nil)))
       ((null line)
        vector)
    (vector-push-extend line vector)))))

Using loop,

(defun rfvr (file)
 (let ((vector (make-array 1024
                 :initial-element nil
                 :fill-pointer 0
                 :adjustable t)))
  (with-open-file (input file :direction :input)
    (loop for line = (read-line input nil nil)
       while line
         do (vector-push-extend line vector)
      finally (return vector)))))
.



< > There's also zero justification for this being a macro.
<
< And here? For symmetry also this should become a function:
<
<   (defmacro write-file (name &rest a)
<   `(with-open-file (file ,name :direction :output)
<   (format file ,@a)))

I would use `write-sequence' instead of format.

I would make a macro call `with-file-lines' that would bind some
variable (similiar to with-open-file) to the current line.

(defun make-fill-string (size &optional adjustable-p)
  (make-array size
         :initial-element #\Null
         :element-type 'character
         :fill-pointer 0
         :adjustable adjustable-p))


(defmacro with-file-lines ((stream variable-name &optional return-val)
                    &body forms)
  (let ((ch (gensym "CHARACTER"))
        (fn-name (gensym "FN")))
   `(let ((,variable-name (make-fill-string 1024 t)))
       (flet ((,fn-name (,variable-name)
                ,@ forms))
      (do ((,ch (read-char ,stream nil nil) (read-char ,stream nil nil)))
           ((null ,ch)
              (unless (zerop (fill-pointer ,variable-name))
               ;; Should refuse to 
               ;; operate upon reaching this condition
               ;; for unix compliance? Tough choice.
                ; (vector-push-extend #\Newline ,variable-name)
                ; (error "File `~s' does not end!" (file-namestring ,stream))
                )
              (,fn-name ,variable-name)
              ,return-val)
      (vector-push-extend ,ch ,variable-name)
        ;; This `Newline' may not work for your OS,
        ;; and since I try not to use keywords and optional args
        ;; I leave you this comment instead.
       (cond ((char= ,ch #\Newline)
              (,fn-name ,variable-name)
              (setf (fill-pointer ,variable-name) 0))))))))


(let ((line-count 0))
 (with-open-file (in ".emacs" :direction :input)
   (with-file-lines (in line line-count)
      (write-string line)
      (incf line-count))))
From: Kent M Pitman
Subject: Re: Reading files
Date: 
Message-ID: <sfw4sktmeqn.fsf@world.std.com>
Steve Gonedes <········@worldnet.att.net> writes:

> (defun rfvr (file)
>  (let ((vector (make-array 1024
>                  :initial-element nil
>                  :fill-pointer 0
>                  :adjustable t)))
>   (with-open-file (input file :direction :input)
>     (loop for line = (read-line input nil nil)
>        while line
>          do (vector-push-extend line vector)
>       finally (return vector)))))

Some small comments here, mostly for those looking on, and probably
not really for Steve, who may know this but just not thought to have
mentioned it:

One is that it may be useful to open the vector with an initial size of 

  (or (ignore-errors (truncate (file-length input) n)) 1024)

instead of 1024. where n is the expected average line length.
The ignore-errors is in case it can't compute a file length.
[You have to factor the LET into the WITH-OPEN-FILE in order to
use the open stream INPUT as an arg to FILE-LENGTH.]

The second is that 
 (vector-push-extend line vector)
will grow the vector by an implemenation-dependent amount if you overrun
the vector allocated size.  [This will involve a copy of the backing-store
of the array in most cases.]  If the amount is a fixed small amount, as
it is in some systems, you'll end up with what may feel like 
an O(n^2) problem again.  It's possibly better to control this value yourself
and make sure the new vector size is fairly chunky so the copy won't be
done over and over again.  Suggesting a good value may depend somewhat
on knowledge of the data.
From: Bernhard Pfahringer
Subject: Re: Reading files
Date: 
Message-ID: <7iutcg$1hlu$1@www.univie.ac.at>
In article <···············@world.std.com>,
Kent M Pitman  <······@world.std.com> wrote:
>Steve Gonedes <········@worldnet.att.net> writes:
>
>> (defun rfvr (file)
>>  (let ((vector (make-array 1024
>>                  :initial-element nil
>>                  :fill-pointer 0
>>                  :adjustable t)))
>>   (with-open-file (input file :direction :input)
>>     (loop for line = (read-line input nil nil)
>>        while line
>>          do (vector-push-extend line vector)
>>       finally (return vector)))))
>
>Some small comments here, mostly for those looking on, and probably
>not really for Steve, who may know this but just not thought to have
>mentioned it:
>
>One is that it may be useful to open the vector with an initial size of 
>
>  (or (ignore-errors (truncate (file-length input) n)) 1024)
>
>instead of 1024. where n is the expected average line length.
>The ignore-errors is in case it can't compute a file length.
>[You have to factor the LET into the WITH-OPEN-FILE in order to
>use the open stream INPUT as an arg to FILE-LENGTH.]
>
>The second is that 
> (vector-push-extend line vector)
>will grow the vector by an implemenation-dependent amount if you overrun
>the vector allocated size.  [This will involve a copy of the backing-store
>of the array in most cases.]  If the amount is a fixed small amount, as
>it is in some systems, you'll end up with what may feel like 
>an O(n^2) problem again.  It's possibly better to control this value yourself
>and make sure the new vector size is fairly chunky so the copy won't be
>done over and over again.  Suggesting a good value may depend somewhat
>on knowledge of the data.
>

You might as well circumvent all these unexpected efficiency problems
of vector-push-extend and adjusting arrays by simply collecting lines
into a LIST and converting to a vector afterwards (if you really have to)
by sticking to the original collecting version:


(defun rfvr (file)
  (with-open-file (input file :direction :input)
    (coerce (loop for line = (read-line input nil nil)
		  while line
		  collect line)
	    'simple-vector)))


Bernhard

PS: reminds me of the trade-off between hash-tables and trees,
you incur some bounded overhead but get better worst-case protection.
-- 
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for  http://www.ai.univie.ac.at/~bernhard/
Artificial Intelligence          ········@ai.univie.ac.at 
From: Thomas A. Russ
Subject: Re: Reading files
Date: 
Message-ID: <ymihfosj5cr.fsf@sevak.isi.edu>
Josef Eschgfaeller <···@felix.unife.it> writes:
> 
> Thanks. I applied your algorithm for a new function which is now very
> quick also for big files. But I have now another O(n^2) algorithm:
> 
>   (defun read-file-as-list-of-rows (name)
>   (with-open-file (file name :direction :input)
>   (do ((row (read-line file) (read-line file nil 'file-end))
>   (rows nil (nconc rows (list row))))
>   ((eql row 'file-end) rows))))

A simple way to make this O(n) is:

 (defun read-file-as-list-of-rows (name)
  (with-open-file (file name :direction :input)
    (do ((row (read-line file) (read-line file nil 'file-end))
         (rows nil (cons row rows)))
        ((eql row 'file-end) (nreverse rows)))))

Use "rows" as a stack and accumulate the rows in reverse order.  This is
a O(n) operation.  Finally, make one pass to reverse the stack into
proper order.  Another O(n) operation.

-- 
Thomas A. Russ,  USC/Information Sciences Institute          ···@isi.edu    
From: Kent M Pitman
Subject: Re: Reading files
Date: 
Message-ID: <sfw7lpnohp0.fsf@world.std.com>
···@sevak.isi.edu (Thomas A. Russ) writes:

> A simple way to make this O(n) is:
> 
>  (defun read-file-as-list-of-rows (name)
>   (with-open-file (file name :direction :input)
>     (do ((row (read-line file) (read-line file nil 'file-end))
>          (rows nil (cons row rows)))
>         ((eql row 'file-end) (nreverse rows)))))
> 
> Use "rows" as a stack and accumulate the rows in reverse order.  This is
> a O(n) operation.  Finally, make one pass to reverse the stack into
> proper order.  Another O(n) operation.

Incidentally, the first call to read-line should be the same as
the repeat call.  It's possible for a file to have no lines.

This bug was in the original, and I forgot to mention it.  I bet you
noticed it too but forgot as well.  But better late than never...

The relevant line should be:

    (do ((row (read-line file nil 'file-end) (read-line file nil 'file-end))

Personally,  I just use NIL rather than 'FILE-END, but same idea.
From: Thomas A. Russ
Subject: Re: Reading files
Date: 
Message-ID: <ymig14cj5ao.fsf@sevak.isi.edu>
Josef Eschgfaeller <···@felix.unife.it> writes:
> 
> Thanks. I applied your algorithm for a new function which is now very
> quick also for big files. But I have now another O(n^2) algorithm:
> 
>   (defun read-file-as-list-of-rows (name)
>   (with-open-file (file name :direction :input)
>   (do ((row (read-line file) (read-line file nil 'file-end))
>   (rows nil (nconc rows (list row))))
>   ((eql row 'file-end) rows))))

A simple way to make this O(n) is:

 (defun read-file-as-list-of-rows (name)
  (with-open-file (file name :direction :input)
    (do ((row (read-line file) (read-line file nil 'file-end))
         (rows nil (cons row rows)))
        ((eql row 'file-end) (nreverse rows)))))

Use "rows" as a stack and accumulate the rows in reverse order.  This is
a O(n) operation.  Finally, make one pass to reverse the stack into
proper order.  Another O(n) operation.

Also, it would appear slightly safer to make the first call to READ-LINE
the same as subsequent calls.  That way you wouldn't throw an error on
an empty file.

-- 
Thomas A. Russ,  USC/Information Sciences Institute          ···@isi.edu