From: Nonzero
Subject: Arrays, files, and EOF
Date: 
Message-ID: <376b0355.0303081108.30a39d2b@posting.google.com>
Hello everyone,

I have a few questions...

I have a txt file with data I would like to read into an array.

How do I get lisp to keep reading until the end of the file?

In addition, let's assume I do not know how much data there is, how do
I define an array if I don't know how big it should be?

Thank you for your time,
-Colin

From: Gabe Garza
Subject: Re: Arrays, files, and EOF
Date: 
Message-ID: <87adg5wt3w.fsf@ix.netcom.com>
····@geneseo.edu (Nonzero) writes:

> I have a txt file with data I would like to read into an array.
> 
> How do I get lisp to keep reading until the end of the file?

The READ, READ-LINE, and READ-BYTE functions all accept EOF-ERROR-P as
their second argument and EOF-VALUE as their third (both optional).  If
EOF-ERROR-P is NIL (which is not the default), then the read function
will return EOF-VALUE (which is NIL by default) when EOF is reached,
instead of signalling an error.

So, the standard idiom is to call READ, READ-LINE or READ-BYTE with
NIL as the second argument (sometimes you need to override EOF-VALUE's
default value, but we'll ignore that for now).
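
A minimal sketch of that idiom on character streams follows.  The names
COUNT-LINES and READ-ALL-FORMS are hypothetical, and using a fresh cons
as the end-of-file marker is only one common convention.  The second
function shows the case where EOF-VALUE does need overriding: READ can
legitimately return NIL, so NIL cannot double as the marker.

```lisp
;; Illustrative sketch only; both function names are hypothetical.

(defun count-lines (stream)
  "Count the lines remaining on STREAM, a character input stream."
  ;; With EOF-ERROR-P = NIL, READ-LINE returns EOF-VALUE (NIL by default)
  ;; at end of file.  A line is always a string, so NIL is unambiguous.
  (loop for line = (read-line stream nil)
        while line
        count t))

(defun read-all-forms (stream)
  "Return a list of every form READ can parse from STREAM."
  ;; READ may return NIL for real input such as (), so a fresh object
  ;; that cannot appear in the data is passed as EOF-VALUE instead.
  (let ((eof (list nil)))
    (loop for form = (read stream nil eof)
          until (eq form eof)
          collect form)))
```

Both can be tried at the REPL on a string stream, e.g.
(with-input-from-string (s "1 () foo") (read-all-forms s)).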

> In addition, let's assume I do not know how much data there is, how
> do I define an array if I don't know how big it should be?

Use an adjustable array with a fill pointer, e.g.:

[6]> (defvar *array* 
             (make-array '(0) :element-type '(unsigned-byte 8) 
                              :adjustable t 
                              :fill-pointer 0))
*ARRAY*
[7]> *array*
#()
[8]> (vector-push-extend 255 *array*)
0
[9]> *array*
#(255)
[10]> (vector-push-extend 254 *array*)
1
[11]> *array*
#(255 254)
[12]> (vector-push-extend 253 *array*)
2
[13]> *array*
#(255 254 253)

Putting it all together, here's a function that will read a file and
return an array of (unsigned-byte 8), which is what C calls "unsigned
char":

(defun file->octet-array (pathname)
  (let ((array (make-array '(0) :element-type '(unsigned-byte 8)
                                :adjustable t
                                :fill-pointer 0)))
    (with-open-file (stream pathname :direction :input
                                     :element-type '(unsigned-byte 8))
      (loop for byte = (read-byte stream nil)
            while byte
            do (vector-push-extend byte array)))
    array))

Gabe Garza


            
From: Wade Humeniuk
Subject: Re: Arrays, files, and EOF
Date: 
Message-ID: <xvsaa.7115$wW.718945@news2.telusplanet.net>
"Gabe Garza" <·······@ix.netcom.com> wrote in message ···················@ix.netcom.com...
> Putting it all together, here's a function that will read a file and
> return an array of (unsigned-byte 8), which is what C calls "unsigned
> char":
>
> (defun file->octet-array (pathname)
>   (let ((array (make-array '(0) :element-type '(unsigned-byte 8)
>                                 :adjustable t
>                                 :fill-pointer 0)))
>     (with-open-file (stream pathname :direction :input
>                                      :element-type '(unsigned-byte 8))
>       (loop for byte = (read-byte stream nil)
>             while byte
>             do (vector-push-extend byte array)))
>     array))

Or alternatively:

(defun file-to-array (filename &key (element-type 'unsigned-byte) (adjustable nil))
  (with-open-file (stream filename :direction :input :element-type element-type)
    (let ((array (make-array (file-length stream)
                             :element-type element-type
                             :adjustable adjustable)))
      (read-sequence array stream)
      array)))

Potential problems: being unable to make an array large enough.

You can also just treat a file directly as a 1D array.

(defun fileref (stream index)
  "assumes binary stream"
  (file-position stream index)
  (read-byte stream))

(defun (setf fileref) (newval stream index)
  "assumes binary stream"
  (file-position stream index)
  (write-byte newval stream))

CL-USER 12 > (with-open-stream (stream (open #p"/user/wade/lww/file.lisp"
                                             :element-type 'unsigned-byte
                                             :direction :input))
               (prin1 (fileref stream 0)) (fresh-line)
               (prin1 (fileref stream 100)) (fresh-line)
               nil)
40
108
NIL

CL-USER 13 > (with-open-stream (stream (open #p"/user/wade/lww/file.lisp"
                                             :element-type '(unsigned-byte 16)
                                             :direction :input))
               (prin1 (fileref stream 0)) (fresh-line)
               (prin1 (fileref stream 100)) (fresh-line)
               nil)
25640
11621
NIL

CL-USER 20 > (with-open-stream (stream (open #p"/user/wade/lww/file.lisp"
                                             :element-type 'unsigned-byte
                                             :direction :output
                                             :if-exists :overwrite))
               (setf (fileref stream 0) 20))
20

CL-USER 21 > (with-open-stream (stream (open #p"/user/wade/lww/file.lisp"
                                             :element-type 'unsigned-byte
                                             :direction :input))
               (prin1 (fileref stream 0)) (fresh-line)
               (prin1 (fileref stream 100)) (fresh-line)
               nil)
20
108
NIL

CL-USER 22 >

Wade
From: Tim Bradshaw
Subject: Re: Arrays, files, and EOF
Date: 
Message-ID: <ey3of4koess.fsf@cley.com>
* Wade Humeniuk wrote:

> (defun file-to-array (filename &key (element-type 'unsigned-byte) (adjustable nil))
>   (with-open-file (stream filename :direction :input :element-type element-type)
>     (let ((array (make-array (file-length stream)
>                              :element-type element-type
>                              :adjustable adjustable)))
>       (read-sequence array stream)
>       array)))

> Potential problems: being unable to make an array large enough.

A much more significant problem is that this will not, in general,
work *at all* for files being read as characters.  Consider a file
which contains Unicode characters, whose external format is UTF-8:
the only way to know the length of the file in characters (unless the
OS can tell you this, which common OSs can't) is to read the whole
file and decode it.

Lest you think this is some esoteric weirdness, consider CRLF -> LF
encoding for DOS files.
http://www.tfeb.org/lisp/obscurities.html#SNARFING-FILES has some
information about this.

--tim
From: Scott Schwartz
Subject: Re: Arrays, files, and EOF
Date: 
Message-ID: <8gn0k42udt.fsf@galapagos.cse.psu.edu>
Tim Bradshaw <···@cley.com> writes:
> >   (with-open-file (stream filename :direction :input :element-type element-type)
> >     (let ((array (make-array (file-length stream)
> >                              :element-type element-type
> >                              :adjustable adjustable)))
> >       (read-sequence array stream)
> >       array)))
> 
> > Potential problems: being unable to make an array large enough.
> 
> A much more significant problem is that this will not, in general,
> work *at all* for files being read as characters.  Consider a file
> which contains Unicode characters, whose external format is
> UTF-8: the only way to know the length of the file in characters
> (unless the OS can tell you this, which common OSs can't) is to read
> the whole file, and decode it.

But the number of Unicode characters is never more than the number of
UTF-8 bytes, so the array will at least be big enough.
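
A quick sketch of the arithmetic behind that claim: every character
takes between one and four octets in UTF-8, so the octet count can
never be smaller than the character count.  UTF-8-OCTET-COUNT is a
hypothetical name, and the code assumes CHAR-CODE returns Unicode code
points, which the standard does not guarantee.

```lisp
;; Illustrative sketch only; assumes CHAR-CODE yields Unicode code points.
(defun utf-8-octet-count (string)
  "Number of octets STRING would occupy when encoded as UTF-8."
  (loop for char across string
        for code = (char-code char)
        sum (cond ((< code #x80) 1)       ; ASCII range
                  ((< code #x800) 2)      ; two-octet sequences
                  ((< code #x10000) 3)    ; rest of the BMP
                  (t 4))))                ; supplementary planes
```

For any string S, (>= (utf-8-octet-count s) (length s)) holds, with
equality exactly when S is pure ASCII.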

The real problem is one of portability: you often want to read from
something that isn't a file, e.g. filename may be "/dev/stdin" or
"/net/tcp/2/data", which may be a pipe or socket that has no
predetermined length, so stat() cannot report a meaningful size.

Reading into an extensible array seems like a better solution, in
general.
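
One way to do that without ever calling FILE-LENGTH is sketched below.
STREAM->OCTET-VECTOR and its CHUNK-SIZE parameter are hypothetical
names, and the stream is assumed to have been opened with element type
(unsigned-byte 8).

```lisp
;; Illustrative sketch only; never asks the stream for its length.
(defun stream->octet-vector (stream &optional (chunk-size 4096))
  "Read STREAM, a binary input stream of (unsigned-byte 8), to EOF."
  (let ((result (make-array 0 :element-type '(unsigned-byte 8)
                              :adjustable t
                              :fill-pointer 0))
        (buffer (make-array chunk-size :element-type '(unsigned-byte 8))))
    ;; READ-SEQUENCE returns the index of the first element it did not
    ;; fill, so a return value of 0 means nothing was left to read.
    (loop for end = (read-sequence buffer stream)
          until (zerop end)
          do (loop for i below end
                   do (vector-push-extend (aref buffer i) result)))
    result))
```

Reading in chunks keeps the number of READ-SEQUENCE calls low while
still letting the result grow to whatever size the input turns out to
be.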

> Lest you think this is some esoteric weirdness, consider CRLF -> LF
> encoding for DOS files.
> http://www.tfeb.org/lisp/obscurities.html#SNARFING-FILES has some
> information about this.

That recommends reading the file twice, which may not be possible.
From: Tim Bradshaw
Subject: Re: Arrays, files, and EOF
Date: 
Message-ID: <ey3bs0knsue.fsf@cley.com>
* Scott Schwartz wrote:

> But the number of Unicode characters is never more than the number of
> UTF-8 bytes, so the array will at least be big enough.

However, this may not be true for all encodings: for instance, one
that compresses the file on disk.

> That recommends reading the file twice, which may not be possible.

No, it doesn't.  It says that in order for FILE-LENGTH to compute the
length of a file in characters it would need to read it, and so if
FILE-LENGTH is to work `right' (the way people naively might expect it
to) you need to read the file twice.  It then goes on to provide a
function which reads the file line-by-line, which may be suboptimal
but does at least work, and reads the file exactly once.
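
A reader in that spirit might look like the following sketch.  It is
not the function from the page cited above; FILE->LINE-VECTOR is a
hypothetical name, and the external format is left to the
implementation's default.

```lisp
;; Illustrative sketch only; reads the file once and never calls FILE-LENGTH.
(defun file->line-vector (pathname)
  "Return the lines of the text file PATHNAME as an adjustable vector."
  (let ((lines (make-array 0 :adjustable t :fill-pointer 0)))
    (with-open-file (stream pathname :direction :input)
      ;; READ-LINE returns NIL at end of file because EOF-ERROR-P is NIL.
      (loop for line = (read-line stream nil)
            while line
            do (vector-push-extend line lines)))
    lines))
```

Because the vector grows one line at a time, the character count never
needs to be known in advance, whatever the file's encoding.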

--tim