unix / dos newline - possible FAQ?

From: Ben
Subject: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 17:47:10 +0000
Message-ID: <1115920030.611131.163910@g44g2000cwa.googlegroups.com>

I'm sorry if this is a FAQ - I've searched using google, and I can't
find the answer.

I am refreshing my Lisp knowledge this spring.  I had planned to learn
Ruby but, after reading how much Ruby owes to Lisp, decided to switch.
It's been 15 years since I even looked at Lisp code, but I remember it
as a lot of fun.  I'm still having fun.

  I am using Clisp under cygwin to write some scripts for my work.  As
a result of this mixed environment, I am running into some problems
with files that contain both dos and unix newlines.

  Could someone suggest a way to distinguish between the two newlines
so that read-line grabs everything from the stream that I want it to?
How would you parse them if there is no way to make read-line work?

Thanks,
-Ben

From: Eric Lavigne
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 17:56:38 +0000
Message-ID: <1115920598.080395.292130@f14g2000cwb.googlegroups.com>

>I am using Clisp under cygwin to write some
>scripts for my work.  As a result of this mixed
>environment, I am running into some problems
>with files that contain both dos and unix newlines.

I am also using clisp on Windows, but I don't think cygwin is involved.
I have had no difficulty using read-line on files generated with
notepad. If cygwin is giving you trouble, maybe you should switch. I
got my clisp from here

http://common-lisp.net/project/lispbox/

as part of Lisp in a Box, which also includes an emacs editor nicely
set up to support an REPL and Lisp file editing.

From: Thomas A. Russ
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 18:56:49 +0000
Message-ID: <ymi1x8cw9vy.fsf@sevak.isi.edu>

"Ben" <·····@yahoo.com> writes:

> 
> I'm sorry if this is a FAQ - I've searched using google, and I can't
> find the answer.
> 
>   I am using Clisp under cygwin to write some scripts for my work.  As
> a result of this mixed environment, I am running into some problems
> with files that contain both dos and unix newlines.
> 
>   Could someone suggest a way to distinguish between the two newlines
> so that read-line grabs everything from the stream that I want it to?
> How would you parse them if there is no way to make read-line work?

Well, having struggled with this myself, I find that there isn't really
any great answer in general.  For a general solution, one has to have a
special stream type that maintains some state information.  There are
some things that can be done if one only has to work with files and not
interactive streams such as terminal input or network inputs.  It can
also be simplified a bit if one can assume that a particular stream will
only have a single type of line ending in it.

I don't actually have any simple lisp code for handling the issue in
general, but I can outline a general algorithm that will handle the
issue.  It is best thought of as a finite state automaton that operates
on a stream with a bit of extra state in it (for the FSA).

Start
  --Return-->     return collected string, reset string ==> Return
  --Linefeed-->   return collected string, reset string ==> Start
  --EOF-->        if string has characters return them, reset string
                  else signal EOF ==> EOF
  --(Other)-->    collect character into string,   ==> Start

Return
  --Linefeed-->   do nothing ==> Start
  --Return-->     return empty string ==> Return
  --EOF-->        signal EOF ==> EOF
  --(Other)-->    collect character into string ==> Start

EOF
  (a read will always return EOF)

One of the key considerations is to always return a line on the FIRST
potential line terminating character.  This will stop you from hanging
if you get a Mac-format file with only CRs on an interactive stream or
via a network interface.  That is why there is a need for some state:
you have to remember the last character you've seen so that, when a
linefeed is encountered, you know whether it is part of a CR-LF pair or not.
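
A minimal sketch of the automaton outlined above, written as a closure
so that the "last terminator was a CR" state survives between calls.
MAKE-LINE-READER is a hypothetical name, not part of any library, and
the sketch assumes the implementation hands #\Return and #\Linefeed to
READ-CHAR untranslated (some implementations fold line terminators
into #\Newline before the program ever sees them).

```lisp
(defun make-line-reader (stream)
  "Return a function of no arguments that yields the next line of STREAM
as a string, or NIL at end of file.  Hypothetical helper for illustration."
  (let ((after-return nil))   ; true when the last terminator seen was a CR
    (lambda ()
      (let ((line (make-string-output-stream))
            (got-char nil))   ; has this line collected any character yet?
        (loop
          (let ((ch (read-char stream nil nil)))
            (cond ((null ch)
                   ;; EOF: hand back a final unterminated line if there is one.
                   (setf after-return nil)
                   (return (if got-char
                               (get-output-stream-string line)
                               nil)))
                  ((and after-return (char= ch #\Linefeed))
                   ;; Second half of a CR-LF pair: swallow it, back to Start.
                   (setf after-return nil))
                  ((char= ch #\Return)
                   ;; Return the line at once, without waiting for a possible LF.
                   (setf after-return t)
                   (return (get-output-stream-string line)))
                  ((char= ch #\Linefeed)
                   (setf after-return nil)
                   (return (get-output-stream-string line)))
                  (t
                   (setf after-return nil
                         got-char t)
                   (write-char ch line)))))))))
```

Usage would look something like
(let ((next (make-line-reader stream)))
  (loop for line = (funcall next) while line collect line)).
Because the line is returned as soon as the CR arrives, the reader never
blocks waiting for a byte that a CR-only sender will not send.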

The real problem is that unless you implement some sort of buffering
scheme, doing READ-CHAR to get the items into memory will be really
slow.  Buffering with READ-SEQUENCE would be faster, but it would
require a lot more machinery to maintain the buffer and handle buffer
fills that span more than one line.
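
To give an idea of the extra machinery, here is a rough sketch of a
chunked reader built on READ-SEQUENCE.  MAP-LINES-BUFFERED and its
BUFFER-SIZE keyword are hypothetical names.  It assumes a
non-interactive character stream (READ-SEQUENCE may block until the
buffer is full) and an implementation that passes #\Return and
#\Linefeed through untranslated.  It has not been benchmarked.

```lisp
(defun map-lines-buffered (function stream &key (buffer-size 4096))
  "Call FUNCTION on each line of character STREAM, treating CR, LF and
CR-LF as line terminators.  Hypothetical helper for illustration."
  (let ((buffer (make-string buffer-size))
        (pending (make-string-output-stream)) ; text of a line spanning chunks
        (pending-p nil)      ; does PENDING hold an unfinished line?
        (after-return nil))  ; did the previous chunk end in a CR?
    (loop
      (let ((end (read-sequence buffer stream))
            (start 0))
        (when (zerop end)
          ;; EOF: flush a final line that had no terminator.
          (when pending-p
            (funcall function (get-output-stream-string pending)))
          (return))
        ;; An LF opening this chunk completes a CR-LF begun in the last one.
        (when (and after-return (char= (char buffer 0) #\Linefeed))
          (setf start 1))
        (setf after-return nil)
        (loop
          (let ((pos (position-if (lambda (ch)
                                    (or (char= ch #\Return)
                                        (char= ch #\Linefeed)))
                                  buffer :start start :end end)))
            (cond ((null pos)
                   ;; No terminator left in this chunk: stash the tail.
                   (when (< start end)
                     (write-string buffer pending :start start :end end)
                     (setf pending-p t))
                   (return))
                  (t
                   (write-string buffer pending :start start :end pos)
                   (funcall function (get-output-stream-string pending))
                   (setf pending-p nil)
                   (cond ((char/= (char buffer pos) #\Return)
                          (setf start (1+ pos)))
                         ((= (1+ pos) end)
                          ;; CR is the last character of the chunk.
                          (setf after-return t
                                start end))
                         ((char= (char buffer (1+ pos)) #\Linefeed)
                          (setf start (+ pos 2)))
                         (t
                          (setf start (1+ pos))))))))))))
```

The two flags are the buffer bookkeeping mentioned above: PENDING
carries a line across a buffer refill, and AFTER-RETURN handles a CR-LF
pair that is split between two chunks.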

A low-performance, non-interactive version would be pretty simple to
code, but high performance is much trickier to achieve.

A high-performance version would be a good little bit of utility code,
though.

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Thomas A. Russ
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 19:28:05 +0000
Message-ID: <ymiy8akutve.fsf@sevak.isi.edu>

OK, here's a first cut solution.  It will fail on interactive streams,
and it's slow, but it will at least work:


(in-package "CL-USER")

(defun read-any-line (stream &optional eof-error-p eof-value)
  "Simple cross-platform line reader.  For non-interactive, non-network
streams only.  Slow."

  ;; Handle EOF issues upfront:
  (if eof-error-p
    (peek-char nil stream t)
    (unless (peek-char nil stream nil nil)
      (return-from read-any-line eof-value)))

  ;; Collect string output.
  (with-output-to-string (buffer)
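    (loop for ch = (read-char stream nil nil)
          until (null ch)
          do (case ch
               (#\Return
                ;; Swallow the LF of a CR-LF pair.  EQL rather than CHAR=
                ;; because PEEK-CHAR returns NIL at end of file.
                (when (eql (peek-char nil stream nil nil) #\Linefeed)
                  (read-char stream nil nil))
                (loop-finish))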
               (#\Linefeed
                (loop-finish))
               (t (write-char ch buffer))))))


-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Ben
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 19:52:42 +0000
Message-ID: <1115927562.143980.119010@z14g2000cwz.googlegroups.com>

Thanks for the detailed reply.  I was afraid the answer would be that.
For the moment I've slapped in a (nasty) solution.

It's embarrassing, but I'm parsing side-by-side diffs, so if the line is
< 80 chars, I know I hit an illegal newline, and I just read the next
line.

I WILL eventually write that stream utility, but I've spent more of
work's time learning lisp than I really should, so I feel I should
produce something useful.  I can release the nasty solution this
afternoon.

Thanks again.
-Ben

From: Sam Steingold
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 21:14:09 +0000
Message-ID: <ufywsm9jy.fsf@gnu.org>

> * Ben <·····@lnubb.pbz> [2005-05-12 10:47:10 -0700]:
>
> I'm sorry if this is a FAQ - I've searched using google, and I can't
> find the answer.

<http://www.google.com/search?q=clisp+newline>
the first hit answers your questions:
<http://clisp.cons.org/impnotes/clhs-newline.html>

a little bit more detailed explanation is here:
<http://www.podval.org/~sds/clisp/impnotes/clhs-newline.html>

> Could someone suggest a way to distinguish between the two newlines
> so that read-line grabs everything from the stream that I want it to?
> How would you parse them if there is no way to make read-line work?

READ-LINE treats all newlines the same,
as specifically recommended by the "Unicode Newline Guidelines"
<http://www.unicode.org/reports/tr13/tr13-9.html>.

if you need to distinguish between CR, LF and CR+LF in your input files,
you should use READ-CHAR-SEQUENCE instead of READ-LINE
<http://clisp.cons.org/impnotes/stream-dict.html#bulk-io>


-- 
Sam Steingold (http://www.podval.org/~sds) running w2k
<http://www.openvotingconsortium.org/> <http://www.jihadwatch.org/>
<http://www.honestreporting.com> <http://www.memri.org/> <http://ffii.org/>
Don't use force -- get a bigger hammer.

From: Pascal Bourguignon
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 18:42:53 +0000
Message-ID: <87zmv0cmky.fsf@thalassa.informatimago.com>

"Ben" <·····@yahoo.com> writes:

> I'm sorry if this is a FAQ - I've searched using google, and I can't
> find the answer.
>
> I am refreshing my Lisp knowledge this spring.  I had planned to learn
> Ruby but, after reading how much Ruby owes to Lisp, decided to switch.
> It's been 15 years since I even looked at Lisp code, but I remember it
> as a lot of fun.  I'm still having fun.
>
>   I am using Clisp under cygwin to write some scripts for my work.  As
> a result of this mixed environment, I am running into some problems
> with files that contain both dos and unix newlines.
>
>   Could someone suggest a way to distinguish between the two newlines
> so that read-line grabs everything from the stream that I want it to?
> How would you parse them if there is no way to make read-line work?


You can specify explicitly the newline you want with clisp, giving to
:external-format a clisp-specific value.  But in any case, when clisp
reads text files, it accepts any newline sequence:


(dolist (onewline '(:unix :dos :mac))
  (with-open-file (out "test.txt" :direction :output
                       :if-does-not-exist :create :if-exists :supersede
                       :external-format (ext:make-encoding
                                         :charset charset:iso-8859-1
                                         :line-terminator onewline))
    (format out "line1~%line2~%"))
  (with-open-file (in "test.txt" :direction :input 
                      :element-type '(unsigned-byte 8))
    (loop for byte = (read-byte in nil nil)
          while byte do (format t "~2,'0X " byte)
          finally (format t "~%")))
  (dolist (inewline '(:unix :dos :mac))
    (with-open-file (in "test.txt" :direction :input
                        :external-format (ext:make-encoding
                                          :charset charset:iso-8859-1
                                          :line-terminator inewline))
      (let ((line  (read-line in)))
        (format t "written as ~6A, read as ~6A: ~3D ~S~%" 
                onewline inewline (length line) line)))) )

6C 69 6E 65 31 0A 6C 69 6E 65 32 0A 
written as UNIX  , read as UNIX  :   5 "line1"
written as UNIX  , read as DOS   :   5 "line1"
written as UNIX  , read as MAC   :   5 "line1"
6C 69 6E 65 31 0D 0A 6C 69 6E 65 32 0D 0A 
written as DOS   , read as UNIX  :   5 "line1"
written as DOS   , read as DOS   :   5 "line1"
written as DOS   , read as MAC   :   5 "line1"
6C 69 6E 65 31 0D 6C 69 6E 65 32 0D 
written as MAC   , read as UNIX  :   5 "line1"
written as MAC   , read as DOS   :   5 "line1"
written as MAC   , read as MAC   :   5 "line1"
NIL


Now, I'm using clisp-2.33.83, and AFAIK, it worked the same with
clisp-2.33.2, but I cannot say if you're using an older version.
Upgrade!

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

This is a signature virus.  Add me to your signature and help me to live

From: Thomas A. Russ
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 19:08:11 +0000
Message-ID: <ymizmv0uusk.fsf@sevak.isi.edu>

Pascal Bourguignon <···@informatimago.com> writes:

> But in anycase, when clisp
> reads text files, it accepts any newline sequence:

Hooray for clisp!

> 
> 6C 69 6E 65 31 0A 6C 69 6E 65 32 0A 
> written as UNIX  , read as UNIX  :   5 "line1"
> written as UNIX  , read as DOS   :   5 "line1"
> written as UNIX  , read as MAC   :   5 "line1"
> 6C 69 6E 65 31 0D 0A 6C 69 6E 65 32 0D 0A 
> written as DOS   , read as UNIX  :   5 "line1"
> written as DOS   , read as DOS   :   5 "line1"
> written as DOS   , read as MAC   :   5 "line1"
> 6C 69 6E 65 31 0D 6C 69 6E 65 32 0D 
> written as MAC   , read as UNIX  :   5 "line1"
> written as MAC   , read as DOS   :   5 "line1"
> written as MAC   , read as MAC   :   5 "line1"
> NIL

But what does it get for the second line?  Does it get those correct as well?

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Pascal Bourguignon
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 20:28:01 +0000
Message-ID: <87vf5ochpq.fsf@thalassa.informatimago.com>

···@sevak.isi.edu (Thomas A. Russ) writes:

> Pascal Bourguignon <···@informatimago.com> writes:
>
>> But in anycase, when clisp
>> reads text files, it accepts any newline sequence:
>
> Hooray for clisp!
>
>> 
>> 6C 69 6E 65 31 0A 6C 69 6E 65 32 0A 
>> written as UNIX  , read as UNIX  :   5 "line1"
>> written as UNIX  , read as DOS   :   5 "line1"
>> written as UNIX  , read as MAC   :   5 "line1"
>> 6C 69 6E 65 31 0D 0A 6C 69 6E 65 32 0D 0A 
>> written as DOS   , read as UNIX  :   5 "line1"
>> written as DOS   , read as DOS   :   5 "line1"
>> written as DOS   , read as MAC   :   5 "line1"
>> 6C 69 6E 65 31 0D 6C 69 6E 65 32 0D 
>> written as MAC   , read as UNIX  :   5 "line1"
>> written as MAC   , read as DOS   :   5 "line1"
>> written as MAC   , read as MAC   :   5 "line1"
>> NIL
>
> But what does it get for the second line?  Does it get those correct as well!

You have the source!  



Yes, of course.

(dolist (onewline '(:unix :dos :mac))
  (with-open-file (out "test.txt" :direction :output
                       :if-does-not-exist :create :if-exists :supersede
                       :external-format (ext:make-encoding
                                         :charset charset:iso-8859-1
                                         :line-terminator onewline))
    (format out "line1~%line2~%"))
  (with-open-file (in "test.txt" :direction :input 
                      :element-type '(unsigned-byte 8))
    (loop for byte = (read-byte in nil nil)
          while byte do (format t "~2,'0X " byte)
          finally (format t "~%")))
  (dolist (inewline '(:unix :dos :mac))
    (with-open-file (in "test.txt" :direction :input
                        :external-format (ext:make-encoding
                                          :charset charset:iso-8859-1
                                          :line-terminator inewline))
      (let ((line1  (read-line in))
            (line2  (read-line in)))
        (format t "written as ~6A, read as ~6A: ~{~%    ~3D ~S~}~%" 
                onewline inewline
                (list (length line1) line1
                      (length line2) line2))))))

6C 69 6E 65 31 0A 6C 69 6E 65 32 0A 
written as UNIX  , read as UNIX  : 
      5 "line1"
      5 "line2"
written as UNIX  , read as DOS   : 
      5 "line1"
      5 "line2"
written as UNIX  , read as MAC   : 
      5 "line1"
      5 "line2"
6C 69 6E 65 31 0D 0A 6C 69 6E 65 32 0D 0A 
written as DOS   , read as UNIX  : 
      5 "line1"
      5 "line2"
written as DOS   , read as DOS   : 
      5 "line1"
      5 "line2"
written as DOS   , read as MAC   : 
      5 "line1"
      5 "line2"
6C 69 6E 65 31 0D 6C 69 6E 65 32 0D 
written as MAC   , read as UNIX  : 
      5 "line1"
      5 "line2"
written as MAC   , read as DOS   : 
      5 "line1"
      5 "line2"
written as MAC   , read as MAC   : 
      5 "line1"
      5 "line2"
NIL


-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
The mighty hunter
Returns with gifts of plump birds,
Your foot just squashed one.

From: Thomas A. Russ
Subject: Re: unix / dos newline - possible FAQ?
Date: Thu, 12 May 2005 21:47:23 +0000
Message-ID: <ymiwtq4unf8.fsf@sevak.isi.edu>

Pascal Bourguignon <···@informatimago.com> writes:

> 
> ···@sevak.isi.edu (Thomas A. Russ) writes:
> > 
> > But what does it get for the second line?  Does it get those correct as well!
> 
> You have the source!  

Yes, but not the clisp implementation to run it on....

> Yes, of course.

Cool.  I wish this were more universally true, both across Common Lisp
implementations and across other languages as well.

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: GP lisper
Subject: Re: unix / dos newline - possible FAQ?
Date: Fri, 13 May 2005 05:19:46 +0000
Message-ID: <1115961631.53fb1d3b3e49ba856d23501d6ebd880e@teranews>

On 12 May 2005 10:47:10 -0700, <·····@yahoo.com> wrote:
>   Could someone suggest a way to distinguish between the two newlines
> so that read-line grabs everything from the stream that I want it to?
> How would you parse them if there is no way to make read-line work?

Emacs.

Emacs will convert text files, can be run from the command line, etc.


-- 
Everyman has three hearts;
one to show the world, one to show friends, and one only he knows.

From: Pascal Bourguignon
Subject: Re: unix / dos newline - possible FAQ?
Date: Fri, 13 May 2005 14:12:09 +0000
Message-ID: <87vf5nb4g6.fsf@thalassa.informatimago.com>

"Ben" <·····@yahoo.com> writes:
> I'm sorry if this is a FAQ - I've searched using google, and I can't
> find the answer.
>
> I am refreshing my Lisp knowledge this spring.  I had planned to learn
> Ruby but, after reading how much Ruby owes to Lisp, decided to switch.
> It's been 15 years since I even looked at Lisp code, but I remember it
> as a lot of fun.  I'm still having fun.
>
>   I am using Clisp under cygwin to write some scripts for my work.  As
> a result of this mixed environment, I am running into some problems
> with files that contain both dos and unix newlines.
>
>   Could someone suggest a way to distinguish between the two newlines
> so that read-line grabs everything from the stream that I want it to?
> How would you parse them if there is no way to make read-line work?

Now, as we've seen, when reading, clisp tries to be helpful and
considers any occurrence of LF, CR, or CRLF as a new line.

If you want to read them as distinct entities, you're not wanting to
read the file as text, but as binary.

And when writing, #\newline is converted to LF, CR or CRLF depending
on the encoding used.  The difficulty here is that
#\linefeed  is used to denote #\newline, so you cannot output just
#\linefeed on a _text_ stream either.

But working with binary streams you can easily interpret these codes as
you want.
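
As a small illustration of reading the terminators as distinct
entities, the sketch below opens a file as raw bytes and counts lone
CRs, lone LFs and CR-LF pairs.  COUNT-LINE-TERMINATORS is a
hypothetical name, and the byte values 13 and 10 assume an
ASCII-compatible file encoding.

```lisp
(defun count-line-terminators (pathname)
  "Return three values: the number of lone CR, lone LF and CR-LF sequences
in the file PATHNAME, read as raw bytes.  Hypothetical helper."
  (with-open-file (in pathname :direction :input
                               :element-type '(unsigned-byte 8))
    (let ((cr 0) (lf 0) (crlf 0)
          (after-cr nil))          ; was the previous byte a CR?
      (loop for byte = (read-byte in nil nil)
            while byte
            do (cond ((and after-cr (= byte 10))
                      ;; CR immediately followed by LF: recount as one CR-LF.
                      (decf cr)
                      (incf crlf)
                      (setf after-cr nil))
                     ((= byte 13)
                      (incf cr)
                      (setf after-cr t))
                     ((= byte 10)
                      (incf lf)
                      (setf after-cr nil))
                     (t
                      (setf after-cr nil))))
      (values cr lf crlf))))
```

A file for which only one of the three values is non-zero uses a single
convention throughout.  Anything else is a mixed file of the kind
discussed in this thread.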

Of course, you can change byteio:read-line to parse the bytes
differently. For example, if one is reading a LF-line-terminated file,
one might want to parse CRLF as a non line terminator, instead of CR
followed by a line terminator.  The fact is that it's a mess.


(DEFPACKAGE "COM.INFORMATIMAGO.CLISP.BYTEIO"
  (:NICKNAMES "BYTEIO")
  (:USE "COMMON-LISP")
  (:SHADOW "FORMAT" "READ-LINE" #|later add PRINT, PRIN1, PRINC,...|#)
  (:EXPORT "FORMAT" "READ-LINE"))
(IN-PACKAGE "COM.INFORMATIMAGO.CLISP.BYTEIO")

  
(DEFUN FORMAT (DEST CTRL &REST ARGS)
  (WHEN (EQ T DEST) (SETF DEST *STANDARD-OUTPUT*))
  (IF (AND DEST (STREAMP DEST)
           (EQUAL (STREAM-ELEMENT-TYPE DEST) '(UNSIGNED-BYTE 8)))
    (PROGN
      (WRITE-SEQUENCE (EXT:CONVERT-STRING-TO-BYTES
                       (APPLY (FUNCTION COMMON-LISP:FORMAT) NIL CTRL ARGS)
                       CUSTOM:*DEFAULT-FILE-ENCODING*)
                       DEST)
      NIL)
    (APPLY (FUNCTION COMMON-LISP:FORMAT) DEST CTRL ARGS)))


(DEFUN READ-LINE (&OPTIONAL (INPUT-STREAM NIL) (EOF-ERROR-P T)
                            (EOF-VALUE NIL) (RECURSIVE-P NIL)
                            ;; an extension:
                            (NEWLINE NIL))
  "
NEWLINE:  nil   <=> accepts any CR, LF, or CRLF as a new-line.
          :CR   <=> accepts only CR as new-line.
          :LF   <=> accepts only LF as new-line.
          :CRLF <=> accepts only CRLF as new-line.
"
  ;; => line, missing-newline-p
  (SETF INPUT-STREAM (OR INPUT-STREAM *STANDARD-INPUT*))
  (ASSERT (MEMBER NEWLINE '(NIL :CR :CRLF :LF)))
  (IF (AND INPUT-STREAM (STREAMP INPUT-STREAM)
           (EQUAL (STREAM-ELEMENT-TYPE INPUT-STREAM) '(UNSIGNED-BYTE 8)))
    (FLET ((RESULT
            (BUFFER MISSING-NEWLINE-P)
            (COND
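             ;; Signal EOF only when nothing was read; a final line that
             ;; merely lacks its newline is still returned.
             ((AND MISSING-NEWLINE-P EOF-ERROR-P (ZEROP (LENGTH BUFFER)))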
              (ERROR (MAKE-CONDITION
                      'SYSTEM::SIMPLE-END-OF-FILE
                      :STREAM INPUT-STREAM
                      :FORMAT-CONTROL "~S: input stream ~S has reached its end"
                      :FORMAT-ARGUMENTS (LIST 'READ INPUT-STREAM))))
             ((AND MISSING-NEWLINE-P (ZEROP (LENGTH BUFFER)))
              (VALUES EOF-VALUE MISSING-NEWLINE-P))
             (T
              (VALUES (EXT:CONVERT-STRING-FROM-BYTES
                       BUFFER CUSTOM:*DEFAULT-FILE-ENCODING*)
                      MISSING-NEWLINE-P)))))
      (LOOP WITH BUFFER = (MAKE-ARRAY '(4094) :ELEMENT-TYPE '(UNSIGNED-BYTE 8)
                                      :INITIAL-ELEMENT 0
                                      :ADJUSTABLE T
                                      :FILL-POINTER 0)
            WITH EOF = (GENSYM)
            FOR BYTE = (READ-BYTE INPUT-STREAM NIL EOF)
            DO (IF (EQ EOF BYTE)
                 (RETURN (RESULT BUFFER T))
                 (CASE NEWLINE
                   ((:CR)
                    ;; BUG: we cannot peek a binary stream, so we take CR-LF as
                    ;;      a newline followed by a LF. We would have to buffer
                    ;;      the stream.
                    (IF (= 13 BYTE)
                      (RETURN (RESULT BUFFER NIL))
                      (VECTOR-PUSH-EXTEND BYTE BUFFER)))
                   ((:CRLF)
                    (IF (= 13 BYTE)
                      (LET ((BYTE (READ-BYTE INPUT-STREAM NIL EOF)))
                        (COND
                         ((EQ EOF BYTE)
                          (RETURN (RESULT BUFFER T)))
                         ((= 10 BYTE)
                          (RETURN (RESULT BUFFER NIL)))
                         (T
                          (VECTOR-PUSH-EXTEND 13 BUFFER)
                          (VECTOR-PUSH-EXTEND BYTE BUFFER))))
                      (VECTOR-PUSH-EXTEND BYTE BUFFER)))
                   ((:LF)
                    (IF (= 10 BYTE)
                      (RETURN (RESULT BUFFER NIL))
                      (VECTOR-PUSH-EXTEND BYTE BUFFER)))
                   ((NIL)
                    ;; BUG: we cannot peek a binary stream, so we take CR-LF as
                    ;;      a newline followed by a LF. We would have to buffer
                    ;;      the stream.
                    (IF (OR (= 10 BYTE) (= 13 BYTE))
                      (RETURN (RESULT BUFFER NIL))
                      (VECTOR-PUSH-EXTEND BYTE BUFFER)))))))
    (COMMON-LISP:READ-LINE INPUT-STREAM EOF-ERROR-P EOF-VALUE RECURSIVE-P)))



(IN-PACKAGE "COMMON-LISP-USER")

(SETF CUSTOM:*DEFAULT-FILE-ENCODING* (EXT:MAKE-ENCODING
                                      :CHARSET CHARSET:ISO-8859-1
                                      :LINE-TERMINATOR :UNIX))

(LET ((CR   (LIST (CODE-CHAR 13)))
      (LF   (LIST (CODE-CHAR 10)))
      (CRLF (LIST (CODE-CHAR 13)  (CODE-CHAR 10))))
  (DOLIST (NL (LIST CR LF CRLF))
    (LET ((P (FORMAT NIL "(~D) = ~{~C~} x" (LENGTH NL) NL)))
      ;; (print (list (length nl) (length p)))
      (PRINT (EXT:CONVERT-STRING-TO-BYTES P CUSTOM:*DEFAULT-FILE-ENCODING*))))
  (DOLIST (TEST (LIST (LIST CR LF CRLF)
                      (LIST CR CRLF LF)
                      (LIST LF CRLF CR)))
    (PRINT (EXT:CONVERT-STRING-TO-BYTES
            (APPLY (FUNCTION FORMAT) NIL
              "line1:field1~{~C~}field2~{~C~}field3~{~C~}line1:field1~{~C~}field2~{~C~}field3~{~C~}"
              (APPEND TEST TEST)) CUSTOM:*DEFAULT-FILE-ENCODING* ))))



(DEFPARAMETER +ASCII+
  #(
    NUL SOH STX ETX EOT ENQ ACK BEL BS TAB LF VT FF CR SO SI 
    DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US 
    SP NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL 
    NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL 
    NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL 
    NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL 
    NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL 
    NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL DEL 
    PAD HOP BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI SS2 SS3 
    DCS PU1 PU2 STS CCH MW SPA EPA SOS SGCI SCI CSI ST OSC PM APC))

(DEFUN SHOW-ALL (STRING &OPTIONAL (OUT T))
  (LOOP FOR CH ACROSS STRING
        DO (SHOW-CHAR CH OUT)))

(DEFUN SHOW-CHAR (CH &OPTIONAL (OUT T))
  (IF (OR (<= (LENGTH +ASCII+) (CHAR-CODE CH))
          (NULL (AREF +ASCII+  (CHAR-CODE CH))))
    (PRINC CH OUT)
    (FORMAT OUT "<~A>"  (AREF +ASCII+  (CHAR-CODE CH)))))


(LET ((CR   (LIST (CODE-CHAR 13)))
      (LF   (LIST (CODE-CHAR 10)))
      (CRLF (LIST (CODE-CHAR 13)  (CODE-CHAR 10))))
  (DOLIST (TEST (LIST (LIST CR LF CRLF)
                      (LIST LF CR CRLF)
                      (LIST CR CRLF LF)
                      (LIST LF CRLF CR)
                      (LIST CRLF CR LF)
                      (LIST CRLF LF CR)))
    (FORMAT T "~2%")
    (WITH-OPEN-FILE (OUT "test.txt" :DIRECTION :OUTPUT
                         :ELEMENT-TYPE '(UNSIGNED-BYTE 8)
                         :IF-DOES-NOT-EXIST :CREATE :IF-EXISTS :SUPERSEDE)
      (APPLY (FUNCTION BYTEIO:FORMAT) OUT "line1:field1~{~C~}field2~{~C~}field3~{~C~}line1:field1~{~C~}field2~{~C~}field3~{~C~}"  (APPEND TEST TEST)))
    (WITH-OPEN-FILE (IN "test.txt" :DIRECTION :INPUT 
                        :ELEMENT-TYPE '(UNSIGNED-BYTE 8))
      (LOOP FOR BYTE = (READ-BYTE IN NIL NIL)
            WHILE BYTE DO (SHOW-CHAR (CODE-CHAR BYTE))
            FINALLY (FORMAT T "~%")))
    (DOLIST (NEWLINE '(NIL :CR :LF :CRLF))
      (WITH-OPEN-FILE (IN "test.txt" :DIRECTION :INPUT
                          :ELEMENT-TYPE '(UNSIGNED-BYTE 8))
        (FORMAT T "reading with ~6A as newline: ~%" NEWLINE)
        (LOOP FOR LINE =  (BYTEIO:READ-LINE IN NIL NIL NIL NEWLINE)
              FOR NUM FROM 1
              WHILE LINE DO
              (FORMAT T "  line #~2D :  ~3D byte~:[ ~;s~]: ~A~%"
                      NUM
                      (LENGTH LINE)
                      (< 1 (LENGTH LINE))
                      (WITH-OUTPUT-TO-STRING (S) (SHOW-ALL LINE S))))))))


(WITH-OPEN-FILE (OUT "test.bin" :DIRECTION :OUTPUT
                         :ELEMENT-TYPE '(UNSIGNED-BYTE 8)
                         :IF-DOES-NOT-EXIST :CREATE :IF-EXISTS :SUPERSEDE)
  (BYTEIO:FORMAT OUT "-~C-~C-~C~C-" #\RETURN #\LINEFEED #\RETURN #\LINEFEED))

(WITH-OPEN-FILE (OUT "test.txt" :DIRECTION :OUTPUT
                     :ELEMENT-TYPE 'CHARACTER
                     :EXTERNAL-FORMAT (EXT:MAKE-ENCODING
                                       :CHARSET CHARSET:ISO-8859-1
                                       :LINE-TERMINATOR :DOS)
                     :IF-DOES-NOT-EXIST :CREATE :IF-EXISTS :SUPERSEDE)
  (FORMAT OUT "-~C-~C-~C~C-" #\RETURN #\LINEFEED #\RETURN #\LINEFEED))


(DEFUN DUMP-CHARS (FILE)
  (WITH-OPEN-FILE (IN FILE :DIRECTION :INPUT
                      :ELEMENT-TYPE '(UNSIGNED-BYTE 8))
    (LET ((BUFFER (MAKE-ARRAY '(4000) :FILL-POINTER T
                              :ELEMENT-TYPE  '(UNSIGNED-BYTE 8)
                              :INITIAL-ELEMENT 0)))
      (SETF (FILL-POINTER BUFFER) (READ-SEQUENCE BUFFER IN))
      (SHOW-ALL (MAP 'STRING (FUNCTION CODE-CHAR) BUFFER)))))
(TERPRI)(DUMP-CHARS "test.txt")
(TERPRI)(DUMP-CHARS "test.bin")
(TERPRI)


#||
(load"clisp-byteio.lisp")
;; Loading file clisp-byteio.lisp ...
#(40 49 41 32 61 32 13 32 120) 
#(40 49 41 32 61 32 10 32 120) 
#(40 50 41 32 61 32 13 10 32 120) 
#(108 105 110 101 49 58 102 105 101 108 100 49 13 102 105 101 108 100 50 10 102
  105 101 108 100 51 13 10 108 105 110 101 49 58 102 105 101 108 100 49 13 102
  105 101 108 100 50 10 102 105 101 108 100 51 13 10) 
#(108 105 110 101 49 58 102 105 101 108 100 49 13 102 105 101 108 100 50 13 10
  102 105 101 108 100 51 10 108 105 110 101 49 58 102 105 101 108 100 49 13 102
  105 101 108 100 50 13 10 102 105 101 108 100 51 10) 
#(108 105 110 101 49 58 102 105 101 108 100 49 10 102 105 101 108 100 50 13 10
  102 105 101 108 100 51 13 108 105 110 101 49 58 102 105 101 108 100 49 10 102
  105 101 108 100 50 13 10 102 105 101 108 100 51 13) 

line1:field1<CR>field2<LF>field3<CR><LF>line1:field1<CR>field2<LF>field3<CR><LF>
reading with NIL    as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    6 bytes: field2
  line # 3 :    6 bytes: field3
  line # 4 :    0 byte : 
  line # 5 :   12 bytes: line1:field1
  line # 6 :    6 bytes: field2
  line # 7 :    6 bytes: field3
  line # 8 :    0 byte : 
reading with CR     as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :   13 bytes: field2<LF>field3
  line # 3 :   13 bytes: <LF>line1:field1
  line # 4 :   13 bytes: field2<LF>field3
  line # 5 :    1 byte : <LF>
reading with LF     as newline: 
  line # 1 :   19 bytes: line1:field1<CR>field2
  line # 2 :    7 bytes: field3<CR>
  line # 3 :   19 bytes: line1:field1<CR>field2
  line # 4 :    7 bytes: field3<CR>
reading with CRLF   as newline: 
  line # 1 :   26 bytes: line1:field1<CR>field2<LF>field3
  line # 2 :   26 bytes: line1:field1<CR>field2<LF>field3


line1:field1<LF>field2<CR>field3<CR><LF>line1:field1<LF>field2<CR>field3<CR><LF>
reading with NIL    as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    6 bytes: field2
  line # 3 :    6 bytes: field3
  line # 4 :    0 byte : 
  line # 5 :   12 bytes: line1:field1
  line # 6 :    6 bytes: field2
  line # 7 :    6 bytes: field3
  line # 8 :    0 byte : 
reading with CR     as newline: 
  line # 1 :   19 bytes: line1:field1<LF>field2
  line # 2 :    6 bytes: field3
  line # 3 :   20 bytes: <LF>line1:field1<LF>field2
  line # 4 :    6 bytes: field3
  line # 5 :    1 byte : <LF>
reading with LF     as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :   14 bytes: field2<CR>field3<CR>
  line # 3 :   12 bytes: line1:field1
  line # 4 :   14 bytes: field2<CR>field3<CR>
reading with CRLF   as newline: 
  line # 1 :   26 bytes: line1:field1<LF>field2<CR>field3
  line # 2 :   26 bytes: line1:field1<LF>field2<CR>field3


line1:field1<CR>field2<CR><LF>field3<LF>line1:field1<CR>field2<CR><LF>field3<LF>
reading with NIL    as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    6 bytes: field2
  line # 3 :    0 byte : 
  line # 4 :    6 bytes: field3
  line # 5 :   12 bytes: line1:field1
  line # 6 :    6 bytes: field2
  line # 7 :    0 byte : 
  line # 8 :    6 bytes: field3
reading with CR     as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    6 bytes: field2
  line # 3 :   20 bytes: <LF>field3<LF>line1:field1
  line # 4 :    6 bytes: field2
  line # 5 :    8 bytes: <LF>field3<LF>
reading with LF     as newline: 
  line # 1 :   20 bytes: line1:field1<CR>field2<CR>
  line # 2 :    6 bytes: field3
  line # 3 :   20 bytes: line1:field1<CR>field2<CR>
  line # 4 :    6 bytes: field3
reading with CRLF   as newline: 
  line # 1 :   19 bytes: line1:field1<CR>field2
  line # 2 :   26 bytes: field3<LF>line1:field1<CR>field2
  line # 3 :    7 bytes: field3<LF>


line1:field1<LF>field2<CR><LF>field3<CR>line1:field1<LF>field2<CR><LF>field3<CR>
reading with NIL    as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    6 bytes: field2
  line # 3 :    0 byte : 
  line # 4 :    6 bytes: field3
  line # 5 :   12 bytes: line1:field1
  line # 6 :    6 bytes: field2
  line # 7 :    0 byte : 
  line # 8 :    6 bytes: field3
reading with CR     as newline: 
  line # 1 :   19 bytes: line1:field1<LF>field2
  line # 2 :    7 bytes: <LF>field3
  line # 3 :   19 bytes: line1:field1<LF>field2
  line # 4 :    7 bytes: <LF>field3
reading with LF     as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    7 bytes: field2<CR>
  line # 3 :   19 bytes: field3<CR>line1:field1
  line # 4 :    7 bytes: field2<CR>
  line # 5 :    7 bytes: field3<CR>
reading with CRLF   as newline: 
  line # 1 :   19 bytes: line1:field1<LF>field2
  line # 2 :   26 bytes: field3<CR>line1:field1<LF>field2
  line # 3 :    6 bytes: field3


line1:field1<CR><LF>field2<CR>field3<LF>line1:field1<CR><LF>field2<CR>field3<LF>
reading with NIL    as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    0 byte : 
  line # 3 :    6 bytes: field2
  line # 4 :    6 bytes: field3
  line # 5 :   12 bytes: line1:field1
  line # 6 :    0 byte : 
  line # 7 :    6 bytes: field2
  line # 8 :    6 bytes: field3
reading with CR     as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    7 bytes: <LF>field2
  line # 3 :   19 bytes: field3<LF>line1:field1
  line # 4 :    7 bytes: <LF>field2
  line # 5 :    7 bytes: field3<LF>
reading with LF     as newline: 
  line # 1 :   13 bytes: line1:field1<CR>
  line # 2 :   13 bytes: field2<CR>field3
  line # 3 :   13 bytes: line1:field1<CR>
  line # 4 :   13 bytes: field2<CR>field3
reading with CRLF   as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :   26 bytes: field2<CR>field3<LF>line1:field1
  line # 3 :   14 bytes: field2<CR>field3<LF>


line1:field1<CR><LF>field2<LF>field3<CR>line1:field1<CR><LF>field2<LF>field3<CR>
reading with NIL    as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :    0 byte : 
  line # 3 :    6 bytes: field2
  line # 4 :    6 bytes: field3
  line # 5 :   12 bytes: line1:field1
  line # 6 :    0 byte : 
  line # 7 :    6 bytes: field2
  line # 8 :    6 bytes: field3
reading with CR     as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :   14 bytes: <LF>field2<LF>field3
  line # 3 :   12 bytes: line1:field1
  line # 4 :   14 bytes: <LF>field2<LF>field3
reading with LF     as newline: 
  line # 1 :   13 bytes: line1:field1<CR>
  line # 2 :    6 bytes: field2
  line # 3 :   20 bytes: field3<CR>line1:field1<CR>
  line # 4 :    6 bytes: field2
  line # 5 :    7 bytes: field3<CR>
reading with CRLF   as newline: 
  line # 1 :   12 bytes: line1:field1
  line # 2 :   26 bytes: field2<LF>field3<CR>line1:field1
  line # 3 :   13 bytes: field2<LF>field3

-<CR>-<CR><LF>-<CR><CR><LF>-
-<CR>-<LF>-<CR><LF>-
;; Loaded file clisp-byteio.lisp
T
[299]>
||#

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCS d? s++:++ a+ C+++ UL++++ P--- L+++ E+++ W++ N+++ o-- K- w--- 
O- M++ V PS PE++ Y++ PGP t+ 5+ X++ R !tv b+++ DI++++ D++ 
G e+++ h+ r-- z? 
------END GEEK CODE BLOCK------