I have tab-delimited data files all over the place. I'd like to
build some of my experimental programs around reading them in, doing
whatever processing, and spitting new tab-delimited files back out. The
hangup is that these files have a mix of line termination schemes. Some
terminate with an lf (#\Linefeed), some use a cr (#\Return), and others
use the crlf pair.
Is there such a thing as readline-preserving-whitespace (presumably a
function I could tell to read into a string until it sees either a cr or
nl character, leaving that character on the input) or at least a neat
way of getting that effect?
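For reference, here is a minimal sketch of the function described above. It is not from the thread. It uses PEEK-CHAR so the terminator stays on the stream, much as READ-PRESERVING-WHITESPACE leaves trailing whitespace unread. The name `read-line-preserving-terminator` is hypothetical. Which terminator it actually sees depends on how the implementation's character streams translate line endings.

```lisp
(defun read-line-preserving-terminator (&optional (stream *standard-input*))
  "Read characters up to, but not including, the next #\\Newline,
#\\Linefeed or #\\Return.  The terminator is left unread on STREAM.
Returns the line and the terminator (or NIL at end of file)."
  (let ((line (make-array 0 :element-type 'character
                            :adjustable t :fill-pointer 0)))
    (loop
      (let ((c (peek-char nil stream nil nil)))   ; look without consuming
        (when (or (null c)
                  (member c '(#\Newline #\Linefeed #\Return)))
          (return (values line c)))
        ;; Not a terminator: consume it and add it to the line.
        (vector-push-extend (read-char stream) line)))))
```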
I attempted to build one from parts, but it's broken and I do not
understand why.
(defun readline-funky-terminals (&optional (stream t))
  (loop for Line = (make-string 1000)
        and C = (read-char stream) then (read-char stream)
        and place = 0 then (1+ place)
        do (print C) ;debug
           (case C
             (#\Linefeed
              (print 'linefeed) ;debug
              (return (values Line #\Linefeed)))
             (#\Return
              (print 'return) ;debug
              (return (values Line #\Return)))
             (otherwise
              (format t "setting ~W at ~A" C place) ;debug
              (print (setf (elt Line place) C)))) ;debug!!
        until (> place 1000)
        finally
           (print 'crap) ;debug
           (values Line #\Newline)))
The intended behavior: it reads characters from a stream until it sees
a newline or carriage return, at which point it returns two values: a
string containing all the prior characters, and the terminating
character. If the string fills up first, it aborts and returns
#\Newline as the terminator.
Here's what has me confused: at the end of a line from a file I got
this output:
#\o setting #\o at 62
#\o
#\p setting #\p at 63
#\p
#\Newline
LINEFEED
"" ;
#\Newline
The CASE saw a linefeed, but everything else wants to call it a
newline. Is this due to an automatic translation the printer does when
showing all of this? In other words, are those #\Newlines really
#\Linefeeds? If so, can I suppress that behavior?
Also, why is the string printed out empty? Am I abusing sequences?
I figure I am since I'm treating them just like I would C strings.
A few other questions I haven't had time to experiment with much yet.
How do I get format to print a real-life tab character without passing
it as an argument? I'd like to print records back out in tab-delimited
fashion, but I don't want to double the number of arguments I pass just
to get tabbed output.
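Here is one possible answer. It is not from the thread. FORMAT copies any literal character in its control string straight to the output, so the tab can live in the control string instead of being passed as an argument. The sketch builds such a control string once. The names `*tab-join*` and `write-record` are hypothetical. Note that `~T` is not what you want here: it moves to a column by emitting spaces, not a tab character.

```lisp
(defparameter *tab-join*
  ;; Builds "~{~A~^<TAB>~}": iterate over a list, print each element
  ;; with ~A, and emit a literal tab between elements (~^ stops before
  ;; the separator after the last one).
  (concatenate 'string "~{~A~^" (string #\Tab) "~}")
  "FORMAT control string for one tab-delimited record.")

(defun write-record (fields &optional (stream *standard-output*))
  "Write the list FIELDS to STREAM as one tab-delimited line."
  (format stream *tab-join* fields)
  (terpri stream))
```

For example, `(write-record '("name" 42 "notes"))` prints the three fields separated by tabs and ends the line. The FORMAT call takes only one argument, the list of fields.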
With regard to calling C code, I've found
http://clisp.sourceforge.net/impnotes.html#affi but is there a page
anywhere that gives more examples? And is there a way for C programs to
call Lisp programs inside clisp? Or is there a lightweight lisp I could
embed in a C program?
In the above function, the error handling is crap. What I'd prefer
would be a buffer object I can dynamically grow. Is there a class like
that available in a library I can play with? Or is there a way I can
get that effect with built-ins without much complexity?
Any word on the sanity of simply having this function throw an error if
the fixed-size buffer overruns? I suppose I could let it go and fall
back on elt's error, but that seems ugly.
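Here is a minimal sketch of that approach. It is not from the thread. It defines a specific condition and signals it with ERROR, so callers can catch it with HANDLER-CASE instead of relying on ELT's out-of-bounds error. The condition `line-too-long` and the function `read-line-bounded` are hypothetical names. This version signals whenever LIMIT characters arrive without a terminator, even if the very next character would have been one.

```lisp
(define-condition line-too-long (error)
  ((limit :initarg :limit :reader line-too-long-limit))
  (:report (lambda (condition stream)
             (format stream "Line longer than ~D characters."
                     (line-too-long-limit condition)))))

(defun read-line-bounded (&optional (stream *standard-input*) (limit 1000))
  "Read at most LIMIT characters before a #\\Newline, #\\Linefeed or
#\\Return.  Returns the line and its terminator; signals LINE-TOO-LONG
if the buffer fills first."
  (let ((line (make-string limit)))
    ;; The DOTIMES result form runs only if the buffer fills up.
    (dotimes (place limit (error 'line-too-long :limit limit))
      (let ((c (read-char stream)))
        (when (member c '(#\Newline #\Linefeed #\Return))
          (return (values (subseq line 0 place) c)))
        (setf (char line place) c)))))
```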
--
B.B. --I am not a goat! thegoat4 at airmail.net
Fire the stupid--Vote.
Hi B.B.,
> I have tab-delimited data files all over the place. I'd like to build
> some of my experimental programs around reading them in, doing whatever
> processing, and spitting new tab-delimited files back out. The hangup
> is that these files have a mix of line termination schemes. Some
> terminate with an lf (#\Linefeed), some use a cr (#\Return), and others
> use the crlf pair.
You mentioned CLISP later in your post. There is no way to detect the
distinction using a character stream in CLISP:
<http://clisp.sourceforge.net/impnotes.html#clhs-newline>
To distinguish this you will need a binary stream. Note that CLISP
prohibits setting *terminal-io* as a binary stream:
<http://www.geocrawler.com/mail/msg.php3?msg_id=6008951&list=1124>
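Here is a minimal sketch of the binary-stream approach. It is not from the thread. It reads the file as octets and splits them by hand, so LF, CR and CRLF stay distinguishable. It assumes single-byte text (ASCII or Latin-1), since each octet becomes a character via CODE-CHAR. UTF-8 or other multi-byte encodings would need real decoding. The names `read-file-octets` and `split-octet-lines` are hypothetical.

```lisp
(defun read-file-octets (path)
  "Return the contents of the file at PATH as a vector of octets."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      ;; READ-SEQUENCE returns the index of the first element not filled.
      (subseq octets 0 (read-sequence octets in)))))

(defun split-octet-lines (octets)
  "Split OCTETS into a list of (LINE . TERMINATOR) conses, where
TERMINATOR is :LF, :CR, :CRLF, or :EOF for a final unterminated line.
Assumes single-byte text: each octet becomes a character via CODE-CHAR."
  (let ((lines '())
        (start 0)
        (i 0)
        (n (length octets)))
    (flet ((emit (end terminator)
             ;; Collect the octets from START below END as one line.
             (push (cons (map 'string #'code-char (subseq octets start end))
                         terminator)
                   lines)))
      (loop while (< i n)
            do (case (aref octets i)
                 (10 (emit i :lf)                 ; bare LF
                     (setf start (incf i)))
                 (13 (if (and (< (1+ i) n) (= (aref octets (1+ i)) 10))
                         (progn (emit i :crlf)    ; CR followed by LF
                                (setf start (incf i 2)))
                         (progn (emit i :cr)      ; bare CR
                                (setf start (incf i)))))
                 (t (incf i))))
      (when (< start n)
        (emit n :eof)))
    (nreverse lines)))
```

For example, `(split-octet-lines (read-file-octets "data.tsv"))` (file name hypothetical) returns every line with the terminator it actually had.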
Regards,
Adam
B.B. wrote:
> I have tab-delimited data files all over the place. I'd like to
> build some of my experimental programs around reading them in, doing
> whatever processing, and spitting new tab-delimited files back out. The
> hangup is that these files have a mix of line termination schemes. Some
> terminate with an lf (#\Linefeed), some use a cr (#\Return), and others
> use the crlf pair.
> Is there such a thing as readline-preserving-whitespace (presumably a
> function I could tell to read into a string until it sees either a cr or
> nl character, leaving that character on the input) or at least a neat
> way of getting that effect?
> I attempted to build one from parts, but it's broken and I do not
> understand why.
>
> (defun readline-funky-terminals (&optional (stream t))
> (loop for Line = (make-string 1000)
This should be "(loop with line = ...". "With" is used to initialize
stuff once before iterating; "for" is an iteration clause, re-evaluated
on every pass. So you were reinitializing "line" to a new string on
each read-char.
> and C = (read-char stream) then (read-char stream)
> and place = 0 then (1+ place)
> do (print C) ;debug
> (case C
> (#\Linefeed
> (print 'linefeed) ;debug
> (return (values Line #\Linefeed)))
> (#\Return
> (print 'return) ;debug
> (return (values Line #\Return)))
> (otherwise
> (format t "setting ~W at ~A" C place) ;debug
> (print (setf (elt Line place) C)))) ;debug!!
> until (> place 1000)
> finally
> (print 'crap) ;debug
> (values Line #\Newline)))
>
...snip...
> Also, why is the string printed out empty? Am I abusing sequences?
> I figure I am since I'm treating them just like I would C strings.
What you did was OK, and if there are no other issues it may work once
you change the "for" to "with", but ...
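Here is a minimal sketch of the original function with that fix applied. It is not from the thread. Besides changing FOR to WITH, three other things in the original would bite:

- A FINALLY clause discards its values unless it uses RETURN.
- The (> place 1000) test lets ELT index one past the end of the buffer.
- The returned string should be trimmed with SUBSEQ to the characters actually read.

The sketch also treats #\Newline as a terminator, since a character stream may hand back line ends that way. It changes the default stream from T to *standard-input* and adds an optional SIZE argument. Those changes are choices made for this sketch, not part of the original.

```lisp
(defun readline-funky-terminals (&optional (stream *standard-input*) (size 1000))
  "Read characters from STREAM until a line terminator.  Returns the
characters read (trimmed to length) and the terminating character, or
the full buffer and #\\Newline if SIZE characters arrive first."
  (loop with line = (make-string size)    ; WITH: evaluated once per call
        for place from 0 below size        ; never index past the buffer
        do (let ((c (read-char stream)))
             (when (member c '(#\Newline #\Linefeed #\Return))
               (return (values (subseq line 0 place) c)))
             (setf (char line place) c))
        ;; FINALLY's values are discarded without an explicit RETURN.
        finally (return (values line #\Newline))))
```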
> In the above function, the error handling is crap. What I'd prefer
> would be a buffer object I can dynamically grow. Is there a class like
> that available in a library I can play with? Or is there a way I can
> get that effect with built-ins without much complexity?
here is another way to build up a string, in a function which changes
stuff like "glPushMatrix" to the symbol 'gl-push-matrix:
(defun lisp-fn (n$) ;; string input
  (loop with ln = (make-array 0 :element-type 'character
                                :adjustable t :fill-pointer 0)
        and n$len = (length n$)
        for n upfrom 0
        for c across n$
        when (and (plusp n)
                  (upper-case-p c)
                  (or (lower-case-p (elt n$ (1- n)))
                      (unless (>= (1+ n) n$len)
                        (lower-case-p (elt n$ (1+ n))))))
          do (vector-push-extend #\- ln)
        do (vector-push-extend (char-upcase c) ln)
        finally (return (intern ln))))
If you just return ln it will be a lisp string of the correct length.
kt
--
Home? http://tilton-technology.com
Cells? http://www.common-lisp.net/project/cells/
Cello? http://www.common-lisp.net/project/cello/
Why Lisp? http://alu.cliki.net/RtL%20Highlight%20Film
Your Project Here! http://alu.cliki.net/Industry%20Application
Kenny Tilton <·······@nyc.rr.com> writes:
> B.B. wrote:
>
> > In the above function, the error handling is crap. What I'd prefer
> > would be a buffer object I can dynamically grow. Is there a class like
> > that available in a library I can play with? Or is there a way I can
> > get that effect with built-ins without much complexity?
>
> here is another way to build up a string, in a function which changes
> stuff like "glPushMatrix" to the symbol 'gl-push-matrix:
>
> (defun lisp-fn (n$) ;; string input
> (loop with ln = (make-array 0 :element-type 'character
> :adjustable t :fill-pointer 0)
> and n$len = (length n$)
> for n upfrom 0
> for c across n$
> when (and (plusp n)
> (upper-case-p c)
> (or (lower-case-p (elt n$ (1- n)))
> (unless (>= (1+ n) n$len)
> (lower-case-p (elt n$ (1+ n))))))
> do (vector-push-extend #\- ln)
> do (vector-push-extend (char-upcase c) ln)
> finally (return (intern ln))))
>
> If you just return ln it will be a lisp string of the correct length.
For the OP, there are two techniques to build up a string of unknown
length: the one Kenny just showed you, using adjustable vectors and
vector-push-extend, or with-output-to-string:
(with-output-to-string (string)
  (loop with *print-pretty* = nil
        ...
        do (print foo string)
        ...))
Use whichever feels stylistically better at the time, and don't sweat
the technique. In CMUCL/SBCL, w-o-t-s is actually more efficient.
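For comparison, here is a sketch of Kenny's camelCase conversion written with WITH-OUTPUT-TO-STRING instead of an adjustable vector. It is not from the thread. The name `lisp-name` is hypothetical, and unlike Kenny's version it returns the string rather than interning it.

```lisp
(defun lisp-name (name)
  "Convert a camelCase NAME such as \"glPushMatrix\" to \"GL-PUSH-MATRIX\"."
  (with-output-to-string (out)
    (loop with len = (length name)
          for n from 0
          for c across name
          ;; Insert a hyphen before an uppercase letter that starts a new word.
          when (and (plusp n)
                    (upper-case-p c)
                    (or (lower-case-p (char name (1- n)))
                        (and (< (1+ n) len)
                             (lower-case-p (char name (1+ n))))))
            do (write-char #\- out)
          ;; Unconditional clause: every character is written, upcased.
          do (write-char (char-upcase c) out))))
```

The string stream grows as needed, so there is no buffer size to manage.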
--
/|_ .-----------------------.
,' .\ / | No to Imperialist war |
,--' _,' | Wage class war! |
/ / `-----------------------'
( -. |
| ) |
(`-. '--.)
`. )----'
Try this function
(defun readline (&optional (stream *standard-input*))
  "Similar to common-lisp:read-line but has control
  over line termination. Will return line terminated
  by LF, CR or end-of-file."
  (let ((string (make-array 128 :adjustable t :fill-pointer 0 :element-type 'base-char)))
    (loop for c = (read-char stream nil nil)
          while c do
            (case c
              ((#\lf #\cr) (return (values string c)))
              (otherwise (vector-push-extend c string)))
          finally (return (values string :eof)))))
Wade
Wade Humeniuk <····································@telus.net> writes:
> Try this function
>
> (defun readline (&optional (stream *standard-input*))
> "Similar to common-lisp:read-line but has control
> over line termination. Will return line terminated
> by LF, CR or end-of-file."
> (let ((string (make-array 128 :adjustable t :fill-pointer 0 :element-type 'base-char)))
> (loop for c = (read-char stream nil nil)
> while c do
> (case c
> ((#\lf #\cr) (return (values string c)))
> (otherwise (vector-push-extend c string)))
> finally (return (values string :eof)))))
Does that really work in some implementation? Modulo bivalent streams,
if stream is a character-stream, as it must be given that you're
passing it to READ-CHAR, then you'd never get a #\lf or #\cr back as
the stream would convert them (or the right combination thereof) to a
#\Newline character.
-Peter
--
Peter Seibel ·····@javamonkey.com
Lisp is the red pill. -- John Fraser, comp.lang.lisp
Peter Seibel <·····@javamonkey.com> writes:
> Wade Humeniuk <····································@telus.net> writes:
>
> > Try this function
> >
> > (defun readline (&optional (stream *standard-input*))
> > "Similar to common-lisp:read-line but has control
> > over line termination. Will return line terminated
> > by LF, CR or end-of-file."
> > (let ((string (make-array 128 :adjustable t :fill-pointer 0 :element-type 'base-char)))
> > (loop for c = (read-char stream nil nil)
> > while c do
> > (case c
> > ((#\lf #\cr) (return (values string c)))
> > (otherwise (vector-push-extend c string)))
> > finally (return (values string :eof)))))
>
> Does that really work in some implementation? Modulo bivalent streams,
> if stream is a character-stream, as it must be given that you're
> passing it to READ-CHAR, then you'd never get a #\lf or #\cr back as
> the stream would convert them (or the right combination thereof) to a
> #\Newline character.
Note, however, that per CLHS 13.1.7, #\Newline and #\Linefeed might in
fact be the same character.
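Here is a quick way to see which case a particular implementation falls into. It is not from the thread. It relies on the semi-standard names #\Linefeed and #\Return, which CLHS 13.1.7 does not require every implementation to support. The results are implementation-dependent, so no output is shown. The name `line-end-characters` is hypothetical.

```lisp
(defun line-end-characters ()
  "Report the codes of #\\Newline, #\\Linefeed and #\\Return, and whether
#\\Newline and #\\Linefeed are the same character here."
  (list :newline (char-code #\Newline)
        :linefeed (char-code #\Linefeed)
        :return (char-code #\Return)
        :newline-is-linefeed (eql #\Newline #\Linefeed)))
```

Where `:newline-is-linefeed` is true, a linefeed read from a file is #\Newline, and the printer will show it under that name.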
--
Duane Rettig ·····@franz.com Franz Inc. http://www.franz.com/
555 12th St., Suite 1450 http://www.555citycenter.com/
Oakland, Ca. 94607 Phone: (510) 452-2000; Fax: (510) 452-0182
Duane Rettig <·····@franz.com> writes:
> Peter Seibel <·····@javamonkey.com> writes:
>
>> Does that really work in some implementation? Modulo bivalent streams,
>> if stream is a character-stream, as it must be given that you're
>> passing it to READ-CHAR, then you'd never get a #\lf or #\cr back as
>> the stream would convert them (or the right combination thereof) to a
>> #\Newline character.
>
> Note however in 13.1.7 that #\newline and #\linefeed might in fact be
> the same character.
But they might not. So this code should probably also treat #\Newline
as an end of line marker in case, for instance, on a CRLF platform the
#\Return #\Linefeed sequence gets translated to a distinct #\Newline
character that isn't EQL to either of those two characters.
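Here is a sketch of Wade's function with that suggestion applied. It is not from the thread. It also makes two other changes:

- It replaces #\lf and #\cr, which are not among the standard or semi-standard character names in CLHS 13.1.7, with #\Linefeed and #\Return.
- It uses element type CHARACTER rather than BASE-CHAR, so VECTOR-PUSH-EXTEND cannot fail on a non-base character.

The name `readline-any` is hypothetical.

```lisp
(defun readline-any (&optional (stream *standard-input*))
  "Like CL:READ-LINE, but also returns the terminating character:
#\\Newline, #\\Linefeed or #\\Return, or :EOF at end of file."
  (let ((string (make-array 128 :adjustable t :fill-pointer 0
                                :element-type 'character)))
    (loop for c = (read-char stream nil nil)
          while c
          ;; MEMBER (not CASE) so a Newline that is EQL to Linefeed
          ;; does not produce a duplicate CASE key.
          do (if (member c '(#\Newline #\Linefeed #\Return))
                 (return (values string c))
                 (vector-push-extend c string))
          finally (return (values string :eof)))))
```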
-Peter
Peter Seibel <·····@javamonkey.com> writes:
> Duane Rettig <·····@franz.com> writes:
>
> > Peter Seibel <·····@javamonkey.com> writes:
> >
> >> Does that really work in some implementation? Modulo bivalent streams,
> >> if stream is a character-stream, as it must be given that you're
> >> passing it to READ-CHAR, then you'd never get a #\lf or #\cr back as
> >> the stream would convert them (or the right combination thereof) to a
> >> #\Newline character.
> >
> > Note however in 13.1.7 that #\newline and #\linefeed might in fact be
> > the same character.
>
> But they might not. So this code should probably also treat #\Newline
> as an end of line marker in case, for instance, on a CRLF platform the
> #\Return #\Linefeed sequence gets translated to a distinct #\Newline
> character that isn't EQL to either of those two characters.
Your question was "Does that really work in some implementation?",
and the answer should be "Yes, on any implementation for which #\Newline
and #\Linefeed are the same". The #\cr is redundant in the code above;
if the system is one which combines cr/lf into one character, but the
lisp's read-char doesn't do that merge to #\newline, then there are
likely to be other problems anyway.
The fact that there are likely implementations on which the above code
doesn't work makes the code non-portable, but it doesn't stop the answer
to the question "are there some systems on which it works?" from being
"yes".
Duane Rettig <·····@franz.com> writes:
> Peter Seibel <·····@javamonkey.com> writes:
>
>> Duane Rettig <·····@franz.com> writes:
>>
>> > Peter Seibel <·····@javamonkey.com> writes:
>> >
>> >> Does that really work in some implementation? Modulo bivalent streams,
>> >> if stream is a character-stream, as it must be given that you're
>> >> passing it to READ-CHAR, then you'd never get a #\lf or #\cr back as
>> >> the stream would convert them (or the right combination thereof) to a
>> >> #\Newline character.
>> >
>> > Note however in 13.1.7 that #\newline and #\linefeed might in fact be
>> > the same character.
>>
>> But they might not. So this code should probably also treat #\Newline
>> as an end of line marker in case, for instance, on a CRLF platform the
>> #\Return #\Linefeed sequence gets translated to a distinct #\Newline
>> character that isn't EQL to either of those two characters.
>
> Your question was "Does that really work on some implementation?",
> and the answer should be "Yes, on any implementation for which #\Newline
> and #\Linefeed are the same". The #\cr is redundant, in the code above;
> if the system is one which combines cr/lf into one character, but the
> lisp's read-char doesn't do that merge to #\newline, then there are
> likely to be other problems anyway.
>
> The fact that there are likely implementations on which the above code
> doesn't work makes the code non-portable, but it doesn't stop the answer
> to the question "are there some systems on which it works?" from being
> "yes".
Good point. I forgot (and didn't read) what my own original question
was.
-Peter