Beginner code - splitting lines on whitespace

From: Steve Allan
Subject: Beginner code - splitting lines on whitespace
Date: Thu, 22 May 2008 20:21:07 +0000
Message-ID: <uy762f5u4.fsf@attachmate.com>

I've been lurking here for a bit, and trying to learn lisp in my spare
time.  As an exercise, I'm trying to port some Perl code to lisp, and
the first bit of functionality I wrote was to read a file that looks
like this:

#------------------------------------------
# unique-label    hostname    home-dir
#------------------------------------------
mach1	host1	/home/me
mach2	host2	/export/home/me
mach3   host3   c:\\home\\me

And split each line into a list, returning a list of lists:

(("mach1" "host1" "/home/me") 
 ("mach2" "host2" "/export/home/me")
 ("mach3" "host3" "c:\\home\\me"))

The code below does this, but I'm pretty sure there's much that could
be improved. To speed up my learning, I'd love feedback in either/both
of these areas.

1) Critique on the implementation - how to improve the code as
written.  

2) Suggestions for a better approach altogether.

I realize the task I'm coding is pretty mundane, but I think I could
learn a lot of the basics by really getting this right, so your
critique would be most welcome.

Thanks!

Code is below.

--
-- Steve


;;;;================================================================
(defun get-platforms (file)
  (with-open-file (platforms file)
    (loop for line = (read-line platforms nil)
          while line
          unless (position #\# line)
          collect (split-on-space line))))

(defun split-on-space (string)
  "Splits string on whitespace, meaning spaces and tabs"
  (unless (null string) 
    (let ((space (or (position #\space string) (position #\tab string))))
      (cond 
        (space (cons 
                (subseq string 0  space)
                (split-on-space 
                 (string-trim '(#\Space #\Tab) (subseq string space)))))
        (t (list string))))))

(get-platforms "c:/projects/lisp/platforms.lst")

Re: Beginner code - splitting lines on whitespace vanekl
Re: Beginner code - splitting lines on whitespace Thomas A. Russ
- Re: Beginner code - splitting lines on whitespace Steve Allan
  - Re: Beginner code - splitting lines on whitespace Wade Humeniuk
  - Re: Beginner code - splitting lines on whitespace Thomas A. Russ
Re: Beginner code - splitting lines on whitespace Ken Tilton
- Re: Beginner code - splitting lines on whitespace Steve Allan
  - Re: Beginner code - splitting lines on whitespace Ken Tilton
Re: Beginner code - splitting lines on whitespace danb

From: vanekl
Subject: Re: Beginner code - splitting lines on whitespace
Date: Thu, 22 May 2008 20:43:39 +0000
Message-ID: <30183dff-ee20-4c25-9d35-36149531a992@34g2000hsh.googlegroups.com>

On May 22, 8:21 pm, Steve Allan <··········@yahoo.com> wrote:
> I've been lurking here for a bit, and trying to learn lisp in my spare
> time.  As an exercise, I'm trying to port some Perl code to lisp, and
> the first bit of functionality I wrote was to read a file that looks
> like this:
>
> #------------------------------------------
> # unique-label    hostname    home-dir
> #------------------------------------------
> mach1   host1   /home/me
> mach2   host2   /export/home/me
> mach3   host3   c:\\home\\me
>
> And split each line into a list, returning a list of lists:
>
> (("mach1" "host1" "/home/me")
>  ("mach2" "host2" "/export/home/me")
>  ("mach3" "host3" "c:\\home\\me"))
>
> The code below does this, but I'm pretty sure there's much that could
> be improved. To speed up my learning, I'd love feedback in either/both
> of these areas.
>
> 1) Critique on the implementation - how to improve the code as
> written.  
>
> 2) Suggestions for a better approach altogether.
>
> I realize the task I'm coding is pretty mundane, but I think I could
> learn a lot of the basics by really getting this right, so your
> critique would be most welcome.
>
> Thanks!
>
> Code is below.
>
> --
> -- Steve
>
> ;;;;================================================================
> (defun get-platforms (file)
>   (with-open-file (platforms file)
>     (loop for line = (read-line platforms nil)
>           while line
>           unless (position #\# line)
>           collect (split-on-space line))))
>
> (defun split-on-space (string)
>   "Splits string on whitespace, meaning spaces and tabs"
>   (unless (null string)
>     (let ((space (or (position #\space string) (position #\tab string))))
>       (cond
>         (space (cons
>                 (subseq string 0  space)
>                 (split-on-space
>                  (string-trim '(#\Space #\Tab) (subseq string space)))))
>         (t (list string))))))
>
> (get-platforms "c:/projects/lisp/platforms.lst")

check this out,
http://www.cliki.net/SPLIT-SEQUENCE
i think this would reduce your code by at least half

From: Thomas A. Russ
Subject: Re: Beginner code - splitting lines on whitespace
Date: Thu, 22 May 2008 23:37:17 +0000
Message-ID: <ymiiqx5ewr6.fsf@blackcat.isi.edu>

Steve Allan <··········@yahoo.com> writes:

> I've been lurking here for a bit, and trying to learn lisp in my spare
> time.  As an exercise, I'm trying to port some Perl code to lisp, and
> the first bit of functionality I wrote was to read a file that looks
> like this:
> 
> #------------------------------------------
> # unique-label    hostname    home-dir
> #------------------------------------------
> mach1	host1	/home/me
> mach2	host2	/export/home/me
> mach3   host3   c:\\home\\me
> 
> And split each line into a list, returning a list of lists:
> 
> (("mach1" "host1" "/home/me") 
>  ("mach2" "host2" "/export/home/me")
>  ("mach3" "host3" "c:\\home\\me"))
> 
> The code below does this, but I'm pretty sure there's much that could
> be improved. To speed up my learning, I'd love feedback in either/both
> of these areas.

Actually, this looks generally pretty good.

> 1) Critique on the implementation - how to improve the code as
> written.  

You could use the suggested SPLIT-SEQUENCE code or, for a more full-blown
approach (especially if you are planning on porting more Perl code),
take a look at the CL-PPCRE regular expression package
  <http://www.weitz.de/cl-ppcre/> 

> 2) Suggestions for a better approach altogether.

> ;;;;================================================================
> (defun get-platforms (file)
>   (with-open-file (platforms file)
>     (loop for line = (read-line platforms nil)
>           while line
>           unless (position #\# line)
>           collect (split-on-space line))))

Good use of the WITH-OPEN-FILE macro.  I will note that the code you
have here will discard, unprocessed, every line that contains "#"
anywhere in it.  That means you can't use # to add a comment after the
data on a line -- but perhaps that isn't a legal format anyway.  It still
seems like something that could become an issue.
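
A minimal sketch of one way to allow trailing comments, assuming that
"#" never appears inside a data field (a path containing "#" would be
cut short).  STRIP-COMMENT, BLANK-LINE-P and DATA-LINES are hypothetical
names, not part of the posted code:

```lisp
;; Hypothetical helper: keep only the text before the first #\#.
;; When there is no #\#, POSITION returns NIL and SUBSEQ keeps the
;; whole line.
(defun strip-comment (line)
  (subseq line 0 (position #\# line)))

;; Hypothetical helper: true when LINE holds only spaces and tabs
;; (including the empty string).
(defun blank-line-p (line)
  (every (lambda (c) (member c '(#\Space #\Tab))) line))

;; Keep the data part of each line and skip lines that are blank
;; once the comment has been removed.
(defun data-lines (lines)
  (loop for line in lines
        for data = (strip-comment line)
        unless (blank-line-p data)
          collect data))
```

The same two tests could be applied to each line inside GET-PLATFORMS
in place of the (position #\# line) check.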

> (defun split-on-space (string)
>   "Splits string on whitespace, meaning spaces and tabs"
>   (unless (null string) 
>     (let ((space (or (position #\space string) (position #\tab string))))
>       (cond 
>         (space (cons 
>                 (subseq string 0  space)
>                 (split-on-space 
>                  (string-trim '(#\Space #\Tab) (subseq string space)))))
>         (t (list string))))))

I would think you want to either define a WHITE-SPACE-P function or pass
in the delimiters to this function.

Note that the use of STRING-TRIM doesn't allow you to skip values in
your data format.  So if there isn't any value for a column you can't
detect that, because you remove all consecutive occurrences of your
whitespace.  Also, your current implementation doesn't do the trim until
the first recursive call.  So whitespace is trimmed from the remainder
on each recursive call, but a leading space in the original string is
not removed, and you get an empty string as the first element.  Try it
with

  (split-on-space "  This is fun!  ")
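
For reference, here is the original definition again together with what
that call returns.  The second call is an extra case, not taken from
the post above: because POSITION is asked about #\Space before #\Tab, a
tab that comes before the first space is not treated as a separator.

```lisp
;; The definition as originally posted.
(defun split-on-space (string)
  "Splits string on whitespace, meaning spaces and tabs"
  (unless (null string)
    (let ((space (or (position #\space string) (position #\tab string))))
      (cond
        (space (cons
                (subseq string 0 space)
                (split-on-space
                 (string-trim '(#\Space #\Tab) (subseq string space)))))
        (t (list string))))))

;; The leading spaces produce an empty first element.
(split-on-space "  This is fun!  ")
;; => ("" "This" "is" "fun!")

;; A tab followed later by a space: the tab stays inside the first
;; element, so the result has two elements instead of three.
(split-on-space (format nil "a~Cb c" #\Tab))
```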

Consider the following slightly modified version, which optionally
collapses multiple delimiters.

(defun split-on-delimiter (string delimiters
                                  &key (allow-multiple-delimiters t))
  "Splits STRING on any character in the sequence DELIMITERS"
  (unless allow-multiple-delimiters
     (setq string (string-trim delimiters string)))
  (flet ((delimiterp (char)
           (find char delimiters :test #'char=)))
    (unless (null string)
      (let ((delimiter-position (position-if #'delimiterp string)))
        (if delimiter-position
            (cons (subseq string 0 delimiter-position)
                  (split-on-delimiter (subseq string (1+ delimiter-position))
                                      delimiters 
                                      :allow-multiple-delimiters 
                                       allow-multiple-delimiters))
            (list string))))))

Also, a lot of these types of operations tend to use iterative rather
than recursive solutions, exploiting the :START keyword to the POSITION
(or POSITION-IF) function.  If you search back in the archives for
things like split-sequence, or look at the library implementation, you
can see how that would be done.
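
A rough sketch of what such an iterative version could look like.  This
is not the SPLIT-SEQUENCE library code; the name
SPLIT-ON-DELIMITER-ITERATIVELY is made up for illustration, and the
sketch always keeps empty fields rather than collapsing delimiters.

```lisp
(defun split-on-delimiter-iteratively (string delimiters)
  "Iterative sketch: return the fields of STRING, keeping empty ones.
DELIMITERS is any sequence of characters."
  (loop with start = 0
        ;; Find the next delimiter at or after START (NIL if none).
        for end = (position-if (lambda (c) (find c delimiters))
                               string :start start)
        ;; An END of NIL makes SUBSEQ run to the end of the string.
        collect (subseq string start end)
        while end
        ;; Resume the search just past the delimiter that was found.
        do (setf start (1+ end))))

(split-on-delimiter-iteratively "a,,b" ",")
;; => ("a" "" "b")
```

No intermediate substrings are created for the unprocessed remainder;
only START moves forward through the original string.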

> (get-platforms "c:/projects/lisp/platforms.lst")

-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Steve Allan
Subject: Re: Beginner code - splitting lines on whitespace
Date: Sat, 31 May 2008 16:26:55 +0000
Message-ID: <uod6mphgw.fsf@attachmate.com>

Thanks Thomas,

I finally got a chance to come back to this.  Probably a boring topic
for experienced lispers, but this exercise has really taught me a lot.

···@sevak.isi.edu (Thomas A. Russ) writes:

> Steve Allan <··········@yahoo.com> writes:
>

<snip>

>> ;;;;================================================================
>> (defun get-platforms (file)
>>   (with-open-file (platforms file)
>>     (loop for line = (read-line platforms nil)
>>           while line
>>           unless (position #\# line)
>>           collect (split-on-space line))))
>
> Good use of the WITH-OPEN-FILE macro.  I will note that the code you
> have here will throw away unprocessed all lines that contain "#"
> anywhere in them.  That means you can't use # to put in line comments
> after data -- but perhaps that isn't a legal format.  But it still seems
> that it would be an issue.

Yeah, I kind of punted on that issue for now - at some point I'll have
to deal with it though.

<snip>

> Also, a lot of these type of operations tend to use iterative rather
> than recursive solutions, exploiting the :START keyword to the POSITION
> (or POSITION-IF) function.  If you search back in the archives for
> things like split-sequence, or look at the library implementation, you
> can see how that would be done.

I wrote a white-space-p function as you suggested, and came up with
this iterative solution to split-on-space.

(defun white-space-p (c)
  (or (char= c #\Space) (char= c #\Tab)))

(defun split-on-space (string)
  (loop 
   for b = (position-if-not #'white-space-p string)
   then (position-if-not #'white-space-p string :start e)
   for e = (when b (position-if #'white-space-p string :start b))
   while b
   collect (subseq string b e)
   while e))

I wrote a (do ...) version as well, but the loop version seems more
elegant, so I posted that one.
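
The DO version was not posted.  Purely as a guess at what one might
look like, here is a sketch using DO* with the same B and E variables;
SPLIT-ON-SPACE-DO is a made-up name and WHITE-SPACE-P is repeated so
the example stands alone.

```lisp
(defun white-space-p (c)
  (or (char= c #\Space) (char= c #\Tab)))

(defun split-on-space-do (string)
  ;; DO* binds and steps sequentially, so E can refer to the new B.
  (do* ((b (position-if-not #'white-space-p string)
           (and e (position-if-not #'white-space-p string :start e)))
        (e (and b (position-if #'white-space-p string :start b))
           (and b (position-if #'white-space-p string :start b)))
        (tokens '()))
       ;; Stop when no further token start is found.
       ((null b) (nreverse tokens))
    (push (subseq string b e) tokens)))

(split-on-space-do "  This is fun!  ")
;; => ("This" "is" "fun!")
```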

Thanks again for the feedback!

-- 
-- Steve

From: Wade Humeniuk
Subject: Re: Beginner code - splitting lines on whitespace
Date: Sat, 31 May 2008 23:14:44 +0000
Message-ID: <EZk0k.262$7B3.98@edtnps91>

Steve Allan wrote:
> Thanks Thomas,
> 
> I finally got a chance to come back to this.  Probably a boring topic
> for experienced lispers, but this exercise has really taught me a lot.
> 
> ···@sevak.isi.edu (Thomas A. Russ) writes:
> 
>> Steve Allan <··········@yahoo.com> writes:
>>
> 
> <snip>
> 
>>> ;;;;================================================================
>>> (defun get-platforms (file)
>>>   (with-open-file (platforms file)
>>>     (loop for line = (read-line platforms nil)
>>>           while line
>>>           unless (position #\# line)
>>>           collect (split-on-space line))))
>> Good use of the WITH-OPEN-FILE macro.  I will note that the code you
>> have here will throw away unprocessed all lines that contain "#"
>> anywhere in them.  That means you can't use # to put in line comments
>> after data -- but perhaps that isn't a legal format.  But it still seems
>> that it would be an issue.
> 
> Yeah, I kind of punted on that issue for now - at some point I'll have
> to deal with it though.
> 
> <snip>
> 
>> Also, a lot of these type of operations tend to use iterative rather
>> than recursive solutions, exploiting the :START keyword to the POSITION
>> (or POSITION-IF) function.  If you search back in the archives for
>> things like split-sequence, or look at the library implementation, you
>> can see how that would be done.
> 
> I wrote a white-space-p function as you suggested, and came up with
> this iterative solution to split-on-space.
> 
> (defun white-space-p (c)
>   (or (char= c #\Space) (char= c #\Tab)))
> 
> (defun split-on-space (string)
>   (loop 
>    for b = (position-if-not #'white-space-p string)
>    then (position-if-not #'white-space-p string :start e)
>    for e = (when b (position-if #'white-space-p string :start b))
>    while b
>    collect (subseq string b e)
>    while e))
> 
> I wrote a (do ...) version as well, but the loop version seems more
> elegant, so I posted that one.
> 
> Thanks again for the feedback!
> 


The stream paradigm can also be your friend.  Just as you are reading 
lines from the file, you can read "tokens" from a string.

(defun get-platforms (file)
   (with-open-file (platforms file)
     (loop for line = (read-line platforms nil)
	 while line
	 unless (position #\# line)
	 collect (with-input-from-string (s line)
		   (loop for token = (read-token s)
			while token collect token)))))

(defun read-token (stream &optional (whitespace '(#\space #\tab)))
   (with-output-to-string (token)
     (loop with found-token = nil
	 for c = (read-char stream nil nil) do
	 (cond

	   ((null c)
	    (if found-token
		(loop-finish)
	      (return-from read-token nil)))

	   ((member c whitespace)
	    (if found-token (loop-finish)))

	   (t
	    (setf found-token t)
	    (write-char c token))))))

CL-USER> (get-platforms "/home/wade/Lisp/platforms.txt")
(("mach1" "host1" "/home/me") ("mach2" "host2" "/export/home/me")
  ("mach3" "host3" "c:\\\\home\\\\me"))
CL-USER>

Wade

From: Thomas A. Russ
Subject: Re: Beginner code - splitting lines on whitespace
Date: Mon, 02 Jun 2008 21:42:18 +0000
Message-ID: <ymifxrvcy4l.fsf@blackcat.isi.edu>

Steve Allan <··········@yahoo.com> writes:

> Thanks Thomas,
> 
> I finally got a chance to come back to this.  Probably a boring topic
> for experienced lispers, but this exercise has really taught me a lot.

Not really.
It was refreshing to see nice code from a new user.

> > Also, a lot of these type of operations tend to use iterative rather
> > than recursive solutions, exploiting the :START keyword to the POSITION
> > (or POSITION-IF) function.  If you search back in the archives for
> > things like split-sequence, or look at the library implementation, you
> > can see how that would be done.
> 
> I wrote a white-space-p function as you suggested, and came up with
> this iterative solution to split-on-space.
> 
> (defun white-space-p (c)
>   (or (char= c #\Space) (char= c #\Tab)))
> 
> (defun split-on-space (string)
>   (loop 
>    for b = (position-if-not #'white-space-p string)
>    then (position-if-not #'white-space-p string :start e)
>    for e = (when b (position-if #'white-space-p string :start b))
>    while b
>    collect (subseq string b e)
>    while e))

This is a good implementation (I like loop).

I prefer not to intersperse WHILE clauses in the body of the LOOP, but
doing so conforms to the standard, and having both tests makes sure
you get the last item without having to do surgery outside the loop
construct.


-- 
Thomas A. Russ,  USC/Information Sciences Institute

From: Ken Tilton
Subject: Re: Beginner code - splitting lines on whitespace
Date: Thu, 22 May 2008 21:42:40 +0000
Message-ID: <4835e8d0$0$11633$607ed4bc@cv.net>

Steve Allan wrote:
> I've been lurking here for a bit, and trying to learn lisp in my spare
> time.  As an exercise, I'm trying to port some Perl code to lisp, and
> the first bit of functionality I wrote was to read a file that looks
> like this:
> 
> #------------------------------------------
> # unique-label    hostname    home-dir
> #------------------------------------------
> mach1	host1	/home/me
> mach2	host2	/export/home/me
> mach3   host3   c:\\home\\me
> 
> And split each line into a list, returning a list of lists:
> 
> (("mach1" "host1" "/home/me") 
>  ("mach2" "host2" "/export/home/me")
>  ("mach3" "host3" "c:\\home\\me"))
> 
> The code below does this, but I'm pretty sure there's much that could
> be improved. To speed up my learning, I'd love feedback in either/both
> of these areas.
> 
> 1) Critique on the implementation - how to improve the code as
> written.  
> 
> 2) Suggestions for a better approach altogether.
> 
> I realize the task I'm coding is pretty mundane, but I think I could
> learn a lot of the basics by really getting this right, so your
> critique would be most welcome.
> 
> Thanks!
> 
> Code is below.
> 
> --
> -- Steve
> 
> 
> ;;;;================================================================
> (defun get-platforms (file)
>   (with-open-file (platforms file)
>     (loop for line = (read-line platforms nil)
>           while line
>           unless (position #\# line)
>           collect (split-on-space line))))
> 
> (defun split-on-space (string)
>   "Splits string on whitespace, meaning spaces and tabs"
>   (unless (null string) 
>     (let ((space (or (position #\space string) (position #\tab string))))
>       (cond 
>         (space (cons 
>                 (subseq string 0  space)
>                 (split-on-space 
>                  (string-trim '(#\Space #\Tab) (subseq string space)))))
>         (t (list string))))))

Not bad at all. (unless (null x)...) should be (when x...), as I think you 
might agree. You might avoid searching the string twice, once for tab and 
once for space, by using position-if. And a lot of us have little macros 
akin to Paul Graham's AIF. I have (untested):

(when string
   (bif (delim-pos (position-if (lambda (c)
                                  (or (char= c #\space)(char= c #\tab)))
                        string))
      (cons (subseq string 0 delim-pos) ...etc...)
      (list string)))

BIF left as an exercise. :)

If you want to go nuts you can string-right-trim once at the outset and 
then just string-left-trim thereafter.

hth,kt

-- 
http://smuglispweeny.blogspot.com/
http://www.theoryyalgebra.com/
ECLM rant: 
http://video.google.com/videoplay?docid=-1331906677993764413&hl=en
ECLM talk: 
http://video.google.com/videoplay?docid=-9173722505157942928&q=&hl=en

From: Steve Allan
Subject: Re: Beginner code - splitting lines on whitespace
Date: Mon, 26 May 2008 18:53:28 +0000
Message-ID: <uod6skic7.fsf@attachmate.com>

Ken Tilton <···········@optonline.net> writes:

> Steve Allan wrote:
>> (defun get-platforms (file)
>>   (with-open-file (platforms file)
>>     (loop for line = (read-line platforms nil)
>>           while line
>>           unless (position #\# line)
>>           collect (split-on-space line))))
>> (defun split-on-space (string)
>>   "Splits string on whitespace, meaning spaces and tabs"
>>   (unless (null string)
>>     (let ((space (or (position #\space string) (position #\tab string))))
>>       (cond
>>         (space (cons
>>                 (subseq string 0  space)
>>                 (split-on-space
>>                  (string-trim '(#\Space #\Tab) (subseq string space)))))
>>         (t (list string))))))
>
> Not bad at all. (unless (null x)...) should be (when x...) I think you
> might agree. You might avoid searching the string twice, one for tab
> and once for space, by using position-if. And a lot of us have little
> macros akin to Paul Graham's AIF. I have (untested):
>
> (when string
>   (bif (delim-pos (position-if (lambda (c)
>                                  (or (char= c #\space)(char= c #\tab)))
>                        string))
>      (cons (subseq string 0 delim-pos) ...etc...)
>      (list string))
>
> BIF left as an exercise. :)

Kenny, thanks for pointing me to PG's aif. 'On Lisp' is a bit down on
my reading list so it would have taken me a long while to find that on
my own.

Combining your suggestions with some of Thomas', I came up with this:


(defmacro bif (test-form-binding then-form &optional else-form)
  `(let ((,(car test-form-binding) ,(cadr test-form-binding)))
    (if ,(car test-form-binding) ,then-form ,else-form)))

(defun white-space-p (char)
  (or (char= char #\Space) (char= char #\Tab)))

;; trim both ends of string at the start, then just trim from 
;; the left in recursive function
(defun split-on-space (string)
  (labels ((split (string)
             (when string
               (bif (delim-pos (position-if #'white-space-p string))
                    (cons (subseq string 0 delim-pos)
                          (split (string-left-trim 
                                  '(#\Space #\Tab) (subseq string delim-pos))))
                    (list string)))))
    (split (string-trim '(#\Space #\Tab) string))))

CL-USER> (split-on-space "  This is fun!  ")
("This" "is" "fun!")

It is indeed!

-- 
-- Steve

From: Ken Tilton
Subject: Re: Beginner code - splitting lines on whitespace
Date: Mon, 26 May 2008 19:22:48 +0000
Message-ID: <483b0dca$0$11625$607ed4bc@cv.net>

Steve Allan wrote:
> Ken Tilton <···········@optonline.net> writes:
> 
> 
>>Steve Allan wrote:
>>
>>>(defun get-platforms (file)
>>>  (with-open-file (platforms file)
>>>    (loop for line = (read-line platforms nil)
>>>          while line
>>>          unless (position #\# line)
>>>          collect (split-on-space line))))
>>>(defun split-on-space (string)
>>>  "Splits string on whitespace, meaning spaces and tabs"
>>>  (unless (null string)
>>>    (let ((space (or (position #\space string) (position #\tab string))))
>>>      (cond
>>>        (space (cons
>>>                (subseq string 0  space)
>>>                (split-on-space
>>>                 (string-trim '(#\Space #\Tab) (subseq string space)))))
>>>        (t (list string))))))
>>
>>Not bad at all. (unless (null x)...) should be (when x...) I think you
>>might agree. You might avoid searching the string twice, one for tab
>>and once for space, by using position-if. And a lot of us have little
>>macros akin to Paul Graham's AIF. I have (untested):
>>
>>(when string
>>  (bif (delim-pos (position-if (lambda (c)
>>                                 (or (char= c #\space)(char= c #\tab)))
>>                       string))
>>     (cons (subseq string 0 delim-pos) ...etc...)
>>     (list string))
>>
>>BIF left as an exercise. :)
> 
> 
> Kenny, thanks for pointing me to PG's aif. 'On Lisp' is a bit down on
> my reading list so it would have taken me a long while to find that on
> my own.
> 
> Combining your suggestions with some of Thomas', I came up with this:
> 
> 
> (defmacro bif (test-form-binding then-form &optional else-form)
>   `(let ((,(car test-form-binding) ,(cadr test-form-binding)))
>     (if ,(car test-form-binding) ,then-form ,else-form)))

One of the nice things about defmacro is the destructuring fer free:

  (defmacro bif ((bindvar boundform) yup &optional nope)
    `(let ((,bindvar ,boundform))
        (if ,bindvar
           ,yup
           ,nope)))

I get paid by the LOC. You also have &key and &optional in there, and 
you can have as many destructuring forms as you like.
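
To illustrate that last point, here are two small sketches.  BIF2 and
BWHEN are made-up names for this example, not macros from the thread:
the first takes two destructured binding forms, the second puts a &key
inside the destructured form.

```lisp
;; Two destructuring forms in one lambda list.  FORM2 is evaluated
;; only when VAR1 is non-NIL, and may refer to VAR1.
(defmacro bif2 ((var1 form1) (var2 form2) yup &optional nope)
  `(let* ((,var1 ,form1)
          (,var2 (and ,var1 ,form2)))
     (if ,var2 ,yup ,nope)))

;; A &key inside the destructured form.  TEST names a function and
;; is not evaluated; it defaults to IDENTITY.
(defmacro bwhen ((bindvar boundform &key (test 'identity)) &body body)
  `(let ((,bindvar ,boundform))
     (when (,test ,bindvar) ,@body)))

(bif2 (b (position #\Space "ab cd"))
      (e (position #\c "ab cd" :start b))
      (list b e)
      :none)
;; => (2 3)

(bwhen (n (position #\b "abc") :test plusp)
  (* n 10))
;; => 10
```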

> 
> (defun white-space-p (char)
>   (or (char= char #\Space) (char= char #\Tab)))
> 
> ;; trim both ends of string at the start, then just trim from 
> ;; the left in recursive function
> (defun split-on-space (string)
>   (labels ((split (string)
>              (when string
>                (bif (delim-pos (position-if #'white-space-p string))
>                     (cons (subseq string 0 delim-pos)
>                           (split (string-left-trim 
>                                   '(#\Space #\Tab) (subseq string delim-pos))))
>                     (list string)))))
>     (split (string-trim '(#\Space #\Tab) string))))
> 
> CL-USER> (split-on-space "  This is fun!  ")
> ("This" "is" "fun!")
> 
> It is indeed!
> 

It gets better. :)

kt



From: danb
Subject: Re: Beginner code - splitting lines on whitespace
Date: Sun, 01 Jun 2008 03:29:47 +0000
Message-ID: <c13f2145-5173-40fa-8e26-d041d04ea763@2g2000hsn.googlegroups.com>

On May 22, 3:21 pm, Steve Allan <··········@yahoo.com> wrote:
> I'm trying to port some Perl code to lisp
> 2) Suggestions for a better approach altogether.

Use cl-ppcre.

(defpackage :split (:use :cl :cl-ppcre))
(in-package :split)

(defun scan-file ()
  (with-open-file (in "split-lines.in")
    (loop for line = (read-line in nil nil)
          while line
          unless (scan "^\\s*(#.*)?$" line)
          collect (split "\\s+" line))))

SPLIT> (scan-file)
(("mach1" "host1" "/home/me")
 ("mach2" "host2" "/export/home/me")
 ("mach3" "host3" "c:\\\\home\\\\me"))

--Dan

------------------------------------------------
Dan Bensen  http://www.prairienet.org/~dsb/

cl-match:  expressive pattern matching in Lisp
http://common-lisp.net/project/cl-match/