Word parser

From: Richard Krush
Subject: Word parser
Date: Wed, 30 May 2001 04:12:23 +0000
Message-ID: <9f1rv5$1rvg4$1@ID-60069.news.dfncis.de>

Hi!

I just finished an exercise from [K&R2] which is supposed to return a list
of words in the passed string. I would really appreciate it if someone could
look at it and maybe give me some pointers on how I can improve its
efficiency or elegance (or both). I would also like to know if anyone has
written one such function using functional style (I couldn't figure out how
to do it that way).

  [K&R2] If you don't know, it's 'The C Programming Language' by Kernighan
         and Ritchie

;;;
;;; A "word" is defined as the sequence of characters separated
;;; by space, tab, newline, or EOL (End Of Line);
;;;
(defun parse-string (str)
  "Returns a list of words in string STR."
  (labels ((spacep (ch)
             (or (equal ch #\Space)
                 (equal ch #\Newline)
                 (equal ch #\Tab))))
    (when (stringp str)
      (let ((word '()) (words '()))
        (dovector (ch str)
          (if (spacep ch)
              (progn (setf words (append words (list (coerce word 'string))))
                     (setf word '()))
            (setf word (append word (list ch)))))
        (setf words (append words (list (coerce word 'string))))
        (return-from parse-string words)))))

Thanks in advance,
 rk

-- 
  Richard Krushelnitskiy   "A mathematician is a blind man in a dark
  ·········@gmx.net         room looking for a black cat which isn't
  http://rkrush.cjb.net     there."                -- Charles Darwin

From: Greg Menke
Subject: Re: Word parser
Date: Wed, 30 May 2001 12:45:20 +0000
Message-ID: <m33d9nronz.fsf@europa.mindspring.com>

> I just finished an exercise from [K&R2] which is supposed to return a list
> of words in the passed string. I would really appreciate it if someone could
> look at it and maybe give me some pointers on how I can improve its
> efficiency or elegance (or both). I would also like to know if anyone has
> written one such function using functional style (I couldn't figure out how
> to do it that way).


I did something similar;

(defmacro trimstring (str)
  `(string-trim '(#\Space #\Tab #\Newline) ,str))



(defun unstringify-into-list (words)
  (let ((wlist nil))
    (loop for i = (position #\Space words)
          until (= (length words) 0)
          do (push (trimstring (if i
                                   (subseq words 0 i)
                                   words))
                   wlist)
             (setf words (if i
                             (trimstring (subseq words i))
                             "")))
    (nreverse wlist)))


- but I'm not sure if it's very good.  It's worked OK so far, though I
  think it should also split words on #\Tab and #\Newline instead of
  just #\Space.


Gregm

From: Hannah Schroeter
Subject: Re: Word parser
Date: Wed, 30 May 2001 16:53:22 +0000
Message-ID: <9f38i2$m1r$1@c3po.schlund.de>

Hello!

In article <··············@europa.mindspring.com>,
Greg Menke  <··········@mindspring.com> wrote:

>(defmacro trimstring (str)
>  `(string-trim '(#\Space #\Tab #\Newline) ,str))

No need for a macro here.

(declaim (inline trimstring))
(defun trimstring (str)
  (string-trim '(#\Space #\Tab #\Newline) str))

>(defun unstringify-into-list (words)
>  (let ((wlist nil))

>    (loop for i = (position #\Space words)
>	  until (= (length words) 0)
>	  do 
>	  (push (trimstring (if i 
>				(subseq words 0 i)
>			      ; else
>			        words))
>		wlist)
>	  (setf words (if i
>			  (trimstring (subseq words i))
>		        ;else
>		          "")))
>    (nreverse wlist)))

You could use loop's collect clause for collecting the words instead
of the (push ...) action:

(defun unstringify-into-list (words)
  (loop for i = (position #\Space words)
        until (= (length words) 0)
        collect (trimstring (if i (subseq words 0 i)
                                  words))
        do (setf words (if i (trimstring (subseq words i))
                             ""))))

>- but I'm not sure if it's very good.  It's worked OK so far, though I
>  think it should also split words on #\Tab and #\Newline instead of
>  just #\Space.

Yes, that'd be another improvement, also to be applied to my code.

Something like (position-if #'whitespace-p words) with an appropriate
definition of function whitespace-p.

Also, we could avoid consing intermediate strings (setf words ...) by
just keeping current indices. The :start keyword parameter for
position(-if) helps.
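
A minimal sketch of that index-based approach, assuming whitespace means
space, tab, or newline. The name split-words is made up for this example,
and whitespace-p is just one possible definition; neither comes from the
code posted earlier. No intermediate strings are built; only the start
index moves forward.

```lisp
;; One possible definition of WHITESPACE-P (an assumption for this sketch).
(defun whitespace-p (ch)
  "True if CH is a space, tab, or newline."
  (member ch '(#\Space #\Tab #\Newline)))

;; SPLIT-WORDS is a hypothetical name. START always points at the first
;; character of the next word, or is NIL when no word is left.
(defun split-words (string)
  "Return a list of the whitespace-separated words in STRING."
  (loop with start = (position-if-not #'whitespace-p string)
        while start
        collect (let ((end (or (position-if #'whitespace-p string
                                            :start start)
                               (length string))))
                  ;; Return the word, then skip ahead past the whitespace
                  ;; that follows it.
                  (prog1 (subseq string start end)
                    (setf start (position-if-not #'whitespace-p string
                                                 :start end))))))
```

Because the scan skips whole runs of whitespace, leading, trailing, and
repeated separators do not produce empty strings.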

Kind regards,

Hannah.

From: Greg Menke
Subject: Re: Word parser
Date: Wed, 30 May 2001 20:20:36 +0000
Message-ID: <m3y9reegh7.fsf@europa.mindspring.com>

> 
> >(defmacro trimstring (str)
> >  `(string-trim '(#\Space #\Tab #\Newline) ,str))
> 
> No need for a macro here.

It's used elsewhere too, and whatever calling overhead it imposes has
no other benefit...

> >- but I'm not sure if it's very good.  It's worked OK so far, though I
> >  think it should also split words on #\Tab and #\Newline instead of
> >  just #\Space.
> 
> Yes, that'd be another improvement, also to be applied to my code.
> 
> Something like (position-if #'whitespace-p words) with an appropriate
> definition of function whitespace-p.
> 
> Also, we could avoid consing intermediate strings (setf words ...) by
> just keeping current indices. The :start keyword parameter for
> position(-if) helps.
> 
> Kind regards,
> 
> Hannah.

Thanks, another reader of this newsgroup emailed me with a similarly
improved version.

Gregm

From: ···············@solibri.com
Subject: Re: Word parser
Date: Wed, 30 May 2001 14:09:36 +0000
Message-ID: <uae3v7wtb.fsf@solibri.com>

Richard Krush <·········@gmx.net> writes:

> Hi!
> 
> I just finished an exercise from [K&R2] which is supposed to return a list
> of words in the passed string. I would really appreciate it if someone could
> look at it and maybe give me some pointers on how I can improve its
> efficiency or elegance (or both). I would also like to know if anyone has
> written one such function using functional style (I couldn't figure out how
> to do it that way).
> 
>   [K&R2] If you don't know, it's 'The C Programming Language' by Kernighan
>          and Ritchie
> 
> ;;;
> ;;; A "word" is defined as the sequence of characters separated
> ;;; by space, tab, newline, or EOL (End Of Line);
> ;;;
> (defun parse-string (str)
>   "Returns a list of words in string STR."
>   (labels ((spacep (ch)
>              (or (equal ch #\Space)
>                  (equal ch #\Newline)
>                  (equal ch #\Tab))))
>     (when (stringp str)
>       (let ((word '()) (words '()))
>         (dovector (ch str)
>           (if (spacep ch)
>               (progn (setf words (append words (list (coerce word 'string))))
>                      (setf word '()))
>             (setf word (append word (list ch)))))
>         (setf words (append words (list (coerce word 'string))))
>         (return-from parse-string words)))))

You'll find plenty of examples with a query such as this one:

http://groups.google.com/groups?as_q="split-string"&as_ugroup=comp.lang.lisp&num=50

On your particular example a few comments:

- Collecting stuff by appending the new values to the end of the list
  is wasteful: each APPEND copies the whole list built so far, so the
  loop takes quadratic time. The more common idiom is to add them to
  the beginning of the list with PUSH and reverse the finished list
  with REVERSE or NREVERSE.

- Using lists to collect the characters of the word one by one is
  unnecessary; you can use CL's sequence functions such as POSITION
  and SUBSEQ.
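
Both comments can be combined in a rough sketch like the one below. The
name split-on-whitespace is made up for illustration. Note one deliberate
difference from the posted function: runs of whitespace do not yield empty
strings here, which is an assumption about the wanted behavior, not
something the exercise states.

```lisp
;; SPLIT-ON-WHITESPACE is a hypothetical name for this sketch.
(defun split-on-whitespace (str)
  "Return the non-empty whitespace-separated words of STR, in order."
  (flet ((spacep (ch)
           (find ch '(#\Space #\Tab #\Newline))))
    (let ((words '())
          (start 0))
      (loop
        ;; END is the index of the next separator, or NIL if none is left.
        (let ((end (position-if #'spacep str :start start)))
          ;; Keep only non-empty pieces, so adjacent separators add nothing.
          (when (< start (or end (length str)))
            (push (subseq str start end) words))
          (if end
              (setf start (1+ end))
              ;; WORDS was built back to front, so reverse it once at the end.
              (return (nreverse words))))))))
```

Each word is copied once by SUBSEQ, and the result list is built in
linear time because PUSH adds to the front.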
