Splitting a string on a character...

From: Cory Spencer
Subject: Splitting a string on a character...
Date: Mon, 06 May 2002 18:53:13 +0000
Message-ID: <ab6jeo$vs$1@nntp.itservices.ubc.ca>

Just a quickie question - is there already a Common Lisp function that
will split a string on a given character?

ie) will perform a similar function as this:

  (defun split (chr str)
    (let ((pos (position chr str)))
      (if (null pos)
        str
        (cons (substring str 0 pos)
              (split chr (substring str (+ pos 1)))))))

--
Cory Spencer <········@interchange.ubc.ca>

Re: Splitting a string on a character... Thomas F. Burdick
Re: Splitting a string on a character... Barry Margolin
Re: Splitting a string on a character... Edi Weitz
Re: Splitting a string on a character... Harald Hanche-Olsen
- Re: Splitting a string on a character... Herb Martin
Re: Splitting a string on a character... Erik Naggum
- Re: Splitting a string on a character... Wade Humeniuk
- Re: Splitting a string on a character... jb
  - Re: Splitting a string on a character... jb

From: Thomas F. Burdick
Subject: Re: Splitting a string on a character...
Date: Mon, 06 May 2002 19:25:10 +0000
Message-ID: <xcvn0vd5bah.fsf@conquest.OCF.Berkeley.EDU>

Cory Spencer <········@interchange.ubc.ca> writes:

> Just a quickie question - is there already a Common Lisp function that
> will split a string on a given character?

No.  Personally, I'm glad.  All the functions that deal with sequences
let you specify :start and :end, and we have POSITION.  Together these
add up to a more reasonable way of doing things, one that creates less
garbage.  You can see more of my thoughts on the matter here:

  <http://groups.google.com/groups?q=g:thl963481286d&dq=&hl=en&selm=xcvvghhpw95.fsf%40conquest.OCF.Berkeley.EDU>
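To make the :START/:END point concrete, here is a minimal sketch that does not come from this thread. A hypothetical MAP-FIELDS walks the fields with POSITION and passes each consumer the original string plus bounds, so no substring is ever allocated.

```lisp
;; MAP-FIELDS is a hypothetical helper, not a standard function.
;; Calls FN with (STRING START END) for each field of STRING delimited
;; by CHR.  END is NIL for the last field; standard functions such as
;; PARSE-INTEGER and SUBSEQ read an :END of NIL as "to the end".
(defun map-fields (fn chr string)
  (loop for start = 0 then (1+ end)
        for end = (position chr string :start start)
        do (funcall fn string start end)
        while end))

;; Usage: sum comma-separated integers without copying any substrings.
(let ((sum 0))
  (map-fields (lambda (s start end)
                (incf sum (parse-integer s :start start :end end)))
              #\, "10,20,30")
  sum)
;; => 60
```

The consumer decides whether a copy is needed at all. STRING= and SEARCH take similar bounding arguments.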

-- 
           /|_     .-----------------------.                        
         ,'  .\  / | No to Imperialist war |                        
     ,--'    _,'   | Wage class war!       |                        
    /       /      `-----------------------'                        
   (   -.  |                               
   |     ) |                               
  (`-.  '--.)                              
   `. )----'

From: Barry Margolin
Subject: Re: Splitting a string on a character...
Date: Mon, 06 May 2002 19:02:55 +0000
Message-ID: <zrAB8.10$vD6.166@paloalto-snr1.gtei.net>

In article <···········@nntp.itservices.ubc.ca>,
Cory Spencer  <········@interchange.ubc.ca> wrote:
>Just a quickie question - is there already a Common Lisp function that
>will split a string on a given character?

No.

>ie) will perform a similar function as this:
>
>  (defun split (chr str)
>    (let ((pos (position chr str)))
>      (if (null pos)
>        str
>        (cons (substring str 0 pos)
>              (split chr (substring str (+ pos 1)))))))

You should probably return (list str) in the terminating case, so that the
final result is a proper list rather than a dotted list.
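Here is a sketch of the function with that fix applied. It also assumes SUBSEQ in place of SUBSTRING, which is not a standard Common Lisp function. SPLIT-RECURSIVE is a hypothetical name.

```lisp
;; Recursive split returning a proper list.
(defun split-recursive (chr str)
  (let ((pos (position chr str)))
    (if (null pos)
        (list str)                      ; terminating case: wrap the last field
        (cons (subseq str 0 pos)
              (split-recursive chr (subseq str (1+ pos)))))))

(split-recursive #\/ "abc/def//ghi")
;; => ("abc" "def" "" "ghi")
```

Each call copies the remainder of the string with SUBSEQ before recursing. A version that instead searches with POSITION's :START argument does considerably less copying.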

-- 
Barry Margolin, ······@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Edi Weitz
Subject: Re: Splitting a string on a character...
Date: Mon, 06 May 2002 19:10:45 +0000
Message-ID: <m3y9ex853e.fsf@bird.agharta.de>

Cory Spencer <········@interchange.ubc.ca> writes:

> Just a quickie question - is there already a Common Lisp function that
> will split a string on a given character?
> 
> ie) will perform a similar function as this:
> 
>   (defun split (chr str)
>     (let ((pos (position chr str)))
>       (if (null pos)
>         str
>         (cons (substring str 0 pos)
>               (split chr (substring str (+ pos 1)))))))

No, but you might want to take a look at

  <http://ww.telent.net/cliki/SPLIT-SEQUENCE>

Also, some regex packages provide string splitters that can split on
arbitrary regular expressions. See

  <http://www.ccs.neu.edu/home/dorai/pregexp/pregexp-Z-H-2.html#%_sec_2.4>

for an example.

Edi.

From: Harald Hanche-Olsen
Subject: Re: Splitting a string on a character...
Date: Mon, 06 May 2002 20:30:53 +0000
Message-ID: <pcowuuh81du.fsf@thoth.math.ntnu.no>

+ Cory Spencer <········@interchange.ubc.ca>:

| Just a quickie question - is there already a Common Lisp function that
| will split a string on a given character?

Did you check the HyperSpec?  If so, did you see it in either the
Strings or the Sequences chapter?  If you don't know what the
HyperSpec is, please do yourself a favour and go find it here:

  http://www.xanalys.com/software_tools/

(not sure if this address is current -- I use my own local copy).

| ie) will perform a similar function as this:
| 
|   (defun split (chr str)
|     (let ((pos (position chr str)))
|       (if (null pos)
|         str
|         (cons (substring str 0 pos)
|               (split chr (substring str (+ pos 1)))))))

Apart from looking somewhat Scheme-ish (relying on recursion rather
than iteration), it won't work, because substring is not in the
language.  With subseq instead it runs, but it probably does not quite
do what you expected of it:

(split #\/ "abc/def//ghi") ==> ("abc" "def" "" . "ghi")

(Note the dot.)  This is probably a FAQ, but here is my suggestion for
a solution anyway:

(defun split (thing sequence)
  (loop for start = 0 then (1+ end)
        for end = (position thing sequence :start start)
        collect (subseq sequence start end)
        while end))

Note that it will split any sequence, not just a string:

(split #\/ "abc/def//ghi") ==> ("abc" "def" "" "ghi")
(split 0 #(3 1 0 5 7 1 9 0 0 5)) ==> (#(3 1) #(5 7 1 9) #() #(5))
(split 0 '(3 1 0 5 7 1 9 0 0 5)) ==> ((3 1) (5 7 1 9) NIL (5))

Neat, eh?
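The same LOOP works with POSITION-IF, which is handy when any of several delimiters should split the sequence. The following is a minimal sketch; SPLIT-IF is a hypothetical name that does not appear elsewhere in this thread.

```lisp
;; Like SPLIT above, but splits wherever PREDICATE returns true.
(defun split-if (predicate sequence)
  (loop for start = 0 then (1+ end)
        for end = (position-if predicate sequence :start start)
        collect (subseq sequence start end)
        while end))

(split-if (lambda (c) (member c '(#\Space #\,))) "a,b c")
;; => ("a" "b" "c")
(split-if #'zerop #(1 0 2))
;; => (#(1) #(2))
```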

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- Yes it works in practice - but does it work in theory?

From: Herb Martin
Subject: Re: Splitting a string on a character...
Date: Wed, 08 May 2002 00:08:28 +0000
Message-ID: <00_B8.60500$Q42.3326561@typhoon.austin.rr.com>

> Neat, eh?

I love it.  Thanks.

Herb Martin
Try ADDS for great Weather too:
http://adds.aviationweather.noaa.gov/projects/adds

> (defun split (thing sequence)
>   (loop for start = 0 then (1+ end)
>         for end = (position thing sequence :start start)
>         collect (subseq sequence start end)
>         while end))
>
> Note that it will split any sequence, not just a string:
>
> (split #\/ "abc/def//ghi") ==> ("abc" "def" "" "ghi")
> (split 0 #(3 1 0 5 7 1 9 0 0 5)) ==> (#(3 1) #(5 7 1 9) #() #(5))
> (split 0 '(3 1 0 5 7 1 9 0 0 5)) ==> ((3 1) (5 7 1 9) NIL (5))
>

From: Erik Naggum
Subject: Re: Splitting a string on a character...
Date: Tue, 07 May 2002 02:40:41 +0000
Message-ID: <3229728040064576@naggum.net>

* Cory Spencer
| Just a quickie question - is there already a Common Lisp function that
| will split a string on a given character?

  Most often, when people ask quickie questions, they have been working
  themselves through what one would think of as a labyrinth where they make
  brief excursions in the wrong direction and self-correct when they hit
  the wall, so to speak.  When they hit the wall and do not self-correct,
  they post a quickie question, but there is an arbitrary amount of
  backtracking involved in providing the right answer.  Just moving the person
  into a new labyrinth without the particular wall they have run into is
  seldom the best answer, as the wrong choice they have made will lead them
  right into another wall shortly thereafter.  Therefore, a "quickie" is a
  strong signal to experienced problem-solvers that something is wrong: The
  requestor is stuck, but does not think he should have been.  However, if
  his thinking were correct, he would not be stuck.  Yet he is, and that is
  a hint that the amount of backtracking required will be significant and
  that is just the opposite of a "quickie".
  
| ie) will perform a similar function as this:

  Generally speaking, what you describe is a reader or parser of some sort.

  It is quite important to realize that you will never, ever have a case
  where you can entirely get rid of the "splitting" character.  If you
  think you can legitimately expect this, you are just too inexperienced at
  what you are doing and will run into a problem sooner or later.  Let me
  give you a few examples.  Under Unix, you cannot have a colon in your
  login name, in your home directory name, in your real name, or in your
  shell, because the colon separates these fields in a system password
  file.  (Not to mention null bytes and newlines.)  This is just too dumb
  to be believable on the face of it, but it is actually the case.  Unix
  freaks do not think this is a problem because they internalize the rules
  and do not _want_ a colon in those places.  However, software that
  updates the password file has to do sanity checks in order not to expose
  the system to serious security risks because there is no way to escape a
  payload colon from the delimiting colon.  In the standard Unix shells,
  whitespace separates arguments, but you have several escaping forms to
  allow whitespace to exist in arguments.  All in all, the mechanisms that
  are used in the shell are quite arcane and difficult to predict from a
  program, but a user can usually deal with it, in the standard Unix idea
  of "usually".  Then there is HTML and URLs and all that crap.  To make
  sure that a character is always a payload character, it must be written
  as &#nnn;, where nnn is the ISO 10646 code for the character, or you have
  to engage in table lookups, context-sensitive parsing rules, and all sorts
  of random weirdness.  Likewise, in URLs, it is incredibly hard to get
  all you want through to the other side.  Recently, I subscribed to the
  Unabridged Merriam-Webster dictionary, and they need the e-mail address
  as the username.  It turned out to be very hard to write a URL that had a
  payload @ in the username and a syntax @ before the hostname.  I actually
  find such things absolutely incredible -- to be so thoughtless must have
  been _really_ hard.
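The colon problem is easy to reproduce with a position-based splitter. The sketch below reuses Harald Hanche-Olsen's SPLIT from this thread on two made-up passwd-style records (name:password:uid:gid:real name:home:shell). The account data is hypothetical.

```lisp
;; SPLIT as posted earlier in this thread, repeated so the block stands alone.
(defun split (thing sequence)
  (loop for start = 0 then (1+ end)
        for end = (position thing sequence :start start)
        collect (subseq sequence start end)
        while end))

;; A well-formed record has seven fields.
(length (split #\: "jdoe:x:1000:100:Jane Doe:/home/jdoe:/bin/sh"))
;; => 7

;; A payload colon in the real-name field is indistinguishable from a
;; delimiter, so the record silently grows an eighth field.
(length (split #\: "jdoe:x:1000:100:Doe: Jane:/home/jdoe:/bin/sh"))
;; => 8
```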

  This is why you should not use position to find a character to split on,
  you should use a state machine that traverses the string and finds only
  those (matching) characters that are syntactically relevant, not those
  (matching) characters that are (or should be) payload characters.  A
  regular expression is _not_ sufficient for this task.
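The following is a minimal sketch of such a state machine. It assumes a backslash escape convention; both the escape rule and the name SPLIT-ESCAPED are illustrative assumptions, not something specified above. The machine has two states. In :NORMAL, a delimiter ends the current field and the escape character switches state. In :ESCAPED, the next character is taken as payload unconditionally.

```lisp
(defun split-escaped (delimiter string &key (escape #\\))
  "Split STRING on DELIMITER; a character preceded by ESCAPE is payload."
  (let ((fields '())
        (field (make-string-output-stream))
        (state :normal))
    (loop for c across string
          do (ecase state
               (:normal
                (cond ((char= c escape) (setf state :escaped)) ; drop the escape itself
                      ((char= c delimiter)                     ; syntactic delimiter
                       (push (get-output-stream-string field) fields))
                      (t (write-char c field))))
               (:escaped                                       ; always payload
                (write-char c field)
                (setf state :normal))))
    (when (eq state :escaped)
      (error "Dangling escape character at end of ~S" string))
    ;; GET-OUTPUT-STREAM-STRING also clears FIELD, so each call yields one field.
    (push (get-output-stream-string field) fields)
    (nreverse fields)))

(split-escaped #\: "a:b\\:c:d")   ; the string is a:b\:c:d
;; => ("a" "b:c" "d")
```

Real formats usually need more states, for example for quoted fields or nested syntax. The structure stays the same: the current state, not the character alone, decides whether a character is syntax or payload.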
-- 
  In a fight against something, the fight has value, victory has none.
  In a fight for something, the fight is a loss, victory merely relief.

  70 percent of American adults do not understand the scientific process.

From: Wade Humeniuk
Subject: Re: Splitting a string on a character...
Date: Tue, 07 May 2002 16:00:10 +0000
Message-ID: <ab8t7k$o9v$1@news3.cadvision.com>



"Erik Naggum" <····@naggum.net> wrote in message ·····················@naggum.net...
> * Cory Spencer
>
>   This is why you should not use position to find a character to split on,
>   you should use a state machine that traverses the string and finds only
>   those (matching) characters that are syntactically relevant, not those
>   (matching) characters that are (or should be) payload characters.  A
>   regular expression is _not_ sufficient for this task.


Thanks for the post, Erik.  What you said is very true.  I have attached some code that
implements parsing of the time formats I use in my running log program.  I was amazed that
it got so large for such a simple spec, but it had to be that large to reliably determine,
dynamically, whether user input was valid at any point while the data was being entered
into a time-input-pane.
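The attached file describes its internal representation as an integer count of hundredths of a second. As a worked example, here is a standalone sketch of the same arithmetic that TIME-LIST-TO-HUNDRETHS performs. HMS-TO-HUNDREDTHS is a hypothetical name.

```lisp
;; 1 hour = 60 * 60 * 100 = 360000 hundredths; 1 minute = 60 * 100 = 6000.
(defun hms-to-hundredths (hours minutes seconds)
  (+ (* hours 360000)
     (* minutes 6000)
     (round (* seconds 100))))   ; + uses ROUND's primary value

;; "10:23:26.54", with the seconds as the exact rational READ-TIME-NUMBER builds
(hms-to-hundredths 10 23 2654/100)
;; => 3740654
```

READ-TIME-NUMBER accumulates fractional digits as exact rationals rather than floats. As a result, the final ROUND only matters for input with more than two decimal places, and there is no floating-point error to absorb.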

Wade

[Attachment: time.lisp]

(in-package :running-log)

#|

Functions to parse and format time strings.  Internally time is stored in
hundredths of seconds and is thus limited to that precision.

Users can enter time in two formats: [] is an optional element

1) Colon Time, where it is of the form [<hours>][:][<minutes>][:]<seconds>
   <hours> (integer 0 *) is optional, to specify <hours>, <minutes> must be specified
   <minutes> (integer 0 *) is optional, to specify <minutes>, <seconds> must be specified
   <seconds> (fixed-float 0 *) is required

   there must be a colon to separate the time elements.
   whitespace is ignored between elements.

   Valid Examples:  "10:23:26.54"  "43:00"  "1376.2"  "108:231   : 21"

2) HMS Time, where the time is an unordered list of the elements [<H>][<M>][<S>]
   <H> := <hours>H or <hours>h
   <M> := <minutes>M or <minutes>m
   <S> := <seconds>S or <seconds>s

   At least one of <H> <M> or <S> must be supplied.  Order is not important.  Multiple
   occurrences of any element signal an error.

   Valid Examples: "10h23m26.54s"  "100m10h" "51243.23s"

A user can specify Colon Time or HMS Time but not a combination of both.

Interesting entry points.  READ-TIME-FROM-STRING - parses time from both formats.
                           TIME-STRING - converts integer time (hundredths) to colon time string

|#

(defun signal-time-reader-error ()
  (declare (special *time-string*))
  (error (make-condition 'invalid-time-format :invalid-format *time-string*))
  nil)

(defun time-list-to-hundreths (time-list)
  (declare (type list time-list))
  (let ((seconds (first time-list))
        (minutes (second time-list))
        (hours (third time-list)))
    (if (and time-list
             (< (length time-list) 4)
             (realp seconds)
             (or (null minutes) (realp minutes))
             (or (null hours) (realp hours)))
        (+ (round (* seconds 100))
           (* (if minutes minutes 0) 6000)
           (* (if hours hours 0) 360000))
      (signal-time-reader-error))))

(defun read-hms-time-from-string (*time-string*)
  (declare (special *time-string*) (type string *time-string*))
  (with-input-from-string (stream *time-string*)
    (read-hms-time stream)))

(defun read-time-number (stream)
  (declare (type stream stream))
  (flet ((char-to-integer (c) (position c "0123456789")))
    (let ((temp)
          (in-decimal)
          (decimal-depth 0)
          (c))
      (loop
       (setf c (read-char stream nil nil))
       (cond
        ((not c) (if temp (return temp) (signal-time-reader-error)))
        ((digit-char-p c)
         (unless temp (setf temp 0))
         (cond
          (in-decimal
           (incf decimal-depth 1)
           (incf temp (/ (char-to-integer c) (expt 10 decimal-depth))))
          (t (setf temp (+ (* temp 10) (char-to-integer c))))))
        ((char= #\. c)
         (cond
          ((not in-decimal) (setf in-decimal t decimal-depth 0))
          (in-decimal (signal-time-reader-error))))
        (t
         (unread-char c stream)
         (if temp (return temp) (signal-time-reader-error))))))))

(defun read-hms-time (stream)
  (declare (type stream stream))
  (let ((hours)(minutes)(seconds)(temp)(c)(last-read))
    (flet ((reset () (setf temp nil)))
      (loop
       (setf c (read-char stream nil nil))
       (cond
        ((not c)
         (when temp
           (case last-read
             (:hours (if minutes
                         (signal-time-reader-error)
                       (setf minutes temp)))
             (:minutes (if seconds
                           (signal-time-reader-error)
                         (setf seconds temp)))
             (otherwise (signal-time-reader-error))))
         (return (time-list-to-hundreths (list (or seconds 0) (or minutes 0) (or hours 0)))))
        ((or (digit-char-p c) (char= c #\.))
         (when temp (signal-time-reader-error))
         (unread-char c stream)
         (setf temp (read-time-number stream))
         (if (null temp) (signal-time-reader-error)))
        ((char-equal #\h c)
         (cond
          ((or (not temp) hours minutes seconds) (signal-time-reader-error))
          (t
           (setf hours temp last-read :hours)
           (reset))))
        ((char-equal #\m c)
         (cond
          ((or (not temp) minutes seconds) (signal-time-reader-error))
          (t
           (setf minutes temp last-read :minutes)
           (reset))))
        ((char-equal #\s c)
         (cond
          ((or (not temp) seconds) (signal-time-reader-error))
          (t
           (setf seconds temp last-read :seconds)
           (reset))))
        ((char= #\Space c) nil)
        (t
         (signal-time-reader-error)))))))

(defun read-colon-time-from-string (*time-string*)
  (declare (special *time-string*) (type string *time-string*))
  (with-input-from-string (stream *time-string*)
    (read-colon-time stream)))

(defun read-colon-time (stream)
  (declare (type stream stream))
  (let ((time-list)(temp 0)(expecting-colon)(c))
    (flet ((reset () (setf temp 0 expecting-colon nil)))
      (loop
       (setf c (read-char stream nil))
       (cond
        ((not c)
         (when temp (push temp time-list))
         (return (time-list-to-hundreths time-list)))
        ((or (digit-char-p c) (char= #\. c))
         (when expecting-colon (signal-time-reader-error))
         (unread-char c stream)
         (setf temp (read-time-number stream))
         (unless temp (signal-time-reader-error))
         (setf expecting-colon t))
        ((char-equal #\: c)
         (cond
          ((>= (length time-list) 2)
           (signal-time-reader-error))
          (t (push temp time-list) (reset))))
        ((char= #\Space c) t)
        (t
         (signal-time-reader-error)))))))

(defun read-time-from-string (time-string)
  (if (and time-string (> (length time-string) 0))
      (restart-case
          (handler-case (read-colon-time-from-string time-string)
            (invalid-time-format () (invoke-restart 'restart-hms-time)))
        (restart-hms-time () (read-hms-time-from-string time-string)))
    nil))

(defun time-string (time)
  (declare (type (integer 0 *) time))
  (if (and time (not (zerop time)))
      (let* ((hours (floor (/ time 360000)))
             (minutes (floor (/ (- time (* hours 360000)) 6000)))
             (hundreths-seconds (- time (* hours 360000) (* minutes 6000)))
             (seconds (floor (/ hundreths-seconds 100)))
             (hundreths (mod hundreths-seconds 100))
             (result nil))
        (cond
         ((and (zerop hours)
               (zerop minutes))
          (setf result (concatenate 'string (princ-to-string seconds))))
         ((zerop hours)
          (setf result (concatenate 'string
                                    (princ-to-string minutes) ":"
                                    (when (< seconds 10) "0") (princ-to-string seconds))))
         (t
          (setf result (concatenate 'string
                                    (princ-to-string hours) ":"
                                    (when (< minutes 10) "0") (princ-to-string minutes) ":"
                                    (when (< seconds 10) "0") (princ-to-string seconds)))))
        (if (not (zerop hundreths))
            (setf result (concatenate 'string
                                      result
                                      "."
                                      (when (< hundreths 10) "0")
                                      (princ-to-string hundreths))))
        result)
    nil))

(defclass time-input-pane (capi:text-input-pane)
  ((previous-text :initform nil :reader time-input-pane-text)
   (previous-caret-position :initform 0))
  (:default-initargs
   :change-callback 'validate-time-text))

(defun validate-time-text (text pane interface caret-position)
  (with-slots (previous-text previous-caret-position) pane
    (handler-case
        (progn
          (read-time-from-string text)
          (setf previous-text text previous-caret-position caret-position))
      (invalid-time-format (condition)
              (setf (capi:text-input-pane-text pane) previous-text
                    (capi:text-input-pane-caret-position pane) previous-caret-position)
              (capi:beep-pane)))))

(defmethod (setf time-input-pane-text) (text (tip time-input-pane))
  (with-slots (previous-text previous-caret-position) tip
    (setf (capi:text-input-pane-text tip) text previous-text text previous-caret-position 0)))





From: jb
Subject: Re: Splitting a string on a character...
Date: Tue, 07 May 2002 21:06:52 +0000
Message-ID: <3cd83d85_2@news2.newsgroups.com>

Erik Naggum wrote:

> * Cory Spencer
> | Just a quickie question - is there already a Common Lisp function that
> | will split a string on a given character?
> 
>   Most often, when people ask quickie questions, they have been working
>   themselves through what one would think of as a labyrinth where they
>   make brief excursions in the wrong direction and self-correct when they
>   hit the wall, so to speak.  When they hit the wall and do not
>   self-correct, they post a quickie question, but there is an arbitrary
>   amount of backtracking involved in providing the right answer.  Just
>   moving the person into a new labyrinth without the particular wall they
>   have run into is seldom the best answer, as the wrong choice they have
>   made will lead them right into another wall shortly thereafter.
>   Therefore, a "quickie" is a strong signal to experienced problem-solvers
>   that something is wrong: The requestor is stuck, but does not think he
>   should have been.  However, if his thinking were correct, he would not
>   be stuck.  Yet he is, and that is a hint that the amount of
>   backtracking required will be significant and that is just the opposite
>   of a "quickie".

This is not always true. I have just changed my OS, and I think it is
completely normal that I do not know how to solve even simple UNIX
problems. Then it is very helpful if I can ask somebody. (For example, I
could not install anti-aliased fonts in Qt, and the hint that solved my
problem consisted of a single sentence.)

Now, when I am acting as a teacher and one of my pupils asks me a "simple"
question, I carefully investigate whether he has a more serious problem.
In a newsgroup, however (for example in de.sci.mathematik), I simply
answer the question and do not care about the asker's deeper problems.

> | ie) will perform a similar function as this:
> 
>   Generally speaking, a reader or parser of some sort.
> 
>   It is quite important [...]

I do not think I have understood this deep essay on payload characters,
whatever they may be, and I wonder if the original poster did.
I must admit, however, that I do not understand the closing remark about
70% of American adults either.

-- 
J B

There is hardly more than one serious concern in life: death;
(Dumas)

-----------== Posted via Newsgroups.Com - Uncensored Usenet News ==----------
   http://www.newsgroups.com       The #1 Newsgroup Service in the World!
-----= Over 100,000 Newsgroups - Ulimited Fast Downloads - 19 Servers =-----

From: jb
Subject: Re: Splitting a string on a character...
Date: Tue, 07 May 2002 21:11:10 +0000
Message-ID: <3cd83e85_2@news2.newsgroups.com>

Sorry, jb = Janos Blazi.
-- 
J B
