Just a quickie question - is there already a Common Lisp function that
will split a string on a given character?
i.e., one that will perform a similar function to this:
(defun split (chr str)
  (let ((pos (position chr str)))
    (if (null pos)
        str
        (cons (substring str 0 pos)
              (split chr (substring str (+ pos 1)))))))
--
Cory Spencer <········@interchange.ubc.ca>
Cory Spencer <········@interchange.ubc.ca> writes:
> Just a quickie question - is there already a Common Lisp function that
> will split a string on a given character?
No. Personally, I'm glad. All the functions that deal with sequences
let you specify :start and :end, and we have POSITION. Which all add
up to a more reasonable, less garbage-y way of doing things. You can
see more of my thoughts on the matter here:
<http://groups.google.com/groups?q=g:thl963481286d&dq=&hl=en&selm=xcvvghhpw95.fsf%40conquest.OCF.Berkeley.EDU>
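As a rough illustration of the :start/:end style described above, here is a minimal sketch. MAP-FIELDS and SUM-FIELDS are hypothetical names, not standard functions and not taken from the linked post. The idea is to pass field boundaries around instead of allocating a substring per field:

```lisp
;; MAP-FIELDS is a hypothetical helper, not part of Common Lisp.
(defun map-fields (function delimiter string)
  "Call FUNCTION with the START and END index of each field of STRING.
END is NIL for the last field, which the standard sequence functions
accept as meaning the end of the sequence."
  (loop for start = 0 then (1+ end)
        for end = (position delimiter string :start start)
        do (funcall function start end)
        while end))

;; Example use: sum comma-separated integers without consing substrings.
(defun sum-fields (string)
  (let ((sum 0))
    (map-fields (lambda (start end)
                  (incf sum (parse-integer string :start start :end end)))
                #\, string)
    sum))
```

PARSE-INTEGER takes :start and :end itself, so the fields are read in place.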
In article <···········@nntp.itservices.ubc.ca>,
Cory Spencer <········@interchange.ubc.ca> wrote:
>Just a quickie question - is there already a Common Lisp function that
>will split a string on a given character?
No.
>ie) will perform a similar function as this:
>
> (defun split (chr str)
> (let ((pos (position chr str)))
> (if (null pos)
> str
> (cons (substring str 0 pos)
> (split chr (substring str (+ pos 1)))))))
You should probably return (list str) in the terminating case, so that the
final result is a proper list rather than a dotted list.
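A sketch of the function with that change applied might look like the following. It also assumes SUBSEQ in place of substring, since substring is not a standard Common Lisp function:

```lisp
;; Recursive split with a proper-list base case.
;; Assumption: SUBSEQ stands in for the non-standard SUBSTRING.
(defun split (chr str)
  (let ((pos (position chr str)))
    (if (null pos)
        (list str)                      ; terminate with a proper list
        (cons (subseq str 0 pos)
              (split chr (subseq str (1+ pos)))))))
```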
--
Barry Margolin, ······@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
Cory Spencer <········@interchange.ubc.ca> writes:
> Just a quickie question - is there already a Common Lisp function that
> will split a string on a given character?
>
> ie) will perform a similar function as this:
>
> (defun split (chr str)
> (let ((pos (position chr str)))
> (if (null pos)
> str
> (cons (substring str 0 pos)
> (split chr (substring str (+ pos 1)))))))
No, but you might want to take a look at
<http://ww.telent.net/cliki/SPLIT-SEQUENCE>
Also, some regex packages provide string splitters that can split on
arbitrary regular expressions. See
<http://www.ccs.neu.edu/home/dorai/pregexp/pregexp-Z-H-2.html#%_sec_2.4>
for an example.
Edi.
+ Cory Spencer <········@interchange.ubc.ca>:
| Just a quickie question - is there already a Common Lisp function that
| will split a string on a given character?
Did you check the HyperSpec? If so, did you see it in either the
Strings or the Sequences chapter? If you don't know what the
HyperSpec is, please do yourself a favour and go find it here:
http://www.xanalys.com/software_tools/
(not sure if this address is current -- I use my own local copy).
| ie) will perform a similar function as this:
|
| (defun split (chr str)
| (let ((pos (position chr str)))
| (if (null pos)
| str
| (cons (substring str 0 pos)
| (split chr (substring str (+ pos 1)))))))
Apart from looking somewhat Scheme-ish (relying on recursion rather
than iteration), it won't work, because substring is not in the
language. With subseq instead it runs, but it probably does not quite
do what you expected of it:
(split #\/ "abc/def//ghi") ==> ("abc" "def" "" . "ghi")
(Note the dot.) This is probably a FAQ, but here is my suggestion for
a solution anyway:
(defun split (thing sequence)
  (loop for start = 0 then (1+ end)
        for end = (position thing sequence :start start)
        collect (subseq sequence start end)
        while end))
Note that it will split any sequence, not just a string:
(split #\/ "abc/def//ghi") ==> ("abc" "def" "" "ghi")
(split 0 #(3 1 0 5 7 1 9 0 0 5)) ==> (#(3 1) #(5 7 1 9) #() #(5))
(split 0 '(3 1 0 5 7 1 9 0 0 5)) ==> ((3 1) (5 7 1 9) NIL (5))
Neat, eh?
--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- Yes it works in practice - but does it work in theory?
* Cory Spencer
| Just a quickie question - is there already a Common Lisp function that
| will split a string on a given character?
Most often, when people ask quickie questions, they have been working
themselves through what one would think of as a labyrinth where they make
brief excursions in the wrong direction and self-correct when they hit
the wall, so to speak. When they hit the wall and do not self-correct,
they post a quickie question, but there is an arbitrary amount of
backtracking involved in providing the right answer. Just moving the person
into a new labyrinth without the particular wall they have run into is
seldom the best answer, as the wrong choice they have made will lead them
right into another wall shortly thereafter. Therefore, a "quickie" is a
strong signal to experienced problem-solvers that something is wrong: The
requestor is stuck, but does not think he should have been. However, if
his thinking were correct, he would not be stuck. Yet he is, and that is
a hint that the amount of backtracking required will be significant and
that is just the opposite of a "quickie".
| ie) will perform a similar function as this:
Generally speaking, a reader or parser of some sort.
It is quite important to realize that you will never, ever have a case
where you can entirely get rid of the "splitting" character. If you
think you can legitimately expect this, you are just too inexperienced at
what you are doing and will run into a problem sooner or later. Let me
give you a few examples. Under Unix, you cannot have a colon in your
login name, in your home directory name, in your real name, or in your
shell, because the colon separates these fields in a system password
file. (Not to mention null bytes and newlines.) This is just too dumb
to be believable on the face of it, but it is actually the case. Unix
freaks do not think this is a problem because they internalize the rules
and do not _want_ a colon in those places. However, software that
updates the password file has to do sanity checks in order not to expose
the system to serious security risks because there is no way to escape a
payload colon from the delimiting colon. In the standard Unix shells,
whitespace separates arguments, but you have several escaping forms to
allow whitespace to exist in arguments. All in all, the mechanisms that
are used in the shell are quite arcane and difficult to predict from a
program, but a user can usually deal with it, in the standard Unix idea
of "usually". Then there is HTML and URL's and all that crap. To make
sure that a character is always a payload character, it must be written
as &#nnn;, where nnn is the ISO 10646 code for the character, or you have to
engage in table lookups, context-sensitive parsing rules, and all sorts
of random weirdness. Likewise, in URL's, it is incredibly hard to get
all you want through to the other side. Recently, I subscribed to the
Unabridged Merriam-Webster dictionary, and they need the e-mail address
as the username. It turned out to be very hard to write a URL that had a
payload @ in the username and a syntax @ before the hostname. I actually
find such things absolutely incredible -- to be so thoughtless must have
been _really_ hard.
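To make the numeric-reference form concrete, here is a minimal sketch. ENCODE-PAYLOAD is a hypothetical helper, and it assumes that CHAR-CODE returns the ISO 10646 code point, which is true of Unicode-based implementations but is not guaranteed by the standard:

```lisp
;; ENCODE-PAYLOAD is a hypothetical helper, not a standard function.
;; Assumption: CHAR-CODE yields the ISO 10646 code point.
(defun encode-payload (string &optional (specials "<>&\"'"))
  "Return STRING with every character found in SPECIALS written as &#nnn;."
  (with-output-to-string (out)
    (loop for c across string
          do (if (find c specials)
                 (format out "&#~D;" (char-code c))
                 (write-char c out)))))
```

The default set of special characters is only an illustration; which characters count as syntax depends on the context the text is embedded in.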
This is why you should not use position to find a character to split on,
you should use a state machine that traverses the string and finds only
those (matching) characters that are syntactically relevant, not those
(matching) characters that are (or should be) payload characters. A
regular expression is _not_ sufficient for this task.
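A minimal sketch of such a state machine, under assumptions that are not in the post: the name SPLIT-ESCAPED is hypothetical, and a backslash is taken as the escape character that turns the following character into payload.

```lisp
;; SPLIT-ESCAPED is a hypothetical name; the backslash escape convention
;; is an assumption chosen for illustration.
(defun split-escaped (delimiter string &key (escape #\\))
  "Split STRING on DELIMITER, except where DELIMITER is preceded by ESCAPE."
  (let ((fields '())
        (field (make-string-output-stream))
        (state :normal))
    (loop for c across string
          do (ecase state
               (:normal
                (cond ((char= c escape) (setf state :escaped))
                      ((char= c delimiter)
                       ;; Syntactic delimiter: close the current field.
                       (push (get-output-stream-string field) fields))
                      (t (write-char c field))))
               (:escaped
                ;; Whatever follows the escape is payload.
                (write-char c field)
                (setf state :normal))))
    (when (eq state :escaped)
      (error "Dangling escape character at end of ~S" string))
    (push (get-output-stream-string field) fields)
    (nreverse fields)))
```

With two states this is about the smallest case; real formats such as shell quoting need more states, but the shape stays the same.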
--
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.
70 percent of American adults do not understand the scientific process.
"Erik Naggum" <····@naggum.net> wrote in message ·····················@naggum.net...
> * Cory Spencer
>
> This is why you should not use position to find a character to split on,
> you should use a state machine that traverses the string and finds only
> those (matching) characters that are syntactically relevant, not those
> (matching) characters that are (or should be) payload characters. A
> regular expression is _not_ sufficient for this task.
Thanks for the post Erik. What you said is very true. I have attached some code that
implements parsing time formats that I use in my running log program. I was amazed that
it got so large for such a simple spec but it was necessary to reliably dynamically
determine if user input was valid during any point of entering the data from a
time-input-pane.
Wade
[Attachment: time.lisp]
(in-package :running-log)
#|
Functions to parse and format time strings. Internally time is stored in
hundredths of seconds and is thus limited to that precision.
Users can enter time in two formats: [] is an optional element
1) Colon Time, where it is of the form [<hours>][:][<minutes>][:]<seconds>
<hours> (integer 0 *) is optional, to specify <hours>, <minutes> must be specified
<minutes> (integer 0 *) is optional, to specify <minutes>, <seconds> must be specified
<seconds> (fixed-float 0 *) is required
there must be a colon to separate the time elements.
whitespace is ignored between elements.
Valid Examples: "10:23:26.54" "43:00" "1376.2" "108:231 : 21"
2) HMS Time, where the time is an unordered list of the elements [<H>][<M>][<S>]
<H> := <hours>H or <hours>h
<M> := <minutes>M or <minutes>m
<S> := <seconds>S or <seconds>s

At least one of <H> <M> or <S> must be supplied. Order is not important. Multiple
occurrences of any element signals an error.
Valid Examples: "10h23m26.54s" "100m10h" "51243.23s"
A user can specify Colon Time or HMS Time but not a combination of both.
Interesting entry points. READ-TIME-FROM-STRING - parses time from both formats.
TIME-STRING - converts integer time (hundredths) to colon time string
|#
(defun signal-time-reader-error ()
(declare (special *time-string*))
(error (make-condition 'invalid-time-format :invalid-format *time-string*))
nil)
(defun time-list-to-hundreths (time-list)
(declare (type list time-list))
(let ((seconds (first time-list))
(minutes (second time-list))
(hours (third time-list)))
(if (and time-list
(< (length time-list) 4)
(realp seconds)
(or (null minutes) (realp minutes))
(or (null hours) (realp hours)))
(+ (round (* seconds 100))
(* (if minutes minutes 0) 6000)
(* (if hours hours 0) 360000))
(signal-time-reader-error))))
(defun read-hms-time-from-string (*time-string*)
(declare (special *time-string*) (type string *time-string*))
(with-input-from-string (stream *time-string*)
(read-hms-time stream)))
(defun read-time-number (stream)
(declare (type stream stream))
(flet ((char-to-integer (c) (position c "0123456789")))
(let ((temp)
(in-decimal)
(decimal-depth 0)
(c))
(loop
(setf c (read-char stream nil nil))
(cond
((not c) (if temp (return temp) (signal-time-reader-error)))
((digit-char-p c)
(unless temp (setf temp 0))
(cond
(in-decimal
(incf decimal-depth 1)
(incf temp (/ (char-to-integer c) (expt 10 decimal-depth))))
(t (setf temp (+ (* temp 10) (char-to-integer c))))))
((char= #\. c)
(cond
((not in-decimal) (setf in-decimal t decimal-depth 0))
(in-decimal (signal-time-reader-error))))
(t
(unread-char c stream)
(if temp (return temp) (signal-time-reader-error))))))))
(defun read-hms-time (stream)
(declare (type stream stream))
(let ((hours)(minutes)(seconds)(temp)(c)(last-read))
(flet ((reset () (setf temp nil)))
(loop
(setf c (read-char stream nil nil))
(cond
((not c)
(when temp
(case last-read
(:hours (if minutes
(signal-time-reader-error)
(setf minutes temp)))
(:minutes (if seconds
(signal-time-reader-error)
(setf seconds temp)))
(otherwise (signal-time-reader-error))))
(return (time-list-to-hundreths (list (or seconds 0) (or minutes 0) (or hours 0)))))
((or (digit-char-p c) (char= c #\.))
(when temp (signal-time-reader-error))
(unread-char c stream)
(setf temp (read-time-number stream))
(if (null temp) (signal-time-reader-error)))
((char-equal #\h c)
(cond
((or (not temp) hours minutes seconds) (signal-time-reader-error))
(t
(setf hours temp last-read :hours)
(reset))))
((char-equal #\m c)
(cond
((or (not temp) minutes seconds) (signal-time-reader-error))
(t
(setf minutes temp last-read :minutes)
(reset))))
((char-equal #\s c)
(cond
((or (not temp) seconds) (signal-time-reader-error))
(t
(setf seconds temp last-read :seconds)
(reset))))
((char= #\Space c) nil)
(t
(signal-time-reader-error)))))))
(defun read-colon-time-from-string (*time-string*)
(declare (special *time-string*) (type string *time-string*))
(with-input-from-string (stream *time-string*)
(read-colon-time stream)))
(defun read-colon-time (stream)
(declare (type stream stream))
(let ((time-list)(temp 0)(expecting-colon)(c))
(flet ((reset () (setf temp 0 expecting-colon nil)))
(loop
(setf c (read-char stream nil))
(cond
((not c)
(when temp (push temp time-list))
(return (time-list-to-hundreths time-list)))
((or (digit-char-p c) (char= #\. c))
(when expecting-colon (signal-time-reader-error))
(unread-char c stream)
(setf temp (read-time-number stream))
(unless temp (signal-time-reader-error))
(setf expecting-colon t))
((char-equal #\: c)
(cond
((>= (length time-list) 2)
(signal-time-reader-error))
(t (push temp time-list) (reset))))
((char= #\Space c) t)
(t
(signal-time-reader-error)))))))
(defun read-time-from-string (time-string)
(if (and time-string (> (length time-string) 0))
(restart-case
(handler-case (read-colon-time-from-string time-string)
(invalid-time-format () (invoke-restart 'restart-hms-time)))
(restart-hms-time () (read-hms-time-from-string time-string)))
nil))
(defun time-string (time)
(declare (type (integer 0 *) time))
(if (and time (not (zerop time)))
(let* ((hours (floor (/ time 360000)))
(minutes (floor (/ (- time (* hours 360000)) 6000)))
(hundreths-seconds (- time (* hours 360000) (* minutes 6000)))
(seconds (floor (/ hundreths-seconds 100)))
(hundreths (mod hundreths-seconds 100))
(result nil))
(cond
((and (zerop hours)
(zerop minutes))
(setf result (concatenate 'string (princ-to-string seconds))))
((zerop hours)
(setf result (concatenate 'string
(princ-to-string minutes) ":"
(when (< seconds 10) "0") (princ-to-string seconds))))
(t
(setf result (concatenate 'string
(princ-to-string hours) ":"
(when (< minutes 10) "0") (princ-to-string minutes) ":"
(when (< seconds 10) "0") (princ-to-string seconds)))))
(if (not (zerop hundreths))
(setf result (concatenate 'string
result
"."
(when (< hundreths 10) "0")
(princ-to-string hundreths))))
result)
nil))
(defclass time-input-pane (capi:text-input-pane)
((previous-text :initform nil :reader time-input-pane-text)
(previous-caret-position :initform 0))
(:default-initargs
:change-callback 'validate-time-text))
(defun validate-time-text (text pane interface caret-position)
(with-slots (previous-text previous-caret-position) pane
(handler-case
(progn
(read-time-from-string text)
(setf previous-text text previous-caret-position caret-position))
(invalid-time-format (condition)
(setf (capi:text-input-pane-text pane) previous-text
(capi:text-input-pane-caret-position pane) previous-caret-position)
(capi:beep-pane)))))
(defmethod (setf time-input-pane-text) (text (tip time-input-pane))
(with-slots (previous-text previous-caret-position) tip
(setf (capi:text-input-pane-text tip) text previous-text text previous-caret-position 0)))
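The attached file signals an INVALID-TIME-FORMAT condition but does not show its definition. Anyone wanting to load the parsing functions would need something along these lines. This is a guess at a plausible definition, not the author's code, and the reader name INVALID-TIME-FORMAT-STRING is hypothetical:

```lisp
;; Hypothetical definition; the original attachment does not include one.
;; The :INVALID-FORMAT initarg matches the MAKE-CONDITION call in
;; SIGNAL-TIME-READER-ERROR.
(define-condition invalid-time-format (error)
  ((invalid-format :initarg :invalid-format
                   :reader invalid-time-format-string))
  (:report (lambda (condition stream)
             (format stream "Invalid time format: ~S"
                     (invalid-time-format-string condition)))))
```

Making it a subclass of ERROR is also an assumption; HANDLER-CASE would work with any condition type.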
Erik Naggum wrote:
> * Cory Spencer
> | Just a quickie question - is there already a Common Lisp function that
> | will split a string on a given character?
>
> Most often, when people ask quickie questions, they have been working
> themselves through what one would think of as a labyrinth where they make
> brief excursions in the wrong direction and self-correct when they hit
> the wall, so to speak. When they hit the wall and do not self-correct,
> they post a quickie question, but there is an arbitrary amount of
> backtracking involved in providing the right answer. Just moving the person
> into a new labyrinth without the particular wall they have run into is
> seldom the best answer, as the wrong choice they have made will lead them
> right into another wall shortly thereafter. Therefore, a "quickie" is a
> strong signal to experienced problem-solvers that something is wrong: The
> requestor is stuck, but does not think he should have been. However, if
> his thinking were correct, he would not be stuck. Yet he is, and that is
> a hint that the amount of backtracking required will be significant and
> that is just the opposite of a "quickie".
This is not always true. I have just changed my OS. I think it is completely
normal if I do not know how to solve even simple UNIX problems. Then it is
very helpful if I can ask somebody. (For example I could not install
anti-aliased fonts in Qt, and the right hint, which solved my problem,
consisted of a single sentence.)
Now when I am acting as a teacher and one of my pupils asks me a "simple"
question I carefully investigate whether he is having a more serious
problem. In a newsgroup, however (for example in de.sci.mathematik), I simply
answer the question and do not care about his deeper problems.
> | ie) will perform a similar function as this:
>
> Generally speaking, a reader or parser of some sort.
>
> It is quite important [...]
I do not think I have understood this deep essay on payload characters,
whatever they may be, and I wonder if the original poster did.
I must admit, however, that I do not understand the closing remark on 70% of
the American adults either.
--
J B
Il n'y a guère dans la vie qu'une préoccupation grave: c'est la mort;
(Dumas)
-----------== Posted via Newsgroups.Com - Uncensored Usenet News ==----------
http://www.newsgroups.com The #1 Newsgroup Service in the World!
-----= Over 100,000 Newsgroups - Ulimited Fast Downloads - 19 Servers =-----
Sorry, jb = Janos Blazi.