Strings and lists

From: Josef Eschgfaeller
Subject: Strings and lists
Date: Sun, 20 Jun 1999 00:00:00 +0000
Message-ID: <Pine.LNX.4.04.9906200007360.16695-100000@arbzi.zuhause.fe>

I'm trying to improve
    
    (defun first-word (a)
    (let ((pos (position #\space a))) (if (not pos) a
    (subseq a 0 pos))))
    
In such algorithms for strings it is often not clear to me, in
which cases it is convenient to introduce lists, and when it has
sense to mix strings with lists.
When I try to apply the following idea, valid for a list of characters,
    
    (defun first-part (a)
    (do ((part nil) (b a (rest b)))
    ((or (not b) (char= (first b) #\space)) (nreverse part))
    (push (first b) part)))
    
I could begin by converting the string to a list by
(coerce a 'list), apply then first-part and reconvert
by (coerce b 'string). The second conversion seems easy,
the first one expensive.
    
I tried it also directly, as in
    
    (defmacro char-ok (a i) `(and (array-in-bounds-p ,a ,i) (char ,a ,i)))
    
    (defun first-word (a)
    (do* ((word "" (push-char word x)) (i 0 (+ i 1))
    (x (char-ok a i) (char-ok a i)))
    ((or (not x) (char= x #\space)) word)))
    
with
    
    (defun push-char (word x) (concatenate 'string word (list x)))
    
- probably not efficient - or
    
    (defun first-word (a)
    (do* ((word (make-array '0 :adjustable t :fill-pointer t))
    (i 0 (+ i 1)) (x (char-ok a i) (char-ok a i)))
    ((or (not x) (char= x #\space)) (coerce word 'string))
    (vector-push-extend x word)))

J. Eschgfaeller

Re: Strings and lists Steve Gonedes
Re: Strings and lists Erik Naggum

From: Steve Gonedes
Subject: Re: Strings and lists
Date: Sat, 19 Jun 1999 00:00:00 +0000
Message-ID: <m2zp1vr1c1.fsf@KludgeUnix.com>

Josef Eschgfaeller <···@felix.unife.it> writes:

< I'm trying to improve
<
<     (defun first-word (a)
<     (let ((pos (position #\space a))) (if (not pos) a
<     (subseq a 0 pos))))

Not sure what you mean by improve. This looks fine to me.

< In such algorithms for strings it is often not clear to me, in
< which cases it is convenient to introduce lists, and when it has
< sense to mix strings with lists.

Probably when you need to return a list of strings?

< I tried it also directly, as in
<
<     (defmacro char-ok (a i) `(and (array-in-bounds-p ,a ,i) (char ,a ,i)))

It is probably best to write `char-ok' as a function and declare it
inline.

<     (defun first-word (a)
<     (do* ((word "" (push-char word x)) (i 0 (+ i 1))
<     (x (char-ok a i) (char-ok a i)))
<     ((or (not x) (char= x #\space)) word)))
<
< with
<
<     (defun push-char (word x) (concatenate 'string word (list x)))
<
< - probably not efficient - or

I think the first version using position was the most effective means
of describing your intentions. Some would perhaps like declarations
for clarity, but they're very terse and typing them in is such a drag.

For run-time efficiency using declarations with position should be
fine.

(defun first-word (str)
  (declare (optimize (speed 3) (safety 1)) (simple-string str))
  (let ((pos (or (position #\Space str) -1)))
    (declare (fixnum pos))
    (if (minusp pos)
        str
        (subseq str 0 pos))))

You will probably gain a little bit of speed if you do the loop by
hand using do or loop, but I doubt how much time this will actually
shave.

(defun first-word (str)
  (declare (simple-string str))
  (do ((idx 0 (1+ idx))
       (len (1- (length str)) (1- len)))
      ((char= (schar str idx) #\Space)
       (subseq str 0 idx))
    (declare (fixnum idx len))
    (when (zerop len)
      (return-from first-word str))))

Using loop, it could look like the following.

(defun first-word (str)
  (or (loop for ch across str as idx upfrom 0
          when (char= ch #\Space)
          return (subseq str 0 idx))
      str))

From: Erik Naggum
Subject: Re: Strings and lists
Date: Sun, 20 Jun 1999 00:00:00 +0000
Message-ID: <3138871289438905@naggum.no>

* Josef Eschgfaeller <···@felix.unife.it>
| I'm trying to improve
|     
|     (defun first-word (a)
|     (let ((pos (position #\space a))) (if (not pos) a
|     (subseq a 0 pos))))

  it is hard to read your code the way you (don't) indent it.

(defun first-word (a)
  (let ((pos (position #\space a)))
    (if (not pos)
      a
      (subseq a 0 pos))))

| In such algorithms for strings it is often not clear to me in which cases
| it is convenient to introduce lists and when it has sense to mix strings
| with lists.

  you would have to have extraordinary needs for lists to make sense, but
  instead of writing code for strings, write your code for sequences --
  that way, you won't have to decide on types until you have a particular
  need.  note that POSITION, for instance, works on sequences, not only
  strings.  that is, (position #\Space (coerce "foo bar zot" 'list)) works
  just as well as (position #\Space "foo bar zot").  likewise for SUBSEQ,
  which is short for SUBSEQUENCE.  many new Lisp programmers are thrown by
  the fact that most usual string operations aren't string operations in
  Lisp, but sequence operations, like SEARCH, POSITION, and SUBSEQ.

  speaking generally, I think you should consider a less specific function,
  since hacking up sequences is a very common task, and you also often need
  to know whether there are further elements in the sequence you hack up.
  in other words, return both halves.

(defun get-token (sequence delimiter &key (key #'identity) (test #'eql))
  "Return the non-empty subsequences of SEQUENCES before and after the first
occurrence of the DELIMITER as two values, or NIL if it would be empty."
  (let* ((start (or (position delimiter sequence
 			      :key key
			      :test (complement test))
		    (return-from get-token nil)))
	 (end (or (position delimiter sequence
			    :key key
			    :test test
			    :start start)
		  (return-from get-token (subseq sequence start))))
	 (next (or (position delimiter sequence
			     :key key
			     :test (complement test)
			     :start (1+ end))
		   (return-from get-token (subseq sequence start)))))
    (values (subseq sequence start end) (subseq sequence next))))

  when hacking up fairly large strings, it makes sense to be cons-sensitive
  and replace the last form with

    (values (subseq sequence start end)
	    (if (vectorp sequence)
	      (multiple-value-bind (base offset)
		  (array-displacement sequence)
		(if base
		  (adjust-array sequence (- (length sequence) next)
				:displaced-to base
				:displaced-index-offset (+ next offset))
		  (make-array (- (length sequence) next)
		    :element-type (array-element-type sequence)
		    :displaced-to sequence
		    :displaced-index-offset next)))
	      (subseq sequence next)))

  which will not cause the tail of the string to needlessly copied.

  the above is somewhat expanded from production code in one of my systems.
  I use MAKE-INDIRECT-VECTOR and MOVE-INDIRECT-VECTOR to manipulate
  displaced strings, so the consequent of the final conditional is actually
  only (move-indirect-vector sequence :start next).  (this is why there are
  no :START and :END keywords.)

#:Erik
-- 
@1999-07-22T00:37:33Z -- pi billion seconds since the turn of the century