From: The Glauber
Subject: Q: parsing strings
Date: 
Message-ID: <8qt214$vu7$1@nnrp1.deja.com>
Hello,

is there an easy way in CL to parse fields contained within strings?

For example, suppose a file name:
"LASP.P.MVR.KS.TITL.S04752.D120765"

Is there an easy way to break this into a list:
("LASP" "P" "MVR" "KS" "TITL" "S04752" "D120765")

In Perl, for example, you could do this either by using the "split"
function or regular expressions.
Help me break out of my Perl habit! :-) :-) :-)


Thanks!

glauber

--
Glauber Ribeiro
··········@my-deja.com    http://www.myvehiclehistoryreport.com
"Opinions stated are my own and not representative of Experian"


Sent via Deja.com http://www.deja.com/
Before you buy.

From: David E. Lamy
Subject: Re: Q: parsing strings
Date: 
Message-ID: <slrn8t4f72.upu.delamy@opal.mint.net>
In article <············@nnrp1.deja.com>, The Glauber wrote:
>Hello,
>
>is there an easy way in CL to parse fields contained within strings?
>
>For example, suppose a file name:
>"LASP.P.MVR.KS.TITL.S04752.D120765"
>
>Is there an easy way to break this into a list:
>("LASP" "P" "MVR" "KS" "TITL" "S04752" "D120765")
>
Here follows a beginner's solution:

(defun my-split (orig key)
  (let (list-of-subseqs)
    (if (null orig)
	list-of-subseqs
      (let ((pos-key (position key orig)))
	(cond ((null pos-key)
	       (append (append list-of-subseqs (list orig))
		       (my-split nil key)))
	      (t
	       (append
		(append list-of-subseqs (list (subseq orig 0 pos-key)))
		(my-split (subseq orig (1+ pos-key)) key))))))))

I hope that this is not too inelegant a use of recursion.

-- 
   __o   As of 09/27/2000, free software is still the state of an  
 _`\<,_  open mind.  Develop it, share it and use it!
(*)/ (*) David Emile Lamy             ······@mint.net
         pgp key available at http://pgp5.ai.mit.edu/~bal
From: Rainer Joswig
Subject: Re: Q: parsing strings
Date: 
Message-ID: <joswig-526C00.20385027092000@news.is-europe.net>
In article <·····················@opal.mint.net>, ······@opal.mint.net 
(David E. Lamy) wrote:

> In article <············@nnrp1.deja.com>, The Glauber wrote:
> >Hello,
> >
> >is there an easy way in CL to parse fields contained within strings?
> >
> >For example, suppose a file name:
> >"LASP.P.MVR.KS.TITL.S04752.D120765"
> >
> >Is there an easy way to break this into a list:
> >("LASP" "P" "MVR" "KS" "TITL" "S04752" "D120765")
> >
> Here follows a beginner's solution:
> 
> (defun my-split (orig key)
>   (let (list-of-subseqs)
>     (if (null orig)
> 	list-of-subseqs
>       (let ((pos-key (position key orig)))
> 	(cond ((null pos-key)
> 	       (append (append list-of-subseqs (list orig))
> 		       (my-split nil key)))
> 	      (t
> 	       (append
> 		(append list-of-subseqs (list (subseq orig 0 pos-key)))
> 		(my-split (subseq orig (1+ pos-key)) key))))))))
> 
> I hope that this is not too inelegant a use of recursion.

Try to avoid APPEND.
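
To spell out the reason with a rough sketch: APPEND copies its first
argument, so growing a result with (append result (list x)) at every step
copies the whole list each time, and the total work grows quadratically
with the number of fields. The usual idiom is to PUSH onto the front and
NREVERSE once at the end. SPLIT-WITH-PUSH is a made-up name for
illustration, not code from any post in this thread.

```lisp
;; Illustrative sketch; SPLIT-WITH-PUSH is a hypothetical name.
(defun split-with-push (string separator)
  "Split STRING at each SEPARATOR character, accumulating with PUSH."
  (let ((result '())
        (start 0))
    (loop
      (let ((end (position separator string :start start)))
        ;; SUBSEQ accepts NIL as its end argument, meaning "to the end".
        (push (subseq string start end) result)
        (if end
            (setf start (1+ end))
            ;; PUSH built the list backwards; reverse it once.
            (return (nreverse result)))))))

(split-with-push "LASP.P.MVR.KS.TITL.S04752.D120765" #\.)
;; => ("LASP" "P" "MVR" "KS" "TITL" "S04752" "D120765")
```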

-- 
Rainer Joswig, Hamburg, Germany
Email: ·············@corporate-world.lisp.de
Web: http://corporate-world.lisp.de/
From: Lieven Marchand
Subject: Re: Q: parsing strings
Date: 
Message-ID: <m3r965vla2.fsf@localhost.localdomain>
The Glauber <··········@my-deja.com> writes:

> In Perl, for example, you could do this either by using the "split"
> function or regular expressions.
> Help me break out of my Perl habit! :-) :-) :-)

There have been several implementations of split posted and refined on
this group. Use deja.com.

PS: Does anyone know of another comprehensive archive to refer people to,
now that deja.com seems to be gradually getting out of the archiving
business?

-- 
Lieven Marchand <···@bewoner.dma.be>
Lambda calculus - Call us a mad club
From: The Glauber
Subject: Re: Q: parsing strings
Date: 
Message-ID: <8qt8bf$5vs$1@nnrp1.deja.com>
In article <············@nnrp1.deja.com>,
  The Glauber <··········@my-deja.com> wrote:
> Hello,
>
> is there an easy way in CL to parse fields contained within strings?
>
> For example, suppose a file name:
> "LASP.P.MVR.KS.TITL.S04752.D120765"
>
> Is there an easy way to break this into a list:
> ("LASP" "P" "MVR" "KS" "TITL" "S04752" "D120765")


This is my shamefully novice attempt. It takes two parameters: a string
to parse and a separator character.

(defun my-split (source-str separator-char)
  (let ((result-list nil)
        (start-pos 0)
        (end-pos 0))
    (dotimes (num-elements (count separator-char source-str))
      ;; Search from START-POS itself, so that a separator at the very
      ;; start of a field (an empty field) is not skipped.
      (setf end-pos
            (position separator-char source-str :start start-pos))
      (setf result-list
            (append result-list (list (subseq source-str start-pos end-pos))))
      (setf start-pos (+ 1 end-pos)))
    ;; Whatever follows the last separator, or the whole string if
    ;; there was no separator at all.
    (setf result-list
          (append result-list (list (subseq source-str start-pos))))
    result-list))


--
Glauber Ribeiro
··········@my-deja.com    http://www.myvehiclehistoryreport.com
"Opinions stated are my own and not representative of Experian"


From: Christian Nybø
Subject: Re: Q: parsing strings
Date: 
Message-ID: <8766nhg610.fsf@lapchr.siteloft.com>
The Glauber <··········@my-deja.com> writes:

> is there an easy way in CL to parse fields contained within strings?
> 
> For example, suppose a file name:
> "LASP.P.MVR.KS.TITL.S04752.D120765"

There is READ-DELIMITED-LIST, but it tries to read the stream, so it's
not quite useful for your purpose.  In Franz' AllegroServe, there's a
function that does what you want;

USER(17): (net.aserve::split-on-character 
                "LASP.P.MVR.KS.TITL.S04752.D120765" #\.)
("LASP" "P" "MVR" "KS" "TITL" "S04752" "D120765")


As the code is GPL'ed, I suppose it's all right to copy it here. For
context, get the whole aserve package at ftp.franz.com/serve.

;; The if* macro used in this code can be found at:
;;
;; http://www.franz.com/~jkf/ifstar.txt


(defun split-on-character (str char &key count)
  ;; given a string return a list of the strings between occurrences
  ;; of the given character.
  ;; If the character isn't present then the list will contain just
  ;; the given string.
  (let ((loc (position char str))
	(start 0)
	(res))
    (if* (null loc)
       then ; doesn't appear anywhere, just the original string
	    (list str)
       else ; must do some work
	    (loop
	      (push (subseq str start loc) res)
	      (setq start (1+ loc))
	      (if* count then (decf count))
	      (setq loc (position char str :start start))
	      (if* (or (null loc)
		       (eql 0 count))
		 then (if* (< start (length str))
			 then (push (subseq str start) res)
			 else (push "" res))
		      (return (nreverse res)))))))

-- 
chr
From: The Glauber
Subject: Re: Q: parsing strings
Date: 
Message-ID: <8qt9q9$7ah$1@nnrp1.deja.com>
In article <··············@lapchr.siteloft.com>,
  ·····@eunet.no (Christian Nybø) wrote:
> The Glauber <··········@my-deja.com> writes:
>
> > is there an easy way in CL to parse fields contained within strings?
> >
> > For example, suppose a file name:
> > "LASP.P.MVR.KS.TITL.S04752.D120765"
>
> There is READ-DELIMITED-LIST, but it tries to read the stream, so it's
> not quite useful for your purpose.

How about read-delimited-list using a string-stream?
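
A sketch of what the string-stream route might look like.
READ-DELIMITED-LIST goes through the Lisp reader, for which a dot inside
a token is an ordinary constituent character, so wrapping the string in a
stream does not by itself make it split on dots. Reading characters by
hand from the string stream does work. SPLIT-FROM-STREAM is a made-up
name, not a standard function.

```lisp
;; Illustrative sketch; SPLIT-FROM-STREAM is a hypothetical name.
(defun split-from-stream (string separator)
  "Split STRING at SEPARATOR by reading it through a string stream."
  (with-input-from-string (in string)
    (let ((fields '())
          (field (make-string-output-stream)))
      (loop
        (let ((c (read-char in nil nil)))
          (cond ((null c)
                 ;; End of input: flush the last field and finish.
                 (push (get-output-stream-string field) fields)
                 (return (nreverse fields)))
                ((char= c separator)
                 ;; GET-OUTPUT-STREAM-STRING also resets the stream.
                 (push (get-output-stream-string field) fields))
                (t
                 (write-char c field))))))))

(split-from-stream "LASP.P.MVR" #\.)
;; => ("LASP" "P" "MVR")
```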

glauber

--
Glauber Ribeiro
··········@my-deja.com    http://www.myvehiclehistoryreport.com
"Opinions stated are my own and not representative of Experian"


From: Sunil Mishra
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39D2302E.9000605@everest.com>
Maybe I'm missing something, but I just don't see why the function in 
aserve is so complex... This does essentially the same job, minus the 
count argument, which should be really easy to introduce.

(defun split (string char)
  (loop for start = 0 then (1+ end)
        for end = (position char string :start start)
        collect (subseq string start (or end (length string)))
        while end))

Christian Nybø wrote:

> The Glauber <··········@my-deja.com> writes:
> 
> 
>> is there an easy way in CL to parse fields contained within strings?
>> 
>> For example, suppose a file name:
>> "LASP.P.MVR.KS.TITL.S04752.D120765"
> 
> 
> There is READ-DELIMITED-LIST, but it tries to read the stream, so it's
> not quite useful for your purpose.  In Franz' AllegroServe, there's a
> function that does what you want;
> 
> USER(17): (net.aserve::split-on-character 
>                 "LASP.P.MVR.KS.TITL.S04752.D120765" #\.)
> ("LASP" "P" "MVR" "KS" "TITL" "S04752" "D120765")
> 
> 
> As the code is GPL'ed, I suppose it's all right to copy it here, for
> context, get the whole aserve package at ftp.franz.com/serve.
> 
> ;; The if* macro used in this code can be found at:
> ;;
> ;; http://www.franz.com/~jkf/ifstar.txt
> 
> 
> (defun split-on-character (str char &key count)
>   ;; given a string return a list of the strings between occurrences
>   ;; of the given character.
>   ;; If the character isn't present then the list will contain just
>   ;; the given string.
>   (let ((loc (position char str))
> 	(start 0)
> 	(res))
>     (if* (null loc)
>        then ; doesn't appear anywhere, just the original string
> 	    (list str)
>        else ; must do some work
> 	    (loop
> 	      (push (subseq str start loc) res)
> 	      (setq start (1+ loc))
> 	      (if* count then (decf count))
> 	      (setq loc (position char str :start start))
> 	      (if* (or (null loc)
> 		       (eql 0 count))
> 		 then (if* (< start (length str))
> 			 then (push (subseq str start) res)
> 			 else (push "" res))
> 		      (return (nreverse res)))))))
From: Hannu Koivisto
Subject: Re: Q: parsing strings
Date: 
Message-ID: <87bsx98z4q.fsf@senstation.vvf.fi>
Sunil Mishra <············@everest.com> writes:

| Maybe I'm missing something, but I just don't see why the function in
| aserve is so complex... This does essentially the same job, minus the
| count argument, which should be really easy to introduce.

And minus the optimization that if the character doesn't appear
anywhere in the string, the original string is returned in a list.
But you are right, when I first saw that code it looked so
convoluted that I didn't even bother to try to read and understand
all of it.

| (defun split (string char)
| (loop for start = 0 then (1+ end)
| for end = (position char string :start start)
| collect (subseq string start (or end (length string)))
| while end))

(defun string-split (str &optional (separator #\Space))
  "Splits the string STR at each SEPARATOR character occurrence.
The resulting substrings are collected into a list which is returned.
A SEPARATOR at the beginning or at the end of the string STR results
in an empty string in the first or last position of the list
returned."
  (declare (type string str)
           (type character separator))
  (loop for start = 0 then (1+ end)
        for end   = (position separator str :start start)
        collect (subseq str start end)
        until (null end)))

When you replace that unnecessary (OR END (LENGTH STRING)) with
just END, you get almost my version above (where I'm being only
slightly more explicit with the ending condition), which I think I
posted here a while ago.  If the original poster happens to read
this, I'd recommend trying Deja next time.  Questions of this kind
are asked quite frequently.

And yes, introducing the count argument is easy, just one line
more.  If one also wants that "character doesn't appear anywhere in
the string -> return the original string in a list" optimization,
it's also just two more lines or so if added explicitly to this
function, but one could also consider introducing an alternative
subseq function, which doesn't make the guarantee that it always
allocates a new sequence for a result and doesn't share storage
with an old sequence, and use that.  In any case, even with both
those additions, the result would be less complex (more readable,
less than half in length and probably faster too) than the one from
aserve.
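
As a rough sketch of the count addition: the extra condition below stops
looking for separators once COUNT splits have been made, so the rest of
the string ends up in the last element. The name STRING-SPLIT-COUNT, the
use of keyword arguments, and the reading of COUNT as "maximum number of
splits" are assumptions made for illustration.

```lisp
;; Illustrative sketch; the name and the meaning of COUNT are assumptions.
(defun string-split-count (str &key (separator #\Space) count)
  "Split STR at SEPARATOR, making at most COUNT splits if COUNT is given."
  (loop for start = 0 then (1+ end)
        for splits from 0
        ;; Once COUNT splits are done, END becomes NIL and the
        ;; remainder of STR is collected as the final element.
        for end = (and (or (null count) (< splits count))
                       (position separator str :start start))
        collect (subseq str start end)
        until (null end)))

(string-split-count "a.b.c" :separator #\. :count 1)
;; => ("a" "b.c")
```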

-- 
Hannu
From: Frank A. Adrian
Subject: Re: Q: parsing strings
Date: 
Message-ID: <rOzA5.898$XZ5.466837@news.uswest.net>
Why not rename the sequence var, take out the declarations, and name the
function split?  It seems to work for any type of sequence. 'Twould be more
handy, though not necessarily more efficient, methinks.
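
Roughly what that would look like, as a sketch rather than tested
library code (the argument names are my own choice):

```lisp
;; Illustrative sketch of the suggestion: no declarations, any sequence.
(defun split (sequence separator)
  "Split SEQUENCE at each element that is EQL to SEPARATOR."
  (loop for start = 0 then (1+ end)
        for end = (position separator sequence :start start)
        collect (subseq sequence start end)
        until (null end)))

(split "a b c" #\Space)     ; => ("a" "b" "c")
(split '(1 2 0 3 0 4) 0)    ; => ((1 2) (3) (4))
```

On a list, both POSITION with :START and SUBSEQ have to walk from the
head of the list each time, which is where the efficiency caveat comes
from.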

faa

"Hannu Koivisto" <·····@iki.fi.ns> wrote in message
···················@senstation.vvf.fi...
> (defun string-split (str &optional (separator #\Space))
>   "Splits the string STR at each SEPARATOR character occurrence.
> The resulting substrings are collected into a list which is returned.
> A SEPARATOR at the beginning or at the end of the string STR results
> in an empty string in the first or last position of the list
> returned."
>   (declare (type string str)
>            (type character separator))
>   (loop for start = 0 then (1+ end)
>         for end   = (position separator str :start start)
>         collect (subseq str start end)
>         until (null end)))
From: Dirk Zoller
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E2BB4E.EB87BA62@onlinehome.de>
Hello,

I always felt that the very powerful (format) lacks a counterpart
for input. Like in C there is scanf() which in some sense reverses
the effect of printf().

I then tried to write such a thing. It seems possible. Please find
attached a first attempt to do it.

I ran the attached program on a large file with stock-index prices.
In order to make it fast enough I added declarations and proclamations
which on CMUCL actually resulted in a very compact and fast machine
representation. Not too far away from what C's scanf() would do.

Why isn't such a nice function already defined in Common Lisp?

Kind regards

Dirk


-- 
Dirk Zoller				Fon: 06106-876566
Obere Marktstraße 5			e-mail: ···@sol-3.de
63110 Rodgau
[Attachment: scan.lisp]

;;; C-scanf() like format analogon for input.
;;;
;;; $Id$

(proclaim '(optimize (speed 3) (safety 0) (space 0)))


(defmacro with-type (type expr)
  `(the ,type ,(if (atom expr)
                   expr
                   (expand-call type (binarize expr)))))

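;; WITH-TYPE calls these helpers while it is being expanded, so they must
;; also exist at compile time when the file goes through COMPILE-FILE.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun expand-call (type expr)
    `(,(car expr) ,@(mapcar #'(lambda (a)
                                `(with-type ,type ,a))
                            (cdr expr))))

  (defun binarize (expr)
    (if (and (nthcdr 3 expr)
             (member (car expr) '(+ - * /)))
        (destructuring-bind (op a1 a2 . rest) expr
          (binarize `(,op (,op ,a1 ,a2) ,@rest)))
        expr)))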



(defun whitespace-char-p (c)
  "(c)

Returns non-nil iff c is a non visible character like space or newline."

  (position c #(#\Space
		#\Newline
		#\Tab
		#\Linefeed
		#\Return
		#\Page
		#\Backspace		; questionable
		#\Rubout)))		; questionable


(defun string-to-unsigned-fixnum (str &key (start 0) (end nil) (radix 10))

  "(str &key (start 0) (end nil) (radix 10))

Converts part of a string str to an unsigned fixnum.

Starts converting at position start, stops at end when specified or at
end of string or when a non-convertible character is seen. If end is
specified then it must not be greater than the length of the string.

Assumes the specified radix and does neither recognize a radix
encoding in the string nor a sign nor even leading whitespace.

Returns the converted number and the position of the first character
in the string behind that number. Returns 0 when no convertible
characters were seen at all."

  (declare (simple-string str)
           (fixnum start radix))
  (let ((len (or end (length str))))
    (declare (fixnum len))
    (do ((i start (1+ i))
         (n 0))
        ((= i len)
         (values n i))
      (declare (fixnum i n))
      (let ((d (digit-char-p (schar str i) radix)))
        (if d
            (setq n (with-type fixnum (+ d (* n radix))))
            (return (values n i)))))))


(defun string-to-unsigned-integer (str &key (start 0) (end nil) (radix 10))

  "(str &key (start 0) (end nil) (radix 10))

Converts part of a string str to an integer.

Starts converting at position start, stops at end when specified or at
end of string or when a non-convertible character is seen. If end is
specified then it must not be greater than the length of the string.

Assumes the specified radix and does neither recognize a radix
encoding in the string nor a sign nor even leading whitespace.

Returns the converted number and the position of the first character
in the string behind that number. Returns nil when no convertible
characters were seen at all."

  ;; END defaults to NIL, so it must not be declared FIXNUM.
  (declare (simple-string str)
           (fixnum start radix))
  (let ((len (or end (length str))))
    (do ((i start (1+ i))
         (n 0))
        ((= i len)
         (values n len))
      (let ((d (digit-char-p (schar str i) radix)))
        (if d
            (setq n (with-type integer (+ d (* n radix))))
            (return (if (> i start)
			(values n i))))))))


(defun string-to-integer (str &key (start 0) (end nil) (radix 10))

  "(str &key (start 0) (end nil) (radix 10))

Converts part of a simple-string str to an integer.

Skips leading whitespace and recognizes #\+ and #\- as signs.

Converts up to end (if specified) or end of string or until a
non-convertible character is seen, whichever comes first.

Uses the specified radix as number conversion base. However, the
specified radix may be overridden by the input string, using a syntax
of #x, #o, #b, #nnr for hex, octal, binary or other.

Returns nil on error or the resulting integer and the position of the
first character behind the integer."

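  (declare (simple-string str)
           (fixnum start radix))
  (let* ((len (let ((l (length str)))
                (if end (min l end) l)))
         ;; POSITION-IF-NOT returns NIL when only whitespace remains;
         ;; fall back to LEN so that I is always a fixnum.
         (i (or (position-if-not #'whitespace-char-p str
                                 :start start :end len)
                len))
         (negative nil)
         (n nil))
    (declare (fixnum len i))
    ;; DEFUN does not establish a block named NIL, so a bare RETURN is
    ;; an error here; leave through the function's own block instead.
    (when (= i len)
      (return-from string-to-integer nil))
    (case (schar str i)
      (#\- (setq negative t)
           (incf i)
           (when (= i len)
             (return-from string-to-integer nil)))
      (#\+ (incf i)
           (when (= i len)
             (return-from string-to-integer nil))))
    (when (char= #\# (schar str i))
      (incf i)
      (when (= i len)
        (return-from string-to-integer nil))
      (case (schar str i)
        (#\b (incf i) (setf radix #b10))
        (#\o (incf i) (setf radix #o10))
        (#\x (incf i) (setf radix #x10))
        ;; A clause headed by (T) would match only the symbol T;
        ;; the default clause is spelled T or OTHERWISE.
        (otherwise
         (multiple-value-setq (radix i)
           (string-to-unsigned-fixnum str :start i :end len))
         ;; STRING-TO-UNSIGNED-FIXNUM yields 0, not NIL, when it sees
         ;; no digits, so check for a usable radix instead.
         (unless (<= 2 radix 36)
           (return-from string-to-integer nil))
         (when (= i len)
           (return-from string-to-integer nil))
         (unless (char-equal (schar str i) #\r)
           (return-from string-to-integer nil))
         (incf i))))
    (multiple-value-setq (n i)
      (string-to-unsigned-integer str :start i :end len :radix radix))
    (when (not n)
      (return-from string-to-integer nil))
    (values (if negative (- n) n) i)))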


#|

(defun string-to-integer-1 (str &key (start 0) (end nil) (radix 10))
  (declare (string str)
           (fixnum start end radix))
  (let* ((l (length str))
         (r (if end (min l end) l)))
    (do ((i start (1+ i))
         (d 0 (if (< i r)
                  (digit-char-p (char str i) radix)))
         (n 0 (+ d (* n radix))))
        ((not d)
         (values n (1- i))))))


(defun test-string-to-integer (fun n)
  (dotimes (i n)
    (let ((k (funcall fun "12345"))))))

(compile 'string-to-fixnum)
(compile 'string-to-integer)
(compile 'string-to-integer-1)
(compile 'test-string-to-integer)

|#


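;;; STRING-TO-FLOAT is an unfinished fragment: its COND is incomplete and
;;; its parentheses do not balance, so it is kept inside a block comment
;;; to keep the definitions that follow readable by the Lisp reader.
#|
(defun string-to-float (str &key width (radix 10))
  (multiple-value-bind (int-part int-part-len)
      (string-to-integer str width radix)
    (if (or (not width) (< int-part-len width))
        (cond (char str int-part-len)

  (let* ((l (length str))
         (w (if width (min l width) l)))
    (do ((i 0 (1+ i))
         (d 0 (if (< i w)
                  (digit-char-p (char str i) radix)))
         (n 0 (+ d (* n radix))))
        ((not d)
         (values n (1- i))))))
|#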

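;;; STRM and STREAM-PARSE-INTEGER are not defined anywhere in this file,
;;; so SCAN-FORMATTED is an outline rather than working code.
(defun scan-formatted (form)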
  (let (result)
    (labels
        ((scan-character (width)
           (push (read-char strm) result))
         (scan-string (width)
           (push (read-char strm) result))
         (scan-integer (width)
           (push (stream-parse-integer strm) result))
         (scan-float (width)
           (push (stream-parse-integer strm) result))
         (dispatch (c)
           (case c
             (#\~
              (let ((width (stream-parse-integer form))
                    (form-char (read-char form nil nil)))
                (case form-char
                  (#\c (scan-character width))
                  (#\s (scan-string width))
                  (#\d (scan-integer width))
                  (#\f (scan-float width))
                  (otherwise nil))))
             (otherwise
              (let ((d (read-char strm)))
                (eq c d))))))
      (do* ((c (read-char form nil nil)
               (read-char form nil nil)))
           ((or (not c)
                (not (dispatch c)))
            (if c
                ;; EOF in form not hit, means error, return nil.
                nil
                ;; EOF in form, means all formats processed.
                ;; Return reversed result list.
                (values-list (nreverse result))))))))

(defun read-daxa ()
  (with-open-file (daxa "daxa.asc"
                        :direction :input)
    (do ((line (read-line daxa nil nil)
               (read-line daxa nil nil))
         (result nil
                 (cons (multiple-value-list
                           (scan-string-string
                            "~2d~2d~2d,~3d.~3d,~3d.~3d,~3d.~3d,~3d.~3d"
                            line))
                       result)))
        ((not line) result))))

From: Rahul Jain
Subject: Re: Q: parsing strings
Date: 
Message-ID: <8rufvp$crc$1@joe.rice.edu>
In article <·················@onlinehome.de> on Tue, 10 Oct 2000 08:46:38
+0200, Dirk Zoller <···@onlinehome.de> wrote:

> Hello,
> 
> I always felt that the very powerful (format) lacks a counterpart
> for input. Like in C there is scanf() which in some sense reverses
> the effect of printf().
> 
> I then tried to write such a thing. It seems possible. Please find
> attached a first attempt to do it.
...
> 
> Why isn't such a nice function already defined in Common Lisp?

Because of *print-readably* and (read).
For most objects, it works quite well.
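
A small sketch of the round trip meant here, using only standard
functions. WITH-STANDARD-IO-SYNTAX binds *PRINT-READABLY* to T along with
the other printer and reader variables, so the text that is printed can
be read back as an equivalent object. The sample data is made up.

```lisp
;; Illustrative sketch: print a list readably, then read it back.
(let* ((data (list 42 "text" #\a 'sym (list 1 2)))
       (text (with-standard-io-syntax (prin1-to-string data)))
       (back (with-standard-io-syntax (read-from-string text))))
  (equal data back))
;; => T
```

This covers text that Lisp itself printed; it says nothing about data
laid out by some other program.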

-- 
-> -\-=-=-=-=-=-=-=-=-=-/^\-=-=-=<*><*>=-=-=-/^\-=-=-=-=-=-=-=-=-=-/- <-
-> -/-=-=-=-=-=-=-=-=-=/ {  Rahul -<>- Jain   } \=-=-=-=-=-=-=-=-=-\- <-
-> -\- "I never could get the hang of Thursdays." - HHGTTG by DNA -/- <-
-> -/- http://photino.sid.rice.edu/ -=- ·················@usa.net -\- <-
|--|--------|--------------|----|-------------|------|---------|-----|-|
   Version 11.423.999.210020101.23.50110101.042
   (c)1996-2000, All rights reserved. Disclaimer available upon request.
From: Dirk Zoller
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E2DDCD.A8F8E749@sol-3.de>
Rahul Jain wrote:
> > I always felt that the very powerful (format) lacks a counterpart
> > for input. Like in C there is scanf() which in some sense reverses
> > the effect of printf().
> >
> > Why isn't such a nice function already defined in Common Lisp?
> 
> Because of *print-readably* and (read).
> For most objects, it works quite well.

And what if I have a file with data not printed by Lisp which I want
to read in (not read back in)?

By pointing out the analogy printf()/scanf() I didn't mean that
I just want to read data which I previously wrote (this is particularly
easy in Lisp, although I found the Lisp reader of at least one system
pretty slow).

No, I meant to read *any* formatted text data from the outside world,
like (format) can very nicely format data to the outside world.

Don't Lisp programmers do that?

-- 
Dirk Zoller					Phone:  +49-69-50959861
Sol-3 GmbH&Co KG				Fax:    +49-69-50959859
Niddastraße 98-102				e-mail: ···@sol-3.de
60329 Frankfurt/M
Germany
From: Christophe Rhodes
Subject: Re: Q: parsing strings
Date: 
Message-ID: <sqsnq5ujqa.fsf@lambda.jesus.cam.ac.uk>
Dirk Zoller <···@sol-3.de> writes:

> Rahul Jain wrote:
> > > I always felt that the very powerful (format) lacks a counterpart
> > > for input. Like in C there is scanf() which in some sense reverses
> > > the effect of printf().
> > >
> > > Why isn't such a nice function already defined in Common Lisp?
> > 
> > Because of *print-readably* and (read).
> > For most objects, it works quite well.
> 
> And what if I have a file with data not printed by Lisp which I want
> to read in (not read back in)?
> 
> By pointing out the analogy printf()/scanf() I didn't mean that
> I just want to read data which I previously wrote (this is particularly
> easy in Lisp, although I found the Lisp reader of at least one system
> pretty slow).
> 
> No, I meant to read *any* formatted text data from the outside world,
> like (format) can very nicely format data to the outside world.
> 
> Don't Lisp programmers do that?

This seems to come up every once in a while. I think the answer is
"yes, they do", but that people write their own, dedicated code for
the particular application.

Having said that, I amused myself a while ago (when I was meant to be
revising for finals) by writing some code to do (setf format). You're
most welcome to a copy, with of course no warranty, etc, etc,
etc. It's at
<URL:http://www-jcsu.jesus.cam.ac.uk/~csr21/format-setf.lisp>. Last I
checked, it didn't break on load. 

However, the reasons that there isn't anything like this in the
language include the uncertainty of the semantics of (setf (format nil
"~d~d" x y) "123"), and so on, and you'll find that my code will
probably break on ambiguous input.

Christophe
-- 
Jesus College, Cambridge, CB5 8BL                           +44 1223 524 842
(FORMAT T "(·@{~w ········@{~w~^ ~})" 'FORMAT T "(·@{~w ········@{~w~^ ~})")
From: Dirk Zoller
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E2E6A2.CADB2553@sol-3.de>
Christophe Rhodes wrote:

> This seems to come up every once in a while. I think the answer is
> "yes, they do", but that people write their own, dedicated code for
> the particular application.

That was the answer I got two years ago when I last played with Lisp
and asked the same stupid question. I just thought maybe Lispers, who
seem otherwise very clever people, meanwhile got tired of reinventing
such an obviously required wheel. Seems not?

> However, the reasons that there isn't anything like this in the
> language include the uncertainty of the semantics of (setf (format nil
> "~d~d" x y) "123"), and so on, and you'll find that my code will
> probably break on ambiguous input.

This example isn't ambiguous: x is assigned 123, y is left alone, a value
of 1 is returned.

But why so ambitious? Why this added (setf) complexity? I would just
return multiple values or a list of results. That would also be more
"functional".
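
To make that concrete, here is a minimal sketch of a scanner that hands
back multiple values. SCAN-FIELDS and its tiny directive language (only
~Nd plus literal characters) are assumptions made for illustration; this
is not the attached scan.lisp and it makes no attempt at speed.

```lisp
;; Illustrative sketch. SCAN-FIELDS and its tiny directive language are
;; assumptions, loosely modelled on control strings such as "~2d~2d,~3d".
(defun scan-fields (control input)
  "Match INPUT against CONTROL.  A directive ~Nd reads exactly N characters
of INPUT as a decimal integer; any other character in CONTROL must match
INPUT literally.  Return the integers as multiple values, or NIL as soon
as something fails to match."
  (let ((results '())
        (pos 0)                         ; index into INPUT
        (i 0))                          ; index into CONTROL
    (loop while (< i (length control))
          do (let ((c (char control i)))
               (if (char= c #\~)
                   ;; Read the width N, then insist on the letter d.
                   (multiple-value-bind (width next)
                       (parse-integer control :start (1+ i) :junk-allowed t)
                     (unless (and width
                                  (plusp width)
                                  (< next (length control))
                                  (char-equal (char control next) #\d)
                                  (<= (+ pos width) (length input)))
                       (return-from scan-fields nil))
                     (let ((n (handler-case
                                  (parse-integer input :start pos
                                                       :end (+ pos width))
                                (parse-error () nil))))
                       (unless n
                         (return-from scan-fields nil))
                       (push n results)
                       (setf pos (+ pos width)
                             i (1+ next))))
                   ;; Literal character: it must match the input exactly.
                   (if (and (< pos (length input))
                            (char= c (char input pos)))
                       (setf pos (1+ pos)
                             i (1+ i))
                       (return-from scan-fields nil)))))
    (values-list (nreverse results))))

(multiple-value-list
 (scan-fields "~2d~2d~2d,~3d.~3d" "001012,123.456"))
;; => (0 10 12 123 456)
```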


-- 
Dirk Zoller					Phone:  +49-69-50959861
Sol-3 GmbH&Co KG				Fax:    +49-69-50959859
Niddastraße 98-102				e-mail: ···@sol-3.de
60329 Frankfurt/M
Germany
From: Christophe Rhodes
Subject: Re: Q: parsing strings
Date: 
Message-ID: <sqog0tugf6.fsf@lambda.jesus.cam.ac.uk>
Dirk Zoller <···@sol-3.de> writes:

> Christophe Rhodes wrote:
> 
> > This seems to come up every once in a while. I think the answer is
> > "yes, they do", but that people write their own, dedicated code for
> > the particular application.
> 
> That was the answer I got two years ago when I last played with Lisp
> and asked the same stupid question. I just thought maybe Lispers, who
> seem otherwise very clever people, meanwhile got tired of reinventing
> such an obviously required wheel. Seems not?

I don't think that the correct solution is at all obvious. That's the
problem (well, not that _I_ don't think the solution is obvious, but
that lots of people don't think the solution is obvious. Um. You know
what I mean).
 
> > However, the reasons that there isn't anything like this in the
> > language include the uncertainty of the semantics of (setf (format nil
> > "~d~d" x y) "123"), and so on, and you'll find that my code will
> > probably break on ambiguous input.
> 
> This example isn't ambiguous: x is assigned 123, y is left alone, a value
> of 1 is returned.
> 
> But why so ambitious? Why this added (setf) complexity? I would just
> return multiple values or a list of results. That would also be more
> "functional".

Well, the short answer is that it was because it was fun (remember, I
should have been working on passing a Physics degree).

However, you can get what you want in any case with a little simple
macrology on top of the setf stuff, no?

Christophe
-- 
Jesus College, Cambridge, CB5 8BL                           +44 1223 524 842
(FORMAT T "(·@{~w ········@{~w~^ ~})" 'FORMAT T "(·@{~w ········@{~w~^ ~})")
From: Dirk Zoller
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E326B0.A0F32239@sol-3.de>
Christophe Rhodes wrote:
> I don't think that the correct solution is at all obvious. That's the
> problem (well, not that _I_ don't think the solution is obvious, but
> that lots of people don't think the solution is obvious. Um. You know
> what I mean).

Well, probably due to my lack of experience with Lisp, I don't see
any problems. I'd say, what I've posted earlier are the beginnings
of a nice solution.

The only drawback is that the Lisp code to fiddle around with each
character of both format string and input string is probably pretty
inefficient (unless heavily pumped up with type declarations).
Therefore this should be in the System implementation, where it could
be done in C or whatever chosen by the system maker.

The C-programmer (although she could) typically doesn't write her own
scanf(), so why should the Lisp programmer? Both the amount of work
and the benefits seem similar to me in both worlds. One provides such
a solution, the other doesn't. Strange.


-- 
Dirk Zoller					Phone:  +49-69-50959861
Sol-3 GmbH&Co KG				Fax:    +49-69-50959859
Niddastraße 98-102				e-mail: ···@sol-3.de
60329 Frankfurt/M
Germany
From: Erik Naggum
Subject: Re: Q: parsing strings
Date: 
Message-ID: <3180181525265382@naggum.net>
* Dirk Zoller <···@sol-3.de>
| Well, probably due to my lack of experience with Lisp, I don't see
| any problems. I'd say, what I've posted earlier are the beginnings
| of a nice solution.

  You responded "why so ambitious" to someone's argument that you
  can't do it all.  If you have the beginnings, where is it going, if
  you don't want others to be so ambitious?

| The only drawback is that the Lisp code to fiddle around with each
| character of both format string and input string is probably pretty
| inefficient (unless heavily pumped up with type declarations).
| Therefore this should be in the System implementation, where it could
| be done in C or whatever chosen by the system maker.

  Wrong conclusion.  Therefore the Common Lisp implementation should
  be improved to the point where such things are fast enough to be
  written in a good language.  In practice, they are.  Doing parsing
  in C is insane.  C is unsuited to process textual input.  (One might
  also argue that Common Lisp is unsuited to process binary input.)
  Perl is the answer to the needs of the C programmers.  And if this
  doesn't scare you, you're too much of a cynic for your own good.

| The C-programmer (although she could) typically doesn't write her
| own scanf(), so why should the Lisp programmer?  Both the amount of
| work and the benefits seem similar to me in both worlds.  One
| provides such a solution, the other doesn't. Strange.

  I wasn't aware that C programmers were female¹, but C programmers
  don't write their own necessary tools because they have something
  that almost fits the bill, for a relaxed understanding of "fits",
  and so they never quite get it right, especially when they discover
  that to get it entirely right requires massive support from tools
  not at their disposal.  Hence the gargantuan libraries of C++ and
  Java, which both do a much better approximation to "fit" than C ever
  could hope for, except they are also _massively_ expensive to learn
  to use well and very cumbersome and verbose in practice to boot.

  Incidentally, tools such as yacc and lex are very good, but they are
  much, much slower than anyone who contemplates using them could even
  conceive that they would be.  It's C, so it must be fast, right?
  Well, they're C all right, and _therefore_ slow, because C doesn't
  have the necessary machinery to process textual input efficiently,
  so those who wanted it made half-assed attempts and were satisfied
  with them prematurely, like the immigrants who stop improving their
  English as soon as they are no longer actively bothered by repeating
  themselves to those who don't understand them, or find others who can
  understand their inferior language skills and pronunciation.

  I find that Lisp's very nature makes writing parsers easy and very
  straight-forward.  Much easier than doing them in C with all sorts
  of inferior tools that don't quite cut it.  Like scanf, regexps, ...

#:Erik
-------
¹ I am, however, aware of the silly, annoying trend among some people
  who don't appreciate the history of the English language to think
  that "he" and "man" refers to the _male_.  They don't.  The male in
  English doesn't have his own pronouns and sex²-specific terms the
  way the female does.  And so now you want to take everything away
  from the males?  To what end is this productive and constructive?
² Yes, it's "sex", not "gender", too.
-- 
  If this is not what you expected, please alter your expectations.
From: Dirk Zoller
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E34BCE.3E0D8085@sol-3.de>
Erik Naggum wrote:

>   You responded "why so ambitious" to someone's argument that you
>   can't do it all.  If you have the beginnings, where is it going, if
>   you don't want others to be so ambitious?

Negative. I meant it is pointless (over-ambitious) to do it in that
(setf()) disguise, at least at first. I'd care for the functionality
first. This can be perfectly presented using a list as return value or
multiple values.

>   I wasn't aware that C programmers were female¹, but C programmers

Personally, I wish more programmers were female.

The rest is blah blah, partly right, partly wrong, mostly off topic,
just the stuff to heat up the discussion a little.


-- 
Dirk Zoller					Phone:  +49-69-50959861
Sol-3 GmbH&Co KG				Fax:    +49-69-50959859
Niddastraße 98-102				e-mail: ···@sol-3.de
60329 Frankfurt/M
Germany
From: Erik Naggum
Subject: Re: Q: parsing strings
Date: 
Message-ID: <3180206945125573@naggum.net>
* Dirk Zoller <···@sol-3.de>
| The rest is blah blah, partly right, partly wrong, mostly off topic,
| just the stuff to heat up the discussion a little.

  Why are you so easily manipulated?  Why are you telling everyone?

  Where's the purpose to your communication that enables others to see
  where you're going and to share your journey with you?  Responses
  like the above are clear indicators that you don't have any purpose
  of your own and get side-tracked by any contrary information.  This
  also explains why you think the way you do about how to use Lisp and
  why you are reinventing an inferior non-solution to a non-problem.
  Think carefully about what you want to accomplish, do not focus on
  the means of accomplishing it until you are ready.  Above all, don't
  get all self-defensive because you weren't ready in time -- just
  think about it more and _become_ ready at some later time.  It's OK
  to make mistakes if they are recognized as such, but not OK if you
  defend them as if they weren't.

  This was intended to return the discussion to a normal temperature
  despite your stupid desire to heat it up for your own entertainment.
  If you feel heated up, answer the first two questions honestly, and
  realize that you have exposed yourself _way_ too much already.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.
From: Dirk Zoller
Subject: Off topic personal flame war.
Date: 
Message-ID: <39E3C35A.169AFE89@onlinehome.de>
Sorry list, I should have known better.
Maybe it has some entertaining value.


Erik Naggum wrote:
> * Dirk Zoller <···@sol-3.de>
> | The rest is blah blah, partly right, partly wrong, mostly off topic,
> | just the stuff to heat up the discussion a little.
> 
>   Why are you so easily manipulated?  Why are you telling everyone?

Well, your comments tend in all sorts of directions which seemed useless
with respect to what I was trying to say.

You also make cheap points insofar as you simply claim that when the world
is not perfect, then it just has to be perfect (compilers just must compile
well, data formats just have to be well designed, etc.). Of course it's
you who decides what is perfect. A never-ending source of joy for you.

This is not helpful to me and yes, this made your posting off topic blah blah.

>   Where's the purpose to your communication that enables others to see
>   where you're going and to share your journey with you?

Parse error in sentence.

>   Responses
>   like the above are clear indicators that you don't have any purpose
>   of your own and get side-tracked by any contrary information.

I would not go so far as to call your postings information.
In fact I'm trying to get not -- too -- distracted.
You're doing a good job at distracting. Years of practice I guess.

>   This
>   also explains why you think the way you do about how to use Lisp and
>   why you are reinventing an inferior non-solution to a non-problem.

See above, your trick is to aggressively deny anything not matching your
notion of perfection of the day.

Doing so at maximum length, implicitly pretending you could do better,
probably never showing a constructive solution. Yawn.

>   Think carefully about what you want to accomplish, do not focus on
>   the means of accomplishing it until you are ready.  Above all, don't
>   get all self-defensive because you weren't ready in time -- just
>   think about it more and _become_ ready at some later time.  It's OK
>   to make mistakes if they are recognized as such, but not OK if you
>   defend them as if they weren't.

You're so kind master. I really buy that you mean that.

>   This was intended to return the discussion to a normal temperature

Here I'd have been your friend again :-)

>   despite your stupid desire to heat it up for your own entertainment.

But that spoiled it :-(

>   If you feel heated up, answer the first two questions honestly, and

I feel annoyed. Bugged.

>   realize that you have exposed yourself _way_ too much already.

Oh really? Damn!



-- 
Dirk Zoller				Fon: 06106-876566
Obere Marktstraße 5			e-mail: ···@sol-3.de
63110 Rodgau
From: Erik Naggum
Subject: Re: Off topic personal flame war.
Date: 
Message-ID: <3180219689392883@naggum.net>
* Dirk Zoller <···@onlinehome.de>
| Sorry list, I should have known better.

  Yes, you should.  I'll repeat the questions you should ask yourself
  and answer honestly without posting any more personally revealing
  nonsense.

  Why are you so easily manipulated?  Why are you telling everyone?

  You have chosen to take on the role of a village idiot on tour,
  asking everyone to yank your chain in order to entertain them.  It
  isn't entertaining.  It's very stupid and annoying to watch to boot.

  Go reimplement scanf in XML, now.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.
From: Rainer Joswig
Subject: Re: Q: parsing strings
Date: 
Message-ID: <joswig-ACACF9.12003010102000@news.is-europe.net>
In article <·················@sol-3.de>, Dirk Zoller <···@sol-3.de> 
wrote:

> Rahul Jain wrote:
> > > I always felt that the very powerful (format) lacks a counterpart
> > > for input. Like in C there is scanf() which in some sense reverses
> > > the effect of printf().
> > >
> > > Why isn't such a nice function already defined in Common Lisp?
> > 
> > Because of *print-readably* and (read).
> > For most objects, it works quite well.
> 
> And what if I have a file with data not printed by Lisp which I want
> to read in (not read back in)?
> 
> By pointing out the analogy printf()/scanf() I didn't mean that
> I just want to read data which I previously wrote (this is particularly
> easy in Lisp, although I found the Lisp reader of at least one system
> pretty slow).
> 
> No, I meant to read *any* formatted text data from the outside world,
> like (format) can very nicely format data to the outside world.

I don't have a definitive answer for that, but it seems
that there are no "simple" solutions available. Somebody
has to propose a design (an implementation would also
be nice) and the community will have to see if they like
it.

> Don't Lisp programmers do that?

They do. There are some regexp-packages (non-standard, too)
and some parser tools that people seem to use.

Unfortunately sometimes a "naive" design mixes with slow
basic performance of some Lisp facilities (STREAMS, ...),
so handwritten code is often faster.
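
As an illustration of what such handwritten code can look like, here is
a minimal sketch of a field splitter for the file name that started this
thread. It uses only standard POSITION and SUBSEQ; the name SPLIT-FIELDS
and the default delimiter are assumptions made for the example, not part
of any library.

```lisp
;; SPLIT-FIELDS is a hypothetical name chosen for this sketch.
(defun split-fields (string &optional (delimiter #\.))
  "Return a list of the substrings of STRING separated by DELIMITER."
  (loop with start = 0
        ;; END is NIL once no further delimiter is found.
        for end = (position delimiter string :start start)
        ;; SUBSEQ with an END of NIL runs to the end of the string.
        collect (subseq string start end)
        while end
        do (setf start (1+ end))))
```

Called as (split-fields "LASP.P.MVR.KS.TITL.S04752.D120765"), this
should return the list of seven fields. Adjacent delimiters yield empty
strings, which may or may not be what a given data format wants.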

Rainer Joswig

-- 
Rainer Joswig, Hamburg, Germany
Email: ·············@corporate-world.lisp.de
Web: http://corporate-world.lisp.de/
From: Marco Antoniotti
Subject: Re: Q: parsing strings
Date: 
Message-ID: <y6cpul8g2de.fsf@octagon.mrl.nyu.edu>
Dirk Zoller <···@sol-3.de> writes:

> Rahul Jain wrote:
> > > I always felt that the very powerful (format) lacks a counterpart
> > > for input. Like in C there is scanf() which in some sense reverses
> > > the effect of printf().
> > >
> > > Why isn't such a nice function already defined in Common Lisp?
> > 
> > Because of *print-readably* and (read).
> > For most objects, it works quite well.
> 
> And what if I have a file with data not printed by Lisp which I want
> to read in (not read back in)?

That is the crux of the problem.  XML is *a* solution not only for CL
but for the rest of the world as well :)

> By pointing out the analogy printf()/scanf() I didn't mean that
> I just want to read data which I previously wrote (this is particularly
> easy in Lisp, although I found the Lisp reader of at least one system
> pretty slow).
> 
> No, I meant to read *any* formatted text data from the outside world,
> like (format) can very nicely format data to the outside world.
> 
> Don't Lisp programmers do that?

Yes.  And in the most general case you must resort to writing a
complex parser (which may rely on scanf) to handle the quirkiness of
the format.  That is why XML is a step in the right direction.

Cheers

-- 
Marco Antoniotti =============================================================
NYU Bioinformatics Group			 tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                          fax  +1 - 212 - 995 4122
New York, NY 10003, USA				 http://galt.mrl.nyu.edu/valis
             Like DNA, such a language [Lisp] does not go out of style.
			      Paul Graham, ANSI Common Lisp
From: Erik Naggum
Subject: Re: Q: parsing strings
Date: 
Message-ID: <3180205719547890@naggum.net>
* Marco Antoniotti <·······@cs.nyu.edu>
| Yes.  And in the most general case you must resort to writing a
| complex parser (which may rely on scanf) to handle the quirkiness of
| the format.  That is why XML is a step in the right direction.

  Bzzzt.  Just Plain Wrong.  XML does _exactly_ nothing to help this.
  It doesn't even _enable_ something that helps the situation.  XML is
  just syntax for naming elements in a structure.  That structure has
  a view, according to the granularity at which you want to process it.

  In Common Lisp, the Lisp reader increases the granularity to the
  object level.  This is very good.  This is in fact brilliant.  XML
  does no such thing.  XML only names the _strings_ that somehow make
  up the objects, the operative word being "somehow".

  How do you write a date in XML?  I favor <date>2000-10-10</date>.
  Others <date><year>2000</year><month>10</month><day>10</day></date>,
  and yet others prefer to omit the century (nothing learned from
  Y2K), write the date in human-friendly forms, or even using the
  names of the months, abbreviated, in local languages, misspelled.

  You can teach the Common Lisp reader to accept @2000-10-10 as a date
  object, and I have.  It works like a charm.  What does XML offer
  above and beyond specific object notations?  And speaking of those
  "objects", XML is supposedly object-oriented, but in reality, it's
  _only_ character-oriented, as in: no object in sight, as in: those
  who get the named strings (counting both elements and attributes)
  from XML need to parse them for their own contents -- because, and
  this surprises the XML people when the limitation of their bogus
  approach dawns on them, real data is _not_ made up of strings.
  Strings constitute _representations_ of the data, which must be
  parsed, checked for consistency and used to create objects, and if
  this sounds like we're back at square 1, that's exactly the case.
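
  A minimal sketch of how the reader can be taught a notation like
  @2000-10-10 follows.  It is not the code referred to above; the
  names SIMPLE-DATE, READ-AT-DATE, *DATE-READTABLE* and
  READ-DATE-FROM-STRING are assumptions made for illustration, and
  only the standard SET-MACRO-CHARACTER machinery is used.  The macro
  character is installed in a private copy of the readtable so the
  standard syntax is left alone.

```lisp
;; Hypothetical date object for this sketch; a real one would validate ranges.
(defstruct (simple-date (:constructor make-simple-date (year month day)))
  year month day)

(defun read-at-date (stream char)
  "Reader-macro function: parse YYYY-MM-DD following CHAR into a SIMPLE-DATE."
  (declare (ignore char))
  (let ((text (make-string 10)))
    ;; READ-SEQUENCE returns the index of the first element not filled.
    (unless (= (read-sequence text stream) 10)
      (error "Incomplete date after @"))
    (unless (and (char= (char text 4) #\-) (char= (char text 7) #\-))
      (error "Malformed date: ~S" text))
    ;; PARSE-INTEGER signals an error if a field is not all digits.
    (make-simple-date (parse-integer text :start 0 :end 4)
                      (parse-integer text :start 5 :end 7)
                      (parse-integer text :start 8 :end 10))))

(defparameter *date-readtable*
  (let ((table (copy-readtable nil)))   ; copy of the standard readtable
    ;; Non-terminating, so @ inside a symbol name still reads as before.
    (set-macro-character #\@ #'read-at-date t table)
    table))

(defun read-date-from-string (string)
  "Read one object from STRING using the date-aware readtable."
  (let ((*readtable* *date-readtable*)
        (*read-eval* nil))
    (read-from-string string)))
```

  With this in place, (read-date-from-string "@2000-10-10") should
  yield a SIMPLE-DATE object rather than a string that still has to
  be picked apart.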

  XML is a giant step in no direction at all.  If a syntax doesn't
  produce objects that may be manipulated as such, it's worthless.
  In the case of XML, it's a step in the right direction for all the
  hopeless twits who otherwise wouldn't have a job in the booming IT
  industry, for all those H1B visa applicants who would never have a
  chance to get out of their rotten, backward countries, etc, but as
  far as the information is concerned, our ability to read and write
  data consistently and portably, XML offers us exactly _nothing_, but
  carries huge expenses and causes investments and time to be diverted
  from every smarter solution, which could be a competitor... which is
  why such fine information custodians as Microsoft are adopting and
  embracing it.

  This is not to say that XML can't be used productively, but it isn't
  _XML_ that's doing it when it's done, it's the semantics you add to
  the syntax that does it, the objects that wind up in memory in some
  computer somewhere and which there exists code to manipulate.  You
  can do that better, cheaper, faster, and even better standardized
  without XML than with XML.  XML is a truly _magnificent_ waste.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.
From: Marco Antoniotti
Subject: Re: Q: parsing strings
Date: 
Message-ID: <y6chf6jib61.fsf@octagon.mrl.nyu.edu>
Erik Naggum <····@naggum.net> writes:

> * Marco Antoniotti <·······@cs.nyu.edu>
> | Yes.  And in the most general case you must resort to writing a
> | complex parser (which may rely on scanf) to handle the quirkiness of
> | the format.  That is why XML is a step in the right direction.
> 
>   Bzzzt.  Just Plain Wrong.  XML does _exactly_ nothing to help this.
>   It doesn't even _enable_ something that helps the situation.  XML is
>   just syntax for naming elements in a structure.  That structure has
>   a view, according to the granularity at which you want to process
>   it.

Come on!  I did not say that XML is an absolutely good thing.  I was
merely implying that XML is a reinforcement of the Fundamental Law of
Programming Languages

	\lim_{y \rightarrow \mathrm{today} + \epsilon} PL_{y}
        = \mathrm{Common Lisp}_{1989}
          + \mathrm{Type Inference}

(where $y$ is the year and $PL_y$ is any programming language other
than Common Lisp as it is used in year $y$). (I know the TeX may be
wrong! :) )

XML *almost* serves as S-exprs for the rest of the world (namely the
C/C++, Perl and Java world).  The pragmatics of this fact are IMHO
very important. Hype has its importance.

As for the rest of your message, it is right on the money.

Cheers

-- 
Marco Antoniotti =============================================================
NYU Bioinformatics Group			 tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                          fax  +1 - 212 - 995 4122
New York, NY 10003, USA				 http://galt.mrl.nyu.edu/valis
             Like DNA, such a language [Lisp] does not go out of style.
			      Paul Graham, ANSI Common Lisp
From: Marco Antoniotti
Subject: Re: Q: parsing strings
Date: 
Message-ID: <y6csnq4g2ki.fsf@octagon.mrl.nyu.edu>
Dirk Zoller <···@onlinehome.de> writes:

> Hello,
> 
> I always felt that the very powerful (format) lacks a counterpart
> for input. Like in C there is scanf() which in some sense reverses
> the effect of printf().

Your deed is a worthy one.  However, note that CL has READ which makes
scanf pretty much useless in many contexts.

> I then tried to write such a thing. It seems possible. Please find
> attached a first attempt to do it.
> 
> I ran the attached program on a large file with stock-index prices.
> In order to make it fast enough I added declarations and proclamations
> which on CMUCL actually resulted in a very compact and fast machine
> representation. Not too far away from what C's scanf() would do.


See above.

Cheers

-- 
Marco Antoniotti =============================================================
NYU Bioinformatics Group			 tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                          fax  +1 - 212 - 995 4122
New York, NY 10003, USA				 http://galt.mrl.nyu.edu/valis
             Like DNA, such a language [Lisp] does not go out of style.
			      Paul Graham, ANSI Common Lisp
From: Duane Rettig
Subject: Re: Q: parsing strings
Date: 
Message-ID: <4lmvwvj6c.fsf@beta.franz.com>
Dirk Zoller <···@onlinehome.de> writes:

> Hello,
> 
> I always felt that the very powerful (format) lacks a counterpart
> for input. Like in C there is scanf() which in some sense reverses
> the effect of printf().

Perhaps this is a naive question, but how often do you really use
scanf?  Is it really useful to you?

Or perhaps you are fitting simpler problems into scanf solutions:
I found a few uses of scanf in our own code, but most of them are
of the form:

    sscanf( argv[1], "%d", &n );

which is much more simply and efficiently written as a call to atoi()
or atol().

Perhaps read-from-string, followed by a type test, is a simpler
way to solve some of the higher level problems you would like to
solve with a (setf format) function.
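
A minimal sketch of that idea, assuming the input is a single token per
string: READ-TYPED and WHITESPACE-CHAR-P are hypothetical names made up
for this example, while READ-FROM-STRING, TYPEP and *READ-EVAL* are
standard. Note that the reader interns any symbols it encounters, so
this is only a sketch and not a hardened input routine.

```lisp
;; Hypothetical helper: the characters this sketch treats as whitespace.
(defun whitespace-char-p (char)
  (member char '(#\Space #\Tab #\Newline #\Return)))

;; Hypothetical name; returns the object read, or NIL on any mismatch.
(defun read-typed (string type)
  "Read one object from STRING; return it only if it is of TYPE and
nothing but whitespace follows it.  Otherwise return NIL."
  (let ((*read-eval* nil))              ; never evaluate #. in foreign input
    (handler-case
        (multiple-value-bind (object end-index)
            (read-from-string string)
          (and (typep object type)
               ;; Reject trailing junk such as \"42 abc\".
               (null (find-if-not #'whitespace-char-p string
                                  :start end-index))
               object))
      ;; Reader errors and end-of-file both count as \"no match\".
      (error () nil))))
```

For example, (read-typed "42" 'integer) should return 42, while
(read-typed "42 abc" 'integer) and (read-typed "forty-two" 'integer)
should both return NIL.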

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Dirk Zoller
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E3469B.C16D2AFC@sol-3.de>
Duane Rettig wrote:
> > I always felt that the very powerful (format) lacks a counterpart
> > for input. Like in C there is scanf() which in some sense reverses
> > the effect of printf().
> 
> Perhaps this is a naive question, but how often do you really use
> scanf?  Is it really useful to you?

Less often than xprintf(), but for good reasons. I have to deal with
messages formatted by other software and sscanf() is extremely handy
to chop these into the right pieces and dig out the values at the same
time.

I also define simple config file formats with a certain flexibility
which I achieve without massive parsers just by trying to sscanf() a
line in various ways. Without much hassle you get the information:
Is this the line you're expecting, what are the values?
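
For comparison, the same "try the line in several ways" approach can be
sketched in Lisp with nothing but standard string functions. The
`name = value' line format and the name PARSE-SETTING are assumptions
invented for this example, not a description of any real config format.

```lisp
;; Hypothetical config-line parser for lines shaped like "name = value".
(defun parse-setting (line)
  "Try LINE as `name = integer', then as `name = text'.
Return (values kind name value), or NIL if neither shape fits."
  (let ((eq-pos (position #\= line)))
    (when eq-pos
      (let ((name (string-trim " " (subseq line 0 eq-pos)))
            (raw  (string-trim " " (subseq line (1+ eq-pos)))))
        (when (plusp (length name))
          ;; First attempt: the whole value is an integer.
          (let ((number (ignore-errors (parse-integer raw))))
            (if number
                (values :integer name number)
                ;; Fallback: keep the value as text.
                (values :text name raw))))))))
```

The first return value answers "is this the line I was expecting?", and
the remaining two carry the fields that were dug out of it.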

> Or perhaps you are fitting simpler problems into scanf solutions:
> I found a few uses of scanf in our own code, but most of them are
> of the form:
> 
>     sscanf( argv[1], "%d", &n );
> 
> which is much more simply and efficiently written as a call to atoi()
> or atol().

Only if you can live with the uncertainty of whether what you handed
over to atoi() was a number at all. I usually prefer something like

	if (1 != sscanf (line, "%d %n", &value, &length) ||
  	    length != strlen (line))
	  that was no number;

Simple enough, as it also catches cases where I expect a number of
numbers or strings with certain delimiters. That's just great.
(Don't try this with Borland or Microsoft C, their scanf() is broken.)
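
For comparison, a rough Lisp counterpart of this whole-line check is
sketched below. The standard PARSE-INTEGER, called without
:JUNK-ALLOWED, already signals a PARSE-ERROR unless the string is an
integer optionally surrounded by whitespace; PARSE-WHOLE-INTEGER is a
hypothetical wrapper name chosen for the example.

```lisp
;; Hypothetical wrapper: NIL plays the role of "that was no number".
(defun parse-whole-integer (line)
  "Return the integer spelled by LINE, or NIL if LINE is anything else."
  (handler-case (parse-integer line)
    (parse-error () nil)))
```

So (parse-whole-integer " 42 ") should return 42, and
(parse-whole-integer "42abc") should return NIL.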

> Perhaps read-from-string, followed by a type test, is a simpler
> way to solve some of the higher level problems you would like to
> solve with a (setf format) function.

Sounds a little like, hey there's no need for sscanf(), you can
achieve all that with a little char* hackery. You can, but it's not
convenient. In the case of Lisp it is also very inefficient.

(To support that efficiency argument: I sed/awk-ed my input file into
nice Lisp expressions. That took a second. Then I handed it to (read),
that took 20 seconds. Besides having to resort to such a technique for
a simple data input problem is not exactly what I expect from a general
purpose high level language, the performance of (read) when applied to
lots of data was -- at least with my system -- very poor.)

All people pointing to (read) are missing the point. The most grotesque
idea I heard was hacking read macros to do some parsing on input. This 
is probably very exciting, but I'd still prefer if I could achieve
the same with a simple and effective tool which works similarly to
(format).


-- 
Dirk Zoller					Phone:  +49-69-50959861
Sol-3 GmbH&Co KG				Fax:    +49-69-50959859
Niddastraße 98-102				e-mail: ···@sol-3.de
60329 Frankfurt/M
Germany
From: Raymond Laning
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E371A1.5B9DCEE2@west.raytheon.com>
the people responding to your postings evidently never had to deal with
integrating legacy (e.g. paleolithic) systems.  I had to write a
formatted-read function to read output from fortran programs that could
not be maintained because the people that wrote them were no longer
employed (or in some cases, living).  I am sorry that the sourcecode for
my function is no longer available to me else I would pass it along, but
IIRC it had many similarities to yours

Dirk Zoller wrote:
> 
<snip>
> --
> Dirk Zoller                                     Phone:  +49-69-50959861
> Sol-3 GmbH&Co KG                                Fax:    +49-69-50959859
> Niddastraße 98-102                              e-mail: ···@sol-3.de
> 60329 Frankfurt/M
> Germany
From: Tim Bradshaw
Subject: Re: Q: parsing strings
Date: 
Message-ID: <ey3hf6kqusq.fsf@cley.com>
* Raymond Laning wrote:
> the people responding to your postings evidently never had to deal with
> integrating legacy (e.g. paleolithic) systems.  I had to write a
> formatted-read function to read output from fortran programs that could
> not be maintained because the people that wrote them were no longer
> employed (or in some cases, living).  I am sorry that the sourcecode for
> my function is no longer available to me else I would pass it along, but
> IIRC it had many similarities to yours

If I had to do that (and I have done related stuff), I'd tend to do
the grotty massaging in awk or (nowadays) perl or something rather
than spend a whole lot of time doing stuff in Lisp.  This stems from
experiences trying to do similar things with scanf in C and eventually
giving up because it just ended up easier to do the conversion in a
dedicated string-bashing language: scanf is pretty fragile.

In fact if I had to do it again, I'd probably write something which
looked like a formatted-read function but was actually the data-massaging
utility with its stdout piped into a Lisp stream.

Now of course purists will hate me for being willing to use perl as
well as believing that READ isn't always the answer.  Oh well.

--tim
From: Duane Rettig
Subject: Re: Q: parsing strings
Date: 
Message-ID: <4vgv0gybp.fsf@beta.franz.com>
Raymond Laning <········@west.raytheon.com> writes:

> the people responding to your postings evidently never had to deal with
> integrating legacy (e.g. paleolithic) systems.

Why did you assume that?  What evidence do you give?

>  I had to write a
> formatted-read function to read output from fortran programs that could
> not be maintained because the people that wrote them were no longer
> employed (or in some cases, living).  I am sorry that the sourcecode for
> my function is no longer available to me else I would pass it along, but
> IIRC it had many similarities to yours

A couple of questions for you:

 1. Did you ever have personal control over such sources?  Or was
    it owned by the organization you wrote it for?  If it was owned
    by others, did they have a policy of non-propagation of such
    innovations to the outside world?

 2. If it was at all under your control to promulgate the sources,
    did you consider the functionality to be of general-purpose
    use?  Whether true or not, did you seek outside help to further
    generalize it?

 3. If the code was fully general, did you try to pass this code along
    as a potential enhancement to the Common Lisp spec?

 4. If you get to this question without rejecting the other three,
    why did you then not pass it along while you had control of the
    code?

These questions are rhetorical; I do not want the answers to them.
I am simply revealing my thought process for asking questions
of Mr. Zoller, who is in the very beginning stages of a similar
process.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Raymond Laning
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39ED0548.31E89EF@west.raytheon.com>
<snip>
> Why did you assume that?  What evidence do you give?
<snip> 
>  1. Did you ever have personal control over such sources?  Or was
>     it owned by the organization you wrote it for?  If it was owned
>     by others, did they have a policy of non-propagation of such
>     innovations to the outside world?
> 
>  2. If it was at all under your control to promulgate the sources,
>     did you consider the functionality to be of general-purpose
>     use?  Whether true or not, did you seek outside help to further
>     generalize it?
> 
>  3. If the code was fully general, did you try to pass this code along
>     as a potential enhancement to the Common Lisp spec?
> 
>  4. If you get to this question without rejecting the other three,
>     why did you then not pass it along while you had control of the
>     code?
> 
> These questions are rhetorical; I do not want the answers to them.
> I am simply revealing my thought process for asking questions
> of Mr. Zoller, who is in the very beginning stages of a similar
> process.
> 
> --
> Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
> 1995 University Ave Suite 275  Berkeley, CA 94704
> Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)

Although you did not wish answers for these questions, I will provide
them in case they might be useful to Mr. Zoller:

the code was developed for Wisdom Systems, which was a competitor to
ICAD.  In order to deploy our system at one of our clients, it was
necessary to interface to some FORTRAN analysis code that was not
maintained.  While rewriting the analysis code in the Concept Modeller
(an object-oriented, Lisp system) would have been the Right Thing, it
was not an option at the time.  Wisdom Systems struggled to make a
profit on its software, so giving away such ancillary code, while
desirable, was not an option.  In hindsight, I probably should have
pursued the actions you ask about, because the sources are now likely at
the bottom of the Charles River along with the rest of Wisdom Systems'
code, where it went after ICAD bought WS.

And yes, it was general enough to be useful, IMHO.  I'm sure
ICAD/Concentra would gladly part with the code, since I doubt they still
support the Concept Modeller ;-)
From: Erik Naggum
Subject: Re: Q: parsing strings
Date: 
Message-ID: <3180207722329732@naggum.net>
* Raymond Laning <········@west.raytheon.com>
| the people responding to your postings evidently never had to deal
| with integrating legacy (e.g. paleolithic) systems.

  Or that's just what they had, but they did it the right way.

| I had to write a formatted-read function to read output from fortran
| programs that could not be maintained because the people that wrote
| them were no longer employed (or in some cases, living).  I am sorry
| that the sourcecode for my function is no longer available to me
| else I would pass it along, but IIRC it had many similarities to
| yours

  Then there's no wonder you, too, feel that legacy systems are
  painful and that the right solution lies in simple-minded but overly
  powerful tools like regular expressions and simple-minded parsers.

  Actually _understanding_ a legacy data format is not easy, as most
  of the people who write their own data formats are incredibly stupid
  and short-sighted (as in writing years with two digits), and you're
  trying to use all your brainpower to be as dumb as someone who
  didn't have a clue that someday someone would have to think like
  they did, because they didn't think at all.  Clearly, a regular
  expression or something like "scanf" can't hack this -- both are
  rife with the same kind of short-sightedness that produces such
  random results.  Hoping for a match between the outcomes of two
  random processes is just insane.

  Writing an input processor ("reader") for some foreign language or
  data format is not something you do by reversing "format".  Hell,
  you don't _use_ format to produce syntactically correct output in
  other syntaxes, either.  format is meant for _human_ consumption.

  Some day, programmers will understand that there are three ways to
  represent information: computer-to-human, human-to-computer, and
  computer-to-computer; they have exactly _nothing_ in common which
  you can use to deal with another when you have dealt with one.
  Tools that seem to work most of the time (perl), or that promise
  something they cannot possibly deliver (XML), will only delay it.

#:Erik
-- 
  If this is not what you expected, please alter your expectations.
From: Duane Rettig
Subject: Re: Q: parsing strings
Date: 
Message-ID: <4u2akgxoy.fsf@beta.franz.com>
Dirk Zoller <···@sol-3.de> writes:

> Duane Rettig wrote:
> > > I always felt that the very powerful (format) lacks a counterpart
> > > for input. Like in C there is scanf() which in some sense reverses
> > > the effect of printf().
> > 
> > Perhaps this is a naive question, but how often do you really use
> > scanf?  Is it really useful to you?
> 
> Less often than xprintf(), but for good reasons. I have to deal with
> messages formatted by other software and sscanf() is extremely handy
> to chop these into the right pieces and dig out the values at the same
> time.
> 
> I also define simple config file formats with a certain flexibility
> which I achieve without massive parsers just by trying to sscanf() a
> line in various ways. Without much hassle you get the information:
> Is this the line you're expecting, what are the values?

Understood.  It seems, though, like you are not talking about the
inverse of format, but instead the inverse of scanf, in lisp.  And
while it might be possible to provide a tool that does both, they
are really worlds apart (or should I say: languages apart?)
If this description is allegorical, or if you are looking for a tool
that parses both lisp output and C output, then you should state
what you want to accomplish specifically, otherwise I am taking
your example to be literal.

> > Or perhaps you are fitting simpler problems into scanf solutions:
> > I found a few uses of scanf in our own code, but most of them are
> > of the form:
> > 
> >     sscanf( argv[1], "%d", &n );
> > 
> > which is much more simply and efficiently written as a call to atoi()
> > or atol().
> 
> Only if you can live with the uncertainty of whether what you handed
> over to atoi() was a number at all. I usually prefer something like
> 
> 	if (1 != sscanf (line, "%d %n", &value, &length) ||
>   	    length != strlen (line))
> 	  that was no number;
> 
> Simple enough, as it also catches cases where I expect a number of
> numbers or strings with certain delimiters. That's just great.
> (Don't try this with Borland or Microsoft C, their scanf() is broken.)

This is not failsafe because C does not type its data; whether it is
a number or a string or a struct, "it's all bits".
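
For comparison, the whole-string check in the quoted sscanf() idiom can
be sketched in Common Lisp with the standard PARSE-INTEGER. This is only
an illustration: the helper name STRICT-INTEGER is made up here and does
not come from anyone's code in this thread.

```lisp
;; Hypothetical helper (not from the thread): return the integer that
;; STRING spells, or NIL if the string is anything else.
(defun strict-integer (string)
  ;; With :JUNK-ALLOWED T, PARSE-INTEGER returns NIL instead of
  ;; signalling an error, plus the index at which parsing stopped.
  (multiple-value-bind (value end)
      (parse-integer string :junk-allowed t)
    ;; Accept only if a number was found and nothing follows it.
    (and value (= end (length string)) value)))
```

The second return value plays the role of "%n": comparing it with the
length of the string rejects input such as "42abc" or "12.5".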

> > Perhaps read-from-string, followed by a type test, is a simpler
> > way to solve some of the higher level problems you would like to
> > solve with a (setf format) function.
> 
> Sounds a little like, hey there's no need for sscanf(), you can
> achieve all that with a little char* hackery. You can, but it's not
> convenient. In the case of Lisp it is also very inefficient.
> 
> (To support that efficiency argument: I sed/awk-ed my input file into
> nice Lisp expressions. That took a second. Then I handed it to (read),
> that took 20 seconds. Besides having to resort to such a technique for
> a simple data input problem is not exactly what I expect from a general
> purpose high level language, the performance of (read) when applied to
> lots of data was -- at least with my system -- very poor.)

This doesn't really support an efficiency argument, because it is purely
anecdotal.  Please provide real data and/or code with which to reproduce
this.

> All people pointing to (read) are missing the point. The most grotesque
> idea I heard was hacking read macros to do some parsing on input. This 
> is probably very exciting, but I'd still prefer if I could achieve
> the same with a simple and effective tool which works similarly to
> (format).

It is not always correct to use read.  If you are trying to parse C data,
Lisp's reader is not the one to use.

If, on the other hand, you are talking about inter-language communication
in general, then that is a much larger issue.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Dirk Zoller
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E3B6D0.F2D6D15C@onlinehome.de>
Duane Rettig wrote:

> Understood.  It seems, though, like you are not talking about the
> inverse of format, but instead the inverse of scanf, in lisp. 

and more evidence of confusion, like

> This is not failsafe because C does not type its data; whether it is
> a number or a string or a struct, "it's all bits".

Sigh!!


Please folks, stop this ignorant ranting against other languages.
That's the constant habit in all *losing* languages groups.


> This doesn't really support an efficiency argument, because it is purely
> anecdotal.  Please provide real data and/or code with which to reproduce
> this.

Correct. My "anecdote" didn't technically support an argument.
I phrased that badly.

But this experiment I did was good enough to convince me that (read) is
not the answer, not even with massive aid from evil alien tools which
were written in -- shudder -- C.


> It is not always correct to use read.  If you are trying to parse C data,
> Lisp's reader is not the one to use.

There might be "Lisp-data" in the same sense as there are "Dbase data" or
"MS-Word 2000 data".

But there is no such thing as "C-data" out there.

And at first glance, Lisp does not seem to be up to this situation.
Leave it like that. Continue to take pride in either your skills to
work around this huge gap or your ignorance of not seeing it.
But don't complain about people preferring other tools.



-- 
Dirk Zoller				Fon: 06106-876566
Obere Marktstraße 5			e-mail: ···@sol-3.de
63110 Rodgau
From: Duane Rettig
Subject: Re: Q: parsing strings
Date: 
Message-ID: <4em1nev44.fsf@beta.franz.com>
Dirk Zoller <···@onlinehome.de> writes:

> Duane Rettig wrote:
> 
> > Understood.  It seems, though, like you are not talking about the
> > inverse of format, but instead the inverse of scanf, in lisp. 
> 
> and more evidence of confusion, like
> 
> > This is not failsafe because C does not type its data; whether it is
> > a number or a string or a struct, "it's all bits".
> 
> Sigh!!
> 
> 
> Please folks, stop this ignorant ranting against other languages.
> That's the constant habit in all *losing* languages groups.

I now see my mistake.  It was in taking the question you posed
originally:

>> Why isn't such a nice function already defined in Common Lisp?

at face value, and trying to lead you to some reasonable set of
possible answers.  Instead, it is clear that it wasn't an honest
question at all; it was nothing but a troll.

I apologize to you and to the rest of this newsgroup for wasting
everyone's time.

-- 
Duane Rettig          Franz Inc.            http://www.franz.com/ (www)
1995 University Ave Suite 275  Berkeley, CA 94704
Phone: (510) 548-3600; FAX: (510) 548-8253   ·····@Franz.COM (internet)
From: Johannes Beck
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E3714B.A14B44B7@arcormail.de>
Duane Rettig wrote:
> 
> Dirk Zoller <···@onlinehome.de> writes:
> 
> > Hello,
> >
> > I always felt that the very powerful (format) lacks a counterpart
> > for input. Like in C there is scanf() which in some sense reverses
> > the effect of printf().
> 
> Perhaps this is a naive question, but how often do you really use
> scanf?  Is it really useful to you?
> 
> Or perhaps you are fitting simpler problems into scanf solutions:
> I found a few uses of scanf in our own code, but most of them are
> of the form:
> 
>     sscanf( argv[1], "%d", &n );
> 
> which is much more simply and efficiently written as a call to atoi()
> or atol().
> 
> Perhaps read-from-string, followed by a type test, is a simpler
> way to solve some of the higher level problems you would like to
> solve with a (setf format) function.

Over the years I have come to the point of absolutely avoiding
read-from-string to parse string data from the outside world (e.g. user
interface, text files, sockets). You always forget to put an
error handler around it, or to configure the reader so that it is safe
to use read-from-string. And afterwards you have to check that the type
of the result is what you expected.

If there's a chance that some wrong input will cause an error in your
program, someone will supply that wrong input (imagine how strings like
"(foo" or "#.bla" are treated by read-from-string).

So there's definitely a need for good and stable string parsing
functions besides the reader.
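
To make the precautions listed above concrete (an error handler, a tamed
reader, a type check on the result), here is a rough sketch using only
standard Common Lisp. The name READ-TYPED is invented for this example,
and it is not meant as a replacement for a real parser.

```lisp
;; Hypothetical helper (not from the thread): read one Lisp object from
;; untrusted text and return it only if it has the expected type.
(defun read-typed (string type)
  (handler-case
      (with-standard-io-syntax
        ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T, so turn it
        ;; off to make "#." signal a READER-ERROR instead of evaluating.
        (let ((*read-eval* nil))
          (let ((object (read-from-string string)))
            (and (typep object type) object))))
    ;; Covers END-OF-FILE for "(foo" and READER-ERROR for "#.bla".
    (error () nil)))
```

Even this much leaves gaps: trailing text after the first object is
ignored, symbols in the input are still interned, and a NIL result
cannot be told apart from a failure.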

Bye
	Joe

--
Johannes Beck   http://home.arcor-online.de/johannes.beck/
From: Rainer Joswig
Subject: Re: Q: parsing strings
Date: 
Message-ID: <joswig-4B64DC.21371711102000@news.is-europe.net>
In article <·················@arcormail.de>, Johannes Beck 
<·············@arcormail.de> wrote:

> So there's definitely a need for a good and stable string parsing
> functions besides the reader.

There are simple options like the META parser from
Henry Baker.


In a more interesting world we all might use CLIM:

http://www.xanalys.com/software_tools/reference/lwu41/climuser/GUID_105.HTM
http://www.xanalys.com/software_tools/reference/lwu41/climuser/GUID_106.HTM

Short explanation:

 There are basic presentation types like INTEGER or STRING.
 Presentation types can take PARAMETERS (like min-value and max-value) and OPTIONS
   (like the base of the integer to be read).
 There are more complex presentation types like SEQUENCE, OR, AND, ...
 Presentation types form a hierarchy (like in CLOS).
 One can ACCEPT and PRESENT objects. Special methods can
  define own parsers/presenters for certain presentation types.
 ACCEPTING from strings is possible via ACCEPT-FROM-STRING:

   CLIM:ACCEPT-FROM-STRING type string
                           &key view default default-type activation-gestures
                                additional-activation-gestures delimiter-gestures
                                additional-delimiter-gestures start end
  
   Like ACCEPT, except that the input is taken from string, starting at
   the position specified by start and ending at end. view, default, and
   default-type are as for accept. 
  
   ACCEPT-FROM-STRING returns an object and a presentation type (as in ACCEPT),
   but also returns a third value, the index at which input terminated. 



Parsing an integer between 0 and 1000 with base 16:

? (clim:accept-from-string '((clim:integer 0 1000) :base 16)
                           "FF")
255
((INTEGER 0 1000) :BASE 16)
2


One of FOO or BAR:

? (clim:accept-from-string '(clim:member-sequence (foo bar))
                           "foo")
FOO
((CLIM:COMPLETION (FOO BAR) :TEST EQL) :HIGHLIGHTER #<Compiled-function CLIM-INTERNALS::HIGHLIGHT-COMPLETION-CHOICE #xED72B6E> :PRINTER #<Compiled-function CLIM:WRITE-TOKEN #xEC47BAE>)
3


Some of FOO, BAR and BAZ.

? (clim:accept-from-string '(clim:subset-sequence (foo bar baz))
                           "foo,bar")
(FOO BAR)
((CLIM:SUBSET-COMPLETION (FOO BAR BAZ) :TEST EQL) :HIGHLIGHTER #<Compiled-function CLIM-INTERNALS::HIGHLIGHT-COMPLETION-CHOICE #xED72B6E> :PRINTER #<Compiled-function CLIM:WRITE-TOKEN #xEC47BAE>)
7


Yes or No:

? (clim:accept-from-string 'clim:boolean "Yes")
T
CLIM:BOOLEAN
3


A keyword:

? (clim:accept-from-string 'clim:keyword ":foo")
:FOO
KEYWORD
4



A float between 0.3 and 0.5:

? (clim:accept-from-string '(clim:float 0.3 0.5) "0.4")
0.4
(FLOAT 0.3 0.5)
3


A sequence of floats separated by comma:

? (clim:accept-from-string '(clim:sequence clim:float) "0.4,0.3,0.4")
(0.4 0.3 0.4)
(SEQUENCE FLOAT)
11



A boolean, a string and an integer separated by comma:

? (clim:accept-from-string
   '(clim:sequence-enumerated clim:boolean clim:string (clim:integer 0 100))
   "Yes,Rainer,36")
(T "Rainer" 36)
(CLIM:SEQUENCE-ENUMERATED CLIM:BOOLEAN STRING (INTEGER 0 100))
13


A boolean or an integer:

? (clim:accept-from-string
   '(clim:or clim:boolean (clim:integer 0 100))
   "Yes")
T
(OR CLIM:BOOLEAN (INTEGER 0 100))
3


A boolean, a string and an integer separated by space:

? (clim:accept-from-string
   '((clim:sequence-enumerated clim:boolean clim:string (clim:integer 0 100))
     :separator #\space)
   "Yes Rainer 36")
(T "Rainer" 36)
((CLIM:SEQUENCE-ENUMERATED CLIM:BOOLEAN STRING (INTEGER 0 100)) :SEPARATOR #\Space)
13

Make it shorter by defining an abbreviation MY-RECORD-ROW:

? (clim:define-presentation-type-abbreviation my-record-row ()
  '((clim:sequence-enumerated clim:boolean clim:string (clim:integer 0 100))
    :separator #\space))
MY-RECORD-ROW

? (clim:accept-from-string 'my-record-row "Yes Rainer 36")
(T "Rainer" 36)
((CLIM:SEQUENCE-ENUMERATED CLIM:BOOLEAN STRING (INTEGER 0 100)) :SEPARATOR #\Space)
13



It is especially interesting, because you can define new presentation types, which
inherit from others (like "age" could inherit from integer with a specified range).
Sure you can define presentations based on CLOS classes, etc...


See also CLIM:PROMPT-FOR-ACCEPT-1 :

? (clim:prompt-for-accept-1 *standard-output*
                            '((clim:sequence-enumerated
                               clim:boolean clim:string (clim:integer 0 100))
                              :separator #\space))
Enter a boolean, a string, and an integer between 0 and 100:

-- 
Rainer Joswig, Hamburg, Germany
Email: ·············@corporate-world.lisp.de
Web: http://corporate-world.lisp.de/
From: Paolo Amoroso
Subject: Re: Q: parsing strings
Date: 
Message-ID: <jdzlOfZHYaCsweQwumtTOr1ObZrw@4ax.com>
On Wed, 11 Oct 2000 21:37:17 +0200, Rainer Joswig
<······@corporate-world.lisp.de> wrote:

> In a more interesting world we all might use CLIM:

Just a quick reminder that an effort is under way to develop a free
implementation of CLIM:

  http://www.mikemac.com/mikemac/McCLIM/index.html

Mailing list:

  http://www2.cons.org:8000/mailman/listinfo/free-clim

Mailing list archives:

  http://www2.cons.org:8000/pipermail/free-clim/ (new)
  http://www3.cons.org/maillists/free-clim (old)


Paolo
-- 
EncyCMUCLopedia * Extensive collection of CMU Common Lisp documentation
http://cvs2.cons.org:8000/cmucl/doc/EncyCMUCLopedia/
From: The Glauber
Subject: Re: Q: parsing strings
Date: 
Message-ID: <8rvveo$ont$1@nnrp1.deja.com>
In article <·················@onlinehome.de>,
  ···@onlinehome.de, ···@sol-3.de wrote:
> Hello,
>
> I always felt that the very powerful (format) lacks a counterpart
> for input. Like in C there is scanf() which in some sense reverses
> the effect of printf().
>
> I then tried to write such a thing. It seems possible. Please find
> attached a first attempt to do it.
[...]


Dirk,

this file doesn't load. According to CLISP:
CL-USER[7]> (load "scan.lsp")

;; Loading file scan.lsp ... *** - READ: input stream #<BUFFERED FILE-STREAM
CHARACTER #P"scan.lsp" @262> ends within an object. Last opening parenthesis
probably in line 200.


--
Glauber Ribeiro
··········@my-deja.com    http://www.myvehiclehistoryreport.com
"Opinions stated are my own and not representative of Experian"


From: Dirk Zoller
Subject: Re: Q: parsing strings
Date: 
Message-ID: <39E388DE.A8BEDF0C@onlinehome.de>
This is a multi-part message in MIME format.
--------------0CA95B18C2E6F87C0ADD7AD3
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

The Glauber wrote:

> this file doesn't load. According to CLISP:
> CL-USER[7]> (load "scan.lsp")
> 
> ;; Loading file scan.lsp ... *** - READ: input stream #<BUFFERED FILE-STREAM
> CHARACTER #P"scan.lsp" @262> ends within an object. Last opening parenthesis
> probably in line 200.

Yes I accidentally posted some old garbage file.

Please find attached a version I just checked to be working with clisp.

Anyway I must warn you that this file is in the midst of being reworked
from scanning strings to scanning streams. Most functions still scan strings
and are not actually being called. Only a simple integer conversion has
already been transformed into the stream-scanning style and is called in place
of all other conversions.

Given that, the idea should come across anyway. I'm not sure which is better
(scanning strings vs. scanning streams). Maybe that will be a performance/versatility
tradeoff. Strings can be treated as streams but not vice versa. OTOH scanning
strings might perform better. You see, I'm experimenting.
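
A small sketch of the "strings can be treated as streams" point, using
the standard WITH-INPUT-FROM-STRING. The function names are made up for
this illustration and are not part of scan.lisp.

```lisp
;; Hypothetical helper (not part of scan.lisp): collect the runs of
;; decimal digits found on STREAM as a list of integers.
(defun read-integers (stream)
  (let ((numbers '())
        (current nil))                  ; NIL while between numbers
    (loop for c = (read-char stream nil nil)
          while c
          do (let ((d (digit-char-p c)))
               (cond (d (setf current (+ d (* 10 (or current 0)))))
                     (current (push current numbers)
                              (setf current nil)))))
    ;; Flush a number that runs up to the end of the input.
    (when current (push current numbers))
    (nreverse numbers)))

;; The same function serves a string once it is wrapped in a stream.
(defun read-integers-from-string (string)
  (with-input-from-string (stream string)
    (read-integers stream)))
```

Writing the scanner against a stream and wrapping strings on the way in
keeps one code path; whether that costs noticeable time compared with
indexing the string directly is exactly the open question.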

I'll be happy to send this test file "daxa.asc" by mail. It's too big to post.

Regards
Dirk



-- 
Dirk Zoller				Fon: 06106-876566
Obere Marktstraße 5			e-mail: ···@sol-3.de
63110 Rodgau
--------------0CA95B18C2E6F87C0ADD7AD3
Content-Type: text/plain; charset=us-ascii;
 name="scan.lisp"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="scan.lisp"

;;; $Id$
;;;
;;; C-scanf() like (format) analogon for input from strings.
;;;


(proclaim '(optimize (speed 3) (safety 0) (space 0)))

(eval-when (:load-toplevel
	    :compile-toplevel
	    :execute)

(defmacro with-type (type expr)
  `(the ,type ,(if (atom expr)
                   expr
                   (expand-call type (binarize expr)))))

(defun expand-call (type expr)
  `(,(car expr) ,@(mapcar #'(lambda (a)
                              `(with-type ,type ,a))
                          (cdr expr))))

(defun binarize (expr)
  (if (and (nthcdr 3 expr)
           (member (car expr) '(+ - * /)))
      (destructuring-bind (op a1 a2 . rest) expr
        (binarize `(,op (,op ,a1 ,a2) ,@rest)))
      expr))
)


(defparameter whitespace-characters

  #(#\Space #\Newline #\Tab
    #\Linefeed #\Return #\Page
    #\Backspace #\Rubout)		; questionable

  "Vector of all space-like characters like tab and newline.")


(defun whitespace-char-p (c)

  "Returns non-nil iff c is a non visible character like space or newline."

  (position c whitespace-characters))


(defun skip-charset (charset str &key (start 0) (end nil))

  "Returns the position of the first character in str which is not
element of charset. When end is specified then it must not be greater
than the length of the string. Returns end or length of str when str
consists entirely of elements of charset."

  (or (position-if-not #'(lambda (c) (position c charset))
                       str :start start :end end)
      end
      (length str)))

(defun scan-charset (charset str &key (start 0) (end nil))
  (or (position-if #'(lambda (c) (position c charset))
		   str :start start :end end)
      end))


(defun isolate-word (str &key (start 0) (end nil))

  "Reads a whitespace delimited word from the string."

  (declare (string str))
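  (let* ((len (let ((l (length str)))
                (if end (min l end) l)))
         ;; i is nil when only whitespace remains.
         (i (position-if-not #'whitespace-char-p str
                             :start start :end len))
         ;; j is the end of the word; use len when the word runs up
         ;; to the end of the scanned region.
         (j (and i (or (position-if #'whitespace-char-p str
                                    :start i :end len)
                       len))))
    (if i
        (values (subseq str i j) j)
        (values nil len))))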


(defun string-to-unsigned-fixnum (str &key (start 0) (end nil) (radix 10))

  "Converts part of a string str to an unsigned fixnum.

Starts converting at position start, stops at end when specified or at
end of string or when a non-convertible character is seen. If end is
specified then it must not be greater than the length of the string.

Assumes the specified radix and does neither recognize a radix
encoding in the string nor a sign nor even leading whitespace.

Returns the converted number and the position of the first character
in the string behind that number. Returns 0 when no convertible
characters were seen at all."

  (declare (simple-string str)
           (fixnum start radix))
  (let ((len (or end (length str))))
    (declare (fixnum len))
    (do ((i start (1+ i))
         (n 0))
        ((= i len)
         (values n i))
      (declare (fixnum i n))
      (let ((d (digit-char-p (schar str i) radix)))
        (if d
            (setq n (with-type fixnum (+ d (* n radix))))
            (return (values n i)))))))


(defun string-to-unsigned-integer (str &key (start 0) (end nil) (radix 10))

  "Converts part of a string str to an integer.

Starts converting at position start, stops at end when specified or at
end of string or when a non-convertible character is seen. If end is
specified then it must not be greater than the length of the string.

Assumes the specified radix and does neither recognize a radix
encoding in the string nor a sign nor even leading whitespace.

Returns the converted number and the position of the first character
in the string behind that number. Returns nil when no convertible
characters were seen at all."

  ;; end may be nil, so it must not be declared fixnum.
  (declare (simple-string str)
           (fixnum start radix))
  (let ((len (or end (length str))))
    (do ((i start (1+ i))
         (n 0))
        ((= i len)
         (values n len))
      (let ((d (digit-char-p (schar str i) radix)))
        (if d
            (setq n (with-type integer (+ d (* n radix))))
            (return (if (> i start)
			(values n i))))))))


(defun string-to-integer (str &key (start 0) (end nil) (radix 10))

  "Converts part of a simple-string str to an integer.

Skips leading whitespace and recognizes #\+ and #\- as signs.

Converts up to end (if specified) or end of string or until a
non-convertible character is seen, whichever comes first.

Uses the specified radix as number conversion base. Alas the specified
radix may be overridden from the input in the string using a syntax of
#x, #o, #b, #nnr for hex, octal, binary or other.

Returns nil on error or the resulting integer and the position of the
first character behind the integer."

  (declare (simple-string str)
	   (fixnum start radix))
  (block nil
    (let* ((len (let ((l (length str)))
		  (if end (min l end) l)))
	   ;; Fall back to len when only whitespace remains.
	   (i (or (position-if-not #'whitespace-char-p str
				   :start start :end len)
		  len))
	   (negative nil)
	   (n nil))
      (declare (fixnum len i))
      (when (= i len)
	(return nil))
      (case (schar str i)
	(#\- (setq negative t)
	     (incf i)
	     (when (= i len)
	       (return nil)))
	(#\+ (incf i)
	     (when (= i len)
	       (return nil))))
      (when (char= #\# (schar str i))
	(incf i)
	(if (= i len)
	    (return nil))
	(case (schar str i)
	  (#\b (incf i) (setf radix #b10))
	  (#\o (incf i) (setf radix #o10))
	  (#\x (incf i) (setf radix #x10))
	  (t (multiple-value-setq (radix i)
	       (string-to-unsigned-fixnum str :start i :end end))
	     (when (or (not radix)
		       (< radix 2)
		       (= i len)
		       (char-not-equal (schar str i) #\r))
	       (return nil))
	     (incf i))))
      (multiple-value-setq (n i)
	(string-to-unsigned-integer str :start i :end end :radix radix))
      (when (not n)
	(return nil))
      (values (if negative (- n) n) i))))


(defun string-to-float (str &key (start 0) (end nil))

  "(str &key (start 0) (end nil))

Read a double float from the string str. Start reading at start, stop
at end or at end of string or when the first non-convertible character
is seen. Assumes decimal. Returns the converted number and the
position of the first character in str following the converted
number. Returns nil on error."

  (declare (simple-string str)
	   (fixnum start))
  (block nil
    (let* ((len (let ((l (length str)))
		  (if end (min l end) l)))
	   ;; Fall back to len when only whitespace remains.
	   (i (or (position-if-not #'whitespace-char-p str
				   :start start :end len)
		  len))
	   negative
	   entier
	   fractional
	   fractional-digits
	   exponent
	   exponent-negative
	   result)
      (declare (fixnum len i))

      ;; Find sign.
      (when (= i len)
	(return nil))
      (case (schar str i)
	(#\- (setq negative t)
	     (incf i)
	     (when (= i len)
	       (return nil)))
	(#\+ (incf i)
	     (when (= i len)
	       (return nil))))

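      ;; Find integer part.  string-to-unsigned-integer returns nil for
      ;; both values when no digit is seen, so keep i in that case.
      (multiple-value-bind (value next)
          (string-to-unsigned-integer str :start i :end end)
        (when value
          (setq entier value
                i next)))

      ;; Find fractional part.  fractional-digits is the negated digit
      ;; count, used below as a power of ten.
      (when (and (< i len)
                 (char= (schar str i) #\.))
        (incf i)
        (multiple-value-bind (value next)
            (string-to-unsigned-integer str :start i :end end)
          (when (and value (> next i))
            (setq fractional value
                  fractional-digits (- i next)
                  i next))))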

      ;; Ignore missing entier or fractional part but not both.
      (if entier
	  (unless fractional
	    (setq fractional 0))
	  (if fractional
	      (setq entier 0)
	      (return nil)))

      ;; Find exponent.
      (when (and (< i len)
		 (char-equal (schar str i) #\e))
	(incf i)
	(when (= i len)
	  (return nil))
	;; Find sign of exponent.
	(case (schar str i)
	  (#\- (setq exponent-negative t)
	       (incf i)
	       (when (= i len)
		 (return nil)))
	  (#\+ (incf i)
	       (when (= i len)
		 (return nil))))
	;; Find magnitude of exponent.
	(multiple-value-setq (exponent i)
	  (string-to-unsigned-integer str :start i :end end))
	;; Check exponent, apply sign.
	(when (not exponent)
	  (return nil))
	(when exponent-negative
	  (setq exponent (- exponent))))

      ;; Assemble result.
      (setq result (float entier 0d0))
      (when fractional-digits
	(incf result (* fractional (expt 1d1 fractional-digits))))
      (when exponent
	(setq result (* result (expt 10 exponent))))
      (values (if negative (- result) result) i))))

(defun stream-parse-integer (strm &optional (base 10))
  (do* ((n 0
	   (+ (* base n) d))
	(c (read-char strm nil nil)
	   (read-char strm nil nil))
	(d (and c (digit-char-p c base))
	   (and c (digit-char-p c base))))
       ((not d)
	(if c (unread-char c strm))
	n)))

(defun scan-strm-strm (form strm)
  (let (result)
    (labels
        ((scan-character (width)
           (push (read-char strm) result))
         (scan-string (width)
           (push (read-char strm) result))
         (scan-integer (width)
           (push (stream-parse-integer strm) result))
         (scan-float (width)
           (push (stream-parse-integer strm) result))
         (dispatch (c)
           (case c
             (#\~
              (let ((width (stream-parse-integer form))
                    (form-char (read-char form nil nil)))
                (case form-char
                  (#\c (scan-character width))
                  (#\s (scan-string width))
                  (#\d (scan-integer width))
                  (#\f (scan-float width))
                  (otherwise nil))))
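             (otherwise
              ;; Literal character: must match the next input character.
              ;; Return nil (no match) at end of input instead of
              ;; signalling an error.
              (let ((d (read-char strm nil nil)))
                (eql c d))))))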
      (do* ((c (read-char form nil nil)
               (read-char form nil nil)))
           ((or (not c)
                (not (dispatch c)))
            (if c
                ;; EOF in form not hit, means error, return nil.
                nil
                ;; EOF in form, means all formats processed.
                ;; Return reversed result list.
                (values-list (nreverse result))))))))

(defun scan-string-strm (form strm)
  (with-input-from-string (form-strm form)
    (scan-strm-strm form-strm strm)))

(defun scan-string-string (form string)
  (with-input-from-string (strm string)
    (scan-string-strm form strm)))

(defun read-daxa ()
  (with-open-file (daxa "daxa.asc"
                        :direction :input)
    (do ((line (read-line daxa nil nil)
               (read-line daxa nil nil))
         (result nil
                 (cons (multiple-value-list
                           (scan-string-string
                            "~2d~2d~2d,~3d.~3d,~3d.~3d,~3d.~3d,~3d.~3d"
                            line))
                       result)))
        ((not line) result))))

--------------0CA95B18C2E6F87C0ADD7AD3--
From: The Glauber
Subject: Re: Q: parsing strings
Date: 
Message-ID: <8s1r41$7a0$1@nnrp1.deja.com>
In article <·················@onlinehome.de>,
  ···@onlinehome.de, ···@sol-3.de wrote:
[...]
> Yes I accidentally posted some old garbage file.
>
> Please find attached a version I just checked to be working with clisp.
[...]

Thanks, i'll give it a try.

Incidentally, there's a library of Scheme functions called SLIB
(http://swissnet.ai.mit.edu/~jaffer/SLIB.html)
which includes scanf, and even (horror!) printf.
Someone with a lot of free time might try to translate that to Lisp.


--
Glauber Ribeiro
··········@my-deja.com    http://www.myvehiclehistoryreport.com
"Opinions stated are my own and not representative of Experian"


From: Marco Antoniotti
Subject: Re: Q: parsing strings
Date: 
Message-ID: <y6cem1ni8qu.fsf@octagon.mrl.nyu.edu>
The Glauber <··········@my-deja.com> writes:

> In article <·················@onlinehome.de>,
>   ···@onlinehome.de, ···@sol-3.de wrote:
> [...]
> > Yes I accidentally posted some old garbage file.
> >
> > Please find attached a version I just checked to be working with clisp.
> [...]
> 
> Thanks, i'll give it a try.
> 
> Incidentally, there's a library of Scheme functions called SLIB
> (http://swissnet.ai.mit.edu/~jaffer/SLIB.html)
> which includes scanf, and even (horror!) printf.
> Someone with a lot of free time might try to translate that to Lisp.

Why? :)

-- 
Marco Antoniotti =============================================================
NYU Bioinformatics Group			 tel. +1 - 212 - 998 3488
719 Broadway 12th Floor                          fax  +1 - 212 - 995 4122
New York, NY 10003, USA				 http://galt.mrl.nyu.edu/valis
             Like DNA, such a language [Lisp] does not go out of style.
			      Paul Graham, ANSI Common Lisp