From: Frank Buss
Subject: parsing a string
Date: 
Message-ID: <cc6g9o$4mv$1@newsreader2.netcologne.de>
I tried to write a parser for a very simple syntax. The program but I 
think it's too complicated and too long. Some other questions:

- how should I save the allowed keys? Later I want to save the values in 
a structure with the same keys, but I don't want to write them twice

- the parsing algorithm is bad, because it searches every key twice

- the "find" function for searching the allowed key list could be slow, a 
hashmap would be better, but a tree with the allowed keys, in which the 
node values are chars of the key, would be the fastest solution, but then 
the parsing algorithm must be changed

Here is the code. Feel free to criticize for helping me to become a 
better Lisp programmer.

(defun $-pos (string start )
  (position #\$ string :start start))

; returns the next valid key and the position, including the $, or nil
; example: (next-key "$id$test$next$rest" '("test")) -> 3, "test"
(defun next-key (string allowed-keys &optional (start 0))
  (let ((first ($-pos string start)))
    (if first
        (let ((second ($-pos string (1+ first))))
          (if second
              (let ((key (subseq string (1+ first) second)))
                (if (find key allowed-keys :test #'equal)
                    (values first key)
                    (next-key string allowed-keys (1+ first))))
              nil))
        nil)))

; parses a string to a list of pairs: symbol . string, starting with the
; first $ and ignoring all $ for not allowed keys
; example:
; (parse "test$foo$123$name$empty$$last$bar" '("foo" "empty" "last"))
;   -> (("foo" . "123$name") ("empty" . "") ("last" . "bar"))
(defun parse (string allowed-keys &optional (start 0))
  (multiple-value-bind (key-pos key) 
      (next-key string allowed-keys start)
    (if key-pos
        (let* ((value-pos (+ key-pos 2 (length key)))
               (next-key-pos (next-key string allowed-keys value-pos)))
          (if next-key-pos
              (append
               (list (cons key (subseq string value-pos next-key-pos)))
               (parse string allowed-keys next-key-pos))
              (list (cons key (subseq string value-pos)))))
        ())))
      
(defun test ()
  (parse "test$foo$123$name$empty$$last$bar" '("foo" "empty" "last")))

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de

From: Pascal Bourguignon
Subject: Re: parsing a string
Date: 
Message-ID: <87wu1l0zgp.fsf@thalassa.informatimago.com>
Frank Buss <··@frank-buss.de> writes:
> ; (parse "test$foo$123$name$empty$$last$bar" '("foo" "empty" "last"))
> ;   -> (("foo" . "123$name") ("empty" . "") ("last" . "bar"))

(defun parse (string keys &optional (start 0))
  (let ((parsed-list (split-sequence (character "$") string :start start)))
    (mapcar (lambda (key) (cons key (cadr (member key parsed-list
                                             :test (function string=)))))
            keys)))


(parse "test$foo$123$name$empty$$last$bar" '("foo" "empty" "last"))
--> (("foo" . "123") ("empty" . "") ("last" . "bar"))

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

There is no worse tyranny than to force a man to pay for what he does not
want merely because you think it would be good for him. -- Robert Heinlein
From: Frank Buss
Subject: Re: parsing a string
Date: 
Message-ID: <cc711i$dk7$2@newsreader2.netcologne.de>
Pascal Bourguignon <····@thalassa.informatimago.com> wrote:

> (defun parse (string keys &optional (start 0))
>   (let ((parsed-list (split-sequence (character "$") string :start
>   start))) 
>     (mapcar (lambda (key) (cons key (cadr (member key parsed-list
>                                              :test (function
>                               string=))))) 
>             keys)))

Yes, something with "split" and "map" was my first idea, too, but this 
doesn't work if "$foo$" can be part of the values, when "foo" is not in the 
keys list.

> (parse "test$foo$123$name$empty$$last$bar" '("foo" "empty" "last"))
> --> (("foo" . "123") ("empty" . "") ("last" . "bar"))

Should be (("foo" . "123$name") ("empty" . "") ("last" . "bar"))

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Wade Humeniuk
Subject: Re: parsing a string
Date: 
Message-ID: <a9KFc.101662$E84.35907@edtnps89>
Here is one possible way

(defun parse ($string tokens)
   (loop with start = 0
         for (first next) on tokens
         for start1 = (+ (search first $string :test #'string= :start2 start)
                         (length first) 1)
         for end1 = (when next (search next $string :test #'string= :start2 start1))
         collect (prog1 (cons first (subseq $string start1 (when end1 (1- end1))))
                   (setf start end1))))

CL-USER 12 > (parse "test$foo$123$name$empty$$last$bar" '("foo" "empty" "last"))
(("foo" . "123$name") ("empty" . "") ("last" . "bar"))

My first solution was:

;; The string "test$foo$123$name$empty$$last$bar" is
;; equivalent to the list ("test" "foo" "123" "name" "empty" "" "last" "bar")


(defun split$string (string)
   (loop with string-end = (1- (length string))
         with start = 0
         for end from 0
         for c across string
         when (char= #\$ c) collect (prog1 (subseq string start end)
                                      (setf start (1+ end)))
         when (= end string-end) collect (subseq string start (1+ end))))

(defun sub$seq ($seq string1 &optional string2)
   (let ((start (position string1 $seq :test #'string=))
         (end (when string2 (position string2 $seq :test #'string=))))
     (assert (and start (or (null string2) end)))
     (subseq $seq (1+ start) end)))


(defun parse ($string tokens)
   (loop with split = (split$string $string)
         for (first next) on tokens
         collect (cons first (sub$seq split first next))))

CL-USER 13 > (parse "test$foo$123$name$empty$$last$bar" '("foo" "empty" "last"))
(("foo" "123" "name") ("empty" "") ("last" "bar"))

CL-USER 14 >

This solution is in my mind more lispy.  However it does not meet your
output requirements.

Wade
From: Frank Buss
Subject: Re: parsing a string
Date: 
Message-ID: <ccl9o7$538$1@newsreader2.netcologne.de>
Wade Humeniuk <········@telus.delete.net> wrote:

> Here is one possible way
> 
> (defun parse ($string tokens)
>    (loop with start = 0
>          for (first next) on tokens
>          for start1 = (+ (search first $string :test #'string= :start2
>          start) 
>                          (length first) 1)
>          for end1 = (when next (search next $string :test #'string=
>          :start2 start1)) collect (prog1 (cons first (subseq $string
>          start1 (when end1 (1- end1)))) 
>                    (setf start end1))))

Thanks, this looks interesting, the "for-on"-form is really useful. But 
it didn't work for my application, because a token can occur more than 
one time. 

But I can change my format. If I use a Lisp-like string:

(("key1" . "value1") ("key2" . "value2") ("key1" . "but another value") 
("empty" . "") ("key3" . "foo"))

it is really simple to parse it, using read-from-string, but how can I 
catch an error? I've tried this:

(handler-bind ((error #'(lambda (c) nil))) (read-from-string "("))

but it shows the debugger instead of returning nil.

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Rob Warnock
Subject: Re: parsing a string
Date: 
Message-ID: <LPydndjQSpOYyXPdRVn-hQ@speakeasy.net>
Frank Buss  <··@frank-buss.de> wrote:
+---------------
| But I can change my format. If I use a Lisp-like string:
|   (("key1" . "value1") ("key2" . "value2") ("key1" . "but another value") 
|   ("empty" . "") ("key3" . "foo"))
| it is really simple to parse it, using read-from-string, but how can I 
| catch an error? I've tried this:
|   (handler-bind ((error #'(lambda (c) nil))) (read-from-string "("))
| but it shows the debugger instead of returning nil.
+---------------

For something that simple, just use HANDLER-CASE:

    > (handler-case
	  (read-from-string "(")
	(error (c)
	  (declare (ignore c))
	  nil))

    NIL
    > 

or even just IGNORE-ERRORS:

    > (ignore-errors (read-from-string "("))

    NIL
    #<END-OF-FILE {484EE365}>		; 2nd value is condition object
    >


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Frank Buss
Subject: Re: parsing a string
Date: 
Message-ID: <ccmv2a$96k$1@newsreader2.netcologne.de>
····@rpw3.org (Rob Warnock) wrote:

> or even just IGNORE-ERRORS:
> 
>     > (ignore-errors (read-from-string "("))
> 
>     NIL
>     #<END-OF-FILE {484EE365}>          ; 2nd value is condition object
>     >

Yes, this is something like the try-catch I know from Java. Looks like Lisp 
has more powerful constructs, for example if I want to continue where the 
error occurs without stack-unwinding. A very good description is this 
article:

http://www.nhplace.com/kent/Papers/Condition-Handling-2001.html

But currently I don't see useful applications of this concept, perhaps 
because of too much Java and C++ programming.

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Peter Seibel
Subject: Re: parsing a string
Date: 
Message-ID: <m3d634pynf.fsf@javamonkey.com>
Frank Buss <··@frank-buss.de> writes:

> ····@rpw3.org (Rob Warnock) wrote:
>
>> or even just IGNORE-ERRORS:
>> 
>>     > (ignore-errors (read-from-string "("))
>> 
>>     NIL
>>     #<END-OF-FILE {484EE365}>          ; 2nd value is condition object
>>     >
>
> Yes, this is something like the try-catch I know from Java. Looks like Lisp 
> has more powerful constructs, for example if I want to continue where the 
> error occurs without stack-unwinding. A very good description is this 
> article:
>
> http://www.nhplace.com/kent/Papers/Condition-Handling-2001.html
>
> But currently I don't see useful applications of this concept, perhaps 
> because of too much Java and C++ programming.

I'd be interested to know if this chapter from my upcoming book on
Common Lisp helps you:

  <http://www.gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html>

My target audience is folks like you experienced programmers coming to
Lisp from some other languages, in particular Java or Python so any
comments you have about this did or did not clear things up for you
would be useful for me.

-Peter

-- 
Peter Seibel                                      ·····@javamonkey.com

         Lisp is the red pill. -- John Fraser, comp.lang.lisp
From: Frank Buss
Subject: Re: parsing a string
Date: 
Message-ID: <ccogvg$n6l$2@newsreader2.netcologne.de>
Peter Seibel <·····@javamonkey.com> wrote:

> I'd be interested to know if this chapter from my upcoming book on
> Common Lisp helps you:
> 
>   <http://www.gigamonkeys.com/book/beyond-exception-handling-conditions
>   -and-restarts.html> 

Yes, this was very helpful. I think now I understand the power of restarts, 
thanks to your logging example, and can use it in my own programs, if 
necessary. 

Perhaps you can improve the chapter with another example with a standard CL 
function, like catching (/ 1 0).

-- 
Frank Bu�, ··@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
From: Brian Downing
Subject: Re: parsing a string
Date: 
Message-ID: <tOAJc.84668$MB3.18782@attbi_s04>
In article <············@newsreader2.netcologne.de>,
Frank Buss  <··@frank-buss.de> wrote:
> But I can change my format. If I use a Lisp-like string:
> 
> (("key1" . "value1") ("key2" . "value2") ("key1" . "but another value") 
> ("empty" . "") ("key3" . "foo"))
> 
> it is really simple to parse it, using read-from-string, but how can I 
> catch an error? I've tried this:
> 
> (handler-bind ((error #'(lambda (c) nil))) (read-from-string "("))
> 
> but it shows the debugger instead of returning nil.

Others have explained how to make this work, but I thought I'd try to
explain why this didn't do what you thought it would.

Your handler #'(lambda (c) nil) is getting run.  However, it is
declining to handle the condition, so the search for a handler
continues, eventually hitting the debugger.

In order for a handler to not decline, it must perform a non-local
transfer of control, either in its own body or by calling a function or
restart that does so.

See, among other references,
http://www.lispworks.com/reference/HyperSpec/Body/09_ada.htm

which says:

   When a condition is signaled, the most recent applicable active
   handler is invoked. Sometimes a handler will decline by simply
   returning without a transfer of control. In such cases, the next most
   recent applicable active handler is invoked.

In your application, there was no next handler.  So, since the call
stack was not unwound, the call to ERROR in the READ-FROM-STRING code
continued on to call INVOKE-DEBUGGER.

-bcd
From: Björn Lindberg
Subject: Re: parsing a string
Date: 
Message-ID: <hcs8ydz1km6.fsf@knatte.nada.kth.se>
Frank Buss <··@frank-buss.de> writes:

> I tried to write a parser for a very simple syntax. The program but I 
> think it's too complicated and too long. Some other questions:
> 
> - how should I save the allowed keys? Later I want to save the values in 
> a structure with the same keys, but I don't want to write them twice
> 
> - the parsing algorithm is bad, because it searches every key twice
> 
> - the "find" function for searching the allowed key list could be slow, a 
> hashmap would be better, but a tree with the allowed keys, in which the 
> node values are chars of the key, would be the fastest solution, but then 
> the parsing algorithm must be changed

I must admit I didn't spend too much time understanding the problem in
detail, but it seems to me that since #\$ can just as well be a part
of a value as being a separator, and this only depends on the actual
keys provided, it is not very useful to try to separate the string on
#\$s to begin with. Here is an attampt that works on your test case,
but that is probably not very robust in the face of malformed input:


(defun parse (string allowed-keys &optional (start 0))
  "Parses a string to a list of pairs: symbol . string, starting with the
   first $ and ignoring all $ for not allowed keys
   example:
   (parse \"test$foo$123$name$empty$$last$bar\" '(\"foo\" \"empty\" \"last\"))
   -> ((\"foo\" . \"123$name\") (\"empty\" . \"\") (\"last\" . \"bar\"))"
  (let* ((key1 (car allowed-keys))
	 (key2 (cadr allowed-keys))
	 (key1-length (length key1))
	 (match1 (search key1 string :start2 start)))
    (if (null key2)
	(list (cons key1 (subseq string (+ 1 match1 key1-length))))
	(let ((match2 (search key2 string :start2 (+ match1 key1-length))))
	  (cons 
	   (cons key1 (subseq string (+ 1 match1 key1-length) (1- match2)))
	   (parse string (cdr allowed-keys) match2))))))


Bj�rn
From: Thomas A. Russ
Subject: Re: parsing a string
Date: 
Message-ID: <ymi6591qea3.fsf@sevak.isi.edu>
Frank Buss <··@frank-buss.de> writes:

> - the "find" function for searching the allowed key list could be slow, a 
> hashmap would be better, but a tree with the allowed keys, in which the 
> node values are chars of the key, would be the fastest solution, but then 
> the parsing algorithm must be changed

This really depends on how big the list of allowed keys is.  When
working with small N, the constant time part of algorithmic complexity
can often be the dominant driver.  You would need to do some testing,
but unless you expect to be working with significantly more than 10 keys
in the allowed key string, you are probably better off with just a
simple search through the list (like using FIND).


> (defun parse (string allowed-keys &optional (start 0))
>   (multiple-value-bind (key-pos key) 
>       (next-key string allowed-keys start)
>     (if key-pos
>         (let* ((value-pos (+ key-pos 2 (length key)))
>                (next-key-pos (next-key string allowed-keys value-pos)))
>           (if next-key-pos
>               (append
>                (list (cons key (subseq string value-pos next-key-pos)))
>                (parse string allowed-keys next-key-pos))

This clause (with the APPEND) is potentially the big efficiency killer,
since it needlessly copies the newly created list you make with LIST.  A
much more efficient and slightly simpler solution would be to use:

              (cons
                (cons key (subseq string value-pos next-key-pos))
                (parse string allowed-keys next-key-pos))


>               (list (cons key (subseq string value-pos)))))
>         ())))

As a general rule, you should always think carefully about the use of
append inside loops or recursive applications of functions.  It is a
very convenient function, but it always has to copy its first N-1
arguments.  NCONC doesn't do the copy, but it has to traverse the lists.
It is a way of sneaking O(n^2) performance into what might otherwise
look like an O(n) algorithm.


-- 
Thomas A. Russ,  USC/Information Sciences Institute