From: Smeckler
Subject: Lame code needs improvement
Date: 
Message-ID: <b9boep$c48$1@news7.svr.pol.co.uk>
I'm attempting to do some basic text parsing, and am obviously making a bit
of a dog's breakfast of it...

The bit I'm having most grief with is shown below - the
'get-numeric-literal' function is passed the first character of a sequence
of digits that's been read, and the stream we're reading from.  It reads all
the consecutive digits (without 'eating' any non digit characters from the
stream) and return a token structure containing the digits as a string.

Any constructive criticism would be greatly appreciated, as I have no other
Lisp-aware contacts to offer guidance.

Particular things I'm worried about -

- It seems wasteful to read the sequence into a list to find the size, then
allocate a string and copy it.
- I'm sure there must be a nicer way of adding characters to a list.
- If I print the strings I've created with format t "~A", they print as,
e.g., #(3 4 5), rather than "345".
- I don't get the convention for indenting 'do' structures.

thanks in advance...

<lame code>

(defun get-numeric-literal (first-char stream)
 (do ((num-list (list first-char))
    (new-char (peek-char nil stream) (peek-char nil stream)) )
   ((not (is-digit new-char)) (make-token :image (string-from-list num-list)
:type :numeric-literal))
     (setf num-list (append num-list (list (read-char stream nil))))))

(defun string-from-list (lst)
 (setq str (make-array (length lst)))
 (do ((i 0 (+ i 1)))
  ((eql i (length lst)) str)
   (setf (svref str i) (elt lst i))))

(defun is-digit(c)
 (and (char>= c #\0)
         (char<= c #\9)))

</lame code>

From: Barry Margolin
Subject: Re: Lame code needs improvement
Date: 
Message-ID: <WVdua.31$6b5.1362@paloalto-snr1.gtei.net>
In article <············@news7.svr.pol.co.uk>,
Smeckler <······················@hotmail.com> wrote:
>I'm attempting to do some basic text parsing, and am obviously making a bit
>of a dog's breakfast of it...
>
>The bit I'm having most grief with is shown below - the
>'get-numeric-literal' function is passed the first character of a sequence
>of digits that's been read, and the stream we're reading from.  It reads all
>the consecutive digits (without 'eating' any non digit characters from the
>stream) and return a token structure containing the digits as a string.
>
>Any constructive criticism would be greatly appreciated, as I have no other
>Lisp-aware contacts to offer guidance.
>
>Particular things I'm worried about -
>
>- It seems wasteful to read the sequence into a list to find the size, then
>allocate a string and copy it.
>- I'm sure there must be a nicer way of adding characters to a list.

Why not put the characters directly into the string with VECTOR-PUSH-EXTEND?

>- If I print the strings I've created with format t "~A", they print as,
>e.g., #(3 4 5), rather than "345".

You didn't create a string, you created an array; to create a string, you
must use MAKE-STRING or specify :ELEMENT-TYPE 'CHARACTER to MAKE-ARRAY.

Also, you didn't put characters into it, you put numbers.  See the function
DIGIT-CHAR.

>- I don't get the convention for indenting 'do' structures.

The iteration specifications and the end-test specification should be at
equivalent indentation, while the body should be indented less so that it
stands out.

(do ((i 0 (1+ i)))
    ((= i (length lst)))
  (setf (svref str i) (digit-char (elt lst i))))

However, calling LENGTH and ELT repeatedly on lists is generally poor
practice, you should iterate through the list with CDR:

(do ((i 0 (1+ i))
     (cur lst (cdr cur)))
    ((null cur))
  (setf (svref str i) (digit-char (car cur))))

-- 
Barry Margolin, ··············@level3.com
Genuity Managed Services, a Level(3) Company, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.
From: Jeremy Yallop
Subject: Re: Lame code needs improvement
Date: 
Message-ID: <b9brhj$h9csf$1@ID-114079.news.dfncis.de>
Smeckler wrote:
> I'm attempting to do some basic text parsing, and am obviously making a bit
> of a dog's breakfast of it...

For simple parsing tasks such as this the Lisp reader is often
useful. For instance,

  * (write-to-string (read))
  12345
  ;
  "12345"
  *

> (defun string-from-list (lst)

(There's no need to give parameters awkward names.  `list' will do
fine.  Common Lisp has separate name spaces for functions and
variables).

The functionality is already available:

  (defun string-from-list (list)
    (coerce list 'string))

>  (setq str (make-array (length lst)))

SETQ is not the right way to create local variables.  Use LET instead.

> (defun is-digit(c)

There's also a standard function to do this.  It's called DIGIT-CHAR-P.

Jeremy.
From: Henrik Motakef
Subject: Re: Lame code needs improvement
Date: 
Message-ID: <874r464h4i.fsf@interim.henrik-motakef.de>
"Smeckler" <······················@hotmail.com> writes:

> The bit I'm having most grief with is shown below - the
> 'get-numeric-literal' function is passed the first character of a sequence
> of digits that's been read, and the stream we're reading from.  It reads all
> the consecutive digits (without 'eating' any non digit characters from the
> stream) and return a token structure containing the digits as a string.
[...]
> Particular things I'm worried about -
>
> - It seems wasteful to read the sequence into a list to find the size, then
> allocate a string and copy it.
> - I'm sure there must be a nicer way of adding characters to a list.

Well, strings are arrays, and as such may be "adjustable", i.e. they
can grow. So you could in principle get rid of the list altogether and
just use an adjustable string, adding newly read chars with
VECTOR-PUSH-EXTEND, for example.

> - If I print the strings I've created with format t "~A", they print as,
> e.g., #(3 4 5), rather than "345".
> - I don't get the convention for indenting 'do' structures.

Do you use a Lisp-friendly editor, like Emacs? It should handle
indentation for you.

> (defun string-from-list (lst)
>  (setq str (make-array (length lst)))
>  (do ((i 0 (+ i 1)))
>   ((eql i (length lst)) str)
>    (setf (svref str i) (elt lst i))))

This is equivalent to

(concatenate 'string list)

[Note that you can just use proper names like "list" or "string" as
variable names, no need for abbreviations like "lst" or "str"]

> (defun is-digit(c)
>  (and (char>= c #\0)
>          (char<= c #\9)))

(digit-char-p c)
From: Eduardo Muñoz
Subject: Re: Lame code needs improvement
Date: 
Message-ID: <uptmuwhnn.fsf@terra.es>
* "Smeckler" <······················@hotmail.com>
| The bit I'm having most grief with is shown below - the
| 'get-numeric-literal' function is passed the first character of a sequence
| of digits that's been read, and the stream we're reading from.  It reads all
| the consecutive digits (without 'eating' any non digit characters from the
| stream) and return a token structure containing the digits as a string.

Take a look at with-output-to-string (and loop too :)
This function does something close but not too much (I
think) to what you want. 

(defun gnc (stream)
  (with-output-to-string (num-list)
    (loop for char = (peek-char nil stream)
          until (not (digit-char-p char)) do
          (write-char (read-char stream) num-list))))





-- 
Eduardo Mu�oz          | (prog () 10 (print "Hello world!")
http://213.97.131.125/ |          20 (go 10))
From: Joe Marshall
Subject: Re: Lame code needs improvement
Date: 
Message-ID: <vfwla2yw.fsf@ccs.neu.edu>
"Smeckler" <······················@hotmail.com> writes:

> Any constructive criticism would be greatly appreciated, as I have no other
> Lisp-aware contacts to offer guidance.

Ok.

> - I don't get the convention for indenting 'do' structures.

;;; Untabified.  Use a fixed-pitch font for best results.

(defun get-numeric-literal (first-char stream)
  (do ((num-list (list first-char))
       (new-char (peek-char nil stream) (peek-char nil stream)))
      ((not (is-digit new-char)) (make-token :image (string-from-list num-list)
                                             :type :numeric-literal))
    (setf num-list (append num-list (list (read-char stream nil))))))

(defun string-from-list (lst)
  (setq str (make-array (length lst)))
  (do ((i 0 (+ i 1)))
      ((eql i (length lst)) str)
    (setf (svref str i) (elt lst i))))

> - I'm sure there must be a nicer way of adding characters to a list.

>   (setf num-list (append num-list (list (read-char stream nil))))))

NEVER do this!  It creates an O(n^2) process in the number of
characters.  Every time through the loop you make an entirely new copy
of all the characters you have read so far.

Really the best way to do this is to accumulate the list backwards and
reverse the list when you are done:

(defun get-numeric-literal (first-char stream)
  (do ((num-list (list first-char))
       (new-char (peek-char nil stream) (peek-char nil stream)))
      ((not (is-digit new-char))
       (make-token :image (string-from-list (nreverse num-list))
                   :type :numeric-literal))
    (push (read-char stream nil) num-list)))

In this:
(defun string-from-list (lst)
  (setq str (make-array (length lst)))
  (do ((i 0 (+ i 1)))
      ((eql i (length lst)) str)
    (setf (svref str i) (elt lst i))))

You are checking the length of list on each iteration (slow!)
and traversing the list from the beginning on each iteration (slow!)
Using a free variable as an accumulator is bad, as well.

You can simply call coerce:  (coerce lst 'string)
But if you wanted to write your own loop for this, 

(defun string-from-list (list)
  (let* ((length (length list))
         (result (make-string length)))
    (do ((i 0 (1+ i))
         (l list (cdr l)))
        ((>= i length) result)
      (setf (schar result i) (car l)))))

> - It seems wasteful to read the sequence into a list to find the size, then
>   allocate a string and copy it.

If you are really that concerned about space, read the characters into
a static buffer and then cons the string.


---
You might want to consider accumulating the numeric token as a number.

(defun get-numeric-literal (first-char stream)
  (number->token 
   (get-number first-char stream)))

(defun number->token (number)
  (make-token :image (prin1-to-string number)
              :type :numeric-literal))

(defun get-number (first-char stream)
  ;; Assumes that first-char has been peeked, but not read.
  ;; If first-char has been read, then simply unread it:
  ;; (unread-char first-char stream)
  (do* ((result 0 (+ (* result *read-base*)
                     (digit-char char *read-base*)))
        (char first-char (peek-char nil stream nil nil)))
      ((not (and char (digit-char-p char *read-base*))) result)
    (read-char stream)))
From: Smeckler
Subject: Re: Lame code needs improvement
Date: 
Message-ID: <b9df4k$pss$1$8300dec7@news.demon.co.uk>
OK, thanks very much to everyone for the assistance.

I'd like to blame Paul Graham's "ANSI Common Lisp" book for my
disagreeable habit of using variable names like 'lst' rather than
'list'.  Also for using 'setf' on variables that haven't previously been
declared (is that the right term in lisp?)

I think the main things I've learnt here are
1) Lisp is flexible, which can be bewildering (the burden of choice :)
2) Real coding problems can't be solved with a 3 line recursive
function, unlike problems examined in textbooks!  So my code probably
wasn't quite as lame as I'd feared.

cheers all.
From: Coby Beck
Subject: Re: Lame code needs improvement
Date: 
Message-ID: <b9clru$20pb$1@otis.netspace.net.au>
"Smeckler" <······················@hotmail.com> wrote in message
·················@news7.svr.pol.co.uk...
> I'm attempting to do some basic text parsing, and am obviously making a
bit
> of a dog's breakfast of it...
>
> The bit I'm having most grief with is shown below - the
> 'get-numeric-literal' function is passed the first character of a sequence
> of digits that's been read, and the stream we're reading from.  It reads
all
> the consecutive digits (without 'eating' any non digit characters from the
> stream) and return a token structure containing the digits as a string.

Hi, I have a bit of code I wrote a couple of years ago that may give you
some ideas.  It does not return a string, but the integer and it throws an
exception because the caller is supposed to know there is an integer here
(that can be altered easily).  The safe-peek-char and safe-read-char were
written because this was a socket stream and I did not want to hang, you can
probably just change it to peek-char and read-char and remove some args.  I
havn't looked at this in ages so I hope it will not embarass me!  but
regardless I always welcome constructive criticism.

(defun read-integer (stream &key (timeout .1) eof-error-p
                            (eof-value #\Null))
  (let ((ch (safe-peek-char stream
                            :timeout timeout
                            :eof-error-p eof-error-p
                            :eof-value eof-value)))
    (if (not (or (digit-char-p ch)
                 (char= #\- ch)
                 (char= #\+ ch)))
        (error 'parsing-error
               :stream-position (file-position stream)
               :expectation "integer"
               :in-buffer (sample-stream stream 20))
      (let ((sign #\+))
        (when (or (char= #\- ch) (char= #\+ ch))
          (setf sign (safe-read-char
                      stream
                      :timeout timeout
                      :eof-error-p eof-error-p
                      :eof-value eof-value))
          (when (or (eof? stream)
                    (not (digit-char-p (safe-peek-char
                                        stream
                                        :timeout timeout
                                        :eof-error-p eof-error-p
                                        :eof-value eof-value))))
            ;; we got a plus or minus, but no digit after
            (unread-char sign stream)
            (error 'parsing-error
                   :stream-position (file-position stream)
                   :expectation "integer"
                   :in-buffer (sample-stream stream 20))))
        ;; if we got here with no error, start eating digits
        (let ((sum 0) (digits-read 0))
          (loop for ch = (safe-read-char stream
                                         :timeout timeout
                                         :eof-error-p eof-error-p
                                         :eof-value nil)
                for num = (when ch (digit-char-p ch))
                until (null num)
                do (setf sum (+ num (* 10 sum)))
                (incf digits-read)
                ;; NB why check if ch was there?  It seems unread-char
                ;; will unread the last successful read, ignoring the nil
                ;; anyway... ANSI says the results are unspecified
                finally (when ch (unread-char ch stream)))
          (values (if (char= #\- sign) (- sum) sum)
                  digits-read))))))


-- 
Coby Beck
(remove #\Space "coby 101 @ bigpond . com")
From: Smeckler
Subject: Re: Lame code needs improvement
Date: 
Message-ID: <b9e22l$9kp$1$8302bc10@news.demon.co.uk>
OK, thanks very much to everyone for the assistance.

I'd like to blame Paul Graham's "ANSI Common Lisp" book for my
disagreeable habit of using variable names like 'lst' rather than
'list'.  Also for using 'setf' on variables that haven't previously been
declared (is that the right term in lisp?)

I think the main things I've learnt today are
1) Lisp is flexible, which can be bewildering (the burden of choice :)
2) Real coding problems can't be solved with a 3 line recursive
function, unlike problems examined in textbooks!  So my code probably
wasn't quite as lame as I'd feared.

cheers all.