PARSE-FLOAT

From: Mark Kantrowitz
Subject: PARSE-FLOAT
Date: Thu, 25 Aug 1994 05:03:32 +0000
Message-ID: <33h8n4$9oa@cantaloupe.srv.cs.cmu.edu>

I've always been bugged by Common Lisp's lack of an implementation of
PARSE-FLOAT to go with PARSE-INTEGER, so here's one. (Folks who use
READ-FROM-STRING to parse floating-point numbers are just looking for
trouble.)  Let me know if you find any bugs -- I did a very minimal
amount of testing after writing it.

--mark

;;; Thu Aug 25 00:56:39 1994 by Mark Kantrowitz <·····@SKEEZER.OZ.CS.CMU.EDU>
;;; atof.cl -- 7824 bytes

;;; ****************************************************************
;;; PARSE-FLOAT -- equivalent of C's atof **************************
;;; ****************************************************************
;;; 
;;; This program is based loosely on the CMU Common Lisp implementation 
;;; of PARSE-INTEGER.
;;;
;;; ORIGIN: ftp.cs.cmu.edu:/user/ai/lang/lisp/code/math/atof/
;;;
;;; Copyright (c) 1994 by Mark Kantrowitz
;;;
;;; This material was developed by Mark Kantrowitz of the School of
;;; Computer Science, Carnegie Mellon University.
;;;
;;; Permission to use, copy, modify, and distribute this material is
;;; hereby granted, subject to the following terms and conditions.
;;;
;;; In case it be determined by a court of competent jurisdiction that any
;;; provision herein contained is illegal, invalid or unenforceable, such
;;; determination shall solely affect such provision and shall not affect
;;; or impair the remaining provisions of this document.
;;; 
;;; 1. All copies of the software, derivative works or modified versions,
;;;    and any portions thereof, must include this entire copyright and
;;;    permission notice, without modification. The full notice must also
;;;    appear in supporting documentation.
;;; 
;;; 2. Users of this material agree to make their best efforts to inform
;;;    Mark Kantrowitz of noteworthy uses of this material. Correspondence
;;;    should be provided to Mark at:
;;; 
;;;         Mark Kantrowitz
;;;         School of Computer Science
;;;         Carnegie Mellon University
;;;         5000 Forbes Avenue
;;;         Pittsburgh, PA 15213-3891
;;; 
;;;         E-mail: ·····@cs.cmu.edu
;;; 
;;; 3. This software and derivative works may be distributed (but not
;;;    offered for sale) to third parties, provided such third parties
;;;    agree to abide by the terms and conditions of this notice. If you
;;;    modify this software, you must cause the modified file(s) to carry
;;;    a change log describing the changes, who made the changes, and the
;;;    date of the changes.
;;; 
;;; 4. All materials developed as a consequence of the use of this material
;;;    shall duly acknowledge such use, in accordance with the usual standards
;;;    of acknowledging credit in academic research.
;;; 
;;; 5. Neither the name of Mark Kantrowitz nor any adaptation thereof may
;;;    be used to endorse or promote products derived from this software
;;;    or arising from its use without specific prior written permission
;;;    in each case.
;;; 
;;; 6. Users of this software hereby grant back to Mark Kantrowitz and
;;;    Carnegie Mellon University a non-exclusive, unrestricted, royalty-free
;;;    right and license under any changes, enhancements or extensions made
;;;    to the core functions of the software, including but not limited to
;;;    those affording compatibility with other hardware or software
;;;    environments. Users further agree to use their best efforts to return to
;;;    Mark Kantrowitz any such changes, enhancements or extensions that they
;;;    make.
;;; 
;;; THE SOFTWARE IS PROVIDED "AS IS" AND MARK KANTROWITZ DISCLAIMS ALL
;;; EXPRESS OR IMPLIED WARRANTIES WITH REGARD TO THIS MATERIAL (INCLUDING
;;; SOFTWARE CONTAINED THEREIN), INCLUDING, WITHOUT LIMITATION, ALL
;;; IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
;;; PURPOSE. IN NO EVENT SHALL MARK KANTROWITZ BE LIABLE FOR ANY SPECIAL,
;;; DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
;;; RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
;;; CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
;;; CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE (INCLUDING BUT
;;; NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR
;;; LOSSES SUSTAINED BY THIRD PARTIES OR A FAILURE OF THE PROGRAM TO
;;; OPERATE AS DOCUMENTED). MARK KANTROWITZ IS UNDER NO OBLIGATION TO
;;; PROVIDE ANY SERVICES, BY WAY OF MAINTENANCE, UPDATE, OR OTHERWISE.
;;; 

; (in-package "LISP")
; (export '(parse-float))

(defparameter *whitespace-chars* '(#\space #\tab))

(defun whitespacep (char)
  (find char *whitespace-chars*))

(defun parse-float (string &key (start 0) end (radix 10) junk-allowed)
  "Converts a substring of STRING, as delimited by START and END, to a
   floating point number, if possible. START and END default to the
   beginning and end of the string. RADIX must be between 2 and 36.
   A floating point number will be returned if the string consists of
   optional leading spaces or tabs and an optional sign, followed by a
   string of digits optionally containing a decimal point, and an optional
   e or E followed by an optionally signed integer. The use of e/E to
   indicate an exponent only works for RADIX = 10. If JUNK-ALLOWED is
   true, parsing stops at the first character that cannot continue the
   number, and NIL is returned instead of signaling an error when no
   digits are found. Returns the floating point number (a SINGLE-FLOAT),
   if any, and the index of the first character after the number."

  ;; END defaults to the end of the string
  (setq end (or end (length string))) 

  ;; Skip over whitespace. If there's nothing but whitespace, signal an error.
  (let ((index (or (position-if-not #'whitespacep string :start start :end end)
                   (if junk-allowed
                       (return-from parse-float (values nil end))
                     (error "No non-whitespace characters in number."))))
        (minusp nil) (decimalp nil) (found-digit nil) 
        (before-decimal 0) (after-decimal 0) (decimal-counter 0)
        (exponent 0)
        (result 0))
    (declare (fixnum index))

    ;; Take care of optional sign.
    (let ((char (char string index)))
      (cond ((char= char #\-)
             (setq minusp t)
             (incf index))
            ((char= char #\+)
             (incf index))))

    (loop
     (when (= index end) (return nil))
     (let* ((char (char string index))
            (weight (digit-char-p char radix)))
       (cond ((and weight (not decimalp))
              ;; A digit before the decimal point
              (setq before-decimal (+ weight (* before-decimal radix))
                    found-digit t))
             ((and weight decimalp)
              ;; A digit after the decimal point
              (setq after-decimal (+ weight (* after-decimal radix))
                    found-digit t)
              (incf decimal-counter))
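             ((and (char= char #\.) (not decimalp))
              ;; The decimal point
              (setq decimalp t))
             ((and (char-equal char #\e) (= radix 10))
              ;; E is for exponent. The exponent always ends the number,
              ;; so stop the loop here. (Continuing it when IDX is short of
              ;; END, as can happen with JUNK-ALLOWED, would skip a junk
              ;; character and keep accumulating digits.)
              (multiple-value-bind (num idx)
                  (parse-integer string :start (1+ index) :end end
                                 :radix radix :junk-allowed junk-allowed)
                (setq exponent (or num 0)
                      index idx)
                (return nil)))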
             (junk-allowed (return nil))
             ((whitespacep char)
              (when (position-if-not #'whitespacep string
                                     :start (1+ index) :end end)
                (error "There's junk in this string: ~S." string))
              (return nil))
             (t
              (error "There's junk in this string: ~S." string))))
     (incf index))

    ;; Cobble up the resulting number
    (setq result (float (* (+ before-decimal
                              (* after-decimal 
                                 (expt radix (- decimal-counter))))
                           (expt radix exponent))))

    ;; Return the result
    (values
     (if found-digit
         (if minusp (- result) result)
       (if junk-allowed
           nil
         (error "There are no digits in this string: ~S." string)))
     index)))

;;; *EOF*
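The "Cobble up" step near the end of PARSE-FLOAT keeps every intermediate value exact: the digits are accumulated as integers, EXPT of an integer to a negative power yields a ratio, and only the final FLOAT call rounds. Pulled out into a standalone function, that arithmetic looks roughly like the sketch below. ASSEMBLE-FLOAT is a hypothetical name used only for illustration; it is not part of the file above.

```lisp
(defun assemble-float (integer-part fraction-digits fraction-length exponent
                       &optional (radix 10))
  "Combine the pieces PARSE-FLOAT accumulates: INTEGER-PART and
FRACTION-DIGITS as integers, FRACTION-LENGTH as the number of digits seen
after the point, and EXPONENT as the power of RADIX to scale by."
  ;; Everything inside FLOAT is exact rational arithmetic (EXPT of an
  ;; integer to a negative integer power yields a ratio), so the only
  ;; rounding happens in the final FLOAT call, which returns a SINGLE-FLOAT.
  (float (* (+ integer-part
               (* fraction-digits (expt radix (- fraction-length))))
            (expt radix exponent))))

;; "12.5"   => (assemble-float 12 5 1 0)
;; "1.05"   => (assemble-float 1 5 2 0)
;; "1.25e2" => (assemble-float 1 25 2 2)
;; "1.1" in radix 2 => (assemble-float 1 1 1 0 2)
```

The "1.05" case shows why PARSE-FLOAT keeps DECIMAL-COUNTER separately: the fraction's integer value 5 alone cannot distinguish .5 from .05.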


From: Mark Kantrowitz
Subject: Re: PARSE-FLOAT
Date: Thu, 25 Aug 1994 18:14:41 +0000
Message-ID: <33in2h$cv3@cantaloupe.srv.cs.cmu.edu>

In article <··········@cantaloupe.srv.cs.cmu.edu>,
Mark Kantrowitz <······@cs.cmu.edu> wrote:
>(Folks who use
>READ-FROM-STRING to parse floating-point numbers are just looking for
>trouble.)  

Several folks have asked me why, so here's the explanation in brief:

[1]     > (read-from-string "(1.23")
        >>Error: End of file reading in a list,
                   on stream #<Stream STRING-INPUT-STREAM 40A529C6>.
                 Surrounding context: (1.23)

[2]     > (read-from-string "#.(print \"you lose\")")

        "you lose"
        "you lose"
        20

In the first example, you have to worry about aspects of the string
upsetting the reader. In the second example, the #. syntax makes the
reader evaluate (PRINT "you lose") at read time: the string is printed
as a side effect and then returned as the value read. A malicious user
could use this to wreak all kinds of havoc. (Several years ago the
Symbolics mailer used READ-FROM-STRING to parse a field in the mail.
Amazing what this let one do.)

In short, the reader is too overpowered a tool to use for such a
simple task. If you use a tool that has more features than you're
using, don't be surprised if those extra features cause you trouble
down the road.

--mark

From: Eyvind Ness
Subject: Re: PARSE-FLOAT
Date: Sat, 27 Aug 1994 14:05:03 +0000
Message-ID: <EYVIND.94Aug27160503@bingen.hrp.no>

In article <··········@cantaloupe.srv.cs.cmu.edu> ······@cs.cmu.edu (Mark Kantrowitz) writes:

  ;; In article <··········@cantaloupe.srv.cs.cmu.edu>,
  ;; Mark Kantrowitz <······@cs.cmu.edu> wrote:
  ;; >(Folks who use
  ;; >READ-FROM-STRING to parse floating-point numbers are just looking for
  ;; >trouble.)  
  ;; 
  ;; Several folks have asked me why, so here's the explanation in brief:
  ;; 
  ;; [1]     > (read-from-string "(1.23")
  ;; 	>>Error: End of file reading in a list, 
  ;; 	           on stream #<Stream STRING-INPUT-STREAM 40A529C6>.
  ;; 	         Surrounding context: (1.23)
  ;; 
  ;; [2]	> (read-from-string "#.(print \"you lose\")")
  ;; 	
  ;; 	"you lose" 
  ;; 	"you lose"
  ;; 	20
  ;; 
  ;; In the first example, you have to worry about aspects of the string
  ;; upsetting the reader. In the second example, a malicious user could
  ;; use this bug to wreak all kinds of havoc. (Several years ago the
  ;; Symbolics mailer used read-from-string to parse a field in the mail.
  ;; Amazing what this let one do.)
  ;; 
  ;; In short, the reader is too overpowered a tool to use for such a
  ;; simple task. If you use a tool that has more features than you're
  ;; using, don't be surprised if those extra features cause you trouble
  ;; down the road.

I agree with that, but where is the built-in PARSE-FLOAT? What are you
supposed to do when you want to read a float from a stream?

   USER(19): (apropos "PARSE" "CL")
   PARSE-INTEGER       [function] (STRING &KEY START END ...)
   PARSE-NAMESTRING    [function] (THING &OPTIONAL HOST DEFAULT ...)
   PARSE-ERROR
   USER(20): 
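One way to answer the stream question without involving the reader, sketched under the assumption that only radix-10 input is expected: peel off the run of characters that can appear in such a number and hand the resulting string to PARSE-FLOAT above. READ-FLOAT-TOKEN is a hypothetical helper, not something from either post, and it validates nothing itself; a malformed run such as "1-2" is simply collected and left for PARSE-FLOAT to reject.

```lisp
(defun read-float-token (stream)
  "Skip spaces and tabs on STREAM, then consume and return the run of
characters that can occur in a radix-10 float (digits, signs, decimal
point, e/E) as a string. Returns NIL if no such characters follow."
  ;; Skip leading spaces and tabs without consuming anything else.
  (loop for c = (peek-char nil stream nil nil)
        while (and c (member c '(#\Space #\Tab)))
        do (read-char stream))
  ;; Collect candidate characters, leaving the first other character
  ;; (or end of file) unread on STREAM.
  (let ((chars (loop for c = (peek-char nil stream nil nil)
                     while (and c (find c "0123456789+-.eE"))
                     collect (read-char stream))))
    (when chars
      (coerce chars 'string))))
```

A caller would check the result for NIL (end of input or a non-numeric character) before passing it to PARSE-FLOAT, since PARSE-FLOAT signals an error on an empty input unless JUNK-ALLOWED is true.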

And you can always get around the example problems you listed:

   USER(1): (let ((*read-eval* nil))
              (read-from-string "#.(print \"you lose\")"))
   Error: The reader encountered a `#.' but *READ-EVAL* is NIL: (PRINT "you lose")
   [1] USER(2): 

We don't lose.

   USER(17): (handler-case (read-from-string "(1.23")
               (end-of-file (c)
                 (format t "Sorry, input is unparsable: ~A." c)))
   Sorry, input is unparsable: eof encountered on stream
                               #<EXCL::STRING-INPUT-STREAM @ #x3bd659>.
   NIL

We don't lose.

Eyvind.
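Eyvind's two workarounds can be folded into a single helper. The sketch below is not from either post, and SAFE-READ-FLOAT is a hypothetical name. It binds *READ-EVAL* to NIL inside WITH-STANDARD-IO-SYNTAX (which would otherwise bind it to T), turns any error signaled while reading into NIL, and rejects anything that is not a single float. One caveat remains: a non-numeric token such as "foo" is still interned as a symbol in CL-USER before the FLOATP check rejects it, so this reduces the reader's extra features rather than removing them.

```lisp
(defun safe-read-float (string)
  "Return the float read from STRING, or NIL if STRING is not exactly one
float (surrounding spaces/tabs allowed) or the reader signals an error."
  (handler-case
      (with-standard-io-syntax
        ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T, so turn it off
        ;; inside; #. then signals a READER-ERROR instead of evaluating.
        (let ((*read-eval* nil))
          (multiple-value-bind (object position) (read-from-string string)
            ;; Reject non-floats (integers, symbols, lists, ...) and any
            ;; trailing text after the first object.
            (when (and (floatp object)
                       (string= "" (string-trim '(#\Space #\Tab)
                                                (subseq string position))))
              object))))
    ;; END-OF-FILE (e.g. "(1.23" or "") and READER-ERROR both land here.
    (error () nil)))
```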