From: Robert Maas, see http://tinyurl.com/uh3t
Subject: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <rem-2008mar31-001@yahoo.com>
Has anybody well-written a function to parse a string containing
decimal-fraction notation *exactly* to the correct rational number,
*rather*than* to some floating point approximation? For example:

(parse-decimal-fraction "3.7") => 37/10
                     (not 15518925/4194304)
        (not even 4165829655317709/1125899906842624)

(parse-decimal-fraction "3.701234501234501234501234567")
            => 3701234501234501234501234567/1000000000000000000000000000
                                  (not 1499/405)
                               (not 6168718/1666665)
                              (not 15524103/4194304)
                 (not even 4167219580142631/1125899906842624)

(parse-decimal-fraction "0.101" :radix 2) => 5/8

Ideally it would take all the same keyword arguments as
parse-integer, with analagous semantics. But a lesser attempt would
still be useful for some applications, so I'm curious to learn
about all the various versions of such a function that folks have
written in Common Lisp.

It would be somewhat equivalent to this java.math.BigDecimal constructor:
<http://java.sun.com/j2se/1.3/docs/api/java/math/BigDecimal.html#BigDecimal(java.lang.String)>
public BigDecimal(String val)
          Translates the String representation of a BigDecmal [sic] into a
          BigDecimal. The String representation consists of an optional
          sign ('+' or '-') followed by a sequence of zero or more
          decimal digits ("the integer"), optionally followed by a
          fraction, optionally followed by an exponent.
          The fraction consists of of [sic] a decimal point followed by zero or
          more decimal digits. The string must contain at least one digit
          in either the integer or the fraction. ...
except that this is a special-case class rather than a subset of
rationals (Java doesn't have a rational-number class or datatype anyway).

Rationale: What got me to thinking about this is that my HelloPlus
WebPage has recently included Flying Thunder as the 7th language
capable of fully using CGI (including properly decoding
encoded-form-contents to create a "map" i.e. associative array of
some type), after {php, perl, lisp, c, c++, java} were included
previously:
  <http://www.rawbw.com/~rem/HelloPlus/hellos.html#step3>
and when I find time and energy I'm proably going to include Flying
Thunder in my CookBook/Matrix also:
  <http://www.rawbw.com/~rem/HelloPlus/CookBook/Matrix.html>
and I plan shortly to write up a demo of safely parsing integers in
Flying Thunder to add to the demos in the other six languages I
already have:
  <http://www.rawbw.com/~rem/HelloPlus/CookBook/h4s.html>
So I have been asking myself over and over what would make a good
*second* example to include there. Finally today I've decided that
parsing decimal integers would be good, if there's existing code to
do it using any of the various languages. Well as you see Java has
it built-in, so I *should* include it, but it would be a bit of
work to implement it in Lisp also, wouldn't want Java to show up
Lisp would I, but then I thought I'll post this newsgroup article
asking who already did it so I can just show their example instead
of coding the whole thing from scratch. A crude version wouldn't be
too hard (scan for strings of decimal digits separated by #\. and
call parse-integer to convert the integer and fraction part to
integers then do arithmetic to divide the fraction integer by the
appropriate power of ten to get a true fraction part then add to
the integer part. But to include the keyword parameters too would
be more work than I feel like doing when I'm already busy on
another medium-size project, hence my desire to find the
parse-decimal-fraction code already well written.

Aside: This topic gets me back to thinking about interval
arithmetic, including my new idea yesterday that interval
arithmetic would make a great project for my first signficant use
of generic functions in Common Lisp. I could have several different
kinds of real-metric (complete-ordered-field) interval-arithmetic objects:
- Fixed intervals:
  - Rational endpoints
  - Floating-point endpoints (single and double precision)
  - Integer-with-power-of-two endpoints (like BigFloats from MacLisp)
  - Integer-with-power-of-ten endpoints (like BigDecimal from Java)
  - Rational-with-power-of-two endpoints (brand-new AFAIK)
  - Rational-with-power-of-ten endpoints (brand-new AFAIK)
- Infinite-precision mathematical real numbers by lazy-evaluation
  - Nested intervals (where each interval can be any kind of fixed interval)
  - Ordinary continued fractions
      (every term except first must be positive integer)
  - Redundant continued fractions
      (every term except first must equal positive integer plus 0 or 1/2)
  - Ordinary decimal (or other radix) notation.
  - Redundant decimal (or other radix) notation (using hexadecimal
      notation ABCDE for positive overflow past nine, and IBM 026
      keypunch notation JKLMN for negative overfow past zero, for printing)
  - Isolated zero of polynomial function within known-monotonic interval
  - Alternating Taylor series
  - Transcendental functions (trigonometric/logarithmetic/exponential/power)
  - Isolated zero of transcendental functions within known-monotonic interval
- Set-theoretic Dedekind cuts (predicate that tells whether given
    rational number is in upper or lower cut, but hangs on exact
    rational-real not *known* to be rational when compared to itself,
    but uses continued fraction (a la rationalize) to guess at the
    most likely rational)

The basic arithmetic functions would be replaced by plus minus
times quotient which would be generic functions whereby ordinary
data types and interval types could be freely mixed, and a keyword
argument could specify the desired type of interval result if the
default isn't wanted. Also, providing just a single parameter to
plus or times, plus the keyword for desired result type, would
coerce the value to the new type, usually by wrapping a
lazy-evaluation continution around it. Either a type specifier or a
sample value could be used.
  (setq x (plus 3/7 :result-type :decimal))
    => [OrdinaryDecimal "0.4" [Continuation for 285714...]]
  (refine-interval x)
    => [OrdinaryDecimal "0.42" [Continuation for 857142...]]
  (setq y (plus 1/13 :result-type x))
    => [OrdinaryDecimal "0.0" [Continuation for 769230...]]
  (refine-interval y)
    => [OrdinaryDecimal "0.07" [Continuation for 692307...]]
  (setq z (plus y :result-type '(:nested :rational 10))
    => [NestedRationalIntervals 7/100 8/100 <link to above object>]
  (setq w (plus y :result-type '(:nested :rational :floating 10))
    => [FloatingDecimal -2
         [NestedRationalIntervals [7 8] <link to earlier 692307... continuation>]
  (setq xx (plus x :result-type w))
    => [FloatingDecimal -1
         [NestedRationalIntervals [42 43] <link to earlier 857142... continuation>]
  (refine-interval xx :relative-precision 1/3)
    => [FloatingDecimal -1
         <some interval narrower than [42 43] by at least a factor of 3.
          probably [214/5 429/10], reduced from [428/10 429/10]>
         <some continuation on further narrowing that interval>]

BTW: Yesterday I got so inspired on this new idea that I took some
time off from my current medium-sized project to toy with a new
better algorithm for finding (real) zeroes of a polynomial as
interval arithmetic continuations. Of course a variation of
Newton's method is used once we already have a good-enough
preliminary interval around the exact zero. (I worked out the best
way to adapt Newton's method a few months ago but haven't gotten
around to programming it.) But yesterday I thought of an
easy-to-program apparently-fail-proof algorithm for isolating the
(real) zeroes and then automatically narrowing those isolation
intervals to the point where good Newton's method behaviour is
assured immediately, and spent a little bit too much of my time
playing around with the algorithm to get a "feel" for how it would
really work. Maybe after I finish the current medium-sized project
 (involving using 15-dimensional ProxHash vectors to build a
  nearest-neighbor network for a set of nearly three hundred records
  in a corpus, using a new idea I had a few months ago but didn't
  previously try implementing),
I might feel like spending the time to implement my pre-Newton
zero-isolating algorithm and then put it up as a Web demo. Of
course the first thing it will do is take the GCD between the
polynomial and its first derivative to find all multiple zeroes and
divide them out to get a squarefree polynomial which is much easier
to work with when doing interval arithmetic.

From: ·······@eurogaran.com
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <87179b15-df33-4f03-b990-6c872d530766@s13g2000prd.googlegroups.com>
It is remarkable how (rationalize 3.7) in GCL yields
4165829655317709/1125899906842624 and 37/10 in almost every other
implementation (I've tried lispworks, genera, openmcl, clisp, cmucl
and sbcl).

On the other hand, (coerce 3.7 'rational) invariably fails. This makes
me wonder:
- Is really correct for it to do so?
- If correct, is it proper?
- Should the behavior you want be ideally provided by coerce? so that:
(coerce 3.7 'rational) -> 37/10
(rationalize 3.7) -> 4165829655317709/1125899906842624

What I mean is... Perhaps you should be hacking the very
implementation itself.
From: Thomas A. Russ
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <ymi3aq59uhn.fsf@blackcat.isi.edu>
·······@eurogaran.com writes:

> It is remarkable how (rationalize 3.7) in GCL yields
> 4165829655317709/1125899906842624 and 37/10 in almost every other
> implementation (I've tried lispworks, genera, openmcl, clisp, cmucl
> and sbcl).

You may note that the value returned by GCL is the exact rational
representation of 3.7d0, rather than the "nice" fraction
represetnation.  You can get the exact version using RATIONAL instead of
RATIONALIZE.

> On the other hand, (coerce 3.7 'rational) invariably fails. This makes
> me wonder:
> - Is really correct for it to do so?
> - If correct, is it proper?
> - Should the behavior you want be ideally provided by coerce? so that:
> (coerce 3.7 'rational) -> 37/10
> (rationalize 3.7) -> 4165829655317709/1125899906842624

Well, you can get the latter by using RATIONAL.  Whether you get the
answer you show or perhaps 15518925/4194304 instead depends on details
of the floating point representation and what the default float read
format happens to be.  This latter result is for IEEE single floats
rather than double floats.

Anyway, the standard doesn't require RATIONALIZE to do more than
RATIONAL, since the defined behavior of RATIONAL satisfies the looser
behavior defined for RATIONALIZE.  Now all reasonable implementations do
try to use the approximation wiggle room to provide a better answer.
The CMUCL source has an interesting implementation of RATIONALIZE that
you could, perhaps, examine.

    

-- 
Thomas A. Russ,  USC/Information Sciences Institute
From: John Thingstad
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <op.t8xti2vkut4oq5@pandora.alfanett.no>
P� Tue, 01 Apr 2008 16:36:53 +0200, skrev <·······@eurogaran.com>:

> It is remarkable how (rationalize 3.7) in GCL yields
> 4165829655317709/1125899906842624 and 37/10 in almost every other
> implementation (I've tried lispworks, genera, openmcl, clisp, cmucl
> and sbcl).
>
> On the other hand, (coerce 3.7 'rational) invariably fails. This makes
> me wonder:
> - Is really correct for it to do so?
> - If correct, is it proper?
> - Should the behavior you want be ideally provided by coerce? so that:
> (coerce 3.7 'rational) -> 37/10
> (rationalize 3.7) -> 4165829655317709/1125899906842624
>
> What I mean is... Perhaps you should be hacking the very
> implementation itself.

Yes. The computer uses binary notation for numbers. So just because the  
number requires only is a limited number of digits in decimal notation it  
does not follow that the same is true in binary notation. Note that this  
is perfectly natural. Only a small subset of all reals have a rational  
counter part.

In the decimal system you can expess any number with a finite number of  
digits as rational like .7 = 7/10
or a infinate repitition like .7777777.. 7/9.. This can be generalized so  
2.718281828.. a reasonable approximation to e could be 27/10 + 1/ 10 *  
1828/9999 = 271801/99990.

For this to work in binary you need to round it to the nearest decimal  
number with the accuracy you require.


--------------
John Thingstad
From: Alex Mizrahi
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <47f366cc$0$90266$14726298@news.sunsite.dk>
 k> What I mean is... Perhaps you should be hacking the very
 k> implementation itself.

actually that's fairly simple to read it from string. something like this:

(defun parse-decimal-fraction (str)
  (let* ((dotpos (position #\. str))
           (left (subseq str 0 dotpos))
           (right (subseq str (1+ dotpos))))
    (+ (parse-integer left)
       (/ (parse-integer right)
          (expt 10 (length right))))))

CL-USER> (parse-decimal-fraction "3.7")
37/10
CL-USER> (parse-decimal-fraction "3.701234501234501234501234567")
3701234501234501234501234567/1000000000000000000000000000
CL-USER> (parse-decimal-fraction "0.101")
101/1000
From: Lars Brinkhoff
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <85myocqrnq.fsf@junk.nocrew.org>
"Alex Mizrahi" <········@users.sourceforge.net> writes:
> (defun parse-decimal-fraction (str)
>   (let* ((dotpos (position #\. str))
>            (left (subseq str 0 dotpos))
>            (right (subseq str (1+ dotpos))))
>     (+ (parse-integer left)
>        (/ (parse-integer right)
>           (expt 10 (length right))))))

(defun parse-decimal-fraction (str &key (start 0) end (radix 10))
  (let ((dotpos (position #\. str :start start :end end)))
    (+ (parse-integer str :start start :end dotpos :radix radix)
       (/ (parse-integer str :start (1+ dotpos) :end end :radix radix)
          (expt 10 (- (or end (length str)) (1+ dotpos)))))))
From: Alex Mizrahi
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <47f3b04d$0$90267$14726298@news.sunsite.dk>
 ??>> (defun parse-decimal-fraction (str)
 ??>>   (let* ((dotpos (position #\. str))
 ??>>            (left (subseq str 0 dotpos))
 ??>>            (right (subseq str (1+ dotpos))))
 ??>>     (+ (parse-integer left)
 ??>>        (/ (parse-integer right)
 ??>>           (expt 10 (length right))))))

 LB> (defun parse-decimal-fraction (str &key (start 0) end (radix 10))
 LB>   (let ((dotpos (position #\. str :start start :end end)))
 LB>     (+ (parse-integer str :start start :end dotpos :radix radix)
 LB>        (/ (parse-integer str :start (1+ dotpos) :end end :radix radix)
 LB>           (expt 10 (- (or end (length str)) (1+ dotpos)))))))

(expt radix .. ), i guess. 
From: Barry Margolin
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <barmar-07BD8E.20023902042008@newsgroups.comcast.net>
In article <·························@news.sunsite.dk>,
 "Alex Mizrahi" <········@users.sourceforge.net> wrote:

>  ??>> (defun parse-decimal-fraction (str)
>  ??>>   (let* ((dotpos (position #\. str))
>  ??>>            (left (subseq str 0 dotpos))
>  ??>>            (right (subseq str (1+ dotpos))))
>  ??>>     (+ (parse-integer left)
>  ??>>        (/ (parse-integer right)
>  ??>>           (expt 10 (length right))))))
> 
>  LB> (defun parse-decimal-fraction (str &key (start 0) end (radix 10))
>  LB>   (let ((dotpos (position #\. str :start start :end end)))
>  LB>     (+ (parse-integer str :start start :end dotpos :radix radix)
>  LB>        (/ (parse-integer str :start (1+ dotpos) :end end :radix radix)
>  LB>           (expt 10 (- (or end (length str)) (1+ dotpos)))))))
> 
> (expt radix .. ), i guess. 

Isn't it an oxymoron to have the word "decimal" in the name, but then 
make the radix a parameter?  Doesn't "decimal" mean "base 10"?

I guess in this context it's just intended to refer to the use of "." as 
the separator between the integer and fraction component.  Is there a 
generic term for this delimiter?  "Radix point"?

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <rem-2008apr05-003@yahoo.com>
> From: ·······@eurogaran.com
> It is remarkable how (rationalize 3.7) in GCL yields
> 4165829655317709/1125899906842624 and 37/10 in almost every other
> implementation (I've tried lispworks, genera, openmcl, clisp, cmucl
> and sbcl).

It sounds like GCL doesn't do the "smart" thing, sigh. One good
reason to avoid using GCL. On the other hand, in
XLISP-PLUS version 2.1g
rationalize isn't even defined. (rational 3.7) returns 993211187/268435456
I was going to generate a long float and see what rational gave for
it, but XLISP doesn't even have a long float datatype as far as I
can tell. 37e1 reads as 370, but 37s1 37f1 37l1 37d1 each is
treated as name of a variable.

> On the other hand, (coerce 3.7 'rational) invariably fails. This
> makes me wonder:
> - Is really correct for it to do so?

Yes. There are any number of ways to convert a floating-point
approximation into an exact rational, given by rational and
rationalize for two examples (except in GCL), and hence I agree
with the decision not to pick one or the other for coerce to
provide.

> - If correct, is it proper?

Yes.

> - Should the behavior you want be ideally provided by coerce?

No, because different people would want different behaviour, and
the computer can't read the mind of the programmer or the user to
learn if they would want the same result and of so then provide
that but if disagreement then signal an error that minds of
programmer and user disagree and Lisp system doesn't want to choose
one and consequently be disliked by the other and possibly abused.

Besides, do you *really* want your computer to have the technology
to read your mind and do something other than what the program is
advertised to do just because it thinks you want something else?
What if it's wrong, and it mistakenly believes you want to die, so
the computer kills you to please you?

> What I mean is... Perhaps you should be hacking the very
> implementation itself.

Do you mean that I should trespass into the private filesystems at
CMU where CMUCL is built, and make changes so that the next release
will do what I personally want instead of what the designers of
CMUCL wanted? Are you aware of the Federal computer-crime law that
would put me in prison for years if I got caught doing that?

Or do you mean that I should trespass into the private filesystems
at CMU where CMUCL is built just to obtain all the source code,
then build a private copy of their program and install it on my
ISP, without permission of the admin here? I think the
computer-crime law would apply there too.

I can't install it on my own account because I don't have enough
free disk space and I can't afford to lease more.
From: ·······@eurogaran.com
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <3bb813f1-6aaa-4afd-961e-628de7ea7d01@e23g2000prf.googlegroups.com>
From CLtl2 :
"Coercions from floating-point numbers to rationals and from ratios to
integers are purposely not provided because of rounding problems. The
functions rational, rationalize, floor, ceiling, truncate, and round
may be used for such purposes. Similarly, coercions from characters to
integers are purposely not provided; char-code or char-int may be used
explicitly to perform such conversions."

and from CLHS :
rational assumes that the float is completely accurate.
rationalize assumes that the float is accurate only to the precision
of the floating-point representation.
From: Robert Maas, see http://tinyurl.com/uh3t
Subject: Re: (parse-decimal-fraction <string>) --> <rational> ?
Date: 
Message-ID: <rem-2008apr06-001@yahoo.com>
> From: ·······@eurogaran.com
> From CLtl2 :
> "Coercions from floating-point numbers to rationals and from ratios to
> integers are purposely not provided because of rounding problems. The
> functions rational, rationalize, floor, ceiling, truncate, and round
> may be used for such purposes. Similarly, coercions from characters to
> integers are purposely not provided; char-code or char-int may be used
> explicitly to perform such conversions."

IMO the only reason to ever generate a floating point value is when
it's just an approximation, with unknown bounds on possible values,
i.e. just a shot in the dark. IMO it would have been better, from
the very start circa 1964, if rational-interval datatype had been
used, giving specific bounds on the possible values. Then if you
ever want to discard the interval and produce just a single value,
for some perverse reason, there are four reasonable conversions:
- Lower bound
- Upper bound
- Center of interval
- Simplest fraction within interval
In each of those four cases, the result would be an exact rational.

> and from CLHS :
> rational assumes that the float is completely accurate.

which is really a stupid assumption in virtually every case. The
*real* (actual) purpose of that function is a cheap way to generate
an exact rational for the internal representation of the
shot-in-dark. Sometimes you want integer-decode-float, but sometimes
you want exactly the same info in an easier-to-see-at-a-glance format,
hence a nice simple easy-to-recognize rational number.

> rationalize assumes that the float is accurate only to the
> precision of the floating-point representation.

Even that is a stupid assumption except in the special case where
the number was parsed from text input and then immediately passed
to rationalize. If any intermediate calculation occurred, the
assumption is wrong and GIGO applies.

CMU Common Lisp 18b

(read-from-string "3.7")
3.7

(rationalize *)
37/10

(- ** 3)
0.70000005

(rationalize *)
1957345/2796207

See, just a single step of arithmetic is enough to reduce the
accuracy relative to the precision, so that the precision-based
function rationalize gives an incorrect answer, by assuming all the
precision was accurate, when in fact several low-order bits of the
precision were not accurate per the intended value that was parsed.

Now if read-from-string would call parse-rational instead of
parse-float when it sees a decimal fraction, it would have worked
like this:

(read-from-string "3.7")
37/10

(- * 3)
7/10

So IMO this is what should have been implemented (instead of
floating point) by multiple vendors, to the point where it would
have been common practice, in Common Lisp as it was heading toward
standardization, and in principle what should have been implemented
way back around 1964 when Fortran was invented. But there's one
problem: Fortran didn't have dynamic memory allocation, so
unlimited-precision integers weren't possible, so rationals with
arbitrary numerator and denominator would be impossible while
rationals with fixed-range numerator and denominator would have
been not very useful.

One alternative that *would* have been almost practical would be
fixed-precision integers and rationals with a power-of-ten or
power-of-two scaling factor, using interval arithmetic.
Unfortunately memory was very scarse back then (20k bytes for IBM
1620 for example, what SCU had, 40k at San Jose State which had
more money to spend on computing resources), so some programs just
barely able to fit in memory with shot-in-dark floating point
values would have not fit with interval arithmetic with any sort of
reasonable bounds on numerator and denominator. Still, perhaps that
tradeoff might have been appealing to some users, and might have
been offered by some vendors (well IBM corporation was *the* vendor
for Fortran compiler for IBM 1620, so I really mean that IBM
dropped the ball IMO).

Another option would be regular floating-point endpoints, which
exactly doubled the storage cost compared to short-in-dark
floating-point, but greatly reduced the amount of human time needed
to develop the algorithms, since much of the existing
floating-point code could be re-used/adapted to interval
arithmetic.

Side remark: Circa 1966, I actually started work on an interval
arithmetic package for integers on the IBM 1620 at SCU, four basic
functions, that's as far as I got before I needed to divert my
time/energy to other more urgent tasks. (I forget whether I wrote
it in Fortran or assembly. If the latter, then I probably allowed
integers longer than Fortran allows, because the iBM 1620 is
inherently arbitrary-precision arithmetic, using BCD digits with
the leftmost digit flagged to make arithmetic operations stop
traversing to the left. But still there was no dynamic allocation.)