From: Brad Lucier
Subject: numerical issues
Date: 
Message-ID: <6h6cci$3in@mozo.cc.purdue.edu>
I have some questions about how to use IEEE arithmetic in Scheme/Lisp.
I'm OK as long as I stick with exact (rational, real or complex)
arithmetic, or if I stick with inexact (flonum, real) arithmetic, but
I run into problems when I try to mix exact and inexact, or real and
complex arithmetic.

For guidance, I looked at the Common Lisp Hyperspec;
the contagion rules for CL seem to conflict with how I think
(and probably how W. Kahan thinks) IEEE arithmetic should work.

Section 12.1.4.1 says:

    When rationals and floats are combined by a numerical function,
    the rational is first converted to a float of the same format.
    For functions such as + that take more than two arguments, it
    is permitted that part of the operation be carried out exactly
    using rationals and the rest be done using floating-point arithmetic.

    When rationals and floats are compared by a numerical function,
    the function rational is effectively called to convert the float to
    a rational and then an exact comparison is performed. In the case of
    complex numbers, the real and imaginary parts are effectively
    handled individually.

Section 12.1.4.4 says:

    The result of a numerical function is a float of the largest format
    among all the floating-point arguments to the function.

Section 12.1.5.2 says:

    When a real and a complex are both part of a computation, the real
    is first converted to a complex by providing an imaginary part of 0.

Perhaps Scheme is meant to follow these rules, too; I don't know,
as R5RS is not detailed enough to say.

These rules lead to some anomalies; for example, assume that we
have one precision of float, IEEE double precision, and exact
bignums and ratnums.

(= (expt 2. 100) (+ (expt 2 100) 1)) => #f

yet

(zero? (- (expt 2. 100) (+ (expt 2 100) 1))) => #t

The answers would be the same if both numerical functions
and comparisons used the same float/rational contagion rules.
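
If the subtraction, like the comparison, were carried out over exact
rationals, the anomaly disappears.  A sketch using R5RS inexact->exact,
which loses nothing here because (expt 2. 100) is an exactly
representable integer:

  (zero? (- (inexact->exact (expt 2. 100))  ; exact 2^100
            (+ (expt 2 100) 1)))
  => (zero? -1)
  => #f                                     ; consistent with = above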

When one uses IEEE arithmetic, positive infinity, negative
infinity, and various Not-a-Numbers (here denoted by +inf,
-inf, and NaN) are perfectly good floating-point values, with
well-defined rules of combination, so that +, -, *, and / are
closed.  For example,

(/ 1.0 0.0) => +inf

Using Section 12.1.4.1, one must assume that

(/ 1 0.0) => (/ 1.0 0.0) => +inf

yet, because IEEE floats have both +0.0 and -0.0, one
doesn't know whether

(/ 1.0 0) => (/ 1.0 0.0) => +inf

or

(/ 1.0 0) => (/ 1.0 -0.0) => -inf

We also obviously have, 

(< 1.0 +inf) => true

since both are floats, yet, if we try to evaluate

(< 1 +inf)

we find that +inf cannot be converted to an exact rational.
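
One way out is to compare without converting, dispatching on the
special values first.  A sketch; the +inf.0 and -inf.0 literals and
the NaN test are assumptions, as R5RS has no notation for these values:

  ;; is the exact rational q less than the flonum x?
  (define (exact<flonum? q x)
    (cond ((= x +inf.0) #t)                 ; everything is below +inf
          ((= x -inf.0) #f)                 ; nothing is below -inf
          ((not (= x x)) #f)                ; x is a NaN: unordered
          (else (< q (inexact->exact x))))) ; x finite: compare exactly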

The complex contagion rule also poses difficulties. For
example

(* 1. #c(+inf +inf)) => (* #c(1.0 0) #c(+inf +inf))
                     => #c((- (* 1.0 +inf) (* 0 +inf))
                           (+ (* 0 +inf) (* 1.0 +inf)))
                     => #c((- (* 1.0 +inf) (* 0.0 +inf)); float contagion
                           (+ (* 0.0 +inf) (* 1.0 +inf)))
                     => #c((- +inf NaN)
                           (+ NaN +inf)) ; since (* 0.0 +inf) => NaN
                     => #c(NaN NaN) ; since (op NaN x) => NaN

which is probably not what one wants; the problem remains with
(* 1 #c(+inf +inf)).
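
The NaNs come entirely from the manufactured zero imaginary part.
Scaling the components directly, without promoting the real to a
complex, avoids them; a sketch in Scheme notation, with
make-rectangular in place of CL's #c:

  ;; real r times complex z, componentwise: no (* 0 +inf) terms arise
  (define (real*complex r z)
    (make-rectangular (* r (real-part z))
                      (* r (imag-part z))))

  (real*complex 1. (make-rectangular +inf +inf))
  => #c(+inf +inf)                          ; rather than #c(NaN NaN)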

The CL standard seems to deal well with the difference between
+0.0 and -0.0 with branch cuts of complex transcendental functions,
so it would seem that Scheme should emulate CL in this regard.

I would prefer, however, that the contagion rules be consistent---that
all arguments be converted to the most precise format of any
argument, so that if one argument is exact, then all arguments are
converted to exact arguments.  I think I would prefer that +inf, -inf,
and NaN be considered exact for certain rules (i.e., symbolic numbers
in some sense) so (< 1 +inf) makes sense.  It also makes sense to
me that dividing anything by exact zero yields an error, even if
the other argument is a float, and that multiplying any number
other than a NaN by exact zero yields zero.
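
A sketch of those two rules as wrappers; the nan? helper and the
error call are assumptions, not R5RS:

  (define (nan? x) (not (= x x)))           ; assumed helper

  ;; exact zero annihilates anything but a NaN
  (define (my* x y)
    (cond ((and (eqv? x 0) (not (nan? y))) 0)
          ((and (eqv? y 0) (not (nan? x))) 0)
          (else (* x y))))

  ;; dividing by exact zero is always an error
  (define (my/ x y)
    (if (eqv? y 0)
        (error "division by exact zero" x y)
        (/ x y)))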

Any comments from the R5RS or CL folks?

Brad Lucier   ······@math.purdue.edu

From: Keith Wright
Subject: Re: numerical issues
Date: 
Message-ID: <ykra2x2jmx.fsf@tiac.net>
······@polya.math.purdue.edu (Brad Lucier) writes:

> I have some questions about how to use IEEE arithmetic in Scheme/Lisp.
>
>     <many examples of alleged difficulties with equality
>              comparisons of inexact numbers and division by zero>
> 
> I would prefer, however, that the contagion rules be consistent---that
>     <certain rules be followed>
>
> Any comments from the R5RS or CL folks?
> 
> Brad Lucier   ······@math.purdue.edu

I think the Scheme report purposely allows for some latitude in
the implementation of inexact numbers.  After all, they are not
exact anyway, and too tight a specification could be very
troublesome to implement on certain machines, without
much advantage.

If you are implementing Scheme you are free to follow your own
reasonable rules in dealing with these cases, but if you are
trying to write portable Scheme programs you cannot assume
that other implementors, working with other machines and
other objectives, follow exactly the same rules for inexact
numbers.

Both comparing inexact numbers for equality and dividing by
zero should simply be avoided.
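
In portable code, the usual substitute for = on inexact numbers is a
relative-tolerance test; a sketch, with the tolerance left to the
problem at hand:

  ;; true when |a - b| <= tol * max(|a|, |b|, 1)
  (define (approx=? a b tol)
    (<= (abs (- a b))
        (* tol (max (abs a) (abs b) 1))))

  (approx=? (sqrt 2.) (/ 2. (sqrt 2.)) 1e-12)  ; => #t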
-- 
     --Keith

This mail message sent by GNU emacs and Linux.
Power to the people. Linux is here.
Food, Shelter, Source code.
From: Brad Lucier
Subject: Re: numerical issues
Date: 
Message-ID: <6h92dt$aje@mozo.cc.purdue.edu>
Keith Wright <·······@tiac.net> wrote:

> If you are implementing Scheme you are free to follow your own
> reasonable rules in dealing with these cases, but if you are
> trying to write portable Scheme programs you can not assume
> that other implementors working with other machines with
> other objective follow exactly the same rules for inexact
> numbers.

Actually, I'm beginning to believe that portable programs should
not mix exact and inexact arithmetic, or real and complex arithmetic,
without explicitly using exact->inexact, make-rectangular, and
friends.
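
That is, make every boundary crossing explicit, so the program rather
than the contagion rules decides where exactness is given up.  A
sketch:

  (define one-third (/ 1 3))              ; exact 1/3, no rounding yet

  (* one-third 3)                         ; stays exact => 1
  (* (exact->inexact one-third) 3.)       ; goes inexact on purpose
  (make-rectangular 1. 0.)                ; goes complex on purpose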

> Both comparing inexact number for equality and division by
> zero should simply be avoided.

This claim has been made before in these groups, but as a numerical 
analyst, I disagree.  The semantics of these operations are well-defined
in IEEE arithmetic, and can be useful.  I'm just trying to see where
the language-specific rules and bindings might come back to bite me.

Here's another interesting example following ANSI CL rules: If you
have single precision and double precision floats, with short-float-infinity
and long-float-infinity, will short-float-infinity be mapped to
long-float-infinity when a short float is widened to a long float?
If so, then

(< (expt 2.0f0 200) (expt 2.0d0 400))
=> (< short-float-infinity (expt 2.0d0 400)) ; since 2^200 overflows in short
=> (< long-float-infinity (expt 2.0d0 400))
=> #f ; since 2^400 is representable in IEEE double precision

Brad Lucier    ······@math.purdue.edu
From: Barry Margolin
Subject: Re: numerical issues
Date: 
Message-ID: <BdVZ.28$zF3.668539@cam-news-reader1.bbnplanet.com>
In article <··········@mozo.cc.purdue.edu>,
Brad Lucier <······@polya.math.purdue.edu> wrote:
>Actually, I'm beginning to believe that portable programs should
>not mix exact and inexact arithmetic, or real and complex arithmetic,
>without explicitly using exact->inexact, make-rectangular, and
>friends.

I agree.  I think the rules specified in CL are intended to be appropriate
for casual arithmetic, with the intent to avoid losing the precision
afforded by rationals as much as possible.  Numerically-intense programs
should not depend on automatic conversions, though -- they should use
explicit coercions so that you'll know for sure where accuracy is being
lost.

There was a debate sometime last year about the rule where mixing different
float formats coerces to the largest format.  The intent of this is to
avoid losing precision (by dropping low-order bits of the longer argument),
but widening the shorter one makes it seem more precise than it actually
is.  There's no single rule that's appropriate in all cases; if you want
control over this, do it yourself.

-- 
Barry Margolin, ······@bbnplanet.com
GTE Internetworking, Powered by BBN, Cambridge, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
From: William D Clinger
Subject: Re: numerical issues
Date: 
Message-ID: <353FD689.12103164@ccs.neu.edu>
······@polya.math.purdue.edu (Brad Lucier) wrote a lot of stuff about
the rules for numerical computation in Common Lisp, Scheme, and their
interaction with the rules for IEEE floating point arithmetic.  I will
use Scheme's terminology to respond to some of his points.

The IEEE floating point standards say nothing whatsoever about arithmetic
operations that combine a floating point number with a non-floating point
number.  Those standards are very hardware-oriented, and most hardware
just isn't capable of mixed-mode arithmetic.

The philosophy behind Scheme's rules for exact and inexact numbers is
that exact results can be trusted, unless the computation that led to
the exact result involved an explicit use of INEXACT->EXACT or the use
of a predicate on inexact arguments.

Common Lisp's rules for floating point arithmetic are much more specific
than Scheme's rules for inexact arithmetic.  The R5RS just says that,
for computations involving inexact numbers,

    ...it is the duty of each implementation to make the result as
    close as practical to the mathematically ideal result.

Lucier's examples assume IEEE double precision arithmetic.

> (= (expt 2. 100) (+ (expt 2 100) 1)) => #f
> 
> yet
> 
> (zero? (- (expt 2. 100) (+ (expt 2 100) 1))) => #t

The first result is correct, but the second result is not the one that
is requested (but not required) by the R5RS.

(expt 2. 100) evaluates to #i1267650600228229401496703205376, which
is not equal to 1267650600228229401496703205377.

Since -1.0 is the mathematically ideal result of
(- (expt 2. 100) (+ (expt 2 100) 1)), the R5RS says that

    (zero? (- (expt 2. 100) (+ (expt 2 100) 1)))

should evaluate to #f, not to #t.  Scheme48 gets this right, but it is
the only implementation I have found that does.  This is probably a
common bug in implementations of Scheme.

> ...if we try to evaluate
> 
> (< 1 +inf)
> 
> we find that +inf cannot be converted to an exact rational.

This may be a problem with the contagion rules for Common Lisp, but there
is no such problem in Scheme.  In the implementations I tried, (< 1 +inf)
evaluated to the mathematically ideal result #t.

Lucier noted a problem with the complex contagion rule in Common Lisp.
The mathematically ideal result for

    (let ((z (make-rectangular +inf +inf)))
      (* 1. z))

is z itself.  Chez Scheme gets this right, but Larceny 0.32 gets it wrong.

> The CL standard seems to deal well with the difference between
> +0.0 and -0.0 with branch cuts of complex transcendental functions,
> so it would seem that Scheme should emulate CL in this regard.

I agree.  The R5RS cites both Common Lisp and Penfield's article on
branch cuts for APL when it says

    The above specification follows [Common Lisp], which in turn
    cites [Penfield]; refer to these sources for more detailed
    discussion of branch cuts, boundary conditions, and implementation
    of these functions.

I don't have my copy of the IEEE/ANSI Scheme standard with me, but it
contains similar language.

> I would prefer, however, that the contagion rules be consistent---that
> all arguments be converted to the most precise format of any
> argument, so that if one argument is exact, then all arguments are
> converted to exact arguments.

According to the R5RS, it appears that the best thing to do when an
arithmetic operation involves an inexact number is to convert all
operands to exact numbers (provided this can be done without any loss
of accuracy, which depends on the implementation) and to perform the
operation on those exact numbers, converting the result to an inexact
number.  When all of the operands are inexact reals and are represented
as IEEE double precision floating point numbers, then IEEE floating
point arithmetic achieves the same effect without all the conversions.
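
A sketch of that recipe as a wrapper; it assumes every inexact operand
is a finite flonum, so that inexact->exact loses nothing (infinities
and NaNs would need separate treatment):

  ;; do the arithmetic exactly, then round once at the end
  (define (exactly-then-round op . args)
    (exact->inexact (apply op (map inexact->exact args))))

  (zero? (exactly-then-round - (expt 2. 100)
                               (+ (expt 2 100) 1)))
  => #f                                   ; the exact -1 survives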

> I think I would prefer that +inf, -inf,
> and NaN be considered exact for certain rules (i.e., symbolic numbers
> in some sense) so (< 1 +inf) makes sense.

(< 1 +inf) makes sense even if +inf is considered inexact, and most
implementations of Scheme seem to get this right.

> It also makes sense to
> me that dividing anything by exact zero yields an error, even if
> the other argument is a float, and that multiplying any number
> other than a NaN by exact zero yields zero.

You would like Chez Scheme, which has precisely this behavior.

Will
From: Niels Möller
Subject: Re: numerical issues
Date: 
Message-ID: <nnyaw9sbi3.fsf@sally.lysator.liu.se>
·······@cygnus.com (Per Bothner) writes:

> I guess my question is:  What is "practical"?
> 
> There are two choices:
> [INEXACT]  Convert exact arguments to inexact, and do an inexact operation.
> [EXACT]  Convert inexact arguments to exact, do an exact operation, and
>   convert the result to inexact.
> 
> Clearly both are equally powerful, and either can simulate the other,
> given appropriate insertion of exact->inexact and inexact->exact
> conversions.  Clearly also [EXACT] will on the whole yield a more precise
> result.  But does it justify the extra cost?  One question is what
> would an experienced programmer expect?  I understood that the result
> should be calculated using [INEXACT] arithmetic, and I suspect I have
> many with me.

I think it makes sense not to pretend to have more precision or
exactness than is actually present in the operands. This is even more
important if you have floating-point numbers of vastly varying
precision. If you multiply a 300-bit inexact floating-point number by
an inexact 30-bit floating-point number, it's just a waste of time and
storage to use 300 bits for the result, and to continue to use 300
bits of precision for all calculations that depend on this result.

For a non-Lisp example, have a look at PARI, a package for number
theory and computer algebra which, among other things, supports
floating-point numbers of arbitrary precision. I cite the manual:

  \subsec{The PARI philosophy}.

  The basic philosophy which governs PARI is that operations and
  functions should first give as exact a result as possible, and
  second, be permitted if they make any kind of sense.
  
  More specifically, if you do an operation (not a transcendental one)
  between exact objects, you will get an exact object. For example,
  dividing 1 by 3 does not give $0.33333\cdots$ as you could expect,
  but simply the rational number $(1/3)$. If you really want the
  result in type real, evaluate $1./3$, or add $0.$ to $(1/3)$.
  
  The result of operations between imprecise objects will be as
  precise as possible. Consider for example one of the most difficult
  cases, that is the addition of two real numbers $x$ and $y$. The
  \idx{accuracy} of the result is {\it a priori\/} unpredictable; it
  depends on the precisions of $x$ and $y$, on their sizes (i.e.~their
  exponents), and also on the size of $x+y$. PARI works out
  automatically the right precision for the result, even when it is
  working in calculator mode GP where there is a \idx{default
  precision}. See the technical reference manual for the precise
  formulas giving the precision of the result.

I think this approach makes a lot of sense, even if it seems
unorthodox for ordinary computer languages. PARI also handles other
inexact objects, for example power series. Here the rules are
simpler, but follow from the same principle:

  1+x+x^5+O(x^6) + 1+x^3+O(x^4)

evaluates to

  2+x+x^3+O(x^4)

To include additional terms and drop the lower-order ordo-term, like

  2+x+x^3+x^5+O(x^6)

would simply be incorrect. To include all terms, like

  2+x+x^3+x^5+O(x^6)+O(x^4)

would not make much sense, as the uncertainty in the O(x^4) term
dominates the other higher order terms.
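
The truncation rule is simple to state in code.  A sketch, representing
a series by its coefficient list paired with the order of its
ordo-term:

  (define (take lst n)                    ; first n elements of lst
    (if (zero? n)
        '()
        (cons (car lst) (take (cdr lst) (- n 1)))))

  ;; a series is (coeffs . order): c0 .. c_(order-1), then O(x^order)
  (define (series-add s1 s2)
    (let ((order (min (cdr s1) (cdr s2)))) ; coarser ordo-term wins
      (cons (map + (take (car s1) order) (take (car s2) order))
            order)))

  ;; (1+x+x^5+O(x^6)) + (1+x^3+O(x^4)):
  (series-add '((1 1 0 0 0 1) . 6) '((1 0 0 1) . 4))
  => ((2 1 0 1) . 4)                      ; i.e. 2+x+x^3+O(x^4)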

/Niels