Actual Lisp Question - Interpretation of numeric tokens

From: Frank A. Adrian
Subject: Actual Lisp Question - Interpretation of numeric tokens
Date: Wed, 29 Nov 2000 00:00:00 +0000
Message-ID: <55mV5.2745$Ta5.439722@news.uswest.net>

In Section 2.3.1 of the HyperSpec, the production for exponent shows that
the exponent's digit can be any digit in the current input radix.  However
in Section 2.3.2.2, the spec states that the mantissa and fraction can be
followed by "an optional *decimal* exponent".  My assumptions on the
interpretation of floating point numbers would be as follows:

(1) The use of digit instead of decimal-digit in the exponent production in
Section 2.3.1 is an error (or at least an oversight).

(2) All digits within a floating point number, including the digits of the
exponent, are interpreted in radix 10 regardless of the value of *read-base*
(a corollary of assumption 1).

Are these assumptions correct?

faa

From: Kent M Pitman
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Thu, 30 Nov 2000 00:00:00 +0000
Message-ID: <sfwaeahh73a.fsf@world.std.com>

"Frank A. Adrian" <·······@uswest.net> writes:

> In Section 2.3.1 of the HyperSpec, the production for exponent shows that
> the exponent's digit can be any digit in the current input radix.  However
> in Section 2.3.2.2, the spec states that the mantissa and fraction can be
> followed by "an optional *decimal* exponent".  My assumptions on the
> interpretation of floating point numbers would be as follows:
> 
> (1) The use of digit instead of decimal-digit in the exponent production in
> Section 2.3.1 is an error (or at least an oversight).
> 
> (2) All digits within a floating point number, including the digits of the
> exponent, are interpreted in radix 10 regardless of the value of *read-base*
> (a corollary of assumption 1).
> 
> Are these assumptions correct?

First, of course, individuals can't opine officially on what the 
interpretation is.  Anyone's answer is just an opinion, and wouldn't
stop an implementation from making any other reasonable/defensible
interpretation dealing with the same data.

If what you say is correct, and I trust you've looked at it carefully,
then I suspect your suspicions as to original group intent are correct.

I do know that if you sit down to implement a floating point parser of
your own devising, you quickly run headlong into the issue of whether
floats should by right observe *read-base*, but if you've ever dealt
with other radixes and radix points, you quickly realize that 2.4 base
8 being 2.5 base 10 is very visually confusing.  It's not that it's wrong
or doesn't fall naturally out of the notation--it's just that people are
very, very used to seeing decimal points be expressed in decimal.  And
unlike integers, where (at least for me) in grade school you get drilled
on base 3 or base 8 or base 15 over and over until you understand it,
I had never seen base 3 decimals in school, so I had no intuitions
about them.  So in all implementations I can think of (and, I had thought,
in the language itself), the standard float parser uses decimal only
for floats, independent of *read-base*, for better or worse.
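
To make the arithmetic concrete: read positionally, 2.4 in base 8 is
2 + 4/8, which is 2.5 in base 10.  The sketch below is a hypothetical
helper (PARSE-RADIX-FRACTION is not part of Common Lisp, and the standard
reader does not behave this way).  It interprets the digits on both sides
of the point in one radix and returns an exact rational.

```lisp
;; Hypothetical helper, for illustration only: the standard reader
;; always treats a token with an embedded point as a decimal float.
(defun parse-radix-fraction (string radix)
  "Read STRING (e.g. \"2.4\") positionally in RADIX; return a rational."
  (let* ((dot (position #\. string))
         (int-part (subseq string 0 dot))                    ; digits before the point
         (frac-part (if dot (subseq string (1+ dot)) "")))   ; digits after it
    (+ (if (string= int-part "")
           0
           (parse-integer int-part :radix radix))
       (if (string= frac-part "")
           0
           ;; n fraction digits are worth their integer value / radix^n
           (/ (parse-integer frac-part :radix radix)
              (expt radix (length frac-part)))))))

(parse-radix-fraction "2.4" 8)           ; => 5/2
(float (parse-radix-fraction "2.4" 8))   ; => 2.5
(parse-radix-fraction "2.5" 10)          ; => 5/2
```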

I certainly doubt that a mixed interpretation (like 10.5 base 8 meaning
8.5 base 10) would be anyone's desire.  And I also doubt that an 
interpretation where only certain digits were available in spite of a
potentially >10 radix being active would be anyone's desire.  Had we
intended this to respect *read-base*, I *know* someone would have
complained, and there'd be a *float-respects-*read-base** option 
variable, default NIL. :-)  Moreover, there'd be examples of how to print
these since surely printing would have to match.  So again that argues
for a normative interpretation as decimal only.

If you find any implementations that deviate, let me know.  But usually
when all implementations agree, there's not a crisis.  It's just assumed
to be an obvious typo.

From: Frank A. Adrian
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Thu, 30 Nov 2000 00:00:00 +0000
Message-ID: <UruV5.1572$aH5.82958@news.uswest.net>

"Kent M Pitman" <······@world.std.com> wrote in message
····················@world.std.com...
> If you find any implementations that deviate, let me know.  But usually
> when all implementations agree, there's not a crisis.  It's just assumed
> to be an obvious typo.

I've actually tried it on two or three implementations (including the public
versions of the two leading commercial ones) and they all read the number in
radix 10 regardless of read-base.  Usually one does not have access to all
implementations, so one is never sure that they all agree.  I was just
verifying.  Thanks for the "opinion" :-).

BTW, I have seen behaviors differ WRT the interpretation of tokens when
*readtable-case* was set to :INVERT and there were escaped characters in the
token string.  Assume the token consisted of the characters AB\qC.
According to the standard, escaped characters are NOT to be taken into
account when it is decided whether character case-inversion is done
(specifically stated in Section 23.1.2).   So the interpretation for such a
token would be the symbol abqc.  Similarly, the interpretation of the token
ab\qc should be the symbol ABqC and that of the token aB\qc should be the
symbol aBqc.  Sadly, some of the implementations out there seem to just look
at the characters post-escape and ignore the escape issue.
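
The three cases can be checked without touching the global readtable.
This is a minimal sketch, assuming an implementation that supports
(SETF READTABLE-CASE); READ-NAME-INVERTED is a made-up helper name.
Note that the backslash has to be doubled inside a Lisp string.

```lisp
;; Hypothetical helper: read TOKEN under :INVERT in a private copy of
;; the standard readtable and return the resulting symbol's name.
(defun read-name-inverted (token)
  (let ((*readtable* (copy-readtable nil)))   ; NIL means the standard readtable
    (setf (readtable-case *readtable*) :invert)
    (symbol-name (read-from-string token))))

;; Only the unescaped letters decide whether the case is inverted;
;; the escaped q is kept as written in every case.
(read-name-inverted "AB\\qC")   ; unescaped letters all upper => "abqc"
(read-name-inverted "ab\\qc")   ; unescaped letters all lower => "ABqC"
(read-name-inverted "aB\\qc")   ; unescaped letters mixed     => "aBqc"
```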

faa

From: Kent M Pitman
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Thu, 30 Nov 2000 00:00:00 +0000
Message-ID: <sfwk89lju5o.fsf@world.std.com>

"Frank A. Adrian" <·······@uswest.net> writes:

> Sadly, some of the implementations out there seem to just look
> at the characters post-escape and ignore the escape issue.

Sounds like a bug you should report to the specific vendors.

If some vendor claims to you that it's not a bug, then it's grist 
for our mill here...

From: Hannah Schroeter
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Thu, 30 Nov 2000 00:00:00 +0000
Message-ID: <905uc5$886$1@c3po.schlund.de>

Hello!

In article <····················@news.uswest.net>,
Frank A. Adrian <·······@uswest.net> wrote:
>[...]

>I've actually tried it on two or three implementations (including the public
>versions of the two leading commercial ones) and they all read the number in
>radix 10 regardless of read-base.  Usually one does not have access to all
>implementations, so one is never sure that they all agree.  I was just
>verifying.  Thanks for the "opinion" :-).

clisp:

[11]> (setf *read-base* 16)
16
[17]> (+ 8.0D0 8.0D0)
16.0d0

ecls:

> (setf *read-base* 16)
16
> (+ 8.0D0 8.0D0)
16.0d0

sbcl:

* (setf *read-base* 16)

16
* (+ 8.0D0 8.0D0)

16.0d0

Seems to conform with your experiences.

>BTW, I have seen behaviors differ WRT the interpretation of tokens when
>*readtable-case* was set to :INVERT and there were escaped characters in the
>token string.  Assume the token consisted of the characters AB\qC.
>According to the standard, escaped characters are NOT to be taken into
>account when it is decided whether character case-inversion is done
>(specifically stated in Section 23.1.2).   So the interpretation for such a
>token would be the symbol abqc.  Similarly, the interpretation of the token
>ab\qc should be the symbol ABqC and that of the token aB\qc should be the
>symbol aBqc.  Sadly, some of the implementations out there seem to just look
>at the characters post-escape and ignore the escape issue.

clisp and sbcl do as you expect, ecls complains this:

> (setf (readtable-case *readtable*) :invert)
Error: Cannot expand the SETF form (READTABLE-CASE *READTABLE*).
       Signalled by SETF.
;;; Warning: Clearing input from *debug-io*
Broken at SETF.

Regards,

Hannah.

From: Pierre R. Mai
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Thu, 30 Nov 2000 00:00:00 +0000
Message-ID: <87itp5gk2a.fsf@orion.bln.pmsf.de>

"Frank A. Adrian" <·······@uswest.net> writes:

> BTW, I have seen behaviors differ WRT the interpretation of tokens when
> *readtable-case* was set to :INVERT and there were escaped characters in the
> token string.  Assume the token consisted of the characters AB\qC.
> According to the standard, escaped characters are NOT to be taken into
> account when it is decided whether character case-inversion is done
> (specifically stated in Section 23.1.2).   So the interpretation for such a
> token would be the symbol abqc.  Similarly, the interpretation of the token
> ab\qc should be the symbol ABqC and that of the token aB\qc should be the
> symbol aBqc.  Sadly, some of the implementations out there seem to just look
> at the characters post-escape and ignore the escape issue.

Hmmm.  Of all the implementations I tested with the file below (loaded
uncompiled), only the GCL descendants and ACL 6.0 in modern mode
broke.  The GCL descendants don't seem to support readtable-case at
all, and ACL 6.0 in modern mode fails for the known reasons.  ACL 6.0
in ANSI mode works correctly, as do CLISP (both native and ansi
modes), CMU CL and LispWorks Linux.

Regs, Pierre.

Testfile:

(in-package :cl-user)

(setq *readtable* (copy-readtable))
(setf (readtable-case *readtable*) :invert)

(assert (string= (symbol-name 'AB\qC) "abqc"))
(assert (string= (symbol-name 'ab\qc) "ABqC"))
(assert (string= (symbol-name 'aB\qc) "aBqc"))

-- 
Pierre R. Mai <····@acm.org>                    http://www.pmsf.de/pmai/
 The most likely way for the world to be destroyed, most experts agree,
 is by accident. That's where we come in; we're computer professionals.
 We cause accidents.                           -- Nathaniel Borenstein

From: Barry Margolin
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Thu, 30 Nov 2000 00:00:00 +0000
Message-ID: <4DvV5.14$DL1.94@burlma1-snr2>

In article <·····················@news.uswest.net>,
Frank A. Adrian <·······@uswest.net> wrote:
>In Section 2.3.1 of the HyperSpec, the production for exponent shows that
>the exponent's digit can be any digit in the current input radix.  However
>in Section 2.3.2.2, the spec states that the mantissa and fraction can be
>followed by "an optional *decimal* exponent".  My assumptions on the
>interpretation of floating point numbers would be as follows:
>
>(1) The use of digit instead of decimal-digit in the exponent production in
>Section 2.3.1 is an error (or at least an oversight).
>
>(2) All digits within a floating point number, including the digits of the
>exponent, are interpreted in radix 10 regardless of the value of *read-base*
>(a corollary of assumption 1).
>
>Are these assumptions correct?

I think this was the intent.  See section 2.3.2.2, where the word "decimal"
appears frequently (although not consistently).  Also, the dictionary entry
for *READ-BASE* says:

    The value of *read-base*, called the current input base, is the radix
    in which integers and ratios are to be read by the Lisp reader. The
    parsing of other numeric types (e.g., floats) is not affected by this
    option.
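
A quick way to see that distinction at the REPL is sketched below.
READ-IN-BASE is a made-up helper name, and the last check only shows what
an implementation does, under the assumption that it follows the
decimal-exponent reading discussed in this thread.

```lisp
;; Hypothetical helper: read one object from STRING with *READ-BASE*
;; bound to BASE, leaving the global value untouched.
(defun read-in-base (string base)
  (let ((*read-base* base))
    (values (read-from-string string))))

(read-in-base "10" 16)     ; integer, read in base 16 => 16
(read-in-base "1/10" 16)   ; ratio, read in base 16   => 1/16
(read-in-base "10." 16)    ; trailing point forces decimal => 10
(read-in-base "10.5" 16)   ; float, unaffected by *read-base* => 10.5

;; If the exponent digits were read in base 16, this would be 1.0e16.
;; The literal 1.0e10 on the right is read in the default base 10.
(= (read-in-base "1.0e10" 16) 1.0e10)
```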

-- 
Barry Margolin, ······@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

From: Johan Kullstam
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Thu, 30 Nov 2000 00:00:00 +0000
Message-ID: <m366l5je9z.fsf@sysengr.res.ray.com>

Barry Margolin <······@genuity.net> writes:

> In article <·····················@news.uswest.net>,
> Frank A. Adrian <·······@uswest.net> wrote:
> >In Section 2.3.1 of the HyperSpec, the production for exponent shows that
> >the exponent's digit can be any digit in the current input radix.  However
> >in Section 2.3.2.2, the spec states that the mantissa and fraction can be
> >followed by "an optional *decimal* exponent".

>     The value of *read-base*, called the current input base, is the radix
>     in which integers and ratios are to be read by the Lisp reader. The
>     parsing of other numeric types (e.g., floats) is not affected by this
>     option.

this prompts me to ask whether there is a good way to print
and read back floating point values such that

1) precision is preserved
2) reading and writing is reasonably fast
3) somewhat machine independent

the usual decimal radix float notation does 3) at the expense of the
other two.

for example, octave provides a hex format

jk:1> format hex
jk:2> pi
pi = 400921fb54442d18

which strikes me as a good idea.  does lisp have a similar thing?  i
couldn't find anything in a quick perusal of the hyperspec
(*print-radix* and such being for integers only).

-- 
J o h a n  K u l l s t a m
[········@ne.mediaone.net]
sysengr

From: Raymond Toy
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Thu, 30 Nov 2000 00:00:00 +0000
Message-ID: <4n1yvtjc1s.fsf@rtp.ericsson.se>

>>>>> "Johan" == Johan Kullstam <········@ne.mediaone.net> writes:

    Johan> this prompts me to ask the question of is there a good way to print
    Johan> and read back floating point values such that

    Johan> 1) precision is preserved
    Johan> 2) reading and writing is reasonably fast
    Johan> 3) somewhat machine independent

    Johan> the usual decimal radix float notation does 3) at the expense of the
    Johan> other two.

    Johan> for example, octave provides a hex format

    Johan> jk:1> format hex
    Johan> jk:2> pi
    Johan> pi = 400921fb54442d18

    Johan> which strikes me as a good idea.  does lisp have a similar thing?  i
    Johan> couldn't find anything in a quick perusal of the hyperspec
    Johan> (*print-radix* and such being for integers only).

This only works if everyone uses the same format for floating-point
numbers.  If you don't care, and assume everyone uses IEEE
floating-point and that single/double float all mean exactly the same
thing, then octave's idea is reasonable.  

I think for true portability, you could create a structure containing
an integer exponent, the mantissa as an integer, and the sign of the
number and the type of the number (short, single, double, long).
Representing NaN or infinities needs to be solved in some way, if
you're using IEEE.

Ray

From: Gareth McCaughan
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Sun, 03 Dec 2000 00:00:00 +0000
Message-ID: <slrn92lg6g.ir.Gareth.McCaughan@g.local>

Raymond Toy wrote:
>     Johan> this prompts me to ask the question of is there a good way to print
>     Johan> and read back floating point values such that
> 
>     Johan> 1) precision is preserved
>     Johan> 2) reading and writing is reasonably fast
>     Johan> 3) somewhat machine independent
> 
>     Johan> the usual decimal radix float notation does 3) at the expense of the
>     Johan> other two.
> 
>     Johan> for example, octave provides a hex format
> 
>     Johan> jk:1> format hex
>     Johan> jk:2> pi
>     Johan> pi = 400921fb54442d18
> 
>     Johan> which strikes me as a good idea.  does lisp have a similar thing?
>     Johan> i couldn't find anything in a quick perusal of the hyperspec
>     Johan> (*print-radix* and such being for integers only).
> 
> This only works if everyone uses the same format for floating-point
> numbers.  If you don't care, and assume every one uses IEEE
> floating-point and that single/double float all mean exactly the same
> thing, then octave's idea is reasonable.  
> 
> I think for true portability, you could create a structure containing
> an integer exponent, the mantissa as an integer, and the sign of the
> number and the type of the number (short, single, double, long).
> Representing NaN or infinities needs to be solved in some way, if
> you're using IEEE.

.. suggesting an implementation along the following lines:

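    (defun write-fp (x &optional (stream t))
      "Write float X to STREAM as [-]significand:radix:exponent:digits, in hex."
      ;; INTEGER-DECODE-FLOAT returns the significand and exponent as
      ;; integers plus a sign of 1 or -1, so the conversion is exact.
      (multiple-value-bind (significand exponent sign)
                           (integer-decode-float x)
        (let ((radix (float-radix x))
              (digits (float-digits x)))
          (format stream "~A~16R:~16R:~16R:~16R"
                         (if (< sign 0) "-" "")
                         significand radix exponent digits))))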

Reconstructing a floating-point number from this information
will be painless (using SCALE-FLOAT) if the reading system
and the writing system use the same radix. If they don't,
life is not so easy, but Johan asked only for "somewhat
machine independent". :-)
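
For the same-radix case, the reading side might look like the sketch
below.  FP-STRING, SPLIT-ON-COLON and READ-FP are made-up names;
FP-STRING only mirrors the format string of WRITE-FP so that the block
stands alone.  The sketch assumes that the digits field is enough to pick
the local float type, and it makes no attempt at NaNs or infinities.

```lisp
;; Same output format as WRITE-FP, but returned as a string.
(defun fp-string (x)
  (multiple-value-bind (significand exponent sign) (integer-decode-float x)
    (format nil "~A~16R:~16R:~16R:~16R"
            (if (< sign 0) "-" "")
            significand (float-radix x) exponent (float-digits x))))

;; Split "a:b:c" into ("a" "b" "c").
(defun split-on-colon (string)
  (loop with start = 0
        for pos = (position #\: string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun read-fp (string)
  "Rebuild a float from FP-STRING output; signals an error on a radix mismatch."
  (let* ((negative (and (plusp (length string))
                        (char= (char string 0) #\-)))
         (fields (split-on-colon (if negative (subseq string 1) string))))
    (destructuring-bind (significand radix exponent digits)
        (mapcar (lambda (field) (parse-integer field :radix 16)) fields)
      ;; Assumption: the first local float type with the same number of
      ;; significand digits is the right one to reconstruct into.
      (let ((prototype (find digits (list 1.0s0 1.0f0 1.0d0 1.0l0)
                             :key #'float-digits)))
        (unless (and prototype (= radix (float-radix prototype)))
          (error "No local float type with radix ~D and ~D digits."
                 radix digits))
        ;; The significand fits the chosen type exactly, so FLOAT and
        ;; SCALE-FLOAT lose nothing for ordinary (normalized) values.
        (let ((magnitude (scale-float (float significand prototype) exponent)))
          (if negative (- magnitude) magnitude))))))

;; Round trip: the value read back is EQL to the one written.
(eql (read-fp (fp-string -0.1f0)) -0.1f0)
(eql (read-fp (fp-string 1.5d0)) 1.5d0)
```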

It might be worth adjusting the syntax a little so that you
can use a reader macro for reconstructing numbers.

-- 
Gareth McCaughan  ················@pobox.com
sig under construc

From: Johan Kullstam
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Mon, 04 Dec 2000 00:09:21 +0000
Message-ID: <m2itp13v0a.fsf@euler.axel.nom>

················@pobox.com (Gareth McCaughan) writes:

> Raymond Toy wrote:
> >     Johan> this prompts me to ask the question of is there a good way to print
> >     Johan> and read back floating point values such that
> > 
> >     Johan> 1) precision is preserved
> >     Johan> 2) reading and writing is reasonably fast
> >     Johan> 3) somewhat machine independent
> > 
> >     Johan> the usual decimal radix float notation does 3) at the expense of the
> >     Johan> other two.
> > 
> >     Johan> for example, octave provides a hex format
> > 
> >     Johan> jk:1> format hex
> >     Johan> jk:2> pi
> >     Johan> pi = 400921fb54442d18
> > 
> >     Johan> which strikes me as a good idea.  does lisp have a similar thing?
> >     Johan> i couldn't find anything in a quick perusal of the hyperspec
> >     Johan> (*print-radix* and such being for integers only).
> > 
> > This only works if everyone uses the same format for floating-point
> > numbers.  If you don't care, and assume every one uses IEEE
> > floating-point and that single/double float all mean exactly the same
> > thing, then octave's idea is reasonable.  
> > 
> > I think for true portability, you could create a structure containing
> > an integer exponent, the mantissa as an integer, and the sign of the
> > number and the type of the number (short, single, double, long).
> > Representing NaN or infinities needs to be solved in some way, if
> > you're using IEEE.
> 
> .. suggesting an implementation along the following lines:
> 
>     (defun write-fp (x &optional (stream t))
>       (multiple-value-bind (significand exponent sign)
>                            (integer-decode-float x)
>         (let ((radix (float-radix x))
>               (digits (float-digits x)))
>           (format stream "~A~16R:~16R:~16R:~16R"
>                          (if (< sign 0) "-" "")
>                          significand radix exponent digits))))
> 
> Reconstructing a floating-point number from this information
> will be painless (using SCALE-FLOAT) if the reading system
> and the writing system use the same radix. If they don't,
> life is not so easy, but Johan asked only for "somewhat
> machine independent". :-)

yes this looks good.  my problem is that i often wish to save and
restore a large matrix of floating point numbers.  the size of the
file isn't terribly large, but reading and writing are slow since
conversion between a binary floating point number and a decimal
representation is fairly time consuming.

> It might be worth adjusting the syntax a little so that you
> can use a reader macro for reconstructing numbers.

nod.

-- 
J o h a n  K u l l s t a m
[········@ne.mediaone.net]
Don't Fear the Penguin!

From: Raymond Toy
Subject: Re: Actual Lisp Question - Interpretation of numeric tokens
Date: Mon, 04 Dec 2000 00:00:00 +0000
Message-ID: <4nwvdgfd9p.fsf@rtp.ericsson.se>

>>>>> "Johan" == Johan Kullstam <········@ne.mediaone.net> writes:

    Johan> yes this looks good.  my problem is that i often wish to save and
    Johan> restore a large matrix of floating point numbers.  the size of the
    Johan> file isn't terribly large, but reading and writing are slow since
    Johan> conversion between a binary floating point number and a decimal
    Johan> representation is fairly time consuming.

In that case, why not just save the data in binary form?  I used to
have some code somewhere that would take an IEEE float and read or
write it in binary format to a stream with element-type (unsigned-byte
8).  It was pretty portable and not too hard to write.  I'm pretty
sure the result was pretty fast too.
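
A rough sketch of that approach follows.  It is not the code mentioned
above, which was not posted; all four function names are made up.  It
assumes DOUBLE-FLOAT is IEEE 754 binary64, writes the 64-bit pattern
most significant byte first, and handles only zeros and normalized
numbers.  Denormals, NaNs and infinities signal an error here and would
need extra cases.

```lisp
(defun double-to-bits (x)
  "Return the IEEE binary64 bit pattern of the double X as an integer."
  (check-type x double-float)
  (multiple-value-bind (significand exponent sign) (integer-decode-float x)
    (let ((sign-bit (if (minusp sign) 1 0))
          ;; A 53-bit integer significand times 2^exponent: adjust by 52
          ;; for the point position and by 1023 for the IEEE bias.
          (biased (+ exponent 1075)))
      (cond ((zerop significand) (ash sign-bit 63))
            ((and (= (integer-length significand) 53) (< 0 biased 2047))
             (logior (ash sign-bit 63)
                     (ash biased 52)
                     (ldb (byte 52 0) significand)))   ; drop the implicit bit
            (t (error "Not zero or a normalized double: ~S" x))))))

(defun bits-to-double (bits)
  "Inverse of DOUBLE-TO-BITS for zeros and normalized numbers."
  (let ((negative (logbitp 63 bits))
        (biased (ldb (byte 11 52) bits))
        (fraction (ldb (byte 52 0) bits)))
    (cond ((and (zerop biased) (zerop fraction)) (if negative -0d0 0d0))
          ((< 0 biased 2047)
           (let ((magnitude (scale-float
                             (float (logior fraction (ash 1 52)) 1d0)
                             (- biased 1075))))
             (if negative (- magnitude) magnitude)))
          (t (error "Unsupported bit pattern: ~16,'0X" bits)))))

(defun write-double (x stream)
  "Write X as 8 bytes to STREAM, whose element type is (unsigned-byte 8)."
  (let ((bits (double-to-bits x)))
    (loop for shift from 56 downto 0 by 8
          do (write-byte (ldb (byte 8 shift) bits) stream))))

(defun read-double (stream)
  "Read 8 bytes from STREAM and return the double they encode."
  (let ((bits 0))
    (loop repeat 8
          do (setf bits (logior (ash bits 8) (read-byte stream))))
    (bits-to-double bits)))

;; The hex rendering matches the Octave output quoted earlier in the thread.
(format nil "~(~16,'0x~)" (double-to-bits (float pi 1d0)))
;; => "400921fb54442d18"
```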

Ray