Why does formatting double-floats create bignums?

From: David J Cooper Jr
Subject: Why does formatting double-floats create bignums?
Date: Sat, 10 May 2003 07:41:39 +0000
Message-ID: <m34r435jv0.fsf@mccarthy.genworks.com>

I am doing some profiling of my app on Allegro CL 6.2 (Redhat
Linux). It does a lot of outputting of double-float numbers.  When
inspecting the results of space profiling, I notice a large percentage
of space being consumed by "newbignum," which seems to happen mostly
as a result of outputting the double-float numbers. See the following
trace:

;;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

GDL-USER(29): (prof:with-profiling (:type :space) 
                (dotimes (n 10000) (format nil "~f" 0.1d0)))
NIL
GDL-USER(30): (prof:show-flat-profile)
Sampling stopped after 1145 samples taken for the space profiler.

Sample represents 9160.0 Kbytes of space allocated (out of a total of 9160.0)

Mem.s below 1.0% will be suppressed.

 %     %     self  total            self   total  Function
Mem.  Cum.  Kbyte  Kbyte    calls by/call by/call   name
51.2  51.2   4688   4688                          ... "newbignum"
 8.6  59.7    784    784                          ... "vector"
 7.7  67.4    704   1872                          ... EXCL::FORMAT-PARSER-F
 7.7  75.1    704    704                          ... APPEND
 6.9  82.0    632   5864                          ... EXCL::FORMAT-ENGINE
 5.1  87.1    464    464                          ... EXCL::INTERNAL-MAKE-STRING
 4.3  91.4    392   9160                          ... DOTIMES
 3.4  94.8    312   8688                          ... EXCL::%INVOKES
 1.7  96.5    160    160                          ... "new_double_float"

;;=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Now, 51.2% of the memory for this is apparently going to bignums. 

  1. Where are they coming from?

  2. Should I be concerned about them? (most of the memory apparently
     gets reclaimed pretty much immediately, but still isn't it making
     the GC work harder than might be necessary?)

  3. If I should be concerned, what can I do about it?  (I don't
     really need double-float precision for printing them out, only
     internally to maintain precision across arithmetic operations).

Thank you,

 -dave
David J Cooper Jr, Genworks International

From: Tim Bradshaw
Subject: Re: Why does formatting double-floats create bignums?
Date: Sat, 10 May 2003 12:37:41 +0000
Message-ID: <ey3bryb0yga.fsf@cley.com>

* David J Cooper Jr wrote:
>   2. Should I be concerned about them? (most of the memory apparently
>      gets reclaimed pretty much immediately, but still isn't it making
>      the GC work harder than might be necessary?)

Only if they dominate the time taken.  In particular, if this is doing
I/O (which I guess it is), how much time is spent actually writing
bits to the disk/network-stream/terminal? My guess is `much more than
anything else', though I could be wrong (good I/O systems can manage a
byte per instruction in some optimized cases, perhaps more, and some
of these can scale to multiple processor machines, leading to amazing
I/O throughputs).

--tim

From: David J Cooper Jr
Subject: Re: Why does formatting double-floats create bignums?
Date: Sat, 10 May 2003 17:25:27 +0000
Message-ID: <m3y91e4su0.fsf@mccarthy.genworks.com>

Tim Bradshaw <···@cley.com> writes:
> 
> Only if they dominate the time taken.  In particular, if this is doing
> I/O (which I guess it is), how much time is spent actually writing
> bits to the disk/network-stream/terminal? 
>

Tim,

 Thanks for the response. I am trying to move on to other things and
 not obsess on this. But I am curious why it has to create all those
 bignums just to print double-floats.

Check this out:

GDL-USER(55): (time  (dotimes (n 10000) (format nil "~f" 0.1d0)))
; cpu time (non-gc) 460 msec user, 0 msec system
; cpu time (gc)     0 msec user, 0 msec system
; cpu time (total)  460 msec user, 0 msec system
; real time  502 msec
; space allocation:
;  390,014 cons cells, 6,241,072 other bytes, 0 static bytes
NIL
GDL-USER(56): (time  (dotimes (n 10000) (format nil "~f" 0.1f0)))
; cpu time (non-gc) 360 msec user, 0 msec system
; cpu time (gc)     30 msec user, 0 msec system
; cpu time (total)  390 msec user, 0 msec system
; real time  405 msec
; space allocation:
;  390,017 cons cells, 1,367,600 other bytes, 0 static bytes
NIL
GDL-USER(57): 

This is not doing I/O, is it? It's just converting a bunch of floats to
strings. The difference in time is about 20%. In my real app I am
seeing more like a 30% time savings by coercing double-floats to
single-floats before formatting (as well as the approximately 80%
savings in "other bytes" you see here).

I think what I really need is a pointer to a practical guide on
dealing with numbers in general. Until now I have pretty much been
using numbers in CL as if the right thing is always just going to
happen by magic. Apparently it is time to start pulling my head out of
the sand.

Thanks,

 -dave
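
The coercion described above can be sketched as follows. This is only
an illustration: the function name FORMAT-AS-SINGLE is hypothetical,
and it assumes that every value fits in single-float range and that
roughly seven significant digits are enough for the printed output.

```lisp
;; Hypothetical helper, not part of the original post.
(defun format-as-single (x)
  "Format the real number X with ~F after coercing it to SINGLE-FLOAT.
Assumes X is within single-float range; digits beyond single-float
precision are lost before printing."
  (format nil "~f" (coerce x 'single-float)))

;; Example call:
;; (format-as-single 0.1d0)
```

A value outside single-float range (1d100, say) cannot be coerced and
will normally signal an error, so this shortcut is only safe when the
range of the data is known.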

From: Tim Bradshaw
Subject: Re: Why does formatting double-floats create bignums?
Date: Sat, 10 May 2003 18:43:05 +0000
Message-ID: <ey3of2a63t2.fsf@cley.com>

* David J Cooper Jr wrote:

> This is not doing I/O is it? It's just converting a bunch of floats to
> strings. The difference in time is about 20%. In my real app I am
> seeing more like a 30% time savings by coercing double-floats to
> single-floats before formatting (as well as the approx 80% savings in
> "other bytes" you see here).

OK, well a 20%-30% difference definitely isn't nothing!

My *guess* would be that it might be generating integers to print
either the whole number or just the decimal part, and that for doubles
this integer is a bignum.

--tim
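
One way to look at the guess above is to ask how wide the integer
significand of each float is. The sketch below uses the standard
function INTEGER-DECODE-FLOAT; the helper name SIGNIFICAND-BITS is
made up for illustration, and the bit counts in the comment assume
IEEE 754 single and double formats.

```lisp
;; Hypothetical helper: how many bits does the integer significand need?
(defun significand-bits (x)
  "Return the number of bits in the integer significand of the float X."
  (integer-length (nth-value 0 (integer-decode-float x))))

;; Assuming IEEE 754 floats:
;; (significand-bits 0.1f0) is 24 and (significand-bits 0.1d0) is 53.
```

On a 32-bit Lisp whose fixnums hold about 30 bits (an assumption about
the implementation in question), a 24-bit significand fits in a fixnum
while a 53-bit one does not, which would be consistent with the
bignum allocation showing up only for double-floats.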

From: William D Clinger
Subject: Re: Why does formatting double-floats create bignums?
Date: Mon, 12 May 2003 17:27:07 +0000
Message-ID: <b84e9a9f.0305120927.6f6c5905@posting.google.com>

David J Cooper Jr wrote:
>  Thanks for the response. I am trying to move on to other things and
>  not obsess on this. But i am curious why it has to create all those
>  bignums just to print double-floats.

Reading and printing floating point numbers is nontrivial.  Most CL
systems use some variation of the Dragon algorithm by Steele and White,
which makes extensive use of bignums.  See

    Guy L Steele, Jr and Jon L White.  How to print floating-point
    numbers accurately.  In Proceedings of the 1990 ACM Conference
    on Programming Language Design and Implementation, pages 112-126.

    William D Clinger.  How to Read Floating Point Numbers Accurately.
    In Proceedings of the 1990 ACM Conference on Programming Language
    Design and Implementation, pages 92-101.

My paper, which presents an algorithm for the inverse problem, is
online at ftp://ftp.ccs.neu.edu/pub/people/will/howtoread.ps

Will
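
To see why exact printing leans on large integers, here is a much
simplified sketch. It is not the Steele and White algorithm (it has no
shortest-output termination test); it only generates decimal digits
of the fractional part exactly. The name EXACT-FRACTION-DIGITS is
hypothetical.

```lisp
;; Simplified illustration only; not the Dragon algorithm itself.
(defun exact-fraction-digits (x count)
  "Return a list of the first COUNT decimal digits of the fractional
part of the non-negative float X, computed with exact integer arithmetic."
  (multiple-value-bind (significand exponent) (integer-decode-float x)
    ;; X equals SIGNIFICAND * 2^EXPONENT exactly; write it as R/S.
    (let* ((s (if (minusp exponent) (expt 2 (- exponent)) 1))
           (r (if (minusp exponent)
                  significand
                  (* significand (expt 2 exponent)))))
      (setf r (mod r s))                ; drop the integer part
      (loop repeat count
            collect (multiple-value-bind (digit rest) (floor (* 10 r) s)
                      (setf r rest)     ; carry the exact remainder forward
                      digit)))))

;; Example call:
;; (exact-fraction-digits 0.1d0 20)
```

For 0.1d0 the denominator S is 2^56 and the product (* 10 R) is larger
still, so on a Lisp with 30-bit fixnums every step works on bignums.
For very small doubles the denominator has hundreds of bits on any
implementation.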

From: Larry Hunter
Subject: Re: Why does formatting double-floats create bignums?
Date: Wed, 14 May 2003 04:47:04 +0000
Message-ID: <m3wugu8793.fsf@huge.uchsc.edu>

David Cooper Jr. writes:

 Now, 51.2% of the memory for this is apparently going to bignums. 

 GDL-USER(29): (prof:with-profiling (:type :space) 
                 (dotimes (n 10000) (format nil "~f" 0.1d0)))

Will Clinger explained why you see bignums as part of printing double
floats. I want to point out that your profiling isn't telling you what
you think it is since you didn't compile it. You are seeing a lot of
overhead from the interpreter. You should try something more like
this:

(defun format-float (n) 
  (declare (optimize space)) 
  (dotimes (i n) (format nil "~f" 0.1d0)))

(compile 'format-float)

(prof:with-profiling (:type :space)
  (format-float 10000))

In which case for ACL 6.2 on linux/intel you would see more like 75%
of the 6152 KB of space is going for newbignums (about 467 bytes per
iteration), with a little for "format-engine" and
"internal-make-string".

If you did single floats the same way, you would see 1408 KB of space
used, roughly half for format-engine and half for
internal-make-string.

Larry

From: Christophe Rhodes
Subject: Re: Why does formatting double-floats create bignums?
Date: Fri, 16 May 2003 14:11:14 +0000
Message-ID: <sq4r3v2d8d.fsf@lambda.jcn.srcf.net>

Larry Hunter <············@uchsc.edu> writes:

>  Now, 51.2% of the memory for this is apparently going to bignums. 
>
>  GDL-USER(29): (prof:with-profiling (:type :space) 
>                  (dotimes (n 10000) (format nil "~f" 0.1d0)))
>
> [...]
>
> (defun format-float (n) 
>   (declare (optimize space)) 
>   (dotimes (i n) (format nil "~f" 0.1d0)))
>
> (compile 'format-float)
>
> (prof:with-profiling (:type :space)
>   (format-float 10000))

I'm slightly surprised that no-one has pointed this out yet: I accept
that this is what is happening in this case, but I would nonetheless
counsel against profiling without having disassembled FORMAT-FLOAT to
check that it looks more-or-less as expected; in particular, would a
compiler not be justified in eliding the whole of the call to FORMAT
inside FORMAT-FLOAT, since it is in this guise side-effect-free and
hence completely flushable or, if you take the view that function call
is a side-effect, foldable to (COPY-SEQ "0.1")?

Christophe
-- 
http://www-jcsu.jesus.cam.ac.uk/~csr21/       +44 1223 510 299/+44 7729 383 757
(set-pprint-dispatch 'number (lambda (s o) (declare (special b)) (format s b)))
(defvar b "~&Just another Lisp hacker~%")    (pprint #36rJesusCollegeCambridge)
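
One way to make the benchmark harder to optimize away, sketched under
the assumption that the compiler cannot see the argument at compile
time: pass the float in as a parameter and return something computed
from every result. The name FORMAT-FLOATS is hypothetical.

```lisp
;; Hypothetical benchmark function, not from the thread.
(defun format-floats (x n)
  "Format X with ~F N times and return the total number of characters
produced, so that each FORMAT result is actually used."
  (let ((total 0))
    (dotimes (i n total)
      (declare (ignorable i))
      (incf total (length (format nil "~f" x))))))

(compile 'format-floats)

;; Example call:
;; (format-floats 0.1d0 10000)
```

Disassembling the compiled function is still the only way to be sure
what is being measured.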

From: Kent M Pitman
Subject: Re: Why does formatting double-floats create bignums?
Date: Fri, 16 May 2003 14:27:57 +0000
Message-ID: <sfw4r3v55le.fsf@shell01.TheWorld.com>

Christophe Rhodes <·····@cam.ac.uk> writes:

> Larry Hunter <············@uchsc.edu> writes:
> 
> >  Now, 51.2% of the memory for this is apparently going to bignums. 
> >
> >  GDL-USER(29): (prof:with-profiling (:type :space) 
> >                  (dotimes (n 10000) (format nil "~f" 0.1d0)))
> >
> > [...]
> >
> > (defun format-float (n) 
> >   (declare (optimize space)) 
> >   (dotimes (i n) (format nil "~f" 0.1d0)))
> >
> > (compile 'format-float)
> >
> > (prof:with-profiling (:type :space)
> >   (format-float 10000))
> 
> I'm slightly surprised that no-one has pointed this out yet: I accept
> that this is what is happening in this case, but I would nonetheless
> counsel against profiling without having disassembled FORMAT-FLOAT to
> check that it looks more-or-less as expected; in particular, would a
> compiler not be justified in eliding the whole of the call to FORMAT
> inside FORMAT-FLOAT, since it is in this guise side-effect-free and
> hence completely flushable or, if you take the view that function call
> is a side-effect, foldable to (COPY-SEQ "0.1")?

Sure, it's allowed.

- - - 

I also have a problem with the subject line of this thread because
it's one of those "Have you stopped beating your wife?" kinds of
questions that cannot be answered on its face.

The wording does not qualify the remark even to an implementation.
Since it's about a bug, which is assumed to be transient, it should also
mention the version.

The wording offered in the subject suggests that somewhere in the ANSI
CL spec are the words

 "A conforming implementation is required to cons bignums 
  when formatting double floats, and to leave them around as
  garbage afterward just so that the GC will have something to do."

There isn't.

(There isn't even mention of garbage collection.  A conforming
implementation is not required to have one, incidentally.  The whole
issue of storage allocation and reclaiming is left to implementations.
An implementation is permitted to signal a storage exhausted condition
when there is not enough memory for whatever reason.)

Please don't assume that effects you see in an implementation are forced
by the standard.  Or else, if you know this is not so, please don't 
phrase questions to this forum in a way that implies otherwise.  This
distinction between language and implementation is critical to Common Lisp.
Unlike some other popular languages, CL is not defined by its 
implementation but by an independent specification.

I know this seems like a small point, but this is how myths about a
language get started.  People see a stray subject line out of context
and think the remark is about "Common Lisp" rather than about
someone's buggy attempt to implement Common Lisp, and they repeat it
in some other forum, and it takes on a life of its own.  When you find 
yourself hearing people saying random stuff about CL, don't just roll
your eyes and say "boy, are they stupid".  Ask yourself, "what could have
led them to have such a misconception?"  And consider at least the 
possibility that we, as a community, through our sloppiness, have caused
those misconceptions.

If you're new to the language and something funny happens interactively,
ALWAYS qualify your remark to the implementation and do not assume the
situation is general unless you find the passage in the ANSI CL spec that
requires the behavior.

Phew.  I feel better now.

From: William D Clinger
Subject: Re: Why does formatting double-floats create bignums?
Date: Sat, 17 May 2003 02:59:00 +0000
Message-ID: <b84e9a9f.0305161859.59a1ead3@posting.google.com>

Kent M Pitman wrote:
> I know this seems like a small point, but this is how myths about a
> language get started.  People see a stray subject line out of context
> and think the remark is about "Common Lisp" rather than about
> someone's buggy attempt to implement Common Lisp, and they repeat it
> in some other forum, and it takes on a life of its own....

Lest Kent's remarks take on a life of their own, I'd like to point out
that creating bignums while formatting floating point numbers is not a
bug, but a feature.  The standard algorithms for formatting a floating
point number accurately, with the fewest digits required to distinguish
the floating point number from adjacent floats, all create bignums.
Hence implementations that do not create bignums during this process
are almost certain to be inaccurate and/or to print more digits than
are necessary.

Will
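
The accuracy property mentioned above can be checked from the outside
with a small round-trip test. This is a sketch using only standard
Common Lisp functions; the name FLOAT-ROUND-TRIPS-P is hypothetical.

```lisp
;; Hypothetical check: does the printed text identify the same float?
(defun float-round-trips-p (x)
  "True if printing the float X and reading the text back yields a
float that is EQL to X."
  (with-standard-io-syntax
    (eql x (read-from-string (prin1-to-string x)))))

;; Example calls:
;; (float-round-trips-p 0.1d0)
;; (float-round-trips-p (/ 1d0 3d0))
```

A printer that emits too few digits fails this test for some floats,
while one that emits many more digits than needed passes it but
produces longer output than necessary.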

From: Kent M Pitman
Subject: Re: Why does formatting double-floats create bignums?
Date: Sat, 17 May 2003 15:45:36 +0000
Message-ID: <sfwvfw97f1b.fsf@shell01.TheWorld.com>

······@qnci.net (William D Clinger) writes:

> Kent M Pitman wrote:
> > I know this seems like a small point, but this is how myths about a
> > language get started.  People see a stray subject line out of context
> > and think the remark is about "Common Lisp" rather than about
> > someone's buggy attempt to implement Common Lisp, and they repeat it
> > in some other forum, and it takes on a life of its own....
> 
> Lest Kent's remarks take on a life of their own, I'd like to point out
> that creating bignums while formatting floating point numbers is not a
> bug, but a feature.  The standard algorithms for formatting a floating
> point number accurately, with the fewest digits required to distinguish
> the floating point number from adjacent floats, all create bignums.
> Hence implementations that do not create bignums during this process
> are almost certain to be inaccurate and/or to print more digits than
> are necessary.

I wasn't meaning to address the algorithm.  I was assuming that the
underlying question was one about memory management.  

(The question of 'creation' per se seems irrelevant if the created things
are collected.  Some implementations have efficient generation-based 
gc such that the returning happens automatically and it doesn't matter.
In others, the creation might want to be accompanied by forced returning
if it's in an inner routine that is known by the implementors to create
garbage, and so on.)

But, in short, for better or worse, I wasn't answering the question 
on its face.