(NEWBIE) Integer-length?

From: ·············@gmail.com
Subject: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 10:34:43 +0000
Message-ID: <64615b6e-d119-411f-8dc6-4503527fef06@e1g2000pra.googlegroups.com>

I wanna get the length of an integer ...( How many digits ? )
integer-length gives me strange answers ... ( Is it in binary
length ?)
I also want to know how to convert an integer to string ?

Re: (NEWBIE) Integer-length? ·············@gmail.com
- Re: (NEWBIE) Integer-length? ······@corporate-world.lisp.de
  - Re: (NEWBIE) Integer-length? ·············@gmail.com
    - Re: (NEWBIE) Integer-length? ······@corporate-world.lisp.de
Re: (NEWBIE) Integer-length? ······@corporate-world.lisp.de
Re: (NEWBIE) Integer-length? Pascal J. Bourguignon
- Re: (NEWBIE) Integer-length? ·············@gmail.com
  - Re: (NEWBIE) Integer-length? Raymond Toy
    - Re: (NEWBIE) Integer-length? ·············@gmail.com
    - Re: (NEWBIE) Integer-length? Tamas K Papp
    - Re: (NEWBIE) Integer-length? Pascal J. Bourguignon
      - Re: (NEWBIE) Integer-length? Raymond Toy
        Re: (NEWBIE) Integer-length? Pascal J. Bourguignon
        Re: (NEWBIE) Integer-length? Raymond Toy
  - Re: (NEWBIE) Integer-length? Kaz Kylheku
Re: (NEWBIE) Integer-length? Robert Maas, http://tinyurl.com/uh3t
- Re: (NEWBIE) Integer-length? Raffael Cavallaro
  - Re: (NEWBIE) Integer-length? Raffael Cavallaro
    - Re: (NEWBIE) Integer-length? Raffael Cavallaro
- Re: (NEWBIE) Integer-length? Robert Maas, http://tinyurl.com/uh3t
  - Re: (NEWBIE) Integer-length? James Harris
    - Re: (NEWBIE) Integer-length? Mensanator
      - Re: (NEWBIE) Integer-length? James Harris
        Re: (NEWBIE) Integer-length? Mensanator
      - Re: (NEWBIE) Integer-length? Bartc
        Re: (NEWBIE) Integer-length? toby
        Re: (NEWBIE) Integer-length? Mensanator
        Re: (NEWBIE) Integer-length? Kaz Kylheku
  - Re: (NEWBIE) Integer-length? Ben Bacarisse
  - Re: (NEWBIE) Integer-length? Madhu
  - Re: (NEWBIE) Integer-length? Madhu
    - Re: (NEWBIE) Integer-length? Madhu
  - Re: (NEWBIE) Integer-length? Mark Wooding
    - Re: (NEWBIE) Integer-length? Kaz Kylheku
    - Re: (NEWBIE) Integer-length? Madhu
      - Re: (NEWBIE) Integer-length? Mark Wooding
        Re: (NEWBIE) Integer-length? Madhu
        Re: (NEWBIE) Integer-length? Robert Maas, http://tinyurl.com/uh3t
        Re: (NEWBIE) Integer-length? Mark Wooding
        Re: (NEWBIE) Integer-length? John Thingstad
        Re: (NEWBIE) Integer-length? Raffael Cavallaro
        Re: (NEWBIE) Integer-length? Kaz Kylheku
        Re: (NEWBIE) Integer-length? Mark Wooding
        Re: (NEWBIE) Integer-length? Madhu
        Re: (NEWBIE) Integer-length? Kaz Kylheku
        Re: (NEWBIE) Integer-length? Mark Wooding
        Re: (NEWBIE) Integer-length? Raffael Cavallaro
        Re: (NEWBIE) Integer-length? Kaz Kylheku
        language and program nostalgia (was Re: (NEWBIE) Integer-length?) Andrew Reilly
        Re: language and program nostalgia Pascal J. Bourguignon
        Re: language and program nostalgia Christopher Koppler
        Re: language and program nostalgia Pascal J. Bourguignon

From: ·············@gmail.com
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 10:43:37 +0000
Message-ID: <d2980cad-8542-4c97-a038-250579971e2e@w1g2000prk.googlegroups.com>

Ok, I got it ...

(defmacro integer-length
(integer)
           `(length (format nil "~a" ,integer)))

Is there any smarter way to do it ?

From: ······@corporate-world.lisp.de
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 10:51:13 +0000
Message-ID: <d4749698-409a-4760-88a2-b3e6c9b12f25@o4g2000pra.googlegroups.com>

On 11 Dez., 11:43, ·············@gmail.com wrote:
> Ok, I got it ...
>
> (defmacro integer-length
> (integer)
>            `(length (format nil "~a" ,integer)))
>
> Is there any smarter way to do it ?

Write a function not a macro, use write-to-string, make sure that the
base is right,
and think about negative numbers.

Most important: don't redefine existing Common Lisp functionality.

From: ·············@gmail.com
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 11:28:05 +0000
Message-ID: <ec8091f0-d913-4fe2-ab4a-410c00897fb0@r37g2000prr.googlegroups.com>

I am now learning lisp with Common Lisp ..
And I am translating some exercises for my C Course last semester ...

> Write a function not a macro, use write-to-string, make sure that the
> base is right,
> and think about negative numbers.
> Most important: don't redefine existing Common Lisp functionality.

Now, I am learning macros and just practicing it harder ... ;) Sure, I
will just use write-to-string ...

Yeah, I do need a reference ...
Now, I am trying to set up Hyperspec with slime ( offline ) ...
Thanks ...

From: ······@corporate-world.lisp.de
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 11:44:34 +0000
Message-ID: <ae2aa4b9-b97c-4808-b701-79bc56b86e33@r15g2000prh.googlegroups.com>

On 11 Dez., 12:28, ·············@gmail.com wrote:
> I am now learning lisp with Common Lisp ..
> And I am translating some exercises for my C Course last semester ...
>
> > Write a function not a macro, use write-to-string, make sure that the
> > base is right,
> > and think about negative numbers.
> > Most important: don't redefine existing Common Lisp functionality.
>
> Now, I am learning macros and just practicing it harder ... ;) Sure, I
> will just use write-to-string ...

Part of practicing macros is to understand where to use them and where
not.
When in doubt, use functions. Macros are there for code
transformations.

>
> Yeah, I do need a reference ...
> Now, I am trying to set up Hyperspec with slime ( offline ) ...

Yep, that makes sense!

> Thanks ...

From: ······@corporate-world.lisp.de
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 10:49:26 +0000
Message-ID: <4bb85c8a-cad8-478a-8725-251891ac7972@s1g2000prg.googlegroups.com>

On 11 Dez., 11:34, ·············@gmail.com wrote:
> I wanna get the length of an integer ...( How many digits ? )
> integer-length gives me strange answers ... ( Is it in binary
> length ?)

Check out the HyperSpec - that's the standard for Common Lisp
converted to HTML.
It should answer your question.

> I also want to know how to convert an integer to string ?

(write-to-string 03840583048503980).

You might want to consult a reference manual, the Hyperspec or some
Lisp
introduction book.

From: Pascal J. Bourguignon
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 12:23:50 +0000
Message-ID: <7cprjyvr1l.fsf@pbourguignon.anevia.com>

·············@gmail.com writes:

> I wanna get the length of an integer ...

What is the length of an integer?

> ( How many digits ? )

What is a digit?   For example, I need three digits to write fourteen: "XIV"

Oh!? You want a positional representation? In what base?

The number of digits needed to write N in base ten is:

  (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))

If you want to count the sign for negative numbers:

  (lambda (n) (+ (if (minusp n) 2 1) (if (zerop n) 0 (truncate (log (abs n) 10.)))))

> integer-length gives me strange answers ... ( Is it in binary
> length ?)

http://www.lispworks.com/reference/HyperSpec/Body/f_intege.htm

Yep, the number of bits can be characterized as a "binary length".

> I also want to know how to convert an integer to string ?

There are a lot of CL functions allowing you to do that. See for
example PRIN1-TO-STRING, and check the other functions in the same
CLHS chapter.

-- 
__Pascal Bourguignon__

From: ·············@gmail.com
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 16:38:25 +0000
Message-ID: <caa883fd-c2d7-4677-9ec2-f06e0ccdb6ad@z28g2000prd.googlegroups.com>

>   (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))

Wow ... Nice Example ;)

From: Raymond Toy
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 17:31:05 +0000
Message-ID: <sxdprjyy5ye.fsf@rtp.ericsson.se>

>>>>> "thurahlaing" == thurahlaing  <·············@gmail.com> writes:

    >> � (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
    thurahlaing> Wow ... Nice Example ;)

I think that for large enough n, this might be off by one.

I don't have an example, but the results depend a lot on how the log
function is implemented, especially when both args are integers.

And why not using ceiling instead of truncate?

Ray

From: ·············@gmail.com
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 19:20:34 +0000
Message-ID: <02fa37d3-7994-4e71-b3c9-3c3a6372be89@35g2000pry.googlegroups.com>

On Dec 11, 12:31 pm, Raymond Toy <···········@ericsson.com> wrote:
> >>>>> "thurahlaing" == thurahlaing  <·············@gmail.com> writes:
>
>     >>   (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
>     thurahlaing> Wow ... Nice Example ;)
>
> I think that for large enough n, this might be off by one.
>
> I don't have an example, but the results depend a lot on how the log
> function is implemented, especially when both args are integers.
>
> And why not using ceiling instead of truncate?
>
> Ray

Well, I say Nice Example coz simply I don't know I can do that ... ( I
am just a newbie, not just in lisp but also in programming )

From: Tamas K Papp
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 20:57:11 +0000
Message-ID: <6qdd57Fc2ckoU1@mid.individual.net>

On Thu, 11 Dec 2008 12:31:05 -0500, Raymond Toy wrote:

>>>>>> "thurahlaing" == thurahlaing  <·············@gmail.com> writes:
> 
>     >>   (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n)
>     >>   10.)))))
>     thurahlaing> Wow ... Nice Example ;)
> 
> I think that for large enough n, this might be off by one.
> 
> I don't have an example, but the results depend a lot on how the log
> function is implemented, especially when both args are integers.
> 
> And why not using ceiling instead of truncate?

Here is another basic version, which is always exact.

(defun int-length (n)
  (assert (integerp n))
  (if (zerop n)
      1					; how many digits does zero have?
      (do ((digits 0 (1+ digits))
	   (i (abs n) (truncate i 10)))
	  ((zerop i) digits))))

Tamas

From: Pascal J. Bourguignon
Subject: Re: (NEWBIE) Integer-length?
Date: Fri, 12 Dec 2008 09:10:08 +0000
Message-ID: <7cej0dvjwv.fsf@pbourguignon.anevia.com>

Raymond Toy <···········@ericsson.com> writes:

>>>>>> "thurahlaing" == thurahlaing  <·············@gmail.com> writes:
>
>     >> � (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
>     thurahlaing> Wow ... Nice Example ;)
>
> I think that for large enough n, this might be off by one.
>
> I don't have an example, but the results depend a lot on how the log
> function is implemented, especially when both args are integers.
>
> And why not using ceiling instead of truncate?

Let me see, why?  Perhaps because it would be wrong?


(mapcar (lambda (f) (funcall f 50))
        (list
           (lambda (n) (length (format nil "~A" n)))
           (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
           (lambda (n) (1+ (if (zerop n) 0 (ceiling  (log (abs n) 10.)))))))
--> (2 2 3)


-- 
__Pascal Bourguignon__

From: Raymond Toy
Subject: Re: (NEWBIE) Integer-length?
Date: Fri, 12 Dec 2008 14:22:43 +0000
Message-ID: <sxd1vwdxyks.fsf@rtp.ericsson.se>

>>>>> "Pascal" == Pascal J Bourguignon <···@informatimago.com> writes:

    Pascal> Raymond Toy <···········@ericsson.com> writes:
    >>>>>>> "thurahlaing" == thurahlaing  <·············@gmail.com> writes:
    >> 
    >> >> � (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
    thurahlaing> Wow ... Nice Example ;)
    >> 
    >> I think that for large enough n, this might be off by one.
    >> 
    >> I don't have an example, but the results depend a lot on how the log
    >> function is implemented, especially when both args are integers.
    >> 
    >> And why not using ceiling instead of truncate?

    Pascal> Let me see, why?  Perhaps because it would be wrong?


    Pascal> (mapcar (lambda (f) (funcall f 50))
    Pascal>         (list
    Pascal>            (lambda (n) (length (format nil "~A" n)))
    Pascal>            (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
    Pascal>            (lambda (n) (1+ (if (zerop n) 0 (ceiling  (log (abs n) 10.)))))))
    --> (2 2 3)

Well, I was assuming that if you used ceiling, you'd also change the
1+.

Ray

From: Pascal J. Bourguignon
Subject: Re: (NEWBIE) Integer-length?
Date: Fri, 12 Dec 2008 16:25:56 +0000
Message-ID: <7cprjxcqcr.fsf@pbourguignon.anevia.com>

Raymond Toy <···········@ericsson.com> writes:

>>>>>> "Pascal" == Pascal J Bourguignon <···@informatimago.com> writes:
>
>     Pascal> Raymond Toy <···········@ericsson.com> writes:
>     >>>>>>> "thurahlaing" == thurahlaing  <·············@gmail.com> writes:
>     >> 
>     >> >> � (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
>     thurahlaing> Wow ... Nice Example ;)
>     >> 
>     >> I think that for large enough n, this might be off by one.
>     >> 
>     >> I don't have an example, but the results depend a lot on how the log
>     >> function is implemented, especially when both args are integers.
>     >> 
>     >> And why not using ceiling instead of truncate?
>
>     Pascal> Let me see, why?  Perhaps because it would be wrong?
>
>
>     Pascal> (mapcar (lambda (f) (funcall f 50))
>     Pascal>         (list
>     Pascal>            (lambda (n) (length (format nil "~A" n)))
>     Pascal>            (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
>     Pascal>            (lambda (n) (1+ (if (zerop n) 0 (ceiling  (log (abs n) 10.)))))))
>     --> (2 2 3)
>
> Well, I was assuming that if you used ceiling, you'd also change the
> 1+.

My test case didn't cover enough.  There you have:

(mapcar (lambda (f)  (mapcar f '(99 100 101)))
        (list
         (lambda (n) (length (format nil "~A" n)))
         (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
         (lambda (n) (1+ (if (Azerop n) 0 (ceiling  (log (abs n) 10.)))))))
;;   IC C CI
--> ((2 3 3)   ; format
     (2 3 3)   ; truncate
     (3 3 4))  ; ceiling

-- 
__Pascal Bourguignon__

From: Raymond Toy
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 14:42:16 +0000
Message-ID: <sxdiqpljy9j.fsf@rtp.ericsson.se>

>>>>> "Pascal" == Pascal J Bourguignon <···@informatimago.com> writes:

    Pascal> Raymond Toy <···········@ericsson.com> writes:
    >> 
    >> Well, I was assuming that if you used ceiling, you'd also change the
    >> 1+.

    Pascal> My test case didn't cover enough.  There you have:

    Pascal> (mapcar (lambda (f)  (mapcar f '(99 100 101)))
    Pascal>         (list
    Pascal>          (lambda (n) (length (format nil "~A" n)))
    Pascal>          (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
    Pascal>          (lambda (n) (1+ (if (Azerop n) 0 (ceiling  (log (abs n) 10.)))))))
    Pascal> ;;   IC C CI
    --> ((2 3 3)   ; format
    Pascal>      (2 3 3)   ; truncate
    Pascal>      (3 3 4))  ; ceiling

My mistake.  You are, of course, correct.

Ray

From: Kaz Kylheku
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 11 Dec 2008 23:37:14 +0000
Message-ID: <20081227094246.681@gmail.com>

On 2008-12-11, ·············@gmail.com <·············@gmail.com> wrote:
>>   (lambda (n) (1+ (if (zerop n) 0 (truncate (log (abs n) 10.)))))
>
> Wow ... Nice Example ;)

You guys are forgetting to obtain the width of the digit characters in your
display font. Then you can get the integer-width in a point size. Maybe even
actual millimeters, if you consider printing hardcopy!

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: (NEWBIE) Integer-length?
Date: Sat, 13 Dec 2008 13:15:33 +0000
Message-ID: <REM-2008dec13-009@Yahoo.Com>

> From: ·············@gmail.com
> I wanna get the length of an integer

An Integer doesn't have a length. Only the *representation* of an
integer as a string, using some base-and-digit system, has a
length. Most commonly the base (radix) is two, eight, ten or
sixteen, and the digits (characters, shown here as UniCode points
in hexadecimal notation) start at u+0030 and run through u+0039,
then skip to either u+0041 or u+0061. The particular characters
used to encode the digits doesn't affect the length, but the base
(radix) does affect it, very much!

The human-lazy way to compute the length of the representation of
an integer is to use FORMAT to actually convert to a string, then
call LENGTH on the result. (But watch out for negative numbers,
which have a minus sign before the first digit. You need to define
how precisely you want the "length" defined in that case.)

But for any base except ten, that's horribly inefficient compared
to calling a machine-language subroutine that typically executes a
single machine instruction (for single-word integers) to locate the
first bit in the number that differs from the sign bit, plus
additional instructions to convert that depending on whether what
base you want and whether the integer was negative or not. For
BIGNUMs, with the value split into multiple words, it's just a few
more instructions to get the index of that first non-sign-same bit.

Even for base (radix) ten, it might be more efficient to use the
bit-position to get an estimate of the number of decimal digits,
then if there are two possible answers then do a single arithmetic
test to see whether the value is above or below the threshold. For
example (for non-negative integers):
   #bits  cases  #decimalDigits
      1     1         1
      2     2,3       1
      3    4,5,6,7    1
      4      8,9      1
          10,...,15   2
      5   15,...,31   2
      6   32,...,63   2
      7   64,...,99   2
          100 - 127   3
See how most of the time the number of bits fully determines the
number of decimal digits, and only about a third of the time there
are two choices for number of digits and you need to to an
additional test?

So what's that additional test? If you compute that an integer that
is expressed as N bits can be either M or M+1 digits when expressed
in decimal, you simply need to compute ten^M and test whether your
integer is less that power or not. ten^M can be computed moderately
efficiently using only log(M) squarings and at most that many side
multiplications. (This same exact algorithm, except also reducing
modulo another number at each operation, can be used for efficient
modular exponentation used in RSA public-key encryption.)

If you're going to do this only a few times, take the
lazy-programmer approach, except for binary where it's trivial to
use integer-length. If this calculation is done many times in a
tight loop and you want it to run faster and do almost no CONSing,
consider coding the more efficient algorithm that doesn't build the
string, just figures out how long that string *would* have been.

> integer-length gives me strange answers ... ( Is it in binary length ?)

Rule number one when trying a new Common Lisp function (or for that
matter *any* built-in function in *any* programming language) for
your very first time, is to RTFM!!! Yes, integer-length gives the
number of bits in the two's complement binary representation of the
integer, which is the same as the internal representation on most
modern computers, or tightly related to it for BIGNUMs. RTFM to
learn precisely what this means if the integer is negative.

> I also want to know how to convert an integer to string ?

In what sense do you want it converted??

Express it using a radix (discussed above)?
  (format nil "~3r" N) ;radix-3 string representation of N

Generate the UniCode character corresponding to that integer, and
make a one-character FatString out of it, assuming you have a
UniCode compliant version of Common Lisp, else this works only for
US-ASCII using regular strings?
  (format nil "~C" (code-char N))

Look up in a table the Nth string in that table, such as the five
hundredth name in your address book?
  (nth N listOfStrings)
  (aref vectorOfStrings N)

Express in English words for cardinal number (how many)?
  (format nil "~R" N)

Express in English words for ordinal number (which among sequence)?
  (format nil "~:R" N)

Express in Roman numerals?
  (format nil ··@R" N)

Resistor color code?
  (nth N '("black" "brown" "red" "orange" "yellow" "green" "blue" "purple" "gray" "white"))

TraditionalTelephone alpha coding?

CellPhone/textMessage keypad alpha coding?

Most common words in English?
  1->"the" 2->"to" 3->"and" 4->"of" 5->"but" etc.

Most common words in Spanish?
  1->"de" 2->"que" etc.  

Something else you have in mind?

Do you see now that "convert an integer to string" is pretty much
meaningless until and unless you say which of the many conversion
systems you are wishing for?

Easy exercise:
"eight quindecillion four hundred seventy quattuordecillion three hundred twenty-nine tredecillion four hundred seventy-two duodecillion five hundred forty-three undecillion three decillion three hundred ninety nonillion six hundred eighty-three octillion two hundred twenty-five septillion six sextillion seven hundredninety-six quintillion four hundred nineteen quadrillion six hundred twenty trillion five hundred thirteen billion nine hundred sixteen million fifteen thousand six hundred twenty-five"
What is the prime factorization of the integer expressed by that string?

Not so easy exercise:
"four hundred forty-one quadrillion two hundred fifty-four trillion seventy-three billion four hundred eighty million three hundred thirty-six thousand seven hundred forty-nine"
What is the prime factorization of the integer expressed by that string?

From: Raffael Cavallaro
Subject: Re: (NEWBIE) Integer-length?
Date: Sat, 13 Dec 2008 16:20:28 +0000
Message-ID: <2008121311202816807-raffaelcavallaro@pasespamsilvousplaitmaccom>

On 2008-12-13 08:15:33 -0500, ·············@teh.intarweb.org (Robert 
Maas, http://tinyurl.com/uh3t) said:

> Even for base (radix) ten, it might be more efficient to use the
> bit-position to get an estimate of the number of decimal digits,
> then if there are two possible answers then do a single arithmetic
> test to see whether the value is above or below the threshold. For
> example (for non-negative integers):
>    #bits  cases  #decimalDigits
>       1     1         1
>       2     2,3       1
>       3    4,5,6,7    1
>       4      8,9      1
>           10,...,15   2
>       5   15,...,31   2
>       6   32,...,63   2
>       7   64,...,99   2
>           100 - 127   3
> See how most of the time the number of bits fully determines the
> number of decimal digits, and only about a third of the time there
> are two choices for number of digits and you need to to an
> additional test?
> 
> So what's that additional test? If you compute that an integer that
> is expressed as N bits can be either M or M+1 digits when expressed
> in decimal, you simply need to compute ten^M and test whether your
> integer is less that power or not.

This method is about 10 x faster than converting to a string and 
computing its length on the lisp/machine combination I'm using (CCL on 
an intel macbook pro):

(defun decimal-integer-length (n)
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (check-type n integer)
  (length (princ-to-string (abs n))))


(defun decimal-length-from-binary (n)
  (declare (optimize (speed 3) (safety 0) (debug 0)))
  (check-type n integer)
  (let* ((abs (abs n))
         (binary-length (integer-length abs))
         (binary-multiplier 4) ;; 8 and 9 have an integer length of 4
         (lower-bound (truncate binary-length binary-multiplier)))
    (labels ((current-or-higher (pos-int current-expt)
               (if (>= pos-int (expt 10 current-expt))
                 (current-or-higher pos-int (1+ current-expt))
                 current-expt)))
      (current-or-higher abs lower-bound))))

#| tests:

(loop repeat 10000
  with update-fn =
  (lambda () (- (random most-positive-fixnum) 3000000))
  for i = (funcall update-fn)
  then (funcall update-fn)
  always
  (=
   (- (decimal-integer-length i)
      (decimal-length-from-binary i))
   0))

(time
 (loop repeat 10000
  with update-fn =
  (lambda () (- (random most-positive-fixnum) 3000000))
  for i = (funcall update-fn)
  then (funcall update-fn)
  do
  (decimal-integer-length i)))

(time
 (loop repeat 10000
  with update-fn =
  (lambda () (- (random most-positive-fixnum) 3000000))
  for i = (funcall update-fn)
  then (funcall update-fn)
  do
  (decimal-length-from-binary i)))

|#
-- 
Raffael Cavallaro, Ph.D.

From: Raffael Cavallaro
Subject: Re: (NEWBIE) Integer-length?
Date: Sat, 13 Dec 2008 16:29:37 +0000
Message-ID: <2008121311293775249-raffaelcavallaro@pasespamsilvousplaitmaccom>

On 2008-12-13 11:20:28 -0500, Raffael Cavallaro 
<················@pas.espam.s.il.vous.plait.mac.com> said:

> #| tests:
> 
> (loop repeat 10000
>   with update-fn =
>   (lambda () (- (random most-positive-fixnum) 3000000))
>   for i = (funcall update-fn)
>   then (funcall update-fn)
>   always
>   (=
>    (- (decimal-integer-length i)
>       (decimal-length-from-binary i))
>    0))
> 
> (time
>  (loop repeat 10000
>   with update-fn =
>   (lambda () (- (random most-positive-fixnum) 3000000))
>   for i = (funcall update-fn)
>   then (funcall update-fn)
>   do
>   (decimal-integer-length i)))
> 
> (time
>  (loop repeat 10000
>   with update-fn =
>   (lambda () (- (random most-positive-fixnum) 3000000))
>   for i = (funcall update-fn)
>   then (funcall update-fn)
>   do
>   (decimal-length-from-binary i)))
> 
> |#

um, might want to update those tests to get more negative integers...

#| tests:

(loop repeat 10000
  with half-mfp = (truncate most-positive-fixnum 2)
  with update-fn =
  (lambda () (- (random most-positive-fixnum) half-mfp))
  for i = (funcall update-fn)
  then (funcall update-fn)
  always
  (=
   (- (decimal-integer-length i)
      (decimal-length-from-binary i))
   0))

(time
 (loop repeat 10000
   with half-mfp = (truncate most-positive-fixnum 2)
   with update-fn =
   (lambda () (- (random most-positive-fixnum) half-mfp))
   for i = (funcall update-fn)
   then (funcall update-fn)
   do
   (decimal-integer-length i)))

(time
 (loop repeat 10000
   with half-mfp = (truncate most-positive-fixnum 2)
   with update-fn =
   (lambda () (- (random most-positive-fixnum) half-mfp))
   for i = (funcall update-fn)
   then (funcall update-fn)
   do
   (decimal-length-from-binary i)))

|#
-- 
Raffael Cavallaro, Ph.D.

From: Raffael Cavallaro
Subject: Re: (NEWBIE) Integer-length?
Date: Sat, 13 Dec 2008 16:32:35 +0000
Message-ID: <2008121311323550073-raffaelcavallaro@pasespamsilvousplaitmaccom>

On 2008-12-13 11:29:37 -0500, Raffael Cavallaro 
<················@pas.espam.s.il.vous.plait.mac.com> said:

> with half-mfp = (truncate most-positive-fixnum 2)

... and "half-mpf"
              ^^^
is probably a better name for this quantity ;^)

sorry for the typo



-- 
Raffael Cavallaro, Ph.D.

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 07:59:48 +0000
Message-ID: <REM-2008dec14-003@Yahoo.Com>

(Included another newsgroup because this topic has gone beyond just Lisp.
 For non-Lispers joining now, INTEGER-LENGTH tells number of bits needed
  to express a given integer, same as java.math.BigInteger.bitCount.
 OP was confused because the function returns number of bits rather
  than number of decimal digits the OP really wanted.)

Following up my own article to extend my thought on that topic.

I previously suggested that if you want to know the number of
digits in the decimal expression of a given number, you first call
a function that very quickly tells the number of digits in the
*binary* representation of that number, which is almost instant on
modern CPUs to find the index of the leftmost bit position not
matching the sign bit, then multiple by a conversion factor
log10(2) = ln(2)/ln(10) to get an approximate number of digits in
decimal representation, and finally in case of two choices you
directly generate the threshold power of ten and compare less than
or not:
> For example (for non-negative integers):
>       7   64,...,99   2
>           100 - 127   3
so if the number of bits is 7 it can be either 2 or 3 decimal
digits depending on whether the integer is less than ten^2 or not.
> ten^M can be computed moderately efficiently using only log(M)
> squarings and at most that many side multiplications.

In Common Lisp, the function integer-length tells you the number of
bits almost directly (for positive integers it's exactly what you
want; for negative integers, depending on whether you treat the
number as signed expression or as nine's complement you may need
different fudge factors, and how many digits zero has is a matter
of arbitrary definition). Once you have that information, the rest
is straightforward if tiresome to handle all the neg/zero/pos cases
exactly the way your customer wants the answer to turn out.

But last night I got to thinking about the problem of counting the
number of digits in the binary- (or other radix-) representation of
an integer in programming languages which don't have a built-in
function such as Common Lisp's INTEGER-LENGTH or Java's
java.math.BigInteger.bitCount to do that.

If you need to code such a function/method directly yourself in
some other programming language, obviously it's better to just
compute the number of bits without actually working out *what*
those digits are (by printing it to a string for example). A simple
loop, doing division repeatedly until the quotient is less than the
radix, would work, but for efficiency when processing very large
integers, to avoid too many BigInteger multiplications or
divisions, I got the idea of using repeated squaring to crudely
bound the given integer, thus needing only log(nDigits) instead of
nDigits multiplications to get to that point, then using a type of
"binary search" algorithm to interpolate between two adjacent
squarings to get the exact bounds between two consecutive powers of
the radix, again using only log(nDigits) multiplications to do that
half the algorithm. The nice thing about this algorithm is that to
get the number of digits in some radix other than binary you use
the exact same algorithm, just with a different starting point of
that base instead of 2. Thus you compute the number of decimal (or
other radix) digits *directly* in the first place, rather than
computing the number of bits then converting to number of digits by
multiplying by log[base](2) and rounding up or down depending on
the boundary case. It seems to me that this way of getting the
result for arbitrary radix directly is more clean than that earlier
algorithm of doing binary first then scaling to other base.
(Of course if the number of bits is extremely efficient on your
 particular CPU, then the binary-then-scale algorithm would probably
 be faster.)

So I went ahead and (in PowerLisp) wrote the code to actually do
this algorithm for positive integer and arbitrary radix. (Crashed
my computer due to bug in PowerLisp triggered by typo in my code,
and needed to cold-restart, only once.) It's broken into two major
parts which are run in sequence:

;Given a positive integer, and a radix (integer at least 2):
;Repeatedly square the radix thus doubling the exponent in power=radix^exp,
; keeping track of the exponent and power in parallel in a stack.
; Stop when the power exceeds the given integer.
;Return the stack at that point, where the top pair on the stack contains
; first power that exceeds given integer and all other pairs are smaller/equal.
(defun posint+radix-to-gross-powerbound-stack (x radix)
  ...)
;Requires approximately log2(N) squarings, where N is number of digits.
;The algorithm is "obvious", given my description earlier.

;Given positive integer, and stack of squarings from some radix to first past
; that given integer (result from posint+radix-to-gross-powerbound-stack):
;Unwind the stack to efficiently interpolate between last two exponents
; to narrow bound of power around given integer.
;Return list of two (exp . pwr) pairs whose exp differs by 1, the first with
; pwr<=posint, second with posint<pwr. The CAR of the latter equals number of
; digits.
(defun posint+stksqrs-narrow-power (posint stksqrs)
  ...)
;Requires approximately log2(N) incremental multiplications, where N is
; number of digits. Let me know if you need a clue as to the algorithm,
; because it might not be obvious from my description earlier.

A higher-level function, which I haven't bothered to write yet,
would test whether the given integer is positive/zero/negative,
pass the appropriate value to the two above functions in sequence,
and the convert the return value to the desired number-of-digits
for toplevel return.

Before I post my code for the bodies of the two above function
definitions, would anybody wish to try coding their own version, as
an exercise for a student for example? Feel free to code in any
language you wish. From my descriptions of the two algorithms given
in the preface to each function, I would expect the code in just
about any language to be pretty much the same, i.e. identical
flowcharts, only the details of how to build a linked list (for the
stack), and the exponent&power pairs to put on the stack, and how
to deal with big integers in a generic way (built-in in Common Lisp
and Java, but not so trivial in many other languages), such trivial
implementational details, and the surface syntax, would differ from
one language to another.

Remember my idea of building a Web-based "no-syntax" IDE that
allows constructing software algorithms without needing any formal
syntax for the software, all database+menu driven based on
intentional datatypes of the parameters and return values of
functions being called? The two above functions, plus the
higher-level function that calls them and produces the final
desired number-of-digits result, would be obvious candidates for
such an IDE, where a language-independent algorithm is built in
each case, which could then be automatically rendered in almost any
programming language the user/programmer may desire. It might also
be a good test case for intentional datatypes, because the
intentional datatype of a mathematical integer *expressed* in
decimal digits would need to have a "lazy" mode where we talk about
the digits that *would* be generated without actually generating
them yet. Thus some sort of mathematical logic would be needed to
talk about the hypothetical properties (number of digits) of
something that hasn't yet been construced (the actual string of
digits). Maybe an expert on type inference or something like that
can give me an idea what datatype model would be best for this
lazy-value nonlazy-inference application.

From: James Harris
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 12:09:20 +0000
Message-ID: <d86b5283-3698-4e03-b3b1-9c91ddc1854e@q26g2000prq.googlegroups.com>

On 15 Dec, 07:59, ·············@teh.intarweb.org (Robert Maas,
http://tinyurl.com/uh3t) wrote:
...
> But last night I got to thinking about the problem of counting the
> number of digits in the binary- (or other radix-) representation of
> an integer in programming languages which don't have a built-in
> function such as Common Lisp's INTEGER-LENGTH or Java's
> java.math.BigInteger.bitCount to do that.
>

How large a number are we talking about? Can you say more about what
range of integer lengths you have in mind? If infinite what sizes do
you expect most often?

>
> If you need to code such a function/method directly yourself in
> some other programming language, obviously it's better to just
> compute the number of bits without actually working out *what*
> those digits are (by printing it to a string for example). A simple
> loop, doing division repeatedly until the quotient is less than the
> radix, would work, but for efficiency when processing very large
>

So, there's a challenge here to find a fast solution. This could be
interesting.

>
> A higher-level function, which I haven't bothered to write yet,
> would test whether the given integer is positive/zero/negative,
> pass the appropriate value to the two above functions in sequence,
> and the convert the return value to the desired number-of-digits
> for toplevel return.

Agreed. Given that the final solutions will depend on the range of
number of digits it seems a must to have a top-level function which
will perhaps do some preprocessing and then invoke the appropriate
lower level function.

James

From: Mensanator
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 19:00:28 +0000
Message-ID: <5e93d6bb-c295-4432-b75b-66eb0806ad02@40g2000prx.googlegroups.com>

On Dec 15, 6:09 am, James Harris <··············@googlemail.com>
wrote:
> On 15 Dec, 07:59, ·············@teh.intarweb.org (Robert Maas,http://tinyurl.com/uh3t) wrote:
>
> ...
>
> > But last night I got to thinking about the problem of counting the
> > number of digits in the binary- (or other radix-) representation of
> > an integer in programming languages which don't have a built-in
> > function such as Common Lisp's INTEGER-LENGTH or Java's
> > java.math.BigInteger.bitCount to do that.
>
> How large a number are we talking about?

Why impose a limit?

> Can you say more about what
> range of integer lengths you have in mind? If infinite what sizes do
> you expect most often?

That might be hard to say, depending on the nature of your problem.

>
>
>
> > If you need to code such a function/method directly yourself in
> > some other programming language, obviously it's better to just
> > compute the number of bits without actually working out *what*
> > those digits are (by printing it to a string for example). A simple
> > loop, doing division repeatedly until the quotient is less than the
> > radix, would work, but for efficiency when processing very large
>
> So, there's a challenge here to find a fast solution. This could be
> interesting.

Naw, it's boring. Useful languages like Python have already
covered such trivial matters freeing you to think about
general solutions rather than just those that can be solved
in a closed form expression.

>>> # Python

>>> # Python wrapper for the GMP math library
>>> import gmpy

>>> # some demos
>>> import collatz_functions as cf

>>> # the 1st 6th Generation Type [1,2] Mersenne Hailstone
>>> a = cf.Type12MH(6,1)

>>> # how many bits?
>>> print gmpy.numdigits(a,2)
177149

>>> # how many base 10 digits?
>>> print gmpy.numdigits(a,10)
53328

>>> # excursion (max value) of a Collatz sequence starting at 'a'
>>> e = cf.excursion(a)

>>> # Now since 'a' had 177149 bits (all of which were 1),
>>> # we expect the excursion to have about 177149 * 1.585 bits

>>> print gmpy.numdigits(e,2)
280776

>>> # and in decimal, thats
>>> print gmpy.numdigits(e,10)
84522

>>> # and while we're at it, how many bits in the excursion are 1s?
>>> # (we expect about half.)
>>> print gmpy.popcount(e)
140748

Nifty, eh?


>
>
>
> > A higher-level function, which I haven't bothered to write yet,
> > would test whether the given integer is positive/zero/negative,
> > pass the appropriate value to the two above functions in sequence,
> > and the convert the return value to the desired number-of-digits
> > for toplevel return.
>
> Agreed. Given that the final solutions will depend on the range of
> number of digits it seems a must to have a top-level function which
> will perhaps do some preprocessing and then invoke the appropriate
> lower level function.
>
> James

From: James Harris
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 21:03:46 +0000
Message-ID: <5897a622-7539-4797-8954-f27978f73559@t39g2000prh.googlegroups.com>

On 15 Dec, 19:00, Mensanator <··········@aol.com> wrote:
...
> > > But last night I got to thinking about the problem of counting the
> > > number of digits in the binary- (or other radix-) representation of
> > > an integer in programming languages which don't have a built-in
> > > function such as Common Lisp's INTEGER-LENGTH or Java's
> > > java.math.BigInteger.bitCount to do that.
>
> > How large a number are we talking about?
>
> Why impose a limit?

I wasn't.

Getting parameters to the OP's spec is part of defining the problem.

>
> > Can you say more about what
> > range of integer lengths you have in mind? If infinite what sizes do
> > you expect most often?
>
> That might be hard to say, depending on the nature of your problem.

We'll have to let the OP advise on that.


>
>
>
> > > If you need to code such a function/method directly yourself in
> > > some other programming language, obviously it's better to just
> > > compute the number of bits without actually working out *what*
> > > those digits are (by printing it to a string for example). A simple
> > > loop, doing division repeatedly until the quotient is less than the
> > > radix, would work, but for efficiency when processing very large
>
> > So, there's a challenge here to find a fast solution. This could be
> > interesting.
>
> Naw, it's boring. Useful languages like Python have already
> covered such trivial matters freeing you to think about
> general solutions rather than just those that can be solved
> in a closed form expression.

Naw, Python is not known for its speed. Getting the answer is not the
problem set. Getting the answer efficiently is required.

...

James

From: Mensanator
Subject: Re: (NEWBIE) Integer-length?
Date: Tue, 16 Dec 2008 00:25:55 +0000
Message-ID: <c2012ed9-b7fd-464d-84af-1378af577cc3@x16g2000prn.googlegroups.com>

On Dec 15, 3:03 pm, James Harris <··············@googlemail.com>
wrote:
> On 15 Dec, 19:00, Mensanator <··········@aol.com> wrote:
> ...
>
> > > > But last night I got to thinking about the problem of counting the
> > > > number of digits in the binary- (or other radix-) representation of
> > > > an integer in programming languages which don't have a built-in
> > > > function such as Common Lisp's INTEGER-LENGTH or Java's
> > > > java.math.BigInteger.bitCount to do that.
>
> > > How large a number are we talking about?
>
> > Why impose a limit?
>
> I wasn't.
>
> Getting parameters to the OP's spec is part of defining the problem.

Ok.

>
>
>
> > > Can you say more about what
> > > range of integer lengths you have in mind? If infinite what sizes do
> > > you expect most often?
>
> > That might be hard to say, depending on the nature of your problem.
>
> We'll have to let the OP advise on that.

Ok.

>
>
>
>
>
>
>
> > > > If you need to code such a function/method directly yourself in
> > > > some other programming language, obviously it's better to just
> > > > compute the number of bits without actually working out *what*
> > > > those digits are (by printing it to a string for example). A simple
> > > > loop, doing division repeatedly until the quotient is less than the
> > > > radix, would work, but for efficiency when processing very large
>
> > > So, there's a challenge here to find a fast solution. This could be
> > > interesting.
>
> > Naw, it's boring. Useful languages like Python have already
> > covered such trivial matters freeing you to think about
> > general solutions rather than just those that can be solved
> > in a closed form expression.
>
> Naw, Python is not known for its speed.

The gmpy library itself isn't Python, it's compiled C,
so it's quite fast.

> Getting the answer is not the
> problem set.

Ok, I assumed it was. The OP is certainly free to ignore
my contibution.

> Getting the answer efficiently is required.

From what I hear, the GMP library is highly reccomended.

>
> ...
>
> James

From: Bartc
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 19:19:04 +0000
Message-ID: <I4y1l.6746$Sp5.1389@text.news.virginmedia.com>

Mensanator wrote:
> On Dec 15, 6:09 am, James Harris <··············@googlemail.com>
> wrote:
>> On 15 Dec, 07:59, ·············@teh.intarweb.org (Robert
>> Maas,http://tinyurl.com/uh3t) wrote:

>>> But last night I got to thinking about the problem of counting the
>>> number of digits in the binary- (or other radix-) representation of
>>> an integer in programming languages which don't have a built-in
>>> function such as Common Lisp's INTEGER-LENGTH or Java's
>>> java.math.BigInteger.bitCount to do that.

>> So, there's a challenge here to find a fast solution. This could be
>> interesting.
>
> Naw, it's boring. Useful languages like Python have already
> covered such trivial matters freeing you to think about
> general solutions rather than just those that can be solved
> in a closed form expression.

>>>> # Python wrapper for the GMP math library

>>>> # how many bits?
>>>> print gmpy.numdigits(a,2)
> 177149
>
>>>> # how many base 10 digits?
>>>> print gmpy.numdigits(a,10)
> 53328

So how does Python (or GMP) do it? And how fast is it?

These things are interesting to those not satisfied with shrink-wrapped 
ready-made solutions.

-- 
Bartc

From: toby
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 19:33:26 +0000
Message-ID: <0318c6cd-d4e2-465e-ace3-9aac8b4f2361@s24g2000vbp.googlegroups.com>

On Dec 15, 2:19 pm, "Bartc" <····@freeuk.com> wrote:
> Mensanator wrote:
> > On Dec 15, 6:09 am, James Harris <··············@googlemail.com>
> > wrote:
> >> On 15 Dec, 07:59, ·············@teh.intarweb.org (Robert
> >> Maas,http://tinyurl.com/uh3t) wrote:
> >>> But last night I got to thinking about the problem of counting the
> >>> number of digits in the binary- (or other radix-) representation of
> >>> an integer in programming languages which don't have a built-in
> >>> function such as Common Lisp's INTEGER-LENGTH or Java's
> >>> java.math.BigInteger.bitCount to do that.
> >> So, there's a challenge here to find a fast solution. This could be
> >> interesting.
>
> > Naw, it's boring. Useful languages like Python have already
> > covered such trivial matters freeing you to think about
> > general solutions rather than just those that can be solved
> > in a closed form expression.
> >>>> # Python wrapper for the GMP math library
> >>>> # how many bits?
> >>>> print gmpy.numdigits(a,2)
> > 177149
>
> >>>> # how many base 10 digits?
> >>>> print gmpy.numdigits(a,10)
> > 53328
>
> So how does Python (or GMP) do it? And how fast is it?
>
> These things are interesting to those not satisfied with shrink-wrapped
> ready-made solutions.

TIAS. NIH is a PITA.

>
> --
> Bartc

From: Mensanator
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 20:42:04 +0000
Message-ID: <a6381a86-05dc-4ba0-a824-c6662186f730@x16g2000prn.googlegroups.com>

On Dec 15, 1:19 pm, "Bartc" <····@freeuk.com> wrote:
> Mensanator wrote:
> > On Dec 15, 6:09 am, James Harris <··············@googlemail.com>
> > wrote:
> >> On 15 Dec, 07:59, ·············@teh.intarweb.org (Robert
> >> Maas,http://tinyurl.com/uh3t) wrote:
> >>> But last night I got to thinking about the problem of counting the
> >>> number of digits in the binary- (or other radix-) representation of
> >>> an integer in programming languages which don't have a built-in
> >>> function such as Common Lisp's INTEGER-LENGTH or Java's
> >>> java.math.BigInteger.bitCount to do that.
> >> So, there's a challenge here to find a fast solution. This could be
> >> interesting.
>
> > Naw, it's boring. Useful languages like Python have already
> > covered such trivial matters freeing you to think about
> > general solutions rather than just those that can be solved
> > in a closed form expression.
> >>>> # Python wrapper for the GMP math library
> >>>> # how many bits?
> >>>> print gmpy.numdigits(a,2)
> > 177149
>
> >>>> # how many base 10 digits?
> >>>> print gmpy.numdigits(a,10)
> > 53328
>
> So how does Python (or GMP) do it?

No clue. But you can go to the GMP web site and download
the sources if you're interested. They're in C, so you
can use them in C or C++. The gmpy site will give you
sources on how that C library is wrapped to give access
via Python.

Keep in mind that these functions assume you're using
the GMP multi-precision integers (and will coerce to
such if not), so you may need to know all about how that
is done, also.

> And how fast is it?

Pretty good, it's compiled C code. You have to expect SOME
processing time when dealing with numbers that are a quarter
million bits. When the answer to print gmpy.numdigits(e,10)
comes back before I get my hand off the return key, that's
usually good enough for me. That, of course, just encourages
me to use bigger numbers and more difficult problems. No matter
how fast, I can always ask a question that will make it bog
down. When your problems are exponential, you don't have to look
far for such questions. My function Type12MH can't even do the
1st 11th generation number because it has more than 2147483648
bits!

>
> These things are interesting to those not satisfied with shrink-wrapped
> ready-made solutions.

Sure, if THAT'S what your interested in. Personally, I've got bigger
fish to fry and need tools and I'm not particularly interested in
how the tool is made as long as it works.

I brought it up in case anyone else shared similar interest.

That's how _I_ learned of it, someone mentioned it in
a newsgroup. I'm returning the favor.

>
> --
> Bartc

From: Kaz Kylheku
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 21:23:52 +0000
Message-ID: <20081231125822.518@gmail.com>

On 2008-12-15, Bartc <··@freeuk.com> wrote:
>>>>> # Python wrapper for the GMP math library
>
>>>>> # how many bits?
>>>>> print gmpy.numdigits(a,2)
>> 177149
>>
>>>>> # how many base 10 digits?
>>>>> print gmpy.numdigits(a,10)
>> 53328
>
> So how does Python (or GMP) do it? And how fast is it?

The internal data type representing a bignum integer is probably an array, and
it probably keeps track of the number of bits. Or at least the number of words
in the array.

Suppose that (on a particular machine) a bignum is dynamic array of 32 bit
cells. You simply multiply one less than the array size by 32. Then you look
at the most significant word in the array and find the position of the
highest bit that is 1, and add that position to the total.

It would be egregiously stupid to implement this operation arithmetically
at the level where you have direct access to this information.

The number of binary digits can be converted to the number of decimal
digits using multiplication by a constant factor.

The number of binary digits in a positive integer N

   ceiling(log2(N))

The number of decimal digits is:

   ceiling(log10(N))

logarithm of that number, and the number of decimal digits is proportional to
the base 10 logarithm. The base 10 logarithm and base 2 logarithm of
a number are just a constant factor part: log10 x / log2 x  =~ 0.30103.

You can obtain an estimate that is good to within one digit, for the number of
base 10 digits by multiplying the binary size by 0.30103. This
can be done using scaled integer arithmetic, too.

The trick is refining this value to make it exact. the probem is that, for
instance, all numbers from 8 to 15 have the same binary length: 4.
Decimal numbers in that range have a length of either 1 or 2, but
if we multiply 4 by 0.30103 and take the ceiling, we get the worst-case
estimate of 2.

If you need this function for computing the amount of characters needed for
storing the decimal digit representation of a number, this approach is good
enough: multiply the number of words required to store the bignum by
a suitable factor, rounded up to the nearest integer.

From: Ben Bacarisse
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 12:27:09 +0000
Message-ID: <87vdtlaajm.fsf@bsb.me.uk>

·············@teh.intarweb.org (Robert Maas, http://tinyurl.com/uh3t)
writes:

> (Included another newsgroup because this topic has gone beyond just Lisp.
>  For non-Lispers joining now, INTEGER-LENGTH tells number of bits needed
>   to express a given integer, same as java.math.BigInteger.bitCount.
>  OP was confused because the function returns number of bits rather
>   than number of decimal digits the OP really wanted.)

Nit-pick: the Java equivalent of integer-length is bitLength, not
bitCount (both in  java.math.BigInteger).

-- 
Ben.

From: Madhu
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 12:36:23 +0000
Message-ID: <m37i618vjs.fsf@moon.robolove.meer.net>

[cll only]

* (Robert Maas, http://tinyurl.com/uh3t) <·················@Yahoo.Com> :
Wrote on Sun, 14 Dec 2008 23:59:48 -0800:

| But last night I got to thinking about the problem of counting the
| number of digits in the binary- (or other radix-) representation of
| an integer in programming languages which don't have a built-in
| function

[snip]

| A simple loop, doing division repeatedly until the quotient is less
| than the radix, would work, but for efficiency when processing very
| large integers, to avoid too many BigInteger multiplications or
| divisions

(defun integer-length*-1 (n radix &aux (ndigits 1) remainder)
  (declare (type (integer 1) n radix))
  (loop (multiple-value-setq (n remainder) (truncate n radix))
     (incf ndigits)
     (if (< n radix)
	 (return ndigits))))

| I got the idea of using repeated squaring to crudely bound the given
| integer, thus needing only log(nDigits) instead of nDigits
| multiplications to get to that point, then using a type of "binary
| search" algorithm to interpolate between two adjacent squarings to get
| the exact bounds between two consecutive powers of the radix, again
| using only log(nDigits) multiplications to do that half the
| algorithm. The nice thing about this algorithm is that to get the
| number of digits in some radix other than binary you use the exact
| same algorithm, just with a different starting point of that base
| instead of 2. Thus you compute the number of decimal (or other radix)
| digits *directly* in the first place, rather than computing the number
| of bits then converting to number of digits by multiplying by
| log[base](2) and rounding up or down depending on the boundary case.


(defun integer-length*-2 (n radix)  ; n radix > 2
  (loop for prev-x = 1 then x
	for prev-b = 0 then b
	for b from 1
	for x = radix then (* x x)
	until (>= x n)
	finally
	(return
	  (loop with ndigits =		;(expt radix (1- prev-b))
		(loop for p = 1 then (* p radix) 
		      repeat (1- prev-b) finally (return p))
		for y = prev-x then (* y radix)
		if (>= y n) return ndigits
		else do (incf ndigits)))))


(Did I get the corner cases right?)

This sort of 2-scale algorithm is usually always a great idea, and gives
Big Oh improvements which are breakthroughs (when the first paper is
published anyway :)

In practice I've always been troubled by the fact that the size of the
second loop (2^2^M)) -- (2^2^(M+1)) is actually pretty wide. 

Typically log(M) and M lie within a constant factor of each other. M is
already of the order of log_r(N) where N is the integer.  So I suspect
the advantage over INTEGER-LENGTH*-1 is questionable.

| It seems to me that this way of getting the result for arbitrary radix
| directly is more clean than that earlier algorithm of doing binary
| first then scaling to other base.

Yes, its a lot cleaner.  Thanks for the idea

--
Madhu


| (Of course if the number of bits is extremely efficient on your
|  particular CPU, then the binary-then-scale algorithm would probably
|  be faster.)

(snip)

| Before I post my code for the bodies of the two above function
| definitions, would anybody wish to try coding their own version, as
| an exercise for a student for example? Feel free to code in any
| language you wish. From my descriptions of the two algorithms given
| in the preface to each function, I would expect the code in just
| about any language to be pretty much the same, i.e. identical
| flowcharts, only the details of how to build a linked list (for the
| stack), and the exponent&power pairs to put on the stack, and how
| to deal with big integers in a generic way (built-in in Common Lisp
| and Java, but not so trivial in many other languages), such trivial
| implementational details, and the surface syntax, would differ from
| one language to another.

Of course I cheated using CL and LOOP, sorry :)
--
Madhu

From: Madhu
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 12:59:08 +0000
Message-ID: <m31vw98uhv.fsf@moon.robolove.meer.net>

[cll only, supersedes earlier post]

* (Robert Maas, http://tinyurl.com/uh3t) <·················@Yahoo.Com> :
Wrote on Sun, 14 Dec 2008 23:59:48 -0800:

| But last night I got to thinking about the problem of counting the
| number of digits in the binary- (or other radix-) representation of
| an integer in programming languages which don't have a built-in
| function

[snip]

| I got the idea of using repeated squaring to crudely bound the given
| integer, thus needing only log(nDigits) instead of nDigits
| multiplications to get to that point, then using a type of "binary
| search" algorithm to interpolate between two adjacent squarings to get
| the exact bounds between two consecutive powers of the radix, again
| using only log(nDigits) multiplications to do that half the
| algorithm. 

(defun integer-length*-2 (n radix)  ; n radix > 2
  (loop for prev-x = 1 then x
	for prev-b = 0 then b
	for b from 1
	for x = radix then (* x x)
	until (>= x n)
	finally
	(return
	  (loop with ndigits =		;(expt radix (1- prev-b))
		(loop for p = 1 then (* p radix) 
		      repeat (1- prev-b) finally (return p))
		for y = prev-x then (* y radix)
		if (>= y n) return ndigits
		else do (incf ndigits)))))


(Did I get the corner cases right?)

This sort of 2-scale algorithm is usually always a great idea, and gives
Big Oh improvements which are breakthroughs (when the first paper is
published anyway :)

In practice I've always been troubled by the fact that the size of the
second loop (2^2^M)) -- (2^2^(M+1)) is actually pretty wide. 

Typically log(M) and M lie within a constant factor of each other.  Note
M is already of the order of log_r(N) where N is the integer.  So it is
small. So I suspect the gains are small

| The nice thing about this algorithm is that to get the number of
| digits in some radix other than binary you use the exact same
| algorithm, just with a different starting point of that base instead
| of 2. Thus you compute the number of decimal (or other radix) digits
| *directly* in the first place, rather than computing the number of
| bits then converting to number of digits by multiplying by
| log[base](2) and rounding up or down depending on the boundary case.
| It seems to me that this way of getting the result for arbitrary radix
| directly is more clean than that earlier algorithm of doing binary
| first then scaling to other base.

Yes, its a lot cleaner

--
Madhu

From: Madhu
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 13:12:25 +0000
Message-ID: <m3wse17fba.fsf@moon.robolove.meer.net>

[Argh. Just Fixing the bug I had intended to fix earlier!]

* Madhu <··············@moon.robolove.meer.net> :
Wrote on Mon, 15 Dec 2008 18:29:08 +0530:

| (defun integer-length*-2 (n radix)  ; n radix > 2
|   (loop for prev-x = 1 then x
| 	for prev-b = 0 then b
| 	for b from 1
| 	for x = radix then (* x x)
| 	until (>= x n)
| 	finally
| 	(return
| 	  (loop with ndigits =		;(expt radix (1- prev-b))
^^^-------------------------------------------> this should be 2 instead radix
| 		(loop for p = 1 then (* p radix)
^^^---------------------------------------> this should be 2 instead radix
| 		      repeat (1- prev-b) finally (return p))
| 		for y = prev-x then (* y radix)
| 		if (>= y n) return ndigits
| 		else do (incf ndigits)))))

--
Madhu

From: Mark Wooding
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 22:58:39 +0000
Message-ID: <slrngkdo8v.k2f.mdw@metalzone.distorted.org.uk>

Robert Maas, http://tinyurl.com/uh3t <·············@teh.intarweb.org> wrote:

> A simple loop, doing division repeatedly until the quotient is less
> than the radix, would work, 

This isn't even a good way of doing a full binary-to-whatever
conversion.

> but for efficiency when processing very large integers, to avoid too
> many BigInteger multiplications or divisions, I got the idea of using
> repeated squaring to crudely bound the given integer, thus needing
> only log(nDigits) instead of nDigits multiplications to get to that
> point, then using a type of "binary search" algorithm to interpolate
> between two adjacent squarings to get the exact bounds between two
> consecutive powers of the radix,

Strange.  Here's my approach.  Theory first.  Let R be our radix, and n
the target (nonnegative) number: we want to count the number of radix-R
digits needed to represent n.

Suppose we're given that n < R^{2^{i+1}}.  Well, we know that B =
R^{2^i} has exactly 2^i + 1 digits, and is the smallest integer with
this property.  If n >= B then let c = 2^i and n' = floor(n / R^{2^i});
otherwise let c = 0 and n' = n.  Either way, because of the upper bound,
we have n' < R^{2^i}, so n' has at most 2^i digits; and indeed n has
exactly c more digits than n' has.

This suggests a two-pass approach.  Firstly, we build a table of pairs
2^i, R^{2^i} for i = 0, ..., while R^{2^i} >= n.  Then we can work back
down: given n, 2^i, R^{2^i} we can compute c and n' as above, count the
digits c' in n' recursively, and return c + c'.  The base case, when 0
<= n < R, is that n has one digit.  (If you want to argue that 0
requires no digits, that's fine: insert a special case accordingly.  I
don't believe it myself, though.)

This recursive algorithm is easy to convert into a purely iterative
algorithm, whereupon it looks rather like this (Common Lisp).

(defun count-digits (radix target)
  (loop with n = 1
        with stack = (loop with stack = nil
	     	      for i = 1 then (* 2 i)
		      for power = radix then (* power power)
		      until (> power target)
		      do (push (cons i power) stack)
		      finally (return stack))
        for (i . power) in stack
        when (<= power target)
        do (incf n i) (setf target (floor target power))
        finally (return n)))

(For Lisp hackers: interestingly, COLLECT is the wrong thing in the
inner LOOP.  We do actually want to collect the items backwards.)

> It seems to me that this way of getting the result for arbitrary radix
> directly is more clean than that earlier algorithm of doing binary
> first then scaling to other base.  (Of course if the number of bits is
> extremely efficient on your particular CPU, then the binary-then-scale
> algorithm would probably be faster.)

Counting bits is very easy.  Most bignum systems count the number of
words used in the representation anyway, so the bit count is the number
of bits in a word, times the number of words minus 1, plus the number of
bits needed to represent the most significant word.  Even without
processor support, the final count is a simple binary search which can
even be unrolled.  The messy part is the fiddling with logarithms.

Out of interest, the above algorithm is fairly similar to a rather
better (recursive) way to do binary-to-whatever conversion.  It also has
the useful property that it generates the digits in the right order,
unlike the naive repeated-division/remainder-by-R loop.  Indeed, knowing
the number of digits is essential in this algorithm, so that one
generates the correct number of leading zeroes in intermediate results.
There's also a whatever-to-binary algorithm which uses similar
principles but seems much easier to write with an explicit stack.

-- [mdw]

From: Kaz Kylheku
Subject: Re: (NEWBIE) Integer-length?
Date: Mon, 15 Dec 2008 23:26:13 +0000
Message-ID: <20081231153457.454@gmail.com>

On 2008-12-15, Mark Wooding <···@distorted.org.uk> wrote:
> Counting bits is very easy.  Most bignum systems count the number of
> words used in the representation anyway, so the bit count is the number
> of bits in a word, times the number of words minus 1, plus the number of
> bits needed to represent the most significant word.  Even without
> processor support, the final count is a simple binary search which can
> even be unrolled.

For some ideas about the latter, see here:

http://www-graphics.stanford.edu/~seander/bithacks.html

Scroll down to section ``Finding integer log base 2 of an integer (aka the
position of the highest bit set)''

From: Madhu
Subject: Re: (NEWBIE) Integer-length?
Date: Tue, 16 Dec 2008 00:08:33 +0000
Message-ID: <m3skop6kxq.fsf@moon.robolove.meer.net>

* Mark Wooding <··················@metalzone.distorted.org.uk> :
Wrote on Mon, 15 Dec 2008 22:58:39 +0000 (UTC):

| Robert Maas, http://tinyurl.com/uh3t <·············@teh.intarweb.org> wrote:
|
|> A simple loop, doing division repeatedly until the quotient is less
|> than the radix, would work, 
|
| This isn't even a good way of doing a full binary-to-whatever
| conversion.
|
|> but for efficiency when processing very large integers, to avoid too
|> many BigInteger multiplications or divisions, I got the idea of using
|> repeated squaring to crudely bound the given integer, thus needing
|> only log(nDigits) instead of nDigits multiplications to get to that
|> point, then using a type of "binary search" algorithm to interpolate
|> between two adjacent squarings to get the exact bounds between two
|> consecutive powers of the radix,
|
| Strange.  Here's my approach.

UIM your approach is identical to what Robert outlined, modulo
some implementation difference.

--
Madhu

From: Mark Wooding
Subject: Re: (NEWBIE) Integer-length?
Date: Tue, 16 Dec 2008 01:40:36 +0000
Message-ID: <slrngke1ok.k2f.mdw@metalzone.distorted.org.uk>

Madhu <·······@meer.net> wrote:

> UIM your approach is identical to what Robert outlined, modulo
> some implementation difference.

I may have misunderstood Robert's description, but I think he described
a different algorithm.  The first part, finding i such that

  R^{2^i} <= n < R^{2^{i+1}}

is common to both, but Robert describes his second part as
`interpolation' and explains that it requires more multiplications.  At
a guess, it works like this:

(defun count-digits-maas (radix target)
  (loop with stack = (loop with stack = nil
  	     	     	   for i = 1 then (* 2 i)
			   for power = radix then (* power power)
			   until (> power target)
			   do (push (cons i power) stack)
			   finally (return stack))
        with n = 1
        with guess = 1
        for (i . power) in stack
        for try = (* guess power)
        when (<= try target)
        do (setf guess try n (+ n i))
        finally (return n)))

which may be faster because bignum is frequently less efficient than
multiplication, but it does more of them -- approximately twice as many.
This seems similar to the method for solving a superincreasing knapsack
problem: we know that the count we want is the sum of some subset of the
2^i, and that the product of the corresponding R^{2^i} is the largest
possible not bigger than n.  The problem is easy if we start large and
work down due to the superincreasing nature of the R^{2^i} factors; so
simply take all of them which don't take us over the limit.

(Sorry if I seemed to disparage Robert's algorithm; I was expressing
surprise at the technique before I'd really understood what was going
on.)

Anyway, this is clearly different from my algorithm, which uses division
and may perhaps be more readily understood.

-- [mdw]

From: Madhu
Subject: Re: (NEWBIE) Integer-length?
Date: Tue, 16 Dec 2008 02:13:03 +0000
Message-ID: <m3k5a07tqo.fsf@moon.robolove.meer.net>

* Mark Wooding <··················@metalzone.distorted.org.uk> :
Wrote on Tue, 16 Dec 2008 01:40:36 +0000 (UTC):

| Madhu <·······@meer.net> wrote:
|
|> UIM your approach is identical to what Robert outlined, modulo
|> some implementation difference.
|
| I may have misunderstood Robert's description, but I think he described
| a different algorithm.  The first part, finding i such that
|
|   R^{2^i} <= n < R^{2^{i+1}}
|
| is common to both, but Robert describes his second part as
| `interpolation' and explains that it requires more multiplications.  At
| a guess, it works like this:

Well TBH he did not post the steps for the second part but just said it
,----
| ;Requires approximately log2(N) incremental multiplications, where N is
| ; number of digits. Let me know if you need a clue as to the algorithm,
| ; because it might not be obvious from my description earlier.
`----

What I understood of what robert  suggested:
<···················@moon.robolove.meer.net>,
and its parent article

| (defun count-digits-maas (radix target) ...)

[snip]

[At a glance this seems structurally similar (modulo details).  You
 really are using just the first two elements of the (reversed) stack
 you build. I dont even build a stack.]

| which may be faster because bignum is frequently less efficient than
| multiplication, but it does more of them -- approximately twice as many.
| This seems similar to the method for solving a superincreasing knapsack
| problem: we know that the count we want is the sum of some subset of the
| 2^i, and that the product of the corresponding R^{2^i} is the largest
| possible not bigger than n.  The problem is easy if we start large and
| work down due to the superincreasing nature of the R^{2^i} factors; so
| simply take all of them which don't take us over the limit.

Well, I'm not sure that makes a difference to the structure of the
algorithm.  R^2^i < n < R^2^{i+1} and the range is pretty wide. if n is
closer to the left bound, then searching up will take fewer steps.

| (Sorry if I seemed to disparage Robert's algorithm; I was expressing
| surprise at the technique before I'd really understood what was going
| on.)
|
| Anyway, this is clearly different from my algorithm, which uses division
| and may perhaps be more readily understood.

Yes, through my tinted spectacles, I was just looking at the
similarities more than the differences: One uses multiplication starting
from one end while the other uses division from the other end.
Structurally they appeared to be the same technique, so I thought I
should mention it, [and see if I was missing something :)]

--
Madhu

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: (NEWBIE) Integer-length?
Date: Tue, 16 Dec 2008 19:43:55 +0000
Message-ID: <REM-2008dec16-001@Yahoo.Com>

> From: Madhu <·······@meer.net>

> |> UIM your approach is identical to what Robert outlined, modulo
> |> some implementation difference.
> | I may have misunderstood Robert's description, but I think he described
> | a different algorithm.  The first part, finding i such that
> |   R^{2^i} <= n < R^{2^{i+1}}
> | is common to both,

Yes, exactly, although during part 1 I build an explicit stack (as
linked list) which will be used to make part 2 efficient.

> | but Robert describes his second part as
> | `interpolation' and explains that it requires more multiplications.

OK, I think I see where I didn't word my function-comment well and
might have caused you to get confused.  I did use the term
"interpolation" there. But earlier in my description of my
brainstorming prior to writing the code I clearly said "then using
a type of "binary search" algorithm to interpolate between two
adjacent squarings to get the exact bounds between two consecutive
powers of the radix, again using only log(nDigits) multiplications
to do that half the algorithm". So if you looked carefully at the
comments accompanying the two function prototypes, but didn't so
carefully read the earlier brainstorming comments, you might have
misuderstood what kind of "interpolation" I was talking about. I'll
now edit the comment in my source to be more clear there, so that
my comment can stand alone with the earlier text ... done:
;Unwind the stack to efficiently perform binary search between the lower and
; upper bounds (the last two items on the stack) to thereby narrow bound of
; power (of the given radix) around given integer.
That's more verbose than before, but I think it makes it more clear, OK?

Using your 'i' instead of my 'M' notation:
  R^{2^i} <= n < R^{2^{i+1}}
Note that the log(i) items on the stack can be used to synthesize
*any* power of the base all the way from R^{2^i} to R^{2^{i+1}}
because multiplication is associative and commutative, and
taking powers distribute over addition of exponents:
  (R^i1 * R^i2) * R^i3 = R^i1 * (R^i2 * R^i3)
  R^i1 * R^i2 = R^i2 * R^i1
  R^(i1+i2) = R^i1 * R^i2
In this way, the items on the stack, which were the individual
items R^{2^k} for various values of k from 0 thru i+1, can be
multiplied together in arbitrary combinations, without worrying
about sequence or association, to synthesize R^(arbitraryInteger)
by using the binary representation of arbitraryInteger to select
which of those stack elements to use and which not to use. Thus
linear binary search on the exponent arbitraryInteger is
accomplished by synthesizing R^{2^arbitraryInteger}. What's
actually kept on stack, and synthesized during this binary search,
are actually (arbitraryInteger . 2^arbitraryInteger) pairs, the
exponent and the corresponding power. The powers are used during
the binary tests that control which end of the interval will be
shrunk at each step of binary search, either low<-mid or hi<-mid,
and the exponents are computed but ignored until the very end when
the exponent is the return value (number of digits). (Actually the
way I have the code organized, the second function returns *both*
the lower and upper bound, where upper bound has number of digits
and lower bound has number of digits not counting the high-order
digit. A higher level function would then discard the part not
actually wanted, but a different higher level function might return
more of the info.)

> [At a glance this seems structurally similar (modulo details).  You
>  really are using just the first two elements of the (reversed) stack
>  you build. I dont even build a stack.]

Looking just at that remark, I don't see how to achieve log(N)
multiplications and/or divisions in part 2 unless the entire stack
was somehow retained. Note that a *recursive* algorithm for the
whole ball of wax would use the execution stack to both build and
unwind what I did explicitly as a linked list. The stack would be
built up one step at a time while going into recursion, and at
bottom of recursion when the upper bound was finally found the
execution stack would effectively be the same as my linked-list
stack after part 1. Then on the way back out of recursion, at each
level one iteration of the binary search would be preformed as the
stack is unwound, thus narrowing the bounds on the exponent by a
factor of 2. When it gets out of the top level of recursion, it has
been narrowed all the way down to just a single step from k to k+1
where 2^i <= k < k+1 <= 2^(i+1). The pair of pairs ((k . 2^k) (k+1
. 2^(k+1))) is returned from "part 2", the unwinding of the stack.
A higher-level function would then discard everything except k+1
which is the desired number of digits. (A different higher-level
function, which returns number of significant bits, would return k.
For example if we needed to convert to normalized floating point
value, where the high-order bit which is always "1" is *not*
stored, only the remaining k bits are stored in the floating-point
representation.)

Note that at each step of the binary search, we are effectively
splitting in half the range of R^(2^k1) to R^(2^k2), where both k1
and k2 are even, so the midpoint is an integer (k1+k2)/2.
When we pop the next item off the stack, it is exactly the pair
 ( (k2-k1)/2 . R^(k2-k1)/2 )   ;Note *subtrction* here, half-width of interval
The midpoint we need to compute, for binary-search test, is
 ( (k1+k2)/2 . R^(k1+k2)/2 )
Given the range (interval) we already have, and this popped stack
item, there are two ways to generate the desired midpoint,
by multiplication of left end times half-interval
 R^(2^k1) * R^(k2-k1)/2
or by division of right end over half-interval.
 R^(2^k2) / R^(k2-k1)/2
I chose multiplication because it's usually faster than division,
and also because the numbers involved with multiplication from left
end are smaller than the numbers involved with division from right
end.

> Well, I'm not sure that makes a difference to the structure of the
> algorithm.  R^2^i < n < R^2^{i+1} and the range is pretty wide. if n is
> closer to the left bound, then searching up will take fewer steps.

If you have very good reason to expect data which is highly biassed
towards smaller values, hence left end of range, then linear search
from left end is faster than binary search. But in the more general
case where you are expected to deal with an extremely wide range of
different numbers and nobody tells you in advance how many digits
to expect, it seems prudent to make asymptotic behaviour optimal,
which does a little bit more work than necessary for small cases
but protects from taking fifty years if somebody ever gives you a
really huge number.

One well-known case where data is *known* to be highly biassed to
make linear search optimal is natural-language histograms, where E
T O A N ... are so very common in null context, and U is common
after Q, and in most other contexts there are very very few letters
that are likely next, that it's often faster to keep your histogram
sorted in descending order by frequency of occurance and do linear
search. But nothing like that was the premise in this thread about
writing a utility that took an arbitrary integer and returned
number of digits per some radix (lacking some underlying access to
the internal representation where the number of word-size blocks is
"obvious" and can be read out directly, and then the exact bit
position is a little bit of extra code, and the log(R)/log(2)
conversion factor can then be used; or with Java's **decimal**
bignums, you get the number of characters of decimal expression
directly from the internal represention).

Given that in any "decent" programming language, the underlying
language support provides a number-of-bits or number-of-digits
utility, why would you ever use a language where you needed an
algorithm such as I invented?? Well, maybe some "cruddy" language
provides some *other* feature you really need, such as high-level
SQL, or device control, or a really good package for solving
nonlinear optimization, and it's more efficient to code the
number-of-digits algorithm directly in that language than to do a
foreign function call to some hand-coded machine-language
subroutine that magically knows the internal representation of
big-integers, or even worse need to *convert* the big-integer from
your "cruddy" language to another language using the FFI interface,
just to get back the number of digits. Maybe the internal
representation isn't advertised, so you *can't* write the assembly
language code to compute number of bits or digits. Rather than
spend five years translating the entire FORTRAN library to Lisp or
Java, or suffer FFI, you use my algorithm right there within the
"cruddy" language and lose a few seconds of CPU time.

> Yes, through my tinted spectacles, I was just looking at the
> similarities more than the differences: One uses multiplication
> starting from one end while the other uses division from the other
> end.

Oops, my remark about multiplication being faster than division
should have been here instead of above.

> Structurally they appeared to be the same technique, so I thought
> I should mention it, [and see if I was missing something :)]

I didn't fully decode the algorithm the other person submitted, but
I did notice two things:
- Rather than use PROG with GO as I did, he used LOOP. (Just syntactic sugar.)
- My algorithm had a single loop for each of part 1 (build stack
   and produce crude bounds) and part 2 (unwind stack to effect
   binary search to narrow bounds), whereas *his* algorithm had a
   **double** (nested) loop for part 2 (part 1 looked identical to
   mine except for syntactic sugar).

Anyway, I posted this followup at this point in the thread because
I felt it time I reveal my code (with that one comment now changed,
as I noted above, but no code changes at all). I've also written
the higher-level function, which I hadn't yet then:

;Started 2008.Dec.13:
;Efficiently determining the number of digits in a positive integer

;Given a positive integer, and a radix (integer at least 2):
;Repeatedly square the radix thus doubling the exponent in power=radix^exp,
; keeping track of the exponent and power in parallel in a stack.
; Stop when the power exceeds the given integer.
;Return the stack at that point, where the top pair on the stack contains
; first power that exceeds given integer and all other pairs are smaller/equal.
(defun posint+radix-to-gross-powerbound-stack (x radix)
  (prog (stk expnew pownew explo powlo)
    (setq stk (list)) (setq expnew 1) (setq pownew radix)
 lp1
    (push (cons expnew pownew) stk)
    (when (< x pownew) (return stk))
    (setq explo expnew) (setq powlo pownew)
    (setq expnew (* explo 2)) (setq pownew (* powlo powlo))
    (go lp1)
    ))

;Given positive integer, and stack of squarings from some radix to first past
; that given integer (result from posint+radix-to-gross-powerbound-stack):
;Unwind the stack to efficiently interpolate between last two exponents
; to narrow bound of power around given integer.
;Return list of two (exp . pwr) pairs whose exp differs by 1, the first with
; pwr<=posint, second with posint<pwr. The CAR of the latter equals number of
; digits.
(defun posint+stksqrs-narrow-power (posint stksqrs)
  (prog (prhi prlo fac expmid pwrmid prmid)
    (setq prhi (pop stksqrs))
    (setq prlo (pop stksqrs))
  lp(when (null stksqrs) (return (list prlo prhi)))
    (setq fac (pop stksqrs))
    (setq expmid (+ (car prlo) (car fac)))
    (setq pwrmid (* (cdr prlo) (cdr fac)))
    (setq prmid (cons expmid pwrmid))
    (if (< posint pwrmid) (setq prhi prmid)
        (setq prlo prmid))
    (go lp)))

;Given positive integer, and radix:
;Efficiently compute number of digits
(defun posint+radix-to-ndigits (posint radix)
  (prog (stk lohi ndig)
    (unless (< 0 posint) (error "posint not a positive integer"))
    (unless (< 1 radix) (error "radix not at least 2"))
    (setq stk (posint+radix-to-gross-powerbound-stack posint radix))
    (setq lohi (posint+stksqrs-narrow-power posint stk))
    (setq ndig (caadr lohi))
    (return ndig)))

I could refactor the syntax of the first two fuctions to use LOOP
instead of PROG/GO, where LOOP expands into PROG anyway, but why
bother?

I could refactor the syntax of the third function to simply nest
function calls instead of assigning named PROG variables, but why
bother? I think for most people (except extreme Lispers), separate
sequential statements with assignments are easier to understand
than highly nested expressions. One of the reasons is that the
assignments to meaningfully-named variables provides an extra level
of self-documentation, and the other is that it does one thing at a
time instead of conflating everything together at one time, thereby
avoiding overloading the human mind.
But it's just a matter of style, syntactic sugar, not algorithm.
FORTH programmers carry the one-thing-at-a-time avoid-overload idea
to the extreme, of course. They might actually be ""right"".

Exercise for the extreme Lisper: Convert my two separate parts of
the algorithm, building stack and unwinding stack, into a single
*recursive* function as I outlined above (building stack to extend
the upper bound going into recursion, and unwinding stack to narrow
the interval on the way back out). Of course neither PROG nor LOOP
would be used when organizing the code that way.

(Analagous exercise for Java or C++ or Mathematica etc. advocate.
 Heck, I think it would be neat to see the recursive version expressed
 in FORTH, assuming of course there's a FORTH BigInteger library available!)

Exercise (mission impossible?) for the extreme Lisp teacher/mentor:
Try to explain that recursive function to a complete newbie who
has never coded any non-trivial algorithm in any language, but
who is good at math so *would* understand the algorithm itself
right off the bat if explained coherently/meaningfully/conceptually.

OT: My FORTH challenge above triggered a thought:
Reference count garbage collection would be very clean in Forth,
because FORTH is stack-based rather than local-frame-based.
We just need to define four words:
DupRef = Duplicate the top item on the stack and increment reference count.
PopRef = Pop top item off stack;
         decrement reference count;
         recursively garbage collect if the count has reached zero.
DiscardLink = fetch the pointer (link) from a "place" within a cell;
              call PopRef;
              set the contents of the "place" to NULL.
EstablishLink = call DiscardLink (for safety!!);
                store the newly given pointer into that place;
                discard the given target pointer without calling PopRef
                 thus effectively doing a funds-transfer from the stack count
                 to the object.place-link-object count.
Thus the reference count always equals the number of stack pointers
plus the number of cross-pointers from other objects.
That would completely eliminate the type of bug where you forgot to
increment or decrement a reference count, or you decremented the
reference count but forgot to set the pointer to NULL and
mistakenly tried to dereference that pointer later in that stack
frame where the local variable containing the pointer is still accessible.

(Even more than before it seems now that FORTH would be *great*
 implementation language for a new bootstrap version of Lisp using
 reference counting.)

VOT: I wrote MRPP3 = POX using PDP-10 assembly language,
implementing a reference-count system from scratch, but *failed* to
have good concept of what exactly should count as another reference
hence where in my code I need to expend the the extra CPU cycles to
bump the count, resulting in a memory leak bug that I never was
able to find. (A gross count of cells allocated before and after a
task that *should* have resulted in zero net allocation, showed a
nonzero difference, that grew with the number of iterations.) Too
bad I didn't have today's FORTH/stack realization in 1975 when I
wrote MRPP3, or I would written MRPP3 like a FORTH application with
the above four reference-count-maintenance primitives and never
have to suffer such bugs. Live and learn...

From: Mark Wooding
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 18 Dec 2008 00:35:57 +0000
Message-ID: <slrngkj6nd.k2f.mdw@metalzone.distorted.org.uk>

Robert Maas, http://tinyurl.com/uh3t <·············@teh.intarweb.org> wrote:

> Looking just at that remark, I don't see how to achieve log(N)
> multiplications and/or divisions in part 2 unless the entire stack
> was somehow retained. 

I didn't present my recursive version in my original article because it
involved multiple values, and was crossposted to a general programming
group (where I actually found it) who might not appreciate MULTIPLE-
VALUE-BIND (or even LABELS for that matter) interesting parts of Lisp
getting in the way of an interesting algorithm.  Now we're just in
comp.lang.lisp, I shall post it.  My original version used division;
I've also written a purely multiplicative version.

Purists may also appreciate the fact that these algorithms are purely
functional and don't use LOOP.

(defun count-digits-recursively/division (target radix)
  (labels ((recurse (i power)
	     (if (< target power)
		 (values 0 target)
		 (multiple-value-bind (count remainder)
		     (recurse (* 2 i) (* power power))
		   (if (< remainder power)
		       (values count remainder)
		       (values (+ count i) (floor remainder power)))))))
    (1+ (recurse 1 radix))))

(defun count-digits-recursively/multiplication (target radix)
  (labels ((recurse (i power)
	     (if (< target power)
		 (values 0 1)
		 (multiple-value-bind (count guess)
		     (recurse (* 2 i) (* power power))
		   (let ((try (* guess power)))
		     (if (< target try)
			 (values count guess)
			 (values (+ count i) try)))))))
    (1+ (recurse 1 radix))))

I've attempted to preserve the names of the variables from my earlier
articles, and show the similarity of structure between the two.

In terms of performance, it's clear that there's one big squaring
operation (which might be faster than general multiplication if the
compiler notices) and at most one a big division (FLOOR REMAINDER POWER)
or general multiplication (* GUESS POWER) in each recursive call, and
there are at most 1 + log_2 log_R n recursive calls.

> - My algorithm had a single loop for each of part 1 (build stack
>    and produce crude bounds) and part 2 (unwind stack to effect
>    binary search to narrow bounds), whereas *his* algorithm had a
>    **double** (nested) loop for part 2 (part 1 looked identical to
>    mine except for syntactic sugar).

You misread.  My algorithm effectively had two consecutive LOOPs, though
it was written as

  (loop with stack = (loop ...)
        ...)

which may have confused you.  This is morally equivalent to

  (let ((stack (loop ...)))
    (loop ...))

though once I've decided to use LOOP, I may as well avail myself of any
feature that looks handy...

-- [mdw]

From: John Thingstad
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 18 Dec 2008 16:53:03 +0000
Message-ID: <op.umdaipwxut4oq5@pandora.alfanett.no>

P� Thu, 18 Dec 2008 01:35:57 +0100, skrev Mark Wooding  
<···@distorted.org.uk>:

> Robert Maas, http://tinyurl.com/uh3t <·············@teh.intarweb.org>  
> wrote:
>
>> Looking just at that remark, I don't see how to achieve log(N)
>> multiplications and/or divisions in part 2 unless the entire stack
>> was somehow retained.
>

Why not just print the number as a string and take it's length?

(defun digit-number (number &optional (radix 10))
   (let ((*print-base* radix))
     (length (with-output-to-string (stream) (princ number stream)))))

(digit-number 1345)
> 4

--------------
John Thingstad

From: Raffael Cavallaro
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 18 Dec 2008 20:07:12 +0000
Message-ID: <2008121815071227544-raffaelcavallaro@pasespamsilvousplaitmaccom>

On 2008-12-18 11:53:03 -0500, "John Thingstad" <·······@online.no> said:

> Why not just print the number as a string and take it's length?

<http://groups.google.com/group/comp.lang.lisp/msg/72f758ee7bfe591e?hl=en>

i.e., this thread is now circular.
-- 
Raffael Cavallaro, Ph.D.

From: Kaz Kylheku
Subject: Re: (NEWBIE) Integer-length?
Date: Fri, 19 Dec 2008 00:08:42 +0000
Message-ID: <20090103211750.654@gmail.com>

On 2008-12-18, Raffael Cavallaro
<················@pas.espam.s.il.vous.plait.mac.com> wrote:
> On 2008-12-18 11:53:03 -0500, "John Thingstad" <·······@online.no> said:
>
>> Why not just print the number as a string and take it's length?
>
><http://groups.google.com/group/comp.lang.lisp/msg/72f758ee7bfe591e?hl=en>
>
> i.e., this thread is now circular.

#1=(:headers
    ("Newsgroups: comp.lang.lisp"
     "From: ..."
     "Subject ...")
    ...
    (:followups (:headers "Newsgroups: comp.lang.lisp" ... ) 
                ... 
                ( ... ( ... 
		       :followups (#1#)))))

:)

From: Mark Wooding
Subject: Re: (NEWBIE) Integer-length?
Date: Thu, 18 Dec 2008 18:09:18 +0000
Message-ID: <slrngkl4ee.k2f.mdw@metalzone.distorted.org.uk>

John Thingstad <·······@online.no> wrote:

> Why not just print the number as a string and take it's length?

Because that requires much more computation and memory for large
numbers.  Because you might want to know how long the string will be in
advance, order to allocate the output buffer.  And, perhaps most
importantly for this thread, because it's not as interesting!

-- [mdw]

From: Madhu
Subject: Re: (NEWBIE) Integer-length?
Date: Fri, 19 Dec 2008 09:31:29 +0000
Message-ID: <m3skokcxzi.fsf@moon.robolove.meer.net>

* Mark Wooding <··················@metalzone.distorted.org.uk> :
Wrote on Thu, 18 Dec 2008 18:09:18 +0000 (UTC):

| John Thingstad <·······@online.no> wrote:
|
|> Why not just print the number as a string and take it's length?
|
[added context]
|>(defun digit-number (number &optional (radix 10))
|>  (let ((*print-base* radix))
|>    (length (with-output-to-string (stream) (princ number stream)))))
|>

| Because that requires much more computation and memory for large
| numbers.  Because you might want to know how long the string will be in
| advance, order to allocate the output buffer.  And, perhaps most
| importantly for this thread, because it's not as interesting!

Also, most importantly, it wont work for RADIX > 36   :)

--
Madhu

From: Kaz Kylheku
Subject: Re: (NEWBIE) Integer-length?
Date: Fri, 19 Dec 2008 19:55:44 +0000
Message-ID: <20090104180352.930@gmail.com>

On 2008-12-19, Madhu <·······@meer.net> wrote:
> Also, most importantly, it wont work for RADIX > 36   :)

There is no such radix, since there are no more letters.

A condition of type IMPOSSIBLE-RADIX should be signaled then.

Or wait, I could be wrong. Can you use Greek letters? Hmm.

From: Mark Wooding
Subject: Re: (NEWBIE) Integer-length?
Date: Sat, 20 Dec 2008 11:55:22 +0000
Message-ID: <slrngkpn9a.u5t.mdw@metalzone.distorted.org.uk>

Kaz Kylheku <········@gmail.com> wrote:
> On 2008-12-19, Madhu <·······@meer.net> wrote:
> > Also, most importantly, it wont work for RADIX > 36   :)
> 
> There is no such radix, since there are no more letters.

No wonder the Babylonian civilization failed: they were trying to use 60
as a radix, and didn't realize it was impossible.

> A condition of type IMPOSSIBLE-RADIX should be signaled then.

Yep.  And Babylon had hanging gardens but no HANDLER-BIND.

> Or wait, I could be wrong. Can you use Greek letters? Hmm.

There should be cuneiform support in FORMAT!

-- [mdw]

From: Raffael Cavallaro
Subject: Re: (NEWBIE) Integer-length?
Date: Sat, 20 Dec 2008 18:19:45 +0000
Message-ID: <2008122013194516807-raffaelcavallaro@pasespamsilvousplaitmaccom>

On 2008-12-20 06:55:22 -0500, Mark Wooding <···@distorted.org.uk> said:

> There should be cuneiform support in FORMAT!

<http://article.gmane.org/gmane.lisp.openmcl.devel/1763>
-- 
Raffael Cavallaro, Ph.D.

From: Kaz Kylheku
Subject: Re: (NEWBIE) Integer-length?
Date: Sat, 20 Dec 2008 20:32:20 +0000
Message-ID: <20090105201327.474@gmail.com>

On 2008-12-20, Mark Wooding <···@distorted.org.uk> wrote:
> Kaz Kylheku <········@gmail.com> wrote:
>> On 2008-12-19, Madhu <·······@meer.net> wrote:
>> > Also, most importantly, it wont work for RADIX > 36   :)
>> 
>> There is no such radix, since there are no more letters.
>
> No wonder the Babylonian civilization failed: they were trying to use 60
> as a radix, and didn't realize it was impossible.

Oh no, they had it broken into alternating 6 and 10. It's not really base 60.
Otherwise you could claim that we have base 1000 whenever we separate groups of
three digits with a comma. It's real in a way, but on the other hand few
people, if anyone at all, memorize a 1000 x 1000 multiplication table.

Their problem is that they had too many languages, like what
we have in computer science today.

Anyone remember that famous old cartoon of the tower of Babel, but with
programming languages?

Here is one copy I found using Google's image search:

http://blog.tilos.hu/irodalmile2/Babel.gif

Now did this originally appear on the the cover of the book, Programming
Languages: History and Fundamentals, by Jean E. Sammet?  Or did that cover
borrow it from somewhere? I wonder.

I have a hunch I might have first seen this in Creative Computing?

I had little pile of that magazine from the 1970's (which was actually before
my time in computing).  Don't know what happened to that, too bad.

By the way, you'd think 1969 would have been a little bit early to be
writing a book on the history of programming languages, hahaha.

It's like writing a biography about someone who is barely a teenager.

Still, there was already enough of them to fill that cartoon tower.

From: Andrew Reilly
Subject: language and program nostalgia (was Re: (NEWBIE) Integer-length?)
Date: Sat, 20 Dec 2008 22:43:28 +0000
Message-ID: <6r5aofFfon28U1@mid.individual.net>

On Sat, 20 Dec 2008 20:32:20 +0000, Kaz Kylheku wrote:

> Here is one copy I found using Google's image search:
> 
> http://blog.tilos.hu/irodalmile2/Babel.gif
> 

> Still, there was already enough of them to fill that cartoon tower.

There are a lot of languages there that I've never heard of (which isn't 
saying much, I guess), but it makes me wonder: how many of these now 
don't run on *any* working hardware?  Were the programs written in them 
translated into something else, or abandoned as their business reason for 
existing shifted beyond their ability to be adapted?

I know that I've written programs that now don't run on anything because 
no examples of the original hardware exist, and they were tied too 
closely to the peculiarities of the hardware to be worth "porting".  I 
keep listings of some of the particularly neat ones, hoping that I'll 
find time to do something useful with them again...

Cheers,

-- 
Andrew

From: Pascal J. Bourguignon
Subject: Re: language and program nostalgia
Date: Sun, 21 Dec 2008 11:02:22 +0000
Message-ID: <87zlip3i69.fsf@informatimago.com>

Andrew Reilly <···············@areilly.bpc-users.org> writes:

> On Sat, 20 Dec 2008 20:32:20 +0000, Kaz Kylheku wrote:
>
>> Here is one copy I found using Google's image search:
>> 
>> http://blog.tilos.hu/irodalmile2/Babel.gif
>> 
>
>> Still, there was already enough of them to fill that cartoon tower.

It's quite recent, since it has UNICODE at the bottom right. ;-)

> There are a lot of languages there that I've never heard of (which isn't 
> saying much, I guess), but it makes me wonder: how many of these now 
> don't run on *any* working hardware?  Were the programs written in them 
> translated into something else, or abandoned as their business reason for 
> existing shifted beyond their ability to be adapted?

As languages, I'd bet most of them are still running on
"contemporaneous" hardware, either inside virtual machines,
re-implementation, or in ulterior versions of the language with some
backward compatibility layer.

The programs themselves may not all work as is, since they may depend as
you noted, on some specific hardware or device (but they may also be
emulated, or the program may be modified).

Indeed, useful programs were ported or translated on the run.  Smart
businesses learned soon to use perene programming languages such as
COBOL. 

A lot of the less common languages are "experimental" languages or
domain specific languages.  They are developed at one site, and used for
some time only there.  For experimental languages developed in
universities, I'd bet sources running on some unix are (or were) often
available.

The situation may be difficult for old languages that were written in
assembler, on specific hardware.  But anything a tad more general and
useful (eg. Algol) would have been ported to later systems, rewritten in
some other programming language or in themselves, and therefore may
still live on, be compiled/ported on current systems.  Eg. there are
several Algol 68 compiler running on unix available, and if you want
Algol60, there's sources targetting MS-DOS that you could run in dosemu
on unix or that you could port to unix, which shouldn't be too hard.

In the next to worse situation, you have implementations such as LISP
1.5, written in assembler of some old hardware.  Fortunately in this
case, the hardware is well documented, emulators are written, and there
are tools (assembler, linkers) rewritten or recovered to compile the
sources.

The creation of emulators & virtual machines is almost as old as
computers, it's actually a fundamental principle of computing.  For
example (but it's probably not the oldest), the 1401 was emulated on
360, to allow people invested in 1401 autocoder programs to switch over
to 360 smoothly.

Have a look at http://www.bitsavers.org/

> I know that I've written programs that now don't run on anything because 
> no examples of the original hardware exist, and they were tied too 
> closely to the peculiarities of the hardware to be worth "porting".  I 
> keep listings of some of the particularly neat ones, hoping that I'll 
> find time to do something useful with them again...

Hardware can be emulated.  "Interesting" hardware is emulated.

On a distribution like gentoo, there's about a hundred of programming
languages available, and more than a hundred of emulators.  This is not
counting all the less common programming languages and emulators you can
donwload from the Internet.  Use google ;-)

-- 
__Pascal Bourguignon__

From: Christopher Koppler
Subject: Re: language and program nostalgia
Date: Sun, 21 Dec 2008 20:50:19 +0000
Message-ID: <87554c35-ca16-40c1-945b-98def8cd899f@40g2000prx.googlegroups.com>

On Dec 21, 12:02 pm, ····@informatimago.com (Pascal J. Bourguignon)
wrote:
> Andrew Reilly <···············@areilly.bpc-users.org> writes:
> > On Sat, 20 Dec 2008 20:32:20 +0000, Kaz Kylheku wrote:
>
> >> Here is one copy I found using Google's image search:
>
> >>http://blog.tilos.hu/irodalmile2/Babel.gif
>
> >> Still, there was already enough of them to fill that cartoon tower.
>
> It's quite recent, since it has UNICODE at the bottom right. ;-)

http://hopl.murdoch.edu.au/showlanguage.prx?exp=29&language=UNICODE

From: Pascal J. Bourguignon
Subject: Re: language and program nostalgia
Date: Mon, 22 Dec 2008 01:28:48 +0000
Message-ID: <87r6412e27.fsf@informatimago.com>

Christopher Koppler <········@chello.at> writes:

> On Dec 21, 12:02�pm, ····@informatimago.com (Pascal J. Bourguignon)
> wrote:
>> Andrew Reilly <···············@areilly.bpc-users.org> writes:
>> > On Sat, 20 Dec 2008 20:32:20 +0000, Kaz Kylheku wrote:
>>
>> >> Here is one copy I found using Google's image search:
>>
>> >>http://blog.tilos.hu/irodalmile2/Babel.gif
>>
>> >> Still, there was already enough of them to fill that cartoon tower.
>>
>> It's quite recent, since it has UNICODE at the bottom right. ;-)
>
> http://hopl.murdoch.edu.au/showlanguage.prx?exp=29&language=UNICODE

I guess with more than 10,000 programming languages, it's becoming hard
to find unique names...

-- 
__Pascal Bourguignon__