How to avoid GC in tight numeric test loop? (CMUCL)

From: Robert Maas, http://tinyurl.com/uh3t
Subject: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Thu, 12 Jun 2008 14:34:39 +0000
Message-ID: <rem-2008jun12-001@yahoo.com>

(CMU Common Lisp 18b, ... Python 1.0, target Intel x86)

The purpose of this test is to learn (by experiment) what size
active memory fits within the CPU cache and what size thrashes the
CPU cache thereby significantly slowing performace. (There's enough
RAM that VM thrashing never happens even with the largest array
that CMUCL will let me allocate. I could allocate multiple arrays
to get a larger set of active memory, but since I don't care about
that, I won't bother.)

First I allocate the largest power-of-two-size array I can get in
one piece (twice as large crashes the allocator):
(defparameter exp2 28)
(defparameter arrsiz (expt 2 exp2))
(defvar arr)
(type-of (setq arr (make-array (list arrsiz) :element-type '(integer 0 255))))
;=> (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (268435456))

Now I try to write a function which when called won't allocate
anything on the heap, but will efficiently pick a random element of
that array and change it to a random new value:

(defun one-access ()
  (setf (aref arr (the (unsigned-byte 28) (random arrsiz)))
        (the (unsigned-byte 8) (random 256))))

Now I call it a bunch of times, where the loop variable is
carefully declared to avoid CONSing:

(time (dotimes (k 50000)
        (declare (type (unsigned-byte 28) k))
        (one-access)))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
[GC threshold exceeded with 270,465,416 bytes in use.  Commencing GC.]
[GC completed with 268,532,984 bytes retained and 1,932,432 bytes freed.]
[GC will next occur when at least 270,532,984 bytes are in use.]
[GC threshold exceeded with 270,535,464 bytes in use.  Commencing GC.]
[GC completed with 268,547,352 bytes retained and 1,988,112 bytes freed.]
[GC will next occur when at least 270,547,352 bytes are in use.]
[GC threshold exceeded with 270,553,848 bytes in use.  Commencing GC.]
[GC completed with 268,551,464 bytes retained and 2,002,384 bytes freed.]
[GC will next occur when at least 270,551,464 bytes are in use.]
[GC threshold exceeded with 270,553,912 bytes in use.  Commencing GC.]
[GC completed with 268,561,736 bytes retained and 1,992,176 bytes freed.]
[GC will next occur when at least 270,561,736 bytes are in use.]
[GC threshold exceeded with 270,564,192 bytes in use.  Commencing GC.]
[GC completed with 268,569,936 bytes retained and 1,994,256 bytes freed.]
[GC will next occur when at least 270,569,936 bytes are in use.]

Evaluation took:
  3.82 seconds of real time
  1.355693 seconds of user run time
  1.015362 seconds of system run time
  [Run times include 0.57 seconds GC run time]
  0 page faults and
  11432576 bytes consed.
NIL

What am I overlooking? Why is it doing all that "CONS"ing?

If I change the code to just generate random numbers, not update
the array elements:

(time (dotimes (k 50000)
        (declare (type (unsigned-byte 16) k))
        (the (unsigned-byte 28) (random arrsiz))))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
[GC threshold exceeded with 270,682,896 bytes in use.  Commencing GC.]
[GC completed with 268,726,384 bytes retained and 1,956,512 bytes freed.]
[GC will next occur when at least 270,726,384 bytes are in use.]
[GC threshold exceeded with 270,732,576 bytes in use.  Commencing GC.]
[GC completed with 268,753,048 bytes retained and 1,979,528 bytes freed.]
[GC will next occur when at least 270,753,048 bytes are in use.]

Evaluation took:
  0.3 seconds of real time
  0.288998 seconds of user run time
  0.007678 seconds of system run time
  [Run times include 0.18 seconds GC run time]
  0 page faults and
  4687896 bytes consed.

So what's allocating all that memory? Is it inside RANDOM??

Re: How to avoid GC in tight numeric test loop? (CMUCL) Pascal J. Bourguignon
- Re: How to avoid GC in tight numeric test loop? (CMUCL) Robert Maas, http://tinyurl.com/uh3t
Re: How to avoid GC in tight numeric test loop? (CMUCL) Madhu
- Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
  - Re: How to avoid GC in tight numeric test loop? (CMUCL) Robert Maas, http://tinyurl.com/uh3t
    - Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
    - Re: How to avoid GC in tight numeric test loop? (CMUCL) Pascal J. Bourguignon
      - Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
      - Re: How to avoid GC in tight numeric test loop? (CMUCL) Robert Maas, http://tinyurl.com/uh3t
  - Re: How to avoid GC in tight numeric test loop? (CMUCL) Madhu
- Re: How to avoid GC in tight numeric test loop? (CMUCL) Robert Maas, http://tinyurl.com/uh3t
  - Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
    - Re: How to avoid GC in tight numeric test loop? (CMUCL) Madhu
    - Re: How to avoid GC in tight numeric test loop? (CMUCL) Raymond Toy (RT/EUS)
      - Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Madhu
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Madhu
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Madhu
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Raymond Toy (RT/EUS)
    - Re: How to avoid GC in tight numeric test loop? (CMUCL) Robert Maas, http://tinyurl.com/uh3t
      - Re: How to avoid GC in tight numeric test loop? (CMUCL) Pascal J. Bourguignon
      - Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Raymond Toy (RT/EUS)
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Raymond Toy (RT/EUS)
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Rob Warnock
        Re: How to avoid GC in tight numeric test loop? (CMUCL) Raymond Toy (RT/EUS)

From: Pascal J. Bourguignon
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Thu, 12 Jun 2008 15:27:34 +0000
Message-ID: <7clk1au109.fsf@pbourguignon.anevia.com>

··················@spamgourmet.com.remove (Robert Maas, http://tinyurl.com/uh3t) writes:

> (CMU Common Lisp 18b, ... Python 1.0, target Intel x86)
>
> The purpose of this test is to learn (by experiment) what size
> active memory fits within the CPU cache and what size thrashes the
> CPU cache thereby significantly slowing performace. (There's enough
> RAM that VM thrashing never happens even with the largest array
> that CMUCL will let me allocate. I could allocate multiple arrays
> to get a larger set of active memory, but since I don't care about
> that, I won't bother.)
>
> First I allocate the largest power-of-two-size array I can get in
> one piece (twice as large crashes the allocator):
> (defparameter exp2 28)

I would have named it something like: *log2-of-largest-array*

> (defparameter arrsiz (expt 2 exp2))
> (defvar arr)
> (type-of (setq arr (make-array (list arrsiz) :element-type '(integer 0 255))))
> ;=> (SIMPLE-ARRAY (UNSIGNED-BYTE 8) (268435456))
>
> Now I try to write a function which when called won't allocate
> anything on the heap, but will efficiently pick a random element of
> that array and change it to a random new value:
>
> (defun one-access ()
>   (setf (aref arr (the (unsigned-byte 28) (random arrsiz)))
>         (the (unsigned-byte 8) (random 256))))
>
> Now I call it a bunch of times, where the loop variable is
> carefully declared to avoid CONSing:
>
> (time (dotimes (k 50000)
>         (declare (type (unsigned-byte 28) k))
>         (one-access)))
> Compiling LAMBDA NIL:
> Compiling Top-Level Form:
> [GC threshold exceeded with 270,465,416 bytes in use.  Commencing GC.]
> [GC completed with 268,532,984 bytes retained and 1,932,432 bytes freed.]
> [GC will next occur when at least 270,532,984 bytes are in use.]
> [GC threshold exceeded with 270,535,464 bytes in use.  Commencing GC.]
> [GC completed with 268,547,352 bytes retained and 1,988,112 bytes freed.]
> [GC will next occur when at least 270,547,352 bytes are in use.]
> [GC threshold exceeded with 270,553,848 bytes in use.  Commencing GC.]
> [GC completed with 268,551,464 bytes retained and 2,002,384 bytes freed.]
> [GC will next occur when at least 270,551,464 bytes are in use.]
> [GC threshold exceeded with 270,553,912 bytes in use.  Commencing GC.]
> [GC completed with 268,561,736 bytes retained and 1,992,176 bytes freed.]
> [GC will next occur when at least 270,561,736 bytes are in use.]
> [GC threshold exceeded with 270,564,192 bytes in use.  Commencing GC.]
> [GC completed with 268,569,936 bytes retained and 1,994,256 bytes freed.]
> [GC will next occur when at least 270,569,936 bytes are in use.]
>
> Evaluation took:
>   3.82 seconds of real time
>   1.355693 seconds of user run time
>   1.015362 seconds of system run time
>   [Run times include 0.57 seconds GC run time]
>   0 page faults and
>   11432576 bytes consed.
> NIL
>
> What am I overlooking? Why is it doing all that "CONS"ing?

Perhaps RANDOM itself do cons?

> If I change the code to just generate random numbers, not update
> the array elements:
>
> (time (dotimes (k 50000)
>         (declare (type (unsigned-byte 16) k))
>         (the (unsigned-byte 28) (random arrsiz))))
> Compiling LAMBDA NIL:
> Compiling Top-Level Form:
> [GC threshold exceeded with 270,682,896 bytes in use.  Commencing GC.]
> [GC completed with 268,726,384 bytes retained and 1,956,512 bytes freed.]
> [GC will next occur when at least 270,726,384 bytes are in use.]
> [GC threshold exceeded with 270,732,576 bytes in use.  Commencing GC.]
> [GC completed with 268,753,048 bytes retained and 1,979,528 bytes freed.]
> [GC will next occur when at least 270,753,048 bytes are in use.]
>
> Evaluation took:
>   0.3 seconds of real time
>   0.288998 seconds of user run time
>   0.007678 seconds of system run time
>   [Run times include 0.18 seconds GC run time]
>   0 page faults and
>   4687896 bytes consed.
>
> So what's allocating all that memory? Is it inside RANDOM??

Manifestly, yes.  Notice that they're all temporary data, quickly
collected and reused, several times during your loop.

I guess you'll have to write your own random generator function
carefully avoiding consing.


But I don't see why you'd need a random generator
>                        ...  to learn (by experiment) what size
> active memory fits within the CPU cache and what size thrashes the
> CPU cache thereby significantly slowing performace.

-- 
__Pascal Bourguignon__

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 05:48:31 +0000
Message-ID: <rem-2008jun12-007@yahoo.com>

> From: ····@informatimago.com (Pascal J. Bourguignon)
> I guess you'll have to write your own random generator function
> carefully avoiding consing.

Yeah, I started work on adapting my usual linear congruential
algorithm to declare the type of every intermediate value in order
to be sure it doesn't do any consing. But I got bogged down,
couldn't stop it from consing by the time I reached my exhaustion
and frustration point.

> But I don't see why you'd need a random generator

To guarantee random access of pages of memory of whatever size the
CPU cache uses. Normally a disk-swap virtual-memory uses 512-byte
or 1024-byte pages, but the CPU cache might use smaller pages, so
to be safe I need to truly random-access without trying to guess
what the cache-page size is.

But maybe golden-ratio uniform-filling would work as well (or slightly
better) to guarantee the entire array is one huge working set.
But on the other hand, the CPU cache might have a Markovian model
of page accesses, in which case a very simple model would predict
the next page most of the time and allow pre-fetching that page
which would spoil my experiment.

From: Madhu
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Thu, 12 Jun 2008 16:46:00 +0000
Message-ID: <m3mylqzjnb.fsf@meer.net>

* (Robert Maas, http://tinyurl.com/uh3t) <·················@yahoo.com> :
Wrote on Thu, 12 Jun 2008 07:34:39 -0700:
| (CMU Common Lisp 18b, ... Python 1.0, target Intel x86)
|
| The purpose of this test is to learn (by experiment) what size
| active memory fits within the CPU cache and what size thrashes the
| CPU cache thereby significantly slowing performace.
<snip>
| So what's allocating all that memory? Is it inside RANDOM??

Yes. I believe CMUCL's RANDOM conses if its integer argument is greater
than kernel::random-fixnum-max, which is (expt 2 22).

[I Checked this on rand-mt19937.lisp on an old version of CMU Common
 Lisp 2006-09-07 (19D), linux Python 1.1, target Intel x86]
--
Madhu

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 03:25:13 +0000
Message-ID: <A6Odnafh_vwEdczVnZ2dnUVZ_qXinZ2d@speakeasy.net>

Madhu  <·······@meer.net> wrote:
+---------------
| Robert Maas <·················@yahoo.com> wrote:
| | (CMU Common Lisp 18b, ... Python 1.0, target Intel x86)
| | The purpose of this test is to learn (by experiment) what size
| | active memory fits within the CPU cache and what size thrashes the
| | CPU cache thereby significantly slowing performace.
| <snip>
| | So what's allocating all that memory? Is it inside RANDOM??
| 
| Yes. I believe CMUCL's RANDOM conses if its integer argument is greater
| than kernel::random-fixnum-max, which is (expt 2 22).
| [I Checked this on rand-mt19937.lisp on an old version of CMU Common
|  Lisp 2006-09-07 (19D), linux Python 1.1, target Intel x86]
+---------------

Well, when I tried it on CMUCL-19c, when given a literal argument
[but much more about that below!] RANDOM didn't cons at all:

    cmu> arrsiz

    268435456
    cmu> (time (dotimes (i 100000000) (random 268435456)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   2.93f0 seconds of real time
    ;   2.892589f0 seconds of user run time
    ;   5.6f-5 seconds of system run time
    ;   5,428,917,417 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.                  <== NO CONSING
    ; 
    NIL
    cmu> 

The real problems here are two:

First & foremost, Robert forgot to *compile* ONE-ACCESS!!

    cmu> (compile 'one-access)
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ONE-ACCESS
    NIL
    NIL
    cmu> 

Now his example TIME conses *much* less:

    cmu> (time (dotimes (k 50000) (one-access)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.12f0 seconds of real time
    ;   0.069097f0 seconds of user run time
    ;   0.038177f0 seconds of system run time
    ;   222,723,659 CPU cycles
    ;   0 page faults and
    ;   4,698,872 bytes consed.
    ; 
    NIL
    cmu>

That's only ~94 bytes/call. [Actually, varies from 80 to 96, see below.]

But there's a more subtle problem, probably CMUCL-specific, which
is that ONE-ACCESS is calling (RANDOM ARRSIZ), which, despite being
a (near-)constant here, seems to cause consing. Compare this:

    cmu> (time (dotimes (i 10000) (random 268435456)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   2.23f-4 seconds of user run time
    ;   3.f-6 seconds of system run time
    ;   408,602 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.                  <== NO CONSING!!
    ; 
    NIL
    cmu> 

versus this: 

    cmu> arrsiz

    268435456
    cmu> (time (dotimes (i 10000) (random arrsiz)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.01f0 seconds of real time
    ;   0.011814f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   23,014,591 CPU cycles
    ;   0 page faults and
    ;   939,296 bytes consed.
    ; 
    NIL
    cmu> 

Whoa!! *That* was unexpected! [...at least, to me.]
So what's going on here?!? Hmmm...

    cmu> (= arrsiz 268435456)

    T
    cmu> (fixnump arrsiz)

    T
    cmu> (time (random 268435456))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   8.f-6 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   6,113 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.                  <== NO CONSING
    ; 
    246292663
    cmu> (time (let ((x 268435456)) (random x)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   9.f-6 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   6,202 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.                  <== NO CONSING
    ; 
    32540891
    cmu> (time (let ((x arrsiz)) (random x)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   2.1f-5 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   30,686 CPU cycles
    ;   0 page faults and
    ;   96 bytes consed.                 <== **WTF?!?**
    ; 
    174987352
    cmu> 

(*rustle*) (*rustle*)... (*grep*) (*grep*)...

Hmmm... Seems there's an internal sort of "compiler macro" built
into the CMUCL compiler -- but *not* visible to user, that is,
(COMPILER-MACRO-FUNCTION 'RANDOM) ==> NIL -- that partially inlines
calls to RANDOM if the arg meets some narrow conditions that I can't
quite parse. It's in the file "cmucl-19c/src/compiler/x86/arith.lisp",
in the (DEFINE-VOP (RANDOM-MT19937) ...) if anyone wants to look at it.
Anyway, when those conditions are met, the compiler partially inlines
the MT19937 algorithm [still calling out to RANDOM-MT19937-UPDATE as
necessary]; otherwise it just generates a normal out-of-line call
to RANDOM. Compare the output of this:

    (disassemble (lambda () (random 268435456)))

with this:

    (disassemble (lambda () (random arrsiz)))

And, indeed, when ONE-ACCESS is changed to manually inline
the value of ARRSIZ, the consing goes away completely!!

    cmu> (defun one-access ()
	   (setf (aref arr (the (unsigned-byte 28) (random 268435456)))
		 (the (unsigned-byte 8) (random 256))))

    ONE-ACCESS
    cmu> (compile *)
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ONE-ACCESS
    NIL
    NIL
    cmu> (time (dotimes (k 50000) (one-access)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.03f0 seconds of real time
    ;   0.028682f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   60,136,254 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

Anyway, the main take-away here is to remember to
*always* compile your functions before trying to analyse
speed and/or consing...


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 07:15:18 +0000
Message-ID: <rem-2008jun13-001@yahoo.com>

> | | (CMU Common Lisp 18b, ... Python 1.0, target Intel x86)
> | | The purpose of this test is to learn (by experiment) what size
> | | active memory fits within the CPU cache and what size thrashes the
> | | CPU cache thereby significantly slowing performace.
> | <snip>
> | | So what's allocating all that memory? Is it inside RANDOM??
> |
> | Yes. I believe CMUCL's RANDOM conses if its integer argument is greater
> | than kernel::random-fixnum-max, which is (expt 2 22).
> | [I Checked this on rand-mt19937.lisp on an old version of CMU Common
> |  Lisp 2006-09-07 (19D), linux Python 1.1, target Intel x86]
> From: ····@rpw3.org (Rob Warnock)
(Snipping most of your detours, keeping only the part I can use:)
> Well, when I tried it on CMUCL-19c, when given a literal argument
> [but much more about that below!] RANDOM didn't cons at all:
> First & foremost, Robert forgot to *compile* ONE-ACCESS!!

But the REP, after I define a function, when I call it the first
time, it prints out a message *saying* that's compiling it!! So I
just assumed it was telling me the truth. I guess CMUCL lied to me
and I believed it. Oh well.

> But there's a more subtle problem, probably CMUCL-specific, which
> is that ONE-ACCESS is calling (RANDOM ARRSIZ), which, despite being
> a (near-)constant here, seems to cause consing. Compare this:
>     cmu> (time (dotimes (i 10000) (random 268435456)))
..
>     ;   0 bytes consed.                  <== NO CONSING!!

Hmm, that makes no sense to me. If there's a global integer, all
nicely boxed as a first-class-citizen object, the allocation of it
was already done *once* when I did the SETQ or DEFPARAMETER etc. No
further allocation should be needed to *unbox* it, i.e. extract the
known-fixed-precision integer within it. CMUCL is dumb!

> (*rustle*) (*rustle*)... (*grep*) (*grep*)...
> Hmmm... Seems there's an internal sort of "compiler macro" built
> into the CMUCL compiler -- but *not* visible to user, that is,
> (COMPILER-MACRO-FUNCTION 'RANDOM) ==> NIL -- that partially inlines
> calls to RANDOM if the arg meets some narrow conditions that I
> can't quite parse. It's in the file
> "cmucl-19c/src/compiler/x86/arith.lisp", in the (DEFINE-VOP
> (RANDOM-MT19937) ...) if anyone wants to look at it. Anyway, when
> those conditions are met, the compiler partially inlines the
> MT19937 algorithm [still calling out to RANDOM-MT19937-UPDATE as
> necessary]; otherwise it just generates a normal out-of-line call
> to RANDOM. ...

> Anyway, the main take-away here is to remember to
> *always* compile your functions before trying to analyse
> speed and/or consing...

So I need to manually copy and paste from what (expt ...) returns
into the *SOURCE* code of my function definition before I
READ-EVAL-PRINT-COMPILE it. (And of course I need to make sure the
*literal* argument to RANDOM is within the bounds in the first
place.)  OK, will do:

(setq exp2 20)
(setq arrsiz (expt 2 exp2))
=> 1048576 ;Manually copy that in below..

(defun one-random ()
  (the (unsigned-byte 21)
    (random (the (unsigned-byte 21)
              1048576)))) ;...copied from above
(compile *)

(defun many-randoms ()
  (dotimes (k 50000)
    (declare (type (unsigned-byte 21) k))
    (the (unsigned-byte 21)
      (one-random))))
(compile *)

(time (many-randoms))
Compiling LAMBDA NIL:
Compiling Top-Level Form:

Evaluation took:
  0.0 seconds of real time
  0.00166 seconds of user run time
  0.002214 seconds of system run time
  0 page faults and
  0 bytes consed.

Hey, thanks! Now given the limit on size of parameter to RANDOM,
I'll probably want to split the array index into two parts, using
two separate calls to RANDOM, not tonight (already past bedtime),
maybe tomorrow.

One thing that bothers me is needing to manually copy&paste the new
parameter to RANDOM and re-define and re-compile each time I change
the size of the array. Given the array size is the product of *two*
arguments to RANDOM, it's too easy to make a mistake. I'll need to
find a way to make a DEFUN-and-COMPILE factory that takes a
parameter of array size and automatically builds the correct
s-expression for the DEFUN and then EVALs then COMPILEs it. One way
is to print the s-expression to a string, substitute the literal
text within that string, then READ-FROM-STRING to get the new
s-expression. The amazing thing is that in Lisp that monstrosity is
actually doable!! (Try that in Java and weep!) Hmm, another way,
slightly cleaner, is to handcode a source-DEFUN-syntax string
containing a read-time-eval expression for the literal constant,
and wrap a special-variable binding around the call to
READ-FROM-STRING. Something like this:

(defparameter string-to-parse "
  (defun one-random ()
    (the (unsigned-byte 21)
      (random (the (unsigned-byte 21)
                #.$RANDARG$))))
  ")
(proclaim '(special $RANDARG$))
(let (($RANDARG$ (expt 2 15)))
  (read-from-string string-to-parse))
=> (DEFUN ONE-RANDOM ()
     (THE (UNSIGNED-BYTE 21) (RANDOM (THE (UNSIGNED-BYTE 21) 32768))))
It works (the basic idea)!

> And, indeed, when ONE-ACCESS is changed to manually inline
> the value of ARRSIZ, the consing goes away completely!!
>     cmu> (defun one-access ()
>            (setf (aref arr (the (unsigned-byte 28) (random 268435456)))
>                  (the (unsigned-byte 8) (random 256))))

Yeah, after I develop the basic idea of string-to-parse getting
parsed inside a binding of the value to become the literal constant
(actually done above already), I'll work toward doing the same sort
of thing for the entire algorithm I was trying to experiment with.
At least now I have the solution bracketed between what didn't work
before and what isn't quite what I want but does work now, so I can
work one step at a time from a known-working code towards the
desired code.  After a night's sleep mostly likely ...

Thanks again, several people who helped lead toward the solution.

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 09:39:46 +0000
Message-ID: <8dmdnc_m0Kv_3c_VnZ2dnUVZ_rrinZ2d@speakeasy.net>

Robert Maas, <··················@spamgourmet.com.remove> wrote:
+---------------
| > From: ····@rpw3.org (Rob Warnock)
| (Snipping most of your detours, keeping only the part I can use:)
| > Well, when I tried it on CMUCL-19c, when given a literal argument
| > [but much more about that below!] RANDOM didn't cons at all:
| > First & foremost, Robert forgot to *compile* ONE-ACCESS!!
| 
| But the REP, after I define a function, when I call it the first
| time, it prints out a message *saying* that's compiling it!! So I
| just assumed it was telling me the truth. I guess CMUCL lied to me
| and I believed it. Oh well.
+---------------

No, CMUCL is *not* "lying" -- you're misreading it! Consider:

    cmu> (defun slow () (dotimes (i 1000000)))

    SLOW
    cmu> (time (slow))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 
    ; [GC threshold exceeded with 13,447,000 bytes in use.  Commencing GC.]
    ; [GC completed with 1,493,040 bytes retained and 11,953,960 bytes freed.]
    ; [GC will next occur when at least 13,493,040 bytes are in use.]
    ; [GC threshold exceeded with 13,506,600 bytes in use.  Commencing GC.]
    ; [GC completed with 1,511,296 bytes retained and 11,995,304 bytes freed.]
    ; [GC will next occur when at least 13,511,296 bytes are in use.]
    ; [GC threshold exceeded with 13,524,856 bytes in use.  Commencing GC.]
    ; [GC completed with 1,503,112 bytes retained and 12,021,744 bytes freed.]
    ; [GC will next occur when at least 13,503,112 bytes are in use.]
    ; [GC threshold exceeded with 13,516,672 bytes in use.  Commencing GC.]
    ; [GC completed with 1,513,176 bytes retained and 12,003,496 bytes freed.]
    ; [GC will next occur when at least 13,513,176 bytes are in use.]

    ; Evaluation took:
    ;   3.05f0 seconds of real time
    ;   3.029525f0 seconds of user run time
    ;   5.f-6 seconds of system run time
    ;   6,760,026,224 CPU cycles
    ;   [Run times include 0.07f0 seconds GC run time]
    ;   0 page faults and
    ;   48,037,040 bytes consed.
    ; 
    NIL
    cmu> 

Did it compile the function SLOW?!?  **NO!!** It *only* compiled
the body of the TIME form! If you want the SLOW function itself
to get compiled, you have to do it yourself:

    cmu> (compile 'slow)
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    SLOW
    NIL
    NIL
    cmu> (time (slow))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   0.001365f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   3,011,760 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

Again, the TIME form only compiled the body of the TIME form,
namely, the call to SLOW. But since the second time SLOW was now
a compiled function, things went a *lot* faster and consed nothing.

+---------------
| > But there's a more subtle problem, probably CMUCL-specific, which
| > is that ONE-ACCESS is calling (RANDOM ARRSIZ), which, despite being
| > a (near-)constant here, seems to cause consing. Compare this:
| >     cmu> (time (dotimes (i 10000) (random 268435456)))
| ..
| >     ;   0 bytes consed.                  <== NO CONSING!!
| 
| Hmm, that makes no sense to me. If there's a global integer, all
| nicely boxed as a first-class-citizen object, the allocation of it
| was already done *once* when I did the SETQ or DEFPARAMETER etc. No
| further allocation should be needed to *unbox* it, i.e. extract the
| known-fixed-precision integer within it. CMUCL is dumb!
+---------------

No, CMUCL is "too smart for its own good sometimes".  ;-}  ;-}

The issue is that if the compiler inline expander (DEFTRANSFORM RANDOM)
*doesn't* inline your RANDOM call, and your arg is not less than
KERNEL:RANDOM-FIXNUM-MAX, then a very generic version of RANDOM
[internal function %RANDOM-INTEGER] gets called which immediately
allocates a (KERNEL:RANDOM-CHUNK *RANDOM-STATE*), which is almost
always a BIGNUM [albeit a small one], hence the allocation.

+---------------
| So I need to manually copy and paste from what (expt ...) returns
| into the *SOURCE* code of my function definition before I
| READ-EVAL-PRINT-COMPILE it. (And of course I need to make sure the
| *literal* argument to RANDOM is within the bounds in the first place.)
+---------------

Well, if you're using a literal [and are on 18d or later and/or(?)
are using the Mersenne Twister RANDOM (MT19937)], then you can go
all the way up to (EXPT 2 32) without consing:

    cmu> (time (dotimes (i 1000000) (random 4294967296)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.02f0 seconds of real time
    ;   0.015109f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   35,257,584 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

+---------------
| (defun one-random ()
|   (the (unsigned-byte 21)
|     (random (the (unsigned-byte 21)
|               1048576)))) ;...copied from above
| (compile *)
| 
| (defun many-randoms ()
|   (dotimes (k 50000)
|     (declare (type (unsigned-byte 21) k))
|     (the (unsigned-byte 21)
|       (one-random))))
| (compile *)
+---------------

Note: You don't have to declare the types of either K or the
RANDOM call here -- the CMUCL compiler will figure those out
automatically [it has pretty good dataflow for constants and
"known functions", which includes those earlier in the same file].
That is, this gives the same non-consing results:

    (defun one-random () (random 1048576))
    (compile *)
    (defun many-randoms () (dotimes (k 50000) (one-random)))
    (compile *)
    (time (many-randoms))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   0.00115f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   2,535,624 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL

+---------------
| One thing that bothers me is needing to manually copy&paste the new
| parameter to RANDOM and re-define and re-compile each time I change
| the size of the array. Given the array size is the product of *two*
| arguments to RANDOM, it's too easy to make a mistake.
+---------------

If you can keep the size of the argument to RANDOM less-than-or-equal
the value of KERNEL:RANDOM-FIXNUM-MAX on your system [which is 4194303
on CMUCL-18d & later, but only 524287 on earlier versions (probably
including your 18b!)], then it still won't cons:

    (defparameter *foo* kernel:random-fixnum-max)
    (defun one-random () (random *foo*))
    (compile *)
    (defun many-randoms () (dotimes (k 50000) (one-random)))
    (compile *)
    (time (many-randoms))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   0.002342f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   6,231,240 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL

But if you can't keep the size of the argument to RANDOM under that
limit, and you can't use a literal, then you'll have to either fix
the bug in "cmucl/src/compiler/float-tran.lisp" (DEFTRANSFORM RANDOM)
yourself or wait for someone else to fix it.

+---------------
| I'll need to find a way to make a DEFUN-and-COMPILE factory that
| takes a parameter of array size and automatically builds the correct
| s-expression for the DEFUN and then EVALs then COMPILEs it.
+---------------

EWWWWW! Good luck with that!  :-{

+---------------
| One way is to print the s-expression to a string, substitute the literal
| text within that string, then READ-FROM-STRING to get the new
| s-expression. The amazing thing is that in Lisp that monstrosity is
| actually doable!!
+---------------

Hunh?!? "Strings? *STRINGS?!?* This is Lisp!
We don' need no stinkin' strings!"[1]

    cmu> (defun random-builder (arg)
	   (let ((*compile-print* nil))
	     (compile 'one-random `(lambda () (random ,arg)))
	     (compile 'many-randoms
		      `(lambda () (dotimes (k 50000) (one-random))))
	     nil))
    RANDOM-BUILDER
    cmu> (compile *)
    ; Compiling LAMBDA (ARG): 
    ; Compiling Top-Level Form: 

    RANDOM-BUILDER
    NIL
    NIL
    cmu> (random-builder (* 16384 16384))

    NIL
    cmu> (time (many-randoms))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   0.00117f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   2,579,792 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> (> (* 16384 16384) kernel:random-fixnum-max)

    T
    cmu> 


-Rob

[1] Apologies to "The Treasure of the Sierra Madre".

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Pascal J. Bourguignon
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 09:18:05 +0000
Message-ID: <7c7ictu20i.fsf@pbourguignon.anevia.com>

··················@spamgourmet.com.remove (Robert Maas, http://tinyurl.com/uh3t) writes:

> One thing that bothers me is needing to manually copy&paste the new
> parameter to RANDOM and re-define and re-compile each time I change
> the size of the array. Given the array size is the product of *two*
> arguments to RANDOM, it's too easy to make a mistake. I'll need to
> find a way to make a DEFUN-and-COMPILE factory that takes a
> parameter of array size and automatically builds the correct
> s-expression for the DEFUN and then EVALs then COMPILEs it. One way
> is to print the s-expression to a string, substitute the literal
> text within that string, then READ-FROM-STRING to get the new
> s-expression. The amazing thing is that in Lisp that monstrosity is
> actually doable!! (Try that in Java and weep!) Hmm, another way,
> slightly cleaner, is to handcode a source-DEFUN-syntax string
> containing a read-time-eval expression for the literal constant,
> and wrap a special-variable binding around the call to
> READ-FROM-STRING. Something like this:
>
> (defparameter string-to-parse "
>   (defun one-random ()
>     (the (unsigned-byte 21)
>       (random (the (unsigned-byte 21)
>                 #.$RANDARG$))))
>   ")
> (proclaim '(special $RANDARG$))
> (let (($RANDARG$ (expt 2 15)))
>   (read-from-string string-to-parse))
> => (DEFUN ONE-RANDOM ()
>      (THE (UNSIGNED-BYTE 21) (RANDOM (THE (UNSIGNED-BYTE 21) 32768))))
> It works (the basic idea)!

Yes, but all this brutality is really not necessary.  This is Lisp!

(defparameter *one-random-definition*
   '(defun one-random ()
      (the (unsigned-byte 21)
        (random (the (unsigned-byte 21) %RAND-ARG%)))))

(loop
  :for log2-of-max :from 1 :to 20
  :for rand-arg = (expt 2 log2-of-max)
  :do (eval (subst rand-arg '%rand-arg% *one-random-definition*))
  :collect (one-random))

--> (1 0 0 2 11 1 81 107 19 873 1946 2584 7652 1368 13615 44345 
     28677 17764 101515 524618) ; for example

-- 
__Pascal Bourguignon__

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 10:23:16 +0000
Message-ID: <rLydnUsv-tQJ18_VnZ2dnUVZ_tbinZ2d@speakeasy.net>

Pascal J. Bourguignon <···@informatimago.com> wrote:
+---------------
| ··················@spamgourmet.com.remove (Robert Maas) writes:
| > I'll need to find a way to make a DEFUN-and-COMPILE factory that
| > takes a parameter of array size and automatically builds the correct
| > s-expression for the DEFUN and then EVALs then COMPILEs it. One way
| > is to print the s-expression to a string, substitute the literal
| > text within that string, then READ-FROM-STRING to get the new
| > s-expression. The amazing thing is that in Lisp that monstrosity is
| > actually doable!!
...
| Yes, but all this brutality is really not necessary.  This is Lisp!
| 
| (defparameter *one-random-definition*
|    '(defun one-random ()
|       (the (unsigned-byte 21)
|         (random (the (unsigned-byte 21) %RAND-ARG%)))))
| 
| (loop
|   :for log2-of-max :from 1 :to 20
|   :for rand-arg = (expt 2 log2-of-max)
|   :do (eval (subst rand-arg '%rand-arg% *one-random-definition*))
|   :collect (one-random))
| 
| --> (1 0 0 2 11 1 81 107 19 873 1946 2584 7652 1368 13615 44345 
|      28677 17764 101515 524618) ; for example
+---------------

Yes, this is Lisp, and as I told him in my parallel reply,
"We don' need no stinkin' strings!"  ;-}  ;-}

But the issue at question is GC'ing, and your version doesn't
address those problems, which can -- in CMUCL, at least -- only
be addressed at present (1) by compiling ONE-RANDOM [which your
code didn't], and (2) needing to use a literal numeric arg to
RANDOM if the arg is larger than KERNEL:RANDOM-FIXNUM-MAX [an
internal magic CMUCL constant]. My version [see my parallel reply
in <·····································@speakeasy.net>] did
handle both of those concerns:

    (defun random-builder (arg)
      (let ((*compile-print* nil))
	(compile 'one-random `(lambda () (random ,arg)))
	(compile 'many-randoms
		 `(lambda () (dotimes (k 50000) (one-random))))
	nil))

    ...[remainder elided]...

and results in a MANY-RANDOMS function that does not cons.

If one wanted to pre-compile a bunch of them and save them away
somewhere [and also avoid polluting the global namespace as *my*
version did!! ;-}  ;-} ], then of course a loop like yours would
certainly be the way to do it... but in CMUCL you can go all the
way to 2^32:

    cmu> (defparameter *many-randoms*
           (loop for log2-of-max from 1 to 32
		 for rand-arg = (expt 2 log2-of-max)
	     collect (let ((*compile-print* nil))
		       (compile 'nil `(lambda ()
					(dotimes (k 50000)
					  (random ,rand-arg)))))
		     into results
	     finally (return (coerce results 'vector))))

    *MANY-RANDOMS*
    cmu> (type-of *many-randoms*)

    (SIMPLE-VECTOR 32)
    cmu> (time (funcall (aref *many-randoms* 31)))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.0f0 seconds of real time
    ;   8.27f-4 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   1,821,208 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 


-Rob

p.s. Still ugly. The *real* fix is in "cmucl/src/compiler/float-tran.lisp"
[making args in (INTEGER 1 4294967296) work correctly on X86 platforms],
and awaits the attentions of the maintainers.

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 18:08:25 +0000
Message-ID: <rem-2008jun13-003@yahoo.com>

> From: ····@informatimago.com (Pascal J. Bourguignon)
> (defparameter *one-random-definition*
;If I were doing it, I'd call it something like *template-for-random-defun*
; emphasizing that it's not a usable definition itself, it's only a template
; for a definition.
>    '(defun one-random ()
>       (the (unsigned-byte 21)
>         (random (the (unsigned-byte 21) %RAND-ARG%)))))
> (loop
>   :for log2-of-max :from 1 :to 20
>   :for rand-arg = (expt 2 log2-of-max)
>   :do (eval (subst rand-arg '%rand-arg% *one-random-definition*))
>   :collect (one-random))

But indeed having an already-parsed expression (by the usual means
of just doing a DEFPARAMETER from the REP as you suggest), with a
token placeholder where the literal integer value should be
located, and then using SUBST to convert template to instance, is
what I thought of myself after I already posted and was on my way
to bed and too exhausted to go back online and post immediately
before bed. So I guess at this point we're in agreement as to one
decent way to do it. Note: SUBST was the standard way to get by
without the backquote macro back in the old days, but here it may
actually be the *right*way* to do the task here you want the
literal splice to occur repeatedly with different values after just
one READ rather than splice just once at READ time. Still it's only
a matter of efficiency whether to parse using READ every time I set
up a new array size or READ just once and SUBST each time. Both
methods are technically correct.

Actually last night before bed I thought of an actual good reason
why SUBST is "more correct". By parsing just once, you are
guaranteed that all the symbols are in the correct package, and
they won't change in the course of an experiment. But by parsing
each time, if the package has changed then some symbols might parse
into a different package, which might be difficult to diagnose. So
using SUBST is cleaner/safer. Also, perhaps the dummy token to be
substituted should be a keyword, to further eliminate any possible
package problems.

Thus my tentative code as a starting point for today's experiments
will likely be:
 (defparameter *template-for-random-defun*
    '(defun one-random ()
       (the (unsigned-byte 21)
         (random (the (unsigned-byte 21) :DUMMYTOKEN1)))))
I'll do an experiment to see whether (the (unsigned-byte 21) ...)
wrapper around the literal argument to RANDOM is really necessary
to avoid consing. In fact I'll do experiments with *every* use of
(the ...) wrappers to make sure each one is really necessary. And
of course I'll split the size of the array into two separate
factors, so that each factor is within the limit for inline RANDOM,
and have that fully automated so I give the factory one parameter
which is the power of two array size and another parameter which is
the repeat count for (dotimes ...) and it'll compute all the
necessary numeric values and SUBST each of them into the
appropriate :DUMMYTOKEN<n> spots within the expressions to be
evaluated. Maybe I'll give the dummy token keywords more
self-documentating names, such as :DUMMY-ARRSIZ-FACTOR-HI
:DUMMY-ARRSIZ-FACTOR-LO :DUMMY-REPEAT-COUNT. I'll make a LIST of
all the DEFUN-templates that need patching/evaluating/compiling, do
the SUBST once on the entire list, then MAP down the list for both
EVAL and COMPILE. Essentially the toplevel of the DEFUN factory
will be coded as something like this:

(defun define-cache-test-rig (exp2 nrepeat)
  (unless (and (integerp exp2) (<= 2 exp2 28)) (error "EXP2 out of range"))
  (let* ((exp2lo (floor exp2 2))
         (exp2hi (- exp2 exp2lo))
         (arrsizlo (expt 2 exp2lo))
         (arrsizhi (expt 2 exp2hi))
         (forms (subst nrepeat :DUMMY-REPEAT-COUNT
                  (subst arrsizhi :DUMMY-ARRSIZ-FACTOR-HI
                    (subst arrsizlo :DUMMY-ARRSIZ-FACTOR-LO
                      *list-of-templates-for-cache-test-rig*)))))
    (mapcar #'COMPILE (mapcar #'EVAL forms))))

(Note the name starting with DEFINE instead of MAKE, because the
 effect will be global side-effects of DEFUNs happening rather than
 simply constructing and returning a first-class test rig object
 with no side effects happening in the process.)

It's inefficient that it has to make three copies of the template
each time, one copy with just the arrsizlo substitution, a second
copy from that with the arrsizhi also substituted, and finally a
third copy from that with the nrepeat also substituted, but this
happens only once each time I change the array size or repeat
count, so it's not worth worrying about. I could make that more
efficient by using a function called SUBSTS that I wrote for PSL
many years ago, which takes an assoc-list of key-value pairs to
substitute, and scans the tree structure just once to re-build a
structure with *all* the substitutions made at the same time, but
it's not worth the extra effort.

From: Madhu
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 10:51:40 +0000
Message-ID: <m3d4mlzjyb.fsf@meer.net>

* (Rob Warnock) <································@speakeasy.net> :
Wrote on Thu, 12 Jun 2008 22:25:13 -0500:

| Madhu  <·······@meer.net> wrote:
| | Yes. I believe CMUCL's RANDOM conses if its integer argument is greater
| | than kernel::random-fixnum-max, which is (expt 2 22).
| | [I Checked this on rand-mt19937.lisp on an old version of CMU Common
| |  Lisp 2006-09-07 (19D), linux Python 1.1, target Intel x86]
| +---------------
|
| Well, when I tried it on CMUCL-19c, when given a literal argument
| [but much more about that below!] RANDOM didn't cons at all:
|

I did not mention that in the case of a literal integer argument the
compiler has a deftransform defined which uses (unsigned-byte 32)
arithmetic, which does not cons.

Sorry.

This does not get triggered when the argument it random is specified via
the variable.

--
Madhu

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 06:06:32 +0000
Message-ID: <rem-2008jun12-008@yahoo.com>

> | (CMU Common Lisp 18b, ... Python 1.0, target Intel x86)
> | The purpose of this test is to learn (by experiment) what size
> | active memory fits within the CPU cache and what size thrashes the
> | CPU cache thereby significantly slowing performace.
> <snip>
> | So what's allocating all that memory? Is it inside RANDOM??
> From: Madhu <·······@meer.net>
> Yes. I believe CMUCL's RANDOM conses if its integer argument is greater
> than kernel::random-fixnum-max, which is (expt 2 22).

After seeing your reply, I tried using RANDOM with smaller
argument, but it didn't help enough:

(defparameter exp2 21)
(defparameter arrsiz (expt 2 exp2))

arrsiz
=> 2097152

(time (dotimes (k 50000)
        (declare (type (unsigned-byte 16) k))
        (the (unsigned-byte 21) (random arrsiz))))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
[GC threshold exceeded with 2,002,376 bytes in use.  Commencing GC.]
[GC completed with 92,528 bytes retained and 1,909,848 bytes freed.]
[GC will next occur when at least 2,092,528 bytes are in use.]

Evaluation took:
  0.08 seconds of real time
  0.067687 seconds of user run time
  0.010309 seconds of system run time
  [Run times include 0.04 seconds GC run time]
  0 page faults and
  1945120 bytes consed.

OK, only one GC instead of the three I got before, so maybe it's
just a first-time effect when switching from REP to tight loop?
Testing that theory:

(time (dotimes (k 50000)
        (declare (type (unsigned-byte 16) k))
        (the (unsigned-byte 21) (random arrsiz))))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
[GC threshold exceeded with 2,097,016 bytes in use.  Commencing GC.]
[GC completed with 115,600 bytes retained and 1,981,416 bytes freed.]
[GC will next occur when at least 2,115,600 bytes are in use.]

Evaluation took:
  0.07 seconds of real time
  0.073507 seconds of user run time
  0.0 seconds of system run time
  [Run times include 0.04 seconds GC run time]
  0 page faults and
  1947240 bytes consed.

(time (dotimes (k 50000)
        (declare (type (unsigned-byte 16) k))
        (the (unsigned-byte 21) (random arrsiz))))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
[GC threshold exceeded with 2,117,920 bytes in use.  Commencing GC.]
[GC completed with 185,328 bytes retained and 1,932,592 bytes freed.]
[GC will next occur when at least 2,185,328 bytes are in use.]

Evaluation took:
  0.07 seconds of real time
  0.072034 seconds of user run time
  0.0 seconds of system run time
  [Run times include 0.03 seconds GC run time]
  0 page faults and
  1945056 bytes consed.

So each time I start the loop, it does one GC. So if I run a bigger
loop, and all the GC is just at startup then it should still do
just one GC:

(time (dotimes (k 500000)
        (declare (type (unsigned-byte 21) k))
        (the (unsigned-byte 21) (random arrsiz))))
Compiling LAMBDA NIL:
Compiling Top-Level Form:

[GC threshold exceeded with 2,202,784 bytes in use.  Commencing GC.]
[GC completed with 271,912 bytes retained and 1,930,872 bytes freed.]
[GC will next occur when at least 2,271,912 bytes are in use.]
[GC threshold exceeded with 2,272,448 bytes in use.  Commencing GC.]
[GC completed with 278,704 bytes retained and 1,993,744 bytes freed.]
[GC will next occur when at least 2,278,704 bytes are in use.]
[GC threshold exceeded with 2,281,160 bytes in use.  Commencing GC.]
[GC completed with 288,976 bytes retained and 1,992,184 bytes freed.]
[GC will next occur when at least 2,288,976 bytes are in use.]
[GC threshold exceeded with 2,291,336 bytes in use.  Commencing GC.]
[GC completed with 299,240 bytes retained and 1,992,096 bytes freed.]
[GC will next occur when at least 2,299,240 bytes are in use.]
[GC threshold exceeded with 2,301,544 bytes in use.  Commencing GC.]
[GC completed with 317,696 bytes retained and 1,983,848 bytes freed.]
[GC will next occur when at least 2,317,696 bytes are in use.]
[GC threshold exceeded with 2,320,128 bytes in use.  Commencing GC.]
[GC completed with 311,584 bytes retained and 2,008,544 bytes freed.]
[GC will next occur when at least 2,311,584 bytes are in use.]
[GC threshold exceeded with 2,314,040 bytes in use.  Commencing GC.]
[GC completed with 321,856 bytes retained and 1,992,184 bytes freed.]
[GC will next occur when at least 2,321,856 bytes are in use.]
[GC threshold exceeded with 2,324,248 bytes in use.  Commencing GC.]
[GC completed with 328,024 bytes retained and 1,996,224 bytes freed.]
[GC will next occur when at least 2,328,024 bytes are in use.]
[GC threshold exceeded with 2,330,408 bytes in use.  Commencing GC.]
[GC completed with 346,488 bytes retained and 1,983,920 bytes freed.]
[GC will next occur when at least 2,346,488 bytes are in use.]
[GC threshold exceeded with 2,348,896 bytes in use.  Commencing GC.]
[GC completed with 346,488 bytes retained and 2,002,408 bytes freed.]
[GC will next occur when at least 2,346,488 bytes are in use.]

Evaluation took:
  0.79 seconds of real time
  0.749877 seconds of user run time
  0.005508 seconds of system run time
  [Run times include 0.37 seconds GC run time]
  0 page faults and
  19452896 bytes consed.

Maybe the problem is that the parameter *to* RANDOM isn't declared?
Fixing that:

(time (dotimes (k 500000)
        (declare (type (unsigned-byte 21) k))
        (the (unsigned-byte 21) (random (the (unsigned-byte 21) arrsiz)))))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
[GC threshold exceeded with 2,002,264 bytes in use.  Commencing GC.]
[GC completed with 93,232 bytes retained and 1,909,032 bytes freed.]
[GC will next occur when at least 2,093,232 bytes are in use.]
[GC threshold exceeded with 2,096,552 bytes in use.  Commencing GC.]
[GC completed with 89,144 bytes retained and 2,007,408 bytes freed.]
[GC will next occur when at least 2,089,144 bytes are in use.]
[GC threshold exceeded with 2,095,616 bytes in use.  Commencing GC.]
[GC completed with 107,608 bytes retained and 1,988,008 bytes freed.]
[GC will next occur when at least 2,107,608 bytes are in use.]
[GC threshold exceeded with 2,114,016 bytes in use.  Commencing GC.]
[GC completed with 115,792 bytes retained and 1,998,224 bytes freed.]
[GC will next occur when at least 2,115,792 bytes are in use.]
[GC threshold exceeded with 2,122,248 bytes in use.  Commencing GC.]
[GC completed with 130,160 bytes retained and 1,992,088 bytes freed.]
[GC will next occur when at least 2,130,160 bytes are in use.]
[GC threshold exceeded with 2,132,576 bytes in use.  Commencing GC.]
[GC completed with 130,160 bytes retained and 2,002,416 bytes freed.]
[GC will next occur when at least 2,130,160 bytes are in use.]
[GC threshold exceeded with 2,132,536 bytes in use.  Commencing GC.]
[GC completed with 148,616 bytes retained and 1,983,920 bytes freed.]
[GC will next occur when at least 2,148,616 bytes are in use.]
[GC threshold exceeded with 2,150,944 bytes in use.  Commencing GC.]
[GC completed with 158,896 bytes retained and 1,992,048 bytes freed.]
[GC will next occur when at least 2,158,896 bytes are in use.]
[GC threshold exceeded with 2,161,312 bytes in use.  Commencing GC.]
[GC completed with 177,344 bytes retained and 1,983,968 bytes freed.]
[GC will next occur when at least 2,177,344 bytes are in use.]

Evaluation took:
  0.77 seconds of real time
  0.71846 seconds of user run time
  0.013362 seconds of system run time
  [Run times include 0.32 seconds GC run time]
  0 page faults and
  19471920 bytes consed.

Reduced from ten to nine GCs, probably random, not good enough.
Maybe I need to put the entire loop inside a lexical closure where
ARRSIZ is defined once as an int32 then referenced locally?
I'll read the rest of the thread before trying that.

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 08:44:42 +0000
Message-ID: <VvmdnQeqFrjnrs_VnZ2dnUVZ_qjinZ2d@speakeasy.net>

Robert Maas, <··················@spamgourmet.com.remove> wrote:
+---------------
| After seeing your reply, I tried using RANDOM with smaller
| argument, but it didn't help enough:
| 
| (defparameter exp2 21)
| (defparameter arrsiz (expt 2 exp2))
| 
| arrsiz
| => 2097152
| 
| (time (dotimes (k 50000)
|         (declare (type (unsigned-byte 16) k))
|         (the (unsigned-byte 21) (random arrsiz))))
| Compiling LAMBDA NIL:
| Compiling Top-Level Form:
| [GC threshold exceeded with 2,002,376 bytes in use.  Commencing GC.]
...
|   1945120 bytes consed.
+---------------

See my parallel reply <·····································@speakeasy.net>.

The problem isn't the value of the arg to RANDOM per se, it's that unless
the arg is a literal number the #+RANDOM-MT19937 (DEFTRANSFORM RANDOM)
inline expander in "cmucl/src/compiler/float-tran.lisp" isn't seeing
the arg as one which will return an (UNSIGNED-BYTE 32) or smaller, and
thus is punting to the generic function call of RANDOM... which conses
some 80-96 bytes per call [it varies].

There's special code on X86 platforms to allow args up to (EXPT 2 32),
*significantly* larger than KERNEL:RANDOM-FIXNUM-MAX [== 4194303 in 19c,
but see below re 18b!], but unfortunately the COND branch for that check
is in the wrong place!! [Yes, I have already mentioned this to one of
the CMUCL maintainers.]

But how big is KERNEL:RANDOM-FIXNUM-MAX on CMUCL-18b?!? I don't have
version 18b here, but on 18d it was already 4194303 [same as 19c],
and your above code does *NOT* cons anything on either 18d or 19c!!

    cmu> (list (lisp-implementation-type)
	       (lisp-implementation-version))

    ("CMU Common Lisp" "18d")
    cmu> (defparameter exp2 21)

    EXP2
    cmu> (defparameter arrsiz (expt 2 exp2))

    ARRSIZ
    cmu> arrsiz

    2097152
    cmu> kernel:random-fixnum-max    ; What is this on your system?!?

    4194303
    cmu> (time (dotimes (k 50000) ; You don't really need to declare K here
		 (random arrsiz)))
    Compiling LAMBDA NIL: 
    Compiling Top-Level Form: 

    Evaluation took:
      0.0 seconds of real time
      7.26e-4 seconds of user run time
      0.001454 seconds of system run time
      0 page faults and
      0 bytes consed.
    NIL
    cmu> (time (dotimes (k 2000000000)
		 (random arrsiz)))
    Compiling LAMBDA NIL: 
    Compiling Top-Level Form: 

    Evaluation took:
        87.24 seconds of real time
        86.44366 seconds of user run time
        0.0f0 seconds of system run time
        0 page faults and
        0 bytes consed.
    NIL
    cmu> 

Aha!! A web search shows that in even older versions of CMUCL,
KERNEL:RANDOM-FIXNUM-MAX was only 524287!! So either try an
ARRSIZ < 524287, or (much better!) switch to a newer version
of CMUCL. [You really, *really* need to do that in any case,
since CMUCL-18b is already a *decade* old!!]

+---------------
| Maybe I need to put the entire loop inside a lexical closure where
| ARRSIZ is defined once as an int32 then referenced locally?
+---------------

That won't help until the bug in "cmucl/src/compiler/float-tran.lisp"
gets fixed. Until then, either use a literal constant arg [can be as
high as (EXPT 2 32) on X86] or a variable arg whose type is known to
the compiler and whose value is less than KERNEL:RANDOM-FIXNUM-MAX
on whatever version you're running.

+---------------
| I'll read the rest of the thread before trying that.
+---------------

Just read my previous [URL above] & this message...  ;-}


-Rob

p.s. Hmmm... What does (FIND :RANDOM-MT19937 *FEATURES*) give you on 18b?
The larger KERNEL:RANDOM-FIXNUM-MAX comes with the new MT19937 random
number generator, which may have been an optional "extra" in CMUCL-18b.
[It was standard by 18d, at least.] If so, you might be able to load it
at runtime [but *before* compiling code which calls RANDOM] and at least
get a non-consing RANDOM with args up to 4194303 [results in 0..4194302].

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Madhu
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 13 Jun 2008 14:52:43 +0000
Message-ID: <m3prqlxu84.fsf@meer.net>

* (Rob Warnock) <································@speakeasy.net> :
Wrote on Fri, 13 Jun 2008 03:44:42 -0500:

| The problem isn't the value of the arg to RANDOM per se, it's that unless
| the arg is a literal number the #+RANDOM-MT19937 (DEFTRANSFORM RANDOM)
| inline expander in "cmucl/src/compiler/float-tran.lisp" isn't seeing
| the arg as one which will return an (UNSIGNED-BYTE 32) or smaller, and
| thus is punting to the generic function call of RANDOM... which conses
| some 80-96 bytes per call [it varies].

There is a big num multiply In the transform. I believe that should mean
potential consing if the number is not a fixnum, even if it fits within
2^32.  [Do correct me if I'm wrong! This should be in the first branch
of the COND so the bug noted below won't apply]

;; fixnum. no consing
(time (loop for i below 5000 do (setq $k (random (expt 2 29)))))

;; non fixnum. consing
(time (loop for i below 5000 do (setq $k (random (expt 2 31)))))

| there's special code on X86 platforms to allow args up to (EXPT 2 32),
| *significantly* larger than KERNEL:RANDOM-FIXNUM-MAX [== 4194303 in 19c,
| but see below re 18b!], but unfortunately the COND branch for that check
| is in the wrong place!! [Yes, I have already mentioned this to one of
| the CMUCL maintainers.]

[It would be cool if you posted issues you noted on cmucl-devel list, so
 there is a record.  Now to be guilty of that selfsame accusation :)]

I did notice the misplaced COND branch here but was not sure if this
figured in the present problem because I was distracted by the fact that
RANDOM was not deriving (integer 1 (expt 32)) from an argument declared
as (unsigned-byte 32).  Fixing the COND branch makes CMUCL cons about an
order of magnitude less! (compared to calling the generic function)

Consider the test case:
(defvar $size-large (1- (expt 2 30)))
(defvar $k)
(time (let ((x $size-large))
	(declare (type (integer 1 1073741824) x)) ; (expt 2 30)
	(loop for i below 5000 do (setq $k (random x)))))

BEFORE FIXING (GENERIC CALL)

; Evaluation took:
;   0.03 seconds of real time
;   0.012001 seconds of user run time
;   0.004 seconds of system run time
;   14,388,150 CPU cycles
;   0 page faults and
;   429,856 bytes consed.


AFTER FIXING (X86 SPECIFIC)

; Evaluation took:
;   0.0 seconds of real time
;   0.0 seconds of user run time
;   0.0 seconds of system run time
;   637,798 CPU cycles
;   0 page faults and
;   40,432 bytes consed.

--
Madhu

From: Raymond Toy (RT/EUS)
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Wed, 18 Jun 2008 16:52:38 +0000
Message-ID: <sxdod5yvg6h.fsf@rtp.ericsson.se>

>>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:

    Rob> Robert Maas, <··················@spamgourmet.com.remove> wrote:
    Rob> +---------------
    Rob> | After seeing your reply, I tried using RANDOM with smaller
    Rob> | argument, but it didn't help enough:
    Rob> | 
    Rob> | (defparameter exp2 21)
    Rob> | (defparameter arrsiz (expt 2 exp2))
    Rob> | 
    Rob> | arrsiz
    Rob> | => 2097152
    Rob> | 
    Rob> | (time (dotimes (k 50000)
    Rob> |         (declare (type (unsigned-byte 16) k))
    Rob> |         (the (unsigned-byte 21) (random arrsiz))))
    Rob> | Compiling LAMBDA NIL:
    Rob> | Compiling Top-Level Form:
    Rob> | [GC threshold exceeded with 2,002,376 bytes in use.  Commencing GC.]
    Rob> ...
    Rob> |   1945120 bytes consed.
    Rob> +---------------

    Rob> See my parallel reply <·····································@speakeasy.net>.

    Rob> The problem isn't the value of the arg to RANDOM per se, it's that unless
    Rob> the arg is a literal number the #+RANDOM-MT19937 (DEFTRANSFORM RANDOM)
    Rob> inline expander in "cmucl/src/compiler/float-tran.lisp" isn't seeing
    Rob> the arg as one which will return an (UNSIGNED-BYTE 32) or smaller, and
    Rob> thus is punting to the generic function call of RANDOM... which conses
    Rob> some 80-96 bytes per call [it varies].

I think it would help if you declared arrsiz.  As it is now, the
compiler has no way of knowing the type of arrsiz and thus can't apply
the transform.

Also, Madhu has mentioned to me that random isn't very good for large
integers near 2^32.  It uses modulo to convert the 32-bit internal
random number to the desired range.  This produces a bias.  I think
this is why random-integer-extra-bits and random-fixnum-max exists.  A
rejection technique would probably be better.

Ray

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Sat, 21 Jun 2008 02:59:16 +0000
Message-ID: <Yc-dnfNLKf4Z88HVnZ2dnUVZ_u-dnZ2d@speakeasy.net>

Raymond Toy (RT/EUS) <···········@ericsson.com> wrote:
+---------------
| >>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:
| Rob> Robert Maas, <··················@spamgourmet.com.remove> wrote:
| Rob> |   1945120 bytes consed.
| Rob> +---------------
| 
| Rob> See my parallel reply
|      <·····································@speakeasy.net>.
| Rob> The problem isn't the value of the arg to RANDOM per se, it's
| that unless
| Rob> the arg is a literal number the #+RANDOM-MT19937 (DEFTRANSFORM RANDOM)
| Rob> inline expander in "cmucl/src/compiler/float-tran.lisp" isn't seeing
| Rob> the arg as one which will return an (UNSIGNED-BYTE 32) or smaller, and
| Rob> thus is punting to the generic function call of RANDOM... which conses
| Rob> some 80-96 bytes per call [it varies].
| 
| I think it would help if you declared arrsiz.  As it is now, the
| compiler has no way of knowing the type of arrsiz and thus can't apply
| the transform.
+---------------

Nope, tried all kinds of type declarations -- doesn't help. The problem
is, as I said, a typo in "cmucl/src/compiler/float-tran.lisp", in the
#+RANDOM-MT19937 (DEFTRANSFORM RANDOM). Two of the COND terms need to
be swapped, namely, this code, which does a premature GIVE-UP [because
the RANDOM-FIXNUM-MAX is smaller than (EXPT 2 32), so the second branch
can never be taken!]:

      ((> num-high random-fixnum-max)
       (give-up "The range is too large to assure an accurate result."))
      #+x86
      ((< num-high (expt 2 32))
       '(values (bignum::%multiply (random-chunk (or state *random-state*))
		 num)))

Those clauses need to be swapped, viz.:

      #+x86
      ((< num-high (expt 2 32))
       '(values (bignum::%multiply (random-chunk (or state *random-state*))
		 num)))
      ((> num-high random-fixnum-max)
       (give-up "The range is too large to assure an accurate result."))

And indeed, when I compile a (DEFTRANSFORM RANDOM) patch with that swap,
I see the following behavior. First, before patching:

    cmu> (list (lisp-implementation-type) (lisp-implementation-version))

    ("CMU Common Lisp" "19c Release (19C)")
    cmu> (defvar *foo* 429496729)

    *FOO*
    cmu> (and (> *foo* kernel:random-fixnum-max)
	      (< *foo* (expt 2 32)))

    T
    cmu> (time (let ((bar *foo*))
                 (declare (type (integer 1 500000000) bar))
                 (dotimes (i 1000000)
		   (random bar))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 
    ; [GC threshold exceeded with 13,319,016 bytes in use.  Commencing GC.]
    ; [GC completed with 1,376,224 bytes retained and 11,942,792 bytes freed.]
    ; [GC will next occur when at least 13,376,224 bytes are in use.]
    ...[*lots* more of that trimmed]...
    ; Evaluation took:
    ;   0.68f0 seconds of real time
    ;   0.650193f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   1,489,031,952 CPU cycles
    ;   [Run times include 0.13f0 seconds GC run time]
    ;   0 page faults and
    ;   94,013,600 bytes consed.
    ; 
    NIL
    cmu> 

You will note that declaring the type of BAR did not reduce
the consing.

Now we try the patch:

    cmu> (load "float-tran-patch")

    ; Loading #P"/u/rpw3/float-tran-patch.x86f".
    T
    cmu> (time (let ((bar *foo*))
                 (declare (type (integer 1 500000000) bar))
                 (dotimes (i 1000000)
		   (random bar))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.01f0 seconds of real time
    ;   0.015104f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   34,939,968 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

*BINGO!* The special inline code got used in the latter case
[as one can show by compiling/disassembling the bodies of the TIME].

+---------------
| Also, Madhu has mentioned to me that random isn't very good for large
| integers near 2^32.  It uses modulo to convert the 32-bit internal
| random number to the desired range.  This produces a bias.  I think
| this is why random-integer-extra-bits and random-fixnum-max exists.  A
| rejection technique would probably be better.
+---------------

But the DEFTRANSFORM has special code on x86 platforms to inline
32-bit random numbers [that is, an arg to RANDOM of less than or
equal to 2^32]. Are you suggesting that [even after the above bug-fix]
said special-case code is in fact wrong?!?


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Madhu
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Sat, 21 Jun 2008 05:15:18 +0000
Message-ID: <m33an7xtax.fsf@moon.robolove.meer.net>

* (Rob Warnock) <································@speakeasy.net> :
Wrote on Fri, 20 Jun 2008 21:59:16 -0500:

| Raymond Toy (RT/EUS) <···········@ericsson.com> wrote:
| | I think it would help if you declared arrsiz.  As it is now, the
| | compiler has no way of knowing the type of arrsiz and thus can't
| | apply the transform.
|

NOTE arrsiz is declared special. I believe This has implications for type
inference (though RPW has handled it in his example)

| Nope, tried all kinds of type declarations -- doesn't help. The
| problem is, as I said, a typo in "cmucl/src/compiler/float-tran.lisp",
| in the #+RANDOM-MT19937 (DEFTRANSFORM RANDOM). Two of the COND terms
| need to be swapped, namely, this code, which does a premature GIVE-UP
| [because the RANDOM-FIXNUM-MAX is smaller than (EXPT 2 32), so the
| second branch can never be taken!]:

[snip]

|     cmu> (defvar *foo* 429496729)
|
|     *FOO*
|     cmu> (and (> *foo* kernel:random-fixnum-max)
| 	      (< *foo* (expt 2 32)))
|
|     T

Ah but youve chosen *foo* FIXNUM.

(fixnump *foo*) => T

I think The no-consing behaviour you observe [after patching
compiler/float-tran.lisp] is not because of 32bit inlining but because
your values lie in the fixnum range.

| +---------------
| | Also, Madhu has mentioned to me that random isn't very good for large
| | integers near 2^32.  It uses modulo to convert the 32-bit internal
| | random number to the desired range.  This produces a bias.  I think
| | this is why random-integer-extra-bits and random-fixnum-max exists.  A
| | rejection technique would probably be better.
| +---------------
|
| But the DEFTRANSFORM has special code on x86 platforms to inline
| 32-bit random numbers [that is, an arg to RANDOM of less than or
| equal to 2^32]. Are you suggesting that [even after the above bug-fix]
| said special-case code is in fact wrong?!?

The issue IIUC is in using (REM (random-chunk) n) to return a random
number in the range [0, N-1],    (as Raymond Toy pointed out:)

(random-chunk) is uniformly distributed in the range
 [0, (expt 2 random-chunk-legth)] 

[The mersenne twister in cmucl uses a random-chunk-length = 32.]

consider the distribution when n = 250 and random-chunk-length = 8

| (random-chunk) | (REM (random-chunk) 250) |
| 0, 250         |                        0 |
| 1, 251         |                        1 |
| 2, 252         |                        2 |
| 3, 253         |                        3 |
| 4, 254         |                        4 |
| 5, 255         |                        5 |
| 6, 256         |                        6 |
| 7              |                        7 |
| 8              |                        8 |
| ...            |                      ... |
| 249            |                      249 |

So there is a small bias in the generated numbers in the range 
[0, (mod (expt 2 random-chunk-length) n)]

I'm not an expert, so please correct me if I'm wrong.  

In the past when I used CMUCL in simulations to get results, I almost
always used an alien (FFI) interface to a RNG written in C -- either
knuth-random or one of the netlib routines, so I have not been bitten by
this myself.

--
Madhu

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Sat, 21 Jun 2008 13:15:30 +0000
Message-ID: <fI6dnZtOUvVvY8HVnZ2dnUVZ_t_inZ2d@speakeasy.net>

Madhu <·······@meer.net> wrote:
+---------------
| Rob Warnock <····@rpw3.org> wrote:
| | Nope, tried all kinds of type declarations -- doesn't help. The
| | problem is, as I said, a typo in "cmucl/src/compiler/float-tran.lisp",
| | in the #+RANDOM-MT19937 (DEFTRANSFORM RANDOM). Two of the COND terms
| | need to be swapped, namely, this code, which does a premature GIVE-UP
| | [because the RANDOM-FIXNUM-MAX is smaller than (EXPT 2 32), so the
| | second branch can never be taken!]:
| [snip]
| |     cmu> (defvar *foo* 429496729)
| |
| |     *FOO*
| |     cmu> (and (> *foo* kernel:random-fixnum-max)
| | 	      (< *foo* (expt 2 32)))
| |
| |     T
| 
| Ah but youve chosen *foo* FIXNUM.
| (fixnump *foo*) => T
+---------------

True enough, though it *was* bigger than KERNEL:RANDOM-FIXNUM-MAX,
which was all I really needed to show that the patch was working.  ;-}

+---------------
| I think The no-consing behaviour you observe [after patching
| compiler/float-tran.lisp] is not because of 32bit inlining but
| because your values lie in the fixnum range.
+---------------

Yes, but no, not entirely. Yes, if you pick a number larger than
a fixnum (but 2^32 or smaller) and type the intermediate variable
with (INTEGER 1 4294967296), you do still get lots of consing
[~168 bytes per iteration], presumably because it uses bignum
arithmetic to get it into a register. The same thing happens
if you type the intermediate variable with (UNSIGNED-BYTE 32),
but to a lesser degree [only ~86 bytes per iteration].

But interestingly [the "no, not entirely" case], if you pre-decrement
the argument, type the intermediate variable with (UNSIGNED-BYTE 32),
and then pass it incremented to RANDOM, the consing goes away again,
regardless of the value of *FOO* [provided, of course, that you
don't attempt values outside the range (INTEGER 1 4294967296)]:

    cmu> (load "float-tran-patch")

    ; Loading #P"/u/rpw3/float-tran-patch.x86f".
    T
    cmu> (setf *foo* (expt 2 32))       ; [already DEFVAR'd earlier]

    4294967296
    cmu> (time (let ((bar (1- *foo*))) ; so as not to violate the U32
		 (declare (type (unsigned-byte 32) bar)) 
		 (dotimes (i 1000000)
		   (random (the (unsigned-byte 32) (1+ bar))))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.02f0 seconds of real time
    ;   0.015088f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   34,611,940 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

Or for something midway between a fixnum and 2^32:

    cmu> (setf *foo* (floor (- (expt 2 32) most-positive-fixnum) 2))

    1879048192
    cmu> (fixnump *foo*)

    NIL
    cmu> (time (let ((bar (1- *foo*)))
		 (declare (type (unsigned-byte 32) bar)) 
		 (dotimes (i 1000000)
		   (random (the (unsigned-byte 32) (1+ bar))))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.02f0 seconds of real time
    ;   0.015289f0 seconds of user run time
    ;   0.001162f0 seconds of system run time
    ;   37,957,864 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

I think this works because the compiler is smart enough to know
that one plus a U32 is the same as (INTEGER 1 4294967296), which is
the preferred RANDOM arg type to do the special-casing on X86 platforms.

You can also avoid the consing the following way, though I'm not sure
if it's less or more ugly than the above!! ;-}

    cmu> (time (if (= *foo* 4294967296) 
		   (dotimes (i 1000000)
		     (random 4294967296)) ; special-cased in the DEFTRANSFORM
		   (let ((bar *foo*))
		     (declare (type (integer 1 4294967295) bar))
		     (dotimes (i 1000000)
		       (random bar))))) 
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.02f0 seconds of real time
    ;   0.015329f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   35,832,596 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

Again, the typing in the second branch lets the compiler know it
can keep BAR in a register and not need generic bignum arithmetic
on it.

+---------------
| | But the DEFTRANSFORM has special code on x86 platforms to inline
| | 32-bit random numbers [that is, an arg to RANDOM of less than or
| | equal to 2^32]. Are you suggesting that [even after the above bug-fix]
| | said special-case code is in fact wrong?!?
| 
| The issue IIUC is in using (REM (random-chunk) n) to return a random
| number in the range [0, N-1],    (as Raymond Toy pointed out:)
| 
| (random-chunk) is uniformly distributed in the range
|  [0, (expt 2 random-chunk-legth)] 
| 
| [The mersenne twister in cmucl uses a random-chunk-length = 32.]
+---------------

Actually, (random-chunk) is of type (UNSIGNED-BYTE 32), and so is is
uniformly distributed in the range [0, (expt 2 random-chunk-length)),
using half-open intervals, or [0, (1- (expt 2 random-chunk-length))]
using closed intervals.

Also, I suspect Ray may have forgotten that I was specifically talking
about the X86 platform. That (REM (RANDOM-CHUNK) N) code is under a #-X86
conditional, and won't be used. And in any event, that section of code
is only used for the case of a compile-time *constant*, and even then,
only when the constant is *not* exactly 2^32.

For the cases I was trying to fix -- the arg is a variable in [1, 2^32] --
a different section of code is used, which on the X86 platform does
(VALUES (BIGNUM::%MULTIPLY (RANDOM-CHUNK ...) ARG)) instead of the
bignum REM, where BIGNUM::%MULTIPLY is an inlined VOP that takes two
32-bit numbers and returns a 64-bit result in two 32-bit registers --
that is, doesn't cons.

Bottom line, if you stand on your head & spin around three times ;-} ;-}
you can get CMUCL on X86 to serve up pseudo-random numbers up to
and including 32 bits long without consing.


-Rob

p.s.
+---------------
| So there is a small bias in the generated numbers in the range 
| [0, (mod (expt 2 random-chunk-length) n)]
| 
| I'm not an expert, so please correct me if I'm wrong.  
+---------------

Does this bias still exist if you use the correct type for
RANDOM-CHUNK, which on X86 is (UNSIGNED-BYTE 32)??

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Madhu
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Sat, 21 Jun 2008 17:37:46 +0000
Message-ID: <m3skv6wuxh.fsf@moon.robolove.meer.net>

* (Rob Warnock) <································@speakeasy.net> :
Wrote on Sat, 21 Jun 2008 08:15:30 -0500:

| +---------------
| | I think The no-consing behaviour you observe [after patching
| | compiler/float-tran.lisp] is not because of 32bit inlining but
| | because your values lie in the fixnum range.
| +---------------

I thought the presence of the bignum multiply indicated there was the
potential to cons

[SNIP]

| But interestingly [the "no, not entirely" case], if you pre-decrement
| the argument, type the intermediate variable with (UNSIGNED-BYTE 32),
| and then pass it incremented to RANDOM, the consing goes away again,
| regardless of the value of *FOO* [provided, of course, that you
| don't attempt values outside the range (INTEGER 1 4294967296)]:

Are you sure its not just the compiler playing tricks?
Could you replace the body of your loop which says:
	(random BAR)
with:
	(setq $foo (random BAR))

after doing a (defvar $foo) ...
Does it change the TIME results?

	"I have seen the CONSING" :)

| | | But the DEFTRANSFORM has special code on x86 platforms to inline
| | | 32-bit random numbers [that is, an arg to RANDOM of less than or
| | | equal to 2^32]. Are you suggesting that [even after the above bug-fix]
| | | said special-case code is in fact wrong?!?
| |
| | The issue IIUC is in using (REM (random-chunk) n) to return a random
| | number in the range [0, N-1],    (as Raymond Toy pointed out:)
| |
| | (random-chunk) is uniformly distributed in the range
| |  [0, (expt 2 random-chunk-legth)]
| |
| | [The mersenne twister in cmucl uses a random-chunk-length = 32.]
|
| Actually, (random-chunk) is of type (UNSIGNED-BYTE 32), and so is is
| uniformly distributed in the range [0, (expt 2 random-chunk-length)),

[SNIP] just addressing bias now:

| | So there is a small bias in the generated numbers in the range
| | [0, (mod (expt 2 random-chunk-length) n)]
| |
| | I'm not an expert, so please correct me if I'm wrong.
|
| Does this bias still exist if you use the correct type for
| RANDOM-CHUNK, which on X86 is (UNSIGNED-BYTE 32)??

I have not studied the MT code (either in CMUCL or C) closely. but I do
not see a (REM) being called in the case you are intersted in (for
arg<2^32) so you should be safe from this bias I was trying to
illustrate.

[I believe that the bias in CMUCL's random (in the other case) is well
bounded and small enough to be acceptable generally, but it is non-zero.
I have not been able to convince myself that the sampling technique used
in ranlib.c is much better, though it comes at a bignum cost (hehheh)]

--
Madhu

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Mon, 23 Jun 2008 02:31:39 +0000
Message-ID: <rvqdnXIX8IaWlsLVnZ2dnUVZ_v7inZ2d@speakeasy.net>

{I think we've probably beaten this one to death, but I'll
reply once final time, trying not to open any new issues!  ;-}  ]

Madhu <·······@meer.net> wrote:
+---------------
| In <································@speakeasy.net>,
| Rob Warnock <····@rpw3.org> wrote:
| | Madhu <·······@meer.net> wrote:
| | +---------------
| | | I think The no-consing behaviour you observe [after patching
| | | compiler/float-tran.lisp] is not because of 32bit inlining but
| | | because your values lie in the fixnum range.
| | +---------------
| 
| I thought the presence of the bignum multiply indicated there
| was the potential to cons
+---------------

But as I noted before, BIGNUM::%MULTIPLY is a VOP [a compiler-inlined
code segment, roughly] specifically designed *not* to cons when doing
a 32b x 32b -> 64b multiply [usually entirely in the registers]. You
just have to tease the compiler into generating the case that uses it.
Unfortunately, the currently-released x86 code blocks that in *exactly*
the case when you want it -- (<= (1+ RANDOM-FIXNUM-MAX) ARG (EXPT 2 32)) --
which is what my (DEFTRANSFORM RANDOM) patch fixes.

+---------------
| | But interestingly [the "no, not entirely" case], if you pre-decrement
| | the argument, type the intermediate variable with (UNSIGNED-BYTE 32),
| | and then pass it incremented to RANDOM, the consing goes away again,
| | regardless of the value of *FOO* [provided, of course, that you
| | don't attempt values outside the range (INTEGER 1 4294967296)]:
| 
| Are you sure its not just the compiler playing tricks?
+---------------

Well, yes, of course it's "playing tricks", but it's the well-known
trick of type-propagation.  ;-}  But it does break down if the body of
the loop gets too complex, so one has to use alternatives in that event
[see below].

+---------------
| Could you replace the body of your loop which says:
| 	(random BAR)
| with:
| 	(setq $foo (random BAR))
| after doing a (defvar $foo) ...
| Does it change the TIME results?
+---------------

If you do a DEFVAR per se to hold the result, sure, of *course* it
will then definitely cons, since DEFVAR'd variables are global and
therefore must be fully boxed, and therefore any 29- to 32-bit result
will cons up a bignum. But if you define a *local* variable of type
(UNSIGNED-BYTE 32) [big enough to hold a (RANDOM 4294967296)] and
assign to *it*, then there's still no consing. Starting the example
from scratch again:

    cmu> (load "float-tran-patch")

    ; Loading #P"/u/rpw3/float-tran-patch.x86f".
    T
    cmu> (defvar *foo* (floor (- (expt 2 32) most-positive-fixnum) 2))

    *FOO*
    cmu> *foo*

    1879048192
    cmu> (> *foo* kernel:random-fixnum-max)

    T 
    cmu> (fixnump *foo*)

    NIL
    cmu> (time (let ((bar (1- *foo*)) ; so as not to violate the U32
		     (quux 0))
                 (declare (type (unsigned-byte 32) bar quux)) 
                 (dotimes (i 1000000)
                   (setf quux (random (the (unsigned-byte 32) (1+ bar)))))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.03f0 seconds of real time
    ;   0.029242f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   54,873,082 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> 

And it's not just that the compiler detects that the value of QUUX is
effectively unused. [I could hear that coming a kilometer away!  ;-}  ]
The same stays true even if the value of RANDOM gets stored outside
the loop, as long as you store it in a type that can preserve the
unboxed (UNSIGNED-BYTE 32) nature, such as in a specialized array --
which, if you'll remember, was the question that Robert Maas was
originally asking that *started* this thread!  ;-}  ;-}

Unfortunately, it seems that the type-propagation gets a little
too hairy for the compiler for the trick I used above -- the
(LET ((BAR (1- *FOO*))) ... (RANDOM (1+ BAR))) -- to work here,
so you have to revert to the ugler alternative version I showed
previously.

    cmu> (defvar *arr* (make-array 1000000 :element-type '(unsigned-byte 32)))

    *ARR*
    cmu> (type-of *arr*)

    (SIMPLE-ARRAY (UNSIGNED-BYTE 32) (1000000))
    cmu> (setf *foo* (expt 2 32)) ; Test the first branch

    4294967296
    cmu> (time (if (= *foo* 4294967296)
		 (dotimes (i 1000000)
		   (declare (type (simple-array (unsigned-byte 32)) *arr*))
		   (setf (aref *arr* i)
			 (random 4294967296))) ; Special-cased in DEFTRANSFORM
		 (let ((bar *foo*))            ; Make u32 in-register copy
		   (declare (type (unsigned-byte 32) bar) 
			    (type (simple-array (unsigned-byte 32)) *arr*))
		   (assert (plusp bar))
		   (dotimes (i 1000000)
		     (setf (aref *arr* i)
			   (random (the (integer 1 4294967295) bar)))))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.04f0 seconds of real time
    ;   0.037973f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   78,157,422 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> (decf *foo*)   ; And now the second branch: 0 < *foo* < 2^32

    4294967295
    cmu> (time (if (= *foo* 4294967296)
		 (dotimes (i 1000000)
		   (declare (type (simple-array (unsigned-byte 32)) *arr*))
		   (setf (aref *arr* i)
			 (random 4294967296))) ; Special-cased in DEFTRANSFORM
		 (let ((bar *foo*))            ; Make u32 in-register copy
		   (declare (type (unsigned-byte 32) bar) 
			    (type (simple-array (unsigned-byte 32)) *arr*))
		   (assert (plusp bar))
		   (dotimes (i 1000000)
		     (setf (aref *arr* i)
			   (random (the (integer 1 4294967295) bar)))))))
    ; Compiling LAMBDA NIL: 
    ; Compiling Top-Level Form: 

    ; Evaluation took:
    ;   0.05f0 seconds of real time
    ;   0.045365f0 seconds of user run time
    ;   0.0f0 seconds of system run time
    ;   85,205,259 CPU cycles
    ;   0 page faults and
    ;   0 bytes consed.
    ; 
    NIL
    cmu> (subseq *arr* 0 20)   ; Just to prove it's storing...
    #(451431769 309298074 3071506735 1152762304 3559544580 951329956
      4115968948 2011196577 1492186446 1960542499 3327192403 3712452625
      3289349728 680218552 2710617085 988100064 4123503696 835921392
      791426944 1630415614)
    cmu> 

"Look, Ma! No consing!"

[Note: The difference in the number of CPU cycles in each case above
is meaningless. RANDOM varies quite widely in its run time from call
to call, completely swamping any small variation in the code between
the two branches of the IF.]

+---------------
| "I have seen the CONSING" :)
+---------------

I'm sure you have. It's not terribly easy to get rid of.
But you *can* get rid of it, at least in the case of filling
an array with (pseudo-)random values [RM's original question].

+---------------
| [SNIP] just addressing bias now:
| 
| | Does this bias still exist if you use the correct type for
| | RANDOM-CHUNK, which on X86 is (UNSIGNED-BYTE 32)??
| 
| I have not studied the MT code (either in CMUCL or C) closely. but I do
| not see a (REM) being called in the case you are intersted in (for
| arg<2^32) so you should be safe from this bias I was trying to illustrate.
+---------------

Oh, o.k., thanks.

+---------------
| [I believe that the bias in CMUCL's random (in the other case) is well
| bounded and small enough to be acceptable generally, but it is non-zero.
| I have not been able to convince myself that the sampling technique used
| in ranlib.c is much better, though it comes at a bignum cost (hehheh)]
+---------------

Yes, well...  ;-}  ;-}


-Rob

p.s. If you feel a need to reply once more, go ahead. But I think I'm done.

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Madhu
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Mon, 23 Jun 2008 05:10:03 +0000
Message-ID: <m3hcbkybx0.fsf@moon.robolove.meer.net>

* (Rob Warnock) <································@speakeasy.net> :
Wrote on Sun, 22 Jun 2008 21:31:39 -0500:

| | I thought the presence of the bignum multiply indicated there
| | was the potential to cons
|
| But as I noted before, BIGNUM::%MULTIPLY is a VOP [a compiler-inlined
| code segment, roughly] specifically designed *not* to cons when doing
| a 32b x 32b -> 64b multiply [usually entirely in the registers]. You
| just have to tease the compiler into generating the case that uses it.
| Unfortunately, the currently-released x86 code blocks that in *exactly*
| the case when you want it -- (<= (1+ RANDOM-FIXNUM-MAX) ARG (EXPT 2 32)) --
| which is what my (DEFTRANSFORM RANDOM) patch fixes.

[snip]

| "Look, Ma! No consing!"
| | "I have seen the CONSING" :)
| I'm sure you have. It's not terribly easy to get rid of.
| But you *can* get rid of it, at least in the case of filling
| an array with (pseudo-)random values [RM's original question].

OK I stand corrected.  My mistake was I assumed that just using the
bignum random value would imply boxing, but obviously I forgot the
original question where the result could be stored unboxed

Sorry for dragging this out!

--
Madhu

From: Raymond Toy (RT/EUS)
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Mon, 23 Jun 2008 16:52:25 +0000
Message-ID: <sxd7icgayba.fsf@rtp.ericsson.se>

>>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:

    Rob> Nope, tried all kinds of type declarations -- doesn't help. The problem
    Rob> is, as I said, a typo in "cmucl/src/compiler/float-tran.lisp", in the
    Rob> #+RANDOM-MT19937 (DEFTRANSFORM RANDOM). Two of the COND terms need to
    Rob> be swapped, namely, this code, which does a premature GIVE-UP [because
    Rob> the RANDOM-FIXNUM-MAX is smaller than (EXPT 2 32), so the second branch
    Rob> can never be taken!]:

    Rob>       ((> num-high random-fixnum-max)
    Rob>        (give-up "The range is too large to assure an accurate result."))
    Rob>       #+x86
    Rob>       ((< num-high (expt 2 32))
    Rob>        '(values (bignum::%multiply (random-chunk (or state *random-state*))
    Rob> 		 num)))

    Rob> Those clauses need to be swapped, viz.:

    Rob>       #+x86
    Rob>       ((< num-high (expt 2 32))
    Rob>        '(values (bignum::%multiply (random-chunk (or state *random-state*))
    Rob> 		 num)))
    Rob>       ((> num-high random-fixnum-max)
    Rob>        (give-up "The range is too large to assure an accurate result."))

Yes this should be swapped.  Also, the code isn't specific to x86, so
the conditionalization for x86 can be removed.  The major difference
would be that non-x86 has an random-chunk that conses (because it
needs to return a result somewhere), whereas x86 has a VOP to do it,
so it doesn't cons at all.  Random-chunk is relatively simple but I'm
too lazy right now to write a vop for the other architectures.

[snip]

    Rob> +---------------
    Rob> | Also, Madhu has mentioned to me that random isn't very good for large
    Rob> | integers near 2^32.  It uses modulo to convert the 32-bit internal
    Rob> | random number to the desired range.  This produces a bias.  I think
    Rob> | this is why random-integer-extra-bits and random-fixnum-max exists.  A
    Rob> | rejection technique would probably be better.
    Rob> +---------------

    Rob> But the DEFTRANSFORM has special code on x86 platforms to inline
    Rob> 32-bit random numbers [that is, an arg to RANDOM of less than or
    Rob> equal to 2^32]. Are you suggesting that [even after the above bug-fix]
    Rob> said special-case code is in fact wrong?!?

Hmm.  Have to think about that.  Instead of doing a rem, it does a
multiply and takes the top 32 bits of the 64-bit product.  I think
this doesn't have the problem.

This change has the annoying side effect that the sequence of random
numbers will now be different from what it was before.  Fortunately,
19e was just recently released, so this is a good time to make such a
change.

Ray

From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Mon, 23 Jun 2008 23:12:02 +0000
Message-ID: <rem-2008jun23-001@yahoo.com>

> Date: Fri, 13 Jun 2008 03:44:42 -0500
> From: ····@rpw3.org (Rob Warnock)
> Robert Maas, <··················@spamgourmet.com.remove> wrote:

Hey, because you deleted part of my address, namely the 'uh3t'
part, I had no practical way to discover your article, because
Google Groups advanced search for "Robert Maas" turned up nothing
more recent than mid-March, for weeks and weeks. As a workaround, I
searched for uh3t, which continued to work, but since you didn't
quote that part of my address, your followup never turned up in my
searches. Finally today Google Groups advanced search for exact
phrases is working again, so I could finally discover that you had
responded to me more than a week ago.

> See my parallel reply <·····································@speakeasy.net>.

telnet nntp3.tsoft.net 119
Connected to corp.supernews.com.
Escape character is '^]'.
200 News.GigaNews.Com
HEAD ································@speakeasy.net
430 no such article

<http://groups.google.com/advanced_search?hl=en&q=&hl=en&>
   Message ID Lookup the message with message ID
->
<http://groups.google.com/group/comp.lang.lisp/msg/56a0b16688179275>
   First & foremost, Robert forgot to *compile* ONE-ACCESS!!
OK, I already saw that a long time ago. It doesn't mention 'uh3t
either, but I'm guessing that somebody *else* quoted the 'uh3t'
part, so I found the thread, and browsed the tree [sic, indented
without any connecting branches] view and found your non-uh3t
article too, but your more recent article I'm just discovering
today was *after* the last time I browsed the tree [sic] view of
that thread.

> The problem isn't the value of the arg to RANDOM per se, it's
> that unless the arg is a literal number the #+RANDOM-MT19937
> (DEFTRANSFORM RANDOM) inline expander in
> "cmucl/src/compiler/float-tran.lisp" isn't seeing the arg as one
> which will return an (UNSIGNED-BYTE 32) or smaller, and
> thus is punting to the generic function call of RANDOM... which
> conses some 80-96 bytes per call [it varies].

Yes, I saw that in the parallel post and fixed it by using SUBST to
include a literal integer in place of a dummy token each time I
generated the DEFUN form to eval.

> There's special code on X86 platforms to allow args up to (EXPT 2 32),
> *significantly* larger than KERNEL:RANDOM-FIXNUM-MAX [== 4194303 in 19c,
> but see below re 18b!], but unfortunately the COND branch for that check
> is in the wrong place!! [Yes, I have already mentioned this to one of
> the CMUCL maintainers.]

Is there a patch for that which generates a FASL file which can be
loaded into the user's environment to temporarily fix the problem
without needing sysadmin access to re-install the CMUCL core image?

> But how big is KERNEL:RANDOM-FIXNUM-MAX on CMUCL-18b?!? I don't
> have version 18b here, ...

You could always ask me. Why didn't you?

Today when I finally saw your question, I did this experiment:

% lisp
CMU Common Lisp 18b, ...
Loaded subsystems:
    Python 1.0, target Intel x86
    CLOS based on PCL version:  September 16 92 PCL (f)
* KERNEL:RANDOM-FIXNUM-MAX
4194303

> but on 18d it was already 4194303

Unchanged from 18b.

> Aha!! A web search shows that in even older versions of CMUCL,
> KERNEL:RANDOM-FIXNUM-MAX was only 524287!! So either try an
> ARRSIZ < 524287, or (much better!) switch to a newer version
> of CMUCL. [You really, *really* need to do that in any case,
> since CMUCL-18b is already a *decade* old!!]

That's not an option here. I don't have any unused disk space on my
personal account that I could use to install a newer version of
CMUCL in parallel to what's already on the system directory.

> p.s. Hmmm... What does (FIND :RANDOM-MT19937 *FEATURES*) give you on 18b?

:RANDOM-MT19937

> ... you might be able to load it at runtime [but *before*
> compiling code which calls RANDOM] and at least get a non-consing
> RANDOM with args up to 4194303 [results in 0..4194302].

Moot since it's already in *FEATURES* hence already installed here, right?

From: Pascal J. Bourguignon
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Tue, 24 Jun 2008 00:00:35 +0000
Message-ID: <878wwvn1lo.fsf@hubble.informatimago.com>

··················@spamgourmet.com.remove (Robert Maas, http://tinyurl.com/uh3t) writes:
> [...]
> searches. Finally today Google Groups advanced search for exact
> phrases is working again, so I could finally discover that you had
> responded to me more than a week ago.

Perhaps you should automatize a little more your news reading.  Since
it may be impractical for you to use gnus, or cnews, what about
writting your own nntp client, for example using:
http://www.cliki.net/org-davep-nntp

> That's not an option here. I don't have any unused disk space on my
> personal account that I could use to install a newer version of
> CMUCL in parallel to what's already on the system directory.

This week, I met at least 4 PC in the trash in Paris.  I guess 2 of
them still had HD.   I just cannot believe you couldn't find some five
years old PC in the trash in the Silicon Valley, that would be at
least ten times more powerful than your current computer.  

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
Wanna go outside.
Oh, no! Help! I got outside!
Let me back inside!

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Tue, 24 Jun 2008 07:50:58 +0000
Message-ID: <0u6dnb5K8dD_Ov3VnZ2dnUVZ_oninZ2d@speakeasy.net>

Robert Maas, <··················@spamgourmet.com.remove> wrote:
+---------------
| Hey, because you deleted part of my address, namely the 'uh3t' part...
+---------------

It makes the line too long, and anyway, "tinyurl.com" isn't
"part of [your] address".

+---------------
| > There's special code on X86 platforms to allow args up to (EXPT 2 32),
| > *significantly* larger than KERNEL:RANDOM-FIXNUM-MAX [== 4194303 in 19c,
| > but see below re 18b!], but unfortunately the COND branch for that check
| > is in the wrong place!! [Yes, I have already mentioned this to one of
| > the CMUCL maintainers.]
| 
| Is there a patch for that which generates a FASL file which can be
| loaded into the user's environment to temporarily fix the problem
| without needing sysadmin access to re-install the CMUCL core image?
+---------------

Yes, that's how I tested it. Simply COMPILE-FILE the following
[snip between the "===" lines] and load it into your running image:

===== BEGIN "float-tran-patch.lisp" ===================
(in-package "C")

#+random-mt19937
(deftransform random ((num &optional state)
		      ((integer 1 #.(expt 2 32)) &optional *))
  "use inline (unsigned-byte 32) operations"
  (let* ((num-type (continuation-type num))
	 (num-high (cond ((numeric-type-p num-type)
			  (numeric-type-high num-type))
			 ((union-type-p num-type)
			  ;; Find the maximum of the union type.  We
			  ;; know this works because if we're in this
			  ;; routine, NUM must be a subtype of
			  ;; (INTEGER 1 2^32), so each member of the
			  ;; union must be a subtype too.
			  (reduce #'max (union-type-types num-type)
				  :key #'numeric-type-high))
			 (t
			  (give-up)))))
    (cond ((constant-continuation-p num)
	   ;; Check the worst case sum abs error for the random number
	   ;; expectations.
	   (let ((rem (rem (expt 2 32) num-high)))
	     (unless (< (/ (* 2 rem (- num-high rem)) num-high (expt 2 32))
			(expt 2 (- kernel::random-integer-extra-bits)))
	       (give-up "The random number expectations are inaccurate."))
	     (if (= num-high (expt 2 32))
		 '(random-chunk (or state *random-state*))
		 #-x86 '(rem (random-chunk (or state *random-state*)) num)
		 #+x86
		 ;; Use multiplication which is faster.
		 '(values (bignum::%multiply 
			   (random-chunk (or state *random-state*))
			   num)))))
	  ;; ···············@rpw3.org -- BUGFIX! Swapped next two cases.
	  #+x86
	  ((< num-high (expt 2 32))
	   '(values (bignum::%multiply (random-chunk (or state *random-state*))
		     num)))
	  ((> num-high random-fixnum-max)
	   (give-up "The range is too large to assure an accurate result."))
	  (t
	   '(rem (random-chunk (or state *random-state*)) num)))))
===== END "float-tran-patch.lisp" ===================


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Raymond Toy (RT/EUS)
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Tue, 24 Jun 2008 13:45:32 +0000
Message-ID: <sxdtzfj9car.fsf@rtp.ericsson.se>

>>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:

    Rob> Yes, that's how I tested it. Simply COMPILE-FILE the following
    Rob> [snip between the "===" lines] and load it into your running image:

    Rob> ===== BEGIN "float-tran-patch.lisp" ===================
    Rob> (in-package "C")

[snip]

I think the deftransform can be simplified quite a bit.  Something
like the following should work:

#+random-mt19937
(deftransform random ((num &optional state)
		      ((integer 1 #.(expt 2 32)) &optional *))
  "use inline (unsigned-byte 32) operations"
  (let* ((num-type (continuation-type num))
	 (num-high (cond ((numeric-type-p num-type)
			  (numeric-type-high num-type))
			 ((union-type-p num-type)
			  (reduce #'max (union-type-types num-type)
				  :key #'numeric-type-high))
			 (t
			  (give-up)))))
    (cond ((< num-high (expt 2 32))
	   '(values (bignum::%multiply (random-chunk (or state *random-state*))
		     num)))
	  ((= num-high (expt 2 32))
	   '(random-chunk (or state *random-state*))))))

Not tested though.

Ray

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Wed, 25 Jun 2008 12:38:03 +0000
Message-ID: <54idnQ30wsa2of_VnZ2dnUVZ_uCdnZ2d@speakeasy.net>

Raymond Toy (RT/EUS) <···········@ericsson.com> wrote:
+---------------
| >>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:
| Rob> ===== BEGIN "float-tran-patch.lisp" ===================
| Rob> (in-package "C")
...
| I think the deftransform can be simplified quite a bit.
| Something like the following should work:
...
| Not tested though.
+---------------

Looks reasonable, thanks!


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Raymond Toy (RT/EUS)
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Thu, 26 Jun 2008 13:17:44 +0000
Message-ID: <sxdhcbg9vyf.fsf@rtp.ericsson.se>

>>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:

    Rob> Raymond Toy (RT/EUS) <···········@ericsson.com> wrote:
    Rob> +---------------
    Rob> | >>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:
    Rob> | Rob> ===== BEGIN "float-tran-patch.lisp" ===================
    Rob> | Rob> (in-package "C")
    Rob> ...
    Rob> | I think the deftransform can be simplified quite a bit.
    Rob> | Something like the following should work:
    Rob> ...
    Rob> | Not tested though.
    Rob> +---------------

    Rob> Looks reasonable, thanks!

Actually I made a few mistakes.  The following one is better:

(deftransform random ((num &optional state)
		      ((integer 1 #.(expt 2 32)) &optional *))
  "use inline (unsigned-byte 32) operations"
  (let* ((num-type (continuation-type num))
	 (num-high (cond ((numeric-type-p num-type)
			  (numeric-type-high num-type))
			 ((union-type-p num-type)
			  (reduce #'max (union-type-types num-type)
				  :key #'numeric-type-high))
			 (t
			  (give-up)))))
    (cond ((constant-continuation-p num)
	   (if (= num-high (expt 2 32))
	       '(random-chunk (or state *random-state*))
	       '(values (bignum::%multiply 
			 (random-chunk (or state *random-state*))
			 num))))
	  ((< num-high (expt 2 32))
	   '(values (bignum::%multiply (random-chunk (or state *random-state*))
		     num)))
	  ((= num-high (expt 2 32))
	   '(if (= num (expt 2 32))
		(random-chunk (or state *random-state*))
		(values (bignum::%multiply (random-chunk (or state *random-state*))
					   num))))
	  (t
	   (error "Shouldn't happen")))))

Also, if you use this deftransform, you probably also want to modify
RANDOM itself so that it computes random numbers in the same way.
Otherwise, you get the weird result that RANDOM from the repl with the
same arg and state may produce different answers than a compiled
version with the same arg and state.

Ray

From: Rob Warnock
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 27 Jun 2008 02:39:01 +0000
Message-ID: <0bKdnYzU3uZYz_nVnZ2dnUVZ_tzinZ2d@speakeasy.net>

Raymond Toy (RT/EUS) <···········@ericsson.com> wrote:
+---------------
| Actually I made a few mistakes.  The following one is better:
...
| Also, if you use this deftransform, you probably also want to modify
| RANDOM itself so that it computes random numbers in the same way.
| Otherwise, you get the weird result that RANDOM from the repl with the
| same arg and state may produce different answers than a compiled
| version with the same arg and state.
+---------------

Based on a reading of CLHS RANDOM, *RANDOM-STATE*, and MAKE-RANDOM-STATE,
I'd say that "weird" is not quite a strong enough adjective here;
"wrong" or even "badly broken" would be better.  ;-}


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607

From: Raymond Toy (RT/EUS)
Subject: Re: How to avoid GC in tight numeric test loop? (CMUCL)
Date: Fri, 27 Jun 2008 13:22:15 +0000
Message-ID: <sxdy74r812w.fsf@rtp.ericsson.se>

>>>>> "Rob" == Rob Warnock <····@rpw3.org> writes:

    Rob> Raymond Toy (RT/EUS) <···········@ericsson.com> wrote:
    Rob> +---------------
    Rob> | Actually I made a few mistakes.  The following one is better:
    Rob> ...
    Rob> | Also, if you use this deftransform, you probably also want to modify
    Rob> | RANDOM itself so that it computes random numbers in the same way.
    Rob> | Otherwise, you get the weird result that RANDOM from the repl with the
    Rob> | same arg and state may produce different answers than a compiled
    Rob> | version with the same arg and state.
    Rob> +---------------

    Rob> Based on a reading of CLHS RANDOM, *RANDOM-STATE*, and MAKE-RANDOM-STATE,
    Rob> I'd say that "weird" is not quite a strong enough adjective here;
    Rob> "wrong" or even "badly broken" would be better.  ;-}

Heh.  Yeah, I guess so.

Interestingly, there was a recent discussion on the cmucl list about
calling SIN from the repl and from compiled code like

(defun mysin (x)
  (declare (type (double-float 0d0 #.(scale-float 1d0 62))) x)
  (sin x))

For large x like (scale-float 1d0 61) (exactly representable in FP),
mysin and sin would produce different results because a compiler
deftransform would convert the call to sin to use the x86 fsin
instruction.  fsin doesn't work so well for such large arguments.

I wanted to make them the same, but most people wanted it fast and
allowed the difference on the grounds that such large x were
hopelessly inaccurate anyway.  Despite the fact that such x is exact.

Note that on sparc and ppc, there is no fsin instruction, so sin and
mysin always produces the same value because both have to call a C
function.

Ray