From: Leo
Subject: Improve the efficiency of a simple function
Date: 
Message-ID: <m063eobsf5.fsf@cam.ac.uk>
Hi there,

I have ported CMWC4096� (a random number generator) to sbcl. However the
speed is far from satisfactory. If it were 15 - 20 times faster, it
would have been more acceptable. I have researched extensively but my
knowledge of common lisp is not sufficient to improve it further. So I
post it here hoping CL gurus can help out. Thank you. -- Leo

C version
--------------------------------
  static unsigned long Q[4096],c=362436; /* choose random initial c<809430660 and */
                                         /* 4096 random 32-bit integers for Q[]   */
  unsigned long CMWC4096(void){
  unsigned long long t, a=18782LL;
  static unsigned long i=4095;
  unsigned long x,r=0xfffffffe;
     i=(i+1)&4095;
     t=a*Q[i]+c;
     c=(t>>32); x=t+c; if(x<c){x++;c++;}
     return(Q[i]=r-x);    }
--------------------------------

CL version (the difference is that the return is mapped to interval (0,1))
--------------------------------
;;; Multiply-with-carry
(let ((Q (make-array 4096 :element-type '(unsigned-byte 32) :adjustable nil
                     :fill-pointer nil :initial-element 0))
      (c 0)
      (i 4095))
  ;;; initialise Q
  ;; 4096 random 32-bit integers for Q
  (loop for j from 0 to 4095
     do (setf (aref Q j) (random #xfffffffe)))
  ;; choose random initial c<809430660
  (setf c (random 809430659))

  (defun random-cmwc ()
    (declare (optimize (speed 3) (debug 0))
             ((simple-array (unsigned-byte 32) (4096)) Q)
             ((unsigned-byte 32) c i))
    (let ((a 18782)
          (r #xfffffffe)                    ; 2^32-2
          (rr 2.3283064370807974d-10)       ; 1/(r+1)
          (t1 0)
          (t2 0)
          (x 0))
      (declare ((unsigned-byte 32) a r x t2)
               ((unsigned-byte 64) t1)
               ((double-float 0d0) rr))
      (setf i (logand (1+ i) 4095)       ; (i+1) - 4096
            t1 (+ (* (aref Q i) a) c)
            c (ash t1 -32)
            x (+ (logand t1 #xffffffff) c))
      (when (< x c)
        (incf x)
        (incf c))
      (setf t2 (- r x))
      (setf (aref Q i) t2)
      (* t2 rr))))


Footnotes: 
�  http://school.anhb.uwa.edu.au/personalpages/kwessen/shared/Marsaglia03.html
-- 
Emacs uptime: 13 hours, 31 minutes, 48 seconds

From: budden
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <3ccebe23-a8d0-4945-9752-b3606af2256a@d14g2000yqc.googlegroups.com>
Hi!
 I don't insist, but I observed that unsigned-byte arithmetics is slow
in SBCL.
It is essentially slower than C counterparts. Fast fixnum arithmetics
is used up to
2^30 or 2^31 in 32-bit SBCL (don't remember exact boundary, but it was
surprisingly low).
Unsigned-byte 32 seem to be not a fixnum.
 My advice is: try to avoid porting. Link C function instead and check
if it helps in
your application.
From: Leo
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <m0ljnk9qkt.fsf@cam.ac.uk>
On 2009-06-22 17:28 +0100, budden wrote:
> Hi! I don't insist, but I observed that unsigned-byte arithmetics is
> slow in SBCL. It is essentially slower than C counterparts. Fast
> fixnum arithmetics is used up to 2^30 or 2^31 in 32-bit SBCL (don't
> remember exact boundary, but it was surprisingly low). Unsigned-byte
> 32 seem to be not a fixnum. My advice is: try to avoid porting. Link C
> function instead and check if it helps in your application.

This is in line with

,----[ cmucl manual ]
| The main barrier to efficient Lisp programs is not that there is no
| efficient way to code the program in Lisp, but that it is difficult to
| arrive at that efficient coding.
`----

-- 
Emacs uptime: 22 hours, 11 minutes, 17 seconds
From: Barry Margolin
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <barmar-BCCBD3.23021022062009@news.eternal-september.org>
In article 
<····································@d14g2000yqc.googlegroups.com>,
 budden <···········@mail.ru> wrote:

> Hi!
>  I don't insist, but I observed that unsigned-byte arithmetics is 
>  slow
> in SBCL. It is essentially slower than C counterparts. Fast fixnum 
> arithmetics is used up to 2^30 or 2^31 in 32-bit SBCL (don't remember 
> exact boundary, but it was surprisingly low). Unsigned-byte 32 seem 
> to be not a fixnum.

That's because Lisp implementations usually need to steal a couple of 
bits for type codes.  Check the values of MOST-POSITIVE-FIXNUM in a few 
implementations.

Also, Common Lisp doesn't do modular arithmetic on UNSIGNED-BYTE, like C 
does with unsigned.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: gugamilare
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <a023b606-c493-4f8c-b49b-647ebe42ae97@x5g2000yqk.googlegroups.com>
On 23 jun, 00:02, Barry Margolin <······@alum.mit.edu> wrote:
> In article
> <····································@d14g2000yqc.googlegroups.com>,
>
>  budden <···········@mail.ru> wrote:
> > Hi!
> >  I don't insist, but I observed that unsigned-byte arithmetics is
> >  slow
> > in SBCL. It is essentially slower than C counterparts. Fast fixnum
> > arithmetics is used up to 2^30 or 2^31 in 32-bit SBCL (don't remember
> > exact boundary, but it was surprisingly low). Unsigned-byte 32 seem
> > to be not a fixnum.
>
> That's because Lisp implementations usually need to steal a couple of
> bits for type codes.  Check the values of MOST-POSITIVE-FIXNUM in a few
> implementations.
>
> Also, Common Lisp doesn't do modular arithmetic on UNSIGNED-BYTE, like C
> does with unsigned.

Someone correct me if I am wrong, but sbcl seems to be able to unbox
values in local variables and optimize the use of (unsigned-byte 32).
From: Rob Warnock
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <NfmdnZlpAcbkM9zXnZ2dnUVZ_tidnZ2d@speakeasy.net>
gugamilare  <··········@gmail.com> wrote:
+---------------
| Barry Margolin <······@alum.mit.edu> wrote:
| > Also, Common Lisp doesn't do modular arithmetic on UNSIGNED-BYTE, like C
| > does with unsigned.
| 
| Someone correct me if I am wrong, but sbcl seems to be able to unbox
| values in local variables and optimize the use of (unsigned-byte 32).
+---------------

But *only* when its type inferencer -- plus the type promises your
declarations have made -- can *guarantee* that such optimized use
is valid. As Barry say, it *still* won't do modular arithmetic on
UNSIGNED-BYTE unless you lie to it and then declare (SAFETY 0) or
something. [Or unless you wrap each computation in (LOGAND ... #xffffffff),
or (MOD ... 4294967296) or equiv., whereupon it will feel free to allow
"wraparound" to happen and will cheerfully elide the LOGAND or MOD
from the generated code!]


-Rob

-----
Rob Warnock			<····@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
From: Raymond Toy
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <sxdskhstj54.fsf@rtp.ericsson.se>
>>>>> "Leo" == Leo  <·······@gmail.com> writes:

    Leo> Hi there,
    Leo> I have ported CMWC4096� (a random number generator) to sbcl. However the
    Leo> speed is far from satisfactory. If it were 15 - 20 times faster, it
    Leo> would have been more acceptable. I have researched extensively but my
    Leo> knowledge of common lisp is not sufficient to improve it further. So I
    Leo> post it here hoping CL gurus can help out. Thank you. -- Leo

Any particular reason why you want CMWC4096?  Is the builtin generator
deficient in some way?

I have some other RNGs, including Marsaglia's 64-bit "universal" rng,
various MRG generators, and some WELL rngs, if you're interested.

    Leo> C version
    Leo> --------------------------------
    Leo>   static unsigned long Q[4096],c=362436; /* choose random initial c<809430660 and */
    Leo>                                          /* 4096 random 32-bit integers for Q[]   */
    Leo>   unsigned long CMWC4096(void){
    Leo>   unsigned long long t, a=18782LL;
    Leo>   static unsigned long i=4095;
    Leo>   unsigned long x,r=0xfffffffe;
    Leo>      i=(i+1)&4095;
    Leo>      t=a*Q[i]+c;
    Leo>      c=(t>>32); x=t+c; if(x<c){x++;c++;}
    Leo>      return(Q[i]=r-x);    }
    Leo> --------------------------------

    Leo> CL version (the difference is that the return is mapped to interval (0,1))

Didn't you get some compiler notes when compiling this function?  You
need to fix those if you want to get this code to run fast(er).

First, let me assume you're on a 32-bit lisp.  If not, ignore these
comments.

    Leo>       (setf i (logand (1+ i) 4095)       ; (i+1) - 4096
    Leo>             t1 (+ (* (aref Q i) a) c)

The computation of (* (aref Q i) a) will be a problem if you really
want a 64-bit result.  To do this quickly, you'll need to do it by
hand to get the desired 64-bit result.  That is, you get to rearrange
the computations so you build up the result without having to cons a
bignum.  Or use some implementation-specific internals to do this quickly.

Ray
From: Leo
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <m0skhs9r4t.fsf@cam.ac.uk>
Hi Raymond,

Many thanks for your excellent answer.

On 2009-06-22 15:53 +0100, Raymond Toy wrote:
>>>>>> "Leo" == Leo  <·······@gmail.com> writes:
>
>     Leo> Hi there,
>     Leo> I have ported CMWC4096� (a random number generator) to sbcl. However the
>     Leo> speed is far from satisfactory. If it were 15 - 20 times faster, it
>     Leo> would have been more acceptable. I have researched extensively but my
>     Leo> knowledge of common lisp is not sufficient to improve it further. So I
>     Leo> post it here hoping CL gurus can help out. Thank you. -- Leo
>
> Any particular reason why you want CMWC4096?  Is the builtin generator
> deficient in some way?
>
> I have some other RNGs, including Marsaglia's 64-bit "universal" rng,
> various MRG generators, and some WELL rngs, if you're interested.

The default random generator in sbcl uses Mersenne twister (MT), I'd
like to have a few other generators around. It seems cmwc4096 is the one
with a longer period than MT and the C code looks short enough so I
attempt a port.

Yes, I am interested in other rngs.

>     Leo> C version
>     Leo> --------------------------------
>     Leo>   static unsigned long Q[4096],c=362436; /* choose random initial c<809430660 and */
>     Leo>                                          /* 4096 random 32-bit integers for Q[]   */
>     Leo>   unsigned long CMWC4096(void){
>     Leo>   unsigned long long t, a=18782LL;
>     Leo>   static unsigned long i=4095;
>     Leo>   unsigned long x,r=0xfffffffe;
>     Leo>      i=(i+1)&4095;
>     Leo>      t=a*Q[i]+c;
>     Leo>      c=(t>>32); x=t+c; if(x<c){x++;c++;}
>     Leo>      return(Q[i]=r-x);    }
>     Leo> --------------------------------
>
>     Leo> CL version (the difference is that the return is mapped to interval (0,1))
>
> Didn't you get some compiler notes when compiling this function?  You
> need to fix those if you want to get this code to run fast(er).

Unfortunately the notes do not help much in this case.

> First, let me assume you're on a 32-bit lisp. If not, ignore these
> comments.

Yes 32-bit on macosx.

>     Leo>       (setf i (logand (1+ i) 4095)       ; (i+1) - 4096
>     Leo>             t1 (+ (* (aref Q i) a) c)
>
> The computation of (* (aref Q i) a) will be a problem if you really
> want a 64-bit result.  To do this quickly, you'll need to do it by
> hand to get the desired 64-bit result.  That is, you get to rearrange
> the computations so you build up the result without having to cons a
> bignum.  Or use some implementation-specific internals to do this quickly.

This one takes about 70% of the whole computing time. Let me see if I
can get this done.

> Ray

Thanks.

Leo

-- 
Emacs uptime: 21 hours, 52 minutes, 34 seconds
From: Spiros Bousbouras
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <1e8e1db3-5020-4747-ac32-52ca9ae20aa2@q16g2000yqg.googlegroups.com>
On 22 June, 17:20, Leo <·······@gmail.com> wrote:
>
> The default random generator in sbcl uses Mersenne twister (MT), I'd
> like to have a few other generators around. It seems cmwc4096 is the one
> with a longer period than MT and the C code looks short enough so I
> attempt a port.
>
> Yes, I am interested in other rngs.

Check
< http://groups.google.co.uk/group/comp.lang.c/browse_thread/thread/a85bf5f2a97f5a55#
>
From: Raymond Toy
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <sxdocsgt2mr.fsf@rtp.ericsson.se>
>>>>> "Leo" == Leo  <·······@gmail.com> writes:

    Leo> The default random generator in sbcl uses Mersenne twister (MT), I'd
    Leo> like to have a few other generators around. It seems cmwc4096 is the one
    Leo> with a longer period than MT and the C code looks short enough so I
    Leo> attempt a port.

Good reasons.

    Leo> Yes, I am interested in other rngs.

I can send you the code if you're interested.  I'm not sure about the
periods of these generators.  I think WELL 44497 has period 2^44497 or
so.

    Leo> Yes 32-bit on macosx.

    Leo> (setf i (logand (1+ i) 4095)       ; (i+1) - 4096
    Leo> t1 (+ (* (aref Q i) a) c)
    >> 
    >> The computation of (* (aref Q i) a) will be a problem if you really
    >> want a 64-bit result.  To do this quickly, you'll need to do it by
    >> hand to get the desired 64-bit result.  That is, you get to rearrange
    >> the computations so you build up the result without having to cons a
    >> bignum.  Or use some implementation-specific internals to do this quickly.

    Leo> This one takes about 70% of the whole computing time. Let me see if I
    Leo> can get this done.

If you don't care about portability, look for %multiply which
multiplies two 32-bit numbers and returns the 64-bit product as two
values.  (I think sbcl still has this.  Cmucl still does.)

And if you do care about portability, you can still use %multiply:
the fast internal one, and a slow portable one.

Ray
From: Pascal J. Bourguignon
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <7ctz28r5z8.fsf@pbourguignon.anevia.com>
Leo <·······@gmail.com> writes:

> Hi there,
>
> I have ported CMWC4096� (a random number generator) to sbcl. However the
> speed is far from satisfactory. If it were 15 - 20 times faster, it
> would have been more acceptable. I have researched extensively but my
> knowledge of common lisp is not sufficient to improve it further. So I
> post it here hoping CL gurus can help out. Thank you. -- Leo
>
> C version
> --------------------------------
>   static unsigned long Q[4096],c=362436; /* choose random initial c<809430660 and */
>                                          /* 4096 random 32-bit integers for Q[]   */
>   unsigned long CMWC4096(void){
>   unsigned long long t, a=18782LL;
>   static unsigned long i=4095;
>   unsigned long x,r=0xfffffffe;
>      i=(i+1)&4095;
>      t=a*Q[i]+c;
>      c=(t>>32); x=t+c; if(x<c){x++;c++;}
>      return(Q[i]=r-x);    }
> --------------------------------
>
> CL version (the difference is that the return is mapped to interval (0,1))
> --------------------------------
> ;;; Multiply-with-carry
> (let ((Q (make-array 4096 :element-type '(unsigned-byte 32) :adjustable nil
>                      :fill-pointer nil :initial-element 0))
>       (c 0)
>       (i 4095))
>   ;;; initialise Q
>   ;; 4096 random 32-bit integers for Q
>   (loop for j from 0 to 4095
>      do (setf (aref Q j) (random #xfffffffe)))
>   ;; choose random initial c<809430660
>   (setf c (random 809430659))
>
>   (defun random-cmwc ()
>     (declare (optimize (speed 3) (debug 0))
>              ((simple-array (unsigned-byte 32) (4096)) Q)
>              ((unsigned-byte 32) c i))
>     (let ((a 18782)
>           (r #xfffffffe)                    ; 2^32-2
>           (rr 2.3283064370807974d-10)       ; 1/(r+1)
>           (t1 0)
>           (t2 0)
>           (x 0))
>       (declare ((unsigned-byte 32) a r x t2)
>                ((unsigned-byte 64) t1)
>                ((double-float 0d0) rr))
>       (setf i (logand (1+ i) 4095)       ; (i+1) - 4096
>             t1 (+ (* (aref Q i) a) c)
>             c (ash t1 -32)
>             x (+ (logand t1 #xffffffff) c))
>       (when (< x c)
>         (incf x)
>         (incf c))
>       (setf t2 (- r x))
>       (setf (aref Q i) t2)
>       (* t2 rr))))

What does { unsigned long c=4294967295; c++; printf("%ud\n",c); }  print?
What does (let ((c 4294967295)) (declare (unsigned-byte 32) c) (incf c) (print c)) print?
What did you miss? (two things)

-- 
__Pascal Bourguignon__
From: Leo
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <m01vpcbdfe.fsf@cam.ac.uk>
On 2009-06-22 10:08 +0100, Pascal J. Bourguignon wrote:
[...]
> What does { unsigned long c=4294967295; c++; printf("%ud\n",c); }  print?
> What does (let ((c 4294967295)) (declare (unsigned-byte 32) c) (incf c) (print c)) print?
> What did you miss? (two things)

By asking on #lisp, it seems for lisp when the number goes beyond
uint-32, it changes into a even bigger type. But with typechecking, an
error will be thrown at least with sbcl. I couldn't fully appreciate the
comments I got so after a bit of tweaking, it turns into something like
http://paste.lisp.org/display/82281#3 , much uglier and still the same
inefficiency.

Could you elaborate on the two things? Thanks.

Leo

-- 
Emacs uptime: 19 hours, 4 minutes, 51 seconds
From: Pillsy
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <f835c109-9f7a-46f5-a8db-4126126883d6@e21g2000yqb.googlegroups.com>
On Jun 22, 9:33 am, Leo <·······@gmail.com> wrote:
[...]
> By asking on #lisp, it seems for lisp when the number goes beyond
> uint-32, it changes into a even bigger type. But with typechecking, an
> error will be thrown at least with sbcl.

In C, don't unsigned integers wrap around on overflowing? If c is a 32-
bit unsigned integer, I'd think the equivalent to ++c in Common Lisp
would be something like:

(mod (incf c) (expt 2 32))

Cheers,
Pillsy
From: Spiros Bousbouras
Subject: Re: Improve the efficiency of a simple function
Date: 
Message-ID: <7c659170-77bc-4bf2-a400-d7fb71d1d276@y7g2000yqa.googlegroups.com>
On 22 June, 10:08, ····@informatimago.com (Pascal J. Bourguignon)
wrote:
>
> What does { unsigned long c=4294967295; c++; printf("%ud\n",c); }  print?

This should be      printf("%lu\n",c);