From: Robert Maas, http://tinyurl.com/uh3t
Subject: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <rem-2008jul07-001@yahoo.com>
(Please don't complain that I'm using PROG. I like PROG!!)

;Given N = odd integer to factor, FACLO = smallest odd factor to consider:
;Test odd integers from FACLO upward to first success or sqrt(N).
;Return NIL if no such factor was found, else return the factor.
(defun oddn+faclo-brute-test-factors (n faclo) ;(setq n 3467527) (setq faclo 3)
  ;(declare (optimize (speed 3) (safety 0))) ;Generates too many errors!!
  (unless (fixnump n) (error "Non-FIXNUM N doesn't work here"))
  (unless (fixnump faclo) (error "Non-FIXNUM faclo doesn't work here"))
  (unless (oddp faclo) (error "FACLO needs to be odd"))
  (prog ((nn (the (unsigned-byte 32) n))
         (m (the (unsigned-byte 32) faclo))
         (sqrtn (the (unsigned-byte 32) (isqrt nn)))
         (r 0))
    (declare (type (unsigned-byte 32) nn m sqrtn r))
  lp(setq r (mod nn m))
    (when (zerop r) (return m))
    (incf m 2)
    (when (<= m sqrtn) (go lp))
    (return nil)
    ))

With none of the declarations shown above, compiled it takes about 2-4
 milliseconds per call (see test data below).

With all the TYPE declarations (via both DECLARE and THE),
 but without the OPTIMIZE declaration,
 i.e. as you see it above, compiled:
(setq testprod (* 23053 23167))
(time (dotimes (xx 10000) (oddn+faclo-brute-test-factors testprod 3)))
;About 2 seconds per 10000, hence 200 microseconds per call

With the OPTIMIZE declaration also, attempt to compile the function
generates an excessive spew of error messages (see some later below)
which overflow modem buffer so I can't even see them all. What is wrong?


Just the first four of the spew of error messages:

In: LAMBDA (N FACLO)
  (INCF M 2)
--> LET*
==>
  (+ M 2)
Note: Unable to optimize due to type uncertainty:
      The first argument is a NUMBER, not a FLOAT.

  (<= M SQRTN)
--> IF
==>
  (> M SQRTN)
Note: Unable to optimize due to type uncertainty:
      The first argument is a REAL, not a FLOAT.
Note: Unable to optimize due to type uncertainty:
      The first argument is a REAL, not a INTEGER.

  (MOD NN M)
--> BLOCK LET IF AND IF NOT IF ZEROP
==>
  (= REM 0)
Note: Unable to optimize due to type uncertainty:
      The first argument is a REAL, not a FLOAT.
Note: Unable to open code because:
      Operands might not be the same type.

--> BLOCK LET IF AND IF AND IF MINUSP
==>
  (< KERNEL::DIVISOR 0)
Note: Unable to optimize due to type uncertainty:
      The first argument is a REAL, not a FLOAT.
Note: Unable to optimize due to type uncertainty:
      The first argument is a REAL, not a INTEGER.

etc. etc. etc.

From: Barry Margolin
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <barmar-AA800F.21363307072008@newsgroups.comcast.net>
In article <·················@yahoo.com>,
 ··················@spamgourmet.com.remove (Robert Maas, 
 http://tinyurl.com/uh3t) wrote:

> (Please don't complain that I'm using PROG. I like PROG!!)

But not CHECK-TYPE, apparently.

> 
> ;Given N = odd integer to factor, FACLO = smallest odd factor to consider:
> ;Test odd integers from FACLO upward to first success or sqrt(N).
> ;Return NIL if no such factor was found, else return the factor.
> (defun oddn+faclo-brute-test-factors (n faclo) ;(setq n 3467527) (setq faclo 
> 3)
>   ;(declare (optimize (speed 3) (safety 0))) ;Generates too many errors!!
>   (unless (fixnump n) (error "Non-FIXNUM N doesn't work here"))
>   (unless (fixnump faclo) (error "Non-FIXNUM faclo doesn't work here"))
>   (unless (oddp faclo) (error "FACLO needs to be odd"))
>   (prog ((nn (the (unsigned-byte 32) n))
>          (m (the (unsigned-byte 32) faclo))
>          (sqrtn (the (unsigned-byte 32) (isqrt nn)))
>          (r 0))
>     (declare (type (unsigned-byte 32) nn m sqrtn r))
>   lp(setq r (mod nn m))
>     (when (zerop r) (return m))
>     (incf m 2)
>     (when (<= m sqrtn) (go lp))
>     (return nil)
>     ))
> 
> With none of the declarations shown above, compiled it takes about 2-4
>  milliseconds per call (see test data below).
> 
> With all the TYPE declarations (via both DECLARE and THE),
>  but without the OPTIMIZE declaration,
>  i.e. as you see it above, compiled:
> (setq testprod (* 23053 23167))
> (time (dotimes (xx 10000) (oddn+faclo-brute-test-factors testprod 3)))
> ;About 2 seconds per 10000, hence 200 microseconds per call
> 
> With the OPTIMIZE declaration also, attempt to compile the function
> generates an excessive spew of error messages (see some later below)
> which overflow modem buffer so I can't even see them all. What is wrong?
> 
> 
> Just the first four of the spew of error messages:
> 
> In: LAMBDA (N FACLO)
>   (INCF M 2)
> --> LET*
> ==>
>   (+ M 2)
> Note: Unable to optimize due to type uncertainty:
>       The first argument is a NUMBER, not a FLOAT.

I suspect the problem is that your implementation isn't able to optimize 
unsigned 32-bit numbers, because these are outside the range of fixnums.  
Fixnums are typically smaller than the machine word, to allow for type 
bits.  Since your parameters are bigger than fixnums, it's falling into 
the generic NUMBER code.

What happens if you change all your (UNSIGNED-BYTE 32) declarations to 
FIXNUM?

> 
>   (<= M SQRTN)
> --> IF
> ==>
>   (> M SQRTN)
> Note: Unable to optimize due to type uncertainty:
>       The first argument is a REAL, not a FLOAT.
> Note: Unable to optimize due to type uncertainty:
>       The first argument is a REAL, not a INTEGER.
> 
>   (MOD NN M)
> --> BLOCK LET IF AND IF NOT IF ZEROP
> ==>
>   (= REM 0)
> Note: Unable to optimize due to type uncertainty:
>       The first argument is a REAL, not a FLOAT.
> Note: Unable to open code because:
>       Operands might not be the same type.
> 
> --> BLOCK LET IF AND IF AND IF MINUSP
> ==>
>   (< KERNEL::DIVISOR 0)
> Note: Unable to optimize due to type uncertainty:
>       The first argument is a REAL, not a FLOAT.
> Note: Unable to optimize due to type uncertainty:
>       The first argument is a REAL, not a INTEGER.
> 
> etc. etc. etc.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <rem-2008jul22-002@yahoo.com>
> > (Please don't complain that I'm using PROG. I like PROG!!)
> From: Barry Margolin <······@alum.mit.edu>
> But not CHECK-TYPE, apparently.

Hmmm, good point. I seem to recall that when I first started using
Common Lisp it seemed to me that CHECK-TYPE didn't offer quite the
flexibility I needed. But now that I've learned more deep stuff
about types, especially SATISFIES which converts just about any old
predicate function into a formal type specifier, I'm thinking maybe
CHECK-TYPE might actually be hackable to the full range of my
needs. So next time I'm writing new code in paranoid mode, maybe
I'll give it a try. Thanks.

> I suspect the problem is that your implementation isn't able to
> optimize unsigned 32-bit numbers, because these are outside the
> range of fixnums. Fixnums are typically smaller than the machine
> word, to allow for type bits.  Since your parameters are bigger
> than fixnums, it's falling into the generic NUMBER code.
> What happens if you change all your (UNSIGNED-BYTE 32) declarations to
> FIXNUM?

Well the point of this was to be able to take advantage of
efficient 32-bit machine arithmetic to perform brute-force trial
factors of numbers all the way up to 31 or 32 bit positive integers
without doing any CONSing whatsoever to see whether it was fast
enough to use as-is (no smarts such as using a completely different
primality-test algorithm that asymptotically faster than
brute-force trial factors). The basic idea is to have something
quick-and-dirty for integers that are small enough that nothing
smarter is needed. So if I water down my routine to only
FIXNUM-size integers, I'm basically giving up the experiment I
originally intended.

What I've actually done so-far is give up on the *entire*
optimization experiment, and restrict my algorithm to FIXNUMs only,
which means the higher level purpose can't be done as well as I had
hoped. I'm running the original code I published, which has the
declaractions of type but does *not* have the declaration of
optimization parameters. The higher level purpose is to generate
random products of easily-factorizable integers such that the
product N is within a desired very narrow range and 2N+1 happens to
be a prime. Unfortunately the restriction on FIXNUMs means the
target range can't be quite as narrow as I had hoped. But I've
found an algorithm that allows me to specify ten or eleven
high-order digits of the product and still crank out such
easy-to-factorize products, so that's enough for some of my
high-level needs. (This is all a slight improvement on an algorithm
I developed in MacLisp circa 1977, a cheap alternative to a much
better but much much more complicated idea I haven't yet
implemented. I've tentatively decided to implement the better idea
if and when I have time, rather than make any further effort to
make the crude 1977-variant more efficient.)
From: Pascal J. Bourguignon
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <873am1q122.fsf@hubble.informatimago.com>
··················@spamgourmet.com.remove (Robert Maas, http://tinyurl.com/uh3t) writes:
>> I suspect the problem is that your implementation isn't able to
>> optimize unsigned 32-bit numbers, because these are outside the
>> range of fixnums. Fixnums are typically smaller than the machine
>> word, to allow for type bits.  Since your parameters are bigger
>> than fixnums, it's falling into the generic NUMBER code.
>> What happens if you change all your (UNSIGNED-BYTE 32) declarations to
>> FIXNUM?
>
> Well the point of this was to be able to take advantage of
> efficient 32-bit machine arithmetic to perform brute-force trial
> factors of numbers all the way up to 31 or 32 bit positive integers
> without doing any CONSing whatsoever to see whether it was fast
> enough to use as-is (no smarts such as using a completely different
> primality-test algorithm that asymptotically faster than
> brute-force trial factors). The basic idea is to have something
> quick-and-dirty for integers that are small enough that nothing
> smarter is needed. So if I water down my routine to only
> FIXNUM-size integers, I'm basically giving up the experiment I
> originally intended.

C/MC[598]> (time (defparameter *primes-to-2^32* 
                    (COM.INFORMATIMAGO.COMMON-LISP.PRIMES:COMPUTE-PRIMES-TO (expt 2 32))))
Real time: 1740.897 sec.
Run time: 1740.4999 sec.
Space: 1894678112 Bytes
GC: 3, GC time: 0.26665 sec.
*PRIMES-TO-2^32*

On a single 64-bit core of  Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz  


Sorry, you'll have to wait tomorrow if you want me to send you the primes.

In the mean time, you can fetch my Eratostene sieve implementation,
and run it yourself...

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

Nobody can fix the economy.  Nobody can be trusted with their finger
on the button.  Nobody's perfect.  VOTE FOR NOBODY.
From: John Thingstad
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <op.uepe8xqaut4oq5@pandora.alfanett.no>
P� Tue, 22 Jul 2008 19:30:13 +0200, skrev Pascal J. Bourguignon  
<···@informatimago.com>:

> ··················@spamgourmet.com.remove (Robert Maas,  
> http://tinyurl.com/uh3t) writes:
>>> I suspect the problem is that your implementation isn't able to
>>> optimize unsigned 32-bit numbers, because these are outside the
>>> range of fixnums. Fixnums are typically smaller than the machine
>>> word, to allow for type bits.  Since your parameters are bigger
>>> than fixnums, it's falling into the generic NUMBER code.
>>> What happens if you change all your (UNSIGNED-BYTE 32) declarations to
>>> FIXNUM?
>>
>> Well the point of this was to be able to take advantage of
>> efficient 32-bit machine arithmetic to perform brute-force trial
>> factors of numbers all the way up to 31 or 32 bit positive integers
>> without doing any CONSing whatsoever to see whether it was fast
>> enough to use as-is (no smarts such as using a completely different
>> primality-test algorithm that asymptotically faster than
>> brute-force trial factors). The basic idea is to have something
>> quick-and-dirty for integers that are small enough that nothing
>> smarter is needed. So if I water down my routine to only
>> FIXNUM-size integers, I'm basically giving up the experiment I
>> originally intended.
>
> C/MC[598]> (time (defparameter *primes-to-2^32*
>                     (COM.INFORMATIMAGO.COMMON-LISP.PRIMES:COMPUTE-PRIMES-TO  
> (expt 2 32))))
> Real time: 1740.897 sec.
> Run time: 1740.4999 sec.
> Space: 1894678112 Bytes
> GC: 3, GC time: 0.26665 sec.
> *PRIMES-TO-2^32*
>
> On a single 64-bit core of  Intel(R) Core(TM)2 Quad CPU    Q6600  @  
> 2.40GHz
>
>
> Sorry, you'll have to wait tomorrow if you want me to send you the  
> primes.
>
> In the mean time, you can fetch my Eratostene sieve implementation,
> and run it yourself...
>

That's slow! For anything above 1000000 a quadric sieve would probably  
fare better.

--------------
John Thingstad
From: Pascal J. Bourguignon
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <87tzehokvs.fsf@hubble.informatimago.com>
"John Thingstad" <·······@online.no> writes:

> P� Tue, 22 Jul 2008 19:30:13 +0200, skrev Pascal J. Bourguignon
> <···@informatimago.com>:
>> C/MC[598]> (time (defparameter *primes-to-2^32*
>>                     (COM.INFORMATIMAGO.COMMON-LISP.PRIMES:COMPUTE-PRIMES-TO
>>                              (expt 2 32))))
>> Real time: 1740.897 sec.
>> Run time: 1740.4999 sec.
>> Space: 1894678112 Bytes
>> GC: 3, GC time: 0.26665 sec.
>> *PRIMES-TO-2^32*
>>
>> On a single 64-bit core of  Intel(R) Core(TM)2 Quad CPU    Q6600  @
>> 2.40GHz
>>
>>
>> Sorry, you'll have to wait tomorrow if you want me to send you the
>> primes.
>>
>> In the mean time, you can fetch my Eratostene sieve implementation,
>> and run it yourself...
>>
>
> That's slow! For anything above 1000000 a quadric sieve would probably
> fare better.

Yes, it's on clisp.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"You cannot really appreciate Dilbert unless you read it in the
original Klingon"
From: Vassil Nikolov
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <snziqvhm8om.fsf@luna.vassil.nikolov.name>
On Mon, 07 Jul 2008 15:30:05 -0700, ··················@spamgourmet.com.remove (Robert Maas, http://tinyurl.com/uh3t) said:

| ...
| (defun oddn+faclo-brute-test-factors (n faclo) ;(setq n 3467527) (setq faclo 3)
|   ;(declare (optimize (speed 3) (safety 0))) ;Generates too many errors!!
|   (unless (fixnump n) (error "Non-FIXNUM N doesn't work here"))
|   (unless (fixnump faclo) (error "Non-FIXNUM faclo doesn't work here"))
|   (unless (oddp faclo) (error "FACLO needs to be odd"))
|   (prog ((nn (the (unsigned-byte 32) n))
|          (m (the (unsigned-byte 32) faclo))
|          (sqrtn (the (unsigned-byte 32) (isqrt nn)))
|          (r 0))
|     (declare (type (unsigned-byte 32) nn m sqrtn r))
|   lp(setq r (mod nn m))
|     (when (zerop r) (return m))
|     (incf m 2)
|     (when (<= m sqrtn) (go lp))
|     (return nil)
|     ))

| ...
| an excessive spew of error messages (see some later below)
| which overflow modem buffer so I can't even see them all.

  (By the way, does DRIBBLE help here?)

| ...
|   (+ M 2)
| Note: Unable to optimize due to type uncertainty:
|       The first argument is a NUMBER, not a FLOAT.

  This seems to say that changing (UNSIGNED-BYTE 32) to FIXNUM might
  be a better deal for that compiler (CMU CL?).  For what it's worth,
  provided that PROG is changed to PROG*, which has to be done in any
  case (SQRTN is defined in terms of NN), the above, even with the
  optimization declaration "in" (uncommented), compiles cleanly with
  SBCL (with another, insignificant change of (FIXNUMP foo) to
  (TYPEP foo 'FIXNUM)).

  By the way, the "do 10,000 times" loop ran twice as slow for me if
  all explicit type information was removed, i.e. with

    (defun oddn+faclo-brute-test-factors (n faclo)
      (prog* ((nn (the t n))
             (m (the t faclo))
             (sqrtn (the t (isqrt nn)))
             (r 0))
      lp(setq r (mod nn m))
        (when (zerop r) (return m))
        (incf m 2)
        (when (<= m sqrtn) (go lp))
        (return nil)
        ))

  (yes, it is easy to do C-M-k before the second argument of THE, then
  C-M-u, C-y and another C-M-k, but typing t C-M-k before the first
  argument of THE is even easier, and leaves more of the bridge
  unburnt behind).

  ---Vassil.


-- 
Peius melius est.  ---Ricardus Gabriel.
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <rem-2008jul22-003@yahoo.com>
> | an excessive spew of error messages (see some later below)
> | which overflow modem buffer so I can't even see them all.
> From: Vassil Nikolov <···············@pobox.com>
> (By the way, does DRIBBLE help here?)

Ah, something I didn't think of trying. If DRIBBLE is used to save
massive spewout that otherwise would be lost due to modem buffer
overflow, it sorta defeats the idea of an interactive session, that
I need to enable DRIBBLE, do a REP step, close DRIBBLE, then 'more'
the dribble output file each time I do something that spews. But in
this case for a one-time (per major change in source-code
declarations) non-interactive act, maybe it would have been the
right thing to try. On the other hand, if the spew-out is too much
for the modem buffers, then maybe I don't *really* want to spend my
time manually eyeballing the whole thing to try to make sense of it all.

> | ...
> |   (+ M 2)
> | Note: Unable to optimize due to type uncertainty:
> |       The first argument is a NUMBER, not a FLOAT.
>   This seems to say that changing (UNSIGNED-BYTE 32) to FIXNUM might
>   be a better deal for that compiler (CMU CL?).

Yeah, except that since I've declared it a sub-type of INTEGER,
namely (UNSIGNED-BYTE 32), it makes no sense that it's thinking
that I would really want type FLOAT instead, like it's totally
messed up in its analysis of my code despite my best attempt to
tell it what kind of data I'm working with. Dealing with such
mis-understanding of my declarations and intentions is almost as
weird as talking with PARRY (the PDP-10 simulation of a paranoid
mental patient who was constantly concerned about gamblers out to
get him and would go "bonkers" if you said the slightest thing
wrong). Better to just walk out of the room and leave the crazy
compiler alone rather than try to reason with it and risk going
insane yourself.

>  For what it's worth,
>  provided that PROG is changed to PROG*, which has to be done in any
>  case (SQRTN is defined in terms of NN),

Yeah, that was a simple goof. I discovered it after posting but
before seeing any followups when it started producing outright
wrong results. I looked around in debugger, saw that two of the
PROG variables were mutually inconsistent, the second of them
seemingly based on a value I had been using during debugging
earlier, whereupon the cause was obvious, it was using the GLOBAL
binding left over from debugging rather than the PROG-bound current
binding, at which point I realized I should have used PROG* so that
the calculation of the later PROG variable would use the new value
of the earlier PROG variable intead of the leftover GLOBAL value.

As a result of the hour I lost tracking down this bug before I
realized PROG[*] was the problem, I have changed my standard way of
doing line-by-line REP development:
- Old way: Use G*varname for globals that really are intended to be
   globals, use varname for locals that will be global only
   temporarily during line-at-a-time REP development.
- New way: Use G*varname for globals that really are intended to be
   globals, use X-varname for locals that will be global only
   temporarily during line-at-a-time REP development, then delete
   all the X- prefixes at the moment I switch from line-at-a-time
   to full-function-definition, so that any X-varname global
   bindings left around from line-at-a-time REP development can't
   possibly be accidently referenced from the function definition
   because they aren't the same symbols in the first place. That way
   if I screw up like that I'd get a nice clean/obvious UNBOUND
   VARIABLE error signalled.
Since I need to sweep the PROG or PROG* for all SETQs (both regular
SETQs and MULTIPLE-VALUE-SETQs, using a single SETQ search string
to find them all together), I just do an extra sweep for string X-
just before that, so it's not a big hassle. That seems a better
idea than what I had thought of doing previously, of doing a
MAKUNBOUND of each symbol at that transition. Search-replace from
X- to null string in a text editor is a lot faster/fullproof than
concocting all those MAKUNBOUND calls.

> the above, even with the optimization declaration "in"
> (uncommented), compiles cleanly with SBCL (with another,
> insignificant change of (FIXNUMP foo) to (TYPEP foo 'FIXNUM)).

If you do *not* do that FIXNUMP change, do you get some small but
reaonable number of compiler warnings, compared to lots and lots of
them which I get in CMUCL and none at all that you get in SBCL with
the FIXNUMP change? I wish I could try it myself, but my ISP
doesn't provide it, only CMUCL, and I don't have enough private
disk space to install SBCL myself.

> By the way, the "do 10,000 times" loop ran twice as slow for me if
> all explicit type information was removed, i.e. with
> ... ... (the t faclo)) ... ...
>  (yes, it is easy to do C-M-k before the second argument of THE, then
>  C-M-u, C-y and another C-M-k, but typing t C-M-k before the first
>  argument of THE is even easier, and leaves more of the bridge
>  unburnt behind).

Yeah, I know that feeling. If only we had a utility to compare two
versions of a block of text and auto-generate a differences report
using California Propositions markup or something like it
(strike-out for deleted text, BOLD for inserted text, plain for
everything that didn't change), with some easy way to pick and
choose between old and new at each point. I've been thinking of
implmenting a Web/CGI service that compared two documents and
generated Claifornia Propositions markup in HTML.
I already have the comparator itself, which given two texts:
   Hi, this isa a test of hsome text that needs corrextion
   Hi, this is a test of some text that needs correction.
will produce a nested list like this (done manually here):
   ((:STET "Hi, this is") (:DELETE "a") (:STET " a test of ")
    (:DELETE "h") (:STET " text that needs corre") (:REPLACE "x" "c")
    (:STET "tion") (:INSERT "."))
I use that utility in several applications.
I just haven't yet gotten around to writing a CGI demo of it.
From: Raymond Toy (RT/EUS)
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <sxdabg9u1qe.fsf@rtp.ericsson.se>
>>>>> "Robert" == Robert Maas <··················@spamgourmet.com.remove> writes:

    >> the above, even with the optimization declaration "in"
    >> (uncommented), compiles cleanly with SBCL (with another,
    >> insignificant change of (FIXNUMP foo) to (TYPEP foo 'FIXNUM)).

    Robert> If you do *not* do that FIXNUMP change, do you get some small but
    Robert> reaonable number of compiler warnings, compared to lots and lots of
    Robert> them which I get in CMUCL and none at all that you get in SBCL with
    Robert> the FIXNUMP change? I wish I could try it myself, but my ISP
    Robert> doesn't provide it, only CMUCL, and I don't have enough private
    Robert> disk space to install SBCL myself.

Recent versions of cmucl should support (unsigned-byte 32) better, in
the sense that machine instructions are more likely to be used.  But
you also need to help out the compiler by declaring variables and such
correctly.  Earlier versions would sometimes cons when not strictly
necessary.

I can't remember what version you're using.

Ray
From: Pascal J. Bourguignon
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <8763rgu9mj.fsf@hubble.informatimago.com>
··················@spamgourmet.com.remove (Robert Maas, http://tinyurl.com/uh3t) writes:
> With the OPTIMIZE declaration also, attempt to compile the function
> generates an excessive spew of error messages (see some later below)
> which overflow modem buffer so I can't even see them all. What is wrong?



(defvar *compile-spool* '())

(defun compile-file/spool (&rest args)
  (setf *compile-spool*
    (with-output-to-string (*error-output*)
      (let ((*trace-output* *error-output*)
            (*standard-output* *error-output*))
        (apply (function compile-file) args)))))

(defun more (text)
  (let ((lines (split-sequence:split-sequence #\newline text)))
    (loop
      (loop :repeat 24 :while lines :do (princ (pop lines)) (terpri))
        (unless lines (return-from more (values)))
        (loop 
           :named command
           :do (format *query-io* "N)ext page Q)uit : ")
               (case (read *query-io*)
                 ((N) (return-from command))
                 ;; Other commands left as an exercise to the reader.
                 ((Q) (return-from more (values))))))))
                    

(compile-file/spool "your-program.lisp")
(more *compile-spool*)



> Just the first four of the spew of error messages:

They're just warnings.  You could use them to improve the
optimizability of your program.

-- 
__Pascal Bourguignon__                     http://www.informatimago.com/

"Logiciels libres : nourris au code source sans farine animale."
From: Mark Wooding
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <slrng76gkk.pac.mdw@metalzone.distorted.org.uk>
Robert Maas, http://tinyurl.com/uh3t <··················@spamgourmet.com.remove> wrote:

> (Please don't complain that I'm using PROG. I like PROG!!)

Wouldn't dream of it.

> ;Given N = odd integer to factor, FACLO = smallest odd factor to consider:
> ;Test odd integers from FACLO upward to first success or sqrt(N).
> ;Return NIL if no such factor was found, else return the factor.

If this is running too slowly for you, you could consider using a
fancier factoring algorithm. ;-)  For example, Pollard's rho is very
simple; Lenstra's elliptic-curve method (ECM) isn't much more
complicated to do badly.  The quadratic sieve (QS) and its friends are
trickier.

> (defun oddn+faclo-brute-test-factors (n faclo) ;(setq n 3467527) (setq faclo 3)
>   ;(declare (optimize (speed 3) (safety 0))) ;Generates too many errors!!

They're not errors, they're `compiler notes'.  When CMUCL (which I
assume you're using) decides that it can't inline arithmetic or whatever
as much as it'd like, it issues a note.

>   (unless (fixnump n) (error "Non-FIXNUM N doesn't work here"))
>   (unless (fixnump faclo) (error "Non-FIXNUM faclo doesn't work here"))
>   (unless (oddp faclo) (error "FACLO needs to be odd"))

If you're willing to be CMUCL-specific, you can just declare argument
types, and wind down SAFETY later; CMUCL will insert type checks.  And
FIXNUMP is a CMUCL-specific function...

Alternatively, use CHECK-TYPE. ;-)

>   (prog ((nn (the (unsigned-byte 32) n))
>          (m (the (unsigned-byte 32) faclo))
>          (sqrtn (the (unsigned-byte 32) (isqrt nn)))
>          (r 0))
>     (declare (type (unsigned-byte 32) nn m sqrtn r))

The variable NN isn't in scope.  Maybe you really wanted PROG*.

Your choice of type declarations is very odd.

You check up front that your arguments are fixnums; but then you
actually depend on them being (a) nonnegative and (b) less than 2^32.
Also, SQRTN must actually be an (UNSIGNED-BYTE 16), as must M when the
loop is actually going.

In fact (with SBCL 1.0.17), I find the following sufficient to generate
plausible code (no consing, everything inline, no compiler notes):

(defun mdw-small-factor (n lo)
  (declare (type (unsigned-byte 32) n lo))
  (assert (oddp lo) (lo) "Initial factor ~A must be odd." lo)
  (do ((end (isqrt n))
       (m lo (+ m 2)))
      ((> m end) nil)
    (declare (optimize (speed 3) (safety 0)))
    (when (zerop (mod n m))
      (return m))))

To be fair, for my CMUCL's (19d-release) benefit, I had to change the
inner declaration to

    (declare (optimize (speed 3) (safety 0))
             (type (unsigned-byte 32) m)
	     (type (unsigned-byte 16) end))

> With the OPTIMIZE declaration also, attempt to compile the function
> generates an excessive spew of error messages (see some later below)
> which overflow modem buffer so I can't even see them all. What is
> wrong?

You have an ancient version of CMUCL with poorer type inference than the
one I've got here.  I don't get those compiler notes.  All I get is

;   (RETURN M)
; ==>
;   (RETURN-FROM NIL M)
; Note: Doing unsigned word to integer coercion (cost 20) from M to
; "<return value>".

This is fair, since this code only gets run once per call.

-- [mdw]
From: John Thingstad
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <op.udy0mofsut4oq5@pandora.alfanett.no>
P� Tue, 08 Jul 2008 12:33:56 +0200, skrev Mark Wooding  
<···@distorted.org.uk>:

>
> (defun mdw-small-factor (n lo)
>   (declare (type (unsigned-byte 32) n lo))
>   (assert (oddp lo) (lo) "Initial factor ~A must be odd." lo)
>   (do ((end (isqrt n))
>        (m lo (+ m 2)))
>       ((> m end) nil)
>     (declare (optimize (speed 3) (safety 0)))
>     (when (zerop (mod n m))
>       (return m))))

mod here returns the remaining ratio on the heap on (nth-value 2 ..). This  
slows everything down. use rem instead.
(rem can be inlined and doesn't cons)

--------------
John Thingstad
From: Mark Wooding
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <slrng76n89.pac.mdw@metalzone.distorted.org.uk>
John Thingstad <·······@online.no> wrote:

> mod here returns the remaining ratio on the heap on (nth-value 2 ..). This  
> slows everything down. use rem instead.
> (rem can be inlined and doesn't cons)

That's the second time you've uttered this codswallop.

  1. There's no good reason why returning multiple values has to cons.
     It doesn't in ECL, CLISP, CMUCL or SBCL.  Indeed, the latter two
     return multiple values in machine registers.

  2. There's similarly no good reason why MOD can't be inlined.  Indeed,
     in SBCL, it seems it is inlined, even in the generic case (it calls
     TRUNCATE, and does some generic arithmetic).  With more aggressive
     type declarations, the compiled code is entirely inline.

  3. The compiler can see, from the call of MOD (or any similar
     function), how many values are consumed.  It can therefore arrange
     to compile a call to a version of the function which doesn't bother
     computing the unnecessary values.

  4. According to both the Hyperspec and SBCL's declarations, MOD
     returns a single value anyway.

Please stop with this misinformation.

-- [mdw]
From: John Thingstad
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <op.udy6ojbcut4oq5@pandora.alfanett.no>
P� Tue, 08 Jul 2008 14:26:49 +0200, skrev Mark Wooding  
<···@distorted.org.uk>:

> John Thingstad <·······@online.no> wrote:
>
>> mod here returns the remaining ratio on the heap on (nth-value 2 ..).  
>> This
>> slows everything down. use rem instead.
>> (rem can be inlined and doesn't cons)
>
> That's the second time you've uttered this codswallop.
>
>   1. There's no good reason why returning multiple values has to cons.
>      It doesn't in ECL, CLISP, CMUCL or SBCL.  Indeed, the latter two
>      return multiple values in machine registers.
>
>   2. There's similarly no good reason why MOD can't be inlined.  Indeed,
>      in SBCL, it seems it is inlined, even in the generic case (it calls
>      TRUNCATE, and does some generic arithmetic).  With more aggressive
>      type declarations, the compiled code is entirely inline.
>
>   3. The compiler can see, from the call of MOD (or any similar
>      function), how many values are consumed.  It can therefore arrange
>      to compile a call to a version of the function which doesn't bother
>      computing the unnecessary values.
>
>   4. According to both the Hyperspec and SBCL's declarations, MOD
>      returns a single value anyway.
>
> Please stop with this misinformation.
>
> -- [mdw]

Well it has a dramatic effect in LispWorks (one idiv instruction), so  
wouldn't call it misinformation.
But by Norvig's 'law', be spesific.
Don't use modulus if you want to truncate.
Factors are positive integers. Thus why bother about special cases for  
negative numbers?

--------------
John Thingstad
From: Pascal J. Bourguignon
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <7c4p70e3vc.fsf@pbourguignon.anevia.com>
"John Thingstad" <·······@online.no> writes:

> P� Tue, 08 Jul 2008 14:26:49 +0200, skrev Mark Wooding
> <···@distorted.org.uk>:
>
>> John Thingstad <·······@online.no> wrote:
>>
>>> mod here returns the remaining ratio on the heap on (nth-value 2
>>> ..).  This
>>> slows everything down. use rem instead.
>>> (rem can be inlined and doesn't cons)
>>
>> That's the second time you've uttered this codswallop.
>>
>>   1. There's no good reason why returning multiple values has to cons.
>>      It doesn't in ECL, CLISP, CMUCL or SBCL.  Indeed, the latter two
>>      return multiple values in machine registers.
>>
>>   2. There's similarly no good reason why MOD can't be inlined.  Indeed,
>>      in SBCL, it seems it is inlined, even in the generic case (it calls
>>      TRUNCATE, and does some generic arithmetic).  With more aggressive
>>      type declarations, the compiled code is entirely inline.
>>
>>   3. The compiler can see, from the call of MOD (or any similar
>>      function), how many values are consumed.  It can therefore arrange
>>      to compile a call to a version of the function which doesn't bother
>>      computing the unnecessary values.
>>
>>   4. According to both the Hyperspec and SBCL's declarations, MOD
>>      returns a single value anyway.
>>
>> Please stop with this misinformation.
>>
>> -- [mdw]
>
> Well it has a dramatic effect in LispWorks (one idiv instruction), so
> wouldn't call it misinformation.
> But by Norvig's 'law', be spesific.
> Don't use modulus if you want to truncate.
> Factors are positive integers. Thus why bother about special cases for
> negative numbers?

What is the special case? MOD or REM?

-- 
__Pascal Bourguignon__
From: John Thingstad
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <op.udzfv2o7ut4oq5@pandora.alfanett.no>
P� Tue, 08 Jul 2008 18:29:27 +0200, skrev Pascal J. Bourguignon  
<···@informatimago.com>:

>
> What is the special case? MOD or REM?
>

 From the processor point of view mod requires more instructions.
Ie. avoid the need for extra checks is you don't need them.
Or if you like hope the compiler will optimize them away. (not generally  
possible)

--------------
John Thingstad
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <rem-2008jul22-005@yahoo.com>
> From: "John Thingstad" <·······@online.no>
> > (defun mdw-small-factor (n lo)
> >   (declare (type (unsigned-byte 32) n lo))
> >   (assert (oddp lo) (lo) "Initial factor ~A must be odd." lo)
> >   (do ((end (isqrt n))
> >        (m lo (+ m 2)))
> >       ((> m end) nil)
> >     (declare (optimize (speed 3) (safety 0)))
> >     (when (zerop (mod n m))
> >       (return m))))
> mod here returns the remaining ratio on the heap on (nth-value 2 ..).
> This slows everything down. use rem instead.
> (rem can be inlined and doesn't cons)

Somebody else replied and said your info is wrong.
All I care about is speed (and reduced CONSing).
So I'll judge your idea on the basis of the timing test:

(defun mdw-small-factor (n lo)
  (declare (type (unsigned-byte 32) n lo))
  (assert (oddp lo) (lo) "Initial factor ~A must be odd." lo)
  (do ((end (isqrt n))
       (m lo (+ m 2)))
      ((> m end) nil)
    (declare (optimize (speed 3) (safety 0))
             (type (unsigned-byte 32) m)
             (type (unsigned-byte 16) end))
    (when (zerop (rem n m))
      (return m))))
(compile 'mdw-small-factor)
;No warnings

(integer-length (setq pq (* 65521 65537))) ;32
(setq *gc-verbose* nil)
(mdw-small-factor pq 3) ;65521
(time (dotimes (ix 10000) (mdw-small-factor pq 3)))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
Evaluation took:
  16.49 seconds of real time
  14.852923 seconds of user run time
  0.125615 seconds of system run time
  0 page faults and
  800672 bytes consed.
(slightly faster than before your change, but probably random
 timing variation from run to run, will try a couple more times:)

(time (dotimes (ix 10000) (mdw-small-factor pq 3)))
Compiling LAMBDA NIL:
Compiling Top-Level Form:
Evaluation took:
  19.75 seconds of real time
  14.90023 seconds of user run time
  0.359447 seconds of system run time
  [Run times include 0.04 seconds GC run time]
  0 page faults and
  790024 bytes consed.
(well that was actually slower than the code before your change, no
 need to do another test, this already proves it's random
 variation, not significant)
From: ···············@space.at
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <u7ibw1swi.fsf@space.at>
>>>>> "Mark" == Mark Wooding <···@distorted.org.uk> writes:

    Mark> If this is running too slowly for you, you could consider
    Mark> using a fancier factoring algorithm. ;-) For example,
    Mark> Pollard's rho is very simple; Lenstra's elliptic-curve
    Mark> method (ECM) isn't much more complicated to do badly.  The
    Mark> quadratic sieve (QS) and its friends are trickier.

See http://www.cliki.net/ulimyhmpqs for an implementation of the
(Hypercube Multiple Polynomial) Quadratic Sieve.

Robert, have you tried to convince the administrator of the system you
are using to ungrade to a newer version of CMUCL?

                                hope this helps,
                                    Roland
From: Mark Wooding
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <slrng76nc0.pac.mdw@metalzone.distorted.org.uk>
···············@space.at <···············@space.at> wrote:

> See http://www.cliki.net/ulimyhmpqs for an implementation of the
> (Hypercube Multiple Polynomial) Quadratic Sieve.

Oooh, shiny!

-- [mdw]
From: Dan Weinreb
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <58cda1b1-aeb3-4d90-9e5d-9e11957892b8@p25g2000hsf.googlegroups.com>
On Jul 8, 8:07 am, ···············@space.at wrote:
> >>>>> "Mark" == Mark Wooding <····@distorted.org.uk> writes:
>
>     Mark> If this is running too slowly for you, you could consider
>     Mark> using a fancier factoring algorithm. ;-) For example,
>     Mark> Pollard's rho is very simple; Lenstra's elliptic-curve
>     Mark> method (ECM) isn't much more complicated to do badly.  The
>     Mark> quadratic sieve (QS) and its friends are trickier.
>
> Seehttp://www.cliki.net/ulimyhmpqsfor an implementation of the
> (Hypercube Multiple Polynomial) Quadratic Sieve.
>
> Robert, have you tried to convince the administrator of the system you
> are using to ungrade to a newer version of CMUCL?
>
>                                 hope this helps,
>                                     Roland

You might also give SBCL (Steel Bank Common Lisp) a try.  It is based
on the same compiler as CMUCL, but there have been improvements.  SBCL
has a very good code generator.  It should be easy to try and see what
happens.
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <rem-2008jul23-001@yahoo.com>
> From: ···············@space.at
> Robert, have you tried to convince the administrator of the
> system you are using to ungrade to a newer version of CMUCL?

No. I don't want to rock the boat. Also I don't know how to
convince a mosquito to suck blood. s/mosquito/horse/suck blood/drink water/
(Sometimes I get tired of the usual cliche.)
I have more important convince-person needs in my life,
must not squander my energy on less-valuable tasks.
For example, how to convince the ACLU to to even return my phone
calls much less take my case where I was denied my right to due process?
Or convince Kaiser-Permanente not to discriminate against lynx users.
Or convince some employer to hire me for the first time since 1992.
Or convince *anybody* in the local area to look at my recent work
so that I will have a professional reference when I apply for a job,
and also to beta-test my software to detect problems I don't see myself.
Or convince *anybody* to spend significant time getting acquainted
with me so that I will have a personal reference when I apply for a job.
From: Don Geddis
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <87vdyw7in5.fsf@geddis.org>
··················@spamgourmet.com.remove (Robert Maas, http://tinyurl.com/uh3t) wrote on Wed, 23 Jul 2008:
>> From: ···············@space.at
>> Robert, have you tried to convince the administrator of the
>> system you are using to ungrade to a newer version of CMUCL?
>
> No. [...]
> I have more important convince-person needs in my life,
> must not squander my energy on less-valuable tasks.
> For example, how to convince the ACLU to to even return my phone
> calls much less take my case where I was denied my right to due process?
> Or convince Kaiser-Permanente not to discriminate against lynx users.
> Or convince some employer to hire me for the first time since 1992.
> Or convince *anybody* in the local area to look at my recent work
> so that I will have a professional reference when I apply for a job,
> and also to beta-test my software to detect problems I don't see myself.
> Or convince *anybody* to spend significant time getting acquainted
> with me so that I will have a personal reference when I apply for a job.

Sounds like you have a lot of unmet goals, and many of these do indeed
sound more important than getting your sys admin to upgrade CMUCL for your
shell account.

But surely there's an efficiency metric for your use of time, not just the
value of the end goal?  Your other needs seem more intractable for you at
this time.

Whereas, the benefit to you of a newer CMUCL, while minor, would certainly
be positive.  And the effort may be minimal.  Have you tried even raising
the issue?  It's possible that it would take your sys admin a mere 5 minutes
to install the latest version, instead of the very old version that you have.

If not, no big loss.  But I don't see the harm, for you, in sending a single
email request.

It seems like a productive (= high efficiency) use of your time.

        -- Don
_______________________________________________________________________________
Don Geddis                  http://don.geddis.org/               ···@geddis.org
Never trust anybody who says "trust me."  Except just this once, of course.
	-- John Varley, "Steel Beach"
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <rem-2008jul22-004@yahoo.com>
> > ;Given N = odd integer to factor, FACLO = smallest odd factor to consider:
> > ;Test odd integers from FACLO upward to first success or sqrt(N).
> > ;Return NIL if no such factor was found, else return the factor.
> From: Mark Wooding <····@distorted.org.uk>
> If this is running too slowly for you, you could consider using a
> fancier factoring algorithm.

No, the whole point of this is to have something quick-and-dirty to
factor small integers. Beyond that point, I have no need to factor
anything. As it turns out, the algorithm I posted, with the type
declarations, but without the optimization settings, is good enough
for most of my immediate needs. I was just hoping I could push it a
bit further without much trouble, but that seems not possible in
CMUCL, so I'll leave it as-is.

> For example, Pollard's rho is very simple; Lenstra's
> elliptic-curve method (ECM) isn't much more complicated to do
> badly.  The quadratic sieve (QS) and its friends are trickier.

A couple years ago I actually did program, for a different purpose,
something like that, the algorithm where I generate lots of
differences between squares and the number to factor that have
mostly small factors, and use something like matrix diagonalization
to combine shared factors into an explicit square term, until at
some point I get an exact square on the right side whereupon I have
a nontrivial difference of squares equal to the number to factor.

But the whole point of the current exercise is that I directly
construct a product of lots of integers small enough to be quickly
factorizable without requiring any elaborate algorithm. I was
hoping that "small enough" meant up to 32 bits, as it should be if
the CPU supports 32-bit unsigned arithmetic and Lisp allows access
to that arithmetic without CONSing. But that seems to be easier
said than done with CMUCL.

There was another thread (benchmarking CPU speed) where I was able
to get 32-bit non-CONSing arithmetic by putting all my small
integers into an array declared of element type (unsigned-byte 32).
Maybe I need to apply that trick here. For this application I
wouldn't even need to make the array global. I could allocate a new
array each time I enter the find-the-first-nontrivial-factor loop.
That's not zero memory allocation per call, but at least it's fixed
very-small amount of memory allocation per call, which should be
good enough for my purposes. Maybe I should try that when I have
time/energy to get back to this experiment.

> >   ;(declare (optimize (speed 3) (safety 0))) ;Generates too many errors!!

> They're not errors, they're `compiler notes'.  When CMUCL (which I
> assume you're using) decides that it can't inline arithmetic or whatever
> as much as it'd like, it issues a note.

Yeah, but as somebody else pointed out, when my whole goal here is
to get a tight loop as fast as possible, doing machine-32-bit
arithmetic without any CONSing, any failure to inline the code,
whereby values need to be boxed and unboxed repeatedly as they pass
to externaly-called functions, is in fact an error in my attempt.

> If you're willing to be CMUCL-specific,

No, actually I'm not. I want the code to run correctly everywhere,
but to be efficient almost everywhere, where "almost" must include
CMUCL. In fact it would be nice if it ran in XLISP and/or PowerLisp
on my Macintosh.

> Alternatively, use CHECK-TYPE. ;-)

Yeah, somebody else suggested that too, and I remember that it
didn't do what I wanted when I first tried it year ago, but I can't
remember the details of why I decided not to use it. I might reconsider.

> >   (prog ((nn (the (unsigned-byte 32) n))
> >          (m (the (unsigned-byte 32) faclo))
> >          (sqrtn (the (unsigned-byte 32) (isqrt nn)))
> >          (r 0))
> >     (declare (type (unsigned-byte 32) nn m sqrtn r))
> The variable NN isn't in scope.  Maybe you really wanted PROG*.

Yeah, I discovered that myself when a later test run (of the fullly
defined function) was picking up a value from an earlier
line-by-line testing.

> Your choice of type declarations is very odd.
> You check up front that your arguments are fixnums; but then you
> actually depend on them being (a) nonnegative and (b) less than 2^32.
> Also, SQRTN must actually be an (UNSIGNED-BYTE 16), as must M when the
> loop is actually going.

Ah, I confess that >99% of the time I don't need ultra-speed in my
code so it's extremely rare that I even bother with declaractions,
so I don't have the background of practical experience to avoid
making beginner's mistakes that are blatantly obvious to you. Also
I don't have a "feel" for what works with declarations, so I have
to try a lot of different ideas as what might work, and that
results in massive modifications to try one thing then try another
thing, which when done manually tends to yield inconsistencies
where I've made a change one place but haven't made a corresponding
change elsewhere (and as I said I don't have the experience to
immediately notice my mistake). So despite my 15+ years Lisp
programming experience (1.6, MacLisp, SL, PSL, MACL, CMUCL etc.),
CMUCL is the only CL where I've needed to do type declarations at
all, and I've only rarely done it even there, so I'm a newbie
(worse than a novice) at CL declarations.

> In fact (with SBCL 1.0.17), I find the following sufficient to generate
> plausible code (no consing, everything inline, no compiler notes):
> (defun mdw-small-factor (n lo)
>   (declare (type (unsigned-byte 32) n lo))
>   (assert (oddp lo) (lo) "Initial factor ~A must be odd." lo)
>   (do ((end (isqrt n))
>        (m lo (+ m 2)))
>       ((> m end) nil)
>     (declare (optimize (speed 3) (safety 0)))
>     (when (zerop (mod n m))
>       (return m))))

Let me try that in CMUCL:
(compile 'mdw-small-factor)
In: LAMBDA (N LO)
  (> M END)
Note: Unable to optimize due to type uncertainty:
      The first argument is a REAL, not a FLOAT.
Note: Unable to optimize due to type uncertainty:
      The first argument is a REAL, not a INTEGER.
..
Compilation unit finished.
  18 notes
(At least it didn't spew enough to fill modem buffers and lose output!
 So already that's *some* improvement from what I had tried.)

> To be fair, for my CMUCL's (19d-release) benefit, I had to change the
> inner declaration to
>     (declare (optimize (speed 3) (safety 0))
>              (type (unsigned-byte 32) m)
>              (type (unsigned-byte 16) end))

OK, let me make that change and try again:

(defun mdw-small-factor (n lo)
  (declare (type (unsigned-byte 32) n lo))
  (assert (oddp lo) (lo) "Initial factor ~A must be odd." lo)
  (do ((end (isqrt n))
       (m lo (+ m 2)))
      ((> m end) nil)
    (declare (optimize (speed 3) (safety 0))
             (type (unsigned-byte 32) m)
             (type (unsigned-byte 16) end))
    (when (zerop (mod n m))
      (return m))))
(compile 'mdw-small-factor)
Hey, no warnings at all!!

(integer-length (setq pq (* 65521 65537))) ;32
(setq *gc-verbose* nil)
(mdw-small-factor pq 3) ;65521
(time (dotimes (ix 10000) (mdw-small-factor pq 3)))
Compiling LAMBDA NIL:
Compiling Top-Level Form:

Evaluation took:
  16.93 seconds of real time
  14.889368 seconds of user run time
  0.026083 seconds of system run time
  0 page faults and
  802312 bytes consed.
(1.7 milliseconds per call, compared to 2-4 milliseconds per call
 the way I was doing it when I first posted.)

So why is it doing all that CONSing? It needed to CONS
the boxed value PQ just once before I even started the
loop, not each time through the loop. IX is a FIXNUM
which doesn't need any boxing nor otherwise CONSing,
and unboxing the value of PQ when it gets inside the
function being called shouldn't need to do a CONS, and
the return value is a FIXNUM so no CONSing should be
needed each of one thousand times that value is
returned (then discarded).

> You have an ancient version of CMUCL with poorer type inference than the
> one I've got here.  I don't get those compiler notes.  All I get is
> ;   (RETURN M)
> ; ==>
> ;   (RETURN-FROM NIL M)
> ; Note: Doing unsigned word to integer coercion (cost 20) from M to
> ; "<return value>".
> This is fair, since this code only gets run once per call.

Yes, that's fair, the target I had been aiming for.
This version of CMUCL is the best I have available on shell account.
On my Macintosh the situation is even worse, XLisp 2.1g and
PowerLisp 2.01, each of which freezes my entire system requiring
cold-restart several times a day if I'm not careful, and XLisp
doesn't even have BIGNUMs.
From: Geoffrey Summerhayes
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <54252633-a5cc-4a5f-92ee-0992b38d3714@s50g2000hsb.googlegroups.com>
On Jul 7, 6:30 pm, ··················@spamgourmet.com.remove (Robert
Maas, http://tinyurl.com/uh3t) wrote:
> (Please don't complain that I'm using PROG. I like PROG!!)
>
> ;Given N = odd integer to factor, FACLO = smallest odd factor to consider:
> ;Test odd integers from FACLO upward to first success or sqrt(N).
> ;Return NIL if no such factor was found, else return the factor.
> (defun oddn+faclo-brute-test-factors (n faclo) ;(setq n 3467527) (setq faclo 3)
>   ;(declare (optimize (speed 3) (safety 0))) ;Generates too many errors!!
>   (unless (fixnump n) (error "Non-FIXNUM N doesn't work here"))
>   (unless (fixnump faclo) (error "Non-FIXNUM faclo doesn't work here"))
>   (unless (oddp faclo) (error "FACLO needs to be odd"))
>   (prog ((nn (the (unsigned-byte 32) n))
>          (m (the (unsigned-byte 32) faclo))
>          (sqrtn (the (unsigned-byte 32) (isqrt nn)))
>          (r 0))

I wish to register a complaint!
Should be PROG* (Look for NN)

>     (declare (type (unsigned-byte 32) nn m sqrtn r))
>   lp(setq r (mod nn m))
>     (when (zerop r) (return m))
>     (incf m 2)
>     (when (<= m sqrtn) (go lp))
>     (return nil)
>     ))
>

*snip*

> With the OPTIMIZE declaration also, attempt to compile the function
> generates an excessive spew of error messages (see some later below)
> which overflow modem buffer so I can't even see them all. What is wrong?
>
> Just the first four of the spew of error messages:

*snip*

Um, NONE of of the messages you listed are errors, just compiler
optimization notes. Would that be CMUCL?

----
Geoff
From: Barry Margolin
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <barmar-ACEB17.09061009072008@newsgroups.comcast.net>
In article 
<····································@s50g2000hsb.googlegroups.com>,
 Geoffrey Summerhayes <·······@gmail.com> wrote:

> Um, NONE of of the messages you listed are errors, just compiler
> optimization notes. Would that be CMUCL?

But since his intent is to optimize the operations they're complaining 
about, they're effectively errors in this context.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <rem-2008jul23-003@yahoo.com>
> > Um, NONE of of the messages you listed are errors, just compiler
> > optimization notes. Would that be CMUCL?
> From: Barry Margolin <······@alum.mit.edu>
> But since his intent is to optimize the operations they're complaining
> about, they're effectively errors in this context.

Correct. (I looked briefly at and downloaded a whole batch of
responses, including yours, before starting to reply to the first,
so I made reference to your astute observation in some of my
earlier responses.)
From: Robert Maas, http://tinyurl.com/uh3t
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <rem-2008jul23-002@yahoo.com>
> From: Geoffrey Summerhayes <·······@gmail.com>
> I wish to register a complaint!
> Should be PROG* (Look for NN)

Yeah, you're the third person I've seen notice the same mistake,
although I didn't see any of your articles until after I had
already found the mistake myself (when a run of the whole function
was picking up a value left over from line-by-line debugging).

> Um, NONE of of the messages you listed are errors, just compiler
> optimization notes.

Yeah, but somebody else nicely explained, since my purpose is to
highly optimize the code as 32-bit integers, getting warnings that
floating-point can't be optimized tells me something *terrible* is
wrong.

> Would that be CMUCL?

Yes:
CMU Common Lisp 18b, running on shell.rawbw.com
Send questions and bug reports to your local CMU CL maintainer, or to cmucl-help
@cons.org. and ·········@cons.org. respectively.
Loaded subsystems:
    Python 1.0, target Intel x86
    CLOS based on PCL version:  September 16 92 PCL (f)
From: Barry Margolin
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <barmar-C15798.04592123072008@newsgroups.comcast.net>
In article <·················@yahoo.com>,
 ··················@spamgourmet.com.remove (Robert Maas, 
 http://tinyurl.com/uh3t) wrote:

> > From: Geoffrey Summerhayes <·······@gmail.com>
> > I wish to register a complaint!
> > Should be PROG* (Look for NN)
> 
> Yeah, you're the third person I've seen notice the same mistake,
> although I didn't see any of your articles until after I had
> already found the mistake myself (when a run of the whole function
> was picking up a value left over from line-by-line debugging).
> 
> > Um, NONE of of the messages you listed are errors, just compiler
> > optimization notes.
> 
> Yeah, but somebody else nicely explained, since my purpose is to
> highly optimize the code as 32-bit integers, getting warnings that
> floating-point can't be optimized tells me something *terrible* is
> wrong.

My guess is that it can only optimize FLOAT and FIXNUM.  When the 
arguments are not one of these types, it complains that it's not FLOAT.

Perhaps the warning messages have gotten better in newer versions of 
CMUCL.  Since you're using an old version, you probably can't get the 
optimization you want.

-- 
Barry Margolin, ······@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***
From: Raymond Toy (RT/EUS)
Subject: Re: Trying to optimize tight loop, need help/advice
Date: 
Message-ID: <sxdk5fcslsw.fsf@rtp.ericsson.se>
>>>>> "Robert" == Robert Maas <··················@spamgourmet.com.remove> writes:

    >> From: Geoffrey Summerhayes <·······@gmail.com>
    >> I wish to register a complaint!
    >> Should be PROG* (Look for NN)

    Robert> Yeah, you're the third person I've seen notice the same mistake,
    Robert> although I didn't see any of your articles until after I had
    Robert> already found the mistake myself (when a run of the whole function
    Robert> was picking up a value left over from line-by-line debugging).

    >> Um, NONE of of the messages you listed are errors, just compiler
    >> optimization notes.

    Robert> Yeah, but somebody else nicely explained, since my purpose is to
    Robert> highly optimize the code as 32-bit integers, getting warnings that
    Robert> floating-point can't be optimized tells me something *terrible* is
    Robert> wrong.

The warnings messages look like they're incomplete.  Normally, when
the compiler prints things like that, you get a longer series of
messages about foo not being of some type, eventually getting to a
type that the compiler could have really optimized.  But you'd have to
know about those types to understand.

FWIW, I tried your code with CMUCL 2008-07.  The only warning is that
the function has to box up the return value.  A peek at the
disassembly shows no consing except for the return value.

    Robert> Yes:
    Robert> CMU Common Lisp 18b, running on shell.rawbw.com

I think there have been a large number of improvements since 18b.  But
they might not be relevant to your code.  Hard to tell.

Ray