I wrote a simple little function to simulate a web page request taking
several seconds to calculate before beginning to return a result. As
strange as it may sound, I got sidetracked because I was disappointed
in the efficiency of my time-wasting function. I actually learned a
few things along the way.
;;; Time waster
;; the declarations in this function make no noticeable difference
(defun waste-time ()
(do ((tim (+ 2 (get-universal-time)))
(i 0 (the fixnum (1+ i))))
((< tim (get-universal-time)) i)
(declare (fixnum i))))
;; the declarations in this function improved speed by about 10x
(defun count-to-n (n)
(do ((i 0 (the fixnum (1+ i))))
((<= n i) i)
(declare (fixnum i n))))
;; self explanatory
(defun time-waste-time ()
(let ((tim (time (waste-time))))
(time (count-to-n tim))))
(time-waste-time)
(let ((res nil))
(dotimes (i 10 res)
(push (time-waste-time) res)))
;; this shows that (get-universal-time) is NOT where the time is spent
(time (dotimes (i 10000000) (get-universal-time)))
;; this shows lots of work being done just by bignum assignment
(time (let ((tim))
(dotimes (i 10000000 tim)
(setf tim (get-universal-time)))))
;; this shows bignum compares being fairly fast
(time (let ((tim (+ 2 (get-universal-time))))
(dotimes (i 10000000 tim)
(< tim (get-universal-time)))))
My expectation was for larger values from waste-time. I can't tell
from the above what is making waste-time so 'slow'. I expect the do
to loop much faster than it does. waste-time also conses a lot. It
is comparable to bignum assignment rather than to the bignum compare.
If waste-time were doing useful work to return a dynamically generated
file in a single threaded HTTP server, how would it be made to work
concurrently with other requests that require the same sort of
processing? It seems to me that some sort of yield function would
have to be thrown in. This is why I operate under the belief that a
thread per request is the better model. I'm not sure how you set up a
Lisp process to run with smaller stacks for threads. On my Mac, I
have 'ulimit -s 8192' in my .profile for SBCL. Stacks may not need to
be larger than that depending on the app.
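A sketch worth trying (untested here, and assuming the implementation's internal-time values and the deadline sum stay fixnum-sized): phrase the deadline in internal time units instead of universal time, since GET-INTERNAL-REAL-TIME counts from a much more recent epoch:

```lisp
;; Same two-second busy-wait, but the deadline arithmetic can stay
;; in fixnum range on implementations where internal real time is
;; still a small number.
(defun waste-time-internal ()
  (do ((deadline (+ (* 2 internal-time-units-per-second)
                    (get-internal-real-time)))
       (i 0 (the fixnum (1+ i))))
      ((< deadline (get-internal-real-time)) i)
    (declare (fixnum i))))
```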
--
An ideal world is left as an exercise to the reader.
--- Paul Graham, On Lisp 8.1
David Steuber <·····@david-steuber.com> writes:
> ;; this shows that (get-universal-time) is NOT where the time is spent
> (time (dotimes (i 10000000) (get-universal-time)))
>
> ;; this shows lots of work being done just by bignum assignment
> (time (let ((tim))
> (dotimes (i 10000000 tim)
> (setf tim (get-universal-time)))))
>
> ;; this shows bignum compares being fairly fast
> (time (let ((tim (+ 2 (get-universal-time))))
> (dotimes (i 10000000 tim)
> (< tim (get-universal-time)))))
I wonder whether these show what you think they show.
CMUCL compiles
(defun a ()
(dotimes (i 10000000) (get-universal-time)))
into code that loops 10000000 times doing *nothing*
and doesn't call GET-UNIVERSAL-TIME at all. A function
like your second one, however, does call GET-UNIVERSAL-TIME
inside the loop. And the third one is a null loop, again,
but it calls GET-UNIVERSAL-TIME exactly once.
Exercise for the reader: explain the above. :-)
> My expectation was for larger values from waste-time. I can't tell
> from the above what is making waste-time so 'slow'. I expect the do
> to loop much faster than it does. waste-time also conses a lot. It
> is comparable to bignum assignment rather than to the bignum compare.
GET-UNIVERSAL-TIME conses.
--
Gareth McCaughan
.sig under construc
Gareth McCaughan <················@pobox.com> writes:
> David Steuber <·····@david-steuber.com> writes:
>
> > ;; this shows that (get-universal-time) is NOT where the time is spent
> > (time (dotimes (i 10000000) (get-universal-time)))
> >
> > ;; this shows lots of work being done just by bignum assignment
> > (time (let ((tim))
> > (dotimes (i 10000000 tim)
> > (setf tim (get-universal-time)))))
> >
> > ;; this shows bignum compares being fairly fast
> > (time (let ((tim (+ 2 (get-universal-time))))
> > (dotimes (i 10000000 tim)
> > (< tim (get-universal-time)))))
>
> I wonder whether these show what you think they show.
> CMUCL compiles
>
> (defun a ()
> (dotimes (i 10000000) (get-universal-time)))
>
> into code that loops 10000000 times doing *nothing*
> and doesn't call GET-UNIVERSAL-TIME at all. A function
> like your second one, however, does call GET-UNIVERSAL-TIME
> inside the loop. And the third one is a null loop, again,
> but it calls GET-UNIVERSAL-TIME exactly once.
>
> Exercise for the reader: explain the above. :-)
That's actually a tough one. I guess functional programming isn't all
it's cracked up to be. The compiler sees no return value so it
optimizes away the body of the loop. But if the loop has no body, and
the return value is nil, then why not just optimize the whole thing
away to simply return nil as an inline function?
> > My expectation was for larger values from waste-time. I can't tell
> > from the above what is making waste-time so 'slow'. I expect the do
> > to loop much faster than it does. waste-time also conses a lot. It
> > is comparable to bignum assignment rather than to the bignum compare.
>
> GET-UNIVERSAL-TIME conses.
And that's an excuse?
Well, I guess my waste-time function is a good one. Not only will it
run anywhere from zero to three seconds, it will create a lot of
garbage.
--
An ideal world is left as an exercise to the reader.
--- Paul Graham, On Lisp 8.1
From: André Thieme
Subject: making Lisp faster by giving it 40 more bits (was: Wasting Time)
Message-ID: <c98mf9$9pa$1@ulric.tng.de>
David Steuber wrote:
> I wrote a simple little function to simulate a web page request taking
> several seconds to calculate before beginning to return a result. As
> strange as it may sound, I got side tracked because I was disappointed
> in the efficiency of my time wasting function. I actually learned a
> few things along the way.
>
> ;;; Time waster
>
> ;; the declarations in this function make no noticeable difference
> (defun waste-time ()
> (do ((tim (+ 2 (get-universal-time)))
> (i 0 (the fixnum (1+ i))))
> ((< tim (get-universal-time)) i)
> (declare (fixnum i))))
>
> ;; the declarations in this function improved speed by about 10x
> (defun count-to-n (n)
> (do ((i 0 (the fixnum (1+ i))))
> ((<= n i) i)
> (declare (fixnum i n))))
>
> ;; self explanatory
> (defun time-waste-time ()
> (let ((tim (time (waste-time))))
> (time (count-to-n tim))))
>
> (time-waste-time)
>
> (let ((res nil))
> (dotimes (i 10 res)
> (push (time-waste-time) res)))
>
> ;; this shows that (get-universal-time) is NOT where the time is spent
> (time (dotimes (i 10000000) (get-universal-time)))
>
> ;; this shows lots of work being done just by bignum assignment
> (time (let ((tim))
> (dotimes (i 10000000 tim)
> (setf tim (get-universal-time)))))
>
> ;; this shows bignum compares being fairly fast
> (time (let ((tim (+ 2 (get-universal-time))))
> (dotimes (i 10000000 tim)
> (< tim (get-universal-time)))))
>
> My expectation was for larger values from waste-time. I can't tell
> from the above what is making waste-time so 'slow'. I expect the do
> to loop much faster than it does. waste-time also conses a lot. It
> is comparable to bignum assignment rather than to the bignum compare.
>
> If waste-time were doing useful work to return a dynamically generated
> file in a single threaded HTTP server, how would it be made to work
> concurrently with other requests that require the same sort of
> processing? It seems to me that some sort of yield function would
> have to be thrown in. This is why I operate under the belief that a
> thread per request is the better model. I'm not sure how you set up a
> Lisp process to run with smaller stacks for threads. On my Mac, I
> have 'ulimit -s 8192' in my .profile for SBCL. Stacks may not need to
> be larger than that depending on the app.
>
You just found out what I stumbled over some weeks ago. I probably
forgot to post about it...
please do this:
CL-USER 2 > (defun bignum-counter ()
(dotimes (i 25000000)))
BIGNUM-COUNTER
CL-USER 3 > (compile 'bignum-counter)
and then time it and then come back and compare with my results:
CL-USER 7 > (time (bignum-counter))
Timing the evaluation of (BIGNUM-COUNTER)
user time = 21.931
system time = 0.070
Elapsed time = 0:00:53
Allocation = 265783960 bytes standard / 3652 bytes conses
0 Page faults
Calls to %EVAL 33
NIL
If you try this in Java you will get an execution speed which is 50-80
times higher. This thing is done in less than one second!
I think the problem lies in the implementation of bignum.
After these disappointing results, now try this:
CL-USER 16 > (defun clever-counter ()
(dotimes (o 1000)
(dotimes (i 25000))))
CLEVER-COUNTER
CL-USER 17 > (compile 'clever-counter)
And see what happens:
CL-USER 21 > (time (clever-counter))
Timing the evaluation of (CLEVER-COUNTER)
user time = 0.180
system time = 0.010
Elapsed time = 0:00:00
Allocation = 1672 bytes standard / 3542 bytes conses
0 Page faults
Calls to %EVAL 33
NIL
We now have the expected, Lisp-worthy speed.
Tested in LW 4.3.6 on Windows.
I hope this helps a bit.
I don't know why the implementors of Lisps don't give us a "real" 64-bit
datatype, like the one in Java, or at least a 32-bit type. In real-world
apps it happens from time to time that we use numbers over 100 million.
In LW I see
CL-USER 80 > (bignump 8388608)
T
CL-USER 81 > (bignump 8388607)
NIL
8388608 is 2^23-1
In Clisp I get 2^24-1
[63]> (type-of 16777216)
BIGNUM
[64]> (type-of 16777215)
FIXNUM
The problem starts when we run over these numbers. So in LW 2^23 or
more. In Java (for example) I can go up to 2^63-1 before I need to start
with bignum calculations. Of course, then also in Java the process of
working with these numbers slows down greatly.
What are the problems for the implementors to give us some more bits for
faster calculations?
In my programs I must play the human compiler/optimizer and try to split
up numbers... if possible (which usually isn't).
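A quick way to see where each implementation draws that line is the standard INTEGER-LENGTH probe:

```lisp
;; Number of bits in this implementation's fixnums:
(integer-length most-positive-fixnum)
;; reportedly 23 on LispWorks/x86, 24 on CLISP, 29 on CMUCL/SBCL x86
```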
André
--
"André Thieme" <······························@justmail.de> wrote in message
·················@ulric.tng.de...
[...]
> please do this:
> CL-USER 2 > (defun bignum-counter ()
> (dotimes (i 25000000)))
> BIGNUM-COUNTER
>
> CL-USER 3 > (compile 'bignum-counter)
>
> and then time it and then come back and compare with my results:
>
>
>
> CL-USER 7 > (time (bignum-counter))
> Timing the evaluation of (BIGNUM-COUNTER)
>
> user time = 21.931
> system time = 0.070
> Elapsed time = 0:00:53
> Allocation = 265783960 bytes standard / 3652 bytes conses
> 0 Page faults
> Calls to %EVAL 33
> NIL
From the format of your output I gather you're using LispWorks. Here are
the results I got with LW. What hardware did you run this on? I'm using an
HP/Compaq notebook with a 2.66 GHz Pentium 4 chip.
CL-USER 4 >
(time (bignum-counter))
Timing the evaluation of (BIGNUM-COUNTER)
user time = 4.866
system time = 0.010
Elapsed time = 0:00:05
Allocation = 265785080 bytes standard / 3267 bytes conses
0 Page faults
Calls to %EVAL 33
NIL
Carl Taylor
From: André Thieme
Subject: Re: making Lisp faster by giving it 40 more bits
Message-ID: <c98u8d$euf$1@ulric.tng.de>
Carl Taylor wrote:
> "André Thieme" <······························@justmail.de> wrote in message
> ·················@ulric.tng.de...
>
> [...]
>
>
>>[...]
>
>
> From the format of your output I gather you're using LispWorks. Here are
> the results I got with LW. What hardware did you run this on? I'm using an
> HP/Compaq notebook with a Pentium 4 2.66 GHz. chip.
>
> CL-USER 4 >
>
> (time (bignum-counter))
> Timing the evaluation of (BIGNUM-COUNTER)
>
> user time = 4.866
> system time = 0.010
> Elapsed time = 0:00:05
> Allocation = 265785080 bytes standard / 3267 bytes conses
> 0 Page faults
> Calls to %EVAL 33
> NIL
>
> Carl Taylor
>
Hi Carl,
I am running LW (btw, I stated that I am using LW under Windows) on an
AMD at 1133 MHz with 512 MB RAM. I had some calculations running in the
background; that's why it took so long: 53 seconds instead of the 8 I
get now.
Anyway, my point is still true. Please let bignum-counter count to 250
million and then to 3*10^9. And try clever-counter with
(defun clever-counter-250 ()
(dotimes (o 25000)
(dotimes (i 10000))))
(defun clever-counter-heavy ()
(dotimes (o 3000)
(dotimes (i 1000000))))
On my machine clever-counter-heavy runs for 18 seconds.
How fast did your bignum-counter-3billion run?
A bignum-counter-300million took 2 min 45 sec on my machine. I wasn't
even in the mood to try the 3-billion version. Perhaps you want to
try it :)
André
--
From: André Thieme
Subject: Re: making Lisp faster by giving it 40 more bits
Message-ID: <c98mlo$9rt$1@ulric.tng.de>
André Thieme wrote:
> In LW I see
>
> CL-USER 80 > (bignump 8388608)
> T
>
> CL-USER 81 > (bignump 8388607)
> NIL
>
> 8388608 is 2^23-1
^
|
|
should be 7:
8388607 = 2^23-1
André
--
André Thieme <······························@justmail.de> writes:
> André Thieme wrote:
>
> > In LW I see
> > CL-USER 80 > (bignump 8388608)
> > T
> > CL-USER 81 > (bignump 8388607)
> > NIL
> > 8388608 is 2^23-1
> ^
> |
> |
> should be 7:
>
> 8388607 = 2^23-1
Ok, here is a transcript. I did this directly in SBCL. I'm actually
in the middle of a compile, so the last one took longer than it
should. I did the compile calls, but they are not necessary with
SBCL. SBCL is compile-only and does not have an interpreter.
$ sbcl
This is SBCL 0.8.10.55, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.
SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
; loading system definition from
; #P"/home/david/usr/lib/sbcl/systems/sb-bsd-sockets.asd" into
; #<PACKAGE "ASDF2757">
; registering #<SYSTEM SB-BSD-SOCKETS {98B8821}> as SB-BSD-SOCKETS
; registering #<SYSTEM SB-BSD-SOCKETS-TESTS {9B8D241}> as SB-BSD-SOCKETS-TESTS
; loading system definition from
; #P"/home/david/usr/lib/sbcl/systems/sb-posix.asd" into #<PACKAGE "ASDF2857">
; registering #<SYSTEM SB-POSIX {97E07A1}> as SB-POSIX
; registering #<SYSTEM SB-POSIX-TESTS {990BA79}> as SB-POSIX-TESTS
* (defun bignum-counter ()
(dotimes (i 25000000)))
BIGNUM-COUNTER
* (compile 'bignum-counter)
BIGNUM-COUNTER
NIL
NIL
* (time (bignum-counter))
Evaluation took:
1.73 seconds of real time
0.982851 seconds of user run time
0.005999 seconds of system run time
0 page faults and
0 bytes consed.
NIL
* (defun clever-counter ()
(dotimes (o 1000)
(dotimes (i 25000))))
CLEVER-COUNTER
* (compile 'clever-counter)
CLEVER-COUNTER
NIL
NIL
* (time (clever-counter))
Evaluation took:
1.117 seconds of real time
1.114831 seconds of user run time
0.0 seconds of system run time
0 page faults and
0 bytes consed.
NIL
* most-positive-fixnum
536870911
* (defun bignum-counter ()
(dotimes (i (1+ most-positive-fixnum))))
STYLE-WARNING: redefining BIGNUM-COUNTER in DEFUN
BIGNUM-COUNTER
* (compile 'bignum-counter)
BIGNUM-COUNTER
NIL
NIL
* (time (bignum-counter))
Evaluation took:
103.638 seconds of real time
52.830967 seconds of user run time
0.072989 seconds of system run time
0 page faults and
0 bytes consed.
NIL
* (quit)
SBCL's most-positive-fixnum on x86 and ppc-darwin is 2^29 - 1.
OpenMCL has a much smaller 2^24 - 1 fixnum size. I'm not sure why it
is so small.
--
An ideal world is left as an exercise to the reader.
--- Paul Graham, On Lisp 8.1
From: Alexey Dejneka
Subject: Re: making Lisp faster by giving it 40 more bits (was: Wasting Time)
Message-ID: <m37juwaqn0.fsf@comail.ru>
Hello,
André Thieme <······························@justmail.de> writes:
> I don't know why the implementors of Lisps don't give us a "real"
> 64-bit datatype, like the one in Java or at least a 32-bit type. In
> real world apps it happens from time to time to use numbers over 100
> million.
As object types are (in general) statically unknown, objects must
have a uniform representation and carry type information, which takes
some bits. E.g. under CMUCL and SBCL, FIXNUMs have 30 bits (including
sign) and 2 bits are used for the type tag. In a local scope, where the
compiler sees all usages of a number, it may use a more efficient
representation, e.g. native 32-bit machine numbers:
CL-USER> (defun test ()
(dotimes (i (1- (ash 1 32)))))
TEST
CL-USER> (disassemble 'test)
; 093C066A: MOV EAX, 0 ; no-arg-parsing entry point
; 6F: JMP L1
; 71: L0: INC EAX
; 72: L1: CMP EAX, 4294967295
; 77: JB L0
; 79: MOV EDX, 83886091
; 7E: MOV ECX, [EBP-8]
; 81: MOV EAX, [EBP-4]
; 84: ADD ECX, 2
; 87: MOV ESP, EBP
; 89: MOV EBP, EAX
; 8B: JMP ECX
; 8D: NOP
; 8E: NOP
; 8F: NOP
; 90: BREAK 10 ; error trap
; 92: BYTE #X02
; 93: BYTE #X16 ; INVALID-ARG-COUNT-ERROR
; 94: BYTE #X4D ; ECX
;
NIL
But if the value of I is passed to some separately compiled function,
a standard, possibly boxed, representation should be used.
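In such a local scope, declarations are the lever: a sketch along CMUCL/SBCL lines (safety lowered, so treat it as illustrative only) that asks the compiler to keep the counter unboxed:

```lisp
;; With full type information the compiler can keep I as a raw
;; machine integer for the whole loop instead of a tagged fixnum.
(defun count-32 (n)
  (declare (type (unsigned-byte 32) n)
           (optimize speed (safety 0)))
  (dotimes (i n)
    (declare (type (unsigned-byte 32) i))))
```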
--
Regards,
Alexey Dejneka
"Alas, the spheres of truth are less transparent than those of
illusion." -- L.E.J. Brouwer
André Thieme <······························@justmail.de> writes:
> If you try this in Java you will get an execution speed which is 50-80
> times higher. This thing is done in less than one second!
>
> I think the problem lies in the implementation of bignum.
No, the only problem (as you observe further on in your post) is that
LW has a (somewhat irritatingly) low most-positive-fixnum. In MCL 5.0,
for instance, most-positive-fixnum has 29 bits, and your bignum-counter
runs in 1.2 seconds on my rather outdated eMac (700 Mhz).
But, IMHO, LW has a lot of nice features which by far outweigh this
little drawback. I'd love to have the possibility to declare unboxed
32 bit unsigned integers, though, to me that would be more useful than
extending the fixnum range.
--
(espen)
Espen Vestre <·····@*do-not-spam-me*.vestre.net> wrote in message news:<··············@merced.netfonds.no>...
> André Thieme <······························@justmail.de> writes:
>
> > If you try this in Java you will get an execution speed which is 50-80
> > times higher. This thing is done in less than one second!
> >
> > I think the problem lies in the implementation of bignum.
>
> No, the only problem (as you observe further on in your post) is that
> LW has a (somewhat irritatingly) low most-positive-fixnum. In MCL 5.0,
> for instance, most-positive-fixnum has 29 bits, and your bignum-counter
> runs in 1.2 seconds on my rather outdated eMac (700 Mhz).
>
> But, IMHO, LW has a lot of nice features which by far outweighs this
> little drawback. I'd love to have the possibility to declare unboxed
> 32 bit unsigned integers, though, to me that would be more useful than
> extending the fixnum range.
I get the following times with LWM v.4.3.7 on an "old" G4 500MHz:
CL-USER 12 > (time (bignum-counter))
Timing the evaluation of (BIGNUM-COUNTER)
user time = 0.250
system time = 0.010
Elapsed time = 0:00:01
Allocation = 5240 bytes
0 Page faults
Calls to %EVAL 31
NIL
Is this relevant?
Cheers
·······@mac.com (Olivier Drolet) writes:
> I get the following times with LWM v.4.3.7 on an "old" G4 500MHz:
[...]
> Is this relevant?
LispWorks on non-x86 machines uses a different tagging scheme which
provides longer fixnums (30 bits, depending on how you count).
·······@mac.com (Olivier Drolet) writes:
> Is this relevant?
Yes, I should have remembered that LWM has larger most-positive-fixnums
than the x86 versions:
CL-USER 1 > (integer-length most-positive-fixnum)
29
--
(espen)
Espen Vestre <·····@*do-not-spam-me*.vestre.net> wrote in message news:<··············@merced.netfonds.no>...
> ·······@mac.com (Olivier Drolet) writes:
>
> > Is this relevant?
>
> Yes, I should have remembered that LWM has larger most-positive-fixnum's
> than the x86 versions:
>
> CL-USER 1 > (integer-length most-positive-fixnum)
> 29
Was that on x86?
I get the following on LWM:
CL-USER 14 > (integer-length most-positive-fixnum)
29
Cheers
·······@mac.com (Olivier Drolet) writes:
>> CL-USER 1 > (integer-length most-positive-fixnum)
>> 29
>
> Was that on x86?
Nope, on x86 it's only 23.
--
(espen)
From: Rahul Jain
Subject: Re: making Lisp faster by giving it 40 more bits
Message-ID: <87pt8io51u.fsf@nyct.net>
André Thieme <······························@justmail.de> writes:
> I don't know why the implementors of Lisps don't give us a "real" 64-bit
> datatype, like the one in Java or at least a 32-bit type. In real world
> apps it happens from time to time to use numbers over 100 million.
If it only happens from time to time, then why are you trying to burden
the speed of other operations just because you want to speed up this
one? Should all specialized integer arrays now have 64-bit cells instead
of 32-bit ones? Your CPU is optimized for certain operations. Let the
language use those operations when they suffice. What you're advocating
would be to make (dotimes (i 10000) (values)) two times slower.
FWIW, I think one or both of CMUCL and SBCL were adding unboxed 64-bit
types.
--
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: André Thieme
Subject: Re: making Lisp faster by giving it 40 more bits
Message-ID: <c9lcqt$19t$1@ulric.tng.de>
Rahul Jain wrote:
> André Thieme <······························@justmail.de> writes:
>
>
>>I don't know why the implementors of Lisps don't give us a "real" 64-bit
>>datatype, like the one in Java or at least a 32-bit type. In real world
>>apps it happens from time to time to use numbers over 100 million.
>
>
> If it only happens from time to time, then why are you trying to burden
> the speed of other operations just because you want to speed up this
> one? Should all specialized integer arrays now have 64-bit cells instead
> of 32-bit ones? Your CPU is optimized for certain operations. Let the
> language use those operations when they suffice. What you're advocating
> would be to make (dotimes (i 10000) (values)) two times slower.
>
> FWIW, I think one or both of CMUCL and SBCL were adding unboxed 64-bit
> types.
>
I don't understand why the work with "small" numbers like 10000 would
slow down. My message is that in today's programs it is not totally
unrealistic to have to do calculations with numbers above 10^9.
If I have such a number and want to do thousands of operations on it,
they will all be slow.
In Java these calculations won't be slow (as long as our number is
under 9000000000000000000).
Of course, somewhere we have to draw a line and say "from this point we
do the calculations inside of strings or another data structure and
accept that they are carried out slower."
Why should this line be drawn at 2^24 -1?
André
--
From: Rahul Jain
Subject: Re: making Lisp faster by giving it 40 more bits
Message-ID: <87u0xtbeu3.fsf@nyct.net>
André Thieme <······························@justmail.de> writes:
> I don't understand why the work with "small" numbers like 10000 would
> slow down.
1. It has to use two registers for all values.
2. It has to use 64 bits for all values.
3. It has to load and store the value using two separate operations.
> My message is that in today's programs it is not totally
> unrealistic to have to do calculations with numbers above 10^9.
> If I have such a number and want to do thousands of operations on it,
> they will all be slow.
> In Java these calculations won't be slow (as long as our number is
> under 9000000000000000000).
> Of course, somewhere we have to draw a line and say "from this point we
> do the calculations inside of strings or another data structure and
> accept that they are carried out slower."
> Why should this line be drawn at 2^24 -1?
Why should this line be drawn at 2^64 - 1?
The line you're drawing is one of not being ABLE to do calculations when
you have numbers larger than 2^64 - 1 unless you rewrite most of the app
to use different data structures and operators on all these numbers.
--
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist
From: André Thieme
Subject: Re: making Lisp faster by giving it 40 more bits
Message-ID: <c9m28u$g83$1@ulric.tng.de>
Rahul Jain wrote:
> André Thieme <······························@justmail.de> writes:
>
>
>>I don't understand why the work with "small" numbers like 10000 would
>>slow down.
>
>
> 1. It has to use two registers for all values.
> 2. It has to use 64 bits for all values.
> 3. It has to load and store the value using two separate operations.
How would that be on 64-bit machines?
>>Of course, somewhere we have to draw a line and say "from this point we
>>do the calculations inside of strings or another data structure and
>>accept that they are carried out slower."
>>Why should this line be drawn at 2^24 -1?
>
> Why should this line be drawn at 2^64 - 1?
As long as we stay under 32 bits we can do our operations with one
register. If we go over 32 then we can go all the way up to 64, because
we are now working with two registers anyway.
In my earlier posting I mentioned that in today's programs it is not
very unusual to work with numbers bigger than 10^9. As soon as such a
number is a bignum (our line was crossed, at least with 23-bit
fixnums), all calculations on it get slower.
> The line you're drawing is one of not being ABLE to do calculations when
> you have numbers larger than 2^64 - 1 unless you rewrite most of the app
> to use different data structures and operators on all these numbers.
Why is that?
Under LW, 2^23 - 1 is a fixnum. Now let me add 1 and I get a bignum
with which I can continue to do calculations.
Why would the program stop behaving this way if fixnums had 60 bits?
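The promotion in question is easy to check at the REPL, and the standard guarantees the same operators keep working (a hedged sketch; exact printed type names vary by implementation):

```lisp
(typep most-positive-fixnum 'fixnum)       ; => T
(typep (1+ most-positive-fixnum) 'fixnum)  ; => NIL, it is now a bignum
(* 2 (1+ most-positive-fixnum))            ; ordinary * still applies
```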
André
--
André Thieme <······························@justmail.de> writes:
> Rahul Jain wrote:
> > André Thieme <······························@justmail.de> writes:
> >
> >>I don't understand why the work with "small" numbers like 10000 would
> >>slow down.
> > 1. It has to use two registers for all values.
> > 2. It has to use 64 bits for all values.
> > 3. It has to load and store the value using two separate operations.
>
> How would that be on 64 bit machines?
It depends on the machine. 64-bit registers are nice, but you still
use 64 bits' worth of RAM. You also have to move the data from RAM to
registers and back again. On a RISC architecture with 32-bit
instructions, you will still need multiple instructions to load a 64-bit
value.
On PPC, it takes two instructions to load a 32-bit value and five to
load a 64-bit value.
--
An ideal world is left as an exercise to the reader.
--- Paul Graham, On Lisp 8.1
From: Duane Rettig
Subject: Re: making Lisp faster by giving it 40 more bits
Message-ID: <44qpsidsr.fsf@franz.com>
David Steuber <·····@david-steuber.com> writes:
> André Thieme <······························@justmail.de> writes:
>
> > Rahul Jain wrote:
> > > André Thieme <······························@justmail.de> writes:
> > >
> > >>I don't understand why the work with "small" numbers like 10000 would
> > >>slow down.
> > > 1. It has to use two registers for all values.
> > > 2. It has to use 64 bits for all values.
> > > 3. It has to load and store the value using two separate operations.
> >
> > How would that be on 64 bit machines?
>
> It depends on the machine. 64 bit registers are nice, but you still
> use 64bits worth of RAM. You also have to move the data from ram to
> registers and back again.
Huh?
This seems garbled. 64 bits is the same as 32 bits, except that
there are 64 bits of address space instead of 32. One needn't
populate a 32-bit machine with all 4 GB of RAM that can theoretically
be put onto it, nor must one populate a 64-bit machine with the many
terabytes of RAM that can theoretically be put onto it (if the
manufacturers were even to allow such a thing).
Perhaps you are talking about the width of each object doubling?
> On a RISC architecture with 32bit
> instructions, you will still need multiple instructions to load a 64
> bit value.
Yes, that's true, but the question was about 64-bit machines, for which
64-bit instructons could be used.
> On PPC, it takes two instructions to load a 32 bit value and five to
> load a 64 bit value.
It only takes one instruction to load a 32-bit value on any
architecture. It might take as many as two instructions to load
a 64-bit value from a tagged Lispval on any 64-bit architecture
which doesn't support byte-level address displacement, but only
one on machines which do support byte displacements.
As examples, note the first instruction of the MacOSX function,
the first and second instructions of a 64-bit Power4 AIX system,
and the first instruction of an AMD64 system, all of which perform
the CAR operation in the function defined below. Note also the
sizes of the fixnums, which are already larger than on the 32-bit
Lisps.
MacOSX (Power, 32-bit):
CL-USER(1): (compile
(defun foo (x)
(declare (optimize speed (safety 0) (debug 0)))
(car x)))
FOO
NIL
NIL
CL-USER(2): (foo (list most-positive-fixnum))
536870911
CL-USER(3): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: X
;; code start: #x5d4758c:
0: 80630007 lwz r3,7(r3)
4: 3a000001 [addi] lil r16,1
8: 81a1000c lwz r13,12(r1)
12: 4e800020 blr
CL-USER(4):
Power4 (64-bit):
CL-USER(2): (foo (list most-positive-fixnum))
1152921504606846975
CL-USER(3): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: X
;; code start: #x7000010014271e8:
0: 3a23ffff cal r17,-1(r3)
4: e8710000 ld r3,0(r17)
8: 3a000001 [cal] lil r16,1
12: eae10018 ld r23,24(r1)
16: 4e800020 br
CL-USER(4):
AMD-64 (64-bit):
CL-USER(2): (foo (list most-positive-fixnum))
1152921504606846975
CL-USER(3): (disassemble 'foo)
;; disassembly of #<Function FOO>
;; formals: X
;; code start: #x10012f4358:
0: 48 8b 7f ef movq rdi,[rdi-17]
4: f8 clc
5: 4c 8b 74 24 10 movq r14,[rsp+16]
10: c3 ret
11: 90 nop
CL-USER(4):
--
Duane Rettig ·····@franz.com Franz Inc. http://www.franz.com/
555 12th St., Suite 1450 http://www.555citycenter.com/
Oakland, Ca. 94607 Phone: (510) 452-2000; Fax: (510) 452-0182
Duane Rettig <·····@franz.com> wrote:
+---------------
| ... nor must one populate a 64-bit machine with the many
| terabytes of RAM that can theoretically be put onto it (if the
| manufacturers were even to allow such a thing).
+---------------
True, although it's interesting to note that at least *some* systems
can now be bought with multi-terabytes of memory:
<URL:http://www.sgi.com/servers/altix/>
...
SGI Altix 3000 servers... [can run] a single Linux OS image
with 256 Intel Itanium 2 processors and up to 4TB of memory.
[And, yes, that is a single cache-coherent address space across
all 256 CPUs...]
-Rob
-----
Rob Warnock <····@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607
From: Joe Marshall
Subject: Re: making Lisp faster by giving it 40 more bits
Message-ID: <zn7kgonw.fsf@comcast.net>
Duane Rettig <·····@franz.com> writes:
> One needn't populate a 32-bit machine with all 4 Gb of RAM that can
> theoretically be put onto it, nor must one populate a 64-bit machine
> with the many terabytes of RAM that can theoretically be put onto it (if
> the manufacturers were even to allow such a thing).
That'd be several exabytes. Assuming a price of 1 cent per MByte (you
ought to be able to get a bulk discount), that'd be 1.7 trillion dollars.
At approximately 1 nJ per nS per 8 MB, you'd be consuming 2 terawatts
of power. You might find that the heat sink is governed by zoning
regulations.
--
~jrm
From: Duane Rettig
Subject: Re: making Lisp faster by giving it 40 more bits
Date:
Message-ID: <43c5bllza.fsf@franz.com>
David Steuber <·····@david-steuber.com> writes:
> Duane Rettig <·····@franz.com> writes:
>
> > David Steuber <·····@david-steuber.com> writes:
> >
> > > It depends on the machine. 64 bit registers are nice, but you still
> > > use 64bits worth of RAM. You also have to move the data from ram to
> > > registers and back again.
> >
> > Huh?
> >
> > This seems garbled. 64-bits is the same as 32-bits, except that
> > there are 64 bits of address space instead of 32. One needn't
> > populate a 32-bit machine with all 4 Gb of RAM that can theoretically
> > be put onto it, nor must one populate a 64-bit machine with the many
> > terabytes of RAM that can theoretically be put onto it (if the
> > manufacturers were even to allow such a thing).
>
> Sorry for the confusion. That did get garbled. I'm not talking about
> having RAM to back up a 64bit address space. I'm talking about moving
> a 64 bit value out of memory (or as an immediate value) into a register
> using 32 bit wide instructions which have bits used for the opcode,
> registers, etc, leaving 16 bits for actual data.
All of the instructions are 32 bits wide, even when you go to 64-bit
addresses (except on VLIW, of course). When you go to 64-bit architectures,
it becomes more important not to do relocatables, but to do PIC instead.
It turns out to be perfect for Lisp, since Lisp code vectors
(as movable data items) must be position-independent code anyway.
> > > On a RISC architecture with 32bit
> > > instructions, you will still need multiple instructions to load a 64
> > > bit value.
> >
> > Yes, that's true, but the question was about 64-bit machines, for which
> > 64-bit instructions could be used.
>
> Double the size of programs? I guess you could do that. I thought
> that the 64 bit PPC machines were still using 32 bit instructions.
Yes, but you misunderstand me. The "64-bit instructions" are (like all
of the others) 32 bits in width, but they are _geared_ toward a 64-bit
machine. Usually it is just a handful of instructions that have
to be added to a machine's 32-bit instruction set, and some machines
just create a mode bit that turns a load instruction from loading
a 32-bit value into an instruction that loads a 64-bit value.
So for example, in PPC, the same lwz instruction exists to load a 32-bit
value from memory into a register, but there is a new ld instruction
that loads 64 bits into a register. That is one of the "64-bit"
instructions.
> I've downloaded a PDF on the PPC architecture because the Apple
> documentation for PPC asm programming sucks. I haven't had a chance
> to read it yet though.
Yeah, note that Apple doesn't do 64-bit yet. They _say_ they do,
and they use a 64-bit chip, and even use some of the 64-bit instructions
to say that they do it. But their operating system is still 32-bit,
last I saw, and it won't yet support a true 64-bit lisp. So you
won't get much of a true indication of what 64-bit operations really
mean from their documentation.
If you Google for my name in this newsgroup around 9/11 of last year
(or it may have been 9/12) you'll see my complaint on this subject.
> > > On PPC, it takes two instructions to load a 32 bit value and five to
> > > load a 64 bit value.
> >
> > It only takes one instruction to load a 32-bit value on any
> > architecture. It might take as many as two instructions to load
> > a 64-bit value from a tagged Lispval on any 64-bit architecture
> > which doesn't support byte-level address displacement, but only
> > one on machines which do support byte displacements.
>
> http://www-106.ibm.com/developerworks/linux/library/l-ppc/
>
> From the "Hello, world" example loading the address of the string:
>
> PPC32
> lis 4,msg@ha # load top 16 bits of &msg
> addi 4,4,msg@l # load bottom 16 bits
>
> PPC64
> lis 4,msg@highest # load msg bits 48-63 into r4 bits 16-31
> ori 4,4,msg@higher # load msg bits 32-47 into r4 bits 0-15
>
> rldicr 4,4,32,31 # rotate r4's low word into r4's high word
>
> # load low word into the low word of r4:
> oris 4,4,msg@h # load msg bits 16-31 into r4 bits 16-31
> ori 4,4,msg@l # load msg bits 0-15 into r4 bits 0-15
>
> "Loading pointers: On ppc32 it took two instructions to load a 32-bit
> immediate value into a register. On ppc64 it takes 5! Why?
Because you're doing relocations, and you only have 16 bits in each
instruction for immediate relocatable data. If instead you were to
simply load the (64-bit) word from memory with 64-bit instruction,
then it would take only one instruction. On AIX, which provides
true 64-bit PIC, this is the way you load a value from
the TOC (like the GOT in linux):
ld r11,T.new_other(r2)
where T.new_other is defined in the TOC elsewhere.
> We still have 32-bit fixed-length instructions, which can only load 16
> bits worth of immediate value at a time. Right there you need a
> minimum of four instructions (64 bits / 16 bits per instruction = 4
> instructions). But there are no instructions that can load directly
> into the high word of a 64-bit GPR. So we have to load up the low
> word, shift it to the high word, then load the low word again."
This is correct, but again, it is dealing with relocatable code,
and not position-independent code. You might look for some
documentation on PIC, or else look at the assembler output that
comes out of C when you give it the -kPIC option.
> This is where I got my information from. If I misunderstood
> something, I apologize.
No problem.
--
Duane Rettig ·····@franz.com Franz Inc. http://www.franz.com/
555 12th St., Suite 1450 http://www.555citycenter.com/
Oakland, Ca. 94607 Phone: (510) 452-2000; Fax: (510) 452-0182
From: Rahul Jain
Subject: Re: making Lisp faster by giving it 40 more bits
Date:
Message-ID: <87k6ym6ex0.fsf@nyct.net>
André Thieme <······························@justmail.de> writes:
> Rahul Jain wrote:
>> André Thieme <······························@justmail.de> writes:
>>
>>>I don't understand why the work with "small" numbers like 10000 would
>>>slow down.
>> 1. It has to use two registers for all values.
>> 2. It has to use 64 bits for all values.
>> 3. It has to load and store the value using two separate operations.
>
> How would that be on 64 bit machines?
It's a moot point on 64 bit machines. In that case, we talk about 128
bit numbers.
> As long as we stay under 32 bits we can do our operations with one
> register. If we go over 32 then we can go all the way up to 64 cause we
> are now working with two registers anyways.
Which is what happens with bignums. You get a two-word bignum vector in
that case.
> In my earlier posting I mentioned that in today's programs it is not very
> unusual to work with numbers bigger than 10^9. As soon as this is a bignum
> (our line was crossed (at least with 23 bit numbers)) all calculations
> on these numbers get slower.
And you're proposing making ALL operations slower just to make 64-bit
ones the only ones possible.
>> The line you're drawing is one of not being ABLE to do calculations when
>> you have numbers larger than 2^64 - 1 unless you rewrite most of the app
>> to use different data structures and operators on all these numbers.
>
> Why is that?
Because you want to be able to do C/Java-style 64-bit operations.
> When I have under LW 2^23 - 1 it is a fixnum. Now let me add 1 and I get
> a bignum with which I can continue to do calculations.
> Why would the program stop this behaviour if fixnums now have 60 bits?
In that case, using fixnums (and doing any other math) has just become
twice as slow.
--
Rahul Jain
·····@nyct.net
Professional Software Developer, Amateur Quantum Mechanicist