From: George Neuner
Subject: Re: Timing for various operations on a PC
Date: 
Message-ID: <h2dgm11oc8v03dpfug2n2o7jv5m3c59ir1@4ax.com>
On 1 Nov 2005 13:36:26 -0800, ········@gmail.com wrote:

>    In Peter Norvig's essay, "Teach Yourself Programming in Ten Years",
>he says it is a good idea for anybody interested in CS to find out
>certain details like -- " Know how long it takes your computer to
>execute an instruction, fetch a word from memory (with and without a
>cache miss), read consecutive words from disk, and seek to a new
>location on disk." He gives a table(values from 2001) for his PC.
>    Can anyone tell me how to determine this accurately for a PC?

In modern hierarchical memory systems, access times are location and
order dependent.  Factors affecting access time include such things as
address translation, cache associativity and line size, bus contention
from DMA devices, even RAM refresh cycles.  The worst case access time
is predictable for a given hardware configuration, but the analysis is
worthless WRT any other configuration.

WRT processors: modern CPUs with their deep pipelines, multiple
function units and speculative execution make instruction timing
somewhere between very hard and impossible.  Intel, for example, no
longer even publishes instruction cycle times for IA-32/64 processors
because there are so many variables.

In some cases an instruction can be seen to effectively execute in
_zero_ cycles (as measured externally) because it's execution was
entirely subsumed by parallel execution of a longer running
instruction.  A short sequence of instructions that beats on a
particular functional unit can be slower than a much longer sequence
that spreads over multiple units.  Register pressure can cause bubbles
in the pipeline.  A cache load miss can stall a pipeline for dozens or
hundreds of cycles ... an exception that flushes the pipeline can take
thousands of cycles.


CPU cycle counters, if you have them, are the best source of very
precise timing information, but keep in mind that they are valid only
for *uninterrupted* instruction sequences.  No processor I am aware of
preserves cycle counters across context switches so any multitasking
[even interrupts] may render them useless for a particular purpose.
Also, there is no guarantee that all the preceding instructions will
have finished when the cycle counter is read - Intel processors, for
example, do _not_ wait for other instructions to finish.


Norvig's suggestion is a good one, but in general it's impossible to
achieve outside of a very carefully controlled execution environment
where the performance details for every component are either known or
can be explored.

George
--
for email reply remove "/" from address