I've heard of a method for fast dynamic type checking that uses the FPU.
This approach seems pretty novel and interesting to me. Can anyone
give a link or reference to provide more information? Thanks.
-Peter
That wouldn't be a great idea. The FPU isn't a general-purpose unit.
The FPU generally has higher latencies than the integer unit (on the
P4, it's 1/2 cycle for simple integer instructions vs. more than 1
cycle for simple FPU instructions). It also tends to have a longer
pipeline, which further increases latency. Most CPUs even have a higher
cache latency for floating-point loads than for integer loads. On the P4, an
integer load from the L1 cache takes 2 clock cycles, while a
floating-point load takes 6! All of this is because the FPU is
designed for streaming code. It performs best when you're performing
simple operations on a large amount of data, with few branches. In
contrast, type checking or generic dispatch is classic integer code.
It's a bunch of simple integer operations that access data spread
out all over memory, with a large percentage of branches. The
integer units of current CPUs already perform poorly at this task*,
and the FPU is even worse at it.
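To make "classic integer code" concrete, here is a minimal C sketch of the kind of tag check and single dispatch I mean. The Object layout, the tag names, and leaf_count are all made up for illustration; they aren't taken from any particular runtime, and a real implementation may encode tags quite differently.

```c
/* Hypothetical type tags, for illustration only. */
typedef enum { TAG_INT, TAG_STRING, TAG_PAIR } Tag;

typedef struct Object Object;
struct Object {
    Tag tag;                          /* type tag kept in the object header */
    union {
        long as_int;
        const char *as_string;
        struct { const Object *first, *second; } as_pair;
    } u;
};

/* Single dispatch on the tag: one integer load from wherever the object
 * happens to live in memory, then a data-dependent branch. */
static long leaf_count(const Object *obj)
{
    switch (obj->tag) {
    case TAG_INT:
    case TAG_STRING:
        return 1;                     /* leaves count as one */
    case TAG_PAIR:
        /* Chase two more pointers, each with its own tag load and branch. */
        return leaf_count(obj->u.as_pair.first)
             + leaf_count(obj->u.as_pair.second);
    }
    return 0;                         /* unknown tag */
}
```

Calling leaf_count on a pair whose two fields are an integer and a string should return 2. Note that there is no floating-point work anywhere in it: just loads, compares, and branches.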
Besides, you've probably already got enough parallelism. Most current
CPUs have three or four integer units**. Most code has trouble keeping even
three units busy. It is highly likely that any typechecking code will find
a free unit, especially since typechecking depends only on simple
integer operations, and can likely be executed speculatively.
*: By my benchmarks, a simple mono-dispatch using a switch statement
takes over two dozen clock cycles on a P4.
**: The P4 only has two IUs, but they're double-clocked, so they work
as fast as 4.