[ODE] Some SSE in Quick step

Joakim Eriksson jme at snowcode.com
Tue Jun 1 10:21:27 MST 2004


> Do you have this test code still around?  One thing that's hard for me to
> figure out here is why your number for m4.Identity() is the same speed as
> your unoptimized code, while for m3.Identity() it's 61% of your unoptimized
> code.  Which version of D3D is this BTW?

Well, the reason is simple: D3D doesn't have 3x3 matrices, only 4x4
ones. Our own math library does have 3x3 matrices, because a lot of the
math in our physics and collision code only needs 3x3. Also, the
Identity function in D3D isn't SSE optimized; it's standard C code.
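
To make the "standard C code" versus SSE distinction concrete, here is a rough sketch of what the two flavours of a 4x4 Identity can look like. This is not D3D's code and not our library's code; Mat4, IdentityScalar, IdentitySSE and CheckIdentityVariantsAgree are made-up names, and a row-major float[16] layout is assumed. If the compiler doesn't report SSE support, the SSE variant simply falls back to the scalar one.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical 4x4 matrix, row-major; an assumption made for this sketch.
struct Mat4 { float m[16]; };

// Plain scalar version: 16 separate float stores.
void IdentityScalar(Mat4& out) {
    for (int i = 0; i < 16; ++i)
        out.m[i] = (i % 5 == 0) ? 1.0f : 0.0f;  // diagonal sits at 0, 5, 10, 15
}

// SSE version: one 128-bit store per row, four stores in total.
void IdentitySSE(Mat4& out) {
#if SKETCH_HAS_SSE
    _mm_storeu_ps(out.m + 0,  _mm_setr_ps(1.0f, 0.0f, 0.0f, 0.0f));
    _mm_storeu_ps(out.m + 4,  _mm_setr_ps(0.0f, 1.0f, 0.0f, 0.0f));
    _mm_storeu_ps(out.m + 8,  _mm_setr_ps(0.0f, 0.0f, 1.0f, 0.0f));
    _mm_storeu_ps(out.m + 12, _mm_setr_ps(0.0f, 0.0f, 0.0f, 1.0f));
#else
    IdentityScalar(out);  // no SSE available: fall back to the scalar path
#endif
}

// Sanity check: both variants must write the same 16 floats.
void CheckIdentityVariantsAgree() {
    Mat4 a, b;
    IdentityScalar(a);
    IdentitySSE(b);
    for (int i = 0; i < 16; ++i)
        assert(a.m[i] == b.m[i]);
}
```

A 3x3 matrix stored as nine packed floats doesn't split into 4-float rows this neatly, so an SSE version of it needs padding or a different layout.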

I can't share the whole test code (company code.. yada.. yada..) but the
main loop looks like this:
	for (i=0; i<ITER; i++)
	{
		_func;
		c1 = (c1+1)&3;
		c2 = (c2+1)&3;
		c3 = (c3+1)&3;
	}
For one test case, _func will be one of:
	m41[c1].Multiply(m41[c2]);
	m42[c1].Multiply(m42[c2]);
	m43[c1] *= m43[c2];
The m41, m42, m43 and so on are different matrices of different types.
I cycle through the matrices so that we don't read from one set of
matrices and always write to a single matrix. You never do that in real
life, so this should simulate real load a bit better, and it makes it
harder for the compiler to do any obvious optimizations on the
arguments (like optimizing away the whole for loop).
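
Since I can't post the real harness, here is a self-contained sketch of the same idea for anyone who wants to try it. Everything in it is a stand-in: Mat3, MakeIdentity and TimeMultiply are hypothetical names, the starting offsets of c1 and c2 are my assumption, and the timing uses std::chrono rather than whatever the original test used.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical 3x3 matrix; a stand-in, not the class from the real test.
struct Mat3 {
    float m[3][3];

    // In-place multiply: *this = *this * rhs.
    void Multiply(const Mat3& rhs) {
        Mat3 r;
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) {
                r.m[i][j] = 0.0f;
                for (int k = 0; k < 3; ++k)
                    r.m[i][j] += m[i][k] * rhs.m[k][j];
            }
        *this = r;
    }
};

Mat3 MakeIdentity() {
    Mat3 r;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r.m[i][j] = (i == j) ? 1.0f : 0.0f;
    return r;
}

// Runs `iterations` multiplies, rotating source and destination through
// four matrices so no single matrix is the only write target.
// Returns the elapsed wall-clock time in milliseconds.
double TimeMultiply(Mat3 (&mats)[4], int iterations) {
    assert(iterations >= 0);
    int c1 = 0, c2 = 1;  // assumed starting offsets
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        mats[c1].Multiply(mats[c2]);
        c1 = (c1 + 1) & 3;  // wrap around 0..3
        c2 = (c2 + 1) & 3;
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Fill a Mat3 array of four with real data, call TimeMultiply(mats, ITER), and read the matrices afterwards so the compiler has to keep the work.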

The code was aimed at DX 8 and a P3. The compiler was VS.Net 2003
(or 2002) with SSE compiler optimizations active.
 
My guess is that a lot of CPU time is spent going through the jump table
in this test case, and that prevents the compiler from doing any form of
argument optimization.

Cheers
 Joakim E. - http://www.snowcode.com





