[ODE] Some SSE in Quick step

Joakim Eriksson jme at snowcode.com
Sun May 30 12:02:56 MST 2004


> > >  It's certainly not doing anything like:
> > >
> > > If(SSE2)
> > > {
> > > 	SSE2MatrixMultiply();
> > > }
> > > Else if(SSE)
> > > {
> > > 	SSEMatrixMultiply();
> > > }
> > > Else if...
> > >
> > > if that's what you're getting at.
> > No. Read what I've said again. "jumptable chachacha" has nothing to do
> > with the code you've posted.
> So you know that D3DX is doing this mysteriously-named "jumptable
> chachacha"?

What D3DX does is use a jump table. At program start, every entry in it
points to the same 'Setup' function, so whichever function you call first
ends up in that setup function. The setup function then fills in the real
function pointers based on which CPU you have.
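A minimal C++ sketch of that self-patching jump table, assuming a GCC/Clang x86 build for CPU detection. Mat4, MathTable, the Mul/Transpose slots and the stub functions are hypothetical names for illustration, not D3DX's actual internals, and the "SSE" routines here just reuse the scalar code so the sketch stays portable.

```cpp
#include <cassert>

struct Mat4 { float m[16]; };  // hypothetical row-major 4x4 matrix

// ---- Portable reference implementations ----
static void MulScalar(Mat4& out, const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized accumulator; lets out alias a or b
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i * 4 + j] += a.m[i * 4 + k] * b.m[k * 4 + j];
    out = r;
}
static void TransposeScalar(Mat4& out, const Mat4& a) {
    Mat4 r;  // every element is written below
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r.m[j * 4 + i] = a.m[i * 4 + j];
    out = r;
}

// ---- Placeholder "SSE" versions (hypothetical: they reuse the scalar code) ----
static void MulSSE(Mat4& out, const Mat4& a, const Mat4& b) { MulScalar(out, a, b); }
static void TransposeSSE(Mat4& out, const Mat4& a) { TransposeScalar(out, a); }

static bool CpuHasSSE() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse");
#else
    return false;  // assumption: treat other compilers/architectures as no-SSE
#endif
}

// ---- The jump table: one function pointer per optimizable routine ----
struct MathTable {
    void (*Mul)(Mat4&, const Mat4&, const Mat4&);
    void (*Transpose)(Mat4&, const Mat4&);
};

static void MulStub(Mat4& out, const Mat4& a, const Mat4& b);
static void TransposeStub(Mat4& out, const Mat4& a);

// Every slot starts out pointing at a stub that triggers the one-time setup.
static MathTable g_Math = { MulStub, TransposeStub };
static int g_SetupCount = 0;  // counts setup runs, for demonstration only

// Shared setup: detect the CPU once and overwrite every slot.
static void Setup() {
    ++g_SetupCount;
    if (CpuHasSSE()) { g_Math.Mul = MulSSE;    g_Math.Transpose = TransposeSSE; }
    else             { g_Math.Mul = MulScalar; g_Math.Transpose = TransposeScalar; }
    assert(g_Math.Mul != MulStub && g_Math.Transpose != TransposeStub);
}

// Whichever routine is called first lands here, runs Setup, then retries
// through the now-patched slot.
static void MulStub(Mat4& out, const Mat4& a, const Mat4& b) { Setup(); g_Math.Mul(out, a, b); }
static void TransposeStub(Mat4& out, const Mat4& a) { Setup(); g_Math.Transpose(out, a); }
```

After the first call, every call through `g_Math` costs one indirect jump and no `if(SSE)` test, which is the trade-off discussed below.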

It's a clever system because it's clean and you don't have to do an
'if(SSE)' test on every call. However, every call has to go through this
jump table, so only large functions are worth optimizing this way, and
even they pay a performance penalty for the indirection. This is the main
reason so few functions are actually optimized; the rest are placed in
the standard D3DX header files.

Just to show some performance figures from our engine (do keep in mind
that these are synthetic performance tests, so they might not be
completely accurate; the main loop just performs the operation and then
cycles through different matrices to get a more realistic result):
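A rough C++ sketch of that kind of loop, assuming a hypothetical row-major Mat4 and a plain scalar multiply. Unlike the figures in the table, it measures wall-clock nanoseconds with std::chrono::steady_clock rather than CPU cycles, which keeps it portable; NsPerMul is an illustrative name, not something from our engine.

```cpp
#include <chrono>
#include <cstddef>

struct Mat4 { float m[16]; };  // hypothetical row-major 4x4 matrix

static void Mul(Mat4& out, const Mat4& a, const Mat4& b) {
    Mat4 r{};  // temporary so out may alias a or b
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i * 4 + j] += a.m[i * 4 + k] * b.m[k * 4 + j];
    out = r;
}

// Times `iters` (> 0) calls of Mul, cycling through the N matrices so each
// call sees different operands. Returns average nanoseconds per call.
template <std::size_t N>
double NsPerMul(Mat4 (&mats)[N], std::size_t iters) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) {
        Mat4& dst = mats[i % N];
        const Mat4& src = mats[(i + 1) % N];
        Mul(dst, dst, src);  // an "m *= m2" style operation
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}
```

Seeding the array with identity or rotation matrices keeps repeated products from drifting to inf/NaN, which could otherwise skew the timings.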

------------------------+----------+-----------------+-----------------+
Function                | Original |     SSE(Speedup)|     D3D(Speedup)|
------------------------+----------+-----------------+-----------------+
m3.Identity()           |     15.4 |     9.2 ( 1.67) |    25.3 ( 0.61) |
m3 *= m3                |    167.5 |    70.1 ( 2.39) |   289.4 ( 0.58) |
m3 = m3 * m3            |    131.4 |    81.5 ( 1.61) |   345.6 ( 0.38) |
m4.Identity()           |     25.2 |    12.1 ( 2.07) |    25.2 ( 1.00) |
m4 *= m4                |    272.8 |   128.1 ( 2.13) |   286.2 ( 0.95) |
m4 = m4 * m4            |    249.8 |   137.9 ( 1.81) |   336.0 ( 0.74) |
Transpose()             |     77.2 |    25.6 ( 3.02) |    27.9 ( 2.77) |
------------------------+----------+-----------------+-----------------+

All times are in CPU cycles.
Original - Our unoptimized code. Inlined.
SSE      - Our optimized SSE code. Inlined.
D3D      - D3DX functions. Some are inlined, some use the jump table.
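The speedup figures appear to be the Original cycle count divided by the variant's cycle count (inferred from the numbers, not stated in the post), so values above 1 are faster than our unoptimized code and values below 1 are slower. For the m3 *= m3 row:

```latex
\text{speedup} = \frac{t_{\text{Original}}}{t_{\text{variant}}}, \qquad
\frac{167.5}{70.1} \approx 2.39 \;(\text{SSE}), \qquad
\frac{167.5}{289.4} \approx 0.58 \;(\text{D3D})
```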

Cheers
 Joakim E. - http://www.snowcode.com