[ODE] Some SSE in Quick step

Cyril Labordrie clabordrie at wizarbox.com
Tue May 25 11:36:39 MST 2004


Hello ODE

I totally agree with Joakim. It is very difficult to beat the MS compiler.
Just give it a try: write a simple math library using SSE. It will be very
difficult to make it run faster than a compiled non-SSE library. I don't
know why. In all the tests I have done, I have always noticed that the
fastest library is the DirectX matrix library.

If you want your code to go faster, it is better to avoid cache misses
than to apply any vector-processor optimisation. On a very well-known
console a cache miss takes around 40 cycles. So I totally agree with
Joakim: it is better to have cache-friendly code than
vector-processor-optimised code.
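
To illustrate the cache-miss point, here is a minimal sketch in C. It is
not taken from ODE or from any benchmark in this thread; the function
names are made up for the example. Both functions compute the same sum
over a row-major matrix, but the first walks memory sequentially while
the second jumps `cols` floats between accesses, so on a large matrix it
is likely to touch a new cache line on almost every read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum a rows x cols matrix stored row-major,
 * visiting elements in the order they sit in memory. */
float sum_row_major(const float *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    float total = 0.0f;
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];   /* stride of 1 float */
    return total;
}

/* Same data, same arithmetic, but the inner loop strides by `cols`
 * floats, which is the cache-unfriendly access pattern. */
float sum_column_major(const float *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    float total = 0.0f;
    for (size_t c = 0; c < cols; ++c)
        for (size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];   /* stride of `cols` floats */
    return total;
}
```

The arithmetic work is identical in both versions, so any speed
difference measured between them comes from memory access alone. How
large that difference is depends on the matrix size and the machine.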

Cyril Labordrie

-----Original Message-----
From: ode-bounces at q12.org [mailto:ode-bounces at q12.org] On Behalf Of
Joakim Eriksson
Sent: Tuesday, 25 May 2004 10:37
To: ode at q12.org
Subject: RE: [ODE] Some SSE in Quick step

If you're going to optimize the inner loop you should do it in place,
because it's such an important and small piece of code. You can't get
every single clock out of it if you're using standard optimized matrix
functions.

However, optimizing this inner loop isn't that easy. If it's not aligned,
I can quite surely say that the VS.NET 7 compiler will do a very good job
of optimizing it (just go into Properties->C/C++->Code Generation->Enable
Enhanced Instruction Set->SSE), and it will use SSE code everywhere and
in ways you will have a very hard time beating (I know, I optimized our
math library here). However, it will then only use single (scalar)
instructions. It won't try to parallel optimize the code for you. Still,
it's not easy to do better, and the reason is the same as with just about
all code nowadays: memory speed. In this case my guess is that optimizing
memory access to be as cache friendly as possible would go a lot further
than optimizing the computation itself, because we are walking through
quite a lot of memory. But the best way to know is to run the code
through VTune, and it will tell you if the code is memory bound or CPU
bound.
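
As a rough sketch of the difference between single and parallel SSE
code, the C below shows the same loop twice. The function names and the
`HAVE_SSE` macro are hypothetical and are not from ODE's quick step. The
first version handles one float per iteration, which is the form a
compiler can map to single SSE instructions. The second uses the packed
intrinsics from `<xmmintrin.h>` to handle four floats per iteration,
with unaligned loads and stores so it makes no alignment assumption. It
assumes `n` is a multiple of 4 and falls back to the scalar loop when
SSE is not available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1   /* hypothetical macro for this example */
#else
#define HAVE_SSE 0
#endif

/* One element per iteration: dst[i] = a[i] + k * b[i]. */
void scale_add_scalar(float *dst, const float *a, const float *b,
                      float k, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + k * b[i];
}

/* Four elements per iteration using packed SSE instructions.
 * Assumption: n is a multiple of 4. */
void scale_add_packed(float *dst, const float *a, const float *b,
                      float k, size_t n)
{
    assert(n % 4 == 0);
#if HAVE_SSE
    const __m128 kk = _mm_set1_ps(k);          /* k in all four lanes */
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);       /* unaligned loads */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, _mm_mul_ps(kk, vb)));
    }
#else
    scale_add_scalar(dst, a, b, k, n);         /* no SSE: same result */
#endif
}
```

Whether the packed version is actually faster depends on whether the
loop is CPU bound or memory bound, which is what a profiler such as
VTune would show. Some compilers may also vectorise the first loop on
their own, depending on version and flags.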

Cheers
 Joakim E. - http://www.snowcode.com

> -----Original Message-----
> From: ode-bounces at q12.org [mailto:ode-bounces at q12.org] On Behalf Of Ivan Bolcina
> Sent: 25 May 2004 10:00
> To: ode at q12.org
> Subject: Re: [ODE] Some SSE in Quick step
> 
> Adam D. Moss wrote:
> 
> > GARY VANSICKLE wrote:
> >
> >> DirectX.
> >
> >
> > ... is of no relevance.
> > _______________________________________________
> > ODE mailing list
> > ODE at q12.org
> > http://q12.org/mailman/listinfo/ode
> >
> Hi.
> 
> I hear that MS had both Intel and AMD experts helping write optimized
> code for Direct3DX, I mean the extension for Direct3D that is used for
> computing matrices, quaternions, ... and is not really needed by
> Direct3D; it is just a programmer utility. It is supposed to be very
> optimized for all sorts of extensions: SSE, MMX, ...
> 
> 
> But I don't know if you could use that code. Then again, it seems like
> a waste of time to see matrix operations being rewritten so many times
> in so many libraries. I am not sure, but I believe D3DX code is faster
> than anything written without the help MS got from Intel and AMD.
> 
> bye, ivan