[ODE] why setTransform() ?

anon anon <anon at finitemonkeys.com>
Wed Jan 9 05:00:02 2002


> Have to be careful on the Win32 thing, if you are reading old
> documentation it can be misleading.  I still find references to fixed
> point 32 bit math as an optimization, for example.  (Newer PC's are
> faster with floating than fixed).  I *think* (but can't give any
> references) the doubles are as fast as floats, but it's been a while
> since I hand optimized anything, so I'm not sure now.
> 
> Just making the point that the pace of evolution is fast on the PC side,
> it's amazing the number of people who say 'well, it's always been that way'
> and of course are completely wrong (now).  I'll have to double check this
> though.
> 
>                == John ==

Not so; those people are not completely wrong.

Take FDIV on the P3: latency and reissue of 18 cycles for single,
32 for double and 38 for extended. It's not pipelined, so the processor
can only issue one at a time.

FSQRT is 28 cycles for single, 57 for double.
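
As a rough illustration of why an unpipelined divide matters, a common workaround (not something taken from the timings above) is to pay for one divide and then use multiplies, which are typically much cheaper. A minimal sketch in C, where the function name scale_by_reciprocal and its parameters are made up for this example:

```c
#include <stddef.h>

/* Divide every element of v by d using a single divide.
 * Hypothetical helper for illustration only.
 * Note: v[i] * (1/d) can differ from v[i] / d in the last bit
 * unless 1/d is exactly representable (e.g. d is a power of two). */
void scale_by_reciprocal(float *v, size_t n, float d)
{
    const float inv = 1.0f / d;      /* the one expensive divide */
    size_t i;

    for (i = 0; i < n; ++i)
        v[i] *= inv;                 /* one multiply per element instead of a divide */
}
```

Whether this is a win depends on the compiler and the target, so it is worth measuring rather than assuming.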

You can hide the setup times etc. with carefully constructed asm code,
but it's a lot easier to do it with fixed point. Transcendental functions
and so on can also be much faster with fixed point, since the common ones
are lookups and common divisions become just shifts. The bandwidth
is also lower, since most fixed point is 32 bit.
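
To make the lookup and shift points concrete, here is a minimal sketch in C. Everything in it is an assumption chosen for illustration: the 16.16 format, the fx_ names, and a deliberately tiny 16-entry sine table (a real renderer would use a much larger table and its own angle units).

```c
#include <stdint.h>

/* 16.16 fixed point: 16 integer bits, 16 fractional bits (assumed format). */
typedef int32_t fx16;

#define FX_SHIFT 16
#define FX_ONE   ((fx16)1 << FX_SHIFT)

/* Convert an integer to 16.16 (overflows if v is outside -32768..32767). */
static inline fx16 fx_from_int(int v) { return (fx16)(v * FX_ONE); }

/* Multiply: widen to 64 bits, then drop the extra 16 fractional bits.
 * Right-shifting a negative value is implementation-defined in C;
 * mainstream compilers do an arithmetic shift. */
static inline fx16 fx_mul(fx16 a, fx16 b)
{
    return (fx16)(((int64_t)a * b) >> FX_SHIFT);
}

/* Division by 2^k is just a shift (rounds toward negative infinity
 * on compilers that use an arithmetic shift). */
static inline fx16 fx_div_pow2(fx16 a, unsigned k) { return a >> k; }

/* Sine as a table lookup. The angle is in 1/16ths of a full turn,
 * so index 4 is 90 degrees. Values are sin() scaled by 65536. */
static const fx16 fx_sin_table[16] = {
         0,  25080,  46341,  60547,  65536,  60547,  46341,  25080,
         0, -25080, -46341, -60547, -65536, -60547, -46341, -25080
};

static inline fx16 fx_sin(unsigned turn16) { return fx_sin_table[turn16 & 15u]; }
static inline fx16 fx_cos(unsigned turn16) { return fx_sin_table[(turn16 + 4u) & 15u]; }
```

For example, fx_mul(fx_from_int(3), fx_sin(4)) scales 3 by sin(90 degrees) with one integer multiply and one shift, and no FPU involved.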

Doubles and floats most certainly are not the same speed. On the P4 they
are both converted to the internal 80 bit format, so yes, they are the
same speed there, but it's more like the single suffered. And things like
FXCH are no longer free.

I think you might be getting at the SSE/SSE2 of the P3/P4. Intel have
obviously set up the P4 for people to use SSE2 instead of the FPU;
however, you are probably going to have to roll your own or use their
compiler (or vector c), and even then it's not optimal. You can even
mix floating point with fixed point.

The G4 is the same; fdiv etc. will stall the FPU pipeline.

The PS2 seems designed for single FP; it has gobs of bandwidth for it.

However, you can get better performance with the FPU and SSE2 or
3DNOW etc.; it's just that most people don't, since the code's not always
written for obvious vectorization.
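
As a sketch of what "written for obvious vectorization" can look like in plain C: contiguous arrays, a simple counted loop, no dependency between iterations, and restrict to promise the compiler the arrays don't overlap. The function name scale_add and its parameters are hypothetical, and whether a given compiler actually emits SSE/SSE2 for it depends on the compiler and its flags.

```c
#include <stddef.h>

/* out[i] = a[i] * s + b[i] for i in [0, n).
 * Hypothetical example: each iteration is independent and the data is
 * contiguous, which is the shape a vectorizing compiler (or hand-written
 * SIMD) can process several elements at a time. */
void scale_add(float *restrict out,
               const float *restrict a,
               const float *restrict b,
               float s, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        out[i] = a[i] * s + b[i];
}
```

Code that interleaves unrelated fields in one struct, or that carries a value from one iteration to the next, is much harder to vectorize, which is one reason the gains often go unclaimed.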

If, like me, you are using software renderers, there are a lot of advantages
to fixed point, and I'm not locked into the FPU on machines that don't
have one.

Please reply to me off-list if you wish to argue back and forth about it.
charlie