[ODE] Some SSE in Quick step

Nguyen Binh ngbinh at glassegg.com
Tue May 25 10:57:50 MST 2004


Hi Russ,

RS> now, i'm assuming that these numbers are "time to compute physics per
RS> frame". so the SSE version is actually slower in most cases?

    Yes, you are right! Profiling the code shows that the SSE code is
    slower. The problem is that J and iMJ are not 16-byte aligned
    (i.e. not laid out in groups of 4 floats), so I have to use the
    _mm_set_ps() intrinsic, which is not efficient. I have modified fc
    slightly so that it is 16-byte aligned, but aligning J the same
    way is not so easy...

    I'll investigate this further...
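    The alignment problem above can be sketched roughly as follows. This
    is a minimal, hypothetical illustration assuming x86 SSE and
    single-precision dReal; the function names and the 8-float padded
    layout are assumptions made for the example, not ODE's actual
    QuickStep code. When 6-float Jacobian blocks are packed back to back,
    each 4-wide register has to be assembled element by element with
    _mm_set_ps(). If each block is instead padded to 8 floats starting on
    a 16-byte boundary, _mm_load_ps() can load it directly.

```c
/* Hypothetical sketch: dot product of one 6-element Jacobian block with
 * a 6-element vector (e.g. a body's velocity). Assumes x86 with SSE;
 * the names and the padded layout are illustrative, not ODE's code. */
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Horizontal sum of the four lanes of an SSE register. */
static float hsum4(__m128 v)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);   /* unaligned store is fine for a temporary */
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}

/* Packed (unaligned) layout: blocks are 6 floats apart, so each 4-wide
 * vector is gathered element by element with _mm_set_ps.
 * Note: _mm_set_ps takes its arguments from the highest lane to the lowest. */
static float dot6_gather(const float *a, const float *b)
{
    __m128 a0 = _mm_set_ps(a[3], a[2], a[1], a[0]);
    __m128 b0 = _mm_set_ps(b[3], b[2], b[1], b[0]);
    __m128 a1 = _mm_set_ps(0.0f, 0.0f, a[5], a[4]);
    __m128 b1 = _mm_set_ps(0.0f, 0.0f, b[5], b[4]);
    __m128 s = _mm_add_ps(_mm_mul_ps(a0, b0), _mm_mul_ps(a1, b1));
    return hsum4(s);
}

/* Padded (aligned) layout: each 6-float block occupies 8 floats starting
 * on a 16-byte boundary. Lanes 6 and 7 must be zero in both operands so
 * they add nothing to the sum. Two aligned loads per operand replace
 * the per-element gathers above. */
static float dot6_aligned(const float *a, const float *b)
{
    /* _mm_load_ps requires 16-byte aligned addresses */
    assert(((uintptr_t)a & 15u) == 0 && ((uintptr_t)b & 15u) == 0);
    __m128 s = _mm_mul_ps(_mm_load_ps(a), _mm_load_ps(b));
    s = _mm_add_ps(s, _mm_mul_ps(_mm_load_ps(a + 4), _mm_load_ps(b + 4)));
    return hsum4(s);
}
```

    The padded layout spends 8 floats per 6-float block (a third more
    memory) in exchange for aligned loads; in a real solver the padding
    would have to be applied when J is built, which is presumably why
    aligning J is harder than aligning fc.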

-- 
Best regards,

---------------------------------------------------------------------
   Nguyen Binh
   Software Engineer
   Glass Egg Digital Media
   
   E.Town Building  
   7th Floor, 364 CongHoa Street
   Tan Binh District,
   HoChiMinh City,
   VietNam,

   Phone : +84 8 8109018
   Fax   : +84 8 8109013

     www.glassegg.com
---------------------------------------------------------------------


