Re[2]: [ODE] Faster ODE

Henri Hakl henri at cs.sun.ac.za
Mon Nov 25 13:46:01 2002


I've been thinking about SIMD (MMX, 3DNow(!), SSE(2)) instructions for ODE,
and it is quite possible that they can bring about harmony and speed. But one
thing that is likely to cause problems is the SSE(2) code.

For optimal performance a number of details need to be implemented. Vectors
and matrices need to have a horizontal size that is a multiple of 4 (this is
already implemented, and is the reason why, for example, a 3x3 matrix is
defined as a 12-TReal (3x4) structure).
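
A minimal sketch of the padded layout described above, assuming TReal is a
single-precision float. The names Matrix3Padded, mat3_at and mat3_row_offset
are hypothetical and are not taken from the ODE sources; they only illustrate
why padding each row to 4 elements makes every row start a multiple of 16
bytes from the start of the matrix.

```c
#include <assert.h>
#include <stddef.h>

typedef float TReal;   /* assumption: TReal = single precision */

/* Hypothetical padded 3x3 matrix: 3 rows, each padded from 3 to 4 elements. */
typedef struct {
    TReal m[3 * 4];
} Matrix3Padded;

/* Element (row, col) lives at index row * 4 + col; column 3 is padding. */
static TReal *mat3_at(Matrix3Padded *a, int row, int col)
{
    assert(row >= 0 && row < 3 && col >= 0 && col < 3);
    return &a->m[row * 4 + col];
}

/* Byte offset of the start of a row, relative to the start of the matrix. */
static size_t mat3_row_offset(int row)
{
    return (size_t)row * 4 * sizeof(TReal);
}
```

With 4-byte floats the rows start at byte offsets 0, 16 and 32, so each row
can be treated as one 128-bit vector provided the matrix itself starts on a
16-byte boundary.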

However, the structures also have to be aligned on 16-byte boundaries. To
allow for optimal SSE(2) access (using movaps), each 128-bit memory vector
that is accessed has to be aligned on a 16-byte memory boundary. This is a
problem in ODE, as every math structure would then be required to be 16-byte
aligned; this is difficult to achieve because ODE calls/uses sub-matrices of
matrices, and it may be difficult to guarantee that every sub-matrix is >also<
correctly 16-byte aligned.
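
The sub-matrix problem can be sketched as follows, again assuming TReal is a
4-byte float and a row stride of 4 elements. The helper names is_aligned16 and
submatrix and the 4x4 block are hypothetical, made up for this illustration.
The sketch uses C11 _Alignas to force the parent block onto a 16-byte boundary
and then checks which sub-matrix start addresses keep that alignment.

```c
#include <assert.h>
#include <stdint.h>

typedef float TReal;   /* assumption: TReal = single precision */

/* Returns 1 if p sits on a 16-byte boundary, as an aligned load requires. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical 4x4 parent block, forced onto a 16-byte boundary (C11). */
static _Alignas(16) TReal block[4 * 4];

/* Pointer to the sub-matrix whose top-left element is (row, col),
   assuming a row stride of 4 elements. */
static const TReal *submatrix(int row, int col)
{
    assert(row >= 0 && row < 4 && col >= 0 && col < 4);
    return &block[row * 4 + col];
}
```

A sub-matrix that starts at column 0 of any row inherits the parent's
alignment, but one that starts at, say, (1, 1) begins 20 bytes into the block
and is no longer 16-byte aligned. Code that receives such a pointer would
have to fall back to unaligned loads or copy the data first.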

Additionally, SSE2 primarily adds double-precision functionality to the SIMD
instructions. This can help somewhat for speed in the TReal = double case,
but isn't likely (just my guess) to give as tremendous a speed bonus as the 4
single floats that can be handled simultaneously for TReal = single.
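
The arithmetic behind that guess, as a small sketch: a 128-bit register holds
4 single floats but only 2 doubles. The names simd_lanes and simd_ops are
hypothetical, and the counts are only an upper bound on the possible gain, not
a measured speedup.

```c
#include <assert.h>
#include <stddef.h>

#define SIMD_REGISTER_BYTES 16   /* one 128-bit register */

/* Number of elements of the given size handled by one packed instruction. */
static size_t simd_lanes(size_t element_size)
{
    assert(element_size > 0 && SIMD_REGISTER_BYTES % element_size == 0);
    return SIMD_REGISTER_BYTES / element_size;
}

/* Packed instructions needed to sweep n elements once (rounding up). */
static size_t simd_ops(size_t n, size_t element_size)
{
    size_t lanes = simd_lanes(element_size);
    return (n + lanes - 1) / lanes;
}
```

For the 12-element padded matrix, this gives 3 packed operations per sweep
with single floats and 6 with doubles, compared with 12 scalar operations.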

Anyway... ;)
  Henri


----- Original Message -----
From: "Nguyen Binh" <ngbinh@glassegg.com>
To: <ode-admin@q12.org>; "Russ Smith" <russ@q12.org>
Cc: "Jeffrey Palmer" <jeffrey.palmer@acm.org>; <ode@q12.org>
Sent: Monday, November 25, 2002 5:11 AM
Subject: Re[2]: [ODE] Faster ODE


>
>         I think the best way to improve ODE speed is to use
>         CPU-specialized instructions like MMX, SIMD, SSE(2).
>
>         Some references:
>             http://LibSimd.sourceforge.net
>             SML library of Intel. (Very nice!)
>
> --
> Best regards,
>
> ---------------------------------------------------------------------
>    Nguyen Binh
>    Software Engineer
>    Glass Egg Digital Media
>    Me Linh Point Tower, 10th Floor
>    2 Ngo Duc Ke
>    District 1, Ho Chi Minh City
>    Vietnam
>    Fax:  (84.8)823-8392
>      www.glassegg.com
> ---------------------------------------------------------------------
>
>
> _______________________________________________
> ODE mailing list
> ODE@q12.org
> http://q12.org/mailman/listinfo/ode

