Age | Commit message (Collapse) | Author |
|
i.e. return unshifted moments in a implicitly ordered float4 array.
Cell positions are reconstructed by a vertex shaded analogously to
how it is done in compustream.
|
|
|
|
Yields another ~5-10 MLUPS in the simple D2Q9 example.
Now averaging at ~840 MLUPS for D2Q9 and ~ 400 MLUPS for D3Q19 on a K2200.
|
|
|
|
|
|
|
|
Note how this basically required no changes besides generalizing cell indexing
and adding the symbolic formulation of a D3Q19 BGK collision step.
Increasing the neighborhood communication from 9 to 19 cells leads to a
significant performance "regression": The 3D kernel yields ~ 360 MLUPS
compared to the 2D version's ~ 820 MLUPS.
|
|
|
|
|