| Age | Commit message | Author |
|---|---|---|
|  | i.e. return unshifted moments in an implicitly ordered float4 array.
Cell positions are reconstructed by a vertex shader, analogously to
how it is done in compustream. | 
|  |  | 
|  | Yields another ~5-10 MLUPS in the simple D2Q9 example.
Now averaging ~840 MLUPS for D2Q9 and ~400 MLUPS for D3Q19 on a K2200. | 
|  |  | 
|  | Note how this basically required no changes besides generalizing cell indexing
and adding the symbolic formulation of a D3Q19 BGK collision step.
Increasing the neighborhood communication from 9 to 19 cells leads to a
significant performance "regression": the 3D kernel yields ~360 MLUPS
compared to the 2D version's ~820 MLUPS. | 
|  |  |
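
The "generalized cell indexing" and "implicitly ordered" array mentioned in the entries above can be illustrated with a minimal sketch. It assumes a row-major-like layout in which the first coordinate varies fastest; the actual memory layout used by the kernels is not stated in the log. The names `cell_index` and `cell_position` are hypothetical and chosen only for illustration.

```python
def cell_index(pos, dims):
    """Flatten a 2D or 3D cell position into a linear index.

    Assumption: the first coordinate varies fastest. The real kernels
    may order cells differently.
    """
    idx, stride = 0, 1
    for p, n in zip(pos, dims):
        idx += p * stride
        stride *= n
    return idx


def cell_position(idx, dims):
    """Reconstruct the cell position from a linear index.

    This is the inverse of cell_index and mirrors the idea of recovering
    positions from the array order instead of storing them per cell.
    """
    pos = []
    for n in dims:
        pos.append(idx % n)
        idx //= n
    return tuple(pos)


# The same two functions cover both lattice dimensions:
i2 = cell_index((1, 2), (4, 3))        # 2D lattice of 4 x 3 cells
i3 = cell_index((1, 2, 3), (4, 5, 6))  # 3D lattice of 4 x 5 x 6 cells
```

Because position and index are mutually recoverable, an output array only needs to hold the moments per cell, and a consumer such as a vertex shader can derive each cell's coordinates from its element index.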