symlbm_playground - Tinkering with LBM, OpenCL and SymPy-based code generation

Age	Commit message	Author
2020-06-19	Use OpenCL buffer to access moments in streamline impl	Adrian Kummerlaender

2019-10-06	Explicitly enable double precision floating point when required	Adrian Kummerlaender

2019-10-06	Use OpenCL access qualifiers only for image objects	Adrian Kummerlaender
	It seems I was overeager in adding those qualifiers to non-image buffers as they are only defined by the standard in relation to image objects. Adding the qualifiers to normal buffers causes no observable performance difference on Nvidia targets and fails compilation when targeting AMD or Intel.
2019-09-21	Extract GL moments, particle buffers and add texture buffer	Adrian Kummerlaender

2019-09-17	Extract indicators, drawing of geometric primitives	Adrian Kummerlaender

2019-09-13	Add 3d lid driven cavity OpenGL visualization	Adrian Kummerlaender

2019-09-11	Cleanupink	Adrian Kummerlaender

2019-09-06	Add a fun little fake bonfire _simulation_	Adrian Kummerlaender
	…using appropriately colored aging particles
2019-09-04	Reset stuck particles to starting position	Adrian Kummerlaender

2019-09-01	Prototype "ink" particles visualization	Adrian Kummerlaender

2019-07-25	Dampen channel inflow	Adrian Kummerlaender

2019-06-29	Implement layout and memory padding	Adrian Kummerlaender
	There are at least two distinct areas where padding can be beneficial on a GPU:
	1. Padding the global thread sizes to support specific thread layouts, e.g. (32,1) layouts require the global lattice width to be a multiple of 32.
	2. Padding the memory layout at the lowest level to align memory accesses, i.e. some GPUs read memory in 128-byte chunks, so it is beneficial if the accesses are aligned accordingly.
	For lattice and thread layout sizes that are powers of two, these two padding areas are equivalent. However, when one operates on e.g. a (300,300) lattice using a (30,1) layout, padding to 128 bytes yields a performance improvement of about 10 MLUPS on a K2200.
	Note that I am getting quite unsatisfied with how the Lattice class and its surroundings continue to accumulate parameters. The naming distinction between Geometry, Grid, Memory and Lattice is also not very intuitive.
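The two kinds of padding described in this commit message can be illustrated with a small sketch. The function names `pad_to_multiple` and `padded_row_length` are hypothetical and not taken from the repository, and the default of 4 bytes per value assumes single precision populations.

```python
def pad_to_multiple(n, multiple):
    """Round n up to the next multiple of `multiple` (hypothetical helper)."""
    return -(-n // multiple) * multiple


def padded_row_length(width, bytes_per_value=4, alignment_bytes=128):
    """Row length in values such that every row starts on an aligned address.

    Assumes one value per cell per row and single precision (4 bytes) by default.
    """
    values_per_chunk = alignment_bytes // bytes_per_value
    return pad_to_multiple(width, values_per_chunk)


# Thread padding: a (30,1) layout already divides a width of 300 evenly.
global_width = pad_to_multiple(300, 30)

# Memory padding: 128 bytes hold 32 single precision values, so rows grow to 320.
memory_width = padded_row_length(300)
```

For a (32,1) layout both computations give 320 for a width of 300, which matches the observation that the two kinds of padding coincide for such sizes. For the (30,1) layout only the memory padding changes anything.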
2019-06-22	Add platform, precision and thread layout parameters	Adrian Kummerlaender

2019-06-21	Gather interop moments in a more generic manner	Adrian Kummerlaender
	i.e. return unshifted moments in an implicitly ordered float4 array. Cell positions are reconstructed by a vertex shader analogously to how it is done in compustream.
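An implicitly ordered array stores no coordinates; each cell's position is recovered from its array index alone. The following is a minimal sketch of that idea, assuming a row-major 2D layout. The actual ordering used by the kernel and the shader is not stated here, and the names `cell_index` and `cell_position` are hypothetical.

```python
def cell_index(x, y, size_x):
    """Linear index of cell (x, y), assuming a row-major layout."""
    return y * size_x + x


def cell_position(index, size_x):
    """Inverse of cell_index: recover (x, y) from the linear index."""
    return (index % size_x, index // size_x)


# Round trip for one cell of a 300 cells wide lattice
i = cell_index(17, 42, 300)
x, y = cell_position(i, 300)
```

In the OpenGL setting the same arithmetic would run per vertex, using the vertex index in place of `index`.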
2019-06-20	Prototype OpenGL interoperation	Adrian Kummerlaender

2019-06-17	Extract population offset	Adrian Kummerlaender

2019-06-16	Declutter gid and offset calculation	Adrian Kummerlaender

2019-06-15	Add support for generating a D3Q19 kernel	Adrian Kummerlaender
	Note how this basically required no changes besides generalizing cell indexing and adding the symbolic formulation of a D3Q19 BGK collision step. Increasing the neighborhood communication from 9 to 19 cells leads to a significant performance "regression": The 3D kernel yields ~ 360 MLUPS compared to the 2D version's ~ 820 MLUPS.
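To make the step from 9 to 19 neighbors concrete, the sketch below enumerates the standard D3Q19 discrete velocities and weights and checks their basic moment conditions. The function names are hypothetical and the ordering of the velocities is arbitrary; this is not necessarily how the repository's descriptor defines them.

```python
from fractions import Fraction
from itertools import product


def d3q19_velocities():
    """All vectors in {-1,0,1}^3 with at most two non-zero components."""
    return [c for c in product((-1, 0, 1), repeat=3)
            if sum(abs(x) for x in c) <= 2]


def d3q19_weight(c):
    """Standard D3Q19 weights: 1/3 (rest), 1/18 (axis), 1/36 (diagonal)."""
    nonzero = sum(abs(x) for x in c)
    return {0: Fraction(1, 3), 1: Fraction(1, 18), 2: Fraction(1, 36)}[nonzero]


velocities = d3q19_velocities()

# Zeroth moment: the weights sum to one
total_weight = sum(d3q19_weight(c) for c in velocities)

# Second moment along x: equals the squared lattice speed of sound, 1/3
second_moment_x = sum(d3q19_weight(c) * c[0] ** 2 for c in velocities)
```

Each cell update reads one population per velocity, so the 3D kernel touches 19 values per cell where the 2D D2Q9 kernel touches 9.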
2019-06-15	Consistently name population buffers	Adrian Kummerlaender

2019-06-14	Extract geometry information	Adrian Kummerlaender

2019-06-13	Further the separation between descriptor and lattice	Adrian Kummerlaender

2019-06-13	Tidy up symbolic kernel generation	Adrian Kummerlaender

2019-06-13	Add kernel customization point for velocity boundaries	Adrian Kummerlaender

2019-06-12	Make it easier to exchange initial equilibration logic	Adrian Kummerlaender

2019-06-12	Move kernel template into separate file	Adrian Kummerlaender