symlbm_playground - Tinkering with LBM, OpenCL and SymPy-based code generation

Age	Commit message	Author
2019-06-20	Prototype OpenGL interoperation	Adrian Kummerlaender

2019-06-20	Move back assignment	Adrian Kummerlaender

2019-06-18	Expand square expressions	Adrian Kummerlaender
	Yields another ~5-10 MLUPS in the simple D2Q9 example. Now averaging at ~840 MLUPS for D2Q9 and ~400 MLUPS for D3Q19 on a K2200.
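The log does not show how the squares are expanded. A minimal sketch, assuming SymPy's C printer emits squares as `pow(expr, 2)` calls, is a textual pass over the generated kernel source that spells them out as multiplications, so the speedup no longer depends on the OpenCL compiler strength-reducing a generic `pow`. `expand_squares` is a hypothetical helper; rewriting the SymPy expression tree before printing would be an alternative.

```python
import re

# Matches pow(<operand>, 2) where the operand contains no parentheses or
# commas, e.g. pow(u_0, 2) or pow(f_curr[3], 2). Nested calls and other
# exponents (pow(x, 3), pow(x, 2.0)) are deliberately left untouched.
_SQUARE = re.compile(r"pow\(\s*([^(),]+?)\s*,\s*2\s*\)")


def expand_squares(code: str) -> str:
    """Hypothetical post-processing step: rewrite simple pow(x, 2) as (x*x)."""
    return _SQUARE.sub(lambda m: f"({m.group(1)}*{m.group(1)})", code)
```

For example, `expand_squares("a = pow(u_0, 2) + pow(u_1, 2);")` returns `"a = (u_0*u_0) + (u_1*u_1);"`.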
2019-06-17	Extract population offset	Adrian Kummerlaender

2019-06-17	Add function for exporting moments as VTK files	Adrian Kummerlaender

2019-06-16	Add PyEVTK to environment	Adrian Kummerlaender

2019-06-16	Replace some explicit dimension branching	Adrian Kummerlaender

2019-06-16	Select thread layout depending on the descriptor's characteristics	Adrian Kummerlaender

2019-06-16	Declutter gid and offset calculation	Adrian Kummerlaender

2019-06-16	Add D3Q27 descriptor	Adrian Kummerlaender

2019-06-15	Split descriptors and symbolic formulation	Adrian Kummerlaender

2019-06-15	Add support for generating a D3Q19 kernel	Adrian Kummerlaender
	Note how this basically required no changes besides generalizing cell indexing and adding the symbolic formulation of a D3Q19 BGK collision step. Increasing the neighborhood communication from 9 to 19 cells leads to a significant performance "regression": The 3D kernel yields ~360 MLUPS compared to the 2D version's ~820 MLUPS.
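A rough way to reason about this regression is to count the memory traffic per cell update. The sketch below assumes a two-buffer scheme in which each of the q populations is read once and written once in single precision, and ignores other per-cell reads such as material numbers; the helper names are hypothetical and no device bandwidth figure is assumed.

```python
def mlups(cells: int, steps: int, seconds: float) -> float:
    """Million lattice (cell) updates per second."""
    return cells * steps / seconds / 1e6


def bytes_per_update(q: int, value_size: int = 4) -> int:
    """Bytes moved per cell update if each of the q populations is loaded
    once and stored once; value_size=4 assumes single precision floats."""
    return 2 * q * value_size


def bandwidth_bound_mlups(q: int, bandwidth_gb_s: float, value_size: int = 4) -> float:
    """Upper bound on MLUPS for a kernel limited purely by memory bandwidth."""
    return bandwidth_gb_s * 1e9 / bytes_per_update(q, value_size) / 1e6
```

Under these assumptions a D3Q19 update moves 152 bytes against 72 bytes for D2Q9, roughly 2.1 times as much, which is of the same order as the observed drop if both kernels are bandwidth bound.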
2019-06-15	Start to record some benchmarks	Adrian Kummerlaender

2019-06-15	Consistently name population buffers	Adrian Kummerlaender

2019-06-14	Extract geometry information	Adrian Kummerlaender

2019-06-13	Further the separation between descriptor and lattice	Adrian Kummerlaender

2019-06-13	Tidy up symbolic kernel generation	Adrian Kummerlaender

2019-06-13	Add JupyterLab to environment	Adrian Kummerlaender

2019-06-13	Add kernel customization point for velocity boundaries	Adrian Kummerlaender

2019-06-12	Port LDC example to new structure	Adrian Kummerlaender

2019-06-12	Make it easier to exchange initial equilibration logic	Adrian Kummerlaender

2019-06-12	Restructuring	Adrian Kummerlaender

2019-06-12	Initialize material numbers using given geometry function	Adrian Kummerlaender

2019-06-12	Collect moments outside of the lattice class	Adrian Kummerlaender

2019-06-12	Move kernel template into separate file	Adrian Kummerlaender

2019-06-12	Allocate moments buffer only on device	Adrian Kummerlaender

2019-06-11	Restore wrongly deleted file from 75d0088	Adrian Kummerlaender

2019-06-11	Move equilibrization to kernel	Adrian Kummerlaender

2019-06-11	Move D2Q9 codegen into separate file	Adrian Kummerlaender

2019-06-11	Preshift population field pointer	Adrian Kummerlaender
	Now averaging ~820 MLUPS again
2019-06-11	Statically resolve indices as far as possible	Adrian Kummerlaender
	Interestingly this seems to lose up to 10 MLUPS at first glance. On the other hand such a small difference could also be a temporary load issue.
2019-06-11	Move index calculation to compile time	Adrian Kummerlaender

2019-06-11	Templatize assignment loops	Adrian Kummerlaender

2019-06-11	Start to use codegen for actual kernel generation	Adrian Kummerlaender

2019-06-11	Remove initial vector field example	Adrian Kummerlaender

2019-06-11	Test generation of D3Q19 kernel code in notebook	Adrian Kummerlaender

2019-06-11	Count operations	Adrian Kummerlaender

2019-06-11	Restructure codegen notebook	Adrian Kummerlaender

2019-06-10	Improve plot generation	Adrian Kummerlaender
	* Only update moment field when it is actually needed
	* => ~825 MLUPS
	* Defer plot generation until the actual simulation is done
2019-06-10	Reduce thread block size	Adrian Kummerlaender
	=> ~780 MLUPS
2019-06-10	Improve plot output	Adrian Kummerlaender

2019-06-10	Add fixed velocity boundaries to generated LBM kernel	Adrian Kummerlaender
	Interestingly this increased performance to ~750 MLUPS compared to ~665 MLUPS.
2019-06-09	First test of partially generated LBM kernel	Adrian Kummerlaender
	A kernel extracted from `lbn_codegen.ipynb` yields ~665 MLUPS compared to the ~600 MLUPS produced by a manually optimized kernel. Note that this new kernel currently doesn't handle boundary conditions (but dropping in a density condition doesn't impact performance).
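The generated kernel itself is not part of this log. As a reference for what it computes, here is a plain-Python sketch of a D2Q9 BGK collision, assuming the standard D2Q9 weights and the usual second order equilibrium; the velocity ordering and all function names are hypothetical. In the playground these expressions are formulated symbolically and printed as OpenCL C; exact `Fraction` arithmetic is used here only so that mass and momentum conservation can be checked exactly.

```python
from fractions import Fraction

# D2Q9 velocity set and weights (ordering is an arbitrary choice here)
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [Fraction(4, 9)] + [Fraction(1, 9)] * 4 + [Fraction(1, 36)] * 4


def equilibrium(rho, u):
    """f_i^eq = w_i rho (1 + 3 c_i.u + 9/2 (c_i.u)^2 - 3/2 u.u)"""
    uu = u[0] * u[0] + u[1] * u[1]
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * u[0] + cy * u[1]
        feq.append(w * rho * (1 + 3 * cu + Fraction(9, 2) * cu * cu - Fraction(3, 2) * uu))
    return feq


def moments(f):
    """Density and velocity of a population vector."""
    rho = sum(f)
    u = (sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho,
         sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho)
    return rho, u


def bgk_collide(f, tau):
    """Relax populations towards equilibrium: f_i + (f_i^eq - f_i) / tau."""
    rho, u = moments(f)
    feq = equilibrium(rho, u)
    return [fi + (fe - fi) / tau for fi, fe in zip(f, feq)]
```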
2019-06-09	Start tracking codegen notebook	Adrian Kummerlaender

2019-06-09	Test lid driven cavity	Adrian Kummerlaender
	Notice that the indexing order of numpy arrays follows matrix conventions.
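A small illustration of this convention, assuming a flat per-cell buffer with the linear index gid = y * nx + x (x varying fastest); `nx`, `ny` and `gid` are hypothetical names. Reshaping such a buffer with NumPy's default C order yields an array indexed `[y, x]`, i.e. row first, which is also how matplotlib's `imshow` interprets the first axis.

```python
import numpy as np

nx, ny = 4, 3  # hypothetical lattice extent in x and y


def gid(x, y):
    """Assumed linear cell index with x varying fastest."""
    return y * nx + x


flat = np.arange(nx * ny)        # stand-in for a buffer read back from the device
field = flat.reshape((ny, nx))   # default C order: field[y, x] == flat[gid(x, y)]
field_xy = field.T               # transpose if [x, y] indexing is preferred
```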
2019-06-09	Fix relaxation time	Adrian Kummerlaender

2019-06-09	Fix boundaries	Adrian Kummerlaender

2019-06-09	Add periodic performance reporting	Adrian Kummerlaender

2019-06-08	Performance optimizations	Adrian Kummerlaender
	Starting point: ~200 MLUPS on an NVIDIA K2200
	Changes that did not noticeably impact performance:
	* Memory layout AOS vs. SOA (weird, probably highly platform dependent)
	* Propagate on read
	* Tagging pointers as read / write only
	* Manual code inlining
	Changes that made things worse:
	* Bad thread block sizes
	The actual issue:
	* Hidden double precision computations
	=> Code now yields ~600 MLUPS
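One common source of hidden double precision in OpenCL C is that, as in C, an unsuffixed floating-point literal such as `0.5` has type `double`, so `0.5*x` silently promotes a `float` operand. The log does not say how the kernels were fixed; the OpenCL build option `-cl-single-precision-constant` is one remedy, and appending an `f` suffix in the generated source is another. The sketch below shows the latter as a hypothetical post-processing helper, `single_precision_literals`, that only handles plain decimal literals.

```python
import re

# Decimal floating-point literals such as 0.5, .25, 1.0e5 or 1e-3 that are not
# part of an identifier and not already followed by a suffix. Integer literals
# (e.g. array indices) are left alone.
_FLOAT_LITERAL = re.compile(
    r"(?<![\w.])"
    r"((?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+)"
    r"(?![\w.])"
)


def single_precision_literals(code: str) -> str:
    """Append an 'f' suffix so the literals are single precision constants."""
    return _FLOAT_LITERAL.sub(lambda m: m.group(1) + "f", code)
```

For example, `single_precision_literals("0.5*u_0 + 1.0")` returns `"0.5f*u_0 + 1.0f"`, while already suffixed literals and identifiers like `u_0` are left unchanged.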
2019-06-04	Update notebook	Adrian Kummerlaender