symlbm_playground - Tinkering with LBM, OpenCL and SymPy-based code generation

Age	Commit message	Author
2019-06-11	Templatize assignment loops	Adrian Kummerlaender

2019-06-11	Start to use codegen for actual kernel generation	Adrian Kummerlaender

2019-06-11	Remove initial vector field example	Adrian Kummerlaender

2019-06-11	Test generation of D3Q19 kernel code in notebook	Adrian Kummerlaender

2019-06-11	Count operations	Adrian Kummerlaender

2019-06-11	Restructure codegen notebook	Adrian Kummerlaender

2019-06-10	Improve plot generation	Adrian Kummerlaender
	* Only update moment field when it is actually needed
	  * => ~825 MLUPS
	* Defer plot generation until the actual simulation is done
2019-06-10	Reduce thread block size	Adrian Kummerlaender
	=> ~780 MLUPS
2019-06-10	Improve plot output	Adrian Kummerlaender

2019-06-10	Add fixed velocity boundaries to generated LBM kernel	Adrian Kummerlaender
	Interestingly this increased performance to ~750 MLUPS compared to ~665 MLUPS.
2019-06-09	First test of partially generated LBM kernel	Adrian Kummerlaender
	A kernel extracted from `lbn_codegen.ipynb` yields ~665 MLUPS compared to the ~600 MLUPS produced by a manually optimized kernel. Note that this new kernel currently doesn't handle boundary conditions (but dropping in a density condition doesn't impact performance).
2019-06-09	Start tracking codegen notebook	Adrian Kummerlaender

2019-06-09	Test lid driven cavity	Adrian Kummerlaender
	Notice that the indexing order of numpy arrays follows matrix conventions.
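A minimal sketch of what the matrix convention means for a 2D lattice. The field size and the names `nx`, `ny`, `field` are made up for illustration and are not taken from the repository:

```python
import numpy as np

# Hypothetical 2D field: 3 cells along y (rows), 4 cells along x (columns).
nx, ny = 4, 3
field = np.zeros((ny, nx))

# Matrix convention: the first index selects the row (y),
# the second index selects the column (x).
x, y = 3, 1
field[y, x] = 1.0

# With NumPy's default C (row-major) layout, cells that are neighbours
# along x are also neighbours in memory.
flat_index = y * nx + x
```

So a cell at position `(x, y)` is addressed as `field[y, x]`, which matters when the same buffer is read by a kernel that computes flat indices itself.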
2019-06-09	Fix relaxation time	Adrian Kummerlaender

2019-06-09	Fix boundaries	Adrian Kummerlaender

2019-06-09	Add periodic performance reporting	Adrian Kummerlaender

2019-06-08	Performance optimizations	Adrian Kummerlaender
	Starting point: ~200 MLUPS on an NVIDIA K2200
	Changes that did not noticeably impact performance:
	* Memory layout AOS vs. SOA (weird, probably highly platform dependent)
	* Propagate on read
	* Tagging pointers as read / write only
	* Manual code inlining
	Changes that made things worse:
	* Bad thread block sizes
	The actual issue:
	* Hidden double precision computations
	=> Code now yields ~600 MLUPS
2019-06-04	Update notebook	Adrian Kummerlaender

2019-06-04	Check whether hand-unrolling makes a difference	Adrian Kummerlaender
	…it doesn't in this case.
2019-06-04	Enable verbose OpenCL output	Adrian Kummerlaender

2019-05-31	Try out various OpenCL work group sizes using a Jupyter notebook	Adrian Kummerlaender
	This is actually quite nice for this kind of experimentation!
2019-05-30	Collapse SOA into single array	Adrian Kummerlaender
	Weirdly, the expected performance gain from better coalescing of memory accesses is not achieved.
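For reference, the two layouts differ only in how a (cell, population) pair maps to a flat buffer index. The following is a rough sketch under assumed names (`aos_index`, `soa_index`, `n_cells` are hypothetical), not the indexing code used in the repository:

```python
import numpy as np

q = 9        # populations per cell (D2Q9)
n_cells = 6  # hypothetical tiny lattice

def aos_index(cell, i):
    # Array of structures: the q populations of one cell are contiguous.
    return cell * q + i

def soa_index(cell, i):
    # Structure of arrays collapsed into a single buffer:
    # population i of all cells is contiguous.
    return i * n_cells + cell

# The same data stored in both layouts.
aos = np.arange(n_cells * q, dtype=np.float32).reshape(n_cells, q)
soa = np.ascontiguousarray(aos.T)
```

In the collapsed SOA layout, neighbouring cells read the same population from adjacent addresses, which is what usually allows coalesced access on GPUs.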
2019-05-29	Move to structure of arrays	Adrian Kummerlaender

2019-05-29	Add Jupyter to nix-shell	Adrian Kummerlaender

2019-05-28	Add const qualifiers for pointers	Adrian Kummerlaender

2019-05-28	Pull streaming for local writes	Adrian Kummerlaender
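A plain NumPy sketch of pull streaming, assuming periodic boundaries and a `(9, ny, nx)` population array. The actual kernel is written in OpenCL; the velocity ordering and the function name `stream_pull` here are illustrative assumptions:

```python
import numpy as np

# D2Q9 discrete velocities as (cx, cy) pairs (ordering chosen for illustration).
velocities = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]

def stream_pull(f_old):
    """Each cell reads ("pulls") population i from the neighbour it came from
    and writes only to its own location."""
    f_new = np.empty_like(f_old)
    ny, nx = f_old.shape[1:]
    for i, (cx, cy) in enumerate(velocities):
        for y in range(ny):
            for x in range(nx):
                # Periodic wrap-around is an assumption of this sketch.
                f_new[i, y, x] = f_old[i, (y - cy) % ny, (x - cx) % nx]
    return f_new
```

Because every write targets the cell's own index, the writes of one work item are local, while the reads are scattered over the neighbourhood.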

2019-05-28	Remove branch to enable vectorization on Intel	Adrian Kummerlaender
	Twice the MLUPS!
2019-05-27	Add material numbers	Adrian Kummerlaender

2019-05-27	Print some performance statistics	Adrian Kummerlaender
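The MLUPS figures quoted throughout this log are million lattice updates per second. A minimal sketch of how such a number can be computed; the helper name `mlups` and the run parameters are hypothetical:

```python
def mlups(n_cells, n_steps, seconds):
    # Million lattice (cell) updates per second.
    return n_cells * n_steps / (seconds * 1e6)

# Hypothetical run: 1024 x 1024 lattice, 1000 time steps, 1.6 s wall time.
rate = mlups(1024 * 1024, 1000, 1.6)
```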

2019-05-26	Add basic D2Q9 LBM	Adrian Kummerlaender
	Ported the basic compustream structure
2019-05-26	Make config window togglable	Adrian Kummerlaender
	Tying the GTK loop into the OpenGL update loop works ok but is probably not how this should be done.
2019-05-21	Restrict to 2D	Adrian Kummerlaender
	i.e. recreate something more like computicle
2019-05-20	Change parameter window into dialog	Adrian Kummerlaender

2019-05-20	Throw together basic UI for updating field function	Adrian Kummerlaender

2019-05-19	Initial commit	Adrian Kummerlaender