|
|
|
Whether streaming is implicit depends on the selected propagation pattern.
|
|
|
|
Sadly, OpenCL kernels don't accept pointer-to-pointer arguments, which
complicates the control structure implementation.
A workaround is to cast them to `uintptr_t`, which is guaranteed to be large
enough to hold any pointer on the device. Special care has to be taken to always
perform the pointer shifts on actual floating point pointers and not on
type-less integer values.
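The resulting discipline can be sketched in a few lines of Python (a toy model of the address arithmetic, not actual boltzgen code; the names and the 4-byte float size are assumptions):

```python
SIZEOF_FLOAT = 4  # assumed bytes per single-precision value on the device

def shift_as_typed_pointer(addr, n_cells):
    """Shift performed on a float* equivalent: advances by whole elements."""
    return addr + n_cells * SIZEOF_FLOAT

def shift_as_untyped_integer(addr, n_cells):
    """Naive shift on the raw uintptr_t value: advances by bytes only."""
    return addr + n_cells

base = 0x1000
# Moving 3 cells must advance 3 * sizeof(float) == 12 bytes;
# shifting the type-less integer directly moves only 3 bytes.
assert shift_as_typed_pointer(base, 3) == base + 12
assert shift_as_untyped_integer(base, 3) == base + 3
```

In the generated OpenCL this corresponds to casting the `uintptr_t` back to a typed pointer before applying any offset.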
|
|
|
|
This way, the expanded call to `pow2` is resolved into a common subexpression.
|
|
Currently only for the SSS streaming pattern.
CudaCodePrinter in `utility/printer.py` is required to add an `f` suffix
to all single precision floating point literals. If this is not done
(when targeting single precision), most calculations happen in double
precision, which destroys performance. (In OpenCL this is not necessary,
as we can simply set the `-cl-single-precision-constant` flag. Sadly
no such flag seems to exist for nvcc.)
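The core idea can be sketched without sympy (a hypothetical regex-based stand-in for the actual printer; it deliberately ignores exponent-only literals like `1e-3`):

```python
import re

# Match decimal floating point literals that don't already carry a suffix.
FLOAT_LITERAL = re.compile(r'\b\d+\.\d+(?:[eE][+-]?\d+)?(?![fF\w])')

def suffix_single_precision(code):
    """Append an 'f' suffix so nvcc treats the literals as float, not double."""
    return FLOAT_LITERAL.sub(lambda m: m.group(0) + 'f', code)

print(suffix_single_precision("w = 4.0/9.0; omega = 1.6;"))
# → w = 4.0f/9.0f; omega = 1.6f;
```

The real printer instead hooks into sympy's code printing so that every generated literal gets the suffix at the source.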
|
|
|
|
An interesting extension of the AA pattern. The main advantage of this is
that updating pointers in a control structure is much more elegant than
duplicating all function implementations as is required by the normal AA
pattern. For more details see [1].
Only works for the SOA layout.
On a pure memory access level this pattern is equivalent to the AA pattern.
The difference is how the memory locations are calculated (by pointer swap
& shift or by different indexing functions for odd and even time steps).
[1]: "An auto-vectorization friendly parallel lattice Boltzmann streaming
scheme for direct addressing" by Mohrhard et al. (2019)
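A toy 1D sketch (assumed setup, not boltzgen code) of the underlying equivalence between explicit streaming and a shifted base offset:

```python
N = 8   # cells on a 1D periodic toy lattice
c = 1   # discrete velocity of one population

def stream_copy(f):
    """Explicit streaming: copy every value to its downstream neighbour."""
    return [f[(x - c) % N] for x in range(N)]

class ShiftedView:
    """Implicit streaming: keep data in place, shift a base offset instead."""
    def __init__(self, data):
        self.data, self.offset = data, 0
    def stream(self):
        self.offset = (self.offset - c) % N  # one pointer shift per step
    def __getitem__(self, x):
        return self.data[(x + self.offset) % N]

f = list(range(N))
view = ShiftedView(list(f))
for _ in range(3):
    f = stream_copy(f)
    view.stream()
# Both views expose identical memory contents after any number of steps.
assert all(f[x] == view[x] for x in range(N))
```

The indexing-function formulation folds the same offset into the odd/even index calculation instead of mutating a pointer.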
|
|
|
|
Doesn't change the outcome but is more in line with how the rest of the
generated code looks.
|
|
|
|
|
|
|
|
Works well, but function naming is getting kind of clunky, e.g. "velocity_momenta_boundary_tick_cells".
This could be hidden to a degree by providing branching wrappers for
the odd and even time step implementations. However, this would not
vectorize when targeting Intel via OpenCL.
|
|
|
|
|
|
Note that special care has to be taken to provide ghost cells around
active cells so the algorithm has somewhere to stream to and from.
This is also the case for the AB pattern, but there they only have to
be equilibrized once instead of after every other time step.
Even when such an equilibrization is performed, there is still a
potential bug, as inbound populations at the outer boundary are never
streamed to (this is not a problem for AB using pull-only streaming).
A vectorizable solution may require direction-specific ghost cell
equilibrization.
|
|
|
|
This should allow for plugging in e.g. an AA pattern implementation
without touching any file but `AA.$target.mako`.
OpenCL and C++ target templates now look basically the same and could
potentially be merged. However this would decrease flexibility should
more differences appear in the future. Maintaining separate template
files is an acceptable overhead to preserve flexibility.
|
|
|
|
|
|
|
|
This paves the way for dropping in other LBM collision models.
As a side benefit, the default momenta calculation is now fully inlined where possible.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
SOA and AOS should not be target-specific; neighbor offset calculation and
the bijection between gid and cell coordinates should be customizable.
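A possible shape for such a customizable bijection, sketched in Python with hypothetical names and a toy lattice extent:

```python
# Hypothetical lattice extent (ghost layer included); names are illustrative.
NX, NY = 6, 4

def gid_of_cell(x, y):
    """Row-major bijection from cell coordinates to a flat thread/global id."""
    return y * NX + x

def cell_of_gid(gid):
    """Inverse mapping, recovering the cell coordinates from the flat id."""
    return gid % NX, gid // NX

def soa_index(q, gid):
    """SOA layout: population q of cell gid lives in the q-th contiguous slab."""
    return q * NX * NY + gid

# The two coordinate functions are mutually inverse over the whole lattice.
assert all(gid_of_cell(*cell_of_gid(g)) == g for g in range(NX * NY))
```

Swapping in e.g. a Morton-order bijection would then only require replacing these two functions, leaving the layout code untouched.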
|
|
|
|
|
|
No guarantee of correctness - I mostly fiddled this together in order
to use common nixpkgs Python package functions for including boltzgen
in other shell environments.
|
|
Requires different function naming, as OpenCL 1.2 doesn't support overloads.
The OpenCL kernel code generated using this commit was successfully tested
on an actual GPU. Time to set up some automatic validation.
|
|
|
|
|
|
It is more flexible to place OpenCL thread ID dependent dispatching in a separate function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Selection of the desired templates is possible via a new `functions` parameter.
|
|
Yields ~160 MLUPs on a Xeon E3-1241 for a D2Q9 double precision lid driven cavity.
Obviously not anywhere near what is possible on GPUs, but respectable for a CPU
implementation - especially considering how simple it is.
|
|
|
|
|
|
|