boltzgen - Symbolic generation of LBM kernels

Age	Commit message	Author
2019-11-10	Implement basic CUDA target	Adrian Kummerlaender
	Currently only for the SSS streaming pattern. `CudaCodePrinter` in `utility/printer.py` is required to add an 'f' suffix to all single precision floating point literals. Without this suffix, most calculations happen in double precision even when targeting single precision, which destroys performance. (In OpenCL this is not necessary, as we can simply set the `-cl-single-precision-constant` flag. Sadly, such a flag doesn't seem to exist for nvcc.)
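	To illustrate the literal suffixing described above, here is a minimal sketch that post-processes generated source text with a regular expression. It is not how `CudaCodePrinter` is implemented; the function name `add_single_precision_suffix` and the pattern are assumptions made for this example, and only plain decimal literals are handled.

```python
import re

# Hypothetical pattern: decimal floating point literals such as 1.0, .5, 2.
# or 1e-3 that are neither part of an identifier nor already suffixed.
FLOAT_LITERAL = re.compile(r"""
    (?<![\w.])                                # no identifier char or dot before
    (
        (?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?   # literal with a decimal point
      | \d+[eE][+-]?\d+                       # literal with an exponent only
    )
    (?![\w.])                                 # no suffix or identifier char after
""", re.VERBOSE)


def add_single_precision_suffix(source):
    """Append 'f' to every unsuffixed floating point literal in `source`."""
    return FLOAT_LITERAL.sub(r"\1f", source)


if __name__ == "__main__":
    print(add_single_precision_suffix("x = 0.5*y + 1.0;"))  # x = 0.5f*y + 1.0f;
```

	Integer literals such as array indices are left alone, since only literals with a decimal point or an exponent are matched.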
2019-11-09	Implement basic version of the SSS pattern for C++ target	Adrian Kummerlaender
	An interesting extension of the AA pattern. Its main advantage is that updating pointers in a control structure is much more elegant than duplicating all function implementations, as the normal AA pattern requires. For more details see [1]. Only works for the SOA layout. On a pure memory access level this pattern is equivalent to the AA pattern. The difference lies in how the memory locations are calculated: by pointer swap & shift (SSS) or by different indexing functions for odd and even time steps (AA).
	[1]: "An auto-vectorization friendly parallel lattice Boltzmann streaming scheme for direct addressing" by Mohrhard et al. (2019)
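	The pointer swap & shift idea can be sketched on a periodic D1Q3 toy lattice. This is an illustration under assumptions, not the generated C++ code: the class `SSSLattice`, the toy `collide` function and the emulation of pointers as (array, shift) pairs are all made up for this example.

```python
# D1Q3 toy lattice: rest, right-moving and left-moving populations.
C = [0, 1, -1]     # discrete velocities
OPP = [0, 2, 1]    # index of the opposite direction
Q = len(C)


def collide(pop):
    # Toy relaxation towards the local mean, standing in for a real collision.
    mean = sum(pop) / len(pop)
    return [p + 0.5 * (mean - p) for p in pop]


class SSSLattice:
    def __init__(self, populations):
        # populations[i][x]: one array per direction (SOA layout)
        self.data = [list(column) for column in populations]
        self.n = len(self.data[0])
        # Control structure: one emulated pointer (array, shift) per direction.
        self.ptr = [(i, 0) for i in range(Q)]

    def _locate(self, i, x):
        array, shift = self.ptr[i]
        return array, (x + shift) % self.n  # periodic wrap-around

    def get(self, i, x):
        array, pos = self._locate(i, x)
        return self.data[array][pos]

    def step(self):
        # The same cell-local kernel is used in every time step:
        # read direction i, write the result to the opposite direction's slot.
        for x in range(self.n):
            pop = collide([self.get(i, x) for i in range(Q)])
            for i in range(Q):
                array, pos = self._locate(OPP[i], x)
                self.data[array][pos] = pop[i]
        # Streaming happens only here: swap opposite pointers and shift them.
        self.ptr = [(self.ptr[OPP[i]][0], self.ptr[OPP[i]][1] - C[i])
                    for i in range(Q)]
```

	In this sketch the pointers return to their initial configuration after every second step, which mirrors the two-step cycle of the AA pattern.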
2019-11-09	Add optional OpenGL interop helper function for OpenCL target	Adrian Kummerlaender

2019-11-08	Rename OpenCL cell list wrapper functions	Adrian Kummerlaender

2019-11-05	Implement AA pattern for OpenCL target	Adrian Kummerlaender
	Works well, but function naming is getting kind of clunky, e.g. "velocity_momenta_boundary_tick_cells". This could be hidden to a degree by providing branching wrappers for the odd and even time step implementations. However, this would not vectorize when targeting Intel via OpenCL.
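	A minimal sketch of the odd/even split and of such a branching wrapper, again on a periodic D1Q3 toy lattice so that no ghost cells are needed. The names `aa_even`, `aa_odd`, `aa_step` and the toy `collide` are hypothetical and not taken from the generated OpenCL code; `ab_step` is only a double-buffered reference to compare against.

```python
C = [0, 1, -1]     # discrete velocities of a D1Q3 toy lattice
OPP = [0, 2, 1]    # index of the opposite direction
Q = len(C)


def collide(pop):
    # Toy relaxation towards the local mean, standing in for a real collision.
    mean = sum(pop) / len(pop)
    return [p + 0.5 * (mean - p) for p in pop]


def aa_even(f):
    # Even step: purely local, results are stored in the opposite slots.
    for x in range(len(f[0])):
        pop = collide([f[i][x] for i in range(Q)])
        for i in range(Q):
            f[OPP[i]][x] = pop[i]


def aa_odd(f):
    # Odd step: read from and write to the neighbors, restoring the layout.
    n = len(f[0])
    for x in range(n):
        pop = collide([f[OPP[i]][(x - C[i]) % n] for i in range(Q)])
        for i in range(Q):
            f[i][(x + C[i]) % n] = pop[i]


def aa_step(f, time_step):
    # Branching wrapper hiding the two implementations behind one name.
    if time_step % 2 == 0:
        aa_even(f)
    else:
        aa_odd(f)


def ab_step(f):
    # Reference AB pattern: pull streaming into a second buffer.
    n = len(f[0])
    post = [collide([f[i][x] for i in range(Q)]) for x in range(n)]
    return [[post[(x - C[i]) % n][i] for x in range(n)] for i in range(Q)]
```

	After an even number of steps the single AA array holds the same values as the AB reference. After an odd number of steps the populations are stored in swapped, unstreamed slots and cannot be read directly.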
2019-11-05	Implement AA pattern for C++ target	Adrian Kummerlaender
	Note that special care has to be taken to provide ghost cells around the active cells so the algorithm has somewhere to stream to and from. This is also the case for the AB pattern, but there they only have to be equilibrated once instead of after every other time step. Even when such an equilibration is performed, there is still a potential bug, as inbound populations at the outer boundary are never streamed to (this is not a problem for AB using pull-only streaming). A vectorizable solution may require direction-specific ghost cell equilibration.
2019-11-04	Drop AB suffix from streaming pattern definition names	Adrian Kummerlaender

2019-11-04	Extract streaming pattern into Mako definitions	Adrian Kummerlaender
	This should allow for plugging in e.g. an AA pattern implementation without touching any file but `AA.$target.mako`. OpenCL and C++ target templates now look basically the same and could potentially be merged. However, this would decrease flexibility should more differences appear in the future. Maintaining separate template files is an acceptable overhead to preserve flexibility.