path: root/boltzgen/kernel/template
Age | Commit message | Author
2020-02-02 | Implement basic multi-cuboid communication for CUDA target [HEAD, master] | Adrian Kummerlaender
2020-02-02 | Rename 'collide_and_stream' to 'collide' | Adrian Kummerlaender
Streaming now happens only implicitly, through the selected propagation pattern.
2020-01-17 | Implement AA for CUDA target | Adrian Kummerlaender
2020-01-11 | Implement SSS for OpenCL target | Adrian Kummerlaender
Sadly, OpenCL kernels don't accept pointer-to-pointer arguments, which complicates the control structure implementation. A workaround is to cast the pointers to `uintptr_t`, which is guaranteed to be large enough to hold any pointer on the device. Special care has to be taken to always perform the pointer shifts on actual floating point pointers and not on untyped pointers.
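The pitfall in the last sentence can be illustrated from Python with `ctypes`, treating a raw address as the integer a `uintptr_t` argument would carry. This is a hypothetical host-side sketch, not the generated kernel code, and `shift_typed` / `shift_untyped` are invented names. Shifting the integer itself moves by bytes, whereas shifting a `float*` moves by whole elements.

```python
import ctypes

# Hypothetical illustration: a pointer passed around as a plain integer,
# the role `uintptr_t` plays in the OpenCL kernel signature.
buf = (ctypes.c_float * 4)(0.0, 1.0, 2.0, 3.0)
addr = ctypes.addressof(buf)  # integer address, analogous to a uintptr_t value


def shift_typed(address, n, ctype=ctypes.c_float):
    """Shift by n elements, like `float* p; p += n;`."""
    return address + n * ctypes.sizeof(ctype)


def shift_untyped(address, n):
    """Shift by n bytes, which is what shifting the integer itself does."""
    return address + n


# Reading through a typed view after a typed shift finds the intended element.
third = ctypes.c_float.from_address(shift_typed(addr, 2)).value
print(third)  # 2.0

# The untyped shift lands only 2 bytes in, i.e. inside buf[0] (4-byte floats).
print(shift_untyped(addr, 2) - addr, shift_typed(addr, 2) - addr)  # 2 8
```

Converting the integer back to a `float*` before shifting, as the commit message describes, corresponds to `shift_typed` here.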
2019-11-12 | Match OpenCL and CUDA cell list dispatch templates | Adrian Kummerlaender
2019-11-10 | Implement basic CUDA target | Adrian Kummerlaender
Currently only for the SSS streaming pattern. The `CudaCodePrinter` in `utility/printer.py` is needed to add an 'f' suffix to all single precision floating point literals. If this is not done (when targeting single precision), most calculations happen in double precision, which destroys performance. (In OpenCL this is not necessary, as we can simply set the `-cl-single-precision-constant` flag. Sadly, such a flag doesn't seem to exist for nvcc.)
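The following hypothetical helper illustrates the problem. It is not the actual `CudaCodePrinter`, which works inside the SymPy printer. Instead, it post-processes already printed C code with a regular expression and appends `f` to every floating point literal. Without the suffix, a literal such as `1.5` is a `double` in C/CUDA, so `1.5*u_x` with a `float u_x` is promoted to double precision.

```python
import re

# Hypothetical post-processing helper (not the CudaCodePrinter from the commit):
# append an 'f' suffix to every floating point literal in generated C/CUDA code.
FLOAT_LITERAL = re.compile(
    r"(?<![\w.])"                              # not inside an identifier or number
    r"((?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?"    # 1.0, 1., .5, 1.5e-3
    r"|\d+[eE][+-]?\d+)"                       # 1e-3
    r"(?![\w.])"                               # not already suffixed, e.g. 2.5f
)


def add_float_suffix(code: str) -> str:
    """Return `code` with every floating point literal marked as single precision."""
    return FLOAT_LITERAL.sub(lambda m: m.group(1) + "f", code)


print(add_float_suffix("f_next[0] = 0.444444444444444*rho*(1.0 - 1.5*u_x*u_x);"))
# f_next[0] = 0.444444444444444f*rho*(1.0f - 1.5f*u_x*u_x);
```

Integer literals such as the `0` in `f_next[0]` are left untouched. Doing this inside the printer, as the commit does, is more robust than a regex, because the printer knows which expression nodes are numeric literals.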
2019-11-09 | Implement basic version of the SSS pattern for C++ target | Adrian Kummerlaender
An interesting extension of the AA pattern. Its main advantage is that updating pointers in a control structure is much more elegant than duplicating all function implementations, as the normal AA pattern requires. For more details see [1]. Only works for the SOA layout. On a pure memory access level this pattern is equivalent to the AA pattern. The difference is how the memory locations are calculated: SSS uses a pointer swap & shift, whereas AA uses different indexing functions for odd and even time steps.
[1]: Mohrhard et al. (2019), "An auto-vectorization friendly parallel lattice Boltzmann streaming scheme for direct addressing"
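The pointer swap & shift can be modelled in a few lines of Python. This is a hypothetical 1D sketch (D1Q3, SOA layout, names such as `SSSLattice` invented here), not the generated C++ code. It follows only the summary above, not every detail of [1]. Each step collides in place and writes every post-collision value into the opposite direction's slot. It then swaps and shifts the per-direction "pointers" (integer offsets into one buffer) so that the next read sees the streamed populations.

```python
# Hypothetical 1D sketch (D1Q3, SOA) of the swap & shift idea. Integer offsets
# into one flat buffer stand in for the per-direction float pointers.
C = [0, 1, -1]    # discrete velocity c_q of direction q
OPP = [0, 2, 1]   # index of the direction opposite to q


class SSSLattice:
    def __init__(self, n_cells, padding=1):
        self.n = n_cells
        block = n_cells + 2 * padding            # one padded block per direction
        self.mem = [0.0] * (block * len(C))
        self.ptr = [q * block + padding for q in range(len(C))]  # "pointer" to cell 0

    def get(self, q, k):
        return self.mem[self.ptr[q] + k]

    def set(self, q, k, value):
        self.mem[self.ptr[q] + k] = value

    def step(self, collide):
        for k in range(self.n):
            f = [self.get(q, k) for q in range(len(C))]
            f_post = collide(f)
            # Store f*_q(k) in the opposite direction's slot of the same cell.
            # Only the slots just read are written, so the update is in place
            # and independent of the order in which cells are processed.
            for q in range(len(C)):
                self.mem[self.ptr[OPP[q]] + k] = f_post[q]
        # Swap & shift: p'_q = p_opp(q) - c_q, so reading p'_q + k yields
        # f*_q(k - c_q), i.e. the streamed population.
        self.ptr = [self.ptr[OPP[q]] - C[q] for q in range(len(C))]


lattice = SSSLattice(8)
lattice.set(1, 2, 1.0)   # right-moving population at cell 2
lattice.set(2, 5, 1.0)   # left-moving population at cell 5
for _ in range(2):
    lattice.step(lambda f: f)  # identity "collision": pure streaming
print(lattice.get(1, 4), lattice.get(2, 3))  # 1.0 1.0
```

In this sketch the pointers return to their initial values after every second step. This is why a single padding cell per side suffices for velocities with |c| <= 1.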
2019-11-09 | Add optional OpenGL interop helper function for OpenCL target | Adrian Kummerlaender
2019-11-09 | Mark equilibrilize, momenta result values as const | Adrian Kummerlaender
This doesn't change the outcome but is more in line with how the rest of the generated code looks.
2019-11-08 | Rename OpenCL cell list wrapper functions | Adrian Kummerlaender
2019-11-05 | Implement AA pattern for OpenCL target | Adrian Kummerlaender
Works well, but function naming is getting kind of clunky, e.g. "velocity_momenta_boundary_tick_cells". This could be hidden to a degree by providing branching wrappers for the odd and even time step implementations. However, such wrappers would not vectorize when targeting Intel via OpenCL.
2019-11-05 | Fix OpenCL vector indexing | Adrian Kummerlaender
2019-11-05 | Implement AA pattern for C++ target | Adrian Kummerlaender
Note that special care has to be taken to provide ghost cells around the active cells, so that the algorithm has somewhere to stream to and from. This is also the case for the AB pattern, but there the ghost cells only have to be equilibrilized once instead of after every other time step. Even when such an equilibrilization is performed, there is still a potential bug, because inbound populations at the outer boundary are never streamed to (this is not a problem for AB using pull-only streaming). A vectorizable solution may require direction-specific ghost cell equilibrilization.
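Here is a minimal Python sketch of the two AA time steps. It assumes a 1D D1Q3 lattice stored direction-major with one ghost cell per side, and all names are hypothetical. The even step only touches the cell itself. The odd step reads from and writes to neighboring cells, so the outermost active cells reach into the ghost layer. `aa_step` is the kind of parity-branching wrapper mentioned for the OpenCL target.

```python
# Hypothetical 1D sketch (D1Q3) of the AA pattern on a single array with one
# ghost cell on each side: cells 1..n are active, cells 0 and n+1 are ghosts.
C = [0, 1, -1]    # discrete velocity c_q
OPP = [0, 2, 1]   # opposite direction of q


def aa_even(a, n, collide):
    # Even step: purely local. Read f_q(k), write f*_q(k) to the opposite slot.
    for k in range(1, n + 1):
        f_post = collide([a[q][k] for q in range(3)])
        for q in range(3):
            a[OPP[q]][k] = f_post[q]


def aa_odd(a, n, collide):
    # Odd step: read from and write to neighbors. For k = 1 and k = n this
    # touches the ghost cells 0 and n+1, which is why they must exist.
    for k in range(1, n + 1):
        f_post = collide([a[OPP[q]][k - C[q]] for q in range(3)])
        for q in range(3):
            a[q][k + C[q]] = f_post[q]


def aa_step(a, n, t, collide):
    # Branching wrapper hiding the two implementations behind a single call.
    (aa_even if t % 2 == 0 else aa_odd)(a, n, collide)


n = 6
a = [[0.0] * (n + 2) for _ in range(3)]
a[1][2] = 1.0   # right-moving population at cell 2
a[2][5] = 1.0   # left-moving population at cell 5
for t in range(2):
    aa_step(a, n, t, lambda f: f)  # identity "collision": pure streaming
print(a[1][4], a[2][3])  # 1.0 1.0
```

After each odd step the populations are back in natural order (`a[q][k]` holds f_q(k)). In between, they are stored in the swapped slots, which is what the odd step's reversed indexing accounts for.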
2019-11-04 | Drop AB suffix from streaming pattern definition names | Adrian Kummerlaender
2019-11-04 | Extract streaming pattern into Mako definitions | Adrian Kummerlaender
This should allow plugging in e.g. an AA pattern implementation without touching any file but `AA.$target.mako`. The OpenCL and C++ target templates now look basically the same and could potentially be merged. However, this would decrease flexibility should more differences appear in the future. Maintaining separate template files is an acceptable overhead to preserve that flexibility.
2019-11-02 | Import `sympy.ccode` inside templates instead of as argument | Adrian Kummerlaender
2019-11-02 | Restructure LBM model / lattice distinction | Adrian Kummerlaender
2019-10-31 | Call symbolic generator inside code templates | Adrian Kummerlaender
This paves the way for dropping in other LBM collision models. As a side benefit, the default momenta calculation is now fully inlined where possible.
2019-10-30 | Move C++ example to boltzgen_examples repository | Adrian Kummerlaender
2019-10-29 | Don't try to reuse population layout for moment array indexing [v0.1.1] | Adrian Kummerlaender
2019-10-29 | Unify AOS, SOA specific cell preshift between targets | Adrian Kummerlaender
SOA and AOS should not be target specific. The neighbor offset calculation and the bijection between gid and cell coordinates should be customizable.
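To make the distinction concrete, here is a hypothetical set of Python helpers for a small 2D lattice. The names and the row-major ordering are assumptions. The helpers keep the cell-id bijection, the neighbor offset and the population layout as separate, swappable pieces. Only `aos_index` and `soa_index` depend on the memory layout, while `gid`, `cell` and `neighbor_offset` are shared by both.

```python
# Hypothetical 2D helpers (Q = 9 populations per cell, as in D2Q9) separating
# the gid <-> cell bijection, neighbor offsets and the population layout.
NX, NY, Q = 4, 3, 9


def gid(x, y):
    # Row-major bijection from cell coordinates to a linear cell id.
    return y * NX + x


def cell(g):
    # Inverse of gid().
    return g % NX, g // NX


def neighbor_offset(cx, cy):
    # Difference in gid between a cell and its neighbor in direction (cx, cy).
    # Only valid for cells whose neighbor lies inside the lattice (no wrap).
    return cy * NX + cx


def aos_index(g, q):
    # Array of structures: the Q populations of one cell are contiguous.
    return g * Q + q


def soa_index(g, q):
    # Structure of arrays: population q of all cells is contiguous.
    return q * NX * NY + g


print(aos_index(gid(1, 1), 2), soa_index(gid(1, 1), 2))  # 47 29
```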
2019-10-27 | Optionally generate cell-list-based OpenCL dispatch functions | Adrian Kummerlaender
This requires different function naming, as OpenCL 1.2 doesn't support overloads. The OpenCL kernel code generated using this commit was successfully tested on an actual GPU. Time to set up some automatic validation.
2019-10-27 | Accept cell id as parameter in OpenCL functions | Adrian Kummerlaender
It is more flexible to place thread-ID-dependent OpenCL dispatching in a separate function.
2019-10-27 | Add bounce back boundary condition | Adrian Kummerlaender
2019-10-27 | Use Mako defines to generate momenta boundaries | Adrian Kummerlaender
2019-10-27 | Separate functions into separate template files | Adrian Kummerlaender
Selection of the desired templates is possible via a new `functions` parameter.
2019-10-26 | Add extra toggle for OpenMP in C++ test function | Adrian Kummerlaender
Yields ~160 MLUPs on a Xeon E3-1241 for a D2Q9 double precision lid driven cavity. Obviously not anywhere near what is possible on GPUs, but respectable for a CPU implementation, especially considering how simple it is.
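MLUPs figures translate directly into memory traffic, which is a common way to judge LBM kernel performance. The following is a hedged back-of-the-envelope sketch with hypothetical helper names. It assumes that each D2Q9 cell update reads and writes each of its 9 double precision populations exactly once (144 bytes) and ignores caches and any additional arrays. Under that assumption, 160 MLUPs corresponds to roughly 23 GB/s of memory traffic.

```python
# Hypothetical helpers for interpreting lattice Boltzmann throughput figures.

def mlups(n_cells, n_steps, seconds):
    # Million lattice cell updates per second.
    return n_cells * n_steps / seconds / 1e6


def implied_bandwidth_gbs(mlups_value, q=9, bytes_per_value=8):
    # Minimal memory traffic per cell update: q reads and q writes of one
    # population each (cache effects and extra arrays are ignored).
    return mlups_value * 1e6 * 2 * q * bytes_per_value / 1e9


print(implied_bandwidth_gbs(160))  # 23.04 (D2Q9, double precision)
```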
2019-10-26 | Change C++ test function to LDC with optional VTK output | Adrian Kummerlaender
2019-10-26 | Generate primitive velocity momenta BC for C++ target | Adrian Kummerlaender
2019-10-26 | Support passing additional string arguments to the generator | Adrian Kummerlaender
2019-10-26 | Fix cpp test function | Adrian Kummerlaender
2019-10-26 | Generalize floating point precision argument | Adrian Kummerlaender
2019-10-24 | Extract offset helper into target and layout specific classes | Adrian Kummerlaender
2019-10-24 | Add test template for C++, enable switching between AOS and SOA | Adrian Kummerlaender
2019-10-23 | Some cleanup, add `collect_moments` to C++ template | Adrian Kummerlaender
2019-10-22 | Add basic Generator class | Adrian Kummerlaender
2019-10-21 | Pull in C++ template from symlbm_playground's standalone branch | Adrian Kummerlaender
2019-10-21 | Pull in basics from symlbm_playground | Adrian Kummerlaender
It's time to extract the generator part of my GPU LBM playground and turn it into a nice reusable library. The goal is to produce a framework that can be used to generate collision and streaming programs from symbolic descriptions. That is, it should be possible to select an LB model, the desired boundary conditions as well as a data structure / streaming model, and use this information to automatically generate matching OpenCL / CUDA / C++ programs.