Age | Commit message | Author | |
---|---|---|---|
2019-11-12 | Share Lattice implementation between plain and interop OpenCL example | Adrian Kummerlaender | |
2019-11-12 | Share boltzgen version between expressions | Adrian Kummerlaender | |
2019-11-12 | Export density, support indexing parameter in CUDA example | Adrian Kummerlaender | |
2019-11-11 | Use separate Nix environments for each target | Adrian Kummerlaender | |
2019-11-10 | Add LDC example for CUDA target | Adrian Kummerlaender | |
2019-11-09 | Fix SSS population padding | Adrian Kummerlaender | |
D2Q9 and D3Q27 worked only by coincidence; the padding should now work for all lattices. | |||
2019-11-09 | Add support for SSS pattern to C++ LDC example | Adrian Kummerlaender | |
2019-11-09 | Add OpenCL GL interop example | Adrian Kummerlaender | |
2019-11-09 | Adapt to upstream function rename | Adrian Kummerlaender | |
2019-11-06 | Add basic AA pattern support for OpenCL example | Adrian Kummerlaender | |
2019-11-05 | Add basic AA pattern support for C++ example | Adrian Kummerlaender | |
2019-11-04 | Adapt to upstream changes | Adrian Kummerlaender | |
2019-11-02 | Adapt to upstream changes | Adrian Kummerlaender | |
2019-10-30 | Rename folder, add basic README.md | Adrian Kummerlaender | |
2019-10-30 | Extract settings into config file, add documentation | Adrian Kummerlaender | |
2019-10-30 | Move C++ LDC template from upstream, improve build | Adrian Kummerlaender | |
2019-10-29 | Merge shell environments | Adrian Kummerlaender | |
2019-10-29 | Add example for C++ target | Adrian Kummerlaender | |
2019-10-29 | Move example into subfolder | Adrian Kummerlaender | |
2019-10-29 | Adapt to upstream changes | Adrian Kummerlaender | |
2019-10-29 | Fix double precision | Adrian Kummerlaender | |
2019-10-28 | Basic 2D LDC using boltzgen for kernel generation | Adrian Kummerlaender | |
Using cell lists as parameters for multiple non-branching kernels seems to reduce performance by ~50 MLUPS (for single-precision D2Q9). This might be alleviated by padding the cell lists to enable thread layout control or by improved kernel dispatching. On the upside, this OpenCL program runs not only on GPUs but is also vectorized on Intel CPUs, yielding about 180 MLUPS for single precision and, anticlimactically, 85 MLUPS for double precision on an i7-4790K. However, both of these values compare well to the performance of established CPU LBM codes. |
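
For readers unfamiliar with the metric quoted in the commit message above, the following is a minimal sketch of how MLUPS (million lattice updates per second) is commonly computed, together with one way a cell list length could be rounded up to a multiple of a work-group size. The function names `mlups` and `padded_length` and the example numbers are hypothetical and not taken from this repository; how the padding entries would be filled is left open.

```python
def mlups(cell_count, steps, seconds):
    """Million lattice updates per second for a timed run.

    cell_count: number of lattice cells updated per step
    steps:      number of time steps performed
    seconds:    measured wall-clock time of the run
    """
    return cell_count * steps / (seconds * 1e6)


def padded_length(cell_count, work_group_size):
    """Smallest multiple of work_group_size that is >= cell_count.

    Hypothetical helper: rounding a cell list up to such a length is one
    way to control the thread layout of a kernel dispatched over the list.
    """
    return -(-cell_count // work_group_size) * work_group_size


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this repository.
    print(mlups(512 * 512, 1000, 2.0))
    print(padded_length(1000, 32))
```

Under these assumptions, a 512x512 lattice advanced 1000 steps in two seconds corresponds to roughly 131 MLUPS.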