Age         Author                Commit message
2019-11-12  Adrian Kummerlaender  Share Lattice implementation between plain and interop OpenCL example
2019-11-12  Adrian Kummerlaender  Share boltzgen version between expressions
2019-11-12  Adrian Kummerlaender  Export density, support indexing parameter in CUDA example
2019-11-11  Adrian Kummerlaender  Use separate Nix environments for each target
2019-11-10  Adrian Kummerlaender  Add LDC example for CUDA target
2019-11-09  Adrian Kummerlaender  Fix SSS population padding
    D2Q9 and D3Q27 only worked by coincidence; padding should now work for all lattices.
2019-11-09  Adrian Kummerlaender  Add support for SSS pattern to C++ LDC example
2019-11-09  Adrian Kummerlaender  Add OpenCL GL interop example
2019-11-09  Adrian Kummerlaender  Adapt to upstream function rename
2019-11-06  Adrian Kummerlaender  Add basic AA pattern support for OpenCL example
2019-11-05  Adrian Kummerlaender  Add basic AA pattern support for C++ example
2019-11-04  Adrian Kummerlaender  Adapt to upstream changes
2019-11-02  Adrian Kummerlaender  Adapt to upstream changes
2019-10-30  Adrian Kummerlaender  Rename folder, add basic README.md
2019-10-30  Adrian Kummerlaender  Extract settings into config file, add documentation
2019-10-30  Adrian Kummerlaender  Move C++ LDC template from upstream, improve build
2019-10-29  Adrian Kummerlaender  Merge shell environments
2019-10-29  Adrian Kummerlaender  Add example for C++ target
2019-10-29  Adrian Kummerlaender  Move example into subfolder
2019-10-29  Adrian Kummerlaender  Adapt to upstream changes
2019-10-29  Adrian Kummerlaender  Fix double precision
2019-10-28  Adrian Kummerlaender  Basic 2D LDC using boltzgen for kernel generation
    Using cell lists as parameters for multiple non-branching kernels seems to reduce performance by ~50 MLUPS (million lattice updates per second, for single precision D2Q9). This might be alleviated by padding the cell lists to enable thread layout control or by improved kernel dispatching. On the upside, this OpenCL program not only runs on GPUs but is also vectorized on Intel CPUs, yielding about 180 MLUPS for single precision and, anticlimactically, 85 MLUPS for double precision on an i7-4790K. However, both values compare well with the performance of established CPU LBM codes.
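    The cell-list idea and the proposed padding fix can be illustrated with a small host-side sketch. It assumes LDC refers to the 2D lid-driven cavity and uses a hypothetical row-major layout in which the whole top row is the lid; `ldc_cell_lists`, `pad_to_multiple` and `mlups` are illustrative helpers, not part of boltzgen or the examples. Each list would feed its own non-branching kernel. Padding each list to a multiple of the work-group size matters because OpenCL 1.x requires the global work size to be evenly divisible by the local work size, so only padded lists let the host choose the local size explicitly.

    ```python
    import math


    def ldc_cell_lists(nx: int, ny: int) -> dict[str, list[int]]:
        """Partition a 2D lid-driven cavity into per-kernel cell lists.

        Hypothetical layout: row-major indices i = y * nx + x. The whole top
        row (y == ny - 1, corners included) is the moving lid, every other
        boundary cell is a no-slip wall, the rest is bulk fluid.
        """
        lists: dict[str, list[int]] = {"bulk": [], "wall": [], "lid": []}
        for y in range(ny):
            for x in range(nx):
                i = y * nx + x
                if y == ny - 1:
                    lists["lid"].append(i)
                elif x == 0 or x == nx - 1 or y == 0:
                    lists["wall"].append(i)
                else:
                    lists["bulk"].append(i)
        return lists


    def pad_to_multiple(cells: list[int], local_size: int, pad: int = -1) -> list[int]:
        """Append `pad` entries until len(cells) is a multiple of local_size.

        The kernel (not shown) would have to skip entries equal to `pad`;
        since padding sits at the end, only the last work-group diverges.
        """
        target = math.ceil(len(cells) / local_size) * local_size
        return cells + [pad] * (target - len(cells))


    def mlups(n_cells: int, n_steps: int, seconds: float) -> float:
        """Million lattice updates per second: cell updates / runtime / 1e6."""
        return n_cells * n_steps / seconds / 1e6


    if __name__ == "__main__":
        lists = ldc_cell_lists(nx=8, ny=6)
        for name, cells in lists.items():
            padded = pad_to_multiple(cells, local_size=32)
            print(name, len(cells), "->", len(padded))
        print(mlups(n_cells=500 * 400, n_steps=1000, seconds=1.0))
    ```

    On such a tiny grid the padding overhead is large (the 8-cell lid list grows to 32 entries), but for realistic lattice sizes the idle work-items are a small fraction of the total, which is why padding seems a plausible way to recover some of the lost MLUPS.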