Age | Commit message | Author | |
---|---|---|---|
2019-11-09 | Adapt to upstream function rename | Adrian Kummerlaender | |
2019-11-06 | Add basic AA pattern support for OpenCL example | Adrian Kummerlaender | |
2019-11-05 | Add basic AA pattern support for C++ example | Adrian Kummerlaender | |
2019-11-04 | Adapt to upstream changes | Adrian Kummerlaender | |
2019-11-02 | Adapt to upstream changes | Adrian Kummerlaender | |
2019-10-30 | Rename folder, add basic README.md | Adrian Kummerlaender | |
2019-10-30 | Extract settings into config file, add documentation | Adrian Kummerlaender | |
2019-10-30 | Move C++ LDC template from upstream, improve build | Adrian Kummerlaender | |
2019-10-29 | Merge shell environments | Adrian Kummerlaender | |
2019-10-29 | Add example for C++ target | Adrian Kummerlaender | |
2019-10-29 | Move example into subfolder | Adrian Kummerlaender | |
2019-10-29 | Adapt to upstream changes | Adrian Kummerlaender | |
2019-10-29 | Fix double precision | Adrian Kummerlaender | |
2019-10-28 | Basic 2D LDC using boltzgen for kernel generation | Adrian Kummerlaender | |
Using cell lists as parameters for multiple non-branching kernels seems to reduce performance by ~50 MLUPS (for single-precision D2Q9). This might be alleviated by padding the cell lists to enable thread layout control or by improved kernel dispatching. On the upside, this OpenCL program runs not only on GPUs but is also vectorized on Intel CPUs, yielding about 180 MLUPS for single precision and, anticlimactically, 85 MLUPS for double precision on an i7-4790K. However, both of these values compare well to the performance of established CPU LBM codes. |
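
For readers unfamiliar with the unit used above: MLUPS (million lattice updates per second) is conventionally the number of cell updates divided by the wall-clock time in microseconds. The sketch below only illustrates that arithmetic. The function name `mlups` and the lattice size, step count, and timing in the usage line are hypothetical and are not taken from this repository's measurements.

```python
def mlups(n_cells: int, n_steps: int, seconds: float) -> float:
    """Return million lattice updates per second.

    n_cells: number of lattice cells updated per time step (assumed constant)
    n_steps: number of time steps that were timed
    seconds: wall-clock duration of those steps
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_cells * n_steps / (seconds * 1e6)


if __name__ == "__main__":
    # Hypothetical run: a 512x512 lattice advanced 1000 steps in 1.5 s.
    print(f"{mlups(512 * 512, 1000, 1.5):.1f} MLUPS")
```

When timing an OpenCL program this way, the command queue should be finished before the clock is stopped, otherwise only the enqueue overhead is measured.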