Age | Commit message | Author | |
---|---|---|---|
2019-06-16 | Replace some explicit dimension branching | Adrian Kummerlaender | |
2019-06-15 | Split descriptors and symbolic formulation | Adrian Kummerlaender | |
2019-06-15 | Add support for generating a D3Q19 kernel | Adrian Kummerlaender | |
Note how this basically required no changes besides generalizing cell indexing and adding the symbolic formulation of a D3Q19 BGK collision step. Increasing the neighborhood communication from 9 to 19 cells leads to a significant performance "regression": the 3D kernel yields ~360 MLUPS compared to the 2D version's ~820 MLUPS. | |||
2019-06-15 | Consistently name population buffers | Adrian Kummerlaender | |
2019-06-14 | Extract geometry information | Adrian Kummerlaender | |
2019-06-13 | Further the separation between descriptor and lattice | Adrian Kummerlaender | |
2019-06-13 | Tidy up symbolic kernel generation | Adrian Kummerlaender | |
2019-06-13 | Add kernel customization point for velocity boundaries | Adrian Kummerlaender | |
2019-06-12 | Make it easier to exchange initial equilibration logic | Adrian Kummerlaender | |
2019-06-12 | Restructuring | Adrian Kummerlaender | |
2019-06-11 | Restore wrongly deleted file from 75d0088 | Adrian Kummerlaender | |
2019-06-11 | Remove initial vector field example | Adrian Kummerlaender | |
2019-06-09 | Fix relaxation time | Adrian Kummerlaender | |
2019-06-09 | Fix boundaries | Adrian Kummerlaender | |
2019-06-09 | Add periodic performance reporting | Adrian Kummerlaender | |
2019-06-08 | Performance optimizations | Adrian Kummerlaender | |
Starting point: ~200 MLUPS on an NVidia K2200. Changes that did not noticeably impact performance: memory layout AOS vs. SOA (weird, probably highly platform dependent); propagate on read; tagging pointers as read / write only; manual code inlining. Changes that made things worse: bad thread block sizes. The actual issue: hidden double precision computations. => Code now yields ~600 MLUPS | |||
2019-06-04 | Check whether hand-unrolling makes a difference | Adrian Kummerlaender | |
…it doesn't in this case. | |||
2019-05-31 | Try out various OpenCL work group sizes using a Jupyter notebook | Adrian Kummerlaender | |
This is actually quite nice for this kind of experimentation! | |||
2019-05-30 | Collapse SOA into single array | Adrian Kummerlaender | |
Weirdly, the expected performance gains from better-coalesced memory access are not achieved. | |||
2019-05-29 | Move to structure of arrays | Adrian Kummerlaender | |
2019-05-28 | Add const qualifiers for pointers | Adrian Kummerlaender | |
2019-05-28 | Pull streaming for local writes | Adrian Kummerlaender | |
2019-05-28 | Remove branch to enable vectorization on Intel | Adrian Kummerlaender | |
Twice the MLUPS! | |||
2019-05-27 | Add material numbers | Adrian Kummerlaender | |
2019-05-27 | Print some performance statistics | Adrian Kummerlaender | |
2019-05-26 | Add basic D2Q9 LBM | Adrian Kummerlaender | |
Ported the basic compustream structure |
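
Several of the messages above quote throughput in MLUPS (million lattice updates per second). The following is a minimal sketch of how such a figure is commonly derived from a timed run. The function name `mlups` and the sample numbers are illustrative assumptions, not taken from this repository.

```python
def mlups(cells: int, steps: int, seconds: float) -> float:
    """Million lattice (cell) updates per second for a timed run.

    cells   -- number of lattice cells updated per time step
    steps   -- number of time steps executed
    seconds -- wall-clock duration of those steps
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return cells * steps / (seconds * 1e6)


# Hypothetical run: a 512 x 512 lattice, 1000 steps, 0.5 s wall-clock time.
print(f"{mlups(512 * 512, 1000, 0.5):.1f} MLUPS")
```

Because the metric counts cell updates rather than bytes moved, a D3Q19 cell (19 populations) costs more memory traffic per update than a D2Q9 cell (9 populations), so a lower MLUPS figure for the 3D kernel is expected.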