Date | Commit message | Author | |
---|---|---|---|
2019-06-12 | Collect moments outside of the lattice class | Adrian Kummerlaender | |
2019-06-12 | Move kernel template into separate file | Adrian Kummerlaender | |
2019-06-12 | Allocate moments buffer only on device | Adrian Kummerlaender | |
2019-06-11 | Move equilibrization to kernel | Adrian Kummerlaender | |
2019-06-11 | Move D2Q9 codegen into separate file | Adrian Kummerlaender | |
2019-06-11 | Preshift population field pointer | Adrian Kummerlaender | |
| | Now averaging ~820 MLUPS again | | |
2019-06-11 | Statically resolve indices as far as possible | Adrian Kummerlaender | |
| | Interestingly, this seems to lose up to 10 MLUPS at first glance. On the other hand, such a small difference could also be a temporary load issue. | | |
2019-06-11 | Move index calculation to compile time | Adrian Kummerlaender | |
2019-06-11 | Templatize assignment loops | Adrian Kummerlaender | |
2019-06-11 | Start to use codegen for actual kernel generation | Adrian Kummerlaender | |
2019-06-10 | Improve plot generation | Adrian Kummerlaender | |
| | Only update the moment field when it is actually needed (=> ~825 MLUPS); defer plot generation until the actual simulation is done | | |
2019-06-10 | Reduce thread block size | Adrian Kummerlaender | |
| | => ~780 MLUPS | | |
2019-06-10 | Improve plot output | Adrian Kummerlaender | |
2019-06-10 | Add fixed velocity boundaries to generated LBM kernel | Adrian Kummerlaender | |
| | Interestingly, this increased performance to ~750 MLUPS, up from ~665 MLUPS. | | |
2019-06-09 | First test of partially generated LBM kernel | Adrian Kummerlaender | |
| | A kernel extracted from `lbn_codegen.ipynb` yields ~665 MLUPS, compared to the ~600 MLUPS produced by a manually optimized kernel. Note that this new kernel currently doesn't handle boundary conditions (although dropping in a density condition doesn't impact performance). | | |
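
The throughput figures quoted in these messages are in MLUPS (million lattice updates per second). The following is a minimal sketch that assumes the usual definition: lattice cells updated per time step, times the number of steps, divided by wall-clock seconds and by 10^6. The helper name `mlups` and the sample lattice size, step count, and runtime are hypothetical and are not taken from this repository.

```python
def mlups(nx: int, ny: int, steps: int, seconds: float) -> float:
    """Million lattice updates per second for an nx-by-ny lattice (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    # Every time step updates all nx * ny cells once.
    return nx * ny * steps / seconds / 1e6


# Hypothetical run: a 1024 x 1024 lattice advanced 1000 steps in 1.25 s.
print(f"{mlups(1024, 1024, 1000, 1.25):.1f} MLUPS")  # prints: 838.9 MLUPS
```

The metric counts cell updates only. It does not reflect how many populations each update reads and writes (nine per cell for D2Q9), so it is only directly comparable between runs that use the same velocity set.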