| Age | Commit message | Author | |
|---|---|---|---|
| 2019-06-12 | Initialize material numbers using given geometry function | Adrian Kummerlaender | |
| 2019-06-12 | Collect moments outside of the lattice class | Adrian Kummerlaender | |
| 2019-06-12 | Move kernel template into separate file | Adrian Kummerlaender | |
| 2019-06-12 | Allocate moments buffer only on device | Adrian Kummerlaender | |
| 2019-06-11 | Restore wrongly deleted file from 75d0088 | Adrian Kummerlaender | |
| 2019-06-11 | Move equilibrization to kernel | Adrian Kummerlaender | |
| 2019-06-11 | Move D2Q9 codegen into separate file | Adrian Kummerlaender | |
| 2019-06-11 | Preshift population field pointer | Adrian Kummerlaender | |
| Now averaging ~820 MLUPS again | |||
| 2019-06-11 | Statically resolve indices as far as possible | Adrian Kummerlaender | |
| Interestingly, this seems to lose up to 10 MLUPS at first glance. On the other hand, such a small difference could also be caused by temporary load. | |||
| 2019-06-11 | Move index calculation to compile time | Adrian Kummerlaender | |
| 2019-06-11 | Templatize assignment loops | Adrian Kummerlaender | |
| 2019-06-11 | Start to use codegen for actual kernel generation | Adrian Kummerlaender | |
| 2019-06-11 | Remove initial vector field example | Adrian Kummerlaender | |
| 2019-06-11 | Test generation of D3Q19 kernel code in notebook | Adrian Kummerlaender | |
| 2019-06-11 | Count operations | Adrian Kummerlaender | |
| 2019-06-11 | Restructure codegen notebook | Adrian Kummerlaender | |
| 2019-06-10 | Improve plot generation | Adrian Kummerlaender | |
| Only update moment field when it is actually needed. => ~825 MLUPS. Defer plot generation until the actual simulation is done. | |||
| 2019-06-10 | Reduce thread block size | Adrian Kummerlaender | |
| => ~780 MLUPS | |||
| 2019-06-10 | Improve plot output | Adrian Kummerlaender | |
| 2019-06-10 | Add fixed velocity boundaries to generated LBM kernel | Adrian Kummerlaender | |
| Interestingly, this increased performance from ~665 MLUPS to ~750 MLUPS. | |||
| 2019-06-09 | First test of partially generated LBM kernel | Adrian Kummerlaender | |
| A kernel extracted from `lbn_codegen.ipynb` yields ~665 MLUPS compared to the ~600 MLUPS produced by a manually optimized kernel. Note that this new kernel currently doesn't handle boundary conditions (but dropping in a density condition doesn't impact performance). | |||
| 2019-06-09 | Start tracking codegen notebook | Adrian Kummerlaender | |
| 2019-06-09 | Test lid driven cavity | Adrian Kummerlaender | |
| Notice that the indexing order of numpy arrays follows matrix conventions. | |||
| 2019-06-09 | Fix relaxation time | Adrian Kummerlaender | |
| 2019-06-09 | Fix boundaries | Adrian Kummerlaender | |
| 2019-06-09 | Add periodic performance reporting | Adrian Kummerlaender | |
| 2019-06-08 | Performance optimizations | Adrian Kummerlaender | |
| Starting point: ~200 MLUPS on an NVIDIA K2200. Changes that did not noticeably impact performance: memory layout AOS vs. SOA (weird, probably highly platform dependent); propagate on read; tagging pointers as read / write only; manual code inlining. Changes that made things worse: bad thread block sizes. The actual issue: hidden double precision computations. => Code now yields ~600 MLUPS | |||
| 2019-06-04 | Update notebook | Adrian Kummerlaender | |
| 2019-06-04 | Check whether hand-unrolling makes a difference | Adrian Kummerlaender | |
| …it doesn't in this case. | |||
| 2019-06-04 | Enable verbose OpenCL output | Adrian Kummerlaender | |
| 2019-05-31 | Try out various OpenCL work group sizes using a Jupyter notebook | Adrian Kummerlaender | |
| This is actually quite nice for this kind of experimentation! | |||
| 2019-05-30 | Collapse SOA into single array | Adrian Kummerlaender | |
| Weirdly, the expected performance gains from better coalescing of memory accesses are not achieved. | |||
| 2019-05-29 | Move to structure of arrays | Adrian Kummerlaender | |
| 2019-05-29 | Add Jupyter to nix-shell | Adrian Kummerlaender | |
| 2019-05-28 | Add const qualifiers for pointers | Adrian Kummerlaender | |
| 2019-05-28 | Pull streaming for local writes | Adrian Kummerlaender | |
| 2019-05-28 | Remove branch to enable vectorization on Intel | Adrian Kummerlaender | |
| Twice the MLUPS! | |||
| 2019-05-27 | Add material numbers | Adrian Kummerlaender | |
| 2019-05-27 | Print some performance statistics | Adrian Kummerlaender | |
| 2019-05-26 | Add basic D2Q9 LBM | Adrian Kummerlaender | |
| Ported the basic compustream structure | |||
| 2019-05-26 | Make config window togglable | Adrian Kummerlaender | |
| Tying the GTK loop into the OpenGL update loop works ok but is probably not how this should be done. | |||
| 2019-05-21 | Restrict to 2D | Adrian Kummerlaender | |
| i.e. recreate something more like computicle | |||
| 2019-05-20 | Change parameter window into dialog | Adrian Kummerlaender | |
| 2019-05-20 | Throw together basic UI for updating field function | Adrian Kummerlaender | |
| 2019-05-19 | Initial commit | Adrian Kummerlaender | |
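
The MLUPS figures quoted throughout the log are million lattice updates per second. A minimal sketch of how such a figure can be computed, assuming a 2D lattice of `nx * ny` cells advanced for `steps` time steps in `seconds` of wall-clock time. The function name `mlups` and the example values are hypothetical and not taken from the repository:

```python
def mlups(nx, ny, steps, seconds):
    """Return million lattice (cell) updates per second.

    nx, ny  : lattice extent in cells (assumed 2D)
    steps   : number of time steps that were simulated
    seconds : wall-clock time those steps took
    """
    return nx * ny * steps / (seconds * 1e6)

# Hypothetical measurement: 1024 x 1024 cells, 1000 steps, 2 seconds
print(mlups(1024, 1024, 1000, 2.0))
```

Measuring over many steps, rather than a single one, helps smooth out the kind of temporary load fluctuations mentioned in the log.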

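The lid driven cavity entry notes that numpy indexing follows matrix conventions: the first index selects the row and the second the column. A small sketch of what that means for a 2D field, assuming the x axis runs along columns and the y axis along rows. The names `nx`, `ny`, `field` and `cell` are hypothetical:

```python
import numpy as np

nx, ny = 4, 3  # hypothetical lattice extent: 4 cells in x, 3 cells in y

# A 2D array has shape (rows, columns), i.e. (ny, nx) under this assumption
field = np.arange(nx * ny).reshape(ny, nx)

def cell(field, x, y):
    """Look up the value at position (x, y): the row index (y) comes first."""
    return field[y, x]

# In the default row-major layout, cell (x, y) sits at flat offset y * nx + x
print(field.shape)
print(cell(field, 3, 1), field.ravel()[1 * nx + 3])
```

Swapping the two indices silently transposes the field, which is easy to miss on square lattices.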