Age | Commit message | Author | |
---|---|---|---|
2019-06-25 | Adapt benchmark results format to be importable | Adrian Kummerlaender | |
2019-06-25 | Fix LDC 3D x-z-plane plot | Adrian Kummerlaender | |
2019-06-25 | Add raw data of Tesla P100 benchmarks | Adrian Kummerlaender | |
2019-06-24 | Add basic benchmark scripts, K2200 results | Adrian Kummerlaender | |
2019-06-22 | Add interactive 2D LDC notebook, fix material initialization | Adrian Kummerlaender | |
2019-06-22 | Add platform, precision and thread layout parameters | Adrian Kummerlaender | |
2019-06-22 | Extract parameters in GL interop example | Adrian Kummerlaender | |
2019-06-21 | Gather interop moments in a more generic manner | Adrian Kummerlaender | |
i.e. return unshifted moments in an implicitly ordered float4 array. Cell positions are reconstructed by a vertex shader, analogously to how it is done in compustream. | |||
2019-06-20 | Prototype OpenGL interoperation | Adrian Kummerlaender | |
2019-06-20 | Move back assignment | Adrian Kummerlaender | |
2019-06-18 | Expand square expressions | Adrian Kummerlaender | |
Yields another ~5-10 MLUPS in the simple D2Q9 example. Now averaging ~840 MLUPS for D2Q9 and ~400 MLUPS for D3Q19 on a K2200. | |||
2019-06-17 | Extract population offset | Adrian Kummerlaender | |
2019-06-17 | Add function for exporting moments as VTK files | Adrian Kummerlaender | |
2019-06-16 | Add PyEVTK to environment | Adrian Kummerlaender | |
2019-06-16 | Replace some explicit dimension branching | Adrian Kummerlaender | |
2019-06-16 | Select thread layout depending on the descriptor's characteristics | Adrian Kummerlaender | |
2019-06-16 | Declutter gid and offset calculation | Adrian Kummerlaender | |
2019-06-16 | Add D3Q27 descriptor | Adrian Kummerlaender | |
2019-06-15 | Split descriptors and symbolic formulation | Adrian Kummerlaender | |
2019-06-15 | Add support for generating a D3Q19 kernel | Adrian Kummerlaender | |
Note how this required basically no changes besides generalizing cell indexing and adding the symbolic formulation of a D3Q19 BGK collision step. Increasing the neighborhood communication from 9 to 19 cells leads to a significant performance "regression": the 3D kernel yields ~360 MLUPS compared to the 2D version's ~820 MLUPS. | |||
2019-06-15 | Start to record some benchmarks | Adrian Kummerlaender | |
2019-06-15 | Consistently name population buffers | Adrian Kummerlaender | |
2019-06-14 | Extract geometry information | Adrian Kummerlaender | |
2019-06-13 | Further the separation between descriptor and lattice | Adrian Kummerlaender | |
2019-06-13 | Tidy up symbolic kernel generation | Adrian Kummerlaender | |
2019-06-13 | Add JupyterLab to environment | Adrian Kummerlaender | |
2019-06-13 | Add kernel customization point for velocity boundaries | Adrian Kummerlaender | |
2019-06-12 | Port LDC example to new structure | Adrian Kummerlaender | |
2019-06-12 | Make it easier to exchange initial equilibration logic | Adrian Kummerlaender | |
2019-06-12 | Restructuring | Adrian Kummerlaender | |
2019-06-12 | Initialize material numbers using given geometry function | Adrian Kummerlaender | |
2019-06-12 | Collect moments outside of the lattice class | Adrian Kummerlaender | |
2019-06-12 | Move kernel template into separate file | Adrian Kummerlaender | |
2019-06-12 | Allocate moments buffer only on device | Adrian Kummerlaender | |
2019-06-11 | Restore wrongly deleted file from 75d0088 | Adrian Kummerlaender | |
2019-06-11 | Move equilibrization to kernel | Adrian Kummerlaender | |
2019-06-11 | Move D2Q9 codegen into separate file | Adrian Kummerlaender | |
2019-06-11 | Preshift population field pointer | Adrian Kummerlaender | |
Now averaging ~820 MLUPS again. | |||
2019-06-11 | Statically resolve indices as far as possible | Adrian Kummerlaender | |
Interestingly, this seems to lose up to 10 MLUPS at first glance. On the other hand, such a small difference could also be caused by temporary load. | |||
2019-06-11 | Move index calculation to compile time | Adrian Kummerlaender | |
2019-06-11 | Templatize assignment loops | Adrian Kummerlaender | |
2019-06-11 | Start to use codegen for actual kernel generation | Adrian Kummerlaender | |
2019-06-11 | Remove initial vector field example | Adrian Kummerlaender | |
2019-06-11 | Test generation of D3Q19 kernel code in notebook | Adrian Kummerlaender | |
2019-06-11 | Count operations | Adrian Kummerlaender | |
2019-06-11 | Restructure codegen notebook | Adrian Kummerlaender | |
2019-06-10 | Improve plot generation | Adrian Kummerlaender | |
Only update the moment field when it is actually needed (=> ~825 MLUPS); defer plot generation until the actual simulation is done. | |||
2019-06-10 | Reduce thread block size | Adrian Kummerlaender | |
=> ~780 MLUPS | |||
2019-06-10 | Improve plot output | Adrian Kummerlaender | |
2019-06-10 | Add fixed velocity boundaries to generated LBM kernel | Adrian Kummerlaender | |
Interestingly, this increased performance from ~665 MLUPS to ~750 MLUPS. | |||
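The "Gather interop moments" commit describes returning the moments as an implicitly ordered float4 array and reconstructing cell positions in a vertex shader. A minimal Python sketch of that index arithmetic follows. It assumes, without confirmation from the log, that cells are laid out with x varying fastest, then y, then z, and that each float4 holds the density followed by the velocity components. The names `cell_position` and `unpack_moments` are hypothetical.

```python
def cell_position(gid, size):
    """Map a linear cell index to lattice coordinates.

    Assumed (hypothetical) layout, x varies fastest, then y, then z:
    gid = x + n_x * y (2D) or gid = x + n_x * (y + n_y * z) (3D).
    """
    if len(size) == 2:
        n_x, _ = size
        return (gid % n_x, gid // n_x)
    n_x, n_y, _ = size
    return (gid % n_x, (gid // n_x) % n_y, gid // (n_x * n_y))


def unpack_moments(moments, size):
    """Pair every float4 moment entry with the position implied by its index.

    Assumes (hypothetically) that each entry holds (rho, u_x, u_y, u_z).
    """
    return [(cell_position(gid, size), tuple(m)) for gid, m in enumerate(moments)]
```

For example, `cell_position(7, (4, 3))` yields `(3, 1)`. Because the position is a pure function of the index, no coordinate buffer has to be shared alongside the moments. A vertex shader can apply the same arithmetic to its vertex index.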
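The D3Q19 commit attributes the drop from ~820 to ~360 MLUPS to the larger neighborhood. The back-of-the-envelope sketch below relates MLUPS to the implied memory bandwidth. It assumes a memory-bound kernel in single precision where each population is read once and written once per cell update, and it ignores extra traffic for material numbers or moments. The helper names are hypothetical.

```python
def bytes_per_cell_update(q, precision_bytes=4):
    """Memory traffic for one cell update, assuming every one of the q
    populations is read once and written once (single precision by default)."""
    return 2 * q * precision_bytes


def implied_bandwidth_gbs(mlups, q, precision_bytes=4):
    """Bandwidth in GB/s implied by a measured MLUPS figure.

    MLUPS = million lattice cell updates per second, so
    bytes/s = mlups * 1e6 * bytes_per_cell_update; dividing by 1e9 gives GB/s.
    """
    return mlups * bytes_per_cell_update(q, precision_bytes) / 1000.0


# Figures taken from the commit messages above (K2200, single precision assumed).
for name, q, mlups in [("D2Q9", 9, 820), ("D3Q19", 19, 360)]:
    print(f"{name}: {bytes_per_cell_update(q)} B/cell, "
          f"~{implied_bandwidth_gbs(mlups, q):.1f} GB/s")
```

Under these assumptions, ~820 MLUPS for D2Q9 implies about 59 GB/s and ~360 MLUPS for D3Q19 about 55 GB/s. Both variants move a similar amount of data per second. The slowdown therefore roughly tracks the 152/72 ≈ 2.1 ratio of bytes per cell update, which is consistent with the kernels being limited by memory bandwidth.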