Age | Commit message | Author
2019-06-29 | Implement layout and memory padding | Adrian Kummerlaender
There are at least two distinct areas where padding can be beneficial on a GPU:
1. Padding the global thread sizes to support specific thread layouts, e.g. (32,1) layouts require the global lattice width to be a multiple of 32.
2. Padding the memory layout at the lowest level to align memory accesses, i.e. some GPUs read memory in 128-byte chunks, so it is beneficial if the operations are aligned accordingly.
For lattice and thread layout sizes that are powers of two, these two padding areas are equivalent. However, when one operates on e.g. a (300,300) lattice using a (30,1) layout, padding to 128 bytes yields a performance improvement of about 10 MLUPS on a K2200.
Note that I am getting quite dissatisfied with how the Lattice class and its surroundings continue to accumulate parameters. The naming distinction between Geometry, Grid, Memory and Lattice is also not very intuitive.
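The two kinds of padding described in this commit message can be sketched as follows. This is a minimal illustration, not the project's implementation: the helper names `pad_to_multiple` and `padded_row_width` are hypothetical, and 4-byte single-precision values are assumed.

```python
def pad_to_multiple(n, multiple):
    """Round n up to the next multiple of `multiple` (hypothetical helper)."""
    return -(-n // multiple) * multiple


def padded_row_width(width, bytes_per_value=4, alignment_bytes=128):
    """Pad a row of `width` values so that every row starts on an
    `alignment_bytes` boundary (assumes 4-byte floats by default)."""
    values_per_chunk = alignment_bytes // bytes_per_value
    return pad_to_multiple(width, values_per_chunk)


# Thread padding: a (30,1) layout already divides a width of 300.
thread_width = pad_to_multiple(300, 30)   # 300

# Memory padding: 128 bytes hold 32 single-precision values,
# so each row of 300 values is stored with a stride of 320.
memory_width = padded_row_width(300)      # 320

print(thread_width, memory_width)
```

For a power-of-two case such as a width of 512 with a (32,1) layout, both helpers return 512, which matches the remark that the two padding areas then coincide.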
2019-06-28 | Move some common benchmark plots into helper functions | Adrian Kummerlaender
2019-06-27 | Add some benchmark plots | Adrian Kummerlaender
2019-06-25 | Adapt benchmark results format to be importable | Adrian Kummerlaender
2019-06-25 | Fix LDC 3D x-z-plane plot | Adrian Kummerlaender
2019-06-25 | Add raw data of Tesla P100 benchmarks | Adrian Kummerlaender
2019-06-24 | Add basic benchmark scripts, K2200 results | Adrian Kummerlaender
2019-06-22 | Add interactive 2D LDC notebook, fix material initialization | Adrian Kummerlaender
2019-06-22 | Add platform, precision and thread layout parameters | Adrian Kummerlaender
2019-06-22 | Extract parameters in GL interop example | Adrian Kummerlaender
2019-06-21 | Gather interop moments in a more generic manner | Adrian Kummerlaender
i.e. return unshifted moments in an implicitly ordered float4 array. Cell positions are reconstructed by a vertex shader analogously to how it is done in compustream.
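How a position can be recovered from an implicitly ordered array may be sketched as below. The actual ordering used by the project is not stated here, so this assumes a row-major 2D layout with x varying fastest. The function name `cell_position` is hypothetical, and the sketch is written in Python rather than shader code.

```python
def cell_position(index, width):
    """Recover the (x, y) position of a cell from its index in a flat array.

    Assumes row-major ordering with x varying fastest, which is an
    assumption of this sketch and not taken from the commit itself.
    """
    return (index % width, index // width)


# On a lattice of width 300, array element 301 is the second cell
# of the second row.
print(cell_position(301, 300))
```

Because the position follows from the index alone, the moments array does not need to store coordinates alongside the values.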
2019-06-20 | Prototype OpenGL interoperation | Adrian Kummerlaender
2019-06-20 | Move back assignment | Adrian Kummerlaender
2019-06-18 | Expand square expressions | Adrian Kummerlaender
Yields another ~5-10 MLUPS in the simple D2Q9 example. Now averaging at ~840 MLUPS for D2Q9 and ~400 MLUPS for D3Q19 on a K2200.
2019-06-17 | Extract population offset | Adrian Kummerlaender
2019-06-17 | Add function for exporting moments as VTK files | Adrian Kummerlaender
2019-06-16 | Add PyEVTK to environment | Adrian Kummerlaender
2019-06-16 | Replace some explicit dimension branching | Adrian Kummerlaender
2019-06-16 | Select thread layout depending on the descriptor's characteristics | Adrian Kummerlaender
2019-06-16 | Declutter gid and offset calculation | Adrian Kummerlaender
2019-06-16 | Add D3Q27 descriptor | Adrian Kummerlaender
2019-06-15 | Split descriptors and symbolic formulation | Adrian Kummerlaender
2019-06-15 | Add support for generating a D3Q19 kernel | Adrian Kummerlaender
Note how this basically required no changes besides generalizing cell indexing and adding the symbolic formulation of a D3Q19 BGK collision step. Increasing the neighborhood communication from 9 to 19 cells leads to a significant performance "regression": the 3D kernel yields ~360 MLUPS compared to the 2D version's ~820 MLUPS.
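For reference, MLUPS (million lattice updates per second) is conventionally the number of cell updates divided by the elapsed time in microseconds. The sketch below shows that arithmetic. The function name `mlups` and the input values are hypothetical and are not measurements from this repository.

```python
def mlups(cell_count, steps, seconds):
    """Million lattice updates per second for `steps` updates of
    `cell_count` cells completed in `seconds` of wall-clock time."""
    return cell_count * steps / (seconds * 1e6)


# Hypothetical input: a 1000x1000 lattice advanced by 100 steps in 0.5 s.
print(mlups(1000 * 1000, 100, 0.5))
```

The metric counts cell updates regardless of stencil size, so a D3Q19 update that touches 19 populations per cell and a D2Q9 update that touches 9 each count as one lattice update.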
2019-06-15 | Start to record some benchmarks | Adrian Kummerlaender
2019-06-15 | Consistently name population buffers | Adrian Kummerlaender
2019-06-14 | Extract geometry information | Adrian Kummerlaender
2019-06-13 | Further the separation between descriptor and lattice | Adrian Kummerlaender
2019-06-13 | Tidy up symbolic kernel generation | Adrian Kummerlaender
2019-06-13 | Add JupyterLab to environment | Adrian Kummerlaender
2019-06-13 | Add kernel customization point for velocity boundaries | Adrian Kummerlaender
2019-06-12 | Port LDC example to new structure | Adrian Kummerlaender
2019-06-12 | Make it easier to exchange initial equilibration logic | Adrian Kummerlaender
2019-06-12 | Restructuring | Adrian Kummerlaender
2019-06-12 | Initialize material numbers using given geometry function | Adrian Kummerlaender
2019-06-12 | Collect moments outside of the lattice class | Adrian Kummerlaender
2019-06-12 | Move kernel template into separate file | Adrian Kummerlaender
2019-06-12 | Allocate moments buffer only on device | Adrian Kummerlaender
2019-06-11 | Restore wrongly deleted file from 75d0088 | Adrian Kummerlaender
2019-06-11 | Move equilibrization to kernel | Adrian Kummerlaender
2019-06-11 | Move D2Q9 codegen into separate file | Adrian Kummerlaender
2019-06-11 | Preshift population field pointer | Adrian Kummerlaender
Now averaging ~820 MLUPS again
2019-06-11 | Statically resolve indices as far as possible | Adrian Kummerlaender
Interestingly this seems to lose up to 10 MLUPS at first glance. On the other hand such a small difference could also be a temporary load issue.
2019-06-11 | Move index calculation to compile time | Adrian Kummerlaender
2019-06-11 | Templatize assignment loops | Adrian Kummerlaender
2019-06-11 | Start to use codegen for actual kernel generation | Adrian Kummerlaender
2019-06-11 | Remove initial vector field example | Adrian Kummerlaender
2019-06-11 | Test generation of D3Q19 kernel code in notebook | Adrian Kummerlaender
2019-06-11 | Count operations | Adrian Kummerlaender
2019-06-11 | Restructure codegen notebook | Adrian Kummerlaender
2019-06-10 | Improve plot generation | Adrian Kummerlaender
* Only update moment field when it is actually needed
  * => ~825 MLUPS
* Defer plot generation until the actual simulation is done