Age | Commit message | Author
2019-06-08 | Performance optimizations | Adrian Kummerlaender
Starting point: ~200 MLUPS on an NVIDIA K2200

Changes that did not noticeably impact performance:
* Memory layout AOS vs. SOA (weird, probably highly platform dependent)
* Propagate on read
* Tagging pointers as read / write only
* Manual code inlining

Changes that made things worse:
* Bad thread block sizes

The actual issue:
* Hidden double precision computations

=> Code now yields ~600 MLUPS
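A common source of hidden double precision in OpenCL C is an unsuffixed floating-point literal: as in C99, `1.0` has type `double` while `1.0f` has type `float`, so a single such literal can promote a whole `float` expression. The commit does not say which construct was responsible here, so the following is only a sketch of how one might search for such literals, assuming the kernel source is available as a Python string. The helper `find_double_literals` and the sample kernel are hypothetical and not taken from the repository.

```python
import re

# Floating-point literal without an f/F suffix, e.g. 1.0, .5, 2., 1e-3.
# This regex is a heuristic, not a C parser (block comments, strings and
# hexadecimal floats are not handled).
_DOUBLE_LITERAL = re.compile(
    r"(?<![\w.])"                       # not inside an identifier or number
    r"(?:\d+\.\d*(?:[eE][+-]?\d+)?"     # 1.0   2.   1.5e3
    r"|\.\d+(?:[eE][+-]?\d+)?"          # .5    .5e-2
    r"|\d+[eE][+-]?\d+)"                # 1e-3
    r"(?![\w.])"                        # no suffix and no further digits
)


def find_double_literals(source):
    """Return (line number, literal) pairs for unsuffixed float literals."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore line comments
        for match in _DOUBLE_LITERAL.finditer(code):
            hits.append((lineno, match.group()))
    return hits


# Hypothetical kernel fragment used only to exercise the helper
kernel = """
__kernel void collide(__global float* f) {
    const float omega = 1.0 / 0.6f;   // 1.0 is a double literal
    f[0] = f[0] * (1.0f - omega);
}
"""

print(find_double_literals(kernel))  # prints [(3, '1.0')]
```

Besides adding the `f` suffix, OpenCL also defines the build option `-cl-single-precision-constant`, which treats double precision constants as single precision. Whether that option is appropriate depends on the kernel.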
2019-06-04 | Update notebook | Adrian Kummerlaender
2019-06-04 | Check whether hand-unrolling makes a difference | Adrian Kummerlaender
…it doesn't in this case.
2019-06-04 | Enable verbose OpenCL output | Adrian Kummerlaender
2019-05-31 | Try out various OpenCL work group sizes using a Jupyter notebook | Adrian Kummerlaender
This is actually quite nice for this kind of experimentation!
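The candidate sizes for such an experiment can be enumerated up front. This is a minimal sketch, assuming a 2D lattice and that each local size must divide the corresponding global size, as OpenCL 1.x requires. The function `work_group_candidates` and the default limit of 1024 work-items per group are illustrative assumptions; the real limit is device dependent.

```python
def divisors(n):
    """All positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def work_group_candidates(nx, ny, max_items=1024):
    """Local sizes (lx, ly) that tile an nx-by-ny grid without remainder."""
    return [
        (lx, ly)
        for lx in divisors(nx)
        for ly in divisors(ny)
        if lx * ly <= max_items  # respect the assumed per-group limit
    ]


# Small example grid so that the list stays readable
for size in work_group_candidates(32, 4, max_items=16):
    print(size)
```

Each returned pair could then be timed in a notebook cell and compared by its MLUPS figure.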
2019-05-30 | Collapse SOA into single array | Adrian Kummerlaender
Weirdly, the expected performance gain from better coalescing of memory accesses is not achieved.
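The layouts compared in these commits differ only in how a (cell, population) pair maps to a flat array index. The sketch below assumes 9 populations per cell (D2Q9); the function names are hypothetical and the repository may order its data differently.

```python
Q = 9  # populations per cell in D2Q9


def aos_index(cell, pop):
    """Array of structures: the Q populations of one cell are adjacent."""
    return cell * Q + pop


def soa_index(cell, pop, n_cells):
    """Structure of arrays in a single array: one block per population."""
    return pop * n_cells + cell


n_cells = 16
# Distance in memory between the same population of two neighbouring cells,
# i.e. what neighbouring work-items touch at the same time.
print(aos_index(1, 3) - aos_index(0, 3))                    # prints 9
print(soa_index(1, 3, n_cells) - soa_index(0, 3, n_cells))  # prints 1
```

The stride of 1 in the collapsed SOA layout is what usually makes coalesced access possible, which is why a speedup was expected.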
2019-05-29 | Move to structure of arrays | Adrian Kummerlaender
2019-05-29 | Add Jupyter to nix-shell | Adrian Kummerlaender
2019-05-28 | Add const qualifiers for pointers | Adrian Kummerlaender
2019-05-28 | Pull streaming for local writes | Adrian Kummerlaender
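In a pull scheme each cell gathers its populations from the neighbours they arrive from and writes only to itself, so all writes stay local. This is a minimal pure-Python sketch on a periodic grid, assuming a common D2Q9 velocity ordering (the ordering used in the repository is not shown in this log); `stream_pull` is a hypothetical name.

```python
# D2Q9 discrete velocities (assumed ordering: rest, axes, diagonals)
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]


def stream_pull(f, nx, ny):
    """Return streamed populations; f[i][y][x] is population i at (x, y)."""
    out = [[[0.0] * nx for _ in range(ny)] for _ in C]
    for i, (cx, cy) in enumerate(C):
        for y in range(ny):
            for x in range(nx):
                # Read from the upstream neighbour, write to the local cell.
                # The modulo makes the boundaries periodic.
                out[i][y][x] = f[i][(y - cy) % ny][(x - cx) % nx]
    return out


nx, ny = 4, 3
f = [[[0.0] * nx for _ in range(ny)] for _ in C]
f[1][1][0] = 1.0   # population moving in +x, placed at cell (0, 1)
g = stream_pull(f, nx, ny)
print(g[1][1][1])  # prints 1.0, the population arrived at cell (1, 1)
```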
2019-05-28 | Remove branch to enable vectorization on Intel | Adrian Kummerlaender
Twice the MLUPS!
2019-05-27 | Add material numbers | Adrian Kummerlaender
2019-05-27 | Print some performance statistics | Adrian Kummerlaender
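MLUPS (million lattice updates per second), the figure quoted throughout this log, is the number of cell updates divided by the elapsed time in microseconds. A small sketch follows; the function name and the sample numbers are illustrative and are not measurements from the repository.

```python
def mlups(n_cells, n_steps, seconds):
    """Million lattice updates per second."""
    return n_cells * n_steps / (seconds * 1e6)


# e.g. a 512 x 512 lattice advanced by 1000 steps in 2.0 seconds
print(mlups(512 * 512, 1000, 2.0))  # about 131 MLUPS
```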
2019-05-26 | Add basic D2Q9 LBM | Adrian Kummerlaender
Ported the basic compustream structure
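For reference, D2Q9 uses nine discrete velocities with the weights 4/9 (rest), 1/9 (axes) and 1/36 (diagonals). The sketch below evaluates the standard second-order BGK equilibrium with exact fractions; it is not taken from the commit, which may order or compute things differently, and all names are hypothetical.

```python
from fractions import Fraction

# Assumed ordering: rest, axes, diagonals
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [Fraction(4, 9)] + [Fraction(1, 9)] * 4 + [Fraction(1, 36)] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for density rho, velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu
                              + Fraction(9, 2) * cu * cu
                              - Fraction(3, 2) * usq))
    return feq


feq = equilibrium(Fraction(1), Fraction(1, 10), Fraction(0))
print(sum(feq))                                   # density, prints 1
print(sum(f * cx for f, (cx, _) in zip(feq, C)))  # x-momentum, prints 1/10
```

Recovering density and momentum exactly from the moments is a quick sanity check for any chosen velocity ordering.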
2019-05-26 | Make config window togglable | Adrian Kummerlaender
Tying the GTK loop into the OpenGL update loop works ok but is probably not how this should be done.
2019-05-21 | Restrict to 2D | Adrian Kummerlaender
i.e. recreate something more like computicle
2019-05-20 | Change parameter window into dialog | Adrian Kummerlaender
2019-05-20 | Throw together basic UI for updating field function | Adrian Kummerlaender
2019-05-19 | Initial commit | Adrian Kummerlaender