Performance optimizations
Starting point: ~200 MLUPS on an NVIDIA K2200
Changes that did not noticeably impact performance:
* Memory layout AoS vs. SoA (surprising, probably highly platform-dependent)
* Propagate on read
* Tagging pointers as read / write only
* Manual code inlining
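
For reference, the two memory layouts compared above differ only in how a (cell, population) pair maps to a flat buffer index. The following is a minimal sketch, not the code from this commit; the names `idx_aos`, `idx_soa`, `n_q` and `n_cells` are hypothetical.

```c
#include <stddef.h>

/* Array of structures: all n_q populations of one cell are adjacent,
   so neighbouring threads (cells) access memory with stride n_q. */
size_t idx_aos(size_t cell, size_t q, size_t n_q) {
    return cell * n_q + q;
}

/* Structure of arrays: population q of all cells is stored contiguously,
   so neighbouring threads (cells) access adjacent addresses. */
size_t idx_soa(size_t cell, size_t q, size_t n_cells) {
    return q * n_cells + cell;
}
```

With one thread per cell, the SoA form is the one usually expected to coalesce better, which is why the lack of a measurable difference here is noted as platform-dependent.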
Changes that made things worse:
* Bad thread block sizes
The actual issue:
* Hidden double precision computations
=> Code now yields ~600 MLUPS
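
A hidden double precision computation typically comes from an unsuffixed floating-point literal: in C, and likewise in OpenCL C on devices with double support, `0.5` has type `double`, so mixing it with `float` operands promotes the whole expression. The sketch below illustrates the mechanism in plain C; the relaxation functions and the constant 0.6 are hypothetical and not taken from this commit.

```c
#include <stddef.h>

/* Unsuffixed literals are double: the division and the multiply-add
   are evaluated in double precision, then truncated back to float. */
float relax_hidden_double(float f, float f_eq) {
    return f + (1.0 / 0.6) * (f_eq - f);
}

/* The f suffix keeps every operand, and thus the arithmetic, in float. */
float relax_single(float f, float f_eq) {
    return f + (1.0f / 0.6f) * (f_eq - f);
}

/* sizeof does not evaluate its operand; it only reports the type
   the compiler assigned to the expression. */
size_t width_unsuffixed(void) { return sizeof(1.0f * 0.5); }
size_t width_suffixed(void)   { return sizeof(1.0f * 0.5f); }
```

On GPUs with low double precision throughput, a single such literal in the inner loop can dominate the kernel runtime. Besides suffixing literals, OpenCL offers the build option `-cl-single-precision-constant`, which treats double precision constants as single precision.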
Diffstat (limited to 'inspect_opencl_layout.ipynb')
0 files changed, 0 insertions, 0 deletions