Performance optimizations
Starting point: ~200 MLUPS on an NVIDIA K2200
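For reference, MLUPS (million lattice updates per second) is the usual throughput figure for lattice Boltzmann codes. A minimal sketch of how such a number could be computed follows; the function name and the grid size, step count and runtime in the usage note are hypothetical and not taken from this commit.

```python
def mlups(cell_count, step_count, seconds):
    """Return throughput in million lattice updates per second.

    cell_count: number of lattice cells updated per time step
    step_count: number of time steps that were simulated
    seconds:    wall-clock time the steps took
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return cell_count * step_count / (seconds * 1e6)
```

With an assumed 1024 x 1024 lattice, 1000 steps in about 5.24 s would correspond to roughly 200 MLUPS, and the same run in about 1.75 s to roughly 600 MLUPS.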
Changes that did not noticeably impact performance:
* Memory layout AOS vs. SOA (weird, probably highly platform-dependent)
* Propagate on read
* Tagging pointers as read / write only
* Manual code inlining
Changes that made things worse:
* Bad thread block sizes
The actual issue:
* Hidden double precision computations
=> Code now yields ~600 MLUPS
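The commit message does not say where the double precision computations were hiding. One common source in OpenCL C is a floating-point literal without an `f` suffix, which has type `double` and promotes the surrounding expression. The following is a rough sketch, under that assumption, of a check that lists such literals in a kernel source string. The helper name `find_double_literals` is hypothetical, and the scan only strips `//` comments, so block comments and string contents are not handled.

```python
import re

# A floating-point literal (1.0, .5, 2., 1e-3, 1.5e3) that is NOT followed
# by an 'f'/'F' suffix. In OpenCL C such a literal has type double.
_DOUBLE_LITERAL = re.compile(
    r"""
    (?<![\w.])                                # not inside an identifier or number
    (?:
        (?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?   # literal with a decimal point
      | \d+[eE][+-]?\d+                       # literal with only an exponent
    )
    (?![\w.])                                 # no suffix or further digits follow
    """,
    re.VERBOSE,
)


def find_double_literals(kernel_source):
    """Return (line_number, literal) pairs for unsuffixed float literals."""
    hits = []
    for line_no, line in enumerate(kernel_source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore single-line comments
        for match in _DOUBLE_LITERAL.finditer(code):
            hits.append((line_no, match.group(0)))
    return hits
```

For example, `find_double_literals("float w = 1.0f / 9.0;")` reports the `9.0` on line 1 and leaves `1.0f` alone. Where suffixing every literal is impractical, the OpenCL build option `-cl-single-precision-constant` is another possible route; whether either applies to this code base is an assumption.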
Diffstat (limited to 'shell.nix')
-rw-r--r-- | shell.nix | 1 |
1 files changed, 1 insertions, 0 deletions
@@ -23,6 +23,7 @@ pkgs.stdenvNoCC.mkDerivation rec {
   local-python = custom-python.withPackages (python-packages: with python-packages; [
     numpy
+    sympy
     pyopencl
     pyopengl
     pygobject3