author Adrian Kummerlaender 2019-06-08 23:08:28 +0200
committer Adrian Kummerlaender 2019-06-08 23:08:28 +0200
commit 5ac924371a7e53641a2f726a9f431ab8cb99f9fb
tree 7e62a64ca253c8c18a85c2131b141125eb115b33 /result
parent 4a6b0bb928db91d57eaa09b656d296e79eafe7ed
Performance optimizations
Starting point: ~200 MLUPS (million lattice updates per second) on an NVIDIA K2200

Changes that did not noticeably impact performance:

* Memory layout AOS vs. SOA (weird, probably highly platform dependent)
* Propagate on read
* Tagging pointers as read / write only
* Manual code inlining

Changes that made things worse:

* Bad thread block sizes

The actual issue:

* Hidden double precision computations

=> Code now yields ~600 MLUPS
Diffstat (limited to 'result')
0 files changed, 0 insertions, 0 deletions