author: Adrian Kummerlaender, 2019-11-10 21:14:07 +0100
committer: Adrian Kummerlaender, 2019-11-10 21:18:57 +0100
commit: 4a2885ad3ae0396486d288df94339d0c45e6db8b
tree: 1a0b5aa000bbcde65fa020381a02b19bb452e284 /test.py
parent: d136bb30bc8a9393372ec905aea500a0b61000e3
Implement basic CUDA target
Currently only the SSS streaming pattern is supported. The CudaCodePrinter in `utility/printer.py` is required to add an `f` suffix to all single-precision floating-point literals. If this is not done when targeting single precision, most calculations happen in double precision, which destroys performance. (In OpenCL this is not necessary, as we can simply set the `-cl-single-precision-constant` flag. Sadly, such a flag doesn't seem to exist for nvcc.)
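To illustrate the literal problem described above, here is a minimal sketch of what suffixing does to generated source text. It is not the CudaCodePrinter from `utility/printer.py`, which this page does not show. It is a simplified stand-in that post-processes a string with a regular expression. The helper name `add_single_precision_suffix` and the sample kernel line are assumptions made for illustration only.

```python
import re

# Hypothetical stand-in for illustration; the actual CudaCodePrinter is not
# shown here and may work quite differently (e.g. at the expression level).
_FLOAT_LITERAL = re.compile(
    r"(?<![\w.])"                    # not part of an identifier or a longer number
    r"(\d+\.\d*(?:[eE][+-]?\d+)?"    # 1.0, 1., 1.0e-3
    r"|\.\d+(?:[eE][+-]?\d+)?"       # .5, .5e3
    r"|\d+[eE][+-]?\d+)"             # 1e-3
    r"(?![\w.])"                     # skip literals that already carry a suffix
)


def add_single_precision_suffix(source: str) -> str:
    """Append 'f' to every unsuffixed floating point literal in `source`.

    In CUDA C++ an unsuffixed literal such as 0.5 has type double, so an
    expression mixing it with float operands is evaluated in double precision.
    Integer literals (array indices and the like) are left untouched.
    """
    return _FLOAT_LITERAL.sub(r"\g<1>f", source)


# Assumed sample line, loosely shaped like a generated relaxation step.
line = "f_next[0] = f_curr[0] + 0.5*(1.0/9.0 - f_curr[0]);"
print(add_single_precision_suffix(line))
```

Text-level rewriting like this is fragile (it also touches literals inside comments or strings), which is one reason to handle the suffix inside the code printer itself, as the commit does.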
Diffstat (limited to 'test.py')
0 files changed, 0 insertions, 0 deletions