aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorAdrian Kummerlaender2019-07-10 00:02:24 +0200
committerAdrian Kummerlaender2019-07-10 00:04:44 +0200
commit4d9d76afd3f47ef50cfe636e1ae76b7233d971a7 (patch)
treee425dc86a98c2fc97b5288a1a157bca995ac3678
parenta0ffeb83e095fb819db25e26cb92021800839779 (diff)
downloadsymlbm_playground-4d9d76afd3f47ef50cfe636e1ae76b7233d971a7.tar
symlbm_playground-4d9d76afd3f47ef50cfe636e1ae76b7233d971a7.tar.gz
symlbm_playground-4d9d76afd3f47ef50cfe636e1ae76b7233d971a7.tar.bz2
symlbm_playground-4d9d76afd3f47ef50cfe636e1ae76b7233d971a7.tar.lz
symlbm_playground-4d9d76afd3f47ef50cfe636e1ae76b7233d971a7.tar.xz
symlbm_playground-4d9d76afd3f47ef50cfe636e1ae76b7233d971a7.tar.zst
symlbm_playground-4d9d76afd3f47ef50cfe636e1ae76b7233d971a7.zip
Add README.md
-rw-r--r--README.md50
1 files changed, 50 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6d0f8e5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,50 @@
+# Symbolically generated GPU-based LBM
+
+Experimental generation of OpenCL kernels using SymPy, Mako and PyOpenCL.
+
+* Implements a straight forward AB pattern
+* All memory offsets are statically resolved
+* Underlying symbolic formulation is optimized using CSE
+* Characteristic constants of D2Q9 and D3Q27 are transparently recovered using only discrete velocities
+
+## Performance
+
+Theoretical maximum performance on tested hardware:
+
+| GPU | Bandwidth | D2Q9 |   | D3Q19 |   | D3Q27 |   |
+| ------ | ----------- | ------ | ------ | ------ | ------ | ------ | ------ |
+|   |   | single | double | single | double | single | double |
+| K2200 | 63.2 GiB/s | 893 | 459 | 435 | 220 | 308 | 156 |
+| P100 | 512.6 GiB/s | 7242 | 3719 | 3528 | 1787 | 2502 | 1262 |
+
+### Maximum measured performance...
+
+| GPU | D2Q9 |   | D3Q19 |   | D3Q27 |   |
+| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
+|   | single | double | single | double | single | double |
+| K2200 | 843.4 | 326.4 | 423.2 | 163.8 | 303.0 | 116.0 |
+| P100 | 6957.4 | 3585.0 | 3420.2 | 1763.8 | 2374.6 | 1259.6 |
+
+### ...relative to theoretical maximum
+
+| GPU | D2Q9 |   | D3Q19 |   | D3Q27 |   |
+| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
+|   | single | double | single | double | single | double |
+| K2200 | 94.4% | 71.1% | 97.3% | 74.5% | 98.4% | 74.4% |
+| P100 | 96.1% | 96.4% | 96.9% | 98.7% | 94.9% | 99.8% |
+
+### CSE impact on P100
+
+| CSE | D2Q9 |   | D3Q19 |   | D3Q27 |   |
+| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
+|   | single | double | single | double | single | double |
+| No | 6957.4 | 2814.4 | 2581.8 | 998.8 | 1576.4 | 647.4 |
+| Yes | 6922.4 | 3585.0 | 3420.2 | 1763.8 | 2374.6 | 1259.6 |
+
+| CSE | D2Q9 |   | D3Q19 |   | D3Q27 |   |
+| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
+|   | single | double | single | double | single | double |
+| No | 96.1% | 75.7% | 73.2% | 55.9% | 63.0% | 51.3% |
+| Yes | 95.6% | 96.4% | 96.9% | 98.7% | 94.9% | 99.8% |
+
+For more details see the `results/` and `notebook/` directories.