summaryrefslogtreecommitdiff
path: root/lbm.org
diff options
context:
space:
mode:
authorAdrian Kummerlaender2021-05-17 00:15:33 +0200
committerAdrian Kummerlaender2021-05-17 00:15:33 +0200
commit4ec94c97879aafef15f7663135745e4ba61e62cf (patch)
tree322ae3f003892513f529842ff0b3fd100573b680 /lbm.org
downloadLiterateLB-4ec94c97879aafef15f7663135745e4ba61e62cf.tar
LiterateLB-4ec94c97879aafef15f7663135745e4ba61e62cf.tar.gz
LiterateLB-4ec94c97879aafef15f7663135745e4ba61e62cf.tar.bz2
LiterateLB-4ec94c97879aafef15f7663135745e4ba61e62cf.tar.lz
LiterateLB-4ec94c97879aafef15f7663135745e4ba61e62cf.tar.xz
LiterateLB-4ec94c97879aafef15f7663135745e4ba61e62cf.tar.zst
LiterateLB-4ec94c97879aafef15f7663135745e4ba61e62cf.zip
Extract first public LiterateLB version
Diffstat (limited to 'lbm.org')
-rw-r--r--lbm.org5817
1 files changed, 5817 insertions, 0 deletions
diff --git a/lbm.org b/lbm.org
new file mode 100644
index 0000000..6df9a50
--- /dev/null
+++ b/lbm.org
@@ -0,0 +1,5817 @@
+#+TITLE: A literate Lattice Boltzmann Code
+#+SUBTITLE: [[https://kummerlaender.eu][Adrian Kummerländer]]
+#+STARTUP: latexpreview
+#+HTML_DOCTYPE: html5
+#+OPTIONS: toc:nil html-postamble:nil html5-fancy:t html-style:nil
+#+PROPERTY: header-args :exports both :mkdirp yes :noweb no-export :eval no-export
+#+PROPERTY: header-args:python+ :var lattice="D2Q9"
+#+PROPERTY: header-args:cpp+ :main no :eval no
+#+HTML_MATHJAX: path:"https://static.kummerlaender.eu/mathjax/MathJax.js?config=TeX-AMS_HTML"
+
+#+HTML_HEAD: <link rel="stylesheet" type="text/css" href="org.css"/>
+
+#+BEGIN_ABSTRACT
+This file describes a full Lattice Boltzmann code featuring both 2D and 3D lattices, a workable selection of boundary conditions, Smagorinsky
+turbulence modelling, expression-level code optimization and even a full ray marcher for just-in-time volumetric visualization in addition
+to a set of interesting examples. All of this runs on GPUs using CUDA near the maximum possible performance on that platform.
+
+*This document is a [[*Open tasks][work in progress]].*
+#+BEGIN_EXPORT html
+<video style="width:100%" src="https://literatelb.org/talk/fire.webm" onloadeddata="this.play();" playsinline loop muted/>
+#+END_EXPORT
+#+END_ABSTRACT
+#+TOC: headlines 2
+
+* Preamble :noexport:
+#+BEGIN_SRC python :session :results none
+from sympy import *
+from sympy.codegen.ast import CodeBlock, Assignment
+from mako.template import Template
+import itertools
+
+def printlatexpr(*exprs):
+ print("$$\\begin{align*}")
+ for expr in exprs:
+ print(f"{latex(expr.lhs)} &:= {latex(expr.rhs)} \\\\")
+
+ print("\\end{align*}$$")
+
+def multirange(*ranges):
+ return itertools.product(*[ range(r) for r in ranges ])
+
+def multisum(f, *ranges):
+ return sum([ f(*idx) for idx in multirange(*ranges) ])
+
+descriptor = { }
+#+END_SRC
+
+#+NAME: eval-using-descriptor
+#+BEGIN_SRC python :session :results output :var src=""
+if lattice in descriptor:
+ D = descriptor[lattice]
+ print(Template(src).render(**locals()))
+#+END_SRC
+
+#+RESULTS: eval-using-descriptor
+
+* Introduction
+Approaches to modelling fluid dynamics can be coarsely grouped into three categories: Microscopic, Mesoscopic and Macroscopic models.
+Microscopic models describe the behavior of the individual /particles/ of whose the macroscopic fluid movement is an emergent property.
+On the other side of the spectrum, macroscopic models consider only the large scale properties of a fluid such as its velocity and density
+fields. One could say that microscopic particle models are closest to what actually happens in nature and macroscopic models in the form
+of the /Navier Stokes equations (NSE)/ represent an ideal vision of a fluid that only holds for /large enough/ space scales.
+As is the case for most things, what we call a /fluid/ is by itself already an abstraction.
+
+Mesoscopic models sit between these two extremes and consider neither individual particles nor purely macroscopic properties but rather
+the probability of some amount of particles moving with some velocity into a certain direction at various points in time and space. The
+mathematical field of /kinetic theory/ provides a foundation for linking all of these three models.
+
+** Kinetic theory
+One can use kinetic theory to get from a microscropic or /molecular/ system governed by e.g. Newton's laws of motion to macroscopic
+descriptions of certain properties of such a system. For example one can reach the diffusion equation for /billiard systems/ of particles
+colliding with solid obstacles or to the Navier Stokes equations for systems of pairwise colliding particles.
+
+In very rough terms one starts out by considering the microscopic system and taking its limit for infinitely many particles of
+vanishingly small size. This yields a /kinetic equation/ that describes the behaviour of the particle distribution function
+under some collision operator for any point in space and time. When we consider pairwise colliding particles the result
+turns out to be the well known /Boltzmann transport equation/:
+
+$$(\partial_t + v \cdot \nabla_x) f(t,x,v) = \Omega (f).$$
+
+This PDE describes the behavior of the probability $f(t,x,v)$ that particles move into direction $v$ at location $x$ and time $t$ for
+some collision operator $\Omega$. Discretizing this equation on a lattice for a finite set of discrete velocities is called the /Lattice Boltzmann Method (LBM)/.
+
+The practical usefulness of LBM hinges on the ability of reaching the desired /target equations/
+such as NSE or the heat equation as a limit of the kinetic equation.
+
+** Lattice Boltzmann Method
+We discretize the Boltzmann equation in spatial, velocity and temporal space on a cartesian lattice using discrete velocities
+$\xi_i$ that point to the neighbors of each cell.
+
+$$(\partial_t + v \cdot \nabla_x)f = \Omega(f) \approx f_i(x + \xi_i, t + 1) - f_i(x,t) = \Omega_i(x,t)$$
+
+A common collision operator $\Omega$ for Navier-Stokes approximation is given by Bhatnagar, Gross and Kroog (BGK).
+
+$$\Omega(f) := -\frac{f - f^\text{eq}}{\tau}$$
+
+This BGK operator /relaxes/ the local population $f$ towards some equilibrium distribution $f^\text{eq}$ with rate $\tau$. Inserting this operator
+into the discrete Boltzmann equation yields the discrete LBM BGK equation.
+
+$$f_i(x + \xi_i, t + 1) = f_i(x,t) - \frac{1}{\tau} (f_i(x,t) - f_i^\text{eq}(x,t))$$
+
+This explicit form exposes the separation of the LBM algorithm into a local collision step followed by a streaming step.
+The collision step relaxes the population of each cell towards its local equilibrium and the streaming step propagates the resulting
+updated population values to the respective neighboring cells.
+
+Macroscopic moments such as density and velocity can be calculated directly from the distribution functions.
+
+$$\rho := \sum_{i=0}^{q-1} f_i(x,t) \text{ and } u := \frac{1}{\rho} \sum_{i=0}^{q-1} \xi_i f_i(x,t)$$
+
+The equilibrium distribution $f_i^\text{eq}$ for these macroscopic properties is given by
+
+$$f_i^\text{eq}(x,t) := \omega_i \rho \left( 1 + \frac{u \cdot \xi_i}{c_s^2} + \frac{(u \cdot \xi_i)^2}{2c_s^4} - \frac{u \cdot u}{2c_s^2} \right)$$
+
+using lattice-dependent weights $\omega_i$ and speed of sound $c_s$. Note that this is just one possible discretization of the Boltzmann
+equation -- nothing stops us from using different sets of velocities, relaxation times, collision operators and so on.
+Changing these things is how different physics are modeled in LBM. e.g. what we will do in the section on
+[[*Smagorinsky BGK collision][Smagorinsky turbulence modelling]] is to locally change the relaxation time.
+
+To summarize what we are going to do for the simplest bulk fluid case: First calculate the current density and velocity
+moments of a cell, then compute the matching equilibrium distributions and finally perform the local BGK collision to
+update the cell populations.
+The last step before we start again is to propagate the post-collision values to the corresponding neighbor cells.
+
+Special care has to be taken for the boundary conditions at the lattice frontier and around any obstacle geometries.
+Such boundary conditions are one of the major topics of LBM research with a very rich toolbox of specialized collision
+steps for modeling e.g. inflow, outflow, solid or moving walls of various kinds.
+
+* Lattice
+The cartesian grids used for the spatial discretization are commonly described as =DXQY= lattices where =X= is the number of spatial dimensions
+and =Y= is the number of discrete velocities $\xi_i$. e.g. a =D2Q9= lattice is two dimensional and stores nine population values per cell. Each population has
+an associated weigth $\omega_i$ that in a sense controls its impact on the collision step. Additionally we also require the lattice speed of sound $c_s$ which is
+the speed of information propagation within the lattice.
+
+#+BEGIN_SRC python :session :results none
+from fractions import Fraction
+
+class Descriptor:
+ def __init__(self, name, data):
+ self.name = name
+ self.c = [ Matrix(eval(row[0])) for row in data ]
+ self.w = [ Rational(f.numerator, f.denominator)
+ for f in [ Fraction(row[1]) for row in data ] ]
+
+ self.d = self.c[0].shape[0]
+ self.q = len(self.c)
+ self.c_s = sqrt(Rational(1,3))
+#+END_SRC
+
+All of these constants have to be accessible for symbolic computations which is why we store them within a so called descriptor class.
+For convenience we write out the directions and weights as plain tables that are then read into the Python structure.
+
+#+NAME: load-descriptor
+#+BEGIN_SRC python :session :results output :var data=D2Q9
+descriptor[lattice] = Descriptor(lattice, data)
+print(f"Loaded D{descriptor[lattice].d}Q{descriptor[lattice].q} lattice with a weight sum of {sum(descriptor[lattice].w)}.")
+#+END_SRC
+
+#+RESULTS: load-descriptor
+: Loaded D2Q9 lattice with a weight sum of 1.
+
+#+CALL: load-descriptor(lattice="D3Q19", data=D3Q19)
+
+#+RESULTS:
+: Loaded D3Q19 lattice with a weight sum of 1.
+
+** D2Q9
+#+NAME: D2Q9
+| Direction | Weight |
+|-----------+--------|
+| (-1,-1) | 1/36 |
+| (-1, 0) | 1/9 |
+| (-1, 1) | 1/36 |
+| ( 0,-1) | 1/9 |
+| ( 0, 0) | 4/9 |
+| ( 0, 1) | 1/9 |
+| ( 1,-1) | 1/36 |
+| ( 1, 0) | 1/9 |
+| ( 1, 1) | 1/36 |
+
+** D3Q19
+#+NAME: D3Q19
+| Direction | Weight |
+|------------+--------|
+| ( 0, 1, 1) | 1/36 |
+| (-1, 0, 1) | 1/36 |
+| ( 0, 0, 1) | 1/18 |
+| ( 1, 0, 1) | 1/36 |
+| ( 0,-1, 1) | 1/36 |
+| (-1, 1, 0) | 1/36 |
+| ( 0, 1, 0) | 1/18 |
+| ( 1, 1, 0) | 1/36 |
+| (-1, 0, 0) | 1/18 |
+| ( 0, 0, 0) | 1/3 |
+| ( 1, 0, 0) | 1/18 |
+| (-1,-1, 0) | 1/36 |
+| ( 0,-1, 0) | 1/18 |
+| ( 1,-1, 0) | 1/36 |
+| ( 0, 1,-1) | 1/36 |
+| (-1, 0,-1) | 1/36 |
+| ( 0, 0,-1) | 1/18 |
+| ( 1, 0,-1) | 1/36 |
+| ( 0,-1,-1) | 1/36 |
+
+* Collision steps
+While the streaming step of the LBM algorithm only propagates the populations between cells in an unform fashion,
+the collision step determines the actual values those populations take. This means that the physical behaviour
+modelled by a given LBM algorithm is determined primarily by the collision step.
+
+In this section we are going to generate the code for bulk collisions. i.e. the collisions that model fluid and other transport
+phenomena apart from domain boundaries. Those will be handled at a later point by special purpose collision steps called
+/boundary conditions/.
+
+** Code printing
+Before we can get started on constructing expression trees and generating code from them we need to
+setup some basics so SymPy actually generates something we can compile in our environment.
+
+In order to get more fine grained control we need to overload an approproate SymPy C code printer class.
+This allows us to e.g. easily print indexed expressions that access the population array in the correct way
+or to explicitly type float constants according to a compile-time template type.
+
+#+BEGIN_SRC python :session :results none
+from sympy.printing.ccode import C11CodePrinter
+
+class CodeBlockPrinter(C11CodePrinter):
+ def __init__(self, custom_assignment, custom_functions):
+ super(CodeBlockPrinter, self).__init__()
+ self._default_settings['contract'] = False
+ self.custom_assignment = custom_assignment
+ for f in custom_functions:
+ self._kf[f] = f
+
+ def _print_Indexed(self, expr):
+ assert len(expr.indices) == 1
+ if expr.base.name[0] == 'f':
+ return f"{expr.base.name}[{expr.indices[0]}]"
+ else:
+ return f"{expr.base.name}_{expr.indices[0]}"
+
+ def _print_Float(self, flt):
+ return "T{%s}" % str(flt.evalf())
+
+ def _print_Pow(self, expr):
+ if expr.exp == -1:
+ return "T{1} / (%s)" % self.doprint(expr.base)
+ else:
+ return super()._print_Pow(expr)
+
+ def _print_Assignment(self, expr):
+ if self.custom_assignment and expr.lhs.is_Indexed and expr.lhs.base.name[0] == 'f':
+ return f"{self.doprint(expr.lhs)} = {self.doprint(expr.rhs.evalf())};"
+ else:
+ return f"T {self.doprint(expr.lhs)} = {self.doprint(expr.rhs.evalf())};"
+#+END_SRC
+
+For convenience the instantiation of this class is hidden in a =printcode= function that we can use everywhere.
+
+#+BEGIN_SRC python :session :results none
+def printcode(expr, custom_assignment=True, custom_functions=[]):
+ print(CodeBlockPrinter(custom_assignment, custom_functions).doprint(expr))
+#+END_SRC
+
+The additional assignment parameter allow us to control whether the targets of assignment expressions should be
+instantiated as a new scalar variable in addition while the functions parameter allows us to make custom runtime
+functions known to SymPy. If we don't do this for a function =f= that our assignment expression contains a reference
+to, the printer will generate unwanted comments to reflect this.
+
+*** Custom expression transformations
+We do not want to use the =pow= function for squares in the generated code. This can be achieved by providing
+a custom =ReplaceOptim= structure during the CSE optimization step that conditionally resolves =Pow= expressions.
+
+#+BEGIN_SRC python :session :results none
+from sympy.codegen.rewriting import ReplaceOptim
+
+expand_pos_square = ReplaceOptim(
+ lambda e: e.is_Pow and e.exp.is_integer and e.exp == 2,
+ lambda p: UnevaluatedExpr(Mul(p.base, p.base, evaluate = False))
+)
+
+custom_opti = cse_main.basic_optimizations + [
+ (expand_pos_square, expand_pos_square)
+]
+#+END_SRC
+
+#+BEGIN_SRC python :session :results none
+def cse(block, symbol_prefix = 'x'):
+ return block.cse(symbols = numbered_symbols(prefix=symbol_prefix), optimizations = custom_opti, order = 'none')
+#+END_SRC
+
+** Moments
+#+BEGIN_SRC python :session :results none
+i, j = symbols('i, j')
+d, q = symbols('d, q')
+xi = IndexedBase('xi')
+#+END_SRC
+
+To start we define placeholders for the spatial and discrete velocity dimensions as well as the velocity set $\xi$ of
+some lattice. As the moments are constructed using the lattice populations $f$ we also require placeholders
+for those in addition to the moments $\rho$ and $u$ themselves.
+
+#+BEGIN_SRC python :session :results none
+f = IndexedBase('f')
+rho = Symbol('rho')
+u = IndexedBase('u', d)
+u_i = Symbol('u_i')
+#+END_SRC
+
+We are now ready to formulate the density moment which is simply the sum of a cell's populations.
+
+#+BEGIN_SRC python :session :results output replace :wrap latex
+def rho_from_f(f, q):
+ return Assignment(rho, Sum(f[j], (j, 0, q-1)))
+
+printlatexpr(rho_from_f(f, q))
+#+END_SRC
+
+#+RESULTS:
+#+begin_latex
+$$\begin{align*}
+\rho &:= \sum_{j=0}^{q - 1} {f}_{j} \\
+\end{align*}$$
+#+end_latex
+
+Next we build the velocity moment function. The i-th component of the velocty $u$ is the sum of all
+relevant populations divided by the density. In this context /relevant/ populations are all values $f_j$
+for which the i-th component of velocity $\xi_j$ is non-zero.
+
+#+BEGIN_SRC python :session :results output replace :wrap latex
+def u_i_from_f(f, q, xi):
+ return Assignment(u_i, Sum(xi[j,i] * f[j], (j, 0, q-1)) / Sum(f[j], (j, 0, q-1)))
+
+printlatexpr(u_i_from_f(f, q, xi))
+#+END_SRC
+
+#+RESULTS:
+#+begin_latex
+$$\begin{align*}
+u_{i} &:= \frac{\sum_{j=0}^{q - 1} {f}_{j} {\xi}_{j,i}}{\sum_{j=0}^{q - 1} {f}_{j}} \\
+\end{align*}$$
+#+end_latex
+
+Both to illustrate what we are actually going to compute for a given lattice cell and as the
+next step towards code generation we now want to /realize/ our abstract moment expressions
+for a concrete lattice.
+
+#+BEGIN_SRC python :session :results output :wrap latex
+def realize_rho(D):
+ return rho_from_f(f, D.q).doit()
+
+D = descriptor[lattice]
+printlatexpr(realize_rho(D))
+#+END_SRC
+
+#+RESULTS:
+#+begin_latex
+$$\begin{align*}
+\rho &:= {f}_{0} + {f}_{1} + {f}_{2} + {f}_{3} + {f}_{4} + {f}_{5} + {f}_{6} + {f}_{7} + {f}_{8} \\
+\end{align*}$$
+#+end_latex
+
+#+BEGIN_SRC python :session :results none
+def from_indices(elems, car, *cdr):
+ return elems[car] if len(cdr) == 0 else from_indices(elems[car], *cdr)
+
+def realize_indexed(expr, idx, values):
+ return expr.replace(lambda expr: expr.is_Indexed and expr.base.name == idx.name,
+ lambda elem: from_indices(values, *elem.indices))
+#+END_SRC
+
+#+BEGIN_SRC python :session :results output :wrap latex
+def realize_u_i(D, j):
+ return realize_indexed(u_i_from_f(f, D.q, xi).doit().subs([(u_i, u[j]), (i, j)]), xi, D.c)
+
+D = descriptor[lattice]
+printlatexpr(realize_u_i(D, 0))
+#+END_SRC
+
+#+RESULTS:
+#+begin_latex
+$$\begin{align*}
+{u}_{0} &:= \frac{- {f}_{0} - {f}_{1} - {f}_{2} + {f}_{6} + {f}_{7} + {f}_{8}}{{f}_{0} + {f}_{1} + {f}_{2} + {f}_{3} + {f}_{4} + {f}_{5} + {f}_{6} + {f}_{7} + {f}_{8}} \\
+\end{align*}$$
+#+end_latex
+
+At this point we have everything needed to generate an optimized code snippet that
+can be used to compute density and velocity values given a set of variables containing
+a single cell's populations. As a convention we are going to prefix these /current/ population
+variables with =f_curr=.
+
+#+BEGIN_SRC python :session :results none
+def moments_code_block(D, populations):
+ f_rho = realize_indexed(realize_rho(D), f, populations)
+ f_u = [ realize_indexed(realize_u_i(D, i), f, populations) for i in range(D.d) ]
+ return CodeBlock(f_rho, *f_u).cse(symbols = numbered_symbols(prefix='m'), optimizations = custom_opti)
+#+END_SRC
+
+#+NAME: moments-from-f_curr
+#+BEGIN_SRC python :session :results output :cache yes
+printcode(moments_code_block(descriptor[lattice], IndexedBase('f_curr')))
+#+END_SRC
+
+#+RESULTS[11633b87f250d3c0a8e39c7091e10d2ef75b19dc]: moments-from-f_curr
+: T m0 = f_curr[1] + f_curr[2];
+: T m1 = f_curr[3] + f_curr[6];
+: T m2 = m0 + m1 + f_curr[0] + f_curr[4] + f_curr[5] + f_curr[7] + f_curr[8];
+: T m3 = f_curr[0] - f_curr[8];
+: T m4 = T{1} / (m2);
+: T rho = m2;
+: T u_0 = -m4*(m0 + m3 - f_curr[6] - f_curr[7]);
+: T u_1 = -m4*(m1 + m3 - f_curr[2] - f_curr[5]);
+
+Both the fluid dynamics implemented by collision steps and boundary conditions as well as functions
+for computing moments to e.g. visualize are implemented as GPU kernel functions. In CUDA this means
+plain functions marked by the =__global__= specifier. As most kernel share common aspects in their call
+requirements and parameter sets we wrap them in =__device__= functions of sensibly named structures.
+
+#+BEGIN_SRC cpp :tangle tangle/LLBM/kernel/collect_moments.h
+#pragma once
+#include <LLBM/call_tag.h>
+
+struct CollectMomentsF {
+
+using call_tag = tag::call_by_cell_id;
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D2Q9, S f_curr[9], std::size_t gid, T* cell_rho, T* cell_u) {
+ <<moments-from-f_curr(lattice="D2Q9")>>
+
+ cell_rho[gid] = rho;
+ cell_u[2*gid+0] = u_0;
+ cell_u[2*gid+1] = u_1;
+}
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D3Q19, S f_curr[19], std::size_t gid, T* cell_rho, T* cell_u) {
+ <<moments-from-f_curr(lattice="D3Q19")>>
+
+ cell_rho[gid] = rho;
+ cell_u[3*gid+0] = u_0;
+ cell_u[3*gid+1] = u_1;
+ cell_u[3*gid+2] = u_2;
+}
+
+};
+#+END_SRC
+
+** Equilibrium
+#+BEGIN_SRC python :session :results none
+f_eq = IndexedBase('f_eq', q)
+#+END_SRC
+
+$$f_i^\text{eq}(x,t) := \omega_i \rho \left( 1 + \frac{u \cdot \xi_i}{c_s^2} + \frac{(u \cdot \xi_i)^2}{2c_s^4} - \frac{u \cdot u}{2c_s^2} \right)$$
+
+Calculating the equilibrium distribution of some population $f_i$ requires the evaluation of inner products
+between vectors. As there doesn't seem to be a nice way of writing an abstract SymPy expression that both
+generates nice LaTeX and can be realized on a concrete lattice we skip the fully abstract step and jump
+right into the latter part.
+
+#+BEGIN_SRC python :session :results output :wrap latex
+def realize_f_eq_i(D, i):
+ v = Matrix([ u[j] for j in range(D.d) ])
+ return Assignment(f_eq[i], D.w[i] * rho * ( 1
+ + D.c[i].dot(v) / D.c_s**2
+ + D.c[i].dot(v)**2 / (2*D.c_s**4)
+ - v.dot(v) / (2*D.c_s**2) ))
+
+D = descriptor[lattice]
+printlatexpr(realize_f_eq_i(D, 0))
+#+END_SRC
+
+#+RESULTS:
+#+begin_latex
+$$\begin{align*}
+{f_{eq}}_{0} &:= \frac{\rho \left(\frac{9 \left(- {u}_{0} - {u}_{1}\right)^{2}}{2} - \frac{3 {u}_{0}^{2}}{2} - 3 {u}_{0} - \frac{3 {u}_{1}^{2}}{2} - 3 {u}_{1} + 1\right)}{36} \\
+\end{align*}$$
+#+end_latex
+
+#+BEGIN_SRC python :session :results none
+def equilibrium_code_block(D):
+ f_moment_eq = [ Assignment(f_next[i], realize_f_eq_i(D, i).rhs) for i in range(D.q) ]
+ return cse(CodeBlock(*f_moment_eq), symbol_prefix = 'e')
+#+END_SRC
+
+#+NAME: equilibrium-from-moments
+#+BEGIN_SRC python :session :results output :cache yes
+printcode(equilibrium_code_block(descriptor[lattice]))
+#+END_SRC
+
+#+RESULTS[7843a5ae8cf9ac0b63787ab4080ba2488a6cc0a7]: equilibrium-from-moments
+#+begin_example
+T e0 = T{0.0277777777777778}*rho;
+T e1 = T{3.00000000000000}*u_1;
+T e2 = T{3.00000000000000}*u_0;
+T e3 = u_0 + u_1;
+T e4 = T{4.50000000000000}*(e3*e3);
+T e5 = u_1*u_1;
+T e6 = T{1.50000000000000}*e5;
+T e7 = u_0*u_0;
+T e8 = T{1.50000000000000}*e7;
+T e9 = e8 + T{-1.00000000000000};
+T e10 = e6 + e9;
+T e11 = T{0.111111111111111}*rho;
+T e12 = -e2;
+T e13 = T{1.00000000000000} - e6;
+T e14 = e13 + T{3.00000000000000}*e7;
+T e15 = -e8;
+T e16 = e1 + e15;
+T e17 = u_0 - u_1;
+T e18 = e13 + T{4.50000000000000}*(e17*e17);
+T e19 = T{3.00000000000000}*e5;
+f_next[0] = -e0*(e1 + e10 + e2 - e4);
+f_next[1] = e11*(e12 + e14);
+f_next[2] = e0*(e12 + e16 + e18);
+f_next[3] = -e11*(e1 - e19 + e9);
+f_next[4] = -T{0.444444444444444}*e10*rho;
+f_next[5] = e11*(e16 + e19 + T{1.00000000000000});
+f_next[6] = e0*(-e1 + e15 + e18 + e2);
+f_next[7] = e11*(e14 + e2);
+f_next[8] = e0*(e13 + e16 + e2 + e4);
+#+end_example
+
+For initialization of lattice data it often makes sense to choose values that are invariant under this
+equilibrium computation. Cells initialized in this way will not change during the collision step
+and as such can be considered to model a fluid /at rest/.
+
+#+BEGIN_SRC python :session :results none
+def initialize_equilibrium(D):
+ return [ Assignment(IndexedBase("f_next", D.q)[i], w_i) for i, w_i in enumerate(D.w) ]
+#+END_SRC
+
+#+NAME: initialize-populations
+#+BEGIN_SRC python :session :results output
+D = descriptor[lattice]
+printcode(CodeBlock(*initialize_equilibrium(D)))
+#+END_SRC
+
+#+RESULTS: initialize-populations
+: f_next[0] = T{0.0277777777777778};
+: f_next[1] = T{0.111111111111111};
+: f_next[2] = T{0.0277777777777778};
+: f_next[3] = T{0.111111111111111};
+: f_next[4] = T{0.444444444444444};
+: f_next[5] = T{0.111111111111111};
+: f_next[6] = T{0.0277777777777778};
+: f_next[7] = T{0.111111111111111};
+: f_next[8] = T{0.0277777777777778};
+
+#+BEGIN_SRC cpp :tangle tangle/LLBM/kernel/initialize.h
+#pragma once
+#include <LLBM/call_tag.h>
+
+struct InitializeO {
+
+using call_tag = tag::call_by_cell_id;
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D2Q9, S f_curr[9], S f_next[9], std::size_t gid) {
+ <<initialize-populations(lattice="D2Q9")>>
+}
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D3Q19, S f_curr[19], S f_next[19], std::size_t gid) {
+ <<initialize-populations(lattice="D3Q19")>>
+}
+
+};
+#+END_SRC
+
+** BGK collision
+The BGK collision operators takes a current population $f^{curr}_i$ and /relaxes/ it toward the equilibrium distribution
+$f^{eq}_i$ with some rate $\tau$. The result of this process is the new population $f^{next}_i$.
+
+#+BEGIN_SRC python :session :results none
+tau = Symbol('tau')
+f_curr = IndexedBase('f_curr', q)
+f_next = IndexedBase('f_next', q)
+f_curr_i, f_next_i, f_eq_i = symbols('f^curr_i, f^next_i, f^eq_i')
+#+END_SRC
+
+#+BEGIN_SRC python :session :results output :wrap latex
+def bgk_collision(f_curr, f_next, f_eq, tau):
+ return Assignment(f_next, f_curr + 1/tau * (f_eq - f_curr))
+
+printlatexpr(bgk_collision(f_curr_i, f_next_i, f_eq_i, tau))
+#+END_SRC
+
+#+RESULTS:
+#+begin_latex
+$$\begin{align*}
+f^{next}_{i} &:= f^{curr}_{i} + \frac{- f^{curr}_{i} + f^{eq}_{i}}{\tau} \\
+\end{align*}$$
+#+end_latex
+
+As we might want to use different moment values than the ones constructed from the current population
+as the foundation for the equilibrium distribution the generated code will assume variables =rho= and =u_i=
+to exist. Building the expression tree for code generation is now as simple as instantiating the BGK
+operator for all $q$ directions and substituting the equilibrium distribution.
+
+#+BEGIN_SRC python :session :results none
+def bgk_collision_code_block(D):
+ f_eq_def = [ realize_f_eq_i(D, i).rhs for i in range(D.q) ]
+ f_post_collide = [ bgk_collision(f_curr[i], f_next[i], f_eq_def[i], tau) for i in range(D.q) ]
+ return CodeBlock(*f_post_collide).cse(optimizations = custom_opti)
+#+END_SRC
+
+#+NAME: bgk-collide-to-f_next
+#+BEGIN_SRC python :session :results output :cache yes
+printcode(bgk_collision_code_block(descriptor[lattice]), custom_assignment=True)
+#+END_SRC
+
+#+RESULTS[b9035596468cbdcf22f3e63f7fa43d74a6261849]: bgk-collide-to-f_next
+#+begin_example
+T x0 = T{1} / (tau);
+T x1 = T{0.0138888888888889}*x0;
+T x2 = T{6.00000000000000}*u_1;
+T x3 = T{6.00000000000000}*u_0;
+T x4 = u_0 + u_1;
+T x5 = T{9.00000000000000}*(x4*x4);
+T x6 = u_1*u_1;
+T x7 = T{3.00000000000000}*x6;
+T x8 = u_0*u_0;
+T x9 = T{3.00000000000000}*x8;
+T x10 = x9 + T{-2.00000000000000};
+T x11 = x10 + x7;
+T x12 = T{0.0555555555555556}*x0;
+T x13 = -x3;
+T x14 = T{2.00000000000000} - x7;
+T x15 = x14 + T{6.00000000000000}*x8;
+T x16 = -x9;
+T x17 = x16 + x2;
+T x18 = u_0 - u_1;
+T x19 = x14 + T{9.00000000000000}*(x18*x18);
+T x20 = T{6.00000000000000}*x6;
+f_next[0] = -x1*(rho*(x11 + x2 + x3 - x5) + T{72.0000000000000}*f_curr[0]) + f_curr[0];
+f_next[1] = x12*(rho*(x13 + x15) - T{18.0000000000000}*f_curr[1]) + f_curr[1];
+f_next[2] = x1*(rho*(x13 + x17 + x19) - T{72.0000000000000}*f_curr[2]) + f_curr[2];
+f_next[3] = -x12*(rho*(x10 + x2 - x20) + T{18.0000000000000}*f_curr[3]) + f_curr[3];
+f_next[4] = -T{0.111111111111111}*x0*(T{2.00000000000000}*rho*x11 + T{9.00000000000000}*f_curr[4]) + f_curr[4];
+f_next[5] = x12*(rho*(x17 + x20 + T{2.00000000000000}) - T{18.0000000000000}*f_curr[5]) + f_curr[5];
+f_next[6] = x1*(rho*(x16 + x19 - x2 + x3) - T{72.0000000000000}*f_curr[6]) + f_curr[6];
+f_next[7] = x12*(rho*(x15 + x3) - T{18.0000000000000}*f_curr[7]) + f_curr[7];
+f_next[8] = x1*(rho*(x14 + x17 + x3 + x5) - T{72.0000000000000}*f_curr[8]) + f_curr[8];
+#+end_example
+
+In order to call this collision kernel on the GPU we wrap it in =apply= functions of an appropriately
+named operator structure.
+
+#+BEGIN_SRC cpp :tangle tangle/LLBM/kernel/collide.h
+#pragma once
+#include <LLBM/call_tag.h>
+
+struct BgkCollideO {
+
+using call_tag = tag::call_by_cell_id;
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D2Q9, S f_curr[9], S f_next[9], std::size_t gid, T tau) {
+ <<moments-from-f_curr(lattice="D2Q9")>>
+ <<bgk-collide-to-f_next(lattice="D2Q9")>>
+}
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D3Q19, S f_curr[19], S f_next[19], std::size_t gid, T tau) {
+ <<moments-from-f_curr(lattice="D3Q19")>>
+ <<bgk-collide-to-f_next(lattice="D3Q19")>>
+}
+
+};
+#+END_SRC
+
+#+BEGIN_SRC cpp :tangle tangle/LLBM/bulk.h
+#include "kernel/collide.h"
+#+END_SRC
+
+** Smagorinsky BGK collision
+Simulation of turbulent flow using plain BGK collisions is possible -- after all turbulence is captured by the Navier-Stokes
+equations that in turn are the target equations of our BGK-LB method -- but requires very highly resolved grids. The reason
+for this is that turbulence is characterized by the formation of eddies at both very big and very small scales. Most of the
+energy is contained in the large scale features but dissipates into heat at the finest scales. To capture turbulent flow we either
+have to resolve the grid all the way to these finest scales or implement some kind of model for the characteristics of these
+scales in a coarser grid. Computation on such a coarser grid is then also called a large eddy simulation (LES).
+
+One comparably simple model for respresenting the smaller eddies in such a LES is the Smagorinsky subgrid-scale model.
+This model yields a expression for computing the /effective relaxation rate/ $\tau_\text{eff}$ on a per-cell basis given the global relaxation
+time $\tau$ and a Smagorinsky constant. As the relaxation time in BGK LBM is a function of the viscosity this translates into
+computing the effective viscosity using a local strain-rate tensor reconstruction based on the non-equilibrium part of each
+cell's populations.
+
+#+BEGIN_SRC python :session :results none
+tau, smagorinsky = symbols('tau, smagorinsky')
+#+END_SRC
+
+The non-equilibrium part is simply the difference between the actual population stored in a cell and the respective
+equilibrium population that we relax towards. Using these local non-equilibrium parts to reconstruct the strain-rate
+tensor $\Pi_{i,j}^\text{neq}$ is quite convenient as we otherwise would have to employ e.g. a finite difference method just for this.
+
+#+BEGIN_SRC python :session :results none
+def pi_neq(D, f, f_eq):
+ pi_neq = zeros(D.d, D.d)
+ for i, j, k in multirange(D.d, D.d, D.q):
+ pi_neq[i,j] += D.c[k][i] * D.c[k][j] * (f[k] - f_eq[k])
+
+ return pi_neq
+#+END_SRC
+
+To compute the effective relaxation rate we need the norm of this strain-rate tensor.
+
+#+BEGIN_SRC python :session :results none
+def pi_neq_norm(D, f, f_eq):
+ pi = pi_neq(D, f, f_eq)
+ return sqrt(2*multisum(lambda i, j: pi[i,j]**2, D.d, D.d))
+#+END_SRC
+
+#+BEGIN_SRC python :session :results none
+def effective_tau(D, f, f_eq, tau, smagorinsky):
+ pi_norm = pi_neq_norm(D, f, f_eq)
+ return tau + 0.5*(sqrt(tau**2 + 2*sqrt(2)*smagorinsky**2 * pi_norm / D.c_s**4) - tau)
+#+END_SRC
+
+Finally the resulting per-cell relaxation time is simply plugged into the existing BGK collision operator to yield the
+complete Smagorinsky BGK collision step.
+
+#+BEGIN_SRC python :session :results none
+def smagorinsky_bgk_collision_code_block(D, tau, smagorinsky):
+ f_rho = realize_indexed(realize_rho(D), f, f_curr)
+ f_u = [ realize_indexed(realize_u_i(D, i), f, f_curr) for i in range(D.d) ]
+ f_eq = [ realize_f_eq_i(D, i).rhs for i in range(D.q) ]
+ eff_tau = effective_tau(D, f_curr, f_eq, tau, smagorinsky)
+ f_post_collide = [ bgk_collision(f_curr[i], f_next[i], f_eq[i], eff_tau) for i in range(D.q) ]
+ return CodeBlock(f_rho, *f_u, *f_post_collide).cse(optimizations = custom_opti)
+#+END_SRC
+
+This way the BGK collisions are numerically stabilized for low resolutions and high Reynolds numbers.
+
+#+NAME: smagorinsky-bgk-collide-to-f_next
+#+BEGIN_SRC python :session :results output :cache yes
+D = descriptor[lattice]
+printcode(smagorinsky_bgk_collision_code_block(D, tau, smagorinsky))
+#+END_SRC
+
+#+RESULTS[b6cd612f0c97e3da6908e302f304b8a50516eb05]: smagorinsky-bgk-collide-to-f_next
+#+begin_example
+T x0 = f_curr[1] + f_curr[2];
+T x1 = f_curr[3] + f_curr[6];
+T x2 = x0 + x1 + f_curr[0] + f_curr[4] + f_curr[5] + f_curr[7] + f_curr[8];
+T x3 = f_curr[0] - f_curr[8];
+T x4 = T{1} / (x2);
+T x5 = T{72.0000000000000}*f_curr[2];
+T x6 = T{72.0000000000000}*f_curr[6];
+T rho = x2;
+T x31 = T{4.00000000000000}*rho;
+T x40 = T{2.00000000000000}*rho;
+T u_0 = -x4*(x0 + x3 - f_curr[6] - f_curr[7]);
+T x7 = T{6.00000000000000}*u_0;
+T x8 = -x7;
+T x15 = u_0*u_0;
+T x16 = T{3.00000000000000}*x15;
+T x17 = -x16;
+T x27 = x16 + T{-2.00000000000000};
+T u_1 = -x4*(x1 + x3 - f_curr[2] - f_curr[5]);
+T x9 = u_0 - u_1;
+T x10 = T{9.00000000000000}*(x9*x9);
+T x11 = u_1*u_1;
+T x12 = T{3.00000000000000}*x11;
+T x13 = T{2.00000000000000} - x12;
+T x14 = T{6.00000000000000}*u_1;
+T x18 = x14 + x17;
+T x19 = x13 + x18;
+T x20 = x10 + x19 + x8;
+T x21 = rho*x20;
+T x22 = x10 + x13 - x14 + x17 + x7;
+T x23 = rho*x22;
+T x24 = u_0 + u_1;
+T x25 = T{9.00000000000000}*(x24*x24);
+T x26 = x19 + x25 + x7;
+T x28 = x12 + x27;
+T x29 = x14 - x25 + x28 + x7;
+T x30 = rho*x26 - rho*x29 - T{72.0000000000000}*f_curr[0] - T{72.0000000000000}*f_curr[8];
+T x32 = x13 + T{6.00000000000000}*x15;
+T x33 = x32 + x8;
+T x34 = x32 + x7;
+T x35 = x21 + x23 + x30 - x5 - x6;
+T x36 = T{6.00000000000000}*x11;
+T x37 = x14 + x27 - x36;
+T x38 = x18 + x36 + T{2.00000000000000};
+T x39 = T{1} / (tau + sqrt(T{0.707106781186548}*(smagorinsky*smagorinsky)*sqrt((-x21 - x23 + x30 + x5 + x6)*(-x21 - x23 + x30 + x5 + x6) + T{0.500000000000000}*((x31*x33 + x31*x34 + x35 - 72*f_curr[1] - 72*f_curr[7])*(x31*x33 + x31*x34 + x35 - 72*f_curr[1] - 72*f_curr[7])) + T{0.500000000000000}*((-x31*x37 + x31*x38 + x35 - 72*f_curr[3] - 72*f_curr[5])*(-x31*x37 + x31*x38 + x35 - 72*f_curr[3] - 72*f_curr[5]))) + tau*tau));
+f_next[0] = -T{0.0138888888888889}*x39*(x29*x40 + T{144.000000000000}*f_curr[0]) + f_curr[0];
+f_next[1] = T{0.0555555555555556}*x39*(x33*x40 - T{36.0000000000000}*f_curr[1]) + f_curr[1];
+f_next[2] = T{0.0138888888888889}*x39*(x20*x40 - T{144.000000000000}*f_curr[2]) + f_curr[2];
+f_next[3] = -T{0.0555555555555556}*x39*(x37*x40 + T{36.0000000000000}*f_curr[3]) + f_curr[3];
+f_next[4] = -T{0.111111111111111}*x39*(T{4.00000000000000}*rho*x28 + T{18.0000000000000}*f_curr[4]) + f_curr[4];
+f_next[5] = T{0.0555555555555556}*x39*(x38*x40 - T{36.0000000000000}*f_curr[5]) + f_curr[5];
+f_next[6] = T{0.0138888888888889}*x39*(x22*x40 - T{144.000000000000}*f_curr[6]) + f_curr[6];
+f_next[7] = T{0.0555555555555556}*x39*(x34*x40 - T{36.0000000000000}*f_curr[7]) + f_curr[7];
+f_next[8] = T{0.0138888888888889}*x39*(x26*x40 - T{144.000000000000}*f_curr[8]) + f_curr[8];
+#+end_example
+
+
+#+BEGIN_SRC cpp :tangle tangle/LLBM/kernel/smagorinsky_collide.h
+#pragma once
+#include <LLBM/call_tag.h>
+
+struct SmagorinskyBgkCollideO {
+
+using call_tag = tag::call_by_cell_id;
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D2Q9, S f_curr[9], S f_next[9], std::size_t gid, T tau, T smagorinsky) {
+ <<smagorinsky-bgk-collide-to-f_next(lattice="D2Q9")>>
+}
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D3Q19, S f_curr[19], S f_next[19], std::size_t gid, T tau, T smagorinsky) {
+ <<smagorinsky-bgk-collide-to-f_next(lattice="D3Q19")>>
+}
+
+};
+#+END_SRC
+
+#+BEGIN_SRC cpp :tangle tangle/LLBM/bulk.h
+#include "kernel/smagorinsky_collide.h"
+#+END_SRC
+
+* Boundary conditions
+Real-world simulations are limited by the available computational resources. This means that we can not allocate
+an infinitely large lattice and consequently we at a minimum need to prescribe some conditions for the outer boundary
+of our finite lattice -- even in cases where we only want to simulate a fluid without any obstacles.
+In practice we commonly want to do both: Prescribe some inflow and outflow conditions as well as various boundaries
+that represent some obstacle geometry. This way we could for example create a virtual wind tunnel where fluid enters
+the domain on one side, is kept in line by smooth free slip walls, encounters some obstacle whose aerodynamic
+properties we want to investigate and exits the simulation lattice on the other side.
+** Bounce back
+To fit bounce back's reputation as the simplest LBM boundary condition we do not require any
+fancy expression trickery to generate its code. This boundary condition simply reflects back
+all populations the way they came from. As such it models a solid wall with no tangential velocity
+at the boundary.
+
+#+BEGIN_SRC python :session :results none
+def bounce_back(D, populations):
+ return [ Assignment(f_next[i], populations[D.c.index(-c_i)]) for i, c_i in enumerate(D.c) ]
+#+END_SRC
+
+#+NAME: bounce-back-full-way
+#+BEGIN_SRC python :session :results output
+D = descriptor[lattice]
+printcode(CodeBlock(*bounce_back(D, IndexedBase('f_curr', D.q))))
+#+END_SRC
+
+#+RESULTS: bounce-back-full-way
+: f_next[0] = f_curr[8];
+: f_next[1] = f_curr[7];
+: f_next[2] = f_curr[6];
+: f_next[3] = f_curr[5];
+: f_next[4] = f_curr[4];
+: f_next[5] = f_curr[3];
+: f_next[6] = f_curr[2];
+: f_next[7] = f_curr[1];
+: f_next[8] = f_curr[0];
+
+If this is used to model the walls of a simple pipe setup we will observe the well known Poiseuille velocity profile.
+
+#+BEGIN_SRC cpp :tangle tangle/LLBM/kernel/bounce_back.h
+#pragma once
+#include <LLBM/call_tag.h>
+
+struct BounceBackO {
+
+using call_tag = tag::call_by_cell_id;
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D2Q9, S f_curr[9], S f_next[9], std::size_t gid) {
+ <<bounce-back-full-way(lattice="D2Q9")>>
+}
+
+template <typename T, typename S>
+__device__ static void apply(descriptor::D3Q19, S f_curr[19], S f_next[19], std::size_t gid) {
+ <<bounce-back-full-way(lattice="D3Q19")>>
+}
+
+};
+#+END_SRC
+
+#+BEGIN_SRC cpp :tangle tangle/LLBM/boundary.h
+#include "kernel/bounce_back.h"
+#+END_SRC
+
+** Moving wall bounce back
+Modeling solid unmoving obstacles using e.g. bounce back is nice enough but in practice
+we commonly want to induce some kind of movement in our fluids. While this can be done
+by e.g. prescribing an inflow velocity using [[*Prescribed equilibrium][prescribed equilibrium boundaries]], bounce
+back can be modified to represent not just a solid wall but also some momentum exerted
+on the fluid.
+
+#+BEGIN_SRC python :session :results none
+def moving_wall_correction(D, i):
+ u_raw = symarray('u', D.d)
+ return 2 * D.w[D.c.index(-D.c[i])] / D.c_s**2 * -D.c[i].dot(Matrix(u_raw))
+#+END_SRC
+
+The simplest way of incorporating such movement into bounce back is to add the velocity
+components tangential to the respective population's direction.
+
+#+BEGIN_SRC python :session :results none
+def moving_wall_bounce_back(D, populations):
+ return [ Assignment(expr.lhs, expr.rhs - moving_wall_correction(D, i))
+ for