aboutsummaryrefslogtreecommitdiff
path: root/articles/2019-05-25_meta_descriptor.md
blob: 98780e0f72ffa462d223aa64310337b9571a9997 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
# Expressive meta templates for flexible handling of compile-time constants

So [we](http://www.lbrg.kit.edu/) recently released a new version of [OpenLB](http://www.openlb.net) which includes a major refactoring of the central datastructure used to handle various kinds of compile-time constants required by the simulation. This article will summarize the motivation and design of this new concept as well as highlight a couple of tricks and pitfalls in the context of template metaprogramming.

## What is a descriptor?

Every simulation based on Lattice Boltzmann Methods can be characterized by a set of constants such as the modelled spatial dimension, the number of neighbors in the underlying regular grid, the weights used to compute equilibrium distributions or the lattice speed of sound. Due to OpenLB's goal of offering a wide variety of LB models to address many different kinds of flow problems, the constants are not hardcoded throughout the codebase but rather maintained in compile-time data structures. Any usage of these constants can then refer to the characterizing descriptor data structure.

```cpp
/// Old equilibrium implementation using descriptor data
static T equilibrium(int iPop, T rho, const T u[DESCRIPTOR::d], const T uSqr)
{
  T c_u = T();
  for (int iD=0; iD < DESCRIPTOR::d; ++iD) {
    c_u += DESCRIPTOR::c[iPop][iD]*u[iD];
  }
  return rho
       * DESCRIPTOR::t[iPop]
       * ( (T)1
         + DESCRIPTOR::invCs2 * c_u
         + DESCRIPTOR::invCs2 * DESCRIPTOR::invCs2 * (T)0.5 * c_u * c_u
         - DESCRIPTOR::invCs2 * (T)0.5 * uSqr )
       - DESCRIPTOR::t[iPop];
}
```

As many parts of the code do not actually care which specific descriptor is used, most classes and functions are templates that accept any user-defined descriptor type. This allows us to e.g. select descriptor specific optimizations[^1] via plain template specializations.

To continue, the descriptor concept is tightly coupled to the definition of the cells that make up the simulation lattice. The reason for this connection is that we require some place to store the essential per-direction population fields for each node of the lattice. In OpenLB this place is currently the `Cell` class[^2] which locally maintains the population data and as such implements a collision-optimized _array of structures_ memory layout. As a side note this was the initial motivation for rethinking the descriptor concept as we require more flexible structures to turn this into a more efficient _structures of arrays_ situation[^3].

[^1]: e.g. collision steps where all generic code is resolved using common subexpression elimination in order to minimze the number of floating point operations
[^2]: see `src/core/cell.h` for further reading
[^3]: The performance LBM codes is in general not bound by the available processing power but but rather by how well we utilize the available memory bandwidth. i.e. we want to optimize memory throughput as much as possible which leads us to the need for more efficient streaming steps that in turn require changes to the memory layout.

## What was used prior to refactoring?

To better appreciate the new concept we should probably first take a closer look at how stuff this stuff was implemented previously. As a starting point all descriptors were derived from a descriptor base type such as `D2Q9DescriptorBase` for two dimensional lattices with nine discrete velocities:

```cpp
template <typename T>
struct D2Q9DescriptorBase {
  typedef D2Q9DescriptorBase<T> BaseDescriptor;
  enum { d = 2, q = 9 };        ///< number of dimensions/distr. functions
  static const int vicinity;    ///< size of neighborhood
  static const int c[q][d];     ///< lattice directions
  static const int opposite[q]; ///< opposite entry
  static const T t[q];          ///< lattice weights
  static const T invCs2;        ///< inverse square of speed of sound
};
```

As we can see this is a plain struct template with some static member constants to store the data. This in itself is not problematic and worked just fine since the project's inception. Note that the template allows for specification of the floating point type used for all non-integer data. This is required to e.g. use automatic differentiation types that allow for taking the derivative of the whole simulation in order to apply optimization techniques.

```cpp
template<typename T>
const int D2Q9DescriptorBase<T>::vicinity = 1;

template<typename T>
const int D2Q9DescriptorBase<T>::c
[D2Q9DescriptorBase<T>::q][D2Q9DescriptorBase<T>::d] = {
  { 0, 0},
  {-1, 1}, {-1, 0}, {-1,-1}, { 0,-1},
  { 1,-1}, { 1, 0}, { 1, 1}, { 0, 1}
};

template<typename T>
const int D2Q9DescriptorBase<T>::opposite[D2Q9DescriptorBase<T>::q] = {
  0, 5, 6, 7, 8, 1, 2, 3, 4
};

template<typename T>
const T D2Q9DescriptorBase<T>::t[D2Q9DescriptorBase<T>::q] = {
  (T)4/(T)9, (T)1/(T)36, (T)1/(T)9, (T)1/(T)36, (T)1/(T)9,
  (T)1/(T)36, (T)1/(T)9, (T)1/(T)36, (T)1/(T)9
};

template<typename T>
const T D2Q9DescriptorBase<T>::invCs2 = (T)3;
```

The actual data was stored in a separate header `src/dynamics/latticeDescriptors.hh`. All in all this very straight forward approach worked as expected and could be fully resolved at compile time to avoid unnecessary run time jumps inside critical code sections as far as the descriptor concept is concerned. The real issue starts when we take a look at the so called _external fields_:

```cpp
struct Force2dDescriptor {
  static const int numScalars = 2;
  static const int numSpecies = 1;
  static const int forceBeginsAt = 0;
  static const int sizeOfForce   = 2;
};

struct Force2dDescriptorBase {
  typedef Force2dDescriptor ExternalField;
};

template <typename T> struct ForcedD2Q9Descriptor
  : public D2Q9DescriptorBase<T>, public Force2dDescriptorBase {
};
```

Some LBM models require additional per-cell data such as external force vectors or values to model chemical properties. As we can see the declaration of these _external fields_ is another task of the descriptor data structure and _the_ task that was solved the ugliest in our original implementation.

```cpp
// Set force vectors in all cells of material number 1
sLattice.defineExternalField( superGeometry, 1,
                              DESCRIPTOR<T>::ExternalField::forceBeginsAt,
                              DESCRIPTOR<T>::ExternalField::sizeOfForce,
                              force );
```

For example this is a completely unsafe access to raw memory as `forceBeginsAt` and `sizeOfForce` define arbitrary memory offsets. And while we might not care about security in this context you can probably imagine the kinds of obscure bugs caused by potentially faulty and inconsistent handling of such offsets. To make things worse the naming of external field indices and size constants was inconsistent between different fields and stuff only worked as long as a unclear set of naming and layout conventions was followed.

If you want to risk an even closer look[^4] you can download [version 1.2 or earlier](https://www.openlb.net/download/) and start your dive in `src/dynamics/latticeDescriptors.h`. Otherwise we are going to continue with a description of the new approach.

[^4]: Note that this examination of the issues with the previous descriptor concept is not aimed to be a strike at its original developers but rather as an example of how things can get out of hand when expanding a initial concept to cover more and more stuff. As far as legacy code is concerned this is still relatively tame and obviously the niceness of such scaffolding for the actual simulation is a side show when one first and foremost wants to generate new results.

## What is a meta descriptor?

The initial spark for the development of the new meta descriptor concept was the idea to define external fields as the parametrization of a multilinear function on the foundational `D` and `Q` constants of each descriptor[^5]. Lists of such functions could then be passed around via variadic template argument lists. This would then allow for handling external fields in a manner that is both flexible and consistent across all descriptors.

[^5]: i.e. each field describes its size as a function $f : \mathbb{N}_0^3 \to \mathbb{N}_0, (a,b,c) \mapsto a + b D + c Q$

Before we delve into the details if how these expectations were implemented let us first take a look at how the basic `D2Q9` descriptor is defined in the latest OpenLB release:

```cpp
template <typename... FIELDS>
struct D2Q9 : public DESCRIPTOR_BASE<2,9,POPULATION,FIELDS...> {
  typedef D2Q9<FIELDS...> BaseDescriptor;
  D2Q9() = delete;
};

namespace data {

template <>
constexpr int vicinity<2,9> = 1;

template <>
constexpr int c<2,9>[9][2] = {
  { 0, 0},
  {-1, 1}, {-1, 0}, {-1,-1}, { 0,-1},
  { 1,-1}, { 1, 0}, { 1, 1}, { 0, 1}
};

template <>
constexpr int opposite<2,9>[9] = {
  0, 5, 6, 7, 8, 1, 2, 3, 4
};

template <>
constexpr Fraction t<2,9>[9] = {
  {4, 9}, {1, 36}, {1, 9}, {1, 36}, {1, 9},
  {1, 36}, {1, 9}, {1, 36}, {1, 9}
};

template <>
constexpr Fraction cs2<2,9> = {1, 3};

}
```

These few compact lines[^51] describe the whole structure including all of its data. The various functions to access this data are auto-generated in a generic fashion using template metaprogramming and the previously verbose definition of a forced LB model reduces to a single self-explanatory type alias:

[^51]: See `src/dynamics/latticeDescriptors.h`

```cpp
using ForcedD2Q9Descriptor = D2Q9<FORCE>;
```

Descriptor data is now exposed via an adaptable set of free functions templated on the descriptor type. This was required to satisfy a secondary goal of decoupling descriptor data definitions and accesses in order to add support for both transparent auto-generation and platform adaptation (i.e. adding workarounds for porting the code to the GPU).

```cpp
/// Refactored generic equilibrium implementation
static T equilibrium(int iPop, T rho, const T u[DESCRIPTOR::d], const T uSqr)
{
  T c_u = T{};
  for (int iD = 0; iD < DESCRIPTOR::d; ++iD) {
    c_u += descriptors::c<DESCRIPTOR>(iPop,iD) * u[iD];
  }
  return rho
       * descriptors::t<T,DESCRIPTOR>(iPop)
       * ( T{1}
         + descriptors::invCs2<T,DESCRIPTOR>() * c_u
         + descriptors::invCs2<T,DESCRIPTOR>()
           * descriptors::invCs2<T,DESCRIPTOR>()
           * T{0.5} * c_u * c_u
         - descriptors::invCs2<T,DESCRIPTOR>()</