aboutsummaryrefslogtreecommitdiff
path: root/articles/2019-05-25_meta_descriptor.md
blob: 98780e0f72ffa462d223aa64310337b9571a9997 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
# Expressive meta templates for flexible handling of compile-time constants

So [we](http://www.lbrg.kit.edu/) recently released a new version of [OpenLB](http://www.openlb.net) which includes a major refactoring of the central datastructure used to handle various kinds of compile-time constants required by the simulation. This article will summarize the motivation and design of this new concept as well as highlight a couple of tricks and pitfalls in the context of template metaprogramming.

## What is a descriptor?

Every simulation based on Lattice Boltzmann Methods can be characterized by a set of constants such as the modelled spatial dimension, the number of neighbors in the underlying regular grid, the weights used to compute equilibrium distributions or the lattice speed of sound. Due to OpenLB's goal of offering a wide variety of LB models to address many different kinds of flow problems, the constants are not hardcoded throughout the codebase but rather maintained in compile-time data structures. Any usage of these constants can then refer to the characterizing descriptor data structure.

```cpp
/// Old equilibrium implementation using descriptor data
static T equilibrium(int iPop, T rho, const T u[DESCRIPTOR::d], const T uSqr)
{
  T c_u = T();
  for (int iD=0; iD < DESCRIPTOR::d; ++iD) {
    c_u += DESCRIPTOR::c[iPop][iD]*u[iD];
  }
  return rho
       * DESCRIPTOR::t[iPop]
       * ( (T)1
         + DESCRIPTOR::invCs2 * c_u
         + DESCRIPTOR::invCs2 * DESCRIPTOR::invCs2 * (T)0.5 * c_u * c_u
         - DESCRIPTOR::invCs2 * (T)0.5 * uSqr )
       - DESCRIPTOR::t[iPop];
}
```

As many parts of the code do not actually care which specific descriptor is used, most classes and functions are templates that accept any user-defined descriptor type. This allows us to e.g. select descriptor specific optimizations[^1] via plain template specializations.

To continue, the descriptor concept is tightly coupled to the definition of the cells that make up the simulation lattice. The reason for this connection is that we require some place to store the essential per-direction population fields for each node of the lattice. In OpenLB this place is currently the `Cell` class[^2] which locally maintains the population data and as such implements a collision-optimized _array of structures_ memory layout. As a side note this was the initial motivation for rethinking the descriptor concept as we require more flexible structures to turn this into a more efficient _structures of arrays_ situation[^3].

[^1]: e.g. collision steps where all generic code is resolved using common subexpression elimination in order to minimze the number of floating point operations
[^2]: see `src/core/cell.h` for further reading
[^3]: The performance LBM codes is in general not bound by the available processing power but but rather by how well we utilize the available memory bandwidth. i.e. we want to optimize memory throughput as much as possible which leads us to the need for more efficient streaming steps that in turn require changes to the memory layout.

## What was used prior to refactoring?

To better appreciate the new concept we should probably first take a closer look at how stuff this stuff was implemented previously. As a starting point all descriptors were derived from a descriptor base type such as `D2Q9DescriptorBase` for two dimensional lattices with nine discrete velocities:

```cpp
template <typename T>
struct D2Q9DescriptorBase {
  typedef D2Q9DescriptorBase<T> BaseDescriptor;
  enum { d = 2, q = 9 };        ///< number of dimensions/distr. functions
  static const int vicinity;    ///< size of neighborhood
  static const int c[q][d];     ///< lattice directions
  static const int opposite[q]; ///< opposite entry
  static const T t[q];          ///< lattice weights
  static const T invCs2;        ///< inverse square of speed of sound
};
```

As we can see this is a plain struct template with some static member constants to store the data. This in itself is not problematic and worked just fine since the project's inception. Note that the template allows for specification of the floating point type used for all non-integer data. This is required to e.g. use automatic differentiation types that allow for taking the derivative of the whole simulation in order to apply optimization techniques.

```cpp
template<typename T>
const int D2Q9DescriptorBase<T>::vicinity = 1;

template<typename T>
const int D2Q9DescriptorBase<T>::c
[D2Q9DescriptorBase<T>::q][D2Q9DescriptorBase<T>::d] = {
  { 0, 0},
  {-1, 1}, {-1, 0}, {-1,-1}, { 0,-1},
  { 1,-1}, { 1, 0}, { 1, 1}, { 0, 1}
};

template<typename T>
const int D2Q9DescriptorBase<T>::opposite[D2Q9DescriptorBase<T>::q] = {
  0, 5, 6, 7, 8, 1, 2, 3, 4
};

template<typename T>
const T D2Q9DescriptorBase<T>::t[D2Q9DescriptorBase<T>::q] = {
  (T)4/(T)9, (T)1/(T)36, (T)1/(T)9, (T)1/(T)36, (T)1/(T)9,
  (T)1/(T)36, (T)1/(T)9, (T)1/(T)36, (T)1/(T)9
};

template<typename T>
const T D2Q9DescriptorBase<T>::invCs2 = (T)3;
```

The actual data was stored in a separate header `src/dynamics/latticeDescriptors.hh`. All in all this very straight forward approach worked as expected and could be fully resolved at compile time to avoid unnecessary run time jumps inside critical code sections as far as the descriptor concept is concerned. The real issue starts when we take a look at the so called _external fields_:

```cpp
struct Force2dDescriptor {
  static const int numScalars = 2;
  static const int numSpecies = 1;
  static const int forceBeginsAt = 0;
  static const int sizeOfForce   = 2;
};

struct Force2dDescriptorBase {
  typedef Force2dDescriptor ExternalField;
};

template <typename T> struct ForcedD2Q9Descriptor
  : public D2Q9DescriptorBase<T>, public Force2dDescriptorBase {
};
```

Some LBM models require additional per-cell data such as external force vectors or values to model chemical properties. As we can see the declaration of these _external fields_ is another task of the descriptor data structure and _the_ task that was solved the ugliest in our original implementation.

```cpp
// Set force vectors in all cells of material number 1
sLattice.defineExternalField( superGeometry, 1,
                              DESCRIPTOR<T>::ExternalField::forceBeginsAt,
                              DESCRIPTOR<T>::ExternalField::sizeOfForce,
                              force );
```

For example this is a completely unsafe access to raw memory as `forceBeginsAt` and `sizeOfForce` define arbitrary memory offsets. And while we might not care about security in this context you can probably imagine the kinds of obscure bugs caused by potentially faulty and inconsistent handling of such offsets. To make things worse the naming of external field indices and size constants was inconsistent between different fields and stuff only worked as long as a unclear set of naming and layout conventions was followed.

If you want to risk an even closer look[^4] you can download [version 1.2 or earlier](https://www.openlb.net/download/) and start your dive in `src/dynamics/latticeDescriptors.h`. Otherwise we are going to continue with a description of the new approach.

[^4]: Note that this examination of the issues with the previous descriptor concept is not aimed to be a strike at its original developers but rather as an example of how things can get out of hand when expanding a initial concept to cover more and more stuff. As far as legacy code is concerned this is still relatively tame and obviously the niceness of such scaffolding for the actual simulation is a side show when one first and foremost wants to generate new results.

## What is a meta descriptor?

The initial spark for the development of the new meta descriptor concept was the idea to define external fields as the parametrization of a multilinear function on the foundational `D` and `Q` constants of each descriptor[^5]. Lists of such functions could then be passed around via variadic template argument lists. This would then allow for handling external fields in a manner that is both flexible and consistent across all descriptors.

[^5]: i.e. each field describes its size as a function $f : \mathbb{N}_0^3 \to \mathbb{N}_0, (a,b,c) \mapsto a + b D + c Q$

Before we delve into the details if how these expectations were implemented let us first take a look at how the basic `D2Q9` descriptor is defined in the latest OpenLB release:

```cpp
template <typename... FIELDS>
struct D2Q9 : public DESCRIPTOR_BASE<2,9,POPULATION,FIELDS...> {
  typedef D2Q9<FIELDS...> BaseDescriptor;
  D2Q9() = delete;
};

namespace data {

template <>
constexpr int vicinity<2,9> = 1;

template <>
constexpr int c<2,9>[9][2] = {
  { 0, 0},
  {-1, 1}, {-1, 0}, {-1,-1}, { 0,-1},
  { 1,-1}, { 1, 0}, { 1, 1}, { 0, 1}
};

template <>
constexpr int opposite<2,9>[9] = {
  0, 5, 6, 7, 8, 1, 2, 3, 4
};

template <>
constexpr Fraction t<2,9>[9] = {
  {4, 9}, {1, 36}, {1, 9}, {1, 36}, {1, 9},
  {1, 36}, {1, 9}, {1, 36}, {1, 9}
};

template <>
constexpr Fraction cs2<2,9> = {1, 3};

}
```

These few compact lines[^51] describe the whole structure including all of its data. The various functions to access this data are auto-generated in a generic fashion using template metaprogramming and the previously verbose definition of a forced LB model reduces to a single self-explanatory type alias:

[^51]: See `src/dynamics/latticeDescriptors.h`

```cpp
using ForcedD2Q9Descriptor = D2Q9<FORCE>;
```

Descriptor data is now exposed via an adaptable set of free functions templated on the descriptor type. This was required to satisfy a secondary goal of decoupling descriptor data definitions and accesses in order to add support for both transparent auto-generation and platform adaptation (i.e. adding workarounds for porting the code to the GPU).

```cpp
/// Refactored generic equilibrium implementation
static T equilibrium(int iPop, T rho, const T u[DESCRIPTOR::d], const T uSqr)
{
  T c_u = T{};
  for (int iD = 0; iD < DESCRIPTOR::d; ++iD) {
    c_u += descriptors::c<DESCRIPTOR>(iPop,iD) * u[iD];
  }
  return rho
       * descriptors::t<T,DESCRIPTOR>(iPop)
       * ( T{1}
         + descriptors::invCs2<T,DESCRIPTOR>() * c_u
         + descriptors::invCs2<T,DESCRIPTOR>()
           * descriptors::invCs2<T,DESCRIPTOR>()
           * T{0.5} * c_u * c_u
         - descriptors::invCs2<T,DESCRIPTOR>()
           * T{0.5} * uSqr )
       - descriptors::t<T,DESCRIPTOR>(iPop);
}
```

The inclusion of the `descriptors` namespace slightly increases the verbosity of functions such as the one above. If things get to bad we can use local namespace inclusion as a workaround. But even if this was not possible the transparent extensibility (i.e. the ability to customize the underlying implementation without changing all call sites) more than makes up for increasing the character count of some sections.

## Implementation

Back in 2013 I experimented with [_mapping binary structures as tuples using template metaprogramming_](/article/mapping_binary_structures_as_tuples_using_template_metaprogramming/) in order to develop the foundations for a graph database. Surprisingly there were quite a few parallels between what I was doing then to what I am describing in this article. While I neither used the resulting [BinaryMapping](https://github.com/KnairdA/BinaryMapping) library for the development of [GraphStorage](https://github.com/KnairdA/GraphStorage) nor ever used this then LevelDB-based graph _database_ for more than a couple of basic examples, it was a welcome surprise to think back to my first steps doing more template-centered C++ programming.

```cpp
/// Base descriptor of a D-dimensional lattice with Q directions and a list of additional fields
template <unsigned D, unsigned Q, typename... FIELDS>
struct DESCRIPTOR_BASE {
  /// Deleted constructor to enforce pure usage as type and prevent implicit narrowing conversions
  DESCRIPTOR_BASE() = delete;

  /// Number of dimensions
  static constexpr int d = D;
  /// Number of velocities
  static constexpr int q = Q;

  /* [...] */
};
```

As the description of any LBM model includes at least a number of spatial dimensions `D` and a number of discrete velocities `Q` these two constants are the required template arguments of the new `DESCRIPTOR_BASE` class template[^61]. Until we finally get concepts in C++, the members of the `FIELDS` list are by convention expected to offer a `size` and `getLocalIndex` template methods accepting these two foundational constants.

[^61]: See `src/dynamics/descriptorBase.h`

```cpp
/// Base of a descriptor field whose size is defined by A*D + B*Q + C
template <unsigned C, unsigned A=0, unsigned B=0>
struct DESCRIPTOR_FIELD_BASE {
  /// Deleted constructor to enforce pure usage as type and prevent implicit narrowing conversions
  DESCRIPTOR_FIELD_BASE() = delete;

  /// Evaluates the size function
  template <unsigned D, unsigned Q>
  static constexpr unsigned size()
  {
    return A * D + B * Q + C;
  }

  /// Returns global index from local index and provides out_of_range safety
  template <unsigned D, unsigned Q>
  static constexpr unsigned getLocalIndex(const unsigned localIndex)
  {
    return localIndex < (A*D+B*Q+C) ? localIndex : throw std::out_of_range("Index exceeds data field");
  }
};
```

Most[^62] fields use the `DESCRIPTOR_FIELD_BASE` template as a base class. This template parametrizes the previously mentioned multilinear size function and allows for sharing field definitions between all descriptors.

```cpp
// Field types need to be distinct (i.e. not aliases) in order for `DESCRIPTOR_BASE::index` to work
// (Field size parametrized by: Cs + Ds*D + Qs*Q)          Cs Ds Qs
struct POPULATION           : public DESCRIPTOR_FIELD_BASE<0,  0, 1> { };
struct FORCE                : public DESCRIPTOR_FIELD_BASE<0,  1, 0> { };
struct SOURCE               : public DESCRIPTOR_FIELD_BASE<1,  0, 0> { };
/* [...] */
```

Let us take the `FORCE` field as an example[^63]: This field represents a cell-local force vector and as such requires exactly `D` floating point values worth of storage. Correspondingly its base class is `DESCRIPTOR_FIELD_BASE<0,1,0>` which yields a size of `2` for two-dimensional and `3` for three-dimensional descriptors.

[^62]: e.g. there is also a `TENSOR` base template that encodes the size of a tensor of order `D` (which is not a linear function)
[^63]: Common field definitions are collected in `src/dynamics/descriptorField.h`

Building upon this common field structure allows us to write down a `getIndexFromFieldList` helper function template that automatically calculates the starting offset of any element in an arbitrary list of fields:

```cpp
template <
  unsigned D,
  unsigned Q,
  typename WANTED_FIELD,
  typename CURRENT_FIELD,
  typename... FIELDS,
  // WANTED_FIELD equals the head of our field list, terminate recursion
  std::enable_if_t<std::is_same<WANTED_FIELD,CURRENT_FIELD>::value, int> = 0
>
constexpr unsigned getIndexFromFieldList()
{
  return 0;
}

template <
  unsigned D,
  unsigned Q,
  typename WANTED_FIELD,
  typename CURRENT_FIELD,
  typename... FIELDS,
  // WANTED_FIELD doesn't equal the head of our field list
  std::enable_if_t<!std::is_same<WANTED_FIELD,CURRENT_FIELD>::value, int> = 0
>
constexpr unsigned getIndexFromFieldList()
{
  // Break compilation when WANTED_FIELD is not provided by list of fields
  static_assert(sizeof...(FIELDS) > 0, "Field not found.");

  // Add size of current field to implicit offset and continue search
  // for WANTED_FIELD in the tail of our field list
  return CURRENT_FIELD::template size<D,Q>() + getIndexFromFieldList<D,Q,WANTED_FIELD,FIELDS...>();
}
```

As far as template metaprogramming is concerned this code is quite basic -- we simply recursively traverse the variadic field list and sum up the field sizes along the way. This function is wrapped by the `DESCRIPTOR_BASE::index` method template that exposes the memory offset of a given field. We are left with a generic interface that replaces our previous inconsistent and hard to maintain field offsets in the vein of `DESCRIPTOR::ExternalField::forceBeginsAt`.

```cpp
/// Returns index of WANTED_FIELD
/**
 * Fails compilation if WANTED_FIELD is not contained in FIELDS.
 * Branching that depends on this information can be realized using `provides`.
 **/
template <typename WANTED_FIELD>
static constexpr int index(const unsigned localIndex=0)
{
  return getIndexFromFieldList<D,Q,WANTED_FIELD,FIELDS...>()
         + WANTED_FIELD::template getLocalIndex<D,Q>(localIndex);
}
```

As we will see in the section on _improved field access_ this method is not commonly used in user code but rather as a building block for self-documenting field accessors. One might notice that the abstraction layers are starting to pile up -- luckily all of them are by themselves rather plain `constexpr` function templates and can as such still be fully collapsed during compile time.

### Fraction types

The alert reader might have noticed that the type of per-direction weight constants `descriptors::data::t` was changed to `Fraction` in our new meta descriptor. The reason for this is that we use variable templates to store these values and C++ sadly doesn't allow partial specializations in this context. To elaborate, we are not allowed to write:

```cpp
template <typename T>
constexpr Fraction t<T,2,9>[9] = {
  T{4}/T{9}, T{1}/T{36}, T{1}/T{9}, T{1}/T{36}, T{1}/T{9},
  T{1}/T{36}, T{1}/T{9}, T{1}/T{36}, T{1}/T{9}
};
```

To work around this issue I wrote a small floating-point independent fraction type:

```cpp
class Fraction {
private:
  const int _numerator;
  const int _denominator;

public:
  /* [...] */

  template <typename T>
  constexpr T as() const
  {
    return T(_numerator) / T(_denominator);
  }

  template <typename T>
  constexpr T inverseAs() const
  {
    return _numerator != 0
         ? T(_denominator) / T(_numerator)
         : throw std::invalid_argument("inverse of zero is undefined");
  }
};
```

This works well for both integral and automatically differentiable floating point types and even yields a more pleasant syntax for defining fractional descriptor values due to C++'s implicit constructor calls. One remaining hiccup is the representation of values such as square roots that are not easily expressed as readable rational numbers. Such weights are required by some more exotic LB models and currently stored by explicit specialization for any required type. A slightly surprising fact in this context is that the C++ standard doesn't require some functions such as `std::sqrt` to be `constexpr`. This problem remained undetected for quite a while as e.g. GCC fixes this issue in a non-standard extension. So in the long term we are going to have to invest some more effort into adding compile-time math functions in the vein of [GCEM](https://www.kthohr.com/gcem.html).

### Tagging free functions

As I hinted previously one major change besides the refactoring of the actual descriptor structure was the introduction of an abstraction layer between data and call sites. i.e. where we previously wrote `DESCRIPTOR<T>::t[i]` to directly access the ith weight we now call a free function `descriptors::t<T,DESCRIPTOR>(i)`. The advantage of this additional layer is the ability to transparently switch out the underlying data source. Furthermore we can easily expand such free functions to distinguish between various descriptor specializations at compile time via tagging.

```cpp
template <typename T, unsigned D, unsigned Q>
constexpr T t(unsigned iPop, tag::DEFAULT)
{
return data::t<D,Q>[iPop].template as<T>();
}

template <typename T, typename DESCRIPTOR>
constexpr T t(unsigned iPop)
{
return t<T, DESCRIPTOR::d, DESCRIPTOR::q>(iPop,
                                          typename DESCRIPTOR::category_tag());
}
```

This powerful concept uses C++'s function overload resolution to transparently call different implementations based on the given template arguments in a very compact fashion. As an example we can mark a descriptor using some non-default tag `tag::SPECIAL` and implement a function `T t(unsigned iPop, tag::SPECIAL)` to do some _special_ stuff for this descriptor -- the definition of both the tag and its function overload can be written anywhere in the codebase and will be automatically resolved by the generic implementation. This adds a whole new level of extensibility to OpenLB and is currently used to e.g. handle the special requirements of MRT LBM models.

### Extracting tags from lists

One might have noticed that we accessed a `DESCRIPTOR::category_tag` typedef to select the correct function overload. While the canonical way to do function tagging is to simply define this type on a case by case basis in any tagged structure, I chose to develop something slightly more sophisticated: Tags are represented as special zero-size fields[^7] and passed to the descriptor specialization alongside any other fields. This feels quite nice and results in a very expressive and self-documenting interface for defining new descriptors.

[^7]: See `src/dynamics/descriptorTag.h`

```cpp
/// Base of a descriptor tag
struct DESCRIPTOR_TAG {
  template <unsigned, unsigned>
  static constexpr unsigned size()
  {
    return 0; // a tag doesn't have a size
  }
};
```

As such `DESCRIPTOR_BASE` is the only place where the `category_tag` type is defined. To do this we filter the given list of fields and select the first _tag-field_ that is derived from our desired _tag-group_ `tag::CATEGORY`.

```cpp
template <typename BASE, typename FALLBACK, typename... FIELDS>
using field_with_base = typename std::conditional<
  std::is_void<typename utilities::meta::list_item_with_base<BASE, FIELDS...>::type>::value,
  FALLBACK,
  typename utilities::meta::list_item_with_base<BASE, FIELDS...>::type
>::type;

/* [...] */

using category_tag = tag::field_with_base<
  tag::CATEGORY, tag::DEFAULT, FIELDS...>;
```

In order to implement the `utilities::meta::list_item_with_base` meta template I referred back to the [_Scheme metaphor for template metaprogramming_](/article/using_scheme_as_a_metaphor_for_template_metaprogramming/) which results in a readable filtering operation based on the tools offered by the standard library's type traits:

```cpp
/// Get first type based on BASE contained in a given type list
/**
 * If no such list item exists, type is void.
 **/
template <
  typename BASE,
  typename HEAD = void, // Default argument in case the list is empty
  typename... TAIL
>
struct list_item_with_base {
  using type = typename std::conditional<
    std::is_base_of<BASE, HEAD>::value,
    HEAD,
    typename list_item_with_base<BASE, TAIL...>::type
  >::type;
};

template <typename BASE, typename HEAD>
struct list_item_with_base<BASE, HEAD> {
  using type = typename std::conditional<
    std::is_base_of<BASE, HEAD>::value,
    HEAD,
    void
  >::type;
};
```

### Improved field access

The last remaining cornerstone of OpenLB's new meta descriptor concept is the introduction of a set of convenient functions to access a cell's field values via the field's name. By taking this final step we get the ability to write simulation code that doesn't handle any raw memory offsets in addition to being more compact. Furthermore we can now in theory completely modify the underlying field storage structures without forcing the user code to change.

```cpp
/// Return pointer to FIELD of cell
template <typename FIELD, typename X = DESCRIPTOR>
std::enable_if_t<X::template provides<FIELD>(), T*>
getFieldPointer()
{
  const int offset = DESCRIPTOR::template index<FIELD>();
  return &(this->data[offset]);
}

template <typename FIELD, typename X = DESCRIPTOR>
std::enable_if_t<!X::template provides<FIELD>(), T*>
getFieldPointer()
{
  throw std::invalid_argument("DESCRIPTOR does not provide FIELD.");
  return nullptr;
}
```

The foundation of all field accessors is a new `Cell::getFieldPointer` method template that resolves the field location using the `DESCRIPTOR_BASE::index` and `DESCRIPTOR_BASE::size` functions we defined previously. Note that we had to loosen our newly gained compile-time guarantee of a field's existence in favour of generating runtime exception code. The reason for this is that most current builds include code that depends on a certain set of fields even if those fields are not actually provided by a given descriptor. While we are going to resolve this unsatisfying situation in the future, this workaround offered an acceptable compromise.

```cpp
/// Set value of FIELD from a vector
template <typename FIELD, typename X = DESCRIPTOR>
std::enable_if_t<(X::template size<FIELD>() > 1), void>
setField(const Vector<T,DESCRIPTOR::template size<FIELD>()>& field)
{
  std::copy_n(
    field.data,
    DESCRIPTOR::template size<FIELD>(),
    getFieldPointer<FIELD>());
}

/// Set value of FIELD from a scalar
template <typename FIELD, typename X = DESCRIPTOR>
std::enable_if_t<(X::template size<FIELD>() == 1), void>
setField(T value)
{
  getFieldPointer<FIELD>()[0] = value;
}
```

Note that disabling a member function specialization depending on its parent's template arguments is only possible with some indirection: The parent template argument `DESCRIPTOR` is passed as the default value to the member function's `X` argument. This parameter can then be used by `std::enable_if` as one would expect.

## Two days of merging

It is probably clear that the set of changes summarized so far mark a far reaching revamp of the existing codebase -- in fact there was scarcely a file untouched after I got everything to work again. As we do not live in an ideal world where I could have developed this in isolation while any other developments are stopped, both the initial prototype and the following rollout to all of OpenLB had to be developed on a seperate branch. Due to the additional hindrance that I am not actually working anywhere close to full-time on this[^8] these changes took quite a few months from inception to full realization. Correspondingly the meta descriptor and master branch had diverged significantly by the time we felt ready to merge -- you can imagine how unpleasant it was to fiddle this back together.

[^8]: After all I am still primarily a mathematics student

I found the three-way merge functionality offered by [Meld](https://meldmerge.org/) to be a most useful tool during this endeavour. My fingers were still twitching in a rhythmic pattern after two days of using this utility to more or less manually merge everything back together but it was still worlds better than the alternative of e.g. resolving the conflicts in a normal text editor.

Sadly even in retrospect I can not think of a better alternative to letting the branches diverge this far: A significant chunk of all lines had to be changed in randomly non-trivial ways and there was no discrete point in between where you could push these changes to the rest of the team with a good conscience. At least further changes to e.g. the foundational cell data structures should now prove to be significantly easier than they would have been without this refactor.

## Summary

All in all I am quite satisfied with how this new concept turned out in practice: The code is smaller and more self-documenting while growing in extensibility and consistency. The internally increased complexity is restricted to a set of classes and meta templates that the ordinary user that just wants to write a simulation should never come in contact with. Some listings in this article might look cryptic at first but as far as template metaprogramming goes this is still reasonable -- we did not run into any serious portability issues and everything works as expected in GCC, Clang and Intel's C++ compiler[^9].

[^9]: I was surprised to learn how big of an advantage the Intel compiler can provide: In some settings the generated code runs up to 20 percent faster compared to what GCC or Clang produce.

To conclude things I want to encourage everyone to check out the latest [OpenLB](https://www.openlb.net/news/openlb-release-1-3-available/) release to see these and other interesting new features in practive. Should this article have awoken any interest in CFD using Lattice Boltzmann Methods, a fun introduction is provided by my [previous article](/article/fun_with_compute_shaders_and_fluid_dynamics/) on just this topic.