aboutsummaryrefslogtreecommitdiff
path: root/articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md
diff options
context:
space:
mode:
Diffstat (limited to 'articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md')
-rw-r--r--articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md240
1 files changed, 240 insertions, 0 deletions
diff --git a/articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md b/articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md
new file mode 100644
index 0000000..24c000e
--- /dev/null
+++ b/articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md
@@ -0,0 +1,240 @@
+# Mapping binary structures as tuples using template metaprogramming
+
+My current programming related interests are centered mainly around low level data storage and querying techniques - i.e. implementing my own database.
+The first step in this direction was a graph storage library based on Google [LevelDB](https://code.google.com/p/leveldb/). While the resulting prototype [libGraphStorage](https://github.com/KnairdA/GraphStorage/)
+performs fairly well it just doesn't fulfill all my expectations concerning performance and storage requirements. Additionally the decision of building a graph storage on top of
+a key value store proofed to be too limiting as my mental image of the kind of system I wanted to develop changed over time. While I now have a clearer plan about what I want to
+build I am also back to square one of actually implementing anything. As I chose to abandon the concept of building higher level data structures on top of existing key-value store
+libraries in favor of building everything from the system calls up by myself, I am now faced among other things with implementing nice abstractions around mapping raw binary data
+into various structures.
+
+For the purpose of this article I am going to limit myself to flat structures with fixed length fields. The result I am aiming for is a template which enables me to
+write mappings like [EdgeId](https://github.com/KnairdA/GraphStorage/blob/master/src/storage/id/edge_id.cc) in a more compact and efficient way. This also includes support for handling
+differences in endianness and In-place modification of the structure fields.
+
+### Mapping buffers as tuples
+
+To be able to easily work with structure definitions using template metaprogramming I am relying on the standard libraries [_std::tuple_](http://en.cppreference.com/w/cpp/utility/tuple)
+template.
+
+ template<typename Tuple>
+ class BinaryMapping {
+ public:
+ BinaryMapping(uint8_t* const buffer):
+ buffer_(buffer) {
+ TupleReader::read(this->tuple_, buffer);
+ }
+
+ template <size_t Index>
+ decltype(*std::get<Index>(Tuple{})) get() {
+ return *std::get<Index>(this->tuple_);
+ }
+
+ template <size_t Index,
+ typename Value = decltype(*std::get<Index>(Tuple{}))>
+ void set(const Value value) {
+ *std::get<Index>(this->tuple_) = value;
+ }
+
+ private:
+ uint8_t* const buffer_;
+ Tuple tuple_;
+
+ };
+
+This implementation of a template class _BinaryMapping_ provides _get_ and _set_ template methods for accessing values in a given binary buffer using the mapping provided by a given
+Tuple template argument. The most notable element of this class is the usage of the _decltype_ keyword which was introduced in C++11. This keyword makes it easier to declare types
+dependent on template arguments or other types that are difficult to declare using the standard notations. This is possible because _decltype_ exposes the return type of an expression
+during the compilation phase. So the expression `decltype(*std::get<Index>(Tuple{}))` is replaced with the return type of the expression `*std::get<Index>(Tuple{})` during template
+instantiation. As the return type of this expression is dependent on the template arguments _Index_ and _Tuple_ the return type of the template method which is using a _decltype_ expression
+as its return type is also dependent on the template arguments.
+
+As you may have noticed the above template class is not complete as I have not included a implementation of the _TupleReader::read_ method which does the actual work of mapping the binary
+buffer as a tuple. This mapping is achieved by the following recursive template methods:
+
+ struct TupleReader {
+ template <typename Tuple, size_t Index = 0, off_t Offset = 0>
+ static inline typename std::enable_if<
+ Index == std::tuple_size<Tuple>::value, void
+ >::type read(Tuple&, uint8_t*) { }
+
+ template <typename Tuple, size_t Index = 0, off_t Offset = 0>
+ static inline typename std::enable_if<
+ Index < std::tuple_size<Tuple>::value, void
+ >::type read(Tuple& tuple, uint8_t* buffer) {
+ std::get<Index>(tuple) = reinterpret_cast<
+ typename std::tuple_element<Index, Tuple>::type
+ >(buffer+Offset);
+
+ read<Tuple,
+ Index + 1,
+ Offset + sizeof(decltype(*std::get<Index>(Tuple{})))>(
+ tuple, buffer
+ );
+ }
+ };
+
+Template metaprogramming in C++ offers a Turing-complete language which is fully executed during compilation. This means that any problem we may solve during the runtime of a _normal_ program
+may also be solved during compilation using template metaprogramming techniques. This kind of programming is comparable to functional programming as we have to rely on recursion and pattern
+matching.
+The above coding contains two overloads of the _read_ method. The first is our exit condition which stops the recursion as soon as we have processed every tuple element. The second one matches
+every element of a given tuple and sets the pointer of each element to the matching position in the binary buffer. To make this work I use the following four standard library templates:
+
+[_std::enable\_if_](http://en.cppreference.com/w/cpp/types/enable_if) makes it possible to exclude template instantiations from overload resolution based on a condition passed as a template
+argument. The condition has to be a statically resolvable boolean expression, the second argument of the template is the type to be returned when the condition resolves to a true value.
+_TupleReader_ uses this template to match the recursive version of the _read_ method to any _Index_ value below the tuple element count and the empty version to the case where the _Index_
+argument equals the tuple element count. This equals an escape condition for the recursion started by the second overload of the _read_ method as it increases the _Index_ on each call.
+
+[_std::get_](http://en.cppreference.com/w/cpp/utility/tuple/get) returns a reference to the Nth element of the given tuple as specified by the corresponding template argument.
+The methods in _TupleReader_ use this template to set the pointer on each element to the appropriate buffer offset.
+
+[_std::tuple\_size_](http://en.cppreference.com/w/cpp/utility/tuple/tuple_size) returns the number of elements in the _std::tuple_ instantiation passed as the template argument. This value is
+used as a reference value to compare the current recursion index to.
+
+[_std::tuple\_element_](http://en.cppreference.com/w/cpp/utility/tuple/tuple_element) returns the type of the Nth element of a given _std::tuple_ instantiation as specified by the template
+argument. I use this template to get the type to which the current buffer offset has to be cast to match the pointer of the corresponding tuple element.
+
+While the _Index_ argument of the _read_ template method is increased by one, the _Offset_ value is increased by the size of the current tuple element type so the current total buffer offset
+is always available as a template argument.
+
+The classes _TupleReader_ and _BinaryMapping_ are enough to map a binary structure as a _std::tuple_ instantiation like in the following example where I define a _TestRecord_ tuple containing
+a pointer to _uint64\_t_ and _uint16\_t_ integers:
+
+ typedef std::tuple<uint64_t*, uint16_t*> TestRecord;
+
+ uint8_t* buffer = reinterpret_cast<uint8_t*>(
+ std::calloc(10, sizeof(uint8_t))
+ );
+
+ BinaryMapping<TestRecord> mapping(buffer);
+
+ mapping.set<0>(42);
+ mapping.set<1>(1337);
+
+ std::cout << mapping.get<0>() << std::endl;
+ std::cout << mapping.get<1>() << std::endl;
+
+### Endianness
+
+As you may remember this does not take endianness into account as I defined as a requirement in the beginning. I first thought about including support for different endianness types into the
+_BinaryMapping_ template class which worked, but led to problems as soon as I mixed calls to _get_ and _set_. The resulting problems could of course have been fixed but this would probably
+have conflicted with In-place modifications of the buffer. Because of that I chose to implement endianness support in a separate set of templates.
+
+ struct BigEndianSorter {
+ template <class Key>
+ static void write(uint8_t*, const Key&);
+
+ template <typename Key>
+ static Key read(uint8_t* buffer);
+ };
+
+To be able to work with different byte orderings I abstracted the basic operations down to _read_ and _write_ template methods contained in a _struct_ so I would be able to provide separate
+implementations of these methods for each type of endianness. The following template specialization of the _write_ method which does an In-place reordering of a _uint64\_t_ should be enough
+to understand the basic principle:
+
+ template <>
+ void BigEndianSorter::write<uint64_t>(uint8_t* buffer, const uint64_t& number) {
+ *reinterpret_cast<uint64_t*>(buffer) = htobe64(number);
+ }
+
+As soon as I had the basic endianness conversion methods implemented in a manner which could be used to specialize other template classes I was able to build a generic implementation of a
+serializer which respects the structure defined by a given _std::tuple_ instantiation:
+
+ template <class ByteSorter>
+ struct Serializer {
+ template <typename Tuple, size_t Index = 0, off_t Offset = 0>
+ static inline typename std::enable_if<
+ Index == std::tuple_size<Tuple>::value, void
+ >::type serialize(uint8_t*) { }
+
+ template <typename Tuple, size_t Index = 0, off_t Offset = 0>
+ static inline typename std::enable_if<
+ Index < std::tuple_size<Tuple>::value, void
+ >::type serialize(uint8_t* buffer) {
+ ByteSorter::template write<typename std::remove_reference<
+ decltype(*std::get<Index>(Tuple{}))
+ >::type>(
+ buffer+Offset,
+ *reinterpret_cast<typename std::tuple_element<Index, Tuple>::type>(
+ buffer+Offset
+ )
+ );
+
+ serialize<Tuple,
+ Index + 1,
+ Offset + sizeof(decltype(*std::get<Index>(Tuple{})))>(
+ buffer
+ );
+ }
+
+ template <typename Tuple, size_t Index = 0, off_t Offset = 0>
+ static inline typename std::enable_if<
+ Index == std::tuple_size<Tuple>::value, void
+ >::type deserialize(uint8_t*) { }
+
+ template <typename Tuple, size_t Index = 0, off_t Offset = 0>
+ static inline typename std::enable_if<
+ Index < std::tuple_size<Tuple>::value, void
+ >::type deserialize(uint8_t* buffer) {
+ *reinterpret_cast<typename std::tuple_element<Index, Tuple>::type>(
+ buffer+Offset
+ ) = *ByteSorter::template read<
+ typename std::tuple_element<Index, Tuple>::type
+ >(buffer+Offset);
+
+ deserialize<Tuple,
+ Index + 1,
+ Offset + sizeof(decltype(*std::get<Index>(Tuple{})))>(
+ buffer
+ );
+ }
+ };
+
+It should be evident that the way both the _serialize_ and _deserialize_ template methods are implemented is very similar to the _TupleReader_ implementation. In fact the only difference
+is that no actual _std::tuple_ instantiation instance is touched and instead of setting pointers to the buffer we are only reordering the bytes of each section of the buffer corresponding to
+a tuple element. This results in a complete In-place conversion between different byte orderings using the methods provided by a _ByteSorter_ template such as _BigEndianSorter_.
+
+### Conclusion
+
+At last I am now able to do everything I planned in the beginning in a very compact way using the _Serializer_, _TupleReader_ and _BinaryMapping_ templates. In practice this now looks like this:
+
+ typedef std::tuple<uint64_t*, uint16_t*> TestRecord;
+
+ uint8_t* buffer = reinterpret_cast<uint8_t*>(
+ std::calloc(10, sizeof(uint8_t))
+ );
+
+ BinaryMapping<TestRecord> mapping(buffer);
+
+ mapping.set<0>(42);
+ mapping.set<1>(1001);
+
+ Serializer<BigEndianSorter>::serialize<TestRecord>(buffer);
+
+ uint8_t* testBuffer = reinterpret_cast<uint8_t*>(
+ std::calloc(10, sizeof(uint8_t))
+ );
+
+ std::memcpy(testBuffer, buffer, 10);
+
+ Serializer<BigEndianSorter>::deserialize<TestRecord>(testBuffer);
+
+ BinaryMapping<TestRecord> testMapping(testBuffer);
+
+ std::cout << testMapping.get<0>() << std::endl;
+ std::cout << testMapping.get<1>() << std::endl;
+
+ std::free(buffer);
+ std::free(testBuffer);
+
+The above coding makes use of all features provided by the described templates by first setting two values using _BinaryMapping_ specialized on the _TestRecord_ tuple, serializing them using
+_Serializer_ specialized on the _BigEndianSorter_, deserializing the buffer back to the host byte ordering and reading the values using another _BinaryMapping_.
+
+I find the template metaprogramming based approach to mapping binary structures into tuples described in this article to be very nice and clear. While template metaprogramming takes a while to
+get into it is a very powerful method for describing code in a generic way. I would certainly not recommend to try and fit everything in a template based approach but in some cases - especially
+when one would otherwise be required to write lots of boilerplate code repeatedly - it just fits and works out rather nicely. The best example for this would probably be the standard library
+which basically is _just_ a large library of templates.
+The next step in extending the templates explained in this article would probably be adapting the _BinaryMapping_ template to offer sliding window like iteration over larger buffers and extending
+the supported data types.
+
+__Update:__ The current version of a small C++ template library extending the _BinaryMapping_ templates detailed in this article may be found on [Github](https://github.com/KnairdA/BinaryMapping/) or [cgit](http://code.kummerlaender.eu/BinaryMapping/).