From d3a6e17e32578a2c4c376626cbe5401cb57cd097 Mon Sep 17 00:00:00 2001 From: Adrian Kummerlaender Date: Thu, 24 Jul 2014 16:46:52 +0200 Subject: Added basic article datasource and output transformations * articles contain the handle and date in their file name ** i.e. it is split using XPath string functions ** usage of the ISO date format provides automatic ordering * added some articles from my blog as example data --- ...res_as_tuples_using_template_metaprogramming.md | 240 +++++++++++++++++++++ 1 file changed, 240 insertions(+) create mode 100644 articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md (limited to 'articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md') diff --git a/articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md b/articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md new file mode 100644 index 0000000..24c000e --- /dev/null +++ b/articles/2013-11-03_mapping_binary_structures_as_tuples_using_template_metaprogramming.md @@ -0,0 +1,240 @@ +# Mapping binary structures as tuples using template metaprogramming + +My current programming related interests are centered mainly around low level data storage and querying techniques - i.e. implementing my own database. +The first step in this direction was a graph storage library based on Google [LevelDB](https://code.google.com/p/leveldb/). While the resulting prototype [libGraphStorage](https://github.com/KnairdA/GraphStorage/) +performs fairly well it just doesn't fulfill all my expectations concerning performance and storage requirements. Additionally the decision of building a graph storage on top of +a key value store proofed to be too limiting as my mental image of the kind of system I wanted to develop changed over time. While I now have a clearer plan about what I want to +build I am also back to square one of actually implementing anything. As I chose to abandon the concept of building higher level data structures on top of existing key-value store +libraries in favor of building everything from the system calls up by myself, I am now faced among other things with implementing nice abstractions around mapping raw binary data +into various structures. + +For the purpose of this article I am going to limit myself to flat structures with fixed length fields. The result I am aiming for is a template which enables me to +write mappings like [EdgeId](https://github.com/KnairdA/GraphStorage/blob/master/src/storage/id/edge_id.cc) in a more compact and efficient way. This also includes support for handling +differences in endianness and In-place modification of the structure fields. + +### Mapping buffers as tuples + +To be able to easily work with structure definitions using template metaprogramming I am relying on the standard libraries [_std::tuple_](http://en.cppreference.com/w/cpp/utility/tuple) +template. + + template + class BinaryMapping { + public: + BinaryMapping(uint8_t* const buffer): + buffer_(buffer) { + TupleReader::read(this->tuple_, buffer); + } + + template + decltype(*std::get(Tuple{})) get() { + return *std::get(this->tuple_); + } + + template (Tuple{}))> + void set(const Value value) { + *std::get(this->tuple_) = value; + } + + private: + uint8_t* const buffer_; + Tuple tuple_; + + }; + +This implementation of a template class _BinaryMapping_ provides _get_ and _set_ template methods for accessing values in a given binary buffer using the mapping provided by a given +Tuple template argument. The most notable element of this class is the usage of the _decltype_ keyword which was introduced in C++11. This keyword makes it easier to declare types +dependent on template arguments or other types that are difficult to declare using the standard notations. This is possible because _decltype_ exposes the return type of an expression +during the compilation phase. So the expression `decltype(*std::get(Tuple{}))` is replaced with the return type of the expression `*std::get(Tuple{})` during template +instantiation. As the return type of this expression is dependent on the template arguments _Index_ and _Tuple_ the return type of the template method which is using a _decltype_ expression +as its return type is also dependent on the template arguments. + +As you may have noticed the above template class is not complete as I have not included a implementation of the _TupleReader::read_ method which does the actual work of mapping the binary +buffer as a tuple. This mapping is achieved by the following recursive template methods: + + struct TupleReader { + template + static inline typename std::enable_if< + Index == std::tuple_size::value, void + >::type read(Tuple&, uint8_t*) { } + + template + static inline typename std::enable_if< + Index < std::tuple_size::value, void + >::type read(Tuple& tuple, uint8_t* buffer) { + std::get(tuple) = reinterpret_cast< + typename std::tuple_element::type + >(buffer+Offset); + + read(Tuple{})))>( + tuple, buffer + ); + } + }; + +Template metaprogramming in C++ offers a Turing-complete language which is fully executed during compilation. This means that any problem we may solve during the runtime of a _normal_ program +may also be solved during compilation using template metaprogramming techniques. This kind of programming is comparable to functional programming as we have to rely on recursion and pattern +matching. +The above coding contains two overloads of the _read_ method. The first is our exit condition which stops the recursion as soon as we have processed every tuple element. The second one matches +every element of a given tuple and sets the pointer of each element to the matching position in the binary buffer. To make this work I use the following four standard library templates: + +[_std::enable\_if_](http://en.cppreference.com/w/cpp/types/enable_if) makes it possible to exclude template instantiations from overload resolution based on a condition passed as a template +argument. The condition has to be a statically resolvable boolean expression, the second argument of the template is the type to be returned when the condition resolves to a true value. +_TupleReader_ uses this template to match the recursive version of the _read_ method to any _Index_ value below the tuple element count and the empty version to the case where the _Index_ +argument equals the tuple element count. This equals an escape condition for the recursion started by the second overload of the _read_ method as it increases the _Index_ on each call. + +[_std::get_](http://en.cppreference.com/w/cpp/utility/tuple/get) returns a reference to the Nth element of the given tuple as specified by the corresponding template argument. +The methods in _TupleReader_ use this template to set the pointer on each element to the appropriate buffer offset. + +[_std::tuple\_size_](http://en.cppreference.com/w/cpp/utility/tuple/tuple_size) returns the number of elements in the _std::tuple_ instantiation passed as the template argument. This value is +used as a reference value to compare the current recursion index to. + +[_std::tuple\_element_](http://en.cppreference.com/w/cpp/utility/tuple/tuple_element) returns the type of the Nth element of a given _std::tuple_ instantiation as specified by the template +argument. I use this template to get the type to which the current buffer offset has to be cast to match the pointer of the corresponding tuple element. + +While the _Index_ argument of the _read_ template method is increased by one, the _Offset_ value is increased by the size of the current tuple element type so the current total buffer offset +is always available as a template argument. + +The classes _TupleReader_ and _BinaryMapping_ are enough to map a binary structure as a _std::tuple_ instantiation like in the following example where I define a _TestRecord_ tuple containing +a pointer to _uint64\_t_ and _uint16\_t_ integers: + + typedef std::tuple TestRecord; + + uint8_t* buffer = reinterpret_cast( + std::calloc(10, sizeof(uint8_t)) + ); + + BinaryMapping mapping(buffer); + + mapping.set<0>(42); + mapping.set<1>(1337); + + std::cout << mapping.get<0>() << std::endl; + std::cout << mapping.get<1>() << std::endl; + +### Endianness + +As you may remember this does not take endianness into account as I defined as a requirement in the beginning. I first thought about including support for different endianness types into the +_BinaryMapping_ template class which worked, but led to problems as soon as I mixed calls to _get_ and _set_. The resulting problems could of course have been fixed but this would probably +have conflicted with In-place modifications of the buffer. Because of that I chose to implement endianness support in a separate set of templates. + + struct BigEndianSorter { + template + static void write(uint8_t*, const Key&); + + template + static Key read(uint8_t* buffer); + }; + +To be able to work with different byte orderings I abstracted the basic operations down to _read_ and _write_ template methods contained in a _struct_ so I would be able to provide separate +implementations of these methods for each type of endianness. The following template specialization of the _write_ method which does an In-place reordering of a _uint64\_t_ should be enough +to understand the basic principle: + + template <> + void BigEndianSorter::write(uint8_t* buffer, const uint64_t& number) { + *reinterpret_cast(buffer) = htobe64(number); + } + +As soon as I had the basic endianness conversion methods implemented in a manner which could be used to specialize other template classes I was able to build a generic implementation of a +serializer which respects the structure defined by a given _std::tuple_ instantiation: + + template + struct Serializer { + template + static inline typename std::enable_if< + Index == std::tuple_size::value, void + >::type serialize(uint8_t*) { } + + template + static inline typename std::enable_if< + Index < std::tuple_size::value, void + >::type serialize(uint8_t* buffer) { + ByteSorter::template write(Tuple{})) + >::type>( + buffer+Offset, + *reinterpret_cast::type>( + buffer+Offset + ) + ); + + serialize(Tuple{})))>( + buffer + ); + } + + template + static inline typename std::enable_if< + Index == std::tuple_size::value, void + >::type deserialize(uint8_t*) { } + + template + static inline typename std::enable_if< + Index < std::tuple_size::value, void + >::type deserialize(uint8_t* buffer) { + *reinterpret_cast::type>( + buffer+Offset + ) = *ByteSorter::template read< + typename std::tuple_element::type + >(buffer+Offset); + + deserialize(Tuple{})))>( + buffer + ); + } + }; + +It should be evident that the way both the _serialize_ and _deserialize_ template methods are implemented is very similar to the _TupleReader_ implementation. In fact the only difference +is that no actual _std::tuple_ instantiation instance is touched and instead of setting pointers to the buffer we are only reordering the bytes of each section of the buffer corresponding to +a tuple element. This results in a complete In-place conversion between different byte orderings using the methods provided by a _ByteSorter_ template such as _BigEndianSorter_. + +### Conclusion + +At last I am now able to do everything I planned in the beginning in a very compact way using the _Serializer_, _TupleReader_ and _BinaryMapping_ templates. In practice this now looks like this: + + typedef std::tuple TestRecord; + + uint8_t* buffer = reinterpret_cast( + std::calloc(10, sizeof(uint8_t)) + ); + + BinaryMapping mapping(buffer); + + mapping.set<0>(42); + mapping.set<1>(1001); + + Serializer::serialize(buffer); + + uint8_t* testBuffer = reinterpret_cast( + std::calloc(10, sizeof(uint8_t)) + ); + + std::memcpy(testBuffer, buffer, 10); + + Serializer::deserialize(testBuffer); + + BinaryMapping testMapping(testBuffer); + + std::cout << testMapping.get<0>() << std::endl; + std::cout << testMapping.get<1>() << std::endl; + + std::free(buffer); + std::free(testBuffer); + +The above coding makes use of all features provided by the described templates by first setting two values using _BinaryMapping_ specialized on the _TestRecord_ tuple, serializing them using +_Serializer_ specialized on the _BigEndianSorter_, deserializing the buffer back to the host byte ordering and reading the values using another _BinaryMapping_. + +I find the template metaprogramming based approach to mapping binary structures into tuples described in this article to be very nice and clear. While template metaprogramming takes a while to +get into it is a very powerful method for describing code in a generic way. I would certainly not recommend to try and fit everything in a template based approach but in some cases - especially +when one would otherwise be required to write lots of boilerplate code repeatedly - it just fits and works out rather nicely. The best example for this would probably be the standard library +which basically is _just_ a large library of templates. +The next step in extending the templates explained in this article would probably be adapting the _BinaryMapping_ template to offer sliding window like iteration over larger buffers and extending +the supported data types. + +__Update:__ The current version of a small C++ template library extending the _BinaryMapping_ templates detailed in this article may be found on [Github](https://github.com/KnairdA/BinaryMapping/) or [cgit](http://code.kummerlaender.eu/BinaryMapping/). -- cgit v1.2.3