Age | Commit message | Author |
|
* CodepointIterator only supports UTF-8 encoded input strings whose characters are a single byte wide
** this should prevent CodepointIterator from compiling on systems with larger char sizes and provide a helpful error message instead
* improved const-correctness by marking currByte (iterator dereferencing cache) and helper method arguments as const
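A compile-time guard of this kind could look roughly like the sketch below. The exact condition and message used in the repository are not shown in this log, so the `CHAR_BIT` check, the message text and the helper name `isContinuationByte` are assumptions made for illustration only.

```cpp
#include <climits>

// Hypothetical guard: refuse to compile where char is wider than 8 bits,
// because each UTF-8 code unit is expected to occupy exactly one char.
static_assert(CHAR_BIT == 8,
              "CodepointIterator requires 8-bit chars for UTF-8 input");

// Hypothetical helper showing a const-qualified argument:
// the byte is only inspected, never modified.
inline bool isContinuationByte(const unsigned char byte) {
    return (byte & 0xC0) == 0x80;  // continuation bytes look like 10xxxxxx
}
```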
|
|
* operator- takes a const reference to a std::string::const_iterator, which makes it possible to determine the actual position of a codepoint within a string
* range-based for loops in test cases now take the iterator value by rvalue reference instead of by value
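To illustrate how such an operator- yields a byte position, here is a minimal sketch. `ByteCursor` is a hypothetical stand-in for CodepointIterator that only tracks its position; it assumes well-formed UTF-8 and is not the class from this repository.

```cpp
#include <string>

// Hypothetical stand-in for CodepointIterator; tracks a byte position only.
class ByteCursor {
public:
    explicit ByteCursor(std::string::const_iterator pos) : pos_(pos) {}

    // Byte offset of this cursor relative to another string iterator,
    // typically str.cbegin().
    std::string::difference_type
    operator-(const std::string::const_iterator& other) const {
        return pos_ - other;
    }

    // Skip one UTF-8 encoded codepoint (assumes valid input).
    ByteCursor& operator++() {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        pos_ += lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        return *this;
    }

private:
    std::string::const_iterator pos_;
};
```

Subtracting `str.cbegin()` from a cursor then gives the offset in bytes of the current codepoint, which differs from its codepoint index as soon as multi-byte sequences occur.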
|
|
* i.e. replaced uint8_t with std::uint8_t contained within the standard namespace
** as this version of the types is defined by the standard, this should offer better compiler independence and standard compliance
* removed unnecessary pointer and reference type arguments in the std::iterator template specializations the CodepointIterator class is derived from
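As a small illustration of the namespace-qualified type, the sketch below reads a char as an unsigned code unit. Including `<cstdint>` guarantees the `std::` spelling where the type exists, while the unqualified global name is not guaranteed by that header. The helper name `toCodeUnit` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: view a (possibly signed) char as an unsigned
// UTF-8 code unit using the std-qualified fixed-width type.
inline std::uint8_t toCodeUnit(const char c) {
    return static_cast<std::uint8_t>(c);
}
```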
|
|
* it is not the responsibility of a codepoint iterator to cache the resolved codepoint for reuse
** if caching is required, the user of this class can implement it better in the context where it is required
** e.g. implement a "CachedIterator" template
|
|
* utility.h and utility.cc now contain the UTF8-codepoint and unit bitmasks and read / write functions
* Modified users of these functions and unions accordingly
* Added the new compilation unit to the Makefile
* Changed bitmask specification from plain integer literals to shift expressions for better readability
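The change from integer literals to shift expressions can be pictured with the following sketch. The namespace, constant names and the `sequenceLength` function are hypothetical and not taken from utility.h; they only show how shift-based masks map onto the UTF-8 lead-byte patterns.

```cpp
#include <cstdint>

namespace utf8_sketch {  // hypothetical namespace

// Lead-byte markers written as shifts instead of literals such as 0xC0.
constexpr std::uint8_t kContinuation = 1 << 7;                   // 10000000
constexpr std::uint8_t kTwoByte      = kContinuation | (1 << 6); // 11000000
constexpr std::uint8_t kThreeByte    = kTwoByte | (1 << 5);      // 11100000
constexpr std::uint8_t kFourByte     = kThreeByte | (1 << 4);    // 11110000
constexpr std::uint8_t kFourByteMask = kFourByte | (1 << 3);     // 11111000

// Number of bytes in the sequence started by `lead`,
// or 0 if `lead` cannot start a sequence.
inline unsigned sequenceLength(const std::uint8_t lead) {
    if ((lead & kContinuation) == 0) return 1;          // 0xxxxxxx
    if ((lead & kThreeByte) == kTwoByte) return 2;      // 110xxxxx
    if ((lead & kFourByte) == kThreeByte) return 3;     // 1110xxxx
    if ((lead & kFourByteMask) == kFourByte) return 4;  // 11110xxx
    return 0;                                           // continuation or invalid
}

}  // namespace utf8_sketch
```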
|
|
* CodepointIterator is a simple C++ iterator class which iterates through Unicode codepoints in a UTF-8 encoded string
* It is derived from std::iterator and uses std::bidirectional_iterator_tag as its iterator category
* Dereferencing an instance of the class provides the codepoint as char32_t
* Tests require Google Test and use UTF8-samples from http://www.columbia.edu/~fdc/utf8/
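The behaviour described above can be approximated by the sketch below. It is not the repository's implementation: the class name `CodepointIteratorSketch` is hypothetical, the member types are declared directly instead of deriving from std::iterator (which is deprecated since C++17), postfix operators are omitted for brevity, and well-formed UTF-8 is assumed without validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical re-creation; assumes valid UTF-8 and performs no checks.
class CodepointIteratorSketch {
public:
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type        = char32_t;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char32_t*;
    using reference         = char32_t;

    explicit CodepointIteratorSketch(std::string::const_iterator pos)
        : pos_(pos) {}

    // Decode the codepoint starting at the current position.
    char32_t operator*() const {
        const std::uint8_t lead = static_cast<std::uint8_t>(*pos_);
        if (lead < 0x80) return lead;  // single-byte (ASCII) sequence
        const int extra = lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        char32_t cp = lead & (0x3F >> extra);  // payload bits of the lead byte
        for (int i = 1; i <= extra; ++i) {
            cp = (cp << 6) | (static_cast<std::uint8_t>(*(pos_ + i)) & 0x3F);
        }
        return cp;
    }

    // Step forward over the whole sequence of the current codepoint.
    CodepointIteratorSketch& operator++() {
        const std::uint8_t lead = static_cast<std::uint8_t>(*pos_);
        pos_ += lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        return *this;
    }

    // Step backward until a non-continuation byte (10xxxxxx) is reached.
    CodepointIteratorSketch& operator--() {
        do {
            --pos_;
        } while ((static_cast<std::uint8_t>(*pos_) & 0xC0) == 0x80);
        return *this;
    }

    bool operator==(const CodepointIteratorSketch& o) const { return pos_ == o.pos_; }
    bool operator!=(const CodepointIteratorSketch& o) const { return pos_ != o.pos_; }

private:
    std::string::const_iterator pos_;
};
```

Constructing one instance from `str.cbegin()` and another from `str.cend()` gives a range that can be walked with `++` and `!=`, with each dereference yielding a `char32_t`.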
|