# CodepointIterator
…is a class derived from `std::iterator` that uses the `std::bidirectional_iterator_tag` category and iterates through the Unicode codepoints of a UTF-8 encoded string.
The source code is available on both my [GitHub][Github] profile and [cgit].
For readers who know German, a [blog article] describes the implementation in more detail.
## Current features
* Bidirectional iteration through Unicode codepoints
* The class itself does not rely on any external libraries
* Dereferencing an instance of the iterator yields the codepoint as `char32_t`
* Unit tests based on GoogleTest
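
To illustrate what bidirectional iteration over codepoints involves, here is a minimal, self-contained sketch of decoding a UTF-8 sequence into a `char32_t` and stepping forward or backward by one codepoint. The helpers `is_continuation`, `decode_at`, `next_pos` and `prev_pos` are hypothetical names chosen for this sketch; they are not part of the `CodepointIterator` API and may differ from how the class works internally. The sketch assumes valid UTF-8 input and does no error handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only, not CodepointIterator's API.
// All of them assume `text` holds valid UTF-8.

// True for continuation bytes, which have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Decodes the codepoint whose lead byte is at index `pos`.
inline char32_t decode_at(const std::string& text, std::size_t pos) {
    const unsigned char lead = static_cast<unsigned char>(text[pos]);
    std::size_t extra = 0;        // number of continuation bytes that follow
    char32_t codepoint = lead;    // correct as-is for single-byte (ASCII) input

    if ((lead & 0xE0) == 0xC0) {          // 110xxxxx: two-byte sequence
        extra = 1; codepoint = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {   // 1110xxxx: three-byte sequence
        extra = 2; codepoint = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {   // 11110xxx: four-byte sequence
        extra = 3; codepoint = lead & 0x07;
    }

    // Each continuation byte contributes its lower six bits.
    for (std::size_t i = 1; i <= extra; ++i) {
        const unsigned char byte = static_cast<unsigned char>(text[pos + i]);
        codepoint = (codepoint << 6) | (byte & 0x3F);
    }
    return codepoint;
}

// Forward step: skip the lead byte and every continuation byte after it.
inline std::size_t next_pos(const std::string& text, std::size_t pos) {
    ++pos;
    while (pos < text.size() &&
           is_continuation(static_cast<unsigned char>(text[pos]))) {
        ++pos;
    }
    return pos;
}

// Backward step: move left until a byte that is not a continuation byte.
// Requires pos > 0.
inline std::size_t prev_pos(const std::string& text, std::size_t pos) {
    --pos;
    while (pos > 0 &&
           is_continuation(static_cast<unsigned char>(text[pos]))) {
        --pos;
    }
    return pos;
}
```

Stepping backward works without any extra state because continuation bytes are recognizable on their own, which is what makes a bidirectional iterator over UTF-8 feasible.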
## Usage example
While all features of this class are demonstrated by the GoogleTest-based [Unit-Tests], the following snippet shows basic usage of `UTF8::CodepointIterator`. The [example text] is written in Old Norse runes.
~~~
std::string test(u8"ᛖᚴ ᚷᛖᛏ ᛖᛏᛁ ᚧ ᚷᛚᛖᚱ ᛘᚾ ᚦᛖᛋᛋ ᚨᚧ ᚡᛖ ᚱᚧᚨ ᛋᚨᚱ");

// Walk the string one codepoint at a time and print each codepoint.
for ( UTF8::CodepointIterator iter(test.cbegin());
      iter != test.cend();
      ++iter ) {
	std::wcout << static_cast<wchar_t>(*iter);
}
~~~
{: .language-cpp}
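
Note that the cast to `wchar_t` in the example above only preserves every codepoint where `wchar_t` is at least 32 bits wide, as is typical on Linux; on Windows it is 16 bits wide. As one possible alternative, the following sketch encodes a `char32_t` back into UTF-8 so that it can be written to a narrow stream. The function `to_utf8` is a hypothetical helper, not part of this repository, and it assumes its argument is a valid Unicode scalar value.

```cpp
#include <string>

// Hypothetical helper for illustration only, not part of CodepointIterator.
// Assumes `codepoint` is a valid Unicode scalar value (at most 0x10FFFF,
// not a surrogate); no validation is performed.
inline std::string to_utf8(char32_t codepoint) {
    std::string out;
    if (codepoint < 0x80) {
        // One byte: 0xxxxxxx
        out += static_cast<char>(codepoint);
    } else if (codepoint < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (codepoint >> 6));
        out += static_cast<char>(0x80 | (codepoint & 0x3F));
    } else if (codepoint < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (codepoint >> 12));
        out += static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (codepoint & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (codepoint >> 18));
        out += static_cast<char>(0x80 | ((codepoint >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (codepoint & 0x3F));
    }
    return out;
}
```

With such a helper, the loop body could be written as `std::cout << to_utf8(*iter);`, given that dereferencing the iterator yields a `char32_t` as listed under the current features.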
[Github]: https://github.com/KnairdA/CodepointIterator
[cgit]: http://code.kummerlaender.eu/CodepointIterator/
[Unit-Tests]: https://github.com/KnairdA/CodepointIterator/blob/master/test.cc
[example text]: http://www.columbia.edu/~fdc/utf8/
[blog article]: /article/notizen_zu_cpp_und_unicode