Monday, December 18, 2017

Getting a Unicode code point from a .NET string

It's easy to convert a .NET Char value into a number, but that gets you a UTF-16 code unit, which is not necessarily a whole Unicode code point. Code points above U+FFFF, such as those of most emoji, don't fit in 16 bits, so UTF-16 (used by .NET strings) stores each of them as two Chars, a high and a low surrogate, neither of which is a valid character on its own. To get the real Unicode code point that starts at a given position in the string, you can use Char.ConvertToUtf32, supplying the string and the starting index.
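As a minimal sketch of the difference, the C# below compares casting a Char with calling Char.ConvertToUtf32. The class name CodePointDemo and the helpers CodePointAt and CodePoints are hypothetical names chosen for illustration; only Char.ConvertToUtf32 and Char.IsSurrogatePair are framework methods.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class CodePointDemo
{
    // Returns the Unicode code point that starts at the given index.
    // Char.ConvertToUtf32 throws ArgumentException if the index lands on
    // a lone or trailing (low) surrogate.
    public static int CodePointAt(string s, int index)
    {
        return char.ConvertToUtf32(s, index);
    }

    // Walks the string one code point at a time, skipping the second
    // Char of each surrogate pair.
    public static IEnumerable<int> CodePoints(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            yield return char.ConvertToUtf32(s, i);
            if (char.IsSurrogatePair(s, i))
            {
                i++; // the low surrogate was already consumed
            }
        }
    }
}
```

For the string "a\U0001F600" (the letter a followed by U+1F600), Length is 3, the cast (int)s[1] gives 0xD83D, which is only the high surrogate, and CodePointDemo.CodePointAt(s, 1) gives 0x1F600. Enumerating CodePointDemo.CodePoints(s) yields two values, 0x61 and 0x1F600.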

Relevant Super User answer.
