Tuesday, December 20, 2016

When does text direction get affected by a previous RTL character?

Suppose you have this text in a text field:

abc123'ߡ'

That triangle character is U+07E1 NKO LETTER MA. It's marked as a right-to-left character, since it comes from the N'Ko alphabet, which is written from right to left.
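If you want to check the bidi classes yourself, Python's standard `unicodedata` module exposes them. This is just a small lookup sketch; the helper name `bidi_classes` is mine, and the N'Ko letter is written as an escape so the source stays readable in any editor.

```python
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidi class of each character in `text`."""
    return [unicodedata.bidirectional(ch) for ch in text]


# The example string, with U+07E1 written as an escape sequence.
sample = "abc123'\u07e1'"

for ch, cls in zip(sample, bidi_classes(sample)):
    # Print the code point rather than the glyph to avoid bidi confusion.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cls}")
```

The Latin letters report `L` (strong left-to-right), the digits `EN`, the quotes `ON` (neutral), and the N'Ko letter `R` (strong right-to-left).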

Now for some experiments. Try typing a number after the triangle but before the closing quote. It will appear to be inserted before the triangle! Hit Backspace, then move to the very end of the string and type an ordinary hyphen (U+002D HYPHEN-MINUS). It stays put. That character's bidi class is "European separator", which is listed as weak. But now type a number after it, and the string gets shuffled into this:

abc123'ߡ'-4

The order of the last quote, hyphen, and number appears to be flipped. Interestingly, digits have the "European number" bidi class, which is also weak. So why did the number cause the flip?

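The Unicode Bidirectional Algorithm (UAX #9) resolves each number by looking back to the nearest preceding character with a strong direction. If that character is left-to-right, the number is treated as left-to-right too (rule W7). Otherwise it stays a number, and numbers count as right-to-left when the neutral characters next to them are resolved (rule N1). Neither the quote nor the hyphen is strong (a lone hyphen that isn't between two digits is demoted to a plain neutral), so the nearest strong character is the N'Ko letter. The quote and hyphen now sit between two right-to-left neighbors, become right-to-left themselves, and the new number takes the in-between characters along for the ride.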
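To make the behavior concrete, here is a heavily simplified sketch of the reordering for a left-to-right paragraph. It is not a real implementation of the Unicode Bidirectional Algorithm: it assumes no explicit embeddings or isolates, handles only the bidi classes that show up in this post (L, R, EN, ES, ON, WS), skips mirroring, and refuses anything else. The function name `visual_order` is made up for illustration; the rule labels in the comments (W4, W6, W7, N1, N2, I1, L2) are the ones used in UAX #9.

```python
import unicodedata

SUPPORTED = {"L", "R", "EN", "ES", "ON", "WS"}


def visual_order(text: str) -> str:
    """Approximate the display order of `text` in a left-to-right paragraph."""
    types = [unicodedata.bidirectional(ch) for ch in text]
    unsupported = set(types) - SUPPORTED
    if unsupported:
        raise ValueError(f"bidi classes not handled by this sketch: {unsupported}")
    n = len(types)

    # W4: a single separator between two numbers becomes a number.
    for i in range(1, n - 1):
        if types[i] == "ES" and types[i - 1] == "EN" and types[i + 1] == "EN":
            types[i] = "EN"
    # W6: every other separator becomes an ordinary neutral.
    types = ["ON" if t == "ES" else t for t in types]

    # W7: a number whose nearest preceding strong character is L
    # (or the start of the text) is treated as L.
    last_strong = "L"
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[i] = "L"

    def side(t: str) -> str:
        # N1: remaining numbers act like R when resolving neutrals.
        return "R" if t in ("R", "EN") else "L"

    # N1/N2: a run of neutrals takes the direction shared by both
    # neighbors, else the paragraph direction (L here).
    i = 0
    while i < n:
        if types[i] in ("ON", "WS"):
            j = i
            while j < n and types[j] in ("ON", "WS"):
                j += 1
            before = side(types[i - 1]) if i > 0 else "L"
            after = side(types[j]) if j < n else "L"
            types[i:j] = [before if before == after else "L"] * (j - i)
            i = j
        else:
            i += 1

    # I1: at base level 0, L stays at 0, R goes to 1, numbers go to 2.
    levels = [{"L": 0, "R": 1, "EN": 2}[t] for t in types]

    # L2: from the highest level down to 1, reverse each contiguous
    # run of characters at that level or higher.
    chars = list(text)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] >= level:
                j = i
                while j < n and levels[j] >= level:
                    j += 1
                chars[i:j] = chars[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)


# Logical (typed) order in, visual (displayed) order out.
shown = visual_order("abc123'\u07e1'-4")
print([f"U+{ord(ch):04X}" for ch in shown])
```

For the string with the trailing hyphen and digit, the sketch puts the 4 right after the first quote, followed by the hyphen, the second quote, and finally the N'Ko letter. With only the trailing hyphen and no digit, the order comes back unchanged, which matches the experiments above.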

Based on my Super User answer.
