Saturday, February 24, 2018

Unmangling ASCII text interpreted as Unicode

Sometimes programs take buffers of ASCII strings (or 8-bit characters in general) and mistakenly pass them to functions that expect Unicode (UTF-16) strings. Each pair of adjacent 8-bit characters then fuses into a single 16-bit code unit, so the text turns into nonsense Unicode. To attempt* to recover the original ASCII from that nonsense on the clipboard, you can use this PowerShell command (gcb is the built-in alias for Get-Clipboard):
[System.Text.Encoding]::ASCII.GetString([System.Text.Encoding]::Unicode.GetBytes((gcb)))

*The attempt might not be 100% successful if the smashed-together bytes form invalid UTF-16 sequences, such as unpaired surrogates. The very last character can also be lost if the original text had an odd length. Because the command decodes with ASCII, any bytes above 0x7F come back as question marks.
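To see the mangling and the recovery side by side without going through the clipboard, here is a small sketch. It assumes the buggy code read the bytes as little-endian UTF-16, which is what [System.Text.Encoding]::Unicode means in .NET; the sample string and variable names are just for illustration. The sample has an even length and contains no surrogate-producing pairs, so the round trip is exact.

```powershell
# Illustrative sample text (even length, plain ASCII).
$original = 'Hello, world'

# Simulate the bug: reinterpret the ASCII bytes as little-endian UTF-16,
# so each pair of bytes becomes one 16-bit code unit.
$mangled = [System.Text.Encoding]::Unicode.GetString(
    [System.Text.Encoding]::ASCII.GetBytes($original))

# Show the fused code units: 'H' (0x48) + 'e' (0x65) becomes 0x6548, and so on.
$codeUnits = ($mangled.ToCharArray() | ForEach-Object { '{0:X4}' -f [int]$_ }) -join ' '
$codeUnits   # -> 6548 6C6C 2C6F 7720 726F 646C

# Recover: the same expression as the clipboard command, with $mangled in place of (gcb).
$recovered = [System.Text.Encoding]::ASCII.GetString(
    [System.Text.Encoding]::Unicode.GetBytes($mangled))
$recovered   # -> Hello, world
```

The recovery step simply undoes the reinterpretation: Unicode.GetBytes splits each code unit back into its two bytes, and ASCII.GetString turns each byte back into one character.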

Based on my Super User answer.
