Friday, December 30, 2016

The phpBB 2 to phpBB 3 converter messes up encodings

When I upgraded two forums from phpBB 2 to phpBB 3, some of the text got mangled. The problem is a classic one: UTF-8 text was interpreted as Latin-1 and then re-encoded as UTF-8. This appears to be a bug in the converter itself; there are several threads on the phpBB forums about it, which I would link if their forums were currently available. I can correct that with a bit of SQL.
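To make the failure concrete, here is a minimal Python sketch of the round trip. It is only an illustration of the encoding mix-up, not the SQL fix itself, and the function names `damage` and `undo` are made up for this example. One assumption to be aware of: Python's `latin-1` codec is strict ISO-8859-1, while MySQL's `latin1` is based on Windows-1252, so real database rows may contain characters this sketch would reject.

```python
def damage(text: str) -> str:
    """Reproduce the bug: UTF-8 bytes misread as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")


def undo(text: str) -> str:
    """Reverse one round of damage: recover the bytes, then read them as UTF-8."""
    return text.encode("latin-1").decode("utf-8")


original = "café"
mangled = damage(original)

print(mangled)        # cafÃ©
print(undo(mangled))  # café
```

Each accented character turns into two characters, which is the telltale sign of this kind of damage.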

Unfortunately, the second forum I converted got doubly butchered. I'm not sure exactly how, but some sequences seem to have been damaged beyond recovery. That is, some are fine (or at least recoverable by decoding twice), but others end with an invalid character. I'll need to manually search-and-replace the most common miscoded characters.
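A rough sketch of how that two-stage cleanup might look in Python, assuming the same strict Latin-1 round trip as the damage itself. Everything named here is hypothetical: the helper functions, and especially the `MANUAL_FIXES` table, whose single entry is an invented placeholder. The real entries would have to come from inspecting the broken rows.

```python
def damage(text: str) -> str:
    """One round of the bug: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def undo(text: str) -> str:
    """Reverse one round of damage."""
    return text.encode("latin-1").decode("utf-8")


def double_decode(text: str) -> str:
    """Reverse two rounds of damage; raises UnicodeError on broken input."""
    return undo(undo(text))


# Hypothetical table: broken sequence -> the character it was meant to be.
# The entry below is a placeholder, not taken from any real forum data.
MANUAL_FIXES = {"Ã\x83Â?": "é"}


def repair(text: str, fixes: dict = MANUAL_FIXES) -> str:
    """Try a plain double decode; on failure, patch known broken sequences."""
    try:
        return double_decode(text)
    except UnicodeError:
        pass
    for broken, intended in fixes.items():
        # Swap in the intact doubly-damaged form so the whole string decodes.
        text = text.replace(broken, damage(damage(intended)))
    # Still raises UnicodeError if damage outside the table remains.
    return double_decode(text)
```

Replacing each broken sequence with its intact doubly-damaged form, rather than with the final character directly, keeps the rest of the string decodable in one pass. Strings that still raise after the table is applied are the ones that need a closer look by hand.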
