While wading through all these old blog posts, I keep running into mangled Unicode characters. I’ve had this problem ever since I started blogging in 2000, but I never knew the source of the error.
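Apparently it’s because whatever software I was using to handle my blog posts was misreading UTF-8 text as Windows-1252 or ISO-8859-1, so each multi-byte UTF-8 character got displayed as two or three unrelated single-byte characters.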
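To make that concrete, here is a minimal Ruby sketch of the misreading, assuming the Windows-1252 case. The method name `mangle` is made up for illustration; it isn’t anything from the blog software. It takes the UTF-8 bytes of a string and relabels them as Windows-1252 without changing them, which is roughly what the misbehaving software would have done.

```ruby
# Hypothetical helper: reproduce the mangling by treating UTF-8 bytes
# as if they were Windows-1252, then converting the result back to UTF-8.
def mangle(text)
  text.encode("UTF-8")                  # make sure we start from UTF-8 bytes
      .dup                              # work on an unfrozen copy
      .force_encoding("Windows-1252")   # relabel the bytes; no conversion happens
      .encode("UTF-8")                  # now convert each byte as a 1252 character
end

# A curly apostrophe (U+2019) is the three bytes E2 80 99 in UTF-8.
# Read as Windows-1252, those bytes are "â", "€" and "™".
mangle("It\u2019s")  # => "Itâ€™s"
```

A few bytes (0x9D, for instance, which shows up in the closing curly double quote) have no character assigned in Windows-1252, so this sketch can raise a conversion error on some input; real software tends to be more forgiving and shows a placeholder or nothing at all.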
This led me to write a Ruby script that replaces the most common mangled Unicode characters I’ve come across so far, and I add more to it as I work through my old blog posts.
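A script along those lines might look something like the sketch below. The mapping, the constant names, and the `fix_mojibake` method are all assumptions for illustration, not the actual script; the only real API it leans on is `String#gsub` with a hash as the replacement argument, which looks up each match in the hash.

```ruby
# Hypothetical mapping from mangled sequences to the characters they
# were meant to be. Escapes are used so the file's own encoding can't
# interfere; a real list would grow as new cases turn up.
MOJIBAKE_FIXES = {
  "\u00E2\u20AC\u2122" => "\u2019", # mangled right single quote / apostrophe
  "\u00E2\u20AC\u02DC" => "\u2018", # mangled left single quote
  "\u00E2\u20AC\u0153" => "\u201C", # mangled left double quote
  "\u00E2\u20AC\u201C" => "\u2013", # mangled en dash
  "\u00E2\u20AC\u00A6" => "\u2026", # mangled ellipsis
}.freeze

# One regular expression that matches any of the mangled sequences.
MOJIBAKE_PATTERN = Regexp.union(MOJIBAKE_FIXES.keys)

# Replace every mangled sequence in a single pass over the text.
def fix_mojibake(text)
  text.gsub(MOJIBAKE_PATTERN, MOJIBAKE_FIXES)
end

fix_mojibake("It\u00E2\u20AC\u2122s")  # => "It’s"
```

Doing it in one `gsub` call rather than a chain of separate replacements means the text is scanned once, and an earlier replacement can’t accidentally feed into a later one.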
(This technique was lifted from a Stack Overflow answer on how to replace multiple substrings in a single call to the
It so happens that a lot of the mangled Unicode characters I run into used to be smart quotes, and I stumbled upon this phenomenon:
- Smart Quotes are Killing the Apostrophe • 2013 May 6 • New Republic