Commit author name causes 500 server error
Created by: davekaro
Ran into a weird issue today. The git author name for a commit on one of our projects was "Elizabeth". When GitLab tried to render the commit event, it caused a 500 error. I narrowed it down and discovered that when I ran Gitlabhq::Encode.utf8 "Elizabeth" in the Rails console, I got the same error.
CharlockHolmes::EncodingDetector.detect("Elizabeth") returns IBM424_rtl as the encoding, and CharlockHolmes::Converter.convert("Elizabeth", "IBM424_rtl", 'UTF-8') fails with the same error.
So, why does GitLab guess the encoding of a git commit author's name using CharlockHolmes? It's simply not reliable on input this short. CharlockHolmes uses the ICU project, which states: "This is, at best, an imprecise operation using statistics and heuristics. Because of this, detection works best if you supply at least a few hundred bytes of character data that's mostly in a single language."
A name is hardly a few hundred bytes of character data - surely there is a better way to handle encodings?
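One possible direction, as a rough sketch rather than a patch: check whether the bytes are already valid UTF-8 (plain ASCII like "Elizabeth" always is) and skip detection entirely in that case. Only fall back to the detector when the bytes are not valid UTF-8, and even then only trust a guess above some confidence threshold. The helper name `safe_utf8`, the `detector:` parameter, and the threshold of 80 are all made up for illustration. The detector is assumed to be any callable that returns a hash with `:encoding` and `:confidence` keys, which is the shape I'd expect from CharlockHolmes::EncodingDetector.detect.

```ruby
# Hypothetical helper for illustration; not GitLab's actual code.
# `detector` is any callable returning a hash with :encoding and :confidence
# (assumed shape, modeled on CharlockHolmes::EncodingDetector.detect).
def safe_utf8(raw, detector: nil, min_confidence: 80)
  return raw if raw.nil?

  # 1. If the bytes are already valid UTF-8 (which includes plain ASCII),
  #    there is nothing to guess.
  as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # 2. Only now ask the detector, and only trust a confident guess.
  guess = detector && detector.call(raw)
  if guess && guess[:encoding] && guess[:confidence].to_i >= min_confidence
    begin
      return raw.encode(Encoding::UTF_8, guess[:encoding])
    rescue EncodingError, ArgumentError
      # Unknown encoding name or failed conversion: fall through.
    end
  end

  # 3. Last resort: replace invalid bytes instead of raising.
  as_utf8.scrub("?")
end
```

With this, `safe_utf8("Elizabeth")` returns the name unchanged without ever consulting the detector, so a bogus IBM424_rtl guess can't break rendering. The threshold and the "?" replacement character are arbitrary choices here and would need tuning against real data.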