Closed
Created May 08, 2012 by Administrator (@root, Owner)

Commit author name causes 500 server error

Created by: davekaro

Ran into a weird issue today. The git author name for a commit on one of our projects was "Elizabeth". When Gitlab tried to render the commit event, it caused a 500 error. I narrowed it down and discovered that when I ran Gitlabhq::Encode.utf8 "Elizabeth" in the Rails console, I got the same error.

CharlockHolmes::EncodingDetector.detect("Elizabeth") returns IBM424_rtl as the encoding and CharlockHolmes::Converter.convert("Elizabeth", "IBM424_rtl", 'UTF-8') fails with the same error.

So why does Gitlab guess the encoding of a git commit author's name using CharlockHolmes? It's simply not reliable. CharlockHolmes uses the ICU project, which states: "This is, at best, an imprecise operation using statistics and heuristics. Because of this, detection works best if you supply at least a few hundred bytes of character data that's mostly in a single language."

A name is hardly a few hundred bytes of character data - surely there is a better way to handle encodings?
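
One possible direction, as a rough sketch only: skip detection for short strings, accept the bytes as UTF-8 when they are already valid UTF-8, and only otherwise fall back to a fixed single-byte encoding. The helper name safe_utf8 and the ISO-8859-1 fallback are assumptions made for illustration. They are not part of Gitlab or CharlockHolmes, and the fallback may produce the wrong characters for names stored in other legacy encodings.

```ruby
# Hypothetical helper (not Gitlab's API): convert a short string such as a
# commit author name to UTF-8 without statistical encoding detection.
def safe_utf8(raw)
  # First, check whether the bytes are already valid UTF-8.
  str = raw.dup.force_encoding(Encoding::UTF_8)
  return str if str.valid_encoding?

  # Assumed fallback: reinterpret the bytes as ISO-8859-1. Every byte is a
  # valid character in that encoding, so this conversion does not raise.
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

puts safe_utf8("Elizabeth")
puts safe_utf8("Jos\xE9".b)
```

With this approach a plain ASCII name like "Elizabeth" passes through unchanged, because it never reaches a detector that could misclassify it.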
