Abusive use of archive functionality by robots
Created by: zorun
Since our upgrade to the latest GitLab version, we run out of disk space very frequently (almost every day).
After investigation, it turns out that many archive files (.tar.gz, .zip, etc.) are created in /home/git/gitlab/tmp/repositories/. This is due to GET requests from robots, most notably GoogleBot, to URLs like /johndoe/examplerepo/repository/archive.tar.gz?ref=cafecafecafecafecafecafecafecafecafecafe.
Is there a way to tell robots not to index this kind of link? A workaround could be to delete these archives automatically after a short period of time; I've sketched both ideas below.
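For the robots side, maybe a wildcard Disallow in robots.txt would work? This is just a sketch; the /*/repository/archive pattern is my guess at matching all archive download URLs:

```
# Sketch: ask crawlers to skip repository archive downloads
# (the URL pattern is an assumption; Googlebot honors * wildcards)
User-agent: *
Disallow: /*/repository/archive
```

And for the cleanup workaround, something like this cron entry could prune generated archives periodically (the two-hour retention and half-hour schedule are arbitrary choices on my part):

```
# Sketch: every 30 minutes, delete generated archives older than 120 minutes
*/30 * * * * find /home/git/gitlab/tmp/repositories -type f \( -name '*.tar.gz' -o -name '*.zip' \) -mmin +120 -delete
```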
We're running GitLab 6.5.1 (2ffa03ab). Thanks!