gpt / large_projects / gitlabhq1 / Issues / #8210

Closed
Open
Created Oct 31, 2014 by Administrator (@root, Owner)

Web crawler downloads a lot of archives filling up space

Created by: hildensia

We recently had the following problem on our GitLab installation (it's 6.5.0, but as far as I can see nothing has changed regarding this issue):

We had a quite big project which was "public". At some point the Google crawler found its way to this project and started indexing it. And of course it also started to index the archives of everything. As a result, GitLab generated hundreds and thousands of .zip, .tar.bz2 and .tar.gz files, taking up hundreds of GB on the hard disk and eventually filling it completely, which in turn led to GitLab not being available anymore.

One solution, of course, is to disable crawling completely. But it might be a good idea to disable crawling of archive downloads by default. Archives aren't particularly interesting data for a crawler anyway, and it hurts if they blow up everything.
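
To illustrate the second option, here is a minimal sketch of a robots.txt rule that excludes only archive downloads, plus a small Python check of which paths it would block. The URL layout (`/<group>/<project>/repository/archive...`), the sample paths, and the helper names `disallow_patterns` and `is_blocked` are all assumptions for illustration and are not taken from GitLab itself. The matcher only handles the `*` wildcard as used by the major crawlers; it ignores `Allow` lines, `$` anchors, and per-agent groups.

```python
import re

# Hypothetical rule set: the archive URL layout is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*/*/repository/archive*
"""

def disallow_patterns(robots_txt):
    """Collect the Disallow patterns from a robots.txt body (all user agents)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns

def is_blocked(path, patterns):
    """Return True if path matches any pattern, treating '*' as a wildcard."""
    for pattern in patterns:
        # Escape the pattern, then turn each escaped '*' back into '.*'.
        regex = re.escape(pattern).replace(r"\*", ".*")
        # re.match anchors at the start, like robots.txt prefix matching.
        if re.match(regex, path):
            return True
    return False

if __name__ == "__main__":
    patterns = disallow_patterns(ROBOTS_TXT)
    for path in ("/group/project/repository/archive.zip",
                 "/group/project/tree/master"):
        print(path, "blocked" if is_blocked(path, patterns) else "allowed")
```

Note that robots.txt only helps with well-behaved crawlers; it does not stop a client that ignores it, so it would not replace cleaning up or limiting the generated archives on disk.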
