What will you do if you accidentally lose access to your GitLab repository? Or if some of your issues, merge requests, webhooks, or other metadata are gone for good? You may think that if you keep all your data in Git and use GitLab, one of the most secure Git hosting services, everything will be fine.
However, Git isn’t a backup in itself, and GitLab, like any other service provider, operates under the Shared Responsibility Model, which means that protecting your data is your responsibility. The service provider is responsible only for keeping its service running and its own systems secure.
In this blog post, let’s look at the different methods and tools for backing up GitLab, along with their pros and cons.
Why do you need to back up your GitLab environment?
There is no doubt that organizations want to ensure the safety and integrity of the data they store in their GitLab instance. After all, source code is one of the most valuable assets a company has. However, the threat of losing access to that data isn’t the only reason…
Eliminating data loss and ensuring business continuity
Ransomware attacks, outages, vulnerabilities introduced during builds, and other failures can result in data loss. Unfortunately, such events happen rather often. That’s why it’s crucial for companies to know how to respond once such a critical situation happens and, what is more, to adopt the security measures that help prevent the worst-case scenario.
Aligning with the Shared Responsibility Model
Moreover, we shouldn’t forget about the already-mentioned Shared Responsibility Model, which SaaS providers generally follow. Under this model, GitLab is responsible for the business continuity of its systems and shares the responsibility for protecting the user’s GitLab data. For example, the GitLab Subscription Agreement states that GitLab is responsible “for establishing and maintaining a commercially reasonable information security program that is designed” to:
- make sure that the Customer Content is confidential and secure;
- protect against any potential threats or risks to the integrity or security of the Customer Content;
- prevent unauthorized access to or use of the Customer Content;
- ensure that GitLab subcontractors, if any, adhere to the aforementioned requirements.
On the other hand, the user is responsible for their own data: it is the user who should guarantee the security of their own stack. Here is what GitLab states in the same document:
What does that “commercially reasonable security” mean? It covers all the security measures the user considers necessary to protect their own data, and backing up the GitLab environment definitely belongs on this must-have list.
Meeting compliance, legal, or other organization requirements
Once your organization gets serious about security, you find that you need to comply with stringent security requirements and regulations. What’s more, you need to pass security checks and audits to show that your product is reliable and secure. That means meeting regulations and standards such as GDPR, ISO 27001, SOC 2 Type I, and SOC 2 Type II, and one of the requirements for passing those audits is a reliable backup and Disaster Recovery strategy that guarantees your business continuity.
GitLab backup options – which one to adopt?
A lot of organizations rely on GitLab: more than 100K companies use it as part of their infrastructure. This DevSecOps platform provides numerous security measures to protect its customers’ data. However, when it comes to backup, “GitLab lacks support for many backup features, including incremental backups, selective restore, default encryption,” as stated in GitLab’s Category Strategy - Backup and Restore. That said, there are several GitLab backup methods you can use.
While free backup solutions can be rather appealing, they often lack reliability and require a lot of attention and manual effort from your DevSecOps team. Professional backup solutions, on the other hand, help organizations automate backup processes, share the responsibility for data protection, meet the company’s compliance requirements, and improve security.
GitLab Backup and Restore Utility
GitLab takes security seriously and offers its own measure to help users with GitLab backups: a built-in backup and restore utility based on Rake tasks (gitlab-rake). Using these Rake tasks, you can build an archive file that contains your GitLab data, including the database, repositories, and attachments. Note that GitLab’s documentation states that configuration files, such as gitlab.rb and gitlab-secrets.json, are not stored in this archive and must be backed up separately.
To perform a Rake task, you need to use the appropriate command, which depends on the version of GitLab your organization uses and on how you installed GitLab (basically, you have 3 scenarios).
GitLab was installed using the Omnibus package
If you used the Omnibus package to install GitLab, you will need to select one of these commands:
- sudo gitlab-backup create if you use the 12.2 version of GitLab and later;
- gitlab-rake gitlab:backup:create if your version of GitLab is 12.1 and earlier.
GitLab was installed from the source
In this case, all you need to do is run the command sudo -u git -H bundle exec rake gitlab:backup:create RAILS_ENV=production.
GitLab is run from within the Docker container
Running the service from within the Docker container permits you to start your backup from the host. Thus, if you use GitLab version 12.2 or later, run docker exec -t <container name> gitlab-backup create. If you use GitLab version 12.1 or earlier, you may need docker exec -t <container name> gitlab-rake gitlab:backup:create.
GitLab has done its best to describe how to back up your GitLab instance by providing in-depth documentation. However, it takes a lot of attention and time to read all the articles and then create your backup copy. Moreover, items that are stored outside your file system aren’t included in the GitLab backup, which means that you may still lose some of your data.
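The three scenarios above can be summarized in a small helper. This is only a sketch: the function name gitlab_backup_command and the default container name gitlab are hypothetical, and the function merely prints the command instead of running it, so you can review it first. For source installs it prints the bundle exec rake form given in GitLab’s documentation.

```shell
#!/bin/sh
# Sketch: print (do not run) the backup command for an install type and version.
# gitlab_backup_command and the default container name "gitlab" are hypothetical.
# Expects a version in MAJOR.MINOR[.PATCH] form, for example 12.1.6 or 16.4.1.
gitlab_backup_command() (
  install_type="$1"
  version="$2"
  container="${3:-gitlab}"

  # Split "12.1.6" into major=12 and minor=1.
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"

  # GitLab 12.2 and later ship the gitlab-backup wrapper.
  if [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 2 ]; }; then
    omnibus_cmd="sudo gitlab-backup create"
    container_cmd="gitlab-backup create"
  else
    omnibus_cmd="gitlab-rake gitlab:backup:create"
    container_cmd="gitlab-rake gitlab:backup:create"
  fi

  case "$install_type" in
    omnibus) echo "$omnibus_cmd" ;;
    docker)  echo "docker exec -t $container $container_cmd" ;;
    source)  echo "sudo -u git -H bundle exec rake gitlab:backup:create RAILS_ENV=production" ;;
    *)       echo "unknown install type: $install_type" >&2; exit 1 ;;
  esac
)

# Example: gitlab_backup_command docker 16.4.1 my-gitlab
```

Printing the command rather than executing it keeps the helper safe to try on any machine; once the output looks right, you can run it by hand or wire it into your own tooling.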
Manual export and download from GitLab
Probably one of the easiest ways to have a copy of your GitLab data is to manually download your archives to your local device. This way, you can export and download repositories, as well as snippets.
Despite its simplicity and affordability (it’s free of charge), this method doesn’t provide you with a complete backup. First, those archives don’t contain your metadata, so you don’t get a copy of your entire GitLab instance. Second, it’s an entirely manual procedure that requires a lot of attention from your team, and if you have numerous GitLab repositories, it will eat into their time as well. Finally, this method requires you to restore your data manually should a failure happen, so recovery becomes an extra task for your DevOps team.
PgBouncer as a backup option
Connecting PgBouncer is another alternative for backing up your GitLab instance. However, you should be cautious, as this method can’t be considered a reliable one. GitLab states that it can cause your GitLab instance to go down, and in that case you will get an error message: ActiveRecord::StatementInvalid: PG::UndefinedTable.
To help you avoid this problem, GitLab warns that your backup and restore tasks “must bypass PgBouncer and connect directly to the PostgreSQL primary database node.” Alternatively, you can use environment variables that override your database settings for the backup task.
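As a rough illustration of the environment-variable approach, the sketch below prints a backup command that points the backup task at the PostgreSQL primary instead of PgBouncer. It assumes the GITLAB_BACKUP_PGHOST and GITLAB_BACKUP_PGPORT variables described in GitLab’s documentation; verify the names against the docs for your version. The function name and the host value are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: print (do not run) a backup command that bypasses PgBouncer.
# Assumes the GITLAB_BACKUP_PG* override variables from GitLab's docs;
# pgbouncer_bypass_backup_command is a hypothetical helper name.
pgbouncer_bypass_backup_command() (
  pg_host="$1"          # address of the PostgreSQL primary node (placeholder)
  pg_port="${2:-5432}"  # PostgreSQL default port unless overridden
  echo "sudo GITLAB_BACKUP_PGHOST=$pg_host GITLAB_BACKUP_PGPORT=$pg_port gitlab-backup create"
)

# Example: pgbouncer_bypass_backup_command 10.0.0.5
```

The variables apply only to that single command invocation, so the regular database settings of the instance stay untouched.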
GitLab Repository Cloning
Some DevOps engineers may consider cloning a backup option in itself… However, is it? Well, by cloning a GitLab repository you get a local, fully functional copy. You download the remote repository’s files to your computer and then set up a connection between the two. Yet keep in mind that this connection will require you to add credentials.
There are 3 possible ways to clone your GitLab repo:
- using SSH and running the command git clone git@gitlab.com:gitlab-tests/sample-project.git. Thus, you will need to authenticate yourself only once.
- using HTTPS and running the command git clone https://gitlab.com/gitlab-tests/sample-project.git. In this case, you will need to authenticate yourself each time you perform an operation between your computer and your GitLab instance.
- cloning with HTTPS using personal access, deploy, project access, or group access tokens via the command git clone https://<username>:<token>@gitlab.example.com/tanuki/awesome_project.git. This lets you clone when Two-Factor Authentication is enabled, or use a set of credentials that is revocable and specific to one or more repositories.
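If you do use cloning as a stopgap copy, a mirror clone is closer to a backup than a regular working copy, because git clone --mirror fetches all branches and tags into a bare repository. The sketch below assumes a hypothetical helper named mirror_repo; the URL and destination path are placeholders. It still copies Git data only, not issues, merge requests, or other metadata.

```shell
#!/bin/sh
# Sketch: keep a bare mirror of a repository up to date.
# mirror_repo is a hypothetical helper; URL and destination are placeholders.
mirror_repo() (
  url="$1"
  dest="$2"
  if [ -d "$dest" ]; then
    # Mirror already exists: fetch new refs and drop refs deleted upstream.
    git -C "$dest" remote update --prune
  else
    # First run: create a bare mirror containing all refs.
    git clone --mirror "$url" "$dest"
  fi
)

# Example: mirror_repo git@gitlab.com:gitlab-tests/sample-project.git /backups/sample-project.git
```

Running the helper repeatedly (for instance from a scheduler) refreshes the same mirror instead of creating new copies, which also means it keeps no history of earlier states beyond what Git itself records.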
File system data transfer or snapshot
If your GitLab instance contains so much Git repository data that the GitLab backup script becomes significantly slower, or if your instance has many forked projects and you don’t want the backup to duplicate the Git data for all of them, you might opt for the snapshot or file system data transfer option.
However, keep in mind that a file system data transfer or snapshot is not a backup method; it is just a copy of your entire GitLab instance at some point in time. What’s more, the source and destination operating systems should be as similar as possible, which makes these methods very difficult or even impossible to use for migrating from one operating system to another.
GitLab DIY backup
As an alternative, you can try custom scripts and do-it-yourself (DIY) solutions. Such an option may appear cost-effective at first glance, but it has certain limitations. First, DIY scripts usually have limited scalability: they sometimes fail to meet the backup requirements of a complex GitLab infrastructure, leading to performance issues.
Second, writing custom backup scripts can be time-consuming and requires you to pay a lot of attention to ongoing updates and debugging.
And, finally, backup scripts carry a risk of data loss and corruption, so it’s difficult to call them a reliable backup strategy.
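To give a sense of what such scripts involve, here is a minimal sketch of just one piece of a DIY setup: pruning old archives so the backup directory doesn’t fill the disk. The function name prune_old_backups is hypothetical, and the sketch assumes archives end in _gitlab_backup.tar and start with a Unix timestamp, so that sorting by name orders them from oldest to newest. Scheduling, verification, off-site copies, and restore logic would all need separate code.

```shell
#!/bin/sh
# Sketch: keep only the newest N backup archives in a directory.
# prune_old_backups is a hypothetical helper. It assumes file names begin with
# a Unix timestamp and end in _gitlab_backup.tar, so name order is age order.
prune_old_backups() (
  dir="$1"
  keep="$2"
  cd "$dir" || exit 1

  # The glob expands in sorted order: oldest archive first.
  set -- *_gitlab_backup.tar
  [ -e "$1" ] || exit 0   # no archives found, nothing to prune

  # Remove archives from the front of the list until only $keep remain.
  while [ "$#" -gt "$keep" ]; do
    rm -f -- "$1"
    echo "removed $1"
    shift
  done
)

# Example: prune_old_backups /var/opt/gitlab/backups 7
```

Even this small fragment has to make assumptions about naming and ordering, which illustrates why DIY scripts need ongoing maintenance.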
Upgrade your plan to GitLab Dedicated
In 2022, GitLab released its most secure product, GitLab Dedicated, which has absorbed the best features of GitLab and GitLab Ultimate. It provides a variety of security features, including better encryption, full data and source code IP isolation, full control over your data, backup, and Disaster Recovery.
So, once you upgrade your GitLab plan to GitLab Dedicated, you will get snapshots that are performed on a regular basis. Moreover, you will get the possibility to assign additional storage where you’ll be able to keep your backups. In this case, you can meet the 3-2-1 backup rule, by keeping the backup copies in different storage locations.
Despite the positive side, there are some disadvantages. For example, you can’t schedule your backups to run at the exact time you want, because the backup plan that GitLab Dedicated provides is predefined. What’s more, if you have strict RTO and RPO demands, it will be difficult to adjust them as well, since GitLab defines those metrics itself: an RTO target of 8 hours and an RPO target of 4 hours.
Third-party backup tools
Another option is to adopt a third-party backup solution to back up your GitLab environment. In this case, you will be able to share your responsibility for data protection with specially designed backup software. Although GitLab is developing its own backup options and is a reliable DevSecOps platform, organizations should still look for better backup alternatives to meet their needs and requirements. When picking the right GitLab backup option for your business, you should pay attention to the complexity of data management, GitLab ecosystem coverage (whether your backup app includes both repositories and metadata in its copy), security features, and recovery scenarios.
How can you benefit from third-party GitLab backup tools?
- You can get better control over the frequency of backups, retention time, and storage destinations.
- You can eliminate the consequences of human errors, outages, and ransomware attacks that usually lead to data loss or data corruption.
- You can restore your data fast with one click without writing any recovery script.
- You can meet strict security regulations and compliance requirements with backup reports.
A comprehensive overview of GitLab Backup Applications
Because this is a specific niche, only a few backup vendors support GitLab backups. Let’s look at them more closely…
GitProtect.io is automated backup software for GitLab and GitLab Ultimate, designed to make automating GitLab backups easy. This cloud-based or on-prem GitLab backup software offers comprehensive and advanced features, including backup and restore, scheduled and on-demand backups, and so on.
What sets GitProtect.io apart from other GitLab backup options? Let’s take a look…
Easily set up and manage your GitLab backups: To start backing up your data with GitProtect.io, you don’t need to install any software or agents. You can simply view and manage your backups from any browser. Thanks to an intuitive interface and its detailed dashboards, visual statistics, compliance section, real-time monitoring, and on-demand actions, you are always up to date on what is happening with your GitLab infrastructure backup.
Automate backups and reduce your DevSecOps team’s workload: To be sure that your GitLab backups are running like clockwork, you can schedule your copies to be performed automatically at any frequency and time you need. Or, if the need arises, you can run your backup manually. Moreover, if your organization has any specific requirements, you can create a fully customized backup plan: different retention settings, up to unlimited retention, various encryption levels with the possibility to set your own encryption key, etc. You can also set multiple backup policies, which are very helpful when you have a complex infrastructure: your organization has multiple offices, many employees work remotely, or your organization operates in different time zones.
Assign as many storage instances as your organization requires: Being a multi-storage system with unlimited free cloud storage included, GitProtect.io allows you to assign any public storage you’d like, including AWS, Google Cloud Storage, Azure Blob Storage, Backblaze B2, and any other storage compatible with S3. Moreover, you can store your data locally (SMB network shares, local disk resources, NAS devices) or even opt for a hybrid backup option and keep your critical GitLab data both in the cloud and locally. Thus, you can easily keep up with the 3-2-1 backup rule, having at least 3 copies in 2 different storage destinations, 1 of which is offsite, or with other backup strategies that are gaining popularity nowadays, such as the 4-3-2 or the 3-2-1-1-0 backup strategy.
Guarantee full data coverage: With GitProtect.io you can back up not only repositories but also all the related metadata, including wikis, issues, snippets, issue comments, deployment keys, pull requests, pull request comments, labels, webhooks, milestones, tags, LFS, pipelines/actions, releases, commits, collaboration, branches, groups, etc.
Ensure ransomware protection and data security: You can be sure that all your GitLab data is encrypted during the backup process, as the backup provider encrypts your data in flight and at rest, with the option of specifying your own encryption key. What’s more, in case you opt for GitProtect Cloud Storage, your backups will be kept in WORM-compliant storage with all the files stored separately. Thus, if ransomware attacks your storage, the malware won’t be able to spread inside your storage and infect all your files.
Meet compliance requirements: Thanks to advanced data-driven monitoring options, SLA section, custom notifications, reporting, and alerting, you are always aware of whether your backup job is done successfully or with warnings. Such an easy overview of backup performance helps you comply with numerous security regulations and compliance requirements, which include GDPR, HIPAA, ISO 27001, and SOC 2.
Secure authentication and authorization: You can enable login and authentication to the GitProtect panel with SAML & SSO, via Auth0, Azure AD, Okta, CyberArk, or Google. It will make your DevSecOps team’s authentication and authorization processes much easier, faster, and smoother.
Manage roles and permissions: You can assign different roles and permissions to your team members. Thus, some of your team members will be able to set backup plans, restore the data, and manage data stores and system settings, while others, the least privileged, will be able to only view the settings. As a result, you will have full control over your backups and their management, as you know what responsibilities every member of your team has.
Recover your GitLab data from any point in time: Foreseeing any disaster scenario, GitProtect.io helps you eliminate the consequences of any possible threat: a ransomware attack, a GitLab outage, downtime of your infrastructure, human mistakes, or intentional or unintentional deletion. The backup vendor provides you with numerous recovery options, including point-in-time restore, granular recovery of repositories and selected metadata, cross-over recovery to another Git hosting service (e.g. GitHub or Bitbucket), restore to the same or a new repository or organization, or recovery to your local device.
To see how GitProtect.io can help you build your backup strategy, you can try the backup solution for free during a 14-day trial. After the trial period is over, you will need to pay $18 for up to 15 GitLab repositories per month to continue backing up your GitLab environment.
BackupLABS provides GitLab backup software that automatically protects GitLab Projects against human errors, malicious users, platform issues, ransomware, and viruses, permitting rollback to specific data in one click.
Automated backups: You can select what GitLab data to protect and set up a backup plan which will be automatically updated every 24 hours.
Backup encryption: Your backup copies are encrypted both in flight and at rest with the AES 256 encryption algorithm.
Ensured compliance: You can have accurate and easily accessible records of your GitLab data history.
No need for coding or scripting: You can recover your data by simply defining the needed data to be restored fast.
Daily reports: You can receive daily updates on the statuses of your GitLab backups.
Around-the-clock support: You can turn to the backup support team at any time of the day or night.
Ownership with manual download: Download your own copies whenever you need them for ownership assurance.
If you want to try this solution, you can subscribe for a 14-day trial period. After the end of your trial, you will need to pay $9.60 per month for up to 10 GitLab Projects to back up. The price grows with the number of GitLab Projects you want to protect. For example, if your organization has 1K Projects in its GitLab environment, then you will need to pay $960 a month.
Which backup option is the best to meet your backup needs and requirements? It’s all up to you to decide. You can test all the solutions, analyze them, and then pick the most suitable option for your organization. However, don’t forget that GitLab data backup is your responsibility, and the well-being of your data depends on your decision. Whether you opt for your own backup strategy or rely on a third-party backup tool, make sure that you follow GitLab backup best practices: regular backups, DR capabilities, ransomware protection, security measures, and so on.