A simplistic backup strategy for Git repositories to AWS S3
Since I started hosting some of my git repositories on my own server instead of GitHub, I wanted a backup strategy for them. I see several possibilities:
- do not back up data on the server, because all local repositories together should contain all the information
- mirror data on the server to a secondary git repository
- copy the folders somewhere else, where they cannot be accessed as a repository; e.g. in a compressed archive
The first option in my opinion is not the best one, because you might have to restore the state from a lot of different PCs. The simplest scenario is that you work on branch A from host1, on branch B on host2, and so on.
The second option is a good one, because it also allows you to continue pushing and pulling changes in case your main remote fails. However, I do not have a second server, and CodeCommit is more expensive than the option I selected.
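For reference, such a mirror can be kept in sync with a second remote and a mirror push. A minimal sketch, where the remote name and URL are placeholders for your own secondary server:

```bash
# Hypothetical example: keep a secondary server in sync as a full mirror.
# "backup" and the URL are placeholders, not part of the original setup.
git remote add backup git@backup.example.com:myproject.git
git push --mirror backup
```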
I opted for the third option and created a small script that runs as a cron job and sends all git repositories as a compressed archive to S3 storage. Even if you mirror a repository to another one, you'd still want daily backups, because they allow you to return to an older state in case your repository gets corrupted for some reason.
All the script really does is put all files into an archive named with the current timestamp and transmit it to S3. I currently do not have a deletion strategy for old backups, but I assume it's OK to simply delete them after one month (rather than following a sophisticated deletion strategy with increasing time spans the older the backups get).
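A minimal sketch of such a script could look like the following. The repository path, bucket name, and the use of the `amazon/aws-cli` Docker image are assumptions; adjust them to your own setup.

```bash
#!/bin/bash
# Sketch of a daily backup script (paths and bucket name are placeholders).
set -euo pipefail

REPO_DIR=/srv/git                      # directory containing the repositories
BUCKET=s3://my-git-backups             # target S3 bucket
ARCHIVE=/tmp/git-backup-$(date +%Y-%m-%d).tar.gz

# Put all repositories into one compressed archive named with the current date
tar -czf "$ARCHIVE" -C "$REPO_DIR" .

# Upload the archive to S3 with the AWS CLI running inside a Docker container
docker run --rm \
    -v "$HOME/.aws:/root/.aws:ro" \
    -v /tmp:/tmp \
    amazon/aws-cli s3 cp "$ARCHIVE" "$BUCKET/"

# Delete the local archive again
rm "$ARCHIVE"
```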
The script first creates the archive with the current date in the file name, then calls the AWS CLI (inside a Docker container) to upload the file to AWS S3, and finally deletes the archive again.
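To run it daily, the script can be registered in cron; the path and time below are only an example:

```bash
# Example crontab entry: run the backup every night at 03:00
# (the script path is a placeholder)
0 3 * * * /usr/local/bin/git-backup.sh
```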
I do not maintain a comments section. If you have any questions or comments regarding my posts, please do not hesitate to send me an e-mail to blog@stefan-koch.name.