Mirroring a Git Repository

For the past couple of years, I’ve been mirroring repositories locally so that I always have the code accessible, even without Internet access. When I saw that FoundationDB’s GitHub account was wiped after Apple bought the company, I figured other people might be interested in mirroring repositories, too.

I currently mirror Git, Mercurial and Subversion repositories, but here I’ll just focus on how to mirror Git. I’ll cover the other two in separate articles.

How to do it

First, you want to do the initial clone. This is slightly different from a regular clone: you want to get everything from the remote, including all refs (not just branches), and you want those refs to be overwritten whenever you update. Luckily, Git provides an appropriately named flag for this, --mirror.

git clone --mirror git://git.sv.gnu.org/coreutils

This will clone the repository into coreutils.git (when it picks the directory name itself, a mirror clone appends .git). You can rename it, but it’s useful to keep the .git suffix as a reminder that this is just a mirror, not a working copy.
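To see what --mirror actually sets up, here’s a minimal sketch that mirrors a throwaway local repository instead of a real remote (the upstream directory and the v1.0 tag are made up for the demo, so no network is needed) and prints the remote configuration Git wrote into the mirror:

```shell
#!/bin/bash
# Sketch: inspect what "git clone --mirror" configures, using a throwaway
# local "upstream" repository (invented for the demo) instead of a real remote.
set -e

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A tiny upstream repo with one commit and one tag.
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C upstream tag v1.0

# Mirror it; with no directory given, the mirror is named upstream.git.
git clone -q --mirror upstream

# The mirror's remote fetches every ref and force-updates it.
fetch_spec=$(git -C upstream.git config --get remote.origin.fetch)
mirror_flag=$(git -C upstream.git config --get remote.origin.mirror)
echo "fetch refspec: $fetch_spec"   # fetch refspec: +refs/*:refs/*
echo "mirror flag: $mirror_flag"    # mirror flag: true
```

The leading + in the refspec is what lets later updates overwrite refs, and refs/*:refs/* is what makes the mirror take every ref rather than just branches.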

Keeping it updated

Keeping your mirror up to date is very straightforward. From inside the mirror directory, run:

git remote update --prune

Because git clone --mirror configures the remote to force-update every ref, this overwrites all of them, so you’ll have a properly updated mirror. Since you want a clean mirror (and don’t want branches sticking around after they’ve been deleted on the remote), use the --prune flag, which removes refs that no longer exist on the remote.
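Here’s a small sketch of what --prune buys you, again using made-up local repositories rather than a real remote: a branch (the hypothetical feature branch) is deleted upstream, and after the update it’s gone from the mirror as well.

```shell
#!/bin/bash
# Sketch: show that "git remote update --prune" drops refs deleted upstream.
# The upstream repo and its "feature" branch are invented for the demo.
set -e

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C upstream branch feature

git clone -q --mirror upstream          # creates upstream.git with the branch

# Delete the branch upstream, then refresh the mirror from inside it.
git -C upstream branch -q -D feature
cd upstream.git
git remote update --prune >/dev/null 2>&1

# Check whether the branch survived in the mirror.
if git show-ref --verify --quiet refs/heads/feature; then
    status=present
else
    status=pruned
fi
echo "feature branch: $status"          # feature branch: pruned
```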

Looking at the contents

What we’ve done so far is create a bare repository. In a regular clone, the code lives in the directory and Git’s own files live in .git; in a bare repository, the directory contains only what would normally be inside .git.
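A quick way to confirm that a mirror really is bare is git rev-parse --is-bare-repository. This sketch builds a throwaway mirror (the upstream repository and its README are invented for the demo) and checks that the mirror holds Git’s internal files but no checked-out README:

```shell
#!/bin/bash
# Sketch: confirm a mirror is bare, i.e. it holds what a normal clone keeps
# under .git (objects, refs, config) and no checked-out files.
set -e

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Throwaway upstream with a single file.
git init -q upstream
echo "hello" > upstream/README
git -C upstream add README
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add README"

git clone -q --mirror upstream

bare=$(git -C upstream.git rev-parse --is-bare-repository)
echo "bare: $bare"                      # bare: true
ls upstream.git                         # HEAD, config, objects, refs, ... but no README
```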

So if we want to actually see the contents of the repository, we’ll need to clone it. This works exactly like a regular clone, except the URL is a path on the local filesystem (here, $REPOS is the directory holding your mirrors):

git clone $REPOS/coreutils.git

This will clone it to ./coreutils and you can look at the code just as you always did.
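A self-contained sketch of that step, assuming $REPOS is the directory that holds your mirrors. Here it’s a temporary directory, and coreutils.git is a small stand-in mirror built on the spot rather than the real coreutils:

```shell
#!/bin/bash
# Sketch: get a working copy out of a local mirror. REPOS is assumed to be
# the directory holding your mirrors; here it's a throwaway temp dir and
# coreutils.git is a tiny stand-in mirror, not the real coreutils.
set -e

REPOS=$(mktemp -d)
trap 'rm -rf "$REPOS"' EXIT

# Build a small mirror to clone from.
git init -q "$REPOS/src"
echo "hello" > "$REPOS/src/README"
git -C "$REPOS/src" add README
git -C "$REPOS/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add README"
git clone -q --mirror "$REPOS/src" "$REPOS/coreutils.git"

# The actual step: a regular clone whose URL is a local path.
cd "$REPOS"
git clone -q "$REPOS/coreutils.git"     # creates ./coreutils
cat coreutils/README                    # hello
```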

Automating it

I’m working on a better system to mirror repositories, but for now I have a couple of scripts that help me out.

This is my mirror script. I pass it the repository’s URL as the first argument and, optionally, a name for the mirror as the second:

#!/bin/bash
set -e

if [ -z "$1" ]; then
    echo "usage: $0 URL [NAME]" >&2
    exit 1
fi

repo=$1
# default name: everything after the last / in the URL
repo_name=${repo##*/}

# use a different repo name if provided
if [ -n "$2" ]; then
    repo_name=$2
fi

if [[ $repo_name != *.git ]]; then
    # git clone --mirror adds .git to the name it picks by default, but not
    # to an explicit directory, so add it here to keep the naming consistent
    repo_name=$repo_name.git
fi

# clone first, so a failed clone isn't recorded in the lists
git clone --mirror "$repo" "$repo_name"

echo "$repo_name" >> git_repo_names.txt
echo "$repo" >> git_repos.txt
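For reference, here’s the script’s naming rule pulled out into a hypothetical helper function, mirror_dir_name (not part of the script above), so you can see which directory a given URL ends up in:

```shell
#!/bin/bash
# Hypothetical helper reproducing the mirror script's naming rule:
# take the optional override, else the last path component of the URL,
# and make sure the result ends in .git.
mirror_dir_name() {
    local name=${2:-${1##*/}}
    [[ $name == *.git ]] || name=$name.git
    printf '%s\n' "$name"
}

mirror_dir_name git://git.sv.gnu.org/coreutils                 # coreutils.git
mirror_dir_name https://github.com/git/git.git                  # git.git
mirror_dir_name git://git.sv.gnu.org/coreutils gnu-coreutils    # gnu-coreutils.git
```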

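The second one is my update script. It uses GNU parallel to update all the mirrors concurrently, which is faster than updating them one at a time. Run it from the directory that holds the mirrors and git_repo_names.txt.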

#!/bin/bash
parallel 'cd {} && git remote update --prune' < ./git_repo_names.txt
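If GNU parallel isn’t installed, xargs -P can run the same updates concurrently; this is a swapped-in alternative, not what my script uses. The sketch assumes git_repo_names.txt lists one mirror directory per line with no quotes or backslashes in the names, and it builds two throwaway mirrors (a.git and b.git, invented for the demo) so it can run on its own:

```shell
#!/bin/bash
# Sketch: parallel mirror updates with xargs -P instead of GNU parallel.
# The a/b repositories are throwaway stand-ins for real mirrors.
set -e

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Demo setup: two upstreams mirrored into a.git and b.git, listed one per line.
for name in a b; do
    git init -q "$name"
    git -C "$name" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "initial commit"
    git clone -q --mirror "$name"
    echo "$name.git" >> git_repo_names.txt
done

# Add a branch upstream so there is something new to fetch.
git -C a branch new-work

# The update step: -I{} runs one command per line, -P 4 runs up to four at once.
xargs -P 4 -I{} git -C {} remote update --prune \
    < ./git_repo_names.txt >/dev/null 2>&1
```

The -P value just caps how many updates run at the same time; pick whatever your machine and network can handle.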

I’m currently working on something more robust (the .txt files are a bad way of doing this), but these scripts have worked fairly well for quickly setting up a lot of Git mirrors.

I hope this helped. Git makes mirroring repositories fairly straightforward and mirrors don’t take up much space, so you should make some yourself! You never know when the code you want to read will disappear.