Git Howto: Mirror a GitHub repo without pull refs

GitHub recently started publishing all pull requests as special git refs. This is awesome, since it makes it trivial to check them out and work with them from your local repository, without having to add the submitter's repo as a remote every time. It is also nicely done in that it does not affect normal clones in any way, unless you actually want to fetch them.

However, there is one case where it may have an undesired side effect: mirrors. For example, I routinely mirror the Math.NET Numerics mainline repository to a couple of other places, including Codeplex, Gitorious and Google Code. I want a mirror to exactly mirror the source repository, adding all new branches and tags automatically, but also removing those that have been deleted in the source. Git has excellent support for such exact mirroring. Unfortunately this mirroring mechanism includes all the pull refs as well, which may not be what you want. In Math.NET Numerics, some pull requests are actually based on an old (long removed) branch that included some corrupt objects. So in this case, including them in the mirror not only doubles the repository size, it also results in a corrupt git repository.

Luckily there is an easy way to skip them in the mirror, but to do that we must understand how git refs actually work:

Git Refs

In essence, a git ref is just a reference to a specific git commit. Refs can represent local branches and tags, but also remote branches. To keep things organized, they're structured hierarchically. You can find them in two places:

  • As a separate file for each ref in the .git/refs directory
  • In the .git/packed-refs file

In a normal local repository you'll typically end up with the following structure:

refs/heads/{branchname} - all your local branches
refs/remotes/{remotename}/{branchname} - all your fetched remote branches
refs/tags/{tagname} - all tags
refs/stash - your stash, if you use it

However, if you create a local mirror of a GitHub repo, i.e.

$ git clone --mirror git://github.com/mathnet/mathnet-numerics.git

Then you'll end up with exactly the same bare structure as the remote itself, but this time including GitHub's pull requests:

refs/heads/{branchname} - all your remote branches
refs/tags/{tagname} - all your remote tags
refs/pull/{id}/head|merge - all your remote GitHub pull requests
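To see this effect without touching the network, here is a minimal sketch that simulates it locally. The throwaway "source" repository and the hand-made ref refs/pull/1/head are assumptions standing in for a real GitHub repo and one of its pull requests; only standard git commands are used.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical stand-in for the GitHub repository, with a single commit.
git init -q "$work/source"
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Simulate the ref GitHub would publish for a pull request.
git -C "$work/source" update-ref refs/pull/1/head HEAD

# A mirror clone copies every ref under refs/, including the pull ref.
git clone -q --mirror "$work/source" "$work/mirror.git"
mirror_refs=$(git -C "$work/mirror.git" for-each-ref --format='%(refname)')
echo "$mirror_refs"
```

The listing should contain the branch under refs/heads/ as well as refs/pull/1/head, which is exactly the kind of ref that ends up in the mirror unintentionally.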

How exactly a remote's refs are mapped to your local refs, and why the structure of a normal clone differs from that of a bare mirror, is defined in the refspec that is automatically added to your repo config.

In a normal clone, the fetch refspec typically looks like this:

fetch = +refs/heads/*:refs/remotes/origin/*

It essentially says that all remote refs within refs/heads should map to local refs in refs/remotes/origin. On the other hand, a mirror includes all refs, so its refspec looks like the following:

fetch = +refs/*:refs/*
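If you want to check which refspec a given clone actually uses, git config can print it. The sketch below compares a normal clone and a mirror clone of a throwaway local repository; the temporary "source" repo is an assumption made only so the example runs offline.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical source repository with a single commit.
git init -q "$work/source"
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# One normal clone and one mirror clone of the same source.
git clone -q "$work/source" "$work/normal"
git clone -q --mirror "$work/source" "$work/mirror.git"

# Print the fetch refspec(s) git wrote into each config.
normal_spec=$(git -C "$work/normal" config --get-all remote.origin.fetch)
mirror_spec=$(git -C "$work/mirror.git" config --get-all remote.origin.fetch)
echo "normal clone: $normal_spec"
echo "mirror clone: $mirror_spec"
```

The two printed values should match the refspecs quoted above.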

Excluding Pull Refs

As far as I know there is no simple way to exclude some refs in a subdirectory of a refspec, but you can add multiple fetch refspecs to get the same effect. Simply replace the catch-all refspec above with two more specific specs to just include all heads and tags, but not the pulls, and all the remote pull refs will no longer make it into your bare mirror:

fetch = +refs/heads/*:refs/heads/*
fetch = +refs/tags/*:refs/tags/*
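Instead of editing the config file by hand, the same two refspecs can be set with git config. The following is a sketch under assumptions, not the exact setup used for Math.NET Numerics: the remote is called origin, and a throwaway local repository with a hand-made refs/pull/1/head stands in for GitHub.

```shell
set -eu
work=$(mktemp -d)

# Hypothetical source repository with one commit and one simulated pull ref.
git init -q "$work/source"
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$work/source" update-ref refs/pull/1/head HEAD

# Fresh bare repository that will act as the filtered mirror.
git init -q --bare "$work/mirror.git"
git -C "$work/mirror.git" remote add origin "$work/source"

# Replace the default refspec with the two specific ones.
git -C "$work/mirror.git" config --unset-all remote.origin.fetch
git -C "$work/mirror.git" config --add remote.origin.fetch '+refs/heads/*:refs/heads/*'
git -C "$work/mirror.git" config --add remote.origin.fetch '+refs/tags/*:refs/tags/*'

# Fetch and list what arrived: branches (and tags, if any), but no pull refs.
git -C "$work/mirror.git" fetch -q --prune origin
filtered_refs=$(git -C "$work/mirror.git" for-each-ref --format='%(refname)')
echo "$filtered_refs"
```

Note that changing the refspec only affects future fetches. Pull refs that were already fetched into an existing mirror stay there until they are deleted separately, for example with git update-ref -d.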

Full Config Example

For completeness, I've attached the full config (see git config -e) I use myself for mirroring Math.NET Numerics below.

[core]
    repositoryformatversion = 0
    filemode = false
    bare = true
    symlinks = false
    ignorecase = true
    hideDotFiles = dotGitOnly
[remote "mathnet"]
    url = git@github.com:mathnet/mathnet-numerics.git
    fetch = +refs/heads/*:refs/heads/*
    fetch = +refs/tags/*:refs/tags/*
    mirror = true
[remote "mirrors"]
    url = https://git01.codeplex.com/mathnetnumerics
    url = https://code.google.com/p/mathnet-numerics/
    url = git@gitorious.org:mathnet-numerics/mainline.git
    mirror = true
    skipDefaultUpdate = true

To update the mirror, I then run the following commands:

$ git remote update
$ git push mirrors