Cleaning up after migrating from Hg to Git
There is a lot of guidance out there on how to migrate from Mercurial to Git, but it often leaves you with a repository in a bad state, even more so if it was originally a Subversion repository that was migrated to Mercurial and now finally to Git.
The Lokad.Cloud repository was such a case. The committers and authors in the commit history were a complete mess, but that is not much of an issue in practice. Worse, most text files were stored internally with CRLF line endings instead of LF. Git supports platform-native checkouts (CRLF on Windows, LF on Linux) quite nicely, but this only works well if text files are normalized to LF internally when committed. I strongly recommend doing that, as it will save you a lot of trouble later on. Luckily, it is also the default behavior for new repositories.
Migration: Fast-Export to Git
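The usual procedure, which properly converts branches and tags to their Git equivalents, is to import the Mercurial history with the fast-export tool into a freshly initialized Git repository.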
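Below is a minimal sketch of that procedure, wrapped in a shell function. It assumes two things: Mercurial is installed, and the fast-export scripts (which provide hg-fast-export.sh) are already cloned locally. The function name migrate_hg_to_git and its argument order are hypothetical. The -r option and the final git checkout HEAD follow fast-export's documented usage as I understand it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: import a Mercurial repository into a new Git repository
# with hg-fast-export.sh, which converts Hg branches and tags to Git refs.
# Usage: migrate_hg_to_git <hg-repo> <new-git-repo> <fast-export-dir>
migrate_hg_to_git() {
  local hg_repo exporter
  # Resolve absolute paths, because the import runs inside the new repository.
  hg_repo=$(cd "$1" && pwd) || return
  exporter=$(cd "$3" && pwd)/hg-fast-export.sh || return
  git init "$2" || return
  # The import only fills the object database; the final checkout
  # populates the working tree.
  (
    cd "$2" &&
      "$exporter" -r "$hg_repo" &&
      git checkout HEAD
  )
}
```

For example: migrate_hg_to_git ../project-hg ../project-git ../fast-export. All three paths are placeholders.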
Normalize the whole history to LF line-endings
This step is only needed if some or all of the commits store text files with non-LF line endings internally. If the repository once lived in Subversion on Windows, this is almost certainly the case; it is not necessarily so for pure Mercurial repositories. To find out whether this is an issue, remove your Git index and then reset. If a lot of files are now listed as modified, you had better fix it as described here; if not, you can skip this step.
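Here is a sketch of that check, assuming it is run against the freshly migrated repository. The function name check_eol_normalization is hypothetical. It prints git status --porcelain output, so an empty result suggests there is nothing to normalize.

```shell
#!/usr/bin/env bash
# Hypothetical helper: throw away the index, rebuild it from HEAD and list
# every file Git now reports as modified. A long list usually means the
# committed text files contain CRLF line endings.
# Usage: check_eol_normalization [repo-dir]
check_eol_normalization() {
  local repo=${1:-.} git_dir
  git_dir=$(git -C "$repo" rev-parse --absolute-git-dir) || return
  rm -f "$git_dir/index"
  # A mixed reset (the default) rebuilds the index but leaves files untouched.
  git -C "$repo" reset --quiet
  git -C "$repo" status --porcelain
}
```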
I recommend doing this step on Linux, as it didn't work well for me on Windows.
First we need to turn off any automatic Git end-of-line handling. Unfortunately, for historical reasons this is controlled in more than one place. The first is the core.autocrlf setting, which must be set to false.
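In shell terms this is a single git config call. The wrapper function disable_autocrlf is only a hypothetical convenience, so the setting can be applied to a given repository path. It writes to the repository's own configuration, which takes precedence over any global or system-wide value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disable automatic CRLF/LF conversion for one repository.
# The value lands in that repository's .git/config and overrides any global
# or system-wide core.autocrlf.
# Usage: disable_autocrlf [repo-dir]
disable_autocrlf() {
  git -C "${1:-.}" config core.autocrlf false
}
```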
Next we need to get rid of any .gitattributes files in the repository, in case they specify automatic EOL handling. In most cases this is not necessary, but the repository I was dealing with used to be a hybrid Git/Mercurial repository and therefore already had a .gitattributes file. If there is one, delete it and commit. Afterwards your working directory should be clean, since Git no longer wants to fix the line endings of any text files it touches.
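Here is a sketch of deleting and committing those files, assuming the repository already has at least one commit. The name drop_gitattributes is hypothetical. The quoted glob is expanded by Git against tracked paths, so nested .gitattributes files are removed as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every tracked .gitattributes file (top-level
# and nested) and commit the removal, if there was anything to remove.
# Usage: drop_gitattributes [repo-dir]
drop_gitattributes() {
  local repo=${1:-.}
  # The quoted '*/.gitattributes' is matched by Git, not by the shell.
  git -C "$repo" rm --quiet --ignore-unmatch -- .gitattributes '*/.gitattributes' || return
  # Commit only if something was actually staged for removal.
  if ! git -C "$repo" diff --cached --quiet; then
    git -C "$repo" commit --quiet -m "Remove .gitattributes files"
  fi
}
```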
But to make sure the .gitattributes files in earlier commits don't interfere, we need to remove them from every commit with a history rewrite.
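Here is a minimal sketch of that rewrite using git filter-branch with an index filter, which edits each commit's index without checking files out. The function name purge_gitattributes_history is hypothetical. Limiting the rewrite to branches and tags (rather than --all) is a choice made for this sketch, so the refs/original/ backups of earlier rewrites are not rewritten themselves.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove every .gitattributes file (top-level and
# nested) from all commits on all branches and tags.
# Usage: purge_gitattributes_history [repo-dir]
purge_gitattributes_history() {
  # --index-filter edits each commit's index without a checkout, so it is fast.
  # -f overwrites refs/original/ left by an earlier rewrite, and
  # --tag-name-filter cat moves tags to the rewritten commits.
  git -C "${1:-.}" filter-branch -f --tag-name-filter cat --index-filter \
    'git rm --cached --quiet --ignore-unmatch -- .gitattributes "*/.gitattributes"' \
    -- --branches --tags
}
```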
After that we can finally convert all text files to LF line endings with another history rewrite.
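Here is a sketch of that rewrite with a tree filter. It assumes dos2unix and GNU xargs are installed (as on a typical Linux box) and that core.autocrlf is already off. normalize_eol_history is a hypothetical name, and using git grep -I to pick out the non-binary files is this sketch's choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite every commit on all branches and tags so that
# all text files use LF line endings.
# Assumes dos2unix and GNU xargs are installed and core.autocrlf is off.
# Usage: normalize_eol_history [repo-dir]
normalize_eol_history() {
  # In --tree-filter each commit is checked out and the index holds its tree:
  #  - git grep --cached -I -l -e "" lists every non-binary file,
  #  - -z / xargs -0 keep paths with spaces intact,
  #  - xargs -r skips dos2unix when there is nothing to convert.
  # -f overwrites refs/original/ left by an earlier rewrite, and
  # --tag-name-filter cat moves tags to the rewritten commits.
  git -C "${1:-.}" filter-branch -f --tag-name-filter cat --tree-filter \
    'git grep --cached -I -l -z -e "" | xargs -0 -r dos2unix -q' \
    -- --branches --tags
}
```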
For every commit, this converts all files that are not binary to LF line endings using dos2unix. In my case some paths contain spaces (don't ask...), so I switched to NUL-character separation with the -z and -0 options.
To make sure the normalization is enforced in future commits (especially from people who fork your repository and then send you pull requests), create a new .gitattributes file containing at least something like * text=auto. The core.autocrlf option, however, is a purely local setting and is effectively superseded by .gitattributes, so you can remove it completely.
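Here is a sketch of both steps, assuming the history has already been normalized. enforce_lf_normalization is a hypothetical name. It overwrites any existing top-level .gitattributes with the single * text=auto rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: commit a .gitattributes that tells Git to detect text
# files and store them with LF endings, then remove the local core.autocrlf
# override. Overwrites any existing top-level .gitattributes.
# Usage: enforce_lf_normalization [repo-dir]
enforce_lf_normalization() {
  local repo=${1:-.}
  printf '* text=auto\n' > "$repo/.gitattributes"
  git -C "$repo" add .gitattributes &&
    git -C "$repo" commit --quiet -m "Enforce LF normalization" || return
  # --unset exits with status 5 when the key is not set; treat that as success.
  git -C "$repo" config --unset core.autocrlf || [ $? -eq 5 ]
}
```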
Clean up committers and authors
You can get a quick overview of how badly the authors are off by listing the distinct author identities in the history.
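One way to get that overview is to list the raw author and committer identities with a commit count for each. list_identities is a hypothetical helper. The %an/%ae and %cn/%ce placeholders ignore .mailmap, so you see what is actually stored in the commits. git shortlog -sne gives a similar per-author summary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count commits per raw author and committer identity on
# all branches and tags. %an/%ae and %cn/%ce ignore .mailmap, so this shows
# exactly what is stored in the commits.
# Usage: list_identities [repo-dir]
list_identities() {
  local repo=${1:-.}
  echo "Authors:"
  git -C "$repo" log --branches --tags --format='%an <%ae>' | sort | uniq -c | sort -rn
  echo "Committers:"
  git -C "$repo" log --branches --tags --format='%cn <%ce>' | sort | uniq -c | sort -rn
}
```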
Luckily, fixing them is not that difficult; it just takes yet another history rewrite.
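Here is a sketch of that rewrite using filter-branch's --env-filter, which can change the author and committer variables before each commit is recreated. Several names are placeholders: the function name fix_identities, the old address jdoe@old-host.example, the bare name jdoe and the target identity Jane Doe <jane@example.com>. Add one case pattern per wrong identity you found in the overview.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite author and committer identities on all
# branches and tags. The patterns and the target identity are placeholders.
# Usage: fix_identities [repo-dir]
fix_identities() {
  # --env-filter is eval'ed before each commit is recreated; the exported
  # GIT_AUTHOR_* and GIT_COMMITTER_* values are used for the new commit.
  git -C "${1:-.}" filter-branch -f --tag-name-filter cat --env-filter '
    case "$GIT_AUTHOR_EMAIL" in
      jdoe@old-host.example | jdoe)
        GIT_AUTHOR_NAME="Jane Doe" GIT_AUTHOR_EMAIL="jane@example.com" ;;
    esac
    case "$GIT_COMMITTER_EMAIL" in
      jdoe@old-host.example | jdoe)
        GIT_COMMITTER_NAME="Jane Doe" GIT_COMMITTER_EMAIL="jane@example.com" ;;
    esac
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
  ' -- --branches --tags
}
```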
Housekeeping
After all these rewrites it is a good time for some Git maintenance. Check and verify your repository with git fsck, drop blobs that are no longer used with git prune, and then clean up and optimize your local repository with git gc.
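Here is a sketch of this housekeeping, assuming the rewritten history has been checked and the filter-branch backups are no longer needed. housekeeping is a hypothetical name. filter-branch keeps the old commits reachable through refs/original/ and the reflogs. Those are removed first, as the git filter-branch documentation suggests for shrinking a repository; otherwise prune and gc would keep the old blobs around. git prune only removes loose objects, so git gc --prune=now is what also drops unreachable objects stored in packs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the filter-branch backups, then verify the
# repository, prune unreachable objects and repack.
# WARNING: this deletes refs/original/ and all reflog entries for good.
# Usage: housekeeping [repo-dir]
housekeeping() {
  local repo=${1:-.}
  # Delete the refs/original/* backups that keep the old history reachable.
  git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
    xargs -r -n 1 git -C "$repo" update-ref -d
  # Expire every reflog entry, which would otherwise also pin old commits.
  git -C "$repo" reflog expire --expire=now --all
  # Check connectivity and validity of all objects.
  git -C "$repo" fsck --full || return
  # Remove unreachable loose objects right away (no grace period).
  git -C "$repo" prune --expire=now
  # Repack; --prune=now also drops unreachable objects stored in packs.
  git -C "$repo" gc --prune=now
}
```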