Extract part of a Subversion repo into a Git repo

A few times I’ve wanted to extract part of a large monolithic Subversion repository into a separate Git repo while keeping its commit history.

Here’s how I do it.

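First, I set up a file that maps each Subversion username to a name and email address, so that commits are attributed to the right people (GitHub, for instance, matches commits to accounts by email address). Each committer gets one line in this format; git svn expects the email in angle brackets, and the address below is only a placeholder:


davidp = David Precious <davidp@example.com>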
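If the repository has many committers, writing that file by hand gets tedious. Here’s a rough sketch of how the usernames could be pulled out of svn log --quiet, assuming its usual "rN | user | date" revision lines. make_authors_stub is just a name I’ve made up, and the example.com addresses are placeholders to replace with real ones before cloning:

```shell
#!/bin/sh
# Hypothetical helper: turn `svn log --quiet` output (read from stdin)
# into a starter authors file. Every line still needs editing by hand,
# because the names and example.com addresses are placeholders.
make_authors_stub() {
  # Revision lines look like "r123 | davidp | 2010-01-01 12:00:00 +0000 (...)";
  # the second " | "-separated field is the Subversion username.
  awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 }' |
    sort -u |
    while IFS= read -r user; do
      printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
    done
}

# Usage, with the repository URL from the clone step:
#   svn log --quiet file:///shared/svn/scripts | make_authors_stub > ~/subversion_authors.txt
```

Each username appears once however many commits it made, thanks to sort -u.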

Next, I clone the entire Subversion repo into a new Git repository using git svn:


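# Clone the Subversion repo into a new Git repo
# (-A is short for --authors-file; ~/subversion_authors.txt is the file described above):
git svn clone file:///shared/svn/scripts --no-metadata -A ~/subversion_authors.txt tmp/scripts-repo-tmp
# Everything from here on is run inside the new repo:
cd tmp/scripts-repo-tmp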
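As far as I know, git svn quietly skips any authors-file line it can’t parse (one with no email address, say) and then stops with an "Author: ... not defined" error when it reaches that committer, which can be well into a long clone. A small pre-flight check along these lines might save a restart. check_authors_file is a hypothetical helper, and its pattern only approximates the "name = Full Name <email>" shape git svn looks for:

```shell
#!/bin/sh
# Hypothetical helper: print (with line numbers) every non-blank line of an
# authors file that doesn't look like "svnuser = Full Name <email>".
# Returns 0 if every line looks fine, 1 otherwise.
check_authors_file() {
  # -v selects lines matching NONE of the patterns: neither a well-formed
  # mapping nor a blank line.
  if grep -n -v -E -e '^[^=]+=[^<]+<[^>]*>[[:space:]]*$' -e '^[[:space:]]*$' "$1"; then
    echo "Fix the lines above before running git svn clone." >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_authors_file ~/subversion_authors.txt
```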

I believe some tags get created during this process; I don’t need or want to keep them, so I delete them all:


# remove tags - we don't need them
git tag -l | xargs git tag -d

Now for the clever part: using git filter-branch to keep only the path I want, discard everything else, and promote that path to the root of the repository:


# remove all except a given path:
git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter path/to/desired/dir HEAD

Here, path/to/desired/dir is the path within the repo that I want to become the new root; everything else is discarded, and --prune-empty drops any commits that end up making no changes as a result.
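To see what --subdirectory-filter does without risking a real repo, here’s a throwaway demo on a tiny made-up repository (all directory, file and identity names are invented for illustration). FILTER_BRANCH_SQUELCH_WARNING=1 only skips the warning and pause that newer Git versions show before running filter-branch:

```shell
#!/bin/sh
# Throwaway demo of git filter-branch --subdirectory-filter on a toy repo.
set -e
demo=$(mktemp -d)
cd "$demo"
# Placeholder identity so the commits can be made without any git config.
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'
git init -q

mkdir -p path/to/desired/dir other
echo keep > path/to/desired/dir/keep.txt
echo drop > other/drop.txt
git add .
git commit -q -m 'Add both directories'

echo more > other/more.txt
git add .
git commit -q -m 'Change only the unwanted directory'

# Skip the warning (and delay) recent Git prints before filter-branch runs.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --prune-empty --subdirectory-filter path/to/desired/dir HEAD >/dev/null 2>&1

git ls-tree --name-only HEAD        # keep.txt: the subdirectory is now the root
git rev-list --count HEAD           # 1: the commit that only touched other/ is gone
git for-each-ref refs/original/     # the pre-rewrite branch, kept as a backup ref
```

The last line shows the backup that filter-branch saves under refs/original/, which is what keeps the old history alive locally.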

At this point, I can add the GitHub repository as a remote with git remote add origin $url and push the new repository to it.
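Here’s a small experiment illustrating what a push actually sends, using a local bare repository as a stand-in for GitHub. Rather than running the whole process, it fakes a rewrite with git commit --amend plus a hand-made backup ref under refs/original/ (the ref and repo names are made up):

```shell
#!/bin/sh
# Push a rewritten branch to a local bare repo (standing in for GitHub)
# and check which refs and commits actually arrive there.
set -e
work=$(mktemp -d)
# Placeholder identity so the commits can be made without any git config.
export GIT_AUTHOR_NAME='Demo' GIT_AUTHOR_EMAIL='demo@example.com'
export GIT_COMMITTER_NAME='Demo' GIT_COMMITTER_EMAIL='demo@example.com'

git init -q "$work/src"
cd "$work/src"
echo hello > file.txt
git add file.txt
git commit -q -m 'Original commit'
old=$(git rev-parse HEAD)

# "Rewrite" history, keeping a backup ref the way filter-branch does.
git commit -q --amend -m 'Rewritten commit'
git update-ref refs/original/refs/heads/demo "$old"

# Equivalent of "git remote add origin $url" followed by a push.
git init -q --bare "$work/standin.git"
git remote add origin "$work/standin.git"
git push -q origin HEAD:refs/heads/master

# Only the pushed branch, and objects reachable from it, reach the remote.
git -C "$work/standin.git" for-each-ref --format='%(refname)'
git -C "$work/standin.git" cat-file -t "$old" 2>/dev/null || echo "old commit not on the remote"
```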

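I *think* unrelated earlier commits were removed automatically in my case, because I pushed to GitHub, deleted my temporary repo and cloned back down. That would fit with how pushing works: a push only transfers objects reachable from the refs being pushed, so the backup refs git filter-branch leaves under refs/original/ (and any old commits only they can reach) stay behind in the local repo. In case that’s not the whole story, though, the following ought to purge unrelated commits from the new Git repo: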


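# Make sure the working tree matches the rewritten HEAD
git reset --hard
# Delete the backup refs git filter-branch saved under refs/original/
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# Expire reflog entries that still point at the old commits
git reflog expire --expire=now --all
# Repack, dropping objects that are no longer reachable
git gc --aggressive --prune=now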
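One thing I’d double-check afterwards: as far as I can tell, git svn clone also leaves a refs/remotes/git-svn ref pointing at the original, unfiltered history, and the commands above don’t delete it, so git gc would keep those old commits around. Here’s a rough way to list any refs that can still reach commits outside the current branch; stale_refs is a hypothetical helper name, and it will also list any other branches you have, so check its output before deleting anything:

```shell
#!/bin/sh
# Hypothetical helper: list refs that can reach at least one commit
# not reachable from HEAD. Such refs keep old history alive through gc.
stale_refs() {
  git for-each-ref --format='%(refname)' | while IFS= read -r ref; do
    # rev-list prints commits reachable from $ref but not from HEAD;
    # -n 1 stops after the first one, which is all we need to know.
    if [ -n "$(git rev-list -n 1 "$ref" --not HEAD)" ]; then
      printf '%s\n' "$ref"
    fi
  done
}

# Usage, inside the converted repo:
#   stale_refs
#   git update-ref -d refs/remotes/git-svn   # if it shows up and you're done with Subversion
```

After removing anything it lists, re-running the reflog expire and gc commands should let the old objects go.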