Cleanly Handling a Fork on GitHub

Last updated: Friday, October 30 2020

One of the greatest things about GitHub is the ease with which you can fork a project and make your own changes.

One of the worst things about GitHub is the ease with which you can fork a project and then lose track of the changes upstream is making.

This is extra-true when you want to make occasional PRs for a project with regular commits; it's easy for your fork to fall so far behind that it becomes really annoying to write patches that'll merge cleanly later. This guide collects my personal experience in handling this situation with minimal pain.

One: Set an "upstream" remote

If you run git remote -v in your repo, you'll see the remote repositories that your local copy knows about. These are the repos you can fetch from and push to by name, rather than by typing out a full URL. By default, it'll only contain the GitHub URL for your fork, labeled "origin". You're gonna be interacting with the original repo you forked from a lot, so you want to give it an easy name, too:

git remote add upstream https://github.com/ORIGINAL_OWNER/ORIGINAL_REPOSITORY.git

Now you can easily refer to the original repo as upstream.
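If you want to see what that looks like without touching a real project, here's a minimal sketch you can run in a throwaway directory. The two local bare repos (fork.git and upstream.git) are stand-ins I made up for the GitHub URLs; in real life you'd use the URL from above.

```shell
set -e
sandbox=$(mktemp -d)

# Hypothetical stand-ins for the two GitHub repos
git init -q --bare "$sandbox/upstream.git"   # the original project
git init -q --bare "$sandbox/fork.git"       # your fork of it

# Cloning your fork gives you "origin" for free
# (git may warn that the repository is empty; that's fine here)
git clone -q "$sandbox/fork.git" "$sandbox/work"
cd "$sandbox/work"

# The step from above: give the original repo an easy name
git remote add upstream "$sandbox/upstream.git"

# Both remotes should now be listed, each with a fetch and a push URL
git remote -v
```

You should see four lines: a fetch and a push entry each for origin and upstream.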

Two: Regularly re-sync with upstream

Whenever you're about to start some new work in your fork, re-sync it with the upstream first, so you have all the new commits and are less likely to conflict badly:

git fetch upstream
git checkout master
git merge upstream/master
git push

This should cleanly merge and just do a "fast-forward" on your repo, rather than actually making a merge commit, as long as you faithfully do what I suggest in the next step.
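If you'd rather have git refuse outright than quietly create a merge commit, the merge command has a --ff-only flag that does exactly that. Here's a sketch of the re-sync with that flag, run against throwaway local repos; the paths, the notes.txt file and the demo identity are all made up for illustration.

```shell
set -e
sandbox=$(mktemp -d)
# Throwaway identity so commits work even without a global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical stand-in for the original project, with one commit on master
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master
echo one > notes.txt
git add notes.txt
git commit -q -m "upstream: first commit"

# Your fork (bare, like a repo on GitHub) and your local clone of it
git clone -q --bare "$sandbox/upstream" "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/work"

# Upstream moves ahead while you're not looking
echo two >> notes.txt
git commit -q -am "upstream: second commit"

# The re-sync, except the merge aborts instead of creating a merge commit
cd "$sandbox/work"
git remote add upstream "$sandbox/upstream"
git fetch -q upstream
git checkout -q master
git merge --ff-only upstream/master
git push -q origin master
```

If the merge step ever fails with this flag, it means something got committed to your master that upstream doesn't have.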

Three: NEVER COMMIT TO YOUR MASTER BRANCH

If you want a clean, easy workflow, you want to NEVER, EVER commit to your local master branch. The master branch's sole purpose is to track upstream exactly, so you can always cleanly work against the current version and write easy-to-merge PRs.

Instead, always make your changes in branches. When you push the branch to your remote, you can do a PR; when the PR is accepted, you can pull the newly-updated upstream and then delete your branch. master always remains a source of upstream truth, unpolluted by your personal code unless blessed by the upstream maintainers.

(This is good advice in general; always make changes in branches and only commit to master when you're done, but I'm usually lazy and just always commit to master for personal projects. But it's super-important to get this right for forks, or else you're in for a lot of pain.)
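The whole branch lifecycle described above can be sketched roughly like this. It's a simulation in throwaway local repos, so the paths, the branch name my-cool-branch and the file names are assumptions of mine, and the "PR accepted" part is faked with a plain fast-forward merge on the upstream side.

```shell
set -e
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical stand-ins: upstream project, your bare fork, your local clone
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "upstream: base"
git clone -q --bare "$sandbox/upstream" "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/work"
cd "$sandbox/work"
git remote add upstream "$sandbox/upstream"

# All work happens on a topic branch, never on master
git checkout -q -b my-cool-branch
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q -u origin my-cool-branch   # the branch you'd open the PR from

# Simulate the maintainers accepting the PR (on GitHub, the merge button)
cd "$sandbox/upstream"
git fetch -q "$sandbox/fork.git" my-cool-branch
git merge -q --ff-only FETCH_HEAD

# Back home: pull the blessed change into master, then clean up the branch
cd "$sandbox/work"
git fetch -q upstream
git checkout -q master
git merge -q --ff-only upstream/master
git push -q origin master
git branch -d my-cool-branch
git push -q origin --delete my-cool-branch
```

Notice that master only ever moves by fast-forwarding to upstream; your own commit reaches it via the maintainers, not via you.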

Four: Rebase your branches regularly

Say you're in the middle of writing a new proposed feature (in a branch, of course) and you notice that upstream has some new commits that touch code you're going to be modifying soon. You'd like to get that code into your branch now, before you start modifying things, so you don't have merge conflicts later. But how?

First, re-sync your master as described in step two. Then, rebase your topic branch on top of the new upstream code:

git checkout MY_COOL_BRANCH
git rebase master

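This'll set your commits aside, move your branch onto the freshly synced master, then replay your commits on top of it, so when you eventually make a PR, the upstream maintainers will have a perfect clean merge, with your commits sitting on top of their latest code. If one of your commits touches the same lines as the new upstream code, the rebase will stop and ask you to resolve the conflict before it continues.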
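If you want to convince yourself the rebase did what you expected, a couple of read-only checks help. Here's a sketch in a throwaway sandbox; to keep it short I've left the fork out and cloned the made-up upstream repo directly under the remote name upstream, which is not how a real fork is set up.

```shell
set -e
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream project with one commit on master
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "upstream: base"

# Local clone; -o names the remote "upstream" instead of "origin"
git clone -q -o upstream "$sandbox/upstream" "$sandbox/work"
cd "$sandbox/work"
git checkout -q -b MY_COOL_BRANCH
echo feature > feature.txt
git add feature.txt
git commit -q -m "Start feature"

# Meanwhile, upstream gains a commit
cd "$sandbox/upstream"
echo more >> base.txt
git commit -q -am "upstream: more"

# Re-sync master (step two), then rebase the topic branch onto it
cd "$sandbox/work"
git fetch -q upstream
git checkout -q master
git merge -q --ff-only upstream/master
git checkout -q MY_COOL_BRANCH
git rebase -q master

# Checks: master is an ancestor of the branch, and only your commits sit on top
git merge-base --is-ancestor master MY_COOL_BRANCH && echo "branch is on top of master"
git log --oneline master..MY_COOL_BRANCH
```

The log at the end should list only your own commits, which is exactly what the PR will contain.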

Five: (Usually) Don't Force-Push Commits, Just Push

When you eventually do submit your PR, the review will probably catch some small mistakes you need to fix. Go ahead and just fix them, and push the new commits with the changes normally. They'll show up in the PR history, and older comments and conversations will stick around and give context to reviewers later.

If you make a significant change, where the commit-by-commit diff isn't actually meaningful any longer, then it might make sense to do a force push: git checkout your PR branch, git log to find the first commit of your PR (or the one that you want to reset your progress onto, at least), git reset --soft COMMIT-HASH-HERE to move your "HEAD" pointer back to that spot while keeping all your later changes staged, then git commit --amend to rewrite that commit so it includes them. (A plain git reset without --soft leaves the changes unstaged, so you'd have to git add them before amending.) Finally, push it with git push --force; a normal push would be rejected, because you've rewritten history that's already on GitHub.
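Spelled out as commands, that sequence looks roughly like the sketch below. It uses git reset --soft, so the later changes stay staged and the amend picks them up. The bare repo standing in for your GitHub fork, the branch name my-pr and the file contents are all made up for the demo.

```shell
set -e
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical bare repo standing in for your fork on GitHub
git init -q --bare "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "base"
git push -q origin master

# A PR branch with a first commit and two messy follow-ups, already pushed
git checkout -q -b my-pr
echo v1 > feature.txt
git add feature.txt
git commit -q -m "Add feature"
echo v2 > feature.txt
git commit -q -am "fix typo"
echo v3 > feature.txt
git commit -q -am "really fix typo"
git push -q -u origin my-pr

# The first commit of the PR is the oldest one that master doesn't have
first=$(git rev-list --reverse master..my-pr | head -n 1)

# --soft moves HEAD back but leaves all the later changes staged...
git reset -q --soft "$first"
# ...so amending folds them into that first commit
git commit -q --amend -m "Add feature"

# History was rewritten, so a plain push would be rejected as non-fast-forward
git push -q --force origin my-pr
```

If forcing makes you nervous, git push --force-with-lease is a more cautious variant: it refuses to overwrite the remote branch if someone else has pushed to it since you last fetched.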

(Force-pushing is normally a very bad idea, because git is usually rightfully trying to stop you from making a mistake when it rejects a push, but in this case changing the history of a short-lived PR branch is unimportant and makes things look cleaner overall.)

Note: This is NOT NECESSARY NORMALLY. Even if the maintainers prefer a "clean, linear history", without a lot of fixup commits showing up when they merge in the PR, they have that option on their side now! The "Squash and merge" option on a PR does this fixup for you - it squishes all of the PR's changes into a single commit, then commits it to the tip of the master branch.

This gives the best of both worlds - while you're working on a PR, you keep track of all your commits and can review past work or comments, but when you're done, all that messy detail gets eliminated and the project gains one nice commit.

I keep my "origin"-named remote as the upstream remote, which I typically wouldn't have push access to (otherwise, why fork?).

Then I create a new remote called "fork" which points to my fork remote.

I do think the word "upstream" is more specific, but even the word "origin" could be taken to mean "the original project that I forked".

It's probably actually better to just have "upstream" and "fork" remotes and no "origin" remote at all, to keep things nice and explicit.

This is precisely how we work and it seems to make perfect sense to me... now! :)

However, when we started, the terms 'origin' and 'upstream' seemed almost interchangeable ('origin' seeming to be what 'upstream' should be, and 'upstream' seeming like some generic term meaning 'not local' :))

Perhaps for new git users, 'master' for the main repo ('upstream'), and 'my-master' for the user's repo ('origin') might be better?

and yeah, I realise that you can call the repos whatever you want, but 'origin' and 'upstream' seem quite popular :)

If you maintain a policy of grooming commits before merge you can replace step five with "squash commits before merge". And as a bonus GitHub users can do this themselves when accepting a PR but as a nice contributor you can do this for them.

It kind of beats me why people get so prissy about "fix typo" commits. Most real-world cases of needing to see the difference between two points in time will involve multiple commits anyway, no matter how groomed they got — that's what git diff is for.

Suggesting that people should force push for that is like saying you should machine-gun the clutter off your desk. I mean it works, and you can do it safely. Just don't be surprised if things get out of hand after people have gotten into that kind of habit...

Nah, there are good reasons to be prissy about that. Mostly it's that it clogs up the history, which makes both git log and git blame less useful.
