Tab Completion

I'm Tab Atkins Jr, and I wear many hats. I work for Google on the Chrome browser as a Web Standards Hacker. I'm also a member of the CSS Working Group, and am either a member of or a contributor to several other working groups in the W3C.

Cleanly Handling a Fork on GitHub


One of the greatest things about GitHub is the ease with which you can fork a project and make your own changes.

One of the worst things about GitHub is the ease with which you can fork a project and then lose track of the changes upstream is making.

This is extra-true when you want to make occasional PRs for a project with regular commits; it's easy for your fork to fall so far behind that writing patches that'll merge cleanly later gets really annoying. This guide collects my personal experience in handling this situation with minimal pain.

One: Set an "upstream" remote

If you run git remote -v in your repo, you'll see the remote repositories that your local copy knows about, each under a short name you can fetch from and push to. By default, after cloning your fork, it'll only contain the GitHub URL for your fork, labeled "origin". You're gonna be interacting with the original repo you forked from a lot, so you want to give it an easy name, too:

git remote add upstream https://github.com/ORIGINAL_OWNER/ORIGINAL_REPOSITORY.git

Now you can easily refer to the original repo as upstream.
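If you want to watch step One happen without touching GitHub, here's a minimal runnable sketch. It assumes git is installed and uses throwaway bare repos in a temp directory as stand-ins for GitHub: upstream.git plays ORIGINAL_OWNER/ORIGINAL_REPOSITORY and fork.git plays your fork (both names are hypothetical). Against the real thing you'd use the GitHub URLs instead.

```shell
#!/bin/sh
set -e

# Scratch area; the bare repos below stand in for GitHub (assumption for the demo).
work=$(mktemp -d)
cd "$work"

# Throwaway identity so commits work even without a global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "seed" is where a hypothetical maintainer works; upstream.git is the original repo.
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed upstream.git

# Forking on GitHub is roughly a server-side copy of the original repo.
git clone -q --bare upstream.git fork.git

# Cloning your fork gives you exactly one remote: origin.
git clone -q fork.git local
cd local
git remote -v

# Step One: give the original repo an easy name.
git remote add upstream "$work/upstream.git"
git remote -v
```

The second git remote -v should now list both origin (your fork) and upstream (the original).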

Two: Regularly re-sync with upstream

Whenever you're about to start some new work in your fork, re-sync it with the upstream first, so you have all the new commits and are less likely to conflict badly:

git fetch upstream
git checkout master
git merge upstream/master
git push

This should cleanly merge and just do a "fast-forward" on your repo, rather than actually making a merge commit, as long as you faithfully do what I suggest in the next step.
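Here's a runnable sketch of that re-sync, again assuming throwaway local bare repos in place of GitHub (upstream.git for the original, fork.git for your fork; both hypothetical). It swaps plain git merge for git merge --ff-only, which makes git refuse, rather than create a merge commit, if your master has drifted from upstream. That's a cheap way to notice you've broken the rule in the next step.

```shell
#!/bin/sh
set -e

# Scratch setup: local bare repos stand in for GitHub (assumption for the demo).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git
git clone -q fork.git local
git -C local remote add upstream "$work/upstream.git"

# Meanwhile, the maintainer lands a new commit upstream (simulated).
git -C seed commit -q --allow-empty -m "upstream: new work"
git -C seed push -q "$work/upstream.git" master

# Step Two: re-sync your master with upstream, then update your fork.
cd local
git fetch -q upstream
git checkout -q master
git merge --ff-only upstream/master
git push -q
```

Afterwards your local master, your fork's master, and upstream's master all point at the same commit, and no merge commit was created.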

Three: NEVER COMMIT TO YOUR MASTER BRANCH

If you want a clean, easy workflow, you want to NEVER, EVER commit to your local master branch. The master branch's sole purpose is to track upstream exactly, so you can always cleanly work against the current version and write easy-to-merge PRs.

Instead, always make your changes in branches. When you push the branch to your remote, you can do a PR; when the PR is accepted, you can pull the newly-updated upstream and then delete your branch. master always remains a source of upstream truth, unpolluted by your personal code unless blessed by the upstream maintainers.
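A runnable sketch of that branch lifecycle, with the usual demo assumptions: throwaway local bare repos stand in for GitHub, my-cool-branch is a hypothetical branch name, and pushing the branch straight into upstream.git is only a stand-in for a maintainer accepting your PR (on GitHub that happens through the PR page, not from your machine).

```shell
#!/bin/sh
set -e

# Scratch setup: local bare repos stand in for GitHub (assumption for the demo).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git
git clone -q fork.git local
cd local
git remote add upstream "$work/upstream.git"

# Start new work on a branch cut from master; master itself is never committed to.
git checkout -q -b my-cool-branch master
echo "cool" > cool.txt
git add cool.txt
git commit -q -m "Add cool feature"

# Push the branch to your fork (and remember the tracking link); a PR is opened from here.
git push -q -u origin my-cool-branch

# Stand-in for the maintainer accepting the PR (hypothetical shortcut).
git push -q upstream my-cool-branch:master

# Pull the newly-updated upstream into master, then delete the branch everywhere.
git checkout -q master
git fetch -q upstream
git merge -q --ff-only upstream/master
git push -q origin master
git branch -d my-cool-branch
git push -q origin --delete my-cool-branch
```

At the end, master holds your feature only because upstream accepted it, and the topic branch is gone both locally and from your fork.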

(This is good advice in general: always make changes in branches and only merge them into master when you're done. I'm usually lazy and just commit straight to master for personal projects, but it's super-important to get this right for forks, or else you're in for a lot of pain.)

Four: Rebase your branches regularly

Say you're in the middle of writing a new proposed feature (in a branch, of course) and you notice that upstream has some new commits that touch code you're going to be modifying soon. You'd like to get that code into your branch now, before you start modifying things, so you don't have merge conflicts later. But how?

First, resync your master as stated in step 2. Then, rebase your topic branch on top of the new upstream code:

git checkout MY_COOL_BRANCH
git rebase master

This'll set your commits aside, move the branch to the tip of your freshly-synced master (which already has the new upstream stuff), then replay your commits on top of it, so when you eventually make a PR, the upstream maintainers will get a perfectly clean merge, with your commits sitting on top of their latest code.
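Here's a runnable sketch of that rebase, assuming throwaway local bare repos in place of GitHub; my-cool-branch and the two files are hypothetical. Upstream gains a commit after the branch was started, master is re-synced as in step 2, and the rebase leaves the branch's commit sitting directly on top of the new upstream tip.

```shell
#!/bin/sh
set -e

# Scratch setup: local bare repos stand in for GitHub (assumption for the demo).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git
git clone -q fork.git local
cd local
git remote add upstream "$work/upstream.git"

# Your topic branch, started from the old master.
git checkout -q -b my-cool-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add cool feature"

# Upstream moves on in the meantime (simulated maintainer commit).
echo "fix" > "$work/seed/upstream-fix.txt"
git -C "$work/seed" add upstream-fix.txt
git -C "$work/seed" commit -q -m "upstream: fix something"
git -C "$work/seed" push -q "$work/upstream.git" master

# Step 2: re-sync master.
git checkout -q master
git fetch -q upstream
git merge -q --ff-only upstream/master
git push -q

# Step 4: replay the topic branch on top of the fresh master.
git checkout -q my-cool-branch
git rebase -q master
```

After the rebase, the branch is exactly one commit ahead of master and its working tree contains both the upstream fix and your feature.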

Five: Force-push commits that just fix PR nitpicks

When you eventually do submit your PR, the review will probably catch some small mistakes you need to fix. You can just make those fixes and push them to the PR branch, but then when it's eventually merged there will be a lot of silly little "fix typo" commits scattered around.

Instead, just fix the commit itself, using the git commit --amend option, then force-push that to your remote with git push --force. Force-pushing is normally a very bad idea, because git is usually rightfully trying to stop you from making a mistake when it rejects a push, but in this case changing the history of a short-lived PR branch is unimportant and makes things look cleaner overall.
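A runnable sketch of the amend-and-force-push loop, assuming throwaway local bare repos in place of GitHub; the branch name, file, and typo are hypothetical. It swaps in git push --force-with-lease, a real git option that behaves like --force but refuses to overwrite the remote branch if it has moved since you last fetched it. That's a cheap extra guard even on a short-lived PR branch.

```shell
#!/bin/sh
set -e

# Scratch setup: local bare repos stand in for GitHub (assumption for the demo).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git
git clone -q fork.git local
cd local

# The PR branch as first submitted, typo included.
git checkout -q -b my-cool-branch
echo "helo, world" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
git push -q -u origin my-cool-branch

# Review catches the typo: fix it and fold the fix into the same commit.
echo "hello, world" > greeting.txt
git add greeting.txt
git commit -q --amend --no-edit

# A plain push would now be rejected as non-fast-forward, so force it.
git push -q --force-with-lease
```

The PR branch on your fork ends up with a single, corrected commit instead of "Add greeting" followed by "fix typo".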

Comments

I keep my "origin"-named remote as the upstream remote, which I typically wouldn't have push access to (otherwise, why fork?).

Then I create a new remote called "fork" which points to my fork.

I do think the word "upstream" is more specific, but even the word "origin" could be taken to mean "the original project that I forked".

It's probably actually better to just have "upstream" and "fork" remotes and no "origin" remote at all, to keep things nice and explicit.
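For reference, here's one way to get the explicit upstream/fork naming this comment suggests, as a runnable sketch: throwaway local bare repos stand in for GitHub, and upstream.git and fork.git are hypothetical names. git remote rename also rewrites the tracking config of existing branches, so master keeps pointing at your fork under its new name.

```shell
#!/bin/sh
set -e

# Scratch setup: local bare repos stand in for GitHub (assumption for the demo).
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git -c init.defaultBranch=master init -q seed
git -C seed commit -q --allow-empty -m "initial commit"
git clone -q --bare seed upstream.git
git clone -q --bare upstream.git fork.git

# Clone your fork, then drop the ambiguous "origin" name entirely.
git clone -q fork.git local
cd local
git remote rename origin fork
git remote add upstream "$work/upstream.git"
git fetch -q upstream
git remote -v
```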


This is precisely how we work and it seems to make perfect sense to me... now! :)

However, when we started, the terms 'origin' and 'upstream' seemed almost interchangeable ('origin' seeming to be what 'upstream' should be, and 'upstream' seeming like some generic term meaning 'not local' :))

Perhaps, for new git users, calling the main repo 'master' (instead of 'upstream') and the user's repo 'my-master' (instead of 'origin') might be better?


Re #2: and yeah, I realise that you can call the repos whatever you want, but 'origin' and 'upstream' seem quite popular :)


It kind of beats me why people get so prissy about "fix typo" commits. Most real-world cases of needing to see the difference between two points in time will involve multiple commits anyway, no matter how groomed they got — that's what git diff is for.

Suggesting that people should force push for that is like saying you should machine-gun the clutter off your desk. I mean it works, and you can do it safely. Just don't be surprised if things get out of hand after people have gotten into that kind of habit...
