hg pull --rebase considered harmful

Until recently, I used git almost exclusively as my VCS. At my new job, we use Mercurial, which had (and still has) a steep learning curve for me. This article describes a pitfall with hg pull --rebase that I’ve fallen into twice now and that can cause you to accidentally merge branches.

Demonstration

I’ll demonstrate the problem with a little demo repository. This is the state of the repository in the beginning. We have a default branch and a feature branch named FeatureBranch. We’re currently on the default branch and our last commit has not been pushed yet [1]:

> hg log -G --template "{rev} [{phase}] Branch: {branch}\n    {desc}\n\n" --pager never
@  3 [draft] Branch: default
|      Work on default
|
| o  2 [public] Branch: FeatureBranch
| |      Work on Feature
| |
| o  1 [public] Branch: FeatureBranch
|/       Create FeatureBranch
|
o  0 [public] Branch: default
       Initial commit

Now we want to continue to work on FeatureBranch. Since FeatureBranch and default have diverged a bit, I want to “refresh” FeatureBranch from default. In git, I would now do git checkout FeatureBranch; git rebase default. You can’t do this easily in Mercurial, since you can’t rebase public commits. According to my company’s best practices, in Mercurial you do this via a merge from default:

> hg checkout FeatureBranch
1 files updated, 0 files merged, 1 files removed, 0 files unresolved
> hg merge default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
> hg commit -m "Merge from default"

This is what the repository looks like now. Note that we have merged from commit 3, which is still in the draft phase, i.e., has not been pushed yet. One might consider this to be “unclean”, but it’s certainly something that can happen.

@    4 [draft] Branch: FeatureBranch
|\       Merge from default
| |
| o  3 [draft] Branch: default
| |      Work on default
| |
o |  2 [public] Branch: FeatureBranch
| |      Work on Feature
| |
o |  1 [public] Branch: FeatureBranch
|/       Create FeatureBranch
|
o  0 [public] Branch: default
       Initial commit

Now I might or might not do some more work on FeatureBranch, but at some point something important comes in and I need to switch back to default again. As usual after a branch switch, I look at what I’m working with here:

> hg checkout default
0 files updated, 0 files merged, 1 files removed, 0 files unresolved
> hg log -b default --pager never --template "{rev} [{phase}] Branch: {branch}\n    {desc}\n\n"
3 [draft] Branch: default
    Work on default

0 [public] Branch: default
    Initial commit

Note that you don’t see the merge commit from before. That’s because I’m now only looking at the branch I’m currently on. That’s in fact my usual workflow, since we have a lot of branches and hg log without a -b filter is not very useful.

At this point I notice: Huh, forgot to push my last commit. Better do that now!

> hg push
pushing to [upstream]
searching for changes
remote has heads on branch 'default' that are not known locally: 26aa8362906c
abort: push creates new remote head 7128aebbe5b6
(pull and merge or see 'hg help push' for details about pushing new heads)

Well, everybody else is also working and modified default in the meantime. I need to pull-rebase-push.

There are multiple ways of doing that, and this is the crucial point that triggers the pitfall. See below for how to do this correctly. The method I was taught as “best practice” is hg pull --rebase. This is also what TortoiseHg Workbench does by default if you click the “Pull incoming changes” button. So here we go:

> hg pull --rebase
pulling from [upstream]
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files (+1 heads)
new changesets 26aa8362906c
rebasing 3:3b8bb2a8fb14 "Work on default"
rebasing 4:f1d64eab9587 "Merge from default"
saved backup bundle to [some path]

Note that the merge commit from earlier (commit 4) has also been rebased! This makes absolute sense, since 4 sits on top of 3, and 3 is what we want to rebase.

This is what the repository looks like now:

o    5 [draft] Branch: default
|\       Merge from default
| |
| @  4 [draft] Branch: default
| |      Work on default
| |
| o  3 [public] Branch: default
| |      Commit by some other author
| |
o |  2 [public] Branch: FeatureBranch
| |      Work on Feature
| |
o |  1 [public] Branch: FeatureBranch
|/       Create FeatureBranch
|
o  0 [public] Branch: default
       Initial commit

The new commit 3 has been pulled and 4 and 5 (which previously were 3 and 4) have been rebased on top of it. Do you see the problem? The merge commit 5 has changed its branch to default! Note that back when we did hg pull --rebase, we didn’t necessarily see that merge commit in our log, because it was on a different branch then. So unless you happen to remember that there was a merge commit on top of the current commit in your default branch, you have very little chance of spotting this mistake.
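One possible way to spot such a commit before pulling is to ask Mercurial for draft changesets that sit on top of your unpublished work but belong to another named branch. The following is only a sketch: the function name foreign_branch_drafts is made up for illustration, and the revset approximates what a rebase would move rather than reproducing Mercurial’s own selection.

```shell
# List draft changesets that descend from the draft ancestors of the working
# directory parent (".") but live on a different named branch.
# Assumption: this approximates the set a rebase would carry along; it is
# not the exact selection hg pull --rebase computes internally.
foreign_branch_drafts() {
    hg log \
        -r 'descendants(draft() and ancestors(.)) and draft() and not branch(.)' \
        --template '{rev} [{phase}] Branch: {branch}\n    {desc|firstline}\n\n'
}
```

Run foreign_branch_drafts while on default, before hg pull --rebase. In the demo repository it should list the merge commit 4 on FeatureBranch; empty output suggests that nothing from another branch would be dragged along.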

All we see right now is a successful hg pull --rebase. What comes next? hg push, of course! And once you do that, you have merged and pushed the whole FeatureBranch into default.
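A cheap safeguard at this point might be to look for merge commits among the changesets that are about to be pushed. This is a minimal sketch, assuming the default push target is the remote in question; outgoing_merges is a hypothetical helper name, while outgoing() and merge() are revset predicates.

```shell
# Show merge commits that are not yet in the default push target.
# Assumption: the default push path is the repository you are about to push to.
outgoing_merges() {
    hg log \
        -r 'outgoing() and merge()' \
        --template '{rev} [{phase}] Branch: {branch}\n    {desc|firstline}\n\n'
}
```

In the demo repository after the rebase, this should show commit 5, a merge on default with the description “Merge from default”, which is a hint that something went wrong.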

To make matters worse, merge commits cannot be backed out, so it’s not easy to clean up the mess. See the Cleanup section below for the best cleanup solution I have found.

Avoiding the Problem

The cause of the mess is that hg rebase (which is what hg pull --rebase uses) changes the branch of all commits it rebases to the current branch. As far as I can tell, there is no way of changing this when using hg pull --rebase. However, when using hg rebase, one can pass the --keepbranches argument:

--keepbranches    keep original branch names

So what you probably want to do (assuming a clean working directory) is this:

  1. Starting from your current branch’s head (let that revision be HEAD), determine the lowest ancestor that’s still in draft phase (if any). Note that revision as BASE.
  2. Do hg pull --up. This pulls and updates to the branch’s new head pulled from the remote repository. Let that new head’s revision be REMOTEHEAD.
  3. Do hg rebase --keepbranches -s BASE -d REMOTEHEAD.
  4. Do hg up ~HEAD, where ~HEAD stands for the revision that HEAD has become after the rebase.

This should do approximately what hg pull --rebase does, but preserves branches.
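The four steps could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested replacement for hg pull --rebase: it assumes a clean working directory, an enabled rebase extension, and a publishing remote, so that pulled changesets arrive in the public phase. The function name pull_rebase_keepbranches and its variables are made up. Instead of relying on where the update leaves the working directory, it looks up REMOTEHEAD as the newest public head of the current branch, and for the last step it simply updates to the branch name.

```shell
# Pull, then rebase unpublished work while keeping branch names.
pull_rebase_keepbranches() {
    # Remember the branch we are on so we can return to its head at the end.
    prk_branch=$(hg branch) || return 1

    # Step 1: lowest draft ancestor of the working directory parent (BASE).
    prk_base=$(hg log -r 'min(draft() and ancestors(.))' --template '{node}') || return 1

    # Step 2: pull and update.
    hg pull --update || return 1

    # Assumption: pulled changesets are public, so the newest public head of
    # this branch is taken as REMOTEHEAD.
    prk_remotehead=$(hg log -r 'max(heads(public() and branch(.)))' --template '{node}') || return 1

    # No unpublished work below us: the pull alone was enough.
    if [ -z "$prk_base" ]; then
        return 0
    fi

    # Step 3: move BASE and its descendants, preserving branch names.
    # Stops here if the rebase fails or hg reports nothing to rebase.
    hg rebase --keepbranches -s "$prk_base" -d "$prk_remotehead" || return 1

    # Step 4: update to the head of the branch we started on.
    hg update "$prk_branch"
}
```

With --keepbranches, a merge commit like the one in the demo would still be rebased along with its parent, but it should stay on FeatureBranch instead of moving to default.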

Cleaning Up the Mess

Okay, so you found this post after you did hg pull --rebase ; hg push and now you have the same mess that I faced. You now know what the mistake was, but that doesn’t help you much, right?

Here is the best solution I know to resolve that problem. First, this is again the “messy” state that our repository is currently in:

o    5 [public] Branch: default
|\       Merge from default
| |
| @  4 [public] Branch: default
| |      Work on default
| |
| o  3 [public] Branch: default
| |      Commit by some other author
| |
o |  2 [public] Branch: FeatureBranch
| |      Work on Feature
| |
o |  1 [public] Branch: FeatureBranch
|/       Create FeatureBranch
|
o  0 [public] Branch: default
       Initial commit

The offending merge commit 5 is now public, merging the whole FeatureBranch into default. In git, I would now do this:

> git checkout default
> git reset --hard HEAD^
> git push --force

This would change the head of the default branch to the commit before the merge in the remote repository. After that, I would write a mail to all my colleagues apologizing and telling them they need to pull. ;-)

However, that’s not possible in Mercurial, since you can’t move a head “backwards” via force-push. You cannot even push a second head (at least the way our repository is configured). Thus, we must first get rid of the current head (the merge commit) and then create a new, “clean” head.

Getting rid of the head works by closing the branch. It might sound a bit scary to close your default branch, but it can easily be re-opened, and in fact that’s what we do in the next step: We re-open the default branch with a new head that branches off before the merge commit.

So, first we update to the current head, then we close it:

> hg update 5
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
> hg commit --close-branch -m "Closing to clean up accidential merge"
> hg log -G --template "{rev} [{phase}] Branch: {branch}\n    {desc}\n\n" --pager never
@  6 [draft] Branch: default
|      Closing to clean up accidential merge
|
o    5 [public] Branch: default
|\       Merge from default
| |
| o  4 [public] Branch: default
| |      Work on default
| |
| o  3 [public] Branch: default
| |      Commit by some other author
| |
o |  2 [public] Branch: FeatureBranch
| |      Work on Feature
| |
o |  1 [public] Branch: FeatureBranch
|/       Create FeatureBranch
|
o  0 [public] Branch: default
       Initial commit

Looking good so far. Since our remote repository rejects pushing a second head, we need to push now, before we re-open the branch at the correct position. Otherwise our push would be rejected, since the push of the second head would not happen “after” closing the first one.

> hg push
pushing to [upstream]
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 0 changes to 0 files (-1 heads)

Now we can re-open the branch starting from 4 again.

> hg update 4
0 files updated, 0 files merged, 1 files removed, 0 files unresolved
> hg commit -m "Re-open the branch before the accidential commit" --config ui.allowemptycommit=1
> hg log -G --template "{rev} [{phase}] Branch: {branch}\n    {desc}\n\n" --pager never
@  7 [draft] Branch: default
|      Re-open the branch before the accidential commit
|
| _  6 [public] Branch: default
| |      Closing to clean up accidential merge
| |
| o  5 [public] Branch: default
|/|      Merge from default
| |
o |  4 [public] Branch: default
| |      Work on default
| |
o |  3 [public] Branch: default
| |      Commit by some other author
| |
| o  2 [public] Branch: FeatureBranch
| |      Work on Feature
| |
| o  1 [public] Branch: FeatureBranch
|/       Create FeatureBranch
|
o  0 [public] Branch: default
       Initial commit

And we’re done locally. Our new head of default, commit 7, does not have the merge commit 5 as an ancestor. We have successfully un-merged the FeatureBranch from default. Note that the --config ui.allowemptycommit=1 is only necessary since we did not have any actual change to commit. By default, Mercurial does not allow empty commits.
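If you would rather check the ancestry than read it off the graph, a revset can do it. This is a small sketch; not_ancestor_of is a hypothetical helper name, and the revision numbers in the usage line are the ones from the demo repository.

```shell
# Succeeds (exit status 0) if revision $1 is NOT an ancestor of revision $2.
# Returns 2 if hg itself fails, for example on an unknown revision.
not_ancestor_of() {
    na_found=$(hg log -r "($1) and ancestors($2)" --template '{rev}\n') || return 2
    # Empty output means $1 is not among the ancestors of $2.
    [ -z "$na_found" ]
}
```

In the demo repository, not_ancestor_of 5 7 && echo "default is clean again" should print the message, since the merge commit 5 is no longer reachable from the new head 7.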

Finally, let’s push this into the remote. Since this now also creates a new remote head (but not a second non-closed remote head), a force-push is necessary:

> hg push
pushing to [upstream]
searching for changes
abort: push creates new remote head dea12e73c486
(merge or see 'hg help push' for details about pushing new heads)
> hg push --force
pushing to [upstream]
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 0 changes to 0 files (+1 heads)

Of course, the repositories on your colleagues’ computers won’t reflect this change instantly. It’s best to write them a message telling them about the correct head of default, onto which they now need to rebase their unpublished commits.


  1. I always state the --template and --pager arguments explicitly here; usually you would add these to your config.

Lukas Barth
Algorithm Engineer