When to Merge, When to Rebase in Git

I've worked with various version control systems, including SVN, Git, Mercurial (Hg), and Perforce, all at the command-line level except SVN, which I used more than 10 years ago.

To me, Git is the most flexible of them and aligns most closely with the programming workflow, especially in merging and branching. Best of all, these operations are cheap and fast.

With great power comes the need to understand the tools at hand. In a mid-to-large code base with several developers working on it, there will come a time when a merge introduces something that needs attention to make the history look right, and better.

Two options usually come to mind: merge or rebase.

In this post, I would like to help you decide when to merge and when to rebase.



What is Merge, and What is Rebase?

Let's start with the common perception most people have when we talk about merge or rebase.

A merge can introduce a merge commit, while a rebase won't; instead, a rebase can modify the history.

At a glance, rebase looks always better because it doesn't introduce a merge commit, but it is easy to overlook the consequences that modified git history can entail. We cannot simply say which one is better, only which one is appropriate for a given situation.

Let's dig into this step by step.

When Merge won't introduce a merge commit

Merge won't introduce a merge commit when it can do a fast-forward.

What is Fast-Forward?

When we run git merge, a fast-forward is possible if the tip of your current branch (HEAD) is an ancestor of the tip of the branch you are merging into it. In that case git simply moves your branch pointer forward to that tip, and no new commit is needed.

Imagine the following git scenario:

[Figure: Initial state of the git repository used in the scenarios throughout this article.]

As you can see, the tip of master is an ancestor of feature_a, so merging feature_a into master with "git merge feature_a" while we are on master can be a fast-forward.

[Figure: Result after "git merge feature_a". Fast-forward merge gives a linear history.]


If we do this on the real git command line, we see something like this:

[Figure: Real test. Look at the history. It is linear.]

Notice the "Fast-forward" message in the output after we execute "git merge feature_a". It matches the diagram above.
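To try the fast-forward case yourself, here is a minimal sketch that rebuilds the scenario in a throwaway repository. It assumes a POSIX shell and git on your PATH; the `commit` helper and the feature_a commit messages are hypothetical stand-ins for the commits in the figures, and the exported variables only give the demo a dummy identity and keep your own git config out of the way.

```shell
#!/bin/sh
set -eu
# Dummy identity and no personal/system config, so the demo behaves the same everywhere.
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit "<message>" adds one new file named after the message.
commit() {
  f=$(printf '%s' "$1" | sed 's/[^A-Za-z0-9]/_/g').txt
  printf '%s\n' "$1" > "$f"
  git add "$f"
  git commit -q -m "$1"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the branch "master" whatever init.defaultBranch says

commit "1st commit"
commit "2nd commit"

git checkout -q -b feature_a              # feature_a starts at "2nd commit"
commit "feature_a 1st commit"
commit "feature_a 2nd commit"

git checkout -q master
git merge feature_a                       # the output includes "Fast-forward"
git log --oneline --graph --all           # one straight line, no merge commit
```

After the merge, master and feature_a point at the same commit: git only moved the master pointer forward.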


Now, what happens when a fast-forward is not possible?

Let's say a 3rd commit is added to master. The tip of master is no longer an ancestor of feature_a; the two branches have diverged, so a fast-forward is not possible.

[Figure: The branches have diverged. The tip of master is no longer an ancestor of feature_a.]

Now when we run "git merge feature_a" while on master, we get the following result.

[Figure: Result after "git merge feature_a" when the branches have diverged. A merge commit is created to connect the two branches.]

Compare this with the real git command line:

[Figure: No "Fast-forward" message is shown. Notice the additional merge commit in the history.]

Take a look at the commit graph.

[Figure: Merge commit's main job is to connect diverged branches together.]

Each * in the console output marks an individual commit. The merge commit is there to connect the two diverged branches.
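The diverged case can also be rebuilt in a throwaway repository. This sketch assumes a POSIX shell and git on your PATH, with a hypothetical `commit` helper and made-up feature_a commit messages; the key step is the extra "3rd commit" on master before the merge.

```shell
#!/bin/sh
set -eu
# Dummy identity and no personal/system config, so the demo behaves the same everywhere.
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit "<message>" adds one new file named after the message.
commit() {
  f=$(printf '%s' "$1" | sed 's/[^A-Za-z0-9]/_/g').txt
  printf '%s\n' "$1" > "$f"
  git add "$f"
  git commit -q -m "$1"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

commit "1st commit"
commit "2nd commit"
git checkout -q -b feature_a
commit "feature_a 1st commit"

git checkout -q master
commit "3rd commit"                 # master moves on, so the branches diverge

git merge feature_a                 # no "Fast-forward"; git creates a merge commit
git log --oneline --graph --all     # the merge commit joins the two lines of history
```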


How to check if Fast-forward is possible?

We can check whether a fast-forward would be possible with the following command:

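git merge-base --is-ancestor <current-branch> <branch-to-merge>

git merge-base is a git command to find the best common ancestors between two commits. With --is-ancestor, it instead checks whether the first commit is an ancestor of the second, exiting with status 0 if it is and 1 if it is not.

Execute the command, substituting <current-branch> with master and <branch-to-merge> with feature_a.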

We would have

git merge-base --is-ancestor master feature_a && echo "fast-forward is possible" || echo "fast-forward is not possible"

Here is the result

[Figure: We can pass an arbitrary commit hash to git merge-base.]

As shown, if we specify the commit before master's tip, which is the "2nd commit", the check passes: a branch at that commit could be fast-forwarded to feature_a.
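Here is a small sketch of the check against the diverged state described above, wrapped in a hypothetical `can_ff` function. It assumes a POSIX shell and git on your PATH and rebuilds the scenario in a throwaway repository with a hypothetical `commit` helper; master~1 refers to the "2nd commit".

```shell
#!/bin/sh
set -eu
# Dummy identity and no personal/system config, so the demo behaves the same everywhere.
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit "<message>" adds one new file named after the message.
commit() {
  f=$(printf '%s' "$1" | sed 's/[^A-Za-z0-9]/_/g').txt
  printf '%s\n' "$1" > "$f"
  git add "$f"
  git commit -q -m "$1"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

commit "1st commit"
commit "2nd commit"
git checkout -q -b feature_a
commit "feature_a 1st commit"
git checkout -q master
commit "3rd commit"

# Hypothetical wrapper: can_ff <current-branch> <branch-to-merge>
can_ff() {
  if git merge-base --is-ancestor "$1" "$2"; then
    echo "fast-forward is possible"
  else
    echo "fast-forward is not possible"
  fi
}

can_ff master feature_a     # not possible: "3rd commit" is not part of feature_a
can_ff master~1 feature_a   # possible: "2nd commit" is an ancestor of feature_a
```

Note that the else branch also catches real errors from git merge-base (any exit status other than 0 or 1, such as a misspelled branch name), so treat "not possible" with some care in scripts.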

I'm not suggesting that you check with git merge-base before every merge to make the decision. We can use common sense here.

If we know a feature branch was branched off master quite a long time ago, there is a very high chance that it cannot be fast-forwarded, because more commits have been added to master since then. Merging it will most likely add a merge commit to connect the two diverged branches.


We don't only merge from a feature branch into master; the reverse direction is common as well.

Yes. While working on a feature branch, we often need to merge in changes from the upstream branch we branched off, to bring updates such as bug fixes and changes from main development into our feature branch.

Usually we don't care much about what exactly we merge from upstream master into a feature branch. The feature branch just needs those changes to be up to date, which may be required before a PR can be merged on GitHub.

GitHub can enforce this with branch protection, but outside of GitHub, with plain git, we can simply merge directly and resolve any conflicts.
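Merging in the reverse direction is the same command, run from the feature branch. A minimal sketch, assuming a POSIX shell and git on your PATH; the `commit` helper and the "3rd commit (bug fix)" standing in for an upstream fix are hypothetical.

```shell
#!/bin/sh
set -eu
# Dummy identity and no personal/system config, so the demo behaves the same everywhere.
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit "<message>" adds one new file named after the message.
commit() {
  f=$(printf '%s' "$1" | sed 's/[^A-Za-z0-9]/_/g').txt
  printf '%s\n' "$1" > "$f"
  git add "$f"
  git commit -q -m "$1"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

commit "1st commit"
commit "2nd commit"
git checkout -q -b feature_a
commit "feature_a 1st commit"

git checkout -q master
commit "3rd commit (bug fix)"       # hypothetical upstream fix we want on feature_a

git checkout -q feature_a
git merge master                    # bring upstream changes into the feature branch
# On conflicts: edit the files, "git add" them, then "git commit" to finish the merge.
git log --oneline --graph --all
```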

This inverse direction of merging leads us to rebasing.

What is Rebase?

For what it's worth, what most people know about rebase is that it doesn't add a merge commit. Oh yeah! ...but with a twist.

It can modify the git history.

Read the line above again, and keep it in mind. Modifying history is dangerous when several developers work on the same branch, but harmless when only one person works on it; then you can make the git history as clean as you like.

Actually, even when others are left with an old snapshot of the modified history, the situation can be managed. Keep reading.

So the real meaning of rebase ...

Rebase serves a similar purpose to merge but behaves differently. It replays the commits of your branch on top of the commits of the target branch. In short, your branch now starts from the tip of the target branch, and your commits are re-applied on top as new commits.

Let's say we have the following scenario

[Figure: Initial state before running "git rebase master" while on feature_a. Notice the additional commits on master.]


More development has happened on master, as seen from the two additional commits (4th and 5th). Of course, the two branches have diverged. Imagine we are on feature_a and need to prepare a GitHub PR, or just want to get upstream updates from master.

While on feature_a, execute the following command to rebase feature_a onto master.

git rebase master

The result for feature_a will be

[Figure: Result after rebasing. Notice that the 3rd, 4th, and 5th commits are placed before the commits of feature_a. * marks a commit whose hash has changed.]

Compare this with the real git command line:

[Figure: Notice that the commit hashes inside the red rectangle have changed from the original state.]

So we are left with one remaining question before we can use rebase with confidence: when is commit history modified? To answer it, we need to know how a commit hash is computed.

When does a commit hash change?

A commit hash is computed from the commit object, which records the snapshot of the project (its tree), the hash of its parent commit(s), and meta information such as the author, the committer, their timestamps, and the commit message (title and body). Because the parent's hash is part of that input, if a parent commit's hash changes, the commit's own hash must be recomputed too, and so on for every commit after it.
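You can check this directly with git's plumbing commands. The sketch below, assuming a POSIX shell and git on your PATH (the `commit` helper is hypothetical), prints the raw commit object, hashes it again with git hash-object to get the same hash, and then shows that changing only the parent line gives a different hash. The altered object is only hashed, never written to the repository.

```shell
#!/bin/sh
set -eu
# Dummy identity and no personal/system config, so the demo behaves the same everywhere.
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1 GIT_EDITOR=true
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: commit "<message>" adds one new file named after the message.
commit() {
  f=$(printf '%s' "$1" | sed 's/[^A-Za-z0-9]/_/g').txt
  printf '%s\n' "$1" > "$f"
  git add "$f"
  git commit -q -m "$1"
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master

commit "1st commit"
commit "2nd commit"

# Raw commit object: tree (the snapshot), parent, author, committer (with timestamps), message.
git cat-file commit HEAD

# Hashing that exact content as a "commit" object reproduces the commit's hash.
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
echo "git rev-parse HEAD:  $(git rev-parse HEAD)"
echo "recomputed:          $recomputed"

# Change only the parent line (here pointing it at HEAD itself) and the hash changes too.
reparented=$(git cat-file commit HEAD |
  sed "s/^parent .*/parent $(git rev-parse HEAD)/" |
  git hash-object -t commit --stdin)
echo "with another parent: $reparented"
```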

Side-track: see this proof-of-concept, which reproduces the exact same commit hash from known inputs.

If a commit hash has changed, we know that something has modified git history.

[Figure: Only the yellow commits on the left get new commit hashes, because their parent commit has changed.]


So is it pointless to try to avoid changing git history if we need to rebase?

Yes. I would say that if we need to rebase, for a clean history and for semantically correct behavior when pulling changes from upstream into the feature branch, there is no need to worry about whether the operation will change git history. Whatever the result, deciding to rebase means we commit to it and accept the consequence that history may change.

How to recover from changed git history while we are on an old snapshot

As a role play, imagine we are another developer whose local branch is older. We are about to pull down the latest changes, in which someone else has modified the git history. When we run "git pull", we are faced with the following message.

[Figure: We are on an old snapshot. The upstream git history has changed and we are about to pull it down.]

Git offers us three different strategies to reconcile the branches:

  1. Normal merge - "git pull --no-rebase"
  2. Rebase merge - "git pull --rebase"
  3. Fast-forward merge - "git pull --ff-only"

Of course, specifying a strategy doesn't mean git will carry it out successfully. Git only succeeds if that strategy is possible in the current situation; otherwise it refuses, or stops so that we can resolve the problem.
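Instead of passing a flag on every pull, each strategy can be set as the default for a plain "git pull"; recent git versions suggest these same settings when a pull finds diverged branches. A minimal sketch in a throwaway repository, assuming a POSIX shell and git on your PATH (add --global to apply a setting to all your repositories):

```shell
#!/bin/sh
set -eu
export GIT_CONFIG_GLOBAL=/dev/null GIT_CONFIG_NOSYSTEM=1   # keep your real config untouched
cd "$(mktemp -d)"
git init -q

# Pick ONE default for a plain "git pull" in this repository:
git config pull.rebase true        # like "git pull --rebase"    (option 2)
# git config pull.rebase false     # like "git pull --no-rebase" (option 1)
# git config pull.ff only          # like "git pull --ff-only"   (option 3)

git config --get pull.rebase       # prints "true"
```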

Scenario

Let's say we modified "4th commit" and "5th commit" via an interactive rebase with "git rebase -i <3rd commit>", marking both commits to be edited, and then force-pushed to the remote. Anyone who still has the older snapshot will likely face this issue.

Option 3 will most likely fail because the two branches have diverged.

If we proceed with option 1, "git pull --no-rebase", we end up with two copies of each commit whose hash has changed. The following is the resulting state:

[Figure: Result of "git pull --no-rebase". Notice the merge commit and the duplicates of the commits that were changed.]


[Figure: The 1st to 5th commits come from the local branch; the other 4th and 5th commits come from the upstream branch.]

The best option to recover from an old snapshot is option 2. It performs similarly to the git rebase we described, but when replaying the local commits on top, it discards the old versions of the commits that upstream has rewritten. That is exactly the desired outcome.

[Figure: "git pull --rebase" effectively discards old commits.]


[Figure: Notice there are no duplicated commits in the history. Everything is updated to match upstream.]


Summary of When to Merge and When to Rebase

So we have seen the consequences of git merge and git rebase. Their semantics differ, although both aim at the same goal of integrating changes from one branch into another.

Do Merge

  • when you don't want to change git history
  • when merging a feature branch into master, as the merge commit explicitly marks the point of merging and gives you a chance to describe it in its commit message (use git merge --no-ff to get that merge commit even when a fast-forward is possible)

Do Rebase

  • when bringing upstream changes into a feature branch, e.g. updating a feature branch with changes from master
  • when the team can manage the old snapshot problem
  • when preparing a clean PR via interactive rebase
  • when recovering from the old snapshot problem (git pull --rebase)

Until the next article. Happy programming!



I work at IronSoftware, mainly responsible for the low-level tech stack (C++).

If you think our solutions and offerings can help your business, please feel free to connect. I can get you to the right person to handle your inquiry.

More articles by Wasin Thonkaew