When to Merge, When to Rebase in Git
I've worked with various version control systems, ranging from SVN and Git to Mercurial (Hg) and Perforce, all at the command-line level except SVN, which dates back more than 10 years.
To me, Git is the most flexible of them and the most closely aligned with the way programmers actually work, especially in merging and branching. Best of all, those operations are cheap and fast to complete.
With great power comes the responsibility to understand the tools at hand. In a mid-to-large code base with several developers working on it together, there will come a time when merging introduces something that needs attention, to make the history look right, or better.
Two options usually come to mind: merge or rebase.
In this post, I would like to make it easier for you to decide when to merge and when to rebase.
What is Merge, and What is Rebase?
Let's start with the common perception most people have when we talk about merge or rebase.
Merge can introduce a merge commit; rebase won't, but instead it can modify the history.
At a glance, rebase looks always better because it doesn't introduce a merge commit, but that view overlooks the consequences a modified git history can entail. We cannot say outright which one is better. We can only say which one is appropriate in a given situation.
Let's dig into this step by step.
When Merge won't introduce a merge commit
Merge won't introduce a merge commit when it can do a fast-forward.
What is Fast-Forward?
When we run git merge, a fast-forward is possible if the tip (HEAD) of your current branch is an ancestor of the tip of the branch being merged in. In that case git simply moves your branch pointer forward to that tip, and no new commit is created.
Imagine the following git scenario: master has two commits, and feature_a branched off from the tip of master and has moved ahead with commits of its own.
Here, the tip of master is an ancestor of feature_a, so merging feature_a into master with "git merge feature_a" while on master can be done as a fast-forward.
If we do this on the real git command line, we see something like this.
Notice the "Fast-forward" message in the output of "git merge feature_a", which matches the diagram given before it.
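To try this yourself, here is a minimal sketch that rebuilds the scenario in a throwaway repository. It assumes a POSIX shell with git installed; the commit helper function and the feature_a commit messages are hypothetical names chosen for illustration.

```shell
set -eu
# Throwaway identity and directory, so nothing touches your real repositories
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
# Name the initial branch master regardless of local defaults
git -c init.defaultBranch=master init -q

# Hypothetical helper: each commit adds its own file, so branches never conflict
commit() { echo "$1" > "$1.txt"; git add -- "$1.txt"; git commit -q -m "$1"; }

commit "1st commit"
commit "2nd commit"

# feature_a branches off the tip of master and moves ahead
git checkout -q -b feature_a
commit "feature_a 1st commit"
commit "feature_a 2nd commit"

# master has not moved, so its tip is still an ancestor of feature_a
git checkout -q master
git merge feature_a    # output includes "Fast-forward"; no merge commit is created
git log --oneline --graph --all
```

After the merge, master and feature_a point at the same commit.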
Now, what happens when a fast-forward is not possible?
Let's say master gets a 3rd commit. The tip of master is no longer an ancestor of feature_a: the two branches have diverged, so a fast-forward is not possible.
Now when we run "git merge feature_a" while on master, we get the following result.
Compare this with what we see when we interact with the real git command line.
Take a look at the commit graph.
Each * in the console output marks an individual commit. The merge commit connects the two diverged branches; its two parents are the previous tip of master and the tip of feature_a.
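A similar sketch for the diverged case, under the same assumptions (throwaway repository, hypothetical commit helper and commit messages). Here master gains a 3rd commit before the merge, so git has to record a merge commit with two parents.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
# Hypothetical helper: each commit adds its own file, so branches never conflict
commit() { echo "$1" > "$1.txt"; git add -- "$1.txt"; git commit -q -m "$1"; }

commit "1st commit"
commit "2nd commit"
git checkout -q -b feature_a
commit "feature_a 1st commit"

# master moves on: its tip is no longer an ancestor of feature_a
git checkout -q master
commit "3rd commit"

# No fast-forward possible, so git creates a merge commit (--no-edit skips the editor)
git merge --no-edit feature_a
git log --oneline --graph --all
```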
How to check whether fast-forward is possible?
We can check whether a fast-forward would be possible with the following command
git merge-base --is-ancestor <current-branch> <branch-to-merge>
git merge-base is a git command for finding the best common ancestor(s) of two commits. With --is-ancestor, it instead checks whether the first commit is an ancestor of the second, and exits with status 0 if it is and 1 if it is not.
Execute the command, substituting <current-branch> with master (the branch we are on) and <branch-to-merge> with feature_a.
We would have
git merge-base --is-ancestor master feature_a && echo "fast-forward is possible" || echo "fast-forward is not possible"
Here is the result
As seen, the tip of master (the 3rd commit) is not an ancestor of feature_a. But if we specify the commit before the tip, the "2nd commit", the check succeeds: that would be a possible point for a fast-forward.
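The check can be reproduced with a sketch like the one below, again in a throwaway repository with the hypothetical commit helper. It runs the check against the tip of master and against master~1, which is the "2nd commit" in this setup.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
# Hypothetical helper: each commit adds its own file
commit() { echo "$1" > "$1.txt"; git add -- "$1.txt"; git commit -q -m "$1"; }

commit "1st commit"
commit "2nd commit"
git checkout -q -b feature_a
commit "feature_a 1st commit"
git checkout -q master
commit "3rd commit"

# Tip of master (3rd commit): prints "fast-forward is not possible"
git merge-base --is-ancestor master feature_a && echo "fast-forward is possible" || echo "fast-forward is not possible"

# One commit before the tip (2nd commit): prints "fast-forward is possible"
git merge-base --is-ancestor master~1 feature_a && echo "fast-forward is possible" || echo "fast-forward is not possible"
```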
That said, I don't suggest running git merge-base before every merge to decide what to do. We can use common sense here.
If a feature branch branched off from master quite a long time ago, it most likely cannot be fast-forwarded: more commits have landed on master since then, so merging will almost certainly add a merge commit to connect the two diverged branches.
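To see how far a long-lived branch has drifted, git can show where the branches split and count the commits on each side since then. This sketch is an addition for illustration, not something the workflow above requires; the repository layout and commit helper are hypothetical, mirroring the earlier examples.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
# Hypothetical helper: each commit adds its own file
commit() { echo "$1" > "$1.txt"; git add -- "$1.txt"; git commit -q -m "$1"; }

commit "1st commit"
commit "2nd commit"
git checkout -q -b feature_a
commit "feature_a 1st commit"

# master keeps moving while feature_a sits still
git checkout -q master
commit "3rd commit"
commit "4th commit"
commit "5th commit"

# The commit where the branches split (here: the 2nd commit)
git merge-base master feature_a

# Commits only on master, a tab, then commits only on feature_a: prints 3 and 1
git rev-list --left-right --count master...feature_a
```

A non-zero left-hand number means master has moved on, so merging feature_a into master cannot fast-forward.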
We don't only merge feature branches into master; the reverse happens as well.
Yes. While working on a feature branch, we often need to merge in changes from the upstream branch we branched off from, to pick up updates such as bug fixes or other changes from main development.
Usually we don't care much about what we merge from upstream master into a feature branch. The feature branch simply needs those changes to be up to date, and to make it possible to open a PR on GitHub.
GitHub has such protection, but outside of GitHub, with plain git, we can just merge directly and resolve the conflicts, if any.
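Merging in this reverse direction can be sketched as follows, under the same assumptions as before: master picks up a hypothetical "3rd commit (bug fix)" after feature_a branched off, and we merge it into feature_a.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
# Hypothetical helper: each commit adds its own file
commit() { echo "$1" > "$1.txt"; git add -- "$1.txt"; git commit -q -m "$1"; }

commit "1st commit"
commit "2nd commit"
git checkout -q -b feature_a
commit "feature_a 1st commit"

# Meanwhile, upstream master receives a fix
git checkout -q master
commit "3rd commit (bug fix)"

# Back on the feature branch, bring the upstream change in with a merge
git checkout -q feature_a
git merge --no-edit master
git log --oneline --graph --all
```

Because the branches had diverged, this adds a merge commit to feature_a; its second parent is the tip of master.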
This inverse direction of merging leads us to rebasing.
What is Rebase?
For what it's worth, rebase is best known as the operation that doesn't add a merge commit. Oh yeah! ... but with a twist.
It can modify the git history.
Read the line above again, and please keep it in mind. Rewriting history is dangerous when several developers work on the same branch, but harmless when only one person works on it; then you can make the git history as clean as you like.
Actually, even the situation where others still hold an old snapshot of the history can be managed. Keep reading.
So the real meaning of rebase ...
Rebase is similar to merge in purpose but different in behavior. It replays the commits of your branch on top of the commits from the target branch. In short, git takes the commits your branch has made since it diverged, moves your branch to the tip of the target branch, and then re-applies your commits there one by one as new commits.
Let's say we have the following scenario
More development has happened on master: two more commits were added (the 4th and 5th commits). Of course, the two branches have diverged. Imagine we are on feature_a and need to prepare a GitHub PR, or just want to get upstream updates from master.
While on feature_a, run the following command to rebase feature_a onto master.
git rebase master
The result for feature_a will be
Compare this with the real git command line interaction.
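The rebase scenario can be rebuilt in a throwaway repository as well. This sketch assumes the same hypothetical commit helper and feature_a commit messages as the earlier examples, and records the tip of feature_a before and after the rebase so you can see its hash change.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
# Hypothetical helper: each commit adds its own file, so the replay never conflicts
commit() { echo "$1" > "$1.txt"; git add -- "$1.txt"; git commit -q -m "$1"; }

commit "1st commit"
commit "2nd commit"
git checkout -q -b feature_a
commit "feature_a 1st commit"
commit "feature_a 2nd commit"

# More development on master: the branches diverge
git checkout -q master
commit "3rd commit"
commit "4th commit"
commit "5th commit"

git checkout -q feature_a
before=$(git rev-parse HEAD)
# Replay feature_a's two commits on top of the tip of master
git rebase master
after=$(git rev-parse HEAD)

echo "feature_a tip before rebase: $before"
echo "feature_a tip after rebase:  $after"    # differs: these are new commits
git log --oneline --graph --all
```

The history is now linear and has no merge commit, but both feature_a commits carry new hashes.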
That leaves one question before we can use rebase with confidence: when exactly is commit history modified? To answer it, we need to know how a commit hash is computed.
When does a commit hash change?
A commit hash is computed from the commit object: the snapshot of the project tree, the hash(es) of its parent commit(s), the author and committer together with their timestamps, and the commit message (title and body). Because the parent hash is part of that input, and the parent's hash depends on its own parent in turn, a commit's hash effectively covers all the history before it. Thus if a parent's commit hash changes, the commit's own hash must be recomputed, and so must the hashes of all its descendants.
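You can inspect the inputs of a commit hash directly. The sketch below, in a throwaway repository with the hypothetical commit helper from earlier, prints a raw commit object and then hashes it again with git hash-object, which reproduces the commit's own hash.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q
# Hypothetical helper: each commit adds its own file
commit() { echo "$1" > "$1.txt"; git add -- "$1.txt"; git commit -q -m "$1"; }

commit "1st commit"
commit "2nd commit"

# The raw commit object: tree (the snapshot), parent, author and committer
# (each with a timestamp), a blank line, then the message
git cat-file -p HEAD

# Hashing that exact object gives the same value as the commit hash itself,
# so changing any of those fields, including the parent, yields a new hash
git cat-file commit HEAD | git hash-object -t commit --stdin
git rev-parse HEAD
```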
Side-track: see this proof of concept, which reproduces the exact same commit hash from known inputs.
If a commit hash has changed, we know something has modified the git history.
So is it pointless to try to avoid changing git history if we need to rebase?
Yes. I would say that if we need to rebase, for a clean history and for semantically correct behavior when pulling changes from upstream into the feature branch, then there is no need to worry about whether the operation changes git history. Whatever the result, needing to rebase means we commit to it and accept the consequence that history may change.
How to recover from changed git history while on an old snapshot
As a role play, imagine we are another developer whose local copy of the repository is older. We are about to pull the latest changes, in which someone else has modified the git history. When we run "git pull", we are faced with the following message.
Git offers us 3 different strategies to reconcile the diverged branches: option 1 merges ("git pull --no-rebase", or "git config pull.rebase false"), option 2 rebases ("git pull --rebase", or "git config pull.rebase true"), and option 3 only fast-forwards ("git pull --ff-only", or "git config pull.ff only").
Of course, choosing a strategy doesn't guarantee that git will carry it out successfully. Git only completes it if it is possible in the current situation.
Scenario
Let's say we have modified "4th commit" and "5th commit" via an interactive rebase with "git rebase -i <3rd commit>", marking both commits to edit, and then force-pushed to the remote. Anyone else who still has the older snapshot will likely run into an issue.
Option 3 is very likely impossible here, because the two branches have diverged.
If we proceed with option 1, "git pull --no-rebase", we end up with duplicated commits: the old versions of the rewritten commits plus their new versions with changed hashes, joined by a merge commit. The following is the resulting git snapshot state.
The best way to recover from an old snapshot is option 2, "git pull --rebase". It behaves like the git rebase described earlier, but while replaying our local commits on top of the remote branch, it discards the old versions of the rewritten commits. That is exactly the outcome we want.
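The whole scenario can be simulated locally. In this sketch a bare repository stands in for the shared remote, "alice" rewrites history and force-pushes, and three hypothetical clones ("bob_ff", "bob_merge", "bob_rebase") holding the old snapshot each try one strategy. As a simplification, only the message of the 5th commit is reworded with git commit --amend, instead of editing the 4th and 5th commits via interactive rebase.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
# Hypothetical helper: each commit adds its own file
commit() { echo "$1" > "$1.txt"; git add -- "$1.txt"; git commit -q -m "$1"; }

# A bare repository stands in for the shared remote (e.g. GitHub)
git -c init.defaultBranch=master init -q --bare origin.git

# alice publishes 1st..5th commit
git -c init.defaultBranch=master init -q alice
cd alice
for n in 1st 2nd 3rd 4th 5th; do commit "$n commit"; done
git remote add origin ../origin.git
git push -q -u origin master
cd ..

# Three copies of the old snapshot, one per pull strategy
for who in bob_ff bob_merge bob_rebase; do git clone -q origin.git "$who"; done

# alice rewrites history (rewords the 5th commit) and force-pushes
cd alice
git commit -q --amend -m "5th commit (edited)"
git push -q --force-with-lease origin master
cd ..

# Option 3: fast-forward only is refused because the histories diverged
if git -C bob_ff pull -q --ff-only; then echo "fast-forwarded"; else echo "ff-only refused"; fi

# Option 1: merging keeps the stale 5th commit next to the rewritten one
git -C bob_merge pull -q --no-rebase --no-edit
git -C bob_merge log --oneline --graph

# Option 2: rebasing drops the stale commit and matches the remote exactly
git -C bob_rebase pull -q --rebase
git -C bob_rebase log --oneline --graph
```

In bob_merge, both "5th commit" and "5th commit (edited)" appear, tied together by a merge commit, while bob_rebase ends up identical to alice.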
Summary of When to Merge and When to Rebase
So we have seen the consequences of git merge and git rebase. Their semantics differ, although both attempt to reach the same goal of bringing two lines of development together.
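Do Merge
When integrating a feature branch into a shared branch such as master, or whenever several developers work on the same branch. Merge never rewrites existing commits: it fast-forwards when possible and otherwise adds a merge commit.
Do Rebase
When pulling upstream changes from master into a feature branch that only you work on and you want a clean history, and with "git pull --rebase" when recovering from history that someone else has rewritten.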
Until the next article. Happy programming.
I work at IronSoftware, where my main responsibility is the low-level tech stack (C++).
If you think our solutions and offerings can help your business, please feel free to connect; I can get you to the right person to handle your inquiry.