In the first part of this series, we covered the fundamental building blocks of the Git storage model. With those in hand, we can now examine how Git uses them to do the actual version control.
You now know that a commit always refers to the tree object representing the root of the repository directory, which in turn refers to all of its sub-trees and blobs. Every time you commit, a new commit object is created, and its parent is the commit you started working from. The missing piece is how Git tracks which commit your working directory is currently based on, that is, the last saved state. That is done through the HEAD file in the .git folder. Let's look at its content (this repo contains one file and one commit):
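A minimal sketch of that inspection, assuming a throwaway repository created just for the demonstration. The file name, the identity values, and the temporary directory are illustrative, and the branch name is forced to master so the result matches this article regardless of local configuration:

```shell
# Throwaway repository with one file and one commit (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name assumed in this article
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# HEAD is a plain text file inside .git
head_content=$(cat .git/HEAD)
echo "$head_content"   # ref: refs/heads/master
```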
Wait, what? It is not a commit hash. This is where the concept of branches comes into the picture. If this file always pointed directly at the latest commit, the commit history would effectively be a single linked list. But the goal of Git was not only to provide simple version control; it was also to make it easy for many people to collaborate on and contribute to a repository.
Hence it introduces the concept of branches. A branch points to the latest commit on that branch, and the HEAD file refers to the current branch. Following the path given in the HEAD file, let's examine the refs/heads/master file inside the .git folder:
Now you can see that it is a hash value that identifies a commit object.
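The following sketch reads the branch file directly and asks Git what kind of object the hash names. It assumes a fresh throwaway repository where refs are stored as loose files under .git/refs (the usual case for a new repository); the file and identity names are illustrative:

```shell
# Throwaway repository with one file and one commit (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name assumed in this article
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# The branch is just a file holding the hash of its latest commit.
branch_tip=$(cat .git/refs/heads/master)
echo "$branch_tip"

# Ask Git what type of object that hash identifies.
obj_type=$(git cat-file -t "$branch_tip")
echo "$obj_type"   # commit
```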
By default, Git creates a branch called master when you use the git init command (the name can be changed with the init.defaultBranch setting). Let's create a new branch and examine the .git directory:
git branch <branch_name>
git checkout <branch_name>
Or use this single command to create the branch and check it out in one step:
git checkout -b <branch_name>
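When you create and switch to the new branch, a new file with the branch's name appears under refs/heads. Its content is the hash of the commit you were on when you created the branch, and HEAD now refers to the new branch.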
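As a rough check of this, the sketch below creates a branch and compares the two ref files. The branch name feature is hypothetical, and the repository is a throwaway one with loose ref files:

```shell
# Throwaway repository with one commit on master (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# Create and switch to a new branch in one step.
git checkout -q -b feature

head_content=$(cat .git/HEAD)
echo "$head_content"   # ref: refs/heads/feature

# Both branch files hold the same commit hash at this point.
master_tip=$(cat .git/refs/heads/master)
feature_tip=$(cat .git/refs/heads/feature)
[ "$master_tip" = "$feature_tip" ] && echo "both branches point at the same commit"
```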
The main advantage of branches is that you can keep one stable branch and create feature branches for feature development. People can work independently without worrying too much about each other's changes. In the end, though, your changes need to land in the main branch. That is where the concept of merging comes into the picture.
Git can compute the difference between two commits (this is what the git diff command outputs) and can apply such differences as a new commit. This is essentially what happens in merging. Git offers several merge strategies; let's look at the most commonly used approaches. In the examples, master is the branch we want our new changes merged into.
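To see the "diff applied as a commit" idea in isolation, here is a small sketch that saves the difference between two commits as a patch and re-applies it on a scratch branch. It only illustrates the concept and is not meant to show how git merge is implemented internally; the branch name replay and the file names are hypothetical:

```shell
# Throwaway repository with two commits (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"
first=$(git rev-parse HEAD)
echo "world" >> file.txt && git commit -q -am "second commit"
second=$(git rev-parse HEAD)

# Save the change between the two commits as a patch file outside the repository.
patch=$(mktemp)
git diff "$first" "$second" > "$patch"

# Go back to the first commit on a scratch branch and re-apply the change.
git checkout -q -b replay "$first"
git apply "$patch"
git commit -q -am "second commit, re-applied from a patch"

# Same file content as the original second commit, but a different commit object.
git diff --quiet "$second" replay && echo "contents match"
```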
Fast-forward merge
Consider a scenario where you create a new branch from master, make some changes, and commit them to the new branch. Now you need these changes on master, and master has received no new commits since you created the branch. The easiest way to merge is simply to move the master branch pointer to the new branch's latest commit. There is no need for diffing or for a new commit. Not clear? Let's visualize:
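The scenario can be reproduced with a few commands. This is a sketch in a throwaway repository; the branch name feature is hypothetical, and --ff-only is used so the merge would refuse to run if a fast-forward were not possible:

```shell
# Throwaway repository with one commit on master (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# Do some work on a new branch; master stays where it is.
git checkout -q -b feature
echo "feature work" >> file.txt && git commit -q -am "feature work"

# Merge back: master simply moves forward to the feature commit.
git checkout -q master
git merge -q --ff-only feature

master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
commit_count=$(git rev-list --count master)
echo "master=$master_tip feature=$feature_tip commits=$commit_count"
```

After the merge both branches point at the same commit, and no extra commit was created.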
What if master received a commit after you created your new branch? Now you can't simply move the master pointer to the new branch's commit as in the previous case. The solution is to introduce a new commit. To create it, Git considers three points when calculating and applying the diffs: the commit from which the new branch was created (the common ancestor), the latest commit on master, and the latest commit on the new branch. That is why it is called a three-way merge. The new commit has two parents.
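A sketch of this case, again in a throwaway repository with a hypothetical feature branch. The two branches touch different files, so the merge completes without conflicts:

```shell
# Throwaway repository with one commit on master (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"
first=$(git rev-parse HEAD)

# One commit on the new branch...
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "feature work"

# ...and one on master, so the histories have diverged.
git checkout -q master
echo "master" > master.txt && git add master.txt && git commit -q -m "master work"

# The common ancestor Git uses as the third point of the merge.
base=$(git merge-base master feature)

# A fast-forward is impossible here, so Git creates a merge commit.
git merge -q -m "merge feature into master" feature

# A merge commit records two parent lines.
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
echo "merge base: $base, parents of merge commit: $parent_count"
```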
This process is not as simple when the changes overlap. In that case you also have to resolve the conflicts as part of the merge commit. Git helps you resolve these conflicts, and modern IDEs make the process simpler. If you are interested in learning in depth how Git diffing, patching, and conflict resolution work, please watch this series.
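For a feel of what a conflict looks like on the command line, here is a minimal sketch where both branches change the same line. The resolution step simply overwrites the file with a chosen result; in practice you would edit the conflict markers by hand or with a merge tool. Branch and file names are hypothetical:

```shell
# Throwaway repository with one commit on master (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# Both branches rewrite the same line of the same file.
git checkout -q -b feature
echo "from feature" > file.txt && git commit -q -am "feature edit"
git checkout -q master
echo "from master" > file.txt && git commit -q -am "master edit"

# The merge stops with a non-zero exit status and leaves the conflict to us.
if git merge -q -m "merge feature" feature; then
  merge_status=clean
else
  merge_status=conflict
fi

# List the files that still have unresolved conflicts.
conflicted=$(git diff --name-only --diff-filter=U)
echo "status: $merge_status, conflicted files: $conflicted"

# Resolve by choosing the final content, then conclude the merge with a commit.
echo "from master and feature" > file.txt
git add file.txt
git commit -q -m "merge feature, conflict resolved"
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
```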
git merge does not change the existing Git history. A powerful feature that does let you change history is rebase. Let's visualize the simplest example to see what happens when you use it:
Here the commits of the new branch are replayed on top of the latest master commit as new commits, so they get different hash values. Rebase also has an interactive mode, which is more powerful and lets us perform operations such as pick, reword, squash, and drop. I will not delve deep into git rebase in this post, but it is a very powerful tool.
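The hash change is easy to observe. The sketch below rebases a hypothetical feature branch onto master in a throwaway repository and compares the branch tip before and after:

```shell
# Throwaway repository with one commit on master (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# Diverge: one commit on feature, one on master.
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "feature work"
old_tip=$(git rev-parse feature)
git checkout -q master
echo "master" > master.txt && git add master.txt && git commit -q -m "master work"

# Replay the feature commit on top of the latest master commit.
git checkout -q feature
git rebase -q master
new_tip=$(git rev-parse feature)

# Same change, new commit object: the hash differs and the parent is now master's tip.
echo "before: $old_tip"
echo "after:  $new_tip"
new_parent=$(git rev-parse 'feature^')
```

Note that master itself does not move; only the feature branch is rewritten.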
There are plenty of resources out there that dig deeper into these subjects, but my goal was not to go that deep because, in the end, you might not use those Git plumbing commands on a daily basis. Still, having a high-level understanding is very valuable and can help you when you are stuck.
The point I want to make with all these simple explanations is that the Git commands you use on a daily basis are all ways of manipulating this commit history. With that in mind, you can dispel many myths.
Deleting a branch in Git does not delete its commits immediately, so there is no need to panic: the git reflog command can be used to recover from these situations. The same goes for git reset, so don't be afraid of using it. Just commit your changes, and you can rework those commits later if they did not turn out as expected.
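As an illustration of such a recovery, the sketch below deletes a branch and then restores it from HEAD's reflog. It assumes the reflog is enabled (the default for a normal, non-bare repository); the branch names feature and restored are hypothetical:

```shell
# Throwaway repository with one commit on master (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# Commit some work on a branch, then delete the branch by "accident".
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "feature work"
lost=$(git rev-parse HEAD)
git checkout -q master
git branch -q -D feature

# The reflog still remembers where HEAD has been.
git reflog -n 3

# HEAD@{0} is the checkout back to master; HEAD@{1} is the "feature work" commit.
git branch restored 'HEAD@{1}'
restored_tip=$(git rev-parse restored)
[ "$restored_tip" = "$lost" ] && echo "branch recovered"
```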
Git also has a garbage collection process that removes objects that are no longer reachable from any reference, such as a branch, a tag, or a reflog entry. It also compresses objects to reduce storage. These policies can be configured if needed.
I hope you can now look at Git from a new perspective. Although I didn't explain everything in detail, you now know what to look for if you get into trouble. After all, you don't need to know everything in depth, but you should understand the technology you use well enough to recognize what you don't know and find it when you need it.
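To watch garbage collection actually remove something, the sketch below makes a commit unreachable and then forces an immediate cleanup. This is deliberately aggressive and only meant for a throwaway repository: by default Git keeps unreachable objects and reflog entries for a grace period, controlled by settings such as gc.pruneExpire and gc.reflogExpireUnreachable. Branch and file names are hypothetical:

```shell
# Throwaway repository with one commit on master (all names are illustrative).
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User" && git config user.email "user@example.com"
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# Create a commit on a branch, then delete the branch so nothing points at it.
git checkout -q -b feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "feature work"
lost=$(git rev-parse HEAD)
git checkout -q master
git branch -q -D feature

# The object is still in the object store (and still listed in the reflog).
git cat-file -e "$lost" && before=present

# Drop all reflog entries, then collect garbage without any grace period.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$lost" 2>/dev/null; then after=present; else after=pruned; fi
echo "before gc: $before, after gc: $after"
```

Without the two forced steps, the deleted branch's commit would normally survive for weeks, which is what makes reflog-based recovery possible.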