Getting started with Git
Harsh Mistry
Senior Firmware Engineer at Eaton | Ex-Einfochips | Firmware Development | Zephyr RTOS
Hello everyone! I hope you are all having a great learning experience during this lockdown. The present situation has pushed everyone to acquire multiple skills and has completely changed the way we learn. Happy learning to everyone! Stay updated and keep learning!
Let's get started!
If you are a software developer or want to become one, version control is an essential skill to possess. Because multiple developers often work on the same code at the same time, a version control system is needed to keep their changes organized. If you are new, you might ask: what is a version control system? Let's look into that in the following section.
About Version Control
Version control is a system that records changes to a file or a set of files so that you can recall specific versions later. It is generally used for software source code, but you can use it with any type of file on a computer. Programmers wanted a way to track the changes they made to code over time as they added features and fixed bugs, so they created version control. Because of this origin, these tools are also called Source Code Management tools, or SCM for short.
For example, if you are a graphic or web designer, you might want to keep every version of an image or a layout. A Version Control System (VCS) is a wise thing to use in such situations. You may ask, why? A VCS allows you to revert selected files to a previous state, revert the entire project to a previous state, compare changes over time, see who last modified something that might be causing a problem, find out who introduced an issue and when, and much more. Another advantage of a VCS is that if you screw things up or lose files, you can get them back easily for very little overhead.
Let's have a look at the history of version control systems before Git.
History of version control systems
Some of the most popular version control systems before Git that are worth mentioning are as follows:
Source Code Control System (SCCS, closed source: distributed with Unix): Developed at AT&T and released in 1972. It was bundled free with Unix, and Unix itself was freely available to universities, so SCCS spread quickly across campuses. When those university graduates took jobs, they brought SCCS with them. SCCS keeps the original document and, instead of saving the whole document a second time, saves only a snapshot of the changes made. For example, if you want version four of the document, you take the original document and apply three sets of changes to it. This was an efficient approach, so SCCS remained dominant until the 1980s. That's when RCS came in.
Revision Control System (RCS, 1982: open source): It was an improvement on SCCS. For starters, it was cross-platform, whereas SCCS was Unix only. Since PCs had started becoming more common, it was important to have a version control system that ran on them. RCS was more intuitive, had a cleaner syntax, and required fewer commands. Most importantly, it was faster! The improvement in speed was due to a smarter storage strategy. Remember that SCCS stored the original version and thereafter saved only snapshots of the changes in each later version. RCS flipped this strategy: it stored the latest version as a whole document, and when you wanted to go back in time, you applied the snapshots in reverse order to reach the previous versions. This is a lot faster in practice because developers work on the latest version most of the time. Now you can see the drawback of SCCS; imagine applying 20 sets of changes just to get the current version! A drawback shared by RCS and SCCS was that they tracked changes to a single file only, not to a set of files or a whole project. CVS allowed you to do that.
Concurrent Versions System (CVS, 1986-1990: open source): The real advantage of CVS was not only that it worked on multiple files, but also its concurrency! Multiple users could work on the same file. A separate code repository was maintained on a remote server, which stored all the code. This completely changed the scenario and brought a whole lot of features to the table, as users could now share their work and update their files with changes that other people had made, and all of this was possible remotely! This idea of remote repositories was further improved by Apache Subversion.
Apache Subversion or SVN (2000: open source): SVN was much faster than CVS and allowed saving of non-text files as well. The big innovation of SVN was that it did not just track changes to single or multiple files, but watched what happened to them in a directory as a whole. It watched files and directories collectively and took a snapshot of the whole directory. CVS had a hard time when you renamed a file, but SVN had no problem with that. Even if you added or removed a file, SVN would track it. In CVS, you talk about having version 7 of some file, but in SVN, you talk about the file as it appears in revision 7! CVS would update files one at a time whenever it was applying changes or reading them back. SVN would instead do a transactional commit and apply all of the changes that happened to the directory, or none at all. The snapshot was bigger than the individual files, as it covered the changes to a whole directory. SVN was the most popular version control system until Git came out. Before I move on to Git, I would like to shed some light on BitKeeper SCM, which came in between SVN and Git. This is an interesting story, as it led to the birth of Git.
BitKeeper SCM (2000: closed source): It was a closed source, proprietary source code management tool, i.e. a company owned it and sold it, just like Adobe sells Photoshop or Microsoft sells Word. An important feature of BitKeeper was distributed version control. It had a community version that was given away for free, but with fewer features and with usage restrictions. Here comes the interesting part! The free version was used for source code management of the Linux kernel from 2002 to 2005 (you can find my article on Linux kernel development and how it is tightly coupled with Git). Using a proprietary SCM tool for the Linux kernel was controversial, since the kernel is an open source project. Many people raised concerns: what if the company changes the rules in the future? The developers would be stuck using that company's software. Well, guess what? In April 2005, those predictions came true and the community version stopped being free! So BitKeeper was never as popular as CVS or SVN, but it is what led to the birth of Git. Git was created by none other than Linus Torvalds! This definitely proves that necessity is the mother of invention!
Git: A Distributed Version Control system
Git is a distributed version control system. This means that different users maintain their own repositories instead of working from a central repository. Changes are stored as change sets (patches), and the focus is on tracking changes rather than the different versions of a document. You may think that CVS and SVN do that as well, but they track the changes from one version to the next of each file, or the different states of a directory.
Git focuses on encapsulating each change set as a discrete unit, and these change sets can then be exchanged between different repositories. The purpose is not to keep up to date with the latest version of something. The question is: do we have a given change set applied or not? So there is no single master repository, but many working copies, each with its own combination of change sets.
Let's consider an example to make this clear. Imagine that the changes to a single document are the sets A, B, C, D, E and F. We can have a first repository that has all six of these change sets in it. We may have a repository 2 that has only four of them. This doesn't mean that the other repositories are behind the first one and need to be updated. They just don't have the same change sets! We can have a repository 3 that has the change sets A, B, C, D and a repository 4 that has the change sets A, B, E and F. None of these repositories is right or wrong, and none of them is a master repository. They are all different repositories that happen to have different change sets in them. We could easily add a change set G to repository 3 and then share it with repository 4 without having to go through any kind of central server.
By convention, we do designate a particular repository as the master repository so that everyone can synchronize their work with it, but this is a convenience and is not built into the Git architecture. All repositories are considered equal. This is very important in the open source world, as developers work in different parts of the world. Someone might want to fork a repository and take the project in a completely different direction.
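To see the "share G without a central server" idea in action, here is a minimal sketch. It is an illustration only: the directory names repo3 and repo4, the file notes.txt, the commit messages and the demo identity are all made up for this example, and repository 4 is created as a plain clone of repository 3 (a simplification of the A, B, E, F mix described above) so that the exchange stays conflict-free.

```shell
#!/bin/sh
# Sketch: two peer repositories on one machine exchanging a change directly.
# All names below (repo3, repo4, notes.txt, "Demo User") are assumptions.
set -e

demo=$(mktemp -d)

# Throwaway identity passed through the environment, so no config file is touched.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Repository 3 records four changes: A, B, C and D.
git init -q "$demo/repo3"
for change in A B C D; do
    echo "change $change" >> "$demo/repo3/notes.txt"
    git -C "$demo/repo3" add notes.txt
    git -C "$demo/repo3" commit -q -m "Change set $change"
done

# Repository 4 is a peer copy; no server is involved, only a local path.
git clone -q "$demo/repo3" "$demo/repo4"

# A new change G is committed in repository 3 ...
echo "change G" >> "$demo/repo3/notes.txt"
git -C "$demo/repo3" commit -q -a -m "Change set G"

# ... and repository 4 fetches it straight from repository 3.
git -C "$demo/repo4" pull -q --ff-only

last_in_repo4=$(git -C "$demo/repo4" log -1 --format=%s)
commit_count=$(git -C "$demo/repo4" rev-list --count HEAD)
echo "repo4 latest commit: $last_in_repo4 ($commit_count commits in total)"

rm -rf "$demo"
```

Notice that repository 4 pulls directly from the path of repository 3. Neither of them is special; each is simply the other's peer.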
Installing Git
Head to the Git website: https://git-scm.com
At the time of writing, the latest version available for the Windows platform is 2.27.0. You can go to the above link and download it for your Windows system.
If you are on Linux, head to https://git-scm.com/download/linux
Git is available for many distributions such as Debian, Ubuntu, Fedora, openSUSE and many more. Follow the instructions for your Linux distribution.
After the installation completes, open the command prompt on your Windows machine and type the following command:
git --version
If this command prints a version number, Git has been installed successfully. Below is the output on my Windows machine:
In Linux, after installing, open a terminal window and type the following command:
which git
This command tells you where Git is installed. You can then check the version with the same command that we used on the Windows machine, i.e. git --version.
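If you prefer a single check that works in a Linux terminal as well as in Git Bash on Windows, the two steps above can be combined in a small script. This is only a sketch; the variable name git_status is made up for the example.

```shell
#!/bin/sh
# Sketch: report whether Git is on the PATH and, if so, where and which version.
if command -v git >/dev/null 2>&1; then
    git_status="installed"
    echo "Git found at: $(command -v git)"
    git --version
else
    git_status="missing"
    echo "Git is not on the PATH; install it from https://git-scm.com" >&2
fi
```

Here command -v plays the same role as which, but it is built into the shell, so it is available even on systems where which is not installed.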
Basic Git configuration
The first thing we need to know is that there are three places where Git stores configuration information.
- System: This is the broadest level, and it applies to every user of the system by default. Each user can override it with their own configuration, but if they don't, these are the defaults. Linux: /etc/gitconfig Windows: Program Files\Git\etc\gitconfig
- User: These settings apply only to a single user on the machine. On Linux, they live in your home directory in a file called .gitconfig. On Windows, they live in your user directory, which is the folder named after your username (usually under C:\Users on current versions of Windows, or inside the Documents and Settings folder on older ones). Inside that username folder you will find .gitconfig. Linux: ~/.gitconfig Windows: $HOME\.gitconfig
- Project: These settings apply only to a single project and are stored in the .git/config file inside that project. Most of the time you will want the same configuration in all your projects, but sometimes a particular project needs its own settings.
Git provides some commands that we can use to make changes to these specific levels as described above. The commands are as follows:
System:
git config --system
User:
git config --global
Project:
git config
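To get a feel for how these levels interact, here is a small sketch showing a project-level value taking precedence over a user-level one. To keep it safe to run, it points Git at a throwaway home directory, so your real .gitconfig is never touched. The helper git_sandboxed, the directory names and the two user names are invented for this example, and --show-origin assumes Git 2.8 or newer.

```shell
#!/bin/sh
# Sketch: a project-level setting overrides the user-level (global) one.
# Assumption: everything runs inside a temporary sandbox directory.
set -e

sandbox=$(mktemp -d)
fake_home="$sandbox/home"
mkdir -p "$fake_home"

# Hypothetical helper: run git with HOME (and XDG_CONFIG_HOME) redirected,
# so that --global reads and writes $fake_home/.gitconfig only.
git_sandboxed() {
    HOME="$fake_home" XDG_CONFIG_HOME="$fake_home/.config" git "$@"
}

git_sandboxed init -q "$sandbox/project"

# User level: stored in the (fake) home directory.
git_sandboxed config --global user.name "Global Name"

# Project level: stored in project/.git/config.
git_sandboxed -C "$sandbox/project" config user.name "Project Name"

# Inside the project, the project-level value is the one in effect.
effective_in_project=$(git_sandboxed -C "$sandbox/project" config user.name)
global_value=$(git_sandboxed config --global user.name)
echo "in project: $effective_in_project"
echo "global:     $global_value"

# Show which file each setting comes from (Git 2.8 or newer).
git_sandboxed -C "$sandbox/project" config --list --show-origin

rm -rf "$sandbox"
```

The more specific level wins: the project sees "Project Name", while the user-level file still holds "Global Name" for every other project.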
Let's do some basic git configuration.
The very first thing that you should configure is your user name and email address as this will be used to identify you whenever you are committing some changes to the repository.
git config --global user.name "Harsh Mistry"
git config --global user.email "[email protected]"
The above commands will set up your user name and email address. Make sure to put your own name and email in the above commands.
Now you can list all the settings currently in effect using the following command:
git config --list
The following snapshot shows the output of the above command on my machine.
You can see that there are a lot of configurations that I haven't set yet. These are the system defaults. You can set these based on your requirements.
You can also look at the user name and email separately using the following commands:
git config user.name
git config user.email
Now, let's set up the editor and color.
git config --global core.editor "notepad.exe"
git config --global color.ui true
Note: If you are on Windows, then you would probably want to use notepad.exe or notepad++.exe. If you are on Linux, then you can use many editors such as nano, vim, emacs and many more based on your distro.
Setting color.ui to true tells Git that you want its output to be displayed in different colors. You will see reds, greens, blues and yellows after that. In recent versions of Git, color.ui defaults to auto, which already colors output shown in a terminal, but setting it explicitly does no harm.
You can check the above changes by displaying the contents of the .gitconfig file. They are as follows on my machine:
You would definitely want to install the auto-complete feature on git. Let's see how we can do that.
Auto-complete is a pretty handy feature. When you start typing and then press the Tab key, this will signal git to provide a set of alternatives based on what you are typing. This can save you a lot of typing since the file paths are usually long.
Git auto-complete is a separate script and it is included in the source code repository on GitHub. It can be found at the following link:
https://github.com/git/git/tree/master/contrib/completion
Here you will see a script file named git-completion.bash. This is the one you want to use. There are different scripts based on different shells.
Let's open git-completion.bash. You will see instructions on how to set it up. The snapshot below shows these instructions:
What you want to do now is copy the contents of this file and save them. The best way to do that is to click on the Raw button as shown below.
This will convert the file to a text-only format and then you can copy and paste it. You can also save the page as a git-completion.bash file from the browser itself in your user directory. The below snapshot shows the location on my machine:
Now, you want this file to load automatically every time you open a terminal. First, rename it to .git-completion.bash so that it becomes a dot file, just like the .gitconfig file discussed above.
Then, to make the script run automatically, add the following code to the end of your .bashrc or .bash_profile:
# Edit ~/.bashrc or ~/.bash_profile
if [ -f ~/.git-completion.bash ]; then
    source ~/.git-completion.bash
fi
The above shell script checks whether the .git-completion.bash file is present. If the file exists, it is loaded into the current shell. Don't forget to save the file before exiting! Then restart the terminal or command prompt. Auto-complete should work now.
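If you would rather not edit the file by hand, the same snippet can be appended with a small script. The sketch below is one possible way to do it, not an official installer: the function name add_completion_hook is made up, and the script is demonstrated on a scratch file so that nothing in your home directory changes until you pass it your real .bashrc or .bash_profile.

```shell
#!/bin/sh
# Sketch: append the completion snippet to a shell startup file, only once.
# add_completion_hook is a hypothetical helper name used for illustration.
add_completion_hook() {
    rc_file=$1
    # Do nothing if the file already mentions the completion script.
    if [ -f "$rc_file" ] && grep -q 'git-completion.bash' "$rc_file"; then
        echo "already configured: $rc_file"
        return 0
    fi
    cat >> "$rc_file" <<'EOF'

# Load Git tab completion if the script is present
if [ -f ~/.git-completion.bash ]; then
    source ~/.git-completion.bash
fi
EOF
    echo "updated: $rc_file"
}

# Try it on a scratch file first; use "$HOME/.bashrc" once you are satisfied.
scratch_rc=$(mktemp)
add_completion_hook "$scratch_rc"
add_completion_hook "$scratch_rc"   # the second call changes nothing
hook_count=$(grep -c 'source ~/.git-completion.bash' "$scratch_rc")
echo "snippet present $hook_count time(s)"
rm -f "$scratch_rc"
```

Checking for an existing entry before appending means the script can be run again after a reinstall without piling up duplicate lines in the startup file.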
You can use the following command to learn more about git.
git help
It will list the different types of commands and their syntax. The below snapshot shows the output of help command.
Now you have set up the basic Git configuration and are ready to explore more!
Some useful resources:
**************************************************************************
References
- Official Git site: https://git-scm.com/
- Git Essential Training: The Basics by Kevin Skoglund - LinkedIn learning course
**************************************************************************
Links to my other articles:
- Introduction to Kernel Development Process
- Getting started with Operating Systems
- Introduction to Pointers
Wisdom is not a product of schooling but of the lifelong attempt to acquire it - Albert Einstein