
Everything You Need to Know about Git and its Commands

Steffen Knoedler

Senior Fullstack Software Engineer

Published: June 3, 2020

1 Description

Git is a free and open source version control system, originally created by Linus Torvalds in 2005. Version control systems (VCSs) are tools that track changes to source code (or any other collection of files). A VCS tracks changes to a folder and its contents as a series of snapshots, where each snapshot contains the entire state of the files and folders within a top-level directory (the working directory). A VCS also maintains metadata, such as who created each snapshot and a message describing it. This history is stored in a database, which Git calls a repository.

Git is designed as a distributed system: every developer has the full history of the repository locally. To make this work, Git distinguishes between your local repository and a shared remote repository. Unlike the once popular centralized version control systems, which have no local repository, Git doesn’t need a constant connection to a central server. You can develop and commit changes to your local repository even without internet access and still keep track of every change made to the source code. As soon as you are online again, you can push your changes from the local repository to the remote repository and thus keep both in sync. Developers can work anywhere and collaborate asynchronously across time zones.

There are several Git hosting providers, such as GitHub.

By the way, “git” is British English slang for an unpleasant person. Linus Torvalds sarcastically quipped about the name:

“I’m an egotistical bastard, and I name all my projects after myself”.

Torvalds chose the name when he wrote the very first version of Git. At that time, he described the tool as “the stupid content tracker”.

2 Git’s Data Model

2.1 How Git Models History

Git models the history of a collection of files and folders as a series of snapshots. A snapshot is the state of the tracked top-level tree (the directory) at a certain point.

Git does not model history as a simple time-ordered list. Instead, it uses a directed acyclic graph (DAG) to model the history of snapshots.

Each snapshot in Git refers to a set of “parents”: the snapshots that preceded it. A snapshot can descend from multiple parents, for example when two parallel branches of development are combined (merged).

Snapshot = Commit = State of tree = Node in DAG
Snapshots are immutable

Figure 1: Git, series of snapshots/ Commits
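The parent relationships described above can be inspected directly. The following bash sketch is illustrative only: it assumes git is installed, works in a throwaway repository created with `mktemp -d`, uses a hypothetical local identity, and names the first branch master to match this article. It builds a small history with one merge, so the merge commit has two parents and the root commit has none.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"              # throwaway demo repository
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch "master", as in this article
git config user.name "Example User"          # hypothetical identity, local to this repo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "a" > a.txt && git add a.txt && git commit -q -m "root commit"
git checkout -q -b feature                   # start a parallel line of development
echo "b" > b.txt && git add b.txt && git commit -q -m "feature work"
git checkout -q master
echo "c" > c.txt && git add c.txt && git commit -q -m "master work"
git merge -q --no-edit feature               # creates a merge commit with two parents

# %P prints the parent hashes: none for the root commit, two for the merge commit.
git log --format='%h parents: %P  %s'
git log --graph --oneline --all
```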

Git’s internal data can be divided into two parts: the index and the object store. The index is a file that describes the directory structure of the repository at a given point in time. The object store contains the actual objects (file contents, directories, and commits) that the index and the commit history refer to.

2.2 Object Store

The object store contains your original data files and all the log messages, author information, dates, and other information required to rebuild any version or branch of the project. It holds three main kinds of objects:

Blob (binary large object): the contents of a file. Git ignores the blob’s internal structure and attaches no metadata to it, not even the file name.
Tree: One level of directory information. It records blob identifiers, path names, and a bit of metadata (such as the file mode) for all the files in one directory. It can also recursively reference other (sub)tree objects and thus build a complete hierarchy of files and subdirectories.
Commit: A commit object holds the metadata for each change introduced into the repository, including the author, committer, commit date, and log message. Each commit points to a tree object that captures, in one complete snapshot, the state of the repository at the time the commit was made. The initial (root) commit has no parent. Most commits have exactly one parent; a merge commit has two or more.
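A minimal sketch for looking at the three object types with `git cat-file`. It assumes bash, a throwaway repository created with `mktemp -d`, and a hypothetical local identity; the file and directory names (src/hello.txt) are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

mkdir src
echo "hello" > src/hello.txt             # made-up file for the demo
git add src/hello.txt
git commit -q -m "first commit"

commit=$(git rev-parse HEAD)             # the commit object
tree=$(git rev-parse "HEAD^{tree}")      # top-level tree of that snapshot
subtree=$(git rev-parse HEAD:src)        # tree object for the src/ directory
blob=$(git rev-parse HEAD:src/hello.txt) # blob holding the file contents

git cat-file -t "$commit"   # commit
git cat-file -t "$tree"     # tree
git cat-file -t "$blob"     # blob
git cat-file -p "$commit"   # tree <hash>, author, committer, message (no parent line: root commit)
git cat-file -p "$tree"     # 040000 tree <hash>	src
git cat-file -p "$blob"     # hello
```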

Data Storage & Compression

Git does not store duplicate content. A new commit describes a complete snapshot, but unchanged files are not stored again: the new tree simply references the existing blob. When you edit a file, the snapshot references a new blob for the edited version. To save space, Git also packs objects into pack-files and applies delta compression, storing an object as the differences to a similar object. How much storage this saves depends on how similar the versions are.
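The sketch below (assumptions: bash, a throwaway repository, a hypothetical identity, and made-up file names) shows that an unchanged file keeps the same blob hash across two commits, so it is stored only once. Whether delta compression actually applies during `git gc` depends on the objects; for tiny files like these Git may simply store them whole.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "unchanged" > keep.txt
echo "v1" > edit.txt
git add . && git commit -q -m "first snapshot"
echo "v2" > edit.txt
git commit -q -a -m "second snapshot"

# The unchanged file resolves to the same blob in both snapshots ...
git rev-parse HEAD~1:keep.txt HEAD:keep.txt
# ... while the edited file gets a new blob.
git rev-parse HEAD~1:edit.txt HEAD:edit.txt

# 2 commits + 2 trees + 3 blobs = 7 reachable objects in total
git rev-list --objects --all | wc -l
git gc --quiet          # pack the loose objects into a pack-file
git count-objects -v    # "in-pack" now counts the packed objects
```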

2.3 Index

New content is added to the object store based on the changes you stage in the index. The index is a binary file in the .git folder (.git/index) that describes the directory structure of the entire repository; more specifically, it captures a version of the project’s overall structure at a given moment. When you add changes to the staging area, Git writes new blob objects to the .git/objects directory, next to the blobs of previous commits, and updates the index entries for those files. The index identifies each file’s content by its hash, which is derived from the content of the object. If you add a file that is already in the index with different content, the new version replaces the old entry. Each add of new content creates a blob, but a blob that was staged and then replaced before being committed becomes unreachable, and Git’s garbage collection (git gc) eventually deletes it.
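A small sketch of the index at work, assuming bash and a throwaway repository with a made-up file notes.txt. `git ls-files --stage` prints the entries stored in .git/index; staging a second version of the same file replaces the entry, while the first blob stays in the object store until garbage collection removes it.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

echo "first version" > notes.txt
git add notes.txt
git ls-files --stage                  # <mode> <blob hash> <stage number>	<path>
first=$(git ls-files --stage notes.txt | cut -d' ' -f2)

echo "second version" > notes.txt
git add notes.txt                     # the index entry for notes.txt now names a new blob
second=$(git ls-files --stage notes.txt | cut -d' ' -f2)
git ls-files --stage

# Both blobs exist in the object store, but only the second is referenced by the index.
git cat-file -p "$first"
git cat-file -p "$second"
ls -l .git/index
```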

object = blob (file) / tree (directory) / commit (snapshot)

Each object is identified by the SHA-1 hash of its contents (prefixed with a small header stating the object type and size), rendered as a 40-digit hexadecimal value. Git computes the hash and uses it as the object’s name: the first two characters name a subdirectory of .git/objects, and the remaining 38 characters are the file name. Any tiny change to a file changes its SHA-1 hash, so the new version is stored as a separate object. Git tracks content, not file names.
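To see the naming scheme, this sketch hashes a file with `git hash-object`, locates the resulting object under .git/objects, and recomputes the same id by hand from the header and content. It assumes bash, GNU coreutils’ `sha1sum` (macOS ships `shasum` instead), and a throwaway repository using the default SHA-1 object format.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q

printf 'hello\n' > hello.txt
hash=$(git hash-object -w hello.txt)    # -w also writes the blob into .git/objects
echo "$hash"                            # ce013625030ba8dba906f756967f9e9ca394464a

dir=$(echo "$hash" | cut -c1-2)         # first two hex digits: subdirectory
file=$(echo "$hash" | cut -c3-)         # remaining 38 digits: file name
ls -l ".git/objects/$dir/$file"         # the zlib-compressed loose object

# Recompute the id by hand: SHA-1 over "blob <size in bytes>\0<content>".
printf 'blob 6\0hello\n' | sha1sum
```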

References

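Humans can’t remember hashes, so Git uses references: human-readable names that point to objects, usually commits.

References are mutable: Git moves them to a different commit, for example when you commit.
The master reference usually points to the latest commit on the main branch of development.
HEAD refers to the currently checked-out branch and thus to the latest commit on that branch.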
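A minimal sketch of references, assuming bash, a throwaway repository, a hypothetical identity, and a first branch named master. It shows HEAD naming the branch, and the master reference moving when a new commit is made.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "one" > f.txt && git add f.txt && git commit -q -m "one"
cat .git/HEAD                   # ref: refs/heads/master  (HEAD names the current branch)
before=$(git rev-parse master)  # the commit the master reference points to right now

echo "two" > f.txt && git commit -q -a -m "two"
git rev-parse master            # master has moved to the new commit
git log --oneline --decorate    # the newest line shows (HEAD -> master)
```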

Other important definitions

repository = database
Working directory = Your local working folder on disk
origin = default name of the remote repository you cloned from; used as a shorthand for that repository’s URL
staged = changes that have been added to the staging area (index) and will be part of the next commit

3 Git commands

In the following, I provide a summary of the most important commands. This is not meant to be a complete list of git commands.

3.1 Initialize the Working Directory

Repo already exists on GitHub

`git clone https://github.com/me/repo.git` Download (clone) a repository from GitHub.com to your machine
`cd repo` Change into the repo directory

Repo does not exist on GitHub yet

`git init` Create a new local repository

This creates the .git folder, which contains everything Git needs for version control: all commits and other objects, references, and configuration such as the remote repository addresses.
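The following sketch runs `git init`, looks inside the new .git folder, and then clones the repository. It assumes bash, a throwaway directory from `mktemp -d`, a hypothetical identity, and made-up repository names; a local path stands in for the GitHub URL so the example runs offline.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q project                        # creates project/.git
cd project
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false
echo "hello" > README.md
git add README.md && git commit -q -m "Initial commit"
ls .git                                    # HEAD, config, index, objects/, refs/, ...

cd "$work"
git clone -q project project-copy          # a local path used in place of a GitHub URL
git -C project-copy log --oneline          # the clone carries the full history
git -C project-copy remote get-url origin  # "origin" records where the clone came from
```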

3.2 Data transport

3.2.1 Config Author

Configure the author name and email address to be used with your commits.

`git config --global user.name "Steffen Knoedler"`
`git config --global user.email "<your email address>"`

3.2.2 Index/ staging area

The staging area allows you to split your changes into multiple commits: stage and commit one feature, then stage and commit another.

`git add -A` (--all) Stages everything, so that the staging area matches your folder on disk, including new, modified, and deleted files
`git add .` Stages new, modified, and deleted files in the current directory and its subdirectories (before Git 2.0, it did not stage deletions)
`git add *` The shell expands `*` to the names in the current directory that don’t begin with a dot, so top-level dotfiles and files deleted from the current directory are not staged
`git add -u` (--update) Stages only modified and deleted tracked files; new files are not added
`git add <file name 1> <file name 2>` Stages only the listed file(s)
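The differences between these variants are easiest to see with one modified, one deleted, and one new file. This bash sketch assumes Git 2.x, a throwaway repository, a hypothetical identity, and made-up file names; it records what each command staged with `git diff --cached --name-only`.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > tracked.txt
echo "x" > doomed.txt
git add . && git commit -q -m "base"

echo "v2" > tracked.txt      # modify a tracked file
rm doomed.txt                # delete a tracked file
echo "new" > new.txt         # create a new, untracked file

git add -u                   # stages the modification and the deletion, skips new.txt
after_u=$(git diff --cached --name-only)
git status --short

git reset -q                 # unstage everything again; the working directory is untouched
git add .                    # Git 2.x: new, modified and deleted files under "."
after_dot=$(git diff --cached --name-only)
git status --short
```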

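3.2.3 Commit to the Local Repository

Record the changes in the staging area as a new commit in your local repository.

`git commit -m "Commit message"` Commit the staged changes; the current branch and HEAD move to the new commit
`git commit -a` Automatically stages all modified and deleted tracked files, then commits them (new, untracked files are not included)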
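A quick sketch of both commit forms, assuming bash, a throwaway repository, a hypothetical identity, and made-up file names. Note that `-a` only picks up files Git already tracks, so a brand-new file stays untracked.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > app.txt
git add app.txt
git commit -q -m "Add app.txt"           # commit what is staged

echo "v2" > app.txt                      # modify a tracked file
echo "draft" > todo.txt                  # create a new, untracked file
git commit -q -a -m "Update app.txt"     # -a stages tracked changes only
git log --oneline
git status --short                       # ?? todo.txt
```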

3.2.4 Push to the Remote Repository

`git push origin master` Send the commits of your local master branch to the master branch of the remote repository
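Pushing can be tried offline by letting a local bare repository play the role of the GitHub remote. This is only a sketch under those assumptions (bash, `mktemp -d`, a hypothetical identity); with a real hosting provider, origin would be the repository’s URL instead of a path.

```shell
set -e
base=$(mktemp -d) && cd "$base"
git init -q --bare remote.git            # local bare repository standing in for GitHub
git init -q local && cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "hello" > README.md && git add README.md && git commit -q -m "Initial commit"
git remote add origin "$base/remote.git" # normally https://github.com/<user>/<repo>.git
git push -q origin master
git ls-remote origin                     # the remote's refs/heads/master now names our commit
```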

3.2.5 Difference

`git diff` View the difference between the staging area and the working directory
`git diff --staged` View the difference between HEAD and the staging area
`git diff HEAD` View the difference between HEAD and the working directory
`git diff [first branch] [second branch]` Show the differences between the two branches
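The three comparisons can be told apart with one staged and one unstaged change to the same file. The sketch assumes bash, a throwaway repository, a hypothetical identity, and a made-up file.txt.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "line 1" > file.txt
git add file.txt && git commit -q -m "base"
echo "line 2" >> file.txt && git add file.txt   # staged change
echo "line 3" >> file.txt                       # unstaged change

git diff            # staging area vs working directory: only "+line 3"
git diff --staged   # HEAD vs staging area: only "+line 2"
git diff HEAD       # HEAD vs working directory: both added lines
```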

3.2.6 Undoing Commits & Changes

1 Checkout

Switch branches or restore working tree files. The git checkout command operates upon three distinct entities: files, commits, and branches. If no pathspec was given, git checkout will also update HEAD to set the specified branch as the current branch.

1.1 Checkout Commits

You can use the git checkout command to visit any earlier commit. Git checkout is an easy way to “load” one of these saved snapshots onto your development machine. HEAD usually points to master or some other local branch, but when you check out a previous commit, HEAD no longer points to a branch: it points directly to a commit. This is called a *detached HEAD* state.

`git checkout a1e8fb5` This makes your working directory match the exact state of the a1e8fb5 commit

Commits you create in a detached HEAD state belong to no branch; once you switch away, they become unreachable and are eventually deleted by garbage collection. If you want to continue developing from this point, create a new branch and switch to it (git checkout -b <branch name>). If you just wanted to look at the files from that commit, go back with:

`git checkout master` get back to the “current” state of your project in master branch
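A minimal sketch of entering and leaving a detached HEAD state. It assumes bash, a throwaway repository, a hypothetical identity, a first branch named master, and a made-up branch name rescue for keeping work done at the old commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > f.txt && git add f.txt && git commit -q -m "v1"
old=$(git rev-parse HEAD)
echo "v2" > f.txt && git commit -q -a -m "v2"

git checkout -q "$old"            # detached HEAD: HEAD names a commit, not a branch
cat .git/HEAD                     # the raw commit hash instead of "ref: refs/heads/master"
detached=$(git rev-parse HEAD)
content_then=$(cat f.txt)         # the working directory matches the old snapshot
git symbolic-ref -q HEAD || echo "HEAD is detached"

git checkout -q -b rescue         # hypothetical branch that keeps this point reachable
git checkout -q master            # back to the current state of master
```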

1.2 Checkout Branch

Git checkout also lets you switch between branches. A branch is an independent line of development (another path through the DAG); see section 3.3 for details.

- `git checkout -b <branch name>` Create a new branch and switch to it

- `git checkout <branch name>` Switch to the named branch

1.3 Checkout files

Checking out an old version of a file (git checkout <commit> -- <file name>) does not move the HEAD pointer. You stay on the same branch and the same commit, so there is no detached HEAD state. The old version is placed in both the working directory and the staging area, and you can commit it in a new snapshot like any other change. In effect, this usage of git checkout reverts an individual file to an old version.
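A short sketch of restoring one file from the previous commit without detaching HEAD, assuming bash, a throwaway repository, a hypothetical identity, and a made-up file settings.txt.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "good" > settings.txt && git add settings.txt && git commit -q -m "good settings"
echo "broken" > settings.txt && git commit -q -a -m "break settings"
tip=$(git rev-parse HEAD)

git checkout -q HEAD~1 -- settings.txt   # old version goes into the staging area and working directory
git symbolic-ref HEAD                    # still refs/heads/master: no detached HEAD
git status --short                       # M  settings.txt (already staged)
git commit -q -m "Restore good settings"
```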

2 Revert and Reset

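`git revert <commit name>` Undo a single commit by creating a new commit that reverses the changes made by that commit. The branch pointer then points to the new revert commit; the existing history is preserved.
`git revert HEAD` Revert the latest commit
`git reset <commit name>` Go back to an earlier state of the project by moving the current branch pointer to that commit, which removes all subsequent commits from the branch (with --hard, the working directory is reset as well). This orphans those commits and should be avoided for commits you have already shared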
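The difference between the two approaches shows up in the history. This sketch (bash, a throwaway repository, a hypothetical identity) first reverts a commit, which adds a commit, and then uses `git reset --hard`, which moves the branch pointer back. Because `--hard` also overwrites the working directory, only run it where you can afford to lose uncommitted changes.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "1" > n.txt && git add n.txt && git commit -q -m "one"
echo "2" > n.txt && git commit -q -a -m "two"

git revert --no-edit HEAD       # adds a new commit that undoes "two"
git log --oneline               # Revert "two", two, one
after_revert=$(git rev-list --count HEAD)
content_after_revert=$(cat n.txt)

git reset -q --hard HEAD~2      # move master back to "one" and overwrite the working directory
git log --oneline               # only "one" is left on the branch
git reflog -n 3                 # the dropped commits are still listed in the reflog for now
```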

3 Remove

The git rm command removes individual files or a collection of files from the staging area (index).

`git rm --cached <file name>` Remove the file from the index (staging its deletion for the next commit), but keep your copy in the local file system
`git rm <file name>` Remove the file from both the index and the working directory
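A sketch of both forms, assuming bash, a throwaway repository, a hypothetical identity, and made-up file names; local.env stands for a file you want to keep on disk but stop tracking.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "secret" > local.env
echo "old" > old.txt
git add . && git commit -q -m "add files"

git rm -q --cached local.env   # stop tracking, but keep the file on disk
git rm -q old.txt              # remove from the index and the working directory
git status --short             # both deletions are staged; local.env is now untracked
git commit -q -m "Stop tracking local.env, remove old.txt"
```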

4 Pull

`git pull` updates the local version of a repository from a remote. It combines two git commands:

`git fetch` Download new commits and references from the remote repository; remote-tracking branches such as origin/master are updated, your own branches are not touched
`git merge` Create a new merge commit that combines the local branch with the fetched remote branch; the current branch and HEAD move to the new commit (if the local branch has no new commits, Git simply fast-forwards it instead)

You can also use `git pull --rebase`, which does not merge the remote and local branches. Instead, it takes your local commits that are not yet in the remote branch and replays (copies) them on top of the fetched remote commits, which results in a linear history. HEAD then points at the last replayed commit.
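To try `git pull --rebase` offline, the sketch below uses a local bare repository as the shared remote and two clones for two hypothetical collaborators, alice and bob (assumptions: bash, `mktemp -d`, made-up identities). Both commit in parallel; bob’s pull replays his commit on top of alice’s, so no merge commit is created.

```shell
set -e
base=$(mktemp -d) && cd "$base"
git init -q --bare remote.git                          # shared remote (stands in for GitHub)
git -C remote.git symbolic-ref HEAD refs/heads/master  # let clones check out "master"

# Hypothetical collaborator alice creates the project and pushes it.
git init -q alice && cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
git config commit.gpgsign false
echo "base" > shared.txt && git add shared.txt && git commit -q -m "base"
git remote add origin "$base/remote.git"
git push -q origin master

# Hypothetical collaborator bob clones; then both commit in parallel.
cd "$base" && git clone -q "$base/remote.git" bob
cd "$base/alice"
echo "alice" > alice.txt && git add alice.txt && git commit -q -m "alice's change"
git push -q origin master

cd "$base/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
git config commit.gpgsign false
echo "bob" > bob.txt && git add bob.txt && git commit -q -m "bob's change"
git pull -q --rebase origin master   # fetch alice's commit, then replay bob's commit on top
git log --oneline --graph            # linear: bob's change, alice's change, base
```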

3.3 Branches

A branch represents an independent line of development. Branches serve as an abstraction for the edit/stage/commit process. You can think of them as a way to request a brand new working directory, staging area, and project history. New commits are recorded in the history for the current branch, which results in a fork in the history of the project.

`git branch` List all branches in your repo
`git branch <branch name>` Create a new branch
`git push origin <branch name>` Push the branch to your remote repository, so others can use it
`git push --all origin` Push all branches to your remote repository
`git push origin :<branch name>` Delete a branch on your remote repository
`git branch -m <new branch name>` Rename the current branch
`git branch -d <branch name>` Safely delete a branch; Git refuses if it has unmerged changes
`git branch -D <branch name>` Force deletion of a branch, with or without unmerged changes

It’s important to understand that branches are just pointers to commits. When you create a branch, all Git needs to do is create a new pointer; it doesn’t change the repository in any other way. For example, when you create a branch named crazy-experiment, Git adds a crazy-experiment pointer next to the existing master pointer, and both point to the same commit.
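A sketch showing that a branch is only a pointer, assuming bash, a throwaway repository, a hypothetical identity, and a first branch named master. It creates crazy-experiment, commits on it, and then deletes it: `-d` refuses because the commit is not merged, while `-D` succeeds.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"      # hypothetical identity
git config user.email "user@example.com"
git config commit.gpgsign false

echo "x" > x.txt && git add x.txt && git commit -q -m "x"
git branch crazy-experiment                # writes one new pointer; nothing else is copied
git show-ref --heads                       # both branches list the same commit hash
same_before=$(git rev-parse crazy-experiment)

git checkout -q crazy-experiment
echo "y" > y.txt && git add y.txt && git commit -q -m "y"
exp_tip=$(git rev-parse crazy-experiment)  # only this pointer moved
git checkout -q master

git branch -d crazy-experiment || echo "-d refused: commit y is not merged into master"
git branch -D crazy-experiment             # force delete; commit y is now on no branch
```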

Find me somewhere else:

https://sknoedler.github.io

https://github.com/SKnoedler
