Introduction to Git
Version control system
Version control is a system that helps manage changes to documents, code, and other collections of information over time. It allows multiple people to work on a project simultaneously, tracks the changes made by each contributor, and lets you revert to previous versions if needed.
Version control system types
There are 3 types of version control systems:
Key Concepts of Version Control
Git & GitHub
Git:
Git is a distributed version control system. It allows you to keep track of changes, revert to previous states, and collaborate with others. It's installed locally and used via command line or GUI tools.
GitHub:
GitHub is a web-based platform that uses Git for version control. It provides additional features like issue tracking, project management, and collaborative tools like pull requests and code reviews.
Git Architecture
Requirements for Git Architecture:
Let's dive deeper into Git's internal data structures, which are the backbone of how Git manages and tracks changes efficiently.
To track everything, we need to convert files and folders to objects in Git:
Types of Objects in Git: Git primarily uses four types of objects: blobs, trees, commits, and tags.
These structures are stored in Git's object database, which is located in the .git/objects directory. Each object is identified by a unique SHA-1 hash. This ensures data integrity and allows Git to efficiently detect changes.
How Git Tracks Changes:
Git keeps a database of objects that are stored as key-value pairs. In this system, the key is a unique identifier called a SHA-1 hash, and the value is the actual content of the object (like a file or a commit). This allows Git to efficiently manage and track changes in your code and files.
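The key-value idea can be tried directly with Git's low-level commands. This is a minimal sketch, assuming Git is installed and a POSIX shell is available; the temporary repository and the sample content are made up for illustration.

```shell
# Throwaway repository in a temporary directory (hypothetical setup).
repo=$(mktemp -d) && cd "$repo" && git init -q

# Store some content as an object; Git prints the key (the object's hash).
key=$(printf 'hello\n' | git hash-object -w --stdin)
echo "key:   $key"

# Look the value back up by its key.
value=$(git cat-file -p "$key")
echo "value: $value"

# Ask Git what kind of object the key refers to.
kind=$(git cat-file -t "$key")
echo "type:  $kind"
```

Here git hash-object -w writes the object into the database and git cat-file reads it back, so the hash really does behave like a lookup key.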
OS independent:
Git itself is inherently designed to be OS-independent, meaning the core functionalities and commands work consistently across various operating systems like Windows, macOS, and Linux. This is because Git operates at the file system level and relies on a text-based interface for commands.
Git achieves that through a separation between the user-facing working directory structure and its internal data management system. This allows developers on different operating systems to collaborate seamlessly using Git. Here's a breakdown:
Git uses blobs, trees, and commits to represent your project's data. Blobs store file content, trees represent directory structures, and commits reference specific trees, capturing project snapshots. This internal model is independent of the operating system.
The .git folder within your working directory acts as the central location for Git to store its internal data structures (blobs, trees, commits, etc.). This folder structure is designed to work consistently across operating systems. The .git folder is typically hidden by default on most operating systems, keeping the internal Git data separate from the project files you work with directly. While the working directory structure reflects your OS's folder hierarchy, Git manages its internal data model independently. This separation ensures consistent functionality regardless of the underlying operating system.
Unique ID:
Git uses SHA-1 hash functions to uniquely identify objects (blobs, trees, commits). This ensures data integrity and makes it easy to detect changes. Git stores all objects (blobs, trees, commits) in the .git/objects directory. These objects are indexed by their SHA-1 hash.
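As a rough illustration of content-based IDs, the sketch below hashes three hypothetical files and then locates one of the resulting objects on disk. It assumes Git is installed and that the object is still stored "loose" (uncompacted), which is the case for freshly written objects.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# Two files with identical content and one with different content.
printf 'same text\n' > a.txt
printf 'same text\n' > b.txt
printf 'other text\n' > c.txt

id_a=$(git hash-object -w a.txt)
id_b=$(git hash-object -w b.txt)
id_c=$(git hash-object -w c.txt)
echo "a.txt $id_a"
echo "b.txt $id_b"
echo "c.txt $id_c"

# A loose object lives under .git/objects, in a directory named after the
# first two characters of its hash; the rest of the hash is the file name.
dir=$(printf '%s' "$id_a" | cut -c1-2)
file=$(printf '%s' "$id_a" | cut -c3-)
ls ".git/objects/$dir/$file"
```

The IDs of a.txt and b.txt match because the ID depends only on content, not on the file name.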
Track history:
Git tracks the history of changes through a series of commits, each identified by its SHA-1 hash.
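When you make changes to files and add them to the staging area using git add, Git stores the new file contents as blobs and records them in the staging area, ready for the next commit.
When you commit these changes with git commit, Git creates a new commit object. This commit object includes a reference to the tree that describes the project's files at that moment, references to its parent commits, the author and committer details, and the commit message.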
Each blob object stores the contents of a file. The SHA-1 hash of the blob is based on the file content, ensuring that identical files have the same hash.
Commits are linked together in a chain. Each commit except the very first points to one or more parent commits, forming a directed acyclic graph (DAG). This structure allows Git to track the history of changes efficiently.
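The chain of commits can be inspected by printing a raw commit object. A minimal sketch, assuming Git is installed; the user name, email, and file name are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "First commit"
first=$(git rev-parse HEAD)

echo "second" >> notes.txt
git add notes.txt
git commit -q -m "Second commit"

# The raw commit object lists a tree, a parent, the author and committer,
# and the message.
git cat-file -p HEAD

# HEAD^ names the parent of the newest commit, which is the first commit.
parent=$(git rev-parse HEAD^)
echo "parent: $parent"
```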
Staging Area
The staging area (also known as the index or cache) in Git is an intermediate space where you can gather changes before committing them to the repository. It allows you to prepare and review what will go into your next commit, giving you finer control over what changes are recorded in your project's history.
Imagine the staging area as a temporary holding place for changes you've made to your project before they become a permanent part of your project's history.
Staging Changes:
When you modify files in your working directory, those changes are not immediately ready to be committed. You need to explicitly stage them using the git add command. Staging a file tells Git that you want to include the changes in the next commit.
Reviewing Staged Changes:
You can see which changes have been staged by using the git status command. If you want to inspect the exact differences between the working directory, the staging area, and the last commit, you can use the git diff commands:
git diff: shows changes in the working directory that haven't been staged.
git diff --staged: shows changes that have been staged but not yet committed.
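To see the two views side by side, the sketch below modifies two hypothetical files, stages only one of them, and asks each diff command which file it reports. It assumes Git is installed; names and identity settings are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "line 1" > app.txt
echo "draft" > readme.txt
git add .
git commit -q -m "Initial commit"

echo "line 2" >> app.txt     # this change will be staged
echo "more" >> readme.txt    # this change stays unstaged
git add app.txt

git status --short

# Working directory vs. staging area.
unstaged=$(git diff --name-only)
# Staging area vs. last commit.
staged=$(git diff --staged --name-only)
echo "unstaged: $unstaged"
echo "staged:   $staged"
```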
Unstaging Changes:
If you stage a file by mistake or change your mind, you can unstage it using git reset <file>.
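A short sketch of unstaging, assuming Git is installed; config.txt is a made-up file. It also shows that unstaging leaves the edit itself in the working directory.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > config.txt
git add config.txt
git commit -q -m "Add config"

echo "v2" > config.txt
git add config.txt
git diff --staged --name-only    # config.txt is staged

# Unstage the file; the modification stays in the working directory.
git reset -q -- config.txt

staged=$(git diff --staged --name-only)
content=$(cat config.txt)
echo "staged after reset: '$staged'"
echo "file content: $content"
```

In Git 2.23 and later, git restore --staged config.txt has the same effect.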
Making a Commit:
Once you’re satisfied with the changes in the staging area, you create a commit using the git commit command.
Only the changes that have been staged will be included in the commit. If some changes were not staged, they won’t be part of the commit.
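The effect of committing with only part of the work staged can be checked as follows. This is a sketch under the assumption that Git is installed; a.txt and b.txt are hypothetical files.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "a" > a.txt
echo "b" > b.txt
git add .
git commit -q -m "Initial commit"

# Modify both files but stage only one of them.
echo "a2" >> a.txt
echo "b2" >> b.txt
git add a.txt
git commit -q -m "Update a only"

# Files recorded in the new commit.
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
# Files that still carry uncommitted changes.
pending=$(git diff --name-only)
echo "committed: $committed"
echo "pending:   $pending"
```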
Commit
In Git, a commit is a snapshot of your project's file system at a specific point in time. It represents a set of changes that you want to record in the version history. Commits are fundamental to Git's operation, as they allow you to track changes, revert to previous states, and collaborate with others by sharing your work. Each commit is assigned a unique identifier, a SHA-1 hash, which is calculated based on the content of the commit.
Unlike some version control systems that store diffs (differences) between file versions, Git stores a snapshot of the entire repository at the time of each commit. However, Git is highly efficient in storing these snapshots. If files haven't changed between commits, Git simply references the previous version of those files, so it doesn't use extra space.
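The reuse of unchanged files can be observed by comparing the blob recorded for each path in two consecutive commits. A minimal sketch, assuming Git is installed; the file names are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "stable" > stable.txt
echo "v1" > changing.txt
git add .
git commit -q -m "First snapshot"

echo "v2" > changing.txt
git add changing.txt
git commit -q -m "Second snapshot"

# <commit>:<path> names the blob recorded for that path in that commit.
old_stable=$(git rev-parse HEAD~1:stable.txt)
new_stable=$(git rev-parse HEAD:stable.txt)
old_changing=$(git rev-parse HEAD~1:changing.txt)
new_changing=$(git rev-parse HEAD:changing.txt)

echo "stable.txt:   $old_stable -> $new_stable"
echo "changing.txt: $old_changing -> $new_changing"
```

Both snapshots point at the same blob for stable.txt, so the unchanged file is stored only once.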
Once created, a commit in Git cannot be changed. You can create new commits that alter the state of the project, but the original commit stays unchanged in the history.
Each commit is uniquely identified by a 40-character SHA-1 hash. This hash is calculated based on the commit's content, including the tree it references, its metadata (author, date, message), and its parent commits.
Content of a Commit: A commit doesn't store the file content directly. Instead, it stores a reference to a tree object, which in turn references the blobs and subtrees that make up the snapshot.
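To follow the references from a commit down to file content, something like the sketch below can be used. It assumes Git is installed; the directory layout is hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir docs
echo "readme" > README.txt
echo "guide" > docs/guide.txt
git add .
git commit -q -m "Add readme and guide"

# The commit object names a tree instead of holding file content itself.
git cat-file -p HEAD

# Resolve that tree and list its entries: a blob for README.txt and a
# nested tree for the docs directory.
tree=$(git rev-parse 'HEAD^{tree}')
git cat-file -p "$tree"

# Following the tree down to a blob yields the actual file content.
content=$(git cat-file -p HEAD:docs/guide.txt)
echo "$content"
```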
Commits enable you to revert to previous states of the project if needed, helping you recover from mistakes or undesired changes.
Basic Git Operations
Basic Git operations are the essential commands and workflows that you need to understand to use Git effectively. These operations allow you to initialize repositories, track changes, collaborate with others, and manage the history of your project. Here’s a comprehensive overview of the most important Git operations:
Initialization:
git init: This command creates a new Git repository in the current directory. It initializes the hidden .git folder where Git stores its internal data structures.
Viewing Changes:
git status: This command displays the status of your files, indicating which ones are modified, staged, untracked, etc.
Tracking Changes:
git add <file>: This command adds a specific file to the staging area, indicating you want to include its changes in the next commit.
git add . : This adds all new and modified files in the current directory and its subdirectories to the staging area.
Committing Changes:
git commit -m "<message>": This command captures the current state of the staged files as a new commit. The <message> argument is a brief description of the changes you're committing.
git commit -am "<message>": This stages every modified or deleted file that Git already tracks and commits them in one step. New, untracked files are not included.
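A sketch of how the -a shortcut treats tracked and untracked files, assuming Git is installed; tracked.txt and untracked.txt are hypothetical names.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked file"

echo "v2" > tracked.txt        # modified file that Git already tracks
echo "new" > untracked.txt     # brand-new file, never added

# -a stages modifications to tracked files before committing.
git commit -q -am "Update tracked file"

committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
status=$(git status --short)
echo "committed: $committed"
echo "left over: $status"
```

The new file still shows up as untracked afterwards, so it needs an explicit git add.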
Viewing Commit History:
The git log command allows you to explore the history of your project by displaying a list of commits. It acts like a time machine for your codebase.
Limiting Output: You can specify a number (e.g., git log -2) to show only the last two commits.
Showing Changes: Use the -p flag (e.g., git log -p) to view each commit together with the actual changes it made.
Grepping for Specific Commits: The --grep option searches for commits containing specific keywords in their messages (e.g., git log --grep "bug fix").
One Commit per Line: The --oneline flag (git log --oneline) shows the commit history in a concise format, with each commit on a single line consisting of its abbreviated hash and message.
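The options above can be tried on a small made-up history. A minimal sketch, assuming Git is installed; the commit messages and file names are invented.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "parser" > parser.txt
git add .
git commit -q -m "Add parser"
echo "fixed" >> parser.txt
git commit -q -am "Fix bug in parser"
echo "docs" > docs.txt
git add .
git commit -q -m "Update docs"

# One line per commit: abbreviated hash plus message.
git log --oneline

# Only the two most recent commits (%s prints just the message).
last_two=$(git log -2 --format=%s)
echo "$last_two"

# Only commits whose message mentions "bug".
bug=$(git log --grep="bug" --format=%s)
echo "$bug"

total=$(git rev-list --count HEAD)
```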
Viewing Differences Using the git diff command:
This is the most common approach for viewing file changes within your Git repository. With no arguments, git diff shows the difference between the working directory and the staging area; git diff HEAD compares the working directory with the HEAD commit instead. The output displays the changes line by line, marking additions with a + sign and deletions with a - sign.
Viewing Detailed Information
The git show command in Git allows you to view detailed information about specific objects within your repository. These objects can be commits, blobs (which store file content), trees (which represent directory structures), or tags (which act like bookmarks for specific commits).
You provide the hash or name of the object you want to inspect; with no argument, git show displays the HEAD commit. You can find hashes using commands like git log for commits or by inspecting the .git folder for blobs and trees. Based on the object type, git show displays relevant information:
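As a rough illustration, the sketch below points git show at a commit and at a blob. It assumes Git is installed; greeting.txt is a hypothetical file.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# For a commit: author, date, message, and a summary of changed files.
git show --stat HEAD

# For a blob: the file content. The blob can be named by hash...
blob=$(git rev-parse HEAD:greeting.txt)
by_hash=$(git show "$blob")

# ...or by <commit>:<path>.
by_path=$(git show HEAD:greeting.txt)
echo "$by_hash / $by_path"
```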
HEAD
HEAD is a special pointer that indicates which commit or branch you are currently working on; your working directory is checked out from the commit it refers to. It plays a crucial role in navigating the commit history and managing changes in your repository.
HEAD is a symbolic reference, meaning it usually points to another reference, typically a branch, although it can also point directly to a specific commit. When you make a new commit, Git moves the current branch to the new commit, and HEAD follows because it points to that branch.
How HEAD Works:
A detached HEAD occurs when HEAD points directly to a specific commit rather than a branch. This can happen if you check out a specific commit or a tag instead of a branch. In this state, you can still make changes and commits, but they won’t belong to any branch unless you explicitly create a new branch from them.
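The detached state can be reproduced and then rescued as sketched below. This assumes Git 2.23 or later (for git switch); the branch name experiment and the file names are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" >> file.txt
git commit -q -am "Second"
first=$(git rev-parse HEAD~1)

# Checking out a commit directly detaches HEAD.
git checkout -q "$first"

# symbolic-ref succeeds only while HEAD points at a branch.
if git symbolic-ref -q HEAD > /dev/null; then
  state="attached"
else
  state="detached"
fi
echo "HEAD is $state"

# A commit made here belongs to no branch...
echo "experiment" > exp.txt
git add exp.txt
git commit -q -m "Experiment"

# ...until a branch is created at the current commit.
git switch -q -c experiment
branch=$(git symbolic-ref --short HEAD)
echo "now on: $branch"
```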
Moving HEAD:
Tags
A tag is a special reference that points to a specific commit, much like a bookmark on a particular page in a book. Tags are typically used to mark important points in your project's history, such as releases, milestones, or significant changes. Unlike branches, which move as you commit new changes, a tag keeps pointing to the same commit.
Because a tag stays associated with the commit it was created for, it lets you easily refer back to that exact state of the project later.
Tags often have meaningful names, such as v1.0.0 or release-2024, making it easy to identify key points in the project's history.
Types of Tags:
Lightweight Tags:
A lightweight tag is simply a pointer to a specific commit, much like a branch but without the ability to move. It is just a name that points directly to a commit, with no additional metadata. Lightweight tags are quick to create but don't store any extra information (e.g., who created the tag, and when it was created).
git tag v1.0.0  # Creates a lightweight tag named 'v1.0.0'
Annotated Tags:
Annotated tags are more robust and are stored as full objects in the Git database. They include metadata such as the tagger's name, email, date, and a tagging message. This makes them more informative and useful for release management. Annotated tags are recommended when you want to include additional context or make the tag more meaningful.
git tag -a v1.0.0 -m "Version 1.0.0 release"  # Creates an annotated tag with a message
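The difference between the two kinds of tag shows up in the object type each name resolves to. A minimal sketch, assuming Git is installed; the tag name v0.9.0 and the file are made up so that both kinds can coexist in one repository.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "release" > app.txt
git add app.txt
git commit -q -m "Release commit"
tagged=$(git rev-parse HEAD)

git tag v0.9.0                                  # lightweight
git tag -a v1.0.0 -m "Version 1.0.0 release"    # annotated

# A lightweight tag names the commit directly; an annotated tag is its
# own object that wraps the commit.
light_type=$(git cat-file -t v0.9.0)
annotated_type=$(git cat-file -t v1.0.0)
echo "v0.9.0 -> $light_type, v1.0.0 -> $annotated_type"

# The annotated tag object carries tagger, date, and message.
git cat-file -p v1.0.0

# After a new commit, the tag still resolves to the tagged commit.
echo "next" >> app.txt
git commit -q -am "Work after release"
still=$(git rev-parse 'v1.0.0^{commit}')
```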
What Are Tags Used For?
Undoing things
Undoing changes in Git is a common task when working with version control. There are several methods to undo changes, each suited to different scenarios:
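For instance, two commonly used approaches are discarding an uncommitted edit with git restore and undoing a committed change with git revert. The sketch below is only an illustration of those two commands, assuming Git 2.23 or later; app.txt and its contents are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "good" > app.txt
git add app.txt
git commit -q -m "Add app"

# Case 1: discard an uncommitted edit in the working directory.
echo "oops" > app.txt
git restore app.txt
after_restore=$(cat app.txt)

# Case 2: undo a change that was already committed. git revert adds a
# new commit that reverses it, leaving the earlier history intact.
echo "bad change" > app.txt
git commit -q -am "Bad change"
git revert --no-edit HEAD
after_revert=$(cat app.txt)

total=$(git rev-list --count HEAD)
echo "content: $after_revert, commits: $total"
```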
Git Branching
What is a Branch?
Branching is a powerful feature that allows you to create independent lines of development within your repository. Each branch represents a separate workspace where you can make changes without affecting the main project or other branches. Branching is fundamental to workflows in Git, enabling features like parallel development, experimentation, and collaboration.
A Branch as a Pointer: In Git, a branch is essentially a movable pointer to a commit. The default branch when you create a new Git repository is called main (or sometimes master), and as you make commits, this pointer advances along with your commits, always pointing to the latest commit.
Branching for Parallel Development: Branches allow you to diverge from the main line of development and continue to work on a separate line of development. This is useful for developing new features, fixing bugs, or trying out ideas without affecting the main codebase.
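The pointer behaviour can be watched by resolving the branch name before and after a commit. A small sketch, assuming Git is installed; the branch is renamed to main explicitly because the default name depends on local configuration.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
git branch -M main        # make sure the branch is called main

# A branch name resolves to a single commit hash.
before=$(git rev-parse main)

echo "two" >> file.txt
git commit -q -am "Second"

# After committing, the same name resolves to the new commit.
after=$(git rev-parse main)
echo "main moved from $before to $after"
```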
How Does Branching Work?
Creating a New Branch:
When you create a new branch, Git simply creates a new pointer that references the current commit. This new branch will start from the same place as the current branch. Command:
git branch feature-branch  # Creates a new branch named 'feature-branch'
Switching Between Branches:
To work on a different branch, you switch (or "checkout") to that branch. When you switch branches, Git updates your working directory to match the state of the branch you checked out. Command:
git switch feature-branch  # Switches to 'feature-branch' (git checkout feature-branch also works)
Making Changes in a Branch:
Any commits you make while on a branch are unique to that branch. Other branches are not affected by these commits. You can switch back to the original branch (main), and your changes on the feature branch will not be visible unless merged.
Merging Branches:
Once you've completed work on a branch, you may want to merge those changes back into another branch, typically the main branch. Merging incorporates the changes from one branch into another. If the target branch has not gained any new commits since the feature branch was created, Git performs a "fast-forward" merge, simply moving the branch pointer forward. Otherwise, Git creates a merge commit, and if both branches changed the same lines, you'll need to resolve the conflicts manually. Command:
git switch main
git merge feature-branch  # Merges 'feature-branch' into 'main'
Deleting a Branch:
After merging, you may want to delete the branch if it’s no longer needed to keep your repository clean. Command:
git branch -d feature-branch  # Deletes 'feature-branch'
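Put together, the branch lifecycle above might look like the following sketch. It assumes Git 2.23 or later; the file names are hypothetical, and the merge here happens to be a fast-forward because main receives no commits of its own in the meantime.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch -M main

# Create the branch, switch to it, and commit there.
git branch feature-branch
git switch -q feature-branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Back on main, the feature file is not visible yet.
git switch -q main
if [ -e feature.txt ]; then visible_before="yes"; else visible_before="no"; fi

# Merge, then delete the branch that is no longer needed.
git merge -q feature-branch
if [ -e feature.txt ]; then visible_after="yes"; else visible_after="no"; fi
git branch -d feature-branch

branches=$(git branch --format='%(refname:short)')
echo "visible before merge: $visible_before, after: $visible_after"
echo "remaining branches: $branches"
```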
Merging branches
Merging in Git is the process of combining changes from one branch into another. It's a fundamental operation that integrates the work done on different branches, allowing you to consolidate features, fixes, or other development work into a single branch.
What is Merging?
Merging takes the content of a source branch and integrates it into the target branch. Unless the merge is a fast-forward, the result is a commit that has two parent commits, one from each branch, thereby combining the changes from both.
The merge process preserves the commit history of both branches, creating a merge commit that links the two histories together.
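The two-parent structure can be confirmed on a small diverged history. A sketch, assuming Git 2.23 or later; the branch name feature and the file names are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git branch -M main

# One commit on a feature branch...
git switch -q -c feature
echo "f" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# ...and one on main, so the two histories diverge.
git switch -q main
echo "m" > main.txt
git add main.txt
git commit -q -m "Main work"

git merge -q --no-edit feature

# Count the parent lines in the merge commit object.
parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "merge commit has $parents parents"
git log --oneline --graph
```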
How Does Merging Work?
Types of Merges:
How to Merge Branches?
Basic Merge Command:
To merge another branch into the one you are currently on, run:
git merge <branch-name>  # Merges the specified branch into the current branch
Resolving Conflicts:
If the branches have changes that conflict with each other, Git will not automatically complete the merge. Instead, it will mark the conflicts in the affected files and pause the merge. You will need to manually resolve these conflicts by editing the files, staging the resolved files, and completing the merge.
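The resolve-stage-commit sequence can be sketched as follows, assuming Git 2.23 or later. The file settings.txt, its values, and the decision to keep the feature branch's value are all hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Base"
git branch -M main

# Both branches change the same line in different ways.
git switch -q -c feature
echo "color = green" > settings.txt
git commit -q -am "Use green"
git switch -q main
echo "color = red" > settings.txt
git commit -q -am "Use red"

# The merge stops and reports the conflict (non-zero exit status).
git merge feature || echo "merge paused on a conflict"
conflicted=$(git diff --name-only --diff-filter=U)
cat settings.txt             # shows the <<<<<<< ======= >>>>>>> markers

# Resolve by writing the final content, stage it, and finish the merge.
echo "color = green" > settings.txt
git add settings.txt
git commit -q --no-edit

final=$(cat settings.txt)
parents=$(git cat-file -p HEAD | grep -c '^parent ')
```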
Aborting a Merge:
If you encounter issues during a merge and decide not to proceed, you can abort the merge process, restoring your branch to its previous state:
git merge --abort
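A quick check that aborting returns the branch to where it was, assuming Git 2.23 or later; the conflicting file and branch names are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Base"
git branch -M main
git switch -q -c feature
echo "color = green" > settings.txt
git commit -q -am "Use green"
git switch -q main
echo "color = red" > settings.txt
git commit -q -am "Use red"

before=$(git rev-parse HEAD)

# Start a merge that conflicts, then back out of it.
git merge feature || echo "conflict, aborting"
git merge --abort

after=$(git rev-parse HEAD)
content=$(cat settings.txt)
status=$(git status --short)
echo "HEAD unchanged: $before = $after"
```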
Merging Without Committing:
Sometimes, you might want to see the merge result before committing it. You can do this by using the --no-commit option:
git merge --no-commit <branch-name>
This merges the changes but pauses before creating the merge commit, allowing you to review or make further changes.
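A sketch of reviewing a merge before recording it, assuming Git 2.23 or later; branch and file names are placeholders. Note that a fast-forward merge cannot be paused this way, so when the target branch has no commits of its own, --no-ff has to be added alongside --no-commit.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git branch -M main
git switch -q -c feature
echo "f" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git switch -q main
echo "m" > main.txt
git add main.txt
git commit -q -m "Main work"

before=$(git rev-parse HEAD)

# Merge into the staging area and working directory, but stop short of
# creating the merge commit.
git merge --no-commit feature
during=$(git rev-parse HEAD)
staged=$(git diff --staged --name-only)
echo "staged by the merge: $staged"

# After reviewing, record the merge (git merge --abort would cancel it).
git commit -q --no-edit
parents=$(git cat-file -p HEAD | grep -c '^parent ')
```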