Introduction to Git
Version Control System - Git

Version control system

Version control is a system that helps manage changes to documents, code, and other collections of information over time. It allows multiple people to work on a project simultaneously, tracks the changes made by each contributor, and lets you revert to previous versions if needed.

Version control system types

There are 3 types of version control systems:



  1. Local version control system (LVC): A local version control system keeps a database on your own computer in which every file change is stored as a patch. The main problem with this is that everything is stored locally. If anything were to happen to the local database, all the patches would be lost. If anything were to happen to a single version, all the changes made after that version would be lost.
  2. Centralized version control system (CVC): A centralized version control system has a single server that contains all the file versions. This enables multiple clients to simultaneously access files on the server, pull them to their local computer, or push them onto the server from their local computer. This way, everyone usually knows what everyone else on the project is doing, and administrators have control over who can do what. The biggest issue with this structure is that everything is stored on the centralized server. If something were to happen to that server, nobody could save their versioned changes, pull files, or collaborate at all.
  3. Distributed version control system (DVC): With distributed version control systems, you clone a copy of a repository locally so that you have the full history of the project. Because every clone is a complete copy, you can keep working offline, and any clone can be used to restore the server if it is lost.

Key Concepts of Version Control

  1. Repositories: A repository (or repo) is a storage location for your project files and their history. It's where all the data is kept, including files, commit history, branches, and tags.
  2. Commits: A commit represents a snapshot of your project at a specific point in time. It includes a message describing what changes were made and who made them.
  3. Branches: Branches allow you to diverge from the main line of development and work on something independently. They are useful for developing features, fixing bugs, or experimenting.
  4. Merging: Merging is the process of integrating changes from different branches. This allows you to incorporate new features or bug fixes from one branch into another.
  5. Cloning: Cloning creates a copy of a repository on your local machine, allowing you to work on it offline.
  6. Pull Requests: In platforms like GitHub, pull requests are a way to propose changes to a project. They allow others to review your changes before integrating them into the main code base.
  7. Conflicts: Conflicts occur when changes from different branches contradict each other. They need to be resolved manually during a merge.

Git & GitHub

Git:

Git is a distributed version control system. It allows you to keep track of changes, revert to previous states, and collaborate with others. It's installed locally and used via command line or GUI tools.

GitHub:

GitHub is a web-based platform that uses Git for version control. It provides additional features like issue tracking, project management, and collaborative tools like pull requests and code reviews.

Git Architecture

Requirements for Git Architecture:

  • Track everything.
  • OS independent.
  • Unique ID.
  • Track History.

Let's dive deeper into Git's internal data structures, which are the backbone of how Git manages and tracks changes efficiently.

To track everything, we need to convert files and folders to objects in Git:

Types of Objects in Git: Git primarily uses four types of objects:

  • Blob: A blob (binary large object) is the simplest object and the fundamental building block for storing file content. It holds only the raw bytes of a file. Git does not store the filename or other metadata in the blob; those are recorded in the tree that references it. When you add a file to your Git repository, Git creates a blob object containing the file's content.
  • Tree: A tree represents a directory. It lists the entries in that directory, and each entry records a file mode, a name, and the hash of a blob (for a file) or of another tree (for a subdirectory). This hierarchical structure is how Git represents the organization of files and subdirectories in your project.
  • Commit: A commit is a snapshot of the project at a point in time. It records the state of the project by pointing to a top-level tree, and contains metadata like the commit message, author, and pointers to parent commits.
  • Tag: An annotated tag is stored as a tag object that points to a specific commit and records the tagger, date, and message. Tags are typically used to mark release points.

These structures are stored in Git's object database, which is located in the .git/objects directory. Each object is identified by a unique SHA-1 hash. This ensures data integrity and allows Git to efficiently detect changes.
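
As a minimal sketch of how an object's ID follows from its content, the snippet below reproduces the SHA-1 that Git assigns to a blob. The function name git_blob_hash is made up for illustration; the header layout ("blob", a space, the size in bytes, then a null byte) is the format Git hashes before the file content.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
print(git_blob_hash(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The filename never enters the hash, which is why two files with identical content share one blob.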

How Git Tracks Changes:

  • Committing: When you commit, Git creates a new commit object that points to a tree object representing your project's structure at that moment.
  • Changing Files: If you modify files, Git creates new blob objects for the changed files. Unchanged files keep their existing blobs.
  • Creating New Files or Directories: Git creates new blob and tree objects as needed.
  • Deleting Files: Git doesn't remove the deleted file's blob from the object database. Instead, the new tree object simply omits the reference to that blob, so earlier commits can still reach it.

Git keeps a database of objects that are stored as key-value pairs. In this system, the key is a unique identifier called a SHA-1 hash, and the value is the actual content of the object (like a file or a commit). This allows Git to efficiently manage and track changes in your code and files.
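
The key-value idea can be sketched with a toy in-memory store. ObjectStore is a hypothetical class written for this illustration; it leaves out the type header and the compression that Git applies to real objects.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: key = SHA-1 of the content, value = content."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        self._objects[key] = content  # same content -> same key, stored once
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self):
        return len(self._objects)

store = ObjectStore()
k1 = store.put(b"print('hi')\n")
k2 = store.put(b"print('hi')\n")   # identical content, so the same key
k3 = store.put(b"print('bye')\n")
print(k1 == k2, len(store))  # True 2
```

Because the key is derived from the content, storing the same content twice costs nothing extra, and any change to the content yields a different key.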

OS independent:

Git itself is inherently designed to be OS-independent, meaning the core functionalities and commands work consistently across various operating systems like Windows, macOS, and Linux. This is because Git operates at the file system level and relies on a text-based interface for commands.

Git achieves that through a separation between the user-facing working directory structure and its internal data management system. This allows developers on different operating systems to collaborate seamlessly using Git. Here's a breakdown:

Git uses blobs, trees, and commits to represent your project's data. Blobs store file content, trees represent directory structures, and commits reference specific trees, capturing project snapshots. This internal model is independent of the operating system.

The .git folder within your working directory is the central location where Git stores its internal data structures (blobs, trees, commits, etc.). This folder structure is designed to work consistently across operating systems, and it is typically hidden by default, keeping the internal Git data separate from the project files you work with directly. While the working directory reflects your OS's folder hierarchy, Git manages its internal data model independently, which ensures consistent functionality regardless of the underlying operating system.

Unique ID:

Git uses SHA-1 hash functions to uniquely identify objects (blobs, trees, commits). This ensures data integrity and makes it easy to detect changes. Git stores all objects (blobs, trees, commits) in the .git/objects directory. These objects are indexed by their SHA-1 hash.

Track history:

Git tracks the history of changes through a series of commits, each identified by its SHA-1 hash.

When you add changes to the staging area using git add, Git stores the content of each added file as a blob and records it in the index, which describes what the next commit will contain.

When you commit these changes with git commit, Git creates a new commit object. This commit object includes:

  • A reference to a tree object that represents the state of the files in the project.
  • Metadata such as the author, date, and commit message.
  • Pointers to parent commit(s), linking the commit to the history before it.

Each blob object stores the contents of a file. The SHA-1 hash of the blob is based on the file content, ensuring that identical files have the same hash.

Commits are linked together in a chain. Each commit points to one or more parent commits, forming a directed acyclic graph (DAG). This structure allows Git to track the history of changes efficiently.
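
The following sketch models that graph with made-up commit IDs (c1 to c4) and a hypothetical Commit class, and collects every commit reachable from a given tip by following parent pointers. It illustrates the structure only and is not how Git itself walks history.

```python
from dataclasses import dataclass, field

@dataclass
class Commit:
    id: str
    message: str
    parents: list = field(default_factory=list)

def reachable(tip, commits):
    """Return the set of commit IDs reachable from `tip` via parent pointers."""
    seen, stack = set(), [tip]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid].parents)
    return seen

commits = {
    "c1": Commit("c1", "initial commit"),
    "c2": Commit("c2", "add feature", ["c1"]),
    "c3": Commit("c3", "fix bug", ["c1"]),
    "c4": Commit("c4", "merge", ["c2", "c3"]),  # two parents: a merge commit
}
print(sorted(reachable("c4", commits)))  # ['c1', 'c2', 'c3', 'c4']
print(sorted(reachable("c2", commits)))  # ['c1', 'c2']
```

Pointers only go from child to parent, so the graph can never contain a cycle.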

Staging Area

The staging area (also known as the index or cache) in Git is an intermediate space where you can gather changes before committing them to the repository. It allows you to prepare and review what will go into your next commit, giving you finer control over what changes are recorded in your project's history.

Imagine the staging area as a temporary holding place for changes you've made to your project before they become a permanent part of your project's history.

Staging Changes:

When you modify files in your working directory, those changes are not immediately ready to be committed. You need to explicitly stage them using the git add command. Staging a file tells Git that you want to include the changes in the next commit.

Reviewing Staged Changes:

You can see which changes have been staged by using the git status command. To inspect the exact differences between the working directory, the staging area, and the last commit, use the git diff commands:

git diff: shows changes in the working directory that haven't been staged.

git diff --staged: shows changes that have been staged but not yet committed.

Unstaging Changes:

If you stage a file by mistake or change your mind, you can unstage it using git reset <file>.

Making a Commit:

Once you’re satisfied with the changes in the staging area, you create a commit using the git commit command.

Only the changes that have been staged will be included in the commit. If some changes were not staged, they won’t be part of the commit.
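
To make the last point concrete, here is a toy model of the three areas. ToyRepo and its methods are hypothetical names for this sketch; real Git stores blobs and an index file rather than plain dictionaries.

```python
class ToyRepo:
    """Hypothetical model: working directory -> index (staging area) -> commits."""

    def __init__(self):
        self.working = {}   # path -> content as currently edited
        self.index = {}     # path -> content staged for the next commit
        self.commits = []   # list of snapshots (dicts), oldest first

    def add(self, path):
        # Like `git add <path>`: copy the current content into the index.
        self.index[path] = self.working[path]

    def commit(self):
        # Like `git commit`: snapshot the index, not the working directory.
        self.commits.append(dict(self.index))

repo = ToyRepo()
repo.working["a.txt"] = "first draft"
repo.working["b.txt"] = "notes"
repo.add("a.txt")
repo.working["a.txt"] = "second draft"  # edited again after staging
repo.commit()
print(repo.commits[-1])  # {'a.txt': 'first draft'}
```

The commit contains the staged version of a.txt and nothing from b.txt, because the later edit and the second file were never staged.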

Commit

In Git, a commit is a snapshot of your project's file system at a specific point in time. It represents a set of changes that you want to record in the version history. Commits are fundamental to Git's operation, as they allow you to track changes, revert to previous states, and collaborate with others by sharing your work. Each commit is assigned a unique identifier, a SHA-1 hash, which is calculated based on the content of the commit.

Unlike some version control systems that store diffs (differences) between file versions, Git stores a snapshot of the entire repository at the time of each commit. However, Git is highly efficient in storing these snapshots. If files haven't changed between commits, Git simply references the previous version of those files, so it doesn't use extra space.

Once created, a commit in Git cannot be changed. You can create new commits that alter the state of the project, but the original commit stays unchanged in the history.

Each commit is uniquely identified by a 40-character SHA-1 hash. This hash is calculated from the commit's content: the tree it references, its metadata (author, date, message), and its parent commit(s). Changing any of these produces a different hash.

Content of a Commit: A commit doesn't store file content directly. It consists of the following:

  1. Tree Object: The commit references a tree object that represents the state of the directory at the time of the commit. The tree object contains references to blob objects (which store file contents) and possibly other tree objects (which represent subdirectories).
  2. Parent Commit: A commit usually has a single parent, which is the commit that directly precedes it. The first commit in a repository has no parent, and a merge commit has two or more. These links form the commit history.
  3. Author Information: The commit records the name and email of the person who made the commit (the author) and the timestamp of when the commit was created.
  4. Commit Message: A commit message is a short description of the changes included in the commit.
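
The sketch below assembles a tree and two commits in the text layout Git uses and hashes them, to show that the parent pointer is part of what gets hashed. The helper names, the identity "Ada Example", and the timestamp are all made up for illustration, and the entry sorting is simplified (Git applies a special rule when ordering subdirectories).

```python
import hashlib

def object_id(kind: str, body: bytes) -> str:
    # Every Git object is hashed as "<type> <size>\0" followed by its body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """entries: (mode, name, hex_id) tuples; IDs are stored as 20 raw bytes."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return out

def commit_body(tree, parents, who, message):
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()

blob = object_id("blob", b"hello\n")
tree = object_id("tree", tree_body([("100644", "hello.txt", blob)]))
who = "Ada Example <ada@example.com> 1700000000 +0000"  # made-up identity
first = object_id("commit", commit_body(tree, [], who, "Initial commit"))
second = object_id("commit", commit_body(tree, [first], who, "Initial commit"))
print(blob)             # ce013625030ba8dba906f756967f9e9ca394464a
print(first != second)  # True: the parent line is part of the hashed content
```

Both commits reference the same tree and carry the same message, yet they get different IDs because only the second one names a parent.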

Commits enable you to revert to previous states of the project if needed, helping you recover from mistakes or undesired changes.

Basic Git Operations

Basic Git operations are the essential commands and workflows that you need to understand to use Git effectively. These operations allow you to initialize repositories, track changes, collaborate with others, and manage the history of your project. Here’s a comprehensive overview of the most important Git operations:

Initialization:

git init: This command creates a new Git repository in the current directory. It initializes the hidden .git folder where Git stores its internal data structures.

Viewing Changes:

git status: This command displays the status of your files, indicating which ones are modified, staged, untracked, etc.

Tracking Changes:

git add <file>: This command adds a specific file to the staging area, indicating you want to include its changes in the next commit.

git add . : This stages all changes in the current directory and its subdirectories, including new (untracked) files.

Committing Changes:

git commit -m "<message>": This command captures the current state of the staged files as a new commit. The <message> argument is a brief description of the changes you're committing.

git commit -am "<message>": This stages every modified or deleted file that Git already tracks and commits them in one step. New (untracked) files are not included.

Viewing Commit History:

The git log command allows you to explore the history of your project by displaying a list of commits. It acts like a time machine for your codebase.

Limiting Output: You can specify a number (e.g., git log -2) to show only the last two commits.

Showing Changes: Use the -p flag (e.g., git log -p) to include the actual changes (the diff) made in each commit.

Searching Commit Messages: The --grep option searches for commits containing specific keywords in their messages (e.g., git log --grep="bug fix").

One Commit per Line: The --oneline option (git log --oneline) shows the commit history in a concise format, with each commit on a single line.

Viewing Differences Using the git diff command:

This is the most common approach for viewing file changes within your Git repository. With no arguments, git diff shows the difference between the working directory and the staging area, that is, the changes you have not staged yet. It displays the changes line by line, marking additions with a + sign and deletions with a - sign.

  • Comparing Different Commits: git diff <commit_hash1> <commit_hash2> compares two specific commits identified by their unique hashes.
  • Viewing Staged Changes: git diff --cached (a synonym for --staged) displays the difference between the staged files and the HEAD commit. This helps you review the specific changes you've added to the staging area before committing.
  • Viewing Unstaged Changes: git diff with no options shows the difference between the working directory and the index (staging area). It highlights changes you've made to files but haven't yet added for the next commit. To compare the working directory with the last commit instead, use git diff HEAD.
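
To illustrate the + and - notation, the sketch below uses Python's standard difflib module to produce a unified diff of two small made-up versions of a file. This is an illustration of the output format only; it is not Git's own diff implementation, and the file name notes.txt is hypothetical.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

# unified_diff yields header lines, a hunk marker, then context/-/+ lines.
diff = list(difflib.unified_diff(old, new, fromfile="a/notes.txt", tofile="b/notes.txt"))
print("".join(diff), end="")
```

In the printed result, the replaced line appears once with a - prefix (the old text) and once with a + prefix (the new text), the appended line appears with a + prefix, and unchanged context lines start with a space.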

Viewing Detailed Information

The git show command in Git allows you to view detailed information about specific objects within your repository. These objects can be commits, blobs (which store file content), trees (which represent directory structures), or tags (which act like bookmarks for specific commits).

You provide the hash of the object you want to see (a branch or tag name also works). You can find commit hashes with git log, and the hashes of trees and blobs with git ls-tree. Based on the object type, git show displays relevant information:

  • Commits: It shows the commit hash, author name and email, date and time, and the commit message. Additionally, it displays the diff (difference) between the files in that commit and its parent commit (the commit it originated from).
  • Blobs: It displays the raw content of the file stored in the blob object (usually not human-readable for non-text files).
  • Trees: It lists the names of the entries within the tree. To also see each entry's mode, type, and hash, use git ls-tree <tree>.
  • Tags: It displays the tag name, the hash of the tagged commit object, and the tag message (if provided when creating the tag).

HEAD

HEAD is a special pointer that indicates which branch or commit you currently have checked out. It plays a crucial role in navigating the commit history and managing changes in your repository.

HEAD is usually a symbolic reference, meaning it points to another reference, typically a branch, which in turn points to a commit. When you make a new commit, Git advances the current branch to the new commit, and HEAD follows because it points to that branch.

How HEAD Works:

  • Points to a commit: HEAD directly or indirectly points to a specific commit in your repository.
  • Represents the current branch: In most cases, HEAD is a symbolic reference to the current branch.
  • Moves with new commits: When you create a new commit, the current branch, and HEAD with it, moves forward to point to the newly created commit.
  • Detached HEAD: In some cases, HEAD can point directly to a commit without being associated with a branch. This is called a detached HEAD state.

A detached HEAD occurs when HEAD points directly to a specific commit rather than a branch. This can happen if you check out a specific commit or a tag instead of a branch. In this state, you can still make changes and commits, but they won’t belong to any branch unless you explicitly create a new branch from them.
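
The difference between an attached and a detached HEAD can be sketched with a small model. The Refs class and the commit IDs c1 to c3 are hypothetical and exist only for this illustration.

```python
class Refs:
    """Hypothetical model of HEAD and branch pointers."""

    def __init__(self):
        self.branches = {"main": "c1"}   # branch name -> commit ID
        self.head = ("branch", "main")   # symbolic: HEAD -> main

    def resolve_head(self):
        kind, value = self.head
        return self.branches[value] if kind == "branch" else value

    def commit(self, new_id):
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = new_id   # the branch advances; HEAD follows
        else:
            self.head = ("commit", new_id)  # detached: only HEAD moves

    def checkout_commit(self, commit_id):
        self.head = ("commit", commit_id)   # detached HEAD

refs = Refs()
refs.commit("c2")
print(refs.resolve_head(), refs.branches)  # c2 {'main': 'c2'}
refs.checkout_commit("c1")
refs.commit("c3")
print(refs.resolve_head(), refs.branches)  # c3 {'main': 'c2'}
```

After the detached commit, no branch points to c3, which is why such commits are easy to lose unless you create a branch for them.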

Moving HEAD:

  • Switching Branches: When you switch branches, Git moves HEAD to point to the new branch.
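  • Reverting HEAD: You can reset HEAD to point to a previous commit. This is useful for undoing commits, but note that --hard also discards uncommitted changes in the working directory and staging area. git reset --hard HEAD~1 # Move HEAD (and the current branch) back to the previous commit.
  • Restoring HEAD: Git records each position of HEAD in the reflog, so you can return to where HEAD was before its last move, for example to undo the reset above. git reset --hard HEAD@{1} # Move HEAD back to where it was one move ago.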
  • Rebasing or Merging: During operations like rebase or merge, HEAD moves through different commits as Git applies changes.

Tags

A tag is a special reference that points to a specific commit. Tags are typically used to mark important points in your project's history, such as releases, milestones, or significant changes. Unlike branches, which continue to move as you commit new changes, a tag does not move: it keeps pointing to the same commit. It's like placing a bookmark on a particular page in a book.

Because a tag remains associated with the commit it was created for, you can easily refer back to that exact state of the project later.

Tags often have meaningful names, such as v1.0.0 or release-2024, making it easy to identify key points in the project's history.

Types of Tags:

Lightweight Tags:

A lightweight tag is simply a pointer to a specific commit, much like a branch but without the ability to move. It is just a name that points directly to a commit, with no additional metadata. Lightweight tags are quick to create but don't store any extra information (e.g., who created the tag, and when it was created).

git tag v1.0.0 # Creates a lightweight tag named 'v1.0.0'

Annotated Tags:

Annotated tags are more robust and are stored as full objects in the Git database. They include metadata such as the tagger's name, email, date, and a tagging message. This makes them more informative and useful for release management. Annotated tags are recommended when you want to include additional context or make the tag more meaningful.

git tag -a v1.0.0 -m "Version 1.0.0 release" # Creates an annotated tag with a message.
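
The structural difference between the two kinds of tag can be sketched as follows. The commit ID is a placeholder, the tagger identity and timestamp are made up, and the ref name v1.0.0-light is hypothetical; the tag body follows the text layout Git uses for tag objects.

```python
import hashlib

def object_id(kind, body):
    # Every Git object is hashed as "<type> <size>\0" followed by its body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

commit_id = "1" * 40  # placeholder commit ID for illustration

# Lightweight tag: just a name that maps straight to the commit.
refs = {"refs/tags/v1.0.0-light": commit_id}

# Annotated tag: a separate object with its own metadata and its own ID.
tag_body = (
    f"object {commit_id}\n"
    "type commit\n"
    "tag v1.0.0\n"
    "tagger Ada Example <ada@example.com> 1700000000 +0000\n"
    "\n"
    "Version 1.0.0 release\n"
).encode()
tag_id = object_id("tag", tag_body)
refs["refs/tags/v1.0.0"] = tag_id  # name -> tag object -> commit

print(refs["refs/tags/v1.0.0-light"] == commit_id)  # True
print(refs["refs/tags/v1.0.0"] == commit_id)        # False
```

The annotated tag's name resolves to the tag object first, and the tag object in turn names the commit, which is where the extra metadata lives.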

What Are Tags Used For?

  • Versioning Releases: Tags are commonly used to mark specific points in the project's history that correspond to software releases. For example, v1.0.0 could represent the first stable release of your software. By tagging a commit, you can easily retrieve the state of the repository at the time of that release.
  • Marking Milestones: Tags can also be used to mark significant milestones in your project, such as completed features, bug fixes, or other notable achievements.
  • Bug Fixes: You can tag commits that fix critical bugs.

Undoing things

Undoing changes in Git is a common task when working with version control. There are several methods to undo changes, each suited to different scenarios:

  1. To untrack a file: if you added a file to the index (staging area) and want Git to stop tracking it while keeping the file on disk (tracked file → untracked file), use: "git rm --cached <filename>".
  2. To discard changes in the working directory: if you want to throw away unstaged edits so that the file in the working directory matches the version in the staging area, use: "git restore <file>...".
  3. To unstage: if you staged changes to the index and want to unstage them, restore the staging area from the last commit with: "git restore --staged <file>...". This unstages the changes but keeps them in the working directory.
  4. To edit the commit message: use "git commit --amend" to edit the last commit message. Because this replaces the last commit with a new one, avoid amending commits you have already shared.

Git Branching

What is a Branch?

Branching is a powerful feature that allows you to create independent lines of development within your repository. Each branch represents a separate line of work where you can make changes without affecting the main project or other branches. Branching is fundamental to workflows in Git, enabling features like parallel development, experimentation, and collaboration.

A Branch as a Pointer: In Git, a branch is essentially a movable pointer to a commit. The default branch when you create a new Git repository is called main (or sometimes master), and as you make commits, this pointer advances along with your commits, always pointing to the latest commit.

Branching for Parallel Development: Branches allow you to diverge from the main line of development and continue to work on a separate line of development. This is useful for developing new features, fixing bugs, or trying out ideas without affecting the main codebase.

How Branching Works?

Creating a New Branch:

When you create a new branch, Git simply creates a new pointer that references the current commit. This new branch will start from the same place as the current branch. Command:

git branch feature-branch # Creates a new branch named 'feature-branch'

Switching Between Branches:

To work on a different branch, you switch (or "checkout") to that branch. When you switch branches, Git updates your working directory to match the state of the branch you checked out. Command:

git switch feature-branch # Switches to 'feature-branch'; git checkout feature-branch also works

Making Changes in a Branch:

Any commits you make while on a branch are unique to that branch. Other branches are not affected by these commits. You can switch back to the original branch (main), and your changes on the feature branch will not be visible unless merged.

Merging Branches:

Once you've completed work on a branch, you may want to merge those changes back into another branch, typically the main branch. Merging incorporates the changes from one branch into another. If main has not gained any new commits since the branch was created, Git performs a "fast-forward" merge, simply moving the pointer forward. Otherwise, Git creates a merge commit, and if the two branches changed the same parts of a file, you'll need to resolve the conflicts manually. Commands:

git switch main # Switches to 'main'
git merge feature-branch # Merges 'feature-branch' into 'main'

Deleting a Branch:

After merging, you may want to delete the branch if it’s no longer needed to keep your repository clean. Command:

git branch -d feature-branch # Deletes 'feature-branch'

Merging branches

Merging in Git is the process of combining changes from one branch into another. It's a fundamental operation that integrates the work done on different branches, allowing you to consolidate features, fixes, or other development work into a single branch.

What is Merging?

Merging takes the content of a source branch and integrates it into the target branch. When the branches have diverged, the result is a merge commit that has two parent commits, one from each branch, thereby combining the changes from both.

The merge process preserves the commit history of both branches, creating a merge commit that links the two histories together.

How Merging Works?

  1. Identify the Base Commit: Git finds a common ancestor of both branches. This is the starting point for the merge.
  2. Create a Merge Commit: Git creates a new commit that incorporates the changes from both branches since the common ancestor. This new commit becomes the tip of the branch you're merging into.

Types of Merges:

  • Fast-Forward Merge: A fast-forward merge occurs when the source branch is directly ahead of the target branch, meaning the target has no commits of its own since the two branches diverged. In this case, Git simply moves the target branch pointer forward to the most recent commit in the source branch, as if it had been there all along. This type of merge does not create a new commit; it just updates the branch pointer.
  • Three-Way Merge: A three-way merge is required when the branches have diverged, meaning both branches have unique commits that are not present in the other. Git will create a new commit (a merge commit) that brings together the changes from both branches. This merge commit will have two parent commits, one from each branch.
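
The choice between the two merge types comes down to an ancestry check, which can be sketched as below. The function names and the commit IDs A to D are made up for this illustration.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including the commit itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_kind(target, source, parents):
    if source in ancestors(target, parents):
        return "already up to date"   # target already contains source
    if target in ancestors(source, parents):
        return "fast-forward"         # target has nothing that source lacks
    return "three-way"                # both sides have their own commits

# commit -> list of parent commits; C and D both branch off B
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(merge_kind("B", "C", parents))  # fast-forward
print(merge_kind("C", "D", parents))  # three-way
print(merge_kind("C", "B", parents))  # already up to date
```

Here target is the tip of the branch you are on and source is the tip of the branch being merged in.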

How to Merge Branches?

Basic Merge Command:

To merge another branch into the one you are currently on:

git merge <branch-name> # Merges the specified branch into the current branch.

Resolving Conflicts:

If the branches have changes that conflict with each other, Git will not automatically complete the merge. Instead, it will mark the conflicts in the affected files and pause the merge. You will need to manually resolve these conflicts by editing the files, staging the resolved files, and completing the merge.
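
Why some changes merge automatically while others conflict can be sketched with the basic three-way rule applied to a single piece of content, such as one line. The function merge_value is a hypothetical simplification; Git applies this kind of comparison to regions of text within each file.

```python
def merge_value(base, ours, theirs):
    """Three-way rule for one piece of content (for example, one line)."""
    if ours == theirs:
        return ours            # both sides agree
    if ours == base:
        return theirs          # only their side changed it
    if theirs == base:
        return ours            # only our side changed it
    raise ValueError("conflict: both sides changed it differently")

print(merge_value("red", "red", "blue"))    # blue
print(merge_value("red", "green", "red"))   # green
try:
    merge_value("red", "green", "blue")
except ValueError as err:
    print(err)  # conflict: both sides changed it differently
```

The base is the version in the common ancestor. A conflict arises only in the last case, where neither side's version can be chosen without discarding the other's change.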

Aborting a Merge:

If you encounter issues during a merge and decide not to proceed, you can abort the merge process, restoring your branch to its previous state:

git merge --abort

Merging Without Committing:

Sometimes, you might want to see the merge result before committing it. You can do this by using the --no-commit option:

git merge --no-commit <branch-name>        

This merges the changes but pauses before creating the merge commit, allowing you to review or make further changes.
