Git Internals (Part 1)

If you're a developer, you've almost certainly used Git, and you probably use it every day. But have you ever wondered how Git is structured internally? What actually happens when we commit code, and how is merging handled?

In today’s post, let’s "dissect" Git, a revolutionary piece of software in the programming world. This post assumes you’re already comfortable with basic commands such as git commit, git branch, and git merge.

Everything is a hash

Yes, you heard that right. Inside Git, everything is addressed by a hash: every object Git stores, including the content of your code, is identified by a hash computed from that object. Keep this idea in mind, because the next part builds on it.

Git Objects: Blob, Tree, and Commit

First, let’s briefly go over the building blocks. The most basic component in Git is an object. The objects covered in this post are blobs, trees, and commits (Git has a fourth type, the annotated tag, which is not covered here). All of them are identified by a hash.

Blob

  • A blob (binary large object) holds the content of a single file in your code.
  • It’s different from a file: a file carries metadata (such as its name or creation date), while a blob contains only the file's raw bytes.
  • A blob is identified by a SHA-1 hash computed from its content (Git prepends a short header with the object type and size before hashing).

Figure: Blob object
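To make the "hashed from content" idea concrete, here is a minimal sketch of how a blob ID can be computed. The function name git_blob_hash is made up for this illustration; the only thing taken from Git itself is the object layout, which is the header "blob <size>" followed by a NUL byte and then the raw content.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same bytes always give the same ID, no matter who hashes them or what the file is called.
same = git_blob_hash(b"HELLO WORLD") == git_blob_hash(b"HELLO WORLD")
# A single extra byte gives a completely different ID.
different = git_blob_hash(b"HELLO WORLD") != git_blob_hash(b"HELLO WORLD!")
print(same, different)
```

Notice that the file name never enters the hash, which is why two files with different names but identical content share one blob.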

Tree

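  • A tree represents a directory in your code and is also identified by a SHA-1 hash.
  • Each entry in a tree records a name and a mode, and points to the hash of a blob (a file) or of another tree (a subdirectory). This is where file names live, since blobs don't store them. Think of blobs as leaf nodes and trees as non-leaf nodes in a tree structure.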

Figure: Tree object

In the example above, the tree corresponds to a file system where the root directory contains one file /test.js and one subdirectory /docs. The /docs directory contains two files: /docs/pic.png and /docs/1.txt.
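The directory layout just described can be hashed with a short sketch. The helper names (git_hash, tree_hash) and the file contents are placeholders invented for this example; the entry layout (mode, space, name, NUL byte, 20 raw hash bytes) and the sort rule follow Git's tree format.

```python
import hashlib


def git_hash(obj_type, body):
    """Hash an object the way Git does: '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_hash(entries):
    """entries is a list of (mode, name, hex object ID) tuples."""
    # Git sorts entries by name, comparing directories as if their name ended with "/".
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry: "<mode> <name>\0" followed by the 20 raw bytes of the ID.
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return git_hash("tree", body)


# Placeholder contents standing in for the real files.
docs = tree_hash([
    ("100644", "pic.png", git_hash("blob", b"<png bytes>")),
    ("100644", "1.txt", git_hash("blob", b"HELLO WORLD")),
])
root = tree_hash([
    ("100644", "test.js", git_hash("blob", b"console.log('hi');\n")),
    ("40000", "docs", docs),  # a subdirectory is just a pointer to another tree
])
print(root)
```

The root tree's hash depends on the hash of docs, which in turn depends on the hashes of its blobs, so any change deep in the directory bubbles up to the root.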

Commit

  • Commits are something you're likely very familiar with: they’re the result of the git commit command you use daily.
  • A commit is like a snapshot, recording the state of the entire project directory at a given point in time.
  • A commit contains a pointer to the hash of the root tree, the author, the commit message, and the commit time. It also points to its parent commits: none for the very first commit, one for an ordinary commit, and two or more for a merge commit.
  • It’s also identified by a SHA-1 hash.
  • Each commit represents a full snapshot taken when you run git commit, not just a list of changes since the previous commit.

Figure: Commit object
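Here is a rough sketch of what goes into a commit's hash. The function commit_hash, the author string, and the timestamps are hypothetical values for illustration. One simplification to be aware of: Git records both an author line and a committer line, and this sketch just reuses the same identity and time for both.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries


def git_hash(obj_type, body):
    """Hash an object the way Git does: '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_hash(tree_id, parent_ids, author, timestamp, message, tz="+0000"):
    """Build the text of a commit object and return its ID."""
    lines = [f"tree {tree_id}"]                     # pointer to the root tree
    lines += [f"parent {p}" for p in parent_ids]    # zero, one, or several parents
    lines.append(f"author {author} {timestamp} {tz}")
    lines.append(f"committer {author} {timestamp} {tz}")  # assumption: same as author
    body = "\n".join(lines) + "\n\n" + message + "\n"
    return git_hash("commit", body.encode("utf-8"))


who = "Example Dev <dev@example.com>"  # hypothetical identity
first = commit_hash(EMPTY_TREE, [], who, 1700000000, "Initial commit")
second = commit_hash(EMPTY_TREE, [first], who, 1700000060, "Second commit")
# Same tree, same author, same message, but one second later: a different commit ID.
later = commit_hash(EMPTY_TREE, [], who, 1700000001, "Initial commit")
print(first != later)
```

Because the parent IDs are part of the hashed text, each commit ID also depends on the whole history before it.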

Q&A Section

At this point, some might wonder: if each commit contains the full snapshot of the directory at that moment, doesn’t that mean we have to store a lot of data with every commit?

Let's jump into an example. In the directory tree above, suppose we change the content of the file 1.txt from "HELLO WORLD" to "HELLO WORLD!". The blob for 1.txt changes, and so do the /docs tree and the root tree that lead to it (shown in red in the figure).

Figure: When we change our code

At first glance, it looks like the new commit stores a lot of data, but if you look closely, you’ll notice that the unchanged parts (test.js and pic.png) stay exactly the same.

In short, if an object hasn’t changed, Git doesn’t create a new copy of it: the new commit simply points to the existing object.

Figure: When a new commit is created
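The reuse of unchanged objects can be simulated with a tiny in-memory object store. This is only a sketch: the dictionary named store stands in for Git's object database, and the helper names and file contents are invented for the example.

```python
import hashlib

store = {}  # object ID -> raw object bytes, standing in for Git's object database


def put(obj_type, body):
    """Store an object under its hash and return the ID."""
    data = f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body
    oid = hashlib.sha1(data).hexdigest()
    store.setdefault(oid, data)  # an object that already exists is not stored again
    return oid


def put_tree(entries):
    """entries maps name -> (mode, object ID)."""
    def key(item):
        name, (mode, _) = item
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for name, (mode, oid) in sorted(entries.items(), key=key):
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(oid)
    return put("tree", body)


def snapshot(txt_content):
    """Store the whole example directory and return the root tree ID."""
    docs = put_tree({
        "pic.png": ("100644", put("blob", b"<png bytes>")),
        "1.txt": ("100644", put("blob", txt_content)),
    })
    return put_tree({
        "test.js": ("100644", put("blob", b"console.log('hi');\n")),
        "docs": ("40000", docs),
    })


root_before = snapshot(b"HELLO WORLD")
count_before = len(store)                 # 3 blobs + 2 trees
root_after = snapshot(b"HELLO WORLD!")
new_objects = len(store) - count_before   # new 1.txt blob, new docs tree, new root tree
print(count_before, new_objects)
```

The second snapshot describes the full directory again, yet only the objects on the path from the root to 1.txt are new; the blobs for test.js and pic.png are found in the store and reused.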

Recap

  • Blob: Contains the file content.
  • Tree: Maps the directory and points to blobs and other trees.
  • Commit: A snapshot of the project at a point in time.
  • For example, if I have a file with the content "Hello world" and you have a file with exactly the same content, the blob hash will be the same, because it’s hashed from the file’s content.
  • Similarly, if I have a folder containing a subfolder and other files, and you have an identical folder with the same structure, file names, and file contents, the tree hash will be the same.
  • However, if you commit and I commit, the commit hashes will almost certainly be different, because a commit hash also covers the author, the commit message, and the commit time.

In the next part, I'll talk about branches in Git. Stay tuned!
