Git Internals
Saral Saxena
You may have even come across a hidden directory named .git and intuited that this must be where the git magic happens. If not, enter the command “ls -al” in the terminal from inside any git-tracked directory to see a list of all files, including the .git directory. To view the contents of this directory, enter “ls .git” or open the directory in your preferred editor.
But what does this mysterious .git directory really hold, and how does that relate to the information you use?
1. branches
This directory is considered deprecated and typically not used in newer git projects.
2. hooks
This directory contains special scripts that can “hook” into your workflow and execute before or after specific git events (e.g., commit, push, merge). This is a more advanced feature that is less commonly used. If you are familiar with React hooks, the concept here is loosely similar. You could, for example, define a pre-commit script that checks that your code adheres to a set of coding standards before the commit is made. Many of these checks can be set up by other means, e.g., a linter in your development environment.
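As a rough illustration of the pre-commit idea, here is a minimal sketch of a hook written in Python. The trailing-whitespace rule and the function names (find_trailing_whitespace, staged_files, main) are hypothetical choices for this example, not anything git prescribes. For simplicity it reads files from the working directory rather than the exact staged content.

```python
import subprocess
import sys


def find_trailing_whitespace(text):
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def staged_files():
    """Ask git for the paths staged for commit (added, copied, or modified)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [path for path in result.stdout.splitlines() if path]


def main():
    """Return 0 to let the commit proceed, 1 to abort it."""
    failed = False
    for path in staged_files():
        try:
            with open(path, encoding="utf-8") as handle:
                bad_lines = find_trailing_whitespace(handle.read())
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files in this sketch
        for number in bad_lines:
            print(f"{path}:{number}: trailing whitespace", file=sys.stderr)
            failed = True
    return 1 if failed else 0
```

To try it as a real hook, you would save it as .git/hooks/pre-commit with a Python shebang line, make it executable, and end the script with sys.exit(main()). Git aborts the commit when the hook exits with a non-zero status.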
3. info
This directory contains the “exclude” file, which specifies the files and sub-directories within the main directory that git should not track. The same thing is more often done via a separate file called “.gitignore”, which produces similar results. The major difference between the two approaches is that the “.gitignore” file is itself tracked by git and shared with any team members. The “exclude” file, on the other hand, is completely local to you and your copy of the project.
4. logs
This directory contains a history log for each branch that describes how the local project has interacted with the branch. This includes things like making commits, switching away from or back to a branch, and pushing (for remote branches).
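If you open one of these log files, each line typically holds the old hash, the new hash, who made the change, a Unix timestamp with a timezone offset, and then a tab followed by a short message. The sketch below splits one such line into fields. The sample line, the ReflogEntry type, and parse_reflog_line are made up for illustration.

```python
from typing import NamedTuple


class ReflogEntry(NamedTuple):
    old_hash: str
    new_hash: str
    who: str
    timestamp: int
    timezone: str
    message: str


def parse_reflog_line(line):
    """Split one log line into its fields (assumes the layout described above)."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old_hash, new_hash, rest = meta.split(" ", 2)
    who, timestamp, timezone = rest.rsplit(" ", 2)
    return ReflogEntry(old_hash, new_hash, who, int(timestamp), timezone, message)


# A made-up line: an all-zero old hash marks the first entry for a branch.
sample = (
    f"{'0' * 40} {'a' * 40} Jane Doe <jane@example.com> "
    "1700000000 +0000\tcommit (initial): First commit\n"
)
entry = parse_reflog_line(sample)
print(entry.who)      # Jane Doe <jane@example.com>
print(entry.message)  # commit (initial): First commit
```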
5. objects
This directory is arguably the heart of git. This is where git stores the content of each file, commit messages, author information, and virtually all the data required to build out your project in its current or any prior iteration.
Each file represents an object, which consists of three things: a type, a size, and content. The size is the length of the content in bytes. The type is one of four types, described shortly, and it determines how the content is interpreted.
Note: These files are zlib-compressed and cannot generally be read directly in an editor. Once decompressed, a typical example would look something like:
blob 25\x00My data fits on one line\n
In the above example, the type is blob, the size is 25 (bytes), the null byte \x00 separates the header from the content, and the content is “My data fits on one line\n”.
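To make the layout concrete, here is a small sketch that builds the example object above, compresses it with zlib, and reads it back. The function names encode_object and decode_object are hypothetical helpers for this example, not part of git.

```python
import zlib


def encode_object(obj_type, content):
    """Prefix the content with a '<type> <size>' header and a null byte."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return header + content


def decode_object(raw):
    """Split decompressed object bytes into (type, size, content)."""
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, int(size), content


raw = encode_object("blob", b"My data fits on one line\n")
stored = zlib.compress(raw)  # unreadable in an editor, like the files on disk
print(decode_object(zlib.decompress(stored)))
# ('blob', 25, b'My data fits on one line\n')
```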
There are four types of git objects:
- Blob: Despite the unusual name, this corresponds to any typical data file you might add to your project.
- Tree: This is a list of references to other trees and blobs. It is analogous to a directory.
- Commit: This contains the metadata for each change (“commit”) to the project, such as the author, committer, commit date, and log message, along with a reference to the tree that captures the project at that point.
- Tag: This assigns a human-readable name to any object above, typically a commit. This allows you to, for example, assign a term like “beta” to a final commit that completes the beta version of your project and reference it rather than the hash associated with the commit. (We will get to hashes in a moment.)
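To give a feel for what a commit object holds, the sketch below parses the decompressed text of a simple commit: a few "key value" header lines (tree, zero or more parent lines, author, committer), a blank line, then the message. The sample commit and the parse_commit helper are made up for illustration, and the sketch ignores multi-line headers such as signatures.

```python
def parse_commit(body):
    """Split a decompressed commit body into header fields and the message."""
    head, _, message = body.partition("\n\n")
    fields = {"parent": []}  # a commit can have zero, one, or several parents
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)
        else:
            fields[key] = value
    return fields, message


# A made-up root commit pointing at the well-known hash of the empty tree.
sample = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author Jane Doe <jane@example.com> 1700000000 +0000\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Initial commit\n"
)
fields, message = parse_commit(sample)
print(fields["tree"])    # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(fields["author"])  # Jane Doe <jane@example.com> 1700000000 +0000
print(message, end="")   # Initial commit
```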
You may have noticed that each file is named with a long string of seemingly random letters and numbers. This is because each object is named after the hash of its contents, which by default is computed with the Secure Hash Algorithm SHA-1 and written as 40 hexadecimal digits. Because the name is derived from the content, any change to an object changes its name. Due to the cryptographic nature of the hash, it is extremely unlikely for two different objects to end up with the same name; thus, it is easy to quickly check whether two objects are identical. Conversely, if two people create the same object (e.g., by adding a file with exactly the same content), they are guaranteed to get an identical name.
You may have also noticed that the filenames are only 38 characters long, and that the files are stored under sub-directories with two-character names. To keep any single directory from holding too many files, git uses the first two characters of each hash as the name of a sub-directory and the remaining 38 characters as the name of the file. The full hash is the name of the sub-directory followed by the name of the file itself. Due to the way hashes are created, files stored under the same sub-directory do not have any special relationship to each other, other than happening to share the start of their hash.
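Putting the naming and the directory layout together, here is a minimal sketch that computes the name of a blob and the path where it would be stored as a loose object. It assumes a repository using SHA-1; blob_id and loose_object_path are hypothetical helper names.

```python
import hashlib
from pathlib import PurePosixPath


def blob_id(content):
    """Hash the 'blob <size>' header, a null byte, and the content with SHA-1."""
    header = f"blob {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id):
    """First two hex characters name the sub-directory, the rest name the file."""
    return PurePosixPath(".git/objects") / object_id[:2] / object_id[2:]


empty = blob_id(b"")
print(empty)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Since the name depends only on the content, anyone who adds an empty file to any repository gets this same hash.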
6. refs
This directory contains text files that correspond to each branch you have in your project; they are stored under refs/heads and named after the branch. The content of each file is a hash that points to the most recent commit on that branch. Tags and remote branches are stored the same way under refs/tags and refs/remotes.
Note: If you name branches using a slash, these will be grouped into sub-directories. For example, if I created AMR/branch-1 and AMR/branch-2, those would display as two files “branch-1” and “branch-2” underneath a sub-directory “AMR”.
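The sketch below walks refs/heads and rebuilds the branch names from the file paths, including branches that contain a slash. It runs against a throwaway directory with a made-up hash rather than a real repository, and list_branches is a hypothetical helper. Note that git may also pack refs into a single .git/packed-refs file, which this sketch does not read.

```python
import tempfile
from pathlib import Path


def list_branches(git_dir):
    """Map branch name to commit hash for the ref files under refs/heads."""
    heads = Path(git_dir) / "refs" / "heads"
    return {
        ref.relative_to(heads).as_posix(): ref.read_text().strip()
        for ref in sorted(heads.rglob("*"))
        if ref.is_file()
    }


# Build a temporary directory that mimics the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "refs" / "heads" / "AMR" / "branch-1"
    ref.parent.mkdir(parents=True)
    ref.write_text("a" * 40 + "\n")  # made-up hash
    branches = list_branches(tmp)

print(list(branches))  # ['AMR/branch-1']
```

Against a real project you would call list_branches(".git") from the top-level directory.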
7. COMMIT_EDITMSG
A text file containing the message of your most recent commit.
8. config
A text file containing configuration settings specific to this repository.
Note: For global configuration settings, it is worth taking a look at your ~/.gitconfig file, which is stored outside of the .git directory. This will contain globally set configuration such as your username and email that are attached to your commits.
9. description
A text file that may contain an optional description of your project, used by certain tools that access git.
10. HEAD
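A text file describing the current head you are pointed at. It is typically of the form ref: refs/heads/<current branch>. If you check out a specific commit rather than a branch (a “detached HEAD”), it contains the hash of that commit instead.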
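As a small sketch, the function below interprets the text of the HEAD file: either a symbolic reference to a branch or a bare commit hash. The name parse_head and the returned labels are hypothetical choices for this example.

```python
def parse_head(text):
    """Classify the content of HEAD as a branch, another ref, or a bare hash."""
    text = text.strip()
    if text.startswith("ref: "):
        target = text[len("ref: "):]
        if target.startswith("refs/heads/"):
            return ("branch", target[len("refs/heads/"):])
        return ("ref", target)
    return ("detached", text)


print(parse_head("ref: refs/heads/main\n"))  # ('branch', 'main')
print(parse_head("a" * 40 + "\n")[0])        # detached
```

You could feed it the real file with parse_head(open(".git/HEAD").read()).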
11. index
A binary file that contains metadata, including a list of file names, timestamps, permissions, and hashes, for everything currently tracked in git. This is the staging area that “git add” updates.
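Although the file is binary, its first 12 bytes are easy to read: the signature DIRC, a version number, and the number of entries, each stored as a big-endian 32-bit value. The sketch below decodes just that header from a fabricated byte string; parse_index_header is a hypothetical helper, and the per-entry data that follows the header is not parsed here.

```python
import struct


def parse_index_header(data):
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a git index file")
    return version, entry_count


# Fabricated header: version 2 with 3 entries, followed by placeholder bytes.
fake_index = struct.pack(">4sII", b"DIRC", 2, 3) + b"placeholder"
print(parse_index_header(fake_index))  # (2, 3)
```

On a real project you could try parse_index_header(open(".git/index", "rb").read()) to see how many files are currently in the index.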