Exploring the internals of Git. (Part 1)

The anatomy of the .git directory.

Mar 01, 2025

Introduction

Git is so omnipresent in software development these days that we almost take it for granted. We use it to manage our code, collaborate with our team, and even to store our data.

But of late, I have been wondering how it actually manages to store my projects' history so efficiently.

This post aims to explore just that by looking at the data model behind some of Git's most important components.

Git Objects

Git represents information in the form of objects. Each object is identified by a 40-character SHA-1 hash computed from its type, size, and content, so identical content always gets the same ID.

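There are three main types of objects (a fourth type, the annotated tag object, is not covered in this post):

  1. Blob
    • Represents a file in the repository. It contains only the content of the file, not its name or permissions.
  2. Tree
    • Represents a directory in the repository. It lists the names and modes of its entries, each pointing to a blob or to another tree, and so maintains the directory structure of the project.
  3. Commit
    • Represents a commit in the repository. It points to a top-level tree and to its parent commit(s), and records the author, committer, and message. This is how the commit history of the project is stored.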
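
To make the idea of a content-derived ID concrete, here is a minimal Python sketch of how a blob's ID can be computed. Git hashes a short header (the object type, a space, the content length in bytes, and a NUL byte) followed by the raw content. The function name `git_object_id` is my own and not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to this content."""
    # Header format: "<type> <size in bytes>\0"
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The same bytes that `echo "hello world"` writes (note the trailing newline).
    print(git_object_id("blob", b"hello world\n"))
    # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The result should match what `git hash-object` prints for a file holding the same bytes. The file name plays no part in the hash, which is why two files with identical content share one blob.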

The .git/objects directory

The objects are stored in the .git/objects directory.

$ echo "hello world" > readme.md
$ # Quote the HTML so the shell does not treat < and > as redirections.
$ echo "<h1>hello world</h1>" > index.html
$ git add readme.md index.html
$ git commit -m "this is my first commit"

Take a look at the objects directory. Each object is stored zlib-compressed in a file whose path is derived from its hash: the first two characters name a subdirectory, and the remaining 38 characters name the file inside it.
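
The on-disk layout can be sketched in a few lines of Python. This is an illustration under assumptions: it only covers "loose" objects (Git can also bundle objects into packfiles, which this ignores), and the helper names `loose_object_path`, `encode_loose_object`, and `decode_loose_object` are mine.

```python
import hashlib
import zlib
from pathlib import Path
from typing import Tuple


def loose_object_path(git_dir: str, object_id: str) -> Path:
    """Map a 40-character object ID to its file under <git_dir>/objects."""
    # The first two hex characters name the subdirectory, the other 38 the file.
    return Path(git_dir) / "objects" / object_id[:2] / object_id[2:]


def encode_loose_object(obj_type: str, content: bytes) -> Tuple[str, bytes]:
    """Return (object ID, zlib-compressed bytes) for the given content."""
    store = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def decode_loose_object(compressed: bytes) -> Tuple[str, bytes]:
    """Inverse of encode_loose_object: return (type, content)."""
    store = zlib.decompress(compressed)
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


if __name__ == "__main__":
    object_id, data = encode_loose_object("blob", b"hello world\n")
    print(loose_object_path(".git", object_id))
    print(decode_loose_object(data))
```

Reading a real object works the same way in principle: pass the bytes of a file under `.git/objects` to `decode_loose_object`.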

[Figure: listing of the objects directory]

But why did Git create four objects for just two files?

Besides the two blobs, Git also created one tree for the project root and one commit. The three object types form a hierarchy: a commit points to a tree, and a tree points to the blobs it contains. A tree can also point to other trees, one per subdirectory, and this nesting is how the directory structure of the project is retained.

[Figure: hierarchy of commit, tree, and blob objects]
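
To show how a tree ties names to blobs, here is a small Python sketch that serializes and parses a tree body. It assumes the usual entry layout (mode, a space, the name, a NUL byte, then the 20 raw bytes of the SHA-1) and uses a plain sort by name, which is only adequate for a flat directory of files. The names `object_id`, `build_tree`, and `parse_tree` are hypothetical helpers, not Git APIs.

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (mode, name, hex object ID)


def object_id(obj_type: str, content: bytes) -> str:
    """Hash '<type> <size>\\0' + content, the way Git names objects."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def build_tree(entries: List[Entry]) -> bytes:
    """Serialize entries into the body of a tree object."""
    body = b""
    # Simplification: Git's real ordering treats sub-tree names specially.
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


def parse_tree(body: bytes) -> List[Entry]:
    """Recover (mode, name, hex object ID) entries from a tree body."""
    entries = []
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].decode().split(" ", 1)
        raw_id = body[nul + 1:nul + 21]  # 20 raw bytes of SHA-1
        entries.append((mode, name, raw_id.hex()))
        pos = nul + 21
    return entries


if __name__ == "__main__":
    readme = object_id("blob", b"hello world\n")
    index = object_id("blob", b"<h1>hello world</h1>\n")
    tree_body = build_tree([("100644", "readme.md", readme),
                            ("100644", "index.html", index)])
    print("tree", object_id("tree", tree_body))
    for mode, name, hex_id in parse_tree(tree_body):
        print(mode, "blob", hex_id, name)
```

A commit object would then reference the printed tree ID, which is what gives the commit, tree, blob chain.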

How to trace the objects?

  • Get the hash for current commit.
$ git rev-parse HEAD

60d07a0572f16a713cb5eea6cd74777e19eb6e0a
  • Get the tree object for the current commit.
$ git cat-file -p 60d07a0572f16a713cb5eea6cd74777e19eb6e0a

tree b63d87c9c4fbe8a5091ea55f421ba4becb841f3b
author Vignesh Iyer <[email protected]> 1739766156 -0800
committer Vignesh Iyer <[email protected]> 1739766156 -0800

this is my first commit
  • Get the list of blobs in the tree object.
$ git ls-tree b63d87c9c4fbe8a5091ea55f421ba4becb841f3b

100644 blob 1691dc627098b2bce8dd50575a2ab26347cb8a5f    index.html
100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad    readme.md
  • Get the content of the blob object.
$ git cat-file -p 1691dc627098b2bce8dd50575a2ab26347cb8a5f

<h1>hello world</h1>
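
The first hop of this trace, from commit to tree, can also be done programmatically. Below is a rough Python sketch that splits the text printed by `git cat-file -p <commit>` into headers and message. The function name `parse_commit` and the author identity in the sample are placeholders of mine, and the sketch ignores multi-line headers such as signatures.

```python
from typing import Dict, Tuple


def parse_commit(text: str) -> Tuple[Dict[str, str], str]:
    """Split `git cat-file -p <commit>` output into headers and message."""
    head, _, message = text.partition("\n\n")
    headers: Dict[str, str] = {}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        # A merge commit has several "parent" lines; this sketch keeps the last.
        headers[key] = value
    return headers, message.strip()


if __name__ == "__main__":
    sample = (
        "tree b63d87c9c4fbe8a5091ea55f421ba4becb841f3b\n"
        "author A U Thor <author@example.com> 1739766156 -0800\n"
        "committer A U Thor <author@example.com> 1739766156 -0800\n"
        "\n"
        "this is my first commit\n"
    )
    headers, message = parse_commit(sample)
    print(headers["tree"])  # the ID to hand to `git ls-tree`
    print(message)
```

Note that the first commit has no `parent` header; every later commit would carry at least one.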

Any further change to an existing file creates a new blob object once it is committed. New tree objects are then created that point to the new blobs, and a new commit object is created that points to the new top-level tree and to the previous commit as its parent. Existing objects are never modified. Files that have not changed do not get new blobs; the new tree simply references the same blob as before. This reuse is a large part of how Git stores a project's history so efficiently.
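
The effect of this reuse is easy to see with a toy, in-memory object store. The Python sketch below is not how Git is implemented; it only mimics content addressing with a dictionary keyed by object ID, leaves out commits, and handles a flat directory. The names `object_id` and `snapshot` are hypothetical.

```python
import hashlib
from typing import Dict


def object_id(obj_type: str, content: bytes) -> str:
    """Hash '<type> <size>\\0' + content, the way Git names objects."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def snapshot(store: Dict[str, bytes], files: Dict[str, bytes]) -> str:
    """Store one blob per file plus one tree; return the tree ID."""
    body = b""
    for name in sorted(files):
        blob_id = object_id("blob", files[name])
        # Identical content maps to the same key, so nothing new is stored.
        store[blob_id] = files[name]
        body += f"100644 {name}".encode() + b"\x00" + bytes.fromhex(blob_id)
    tree_id = object_id("tree", body)
    store[tree_id] = body
    return tree_id


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    snapshot(store, {"readme.md": b"hello world\n",
                     "index.html": b"<h1>hello world</h1>\n"})
    print(len(store))  # 3: two blobs and one tree
    snapshot(store, {"readme.md": b"hello world\n",
                     "index.html": b"<h1>hello again</h1>\n"})
    print(len(store))  # 5: one new blob and one new tree, readme.md is reused
```

Only the changed file and the tree that contains it cost anything in the second snapshot.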

The .git/refs directory

The refs directory contains references: human-readable names that point to objects, usually commits.

$ ls -l .git/refs

drwxr-xr-x@ - viiyer 16 Feb 20:31  heads
drwxr-xr-x@ - viiyer 16 Feb 20:21  tags

The refs directory contains two subdirectories:

  • heads
    • Contains the references to the branches.
  • tags
    • Contains the references to the tags.

Each of these subdirectories contains one file per branch or tag, named after it.

$ cat .git/refs/heads/main

60d07a0572f16a713cb5eea6cd74777e19eb6e0a

The content of the file is the SHA-1 hash of the latest commit object on that branch.

$ git rev-parse HEAD

60d07a0572f16a713cb5eea6cd74777e19eb6e0a

Each branch is represented by a file in the refs/heads directory, which acts as a pointer to the latest commit on that branch.

The current branch is recorded in the HEAD file, which normally holds a symbolic reference to a branch rather than a hash.

$ cat .git/HEAD

ref: refs/heads/main
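
Putting the two files together, resolving HEAD to a commit hash is a short loop. The Python sketch below assumes every ref exists as a loose file under the Git directory; Git can also store refs in a single `.git/packed-refs` file, which this does not handle. The function name `resolve_ref` is mine, and the demo builds a throwaway directory rather than touching a real repository.

```python
import tempfile
from pathlib import Path


def resolve_ref(git_dir: Path, name: str = "HEAD", max_depth: int = 5) -> str:
    """Follow `ref: ...` indirections until a plain hash is found."""
    for _ in range(max_depth):
        content = (git_dir / name).read_text().strip()
        if not content.startswith("ref: "):
            return content  # a plain hash, e.g. a branch tip or detached HEAD
        name = content[len("ref: "):]
    raise ValueError("too many levels of symbolic refs")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = Path(tmp)
        (git_dir / "refs" / "heads").mkdir(parents=True)
        (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
        (git_dir / "refs" / "heads" / "main").write_text(
            "60d07a0572f16a713cb5eea6cd74777e19eb6e0a\n"
        )
        print(resolve_ref(git_dir))
        # 60d07a0572f16a713cb5eea6cd74777e19eb6e0a
```

Against a real repository with loose refs, `resolve_ref(Path(".git"))` should agree with `git rev-parse HEAD`.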

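Similarly, each tag is represented by a file in the refs/tags directory. A lightweight tag file holds the hash of a commit, while an annotated tag file holds the hash of a tag object that in turn points to the commit.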

The index file

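The index file (.git/index) is a binary file that records what will go into the next commit: for every tracked path, it stores the file mode, the hash of the staged blob, and some file metadata used to detect changes. It is also known as the staging area.

When you run the git add command, Git writes the file's content to the object store as a blob and updates the index entry for that path with the new blob's hash.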
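
As a first peek into that binary file, here is a minimal Python sketch that reads only its header. As far as I understand the format, the index starts with 12 bytes: the signature `DIRC`, a version number, and the number of entries, with both integers stored big-endian. The function name `parse_index_header` is hypothetical, and the entries themselves are not parsed here.

```python
import struct
from typing import Tuple


def parse_index_header(data: bytes) -> Tuple[int, int]:
    """Return (version, entry count) from the first 12 bytes of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


if __name__ == "__main__":
    # A hand-built header standing in for a real .git/index: version 2, 2 entries.
    fake_header = b"DIRC" + struct.pack(">II", 2, 2)
    print(parse_index_header(fake_header))  # (2, 2)
```

On a real repository, passing the bytes of `.git/index` to this function should report one entry per staged path.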

Conclusion

In this part of the series, we explored objects, refs, and the index file. In the next part, I will implement a minimal version of Git in Python.
