The Packfile (Compression)
Git is famous for being incredibly fast and lightweight. But if it saves an entirely new Blob every time you edit a file, wouldn't the repository eventually explode in size?
Garbage Collection
The answer is yes, at first. When you are actively writing code, Git optimizes for speed by saving loose objects: each one a complete, standalone copy of the file (zlib-compressed individually, but never deduplicated against other versions). If you save a 1MB file 50 times, your `.git` folder grows by roughly 50MB.
But occasionally, Git automatically runs a command called `git gc` (garbage collect). This routine sweeps through all your objects, finds versions that look similar, and compresses them into a single binary file called a Packfile using Delta Compression.
Loose objects live directly under `.git/objects/`; once packed, they move into `.git/objects/pack/`.
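You can watch the migration with `git count-objects -v`, which reports loose versus packed totals. A minimal sketch; the counts and sizes below are made-up numbers, and real output includes a few more fields:

```sh
$ git count-objects -v      # before gc: everything is loose
count: 150                  # loose objects
size: 51200                 # disk used by loose objects, in KiB
in-pack: 0

$ git gc

$ git count-objects -v      # after gc: objects now live in a packfile
count: 0
in-pack: 150
packs: 1
size-pack: 1100             # the packfile is a fraction of the loose size
```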
Delta Compression
In the interactive simulation above, we have 5 versions of the exact same file. In their loose state, they total 622 KB of storage.
When you click $ git gc, Git discovers that these 5 files are 99% identical. Instead of storing 5 full copies, it keeps the most recent version intact as the Base Object. For the remaining 4 files, it stores only the mathematical difference (the "Delta") needed to rebuild each one.
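If you want to reproduce the effect outside the simulation, here is a throwaway experiment (the directory name is arbitrary and the exact sizes will vary by platform):

```sh
# Commit five near-identical versions of a large file
git init delta-demo && cd delta-demo
head -c 500000 /dev/urandom | base64 > big.txt   # ~660 KB of hard-to-compress text
for i in 1 2 3 4 5; do
  echo "edit $i" >> big.txt                      # one tiny change per version
  git add big.txt && git commit -q -m "version $i"
done

du -sh .git/objects        # five loose blobs, each close to full size
git gc
du -sh .git/objects/pack   # one Base Object plus four tiny Deltas
```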
Reverse Engineering History
Unlike typical diffing, where you start at Version 1 and apply patches forward to reach Version 5, Git stores Version 5 intact and keeps the deltas needed to walk backwards to Version 1! Why? Because you are overwhelmingly more likely to read recent code than 5-year-old code, so the most common operation, checking out the latest version, requires no delta resolution at all.
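You can verify this on any packed repository with `git verify-pack -v`, which lists every object in a pack: plain entries show hash, type, size, packed size, and offset, while deltified entries append their chain depth and the hash of their base. The `<sha-of-...>` names and sizes below are placeholders:

```sh
$ git verify-pack -v .git/objects/pack/pack-*.idx
<sha-of-v5> blob 622000 610300 12                  # newest version: stored whole (the base)
<sha-of-v4> blob 621900 240 610312 1 <sha-of-v5>   # older version: a 240-byte delta against v5
<sha-of-v3> blob 621800 238 610552 2 <sha-of-v4>   # deltas chain backwards through history
```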
Loose vs Packed
If you look inside the `.git/objects` folder of a repository you've been committing to locally, you will see two-character hash folders containing loose objects. If you clone a repository from GitHub, you'll likely see almost none of those, because GitHub sends repositories over the network pre-packed into the `pack/` directory to save bandwidth.
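A quick way to see the contrast for yourself (the clone URL and the hashes are hypothetical):

```sh
# A repository you've been committing to locally:
$ find .git/objects -type f | head -2
.git/objects/4f/2a8e...    # loose objects, filed under two-character folders
.git/objects/9c/41d0...

# A fresh clone:
$ git clone https://github.com/example/repo.git && cd repo
$ find .git/objects -type f
.git/objects/pack/pack-1a2b3c....pack   # the whole history arrives pre-packed
.git/objects/pack/pack-1a2b3c....idx    # plus an index for fast lookups
```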