Chapter 8 — Git Internals (Objects, SHA, Storage Model)
8.1 Introduction
Understanding Git internals transforms Git from a command-based tool into a deterministic data model. Git is fundamentally a content-addressable filesystem combined with a versioned directed acyclic graph (DAG).
This chapter explores:
- Git object model
- SHA-based identity system
- Object storage structure
- Commit graph representation
- Packfiles and compression
- Plumbing commands
These concepts clarify why Git operations are fast, reliable, and cryptographically verifiable.
8.2 Git as a Content-Addressable Store
Git does not store files by name or location. Instead, it stores content objects indexed by cryptographic hashes.
Key properties:
- Immutable storage
- Deduplication
- Integrity verification
- Referential graph
If the content changes, the hash changes, so modified content is stored as a new object rather than overwriting the old one.
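The properties above can be illustrated with a toy store. This is a minimal sketch, not Git's implementation: the `ContentStore` class and its methods are hypothetical names, and it hashes the raw bytes directly, whereas Git hashes a short header plus the content.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is derived from the value."""

    def __init__(self):
        self._objects = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Deduplication: identical bytes always map to the same key.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Integrity check: stored bytes must still hash to their key.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError(f"object {key} is corrupt")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
k1 = store.put(b"hello\n")
k2 = store.put(b"hello\n")   # same content, same key
k3 = store.put(b"hello!\n")  # changed content, new key
print(k1 == k2, k1 == k3, len(store))  # prints: True False 2
```

Nothing is ever updated in place here: changed content simply lands under a different key, which is the sense in which the storage is immutable.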
8.3 SHA Hashing in Git
Git has historically used SHA-1 hashes; newer versions can also create repositories that use SHA-256.
SHA characteristics
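- Fixed-length digest (40 hexadecimal characters for SHA-1)
- Collision-resistant in practice (SHA-1 collisions have been demonstrated, and Git uses a hardened implementation that detects the known attacks)
- Content-derived identity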
Example hash:
e83c5163316f89bfbde7d9ab23ca2e25604af290
This hash uniquely identifies a Git object.
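To make the content-derived identity concrete, here is a short sketch of how a blob's ID is computed in a SHA-1 repository: Git hashes the header `blob <size>` followed by a NUL byte and then the content. The function name `git_blob_hash` is an illustrative choice, not a Git API.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID of a blob holding `content`."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_hash(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

The same values should come back from `git hash-object --stdin` when it is fed the same bytes.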
8.4 Git Object Types
Git uses four primary object types:
| Object | Purpose |
|---|---|
| Blob | File content |
| Tree | Directory structure |
| Commit | Snapshot metadata |
| Tag | Annotated reference |
These objects form the Git storage backbone.
8.5 Blob Objects
Blob = Binary Large Object.
Stores:
- Raw file content
- No filename
- No metadata
Creation example:
echo "Hello Git" | git hash-object -w --stdin
Git:
- Computes SHA
- Compresses content
- Stores in object database
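The three steps can be sketched in Python. This is an approximation for a SHA-1 repository, written against a throwaway temporary directory rather than a real `.git` folder; `write_blob` is a hypothetical helper name.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_blob(objects_dir: Path, content: bytes) -> str:
    # 1. Compute the SHA over "blob <size>\0" + content.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()
    # 2. Compress header + content with zlib.
    compressed = zlib.compress(store)
    # 3. Store it under <first 2 hex chars>/<remaining chars>.
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # identical content is never written twice
        path.write_bytes(compressed)
    return oid


with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"  # stand-in for .git/objects
    oid = write_blob(objects_dir, b"Hello Git\n")
    stored = (objects_dir / oid[:2] / oid[2:]).read_bytes()

print(oid)
print(zlib.decompress(stored))
```

Note that the filename never enters the calculation, which is why two files with identical content share one blob.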
8.6 Tree Objects
Tree objects represent directories.
Contain:
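- Blob references
- Tree references
- Filenames
- File modes (a small fixed set such as regular file, executable, symlink, and directory, not full Unix permissions)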
Conceptually similar to a filesystem directory listing.
Inspect:
git cat-file -p tree-hash
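As a rough illustration of what a tree object contains, the sketch below serializes entries the way a SHA-1 repository does: each entry is `<mode> <name>`, a NUL byte, then the 20 raw bytes of the referenced object's ID. The function name is hypothetical, and the sort is simplified: real Git orders directory entries as if their names ended in `/`, which this sketch ignores.

```python
import hashlib

EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # ID of a zero-byte blob


def tree_object_id(entries) -> str:
    """entries: iterable of (mode, name, hex_object_id) tuples."""
    body = b""
    # Simplified ordering by raw name bytes (see the caveat above).
    for mode, name, hex_oid in sorted(entries, key=lambda e: e[1].encode("utf-8")):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_oid)  # binary, not hex, inside the tree
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


print(tree_object_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)

t1 = tree_object_id([("100644", "a.txt", EMPTY_BLOB)])
t2 = tree_object_id([("100644", "b.txt", EMPTY_BLOB)])
print(t1 != t2)  # True: renaming changes the tree, not the blob
```

This shows where filenames live: a rename produces a new tree object while the blob it points to is reused unchanged.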
8.7 Commit Objects
Commit objects define repository snapshots.
Contain:
- Root tree pointer
- Parent commit(s)
- Author/committer metadata
- Timestamp
- Commit message
The parent pointers link commits into a graph, and that graph is the project history.
Inspect:
git cat-file -p commit-hash
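A commit object is plain text, so its ID can be reproduced with a few lines of Python. The sketch below assumes a SHA-1 repository and an unsigned commit with no extra headers; the function name and the identity string are made up for illustration.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_object_id(tree, parents, author, committer, message) -> str:
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical identity: "Name <email> <unix-timestamp> <timezone>".
who = "A U Thor <author@example.com> 1700000000 +0000"

root = commit_object_id(EMPTY_TREE, [], who, who, "initial\n")
child = commit_object_id(EMPTY_TREE, [root], who, who, "initial\n")
print(root)
print(child)
```

Because the parent ID is part of the hashed text, `child` differs from `root` even though tree, author, and message are identical. Changing any ancestor therefore changes every descendant's ID.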
8.8 Tag Objects
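Tags provide named references to commits. Only annotated tags create a tag object; a lightweight tag is just a ref.
Types:
Lightweight
- Simple pointer
Annotated
- Metadata (tagger and date)
- Signature (optional)
- Message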
Create annotated tag:
git tag -a v1.0 -m "release"
8.9 Object Storage Layout
Git stores objects inside:
.git/objects/
Structure:
.git/objects/ab/cdef123...
Directory name = first 2 hash characters
Filename = remaining characters
This improves filesystem scalability.
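The path rule is simple enough to write down directly. A minimal sketch, with `loose_object_path` as an illustrative helper name:

```python
from pathlib import PurePosixPath


def loose_object_path(oid: str) -> PurePosixPath:
    """Map an object ID to its loose-object path under .git/objects/."""
    return PurePosixPath(".git/objects") / oid[:2] / oid[2:]


print(loose_object_path("e83c5163316f89bfbde7d9ab23ca2e25604af290"))
# .git/objects/e8/3c5163316f89bfbde7d9ab23ca2e25604af290
```

Two hexadecimal characters give at most 256 subdirectories, so no single directory has to hold every loose object.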
8.10 Object Compression
Objects are stored using zlib compression.
Benefits:
- Reduced disk usage
- Efficient cloning
- Network optimization
Git transparently decompresses during retrieval.
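The read side can be sketched as follows: inflate the stored bytes, then split the header from the body. `parse_loose_object` is a hypothetical helper, and the input is built in memory instead of being read from a repository.

```python
import zlib


def parse_loose_object(raw: bytes):
    """Decompress a loose object and return (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\0")
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind.decode("ascii"), body


raw = zlib.compress(b"blob 13\0test content\n")
print(parse_loose_object(raw))  # ('blob', b'test content\n')
```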
8.11 The Commit DAG
Git history forms a Directed Acyclic Graph.
Properties:
- Directed parent relationships
- No cycles
- Multiple parents allowed (merge commits)
Example:
A → B → C
     ↘
      D → E
A merge produces a commit with more than one parent.
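A small in-memory model can make the graph properties tangible. The commit names and topology below are hypothetical (a branch off `B` that is merged back in `M`), and each commit maps to its list of parents, which is the direction Git itself stores.

```python
# child -> list of parents (hypothetical history)
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "M": ["C", "D"],  # merge commit: two parents
}


def is_ancestor(parents, ancestor, commit) -> bool:
    """True if `ancestor` can be reached from `commit` via parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


merges = [c for c, ps in parents.items() if len(ps) > 1]
print(merges)  # ['M']
print(is_ancestor(parents, "A", "M"), is_ancestor(parents, "C", "D"))  # True False
```

`C` and `D` sit on different branches, so neither is an ancestor of the other until `M` joins them.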
8.12 HEAD, Refs, and Pointers
Git uses reference files:
.git/refs/heads/main
These store commit hashes.
Key references:
- HEAD → current branch
- Branch refs → commit pointers
- Tag refs → tagged commits
Inspect HEAD:
cat .git/HEAD
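The lookup that `HEAD` implies can be sketched against a fake repository directory. This is a simplification: `resolve_head` is a hypothetical helper, the directory is created in a temp folder, and the sketch only reads loose ref files, ignoring the `packed-refs` file that real repositories may use.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path):
    """Return (branch_ref_or_None, commit_id_or_None) for HEAD."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # e.g. refs/heads/main
        ref_file = git_dir / ref
        oid = ref_file.read_text().strip() if ref_file.exists() else None
        return ref, oid
    return None, head  # HEAD holds a commit hash directly


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)  # stand-in for .git
    (git_dir / "refs" / "heads").mkdir(parents=True)
    oid = "e83c5163316f89bfbde7d9ab23ca2e25604af290"
    (git_dir / "refs" / "heads" / "main").write_text(oid + "\n")

    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    attached = resolve_head(git_dir)

    (git_dir / "HEAD").write_text(oid + "\n")
    detached = resolve_head(git_dir)

print(attached)  # ('refs/heads/main', 'e83c5163316f89bfbde7d9ab23ca2e25604af290')
print(detached)  # (None, 'e83c5163316f89bfbde7d9ab23ca2e25604af290')
```

The two cases differ only in what the `HEAD` file contains: a symbolic `ref:` line or a raw commit hash.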
8.13 Detached HEAD Internals
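When HEAD references a commit directly instead of a branch:
HEAD → commit-hash
New commits advance HEAD itself, but no branch pointer moves.
After you switch away, those commits are no longer reachable from any branch and may eventually be garbage-collected unless you create a branch or tag for them.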
8.14 Index (Staging Area) Internals
The index is an intermediate snapshot.
Stored in:
.git/index
Functions:
- Tracks staged content
- Enables partial commits
- Accelerates status comparisons
Conceptually:
Working → Index → Commit
8.15 Packfiles
Storing every object as a separate loose file is inefficient at scale.
Git therefore consolidates many objects into packfiles.
Location:
.git/objects/pack/
Advantages:
- Delta compression
- Network transfer optimization
- Storage efficiency
Create packfile:
git gc
8.16 Delta Compression
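Inside packfiles, Git can store an object as a delta against a similar base object rather than as a full copy. Loose objects are always stored whole.
Example:
- Version 1 → base object
- Version 2 → delta
This reduces redundancy significantly.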
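The idea of a delta can be shown with two kinds of instruction: copy a byte range from the base, or insert new literal bytes. The sketch below uses Python's `difflib` to find the shared ranges. It is not Git's binary delta encoding or its matching algorithm, and `make_delta` and `apply_delta` are hypothetical names.

```python
import difflib


def make_delta(base: bytes, target: bytes):
    """Describe `target` as copy/insert operations relative to `base`."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # offset and length in base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # literal new bytes
        # "delete": bytes of base that target drops; nothing to emit
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"line one\nline two\nline three\n"
v2 = b"line one\nline 2\nline three\nline four\n"
delta = make_delta(v1, v2)
print(delta)
print(apply_delta(v1, delta) == v2)  # True
```

Only the inserted bytes need to be stored alongside the copy instructions, which is where the space saving comes from when two versions are mostly identical.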
8.17 Garbage Collection
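Git periodically cleans up the repository, packing loose objects and pruning unreachable objects that are older than a grace period (two weeks by default).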
Run manually:
git gc
Operations:
- Pack loose objects
- Prune unreachable objects
- Optimize the repository
8.18 Reachability Concept
Objects are retained if reachable from:
- Branches
- Tags
- Reflog
Unreachable objects become garbage.
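Reachability amounts to a mark phase over the commit graph. A minimal sketch with hypothetical commits and a single branch ref, ignoring trees, blobs, and reflog entries for brevity:

```python
def reachable(parents, tips):
    """Return every commit reachable from the given starting points."""
    seen = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(parents[oid])
    return seen


# Hypothetical history: X was committed on a branch that has since been deleted.
parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["B"]}
refs = {"refs/heads/main": "C"}

live = reachable(parents, refs.values())
garbage = set(parents) - live
print(sorted(live), sorted(garbage))  # ['A', 'B', 'C'] ['X']
```

Adding any ref that points at `X` would move it back into the live set, which is why creating a branch is enough to rescue an orphaned commit.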
8.19 Reflog Internals
Reflog tracks reference movement.
Location:
.git/logs/
Allows recovery of:
- Deleted commits
- Branch resets
- Detached states
View reflog:
git reflog
8.20 Plumbing vs Porcelain
Porcelain commands
- User-friendly
- Examples: commit, push
Plumbing commands
- Low-level internals
- Examples: hash-object, cat-file
Plumbing commands are used for scripting and debugging.
8.21 Writing Objects Manually
Create blob:
git hash-object -w file.txt
Read object:
git cat-file -p hash
List tree:
git ls-tree hash
These expose Git’s storage layer.
8.22 Data Integrity Guarantees
Git ensures:
- Tamper detection
- Snapshot immutability
- Historical traceability
If an object's bytes change, its recomputed hash no longer matches its name, and the corruption is detected.
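The check itself is a re-hash and compare. A short sketch for SHA-1 loose objects, using in-memory bytes instead of a real repository; `verify_object` is a hypothetical helper name.

```python
import hashlib
import zlib


def verify_object(oid: str, raw: bytes) -> bool:
    """Re-hash a stored loose object and compare with the ID it is filed under."""
    try:
        data = zlib.decompress(raw)
    except zlib.error:
        return False  # not even valid zlib data
    return hashlib.sha1(data).hexdigest() == oid


oid = "d670460b4b4aece5915caf5c68d12f560a9fe3e4"
good = zlib.compress(b"blob 13\0test content\n")
tampered = zlib.compress(b"blob 13\0Test content\n")  # one byte changed

print(verify_object(oid, good), verify_object(oid, tampered))  # True False
```

`git fsck` performs this kind of verification across the whole object database.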
8.23 Performance Implications
Git internals enable:
- Fast branching
- Cheap cloning
- Efficient merging
- Scalable history
These operations are cheap because Git mostly moves pointers (refs) rather than copying files.
8.24 Practical Mental Model
Think of Git as:
Key-value database + DAG + working directory
Where:
- Key = SHA
- Value = object content
8.25 Summary
This chapter examined:
- Content-addressable storage model
- SHA identity mechanism
- Blob, tree, commit, and tag objects
- Object storage structure
- Commit DAG
- Index architecture
- Packfiles and delta compression
- Reachability and garbage collection
- Plumbing command ecosystem
Understanding Git internals provides conceptual clarity for advanced workflows, debugging, repository recovery, and performance optimization.