Understand Git the right way. Learn repositories, commits, branches, merges, tags, and the DAG with clear mental models and real examples.
Article 2 of 32 - Part 1: Foundations
git reflog, found the commit hash, and typed git branch feature-auth a3f7c2d. The branch reappeared instantly, with every commit intact. The entire recovery took eleven seconds.
Understanding that branches are just pointers meant recovery was one command away. The concepts in this article are not abstract theory; they are the difference between panic and an eleven-second fix.
Before you type a single Git command, you need a mental model of the concepts Git is built on. A repository is not just a folder. A commit is not just a save. A branch is not just a copy. And a merge is not just pasting code together. This article builds the conceptual foundation that will make every Git command you learn from here on click into place.
A repository (or "repo") is the complete history of your project, stored as a database of every change ever made. It is not just the current files. It is every version of every file, every commit message, every branch, and every tag, all packed into a hidden .git directory at the root of your project.
When you run git init, Git creates a .git folder. This single directory contains everything Git needs: the object database (blobs, trees, commits), references (branches, tags), configuration, and hooks. Your working directory is just the latest checkout. The repository lives inside .git.
$ mkdir my-project && cd my-project
$ git init
Initialized empty Git repository in /home/user/my-project/.git/
$ ls -la .git/
-rw-r--r--  HEAD       # Points to the current branch
-rw-r--r--  config     # Repository-level configuration
drwxr-xr-x  objects/   # The object database (commits, trees, blobs)
drwxr-xr-x  refs/      # Branch and tag pointers
drwxr-xr-x  hooks/     # Scripts triggered by Git events
Every Git repository is a full copy of the entire history. When you clone a repository from GitHub, you get everything: every commit, every branch, every tag. Your local repository is not a thin client pointing at a server. It is a complete, independent copy. This is what makes Git a distributed version control system.
| Concept | What It Means |
|---|---|
| Local repository | The full Git database on your machine, inside .git |
| Remote repository | A copy hosted elsewhere (GitHub, GitLab, a server) that you sync with |
| Working directory | The actual files you see and edit on disk |
| Staging area (index) | A buffer between your working directory and the next commit |
The repository is the .git directory. Your files are the working directory. They are related but separate concepts. Deleting .git destroys the repository but leaves your files. The files without .git are just a folder, not a repo.
This is the most important mental shift for understanding Git. Most version control systems store changes as diffs: the difference between one version and the next. Git does not do this. Git stores snapshots: a complete picture of every file at the moment you commit.
When you make a commit, Git takes a snapshot of every tracked file in your project. If a file has not changed since the last commit, Git does not store a duplicate. Instead, it stores a pointer to the previous identical file. This makes Git extremely efficient while maintaining the conceptual simplicity of "every commit is a complete snapshot."
# Delta-based systems (SVN, CVS):
# Commit 1: [Full file]
# Commit 2: [Diff from commit 1]
# Commit 3: [Diff from commit 2]
# To reconstruct commit 3, apply all diffs sequentially.
#
# Snapshot-based system (Git):
# Commit 1: [Snapshot: fileA-v1, fileB-v1, fileC-v1]
# Commit 2: [Snapshot: fileA-v2, fileB-v1, fileC-v1]  (fileB, fileC are pointers)
# Commit 3: [Snapshot: fileA-v2, fileB-v2, fileC-v1]  (fileA, fileC are pointers)
# To reconstruct commit 3, just read the snapshot. No replay needed.
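The snapshot-with-pointers idea can be sketched in a few lines of Python. This is a toy model, not Git's implementation: `object_store` and `snapshot` are hypothetical names, and only the blob naming rule (SHA-1 over a `blob <size>\0` header plus the file bytes) mirrors what Git actually does.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Name file content the way Git names a blob: SHA-1 over a header plus the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

object_store = {}  # hypothetical stand-in for .git/objects: blob id -> content

def snapshot(files: dict) -> dict:
    """Store every file and return a filename -> blob id mapping (a toy tree)."""
    tree = {}
    for name, content in files.items():
        oid = blob_id(content)
        object_store[oid] = content  # identical content always lands on the same key
        tree[name] = oid
    return tree

commit1 = snapshot({"fileA": b"A v1\n", "fileB": b"B v1\n", "fileC": b"C v1\n"})
commit2 = snapshot({"fileA": b"A v2\n", "fileB": b"B v1\n", "fileC": b"C v1\n"})

print(len(object_store))                     # 4 objects stored, not 6
print(commit1["fileB"] == commit2["fileB"])  # True: the unchanged file is shared
```

Both snapshots list all three files, yet only the changed file adds a new object. That is the sense in which an unchanged file is "a pointer".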
Every commit in Git contains four pieces of information: a pointer to the snapshot (a tree object), the author and committer metadata, the commit message, and pointers to parent commits. The very first commit has no parent. Every subsequent commit points back to the commit that came before it, forming a chain.
$ git cat-file -p HEAD
tree 4b825dc642cb6eb9a060e54bf899d15a4f3f7e2a
parent 8a3b5e7f1c2d4e6a8b0c1d2e3f4a5b6c7d8e9f0a
author Jane Developer <[email protected]> 1710864000 -0500
committer Jane Developer <[email protected]> 1710864000 -0500

Add user authentication module

# tree      -> Points to the snapshot of all files
# parent    -> Points to the previous commit (the chain)
# author    -> Who wrote the change
# committer -> Who applied the change (can differ in rebases)
# message   -> Human-readable description of why
Every commit is identified by a 40-character SHA-1 hash, computed from the commit's content (tree, parents, author, committer, and message). This means the commit ID is a fingerprint of its entire content. If any part of a commit changes, its hash changes. This gives Git built-in integrity checking: if a commit's hash matches, its content is, for all practical purposes, exactly what was stored.
Because commits point to their parents by hash, changing an old commit does not just change that commit. It changes every commit that came after it, because their parent hashes no longer match. This is why "rewriting history" in Git is a deliberate, explicit operation.
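A small Python sketch can make the ripple effect concrete. It assumes a simplified commit format; real Git hashes a specific text layout that also includes author and committer lines, and `commit_id` and `build_chain` are hypothetical helpers.

```python
import hashlib

def commit_id(tree: str, parent: str, message: str) -> str:
    """Toy commit hash: SHA-1 over the fields that make up the commit."""
    body = f"tree {tree}\nparent {parent}\n\n{message}\n".encode()
    return hashlib.sha1(body).hexdigest()

def build_chain(messages):
    """Create a linear history in which each commit embeds its parent's id."""
    ids, parent = [], ""
    for i, msg in enumerate(messages):
        parent = commit_id(f"tree-{i}", parent, msg)
        ids.append(parent)
    return ids

original = build_chain(["Initial commit", "Add login", "Fix typo"])
rewritten = build_chain(["Initial commit (reworded)", "Add login", "Fix typo"])

# Only the first message was edited, yet every id in the chain differs.
print([a == b for a, b in zip(original, rewritten)])  # [False, False, False]
```

The second and third commits have identical trees and messages in both chains. Their ids differ only because the parent id they embed has changed.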
A commit is a snapshot of your entire project at a point in time, identified by a unique SHA-1 hash. Conceptually, Git does not store diffs. It stores snapshots with pointers to unchanged files. (Packfiles may delta-compress objects on disk, but that is a storage optimization, not the model.) This is why Git is fast: reconstructing any version requires reading one snapshot, not replaying a chain of diffs.
A branch in Git is not a copy of your code. It is a lightweight, movable pointer to a specific commit. That is it. A branch is a small file containing a 40-character commit hash (plus a newline). Creating a branch does not duplicate any files. It just creates a new pointer.
When you create a branch, Git creates a new reference that points to the current commit. When you make a new commit on that branch, the pointer moves forward to the new commit. The other branches stay where they are. This is what creates the illusion of parallel timelines: different branches point to different commits in the same history graph.
# Create a new branch: this just creates a pointer
$ git branch feature-login
$ cat .git/refs/heads/feature-login
a1b2c3d4e5f6...   # Same commit hash as main
# Switch to the branch
$ git checkout feature-login
# Make a commit: the feature-login pointer moves forward
$ echo "login code" > login.py
$ git add login.py && git commit -m "Add login module"
# Now feature-login points to the new commit
# main still points to the old commit
$ cat .git/refs/heads/main
a1b2c3d4e5f6...   # Has not moved
$ cat .git/refs/heads/feature-login
b2c3d4e5f6a7...   # Moved forward to the new commit
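The same pointer movement can be modeled without Git at all. The following Python sketch is an illustration only: the dictionaries stand in for the commit graph and for .git/refs/heads/, and the short ids are made up.

```python
# Toy model: a branch is only a name mapped to a commit id.
commits = {"a1b2c3d": None}   # commit id -> parent id (hypothetical short ids)
refs = {"main": "a1b2c3d"}    # plays the role of .git/refs/heads/
head = "main"                 # plays the role of .git/HEAD

def create_branch(name):
    """Creating a branch copies one pointer; no commits or files are duplicated."""
    refs[name] = refs[head]

def commit(new_id):
    """Record the current tip as parent, then move only the current branch."""
    commits[new_id] = refs[head]
    refs[head] = new_id

create_branch("feature-login")
head = "feature-login"        # like `git switch feature-login`
commit("b2c3d4e")

print(refs["main"])           # a1b2c3d
print(refs["feature-login"])  # b2c3d4e
```

Creating the branch added one dictionary entry, and committing touched only the entry that `head` names.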
HEAD is a special reference that tells Git which branch you are currently on. When you switch branches with git checkout or git switch, you are moving HEAD to point to a different branch. HEAD usually points to a branch name, which in turn points to a commit.
$ cat .git/HEAD
ref: refs/heads/main            # HEAD points to main
$ git checkout feature-login
$ cat .git/HEAD
ref: refs/heads/feature-login   # HEAD now points to feature-login
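Resolving HEAD is a two-step lookup: HEAD names a branch, and the branch names a commit. A minimal Python sketch, assuming the refs are available as a dictionary (`resolve_head` is a hypothetical helper, and the short ids are invented):

```python
def resolve_head(head_text, refs):
    """Return (branch, commit_id) for the text of a HEAD file.

    branch is None when HEAD is detached, meaning the file holds a commit id directly.
    """
    head_text = head_text.strip()
    prefix = "ref: "
    if head_text.startswith(prefix):
        ref_name = head_text[len(prefix):]       # e.g. refs/heads/main
        branch = ref_name[len("refs/heads/"):]   # e.g. main
        return branch, refs[ref_name]
    return None, head_text

refs = {
    "refs/heads/main": "a1b2c3d",
    "refs/heads/feature-login": "b2c3d4e",
}

print(resolve_head("ref: refs/heads/main\n", refs))  # ('main', 'a1b2c3d')
print(resolve_head("b2c3d4e\n", refs))               # (None, 'b2c3d4e')
```

The second call shows the detached case, where no branch is available to move forward when a new commit is made.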
Because branches are just pointers, they are instant to create and cost almost nothing. This changes how you work. In older systems where branching meant copying the entire codebase, developers avoided branches. In Git, you can create a branch for every bug fix, every feature, every experiment. If it does not work out, you delete the pointer. The commits get garbage-collected eventually. No harm done.
Grab a piece of paper or open a drawing tool. Build the following DAG from scratch:
1. Create four commits A → B → C → D on the main branch.
2. Create branch feature-1 from commit B. Add 2 commits (E → F) to it.
3. Create branch feature-2 from commit C. Add 1 commit (G) to it.
4. Merge feature-1 into main at commit D, creating merge commit M1.
5. Merge feature-2 into main at M1, creating merge commit M2. Label where each branch pointer ends up.
A branch is not a folder. It is not a copy. It is not a separate workspace. It is a small file containing a 40-character commit hash (plus a newline). Understanding this makes Git's branching model intuitive rather than mysterious.
Branches are movable pointers to commits. Creating a branch is instant and free. HEAD tells Git which branch you are on. This lightweight branching model is what makes Git's workflow so flexible.
If branches are parallel timelines, merging is the act of bringing those timelines back together. When you merge branch B into branch A, Git combines the changes from both branches into a single result.
If the target branch has not diverged (no new commits since the branch was created), Git simply moves the pointer forward. No new commit is created. This is called a fast-forward merge because Git just "fast-forwards" the pointer along the existing commit chain.
# Before merge:
# main:          A --- B
# feature-login: A --- B --- C --- D
$ git checkout main
$ git merge feature-login
# After merge (fast-forward):
# main:          A --- B --- C --- D   (pointer moved forward)
# No new commit created. main just caught up.
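Whether a fast-forward is possible comes down to an ancestry check. The Python sketch below models the history above as a dictionary of parent pointers; `is_ancestor` and `can_fast_forward` are hypothetical names, not Git internals.

```python
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}  # commit -> parent commits

def is_ancestor(ancestor, descendant):
    """Walk parent pointers from descendant; True if ancestor is reached."""
    stack, seen = [descendant], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def can_fast_forward(target_tip, source_tip):
    """The target can fast-forward when its tip already lies in the source's history."""
    return is_ancestor(target_tip, source_tip)

print(can_fast_forward("B", "D"))  # True: main (B) can simply move to D
print(can_fast_forward("D", "B"))  # False: B does not contain D
```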
If both branches have new commits since they diverged, Git cannot fast-forward. Instead, it performs a three-way merge: it finds the common ancestor (the merge base), compares both branches against it, and creates a new merge commit that has two parents.
# Before merge:
# main:    A --- B --- E --- F
# feature: A --- B --- C --- D
$ git checkout main
$ git merge feature
# After merge:
# main: A --- B --- E --- F --- M   (merge commit)
#              \               /
#               C --- D ------
# M has two parents: F and D
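Finding the merge base can be sketched as a set intersection over the two histories. This is a simplified Python model of the graph above; real Git handles cases with more than one candidate base (criss-cross merges), which this sketch does not attempt.

```python
parents = {
    "A": [], "B": ["A"],
    "C": ["B"], "D": ["C"],   # feature
    "E": ["B"], "F": ["E"],   # main
}

def ancestors(tip):
    """All commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen

def merge_base(left, right):
    """A shared ancestor that is not itself an ancestor of another shared ancestor."""
    common = ancestors(left) & ancestors(right)
    best = [c for c in common
            if not any(c != other and c in ancestors(other) for other in common)]
    return best[0] if best else None

print(merge_base("F", "D"))   # B

# The merge commit records both tips as parents.
parents["M"] = ["F", "D"]
print(sorted(ancestors("M")))  # ['A', 'B', 'C', 'D', 'E', 'F', 'M']
```

With B as the base, the three-way merge compares B to F and B to D, then combines the two sets of changes into M.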
When Git performs a three-way merge and both branches have changed the same lines in the same file, Git cannot automatically decide which version to keep. This is a merge conflict. Git pauses the merge, marks the conflicting sections in the file, and asks you to resolve them manually.
# Git marks conflicts in the file:
<<<<<<< HEAD
def authenticate(user):
return check_password(user)
=======
def authenticate(user):
return verify_credentials(user)
>>>>>>> feature-login
# Above ======= is YOUR branch (HEAD)
# Below ======= is the INCOMING branch
# You manually choose the correct version, remove the markers, then:
$ git add auth.py
$ git commit # Completes the merge
Merge conflicts are not errors. They are Git telling you it needs a human decision. Conflicts mean two people changed the same thing and Git respects both contributions enough not to pick a winner silently.
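The decision Git makes for each changed region can be written as a small rule. The Python sketch below is a simplification under stated assumptions: it treats one region as a single string, whereas Git works on hunks of lines, and `three_way` and `MergeConflict` are hypothetical names.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same content in different ways."""

def three_way(base, ours, theirs):
    """Pick the merged value for one region, given the common ancestor's version."""
    if ours == theirs:
        return ours        # both sides agree (or neither changed it)
    if ours == base:
        return theirs      # only their side changed it
    if theirs == base:
        return ours        # only our side changed it
    raise MergeConflict(f"ours={ours!r} theirs={theirs!r}")

base = "return check_password(user)"

# Only the incoming branch changed the line, so its version wins without asking.
print(three_way(base, base, "return verify_credentials(user)"))

# Both branches changed the same line differently, so a human has to decide.
try:
    three_way(base, "return check_hash(user)", "return verify_credentials(user)")
except MergeConflict as conflict:
    print("CONFLICT:", conflict)
```

Without the common ancestor, the first case would be indistinguishable from a conflict, which is why the merge needs three inputs rather than two.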
Merging combines divergent branches. Fast-forward merges just move a pointer. Three-way merges create a new commit with two parents. Conflicts happen when the same lines are changed on both sides and require manual resolution.
A tag is a reference that points to a specific commit and never moves. While branches are movable pointers that advance with each new commit, tags are permanent markers. They are used to label important points in history, most commonly release versions.
Git supports two types of tags. A lightweight tag is just a pointer to a commit, like a branch that never moves. An annotated tag is a full Git object with its own metadata: tagger name, date, and a message. Annotated tags are recommended for releases because they carry additional context.
# Lightweight tag: just a pointer
$ git tag v1.0.0
# Annotated tag: a full object with metadata
# (an alternative to the command above; a tag name can only be created once)
$ git tag -a v1.0.0 -m "Release version 1.0.0 - initial stable release"
# View tag details
$ git show v1.0.0
tag v1.0.0
Tagger: Jane Developer <[email protected]>
Date: Mon Mar 18 2026 10:30:00 -0500

Release version 1.0.0 - initial stable release

commit a1b2c3d4e5f6...
| Feature | Branch | Tag |
|---|---|---|
| Moves with new commits? | Yes, advances automatically | No, stays fixed forever |
| Purpose | Active line of development | Mark a specific point (release, milestone) |
| Typical naming | main, feature-login | v1.0.0, v2.3.1 |
| Can be checked out? | Yes, with HEAD attached | Yes, but results in detached HEAD |
Tags are immovable pointers to specific commits. Use annotated tags for releases. Think of them as bookmarks in a book: they mark a page but do not change as you keep reading.
Git's commits form a Directed Acyclic Graph (DAG), and every other structure (branches, merges, tags) is defined in terms of it. Understanding the DAG is the key to understanding every Git operation.
Directed means each edge has a direction: commits point to their parents (backward in time). Acyclic means there are no loops: you can never follow parent pointers and end up back where you started. Graph means it is a collection of nodes (commits) connected by edges (parent pointers).
# A simple DAG with branches and a merge:
#
#                        main
#                          |
#                          v
# A --- B --- E --- F --- M   (merge commit)
#        \               /
#         C --- D ------
#               ^
#               |
#            feature
#
# Arrows point from child to parent (backward in time)
# M has two parents: F and D
# Branches (main, feature) are just labels on specific nodes
$ git log --oneline --graph --all
*   f3a2b1c (HEAD -> main) Merge branch 'feature'
|\
| * d4e5f6a (feature) Add search functionality
| * c3d4e5f Implement basic query parser
|/
* e1f2a3b Update configuration
* b0c1d2e Initial project setup
* a9b0c1d First commit
The DAG is not just a visualization convenience; it is the fundamental data structure that makes Git work. Every piece of Git's power derives from the properties of a directed acyclic graph.
Reachability determines everything. When Git asks "is commit X an ancestor of commit Y?", it walks the DAG from Y following parent pointers. If it reaches X, the answer is yes. This single operation powers fast-forward detection, merge-base computation, and garbage collection (unreachable commits get cleaned up).
The acyclic property guarantees termination. Because no cycles exist, every walk through the DAG is guaranteed to terminate. You can never get stuck in an infinite loop following parent pointers. This is why git log always finishes, and why topological sorting (showing commits in dependency order) is always possible.
Merge commits encode topology. A merge commit with two parents is not just "a commit that combined code." It is a structural node in the graph that records which two histories were joined. This is why git log --first-parent can show you just the mainline history, and why git log --ancestry-path can find all commits between two points in the graph.
Branches and tags are external labels. The DAG itself contains only commits and parent pointers. Branches and tags are separate references stored outside the graph that point into it. This separation is what makes branching and tagging O(1) operations: you are manipulating labels, not the graph itself.
Rebasing copies subgraphs. When you rebase, Git copies a subgraph of the DAG to a new location. The original nodes remain (until garbage-collected). The new nodes have new hashes because their parent pointers changed. Understanding this graph-copying operation is the key to understanding why rebase rewrites history.
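The copying behavior of rebase can be illustrated with a toy commit store in Python. This is a sketch under simplifying assumptions: a commit is reduced to a parent id plus a change label, and `make_commit` and `rebase` are hypothetical helpers rather than Git's algorithm.

```python
import hashlib

def make_commit(store, parent, change):
    """Store a toy commit; its id is derived from the parent id and the change."""
    cid = hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()
    store[cid] = {"parent": parent, "change": change}
    return cid

def rebase(store, commits_to_move, new_base):
    """Replay each change on top of new_base; the originals stay in the store."""
    tip = new_base
    for cid in commits_to_move:
        tip = make_commit(store, tip, store[cid]["change"])
    return tip

store = {}
a = make_commit(store, None, "A")
b = make_commit(store, a, "B")
c = make_commit(store, b, "C")   # feature: branches from B
e = make_commit(store, b, "E")   # main: moves on from B

new_c = rebase(store, [c], e)    # copy C onto E

print(new_c != c)                      # True: same change, different id
print(store[new_c]["parent"] == e)     # True: the copy sits on the new base
print(c in store)                      # True: the original node still exists
```

The copy carries the same change as the original but a different parent, so its id must differ.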
Every Git command manipulates the DAG. When you commit, you add a node. When you branch, you add a label. When you merge, you create a node with two parents. When you rebase, you copy nodes to a new location. When you understand the DAG, you can reason about any Git operation by thinking about how it changes the graph.
| Git Operation | DAG Effect |
|---|---|
| git commit | Adds a new node pointing to the current node as parent |
| git branch | Creates a new label pointing to an existing node |
| git merge | Creates a new node with two parent pointers |
| git rebase | Copies nodes to a new base, creating new hashes |
| git tag | Creates a permanent label on a specific node |
| git reset | Moves a branch label to a different node |
If you can draw the DAG before and after a Git command, you understand what that command does. Every confusing Git situation becomes clear when you sketch the graph on paper.
Git's history is a Directed Acyclic Graph. Commits are nodes, parent pointers are edges, branches and tags are labels. Every Git command is a graph operation. Learn to think in graphs and Git becomes predictable.
The ultimate benefit of these concepts working together is freedom. When branches are free, commits are snapshots, and history is a graph, you can experiment without risking anything. This changes how you develop software.
Every commit is an immutable snapshot. Once committed, your work is safe in the object database. Even if you delete a branch, the commits still exist (until garbage collection, which takes weeks by default). You can always recover. This means you can try risky refactors, explore wild ideas, and make mistakes, all without fear of losing your stable code.
# Create a branch for a risky experiment
$ git checkout -b experiment/new-algorithm
# Try something wild
$ echo "radical changes" > algorithm.py
$ git add . && git commit -m "Try new approach"
# It did not work? Just switch back. Nothing is lost.
$ git checkout main
# Your main branch is exactly as you left it.
# Delete the experiment branch if you want
$ git branch -D experiment/new-algorithm
# The commits still exist in the object database for ~30 days
# Or keep it and come back to it later
# Branches cost nothing
Practice explaining these concepts using only analogies, with no technical jargon allowed. Write or say your explanations out loud.
Professional Git workflows are built on this fearless experimentation. Feature branches, pull requests, code reviews, and continuous integration all depend on the ability to branch cheaply, commit often, and merge confidently. Without lightweight branches and snapshot-based commits, none of these workflows would be practical.
| Practice | Depends On |
|---|---|
| Feature branches | Cheap, instant branching |
| Pull requests / code review | Isolated branches with full history |
| CI/CD pipelines | Branches as testable units of work |
| Hotfix workflows | Ability to branch from any commit instantly |
| Exploratory coding | Zero-cost throwaway branches |
| Bisect debugging | Every commit is a complete, testable snapshot |
New developers often work directly on the main branch because they are used to systems where branching is expensive. In Git, not branching is the risky behavior. Always branch before making changes so you have a clean fallback.
Git's design (snapshots, cheap branches, and a persistent object database) creates a safety net that enables fearless experimentation. Commit often, branch freely, and let Git protect your work.
A repository is the .git directory: all commits, branches, tags, and configuration. It is a full database, not just the current files.
You create a branch feature-search from main. How much disk space does this new branch consume?
You are on main and run git merge feature. The feature branch was created from main and main has no new commits since then. What type of merge occurs?
To recover a deleted branch, find its last commit with git reflog and recreate the branch with git branch branch-name commit-hash. Unreferenced commits are only removed by garbage collection, which by default waits at least 30 days.
Is git pull the same as git merge? git pull is a combination of two operations: git fetch (download new commits from the remote) followed by git merge (integrate those commits into your current branch). You can also configure git pull to use rebase instead of merge with git pull --rebase.
To create a branch and switch to it in one step, use git checkout -b new-branch-name.
A developer assumes that deleting a large file in the latest commit means Git no longer stores that file. They are surprised when the repository size does not shrink, and even more surprised when git checkout to an older commit brings the file back in full. They expected Git to "forget" the file once the diff removed it.
Commits are snapshots, not diffs. Every commit that ever contained the file still references its full content in the object database. To truly remove a large file from history, you need git filter-repo or git filter-branch to rewrite history. Understanding the snapshot model prevents this confusion entirely.
A new developer creates a branch called feature-login and then looks in their file explorer expecting to see a new folder or a copy of the project. They see nothing. They panic, thinking the branch was not created. They create a manual copy of the project folder "just in case" and now have two copies of the codebase that will quickly diverge and become impossible to reconcile.
A branch is a small pointer file inside .git/refs/heads/ containing a 40-character commit hash (plus a newline). It does not create folders or copies. Run cat .git/refs/heads/feature-login to see the commit hash it points to. Your working directory shows whichever branch HEAD points to. Switch branches with git checkout or git switch; the files change in place.
A developer coming from SVN avoids creating branches because "branching is expensive and merging is painful." They work directly on main, making large, infrequent commits. When a risky refactor breaks the build, they have no clean state to fall back to. The team is blocked for hours while they manually undo changes.
In Git, branching costs nothing and merging is fast. Always create a branch before starting work: git checkout -b feature/my-change. If the experiment fails, git checkout main returns you to a clean state instantly. Make branching your default reflex, not your last resort.
A developer runs git checkout v1.0.0 to look at an old release. Git prints a "detached HEAD" warning. The developer ignores it, makes several commits to fix a bug they found, then switches back to main. Their bug-fix commits are now orphaned (not on any branch) and will be garbage-collected in 30 days. The work is effectively lost.
HEAD normally points to a branch name (e.g., ref: refs/heads/main). When you check out a tag or a raw commit hash, HEAD points directly to a commit; this is "detached HEAD." Any new commits will not be on a branch. If you need to make changes, always create a branch first: git checkout -b hotfix/v1.0.1 v1.0.0. If you already made commits in detached HEAD, recover them with git branch my-fix HEAD before switching away.
What does the .git directory contain?
You run git branch -d feature. What happens to the commits that were on that branch?
The concepts in this article are the foundation everything else in Git is built on. Before commands, before workflows, before hosting platforms, there are these core ideas.
Next up: Article 3 - Centralized vs Distributed Version Control: SVN vs Git