B+ Trees: The Data Structure Powering Your Database Indexes
Every time I used a database index, I took it for granted. Then I dug into how indexes actually work internally — and it all comes back to one data structure: the B+ Tree.
It sounds intimidating. It's not. Let me walk through it.
Why Not Just Use a Regular Sorted Array?
A sorted array lets you do binary search — fast lookups in O(log n). But inserting into a sorted array is slow — you have to shift elements. And if the data is on disk (not in memory), random reads are expensive.
Databases need something that:
- Stays sorted for fast lookups.
- Handles inserts/deletes efficiently without massive reorganization.
- Works well with disk reads (reads in chunks, not one value at a time).
That's exactly what B+ trees are designed for.
What Is a B+ Tree?
A B+ Tree is a self-balancing tree where:
- Every path from the root to a leaf is the same length (always balanced).
- Each node holds multiple keys — so the tree stays shallow even with millions of entries.
- All actual data (or row pointers) are stored only at the leaf nodes.
- The internal (non-leaf) nodes only hold keys as guides to navigate down the tree.
- Leaf nodes are linked together in a doubly-linked list — making range scans fast.
The Structure
Let's say we have user IDs: 1, 5, 9, 14, 17, 21, 30 indexed in a B+ tree with order 3 (max 2 keys per node):
[ 14 ]
/ \
[ 5, 9 ] [ 17, 21 ]
/ | \ / | \
[1,5] [9] [14] [17] [21] [30]
←→ ←→ ←→ ←→ ←→ ←→ (leaf nodes linked)
Internal Nodes (top levels)
- Act as a routing guide: "go left if value < 14, right if ≥ 14."
- Do NOT store actual data — just keys for navigation.
Leaf Nodes (bottom level)
- Store the actual values (or pointers to rows in the table).
- Are linked to each other left-to-right — this is the key feature for range queries.
How a Lookup Works
Find user with ID = 17:
- Start at root:
17 ≥ 14→ go right. - At node
[17, 21]:17 = 17→ go to left child. - At leaf
[17]: found it. Return the row pointer.
That's it — 3 steps for a 7-element tree. For millions of rows, it's still just ~30 steps (since the tree height grows as log of the number of entries).
How a Range Query Works
This is where B+ trees absolutely shine over regular B-trees.
Query: WHERE user_id BETWEEN 9 AND 21
- Traverse tree to find the leaf containing
9. - From there, follow the linked list across leaf nodes:
[9] → [14] → [17] → [21]. - Stop when you exceed the upper bound.
Because leaf nodes are linked, you never have to go back up the tree. You just scan horizontally across the bottom — super efficient.
This is why B+ trees are better than B-trees for databases: in a regular B-tree, data lives at every level (not just leaves), so range queries require traversing up and down the tree. B+ trees keep all data at the leaves and link them — range scans become a simple linear walk.
How Inserts Work
When you insert a new value, the tree might need to split a node to stay balanced:
- Find the correct leaf node for the new value.
- Insert it there.
- If the leaf is full (exceeds max keys), split it into two nodes and push the middle key up to the parent.
- If the parent is also full, split that too — this can ripple all the way up to the root.
- If even the root splits, a new root is created — this is the only way the tree grows taller.
This is what keeps the tree balanced — every leaf is always at the same depth.
B-Tree vs B+ Tree
| B-Tree | B+ Tree | |
|---|---|---|
| Data stored at | Every node (internal + leaf) | Only leaf nodes |
| Leaf nodes linked? | No | Yes (doubly linked list) |
| Range queries | Slower (traverse up/down) | Fast (scan leaf list) |
| Space efficiency | Slightly better per lookup | Better overall for databases |
| Used in databases? | Rarely | Yes — PostgreSQL, MySQL, SQLite, etc. |
Why Databases Love B+ Trees
- Shallow height — even billions of rows fit in a tree that's only 4-5 levels deep. That means at most 4-5 disk reads per lookup.
- Disk-friendly — nodes are sized to match disk pages (usually 4KB or 8KB), so each node read is one I/O operation.
- Range-scan optimized — the linked leaf list makes
BETWEEN,>,<, andORDER BYextremely efficient. - Always balanced — inserts/deletes never leave the tree unbalanced, so worst-case lookup is always predictable.
Next time you add an index to a column and your query goes from 5 seconds to 5 milliseconds — that's a B+ tree doing its job. Understanding this structure gives you intuition for why indexes work the way they do.
B+ trees are one of those foundational ideas where once you understand them, a lot of other database concepts (clustered indexes, range scans, why primary keys should be sequential) just start making sense naturally.