登录查看更多内容

B-Tree vs B+ Tree: Key Differences Explained

Vivek Bansal

Senior Software Engineer at Uber | Ex-Grab | Ex-Directi

发布日期: 2024年9月1日

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched article every week delivered straight to your inbox.

Brief Overview of Tees

A tree data structure is a hierarchical data structure that consists of nodes connected by edges. It's a non-linear data structure, meaning the elements are not stored sequentially like in arrays or linked lists. Instead, they are organized in a tree-like structure. It consists of key components like Node, Edge, Parent-Child relationship between the nodes, Root, and Subtree, etc.

To understand more about B-Trees and B+Trees, let’s understand more about basic kinds of trees and then about Self-Balancing Binary Search Tree.

Binary Tree: A tree where each node has at most two children.

Binary Search Tree (BST): A binary tree where the left child value of a node is always less than the parent node value, and the right child value is always greater than the parent node value.

Balanced Binary Search Tree: This is a binary tree where the difference in height between the left and right subtrees of each node is at most 1.

At this point, you can imagine why we need Balanced Binary Search Trees: so that the trees don’t grow in a skewed fashion and future operations: insert, search or delete can be performed quickly.

Self-Balancing Binary Search Tree: A self-balancing BST automatically adjusts its shape and height whenever a node is inserted or deleted to maintain balance. A self-balancing BST keeps its height to a minimum, which ensures that its operations maintain a worst-case time complexity of O(log_2n).

B-Trees and B+Trees are NOT self-balancing Binary Search Trees(BST) because they don’t contain only two children nodes, but only self-balancing search trees that are built on the same technical nuances of a BST.

Curious Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free subscriber.

What is a B-tree?

As per Wikipedia, B-tree is a self-balancing tree that maintains sorted data and allows searches, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree allowing nodes in a tree to have more than two children. Thus, B-Tree are called m-way trees with some set of rules where "m" represents the maximum number of children a node can have.

Probably, not important, but here are the specific rules that define a B-tree of order m:

Number of keys: Each node must contain at least m/2 - 1 and at most m - 1 keys.
Number of children: Each node must have at least m/2 and at most m children.
Leaf nodes: All leaf nodes must be at the same level.
Root node: The root node can have fewer than m/2 children, but it must have at least 2.

These rules ensure that B-trees remain balanced and efficient for search, insertion, and deletion operations. The value of "m" is typically chosen based on the specific application and the characteristics of the underlying storage system.

Take a pause here and literally imagine that B-Tree is a self-balancing tree. Any insert, update, delete operations would all lead to rebalancing of the B-tree in such a way that it maintains it’s height upto O(log_mN) so that future operations also remain optimized. Here, m = order of the B-Tree and N = number of nodes in the B-Tree.

Different types of nodes in a B-Tree

There are three kinds of nodes often discussed when talking about a B-Tree: Internal nodes, leaf nodes, and root nodes.

Unlike traditional Binary search trees where you can only store one key in a single node, the B-Tree allows you to store multiple keys in a single node which eventually allows the tree to have a larger branching factor and thus a shallower height.

Let’s see how data storage happens inside a B-Tree.

Data storage inside a B-Tree

Whenever we talk about storing data in any tree, we are talking about storing keys and potential values within nodes of a tree. The keys are present within all nodes(root, internal, and leaf) but where are values stored makes all the difference.

B-Trees allow you to store keys and values inside the root node, internal nodes as well as leaf nodes. By values, I mean the actual data records of a table. Refer to the below example for visual representation of a B-Tree. It contains two kinds of pointers: child pointers & data pointers.

“Child pointers”: The left pointer of a key points to a node that contains all the keys smaller than the parent node key and the right pointer points to a node that contains all the keys greater than the parent node key. In the below diagram, you can see that left pointer of node containing 15 as key points to the child node starting from 12. The right pointer of 15 (also the left pointer of 20) points to a node that contains keys between 15 and 20. Similarly, the right pointer of 20 points to a node containing key: 25.

“Data pointers”: These pointers actually point to the data record present in the table. Thus, when searching for a key in a B-Tree, the key is found within a node, the data record is accessed quickly and there is no need to traverse the whole B-Tree.

领英推荐

It's time to check out August's roundup!

Global Excel Summit 6 个月前

Week of May 20th

Stefan Krawczyk 10 个月前

A Generic Data Stream with Coroutines in C++20

Rainer Grimm 3 年前

Representation of a key inside B-Tree pointing to a data record

Please observe carefully how I have used the unique ID of the “developers“ table to be used as a key inside B-Tree. These keys are maintained in a sorted fashion inside B-Tree. Thus, searching for a record using this B-Tree is so fast: we just have to find where the developer’s ID exists in the B-Tree using the BST search algorithm and we would get the whole record.

Search inside a B-Tree

Searching inside a B-Tree is similar to how it happens inside a Binary Search Tree. Firstly, begin the search at the root node of the B-tree. Compare the search key with the multiple keys stored in the current node.

If the search key is found in the current node, the associated data record is returned. If the search key is less than the smallest key in the current node, proceed to the leftmost child node. If the search key is greater than the largest key in the current node, proceed to the rightmost child node. Otherwise, find the child node whose keys are greater than the search key but less than the next key in the current node. Recursively search for all the child nodes until the search key is found or the end of the tree is reached.

What is a B+Tree?

B+Tree is an extension of B-Tree with some modifications. B+Tree is also a self-balancing tree that maintains sorted data and allows searches, insertions, and deletions in logarithmic time.

Different types of nodes in a B+Tree

Same as B-Tree, there are three kinds of nodes in a B+Tree: root node, internal nodes and leaf nodes.

Data storage inside a B+Tree

B+Trees allow you to store keys inside the internal nodes but the data records are only stored at the leaf nodes. Similar to B-Trees, there are two kinds of pointers in a B+Tree along with a slight difference:

“Child pointers”: Similar to B-Tree, these pointers point to the child nodes containing the keys in appropriate range.

“Data pointers”: Unlike B-Trees, the data pointers only exist at the leaf nodes and point to the actual data record of the table. Also, the last pointer in a leaf node points to the next leaf node in the B+ tree which resembles a linked list data structure. This pointer helps speed up sequential access.

Search inside a B+Tree

Search starts at the root and proceeds down the tree, comparing the search key with the keys in each node. If the search key is found in an internal node, the search still continues down the appropriate child node and when the search key is found in a leaf node, the associated data record is returned.

Differences between B-Tree and B+Trees

Now, let’s have a macroscopic view of both B-Trees and B+Trees and understand the key differences and how it impacts their performance:

Data Storage: B-trees store data records in internal nodes, while B+trees store them exclusively in leaf nodes. Thus, B+Trees can store more number of keys in the same space compared to B-Trees which holds keys and data pointers as well.
Random/Sequential Data Access: If you want to have sequential access of all the keys in B-Trees, you would have to do a full traversal of the tree i.e. visit possibly each and every node to access all the keys. However, B+trees have a linked list of leaf nodes. Thus, you can just use one path to traverse down the whole tree and then access all the keys using links at the leaf nodes. Thus, making sequential access more efficient in case of B+Trees. For random access, B-trees and B+trees have similar performance. B+Trees search the key quickly, but might do a little delay in returning the data record. On the other hand, B-Trees might take time to search the key, but if found, will return the data record quickly.

If you’re interested in reading more about how B-Trees and B+Trees are used in databases, you should definitely checkout this article: MySQL vs PostgreSQL: Indexing

That’s it, folks for this edition of the newsletter.

Please consider liking and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your network. Connect with me on Linkedin or Twitter for more technical posts in the future!

Curious Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Resources

Difference between B-Trees and B+Trees: A StackOverFlow answer

B-Tree Indexing Basics Explained

B-Tree and B+Tree by Wikipedia

B Trees and B+ Trees. How they are useful in Databases by Abdul Bari

Understanding B-Trees: The Data Structure Behind Modern Databases

Curious Engineer

19,937 位关注者

Han Qi

Principal Data Scientist at Monarch Tractor

2 周

What does left/right child pointer mean in the b-tree diagram for the arrow between 15 and 20? Is the right child pointer of 15 also the left child pointer of 20? "Otherwise, find the child node whose keys are greater than the search key but less than the next key in the current node." This is hard to understand. What is "the next key"? It assumes an understanding of "current key", which is what? How would that confusing paragraph be written if the search key is 16 or 17 or 18 or 19? Especially for 19, none of the keys in the child node containing [16,18] fit the description "the child node whose keys are greater than the search key" because both 16, 18 are less than 19, not greater. Referring to keys in the diagram would help clarify.

Divyam Mehta

Software Engineer II | 3+ YoE building products and services in code

6 个月

this is very well-put, and to the point. Easy and a quick read to recap on the concept!

1 次回应

Kartik Kaushik

Full Stack developer | content creator| open for collabs| freelancer

6 个月

Insightful one thanks for sharing it Vivek Bansal

2 次回应

查看更多评论

要查看或添加评论，请登录

Vivek Bansal的更多文章

How to implement a Circuit Breaker

2025年1月12日

How to implement a Circuit Breaker

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

7 条评论
How to implement Consistent Hashing

2024年12月29日

How to implement Consistent Hashing

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

3 条评论
Optimistic Locking Implementation

2024年12月1日

Optimistic Locking Implementation

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…
1 year to Curious Engineer ??

2024年11月18日

1 year to Curious Engineer ??

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…
Message Queues vs Message Brokers

2024年11月9日

Message Queues vs Message Brokers

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

4 条评论
Introduction to gRPC

2024年11月2日

Introduction to gRPC

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

7 条评论
Non-Functional Requirements

2024年10月26日

Non-Functional Requirements

Brief Introduction Let’s say you are building a website that allows users to book flight tickets. The requirements for…

4 条评论
QuadTrees

2024年10月19日

QuadTrees

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

2 条评论
Text Based Search: ElasticSearch

2024年10月12日

Text Based Search: ElasticSearch

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

3 条评论
Sharding vs Partitioning

2024年10月5日

Sharding vs Partitioning

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

5 条评论

See all articles

B-Tree vs B+ Tree: Key Differences Explained

Vivek Bansal

Senior Software Engineer at Uber | Ex-Grab | Ex-Directi

Brief Overview of Tees

What is a B-tree?

Different types of nodes in a B-Tree

Data storage inside a B-Tree

领英推荐

Search inside a B-Tree

What is a B+Tree?

Different types of nodes in a B+Tree

Data storage inside a B+Tree

Search inside a B+Tree

Differences between B-Tree and B+Trees

Resources

Curious Engineer

19,937 位关注者

Vivek Bansal的更多文章

社区洞察

其他会员也浏览了

First Impressions of DAX Optimizer

An Infinite Data Stream with Coroutines in C++20

R + Shiny + Fama French: Part 2

A Deep Dive into Go Slices: Understanding Value vs. Reference, Modifying and Adding Elements

Comparing React’s useState and useReducer Hooks

What are differences Between Concatenate and Textjoin ??

Raiders of the lost File

New RTM Pro Tools to Boost Your Workflow + Exciting End-of-Year Teaser!

How to Pass Data Between Components in React

C++ Core Guidelines: The Remaining Rules to Class Hierarchies

Brief Overview of Tees

What is a B-tree?

Different types of nodes in a B-Tree

Data storage inside a B-Tree

领英推荐

Search inside a B-Tree

What is a B+Tree?

Different types of nodes in a B+Tree

Data storage inside a B+Tree

Search inside a B+Tree

Differences between B-Tree and B+Trees

Resources

Curious Engineer

19,937 位关注者

Vivek Bansal的更多文章

How to implement a Circuit Breaker

How to implement Consistent Hashing

Optimistic Locking Implementation

1 year to Curious Engineer ??

Message Queues vs Message Brokers

Introduction to gRPC

Non-Functional Requirements

QuadTrees

Text Based Search: ElasticSearch

Sharding vs Partitioning

社区洞察

其他会员也浏览了

First Impressions of DAX Optimizer

An Infinite Data Stream with Coroutines in C++20

R + Shiny + Fama French: Part 2

A Deep Dive into Go Slices: Understanding Value vs. Reference, Modifying and Adding Elements

Comparing React’s useState and useReducer Hooks

What are differences Between Concatenate and Textjoin ??

Raiders of the lost File

New RTM Pro Tools to Boost Your Workflow + Exciting End-of-Year Teaser!

How to Pass Data Between Components in React

C++ Core Guidelines: The Remaining Rules to Class Hierarchies