B-Tree vs B+ Tree: Key Differences Explained

B-Tree vs B+ Tree: Key Differences Explained


If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched article every week delivered straight to your inbox.


Brief Overview of Tees

A tree data structure is a hierarchical data structure that consists of nodes connected by edges. It's a non-linear data structure, meaning the elements are not stored sequentially like in arrays or linked lists. Instead, they are organized in a tree-like structure. It consists of key components like Node, Edge, Parent-Child relationship between the nodes, Root, and Subtree, etc.

To understand more about B-Trees and B+Trees, let’s understand more about basic kinds of trees and then about Self-Balancing Binary Search Tree.

Binary Tree: A tree where each node has at most two children.

Binary Search Tree (BST): A binary tree where the left child value of a node is always less than the parent node value, and the right child value is always greater than the parent node value.


Balanced Binary Search Tree: This is a binary tree where the difference in height between the left and right subtrees of each node is at most 1.


Balanced BST vs Unbalanced BST

At this point, you can imagine why we need Balanced Binary Search Trees: so that the trees don’t grow in a skewed fashion and future operations: insert, search or delete can be performed quickly.

Self-Balancing Binary Search Tree: A self-balancing BST automatically adjusts its shape and height whenever a node is inserted or deleted to maintain balance. A self-balancing BST keeps its height to a minimum, which ensures that its operations maintain a worst-case time complexity of O(log_2n).

B-Trees and B+Trees are NOT self-balancing Binary Search Trees(BST) because they don’t contain only two children nodes, but only self-balancing search trees that are built on the same technical nuances of a BST.

Curious Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free subscriber.

What is a B-tree?

As per Wikipedia, B-tree is a self-balancing tree that maintains sorted data and allows searches, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree allowing nodes in a tree to have more than two children. Thus, B-Tree are called m-way trees with some set of rules where "m" represents the maximum number of children a node can have.

Probably, not important, but here are the specific rules that define a B-tree of order m:

  1. Number of keys: Each node must contain at least m/2 - 1 and at most m - 1 keys.
  2. Number of children: Each node must have at least m/2 and at most m children.
  3. Leaf nodes: All leaf nodes must be at the same level.
  4. Root node: The root node can have fewer than m/2 children, but it must have at least 2.

These rules ensure that B-trees remain balanced and efficient for search, insertion, and deletion operations. The value of "m" is typically chosen based on the specific application and the characteristics of the underlying storage system.

Take a pause here and literally imagine that B-Tree is a self-balancing tree. Any insert, update, delete operations would all lead to rebalancing of the B-tree in such a way that it maintains it’s height upto O(log_mN) so that future operations also remain optimized. Here, m = order of the B-Tree and N = number of nodes in the B-Tree.

Different types of nodes in a B-Tree

There are three kinds of nodes often discussed when talking about a B-Tree: Internal nodes, leaf nodes, and root nodes.


Different types of nodes in a B-Tree

Unlike traditional Binary search trees where you can only store one key in a single node, the B-Tree allows you to store multiple keys in a single node which eventually allows the tree to have a larger branching factor and thus a shallower height.

Let’s see how data storage happens inside a B-Tree.

Data storage inside a B-Tree

Whenever we talk about storing data in any tree, we are talking about storing keys and potential values within nodes of a tree. The keys are present within all nodes(root, internal, and leaf) but where are values stored makes all the difference.

B-Trees allow you to store keys and values inside the root node, internal nodes as well as leaf nodes. By values, I mean the actual data records of a table. Refer to the below example for visual representation of a B-Tree. It contains two kinds of pointers: child pointers & data pointers.

“Child pointers”: The left pointer of a key points to a node that contains all the keys smaller than the parent node key and the right pointer points to a node that contains all the keys greater than the parent node key. In the below diagram, you can see that left pointer of node containing 15 as key points to the child node starting from 12. The right pointer of 15 (also the left pointer of 20) points to a node that contains keys between 15 and 20. Similarly, the right pointer of 20 points to a node containing key: 25.

“Data pointers”: These pointers actually point to the data record present in the table. Thus, when searching for a key in a B-Tree, the key is found within a node, the data record is accessed quickly and there is no need to traverse the whole B-Tree.


Representation of a key inside B-Tree pointing to a data record

Please observe carefully how I have used the unique ID of the “developers“ table to be used as a key inside B-Tree. These keys are maintained in a sorted fashion inside B-Tree. Thus, searching for a record using this B-Tree is so fast: we just have to find where the developer’s ID exists in the B-Tree using the BST search algorithm and we would get the whole record.

Search inside a B-Tree

Searching inside a B-Tree is similar to how it happens inside a Binary Search Tree. Firstly, begin the search at the root node of the B-tree. Compare the search key with the multiple keys stored in the current node.

If the search key is found in the current node, the associated data record is returned. If the search key is less than the smallest key in the current node, proceed to the leftmost child node. If the search key is greater than the largest key in the current node, proceed to the rightmost child node. Otherwise, find the child node whose keys are greater than the search key but less than the next key in the current node. Recursively search for all the child nodes until the search key is found or the end of the tree is reached.

What is a B+Tree?

B+Tree is an extension of B-Tree with some modifications. B+Tree is also a self-balancing tree that maintains sorted data and allows searches, insertions, and deletions in logarithmic time.

Different types of nodes in a B+Tree

Same as B-Tree, there are three kinds of nodes in a B+Tree: root node, internal nodes and leaf nodes.

Data storage inside a B+Tree

B+Trees allow you to store keys inside the internal nodes but the data records are only stored at the leaf nodes. Similar to B-Trees, there are two kinds of pointers in a B+Tree along with a slight difference:

“Child pointers”: Similar to B-Tree, these pointers point to the child nodes containing the keys in appropriate range.

“Data pointers”: Unlike B-Trees, the data pointers only exist at the leaf nodes and point to the actual data record of the table. Also, the last pointer in a leaf node points to the next leaf node in the B+ tree which resembles a linked list data structure. This pointer helps speed up sequential access.


Visual representation of a B+Tree

Search inside a B+Tree

Search starts at the root and proceeds down the tree, comparing the search key with the keys in each node. If the search key is found in an internal node, the search still continues down the appropriate child node and when the search key is found in a leaf node, the associated data record is returned.

Differences between B-Tree and B+Trees

Now, let’s have a macroscopic view of both B-Trees and B+Trees and understand the key differences and how it impacts their performance:

  1. Data Storage: B-trees store data records in internal nodes, while B+trees store them exclusively in leaf nodes. Thus, B+Trees can store more number of keys in the same space compared to B-Trees which holds keys and data pointers as well.
  2. Random/Sequential Data Access: If you want to have sequential access of all the keys in B-Trees, you would have to do a full traversal of the tree i.e. visit possibly each and every node to access all the keys. However, B+trees have a linked list of leaf nodes. Thus, you can just use one path to traverse down the whole tree and then access all the keys using links at the leaf nodes. Thus, making sequential access more efficient in case of B+Trees. For random access, B-trees and B+trees have similar performance. B+Trees search the key quickly, but might do a little delay in returning the data record. On the other hand, B-Trees might take time to search the key, but if found, will return the data record quickly.


If you’re interested in reading more about how B-Trees and B+Trees are used in databases, you should definitely checkout this article: MySQL vs PostgreSQL: Indexing


That’s it, folks for this edition of the newsletter.

Please consider liking and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your network. Connect with me on Linkedin or Twitter for more technical posts in the future!

Curious Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Resources

Difference between B-Trees and B+Trees: A StackOverFlow answer

B-Tree Indexing Basics Explained

B-Tree and B+Tree by Wikipedia

B Trees and B+ Trees. How they are useful in Databases by Abdul Bari

Understanding B-Trees: The Data Structure Behind Modern Databases

Han Qi

Principal Data Scientist at Monarch Tractor

2 周

What does left/right child pointer mean in the b-tree diagram for the arrow between 15 and 20? Is the right child pointer of 15 also the left child pointer of 20? "Otherwise, find the child node whose keys are greater than the search key but less than the next key in the current node." This is hard to understand. What is "the next key"? It assumes an understanding of "current key", which is what? How would that confusing paragraph be written if the search key is 16 or 17 or 18 or 19? Especially for 19, none of the keys in the child node containing [16,18] fit the description "the child node whose keys are greater than the search key" because both 16, 18 are less than 19, not greater. Referring to keys in the diagram would help clarify.

回复
Divyam Mehta

Software Engineer II | 3+ YoE building products and services in code

6 个月

this is very well-put, and to the point. Easy and a quick read to recap on the concept!

Kartik Kaushik

Full Stack developer | content creator| open for collabs| freelancer

6 个月

Insightful one thanks for sharing it Vivek Bansal

要查看或添加评论,请登录

Vivek Bansal的更多文章

  • How to implement a Circuit Breaker

    How to implement a Circuit Breaker

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    7 条评论
  • How to implement Consistent Hashing

    How to implement Consistent Hashing

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    3 条评论
  • Optimistic Locking Implementation

    Optimistic Locking Implementation

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

  • 1 year to Curious Engineer ??

    1 year to Curious Engineer ??

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

  • Message Queues vs Message Brokers

    Message Queues vs Message Brokers

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    4 条评论
  • Introduction to gRPC

    Introduction to gRPC

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    7 条评论
  • Non-Functional Requirements

    Non-Functional Requirements

    Brief Introduction Let’s say you are building a website that allows users to book flight tickets. The requirements for…

    4 条评论
  • QuadTrees

    QuadTrees

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    2 条评论
  • Text Based Search: ElasticSearch

    Text Based Search: ElasticSearch

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    3 条评论
  • Sharding vs Partitioning

    Sharding vs Partitioning

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    5 条评论

社区洞察

其他会员也浏览了