QuadTrees
If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched article every week delivered straight to your inbox.
What’s the problem statement?
Design an efficient system to store and retrieve global business listings, enabling fast proximity searches. The system should allow users to query nearby restaurants, hospitals, or other locations within a specified radius, returning relevant results in a timely manner.
If you haven't read it already, I highly recommend reading this previous edition about GeoHashing:?How to find nearby entities . This would give you a fair idea about how this problem can be solved using Geohashing.
(Sponsored)
A big shout out to the sponsor for this edition who help to run this newsletter for free ?? Multiplayer auto-documents your system, from the high-level logical architecture down to the individual components, APIs, dependencies, and environments. Perfect for teams who want to speed up their workflows and consolidate their technical assets.
Multiplayer is about to launch its new feature called Platform Debugger ??Multiplayer?Platform Debugger?is a powerful tool that leverages its comprehensive understanding of your entire backend system architecture to offer deep session replays with insights spanning frontend screens, platform traces, metrics, and logs. Check out here for more information.
Curious Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free subscriber.
What is a QuadTree?
A Quadtree is a tree data structure in which each node has four or fewer children. They are often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information”.
Four children is the maximum number of allowed children per node in the QuadTree. t depends on the implementation if you want exact four nodes or less. The number of nodes varies depending on the regions and how densely they are populated.
The two-dimensional space can be referred to as the entire Earth itself. If you take the planet Earth, flatten it, and present it on a two-dimensional paper, then you would be able to get something like this:
The two-dimensional space can be referred to as the entire Earth itself. If you take the planet Earth, flatten it, and present it on a two-dimensional paper, then you would be able to get something like this:
Representation of Earth presented as a 2-D plane sketched using Multiplayer:
Now, let’s understand how we can construct the quadtree and search for the nearby locations.
Quadtree Data Structure
QuadTree as a data structure can be imagined as a tree where each internal node has exactly four children. In the image below, you can imagine that the root node is the entire world and is subdivided into four quadrants(similar to GeoHashing)
Representation of a QuadTree sketched using Multiplayer:
QuadTree Structure Representation:
QuadTree Construction
The construction of a quadtree can be imagined using the following steps:
领英推荐
X: maximum number of locations allowed in a particular node
Note: While constructing a QuadTree, the efficiency of Insert, Update, or Search Operations depends largely on the factor: what is the maximum allowed number of locations within a particular node (as discussed above: configurable parameter X)?
If the X is very large, then one node of the QuadTree could contain millions of locations in one node, and thus Search or Insert would become costly. There will be fewer leaf nodes and thus less memory usage. Similarly, if the X is very small, then there will be too many branches of the QuadTree and the tree traversal will take a lot of time. An increased number of leaf nodes will potentially lead to higher memory usage. It will also slow down traversal for large datasets due to the increased number of nodes. Balance X with dataset size and query patterns: Carefully consider the trade-off between memory usage and performance.
Insert Operation
Here’s a pseudo-code for inserting a new location (representing a business in the real world) in the QuadTree:
Pseudo-code for inserting a location in the QuadTree:
This is how the Quadtree can be constructed for an entire world. The good thing about this QuadTree as a data structure is that densely populated regions will have more recursive nodes but less densely populated regions will have fewer businesses and thus less height of the QuadTree.
For example: in the image below, the continent North America will have more depth in the QuadTree whereas the continent Antarctica will have less depth in the QuadTree.
North America vs Antarctica in a QuadTree sketched using Multiplayer:
Search Operation
Here’s a pseudo-code for searching all the relevant locations that are within a particular range from the input user’s location in the QuadTree:
Pseudo-code for searching all locations within a given range:
Time Complexity
The time complexity for both insert and search operations can vary depending on the specific implementation and the balance of the tree. If everything’s done right and appropriate points per node and depth of the tree are chosen, the time complexity for both insert and search operations will come to O(logN) with base 4 where N is the number of points/locations in the entire world. Base 4 is present since we are dividing every grid into four quadrants at every step.
Comparison with Geohashing
If you remember there were edge cases when we discussed the Geohashing algorithm. There could be a user’s location that lies on the edge of a grid that doesn’t share the same Geohash prefix with the adjacent grid. This is no longer a problem in QuadTrees.
QuadTree representation of the grids as nodes represents the adjacent neighbor grids that are at most four unit distances away only. Refer to the below image to understand this better. In the attached image, the neighbor grids surrounding the user’s location in the 0 grid are at most 4 unit distances away when traversing in a QuadTree.That’s why the search is relatively easy and intuitive once you find the node containing your user’s location.
Representation of neighbors in a QuadTree sketched using Multiplayer:
QuadTrees Applications
Here are some popular applications of QuadTrees:
That’s it, folks for this edition of the newsletter. Hope you liked this edition. I wanted to explore more around Semantic Search but will reserve that for future editions.
Please consider liking ?? and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your network. Connect with me on Linkedin or Twitter for more technical posts in the future.
Led ncr
2 周Good work Vivek Bansal please call me urgent 9823518105
Senior Engineering Manager @Zee | Mentor | Career Coach | YouTuber[5k]
1 个月Good Work Vivek Bansal