SkipList: A probabilistic data structure
If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched article every week delivered straight to your inbox.
What is a SkipList?
A SkipList is a probabilistic data structure that supports efficient searching, insertion, and deletion operations on a sorted sequence of elements. It is an alternative to balanced trees like AVL trees or red-black trees(self-balancing BST) but with simpler implementation and often better performance in practice due to its probabilistic nature.
A big shout out to the sponsor for this edition who help to run this newsletter for free ?? Multiplayer auto-documents your system, from the high-level logical architecture down to the individual components, APIs, dependencies, and environments. Perfect for teams who want to speed up their work flows and consolidate their technical assets.
Data Storage
A SkipList is essentially a linked list with additional levels (as highlighted in the image). Each level is a LinkedList prepared by skipping over a few elements in the original sorted LinkedList at the lowest level.
Representation of a SkipList visualized using Multiplayer:
As you can infer from the data structure representation, the last level: Level 0 represents the original list of elements. Level 1 is a LinkedList consisting of the elements above Level 0 by skipping a few elements. Similarly, Level 2 is a LinkedList consisting of skipping a few elements of the LinkedList at Level 1. So on, the list could keep going higher up the Levels.
How does insertion work in SkipList?
Pasting a cool GIF sourced from Wikipedia on the insertion process. Consider reading the steps below the GIF to understand in detail.
Observations
Similarly, Steps 2 to Steps 5 are followed for the update and delete operations in a SkipList. The key thing to note here is: how the SkipLists help in skipping a subsequence of elements for all kinds of operations.
Time Complexity
For a LinkedList that consists of N elements, the SkipLists provide average time complexity O(logN) for insert, update, and delete operations whereas the worst-case time complexity is O(N) for the same set of operations.
The key thing to note here is that when a new node is inserted or an existing node is updated or deleted, the algorithm determines the level of the node based on a series of coin flips. The probability of a node being on a higher level is lower than the probability of it being on a lower level. This means that on average, the algorithm will only need to traverse a few levels to find the correct node, resulting in an average case time complexity of O(log n).
But, that being said, it may very well be possible(with very low probability) that coin flips result in an unbalanced SkipList data structure and almost all of the nodes are encapsulated within 1 or 2 levels of the SkipList. That’s why, the worst-case time complexity could still lead to O(N).
Average case vs Worst case visualized using Multiplayer:
领英推荐
Comparison with Self-Balancing BST
A few points regarding comparison with Self-Balancing BSTs like Red-Black Trees or AVL Trees:
SortedSets by Redis
Redis uses SkipLists as the underlying data structure for Sorted Sets. In Redis, Sorted Sets are a collection of members that are ordered by a score. Each member is associated with a score, and the members are stored in ascending order of their scores. For example:
# Create a leaderboard
ZADD top_scorers 100 player1 80 player2 75 player3
# Get the top 3 players
ZREVRANGE top_scorers 0 2 WITHSCORES
# Update a player's score
ZINCRBY top_scorers 10 player1
# Find players with scores between 80 and 100
ZRANGEBYSCORE top_scorers 80 100 WITHSCORES
Thus, as you can guess, building a real-time leaderboard using Redis’s SortedSets which under the hood uses SkipList is such a powerful use case of the probabilistic data structure that optimizes space as well as the performance(time) of operations.
Other UseCases
Sourced from Wikipedia, here’s a list of popular applications that use SkipList under the hood:
Before you go…
I hope you visualize the importance of probabilistic data structures in real-world systems present around the world. The problem is the same: Is there a probabilistic way that can improve the time required to perform a certain operation? The underlying fact with all probabilistic data structures is that they reject a large amount of data to be processed in less time with a certain probability and that is what makes it a popular choice in designing systems.
If you’ve not read it already, here are some articles that we covered earlier about probabilistic data structures:
That’s it, folks for this edition of the newsletter. Hope you liked this short edition.
Please consider liking ?? and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your network. Connect with me on Linkedin or Twitter for more technical posts in the future!
Curious Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
I make backend topics simple and clear | SDE at SolGuruz?
1 个月Great insights on SkipList and its applications! ????
Engineering leader with experience in building low latency and resilient distributed platforms, cloud technologies, data processing systems. Open source contributor and technical mentor. Ex-Adobe | Apple | InMobi
2 个月Great with databases where ordered data is the need like dictionaries or for logarithmic time operations.
Engineering @ Oracle || x-Microsoft, VMware || Database Internals || Distributed systems || Build and scale data systems || Real-time Analytics || Big Data || ML and Deep learning |
2 个月Thanks for sharing. SkipList is an interesting probabilistic data structure. Apache Cassandra uses?skip lists?for the secondary?indexes, and Mongodb WiredTiger uses?skip lists?for in-memory?operations. SkipLists maintain accuracy, and no approximation is involved, like in Count-min sketch and Bloom filters.
Data Engineer at Verizon
2 个月Very informative. I would like to better understand the memory management of skiplists, considering they have multiple levels and use more memory than simple linked lists. How can we efficiently perform real-time operations(insert, update and delete) frequently without incurring excessive memory overhead ?