登录查看更多内容

SkipList: A probabilistic data structure

Vivek Bansal

Senior Software Engineer at Uber | Ex-Grab | Ex-Directi

发布日期: 2024年9月28日

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched article every week delivered straight to your inbox.

What is a SkipList?

A SkipList is a probabilistic data structure that supports efficient searching, insertion, and deletion operations on a sorted sequence of elements. It is an alternative to balanced trees like AVL trees or red-black trees(self-balancing BST) but with simpler implementation and often better performance in practice due to its probabilistic nature.

A big shout out to the sponsor for this edition who help to run this newsletter for free ?? Multiplayer auto-documents your system, from the high-level logical architecture down to the individual components, APIs, dependencies, and environments. Perfect for teams who want to speed up their work flows and consolidate their technical assets.

Data Storage

A SkipList is essentially a linked list with additional levels (as highlighted in the image). Each level is a LinkedList prepared by skipping over a few elements in the original sorted LinkedList at the lowest level.

Representation of a SkipList visualized using Multiplayer:

As you can infer from the data structure representation, the last level: Level 0 represents the original list of elements. Level 1 is a LinkedList consisting of the elements above Level 0 by skipping a few elements. Similarly, Level 2 is a LinkedList consisting of skipping a few elements of the LinkedList at Level 1. So on, the list could keep going higher up the Levels.

How does insertion work in SkipList?

Pasting a cool GIF sourced from Wikipedia on the insertion process. Consider reading the steps below the GIF to understand in detail.

GIF representing insertion in a SkipList sourced from Wikipedia

It all starts with a new element insert operation: 80
First, the SkipList checks at the top level: Level 3, and sees if the first element of the LinkedList at the top level is less than the insert element value. i.e. is 80 <= 30?
Since the result (80 <= 30) = false, then we check at the next lower level LinkedList: Level 2, and move forward by one position and compare if 80 <= 50.
The result (80 <= 50) is false, we move on to the next lower level: Level 1, and move forward by one position to find that 80 <= 70 is still false.
Then, finally at the lowest level: Level 0, the comparison 80 <= 90 condition holds true and that’s where we decide to insert the new element.
Then, we flip a coin(doing a probabilistic activity) to see if the new element 80 should be added to the higher levels or not.
If the coin flip results to a Head, we insert it into higher levels of LinkedList and keep on continuing the process unless we get a Tail and terminate the process.

Observations

You would have noticed how we skipped elements 40 and 60 because of LinkedLists that we had at different levels. The more the number of elements present, the better we will be skipping a large subsequence of elements while inserting a new element.
Also, we used a coin to determine if the element should be inserted at a particular level of LinkedList (& other higher levels) to introduce probability into this data structure. It could very well be implemented some other way but I hope you got the idea.

Similarly, Steps 2 to Steps 5 are followed for the update and delete operations in a SkipList. The key thing to note here is: how the SkipLists help in skipping a subsequence of elements for all kinds of operations.

Time Complexity

For a LinkedList that consists of N elements, the SkipLists provide average time complexity O(logN) for insert, update, and delete operations whereas the worst-case time complexity is O(N) for the same set of operations.

The key thing to note here is that when a new node is inserted or an existing node is updated or deleted, the algorithm determines the level of the node based on a series of coin flips. The probability of a node being on a higher level is lower than the probability of it being on a lower level. This means that on average, the algorithm will only need to traverse a few levels to find the correct node, resulting in an average case time complexity of O(log n).

But, that being said, it may very well be possible(with very low probability) that coin flips result in an unbalanced SkipList data structure and almost all of the nodes are encapsulated within 1 or 2 levels of the SkipList. That’s why, the worst-case time complexity could still lead to O(N).

Average case vs Worst case visualized using Multiplayer:

Sandeep Jain 3 个月前

Introduction of iServer Image Service(Chapter 1)

Evelyn Sun 1 年前

Running custom analysis in real time with Raw Export

GameAnalytics 8 个月前

Comparison with Self-Balancing BST

A few points regarding comparison with Self-Balancing BSTs like Red-Black Trees or AVL Trees:

A SkipList does not provide worst-case performance guarantees as similar to a BST because of its probabilistic nature used for building it. That’s why, the worst case for SkipList is O(N) while for Self-Balancing BST is O(logN).
Randomized balancing schemes used in SkipList are easier to implement and understand than deterministic balancing schemes used in balanced binary search trees. You might remember that after every insertion, a self-balancing BST needs to rotate/restructure so that future operations remain optimized. This overhead is much less in SkipList.
SkipList needs less space compared to Self-balancing BST. The SkipList has less pointer overhead (Level and the next pointer) compared to a self-balancing BST where each node has two pointers (left and right). The probabilistic nature of SkipList heights means that the average number of pointers per node is generally lower.

SortedSets by Redis

Redis uses SkipLists as the underlying data structure for Sorted Sets. In Redis, Sorted Sets are a collection of members that are ordered by a score. Each member is associated with a score, and the members are stored in ascending order of their scores. For example:

# Create a leaderboard
ZADD top_scorers 100 player1 80 player2 75 player3

# Get the top 3 players
ZREVRANGE top_scorers 0 2 WITHSCORES

# Update a player's score
ZINCRBY top_scorers 10 player1

# Find players with scores between 80 and 100
ZRANGEBYSCORE top_scorers 80 100 WITHSCORES

Thus, as you can guess, building a real-time leaderboard using Redis’s SortedSets which under the hood uses SkipList is such a powerful use case of the probabilistic data structure that optimizes space as well as the performance(time) of operations.

Other UseCases

Sourced from Wikipedia, here’s a list of popular applications that use SkipList under the hood:

Apache Portable Runtime implements skip lists.
MemSQL uses lock-free skip lists as its prime indexing structure for its database technology.
Cyrus IMAP server offers a "SkipList" backend DB implementation
Lucene uses skip lists to search delta-encoded posting lists in logarithmic time.
Redis, an ANSI-C open-source persistent key/value store for Posix systems, uses skip lists to implement ordered sets.
Discord uses skip lists to store and update the list of members on a server.
RocksDB uses skip lists for its default Memtable implementation.

Before you go…

I hope you visualize the importance of probabilistic data structures in real-world systems present around the world. The problem is the same: Is there a probabilistic way that can improve the time required to perform a certain operation? The underlying fact with all probabilistic data structures is that they reject a large amount of data to be processed in less time with a certain probability and that is what makes it a popular choice in designing systems.

If you’ve not read it already, here are some articles that we covered earlier about probabilistic data structures:

Bloom Filters

Count-min Sketch

That’s it, folks for this edition of the newsletter. Hope you liked this short edition.

Please consider liking ?? and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your network. Connect with me on Linkedin or Twitter for more technical posts in the future!

Curious Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Resources

SkipList by Wikipedia

MIT Video Lecture

GeeksforGeeks

Curious Engineer

18,526 位关注者

Rathod Rahul

I make backend topics simple and clear | SDE at SolGuruz?

1 个月

Great insights on SkipList and its applications! ????

Shishir Chandra

Engineering leader with experience in building low latency and resilient distributed platforms, cloud technologies, data processing systems. Open source contributor and technical mentor. Ex-Adobe | Apple | InMobi

2 个月

Great with databases where ordered data is the need like dictionaries or for logarithmic time operations.

1 次回应

Sri Chavali

Engineering @ Oracle || x-Microsoft, VMware || Database Internals || Distributed systems || Build and scale data systems || Real-time Analytics || Big Data || ML and Deep learning |

2 个月

Thanks for sharing. SkipList is an interesting probabilistic data structure. Apache Cassandra uses?skip lists?for the secondary?indexes, and Mongodb WiredTiger uses?skip lists?for in-memory?operations. SkipLists maintain accuracy, and no approximation is involved, like in Count-min sketch and Bloom filters.

1 次回应

Renganathan S

Data Engineer at Verizon

2 个月

Very informative. I would like to better understand the memory management of skiplists, considering they have multiple levels and use more memory than simple linked lists. How can we efficiently perform real-time operations(insert, update and delete) frequently without incurring excessive memory overhead ?

1 次回应

查看更多评论

要查看或添加评论，请登录

查看全部

SkipList: A probabilistic data structure

Vivek Bansal

Senior Software Engineer at Uber | Ex-Grab | Ex-Directi

What is a SkipList?

Data Storage

How does insertion work in SkipList?

Observations

Time Complexity

领英推荐

Comparison with Self-Balancing BST

SortedSets by Redis

Other UseCases

Resources

Curious Engineer

18,526 位关注者

更多精彩文章

社区洞察

其他会员也浏览了

Loading Data in GraphDB: Best Practices and Tools

Microsoft Fabric Copilot - first look, Power BI

3 Iron Viz Tips from 3 Iron Viz Champions

Merge Overlapping Rasters Using R and Terra

Platform Nuts & Bolts : Extenbable Data Models

Data Visualization with D3.js: Unlocking the Power of Data

Managing State in Jetpack Compose: Individual MutableStates vs. Single MutableState

?? React Query: Understanding useMutation and Why It's Better than useEffect for Side Effects!

Introduction of iServer Image Service(Chapter 1)

What is a SkipList?

Data Storage

How does insertion work in SkipList?

Observations

Time Complexity

领英推荐

Comparison with Self-Balancing BST

SortedSets by Redis

Other UseCases

Resources

Curious Engineer

18,526 位关注者

1 year to Curious Engineer ??

2024年11月18日

Message Queues vs Message Brokers

2024年11月9日

Introduction to gRPC

2024年11月2日

Non-Functional Requirements

2024年10月26日

QuadTrees

2024年10月19日

Text Based Search: ElasticSearch

2024年10月12日

Sharding vs Partitioning

2024年10月5日

Discovering WebSockets using some Coding

2024年9月21日

Onboarding a new video in a Content Platform

2024年9月14日

MySQL vs PostgreSQL: Replication

2024年9月7日

社区洞察

其他会员也浏览了

Loading Data in GraphDB: Best Practices and Tools

Microsoft Fabric Copilot - first look, Power BI

3 Iron Viz Tips from 3 Iron Viz Champions

Merge Overlapping Rasters Using R and Terra

Platform Nuts & Bolts : Extenbable Data Models

Data Visualization with D3.js: Unlocking the Power of Data

Managing State in Jetpack Compose: Individual MutableStates vs. Single MutableState

?? React Query: Understanding useMutation and Why It's Better than useEffect for Side Effects!

Introduction of iServer Image Service(Chapter 1)