Develop a Programming Competition Platform Similar to LeetCode


The system should efficiently handle and distribute coding challenges with minimal latency, manage simultaneous user submissions, offer real-time leaderboard updates, and remain responsive throughout coding contests.


Today, we will go over these components:

1. Delivery of static content such as problem prompts and submission results.

Store static files in object storage

Replicate them across a CDN for low latency

Maintain a problem metadata database

2. Remote code execution engine

Queue-based asynchronous submission processing

Isolated execution environment

Autoscaling pool of task runners sized by queue length

3. Real-time contest leaderboard

MapReduce and stream processing of code submissions

In-place leaderboard updates driven by the stream processing

Historical leaderboard snapshots produced by the heavier MapReduce job
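The autoscaling rule in the outline sizes the task-runner pool from the queue backlog. As a rough illustration, that rule can be written as a simple proportional policy. This is a minimal sketch built on assumed numbers. `desired_runners`, the target of 50 queued submissions per runner, and the 2 to 100 runner bounds are all hypothetical. In practice the backlog figure might come from a metric such as Kafka consumer lag.

```python
import math


def desired_runners(queue_lag: int, msgs_per_runner: int = 50,
                    min_runners: int = 2, max_runners: int = 100) -> int:
    """Return how many task runners to keep for the current queue backlog.

    queue_lag: submissions waiting to be processed (e.g. consumer lag).
    msgs_per_runner: hypothetical backlog one runner is expected to absorb.
    """
    if queue_lag < 0:
        raise ValueError("queue_lag must be non-negative")
    needed = math.ceil(queue_lag / msgs_per_runner)
    # Clamp so the pool never drops below a warm minimum or exceeds the budget.
    return max(min_runners, min(max_runners, needed))


print(desired_runners(0))       # 2   (minimum pool)
print(desired_runners(1234))    # 25  (ceil(1234 / 50))
print(desired_runners(10_000))  # 100 (capped)
```

Keeping a small warm minimum means the first submissions of a contest do not wait for runners to start.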

Problem Object Store and CDN

The core of a coding contest platform is its collection of coding challenges, stored as static text files in a problem object store such as S3. Each file contains the problem's description, sample inputs and outputs, constraints, and any other details users need to understand the challenge.
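A problem file could look like the following minimal sketch, which serializes one problem to JSON before it is uploaded to the object store. The `Problem` fields, the `problems/<id>/statement.json` key layout, and the sample problem are all hypothetical. With boto3, the upload itself would be a call such as `s3.put_object(Bucket=..., Key=p.object_key(), Body=p.to_json())`. That call is omitted so the sketch runs offline.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Problem:
    problem_id: str
    title: str
    description: str
    constraints: list[str]
    sample_inputs: list[str]
    sample_outputs: list[str]

    def object_key(self) -> str:
        # Hypothetical key layout inside the problem bucket.
        return f"problems/{self.problem_id}/statement.json"

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "Problem":
        return cls(**json.loads(raw))


p = Problem(
    problem_id="two-sum",
    title="Two Sum",
    description="Return the indices of the two numbers that add up to target.",
    constraints=["2 <= len(nums) <= 10^4"],
    sample_inputs=["nums = [2, 7, 11, 15], target = 9"],
    sample_outputs=["[0, 1]"],
)
print(p.object_key())  # problems/two-sum/statement.json
print(p.to_json())
```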

To deliver these problem statements quickly and reliably to users worldwide, the system uses a Content Delivery Network (CDN). A CDN is a set of geographically distributed servers that serve content to users based on their location. When a user requests a coding problem, the CDN routes the request to the nearest server, so the content travels a shorter distance. This significantly reduces latency and improves the user experience on a platform where fast response times are crucial.
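Real CDNs usually steer users with DNS-based or anycast routing, and they weigh load and network conditions as well as geography. The sketch below only illustrates the "closest server" idea. It picks the edge location with the smallest great-circle (haversine) distance to the user. The edge names and coordinates are hypothetical.

```python
import math

# Hypothetical edge locations: name -> (latitude, longitude) in degrees.
EDGES = {
    "frankfurt": (50.11, 8.68),
    "virginia": (38.95, -77.45),
    "singapore": (1.35, 103.82),
}
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_edge(user_location: tuple[float, float]) -> str:
    """Pick the edge whose great-circle distance to the user is smallest."""
    return min(EDGES, key=lambda name: haversine_km(user_location, EDGES[name]))


print(nearest_edge((48.86, 2.35)))    # Paris -> frankfurt
print(nearest_edge((35.68, 139.69)))  # Tokyo -> singapore
```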

Code Execution Engine

This engine stores, compiles, and executes user-submitted code, so it must be secure, scalable, and able to support many programming languages. It typically runs in an autoscaling, isolated environment, such as a cluster of Docker containers. This sandboxes untrusted code and contains potential security threats.
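The text mentions Docker containers, so here is a minimal sketch of how a runner might build a locked-down `docker run` command for one submission. The flags are standard `docker run` options. The image, mount path, resource limits, and the `sandbox_command` helper are assumptions. The sketch only builds the command and does not execute it.

```python
import shlex


def sandbox_command(image: str, workdir: str, run_cmd: list[str],
                    memory: str = "256m", cpus: str = "1",
                    pids: int = 64) -> list[str]:
    """Build a `docker run` argv that isolates a single submission."""
    return [
        "docker", "run",
        "--rm",                          # remove the container when it exits
        "--network", "none",             # user code gets no network access
        "--memory", memory,              # hard memory cap
        "--cpus", cpus,                  # CPU quota
        "--pids-limit", str(pids),       # limit process count (fork bombs)
        "--read-only",                   # immutable root filesystem
        "--tmpfs", "/tmp",               # small writable scratch area
        "-v", f"{workdir}:/sandbox:ro",  # submission mounted read-only
        "-w", "/sandbox",                # start in the submission directory
        image, *run_cmd,
    ]


cmd = sandbox_command("python:3.12-slim", "/tmp/sub-42", ["python", "main.py"])
print(shlex.join(cmd))
```

The argv list could then be passed to `subprocess.run(cmd, timeout=...)`. Keeping it as a list rather than a shell string avoids shell-quoting and injection problems.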

When a user submits a solution, it undergoes the following process:

1. Submission Processor service

The Submission Processor service first stores user submissions in a Submission Object Store such as S3. The store absorbs the influx of user code, and the service tags each submission with unique metadata for easy retrieval and processing.
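The intake step might look like this minimal sketch, in which a plain dictionary stands in for the Submission Object Store. The key layout and metadata fields (`submission_id`, `language`, `sha256`, and so on) are hypothetical. Against S3, boto3's `put_object` accepts the same information through its `Key`, `Body`, and `Metadata` parameters.

```python
import hashlib
import uuid
from datetime import datetime, timezone

# In-memory stand-in for the Submission Object Store (e.g. an S3 bucket).
OBJECT_STORE: dict[str, dict] = {}


def store_submission(user_id: str, problem_id: str,
                     language: str, code: str) -> tuple[str, str]:
    """Store the code with unique metadata; return (submission_id, object key)."""
    submission_id = str(uuid.uuid4())
    key = f"submissions/{problem_id}/{submission_id}"  # hypothetical layout
    body = code.encode("utf-8")
    OBJECT_STORE[key] = {
        "body": body,
        "metadata": {
            "submission_id": submission_id,
            "user_id": user_id,
            "problem_id": problem_id,
            "language": language,
            "submitted_at": datetime.now(timezone.utc).isoformat(),
            "sha256": hashlib.sha256(body).hexdigest(),
        },
    }
    return submission_id, key


sid, key = store_submission("alice", "two-sum", "python", "print(42)")
print(key)
print(OBJECT_STORE[key]["metadata"]["user_id"])  # alice
```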

2. Queue (Kafka)

Once a submission is stored, a JSON message containing the submission and problem IDs is published to Kafka. Kafka acts as a distributed message queue. It enables asynchronous processing and decouples submission intake from the execution phase.
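The message only needs to carry identifiers, because the Task Runners fetch the code and test cases from object storage. In this minimal sketch a `queue.Queue` stands in for the Kafka topic so the code runs without a broker. The field names and helpers are hypothetical. With the kafka-python client, the publish step would roughly become `producer.send("submissions", value=message)` on a `KafkaProducer` configured with a JSON `value_serializer`. The topic name there is also an assumption.

```python
import json
import queue

# A local queue stands in for the Kafka topic so the sketch runs without a broker.
submissions_topic: queue.Queue = queue.Queue()


def publish_submission(submission_id: str, problem_id: str) -> None:
    """Send a small JSON message; the code itself stays in object storage."""
    message = {"submission_id": submission_id, "problem_id": problem_id}
    submissions_topic.put(json.dumps(message).encode("utf-8"))


def consume_one(timeout_s: float = 1.0) -> dict:
    """What a Task Runner would do for each message it receives."""
    raw = submissions_topic.get(timeout=timeout_s)
    return json.loads(raw)


publish_submission("sub-001", "two-sum")
print(consume_one())  # {'submission_id': 'sub-001', 'problem_id': 'two-sum'}
```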

3. Task Runners

Task Runners, consuming messages from Kafka, retrieve the user's code and the associated problem's test cases from the object storage. They execute the code in a scalable, secure, and isolated environment, supporting multiple programming languages. The Task Runners assess code against predefined test cases, focusing on correctness and performance metrics like execution time and memory usage.
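The core of a Task Runner's evaluation loop can be sketched as running the program once per test case with a time limit and comparing its output. This is a minimal sketch, not a sandbox. It calls `subprocess.run` directly, whereas in the design above this would happen inside the isolated container. It also does not measure memory. The `judge` and `run_test_case` helpers and the demo problem are hypothetical.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class CaseResult:
    passed: bool
    elapsed_ms: float
    error: str = ""


def run_test_case(cmd: list[str], stdin_data: str, expected: str,
                  timeout_s: float = 2.0) -> CaseResult:
    """Run the program on one input and compare its stdout to the expected output."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, input=stdin_data, capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return CaseResult(False, timeout_s * 1000, "time limit exceeded")
    elapsed_ms = (time.perf_counter() - start) * 1000
    if proc.returncode != 0:
        return CaseResult(False, elapsed_ms, "runtime error: " + proc.stderr.strip()[-200:])
    ok = proc.stdout.strip() == expected.strip()
    return CaseResult(ok, elapsed_ms, "" if ok else "wrong answer")


def judge(cmd: list[str], test_cases: list[tuple[str, str]]) -> dict:
    """Evaluate every test case and summarise correctness and timing."""
    results = [run_test_case(cmd, inp, out) for inp, out in test_cases]
    return {
        "passed": sum(r.passed for r in results),
        "total": len(results),
        "max_time_ms": max(r.elapsed_ms for r in results),
        "errors": [r.error for r in results if r.error],
    }


# Demo "submission": reads two integers from stdin and prints their sum.
user_cmd = [sys.executable, "-c", "a, b = map(int, input().split()); print(a + b)"]
cases = [("1 2", "3"), ("10 -4", "6"), ("5 5", "11")]  # last case is deliberately wrong
report = judge(user_cmd, cases)
print(report["passed"], "/", report["total"])  # 2 / 3
```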

4. Submission Object Storage

After execution, the Task Runners collect the results (correctness, performance metrics, and other relevant data) and store them in a metadata DB and in object storage. This data is used to show the user's progress on their profile and their standing on the leaderboard during the contest.
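The metadata DB side can be sketched with SQLite standing in for whatever relational store the platform uses. This is a minimal sketch. The table layout, the `status` values, and the `report_key` column, which points at the full result in object storage, are assumptions. The final query shows how a profile page might count a user's solved problems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE submission_results (
        submission_id TEXT PRIMARY KEY,
        user_id       TEXT NOT NULL,
        problem_id    TEXT NOT NULL,
        status        TEXT NOT NULL,  -- e.g. 'accepted', 'wrong answer'
        passed_tests  INTEGER NOT NULL,
        total_tests   INTEGER NOT NULL,
        max_time_ms   REAL,
        max_memory_kb INTEGER,
        report_key    TEXT            -- object-store key of the full report
    )
""")


def save_result(conn: sqlite3.Connection, result: dict) -> None:
    """Persist one execution result produced by a Task Runner."""
    conn.execute(
        """INSERT INTO submission_results VALUES (
               :submission_id, :user_id, :problem_id, :status, :passed_tests,
               :total_tests, :max_time_ms, :max_memory_kb, :report_key)""",
        result,
    )
    conn.commit()


save_result(conn, {"submission_id": "s1", "user_id": "alice", "problem_id": "two-sum",
                   "status": "accepted", "passed_tests": 3, "total_tests": 3,
                   "max_time_ms": 41.5, "max_memory_kb": 9200,
                   "report_key": "results/s1.json"})
save_result(conn, {"submission_id": "s2", "user_id": "alice", "problem_id": "fizz-buzz",
                   "status": "wrong answer", "passed_tests": 1, "total_tests": 4,
                   "max_time_ms": 12.0, "max_memory_kb": 8800,
                   "report_key": "results/s2.json"})

# Profile page: how many distinct problems has this user solved?
solved = conn.execute(
    "SELECT COUNT(DISTINCT problem_id) FROM submission_results "
    "WHERE user_id = ? AND status = 'accepted'", ("alice",)).fetchone()[0]
print(solved)  # 1
```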

Contest Leaderboard

Building a leaderboard for coding contests requires extracting, processing, and storing data so that participant rankings are accurate and up to date in real time. The system typically reads data from the Submission Object Store, processes it with batch MapReduce and streaming techniques, and stores the results in a leaderboard database.

Data Extraction

The process begins with data extraction from the Submission Object Store. This store contains all user submissions, including code, execution results (like pass/fail status, execution time, and memory usage), and user identifiers. For leaderboard computation, key information such as submission time, problem ID, user ID, and execution success is crucial.
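Extraction can be as simple as projecting each stored submission record down to the fields the leaderboard needs. The record layout and the `LeaderboardEvent` type below are hypothetical. The point is that bulky fields such as the source code are dropped before aggregation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LeaderboardEvent:
    user_id: str
    problem_id: str
    submitted_at: datetime
    accepted: bool
    exec_time_ms: float


def extract(record: dict) -> LeaderboardEvent:
    """Keep only the fields the leaderboard needs from a stored submission."""
    return LeaderboardEvent(
        user_id=record["user_id"],
        problem_id=record["problem_id"],
        submitted_at=datetime.fromisoformat(record["submitted_at"]),
        accepted=record["status"] == "accepted",
        exec_time_ms=float(record["max_time_ms"]),
    )


stored = {
    "submission_id": "s1", "user_id": "alice", "problem_id": "two-sum",
    "submitted_at": "2024-05-01T10:15:00+00:00", "status": "accepted",
    "max_time_ms": 41.5, "max_memory_kb": 9200,
    "code": "print(42)",  # bulky fields like this are dropped
}
event = extract(stored)
print(event.user_id, event.problem_id, event.accepted)  # alice two-sum True
```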

Data Processing

Once the data is extracted, the next step is processing it to compute rankings. This is where batch and streaming processes come into play.

  1. MapReduce for Batch Processing: MapReduce is used for batch processing of submission data. In the Map phase, submissions are categorized and organized, for example by mapping each submission to its corresponding problem and user. In the Reduce phase, the mapped submissions are aggregated to compute per-user metrics such as total problems solved, average execution time, and success rate.
  2. Streaming for Real-Time Updates: Alongside batch processing, stream processing keeps the leaderboard current in real time. These processes continuously watch the Submission Object Store for new data. Each new submission is processed as it arrives, updating metrics such as recently solved problems or improvements in execution time.
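The batch and streaming paths can be contrasted in a few lines of plain Python. This is a single-process sketch of the MapReduce pattern (map, shuffle, reduce) and of an in-place streaming update, not a distributed job. The sample submissions are made up. Average execution time is computed over accepted runs only. Ranking by solved count with ties broken by user ID is also an assumption.

```python
from collections import defaultdict

# Made-up sample: (user_id, problem_id, accepted, exec_time_ms)
SUBMISSIONS = [
    ("alice", "p1", True, 120.0),
    ("alice", "p2", False, 300.0),
    ("alice", "p2", True, 180.0),
    ("bob", "p1", True, 90.0),
    ("bob", "p3", False, 50.0),
]


# Batch path: map -> shuffle -> reduce
def map_phase(submissions):
    """Emit (key, value) pairs keyed by user."""
    for user, problem, accepted, ms in submissions:
        yield user, (problem, accepted, ms)


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(user, values):
    """Aggregate one user's submissions into leaderboard metrics."""
    solved = {problem for problem, ok, _ in values if ok}
    ok_times = [ms for _, ok, ms in values if ok]
    return {
        "user": user,
        "solved": len(solved),
        "avg_time_ms": sum(ok_times) / len(ok_times) if ok_times else None,
        "success_rate": sum(ok for _, ok, _ in values) / len(values),
    }


batch = {user: reduce_phase(user, vals)
         for user, vals in shuffle(map_phase(SUBMISSIONS)).items()}


# Streaming path: update standings in place as each event arrives
class StreamingLeaderboard:
    def __init__(self):
        self.solved = defaultdict(set)  # user_id -> set of solved problem_ids

    def on_submission(self, user, problem, accepted, ms):
        if accepted:
            self.solved[user].add(problem)  # in-place update, no full recompute

    def top(self, n=10):
        ranked = sorted(self.solved.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        return [(user, len(problems)) for user, problems in ranked[:n]]


live = StreamingLeaderboard()
for event in SUBMISSIONS:  # in production these would arrive from a stream
    live.on_submission(*event)

print(batch["alice"])  # alice: 2 solved, 150.0 ms average over accepted runs, success rate 2/3
print(live.top())      # [('alice', 2), ('bob', 1)]
```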

Storing Data in the Leaderboard Database

After processing, the computed leaderboard data is stored in two formats:

  1. SQL database. This leaderboard database is optimized for fast reads and writes, since leaderboard updates are frequent and users expect quick access to their standings. Indexes on user_id, contest_id, and rank can be added to optimize query performance, so that even complex queries, such as filtering by contest or rank, run quickly.
  2. NoSQL historical database. This database supports fraud detection and other analytics performed after the contest. It stores snapshots of the contest state at specific points in time. These snapshots make it possible to understand how the contest evolved, detect suspicious events, and compute additional metrics on demand.
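Both storage formats can be shown in one minimal sketch. SQLite stands in for the SQL leaderboard database, and a JSON document stands in for a snapshot written to the NoSQL historical store. Table and column names other than the indexed `user_id`, `contest_id`, and `rank` are hypothetical. Breaking ties on lower total execution time is also an assumption. The upsert syntax needs SQLite 3.24 or newer.

```python
import json
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE leaderboard (
        contest_id    TEXT    NOT NULL,
        user_id       TEXT    NOT NULL,
        solved        INTEGER NOT NULL,
        total_time_ms REAL    NOT NULL,
        rank          INTEGER,
        PRIMARY KEY (contest_id, user_id)
    );
    CREATE INDEX idx_leaderboard_user ON leaderboard (user_id);
    CREATE INDEX idx_leaderboard_contest_rank ON leaderboard (contest_id, rank);
""")


def upsert_score(contest_id, user_id, solved, total_time_ms):
    """In-place update of one participant's row (insert on first appearance)."""
    db.execute("""
        INSERT INTO leaderboard (contest_id, user_id, solved, total_time_ms)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (contest_id, user_id)
        DO UPDATE SET solved = excluded.solved, total_time_ms = excluded.total_time_ms
    """, (contest_id, user_id, solved, total_time_ms))


def recompute_ranks(contest_id):
    """Rank = 1 + number of rivals with more solves, or equal solves and less time."""
    db.execute("""
        UPDATE leaderboard SET rank = 1 + (
            SELECT COUNT(*) FROM leaderboard AS o
            WHERE o.contest_id = leaderboard.contest_id
              AND (o.solved > leaderboard.solved
                   OR (o.solved = leaderboard.solved
                       AND o.total_time_ms < leaderboard.total_time_ms)))
        WHERE contest_id = ?
    """, (contest_id,))
    db.commit()


def snapshot(contest_id) -> str:
    """Serialize the current standings as a document for the historical store."""
    rows = db.execute(
        "SELECT user_id, solved, total_time_ms, rank FROM leaderboard "
        "WHERE contest_id = ? ORDER BY rank", (contest_id,)).fetchall()
    return json.dumps({
        "contest_id": contest_id,
        "taken_at": datetime.now(timezone.utc).isoformat(),
        "standings": [{"user_id": u, "solved": s, "total_time_ms": t, "rank": r}
                      for u, s, t, r in rows],
    })


for user, solved, ms in [("alice", 2, 300.0), ("bob", 1, 90.0), ("carol", 2, 250.0)]:
    upsert_score("weekly-1", user, solved, ms)
recompute_ranks("weekly-1")
print(snapshot("weekly-1"))  # carol rank 1, alice rank 2, bob rank 3
```

Taking such snapshots periodically gives the point-in-time history that post-contest analysis relies on. The SQL table always holds only the latest standings.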



More articles by Momen Negm
