Develop a Programming Competition Platform Similar to LeetCode
Momen Negm
Chief Technology Officer @ T-Vencubator | Data Scientist, Generative AI | Tech entrepreneur - Engineering leader
The system should distribute coding challenges with minimal latency, handle many simultaneous user submissions, update the leaderboard in real time, and stay responsive throughout a coding contest.
Today, we will go over these components:
1. Problem object store and CDN
Store static problem files in object storage
Replicate them across a CDN for low latency
Maintain a problem metadata database
2. Remote code execution engine
Queue-based async submission process
Isolated execution environment
Autoscaling pool of task runners sized according to the queue length
3. Real-time contest leaderboard
MapReduce batch jobs and stream processing over the code submissions
In-place leaderboard updates driven by the stream processor
Historical leaderboard snapshots produced by the heavier MapReduce job
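The queue-driven autoscaling rule in the outline can be expressed as a small sizing function. This is a minimal sketch, assuming each task runner drains a roughly constant number of submissions per minute; the function name, its parameters, and the min/max bounds are hypothetical. A real deployment would feed it from a metric such as Kafka consumer lag and smooth the result to avoid scaling up and down too often.

```python
import math


def desired_runners(queue_lag: int,
                    msgs_per_runner_per_min: float,
                    target_drain_minutes: float = 1.0,
                    min_runners: int = 1,
                    max_runners: int = 100) -> int:
    """Return how many task runners are needed to drain the backlog in time.

    queue_lag: submissions waiting in the queue (e.g. consumer lag).
    msgs_per_runner_per_min: assumed throughput of one runner.
    All names and defaults are illustrative assumptions.
    """
    # How many submissions one runner can clear within the target window.
    capacity_per_runner = msgs_per_runner_per_min * target_drain_minutes
    needed = math.ceil(queue_lag / capacity_per_runner) if queue_lag > 0 else 0
    # Keep a warm minimum and cap the pool to protect the cluster.
    return max(min_runners, min(max_runners, needed))
```

For example, a backlog of 90 submissions with runners that each handle 30 per minute yields 3 runners, while an empty queue falls back to the warm minimum of 1.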
Problem Object Store and CDN
At the core of the platform is its collection of coding challenges, stored as static text files in a problem object store such as Amazon S3. Each file contains the problem description, sample inputs and outputs, constraints, and any other details users need to understand the challenge.
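One way to split a problem between the object store and the problem metadata database is to keep the full statement as a static file and store only a small, queryable row in the database. The sketch below assumes a JSON file format, a `problems/<id>/statement.json` key layout, and default time and memory limits; all of these names and values are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Problem:
    problem_id: str
    title: str
    description: str
    constraints: list[str]
    samples: list[dict]          # e.g. [{"input": "1 2", "output": "3"}]
    time_limit_ms: int = 2000    # hypothetical default limits
    memory_limit_mb: int = 256


def statement_key(problem_id: str) -> str:
    """Object-store key for the static problem file (hypothetical layout)."""
    return f"problems/{problem_id}/statement.json"


def statement_body(p: Problem) -> bytes:
    """Serialize the full statement; this is the file the CDN caches."""
    return json.dumps(asdict(p), sort_keys=True).encode("utf-8")


def metadata_row(p: Problem) -> dict:
    """Small record for the problem metadata database (no large text)."""
    return {
        "problem_id": p.problem_id,
        "title": p.title,
        "statement_key": statement_key(p.problem_id),
        "time_limit_ms": p.time_limit_ms,
        "memory_limit_mb": p.memory_limit_mb,
    }
```

Keeping the description out of the metadata row means listing or searching problems never touches the large static files, which are served only through the CDN.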
To deliver problem statements quickly and reliably to users worldwide, the system serves them through a Content Delivery Network (CDN): a geographically distributed set of edge servers that cache content close to users. When a user opens a problem, the CDN routes the request to a nearby edge server, which returns its cached copy or, on a cache miss, fetches the file once from the object store. This significantly reduces latency and improves the user experience on a platform where quick response times are crucial.
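The behaviour of a single edge server can be modelled as a read-through cache with a time-to-live (TTL). This is a toy sketch, not how any particular CDN is implemented: the `EdgeCache` class, the TTL value, and the injectable clock are assumptions made so the hit/miss logic is easy to see and test.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one CDN edge: serve from cache, else fetch from origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (expiry, body)
        self.origin_fetches = 0

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]              # hit: no round trip to the object store
        body = self._fetch(key)          # miss or expired: go to the origin
        self.origin_fetches += 1
        self._entries[key] = (now + self._ttl, body)
        return body
```

Because problem statements rarely change during a contest, a long TTL lets almost every request be answered at the edge, and the object store sees roughly one fetch per edge per TTL window.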
Code Execution Engine
This engine stores, compiles, and executes user-submitted code, so it must be secure, scalable, and able to support many programming languages. It typically runs in a self-scaling, isolated environment, such as a cluster of Docker containers, so that untrusted code is sandboxed and cannot harm the host or other submissions.
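A minimal sketch of how a runner might assemble a locked-down `docker run` invocation for one submission. The flags are standard Docker CLI options; the function name, the image tag, the `/sandbox` mount point, and the specific limits are illustrative assumptions. A production sandbox would typically add a seccomp profile or a stronger isolation runtime such as gVisor.

```python
import shlex


def sandbox_command(image: str, host_dir: str, run_cmd: list[str],
                    memory_mb: int = 256, cpus: float = 1.0,
                    max_pids: int = 64) -> list[str]:
    """Build a `docker run` argv that executes untrusted code in isolation.

    All names and default limits here are hypothetical examples.
    """
    return [
        "docker", "run",
        "--rm",                                  # remove the container on exit
        "--network", "none",                     # no network access
        "--memory", f"{memory_mb}m",             # hard memory cap
        "--cpus", str(cpus),                     # CPU quota
        "--pids-limit", str(max_pids),           # guards against fork bombs
        "--read-only",                           # read-only root filesystem
        "--tmpfs", "/tmp",                       # writable scratch space (e.g. build output)
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges:true",
        "--volume", f"{host_dir}:/sandbox:ro",   # submission mounted read-only
        "--workdir", "/sandbox",
        image,
        *run_cmd,
    ]


if __name__ == "__main__":
    cmd = sandbox_command("python:3.12-slim", "/srv/submissions/abc123",
                          ["python3", "main.py"])
    print(shlex.join(cmd))
```

Supporting another language is then mostly a matter of choosing a different image and run command; the isolation flags stay the same.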
When a user submits a solution, it undergoes the following process:
1. Submission Processor service
The Submission Processor first stores each submission in a Submission Object Store such as S3. The store absorbs bursts of incoming code, and each submission is tagged with unique metadata so it can be easily retrieved and processed later.
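The intake step can be sketched with an in-memory dictionary standing in for the object store. The key layout, the metadata fields, and the `SubmissionStore` class are hypothetical; against real S3, the write would be a call such as boto3's `put_object` with `Bucket`, `Key`, `Body`, and `Metadata`, or the metadata could live in a separate database.

```python
import hashlib
import uuid
from datetime import datetime, timezone


class SubmissionStore:
    """In-memory stand-in for a Submission Object Store (hypothetical)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}   # object key -> source bytes
        self.metadata: dict[str, dict] = {}   # submission_id -> metadata

    def put_submission(self, user_id: str, problem_id: str,
                       language: str, source: str) -> str:
        submission_id = uuid.uuid4().hex      # unique ID for later lookup
        key = f"submissions/{problem_id}/{user_id}/{submission_id}"
        body = source.encode("utf-8")
        self.objects[key] = body
        self.metadata[submission_id] = {
            "submission_id": submission_id,
            "user_id": user_id,
            "problem_id": problem_id,
            "language": language,
            "object_key": key,
            "sha256": hashlib.sha256(body).hexdigest(),  # integrity check
            "submitted_at": datetime.now(timezone.utc).isoformat(),
        }
        return submission_id

    def get_source(self, submission_id: str) -> str:
        key = self.metadata[submission_id]["object_key"]
        return self.objects[key].decode("utf-8")
```

The returned `submission_id` is what the next step puts on the queue, so the message stays small while the code itself remains in object storage.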
2. Queue (Kafka)
Once the submission is stored, a JSON message containing the submission and problem IDs is published to Kafka. Kafka acts as a distributed message queue, enabling asynchronous processing and decoupling submission intake from the execution phase.
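A sketch of the message format, assuming a hypothetical `submissions` topic and adding `user_id`, `language`, and a timestamp beyond the submission and problem IDs the text mentions. The code only builds and parses the key/value bytes; with the kafka-python client, sending would be along the lines of `KafkaProducer(bootstrap_servers=...).send(TOPIC, key=key, value=value)`.

```python
import json
from datetime import datetime, timezone

TOPIC = "submissions"  # hypothetical topic name
REQUIRED_FIELDS = {"submission_id", "problem_id"}


def build_submission_message(submission_id: str, problem_id: str,
                             user_id: str, language: str) -> tuple[bytes, bytes]:
    """Return (key, value) bytes for one queued submission."""
    payload = {
        "submission_id": submission_id,
        "problem_id": problem_id,
        "user_id": user_id,          # assumed extra field
        "language": language,        # assumed extra field
        "enqueued_at": datetime.now(timezone.utc).isoformat(),
    }
    # Keying by submission ID spreads submissions evenly across partitions.
    key = submission_id.encode("utf-8")
    value = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return key, value


def parse_submission_message(value: bytes) -> dict:
    """Decode a message on the runner side and validate required fields."""
    msg = json.loads(value)
    missing = REQUIRED_FIELDS - msg.keys()
    if missing:
        raise ValueError(f"message missing fields: {sorted(missing)}")
    return msg
```

Keying by `user_id` instead would keep each user's submissions in order on one partition, at the cost of hot partitions during a busy contest.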
3. Task Runners
Task Runners consume messages from Kafka and fetch the user's code and the problem's test cases from object storage. They execute the code in a scalable, secure, and isolated environment that supports multiple programming languages, and evaluate it against the predefined test cases for correctness and for performance metrics such as execution time and memory usage.
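The core judging loop of a runner can be sketched with `subprocess.run`. In this simplified version the program runs directly as a child process, not inside the sandbox, and memory measurement is omitted (in a container it would typically come from cgroup statistics). The verdict names, the whitespace-insensitive output comparison, and the stop-at-first-failure policy are assumptions.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class TestCase:
    stdin: str
    expected: str


@dataclass
class Verdict:
    status: str        # "Accepted", "Wrong Answer", "Runtime Error", "Time Limit Exceeded"
    passed: int
    total: int
    max_time_ms: float


def judge(cmd: list[str], tests: list[TestCase], time_limit_s: float = 2.0) -> Verdict:
    """Run `cmd` once per test case and stop at the first failure."""
    passed, max_ms = 0, 0.0
    for tc in tests:
        start = time.perf_counter()
        try:
            proc = subprocess.run(cmd, input=tc.stdin, capture_output=True,
                                  text=True, timeout=time_limit_s)
        except subprocess.TimeoutExpired:       # child is killed by subprocess.run
            return Verdict("Time Limit Exceeded", passed, len(tests), time_limit_s * 1000)
        elapsed_ms = (time.perf_counter() - start) * 1000
        max_ms = max(max_ms, elapsed_ms)
        if proc.returncode != 0:
            return Verdict("Runtime Error", passed, len(tests), max_ms)
        # Compare token by token so trailing whitespace does not matter.
        if proc.stdout.split() != tc.expected.split():
            return Verdict("Wrong Answer", passed, len(tests), max_ms)
        passed += 1
    return Verdict("Accepted", passed, len(tests), max_ms)


if __name__ == "__main__":
    solution = [sys.executable, "-c", "a, b = map(int, input().split()); print(a + b)"]
    print(judge(solution, [TestCase("1 2\n", "3"), TestCase("10 20\n", "30")]))
```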
4. Submission Object Storage
After execution, the Task Runners collect the results (correctness, performance metrics, and other relevant data) and store them in a metadata DB and in object storage. This data is used to show the user's progress on their profile and to rank participants on the leaderboard during a contest.
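A sketch of the metadata DB side, using SQLite as a stand-in for whatever database the platform actually uses. The table name, the columns, and the `report_key` pointer to a fuller report in object storage are hypothetical. `INSERT OR REPLACE` keyed on the submission ID keeps the write idempotent if a message is delivered to a runner more than once.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS submission_results (
    submission_id TEXT PRIMARY KEY,
    user_id       TEXT NOT NULL,
    problem_id    TEXT NOT NULL,
    status        TEXT NOT NULL,
    passed        INTEGER NOT NULL,
    total         INTEGER NOT NULL,
    time_ms       REAL,
    memory_kb     INTEGER,
    submitted_at  TEXT NOT NULL,
    report_key    TEXT   -- pointer to the full judge report in object storage
)
"""


def save_result(conn: sqlite3.Connection, row: dict) -> None:
    """Upsert one result; re-processing the same submission overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO submission_results VALUES "
        "(:submission_id, :user_id, :problem_id, :status, :passed, :total, "
        ":time_ms, :memory_kb, :submitted_at, :report_key)",
        row,
    )
    conn.commit()


def solved_problems(conn: sqlite3.Connection, user_id: str) -> list[str]:
    """Problems the user has solved, e.g. for rendering a profile page."""
    rows = conn.execute(
        "SELECT DISTINCT problem_id FROM submission_results "
        "WHERE user_id = ? AND status = 'Accepted' ORDER BY problem_id",
        (user_id,),
    ).fetchall()
    return [r[0] for r in rows]
```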
Contest Leaderboard
Building a contest leaderboard means extracting, processing, and storing submission data so that participant rankings stay accurate and update in real time. The system typically reads data from the Submission Object Store, processes it with batch MapReduce jobs and stream processing, and writes the results to a leaderboard database.
Data Extraction
The process begins with data extraction from the Submission Object Store. This store contains all user submissions, including code, execution results (such as pass/fail status, execution time, and memory usage), and user identifiers. For leaderboard computation, the key fields are submission time, problem ID, user ID, and whether the execution succeeded.
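Extraction amounts to projecting each full submission record down to those key fields and keeping only submissions made during the contest. In this sketch the record field names (`submitted_at`, `status`, and so on), the ISO timestamp format, and the `LeaderboardEvent` type are assumptions; the minute offset from the contest start is precomputed because penalty-based rankings usually need it.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LeaderboardEvent:
    user_id: str
    problem_id: str
    minute: int        # minutes since contest start
    accepted: bool


def extract_events(records: list[dict], contest_start: datetime,
                   contest_end: datetime) -> list[LeaderboardEvent]:
    """Keep only the fields the leaderboard needs, within the contest window."""
    events = []
    for r in records:
        ts = datetime.fromisoformat(r["submitted_at"])
        if not (contest_start <= ts < contest_end):
            continue  # ignore practice submissions outside the contest
        events.append(LeaderboardEvent(
            user_id=r["user_id"],
            problem_id=r["problem_id"],
            minute=int((ts - contest_start).total_seconds() // 60),
            accepted=(r["status"] == "Accepted"),
        ))
    events.sort(key=lambda e: e.minute)  # chronological order for replay
    return events
```

Code, memory figures, and other bulky fields are dropped at this stage, so the downstream processing jobs handle only small, fixed-size events.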
Data Processing
Once the data is extracted, the next step is processing it to compute rankings. This is where batch and streaming processes come into play.
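To make the two paths concrete, the sketch below computes the same standings both ways: a streaming updater that adjusts scores in place as each judged submission arrives, and a batch job written in map/shuffle/reduce style that recomputes everything from the full event log. The scoring rule is an assumption borrowed from ICPC-style contests (more problems solved ranks higher; ties are broken by lower penalty, counted as minutes to the first accepted submission plus 20 minutes per earlier rejected attempt). Events are `(user, problem, minute, accepted)` tuples, and all names are hypothetical.

```python
from collections import defaultdict

PENALTY_PER_WRONG = 20  # minutes; ICPC-style rule, assumed


def _rank(rows):
    # More solved first, then lower penalty, then user ID for a stable order.
    return sorted(rows, key=lambda r: (-r[1], r[2], r[0]))


class StreamingLeaderboard:
    """Updates standings in place as each judged submission arrives."""

    def __init__(self):
        self.wrong = defaultdict(int)               # (user, problem) -> rejections so far
        self.solved_at = {}                         # (user, problem) -> minute of first accept
        self.score = defaultdict(lambda: [0, 0])    # user -> [solved, penalty]

    def on_event(self, user, problem, minute, accepted):
        key = (user, problem)
        if key in self.solved_at:
            return                                  # later submissions do not count
        _ = self.score[user]                        # list the user even with 0 solved
        if not accepted:
            self.wrong[key] += 1
            return
        self.solved_at[key] = minute
        self.score[user][0] += 1
        self.score[user][1] += minute + PENALTY_PER_WRONG * self.wrong[key]

    def standings(self):
        return _rank((u, s[0], s[1]) for u, s in self.score.items())


def map_phase(events):
    # Emit ((user, problem), (minute, accepted)) pairs.
    for user, problem, minute, accepted in events:
        yield (user, problem), (minute, accepted)


def reduce_problem(attempts):
    # All attempts for one (user, problem) -> (solved, penalty).
    wrong = 0
    for minute, accepted in sorted(attempts):
        if accepted:
            return 1, minute + PENALTY_PER_WRONG * wrong
        wrong += 1
    return 0, 0


def batch_leaderboard(events):
    groups = defaultdict(list)                      # shuffle: group by (user, problem)
    for key, value in map_phase(events):
        groups[key].append(value)
    totals = defaultdict(lambda: [0, 0])
    for (user, _problem), attempts in groups.items():
        solved, penalty = reduce_problem(attempts)
        totals[user][0] += solved
        totals[user][1] += penalty
    return _rank((u, s, p) for u, (s, p) in totals.items())
```

The streaming path assumes events arrive in time order; the batch job sorts each group by time, so periodically running it and snapshotting its output both provides historical leaderboard states and corrects any drift from late or out-of-order events.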
Storing Data in the Leaderboard Database
After processing, the computed leaderboard data is stored in two formats: