LeetCode System Design: A Deep Dive into Scalable Code Execution Platforms
Image: System design overview for the LeetCode platform


Functional Requirements

  1. Users should be able to view coding problems
  2. Users should be able to submit solutions in various programming languages
  3. The system should compile and run the submitted code
  4. The system should test the code against predefined test cases
  5. Users should receive results (success/failure, runtime, memory usage)
  6. The platform should support multiple users simultaneously

Non-Functional Requirements

  1. Scalability: The system should handle millions of submissions daily
  2. Security: User-submitted code must run in isolated environments
  3. Performance: Results should be returned quickly (ideally within seconds)
  4. Availability: The platform should be accessible 99.99% of the time
  5. Consistency: Same code should produce the same results every time

Basic Data Design

  1. Problem Table: ProblemID (Primary Key), Title, Description, Difficulty, TestCases
  2. Submission Table: SubmissionID (Primary Key), UserID (Foreign Key), ProblemID (Foreign Key), SubmittedCode, Language, Status (Pending/Completed), Result (Success/Failure), Runtime, MemoryUsage, SubmissionTime


Image 1: The database design for a basic LeetCode-style system
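
A minimal sketch of this schema, using SQLite for illustration (a production system would more likely use PostgreSQL, as discussed in the Databases section below; the column types are assumptions based on the field names):

import sqlite3

# Minimal illustration of the two tables above; types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Problem (
    ProblemID   INTEGER PRIMARY KEY,
    Title       TEXT NOT NULL,
    Description TEXT NOT NULL,
    Difficulty  TEXT NOT NULL,   -- e.g. 'Easy' / 'Medium' / 'Hard'
    TestCases   TEXT NOT NULL    -- serialized (e.g. JSON) test cases
);

CREATE TABLE Submission (
    SubmissionID   INTEGER PRIMARY KEY,
    UserID         INTEGER NOT NULL,  -- foreign key to a Users table
    ProblemID      INTEGER NOT NULL REFERENCES Problem(ProblemID),
    SubmittedCode  TEXT NOT NULL,
    Language       TEXT NOT NULL,
    Status         TEXT NOT NULL DEFAULT 'Pending',  -- Pending/Completed
    Result         TEXT,   -- Success/Failure, set once completed
    Runtime        REAL,   -- e.g. seconds
    MemoryUsage    REAL,   -- e.g. megabytes
    SubmissionTime TEXT DEFAULT CURRENT_TIMESTAMP
);
""")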

API Design

  1. Get Problem:

GET /api/problems/{problem_id}
Response:
{
  "problem_id": int,
  "title": string,
  "description": string,
  "difficulty": string
}

  2. Submit Solution:

POST /api/submit
Request:
{
  "problem_id": int,
  "code": string,
  "language": string
}
Response:
{
  "submission_id": string
}

  3. Check Submission Status:

GET /api/submissions/{submission_id}
Response:
{
  "status": string,
  "result": string,
  "runtime": float,
  "memory_usage": float
}

  4. List Problems:

GET /api/problems?page={page_number}&limit={limit}
Response:
{
  "problems": [
    { "problem_id": int, "title": string, "difficulty": string },
    ...
  ],
  "total_pages": int
}

System Architecture and Flow

Now, let's dive into how the system actually works and why it's designed this way.

  1. Client (Web Browser): Users interact with a web-based IDE (Integrated Development Environment), typically a lightweight code editor embedded in the webpage.
  2. API Gateway: All requests first go through an API Gateway.

An API Gateway is like a smart receptionist for your system. It directs incoming requests to the right places, handles security checks, and can even do some basic request processing.

  3. Application Servers: These servers handle the core logic of the application. They process requests, interact with databases, and manage the overall flow of operations.

Here's where we encounter our first major design decision.

Why can't we just run user code directly on these servers?

The problem lies in the limits of vertical scaling.

Vertical scaling means adding more power (CPU, RAM) to an existing server. It's like trying to make a car go faster by putting in a bigger engine.

While this can work up to a point, it has limitations:

  • There's a limit to how powerful a single machine can be
  • It's expensive
  • It doesn't provide isolation between different users' code

This is where the concept of horizontal scaling comes in.

Horizontal scaling means adding more machines to your system instead of making one machine more powerful. It's like adding more cars to a delivery fleet instead of trying to make one car do all the deliveries.

  1. Peak Load Scenario: Imagine a coding competition with 10,000 submissions at once. Each submission needs to run against 100 test cases. Each test case takes about 100 milliseconds (0.1 seconds) to run.
  2. Total Workload Calculation: For one submission: 100 test cases × 0.1 seconds = 10 seconds. For all 10,000 submissions: 10,000 × 10 seconds = 100,000 seconds.
  3. Time to Process on a Single Core: 100,000 seconds is about 27.8 hours (100,000 / 3600)
  4. Cores Needed for Quick Processing: If we want to process all this in just 1 minute (60 seconds): cores required = total processing time / target time = 100,000 seconds / 60 seconds ≈ 1,667 cores.
  5. Comparison with Available Hardware: The most powerful Amazon server instances offer up to 128 cores. To get 1,667 cores, we'd need about 13 of these top-tier instances (1,667 / 128).
  6. Why This Matters: It's not practical or cost-effective to use 13 of the most powerful servers for this task. Even if we could, managing the workload across these servers would be complex. This shows why we can't simply use bigger, more powerful computers (vertical scaling).
  7. The Solution: Instead of a few very powerful machines, LeetCode uses many smaller, containerized environments. This allows for better distribution of work, improved security, and easier scaling. It's more flexible: we can easily add or remove processing power as needed.

In essence, the sheer amount of processing power needed makes it impractical to run a system like LeetCode on a handful of traditional servers. This is why such platforms use cloud services, containerization, and distributed computing to handle the massive workload efficiently and cost-effectively.
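
Here is the same back-of-envelope calculation in a few lines of Python:

# Back-of-envelope capacity math from the peak-load scenario above.
submissions = 10_000            # simultaneous submissions
tests_per_submission = 100
seconds_per_test = 0.1

total_cpu_seconds = submissions * tests_per_submission * seconds_per_test
print(total_cpu_seconds)         # 100000.0 CPU-seconds of work
print(total_cpu_seconds / 3600)  # ~27.8 hours on a single core

target_seconds = 60              # we want results within a minute
cores_needed = total_cpu_seconds / target_seconds
print(cores_needed)              # ~1666.7, i.e. about 1,667 cores
print(cores_needed / 128)        # ~13 of the largest 128-core instances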

  4. Code Execution Environment

To solve the problems of vertical scaling and provide the necessary isolation, we use containerization. Let's break down the evolution of this approach:

a. Virtual Machines (VMs)

Virtual Machines are like having several separate computers running inside your physical computer.

Each VM has its own operating system and resources.

Pros:

  • Good isolation
  • Can run different operating systems

Cons:

  • Heavy resource usage
  • Slow to start up
  • Inefficient for running small, short-lived processes (like code submissions)

b. Docker Containers

Docker containers are like lightweight, standardized boxes for your code and all its dependencies.

They're much lighter than VMs because they share the host system's kernel.

Pros:

  • Lightweight and fast to start
  • Efficient resource usage
  • Consistent environment across different systems

Cons:

  • Slightly less isolation than VMs (though still very good)
  • Requires careful configuration for security
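
As a rough illustration, here is a minimal Python sketch of how a worker might launch a submission in a locked-down container via the Docker CLI. The image name, resource limits, and paths are assumptions for illustration, not LeetCode's actual configuration:

import subprocess

def run_in_container(code_path: str, timeout_s: int = 10) -> subprocess.CompletedProcess:
    # Run an untrusted Python file in a throwaway, locked-down container.
    # All limits below are illustrative assumptions.
    cmd = [
        "docker", "run",
        "--rm",                # delete the container when it exits
        "--network", "none",   # no network access for user code
        "--memory", "256m",    # cap memory usage
        "--cpus", "0.5",       # cap CPU usage
        "--pids-limit", "64",  # blunt fork bombs
        "--read-only",         # read-only root filesystem
        "-v", f"{code_path}:/sandbox/main.py:ro",
        "python:3.12-slim",
        "python", "/sandbox/main.py",
    ]
    # Raises subprocess.TimeoutExpired if the code runs too long.
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)

Flags like --network none and --memory are what make containers practical for untrusted code: the process is fenced off from the network and cannot exhaust the host's resources.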


c. Amazon ECS (Elastic Container Service)

Amazon ECS is like a smart manager for your Docker containers.

It handles deploying containers, scaling them based on demand, and ensuring they're healthy.

Pros:

  • Automated management of containers
  • Easy scaling
  • Integration with other AWS services

Cons:

  • Vendor lock-in to AWS
  • Can be complex to set up initially

ECS keeps track of container health through regular checks.

If a container becomes unresponsive or unhealthy, ECS can automatically replace it.

  5. Queue System (e.g., Amazon SQS)

A queue system is like a buffer or waiting line for tasks.

When a user submits code, instead of running it immediately, we place it in a queue. This queue system allows for asynchronous processing.

Asynchronous means "not occurring at the same time". In our system, it means that the user doesn't have to wait for their code to finish running before getting a response.

Here's how it works:

  1. User submits code
  2. System immediately returns a submission ID
  3. Code is placed in the queue
  4. Worker processes pick up code from the queue and execute it
  5. Results are stored in the database
  6. User can check the status using the submission ID

This approach has several benefits:

  • Users get an immediate response
  • The system can handle traffic spikes more easily
  • We can prioritize certain tasks if needed
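
A minimal sketch of the enqueue step, assuming Amazon SQS via boto3 (the queue URL, region, and message shape are illustrative assumptions):

import json
import uuid
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/submissions"  # placeholder

def enqueue_submission(problem_id: int, code: str, language: str) -> str:
    # Assign an ID and hand the job to the queue; execution happens later.
    submission_id = str(uuid.uuid4())
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({
            "submission_id": submission_id,
            "problem_id": problem_id,
            "code": code,
            "language": language,
        }),
    )
    return submission_id  # returned to the user immediately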

  6. Worker Processes: These are the processes that actually execute the code. They pick up submissions from the queue, set up the necessary environment (using Docker containers), run the code with the test cases, and record the results (see the worker sketch after this list).
  7. Databases: We use databases to store problems, submissions, and results. For a system like this, we might use a combination of a relational database (e.g., PostgreSQL) for structured data like user accounts and problems, and a NoSQL database (e.g., MongoDB) for more flexible data like submissions and results.
  8. Caching Layer (e.g., Redis): A cache is like a short-term memory for our system. It stores frequently accessed data to reduce database load and speed up responses (a cache-aside sketch also follows below).
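
Here is a hedged sketch of a worker loop against the SQS queue from the earlier example; run_in_container_and_grade and save_result are hypothetical placeholders standing in for the Docker execution and database write steps:

import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/submissions"  # placeholder

def run_in_container_and_grade(job: dict) -> dict:
    # Hypothetical placeholder: in practice this would run the code in a
    # locked-down container (see the Docker sketch above) and compare its
    # output against the problem's test cases.
    return {"result": "Success", "runtime": 0.0, "memory_usage": 0.0}

def save_result(submission_id: str, result: dict) -> None:
    # Hypothetical placeholder: in practice this would update the
    # Submission row in the database.
    print(submission_id, result)

def worker_loop() -> None:
    # Poll the queue, execute each submission, record the result.
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling avoids busy-waiting
        )
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            result = run_in_container_and_grade(job)
            save_result(job["submission_id"], result)
            # Deleting the message acknowledges successful processing;
            # if the worker crashes first, SQS redelivers the job.
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=msg["ReceiptHandle"],
            )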
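
And a minimal cache-aside sketch for problem lookups, assuming Redis via the redis-py client (the key format, TTL, and database helper are illustrative assumptions):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def load_problem_from_db(problem_id: int) -> dict:
    # Hypothetical placeholder for a real database query
    # (see the schema sketch earlier in the article).
    return {"problem_id": problem_id, "title": "Two Sum", "difficulty": "Easy"}

def get_problem(problem_id: int) -> dict:
    # Cache-aside: try Redis first, fall back to the database on a miss.
    key = f"problem:{problem_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    problem = load_problem_from_db(problem_id)
    r.setex(key, 300, json.dumps(problem))  # cache for 5 minutes
    return problem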

Putting It All Together

When a user submits code, here's what happens:

  1. The submission goes through the API Gateway
  2. The Application Server receives it and generates a submission ID
  3. The submission is placed in the queue
  4. The user immediately receives the submission ID
  5. A worker picks up the submission from the queue
  6. The worker spins up a Docker container in ECS
  7. The code is executed in the container
  8. Results are stored in the database
  9. The user can check the status and results using their submission ID

This architecture allows LeetCode to:

  • Handle millions of submissions daily (scalability)
  • Run untrusted code safely (security)
  • Provide quick feedback to users (performance)
  • Handle traffic spikes gracefully (availability)
  • Ensure consistent results (consistency)

By leveraging technologies like Docker and AWS services, and using patterns like queuing and asynchronous processing, LeetCode creates a robust, scalable system capable of handling the demands of millions of users practicing coding problems every day.
