System Design of a Flash Sale E-Commerce System

Problem Statement:

The task is to design a Flash Sale Pre-Checkout System for a food delivery app that wants to sell a specific food item from a top-rated seller within a short time frame. The system needs to handle millions of users competing for a limited stock, ensuring scalability, efficiency, and accurate inventory allocation.

Key Functional Requirements

  1. Flash Sale Inventory Management: The inventory (e.g., 20,000 burgers) is set aside before the sale starts.
  2. Pre-Checkout Reservation: The item is mapped to a customer before payment, ensuring that an order does not get double-booked.
  3. Concurrency Handling: The system must handle millions of users competing for stock (on the order of 10,000+ requests per second at peak, as estimated below), ensuring fair allocation of stock.

Key Non-Functional Requirements

  1. Scalability: The system should handle 10 million+ users concurrently.
  2. High Performance: Order reservation response time should be <100ms.
  3. Reliability: The system should prevent race conditions and overselling.
  4. Cost Efficiency: The architecture should minimize infrastructure costs, considering flash sales are short-lived events.

Back-of-the-Envelope Calculation

A rough estimation of concurrent users is necessary to determine how much load the system needs to handle.

Estimating Concurrent Users

Given:

  • Total daily active users (DAU) interested in the sale: 10 million
  • Flash sale duration: 24 hours
  • Total seconds in 24 hours = 24 × 60 × 60 = 86,400 seconds
  • Average requests per second = 10,000,000 / 86,400 ≈ 115.74, or roughly 115 requests per second (assuming one request per user)

This means our system needs to handle at least 115 requests every second, assuming a uniform traffic pattern. However, in reality:

  • Peak traffic happens in the first few minutes.
  • The load is not evenly distributed over 24 hours.
  • We may experience bursts of 10,000+ concurrent users during the peak.

Peak Load Assumption

  • If 80% of users participate in the first 10 minutes, that means 8 million users in 600 seconds.
  • Peak requests per second = 8M / 600 ≈ 13,333.

Thus, our system must be designed to handle at least 10,000–15,000 requests per second (RPS) during peak traffic.
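
The arithmetic above can be reproduced in a few lines. This is only a sketch of the estimate: the function name requests_per_second is illustrative, and it assumes each user sends exactly one reservation request.

```python
def requests_per_second(users: int, window_seconds: int) -> float:
    """Average request rate if `users` each send one request within the window."""
    return users / window_seconds


DAU = 10_000_000
SALE_SECONDS = 24 * 60 * 60            # 86,400 seconds in the 24-hour sale

average_rps = requests_per_second(DAU, SALE_SECONDS)

# Peak assumption from the text: 80% of users arrive in the first 10 minutes.
peak_users = DAU * 80 // 100           # 8,000,000 users
peak_rps = requests_per_second(peak_users, 10 * 60)

print(f"average: {average_rps:.2f} RPS")   # average: 115.74 RPS
print(f"peak:    {peak_rps:.0f} RPS")      # peak:    13333 RPS
```

The gap between the two numbers (roughly 115x) is why the design is sized for the peak rather than the daily average.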

Things That Are Out of Scope:

1. Payment Processing & Payment Failures

  • This system does not handle payments, transactions, or fraud detection.
  • We assume that once an item is added to the cart, the payment process happens in a separate system.
  • Any payment failures, refunds, or chargebacks are out of scope.

Reason for Exclusion: Payment processing is handled by dedicated payment gateways (Stripe, Razorpay, PayPal, etc.) and requires PCI compliance and fraud detection mechanisms.


2. Order Delivery & Logistics

  • The system only handles inventory reservation but does not deal with delivery tracking, dispatching, or logistics management.
  • We assume that once an order is successfully placed, it moves to a separate order fulfillment system.

Reason for Exclusion: Delivery systems involve fleet management, real-time tracking, and delivery slot optimizations, which are completely separate from pre-checkout operations.


3. Order Cancellation & Abandonment Handling

  • We do not handle cases where a user cancels an order after adding it to the cart.
  • If a user adds an item but does not complete payment, we assume inventory will be released after a pre-set timeout.
  • Any form of re-attribution or reallocation of inventory from abandoned carts is not covered.

Reason for Exclusion: Order cancellations require timer-based inventory restoration and business logic to prevent abuse (e.g., users reserving multiple items without buying them). These optimizations can be handled by a cart management service.


4. Personalized Recommendations & Dynamic Pricing

  • The system does not incorporate AI-based recommendations to upsell or suggest alternative products.
  • Dynamic pricing (surge pricing, demand-based price fluctuations, or discounts) is out of scope.

Reason for Exclusion: Such features require machine learning models and integration with pricing engines, which are not critical to the core inventory reservation problem.


5. Seller Inventory Management & Restocking

  • This system does not allow sellers to update stock dynamically or manage their own inventory replenishment.
  • We assume that inventory is fixed before the sale starts (e.g., 20,000 burgers pre-allocated).
  • Any inventory restocking logic is not considered.

Reason for Exclusion: Flash sales typically work with pre-allocated stock, and dynamic inventory updates introduce complexity and inconsistencies during a high-traffic event.


Final Scope Summary

Covered in This Article

  • Handling a high-scale pre-checkout reservation system.
  • Ensuring real-time inventory allocation with low latency.
  • Using Redis, Message Queues, and Order Accumulators to process reservations efficiently.
  • Addressing scalability concerns, including rate limiting & race conditions.

Not Covered (Out of Scope)

  1. Payment Processing & Failures
  2. Delivery & Logistics
  3. Order Cancellation & Abandonment
  4. AI-based Recommendations & Dynamic Pricing
  5. Seller Inventory Management & Restocking


System Design Architecture

High-Level Overview


Key Components:

  1. RedisCache: Manages real-time inventory and reservations.
  2. OrderService: Processes orders and updates inventory.
  3. SQS (Simple Queue Service): Ensures orders are processed asynchronously.
  4. OrderWorker: Picks orders from SQS, finalizes reservations, and updates the database.
  5. DLQ (Dead Letter Queue): Stores failed orders for retry.

Detailed Component Breakdown:


Redis as the Primary Inventory Store

[Entity diagram: REDIS_CACHE with fields customer_id (int) and order_status (string)]

Why Redis?

  • Atomic Operations (INCR, DECR) prevent race conditions.
  • Ultra-fast (sub-millisecond latency).
  • Key expiry (TTL) can clear stale reservations automatically.

Redis Key Structure

  inventory_count = 20,000                  # Decremented atomically
  reservation:{customer_id} = {status}      # Tracks reservations

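How it Works:

  1. DECR inventory_count → Reserves one unit atomically. DECR returns the new count, so a negative result means the item is sold out and the decrement must be reversed with INCR.
  2. SET reservation:{customer_id} CONFIRMED → Records the reservation. Adding the NX option makes a repeat attempt by the same customer fail, which prevents double allocation.
  3. Orders that fail payment are released back into inventory.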
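
The reservation steps can be sketched as follows. To keep the example runnable without a server, FakeRedis is a hypothetical in-memory stand-in whose method names loosely mirror common Redis client calls. The reserve function, the status strings, and the use of an NX-style "set only if absent" flag are assumptions made for illustration, not part of the original design.

```python
import threading


class FakeRedis:
    """In-memory stand-in for a few Redis commands (not a real client)."""

    def __init__(self) -> None:
        self._data = {}
        self._lock = threading.Lock()   # mimics Redis running one command at a time

    def set(self, key, value, nx=False) -> bool:
        with self._lock:
            if nx and key in self._data:
                return False            # NX: refuse to overwrite an existing key
            self._data[key] = value
            return True

    def decr(self, key) -> int:
        with self._lock:
            self._data[key] = int(self._data.get(key, 0)) - 1
            return self._data[key]

    def incr(self, key) -> int:
        with self._lock:
            self._data[key] = int(self._data.get(key, 0)) + 1
            return self._data[key]

    def get(self, key):
        with self._lock:
            return self._data.get(key)


def reserve(store: FakeRedis, customer_id: int) -> str:
    """Try to reserve one item for a customer and return a status string."""
    # Step 1: DECR returns the new count; a negative value means stock was gone.
    if store.decr("inventory_count") < 0:
        store.incr("inventory_count")   # undo the over-decrement
        return "SOLD_OUT"
    # Step 2: record the reservation only if this customer has none yet.
    if not store.set(f"reservation:{customer_id}", "CONFIRMED", nx=True):
        store.incr("inventory_count")   # give the unit back
        return "ALREADY_RESERVED"
    return "CONFIRMED"


store = FakeRedis()
store.set("inventory_count", 3)
results = [reserve(store, cid) for cid in (101, 102, 101, 103, 104)]
print(results)
# ['CONFIRMED', 'CONFIRMED', 'ALREADY_RESERVED', 'CONFIRMED', 'SOLD_OUT']
print(store.get("inventory_count"))   # 0
```

Customer 101's second request is rejected and its unit is returned, so customer 103 can still buy the last item.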


Order Processing Pipeline

To prevent bottlenecks and system crashes, we use an asynchronous, event-driven pipeline.


Advantages of this pipeline

  • Asynchronous processing prevents bottlenecks.
  • Message queueing absorbs sudden traffic spikes.
  • Failed orders are retried without blocking the main system.
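
A minimal sketch of the asynchronous hand-off, assuming Python's standard queue.Queue as a stand-in for SQS and a plain dict as a stand-in for the database. The function names order_service and order_worker follow the component names above, but their bodies are illustrative only.

```python
import queue
import threading

order_queue = queue.Queue()   # stand-in for SQS
database = {}                 # stand-in for the orders table
STOP = object()               # sentinel that tells the worker to exit


def order_service(customer_id: int) -> str:
    """Accept the request quickly and hand the slow work to the queue."""
    order_queue.put({"customer_id": customer_id, "status": "RESERVED"})
    return "ACCEPTED"


def order_worker() -> None:
    """Drain the queue and persist each order, independent of request handling."""
    while True:
        message = order_queue.get()
        if message is STOP:
            break
        database[message["customer_id"]] = message["status"]


worker = threading.Thread(target=order_worker)
worker.start()

responses = [order_service(cid) for cid in range(1, 6)]
order_queue.put(STOP)
worker.join()

print(responses)          # ['ACCEPTED', 'ACCEPTED', 'ACCEPTED', 'ACCEPTED', 'ACCEPTED']
print(sorted(database))   # [1, 2, 3, 4, 5]
```

The request path only pays for an enqueue, so a slow database write delays the worker rather than the user-facing response.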

Handling Scalability Challenges

The Thundering Herd Problem

Issue: Millions of users rushing in at the same time overloads Redis.
Solution: Introduce Rate Limiting + Distributed Locks



  • Rate limiter (Token Bucket Algorithm) blocks excessive requests.
  • Distributed Locks ensure that only one request per user is processed.
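
The token bucket mentioned above can be sketched as below. The class name, the rate and capacity values, and the injectable clock are assumptions for illustration. A production limiter would usually keep its state in a shared store so that every service instance sees the same bucket, and the per-user distributed lock is not shown here.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A controllable clock keeps the demonstration deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(5)]
print(burst)    # [True, True, True, False, False]

fake_now[0] += 1.0                      # one second later, two tokens are back
later = [bucket.allow() for _ in range(3)]
print(later)    # [True, True, False]
```

The capacity sets how large a burst is tolerated, while the rate sets the sustained throughput that reaches Redis.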

Preventing Race Conditions in Order Placement

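Issue: Two users reserving the last item at the same time.
Solution: Atomic Redis Transactions

  MULTI
  DECR inventory_count
  SET reservation:{customer_id} CONFIRMED
  EXEC

Redis executes the queued commands as one isolated unit, so no other client's command can run between the DECR and the SET. The value returned by DECR must still be checked: a negative result means the stock was already exhausted and the reservation has to be undone.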
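
To illustrate why atomicity matters, the sketch below lets 50 threads compete for 5 items. It does not talk to Redis: the Inventory class is a hypothetical in-memory model in which a lock plays the role of Redis executing a transaction as one unit. The stock check inside the critical section is an addition for the example. With real Redis, a conditional step like this typically needs a Lua script or WATCH, since a plain MULTI/EXEC block cannot branch on an intermediate result.

```python
import threading


class Inventory:
    """In-memory model; the lock stands in for atomic execution on the server."""

    def __init__(self, count: int) -> None:
        self.count = count
        self.reservations = {}
        self._lock = threading.Lock()

    def reserve(self, customer_id: int) -> bool:
        # Everything inside the lock runs as one unit: check, decrement, record.
        with self._lock:
            if self.count <= 0 or customer_id in self.reservations:
                return False
            self.count -= 1
            self.reservations[customer_id] = "CONFIRMED"
            return True


inventory = Inventory(count=5)
results = [False] * 50                  # one slot per competing customer


def attempt(customer_id: int) -> None:
    results[customer_id] = inventory.reserve(customer_id)


threads = [threading.Thread(target=attempt, args=(cid,)) for cid in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results), inventory.count)    # 5 0
```

Which five customers win depends on thread scheduling, but the number of winners never exceeds the stock.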


Order Failures & Retries

Issue: If an order fails, how do we retry?
Solution: Dead Letter Queue (DLQ) + Retry Worker




  • Failed orders go to DLQ.
  • Retry Worker retries failed orders up to 3 times.
  • If still failing, alert the customer support team.
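
The retry policy above can be sketched as follows, assuming a collections.deque as a stand-in for the DLQ. The names retry_worker, process, and alert are hypothetical, and a real worker would normally add a backoff delay between attempts.

```python
from collections import deque

MAX_RETRIES = 3


def retry_worker(dlq: deque, process, alert) -> None:
    """Retry each dead-lettered order up to MAX_RETRIES times, then escalate."""
    while dlq:
        order = dlq.popleft()
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                process(order)
                break                   # success: stop retrying this order
            except Exception:
                order["attempts"] = attempt
        else:
            alert(order)                # every retry failed: notify support


calls = {"A": 0, "B": 0}


def flaky_process(order: dict) -> None:
    """Order A succeeds on its second attempt; order B never succeeds."""
    calls[order["id"]] += 1
    if order["id"] == "B":
        raise RuntimeError("always fails")
    if calls["A"] < 2:
        raise RuntimeError("fails on the first attempt")


alerts = []
retry_worker(deque([{"id": "A"}, {"id": "B"}]), flaky_process, alerts.append)

print(calls)                            # {'A': 2, 'B': 3}
print([order["id"] for order in alerts])   # ['B']
```

Capping the attempts keeps a permanently broken order from looping forever and surfaces it to a human instead.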

Scaling Horizontally

Issue: A single server cannot handle millions of users.
Solution: Horizontally Scale Order Service


  • Auto-scaling ensures capacity adjusts dynamically.
  • Load balancer distributes requests evenly.
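
As a rough sizing aid, the number of Order Service instances can be estimated from the peak load. The per-instance throughput of 500 RPS and the 30% headroom below are purely hypothetical figures chosen for the example, and real capacity should come from load testing.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float = 0.3) -> int:
    """Instances required to serve peak_rps while keeping `headroom` spare capacity."""
    usable_rps = rps_per_instance * (1 - headroom)
    return math.ceil(peak_rps / usable_rps)


# Peak of about 13,333 RPS from the earlier estimate; 500 RPS per instance is assumed.
print(instances_needed(13_333, 500))   # 39
```

An auto-scaling policy would move between a small baseline and roughly this ceiling as the sale starts and winds down.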

Final Thoughts

Scalable & Resilient – Can handle millions of users.

No Race Conditions – Atomic Redis transactions prevent double booking.

High Availability – The asynchronous queue keeps the reservation path responsive even when downstream processing slows or fails.

Cost Efficient – Auto-scaling lets capacity shrink once the short-lived sale ends, which avoids paying for idle infrastructure.

Additional Improvements

Optimistic Locking for High Contention: Use WATCH in Redis to detect conflicts.

AI-powered Demand Prediction: Use ML to pre-stock inventory in high-demand locations.

User Experience Enhancements: Show real-time stock updates to users.

Use a Stream Processor such as Apache Spark: Avoid availability overheads with an embedded DB. It can also be paired with CDC (change data capture) tools.

This architecture ensures a smooth, fair, and reliable flash sale experience for millions of users.


