CDN Design & Architecture
Hichem AZOUZ
Project/Program manager, Cloud Solutions/Automation, Telecom digital transformation, Design Solutions
What is a CDN:
A Content Delivery Network (CDN) is a globally distributed network of proxy servers and their associated data centers designed to deliver web content and media efficiently to end users. By distributing content across multiple locations, a CDN improves availability, performance, and reliability. The core objective of a CDN is to minimize latency — the delay between a user’s request for content and its delivery — and to enhance the overall speed and performance of websites and applications.
CDNs achieve this by caching content, such as web pages, images, videos, and other static assets, on servers located at strategic points around the world, known as edge servers. Instead of a user always having to fetch content from the origin server (where the content is originally hosted), a CDN routes their request to the nearest edge server. This reduces the distance data has to travel, thereby decreasing latency and significantly improving load times.
Key Components of a CDN :
The components of a CDN architecture operate cohesively to decrease the time taken to display web content to users. Although different CDN types specialize in different facets of content delivery, such as security or performance, they mostly rely on similar setups. The key components of CDN architecture are explained below.
Edge Servers (PoPs - Points of Presence) :
Definition: These are geographically distributed servers (or clusters of servers) located close to end users.
Function: Store cached copies of content (static or dynamic) to reduce latency by serving users from the nearest server.
Role in CDN: When a user requests content, the CDN routes the request to the nearest edge server (PoP), minimizing the round-trip time.
Benefit: Reduces server load on the origin and improves response time for users by serving content locally.
Example: Edge servers in a CDN located across continents allow users in Europe to retrieve content from a European PoP rather than the origin server in the US.
Origin Server :
Definition: The original source server where the content resides.
Function: Stores the master copy of the content (e.g., HTML files, images, videos).
Role in CDN: If the content is not available on the edge server (cache miss), the request is forwarded to the origin server to fetch the data.
Benefit: Centralized management of content, and the CDN takes the load off the origin server by caching content at the edge.
Example: The primary web server of an e-commerce website acts as the origin server.
Content Cache :
Definition: Storage systems in the CDN that hold cached copies of frequently accessed content.
Function: Temporarily store content that is retrieved from the origin server to serve future requests.
Role in CDN: Cache systems minimize origin server requests by storing static or dynamic content and reducing latency for end-users.
Benefit: Lowers bandwidth consumption, reduces load on the origin server, and accelerates content delivery.
Example: A video streaming service caches videos on edge servers to reduce the load on the origin server and improve the playback experience.
Request Routing :
Definition: A mechanism that directs user requests to the nearest or best-performing edge server.
Function: Determines which CDN edge server should respond to a user's request based on proximity, network conditions, server load, etc.
Technologies Involved:
DNS Load Balancing: Routes users to the optimal PoP by resolving the requested domain to the nearest CDN node.
Anycast Routing: Sends user requests to the geographically closest or most appropriate server using a single IP address for all PoPs.
Geolocation-based Routing: Determines the best server for a user by their IP address or geographical location.
Role in CDN: Ensures optimal delivery of content by routing user requests to the most efficient PoP.
Benefit: Improves user experience by lowering response times.
Example: When a user in Asia requests content, DNS resolves the request to the nearest PoP in Asia rather than directing it to an origin server in North America.
Caching and Cache Control Mechanisms :
Definition: Techniques that govern how content is cached and for how long.
Function:
TTL (Time to Live): Specifies the duration for which content is considered fresh in the cache before it's refreshed.
Cache Invalidation: Removes outdated or invalid content from the cache based on cache policies.
Cache Purging: The forced removal of cached data (manually or automatically) when content is updated or no longer needed.
Role in CDN: Ensures that cached data is both fresh and optimized to reduce origin server requests.
Benefit: Balances between ensuring up-to-date content and minimizing latency for end-users.
Example: A news website might set a short TTL for breaking news articles but a longer TTL for static assets like images or CSS files.
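The TTL policy above can be sketched as a small lookup, a minimal Python sketch where the content types and TTL values are illustrative assumptions, not recommendations:

```python
# Hypothetical sketch: choosing a Cache-Control max-age by content type.
# The categories and TTL values below are illustrative only.

TTL_SECONDS = {
    "breaking-news": 60,   # short TTL: content changes often
    "article": 600,
    "image": 86_400,       # static assets: cache for a day
    "css": 604_800,        # or longer, if URLs are versioned
}

def cache_control_header(content_type: str) -> str:
    """Build a Cache-Control header value for a given content type."""
    ttl = TTL_SECONDS.get(content_type, 300)  # fall back to 5 minutes
    return f"public, max-age={ttl}"

print(cache_control_header("breaking-news"))  # public, max-age=60
print(cache_control_header("css"))            # public, max-age=604800
```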
Global Traffic Manager (Load Balancer):
Definition: A component that distributes traffic across multiple PoPs or servers.
Function: Balances user traffic based on factors like geographical location, server health, and load.
Role in CDN: Ensures efficient use of network resources by directing traffic away from overloaded or malfunctioning servers.
Benefit: Increases reliability and fault tolerance by preventing any single server from becoming a bottleneck.
Example: If one PoP is experiencing heavy load, the global traffic manager directs new requests to a different, less busy PoP.
Content Purge/Invalidation :
Definition: The process of removing or updating outdated cached content from CDN edge servers.
Function: Ensures that changes or updates to the content (like website updates) are reflected in the CDN.
Role in CDN: Prevents users from seeing outdated versions of content while maintaining cache efficiency.
Benefit: Allows near-instant updates to be reflected globally, improving both performance and content freshness.
Example: When an e-commerce site changes product details or pricing, it can immediately invalidate old content across the CDN.
Secure Delivery and SSL/TLS :
Definition: Security mechanisms to ensure secure content delivery across the CDN.
Function: Supports encrypted connections via SSL/TLS, ensuring secure communication between users and CDN servers.
Role in CDN: Protects sensitive information during transit, such as login credentials, financial data, etc.
Benefit: Ensures trust and security, improving the overall user experience by safeguarding user data.
Example: When users log into a website using HTTPS, the CDN ensures that traffic between the user and edge servers is encrypted.
Monitoring and Analytics :
Definition: Tools used to track CDN performance, usage, and content delivery metrics.
Function: Monitors real-time traffic, response times, cache hit/miss ratios, server health, and more.
Role in CDN: Helps network administrators detect issues like outages, slowdowns, or unusual traffic patterns.
Benefit: Provides insights into user behavior, network performance, and helps optimize content delivery.
Example: An e-commerce website might monitor cache hit ratios and regional response times to detect delivery problems during peak shopping traffic.
How it works:
A content delivery network typically functions according to the following call flow:
User Request: A user tries to access https://example.com/video.mp4.
DNS Query: The user's device sends a DNS query to resolve example.com. The query is sent to the ISP's recursive resolver, which forwards it to the CDN’s authoritative DNS server.
GeoDNS Resolution: The CDN's DNS server uses GeoDNS to respond with the IP address of the nearest edge server.
BGP Routing: With Anycast, the same IP address is advertised from multiple CDN locations, and BGP routes the request to the edge server closest to the user.
Edge Server Handling: If video.mp4 is cached in the CDN node, it is delivered directly to the user. If not, the edge server fetches the content from the origin server and caches it locally for future requests.
Content Delivery: The video is delivered to the user from the edge server with minimal latency.
Cache Expiry: The edge server holds the content for a defined period, determined by cache headers (Cache-Control, Expires). If the cache expires or is manually invalidated, the next request triggers a fetch from the origin again.
By following these strategies, a CDN can optimize content delivery based on user location, traffic patterns, and caching mechanisms, ensuring fast, reliable access to resources.
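The edge-server step of this call flow can be sketched in a few lines of Python. This is a minimal simulation, not a real CDN: in-memory dicts stand in for the edge cache, and `fetch_from_origin` stands in for an HTTP request to the origin server.

```python
import time

CACHE_TTL = 300  # seconds, as if taken from a Cache-Control header

cache = {}  # url -> (content, expiry_timestamp)

def fetch_from_origin(url):
    """Stand-in for an HTTP request to the origin server."""
    return f"<content of {url}>"

def handle_request(url, now=None):
    """Return (content, 'HIT' | 'MISS') for a request arriving at the edge."""
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry and entry[1] > now:          # cached and still fresh
        return entry[0], "HIT"
    content = fetch_from_origin(url)      # miss or expired: go to origin
    cache[url] = (content, now + CACHE_TTL)
    return content, "MISS"

body, status = handle_request("https://example.com/video.mp4")
print(status)  # MISS (first request reaches the origin)
body, status = handle_request("https://example.com/video.mp4")
print(status)  # HIT (served from the edge cache)
```

A subsequent request after the TTL elapses would again return "MISS" and refresh the cache from the origin, matching the cache-expiry step above.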
CDN Architecture :
There are two content-distribution strategies: Push and Pull.
Push:
Content is uploaded (pushed) to CDN servers by the content provider. The provider manually sends new or updated content to CDN nodes where it's cached.
There are different techniques, each with a specific architecture, benefits, and drawbacks tailored to the use case.
A. Write Around
In this strategy, the application writes data directly to the database without updating the cache. Subsequent reads may result in cache misses, at which point the data will be fetched from the database and then cached.
Steps:
The application writes data to the database.
When reading, the application first checks the cache.
If the cache doesn’t have the data (cache miss), it reads from the database.
After fetching from the database, the cache is updated with the data.
Pros:
Suitable for write-heavy applications where immediate cache updates aren't necessary.
Reduced write load on the cache.
Cons:
May result in cache misses on initial reads after a write operation.
Cache becomes inconsistent until it’s populated after a read.
Use Case: Used in write-heavy systems where read latency can be tolerated (e.g., analytics platforms).
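The write-around steps above can be sketched as follows; this is a minimal illustration in which plain dicts stand in for a real cache and database:

```python
# Sketch of write-around: writes go straight to the database; the cache
# is only populated on a read miss.

db, cache = {}, {}

def write(key, value):
    db[key] = value        # the write bypasses the cache entirely
    cache.pop(key, None)   # drop any stale cached copy, if present

def read(key):
    if key in cache:
        return cache[key]  # cache hit
    value = db[key]        # miss: read from the database...
    cache[key] = value     # ...then populate the cache for next time
    return value

write("user:1", "Alice")
print("user:1" in cache)   # False: the write did not touch the cache
print(read("user:1"))      # Alice (first read is a miss, then cached)
```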
B. Write Through :
Every time the application writes data, it updates both the cache and the database simultaneously. This ensures the cache always has the latest version of the data.
Steps:
The application writes data to the cache.
The cache writes the data to the database immediately.
Pros:
Cache remains consistent with the database.
Immediate availability of written data in the cache.
Cons:
Slower write operations because both the cache and database are updated at the same time.
Increased load on the cache and database due to redundant writes.
Use Case: Useful in systems where data consistency between the cache and database is critical (e.g., financial systems).
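A minimal sketch of write-through, again with dicts standing in for real stores, shows both updates happening in the same write path:

```python
# Sketch of write-through: every write updates the cache and the database
# together, so the cache always reflects the latest data.

db, cache = {}, {}

def write(key, value):
    cache[key] = value   # update the cache...
    db[key] = value      # ...and the database in the same operation

def read(key):
    return cache.get(key, db.get(key))

write("price:sku42", 19.99)
print(cache["price:sku42"], db["price:sku42"])  # both stores agree immediately
```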
C. Write Back (Write Behind) :
In this strategy, data is first written to the cache, and the cache writes it to the database asynchronously at a later time. This approach minimizes the time taken for write operations since they only update the cache immediately.
Steps:
The application writes data to the cache.
The cache writes the data to the database asynchronously or "once in a while."
Pros:
Improves write performance since writes are done asynchronously.
Reduces the load on the database because multiple updates can be batched together.
Cons:
Risk of data loss if the cache fails before writing to the database.
The database might not always have the most recent data.
Use Case: Suited for high-throughput write systems where occasional data staleness in the database is acceptable (e.g., caching sessions in web applications).
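The write-back pattern can be sketched like this. A real system would flush dirty keys on a timer or through a queue; here `flush()` is called explicitly to make the deferred write visible, and dicts stand in for the cache and database:

```python
# Sketch of write-back (write-behind): writes land in the cache only, and
# dirty keys are flushed to the database later, in a batch.

db, cache, dirty = {}, {}, set()

def write(key, value):
    cache[key] = value   # fast path: only the cache is updated
    dirty.add(key)       # remember that this key must reach the database

def flush():
    """Batch the pending writes to the database (run periodically in practice)."""
    for key in dirty:
        db[key] = cache[key]
    dirty.clear()

write("session:abc", {"user": "alice"})
print("session:abc" in db)   # False: the database lags behind the cache
flush()
print("session:abc" in db)   # True: written to the database asynchronously
```

The window between `write` and `flush` is exactly where the data-loss risk mentioned above lives: if the cache is lost before the flush, those writes never reach the database.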
Pull:
Pull (or Cache) CDN Content is only fetched from the origin server when requested by a user. Once fetched, it's cached at the edge servers.
There are different techniques, each with a specific architecture, benefits, and drawbacks tailored to the use case.
A. Cache Aside (Lazy Loading) :
In this strategy, the application directly queries the cache. If the data is not present (cache miss), it then reads from the database. After fetching the data from the database, it updates the cache with the fetched data so subsequent reads can be faster.
Steps:
The application tries to read from the cache.
If there is a cache miss (the data is not found), the application reads from the database.
The cache is updated with the fetched data for future reads.
Pros:
Reduces load on the database by serving repeated reads from the cache.
Flexible, the cache is only updated when data is requested.
Cons:
Initial read latency (cache miss).
Cache can become stale unless it’s properly invalidated.
Use Case: Useful for scenarios where data is read more frequently than written (e.g., content management systems).
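Cache-aside puts the miss-handling logic in the application itself, as this minimal sketch shows (dicts stand in for the cache and database):

```python
# Sketch of cache-aside (lazy loading): the application checks the cache
# first and, on a miss, loads from the database and fills the cache itself.

db = {"page:home": "<html>...</html>"}
cache = {}

def get(key):
    if key in cache:
        return cache[key], "HIT"
    value = db[key]      # cache miss: the *application* queries the database
    cache[key] = value   # and updates the cache for subsequent reads
    return value, "MISS"

print(get("page:home")[1])  # MISS (lazy load on first access)
print(get("page:home")[1])  # HIT  (served from the cache afterwards)
```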
B. Read Through :
Similar to cache aside, but in this strategy, the application interacts only with the cache. If the requested data is not in the cache, the cache itself fetches the data from the database, updates its cache, and returns the data to the application.
Steps:
The application reads from the cache.
If there is a cache miss, the cache itself fetches the data from the database.
The cache updates itself with the fetched data and serves the request to the application.
Pros:
Simplifies application code since it only interacts with the cache.
Automatically updates the cache when a cache miss occurs.
Cons:
Similar to cache aside, cache invalidation is necessary to avoid serving stale data.
Possible performance bottleneck if many cache misses happen.
Use Case: Ideal for applications where the cache should manage its population automatically (e.g., web applications with frequent reads and occasional writes).
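The contrast with cache-aside is that the loader logic moves into the cache itself. A minimal sketch, with a dict standing in for the database:

```python
# Sketch of read-through: the application talks only to the cache; on a
# miss the *cache itself* loads from the database via a loader function.

db = {"article:1": "CDN Design & Architecture"}

class ReadThroughCache:
    def __init__(self, loader):
        self._data = {}
        self._loader = loader  # how the cache reaches the database

    def get(self, key):
        if key not in self._data:                # cache miss
            self._data[key] = self._loader(key)  # cache fetches and stores
        return self._data[key]

cache = ReadThroughCache(loader=db.__getitem__)
print(cache.get("article:1"))  # the application never touches db directly
```

Note how the application code shrinks to a single `cache.get(...)` call, which is the simplification the pros above describe.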
Multi-CDN Strategy :
Multiple CDN providers are used to deliver content. Based on factors like performance, reliability, and cost, traffic is routed between CDNs.
Benefits:
High availability and redundancy.
Optimized content delivery by choosing the best-performing CDN for each user location.
Disadvantages:
Complex implementation and management.
Higher cost as you may need to pay multiple providers.
Use Cases: Large-scale video streaming, global e-commerce, gaming platforms (e.g., Netflix, Amazon).
Hybrid CDN :
Combines on-premise infrastructure with third-party CDN services. Part of the content is delivered from local servers, and part is from CDN.
Benefits:
Flexibility for both local and global delivery.
Control over sensitive or frequently changing data on local servers.
Disadvantages:
High complexity.
Requires substantial infrastructure management and expertise.
Use Cases: Enterprise-grade solutions where data sensitivity or real-time processing is critical (e.g., financial institutions, internal corporate apps).
Summary of Caching Strategies :
Cache Aside: The application is responsible for managing the cache, loading data into the cache when needed. Best for static data with occasional updates.
Read Through: The cache manages itself, automatically fetching data from the database on cache misses. It simplifies application logic but still requires invalidation.
Write Around: Writes go directly to the database, and the cache is updated when read. Best for systems with frequent writes and fewer reads.
Write Through: Both the cache and the database are updated simultaneously. It ensures data consistency at the cost of write latency.
Write Back: Writes are done to the cache first, and the cache asynchronously writes to the database. This improves performance but can introduce risks of data loss.
Each of these strategies has its strengths and weaknesses, and choosing the right one depends on the specific needs of your application, including the read/write balance, latency tolerance, and the consistency requirements.
Best Practices for Implementing a Content Delivery Network :
Implementing a Content Delivery Network (CDN) is a great way to improve the performance, scalability, and security of a website or application. However, to maximize the benefits, it's important to follow best practices. Here are key best practices for effectively implementing a CDN:
Choose the Right CDN Provider :
Assess Global Reach: Ensure the CDN provider has Points of Presence (PoPs) in regions where your users are located.
Evaluate Service Reliability: Consider the provider's uptime guarantees, SLAs, and redundancy mechanisms.
Features to Consider:
Geographical coverage (edge servers)
Support for dynamic and static content
Security features like DDoS protection, WAF (Web Application Firewall)
Real-time monitoring and analytics
Scalability and cost efficiency
Example Providers: Akamai, Cloudflare, AWS CloudFront, Fastly, Google Cloud CDN.
Leverage Cache Effectively :
Use Cache-Control Headers: Set a proper TTL (Time to Live) and adjust cache expiration based on content type (e.g., longer TTL for images, shorter TTL for dynamic content).
Cache Static Content: Ensure that static assets like CSS, JavaScript, and images have long cache durations.
Cache Dynamic Content (if applicable): Use edge caching for dynamic content where possible, especially for high-traffic sections.
Purge/Invalidate Caches When Needed: Invalidate or purge outdated content from edge servers when content changes (e.g., product updates, news articles).
Automate cache purging where possible to keep data up-to-date.
Optimize Content for Delivery :
Minimize and Compress Files: Use Gzip or Brotli compression for CSS, JavaScript, and HTML files to reduce file sizes and loading times.
Minify JavaScript, CSS, and HTML files to reduce unnecessary characters and whitespace.
Use Adaptive Bitrate for Video:
For media-heavy sites, use adaptive streaming protocols (e.g., HLS, MPEG-DASH) to deliver videos based on the user’s connection quality.
Image Optimization: Use modern image formats like WebP or AVIF for smaller file sizes without sacrificing quality.
Enable lazy loading for images to load only when needed, improving initial page load times.
Implement Secure Content Delivery :
Use HTTPS/SSL: Ensure your CDN supports SSL/TLS for encrypted communication between users and CDN servers.
Set up HSTS (HTTP Strict Transport Security) to enforce secure connections.
DDoS Protection: Choose a CDN provider that offers built-in DDoS protection to safeguard against large-scale attacks.
Use rate-limiting or WAF to prevent abusive traffic and application-layer attacks.
Token Authentication for Secure Links: Use signed URLs or tokens for premium or restricted content to prevent unauthorized access.
Implement geo-blocking or IP whitelisting/blacklisting for sensitive content.
Monitor CDN Performance :
Use Real-Time Analytics: Monitor key metrics such as cache hit/miss ratio, response time, bandwidth usage, and geographic distribution of traffic.
Use these metrics to adjust cache settings, optimize content, or troubleshoot issues in real time.
Monitor PoP Performance: Keep an eye on the performance of different PoPs. If certain regions are slower, work with your CDN provider to troubleshoot or improve response times.
Set Alerts for Downtime: Set up alerts to detect CDN performance issues, server downtime, or high latencies.
Optimize DNS Resolution :
Use Low TTL for DNS: Set a low Time to Live (TTL) for DNS entries, so that DNS changes (such as moving to a new CDN PoP) propagate quickly.
Use CDN's DNS Service: Many CDN providers offer integrated DNS services optimized for global traffic routing, leveraging fast DNS lookups and intelligent routing.
Segment Your Content Delivery :
Use Multi-CDN: Consider a multi-CDN strategy to provide redundancy and failover in case one CDN experiences an outage.
Multi-CDN setups can optimize content delivery to different regions, ensuring global performance consistency.
Differentiate Static and Dynamic Content: Serve static assets (e.g., images, videos) directly from the CDN, while dynamic content (e.g., user-generated content, personalized data) can be routed through the CDN but processed at the origin.
Optimize Cache Invalidation Policies :
Implement Cache Invalidation Rules: Create specific rules for when to invalidate cached content, based on the nature of the content (e.g., invalidate product pages only when the inventory changes).
Use Versioning for Static Assets: For static content, use versioned URLs (e.g., /style.v1.css) to ensure that changes are automatically reflected without unnecessary cache invalidations.
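Asset versioning is often done with a content hash rather than a manual version number, so a changed file automatically gets a new URL (and therefore a fresh cache entry) with no explicit invalidation. A minimal sketch; the filename scheme is an illustrative assumption:

```python
import hashlib

def versioned_url(path: str, content: bytes) -> str:
    """Embed a short content hash in the filename, e.g. style.css -> style.ab12cd34.css."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    name, _, ext = path.rpartition(".")
    return f"{name}.{digest}.{ext}"

print(versioned_url("style.css", b"body { color: red; }"))
print(versioned_url("style.css", b"body { color: blue; }"))  # new URL on change
```

Because the old URL never changes its content, such assets can safely be given very long TTLs.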
Configure Geo-Based Content Delivery :
Use Geo-Targeting: Deliver localized content (language, currency, region-specific promotions) based on the user’s location.
Geo-restrictions can also be used to comply with content distribution regulations or licensing restrictions.
Optimize CDN PoP Selection: Fine-tune how CDN PoPs are selected for different regions. Ensure the closest PoP is chosen for delivering the best performance.
Use geo-redundant PoPs in high-demand regions to avoid overload or downtime.
Test and Optimize CDN Performance :
Conduct Load Testing: Regularly test your CDN under load to ensure that it can handle peak traffic without degradation in performance.
Test different geographical regions to identify latency issues or bottlenecks.
A/B Testing: Perform A/B tests to experiment with different cache strategies, content optimizations, and PoP configurations. Optimize based on performance results.
Leverage API Acceleration and Dynamic Content :
API Caching: For applications that rely on APIs, consider caching the results of common API requests to reduce load on the origin and improve response times.
How Cache Systems Can Go Wrong :
There are four typical cases where caches can go wrong, along with their solutions.
Thundering herd problem :
This happens when a large number of cached keys expire at the same time, so the query requests hit the database directly and overload it. There are two ways to mitigate this issue: one is to avoid setting the same expiry time for all keys by adding a random jitter to each TTL; the other is to allow only core business data to hit the database and block non-core requests until the cache is back up.
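The jitter mitigation amounts to one line of code. A minimal sketch, where the base TTL and jitter window are illustrative values:

```python
import random

BASE_TTL = 300  # seconds
JITTER = 60     # up to one extra minute, chosen independently per key

def ttl_with_jitter() -> int:
    """Spread expiries out so many keys set together don't expire together."""
    return BASE_TTL + random.randint(0, JITTER)

ttls = [ttl_with_jitter() for _ in range(5)]
print(ttls)  # e.g. [312, 347, 305, 358, 321] -- expiries no longer collide
```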
Cache penetration :
This happens when the key doesn’t exist in the cache or the database. The application cannot retrieve relevant data from the database to update the cache. This problem creates a lot of pressure on both the cache and the database. To solve this, there are two suggestions. One is to cache a null value for non-existent keys, avoiding hitting the database. The other is to use a bloom filter to check the key existence first, and if the key doesn’t exist, we can avoid hitting the database.
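The null-value caching suggestion can be sketched as follows; dicts stand in for the cache and database, and a sentinel object marks keys known not to exist (in practice such entries would carry a short TTL):

```python
# Sketch of null-value caching against cache penetration: cache the fact
# that a key is missing, so repeated lookups for the same bogus key stop
# hitting the database.

MISSING = object()  # sentinel for "known not to exist"
db = {"user:1": "Alice"}
cache = {}
db_hits = 0

def get(key):
    global db_hits
    if key in cache:
        value = cache[key]
        return None if value is MISSING else value
    db_hits += 1                   # only reached on a genuine cache miss
    value = db.get(key, MISSING)
    cache[key] = value             # cache the miss too (short TTL in practice)
    return None if value is MISSING else value

get("user:999"); get("user:999"); get("user:999")
print(db_hits)  # 1 -- the database was queried only once for the bad key
```

A Bloom filter achieves the same goal probabilistically and with far less memory, at the cost of occasional false positives that still reach the database.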
Cache breakdown :
This is similar to the thundering herd problem, but it happens when a single hot key expires and a large number of requests for that one key hit the database at once. Since hot keys can account for the bulk of queries, a common mitigation is not to set an expiration time for them, or to refresh them proactively before they expire.
Cache crash :
This happens when the cache is down and all the requests go to the database. There are two ways to solve this problem. One is to set up a circuit breaker, and when the cache is down, the application services cannot visit the cache or the database. The other is to set up a cluster for the cache to improve cache availability.
Conclusion :
In conclusion, Content Delivery Networks (CDNs) play a vital role in optimizing the delivery of web content to users worldwide. By leveraging a distributed network of edge servers, CDNs reduce latency, improve reliability, and enhance the overall user experience. With efficient content caching strategies, intelligent routing algorithms, and strategic proxy server placement, CDNs ensure fast and consistent delivery of digital assets, making the internet faster and more accessible for everyone.