
The truth about cache hit ratios

Alicia Pritchett

Published: February 25, 2017

Since caching is one of the primary services a CDN provides, one of the most common metrics for evaluating CDN performance is cache hit ratio (CHR). CDN customers have used it for years as a primary indicator of how well a CDN is serving their users and handling their traffic. It's not uncommon to see "98% cache hit ratio" in a dashboard and become easily convinced that end users are getting the most out of the CDN.

But there's much more to CHR than meets the eye, and the metric we often hold so dear may not be telling us what we think it's telling us. So, I thought it'd be a good idea to dig into what CHR is actually measuring and how we may need new ways of calculating and evaluating it.

Traditional CHR calculation

For many years, CDNs have used the following formula to calculate CHR:

CHR = (requests[total] - requests[origin]) / requests[total]

Here, requests[total] is the total number of client requests received by the CDN and requests[origin] is the number of those requests that made it to the origin. Basically, this means if we send 100 requests to the CDN and only one of them leaves the CDN to reach the origin, we'd have a CHR of 99%.
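As a quick check of the arithmetic, here is a minimal Python sketch of the traditional calculation. The function name `traditional_chr` is made up for illustration; the inputs are the two request counts defined above.

```python
def traditional_chr(requests_total: int, requests_origin: int) -> float:
    """Share of client requests that never reached the origin."""
    if requests_total <= 0:
        raise ValueError("requests_total must be positive")
    if not 0 <= requests_origin <= requests_total:
        raise ValueError("requests_origin must be between 0 and requests_total")
    return (requests_total - requests_origin) / requests_total


# 100 client requests, one of which reaches the origin.
ratio = traditional_chr(100, 1)
print(f"{ratio:.0%}")  # 99%
```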

The problem is that when we hear "99% cache hit ratio," we instinctively think that this means 99% of our users' requests were served at the edge of the CDN, from caches closest to the clients that made the requests. The truth is that this isn't necessarily the case, and to understand why that is, we need to have a discussion about CDN architecture, cache hierarchies, and long tail content.

How CDNs are built

I don't want to go into too much detail, and I certainly can't speak for all CDNs, but at the highest level, a CDN is a distributed network of caching proxies. These proxies, and therefore their caches, are shared. They receive and cache requests across a large number of domains, all of them belonging to the CDN's customers. This means that the storage of each server is also shared. Since storage is finite, there's always some sort of storage management algorithm, which usually includes an eviction model. While oft-requested objects will likely stay in caches longer (assuming they haven't gone past their freshness lifetime), objects requested less frequently are more likely to get evicted from a cache even if their Cache-Control headers say they should be cached longer.
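To make the eviction point concrete, here is a toy sketch of a capacity-bound cache that evicts the least recently used entry. Real CDN eviction models are more sophisticated and vary by provider; the class `SharedLRUCache`, the example URLs, and the use of plain LRU are all assumptions for illustration only.

```python
from collections import OrderedDict


class SharedLRUCache:
    """Toy shared cache: when full, the least recently used entry is evicted."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()  # key -> (body, expires_at)

    def get(self, key, now):
        entry = self._store.get(key)
        if entry is None:
            return None  # never cached, or already evicted
        body, expires_at = entry
        if now >= expires_at:
            del self._store[key]  # past its freshness lifetime
            return None
        self._store.move_to_end(key)  # mark as recently used
        return body

    def put(self, key, body, ttl, now):
        self._store[key] = (body, now + ttl)
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used


cache = SharedLRUCache(capacity=2)
cache.put("a.example/logo.png", "logo", ttl=3600, now=0)
cache.put("b.example/rare.jpg", "rare", ttl=3600, now=1)
cache.get("a.example/logo.png", now=2)                 # popular object is requested again
cache.put("c.example/app.js", "js", ttl=3600, now=3)   # cache is full, something must go

rare = cache.get("b.example/rare.jpg", now=4)  # evicted although its TTL has not expired
logo = cache.get("a.example/logo.png", now=4)  # still cached
print(rare, logo)
```

The rarely requested object is gone after four seconds even though its one-hour lifetime was nowhere near over, which is the long tail behavior described above.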

But just because storage management causes a still-fresh object to be evicted from one cache server doesn't mean that it'll necessarily be evicted from the entire CDN and all its cache servers. CDNs usually have some sort of hierarchical model deployed, where if the server that the client is communicating with doesn't have the object in its cache, it'll likely ask a peer or a parent for the object before trying to fetch it from the origin. The problem is that the peer or the parent isn't always next to the server the client is connected to, so the time it takes to fetch the object and serve it to the client could suffer.

When an edge cache receives the request from the client, even though the TCP connection is between the client and that specific machine, the response can come from any number of possible storage locations, some not on the server at all. The object can be served from the memory of that machine (best case scenario), disk storage on that machine (let's hope that means an SSD because if it doesn't, there's an extra performance hit there too), a local peer/parent, or a not-so-local peer/parent. Performance degrades progressively along that chain, since the object is being served from farther and farther away from the network edge.
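The lookup chain described above can be sketched as an ordered walk through storage tiers. This is a simplified model, not any particular CDN's implementation; the tier names, the `locate` helper, and the example paths are hypothetical.

```python
# Tiers ordered from closest to the client to farthest away (hypothetical names).
TIERS = ["edge-memory", "edge-disk", "local-peer", "remote-parent"]


def locate(key, tier_contents):
    """Return the first tier holding the object, or "origin" if none does."""
    for tier in TIERS:
        if key in tier_contents.get(tier, set()):
            return tier
    return "origin"


contents = {
    "edge-memory": {"/index.html"},
    "edge-disk": {"/app.js"},
    "remote-parent": {"/profiles/1234.jpg"},
}

print(locate("/index.html", contents))         # edge-memory
print(locate("/profiles/1234.jpg", contents))  # remote-parent
print(locate("/brand-new.css", contents))      # origin
```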

Where the object is ultimately served from is usually directly related to how often that object is fetched. The less frequently something gets requested, the higher the chance of it being a miss at the edge cache the client is connected to. In other words, longer tail content (content not fetched very frequently, like social media posts, profile pictures, large ecommerce inventory, etc.) may still be served from the CDN, but not necessarily from a cache near the client.

But most CDNs consider any response from one of their caches a "hit" when they calculate CHR, as long as it didn't make it to origin. This is where CHR, as a metric, can be misleading and give you a false sense of the performance a CDN is providing for your customers. This is also where the object storage model of a CDN becomes crucial. If deployed without enough edge density, without proper scaling at the edge, and/or without optimal eviction algorithms, a lot of your long tail content may be getting served from deep within a CDN, while your CHR appears to be high.
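A small sketch shows how the two views can diverge. The traffic mix below is invented purely for illustration, and which tiers count as "near the client" is an assumption that would differ from one CDN to another.

```python
from collections import Counter

# Hypothetical record of where each of 100 responses was served from.
served_from = (
    ["edge-memory"] * 40
    + ["edge-disk"] * 20
    + ["local-peer"] * 15
    + ["remote-parent"] * 24
    + ["origin"] * 1
)

# Assumption: these tiers are the ones close to the client.
NEAR_CLIENT = {"edge-memory", "edge-disk", "local-peer"}

counts = Counter(served_from)
total = sum(counts.values())

# Traditional view: anything that did not reach the origin is a hit.
traditional = (total - counts["origin"]) / total

# Edge view: only responses served near the client count.
near = sum(n for tier, n in counts.items() if tier in NEAR_CLIENT) / total

print(f"traditional CHR: {traditional:.0%}")   # 99%
print(f"served near the client: {near:.0%}")   # 75%
```

In this made-up mix the dashboard would show 99%, while a quarter of responses came from deep within the CDN or from the origin.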

A better way of calculating CHR

What the traditional CHR calculation is or isn't telling us calls for rethinking how we calculate the metric we care so much about. To get a metric that indicates performance, what we really want to know is the percentage of objects that were served from a CDN cache at the very edge of the network. Something more along the lines of the number of requests served from edge caches divided by the number of cacheable requests, which is really the same as:

CHR = hits[edge] / (hits[edge] + misses[edge])

Where hits[edge] and misses[edge] are the number of cache hits or misses at the edge of the network, respectively. This formula for CHR accurately conveys what percentage of cacheable requests is being served from the edge of the network, closest to the users.

To be honest, the traditional calculation is still extremely useful. It's a very good metric for measuring server offload since it tells us what percentage of requests are kept away from the origin. But, to evaluate performance, we really care about what's happening at the edge. The best way forward, then, is to consider two different CHR metrics, one for CHR at the edge:

CHR[edge] = hits[edge] / (hits[edge] + misses[edge])

And one for global CHR:

CHR[global] = (requests[total] - requests[origin]) / requests[total]

CHR[edge] is a performance metric and CHR[global] is one for offload. They're both valuable and insightful, but they're telling different stories. You should use CHR[edge] to gauge how much of your content is being served from the edge of the network, closer to your users. This translates directly to performance benefits. CHR[global], on the other hand, will tell you how much traffic is kept off of your origin. This translates directly to processing and infrastructure offload, which can lead to great cost savings.
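The two metrics can be computed side by side from four counters. The sketch below assumes those counters are already available from logs or stats; the `TrafficCounts` container and the sample numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TrafficCounts:
    hits_edge: int        # cache hits at the edge of the network
    misses_edge: int      # cache misses at the edge of the network
    requests_total: int   # client requests received by the CDN
    requests_origin: int  # requests that made it to the origin


def chr_edge(c: TrafficCounts) -> float:
    """Performance view: share of cacheable requests served at the edge."""
    return c.hits_edge / (c.hits_edge + c.misses_edge)


def chr_global(c: TrafficCounts) -> float:
    """Offload view: share of requests kept away from the origin."""
    return (c.requests_total - c.requests_origin) / c.requests_total


sample = TrafficCounts(hits_edge=700, misses_edge=300,
                       requests_total=1000, requests_origin=50)
print(f"CHR[edge]   = {chr_edge(sample):.0%}")    # 70%
print(f"CHR[global] = {chr_global(sample):.0%}")  # 95%
```

With these sample counts the origin sees only 5% of traffic, yet 30% of cacheable requests were not served from the edge.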

Calculating cache hit ratios with Fastly

Fastly's network is built so that every one of our points of presence (POPs) is an edge location, all of which you can see on our network map. For scale and storage density, we have layers of cache hierarchy within each POP, transparent to you and your end users' requests. This means that even if a request is a cache miss at the server the client is connected to, there's a good chance it'll still be served from a cache inside the POP it's communicating with, making it a cache hit at the edge. You can also deploy shielding for an extra layer of caching to increase your global cache hit ratio and reduce traffic to your origin.

The CHR that's reported to you via the control panel or through the stats API is calculated as:

CHR = hits / (hits + misses)

With services that have no shielded origins, this is effectively equivalent to both CHR[edge] and CHR[global] since every hit is a hit at the edge and every miss is a miss at the edge (and also a request to origin).

If your service has a shielded origin, where the shield POP adds another level of caching, the formula above still holds, but the hit/miss count will include hits and misses that were shielded. In other words, some requests will count twice. So, it's a hybrid metric. There are a couple of different ways to decouple and calculate CHR[edge] and CHR[global] independently for a shielded service, but I usually prefer to do it using our real-time log streaming.
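A worked example may help show why the hybrid number matches neither metric. The sketch below rests on simplifying assumptions that are not from the original: every request is cacheable, every edge miss is forwarded to the shield POP, every shield miss goes to the origin, and no client connects to the shield POP directly. The function name and counts are hypothetical.

```python
def shielded_ratios(edge_hits, edge_misses, shield_hits, shield_misses):
    """Return (reported hybrid CHR, CHR[edge], CHR[global]) under the stated assumptions."""
    # Assumption: each edge miss shows up again as a hit or a miss at the shield.
    assert edge_misses == shield_hits + shield_misses

    # Hybrid: hits and misses from both layers are pooled, so requests that
    # missed at the edge are counted twice.
    hits = edge_hits + shield_hits
    misses = edge_misses + shield_misses
    reported = hits / (hits + misses)

    # Edge view: only what happened at the POP the client talked to.
    edge = edge_hits / (edge_hits + edge_misses)

    # Global view: client requests that never reached the origin.
    client_requests = edge_hits + edge_misses
    global_ = (client_requests - shield_misses) / client_requests
    return reported, edge, global_


# 100 client requests: 80 edge hits, 20 edge misses; the shield serves 15 of those.
reported, edge, global_ = shielded_ratios(80, 20, 15, 5)
print(f"reported: {reported:.1%}")  # 79.2%
print(f"edge:     {edge:.1%}")      # 80.0%
print(f"global:   {global_:.1%}")   # 95.0%
```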

The following VCL snippet sets up log streaming for CHR calculation with a couple of local variables:

Then, calculating CHR[edge] and CHR[global] from these log lines is pretty simple:

Where:

Likewise:

Where:

Using the sample log entries from above as an example, which have the origin shielded at our Dulles, VA (IAD) POP, we'd calculate the following:

And:

This provides a simple yet good example of why the two metrics indicate different things and should be evaluated separately.

While log streaming makes it easy to decouple the two metrics for shielded services, we're working on making it even easier by reporting the metrics independently through our stats system in the not-so-distant future.

Going forward

At Fastly, we talk a lot about doing things at the edge and why that's good for your end users. Caching remains one of these core functions and you should never lose sight of how well a CDN is caching your content at the edge. To this point, it's important to understand how CDNs report cache hit ratio and what it means for your content and your end users' experience. Calculating cache hit ratios both at the edge and globally is a great way to truly get a sense of how your content is being served.


Author

Hooman Beheshti | VP of Technology

Hooman Beheshti is VP of Technology at Fastly, where he develops web performance services. A pioneer in the application acceleration space, Hooman helped design one of the original load balancers while at Radware and has held senior technology positions with Strangeloop Networks and Crescendo Networks. He's been developing the core technologies that make the Internet work faster for nearly 20 years and is an expert and frequent speaker on the subjects of load balancing, application performance, and content delivery networks.
