The truth about cache hit ratios

Since caching is one of the primary services a CDN provides, one of the most common metrics for evaluating CDN performance is cache hit ratio (CHR). CDN customers have used it for years as a primary indicator of how well a CDN is serving their users and handling their traffic. It’s not uncommon to see “98% cache hit ratio” in a dashboard and become easily convinced that end users are getting the most out of the CDN.

But there’s much more to CHR than meets the eye, and the metric we often hold so dear may not be telling us what we think it’s telling us. So, I thought it’d be a good idea to dig into what CHR is actually measuring and how we may need new ways of calculating and evaluating it.

Traditional CHR calculation

For many years, CDNs have used the following formula to calculate CHR:

    CHR = (requests[total] - requests[origin]) / requests[total]

Where requests[total] is the total number of client requests received by the CDN and requests[origin] is the number of those requests that made it to the origin. Basically, this means if we send 100 requests to the CDN and only one of them leaves the CDN to reach the origin, we'd have a CHR of 99%.
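As a quick sanity check of that arithmetic, here is a minimal Python sketch. The function name `traditional_chr` is my own choice for illustration, not something a CDN exposes.

```python
def traditional_chr(requests_total: int, requests_origin: int) -> float:
    """Share of client requests that never reached the origin."""
    if requests_total <= 0:
        raise ValueError("requests_total must be positive")
    if not 0 <= requests_origin <= requests_total:
        raise ValueError("requests_origin must be between 0 and requests_total")
    return (requests_total - requests_origin) / requests_total


# 100 requests reach the CDN and one of them goes on to the origin.
print(f"{traditional_chr(100, 1):.0%}")  # prints 99%
```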

The problem is that when we hear "99% cache hit ratio," we instinctively think that this means 99% of our users’ requests were served at the edge of the CDN, from caches closest to the clients that made the requests. The truth is that this isn't necessarily the case, and to understand why that is, we need to have a discussion about CDN architecture, cache hierarchies, and long tail content.

How CDNs are built

I don't want to go into too much detail, and I certainly can't speak for all CDNs, but at the highest level, a CDN is a distributed network of caching proxies. These proxies, and therefore their caches, are shared. They receive requests and cache responses across a large number of domains, all of them belonging to the CDN's customers. This means that the storage of each server is also shared. Since storage is finite, there's always some sort of storage management algorithm, and it usually includes an eviction model. While oft-requested objects will likely stay in caches longer (assuming they haven't gone past their freshness lifetime), objects requested less frequently are more likely to get evicted from a cache even if their Cache-Control headers say they should be cached longer.
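To make the eviction point concrete, here is a toy sketch of a shared cache with a fixed capacity. It assumes plain least-recently-used (LRU) eviction, which is my simplification; real CDNs use their own, more sophisticated algorithms. The class name, paths, and TTL values are all hypothetical.

```python
from collections import OrderedDict


class SharedLRUCache:
    """Toy LRU cache with per-object TTLs and a fixed object capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._store: "OrderedDict[str, float]" = OrderedDict()  # key -> expiry time

    def get(self, key: str, now: float) -> bool:
        """Return True on a fresh hit and mark the object as recently used."""
        expires_at = self._store.get(key)
        if expires_at is None:
            return False
        if expires_at <= now:
            del self._store[key]  # stale: past its freshness lifetime
            return False
        self._store.move_to_end(key)
        return True

    def put(self, key: str, ttl: float, now: float) -> None:
        """Insert an object, evicting the least recently used one if full."""
        self._store[key] = now + ttl
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evicted even if still fresh


cache = SharedLRUCache(capacity=2)
cache.put("/popular.css", ttl=3600, now=0)
cache.put("/long-tail.jpg", ttl=86400, now=0)  # headers say "cache for a day"
cache.get("/popular.css", now=10)              # the popular object stays hot
cache.put("/other-customer.js", ttl=3600, now=20)

print(cache.get("/long-tail.jpg", now=30))  # False: evicted long before its TTL
print(cache.get("/popular.css", now=30))    # True
```

The long tail object is pushed out by another customer's traffic 20 seconds in, even though its TTL allowed a full day.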

But just because storage management causes a still-fresh object to be evicted from one cache server doesn't mean that it'll necessarily be evicted from the entire CDN and all its cache servers. CDNs usually have some sort of hierarchical model deployed, where if a server that the client is communicating with doesn't have the object in its cache, it'll likely ask a peer or a parent for the object before trying to fetch it from the origin. The problem is that the peer or the parent isn't always next to the server the client is connected to. So the time it takes to fetch the object and serve it to the client could suffer. This is better illustrated with a diagram:

When an edge cache receives the request from the client, even though the TCP connection is between the client and that specific machine, the response can come from any number of possible storage locations, some not on the server at all. The object can be served from the memory of that machine (best case scenario), disk storage on that machine (let's hope that means an SSD because if it doesn't, there's an extra performance hit there too), a local peer/parent, or a not-so-local peer/parent. Performance degrades progressively along that chain, since the object is being served from farther and farther away from the network edge.
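The lookup chain described above can be sketched roughly as follows. The tier names, the `serve` helper, and the sample contents are hypothetical; the sketch only shows that a response can come from several places before the origin is ever contacted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tiers, ordered from nearest to farthest from the client.
TIERS: List[str] = ["edge-memory", "edge-disk", "local-peer", "remote-parent"]


def serve(key: str, contents: Dict[str, Set[str]]) -> Tuple[str, bool]:
    """Return (tier that served the object, whether the origin was contacted)."""
    for tier in TIERS:
        if key in contents.get(tier, set()):
            return tier, False
    return "origin", True


contents = {
    "edge-memory": {"/logo.png"},
    "edge-disk": {"/app.js"},
    "remote-parent": {"/profile/123.jpg"},
}

for key in ["/logo.png", "/app.js", "/profile/123.jpg", "/new-item.html"]:
    tier, went_to_origin = serve(key, contents)
    print(f"{key}: served from {tier}, origin contacted: {went_to_origin}")
```

Only the last request reaches the origin, yet the profile picture still travels from a remote parent rather than from the machine the client is connected to.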

Where the object is ultimately served from is usually directly related to how often that object is fetched. The less frequently something gets requested, the higher the chance of it being a miss at the edge cache the client is connected to. In other words, longer tail content (content not fetched very frequently, like social media posts, profile pictures, large ecommerce inventory, etc.) may still be served from the CDN, but not necessarily from a cache near the client.

But most CDNs consider any response from one of their caches a "hit" when they calculate CHR, as long as it didn't make it to origin. This is where CHR, as a metric, can be misleading and give you a false sense of the performance a CDN is providing for your customers. This is also where the object storage model of a CDN becomes crucial. If deployed without enough edge density, without proper scaling at the edge, and/or without optimal eviction algorithms, a lot of your long tail content may be getting served from deep within a CDN, while your CHR appears to be high.

A better way of calculating CHR

What the traditional CHR calculation is or isn't telling us necessitates a rethinking of how we should be calculating the metric we care so much about. To get a metric that indicates performance, what we really want to know is the percentage of objects that were served from a CDN cache at the very edge of the network. Something more along the lines of:

    CHR = (cache hits at the edge) / (cacheable requests at the edge)

Which is really the same as:

    CHR = hits[edge] / (hits[edge] + misses[edge])

Where hits[edge] and misses[edge] are the number of cache hits or misses at the edge of the network, respectively. This formula for CHR accurately conveys what percentage of cacheable requests is being served from the edge of the network, closest to the users.

To be honest, the traditional calculation is still extremely useful. It’s a very good metric for measuring server offload since it tells us what percentage of requests are kept away from the origin. But, to evaluate performance, we really care about what’s happening at the edge. The best way forward, then, is to consider two different CHR metrics, one for CHR at the edge:

    CHR[edge] = hits[edge] / (hits[edge] + misses[edge])

And one for global CHR:

    CHR[global] = (requests[total] - requests[origin]) / requests[total]

CHR[edge] is a performance metric and CHR[global] is one for offload. They’re both valuable and insightful, but they’re telling different stories. You should use CHR[edge] to gauge how much of your content is being served from the edge of the network, closer to your users. This translates directly to performance benefits. CHR[global], on the other hand, will tell you how much traffic is kept off of your origin. This translates directly to processing and infrastructure offload, which can lead to great cost savings.
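Here is a small sketch of how far apart the two numbers can be for the same traffic. The counts are invented for illustration, every request is assumed to be cacheable, and the `Counts` structure is my own.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    hits_edge: int        # served from a cache at the edge
    hits_deeper: int      # edge miss, served by a peer or parent inside the CDN
    requests_origin: int  # edge miss that went all the way to origin

    @property
    def requests_total(self) -> int:
        return self.hits_edge + self.hits_deeper + self.requests_origin


def chr_edge(c: Counts) -> float:
    """Performance view: share of requests answered at the edge."""
    misses_edge = c.hits_deeper + c.requests_origin
    return c.hits_edge / (c.hits_edge + misses_edge)


def chr_global(c: Counts) -> float:
    """Offload view: share of requests kept away from the origin."""
    return (c.requests_total - c.requests_origin) / c.requests_total


c = Counts(hits_edge=700, hits_deeper=280, requests_origin=20)
print(f"CHR[edge]   = {chr_edge(c):.1%}")    # 70.0%
print(f"CHR[global] = {chr_global(c):.1%}")  # 98.0%
```

A dashboard showing only the second number would look excellent, while nearly a third of the responses in this made-up example came from deeper in the network.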

Calculating cache hit ratios with Fastly

Fastly's network is built in a way where every one of our points of presence (POPs) is an edge location, all of which you can see on our network map. For scale and storage density, we have layers of cache hierarchy within each POP, transparent to you and your end users’ requests. This means that even if a request is a cache miss at the server the client is connected to, there's a good chance it'll still be served from a cache inside the POP it’s communicating with, making it a cache hit at the edge. You can also deploy shielding for an extra layer of caching to increase your global cache hit ratio and reduce traffic to your origin.

The CHR that’s reported to you via the control panel or through the stats API is calculated as:

    CHR = hits / (hits + misses)

With services that have no shielded origins, this is effectively equivalent to both CHR[edge] and CHR[global] since every hit is a hit at the edge and every miss is a miss at the edge (and also a request to origin).

If your service has a shielded origin, where the shield POP adds another level of caching, the formula above still holds, but the hit/miss count will include hits and misses that were shielded. In other words, some requests will count twice. So, it’s a hybrid metric. There are a couple of different ways to decouple and calculate CHR[edge] and CHR[global] independently for a shielded service, but I usually prefer to do it using our real-time log streaming.
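To see why the combined number is a hybrid, consider this simplified sketch. The counts are invented, and it assumes every client request lands on a POP other than the shield, so each edge miss is counted a second time when it arrives at the shield POP.

```python
def ratio(hits: int, misses: int) -> float:
    return hits / (hits + misses)


# Illustrative numbers only; every client request lands on a non-shield POP.
edge_hits, edge_misses = 60, 40      # 100 client requests
shield_hits, shield_misses = 30, 10  # the 40 edge misses, seen again at the shield

reported = ratio(edge_hits + shield_hits, edge_misses + shield_misses)
chr_edge = ratio(edge_hits, edge_misses)
chr_global = 1 - shield_misses / (edge_hits + edge_misses)

print(f"reported (hybrid) = {reported:.1%}")    # 64.3%
print(f"CHR[edge]         = {chr_edge:.1%}")    # 60.0%
print(f"CHR[global]       = {chr_global:.1%}")  # 90.0%
```

In this example the hybrid figure matches neither the edge ratio nor the global ratio, because 140 hit/miss events are being counted for only 100 client requests.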

The following VCL snippet sets up log streaming for CHR calculation with a couple of local variables:

Then, calculating CHR[edge] and CHR[global] from these log lines is pretty simple:

Where:

Likewise:

Where:

Using the sample log entries from above as an example — which has the origin shielded at our Dulles, VA (IAD) POP — we’d calculate the following:

And:

Which provides a simple yet good example of why the two metrics indicate different things and should be evaluated separately.
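As a rough sketch of this kind of post-processing, the Python below computes both metrics from log lines. The log format is an assumption of mine, not the format produced by the VCL above: each line is `POP STATE ROLE`, where ROLE is `edge` when the request came straight from a client and `shield` when it was forwarded from another POP. The sample lines are made up as well.

```python
from typing import Iterable, Tuple

SHIELD_POP = "IAD"  # the article's example shields at Dulles, VA


def chr_from_logs(lines: Iterable[str], shield_pop: str = SHIELD_POP) -> Tuple[float, float]:
    """Return (CHR[edge], CHR[global]) from hypothetical 'POP STATE ROLE' lines."""
    hits_edge = misses_edge = requests_origin = 0
    for line in lines:
        pop, state, role = line.split()
        is_hit = state == "HIT"
        if role == "edge":
            if is_hit:
                hits_edge += 1
            else:
                misses_edge += 1
                if pop == shield_pop:
                    requests_origin += 1  # miss at the shield POP itself
        elif not is_hit:
            requests_origin += 1  # forwarded request also missed at the shield
    requests_total = hits_edge + misses_edge
    if requests_total == 0:
        raise ValueError("no edge log lines found")
    return hits_edge / requests_total, 1 - requests_origin / requests_total


# Made-up sample lines, not the article's original log entries.
sample = [
    "SJC HIT edge",
    "SJC MISS edge", "IAD HIT shield",
    "LHR MISS edge", "IAD MISS shield",
    "IAD HIT edge",
]
edge, global_ = chr_from_logs(sample)
print(f"CHR[edge] = {edge:.0%}, CHR[global] = {global_:.0%}")  # 50%, 75%
```

Counting only `edge` lines toward the total is what keeps forwarded requests from being counted twice.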

While log streaming makes it easy to decouple the two metrics for shielded services, we’re working on making it even easier by reporting the metrics independently through our stats system in the not-so-distant future.

Going forward

At Fastly, we talk a lot about doing things at the edge and why that’s good for your end users. Caching remains one of these core functions and you should never lose sight of how well a CDN is caching your content at the edge. To this point, it’s important to understand how CDNs report cache hit ratio and what it means for your content and your end users’ experience. Calculating cache hit ratios both at the edge and globally is a great way to truly get a sense of how your content is being served.

Author

Hooman Beheshti | VP of Technology

Hooman Beheshti is VP of Technology at Fastly, where he develops web performance services. A pioneer in the application acceleration space, Hooman helped design one of the original load balancers while at Radware and has held senior technology positions with Strangeloop Networks and Crescendo Networks. He’s been developing the core technologies that make the Internet work faster for nearly 20 years and is an expert and frequent speaker on the subjects of load balancing, application performance, and content delivery networks.











