All about SLAs, SLIs and SLOs
Pratima Upadhyay
Engineering at Airbnb | Women in Tech Mentor | Distributed Systems | Cloud Computing | System design | Data structures | Algorithms
How do we measure the response time of a service?
Response time is the time between a client sending a request and receiving a response. Even if we make the same request over and over again, we’ll get a slightly different response time on every try. For this reason, it’s common to see the average response time of a service being reported. The term “average” is usually understood as the arithmetic mean. In this case, however, the mean is not a very good metric if you want to know your “typical” response time: a few very slow requests can pull it well above what most users see, and it doesn’t tell us how many users actually experienced that delay.
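To make this concrete, here is a minimal sketch in Python. The response times are made-up values chosen for illustration (most requests are fast, two are slow), and the function name is hypothetical.

```python
from statistics import mean

# Hypothetical response times in milliseconds: eight fast requests, two slow outliers.
response_times_ms = [100, 102, 98, 105, 101, 99, 103, 97, 2000, 3000]


def mean_and_share_slower(samples):
    """Return the arithmetic mean and the share of samples slower than that mean."""
    avg = mean(samples)
    slower = sum(1 for s in samples if s > avg)
    return avg, slower / len(samples)


avg, share = mean_and_share_slower(response_times_ms)
print(f"mean = {avg:.1f} ms, share of requests slower than the mean = {share:.0%}")
```

With this sample, the mean lands far above what eight of the ten requests experienced, and the mean on its own gives no hint that only two requests were slow.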
Why is the median a better indicator of response time?
The median is a good metric if you want to know how long users typically have to wait: half of user requests are served in less than the median response time, and the other half take longer. The median is also known as the 50th percentile, sometimes abbreviated as p50. To see how slow the slowest requests are, higher percentiles are also commonly reported: the 95th, 99th, and 99.9th (abbreviated p95, p99, and p999).
These higher percentiles are the response time thresholds below which 95%, 99%, or 99.9% of requests fall. For example, if the 95th percentile response time is 1.5 seconds, that means 95 out of 100 requests take less than 1.5 seconds, and 5 out of 100 requests take 1.5 seconds or more.
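The percentiles described above can be computed from a list of measured response times. The sketch below uses the nearest-rank method, which is only one of several common percentile definitions (others interpolate between samples, so their results can differ slightly). The `percentile` helper and the sample data are assumptions made for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: 10, 20, ..., 200.
response_times_ms = [10 * i for i in range(1, 21)]

p50 = percentile(response_times_ms, 50)
p95 = percentile(response_times_ms, 95)
p99 = percentile(response_times_ms, 99)
print(f"p50 = {p50} ms, p95 = {p95} ms, p99 = {p99} ms")
```

In practice, a monitoring system would usually compute these values over a rolling time window rather than over a fixed list.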
SLIs, SLAs and SLOs
Percentiles are often used in service level objectives (SLOs) and service level agreements (SLAs).
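As a rough illustration of how a percentile-based objective might be checked, the sketch below compares observed percentiles against target thresholds. The targets (median under 200 ms, p99 under 1000 ms), the sample data, and the function names are all hypothetical. Real SLOs and SLAs are defined per service.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical objective: percentile -> maximum allowed response time in ms.
SLO_TARGETS_MS = {50: 200, 99: 1000}


def check_slo(samples, targets):
    """Return {percentile: (observed_ms, target_ms, met)} for each target."""
    report = {}
    for p, target in targets.items():
        observed = percentile(samples, p)
        report[p] = (observed, target, observed < target)
    return report


# Hypothetical measurements: 97 fast requests and 3 slow ones.
samples_ms = [120] * 97 + [900, 1500, 2500]
report = check_slo(samples_ms, SLO_TARGETS_MS)
for p, (observed, target, met) in report.items():
    print(f"p{p}: observed {observed} ms, target < {target} ms, met = {met}")
```

With this data the median target is met while the p99 target is not, which shows how a tail-latency problem can hide behind a healthy median.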