Baseline(s)..

Merriam-Webster defines a baseline as "a line serving as a basis"; you can also think of it as a starting point used for comparisons.

In site reliability engineering (SRE), establishing a performance baseline for workloads is common practice. The baseline serves as a reference point for monitoring and improving system performance over time.

In my previous experience as an SRE, particularly supporting critical e-commerce websites, I learned that it is beneficial to establish more than one baseline, for a more holistic view and more accurate analysis.

I have found two types of baseline particularly useful:

[A] Performance Baseline Influenced by External Factors: This baseline takes into account real-world conditions such as network variability, user device performance and other external influences. It reflects performance as experienced by end users, providing insight into how these factors affect the user experience.

[B] Performance Baseline Detached from External Factors: This is a controlled baseline that isolates the system's performance from external variables, offering a clean view of the system's own capabilities. It helps identify the inherent performance characteristics of the raw system, independent of external conditions. This isolation matters because external conditions are always variable.

Let's take a simple example: latency. The two baselines offer different insights into this metric. The first, influenced by external factors, shows how users experience latency under various conditions, which is critical for understanding and optimising the user experience. The second, detached from external factors, reveals the system's own baseline latency, helping to identify potential improvements in the infrastructure or application itself.

When combined, they provide a holistic view of where performance stands and where it can be tuned.
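To make the latency example concrete, here is a minimal Python sketch. It assumes you already have two sets of latency samples in milliseconds: one reflecting real users (baseline [A], for example exported from a RUM tool) and one from a controlled environment (baseline [B]). It summarises each set as p50/p95 percentiles and reports the gap between them as a rough indicator of how much latency the external factors add. The function names, the choice of percentiles and the sample values are all hypothetical, chosen for illustration only.

```python
import math
from collections.abc import Sequence
from dataclasses import dataclass


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("cannot compute a percentile of zero samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


@dataclass(frozen=True)
class LatencyBaseline:
    p50_ms: float
    p95_ms: float


def build_baseline(samples_ms: Sequence[float]) -> LatencyBaseline:
    """Summarise a set of latency samples as a p50/p95 baseline."""
    return LatencyBaseline(percentile(samples_ms, 50), percentile(samples_ms, 95))


def external_gap(a: LatencyBaseline, b: LatencyBaseline) -> dict[str, float]:
    """Baseline [A] (user-experienced) minus baseline [B] (controlled), per percentile."""
    return {
        "p50_gap_ms": a.p50_ms - b.p50_ms,
        "p95_gap_ms": a.p95_ms - b.p95_ms,
    }


# Hypothetical, made-up samples in milliseconds; not real measurements.
rum_samples_ms = [120, 135, 150, 160, 180, 210, 240, 300, 420, 650]  # baseline [A]
synthetic_samples_ms = [40, 42, 43, 45, 46, 48, 50, 52, 55, 70]     # baseline [B]

baseline_a = build_baseline(rum_samples_ms)
baseline_b = build_baseline(synthetic_samples_ms)
print(external_gap(baseline_a, baseline_b))  # {'p50_gap_ms': 134, 'p95_gap_ms': 580}
```

Nearest-rank percentiles keep the arithmetic easy to follow. Because percentiles are not additive, the gap is best read as a signal for where to look, not as an exact breakdown of where the latency comes from.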

Real User Monitoring (RUM) tools are very useful for understanding baseline [A]; for baseline [B], however, I tend to rely on other mechanisms, which I will cover later (in Part 2 of this article).

But first, I want to highlight the importance of establishing multiple baselines. In my SRE days I was often asked why we needed to do this, so here are my thoughts.

Context: Without contextual information, a baseline adds little value. Different baselines capture different aspects of your system's performance across a range of contexts. For example, having one baseline for server response times under stable network conditions and another reflecting real-world user conditions gives a more nuanced understanding of performance across different scenarios.

Improved visibility and control: Multiple baselines enable more accurate analysis and troubleshooting. You can compare performance across different dimensions, which helps identify the specific scenarios where performance degrades and tuning is required.

Targeted tuning: With baselines for different conditions and components of your service, you can target tuning more effectively. For instance, if one baseline shows degraded performance due to server load while another shows issues related to client-side rendering, you can prioritise resources and effort to address those specific areas.

Risk mitigation: Multiple baselines also help with risk management. For example, if a new deployment badly affects performance, established baselines help you quickly quantify the impact and roll back changes if necessary. Workloads and their usage patterns also evolve over time, so what constitutes "normal" performance can change. Multiple baselines can help track this evolution more accurately, allowing you to adjust performance expectations and optimisation strategies.
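The points above can be sketched in Python, assuming each latency sample is tagged with a context key (here, which baseline it belongs to plus a device or region dimension). Each context keeps a rolling window of recent samples, so "normal" can drift as the workload evolves, and a post-deployment check flags contexts whose p95 latency exceeds its baseline by more than a tolerance. `BaselineStore`, `regression_check`, the context labels, the 1,000-sample window and the 20% tolerance are hypothetical choices for illustration, not a recommended configuration.

```python
import math
from collections import deque
from collections.abc import Iterable, Sequence

# A context key such as ("rum", "mobile") or ("synthetic", "eu-west"); labels are hypothetical.
Context = tuple[str, ...]


def nearest_rank_p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty set of samples."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    return ordered[math.ceil(95 * len(ordered) / 100) - 1]


class BaselineStore:
    """Rolling latency baselines: one bounded window of recent samples per context."""

    def __init__(self, window: int = 1000) -> None:  # hypothetical default window size
        self._window = window
        self._samples: dict[Context, deque] = {}

    def record(self, context: Context, samples_ms: Iterable[float]) -> None:
        # deque(maxlen=...) drops the oldest samples, so the baseline follows recent "normal".
        buf = self._samples.setdefault(context, deque(maxlen=self._window))
        buf.extend(samples_ms)

    def p95(self, context: Context) -> float:
        if context not in self._samples:
            raise KeyError(f"no baseline recorded for {context}")
        return nearest_rank_p95(self._samples[context])


def regression_check(
    store: BaselineStore,
    post_deploy: dict[Context, Sequence[float]],
    tolerance: float = 0.2,  # hypothetical: flag a p95 increase of more than 20%
) -> dict[Context, float]:
    """Return {context: relative p95 increase} for contexts that regressed beyond tolerance."""
    regressions: dict[Context, float] = {}
    for context, samples_ms in post_deploy.items():
        baseline = store.p95(context)  # assumes latency baselines are > 0 ms
        change = (nearest_rank_p95(samples_ms) - baseline) / baseline
        if change > tolerance:
            regressions[context] = change
    return regressions


# Hypothetical, made-up data in milliseconds.
store = BaselineStore()
store.record(("rum", "mobile"), range(100, 300, 10))  # 100, 110, ..., 290
store.record(("synthetic", "eu-west"), [50] * 20)

post_deploy = {
    ("rum", "mobile"): list(range(200, 400, 10)),  # every sample 100 ms slower
    ("synthetic", "eu-west"): [52] * 20,
}
flagged = regression_check(store, post_deploy)
print(flagged)  # only ("rum", "mobile") is flagged: p95 rose from 280 to 380 ms (~36%)
```

In this made-up data only the real-user mobile context regresses while the controlled baseline barely moves, which might point towards client-side or network factors rather than the server itself. That is the kind of signal that helps decide where tuning effort should go, or whether a rollback is warranted.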

Establishing and maintaining multiple baselines for performance monitoring takes more effort, but it can lead to deeper insights and more effective management of site reliability.

[ To be continued..]

More articles by Ron Sengupta

  • Digital Asset Custodians in Securing Crypto

    As a longtime Crypto and (Crypto security) enthusiast, I was really interested when I came across news that Indian…

  • Cartesian theater

    [ To Be Continued]

  • Blockchain and DRM

    Blockchain is a buzzword that’s thrown around a lot, especially when it comes to Digital Rights Management (DRM)…

  • Supply Chain Mgmt. Simulation in the Beer Pub

    The Role of Simulation in Supply Chain Management As per the “Supply Chain Management: Strategy, Planning, and…

  • Right to be forgotten

    I was researching “right to be forgotten” which you might also hear being called “the right to erasure," I am bit…

  • Sherlock Holmes, Observability and Chaos Engineering - Fun Story

    "Without observability, you don’t have ‘chaos engineering’. You just have chaos.

  • An Enterprise Platform Design

    Enterprise Kubernetes(Openshift ) is the modern-day operating system for the cloud. It has a wide range of constructs…

  • End to end encryption with Openshift-Two-way-SSL

    This article aims to demonstrate use cases for Openshift routes to achieve end-to-end encryption. This is a desirable…

  • JDBC Tuning- Part 1

    Before we start please note all R&D should be done in integration/staging environment and not in production…

  • Harry Potter

    I wrote this almost ten years back in 2005, just before "Half blood prince" was published. There was a contest from a…
