SRE concepts part 7 (White/Black Box Monitoring)

The seventh article in this series on SRE concepts covers two topics: white-box and black-box monitoring.

White-box monitoring

White-box monitoring is one of two complementary approaches to monitoring distributed systems; the other is black-box monitoring. While black-box monitoring deals with externally visible resources such as disk space, CPU usage, and other physical variables, white-box monitoring focuses on the applications running on the servers and on other internals such as logs.

A Site Reliability Engineer needs to be familiar with both black-box and white-box monitoring to maintain applications. Even though the two approaches look at opposite sides of the coin, they are more closely related than you'd think. They bring numerous advantages when used together, and each also offers some decent standalone benefits.

Applications of White-Box Monitoring

The potential applications of white-box monitoring depend on the kind of application you are running. For instance, monitoring user logs and request logs is a case of white-box monitoring.

If you are running a web application, white-box methods let you carefully monitor many parameters, including the number of active users, the number of requests to access user profiles, and the number of requests to post a comment.

White-box monitoring also includes keeping an eye on the HTTP handlers and the usage logs.
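As a minimal sketch of what counting requests per handler could look like, the Python example below wraps two made-up handlers in a decorator that increments an in-process counter. The names `REQUEST_COUNTS`, `counted`, `view_profile`, and `post_comment` are hypothetical and chosen only for illustration; a real service would more likely export such counters through its monitoring library.

```python
from collections import Counter
from functools import wraps

# Hypothetical in-process metrics store: handler name -> number of calls.
REQUEST_COUNTS = Counter()


def counted(handler):
    """Wrap a handler so that every call increments its request counter."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        REQUEST_COUNTS[handler.__name__] += 1
        return handler(*args, **kwargs)
    return wrapper


@counted
def view_profile(user_id):
    # Stand-in for a real "access user profile" handler.
    return f"profile {user_id}"


@counted
def post_comment(user_id, text):
    # Stand-in for a real "post a comment" handler.
    return f"comment by {user_id}: {text}"


if __name__ == "__main__":
    view_profile(1)
    view_profile(2)
    post_comment(1, "hello")
    print(dict(REQUEST_COUNTS))
```

Because the counters live inside the application, they can only be collected from the inside, which is what makes this a white-box measurement.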

Advantages of White-Box Monitoring

There are numerous advantages to white-box monitoring. With it, you monitor your application's essential services, such as its requests and its traffic in general. Analyzing these numbers can give you valuable information.

For instance, black-box monitoring cannot tell you how many users are watching the Super Bowl game video. With white-box monitoring set up, however, you can easily identify which video users access the most and feature it on your home page.
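The video example can be sketched as a small log analysis. The code below assumes a simplified request log where each line looks like `GET /videos/<id>`; that format, the `/videos/` prefix, and the function name `most_watched` are assumptions made for this illustration, not something taken from a real system.

```python
from collections import Counter

VIDEO_PREFIX = "/videos/"  # hypothetical URL layout


def most_watched(request_log, top_n=1):
    """Return the top_n (video_id, hits) pairs from lines like 'GET /videos/<id>'."""
    counts = Counter()
    for line in request_log:
        parts = line.split()
        # Only count GET requests whose path points at a video.
        if len(parts) >= 2 and parts[0] == "GET" and parts[1].startswith(VIDEO_PREFIX):
            counts[parts[1][len(VIDEO_PREFIX):]] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    log = [
        "GET /videos/superbowl",
        "GET /videos/cats",
        "GET /videos/superbowl",
        "POST /comments",
    ]
    print(most_watched(log))
```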

White-box monitoring can also give valuable statistics on how long a request took to complete. This becomes essential when you are trying to characterize traffic and understand the resources at hand.
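One way to collect request durations from inside the application is a timing decorator. The sketch below is illustrative only: `LATENCIES`, `timed`, `summarize`, and the `load_profile` handler are hypothetical names, and the summary is limited to count, mean, and maximum.

```python
import time
from statistics import mean

# Hypothetical store: metric name -> list of observed durations in seconds.
LATENCIES = {}


def timed(name):
    """Decorator factory that records how long each call to a function takes."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the handler raised an exception.
                LATENCIES.setdefault(name, []).append(time.perf_counter() - start)
        return wrapper
    return decorator


def summarize(name):
    """Return count, mean, and max duration for one metric."""
    samples = LATENCIES.get(name, [])
    if not samples:
        return {"count": 0}
    return {"count": len(samples), "mean_s": mean(samples), "max_s": max(samples)}


@timed("load_profile")
def load_profile(user_id):
    time.sleep(0.01)  # stand-in for real work
    return f"profile {user_id}"


if __name__ == "__main__":
    for uid in range(3):
        load_profile(uid)
    print(summarize("load_profile"))
```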

Depending on traffic, you can dynamically scale your cloud resources up or down and pay only for what you use. Even though white-box monitoring has numerous advantages on its own, it is usually deployed along with black-box monitoring for maximum effectiveness.
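The scaling idea can be illustrated with a little arithmetic: divide the observed request rate by what one instance can handle, round up, and clamp the result to sensible bounds. The function name `desired_instances`, its parameters, and the default limits are assumptions made for this sketch; real autoscalers use their own policies.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      min_instances=1, max_instances=20):
    """Estimate how many instances are needed for the observed traffic."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Never go below the minimum or above the maximum.
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # 950 requests/s at 100 requests/s per instance needs 10 instances.
    print(desired_instances(950, 100))
```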

White-box and black-box monitoring go hand in hand like Yin and Yang. One can exist without the other, but the picture won't be complete. If you want the most effective monitoring of your application, it is best to use both white-box and black-box monitoring.

Who Maintains White-Box Monitoring?

White-box monitoring used to be a job for both system admins and SREs. However, since it is all internal, you can use many automation tools to reduce the engineers' workload. As a result, the workload has shifted from system admins to DevOps engineers or SREs.

Conclusion

Site Reliability Engineering requires both black-box and white-box monitoring to be effective. However, you can decide how to distribute the effort and prioritize one over the other.

Black-box monitoring

Black-box monitoring is a familiar concept in both DevOps and Site Reliability Engineering, and every DevOps engineer and Site Reliability Engineer has to know it. Monitoring is essential for any application to ensure proper availability and efficiency.

Black-box monitoring traditionally refers to monitoring servers, with a focus on hardware resources such as disk space, memory usage, and CPU usage. It is often referred to as standard system metrics monitoring, and it is essential for understanding the costs involved in running or scaling an application.

We can also define black-box monitoring as monitoring externally visible resources. All the metrics it covers relate to resources that are visible from the outside, such as disk drives and RAM modules.
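A minimal sketch of collecting such system metrics with only the Python standard library is shown below. The function name `resource_snapshot` and the dictionary keys are hypothetical. Memory usage is left out because the standard library has no portable call for it, and the load average is only added on platforms where `os.getloadavg` exists.

```python
import os
import shutil


def resource_snapshot(path="/"):
    """Collect a few basic host metrics for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    snapshot = {
        "disk_total_bytes": usage.total,
        "disk_used_percent": round(100 * usage.used / usage.total, 1),
        "cpu_count": os.cpu_count(),
    }
    # os.getloadavg is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_1min"] = os.getloadavg()[0]
        except OSError:
            pass  # load average could not be obtained
    return snapshot


if __name__ == "__main__":
    print(resource_snapshot())
```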

Why Black-Box Monitoring?

Black-box monitoring is essential for knowing the resource availability of a network. If some of the servers are under attack and there is no black-box monitoring in place, the attack may spread to other servers as well, potentially bringing down the entire network. When implemented in legacy systems, black-box monitoring also provides information that helps with scaling decisions.

Applications of Black-Box Monitoring

Black-box monitoring has numerous applications. Some of the primary ones include monitoring network switches and devices such as load balancers. Black-box monitoring is also used to monitor hypervisor-level resources. A hypervisor is the software layer that allows multiple virtual machines to run on one physical system.

Monitoring such vital components reveals a great deal about their health and about how resources are distributed.
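One simple way to check a device such as a load balancer from the outside is to test whether it accepts TCP connections. The sketch below uses the standard `socket` module; the function name `is_reachable` and the address in the usage example are placeholders, not values from a real setup.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address; replace with the device you want to probe.
    print(is_reachable("127.0.0.1", 80))
```

A probe like this knows nothing about what happens inside the device; it only reports what an outside client would observe.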

Who Controls Black-Box Monitoring?

Usually, either DevOps engineers or system administrators monitor the servers and other networking devices. However, as more and more tools are developed for white-box monitoring, the load on DevOps has dropped considerably. Nowadays, black-box monitoring is typically a shared responsibility, either between DevOps engineers and system admins or between SREs and admins.

Uses of Black-box Monitoring

Since black-box monitoring covers all the external devices, it lets you spot defects in them. For instance, suppose a CPU was damaged during transportation. Any hardware damage caused by that accident can be spotted using black-box monitoring.

You can also pinpoint some software issues with black-box monitoring. If a driver fails or the OS crashes unexpectedly, system resource usage may rise or fall accordingly.

In the case of a driver error, system interrupts can cause CPU usage to rise to 100%. Such issues are easy to detect using black-box monitoring. In general, however, black-box and white-box monitoring systems are used together.
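Detecting the pinned-CPU case can be sketched as a threshold check over recent CPU samples. The function name `sustained_high_cpu`, the 95% threshold, and the three-sample window are arbitrary assumptions for illustration; requiring several consecutive high readings avoids alerting on a single short spike.

```python
def sustained_high_cpu(samples, threshold=95.0, min_consecutive=3):
    """Return True if CPU usage stays at or above threshold for min_consecutive samples in a row."""
    streak = 0
    for value in samples:
        # Extend the streak on a high reading, reset it otherwise.
        streak = streak + 1 if value >= threshold else 0
        if streak >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    readings = [40.0, 99.0, 100.0, 100.0, 100.0]
    print(sustained_high_cpu(readings))
```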

Conclusion

Black-box monitoring is essential to maintain a good network. Without black-box monitoring, you won't have much information on your system's capacity, current usage, and scalability.
