Continuous Monitoring #2 - What is under the hood?
This series aims to bring the technicalities of monitoring closer to people who have no deeper knowledge of how their business application works, or of how the Infrastructure, Application, and Business parts connect in terms of monitoring.
TL;DR - "Should I worry if CPU is 100% on my Orders app?"
Let's see how it usually looks when we think about our application (a store, a warehouse management system, or a website) in terms of how it's built and where it is hosted.
Three scenarios are most common: on-premise (the old way), in the cloud as IaaS (Infrastructure as a Service, one of the most common forms of cloud adoption, usually done as a lift & shift from an existing server data center), and finally full cloud adoption with a PaaS/SaaS approach (Platform or Software as a Service).
In each case, monitoring is a challenge. But no worries, it gets simpler by the end.
On-premise stack example
In this scenario, everything from our data center to the very end of website access is in our hands and under the control of our admins. This also means that admins (facility admins, hardware & system admins, database admins, and application admins) need to monitor a lot of layers.
From top to bottom:
- Our users & customers layer - the business users of our applications & websites
- Network layer - which connects users to applications, and applications with databases and other systems
- Application layers (where our applications, databases, and websites reside)
- Server layer - like database servers, web servers, or application hosting servers
- Operating System layer - where our servers are installed, system access and supporting processes
- Hardware layer - the physical equipment (aka computers) used to host every piece of software we have, plus the network devices that interconnect it
- Facility layer - simply the roof over our hardware, secured doors, power connections, air conditioning, etc.
I know it's a bit simplified, but for a discussion about "how?" to monitor, it should be enough. Beyond monitoring, this picture also helps to understand what is connected with what, and what the impact is when errors start to occur between these layers.
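To make the "what is the impact" question concrete, here is a minimal sketch that models the seven on-prem layers as an ordered list and returns every layer a failure reaches on its way up to the users. It assumes a strict chain where each layer depends on all layers below it; real dependencies (for example, the network sitting between applications and databases) are messier, and the layer names are just labels taken from the list above.

```python
# Layers of the on-premise stack, ordered from the top (business users) to the bottom (facility).
# Treating the stack as a strict chain, where each layer depends on every layer below it,
# is a simplifying assumption for illustration.
ON_PREM_LAYERS = [
    "Users & customers",
    "Network",
    "Application",
    "Server",
    "Operating System",
    "Hardware",
    "Facility",
]


def affected_layers(failing_layer, layers=ON_PREM_LAYERS):
    """Return every layer hit by a failure, starting at the failing layer and ending at the users."""
    index = layers.index(failing_layer)  # raises ValueError for an unknown layer name
    # Layers above the failing one have a lower index; take them plus the failing layer itself,
    # then reverse so the list reads in the order the failure travels upwards.
    return list(reversed(layers[: index + 1]))
```

For example, `affected_layers("Hardware")` returns `["Hardware", "Operating System", "Server", "Application", "Network", "Users & customers"]`: a broken disk ends up as a business problem.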
On-prem stack monitoring applied
Notice that we have to cover 7 layers with monitoring!
As the on-prem scenario is on its way out, let's not go too deep here; I'll just add a few explanations.
In this case, monitoring is a chain of the following steps:
From a business user's perspective, it's a long web of internal dependencies in which something can break, and any such break immediately hits the business layer and interrupts the business flow.
To get the most insightful information about the system's condition, we need to create a system map that lists each business service on the left side, together with indicators of how the components beneath it behave.
However, a simple diagram based on the flow pictured above may do the job.
Frankly speaking, this is the worst-case scenario, so let's move on to a more familiar world.
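A system map like this can be sketched as a small data structure: business services on the "left side", each with the components beneath it and a status indicator per component. The minimal sketch below assumes a strict rollup rule (a service is only as healthy as its worst component); the service and component names are hypothetical, and real tools often use weighted or redundancy-aware rules instead.

```python
from enum import IntEnum


class Status(IntEnum):
    # Higher value means a worse condition, so max() picks the worst status.
    OK = 0
    WARNING = 1
    CRITICAL = 2


# Hypothetical service map: business service -> {component: current indicator status}.
SERVICE_MAP = {
    "Orders": {
        "web server": Status.OK,
        "application server": Status.WARNING,
        "database server": Status.CRITICAL,
    },
    "Website": {
        "web server": Status.OK,
        "network": Status.OK,
    },
}


def service_status(service, service_map=SERVICE_MAP):
    """Roll component indicators up to the business service: the worst component wins."""
    return max(service_map[service].values(), default=Status.OK)


def worst_components(service, service_map=SERVICE_MAP):
    """Names of the components currently driving the service status, for drill-down."""
    status = service_status(service, service_map)
    return sorted(name for name, s in service_map[service].items() if s == status)
```

With this map, a business user only sees `service_status("Orders")` as `CRITICAL`, while a technical user can drill down with `worst_components("Orders")` to `["database server"]`.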
Cloud IaaS stack example
This scenario is a bit simpler: we don't have to worry about physical machines and buildings at all, as we pay the cloud provider for that service (let's take Azure as an example).
IaaS stack monitoring applied
As in the previous scenario, the layers are similar:
Visually, we have only 6 layers to cover, each of which still needs heavy coverage:
- Our users & customers layer - the business users of our applications & websites
- Network layer - which connects users to applications, and applications with databases and other systems - as we're already in the cloud, this is a service rather than physical equipment
- Application layers (where our applications, databases, and websites reside)
- Server layer - like database servers, web servers, or application hosting servers
- Operating System layer - where our servers are installed, system access, and supporting processes - we still need to cover this, even though it is part of our provisioned virtual machines
- Virtual Machines layer - bought as an IaaS from the cloud
So, the interconnections are still a bit complicated, but at least there is less to focus on:
The good thing about this approach is that whenever business users notice many application errors caused by long-running transactions, the cause can be traced down the line to one of the provisioned Virtual Machines being too weak, so rehosting is possible (depending on which service is responsible: the database VM, the application server VM, the web server VM, etc.).
However, in this scenario we also need to monitor the current costs of the VMs (OPEX - Operational Expenditures), which is another dimension of monitoring.
Let's jump to a more appealing (meaning a lot simpler) scenario.
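Spotting the "too weak" VM usually means looking for sustained, not momentary, high CPU. The minimal sketch below flags VMs whose CPU stayed at or above a threshold for several consecutive samples. The 90% threshold, the 3-sample window, the VM names, and the hard-coded samples are all assumptions for illustration; in practice the samples would come from the cloud provider's monitoring service.

```python
def sustained_high_cpu(samples, threshold=90.0, min_consecutive=3):
    """Return the names of VMs whose CPU stayed at or above `threshold` percent
    for at least `min_consecutive` samples in a row."""
    flagged = []
    for vm_name, values in samples.items():
        run = 0  # length of the current streak of high samples
        for value in values:
            run = run + 1 if value >= threshold else 0
            if run >= min_consecutive:
                flagged.append(vm_name)
                break  # one sustained streak is enough to flag this VM
    return sorted(flagged)


# Hypothetical CPU samples in percent, one per 5-minute interval.
cpu_samples = {
    "db-vm": [72.0, 95.5, 97.1, 99.0, 98.4],
    "app-vm": [40.0, 95.0, 50.0, 96.0, 45.0],  # short spikes only
    "web-vm": [20.0, 25.0, 30.0, 22.0, 28.0],
}
```

Here `sustained_high_cpu(cpu_samples)` returns `["db-vm"]`: the application server only spikes, while the database VM is the rehosting candidate. Requiring a streak rather than a single sample avoids alerting on every short burst.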
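Cost as a monitoring dimension can be as simple as projecting month-to-date spend to the end of the month and comparing it with a budget. This minimal sketch assumes a linear daily spend rate and uses hypothetical per-VM figures; real billing data and budget alerts would come from the cloud provider's cost management tooling.

```python
def projected_month_end_cost(spend_to_date, days_elapsed, days_in_month):
    """Linearly extrapolate month-to-date spend to the end of the month."""
    if not 0 < days_elapsed <= days_in_month:
        raise ValueError("days_elapsed must be between 1 and days_in_month")
    daily_rate = spend_to_date / days_elapsed
    return daily_rate * days_in_month


def over_budget(vm_costs_to_date, days_elapsed, days_in_month, monthly_budget):
    """Sum per-VM spend and report (exceeds_budget, projected_cost)."""
    total = sum(vm_costs_to_date.values())
    projected = projected_month_end_cost(total, days_elapsed, days_in_month)
    return projected > monthly_budget, projected


# Hypothetical month-to-date costs per VM after 10 days of a 30-day month.
vm_costs = {"db-vm": 700.0, "app-vm": 350.0, "web-vm": 150.0}
```

With these numbers, `over_budget(vm_costs, 10, 30, 3000.0)` returns `(True, 3600.0)`: 1200 spent in 10 days projects to 3600 for the month, above a 3000 budget.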
PaaS/SaaS Cloud stack example
PaaS (Platform as a Service), or even SaaS (Software as a Service), takes the simplification of monitoring to a completely different level. In this scenario, for example, we can use platforms (like Azure Managed SQL Database), hosting services (such as an App Service plan or a web-hosting service), and FTP-less file access based on Azure Blob Storage.
PaaS/SaaS stack monitoring applied
From a monitoring perspective, there are still some things we need to be able to see:
It looks like we only need to focus on 4 layers. But the approach changes a bit: we rely on managed services (PaaS), where cost control moves into a more prominent monitoring position, as overspending may occur and needs to be weighed against our expected performance.
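One way to weigh spending against expected performance is to track unit cost (for example, cost per 1,000 requests) next to a latency target. The minimal sketch below classifies a billing window into four outcomes. The field names, the cost-per-1k metric, the p95 latency target, and the thresholds in the usage note are all assumptions chosen for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class UsageWindow:
    """Cost and performance of one managed service over a billing window (hypothetical fields)."""
    service: str
    cost: float            # spend in the window, in the billing currency
    requests: int          # requests served in the window
    p95_latency_ms: float  # observed 95th-percentile response time


def review(window, max_cost_per_1k_requests, latency_target_ms):
    """Classify a window by comparing unit cost with the expected performance."""
    if window.requests <= 0:
        raise ValueError("no requests in window")
    cost_per_1k = window.cost / window.requests * 1000
    too_expensive = cost_per_1k > max_cost_per_1k_requests
    too_slow = window.p95_latency_ms > latency_target_ms
    if too_expensive and too_slow:
        return "investigate"        # paying more and still slow: scaling is not the fix
    if too_expensive:
        return "overspending"       # performance target met, so scaling may be too generous
    if too_slow:
        return "under-provisioned"  # within budget, room to scale up
    return "ok"
```

For instance, `review(UsageWindow("orders-api", 50.0, 100_000, 120.0), 0.4, 300.0)` returns `"overspending"`: the service is fast enough, but 0.5 per 1,000 requests exceeds the 0.4 target, a typical sign that autoscaling is more generous than needed.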
From top to bottom:
- Our users & customers layer - the business users of our applications & websites
- Network layer - which connects users to applications, and applications with databases and other systems - as we're already in the cloud, this is a service rather than physical equipment
- Application & Databases layer - a set of application hosting services such as Applications, APIs, Functions, Logic Apps, or web hosting, and of course a wide range of managed databases such as Azure Managed SQL or CosmosDB (services that let you concentrate fully on database content & logic)
- Service layer - although we are on managed services (PaaS, SaaS, and FaaS (Function as a Service), basically serverless), on this layer we can still observe the overall condition, utilization, and, most importantly, costs.
In detail:
I've marked in white the metrics that don't need to be monitored, as they play a smaller role in managed services, which can autoscale to the desired performance (but that comes at a price, so remember it).
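The idea of dropping some metrics on managed services can be expressed as a small catalogue that maps each metric to the hosting models where it matters, minus the ones autoscaling largely handles on PaaS. This is an illustrative selection under assumptions, not a reproduction of the table pictured above; the metric names are generic labels, not any product's metric identifiers.

```python
# Illustrative catalogue: metric -> hosting models where it is observed.
METRICS = {
    "availability": {"on-prem", "iaas", "paas"},
    "response time": {"on-prem", "iaas", "paas"},
    "error rate": {"on-prem", "iaas", "paas"},
    "cpu utilization": {"on-prem", "iaas", "paas"},
    "memory utilization": {"on-prem", "iaas", "paas"},
    "disk space": {"on-prem", "iaas"},
    "power & cooling": {"on-prem"},
    "cost": {"iaas", "paas"},
}

# Metrics still visible on PaaS, but mostly absorbed by autoscaling (at a price).
AUTOSCALED_ON_PAAS = {"cpu utilization", "memory utilization"}


def metrics_to_alert_on(model):
    """Metrics worth alerting on for a hosting model ("on-prem", "iaas", or "paas")."""
    selected = {metric for metric, models in METRICS.items() if model in models}
    if model == "paas":
        selected -= AUTOSCALED_ON_PAAS  # the "marked white" metrics
    return sorted(selected)
```

Calling `metrics_to_alert_on("paas")` gives `["availability", "cost", "error rate", "response time"]`, which mirrors the shift described above: fewer infrastructure metrics, with cost moving to the foreground.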
Summary
The key to understanding monitoring is a service map, available at the business monitoring level and, for technical users, at the application level. Behind it sit the interconnections between all the components.
I hope that after this article the answer to the question asked at the beginning is a bit clearer: "Yes, in case of 100% CPU on your database server, you need to worry, my dear business user". And in the best-case scenario, this information should be hidden behind a simple "Service Availability = 1%".
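A figure like "Service Availability = 1%" is typically derived from periodic health checks: the share of probes that got a timely answer. The minimal sketch below assumes probe response times are already collected (with `None` for no answer) and uses a hypothetical 2-second timeout. A database pegged at 100% CPU shows up here as timed-out probes rather than as a raw CPU number.

```python
def to_check_results(response_times_ms, timeout_ms=2000):
    """Turn probe response times into pass/fail results; None means the probe got no answer."""
    return [t is not None and t <= timeout_ms for t in response_times_ms]


def availability_percent(check_results):
    """Share of successful health checks in a period, as a percentage."""
    if not check_results:
        raise ValueError("no health checks recorded")
    return 100.0 * sum(check_results) / len(check_results)


# Hypothetical hour of probes against the Orders app while its database is overloaded:
# one probe answered in time, the other 99 got no answer.
probes = [180.0] + [None] * 99
```

Here `availability_percent(to_check_results(probes))` returns `1.0`, which is exactly the kind of single number a business user can act on without reading CPU charts.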