Monitoring: The four Golden Signals
Shailender Singh
Advisor - Software Architecture @ Fiserv | AI/ML and Cybersecurity Enthusiast | SRE and Security Author | Ex - Microsoft | Broadcom | Symantec | McKinsey | Hewlett Packard
Monitoring for a cloud-based SaaS solution or a distributed system should focus on the four golden signals (latency, traffic, errors, and saturation), which expose most of the system's internal behavior. As Google puts it: "If you measure all four golden signals and page a human when one signal is problematic, your service will be at least decently covered by monitoring."
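To make the "measure all four and page a human" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Request` record, the `golden_signals` and `signals_to_page` helpers, the threshold values, and the choice of CPU utilization as the saturation measure are hypothetical, not taken from Google or from any monitoring product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    latency_ms: float
    status: int  # HTTP status code


def golden_signals(requests: List[Request], window_s: float,
                   cpu_utilization: float) -> Dict[str, float]:
    """Summarize one observation window into the four golden signals."""
    total = len(requests)
    server_errors = sum(1 for r in requests if r.status >= 500)
    latencies = sorted(r.latency_ms for r in requests)
    if total:
        # Nearest-rank 99th percentile: rank = ceil(0.99 * total), 1-based.
        rank = (total * 99 + 99) // 100
        p99 = latencies[rank - 1]
    else:
        p99 = 0.0
    return {
        "latency_p99_ms": p99,                                  # latency
        "traffic_rps": total / window_s,                        # traffic
        "error_rate": server_errors / total if total else 0.0,  # errors
        "saturation": cpu_utilization,  # assumed proxy: CPU fraction, 0..1
    }


def signals_to_page(signals: Dict[str, float],
                    thresholds: Dict[str, float]) -> List[str]:
    """Return the names of the signals that exceed their threshold."""
    return [name for name, value in signals.items()
            if value > thresholds[name]]


# Illustrative thresholds only; real values depend on your service.
THRESHOLDS = {
    "latency_p99_ms": 500.0,
    "traffic_rps": 1000.0,
    "error_rate": 0.01,
    "saturation": 0.8,
}

# A 10-second window: 98 fast successes and 2 slow server errors.
sample = [Request(100.0, 200)] * 98 + [Request(900.0, 500)] * 2
signals = golden_signals(sample, window_s=10.0, cpu_utilization=0.5)
breaches = signals_to_page(signals, THRESHOLDS)
print(signals)
print("page on:", breaches)
```

In a real setup these numbers would come from your metrics pipeline rather than an in-memory list, and paging would go through your alerting tool; the point is only that each signal reduces to one number per window that can be compared against a threshold.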
If you understand these four signals well, you will be in a position to correlate them with other metrics. For example, suppose you are monitoring CPU or memory and you dig into the root cause (RCA) of why utilization is climbing. You will typically find either that more traffic is hitting your workload or that the error rate in your app has increased, and that the resulting application behavior is driving latency up.
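One rough way to start that kind of correlation is to check which signal moves most closely with CPU over the same time window. The sketch below uses a hand-written Pearson correlation; the helper names and the sample series are made up for illustration, and a strong correlation is only a hint about where to look, not proof of a root cause.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_signals_by_cpu_correlation(
        cpu: List[float],
        signals: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Order signals by how strongly they correlate with CPU, strongest first."""
    scores = {name: pearson(cpu, series) for name, series in signals.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-minute samples taken during a CPU spike.
cpu_percent = [30.0, 35.0, 50.0, 70.0, 90.0]
observed = {
    "traffic_rps": [100.0, 120.0, 200.0, 300.0, 400.0],
    "error_rate": [0.01, 0.01, 0.01, 0.01, 0.01],
    "latency_p99_ms": [120.0, 110.0, 125.0, 115.0, 130.0],
}

ranked = rank_signals_by_cpu_correlation(cpu_percent, observed)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

With this made-up data, traffic tracks CPU most closely, which would point the investigation toward load rather than toward errors.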
In summary, I am not arguing against traditional key metrics such as CPU, memory, disk I/O, and network. From a broader perspective, though, the four golden signals directly explain many threshold breaches, which reduces the time you spend on incident management and problem management.