Monitoring and Alerting
In previous sections, I used an analogy to the human body, comparing Kafka to the heart and the Portal to the brain. We know these usually function well, but not always. Therefore, we need to constantly monitor them, check their performance and health, detect any changes, and respond to them in time. I would compare this functionality to the nervous system, which monitors the entire body and responds to changes to maintain balance and protect against harm.
Prometheus & Grafana: Metrics Collection, Monitoring and Visualisation
Prometheus is the cornerstone of our monitoring system. This open-source application is widely used across organisations for monitoring and alerting purposes. Because of its popularity, Prometheus is stable and well integrated with various platforms, and there is strong community support for any issues that may arise.
We’re currently scraping more than a hundred metrics from most of the components of our platform, which run either on virtual machines or in Kubernetes, and we collect many different types of metrics.
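To make this more concrete, below is a minimal sketch of how a custom platform component could expose metrics for Prometheus to scrape, using the prometheus_client library for Python. The component name, port and metric names are hypothetical and used purely for illustration.

    # Minimal sketch: a hypothetical component exposing metrics over HTTP
    # so that Prometheus can scrape them.
    import random
    import time

    from prometheus_client import Counter, Gauge, start_http_server

    # Hypothetical metrics; real components expose their own application
    # and infrastructure metrics.
    REQUESTS_TOTAL = Counter(
        "edh_portal_requests_total",
        "Total number of requests handled by the component",
    )
    ACTIVE_SESSIONS = Gauge(
        "edh_portal_active_sessions",
        "Number of currently active user sessions",
    )

    if __name__ == "__main__":
        # Prometheus scrapes http://<host>:8000/metrics at its configured interval.
        start_http_server(8000)
        while True:
            REQUESTS_TOTAL.inc()
            ACTIVE_SESSIONS.set(random.randint(0, 50))  # dummy value for the sketch
            time.sleep(5)

Prometheus then only needs a scrape target pointing at that endpoint, and the metric values become available for rules and dashboards.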
In Prometheus, rules define the conditions under which alerts are triggered and subsequently sent to Microsoft Teams groups via Alertmanager.
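To illustrate that flow (this is not our actual configuration, where Alertmanager routes alerts to Teams directly), the sketch below polls the Prometheus HTTP API for firing alerts and posts them to a Microsoft Teams incoming webhook; both URLs are placeholders.

    # Illustration only: fetch firing alerts from the Prometheus HTTP API
    # and forward them to a Microsoft Teams incoming webhook.
    import requests

    PROMETHEUS_URL = "http://prometheus.example.internal:9090"    # placeholder
    TEAMS_WEBHOOK_URL = "https://example.webhook.office.com/..."  # placeholder

    def fetch_firing_alerts():
        resp = requests.get(f"{PROMETHEUS_URL}/api/v1/alerts", timeout=10)
        resp.raise_for_status()
        alerts = resp.json()["data"]["alerts"]
        return [a for a in alerts if a.get("state") == "firing"]

    def notify_teams(alert):
        # Teams incoming webhooks accept a simple JSON payload with a "text" field.
        name = alert["labels"].get("alertname", "unknown")
        summary = alert["annotations"].get("summary", "")
        payload = {"text": f"Alert firing: {name} - {summary}"}
        requests.post(TEAMS_WEBHOOK_URL, json=payload, timeout=10)

    if __name__ == "__main__":
        for alert in fetch_firing_alerts():
            notify_teams(alert)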
It’s difficult to analyse metrics directly in Prometheus, as they are raw numerical data, so we visualise them in Grafana, where users have access to the respective dashboards and charts. Besides its dashboarding capabilities, Grafana can also send alerts, and in our case we use it for that purpose in Monitor Group.
Kibana & Elasticsearch
While Grafana and Prometheus are excellent for detecting anomalies in the platform, they cannot tell us what exactly is happening inside a particular application. This is where Kibana and Elasticsearch come to the rescue, giving us centralised access to logs instead of having to go to each individual server and analyse files one by one.
Everything starts with Filebeat, a lightweight log collector that ships logs to Elasticsearch. This search and analytics engine stores the log data and is optimised for full-text search, allowing quick retrieval of information and real-time analytics. Finally, users can analyse this data via Kibana dashboards, charts and graphs.
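When a dashboard is not enough, the same data can also be queried programmatically. The sketch below assumes the elasticsearch Python client (8.x) and the default Filebeat index pattern, and pulls the most recent error entries from the last 15 minutes; the host name is a placeholder.

    # Sketch: querying Filebeat-shipped logs stored in Elasticsearch,
    # assuming the elasticsearch Python client 8.x and the default
    # "filebeat-*" index pattern.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://elasticsearch.example.internal:9200")  # placeholder

    response = es.search(
        index="filebeat-*",
        query={
            "bool": {
                "must": [
                    {"match": {"message": "ERROR"}},
                    {"range": {"@timestamp": {"gte": "now-15m"}}},
                ]
            }
        },
        sort=[{"@timestamp": {"order": "desc"}}],
        size=20,
    )

    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        print(source.get("@timestamp"), source.get("message"))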
Audit Functionality
As mentioned earlier, the platform hosts thousands of Kafka topics and associated objects (e.g. certificates or ACLs). We need to make sure that topics are actively used so that resources are optimally utilised. Therefore, we audit topics against a set of conditions.
A daily process checks these conditions and sends an email to the respective topic owners with a request to take a specific action.
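As a rough sketch of how such a daily audit job could look (an illustration of the idea, not our production code), the Python snippet below lists topics with the Kafka AdminClient, flags the ones that fail a hypothetical activity check, and emails their owners; the activity check, owner lookup, hosts and addresses are all placeholders.

    # Sketch of a daily topic-audit job: list topics, flag the ones that fail
    # an activity check, and email the owners.
    import smtplib
    from email.message import EmailMessage

    from confluent_kafka.admin import AdminClient

    BOOTSTRAP_SERVERS = "kafka.example.internal:9092"  # placeholder
    SMTP_HOST = "smtp.example.internal"                # placeholder

    def is_topic_active(topic_name):
        # Placeholder: a real audit could inspect consumer group offsets,
        # broker metrics, or the timestamp of the last produced message.
        return False

    def owner_email_for(topic_name):
        # Placeholder: owners could be looked up in a metadata store or the Portal.
        return "owner@example.com"

    def audit_topics():
        admin = AdminClient({"bootstrap.servers": BOOTSTRAP_SERVERS})
        metadata = admin.list_topics(timeout=10)
        inactive = [name for name in metadata.topics
                    if not name.startswith("__")       # skip internal topics
                    and not is_topic_active(name)]

        with smtplib.SMTP(SMTP_HOST) as smtp:
            for topic in inactive:
                msg = EmailMessage()
                msg["Subject"] = f"Action required: Kafka topic '{topic}' looks unused"
                msg["From"] = "edh-audit@example.com"  # placeholder
                msg["To"] = owner_email_for(topic)
                msg.set_content(
                    f"The topic '{topic}' appears to be inactive. "
                    "Please confirm whether it is still needed or request its removal."
                )
                smtp.send_message(msg)

    if __name__ == "__main__":
        audit_topics()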
Summary
In this article, I aimed to give you at least a fraction of the information about the EDH platform and its components. Detailing each aspect thoroughly would probably require an entire book chapter!
However, I hope that after reading this, you have an idea of the purpose of this platform, a high-level overview of its main components, and the technology behind it.
Obviously, we haven’t stopped there, and we continue to introduce new enhancements to deliver the best possible product for Bayer engineers. Apart from the regular OS and software upgrades, and the KRaft migration mentioned earlier, we are adding more functionality to the EDH.
One worth mentioning is EDH Wire, which integrates the platform with various data sources and sinks, so users don’t have to develop their own producers and consumers.
Another tool, currently in development, is the Stream Transformer. It offers advanced transformation and integration capabilities (e.g. merging, joining, routing). Built on Apache Flink, it inherits all of its features. The jobs are deployed on a Kubernetes cluster, ensuring scalability, fault tolerance and easy maintainability.
But that’s not all: we have other ideas on our roadmap, which will be described in the next article!