Building a Dashboard with Grafana: A First Attempt

Every year during the end-of-year holidays, I try to do some reading and learn something new. This year, I wanted to learn more about Grafana and how to create effective dashboards. I had the perfect project to use as a learning scenario: I needed to set up a Raspberry Pi with OpenVPN so I could have a more secure and reliable VPN solution to connect back to the US from Japan. In this article, I don't go over how to set up OpenVPN on a Raspberry Pi; there is plenty of documentation for that. Rather, I will share how I designed the Grafana dashboard and implemented a custom OpenTelemetry Collector to gather the necessary metrics.

The code is available here: drewby/otel-openvpn (github.com)

Grafana Dashboards

Grafana is an impressive tool for visualization and monitoring that many people in the industry use to track everything from system infrastructure to industrial systems. If you are new to Grafana or just want some inspiration, check out these amazing dashboards featured for 2023: Grafana dashboards in 2023 - Outstanding examples of the year

Many kinds of metrics can be measured in the Observability field from different sources, such as operating systems, cloud services, edge devices, and more. Thanks to the excellent tools in the OpenTelemetry ecosystem, it is easy to gather this data, enrich it with helpful attributes for querying, and store it in a time-series database. What is more challenging, and often a problem for people who run these systems, is how to present the data in ways that are useful and meaningful.

Grafana provides a lot of documentation, and one particularly useful article is called Grafana dashboard best practices. This article covers several approaches for how to think about observability, from the USE (Utilization, Saturation, Errors) method to the RED (Rate, Errors, Duration) method, and explains the Four Golden Signals (Latency, Traffic, Errors, and Saturation). It was the later sections, however, that I found most informative. In particular, this sentence made me think:

“A dashboard should tell a story or answer a question”

So, my first step was obvious. Before I created a dashboard, and before I did the work of gathering the required metrics, I needed to think through the question I was trying to answer.

Designing the Raspberry Pi OpenVPN Dashboard

I followed an investigative approach to create the Raspberry Pi OpenVPN Dashboard, and I think this method can be useful for others in comparable situations. The main question I wanted to explore was how well the VPN works and what factors might affect its performance, such as network, CPU, memory or disk operations. Here's an explanation of the steps involved:

  1. Defining the Objective: The primary step was to set a clear goal for the dashboard. I aimed to gain insights into the performance and reliability of the VPN solution and the Raspberry Pi system. This goal shaped every other decision in the dashboard creation process.
  2. Identifying Key Areas of Interest: With a clear objective in mind, the next task was to pinpoint the crucial aspects to monitor. This led to the identification of five critical areas: OpenVPN Connections, Network Performance, CPU and Processes, Memory Usage, and Disk Performance. These areas were chosen for their direct impact on the system's performance and health.
  3. Drafting the Initial Concept: Before jumping into digital creation, I started by writing down the potential categories and metrics on paper. This helped in visualizing the dashboard layout and the flow of information. An example of this was my curiosity about the Raspberry Pi's temperature in its fanless case. While it eventually turned out to be a non-issue, the process of questioning and exploring these categories was enlightening and shaped the dashboard's development.
  4. Iterative Development: Understanding that perfection is a process, I began with a basic version of the dashboard, ready to be refined over time. This iterative approach allowed for adjustments and enhancements based on real data availability and usage experience. It was a cycle of developing, observing, learning, and improving.

This method of setting a specific goal, finding key areas, creating a basic draft, and improving the dashboard gradually is something I’ll use more often in the future.

Sourcing the Right Metrics

I started by identifying important OpenVPN and network metrics for the main visualizations. This made sense since the main purpose of the setup was to keep a good VPN connection. I needed to know how well the VPN and the network worked, to make sure they were reliable and efficient. I wanted to measure the number of VPN connections and the bandwidth used by each one.

Then I wanted to see how other factors might affect the quality of the VPN connections. This meant getting data on the Ethernet interface, CPU usage, processes created, memory utilization, and possibly disk I/O. I roughly ordered these by how much I thought they could influence performance. I was also curious about how the device's temperature could change and affect performance, so CPU temperature was an important metric.

This was an iterative process, so the initial list of metrics changed a bit once I found out what was possible and what was not.

Collecting the Metrics

I’m a big fan of OpenTelemetry and the OpenTelemetry Collector. The collector provides a vast set of extensions to receive, process, and export metrics. Metrics can be gathered from a variety of sources, processed to add useful attributes, and finally exported to numerous backends. For me, the backend would be Grafana Cloud, but the metrics can easily be redirected to other backend analytics tools.

Many of the metrics I wanted to collect are readily available via the Host Metrics Receiver in the OpenTelemetry Contrib project. This receiver collects CPU, process, memory, and disk metrics from Windows, macOS, and Linux systems.
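To give a sense of how this fits together, here is a minimal collector configuration sketch that wires the Host Metrics Receiver through a batch processor and out via OTLP. The endpoint is a placeholder environment variable, not the actual configuration from my repository, and the scraper list is trimmed to the metrics discussed here:

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:
      network:
      processes:

processors:
  batch:

exporters:
  otlphttp:
    # Placeholder: point this at your backend, e.g. a Grafana Cloud OTLP gateway.
    endpoint: ${env:OTLP_ENDPOINT}

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [otlphttp]
```

The actual build in the repository adds the two custom components described below to the `receivers` list and pipeline.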

I also wanted to collect metrics on OpenVPN. Unfortunately, there is no readily available component in OpenTelemetry for this. Instead, I wrote a custom component to collect OpenVPN connections, and the transmitted and received bytes from each connection.
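OpenVPN writes per-client statistics to a status file, which is one straightforward place to source these metrics. The sketch below is a simplified Go illustration of the parsing idea, not the exact code from the repository: it reads the version-1 status format, where each row between the "Common Name" header and "ROUTING TABLE" describes one connected client with its received and sent byte counts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Client holds per-connection metrics parsed from the OpenVPN status file.
type Client struct {
	CommonName    string
	BytesReceived int64
	BytesSent     int64
}

// parseStatus extracts client rows from a version-1 OpenVPN status file.
// Rows between the "Common Name,..." header and "ROUTING TABLE" each
// describe one connected client.
func parseStatus(status string) []Client {
	var clients []Client
	inClientList := false
	for _, line := range strings.Split(status, "\n") {
		switch {
		case strings.HasPrefix(line, "Common Name,"):
			inClientList = true
		case strings.HasPrefix(line, "ROUTING TABLE"):
			inClientList = false
		case inClientList:
			fields := strings.Split(line, ",")
			if len(fields) < 5 {
				continue // skip malformed or trailing lines
			}
			rx, _ := strconv.ParseInt(fields[2], 10, 64)
			tx, _ := strconv.ParseInt(fields[3], 10, 64)
			clients = append(clients, Client{
				CommonName:    fields[0],
				BytesReceived: rx,
				BytesSent:     tx,
			})
		}
	}
	return clients
}

func main() {
	sample := `OpenVPN CLIENT LIST
Updated,Thu Jan 4 08:00:00 2024
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
phone,203.0.113.5:54321,123456,789012,Thu Jan 4 07:30:00 2024
ROUTING TABLE
END`
	for _, c := range parseStatus(sample) {
		fmt.Printf("%s rx=%d tx=%d\n", c.CommonName, c.BytesReceived, c.BytesSent)
	}
}
```

In the real component, values like these are emitted as OpenTelemetry metric data points on each scrape interval; the status file location and format version depend on your OpenVPN server configuration.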

Finally, the CPU temperature on the Raspberry Pi is exposed through the Linux sysfs thermal interface. So I also wrote a quick custom component to read this information.
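The core of reading the temperature is simple: on Raspberry Pi OS (and most Linux systems), the SoC temperature is available as millidegrees Celsius in a sysfs file. This is a standalone sketch of the idea rather than the component's actual code; the thermal zone index can differ by platform:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// toCelsius converts a raw sysfs reading (millidegrees Celsius,
// e.g. "48692\n") to degrees Celsius.
func toCelsius(raw string) (float64, error) {
	milli, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(milli) / 1000.0, nil
}

func main() {
	// thermal_zone0 is the SoC sensor on a Raspberry Pi; other hardware
	// may expose additional zones.
	raw, err := os.ReadFile("/sys/class/thermal/thermal_zone0/temp")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read temperature:", err)
		return
	}
	c, err := toCelsius(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse temperature:", err)
		return
	}
	fmt.Printf("CPU temperature: %.1f°C\n", c)
}
```

The custom component wraps this read in a scraper so the value becomes a gauge metric on each collection interval.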

The custom OpenTelemetry collector build file and the source for the two components are available here: drewby/otel-openvpn (github.com)

Creating the Dashboard

The next step was to create my Grafana dashboard. The dashboard source file is available here for import into your own instance of Grafana.

Creating the dashboard was certainly an iterative process, but it was led by my initial notes on the question I wanted to answer and the categories of visualizations I would need. This process helped me avoid dumping random metrics into random charts, which would have produced a noisy, useless dashboard.

This is the dashboard I came up with:

Grafana Dashboard Example

This view of the Grafana dashboard captures a specific period of network and system activity. Around 8:00-9:00 AM, a single video stream was watched remotely, pausing a few times. The brief spike in dropped network packets just before 8:30 AM corresponds to a temporary connection loss from the iPhone, not the Raspberry Pi. At 9:00 AM, the network load increased due to three simultaneous video streams initiated from different devices. Finally, at 9:30 AM, a connection to a remote DevContainer on the device and the initiation of a build process are visible, marked by a noticeable increase in CPU, memory, and disk usage, showcasing the system's responsiveness to varying computational demands.

As mentioned earlier, a well-crafted dashboard narrates a coherent story, guiding the user through a logical progression of metrics for a thorough understanding of the system's state.

  1. Beginning with the 'OpenVPN Connections', users can immediately grasp the number of active connections and which ones are consuming the most bandwidth.
  2. This sets the stage for digging into 'Network Performance', where the impact of these connections on network health — such as errors and packet drops — becomes apparent.
  3. From here, the 'CPU and Processes' category offers insights into how these network activities influence the server’s processing load.
  4. This leads logically into 'Memory Usage', highlighting the effects on system memory, crucial for overall performance.
  5. Finally, 'Disk Performance' rounds off the story, revealing the downstream impact on storage systems, completing the picture of how network activities and system resources interact and influence each other.

Conclusion

This Grafana dashboard project for the Raspberry Pi OpenVPN setup shows how useful data visualization can be for network and system monitoring. The process from idea to implementation highlights the need for clear goals, incremental design, and the flexibility of open-source tools like OpenTelemetry. I still have a lot to learn about Grafana and creating dashboards, but this experience helps to establish a method that begins with defining the questions first and then building a dashboard to answer them.

If you enjoyed this article, please consider signing up for the newsletter and sharing it with your LinkedIn network. Your feedback and insights are very valuable; they help me learn and improve. So, please don't hesitate to leave comments with your suggestions or tips. Let's keep learning and growing together in the constantly changing world of data visualization and system monitoring.
