Building a Dashboard with Grafana: A First Attempt
Drew Robbins
Engineering Leader | Driving Innovation and Observability in Generative AI Applications
Every year during the end-of-year holidays, I try to do some reading and learn something new. This year, I wanted to learn more about Grafana and how to create effective dashboards. I had the perfect project to use as a learning scenario: I needed to set up a Raspberry Pi with OpenVPN so I could have a more secure and reliable VPN solution to connect back to the US from Japan. In this article, I don’t go over how to set up OpenVPN on a Raspberry Pi; there is plenty of documentation for that. Rather, I will share how I designed the Grafana dashboard and implemented a custom OpenTelemetry Collector to gather the necessary metrics.
There is code available here: drewby/otel-openvpn (github.com)
Grafana Dashboards
Grafana is an impressive tool for visualization and monitoring that many people in the industry use to track everything from system infrastructure to industrial systems. If you are new to Grafana or just want some inspiration, check out these amazing dashboards featured for 2023: Grafana dashboards in 2023 - Outstanding examples of the year
Many kinds of metrics can be measured in the observability field from different sources, such as operating systems, cloud services, edge devices, and more. Thanks to the excellent tools in the OpenTelemetry ecosystem, it is easy to gather this data, enrich it with helpful attributes for querying, and keep it in a time-series database. What is more challenging, and often a problem for people who run these systems, is how to present the data in ways that are useful and meaningful.
Grafana provides a lot of documentation, and one article I found useful is called Grafana dashboard best practices. It covers several ways to think about observability, from the USE (Utilization, Saturation, Errors) method to the RED (Rate, Errors, Duration) method, and explains the Four Golden Signals (Latency, Traffic, Errors, and Saturation). For me, though, the later sections were the most informative. In particular, this sentence made me think:
“A dashboard should tell a story or answer a question”
So, my first step was obvious. Before I created a dashboard, and before I did the work of gathering the required metrics, I needed to think through the question I was trying to answer.
Designing the Raspberry Pi OpenVPN Dashboard
I followed an investigative approach to create the Raspberry Pi OpenVPN Dashboard, and I think this method can be useful for others in comparable situations. The main question I wanted to explore was how well the VPN works and what factors might affect its performance, such as network, CPU, memory, or disk operations. The steps were to set a specific goal, identify the key areas to measure, create a basic draft, and then improve the dashboard gradually.
This is a method I’ll use more often in the future.
Sourcing the Right Metrics
I started by identifying important OpenVPN and network metrics for the main visualizations. This made sense since the main purpose of the setup was to keep a good VPN connection. I needed to know how well the VPN and the network worked, to make sure they were reliable and efficient. I wanted to measure the number of VPN connections and the bandwidth used by each one.
Then I wanted to see how other factors might affect the quality of the VPN connections. This meant getting data on the Ethernet interface, CPU usage, processes created, memory utilization, and maybe disk I/O. I roughly ordered these by how much I thought they could influence performance. I was also curious about how the device's temperature could change and affect performance, so CPU temperature was an important metric.
This was an iterative process, so the initial list of metrics changed a bit once I found out what was possible and what was not.
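Per-connection bandwidth is usually not reported directly. A source typically exposes cumulative byte counters, and the rate is derived from two readings. The following is a minimal Python sketch of that arithmetic, not code from the project. The `Sample` type and the `bytes_per_second` function are hypothetical names, and treating a decreasing counter as a reconnect is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a cumulative byte counter (hypothetical structure)."""
    timestamp: float  # seconds since some fixed reference point
    total_bytes: int  # bytes transferred since the connection started


def bytes_per_second(previous: Sample, current: Sample) -> float:
    """Average throughput between two readings of a cumulative counter."""
    elapsed = current.timestamp - previous.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = current.total_bytes - previous.total_bytes
    if delta < 0:
        # Assumption: a counter that goes backwards means the client
        # reconnected and the counter restarted from zero, so only the
        # bytes seen since the restart are counted.
        delta = current.total_bytes
    return delta / elapsed
```

In practice a time-series backend often computes this at query time from the raw counters, so the collector only needs to export the cumulative values.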
Collecting the Metrics
I’m a big fan of OpenTelemetry and the OpenTelemetry Collector. The collector provides a large set of components to receive, process, and export metrics. Metrics can be gathered from a variety of sources, processed to add the necessary attributes, and then exported to numerous backends. For me, the backend would be Grafana Cloud, but the metrics can easily be redirected to other analytics backends.
Many of the metrics I wanted to collect are readily available via the Host Metrics Receiver in the OpenTelemetry Collector Contrib project. This receiver collects CPU, process, memory, and disk metrics from Windows, macOS, and Unix systems.
I also wanted to collect metrics on OpenVPN. Unfortunately, there is no readily available component in OpenTelemetry for this. Instead, I wrote a custom component to collect OpenVPN connections, and the transmitted and received bytes from each connection.
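To make the idea concrete, here is a minimal Python sketch of extracting per-connection byte counts. It is not the component from the repository. It assumes the server writes a status file in the comma-separated "version 1" layout shown in `SAMPLE_STATUS`; that layout, the sample values, and the names `ClientStats` and `parse_status` are assumptions for illustration, so check them against the file your own server produces.

```python
from dataclasses import dataclass
from typing import List

# Assumed header of the client section in a version 1 status file.
CLIENT_HEADER = "Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since"

# Made-up sample data using documentation-only IP addresses.
SAMPLE_STATUS = """OpenVPN CLIENT LIST
Updated,Thu Jan  4 09:00:00 2024
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
phone,203.0.113.5:51234,123456,654321,Thu Jan  4 08:00:00 2024
laptop,198.51.100.7:40022,1000,2000,Thu Jan  4 08:30:00 2024
ROUTING TABLE
Virtual Address,Common Name,Real Address,Last Ref
10.8.0.2,phone,203.0.113.5:51234,Thu Jan  4 08:59:00 2024
GLOBAL STATS
Max bcast/mcast queue length,0
END
"""


@dataclass
class ClientStats:
    common_name: str
    real_address: str
    bytes_received: int
    bytes_sent: int


def parse_status(text: str) -> List[ClientStats]:
    """Return one ClientStats per row of the client list section."""
    clients: List[ClientStats] = []
    in_client_list = False
    for line in text.splitlines():
        line = line.strip()
        if line == CLIENT_HEADER:
            in_client_list = True  # client rows start on the next line
            continue
        if line == "ROUTING TABLE":
            break  # the client section is over
        if not in_client_list or not line:
            continue
        fields = line.split(",")
        if len(fields) < 4:
            continue  # skip rows that do not look like client rows
        name, address, received, sent = fields[:4]
        clients.append(ClientStats(name, address, int(received), int(sent)))
    return clients
```

Calling `len(parse_status(SAMPLE_STATUS))` gives the number of active connections, and each entry carries the cumulative received and sent bytes that a collector component could export as metrics.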
Finally, the CPU temperature for the Raspberry Pi is available via a specific mechanism on the device, so I also wrote a quick custom component to read this information.
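The article does not say which mechanism the component uses. One common option on Linux systems, including Raspberry Pi OS, is the sysfs thermal zone file, which reports the temperature as an integer in millidegrees Celsius. The Python sketch below assumes that file; the path and the function names are illustrative assumptions rather than the project's code.

```python
from pathlib import Path

# Assumed location of the CPU thermal zone; not confirmed by the article.
THERMAL_ZONE_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw: str) -> float:
    """Convert a millidegree reading such as '48312' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temperature(path: Path = THERMAL_ZONE_PATH) -> float:
    """Read the current CPU temperature in degrees Celsius from sysfs."""
    return parse_millidegrees(path.read_text())
```

Keeping the parsing separate from the file read makes the conversion easy to check on a machine that has no thermal zone file.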
The custom OpenTelemetry collector build file and the source for the two components are available here: drewby/otel-openvpn (github.com)
Creating the Dashboard
The next step was to create my Grafana dashboard. The dashboard source file is available here for import into your own instance of Grafana.
Creating the dashboard was certainly an iterative process, but it was guided by my initial notes on the question I wanted to answer and the categories of visualizations I would need. This helped me avoid dumping random metrics into random charts, which would have produced a noisy and useless dashboard.
This is the dashboard I came up with:
This view of the Grafana dashboard captures a specific period of network and system activity. Around 8:00-9:00 AM, a single video stream was watched remotely, pausing a few times. The brief spike in dropped network packets just before 8:30 AM corresponds to a temporary connection loss from the iPhone, not the Raspberry Pi. At 9:00 AM, the network load increased due to three simultaneous video streams initiated from different devices. Finally, at 9:30 AM, a connection to a remote DevContainer on the device and the initiation of a build process are visible, marked by a noticeable increase in CPU, memory, and disk usage, showcasing the system's responsiveness to varying computational demands.
As mentioned earlier, a well-crafted dashboard narrates a coherent story, guiding the user through a logical progression of metrics for a thorough understanding of the system's state.
Conclusion
This Grafana dashboard project for the Raspberry Pi OpenVPN setup shows how useful data visualization can be for network and system monitoring. The process from idea to implementation highlights the need for clear goals, incremental design, and the flexibility of open-source tools like OpenTelemetry. I still have a lot to learn about Grafana and creating dashboards, but this experience helps to establish a method that begins with defining the questions first and then building a dashboard to answer them.
If you enjoyed this article, please consider signing up for the newsletter and sharing it with your LinkedIn network. Your feedback and insights are very valuable; they help me learn and improve. So, please don't hesitate to leave comments with your suggestions or tips. Let's keep learning and growing together in the constantly changing world of data visualization and system monitoring.