Building a Dashboard with Grafana: A First Attempt

Every year during the end-of-year holidays, I try to do some reading and learn something new. This year, I wanted to learn more about Grafana and how to create effective dashboards. I had the perfect project to use as a learning scenario: I needed to set up a Raspberry Pi with OpenVPN so I could have a more secure and reliable VPN solution to connect back to the US from Japan. In this article, I don't go over how to set up OpenVPN on a Raspberry Pi; there is plenty of documentation for that. Rather, I will share how I designed the Grafana dashboard and implemented a custom OpenTelemetry Collector to gather the necessary metrics.

The code is available here: drewby/otel-openvpn (github.com)

Grafana Dashboards

Grafana is an impressive tool for visualization and monitoring that many people in the industry use to track everything from system infrastructure to industrial systems. If you are new to Grafana or just want some inspiration, check out these amazing dashboards featured for 2023: Grafana dashboards in 2023 - Outstanding examples of the year

Many kinds of metrics can be measured in the Observability field from different sources, such as operating systems, cloud services, edge devices, and more. Thanks to the excellent tools in the OpenTelemetry ecosystem, it is easy to gather this data, enrich it with helpful attributes for querying, and store it in a time-series database. What is more challenging, and often a problem for people who run these systems, is how to present the data in ways that are useful and meaningful.

Grafana provides a lot of documentation, and one particularly useful article is called Grafana dashboard best practices. This article covers several approaches for how to think about observability, from the USE (Utilization, Saturation, Errors) method to the RED (Rate, Errors, Duration) method, and explains the Four Golden Signals (Latency, Traffic, Errors, and Saturation). It was the later sections, however, that I found most informative. In particular, this sentence made me think:

“A dashboard should tell a story or answer a question”

So, my first step was obvious. Before I created a dashboard, and before I did the work of gathering the required metrics, I needed to think through the question I was trying to answer.

Designing the Raspberry Pi OpenVPN Dashboard

I followed an investigative approach to create the Raspberry Pi OpenVPN Dashboard, and I think this method can be useful for others in comparable situations. The main question I wanted to explore was how well the VPN works and what factors might affect its performance, such as network, CPU, memory or disk operations. Here's an explanation of the steps involved:

  1. Defining the Objective: The primary step was to set a clear goal for the dashboard. I aimed to gain insights into the performance and reliability of the VPN solution and the Raspberry Pi system. This goal shaped every other decision in the dashboard creation process.
  2. Identifying Key Areas of Interest: With a clear objective in mind, the next task was to pinpoint the crucial aspects to monitor. This led to the identification of five critical areas: OpenVPN Connections, Network Performance, CPU and Processes, Memory Usage, and Disk Performance. These areas were chosen for their direct impact on the system's performance and health.
  3. Drafting the Initial Concept: Before jumping into digital creation, I started by writing down the potential categories and metrics on paper. This helped in visualizing the dashboard layout and the flow of information. An example of this was my curiosity about the Raspberry Pi's temperature in its fanless case. While it eventually turned out to be a non-issue, the process of questioning and exploring these categories was enlightening and shaped the dashboard's development.
  4. Iterative Development: Understanding that perfection is a process, I began with a basic version of the dashboard, ready to be refined over time. This iterative approach allowed for adjustments and enhancements based on real data availability and usage experience. It was a cycle of developing, observing, learning, and improving.

This method of setting a specific goal, finding key areas, creating a basic draft, and improving the dashboard gradually is something I’ll use more often in the future.

Sourcing the Right Metrics

I started by identifying important OpenVPN and network metrics for the main visualizations. This made sense since the main purpose of the setup was to keep a good VPN connection. I needed to know how well the VPN and the network worked, to make sure they were reliable and efficient. I wanted to measure the number of VPN connections and the bandwidth used by each one.

Then I wanted to see how other factors might affect the quality of the VPN connections. This meant getting data on the Ethernet interface, CPU usage, processes created, memory utilization, and possibly disk I/O. I roughly ordered these by how much I thought they could influence performance. I was also curious about how the device's temperature could change and affect performance, so CPU temperature was an important metric.

This was an iterative process, so the initial list of metrics changed a bit once I found out what was possible and what was not.

Collecting the Metrics

I’m a big fan of OpenTelemetry and the OpenTelemetry Collector. The collector provides a vast set of extensions to receive, process, and export metrics. Metrics can be gathered from a variety of sources, processed to add useful attributes, and finally exported to numerous backends. For me, the backend would be Grafana Cloud, but the metrics can easily be redirected to other backend analytics tools.

Many of the metrics I wanted to collect are readily available via the Host Metrics Receiver in the OpenTelemetry Contrib project. This receiver collects CPU, process, memory, and disk metrics from Windows, macOS, and Linux systems.
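To give a sense of how this fits together, here is a minimal collector configuration sketch that wires the Host Metrics Receiver through a batch processor and out via OTLP. The endpoint is a placeholder environment variable, not the actual configuration from my repository, and the scraper list is trimmed to the metrics discussed here:

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:
      network:
      processes:

processors:
  batch:

exporters:
  otlphttp:
    # Placeholder: point this at your backend, e.g. a Grafana Cloud OTLP gateway.
    endpoint: ${env:OTLP_ENDPOINT}

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [otlphttp]
```

The actual build in the repository adds the two custom components described below to the `receivers` list and pipeline.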

I also wanted to collect metrics on OpenVPN. Unfortunately, there is no readily available component in OpenTelemetry for this. Instead, I wrote a custom component to collect OpenVPN connections, and the transmitted and received bytes from each connection.
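OpenVPN writes per-client statistics to a status file, which is one straightforward place to source these metrics. The sketch below is a simplified Go illustration of the parsing idea, not the exact code from the repository: it reads the version-1 status format, where each row between the "Common Name" header and "ROUTING TABLE" describes one connected client with its received and sent byte counts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Client holds per-connection metrics parsed from the OpenVPN status file.
type Client struct {
	CommonName    string
	BytesReceived int64
	BytesSent     int64
}

// parseStatus extracts client rows from a version-1 OpenVPN status file.
// Rows between the "Common Name,..." header and "ROUTING TABLE" each
// describe one connected client.
func parseStatus(status string) []Client {
	var clients []Client
	inClientList := false
	for _, line := range strings.Split(status, "\n") {
		switch {
		case strings.HasPrefix(line, "Common Name,"):
			inClientList = true
		case strings.HasPrefix(line, "ROUTING TABLE"):
			inClientList = false
		case inClientList:
			fields := strings.Split(line, ",")
			if len(fields) < 5 {
				continue // skip malformed or trailing lines
			}
			rx, _ := strconv.ParseInt(fields[2], 10, 64)
			tx, _ := strconv.ParseInt(fields[3], 10, 64)
			clients = append(clients, Client{
				CommonName:    fields[0],
				BytesReceived: rx,
				BytesSent:     tx,
			})
		}
	}
	return clients
}

func main() {
	sample := `OpenVPN CLIENT LIST
Updated,Thu Jan 4 08:00:00 2024
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
phone,203.0.113.5:54321,123456,789012,Thu Jan 4 07:30:00 2024
ROUTING TABLE
END`
	for _, c := range parseStatus(sample) {
		fmt.Printf("%s rx=%d tx=%d\n", c.CommonName, c.BytesReceived, c.BytesSent)
	}
}
```

In the real component, values like these are emitted as OpenTelemetry metric data points on each scrape interval; the status file location and format version depend on your OpenVPN server configuration.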

Finally, the CPU temperature on the Raspberry Pi is exposed through the Linux sysfs thermal interface. So I also wrote a quick custom component to read this information.
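The core of reading the temperature is simple: on Raspberry Pi OS (and most Linux systems), the SoC temperature is available as millidegrees Celsius in a sysfs file. This is a standalone sketch of the idea rather than the component's actual code; the thermal zone index can differ by platform:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// toCelsius converts a raw sysfs reading (millidegrees Celsius,
// e.g. "48692\n") to degrees Celsius.
func toCelsius(raw string) (float64, error) {
	milli, err := strconv.ParseInt(strings.TrimSpace(raw), 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(milli) / 1000.0, nil
}

func main() {
	// thermal_zone0 is the SoC sensor on a Raspberry Pi; other hardware
	// may expose additional zones.
	raw, err := os.ReadFile("/sys/class/thermal/thermal_zone0/temp")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read temperature:", err)
		return
	}
	c, err := toCelsius(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse temperature:", err)
		return
	}
	fmt.Printf("CPU temperature: %.1f°C\n", c)
}
```

The custom component wraps this read in a scraper so the value becomes a gauge metric on each collection interval.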

The custom OpenTelemetry collector build file and the source for the two components are available here: drewby/otel-openvpn (github.com)

Creating the Dashboard

The next step was to create my Grafana dashboard. The dashboard source file is available here for import into your own instance of Grafana.

Creating the dashboard was certainly an iterative process, but it was led by my initial notes on the question I wanted to answer and the categories of visualizations I would need. This process helped me avoid dumping random metrics into random charts, which would have produced a noisy, useless dashboard.

This is the dashboard I came up with:

Grafana Dashboard Example

This view of the Grafana dashboard captures a specific period of network and system activity. Around 8:00-9:00 AM, a single video stream was watched remotely, pausing a few times. The brief spike in dropped network packets just before 8:30 AM corresponds to a temporary connection loss from the iPhone, not the Raspberry Pi. At 9:00 AM, the network load increased due to three simultaneous video streams initiated from different devices. Finally, at 9:30 AM, a connection to a remote DevContainer on the device and the initiation of a build process are visible, marked by a noticeable increase in CPU, memory, and disk usage, showcasing the system's responsiveness to varying computational demands.

As mentioned earlier, a well-crafted dashboard narrates a coherent story, guiding the user through a logical progression of metrics for a thorough understanding of the system's state.

  1. Beginning with the 'OpenVPN Connections', users can immediately grasp the number of active connections and which ones are consuming the most bandwidth.
  2. This sets the stage for digging into 'Network Performance', where the impact of these connections on network health — such as errors and packet drops — becomes apparent.
  3. From here, the 'CPU and Processes' category offers insights into how these network activities influence the server’s processing load.
  4. This leads logically into 'Memory Usage', highlighting the effects on system memory, crucial for overall performance.
  5. Finally, 'Disk Performance' rounds off the story, revealing the downstream impact on storage systems, completing the picture of how network activities and system resources interact and influence each other.

Conclusion

This Grafana dashboard project for the Raspberry Pi OpenVPN setup shows how useful data visualization can be for network and system monitoring. The process from idea to implementation highlights the need for clear goals, incremental design, and the flexibility of open-source tools like OpenTelemetry. I still have a lot to learn about Grafana and creating dashboards, but this experience helps to establish a method that begins with defining the questions first and then building a dashboard to answer them.

If you enjoyed this article, please consider signing up for the newsletter and sharing it with your LinkedIn network. Your feedback and insights are very valuable; they help me learn and improve. So, please don't hesitate to leave comments with your suggestions or tips. Let's keep learning and growing together in the constantly changing world of data visualization and system monitoring.
