Let's Code: Building a Custom OpenTelemetry Collector
Drew Robbins
Engineering Leader | Driving Innovation and Observability in Generative AI Applications
In past articles, we explored OpenTelemetry, a powerful tool that shines a light on the internal operations of your software. We also discussed the OpenTelemetry Collector, a service you can run alongside your software to collect Logs, Metrics, and Traces.
In this article, we are digging a bit deeper. Actually, a lot deeper! We'll go beyond the default uses of the OpenTelemetry Collector, and explore the landscape of custom distributions. This is where things get exciting, as it offers the agility to shape the tool to your unique requirements and situations.
You'll learn how to craft your own custom OpenTelemetry Collector distribution, swap out components with your original code, and engineer new components to integrate into the collector.
To demonstrate how it all works, I will walk you through the creation of a custom receiver, built to gather TCP stats from your local machine and yield metrics for your monitoring backend.
To wrap it all up, I will point you to a repository hosting a complete working example inclusive of all the code snippets I'll discuss.
Understanding the OpenTelemetry Collector
The OpenTelemetry Collector is a versatile tool designed to gather, process, and export telemetry data. It's a pivotal component of the OpenTelemetry framework, acting as the intermediary that collects metrics, traces, and logs from your applications and services and then sends them to various backends for analysis and storage.
The power of the OpenTelemetry Collector lies in its modular architecture. It allows you to tailor the collector to your specific needs by assembling your own selection of components. There are two main distributions of the collector: Core and Contrib.
Core Distribution is the barebones version of the collector. It is stable, lightweight, and contains only a basic set of stable components - receivers, processors, and exporters. This version is ideal for those who wish to use a lean, well-tested, and efficient collector without any unnecessary extras.
Contrib Distribution, on the other hand, is the feature-packed variant. It includes all components from the Core, but also offers a wide range of additional receivers, processors, and exporters maintained by the community. This version is designed for those who need a more extensive set of components and don't mind trading some efficiency for added functionality.
More often than not, the Core distribution does not include all of the functionality a team requires for their telemetry needs, while the Contrib distribution contains far more, forcing the team to deploy a lot of code into their system that they will never use.
To provide a middle ground, the OpenTelemetry project offers a build tool for engineering teams to create their own curated and purposeful distribution of the collector with only the components they need.
Why Build a Custom OpenTelemetry Collector?
Custom distributions provide the ability to choose only the components you need, giving you the best of both worlds: the leanness of the Core distribution and the richness of features from the Contrib distribution.
Beyond component selection, a custom distribution can dramatically shrink your deployment footprint. At the time of this writing, the Contrib distribution binary is just over 200 MB; the minimal distribution we build below is around 20 MB.
Building a Custom OpenTelemetry Collector
Creating your custom OpenTelemetry Collector is a straightforward process. With a few steps and the right tools, you'll have a custom distribution that suits your specific needs. In this section, we'll guide you through the process.
Step 1: Install the Dependencies
To start building, you'll need Go installed on your system, as it's the language the OpenTelemetry Collector is written in. If you haven't installed it yet, you can download it from the official Go website. After you've installed Go, you'll need to install the OpenTelemetry Collector Builder tool, which helps you compile and build your custom collector.
You can do this with the following command in your terminal:
go install go.opentelemetry.io/collector/cmd/builder@latest
This command fetches the latest version of the OpenTelemetry Collector Builder and installs it on your machine.
Step 2: Create the Configuration File
The Builder uses a configuration file, typically named builder-config.yaml, which specifies the components you want to include in your custom distribution. This file is structured into sections including dist, receivers, processors, and exporters.
Here's an example builder-config.yaml file:
dist:
  name: otelcol
  description: Custom OTel Collector distribution
  output_path: ./otelcol-dev
  otelcol_version: 0.80.0
receivers:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/filestatsreceiver v0.80.0
processors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/attributesprocessor v0.80.0
exporters:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.80.0
This builder config includes just three components. The filestats receiver is a simple scraping component that reads the modification dates and sizes of a set of files and turns them into OpenTelemetry metrics. The attributes processor lets you configure various modifications to the attributes of OpenTelemetry signals. The Prometheus exporter exposes a /metrics endpoint compatible with Prometheus scrapers.
Step 3: Build the Custom Collector
Now that we've set up our configuration file, we can use the Builder to build our custom collector:
builder --config builder-config.yaml
This command will compile and build the custom OpenTelemetry Collector based on the configuration file you created, and the output will be located in the directory specified in the configuration file.
Step 4: Run the Collector
With your custom collector built, the next step is to run it. First, you'll need to create a configuration file for the collector itself, typically named otelcol.yaml. This file contains configuration information about how the collector should run, including how it should interact with the receivers, processors, and exporters.
Here's an example that works with the components in the OpenTelemetry Collector you created in Step 3:
receivers:
  filestats:
    include: /var/log/syslog*
    collection_interval: 5s
processors:
  attributes:
    actions:
      - key: host.name
        value: "$HOSTNAME"
        action: insert
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    metrics:
      receivers: [filestats]
      processors: [attributes]
      exporters: [prometheus]
This configuration instructs the filestats receiver to collect information about the syslog files on the local system. The resulting metrics are passed to the attributes processor, which adds the computer's hostname as an attribute to the metrics. The metrics are then passed to the Prometheus exporter, which exposes a /metrics endpoint on port 8889.
After you've created your otelcol.yaml, you can start the collector using the following command:
./otelcol-dev/otelcol --config otelcol.yaml
This command runs your custom collector with the specified configuration.
Creating a Custom Receiver for TCP Stats
You can add new capabilities to the OpenTelemetry Collector through custom components like receivers, processors, and exporters. Receivers initiate the data journey in the collector by receiving or fetching data. Here, we'll focus on building a custom receiver specifically designed for scraping TCP stats.
So why TCP stats? Our team's Distinguished Engineer, Joe Long, convinced me that this is one of the most accurate indicators of the health of a system. Unlike application metrics, which are often beneficial for diagnostics but less so for operations teams, TCP stats provide a universally interpretable narrative about the system state.
TCP stats can flag most conditions that impact system performance. For instance, if the system experiences memory or CPU constraints, TCP stats usually make it visible as connections pool in the queue. Similarly, if the client application performance dips, regular application metrics might show a happy signal, but TCP stats would show growing or depleting connection queues – both irregularities.
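As a toy illustration of that last point, a monitoring rule over successive queue-depth samples might look like the sketch below. This is purely my own hypothetical heuristic, not code from the receiver we build later in the article: a receive queue that only grows between scrapes suggests the application isn't draining connections fast enough.

```go
package main

import "fmt"

// backlogGrowing reports whether a series of receive-queue depth samples
// grows strictly between every pair of consecutive scrapes. A healthy
// service drains its queue, so depths bounce up and down; a monotonic
// climb hints the application can't keep up with incoming connections.
func backlogGrowing(samples []int64) bool {
	if len(samples) < 2 {
		return false
	}
	for i := 1; i < len(samples); i++ {
		if samples[i] <= samples[i-1] {
			return false
		}
	}
	return true
}

func main() {
	healthy := []int64{4, 0, 2, 1}    // queue drains between scrapes
	unhealthy := []int64{2, 5, 9, 14} // backlog keeps growing
	fmt.Println(backlogGrowing(healthy), backlogGrowing(unhealthy))
}
```

A real alerting rule would likely live in your backend (e.g. a Prometheus query over the exported gauge) rather than in Go, but the intuition is the same.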
Generating our Metrics
We've already put together an OpenTelemetry Collector in the previous section. Now, we're going to add our custom component, creating a subdirectory called tcpstatsreceiver. Once we have this directory, we start by defining our metrics and generating them using the OpenTelemetry Collector project's mdatagen tool.
We use the mdatagen tool to create boilerplate code for the metrics API, based on a metadata.yaml file. This file defines the metrics and attributes that our receiver will handle. Here's a sample snippet from the metadata.yaml file used for our TCP stats scraper:
metrics:
  tcp.queue.length:
    description: Total number of sockets.
    enabled: true
    gauge:
      value_type: int
    unit: sockets
    attributes: [local.address, local.port]
The full metadata.yaml file in our example is available here.
The Component Factory
With our metadata defined, it's time to implement the factory for our receiver. This factory enables the OpenTelemetry Collector to produce instances of our receiver.
Here's a snippet from factory.go:
func NewFactory() receiver.Factory {
	return receiver.NewFactory(
		metadata.Type,
		createDefaultConfig,
		receiver.WithMetrics(CreateTcpStatsReceiver, component.StabilityLevelDevelopment),
	)
}
Here, NewFactory returns a factory for creating instances of the TCP stats receiver. receiver.NewFactory requires the type of the receiver, a function to create the default configuration, and options, one of which is receiver.WithMetrics. This option is used to set the metrics creation function and its stability level. Note that a receiver can also handle Logs and Traces by specifying additional options here.
The CreateTcpStatsReceiver function uses the scraperhelper to simplify the creation of a new receiver. scraperhelper is provided by the OpenTelemetry Collector to facilitate receivers that need to wake up periodically and collect information for publishing as signal data. Here, we create an instance of the scraper, and then use scraperhelper.NewScraperControllerReceiver to return a new instance of the receiver.
ns := newScraper(metricsBuilder, cfg.Path, cfg.PortFilter, settings.Logger)
scraper, err := scraperhelper.NewScraper(metadata.Type, ns.scrape)
if err != nil {
	return nil, err
}
return scraperhelper.NewScraperControllerReceiver(
	&cfg.ScraperControllerSettings,
	settings,
	consumer,
	scraperhelper.AddScraper(scraper))
In the above code, we pass the scrape function from our scraper to the scraperhelper. By default, scraperhelper will call this function every 10 seconds. You can adjust this interval via ScraperControllerSettings, which we cover in the Configuration section below.
The Scraper
The scraper function calls our TCP stats code to collect a summary of the TCP connections on the local machine. It then creates metrics using the generated metrics API and emits the metrics, returning them for collection by other OpenTelemetry components.
Here is a summarized version of the function in scraper.go:
func (s *scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
	stats, err := s.tcpStats.get()
	if err != nil {
		return pmetric.NewMetrics(), err
	}
	now := pcommon.NewTimestampFromTime(time.Now())
	for _, stat := range stats {
		s.metricsBuilder.RecordTCPQueueLengthDataPoint(now, stat.QueueLength, stat.LocalAddress, stat.LocalPort)
	}
	return s.metricsBuilder.Emit(), nil
}
We won't cover the tcpstats code here, but the code is available in tcpstats.go and documented in this README.
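To give a sense of what that code involves, here is a self-contained, hypothetical sketch — not the repository's actual tcpstats.go — that parses the /proc/net/tcp text format, extracting the local address, local port, and receive-queue depth from each connection row:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tcpStat is a simplified view of one /proc/net/tcp entry.
type tcpStat struct {
	LocalAddress string
	LocalPort    int64
	QueueLength  int64 // rx_queue: bytes waiting to be read by the application
}

// parseProcNetTCP parses the /proc/net/tcp text format, whose columns are:
//
//	sl local_address rem_address st tx_queue:rx_queue ...
//
// local_address is hex "IP:PORT" and the queue depths are hex byte counts.
func parseProcNetTCP(data string) ([]tcpStat, error) {
	var stats []tcpStat
	lines := strings.Split(strings.TrimSpace(data), "\n")
	for _, line := range lines[1:] { // skip the header row
		fields := strings.Fields(line)
		if len(fields) < 5 {
			continue // malformed or blank row
		}
		addrParts := strings.Split(fields[1], ":") // "IP:PORT" in hex
		queues := strings.Split(fields[4], ":")    // "tx_queue:rx_queue" in hex
		if len(addrParts) != 2 || len(queues) != 2 {
			continue
		}
		port, err := strconv.ParseInt(addrParts[1], 16, 64)
		if err != nil {
			return nil, err
		}
		rxQueue, err := strconv.ParseInt(queues[1], 16, 64)
		if err != nil {
			return nil, err
		}
		stats = append(stats, tcpStat{
			LocalAddress: addrParts[0],
			LocalPort:    port,
			QueueLength:  rxQueue,
		})
	}
	return stats, nil
}

func main() {
	// One listening socket on 127.0.0.1:8005 (0x1F45) with 12 bytes queued.
	sample := `  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:1F45 00000000:0000 0A 00000000:0000000C`
	stats, err := parseProcNetTCP(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats[0].LocalPort, stats[0].QueueLength) // 8005 12
}
```

The real implementation additionally handles the PortFilter option and summarizes per-port queue depths; see tcpstats.go in the repository for the full details.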
Configuration
Finally, make your custom OpenTelemetry Collector component more versatile by adding configurability. The OpenTelemetry Collector provides a configuration system, and to implement it for our component, we add the following type and default-config function in config.go:
type Config struct {
	Path       string `mapstructure:"path"`       // Path to the file to be scraped for metrics (default: /proc/net/tcp)
	PortFilter string `mapstructure:"portfilter"` // Comma-separated list of ports to filter on (default: "")

	scraperhelper.ScraperControllerSettings `mapstructure:",squash"` // Configures the scraping interval (default: 10s)
	metadata.MetricsBuilderConfig           `mapstructure:",squash"` // Enables/disables specific metrics (default: all enabled)
}

func createDefaultConfig() component.Config {
	return &Config{
		Path:                      "/proc/net/tcp",
		PortFilter:                "",
		ScraperControllerSettings: scraperhelper.NewDefaultScraperControllerSettings(metadata.Type),
		MetricsBuilderConfig:      metadata.DefaultMetricsBuilderConfig(),
	}
}
This allows users to specify where to look for TCP stats, what ports to monitor, the scrape interval, and the generation of specific metrics.
Note that ScraperControllerSettings is provided by the OpenTelemetry Collector and MetricsBuilderConfig is generated by mdatagen.
Putting it All Together
Once we have completed and tested our code (unit tests are in the repository), we build it into our custom collector by updating builder-config.yaml with the following receivers section:
receivers:
  - gomod: github.com/drewby/tcpstatsreceiver v0.80.0
    import: github.com/drewby/tcpstatsreceiver
    name: "tcpstatsreceiver"
    path: "./tcpstatsreceiver"
This tells the builder to find our custom component in the tcpstatsreceiver directory and build it into our custom collector. We can update our otelcol.yaml config file:
receivers:
  tcpstats:
    path: /proc/net/tcp
    portfilter: 8005
    collection_interval: 5s
...
service:
  pipelines:
    metrics:
      receivers: [tcpstats]
      processors: [attributes]
      exporters: [prometheus]
Finally, run the new collector (on a Linux-based system) and you should see TCP stats in the local /metrics endpoint. The repository also contains a test application that simulates slow connections and fills up the TCP queues.
Give it a Go
The GitHub repository contains a dev container that's ready for use in Visual Studio Code, or even directly from GitHub Codespaces.
The dev container has all the dependencies installed so you can get up and running quickly. Try out the tcpstatsreceiver or try creating your own custom component.
Conclusion
Congrats on powering through this code-intensive article. If you are still with me, then we are certainly cut from the same cloth! The OpenTelemetry Collector is a robust instrument with the flexibility to cater to various telemetry needs. If you can't find components that fit your system, it allows you to tailor-make functionalities with your own components.
If this article was helpful, don't forget to subscribe to the newsletter and spread the word amongst your network on LinkedIn.