Let's Code: Building a Custom OpenTelemetry Collector
Drew Robbins
Engineering Leader | Driving Innovation and Observability in Generative AI Applications
In past articles, we explored OpenTelemetry, a powerful tool that shines a light on the internal operations of your software. We also discussed the OpenTelemetry Collector, a service you can run alongside your software to collect Logs, Metrics, and Traces.
In this article, we are digging a bit deeper. Actually, a lot deeper! We'll go beyond the default uses of the OpenTelemetry Collector, and explore the landscape of custom distributions. This is where things get exciting, as it offers the agility to shape the tool to your unique requirements and situations.
You'll learn how to craft your own custom OpenTelemetry Collector distribution, swap out components with your original code, and engineer new components to integrate into the collector.
To demonstrate how it all works, I will walk you through the creation of a custom receiver, built to gather TCP stats from your local machine and yield metrics for your monitoring backend.
To wrap it all up, I will point you to a repository hosting a complete working example inclusive of all the code snippets I'll discuss.
Understanding the OpenTelemetry Collector
The OpenTelemetry Collector is a versatile tool designed to gather, process, and export telemetry data. It's a pivotal component of the OpenTelemetry framework, acting as the intermediary that collects metrics, traces, and logs from your applications and services and then sends them to various backends for analysis and storage.
The power of the OpenTelemetry Collector lies in its modular architecture. It allows you to tailor the collector to your specific needs by assembling your own selection of components. There are two main distributions of the collector: Core and Contrib.
Core Distribution is the barebones version of the collector. It is stable, lightweight, and contains only a basic set of stable components - receivers, processors, and exporters. This version is ideal for those who wish to use a lean, well-tested, and efficient collector without any unnecessary extras.
Contrib Distribution, on the other hand, is the feature-packed variant. It includes all components from the Core, but also offers a wide range of additional receivers, processors, and exporters maintained by the community. This version is designed for those who need a more extensive set of components and don't mind trading some efficiency for added functionality.
More often than not, the Core distribution does not include all of the functionality a team requires for their telemetry needs, while the Contrib distribution contains far more, forcing the team to deploy a lot of code into their system that they will never use.
To provide a middle ground, the OpenTelemetry project offers a build tool for engineering teams to create their own curated and purposeful distribution of the collector with only the components they need.
Why Build a Custom OpenTelemetry Collector?
Custom distributions provide the ability to choose only the components you need, giving you the best of both worlds: the leanness of the Core distribution and the richness of features from the Contrib distribution.
Beyond component selection, a custom distribution can dramatically shrink your deployment footprint. At the time of this writing, the Contrib distribution binary is just over 200 MB; the minimal distribution we build below is around 20 MB.
Building a Custom OpenTelemetry Collector
Creating your custom OpenTelemetry Collector is a straightforward process. With a few steps and the right tools, you'll have a custom distribution that suits your specific needs. In this section, we'll guide you through the process.
Step 1: Install the Dependencies
To start building, you'll need Go installed on your system, as it's the language the OpenTelemetry Collector is written in. If you haven't installed it yet, you can download it from the official Go website. After you've installed Go, you'll need to install the OpenTelemetry Collector Builder tool, which helps you compile and build your custom collector.
You can do this with the following command in your terminal:
go install go.opentelemetry.io/collector/cmd/builder@latest
This command fetches the latest version of the OpenTelemetry Collector Builder and installs it on your machine.
Step 2: Create the Configuration File
The Builder uses a configuration file, typically named builder-config.yaml, which specifies the components you want to include in your custom distribution. This file is structured into sections including dist, receivers, processors, and exporters.
Here's an example builder-config.yaml file:
dist:
  name: otelcol
  description: Custom OTel Collector distribution
  output_path: ./otelcol-dev
  otelcol_version: 0.80.0
receivers:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/filestatsreceiver v0.80.0
processors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/attributesprocessor v0.80.0
exporters:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.80.0
This builder config includes just three components. The filestats receiver is a simple scraping component that reads the modification dates and sizes of a set of files and turns them into OpenTelemetry metrics. The attributes processor lets you configure various modifications to the attributes of OpenTelemetry signals. The Prometheus exporter exposes a /metrics endpoint compatible with Prometheus scrapers.
Step 3: Build the Custom Collector
Now that we've set up our configuration file, we can use the Builder to build our custom collector:
builder --config builder-config.yaml
This command will compile and build the custom OpenTelemetry Collector based on the configuration file you created, and the output will be located in the directory specified in the configuration file.
Step 4: Run the Collector
With your custom collector built, the next step is to run it. First, you'll need to create a configuration file for the collector itself, typically named otelcol.yaml. This file contains configuration information about how the collector should run, including how it should interact with the receivers, processors, and exporters.
Here's an example that works with the components in the OpenTelemetry Collector you created in Step 3:
receivers:
  filestats:
    include: /var/log/syslog*
    collection_interval: 5s
processors:
  attributes:
    actions:
      - key: host.name
        value: "$HOSTNAME"
        action: insert
exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
service:
  pipelines:
    metrics:
      receivers: [filestats]
      processors: [attributes]
      exporters: [prometheus]
This configuration instructs the filestats receiver to collect information about the syslog files on the local system. The resulting metrics are passed to the attributes processor, which adds the computer's hostname as an attribute to the metrics. The metrics are then passed to the Prometheus exporter, which exposes a /metrics endpoint on port 8889.
After you've created your otelcol.yaml, you can start the collector using the following command:
./otelcol-dev/otelcol --config otelcol.yaml
This command runs your custom collector with the specified configuration.
Creating a Custom Receiver for TCP Stats
You can add new capabilities to the OpenTelemetry Collector through custom components like receivers, processors, and exporters. Receivers initiate the data journey in the collector by receiving or fetching data. Here, we'll focus on building a custom receiver specifically designed for scraping TCP stats.
So why TCP stats? Our team's Distinguished Engineer, Joe Long, convinced me that this is one of the most accurate indicators of the health of a system. Unlike application metrics, which are often beneficial for diagnostics but less so for operations teams, TCP stats provide a universally interpretable narrative about the system state.
TCP stats can flag most conditions that impact system performance. For instance, if the system experiences memory or CPU constraints, TCP stats usually make it visible as connections pool in the queue. Similarly, if the client application performance dips, regular application metrics might show a happy signal, but TCP stats would show growing or depleting connection queues – both irregularities.
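As a toy illustration of that last point, a monitoring rule over successive queue-depth samples might look like the sketch below. This is purely my own hypothetical heuristic, not code from the receiver we build later in the article: a receive queue that only grows between scrapes suggests the application isn't draining connections fast enough.

```go
package main

import "fmt"

// backlogGrowing reports whether a series of receive-queue depth samples
// grows strictly between every pair of consecutive scrapes. A healthy
// service drains its queue, so depths bounce up and down; a monotonic
// climb hints the application can't keep up with incoming connections.
func backlogGrowing(samples []int64) bool {
	if len(samples) < 2 {
		return false
	}
	for i := 1; i < len(samples); i++ {
		if samples[i] <= samples[i-1] {
			return false
		}
	}
	return true
}

func main() {
	healthy := []int64{4, 0, 2, 1}    // queue drains between scrapes
	unhealthy := []int64{2, 5, 9, 14} // backlog keeps growing
	fmt.Println(backlogGrowing(healthy), backlogGrowing(unhealthy))
}
```

A real alerting rule would likely live in your backend (e.g. a Prometheus query over the exported gauge) rather than in Go, but the intuition is the same.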
Generating our Metrics
We've already put together an OpenTelemetry Collector in the previous section. Now, we're going to add our custom component, creating a subdirectory called tcpstatsreceiver. Once we have this directory, we start by defining our metrics and generating them using the OpenTelemetry Collector project's mdatagen tool.
We use the mdatagen tool to create boilerplate code for the metrics API, based on a metadata.yaml file. This file defines the metrics and attributes that our receiver will handle. Here's a sample snippet from the metadata.yaml file used for our TCP stats scraper:
metrics:
  tcp.queue.length:
    description: Total number of sockets.
    enabled: true
    gauge:
      value_type: int
    unit: sockets
    attributes: [local.address, local.port]
The full metadata.yaml file in our example is available here.
The Component Factory
With our metadata defined, it's time to implement the factory for our receiver. This factory enables the OpenTelemetry Collector to produce instances of our receiver.
Here's a snippet from factory.go:
func NewFactory() receiver.Factory {
	return receiver.NewFactory(
		metadata.Type,
		createDefaultConfig,
		receiver.WithMetrics(CreateTcpStatsReceiver, component.StabilityLevelDevelopment),
	)
}
Here, NewFactory returns a factory for creating instances of the TCP stats receiver. receiver.NewFactory requires the type of the receiver, a function to create the default configuration, and options, one of which is receiver.WithMetrics. This option is used to set the metrics creation function and its stability level. Note that a receiver can also handle Logs and Traces by specifying additional options here.
The CreateTcpStatsReceiver function uses the scraperhelper to simplify the creation of a new receiver. scraperhelper is provided by the OpenTelemetry Collector to facilitate receivers that need to wake up periodically and collect information for publishing as signal data. Here, we create an instance of the scraper, and then use scraperhelper.NewScraperControllerReceiver to return a new instance of the receiver.
ns := newScraper(metricsBuilder, cfg.Path, cfg.PortFilter, settings.Logger)
scraper, err := scraperhelper.NewScraper(metadata.Type, ns.scrape)
if err != nil {
	return nil, err
}
return scraperhelper.NewScraperControllerReceiver(
	&cfg.ScraperControllerSettings,
	settings,
	consumer,
	scraperhelper.AddScraper(scraper))
In the above code, we pass the scrape function from our scraper to the scraperhelper. By default, scraperhelper will call this function every 10 seconds. You can adjust this interval via ScraperControllerSettings, which we cover in the Configuration section below.
The Scraper
The scraper function calls our TCP stats code to collect a summary of the TCP connections on the local machine. It then creates metrics using the generated metrics API and emits the metrics, returning them for collection by other OpenTelemetry components.
Here is a summarized version of the function in scraper.go:
func (s *scraper) scrape(ctx context.Context) (pmetric.Metrics, error) {
	stats, err := s.tcpStats.get()
	if err != nil {
		return pmetric.NewMetrics(), err
	}
	now := pcommon.NewTimestampFromTime(time.Now())
	for _, stat := range stats {
		s.metricsBuilder.RecordTCPQueueLengthDataPoint(now, stat.QueueLength, stat.LocalAddress, stat.LocalPort)
	}
	return s.metricsBuilder.Emit(), nil
}
We won't cover the tcpstats code here, but the code is available in tcpstats.go and documented in this README.
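To give a sense of what that code involves, here is a self-contained, hypothetical sketch — not the repository's actual tcpstats.go — that parses the /proc/net/tcp text format, extracting the local address, local port, and receive-queue depth from each connection row:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tcpStat is a simplified view of one /proc/net/tcp entry.
type tcpStat struct {
	LocalAddress string
	LocalPort    int64
	QueueLength  int64 // rx_queue: bytes waiting to be read by the application
}

// parseProcNetTCP parses the /proc/net/tcp text format, whose columns are:
//
//	sl local_address rem_address st tx_queue:rx_queue ...
//
// local_address is hex "IP:PORT" and the queue depths are hex byte counts.
func parseProcNetTCP(data string) ([]tcpStat, error) {
	var stats []tcpStat
	lines := strings.Split(strings.TrimSpace(data), "\n")
	for _, line := range lines[1:] { // skip the header row
		fields := strings.Fields(line)
		if len(fields) < 5 {
			continue // malformed or blank row
		}
		addrParts := strings.Split(fields[1], ":") // "IP:PORT" in hex
		queues := strings.Split(fields[4], ":")    // "tx_queue:rx_queue" in hex
		if len(addrParts) != 2 || len(queues) != 2 {
			continue
		}
		port, err := strconv.ParseInt(addrParts[1], 16, 64)
		if err != nil {
			return nil, err
		}
		rxQueue, err := strconv.ParseInt(queues[1], 16, 64)
		if err != nil {
			return nil, err
		}
		stats = append(stats, tcpStat{
			LocalAddress: addrParts[0],
			LocalPort:    port,
			QueueLength:  rxQueue,
		})
	}
	return stats, nil
}

func main() {
	// One listening socket on 127.0.0.1:8005 (0x1F45) with 12 bytes queued.
	sample := `  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:1F45 00000000:0000 0A 00000000:0000000C`
	stats, err := parseProcNetTCP(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats[0].LocalPort, stats[0].QueueLength) // 8005 12
}
```

The real implementation additionally handles the PortFilter option and summarizes per-port queue depths; see tcpstats.go in the repository for the full details.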
Configuration
Finally, make your custom OpenTelemetry Collector component more versatile by adding configurability. The OpenTelemetry Collector provides a configuration system, and to implement it for our component, we add the following type and default-config function in config.go:
type Config struct {
	Path       string `mapstructure:"path"`       // Path to the file to be scraped for metrics (default: /proc/net/tcp)
	PortFilter string `mapstructure:"portfilter"` // Comma-separated list of ports to filter on (default: "")

	scraperhelper.ScraperControllerSettings `mapstructure:",squash"` // Configures the scraping interval (default: 10s)
	metadata.MetricsBuilderConfig           `mapstructure:",squash"` // Enables/disables specific metrics (default: all enabled)
}

func createDefaultConfig() component.Config {
	return &Config{
		Path:                      "/proc/net/tcp",
		PortFilter:                "",
		ScraperControllerSettings: scraperhelper.NewDefaultScraperControllerSettings(metadata.Type),
		MetricsBuilderConfig:      metadata.DefaultMetricsBuilderConfig(),
	}
}
This allows users to specify where to look for TCP stats, what ports to monitor, the scrape interval, and the generation of specific metrics.
Note that ScraperControllerSettings is provided by the OpenTelemetry Collector and MetricsBuilderConfig is generated by mdatagen.
Putting it All Together
Once we have completed and tested our code (unit tests are in the repository), we build it into our custom collector by updating builder-config.yaml with the following receivers section:
receivers:
  - gomod: github.com/drewby/tcpstatsreceiver v0.80.0
    import: github.com/drewby/tcpstatsreceiver
    name: "tcpstatsreceiver"
    path: "./tcpstatsreceiver"
This tells the builder to find our custom component in the tcpstatsreceiver directory and build it into our custom collector. We can update our otelcol.yaml config file:
receivers:
  tcpstats:
    path: /proc/net/tcp
    portfilter: 8005
    collection_interval: 5s
...
service:
  pipelines:
    metrics:
      receivers: [tcpstats]
      processors: [attributes]
      exporters: [prometheus]
Finally, run the new collector (on a Linux-based system) and you should see TCP stats in the local /metrics endpoint. The repository also contains a test application that simulates slow connections and fills up the TCP queues.
Give it a Go
The GitHub repository contains a dev container that's ready for use in Visual Studio Code, or even directly from GitHub Codespaces.
The dev container has all the dependencies installed so you can get up and running quickly. Try out the tcpstatsreceiver or try creating your own custom component.
Conclusion
Congrats on powering through this code-intensive article. If you are still with me, then we are certainly cut from the same cloth! The OpenTelemetry Collector is a robust instrument with the flexibility to cater to various telemetry needs. If you can't find components that fit your system, it allows you to tailor-make functionalities with your own components.
If this article was helpful, don't forget to subscribe to the newsletter and spread the word amongst your network on LinkedIn.