Simplifying Telemetry Data Collection
Drew Robbins
Engineering Leader | Driving Innovation and Observability in Generative AI Applications
Enjoying this newsletter? Please share it with your network and encourage them to subscribe to receive more articles on observability.
In modern distributed systems, collecting telemetry data is vital for gaining performance insights, troubleshooting issues, and optimizing resource utilization. As applications and services evolve and grow more intricate, a robust telemetry collection solution becomes increasingly critical. Without an efficient, centralized approach to collecting telemetry data, monitoring and understanding system behavior becomes challenging and time-consuming.
The OpenTelemetry Collector is a vendor-neutral component that can receive, process, and export telemetry data from a variety of sources. It simplifies the task of collecting and processing data by offering a standard way of routing data from various sources to multiple backends, reducing the complexity of deploying and maintaining data collectors.
The OpenTelemetry Collector's core functionality is built from three basic component types: Receivers, Processors, and Exporters. Receivers ingest telemetry from its sources, Processors transform or filter data in flight, and Exporters send it on to one or more backends.
Using these building blocks, pipelines for logs, metrics, and traces can be composed to fit the specific observability requirements of an application or service. For example, an application that generates high volumes of logs can use a pipeline with a log Receiver, a Processor that filters out irrelevant data, and an Exporter that sends the data to a log analysis tool like Elasticsearch or Splunk, as sketched below.
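As a rough sketch, such a pipeline might look like the following. The file path, regular expression, and Elasticsearch URL are placeholders, and the filter processor's option names can shift between Collector versions, so treat this as illustrative rather than copy-paste ready:

receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]

processors:
  filter:
    logs:
      exclude:
        match_type: regexp
        bodies: [ ".*debug.*" ]   # drop noisy debug-level records
  batch:

exporters:
  elasticsearch:
    endpoints: [ "https://elasticsearch.example.com:9200" ]

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [filter, batch]
      exporters: [elasticsearch]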
The OpenTelemetry Collector also has a Contrib project, which provides additional Receivers, Processors, and Exporters. These Contrib components can be used to support specific technologies, protocols, or use cases, such as Kubernetes monitoring or Azure integration.
Finally, the OpenTelemetry Collector can be deployed in various architectures and configurations, such as a Sidecar (per-pod), DaemonSet (per-node), or a Gateway (stand-alone). These configurations can be combined to create different collector architectures as required. It can also be scaled horizontally to handle large volumes of data and support high availability and failover scenarios.
Choosing a Distribution
You have three distributions to choose from for the OpenTelemetry Collector: core, contrib, and custom.
The core distribution includes the most commonly used and stable components from both the OpenTelemetry Collector and the Contrib project. It includes features like OTLP receivers and exporters, simple processors for adding attributes or filtering signals, and popular exporters like Prometheus and Jaeger.
The contrib distribution, on the other hand, includes all the components from both projects, regardless of their current state (alpha, beta, stable, etc.). It contains over a hundred components in total.
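If you just want to experiment, the contrib distribution is published as a container image, so you can run it without building anything. A minimal sketch (the config mount path is an assumption based on the image's documented default):

docker run --rm -p 4317:4317 \
  -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml \
  otel/opentelemetry-collector-contrib:0.74.0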
While most users might need a few components from the contrib project, they may not want to deploy all of them. In such cases, the custom distribution allows you to build your own distribution using a build tool provided by the OpenTelemetry Collector project. To do this, you need to define a file called builder-config.yaml, which specifies the components you want to include. Here's an example of what the file might look like:
dist:
  name: otelcol
  description: Custom OTel Collector distribution
  output_path: .
  otelcol_version: 0.74.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.74.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/lokiexporter v0.74.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.74.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor v0.74.0

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.74.0
In this example, the custom distribution includes only the components this collection system needs: an OTLP receiver and exporter, a Loki exporter, a batch processor, and a transform processor.
To create this custom distribution, install the builder using go install and then pass the builder-config.yaml file to it. This generates the source code for your distribution and compiles it into a collector binary. You can also incorporate this process into a Dockerfile for easier deployment in a cluster.
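The builder is distributed as a Go module at go.opentelemetry.io/collector/cmd/builder; it's a good idea to pin the version to match the otelcol_version in your manifest:

go install go.opentelemetry.io/collector/cmd/builder@v0.74.0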
To build the custom distribution, you can use the following command:
builder --config builder-config.yaml
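As a sketch, a multi-stage Dockerfile for this build might look like the following; the Go image tag, distroless base, and file paths are assumptions, not requirements:

# Build stage: install the builder and compile the custom distribution
FROM golang:1.20 AS build
WORKDIR /build
RUN go install go.opentelemetry.io/collector/cmd/builder@v0.74.0
COPY builder-config.yaml .
# Produces ./otelcol, named after dist.name in builder-config.yaml
RUN CGO_ENABLED=0 builder --config builder-config.yaml

# Runtime stage: copy only the compiled collector binary and its config
FROM gcr.io/distroless/base-debian11
COPY --from=build /build/otelcol /otelcol
COPY config.yaml /etc/otelcol/config.yaml
ENTRYPOINT ["/otelcol"]
CMD ["--config", "/etc/otelcol/config.yaml"]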
There is a working example of a custom collector and the Dockerfile to build it here: https://github.com/drewby/otelcollector-example
Configuring the Collector
Once you have your custom distribution of the OpenTelemetry Collector, you need to configure it to specify the behavior and settings of the components you included. The configuration is done using a YAML file.
Here's an example configuration YAML file that utilizes the components from the custom distribution mentioned earlier:
receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  otlphttp:
    endpoint: "https://your-otlp-endpoint.com"
  loki:
    endpoint: "https://your-loki-url.com"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
    metrics:
      receivers: [otlp]
      processors: []
      exporters: [otlphttp]
In this example, we configure the OpenTelemetry Collector to receive traces, logs, and metrics using the OTLP protocol over gRPC by setting up the otlp receiver. Note that components are referenced in the configuration by their type name (otlp, batch, otlphttp, loki), not by their Go module names.
We then apply the batch processor to the received traces and logs.
For exporting the data, we include two exporters: otlphttp and loki. The otlphttp exporter is configured with the endpoint parameter, which should be set to the URL of your OTLP backend. Similarly, the loki exporter's endpoint parameter should be set to the push URL of your Loki instance.
Finally, we define a pipeline for each signal type in the service section, which connects the configured receivers, processors, and exporters together. This is an important step and it's easy to forget; a collector with no pipelines quietly does nothing, leaving you wondering why no data is flowing.
Once you have your configuration YAML ready, you can start the OpenTelemetry Collector using the following command:
otelcol --config /path/to/your/config.yaml
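If you packaged the collector with a Dockerfile like the sketch above, the container equivalent is roughly the following (the image name is a placeholder):

docker build -t my-otelcol:0.74.0 .
# 4317 is the default port for the otlp receiver over gRPC
docker run --rm -p 4317:4317 my-otelcol:0.74.0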
Deploying the Collector
To deploy the OpenTelemetry Collector, you have three modes to choose from: sidecar, daemonset, or gateway/shared. Each mode serves a specific purpose in different deployment scenarios. If you are deploying to Kubernetes, it is recommended to use the OpenTelemetry Operator, which manages multiple collectors and keeps configuration in one place.
I will cover the OpenTelemetry Operator in a future article. However, let's take a closer look at each deployment mode:
Sidecar (per-pod): a collector container runs alongside each application pod and receives telemetry over localhost, keeping collection close to the workload that produced it.
DaemonSet (per-node): one collector runs on each node, gathering telemetry from all pods on that node and enriching it with node-level context.
Gateway (stand-alone/shared): a dedicated collector service that receives telemetry from agents or applications across the cluster and handles heavier processing and exporting centrally.
Each of these deployment modes can be scaled in different ways and the modes can be combined in the same cluster to achieve a collector architecture that works for your scenario.
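As one illustration, the per-node mode maps naturally onto a Kubernetes DaemonSet. Here is a minimal sketch, assuming a custom collector image and a ConfigMap named otel-collector-config that holds config.yaml (both are placeholders):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: my-registry/otelcol:0.74.0      # placeholder for your custom image
          args: ["--config", "/etc/otelcol/config.yaml"]
          ports:
            - containerPort: 4317                # OTLP over gRPC
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol
      volumes:
        - name: config
          configMap:
            name: otel-collector-config          # ConfigMap containing config.yaml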
Learning More
In future articles, I will cover more of the OpenTelemetry Collector, specific components for processing signal data, and provide some guidance on deploying it using the OpenTelemetry Operator. For more information about the Collector, you can refer to the official documentation at https://opentelemetry.io/docs/collector/.
If you have any questions or need assistance, feel free to ask!