What is OpenTelemetry —— an Introduction for Beginners
Originally post at: https://greptime.com/blogs/2024-09-05-opentelemetry
Devops Teams must prioritize observability within their networks and applications enabling them to understand issues that arise and plan for future expansion and support. In the previous article about semantic conventions, we broke down the methods of observability into these three layers:
At this time, OpenTelemetry focused primarily on the protocol layer. OpenTelemetry does not prescribe exactly what to store, or what to do with it, but rather how it's shaped, and how to move it around.
How Does OpenTelemetry Solve this
OpenTelemetry (OTel) is a set of open source standards for monitoring the health of your application or network based on sets of observability data or signals. This data is broken down into three different types: metrics, logs, and traces. In summary, metrics and logs help to understand what happened and where, while traces help to correlate those happenings to identify when and in what order they happened.
Together, these signals provide a comprehensive view of your system’s health allowing operators to ensure their networks are running smoothly. OpenTelemetry specializes in servicing distributed systems by building on its concept of context propagation. The platform aims to deliver clarity into complex systems by providing a continuous context aligning system events across applications and hosts. Teams are able to understand what happened, where it happened, and when it happened.
While OpenTelemetry shares a large amount of overlap with many of other popular apps and frameworks in the ecosystem, OTel leans strongly into its open source ethos by defining a vendor-neutral set of protocol standards and interfaces. OTel has seen wide adoption within the observability space due to this transparency and composability.
So how exactly can teams use these different signals to ensure the health of their network?
Metrics
Metrics consist of quantitative data describing some performance, benchmark, or statistic relevant to the health of your application. Examples of metrics include
Metrics can be aggregated and exported in user-defined ways to help teams understand when certain events occur and allow teams to invoke fallback mechanisms to ensure the stability of their system. For example, a team might capture a metric such as disk utilization to understand how much capacity exists on a given storage node, and invoke a purge process when the utilization exceeds some given threshold.
Logs
Logging was the first monitoring tool to help developers understand what’s happening in their code. Logs provide a simple, unstructured way for developers to emit messages from running processes to provide more context about the code’s process and status. Most developers will use well known logging libraries in their chosen language to aid in their development process. OTel made the decision to support legacy logging solutions by augmenting these logs to support the OTel Log Data format through supporting bridge components and/or sending these logs through some transformation process while ingesting them through the OTel collector. This provides an easier onboarding process to getting teams started with OpenTelemetry.
Traces
Traces represent the journey of a request as it flows through various services and components in a distributed system. They help track the path and performance of individual requests, making it easier to identify where bottlenecks or errors occur. Traces are made of spans representing a single operation or unit of work within a system, like a database query, an HTTP request, or a function execution. Spans are often related in a parent-child hierarchy, where a parent span can have one or more child spans. This structure helps map the flow of a request through different components of the system. An example of a trace might look something like this
{
"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
"spans": [
{
"span_id": "6e0c63257de34c92",
"parent_span_id": null,
"name": "GET /api/v1/users",
...
},
{
"span_id": "9a92f9a757bb43e1",
"parent_span_id": "6e0c63257de34c92",
"name": "Cache lookup",
"start_time": "2024-08-24T10:23:45.125Z",
"end_time": "2024-08-24T10:23:45.128Z",
"attributes": {
"cache.hit": true,
"cache.key": "user_123456"
},
"events": [],
"links": []
}
]
}
Core Components of OpenTelemetry
source: https://lumigo.io/opentelemetry/
As mentioned previously, OpenTelemetry is vendor neutral. It provides the specification, standards, and core components, and allows others to implement these to best suit their needs. Some of the key components of the OpenTelemetry ecosystem include:
API: The OpenTelemetry API provides a standardized way for applications to generate telemetry data. It's language-specific and offers interfaces for creating spans, adding attributes, and managing context. e.g. Java, Python, Go, JavaScript, .NET, Ruby, PHP, Rust, C++
These core components work together to provide a comprehensive observability solution that can be adapted to various environments and integrated with different tools in the observability ecosystem. The wide range of implementations and integrations available for each component demonstrates the flexibility and broad adoption of the OpenTelemetry standard across the industry.
Example Implementation
Below is a simple application to illustrate the different components and how they work together:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from flask import Flask
import requests
# 1. API & SDK
tracer_provider = TracerProvider(
resource=Resource.create({SERVICE_NAME: "example-service"})
)
trace.set_tracer_provider(tracer_provider)
# 2. Exporter
otlp_exporter = OTLPSpanExporter(endpoint="https://localhost:4317")
span_processor = BatchSpanProcessor(otlp_exporter)
tracer_provider.add_span_processor(span_processor)
# 3. Propagator
set_global_textmap(B3MultiFormat())
# 4. Instrumentation
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
tracer = trace.get_tracer(__name__)
@app.route("/")
def hello():
with tracer.start_as_current_span("make-request") as span:
span.set_attribute("custom.attribute", "example value")
response = requests.get("https://api.example.com")
span.set_attribute("response.status_code", response.status_code)
return "Hello, World!"
if __name__ == "__main__":
app.run(debug=True)
To run the sample process you would need to run an OpenTelemetry Collector, which executes as a separate docker image or executable. An example Docker configuration looks similar to the following:
version: "3"
services:
otel-collector:
image: otel/opentelemetry-collector:latest
command: ["--config=/etc/otel-collector-config.yaml"]
volumes:
- ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
ports:
- "4317:4317" # OTLP gRPC receiver
- "4318:4318" # OTLP HTTP receiver
- "8888:8888" # Prometheus metrics exposed by the collector
- "8889:8889" # Prometheus exporter metrics
- "13133:13133" # Health check extension
Executing the above docker file with a mounted configuration similar to below would allow the python process to export its data to this target, e.g. otel-collector-config.yaml.
receivers:
otlp:
protocols:
grpc:
processors:
batch:
exporters:
logging:
loglevel: debug
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [logging]
With all of the different moving pieces to this collection process, it's easy to get overwhelmed. We have provided a small introduction to the different components of OpenTelemetry and how the pieces move together. In future articles, we will expand deeper into how to create a more robust observability pipeline with OpenTelemetry.