Concept of Distributed Tracing (A Step Ahead on OpenTelemetry)
Picture courtesy: middleware.io


Microservices provide enormous agility and flexibility to the software development process. By partitioning large applications into interdependent services that communicate through explicit network contracts, each team can encapsulate its implementation from the others. In practice, however, microservices trade one set of organizational and technical problems for another. While their benefits amount to greater independence, clearer organizational boundaries and greater agility, those benefits come with some distinct costs:

Loss of Traceability - In a microservice design, a single end-user request is broken up across multiple processes, possibly written in multiple frameworks and implementation languages, which makes it much harder to track what exactly happened in the course of processing that request. In a monolith, we could gather the complete story of how a request was handled from a single process written in a single language; in a microservices environment we no longer have an easy way of doing that.

Increased Troubleshooting Spend - With the loss of traceability, tracking down and fixing the sources of errors inside a microservice architecture can be far more expensive and time-consuming than in a monolith. Moreover, failure data usually cannot be correlated cleanly across microservices: instead of an immediately understandable stack trace, we have to work backwards from status codes and error messages propagated across the network.

Cross-team Dependencies - Because requests make multiple hops over the network and are handled by multiple processes developed by independent teams, figuring out exactly where an error occurred, and whose responsibility it is to fix, becomes an exercise in frustration. Debugging microservices often involves sitting developers from several product teams down in a conference room to correlate timestamped logs from multiple services.

The core of the problem is that distributed approaches to developing software, such as microservices, require different tools from the ones we used when developing monoliths. We can’t attach debuggers to four different processes and step through a request in that environment, and that is where Distributed Tracing becomes an essential requirement.

Distributed Tracing is the process of tracking and analyzing what happens to a request (transaction) across all the services it touches. A trace begins when a user sends an initial request to an entry point of your application. A new trace ID is generated, and each successive request is decorated with HTTP headers that carry correlation data back to the original request. Each individual operation invoked as part of fulfilling that end-user’s request is called a “span”, and each span is tagged with its own unique ID as well as the ID of the trace and the ID of its “parent” span, i.e. the operation that created the current request. All correlation data is propagated downstream, and the spans are then reported out of band to a distributed tracing engine.
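A minimal, hand-rolled Python sketch of the ID bookkeeping just described. It assumes the W3C Trace Context `traceparent` header as the HTTP carrier (the default propagation format in OpenTelemetry; Zipkin has traditionally used its B3 headers instead). The `SpanContext` type and the `start_root_span`, `start_child_span`, `inject` and `extract` helpers are hypothetical names for illustration, not any tracing library's API.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    trace_id: str              # 32 hex chars, shared by every span in the trace
    span_id: str               # 16 hex chars, unique to this operation
    parent_id: str | None      # span_id of the caller; None for the root span


def start_root_span() -> SpanContext:
    """Entry point: a brand-new trace ID is generated for the user's request."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), None)


def start_child_span(parent: SpanContext) -> SpanContext:
    """Downstream operation: same trace ID, fresh span ID, caller becomes the parent."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id)


def inject(ctx: SpanContext, headers: dict[str, str]) -> None:
    """Decorate an outgoing request with correlation data (W3C traceparent layout).

    Layout: version "00", trace ID, the sender's span ID, flags "01" (sampled).
    """
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-01"


def extract(headers: dict[str, str]) -> SpanContext:
    """On the receiving service, rebuild the caller's context from the header."""
    _version, trace_id, span_id, _flags = headers["traceparent"].split("-")
    return SpanContext(trace_id, span_id, None)  # the caller's own parent is unknown here


# Usage: the frontend calls a (hypothetical) orders service over HTTP.
frontend = start_root_span()
outgoing_headers: dict[str, str] = {}
inject(frontend, outgoing_headers)

orders = start_child_span(extract(outgoing_headers))  # runs inside the orders service
print(orders.trace_id == frontend.trace_id)   # True: same trace
print(orders.parent_id == frontend.span_id)   # True: frontend span is the parent
```

Only the IDs travel with the request; the span details themselves are reported separately, which keeps the per-hop overhead to one small header.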

Each span captures important data points specific to the current microservice process handling a request, such as:

  • Tags/metadata that let you look up requests by session ID, HTTP method or any other correlation dimension.
  • Logs and events that provide context for analysis.
  • The exact service name and address of the process handling this request.
  • In failure conditions, detailed stack traces and error messages.
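The data points listed above map naturally onto a simple record. The Python sketch below is a hypothetical `Span` type for illustration only (OpenTelemetry, Zipkin and Jaeger each define their own span models with different field names); the IDs, service name and address in the usage lines are made up.

```python
from __future__ import annotations

import time
import traceback
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str                  # operation name, e.g. "GET /orders/{id}"
    trace_id: str
    span_id: str
    parent_id: str | None
    service: str               # name of the service handling this request
    address: str               # host:port of the process handling this request
    tags: dict[str, str] = field(default_factory=dict)              # searchable metadata
    events: list[tuple[float, str]] = field(default_factory=list)   # timestamped logs
    error: str | None = None   # formatted stack trace if the operation failed
    start: float = field(default_factory=time.time)
    end: float | None = None

    def set_tag(self, key: str, value: str) -> None:
        self.tags[key] = value

    def log(self, message: str) -> None:
        self.events.append((time.time(), message))

    def record_exception(self, exc: BaseException) -> None:
        self.set_tag("error", "true")
        self.error = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__))

    def finish(self) -> None:
        self.end = time.time()


# Usage with made-up IDs, service name and address:
span = Span("GET /orders/42", "4bf92f3577b34da6a3ce929d0e0e4736",
            "00f067aa0ba902b7", None,
            service="orders-service", address="10.0.3.17:8080")
span.set_tag("http.method", "GET")
span.set_tag("session.id", "abc123")
span.log("cache miss, querying database")
try:
    raise TimeoutError("database did not respond within 2s")
except TimeoutError as exc:
    span.record_exception(exc)
finally:
    span.finish()
```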

This feature of distributed tracing makes it possible to correlate data from multiple processes and figure out exactly where something went wrong in the course of processing, starting from the end user. The most popular distributed tracing tools support all the major programming languages and offer plugins for major web frameworks and event buses. With tools like Zipkin or Jaeger, we can solve our microservice architecture’s coherence and data-silo problems by propagating distributed trace data between our services and then reporting spans back to the servers included with those tools.
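Reporting spans back usually happens out of band: spans are queued and shipped in batches off the request path, so a slow or unavailable tracing server does not slow down user requests. A minimal Python sketch, assuming spans are plain dicts and that `send` is a hypothetical callback that would deliver a batch to your Zipkin or Jaeger collector (in the usage lines it just appends to a list):

```python
from __future__ import annotations

import queue
import threading
from typing import Callable

_STOP = object()  # sentinel telling the worker thread to flush and exit


class BatchingReporter:
    """Ships finished spans to a tracing backend in batches, on a background
    thread, so request handling never waits on the tracing backend."""

    def __init__(self, send: Callable[[list[dict]], None],
                 batch_size: int = 100, flush_interval: float = 1.0) -> None:
        self._send = send                    # hypothetical: delivers one batch
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def report(self, span: dict) -> None:
        """Called when a span finishes; only enqueues, never blocks on I/O."""
        self._queue.put(span)

    def close(self) -> None:
        """Flush everything reported so far and stop the worker."""
        self._queue.put(_STOP)
        self._worker.join()

    def _flush(self, batch: list[dict]) -> None:
        try:
            self._send(batch)
        except Exception:
            pass  # dropping spans beats crashing the reporter (or the app)

    def _run(self) -> None:
        batch: list[dict] = []
        while True:
            try:
                item = self._queue.get(timeout=self._flush_interval)
            except queue.Empty:
                item = None                  # quiet period: flush what we have
            if item is _STOP:
                if batch:
                    self._flush(batch)
                return
            if item is not None:
                batch.append(item)
            if batch and (item is None or len(batch) >= self._batch_size):
                self._flush(batch)
                batch = []


# Usage: a stand-in `send` that just records the batches it receives.
sent: list[list[dict]] = []
reporter = BatchingReporter(sent.append, batch_size=2, flush_interval=0.05)
for i in range(5):
    reporter.report({"span_id": f"span-{i}"})
reporter.close()
print([len(b) for b in sent])
```

Batching trades a little delivery latency for far fewer network round trips. Production SDKs (for example OpenTelemetry's `BatchSpanProcessor`) add bounded queues, export timeouts and drop policies on top of the same idea.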

A distributed tracing engine, such as Zipkin or Jaeger, correlates these related spans and makes the request data for any particular operation both searchable and easy to understand using powerful visualization tools.
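Under the hood, much of that correlation is bookkeeping on parent IDs. Below is a simplified Python sketch of what an engine might do with one trace, assuming its spans have already been grouped by trace ID and stored as plain dicts (the field names and the checkout trace are hypothetical; Zipkin and Jaeger use their own data models and UIs): rebuild the call tree, then list the failing spans that have no failing children, which is where the error most likely originated.

```python
from __future__ import annotations

from collections import defaultdict


def render_trace(spans: list[dict]) -> list[str]:
    """Rebuild the call tree of one trace from its flat list of spans."""
    children: dict[str | None, list[dict]] = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start_ms"])   # show calls in start order

    lines: list[str] = []

    def walk(span: dict, depth: int) -> None:
        marker = "  <-- ERROR" if span["error"] else ""
        lines.append(f'{"  " * depth}{span["service"]}: {span["name"]} '
                     f'({span["duration_ms"]} ms){marker}')
        for child in children[span["span_id"]]:
            walk(child, depth + 1)

    for root in children[None]:                      # root spans have no parent
        walk(root, 0)
    return lines


def root_causes(spans: list[dict]) -> list[dict]:
    """Failing spans none of whose children failed: where the error originated."""
    parents_of_failures = {s["parent_id"] for s in spans if s["error"]}
    return [s for s in spans if s["error"] and s["span_id"] not in parents_of_failures]


# A hypothetical checkout trace, as the engine might receive it (any order).
spans = [
    {"span_id": "c3", "parent_id": "b2", "service": "inventory", "name": "reserve_stock",
     "start_ms": 10, "duration_ms": 150, "error": True},
    {"span_id": "a1", "parent_id": None, "service": "frontend", "name": "GET /checkout",
     "start_ms": 0, "duration_ms": 212, "error": True},
    {"span_id": "d4", "parent_id": "b2", "service": "payments", "name": "authorize_card",
     "start_ms": 8, "duration_ms": 40, "error": False},
    {"span_id": "b2", "parent_id": "a1", "service": "orders", "name": "create_order",
     "start_ms": 5, "duration_ms": 200, "error": True},
]
print("\n".join(render_trace(spans)))
print([s["service"] for s in root_causes(spans)])   # ['inventory']
```

The root-cause rule is a simple heuristic that assumes failures propagate upward from callee to caller, as they do in this example.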

Benefits of Distributed Tracing

The benefits of distributed tracing for software development teams are numerous.

1. Distributed tracing radically improves developer productivity and output by drastically reducing the time spent debugging and troubleshooting issues with your systems. It also makes it easier to understand the behavior of distributed systems.

2. Distributed tracing works across multiple applications, programming languages and transports. For example, Ruby on Rails applications can propagate traces to .NET applications over HTTP, RabbitMQ, WebSockets or other transports, and all of the relevant information can still be uploaded, decoded and visualized by the same tracing engine, such as Zipkin.

3. Distributed tracing can also help improve time to market by making it possible to correlate feature delivery with customer-facing service performance.

4. Distributed tracing also facilitates cross-team communication and cooperation. It eliminates costly data silos that could otherwise hinder developers’ ability to quickly locate and fix sources of error.

For further details on implementation and best practices, feel free to connect over a Fika break.


More articles by Amit Sengupta
