Why you need OpenTelemetry based Observability for your AI apps
LLM Observability x OpenTelemetry = Langtrace

Introduction

With the advent of LLMs, modern software development is going through an important shift: from mostly deterministic systems that can be reasoned about with logic to non-deterministic inference endpoints. While LLMs enable developers to build innovative applications that were not possible before, they also create new challenges for software quality, testing and debugging.

New Challenges

How do you get an LLM to respond with only JSON?

One of the biggest challenges with LLMs is their non-deterministic nature. In simple terms, an LLM's behaviour is governed by variables outside your control, such as model weights and training data, so the shape of its response is never guaranteed. A traditional GET endpoint that reads a database and returns a JSON response will return valid JSON every single time, or fail outright. Asking an LLM to return a structured JSON response carries no such guarantee, and identifying why it failed that one time during the day is close to impossible, because the failure depends on variables you cannot control.
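A common way to cope with this in practice is to validate the model's reply and retry on failure. Below is a minimal, standard-library sketch of that pattern; `flaky_llm` is a hypothetical stand-in for a real LLM call, not an API from any particular provider:

```python
import json

def flaky_llm(prompt):
    # Hypothetical stand-in for a real LLM call. It returns valid JSON here,
    # but a real model may occasionally wrap the JSON in prose or truncate it.
    return '{"name": "Ada", "role": "engineer"}'

def get_json_response(prompt, max_retries=3):
    """Ask the model for JSON, validate the reply, and retry on failure."""
    for _ in range(max_retries):
        raw = flaky_llm(prompt + "\nRespond with only a JSON object.")
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # re-prompt; each retry is exactly what you'd want traced
    raise ValueError("model never returned valid JSON")

result = get_json_response("Extract the person's name and role.")
```

Each retry is an event worth capturing in telemetry: without a trace of the raw reply that failed to parse, debugging that one bad response is guesswork.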

This has big consequences for software testing, debugging and quality. When you cannot deterministically reproduce a bug, how can you fix it? One way engineering teams are approaching this problem is by putting effective guardrails in place, using a combination of model iteration, fine-tuning and prompt engineering to control the behaviour of these systems. In reality, this will be an ongoing process of continuous improvement toward the dream land of 100% model accuracy and quality. That means defining metrics, having tools to gain visibility, and diligently measuring and tracking those metrics will be of utmost importance to get the most out of these systems.

OpenTelemetry - an important standard

In order to effectively measure these new metrics for applications built with LLMs, we need the ability to generate and capture telemetry data from the LLM interaction layer, which is typically made up of frameworks like Langchain and LlamaIndex, vectorDBs like Pinecone and PgVector, and LLMs like OpenAI, Cohere and Anthropic.

Luckily, we do not need to reinvent the wheel here: significant strides have been made over the last few years toward a standard data model for telemetry spans and traces, driven primarily by OpenTelemetry, a CNCF-incubated project with wide adoption across the industry.

OpenTelemetry defines a standard data model for spans and traces, the building blocks of any observability system, which guarantees minimal intrusion while maximizing high-cardinality data for effective debugging, visibility and incident response. OpenTelemetry also prevents vendor lock-in: because the data model is fixed and widely adopted by libraries, frameworks and the industry at large, teams can switch between observability tools freely.

Langtrace SDK - the first step

An example span generated by Langtrace's Python SDK

The first step towards this goal is to equip teams with OpenTelemetry tracing capabilities for the LLM software layer. With the Langtrace SDK, we are building open source SDKs for popular languages to capture OpenTelemetry-standard traces from LLM frameworks, vectorDBs and LLM providers. These SDKs are fully compatible with any of the available OpenTelemetry exporters, which can send the traces to any storage system or observability backend.

Langtrace Cloud - the observability layer you need

Langtrace Cloud - Traces view

New metrics demand new capabilities from the observability client. These include, but are not limited to:

  • Running tests and manual/automated evaluations.
  • Uploading reference datasets and downloading captured and annotated datasets.
  • Prompt management and versioning.
  • Token usage tracking.

With Langtrace Cloud, we are building a lightweight client that is optimized to solve for the needs above while serving as an additional observability layer alongside your existing observability solution.

Conclusion

LLMs are a fascinating technology that is supercharging a leap in computing. It is important to pick the right set of tools early in the adoption journey to gain the control, visibility and confidence to build and ship high-quality software with new and innovative capabilities.

Links

[1] https://langtrace.ai/

[2] https://docs.langtrace.ai/introduction

[3] https://github.com/Scale3-Labs/langtrace
