
Defining Generative AI Monitoring Standards: What's in a Name?

Drew Robbins

Engineering Leader | Driving Innovation and Observability in Generative AI Applications

Published: July 6, 2024

We have been doing a lot of Generative AI work lately, and I'm sure many readers of this newsletter have, too. As part of that work, I've been helping draft the OpenTelemetry Semantic Conventions for Generative AI applications, which aim to standardize how we monitor this new type of application.

The initial release is here: Semantic Conventions for Generative AI systems

You can also see the latest drafts here: https://github.com/open-telemetry/semantic-conventions/tree/main/docs/gen-ai

Monitoring distributed systems is a major challenge for large companies, and adding a non-deterministic component like Generative AI makes it harder still. Teams may wonder why the system works well one day but not the next, or be puzzled by how some customers manage to find every possible way to break the application. Above all, they are often surprised by the cost and unsure how to tie that cost to the outcomes they hoped to achieve.

Generative AI systems, including those built on large language models (LLMs), require accurate and consistent monitoring to operate efficiently and safely. However, their diversity and complexity pose significant challenges for developers and operators. Without standardized conventions, the telemetry these systems produce is inconsistent, which makes it difficult to draw meaningful insights and act on them.

The Semantic Conventions provide a common framework for collecting and interpreting telemetry data. By defining clear and consistent standards for attributes, spans, and metrics, we can ensure that data from different sources is compatible and easily understandable. This consistency is essential for maintaining the performance, reliability, and safety of Generative AI applications.
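To make this concrete, here is a minimal, library-free sketch in Python of the span name and attributes the conventions describe for a single chat call. The attribute keys follow recent versions of the experimental drafts linked above, but the conventions are still evolving and some names have changed between versions, so check the current spec before relying on any of them. The `chat_span_record` helper is hypothetical; in a real application you would set these attributes on a span through the OpenTelemetry API or let an instrumentation library do it for you.

```python
from typing import Any


def chat_span_record(system: str, model: str, input_tokens: int,
                     output_tokens: int, finish_reason: str) -> dict[str, Any]:
    """Return a span name and attributes for one chat call (hypothetical helper)."""
    operation = "chat"
    attributes: dict[str, Any] = {
        "gen_ai.operation.name": operation,
        "gen_ai.system": system,  # which provider/system served the call
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.response.finish_reasons": [finish_reason],
    }
    # Recent drafts suggest span names of the form "{operation} {model}".
    return {"name": f"{operation} {model}", "attributes": attributes}


record = chat_span_record("openai", "gpt-4", 42, 17, "stop")
print(record["name"])  # chat gpt-4
```

Because every service emits the same keys, a backend can index and query them without any per-service mapping.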

Importance of Semantic Conventions

When we started this effort, I didn't expect so many people to be interested in what is, essentially, coming up with names. That interest underscores how important a common set of semantics is. Semantic Conventions play a critical role in standardizing how telemetry data is collected and interpreted across different systems and platforms, and this standardization brings several key benefits:

1. Consistency Across Systems:

Semantic Conventions ensure that data collected from various sources follows a uniform structure. This consistency is crucial when integrating data from different services, making it easier to correlate and analyze telemetry data. For example, whether you are monitoring a language model from OpenAI or a custom model from another provider, the conventions ensure that the data points are comparable.
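A small sketch of that idea: two providers report token usage in different shapes, and a thin mapping layer translates both into the same convention keys. Both payload shapes and the `from_*` helpers are hypothetical simplifications, not real SDK response objects, and the `"custom"` system identifier is made up; in practice, instrumentation libraries typically perform this mapping for you.

```python
from typing import Any

# Simplified, hypothetical payloads from two different providers.
openai_style = {"model": "gpt-4", "usage": {"prompt_tokens": 30, "completion_tokens": 12}}
custom_style = {"model_id": "my-llm-7b", "tokens": {"in": 55, "out": 20}}


def from_openai_style(resp: dict[str, Any]) -> dict[str, Any]:
    # Map the first provider's field names onto the shared convention keys.
    return {
        "gen_ai.system": "openai",
        "gen_ai.response.model": resp["model"],
        "gen_ai.usage.input_tokens": resp["usage"]["prompt_tokens"],
        "gen_ai.usage.output_tokens": resp["usage"]["completion_tokens"],
    }


def from_custom_style(resp: dict[str, Any]) -> dict[str, Any]:
    # Same keys, different source fields.
    return {
        "gen_ai.system": "custom",  # hypothetical system identifier
        "gen_ai.response.model": resp["model_id"],
        "gen_ai.usage.input_tokens": resp["tokens"]["in"],
        "gen_ai.usage.output_tokens": resp["tokens"]["out"],
    }


records = [from_openai_style(openai_style), from_custom_style(custom_style)]
# Both records share exactly the same keys, so they can be compared directly.
total_input = sum(r["gen_ai.usage.input_tokens"] for r in records)
print(total_input)  # 85
```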

2. Simplified Dashboard Creation:

With standardized data, building comprehensive and meaningful dashboards becomes much more straightforward. Operators can set up visualizations that accurately reflect the performance and usage patterns of Generative AI models, which helps them quickly identify trends, anomalies, and potential issues and manage AI systems proactively.
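As a rough illustration, assuming every service already emits the same `gen_ai.*` keys, a per-model token rollup (the kind of panel you might put on a cost dashboard) needs only one grouping key. The sample spans below are made up, and the `tokens_by_model` helper is hypothetical; in a real deployment this aggregation would be a query in your observability backend rather than Python code.

```python
from collections import defaultdict
from typing import Any

# Hypothetical span attributes exported by several different services.
spans: list[dict[str, Any]] = [
    {"gen_ai.request.model": "gpt-4", "gen_ai.usage.input_tokens": 120, "gen_ai.usage.output_tokens": 40},
    {"gen_ai.request.model": "gpt-4", "gen_ai.usage.input_tokens": 80, "gen_ai.usage.output_tokens": 25},
    {"gen_ai.request.model": "my-llm-7b", "gen_ai.usage.input_tokens": 60, "gen_ai.usage.output_tokens": 30},
]


def tokens_by_model(records: list[dict[str, Any]]) -> dict[str, dict[str, int]]:
    """Sum input and output tokens per model, keyed by the shared model attribute."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"input": 0, "output": 0})
    for r in records:
        model = r["gen_ai.request.model"]
        # Missing usage attributes count as zero rather than failing the rollup.
        totals[model]["input"] += r.get("gen_ai.usage.input_tokens", 0)
        totals[model]["output"] += r.get("gen_ai.usage.output_tokens", 0)
    return dict(totals)


summary = tokens_by_model(spans)
print(summary["gpt-4"])  # {'input': 200, 'output': 65}
```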

3. Debugging Capabilities:

Debugging complex AI systems can be challenging without a clear and consistent set of telemetry data. Semantic Conventions provide a structured way to capture and log detailed information about AI operations. This detailed logging is invaluable when tracing the root cause of issues, understanding the context of errors, and implementing fixes.

4. Interoperability:

By adhering to widely accepted standards, different teams and organizations can collaborate more effectively. Semantic Conventions facilitate interoperability between tools and services, allowing for a more integrated approach to monitoring and observability. This is particularly important in large organizations or projects involving multiple stakeholders.

5. Data Privacy:

Standardized conventions also account for data privacy and performance. For instance, a standard way to turn the capture of prompts and completions on or off lets teams keep sensitive information out of their telemetry and control how much data they collect. This balance between data richness and operational efficiency is essential for scalable and secure AI operations.
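Here is a minimal sketch of such a toggle, assuming a hypothetical environment variable named `CAPTURE_GENAI_CONTENT` that is off by default. Real instrumentation libraries define their own setting for this, and the content keys used here are illustrative only, since the drafts have changed how prompts and completions are recorded over time.

```python
import os
from typing import Any

# Hypothetical switch name; real instrumentation libraries define their own.
CAPTURE_FLAG = "CAPTURE_GENAI_CONTENT"


def content_capture_enabled() -> bool:
    # Off unless explicitly enabled: prompts and completions may hold sensitive data.
    return os.environ.get(CAPTURE_FLAG, "false").strip().lower() in ("1", "true", "yes")


def build_attributes(model: str, prompt: str, completion: str) -> dict[str, Any]:
    """Always record metadata; record message content only when opted in."""
    attrs: dict[str, Any] = {"gen_ai.request.model": model}
    if content_capture_enabled():
        # Illustrative keys for message content.
        attrs["gen_ai.prompt"] = prompt
        attrs["gen_ai.completion"] = completion
    return attrs


os.environ.pop(CAPTURE_FLAG, None)
print(build_attributes("gpt-4", "hi", "hello"))  # {'gen_ai.request.model': 'gpt-4'}
```

Defaulting to off means sensitive text is exported only when someone deliberately opts in, and the metadata needed for dashboards is still recorded either way.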

Collaborative Efforts from Leading Companies

The development of Semantic Conventions for Generative AI is an example of collaboration within the tech industry. This initiative has brought together experts from a diverse array of leading companies, each contributing their unique insights and expertise. The collective effort ensures that the standards we develop are comprehensive, robust, and applicable across various platforms and use cases.

Participants from these Companies:

Microsoft, Traceloop, Google, Apple, Amazon/AWS, IBM, Elastic, Honeycomb, Langtrace, WhyLabs, Alibaba, Red Hat, LangChain4j, Truera, Splunk, SigNoz, Ozmo

This diverse representation ensures that the Semantic Conventions we establish are not only technically sound but also practical and widely applicable. Each company brings a unique perspective, ensuring that the standards are balanced and address the varied needs of the industry.

Another side benefit of this work has been the opportunity for me to meet and collaborate with industry leaders from these organizations. Getting involved in open-source software (OSS) projects like this one is not only a way to contribute to important technological advancements but also an excellent opportunity to build and expand your professional network.

Join the Effort

The work on Semantic Conventions for Generative AI is far from complete, and there is much more to be done. We invite you to join us in this ongoing effort to enhance the observability of AI applications. Your contributions can help shape the future of how we monitor and manage these complex systems.

Getting involved is not just about contributing code or ideas; it's about becoming part of a community that is dedicated to improving the tools and practices we all rely on. Whether you are an experienced developer, a researcher, or someone who is passionate about AI and observability, there is a place for you in this project.

Why Join Us?

1. Make a Difference: Your contributions can have a significant impact on how Generative AI applications are monitored and managed, leading to more reliable and efficient systems.

2. Collaborate with Industry Leaders: Work alongside experts from leading companies such as Microsoft, Google, Amazon, IBM, and many others. This is a unique opportunity to learn from and collaborate with some of the brightest minds in the field.

3. Expand Your Network: Being part of this initiative allows you to build and expand your professional network, opening up new opportunities for growth and collaboration.

4. Enhance Your Skills: Contributing to open-source projects is an excellent way to enhance your technical skills, gain new knowledge, and stay updated with the latest trends and technologies in AI and observability.

5. Be Recognized: Your work will be recognized within the community and beyond, highlighting your contributions to an important and impactful project.

How to Get Involved:

Join the Discussion: Participate in meetings, forums, and discussions related to the project. Share your ideas, ask questions, and provide feedback.
Contribute to the Code: Help develop and improve the Semantic Conventions by contributing code, writing documentation, or testing new features.
Spread the Word: Help raise awareness about the importance of Semantic Conventions and encourage others to get involved.

Links to Get Started:

If you are interested in joining the effort, please reach out and become part of this exciting journey. Your expertise and enthusiasm are invaluable, and we look forward to collaborating with you.

From the Observability newsletter.
