QuantUniversity
kicked off its annual Fall AI school yesterday.
Agus Sudjianto
gave the first guest lecture, discussing approaches to validating and testing generative AI models with a focus on Retrieval-Augmented Generation (RAG) systems. He argued that comprehensively validating large language models (LLMs) is difficult due to the vastness of their training data and the uncontrollability of user input, but that validating RAG systems is feasible because they rely on a finite set of documents. He presented a framework for RAG system validation that involves curating question-answer pairs, evaluating retrieval and generation capabilities, and conducting human evaluation to calibrate the model's performance. He emphasized interpretable metrics based on semantic similarity, dimensionality reduction techniques for analysis, and automated testing methods for scalability. He also discussed the importance of feature engineering and prompt engineering in achieving successful validation.
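The interpretable, semantic-similarity-based metrics described above can be sketched in a few lines. This is a minimal illustration, not code from the lecture: it assumes answer and reference texts have already been converted to embedding vectors by some embedding model, and the function names and threshold value are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def score_answer(answer_vec, reference_vec, threshold=0.8):
    """Score a generated answer against a curated reference answer.

    Returns the similarity and a pass/fail flag; the threshold would be
    calibrated via human evaluation, as the lecture describes.
    """
    sim = cosine_similarity(answer_vec, reference_vec)
    return sim, sim >= threshold
```

Because the metric is just a similarity between two inspectable vectors, a validator can trace any pass/fail decision back to the texts being compared, which is the transparency property the lecture stresses.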
The slides and video from the presentation are available here:
- Validating AI for High-Stakes Applications: Traditional AI validation methods are inadequate for generative AI, especially in finance and medicine where errors carry significant risks.
- Conceptual Soundness and Outcome Analysis: Validation must encompass not just performance but also conceptual soundness, input design, explainability, and alignment with business goals.
- RAG Systems as a Validation Target: Comprehensive validation of generic LLMs is currently impossible due to their vast scope. This lecture focuses on validating RAG systems, which offer a more manageable scope for testing.
- Automated Testing and Explainable Metrics: Automated testing with stratified sampling ensures comprehensive coverage. Explainable metrics based on semantic similarity and embedding models are crucial for transparency and regulatory acceptance.
- Human-in-the-Loop Calibration: While automation is key, human evaluation and calibration remain essential for verifying algorithmic results and setting appropriate thresholds for production deployment.
- AI as a "Guessing Machine": Dr. Sudjianto emphasizes that AI, even generative AI, ultimately makes predictions based on patterns. This inherent uncertainty necessitates rigorous validation, especially in high-stakes fields.
- Limitations of Benchmarking: While useful for general comparisons, standardized benchmarks fail to address the specific risks and requirements of real-world business applications.
- Focus on Model Weakness: Identifying and diagnosing model weaknesses, particularly through explainable metrics, is paramount for building trust and mitigating potential harm.
- Importance of Embedding Models: Embedding models play a crucial role in validation, enabling semantic similarity calculations for evaluating retrieval quality, groundedness, and answer relevance.
- Data Science in Validation: Dr. Sudjianto highlights that model validation in the age of generative AI is a data science-heavy task, requiring expertise in areas like clustering, calibration, and statistical analysis.
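The automated testing with stratified sampling mentioned above can be sketched as follows. This is a hypothetical illustration under assumed data shapes (question-answer pairs tagged with a topic cluster); the lecture did not present this code.

```python
import random
from collections import defaultdict

def stratified_sample(qa_pairs, stratum_key, n_per_stratum, seed=0):
    """Sample curated Q&A pairs from each stratum (e.g. topic cluster).

    Sampling per stratum rather than uniformly ensures that rare document
    topics are still covered by the automated test suite.
    """
    rng = random.Random(seed)  # fixed seed keeps test runs reproducible
    strata = defaultdict(list)
    for pair in qa_pairs:
        strata[stratum_key(pair)].append(pair)
    sample = []
    for pairs in strata.values():
        k = min(n_per_stratum, len(pairs))
        sample.extend(rng.sample(pairs, k))
    return sample
```

Each sampled pair would then be run through the RAG system and scored with the semantic-similarity metrics, giving per-stratum coverage statistics rather than a single aggregate number.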
- "AI is a guessing machine... Now we want to use them for cases where mistakes can cause real harm."
- "Comprehensive validation of [generic] LLMs is hard, and I would call it impossible... But comprehensive validation of RAG is possible."
- "We do not use LLMs as judge or jury... We need techniques, we need metrics that is very transparent, explainable, easy to understand."
- "The key that we're going to present today in model validation is really identification of model weakness and diagnostics with explainable diagnostics."
- "Fast thing is data science. Building it is a technology exercise."
1. On-demand courses at www.quantuniversity.com
The Use and Governance of Generative AI at Large US Financial Institutions
Oct 8th with Jacob Kosoff, Bank of America
1/2 day Workshop on AI & Investing
Looking forward to your participation!