LLMs in production: Lessons from the trenches

Dr. Uday Kamath, Chief Analytics Officer at Smarsh, presented a lecture at the QuantUniversity AI Fall School on November 11, 2024, titled "LLMs in Production." The presentation explored the practical aspects of deploying Large Language Models (LLMs) in real-world applications.

LLM Applications and Metrics

Dr. Kamath began by outlining various LLM applications, mapping them to relevant Natural Language Processing (NLP) tasks and corresponding evaluation metrics. He provided examples such as:

Conversational applications: Chatbots and AI assistants utilizing text generation, summarization, and dialogue management. Common metrics for these applications include BLEU, Perplexity, and human evaluation for naturalness and coherence.

Search and Information Retrieval: Search engines and knowledge base search utilizing information retrieval, semantic search, and summarization. Metrics like Precision, Recall, Mean Reciprocal Rank (MRR), and F1 Score are commonly used.

Content Creation: Social media content generation and marketing copywriting, employing text generation, paraphrasing, and summarization tasks. ROUGE, BLEU, BERTScore, and human evaluation for creativity and coherence are relevant metrics.

Coding Assistants: Tools like GitHub Copilot, using code generation, completion, error detection, and natural language understanding. BLEU and Code Execution Accuracy are used to evaluate these applications.

Translation and Multilingual Applications: Website translation and content localization, leveraging machine translation, language identification, and multilingual generation. BLEU, METEOR, and TER (Translation Edit Rate) are common metrics for this domain.

Document Analysis and Processing: Legal document review and financial report analysis utilizing summarization, document classification, and information extraction. ROUGE, F1 Score, Precision, and Recall are frequently used.

Sentiment and Intent Analysis: Social media sentiment tracking and customer feedback analysis employing sentiment analysis, intent detection, and text classification. Accuracy, F1 Score, Precision, and Recall are common evaluation metrics.

Question-Answering Systems: FAQ bots and educational tutoring systems relying on question answering, knowledge retrieval, and contextual reasoning. Exact Match (EM), F1 Score, and Mean Reciprocal Rank (MRR) are used to assess these systems; a sketch of EM and F1 follows this list.
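To make the question-answering metrics concrete, here is a minimal Python sketch of Exact Match and token-level F1 in the common SQuAD-style formulation. The normalization rules below are a widely used convention, not something specified in the presentation.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace
    (the usual SQuAD-style normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    overlapping tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))   # 1.0
print(round(token_f1("in Paris, France", "Paris"), 2))   # 0.5
```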

Categorization of Evaluation Metrics

Dr. Kamath discussed different ways to categorize LLM evaluation metrics:

With References vs. Without References: Metrics can compare model output to correct answers (e.g., BLEU, ROUGE) or assess fluency and coherence without a reference (e.g., Perplexity).

Character-based, Word-based, and Embedding-based: Metrics can focus on character-level correctness, n-gram overlap in words (e.g., BLEU), or semantic similarity using vector embeddings (e.g., BERTScore); a sketch of the embedding-based approach follows this list.

Human Evaluation vs. LLM Evaluation: Metrics can involve human judges assessing relevance, fluency, and coherence or use another model for automated evaluation.
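The embedding-based category can be illustrated with BERTScore-style greedy matching over token embeddings. The sketch below uses toy vectors purely for illustration; the real BERTScore uses contextual embeddings from a BERT-family model plus IDF weighting and baseline rescaling.

```python
import numpy as np

def greedy_match_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """BERTScore-style score: build a cosine-similarity matrix between
    candidate and reference token embeddings, greedily take the best
    match in each direction, then combine as an F1."""
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T                  # (n_cand, n_ref) cosine similarities
    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)

# Toy 4-dimensional "token embeddings" purely for illustration.
candidate = np.array([[0.9, 0.1, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.1]])
reference = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 0.9, 0.1, 0.0]])
print(round(greedy_match_f1(candidate, reference), 3))
```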

He then provided detailed explanations of commonly used metrics, including Perplexity, BLEU, ROUGE, BERTScore, and Pass@k, along with their formulas and pros and cons.
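Two of those formulas are compact enough to state in code. The sketch below gives the standard definitions from the literature: perplexity as the exponentiated average negative log-likelihood, and the unbiased pass@k estimator from Chen et al. (2021). It is a minimal illustration, not code from the presentation.

```python
import math

def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp(-(1/N) * sum_i log p(w_i | w_<i)).
    Lower is better; a perfectly confident model approaches 1."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    pass@k = 1 - C(n - c, k) / C(n, k),
    with n generated samples of which c pass the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# A 4-token sequence predicted confidently (log-probs near 0)
# yields perplexity close to 1.
print(round(perplexity([-0.1, -0.2, -0.05, -0.1]), 3))   # ~1.119
# 200 samples, 30 passing: chance that at least one of 10 passes.
print(round(pass_at_k(n=200, c=30, k=10), 3))
```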

LLM Selection Criteria

Dr. Kamath emphasized the importance of choosing the right LLM for production success. He highlighted key attributes to consider:

- Analytic Quality

- Inference Latency

- Total Cost of Ownership (TCO)

- Adaptability and Maintenance

- Data Security and Licensing

He compared open-source and closed-source models, discussing their advantages and disadvantages in terms of flexibility, cost, customization, ease of use, and adaptability.

Evaluation and Optimization

Dr. Kamath stressed the significance of evaluating LLMs to ensure they meet specific task requirements. He recommended using benchmarks like the HuggingFace Open LLM Leaderboard to compare model performance, complemented by domain-specific tests to confirm that performance aligns with the target application.

He also discussed inference latency and cost optimization, noting that model size, number of layers, and numeric precision all affect latency, and suggested reducing response length as one optimization. For TCO, he advised accounting for costs beyond tokens, such as setup, labor, and maintenance, and recommended tools like HuggingFace's TCO calculator.
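As a rough illustration of the "costs beyond tokens" point, the back-of-the-envelope sketch below folds per-token charges together with fixed monthly costs. All prices and volumes are made-up placeholders, not figures from the talk or from HuggingFace's calculator.

```python
def monthly_tco(requests_per_month: int,
                avg_input_tokens: int,
                avg_output_tokens: int,
                price_in_per_1k: float,
                price_out_per_1k: float,
                fixed_monthly: float) -> float:
    """Rough monthly total cost of ownership: per-token API charges
    plus fixed costs (setup amortization, labor, maintenance)."""
    token_cost = requests_per_month * (
        avg_input_tokens / 1000 * price_in_per_1k
        + avg_output_tokens / 1000 * price_out_per_1k)
    return token_cost + fixed_monthly

# Placeholder numbers purely for illustration.
cost = monthly_tco(requests_per_month=500_000,
                   avg_input_tokens=800,
                   avg_output_tokens=300,   # shorter responses shrink this term
                   price_in_per_1k=0.0005,
                   price_out_per_1k=0.0015,
                   fixed_monthly=12_000)    # setup, labor, maintenance
print(f"${cost:,.0f} per month")            # $12,425
```

At these placeholder volumes the fixed costs dominate the token charges, which is precisely why Dr. Kamath advised looking beyond per-token pricing.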

Adaptability, Maintenance, and Security

Dr. Kamath addressed the practical aspects of adaptability, maintenance, and security. He compared open-source and closed-source models in terms of adaptability and maintenance requirements. He cautioned about data privacy, especially with third-party models, and emphasized the importance of considering licensing aspects, including software licensing and data use restrictions. He specifically advised regulated industries to prioritize control over security and adaptability to ensure compliance and safeguard data integrity.

LLMOps

Dr. Kamath introduced LLMOps as an extension of MLOps, focusing on deploying and managing LLMs in production. He outlined key areas of LLMOps:

- Experiment Tracking

- Version Control

- Deployment and CI/CD

- Monitoring and Observability

He provided an example architecture for LLMOps, highlighting the use of prompt templates, adapter models, CI/CD pipelines, and production metrics for feedback and improvement.
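One building block of such an architecture, versioned prompt templates, can be sketched as follows. The class and registry below are hypothetical names illustrating the idea, not the API of any specific LLMOps tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class PromptTemplate:
    """A versioned prompt template; treating prompts like code under
    version control is a core LLMOps practice."""
    name: str
    version: str
    template: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def render(self, **kwargs: str) -> str:
        return self.template.format(**kwargs)

# A registry keyed by (name, version) lets a CI/CD pipeline pin the exact
# prompt a release was tested with, and production metrics can be logged
# against the same key to close the feedback loop.
registry: dict[tuple[str, str], PromptTemplate] = {}

summarize_v2 = PromptTemplate(
    name="summarize",
    version="2.0.1",
    template="Summarize the following document in {max_sentences} sentences:\n{document}")
registry[(summarize_v2.name, summarize_v2.version)] = summarize_v2

prompt = registry[("summarize", "2.0.1")].render(
    max_sentences="3", document="<document text here>")
print(prompt)
```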

Dr. Kamath's presentation provided valuable insights into the practical considerations and challenges of deploying LLMs in production environments. He covered a wide range of topics, from application-specific metrics to LLMOps best practices, equipping attendees with a comprehensive understanding of the LLM production landscape.


Slides and Video of the workshop

The slides and video from yesterday's workshop are available at www.qu.academy.

If you don't have a www.qu.academy account, register using the code "QUFallSchool24" to get access to the video and slides. If you already have an account, just log in and you will see this and all the other lectures from the QuantUniversity AI Fall School!


Join 5,000+ subscribers to QuantUniversity's weekly AI & Risk Management Newsletter to get valuable insights from academics, industry professionals, and thought leaders. You will also be alerted about the guest lecture series I host every week!

Yours truly

Sri Krishnamurthy, CFA, CAP

QuantUniversity
