Fine-tuning OpenAI Models on Azure, Leveraging Embeddings, and Integrating with Cognitive Search for Custom Q&A Solutions

As the technology of large language models continues to advance, businesses can benefit immensely from utilizing these tools to develop custom question-and-answer solutions tailored to their specific needs. One such example is creating your own ChatGPT-like system, such as Bloomberg GPT for finance, to harness the best of AI-powered conversational agents in a secure and controlled environment on Microsoft Azure. In this article, we will explore how to fine-tune OpenAI models on Azure, leverage text embeddings, and integrate with cognitive search to enhance the accuracy and relevance of these solutions.

Creating Your Own ChatGPT-like System

Developing a ChatGPT-like system involves training a base model of a large language model on a domain-specific dataset and fine-tuning it to understand and respond to user queries effectively. Fine-tuning not only improves the model's performance but also ensures that it aligns with your company's data and requirements.
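To make the fine-tuning step concrete, here is a minimal sketch of preparing domain-specific training examples in the JSONL chat format that OpenAI fine-tuning expects. The Q&A pairs and system message are purely illustrative; in practice the resulting file would be uploaded to Azure OpenAI and referenced when creating a fine-tuning job.

```python
import json

# Illustrative, hypothetical domain-specific Q&A pairs.
examples = [
    ("What is our refund window?",
     "Refunds are accepted within 30 days of purchase."),
    ("Who approves travel expenses?",
     "Travel expenses are approved by the department head."),
]

def to_jsonl(pairs):
    """Serialize Q&A pairs as JSONL: one JSON object per line,
    each in the chat-message format used for fine-tuning."""
    lines = []
    for question, answer in pairs:
        record = {
            "messages": [
                {"role": "system",
                 "content": "You are a helpful company Q&A assistant."},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

jsonl_text = to_jsonl(examples)
print(jsonl_text.splitlines()[0][:60])
```

Each line becomes one training example; the quality and labeling effort behind these pairs is exactly the "labeling data" cost discussed in the cons below.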


Pros and Cons of Fine-tuning:

Pros:

  1. Improved relevance: Fine-tuning ensures the model is tailored to your domain-specific needs, making it more relevant to your company's data.
  2. Enhanced performance: It enables the model to perform better on specific tasks and understand the nuances of your company's information.
  3. Reduced response time: A fine-tuned model can provide faster and more accurate responses, improving overall user experience.


Cons:

  1. Labeling training data takes effort, and whenever the base model is updated, a new model may need to be fine-tuned; the same applies when important new documentation is created.
  2. Anyone with access to the model can query all the data it was fine-tuned on, so there is no way to differentiate information access between users.
  3. Traceability of where an answer comes from, as well as factual correctness, can be an issue.
  4. Overfitting: Overfitting is a potential issue when fine-tuning a model, as it may perform exceptionally well on the training data but fail to generalize to new, unseen data.
  5. Maintenance and monitoring: Keeping a fine-tuned model updated requires constant monitoring, additional resources, and regular updates to maintain optimal performance.
  6. Cost: Fine-tuning requires additional computational resources and can be expensive, especially for large-scale models and extensive datasets.

Leveraging Embeddings for Enhanced Q&A Solutions

Text embeddings are numerical (vector) representations of text that capture the relatedness of text strings. They can be used to measure how similar strings are, classify them, and even identify outliers.

By incorporating embeddings into your fine-tuned OpenAI model, you can improve its ability to find accurate and relevant answers to questions within your company-specific data, creating your own custom Q&A ChatGPT.
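The "relatedness" idea can be sketched with cosine similarity over embedding vectors. The tiny 4-dimensional vectors and document names below are made up for illustration; real embedding models (such as text-embedding-ada-002) return much higher-dimensional vectors from an API call.

```python
import math

# Toy 4-dimensional embeddings; real models produce e.g. 1536 dimensions.
doc_embeddings = {
    "pricing policy": [0.9, 0.1, 0.0, 0.1],
    "holiday schedule": [0.1, 0.8, 0.2, 0.0],
    "security guidelines": [0.0, 0.1, 0.9, 0.2],
}
# Hypothetical embedding of the query "how much does it cost?"
query_embedding = [0.85, 0.15, 0.05, 0.1]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Rank documents by relatedness to the query.
ranked = sorted(
    doc_embeddings,
    key=lambda name: cosine_similarity(query_embedding, doc_embeddings[name]),
    reverse=True,
)
print(ranked[0])  # the most related document
```

The same ranking step is what a vector-enabled search index performs at scale when retrieving company documents for a Q&A prompt.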

Pros and Cons of Embeddings

Pros:

  1. Improved similarity measurement: Embeddings can effectively capture semantic relationships between text strings, enabling accurate similarity measurements.
  2. Scalability: Text embeddings can be used with large datasets, making them suitable for enterprise-level applications.
  3. Versatility: Embeddings can be used for a variety of tasks, including clustering, recommendations, and anomaly detection.

Cons:

  1. Limited interpretability: The vector representations generated by embeddings are often difficult to interpret, as they exist in high-dimensional spaces.
  2. Sensitivity to input: Embeddings are sensitive to the quality and nature of the input data, which means that they can be susceptible to issues like bias and noise.
  3. Latency: computing and searching embeddings adds a step to the pipeline, which can slow response times.

Integrating with Cognitive Search

Cognitive search is a powerful tool that can be integrated with your fine-tuned model and embeddings to provide a comprehensive and tailored Q&A solution.


When combined with OpenAI embeddings, it offers an enhanced search experience across Azure applications. By utilizing embeddings from OpenAI models, you can harness the power of advanced natural language understanding to improve search relevance, cluster similar content, and offer personalized recommendations. This integration enables more intuitive, intelligent natural-language search within web, mobile, and enterprise applications on the Azure platform, including the possibility to search in databases.

By incorporating cognitive search, you can enable intelligent search capabilities, such as understanding natural language queries, providing personalized recommendations, and offering context-aware responses.
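The retrieval-augmented pattern behind this integration can be sketched as follows: passages returned by a search query are stitched into a grounded prompt before the model is called. The passages and function name are illustrative; in practice they would come from an Azure Cognitive Search query, and the prompt would be sent to an Azure OpenAI chat completion endpoint.

```python
# Hypothetical passages returned by an Azure Cognitive Search query.
retrieved_passages = [
    "Support tickets are answered within 24 hours on business days.",
    "Premium customers can reach support by phone at any time.",
]

def build_grounded_prompt(question, passages):
    """Assemble a prompt that instructs the model to answer only from
    the retrieved context, which aids traceability and factual grounding."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_grounded_prompt("How fast is support?", retrieved_passages)
print(prompt)
```

Because the answer is constrained to retrieved documents, this approach also addresses the traceability and access-control drawbacks of fine-tuning listed above: search results can be filtered per user before they ever reach the prompt.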

In conclusion, by fine-tuning OpenAI models on Azure, leveraging text embeddings, and integrating with cognitive search, you can develop a custom and powerful Q&A solution tailored to your company's specific needs. This approach ensures a secure and efficient way to harness the advancements in large language models while addressing the unique challenges your business faces.

References:

  1. Azure Machine Learning documentation
  2. OpenAI fine-tuning guide
  3. OpenAI API documentation
  4. Microsoft OpenAI Technical review
  5. Bloomberg GPT: https://arxiv.org/pdf/2303.17564.pdf

Mahtab Syed

Data and AI Leader | AI Solutions | Cloud Architecture(Azure, GCP, AWS) | Data Engineering, Generative AI, Artificial Intelligence, Machine Learning and MLOps Programs | Coding and Kaggle


Thanks Emilie Lundblad for this article. I am curious and I have 6 questions. From the Azure OpenAI Service perspective, as of Aug 2023 there are 2 main ways of getting a customized Azure OpenAI ChatGPT service:

1. Fine-tuning the base Azure OpenAI model
Q1 - Is this fine-tuned model on proprietary data stored in the Azure OpenAI tenant? I guess yes.
Q2 - Is there a way to apply RBAC on this model to control who has access to which data in the model? I guess no.

2. Integrating with Cognitive Search ("on your data") with augmented prompts
Q3 - I guess the customer data which is in Cognitive Search is stored in the customer's Azure tenant and is joined with the base model in the Azure OpenAI tenant to get an answer. Looks like a more secure way. I think yes.
Q4 - Using RBAC in Cognitive Search, can we restrict access based on role? I guess yes.
Q5 - Can we extract other context / user-related data from other DBs? I guess yes.
Q6 - As the custom data grows in size, which solution has lower latency: 1. fine-tuning or 2. Cognitive Search? I think this needs to be answered based on other design decisions like cost and security.

Referring to this article and solution designs: https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy

Naomi Little

Digital Consultant | Head of Digital | Creative Consultant | Data Analyst | Founder & Editor in Chief at PLEB Magazine


Great article Emilie Lundblad and very timely. One thing I see is that companies are not aware of the amount of data needed to run OpenAi models at present on their own instance. Plus the cost involved can become quite high, when using computing power, combined with the necessary hardware and the cost of using OpenAi itself. Have you found a way to combat this in the short term?

Flemming Hansen

Customer Centricity | Business Architecture | Management | Business Process Management | Business Development | Banking | Key Note Speaker


Thx Emilie, interesting with the BloombergGPT!

Christian Born Djurhuus

Digital Transformation of Life Sciences | Innovation Management | Digitalisation of Clinical Development | Strategic Execution | Fusing medicine and technology (DTx-SaMD) | Advisor | Exec. Coach | Mentor


Thanks for sharing Emilie Lundblad
