Building a RAG Solution using Azure AI Studio - Part 2
Azure AI Search


Many companies and employees have been using Large Language Models (LLMs) such as ChatGPT to help them with their work. These models help summarize and analyze documents, generate emails and articles, create presentations, summarize meetings, and write code.

However, a question arises: how can I use LLMs securely and efficiently so that they can answer questions based on my company's confidential data? The answer is Retrieval Augmented Generation (RAG), built with Azure AI Studio and Azure OpenAI.

This will be a series of articles on how to build a RAG Solution using Azure AI Studio.

Part 1 of the series was covered in this article: Building a RAG Solution using Azure AI Studio - Part 1 | LinkedIn

This article focuses on document chunking and saving the text embedding of the documents in a vector database - steps 1 and 2 of how RAG works as explained in Part 1.


Definitions

Text embedding is the process of representing text as a real-valued vector that encodes its meaning. Computers love numbers, but they struggle with raw text, so we transform words into these numeric vectors using an embedding model. The model varies based on the LLM we intend to use. In our case, we will be using text-embedding-ada-002, one of the embedding models offered by OpenAI.

Words are converted into Text/Vector Embeddings
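To make the idea concrete, here is a minimal sketch of how similarity between embeddings is measured. The tiny 3-number vectors below are made up purely for illustration; real text-embedding-ada-002 vectors have 1,536 numbers.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    # Values close to 1.0 mean the two texts have similar meaning.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (real ada-002 vectors have 1,536 dimensions).
cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
invoice = [0.1, 0.9, 0.8]

print(cosine_similarity(cat, kitten))   # high: similar meaning
print(cosine_similarity(cat, invoice))  # low: unrelated meaning
```

This is the same comparison a vector database performs at query time: it embeds your question and returns the stored chunks whose vectors are closest to it.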


Document Chunking is important because it splits very large data into smaller pieces. The documents we are talking about here are your company documents such as contracts, images, financials, agreements, product manuals, sales manuals, etc. Some of your documents will be very large, but an embedding model can only handle a limited amount of text at a time: text-embedding-ada-002 accepts a bounded number of input tokens and always outputs a single 1536-dimension vector. Thus, large documents must be chunked into smaller pieces so that each piece can be embedded on its own. If document chunking is not done and one of your documents is too large, a lot of context would be lost by squeezing it into a single vector.

In layman's terms, imagine trying to summarize a 1000-page document into one page versus summarizing 10 pages into one page. The latter retains far more context, while the former loses a lot of information.
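A minimal sketch of fixed-size chunking with overlap (character-based; the sizes here are arbitrary examples, and production systems typically split on tokens or sentences instead):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping fixed-size chunks.

    Overlap keeps a little shared context between neighbouring chunks,
    so a sentence cut at a boundary is still visible in the next chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

document = "word " * 200          # stand-in for a large document (1000 chars)
pieces = chunk_text(document, chunk_size=200, overlap=50)
print(len(pieces), "chunks")      # each small enough to embed on its own
```

Each chunk is then embedded separately, so no single piece exceeds what the embedding model can process.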


Choose what storage solution to use

Going back to our RAG solution: documents must be chunked (large documents broken into smaller pieces) and transformed into text embeddings (vector form) using an embedding model. The embeddings are then stored in a vector database for storage and retrieval. The question is, which storage solution should we use?

The answer depends on whether your data is structured or unstructured. For structured data, my recommendation is Azure Cosmos DB for MongoDB vCore; for unstructured data, we will use Azure AI Search.

In this tutorial, we will use Azure AI Search, as we are assuming your data consists of unstructured files such as Word documents, contracts, PDFs, etc.


Azure AI Search

Azure AI Search is an AI-powered information retrieval platform that helps developers build rich search experiences and generative AI apps. It integrates with Azure storage, Azure OpenAI Service, and other Azure AI services to provide semantic, vector, and hybrid search capabilities.



The beauty of Azure AI Search is that it does the document chunking, the text embedding vectorization (performed during document ingestion), and the vector storage for you.

With just a few clicks, it performs all of these operations for you! To see how this is done in more detail, check this link: https://learn.microsoft.com/en-us/azure/search/search-get-started-portal-import-vectors?wt.mc_id=MVP_322781
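Under the hood, the import wizard creates an indexer pipeline whose chunking is handled by a Text Split skill in a skillset. A rough sketch of what that skill definition looks like is below; the page length and overlap values are assumptions for illustration, not necessarily the wizard's actual defaults.

```python
import json

# Hypothetical sketch of the Text Split skill that the wizard generates
# inside the skillset; the numeric values are illustrative assumptions.
split_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "textSplitMode": "pages",        # split text into overlapping "pages"
    "maximumPageLength": 2000,       # characters per chunk (assumed value)
    "pageOverlapLength": 500,        # shared context between chunks (assumed)
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "textItems", "targetName": "pages"}],
}

print(json.dumps(split_skill, indent=2))
```

Each resulting page is then sent to the embedding model and written to the index as its own document, which is why the document count grows after import.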


Vectorizing your data steps

The following are the steps to upload your data into Azure AI Search:

  • Create a text embedding model deployment. Recall that we will be using text-embedding-ada-002. This assumes you already have access to Azure OpenAI; the application form is linked in the first article.

Create an Azure OpenAI Resource
Create Text Embedding ADA 002

  • Upload your files to a Storage Account. You will need to create a storage account, create a container (essentially a folder), and upload your files.

Create a Storage Account
Create a Container (products is a sample)


Upload your files


  • Create an Azure AI Search Resource. You can use the Free Pricing tier.

Create an Azure AI Search Service

  • Go to your Azure AI Search resource, then click Import and vectorize data. Follow the directions shown in the photos below.

Import and vectorize data option
Choose the storage you created earlier
Choose your Azure OpenAI Service and Text Embedding Model
Name your Vector Index

  • Verify that the index contains the Vector Column. Notice as well that your document count will increase, because Azure AI Search did the document chunking for you. In my sample, I had 20 product manuals and the system chunked them into 163 documents.

Check your Vector Column (1536 length)
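Once the index is populated, you can test retrieval with a vector query. The sketch below builds the REST request body for an Azure AI Search vector search; the field names (`text_vector`, `title`, `chunk`) are assumptions based on what the wizard typically generates, and a real query sends the full 1,536-number embedding of the user's question rather than the shortened placeholder shown here.

```python
import json

# Embedding of the user's question, produced by text-embedding-ada-002.
# Shortened here for readability; a real vector has 1,536 numbers.
question_embedding = [0.0123, -0.0456, 0.0789]

query_body = {
    "count": True,
    "select": "title, chunk",          # fields to return (names are assumed)
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": question_embedding,
            "fields": "text_vector",   # the vector column created by the wizard
            "k": 3,                    # return the 3 nearest chunks
        }
    ],
}

# POST this body to:
# https://<service>.search.windows.net/indexes/<index>/docs/search?api-version=2023-11-01
print(json.dumps(query_body, indent=2))
```

The `k` nearest chunks returned by this query are exactly what a RAG application passes to the LLM as grounding context.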

Summary

In this article, we discussed how document chunking and text embeddings are important in RAG. We then did a demo on how to import and vectorize your data using Azure AI Search.


Next Step

Once you have completed the tasks above, we will proceed to connect this data to Azure AI Studio.

