ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search - Part 3 - Connect to SharePoint Document Library
Warning - Long Thread Ahead......
Introduction
Microsoft has introduced an exciting new feature in ChatGPT Playground, enabling users to integrate their own data and deploy it as a web app effortlessly. This feature complements the concept of the ChatGPT with Enterprise Data app, as discussed in our previous blog, eliminating the need for DevOps steps.
The deployment process has been simplified significantly, requiring only a few clicks, provided that the data is appropriately prepared. As shown below, three data sources are currently supported.
In this tutorial, we will walk you through the process of creating a SharePoint Index in Azure Cognitive Search and deploying the web app.
Follow along with my video for a step-by-step guide.
Prepare your SharePoint Online Document Libraries
Azure Cognitive Search can only index SharePoint document libraries; .aspx pages are not supported. If you have a lot of content on SharePoint pages, you could look into Power Virtual Agents (PVA) Generative Answers.
It's important to be aware that the PVA solution also has certain limitations at this stage, which we talked about here.
To prepare the SharePoint site, note down the home site URL and upload some files in the document library.
SharePoint Indexer supports a large list of file formats, but only for Document Libraries. There are also other limitations you need to consider when designing a solution.
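Since the indexer configuration later in this post lists file extensions explicitly (indexedFileNameExtensions), it can help to take stock of what is actually in the library first. The following is a minimal sketch, assuming you already have a plain list of file names; the helper names and the sample file names are made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_inventory(file_names):
    """Count lower-cased file extensions; files without one are grouped under ''."""
    return Counter(PurePosixPath(name).suffix.lower() for name in file_names)


def to_extension_setting(extensions):
    """Join extensions into a comma-separated string such as '.docx, .pdf'."""
    return ", ".join(sorted(ext for ext in extensions if ext))


# Hypothetical file names, standing in for a real document library listing.
sample_files = ["Policy.PDF", "Handbook.docx", "logo.png", "Notes.docx", "README"]
counts = extension_inventory(sample_files)
print(counts)
print(to_extension_setting(counts))
```

The second helper only formats the string; deciding which extensions belong in the included list and which in the excluded list is still up to you.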
Configure the SharePoint Indexer
First, we are going to create a SharePoint search indexer following this MS article. An indexer in Azure Cognitive Search is a crawler that extracts searchable data and metadata from a data source.
Step One: Create Azure Cognitive Search Service
2. Note down the URL and admin key; you will need them for the REST API calls later.
Step Two: Enable System Assigned Identity
Once created, go to the resource and enable the system assigned identity if your Azure and SPO are in the same tenant.
Step Three: Create an AAD application for Indexer to use for Authentication
2. Under Authentication, set Allow public client flows to Yes, then select Save.
3. Select + Add a platform, then Mobile and desktop applications, then check https://login.microsoftonline.com/common/oauth2/nativeclient, then Configure.
Step Four: Create Data Source with the Azure Cognitive Search REST API
In Azure Cognitive Search, a data source is used with indexers, providing the connection information for on-demand or scheduled data refresh of a target index, pulling data from supported Azure data sources.
POST https://acsspsearch.search.windows.net/datasources?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: [admin key]
{
    "name" : "sharepoint-datasource",
    "type" : "sharepoint",
    "credentials" : { "connectionString" : "[connection-string]" },
    "container" : { "name" : "defaultSiteLibrary", "query" : null }
}
2. For delegated API permissions, use the following connection string format. Remove the square brackets when you fill in your own values.
SharePointOnlineEndpoint=[SharePoint site url];ApplicationId=[Azure AD App ID]
3. Here is how I did it through Postman.
If successful, you will get a 201 Created message.
4. You should also find the newly created data source in Azure.
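As an alternative to Postman, the same Create Data Source request can be prepared from Python. This is a minimal sketch using only the standard library. The helper names `build_connection_string` and `build_datasource_request`, the contoso site URL, and the all-zero application ID are placeholders of mine, not values from this walkthrough.

```python
import json
import urllib.request

API_VERSION = "2023-07-01-Preview"  # same preview version used in this article


def build_connection_string(site_url, app_id):
    """Delegated-permissions connection string, in the format shown above."""
    return f"SharePointOnlineEndpoint={site_url};ApplicationId={app_id}"


def build_datasource_request(service_url, admin_key, site_url, app_id):
    """Prepare (but do not send) the Create Data Source request."""
    body = {
        "name": "sharepoint-datasource",
        "type": "sharepoint",
        "credentials": {"connectionString": build_connection_string(site_url, app_id)},
        "container": {"name": "defaultSiteLibrary", "query": None},
    }
    return urllib.request.Request(
        url=f"{service_url.rstrip('/')}/datasources?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Placeholder values (hypothetical); substitute your own service URL, key, site, and app ID.
request = build_datasource_request(
    "https://acsspsearch.search.windows.net",
    "<admin-key>",
    "https://contoso.sharepoint.com/sites/demo",
    "00000000-0000-0000-0000-000000000000",
)
print(request.get_method(), request.full_url)
# To actually send it, pass the request to urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything touches the service.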
Step Five: Create an Index
The index specifies the fields in a document, attributes, and other constructs that shape the search experience.
POST https://acsspsearch.search.windows.net/indexes?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: [admin key]
{
    "name" : "sharepoint-index",
    "fields": [
        { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
        { "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
        { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
        { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
    ]
}
2. You should receive a 201 Created response.
3. You should also be able to see it in Azure.
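If you adapt the index definition to your own fields, a quick local check before posting can save a round trip. The sketch below is my own helper, not part of any SDK. It checks two rules that, as far as I know, Azure Cognitive Search enforces: field names must be unique, and there must be exactly one key field of type Edm.String. The index shown is an abbreviated copy of the one above.

```python
def validate_index(index):
    """Return a list of problems found in an index definition (empty if none)."""
    problems = []
    names = [f["name"] for f in index["fields"]]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append(f"duplicate field names: {duplicates}")
    keys = [f for f in index["fields"] if f.get("key")]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")
    elif keys[0]["type"] != "Edm.String":
        problems.append("the key field must be of type Edm.String")
    return problems


# Abbreviated copy of the index above: the key plus two of the other fields.
index = {
    "name": "sharepoint-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "metadata_spo_item_name", "type": "Edm.String", "key": False, "searchable": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
    ],
}
print(validate_index(index))  # an empty list means no problems were found
```

The service performs its own validation on submission, so treat this only as an early warning for the most common mistakes.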
Step Six: Create an Indexer
An indexer connects a data source with a target search index and provides a schedule to automate the data refresh. Once the index and data source have been created, you're ready to create the indexer.
POST https://acsspsearch.search.windows.net/indexers?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: [admin key]
{
    "name" : "sharepoint-indexer",
    "dataSourceName" : "sharepoint-datasource",
    "targetIndexName" : "sharepoint-index",
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null,
        "base64EncodeKeys": null,
        "configuration": {
            "indexedFileNameExtensions" : ".pdf, .docx",
            "excludedFileNameExtensions" : ".png, .jpg",
            "dataToExtract": "contentAndMetadata"
        }
    },
    "schedule" : { },
    "fieldMappings" : [
        {
            "sourceFieldName" : "metadata_spo_site_library_item_id",
            "targetFieldName" : "id",
            "mappingFunction" : {
                "name" : "base64Encode"
            }
        }
    ]
}
2. Second, get the Indexer Status. When creating the indexer for the first time, the Create Indexer request will remain waiting until you complete the next steps.
Note the indexedFileNameExtensions and excludedFileNameExtensions settings. Make sure indexedFileNameExtensions includes the file extensions in your document libraries.
3. Send a GET Indexer command. The rest of the steps have to be done within 10 minutes.
GET https://acsspsearch.search.windows.net/indexers/sharepoint-indexer/status?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: [admin key]
4. Look for the error message in the Get Indexer Status response.
5. Use the device login code in the error message to sign in.
6. The SharePoint indexer will access the SharePoint content as the signed-in user. Make sure the user has the correct permission to the SharePoint site you are indexing.
7. If you check back on the Create Indexer POST, it should now show 201 Created.
8. If you check back on the Get Indexer Status, you should see a 200 OK.
9. Back in Cognitive Search, you can see the Indexer even with some files indexed.
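Because the sign-in has to happen within 10 minutes, it can be handy to pull the code out of the Get Indexer Status response automatically instead of reading the JSON by eye. The sketch below is an assumption-laden illustration: the sample response and the wording of its error message are hypothetical, and the regular expression only works if the real message contains the phrase "enter the code ... to authenticate". The helper walks every string in the response, so it does not depend on where the message sits.

```python
import re

# Assumed wording of the sign-in prompt; adjust if your error message differs.
DEVICE_CODE_PATTERN = re.compile(r"enter the code (\S+) to authenticate", re.IGNORECASE)


def iter_strings(node):
    """Yield every string value found anywhere in a decoded JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def find_device_code(status):
    """Return the first device login code found in the response, or None."""
    for text in iter_strings(status):
        match = DEVICE_CODE_PATTERN.search(text)
        if match:
            return match.group(1)
    return None


# Hypothetical response fragment; the real wording and layout may differ.
sample_status = {
    "lastResult": {
        "errorMessage": "To sign in, use a web browser to open the page "
                        "https://microsoft.com/devicelogin and enter the code "
                        "ABCD1234 to authenticate.",
    },
}
print(find_device_code(sample_status))
```

If the function returns None, fall back to reading the error message manually as described in the steps above.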
We have now successfully created a SharePoint search indexer for Azure Cognitive Search.
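Before moving on to the playground, you may want to confirm that the index actually returns documents. The following is a minimal sketch that prepares a Search Documents request against the index created earlier, again using only the Python standard library. The function name `build_search_request`, the query text, and the choice of selected fields are my own; the request is built but not sent.

```python
import json
import urllib.request

API_VERSION = "2023-07-01-Preview"  # same preview version used in this article


def build_search_request(service_url, api_key, index_name, query, top=5):
    """Prepare (but do not send) a full-text query against a search index."""
    body = {
        "search": query,
        "select": "metadata_spo_item_name,metadata_spo_item_path",
        "top": top,
        "count": True,
    }
    url = (f"{service_url.rstrip('/')}/indexes/{index_name}"
           f"/docs/search?api-version={API_VERSION}")
    return urllib.request.Request(
        url=url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder key and query text (hypothetical).
search_request = build_search_request(
    "https://acsspsearch.search.windows.net", "<admin-key>",
    "sharepoint-index", "vacation policy",
)
print(search_request.get_method(), search_request.full_url)
# To run the query: json.load(urllib.request.urlopen(search_request))
```

An empty result for a term you know is in a document is a hint to recheck the indexer run and the file extension settings.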
Use Azure OpenAI Studio
First, let's try the OpenAI Playground to test our app with the SharePoint Indexer.
Azure AI Studio
2. Go to the Chat playground, select Add your data, then Add a data source.
3. Select the search service created in the previous steps.
4. I left everything as content here. The other option is meta_data, but the final results were poor.
5. I skipped Data management and clicked Save.
6. Back at the Chat, limit the response to your data and start chatting.
7. Now if you are happy with the product, you can Deploy to a new web app.
8. Fill in the subscription and resources.
9. It will take a couple of minutes to deploy the app.
10. Once you launch the app, it might present you with the following screen. We did not configure authentication for the demo app. Just wait another 10 minutes and refresh the window.
11. Remember this app? We have started with a few local files in the original demo and now iterated to support SharePoint Online Document Libraries, pretty cool huh?!
Limitations:
The SharePoint Index solution only supports Document Library, meaning it will not index the content from your SharePoint pages.
Also, in my own testing, the app was only able to answer questions from the Word documents in the document library.
MS support has told me that combining SharePoint Index with the Azure OpenAI App is not officially supported. As you can see from the SharePoint Index doc, there has been little progress over the years, and it is offered as is. It only has REST API access, and there is no guarantee this solution will make it into production.
The performance is not comparable to the ChatGPT app we built with LangChain in my other blog.
However, Microsoft has also just released vector data support in Azure Cognitive Search, which I am looking forward to testing.
Technical Specialist @ VersaFile | Document Intelligent and Generative AI Solutions | Passionate learner | Microsoft Power Platform | Microsoft Syntex | IBM Filenet
1 year ago: Hi Leo. This blog is awesome. Thanks for sharing. I wanted to know, if it's possible to apply filters in the DataSource or in the indexer. I mean, to only bring documents with an specific value in a SharePoint Column? Thanks again.
Solution Architect at Capgemini
1 year ago: Did you ever manage to overcome the issues around the App struggling to query documents in formats other than Word?
Software Engineer
1 year ago: Thanks for the article Leo Wang. A month ago I was able to create an indexer using your article. However, today I'm unable to do the same. Step 6.7 never completes, even though I complete the auth step before with a user with access. Does this approach still work?
Microsoft FastTrack Recognized Solution Architect Dynamics 365 CE | Power Platform Specialist
1 year ago: Thank you Leo Wang Before I struggled with setting up the indexer and always got a permission error. Your article gave me the hint, that my datasource wasn't properly configured. Now it works.
Financial Systems at Havas
1 year ago: Hi Leo I see we're able to add one data source index, but what if there is data in multiple indexes, are you aware of any way to add multiple data sources / indexes into the chat playground?