ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search - Part 3 - Connect to SharePoint Document Library
Artwork by Midjourney - Knolling

ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search - Part 3 - Connect to SharePoint Document Library

Warning - Long Thread Ahead......

Introduction

Microsoft has introduced an exciting new feature in ChatGPT Playground, enabling users to integrate their own data and deploy it as a web app effortlessly. This feature complements the concept of the ChatGPT with Enterprise Data app, as discussed in our previous blog, eliminating the need for DevOps steps.

The deployment process has been simplified significantly, requiring only a few clicks, provided that the data is appropriately prepared. As shown below, three data sources are currently supported.

In this tutorial, we will walk you through the process of creating a SharePoint Index in Azure Cognitive Search and deploying the web app.

No alt text provided for this image

Follow along my video for a step-by-step guide.

Prepare your SharePoint Online Document Libraries

In Azure Cognitive Search, we could only index SharePoint Document Library. The .aspx is not supported. If you have loads of content on SharePoint pages, you could look into the PVA's Generative Answers.

No alt text provided for this image
Power Virtual Agent Generative Answers - Supported Content

It's important to be aware that the PVA solution also has certain limitations at this stage, which we talked about here.

To prepare the SharePoint site, note down the home site URL and upload some files in the document library.

No alt text provided for this image

SharePoint Indexer supports a large list of file formats, but only for Document Libraries. There are also other limitations you need to consider when designing a solution.

No alt text provided for this image

Configure the SharePoint Indexer

First, we are going to create a SharePoint search indexer following this MS article. An indexer in Azure Cognitive Search is a crawler that extracts searchable data and metadata from a data source.

Step One: Create Azure Cognitive Search Service

  1. In Azure Cognitive Search, create a search service. Do not use the free tier.

No alt text provided for this image

2. You want to note down the Url and Admin Key for the REST API Call later.

No alt text provided for this image
No alt text provided for this image

Step Two: Enable System Assigned Identity

Once created, go to the resource and enable the system assigned identity if your Azure and SPO are in the same tenant.

No alt text provided for this image

Step Three: Create an AAD application for Indexer to use for Authentication

  1. Create an app in App Registration, leave it as single tenant. Add the following Delegated API permissions. Grant admin consent. Note down the App ID.

No alt text provided for this image

2. Under Authentication, Set?Allow public client flows?to?Yes?then select?Save

No alt text provided for this image

3. Select?+ Add a platform, then?Mobile and desktop applications, then check?https://login.microsoftonline.com/common/oauth2/nativeclient, then?Configure.

No alt text provided for this image

Step Four:?Create Data Source with the Azure Cognitive Search REST API

In Azure Cognitive Search, a data source is used with?indexers, providing the connection information for on demand or scheduled data refresh of a target index, pulling data from?supported Azure data sources.

  1. To create a data source, call?Create Data Source?using preview API version?2020-06-30-Preview?or later.

POST https://acsspsearch.search.windows.net/datasources?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: [admin key]

{
? ? "name" : "sharepoint-datasource",
? ? "type" : "sharepoint",
? ? "credentials" : { "connectionString" : "[connection-string]" },
? ? "container" : { "name" : "defaultSiteLibrary", "query" : null }
}        

2. And for Delegated API permissions connection string format. Remove the square brackets.

SharePointOnlineEndpoint=[SharePoint site url];ApplicationId=[Azure AD App ID]        

3. Here is how I did it through Postman.

No alt text provided for this image

If successful, you will get a 201 Created message.

4. You should also find the newly created data source in Azure.

No alt text provided for this image

Step Five: Create an Index

The index specifies the fields in a document, attributes, and other constructs that shape the search experience.

  1. Use the following in Postman to create an Index. You will need the same Url and Admin key. The endpiont is "indexes" instead of "datasources".

POST https://acsspsearch.search.windows.net/indexes?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: [admin key]

{
? ? "name" : "sharepoint-index",
? ? "fields": [
? ? ? ? { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
? ? ? ? { "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
? ? ? ? { "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
? ? ? ? { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
? ? ? ? { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
? ? ? ? { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
? ? ? ? { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
? ? ]
}        

2. You should receive a 201 success.

No alt text provided for this image

3. You should also be able to see it in Azure.

No alt text provided for this image

Step Six: Create an Indexer

An indexer connects a data source with a target search index and provides a schedule to automate the data refresh. Once the index and data source have been created, you're ready to create the indexer.

  1. First, send a Creat Indexer request.

POST https://acsspsearch.search.windows.net/indexers?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: [admin key]

{
? ? "name" : "sharepoint-indexer",
? ? "dataSourceName" : "sharepoint-datasource",
? ? "targetIndexName" : "sharepoint-index",
? ? "parameters": {
? ? "batchSize": null,
? ? "maxFailedItems": null,
? ? "maxFailedItemsPerBatch": null,
? ? "base64EncodeKeys": null,
? ? "configuration": {
? ? ? ? "indexedFileNameExtensions" : ".pdf, .docx",
? ? ? ? "excludedFileNameExtensions" : ".png, .jpg",
? ? ? ? "dataToExtract": "contentAndMetadata"
? ? ? }
? ? },
? ? "schedule" : { },
? ? "fieldMappings" : [
? ? ? ? {?
? ? ? ? ? "sourceFieldName" : "metadata_spo_site_library_item_id",?
? ? ? ? ? "targetFieldName" : "id",?
? ? ? ? ? "mappingFunction" : {?
? ? ? ? ? ? "name" : "base64Encode"?
? ? ? ? ? }?
? ? ? ? ?}
? ? ]
}        

2. Second, get the Indexer Status. When creating the indexer for the first time, the?Create Indexer?request will remain waiting until your complete the next steps.?

No alt text provided for this image
Note the indexedFileNameExtensions and excludeFileNameExtensions. Make sure it includes the file extensions in your document libraries.

3. Send a GET Indexer command. The rest of the steps have to be done within 10 minutes.

GET https://acsspsearch.search.windows.net/indexers/sharepoint-indexer/status?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: [admin key]        

4. Look for the error message in the Get Indexer Status response.

No alt text provided for this image

5. Use the machine code in the error message to sign in.

No alt text provided for this image

6. The SharePoint indexer will access the SharePoint content as the signed-in user. Make sure the user has the correct permission to the SharePoint site you are indexing.

7. If you check back on Create Indexer Post, it shall display 201 Created.

No alt text provided for this image

8. If you check back on the Get Indexer Status, you should see a 200 OK.

No alt text provided for this image

9. Back in Cognitive Search, you can see the Indexer even with some files indexed.

No alt text provided for this image

We have now successfully created a SharePoint search indexer for Azure Cognitive Search.

User Azure OpenAI Studio

First, let's try the OpenAI Playground to test our app with the SharePoint Indexer.

Azure AI Studio

  1. Create a new deployment.

No alt text provided for this image

2. Go the Playground - Chat, select Add you data and Add a data source.

No alt text provided for this image

3. Select the search service created from previous step.

No alt text provided for this image

4. I left everything as content here. The other option is meta_data but the final result was poor.

No alt text provided for this image

5. I skipped Data management and click save.

No alt text provided for this image

6. Back at the Chat, limit the response to your data and start chatting.

No alt text provided for this image

7. Now if you are happy with the product, you can Deploy to a new web app.

No alt text provided for this image

8. Fill in the subscription and resources.

No alt text provided for this image

9. It will take a couple of minutes to deploy the app.

No alt text provided for this image

10. Once you launch the app, it might present you with the following screen. We did not configure the authentication for the demo app. Just wait another 10 mins and refresh the window.

No alt text provided for this image

11. Remember this app? We have started with a few local files in the original demo and now iterated to support SharePoint Online Document Libraries, pretty cool huh?!

No alt text provided for this image

Limitations:

The SharePoint Index solution only supports Document Library, meaning it will not index the content from your SharePoint pages.

Also, regarding the document libraries, in my actually test result, was only able to answer questions from the Word documents.

MS support has told me that combining SharePoint Index with the Azure OpenAI App is not officially supported. As you can see from the SharePoint Index doc, there is little progress over the years, and it is offered as it is. It only has REST API access and there is no guarantee this solution will make it into Production.

The performance is not comparable to the ChatGPT app we built over Langchain in my other blog.

However, Microsoft also just released Vector data support in Azure Cognitive Search, which I am looking forward to test.

Andres Felipe Noguera E

Technical Specialist @ VersaFile | Document Intelligent and Generative AI Solutions | Passionate learner | Microsoft Power Platform | Microsoft Syntex | IBM Filenet

1 年

Hi Leo. This blog is awesome. Thanks for sharing. I wanted to know, if it's possible to apply filters in the DataSource or in the indexer. I mean, to only bring documents with an specific value in a SharePoint Column? Thanks again.

回复
Kunal Dada

Solution Architect at Capgemini

1 年

Did you ever manage to overcome the issues around the App struggling to query documents in formats other than Word?

回复
Maciej Krzysik

Software Engineer

1 年

Thanks for the article Leo Wang. A month ago I was able to create an indexer using your article. However, today I'm unable to do the same. Step 6.7 never completes, even though I complete the auth step before with a user with access. Does this approach still work?

回复
Lars Martin

Microsoft FastTrack Recognized Solution Architect Dynamics 365 CE | Power Platform Specialist

1 年

Thank you Leo Wang Before I struggled with setting up the indexer and always got a permission error. Your article gave me the hint, that my datasource wasn't properly configured. Now it works.

回复
Andrew Zamer

Financial Systems at Havas

1 年

Hi Leo I see we're able to add one data source index, but what if there is data in multiple indexes, are you aware of any way to add multiple data sources / indexes into the chat playground?

要查看或添加评论,请登录

Leo Wang的更多文章