Bringing OpenAI to SharePoint Online


In this article, I will explain how I built a fully functional chat assistant powered by Azure OpenAI that uses my own data stored in SharePoint Online. It not only behaves like ChatGPT but goes one step further: it helps me understand and find answers in files stored in SharePoint, saving me tons of work and time!

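In this article, I will be talking about the following:

  1. Let's Chat!
  2. First Things First
  3. Preparing SharePoint Online
  4. Setting up Azure Cognitive Search
  5. Creating the Chat Assistant
  6. Conclusion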

Let me show you first how my ChatGPT assistant works through a couple of tests, and then I will guide you through the requirements and the whole process to help you create your very own ChatGPT assistant. Interested?


Let's Chat!

My new chat assistant is smart enough to understand when I'm talking about the same thing. It doesn't matter if I refer to it as a résumé, resume, or CV. Throughout this article, you will see that reflected in my questions. It will provide me with answers from files stored in SharePoint Online.

I start my session by asking about myself. The document library has copies of my résumé, cover letter, and a one-page profile in PowerPoint format (which I use for presentations).

I begin the session with a straightforward question:

The chat assistant summarizes all the information available from the documents in the library, saving me a ton of time from opening file after file and reading them to make sense of what I have in my document library.

The interesting part of the answer is the direct references to the documents pointed to by the numbers 1, 2, and 3, which also provide links to the respective files. I can click on any of the links and have a quick view without leaving my chat session.

Stepping into the shoes of a recruiter, I grow more interested in my potential candidate and ask another question about his profile. I don't even need to refer to him by name, as Alex Gonsales or Alex:

Quick question with a direct answer. I'm feeling like a recruiter now.

My chat assistant has my undivided attention, and I want to explore this answer further to see how much more information I can get from the data I made available in SharePoint Online:

The conversation goes deeper and becomes more refined:

Then I decided to really act on my recruiting skills and asked about a different topic not related to programming languages:

The chat assistant is correct again. In the case above, it listed one entry from my résumé where SQL was the bread and butter of my past experience (there are a couple of SQL keywords in my résumé, but this is the only entry where I took the time to explain a job deeply related to SQL). I guess it is true what recruiters often say over the phone: "write down your experience, don't just put buzzwords in."

It is interesting to note that I'm still able to ask my assistant questions beyond my résumé. I asked what Meta5 is, and it not only explained Meta5 but backed it up with a reference from my résumé.

So I decided to test my assistant and ask about other things:

Speaking of recruiters, how many times do they ask you to send an email back with some bullet points? So I do the same with my chat assistant:

And ask again, but this time, I want the answer directly from the résumé:


After this long answer, I'm done. I know enough about Alex Gonsales, so it is time to challenge my assistant to find other SharePoint professionals, and the answer doesn't disappoint:


First Things First

We need to enable the system-assigned managed identity for Azure Cognitive Search. This is required if the SharePoint site is in the same tenant as the search service. The identity is used internally by the service to automatically detect the tenant where the service is provisioned. If you intend to access a SharePoint site outside your tenant, you don't need to perform this configuration:

The Azure Cognitive Search indexer will need an Azure Application for authentication, and you will need to choose which kind of permissions this app will use within your application: Delegated or Application permissions.

Just a quick reminder in case you don't remember Azure AD 101: If you configure the solution we are about to create to use Delegated Permissions, the indexer will run under the identity of the user or app sending the request, so access to data is limited to the sites and files that user can access. With Application Permissions, on the other hand, the indexer runs under the identity of the SharePoint tenant and has access to all sites and files within it (full access).

Create the Azure Application under App registrations and enter the following:

  1. Application Name: enter any meaningful name you want
  2. Supported Account Type: select single tenant
  3. Redirect URI (optional): leave it blank

Once the Application is registered, click on API Permissions in the left tab and then on the + Add a permission button. Select Microsoft Graph and choose the kind of permissions you want to use:

For Delegated Permissions:

For Application Permissions:

The next step is to configure the authentication for the application. On the left tab, click on the Authentication option and configure it as follows:

The last step is the application secret. In the left tab, choose Certificates & secrets and create your Client Secret. Write down the value of the secret and also the Application (client) ID. We will use these values in the scripts below to configure the different parts of Azure Cognitive Search.


Preparing SharePoint Online

We start by deciding on what to use for our data source. In my case, it was a document library, and you can opt for the same by using one of your existing site collections or simply create a new site collection. It really doesn't matter.

There are several options for configuring how Azure Cognitive Search will index SharePoint Online contents. It can include the entire site collection with all document libraries, just the default library, or, for example, a specific query, including mapping columns. You can read all the options here:

https://learn.microsoft.com/en-us/azure/search/search-howto-index-sharepoint-online#controlling-which-documents-are-indexed

For your first attempt, I recommend starting small, which will also significantly speed up your deployment when the data is indexed for the first time. Just create a document library and upload a few documents. Large libraries might throw curveballs when troubleshooting a faulty index.

As said before, you can create improved experiences by mapping columns from the document library to the index. This will add additional metadata to help classify the data stored in SharePoint, and Azure Cognitive Search can use this to improve searches, giving more context to the information indexed.

The image below is a screenshot from my site collection that I call Talent Tracker. It has a document library called CVs that I will be using as my data source:

In my site collection, the library CVs was highly customized with document sets providing columns to store extra information about the Employee Name, Job Title, Department, etc. This is a good example of how to index extra metadata for more advanced indexes (Semantic and Vector) that can be mapped later in the script with the reference: additionalColumns=[...]

In my example, I'm creating a document set for each profile that I want to store. It provides identifiable metadata that can be used directly in SharePoint list views, including SharePoint Search, and also as searchable metadata in the Azure Cognitive Search index:

This is what I will be using as my data source; the level of customization for your test is up to you. However, the minimum requirement is a document library with files. Remember to remove ;additionalColumns=[...] from your script, or list the correct columns if you want the metadata to be indexed.
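To make that last point concrete, here is a minimal sketch of how the "query" value could be assembled so that additionalColumns only shows up when there are columns to map. The function name New-SpoQuery and its parameters are names I made up for illustration; only includeLibrary and additionalColumns come from the data source script further down.

```powershell
# Hypothetical helper (not part of any Azure module): builds the "query" string
# used in the data source "container" section.
function New-SpoQuery {
    param(
        [Parameter(Mandatory)] [string] $LibraryUrl,   # full URL of the document library
        [string[]] $AdditionalColumns = @()            # optional SharePoint column names
    )

    $query = "includeLibrary=${LibraryUrl}"

    # Only append additionalColumns when at least one column was supplied.
    if ($AdditionalColumns.Count -gt 0) {
        $query += ";additionalColumns=" + ($AdditionalColumns -join ",")
    }

    return $query
}

# Plain library, no extra metadata:
New-SpoQuery -LibraryUrl "https://TenantName.sharepoint.com/sites/SiteCollection/DocumentLibrary"

# Library plus two mapped columns:
New-SpoQuery -LibraryUrl "https://TenantName.sharepoint.com/sites/SiteCollection/DocumentLibrary" -AdditionalColumns "ColumnName1", "ColumnName2"
```

The result can be pasted into the "query" parameter of the data source request as-is.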


Setting up Azure Cognitive Search

You will need Azure Cognitive Search on the Standard pricing tier or above, with at least one search unit replica; that's the bare minimum, and it can be quite expensive. Keep an eye on Azure Cost Management, as it will start costing you around $20 a day once it's provisioned.

This part demands a little more care and attention because the SharePoint Online connector for Azure Cognitive Search is still in preview. There is no way to configure the steps below using the UI, so you will need to resort to the REST API. I'm providing the necessary scripts for creating the data source, index, and indexer.

The calls to the endpoint are made using Invoke-RestMethod. Pay special attention to the variable $headers; it should contain the private key for your endpoint:

The key will be used by all the scripts, and all the resources created by the scripts are named "sharepoint-xxxx". You can change the names to something you like, but remember to keep them consistent across the scripts, because the indexer references the data source and the index by name.
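If you would rather not paste the key into every script, one option is to read it from an environment variable and build $headers from that. This is only a sketch: the function name New-SearchHeaders and the variable name AZURE_SEARCH_API_KEY are my own assumptions, not anything Azure defines.

```powershell
# Hypothetical helper: builds the headers hashtable expected by the scripts.
# AZURE_SEARCH_API_KEY is an assumed variable name, not an Azure convention.
function New-SearchHeaders {
    param(
        [string] $ApiKey = $env:AZURE_SEARCH_API_KEY
    )

    # Fail early with a clear message instead of sending an empty key.
    if ([string]::IsNullOrWhiteSpace($ApiKey)) {
        throw "No API key supplied. Set AZURE_SEARCH_API_KEY or pass -ApiKey."
    }

    return @{ "api-key" = $ApiKey }
}

# Usage: set the variable once per session, then reuse $headers everywhere.
# $env:AZURE_SEARCH_API_KEY = "{KEY VALUE FROM AZ COGNITIVE SEARCH}"
# $headers = New-SearchHeaders
```

The scripts below still define $headers inline so that each one can be run on its own.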

Run each script one at a time and check the outcome of each one. Data source and index creation are pretty quick, and there is no harm in running them as a single script. The last script is tricky and might take some time to show results. I'm also sharing an optional fourth script to help you check the Indexer. It will be useful for troubleshooting typical problems with the Indexer and the Indexing queue. You can find it at the end of this step.

Depending on the choices you made during the Azure Application registration process, you will need to modify the scripts below to match the correct level of permissions:

For Delegated Permissions, use the following authentication string:

SharePointOnlineEndpoint=URI;ApplicationId=XXX        

For Application Permissions, use the following authentication string:

SharePointOnlineEndpoint=URI;ApplicationId=XXX;ApplicationSecret=XXX        

You will need to include the parameter ;TenantId=XXX in case your SharePoint site is located on a different tenant. The URI should be the full site collection address without the document library, something like the following example: https://TenantName.sharepoint.com/sites/SiteCollection. Do not include the document library in the URI; it should be specified later in the "query" parameter.
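The rules above can be sketched as a small helper that adds ApplicationSecret and TenantId only when they are needed. The function name New-SpoConnectionString and its parameter names are hypothetical; the key names inside the string are the ones shown above.

```powershell
# Hypothetical helper: assembles the connection string for the data source.
function New-SpoConnectionString {
    param(
        [Parameter(Mandatory)] [string] $SiteUrl,        # site collection URL, without the library
        [Parameter(Mandatory)] [string] $ApplicationId,  # Application (client) ID
        [string] $ApplicationSecret,                     # only for Application Permissions
        [string] $TenantId                               # only when the site is in another tenant
    )

    $parts = @(
        "SharePointOnlineEndpoint=${SiteUrl}"
        "ApplicationId=${ApplicationId}"
    )

    # Optional segments are appended only when a value was passed in.
    if ($ApplicationSecret) { $parts += "ApplicationSecret=${ApplicationSecret}" }
    if ($TenantId)          { $parts += "TenantId=${TenantId}" }

    return $parts -join ";"
}

# Delegated Permissions:
New-SpoConnectionString -SiteUrl "https://TenantName.sharepoint.com/sites/SiteCollection" -ApplicationId "XXX"

# Application Permissions:
New-SpoConnectionString -SiteUrl "https://TenantName.sharepoint.com/sites/SiteCollection" -ApplicationId "XXX" -ApplicationSecret "XXX"
```

Either output goes into the "connectionString" value of the data source request.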

Launch your Windows Terminal/PowerShell/ISE and run the following scripts in the order given below:

1 - Creating the Data Source:

#   Azure Cognitive Search - API Key

$headers = @{ "api-key" = "{KEY VALUE FROM AZ COGNITIVE SEARCH}" }

#   Azure Cognitive Search - Creating the Data Source

$request = @"
{
    "name" : "sharepoint-datasource",
    "type" : "sharepoint",
    "credentials" : { 
        "connectionString" : "SharePointOnlineEndpoint=https://TenantName.sharepoint.com/sites/SiteCollection;ApplicationId=XXX;ApplicationSecret=XXX"
    },
    "container" : { 
        "name" : "useQuery", 
        "query" : "includeLibrary=https://TenantName.sharepoint.com/sites/SiteCollection/DocumentLibrary;additionalColumns=ColumnName1,ColumnName2,ColumnName3,ColumnName4"
    }
}
"@

$uri = "https://AZCognitiveSearchEndPoint.search.windows.net/datasources?api-version=2020-06-30-Preview"

Invoke-RestMethod -Uri $uri -Headers $headers -Method Post -Body $request -ContentType "application/json"        

2 - Index:

#   Azure Cognitive Search - API Key

$headers = @{ "api-key" = "{KEY VALUE FROM AZ COGNITIVE SEARCH}" }

#   Azure Cognitive Search - Creating the Index

$request = @"
{
    "name" : "sharepoint-index",
    "fields":  [
        { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
        { "name": "metadata_spo_item_name", "type": "Edm.String", "key": false, "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_path", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "metadata_spo_item_content_type", "type": "Edm.String", "key": false, "searchable": false, "filterable": true, "sortable": false, "facetable": true },
        { "name": "metadata_spo_item_last_modified", "type": "Edm.DateTimeOffset", "key": false, "searchable": false, "filterable": false, "sortable": true, "facetable": false },
        { "name": "metadata_spo_item_size", "type": "Edm.Int64", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },
        { "name": "content", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
    ]
}
"@

$uri = "https://AZCognitiveSearchEndPoint.search.windows.net/indexes?api-version=2020-06-30"

Invoke-RestMethod -Uri $uri -Headers $headers -Method Post -Body $request -ContentType "application/json"        

3 - Indexer:

#   Azure Cognitive Search - API Key

$headers = @{ "api-key" = "{KEY VALUE FROM AZ COGNITIVE SEARCH}" }


#   Azure Cognitive Search - Creating the Indexer

$request = @"
{
    "name" : "sharepoint-indexer",
    "dataSourceName" : "sharepoint-datasource",
    "targetIndexName" : "sharepoint-index",
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null,
        "base64EncodeKeys": null,
        "configuration": {
            "indexedFileNameExtensions" : ".pdf, .docx, .pptx",
            "excludedFileNameExtensions" : ".png, .jpg, .gif",
            "dataToExtract": "contentAndMetadata"
        }
    },
    "schedule" : {},
    "fieldMappings" : [
        { 
          "sourceFieldName" : "metadata_spo_site_library_item_id", 
          "targetFieldName" : "id", 
          "mappingFunction" : { 
            "name" : "base64Encode" 
          } 
        }
    ]
}
"@


$uri = "https://AZCognitiveSearchEndPoint.search.windows.net/indexers?api-version=2020-06-30-Preview"

Invoke-RestMethod -Uri $uri -Headers $headers -Method Post -Body $request -ContentType "application/json"        

The Indexer script ties everything together. Once created, the Indexer will then index the SharePoint data for the first time, making the documents available for Azure OpenAI. This is the final step before we can proceed with the configuration of the chat assistant.
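A quick way to confirm that documents actually landed in the index is the Count Documents operation of the REST API. The sketch below only builds the URI; the function name Get-IndexCountUri is hypothetical, and I'm assuming the stable 2020-06-30 API version used by the index script is fine for this call.

```powershell
# Hypothetical helper: builds the URI for the Count Documents operation.
function Get-IndexCountUri {
    param(
        [Parameter(Mandatory)] [string] $ServiceName,   # e.g. AZCognitiveSearchEndPoint
        [Parameter(Mandatory)] [string] $IndexName,     # e.g. sharepoint-index
        [string] $ApiVersion = "2020-06-30"
    )

    # The '$count' segment is kept in single quotes on purpose: inside double
    # quotes PowerShell would try to expand it as a variable named $count.
    return "https://${ServiceName}.search.windows.net/indexes/${IndexName}" + '/docs/$count?api-version=' + $ApiVersion
}

# Usage (needs the $headers variable from the scripts above):
# $uri = Get-IndexCountUri -ServiceName "AZCognitiveSearchEndPoint" -IndexName "sharepoint-index"
# Invoke-RestMethod -Uri $uri -Headers $headers -Method Get
```

A count that stays at zero after the indexer has finished usually means it is time to look at the indexer status.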

If you are not using SharePoint Online as your data source, Azure OpenAI will configure everything above during the creation of the chat assistant, including the data upload (depending on your choices).

And here is the additional script to help you monitor and/or troubleshoot any possible problems with your Indexer:

4 - Indexer status:

#   Azure Cognitive Search - API Key

$headers = @{ "api-key" = "{KEY VALUE FROM AZ COGNITIVE SEARCH}" }

$uri = "https://AZCognitiveSearchEndPoint.search.windows.net/indexers/sharepoint-indexer/status?api-version=2020-06-30-Preview"

Invoke-RestMethod -Uri $uri -Headers $headers -Method Get -ContentType "application/json"        
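The status response is fairly verbose, so a small formatter can help when you are checking it repeatedly. This is a sketch under assumptions: Format-IndexerStatus is a made-up name, and the property names (status, lastResult, itemsProcessed, itemsFailed, errorMessage) reflect the Get Indexer Status response as I understand it, so verify them against your own output.

```powershell
# Hypothetical helper: condenses an indexer status response into one line.
# Property names are assumed from the Get Indexer Status response shape.
function Format-IndexerStatus {
    param(
        [Parameter(Mandatory)] $Status   # object returned by the status call
    )

    $last = $Status.lastResult

    # A brand-new indexer may not have a completed run yet.
    if ($null -eq $last) {
        return "Indexer '$($Status.status)': no completed run yet."
    }

    $line = "Indexer '$($Status.status)', last run '$($last.status)': " +
            "$($last.itemsProcessed) processed, $($last.itemsFailed) failed."

    # Append the run-level error message only when there is one.
    if ($last.errorMessage) { $line += " Error: $($last.errorMessage)" }

    return $line
}

# Usage with the call above:
# $status = Invoke-RestMethod -Uri $uri -Headers $headers -Method Get
# Format-IndexerStatus -Status $status
```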


Creating the Chat Assistant

This is quite straightforward, and I will quickly cover what is necessary to get to the chat agent. There is a lot that can be said and explained about Azure OpenAI configuration, but for brevity, I will keep it to a minimum.

Azure OpenAI is under limited access, and these steps assume you have already been granted access. If you haven't, please visit this link:

https://learn.microsoft.com/en-us/legal/cognitive-services/openai/limited-access

Everything you need to know to gain access to Azure OpenAI is listed in the link above, including the registration process. It is quite a lengthy process, but if you answer something incorrectly, you can resubmit your application. It takes time, and you need to make a good case by describing as best you can why you need it and how you plan to use it.

Open up Azure OpenAI Studio at https://oai.azure.com/ or the direct link from your own OpenAI resource. Choose the option "Bring your own data":

You will notice that there are no deployments found within your resource. Click on the button that says "Create new deployment":

From there, a new dialog will prompt you to choose a model to be used in the chat assistant. You can choose between "gpt-3.5-turbo" and "gpt-3.5-turbo-16k". Also, pick how the model should be updated when a new version is released. As I mentioned earlier, I will keep it brief, as there is much more to cover. I plan to write more about Azure OpenAI in the future and revisit the options that I am skipping for now.

Give a name to your deployment, something like "cv-assistant":

Map the fields as shown in the screenshot below:

At this point, choose "Keyword":

You can revisit this later and change the configuration to "Semantic" search. This option is probably the best since OpenAI takes full advantage of its capabilities. My script creates a simple "Keyword" index, but you can manually add your own Semantic configuration directly from the Azure Cognitive Search UI. With my base "Keyword" index, you have everything in place to further expand and add more capabilities.

You are now ready to deploy; just choose "Save and close":

Depending on the number of files in your document library, the indexer might still be busy indexing your files, and the chat assistant will not have any data available. However, it is most likely that you will be greeted with a note that says "Start chatting" from a smiling bot, signaling that you are ready to test your agent:

Congratulations! You have just deployed your first OpenAI chat assistant!


Conclusion

Bringing ChatGPT capabilities to SharePoint Online content is a game-changer, and the opportunities are endless, but everything is connected through Azure Cognitive Search.

My goal in this article was to help you get started with OpenAI and SharePoint Online, but this is just the tip of the iceberg. Azure Cognitive Search is probably the most valuable hidden gem among all Azure AI services. It has connectors for numerous data sources and can make sense not only of text but also images. In my article, I omitted the indexing of .jpg and .png files, but you can add those and see them showing as results during your conversation with the assistant.

Probably the major downside of Azure Cognitive Search is the price. It is an expensive service, and the first two tiers do not support Semantic search. The first tier is called Free and is very limited. Basic is the next tier, which starts at US$ 71 per month. Things only get interesting from the Standard tier, starting at US$ 239. This might be a big NO for the average developer who is just looking to learn.

Another thing to consider is the overwhelming tuning options from OpenAI. There are options for the completion assistant, chat assistant, different models besides GPT 3.5, and the fine-tuned model from training datasets (with very limited availability from the limited access program).

I would recommend starting with the basics by tuning the chat assistant interactions first. Try to improve the responses and the randomness of answers. In the configuration panel on the right next to the chat session, you can experiment with changing the Temperature. Raising this setting produces more varied, creative responses for every interaction, while lowering it makes the answers more predictable. TOP K controls a different facet regarding the size of the data stored (referenced by PARTS).

It was a long article, but thank you for making it to the end. I hope you found it useful and informative. If you have any comments or questions, please feel free to let me know. I'm here to help!


#ai #azure #azureopenai #openai #azuresearch #azurecognitiveservices #chatgpt

Stefan Schulte

Cloud & AI Solution Architect, DevOps Engineer

10 months ago

Hi Alex, cool stuff, thanks for sharing. Did you find a way to vectorize the sharepoint content for even more accurate results?

Louis Josso

Bringing your data to life | Data Scientist at Vinted

1 year ago

For anyone stopping by, I think there could be a tiny typo in the GetStatus function: you should not add "-body $request", but call it directly: Invoke-RestMethod -Uri $uri -Headers $headers -Method Get -ContentType "application/json" Enjoy! (And thank you, this is one of the best sources of information here.)

Hey Alex, I have another query. Is there an out-of-the-box solution by Microsoft to prevent the bot from providing a solution to the user if they don't have access to the SharePoint site? Let's say we train the bot on confidential HR policies present in a Sharepoint site. Let's say a user from some other department doesn't have access to those policies: which are only meant for the HR and queries bot regarding something present in those documents. Is there a way to prevent the bot from providing an answer to that user? I hope I made sense

Sandra Mau

Vice President & Tech Startup Founder - Cloud, AI & Government Solutions

1 year ago

This post is fantastic, thanks for sharing Alex. Just wondering how this approach might compare to using Microsoft Syntex, as they posted a recent article on their AI & Sharepoint integration https://learn.microsoft.com/en-us/microsoft-365/syntex/syntex-overview?

Raajeev H Dave

Human Leader GenAI and Predictive Analysis

1 year ago

Hi Alex G, it looks like we have achieved everything except role-based access to SharePoint documents. I am using delegated access (which provides the same) with a service account. Now I have a restriction of chat based on document read access permission in SharePoint. What are the options available? Any further help will be appreciated.
