Summarize your content with Azure Synapse & Azure Language Model

For the analytics community who love Azure Synapse for its dedicated SQL pools and serverless SQL, and who would like to extend it to AI workloads, I hope this blog is an enjoyable read on how to broaden that capability by bringing AI models into Synapse.


The reason I chose this topic is that when I searched Google/Bing for a document that would guide my customer on 'How to call Azure Cognitive Services AI models from Azure Synapse,' I didn't get a direct hit. So I thought this blog could help customers and technical teams who already have an Azure Synapse workspace in their subscription and don't want to spin up a new service or compute such as Azure Machine Learning or Azure Functions, but would rather use Synapse as one platform covering analytics, data engineering, and AI, with seamless integration across their teams.

P.S. The views in this post are based on my experience and implementation success only and are not related to any company.

Before we get into the 'how-to,' let's pick a feature and develop a use case to solve. While the implementation process discussed in this article can be applied to any of the models in the Azure Cognitive Services family (an excellent Azure PaaS offering of ready-to-use AI models), I'm going to take the Language model and share my experience and thoughts on using it successfully with Azure Synapse.



Use case:

One of my favorite features of the Language model is summarization. Day to day, we read a lot of documents, articles, and emails, and listen to many conversations and meetings. We would all love a summary of our notes that helps us take action, or gives us the gist later just from reading the synopsis. So let's take that as the example use case to solve with Synapse and the summarization feature of the Language model.

I will discuss how to get a summary from a text document in this article. If you want a summary of an audio/video conversation, it is a two-step process: convert the audio/video to text using the Speech model, then apply the summarization steps discussed here (the implementation is otherwise similar; please post a comment if you would like to see that, or contact me for help).

Architecture:

Let's understand how we will solve this with Azure services.


At a high level, the following components are involved. One of the core advantages of using Azure Synapse is that we can extend the workspace to do powerful AI programming on the available Spark compute.

  1. The source data, in our case, is text files (sitting in an on-premises file system) uploaded/copied to Azure Data Lake Storage. There are several ways to achieve this; a Synapse pipeline is one of them, with a simple drag-and-drop UI. If your source is somewhere else, such as Azure Blob Storage or AWS S3, choose the appropriate connector (aka linked service in the Synapse pipeline) instead of the file system one.
  2. The Azure Data Lake Storage account is linked to the Azure Synapse workspace using a linked service.
  3. A Spark pool is created if there isn't one already.
  4. As a prerequisite, we need to install the following Python libraries on the Spark pool so that they are available in our notebook session.


For this exercise, the Language model library alone is adequate. Still, I have added azure-cognitiveservices-speech to show how easy it is to add other models and combine their power to solve even more complex requirements together.

We can look at how to combine and use them in my next article if I see much interest from the community in this topic!

Before we get into the coding, let's prepare our Azure Synapse workspace and see how to meet the prerequisite above.

Preparing Synapse Spark Cluster:

Installing Python libraries on a Synapse Spark pool is very straightforward.

  1. Create a requirements.txt file (or a YML file) listing the needed libraries.
  2. Go to Manage, choose Apache Spark pools, select the Spark pool you plan to use, open the ellipsis menu to find the 'Packages' option, upload the requirements.txt you created, and click Apply. This is all we need to load external packages onto the Synapse Spark pool.

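As a sketch, a requirements.txt covering the libraries used in this article might look like the following (the PyPI package names are my assumption; pin versions to match your Spark pool):

```
azure-ai-textanalytics
azure-storage-blob
azure-cognitiveservices-speech
```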

Now that our Spark pool has all the required packages, we can start developing our code to access the Azure Language model. Go to Develop and create a notebook. Make sure to do the following when you create it:

  • Give the notebook a proper name.
  • Attach the cluster on which we installed the libraries.
  • Choose the language you prefer (in my case, Python).

In the next section, you will see how easy it is to call the Azure Cognitive Services Language model from Azure Synapse!


Working with AI Models from Synapse:

  • Import the needed packages to your session

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

  • Define the needed input variables and authenticate


# Get the key and endpoint from the Azure Language service in your subscription
key = "0eaxxxxxxxxxxxxxxxxxxxxxxxxx"
endpoint = "https://xxxxxxxxxxdemolanguagesrvc.cognitiveservices.azure.com/"


# Get the blob connection string for the storage account holding the source data
blob_connection_string = "DefaultEndpointsProtocol=https;AccountName=yourstorageaccountname;AccountKey=xxxxxxxxxxxxxxxxx==;EndpointSuffix=core.windows.net"
blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)


# Define the input container and file name where the source text file is available
input_container_name = "output-text"
input_filename = "Community Expert interview.wav-converted.txt"


# Authenticate the client using your key and endpoint
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
        endpoint=endpoint,
        credential=ta_credential)
    return text_analytics_client

client = authenticate_client()
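As an aside, a storage connection string is just a semicolon-separated list of key=value pairs, which is why `BlobServiceClient.from_connection_string` can pull the account name and key out of it for you. A minimal sketch of what that parsing looks like (a hypothetical helper for illustration only, not part of the Azure SDK):

```python
def parse_connection_string(conn_str):
    """Split an Azure storage connection string into a dict of its parts."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue  # tolerate trailing semicolons
        # partition at the FIRST '=' so base64 keys ending in '==' stay intact
        name, _, value = segment.partition("=")
        parts[name] = value
    return parts

parsed = parse_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=myaccount;"
    "AccountKey=abc==;EndpointSuffix=core.windows.net"
)
print(parsed["AccountName"])  # myaccount
```

In real code, never hardcode the key; pull it from Azure Key Vault or a Synapse linked service instead.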

  • Define the method that calls the AI model and summarizes the given input



def sample_extractive_summarization(client):
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import (
        TextAnalyticsClient,
        ExtractSummaryAction
    )

    # Connect to the storage account and download the text/doc (raw data)
    blob_container_client = blob_service_client.get_container_client(container=input_container_name)
    blob_client = blob_service_client.get_blob_client(container=input_container_name, blob=input_filename)
    data = blob_client.download_blob()
    data = data.readall()
    data = data.decode()
    str_text = data.strip()

    # Each line of the file becomes one document for the API
    document = str_text.splitlines()

    # Submit the extractive summarization action (3 sentences max)
    poller = client.begin_analyze_actions(
        document,
        actions=[
            ExtractSummaryAction(max_sentence_count=3)
        ],
    )

    document_results = poller.result()
    for result in document_results:
        extract_summary_result = result[0]  # first document, first result
        if extract_summary_result.is_error:
            print("...Is an error with code '{}' and message '{}'".format(
                extract_summary_result.code, extract_summary_result.message
            ))
        else:
            print("Summary extracted: \n{}".format(
                " ".join([sentence.text for sentence in extract_summary_result.sentences]))
            )
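One caveat worth noting: `splitlines()` turns every line of the file into a separate document, and the Language service caps the number of documents allowed per request (25 per request at the time of writing for this operation; please verify against the current service limits). A minimal sketch of a batching helper you could wrap around `begin_analyze_actions` if your files are long:

```python
def batch_documents(documents, batch_size=25):
    """Yield successive batches of at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]

# Example: 60 one-line documents become batches of 25, 25, and 10
batches = list(batch_documents(["line"] * 60))
print([len(b) for b in batches])  # [25, 25, 10]
```

You would then call `client.begin_analyze_actions(batch, actions=[...])` once per batch and collect the pollers' results.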

  • Finally, call the method. We can certainly get fancier here and pass input parameters to this method, such as the file you want to read and summarize.


sample_extractive_summarization(client)

        

Voila! With just the few lines of code above, we integrated Azure Blob Storage and Azure Cognitive Services into Azure Synapse. In my example, I took a text document from an open-source database containing an hour-long conversation with a community expert on diabetes, and the AI model, called from Synapse by our code, got us a nice summary of it!


I hope you enjoyed this. Please post your comments and questions in the comment section below; thanks!
