Summarize your content with Azure Synapse & Azure Language Model
For the analytics community who love Azure Synapse for its SQL data warehouses and serverless SQL pools and would like to extend it to AI workloads, I hope this blog is an enjoyable read on how to broaden that capability by bringing AI models into Synapse.
The reason I chose this topic is that when I searched Google/Bing for a document that would guide my customer on 'How to call Azure Cognitive Services AI models from Azure Synapse,' I didn't get a direct hit. So I thought this blog could help my customers and technical teams who already have an Azure Synapse workspace in their subscription and don't want to spin up a new service or compute such as Azure Machine Learning or Azure Functions, but would rather use Synapse as one platform that covers analytics, data engineering, and AI with seamless integration across their teams.
P.S. The views in this post are based on my own experience and implementation success and are not related to any company.
Before we get into the 'how-to,' let's pick a feature and develop a use case to solve. While the implementation process discussed in this article can be applied to any of the models in the Azure Cognitive Services family (an excellent Azure PaaS offering that gives you ready-to-use AI models), I'm going to take the Language model (highlighted below) and share my experience and thoughts on using it successfully with Azure Synapse.
Use case:
One of my favorite features of the Language model is summarization. On a day-to-day basis, we read a lot of documents, articles, and emails, or listen to many conversations and meetings. We would all love a summary in our notes that helps us take action, or lets us catch up later just by reading the synopsis. So let's take that as the example use case to be solved with the help of Synapse and the Summarize feature of the Language model.
I will discuss how to get a summary from a text document in this article. If you wish to summarize an audio/video conversation, it is a two-step process: convert the audio/video to text using the Speech model, then apply the summarization steps discussed here. The implementation is otherwise similar; a minimal sketch of the first step is shown below, and please post a comment if you would like a full walkthrough, or contact me for help.
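For illustration only, here is a rough, minimal sketch of what that first speech-to-text step could look like with the azure-cognitiveservices-speech SDK. The key, region, and file path below are placeholders I've assumed for the example, not values from this article, and the audio file is assumed to be readable as a local file:
import time
import azure.cognitiveservices.speech as speechsdk
# Placeholder credentials for a Speech resource (replace with your own)
speech_key = "<your-speech-key>"
speech_region = "<your-region>"
audio_path = "Community Expert interview.wav"  # assumed local path to the audio file
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=speech_region)
audio_config = speechsdk.audio.AudioConfig(filename=audio_path)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
transcript_lines = []
done = False
def handle_recognized(evt):
    # Collect each recognized phrase as it is produced
    transcript_lines.append(evt.result.text)
def handle_stopped(evt):
    # Signal the main loop that recognition has finished or was canceled
    global done
    done = True
recognizer.recognized.connect(handle_recognized)
recognizer.session_stopped.connect(handle_stopped)
recognizer.canceled.connect(handle_stopped)
# Continuous recognition is needed for long recordings such as an hour-long interview
recognizer.start_continuous_recognition()
while not done:
    time.sleep(0.5)
recognizer.stop_continuous_recognition()
transcript = "\n".join(transcript_lines)
# 'transcript' can now be saved to blob storage and summarized with the steps below
The resulting transcript can then be written to the same storage container that the summarization code later in this article reads from.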
Architecture:
Let us understand how we will solve this with Azure services using the diagram below.
At a high level, the diagram above shows the components involved. One of the core advantages of using Azure Synapse is that we can extend the workspace to do powerful AI programming using the available Spark compute.
For this exercise, the Language model library alone is adequate. Still, I have added 'azure-cognitiveservices-speech' alongside 'azure-ai-textanalytics' to show how easy it is to add other models and combine their power to solve even more complex requirements together.
We can look at how to combine and use them in my next article if I see enough interest from the community in this topic!
Before we get into the coding, let's prepare our Azure Synapse workspace and see how to meet the above prerequisites.
Preparing Synapse Spark Cluster:
Installing Python libraries on a Synapse Spark pool is very straightforward.
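The libraries here are regular PyPI packages, so one common approach is to upload a requirements.txt to the Spark pool's package settings (Manage > Apache Spark pools > your pool > Packages). As a minimal sketch, with version pins left to whatever matches your pool's runtime, it could look like this:
azure-ai-textanalytics
azure-cognitiveservices-speech
azure-storage-blob
Depending on your pool configuration, you can also install session-scoped packages directly from a notebook cell with %pip install. One note of caution: the ExtractSummaryAction used later in this article shipped in preview builds of azure-ai-textanalytics, so you may need a preview (5.2.0bX) version of that package for the summarization code to import correctly.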
Now that our Spark pool has all the required packages, we can start developing our code to access the Azure Language model. Go to Develop and create a Notebook. When you create the new notebook, make sure to attach it to the Spark pool you just prepared and set the language to PySpark (Python).
In the next section, you will see how easy it is to call the Azure Cognitive Services Language model from Azure Synapse!
Working with AI Models from Synapse:
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
# Get the key and endpoint from the Azure Language service that you have in your subscription
key = "0eaxxxxxxxxxxxxxxxxxxxxxxxxx"
endpoint = "https://xxxxxxxxxxdemolanguagesrvc.cognitiveservices.azure.com/"
# Get the blob connection string where you have the source data
blob_connection_string = "DefaultEndpointsProtocol=https;AccountName=yourstorageaccountname;AccountKey=xxxxxxxxxxxxxxxxx==;EndpointSuffix=core.windows.net"
blob_service_client = BlobServiceClient.from_connection_string(blob_connection_string)
# Define the input container and the source text file inside it
input_container_name = "output-text"
input_filename = "Community Expert interview.wav-converted.txt"
# Authenticate the client using your key and endpoint
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
            endpoint=endpoint,
            credential=ta_credential)
    return text_analytics_client
client = authenticate_client()
def sample_extractive_summarization(client):
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.textanalytics import (
        TextAnalyticsClient,
        ExtractSummaryAction
    )
    # Connect to the storage and download the text/doc (raw data)
    blob_container_client = blob_service_client.get_container_client(container=input_container_name)
    blob_client = blob_service_client.get_blob_client(container=input_container_name, blob=input_filename)
    data = blob_client.download_blob()
    data = data.readall()
    data = data.decode()
    str_text = data.strip()
    document = str_text.splitlines()
    # Submit the extractive summarization action; each line of the text is treated as a separate document
    poller = client.begin_analyze_actions(
        document,
        actions=[
            ExtractSummaryAction(max_sentence_count=3)
        ],
    )
    document_results = poller.result()
    for result in document_results:
        extract_summary_result = result[0]  # first document, first result
        if extract_summary_result.is_error:
            print("...Is an error with code '{}' and message '{}'".format(
                extract_summary_result.code, extract_summary_result.message
            ))
        else:
            print("Summary extracted: \n{}".format(
                " ".join([sentence.text for sentence in extract_summary_result.sentences]))
            )
sample_extractive_summarization(client)
Voila! With just the above few lines of code, we were able to integrate Azure Blob Storage and Azure Cognitive Services into Azure Synapse and get the output below. In my example, I took a text document from an open-source database containing an hour-long conversation with a community expert on diabetes, and as you can see below, the AI model and our code calling it from Synapse produced a nice summary of it!
I hope you enjoyed this. Please post your comments and questions in the comment section below; thanks!