Applying Analytics on Real-Time Data Through Cosmos DB & Azure Synapse Link Behind Private Networks (VNET)
Recently, a customer and I were discussing options for replicating transactional data from both a document store and a relational store in near real-time (< 5 min) to apply analytics or conduct auditing on that data. While talking through the potential options, I realized that many people aren't aware of the native solution available in Azure, or never tried it because they assumed it would be hard to achieve, especially when private networks are involved between the systems. In this article, I would like to explore and explain a wonderful Azure feature called 'Synapse Link'.
P.S. The views in this post are based on my own experience and implementation success only and are not related to any company. Though I explain the options through a document database (data stored as JSON), the concept and architectural process are the same for data coming from a relational store such as Azure SQL Database.
As an example, let's take a situation where we need to record events from IoT devices in a food packaging plant. We need to record all the events from the machine(s) for several reasons: performing audits based on FDA regulations, applying predictive analytics to machine failures (detecting anomalies), analyzing event relations, etc.
As the focus of this topic is understanding how to replicate transactional data in near real-time, let's not go deep into the "how to" of the IoT side in this example. So, let's assume that the transactional data is being recorded in Cosmos DB, Azure's native document-store database, using the SQL API.
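To make that concrete, here is a minimal sketch of what the write path could look like with the azure-cosmos Python SDK. The account endpoint, key, database name ("iot"), container name ("events"), and the event shape are all hypothetical, not part of the original setup.

```python
from azure.cosmos import CosmosClient

# Hypothetical account endpoint and key; in practice, prefer Azure AD auth.
client = CosmosClient(
    url="https://<your-account>.documents.azure.com:443/",
    credential="<primary-key>",
)
container = client.get_database_client("iot").get_container_client("events")

# One machine event as it might arrive from IoT Hub (illustrative shape only).
event = {
    "id": "evt-0001",
    "deviceId": "packager-line-3",
    "eventType": "temperature",
    "value": 4.2,
    "recordedAt": "2022-06-01T10:15:00Z",
}
container.upsert_item(event)
```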
Setting up Cosmos DB to replicate data through Synapse Link:
As I mentioned at the beginning of the article, let's introduce a little complexity here by setting up all the required services on a private network in Azure. In my experience, most users, irrespective of industry, really want to do all of this securely behind a private network.
1. So, the prerequisite here is to set up a VNET in Azure. This is straightforward, and here is a guide for it if you are new to this.
2. Next is setting up a Cosmos DB account behind the VNET created in step 1. Again, this is fairly easy to set up from the Azure portal, or I have a sample ARM code if you would like to use it (also sketched below).
3. Once your Cosmos DB account is created, enable the Azure Synapse Link feature on the account and create/enable an analytical store for a Cosmos DB container. Both of these steps are sketched in code right after this list.
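For those who prefer code over the portal or ARM, here is a rough sketch of steps 2 and 3 at the account level using the azure-mgmt-cosmosdb Python SDK. The subscription, resource group, account name, region, and subnet resource ID are placeholders, and exact model names can vary between SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cosmosdb import CosmosDBManagementClient
from azure.mgmt.cosmosdb.models import (
    DatabaseAccountCreateUpdateParameters,
    Location,
    VirtualNetworkRule,
)

client = CosmosDBManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.database_accounts.begin_create_or_update(
    "<resource-group>",
    "<account-name>",
    DatabaseAccountCreateUpdateParameters(
        location="eastus",
        locations=[Location(location_name="eastus", failover_priority=0)],
        # Step 2: lock the account to the VNET/subnet created in step 1.
        is_virtual_network_filter_enabled=True,
        virtual_network_rules=[VirtualNetworkRule(id="<subnet-resource-id>")],
        # Step 3: turn on Synapse Link (analytical storage) at the account level.
        enable_analytical_storage=True,
    ),
)
account = poller.result()
```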
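And a companion sketch for the container-level part of step 3, again with the azure-cosmos SDK and the same hypothetical names as before. Passing analytical_storage_ttl when creating the container is what enables the analytical store on it.

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(
    "https://<your-account>.documents.azure.com:443/", "<primary-key>"
)
database = client.create_database_if_not_exists("iot")

# analytical_storage_ttl=-1 retains analytical-store data indefinitely;
# a positive value expires records after that many seconds.
container = database.create_container_if_not_exists(
    id="events",
    partition_key=PartitionKey(path="/deviceId"),
    analytical_storage_ttl=-1,
)
```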
So far in our architecture, sensor data from machines emitting events is captured through IoT Hub and recorded in Cosmos DB in real-time, and it is now replicated in near real-time to the analytical store via Azure Synapse Link. There are several advantages to using this feature, and a few of the use cases are even documented in the Azure docs as the best option for it.
Setting up a Synapse Workspace to access the data from Cosmos DB through Synapse Link:
Now, on the Synapse side, this data can be accessed through a Synapse Serverless SQL pool or a Synapse Spark pool. Of course, we need a Synapse workspace, and it needs to be created behind a VNET; but in the case of Synapse, it is a Managed VNET (Azure takes care of the VNET for you; all you need to do is ask for a private Synapse workspace).
This is where it might get a little confusing (but trust me, once you understand the concept of a managed VNET, you will wish you had it for almost all other Azure services): our Cosmos DB account is behind a VNET that you manage, while the Synapse workspace's VNET is managed by Azure for you. Making these two services talk to each other can be challenging if you have not worked with Synapse before. Here is an awesome article, IMHO a must-read while dealing with a Synapse managed VNET. I also have a sample ARM code with all the private endpoints enabled for reference.
1. Create a Synapse Private Link Hub; this is required to make Azure Synapse Studio accessible from the managed VNET through your VNET.
2. Create a Synapse workspace with data exfiltration protection enabled. This is key, as you can't enable it once the Synapse workspace has already been created (see the first sketch after this list).
3. Create a private endpoint from your Synapse workspace to link it to the Private Link Hub created above. Follow the directions from here, and use the VNET you created where it asks for the network information.
4. Create a private endpoint from your Synapse workspace for Synapse Serverless SQL (SQLOnDemand) and link it to your VNET (the one you created in step 1).
5. Since our Cosmos DB account has an existing private endpoint, the Synapse Serverless SQL pool will be blocked from accessing the account due to network-isolation checks on the Azure Cosmos DB account. Allow the Synapse workspace to access the Azure Cosmos DB account by specifying the NetworkAclBypassResourceId setting on the account (sketched after this list). This is a key step that I have seen users miss.
6. The final configuration required before we can access the data from Serverless SQL is to add a managed private endpoint (PE) for the Azure Cosmos DB analytical store. While creating this, you will notice that you have the option to create the PE for either the analytical or the transactional store. Synapse Serverless SQL only supports the analytical store; if you need to query the transactional store, you must use the Spark pool (a read example follows this list). But remember, even for the analytical store, the data can be accessed in near real-time!
7. Voilà! You should be able to see the analytical store for your Cosmos DB container under Linked resources and should be able to submit queries.
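As referenced in step 2, here is a rough sketch of creating the workspace with a managed VNET and data exfiltration protection, using the azure-mgmt-synapse Python SDK. All resource names and the ADLS Gen2 account are placeholders, and model names may differ slightly across SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.synapse.models import (
    Workspace,
    ManagedVirtualNetworkSettings,
    DataLakeStorageAccountDetails,
)

client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.workspaces.begin_create_or_update(
    "<resource-group>",
    "<workspace-name>",
    Workspace(
        location="eastus",
        # Ask for a managed VNET up front; it cannot be added later.
        managed_virtual_network="default",
        managed_virtual_network_settings=ManagedVirtualNetworkSettings(
            prevent_data_exfiltration=True,
        ),
        # Every workspace needs a default ADLS Gen2 filesystem (placeholder).
        default_data_lake_storage=DataLakeStorageAccountDetails(
            account_url="https://<adls-account>.dfs.core.windows.net",
            filesystem="synapse",
        ),
    ),
)
workspace = poller.result()
```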
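And the NetworkAclBypassResourceId change from step 5, sketched with azure-mgmt-cosmosdb; the Synapse workspace resource ID below is a placeholder you would build from your own subscription, resource group, and workspace name.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cosmosdb import CosmosDBManagementClient
from azure.mgmt.cosmosdb.models import (
    DatabaseAccountUpdateParameters,
    NetworkAclBypass,
)

client = CosmosDBManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Placeholder resource ID of the Synapse workspace created earlier.
synapse_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Synapse/workspaces/<workspace-name>"
)

# Let the Synapse workspace bypass the account's network-isolation checks.
client.database_accounts.begin_update(
    "<resource-group>",
    "<account-name>",
    DatabaseAccountUpdateParameters(
        network_acl_bypass=NetworkAclBypass.AZURE_SERVICES,
        network_acl_bypass_resource_ids=[synapse_id],
    ),
).result()
```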
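Finally, as mentioned in step 6, a Spark-pool read of the analytical store could look like the following, run inside a Synapse notebook where spark is predefined. "CosmosDbIoT" is a hypothetical linked service name pointing at the Cosmos DB account; the cosmos.olap format reads the analytical store (cosmos.oltp would read the transactional store instead).

```python
# Read the Cosmos DB analytical store through the Synapse Spark connector.
df = (
    spark.read.format("cosmos.olap")
    .option("spark.synapse.linkedService", "CosmosDbIoT")  # hypothetical name
    .option("spark.cosmos.container", "events")
    .load()
)

# Example: inspect temperature events (field names follow the earlier sketch).
df.filter(df.eventType == "temperature").show()
```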
Hopefully, this blog was helpful for architects and engineers working on, or planning to design, an analytical system on near real-time data from Cosmos DB. I feel that Synapse Link is one of those great hidden features that many customers aren't taking advantage of. It not only simplifies the architecture but is also a great cost saver thanks to the analytical store!
I welcome any suggestions and points you would like to share in the comments on this topic. Thanks for reading!