Azure Synapse Analytics - Quick Bites!

Hi Friends,

This week, based on my learnings, I will share some high-level best practices, information, and limitations in Azure Synapse Analytics.

You can use this as a reference point and do further research for your own use case.

Azure Synapse Analytics - Important Pointers

1. IP firewall rules – In the portal, under Networking, configure a specific IP address range and associate it with a rule name.
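As a hedged illustration, the same rule could be created programmatically with the azure-mgmt-synapse Python SDK; the subscription ID, resource group, workspace name, and IP range below are all placeholders for your own environment:

```python
# Minimal sketch, assuming the azure-identity and azure-mgmt-synapse packages.
from azure.identity import DefaultAzureCredential
from azure.mgmt.synapse import SynapseManagementClient
from azure.mgmt.synapse.models import IpFirewallRuleInfo

client = SynapseManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Associate a specific IP address range with a named firewall rule,
# mirroring Portal -> Synapse workspace -> Networking.
poller = client.ip_firewall_rules.begin_create_or_update(
    resource_group_name="my-rg",
    workspace_name="my-synapse-ws",
    rule_name="office-range",
    ip_firewall_rule_info=IpFirewallRuleInfo(
        start_ip_address="203.0.113.0",
        end_ip_address="203.0.113.255",
    ),
)
print(poller.result().provisioning_state)
```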

2. By default, when we create a Synapse workspace, public network access is enabled (not recommended).

3. Endpoints in Synapse -

-> Dedicated SQL pool

-> Serverless SQL pool

-> Development endpoint

4. From a local machine, TCP ports 80, 443 & 1433 and UDP port 53 should be open in order to connect to Synapse Studio. TCP 1433 (outbound) is required to connect from SSMS & Power BI.
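As a quick local sanity check, a small Python snippet like the sketch below can probe whether the relevant TCP ports are reachable. The hostnames are illustrative patterns (replace <workspace> with your workspace name), and UDP 53/DNS is not covered by this TCP-only check:

```python
# Minimal TCP reachability check for the Synapse Studio / SQL endpoints.
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

targets = [
    ("web.azuresynapse.net", 443),                         # Synapse Studio web UI
    ("<workspace>.sql.azuresynapse.net", 1433),            # dedicated SQL (SSMS, Power BI)
    ("<workspace>-ondemand.sql.azuresynapse.net", 1433),   # serverless SQL
    ("<workspace>.dev.azuresynapse.net", 443),             # development endpoint
]
for host, port in targets:
    print(f"{host}:{port} -> {'open' if can_connect(host, port) else 'blocked'}")
```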

5. A managed VNet is recommended for securing data in Synapse. It is managed by Synapse itself: there is no need to create inbound NSG rules on your own VNet to allow Synapse to connect to it, and no need to create a subnet for Spark clusters.

6. A managed VNet, together with managed private endpoints, makes protection against data exfiltration possible.

7. The dedicated SQL pool & serverless SQL pool sit outside the managed VNet. Intra-workspace communication between the workspace and the dedicated/serverless SQL pools uses Azure Private Link, which is created automatically when the managed VNet is created.

8. Azure Synapse Analytics enables connecting to its various components through endpoints.

9. Private Link enables access to other services such as Azure Storage. You establish a private link to a resource by creating a private endpoint. Private endpoints are mapped to a specific resource, not to the entire service.

10. PolyBase can't load rows that have more than 1,000,000 bytes of data. When you put data into text files in Azure Blob Storage or Azure Data Lake Store, each row must contain fewer than 1,000,000 bytes. This byte limit applies regardless of the table schema.
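A simple pre-flight check for this limit might look like the sketch below (the file name is a placeholder; the row is measured in encoded bytes, which is what the limit applies to):

```python
# Sketch: flag rows that exceed PolyBase's 1,000,000-byte per-row limit
# before the file is staged in Blob storage / Data Lake.
MAX_ROW_BYTES = 1_000_000

with open("export.csv", "rb") as f:          # read as bytes: the limit is in bytes
    for lineno, raw in enumerate(f, start=1):
        if len(raw) > MAX_ROW_BYTES:
            print(f"row {lineno}: {len(raw)} bytes - exceeds the PolyBase limit")
```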

11. Split large, compressed files into smaller compressed files.
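One way to do the splitting is sketched below with Python's standard gzip module; the paths and chunk size are placeholders to tune for your files:

```python
# Sketch: stream one large gzipped text file into several smaller gzipped
# parts so the load can parallelize across files.
import gzip

def split_gzip(src: str, dst_prefix: str, lines_per_chunk: int = 1_000_000) -> None:
    """Write consecutive chunks of lines_per_chunk lines to gzipped part files."""
    part, count, out = 0, 0, None
    with gzip.open(src, "rt", encoding="utf-8") as reader:
        for line in reader:
            if out is None:
                out = gzip.open(f"{dst_prefix}.part{part:04d}.csv.gz", "wt", encoding="utf-8")
            out.write(line)
            count += 1
            if count >= lines_per_chunk:
                out.close()
                out, count, part = None, 0, part + 1
    if out is not None:
        out.close()

split_gzip("big_export.csv.gz", "big_export")
```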

12. For the fastest loading speed, run only one load job at a time. If that is not feasible, run a minimal number of loads concurrently. If you expect a large loading job, consider scaling up your dedicated SQL pool before the load.

13. To run loads with appropriate compute resources, create loading users designated for running loads. Assign each loading user to a specific resource class or workload group. To run a load, sign in as one of the loading users and then run the load; it runs with that user's resource class.
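A minimal sketch of that setup via pyodbc follows; the server, passwords, schema, and the choice of staticrc60 are all placeholder assumptions (CREATE LOGIN runs against master, the rest against the dedicated SQL pool database):

```python
# Sketch: create a dedicated loading user and pin it to a static resource class.
import pyodbc

def run(database: str, *statements: str) -> None:
    """Execute statements against the given database on the workspace server."""
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER=<workspace>.sql.azuresynapse.net;DATABASE={database};"
        "UID=sqladmin;PWD=<admin-password>",
        autocommit=True,
    )
    cur = conn.cursor()
    for statement in statements:
        cur.execute(statement)
    conn.close()

run("master", "CREATE LOGIN loader_etl WITH PASSWORD = '<strong-password>';")
run("mydw",
    "CREATE USER loader_etl FOR LOGIN loader_etl;",
    "GRANT INSERT ON SCHEMA::staging TO loader_etl;",
    "EXEC sp_addrolemember 'staticrc60', 'loader_etl';")  # a larger static resource class
```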

14. Allow multiple users to load data; to limit what a loading user can touch, DENY permissions at the schema level.

15. Load to a staging table – Load data into a staging table first. Define the staging table as a heap and use round-robin for the distribution option. Loading is usually a two-step process: you first load to a staging table and then insert the data into a production data warehouse table. If the production table uses a hash distribution, the total time to load and insert might be faster if you define the staging table with the same hash distribution. Loading to the staging table takes longer, but the second step of inserting the rows into the production table then incurs no data movement across the distributions.
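Sketching the two-step pattern (all table names, columns, and the connection string are illustrative placeholders):

```python
# Sketch of pointer 15: land data in a ROUND_ROBIN heap staging table,
# then insert into the hash-distributed production table.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=mydw;"
    "UID=loader_etl;PWD=<password>", autocommit=True)
cur = conn.cursor()

# Staging table: heap (no clustered columnstore) + ROUND_ROBIN for fast loading.
cur.execute("""
CREATE TABLE staging.Sales
WITH (HEAP, DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM prod.Sales WHERE 1 = 0;
""")

# ... load into staging.Sales here (COPY, PolyBase, etc.) ...

# Second step: the insert redistributes rows into the hash-distributed table.
cur.execute("INSERT INTO prod.Sales SELECT * FROM staging.Sales;")
conn.close()
```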

16. Load to a columnstore index

-> To ensure the loading user has enough memory to achieve maximum compression rates, use loading users that are members of a medium or large resource class.

-> Load enough rows to completely fill new row groups. During a bulk load, every 1,048,576 rows get compressed directly into the columnstore as a full row group. Loads with fewer than 102,400 rows send the rows to the delta store, where they are held in a b-tree index. If you load too few rows, they might all go to the delta store and not get compressed immediately into columnstore format.
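For intuition, the back-of-envelope arithmetic below shows what those thresholds mean across a dedicated SQL pool's 60 distributions:

```python
# A dedicated SQL pool spreads a table over 60 distributions, so the rowgroup
# thresholds above multiply out per load:
DISTRIBUTIONS = 60
FULL_ROWGROUP = 1_048_576   # rows compressed directly into a full rowgroup
DELTA_THRESHOLD = 102_400   # below this, rows land in the delta store

print("rows to fill one rowgroup per distribution:", DISTRIBUTIONS * FULL_ROWGROUP)
# -> 62,914,560
print("minimum rows to bypass the delta store everywhere:", DISTRIBUTIONS * DELTA_THRESHOLD)
# -> 6,144,000
```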

17. Loading with the COPY statement provides the highest throughput with dedicated SQL pools. If you cannot use COPY to load and must use the SqlBulkCopy API or bcp, a batch size between 100 K and 1 M rows is the recommended baseline for determining optimal batch size capacity.
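A hedged sketch of a COPY-based load through pyodbc is below; the storage URL, table, and authentication are placeholders, and a managed identity is assumed for the storage credential:

```python
# Sketch: run a COPY INTO load against the dedicated SQL pool.
import pyodbc

copy_sql = """
COPY INTO staging.Sales
FROM 'https://<account>.blob.core.windows.net/loads/sales/*.csv.gz'
WITH (
    FILE_TYPE = 'CSV',
    COMPRESSION = 'GZIP',
    FIRSTROW = 2,
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=mydw;"
    "UID=loader_etl;PWD=<password>", autocommit=True)
conn.cursor().execute(copy_sql)
conn.close()
```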

18. PolyBase is the best choice when you are loading or exporting large volumes of data, or you need faster performance.

19. Azure Active Directory (Azure AD)-only authentication is not supported for dedicated SQL pools with Azure Synapse features enabled.

20. A private endpoint is a network interface that uses a private IP address from your virtual network. This network interface connects you privately and securely to a service that's powered by Azure Private Link.

Inbound connections – connections coming in to access Synapse resources.

Endpoints provide the entry point for these incoming connections:

Dedicated SQL endpoint – dedicated SQL databases (TCP port 1433)

Serverless SQL endpoint – serverless SQL (TCP port 1433)

Development endpoint – Apache Spark pools & pipelines/data flows (TCP port 443)

How are we going to secure these 3 endpoints?

By default, all these endpoints are public. The Synapse workspace provides the below options to protect these endpoints from the public domain -

1. IP firewall

2. DDoS protection (Basic) - includes:

-> Active traffic monitoring

-> Always-on detection

-> Automatic attack mitigation

3. All inbound traffic to the workspace endpoints is encrypted in transit with TLS 1.2.

4. Disable public access (only applicable with a managed VNet). This ensures all inbound traffic to Synapse workspace resources goes through private endpoints only.

5. Each Synapse endpoint discussed above should have its own private endpoint – 3 separate private endpoints in total. These private endpoints are powered by Azure Private Link, and traffic stays entirely on the Microsoft backbone (see the sketch after this list).

6. These private endpoints can be accessed only from within the same virtual network, from other VNets that are globally or regionally peered to the VNet containing the private endpoint, or from on-premises resources via ExpressRoute or a VPN gateway.

7. Private endpoints prevent data leakage.

8. Private endpoint connections across tenants and regions are possible.
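As a rough illustration of item 5 above, a private endpoint for the dedicated SQL endpoint might be created with the azure-mgmt-network SDK as sketched below; every ID and name is a placeholder, and the group_ids value selects which Synapse endpoint the private endpoint fronts ('Sql', 'SqlOnDemand', or 'Dev'):

```python
# Sketch: create a private endpoint against a Synapse workspace's Sql endpoint.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.private_endpoints.begin_create_or_update(
    "my-rg",
    "pe-synapse-sql",
    {
        "location": "eastus",
        "subnet": {
            "id": "/subscriptions/<sub>/resourceGroups/my-rg/providers/"
                  "Microsoft.Network/virtualNetworks/my-vnet/subnets/pe-subnet"
        },
        "private_link_service_connections": [{
            "name": "synapse-sql",
            "private_link_service_id": "/subscriptions/<sub>/resourceGroups/my-rg/"
                                       "providers/Microsoft.Synapse/workspaces/my-ws",
            "group_ids": ["Sql"],  # or "SqlOnDemand" / "Dev" for the other two endpoints
        }],
    },
)
print(poller.result().provisioning_state)
```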

Outbound connections – connections going out of the Synapse workspace to access other resources and services.

Possible outgoing connections –

From the dedicated SQL pool & serverless SQL pool – connect to Azure Storage accounts

From Apache Spark pools & pipelines – connect to 90+ services via linked services (within the managed VNet)

1. Outbound connections from the managed VNet to other Azure services that support private endpoints can be made using managed private endpoint connections.

2. Outbound connections to ADLS Gen2 can be restricted using firewall restrictions on the storage account (a sketch follows this list).

3. Outbound network security is important to avoid data exfiltration. It is available only when the managed VNet is enabled, and this setting cannot be modified after the workspace has been created.
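A minimal sketch of item 2's storage-side restriction with the azure-mgmt-storage SDK; the subscription, resource group, and account names are placeholders:

```python
# Sketch: lock an ADLS Gen2 account's firewall so public traffic is denied
# by default; private endpoints and trusted Azure services still get through.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.storage_accounts.update(
    "my-rg",
    "mydatalake",
    {
        "network_rule_set": {
            "default_action": "Deny",
            "bypass": "AzureServices",
        }
    },
)
```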

21. When data exfiltration protection is enabled, the following scenarios are not possible –

1. From a Synapse pipeline, we cannot connect to a REST API or any service that is hosted publicly outside the approved tenant list / organization.

2. Authenticating through a service principal or OAuth.

3. Connecting to a Machine Learning workspace to run pipelines.

4. Many future scenarios where the outbound connection goes outside the approved tenants.

We can overcome these limitations by using either of the below approaches -

1. Use a self-hosted integration runtime – it is deployed outside the managed VNet and is fully managed by the customer in their own VNet.

2. Create separate Synapse workspaces: a secure zone (with access to secure data sources and exfiltration prevented) and a non-secure zone (with restricted access to the secure zone and the ability to access data from public data sources).

Conclusion

It's always a best practice to start with research, note down all the recommendations from Microsoft, and then map them to your requirements.

These pointers were a perfect fit for my use case, and I am sure they will be helpful for you as well.

Please feel free to reach out to me to discuss any of the topics mentioned above.

I would love to help. Enjoy learning and sharing!

Thank You All!


