Use T-SQL queries over your Azure Data Lake files with serverless SQL pool in Azure Synapse Analytics

One of the important features of Azure Synapse Analytics is the ability to analyze the files placed on Azure Data Lake Storage (ADLS) and Azure Blob Storage using Transact-SQL (T-SQL) language. Azure Synapse Analytics workspace provides dedicated and serverless SQL pools that represent the query endpoints that you can use to query large amounts of data using the Transact-SQL language. In this article, you will learn how to leverage Synapse serverless SQL pool to explore your files, build a relational layer over your storage files, and use serverless SQL pool features to analyze your files and implement a security mechanism that controls access to your data.

Querying files on Azure storage

There are two main scenarios that you will use to analyze the files on Azure storage:

  • Ad-hoc data exploration and analytics where you can read the content of files or explore file schema using the OPENROWSET function.
  • Designing a logical data warehouse (LDW) that represents a thin logical relational layer over your Azure Data Lake storage. LDW enables you to create external tables and define data access rules on top of your Azure Data Lake files. LDW is a logical adapter for any tool that can use Transact-SQL language and that needs to access the files instead of database tables.

The tools and applications that can execute T-SQL queries can read the content of files without noticing the difference between externally stored and standard in-database data. OPENROWSET function, external tables, and views represent abstractions on top of physical files that provide expected relational interface over the externally stored data.

You can use Azure Synapse Studio to query your files using a simple web interface without the need to install some tool on your computer. As an alternative, you can use standard desktop tools such as Azure Data Studio or SQL Server Management Studio that you are using to access Azure SQL or SQL Server databases. The tools like Power BI workspace or Power BI Desktop that use Transact-SQL language supported in serverless SQL endpoint in Azure Synapse analytics can read the content of files and create the reports on files.

You can analyze various file formats such as CSVPARQUET (including support for nested types), and JSON. You can either read a single file, specify complex file patterns with wildcards to read data from multiple files within Azure Data Lake storage, and even optimize performance using partition elimination with built-in file metadata functions FILEPATH and FILENAME. These are new functionalities introduced in Transact-SQL language that you can easily integrate with other standard SQL concepts such as views.

Securing access to your files

You can use advanced security mechanisms to control who can access your files and control access to your workspace, database objects, and data. The first security layer in any mission-critical system is network security control. You can configure IP firewall rules to control what are the IP addresses that can access your service, or set up Managed private endpoints that provide private IP addresses that can be accessed from within your VNet.

Once you define your network security strategy, you can allow users to authenticate in order to run the queries. Users can authenticate using Azure AD authentication with their own Azure identities or SQL authentication using a simple username and password. Azure AD authentication is an advanced authentication option that provides centralized user management and advanced security features such as Multi-factor authentication.

You can define advanced permission rules and control what external tables and views can be used to access the storage. You can use a granular permission model to GRANT or DENY access to some external tables, the groups of tables organized in schemas or databases. You can also GRANT or DENY access to individual users or the groups of users organized in database roles. This is a flexible and proven enterprise-grade security model that is used in Azure SQL Database and SQL Server database engines. You can also enable ad-hoc storage data analysts to directly read the content of files by granting them permission to execute the OPENROWSET function on particular storage accounts.

The third layer of security is storage access control where you can define an impersonation mechanism that database users can use to access the files on storage. The recommended impersonation mechanism is Azure Active Directory pass-through authentication where the identity of Azure AD user is directly provided to Azure Data Lake storage. This mechanism enables your Storage administrators to define who can access files on the storage layer by assigning Azure storage RBAC roles such as the Storage Data Reader role. As an alternative, your Storage administrator can provide Shared Access Signature that defines time-limited access to some files in storage. The third option is to allow the workspace to access files and impersonate the database user using Azure Synapse workspace identity. This option enables your Storage owner to allow access to a single identity and lets you fully control access using a database permission model.

Serverless consumption model

Serverless SQL queries in Azure Synapse Analytics are charged using a pay-per-use consumption model. There is no pre-provisioning of resources, so you don’t need to scale up or down your compute depending on your workload. Azure Synapse will dynamically allocate compute needed to process the data depending on your query complexity and the amount of data. You are paying only for processed data - see the pricing page for more details. Metadata queries and actions where you setup credentials and tables are not charged. Serverless querying leverages the full potential of cloud elasticity in Azure and delivers the main promise of cloud data computing where you can forget about the resource constraints and just use your service when you need it.

Conclusion

The serverless SQL pool in Azure Synapse Analytics is an excellent choice in the scenarios where you need to combine Transact-SQL language and big data analytics. Transact-SQL enables data analysts who are already familiar with SQL language to explore the Azure storage files with a minimal required learning curve. A logical data warehouse on top of Azure storage is a relational interface that enables the tools and applications that can query databases to transparently access Azure storage using the proven security model. In addition, the serverless consumption model enables you to forget about resource provisioning, manual scaling, and start querying the data and paying only for the resources you used. You can learn more about this service on Azure documentation and create your first workspace to see how it fits in your analytical solution.

要查看或添加评论,请登录

Jovan Popovic的更多文章

社区洞察

其他会员也浏览了