Analyze Lakehouse data with T-SQL in Microsoft Fabric
Jovan Popovic
Principal Program Manager at Microsoft, working on Microsoft Fabric Warehouse. Worked on Azure Synapse, Azure SQL Azure SQL Managed Instance, and SQL Server.
Microsoft Fabric is a unified data analytics platform that enables you to store your data in Lakehouse artifacts. A Lakehouse combines the best features that you can expect from data lakes and data warehouses. Lakehouses provide scalability and flexibility for data of any size. They also deliver performance and reliability for analytical queries that you have in the classic warehouses.
Microsoft Fabric Lakehouse artifact has a separation of compute (Spark engine) and storage (Microsoft Fabric One Lake) and supports data processing with the Spark languages (PySpark, Scala, SparkSQL) using the notebooks. In addition to classic notebooks for accessing data, Microsoft Fabric enables you to query your Lakehouse data using T-SQL language.
Microsoft Fabric provides an SQL endpoint every Lakehouse artifact that represents a query interface that lets you use T-SQL language and TDS protocol to query the Lakehouse data. In this post, I will show you what is the SQL endpoint and why you should use it for your Lakehouse analytics.
What is SQL endpoint?
An SQL endpoint is an analytical endpoint that allows you to use T-SQL language and TDS (Tabular Data Stream) protocol to query Lakehouse data.
A SQL endpoint is automatically created for your Lakehouse artifact when you create it in a Fabric workspace. You can find it in the Fabric workspace whenever you create a new Lakehouse.
The SQL endpoint has a "warehouse icon" because it is similar to the Warehouse artifact in terms of functionality and connectivity protocol. It lets you analyze your Lakehouse data using the same experience as in any other warehouse or database. You can use T-SQL language to query data, add SQL views, procedures and define security rules in SQL language. It acts as a bridge between the tools that can send T-SQL queries and Lakehouse artifact.
The only difference between the Warehouse artifact and the SQL endpoint is that SQL endpoint only allows you to analyze the Lakehouse data, not to update it. To update data in the Lakehouse, you still need to use Spark notebooks or dataflows. SQL endpoint is just an analytical endpoint on top of your Lakehouse data. Read this article to get mor einformation about the differences between Warehouse and SQL endpoint for Lakehouse.
You can use any tool that supports T-SQL querying over TDS connection (for example Power BI, Excel) to connect to the Lakehouse and query data. You just need to click on your SQL endpoint and get the connection string from the context menu.
This connection string will enable any tool that can connect to SQL Server or Azure SQL database to query your Lakehouse data.
With this connection string, you can use Power BI Report Builder, Excel, SQL Server Management Studio, or Azure Data Studio to connect to your Lakehouse data using the SQL endpoint.
领英推荐
You don't have to copy or move Lakehouse data to another database/warehouse to query it with SQL. You can query lake data directly with T-SQL if you connect to the SQL endpoint. Also, thanks to SQL endpoint, you can query Lakehouse data from Fabric warehouses using 3-part-name references.
On the picture below you can see the Fabric SQL editor that you can use to query your Fabric Warehouse from a browser. You might notice that we are in the context of warehouse but querying a lakehouse table using the 3-part-name reference.
This 3-part-name reference is querying Lakehouse table using the SQL endpoint.
Why use SQL endpoint?
T-SQL language is the preferred language for analyzing data in Microsoft data platform, so SQL endpoint is something that you will always use to analyze your data in Lakehouse scenario. Some of the benefits of using SQL endpoint are:
Use the SQL endpoint whenever you need to query your Lakhoeuse data using T-SQL/TDS and connect to your Lakehouse data using the tools from classic T-SQL ecosystem.
For example, Power BI datasets that represent data that will be shown on report are using underlying TDS/T-SQL connection via SQL endpoint to query lake data if you are using direct query mode for querying. If you are using Fabric web editor for querying Lakehouse data using TSQL, you are connected via SQL endpoint.
Whenever you see T-SQL language on top of your Lakehouse data, you are using the SQL endpoint for querying.
Conclusion
An SQL endpoint is an analytical API that enables you to query and analyze your Lakehouse data using T-SQL language and TDS protocol. It allows you to use your favorite tools and languages for accessing your Lakehouse data, without compromising on performance or flexibility.
To learn more about the SQL Endpoint, see?Better together: the lakehouse and warehouse in Microsoft Fabric.