Use shortcuts instead of external tables to reference external data in Fabric Warehouse

Use shortcuts instead of external tables to reference external data in Fabric Warehouse

Microsoft Fabric is a unified data and analytics platform that delivers a modern warehouse that integrates smoothly with Lakehouses, Power BI, and other data and analytics engines. It has all the capabilities that you need from a cutting-edge cloud data warehouse.

One interesting fact about Microsoft Fabric Warehouse is that it doesn't support external tables. If you need external tables to migrate your code you can vote for this idea Support Parquet/CSV external tables in Fabric Data Warehouse, but in this post I will show ou an interesting alternative.

Traditionally, external tables are used in modern warehouses to reference data on external lakes or Lakehouses. External tables are logical tables that reference data stored outside the warehouse.

However, in Microsoft Fabric warehouse, external tables are NOT available. As you can see in the Microsoft Fabric warehouse surface area documentation, there are no external tables among the supported objects. This is not a missing feature – external tables are not needed anymore in Fabric warehouse.

Instead, you can directly reference tables in Lakehouses within the same Fabric workspace using 3-part-names - for example: `MyLakehouse.dbo.MyTable`. This way, you can access Lakehouse data in the same workspace without creating external tables.

Sometimes, you may need to reference data outside your Fabric workspace. For example, you may want to access data in a Lakehouse or Warehouse placed in another Fabric workspace, an external Azure Data Lake Storage account, or an external Amazon S3 bucket. In these cases, you can use Lakehouse shortcuts to create virtual folders that point to the external data sources.

A Lakehouse shortcut is a special type of lake folders that references a folder in another Fabric Lakehouse/warehouse, ADLS, or S3. You can create a shortcut in your Lakehouse using the Fabric UI. Once you create a shortcut, it will appear as a regular folder in your Lakehouse. You can read data using the shortcut as if it were a local folder in your One Lake.

If you want to use the shortcuts in your Warehouse, your shortcut must reference Delta Lake folders and be placed under the /Tables folder in your Lakehouse. Then the shortcut will automatically appear as a classic table in a SQL endpoint for that Lakehouse. You can then query it using a 3-part-name from your Warehouse.

Another benefit of using shortcuts is that they encapsulate the access mechanism to the external data sources. You don't need to store the URI or credentials in your SQL database and synchronize them with other engines, such as Spark. The shortcuts handle the authentication and authorization for you, making it easier and more secure to access external data.

Limitations

Shortcuts are a great way to access external data sources in Fabric, but there are some limitations that you should be aware of when you start using them:

  • You need a separate Lakehouse to use shortcuts in your Warehouse. You cannot create shortcuts directly in your Warehouse artifact without a Lakehouse artifact. This means that you need to create a Lakehouse in addition to your warehouse, and reference external data via Lakehouse and 3-part-names.
  • You cannot create shortcuts programmatically in your Lakehouse. You have to manually set them up in the Fabric UI, which may be tedious if you have many shortcuts to create. Currently, there is no API or SDK to automate the creation of shortcuts.
  • You can only use shortcuts to access data in Delta Lake format. We strongly recommend that you use Delta Lake as your primary format for storing data. However, if you have data in other formats, such as Parquet, you need to convert them to Delta Lake to leverage Fabric warehouse and shortcuts.
  • A shortcut can only reference one Delta folder at a time. You cannot reference multiple Delta folders with one shortcut. If you want to access multiple Delta folders using shortcuts, you need to create multiple shortcuts, one for each Delta folder.

These are the limitations at the time of writing this article, but they may change in the future as we continue to improve Fabric. If you have any feedback or ideas on how to make the shortcuts better, or if you have any other suggestions for Fabric, please feel free to share them on the Microsoft Fabric ideas site in Warehouse or One Lake/Shortcuts categories or vote for the existing ones. Microsoft Fabric team values your input and we want to make Fabric the best data and analytics platform for you.

Conclusion

Shortcuts are a new way of referencing external data sources in Fabric. Unlike Synapse external tables or linked services that are tightly coupled with their Synapse engines, shortcuts are bound to the storage layer (One Lake) and are applicable to all engines in Fabric, including warehouse. This means that you can use shortcuts to access data from any Fabric workspace, Azure Data Lake Storage account, or Amazon S3 bucket, regardless of the engine you are using. You can also use shortcuts to reference external delta folders as tables in your warehouse.

Shortcuts are more flexible, secure, and consistent than external tables or linked services, which are limited to specific engines and require more configuration and maintenance. Therefore, we believe that shortcuts are a better solution for accessing external data in Fabric. The only scenario where you wod need external table is reading underlying Parquet/CSV formant becase the tables referenced via shortuct can read on Delta Lake format. If you think that shorcuts cannot replace external tables in your scenario, you can vote for this idea Support Parquet/CSV external tables in Fabric Data Warehouse, and explain what is the missing feature tht prevents you from using shortcuts.

I hope that this article has helped you understand how to use shortcuts in Fabric.

Thank you for Article. Can we copy these shortcut files into another Lakehouse in files folder, so that they can be accessed there? Since the shortcuts are supposed to be physically created everytime I want the files to be copied and present so that they can be accessed within Notebooks. Thank you appreciate your input.

回复
Syed Haris Sherazi

Microsoft Certified Data Engineer | Data Solutions Architect | Data Scientist | Database Developer

1 个月

Does SQL Server 2022 support integrating with Microsoft Fabric Warehouse/Lakehouse via PolyBase external data sources? My linked server (ODBC) works fine, but attempts using PolyBase fail

回复
Vamsi Behara

Data & AI Leader | Lead Data Architect | Driving Cloud, Lakehouse & Gen-AI Innovations | Trusted Advisor

1 个月

Hi Jovan Popovic, looks like External Tables are not supported even today in Lakehouse or Warehouse (though couldn't find any document to confirm it, but we were getting errors while trying to create ETs). So does this mean, by architecture, Fabric does not support ETs? And we can only create Managed Tables? And to access external data sets (for ex: from current ADLS Gen2 accounts), we must go with Shortcuts only(and it is mandatory the data sets has to be in Delta format only)?

回复
Mukesh P.

Software Engineer

4 个月

This is great Jovan Popovic, also now shortcuts api are available so you create them programmatically.

回复
Yongtao Teng

CSA at Microsoft

9 个月

Hi Jovan Popovic, I found there are some rest api in preview for manage shortcuts but most of them didnt support service principles but only works for user's token, are these apis will support service principles then we can managed them by code, thanks.

回复

要查看或添加评论,请登录

Jovan Popovic的更多文章

社区洞察

其他会员也浏览了