Virtualizing AWS data by using Fabric Shortcuts
Technical Problem
Before shortcuts were introduced in Microsoft Fabric, big data engineers had to build pipelines that read data from external sources, such as Amazon Web Services (AWS) S3 buckets, and wrote it into Azure Data Lake Storage. These duplicated copies risk becoming stale over time. Additionally, compute might be wasted on copying data that is used only once. Because many of today’s companies are the product of mergers and acquisitions, your company’s data landscape might span multiple cloud vendors. How can we virtualize the data stored in S3 buckets in our Microsoft Fabric Lakehouse design?
Business Problem
Our manager has asked us to create a virtualized data lake using AWS S3 buckets and a Microsoft Fabric Lakehouse. The new shortcut feature will be used to link to both files and delta tables. Most of the article centers on setting up an AWS trial account, loading data into S3 buckets, and creating a service account to access those buckets. Once the data is linked, a little extra work is needed to either create a managed delta table from the linked files or test the linked delta tables. By the end of this article, the big data engineer will be comfortable using Microsoft Fabric Shortcuts with AWS S3 buckets as a source.
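The service account that Fabric uses to reach the buckets only needs read access. Below is a minimal sketch, assuming the service account is an IAM user and that the shortcut needs the three read actions Microsoft's S3 shortcut documentation lists (`s3:GetBucketLocation`, `s3:ListBucket`, `s3:GetObject`); check the current documentation before relying on it. The helper name `build_s3_shortcut_read_policy` and the bucket name `my-fabric-demo-bucket` are hypothetical.

```python
import json


def build_s3_shortcut_read_policy(bucket_name: str) -> dict:
    """Build a read-only IAM policy document for a single S3 bucket.

    Hypothetical helper. The actions are assumed to be the minimum a
    Fabric S3 shortcut needs; verify against the current Fabric docs.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions are granted on the bucket ARN itself.
                "Sid": "AllowBucketListing",
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level reads are granted on every key in the bucket.
                "Sid": "AllowObjectReads",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-fabric-demo-bucket" is a placeholder; substitute your bucket name.
    policy = build_s3_shortcut_read_policy("my-fabric-demo-bucket")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline or customer-managed policy and attached to the service account whose access key and secret the Fabric S3 connection uses. Splitting the statement keeps bucket-level and object-level permissions on the resource ARNs they actually apply to, so the account cannot write or delete anything.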
Learn More
Please see the recent article on SQL Server Central for all the details and examples.