Getting Your Hands Dirty with Microsoft Fabric: A Beginner's Guide (Part 1)
Traditional large-scale data analytics solutions have relied on data warehouses and SQL queries to store and retrieve data. However, the rise of big data, characterized by the vast volume, variety, and velocity of new data, along with affordable storage and cloud-based distributed computing, has introduced a new approach: the data lake. Unlike data warehouses, data lakes store information as files without a fixed schema.
To bridge the gap between data warehouses and data lakes, data engineers and analysts are increasingly turning to a hybrid solution called a data lakehouse. Microsoft Fabric offers a powerful lakehouse solution that combines the scalability of file storage in a OneLake store (built on Azure Data Lake Storage Gen2) with a relational metadata layer based on the open-source Delta Lake table format. This allows you to store data in files within your data lake and apply a relational schema to enable SQL queries.
With Microsoft Fabric, you can harness the advantages of both data warehouses and data lakes, empowering you to define tables and views using Delta Lake's schema capabilities and query them using familiar SQL semantics.
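To make that concrete, here is a minimal sketch of the kind of query this enables once a Delta table exists in your lakehouse. The orders table and its columns are hypothetical stand-ins, not part of the exercise that follows:
-- Hypothetical Delta table queried with familiar SQL semantics;
-- the table and column names are illustrative only.
SELECT OrderDate, SUM(Amount) AS TotalAmount
FROM orders
GROUP BY OrderDate;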
Setting Up Your Workspace in Microsoft Fabric
Setting up a workspace in Microsoft Fabric is the first step towards unlocking its data capabilities. By following these simple instructions, you'll be ready to work with data seamlessly:
1. Access Microsoft Fabric by visiting https://app.fabric.microsoft.com and logging in.
2. In the menu bar on the left-hand side, select the Workspaces option.
3. Create a new workspace by choosing a name that suits your needs. Make sure to select a licensing mode that includes Fabric capacity, such as Trial, Premium, or Fabric.
4. Once your new workspace is created, it will open up, appearing empty and ready for you to start leveraging its potential.
Creating a Data Lakehouse in Microsoft Fabric
Now that you have established your workspace, it's time to dive into the Data Engineering experience within the portal and build a data lakehouse to house your valuable data files. Follow these steps to get started:
1. At the bottom-left corner of the Power BI portal, switch to the Data Engineering experience.
2. The Data Engineering home page presents you with a range of tiles that allow you to effortlessly create commonly used data engineering assets.
3. Within the Data Engineering home page, initiate the creation of a new Lakehouse by providing it with a name that suits your purpose.
4. After a short wait, a brand new lakehouse will be generated, opening up exciting possibilities for exploration.
5. Take note of the Lakehouse explorer pane on the left-hand side, which grants you access to browse through the tables and files residing in your lakehouse.
- The Tables folder hosts tables that can be queried using SQL semantics. These tables in Microsoft Fabric's lakehouse adhere to the open-source Delta Lake file format, widely employed in Apache Spark.
- The Files folder contains the data files stored in the OneLake storage, specifically designated for your lakehouse. This folder also allows you to create shortcuts for referencing externally stored data.
Currently, your lakehouse does not contain any tables or files, providing you with a clean slate to begin your data exploration journey.
Ingesting Data into Your Microsoft Fabric Lakehouse
To enrich your lakehouse with valuable data, Microsoft Fabric offers various methods for data ingestion. While pipelines and data flows provide advanced options for copying data from external sources, one of the simplest approaches is uploading files or folders directly from your local computer. Follow these steps to seamlessly upload data into your lakehouse:
1. Download the sample sales data from Kaggle at https://www.kaggle.com/datasets/kyanyoga/sample-sales-data and save the file as sales.csv on your local computer.
2. Return to the web browser tab where your lakehouse is open. Within the Lakehouse explorer pane, access the Files folder and click on the ellipsis (...) menu. From the dropdown menu, select "New subfolder" to create a subfolder named "data".
3. In the ellipsis menu for the newly created data folder, choose "Upload" and then select "Upload file". Proceed to upload the sales.csv file from your local computer.
4. Once the upload is complete, navigate to the Files/data folder and verify that the sales.csv file has been successfully uploaded. It should be visible in the folder's contents.
5. For a preview of the uploaded file's contents, simply select the sales.csv file.
By following these steps, you can effortlessly bring data into your Microsoft Fabric lakehouse, empowering you to explore and analyze the uploaded sales.csv file and unlock its insights.
Transforming File Data into Queryable Tables
To enhance the usability of the uploaded sales data, Microsoft Fabric provides the capability to load the data from a file into a table. This enables data analysts and engineers to leverage SQL queries for efficient data exploration. Follow these steps to seamlessly load file data into a table:
1. Begin by navigating to the Home page and selecting the Files/data folder. Here, you will find the sales.csv file that was previously uploaded.
2. Access the ellipsis (...) menu for the sales.csv file, and choose "Load to Tables" from the options provided.
3. In the Load to table dialog box, assign a suitable name to the table, such as "sales", and confirm the load operation. Now, patiently wait for the table creation and data loading process to complete.
4. Within the Lakehouse explorer pane, locate the newly created "sales" table to gain visibility into its data.
5. To explore the underlying files associated with the sales table, access the ellipsis (...) menu for the table and select "View files".
It's worth noting that the files for a Delta table are stored in the Parquet format, alongside a subfolder named "_delta_log" that records the transaction details applied to the table.
By following these steps, you can transform the uploaded sales data into a queryable table, opening up a world of possibilities for data analysis and extraction of valuable insights in Microsoft Fabric.
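Once the table exists, a quick sanity check is worth running. The following is a minimal sketch using standard T-SQL (which is what the lakehouse's SQL endpoint, introduced in the next section, speaks), assuming the table was named "sales" as above:
-- Preview a few rows to confirm the load succeeded.
SELECT TOP 10 * FROM sales;
-- Confirm the total number of rows loaded.
SELECT COUNT(*) AS TotalRows FROM sales;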
Unlocking Data Exploration with SQL
Once you have created a lakehouse and defined tables within it, a powerful SQL endpoint is automatically generated. This endpoint allows you to query your tables effortlessly using SQL SELECT statements, enabling seamless data exploration. Follow these steps to leverage the SQL endpoint in your Microsoft Fabric lakehouse:
1. On the top-right corner of the Lakehouse page, locate the switch that toggles between Lakehouse and SQL endpoint. Click on it to switch to the SQL endpoint mode.
2. Allow a brief moment for the SQL query endpoint to initialize. You will then be presented with a visual interface from which you can query the tables in your lakehouse.
3. To begin querying, utilize the "New SQL query" button, which will open a query editor.
4. In the query editor, enter the desired SQL query. For example, the following query calculates the average quantity ordered and the average price for each product code:
SELECT PRODUCTCODE, AVG(QUANTITYORDERED) AS AvgQuantityOrdered, AVG(PRICEEACH) AS AvgPrice
FROM sales
GROUP BY PRODUCTCODE;
5. To execute the query and view the results, simply click on the "▷ Run" button.
By following these steps, you can tap into the potential of the SQL endpoint in your Microsoft Fabric lakehouse, enabling you to perform SQL queries effortlessly and obtain valuable insights from your data.
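As a further example, and as a preview of the report you'll build in the next section, a query like the following aggregates revenue by product line (assuming the same "sales" table; PRODUCTLINE and SALES are columns in the sample dataset):
-- Total revenue per product line, highest first.
SELECT PRODUCTLINE, SUM(SALES) AS TotalSales
FROM sales
GROUP BY PRODUCTLINE
ORDER BY TotalSales DESC;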
Crafting Powerful Reports with Power BI
Within your Microsoft Fabric lakehouse, the tables you define automatically become part of a default dataset, forming the foundation for reporting and analysis with Power BI. Follow these steps to harness the potential of default datasets and create insightful reports:
1. At the bottom of the SQL Endpoint page, locate and select the "Model" tab. This tab reveals the data model schema associated with the dataset.
2. In the menu ribbon, navigate to the "Reporting" tab and click on "New report". This action opens a new browser tab dedicated to designing your report.
3. Within the Data pane on the right-hand side, expand the "sales" table. Select the desired fields for your report, such as "PRODUCTLINE" and "SALES".
4. A table visualization is automatically added to the report, showcasing the selected data.
5. To optimize the workspace, hide the Data and Filters panes, creating more space for report design. Ensure the table visualization is selected, and in the Visualizations pane, customize the visualization by converting it into a Clustered bar chart. Adjust the size and appearance of the chart as desired.
6. To save the progress made, go to the File menu and select "Save". Save the report as "Product Sales Report" in the workspace you previously created.
7. Close the browser tab containing the report to return to the SQL endpoint for your lakehouse. In the hub menu bar on the left, select your workspace to confirm that it now includes the following elements:
- Your lakehouse
- The SQL endpoint for your lakehouse
- A default dataset representing the tables within your lakehouse
- The "Product Sales Report" report
By following these steps, you can harness the default datasets in Microsoft Fabric's Lakehouse to create visually appealing and insightful reports using Power BI. This empowers you to gain valuable business intelligence and make data-driven decisions with ease.
Cleaning Up Your Resources
After successfully creating a lakehouse and importing data into it, it's essential to ensure efficient resource management. This section will guide you through the process of cleaning up your resources. Let's dive in:
1. If you have concluded your exploration of the lakehouse and no longer require the workspace created for this exercise, it's time to remove it.
2. On the left-hand side of the interface, locate and select the icon representing your workspace. This action displays all the items contained within the workspace.
3. In the toolbar, access the ellipsis (...) menu and click on "Workspace settings".
4. Within the "Other" section of the settings, find and select the option "Remove this workspace".
By following these steps, you can effectively tidy up your resources, eliminating the workspace associated with the exercise. This ensures efficient resource utilization within your Microsoft Fabric Lakehouse environment.