Getting Your Hands Dirty with Microsoft Fabric: A Beginner's Guide (Part 1)
Traditional large-scale data analytics solutions have relied on data warehouses and SQL queries to store and retrieve data. However, the rise of big data, characterized by the vast volume, variety, and velocity of new data, along with affordable storage and cloud-based distributed computing, has introduced a new approach: the data lake. Unlike data warehouses, data lakes store information as files without a fixed schema.
To bridge the gap between data warehouses and data lakes, data engineers and analysts are increasingly turning to a hybrid solution called a data lakehouse. Microsoft Fabric offers a powerful lakehouse solution that combines the scalability of file storage in a OneLake store (built on Azure Data Lake Storage Gen2) with a relational metadata layer based on the open-source Delta Lake table format. This allows you to store data in files within your data lake and apply a relational schema to enable SQL queries.
With Microsoft Fabric, you can harness the advantages of both data warehouses and data lakes, empowering you to define tables and views using Delta Lake's schema capabilities and query them using familiar SQL semantics.
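To make that concrete, here is a minimal sketch of the kind of query this enables once a Delta table exists in your lakehouse. The orders table and its columns are hypothetical stand-ins, not part of the exercise that follows:
-- Hypothetical Delta table queried with familiar SQL semantics;
-- the table and column names are illustrative only.
SELECT OrderDate, SUM(Amount) AS TotalAmount
FROM orders
GROUP BY OrderDate;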
Setting Up Your Workspace in Microsoft Fabric
Setting up a workspace in Microsoft Fabric is the first step towards unlocking its data capabilities. By following these simple instructions, you'll be ready to work with data seamlessly:
1. Access Microsoft Fabric by visiting https://app.fabric.microsoft.com and logging in.
2. In the menu bar on the left-hand side, select the Workspaces option.
3. Create a new workspace by choosing a name that suits your needs. Make sure to select a licensing mode that includes Fabric capacity, such as Trial, Premium, or Fabric.
4. Once your new workspace is created, it will open up, appearing empty and ready for you to start leveraging its potential.
Creating a Data Lakehouse in Microsoft Fabric
Now that you have established your workspace, it's time to dive into the Data Engineering experience within the portal and build a data lakehouse to house your valuable data files. Follow these steps to get started:
1. At the bottom-left corner of the Power BI portal, switch to the Data Engineering experience.
2. The Data Engineering home page presents you with a range of tiles that allow you to effortlessly create commonly used data engineering assets.
3. Within the Data Engineering home page, initiate the creation of a new Lakehouse by providing it with a name that suits your purpose.
4. After a short wait, a brand new lakehouse will be generated, opening up exciting possibilities for exploration.
5. Take note of the Lakehouse explorer pane on the left-hand side, which grants you access to browse through the tables and files residing in your lakehouse.
- The Tables folder hosts tables that can be queried using SQL semantics. These tables in Microsoft Fabric's lakehouse adhere to the open-source Delta Lake file format, widely employed in Apache Spark.
- The Files folder contains the data files stored in the OneLake storage, specifically designated for your lakehouse. This folder also allows you to create shortcuts for referencing externally stored data.
Currently, your lakehouse does not contain any tables or files, providing you with a clean slate to begin your data exploration journey.
Ingesting Data into Your Microsoft Fabric Lakehouse
To enrich your lakehouse with valuable data, Microsoft Fabric offers various methods for data ingestion. While pipelines and data flows provide advanced options for copying data from external sources, one of the simplest approaches is uploading files or folders directly from your local computer. Follow these steps to seamlessly upload data into your lakehouse:
1. Download the sample sales data from Kaggle at https://www.kaggle.com/datasets/kyanyoga/sample-sales-data and save the file as sales.csv on your local computer.
2. Return to the web browser tab where your lakehouse is open. Within the Lakehouse explorer pane, access the Files folder and click on the ellipsis (...) menu. From the dropdown menu, select "New subfolder" to create a subfolder named "data".
3. In the ellipsis menu for the newly created data folder, choose "Upload" and then select "Upload file". Proceed to upload the sales.csv file from your local computer.
4. Once the upload is complete, navigate to the Files/data folder and verify that the sales.csv file has been successfully uploaded. It should be visible in the folder's contents.
5. For a preview of the uploaded file's contents, simply select the sales.csv file.
By following these steps, you can effortlessly bring data into your Microsoft Fabric lakehouse, empowering you to explore and analyze the uploaded sales.csv file and unlock its insights.
Transforming File Data into Queryable Tables
To enhance the usability of the uploaded sales data, Microsoft Fabric provides the capability to load the data from a file into a table. This enables data analysts and engineers to leverage SQL queries for efficient data exploration. Follow these steps to seamlessly load file data into a table:
1. Begin by navigating to the Home page and selecting the Files/data folder. Here, you will find the sales.csv file that was previously uploaded.
2. Access the ellipsis (...) menu for the sales.csv file, and choose "Load to Tables" from the options provided.
3. In the Load to table dialog box, assign a suitable name to the table, such as "sales", and confirm the load operation. Now, patiently wait for the table creation and data loading process to complete.
4. Within the Lakehouse explorer pane, locate the newly created "sales" table to gain visibility into its data.
5. To explore the underlying files associated with the sales table, access the ellipsis (...) menu for the table and select "View files".
It's worth noting that the files for a Delta table are stored in the Parquet format, alongside a subfolder named "_delta_log" that records the transaction details applied to the table.
By following these steps, you can transform the uploaded sales data into a queryable table, opening up a world of possibilities for data analysis and extraction of valuable insights in Microsoft Fabric.
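Once the table exists, a quick sanity check is worth running. The following is a minimal sketch using standard T-SQL (which is what the lakehouse's SQL endpoint, introduced in the next section, speaks), assuming the table was named "sales" as above:
-- Preview a few rows to confirm the load succeeded.
SELECT TOP 10 * FROM sales;
-- Confirm the total number of rows loaded.
SELECT COUNT(*) AS TotalRows FROM sales;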
Unlocking Data Exploration with SQL
Once you have created a lakehouse and defined tables within it, a powerful SQL endpoint is automatically generated. This endpoint allows you to query your tables effortlessly using SQL SELECT statements, enabling seamless data exploration. Follow these steps to leverage the SQL endpoint in your Microsoft Fabric lakehouse:
1. On the top-right corner of the Lakehouse page, locate the switch that toggles between Lakehouse and SQL endpoint. Click on it to switch to the SQL endpoint mode.
2. Allow a brief moment for the SQL query endpoint to initialize. You will then be presented with a visual interface from which you can query the tables in your lakehouse.
3. To begin querying, utilize the "New SQL query" button, which will open a query editor.
4. In the query editor, enter the desired SQL query. For example, the following query calculates the average quantity ordered and the average price for each product code:
SELECT PRODUCTCODE, AVG(QUANTITYORDERED) AS AvgQuantityOrdered, AVG(PRICEEACH) AS AvgPrice
FROM sales
GROUP BY PRODUCTCODE;
5. To execute the query and view the results, simply click on the "▷ Run" button.
By following these steps, you can tap into the potential of the SQL endpoint in your Microsoft Fabric lakehouse, enabling you to perform SQL queries effortlessly and obtain valuable insights from your data.
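As a further example, and as a preview of the report you'll build in the next section, a query like the following aggregates revenue by product line (assuming the same "sales" table; PRODUCTLINE and SALES are columns in the sample dataset):
-- Total revenue per product line, highest first.
SELECT PRODUCTLINE, SUM(SALES) AS TotalSales
FROM sales
GROUP BY PRODUCTLINE
ORDER BY TotalSales DESC;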
Crafting Powerful Reports with Power BI
Within your Microsoft Fabric lakehouse, the tables you define automatically become part of a default dataset, forming the foundation for reporting and analysis with Power BI. Follow these steps to harness the potential of default datasets and create insightful reports:
1. At the bottom of the SQL Endpoint page, locate and select the "Model" tab. This tab reveals the data model schema associated with the dataset.
2. In the menu ribbon, navigate to the "Reporting" tab and click on "New report". This action opens a new browser tab dedicated to designing your report.
3. Within the Data pane on the right-hand side, expand the "sales" table. Select the desired fields for your report, such as "PRODUCTLINE" and "SALES".
4. A table visualization is automatically added to the report, showcasing the selected data.
5. To optimize the workspace, hide the Data and Filters panes, creating more space for report design. Ensure the table visualization is selected, and in the Visualizations pane, customize the visualization by converting it into a Clustered bar chart. Adjust the size and appearance of the chart as desired.
6. To save the progress made, go to the File menu and select "Save". Save the report as "Product Sales Report" in the workspace you previously created.
7. Close the browser tab containing the report to return to the SQL endpoint for your lakehouse. In the hub menu bar on the left, select your workspace to confirm that it now includes the following elements:
- Your lakehouse
- The SQL endpoint for your lakehouse
- A default dataset representing the tables within your lakehouse
- The "Product Sales Report" report
By following these steps, you can harness the default datasets in Microsoft Fabric's Lakehouse to create visually appealing and insightful reports using Power BI. This empowers you to gain valuable business intelligence and make data-driven decisions with ease.
Cleaning Up Your Resources
After successfully creating a lakehouse and importing data into it, it's essential to ensure efficient resource management. This section will guide you through the process of cleaning up your resources. Let's dive in:
1. If you have concluded your exploration of the lakehouse and no longer require the workspace created for this exercise, it's time to remove it.
2. On the left-hand side of the interface, locate and select the icon representing your workspace. This action displays all the items contained within the workspace.
3. In the toolbar, access the ellipsis (...) menu and click on "Workspace settings".
4. Within the "Other" section of the settings, find and select the option "Remove this workspace".
By following these steps, you can effectively tidy up your resources, eliminating the workspace associated with the exercise. This ensures efficient resource utilization within your Microsoft Fabric Lakehouse environment.