Seamless analytics with Microsoft Fabric
End-to-End analytics using Microsoft Fabric

Holistic data analysis using Microsoft Fabric

Microsoft Fabric is a unified SaaS analytics platform that removes data silos so that data engineers, analysts, and scientists can work together in one product on data modeling, transformation, and visualization. Direct Lake mode gives Power BI a more direct connection to the data, and data science techniques are integrated natively. Because Fabric is delivered as SaaS, workloads can be provisioned and run quickly, and resources can be scaled as business needs change. Its low-code and no-code tools also make it accessible to a wider range of users.

Personas in Microsoft Fabric

Microsoft Fabric provides a suite of analytics experiences for specific tasks, including:

  • Data Engineering: A top-tier Spark platform with superior authoring capabilities, enabling data engineers to perform large-scale data transformations.
  • Data Factory: Integrates the user-friendliness of Power Query with the scalability and power of Azure Data Factory, utilizing over 200 native connectors for on-premises and cloud data sources.
  • Data Science: Lets data scientists and analysts build, deploy, and manage machine learning models within Fabric to produce predictive insights.
  • Data Warehouse: Offers top-tier SQL performance and scalability, with a complete separation of compute and storage for independent scaling, and native data storage in the open Delta Lake.
  • Real-Time Analytics: A top-tier engine for observational data analytics, generating actionable insights from real-time data.
  • Data Activator: Automatically triggers actions when patterns or conditions are detected in your data, with no code required.
  • Power BI: The business intelligence platform that lets users make informed decisions quickly and intuitively from their data.

Microsoft Fabric lakehouses

Microsoft Fabric - Lake-centric and Open

In Microsoft Fabric, a lakehouse can be created in any premium workspace. Once it exists, data can be loaded from various sources, and ingestion can be automated. Fabric shortcuts provide access to data stored externally, and the Lakehouse Explorer lets you browse files, folders, and tables. Data can be explored and transformed using Notebooks or Dataflows (Gen2), while Data Factory pipelines orchestrate more complex transformations. Transformed data can then be queried with SQL, used to train machine learning models, analyzed in real time, or reported on in Power BI. Data governance policies can also be applied to the lakehouse.

Ingesting Data into a Lakehouse

There are several methods to load data into a Fabric lakehouse:

  • Upload: Local files or folders are uploaded to the lakehouse, where the file data can be explored, processed, and loaded into tables.
  • Dataflows (Gen2): Data is imported from various sources, transformed using Power Query Online, and loaded directly into a table.
  • Notebooks: Used for data ingestion, transformation, and loading into tables or files.
  • Data Factory pipelines: Data copying and processing activities are orchestrated, with results loaded into tables or files.

Accessing Data Using Shortcuts

Microsoft Fabric shortcuts give access to data stored outside the lakehouse, which is useful for integrating data from various sources without copying it. OneLake manages the source permissions and credentials, and the identity of the calling user is used to authorize access to the data. Shortcuts appear as folders, can be created in lakehouses and KQL databases, and can be queried by Spark, SQL, Real-Time Analytics, and Analysis Services.

Explore and transform data in a lakehouse

Once data is loaded, a Microsoft Fabric lakehouse offers several tools for data exploration and transformation:

  • Apache Spark: Processes data using Spark pools via Notebooks or Spark Job Definitions.
  • Notebooks: Interactive interfaces for data reading, transformation, and writing.
  • Spark Job Definitions: Scripts for on-demand or scheduled data processing.
  • SQL Endpoint: Allows Transact-SQL queries for data exploration in lakehouse tables.
  • Dataflows (Gen2): Besides ingesting data, they perform transformations via Power Query.
  • Data Pipelines: Orchestrate complex transformations through a sequence of activities.

Optimized file formats

Human-readable formats for structured and semi-structured data have their advantages, but they are usually not designed for storage efficiency or processing speed. Over the years, specialized file formats that support compression and indexing have been created to make storage and processing more efficient.

  • Avro, an Apache project, is a row-based format that stores the schema as JSON in a header and the data itself as binary, which allows efficient compression and storage.
  • ORC, developed by Hortonworks for Apache Hive, is a columnar format that optimizes read/write operations and stores data in stripes, each containing column data and statistical information.
  • Parquet, a columnar format by Cloudera and Twitter, stores data in row groups and excels in handling nested data types. It uses metadata for quick data retrieval and supports efficient compression and encoding.
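To make the row-based versus columnar distinction more concrete, here is a minimal pure-Python sketch. It does not read or write Avro, ORC, or Parquet files. It only mimics the two layouts in memory, and the sample records, field names, and helper functions are hypothetical.

```python
from itertools import groupby

# Hypothetical sample records; the field names are illustrative only.
# A list of dicts mimics a row-based layout: each record is stored whole.
rows = [
    {"order_id": 1, "country": "US", "amount": 20.0},
    {"order_id": 2, "country": "US", "amount": 35.5},
    {"order_id": 3, "country": "DE", "amount": 12.0},
    {"order_id": 4, "country": "DE", "amount": 8.5},
]


def to_columns(records):
    """Pivot row-oriented records into a column-oriented layout."""
    return {name: [rec[name] for rec in records] for name in records[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one field reads a single list of values
# instead of visiting every field of every record.
total_amount = sum(columns["amount"])

# Values in one column share a type and often repeat, so simple
# encodings such as run-length encoding can shrink them considerably.
encoded_country = run_length_encode(columns["country"])
```

Real columnar formats combine this idea with per-column statistics and more elaborate encodings, but the basic reason analytical queries favor them is the same: a query that needs two columns does not have to read the other twenty.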

Medallion Lakehouse Architecture

The Medallion Lakehouse Architecture, often referred to simply as Medallion Architecture, is a design pattern that organizations use to arrange data in a lakehouse in a logical, layered way. It is the recommended design approach for Microsoft Fabric.

The architecture consists of three distinct layers, or zones. Each layer indicates the quality of the data stored in it, with higher layers holding higher-quality data. This multi-layered approach helps establish a single source of truth for enterprise data products. Notably, the ACID properties (Atomicity, Consistency, Isolation, and Durability) are preserved as data moves through the layers, because lakehouse tables are stored in the Delta Lake format, which supports ACID transactions.

The three layers of the Medallion Architecture are:

  1. Bronze (raw)
  2. Silver (validated)
  3. Gold (enriched)
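The flow through the three layers can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not how Fabric implements it. In a real lakehouse each layer would typically be a set of Delta tables populated by notebooks, dataflows, or pipelines. The sample records, the validation rules, and the function names `to_silver` and `to_gold` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in the bronze layer:
# everything is a string, and quality problems are kept as-is.
bronze = [
    {"order_id": "1", "country": "us", "amount": "20.0"},
    {"order_id": "2", "country": "US", "amount": "35.5"},
    {"order_id": "2", "country": "US", "amount": "35.5"},  # duplicate
    {"order_id": "3", "country": "de", "amount": "oops"},  # invalid amount
    {"order_id": "4", "country": "DE", "amount": "8.5"},
]


def to_silver(raw_records):
    """Validate, type-cast, and de-duplicate bronze records."""
    seen_ids = set()
    silver_records = []
    for rec in raw_records:
        try:
            clean = {
                "order_id": int(rec["order_id"]),
                "country": rec["country"].upper(),
                "amount": float(rec["amount"]),
            }
        except (KeyError, ValueError):
            # Skipped here; a real pipeline would likely quarantine such rows.
            continue
        if clean["order_id"] not in seen_ids:
            seen_ids.add(clean["order_id"])
            silver_records.append(clean)
    return silver_records


def to_gold(silver_records):
    """Aggregate validated records into a reporting-ready summary."""
    totals = defaultdict(float)
    for rec in silver_records:
        totals[rec["country"]] += rec["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the bronze data untouched is a deliberate choice in this pattern: if a validation rule turns out to be wrong, the silver and gold layers can be rebuilt from the raw records.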

Every Fabric tenant is automatically equipped with Microsoft OneLake, a single, unified, logical data lake for the entire organization, intended to be the sole location for all your analytics data.

Microsoft Fabric - Medallion Lakehouse Architecture

For more information on implementing the Medallion Architecture in Microsoft Fabric, refer to the articles and documentation below.

Microsoft Fabric – a unified analytics solution for the era of AI

Microsoft Fabric Lakehouse

Medallion Architecture in Fabric

Medallion Architecture using Databricks

Data Lakehouse using Azure Data Explorer


More articles by Sakthivel N.
