Fabric Swatches - The Warehouse - Part 1


Prelude

Welcome to the second article in the Fabric Swatches series, where we delve into the intricacies of Microsoft Fabric.

In this installment, we're diving into the heart of data systems: the data warehouse, and how it works in Fabric. Whether you're managing data or preparing it for analytics, understanding the Warehouse is essential to making the most of Fabric.


Warehouse Management

The Fabric Warehouse is a robust item designed to handle your data needs with efficiency, resiliency and security. Let’s explore the key aspects of managing your Warehouse in Microsoft Fabric:

Creation

Setting up a new Warehouse is straightforward, whether using the UI or the REST API.

Updating the structure of your warehouse can be achieved using T-SQL, Visual Queries, or Data Factory items (pipelines and Gen2 Dataflows). Changes to the warehouse can range from updating schemas and tables to managing views and stored procedures, or adding shortcuts to other warehouses or SQL analytics endpoints.
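As an illustration, here is a minimal T-SQL sketch of such structural changes (the schema, table, and view names are hypothetical):

```sql
-- Create a schema and a table in the warehouse (hypothetical names)
CREATE SCHEMA sales;
GO

CREATE TABLE sales.orders (
    order_id    INT NOT NULL,
    customer_id INT NOT NULL,
    order_date  DATE,
    amount      DECIMAL(18, 2)
);
GO

-- Add a view on top of the table
CREATE VIEW sales.daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM sales.orders
GROUP BY order_date;
```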

Source Control

One of the key features in any modern data platform is source control and warehouse versioning. Version control is integral to maintaining the integrity and traceability of your data and its structure.

Fabric's Warehouse supports native source control, allowing you to manage changes with confidence and react as needed if issues arise. This is achieved via Fabric's native Git integration, which stores the Warehouse structure as a SQL database project:

Fabric Warehouse Git synchronisation
As of the time of writing this article, two main limitations are to keep in mind:

  • Security constraints aren't versioned and should be re-applied by taking a post-deployment approach using SQL scripts.
  • Altering a table causes it to be dropped on the next deployment; follow these steps to bypass the issue.

Alongside the versioning of your warehouse structure, the data stored in a warehouse is also versioned up to a point. As mentioned in the first article of this series, the tables in Fabric are Delta tables, so reconstructing or accessing a prior version of the data is possible via:

  • Time-Travel: the ability to query a table or view as of a given timestamp, as follows: SELECT * FROM <table> OPTION (FOR TIMESTAMP AS OF '2024-07-01T10:00:00.000'); It is also possible to time-travel through stored procedures, views, or multiple tables in a join query. A default retention period of 30 days applies, with the possibility to extend it at added cost.
  • Delta Lake logs: a less user-friendly way of accessing the data is via the transaction logs for the Delta tables; the approach is documented here.
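As a sketch of the join case mentioned above, the time-travel hint applies to the whole statement, including every table in the join (table names here are hypothetical):

```sql
-- Query the state of both tables as of the same past timestamp
SELECT c.customer_name,
       SUM(o.amount) AS total_amount
FROM dbo.orders AS o
JOIN dbo.customers AS c
    ON c.customer_id = o.customer_id
GROUP BY c.customer_name
OPTION (FOR TIMESTAMP AS OF '2024-07-01T10:00:00.000');
```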

Monitoring the warehouse

Keeping an eye on your Warehouse is crucial for data security as well as performance optimization. Fabric offers advanced monitoring tools that give you real-time insights into query performance, resource usage, and potential bottlenecks. The main tools for monitoring the warehouse are:

  • Query Activity: use this view to monitor all query runs and access query insights (longest-running queries and most frequent queries)

  • DMVs: Dynamic Management Views. These are three system views (as of writing) allowing the administrator to monitor warehouse connections, sessions, and requests in each session:
    - sys.dm_exec_connections
    - sys.dm_exec_sessions
    - sys.dm_exec_requests
    Using these views allows you to quickly identify the queries and sessions responsible for performance issues and bottlenecks in the warehouse, insights which the administrator can then act on by terminating blocking requests. (Read more here)
  • Capacity Metrics App:

Fabric Capacity Metrics App
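The DMVs listed above can be combined to surface problem sessions. A hedged sketch follows; the column names mirror their SQL Server counterparts, so verify them against your warehouse:

```sql
-- List active requests alongside the sessions issuing them,
-- oldest first, to spot long-running or blocking work
SELECT r.session_id,
       r.start_time,
       r.status,
       r.command,
       s.login_name
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
    ON s.session_id = r.session_id
ORDER BY r.start_time ASC;

-- A blocking session can then be terminated by an admin:
-- KILL <session_id>;
```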

Performance management

To optimize performance in the warehouse, certain features are in place to keep data in memory, maintain fresh statistics on the data, and enhance write operations. These features are:

  • Caching: caching data in the warehouse is managed transparently for the end user, and is done in two possible ways:
    - In-memory cache: the data accessed during heavy compute queries (filtering, aggregating, ...) is compressed into a columnar format designed for in-memory workloads. This data is then made available to all sessions by being stored in memory, allowing for parallel processing.
    - Disk cache: when datasets are too large, the warehouse utilizes disk space as extension storage for the in-memory cache, and data is serialized from the latter to the former. This cache is also useful since it retains data for longer periods, allowing it to be moved back into memory when the data is requested.

  • Warehouse statistics: when querying the warehouse, the engine creates an execution plan for your query; the query optimizer then generates the optimal plan based on your warehouse statistics (rows, fields, types, counts, ...). Two types of statistics exist in Fabric:
    - Manual statistics: using the CREATE STATISTICS, UPDATE STATISTICS, and DROP STATISTICS statements, you can manage the stats associated with a given table. (More here)
    - Automatic statistics: good news, statistics are also managed for you by the query optimizer. For each query submitted, the engine will determine if statistics exist and create them if they don't, but it will also update the statistics if it detects that they no longer reflect the current state of the data.
  • V-Order: V-Order refers to the open-source-compliant operation of optimizing the way data is written to the underlying Parquet files, by applying a set of operations (sorting, compression, row-group redistribution, ...). This method allows the engine to use fewer resources when accessing the data later on.
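A minimal sketch of the manual-statistics statements mentioned above (the table, column, and statistics names are hypothetical):

```sql
-- Create single-column statistics to help the optimizer
CREATE STATISTICS stats_orders_customer
    ON dbo.orders (customer_id);

-- Refresh them after significant data changes
UPDATE STATISTICS dbo.orders (stats_orders_customer);

-- Drop them when no longer needed
DROP STATISTICS dbo.orders.stats_orders_customer;
```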

Security

Securing access in the Warehouse starts with access control at the workspace level via role-based access control. A warehouse can also be shared with read-only access to a specific user or group.

Granting access to SQL endpoints

In order to further control access to the data stored in the warehouse, granular object-level security needs to be implemented. Three security options are available:

  • RLS: row-level security refers to the implementation of security predicates that filter rows based on a specific condition
  • CLS: column-level security is a more straightforward approach that grants or denies access to specific columns
  • Data masking: the practice of letting the user query a field while fully or partially obfuscating the data in that field.

I added sample T-SQL for implementing these in my repository.
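As a hedged illustration of the three options (the schema, table, column, and principal names are hypothetical; verify syntax support against the current Fabric documentation):

```sql
-- Row-level security: a predicate function wired up via a security policy
CREATE SCHEMA sec;
GO
CREATE FUNCTION sec.fn_filter_region (@region AS VARCHAR(20))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS fn_result
       WHERE @region = USER_NAME();  -- only rows matching the caller
GO
CREATE SECURITY POLICY sec.region_filter
ADD FILTER PREDICATE sec.fn_filter_region(region) ON dbo.orders
WITH (STATE = ON);
GO

-- Column-level security: grant SELECT on a subset of columns only
GRANT SELECT ON dbo.orders (order_id, order_date, amount)
    TO [analyst@contoso.com];

-- Dynamic data masking: partially obfuscate a column for non-privileged users
ALTER TABLE dbo.customers
ALTER COLUMN email ADD MASKED WITH (FUNCTION = 'email()');
```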

Cloning

Fabric's Synapse Warehouse offers the possibility of creating clone tables. These are zero-copy clones that reference the same underlying data, avoiding data duplication and allowing segmentation of table structures to address different needs (development and testing, archiving, isolating use cases).

It is possible to create table clones from the current data, or from a previous point in time. (More here)
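A minimal sketch of both clone variants (table names are hypothetical):

```sql
-- Zero-copy clone of the current state of a table
CREATE TABLE dbo.orders_dev AS CLONE OF dbo.orders;

-- Zero-copy clone as of a previous point in time
CREATE TABLE dbo.orders_archive AS CLONE OF dbo.orders
    AT '2024-07-01T10:00:00.000';
```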

Restore Points

We all know that mistakes happen, so if you find yourself needing to roll back to a valid state from before your data was corrupted, or you're using your warehouse for testing purposes, Fabric offers the ability to restore your warehouse in-place to a set restore point.

Creation of restore points

By default, system-created restore points are created every 8 hours; you can also create your own restore points as needed. Restore points are kept for 30 days before being deleted, and this retention period is currently non-configurable. Keep in mind:

  • System created restore points can’t be deleted (in order to maintain SLAs)
  • You can restore to any restore point available as long as you are a Workspace Admin
  • User created restore points can be managed (created, deleted, renamed) by Contributor roles and above


Conclusion

The Fabric Warehouse is a versatile and powerful component of Microsoft Fabric, designed to meet the needs of modern data platforms and analytics.

Managing, securing and monitoring the warehouse is for the most part a straightforward journey.

Stay tuned for the second part of this article, where we'll talk about how to use the warehouse to ingest data, transform it, handle constraints, and more.

More articles by Achraf C.