Microsoft FABRIC
Christophe Hervouet
Data Strategy and Consulting (modern BI data platforms / organization / governance / architectures) -------- Modern BI Data Platforms Advisor (Organization / Governance and Architectures)
?????? ???? ?????????????? ???????? ?????????? ???? ?????????????? ?????????????? ?????????????????? ????????????
Updated on 29/09/2024
???? ???? ???? ?????? ???????? ???????? ,
INDEX
1) ???????? ?????????? : ?????????????? ???????? ?????????????? ???????????? ?????? ???????????????? & ???????????????????? -?????? ?????????? ?????????????????? ?????????? ??????????
2) ???????? ?????????? - ???????????????? : ?????????? ?? ???? ???????????? ?????? ???????????? ???????????????? ?????????? ???????? (?????????????? ???????? ???????? ???? ????????-??????????????)
3) ???????? ?????????????? (???????? ???????????????? & ??????????2) ?????? ??.?? ???????????????????? - ??????????????
4) ???????? ?????????????? ?????? ?????????? ?????????????????????? (?????????????? ?????????????????????????????? , ???????? ?????????????? , ???????? ?????? ?????????????????? ) - ??????????????
5) ???????? ?????????????? ???????? ????????????????????????
6) ?????????????????? ?????? ????????????????????????
7) ?????????????????? ?????? ?? ?????? ?????????????? ????????????????????????
8) ?????????? ???? ???????????????? ????????????????????????
9) ???????? ?????????? - ???????????? ???????????????? ???????????????????? ?????? ???????????????????????????? ????????????????????????
10) ?????????????????? ????????????????????????
11) ?????????? ???? ???????????????? ?????????? ???????? ?????????????????? ?????? ???????????????? ??????????????????????
12) ?????????? ???? ???????????????? ?????????? ???????? ?????????????????? ???????????????? ??????????????????????
13) ?????? , "????????" ???? ?????? ????????/?????????????????????? ?????????? ?????????????????????????????? - ?????????????? ?????????????? ???? ????????????
14) ???????? ?????????? - ???????????????????? ???????????????? ?????????????????? ??????. ????????????????????
15) ???????? ?????????? - ???????? ???? ???????? ?????????? ???????????????? ???????? ???????????? ??????????????????
======================================
1) ???????? ?????????? : I???????????? ???????? ??????????????? ???????????? ?????? ???????????????? & ???????????????????? - Target : DATA CITIZENS & IT
Push integration to the maximum by creating, in domain-restricted areas (spaces?), "ready to use" external source connections that several artifacts can share
- External resource connections such as REST APIs, SQL databases, ERPs, folder access, Key Vault, managed access, on-premises or public cloud
- Logical names shared with consumers and builders (user + ETL tool name (All, ..))
- Avoid sharing credentials (secrets, passwords, tokens ..) and Key Vault access with the many people who build ETL / ingestion flows
- Prevent creating a new "from scratch, I need credentials" connection in the ingestion tool itself <== only use the existing one
- Manage on-premises (via the on-premises gateways) and cloud data sources in the same place
- "One click" usage in data pipeline tasks, Dataflow Gen2, notebooks, event stream sources, Power BI, dataflows, Azure Python functions, mirroring sources, etc.
- Of course, Power BI Desktop also needs to see the data sources shared with me (access to this new per-domain data sources hub page)
- Governance: this will also avoid duplicated data source setups and provide data source usage (lineage) information
2) ???????? ?????????? - ???????????????? : ?????????? ?? ???? ???????????? ?????? ???????????? ???????????????? ?????????? ???????? (?????????????? ???????? ???????? ???? ????????-??????????????) - Target : DATA CITIZENS & IT
- Access per item (data source, WS, DWH, LH, OneLake, semantic model, etc.) and access per user (across all items), with roles
- Fabric Admin APIs can provide this information, but we need a Fabric page acting as an access "control tower", both federated and by domain
- Persona: domain and tenant security officer
3) ???????? ?????????????? (???????? ???????????????? & ??????????2) ?????? ??.?? ???????????????????? Target : DATA CITIZENS & IT
- Improve the copy (data pipeline & dfgen2) for these use cases
Native management that avoids the current workarounds:
-- grab an external watermark value,
-- primary key merge (type 1 columns), today done via T-SQL stored procedures or LH notebooks,
-- primary key snapshot (type 2 columns), today done via T-SQL stored procedures or LH notebooks,
-- automatically add system datetimes (ingestion, start / end of row version, etc.) and a surrogate key
Something as efficient as Airbyte, Fivetran and DBT
Have a look at the detailed requirement ==> https://www.dhirubhai.net/posts/%F0%9F%91%89-christophe-hervouet-678813109_improve-data-factory-reduce-build-maintenance-activity-7204102292856291328-w-wN?utm_source=share&utm_medium=member_desktop
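To make the request concrete, here is a minimal Python sketch of what a native watermark + type 1 merge would have to do. It is only an illustration under stated assumptions: the function name `incremental_merge`, the `_ingested_at` column, and the sample rows are hypothetical, the target is a plain in-memory dict, and none of this is an existing Fabric feature or API. Type 2 history is deliberately left out.

```python
from datetime import datetime, timezone

def incremental_merge(target, source_rows, key, watermark_col, last_watermark):
    """Type 1 upsert of the rows newer than last_watermark.

    target is a dict keyed by primary key (stand-in for a destination table).
    Returns the new watermark to store for the next run.
    """
    ingested_at = datetime.now(timezone.utc).isoformat()
    new_watermark = last_watermark
    for row in source_rows:
        if row[watermark_col] <= last_watermark:
            continue  # already loaded by a previous run
        # Type 1: overwrite the existing row, add a system ingestion datetime
        target[row[key]] = {**row, "_ingested_at": ingested_at}
        new_watermark = max(new_watermark, row[watermark_col])
    return new_watermark

# Hypothetical sample data: ISO date strings compare correctly as text
target = {}
source = [
    {"id": 1, "name": "Ann", "updated_at": "2024-09-01"},
    {"id": 2, "name": "Bob", "updated_at": "2024-09-02"},
]
wm = incremental_merge(target, source, "id", "updated_at", "")

# Second run: only the changed row is newer than the stored watermark
source.append({"id": 1, "name": "Anna", "updated_at": "2024-09-03"})
wm = incremental_merge(target, source, "id", "updated_at", wm)
```

The point of the wish is that the copy activity would hold the watermark and apply the merge itself, instead of each team rebuilding this logic in stored procedures or notebooks.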
- Improve the copy (data pipeline & dfgen2) with a setting that avoids duplicated rows when a ROWS APPEND pattern (full or incremental) is rerun after an incident
Native management that avoids the current workaround (deduplicating rows via SQL DML)
Something as efficient as Airbyte and Fivetran
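As an illustration of the expected behavior, the sketch below shows an append that stays duplicate-free when the same batch is replayed after an incident. The helper `append_idempotent`, the `order_id` key, and the sample rows are hypothetical, and a Python list stands in for the destination table.

```python
def append_idempotent(table, batch, key):
    """Append only the rows whose key is not already in the table.

    Replaying the same batch after a failed or partial run adds nothing.
    Returns the number of rows actually appended.
    """
    seen = {row[key] for row in table}
    added = 0
    for row in batch:
        if row[key] not in seen:
            table.append(row)
            seen.add(row[key])
            added += 1
    return added

# Hypothetical sample batch
table = []
batch = [
    {"order_id": "A1", "amount": 10},
    {"order_id": "A2", "amount": 25},
]
first = append_idempotent(table, batch, "order_id")   # normal run
replay = append_idempotent(table, batch, "order_id")  # recovery rerun
```

Today the same result needs a separate deduplication step in SQL after the load; the wish is for the copy itself to offer it as a setting.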
- Copy (data pipeline & dfgen2)
Improve schema change management (source vs. destination). Settings can be :
Something efficient like on Airbyte & Fivetran
4) ???????? ?????????????? ?????? ?????????? ?????????????????????? (?????????????? ?????????????????????????????? , ???????? ?????????????? , ???????? ?????? ?????????????????? ) - Target : DATA CITIZENS & IT
- For Gold SQL TRANSFORMATIONS, provide something similar to DBT
- Offer a data quality control task in data pipelines
- Improve REST API ingestion in data pipelines (access token, continuation token, chunked blocks, POST & GET parameters, looping over child APIs while passing parent parameters, etc.)
OR
Something as efficient as Airbyte and Fivetran, with calls to cloud functions (Python language) for powerful API data ingestion
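The continuation-token case is the one that most often needs custom code today. A minimal sketch of the loop, assuming a hypothetical response shape with `items` and `continuationToken` fields and a fake in-memory API instead of a real HTTP call:

```python
from typing import Callable, Iterator, Optional

# Fake paged API (assumption): each page returns items plus the next token
PAGES = {
    None: {"items": [1, 2], "continuationToken": "p2"},
    "p2": {"items": [3, 4], "continuationToken": "p3"},
    "p3": {"items": [5], "continuationToken": None},
}

def fake_fetch_page(token: Optional[str]) -> dict:
    """Stand-in for an HTTP GET that passes the continuation token."""
    return PAGES[token]

def ingest_all(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[int]:
    """Follow continuation tokens until the API stops returning one."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page["items"]
        token = page.get("continuationToken")
        if token is None:
            break

rows = list(ingest_all(fake_fetch_page))
```

A real source would swap `fake_fetch_page` for an authenticated request; the loop itself is what a native pipeline option could take care of.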
5) ???????? ?????????????? ???????? ?????????????????????? - Target : DATA CITIZENS & IT
- Improve data pipeline CI/CD and the passing of data pipeline parameters or variables to child TASKS (input & output results)
#1) - For Dev (and feature) workspaces, stop the STATIC link to a GitHub branch (feature and main)
-- Like on Azure Data Factory WS and DBT Cloud WS
-- Avoids having to manage several WS
- Of course, for CI/CD deployment pipelines, the original source needs to be the main branch (DataOps controls are OK)
-- Data citizens with dev skills and the IT area
#2) - All Fabric artifact metadata should be stored in files in a text format (tmdl, tmsl, json, yaml, M, SQL, python, etc.)
This would be fantastic for GitHub storage, source control and CI/CD deployment pipelines
-- It sounds like there is a “problem” with Dataflow Gen2 for this
- Of course, for source control quality (before/after line changes), code whose line order changes at random is unworkable (Warehouse DDL project and data pipelines)
I'm changing one thing and 60% of my script lines move to other lines (all shown in red)!!! May I cry?
-- Data citizens with dev skills and the IT area
#3) CI/CD with deployment pipelines (data citizens with dev skills and the IT area)
-- We need in/(out) parameters on each ingestion artifact
-- We need deployment rules for all of them across environment stages ==> modify parameter / external connection name values "on the fly"
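A small sketch of what such deployment rules amount to: the same artifact definition is deployed to each stage, with only the parameter values overridden. The definition layout, the `apply_deployment_rules` helper, and the connection names are hypothetical and do not reflect the actual Fabric deployment pipeline format.

```python
import copy

def apply_deployment_rules(definition, rules, stage):
    """Return a copy of the artifact definition with the stage's overrides.

    Only parameters declared on the artifact can be overridden, so a typo
    in a rule fails loudly instead of being silently ignored.
    """
    deployed = copy.deepcopy(definition)
    for name, value in rules.get(stage, {}).items():
        if name not in deployed["parameters"]:
            raise KeyError(f"unknown parameter: {name}")
        deployed["parameters"][name] = value
    return deployed

# Hypothetical pipeline definition as stored in source control (dev values)
pipeline = {
    "name": "ingest_sales",
    "parameters": {"connection": "erp_dev", "target_schema": "raw"},
}

# Hypothetical per-stage rules: only the connection name changes
rules = {
    "test": {"connection": "erp_test"},
    "prod": {"connection": "erp_prod"},
}

prod = apply_deployment_rules(pipeline, rules, "prod")
```

The source-controlled definition stays untouched, which is why parameters on every ingestion artifact are a prerequisite for the rules.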
6) ?????????????????? ?????? ?? ?????? ?????????????? ?????????????????????? - Target : more IT
- Any news about T-SQL DDL? ==>
(*) Deal with nested fields directly in RAW tables
A more precise JSON type in the DDL to store nested fields such as JSON content
A better (clearer) schema contract (for ingestion tools and storage)
No need to unnest "strategic" fields in the ingestion tools
BigQuery example:
CREATE TABLE IF NOT EXISTS mydataset.mytable (
  id STRING,
  first_name STRING,
  last_name STRING,
  dob DATE,
  addresses ARRAY<STRUCT<status STRING, address STRING, city STRING, state STRING, zip STRING, numberOfYears STRING>>
)
OPTIONS (description = 'Example name and addresses table');
The array content is then flattened in queries with the UNNEST() function
and T-SQL DML ==>
The RAW LH or DWH owner gives access to the GOLD view only ... not to Gold SSO UPN users
Nobody understands these "super strange" limitations from Microsoft, which slow down adoption
- Security / Access
- Query activity ==> offer much more query history insight
Looking forward to getting, on the Fabric DWH console, SQL query insights similar to those in BigQuery
Very useful to detect query performance ?
or other DWH feature to reduce SQL query ?
etc ..
BigQuery console query plan
7) ?????????? ???? ???????????????? ?????????????????????? - Target : DATA CITIZENS & IT
- Improve the robustness of DirectQuery to semantic models
- Allow creating Power BI semantic models (import mode) directly in the Service
- Provide perspective management in DESKTOP
- Power BI semantic model in import mode
8) ???????? ?????????? - ???????????? ???????????????? ???????????????????? ?????? ???????????????????????????? ?????????????????????? - Target : more IT
- Provide something more serious than the current Capacity Metrics App to follow the health of our F capacities
- Follow F capacity metrics (track health, CUs champions, CUs debt, penalties, throttling, etc.)
9) ?????????????????? ?????????????????????? - Target : more IT
- Differentiate the behaviors and services offered, depending on whether you connect SQL ERP data or a Snowflake BI data product
10) ?????????? ???? ???????????????? ?????????? ???????? ?????????????????? ?????? ???????????????? ????????????????????s - Target : DATA CITIZENS & IT
- Remove the limitations and allow SEMANTIC MODELS to be sourced from SQL views & CTAS tables :
11) ?????????? ???? ???????????????? ?????????? ???????? ?????????????????? ???????????????? ????????????????????s -Target : DATA CITIZENS & IT
- Remove the limitations and allow SEMANTIC MODELS to be sourced from T-SQL views & CTAS tables :
12) ?????? , "????????" ???? ?????? ????????/?????????????????????? ?????????? ?????????????????????????????? - ?????????????? ?????????????? ???? ???????????? - Target : DATA CITIZENS & IT
- A native data pipeline (orchestrator) task to RUN DBT Cloud jobs
(DBT Cloud connection and a command (run model(s) / build model(s) / test / snapshot) + CI/CD environment: targeted Fabric DWH + CI/CD environment variables for code: sourcing)
- Warehouse T-SQL functions that the DBT compiler engine would like to use :
13) ???????? ?????????? - ???????????????????? ???????????????? ?????????????????? ??????. ???????????????????? - Target : DATA CITIZENS & IT
?? ?????? ???????? ?????? ???????? ???????????? ???????????????? / ???????????????????? / ?????????????????????? (?????????? ????????)
?? ?????????????? ?????????????????? ?????? ?????????? ?????????????????? "???????? ??????????????????" & "UNIT TESTS"
for Power BI data & metadata with the semantic link library
for Lakehouse data with the pytest lib.
for KQL data via a lakehouse shortcut with the pytest lib.
for Warehouse data with the tSQLt lib.
for data pipeline & dfgen2 metadata with ?? lib. ? etc ..
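For the Lakehouse case, the kind of check meant here can be sketched as pytest-style test functions. This is only an assumed shape: `load_rows` is a hypothetical stand-in for reading a Lakehouse table, and the column names and rules are invented for the example.

```python
def load_rows():
    """Hypothetical stand-in for reading a Lakehouse table into rows."""
    return [
        {"customer_id": 1, "country": "FR", "amount": 120.0},
        {"customer_id": 2, "country": "DE", "amount": 80.5},
    ]

def test_primary_key_is_unique():
    ids = [row["customer_id"] for row in load_rows()]
    assert len(ids) == len(set(ids))

def test_amount_is_not_negative():
    assert all(row["amount"] >= 0 for row in load_rows())

def test_no_missing_country():
    assert all(row["country"] for row in load_rows())
```

pytest collects functions named `test_*` automatically, so running `pytest` on a file like this executes the three checks.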
14) ???????? ?????????? - ???????? ???? ???????? ?????????? ???????????????? ???????? ???????????? ?????????????????? - Target : DATA CITIZENS & IT
In the workspace, for each item, an option to click to see a detailed description (use case, history, labels, transformations, lineage, sourcing, targeting, description, modelling, KPI formulas, schema, etc.)
Example :
etc ..
Although there are Fabric/PBI admin APIs that already do this
Of course, Copilot can help produce this "metadata" type of information