Microsoft FABRIC

A wish list of ideas to improve Microsoft Fabric

Updated on 29/09/2024


If you agree with this wish list,

  • some of these ideas may already exist on the ideas portal: vote for them
  • initiate the missing ones on this portal ==> https://ideas.fabric.microsoft.com/


INDEX

1) New idea : improve external data source connections sharing for builders & consumers - a data sources hub by domains

2) New idea - Security : a "control tower" for all item accesses and user roles (by domain and across the tenant)

3) Data factory (data pipeline & dfgen2) for E.T.L ingestion - patterns

4) Data factory new tasks (SQL transformations, data quality, REST API ingestion) - patterns

5) Data factory CICD improvements

6) Warehouse DDL & DML T-SQL improvements

7) Power BI semantic model improvements

8) New idea - Better capacities monitoring and administration

9) Mirroring improvements

10) Allow semantic models to source Lakehouse SQL views & CTAS tables - Lakehouse improvements

11) Allow semantic models to source Warehouse T-SQL views & CTAS tables - Warehouse improvements

12) DBT : "plug" DBT Cloud jobs into data pipelines natively - improve Warehouse T-SQL for the DBT compiler

13) New idea - Deployment pipelines data ops controls & unit tests

14) New idea - Fine descriptions for every Fabric item

======================================


1) New idea : improve external data source connections sharing for builders & consumers - Target : DATA CITIZENS & IT

Push integration to the maximum by creating restricted areas (spaces?) on domains ==> "ready to use" and "multi-artifact usage" external source connections

👉 External resource connections such as REST APIs, SQL databases, ERPs, folder access, Key Vault, managed access, on-premises or public cloud

👉 Logical names shared with consumers and builders (user + ETL tool name (All, ..))

👉 Avoid sharing credentials (secrets, passwords, tokens, ..) & Key Vault access with several E.T.L / ingestion tool builders

👉 Prevent creating "from scratch, I need credentials" new connections in the ingestion tool itself <== only use the existing ones

👉 Manage on-premises (via the on-premises gateways) & cloud data sources in the same place

👉 "One click" usage in data pipeline tasks, dataflow gen2, notebooks, event stream sources, Power BI, dataflows, Azure Python functions, mirroring sources etc ...

👉 Of course, Power BI Desktop also needs to see "shared with me" data sources (access to this new by-domain data sources hub page)

👉 Governance : will also avoid duplicated data source setups and provide data source usage (lineage) information



2) New idea - Security : a "control tower" for all item accesses and user roles (by domain and across the tenant) - Target : DATA CITIZENS & IT

👉 All item (data source, WS, DWH, LH, OneLake, semantic model etc ..) accesses and each user's accesses (all items), with roles

👉 Fabric admin APIs can provide this information, but we need a Fabric page as a federated & by-domain access "control tower"

👉 Persona : domain & tenant security officer



3) Data factory (data pipeline & dfgen2) for E.T.L ingestion patterns - Target : DATA CITIZENS & IT

👉 Improve the copy (data pipeline & dfgen2) with these use cases

Native management avoiding the current workarounds:
-- grab an external watermark value,
-- primary-key merge (Type 1 columns) via T-SQL stored procedures or LH notebooks,
-- primary-key snapshot (Type 2 columns) via T-SQL stored procedures or LH notebooks,
-- automatically add system datetimes (ingestion, start/end row version, etc ..) and a surrogate key
Something as efficient as on Airbyte, Fivetran & DBT


Have a look at the detailed requirement ==> https://www.dhirubhai.net/posts/%F0%9F%91%89-christophe-hervouet-678813109_improve-data-factory-reduce-build-maintenance-activity-7204102292856291328-w-wN?utm_source=share&utm_medium=member_desktop

  • [Full overwrite]
  • [Append]
  • [Incremental append based on a date-type column]
  • [Incremental append based on a file and its system update datetime]
  • [Row snapshot based on primary key and Type 2 column changes] => automatically provide start/end row version datetimes and a surrogate key
  • [Merge based on primary keys and Type 1 column updates]
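The last two patterns can be sketched in plain Python over row dictionaries (illustrative only — a real implementation would run as T-SQL MERGE statements or a Lakehouse notebook; the key and column names are assumptions):

```python
from datetime import datetime, timezone

HIGH_DATE = datetime(9999, 12, 31, tzinfo=timezone.utc)  # "open" end-of-version marker

def type1_merge(target, incoming, key):
    """Type 1 merge: overwrite matching rows by primary key, insert new ones."""
    index = {row[key]: row for row in target}
    for row in incoming:
        index[row[key]] = {**index.get(row[key], {}), **row}
    return list(index.values())

def type2_snapshot(target, incoming, key, tracked, now=None):
    """Type 2 row snapshot: when any tracked column changes, close the current
    row version (end datetime) and append a new current version (start datetime)."""
    now = now or datetime.now(timezone.utc)
    current = {r[key]: r for r in target if r["end_dt"] == HIGH_DATE}
    result = list(target)
    for row in incoming:
        old = current.get(row[key])
        if old and all(old[c] == row[c] for c in tracked):
            continue  # nothing changed on the tracked (Type 2) columns
        if old:
            old["end_dt"] = now  # close the previous version
        result.append({**row, "start_dt": now, "end_dt": HIGH_DATE})
    return result
```

A surrogate key would simply be assigned to each appended version; the point is that the engine, not the builder, should own this bookkeeping.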


👉 Improve the copy (data pipeline & dfgen2) with a setting that avoids duplicated rows after any ROWS APPEND pattern (full or incremental) incident recovery

Native management avoiding the current workaround (deduplicating rows via SQL DML)
Something as efficient as on Airbyte & Fivetran
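The deduplication the workaround performs today boils down to keeping the most recently ingested row per primary key; a minimal sketch (the `_ingested_at` column name is an assumption):

```python
def deduplicate(rows, key, ingested_at="_ingested_at"):
    """After an append-pattern incident recovery, keep only the latest
    ingested row for each primary key value."""
    latest = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[ingested_at] > latest[k][ingested_at]:
            latest[k] = row
    return list(latest.values())
```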


👉 Copy (data pipeline & dfgen2)

Improve the schema change management (source vs destination). Settings can be:

  • Schemaless (auto-update the destination table)
  • Implicit schema is the current one on the destination (& a columns mapping)
  • Manage a schema column "contract" (& a columns mapping)
  • If we fill in a schema, can we put a default value where the value is NULL?

Something efficient like on Airbyte & Fivetran
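Those three settings could behave like this minimal sketch, where a "contract" is just a declared column list with optional NULL defaults (mode names and shapes are illustrative, not any product's API):

```python
def apply_schema_policy(row, destination_cols, contract=None, defaults=None,
                        mode="implicit"):
    """Sketch of the three schema-change settings for a copied row:
    - "schemaless": pass every incoming column through (destination auto-evolves)
    - "implicit":   keep only columns the destination already has
    - "contract":   enforce a declared column list, filling NULLs with defaults
    """
    if mode == "schemaless":
        return dict(row)
    if mode == "implicit":
        return {c: row.get(c) for c in destination_cols}
    if mode == "contract":
        defaults = defaults or {}
        return {c: row.get(c) if row.get(c) is not None else defaults.get(c)
                for c in contract}
    raise ValueError(f"unknown mode: {mode}")
```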



4) Data factory new tasks (SQL transformations, data quality, REST API ingestion) - Target : DATA CITIZENS & IT

👉 For gold SQL TRANSFORMATIONS, provide something similar to DBT

  • DBT patterns are very cool and useful


👉 Offer a data quality control task on data pipelines

  • SQL & semantic model completeness, manage your own rules
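Such a task could evaluate declarative completeness rules and report the failing ones so the pipeline can stop or alert; a minimal sketch (the rule shape and thresholds are assumptions):

```python
def check_completeness(rows, rules):
    """Evaluate completeness rules over a table extract.
    Each rule is (column, min_fill_ratio); returns the list of failing rules."""
    failures = []
    total = len(rows) or 1
    for column, min_ratio in rules:
        filled = sum(1 for r in rows if r.get(column) not in (None, ""))
        if filled / total < min_ratio:
            failures.append((column, round(filled / total, 2)))
    return failures
```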


👉 Improve REST API ingestion on data pipelines (access token, continuation token, block chunking, POST & GET parameters, looping over children APIs by passing parent parameters etc ..)

  • Offer something similar to Airbyte's "low code" connector builder (part of the Airbyte UI)

OR

  • Offer a cloud function (Python) task on data pipelines
  • Similar to the very robust Azure Functions
  • No need to use a web task & a notebook for this use case

Something as efficient as on Airbyte & Fivetran, with calls to cloud functions (Python) for powerful API data ingestion
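The continuation-token loop such a task would need, sketched with an injected `fetch` callable so no real API is assumed (the `{"items", "next_token"}` page shape is an assumption; access-token refresh and parent→child loops would wrap around this):

```python
def paginate(fetch, params=None):
    """Collect all pages of a REST API whose responses look like
    {"items": [...], "next_token": <str or None>}."""
    params = dict(params or {})
    items, token = [], None
    while True:
        page = fetch({**params, **({"token": token} if token else {})})
        items.extend(page["items"])
        token = page.get("next_token")
        if not token:  # no continuation token => last page reached
            return items
```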



5) Data factory CICD improvements - Target : DATA CITIZENS & IT

👉 Improve data pipeline CICD and the transmission of data pipeline parameters or variables to children TASKS (input & output results)

#1) Stop imposing, for dev (and feature) workspaces, a STATIC link to a GitHub branch (feature and main)

-- Like on Azure Data Factory workspaces and DBT Cloud workspaces

-- Avoids managing several workspaces

👉 Of course, for CICD deployment pipelines the original source needs to be the main branch (data-ops controls are OK)

-- Data citizens with dev skills, and IT area

#2) All Fabric artifact metadata to be stored in files, in a text format (tmdl, tmsl, json, yaml, M, SQL, Python etc ..)

Would be fantastic for GitHub storage, source control and CICD deployment pipelines

-- Sounds like there is a "problem" with Dataflow Gen2 for this

👉 Of course, for source control quality (before/after line diffs), randomly reordered code is unacceptable (Warehouse DDL projects and data pipelines)

I change something and 60% of my script lines move to other lines (all flagged red)!!! May I cry?

-- Data citizens with dev skills, and IT area

#3) CICD with deployment pipelines (data citizens with dev skills, and IT area)

-- We need in/(out) parameters on each ingestion artifact

-- We need deployment rules for all of them across environment stages ==> modify parameter / external connection name values "on the fly"
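Such deployment rules amount to a per-stage substitution over artifact parameters; a minimal sketch (stage names, parameter keys and the rules shape are assumptions, not Fabric's actual deployment-rule API):

```python
def apply_deployment_rules(artifact_params, stage, rules):
    """Rewrite parameter / connection values "on the fly" per environment stage.
    rules: {stage: {parameter_name: new_value}}."""
    overrides = rules.get(stage, {})
    return {name: overrides.get(name, value)
            for name, value in artifact_params.items()}
```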




6) Warehouse DDL & DML T-SQL improvements - Target : more IT

👉 Any news about T-SQL DDL? ==>

  • ALTER TABLE add/remove column : limitation fixed summer 2024
  • Identity columns
  • Primary keys
  • RECORD/ARRAY columns (store nested fields like JSON structures)

(*) Deal with nested fields on RAW tables directly

More accurate (in DDL) than a JSON type for storing nested, JSON-like content

A better (clearer) schema contract (for ingestion tools and storage); no need to unnest "strategic" fields in ingestion tools

BigQuery example:

CREATE TABLE IF NOT EXISTS mydataset.mytable (
  id STRING,
  first_name STRING,
  last_name STRING,
  dob DATE,
  addresses ARRAY<STRUCT<status STRING, address STRING, city STRING, state STRING, zip STRING, numberOfYears STRING>>
)
OPTIONS (description = 'Example name and addresses table');

UNNEST() the array content via a function in queries
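What UNNEST() does can be sketched in Python: each element of the nested array becomes one output row carrying its parent fields, mirroring the BigQuery addresses example above (cross-join style, so rows with an empty array are dropped):

```python
def unnest(rows, array_column):
    """Flatten an ARRAY<STRUCT> column: one output row per array element,
    each merged with the parent row's other fields."""
    out = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != array_column}
        for element in row.get(array_column) or []:
            out.append({**parent, **element})
    return out
```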


  • Temporary tables (useful in stored procs)
  • Temporal tables over gold tables (row versioning/snapshot after a merge - SQL Server 2022)

and T-SQL DML ==>

  • MERGE (useful for UPSERTs in stored procs or for the DBT compiler engine) (Type 1 upsert)
  • MERGE with OUTPUT $action (useful for UPSERT + CREATION of a new current row in case of update, in stored procs or for the DBT compiler engine) (row snapshot, Type 2 versioning)
  • TRUNCATE TABLE (useful in stored procs or for the DBT compiler engine) (full refresh) : limitation fixed summer 2024
  • Provide something like BigQuery authorized views: managed access for a gold SQL view (gold DWH or LH) to its underlying RAW tables (another DWH or LH in the same workspace)

The RAW LH or DWH owner grants access to the GOLD view only ... not to the gold SSO UPN users


Nobody understands these "super strange" limitations from Microsoft, which slow down adoption


👉 Security / Access

  • More or less perfect : users / identities have access to the workspace or the warehouse only
  • RLS? Yes, possible via T-SQL functions, BUT I prefer to manage this on the Power BI semantic model
  • OLS? I prefer to manage this on the Power BI semantic model
  • Schema access? Yes, possible via GRANT statements, but we need something "visual" like workspace and warehouse access management
  • GRANT INSERT ON SCHEMA :: HumanResources TO guest;

👉 Query activity ==> offer much more query history insights

In a hurry to get, on the Fabric DWH console, SQL QUERY insights similar to BigQuery's

Very useful to detect query performance issues:

  • Volume to read (MB & rows)
  • Volume of the result (MB & rows)
  • Global & sub-task query durations
  • Fine query plan presentation => are we using a T-SQL table partition?

or any other DWH feature to reduce the SQL query cost?

etc ..

BigQuery console : query plan

BigQuery console : execution detail



7) Power BI semantic model improvements - Target : DATA CITIZENS & IT

👉 Improve the robustness of direct query to semantic models

  • Architecture & governance : DQ to a semantic model, or another way to offer perspectives to data citizens from official PBI data products


👉 Permit creating Power BI semantic models (import mode) directly on the Service

  • Provide Power Query, modelling, incremental refresh, RLS, calculation groups, perspectives, pro dev features etc ... not only updates, as currently

  • Strangely, datamarts (in preview) can perform this


👉 Provide perspective management on DESKTOP


👉 Power BI semantic model in import mode

  • How to build confidence in / improve its image as a real analytics domain data product in big companies (IMPORT MODE)?
  • A powerful, unique by-domain provider for every consumer
  • To compete with GCP Looker & SAP BW
  • Perhaps Microsoft needs to do something for IMPORT MODE in big companies
  • Not sure all data architects consider a Power BI semantic model (IMPORT MODE) a replacement for an Azure Analysis Services cube
  • Even though Microsoft declares the Power BI semantic model (IMPORT MODE) a superset of Azure Analysis Services



8) New idea - Better capacities monitoring and administration - Target : more IT

👉 Provide something more serious than the current Capacity Metrics App to follow our F capacities' health

👉 Follow F capacity metrics (track health, CUs champions, CUs debt, penalties, throttling etc.)

  • Provide alerting (like AWS) in case of throttling issues
  • Provide an API to grab the current Fabric metrics app data
  • Provide FABRIC administrators with a ServiceNow- or JIRA-like, serious ticketing system (avoiding Outlook ping-pong matches)



9) Mirroring improvements - Target : more IT

👉 Differentiate the behaviors and services offered depending on whether you connect SQL ERP data or a Snowflake BI data product

  • Not sure, politically and governance-wise, that it's a "good idea" to offer the Fabric SQL endpoint over Snowflake BI data products (read only, or allowing transformations)
  • Limit SQL endpoints when attached to a mirrored "BI data product" from Snowflake, avoiding the creation of "political SQL architecture problems" <== governance / SQL data products
  • At the very least, prohibit any new transformation, to avoid breaking the "single source of truth"
  • Mainly promote semantic model (direct lake mode) access here .. nothing more
  • Mirroring an ERP on SQL Server sounds very different, and SQL transformations are tolerated .. and recommended <== you need "Kimball" T-SQL gold transformations



10) Allow semantic models to source Lakehouse SQL views & CTAS tables - Lakehouse improvements - Target : DATA CITIZENS & IT

👉 Stop limitations and allow SEMANTIC MODEL sourcing from SQL views & CTAS tables:

  • Lakehouse SQL views (created via the LH SQL endpoint)
  • and CTAS tables (created in a Spark SQL notebook)



11) Allow semantic models to source Warehouse T-SQL views & CTAS tables - Warehouse improvements - Target : DATA CITIZENS & IT

👉 Stop limitations and allow SEMANTIC MODEL sourcing from T-SQL views & CTAS tables:

  • T-SQL views
  • T-SQL CTAS tables



12) DBT : "plug" DBT Cloud jobs into data pipelines natively - improve Warehouse T-SQL for the DBT compiler - Target : DATA CITIZENS & IT

👉 A native data pipeline (orchestrator) task to RUN DBT Cloud jobs

(a DBT Cloud connection and a command (run model(s) / build model(s) / test / snapshot) + CICD environment : targeted Fabric DWH + CICD environment variables for code sourcing)


👉 Warehouse T-SQL functions the DBT compiler engine would like to use:

  • Type 1 upsert ==> MERGE (useful for UPSERTs in stored procs or for the DBT compiler engine)
  • Row snapshot, Type 2 versioning ==> MERGE with OUTPUT $action (useful for UPSERT + CREATION of a new current row in case of update, in stored procs or for the DBT compiler engine)
  • Full refresh ==> TRUNCATE TABLE (useful in stored procs or for the DBT compiler engine) : limitation fixed summer 2024
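Until the native task exists, a pipeline web or notebook activity can trigger the dbt Cloud job itself; a sketch with an injected `post` callable so no real call is made (the URL shape follows dbt Cloud's documented v2 "trigger job run" endpoint — verify against current docs; account and job ids are placeholders):

```python
def trigger_dbt_cloud_job(post, account_id, job_id, token, cause="Fabric pipeline"):
    """Trigger a dbt Cloud job run over the v2 API and return the new run id.
    `post` is any callable with a requests-like (url, headers=, json=) signature."""
    url = f"https://cloud.getdbt.com/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    response = post(
        url,
        headers={"Authorization": f"Token {token}"},  # dbt Cloud service token
        json={"cause": cause},
    )
    return response["data"]["id"]
```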



13) New idea - Deployment pipelines data ops controls & unit tests - Target : DATA CITIZENS & IT

👉 No more risk during deployments / updates / migrations (no data loss)

  • Summer 2024 issue : trying to deploy a Warehouse T-SQL table with 2 new columns emptied the targeted table <== NOW FIXED : certainly due to the T-SQL ALTER TABLE limitation

👉 Provide deployment pipelines with native "DATA OPS CONTROLS" & "UNIT TESTS"

  • Via a new service (on deployment pipelines)

  • Or by triggering a call to useful PySpark notebook libraries:

for Power BI data & metadata, with the semantic link library

for Lakehouse data, with the pytest lib.

for KQL data, via a Lakehouse shortcut, with the pytest lib.

for Warehouse data, with the tSQLt lib.

for data pipeline & dfgen2 metadata, with ?? lib.? etc ..
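A minimal example of the pytest-style checks such a notebook could run against a Lakehouse table (table content is faked as a list of row dicts; real code would read it via Spark or the semantic link library):

```python
def assert_primary_key_is_clean(rows, key="id"):
    """Raise AssertionError if any row misses the primary key or if a
    duplicate key exists — the kind of unit test a deployment pipeline
    could run before promoting a stage."""
    keys = [row.get(key) for row in rows]
    assert all(k is not None for k in keys), "NULL primary key found"
    assert len(keys) == len(set(keys)), "duplicate primary key found"
```

A pytest test function would simply read the table and call this helper; the deployment pipeline promotes the stage only if the whole test session passes.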


14) New idea - Fine descriptions for every Fabric item - Target : DATA CITIZENS & IT

On the workspace, for each item, an option to click on to observe a fine description (use case, history, labels, transformations, lineage, sourcing, targeting, description, modelling, KPI formulas, schema etc ..)

Example :

  • A data pipeline : all its logic
  • A Dataflow Gen2 : all its logic
  • A direct lake semantic model : all its history (sourcing --> modelling)
  • An imported semantic model : all its history (sourcing --> Power Query transformations --> modelling)
  • A Lakehouse : fine description
  • A DWH database : fine description
  • A KQL database : fine description
  • An ML model : fine description
  • An event stream : its logic

etc ..

Although there are Fabric/PBI admin APIs that already perform this

Of course, Copilot can help produce this "metadata" kind of information




