MODERN DATA STACK Manifesto #1 for Extract & Load data operations - Best Practices & Specifications for Low-Code Editors - Outside Python Notebooks
Christophe Hervouet
DATA Strategy & Consulting (modern BI data platforms / organization / governance / architectures) -------- Modern BI Data Platforms Advisor (Organization / Governance and Architectures)
Extract & Load tools/platforms - batch ingestion: ingest data in batches from various sources (ERP, Salesforce, REST APIs, flat files, etc.)
1) Connections to "a lot of" sources (ERP, API, files, SQL...) & "a lot of" modern-stack targets (lakehouses & DWH)
- Manage key vault providers for service accounts & secrets (see the sketch after this list)
- Deal with managed access where possible (the E.L service identity is already known and trusted by both the SQL source & the SQL target)
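For illustration, a minimal sketch of the key-vault point above, assuming Azure Key Vault with the azure-identity and azure-keyvault-secrets Python packages; the vault URL and secret names are hypothetical:

# Minimal sketch: resolve service-account secrets from a key vault at pipeline start,
# so the E.L tool never stores credentials itself. Names below are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()  # managed identity or environment-based auth
vault = SecretClient(
    vault_url="https://my-el-vault.vault.azure.net",  # hypothetical vault
    credential=credential,
)
sql_source_password = vault.get_secret("sql-source-password").value      # hypothetical secret
salesforce_client_secret = vault.get_secret("sfdc-client-secret").value  # hypothetical secret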
2) Manage interfaces to deal with source API setups
- URL parameters (hard-coded values, expressions, caching loop)
- Body parameters (hard-coded values, expressions, caching loop)
- Unnest / flatten the JSON answer: yes or no? (No = SQL (via DBT) will do the job based on array/record-type columns)
- Continuation token (see the pagination sketch after this list)
- Chunking into blocks
- Offer "ready to use" interfaces to grab API data (parent & children loops) for the main cases (Salesforce, Azure Graph, O365, PBI logs, etc.)
- Offer a community SDK (YAML, Python) to create your own connector interfaces (sources & targets); these will need to be endorsed
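To make the continuation-token and flatten points concrete, a minimal sketch of a paginated REST extraction in Python; the endpoint, parameter names and payload fields are hypothetical and depend on the actual source API:

# Minimal sketch: loop over a paginated REST source using a continuation token,
# then decide whether to flatten the nested JSON here or leave it to SQL/DBT.
import requests
import pandas as pd

BASE_URL = "https://api.example.com/v1/orders"                 # hypothetical source API
HEADERS = {"Authorization": "Bearer <token-from-key-vault>"}

rows, token = [], None
while True:
    params = {"pageSize": 1000}
    if token:
        params["continuationToken"] = token                    # hypothetical pagination parameter
    resp = requests.get(BASE_URL, headers=HEADERS, params=params, timeout=60)
    resp.raise_for_status()
    payload = resp.json()
    rows.extend(payload["value"])                               # hypothetical field holding the records
    token = payload.get("continuationToken")                    # hypothetical next-page token field
    if not token:
        break

# Flatten = YES: unnest here. Flatten = NO: land the raw JSON and let DBT work on array/record columns.
flat = pd.json_normalize(rows)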
2bis) Manage interfaces to deal with flat files in blob bucket folders
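A minimal sketch of this file-based case, assuming Azure Blob Storage as one example backend (the azure-storage-blob package); container, folder and variable names are hypothetical:

# Minimal sketch: list flat files in a blob "folder" that changed since the last run,
# to feed an incremental file ingestion (see the incremental options in section 3).
import os
from datetime import datetime, timezone
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    conn_str=os.environ["BLOB_CONNECTION_STRING"],     # never hard-coded, see section 8
    container_name="landing",                          # hypothetical container
)
last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)   # would come from the tool's watermark store
new_files = [
    blob.name
    for blob in container.list_blobs(name_starts_with="erp/exports/")  # hypothetical folder
    if blob.last_modified > last_run
]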
3) Copy / connect / synchronize SOURCE --> TARGET
- Choose the columns to synchronize
- Automatic target table creation on the first run: yes/no?
- Manage schema change detection =>
-- Schema-less?
-- Or manage a schema?
-- If schema-less, do you want an automatic schema alter on the target SQL table?
- Natively offer several ingestion options =>
-- Full refresh (select all source rows) or incremental (select only "new" source rows)
-- Full refresh: overwrite or append new rows (no worries, DBT can deduplicate afterwards ==> gold layer)
-- Incremental on a SQL source: append new rows based on a pivot column (most of the time a date or an id)
-- Incremental on a flat-file source: append new rows based on source file metadata, the last-update system datetime (add this column to the target schema)
-- If incremental is YES, do you want automatic deduplication based on the primary keys for the merge (one row per PK, keeping the values for MAX(date or id)) <== a kind of PK merge with the latest values (see the sketch after this list)
-- If incremental is YES and automatic deduplication is NO, then no worries, DBT can deduplicate afterwards ==> gold layer
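A minimal sketch of the incremental and deduplication options above, expressed as generated SQL; table, pivot-column and primary-key names are hypothetical, and the E.L tool would run the statements on the target:

# Minimal sketch: incremental select on a pivot column + PK deduplication keeping the latest values.
def incremental_select(source_table, pivot_col, last_value):
    # Incremental: only the "new" source rows beyond the stored watermark.
    return f"SELECT * FROM {source_table} WHERE {pivot_col} > '{last_value}'"

def dedup_on_pk(target_table, pk_cols, pivot_col):
    # Automatic deduplication: one row per PK, the one with MAX(date or id).
    pk = ", ".join(pk_cols)
    return f"""
    SELECT * FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY {pk} ORDER BY {pivot_col} DESC) AS rn
        FROM {target_table} t
    ) dedup WHERE rn = 1
    """

print(incremental_select("erp.orders", "updated_at", "2024-01-01"))
print(dedup_on_pk("bronze.orders", ["order_id"], "updated_at"))

If automatic deduplication is NO, the same ROW_NUMBER pattern is what a DBT model would apply later in the gold layer.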
4) Audit logs
- Automatically add an ingestion datetime & a pipeline session id to the target SQL table
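A minimal sketch of these audit columns stamped on a DataFrame just before loading; the column names are hypothetical but mirror the two fields above:

# Minimal sketch: stamp every ingested row with the load datetime and the pipeline session id.
import uuid
from datetime import datetime, timezone
import pandas as pd

def add_audit_columns(df, session_id=None):
    out = df.copy()
    out["_ingested_at"] = datetime.now(timezone.utc)                 # ingestion datetime
    out["_pipeline_session_id"] = session_id or str(uuid.uuid4())    # one id per pipeline run
    return out

batch = add_audit_columns(pd.DataFrame({"order_id": [1, 2], "amount": [10.0, 20.5]}))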
5) Monitoring
- Store and expose logs & information
- Alert in case of success, issues or warnings (schema changes)
- Offer logs via API (copy histories: datetimes, row counts, issues...)
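As an illustration of exposing logs via API, a minimal sketch that pulls copy histories and flags problems; the endpoint and response fields are hypothetical and depend on the editor's actual API:

# Minimal sketch: read the last runs of a job from a (hypothetical) monitoring endpoint.
import requests

resp = requests.get(
    "https://el-tool.example.com/api/v1/jobs/1234/runs",   # hypothetical monitoring endpoint
    headers={"Authorization": "Bearer <api-token>"},
    params={"limit": 10},
    timeout=30,
)
resp.raise_for_status()
for run in resp.json()["runs"]:                            # hypothetical payload shape
    print(run["started_at"], run["rows_copied"], run["status"])
    if run["status"] in ("failed", "warning"):             # e.g. schema-change warnings
        print("ALERT:", run.get("message", "no detail"))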
6) Orchestration service (Airflow, Microsoft Data Factory, etc.)
- On SaaS (one editor, one data platform): normally it is integrated
- Not on SaaS (several editors or data platforms)
-- Example chain: Airbyte E.L --> DBT (transformation for gold data) --> refresh a Power BI semantic model --> send a notification (see the Airflow sketch after this list)
- Orchestration needs to use schedules & source-system triggers (messages, file changes, etc.)
- The E.L tool/service can provide messages and fill in variables for the orchestrator
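A minimal Airflow 2.x sketch of the example chain above; the DAG id, schedule and the wrapper scripts called by each task are hypothetical placeholders, not a real implementation:

# Minimal sketch: Airbyte E.L --> DBT --> Power BI refresh --> notification, orchestrated by Airflow.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="el_dbt_pbi_chain",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                    # or a source-system trigger / sensor
    catchup=False,
) as dag:
    airbyte_sync = BashOperator(
        task_id="airbyte_extract_load",
        bash_command="python trigger_airbyte_sync.py",   # hypothetical wrapper around the Airbyte API
    )
    dbt_run = BashOperator(
        task_id="dbt_transform_gold",
        bash_command="dbt run --select gold",            # assumes a dbt project with a 'gold' selector
    )
    pbi_refresh = BashOperator(
        task_id="refresh_pbi_semantic_model",
        bash_command="python refresh_pbi_dataset.py",    # hypothetical wrapper around the Power BI REST API
    )
    notify = BashOperator(
        task_id="send_notification",
        bash_command="python send_notification.py",      # hypothetical Teams/Slack/email notifier
    )
    airbyte_sync >> dbt_run >> pbi_refresh >> notify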
7) Embedded & API
- Run jobs via API
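A minimal sketch of triggering a job from the outside (embedded scenario) via the tool's REST API; the URL, payload and response field are hypothetical:

# Minimal sketch: run an E.L job via a (hypothetical) REST endpoint.
import requests

resp = requests.post(
    "https://el-tool.example.com/api/v1/jobs/1234/run",  # hypothetical "run job" endpoint
    headers={"Authorization": "Bearer <api-token>"},
    json={"full_refresh": False},                        # e.g. incremental vs forced full refresh
    timeout=30,
)
resp.raise_for_status()
print("Run id:", resp.json()["run_id"])                  # hypothetical response field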
8) DevOps
- The E.L pipelines need to be CI/CD compliant (automatic deployments with DEV/TEST/PROD environment parameters)
- Nothing linked to environments is "hard-coded" (source, target and credentials are managed via environment variables)
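A minimal sketch of the "nothing hard-coded" rule: the same pipeline definition reads all environment-specific settings from environment variables, so it deploys unchanged to DEV/TEST/PROD (variable names are hypothetical):

# Minimal sketch: environment-specific configuration resolved at runtime, never committed.
import os

config = {
    "environment": os.environ["EL_ENV"],               # DEV / TEST / PROD
    "source_dsn": os.environ["EL_SOURCE_DSN"],         # e.g. ERP SQL connection string
    "target_schema": os.environ["EL_TARGET_SCHEMA"],   # e.g. bronze_dev vs bronze_prod
    "key_vault_url": os.environ["EL_KEY_VAULT_URL"],   # credentials fetched at runtime (see section 1)
}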
Comment (8 months ago): Great summary! What do you think about adding an "incremental staging layer" before bronze when using sources with large data volumes? I now tend to do this in Fabric to detect errors before appending.