[Figure: DataOps simple model]

What Is Data Engineering?

Data engineering is the set of practices data engineers use to transform (extract, transform, load, or ETL) raw data into useful data that data analysts and data scientists can use to make better decisions for an organization.

What Is DataOps?

DataOps is the practice of automating the data-driven cycle used by analytics teams by building automated pipelines. It improves the quality of data analytics and reduces its cycle time.

Difference Between DevOps and DataOps

In short, DevOps applies automation, collaboration, and CI/CD practices to software delivery, while DataOps applies the same principles to analytics data pipelines, adding data-specific concerns such as data quality, validation, and orchestration.

[Figure: DataOps vs. DevOps comparison]

Why DataOps for Data Engineering?

DataOps helps data engineers by enabling end-to-end orchestration of pipelines, code (Spark, SQL, Hive), and organizational data environments. It fosters collaboration within teams so they can engage with and solve customer needs. DataOps also helps data engineers collaborate with data stakeholders, helping everyone achieve scalability, reliability, and agility.

How Is DataOps Used in Azure?

Key Learnings:

1) Use data tiers in the data lake

Generally, you want to divide your data lake into three major areas containing your bronze, silver, and gold datasets (see the sketch after this list).

  • Bronze tier: where the raw data is kept in the data lake without any transformation applied.
  • Silver tier: where data is cleansed and semi-processed. Silver datasets conform to a pre-defined schema and may have had data augmentation applied. This data is typically used by data scientists, who don’t require fully cleaned data.
  • Gold tier: fully cleaned data used by business users, structured into fact and dimension tables.
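
As a rough illustration of how these tiers can map onto data lake paths, here is a minimal PySpark sketch; the storage account, container, dataset, and column names are illustrative assumptions, not from the article:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical ADLS Gen2 container holding all three tiers.
LAKE = "abfss://lake@mystorageaccount.dfs.core.windows.net"

# Bronze: land the raw input untouched.
raw = spark.read.json(f"{LAKE}/landing/sales/")
raw.write.mode("append").parquet(f"{LAKE}/bronze/sales")

# Silver: cleanse and conform to a pre-defined schema.
silver = (
    spark.read.parquet(f"{LAKE}/bronze/sales")
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_date"))
)
silver.write.mode("overwrite").parquet(f"{LAKE}/silver/sales")

# Gold: aggregate into a business-facing fact table.
gold = silver.groupBy("order_date").agg(F.sum("amount").alias("total_amount"))
gold.write.mode("overwrite").parquet(f"{LAKE}/gold/fact_sales")
```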

2) Validate data early in the pipeline

  • Data validation is done between the bronze and silver datasets. By validating early in your process, you can ensure that all subsequent datasets conform to a defined schema; this can also prevent pipeline failures when the input data changes unexpectedly.
  • Data that doesn’t pass validation is stored as malformed records for diagnostic purposes (see the sketch below).
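
A minimal sketch of this validation step, assuming a sales dataset with order_id, order_date, and amount columns (illustrative names and paths):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.parquet("/lake/bronze/sales")  # illustrative path

# Validation rule: required fields are present and amount parses as a number.
is_valid = (
    F.col("order_id").isNotNull()
    & F.col("order_date").isNotNull()
    & F.col("amount").cast("double").isNotNull()
)

# Valid rows move on to the silver tier; malformed rows are parked
# for diagnostics instead of failing the whole pipeline.
bronze.filter(is_valid).write.mode("overwrite").parquet("/lake/silver/sales")
bronze.filter(~is_valid).write.mode("append").parquet("/lake/malformed/sales")
```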

3) Make your data pipelines replayable and idempotent

  • Silver and gold datasets can become corrupted for several reasons, such as unintended bugs or unexpected changes to the input data. By making data pipelines replayable and idempotent, you can recover from this state by deploying code fixes and replaying the pipelines (see the sketch below).
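
One common way to get idempotence is to rebuild a whole partition on each run rather than appending to it. A minimal sketch, with illustrative paths and column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def build_silver_for_day(run_date: str) -> None:
    # Overwriting the target partition (instead of appending) makes the
    # job idempotent: replaying the same run_date yields the same output.
    day = (
        spark.read.parquet("/lake/bronze/sales")  # illustrative path
        .filter(f"order_date = '{run_date}'")
        .dropDuplicates(["order_id"])
    )
    day.write.mode("overwrite").parquet(
        f"/lake/silver/sales/order_date={run_date}"
    )

# Replaying a corrupted day is just calling the function again
# after the code fix has been deployed.
build_silver_for_day("2023-01-15")
```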

4) Ensure data transformation code is testable

  • Abstracting data transformation code away from data access code is key to making the transformation logic unit-testable. Moving transformation code from notebooks into packages is one example (see the sketch below).
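
A minimal sketch of the idea: the transformation is a pure function of a DataFrame, so a unit test can exercise it with a tiny in-memory DataFrame (column names are illustrative):

```python
from pyspark.sql import DataFrame, functions as F

def cleanse_orders(df: DataFrame) -> DataFrame:
    # Pure transformation: no reads or writes, so it is easy to unit test.
    return (
        df.dropDuplicates(["order_id"])
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("amount") > 0)
    )

def test_cleanse_orders(spark):  # 'spark' would come from a pytest fixture
    df = spark.createDataFrame(
        [("a1", "10.0"), ("a1", "10.0"), ("a2", "-5")],
        ["order_id", "amount"],
    )
    # The duplicate and the negative amount are both removed.
    assert cleanse_orders(df).count() == 1
```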

5) Have a CI/CD pipeline

  • Keep all the artifacts needed to build a data pipeline from scratch in source control: infrastructure-as-code artifacts, database objects (schema definitions, functions, stored procedures, etc.), reference/application data, data pipeline definitions, and data validation and transformation logic.
  • There should also be a safe, repeatable process to move changes through dev, test, and finally production.

6) Secure and centralized configuration

  • Maintain a central, secure location for sensitive configuration, such as database connection strings, that the appropriate services can access within each environment. In Azure this can be done with Azure Key Vault (see the sketch below).
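
A minimal sketch using the azure-identity and azure-keyvault-secrets Python SDKs; the vault name and secret name are illustrative assumptions:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Works with a managed identity in Azure and with az login locally.
client = SecretClient(
    vault_url="https://my-dataops-vault.vault.azure.net",  # hypothetical vault
    credential=DefaultAzureCredential(),
)

# The pipeline reads the connection string at runtime; it is never
# hard-coded or committed to source control.
conn_str = client.get_secret("sql-connection-string").value  # hypothetical name
```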

7) Monitor infrastructure, pipelines, and data

  • A proper monitoring solution should be in place to ensure failures are identified, diagnosed, and addressed in a timely manner (see the sketch below).
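
As a minimal sketch of instrumenting pipeline steps, here is plain Python logging; in Azure these records would typically be shipped to Log Analytics or Application Insights by the hosting service:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline.sales")  # illustrative pipeline name

def run_step(name, fn, *args):
    # Record duration on success and a full stack trace on failure,
    # so problems can be identified and diagnosed quickly.
    start = time.monotonic()
    try:
        result = fn(*args)
        log.info("step=%s status=ok duration_s=%.1f",
                 name, time.monotonic() - start)
        return result
    except Exception:
        log.exception("step=%s status=failed", name)
        raise
```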

What Are The Benefits Of DataOps?

The benefits of DataOps are listed below:

  1. Reduces the complexity of end-to-end orchestration and operation of data analytics.
  2. Improves data quality.
  3. Shortens the data life cycle of cleaning, processing, and loading.
  4. Improves collaboration between data and DevOps teams.
  5. Changes go through production-like tests and patterns, which strengthens testing.
  6. Solves problems faster than traditional methods.
  7. Helps protect customer data, reducing risk.

Conclusion

In short, DataOps is not simply DevOps applied to data. It is a set of practices and methods that add value to the data you collect: it encourages collaboration, coordinates processes from on-premises deployments to the cloud, and ensures controlled, secure results. It allows monitoring of each process, with quality checks at different stages, to ensure the reliability of data. It reduces delays, optimizes the loading and cleaning process, shortens the overall life cycle, makes work easier and faster, and evolves with the latest trends.


