PRACTICAL CHALLENGES OF IMPLEMENTING A DATA PIPELINE
Ashutosh Shah (Ash)
Generative AI, Agentic AI & Data Product - Responsible AI Architecture Strategy
A data pipeline is a series of steps executed sequentially on each dataset to generate a final output. The process usually involves complex stages of extraction, processing, storage, and analysis. As a result, each stage, as well as the overall framework, requires diligent management and adoption of best practices. Some common challenges in implementing a data pipeline include:
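To make these stages concrete, below is a minimal sketch of such a pipeline in Python. The file names, column names, and the aggregation logic are illustrative assumptions, not a prescription:

```python
# A minimal sketch of the extract -> process -> store -> analyse flow
# described above. Input/output files and fields are hypothetical.
import csv
import json
from statistics import mean

def extract(path: str) -> list[dict]:
    """Extraction stage: read raw records from a CSV source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def process(records: list[dict]) -> list[dict]:
    """Processing stage: cast types and drop malformed rows."""
    cleaned = []
    for row in records:
        try:
            cleaned.append({"region": row["region"], "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # skip rows that fail validation
    return cleaned

def store(records: list[dict], path: str) -> None:
    """Storage stage: persist the cleaned records."""
    with open(path, "w") as f:
        json.dump(records, f)

def analyse(records: list[dict]) -> float:
    """Analysis stage: compute a simple aggregate."""
    return mean(r["amount"] for r in records)

if __name__ == "__main__":
    raw = extract("sales.csv")  # hypothetical input file
    clean = process(raw)
    store(clean, "sales_clean.json")
    print(f"Average sale: {analyse(clean):.2f}")
```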
SLOWER PIPELINES DUE TO MULTIPLE JOINS AND STAR SCHEMA
Joins allow data teams to combine data from separate tables and extract insights. Given the number of sources involved, modern data pipelines use multiple joins for end-to-end orchestration. These joins consume computing resources, thereby slowing down data operations. Besides this, large data warehouses rely on star schemas to join dimension tables to fact tables. Because of their highly denormalised structure, star schemas are considered less flexible for enforcing the data integrity of dynamic data models.
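As a rough illustration, the sketch below joins a hypothetical fact table to two dimension tables with pandas; every additional dimension means one more join executed on every pipeline run. All table and column names are assumptions for the example:

```python
# A minimal sketch of a star-schema query: one fact table joined to two
# dimension tables. Tables and columns are illustrative assumptions.
import pandas as pd

fact_sales = pd.DataFrame({
    "date_id": [1, 1, 2],
    "product_id": [10, 11, 10],
    "amount": [120.0, 80.0, 200.0],
})
dim_date = pd.DataFrame({"date_id": [1, 2], "month": ["Jan", "Feb"]})
dim_product = pd.DataFrame({"product_id": [10, 11], "category": ["Books", "Toys"]})

# Each merge below is a join; in a real warehouse every extra dimension
# adds another join, and each join consumes compute on every run.
report = (
    fact_sales
    .merge(dim_date, on="date_id")        # join 1: date dimension
    .merge(dim_product, on="product_id")  # join 2: product dimension
    .groupby(["month", "category"], as_index=False)["amount"].sum()
)
print(report)
```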
SLOW DEVELOPMENT OF RUNNABLE DATA TRANSFORMATIONS
With modern data pipelines, organizations are able to build functional data models based on recorded data definitions. However, developing functional transformations from these models brings its own challenges, as the process is expensive, slow, and error-prone. Developers are often required to hand-write executable code and runtimes for data models, resulting in ad-hoc, unstable transformations.
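One mitigation is to generate the executable transformation from a declarative definition rather than hand-writing it each time. The sketch below compiles a small, hypothetical model spec (the rename, cast, and null-check rules are assumptions) into a runnable transformation:

```python
# A minimal sketch of turning a declarative data-model definition into a
# runnable transformation, the step that is often hand-coded today.
from typing import Callable

MODEL_SPEC = {
    "rename": {"cust_nm": "customer_name"},
    "cast": {"order_total": float},
    "drop_nulls": ["customer_name"],
}

def build_transform(spec: dict) -> Callable[[list[dict]], list[dict]]:
    """Compile a spec into an executable row transformation."""
    def transform(rows: list[dict]) -> list[dict]:
        out = []
        for row in rows:
            # apply column renames
            row = {spec["rename"].get(k, k): v for k, v in row.items()}
            # apply type casts
            for col, caster in spec["cast"].items():
                if col in row:
                    row[col] = caster(row[col])
            # drop rows that fail the null checks
            if any(row.get(col) is None for col in spec["drop_nulls"]):
                continue
            out.append(row)
        return out
    return transform

transform = build_transform(MODEL_SPEC)
print(transform([{"cust_nm": "Ada", "order_total": "42.5"}]))
```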
NUMEROUS SOURCES AND ORIGINS
Data-driven applications evolve constantly and often ingest data from a growing number of sources. Managing these sources and the processes they run is challenging because each exposes data in a different format. A large number of sources also makes it difficult to document the data pipeline’s configuration details, which hampers cross-domain collaboration in software teams.
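For illustration, the sketch below normalises records from two hypothetical sources with different formats; every new entry in the registry is one more format and one more configuration detail to document:

```python
# A minimal sketch of ingesting from heterogeneous sources (CSV and JSON
# here). The source registry and field names are illustrative assumptions.
import csv
import io
import json

def read_csv_source(payload: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(payload)))

def read_json_source(payload: str) -> list[dict]:
    return json.loads(payload)

# Registry mapping each source to its parser; each addition here is
# another format and configuration to manage and document.
SOURCES = {
    "billing": read_csv_source,
    "clickstream": read_json_source,
}

def ingest(source: str, payload: str) -> list[dict]:
    return SOURCES[source](payload)

print(ingest("billing", "user,amount\nada,10\n"))
print(ingest("clickstream", '[{"user": "ada", "page": "/home"}]'))
```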
COMPLEXITY IN SECURING SENSITIVE DATA
Organizations host petabytes of data for multiple users with different data requirements. Each of these users has different access permissions for different services, requiring restrictions on how data can be accessed, shared, or modified. Assigning access rights to every individual manually is often a herculean task, and if done incorrectly it may expose sensitive information to malicious actors.
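A common alternative to per-individual assignment is role-based access control, where rights are granted to roles rather than to users directly. Below is a minimal sketch; the roles, permissions, and users are illustrative assumptions:

```python
# A minimal sketch of role-based access checks, the usual alternative to
# assigning rights to every individual by hand. All names are hypothetical.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "share"},
}

USER_ROLES = {"ada": "engineer", "bob": "analyst"}

def is_allowed(user: str, action: str) -> bool:
    """Check whether the user's role grants the requested action."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_allowed("ada", "write")
assert not is_allowed("bob", "share")  # analysts cannot share data
```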
GROWING TALENT GAP
With the growth of emerging disciplines such as data science and deep learning, companies require more personnel and expertise than the job market can supply. On top of this, a typical data pipeline implementation involves a steep learning curve, forcing organizations to dedicate resources either to upskill existing staff or to hire skilled experts.
In an upcoming post, I will share some best practices for implementing data pipelines and mitigating these challenges. Thanks for reading.