Should Data Engineers be Domain Competent?
Shammy Narayanan
Chief Solution Architect | 10x Cloud Certified | Founder - Celebrating Life | Adjunct Professor at VIT | Author
Data and SARS have one thing in common: both are scary. Much like the rush to get vaccinated, enterprises are in unforgiving haste to assemble data teams, chasing a Midas-like dream of monetizing their data. The influencer engine is working overtime to fuel this golden dream, but the question we refuse to ask ourselves is: are we building the data team right? Does our data strategy have sanity built into it? To find the answers, we need not look outward; a candid conversation with our teams and a peek into the pile of production issues will spotlight our fundamentally flawed approach.
A traditional data engineer views a table with one million records as relational rows to be crunched, transported and loaded to a different destination. An application programmer approaches the same table as a set of member records or pending claims that affect lives. The former is a purely technical view; the latter is human-centric. These drastically different lenses are the genesis of data silos.
Let's start with a common data incident: how often have we witnessed crippling performance caused by a poorly chosen index? For anyone who has spent time in IT, the answer is a non-zero integer. It happens because indices are built on the columns the DBAs perceive as vital, with scant regard for the application's true access paths. The result is gradual performance degradation that eventually forces re-indexing, usually after a series of nagging customer complaints about slowness. Isn't this scenario a powerful testament to how a lack of domain knowledge leads directly to distressed customers?
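The gap between the DBA's view and the application's real access path can be made concrete in a few lines. The sketch below uses an in-memory SQLite table; the claims schema, column names and index names are invented for illustration, not taken from any real system. The point is simply that the optimizer only helps when an index matches how the application actually queries:

```python
import sqlite3

# Hypothetical claims table, for illustration only; the schema and
# index names are assumptions, not from any real application.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE claims (
        claim_id     INTEGER PRIMARY KEY,
        member_id    INTEGER,
        created_date TEXT,
        status       TEXT
    )
""")

# The DBA-centric index on the "obvious" column...
conn.execute("CREATE INDEX idx_created ON claims(created_date)")
# ...versus an index matching the application's true access path:
# "find the pending claims for this member".
conn.execute("CREATE INDEX idx_member_status ON claims(member_id, status)")

def plan(query):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " ".join(row[-1] for row in rows)

app_query = "SELECT * FROM claims WHERE member_id = 42 AND status = 'PENDING'"
print(plan(app_query))  # the plan uses idx_member_status, not idx_created
```

If only `idx_created` existed, the same query would fall back to a full table scan, which is exactly the slow-creep degradation described above.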
Graduating further, take a closer look at the partitioning strategy of your critical databases; I would bet my paycheck that 90% of table partitions are based on a date column rather than on actual access parameters. Such a mindless, bookish strategy drains the spool, hogs the CPU and renders the application unresponsive the moment a join is executed. Could we have built it right the first time? Not until the data analysts understand the application well. Deploying and celebrating such an ill-designed application is like claiming a surgery succeeded while the patient lies dead on the table.
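Why the choice of partition key matters can be sketched in plain Python. In a real database this decision lives in DDL (`PARTITION BY ...`); here the routing is simulated, and the record shape and partition count are assumptions made up for the example. A member-keyed scheme lets a per-member query prune down to one partition, while a date-keyed scheme forces that same query to touch every partition:

```python
from collections import defaultdict

N_PARTITIONS = 4  # assumed partition count, illustrative only

def partition_by_member(record):
    """Domain-aware key: co-locates all of a member's rows."""
    return record["member_id"] % N_PARTITIONS

def partition_by_date(record):
    """Bookish default: spreads rows by month, ignoring access patterns."""
    month = int(record["created_date"][5:7])
    return month % N_PARTITIONS

# Four claims for one member, spread across four months.
records = [
    {"member_id": 42, "created_date": f"2024-0{m}-01"} for m in range(1, 5)
]

by_member, by_date = defaultdict(list), defaultdict(list)
for r in records:
    by_member[partition_by_member(r)].append(r)
    by_date[partition_by_date(r)].append(r)

# Partitions that must be scanned to answer "all claims for member 42":
print(len(by_member))  # 1 -> partition pruning works
print(len(by_date))    # 4 -> every partition touched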
领英推荐
The same ignorance gets carried over to the data transformation/processing. Usually, the load balancers are configured to balance the incoming data load. This simple approach works fine as long as the data source is homogeneous; however, in real-time, we have data from heterogeneous sources with conflicting and varying priorities. In such instances, our approach needs to prioritize the criticality. For example, records used for MIS reporting can wait compared to a transaction waiting for pre-authorization. Such smartness in data ontology has to be inbuilt, and it can be done only by a team that understands the domain. On any given day, a low-performing smart pipeline is far preferable to a high-throughput pipeline built on FIFO. I can keep enumerating myriad of such use cases ranging from inefficient APIs and incompetent data invalidation strategies to miserable database locks. All these testaments are not the product of technical incompetencies but the direct impact of the flawed strategy to isolate data teams and treat them as pureplay "Technical Powerhouse."
When we advocate domain knowledge, let's not relegate it to a few Business Analysts who are tasked to translate a set of high-level requirements into user stories, rather domain knowledge implies that every data engineer gets a grip on the intrinsic understanding of how functionality flows and what it tries to accomplish. Of course, this is easier to preach than practice, as expecting a data team to understand thousands of tables and millions of rows is akin to expecting them to navigate a freeway in peak time on the reverse gear with blindfolds; it will be a disaster.
When its amply evident that Data teams need domain knowledge, its also imperative that centralized data teams are not delivering efficient results; Embedding a Data team as part of an application team appears to be the most viable solution; this is where the concept of Data Mesh that is fast evolving, and its sexiness is seducing the enterprises. The next wave of maturity is to move cautiously and swiftly from a centralized mode to a federated model where data teams are de-centralized. Yet, strategic layers such as Data Governance, security and compliance stay under a common umbrella. Will this be the silver bullet for all our problems? We hope and sincerely wish so, but we cannot guarantee it. As data and analytics evolve from the dark underbelly of the IT landscape, we are in to witness more such surprises and twists convoluting this complicated maze. For engineers like me, such a whirlwind is what makes working in Data an exciting and exuberant challenge.
Digital Transformation Leader | I Empower my clients to Accelerate their performance and Elevate their Leadership | Best Selling Author
1 年Shammy , You have well brought out the need for domain competency for data engineers .