I am now a 7(8?) months old data engineer part 2
Photo by Donald Giannatti on Unsplash (https://unsplash.com/photos/Zu55y7L5Wm4)


I recently began a series of posts on what I have learned in my role as a data engineer.

The focus of my first post was on environments (deployment environments like dev, test and prod).

It's quite a life-changer to go from working in a single environment to suddenly developing in dev, testing in UAT and releasing production-ready features to the production environment.

Another realization I had concerns data quality.

Here's to number 2 of my data engineer learnings.


Data Quality Chase

One of my biggest motivators for shifting into data engineering was the impact I would have on the quality of analytics data.

Back in the day when I was a data analyst, I was constantly frustrated by the inconsistencies I would find in the tables I used when building reports and dashboards and when doing analyses.

Sometimes data was outdated, and sometimes it was simply wrong and not consistent with the source system.

My thinking was: let me get into data engineering so that I can learn to build robust pipelines and deliver good-quality data for data analysts.

To my surprise, however, it takes much more than a motivated data engineer, or a team of data engineers for that matter, to have an impact on data quality.

Data is ingested from somewhere (a source system). If the data is already of low quality there, data engineers have little to no hope.

Sure, one can always build an advanced set of data quality checks and rules to correct bad data upon ingestion into a data lake, or when performing cleaning and transformations.

But isn't that a bit too much to ask of a data engineer?

(I think it is. But that's a topic for another post.)
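
To make the idea of checks upon ingestion a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the rows, the column names (`order_id`, `amount`, `updated`), the `check_rows` function and the 30-day staleness threshold are assumptions, not something taken from a real pipeline. The function only flags problems and never changes the rows.

```python
from datetime import date, timedelta

# Hypothetical rows as they might arrive from a source system.
rows = [
    {"order_id": 1, "amount": 120.0, "updated": date(2023, 6, 1)},
    {"order_id": 2, "amount": None, "updated": date(2023, 6, 1)},
    {"order_id": 2, "amount": 80.0, "updated": date(2023, 1, 15)},
]


def check_rows(rows, today, max_age_days=30):
    """Return (order_id, issue) pairs; the rows themselves are not modified."""
    issues = []
    seen_ids = set()
    for row in rows:
        # Completeness: a required value is missing.
        if row["amount"] is None:
            issues.append((row["order_id"], "missing amount"))
        # Uniqueness: the same key appeared earlier in the batch.
        if row["order_id"] in seen_ids:
            issues.append((row["order_id"], "duplicate order_id"))
        seen_ids.add(row["order_id"])
        # Freshness: the record is older than the assumed threshold.
        if today - row["updated"] > timedelta(days=max_age_days):
            issues.append((row["order_id"], "stale record"))
    return issues


issues = check_rows(rows, today=date(2023, 6, 10))
for order_id, issue in issues:
    print(order_id, issue)
```

Even this toy version only reports what looks wrong. Deciding what the correct value should be is a different question, and usually one for whoever owns the source system.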

What now, you might ask? How do I live knowing that even in a data engineer role I have so little impact on data quality?

I live wonderfully, actually. While I cannot magically correct bad data coming from source systems, I can expose that data to analysts and business users.

This makes data quality problems visible.

The data analysts who consume data to build their analytics solutions, as well as the business users who consume those solutions (reports and dashboards), become aware of poor-quality data and feel its impact. It is then in their interest to fix bad data at the source.

One could actually say that I am having some sort of indirect and long-term impact on data quality.


I wonder:

  • Can anyone relate to my motivation to get into data engineering for data quality?
  • What about the dilemma of fixing bad data upon ingestion or delivery to data analysts?


And a random question:

  • What does the picture in this article symbolize?








Bogdan Cioată

Senior Platform Engineer at INTELLISHORE

1 year

The rust spreads slowly but surely, until it's too late, one can say :) But don't lose hope, there are tools at hand that can help with such issues. Have a look at dbt, I'm sure it will tackle all the concerns and issues you raised in this post. Some good (and motivational) starting points:

  • https://www.getdbt.com/blog/data-quality-dimensions/
  • https://www.getdbt.com/blog/data-quality-framework/

Have fun and the time will fly :)

