I am now a 7(8?) months old data engineer part 2
Ivanna Jurkiv Ditlevsen
Senior Data Engineer | Novo Nordisk Engineering
I recently began a series of posts on what I have learned in the role of a data engineer.
The focus of my first post was on environments (deployment environments like dev, test and prod).
It's quite a life-changer to go from working in a single environment to suddenly developing in dev, testing in UAT and releasing production-ready features to the production environment.
One more realization that I had is on the topic of data quality.
Here's to number 2 of my data engineer learnings.
Data Quality Chase
One of my biggest motivators for shifting into data engineering was the impact I would have on the quality of analytics data.
Back in the day when I was a data analyst, I was constantly frustrated by inconsistencies I would find in the tables that I used when building reports and dashboards and when doing analyses.
Sometimes data was outdated, and sometimes it was simply wrong and not consistent with the source system.
My thinking was: let me get into data engineering so that I can learn to build robust pipelines and deliver good quality data for data analysts.
To my surprise, however, it takes much more than a motivated data engineer, or a team of data engineers for that matter, to have some sort of impact on data quality.
Data is ingested from somewhere (a source system). So if the data is already of low quality there, data engineers have little to no hope.
Surely one can always build some advanced set of data quality checks and rules to correct bad data upon ingestion into a data lake, or when performing cleaning and transformations.
But isn't that a bit too much to ask of a data engineer?
(I think it is. But that's a topic for another post)
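For anyone who has not seen one, here is a minimal sketch of what a small set of data quality rules at ingestion could look like. Everything in it is an assumption for illustration: the `Rule` class, the `check_rows` function, the column names and the sample rows are all hypothetical, not something from a real pipeline or library. Note that it only counts violations; it does not try to correct the data.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable

# Hypothetical names throughout: Rule, check_rows and the columns
# (customer_id, amount, updated_on) are made up for this sketch.


@dataclass
class Rule:
    name: str
    is_valid: Callable[[dict], bool]


def check_rows(rows: list[dict], rules: list[Rule]) -> dict[str, int]:
    """Count violations per rule without modifying the rows."""
    violations = {rule.name: 0 for rule in rules}
    for row in rows:
        for rule in rules:
            if not rule.is_valid(row):
                violations[rule.name] += 1
    return violations


# Fixed reference date so the freshness rule is reproducible.
today = date(2024, 1, 31)

rules = [
    Rule("customer_id is present", lambda r: r.get("customer_id") is not None),
    Rule("amount is non-negative", lambda r: r.get("amount") is not None and r["amount"] >= 0),
    Rule("updated within 7 days", lambda r: today - r["updated_on"] <= timedelta(days=7)),
]

rows = [
    {"customer_id": 1, "amount": 120.0, "updated_on": date(2024, 1, 30)},
    {"customer_id": None, "amount": 35.5, "updated_on": date(2024, 1, 29)},
    {"customer_id": 3, "amount": -10.0, "updated_on": date(2023, 12, 1)},
]

report = check_rows(rows, rules)
for rule_name, count in report.items():
    print(f"{rule_name}: {count} violation(s)")
```

Even a toy like this shows the catch: someone has to decide what "valid" means for every column, and that knowledge usually lives with the owners of the source system rather than with the data engineer.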
What now, you might ask? How do I live knowing that even in a data engineering role I have so little impact on data quality?
I live wonderfully, actually, and that is because while I cannot magically correct bad data coming from source systems, I can expose that data to analysts and business users.
This makes data quality problems visible.
The data analysts who consume the data to build their analytics solutions, as well as the business users who consume those solutions (reports and dashboards), become aware of and are affected by poor quality data. It then becomes in their interest to address and fix bad data at the source.
One could actually say that I am having some sort of indirect and long-term impact on data quality.
I wonder:
And a random question:
Platform Engineer at INTELLISHORE
1 year ago
The rust spreads slowly but surely, until it's too late, one can say :) But don't lose hope, there are tools at hand that can help with such issues. Have a look at dbt, I'm sure it will tackle all the concerns and issues you raised in this post. Some good (and motivational) starting points:
* https://www.getdbt.com/blog/data-quality-dimensions/
* https://www.getdbt.com/blog/data-quality-framework/
Have fun and the time will fly :)