Building Trust in Data
Keren Henninger
Data Exploration | Analyst at the Crossroads | Supply Chain @ Abbott
The Google Data Analytics course opens with "data, data everywhere" (I chose not to take this course). Well, you get it: data really is everywhere.
And the Modern Data Stack is confusing. Two years ago, it looked like this:
Fear not, the Modern Data Stack is gradually consolidating.
Beware, there are going to be a lot of buzzwords mentioned.
In their practical use, both are more related to the work of the data engineer.
Data engineers spend about 50% of their time maintaining the pipelines, trying to minimize data downtime. And time is money.
1 minute of downtime on AWS costs $200,000 at the very minimum.
Having reliable data is ideal. The reality is less than ideal. And which of the data roles is responsible for that?
In comes Data Governance. "Data governance is everything you do to ensure data is secure, private, accurate, available, and usable. It includes the actions people must take, the processes they must follow, and the technology that supports them throughout the data life cycle." (Google Cloud)
So, observability means discovery, resolution, and, most important in my mind, prevention. There's no point (although there is glamour) in just resolving issues. Prevention may take some effort and won't justify itself right away, but it will save time and money in the long run.
Data engineers don't need to be responsible for data quality. Data governance brings responsibility to the data producers within the domain data teams.
And data analysts can spend about 80% of their time cleaning and processing the data. Why do they need to do that? Because the data is wrong. Don't draw insights from wrong data and present them to your stakeholders.
Data Quality: the data needs to be accurate, complete, consistent, valid, timely, and unique.
Garbage in = Garbage out.
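To make those quality dimensions a bit more concrete, here is a minimal sketch of how a data producer might count failing rows before the data reaches an analyst. Everything in it is an assumption for illustration: the order records, the field names (order_id, quantity, country, updated_at), the list of known countries, and the one-day freshness limit. Accuracy is left out on purpose, since checking it needs a trusted source to compare against.

```python
from datetime import datetime, timedelta

# Hypothetical schema and reference list, assumed for illustration only.
REQUIRED_FIELDS = ("order_id", "quantity", "country", "updated_at")
KNOWN_COUNTRIES = {"US", "DE", "IL"}


def quality_report(rows, now, max_age=timedelta(days=1)):
    """Count rows that fail each data quality check."""
    failures = {"complete": 0, "valid": 0, "consistent": 0, "timely": 0, "unique": 0}
    seen_ids = set()
    for row in rows:
        # Complete: every required field is present and non-empty.
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            failures["complete"] += 1
            continue  # the remaining checks need all fields
        # Valid: quantity must be a positive integer.
        if not (isinstance(row["quantity"], int) and row["quantity"] > 0):
            failures["valid"] += 1
        # Consistent: country codes follow one agreed convention.
        if row["country"] not in KNOWN_COUNTRIES:
            failures["consistent"] += 1
        # Timely: the record was refreshed recently enough.
        if now - row["updated_at"] > max_age:
            failures["timely"] += 1
        # Unique: each order_id appears only once.
        if row["order_id"] in seen_ids:
            failures["unique"] += 1
        seen_ids.add(row["order_id"])
    return failures


now = datetime(2023, 3, 1, 12, 0)
rows = [
    {"order_id": 1, "quantity": 5, "country": "US", "updated_at": now},
    {"order_id": 1, "quantity": 5, "country": "US", "updated_at": now},    # duplicate id
    {"order_id": 2, "quantity": -3, "country": "usa", "updated_at": now},  # invalid and inconsistent
    {"order_id": 3, "quantity": 2, "country": "DE",
     "updated_at": now - timedelta(days=3)},                               # stale
    {"order_id": 4, "quantity": None, "country": "IL", "updated_at": now}, # incomplete
]
report = quality_report(rows, now)
print(report)
```

In this toy sample each check catches exactly one bad row. Running something like this on a schedule, and alerting when a count climbs, is one simple way to lean toward prevention rather than cleanup.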
Not to mention the data that never gets used and just sits in storage (that's the Dark Data).
Linking the data teams to ROI is not simple, but it is the direction of data: optimizing the spend and running metrics around it. Your organization needs to be confident that the data is trustworthy and actionable, and that it has a monetary value.
Disclaimer: I am a data analytics student. This is written from my point of view and is inspired in part by Monte Carlo, the Data Observability Platform.
Not resources, but relevant for further exploration: "The Modern Data Stack Evolution - 2023 will be the year of consolidation" by Chris Tabb from Data Day Texas 2023, and "What is Data Observability? 5 Key Pillars To Know In 2023" from Monte Carlo.