Data Engineering: Where the Magic of Analytics Starts
In 2012, Harvard Business Review (HBR) ran a front-page article titled "Data Scientist: The Sexiest Job of the 21st Century". Executives worldwide began to understand and appreciate the world of possibilities that Data Scientists could unlock. That unleashed massive waves of interest in Data Science, Machine Learning and Artificial Intelligence. Sadly, we paid only a fraction of that attention to the foundation: our data.
I hope this article starts a broader discussion about the critical role Data Engineers play.
How things have evolved over 8 years...
Eight years ago, Data Science was hard. There was a dearth of qualified talent, the tools were still nascent and we were working our way through the hype curve. Today, there is an incredibly rich pipeline of Data Science talent. Schools have done an excellent job of catering to these roles and creating the right curricula, and the explosion of Massive Open Online Courses (MOOCs) such as Udemy and Coursera has widened that pipeline further.
The process of building models has been largely commoditized, because there is a fairly well-defined way that most models are built. This has led to dozens of AutoML tools such as DataRobot, PyCaret and Salesforce Einstein. Through a lot of pain, we have developed a reasonable understanding of what Data Science can do (and what it can't), and we've learned as an industry how to manage the business discussions better to set ourselves up for success. Ultimately, Data Science is a hell of a lot easier today.
What hasn't gotten easier is data management. Data Engineering (the modern term for the practice of data management, spanning ingestion, cleaning, storage, security and access management) is the plumbing behind Data Science. Generally, we don't appreciate it. As with other utilities (water, sewage, electricity), we don't think about it until it stops working. We simply expect data to be ready for use, documented and clean. Data Engineering didn't get a front-page article in HBR, but it should have.
The tools are better, but the expectations are far higher. We need to ingest data from APIs, production transaction systems, text, images and IoT devices, and blend it all seamlessly. Unlike ML, there's no way to automate this. There's no single path to managing the data lifecycle; it depends too heavily on domain knowledge. It requires someone who understands the incoming data and can decide how to clean it, how to integrate it with other sources and, lastly, how to document it.
As for talent, educators have not done nearly enough to produce work-ready Data Engineers. For every 100 data science programs, there may be one data engineering program. Most of the programs that do exist are purely technical, focusing on how to use a specific application rather than teaching the theory of data operations, data architecture and modeling, security and governance.
So, what do we need to do to change this?
To Business Leaders:
To Educators:
To Analytics Practitioners:
Data Engineering may not be the sexiest role in 2021, but it's probably one of the most important.
Let me know your thoughts!
Excellent article. People often become enamoured with the sexiness of Machine Learning, forgetting the data cleansing and formatting needed before applying the appropriate techniques. Sarah Bacon
This is a terrific article. We’ve always advocated with clients to focus on data integration, governance and quality as means to improve the output of any analytics.
We do need more!
MBA, BASc. | CLSSMBB | CCMP | Transformation | Program Mgmt | Strategy Planning & Deployment | Board Member
3 years ago
Great article! One of the themes here and in the comments is the expectations of the executives making the data asks. Their data literacy, including their understanding of the roles and constraints involved, is an important dimension. Beyond a call to action, I would encourage ways to help them understand the challenge first-hand. Show them the data, uncleansed and unstructured. Ask them for insights. I would be curious about their response.
Data @ Fig
3 years ago
Great article, thanks for sharing! I find that a lot of learning and development happens on the job as well as in academic settings. In addition to creating DE-specific educational programs, I would also suggest cultivating a collaborative work culture where other analytics professionals (data scientists and analysts alike) can grow into the role through development and mentorship opportunities within the business. It seems that the bigger the company, the wider the gap between specialties and, as such, the harder this is to achieve.