Data Engineering: Where the Magic of Analytics Starts

In 2012, HBR ran a front-page article titled "Data Scientist: The Sexiest Job of the 21st Century" (link below). Executives worldwide began to understand and appreciate the possibilities that Data Scientists can unlock. That unleashed massive waves of interest in Data Science, Machine Learning and Artificial Intelligence. Sadly, we paid only a fraction of that interest to the foundation: our data.

I hope that this can start a bigger discussion about how to appreciate the critical role Data Engineers play.

How things have evolved over 8 years...

Data Science 8 years ago was hard. There was a dearth of qualified talent, the tools were still nascent and we were still working through the hype curve. Today, there is an incredibly rich pipeline of talent in Data Science. Schools have done an incredible job of catering to these roles and creating the right curriculum, helped by the explosion of Massive Open Online Course platforms like Udemy and Coursera.

The process of building models has been largely commoditized, as there is a pretty clearly defined way that most models are built. This has led to dozens of AutoML capabilities such as DataRobot, PyCaret, Einstein, etc. Through a lot of pain, we have reached a reasonable understanding of what Data Science can do (and what it can't), and we've learned as an industry how to manage the business discussions better to set ourselves up for success. Ultimately, Data Science is a hell of a lot easier today.

What hasn't gotten easier is data management. Data Engineering (the modern term for the practice of data management, covering ingestion, cleaning, storage, security and access management) is the plumbing behind Data Science. Generally, we don't appreciate it. As with most utilities (water, sewage, electricity), we don't care about it until it stops working. We just expect data to be ready for use, documented and cleaned. Data Engineering didn't get a front-page article in HBR, but it should.

The tools are better, but the expectations are way higher. We need to ingest data from APIs, production transaction data, text, images and IoT data and seamlessly blend them. Unlike ML, there's no way to automate this. There's no single path to managing the data lifecycle; it depends too much on domain knowledge. It requires someone to understand the data coming in, decide how to clean it, decide how to integrate it with other sources and, lastly, document it.

As for talent, educators have not done nearly enough to create work-ready Data Engineers. For every 100 data science programs, there may be 1 data engineering program. Most programs that do exist are purely technical, focusing on how to use a specific application rather than teaching the theory of data operations, data architecture/modeling, security and governance.

So, what do we need to do to change this?

To Business Leaders:

  1. Encourage more investment into Data Engineering teams and platforms. You need them to make the magic happen.
  2. Tell academia and educators that they need to do more to create talent pipelines for this skill. Help them understand what's missing and where your gaps are.
  3. Be realistic about technology. Technology can help but managing these complex data ecosystems takes smart minds.
  4. Ensure that you're thinking about data risks. Your Enterprise Risk Management should ensure that you're managing the data lifecycle, using security best practices and governing data.


To Educators:

  1. We need programs that make students ready to start Data Engineering roles. At least in Canada, they're almost non-existent.
  2. Make sure your programs are educating leaders on the value of Data Engineering. AI and ML are sexy, but leaders need to know that it's a team sport!


To Analytics Practitioners:

  1. Really consider data engineering as a field for your career. There's probably no area more in demand, pay is great and the challenges are enormous.
  2. We need a way to speak to the business value of data. We need to help educate business leaders on why these investments are required.
  3. If you're an analytics executive, join me in pushing for these changes!

Data Engineering may not be the sexiest role in 2021, but it's probably one of the most important ones.

Let me know your thoughts!


Excellent article. People often become enamoured with the sexiness of Machine Learning, forgetting the data cleansing and formatting needed before applying the appropriate techniques. Sarah Bacon

This is a terrific article. We’ve always advocated with clients to focus on data integration, governance and quality as a means to improve the output of any analytics.

David Haigh

MBA, BASc. | CLSSMBB | CCMP | Transformation | Program Mgmt | Strategy Planning & Deployment | Board Member

3 years ago

Great article! One of the themes here and in the comments is the expectations of executives making the data asks. Their data literacy, including an understanding of the roles and constraints, is an important dimension. Beyond a call to action, I would encourage ways to help them understand the challenge first hand. Show them the data, uncleansed and unstructured. Ask them for insights. I would be curious about their response.

Great article, thanks for sharing! I find that a lot of learning and development is done on the job as well as within academic settings. In addition to creating DE-specific educational programs, I would also suggest cultivating a collaborative work culture where other professionals from the analytics realm (data scientists and analysts alike) can grow into the role through development and mentorship opportunities within the business. It seems that the bigger the company, the bigger the gap between specialties and, as such, the harder this is to achieve.

More articles by Brad Kent

  • Analytics as an Agent of Change

    Analysts are trained by most schools to focus on databases, statistics, machine learning models, programming and…

    7 comments
