Data Engineer
Credit: Databricks


What does a data engineer actually do every day? What kind of skills do they use? What do they build? Are they developers or designers or both? How do they differ from data scientists and data analysts?

Let’s pause and actually think about this relatively new role, a role which is in high demand. Every data platform project needs them. Every AI project needs them. Those two alone mean millions of jobs, which means data engineers are in short supply: more of them are needed than are available.

What is a data engineer? Many sources define data engineers as the professionals who design and build systems that collect, store and process data. These systems consist of:
- data stores, e.g. databases, file systems, data streams.
- data ingestion, i.e. things that bring data into those data stores.
- data transformation, i.e. things that process the data in those data stores.
- data distribution, i.e. things that send the output of that processing to the users.
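As a rough illustration of those four parts, here is a minimal sketch in Python. It assumes an in-memory list as the data store and a made-up CSV of orders; the function names (`ingest`, `transform`, `distribute`) and the data are hypothetical, not taken from any particular platform.

```python
import csv
import io

# Hypothetical raw input, standing in for an export from a source system.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,4.00
3,alice,7.25
"""

def ingest(raw_text):
    """Data ingestion: bring source data into the data store."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Data transformation: total the order amounts per customer."""
    totals = {}
    for row in rows:
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    return totals

def distribute(totals):
    """Data distribution: format the output for the users."""
    return [f"{customer}: {total:.2f}" for customer, total in sorted(totals.items())]

data_store = ingest(RAW_CSV)             # the "data store" is just a list here
customer_totals = transform(data_store)
report = distribute(customer_totals)
```

In a real platform each stage would be a separate tool or service, but the shape (ingest, store, transform, distribute) stays the same.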

So data engineers are builders and designers. They build data processing systems. They build the data pipelines, the data stores, and the data processing engines.

That is how we data engineers explain our role to outsiders. The language is sanitised, with no technical jargon. But between ourselves we use thousands of “engineer speak” words like deployment, orchestration, analytics, ML, batch, silver zones, fact tables, Spark, lakes, data quality, data governance, error handling and recovery. Below is an example of a data processing system that data engineers build (credit: Databricks). We call it a data platform.


What kind of skills do data engineers use? For processing data, most data engineers use SQL or Python, but many other languages are in use too, such as R, Java, Scala and JavaScript (Node.js). On top of that come data transport tools, cloud data storage, and database management.
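To give a feel for how SQL and Python sit side by side, the sketch below counts events per status twice: once in SQL (using an in-memory SQLite database from Python's standard library) and once in plain Python. The `events` table and its rows are made up for illustration.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: (event_id, status) pairs.
events = [(1, "ok"), (2, "failed"), (3, "ok"), (4, "ok")]

# SQL version: load the rows into an in-memory table and aggregate there.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", events)
sql_counts = dict(
    conn.execute("SELECT status, COUNT(*) FROM events GROUP BY status").fetchall()
)
conn.close()

# Python version: the same aggregation without a database.
py_counts = dict(Counter(status for _, status in events))
```

Both versions give the same counts. Which one a team uses tends to depend on where the data already lives.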

The number of technologies that data engineers have to learn is mind-boggling. We are not talking about a handful, say five. It’s more like 30 different tools. You have orchestration, DevOps, data ingestion pipelines, infrastructure creation using IaC (infrastructure as code), and many more.

Because there are so many tools, mastering all of them is borderline impossible. So many data engineers focus on one cloud platform (say AWS or Azure), one database or data platform (say SQL Server, Snowflake or Databricks), and one data transformation tool (say ADF, DLT or dbt). Under this strategy, secondary areas like orchestration, IaC, DevOps and DQ (data quality) are treated as non-critical for data engineers and left to other roles, such as DevOps engineers and DQ engineers.

What does the day-to-day life of a data engineer look like? Well, they code. They build data pipelines. They test their work to make sure that the output of those pipelines is as expected. They test the performance of the transformation/processing to make sure that it runs within acceptable limits. They deal a lot with data issues. Sometimes they handle a data issue by correcting or adjusting the data in the transformation. Sometimes they send it back for remediation at the source.
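The sketch below shows what that can look like in miniature: a transformation that fixes the data issues it can, sets aside the rows it cannot, and is then checked for both output and run time. The function `clean_amounts`, the sample rows and the one-second limit are all assumptions made for illustration.

```python
import time

def clean_amounts(rows):
    """Parse the 'amount' field of each row.

    Returns (clean, rejected): rows that could be fixed in the
    transformation, and rows that need remediation at the source.
    """
    clean, rejected = [], []
    for row in rows:
        # Fixable issues: stray whitespace and thousands separators.
        raw = str(row.get("amount", "")).strip().replace(",", "")
        try:
            clean.append({"id": row["id"], "amount": float(raw)})
        except ValueError:
            rejected.append(row)
    return clean, rejected

# Hypothetical input containing typical data issues.
rows = [
    {"id": 1, "amount": "1,200.50"},  # thousands separator: fixable
    {"id": 2, "amount": " 35 "},      # stray whitespace: fixable
    {"id": 3, "amount": "n/a"},       # no usable value: send back to the source
]

start = time.perf_counter()
clean, rejected = clean_amounts(rows)
elapsed = time.perf_counter() - start

# Output test: the pipeline produced what we expected.
assert clean == [{"id": 1, "amount": 1200.5}, {"id": 2, "amount": 35.0}]
assert [r["id"] for r in rejected] == [3]

# Performance test: the 1-second limit is an arbitrary, assumed threshold.
assert elapsed < 1.0
```

Real pipelines run checks like these against representative data volumes, usually as part of an automated test suite rather than inline assertions.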

There are many roles in a data project, from project managers to testers. But data engineers are like the glue, interacting with all the other roles. They work with the data analysts. They work with the data architects. They work with the network engineers. They work with the testers. They work with the business analysts. They work with the business users. They work with the project managers. They work with the ML engineers.

Apart from building data pipelines, data engineers also build data stores, i.e. data lakes, data warehouses and databases, and even storage systems for unstructured data, streaming data, and semi-structured data (JSON, XML, social media data, etc.). So the types of database they deal with vary: not just relational, but also graph databases, document databases and other NoSQL databases. Data engineers must therefore be skilled in data stores, from designing them to building them. Yes, times have changed since we had a dedicated “data architect” designing the database. These days, data engineers also design the data stores: not just relational databases, but all kinds of lakes and NoSQL databases too.
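A common task when moving between these kinds of stores is turning a semi-structured document into relational rows. The sketch below flattens a made-up JSON document into one row per order item; the document layout and the function `flatten_orders` are hypothetical.

```python
import json

# Hypothetical semi-structured document, as it might sit in a document store.
doc = json.loads("""
{
  "customer": {"id": 7, "name": "Alice"},
  "orders": [
    {"order_id": 101, "items": ["pen", "ink"]},
    {"order_id": 102, "items": ["paper"]}
  ]
}
""")

def flatten_orders(document):
    """Produce one relational-style row per (order, item) pair."""
    customer = document["customer"]
    rows = []
    for order in document.get("orders", []):
        for item in order.get("items", []):
            rows.append((customer["id"], customer["name"], order["order_id"], item))
    return rows

order_rows = flatten_orders(doc)
```

Each tuple in `order_rows` could then be inserted into a relational table with columns for customer id, customer name, order id and item.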

Hassan Kroud

Senior Fullstack Developer - Data Engineer

5 months ago

Why Experienced Software Developers Can Transition to Data Engineering Quickly

Software developers with years of experience are well-positioned to move into data engineering. Here's why:
1- Problem-Solving Skills: Developers excel at solving complex problems, a key skill in data engineering for building and optimizing pipelines.
2- Programming Expertise: Data engineering relies on languages like Python and Java, which developers often already know, making the technical transition smooth.
3- Database Experience: Developers familiar with databases and SQL can easily manage and optimize large datasets.
4- Cloud & DevOps Knowledge: Cloud platforms (AWS, Azure) and DevOps practices align closely with data engineering, making it easy for developers to adapt.
5- Analytical Mindset: Developers’ problem-solving skills are highly applicable in data engineering for managing data flows and ensuring scalability.
6- Agility with Tools: Developers are fast learners of new tools like Spark, Kafka, and Hadoop, which are critical in data engineering.

In short, software developers can quickly transition to data engineering, leveraging their existing technical skills and experience.

Data engineers are the backbone of any data-driven initiative, acting as both architects and builders of the data infrastructure that powers insights. Their role goes beyond just coding—it's about designing scalable, efficient systems that ensure data flows seamlessly from collection to transformation to analysis. While data scientists and analysts interpret data, it's the data engineer who ensures the data is clean, accessible, and ready for use, making their contribution foundational to the success of AI and data platform projects. In a world awash with data, the demand for skilled data engineers will only continue to grow.
