Data Engineers don't need to be Superman - Here is what you should look into
Learn Data Engineering
We teach Data Engineering and help companies recruit top talent
In one of my coaching sessions we recently had a very exciting question: what are the most important coding and tool skills for data engineers, especially when it comes to job interviews?
My Data Science Platform Blueprint
I think that’s a really good question and I’d like to expand on it a bit. Instead of focusing on specific tools, let’s take a look at my Platform Blueprint first:
It shows all the phases of a data science platform, which makes it the perfect basis for answering this question!
As you can see, it has five stages: Connect, Buffer, Process, Storage and Visualization.
There is one thing you have to know: you should know at least one tool in each of these stages!
You don’t need to be Superman
In my document streaming project, for example, I relied on MongoDB and Docker, and I used Spark, Kafka and FastAPI.
So at each stage of the blueprint you should be able to understand and use one tool.
If you know FastAPI and can create an API with it, then you can also create an API on GCP or on Azure; it’s not really a big deal.
If you know MongoDB, you will be able to transition to DynamoDB, Cosmos DB or another document store. I think that’s what companies care about.
If you know AWS and a company does the same thing on Azure, I don’t think they will care, because they know you will figure it out; the platforms are all fairly similar.
These skills matter because they let you build a simple ETL job: you take data from a source, for example by reading from a database or querying an external API, and then write it into a data warehouse or a data lake.
People always think they need to be Superman to be a good Data Engineer. You don’t need to know all the tools, nor do you need to be super good at everything! If you have the right skills in the right areas, you can do a lot, and that is exactly what you need.
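To make the ETL idea concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical JSON payload standing in for an external API response and uses an in-memory SQLite table as a stand-in for the warehouse; the table name `weather` and the fields are made up for illustration. In a real job you would swap in your actual source and target.

```python
import json
import sqlite3

# Hypothetical payload standing in for the response of an external API.
API_RESPONSE = """
[
  {"id": 1, "city": "Berlin", "temp_c": "21.5"},
  {"id": 2, "city": "Munich", "temp_c": "19.0"},
  {"id": 3, "city": "Hamburg", "temp_c": null}
]
"""


def extract(raw: str) -> list[dict]:
    """Extract: parse the raw JSON returned by the (hypothetical) API."""
    return json.loads(raw)


def transform(records: list[dict]) -> list[tuple[int, str, float]]:
    """Transform: drop rows without a temperature and cast strings to floats."""
    return [
        (r["id"], r["city"], float(r["temp_c"]))
        for r in records
        if r["temp_c"] is not None
    ]


def load(rows: list[tuple[int, str, float]], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a table that stands in for the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather "
        "(id INTEGER PRIMARY KEY, city TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(API_RESPONSE)), conn)
    print(conn.execute("SELECT city, temp_c FROM weather ORDER BY id").fetchall())
    # → [('Berlin', 21.5), ('Munich', 19.0)]
```

Keeping extract, transform and load as separate functions makes each step easy to test on its own and to replace later, for example when the source moves from an API to a database.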
This you need 100% of the time
Python and SQL.
You definitely need coding skills; without them you can do almost nothing! Nowadays you should start coding with Python, because almost everything is Python. You also need to understand how relational databases work: without knowing how to query and design a relational database, it’s very hard to get into the other data stores, the NoSQL ones.
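As a small illustration of what “design and query a relational database” means in practice, here is a sketch using Python’s built-in `sqlite3` module. The `customers`/`orders` schema and the sample rows are hypothetical; the point is the pattern of two tables linked by a foreign key, a JOIN, and an aggregation with GROUP BY.

```python
import sqlite3

# Hypothetical schema: customers and their orders, linked by a foreign key.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
"""


def revenue_per_customer(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Join both tables and sum the order amounts per customer, highest first."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0)],
)
print(revenue_per_customer(conn))  # → [('Ada', 75.0), ('Linus', 40.0)]
```

Once you are comfortable with tables, keys and joins like these, it is much easier to see what a NoSQL store does differently, for example when it stores an order embedded inside a customer document instead of in a separate table.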
Of course, you also need basic development skills: how to develop code, how to test your programs, and how to use git to manage your code repositories and collaborate with your colleagues. I think these are the main things you should know.
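For testing, a minimal sketch of what a unit test can look like is below. The helper `normalize_email` is a hypothetical example of a small transformation you might find in a pipeline. Saved in a file named like `test_*.py`, the `test_` functions are collected and run by pytest; they use plain `assert`, so they can also be called directly.

```python
def normalize_email(raw: str) -> str:
    """Hypothetical helper: trim surrounding whitespace and lowercase an e-mail address."""
    return raw.strip().lower()


def test_normalize_email_strips_and_lowercases():
    # Messy input from a source system should come out clean.
    assert normalize_email("  Ada@Example.COM ") == "ada@example.com"


def test_normalize_email_keeps_clean_input():
    # Already-clean input must pass through unchanged.
    assert normalize_email("linus@example.com") == "linus@example.com"
```

Committing code like this together with its tests in git means your colleagues can run the same checks before they change anything.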
From these basics you can then expand your knowledge, as I described earlier in this post.
I hope this helps you understand that it’s ok not to know everything and that you should focus on getting good at a few things.
See you later.
Andreas