Data Engineers don't need to be Superman - Here is what you should look into
Photo by Esteban Lopez on Unsplash

In one of my coaching sessions we recently had a very exciting question: what are the most important coding and tool skills for data engineers, especially with job interviews in mind?

My Data Science Platform Blueprint

I think that’s a really good question and I’d like to expand on it a bit. Instead of focusing on certain tools, let’s take a look at my Platform Blueprint first:

[Image: Data Science Platform Blueprint]

The blueprint shows all the phases of a data science platform, which makes it the perfect basis for answering this question!

As you can see, we have five phases: Connect, Buffer, Process, Storage and Visualization.

There is one thing that you have to know: you should know at least one tool in each of these sections!

You don’t need to be Superman

In my document streaming project, for example, I relied on MongoDB and Docker, and I used Spark, Kafka and FastAPI.

So at each stage of the blueprint you should be able to understand and use one tool.

If you know FastAPI and can create an API there, then you can also create an API on GCP or on Azure; it’s not really a big deal.

If you know MongoDB, you will be able to transition to DynamoDB or to CosmosDB or whatever. I think that’s what companies care about.

If you know AWS and a company does the same thing on Azure, I don’t think they will care much. They know you will figure it out, because it’s all fairly similar.

It’s important to have these skills so you understand how you can build a simple ETL job. You take data from a data source like a database or query it from an external API and then write it somewhere into a data warehouse or into a data lake.
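To make the ETL idea concrete, here is a minimal sketch in Python. It is only an illustration: two in-memory SQLite databases stand in for the data source and the data warehouse, and the table names (`orders`, `fact_orders`), the columns and the function names are all hypothetical, not taken from any real project.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list[tuple]:
    """Extract: read the raw rows from the (hypothetical) source table."""
    return source.execute(
        "SELECT id, customer, amount_cents FROM orders"
    ).fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Transform: clean up customer names and convert cents to a decimal amount."""
    return [
        (order_id, customer.strip().lower(), cents / 100)
        for order_id, customer, cents in rows
    ]


def load(warehouse: sqlite3.Connection, rows: list[tuple]) -> int:
    """Load: write the cleaned rows into the (hypothetical) warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows
    )
    warehouse.commit()
    return len(rows)


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Run the three steps in order and return the number of loaded rows."""
    return load(warehouse, transform(extract(source)))


def make_demo_source() -> sqlite3.Connection:
    """Build a tiny in-memory source database with made-up demo data."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " Alice ", 1250), (2, "BOB", 399)],
    )
    source.commit()
    return source


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(run_etl(make_demo_source(), warehouse), "rows loaded")
```

The split into `extract`, `transform` and `load` is the part that carries over: swap SQLite for an external API on one side or a cloud warehouse on the other and the overall shape of the job stays the same.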

People always think they need to be Superman to be a good Data Engineer. You do not need to know all the tools, nor do you need to be super good at everything! If you have the right skills in the right areas then you can do a lot. And that is exactly what you need.

This you need 100% of the time

Python and SQL.

You definitely need to have coding skills. Without coding skills you can do almost nothing! Nowadays you should start coding with Python because, well, almost everything is Python. And you need to understand how relational databases work, because without knowing how to query and design a relational database, it’s super hard to get into the other data stores that are NoSQL.
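As a rough sketch of what "query and design a relational database" means in practice, the snippet below uses Python and SQL together. The schema (`customers`, `orders`), the demo rows and the function names are made up for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Design: two tables linked by a foreign key (hypothetical example schema).
SCHEMA = """
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema in memory and insert a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 1, 5.5), (3, 2, 7.0)],
    )
    conn.commit()
    return conn


def revenue_per_customer(conn: sqlite3.Connection) -> list[tuple]:
    """Query: join both tables and aggregate the orders per customer."""
    query = """
        SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS revenue
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, order_count, revenue in revenue_per_customer(build_demo_db()):
        print(name, order_count, revenue)
```

Keys, joins and aggregations like these are the basics that make it easier to see what a NoSQL store gives up or does differently.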

But of course you also need the basic development skills. You have to know how to develop code, how to test your programs, and how to use Git to manage your code in repositories and collaborate with your colleagues. I think these are the main things you should know.

From these basics you can then expand your knowledge, like I told you earlier in this post.

I hope this helps you understand that it’s ok to not know everything and that you should focus on getting good at a few things.

See you later.

Andreas

  • Do you want to learn Data Engineering? I will help you learn the points I have shown in the article. Check out my Data Engineering Academy!
