Yet Another Data Engineering Roadmap, But With a Twist
I have created a data engineering roadmap. Of course, all these ideas and topics reflect my own point of view, but one thing I tried to do is organize it into levels, so everyone can get a sense of where they stand now and where they need to go. Each level consists of three main parts:
- Basics
- Tools
- Concepts
Level 1 is the entry level. Your basic skills here are Python, SQL and Data Modeling, which every data engineer should have. The odd one out that you might not expect is Docker. It will prove very helpful in your journey ahead whenever you need to try out a tool quickly without the hassle of installing it.
For tools, I have included Apache Spark, since it is a powerful processing framework, especially if you already know Python; HDFS and Apache Hive, as they are regarded as two of the core tools of the data field; and finally MySQL, as it is a very popular relational database (developed by Oracle, although its SQL dialect differs in many places from that of the Oracle database).
The concepts part covers some very important core concepts such as Data Quality, Batch Processing vs Stream Processing, and Data Warehouses vs Data Lakes.
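To make the batch vs stream distinction a little more concrete, here is a minimal Python sketch; the event data and function names are hypothetical and not taken from any framework. The batch version waits for the complete, bounded dataset and computes the result once. The streaming version treats the input as a sequence of events that may never end, keeps running state, and emits an updated result after every event. Engines such as Spark add partitioning, fault tolerance and windowing on top of this basic idea.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (user_id, purchase_amount)
Event = Tuple[str, float]


def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: the whole bounded dataset is available up front,
    so the result is computed once, after all the data has arrived."""
    totals: Dict[str, float] = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0.0) + amount
    return totals


def stream_totals(events: Iterable[Event]) -> Iterator[Dict[str, float]]:
    """Stream style: events arrive one at a time from a possibly unbounded
    source, so running state is kept and an updated result is emitted
    after each event."""
    totals: Dict[str, float] = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0.0) + amount
        yield dict(totals)  # snapshot of the state after this event


if __name__ == "__main__":
    sample = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]
    print(batch_totals(sample))
    for snapshot in stream_totals(sample):
        print(snapshot)
```

Both functions end with the same totals on this input; the difference is when results become available and how much of the data must exist before anything can be computed.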
Level 2 is more of a mid level. You start doing some administrative tasks on your tool stack, so you discover the need to learn Bash Scripting. You will probably also need to manage multiple versions of your config files or data pipelines, so you need to know Version Control. And now that you have been working for a while, you may have discovered that the world is not all CSV, so you need some knowledge of other file formats such as Avro and ORC.
For tools, you should have heard of Apache Kafka by then, the leading event streaming platform. Apache Airflow is also very popular as an orchestration framework. I also recommend knowing at least one NoSQL database, so I put MongoDB there, but feel free to choose any other NoSQL database. Lastly, although MySQL is already in the previous level, PostgreSQL is so popular that I feel you should know how to work with it comfortably.
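At its core, an orchestrator such as Apache Airflow runs pipeline tasks in an order that respects their dependencies, expressed as a directed acyclic graph (DAG). The sketch below shows only that core idea using Python's standard-library `graphlib`; it is not Airflow's API, and the task names are hypothetical. Scheduling, retries and monitoring, which are the real reasons to use an orchestrator, are deliberately left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 upstream: Dict[str, Set[str]]) -> List[str]:
    """Run every task after all of its upstream tasks; return the run order.

    `upstream` maps each task name to the set of task names it depends on.
    graphlib raises CycleError (a ValueError subclass) if the dependencies
    contain a cycle, which is why pipelines are modelled as acyclic graphs."""
    order = list(TopologicalSorter(upstream).static_order())
    for name in order:
        tasks[name]()  # a real orchestrator would also schedule, retry and log
    return order


if __name__ == "__main__":
    # Hypothetical pipeline: two independent cleaning steps feed one load step.
    names = ["extract", "clean_orders", "clean_users", "load_warehouse"]
    tasks = {n: (lambda n=n: print(f"running {n}")) for n in names}
    upstream = {
        "extract": set(),
        "clean_orders": {"extract"},
        "clean_users": {"extract"},
        "load_warehouse": {"clean_orders", "clean_users"},
    }
    run_pipeline(tasks, upstream)
```

The two cleaning steps may run in either order, because neither depends on the other; a real orchestrator can use exactly this information to run such tasks in parallel.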
The concepts I recommend here might be a little advanced for mid-level engineers, but I feel they should start researching and learning about them to be able to move up the seniority ladder. So I have included things like Data Governance, Data Catalogs and Master Data Management.
When you reach Level 3, you should be able to build, or at least help build, data warehouses from scratch, so you might have to learn topics outside the data field, such as security and networking, to help with some of your tasks. You will also need an automation tool to help you manage your clusters. I recommend Ansible, but feel free to learn any other tool.
In the tools part, I recommend learning at least one cloud platform (AWS, Google Cloud or Azure) as well as a BI tool. BI engineers and data analysts are your most important customers, since they consume your data, so you need to know how they work. Plus, learning a BI tool is fun and easy if you already know SQL.
The concepts part contains only Data Security, but it is an important concept that is often overlooked.
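As one small, hypothetical illustration of data security in a pipeline, the sketch below pseudonymizes sensitive columns with a keyed hash (HMAC-SHA256 from Python's standard library) before the data is handed to downstream consumers such as BI users. The column names and the hard-coded demo key are assumptions for the example; in practice the key would come from a secrets manager, and pseudonymization is only one control alongside access management and encryption.

```python
import hashlib
import hmac
from typing import Any, Dict, Iterable, List, Set

Row = Dict[str, Any]


def pseudonymize(value: str, key: bytes) -> str:
    """Return a deterministic token for `value` using HMAC-SHA256.

    The same value and key always give the same token, so joins and
    distinct counts still work downstream, while anyone without the key
    cannot compute the tokens or map them back to the original values."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_rows(rows: Iterable[Row], sensitive: Set[str], key: bytes) -> List[Row]:
    """Copy each row, replacing values in the sensitive columns with tokens."""
    return [
        {col: pseudonymize(str(val), key) if col in sensitive else val
         for col, val in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical key for the demo only; never hard-code real secrets.
    demo_key = b"replace-with-a-secret-from-a-vault"
    rows = [
        {"email": "alice@example.com", "country": "EG", "amount": 10.0},
        {"email": "alice@example.com", "country": "EG", "amount": 2.5},
    ]
    for row in mask_rows(rows, {"email"}, demo_key):
        print(row)
```

Both output rows carry the same email token, so an analyst can still count purchases per customer without ever seeing the address itself.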
Throughout this roadmap, you should work on as many projects as possible. Projects are the best proof that you have learned a topic or tool.
In the true spirit of agile, this roadmap is in iteration 1. Any suggestions are welcome, and I will surely let you know if I modify anything in it.