The Roadmap to learn Data Science in 2022 - The efficient way


"Data is a gift from yesterday that you receive today to make tomorrow better."

Welcome back to Data for Everyone! In today's article I want to share my personal thoughts on the following question: after everything I've experienced, if I could start over with Data Science knowing what I know today... how would I do it?

Personally, I've wasted countless hours watching YouTube videos and spent money on many online courses that added little to no value to my skill set. That's why the objective of this article is to give you the most efficient path to go from zero to a proficient data person, while also sharing the mistakes I made on my own journey to becoming a Junior Data Scientist (so that you can avoid them).

Without further ado, let's get into it.

Step #1 - Pick the right tools & get to work

The first thing every beginner should do is to pick a user-friendly data analysis tool and stick to it until they have an advanced knowledge of the tool and can use it efficiently.

I personally started with one of the easiest and most in-demand tools nowadays... Yes, I'm talking about Microsoft Excel.

Now, is this the best tool there is for data analysis? NO.

But... if I could go back in time, I would definitely start with it again. I recommend that every professional become familiar with Excel: it is easy to learn, it is used by most companies, and it provides a glimpse of the power of data analysis, data visualization, mathematical formulas and simple algorithms like regression models.
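
For a glimpse of what a "simple regression model" actually computes, here is a minimal sketch of a straight-line fit, the same kind of result Excel's SLOPE and INTERCEPT functions return. It is written in Python only so it can be run anywhere; the `ad_spend` and `units_sold` numbers and the `fit_line` helper are made up for illustration.

```python
# Hypothetical data, invented for this illustration:
# advertising spend (x) and units sold (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ad_spend, units_sold)
print(f"units_sold = {intercept:.2f} + {slope:.2f} * ad_spend")
```

With this toy data the points sit exactly on a line, so the fit recovers a slope of 2 and an intercept of 1. Real data will be noisier, which is where the analysis gets interesting.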

At the same time, although Excel offers a wide range of applications for data analysis... it isn't built specifically for it, which leads to limitations, crashes and errors when working with large datasets.

And yes, Excel is great, but once you've mastered it, it's time to move to the big leagues. The second most requested tool in the market for data analysis (and a tool Big Tech companies commonly interview in) is SQL, aka "Structured Query Language": a standardized language used to manage relational databases and perform various operations on the data in them.

You've probably heard about SQL before and you might be wondering why it is so in demand. Well, it doesn't have the limitations that Excel has, as it allows you to extract, transform and load large datasets. SQL has an easy-to-understand syntax that makes learning it much more digestible than other languages for database management, and since most data roles at some point involve extracting information or manipulating databases, SQL is the logical tool to use. It will be your stepping stone before getting into the more serious programming languages.
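
To give a feel for how readable SQL is, here is a minimal sketch that runs one query through Python's built-in `sqlite3` module. The `sales` table, its columns and its rows are made up for illustration; on the job you would point the same kind of query at your company's database instead.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this illustration.
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Laptop", 1200.0),
        ("North", "Mouse", 25.0),
        ("South", "Laptop", 1100.0),
        ("South", "Monitor", 300.0),
        ("South", "Mouse", 20.0),
    ],
)

# Extract and transform in one statement:
# total revenue per region, largest first.
query = """
    SELECT region, SUM(amount) AS total_revenue
    FROM sales
    GROUP BY region
    ORDER BY total_revenue DESC
"""
revenue_by_region = conn.execute(query).fetchall()
conn.close()

for region, total in revenue_by_region:
    print(f"{region}: {total:.2f}")
```

Notice that the query reads almost like a sentence: select the region and the sum of the amounts, from sales, grouped by region. That readability is a big part of why SQL makes a good first language.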

Before we get into the more advanced programming languages, though, keep in mind that every data-oriented role nowadays requires some visualization skills to present data and insights. There are dozens of Business Intelligence tools out there, but after doing some market research during my own job hunt back in February, I came to realize that the majority of companies look for professionals with experience in Tableau and Power BI.

Tableau is a BI tool with an extensive range of data visualization capabilities (more than Power BI). It also appears to be more in demand in the market, though its paid versions have a much higher price tag... which could make the learning process less budget friendly than with other tools.

Now, Power BI is part of the Microsoft ecosystem, so it works smoothly with Excel, and both the free version and the paid ones are budget friendly compared to many other BI tools. In my experience it is ideal to learn right after SQL, as it lets you query your data through DAX (Data Analysis Expressions).

As for me, I started with Tableau thanks to a free subscription paid for by my school (*if you're still in college, ask for it), which I used for many of my school projects. After watching many tutorials and analyzing other people's visualizations, I quickly understood that Tableau would allow me to make "eye-catching" visualizations with less coding and effort, since the tool is user-friendly once you understand the basic commands of the interface. Power BI is a great tool to develop your skills on once you have a strong understanding of SQL queries.

*Here's an example of a Tableau visualization I recently created analyzing my LinkedIn data.

There's one last thing to do once you've mastered the previously mentioned tools... as I said before, you also need a programming language to perform data science. Luckily for us data guys, we mainly have to pick between two languages: Python or R.

I personally started with Python, a "general-purpose programming language" that happens to be one of the most widely used programming languages in the world as well as one of the easiest to learn. As I got better at it, I came to appreciate the freedom it gives you when solving problems through Data Science: the range of packages and options that Python offered me at the time was easier to understand and friendlier to learn than in other languages.

At the same time, getting better at Python opened the door to other skills like web applications, AI applications and even software development.

R is a programming language that serves as an outstanding tool for statistics as well as for data analysis, data visualization and ML (Machine Learning). Many Data Scientists use it on a daily basis to create ML models and perform statistical analysis while manipulating dataframes.

As for my own experience, I've been working as a full-time Junior Data Scientist and I've used both tools for my projects... but R has turned out to be a better ally than Python when creating and testing models, because of the nature of the projects I've worked on so far.

Now, what should you start programming in? If I had the chance to go back in time, I would begin with R. Its syntax is fairly similar to Python's, and it will make you learn math and statistical analysis before you develop programming expertise, allowing for a smoother transition to a general-purpose language like Python. Not to mention that a lot of ML modeling is done in R nowadays... which could open the door for you to focus on AI and ML in the future.

So, just to recap: I suggest you start with Excel, SQL, Tableau/Power BI, R and Python to become a data professional and have a realistic chance when applying for data jobs. This is also the path I took myself, which doesn't necessarily mean it will work for you, since your first language or tool will definitely not be your last.

Step #2 - Learn by Doing

Now that you know the tools you need to succeed, let's talk about where and how you can learn them... which brings me to the biggest mistake I made when I started my data journey.

The mistake I made is that I tried to learn by watching others.

Most people would argue that to get started you have to take countless data courses on Udemy or Coursera, or even watch hours of YouTube videos in which other people analyze data. Though it might seem like the most reasonable approach, without writing code or analyzing data yourself you'll only get a false sense of progress.

Yes, analyzing data in your head is very different from actually doing it and stumbling upon messy data, syntax errors and more, which will have you debugging for hours because you watched a YouTube video from a random guy and thought you could just copy and paste his code and make it work... Spoiler alert: most of the time it doesn't.

So what is the right way?

Get the reps in. Start coding and analyzing any dataset you can find on Kaggle or elsewhere.
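
To make that concrete, here is a minimal sketch of what a first "rep" might look like, using only Python's standard library. The tiny `raw_csv` dataset and the `clean_prices` helper are made up for illustration; the point is that even five rows can contain the kind of mess (a missing value, stray whitespace, a text entry in a numeric column) that you never notice while watching someone else's video.

```python
import csv
import io
import statistics

# Hypothetical raw export, invented for this illustration. It contains
# a missing value, stray whitespace and a non-numeric entry.
raw_csv = """city,price
Lima,120
Quito, 95
Bogota,
Lima,n/a
Quito,105
"""


def clean_prices(text):
    """Return (list of valid prices, number of rows skipped)."""
    prices, skipped = [], 0
    for row in csv.DictReader(io.StringIO(text)):
        value = (row["price"] or "").strip()
        try:
            prices.append(float(value))
        except ValueError:
            # Empty or non-numeric price: count it instead of crashing.
            skipped += 1
    return prices, skipped


prices, skipped = clean_prices(raw_csv)
print(f"valid rows: {len(prices)}, skipped: {skipped}")
print(f"mean price: {statistics.mean(prices):.2f}")
```

Swap the inline text for a real file from Kaggle and you will quickly find new kinds of mess to handle, which is exactly the practice you are after.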

Step #3 - Free Resources

For Excel, the website I linked will teach you the basics of Excel as well as the most advanced functionalities like formulas, algorithms & visualization.

For SQL, I recommend W3Schools (the website I used to teach myself HTML, CSS, SQL, R and more), one of the best free websites out there for learning to code through guided exercises.

When it comes to Power BI/Tableau, I suggest you look into DataCamp's individual courses, as they're free for college students and offer one of the best catalogues for beginners when it comes to combining theoretical concepts with coding.

Finally, for Python I suggest learnpython.org. This is the website I've been using for years and still use (along with Stack Overflow) to solve my day-to-day problems at work and in my personal projects.

Final Step - Consistency

Working in Data is amazing, but it also requires a high degree of theoretical knowledge along with the ability to translate thoughts into code. This is not easy by any means, and it will require you to constantly sharpen your problem-solving and programming skills so that you can decipher the most complex problems. Just like in soccer, where you'll feel rusty and out of shape if you don't play for two weeks, the same happens with data analysis. Start coding and make it a habit. It doesn't have to take countless hours of your day: take small steps towards learning at a steady pace and the results will come.

If your fingers can't dance and express your logical reasoning into code, your value within the industry will not be as high as it could be. That's why the best way to learn data science is to do data science.

More articles by Alfredo Serrano Figueroa