Data Science from the beginning
Dr. Rejwan Bin Sulaiman
CEH Certified Cyber Security & Google Certified Data Science Expert | Mentor for Cyber Security or PhD Students
Data Science has become a dream job for many of us. But for some, it looks like a difficult puzzle, and they don’t know where to start. If you are one of them, keep reading.
In this post, I’ll discuss how you can start your journey of Data Science from the very beginning.
I’ll explain the following steps in detail.
If you come from an IT background, you are probably already familiar with programming in Python, so you can skip this step. But if you haven’t yet been exposed to the pleasure of coding, you should start by learning Python. It is widely regarded as one of the easiest programming languages to learn and is commonly used for development and data analytics.
To begin with, you can search for free online tutorials that will help you understand the basics of Python. I am listing a few links that will help you learn Python on your own. You can try these out and choose what suits you best.
The list is not exhaustive, and you can find numerous resources on the web that can help you start learning the fundamentals of Python. You can also find many YouTube channels that have Python tutorials for beginners.
Once you are familiar with the syntax and other basics of programming, you can move on to the intermediate and advanced levels of Python. I recommend completing at least the intermediate level, so that you are familiar with data structures and file handling in Python.
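To give a feel for what "data structures and file handling" means at this level, here is a minimal sketch. The records, names, and file name are made up for illustration; it only shows a list of dictionaries, two comprehensions, and a write-then-read round trip through a JSON file.

```python
import json
import tempfile
from pathlib import Path

# A list of dictionaries: a common way to hold small tabular data in plain Python.
# The names and scores are invented for this example.
students = [
    {"name": "Asha", "score": 82},
    {"name": "Ben", "score": 67},
    {"name": "Chen", "score": 91},
]

# A list comprehension filters the records; a dict comprehension builds a lookup table.
passed = [s["name"] for s in students if s["score"] >= 70]
score_by_name = {s["name"]: s["score"] for s in students}

# Write the records to a file, then read them back.
# A temporary directory is used so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "students.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(students, f)
    with path.open("r", encoding="utf-8") as f:
        loaded = json.load(f)

print(passed)
print(score_by_name["Chen"])
print(loaded == students)
```

If you can read this snippet and predict what it prints, you are probably ready for the next step.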
Data Science is the practice of examining data and drawing practical, actionable insights from it. For that, you must have an understanding of basic Statistics and Mathematics. You should know the basics well enough to understand core ideas like the distribution of data and how algorithms work.
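As a small illustration of what "the distribution of data" means in practice, the sketch below summarizes a made-up list of numbers using Python's built-in `statistics` module. The values are chosen only to keep the arithmetic easy to check by hand.

```python
import statistics

# Made-up sample values for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)      # the average value
median = statistics.median(values)  # the middle value of the sorted data
mode = statistics.mode(values)      # the most frequent value
spread = statistics.pstdev(values)  # population standard deviation

print(mean, median, mode, spread)
```

The mean and median describe where the data is centred, while the standard deviation describes how widely it is spread around that centre.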
First of all, go through your high school statistics so you can refresh the basics. For that, I recommend Khan Academy’s High School Stats series.
After refreshing your high school concepts, you can start reading any of the following books:
The above links will take you directly to the PDF versions of these books. You can also purchase physical copies at your convenience. After reading one of these books, you will also be familiar with the fundamentals of Data Analysis, which will help you in the next step.
Note: My general advice is to always keep an open mind about whatever you come across. The core logic is generally the same whether you perform a task in one language or another; only the syntax and framework vary.
Now that you know the basics of Python programming and the required Statistics, it’s time to finally get practical.
If you want to learn without paying anything, just make an account on Udacity and sign up for their free course, Intro to Data Analysis.
This course will introduce you to useful Python libraries such as Pandas and NumPy that are needed for Data Analysis. You can learn at your own pace and easily finish the course in a few weeks.
You can also find Nanodegree programs offered by Udacity, for which you generally have to pay.
If you are comfortable paying for learning, there are many good platforms such as Coursera, Dataquest, and DataCamp. In particular, I strongly suggest you check out the DataCamp career tracks. You can find the track that suits you best based on how much you already know.
By the end of this step, you should be familiar with some important Python libraries and with data structures like Series, Arrays, and DataFrames. You should also be able to perform tasks like data wrangling, drawing conclusions, vectorized operations, grouping data, and combining data from multiple files.
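To make those terms concrete, here is a minimal sketch that assumes Pandas and NumPy are installed. All of the data is made up, and the two small DataFrames simply stand in for data you would normally load from two separate files.

```python
import numpy as np
import pandas as pd

# A NumPy array supports vectorized operations: no explicit loop is needed.
prices = np.array([10.0, 20.0, 30.0])
prices_with_tax = prices * 1.2

# A Series is a labelled one-dimensional array.
stock = pd.Series([5, 0, 12], index=["apple", "banana", "cherry"])
in_stock = stock[stock > 0]

# Two small DataFrames standing in for data loaded from two separate files.
sales = pd.DataFrame({
    "store_id": [1, 1, 2, 2],
    "amount": [100, 150, 200, 50],
})
stores = pd.DataFrame({
    "store_id": [1, 2],
    "city": ["Dhaka", "London"],
})

# Combine the two tables on their shared key, then group and aggregate.
merged = sales.merge(stores, on="store_id", how="left")
total_by_city = merged.groupby("city")["amount"].sum()

print(prices_with_tax)
print(in_stock)
print(total_by_city)
```

With real files, you would typically build `sales` and `stores` with `pd.read_csv` instead of typing the values in by hand.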
The final key to bridging the gap between Analytics and Machine Learning is Data Visualization.
Data Visualization is an important part of Data Analytics because it helps you draw conclusions and see patterns in the data, so it is essential to learn how to visualize data. The best and simplest way to do so is to go through Kaggle’s Data Visualization course. After this, you will be familiar with an important Python library: Seaborn.
Note: Kaggle is a popular website among Data Scientists all over the world. It runs regular contests to challenge the skills of data-savvy people and also provides free interactive courses to help budding data enthusiasts such as yourself.
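As a small taste of the Seaborn library mentioned above, here is a minimal sketch that draws a scatter plot. It assumes Seaborn, Matplotlib, and Pandas are installed; the data, column names, and output file name are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration only.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 70, 74, 83],
})

# One point per row; Seaborn labels the axes with the column names.
ax = sns.scatterplot(data=df, x="hours_studied", y="exam_score")
ax.set_title("Exam score vs. hours studied")

# Save the figure to an image file (hypothetical file name).
plt.savefig("exam_scores.png")
```

In a Jupyter Notebook you can drop the `matplotlib.use("Agg")` line and the `savefig` call, since the plot is displayed right below the cell.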
Machine Learning is the process by which a machine (computer) learns from data on its own. It is the study of computer algorithms that improve automatically through experience. You build models, mostly using predefined algorithms, depending on the kind of data and business problem you are facing. These models train themselves on a given dataset and are then used to draw conclusions about new data.
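The "train on given data, then draw conclusions on new data" idea can be sketched in a few lines of plain Python. This is only an illustration using simple least-squares line fitting on made-up numbers; the function names `fit_line` and `predict` are my own, and in real projects you would normally rely on a library such as scikit-learn rather than writing this by hand.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data that follows y = 2x + 1 exactly.
train_x = [1, 2, 3, 4, 5]
train_y = [3, 5, 7, 9, 11]

# "Training": the model parameters are learned from the data.
model = fit_line(train_x, train_y)

# "Prediction": the trained model is applied to an input it has never seen.
prediction = predict(model, 10)
print(model, prediction)
```

The model was never told the rule y = 2x + 1; it recovered the slope and intercept from the examples and then used them on a new input.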
The simplest way to go about learning Machine Learning would be to go through the following courses on Kaggle in the given order:
Although there are many other ways to learn Machine Learning, I have mentioned the easiest one, for which you don’t have to pay. If money is not a constraint for you, you can explore various courses on DataCamp, Coursera (one of the best), Udacity, and other related platforms.
Once you have acquired all this knowledge, you must retain it and enrich it by practicing as much as you can. For this, you can find projects to work on and business problems to solve.
One of the best ways to stay in practice is by participating in Kaggle contests and solving problems.
Kaggle gives you the problem to be solved and the required data to work on. If it’s a contest, you can submit your results and get a rank on the leaderboard based on your score.
You can also work on personal projects to build a portfolio of your own. You can try the following sources to explore datasets:
To practice, I recommend you download and install Anaconda on your local machine. It is a great toolkit for your Data Science projects.
You will find Jupyter Notebook among the tools in Anaconda; it is a great way to build Python projects and showcase them in your portfolio.
I am sure that following the guidelines in this article will help you achieve your goal of learning Data Science.