5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python
Daniel Bourke
Teaching beginners ML at zerotomastery.io, building ML at nutrify.app
I replied to a handful of emails this morning. There were a few questions about learning machine learning and data science.
Someone told me they’d done some Python and wanted to know what to do next.
I’ve learned Python, what should I do next?
I put together a couple of steps in the email and I’m copying them here. You can consider them a rough outline to go from not knowing how to code to being a machine learning practitioner.
Remember, if you’re starting to learn machine learning, it can be daunting. There’s a lot. Take your time. Bookmark this article so you can refer to it as you go.
My style of learning is code first. Get code running first and learn the theory side of things when you need to, not before.
I’m biased towards using Python because that’s what I started with and continue to use. You could use something else but these steps will be for Python.
1. Learn Python, data science tools and machine learning concepts
The email said they’d already done some Python. But this step is for someone who’s completely new as well. Spend a few months learning Python code at the same time as different machine learning concepts. You’ll need them both.
Whilst learning Python code, practice using data science tools such as Jupyter and Anaconda. Spend a few hours tinkering with them and finding out what they’re for and why you should use them.
Resources for learning
- Elements of AI — overview of major artificial intelligence and machine learning concepts.
- Python for Everybody on Coursera — learn Python from scratch.
- Learn Python by freeCodeCamp — all major Python concepts in one video.
- Anaconda Tutorial by Corey Schafer — learn Anaconda (what you’ll use for setting up your computer for data science and machine learning) in one video.
- Jupyter Notebook for Beginners Tutorial by Dataquest — get up and running with Jupyter Notebooks in one article.
- Jupyter Notebook Tutorial by Corey Schafer — learn how to use Jupyter Notebooks in one video.
2. Learn data analysis, manipulation & visualization with pandas, NumPy and Matplotlib
Once you’ve got some Python skills, you’ll want to learn how to work with and manipulate data.
To do so, you should get familiar with pandas, NumPy and Matplotlib.
pandas will help you work with dataframes. These are tables of information like you would see in an Excel file. Think rows and columns. This kind of data is called structured data.
NumPy will help you perform numerical operations on your data. Machine learning turns everything you can think of into numbers and then finds the patterns in those numbers.
Matplotlib will help you make graphs and visualizations of your data. Understanding a pile of numbers in a table can be hard for humans. We much prefer seeing a graph with a line going through it. Making visualizations is a big part of communicating your findings.
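To make the three libraries a little more concrete, here is a minimal sketch that uses each one for the job described above. The car sales table, its column names and its values are all made up for illustration, so treat it as a starting point to tinker with rather than a real analysis.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up structured data (rows and columns), purely for illustration
car_sales = pd.DataFrame({
    "make": ["Toyota", "Honda", "Toyota", "BMW", "Nissan"],
    "odometer_km": [150_000, 88_000, 32_000, 11_000, 213_000],
    "price_usd": [4_000, 5_000, 7_000, 22_000, 3_500],
})

# pandas: look at the table and summarise it by group
print(car_sales.head())
average_price_by_make = car_sales.groupby("make")["price_usd"].mean()
print(average_price_by_make)

# NumPy: numerical operations on the underlying numbers
prices = car_sales["price_usd"].to_numpy()
price_mean = np.mean(prices)
price_std = np.std(prices)
print(f"Mean price: {price_mean:.0f}, standard deviation: {price_std:.0f}")

# Matplotlib: turn the numbers into a picture
fig, ax = plt.subplots()
ax.scatter(car_sales["odometer_km"], car_sales["price_usd"])
ax.set_xlabel("Odometer (km)")
ax.set_ylabel("Price (USD)")
ax.set_title("Car price vs. distance driven")
```

In a Jupyter Notebook the figure shows up under the cell on its own. In a plain Python script you would add `plt.show()` or `fig.savefig("car_sales.png")` (the file name is just an example).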
Resources for learning
- Applied Data Science with Python on Coursera — start tailoring your Python skills towards data science.
- pandas in 10-minutes — a quick overview of the pandas library and some of its most useful functions.
- Python pandas Tutorial by Codebasics — YouTube series going through all of the major capabilities of pandas.
- NumPy Tutorial by freeCodeCamp — learn NumPy in one YouTube video.
- Matplotlib Tutorial by Sentdex — YouTube series teaching all of the most useful features of Matplotlib.
3. Learn machine learning with scikit-learn
Now you’ve got skills to manipulate data, it’s time to find patterns in it.
scikit-learn is a Python library with many helpful machine learning algorithms built-in ready for you to use.
It also features many other helpful functions to figure out how well your learning algorithm learned.
Focus on learning what kinds of machine learning problems there are, such as classification and regression, and which kinds of algorithms are best suited to each.
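As a rough sketch of what classification and regression look like in scikit-learn, the example below fits one model of each kind on small datasets that ship with the library. The choice of datasets (iris and diabetes) and algorithms (k-nearest neighbours and linear regression) is mine, picked only because they are short to write. They are not necessarily the best option for any particular problem.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Classification: predict a category (which species of iris flower?)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)  # find patterns in the training data
classification_accuracy = accuracy_score(y_test, classifier.predict(X_test))
print(f"Classification accuracy: {classification_accuracy:.2f}")

# Regression: predict a number (a measure of disease progression)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
regression_r2 = r2_score(y_test, regressor.predict(X_test))
print(f"Regression R^2: {regression_r2:.2f}")
```

Both halves follow the same pattern: split the data, `fit` on the training portion, then score predictions on the held-out test portion. The scoring functions are the part that tells you how well your learning algorithm learned.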
Resources for learning
- Machine Learning in Python with scikit-learn by Data School — YouTube playlist which teaches all of the major functionality in scikit-learn.
- A Gentle Introduction to Exploratory Data Analysis by Daniel Bourke — overview of some of the major concepts in exploratory data analysis, comes with code and video to help you enter your first Kaggle competition.
- Daniel Formosso’s exploratory data analysis notebook with scikit-learn — more in-depth version of the resource above, comes with end-to-end project.
4. Learn deep learning and neural networks
Deep learning and neural networks work best on data without much structure.
Dataframes have structure, whereas images, videos, audio files and natural language text have some structure but not as much.
For most cases, you’ll want to use an ensemble of decision trees (Random Forests or an algorithm like XGBoost) for structured data and you’ll want to use deep learning or transfer learning (taking a pre-trained neural network and using it on your problem) for unstructured data. You could start a note with little tidbits like this for yourself and collect them as you go.
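For the structured-data half of that rule of thumb, a minimal sketch might look like the following. It trains a Random Forest on a tabular dataset bundled with scikit-learn (breast cancer measurements, chosen by me only because it needs no download). The deep learning and transfer learning half is left out because it needs an extra framework and pre-trained weights.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a tabular dataset as a pandas DataFrame (structured data)
data = load_breast_cancer(as_frame=True)
features = data.data    # one row per sample, one column per measurement
labels = data.target    # 0 or 1

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

# A Random Forest is an ensemble of decision trees
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")

# Which columns did the trees rely on most?
importances = pd.Series(
    model.feature_importances_, index=features.columns
).sort_values(ascending=False)
print(importances.head())
```

The hyperparameters here (`n_estimators=100`, the 80/20 split) are ordinary starting values, not tuned ones.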
Resources for learning
- deeplearning.ai by Andrew Ng on Coursera — deep learning taught by one of the best in the business.
- fast.ai deep learning courses by Jeremy Howard — a hands-on approach to deep learning taught by one of the industry’s best practitioners.
5. Extra curriculum & books
Along the way, it would be ideal if you practised what you were learning with small projects of your own. These don’t have to be elaborate world-changing things, just something you can point to and say “I’ve done this with X”. Then share your work via GitHub or a blog post. GitHub is used to showcase your code, a blog post is used to show how you can communicate your work. You should aim to release one of each for every project.
The best way to apply for a job is to have already done the things it requires. Sharing your work is a great way to showcase to a potential future employer what you’re capable of.
After you’re familiar with some of the different frameworks for machine learning and deep learning, you could try to cement your knowledge by building them from scratch. You won’t always have to do this in production or in a machine learning role but knowing how things work from the inside will help you build upon your own work.
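As a small taste of what building from scratch can mean, here is a sketch of about the simplest learning algorithm there is: a straight-line model trained with gradient descent, written with NumPy only. The function name, the learning rate and the toy data (points on the line y = 2x + 1) are my own choices for illustration. Frameworks like scikit-learn do far more than this behind the scenes.

```python
import numpy as np


def train_linear_model(x, y, learning_rate=0.1, epochs=2000):
    """Fit y ≈ weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        predictions = weight * x + bias
        errors = predictions - y
        # Gradients of the mean squared error with respect to each parameter
        grad_weight = (2 / n) * np.sum(errors * x)
        grad_bias = (2 / n) * np.sum(errors)
        # Step each parameter a little in the direction that reduces the error
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


# Toy data with a known pattern, so the result can be checked by eye
x = np.linspace(0, 1, 20)
y = 2 * x + 1

weight, bias = train_linear_model(x, y)
print(f"Learned weight: {weight:.3f}, learned bias: {bias:.3f}")
```

The learned weight and bias should land close to 2 and 1. Once this makes sense, swapping the straight line for a small neural network is the same loop with more parameters.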
Resources for learning
- How to start your own machine learning projects by Daniel Bourke — starting your own projects can be hard, this article gives you a few pointers.
- fast.ai deep learning from the foundations by Jeremy Howard — once you’ve gone top-down, this course will help you fill in the gaps from the bottom up.
- Grokking Deep Learning by Andrew Trask — this book will teach you how to build neural networks from scratch and why you should know how to.
- These books will help you learn machine learning by Daniel Bourke — YouTube video going through some of the best books on machine learning.
How long for each step?
You could spend 6 months or more on each. Don’t rush. Learning new things takes time. The main skill you are building as a data scientist or machine learning engineer is asking good questions of data and then using your tools to try to find answers.
Some days you’ll feel like you’re learning nothing. Even going backwards. Ignore it. Don’t compare your progress day to day. Compare your progress year on year.
Where can I learn these skills?
I’ve listed some resources above. They’re all available online, most of them are free and they are more than enough to get started.
If you’re looking for a one-stop shop, DataCamp is a great place to do most of these. Otherwise, my Machine Learning and Artificial Intelligence resources database contains a good archive of free and paid learning materials.
Remember, part of being a data scientist or machine learning engineer is solving problems. Treat your first assignment as finding out more about each of the steps here and creating your own curriculum to help you learn them.
If you want to know what an example self-led curriculum for machine learning looks like, check out my Self-Created AI Masters Degree. It’s what I used to go from zero coding to being a machine learning engineer in 9 months. It’s not perfect but it’s mine, that’s why it worked.
What about statistics? What about math? What about probability?
You will learn these things along the way. Start with code first. Get things running. Trying to learn all of the statistics, all of the math, all of the probability before running your code is like trying to boil the ocean. It will hold you back.
None of the statistics, math and probability matter if your code doesn’t run. Get something working, and then use your research skills to find out if it’s correct.
What about certifications?
Certifications are nice but you’re not after them. You’re after skills.
Don’t make the mistake I did and think more certifications equals more skills. They don’t.
Build foundational knowledge through courses and resources like the above and then build specific knowledge (knowledge which can’t be taught) through your own projects.
If you have questions, leave a comment below so others can see. Otherwise, feel free to reach out.
This article and more like it originally appeared on mrdbourke.com.
You can find the video version on YouTube.