3rd Story – To Math or Not to Math

A big question on the minds of many people when they begin their journey into Data Science is: “Do I really need to learn math?” It was on my mind for a long time while studying.

Let me take a moment to split this question. First off, you do need math when doing Data Science, at least when it comes to statistics and probability, but we will talk about that part in the not-so-distant future. Right now, I want to focus on the math that makes many people shake even before they try it: Linear Algebra and Calculus.

So, understanding that, let’s get back to the question at hand: Do you need math for Data Science? And the short answer is: No… But…

As I was polishing up my programming skills, I was already versed in the basic concepts of statistics and felt like a master (which I was not) of NumPy and Pandas. I had practiced some basic ML models like Regression, Logistic Regression and Decision Trees, as well as some Random Forests (sorry about this, I just love Random Forests), and I started delving into the murky waters of Deep Learning. After barely grasping the concept of activation functions like sigmoid and ReLU, I was completely lost when it came to backpropagation. If you have not seen these concepts yet, do not worry, you will in the future. The important thing here is that no matter how much Andrew Ng kept repeating that it did not really matter if I did not understand the math behind it, I felt lost. It is the feeling you get when someone tells an inside joke and you are not privy to it: everyone else laughs and you feel like you might be missing out on something.

And that is the moment I sort of figured out that, although you can do ML and even DL without going into the math of things, something is missing: an important and very deep understanding that, no matter how great your intuition is, you can only really grasp by jumping into the rabbit hole.

Don’t get me wrong, I am far from being a mathematician or even considering myself good at it. I had taken a basic calculus course in college 20 years ago (and barely passed) and had more or less ignored math until starting my road to data science. But I must admit it was a challenging and fun part of the road, and there is an amazing advantage to understanding it. Think about it this way: computers only understand numbers, and Machine Learning is all about telling a computer to run some number operations in some specific way until we get a result that is better than a certain other number.

In my case, not wanting to be left out, I decided to take a detour into the world of mathematics. I found a three-course specialization with the amazing name “Mathematics for Machine Learning”, and I was off to understanding my new path a little better.

Here are some of my insights regarding math and data science:

1. Linear Algebra is your friend. Most of DS and ML happens in the amazing imaginary world of vectors and matrices (and tensors, but let’s not get into that right now), and many of the models, especially in Natural Language Processing, have to do with comparing the distance (or similarity) between one vector and another. The better you understand this space, the easier it will be for you to grasp the nuances of your models and to adjust parameters and hyperparameters, meaning you will be able to build better models, faster.

2. Calculus. Here is a concept you will hear over and over and over again, even before you understand it: gradient descent. This is one of the main optimization algorithms for finding a minimum of a function, which in ML is usually the loss. Understanding the calculus behind it helps you see why it matters and lets you manage the hyperparameters that affect it, such as the learning rate, more efficiently. Multivariate calculus is also the main driving force behind backpropagation, an important step in neural networks to minimize loss and improve the network on every iteration.

3. Principal Component Analysis (PCA). Every feature in an ML model creates its own dimension. Think about this: if you have two features, say gas consumption and engine size, you can easily plot your data on a 2-dimensional chart. If we add a third feature, like car weight, we can still plot it on a more complex and less friendly 3-dimensional visualization. Once we add a fourth feature, or more, we move into a multi-dimensional reality that we can no longer chart or visualize. Computers can manage as many dimensions as we can throw at them, but it comes at a huge price: more computing power and less model understanding. The more dimensions you have, the more time your models will take to train and the harder they become to explain. PCA is a process that lowers the dimensionality by combining the original features into a smaller number of new ones that keep as much of the variation in the data as possible. In other words, it will help you discover the Principal Components and hence make your models easier to train and to explain.
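To make the vector comparison from point 1 a little more concrete, here is a minimal sketch of cosine similarity in plain Python. The three "documents" and their word counts are made up for illustration, and cosine similarity is only one of several ways to compare vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word-count vectors over a made-up four-word vocabulary
doc_cats = [3, 0, 1, 0]
doc_kittens = [2, 0, 1, 1]
doc_engines = [0, 4, 0, 2]

# Documents that share words point in a similar direction (value near 1);
# documents with no words in common are perpendicular (value 0).
sim_close = cosine_similarity(doc_cats, doc_kittens)
sim_far = cosine_similarity(doc_cats, doc_engines)
print(sim_close, sim_far)
```

The first pair shares most of its words and scores much higher than the second pair, which shares none.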
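For the gradient descent idea in point 2, a small sketch may help. It assumes a toy function f(x) = (x - 3)^2, chosen only because its minimum (x = 3) and its derivative are easy to check by hand; real models minimize a loss over many parameters, but the update rule has the same shape.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

def grad_f(x):
    # Derivative of the toy function f(x) = (x - 3)^2
    return 2 * (x - 3)

# A modest learning rate walks steadily toward the minimum at x = 3
x_min = gradient_descent(grad_f, start=0.0)

# A learning rate that is too large overshoots further on every step
x_diverged = gradient_descent(grad_f, start=0.0, learning_rate=1.1, steps=50)

print(x_min, x_diverged)
```

The second call is there to show why the learning rate is a hyperparameter worth understanding: the same algorithm on the same function either converges or blows up depending on that one number.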
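The PCA idea in point 3 can also be sketched with NumPy. The car data below is randomly generated for illustration (the numbers are not real measurements), and the steps shown (standardize, take the covariance matrix, keep the top eigenvector) are one common textbook way to compute PCA, not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical car data: three strongly related features, made-up values
engine_size = rng.normal(2.0, 0.5, size=200)
weight = 600 * engine_size + rng.normal(0, 50, size=200)
gas = 4 * engine_size + rng.normal(0, 0.3, size=200)
X = np.column_stack([engine_size, weight, gas])

# Standardize so no feature dominates just because of its units
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvectors of the covariance matrix are the principal components
cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of the total variance captured by each component
explained = eigvals / eigvals.sum()

# Project the 3-dimensional data onto the first component only
X_reduced = X_std @ eigvecs[:, :1]
print(explained, X_reduced.shape)
```

Because the three features move together, a single new dimension keeps most of the variation, which is exactly the situation where PCA pays off.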


Can you become an amazing Data Scientist without the math? Yes, you can. Will your understanding be deeper, and will you be able to contribute more to the Data Science community? Also yes.


For me, getting deeper into the advanced math has opened new levels of comprehension, allowing me to conceptualize projects better and to understand some of the ever-growing body of research. I do believe it has helped me become a better Data Scientist all around.

When to start depends on you. I started after I began Deep Learning and felt I was missing out on some important insights, and I wish I had learned this sometime between learning Python and getting into the basics of Machine Learning. But at the end of the day, this is your journey, and it is always a good time to learn more and to go deeper.


Coursera, Khan Academy, Udemy, Udacity and YouTube are just some of the resources where you can learn mathematics, for free or paid. I will be here wishing you an amazing adventure into mathematics and the doors it will open for you.

Hope we cross paths through our Journeys…

Jack Raifer Baruch

Follow me on Twitter: @JackRaifer

Follow me on LinkedIn: jackraifer



Next Story: Lies, Damn Lies and Statistics


About the Road to Data Science Series

Today, I am working on the first steps of remarkably interesting projects for human development based on Data Science and Machine Learning.

But not that long ago (really, not long at all) I knew extraordinarily little about data science and much less what it all meant (and I am still learning more and more about it every day). In my quest to reinvent myself from Psychologist working in Behavioral Economics to Data Scientist, I went through an incredibly interesting journey and learned a lot. This series is mostly a letter to my past self, to help anyone like me take this amazing road and, with luck, avoid some of the mistakes I made on the way due to lack of knowledge or perspective.

Hope you enjoy my ramblings as much as I found joy on my Road to Data Science.


Need Help on your Journey?

This can be a difficult path alone, so feel free to reach out to me through LinkedIn or Twitter. I started this series because of the #66DaysOfData initiative by Ken Jee. It is a great way to connect and get support, so just check out Ken on Twitter @KenJee_DS and join the #66DaysOfData challenge.



Learning Resources I have Used:


A LOT of content, some free, most paid. Check out coupon sites where you can usually find free coupons for courses on Python, R, data science, machine learning and much more.



Interesting place to learn, with some free courses and then paid content. Very hands-on coding exercises, few videos, mostly reading.



My favorite place to learn. Thousands of courses, a lot of content on programming, Data Science and Machine Learning. The University of Michigan has many courses here for Python programming, from the very basics to complex things. All courses are free to audit; you only pay if you want to earn a certificate.



The top free place to learn to code. Hundreds of hours of free videos on almost any language. They now also have certifications, also for free.



The place to learn anything. All of it is free, though it might take a while to get to the content you want and enjoy.



Top site for data science, which also runs many competitions. They have many free courses, but the programming part is scarce: there are some basic ones, and all are focused on Data Science and Machine Learning.



Similar to Codecademy, with many paths and courses. Some free content, the rest is paid. Very focused on Data Science.


My favorite place to practice code, with challenges for every level from beginner to advanced. This is a good place to challenge yourself and check your progress.


