The biggest misconception in learning the mathematical foundations of data science, and one that no one tells you about, is ...

I will continue to share ideas from my book, Mathematical Foundations of Data Science.

The biggest misconception in learning the mathematics of data science is this: statistical inference is not the same as machine learning inference.

This requires some explanation, but if you understand this concept, you will be well ahead of most people in understanding the mathematical foundations of data science.

Statisticians use the term ‘inference’ to mean drawing conclusions about a population based on a sample. In machine learning, ‘inference’ refers to using a trained algorithm to make predictions on new instances of data, which relies on its ability to generalise from the training data.
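To make the two meanings concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not code from the book: the function names, the simulated population of heights and the toy two-class data are all hypothetical. The first part uses a sample to estimate a 95% confidence interval for a population mean (a normal approximation, which assumes a reasonably large independent sample); the second "trains" a nearest-centroid classifier and then performs inference by predicting the class of a new point.

```python
import math
import random
from statistics import NormalDist, mean, stdev


# --- Statistical inference: use a sample to make a claim about the population ---
def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation CI for the population mean.

    Assumption: the sample is i.i.d. and large enough for the approximation to hold.
    """
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # roughly 1.96 for a 95% interval
    return m - z * se, m + z * se


# --- Machine learning inference: apply a trained model to a new instance ---
def train_nearest_centroid(points, labels):
    """'Training' (hypothetical toy model): store the centroid of each class."""
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        centroids[label] = tuple(mean(coord) for coord in zip(*members))
    return centroids


def predict(centroids, x):
    """'Inference': assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population of heights in cm; in practice only the sample is observed.
    population = [rng.gauss(170, 10) for _ in range(100_000)]
    sample = rng.sample(population, 200)
    print("95% CI for the population mean:", mean_confidence_interval(sample))

    # Hypothetical training data with two classes.
    X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
    y = ["a", "a", "b", "b"]
    model = train_nearest_centroid(X, y)
    print("prediction for (5.2, 4.8):", predict(model, (5.2, 4.8)))
```

The interval is a statement about an unknown population quantity; the prediction is a statement about one new data point. That difference is the heart of the distinction.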

This has several implications, which I explain below:

  1. In statistical inference, you start from a process of sampling: you draw a subset of the population and use it to reason about the whole. Machine learning does not require you to sample a smaller subset of data in order to make predictions about the whole population.
  2. Statistical inference draws conclusions based on an assumed probability distribution underlying the data. In machine learning inference, you do not need to specify the probability distribution of the underlying data, because the model learns what it needs for inference during the training phase.
  3. Statistical inference aims to understand the relationships between variables and to test hypotheses. This requires models that include assumptions about the underlying statistical distribution. Hence, in statistical inference, we use techniques such as confidence intervals, hypothesis testing, and regression analysis to estimate the parameters of the underlying distribution and to test theories about the data.
  4. In machine learning, you learn from the data without making explicit assumptions about its underlying probability distribution. You are primarily concerned with selecting the model that performs best on a task, regardless of its interpretability or how well you understand its theoretical structure.
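Point 4 can be illustrated with a small sketch of model selection by held-out performance. It assumes one-dimensional inputs, uses k-nearest-neighbour regression as a stand-in for "a model", and relies on simulated data; none of these choices come from the post, and the helper names are hypothetical. Nothing about the data's distribution is modelled: each candidate k is simply scored on validation data, and the lowest error wins.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x (1-D inputs assumed)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared prediction error of k-NN on data it was not trained on."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def select_k(train, validation, candidates=(1, 3, 5, 9, 15)):
    """Pick the k with the lowest validation error; no distributional model is fitted."""
    return min(candidates, key=lambda k: mse(train, validation, k))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: a nonlinear signal plus noise.
    xs = [rng.uniform(-3, 3) for _ in range(300)]
    data = [(x, x ** 2 + rng.gauss(0, 0.5)) for x in xs]
    train, validation = data[:200], data[200:]
    best_k = select_k(train, validation)
    print("chosen k:", best_k, "validation MSE:", round(mse(train, validation, best_k), 3))
```

The selection criterion is purely predictive: the chosen k says nothing about the parameters of any population distribution, which is exactly the trade-off described in point 4.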

Once you understand the above, you can see why the two approaches are used in different contexts.

Statistical inference is used in fields where understanding cause-and-effect relationships or testing theories is crucial, such as medical research, the social sciences, and economics. It is also used where model interpretability is critical.

In contrast, machine learning inference is used when you value predictive accuracy on new data, without necessarily understanding the underlying structure of the data or being able to interpret the model. Because they do not need to expose that underlying structure, machine learning and deep learning models can be far more complex than statistical models.

Finally, to add to the fun and confusion, some models, like linear regression, are used both in a statistical sense and in a machine learning sense. This is why, even when you use linear regression for machine learning, you still need to understand the underlying assumptions of regression.
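Here is a hedged sketch of linear regression read both ways, using a hypothetical simulated dataset and hand-written helpers rather than any particular library. The statistical reading reports a 95% confidence interval for the slope, which is only valid under the classical assumptions (independent errors with constant, approximately normal variance); it uses a normal critical value as a large-sample shortcut for the usual t-based interval. The machine learning reading ignores those intervals and scores the fitted line by its mean squared error on held-out points.

```python
import math
import random
from statistics import NormalDist, mean


def fit_ols(xs, ys):
    """Ordinary least-squares estimates of intercept a and slope b for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def slope_confidence_interval(xs, ys, level=0.95):
    """Statistical use: CI for the slope.

    Assumes i.i.d. errors with constant variance; a normal critical value
    approximates the t-based interval (reasonable only for larger samples).
    """
    a, b = fit_ols(xs, ys)
    n = len(xs)
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = residual_ss / (n - 2)  # unbiased estimate of the error variance
    x_bar = mean(xs)
    se_b = math.sqrt(sigma2 / sum((x - x_bar) ** 2 for x in xs))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return b - z * se_b, b + z * se_b


def holdout_mse(train, test):
    """Machine learning use: how well the fitted line predicts points it has not seen."""
    a, b = fit_ols([x for x, _ in train], [y for _, y in train])
    return mean((y - (a + b * x)) ** 2 for x, y in test)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data generated from y = 2 + 3x + noise.
    xs_raw = [rng.uniform(0, 10) for _ in range(200)]
    data = [(x, 2.0 + 3.0 * x + rng.gauss(0, 1)) for x in xs_raw]
    xs, ys = zip(*data)
    print("slope 95% CI:", slope_confidence_interval(xs, ys))
    print("hold-out MSE:", holdout_mse(data[:150], data[150:]))
```

If the regression assumptions are badly violated, the confidence interval can be misleading even when the hold-out error looks good, which is why the assumptions still matter in a machine learning setting.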

I will expand on this in future posts.

If you found this useful, you can sign up for my book: https://forms.gle/g4Y41BncN56oCsoX8

If you are a non-developer and want to learn AI with me, please see Erdos Research Labs.

You can meet me and our team at our Oxford AI summit.

If you would like to study with me, see our courses:

Low-code AI course at the University of Oxford for non-developers

AI and digital twins

Cecilia Quintana

Management | Technical Leader | Marathonist

8 months ago

Thanks for this post. It reminded me of something I read in Edward Frenkel's book "Love and Math", where he points out that if we don't understand the foundations of a subject, we have to put our trust in the words and knowledge of someone or something else, and in doing so we can put human knowledge and critical thinking at risk.

Frank X. Sowa

Founder/CEO at The Xavier Group, Ltd. -- Strategy Consultant; Futurist; Comprehensive Anticipatory Design Scientist

8 months ago

SPOT ON. Yes. Too many people create more problems by having only a broad, overview-level knowledge of the terminology. As a result, it is increasingly common to confuse mathematically sound statistics with mathematically sound machine learning, and, through a lack of understanding, to misrepresent both of them (and, worse, the broader issue), as well as the internal processes and systems they are meant to be used for. THEY ARE NOT the SAME. They really are not similar.

CHESTER SWANSON SR.

Next Trend Realty LLC. / www.Har.com/Chester-Swanson/agent_cbswan

8 months ago

Thanks for sharing.
