Hypothesis Testing in Machine Learning

The word hypothesis comes up frequently in data science and machine learning projects. Machine learning lets us predict outcomes from prior data, and data scientists and ML practitioners run experiments to solve a problem. Before they do, they make an initial assumption about what the solution to the problem looks like.


In machine learning, this assumption is called a hypothesis. The terms hypothesis and model are often used interchangeably, but they differ: a hypothesis is an assumption made by the practitioner, whereas a model is a mathematical representation used to evaluate that hypothesis. This article goes through a few key ideas behind hypotheses and hypothesis testing in machine learning, beginning with a brief overview of what a hypothesis is.


What is a Hypothesis?

The term "hypothesis" refers to an idea or explanation that has been put forward but is not yet supported by evidence. It is an informed guess based on limited facts and has not been proven. A sound hypothesis is testable: evidence can show it to be either true or false.

Parameters of hypothesis testing

  • Null hypothesis (H0): In statistics, the null hypothesis is the default assumption that there is no effect or no association, for example no relationship between two measured variables or no difference between two groups.

In other words, it is the baseline assumption, usually founded on existing knowledge of the problem.

Example: A company's average daily production is 50 units.

  • Alternative hypothesis (H1): The alternative hypothesis is the claim that contradicts the null hypothesis. It is the hypothesis we adopt when the evidence leads us to reject H0.

Example: The company's average daily production is not 50 units.

  • Level of significance

The level of significance is the threshold we use to decide whether to reject the null hypothesis. Because a conclusion drawn from a sample can never be 100% certain, we choose a level of significance in advance, typically 5%. It is denoted by the symbol "alpha" (for example, alpha = 0.05) and is the probability of rejecting the null hypothesis when it is actually true that we are willing to tolerate. An alpha of 0.05 corresponds to a 95% confidence level.

  • P-value

The p-value, or computed probability, is the probability of observing results at least as extreme as those actually observed, assuming the null hypothesis (H0) is true. If the p-value is smaller than the chosen level of significance, you reject the null hypothesis and conclude that the sample supports the alternative hypothesis. Otherwise, you fail to reject the null hypothesis.
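The decision rule described above can be sketched in a few lines of Python. This is only a minimal illustration of comparing a p-value with alpha, not a full test procedure. The function name `decide` and the sample p-values are hypothetical and chosen purely for illustration.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when the p-value is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical p-values, used only to show both branches of the rule.
print(decide(0.03))
print(decide(0.20))
```

Note that the sketch returns "fail to reject H0" rather than "accept H0": a large p-value means the sample gives too little evidence against the null hypothesis, not that the null hypothesis has been proven.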



Example: Suppose we do not know whether a coin is fair or a trick coin, so let's state the null and alternative hypotheses.


Null hypothesis (H0): The coin is fair.

Alternative hypothesis (H1): The coin is a trick coin (biased toward heads).

alpha = 0.05 or 5%

Let's now toss the coin and compute the p-value (probability value) after each toss.

The first toss lands on heads. For a fair coin, heads and tails are equally likely, so the p-value is 50%.

The second toss is also heads. The probability of two heads in a row with a fair coin is 50% × 50% = 25%, so the p-value is now 25%.

Continuing in the same way, we toss the coin six times in a row and every toss lands on heads. The p-value is now (1/2)^6 = 1/64, or about 1.56%.

This p-value is below our significance level of 5% (a 95% confidence level), which means we tolerate at most a 5% chance of wrongly rejecting a true null hypothesis. We therefore reject the null hypothesis and conclude that this is a trick coin, because it has given us 6 consecutive heads.
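The arithmetic in the coin example can be reproduced with a short Python sketch. It assumes the one-sided calculation used above, where the p-value is the probability that a fair coin gives n heads in a row. The names `ALPHA` and `p_value_all_heads` are illustrative and not from any library.

```python
ALPHA = 0.05  # significance level from the example


def p_value_all_heads(n_tosses: int) -> float:
    """One-sided p-value: probability that a fair coin gives n heads in a row."""
    return 0.5 ** n_tosses


# Print the p-value and the resulting decision after each consecutive head.
for n in range(1, 7):
    p = p_value_all_heads(n)
    verdict = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{n} heads in a row: p-value = {p:.4f} -> {verdict}")
```

Under this calculation the p-value first falls below 0.05 at five heads in a row (1/32 = 0.03125), so the sixth head only strengthens the evidence against the fair-coin hypothesis.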


Errors in Hypothesis Testing


A type I error occurs when we reject the null hypothesis even though it is true. The probability of a type I error is denoted by alpha.

A type II error occurs when we fail to reject the null hypothesis even though it is false. The probability of a type II error is denoted by beta.
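To make the two error types concrete, the sketch below computes both error rates for the coin example under one assumed decision rule: reject H0 only if all six tosses land on heads. The rule, the helper name `prob_all_heads`, and the trick coin's 0.9 probability of heads are assumptions made for illustration and do not come from the example itself.

```python
def prob_all_heads(p_heads: float, n_tosses: int) -> float:
    """Probability of n heads in a row for a coin with the given heads probability."""
    return p_heads ** n_tosses


N_TOSSES = 6

# Type I error: the coin is fair (H0 is true) but six heads make us reject H0.
type_1 = prob_all_heads(0.5, N_TOSSES)

# Type II error: the coin is biased (H0 is false) but we do not see six heads,
# so we fail to reject H0. The 0.9 heads probability is an assumed value.
type_2 = 1 - prob_all_heads(0.9, N_TOSSES)

print(f"alpha (type I error rate) = {type_1:.4f}")
print(f"beta (type II error rate) = {type_2:.4f}")
```

With these assumptions the type I error rate is small while the type II error rate is large, which illustrates the usual trade-off: a rule that rarely rejects a true null hypothesis can often miss a false one.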
