Decision Making With P-Values
RAHUL KUMAR
Data engineer with skills in Python, PySpark, SQL, Azure Data Factory, Azure Databricks, Azure Data Lake, and Azure Synapse Analytics. Created pipelines to ingest data from heterogeneous sources. Also builds Python tools.
In simple words, a hypothesis is an explanation proposed on the basis of limited evidence (an educated guess).
The null hypothesis (H0) is the one to be tested, and the alternative hypothesis (H1 or Ha) is everything else.
Let's take an example:
H0: μ = 70 (Average marks of students in a class is 70)
Ha: μ ≠ 70 (Average marks of students in a class is not 70)
A Type I error occurs when we reject a true null hypothesis, and it is usually treated as the more serious error. It is also called a false positive. The probability of making this error is α (the level of significance).
A Type II error occurs when we fail to reject a false null hypothesis. It is also called a false negative. The probability of making this error is β, which depends on the sample size, the chosen α, and how far the true population value lies from the hypothesized one. The probability of rejecting a false null hypothesis is 1-β. Rejecting a false null hypothesis is the researcher's goal in hypothesis testing, so 1-β is called the power of the test.
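To make α, β, and power concrete, here is a minimal simulation sketch. It assumes a two-sided z-test with a known population standard deviation; the values σ = 10, n = 25, the "true" mean of 73, and the helper names are all hypothetical choices made for illustration, not part of the original example.

```python
import random
from statistics import NormalDist, mean

def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z-test: True when H0: mu = mu0 is rejected at level alpha."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit

def rejection_rate(true_mu, mu0=70.0, sigma=10.0, n=25, alpha=0.05,
                   trials=4000, seed=0):
    """Fraction of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if rejects_h0(sample, mu0, sigma, alpha):
            rejections += 1
    return rejections / trials

# H0 is true (mu really is 70): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mu=70.0)
# H0 is false (assumed true mean 73): every rejection is a correct decision.
power = rejection_rate(true_mu=73.0, seed=1)

print(f"estimated alpha = {type_1_rate:.3f}")
print(f"estimated power = {power:.3f}, estimated beta = {1 - power:.3f}")
```

The first estimate should land close to the chosen α of 0.05, while the second depends heavily on the assumed true mean and sample size: a larger n or a true mean further from 70 raises the power and lowers β.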
Example:
A person is arrested on a charge of burglary. A jury has to decide whether the person is guilty or not guilty.
H0: Person is innocent
H1: Person is guilty
A Type I error occurs if the jury convicts the person [rejects H0] although the person is innocent [H0 is true].
A Type II error occurs if the jury releases the person [does not reject H0] although the person is guilty [H1 is true].
Level of confidence
As the name suggests, the level of confidence expresses how confident we are in our decision. By convention, the level of confidence (LOC) is usually set to 95% or higher; lower levels are generally not considered acceptable.
Confidence level (C) = 1-α
Level of significance (α)
The significance level, in the simplest terms, is the threshold probability of incorrectly rejecting the null hypothesis when it is in fact true. It is the probability of a Type I error, also known as the Type I error rate.
The critical region is the region of the sample space in which, if the calculated test statistic falls there, we reject the null hypothesis. It lies in one tail or in both tails of the probability distribution curve, depending on the alternative hypothesis.
Critical values are the boundaries that separate the values of the test statistic for which we reject the null hypothesis from those for which we do not. They are calculated on the basis of α.
We will see more examples later on, and it will become clear how we choose α.
Based on the alternative hypothesis, three cases of critical region arise:
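Case 1) Ha: μ ≠ μ0, where μ0 is the hypothesized value (70 in the example above). The critical region lies in both tails. This is a two-tailed (double-tailed) test.
Case 2) Ha: μ < μ0. The critical region lies in the left tail. This scenario is also called a left-tailed test.
Case 3) Ha: μ > μ0. The critical region lies in the right tail. This scenario is also called a right-tailed test.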
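The three cases can be sketched in code. The snippet below is only an illustration under the assumption that the test statistic follows a standard normal distribution (a z-test); the function names and the tail labels "two", "left", and "right" are hypothetical.

```python
from statistics import NormalDist

def critical_region(alpha, tail):
    """Return (lower, upper) critical z-values; None marks a side with no cutoff."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        cut = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split across both tails
        return (-cut, cut)
    if tail == "left":
        return (std_normal.inv_cdf(alpha), None)
    if tail == "right":
        return (None, std_normal.inv_cdf(1 - alpha))
    raise ValueError("tail must be 'two', 'left', or 'right'")

def in_critical_region(z_stat, alpha, tail):
    """True when the test statistic falls in the critical region (reject H0)."""
    lower, upper = critical_region(alpha, tail)
    below = lower is not None and z_stat < lower
    above = upper is not None and z_stat > upper
    return below or above

for tail in ("two", "left", "right"):
    print(tail, critical_region(0.05, tail))
```

At α = 0.05 the two-tailed cutoffs are roughly ±1.96, while a one-tailed test puts the whole 5% in a single tail and uses a cutoff of roughly 1.645 in absolute value. The same statistic can therefore fall inside the critical region for one alternative hypothesis and outside it for another.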
Generally, strong control of α is desired, and in tests it is fixed in advance at very low levels such as 0.05 (5%) or 0.01 (1%).
If H0 is not rejected at a significance level (α) of 5%, we can only say that the sample does not provide enough evidence against the null hypothesis at that level. It does not establish that the null hypothesis is true.
If the null hypothesis is rejected at 1%, it will certainly also be rejected at higher significance levels, such as 5% or 10%.
The p-value is the smallest level of significance at which a null hypothesis can be rejected.
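A short sketch can illustrate this definition. It assumes a two-tailed z-test and a hypothetical test statistic of 2.2; the function names are made up for the example. The p-value is computed from the normal distribution, and the critical-value decision is then checked at several significance levels.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, assumed distribution of the statistic

def two_sided_p_value(z_stat):
    """Probability, under H0, of a statistic at least as extreme as z_stat."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z_stat)))

def rejects_by_critical_value(z_stat, alpha):
    """Two-tailed decision made by comparing |z| with the critical value."""
    return abs(z_stat) > STD_NORMAL.inv_cdf(1 - alpha / 2)

z_stat = 2.2  # hypothetical test statistic
p = two_sided_p_value(z_stat)
print(f"p-value = {p:.4f}")

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: reject H0? {rejects_by_critical_value(z_stat, alpha)}")
```

With this statistic the p-value is about 0.028, so H0 is rejected at 5% and 10% but not at 1%. Any α just above the p-value leads to rejection and any α just below it does not, which is what "smallest level of significance" means.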
Decision making with p-value
We compare the p-value with the significance level (α) to make a decision about the null hypothesis.
If the p-value is greater than α, we do not reject the null hypothesis.
If the p-value is less than or equal to α, we reject the null hypothesis.
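Putting the pieces together for the class-marks example (H0: μ = 70), the following is a minimal end-to-end sketch. The marks are made-up data, the population standard deviation of 8 is assumed to be known so that a z-test applies, and the function name is hypothetical. With an unknown standard deviation a t-test would normally be used instead.

```python
from statistics import NormalDist, mean

def z_test_decision(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0; returns (z, p-value, decision)."""
    n = len(sample)
    z_stat = (mean(sample) - mu0) / (sigma / n ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    # A p-value exactly equal to alpha is treated as a rejection here
    # (an assumed convention for the boundary case).
    decision = "reject H0" if p <= alpha else "do not reject H0"
    return z_stat, p, decision

# Hypothetical marks of 16 students (sample mean is 74).
marks = [72, 68, 75, 80, 66, 74, 79, 71, 77, 73, 69, 78, 76, 70, 82, 74]

z_stat, p, decision = z_test_decision(marks, mu0=70, sigma=8, alpha=0.05)
print(f"z = {z_stat:.2f}, p = {p:.4f}, decision at 5%: {decision}")

_, _, strict_decision = z_test_decision(marks, mu0=70, sigma=8, alpha=0.01)
print(f"decision at 1%: {strict_decision}")
```

For this made-up sample the statistic is z = 2.00 and the p-value is about 0.0455, so H0 is rejected at α = 0.05 but not at α = 0.01. The same data can lead to different decisions depending on the α chosen before the test.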