Testing of Hypothesis
Hypothesis testing is indeed a branch of inferential statistics. Inferential statistics involves using sample data to make inferences or draw conclusions about a larger population. Hypothesis testing is one of the key tools within inferential statistics that allows us to make these inferences.
In hypothesis testing, we start with a research question or a claim about a population. We then collect a sample from that population and use statistical techniques to analyze the data. The goal is to determine whether the evidence from the sample supports or contradicts the claim or hypothesis about the population.
By using hypothesis testing, we can assess the probability that any observed differences or relationships between variables in the sample are due to chance or if they represent true differences or relationships in the population. It helps us make statements about the population based on limited sample information.
Hypothesis testing provides a structured framework for making statistical inferences and helps ensure that our conclusions are reliable and based on evidence. It is widely used in scientific research, social sciences, business, and many other fields to draw meaningful insights from data and make informed decisions.
Let's begin with some basic definitions, because solid pillars are essential for constructing a building....
Statistical Inference : The main objective of statistical inference is to make inferences about a population based on a sample. This can be done by estimating population parameters, such as the mean or standard deviation, or by testing hypotheses about the population.
There are two main types of statistical inference:
Inference-I : Theory of Estimation
The main objective is to estimate the population parameter with the help of a statistic.
Inference-II : Testing of Hypothesis
The main objective is to test a claim about the population parameter using the sample.
Hypothesis : Any statement regarding a population (popn) parameter is called a hypothesis, and the procedure used to test it is called testing of hypothesis.
Statistical Hypothesis : A statement regarding a population parameter which is to be tested on the basis of information available from a sample is called a statistical hypothesis.
There are two types of statistical hypotheses:
Simple Hypothesis : "If the hypothesis completely specifies the population, then it is called a simple hypothesis."
EX : If x1, x2, x3, ..., xn is a random sample of size n from a binomial population with parameter p, then the hypothesis H0 : p = p0 is a simple hypothesis.
Composite Hypothesis : "A hypothesis which does not completely specify 'r' parameters of a population is termed a composite hypothesis with 'r' degrees of freedom."
EX : H1 : p > p0 or p < p0.
For any testing problem we set up two hypotheses:
1. Assumption-I : Null Hypothesis [H0] (H-naught) - usually a simple hypothesis.
2. Assumption-II : Alternative Hypothesis [H1] (H-one) - usually a composite hypothesis.
Null Hypothesis : Any statement asserting that there is no significant difference is called a null hypothesis, and it is denoted by H0.
Ex : A sample of size n = 30 is drawn from a population and the sample mean is found to be 15. Suppose we want to test whether the population mean is 10 or not.
The null hypothesis is H0 : μ = 10, i.e., there is no significant difference between the sample mean and the population mean.
Alternative Hypothesis : It is the complement of the null hypothesis, i.e., any statement asserting that there is a difference is called an alternative hypothesis, denoted by H1.
EX : If H0 : μ = μ0, then the alternative hypothesis could be H1 : μ ≠ μ0, μ > μ0, or μ < μ0.
Critical region & acceptance region :
Let x1, x2, x3, ..., xn be the observations of a sample of size n drawn from a population, plotted as points on a Venn diagram. The set of all such points is called the sample space. The basis of testing of hypothesis is the division of this sample space into two regions, i.e., the acceptance region and the rejection region.
** If the sample point falls in the acceptance region, then we accept H0.
** If the sample point falls in the rejection region, then we reject H0.
p(accept) +p(reject) = 1
p(accept) = 1-P(reject)
Types of errors : In everyday life we come across decisions like the following.
H0 : the product is good; H1 : the product is not good.
Four decisions are then possible: (i) accept H0 when the product is good, (ii) reject H0 when the product is not good, (iii) reject H0 when the product is good, and (iv) accept H0 when the product is not good. The first two are correct decisions and the last two are wrong decisions; the wrong decisions are the error statements.
Type-I error : Rejecting H0 when it is true is called a Type-I error. It is also called the producer's risk.
Type-II error : Accepting H0 when it is false is called a Type-II error. It is also called the consumer's risk.
Type-I error is deemed to be more serious (riskier) than Type-II error.
Level of significance : The probability of a Type-I error is called the level of significance. It is denoted by "α" and is given by
α = P{Type-I error}
  = P{reject H0 when H0 is true}
  = P{x ∈ W | H0}, where W is the critical (rejection) region.
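As a quick illustration of this definition, here is a minimal simulation sketch (not from the article; the population values mu0 = 50, sigma = 10 and alpha = 0.05 are assumptions chosen only for demonstration). It estimates the probability of a Type-I error by repeatedly sampling from a population for which H0 is actually true and counting how often a two-tailed z-test rejects H0; the rejection rate should come out close to α.

```python
# Minimal sketch: Monte Carlo estimate of P(Type-I error) for a z-test.
# Assumptions (not from the article): mu0 = 50, sigma = 10, n = 40, alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
mu0, sigma, n, alpha = 50.0, 10.0, 40, 0.05
z_tab = stats.norm.ppf(1 - alpha / 2)          # two-tailed critical value

trials, rejections = 10_000, 0
for _ in range(trials):
    x = rng.normal(mu0, sigma, size=n)         # H0 is true by construction
    z_cal = (x.mean() - mu0) / (sigma / np.sqrt(n))
    if abs(z_cal) > z_tab:                     # sample falls in the critical region
        rejections += 1

print("estimated P(Type-I error):", rejections / trials)   # close to alpha = 0.05
```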
Critical region :
The critical region is the region of values that corresponds to the rejection of the null hypothesis at some chosen probability level.
The shaded area under the curve is equal to the level of significance (α) and the non-shaded area under the curve is equal to (1 − α).
The division point between the acceptance region and the rejection region is known as the critical value or table value, i.e., the z-table value, t-table value, F-table value, χ²-table value, etc.
If the absolute value of the calculated test statistic is larger than the critical (table) value, then we reject the null hypothesis.
zcal > ztab -- Reject h0
tcal > ttab -- Reject h0
fcal > ftab -- Reject h0
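For readers who want to compute these table values directly, the following short sketch (my own addition, not part of the article) uses scipy; the degrees of freedom and α = 0.05 are arbitrary illustrative choices.

```python
# Sketch: obtaining the "table values" used in the decision rules above.
# The degrees of freedom below are illustrative assumptions.
from scipy import stats

alpha = 0.05
z_tab   = stats.norm.ppf(1 - alpha / 2)          # z-table value, two-tailed
t_tab   = stats.t.ppf(1 - alpha / 2, df=24)      # t-table value, df = n - 1
f_tab   = stats.f.ppf(1 - alpha, dfn=4, dfd=20)  # F-table value (upper tail)
chi_tab = stats.chi2.ppf(1 - alpha, df=3)        # chi-square table value

print(z_tab, t_tab, f_tab, chi_tab)
# Decision rule: reject H0 whenever the calculated statistic exceeds the
# corresponding table value, e.g. abs(z_cal) > z_tab.
```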
One & two tailed tests : Whether the statistical test is two-sided or one-sided depends on the alternative hypothesis.
In a two-tailed test the level of significance is divided by 2 (α/2 in each tail).
One-tailed tests are further divided into right-tailed tests and left-tailed tests.
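A tiny sketch (assuming α = 0.05, purely for illustration) of how the critical values differ between the two-tailed case and the two one-tailed cases:

```python
# Sketch: two-tailed vs one-tailed z critical values at an assumed alpha = 0.05.
from scipy import stats

alpha = 0.05
two_tailed   = stats.norm.ppf(1 - alpha / 2)   # +/- 1.96  (alpha split into two tails)
right_tailed = stats.norm.ppf(1 - alpha)       # +1.645    (all of alpha in the right tail)
left_tailed  = stats.norm.ppf(alpha)           # -1.645    (all of alpha in the left tail)
print(two_tailed, right_tailed, left_tailed)
```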
In studying the magnitude (size) of variation in a population, the entire sampling theory is mainly classified into two types.
Small sample test : If the sample size is less than 30, the sample is called a small sample. To test small samples we use the t, F, χ² distributions {exact sampling distributions}.
Large sample test : To study the general magnitude of variation or a characteristic of a population, a sample of size n ≥ 30 is drawn from the population. Such a sample is called a large sample, and a statistic based on a large sample is known as a large sample test.
i.e., if the sample size is greater than or equal to 30, the sample is called a "large sample".
Test procedure for testing a statistical hypothesis [for large samples] :
Testing a statistical hypothesis consists of the following step-by-step procedure...
Step 1 : Set up the null hypothesis (H0) for the given data.
Step 2 : Set up the alternative hypothesis (H1) for the given data, which helps us decide whether the test is one-tailed or two-tailed.
Step 3 : Choose an appropriate level of significance, e.g., 1%, 5%, or 10%.
Step 4 : Compute the test statistic z = (t − E(t)) / S.E.(t) ~ N(0,1).
Step 5 : Conclusion: if |z| > 3, reject H0; otherwise (|z| ≤ 3) compare the calculated value with the z-table value.
--- if |Z| cal < |z|tab then accept h0.
--- if |z| cal >= |z|tab then reject h0.
To test large samples we use the normal distribution, because almost every distribution is approximately normal provided the sample size is large enough. The normal distribution therefore acts as the parent distribution, and it is the basis of the large sample tests.
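The following worked sketch puts the five steps above together for a single mean; every number in it (n = 50, x̄ = 11.2, μ0 = 10, σ = 4, α = 0.05) is invented for illustration, not data from the article.

```python
# Worked sketch of the five-step large-sample z-test for a single mean.
# All numbers are illustrative assumptions.
import numpy as np
from scipy import stats

# Step 1 & 2: H0: mu = 10  versus  H1: mu != 10  (two-tailed)
mu0, sigma = 10.0, 4.0
n, x_bar = 50, 11.2

# Step 3: level of significance
alpha = 0.05

# Step 4: test statistic z = (t - E(t)) / S.E.(t), with t = sample mean
z_cal = (x_bar - mu0) / (sigma / np.sqrt(n))

# Step 5: conclusion - compare |z_cal| with the z-table value
z_tab = stats.norm.ppf(1 - alpha / 2)
if abs(z_cal) >= z_tab:
    print(f"|z| = {abs(z_cal):.2f} >= {z_tab:.2f}: reject H0")
else:
    print(f"|z| = {abs(z_cal):.2f} < {z_tab:.2f}: accept H0")
```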
for any statistic 't' the normal test is
z = (t-E(t))/s.E(t)
We have the following large sample tests:
1. Test for single mean : here t = x̄ (sample mean), so z = (x̄ − E(x̄)) / S.E.(x̄) = (x̄ − μ) / (σ/√n) ~ N(0,1)
2.Test for difference of means
here t = x̄1 − x̄2
z = ((x̄1 − x̄2) − E(x̄1 − x̄2)) / S.E.(x̄1 − x̄2) ~ N(0,1)
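A hedged sketch of this test: under H0 : μ1 = μ2 the expectation of x̄1 − x̄2 is 0, and the standard error takes the usual large-sample form √(σ1²/n1 + σ2²/n2). All the summary figures below are made up for illustration.

```python
# Sketch: large-sample test for the difference of two means (invented figures).
import numpy as np
from scipy import stats

x1_bar, sigma1, n1 = 67.5, 2.5, 400
x2_bar, sigma2, n2 = 68.0, 2.5, 400
alpha = 0.05

se = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # S.E.(x1_bar - x2_bar)
z_cal = (x1_bar - x2_bar) / se                  # E(x1_bar - x2_bar) = 0 under H0
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```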
3. Test for single standard deviation
now t = sample standard deviation = s
z = (s-E(s))/S.E(s)
we know that for large values of n the sample standard deviation s
follows a normal distribution with mean σ and variance σ²/(2n),
therefore, after simplification, the final formula is
z = (s − σ) / √(σ²/(2n)) ~ N(0,1)
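A small sketch of this test using the approximation above; the values (s = 5.4, σ0 = 5, n = 100, α = 0.05) are assumed for illustration.

```python
# Sketch: large-sample test for a single standard deviation, s ~ N(sigma, sigma^2/(2n)).
import numpy as np
from scipy import stats

s, sigma0, n, alpha = 5.4, 5.0, 100, 0.05       # illustrative assumptions
z_cal = (s - sigma0) / np.sqrt(sigma0**2 / (2 * n))
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```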
4. Test for difference of standard deviations
here t = difference of sample standard deviations = s1 − s2
z = ((s1 − s2) − E(s1 − s2)) / S.E.(s1 − s2)
we know that for large sample sizes the sample standard deviations s1 and s2
follow normal distributions with means σ1, σ2 and variances σ1²/(2n1), σ2²/(2n2) respectively, so
z = ((s1 − s2) − (σ1 − σ2)) / √((σ1²/2n1) + (σ2²/2n2)) ~ N(0,1)
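A parallel sketch with invented figures; as is common in practice (an assumption on my part, since the article does not say so), s1 and s2 are substituted for the unknown σ1 and σ2 in the standard error.

```python
# Sketch: large-sample test for the difference of two standard deviations.
import numpy as np
from scipy import stats

s1, n1 = 6.1, 200
s2, n2 = 5.8, 250
alpha = 0.05

se = np.sqrt(s1**2 / (2 * n1) + s2**2 / (2 * n2))   # sigma1, sigma2 estimated by s1, s2
z_cal = (s1 - s2) / se                              # (sigma1 - sigma2) = 0 under H0
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```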
5.Test for single proportion.
here t = sample proportion = P
z = ((p-E(p))/S.E(p)) ~ N(0,1)
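A minimal sketch of the single-proportion test; under H0 : P = P0 the standard error is √(P0(1 − P0)/n). The counts below are invented.

```python
# Sketch: large-sample z-test for a single proportion (invented counts).
import numpy as np
from scipy import stats

x, n = 120, 400            # observed successes out of n trials
p0 = 0.25                  # hypothesised population proportion
alpha = 0.05

p_hat = x / n
z_cal = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```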
6.Test for difference of proportion :
here t = difference of sample proportion = p1-p2
z = ((p1 − p2) − E(p1 − p2)) / S.E.(p1 − p2) ~ N(0,1)
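A sketch with made-up counts; under H0 : P1 = P2 a pooled proportion is commonly used in the standard error (this pooling is my assumption, since the article does not expand the formula).

```python
# Sketch: two-proportion z-test with a pooled proportion under H0: P1 = P2.
import numpy as np
from scipy import stats

x1, n1 = 45, 150           # invented counts
x2, n2 = 40, 200
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                       # pooled estimate under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_cal = (p1 - p2) / se
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```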
Based on the problem the main formula of z is modified.
SMALL SAMPLE TESTS
To test a general characteristic or hypothesis about a population parameter by selecting a small sample (n < 30), a test statistic is required, known as a 'small sample test'.
The exact sampling distributions (t, F, χ² distributions) are used to conduct small sample tests.
The various small sample tests are:
t - test:
F - test: F - test for population variance.
χ2 - test :
Assumptions of t - test OR Assumptions for conducting student t-test:
t - test for mean is applied under the following assumptions: (i) the parent population is normal, (ii) the sample observations are random and independent, (iii) the population standard deviation σ is unknown, and (iv) the sample size is small (n < 30).
t - test for mean OR t - test for single mean:
t -test for difference of two means(independent samples) OR t -test for equality of the population means:
t -test for difference of means (dependent samples) OR paired t-test for difference of means :
t - test for correlation coefficient :
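A compact sketch (with invented data) of how the four t-tests named above can be run in Python with scipy; the hypothesised mean of 10 and the 5% level are illustrative assumptions.

```python
# Sketch: the t-tests above via scipy (invented data, alpha = 0.05 assumed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(10, 2, size=12)
b = rng.normal(11, 2, size=12)

t1, p1 = stats.ttest_1samp(a, popmean=10)    # t-test for single mean
t2, p2 = stats.ttest_ind(a, b)               # difference of means, independent samples
t3, p3 = stats.ttest_rel(a, b)               # difference of means, paired samples
r,  p4 = stats.pearsonr(a, b)                # correlation coefficient (t-based p-value)

for name, p in [("one-sample", p1), ("independent", p2),
                ("paired", p3), ("correlation", p4)]:
    print(name, "reject H0" if p < 0.05 else "accept H0")
```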
Assumption to conduct F - test :
F - test for equality of variance :
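A hedged sketch of the F-test for equality of two population variances (data invented): F is taken as the larger sample variance over the smaller one and compared with the F-table value at the corresponding degrees of freedom.

```python
# Sketch: F-test for equality of variances (invented data, alpha = 0.05 assumed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0, 2.0, size=15)
y = rng.normal(0, 2.5, size=12)

s2x, s2y = np.var(x, ddof=1), np.var(y, ddof=1)      # unbiased sample variances
if s2x >= s2y:
    f_cal, dfn, dfd = s2x / s2y, len(x) - 1, len(y) - 1
else:
    f_cal, dfn, dfd = s2y / s2x, len(y) - 1, len(x) - 1

f_tab = stats.f.ppf(1 - 0.05, dfn, dfd)
print("reject H0" if f_cal > f_tab else "accept H0", round(f_cal, 3))
```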
Conditions for validity of χ2 (chi - square) test for goodness of fit :
For the validity of chi-square test of goodness of fit and test the independence of attributes the following conditions should be satisfied.
If any theoretical (expected) cell frequency is less than 5, it is pooled with the preceding or succeeding frequency before applying the chi-square test.
Chi - square test for goodness of fit:
chi - square test for independence of attributes:
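A brief sketch of both chi-square tests with made-up frequencies (note the validity condition above: expected cell frequencies below 5 should first be pooled with a neighbouring class).

```python
# Sketch: chi-square goodness of fit and test for independence (invented frequencies).
from scipy import stats

# Goodness of fit: observed die frequencies vs a fair-die expectation.
observed = [18, 22, 19, 25, 17, 19]
expected = [20] * 6                       # totals of observed and expected must match
chi_cal, p = stats.chisquare(f_obs=observed, f_exp=expected)
print("goodness of fit:", "reject H0" if p < 0.05 else "accept H0")

# Independence of attributes: a 2 x 3 contingency table.
table = [[30, 20, 10],
         [20, 25, 15]]
chi_cal, p, dof, exp = stats.chi2_contingency(table)
print("independence:", "reject H0" if p < 0.05 else "accept H0")
```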
Degrees of freedom :
The number of independent observations is known as the degrees of freedom, or equivalently,
degrees of freedom = sample size − number of estimated parameters.
EX : n = 7, so n − 1 = 7 − 1 = 6 independent observations (since one quantity, the sample mean, is estimated from the data).
Let's look at how to calculate critical values for two-tailed and one-tailed tests...
Now let's get into the critical values of small sample tests.