Testing of Hypothesis
Hypothesis testing is indeed a branch of inferential statistics. Inferential statistics involves using sample data to make inferences or draw conclusions about a larger population. Hypothesis testing is one of the key tools within inferential statistics that allows us to make these inferences.
In hypothesis testing, we start with a research question or a claim about a population. We then collect a sample from that population and use statistical techniques to analyze the data. The goal is to determine whether the evidence from the sample supports or contradicts the claim or hypothesis about the population.
By using hypothesis testing, we can assess the probability that any observed differences or relationships between variables in the sample are due to chance or if they represent true differences or relationships in the population. It helps us make statements about the population based on limited sample information.
Hypothesis testing provides a structured framework for making statistical inferences and helps ensure that our conclusions are reliable and based on evidence. It is widely used in scientific research, social sciences, business, and many other fields to draw meaningful insights from data and make informed decisions.
Let's begin with some basic definitions, because solid pillars are essential for constructing a building....
Statistical Inference : The main objective of statistical inference is to make inferences about a population based on a sample. This can be done by estimating population parameters, such as the mean or standard deviation, or by testing hypotheses about the population.
There are two main types of statistical inference:
Inference-I : Theory of Estimation
The main objective is to estimate the population parameter with the help of a statistic.
Inference-II : Testing of Hypothesis
The main objective is to test a claim about the population parameter using the sample.
Hypothesis : Any statement regarding a population (popn) parameter is called a hypothesis, and the procedure used to test it is called testing of hypothesis.
Statistical Hypothesis : A statement regarding a population parameter which is to be tested on the basis of information available from a sample is called a statistical hypothesis.
There are two types of statistical hypotheses:
Simple Hypothesis : "If the hypothesis completely specifies the population, then it is called a simple hypothesis."
EX : If x1, x2, x3, ..., xn is a random sample of size n from a binomial population with parameter p, then the hypothesis H0 : p = p0 is a simple hypothesis.
Composite Hypothesis : "A hypothesis which does not completely specify 'r' parameters of a population is termed a composite hypothesis with 'r' degrees of freedom."
EX : H1 : p > p0 or p < p0.
For any testing problem we set up two hypotheses:
1. Assumption-I : Null Hypothesis [H0] (H-naught) - usually a simple hypothesis.
2. Assumption-II : Alternative Hypothesis [H1] (H-one) - usually a composite hypothesis.
Null Hypothesis : Any statement asserting that there is no significant difference is called a null hypothesis, and it is denoted by H0.
Ex : A sample of size n = 30 is drawn from a population and the sample mean is found to be 15. Suppose we want to test whether the population mean is 10 or not.
The null hypothesis is H0 : μ = 10, i.e., there is no significant difference between the sample mean and the population mean.
Alternative Hypothesis : It is the complement of the null hypothesis, i.e., any statement asserting that there is a difference is called an alternative hypothesis, denoted by H1.
EX : If H0 : μ = μ0, then the alternative hypothesis could be H1 : μ ≠ μ0, μ > μ0, or μ < μ0.
Critical region & acceptance region :
Let x1, x2, x3, ..., xn be the observations of a sample of size n drawn from a population, plotted as points on a Venn diagram. The set of all such points is called the sample space. The basis of testing of hypothesis is the division of this sample space into two regions, i.e., the acceptance region and the rejection region.
** If the sample point falls in the acceptance region, then we accept H0.
** If the sample point falls in the rejection region, then we reject H0.
p(accept) +p(reject) = 1
p(accept) = 1-P(reject)
Types of errors : In everyday life we come across decisions like the following.
H0 : the product is good; H1 : the product is not good.
Four decisions are then possible: (i) accept H0 when the product is good, (ii) reject H0 when the product is not good, (iii) reject H0 when the product is good, and (iv) accept H0 when the product is not good. The first two are correct decisions and the last two are wrong decisions; the wrong decisions are the error statements.
Type-I error : Rejecting H0 when it is true is called a Type-I error. It is also called the producer's risk.
Type-II error : Accepting H0 when it is false is called a Type-II error. It is also called the consumer's risk.
Type-I error is deemed to be more serious (riskier) than Type-II error.
Level of significance : The probability of a Type-I error is called the level of significance. It is denoted by "α" and is given by
α = P{Type-I error}
  = P{reject H0 when H0 is true}
  = P{x ∈ W | H0}, where W is the critical (rejection) region.
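As a quick illustration of this definition, here is a minimal simulation sketch (not from the article; the population values mu0 = 50, sigma = 10 and alpha = 0.05 are assumptions chosen only for demonstration). It estimates the probability of a Type-I error by repeatedly sampling from a population for which H0 is actually true and counting how often a two-tailed z-test rejects H0; the rejection rate should come out close to α.

```python
# Minimal sketch: Monte Carlo estimate of P(Type-I error) for a z-test.
# Assumptions (not from the article): mu0 = 50, sigma = 10, n = 40, alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
mu0, sigma, n, alpha = 50.0, 10.0, 40, 0.05
z_tab = stats.norm.ppf(1 - alpha / 2)          # two-tailed critical value

trials, rejections = 10_000, 0
for _ in range(trials):
    x = rng.normal(mu0, sigma, size=n)         # H0 is true by construction
    z_cal = (x.mean() - mu0) / (sigma / np.sqrt(n))
    if abs(z_cal) > z_tab:                     # sample falls in the critical region
        rejections += 1

print("estimated P(Type-I error):", rejections / trials)   # close to alpha = 0.05
```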
Critical region :
The critical region is the region of values that corresponds to the rejection of the null hypothesis at some chosen probability level.
The shaded area under the curve is equal to the level of significance (α) and the non-shaded area under the curve is equal to (1 − α).
The division point between the acceptance region and the rejection region is known as the critical value or table value, i.e., the z-table value, t-table value, F-table value, χ²-table value, etc.
If the absolute value of the calculated test statistic is larger than the critical (table) value, then we reject the null hypothesis.
zcal > ztab -- Reject h0
tcal > ttab -- Reject h0
fcal > ftab -- Reject h0
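For readers who want to compute these table values directly, the following short sketch (my own addition, not part of the article) uses scipy; the degrees of freedom and α = 0.05 are arbitrary illustrative choices.

```python
# Sketch: obtaining the "table values" used in the decision rules above.
# The degrees of freedom below are illustrative assumptions.
from scipy import stats

alpha = 0.05
z_tab   = stats.norm.ppf(1 - alpha / 2)          # z-table value, two-tailed
t_tab   = stats.t.ppf(1 - alpha / 2, df=24)      # t-table value, df = n - 1
f_tab   = stats.f.ppf(1 - alpha, dfn=4, dfd=20)  # F-table value (upper tail)
chi_tab = stats.chi2.ppf(1 - alpha, df=3)        # chi-square table value

print(z_tab, t_tab, f_tab, chi_tab)
# Decision rule: reject H0 whenever the calculated statistic exceeds the
# corresponding table value, e.g. abs(z_cal) > z_tab.
```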
One & two tailed tests : Whether the statistical test is two-sided or one-sided depends on the alternative hypothesis.
In a two-tailed test the level of significance is divided by 2 (α/2 in each tail).
One-tailed tests are further divided into right-tailed tests and left-tailed tests.
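A tiny sketch (assuming α = 0.05, purely for illustration) of how the critical values differ between the two-tailed case and the two one-tailed cases:

```python
# Sketch: two-tailed vs one-tailed z critical values at an assumed alpha = 0.05.
from scipy import stats

alpha = 0.05
two_tailed   = stats.norm.ppf(1 - alpha / 2)   # +/- 1.96  (alpha split into two tails)
right_tailed = stats.norm.ppf(1 - alpha)       # +1.645    (all of alpha in the right tail)
left_tailed  = stats.norm.ppf(alpha)           # -1.645    (all of alpha in the left tail)
print(two_tailed, right_tailed, left_tailed)
```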
In studying the magnitude (size) of variation in a population, the entire sampling theory is mainly classified into two types.
Small sample test : If the sample size is less than 30, the sample is called a small sample. To test small samples we use the t, F, χ² distributions {exact sampling distributions}.
Large sample test : To study the general magnitude of variation or a characteristic of a population, a sample of size n ≥ 30 is drawn from the population. Such a sample is called a large sample, and a statistic based on a large sample is known as a large sample test.
i.e., if the sample size is greater than or equal to 30, the sample is called a "large sample".
Test procedure for testing a statistical hypothesis [for large samples] :
Testing a statistical hypothesis consists of the following step-by-step procedure...
Step 1 : Set up the null hypothesis (H0) for the given data.
Step 2 : Set up the alternative hypothesis (H1) for the given data, which helps us decide whether the test is one-tailed or two-tailed.
Step 3 : Choose an appropriate level of significance, e.g., 1%, 5%, or 10%.
Step 4 : Compute the test statistic z = (t − E(t)) / S.E.(t) ~ N(0,1).
Step 5 : Conclusion: if |z| > 3, reject H0; otherwise (|z| ≤ 3) compare the calculated value with the z-table value.
--- if |Z| cal < |z|tab then accept h0.
--- if |z| cal >= |z|tab then reject h0.
To test large samples we use the normal distribution, because almost every distribution is approximately normal provided the sample size is large enough. The normal distribution therefore acts as the parent distribution, and it is the basis of the large sample tests.
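The following worked sketch puts the five steps above together for a single mean; every number in it (n = 50, x̄ = 11.2, μ0 = 10, σ = 4, α = 0.05) is invented for illustration, not data from the article.

```python
# Worked sketch of the five-step large-sample z-test for a single mean.
# All numbers are illustrative assumptions.
import numpy as np
from scipy import stats

# Step 1 & 2: H0: mu = 10  versus  H1: mu != 10  (two-tailed)
mu0, sigma = 10.0, 4.0
n, x_bar = 50, 11.2

# Step 3: level of significance
alpha = 0.05

# Step 4: test statistic z = (t - E(t)) / S.E.(t), with t = sample mean
z_cal = (x_bar - mu0) / (sigma / np.sqrt(n))

# Step 5: conclusion - compare |z_cal| with the z-table value
z_tab = stats.norm.ppf(1 - alpha / 2)
if abs(z_cal) >= z_tab:
    print(f"|z| = {abs(z_cal):.2f} >= {z_tab:.2f}: reject H0")
else:
    print(f"|z| = {abs(z_cal):.2f} < {z_tab:.2f}: accept H0")
```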
for any statistic 't' the normal test is
z = (t-E(t))/s.E(t)
We have the following large sample tests:
1. Test for single mean : here t = x̄ (sample mean), so z = (x̄ − E(x̄)) / S.E.(x̄) = (x̄ − μ) / (σ/√n) ~ N(0,1)
2.Test for difference of means
here t = x̄1 − x̄2
z = ((x̄1 − x̄2) − E(x̄1 − x̄2)) / S.E.(x̄1 − x̄2) ~ N(0,1)
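A hedged sketch of this test: under H0 : μ1 = μ2 the expectation of x̄1 − x̄2 is 0, and the standard error takes the usual large-sample form √(σ1²/n1 + σ2²/n2). All the summary figures below are made up for illustration.

```python
# Sketch: large-sample test for the difference of two means (invented figures).
import numpy as np
from scipy import stats

x1_bar, sigma1, n1 = 67.5, 2.5, 400
x2_bar, sigma2, n2 = 68.0, 2.5, 400
alpha = 0.05

se = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # S.E.(x1_bar - x2_bar)
z_cal = (x1_bar - x2_bar) / se                  # E(x1_bar - x2_bar) = 0 under H0
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```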
3. Test for single standard deviation
now t = sample standard deviation = s
z = (s-E(s))/S.E(s)
we know that for large values of n the sample standard deviation s
follows a normal distribution with mean σ and variance σ²/(2n),
therefore, after simplification, the final formula is
z = (s − σ) / √(σ²/(2n)) ~ N(0,1)
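A small sketch of this test using the approximation above; the values (s = 5.4, σ0 = 5, n = 100, α = 0.05) are assumed for illustration.

```python
# Sketch: large-sample test for a single standard deviation, s ~ N(sigma, sigma^2/(2n)).
import numpy as np
from scipy import stats

s, sigma0, n, alpha = 5.4, 5.0, 100, 0.05       # illustrative assumptions
z_cal = (s - sigma0) / np.sqrt(sigma0**2 / (2 * n))
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```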
4. Test for difference of standard deviations
here t = difference of sample standard deviations = s1 − s2
z = ((s1 − s2) − E(s1 − s2)) / S.E.(s1 − s2)
we know that for large sample sizes the sample standard deviations s1 and s2
follow normal distributions with means σ1, σ2 and variances σ1²/(2n1), σ2²/(2n2) respectively, so
z = ((s1 − s2) − (σ1 − σ2)) / √((σ1²/2n1) + (σ2²/2n2)) ~ N(0,1)
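A parallel sketch with invented figures; as is common in practice (an assumption on my part, since the article does not say so), s1 and s2 are substituted for the unknown σ1 and σ2 in the standard error.

```python
# Sketch: large-sample test for the difference of two standard deviations.
import numpy as np
from scipy import stats

s1, n1 = 6.1, 200
s2, n2 = 5.8, 250
alpha = 0.05

se = np.sqrt(s1**2 / (2 * n1) + s2**2 / (2 * n2))   # sigma1, sigma2 estimated by s1, s2
z_cal = (s1 - s2) / se                              # (sigma1 - sigma2) = 0 under H0
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```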
5.Test for single proportion.
here t = sample proportion = P
z = ((p-E(p))/S.E(p)) ~ N(0,1)
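A minimal sketch of the single-proportion test; under H0 : P = P0 the standard error is √(P0(1 − P0)/n). The counts below are invented.

```python
# Sketch: large-sample z-test for a single proportion (invented counts).
import numpy as np
from scipy import stats

x, n = 120, 400            # observed successes out of n trials
p0 = 0.25                  # hypothesised population proportion
alpha = 0.05

p_hat = x / n
z_cal = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```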
6.Test for difference of proportion :
here t = difference of sample proportion = p1-p2
z = ((p1 − p2) − E(p1 − p2)) / S.E.(p1 − p2) ~ N(0,1)
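A sketch with made-up counts; under H0 : P1 = P2 a pooled proportion is commonly used in the standard error (this pooling is my assumption, since the article does not expand the formula).

```python
# Sketch: two-proportion z-test with a pooled proportion under H0: P1 = P2.
import numpy as np
from scipy import stats

x1, n1 = 45, 150           # invented counts
x2, n2 = 40, 200
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                       # pooled estimate under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_cal = (p1 - p2) / se
z_tab = stats.norm.ppf(1 - alpha / 2)
print("reject H0" if abs(z_cal) > z_tab else "accept H0", round(z_cal, 3))
```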
Based on the problem the main formula of z is modified.
SMALL SAMPLE TESTS
To test a general characteristic or hypothesis about a population parameter by selecting a small sample (n < 30), a test statistic is required, known as a 'small sample test'.
The exact sampling distributions (t, F, χ² distributions) are used to conduct small sample tests.
The various small sample tests are:
t - test:
F - test: F - test for population variance.
χ2 - test :
Assumptions of t - test OR Assumptions for conducting student t-test:
t - test for mean is applied under the following assumptions: (i) the parent population is normal, (ii) the sample observations are random and independent, (iii) the population standard deviation σ is unknown, and (iv) the sample size is small (n < 30).
t - test for mean OR t - test for single mean:
t -test for difference of two means(independent samples) OR t -test for equality of the population means:
t -test for difference of means (dependent samples) OR paired t-test for difference of means :
t - test for correlation coefficient :
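A compact sketch (with invented data) of how the four t-tests named above can be run in Python with scipy; the hypothesised mean of 10 and the 5% level are illustrative assumptions.

```python
# Sketch: the t-tests above via scipy (invented data, alpha = 0.05 assumed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(10, 2, size=12)
b = rng.normal(11, 2, size=12)

t1, p1 = stats.ttest_1samp(a, popmean=10)    # t-test for single mean
t2, p2 = stats.ttest_ind(a, b)               # difference of means, independent samples
t3, p3 = stats.ttest_rel(a, b)               # difference of means, paired samples
r,  p4 = stats.pearsonr(a, b)                # correlation coefficient (t-based p-value)

for name, p in [("one-sample", p1), ("independent", p2),
                ("paired", p3), ("correlation", p4)]:
    print(name, "reject H0" if p < 0.05 else "accept H0")
```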
Assumption to conduct F - test :
F - test for equality of variance :
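A hedged sketch of the F-test for equality of two population variances (data invented): F is taken as the larger sample variance over the smaller one and compared with the F-table value at the corresponding degrees of freedom.

```python
# Sketch: F-test for equality of variances (invented data, alpha = 0.05 assumed).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(0, 2.0, size=15)
y = rng.normal(0, 2.5, size=12)

s2x, s2y = np.var(x, ddof=1), np.var(y, ddof=1)      # unbiased sample variances
if s2x >= s2y:
    f_cal, dfn, dfd = s2x / s2y, len(x) - 1, len(y) - 1
else:
    f_cal, dfn, dfd = s2y / s2x, len(y) - 1, len(x) - 1

f_tab = stats.f.ppf(1 - 0.05, dfn, dfd)
print("reject H0" if f_cal > f_tab else "accept H0", round(f_cal, 3))
```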
Conditions for validity of χ2 (chi - square) test for goodness of fit :
For the validity of chi-square test of goodness of fit and test the independence of attributes the following conditions should be satisfied.
If any theoretical (expected) cell frequency is less than 5, it is pooled with the preceding or succeeding frequency before applying the chi-square test.
Chi - square test for goodness of fit:
chi - square test for independence of attributes:
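A brief sketch of both chi-square tests with made-up frequencies (note the validity condition above: expected cell frequencies below 5 should first be pooled with a neighbouring class).

```python
# Sketch: chi-square goodness of fit and test for independence (invented frequencies).
from scipy import stats

# Goodness of fit: observed die frequencies vs a fair-die expectation.
observed = [18, 22, 19, 25, 17, 19]
expected = [20] * 6                       # totals of observed and expected must match
chi_cal, p = stats.chisquare(f_obs=observed, f_exp=expected)
print("goodness of fit:", "reject H0" if p < 0.05 else "accept H0")

# Independence of attributes: a 2 x 3 contingency table.
table = [[30, 20, 10],
         [20, 25, 15]]
chi_cal, p, dof, exp = stats.chi2_contingency(table)
print("independence:", "reject H0" if p < 0.05 else "accept H0")
```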
Degrees of freedom :
The number of independent observations is known as the degrees of freedom, or equivalently,
degrees of freedom = sample size − number of estimated parameters.
EX : n = 7, so n − 1 = 7 − 1 = 6 independent observations (since one quantity, the sample mean, is estimated from the data).
Let's look at how to calculate critical values for two-tailed and one-tailed tests...
Now let's get into the critical values of small sample tests.