Is the Correlation Coefficient of My Data Real?!


What makes you confident in the correlation percentage you computed for your data?!

For example, suppose your code reports a 95% correlation between two variables. What makes you sure that this value was not driven by an outlier?


Let's start with a fundamental question: is it possible that the linear relationship I see in my data is due to random chance? How can we be 95% confident that the correlation between these two variables is significant and not a coincidence?

To answer this, we use a hypothesis test on the parameter we are estimating, the population correlation coefficient ρ:

H0: ρ = 0 (no linear relationship)

H1: ρ ≠ 0 (a linear relationship is present)


Our null hypothesis H0 is that there is no relationship between the two variables. More technically, it states that the population correlation coefficient is 0.

The alternative hypothesis H1 is that there is a relationship, which can be either a positive or a negative correlation.
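The post does not write out the statistic that is actually tested. The standard choice for H0: ρ = 0 assumes the two variables are jointly normally distributed. It converts the sample correlation r, computed from n pairs, into a t-value:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}},
\qquad t \sim t_{n-2} \quad \text{when } H_0\colon \rho = 0 \text{ holds}
```

A large |t| means r is far from 0 relative to what the sample size could produce by chance, and counts as evidence against H0.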

We previously calculated the correlation coefficient for a dataset and obtained r = 0.957586.

We use a simple dataset for this explanation.
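The post's dataset appears only in an image, so below is a minimal Python sketch of how Pearson's r is computed. The function name `pearson_r` and all data points are hypothetical; they are not the author's code or data. The usage also illustrates the outlier worry raised at the start: five points with no linear pattern become strongly correlated once a single extreme point is added.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient of paired data (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each observation from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))  # how x and y vary together
    sxx = sum(a * a for a in dx)              # spread of x
    syy = sum(b * b for b in dy)              # spread of y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with no linear pattern (r is 0 up to rounding) ...
x = [1, 2, 3, 4, 5]
y = [2, 1, 2, 1, 2]
print(f"without outlier: r = {pearson_r(x, y):.3f}")

# ... until one extreme point is appended to both lists
print(f"with outlier:    r = {pearson_r(x + [20], y + [20]):.3f}")
```

A single point far from the rest dominates both the numerator and the denominator. This is why a high r is worth checking against a scatter plot before trusting it.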


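We need to evaluate whether this result could be due to random luck. Let's carry out the hypothesis test:

  • Test at 95% confidence. First, we assume the data are normally distributed, so that under H0 the test statistic follows a t-distribution with n − 2 degrees of freedom.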


  • Find the critical values: the x-values that mark where the two tails begin, each tail holding 2.5% of the area.
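The bullets above rely on a test statistic computed from r and the number of pairs n. Below is a minimal sketch, assuming the standard test for H0: ρ = 0; the function name `correlation_t_statistic` is hypothetical. The sample size n = 10 in the usage lines is an assumption, since the dataset is only shown as an image.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for testing H0: rho = 0.

    Under H0, and assuming normally distributed data, this follows a
    t-distribution with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


r = 0.957586  # correlation reported in the post
n = 10        # ASSUMED sample size (the post's dataset is only an image)
t = correlation_t_statistic(r, n)
print(f"t = {t:.4f}, df = {n - 2}")  # t = 9.3996, df = 8
```

With this assumed n, the result agrees with the post's test value of about 9.39956.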


If our test value falls outside the range (−2.262, 2.262), we reject the null hypothesis.


The test value here is approximately 9.39956, which is well outside the range (−2.262, 2.262). We can therefore reject the null hypothesis and conclude that the correlation is real, meaning statistically significant. The p-value is also remarkably small, 0.000005976, far below our 0.05 threshold.
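Here is a minimal, standard-library-only sketch for reproducing a critical value and a p-value without a statistics library. All function names are hypothetical. It integrates the Student t density with Simpson's rule to get tail areas and finds the critical value by bisection. In practice a library routine such as SciPy's `scipy.stats.t` would normally be used instead.

The usage plugs in the post's test value with 9 degrees of freedom, because 2.262 is the two-tailed 5% critical value for 9 degrees of freedom. For your own data, use n − 2.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x) for x >= 0: 0.5 minus Simpson's rule over [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def critical_value(alpha: float, df: int) -> float:
    """Two-tailed critical value: the x with P(T > x) = alpha / 2, found by bisection."""
    lo, hi = 0.0, 100.0  # assumes the answer lies below 100 (true for alpha = 0.05)
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha / 2:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


def two_tailed_p(t_stat: float, df: int) -> float:
    """Probability of a |T| at least as large as |t_stat| when H0 is true."""
    return 2 * t_upper_tail(abs(t_stat), df)


t_stat, df, alpha = 9.39956, 9, 0.05  # df = 9 chosen to match the post's ±2.262
crit = critical_value(alpha, df)
p = two_tailed_p(t_stat, df)
print(f"critical values: ±{crit:.3f}")
print(f"p-value: {p:.3g}")
print("reject H0" if abs(t_stat) > crit else "fail to reject H0")
```

The critical-value check and the p-value check always agree: |t| exceeds the critical value exactly when p < alpha.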

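Hint: why I used the t-distribution. The test statistic follows a t-distribution rather than the normal distribution. The t-distribution has heavier tails, which matters most for small samples; a common rule of thumb is below about 30 observations. As the sample grows, the t-distribution approaches the normal distribution.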
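The hint can be checked numerically with a minimal, standard-library-only sketch; the helper names are hypothetical. It measures how often a t-distributed statistic exceeds the normal cut-off of about 1.96 in absolute value. For small df the probability is well above 5%, so using the normal cut-off with a small sample would reject H0 too often. As df grows, the probability approaches 5%.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x) for x >= 0: 0.5 minus Simpson's rule over [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


z_crit = NormalDist().inv_cdf(0.975)  # normal two-tailed 5% cut-off, about 1.96

for df in (3, 8, 30, 100, 1000):
    # Chance that |T| exceeds the normal cut-off when H0 is true
    p = 2 * t_upper_tail(z_crit, df)
    print(f"df = {df:>4}: P(|T| > {z_crit:.2f}) = {p:.4f}")
```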


