Is the correlation coefficient of my data real?
Belal Aboelkher
Machine Learning Nerd | Computer Vision Staff Engineer | ROS | Robotics
What makes you confident in the correlation percentage you computed from your data?
For example, suppose your code reports a 95% correlation between two variables. What makes you sure that figure was not affected by an outlier?
Let's start with a fundamental question: is it possible that I see a linear relationship in my data purely by random chance? How can we be 95% sure that the correlation between these two variables is significant and not coincidental?
To answer that we turn to hypothesis testing, where the parameter we are estimating is the population correlation coefficient ρ:
H0 : ρ = 0 (implies no relationship)
H1 : ρ ≠ 0 (relationship is present)
Our null hypothesis H0 is that there is no relationship between the two variables, or more technically, that the correlation coefficient is 0.
Our alternative hypothesis H1 is that there is a relationship, and it can be a positive or a negative correlation.
We already calculated the correlation coefficient for our dataset and got 0.957586.
A simple dataset used for the explanation
We need to evaluate whether this value came about by random luck. Let's test our hypothesis.
If our test value falls outside the critical range, which for the t-distribution at 95% confidence is (-2.262, 2.262) here, we reject our null hypothesis.
The test value here is approximately 9.39956, which is definitely outside the range of (-2.262, 2.262), so we can reject the null hypothesis and say our correlation is real. The p-value confirms it: at 0.000005976 it is well below our 0.05 threshold.
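As a rough sketch of where a test value like this comes from, the snippet below uses the standard t-statistic for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)). The sample size n = 10 is my assumption (the article does not state it); it is chosen because it reproduces a test value of about 9.4. The function names are only illustrative.

```python
import math

def t_statistic(r, n):
    """t-value for testing H0: rho = 0, given sample correlation r and sample size n."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def reject_null(t_value, critical_value):
    """Two-tailed decision: reject H0 when t falls outside (-critical, +critical)."""
    return abs(t_value) > critical_value

r = 0.957586       # correlation coefficient quoted in the article
n = 10             # ASSUMPTION: sample size, not stated in the article
critical = 2.262   # critical value quoted in the article (95% confidence)

t_value = t_statistic(r, n)
print(f"t = {t_value:.3f}")
print("reject H0" if reject_null(t_value, critical) else "fail to reject H0")
```

With these inputs the t-value lands far outside the critical range, so the null hypothesis is rejected, in line with the conclusion above.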
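Hint: why did I use the t-distribution? The difference between the t-distribution and the normal distribution is that the t-distribution is designed for small samples (fewer than 31 observations). It has heavier tails, so its critical range is wider than the normal distribution's.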
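To illustrate the hint, here is a minimal dependency-free sketch that compares 95% two-tailed critical values of the t-distribution with the normal one. The helpers `t_pdf`, `t_cdf` and `t_critical` are hypothetical names written for this example; they use plain numerical integration and bisection, whereas in practice a statistics library such as SciPy would normally be used instead. The degrees-of-freedom values in the loop are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """CDF via Simpson's rule on [0, x], using symmetry around 0."""
    if x < 0:
        return 1 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, confidence=0.95):
    """Two-tailed critical value, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

normal_critical = NormalDist().inv_cdf(0.975)  # about 1.96
for df in (5, 9, 30, 1000):
    print(f"df = {df:4d}: t critical = {t_critical(df):.3f}")
print(f"normal:    critical = {normal_critical:.3f}")
```

The t critical value for 9 degrees of freedom comes out at about 2.262, the range used above, and the values shrink toward the normal distribution's 1.96 as the degrees of freedom grow.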