First, correlation does not imply causation. An observed association between two variables may instead reflect a confounding variable, reverse causality (the outcome driving the supposed cause), or a spurious correlation that arises by chance. For example, ice cream sales and shark attacks are positively correlated, but that does not mean ice cream causes shark attacks. Rather, both are driven by a third variable, temperature: hot weather increases both ice cream purchases and the number of people swimming in the sea. To establish causation, you need a randomized experiment or a causal inference method that accounts for confounders.
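A small simulation can make the ice cream example concrete. The sketch below uses made-up data in which temperature (with arbitrary, hypothetical coefficients) drives both series while neither affects the other. It compares the raw correlation with a partial correlation, computed by removing the linear effect of temperature from both variables and correlating what is left. Partial correlation is only one simple way to adjust for a known confounder, and it assumes the confounder is measured and acts linearly.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    # Fit a straight line of x on z and of y on z, then keep the residuals
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical simulated data: temperature is the only common cause
temperature = rng.normal(25, 5, n)                       # daily temperature (°C)
ice_cream = 2.0 * temperature + rng.normal(0, 5, n)      # sales rise with heat
shark_attacks = 0.3 * temperature + rng.normal(0, 1, n)  # more swimmers when hot

raw_r = np.corrcoef(ice_cream, shark_attacks)[0, 1]
adj_r = partial_corr(ice_cream, shark_attacks, temperature)

print(f"raw r = {raw_r:.2f}")
print(f"r controlling for temperature = {adj_r:.2f}")
```

With this setup the raw correlation should come out strongly positive (roughly 0.7), while the correlation controlling for temperature should be close to zero, because the only link between the two series runs through temperature.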
Second, the standard (Pearson) correlation coefficient is sensitive to outliers and non-normality. Outliers are extreme values that deviate sharply from the rest of the data, and non-normality means that the data do not follow a bell-shaped distribution. Both can distort the correlation coefficient and make significance tests based on it unreliable. For example, a few very wealthy people in a sample can inflate the correlation between income and happiness, even if there is little relationship among everyone else. To deal with outliers and non-normality, you can use robust alternatives such as rank-based coefficients (Spearman's rho or Kendall's tau), trim extreme values, or transform skewed variables (for example, by taking the logarithm of income).
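The sketch below illustrates how a handful of extreme points can manufacture a Pearson correlation. It simulates income and happiness as independent (the distributions, sample size, and number of outliers are arbitrary assumptions), then appends 20 hypothetical very wealthy, very happy respondents. Spearman's rho is computed here as Pearson's r on ranks, which is valid only because the simulated values contain no ties; a library routine such as SciPy's `spearmanr` also handles ties.

```python
import numpy as np

def pearson(x, y):
    """Pearson's r between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

def spearman(x, y):
    """Spearman's rho as Pearson's r on ranks (assumes no tied values)."""
    # argsort of argsort turns values into 0-based ranks
    return pearson(np.argsort(np.argsort(x)), np.argsort(np.argsort(y)))

rng = np.random.default_rng(1)

# Hypothetical data: income (thousands) and happiness generated independently
income = rng.normal(50, 10, 2000)
happiness = rng.normal(5, 1, 2000)
r_clean = pearson(income, happiness)

# Append 20 hypothetical extreme respondents (1% of the sample)
income_all = np.concatenate([income, rng.normal(1000, 50, 20)])
happiness_all = np.concatenate([happiness, rng.normal(10, 0.5, 20)])
r_pearson = pearson(income_all, happiness_all)
r_spearman = spearman(income_all, happiness_all)

print(f"Pearson without outliers: {r_clean:.2f}")
print(f"Pearson with outliers:    {r_pearson:.2f}")
print(f"Spearman with outliers:   {r_spearman:.2f}")
```

Adding just 1% extreme cases should move Pearson's r from near zero to roughly 0.4, while Spearman's rho stays close to zero: on the rank scale each outlier is merely the next-highest value, however far it lies from the rest.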
Third, correlation is not a comprehensive measure of a relationship. Pearson's r captures only linear association, and rank-based coefficients such as Spearman's rho capture only monotonic association; neither describes a relationship that changes direction. For example, if you plot happiness against age, you may find a U-shaped curve, where happiness declines into midlife and rises again in later life. Yet the correlation coefficient may be close to zero, because the downward and upward segments cancel out. To capture nonlinear or more complex relationships, you can use methods such as polynomial or nonlinear regression, or other forms of curve fitting.
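To see how a U-shape can hide from the correlation coefficient, the sketch below simulates a hypothetical quadratic relationship between age and happiness (the curve, age range, and noise level are invented for illustration). It compares Pearson's r with the R² of a second-degree polynomial fitted with `numpy.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical U-shape: happiness bottoms out around age 45 and recovers later
age = rng.uniform(20, 70, 5000)
happiness = 0.01 * (age - 45) ** 2 + rng.normal(0, 0.5, age.size)

# Linear correlation between age and happiness
r = np.corrcoef(age, happiness)[0, 1]

# Quadratic fit and its coefficient of determination (R^2)
coeffs = np.polyfit(age, happiness, 2)
fitted = np.polyval(coeffs, age)
ss_res = np.sum((happiness - fitted) ** 2)
ss_tot = np.sum((happiness - happiness.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"Pearson r = {r:.2f}, quadratic R^2 = {r_squared:.2f}")
```

Because the simulated ages are spread symmetrically around the bottom of the curve, r should be near zero even though the quadratic fit explains most of the variance (R² of roughly 0.9). A rank-based coefficient would not help here either, since the relationship is not monotonic.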