Statistics with Excel, the Correlation

Donata Petrelli

Financial Data Scientist | Quantitative Analyst | Chief Financial Officer | Book Author

Published: March 22, 2020

Whether a particular agent can influence the spread of a virus, or whether a macro-political event can affect the trend of a financial asset, are questions that cannot be answered immediately or predictably a priori, yet whose solution makes a difference for the common interest.

In general, the analysis of data that describe a phenomenon and their realistic interpretation are the basis of scientific research that has as its goal the understanding of the phenomenon itself and its most likely future development. The scientific approach generally tends to find what and how many may be the causes that lead to the manifestation of the event. However, the cause-effect relationship is not always evident and, therefore, it is not so immediate to find an optimal solution to a contingent problem.

This article presents a mathematical method for identifying whether two quantities, about which no association is known a priori, are related in such a way that a variation in one accompanies a variation in the other and, if so, to what degree. That measure is the correlation. We will use a tool that is both powerful and simple: Excel.

The correlation … what's it for

It often happens that describing the behavior of a variable, perhaps dependent on others, is not sufficient for the analysis of a much more complex phenomenon, in which there are other parameters whose mutual relationship with the studied variable is not yet known.

In this case we have to verify whether there is a relationship between the characteristics of the phenomenon (the variables) and, if so, how strong it is. To do this we study the correlation between the variables.

In correlation analysis, the variables are denoted X1 and X2 rather than X and Y, as they would be in the familiar cause-effect relationship, precisely to highlight the absence of any notion of functional dependence.

We don't yet know if and what kind of relationship exists between them. We use the correlation to assess if there is a linear relationship between them.

If the analysis is negative, i.e. there is no linear relationship, this doesn't mean that there can be no other relationship. For example, there could be a polynomial relationship of a degree > 1. We should therefore use other, more sophisticated techniques to verify this.
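To illustrate this point, here is a minimal Python sketch with made-up data (not taken from the article). It computes the Pearson coefficient for an exact linear relationship and for an exact quadratic one over a range symmetric around zero. The helper name `pearson` and the sample values are assumptions chosen for illustration.

```python
import math

def pearson(xs, ys):
    """Population Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Illustrative data only: x runs from -5 to 5, symmetric around zero.
xs = list(range(-5, 6))
linear = [2 * x + 1 for x in xs]   # exact linear relationship
quadratic = [x * x for x in xs]    # exact, but non-linear, relationship

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

The quadratic series is completely determined by x, yet its coefficient is essentially zero, which is why a negative result from the linear test does not rule out other kinds of relationship.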

Chart Analysis with Excel

In general the chart is the first and most important analysis tool that allows us to make the first considerations about the trend of a phenomenon. Also in this case the first thing to do is to analyze the chart of distributions of the values of the two quantities and thus visually verify if there can be a relation between the two variables.

We normally use the Scatter Chart or Bubble Chart present in the Excel charts collection.

After obtaining the data of the two quantities, using the appropriate data mining techniques and after verifying the number of values to be represented, creating a scatter chart is very simple:

1. Select the data you want to plot in the scatter chart

2. Click the Insert tab, and then click Insert Scatter (X, Y) or Bubble Chart.

3. Click Scatter

Figure 1 - Create a Scatter chart

Let's take an example. We want to compare the pairs of price values of two different stocks to determine whether there is a correlation between the two stocks. For educational purposes only, we try to compare the closing prices of Intel and AMD. We create a first scatter by putting the Intel values on the X-axis and the AMD values on the Y-axis. Then a second one doing the opposite, i.e. putting the AMD values on the X-axis and the Intel values on the Y-axis.

In both cases we add a trendline to the scatter chart and display its equation together with the R^2 value, which measures how well the linear model fits the data.

Figure 2 – Scatter chart of INTC (X-axis) and AMD (Y-axis)

Figure 3 – Scatter chart of AMD (X-axis) and INTC (Y-axis)

The result is striking. Although the two equations are different, the value of R^2 is the same in both cases. This means that correlation makes no distinction between the quantity shown on the horizontal axis and the one shown on the vertical axis. Correlation studies the relationship between the quantities in an absolute sense, that is, without any a priori hypothesis of functional dependence between the variables.

Generally its use is appropriate when the scatter has an oval shape as in the examples shown in the two previous figures.
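The symmetry of R^2 can be checked numerically. The sketch below fits a least-squares line in both directions and compares the results. The price series are hypothetical values invented for illustration, not actual INTC or AMD closing prices, and `fit_line` is a helper name introduced here.

```python
def fit_line(xs, ys):
    """Least-squares slope, intercept and R^2 for ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared

# Hypothetical illustrative values, NOT real closing prices.
stock_a = [50.1, 51.3, 49.8, 52.4, 53.0, 54.2, 53.5, 55.1]
stock_b = [30.2, 31.0, 29.5, 32.8, 32.1, 34.0, 33.2, 35.5]

slope_ab, intercept_ab, r2_ab = fit_line(stock_a, stock_b)  # A on the X-axis
slope_ba, intercept_ba, r2_ba = fit_line(stock_b, stock_a)  # B on the X-axis

print(f"A on X: y = {slope_ab:.4f}x + {intercept_ab:.4f}, R^2 = {r2_ab:.4f}")
print(f"B on X: y = {slope_ba:.4f}x + {intercept_ba:.4f}, R^2 = {r2_ba:.4f}")
```

The two trendlines have different slopes and intercepts, but R^2 agrees in both directions because, for a simple linear fit, it equals the square of the Pearson coefficient, which is symmetric in the two variables.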

Correlation coefficient

The explanation lies in the very nature of correlation, which measures the intensity of an association and not the direction of dependence. Two variables are said to be correlated when one tends to vary, more or less strongly, as the other varies; nothing is said about the type of relationship.

Sometimes the variations of one variable derive from those of the other (e.g. the relationship between heritable somatic traits); sometimes the variations are common to both (the relationship between height and body weight); and sometimes they are mutually dependent (the relationship between price and demand: the price affects demand, and demand affects the price).

The correlation coefficient measures the intensity of these relationships. There are different types, corresponding to different calculation methods and, therefore, there are different formulas for this. We consider the Pearson correlation coefficient with the following formula:
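For reference, the standard definition of the Pearson coefficient can be written in population form, which matches the Excel functions STDEV.P and COVARIANCE.P discussed in the next section. The symbols are notation assumed here: σxy is the covariance of the two variables, σx and σy are their standard deviations, x̄ and ȳ are their means, and n is the number of observed pairs.

```latex
r_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y}
       = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
               \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The factors 1/n cancel, so the same value is obtained whether population or sample versions of the covariance and standard deviations are used, provided they are used consistently.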

The following can be deduced from the analysis of the formula:

rxy > 0 means that X and Y are directly correlated
rxy = 0 means that X and Y are not correlated
rxy < 0 means that X and Y are inversely correlated

The absolute value |rxy| indicates the degree of interdependence:

|rxy| < 0.3 weak correlation
0.3 <= |rxy| <= 0.7 moderate correlation
|rxy| > 0.7 strong correlation
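The sign and magnitude rules above can be sketched as a small Python helper. The function name `describe_correlation` and its labels are introduced here for illustration; the thresholds 0.3 and 0.7 are the ones given in the text and are applied to the absolute value of the coefficient.

```python
def describe_correlation(r):
    """Return (direction, strength) for a Pearson coefficient r in [-1, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")

    # The sign gives the direction of the relationship.
    if r > 0:
        direction = "direct"
    elif r < 0:
        direction = "inverse"
    else:
        direction = "none"

    # The absolute value gives the strength.
    magnitude = abs(r)
    if magnitude < 0.3:
        strength = "weak"
    elif magnitude <= 0.7:
        strength = "moderate"
    else:
        strength = "strong"

    return direction, strength

for value in (0.85, -0.5, 0.1):
    print(value, describe_correlation(value))
```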

Correlation with Excel

From Pearson's formula it can be seen that the correlation is based on the concept of covariance between two variables. Therefore you can manually construct this coefficient step-by-step using the Excel Statistical Functions:

STDEV.P(number1,[number2],...)

where:

Number1: (Required) The first number argument corresponding to a population.

Number2 ... : (Optional) Number arguments 2 to 254 corresponding to a population. You can also use a single array or a reference to an array instead of arguments separated by commas.

and

COVARIANCE.P(array1,array2)

where:

Array1: (Required) The first cell range

Array2: (Required) The second cell range

Otherwise you can directly use the CORREL function with the following syntax:

CORREL(array1, array2)

where

Array1: (Required) A range of cell values

Array2 : (Required) A second range of cell values
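In a worksheet, assuming the two series sit in the hypothetical ranges A2:A9 and B2:B9, the step-by-step route would be `=COVARIANCE.P(A2:A9,B2:B9)/(STDEV.P(A2:A9)*STDEV.P(B2:B9))` and the direct route `=CORREL(A2:A9,B2:B9)`. The Python sketch below mirrors both routes so that their agreement can be checked outside Excel. The function names `stdev_p`, `covariance_p` and `correl` are stand-ins written for this example, not Excel or library calls, and the data are invented.

```python
import math

def stdev_p(values):
    """Population standard deviation (the quantity STDEV.P computes)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

def covariance_p(xs, ys):
    """Population covariance (the quantity COVARIANCE.P computes)."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correl(xs, ys):
    """Pearson coefficient computed directly from the sums of deviations."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative values, not real market data.
series_1 = [50.1, 51.3, 49.8, 52.4, 53.0, 54.2, 53.5, 55.1]
series_2 = [30.2, 31.0, 29.5, 32.8, 32.1, 34.0, 33.2, 35.5]

r_step_by_step = covariance_p(series_1, series_2) / (
    stdev_p(series_1) * stdev_p(series_2)
)
r_direct = correl(series_1, series_2)

print(f"step by step: {r_step_by_step:.6f}")
print(f"direct:       {r_direct:.6f}")
```

Both routes give the same coefficient up to floating-point rounding, so the direct function is usually the more convenient choice once the construction is understood.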

Conclusion

In any field of application, medical, economic, financial, political, etc., correlation analysis is the first step in the analysis of dependence between variables. In this short article, we have seen how to do it via Excel, a powerful and, at the same time, popular tool.

From here we can then explore the relationship further through more specific methods of analysis such as regression (simple, linear, multiple). These are tools of mathematics and statistics, and they require the user to understand their meaning.

Today we also have innovative Artificial Intelligence techniques that allow us to delegate to the machine the task of finding meaningful relationships between data. But we will talk about this, if you want, in another article.

What we’ve seen

Data analysis
Correlation analysis
Data processing
Scatter charts
Correlation coefficient
Excel statistical functions

If you found this article interesting and useful, I would be happy if you shared it.