Measures of Dispersion: Range, Variance & Standard Deviation
Arjun Panwar
Building vision based solutions | Computer Vision Engineer | Python | System Design
Measures of dispersion describe how spread out the values in a data set are.
Why do we need Measures of Dispersion?
Measures of central tendency (mean, median and mode) are not sufficient to reveal the shape of a data set: two data sets can share the same mean and still be spread very differently. To quantify the variation among the values, we need measures of dispersion.
We consider three major measures of dispersion — Range, Variance & Standard Deviation.
Range
Range is the difference between the largest and the smallest observations in the data set, so it tells us how far apart the lower and upper limits are.
The range is very sensitive to outliers, because it depends only on the two most extreme values.
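As a minimal sketch of the point about outliers, the snippet below computes the range of the data set used later in this article, then appends one extreme value and computes it again. The value 5000 is a made-up outlier chosen only for illustration.

```python
import numpy as np

data = np.array([312, 464, 4, 32, 24, 43, 6])

# Range = largest observation - smallest observation
data_range = data.max() - data.min()

# Hypothetical outlier (5000 is an assumed value, not from the original data)
with_outlier = np.append(data, 5000)
outlier_range = with_outlier.max() - with_outlier.min()

print("range:", data_range)
print("range with one outlier:", outlier_range)
```

A single added point changes the range by an order of magnitude, even though the other seven values are untouched.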
Variance
Variance measures how far the values in a data set lie from their mean. It is computed by taking the deviation of each element from the mean, squaring it, and then averaging all the squared deviations.
Note: In the sample variance formula, the denominator is n-1 instead of n, where n is the number of observations in the sample. This use of ‘n-1’ is known as Bessel’s correction. It corrects the bias that arises when the population variance is estimated from a sample.
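For reference, the two variance formulas can be written as follows. The notation is the conventional one and is an assumption here, not taken from the original: N and μ are the population size and mean, n and x̄ are the sample size and mean.

```latex
% Population variance: divide by N
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

% Sample variance with Bessel's correction: divide by n - 1
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```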
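import numpy as np
data = [312, 464, 4, 32, 24, 43, 6]
np.var(data)  # population variance: NumPy divides by n by default (ddof=0); pass ddof=1 for the sample variance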
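To make the link between the definition and the NumPy call explicit, here is a small sketch that computes the variance by hand and compares it with np.var. By default np.var divides by n; its ddof argument ("delta degrees of freedom") switches the denominator to n - ddof, so ddof=1 gives the Bessel-corrected sample variance. The intermediate variable names are illustrative.

```python
import numpy as np

data = [312, 464, 4, 32, 24, 43, 6]
n = len(data)
mean = sum(data) / n

# Squared deviation of each element from the mean
squared_devs = [(x - mean) ** 2 for x in data]

population_var = sum(squared_devs) / n        # divide by n
sample_var = sum(squared_devs) / (n - 1)      # Bessel's correction: divide by n - 1

print("by hand:", population_var, sample_var)
print("numpy:  ", np.var(data), np.var(data, ddof=1))
```

The sample variance is always at least as large as the population variance computed on the same values, because it divides the same sum by a smaller number.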
Standard deviation
A standard deviation is a statistic that measures the dispersion of a data set relative to its mean. It tells us about the concentration of data around the mean of the data set.
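Unlike variance, standard deviation has the advantage of being in the same units as the original variable, because it is the square root of the variance.
import numpy as np
data = [312, 464, 4, 32, 24, 43, 6]
np.std(data)  # population standard deviation (ddof=0 by default)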
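As a quick check of the relationship between the two measures, the sketch below confirms that the standard deviation is the square root of the variance. Like np.var, np.std accepts ddof=1 when the data is a sample rather than the whole population.

```python
import numpy as np

data = [312, 464, 4, 32, 24, 43, 6]

population_std = np.std(data)          # sqrt of the population variance
sample_std = np.std(data, ddof=1)      # sqrt of the Bessel-corrected variance

print(population_std, np.sqrt(np.var(data)))
print(sample_std, np.sqrt(np.var(data, ddof=1)))
```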
Facts about Standard Deviation:
- If the standard deviation is small, the data has little spread (i.e., the majority of points fall very near the mean).
- If standard deviation = 0, there is no spread. This only happens when all data items are the same value.
- The standard deviation is significantly affected by outliers and skewed distributions.
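The facts above can be illustrated with a few tiny data sets. All values below are made up for the demonstration and do not come from the article's data.

```python
import numpy as np

# Hypothetical data sets, chosen only to illustrate each fact
tight = [49, 50, 51, 50]              # points close to the mean
spread = [10, 50, 90, 50]             # same mean, points far from it
constant = [5, 5, 5, 5]               # all items identical
tight_with_outlier = tight + [500]    # one extreme value added

std_tight = np.std(tight)
std_spread = np.std(spread)
std_constant = np.std(constant)
std_outlier = np.std(tight_with_outlier)

print("tight:", std_tight)
print("spread:", std_spread)
print("constant:", std_constant)
print("tight + outlier:", std_outlier)
```

The tight and spread sets share a mean of 50 but differ greatly in standard deviation. The constant set has a standard deviation of exactly 0. One outlier is enough to inflate the standard deviation of an otherwise tight set.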
Here is a question for you: in the following line plot, arrange the red, green & blue lines according to their standard deviation. Comment down your answers.