Beyond the coefficient of variation
Andrés Gutiérrez
ECLAC Regional Adviser on Social Statistics - Vice President of the International Association of Survey Statisticians (2023-2025) - Elected Member of the International Statistical Institute
To maintain reliable and accurate official statistical systems, quality criteria must be used to decide when an estimate from a household survey can be published or must be suppressed. Some statistical institutes have adopted policies aimed at warning users about the relevance and credibility of the estimates. In practice, these policies prohibit publishing estimates whose coefficient of variation (CV) exceeds a certain threshold (for example, 15% or 30%). The rationale is the reliability that must be ensured when publishing figures for decision-making in the public sector. Although the coefficient of variation does measure the precision of an indicator, it has some shortcomings. Consider the following:
Specifically, under a policy of not reporting estimates with a coefficient of variation (CV) greater than 15%, estimates of small magnitude (very close to zero) are automatically penalized: even if the variability of the figure is small, the CV will be large, because the CV divides the standard error by the estimate itself. For example, suppose a study aims to estimate, among other parameters, the proportion of children who drop out and do not return to school. After the survey is conducted, the estimated dropout proportion is P = 0.06 with a coefficient of variation of 25%. Under the 15% rule, this estimate would not be publishable.
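A minimal sketch in Python makes this penalty visible. The numbers are hypothetical: I assume simple random sampling and a sample size of n = 1000 purely for illustration (real household surveys use complex designs, so the actual standard errors would differ). The point is the trend: as the proportion approaches zero, its standard error shrinks, yet the CV grows without bound and eventually crosses any fixed threshold.

```python
import math

# Hypothetical illustration under simple random sampling:
# for a fixed sample size, the SE of a proportion shrinks as p -> 0,
# but the coefficient of variation (CV = SE / p) explodes.
n = 1000  # assumed sample size, for illustration only

print(f"{'p':>6} {'SE':>8} {'CV':>8}")
for p in [0.50, 0.20, 0.10, 0.06, 0.02, 0.01]:
    se = math.sqrt(p * (1 - p) / n)  # SE of a proportion under SRS
    cv = se / p                      # coefficient of variation
    print(f"{p:6.2f} {se:8.4f} {cv:8.2%}")
```

Notice that the SE itself falls as p shrinks; the CV crosses a 15% or 30% cutoff only because of the small denominator, not because the estimate is less precise in absolute terms.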
I firmly believe that the coefficient of variation should not be treated as the holy grail of quality measures when establishing publication criteria. It is not appropriate to adopt restrictive policies based on an indicator that does not generalize to all cases. What measure of variability should be adopted instead? We need to go back to the roots of inference in finite populations, where reliability, precision, and sample size are all tied to the confidence interval, which combines two important quality measures: the standard error (the square root of the estimator's variance) and the margin of error (the product of the standard error and the appropriate percentile of the estimator's distribution). With a confidence interval, one can judge whether an estimate is reliable and decide whether to publish it.
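In symbols, for an estimator \(\hat{\theta}\) with estimated variance \(\widehat{V}(\hat{\theta})\) and confidence level \(1-\alpha\), these two measures and the resulting interval take the standard finite-population form (a generic formulation, not tied to any particular institute's policy):

```latex
\operatorname{SE}(\hat{\theta}) = \sqrt{\widehat{V}(\hat{\theta})},
\qquad
\operatorname{MOE} = z_{1-\alpha/2} \cdot \operatorname{SE}(\hat{\theta}),
\qquad
\operatorname{CI}_{1-\alpha} = \left[\hat{\theta} - \operatorname{MOE},\ \hat{\theta} + \operatorname{MOE}\right]
```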
Continuing with the dropout example, a coefficient of variation of 25% for an estimated proportion P = 0.06 implies a standard error of 1.5% (0.25 * 0.06 = 0.015) and a margin of error close to 3% (0.015 * 1.96 = 0.029). The resulting confidence interval is [6% - 3%, 6% + 3%] = [3%, 9%]. These figures are not negligible and could be published by any entity producing official statistics. However, reviewing confidence intervals to judge the quality of a figure is not really workable as a general policy, since it would require evaluating every figure one by one and deciding only after a careful study of each interval's width.
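The arithmetic is easy to reproduce. A short sketch using the article's own numbers and the usual 95% normal critical value of 1.96:

```python
# Reproduce the worked example: P = 0.06 with CV = 25%.
p_hat = 0.06
cv = 0.25
z = 1.96  # 95% normal critical value

se = cv * p_hat   # standard error: 0.015
moe = z * se      # margin of error: ~0.029

print(f"SE  = {se:.3f}")                                    # 0.015
print(f"MOE = {moe:.3f}")                                   # 0.029
print(f"CI  = [{p_hat - moe:.3f}, {p_hat + moe:.3f}]")      # [0.031, 0.089]
```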
At ECLAC, we proposed a fairly comprehensive solution based on quality criteria that go beyond the coefficient of variation. A similar rationale was recently adopted by the Chilean National Statistical Institute. What is your opinion on suppressing estimates that do not meet certain quality criteria?