Beyond the coefficient of variation
https://unsplash.com/@polarmermaid

It is evident that, in order to have reliable and accurate official statistical systems, quality criteria must be used to decide when an estimate from a household survey can be published and when it should be suppressed. Some statistical institutes have adopted policies aimed at warning users about the relevance and credibility of their estimates. These practices avoid publishing estimates whose coefficient of variation (CV) exceeds a certain threshold (for example, 15% or 30%). The rationale is the reliability that must be guaranteed when publishing figures used for decision-making in the public sector. Although the coefficient of variation measures the precision of an indicator, it has some shortcomings. Below are some of them:

  1. Does a negative coefficient of variation make sense? Yes, it does. When the estimate is negative, the coefficient of variation is also negative, as happens, for example, when estimating differences, net changes, gross changes, or impacts. Is a negative coefficient of variation easily understood? No, it is not, which is why its absolute value is generally taken.
  2. Suppose the estimate of the parameter is exactly zero. In that case, regardless of how large or small the variance is, the coefficient of variation is undefined.
  3. Assume that the estimate of the parameter of interest is very close to zero. In that case, regardless of how large or small the variance is, the coefficient of variation will be very large and will not reflect the quality of the sampling strategy.
  4. Suppose you are estimating a proportion P. If the point estimate is very close to zero, the coefficient of variation tends to infinity. However, the CV of the complement (1 - P), which has exactly the same standard error, will be very small and will look reliable. This results in a paradox: the same phenomenon is being measured, but the CVs are contradictory.
  5. As a consequence, the CV of a proportion is not symmetric around P = 0.5, whereas the standard error and the variance are.
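The five shortcomings above can be reproduced numerically. The following is a minimal Python sketch, assuming the CV is computed as the standard error divided by the point estimate and, for point 5 only, assuming simple random sampling so that the standard error of a proportion is sqrt(P(1 - P)/n). The function names, the sample size n = 1000, and the standard errors used are hypothetical values chosen for illustration.

```python
import math


def coefficient_of_variation(estimate: float, se: float) -> float:
    """CV = se / estimate; returns NaN when the estimate is exactly zero (CV undefined)."""
    if estimate == 0:
        return math.nan
    return se / estimate


def srs_se_proportion(p: float, n: int) -> float:
    """SE of a proportion under simple random sampling (an illustrative assumption)."""
    return math.sqrt(p * (1 - p) / n)


# 1. A negative estimate, e.g. a net change, gives a negative CV.
print(coefficient_of_variation(-0.02, 0.01))  # -0.5

# 2. A zero estimate leaves the CV undefined.
print(coefficient_of_variation(0.0, 0.01))  # nan

# 3. An estimate near zero inflates the CV even when the SE is tiny.
print(coefficient_of_variation(0.0005, 0.001))  # 2.0, i.e. a CV of 200%

# 4. P and 1 - P share the same standard error but have very different CVs.
se = 0.015
print(coefficient_of_variation(0.06, se))            # 0.25
print(round(coefficient_of_variation(0.94, se), 3))  # 0.016

# 5. Under SRS the SE is symmetric around P = 0.5; the CV is not.
n = 1000
for p in (0.1, 0.9):
    s = srs_se_proportion(p, n)
    print(p, round(s, 4), round(coefficient_of_variation(p, s), 3))
```

The last loop prints the same standard error (about 0.0095) for P = 0.1 and P = 0.9, while the CV falls from roughly 9.5% to roughly 1%: the standard error treats a proportion and its complement symmetrically, and the CV does not.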

Specifically, if the policy of not reporting estimates with a coefficient of variation (CV) greater than 15% were followed, estimates of small magnitude (very close to zero) would be automatically penalized by this indicator: even if the variability of the figure is small (close to zero), the coefficient of variation would be large. For example, suppose a study aims to estimate, among other parameters, the proportion of children who drop out of school and do not return. After the survey is carried out, the estimated proportion of dropouts is P = 0.06 with a coefficient of variation of 25%. Under the 15% rule, this estimate would not be publishable.
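A blanket CV rule of this kind amounts to a one-line check. The sketch below is hypothetical: the function name `publishable_by_cv` and the decision to take the absolute value of the CV (following the common practice mentioned in point 1) are assumptions, not a specific institute's rule. It shows that the dropout estimate, with a CV of 25%, is suppressed under a 15% threshold and published under a 30% one.

```python
def publishable_by_cv(estimate: float, se: float, max_cv: float = 0.15) -> bool:
    """Hypothetical CV-only rule: publish only if |se / estimate| <= max_cv."""
    if estimate == 0:
        return False  # the CV is undefined, so a CV-only rule cannot approve the estimate
    return abs(se / estimate) <= max_cv


p_dropout = 0.06
se_dropout = 0.25 * p_dropout  # a CV of 25% implies an SE of 0.015

print(publishable_by_cv(p_dropout, se_dropout, max_cv=0.15))  # False
print(publishable_by_cv(p_dropout, se_dropout, max_cv=0.30))  # True
```

The verdict depends entirely on the chosen threshold and on the magnitude of the estimate, not on how wide the interval around it actually is.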

I firmly believe that the coefficient of variation should not be considered the holy grail of quality measures when establishing publication criteria. It would not be appropriate to adopt restrictive policies based on an indicator that cannot be applied consistently to all cases. What measure of variability should be adopted instead? It is necessary to go back to the roots of inference in finite populations, where reliability, precision, and sample size all depend on the confidence interval. The interval encompasses two important quality measures: the standard error (the square root of the estimator's variance) and the margin of error (the standard error multiplied by the appropriate quantile of the estimator's distribution). With the confidence interval, one can determine whether an estimate is reliable and decide whether to publish it.
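The quantities in the previous paragraph can be written compactly as follows. The notation is introduced here for illustration: θ̂ is the estimate, V̂(θ̂) its estimated variance, and z_{1-α/2} the relevant quantile of the estimator's distribution, assumed approximately normal so that z ≈ 1.96 for a 95% interval.

```latex
\mathrm{SE}(\hat{\theta}) = \sqrt{\widehat{V}(\hat{\theta})}, \qquad
\mathrm{CV}(\hat{\theta}) = \frac{\mathrm{SE}(\hat{\theta})}{\hat{\theta}}, \qquad
\mathrm{ME}(\hat{\theta}) = z_{1-\alpha/2}\,\mathrm{SE}(\hat{\theta})
  = z_{1-\alpha/2}\,\bigl|\mathrm{CV}(\hat{\theta})\,\hat{\theta}\bigr|,

\mathrm{CI}_{1-\alpha}(\theta) = \bigl[\hat{\theta} - \mathrm{ME}(\hat{\theta}),\; \hat{\theta} + \mathrm{ME}(\hat{\theta})\bigr]
```

Unlike the CV, the standard error and the margin of error do not have θ̂ in a denominator, so they remain defined and interpretable when the estimate is zero, negative, or close to zero.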

Continuing with the dropout example, a coefficient of variation of 25% for an estimated proportion of P = 0.06 implies a standard error of 1.5 percentage points (0.25 × 0.06 = 0.015) and a margin of error of about 3 percentage points (1.96 × 0.015 = 0.029). The 95% confidence interval for the proportion is therefore approximately [3%, 9%] = [6% - 3%, 6% + 3%]. These figures are not negligible, and the estimate could be published by any entity producing official statistics. However, reviewing confidence intervals to judge the quality of a figure is not really practical, since it would require evaluating every figure one by one and deciding only after carefully studying the width of each interval.
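The arithmetic of the dropout example can be reproduced with a small helper. This is a minimal sketch, assuming a normal approximation with z = 1.96 for a 95% interval; the function name `interval_from_cv` is hypothetical.

```python
def interval_from_cv(estimate: float, cv: float, z: float = 1.96):
    """Recover SE, margin of error and a normal-approximation CI from an estimate and its CV."""
    se = abs(cv * estimate)  # SE = CV x estimate (absolute value, in case the estimate is negative)
    moe = z * se             # margin of error = quantile x SE; z = 1.96 gives a 95% interval
    return se, moe, (estimate - moe, estimate + moe)


# Dropout example: P = 0.06 with a CV of 25%.
se, moe, (lower, upper) = interval_from_cv(0.06, 0.25)
print(f"SE = {se:.3f}, margin of error = {moe:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

This prints `SE = 0.015, margin of error = 0.029, 95% CI = [0.031, 0.089]`, which matches the [3%, 9%] interval up to rounding.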

At ECLAC, we wrote a fairly comprehensive solution based on quality criteria that go beyond the coefficient of variation. A similar rationale was recently adopted by the Chilean National Statistical Institute. What is your opinion about suppressing estimates that do not meet certain quality criteria?


More articles by Andrés Gutiérrez

  • Milan Kundera and Probability

    Today, Milan Kundera passed away. He was 94.

  • Lord's paradox in R

    In an article entitled "A Paradox in the Interpretation of Group Comparisons" published in Psychological Bulletin, Lord…

  • You smart? How about your children? The law of regression to the mean

    Francis Galton, British scientist and cousin of Charles Darwin, in the late 19th century cleverly coined the term…

  • How important is that variable?

    When building a model that includes explanatory variables related to the phenomenon of interest, one question arises:…
