Evaluators and avoiding the replication crisis
All statements, information and views in this article are personal and are not endorsed or authorised by my employer.
Previous articles in this series have looked at the replication crisis in detail: how it arose in psychology and began to be addressed, and the extent to which it may be an issue in social science and evaluation. This final article looks at some of the replications that have taken place and what can be learned from them.
Replications in evaluation
As examined in the previous article, conducting evaluation replications is not straightforward. Verification and reanalysis of data should be possible, although replication studies in other fields suggest that data are not always publicly available. Properly reproducing an evaluation is considerably more difficult, as evaluations are always focused on an intervention delivered in a specific way to a particular population within a particular context. Matching a replication to the original study is almost impossible, even if a sufficiently open-minded funder is willing to fund a replication in these circumstances.
Given these issues it is not surprising that there have been relatively few attempts to replicate evaluations, particularly where this means trying to reproduce studies. The most prominent effort has been the International Initiative for Impact Evaluation's (3ie) programme of over 100 “push button” replications. Before this, 3ie had funded more than 20 replications of impact evaluations; the most recently published of these largely focused on studies of cash transfers and HIV interventions in Africa, India and South America.
These results led to considerable debate within the international development sector. Original authors and replicators often had very different perspectives, linked to different understandings of what counted as a replication. Various international development experts weighed in with their views (for example, here, here, here and here). Discussion often focused on the extent to which replicators should approach the analysis differently from the original authors: if you change the analysis to include a different variable, is this because the original authors were necessarily wrong, or simply because researchers have different values and perspectives?
In addition to highlighting various difficulties in undertaking replications of evaluations, these studies have raised much-needed awareness of the importance of replications and how they can be developed. For example, Özler recommends that failure to replicate should be defined in advance and that a replication should go through a similar process to the original paper, including working with the originally publishing journal and its referees where possible.
What can be done...
“Across the medical and social sciences, new discussions about replication have led to transformations in research practice. Sociologists, however, have been largely absent from these discussions” (Freese and Peterson)
In one way, evaluators are in a relatively fortunate position. The lack of replications, particularly outside international development work, means the field has come under considerably less scrutiny than other social sciences, and we can learn from what has worked elsewhere. This should include (as noted in the second article in this series) developing processes and protocols to ensure that data are routinely available for verification or reanalysis, that analyses and papers are pre-registered, and that studies have sufficient statistical power. The 3ie experience gives us the opportunity to learn specifically from evaluation replications, particularly about how reproductions can be designed so that original authors and replicators work together as closely and collaboratively as possible.
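To make the "sufficient power" point concrete, below is a minimal sketch of an a priori power calculation. The effect size, significance level and target power are illustrative assumptions rather than recommendations, and the statsmodels package is used simply as one convenient option.

```python
# A minimal sketch of an a priori power calculation for a two-arm trial,
# assuming a hypothetical standardised minimum detectable effect of 0.3.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_arm = analysis.solve_power(
    effect_size=0.3,          # assumed minimum detectable effect (Cohen's d)
    alpha=0.05,               # conventional significance level
    power=0.8,                # conventional target power
    alternative="two-sided",
)
print(f"Participants needed per arm: {n_per_arm:.0f}")  # roughly 175
```

Carried out before data collection and recorded in a pre-registered protocol, a calculation like this makes the assumed minimum detectable effect explicit, which is exactly the kind of information a later replicator needs.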
An alternative view is that the small number of replications in evaluation means there is little evidence about the extent of any underlying problems, and that we have simply been lucky to avoid the scrutiny other fields have faced. However, we cannot close our eyes to what is happening in similar fields, or to the emerging evidence from evaluation replications, and conclude that no action is required. As psychology's efforts to improve its processes following public attention illustrate, sometimes sunlight is the best disinfectant.
The first and most important step is to raise awareness among evaluators, so that practitioners are thinking about the replication crisis, its potential causes and the possible solutions. The crisis can also be used to underline the importance of much of our standard work on evaluations: ensuring that they are well designed and accurately reported.
Raising awareness will doubtless highlight a range of viewpoints. There will likely be different perspectives on the extent to which replication is an issue in evaluation and, if so, on the mitigating actions that could be taken. This is not a discussion that we should avoid.
Potential solutions should be examined and implemented where possible. This process should involve a wide range of stakeholders, including clients and funders, agencies, and both public and private organisations. Given the challenges in undertaking reproductions or genuine extensions of evaluations, a good starting point would be to make sure that simpler “push button” replications can take place. Evaluation data should be more widely available, and funders and journals should ensure that data and protocols are available before publication or final payment.
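As an illustration of what a "push button" replication might involve in practice, here is a minimal sketch that re-runs a released analysis and compares the result to the published estimate. The file name, variable names, published value and tolerance are all hypothetical.

```python
# A sketch of a "push button" replication check: re-run the released analysis
# on the released data and compare against the published treatment effect.
import pandas as pd
import statsmodels.formula.api as smf

PUBLISHED_EFFECT = 0.12   # treatment effect reported in the original paper (hypothetical)
TOLERANCE = 0.005         # how close the reproduced estimate must be (hypothetical)

df = pd.read_csv("evaluation_data.csv")  # data released by the original authors
model = smf.ols("outcome ~ treatment + baseline_score", data=df).fit()
reproduced = model.params["treatment"]

print(f"Published: {PUBLISHED_EFFECT:.3f}, reproduced: {reproduced:.3f}")
if abs(reproduced - PUBLISHED_EFFECT) <= TOLERANCE:
    print("Push-button replication: reported estimate reproduced.")
else:
    print("Push-button replication: estimate differs - investigate further.")
```

Even a check this simple depends on the data and analysis specification being released, which is why the availability requirements described above matter.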
This represents a challenge to evaluators, but one that we should be confident in facing. Doing so will not just mean avoiding a “replication crisis” of our own; it will mean providing the best quality evaluations.