Instrumental variables: a smart way to support the exclusion restriction

There is no way the exclusion restriction can be definitively proved. However, in my last econometrics course, I was shown a clever placebo test that brings some evidence in its support. The aim of this article is to explain this little trick and provide an example of its use, since it might come in handy for those planning to use IV in their analysis.

The assumptions of the IV: a quick recap

In a nutshell, the Instrumental Variable methodology provides a way to deal with situations in which we suspect, or know, that our explanatory variable suffers from an endogeneity issue. To do this, we need an instrument, that is, another variable, which must satisfy three assumptions:

  1. Independence: it must be uncorrelated with the error term.
  2. Exclusion: it must have no direct effect on the outcome variable; any effect must run through the instrumented variable.
  3. Relevance: it must be significantly correlated with the instrumented variable.

Sometimes "exogeneity" is used to combine the first and the second assumption. The image which follows is a graphical representation of the causal paths and it will help me later to introduce the placebo test:

[Image: diagram of the causal paths in the IV setup, with the two excluded paths crossed out]

Intuitively, the two crossed-out arrows represent the first and the second assumption, respectively.
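To make the recap concrete, here is a minimal sketch of the setup with simulated data and made-up coefficients; it runs two-stage least squares "by hand" with statsmodels rather than using a dedicated IV routine.

```python
import numpy as np
import statsmodels.api as sm

# Simulated example: U is an unobserved confounder that makes T endogenous,
# Z is a valid instrument (relevant, independent of U, no direct effect on Y).
rng = np.random.default_rng(0)
n = 5_000
U = rng.normal(size=n)                      # unobserved confounder
Z = rng.normal(size=n)                      # instrument
T = 0.8 * Z + U + rng.normal(size=n)        # endogenous treatment (relevance: Z -> T)
Y = 2.0 * T + 1.5 * U + rng.normal(size=n)  # true causal effect of T on Y is 2

# Naive OLS of Y on T is biased because of U.
ols = sm.OLS(Y, sm.add_constant(T)).fit()

# Manual two-stage least squares:
first_stage = sm.OLS(T, sm.add_constant(Z)).fit()        # stage 1: T on Z
T_hat = first_stage.fittedvalues
second_stage = sm.OLS(Y, sm.add_constant(T_hat)).fit()   # stage 2: Y on fitted T

print("OLS estimate: ", ols.params[1])           # inflated by the confounder
print("2SLS estimate:", second_stage.params[1])  # close to the true effect of 2
# Note: the two stages done by hand give the right point estimate but not the
# right standard errors; a dedicated IV routine handles those correctly.
```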

Supporting a non-verifiable assumption

Coming to the actual purpose of this article: there is no way to prove conditions 1 and 2. In other words, no test can guarantee that these assumptions hold. Intuitively, condition 1 cannot be proved because the error term is, by definition, unobservable. For condition 2, the idea is that if we look at the correlation between Z and Y, we cannot tell whether it runs through T or not. What one can do is argue theoretically why exclusion should hold.

However, in these cases one can try to falsify the hypothesis: if the attempt fails, we gain some more convincing evidence in favour of the assumption. Intuitively, not being able to falsify your own hypothesis is a sign of its strength. At the risk of repeating myself, I want to stress that this is not a proof of validity but merely support for it, as is always the case in an empirical discipline.

Placebo tests belong to this kind of practice. In econometrics, the idea behind them is to estimate causal effects that are known, based on a priori knowledge, to be equal to zero: if the estimates actually turn out to be zero, the method gains some robustness; if they don't... bad news.

Back to the exclusion restriction

Narrowing the scope to our case, what we have to do is find a setting in which, assuming the exclusion restriction holds, the estimated effect of Z on Y must be zero, so that we can test it.


The "trick" consists in checking our dataset to see if there happens to be a portion of the population whose first stage effect, thus the impact of the instrument on the instrumented variable, is null or very small. Ideally, supposing a case where T is a binary treatment, we should identify a subsample for which the treatment status is always 0, as suggested by some prior information.

At this point, regressing the dependent variable (Y) on the instrument (Z) within this subsample, we should find no effect, and this would make our assumption more trustworthy. If this is not the case, and our prior knowledge is well-founded, then we have shown that Z has some effect on Y which is not due to the variation it induces in T: the exclusion restriction is in danger.
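As a rough sketch of what this check could look like, assuming the data sit in a pandas DataFrame with hypothetical columns y (outcome), z (instrument) and a 0/1 flag never_treated marking the subgroup that, according to prior knowledge, cannot receive the treatment:

```python
import statsmodels.api as sm

def placebo_reduced_form(df, outcome="y", instrument="z", flag="never_treated"):
    """Regress the outcome on the instrument in the subsample where the first
    stage is known to be absent; a coefficient close to zero supports the
    exclusion restriction, a significant one is bad news."""
    sub = df[df[flag].astype(bool)]
    X = sm.add_constant(sub[instrument])
    return sm.OLS(sub[outcome], X).fit()

# res = placebo_reduced_form(df)
# print(res.summary())   # inspect the coefficient and p-value on the instrument
```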

Two examples

The first example is taken from a paper by van Kippersluis and Rietveld (2018), who study the effect of prostate cancer (T) on well-being (Y). Given the likelihood of endogeneity issues, the IV methodology was adopted, using as an instrument a particular gene (Z) that is considered a determinant of the illness. The exclusion restriction requires the absence of a direct effect of Z on Y, but how do we run the "test"? The authors checked the effect of the gene on women: since women do not have a prostate to start with, the first stage is completely absent. Any effect of Z on Y among women cannot run through T, because there is no T at all!

The second example is fictitious, but it gives a better idea of how the method applies in a common scenario. Suppose we are studying the effect of a policy (T), say a welfare activation program, on a variable like mental health (Y). To work around the selection bias, we want to use the IV method: as an instrument, we choose the encouragement to participate (Z), which was randomly distributed across the population. Now suppose that a group of individuals, say those aged under 25, was prevented from participating in the program. Assuming that some of these individuals were encouraged anyway, we can test the effect of Z on Y on this restricted sample: since they had no T, any effect of Z on Y must be direct, violating the exclusion restriction.
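A small simulation of this fictitious scenario (all coefficients and variable names are made up) shows how the check would flag a problem: here the encouragement is given a small direct effect on mental health by construction, and the regression on the excluded under-25 group picks it up even though none of them was treated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 10_000
age = rng.integers(18, 65, size=n)
under_25 = age < 25                              # this group cannot join the program
Z = rng.integers(0, 2, size=n)                   # randomized encouragement
U = rng.normal(size=n)                           # unobserved confounder (selection)

# Participation: encouraged people join more often, but under-25s are excluded.
T = ((0.4 * Z + 0.3 * U + rng.normal(size=n)) > 0.5).astype(float)
T[under_25] = 0.0

# Outcome: the program helps (+1), but here Z also has a small *direct* effect
# (+0.3), which violates the exclusion restriction by construction.
Y = 1.0 * T + 0.3 * Z + 0.5 * U + rng.normal(size=n)

# Placebo reduced form on the excluded subsample: any Z coefficient here
# cannot run through T, because nobody in this group was treated.
placebo = sm.OLS(Y[under_25], sm.add_constant(Z[under_25])).fit()
print(placebo.params[1], placebo.pvalues[1])  # ~0.3 and significant: exclusion is in trouble
```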

Conclusion

The test suggested in this article is a way to support the exclusion restriction, and it can be easily implemented if one has the right data. Since something like this had never crossed my mind before it was explained to me, I wanted to spread the idea and provide the intuition to others, mainly my fellow students.

Bonus: a couple of technicalities

In the article, I mainly stressed that estimating a null effect can be seen as support for the exclusion restriction: but does a significant coefficient necessarily imply a violation of this condition? Well, no, but it is still bad news. What we have estimated is an effect that does not run through T: it can be either a direct effect or one channelled through some other unobserved variable. While the first case is a violation of the exclusion restriction, the second is a violation of independence. Even if, on the practical level, distinguishing the two scenarios is of little value, since the conclusion is in both cases that the instrument is invalid, the distinction can be of interest on the theoretical side. As far as my understanding goes, there is no way to disentangle the two effects.

Addressing another aspect: what if we get a non-null but still small effect? On the practical side, it makes sense to compare the size of this result with that of the actual reduced form; if we have a really strong instrument and a weak "placebo effect", then we should still be able to use the instrument.
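In code, this comparison amounts to putting the two reduced-form coefficients side by side (reusing the hypothetical column names from the sketch above):

```python
import statsmodels.api as sm

# Reduced form on the full sample (the effect we actually rely on)...
full = sm.OLS(df["y"], sm.add_constant(df["z"])).fit()
# ...versus the "placebo" reduced form on the zero-first-stage subgroup.
sub = df[df["never_treated"].astype(bool)]
placebo = sm.OLS(sub["y"], sm.add_constant(sub["z"])).fit()

ratio = abs(placebo.params["z"] / full.params["z"])
print(f"Placebo effect is {ratio:.1%} of the reduced-form effect")
```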

Lastly, I mention a point which instantly tempted my practically-oriented mind and might do the same with some of you. Say we get a non-null result: can't we just "scale" the IV estimate by this coefficient to account for the bias? We can, but it would impose a rather uncommon set of restrictions: the test rests on the claim that, thanks to some prior knowledge, the first stage is different for the subsample we restricted our analysis to. Blindly applying those reduced-form estimates to the whole sample seems at odds with that initial claim.
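Just to spell out what such a scaling would mean mechanically, under the strong assumption that the direct effect estimated in the subsample carries over to the whole population (hypothetical coefficient values, not a recommendation):

```python
# Hypothetical estimates: reduced form and first stage from the full sample,
# placebo coefficient from the zero-first-stage subsample.
reduced_form_coef = 0.50   # effect of Z on Y, full sample
first_stage_coef = 0.40    # effect of Z on T, full sample
placebo_coef = 0.10        # "direct" effect of Z on Y, subsample with no first stage

# Subtract the presumed direct effect before dividing by the first stage;
# this is only meaningful if that direct effect is the same in the full sample.
beta_adjusted = (reduced_form_coef - placebo_coef) / first_stage_coef
print(beta_adjusted)  # 1.0 in this made-up example
```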

__________________________________________________________________________

References

Hans van Kippersluis, Cornelius A. Rietveld, Beyond plausibly exogenous, The Econometrics Journal, Volume 21, Issue 3, 1 October 2018, Pages 316–331, https://doi.org/10.1111/ectj.12113

Hans van Kippersluis, Cornelius A. Rietveld, Pleiotropy-robust Mendelian randomization, International Journal of Epidemiology, Volume 47, Issue 4, August 2018, Pages 1279–1288, https://doi.org/10.1093/ije/dyx002
