Criminy! How Can My Confidence Interval Be Wrong?
Have you ever played a board game where you tossed two dice to move? What did you think when you rolled a three? You probably thought about what space you landed on and what action you’d have to take on that turn. But, what did you think about the probability of rolling a three? Probably nothing. There’s nothing rare or uncommon about tossing a three.
To roll a three using two dice, the first die would have to show a one and the second a two, or vice versa: a two on the first die and a one on the second. So, there are only two combinations that produce a three, out of thirty-six possible combinations when rolling two fair six-sided dice. Therefore, the probability of tossing a three is 2/36 ≈ 0.056 ≈ 5.6%. Hence, if you rolled 100 times, then based on probability you'd expect to toss a three about five or six times. Or, if you rolled 20 times, you'd expect to toss a three once, maybe twice. Again, it's common enough, and occurs with enough frequency, that you'd probably think nothing of rolling a three.
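If you'd like to check the arithmetic yourself, here is a minimal Python sketch (purely illustrative, not part of the original test setup) that enumerates all thirty-six outcomes of two fair dice and counts how many sum to three:

```python
# Enumerate every (die1, die2) pair for two fair six-sided dice and
# count how many of those pairs sum to three.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))            # all 36 ordered pairs
threes = [roll for roll in outcomes if sum(roll) == 3]      # (1, 2) and (2, 1)

print(len(threes), "of", len(outcomes), "outcomes")         # 2 of 36
print(f"P(sum = 3) = {len(threes) / len(outcomes):.3f}")    # 0.056, about 5.6%
```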
Let’s shift gears. You’re at work running a test to determine if some parameter of your product hits a nominal value listed in a requirement or stated on a blueprint. You estimate a 95% confidence interval and see the nominal value falls within the interval. Everyone’s happy and you move on to the next task or project. But then, whether from customer complaints, recorded issues in the field, or some other empirical evidence, reality contradicts the results you obtained in your test. Your confidence interval was wrong.
“Mon Dieu! I was 95% confident it was right,” you exclaim. “How can it be wrong?”
95% confidence, by definition, means 5% not confident. Or, put another way, a 95% chance of being right and a 5% chance of being wrong. You think nothing of tossing a three when playing that board game, an event with a 5.6% probability of occurring, only slightly higher than the 5% chance of estimating a bogus confidence interval. But when it comes to using statistics, you suspend your belief in the chance of being wrong and delude yourself into the erroneous conclusion that 95% confidence is so high it can’t possibly be wrong. Yet, with about the same frequency as tossing a three, 95% confidence intervals may be wrong approximately 5% of the time.
There’s more to it than that. We say, “I’m highly confident my pizza will burn if I leave it in the oven too long.” Or, “I’m highly confident the electric company will shut off my power if I don’t pay my bill.” Each of these has a basis in truth, making our belief in them strong. So, in our minds we extend that way of thinking to “I’m 95% confident my interval is true.” That’s not exactly what it means.
Confidence interval methodology is based on repetition. A sample of size n is randomly selected from a population, a statistic is calculated from that sample as your “best guess” estimate of the true population parameter, and then a confidence interval is constructed around that statistic. Then, the n selected units are put back into the population, another sample of size n is selected, the statistic is calculated, and a second confidence interval is constructed. This is repeated many, many, many times (i.e., approaching infinity). When all these confidence intervals have been constructed, theoretically 95% of them should contain the true population parameter and 5% may not. It is from this idea of repetition that the definition of statistical confidence comes. We are confident the methodology, when repeated, will produce correct results 95% of the time, not that each and every confidence interval will contain the true population parameter.
In other words, if you select 100 unique samples and create 100 confidence intervals, then about 95 will probably contain the true population parameter and about five may not. However, you selected only one sample of size n and estimated only one confidence interval. The truth is you have no way of knowing whether your one confidence interval is among the 95% that contain the true population parameter or among the 5% that don’t. Moreover, increasing the sample size doesn’t change the theory behind confidence interval methodology. It only gives you more data to work with in that one sample.
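If you want to watch the repetition idea play out, here is a small simulation sketch. Everything in it is an assumption for illustration, not anything from the original test: a normal population with a known mean, a sample size of 30, and a standard t-interval. It then counts how many of 10,000 intervals actually capture the true mean.

```python
# Sketch of confidence interval "coverage": draw many samples from a known
# population, build a 95% t-interval for the mean from each sample, and count
# how often the interval actually contains the true mean. The population,
# sample size, and use of a t-interval are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, true_sd = 50.0, 5.0        # the true parameter we pretend not to know
n, n_repeats = 30, 10_000

hits = 0
for _ in range(n_repeats):
    sample = rng.normal(true_mean, true_sd, size=n)
    xbar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)          # two-sided 95% critical value
    if xbar - t_crit * se <= true_mean <= xbar + t_crit * se:
        hits += 1

print(f"Coverage over {n_repeats} intervals: {hits / n_repeats:.3f}")  # close to 0.95
```

Any single interval in that loop is either right or wrong; only the long-run rate settles near 95%, which is exactly the point made above.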
The above explanation assumes no bias was present in your data. If that’s true, then by “the luck of the draw” probability may reward you with a sample that is truly representative of your population. Unfortunately, probability offers no guarantees. If bias was present, however, then all bets are off, and that 5% error rate you so graciously accepted may be as real as sighting a yeti or a jackalope, or meeting an honest politician. Hmm, bias sounds like a good topic for another post!