On the credible probability of confidence intervals
Warning! I am not a statistician. This article is not reviewed. Please confer with a responsible adult!
Confidence intervals are commonly misinterpreted by consumers of statistics. Hoekstra et al. [1] presented 120 psychology researchers and 442 students with ‘a fictitious scenario of a professor who conducts an experiment and reports a 95% CI for the mean that ranges from 0.1 to 0.4’. 58% of respondents endorsed the assertion ‘There is a 95% probability that the true mean lies between 0.1 and 0.4’, and the proportions were similar between students and researchers. That assertion is incorrect,1 but clearly the misinterpretation is common.
The usual explanation for why the assertion is incorrect goes something like ‘The true mean is a fixed (but unknown) value, not random. So either it is in the interval (with probability 1), or it is not in the interval (with probability 1). There can be no probability in between.’2 But this explanation seems unsatisfying, for it begs the question. There are interpretations of probability which can assign probabilities to fixed-but-unknown parameters. Does adopting another interpretation of probability not resolve the issue?
Interpretations of probability and approaches to statistical inference
Two interpretations of probability are particularly relevant to this discussion:
- The relative frequency interpretation of probability says to the effect that the probability of an event in a random experiment is the ‘limiting relative frequency’ of the event, if the random experiment were repeated infinitely [2]. If I have a coin, and repeatedly tossing the coin a large number of times would tend to result in it landing heads 50% of the time, then the probability of a single coin flip landing heads is 50% (a small simulation of this limiting behaviour is sketched just after this list).
- The subjective interpretation of probability says to the effect that probability represents a ‘degree of belief’ in an outcome which is uncertain [2]. If I have a coin, I am uncertain about which way it would land if I were to toss it. If specifically I think it is equally plausible that the coin would land heads as tails, then the probability of it landing heads is 50%.
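To make the ‘limiting relative frequency’ idea concrete, here is a tiny Python sketch (mine, not from the original article) that tosses a simulated fair coin repeatedly and tracks the running relative frequency of heads, which settles towards 50%:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5  # one toss of an assumed fair coin
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {n:>7,} tosses: relative frequency of heads = {heads / n:.4f}")
```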
Interpretations of probability are often conflated with the frequentist and Bayesian approaches to statistical inference. Confidence intervals come from the frequentist school of inference, which is commonly associated with the relative frequency interpretation.3 But as we shall see, there is nothing stopping us from applying the subjective interpretation to frequentist methods.
Frequentist estimation and relative frequency interpretation
Let's consider the following model of frequentist estimation, using the relative frequency interpretation of probability. Let $\theta$ be the true value of the target parameter. In the relative frequency interpretation, $\theta$ is unknown but fixed – there is no random experiment which can ‘change’ $\theta$, so it is not a random variable. Let $\theta_1, \theta_2, \dots$ be possible values of $\theta$.
Considering each possible $\theta_i$ in turn, we imagine that, supposing that $\theta_i$ is the true value of $\theta$, we draw a sample of observations $X$. $X$, then, is a random variable, whose sampling distribution is based on $\theta_i$. Let $x_1, x_2, \dots$ be possible vectors of observations which we could draw.
Based on the $X$ which we observe, we produce the corresponding confidence interval $I = I(X)$ according to our confidence procedure. $I$, then, is a random variable, a function of $X$. In other words, we map each possible vector of observations $x_j$ to a corresponding confidence interval $I(x_j)$.
Since we are considering a single $\theta_i$ at a time, we can then determine if $\theta_i$ is in each of the confidence intervals $I(x_j)$, and since the sampling distribution of $X$ (and hence $I$) is known for a particular $\theta_i$, we can compute $\Pr_{\theta_i}(\theta_i \in I)$, the probability that the generated confidence interval will contain the true value of $\theta$, if the true value of $\theta$ is $\theta_i$.
Because we of course do not know which $\theta_i$ is the true value of $\theta$, we then require that for every $\theta_i$ (i.e. for any possible value of $\theta$), $\Pr_{\theta_i}(\theta_i \in I) \ge 0.95$.4 Then $I$ is a 95% confidence interval for $\theta$. In particular, since the inequality is true for any possible value of $\theta$, it is necessarily true that $\Pr_{\theta}(\theta \in I) \ge 0.95$, no matter what $\theta$ in fact truly happens to be. In frequentist terms, the coverage probability of $I$ is ≥95%.
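As a sanity check on this coverage property, here is a minimal simulation sketch under assumptions of my own (none of the specifics below come from the article): the observations are normal with known standard deviation $\sigma$, and $I$ is the usual 95% interval for the mean, $\bar{X} \pm 1.96\sigma/\sqrt{n}$. For each candidate $\theta_i$ we estimate $\Pr_{\theta_i}(\theta_i \in I)$ by repeated sampling:

```python
import numpy as np

# Illustrative values only (assumed, not from the article)
rng = np.random.default_rng(0)
sigma, n, n_sims = 1.0, 10, 100_000
half_width = 1.96 * sigma / np.sqrt(n)  # 1.96 ~ 97.5th percentile of N(0, 1)

# For each candidate theta_i, estimate the probability that the interval
# (sample mean +/- half_width) contains theta_i.
for theta_i in (-2.0, 0.0, 3.5):
    samples = rng.normal(theta_i, sigma, size=(n_sims, n))  # repeated experiments
    centres = samples.mean(axis=1)
    covered = np.abs(centres - theta_i) <= half_width
    print(f"theta_i = {theta_i:+.1f}: estimated coverage = {covered.mean():.3f}")
```

Whichever $\theta_i$ is plugged in, the estimated coverage comes out at roughly 0.95 – this is the prospective guarantee of the procedure, not a statement about any particular realised interval.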
Once we perform the experiment, however, we realise a particular confidence interval $I(x)$, which is now a fixed, known interval. In the relative frequency interpretation, since the true $\theta$ is also fixed, no random variables are now involved – so either $I(x)$ contains $\theta$, or it does not, and no probabilities can be assigned.
Frequentist estimation and subjective interpretation
With the subjective interpretation of probability, however, we need not be stumped here. In the subjective interpretation of probability, $\theta$ is of course still unknown and fixed. But even though it is not the outcome of a ‘random experiment’, we can nevertheless assign probabilities to possible values of $\theta$, representing our uncertain prior beliefs about what $\theta$ is likely to be. So let $\Theta_1, \Theta_2, \dots$ instead be the events that the true $\theta$ equals each of those possible values $\theta_1, \theta_2, \dots$.
We then proceed similarly to before – in the event $\Theta_i$ that $\theta_i$ is the true value of $\theta$, $X$ will have a particular sampling distribution, we generate confidence intervals $I(x_j)$, and we compute $\Pr(\theta \in I \mid \Theta_i)$. The calculation is exactly the same as in the relative frequency interpretation, but in this case it can be directly interpreted as a conditional probability, because $\Theta_i$ is a random event. The overall experiment is depicted in the following tree diagram:
We then require that $\Pr(\theta \in I \mid \Theta_i) \ge 0.95$ for all $\Theta_i$, so that $I$ is a 95% confidence interval for $\theta$. We can then reason as before and say, then, that ‘no matter what $\theta$ in fact truly happens to be’, $\Pr(\theta \in I \mid \Theta_i) \ge 0.95$ for the true value $\theta_i$ of $\theta$.5 Again, in frequentist terms, the coverage probability of $I$ is ≥95%.
It is apt to reiterate the focus of the frequentist coverage probability. Because $\Pr(\theta \in I \mid \Theta_i) \ge 0.95$ for any $\Theta_i$, then $\Pr(\theta \in I \mid \Theta_i) \ge 0.95$ for the true value $\theta_i$ of $\theta$, which is what will actually happen. If, say, the true value of $\theta$ is in fact $\theta_i$, we condition on $\Theta_i$ and ask the probability that $\theta \in I$ given $\Theta_i$. This is represented in the following diagram by the shaded blue area as a fraction of the dashed area:
Credible probability of confidence interval
Still using the subjective interpretation of probability, once we perform the experiment, we realise a particular confidence interval $I(x)$. But even though $I(x)$ is now fixed, the $\Theta_i$ are random events, representing our uncertainty about the true value of the parameter. Therefore we can ask the probability – our degree of belief – that $I(x)$ contains the true parameter, based on how likely we think each possible value of the true parameter is. This is the Bayesian credible probability of the particular interval $I(x)$.
Let's suppose we generated the confidence interval $I(x_1)$. Referring to the tree diagram, note that we are only concerned with the branches which result in generating $I(x_1)$. We of course do not care about $\theta \in I$ given those events where we might have generated, say, $I(x_2)$ – it is of course unlikely the true parameter lies within one of the hypothetical confidence intervals which we did not generate! We care about $\theta \in I$ given that we generated $I(x_1)$ as the confidence interval. So in other words, we condition on $I = I(x_1)$ and ask the probability that $\theta \in I$ given $I = I(x_1)$. This is represented in the following diagram by the shaded green area as a fraction of the dashed area:
Clearly, this is quite different to the diagram corresponding to the frequentist coverage probability.
In symbolic terms, we seek $\Pr(\theta \in I(x_1) \mid I = I(x_1))$. This is simply the sum,6 over all elements $\theta_i \in I(x_1)$, of the probability that $\theta_i$ is the true $\theta$:

$$\Pr\bigl(\theta \in I(x_1) \mid I = I(x_1)\bigr) = \sum_{\theta_i \in I(x_1)} \Pr\bigl(\Theta_i \mid I = I(x_1)\bigr)$$
By Bayes' theorem:

$$\Pr\bigl(\Theta_i \mid I = I(x_1)\bigr) = \frac{\Pr\bigl(I = I(x_1) \mid \Theta_i\bigr)\,\Pr(\Theta_i)}{\Pr\bigl(I = I(x_1)\bigr)}$$
Immediately, we see that this quantity will depend on $\Pr(\Theta_i)$, the prior probability (i.e. not conditioned on the generated confidence interval, and so not conditioned on the data we observed) that $\theta_i$ is the true $\theta$.7
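To see this dependence on the prior concretely, here is a minimal sketch of the credible-probability calculation under assumptions of my own – none of the specifics come from the article – using the continuous analogue of the sum above (cf. footnote 6): a conjugate normal model in which $\theta$ has a Normal$(0, \tau^2)$ prior, the data are Normal$(\theta, \sigma^2)$ with $\sigma$ known, and we compute the posterior probability that the realised 95% confidence interval contains $\theta$:

```python
import numpy as np
from scipy.stats import norm

# Illustrative values only (assumed, not from the article)
tau, sigma, n = 0.5, 1.0, 10   # prior sd of theta, data sd, sample size
xbar = 2.5                     # an assumed observed sample mean

# The realised frequentist 95% confidence interval for the mean
half_width = 1.96 * sigma / np.sqrt(n)
lo, hi = xbar - half_width, xbar + half_width

# Conjugate normal-normal posterior for theta given the data (prior mean 0)
post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
post_mean = post_var * (n * xbar / sigma**2)
post_sd = np.sqrt(post_var)

# Credible probability that theta lies in the realised interval
credible = norm.cdf(hi, post_mean, post_sd) - norm.cdf(lo, post_mean, post_sd)
print(f"95% CI = ({lo:.2f}, {hi:.2f}); credible probability = {credible:.2f}")
```

With these deliberately informative prior settings the credible probability comes out at roughly 0.36, not 0.95; with a very diffuse prior it would land close to 0.95. The point is simply that the answer depends on the prior, which the confidence procedure alone never specifies.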
Conclusion
This demonstrates that the credible probability of a confidence interval is not, in general, equal to the coverage probability. In other words, after gathering observations, there is not necessarily a 95% probability that the true parameter lies within the 95% confidence interval which has been generated. This is not (necessarily) because ‘it either lies within the interval or not’, but because the credible probability requires additionally specifying a prior distribution on the target parameter.
References
[1] Hoekstra R, Morey RD, Rouder JN, Wagenmakers EJ. Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review. 2014 Oct 1; 21(5): 1157–64. doi: 10.3758/s13423-013-0572-3
[2] Hájek A. ‘Interpretations of probability’. In: Zalta EN, ed. The Stanford encyclopedia of philosophy. 2019 Aug 28 [cited 2022 Sep 12]. https://plato.stanford.edu/entries/probability-interpret/
[3] Casella G, Berger RL. Statistical inference. 2nd ed. California: Duxbury; 2001.
[4] Wackerly DD, Mendenhall W III, Scheaffer RL. Mathematical statistics with applications. 7th ed. California: Duxbury; 2008.
Footnotes
1. Hoekstra et al. [1] present 6 asserted statements, of which this is statement 4. I agree statements 1, 4 and 6 are manifestly incorrect assertions. In my view, statements 2, 3 and 5 are so vague as to be at least debatable. But that is neither here nor there.
2. This is essentially the reason given by Hoekstra et al. [1], that the assertion ‘assign[s] [a] probabilit[y] to [a] parameter[], something that is not allowed within the frequentist framework’.
3. For example, introductory textbooks like Casella & Berger [3] and Wackerly et al. [4] adopt this interpretation with little discussion. One could say the relative frequency interpretation and frequentist school of inference are alike in philosophy, in that they both focus prospectively on the properties of a large number of future experiments.
4. See e.g. [3, p. 418].
5. But in the subjective interpretation, since the $\Theta_i$ are events and are mutually exclusive, we can also directly compute $\Pr(\theta \in I)$ as $\sum_i \Pr(\theta \in I \mid \Theta_i)\,\Pr(\Theta_i)$. This represents the prior probability, across all possible values of $\theta$ according to how likely we believe they are, that the generated confidence interval will contain the true $\theta$. And since $\Pr(\theta \in I \mid \Theta_i) \ge 0.95$ for all $\Theta_i$, necessarily $\Pr(\theta \in I) \ge 0.95$.
6. For simplicity, we present the discrete case. In the continuous case, we replace sums of probabilities with integrals over probability densities.
7. The prior also sneaks into the calculation via the $\Pr(I = I(x_1))$ term. A more detailed discussion can be found at https://stats.stackexchange.com/a/89363.