On the credible probability of confidence intervals

Warning! I am not a statistician. This article has not been reviewed. Please consult a responsible adult!

Confidence intervals are commonly misinterpreted by consumers of statistics. Hoekstra et al. [1] presented 120 psychology researchers and 442 students with ‘a fictitious scenario of a professor who conducts an experiment and reports a 95% CI for the mean that ranges from 0.1 to 0.4’. 58% of respondents endorsed the assertion ‘There is a 95% probability that the true mean lies between 0.1 and 0.4’, and the proportions were similar between students and researchers. That assertion is incorrect,¹ but clearly the misinterpretation is common.

The usual explanation for why the assertion is incorrect goes something like ‘The true mean is a fixed (but unknown) value, not random. So either it is in the interval (with probability 1), or it is not in the interval (with probability 1). There can be no probability in between.’² But this explanation seems unsatisfying, as it appears to beg the question. There are interpretations of probability which can assign probabilities to fixed-but-unknown parameters. Does adopting another interpretation of probability not resolve the issue?

Interpretations of probability and approaches to statistical inference

Two interpretations of probability are particularly relevant to this discussion:

The relative frequency interpretation of probability holds, roughly, that the probability of an event in a random experiment is the ‘limiting relative frequency’ of the event as the experiment is repeated infinitely many times [2]. If I have a coin, and tossing it over and over would see it land heads 50% of the time in the long run, then the probability of a single toss landing heads is 50%.
The subjective interpretation of probability holds, roughly, that probability represents a ‘degree of belief’ in an outcome which is uncertain [2]. If I have a coin, I am uncertain about which way it would land if I were to toss it. If, specifically, I think it is equally plausible that the coin would land heads as tails, then the probability of it landing heads is 50%.
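To make the relative frequency reading concrete, here is a minimal Python sketch that simulates tosses of a fair coin and reports the fraction of heads. The helper name `relative_frequency_of_heads`, the seed and the toss counts are illustrative assumptions. A simulation can only show the frequency settling near 0.5 for large numbers of tosses, not the infinite limit itself.

```python
import random

def relative_frequency_of_heads(n_tosses, p_heads=0.5, seed=0):
    """Hypothetical helper: toss a simulated coin n_tosses times, return the fraction of heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses

# The relative frequency should drift towards 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

The subjective interpretation has no such simulation. The 50% there describes a state of belief about a single toss, not the output of repeated tosses.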

Interpretations of probability are often conflated with the frequentist and Bayesian approaches to statistical inference. Confidence intervals come from the frequentist school of inference, which is commonly associated with the relative frequency interpretation.³ But as we shall see, there is nothing stopping us from applying the subjective interpretation to frequentist methods.

Frequentist estimation and relative frequency interpretation

Let's consider the following model of frequentist estimation, using the relative frequency interpretation of probability. Let $θ$ be the true value of the target parameter. In the relative frequency interpretation, $θ$ is unknown but fixed: there is no random experiment which can ‘change’ $θ$, so it is not a random variable. Let $θ_1, θ_2, θ_3, …$ be the possible values of $θ$.

Considering each possible $θ_i$ in turn, we suppose that $θ_i$ is the true value of $θ$ and imagine drawing a sample of observations $\mathbf{x}$. $\mathbf{x}$, then, is a random variable whose sampling distribution depends on $θ_i$. Let $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, …$ be the possible vectors of observations which we could draw.

Based on the $\mathbf{x}$ which we observe, we produce the corresponding confidence interval $C$ according to our confidence procedure. $C$, then, is a random variable, a function of $\mathbf{x}$. In other words, we map each possible vector of observations $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, …$ to a corresponding confidence interval $c_1, c_2, c_3, …$.
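The construction in the last two paragraphs can be sketched for a small discrete toy model. Every specific choice below is a hypothetical illustration, not something the argument depends on:

- θ is restricted to the grid 0.1, 0.2, …, 0.9.
- $\mathbf{x}$ is summarised by the number of successes in `N_TRIALS = 10` Bernoulli trials, so its sampling distribution under $θ_i$ is Binomial(10, $θ_i$).
- The confidence procedure is a test inversion (a Neyman construction). Each $θ_i$ ‘accepts’ its most probable outcomes until they hold at least 95% of its probability, and $C(x)$ collects every $θ_i$ that accepts $x$.

Because θ lives on a grid, $C(x)$ is strictly a confidence set rather than an interval.

```python
from math import comb

N_TRIALS = 10                            # hypothetical: x = successes in 10 trials
THETAS = [i / 10 for i in range(1, 10)]  # hypothetical grid θ_1, ..., θ_9
LEVEL = 0.95                             # nominal confidence level

def pmf(x, theta):
    """Sampling distribution Pr(x | θ) for x ~ Binomial(N_TRIALS, θ)."""
    return comb(N_TRIALS, x) * theta**x * (1 - theta) ** (N_TRIALS - x)

def acceptance_region(theta):
    """Outcomes 'accepted' by θ: the most probable x under θ, taken in
    decreasing order of probability until they hold at least LEVEL of it."""
    region, mass = set(), 0.0
    for k in sorted(range(N_TRIALS + 1), key=lambda k: (-pmf(k, theta), k)):
        if mass >= LEVEL:
            break
        region.add(k)
        mass += pmf(k, theta)
    return region

def confidence_set(x):
    """C(x): every grid value θ_i whose acceptance region contains x."""
    return frozenset(t for t in THETAS if x in acceptance_region(t))

# Map each possible observation x_1, x_2, ... to its confidence set c_1, c_2, ...
for x in range(N_TRIALS + 1):
    print(x, sorted(confidence_set(x)))
```

$C$ is a deterministic function of $\mathbf{x}$ here, so all of its randomness comes from the sampling distribution of $\mathbf{x}$.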

Since we are considering a single $θ_i$ at a time, we can then determine whether $θ_i$ lies in each of the possible confidence intervals. Since the sampling distribution of $\mathbf{x}$ (and hence of $C$) is known for a particular $θ_i$, we can then compute $\Pr_{θ_i}(θ ∈ C)$. This is the probability that the generated confidence interval will contain the true value of $θ$, if the true value of $θ$ is $θ_i$.

Because we of course do not know which $θ_i$ is the true value of $θ$, we then require that for every $θ_i$ (i.e. for any possible value of $θ$), $\Pr_{θ_i}(θ ∈ C) ≥ 0.95$.⁴ Then $C$ is a 95% confidence interval for $θ$. In particular, since the inequality is true for any possible value of $θ$, it is necessarily true that $\Pr(θ ∈ C) ≥ 0.95$, no matter what $θ$ in fact truly happens to be. In frequentist terms, the coverage probability of $C$ is ≥95%.
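As a sketch, the coverage requirement can be checked numerically in a hypothetical toy model, defined in full below so the block runs on its own:

- θ is on the grid 0.1, …, 0.9.
- $x$ is Binomial(10, θ).
- $C(x)$ is built by test inversion, each $θ_i$ accepting its most probable outcomes up to 95% of its mass.

For each $θ_i$, the code computes $\Pr_{θ_i}(θ ∈ C)$ by summing the sampling probabilities of the outcomes whose confidence set contains $θ_i$. It then checks that every one of these is at least 0.95. For this construction the check holds by design, because the outcomes with $θ_i ∈ C(x)$ are exactly $θ_i$'s acceptance region.

```python
from math import comb

N_TRIALS = 10                            # hypothetical: x = successes in 10 trials
THETAS = [i / 10 for i in range(1, 10)]  # hypothetical grid θ_1, ..., θ_9
LEVEL = 0.95

def pmf(x, theta):
    """Pr(x | θ) for x ~ Binomial(N_TRIALS, θ)."""
    return comb(N_TRIALS, x) * theta**x * (1 - theta) ** (N_TRIALS - x)

def acceptance_region(theta):
    """Most probable outcomes under θ, taken until they hold at least LEVEL of its mass."""
    region, mass = set(), 0.0
    for k in sorted(range(N_TRIALS + 1), key=lambda k: (-pmf(k, theta), k)):
        if mass >= LEVEL:
            break
        region.add(k)
        mass += pmf(k, theta)
    return region

def confidence_set(x):
    """C(x): every grid value θ_i whose acceptance region contains x."""
    return frozenset(t for t in THETAS if x in acceptance_region(t))

def coverage(theta):
    """Pr_θ(θ ∈ C): if θ is the true value, the probability that C(x) contains θ."""
    return sum(pmf(x, theta) for x in range(N_TRIALS + 1) if theta in confidence_set(x))

for theta in THETAS:
    print(f"theta = {theta:.1f}: coverage = {coverage(theta):.4f}")

# The 95% requirement must hold for every possible θ_i, so check the worst case.
print("minimum coverage:", min(coverage(t) for t in THETAS))
```

Because the outcomes are discrete, the coverage typically exceeds 0.95 rather than hitting it exactly, which is why the definition uses ≥.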

Once we perform the experiment, however, we realise a particular confidence interval $c$, which is now a fixed, known interval. In the relative frequency interpretation, since the true $θ$ is also fixed, no random variables are involved any longer. So $c$ either contains $θ$ or it does not, and no probabilities can be assigned.

Frequentist estimation and subjective interpretation

With the subjective interpretation of probability, however, we need not be stumped here. In the subjective interpretation of probability, $θ$ is of course still unknown and fixed. But even though it is not the outcome of a ‘random experiment’, we can nevertheless assign probabilities to possible values of $θ$, representing our uncertain prior beliefs about what $θ$ is likely to be. So let $θ_1, θ_2, θ_3, …$ instead be the events that the true $θ$ equals each of those possible values.

We then proceed similarly to before. In the event that $θ_i$ is the true value of $θ$, $\mathbf{x}$ will have a particular sampling distribution, from which we generate confidence intervals $C$ and compute $\Pr(θ ∈ C | θ_i)$. The calculation is exactly the same as $\Pr_{θ_i}(θ ∈ C)$ in the relative frequency interpretation. In this case, however, it can be interpreted directly as a conditional probability, because $θ_i$ is now an event to which we assign a probability. The overall experiment is depicted in the following tree diagram:

Probability tree diagram

We then require that $\Pr(θ ∈ C | θ_i) ≥ 0.95$ for all $θ_i$, so that $C$ is a 95% confidence interval for $θ$. Reasoning as before, we can then say that ‘no matter what $θ$ in fact truly happens to be’, $\Pr(θ ∈ C) ≥ 0.95$ for the true value of $θ$.⁵ Again, in frequentist terms, the coverage probability of $C$ is ≥95%.
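Under the subjective interpretation, a hypothetical toy model can be read directly as the probability tree. The model, defined in full below, has:

- θ on the grid 0.1, …, 0.9;
- $x$ distributed as Binomial(10, θ);
- $C(x)$ built by test inversion at level 0.95.

The tree places a prior over the events $θ_i$, then $\Pr(C = c | θ_i)$ along each branch. The sketch builds the joint probability $\Pr(θ_i, C = c)$ of every branch. It then checks that the prior probability of covering, $\sum_{θ_i} \Pr(θ ∈ C | θ_i) \Pr(θ_i)$, is at least 0.95 under two illustrative priors (uniform, and weight proportional to $θ_i$). Both priors are assumptions chosen only to show that the bound does not depend on the prior.

```python
from collections import defaultdict
from math import comb

N_TRIALS = 10                            # hypothetical: x = successes in 10 trials
THETAS = [i / 10 for i in range(1, 10)]  # hypothetical grid θ_1, ..., θ_9
LEVEL = 0.95

def pmf(x, theta):
    """Pr(x | θ) for x ~ Binomial(N_TRIALS, θ)."""
    return comb(N_TRIALS, x) * theta**x * (1 - theta) ** (N_TRIALS - x)

def acceptance_region(theta):
    """Most probable outcomes under θ, taken until they hold at least LEVEL of its mass."""
    region, mass = set(), 0.0
    for k in sorted(range(N_TRIALS + 1), key=lambda k: (-pmf(k, theta), k)):
        if mass >= LEVEL:
            break
        region.add(k)
        mass += pmf(k, theta)
    return region

def confidence_set(x):
    """C(x): every grid value θ_i whose acceptance region contains x."""
    return frozenset(t for t in THETAS if x in acceptance_region(t))

def branch_probabilities(prior):
    """Joint Pr(θ_i, C = c) = Pr(θ_i) Pr(C = c | θ_i) for every branch of the tree."""
    joint = defaultdict(float)
    for theta in THETAS:
        for x in range(N_TRIALS + 1):
            joint[(theta, confidence_set(x))] += prior[theta] * pmf(x, theta)
    return dict(joint)

def prior_coverage(prior):
    """Pr(θ ∈ C) = Σ_i Pr(θ ∈ C | θ_i) Pr(θ_i): total mass of branches where θ_i ∈ c."""
    return sum(p for (theta, c), p in branch_probabilities(prior).items() if theta in c)

uniform = {t: 1 / len(THETAS) for t in THETAS}  # hypothetical prior
skewed = {t: t / sum(THETAS) for t in THETAS}   # hypothetical prior favouring large θ
print("Pr(theta in C), uniform prior:", prior_coverage(uniform))
print("Pr(theta in C), skewed prior: ", prior_coverage(skewed))
```

Any prior gives a weighted average of conditional coverages that are each at least 0.95. That is why both printed values clear the bound.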

It is worth reiterating what the frequentist coverage probability focuses on. Because $\Pr(θ ∈ C | θ_i) ≥ 0.95$ for any $θ_i$, we have $\Pr(θ ∈ C) ≥ 0.95$ for the true value of $θ$, whichever value that turns out to be. If, say, the true value of $θ$ is in fact $θ_3$, we condition on $θ_3$ and ask the probability that $θ ∈ C$ given $θ_3$. This is represented in the following diagram by the shaded blue area as a fraction of the dashed area:

Probability tree diagram

Credible probability of confidence interval

Still using the subjective interpretation of probability, once we perform the experiment, we realise a particular confidence interval $c$. But even though $c$ is now fixed, $θ_1, θ_2, θ_3, …$ are random events, representing our uncertainty about the true value of the parameter. Therefore we can ask the probability – our degree of belief – that $c$ contains the true parameter, based on how likely we think each possible value of the true parameter is. This is the Bayesian credible probability of the particular interval $c$.

Let's suppose we generated the confidence interval $c_3$. Referring to the tree diagram, note that we are only concerned with the branches which result in generating $c_3$. We do not care about $\Pr(θ ∈ c_3)$ given those events where we might have generated, say, $c_1$. After all, it is unlikely that the true parameter lies within one of the hypothetical confidence intervals which we did not generate! We care about $\Pr(θ ∈ c_3)$ given that we generated $c_3$ as the confidence interval. In other words, we condition on $c_3$ and ask the probability that $θ ∈ C$ given $c_3$. This is represented in the following diagram by the shaded green area as a fraction of the dashed area:

Probability tree diagram

Clearly, this is quite different to the diagram corresponding to the frequentist coverage probability.
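The branches that lead to a realised set can be listed explicitly in a hypothetical toy model, defined in full below:

- θ on the grid 0.1, …, 0.9;
- $x$ distributed as Binomial(10, θ);
- $C(x)$ built by test inversion at level 0.95.

The sketch takes the set produced by the observation $x = 0$ as an arbitrary stand-in for $c_3$. For each $θ_i$, it computes $\Pr(C = c_3 | θ_i)$, the probability of reaching $c_3$ along that branch, and marks whether $θ_i$ lies inside $c_3$. In this model only $x = 0$ produces that set, so each branch probability is $(1 - θ_i)^{10}$. Note that branches with $θ_i$ outside $c_3$ still reach $c_3$ with non-zero probability, and these form part of the dashed area.

```python
from math import comb

N_TRIALS = 10                            # hypothetical: x = successes in 10 trials
THETAS = [i / 10 for i in range(1, 10)]  # hypothetical grid θ_1, ..., θ_9
LEVEL = 0.95

def pmf(x, theta):
    """Pr(x | θ) for x ~ Binomial(N_TRIALS, θ)."""
    return comb(N_TRIALS, x) * theta**x * (1 - theta) ** (N_TRIALS - x)

def acceptance_region(theta):
    """Most probable outcomes under θ, taken until they hold at least LEVEL of its mass."""
    region, mass = set(), 0.0
    for k in sorted(range(N_TRIALS + 1), key=lambda k: (-pmf(k, theta), k)):
        if mass >= LEVEL:
            break
        region.add(k)
        mass += pmf(k, theta)
    return region

def confidence_set(x):
    """C(x): every grid value θ_i whose acceptance region contains x."""
    return frozenset(t for t in THETAS if x in acceptance_region(t))

def prob_of_set_given_theta(c, theta):
    """Pr(C = c | θ_i): total probability of the outcomes x with C(x) = c."""
    return sum(pmf(x, theta) for x in range(N_TRIALS + 1) if confidence_set(x) == c)

c3 = confidence_set(0)  # arbitrary stand-in for the realised interval c_3
for theta in THETAS:
    where = "inside " if theta in c3 else "outside"
    print(f"theta = {theta:.1f} ({where} c_3): "
          f"Pr(C = c_3 | theta) = {prob_of_set_given_theta(c3, theta):.3g}")
```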

In symbolic terms, we seek $\Pr(θ ∈ c_3 | c_3)$. This is simply the sum,⁶ over all $θ_i ∈ c_3$, of the probability that $θ_i$ is the true $θ$ given $c_3$:

\[\Pr(θ ∈ c_3 | c_3) = \sum_{θ_i ∈ c_3} \Pr(θ_i | c_3)\]

By Bayes' theorem:

\[\Pr(θ ∈ c_3 | c_3) = \sum_{θ_i ∈ c_3} \frac{\Pr(c_3 | θ_i) \Pr(θ_i)}{\Pr(c_3)}\]

Immediately, we see that this quantity will depend on $\Pr(θ_i)$, the prior probability (i.e. not conditioned on the generated confidence interval, and so not conditioned on the data we observed) that $θ_i$ is the true $θ$.⁷
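To see the dependence on the prior concretely, the two formulas above can be evaluated in a hypothetical toy model, defined in full below:

- θ on the grid 0.1, …, 0.9;
- $x$ distributed as Binomial(10, θ);
- $C(x)$ built by test inversion at level 0.95.

The sketch computes $\Pr(c_3)$ as $\sum_j \Pr(c_3 | θ_j) \Pr(θ_j)$, applies Bayes' theorem, and sums over the $θ_i$ inside $c_3$. It does this for one and the same realised set under two illustrative priors: uniform, and a hypothetical prior with weight proportional to $θ_i$. The confidence procedure, and hence its coverage, is identical in both cases.

```python
from math import comb

N_TRIALS = 10                            # hypothetical: x = successes in 10 trials
THETAS = [i / 10 for i in range(1, 10)]  # hypothetical grid θ_1, ..., θ_9
LEVEL = 0.95

def pmf(x, theta):
    """Pr(x | θ) for x ~ Binomial(N_TRIALS, θ)."""
    return comb(N_TRIALS, x) * theta**x * (1 - theta) ** (N_TRIALS - x)

def acceptance_region(theta):
    """Most probable outcomes under θ, taken until they hold at least LEVEL of its mass."""
    region, mass = set(), 0.0
    for k in sorted(range(N_TRIALS + 1), key=lambda k: (-pmf(k, theta), k)):
        if mass >= LEVEL:
            break
        region.add(k)
        mass += pmf(k, theta)
    return region

def confidence_set(x):
    """C(x): every grid value θ_i whose acceptance region contains x."""
    return frozenset(t for t in THETAS if x in acceptance_region(t))

def prob_of_set_given_theta(c, theta):
    """Pr(C = c | θ_i): total probability of the outcomes x with C(x) = c."""
    return sum(pmf(x, theta) for x in range(N_TRIALS + 1) if confidence_set(x) == c)

def credible_probability(c, prior):
    """Pr(θ ∈ c | c) = Σ_{θ_i ∈ c} Pr(c | θ_i) Pr(θ_i) / Pr(c), by Bayes' theorem."""
    likelihood = {t: prob_of_set_given_theta(c, t) for t in THETAS}
    evidence = sum(likelihood[t] * prior[t] for t in THETAS)  # Pr(c), over all θ_j
    return sum(likelihood[t] * prior[t] for t in c) / evidence

c3 = confidence_set(0)                          # arbitrary stand-in for c_3
uniform = {t: 1 / len(THETAS) for t in THETAS}  # hypothetical prior
skewed = {t: t / sum(THETAS) for t in THETAS}   # hypothetical prior favouring large θ
print("uniform prior:", round(credible_probability(c3, uniform), 3))
print("skewed prior: ", round(credible_probability(c3, skewed), 3))
```

With these choices the realised set {0.1, 0.2} comes out at about 0.928 under the uniform prior and about 0.831 under the skewed one. The skewed prior lowers the value because it moves weight onto branches outside $c_3$ that can still produce $c_3$.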

Conclusion

This demonstrates that the credible probability of a confidence interval is not, in general, equal to its coverage probability. In other words, after gathering observations, there is not (in general) a 95% probability that the true parameter lies within the 95% confidence interval which has been generated. This is not (necessarily) because ‘it either lies within the interval or not’. It is because the credible probability additionally requires specifying a prior distribution on the target parameter.
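The conclusion can be put side by side in numbers using a hypothetical toy model, defined in full below:

- θ on the grid 0.1, …, 0.9;
- $x$ distributed as Binomial(10, θ);
- $C(x)$ built by test inversion at level 0.95;
- an assumed uniform prior.

The sketch prints the procedure's guaranteed coverage once. It then prints, for every possible observation, the credible probability of the set that observation would produce. The coverage is a single property of the procedure. The credible probability varies from one realised set to another and can fall below 0.95.

```python
from math import comb

N_TRIALS = 10                            # hypothetical: x = successes in 10 trials
THETAS = [i / 10 for i in range(1, 10)]  # hypothetical grid θ_1, ..., θ_9
LEVEL = 0.95

def pmf(x, theta):
    """Pr(x | θ) for x ~ Binomial(N_TRIALS, θ)."""
    return comb(N_TRIALS, x) * theta**x * (1 - theta) ** (N_TRIALS - x)

def acceptance_region(theta):
    """Most probable outcomes under θ, taken until they hold at least LEVEL of its mass."""
    region, mass = set(), 0.0
    for k in sorted(range(N_TRIALS + 1), key=lambda k: (-pmf(k, theta), k)):
        if mass >= LEVEL:
            break
        region.add(k)
        mass += pmf(k, theta)
    return region

def confidence_set(x):
    """C(x): every grid value θ_i whose acceptance region contains x."""
    return frozenset(t for t in THETAS if x in acceptance_region(t))

def coverage(theta):
    """Pr_θ(θ ∈ C): if θ is the true value, the probability that C(x) contains θ."""
    return sum(pmf(x, theta) for x in range(N_TRIALS + 1) if theta in confidence_set(x))

def credible_probability(c, prior):
    """Pr(θ ∈ c | c) via Bayes' theorem, with Pr(c | θ_i) summed over x giving C(x) = c."""
    likelihood = {t: sum(pmf(x, t) for x in range(N_TRIALS + 1) if confidence_set(x) == c)
                  for t in THETAS}
    evidence = sum(likelihood[t] * prior[t] for t in THETAS)  # Pr(c)
    return sum(likelihood[t] * prior[t] for t in c) / evidence

uniform = {t: 1 / len(THETAS) for t in THETAS}  # hypothetical prior
min_coverage = min(coverage(t) for t in THETAS)
print(f"coverage probability of the procedure: at least {min_coverage:.3f}")
for x in range(N_TRIALS + 1):
    c = confidence_set(x)
    print(x, sorted(c), f"credible probability = {credible_probability(c, uniform):.3f}")
```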

References

[1] Hoekstra R, Morey RD, Rouder JN, Wagenmakers EJ. Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review. 2014 Oct 1; 21(5): 1157–64. doi: 10.3758/s13423-013-0572-3

[2] Hájek A. ‘Interpretations of probability’. In: Zalta EN, ed. The Stanford encyclopedia of philosophy. 2019 Aug 28 [cited 2022 Sep 12]. https://plato.stanford.edu/entries/probability-interpret/

[3] Casella G, Berger RL. Statistical inference. 2nd ed. California: Duxbury; 2001.

[4] Wackerly DD, Mendenhall W III, Scheaffer RL. Mathematical statistics with applications. 7th ed. California: Duxbury; 2008.

Footnotes

1. Hoekstra et al. [1] present 6 asserted statements, of which this is statement 4. I agree statements 1, 4 and 6 are manifestly incorrect assertions. In my view, statements 2, 3 and 5 are so vague as to be at least debatable. But that is neither here nor there.
2. This is essentially the reason given by Hoekstra et al. [1], that the assertion ‘assign[s] [a] probabilit[y] to [a] parameter[], something that is not allowed within the frequentist framework’.
3. For example, introductory textbooks like Casella & Berger [3] and Wackerly et al. [4] adopt this interpretation with little discussion. One could say the relative frequency interpretation and frequentist school of inference are alike in philosophy, in that they both focus prospectively on the future properties of a large number of experiments.
4. See e.g. [3 p. 418].
5. But in the subjective interpretation, since the $θ_i$ are mutually exclusive and exhaustive events, we can also compute $\Pr(θ ∈ C)$ directly, by the law of total probability, as $\Pr(θ ∈ C) = \sum_{θ_i} \Pr(θ ∈ C | θ_i) \Pr(θ_i)$. This represents the prior probability, across all possible values of $θ$ weighted by how likely we believe they are, that the generated confidence interval will contain the true $θ$. And since $\Pr(θ ∈ C | θ_i) ≥ 0.95$ for all $θ_i$, necessarily $\Pr(θ ∈ C) ≥ 0.95$.
6. For simplicity, we present the discrete case. In the continuous case, we replace sums of probabilities with integrals over probability densities.
7. The prior also sneaks into the calculation via the $\Pr(c_3)$ term. A more detailed discussion can be found at https://stats.stackexchange.com/a/89363.