Warning! I am not a statistician. This article is not reviewed. Please confer with a responsible adult!

Confidence intervals are commonly misinterpreted by consumers of statistics. Hoekstra et al. [1] presented 120 psychology researchers and 442 students with ‘a fictitious scenario of a professor who conducts an experiment and reports a 95% CI for the mean that ranges from 0.1 to 0.4’. 58% of respondents endorsed the assertion ‘There is a 95% probability that the true mean lies between 0.1 and 0.4’, and the proportions were similar between students and researchers. That assertion is incorrect,¹ but clearly the misinterpretation is common.

The usual explanation for why the assertion is incorrect goes something like ‘The true mean is a fixed (but unknown) value, not random. So either it is in the interval (and the probability is 1), or it is not in the interval (and the probability is 0). There can be no probability in between.’² But this is unsatisfying, for it seems to beg the question. There are interpretations of probability which can assign probabilities to fixed-but-unknown parameters. Does adopting another interpretation of probability not resolve the issue?

Interpretations of probability and approaches to statistical inference

Two interpretations of probability are particularly relevant to this discussion:

  • The relative frequency interpretation of probability holds, in effect, that the probability of an event in a random experiment is the ‘limiting relative frequency’ of the event if the random experiment were repeated infinitely many times [2]. If I have a coin which, tossed a large number of times, would tend to land heads 50% of the time, then the probability of a single toss landing heads is 50%.

  • The subjective interpretation of probability holds, in effect, that probability represents a ‘degree of belief’ in an uncertain outcome [2]. If I have a coin, I am uncertain about which way it would land if I were to toss it. If, specifically, I think it is equally plausible that the coin would land heads as tails, then the probability of it landing heads is 50%.

Interpretations of probability are often conflated with the frequentist and Bayesian approaches to statistical inference. Confidence intervals come from the frequentist school of inference, which is commonly associated with the relative frequency interpretation.³ But as we shall see, there is nothing stopping us from applying the subjective interpretation to frequentist methods.

Frequentist estimation and relative frequency interpretation

Let's consider the following model of frequentist estimation, using the relative frequency interpretation of probability. Let $θ$ be the true value of the target parameter. In the relative frequency interpretation, $θ$ is unknown but fixed – there is no random experiment which can ‘change’ $θ$, so it is not a random variable. Let $θ_1, θ_2, θ_3, …$ be possible values of $θ$.

Considering each possible $θ_i$ in turn, we imagine that, supposing $θ_i$ is the true value of $θ$, we draw a sample of observations $\mathbf{x}$. Then $\mathbf{x}$ is a random variable, whose sampling distribution depends on $θ_i$. Let $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, …$ be possible vectors of observations which we could draw.

Based on the $\mathbf{x}$ which we observe, we produce the corresponding confidence interval $C$ according to our confidence procedure. Then $C$ is a random variable, a function of $\mathbf{x}$. In other words, we map each possible vector of observations $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, …$ to a corresponding confidence interval $c_1, c_2, c_3, …$.

Since we are considering a single $θ_i$ at a time, we can then determine whether $θ_i$ is in each of the confidence intervals. And since the sampling distribution of $\mathbf{x}$ (and hence of $C$) is known for a particular $θ_i$, we can compute $\Pr_{θ_i}(θ ∈ C)$, the probability that the generated confidence interval will contain the true value of $θ$, if the true value of $θ$ is $θ_i$.

Because we of course do not know which $θ_i$ is the true value of $θ$, we then require that for every $θ_i$ (i.e. for any possible value of $θ$), $\Pr_{θ_i}(θ ∈ C) ≥ 0.95$.⁴ Then $C$ is a 95% confidence interval for $θ$. In particular, since the inequality is true for any possible value of $θ$, it is necessarily true that $\Pr(θ ∈ C) ≥ 0.95$, no matter what $θ$ in fact truly happens to be. In frequentist terms, the coverage probability of $C$ is ≥95%.
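
The coverage requirement can be checked by simulation. The following is only a minimal sketch under assumptions which are mine, not part of the argument above: the observations are normally distributed with known standard deviation, the confidence procedure is the textbook z-interval, and the names `z_interval` and `coverage` and the candidate values of the parameter are hypothetical. For each candidate value in turn, it pretends that value is the true one, repeatedly draws a sample, and counts how often the resulting interval contains it.

```python
import random
import statistics

def z_interval(sample, sigma, z=1.96):
    """Approximate 95% confidence interval for a normal mean with known sigma."""
    mean = statistics.fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return mean - half_width, mean + half_width

def coverage(theta_i, rng, sigma=1.0, n=10, trials=20000):
    """Estimate the probability that the interval contains theta_i,
    supposing theta_i is the true value of the parameter."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta_i, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= theta_i <= hi
    return hits / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical candidate values of the true parameter
coverages = {theta_i: coverage(theta_i, rng) for theta_i in (-2.0, 0.0, 0.25, 5.0)}
for theta_i, estimate in coverages.items():
    print(f"theta_i = {theta_i:5.2f}: estimated coverage = {estimate:.3f}")
```

Because the z-interval has coverage almost exactly equal to 95% in this model, the simulated estimates should scatter closely around 0.95 for every candidate value, rather than sit strictly above it.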

Once we perform the experiment, however, we realise a particular confidence interval $c$, which is now a fixed, known interval. In the relative frequency interpretation, since the true $θ$ is also fixed, no random variables are now involved – so $c$ either contains $θ$, or it does not, and no probabilities can be assigned.

Frequentist estimation and subjective interpretation

With the subjective interpretation of probability, however, we need not be stumped here. In the subjective interpretation of probability, $θ$ is of course still unknown and fixed. But even though it is not the outcome of a ‘random experiment’, we can nevertheless assign probabilities to possible values of $θ$, representing our uncertain prior beliefs about what $θ$ is likely to be. So let $θ_1, θ_2, θ_3, …$ instead be the events that the true $θ$ equals each of those possible values.

We then proceed similarly to before – in the event that $θ_i$ is the true value of $θ$, $\mathbf{x}$ will have a particular sampling distribution, and we generate confidence intervals $C$, and compute $\Pr(θ ∈ C | θ_i)$. The calculation is exactly the same as $\Pr_{θ_i}(θ ∈ C)$ in the relative frequency interpretation, but in this case it can be directly interpreted as a conditional probability, because $θ_i$ is a random event. The overall experiment is depicted in the following tree diagram:

[Figure: probability tree diagram of the overall experiment]

We then require that $\Pr(θ ∈ C | θ_i) ≥ 0.95$ for all $θ_i$, so that $C$ is a 95% confidence interval for $θ$. We can then reason as before and say that, ‘no matter what $θ$ in fact truly happens to be’, $\Pr(θ ∈ C) ≥ 0.95$ for the true value of $θ$.⁵ Again, in frequentist terms, the coverage probability of $C$ is ≥95%.
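
The two-stage experiment can also be simulated directly: first pick which possible value is the true one according to a prior, then draw a sample and generate the interval. The sketch below is illustrative only and makes the same kind of assumptions as any toy model: normal observations with known standard deviation, the textbook z-interval, three hypothetical candidate values, and two hypothetical priors. The function names are mine.

```python
import random
import statistics

def z_interval(sample, sigma, z=1.96):
    """Approximate 95% confidence interval for a normal mean with known sigma."""
    mean = statistics.fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return mean - half_width, mean + half_width

def marginal_coverage(thetas, prior, rng, sigma=1.0, n=10, trials=20000):
    """Estimate the prior probability that the generated interval
    contains the true parameter, averaging over the prior."""
    hits = 0
    for _ in range(trials):
        # First branch of the tree: which possible value is the true one
        theta = rng.choices(thetas, weights=prior)[0]
        # Second branch: the observations drawn given that value
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= theta <= hi
    return hits / trials

rng = random.Random(1)  # fixed seed so the run is repeatable
thetas = [0.0, 0.5, 1.0]  # hypothetical possible values
flat = marginal_coverage(thetas, [1 / 3, 1 / 3, 1 / 3], rng)
skewed = marginal_coverage(thetas, [0.9, 0.05, 0.05], rng)
print(f"flat prior:   {flat:.3f}")
print(f"skewed prior: {skewed:.3f}")
```

Both estimates should land close to 0.95, because the conditional coverage is (about) 95% on every branch and so the prior weights make no difference to the average.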

It is apt to reiterate the focus of the frequentist coverage probability. Because $\Pr(θ ∈ C | θ_i) ≥ 0.95$ for any $θ_i$, then $\Pr(θ ∈ C) ≥ 0.95$ for the true value of $θ$, which is what will actually happen. If, say, the true value of $θ$ is in fact $θ_3$, we condition on $θ_3$ and ask the probability that $θ ∈ C$ given $θ_3$. This is represented in the following diagram by the shaded blue area as a fraction of the dashed area:

[Figure: probability tree diagram, with the coverage probability given $θ_3$ shaded in blue]

Credible probability of confidence interval

Still using the subjective interpretation of probability, once we perform the experiment, we realise a particular confidence interval $c$. But even though $c$ is now fixed, $θ_1, θ_2, θ_3, …$ are random events, representing our uncertainty about the true value of the parameter. Therefore we can ask the probability – our degree of belief – that $c$ contains the true parameter, based on how likely we think each possible value of the true parameter is. This is the Bayesian credible probability of the particular interval $c$.

Let's suppose we generated the confidence interval $c_3$. Referring to the tree diagram, note that we are only concerned with the branches which result in generating $c_3$. We of course do not care about $\Pr(θ ∈ c_3)$ given those events where we might have generated, say, $c_1$ – it is of course unlikely the true parameter lies within one of the hypothetical confidence intervals which we did not generate! We care about $\Pr(θ ∈ c_3)$ given that we generated $c_3$ as the confidence interval. So in other words, we condition on $c_3$ and ask the probability that $θ ∈ C$ given $c_3$. This is represented in the following diagram by the shaded green area as a fraction of the dashed area:

[Figure: probability tree diagram, with the credible probability given $c_3$ shaded in green]

Clearly, this is quite different to the diagram corresponding to the frequentist coverage probability.

In symbolic terms, we seek $\Pr(θ ∈ c_3 | c_3)$. This is simply the sum,⁶ over all elements $θ_i ∈ c_3$, of the probability that $θ_i$ is the true $θ$:

$$\Pr(θ ∈ c_3 | c_3) = \sum_{θ_i ∈ c_3} \Pr(θ_i | c_3)$$

By Bayes' theorem:

$$\Pr(θ ∈ c_3 | c_3) = \sum_{θ_i ∈ c_3} \frac{\Pr(c_3 | θ_i) \Pr(θ_i)}{\Pr(c_3)}$$

Immediately, we see that this quantity will depend on $\Pr(θ_i)$, the prior probability (i.e. not conditioned on the generated confidence interval, and so not conditioned on the data we observed) that $θ_i$ is the true $θ$.⁷
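
To make the dependence on the prior concrete, here is a small discrete sketch. Everything in it is a hypothetical toy of my own choosing rather than anything from the references: the parameter is the success probability of 10 Bernoulli trials, restricted to five possible values, and the confidence procedure is built by collecting, for each possible value, its most probable outcomes until they total at least 95%, then reporting every value whose collection contains the observed outcome. On such a grid the result is strictly a confidence set rather than an interval. The code then evaluates the Bayes' theorem sum above for one realised set under two different priors.

```python
from math import comb

# Hypothetical toy model: theta is the success probability of N Bernoulli
# trials, restricted to a small grid of possible values.
THETAS = [0.1, 0.3, 0.5, 0.7, 0.9]
N = 10

def pmf(x, theta):
    """Pr(x successes out of N | theta), the binomial sampling distribution."""
    return comb(N, x) * theta**x * (1 - theta) ** (N - x)

def acceptance_region(theta, level=0.95):
    """Most probable outcomes under theta, added until they total >= level."""
    region, total = set(), 0.0
    for x in sorted(range(N + 1), key=lambda k: pmf(k, theta), reverse=True):
        if total >= level:
            break
        region.add(x)
        total += pmf(x, theta)
    return region

REGIONS = {theta: acceptance_region(theta) for theta in THETAS}

def confidence_set(x):
    """Confidence procedure: every theta_i whose acceptance region contains x."""
    return tuple(theta for theta in THETAS if x in REGIONS[theta])

def credible_probability(c, prior):
    """Pr(theta in c | c) by Bayes' theorem, for a prior {theta_i: Pr(theta_i)}."""
    def pr_c_given(theta):
        # Pr(c | theta_i): total probability of the outcomes that generate c
        return sum(pmf(x, theta) for x in range(N + 1) if confidence_set(x) == c)
    joint = {theta: pr_c_given(theta) * prior[theta] for theta in THETAS}
    pr_c = sum(joint.values())  # Pr(c), which itself depends on the prior
    return sum(joint[theta] for theta in c) / pr_c

c = confidence_set(2)  # the realised confidence set after observing x = 2
uniform = {theta: 0.2 for theta in THETAS}
skewed = {0.1: 0.01, 0.3: 0.01, 0.5: 0.01, 0.7: 0.96, 0.9: 0.01}

p_uniform = credible_probability(c, uniform)
p_skewed = credible_probability(c, skewed)
print(f"realised set: {c}")
print(f"credible probability, uniform prior: {p_uniform:.3f}")
print(f"credible probability, skewed prior:  {p_skewed:.3f}")
```

In this toy, the same realised set has a credible probability of about 0.99 under the uniform prior but only about 0.48 under the prior that strongly favours 0.7, even though the procedure has at least 95% coverage for every possible value.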

Conclusion

This demonstrates that the credible probability of a confidence interval is not, in general, equal to the coverage probability. In other words, after gathering observations, there is not necessarily a 95% probability that the true parameter lies within the 95% confidence interval which has been generated. This is not (necessarily) because ‘it either lies within the interval or not’, but because the credible probability additionally requires specifying a prior distribution on the target parameter.

References

[1] Hoekstra R, Morey RD, Rouder JN, Wagenmakers EJ. Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review. 2014 Oct 1; 21(5): 1157–64. doi: 10.3758/s13423-013-0572-3

[2] Hájek A. ‘Interpretations of probability’. In: Zalta EN, ed. The Stanford encyclopedia of philosophy. 2019 Aug 28 [cited 2022 Sep 12]. https://plato.stanford.edu/entries/probability-interpret/

[3] Casella G, Berger RL. Statistical inference. 2nd ed. California: Duxbury; 2001.

[4] Wackerly DD, Mendenhall W III, Scheaffer RL. Mathematical statistics with applications. 7th ed. California: Duxbury; 2008.

Footnotes

  1. Hoekstra et al. [1] present 6 asserted statements, of which this is statement 4. I agree statements 1, 4 and 6 are manifestly incorrect assertions. In my view, statements 2, 3 and 5 are so vague as to be at least debatable. But that is neither here nor there. 

  2. This is essentially the reason given by Hoekstra et al. [1], that the assertion ‘assign[s] [a] probabilit[y] to [a] parameter[], something that is not allowed within the frequentist framework’. 

  3. For example, introductory textbooks like Casella & Berger [3] and Wackerly et al. [4] adopt this interpretation with little discussion. One could say the relative frequency interpretation and frequentist school of inference are alike in philosophy, in that they both focus prospectively on the future properties of a large number of experiments. 

  4. See e.g. [3 p. 418]

  5. But in the subjective interpretation, since the $θ_i$ are events and are mutually exclusive, we can also directly compute $\Pr(θ ∈ C)$ as $\Pr(θ ∈ C) = \sum_{θ_i} \Pr(θ ∈ C | θ_i) \Pr(θ_i)$. This represents the prior probability, across all possible values of $θ$ according to how likely we believe they are, that the generated confidence interval will contain the true $θ$. And since $\Pr(θ ∈ C | θ_i) ≥ 0.95$ for all $θ_i$, $\Pr(θ ∈ C) ≥ 0.95$ necessarily.

  6. For simplicity, we present the discrete case. In the continuous case, we replace sums of probabilities with integrals over probability densities. 

  7. The prior also sneaks into the calculation via the $\Pr(c_3)$ term. A more detailed discussion can be found at https://stats.stackexchange.com/a/89363