Bayesian biostatistics procedures matching frequentist confidence intervals

Warning! I am not a statistician. This article is not reviewed. Please confer with a responsible adult!

Confidence intervals are commonly misinterpreted as there being, after observing the data, a 95% probability that the true parameter lies within the confidence interval. The usual explanation why this is incorrect is that the true parameter is not random, and so is either inside or outside the confidence interval. This explanation holds in the ‘relative likelihood’ interpretation of probability associated with frequentist statistics.

However, as I have discussed previously, in the ‘subjective’ interpretation of probability associated with Bayesian statistics, we can assign a probability to the true parameter lying within a given interval. To do so implies that we are thinking of a particular likelihood function for the data, and a prior that would allow us to assign a probability to the true parameter lying within the interval before observing any data.

It is relevant to ask, then, if we interpret a frequentist confidence interval of level $α$ as a Bayesian credible interval of level $α$, what prior does this imply? Equivalently, can we construct Bayesian procedures whose credible intervals exactly match the standard frequentist confidence intervals, for all levels $α$?

Mean of normal distribution with known variance

Consider a normally distributed variable with known variance, $X \sim \mathcal{N}(μ, σ^2)$. In frequentist inference, the usual procedure to construct a confidence interval on the mean of this variable is based on the z test, which can be constructed based on pivotal quantities.

For a vector of observations $\boldsymbol{x}$ sampled from $X$, the sample mean $\bar{x}$ is a statistic, a function of the observations [1 p. 211]. We know that the sampling distribution of $\bar{x}$ is $\mathcal{N}(μ, σ^2/n)$. The random variable $z = \frac{\bar{x} - μ}{σ/\sqrt{n}}$, then, has distribution $\mathcal{N}(0, 1)$. Because this distribution does not depend on the unknown parameter $μ$, $z$ is a pivotal quantity [1 p. 427].

Working backwards, we know then that $-1.96 ≤ z ≤ 1.96$ with probability 0.95, so $\bar{x} - 1.96σ/\sqrt{n} ≤ μ ≤ \bar{x} + 1.96σ/\sqrt{n}$ with probability 0.95, and so $\bar{x} ± 1.96σ/\sqrt{n}$ is a 95% confidence interval for $μ$. Equivalently, the confidence interval is the 2.5th to 97.5th percentiles for the sampling distribution of $\bar{x}$.

In the Bayesian approach, we take our prior $p(μ)$ and update it based on the likelihood of the observations $\boldsymbol{x}$, which is the product of the likelihood of each observation $x_1, x_2, …, x_n$. Since $X$ has a normal distribution with known variance:

\[\begin{align*} p(\boldsymbol{x} | μ) &\propto \prod_{i=1}^n \frac{1}{σ} \exp \left[ - \frac{(x_i - μ)^2}{2σ^2} \right] \\ &= \frac{1}{σ} \exp \left[ - \frac{\sum_{i=1}^n (x_i - μ)^2}{2σ^2} \right] \\ &= \frac{1}{σ} \exp \left[ - \frac{\sum_{i=1}^n x_i{}^2 - \sum_{i=1}^n 2 x_i μ + \sum_{i=1}^n μ^2}{2σ^2} \right] \end{align*}\]

Since the observations $\boldsymbol{x}$ are at this point fixed, $\sum_{i=1}^n x_i{}^2$ is simply a constant and so can be factored out up to proportionality:

\[\begin{align*} p(\boldsymbol{x} | μ) &\propto \frac{1}{σ} \exp \left[ - \frac{- \sum_{i=1}^n 2 x_i μ + \sum_{i=1}^n μ^2}{2σ^2} \right] \\ &= \frac{1}{σ} \exp \left[ - \frac{nμ^2 - 2μ\sum x_i}{2σ^2} \right] \\ &= \frac{1}{σ} \exp \left[ - \frac{μ^2 - 2μ\bar{x}}{2σ^2/n} \right] &&\text{(divide by $n$)} \\ &= \frac{1}{σ} \exp \left[ - \frac{(μ - \bar{x})^2 - \bar{x}^2}{2σ^2/n} \right] &&\text{(complete the square)} \\ &\propto \frac{1}{σ} \exp \left[ - \frac{(μ - \bar{x})^2}{2σ^2/n} \right] &&\text{(since $\bar{x}$ is fixed)} \\ &\propto \frac{1}{σ/\sqrt{n}} \exp \left[ - \frac{(μ - \bar{x})^2}{2σ^2/n} \right] &&\text{(since $n$ is fixed)} \end{align*}\]

Curiously, the right-hand side is proportional to the probability density in the frequentist approach of $\bar{x} \sim \mathcal{N}(μ, σ^2/n)$. So if we weight this likelihood by a uniform prior $p(μ) \propto 1$, then the Bayesian posterior distribution exactly equals the frequentist sampling distribution of $\bar{x}$ [2]. A Bayesian equal-tailed credible interval of level $α$, then, matches exactly the conventional frequentist confidence interval of level $α$.

To summarise, the Bayesian equal-tailed credible interval matches the conventional frequentist confidence interval (for all levels $α$) when the Bayesian posterior distribution exactly equals the frequentist sampling distribution. (Note that this equality has the peculiar philosophical property that the Bayesian posterior distribution is a distribution for the parameter $μ$, but the frequentist sampling distribution is a distribution for the estimator $\bar{x}$!)

Example

observations = stats.norm(5, 1).rvs(30, random_state=12345)

# Frequentist
ci = stats.norm(observations.mean(), 1/np.sqrt(observations.size)).interval(0.95)
print('Frequentist CI: [{:.2f}, {:.2f}]'.format(*ci))

# Bayesian
with pm.Model() as model:
	mean = pm.Flat('mean')
	obs = pm.Normal('obs', mu=mean, sigma=1, observed=observations)
	idata = pm.sample(...)
ci = idata.posterior['mean'].quantile([0.025, 0.975]).data
print('Bayesian CI: [{:.2f}, {:.2f}]'.format(*ci))

Frequentist CI: [4.98, 5.70]
Bayesian CI: [4.98, 5.70]

Posterior density for mean

Mean of normal distribution with unknown variance

Of course, the more common estimation problem in biostatistics is when $σ^2$ is unknown and the random variable has unknown variance. The usual frequentist procedure is again to use pivotal quantities, to perform a t test.

Let $t = \frac{\bar{x} - μ}{s/\sqrt{n}}$, where $S = \sum (x_i - \bar{x})^2$, and $s^2 = \frac{S}{n - 1}$, the unbiased estimator of the population variance. We know that $t$'s sampling distribution is a t distribution with $n - 1$ degrees of freedom. Equivalently, $\bar{x}$ follows an unstandardised t distribution with mean $μ$, scale parameter $s/\sqrt{n}$, and $n-1$ degrees of freedom, and the confidence interval for $μ$ is the 2.5th to 97.5th percentiles for this sampling distribution [1 p. 429].

In the Bayesian approach, the likelihood is again $p(\boldsymbol{x} | μ, σ^2) \propto \frac{1}{σ/\sqrt{n}} \exp \left[ - \frac{(μ - \bar{x})^2}{2σ^2/n} \right]$. The conditional posterior distribution for $μ$ is:

\[\begin{align*} p(μ | σ^2, \boldsymbol{x}) &\propto p(\boldsymbol{x} | μ, σ^2) p(μ | σ^2) \end{align*}\]

Assuming as before a uniform prior $p(μ) \propto 1$, and further assuming that $μ$ and $σ^2$ are independent a priori, $p(μ | σ^2) = p(μ) \propto 1$, so as before:

\[\begin{align*} p(μ | σ^2, \boldsymbol{x}) &\propto \frac{1}{σ/\sqrt{n}} \exp \left[ - \frac{(μ - \bar{x})^2}{2σ^2/n} \right] \\ \therefore\qquad μ | σ^2, \boldsymbol{x} &\sim \mathcal{N}(μ, σ^2/n) \end{align*}\]

But unlike in the case of a known mean, this is not the desired (marginal) posterior density for $μ$, as it is conditional on $σ^2$, so we must first obtain the posterior for $σ^2$.

By Cochran's theorem, we know that $S \sim σ^2 χ^2_{n-1}$ [1 p. 218], so in terms of likelihood:

\[\begin{align*} p(S | σ^2) &\propto \frac{1}{σ^2} \left(\frac{S}{σ^2}\right)^{\frac{n-1}{2} - 1} \exp \left(-\frac{S}{2σ^2}\right) \end{align*}\]

Again assuming $μ$ and $σ^2$ are independent a priori, the posterior distribution for $σ^2$ is:

\[\begin{align*} p(σ^2 | \boldsymbol{x}) = p(σ^2 | S) &\propto p(S | σ^2) p(σ^2) \\ &\propto \frac{1}{σ^2} \left(\frac{S}{σ^2}\right)^{\frac{n-1}{2} - 1} \exp \left(-\frac{S}{2σ^2}\right) \cdot p(σ^2) \\ &\propto \frac{1}{σ^2} \left(\frac{1}{σ^2}\right)^{\frac{n-1}{2} - 1} \exp \left(-\frac{S}{2σ^2}\right) \cdot p(σ^2) &&\text{(since $S$ is fixed)} \\ &= \left(\frac{1}{σ^2}\right)^{\frac{n-1}{2}} \exp \left(-\frac{S}{2σ^2}\right) \cdot p(σ^2) \end{align*}\]

So our conditional posterior $μ | σ^2, \boldsymbol{x} \sim \mathcal{N}(μ, σ^2/n)$ must be marginalised over $σ^2 | \boldsymbol{x}$, whose density is given by the above expression which depends on the prior $p(σ^2)$. In order for credible intervals for $μ$ to match the frequentist confidence intervals, we require the marginal posterior distribution for $μ$ to follow the unstandardised t distribution obtained from the frequentist method, with scale parameter $s/\sqrt{n}$ and $n-1$ degrees of freedom. It can be shown that this happens when $σ^2 | \boldsymbol{x} \sim \text{Inverse-Gamma}(\frac{n-1}{2}, \frac{S}{2})$ [3 p. 520].

Letting the prior $p(σ^2) \propto 1/σ^2$, we obtain:

\[\begin{align*} p(σ^2 | \boldsymbol{x}) &\propto \left(\frac{1}{σ^2}\right)^{\frac{n-1}{2} + 1} \exp \left(-\frac{S}{2σ^2}\right) \end{align*}\]

which gives $σ^2 | \boldsymbol{x}$ the required inverse-gamma distribution.

To summarise, assuming a uniform prior on the mean, $p(μ) \propto 1$, and an uninformative prior on the variance, $p(σ^2) \propto 1/σ^2$, the marginal posterior distribution for $μ$ is an unstandardised t distribution with mean $μ$, scale parameter $s/\sqrt{n}$, and $n-1$ degrees of freedom, which exactly equals the frequentist sampling distribution of $\bar{x}$. So, again, a Bayesian equal-tailed credible interval of level $α$ matches exactly the conventional frequentist confidence interval of level $α$.

Example

observations = stats.norm(5, 1).rvs(30, random_state=12345)

# Frequentist
ci = stats.t(loc=observations.mean(), scale=stats.sem(observations), df=observations.size-1).interval(0.95)
print('Frequentist CI: [{:.2f}, {:.2f}]'.format(*ci))

# Bayesian
with pm.Model() as model:
	mean = pm.Flat('mean')
	sigmasq = pm.DensityDist('sigmasq', logp=lambda x: 1/x)
	obs = pm.Normal('obs', mu=mean, sigma=np.sqrt(sigmasq), observed=observations)
	idata = pm.sample(...)
ci = idata.posterior['mean'].quantile([0.025, 0.975]).data
print('Bayesian CI: [{:.2f}, {:.2f}]'.format(*ci))

Frequentist CI: [4.94, 5.73]
Bayesian CI: [4.94, 5.73]

Posterior density for mean

Risk ratio for cohort study

Let's turn now away from the continuous case to the opposite scenario of inference on binary outcomes. Consider 2 groups, the control group of size $N_0$ and the exposed group of size $N_1$. Suppose each member of the control group has probability $π_0$ of becoming a case (developing the outcome being studied), and likewise $π_1$ for the exposed group. If $n_0$ and $n_1$ are the number of cases in each group, then we have:

\[\begin{align*} n_0 &\sim \mathcal{B}(N_0, π_0) \\ n_1 &\sim \mathcal{B}(N_1, π_1) \end{align*}\]

The effect size of interest is the risk ratio (relative risk), $ρ = π_1/π_0$. The frequentist procedure for inference on the risk ratio is well known, so what follows is a high-level tour of just the important parts for this exercise. Frequentist inference proceeds by estimating the risk ratio with:

\[\begin{align*} \hat{ρ} &= \frac{n_1 / N_1}{n_0 / N_0} \\ \log\hat{ρ} &= \log(n_1 / N_1) - \log(n_0 / N_0) \\ &= \log n_1 - \log n_0 - \log N_1 + \log N_0 \end{align*}\]

Since $N_0$ and $N_1$ are fixed in a cohort study, $\log\hat{ρ}$ is then essentially the sum of 2 random variables plus some constant. By some central limit theorem hand-waving, then, $\log\hat{ρ}$ is approximately normally distributed, with variance:

\[\begin{align*} \mathrm{var}[\log\hat{ρ}] &≈ \mathrm{var}[\log n_0] + \mathrm{var}[\log n_1] \end{align*}\]

We can approximate the standard deviation of $\log n_0$ and $\log n_1$ by Taylor series expansion [1 p. 242]:

\[\begin{align*} \mathrm{var}[f(x)] &\approx \left(f'(\mathrm{E}[x])\right)^2 \mathrm{var}[x] \\ \mathrm{var}[\log n_0] &\approx \left(\frac{1}{\mathrm{E}[n_0]}\right)^2 \mathrm{var}[n_0] \\ &= \left(\frac{1}{N_0 π_0}\right)^2 N_0 π_0 (1 - π_0) &&\text{(since $n_0 \sim \mathcal{B}(N_0, π_0)$)} \\ &= \frac{1 - π_0}{N_0 π_0} \end{align*}\]

And estimated from the observations:

\[\begin{align*} \mathrm{var}[\log n_0] &\approx \frac{N_0 - n_0}{N_0 n_0} \end{align*}\]

So the standard deviation of $\log\hat{ρ}$ is approximately:

\[\begin{align*} σ_{\log\hat{ρ}} &≈ \sqrt{\frac{N_0 - n_0}{N_0 n_0} + \frac{N_1 - n_1}{N_1 n_1}} \end{align*}\]

We then model the sampling distribution of $\log\hat{ρ}$ as $\mathcal{N}\left(\log ρ, \frac{N_0 - n_0}{N_0 n_0} + \frac{N_1 - n_1}{N_1 n_1}\right)$, and the confidence interval for $\log ρ$ is the 2.5th to 97.5th percentiles for this sampling distribution.

In order for a Bayesian procedure to match the frequentist confidence intervals, the posterior distribution for $\log ρ$ must match this normal sampling distribution. From before, we know we obtain a normal posterior distribution when estimating the mean of a normally distributed variable with known variance and uniform prior mean. This is equivalent to having $\log ρ ≈ \log(\hat{π}_1/\hat{π}_0) = \log n_1 - \log n_0 - \log N_1 + \log N_0$, and assuming $\log n_0$ and $\log n_1$, the observed number of cases in each group, are normally distributed each with known variance and uniform prior mean.

We can see immediately that this is a bizarre choice of model from a Bayesian perspective, as the observed cases could then exceed the number of patients in that group! Indeed, having a uniform mean on the log scale suggests that we believe a priori it is quite plausible to observe more cases than there are patients. The normal approximation is unusual even from the frequentist standpoint – we would preferably approximate $n_0$ and $n_1$, the binomially distributed variables, with a normal distribution, not their logarithms.

Additionally, the ‘known’ variance we must specify for $\log n_0$ and $\log n_1$ must be $\frac{N_0 - n_0}{N_0 n_0} ≈ \frac{1 - π_0}{N_0 π_0}$ and $\frac{N_1 - n_1}{N_1 n_1} ≈ \frac{1 - π_1}{N_1 π_1}$ in order to match the frequentist result. This has the flavour of an empirical Bayesian method, where the actual observations are used to set parameters in the model. However, it is again a bizarre choice, since for fixed $N_0$ and $N_1$, these expressions are totally determined by $π_0$ and $π_1$, so if you think you ‘know’ the variance, you must surely also know $π_0$ and $π_1$ anyway.

Example

N0 = 750
n0 = 7
N1 = 250
n1 = 12

# Frequentist
ci = stats.norm(np.log((n1/N1)/(n0/N0)), np.sqrt((N0-n0)/(N0*n0) + (N1-n1)/(N1*n1))).interval(0.95)
print('Frequentist CI: [{:.2f}, {:.2f}]'.format(*np.exp(ci)))

# Bayesian
with pm.Model() as model:
	log_n0_mean = pm.Flat('log_n0_mean')
	log_n1_mean = pm.Flat('log_n1_mean')
	RR = pm.Deterministic('RR', (np.exp(log_n1_mean)/N1) / (np.exp(log_n0_mean)/N0))
	log_n0 = pm.Normal('log_n0', mu=log_n0_mean, sigma=np.sqrt((N0-n0)/(N0*n0)), observed=np.log(n0))
	log_n1 = pm.Normal('log_n1', mu=log_n1_mean, sigma=np.sqrt((N1-n1)/(N1*n1)), observed=np.log(n1))
	idata = pm.sample(...)
ci = idata.posterior['RR'].quantile([0.025, 0.975]).data
print('Bayesian CI: [{:.2f}, {:.2f}]'.format(*ci))

Frequentist CI: [2.05, 12.92]
Bayesian CI: [2.05, 12.92]

Posterior density for risk ratio

We can see how this compares with the standard Bayesian procedure, where we assume a uniform prior on [0, 1] for $π_0$ and $π_1$, with $n_0 \sim \mathcal{B}(N_0, π_0)$ and $n_1 \sim \mathcal{B}(N_1, π_1)$. This results in beta posterior distributions for $π_0$ and $π_1$, and so the posterior distribution for $ρ$ is the ratio of 2 independent beta distributions (blue). This posterior distribution is a little more concentrated than the frequentist-matching posterior (orange) based on approximations:

Standard Bayesian posterior density for risk ratio

Conclusion

We have demonstrated Bayesian procedures for three inference problems, where the Bayesian equal-tailed credible intervals exactly match the frequentist confidence intervals – for estimating the mean of a normally distributed variable with known variance, for estimating the mean of a normally distributed variable with unknown variance, and for estimating a risk ratio from a cohort study.

For the first 2 problems, the frequentist confidence interval corresponds with a reasonable Bayesian procedure. This suggests that, for these problems, it is in some sense reasonable to interpret a frequentist confidence interval as a credible interval (given the appropriate model and uninformative prior). However, in the third problem, the matching ‘Bayesian’ procedure is bizarre by Bayesian standards, and to interpret the confidence interval as a credible interval seems less appropriate.

References

[1] Casella G, Berger RL. Statistical inference. 2nd ed. California: Duxbury; 2001.

[2] See e.g. Rigollet P. Bayesian statistics. Cambridge (MA): Massachusetts Institute of Technology; 2017. MIT 18.650 Statistics for Applications, Fall 2016; lecture 17. https://youtu.be/bFZ-0FH5hfs?t=3728. From 1:02:08.

[3] Jackman S. Bayesian analysis for the social sciences. West Sussex: John Wiley & Sons; 2009.