Directly computing HDIs from PDFs in SciPy
Warning! I am not a statistician. This article is not reviewed. Please confer with a responsible adult!
In Bayesian inference, we often want to calculate credible intervals for model parameters. The two common choices are the highest posterior density interval (HPD/HDI) and the equal-tailed interval. In many cases, the posterior density must be estimated by simulation, but in some cases it has a known closed-form expression, which allows these intervals to be computed directly.
In SciPy, equal-tailed intervals are easily computed from known distributions via the interval method on an rv_continuous, e.g. stats.norm.interval(confidence=0.95, loc=0, scale=1).
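As a small illustration of the equal-tailed case, the sketch below computes the interval for a standard normal in two ways: with the interval method, and by hand from the percent-point function (ppf). The level is passed positionally here on the assumption that this works across SciPy versions where the keyword name has differed; the variable names are only for illustration.

```python
from scipy import stats

level = 0.95

# Equal-tailed interval via the built-in method
lower, upper = stats.norm.interval(level, loc=0, scale=1)

# The same interval by hand: cut (1 - level) / 2 probability from each tail
tail = (1 - level) / 2
lower_ppf = stats.norm.ppf(tail)
upper_ppf = stats.norm.ppf(1 - tail)

print(lower, upper)
print(lower_ppf, upper_ppf)
```

Both lines should print the same pair of limits, since an equal-tailed interval is defined purely by its two tail quantiles.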
By contrast, SciPy does not have a simple method to compute the highest density interval for a known distribution. The PyMC Bayesian modelling ecosystem has functionality for computing the HDI (arviz.hdi, from the companion ArviZ library), but it works from samples from the posterior, not directly from the PDF.
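To make the sample-based approach concrete, here is a minimal NumPy sketch of one common way to estimate an HDI from draws: sort the samples and pick the narrowest window that contains the desired fraction of them. This is not ArviZ's implementation; hdi_from_samples is a hypothetical helper written only for illustration, and the beta parameters are an arbitrary example.

```python
import numpy as np
from scipy import stats


def hdi_from_samples(samples, level=0.95):
    """Estimate the HDI as the narrowest window holding `level` of the sorted samples."""
    sorted_samples = np.sort(np.asarray(samples))
    n = len(sorted_samples)
    n_included = int(np.ceil(level * n))  # number of samples inside each candidate window

    # Width of every window of n_included consecutive sorted samples
    widths = sorted_samples[n_included - 1:] - sorted_samples[:n - n_included + 1]
    start = int(np.argmin(widths))

    return sorted_samples[start], sorted_samples[start + n_included - 1]


# Illustrative usage: draws from a known beta distribution
rng = np.random.default_rng(0)
samples = stats.beta(13, 239).rvs(size=100_000, random_state=rng)
sample_lower, sample_upper = hdi_from_samples(samples, 0.95)
print(sample_lower, sample_upper)
```

The estimate carries Monte Carlo error, which is the motivation for computing the interval directly from the distribution when a closed form is available.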
We can leverage SciPy's numerical optimisation and root-finding functions to compute the highest density interval directly from the distribution. We can achieve this by solving for the narrowest interval which covers 95% of the distribution (or some other desired level); for a unimodal distribution, this narrowest interval is the HDI. The following code snippet achieves this:
from scipy import optimize

def hdi(distribution, level=0.95):
    """
    Get the highest density interval for the distribution, e.g. for a Bayesian posterior, the highest posterior density interval (HPD/HDI)
    """

    # For a given lower limit, we can compute the corresponding 95% interval
    def interval_width(lower):
        upper = distribution.ppf(distribution.cdf(lower) + level)
        return upper - lower

    # Find such interval which has the smallest width
    # Use equal-tailed interval as initial guess
    initial_guess = distribution.ppf((1-level)/2)
    optimize_result = optimize.minimize(interval_width, initial_guess)

    lower_limit = optimize_result.x[0]
    width = optimize_result.fun
    upper_limit = lower_limit + width

    return (lower_limit, upper_limit)
For example, we can use this function to compute a highest posterior density interval for a beta-binomial model with a uniform prior:
from scipy import stats
n, N = 12, 250
prior_a, prior_b = 1, 1
distribution = stats.beta(n + prior_a, N - n + prior_b)
print(hdi(distribution, 0.95)) # -> (0.025943479765227942, 0.07930059177617696)
We can confirm this matches the highest posterior density interval computed by the R binom.bayes function:
> library(binom)
> binom.bayes(12, 250, conf.level=0.95, type='highest', prior.shape1=1, prior.shape2=1)
  method  x   n shape1 shape2      mean      lower     upper  sig
1  bayes 12 250     13    239 0.0515873 0.02594348 0.0793006 0.05
This helper function is available as part of a library here.
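As a cross-check on the hdi function above, the same interval can also be found by root-finding rather than minimisation. The sketch below relies on the assumption that the distribution is unimodal with its mode in the interior of its support, in which case the density is equal at the two ends of the HDI. The name hdi_by_root and the eps margin are hypothetical choices for illustration, not part of the original helper.

```python
from scipy import optimize, stats


def hdi_by_root(distribution, level=0.95, eps=1e-9):
    """
    Find the HDI by solving pdf(lower) == pdf(upper).
    Assumes a unimodal distribution with an interior mode.
    """

    # Parameterise the interval by the probability mass p left in the lower tail
    def density_gap(p):
        lower = distribution.ppf(p)
        upper = distribution.ppf(p + level)
        return distribution.pdf(lower) - distribution.pdf(upper)

    # p must lie strictly between 0 and 1 - level; eps keeps ppf away from 0 and 1
    p_lower = optimize.brentq(density_gap, eps, 1 - level - eps)

    return (distribution.ppf(p_lower), distribution.ppf(p_lower + level))


# Same beta posterior as in the example above: n = 12, N = 250, uniform prior
root_lower, root_upper = hdi_by_root(stats.beta(13, 239), 0.95)
print(root_lower, root_upper)
```

For the beta posterior used earlier, this should land very close to the interval reported by both hdi and binom.bayes. If the mode sits on the boundary of the support (for example, an exponential distribution), the density gap does not change sign and brentq will raise an error, so the minimisation approach is the more general of the two.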