Directly computing HDIs from PDFs in SciPy

Warning! I am not a statistician. This article is not reviewed. Please confer with a responsible adult!

In Bayesian inference, it is often desired to calculate credible intervals for model parameters. The 2 common choices are the highest posterior density interval (HPD/HDI), and the equal-tailed interval. In many cases, the posterior density must be estimated by simulation, but in some cases the posterior density has a known closed-form expression, which enables these intervals to be directly computed.

In SciPy, equal-tailed intervals are easily computed from known distributions, via the interval method on an rv_continuous, e.g. stats.norm.interval(confidence=0.95, loc=0, scale=1).

Conversely, SciPy does not have a simple method to compute the highest density interval for a known distribution. The PyMC Bayesian modelling library has functionality for computing the HDI (arviz.hdi), but from samples from the posterior, not directly from the PDF.

We can leverage SciPy's numerical optimisation and root-finding functions to compute the highest density interval directly from the distribution. We can achieve this by solving for the narrowest interval which covers 95% of the distribution (or some other desired level). The following code snippet achieves this:

from scipy import optimize

def hdi(distribution, level=0.95):
	"""
	Get the highest density interval for the distribution, e.g. for a Bayesian posterior, the highest posterior density interval (HPD/HDI)
	"""
	
	# For a given lower limit, we can compute the corresponding 95% interval
	def interval_width(lower):
		upper = distribution.ppf(distribution.cdf(lower) + level)
		return upper - lower
	
	# Find such interval which has the smallest width
	# Use equal-tailed interval as initial guess
	initial_guess = distribution.ppf((1-level)/2)
	optimize_result = optimize.minimize(interval_width, initial_guess)
	
	lower_limit = optimize_result.x[0]
	width = optimize_result.fun
	upper_limit = lower_limit + width
	
	return (lower_limit, upper_limit)

For example, using this function to compute a highest posterior density interval, for a beta-binomial model with uniform prior:

from scipy import stats

n, N = 12, 250
prior_a, prior_b = 1, 1

distribution = stats.beta(n + prior_a, N - n + prior_b)

print(hdi(distribution, 0.95))  # -> (0.025943479765227942, 0.07930059177617696)

We can confirm this matches the highest posterior density interval computed by the R binom.bayes function:

> library(binom)
> binom.bayes(12, 250, conf.level=0.95, type='highest', prior.shape1=1, prior.shape2=1)
  method  x   n shape1 shape2      mean      lower     upper  sig
1  bayes 12 250     13    239 0.0515873 0.02594348 0.0793006 0.05

This helper function is available as part of a library here.