-
A high-performance Rust implementation of the Turnbull non-parametric maximum likelihood estimator for interval-censored survival data
A common need in biostatistics is to estimate survival curves, but particular difficulty arises when observations are interval censored, i.e. the time of event is not observed exactly, but is known only to fall within a particular interval. In this setting, the Turnbull estimator [1]…
-
A high-performance Rust implementation of interval-censored Cox regression
Cox proportional hazards models are commonly used in biostatistics for modelling time-to-event data, such as mortality or disease progression. A difficulty in applying Cox models arises when observations are interval censored, i.e. the time of event is not observed exactly, but is known…
-
SciPy distribution for the odds ratio of independent beta variables
In biostatistics, a common effect measure when considering dichotomous exposures and outcomes is the odds ratio. With two proportions $π_0$ and $π_1$, the odds ratio is $ψ = \frac{π_1 / (1 - π_1)}{π_0 / (1 - π_0)}$, as compared to the risk ratio,…
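To make the definition of ψ concrete, here is a small Python sketch comparing it with the risk ratio $π_1 / π_0$. The function names are illustrative and not taken from the post. With a rare outcome the two measures nearly agree, while with a common outcome the odds ratio lies further from 1 than the risk ratio.

```python
def odds(p: float) -> float:
    """Odds corresponding to a probability p, for 0 <= p < 1."""
    return p / (1 - p)


def odds_ratio(pi_0: float, pi_1: float) -> float:
    """psi = [pi_1 / (1 - pi_1)] / [pi_0 / (1 - pi_0)]."""
    return odds(pi_1) / odds(pi_0)


def risk_ratio(pi_0: float, pi_1: float) -> float:
    """Ratio of the two proportions, pi_1 / pi_0."""
    return pi_1 / pi_0


# Rare outcome: the odds ratio is close to the risk ratio
print(odds_ratio(0.01, 0.02), risk_ratio(0.01, 0.02))
# Common outcome: the two measures diverge
print(odds_ratio(0.2, 0.4), risk_ratio(0.2, 0.4))
```

For the rare outcome the odds ratio is about 2.02 against a risk ratio of 2. For the common outcome it is about 2.67 against 2.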
-
Directly computing HDIs from PDFs in SciPy
In Bayesian inference, it is often desired to calculate credible intervals for model parameters. The 2 common choices are the highest posterior density interval (HPD/HDI) and the equal-tailed interval. In many cases, the posterior density must be estimated by simulation, but in some cases the…
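As a point of comparison, one common way to get an HDI from a frozen SciPy distribution is to search for the shortest interval holding 95% probability. This works because, for a unimodal density, the shortest such interval is the HDI. This sketch uses the quantile function (`ppf`) rather than the PDF, so it is not necessarily the approach taken in the post. The `hdi` helper and the Beta(3, 10) example posterior are assumptions for illustration.

```python
from scipy import optimize, stats


def hdi(dist, mass=0.95):
    """Shortest interval holding `mass` probability for a frozen, continuous
    SciPy distribution. Assumes the density is unimodal, in which case the
    shortest interval is the highest density interval."""

    def width(lower_tail):
        # Width of the interval from the `lower_tail` quantile
        # to the `lower_tail + mass` quantile
        return dist.ppf(lower_tail + mass) - dist.ppf(lower_tail)

    # Search over how much probability to leave in the lower tail
    res = optimize.minimize_scalar(width, bounds=(0.0, 1.0 - mass), method="bounded")
    return dist.ppf(res.x), dist.ppf(res.x + mass)


posterior = stats.beta(3, 10)  # illustrative right-skewed posterior
print("HDI:         ", hdi(posterior))
print("Equal-tailed:", posterior.interval(0.95))
```

For a skewed posterior like this one, the HDI shifts towards the mode and is narrower than the equal-tailed interval. For a symmetric distribution the two coincide.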
-
Beta ratio distribution for SciPy
The quotient of 2 independent beta-distributed random variables has a known distribution, but its closed-form expression is a little hairy [1, 2]. One Python implementation of this distribution is available from Julian Saffer [3], but it suffers from some numerical issues…
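Any implementation of this distribution can be sanity-checked without the closed form. For independent $X \sim \mathrm{Beta}(a_1, b_1)$ and $Y \sim \mathrm{Beta}(a_2, b_2)$, we have $P(X/Y \le r) = P(X \le rY) = \int_0^1 F_X(ry) f_Y(y)\,dy$. The sketch below evaluates this integral numerically and compares it with simulation. It is not the closed-form expression from [1, 2]. The `beta_ratio_cdf` helper and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy import integrate, stats


def beta_ratio_cdf(r, a1, b1, a2, b2):
    """P(X / Y <= r) for independent X ~ Beta(a1, b1), Y ~ Beta(a2, b2), r >= 0.
    Uses P(X <= rY) = E[F_X(rY)], integrated against the density of Y."""
    x_dist, y_dist = stats.beta(a1, b1), stats.beta(a2, b2)
    # F_X(r * y) is clamped to 1 by SciPy when r * y exceeds 1
    value, _abserr = integrate.quad(lambda y: x_dist.cdf(r * y) * y_dist.pdf(y), 0.0, 1.0)
    return value


# Monte Carlo cross-check with illustrative parameters
rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=200_000)
y = rng.beta(3.0, 4.0, size=200_000)
r = 0.8
p_integral = beta_ratio_cdf(r, 2.0, 5.0, 3.0, 4.0)
p_simulated = np.mean(x / y <= r)
print(p_integral, p_simulated)
```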
-
Bayesian biostatistics procedures matching frequentist confidence intervals
Confidence intervals are commonly misinterpreted as meaning that, after observing the data, there is a 95% probability that the true parameter lies within the confidence interval. The usual explanation of why this is incorrect is that the true parameter is not random, and so is either inside or…
-
On the credible probability of confidence intervals
Confidence intervals are commonly misinterpreted by consumers of statistics. Hoekstra et al. [1] presented 120 psychology researchers and 442 students with ‘a fictitious scenario of a professor who conducts an experiment and reports a 95% CI for the mean that ranges from 0.1 to…
-
Quasi-likelihood gamma regression in statsmodels for zeroes in observations
Generalised linear models with a gamma distribution and log link are frequently used to model non-negative right-skewed continuous data, such as costs [1].
For example, in statsmodels:
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
# …
-
Robust Poisson regression in medical biostatistics
Log-binomial and robust (modified) Poisson regression are common approaches to estimating risk ratios in medical biostatistics [1].
I have discussed log-binomial regression in a previous post about generalised linear models. The conceptual basis for using log-binomial regression to estimate risk ratios is straightforward –…
-
Generalised linear models for medical biostatistics
Recently, I've been doing some statistical analysis using log-binomial generalised linear models (GLMs). Resources on the topic seem to fall largely into 2 categories:
-
Assume you want to know none of the background: ‘Use a log-binomial GLM if you want a risk ratio.’1
-
Assume
-