statistics

A high-performance Rust implementation of the Turnbull non-parametric maximum likelihood estimator for interval-censored survival data

29 October 2023 (updated 02 May 2024) | mathematics statistics rust

A common need in biostatistics is to estimate survival curves, but particular difficulty arises when observations are interval censored, i.e. the time of event is not observed exactly, but is known only to fall within a particular interval. In this setting, the Turnbull estimator [1]… »
A high-performance Rust implementation of interval-censored Cox regression

29 April 2023 (updated 01 May 2023) | mathematics statistics rust

Cox proportional hazards models are commonly used in biostatistics for the modelling of time-to-event data, such as mortality or disease progression. A difficulty in applying Cox models arises when observations are interval censored, i.e. the time of event is not observed exactly, but is known… »
SciPy distribution for the odds ratio of independent beta variables

07 October 2022 | mathematics statistics python

In biostatistics, a common effect measure when considering dichotomous exposures and outcomes is the odds ratio. With two proportions $π_0$ and $π_1$, the odds ratio is $ψ = \frac{π_1 / (1 - π_1)}{π_0 / (1 - π_0)}$, as compared to the risk ratio,… »
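
As a rough illustration of the formula in the excerpt above, the sketch below computes the odds ratio $ψ$ from two proportions and, for comparison, the risk ratio using its standard definition $π_1 / π_0$ (the excerpt is cut off before defining it, so that definition is supplied here). The function names and the example proportions are hypothetical and are not taken from the post.

```python
def odds(p: float) -> float:
    """Convert a proportion p (0 < p < 1) into odds p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(pi_0: float, pi_1: float) -> float:
    """Odds ratio psi = [pi_1 / (1 - pi_1)] / [pi_0 / (1 - pi_0)]."""
    return odds(pi_1) / odds(pi_0)


def risk_ratio(pi_0: float, pi_1: float) -> float:
    """Risk ratio pi_1 / pi_0 (standard definition, assumed here)."""
    return pi_1 / pi_0


# Illustrative proportions only: 20% risk unexposed, 40% risk exposed.
pi_0, pi_1 = 0.2, 0.4
print(f"odds ratio = {odds_ratio(pi_0, pi_1):.3f}")
print(f"risk ratio = {risk_ratio(pi_0, pi_1):.3f}")
```

With these made-up inputs the odds ratio (about 2.67) is further from 1 than the risk ratio (2), which is the usual pattern when the outcome is not rare.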
Directly computing HDIs from PDFs in SciPy

06 October 2022 | mathematics statistics python

In Bayesian inference, it is often desired to calculate credible intervals for model parameters. The 2 common choices are the highest posterior density interval (HPD/HDI), and the equal-tailed interval. In many cases, the posterior density must be estimated by simulation, but in some cases the… »
Beta ratio distribution for SciPy

05 October 2022 | mathematics statistics programming python

The quotient of 2 independent beta-distributed random variables has a known distribution, but its closed-form expression is a little hairy [1, 2]. One Python implementation of this distribution is available from Julian Saffer [3], but it suffers from some numerical issues… »
Bayesian biostatistics procedures matching frequentist confidence intervals

03 October 2022 (updated 15 March 2023) | mathematics statistics

A 95% confidence interval is commonly misinterpreted as meaning that, after observing the data, there is a 95% probability that the true parameter lies within the interval. The usual explanation why this is incorrect is that the true parameter is not random, and so is either inside or… »
On the credible probability of confidence intervals

13 September 2022 | mathematics statistics

Confidence intervals are commonly misinterpreted by consumers of statistics. Hoekstra et al. [1] presented 120 psychology researchers and 442 students with ‘a fictitious scenario of a professor who conducts an experiment and reports a 95% CI for the mean that ranges from 0.1 to… »
Quasi-likelihood gamma regression in statsmodels for zeroes in observations

03 September 2022 (updated 06 September 2022) | mathematics statistics

Generalised linear models with a gamma distribution and log link are frequently used to model non-negative right-skewed continuous data, such as costs [1].

For example, in statsmodels:
```
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm

#
```
… »
Robust Poisson regression in medical biostatistics

29 August 2022 (updated 03 September 2022) | mathematics statistics medicine

Log-binomial and robust (modified) Poisson regression are common approaches to estimating risk ratios in medical biostatistics [1].

I have discussed log-binomial regression in a previous post about generalised linear models. The conceptual basis for using log-binomial regression to estimate risk ratios is straightforward –… »
Generalised linear models for medical biostatistics

06 November 2021 (updated 14 October 2022) | mathematics statistics medicine

Recently, I've been doing some statistical analysis using log-binomial generalised linear models (GLMs). Resources on the topic seem to fall largely into 2 categories:
- Assume you want to know none of the background: ‘Use a log-binomial GLM if you want a risk ratio.’¹
- Assume
… »