Robust Poisson regression in medical biostatistics
Warning! I am not a statistician. This article is not reviewed. Please confer with a responsible adult!
Log-binomial and robust (modified) Poisson regression are common approaches to estimating risk ratios in medical biostatistics [1].
I have discussed log-binomial regression in a previous post about generalised linear models. The conceptual basis for using log-binomial regression to estimate risk ratios is straightforward – it is natural to specify a binomial (Bernoulli) distribution for a binary outcome, and a log link puts the parameter estimates on the risk ratio scale.
Robust Poisson regression, on the other hand, seems harder to justify. Under the usual parametric view of generalised linear models, it seems to assume a Poisson distribution for a binary outcome, which is incorrect, as a Poisson-distributed variable can take any non-negative integer value, but the outcome can only either happen or not.
It turns out that robust Poisson regression can be justified under a quasi-likelihood framework [1].
Maximum likelihood estimation for log-binomial generalised linear models
Generalised linear models, such as the log-binomial model, are fitted by finding the maximum likelihood estimators for the model parameters. The maximum likelihood estimator is the estimator which maximises the likelihood of obtaining the observed result, i.e. the value $\hat\beta$ which maximises the likelihood function $L(\hat\beta)$. This is equivalent to maximising the log-likelihood $\log L(\hat\beta)$.
When the log-likelihood is maximised, its derivative with respect to each model parameter is 0, i.e.:
\[\frac{\partial \log L(\hat\beta)}{\partial \hat\beta_j} = 0, \quad\text{for each model parameter $\hat\beta_j$}\]The left-hand side, the derivative of the log-likelihood with respect to $\hat\beta_j$, is also called the score function $S_j(\hat\beta)$ [2 p. 129].
It can be shown that, for a distribution in the exponential family (which the binomial, Poisson, Gaussian, etc. distributions are), the score function is:
\[S_j(\hat\beta) = \frac{\partial \log L(\hat\beta)}{\partial \hat\beta_j} = \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\mathrm{var}(y_i)} \frac{\partial \hat\mu_i}{\partial g(\hat\mu_i)}\]where $g$ is the link function [2 p. 124].
In log-binomial regression, $y_i$ is assumed to follow a binomial distribution, so $\mathrm{var}(y_i) = \hat\mu_i (1 - \hat\mu_i)$. Further, the link is the log function, $g(\hat\mu_i) = \log(\hat\mu_i)$. So the score function is:
\[\begin{align*} S_j(\hat\beta) &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\mathrm{var}(y_i)} \frac{\partial \hat\mu_i}{\partial \log(\hat\mu_i)} \\ &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\hat\mu_i (1 - \hat\mu_i)} \cdot \hat\mu_i \\ &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{1 - \hat\mu_i} \end{align*}\]Setting $S_j(\hat\beta) = 0$ for each $j$ gives us a system of simultaneous estimating equations, one per model parameter, the solution to which yields the maximum likelihood estimator.
However, when $\hat\mu_i$ (i.e. the predicted probability of the outcome) is close to 1, the denominator $1 - \hat\mu_i$ approaches 0, leading to convergence issues in the log-binomial model [3].^{1} To address this issue, we turn to quasi-likelihood estimation.
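To make this concrete, here is a minimal sketch of fitting a log-binomial model by Fisher scoring (in Python with NumPy; this is an illustration under assumed function and variable names, not how any particular package implements it). The `np.clip` guard mirrors the convergence problem just described: as $\hat\mu_i$ approaches 1, the weight $\hat\mu_i/(1-\hat\mu_i)$ blows up.

```python
import numpy as np

def fit_log_binomial(X, y, n_iter=100, tol=1e-10):
    """Fit a log-binomial GLM (log link, binomial variance) by Fisher scoring.

    Solves the score equations sum_i (y_i - mu_i) x_ij / (1 - mu_i) = 0,
    where mu_i = exp(x_i . beta).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start near the marginal risk
    for _ in range(n_iter):
        # Guard: mu must stay in (0, 1), else the weights explode
        mu = np.clip(np.exp(X @ beta), 1e-10, 1 - 1e-10)
        score = X.T @ ((y - mu) / (1 - mu))
        weights = mu / (1 - mu)              # Fisher weights for log link + binomial variance
        info = X.T @ (X * weights[:, None])  # Fisher information X' W X
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With a binary exposure and simulated risks of 0.1 (unexposed) and 0.2 (exposed), $\exp(\hat\beta_1)$ estimates a risk ratio of about 2.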
Quasi-likelihood estimation to address model convergence
Let us return to the definition of the score function from maximum likelihood estimation:
\[S_j(\hat\beta) = \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\mathrm{var}(y_i)} \frac{\partial \hat\mu_i}{\partial g(\hat\mu_i)}\]In maximum likelihood estimation, the variance $\mathrm{var}(y_i)$ is derived from an assumption about the distribution of $y$. In quasi-likelihood estimation, we make no parametric assumption about the distribution of $y$; rather, we assume only a particular relationship between the mean and the variance [2 p. 268]. Plugging this mean–variance relationship into the above expression yields a quasi-score function [2 p. 278].
In quasi-likelihood estimation, we set the quasi-score functions $S_j(\hat\beta) = 0$ and solve as usual. Because we make no distributional assumption, there is no likelihood-based interpretation of the resulting model. However, it turns out that models produced by solving quasi-likelihood estimating equations still have good properties, and the estimators $\hat\beta$ are consistent. In fact, this result holds even if the mean–variance relationship is misspecified [2 p. 279].^{2}
So, instead of assuming (as in log-binomial regression) that $\mathrm{var}(y_i) = \hat\mu_i (1 - \hat\mu_i)$, let us use a different mean–variance relationship. When the outcome is rare, $\hat\mu_i (1 - \hat\mu_i) \approx \hat\mu_i$, so $\mathrm{var}(y_i) = \hat\mu_i$ seems like a reasonable choice.^{3} Keeping the log link, the quasi-score function becomes:
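A quick numerical check (purely illustrative) of how close the two variance functions are at various risk levels:

```python
# Compare the binomial variance mu*(1 - mu) with the working variance mu.
for mu in (0.01, 0.05, 0.2, 0.5):
    binom_var = mu * (1 - mu)
    print(f"mu = {mu:.2f}: binomial var = {binom_var:.4f}, working var = {mu:.4f}")
```

At $\hat\mu_i = 0.01$ the two differ by only 1%; at $\hat\mu_i = 0.5$ the working variance is double the binomial one, so the misspecification grows as the outcome becomes more common.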
\[\begin{align*} S_j(\hat\beta) &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\mathrm{var}(y_i)} \frac{\partial \hat\mu_i}{\partial \log(\hat\mu_i)} \\ &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\hat\mu_i} \cdot \hat\mu_i \\ &= \sum_i (y_i - \hat\mu_i) x_{ij} \end{align*}\]This quasi-score function is identical to the score function from Poisson regression [2 p. 230]. So the estimators $\hat\beta$ produced by Poisson regression will be identical to those from this quasi-likelihood estimation, and quasi-likelihood theory tells us these estimators will be consistent, even though the mean–variance relationship is misspecified. However, if the variance is misspecified, the usual calculation for standard errors of the parameters will be incorrect, and a robust sandwich estimator must instead be used [2 p. 279].
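The whole procedure can be sketched in a few lines of Python with NumPy (an illustration under assumed names, not a production implementation; in practice you would use a statistics package): solve the quasi-score equations $\sum_i (y_i - \hat\mu_i) x_{ij} = 0$ by Fisher scoring, then compute the robust (HC0) sandwich covariance.

```python
import numpy as np

def robust_poisson(X, y, n_iter=100, tol=1e-10):
    """'Robust Poisson' regression: log link, working variance var(y_i) = mu_i.

    Solves the quasi-score equations sum_i (y_i - mu_i) x_ij = 0, then
    returns (beta_hat, HC0 sandwich standard errors).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start near the marginal risk
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])  # Fisher information under var = mu
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread_inv = np.linalg.inv(X.T @ (X * mu[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])  # observed squared residuals
    cov = bread_inv @ meat @ bread_inv           # sandwich covariance
    return beta, np.sqrt(np.diag(cov))
```

Note that $X^\top \mathrm{diag}(\hat\mu) X$ plays two roles: it is the Fisher information used in the updates, and the "bread" of the sandwich. The "meat" substitutes the observed squared residuals $(y_i - \hat\mu_i)^2$ for the assumed variance, which is what protects the standard errors against the misspecified mean–variance relationship.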
This is the basis for the use of robust Poisson regression in modelling risk ratios – not because we must assume a Poisson distribution for the outcome, but because the estimators $\hat\beta$ are numerically identical to those produced by a consistent quasi-likelihood estimation which makes no such assumption.
Further reading
Similar reasoning, but assuming constant variance instead, yields log-normal regression as a method for computing risk ratios, though this technique is uncommon [4].
It is also reported that robust Poisson regression can be justified from a semiparametric perspective [5].
See also https://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/ for a more exciting discussion of robust Poisson regression generally.
References
[1] Chen W, Qian L, Shi J, Franklin M. Comparing performance between log-binomial and robust Poisson regression models for estimating risk ratios under model misspecification. BMC Medical Research Methodology. 2018 Jun 22; 18(1): Article 63. doi: 10.1186/s12874-018-0519-5
[2] Agresti A. Foundations of linear and generalized linear models. New Jersey: John Wiley & Sons; 2015.
[3] Carter RE, Lipsitz SR, Tilley BC. Quasi-likelihood estimation for relative risk regression models. Biostatistics. 2005; 6(1): 39–44. doi: 10.1093/biostatistics/kxh016
[4] Lumley T, Kronmal R, Ma S. Relative risk regression in medical research: models, contrasts, estimators, and algorithms. UW Biostatistics Working Paper Series. 2006 [cited 2022 Aug 29]; Working Paper 293. http://biostats.bepress.com/uwbiostat/paper293
[5] Talbot D, Mesidor M, Chiu Y, et al. An alternative perspective on the robust Poisson method for estimating risk or prevalence ratios [Preprint]. 2022 Mar 30 [cited 2022 Aug 29]. https://arxiv.org/abs/2112.00547
Footnotes

1. Note that, in the estimating equations, $\frac{1}{1 - \hat\mu_i}$ is effectively a weight applied to each observation. When $\hat\mu_i$ approaches 1, the weight approaches infinity [1].

2. This result is also known for maximum likelihood estimators of generalised linear models with distributions in the exponential family [2 p. 128].

3. In contrast to the log-binomial model, the analogous weight applied to each observation here is 1, so in some sense, this quasi-likelihood model weights each observation equally [4].