Warning! I am not a statistician. This article is not reviewed. Please confer with a responsible adult!

Log-binomial and robust (modified) Poisson regression are common approaches to estimating risk ratios in medical biostatistics [1].

I have discussed log-binomial regression in a previous post about generalised linear models. The conceptual basis for using log-binomial regression to estimate risk ratios is straightforward – it is natural to specify a binomial (Bernoulli) distribution for a binary outcome, and a log link puts the parameter estimates on the risk ratio scale.

Robust Poisson regression, on the other hand, seems harder to justify. Under the usual parametric view of generalised linear models, it seems to assume a Poisson distribution for a binary outcome, which is incorrect, as a Poisson-distributed variable can take any non-negative integer value, but the outcome can only either happen or not.

It turns out that robust Poisson regression can be justified under a quasi-likelihood framework [1].

Maximum likelihood estimation for log-binomial generalised linear models

Generalised linear models, such as the log-binomial model, are fitted by finding the maximum likelihood estimators for the model parameters. The maximum likelihood estimator is the estimator which maximises the likelihood of obtaining the observed result, which we denote by maximising the likelihood function $L(\hat\beta)$. This is equivalent to maximising the log-likelihood $\log L(\hat\beta)$.

When the log-likelihood is maximised, its derivative is 0 with respect to each model parameter, i.e.:

\[\frac{\partial \log L(\hat\beta_j)}{\partial \hat\beta_j} = 0, \quad\text{for each model parameter $\hat\beta_j$}\]

The left-hand side, the derivative of the log-likelihood, is also called the score function $S_j(\hat\beta)$ [2 p. 129].

It can be shown that, for a distribution in the exponential family (which the binomial, Poisson, Gaussian, etc. distributions are), the score function is:

\[S_j(\hat\beta) = \frac{\partial \log L(\hat\beta_j)}{\partial \hat\beta_j} = \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\mathrm{var}(y_i)} \frac{\partial \hat\mu_i}{\partial g(\hat\mu_i)}\]

where $g$ is the link function [2 p. 124].

In log-binomial regression, $y_i$ is assumed to follow a binomial distribution, so $\mathrm{var}(y_i) = \hat\mu_i (1 - \hat\mu_i)$. Further, the log function $g(\hat\mu_i) = \log(\hat\mu_i)$. So the score function is:

\[\begin{align*} S_j(\hat\beta) &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\mathrm{var}(y_i)} \frac{\partial \hat\mu_i}{\partial \log(\hat\mu_i)} \\ &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\hat\mu_i (1 - \hat\mu_i)} \cdot \hat\mu_i \\ &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{1 - \hat\mu_i} \end{align*}\]

Setting $S_j(\hat\beta) = 0$ for each $j$ gives us a system of $j$ simultaneous estimating equations, the solution to which yields the maximum likelihood estimator.

However, when $\hat\mu_i$ (i.e. the predicted probability of the outcome) is close to 1, the denominator $1 - \hat\mu_i$ approaches 0, leading to convergence issues in the log-binomial model [3].1 To address this issue, we turn to quasi-likelihood estimation.

Quasi-likelihood estimation to address model convergence

Let us return to the definition of the score function from maximum likelihood estimation:

\[S_j(\hat\beta) = \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\mathrm{var}(y_i)} \frac{\partial \hat\mu_i}{\partial g(\hat\mu_i)}\]

In maximum likelihood estimation, the variance $\mathrm{var}(y_i)$ is derived from an assumption about the distribution of $y$. In quasi-likelihood estimation, we make no parametric assumption about the distribution of $y$. Rather, we assume only a particular relationship between the mean and variance [2 p. 268]. Plugging the mean–variance relationship into the above expression yields a quasi-score function [2 p. 278].

In quasi-likelihood estimation, we let the quasi-score functions $S_j(\hat\beta) = 0$, and solve as usual. Because we make no distributional assumption, there is no likelihood-based interpretation of the resulting model. However, it turns out that models produced by solving quasi-likelihood estimating equations still have good properties, and the estimators $\hat\beta$ are consistent. In fact, this result holds even if the mean–variance relationship is misspecified [2 p. 279].2

So, instead of assuming (as in log-binomial regression) that $\mathrm{var}(y_i) = \hat\mu_i (1 - \hat\mu_i)$, let us use a different mean–variance relationship. When the outcome is rare, $\hat\mu_i (1 - \hat\mu_i) ≈ \hat\mu_i$, so $\mathrm{var}(y_i) = \hat\mu_i$ seems like a reasonable choice.3 Keeping the log link, the quasi-score function becomes:

\[\begin{align*} S_j(\beta) &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\mathrm{var}(y_i)} \frac{\partial \hat\mu_i}{\partial \log(\hat\mu_i)} \\ &= \sum_i \frac{(y_i - \hat\mu_i) x_{ij}}{\hat\mu_i} \cdot \hat\mu_i \\ &= \sum_i (y_i - \hat\mu_i) x_{ij} \end{align*}\]

This quasi-score function is identical to the score function from Poisson regression [2 p. 230]. So the estimators $\hat\beta$ produced by Poisson regression will be identical to the quasi-likelihood estimation, and quasi-likelihood estimation tells us these estimators will be consistent, even though the mean–variance relationship is misspecified. However, if the variance is misspecified, the usual calculation for standard errors of the parameters will be incorrect, and a robust sandwich estimator must instead be used [2 p. 279].

This is the basis for the use of robust Poisson regression in modelling risk ratios – not because we must assume a Poisson distribution for the outcome, but because the estimators $\hat\beta$ are numerically identical to those produced by a consistent quasi-likelihood estimation which makes no such assumption.

Further reading

Similar reasoning, but instead assuming constant variance, yields log-normal regression as a method for computing risk ratios, but this technique is uncommon [4].

It is also reported that robust Poisson regression can be justified from a semiparametric perspective [5].

See also https://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/ for a more exciting discussion of robust Poisson regression generally.


[1] Chen W, Qian L, Shi J, Franklin M. Comparing performance between log-binomial and robust Poisson regression models for estimating risk ratios under model misspecification. BMC Medical Research Methodology. 2018 Jun 22; 18(1): Article 63. doi: 10.1186/s12874-018-0519-5

[2] Agresti A. Foundations of linear and generalized linear models. New Jersey: John Wiley & Sons; 2015.

[3] Carter RE, Lipsitz SR, Tilley BC. Quasi-likelihood estimation for relative risk regression models. Biostatistics. 2005; 6(1): 39–44. doi: 10.1093/biostatistics/kxh016

[4] Lumley T, Kronmal R, Ma S. Relative risk regression in medical research: models, contrasts, estimators, and algorithms. UW Biostatistics Working Paper Series. 2006 [cited 2022 Aug 29]; Working Paper 293. http://biostats.bepress.com/uwbiostat/paper293

[5] Talbot D, Mesidor M, Chiu Y, et al. An alternative perspective on the robust poisson method for estimating risk or prevalence ratios [Preprint]. 2022 Mar 30 [cited 2022 Aug 29]. https://arxiv.org/abs/2112.00547


  1. Note that, in the estimating equations, $\frac{1}{1 - \hat\mu_i}$ is effectively a weight applied to each observation. When $\hat\mu_i$ approaches 1, the weight approaches infinity [1]

  2. This result is also known for maximum likelihood estimators of generalised linear models with distributions in the exponential family [2 p. 128]

  3. Compared to the log-binomial model, the analogous weight applied to each observation is 1, so in some sense, this quasi-likelihood model weights each observation equally [4]