Warning! I am not a statistician. This article has not been reviewed. Please consult a responsible adult!

Cox proportional hazards models are commonly used in biostatistics to model time-to-event data, such as mortality or disease progression. A difficulty in applying Cox models arises when observations are interval censored, i.e. the time of the event is not observed exactly but is known only to fall within a particular interval.
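Under the Cox model, the survival function for covariates x is S(t | x) = exp(−Λ₀(t) exp(xᵀβ)), where Λ₀ is the baseline cumulative hazard. An observation known to fail in the interval (L, R] therefore contributes log[S(L | x) − S(R | x)] to the log-likelihood, with S(∞ | x) = 0 for right-censored observations. The Rust sketch below evaluates this quantity. It assumes that Λ₀ is stored as a step function on a grid of time points, with index 0 being time zero (Λ₀ = 0), and that each interval endpoint is given as an index into that grid. The `Obs` type and `log_likelihood` function are hypothetical names for illustration and are not hpstat's API.

```rust
/// One interval-censored observation (hypothetical representation).
/// `left` and `right` index into the time grid on which the baseline
/// cumulative hazard is stored; `right == None` means R = infinity.
struct Obs {
    left: usize,
    right: Option<usize>,
    x: Vec<f64>,
}

/// Log-likelihood of a Cox model for interval-censored data:
/// sum over i of ln[S(L_i | x_i) - S(R_i | x_i)],
/// where S(t | x) = exp(-Λ0(t) * exp(x'β)).
fn log_likelihood(cum_haz: &[f64], beta: &[f64], data: &[Obs]) -> f64 {
    data.iter()
        .map(|obs| {
            // Linear predictor x'β and relative risk exp(x'β)
            let eta: f64 = obs.x.iter().zip(beta).map(|(x, b)| x * b).sum();
            let risk = eta.exp();
            let s_left = (-cum_haz[obs.left] * risk).exp();
            // Survival at R; zero if the observation is right censored
            let s_right = match obs.right {
                Some(r) => (-cum_haz[r] * risk).exp(),
                None => 0.0,
            };
            (s_left - s_right).ln()
        })
        .sum()
}
```

In this representation, a right-censored observation uses `right: None`, and a left-censored observation uses `left: 0`, because Λ₀ is assumed to be zero at the first grid point.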

Algorithms have been derived for fitting Cox models to interval-censored data, but publicly available implementations are limited. Current options include icenReg and IntCens for R, and stintcox for Stata 17+.

hpstat intcox is a Rust implementation of interval-censored Cox regression. It uses the iterative convex minorant-based approach described by Huang & Wellner [1], incorporating a damped iterative convex minorant algorithm for the baseline cumulative hazard as described by Aragón & Eberly [2] and Pan [3]. This contrasts with the expectation–maximisation algorithm used by IntCens and Stata. Standard errors are estimated efficiently using the profile likelihood-based method proposed by Zeng, Gao & Lin [4]. This contrasts with the bootstrap-based approach used by icenReg.
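The following is a rough illustration of the iterative convex minorant idea. Each update takes a Newton-like step for the baseline cumulative hazard using only the diagonal of the Hessian. It then projects the result back onto nondecreasing sequences by weighted isotonic regression (here via the pool-adjacent-violators algorithm), using the negated Hessian diagonal as weights. The sketch damps the update by halving the step along the segment from the current estimate towards the projected proposal, until the log-likelihood no longer decreases. This is one simple form of damping and may not match the exact rule used by Aragón & Eberly [2], Pan [3] or hpstat. Further assumptions:

- The function names are hypothetical.
- The gradient and Hessian diagonal are supplied by the caller.
- The nonnegativity constraint on the cumulative hazard is not handled.

```rust
/// Weighted pool-adjacent-violators algorithm: returns the nondecreasing
/// sequence minimising sum_i w[i] * (fit[i] - y[i])^2 (weights assumed > 0).
fn weighted_pava(y: &[f64], w: &[f64]) -> Vec<f64> {
    // Each block holds (weighted mean, total weight, number of pooled points).
    let mut blocks: Vec<(f64, f64, usize)> = Vec::with_capacity(y.len());
    for (&yi, &wi) in y.iter().zip(w) {
        blocks.push((yi, wi, 1));
        // Pool adjacent blocks while they violate monotonicity.
        while blocks.len() > 1 {
            let (m2, w2, c2) = blocks[blocks.len() - 1];
            let (m1, w1, c1) = blocks[blocks.len() - 2];
            if m1 <= m2 {
                break;
            }
            blocks.truncate(blocks.len() - 2);
            blocks.push(((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2));
        }
    }
    // Expand each pooled block back to one fitted value per point.
    blocks
        .iter()
        .flat_map(|&(m, _, c)| std::iter::repeat(m).take(c))
        .collect()
}

/// One damped ICM update of a nondecreasing cumulative hazard `lambda`.
/// `grad` is the gradient of the log-likelihood with respect to `lambda`,
/// and `neg_hess` the negated diagonal of its Hessian (assumed > 0).
fn damped_icm_step<F: Fn(&[f64]) -> f64>(
    lambda: &[f64],
    grad: &[f64],
    neg_hess: &[f64],
    loglik: F,
    max_halvings: usize,
) -> Vec<f64> {
    // Newton-like proposal using only the Hessian diagonal...
    let newton: Vec<f64> = lambda
        .iter()
        .zip(grad)
        .zip(neg_hess)
        .map(|((l, g), w)| l + g / w)
        .collect();
    // ...projected onto nondecreasing sequences, weighted by the Hessian diagonal.
    let target = weighted_pava(&newton, neg_hess);
    let current = loglik(lambda);
    let mut step = 1.0;
    for _ in 0..=max_halvings {
        // Convex combinations of nondecreasing sequences remain nondecreasing.
        let candidate: Vec<f64> = lambda
            .iter()
            .zip(&target)
            .map(|(l, t)| l + step * (t - l))
            .collect();
        if loglik(&candidate) >= current {
            return candidate;
        }
        step *= 0.5; // damping: halve the step and retry
    }
    // No step avoided a decrease in log-likelihood; keep the current estimate.
    lambda.to_vec()
}
```

Projecting onto nondecreasing sequences is what keeps the estimated cumulative hazard monotone. Using only the Hessian diagonal keeps each iteration cheap even with many time points.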

Performance

To compare the performance of hpstat intcox with other software for interval-censored Cox regression, we fitted a model to the ‘Bangkok Metropolitan Administration HIV’ data from Zeng, Mao & Lin [5]. This dataset contains 1124 observations, 1216 time points and 8 covariates. Tolerances were tuned by hand to obtain similar log-likelihoods across the software packages. icenReg was run both with and without bootstrap standard error calculation; when bootstrap standard errors were calculated, 4-core multiprocessing was enabled.

The table below shows the model log-likelihoods obtained by each package. Note that tolerances were chosen to enable a comparison of speed rather than accuracy.

| Software | Version | Options | Model log-likelihood |
|---|---|---|---|
| Stata | 17.0 | full lrmodel favorspeed | −604.82642 |
| hpstat | 6c5ab0d | --ll-tolerance 0.000001 | −603.06036 |
| IntCens | 0.2 | --convergence_threshold 0.003 | −604.941 |
| icenReg | 2.0.15 | bs_samples=0 or 100 | −603.0603 |

The figure below shows the mean execution times on an Intel Core i5-7500. Error bars are shown at 10× true scale.

[Figure: Execution times]

We performed a similar comparison on a real-world dataset comprising 16639 observations, 1546 time points and 9 covariates. With this data, Stata stintcox failed to compute even a single iteration within 5 minutes.

The table below shows the model log-likelihoods obtained by the software.

| Software | Version | Options | Model log-likelihood |
|---|---|---|---|
| Stata | 17.0 | full lrmodel favorspeed | Did not converge |
| hpstat | 6c5ab0d | --ll-tolerance 0.00005 --max-iterations 5000 | −20892.56502 |
| IntCens | 0.2 | --convergence_threshold 0.03 | −20981.5 |
| icenReg | 2.0.15 | bs_samples=0 or 100 | −20892.56 |

The figure below shows the mean execution times on an Intel Core i5-7500. Error bars are shown at 10× true scale.

[Figure: Execution times]

Discussion

In both comparisons, hpstat intcox was the fastest implementation. icenReg computed the maximum likelihood estimates in a similar amount of time to hpstat, but took substantially longer to estimate standard errors because it uses a bootstrap-based approach. IntCens and Stata use the same computationally efficient profile likelihood-based standard error estimator as hpstat. However, they fit the model with the less efficient expectation–maximisation algorithm, and took substantially longer overall.
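In the method of Zeng, Gao & Lin [4], the information matrix for the regression coefficients is approximated from second differences of the profile log-likelihood pl(β). This is the log-likelihood maximised over the baseline cumulative hazard with β held fixed. For p covariates, this needs 1 + p + p(p + 1)/2 profile likelihood evaluations (45 for the 8 covariates of the Bangkok data), and each evaluation refits only the baseline hazard. By contrast, 100 bootstrap samples require 100 complete model fits. The Rust sketch below takes the profile log-likelihood as a closure. The function names and the caller-chosen perturbation size `h` are assumptions for illustration, not hpstat's API.

```rust
/// Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting.
/// Returns None if the matrix is (numerically) singular.
fn invert(mut a: Vec<Vec<f64>>) -> Option<Vec<Vec<f64>>> {
    let n = a.len();
    let mut inv: Vec<Vec<f64>> = (0..n)
        .map(|i| (0..n).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
        .collect();
    for col in 0..n {
        // Choose the row with the largest absolute pivot in this column.
        let pivot = (col..n).max_by(|&r, &s| a[r][col].abs().total_cmp(&a[s][col].abs()))?;
        if a[pivot][col].abs() < 1e-12 {
            return None;
        }
        a.swap(col, pivot);
        inv.swap(col, pivot);
        let d = a[col][col];
        for j in 0..n {
            a[col][j] /= d;
            inv[col][j] /= d;
        }
        // Eliminate this column from every other row.
        let (pivot_row, pivot_inv) = (a[col].clone(), inv[col].clone());
        for r in (0..n).filter(|&r| r != col) {
            let f = a[r][col];
            for j in 0..n {
                a[r][j] -= f * pivot_row[j];
                inv[r][j] -= f * pivot_inv[j];
            }
        }
    }
    Some(inv)
}

/// Standard errors from second differences of a profile log-likelihood `pl`
/// evaluated around the maximum likelihood estimate `beta_hat`.
fn profile_standard_errors<F: Fn(&[f64]) -> f64>(pl: F, beta_hat: &[f64], h: f64) -> Option<Vec<f64>> {
    let p = beta_hat.len();
    // pl evaluated after adding h to each listed coordinate (an index may repeat).
    let perturbed = |idx: &[usize]| {
        let mut b = beta_hat.to_vec();
        for &i in idx {
            b[i] += h;
        }
        pl(&b)
    };
    let base = pl(beta_hat);
    let single: Vec<f64> = (0..p).map(|j| perturbed(&[j])).collect();
    // I[j][k] = -(pl(β + h e_j + h e_k) - pl(β + h e_j) - pl(β + h e_k) + pl(β)) / h^2
    let mut info = vec![vec![0.0; p]; p];
    for j in 0..p {
        for k in j..p {
            let v = -(perturbed(&[j, k]) - single[j] - single[k] + base) / (h * h);
            info[j][k] = v;
            info[k][j] = v;
        }
    }
    // Covariance = inverse information; standard errors are the root diagonal.
    let cov = invert(info)?;
    Some((0..p).map(|j| cov[j][j].sqrt()).collect())
}
```

In practice, `pl` would refit the baseline cumulative hazard (for example, with an ICM-type algorithm) at each perturbed β. Those refits are where the cost lies, but they are far fewer than the full refits a bootstrap requires.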

The advantage of hpstat intcox lies in combining the computationally efficient iterative convex minorant algorithm with profile likelihood-based estimation of standard errors, which leads to low total execution times. Its maximum likelihood estimation also appears to be faster than that of icenReg. Whether this is due to differences in implementation or in underlying libraries was not explored.

A disadvantage of hpstat intcox is that its functionality is more limited than that of other software. Compared with icenReg, it lacks functionality for many postestimation tasks and can only fit proportional hazards Cox models, whereas icenReg can also fit proportional odds models and parametric survival regression models. Compared with IntCens, it can only fit proportional hazards models with time-independent covariates. IntCens, by contrast, can also fit proportional odds models and account for time-varying covariates and multivariate or clustered data, which are major benefits of the expectation–maximisation algorithm. hpstat has also not been extensively reviewed or tested.

hpstat is available at https://yingtongli.me/git/hpstat.

References

  1. Huang J, Wellner JA. Interval censored survival data: a review of recent progress. In: Lin DY, Fleming TR, editors. Proceedings of the First Seattle Symposium in Biostatistics: survival analysis; 1995 Nov 20–21; University of Washington, Seattle. New York: Springer-Verlag; 1997. p. 123–69. doi: 10.1007/978-1-4684-6316-3_8
  2. Aragón J, Eberly D. On convergence of convex minorant algorithms for distribution estimation with interval-censored data. Journal of Computational and Graphical Statistics. 1992;1(2):129–40. doi: 10.2307/1390837
  3. Pan W. Extending the iterative convex minorant algorithm to the Cox model for interval-censored data. Journal of Computational and Graphical Statistics. 1999;8(1):109–20. doi: 10.1080/10618600.1999.10474804
  4. Zeng D, Gao F, Lin DY. Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika. 2017;104(3):505–25. doi: 10.1093/biomet/asx029
  5. Zeng D, Mao L, Lin DY. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika. 2016;103(2):253–71. doi: 10.1093/biomet/asw013