Helpful SciPy utilities and recipes
This is a collection of helpful utilities and recipes for biostatistical analyses in Python/SciPy. Where possible, it wraps existing implementations of statistical functions to provide a convenient and consistent interface that reduces boilerplate and reports results in a user-friendly standard format for medical research (cf. APA, AMA, NEJM style).
For example, compare the standard statsmodels code:
import statsmodels.api as sm
d1 = sm.stats.DescrStatsW(df[df['Type'] == 'Fresh']['Potency'])
d2 = sm.stats.DescrStatsW(df[df['Type'] == 'Stored']['Potency'])
cm = sm.stats.CompareMeans(d1, d2)
print(cm.ttest_ind())
print(d1.mean - d2.mean)
print(cm.tconfint_diff())
(4.236832825824121, 0.0004959477708633275, 18.0)
0.5399999999999991
(0.2722297224438056, 0.8077702775561927)
With the following:
yli.ttest_ind(df, 'Potency', 'Type')
t(18) = 4.24; p < 0.001*
Δμ (95% CI) = 0.54 (0.27–0.81), Fresh > Stored
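To make the reporting convention concrete, the following is a minimal sketch of how the raw statsmodels numbers shown above could be turned into the condensed two-line report. The function names format_p and format_ttest, the rounding thresholds and the 0.05 significance level are assumptions made for illustration; this is not the library's own implementation, and it omits the direction label (Fresh > Stored).

```python
def format_p(p, alpha=0.05):
    """Format a p value; '*' marks p < alpha (assumed convention)."""
    if p < 0.001:
        text = 'p < 0.001'
    elif p < 0.01:
        text = 'p = {:.3f}'.format(p)
    else:
        text = 'p = {:.2f}'.format(p)
    return text + ('*' if p < alpha else '')


def format_ttest(statistic, pvalue, dof, delta, ci_lower, ci_upper):
    """Build a two-line summary of an independent-samples t test.

    Hypothetical helper: the arguments are the values returned by
    ttest_ind(), the difference in means, and tconfint_diff().
    """
    line1 = 't({:.0f}) = {:.2f}; {}'.format(dof, statistic, format_p(pvalue))
    line2 = 'Δμ (95% CI) = {:.2f} ({:.2f}–{:.2f})'.format(delta, ci_lower, ci_upper)
    return line1 + '\n' + line2


# Raw values copied from the statsmodels output in the example above
report = format_ttest(
    4.236832825824121, 0.0004959477708633275, 18.0,
    0.5399999999999991, 0.2722297224438056, 0.8077702775561927,
)
print(report)
```

Keeping the formatting separate from the computation means the same presentation rules can be reused for other tests.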
Scope
This is a personal helper library, whose scope is limited to statistical functions and applications which are useful to me. It is not intended or expected that you will necessarily be able to import the library wholesale for your purposes. Rather, in the spirit of collaboration it is hoped that this library may contain examples and generally standalone implementations (hence ‘utilities and recipes’) which may be helpful for you to use or adapt.
Dependencies
The mandatory dependencies of this library are:
 NumPy, tested on 1.23.3
 pandas, tested on 1.4.4
 SciPy, tested on 1.9.2
 statsmodels, tested on 0.13.2
Optional dependencies are:
 hpstat, for IntervalCensoredCox
 matplotlib and seaborn, for plotting functions
 mpmath, for beta_ratio and beta_oddsratio
 PyCryptodome, for pickle_write_encrypted and pickle_read_encrypted
 rpy2, with R packages:
 shap, for RegressionModel.shap
Functions
Relevant statistical functions are all directly available from the top-level yli namespace:
 Significance testing:
 anova_oneway: One-way ANOVA
 chi2: Pearson χ² test
 mannwhitney: Mann–Whitney U test
 pearsonr: Pearson correlation coefficient r
 ttest_ind: Independent 2-sample t test
 Regression:
 IntervalCensoredCox: Model for interval-censored Cox regression
 PenalisedLogit: Model for Firth penalised logistic regression
 regress: Fit arbitrary regression models
 vif: Compute the variance inflation factor for independent variables in regression
 Survival analysis:
 kaplanmeier: Kaplan–Meier plot
 logrank: Log-rank test
 turnbull: Turnbull estimator plot for interval-censored data
 Input/output:
 pickle_write_compressed, pickle_read_compressed: Pickle a pandas DataFrame and compress using LZMA
 pickle_write_encrypted, pickle_read_encrypted: Pickle a pandas DataFrame, compress using LZMA, and encrypt
 Distributions:
 beta_oddsratio: SciPy distribution for the odds ratio of 2 independent beta-distributed variables
 beta_ratio: SciPy distribution for the ratio of 2 independent beta-distributed variables
 hdi: Find the highest density interval (e.g. highest posterior density, HPD/HDI) for a SciPy distribution
 transformed_dist: SciPy distribution for an arbitrary transformation of a SciPy distribution
 Bayesian inference:
 bayesfactor_afbf: Adjusted fractional Bayes factor for a hypothesis on parameters from regression
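As an illustration of the idea behind the compressed input/output helpers listed above, the following is a minimal sketch of pickling an object and compressing it with LZMA, using only the Python standard library. The function names write_compressed and read_compressed and the file name are hypothetical, and a plain dict stands in for a pandas DataFrame; the library's own pickle_write_compressed and pickle_read_compressed may work differently.

```python
import lzma
import os
import pickle
import tempfile


def write_compressed(obj, path):
    """Pickle obj and write it to path as an LZMA-compressed (.xz) file."""
    with lzma.open(path, 'wb') as f:
        pickle.dump(obj, f)


def read_compressed(path):
    """Read back an object written by write_compressed.

    Only unpickle files from a trusted source.
    """
    with lzma.open(path, 'rb') as f:
        return pickle.load(f)


# Round trip with illustrative data (a dict standing in for a DataFrame)
data = {'Type': ['Fresh', 'Stored'], 'Potency': [10.5, 9.9]}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.pickle.xz')
    write_compressed(data, path)
    restored = read_compressed(path)

print(restored == data)
```

Because lzma.open returns an ordinary file-like object, pickle can stream straight into the compressor without holding a separate uncompressed copy on disk.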
Documentation and examples
Each function is documented in the respective docstring within the source code, and Sphinx documentation is buildable from the docs directory. Hosted documentation is available at https://yingtongli.me/scipyylidocs/.
Examples can be found in the unit tests in the tests directory.
Warning
No warranty is made as to the correctness of any function in this library. While the library is unit tested, the validation is not extensive. This applies particularly to functions which are more than simple wrappers for existing implementations. Please apply appropriate caution.