# Helpful SciPy utilities and recipes

This is a collection of helpful utilities and recipes for biostatistics analyses in Python/SciPy. The objective is, where possible, to wrap existing implementations of statistical functions in a convenient and consistent interface that reduces boilerplate and reports results in a user-friendly format that is standard for medical research (cf. APA, AMA, NEJM style).

For example, compare the standard statsmodels code:

```python
import statsmodels.api as sm
d1 = sm.stats.DescrStatsW(df[df['Type'] == 'Fresh']['Potency'])
d2 = sm.stats.DescrStatsW(df[df['Type'] == 'Stored']['Potency'])

cm = sm.stats.CompareMeans(d1, d2)
print(cm.ttest_ind())
print(d1.mean - d2.mean)
print(cm.tconfint_diff())
```

```
(4.236832825824121, 0.0004959477708633275, 18.0)
0.5399999999999991
(0.2722297224438056, 0.8077702775561927)
```

With the following:

```python
import yli
yli.ttest_ind(df, 'Potency', 'Type')
```

> *t*(18) = 4.24; *p* < 0.001*    
> Δ*μ* (95% CI) = 0.54 (0.27–0.81), Fresh > Stored
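
To illustrate the kind of work such a wrapper does, here is a minimal sketch that produces a similar APA-style summary using SciPy alone. It is not yli's implementation: the helper name `ttest_summary` is hypothetical, and it assumes exactly two groups and equal variances (Student's *t* test, matching the pooled default of statsmodels' `CompareMeans.ttest_ind`).

```python
import numpy as np
import pandas as pd
from scipy import stats

def ttest_summary(df: pd.DataFrame, dep: str, group: str, alpha: float = 0.05) -> str:
    """Hypothetical helper (not yli's code): pooled-variance independent
    2-sample t test, reported as an APA-style two-line string."""
    groups = df[group].unique()
    if len(groups) != 2:
        raise ValueError('Expected exactly 2 groups')
    samples = {g: df.loc[df[group] == g, dep].to_numpy(dtype=float) for g in groups}

    # Order the groups so that the reported difference in means is non-negative
    g1, g2 = sorted(groups, key=lambda g: samples[g].mean(), reverse=True)
    x1, x2 = samples[g1], samples[g2]

    # Student's t test, assuming equal variances
    t, p = stats.ttest_ind(x1, x2)
    dof = len(x1) + len(x2) - 2

    # Confidence interval for the difference in means, using the pooled variance
    delta = x1.mean() - x2.mean()
    sp2 = ((len(x1) - 1) * x1.var(ddof=1) + (len(x2) - 1) * x2.var(ddof=1)) / dof
    se = np.sqrt(sp2 * (1 / len(x1) + 1 / len(x2)))
    crit = stats.t.ppf(1 - alpha / 2, dof)
    lo, hi = delta - crit * se, delta + crit * se

    # Flag significant results with an asterisk
    if p < 0.001:
        p_str = 'p < 0.001*'
    else:
        p_str = f'p = {p:.3f}' + ('*' if p < alpha else '')

    return (f't({dof}) = {t:.2f}; {p_str}\n'
            f'Δμ ({100 * (1 - alpha):.0f}% CI) = {delta:.2f} ({lo:.2f}–{hi:.2f}), {g1} > {g2}')
```

Calling `ttest_summary(df, 'Potency', 'Type')` on a data frame like the one above returns a two-line string in the format shown. Sorting the groups by mean keeps the reported difference positive and lets the final clause name the larger group first.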

## Scope

This is a *personal* helper library, whose scope is limited to statistical functions and applications which are useful to me. It is not intended or expected that you will necessarily be able to import the library wholesale for your purposes. Rather, in the spirit of collaboration it is hoped that this library may contain examples and generally standalone implementations (hence ‘utilities and recipes’) which may be helpful for you to use or adapt.

## Dependencies

The mandatory dependencies of this library are:

* [NumPy](https://numpy.org/), tested on 1.23.3
* [pandas](https://pandas.pydata.org/), tested on 1.4.4
* [SciPy](https://scipy.org/), tested on 1.9.2
* [statsmodels](https://www.statsmodels.org/), tested on 0.13.2

Optional dependencies are:

* [hpstat](https://yingtongli.me/git/hpstat), for *turnbull* and *IntervalCensoredCox*
* [matplotlib](https://matplotlib.org/) and [seaborn](https://seaborn.pydata.org/), for plotting functions
* [mpmath](https://mpmath.org/), for *beta_ratio* and *beta_oddsratio*
* [PyCryptodome](https://www.pycryptodome.org/), for *pickle_write_encrypted* and *pickle_read_encrypted*
* [rpy2](https://rpy2.github.io/), with R packages:
	* [BFpack](https://cran.r-project.org/web/packages/BFpack/index.html), for *bayesfactor_afbf* (*RegressionModel.bayesfactor_beta_zero*)
	* [logistf](https://cran.r-project.org/web/packages/logistf/index.html), for *PenalisedLogit*
* [shap](https://shap.readthedocs.io/en/latest/), for *RegressionModel.shap*
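
To check which optional features are usable in a given environment, a minimal sketch using `importlib.util.find_spec` follows. The import names are assumptions (for example, PyCryptodome installs as the `Crypto` package), and the check does not verify that the R packages BFpack and logistf are installed for rpy2.

```python
import importlib.util

# Assumed import name of each optional dependency -> yli features that need it
OPTIONAL_MODULES = {
    'hpstat': 'turnbull, IntervalCensoredCox',
    'matplotlib': 'plotting functions',
    'seaborn': 'plotting functions',
    'mpmath': 'beta_ratio, beta_oddsratio',
    'Crypto': 'pickle_write_encrypted, pickle_read_encrypted',  # PyCryptodome
    'rpy2': 'bayesfactor_afbf, PenalisedLogit (R packages checked separately)',
    'shap': 'RegressionModel.shap',
}

def missing_optional_dependencies() -> dict:
    """Return {module: features} for optional modules that cannot be found."""
    return {mod: features for mod, features in OPTIONAL_MODULES.items()
            if importlib.util.find_spec(mod) is None}

if __name__ == '__main__':
    for mod, features in missing_optional_dependencies().items():
        print(f'{mod} not installed; unavailable: {features}')
```

`find_spec` locates a module without importing it, so the check is cheap and avoids side effects from importing heavy packages such as rpy2.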

## Functions

Relevant statistical functions are all directly available from the top-level *yli* namespace:

* Significance testing:
	* *anova_oneway*: One-way ANOVA
	* *chi2*: Pearson *χ*<sup>2</sup> test
	* *mannwhitney*: Mann–Whitney *U* test
	* *pearsonr*: Pearson correlation coefficient *r*
	* *ttest_ind*: Independent 2-sample *t* test
* Regression:
	* *IntervalCensoredCox*: Model for interval-censored Cox regression
	* *PenalisedLogit*: Model for Firth penalised logistic regression
	* *regress*: Fit arbitrary regression models
	* *vif*: Compute the variance inflation factor for independent variables in regression
* Survival analysis:
	* *kaplanmeier*: Kaplan–Meier plot
	* *logrank*: Log-rank test
	* *turnbull*: Turnbull estimator plot, including pointwise confidence intervals, for interval-censored data
* Input/output:
	* *pickle_write_compressed*, *pickle_read_compressed*: Pickle a pandas DataFrame and compress using LZMA
	* *pickle_write_encrypted*, *pickle_read_encrypted*: Pickle a pandas DataFrame, compress using LZMA, and encrypt
* Distributions:
	* *beta_oddsratio*: SciPy distribution for the odds ratio of 2 independent beta-distributed variables
	* *beta_ratio*: SciPy distribution for the ratio of 2 independent beta-distributed variables
	* *hdi*: Find the highest density interval (e.g. highest posterior density, HPD/HDI) for a SciPy distribution
	* *transformed_dist*: SciPy distribution for an arbitrary transformation of a SciPy distribution
* Bayesian inference:
	* *bayesfactor_afbf*: Adjusted fractional Bayes factor for a hypothesis on parameters from regression
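
As an illustration of what *hdi* computes, the following sketch finds the narrowest interval containing a given probability mass by minimising the interval width over the probability left in the lower tail. This is one common approach and is only appropriate for unimodal continuous distributions; it is not necessarily how *yli.hdi* is implemented, and `hdi_sketch` is a hypothetical name.

```python
from scipy import optimize, stats

def hdi_sketch(dist, mass=0.95):
    """Hypothetical sketch: return (lower, upper), the narrowest interval
    containing `mass` probability under a frozen, unimodal, continuous
    SciPy distribution."""
    def width(lower_tail):
        # Interval starting at the `lower_tail` quantile and spanning `mass` probability
        return dist.ppf(lower_tail + mass) - dist.ppf(lower_tail)

    # Search over the lower-tail probability, which must lie between 0 and 1 - mass
    res = optimize.minimize_scalar(width, bounds=(0, 1 - mass), method='bounded')
    return dist.ppf(res.x), dist.ppf(res.x + mass)

# Usage: for a skewed distribution such as Beta(2, 5), the HDI is narrower
# than the equal-tailed interval returned by .interval()
posterior = stats.beta(2, 5)
print(hdi_sketch(posterior, 0.95))
print(posterior.interval(0.95))
```

For a symmetric distribution such as the standard normal, the HDI coincides with the equal-tailed interval, which makes a simple sanity check.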
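
The LZMA pickling round trip described for *pickle_write_compressed* and *pickle_read_compressed* can be sketched with the standard library alone. The function names below are hypothetical, and the resulting files are not guaranteed to be interchangeable with those written by yli's own functions.

```python
import lzma
import pickle

import pandas as pd

def pickle_write_compressed_sketch(df: pd.DataFrame, path: str) -> None:
    # Serialise with pickle, streaming the bytes through an LZMA (.xz) compressor
    with lzma.open(path, 'wb') as f:
        pickle.dump(df, f)

def pickle_read_compressed_sketch(path: str) -> pd.DataFrame:
    # Decompress and unpickle; only do this for files from a trusted source
    with lzma.open(path, 'rb') as f:
        return pickle.load(f)
```

pandas also offers `DataFrame.to_pickle(path, compression='xz')` and `pd.read_pickle` for the same purpose. As with any pickle, loading a file can execute arbitrary code, so only read files you trust.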
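
The variance inflation factor listed under *vif* is VIF<sub>*j*</sub> = 1/(1 − *R*<sub>*j*</sub><sup>2</sup>), where *R*<sub>*j*</sub><sup>2</sup> comes from regressing predictor *j* on the remaining predictors. A NumPy sketch follows; `vif_sketch` is hypothetical, and it assumes an intercept is included in each auxiliary regression (yli's *vif* may differ in this respect).

```python
import numpy as np
import pandas as pd

def vif_sketch(X: pd.DataFrame) -> pd.Series:
    """Hypothetical sketch: VIF for each column of X, a DataFrame of
    predictors without an intercept column. An intercept is added to
    each auxiliary regression."""
    values = X.to_numpy(dtype=float)
    n, k = values.shape
    result = {}
    for j in range(k):
        y = values[:, j]
        # Regress column j on an intercept plus all other columns
        design = np.column_stack([np.ones(n), np.delete(values, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        centred = y - y.mean()
        r2 = 1 - (resid @ resid) / (centred @ centred)
        result[X.columns[j]] = 1 / (1 - r2)
    return pd.Series(result)
```

With an intercept included, the result equals the diagonal of the inverse correlation matrix of the predictors, which makes a convenient cross-check.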

## Documentation and examples

Each function is documented in the respective docstring within the source code, and Sphinx documentation is buildable from the *docs* directory. Hosted documentation is available at <https://yingtongli.me/scipy-yli-docs/>.

Examples can be found in the unit tests in the *tests* directory.

## Warning

No warranty is made as to the correctness of any function in this library. While the library is unit tested, the validation is not extensive. This applies particularly to functions that are more than simple wrappers for existing implementations. Please apply appropriate caution.