Add README.md

RunasSudo 2022-10-17 20:35:11 +11:00
parent ca521ae529
commit afc0a724fe
Signed by: RunasSudo
GPG Key ID: 7234E476BF21C61A
1 changed files with 79 additions and 0 deletions


@@ -0,0 +1,79 @@
# Helpful SciPy utilities and recipes
This is a collection of helpful utilities and recipes for biostatistics analyses in Python/SciPy. Where possible, it wraps existing implementations of statistical functions in a convenient, consistent interface that reduces boilerplate and reports results in a user-friendly standard format for medical research (cf. APA, AMA, NEJM style).
For example, compare the standard statsmodels code:
```python
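import statsmodels.api as sm

# df: pandas DataFrame with a numeric 'Potency' column and a 'Type' column ('Fresh'/'Stored')
d1 = sm.stats.DescrStatsW(df[df['Type'] == 'Fresh']['Potency'])
d2 = sm.stats.DescrStatsW(df[df['Type'] == 'Stored']['Potency'])
cm = sm.stats.CompareMeans(d1, d2)
print(cm.ttest_ind())      # (t statistic, p value, degrees of freedom)
print(d1.mean - d2.mean)   # difference in means
print(cm.tconfint_diff())  # confidence interval for the difference (default alpha=0.05)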
```
```
(4.236832825824121, 0.0004959477708633275, 18.0)
0.5399999999999991
(0.2722297224438056, 0.8077702775561927)
```
With the following:
```python
import yli
yli.ttest_ind(df, 'Potency', 'Type')
```
> *t*(18) = 4.24; *p* < 0.001*
> Δ*μ* (95% CI) = 0.54 (0.27–0.81), Fresh > Stored
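
To illustrate the kind of formatting involved, the following is a minimal sketch that turns the raw statsmodels numbers above into report-style text. It is not the library's implementation: the function name `format_ttest` is hypothetical, and the rules used (report `p < 0.001` for very small *p* values, append `*` when *p* is below 0.05) are assumptions inferred from the output shown above.

```python
def format_ttest(t, p, dof, delta, ci, sig_level=0.05):
    """Format t-test results as report-style text (hypothetical helper)."""
    # Assumption: very small p values are reported as a bound rather than exactly
    p_text = 'p < 0.001' if p < 0.001 else 'p = {:.3f}'.format(p)
    # Assumption: a trailing asterisk flags p below the significance level
    marker = '*' if p < sig_level else ''
    line1 = 't({:.0f}) = {:.2f}; {}{}'.format(dof, t, p_text, marker)
    # The "95%" label is fixed here; it matches the default statsmodels interval
    line2 = 'Δμ (95% CI) = {:.2f} ({:.2f}–{:.2f})'.format(delta, ci[0], ci[1])
    return line1 + '\n' + line2

# Values copied from the statsmodels output above
report = format_ttest(
    4.236832825824121, 0.0004959477708633275, 18.0,
    0.5399999999999991,
    (0.2722297224438056, 0.8077702775561927),
)
print(report)
```

The sketch only covers presentation; identifying which group has the larger mean (the "Fresh > Stored" part) would additionally need the group labels.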
## Scope
This is a *personal* helper library, whose scope is limited to statistical functions and applications that are useful to me. You should not necessarily expect to be able to import the library wholesale for your own purposes. Rather, in the spirit of collaboration, it is hoped that the library contains examples and largely standalone implementations (hence ‘utilities and recipes’) that you may find helpful to use or adapt.
## Dependencies
The mandatory dependencies of this library are:
* [NumPy](https://numpy.org/), tested on 1.23.3
* [pandas](https://pandas.pydata.org/), tested on 1.4.4
* [SciPy](https://scipy.org/), tested on 1.9.2
* [statsmodels](https://www.statsmodels.org/), tested on 0.13.2
Optional dependencies are:
* [mpmath](https://mpmath.org/), for *beta_ratio* and *beta_oddsratio*
* [rpy2](https://rpy2.github.io/), with R packages:
  * [BFpack](https://cran.r-project.org/web/packages/BFpack/index.html), for *bayesfactor_afbf* (*RegressionResult.bayesfactor_beta_zero*)
  * [logistf](https://cran.r-project.org/web/packages/logistf/index.html), for *PenalisedLogit*
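
If you adapt a recipe that relies on one of the optional dependencies, one common pattern is to import it only at the point of use and fail with a clear message when it is missing. The sketch below shows that pattern using only the standard library; `require_optional` is a hypothetical helper, and this is not necessarily how the library itself handles optional imports.

```python
import importlib

def require_optional(module_name, needed_for):
    """Import an optional module, or raise a descriptive ImportError (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ImportError as ex:
        raise ImportError(
            '{} is required for {} but is not installed'.format(module_name, needed_for)
        ) from ex

# Usage: import mpmath only where the beta ratio distributions are needed
try:
    mpmath = require_optional('mpmath', 'beta_ratio and beta_oddsratio')
except ImportError as ex:
    mpmath = None
    print(ex)
```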
## Functions
Relevant statistical functions are all directly available from the top-level *yli* namespace:
* Significance testing:
  * *anova_oneway*: One-way ANOVA
  * *chi2*: Pearson *χ*<sup>2</sup> test
  * *mannwhitney*: Mann–Whitney *U* test
  * *pearsonr*: Pearson correlation coefficient *r*
  * *ttest_ind*: Independent 2-sample *t* test
* Regression:
  * *logit_then_regress*: Perform logistic regression and use the estimates as the starting values for an arbitrary regression
  * *PenalisedLogit*: Model for Firth penalised logistic regression
  * *regress*: Fit arbitrary regression models
  * *vif*: Compute the variance inflation factor for independent variables in regression
* Input/output:
  * *pickle_write_compressed*, *pickle_read_compressed*: Pickle a pandas DataFrame and compress using LZMA
  * *pickle_write_encrypted*, *pickle_read_encrypted*: Pickle a pandas DataFrame, compress using LZMA, and encrypt
* Distributions:
  * *beta_oddsratio*: SciPy distribution for the odds ratio of 2 independent beta-distributed variables
  * *beta_ratio*: SciPy distribution for the ratio of 2 independent beta-distributed variables
  * *hdi*: Find the highest density interval (e.g. highest posterior density, HPD/HDI) for a SciPy distribution
  * *transformed_dist*: SciPy distribution for an arbitrary transformation of a SciPy distribution
* Bayesian inference:
  * *bayesfactor_afbf*: Adjusted fractional Bayes factor for a hypothesis on parameters from regression
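
As an illustration of the pickle-and-compress recipe listed under input/output, the following is a minimal standard-library sketch. The function names `pickle_write_lzma` and `pickle_read_lzma` are hypothetical and are not the library's own signatures; a plain dictionary with placeholder values stands in for a pandas DataFrame, which can be pickled in the same way.

```python
import lzma
import os
import pickle
import tempfile

def pickle_write_lzma(obj, path):
    """Pickle obj and write it to path as an LZMA-compressed (.xz) file."""
    with lzma.open(path, 'wb') as f:
        pickle.dump(obj, f)

def pickle_read_lzma(path):
    """Read an object back from an LZMA-compressed pickle file."""
    with lzma.open(path, 'rb') as f:
        return pickle.load(f)

# Placeholder data standing in for a DataFrame
data = {'Type': ['Fresh', 'Stored'], 'Potency': [1.0, 2.0]}

# Round trip through a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.pickle.xz')
    pickle_write_lzma(data, path)
    restored = pickle_read_lzma(path)

print(restored == data)
```

As with any pickle-based format, only load files from sources you trust, because unpickling can execute arbitrary code.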
Each function is documented in the respective docstring within the source code. Examples can be found in the unit tests in the *tests* directory.
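
To give a flavour of the distribution recipes, the sketch below approximates a highest density interval for a unimodal distribution by searching for the narrowest interval that contains the requested probability mass. It is an illustrative approach and not the library's *hdi* implementation: `hdi_from_ppf` and `expon_ppf` are hypothetical names, and a hand-written exponential quantile function is used so that the example runs without SciPy.

```python
import math

def hdi_from_ppf(ppf, mass=0.95, steps=10000):
    """Approximate the narrowest interval holding `mass` probability (unimodal case)."""
    best = None
    for i in range(steps + 1):
        lower_q = (1 - mass) * i / steps     # probability to the left of the interval
        upper_q = min(1.0, lower_q + mass)   # clamp against floating-point overshoot
        lo, hi = ppf(lower_q), ppf(upper_q)
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best

def expon_ppf(q):
    """Quantile function of the unit-rate exponential distribution."""
    return math.inf if q >= 1 else math.log(1 / (1 - q))

# For the unit exponential, the 95% interval should run from 0 to -ln(0.05)
lo, hi = hdi_from_ppf(expon_ppf)
print(lo, hi)
```

With SciPy available, the `ppf` method of a frozen distribution (for example `scipy.stats.beta(2, 5).ppf`) could be passed in place of `expon_ppf`. A grid search is simple but coarse; a numerical optimiser would give a more precise result.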