Significance tests
Functions
- yli.anova_oneway(df, dep, ind, *, nan_policy='warn')
Perform one-way ANOVA
- Parameters:
df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
ind (str) – Column in df for the independent variable (categorical)
nan_policy (str) – How to handle nan values (see NaN handling)
- Return type:
FTestResult
Example:
df = pd.DataFrame({
    'Method': [1]*8 + [2]*7 + [3]*9,
    'Score': [96, 79, 91, ...]
})
yli.anova_oneway(df, 'Score', 'Method')
F(2, 21) = 29.57; p < 0.001*
The output states that the value of the F statistic is 29.57, the F distribution has 2 degrees of freedom in the numerator and 21 in the denominator, and the test is significant with p value < 0.001.
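As a rough illustration of where the F statistic and its two degrees of freedom come from, the following pure-Python sketch carries out the textbook one-way ANOVA calculation by hand. The function name `anova_oneway_by_hand` and the sample data are hypothetical and not part of yli, and the sketch does not claim to mirror the library's implementation.

```python
from statistics import mean

def anova_oneway_by_hand(groups):
    """Return (F, dof_between, dof_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared deviation of each group mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    dof_between = len(groups) - 1
    dof_within = len(all_values) - len(groups)
    f_stat = (ss_between / dof_between) / (ss_within / dof_within)
    return f_stat, dof_between, dof_within

# Hypothetical data: three groups of three observations each
f_stat, dof1, dof2 = anova_oneway_by_hand([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F({dof1}, {dof2}) = {f_stat:.2f}")
```

With the group sizes from the example above (8, 7 and 9 observations), the same formulas give 3 − 1 = 2 and 24 − 3 = 21 degrees of freedom.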
- yli.auto_univariable(df, dep, inds, *, nan_policy='warn')
Automatically compute univariable tests of association for a dichotomous dependent variable
The tests performed are:
For a dichotomous independent variable – yli.chi2()
For a continuous independent variable – yli.ttest_ind()
For an ordinal independent variable – yli.mannwhitney()
If nan_policy is warn or omit, rows with nan values are omitted only from the individual tests of association that involve the variables with missing values.
- Parameters:
df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (dichotomous)
inds (List[str]) – Columns in df for the independent variables
nan_policy (str) – How to handle nan values (see NaN handling)
- Return type:
AutoBinaryResult
- yli.chi2(df, dep, ind, *, nan_policy='warn')
Perform a Pearson χ² test
If a 2×2 contingency table is obtained (i.e. if both variables are dichotomous), an odds ratio and risk ratio are calculated. The ratios are calculated for the higher-valued value in each variable (i.e. True compared with False for a boolean). The risk ratio is calculated relative to the independent variable (rows of the contingency table).
- Parameters:
df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (categorical)
ind (str) – Column in df for the independent variable (categorical)
nan_policy (str) – How to handle nan values (see NaN handling)
- Return type:
PearsonChiSquaredResult
Example:
df = pd.DataFrame({
    'Response': np.repeat([False, True, False, True], [250, 750, 400, 1600]),
    'Stress': np.repeat([False, False, True, True], [250, 750, 400, 1600])
})
yli.chi2(df, 'Stress', 'Response')
Stress    False  True
Response
False       250   400
True        750  1600

χ²(1) = 9.82; p = 0.002*
OR (95% CI) = 1.33 (1.11–1.60)
RR (95% CI) = 1.11 (1.03–1.18)
The output shows the contingency table, and states that the value of the Pearson χ² statistic is 9.82, the χ² distribution has 1 degree of freedom, and the test is significant with p value 0.002.
The odds of Stress in the Response = True group are 1.33 times those in the Response = False group, with 95% confidence interval 1.11–1.60.
The risk of Stress in the Response = True group is 1.11 times that in the Response = False group, with 95% confidence interval 1.03–1.18.
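The figures in this example can be checked by hand from the four cell counts. The sketch below uses only the Python standard library. It assumes no continuity correction and Wald-type confidence intervals on the log scale. These assumptions reproduce the numbers shown above, but the source does not state that yli computes them this way.

```python
from math import erfc, exp, log, sqrt
from statistics import NormalDist

# Counts from the contingency table above: rows are Response, columns are Stress
a, b = 250, 400    # Response = False: Stress = False, Stress = True
c, d = 750, 1600   # Response = True:  Stress = False, Stress = True
n = a + b + c + d

# Pearson chi-squared statistic for a 2x2 table (no continuity correction)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# Upper tail of the chi-squared distribution with 1 degree of freedom
p = erfc(sqrt(chi2 / 2))

z = NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile

# Odds ratio of Stress = True for Response = True versus Response = False
odds_ratio = (d / c) / (b / a)
se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_ci = (exp(log(odds_ratio) - z * se_log_or), exp(log(odds_ratio) + z * se_log_or))

# Risk ratio of Stress = True, comparing the same two rows
risk_ratio = (d / (c + d)) / (b / (a + b))
se_log_rr = sqrt(1 / d - 1 / (c + d) + 1 / b - 1 / (a + b))
rr_ci = (exp(log(risk_ratio) - z * se_log_rr), exp(log(risk_ratio) + z * se_log_rr))

print(f"chi2(1) = {chi2:.2f}; p = {p:.3f}")
print(f"OR (95% CI) = {odds_ratio:.2f} ({or_ci[0]:.2f}-{or_ci[1]:.2f})")
print(f"RR (95% CI) = {risk_ratio:.2f} ({rr_ci[0]:.2f}-{rr_ci[1]:.2f})")
```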
- yli.mannwhitney(df, dep, ind, *, nan_policy='warn', brunnermunzel=True, use_continuity=False, alternative='two-sided', method='auto')
Perform a Mann–Whitney U test
By default, this function also performs a Brunner–Munzel test if the Mann–Whitney test is significant. If the Mann–Whitney test is significant but the Brunner–Munzel test is not, a warning is raised. The Brunner–Munzel result is included in the return value only in that case, i.e. when it is non-significant.
- Parameters:
df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
ind (str) – Column in df for the independent variable (dichotomous)
nan_policy (str) – How to handle nan values (see NaN handling)
brunnermunzel (bool) – Whether to compute the Brunner–Munzel test if the Mann–Whitney test is significant
use_continuity – See scipy.stats.mannwhitneyu
alternative – See scipy.stats.mannwhitneyu
method – See scipy.stats.mannwhitneyu
- Returns:
The result of the Mann–Whitney test. The result of a Brunner–Munzel test is included in the result object if and only if brunnermunzel is True, and the Mann–Whitney test is significant, and the Brunner–Munzel test is non-significant.
- Return type:
MannWhitneyResult
Example:
df = pd.DataFrame({
    'Sample': ['Before'] * 12 + ['After'] * 12,
    'Oxygen': [11.0, 11.2, 11.2, ...]
})
yli.mannwhitney(df, 'Oxygen', 'Sample', method='asymptotic', alternative='less')
Sample                        After               Before
Oxygen
Median (IQR)    10.75 (10.55–10.95)  11.55 (11.20–11.83)
Median (range)  10.75 (11.00–12.10)  11.55 (11.00–12.10)

U = 6.0; p < 0.001*
r = 0.92, Before > After
The output states that the value of the Mann–Whitney U statistic is 6.0, and the one-sided test is significant with asymptotic p value < 0.001. The rank-biserial correlation is 0.92 in favour of the Before group.
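To make the relationship between U and the rank-biserial correlation concrete, here is a minimal pure-Python sketch. It assumes the common formula r = 1 − 2U / (n1 · n2) applied to the lesser U statistic, which agrees with the figures in the example above. The helper names `mann_whitney_u` and `rank_biserial` are hypothetical and not part of yli.

```python
def mann_whitney_u(sample1, sample2):
    """Return the lesser of the two Mann-Whitney U statistics."""
    # U for sample1: count pairs where sample1 exceeds sample2; ties count one half
    u1 = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in sample1
        for y in sample2
    )
    u2 = len(sample1) * len(sample2) - u1
    return min(u1, u2)

def rank_biserial(u, n1, n2):
    """Absolute rank-biserial correlation from the lesser U statistic."""
    return 1 - 2 * u / (n1 * n2)

# Hypothetical samples, unrelated to the Oxygen data above
before = [1, 2, 3]
after = [2, 4, 5]
u = mann_whitney_u(before, after)
r = rank_biserial(u, len(before), len(after))
print(f"U = {u}; r = {r:.2f}")

# Figures from the example above: U = 6.0 with 12 observations per group
print(f"r = {rank_biserial(6.0, 12, 12):.2f}")
```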
- yli.pearsonr(df, dep, ind, *, nan_policy='warn')
Compute the Pearson product-moment correlation coefficient (Pearson’s r)
- Parameters:
df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numerical)
ind (str) – Column in df for the independent variable (numerical)
nan_policy (str) – How to handle nan values (see NaN handling)
- Return type:
PearsonRResult
Example:
df = pd.DataFrame({
    'y': [41, 39, 47, 51, 43, 40, 57, 46, 50, 59, 61, 52],
    'x': [24, 30, 33, 35, 36, 36, 37, 37, 38, 40, 43, 49]
})
yli.pearsonr(df, 'y', 'x')
r (95% CI) = 0.65 (0.11–0.89); p = 0.02*
The output states that the value of the Pearson correlation coefficient is 0.65, with 95% confidence interval 0.11–0.89, and the test is significant with p value 0.02.
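The coefficient and its interval can be reproduced from the data in this example with the standard library alone. The sketch below assumes a Fisher-transformation interval with a normal approximation, as used by SciPy's pearsonr. The function name `pearson_r_with_ci` is hypothetical and not part of yli.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean

def pearson_r_with_ci(x, y, level=0.95):
    """Return (r, lower, upper) using a Fisher z confidence interval."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    # Fisher transformation: atanh(r) is approximately normal with SE 1/sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_crit / sqrt(len(x) - 3)
    return r, tanh(atanh(r) - half_width), tanh(atanh(r) + half_width)

y = [41, 39, 47, 51, 43, 40, 57, 46, 50, 59, 61, 52]
x = [24, 30, 33, 35, 36, 36, 37, 37, 38, 40, 43, 49]
r, lower, upper = pearson_r_with_ci(x, y)
print(f"r (95% CI) = {r:.2f} ({lower:.2f}-{upper:.2f})")
```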
- yli.spearman(df, dep, ind, *, nan_policy='warn')
Compute the Spearman rank correlation coefficient (Spearman’s ρ)
The confidence interval for ρ is computed analogously to SciPy’s pearsonr, using the Fisher transformation and normal approximation, without adjustment to variance.
- Parameters:
df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numerical)
ind (str) – Column in df for the independent variable (numerical)
nan_policy (str) – How to handle nan values (see NaN handling)
- Return type:
SpearmanResult
Example:
df = pd.DataFrame({
    'Profit': [2.5, 6.2, 3.1, ...],
    'Quality': [50, 57, 61, ...]
})
yli.spearman(df, 'Profit', 'Quality')
ρ (95% CI) = 0.87 (0.60–0.96); p < 0.001*
The output states that the value of the Spearman correlation coefficient is 0.87, with 95% confidence interval 0.60–0.96, and the test is significant with p value < 0.001.
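The description above (Fisher transformation, normal approximation, no adjustment to the variance) can be sketched as Pearson's r computed on ranks, followed by the same interval as for Pearson's r. This is an illustrative reading, not yli's code. The helpers `average_ranks` and `spearman_with_ci` and the data are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist, mean

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman_with_ci(x, y, level=0.95):
    """Return (rho, lower, upper): Pearson's r on ranks plus a Fisher z interval."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    rho = sxy / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    # Same standard error as for Pearson's r, i.e. no adjustment to the variance
    half_width = NormalDist().inv_cdf(1 - (1 - level) / 2) / sqrt(len(x) - 3)
    return rho, tanh(atanh(rho) - half_width), tanh(atanh(rho) + half_width)

# Hypothetical data; the Profit and Quality values above are elided
rho, lower, upper = spearman_with_ci([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(f"rho (95% CI) = {rho:.2f} ({lower:.2f}-{upper:.2f})")
```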
- yli.ttest_ind(df, dep, ind, *, nan_policy='warn')
Perform an independent 2-sample Student’s t test
- Parameters:
df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
ind (str) – Column in df for the independent variable (dichotomous)
nan_policy (str) – How to handle nan values (see NaN handling)
- Return type:
TTestResult
Example:
df = pd.DataFrame({
    'Type': ['Fresh'] * 10 + ['Stored'] * 10,
    'Potency': [10.2, 10.5, 10.3, ...]
})
yli.ttest_ind(df, 'Potency', 'Type')
Type            Fresh       Stored
Potency
μ (SD)   10.37 (0.32)  9.83 (0.24)

t(18) = 4.24; p < 0.001*
Δμ (95% CI) = 0.54 (0.27–0.81), Fresh > Stored
The output states that the value of the t statistic is 4.24, the t distribution has 18 degrees of freedom, and the test is significant with p value < 0.001. The mean difference is 0.54 in favour of the Fresh group, with 95% confidence interval 0.27–0.81.
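For readers who want to see how the t statistic and its degrees of freedom arise, the following sketch computes a pooled-variance (Student's) two-sample t test with the standard library. The function name `student_t` and the data are hypothetical. It follows the textbook formula rather than any confirmed yli internals.

```python
from math import sqrt
from statistics import mean, variance

def student_t(sample1, sample2):
    """Return (t, dof) for an independent two-sample Student's t test."""
    n1, n2 = len(sample1), len(sample2)
    dof = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / dof
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / se, dof

# Hypothetical data, unrelated to the Potency example above
t_stat, dof = student_t([4, 5, 6], [2, 3, 4])
print(f"t({dof}) = {t_stat:.2f}")
```

With 10 observations per group, as in the example above, the degrees of freedom are 10 + 10 − 2 = 18.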
- yli.ttest_ind_multiple(df, dep, inds, *, nan_policy='warn', method='hs')
Perform independent 2-sample Student’s t tests with multiple independent variables, adjusting for multiplicity
- Parameters:
df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
inds (List[str]) – Columns in df for the independent variables (dichotomous)
nan_policy (str) – How to handle nan values (see NaN handling)
method (str) – Method to apply for multiplicity adjustment (see statsmodels multipletests)
- Return type:
MultipleTTestResult
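In statsmodels' multipletests, the default method 'hs' used here denotes the Holm–Šidák step-down adjustment. As a rough sketch of what that adjustment does to a set of p values, the pure-Python function below applies it by hand. The name `holm_sidak` and the p values are hypothetical. In practice the adjustment would come from statsmodels rather than from code like this.

```python
def holm_sidak(pvalues):
    """Return Holm-Sidak adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak adjustment over the (m - rank) hypotheses not yet stepped past
        candidate = 1 - (1 - pvalues[idx]) ** (m - rank)
        # Step-down: adjusted p values never decrease along the sorted order
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical unadjusted p values from three t tests
adjusted = holm_sidak([0.01, 0.04, 0.03])
print([round(p, 4) for p in adjusted])
```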
Result classes
- class yli.sig_tests.AutoBinaryResult(*, dep, group1, group2, result_data, result_labels)
Result of automatically computed univariable tests of association for a dichotomous dependent variable
Result data stored within instances of this class are not intended to be accessed directly.
- dep
Name of the dependent variable (str)
- group1
Name of the first group (str)
- group2
Name of the second group (str)
- summary()
Return a stringified summary of the tests of association
- Return type:
str
- class yli.sig_tests.BrunnerMunzelResult(statistic, pvalue)
Result of a Brunner–Munzel test
See yli.mannwhitney(). This library calls the Brunner–Munzel test statistic W.
- pvalue
p value for the W statistic (float)
- statistic
W statistic (float)
- summary()
Return a stringified summary of the Brunner–Munzel test
- Return type:
str
- class yli.sig_tests.ChiSquaredResult(statistic, dof, pvalue)
Result of a generic test with χ²-distributed test statistic
See yli.logrank(), yli.regress.RegressionModel.deviance_chi2().
See also yli.regress.BrantResult, yli.regress.LikelihoodRatioTestResult, PearsonChiSquaredResult.
- dof
Degrees of freedom for the χ² distribution (int)
- pvalue
p value for the χ² test (float)
- statistic
χ² statistic (float)
- summary()
Return a stringified summary of the χ² test
- Return type:
str
- class yli.sig_tests.FTestResult(statistic, dof1, dof2, pvalue)
Result of an F test for ANOVA/regression
See yli.anova_oneway() and yli.regress.RegressionModel.ftest().
- dof1
Degrees of freedom in the F distribution numerator (int)
- dof2
Degrees of freedom in the F distribution denominator (int)
- pvalue
p value for the F statistic (float)
- statistic
F statistic (float)
- summary()
Return a stringified summary of the F test
- Return type:
str
- class yli.sig_tests.MannWhitneyResult(*, statistic, pvalue, dep, ind, group1, group2, med1, med2, iqr1, iqr2, range1, range2, rank_biserial, direction, brunnermunzel=None)
Result of a Mann–Whitney U test
See yli.mannwhitney().
- brunnermunzel
BrunnerMunzelResult on the same data, or None if not applicable
- dep
Name of the dependent variable (str)
- direction
Description of the direction of the effect (str)
- group1
Name of the first group (str)
- group2
Name of the second group (str)
- ind
Name of the independent variable (str)
- iqr1
Interquartile range of the first group (yli.utils.Interval)
- iqr2
Interquartile range of the second group (yli.utils.Interval)
- med1
Median of the first group (float)
- med2
Median of the second group (float)
- pvalue
p value for the U statistic (float)
- range1
Range of the first group (yli.utils.Interval)
- range2
Range of the second group (yli.utils.Interval)
- rank_biserial
Absolute value of the rank-biserial correlation (float)
- statistic
Lesser of the two Mann–Whitney U statistics (float)
- summary()
Return a stringified summary of the Mann–Whitney test
- Return type:
str
- summary_short(html)
Return a stringified summary of the Mann–Whitney test (U statistic only)
- Return type:
str
- class yli.sig_tests.MultipleTTestResult(*, dep, results)
Result of multiple Student’s t tests, adjusted for multiplicity
- dep
Name of the dependent variable (str)
- results
Results of the t tests (List[TTestResult])
- summary()
Return a stringified summary of the t tests
- Return type:
str
- class yli.sig_tests.PearsonChiSquaredResult(ct, statistic, dof, pvalue, oddsratio=None, riskratio=None)
Result of a Pearson χ² test
See yli.chi2().
- ct
Contingency table for the observations (DataFrame)
- dof
Degrees of freedom for the χ² distribution (int)
- oddsratio
Odds ratio (float; None if not a 2×2 table)
- pvalue
p value for the χ² test (float)
- riskratio
Risk ratio (float; None if not a 2×2 table)
- statistic
χ² statistic (float)
- summary()
Return a stringified summary of the χ² test
- Return type:
str
- summary_short(html)
Return a stringified summary of the χ² test (χ² statistic only)
- Return type:
str
- class yli.sig_tests.PearsonRResult(statistic, pvalue)
Result of Pearson product-moment correlation
See yli.pearsonr().
- pvalue
p value for the r statistic (float)
- statistic
Pearson r correlation statistic (yli.utils.Estimate)
- summary()
Return a stringified summary of the Pearson correlation
- Return type:
str
- class yli.sig_tests.SpearmanResult(statistic, pvalue)
Result of Spearman rank correlation
See yli.spearman().
- pvalue
p value for the ρ statistic (float)
- statistic
Spearman ρ correlation statistic (yli.utils.Estimate)
- summary()
Return a stringified summary of the Spearman correlation
- Return type:
str
- class yli.sig_tests.TTestResult(*, statistic, dof, pvalue, dep, ind, group1, group2, mu1, mu2, sd1, sd2, delta, delta_direction)
Result of a Student’s t test
See yli.ttest_ind().
- delta
Absolute value of the mean difference (yli.utils.Estimate)
- delta_direction
Description of the direction of the effect (str)
- dep
Name of the dependent variable (str)
- dof
Degrees of freedom of the t distribution (int)
- group1
Name of the first group (str)
- group2
Name of the second group (str)
- ind
Name of the independent variable (str)
- mu1
Mean of the first group (float)
- mu2
Mean of the second group (float)
- pvalue
p value for the t statistic (float)
- sd1
Standard deviation of the first group (float)
- sd2
Standard deviation of the second group (float)
- statistic
t statistic (float)
- summary()
Return a stringified summary of the t test
- Return type:
str
- summary_short(html)
Return a stringified summary of the t test (t statistic only)
- Return type:
str