Significance tests

Functions

yli.anova_oneway(df, dep, ind, *, nan_policy='warn')

Perform one-way ANOVA

Parameters:

df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
ind (str) – Column in df for the independent variable (categorical)
nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.FTestResult

Example:

df = pd.DataFrame({
        'Method': [1]*8 + [2]*7 + [3]*9,
        'Score': [96, 79, 91, ...]
})
yli.anova_oneway(df, 'Score', 'Method')

F(2, 21) = 29.57; p < 0.001*

The output states that the value of the F statistic is 29.57, the F distribution has 2 degrees of freedom in the numerator and 21 in the denominator, and the test is significant with p value < 0.001.
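As an illustration of where the two degrees of freedom come from, here is a minimal library-free sketch of the one-way ANOVA arithmetic. The helper name `anova_oneway_f` and the scores are hypothetical (the example data above is elided); only the group sizes of 8, 7 and 9 are taken from the example. This is not necessarily how yli computes the statistic internally.

```python
from statistics import fmean

def anova_oneway_f(groups):
    """Return (F, dof1, dof2) for a one-way ANOVA over a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    # Between-groups sum of squares: group size times squared offset of the group mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: squared deviations from each group's own mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    dof1 = len(groups) - 1        # numerator: number of groups minus 1
    dof2 = n_total - len(groups)  # denominator: observations minus number of groups
    return (ss_between / dof1) / (ss_within / dof2), dof1, dof2

# Hypothetical scores; only the group sizes (8, 7, 9) match the example above
groups = [
    [88, 92, 85, 90, 87, 91, 86, 89],
    [78, 75, 80, 77, 74, 79, 76],
    [68, 72, 70, 69, 73, 71, 67, 70, 72],
]
f_stat, dof1, dof2 = anova_oneway_f(groups)
print(f"F({dof1}, {dof2}) = {f_stat:.2f}")
```

With 3 groups and 24 observations, the degrees of freedom are 3 − 1 = 2 and 24 − 3 = 21, matching F(2, 21) in the output above.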

yli.auto_univariable(df, dep, inds, *, nan_policy='warn')

Automatically compute univariable tests of association for a dichotomous dependent variable

The tests performed are:

For a dichotomous independent variable – yli.chi2()
For a continuous independent variable – yli.ttest_ind()
For an ordinal independent variable – yli.mannwhitney()

If nan_policy is warn or omit, rows with nan values are omitted only from the tests of association that involve the variable with the missing value, not from every test.
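The per-test omission described above can be sketched as follows. This is only an illustration under stated assumptions: the helpers `is_missing` and `rows_for_test`, the column names and the data are all hypothetical, and plain dictionaries stand in for a DataFrame.

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def rows_for_test(rows, dep, ind):
    """Keep the rows usable for one test: both dep and ind must be present."""
    return [r for r in rows if not is_missing(r[dep]) and not is_missing(r[ind])]

# Hypothetical data: 'Age' is missing in one row, 'Smoker' in another
rows = [
    {'Outcome': True,  'Age': 54.0,         'Smoker': True},
    {'Outcome': False, 'Age': float('nan'), 'Smoker': False},
    {'Outcome': True,  'Age': 61.0,         'Smoker': None},
    {'Outcome': False, 'Age': 47.0,         'Smoker': False},
]

# Each test drops only the rows missing its own variable
n_by_test = {ind: len(rows_for_test(rows, 'Outcome', ind)) for ind in ['Age', 'Smoker']}
print(n_by_test)  # {'Age': 3, 'Smoker': 3}
```

Each test keeps 3 of the 4 rows, whereas dropping every row with any missing value would leave only 2 rows for both tests.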

Parameters:

df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (dichotomous)
inds (List[str]) – Columns in df for the independent variables
nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.AutoBinaryResult

yli.chi2(df, dep, ind, *, nan_policy='warn')

Perform a Pearson χ² test

If a 2×2 contingency table is obtained (i.e. if both variables are dichotomous), an odds ratio and risk ratio are calculated. The ratios are calculated for the higher value of each variable (e.g. True compared with False for a boolean). The risk ratio is calculated relative to the independent variable (rows of the contingency table).

Parameters:

df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (categorical)
ind (str) – Column in df for the independent variable (categorical)
nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.PearsonChiSquaredResult

Example:

import numpy as np
import pandas as pd
import yli

df = pd.DataFrame({
        'Response': np.repeat([False, True, False, True], [250, 750, 400, 1600]),
        'Stress': np.repeat([False, False, True, True], [250, 750, 400, 1600])
})
yli.chi2(df, 'Stress', 'Response')

Stress    False  True
Response             
False       250   400
True        750  1600

χ²(1) = 9.82; p = 0.002*
OR (95% CI) = 1.33 (1.11–1.60)
RR (95% CI) = 1.11 (1.03–1.18)

The output shows the contingency table, and states that the value of the Pearson χ² statistic is 9.82, the χ² distribution has 1 degree of freedom, and the test is significant with p value 0.002.

The odds of Stress in the Response = True group are 1.33 times those in the Response = False group, with 95% confidence interval 1.11–1.60.

The risk of Stress in the Response = True group is 1.11 times that in the Response = False group, with 95% confidence interval 1.03–1.18.
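The figures in this example can be checked by hand from the four cell counts. The sketch below is library-free and rests on assumptions: it uses the 2×2 shortcut for the Pearson χ² statistic without continuity correction, and log-scale normal-approximation intervals for the two ratios. These choices reproduce the numbers shown above, but they are not confirmed to be the methods yli uses.

```python
import math
from statistics import NormalDist

# Cell counts from the contingency table above (rows: Response, columns: Stress)
a, b = 250, 400    # Response = False: Stress = False, Stress = True
c, d = 750, 1600   # Response = True:  Stress = False, Stress = True
n = a + b + c + d

# Pearson chi-squared statistic for a 2x2 table, without continuity correction
chi2_stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# With 1 degree of freedom, the upper tail probability is erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2_stat / 2))

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

# Odds of Stress in Response = True relative to Response = False
odds_ratio = (d / c) / (b / a)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # assumed log-scale method
or_ci = (odds_ratio * math.exp(-z * se_log_or), odds_ratio * math.exp(z * se_log_or))

# Risk of Stress in Response = True relative to Response = False
risk_ratio = (d / (c + d)) / (b / (a + b))
se_log_rr = math.sqrt(1 / d - 1 / (c + d) + 1 / b - 1 / (a + b))  # assumed method
rr_ci = (risk_ratio * math.exp(-z * se_log_rr), risk_ratio * math.exp(z * se_log_rr))

print(f"χ²(1) = {chi2_stat:.2f}; p = {p_value:.3f}")
print(f"OR (95% CI) = {odds_ratio:.2f} ({or_ci[0]:.2f}–{or_ci[1]:.2f})")
print(f"RR (95% CI) = {risk_ratio:.2f} ({rr_ci[0]:.2f}–{rr_ci[1]:.2f})")
```

Under these assumptions the printed values agree with the χ², OR and RR lines of the example output.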

yli.mannwhitney(df, dep, ind, *, nan_policy='warn', brunnermunzel=True, use_continuity=False, alternative='two-sided', method='auto')

Perform a Mann–Whitney U test

By default, this function also performs a Brunner–Munzel test if the Mann–Whitney test is significant. If the Mann–Whitney test is significant but the Brunner–Munzel test is not, a warning is raised. The Brunner–Munzel result is included in the returned object only if it is non-significant.
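The conditions under which the Brunner–Munzel result is reported can be written as a small predicate. This is a sketch of the rule as described here, not the library's code: the function name `brunnermunzel_reported` is hypothetical, and the 0.05 significance threshold is an assumption.

```python
def brunnermunzel_reported(brunnermunzel, mw_pvalue, bm_pvalue, alpha=0.05):
    """Return True if the Brunner-Munzel result would be attached to the output.

    alpha is an assumed significance threshold, not taken from the library.
    """
    if not brunnermunzel or mw_pvalue >= alpha:
        # Follow-up test disabled, or Mann-Whitney not significant: nothing to report
        return False
    # Reported (with a warning) only when the two tests disagree
    return bm_pvalue >= alpha

print(brunnermunzel_reported(True, 0.01, 0.20))   # True: tests disagree
print(brunnermunzel_reported(True, 0.01, 0.01))   # False: both significant
print(brunnermunzel_reported(True, 0.30, 0.20))   # False: Mann-Whitney not significant
print(brunnermunzel_reported(False, 0.01, 0.20))  # False: follow-up test disabled
```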

Parameters:

df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
ind (str) – Column in df for the independent variable (dichotomous)
nan_policy (str) – How to handle nan values (see NaN handling)
brunnermunzel (bool) – Whether to compute the Brunner–Munzel test if the Mann–Whitney test is significant
use_continuity – See scipy.stats.mannwhitneyu
alternative – See scipy.stats.mannwhitneyu
method – See scipy.stats.mannwhitneyu

Returns:

The result of the Mann–Whitney test. The result of a Brunner–Munzel test is included in the result object if and only if brunnermunzel is True, and the Mann–Whitney test is significant, and the Brunner–Munzel test is non-significant.

Return type:

yli.sig_tests.MannWhitneyResult

Example:

df = pd.DataFrame({
        'Sample': ['Before'] * 12 + ['After'] * 12,
        'Oxygen': [11.0, 11.2, 11.2, ...]
})
yli.mannwhitney(df, 'Oxygen', 'Sample', method='asymptotic', alternative='less')

Sample                        After               Before
Oxygen                                                  
Median (IQR)    10.75 (10.55–10.95)  11.55 (11.20–11.83)
Median (range)  10.75 (11.00–12.10)  11.55 (11.00–12.10)

U = 6.0; p < 0.001*
r = 0.92, Before > After

The output states that the value of the Mann–Whitney U statistic is 6.0, and the one-sided test is significant with asymptotic p value < 0.001. The rank-biserial correlation is 0.92 in favour of the Before group.
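The effect size in this example is consistent with the common formula r = 1 − 2U / (n₁n₂), where U is the lesser of the two U statistics. The sketch below assumes that formula (the library's exact computation is not shown in this document); the helper names and the small sample used for the pair-counting demonstration are hypothetical.

```python
def lesser_u(xs, ys):
    """Lesser of the two Mann-Whitney U statistics, by direct pair counting."""
    # Each pair scores 1 if x > y and 0.5 if tied
    u_x = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    u_y = len(xs) * len(ys) - u_x
    return min(u_x, u_y)

def rank_biserial(u, n1, n2):
    """Absolute rank-biserial correlation from the lesser U statistic."""
    return 1 - 2 * u / (n1 * n2)

# Values from the example above: U = 6.0 with 12 observations per group
print(f"r = {rank_biserial(6.0, 12, 12):.2f}")  # r = 0.92

# Hypothetical small samples to show the pair counting
print(lesser_u([1, 2, 3], [2, 4, 5]))  # 1.5
```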

yli.pearsonr(df, dep, ind, *, nan_policy='warn')

Compute the Pearson product-moment correlation coefficient (Pearson’s r)

Parameters:

df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
ind (str) – Column in df for the independent variable (numeric)
nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.PearsonRResult

Example:

import pandas as pd
import yli

df = pd.DataFrame({
        'y': [41, 39, 47, 51, 43, 40, 57, 46, 50, 59, 61, 52],
        'x': [24, 30, 33, 35, 36, 36, 37, 37, 38, 40, 43, 49]
})
yli.pearsonr(df, 'y', 'x')

r (95% CI) = 0.65 (0.11–0.89); p = 0.02*

The output states that the value of the Pearson correlation coefficient is 0.65, with 95% confidence interval 0.11–0.89, and the test is significant with p value 0.02.
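Because the data in this example is given in full, the coefficient and its interval can be recomputed by hand. The sketch below is library-free and assumes the interval comes from the Fisher transformation with standard error 1/√(n − 3); that assumption reproduces the interval shown, but the p value is not recomputed here.

```python
import math
from statistics import NormalDist, fmean

y = [41, 39, 47, 51, 43, 40, 57, 46, 50, 59, 61, 52]
x = [24, 30, 33, 35, 36, 36, 37, 37, 38, 40, 43, 49]

# Pearson's r: covariance scaled by the two standard deviations
mean_x, mean_y = fmean(x), fmean(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
r = sxy / math.sqrt(sxx * syy)

# Assumed interval method: normal approximation on the Fisher-transformed scale
z = NormalDist().inv_cdf(0.975)
se = 1 / math.sqrt(len(x) - 3)
lower = math.tanh(math.atanh(r) - z * se)
upper = math.tanh(math.atanh(r) + z * se)

print(f"r (95% CI) = {r:.2f} ({lower:.2f}–{upper:.2f})")  # r (95% CI) = 0.65 (0.11–0.89)
```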

yli.spearman(df, dep, ind, *, nan_policy='warn')

Compute the Spearman rank correlation coefficient (Spearman’s ρ)

The confidence interval for ρ is computed analogously to SciPy’s pearsonr, using the Fisher transformation and normal approximation, without adjustment to variance.
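One way to read this description: ρ is Pearson's r computed on ranks, and the interval applies a normal approximation to atanh(ρ) with variance 1/(n − 3), with no extra factor for the use of ranks. The sketch below follows that reading. The helper names and the data are hypothetical, and the code is an illustration rather than the library's implementation.

```python
import math
from statistics import NormalDist, fmean

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for i in order[start:end + 1]:
            ranks[i] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_with_ci(x, y, level=0.95):
    """Spearman's rho with an unadjusted Fisher-transformation interval."""
    rho = pearson(average_ranks(x), average_ranks(y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = 1 / math.sqrt(len(x) - 3)  # variance 1 / (n - 3), no adjustment for ranks
    centre = math.atanh(rho)        # undefined for a perfect correlation of +/-1
    return rho, math.tanh(centre - z * se), math.tanh(centre + z * se)

# Hypothetical data
rho, lower, upper = spearman_with_ci([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(f"ρ = {rho:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```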

Parameters:

df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
ind (str) – Column in df for the independent variable (numeric)
nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.SpearmanResult

Example:

df = pd.DataFrame({
        'Profit': [2.5, 6.2, 3.1, ...],
        'Quality': [50, 57, 61, ...]
})
yli.spearman(df, 'Profit', 'Quality')

ρ (95% CI) = 0.87 (0.60–0.96); p < 0.001*

The output states that the value of the Spearman correlation coefficient is 0.87, with 95% confidence interval 0.60–0.96, and the test is significant with p value < 0.001.

yli.ttest_ind(df, dep, ind, *, nan_policy='warn')

Perform an independent 2-sample Student’s t test

Parameters:

df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
ind (str) – Column in df for the independent variable (dichotomous)
nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.TTestResult

Example:

df = pd.DataFrame({
        'Type': ['Fresh'] * 10 + ['Stored'] * 10,
        'Potency': [10.2, 10.5, 10.3, ...]
})
yli.ttest_ind(df, 'Potency', 'Type')

Type            Fresh       Stored
Potency                           
μ (SD)   10.37 (0.32)  9.83 (0.24)

t(18) = 4.24; p < 0.001*
Δμ (95% CI) = 0.54 (0.27–0.81), Fresh > Stored

The output states that the value of the t statistic is 4.24, the t distribution has 18 degrees of freedom, and the test is significant with p value < 0.001. The mean difference is 0.54 in favour of the Fresh group, with 95% confidence interval 0.27–0.81.
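The degrees of freedom and the approximate size of the t statistic can be recovered from the summary table. The sketch below assumes the pooled-variance (Student's) form of the test; the function name `pooled_t` is hypothetical. Because the means and standard deviations in the table are rounded to two decimal places, the result (roughly 4.27) is close to, but not exactly, the reported 4.24.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    dof = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / dof
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, dof

# Rounded summary values from the table above, with 10 observations per group
t_stat, dof = pooled_t(10.37, 0.32, 10, 9.83, 0.24, 10)
print(f"t({dof}) ≈ {t_stat:.2f}")
```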

yli.ttest_ind_multiple(df, dep, inds, *, nan_policy='warn', method='hs')

Perform independent 2-sample Student’s t tests with multiple independent variables, adjusting for multiplicity

Parameters:

df (DataFrame) – Data to perform the test on
dep (str) – Column in df for the dependent variable (numeric)
inds (List[str]) – Columns in df for the independent variables (dichotomous)
nan_policy (str) – How to handle nan values (see NaN handling)
method (str) – Method to apply for multiplicity adjustment (see statsmodels multipletests)

Return type:

yli.sig_tests.MultipleTTestResult
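In statsmodels multipletests, the default method='hs' selects the Holm–Šidák step-down procedure. As a rough illustration of what that adjustment does to a set of p values, here is a library-free sketch; the function name `holm_sidak` and the p values are hypothetical, and in practice the adjustment is delegated to statsmodels.

```python
def holm_sidak(pvalues):
    """Holm-Sidak step-down adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Sidak correction for the (m - rank) hypotheses still under test
        candidate = 1 - (1 - pvalues[i]) ** (m - rank)
        # Adjusted p values must not decrease as the raw p values increase
        running_max = max(running_max, candidate)
        adjusted[i] = min(running_max, 1.0)
    return adjusted

# Hypothetical raw p values from three t tests
print([round(p, 4) for p in holm_sidak([0.01, 0.04, 0.03])])  # [0.0297, 0.0591, 0.0591]
```

The smallest p value is corrected as if for three tests, the next for two, and the largest for one; the running maximum keeps the adjusted values monotone.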

Result classes

class yli.sig_tests.AutoBinaryResult(*, dep, group1, group2, result_data, result_labels)

Result of automatically computed univariable tests of association for a dichotomous dependent variable

See yli.auto_univariable().

Results data stored within instances of this class is not intended to be directly accessed.

dep: Name of the dependent variable (str)

group1: Name of the first group (str)

group2: Name of the second group (str)

summary()

Return a stringified summary of the tests of association

Return type: str

class yli.sig_tests.BrunnerMunzelResult(statistic, pvalue)

Result of a Brunner–Munzel test

See yli.mannwhitney(). This library calls the Brunner–Munzel test statistic W.

pvalue: p value for the W statistic (float)

statistic: W statistic (float)

summary()

Return a stringified summary of the Brunner–Munzel test

Return type: str

class yli.sig_tests.ChiSquaredResult(statistic, dof, pvalue)

Result of a generic test with χ²-distributed test statistic

See yli.logrank() and yli.regress.RegressionModel.deviance_chi2().

dof: Degrees of freedom for the χ² distribution (int)

pvalue: p value for the χ² test (float)

statistic: χ² statistic (float)

summary()

Return a stringified summary of the χ² test

Return type: str

class yli.sig_tests.FTestResult(statistic, dof1, dof2, pvalue)

Result of an F test for ANOVA/regression

See yli.anova_oneway() and yli.regress.RegressionModel.ftest().

dof1: Degrees of freedom in the F distribution numerator (int)

dof2: Degrees of freedom in the F distribution denominator (int)

pvalue: p value for the F statistic (float)

statistic: F statistic (float)

summary()

Return a stringified summary of the F test

Return type: str

class yli.sig_tests.MannWhitneyResult(*, statistic, pvalue, dep, ind, group1, group2, med1, med2, iqr1, iqr2, range1, range2, rank_biserial, direction, brunnermunzel=None)

Result of a Mann–Whitney U test

See yli.mannwhitney().

brunnermunzel: BrunnerMunzelResult on the same data, or None if N/A

dep: Name of the dependent variable (str)

direction: Description of the direction of the effect (str)

group1: Name of the first group (str)

group2: Name of the second group (str)

ind: Name of the independent variable (str)

iqr1: Interquartile range of the first group (yli.utils.Interval)

iqr2: Interquartile range of the second group (yli.utils.Interval)

med1: Median of the first group (float)

med2: Median of the second group (float)

pvalue: p value for the U statistic (float)

range1: Range of the first group (yli.utils.Interval)

range2: Range of the second group (yli.utils.Interval)

rank_biserial: Absolute value of the rank-biserial correlation (float)

statistic: Lesser of the two Mann–Whitney U statistics (float)

summary()

Return a stringified summary of the Mann–Whitney test

Return type: str

summary_short(html)

Return a stringified summary of the Mann–Whitney test (U statistic only)

Return type: str

class yli.sig_tests.MultipleTTestResult(*, dep, results)

Result of multiple Student’s t tests, adjusted for multiplicity

See yli.ttest_ind_multiple().

dep: Name of the dependent variable (str)

results: Results of the t tests (List[TTestResult])

summary()

Return a stringified summary of the t tests

Return type: str

class yli.sig_tests.PearsonChiSquaredResult(ct, statistic, dof, pvalue, oddsratio=None, riskratio=None)

Result of a Pearson χ² test

See yli.chi2().

ct: Contingency table for the observations (DataFrame)

dof: Degrees of freedom for the χ² distribution (int)

oddsratio: Odds ratio (float; None if not a 2×2 table)

pvalue: p value for the χ² test (float)

riskratio: Risk ratio (float; None if not a 2×2 table)

statistic: χ² statistic (float)

summary()

Return a stringified summary of the χ² test

Return type: str

summary_short(html)

Return a stringified summary of the χ² test (χ² statistic only)

Return type: str

class yli.sig_tests.PearsonRResult(statistic, pvalue)

Result of Pearson product-moment correlation

See yli.pearsonr().

pvalue: p value for the r statistic (float)

statistic: Pearson r correlation statistic (yli.utils.Estimate)

summary()

Return a stringified summary of the Pearson correlation

Return type: str

class yli.sig_tests.SpearmanResult(statistic, pvalue)

Result of Spearman rank correlation

See yli.spearman().

pvalue: p value for the ρ statistic (float)

statistic: Spearman ρ correlation statistic (yli.utils.Estimate)

summary()

Return a stringified summary of the Spearman correlation

Return type: str

class yli.sig_tests.TTestResult(*, statistic, dof, pvalue, dep, ind, group1, group2, mu1, mu2, sd1, sd2, delta, delta_direction)

Result of a Student’s t test

See yli.ttest_ind().

delta: Absolute value of the mean difference (yli.utils.Estimate)

delta_direction: Description of the direction of the effect (str)

dep: Name of the dependent variable (str)

dof: Degrees of freedom of the t distribution (int)

group1: Name of the first group (str)

group2: Name of the second group (str)

ind: Name of the independent variable (str)

mu1: Mean of the first group (float)

mu2: Mean of the second group (float)

pvalue: p value for the t statistic (float)

sd1: Standard deviation of the first group (float)

sd2: Standard deviation of the second group (float)

statistic: t statistic (float)

summary()

Return a stringified summary of the t test

Return type: str

summary_short(html)

Return a stringified summary of the t test (t statistic only)

Return type: str