Significance tests

Functions

yli.anova_oneway(df, dep, ind, *, nan_policy='warn')

Perform one-way ANOVA

Parameters:
  • df (DataFrame) – Data to perform the test on

  • dep (str) – Column in df for the dependent variable (numeric)

  • ind (str) – Column in df for the independent variable (categorical)

  • nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.FTestResult

Example:

df = pd.DataFrame({
        'Method': [1]*8 + [2]*7 + [3]*9,
        'Score': [96, 79, 91, ...]
})
yli.anova_oneway(df, 'Score', 'Method')
F(2, 21) = 29.57; p < 0.001*

The output states that the value of the F statistic is 29.57, the F distribution has 2 degrees of freedom in the numerator and 21 in the denominator, and the test is significant with p value < 0.001.
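The F statistic and its two degrees of freedom come from the textbook one-way ANOVA decomposition. The minimal pure-Python sketch below shows that computation. The function name `oneway_f` is hypothetical and not part of yli, and it returns only the statistic and degrees of freedom, not the p value.

```python
from statistics import fmean


def oneway_f(groups):
    """Return (F, dof1, dof2) for a one-way ANOVA on a list of samples.

    Hypothetical helper for illustration; not part of yli.
    """
    k = len(groups)                           # number of groups
    n = sum(len(g) for g in groups)           # total number of observations
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    dof1, dof2 = k - 1, n - k
    f = (ss_between / dof1) / (ss_within / dof2)
    return f, dof1, dof2


f, dof1, dof2 = oneway_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F({dof1}, {dof2}) = {f:.2f}")  # F(2, 6) = 27.00
```

In the yli example, 3 groups and 8 + 7 + 9 = 24 observations give dof1 = 3 - 1 = 2 and dof2 = 24 - 3 = 21, which matches F(2, 21).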

yli.auto_univariable(df, dep, inds, *, nan_policy='warn')

Automatically compute univariable tests of association for a dichotomous dependent variable

The tests performed are:

If nan_policy is warn or omit, a row with nan values is omitted only from the individual tests of association involving the variables whose values are missing in that row.

Parameters:
  • df (DataFrame) – Data to perform the test on

  • dep (str) – Column in df for the dependent variable (dichotomous)

  • inds (List[str]) – Columns in df for the independent variables

  • nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.AutoBinaryResult
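The nan_policy behaviour of yli.auto_univariable drops a row only from the tests that involve its missing variable, rather than from every test. The sketch below illustrates this using plain dictionaries. The helper `complete_cases` and the toy data are hypothetical. Missing values are represented here as None or float NaN.

```python
import math


def complete_cases(rows, dep, ind):
    """Return (dep, ind) pairs from rows with no missing value in either column.

    Hypothetical helper for illustration; not part of yli.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    return [(r[dep], r[ind]) for r in rows if not (missing(r[dep]) or missing(r[ind]))]


# Illustrative toy data
rows = [
    {"Outcome": True, "Age": 54, "Smoker": False},
    {"Outcome": False, "Age": float("nan"), "Smoker": True},
    {"Outcome": True, "Age": 61, "Smoker": None},
    {"Outcome": False, "Age": 47, "Smoker": False},
]

# Each test of association uses its own complete-case subset
for ind in ["Age", "Smoker"]:
    print(ind, len(complete_cases(rows, "Outcome", ind)))
# Age 3
# Smoker 3
```

If rows with any missing value were dropped from every test instead, both tests would use only 2 rows.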

yli.chi2(df, dep, ind, *, nan_policy='warn')

Perform a Pearson χ² test

If a 2×2 contingency table is obtained (i.e. if both variables are dichotomous), an odds ratio and a risk ratio are also calculated. Both ratios compare the higher value of each variable with the lower value (i.e. True compared with False for a boolean). The risk ratio is calculated relative to the independent variable (rows of the contingency table).

Parameters:
  • df (DataFrame) – Data to perform the test on

  • dep (str) – Column in df for the dependent variable (categorical)

  • ind (str) – Column in df for the independent variable (categorical)

  • nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.PearsonChiSquaredResult

Example:

df = pd.DataFrame({
        'Response': np.repeat([False, True, False, True], [250, 750, 400, 1600]),
        'Stress': np.repeat([False, False, True, True], [250, 750, 400, 1600])
})
yli.chi2(df, 'Stress', 'Response')
Stress    False  True
Response             
False       250   400
True        750  1600

χ²(1) = 9.82; p = 0.002*
OR (95% CI) = 1.33 (1.11–1.60)
RR (95% CI) = 1.11 (1.03–1.18)

The output shows the contingency table and states that the value of the Pearson χ² statistic is 9.82, the χ² distribution has 1 degree of freedom, and the test is significant with p value 0.002.
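The χ² value can be reproduced by hand from the contingency table. The sketch below computes Pearson's χ² statistic without a continuity correction, which reproduces the printed 9.82. For 1 degree of freedom only, it then obtains the p value from the identity P(χ² > x) = erfc(√(x/2)). The function name `pearson_chi2` is hypothetical.

```python
import math


def pearson_chi2(table):
    """Pearson chi-squared statistic (no continuity correction) and degrees of freedom.

    `table` is a list of rows of observed counts. Hypothetical helper; not part of yli.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: Response = False, True; columns: Stress = False, True (from the example)
table = [[250, 400], [750, 1600]]
stat, dof = pearson_chi2(table)

assert dof == 1  # the erfc shortcut below is valid for 1 degree of freedom only
p = math.erfc(math.sqrt(stat / 2))
print(f"chi2({dof}) = {stat:.2f}; p = {p:.3f}")  # chi2(1) = 9.82; p = 0.002
```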

The odds of Stress in the Response = True group are 1.33 times those in the Response = False group, with 95% confidence interval 1.11–1.60.

The risk of Stress in the Response = True group is 1.11 times that in the Response = False group, with 95% confidence interval 1.03–1.18.
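The ratios and intervals above can be reproduced with the usual log-scale (Wald) confidence intervals. This section does not state which interval method yli uses, so treat this as a sketch under that assumption. It does, however, reproduce the printed values for the example table. The function name `ratios_2x2` is hypothetical.

```python
import math
from statistics import NormalDist


def ratios_2x2(table, level=0.95):
    """Odds ratio and risk ratio with log-scale Wald confidence intervals.

    table = [[n00, n01], [n10, n11]]: rows are the independent variable (lower, higher
    value), columns the dependent variable (lower, higher value). Hypothetical helper.
    """
    (n00, n01), (n10, n11) = table
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval

    # Odds of the higher dep value in the higher ind row vs the lower ind row
    odds_ratio = (n11 * n00) / (n10 * n01)
    se_log_or = math.sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)

    # Risk of the higher dep value in the higher ind row vs the lower ind row
    risk_ratio = (n11 / (n10 + n11)) / (n01 / (n00 + n01))
    se_log_rr = math.sqrt(1 / n11 - 1 / (n10 + n11) + 1 / n01 - 1 / (n00 + n01))

    def interval(estimate, se):
        centre = math.log(estimate)
        return math.exp(centre - z * se), math.exp(centre + z * se)

    return (odds_ratio, interval(odds_ratio, se_log_or)), (risk_ratio, interval(risk_ratio, se_log_rr))


(or_est, or_ci), (rr_est, rr_ci) = ratios_2x2([[250, 400], [750, 1600]])
print(f"OR (95% CI) = {or_est:.2f} ({or_ci[0]:.2f}-{or_ci[1]:.2f})")  # OR (95% CI) = 1.33 (1.11-1.60)
print(f"RR (95% CI) = {rr_est:.2f} ({rr_ci[0]:.2f}-{rr_ci[1]:.2f})")  # RR (95% CI) = 1.11 (1.03-1.18)
```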

yli.mannwhitney(df, dep, ind, *, nan_policy='warn', brunnermunzel=True, use_continuity=False, alternative='two-sided', method='auto')

Perform a Mann–Whitney U test

By default, this function also performs a Brunner–Munzel test if the Mann–Whitney test is significant. If the Mann–Whitney test is significant but the Brunner–Munzel test is not, a warning is raised. The Brunner–Munzel result is included in the returned object only if it is non-significant.

Parameters:
  • df (DataFrame) – Data to perform the test on

  • dep (str) – Column in df for the dependent variable (numeric)

  • ind (str) – Column in df for the independent variable (dichotomous)

  • nan_policy (str) – How to handle nan values (see NaN handling)

  • brunnermunzel (bool) – Whether to compute the Brunner–Munzel test if the Mann–Whitney test is significant

  • use_continuity – See scipy.stats.mannwhitneyu

  • alternative – See scipy.stats.mannwhitneyu

  • method – See scipy.stats.mannwhitneyu

Returns:

The result of the Mann–Whitney test. The result of a Brunner–Munzel test is included in the result object if and only if brunnermunzel is True, the Mann–Whitney test is significant, and the Brunner–Munzel test is non-significant.

Return type:

yli.sig_tests.MannWhitneyResult
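The rule for when a Brunner–Munzel result is attached can be summarised as a small decision function. This is a hypothetical sketch of the documented logic. It takes precomputed p values rather than running either test. It also assumes a significance threshold of 0.05, since the threshold yli uses is not stated here.

```python
import warnings


def include_brunnermunzel(p_mw, p_bm, brunnermunzel=True, alpha=0.05):
    """Return True if the Brunner-Munzel result would be attached to the result object.

    Hypothetical sketch of the documented rule; alpha = 0.05 is an assumption.
    """
    if not brunnermunzel or p_mw >= alpha:
        # Brunner-Munzel is only considered when Mann-Whitney is significant
        return False
    if p_bm >= alpha:
        # Mann-Whitney significant but Brunner-Munzel not: warn and attach the result
        warnings.warn("Mann-Whitney test is significant but Brunner-Munzel test is not")
        return True
    # Both significant: the Brunner-Munzel result is not attached
    return False
```

For example, Mann–Whitney p = 0.01 with Brunner–Munzel p = 0.2 emits a warning and returns True. If both p values are below 0.05, it returns False.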

Example:

df = pd.DataFrame({
        'Sample': ['Before'] * 12 + ['After'] * 12,
        'Oxygen': [11.0, 11.2, 11.2, ...]
})
yli.mannwhitney(df, 'Oxygen', 'Sample', method='asymptotic', alternative='less')
Sample                        After               Before
Oxygen                                                  
Median (IQR)    10.75 (10.55–10.95)  11.55 (11.20–11.83)
Median (range)  10.75 (11.00–12.10)  11.55 (11.00–12.10)

U = 6.0; p < 0.001*
r = 0.92, Before > After

The output states that the value of the Mann–Whitney U statistic is 6.0, and the one-sided test is significant with asymptotic p value < 0.001. The rank-biserial correlation is 0.92 in favour of the Before group.
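The printed rank-biserial correlation agrees with the standard formula r = 1 - 2U/(n₁n₂). Here U is the lesser of the two U statistics, as stored in MannWhitneyResult.statistic. With U = 6.0 and 12 observations per group, r = 1 - 12/144 ≈ 0.92. The sketch below counts U directly from two samples and applies that formula. The helper names are hypothetical, and the data are illustrative, not the elided example data.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs with x > y, with ties counted as 1/2."""
    u_x = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    return u_x, len(x) * len(y) - u_x


def rank_biserial(x, y):
    """Absolute rank-biserial correlation computed from the lesser U statistic."""
    u = min(mann_whitney_u(x, y))
    return 1 - 2 * u / (len(x) * len(y))


# Illustrative data (not the example data above)
before = [11.0, 11.2, 11.6, 12.1]
after = [10.4, 10.8, 11.0, 11.1]
print(mann_whitney_u(before, after))  # (14.5, 1.5)
print(rank_biserial(before, after))   # 0.8125

# Check against the example output: U = 6.0, 12 observations per group
print(round(1 - 2 * 6.0 / (12 * 12), 2))  # 0.92
```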

yli.pearsonr(df, dep, ind, *, nan_policy='warn')

Compute the Pearson product-moment correlation coefficient (Pearson’s r)

Parameters:
  • df (DataFrame) – Data to perform the test on

  • dep (str) – Column in df for the dependent variable (numerical)

  • ind (str) – Column in df for the independent variable (numerical)

  • nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.PearsonRResult

Example:

df = pd.DataFrame({
        'y': [41, 39, 47, 51, 43, 40, 57, 46, 50, 59, 61, 52],
        'x': [24, 30, 33, 35, 36, 36, 37, 37, 38, 40, 43, 49]
})
yli.pearsonr(df, 'y', 'x')
r (95% CI) = 0.65 (0.11–0.89); p = 0.02*

The output states that the value of the Pearson correlation coefficient is 0.65, with 95% confidence interval 0.11–0.89, and the test is significant with p value 0.02.
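The printed r and its confidence interval can be checked with a short standard-library sketch. It uses the Fisher transformation with standard error 1/√(n - 3) and a normal critical value, the approach the yli.spearman() description attributes to SciPy's pearsonr. The p value needs the t distribution, which is not in Python's standard library, so it is omitted. The function name `pearson_r_ci` is hypothetical.

```python
import math
from statistics import NormalDist, fmean


def pearson_r_ci(x, y, level=0.95):
    """Pearson's r with a Fisher z-transformation confidence interval (hypothetical helper)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)

    z = math.atanh(r)                             # Fisher transformation
    se = 1 / math.sqrt(n - 3)                     # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return r, (math.tanh(z - crit * se), math.tanh(z + crit * se))


# Data from the example above
y = [41, 39, 47, 51, 43, 40, 57, 46, 50, 59, 61, 52]
x = [24, 30, 33, 35, 36, 36, 37, 37, 38, 40, 43, 49]
r, (lo, hi) = pearson_r_ci(x, y)
print(f"r (95% CI) = {r:.2f} ({lo:.2f}-{hi:.2f})")  # r (95% CI) = 0.65 (0.11-0.89)
```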

yli.spearman(df, dep, ind, *, nan_policy='warn')

Compute the Spearman rank correlation coefficient (Spearman’s ρ)

The confidence interval for ρ is computed analogously to SciPy’s pearsonr, using the Fisher transformation and normal approximation, without adjustment to variance.

Parameters:
  • df (DataFrame) – Data to perform the test on

  • dep (str) – Column in df for the dependent variable (numerical)

  • ind (str) – Column in df for the independent variable (numerical)

  • nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.SpearmanResult

Example:

df = pd.DataFrame({
        'Profit': [2.5, 6.2, 3.1, ...],
        'Quality': [50, 57, 61, ...]
})
yli.spearman(df, 'Profit', 'Quality')
ρ (95% CI) = 0.87 (0.60–0.96); p < 0.001*

The output states that the value of the Spearman correlation coefficient is 0.87, with 95% confidence interval 0.60–0.96, and the test is significant with p value < 0.001.
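The confidence interval described for yli.spearman() can be sketched as follows. Spearman's ρ is computed as Pearson's r on ranks, with tied values given their average rank. The Fisher-transformed interval then uses the same standard error 1/√(n - 3) as for Pearson's r, with no variance adjustment. This is an illustrative reimplementation under those assumptions, not yli's code, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def average_ranks(values):
    """1-based ranks, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho_ci(x, y, level=0.95):
    """Spearman's rho (Pearson's r on ranks) with an unadjusted Fisher z interval."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    rho = sxy / math.sqrt(sxx * syy)

    z = math.atanh(rho)
    se = 1 / math.sqrt(len(x) - 3)  # same SE as for Pearson's r: no variance adjustment
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return rho, (math.tanh(z - crit * se), math.tanh(z + crit * se))


# Illustrative data; with only 5 observations the interval is very wide
rho, (lo, hi) = spearman_rho_ci([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(rho, 2))  # 0.8
```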

yli.ttest_ind(df, dep, ind, *, nan_policy='warn')

Perform an independent 2-sample Student’s t test

Parameters:
  • df (DataFrame) – Data to perform the test on

  • dep (str) – Column in df for the dependent variable (numeric)

  • ind (str) – Column in df for the independent variable (dichotomous)

  • nan_policy (str) – How to handle nan values (see NaN handling)

Return type:

yli.sig_tests.TTestResult

Example:

df = pd.DataFrame({
        'Type': ['Fresh'] * 10 + ['Stored'] * 10,
        'Potency': [10.2, 10.5, 10.3, ...]
})
yli.ttest_ind(df, 'Potency', 'Type')
Type            Fresh       Stored
Potency                           
μ (SD)   10.37 (0.32)  9.83 (0.24)

t(18) = 4.24; p < 0.001*
Δμ (95% CI) = 0.54 (0.27–0.81), Fresh > Stored

The output states that the value of the t statistic is 4.24, the t distribution has 18 degrees of freedom, and the test is significant with p value < 0.001. The mean difference is 0.54 in favour of the Fresh group, with 95% confidence interval 0.27–0.81.
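Student's t test pools the two sample variances, which gives n₁ + n₂ - 2 = 18 degrees of freedom in the example. The sketch below computes the t statistic, its degrees of freedom and the standard error of the mean difference. The function name `student_t` is hypothetical.

```python
import math
from statistics import fmean, variance


def student_t(a, b):
    """Student's t statistic (pooled variance), dof, and SE of the mean difference."""
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled variance weights each sample variance by its own degrees of freedom
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / dof
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (fmean(a) - fmean(b)) / se
    return t, dof, se


t, dof, se = student_t([1, 2, 3], [4, 5, 6])
print(f"t({dof}) = {t:.2f}")  # t(4) = -3.67
```

The confidence interval for the mean difference is Δμ ± t₀.₉₇₅ × se, where the t quantile comes from the t distribution with dof degrees of freedom. That quantile is not in Python's standard library (scipy.stats.t.ppf provides it). As a check against the example, pooling the printed SDs 0.32 and 0.24 with 10 observations per group gives se ≈ 0.126. With the tabulated quantile t₀.₉₇₅ ≈ 2.101 for 18 degrees of freedom, 0.54 ± 0.27 gives 0.27–0.81, as printed.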

yli.ttest_ind_multiple(df, dep, inds, *, nan_policy='warn', method='hs')

Perform independent 2-sample Student’s t tests with multiple independent variables, adjusting for multiplicity

Parameters:
  • df (DataFrame) – Data to perform the test on

  • dep (str) – Column in df for the dependent variable (numeric)

  • inds (List[str]) – Columns in df for the independent variables (dichotomous)

  • nan_policy (str) – How to handle nan values (see NaN handling)

  • method (str) – Method to apply for multiplicity adjustment (see statsmodels multipletests)

Return type:

yli.sig_tests.MultipleTTestResult
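The default method='hs' is the name statsmodels' multipletests uses for the Holm–Šidák step-down adjustment. The pure-Python sketch below shows what that adjustment computes. It is an illustrative reimplementation, not code from yli or statsmodels, and the function name `holm_sidak` is hypothetical.

```python
def holm_sidak(pvalues):
    """Holm-Sidak step-down adjusted p values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Sidak correction for the (m - rank) hypotheses remaining at this step
        adj = 1 - (1 - pvalues[i]) ** (m - rank)
        # Step-down: adjusted p values must not decrease as raw p values increase
        running_max = max(running_max, adj)
        adjusted[i] = min(running_max, 1.0)
    return adjusted


print([round(p, 4) for p in holm_sidak([0.01, 0.04, 0.03])])  # [0.0297, 0.0591, 0.0591]
```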

Result classes

class yli.sig_tests.AutoBinaryResult(*, dep, group1, group2, result_data, result_labels)

Result of automatically computed univariable tests of association for a dichotomous dependent variable

See yli.auto_univariable().

Results data stored within instances of this class is not intended to be directly accessed.

dep

Name of the dependent variable (str)

group1

Name of the first group (str)

group2

Name of the second group (str)

summary()

Return a stringified summary of the tests of association

Return type:

str

class yli.sig_tests.BrunnerMunzelResult(statistic, pvalue)

Result of a Brunner–Munzel test

See yli.mannwhitney(). This library calls the Brunner–Munzel test statistic W.

pvalue

p value for the W statistic (float)

statistic

W statistic (float)

summary()

Return a stringified summary of the Brunner–Munzel test

Return type:

str

class yli.sig_tests.ChiSquaredResult(statistic, dof, pvalue)

Result of a generic test with χ²-distributed test statistic

See yli.logrank(), yli.regress.RegressionModel.deviance_chi2().

See also yli.regress.BrantResult, yli.regress.LikelihoodRatioTestResult, PearsonChiSquaredResult.

dof

Degrees of freedom for the χ² distribution (int)

pvalue

p value for the χ² test (float)

statistic

χ² statistic (float)

summary()

Return a stringified summary of the χ² test

Return type:

str

class yli.sig_tests.FTestResult(statistic, dof1, dof2, pvalue)

Result of an F test for ANOVA/regression

See yli.anova_oneway() and yli.regress.RegressionModel.ftest().

dof1

Degrees of freedom in the F distribution numerator (int)

dof2

Degrees of freedom in the F distribution denominator (int)

pvalue

p value for the F statistic (float)

statistic

F statistic (float)

summary()

Return a stringified summary of the F test

Return type:

str

class yli.sig_tests.MannWhitneyResult(*, statistic, pvalue, dep, ind, group1, group2, med1, med2, iqr1, iqr2, range1, range2, rank_biserial, direction, brunnermunzel=None)

Result of a Mann–Whitney U test

See yli.mannwhitney().

brunnermunzel

BrunnerMunzelResult on the same data, or None if N/A

dep

Name of the dependent variable (str)

direction

Description of the direction of the effect (str)

group1

Name of the first group (str)

group2

Name of the second group (str)

ind

Name of the independent variable (str)

iqr1

Interquartile range of the first group (yli.utils.Interval)

iqr2

Interquartile range of the second group (yli.utils.Interval)

med1

Median of the first group (float)

med2

Median of the second group (float)

pvalue

p value for the U statistic (float)

range1

Range of the first group (yli.utils.Interval)

range2

Range of the second group (yli.utils.Interval)

rank_biserial

Absolute value of the rank-biserial correlation (float)

statistic

Lesser of the two Mann–Whitney U statistics (float)

summary()

Return a stringified summary of the Mann–Whitney test

Return type:

str

summary_short(html)

Return a stringified summary of the Mann–Whitney test (U statistic only)

Return type:

str

class yli.sig_tests.MultipleTTestResult(*, dep, results)

Result of multiple Student’s t tests, adjusted for multiplicity

See yli.ttest_ind_multiple().

dep

Name of the dependent variable (str)

results

Results of the t tests (List[TTestResult])

summary()

Return a stringified summary of the t tests

Return type:

str

class yli.sig_tests.PearsonChiSquaredResult(ct, statistic, dof, pvalue, oddsratio=None, riskratio=None)

Result of a Pearson χ² test

See yli.chi2().

ct

Contingency table for the observations (DataFrame)

dof

Degrees of freedom for the χ² distribution (int)

oddsratio

Odds ratio (float; None if not a 2×2 table)

pvalue

p value for the χ² test (float)

riskratio

Risk ratio (float; None if not a 2×2 table)

statistic

χ² statistic (float)

summary()

Return a stringified summary of the χ² test

Return type:

str

summary_short(html)

Return a stringified summary of the χ² test (χ² statistic only)

Return type:

str

class yli.sig_tests.PearsonRResult(statistic, pvalue)

Result of Pearson product-moment correlation

See yli.pearsonr().

pvalue

p value for the r statistic (float)

statistic

Pearson r correlation statistic (yli.utils.Estimate)

summary()

Return a stringified summary of the Pearson correlation

Return type:

str

class yli.sig_tests.SpearmanResult(statistic, pvalue)

Result of Spearman rank correlation

See yli.spearman().

pvalue

p value for the ρ statistic (float)

statistic

Spearman ρ correlation statistic (yli.utils.Estimate)

summary()

Return a stringified summary of the Spearman correlation

Return type:

str

class yli.sig_tests.TTestResult(*, statistic, dof, pvalue, dep, ind, group1, group2, mu1, mu2, sd1, sd2, delta, delta_direction)

Result of a Student’s t test

See yli.ttest_ind().

delta

Absolute value of the mean difference (yli.utils.Estimate)

delta_direction

Description of the direction of the effect (str)

dep

Name of the dependent variable (str)

dof

Degrees of freedom of the t distribution (int)

group1

Name of the first group (str)

group2

Name of the second group (str)

ind

Name of the independent variable (str)

mu1

Mean of the first group (float)

mu2

Mean of the second group (float)

pvalue

p value for the t statistic (float)

sd1

Standard deviation of the first group (float)

sd2

Standard deviation of the second group (float)

statistic

t statistic (float)

summary()

Return a stringified summary of the t test

Return type:

str

summary_short(html)

Return a stringified summary of the t test (t statistic only)

Return type:

str