Internal utilities
Data wrangling
- yli.utils.as_2groups(df, data, group)
Group the data by the given variable, asserting only 2 groups
- Parameters:
df (DataFrame) – Data to group
group (str) – Column to group by
- Returns:
(group1, data1, group2, data2)
group1, group2 (str) – The 2 values of the grouping variable
data1, data2 (DataFrame) – The 2 corresponding subsets of df
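The two-group split described above can be sketched with plain pandas. This is an illustrative sketch, not yli's implementation: the helper name split_into_2groups is hypothetical, and because the data argument is not described in the parameter list, the sketch assumes it names the column(s) to return for each group.

```python
import pandas as pd

def split_into_2groups(df, data, group):
    """Hypothetical helper: split df[data] into exactly 2 subsets by df[group]."""
    # groupby sorts the group keys by default, so the order is deterministic
    groups = list(df.groupby(group).groups.items())
    if len(groups) != 2:
        raise ValueError(f"Got {len(groups)} values for {group}, expected 2")
    (group1, index1), (group2, index2) = groups
    return group1, df.loc[index1, data], group2, df.loc[index2, data]

df = pd.DataFrame({
    "Outcome": [1.0, 2.0, 3.0, 4.0],
    "Arm": ["control", "treated", "control", "treated"],
})

# Assumption: data is passed as a list of columns, so each subset is a DataFrame
group1, data1, group2, data2 = split_into_2groups(df, ["Outcome"], "Arm")
print(group1, data1["Outcome"].tolist())
print(group2, data2["Outcome"].tolist())
```

Raising an error when the grouping column has more or fewer than 2 distinct values mirrors the "asserting only 2 groups" behaviour described above.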
- yli.utils.as_numeric(data)
Convert the data to a numeric type, factorising if required
- Parameters:
data – Data to convert
- Returns:
See pandas.factorize
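Since the return value is documented by reference to pandas.factorize, a short example of that function may help. The sample data are made up for illustration, and the sketch calls pandas directly rather than yli.

```python
import pandas as pd

colours = pd.Series(["red", "blue", "red", "green"])

# factorize returns integer codes plus the unique values they index into
codes, uniques = pd.factorize(colours)

print(codes.tolist())   # integer code for each row, in order of first appearance
print(list(uniques))    # the unique values, indexed by those codes
```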
- yli.utils.convert_pandas_nullable(df)
Convert pandas nullable dtypes (e.g. Int64) to non-nullable numpy dtypes
Behaviour on encountering NA values is undefined, so the data should be passed through check_nan() first.
- Parameters:
df (DataFrame) – Data to check for pandas nullable dtypes
- Returns:
Data with pandas nullable dtypes converted, which may or may not be copied
- Return type:
DataFrame
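The distinction between nullable and non-nullable dtypes can be illustrated with plain pandas. This is a minimal sketch, assuming the data contain no NA values; it uses DataFrame.astype directly and is not necessarily how yli performs the conversion. The column names are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": pd.array([30, 41, 25], dtype="Int64"),              # nullable integer
    "Weight": pd.array([70.5, 82.0, 64.2], dtype="Float64"),   # nullable float
})

# Converting is only well defined when no NA values are present
assert not df.isna().any().any()

# Cast each nullable column to the corresponding numpy dtype
converted = df.astype({"Age": "int64", "Weight": "float64"})
print(converted.dtypes)
```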
p values
- yli.utils.fmt_p(p, style)
Format p value for display
- Parameters:
p (float) – p value to display
style (PValueStyle) – Style to format the p value
- Returns:
Formatted p value
- Return type:
str
- class yli.utils.PValueStyle(value)
An enum.Flag representing how to render a p value
- VALUE_ONLY
Display only the p value (e.g. 0.08, <0.001*)
This is an alias for specifying no flags.
- RELATION
Force displaying a relational operator before the p value (e.g. = 0.08, < 0.001*)
- HTML
Format as HTML (e.g. escape <)
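How combinable flags of this kind can drive formatting is sketched below with the standard library's enum.Flag. The names SketchPStyle and sketch_fmt_p are hypothetical, and the rounding rule and the 0.001 threshold are assumptions chosen to match the examples above; the sketch also omits the significance asterisk. It is not yli's implementation.

```python
import enum
import html

class SketchPStyle(enum.Flag):
    """Hypothetical stand-in for a p value style flag."""
    VALUE_ONLY = 0   # no flags set
    RELATION = 1     # always show a relational operator
    HTML = 2         # escape the output for HTML

def sketch_fmt_p(p, style=SketchPStyle.VALUE_ONLY):
    """Format a p value (assumed rule: 2 decimal places, floor at 0.001)."""
    with_relation = SketchPStyle.RELATION in style
    if p < 0.001:
        # The "<" is always needed here; RELATION only adds spacing
        text = "< 0.001" if with_relation else "<0.001"
    else:
        value = f"{p:.2f}"
        text = f"= {value}" if with_relation else value
    if SketchPStyle.HTML in style:
        text = html.escape(text)  # e.g. "<" becomes "&lt;"
    return text

print(sketch_fmt_p(0.08))
print(sketch_fmt_p(0.08, SketchPStyle.RELATION))
print(sketch_fmt_p(0.0001, SketchPStyle.RELATION | SketchPStyle.HTML))
```

Because the style is a Flag rather than a plain Enum, members can be combined with the | operator, and a zero-valued member naturally represents "no flags".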
Formula manipulation
- yli.utils.cols_for_formula(formula, df)
Return the columns corresponding to the Patsy formula
- Parameters:
formula (str) – Patsy formula to parse
df (DataFrame) – Data to apply the formula on
- Returns:
Columns in (the right-hand side of) the formula
- Return type:
List[str]
- yli.utils.formula_factor_ref_category(formula, df, factor)
Get the reference category for a term in a Patsy formula referring to a categorical factor
- Parameters:
formula (str) – Patsy formula to parse
df (DataFrame) – Data to apply the formula on
factor – Factor to determine reference category for (e.g. Country, C(Country), C(Country, Treatment), C(Country, Treatment("Australia")))
- Returns:
Reference category for the specified factor
- yli.utils.parse_patsy_term(formula, df, term)
Parse a Patsy term into its component parts
Example: The term "C(x, Treatment(y))[T.z]" parses to ("C(x, Treatment(y))", "x", "z").
- Returns:
(factor, column, contrast)
factor (str) – Name of the factor, as specified in the Patsy formula
column (str) – Name of the DataFrame column corresponding to the factor
contrast (str) – Name of the contrast for the factor, or None if not applicable
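The decomposition in the example above can be approximated with string handling alone. The following is a rough sketch, assuming terms of the form factor or factor[T.contrast]; the helper sketch_parse_term is hypothetical and, unlike the documented function, it ignores the formula and DataFrame, so it cannot resolve anything that requires actually evaluating the formula.

```python
import re

# Factor, optionally followed by a bracketed contrast such as "[T.z]"
_TERM_RE = re.compile(r"(?P<factor>.+?)(?:\[(?:T\.)?(?P<contrast>[^\]]+)\])?")

def sketch_parse_term(term):
    """Hypothetical helper: split a term into (factor, column, contrast)."""
    match = _TERM_RE.fullmatch(term)
    if match is None:
        raise ValueError(f"Cannot parse term: {term!r}")
    factor = match.group("factor")
    contrast = match.group("contrast")  # None when there is no bracketed part

    if factor.startswith("C(") and factor.endswith(")"):
        # Assumption: the column is the first argument to C(...)
        column = factor[2:-1].split(",", 1)[0].strip()
    else:
        column = factor
    return factor, column, contrast

print(sketch_parse_term("C(x, Treatment(y))[T.z]"))
print(sketch_parse_term("Age"))
```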
Library style
For API nomenclature, the following guidelines are used:
Prefer to call a test by its specific name (e.g. anova rather than ftest where applicable), unless most commonly known only by the distribution of the test statistic (e.g. chi2, ttest).
A test/statistic is not referred to by both a distribution and specific name (e.g. mannwhitney rather than mannwhitneyu), unless required for disambiguation (e.g. pearsonr to distinguish it from the Pearson χ² test).
The word “test” is omitted (e.g. chi2 rather than chi2test), unless the name would otherwise be a single letter (e.g. ttest, ftest), or unless required for disambiguation (e.g. LikelihoodRatioTestResult to distinguish from the unrelated meaning of “likelihood ratio” in epidemiology).
Underscores are usually omitted from the names of specific tests, test families and statistics (e.g. ttest, oddsratio, pearsonr, pvalue), but are used to separate these from other components (e.g. ttest_ind, anova_oneway, lrtest_null). There are a few exceptions (e.g. rank_biserial, pseudo_rsquared, f_statistic).
The result class for a test has the same naming convention as the test function (e.g. TTestResult for ttest_ind), with abbreviations spelled out (e.g. PearsonChiSquaredResult, LikelihoodRatioTestResult); unless the result class is generic among several tests (e.g. FTestResult for anova_oneway and RegressionResult.ftest), or unless required for disambiguation (e.g. PearsonChiSquaredResult for chi2, as there are other χ² tests).