Descriptive statistics
Functions
- yli.auto_correlations(df, cols)
Automatically compute pairwise correlation coefficients
Dichotomous variables are coded as 0/1: the value that comes first in the natural sort order is coded 0, and the other value is coded 1. Categorical variables with more than 2 categories are coded with one-hot dummy variables for all categories. Ordinal variables are factorised and coded as ranks. Pairwise Pearson correlation coefficients are then calculated on the coded data.
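As an illustration of these coding rules, here is a minimal pure-Python sketch. It is not yli's implementation. The helper names `code_dichotomous`, `code_one_hot` and `code_ranks` are hypothetical. For simplicity, the ordinal level order is passed explicitly as a list, and tied values receive the average of their ranks; both choices are assumptions about how factorising and ranking might be done.

```python
def code_dichotomous(values):
    """Code a 2-level variable as 0/1: the value first in natural sort order becomes 0."""
    levels = sorted(set(values))
    if len(levels) != 2:
        raise ValueError("expected exactly 2 distinct values")
    return [levels.index(v) for v in values]


def code_one_hot(values, name):
    """Code a categorical variable with one 0/1 dummy column per category (none dropped)."""
    return {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in sorted(set(values))
    }


def code_ranks(values, order):
    """Factorise an ordinal variable using an explicit level order (hypothetical
    parameter), then replace each code by its rank; ties share the average rank."""
    codes = [order.index(v) for v in values]
    sorted_codes = sorted(codes)
    rank_of = {}
    for c in set(codes):
        first = sorted_codes.index(c)         # 0-based position of the first occurrence
        count = sorted_codes.count(c)         # number of tied observations
        rank_of[c] = first + (count + 1) / 2  # mean of 1-based positions first+1 .. first+count
    return [rank_of[c] for c in codes]


print(code_dichotomous(["no", "yes", "yes", "no"]))            # [0, 1, 1, 0]
print(code_one_hot(["a", "b", "c", "a"], "grp"))
print(code_ranks(["low", "high", "mid", "low"], ["low", "mid", "high"]))  # [1.5, 4.0, 3.0, 1.5]
```

One dummy column is kept for every category, matching the description above, rather than dropping a reference level as is usual in regression.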
The effect of the coding is that, for example:
2 continuous variables are compared using Pearson’s r
2 ordinal variables are compared using Spearman’s ρ
2 dichotomous variables are compared using Yule’s φ
A continuous variable and dichotomous variable are compared using point-biserial correlation
An ordinal variable and dichotomous variable are compared using rank-biserial correlation
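The equivalences listed above can be checked numerically. The following pure-Python sketch (all helper names are hypothetical) computes Pearson's r on rank-coded data and compares it with the textbook Spearman formula 1 − 6Σd²/(n(n² − 1)), which is only valid when there are no ties. It also computes Pearson's r on two 0/1 variables and compares it with φ computed from their 2×2 contingency table.

```python
import math


def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def phi_from_table(x, y):
    """Yule's phi from the 2x2 table of two 0/1 variables."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    # (ad - bc) divided by the square root of the product of row and column totals
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


x = [3, 1, 4, 1.5, 5, 9, 2.6]
y = [2, 7, 1, 8, 2.8, 1.8, 2.9]
print(round(pearson(ranks(x), ranks(y)), 4))  # -0.7857
print(round(spearman_no_ties(x, y), 4))       # -0.7857

u = [1, 1, 0, 0, 1, 0, 1, 0]
v = [1, 0, 0, 0, 1, 1, 1, 0]
print(pearson(u, v), phi_from_table(u, v))     # 0.5 0.5
```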
There is no nan_policy argument. nan values are omitted from summary statistics for each variable, and the count of nan values is reported.
- Parameters:
df (DataFrame) – Data to compute correlations for
cols (List[str]) – Columns in df for the variables to compute correlations for
- Return type:
yli.descriptives.AutoCorrelationsResult
- yli.auto_descriptives(df, cols, *, ordinal_range=[])
Automatically compute descriptive summary statistics
The statistics computed are:
For a categorical variable – Counts of values
For a continuous variable – Mean and standard deviation
For an ordinal variable – Median and IQR (default) or range
There is no nan_policy argument. nan values are omitted from summary statistics for each variable, and the count of nan values is reported.
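The following is a rough sketch of the per-type statistics listed above, written in pure Python with the standard library `statistics` module. The `describe` function and its `kind` and `use_range` arguments are hypothetical; `use_range=True` stands in for listing a column in `ordinal_range`. Two choices are assumptions and not necessarily what yli does: using the sample standard deviation, and using the `"inclusive"` quantile method, which is linear interpolation as in NumPy's default.

```python
import math
import statistics
from collections import Counter


def describe(values, kind, use_range=False):
    """Summarise one variable, omitting nan values and reporting how many were dropped.

    kind: 'categorical', 'continuous' or 'ordinal' (hypothetical labels).
    use_range: for ordinal data, report (min, max) instead of (Q1, Q3).
    """
    kept = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    result = {"n_missing": len(values) - len(kept)}
    if kind == "categorical":
        result["counts"] = dict(Counter(kept))
    elif kind == "continuous":
        result["mean"] = statistics.mean(kept)
        result["sd"] = statistics.stdev(kept)  # sample SD (n - 1 denominator): an assumption
    elif kind == "ordinal":
        result["median"] = statistics.median(kept)
        if use_range:
            result["range"] = (min(kept), max(kept))
        else:
            q1, _, q3 = statistics.quantiles(kept, n=4, method="inclusive")
            result["iqr"] = (q1, q3)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return result


print(describe(["a", "b", "a", float("nan")], "categorical"))
# {'n_missing': 1, 'counts': {'a': 2, 'b': 1}}
print(describe([1, 2, 2, 3, 4, 5, float("nan")], "ordinal"))
# {'n_missing': 1, 'median': 2.5, 'iqr': (2.0, 3.75)}
print(describe([1, 2, 2, 3, 4, 5, float("nan")], "ordinal", use_range=True))
# {'n_missing': 1, 'median': 2.5, 'range': (1, 5)}
```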
- Parameters:
df (DataFrame) – Data to summarise
cols (List[str]) – Columns in df for the variables to summarise
ordinal_range (List[str]) – Columns of ordinal variables in df for which to report the median and range (rather than the median and IQR)
- Return type:
yli.descriptives.AutoDescriptivesResult
Result classes
- class yli.descriptives.AutoCorrelationsResult(correlations)
Result of automatically computed pairwise correlation coefficients
- correlations
Pairwise correlation coefficients (DataFrame)
- plot()
Plot a heatmap of the pairwise correlation coefficients
- summary()
Return a stringified summary of the correlation matrix
- Return type:
str
- class yli.descriptives.AutoDescriptivesResult(*, result_data, result_labels)
Result of automatically computed descriptive summary statistics
The result data stored within instances of this class is not intended to be accessed directly.
- summary()
Return a stringified summary of the descriptive statistics
- Return type:
str