Descriptive statistics

Functions

yli.auto_correlations(df, cols)

Automatically compute pairwise correlation coefficients

Dichotomous variables are coded as 0/1, with 0 for the value that comes first in the natural sort order and 1 for the other. Categorical variables with more than 2 categories are coded with one-hot dummy variables for all categories. Ordinal variables are factorised and coded as ranks. Pairwise Pearson correlation coefficients are then calculated on the coded data.

The effect of the coding is that, for example:

  • 2 continuous variables are compared using Pearson’s r

  • 2 ordinal variables are compared using Spearman’s ρ

  • 2 dichotomous variables are compared using Yule’s φ

  • A continuous variable and dichotomous variable are compared using point-biserial correlation

  • An ordinal variable and dichotomous variable are compared using rank-biserial correlation
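The equivalences listed above can be checked with a short sketch using pandas and SciPy directly. This is not yli's implementation: the data and column names are hypothetical, and the handling of tied ranks (average ranks) is an assumption, since the tie rule is not stated here.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data, invented purely for illustration
df = pd.DataFrame({
    "age": [23.0, 35.0, 41.0, 52.0, 60.0, 29.0],        # continuous
    "smoker": ["no", "yes", "no", "yes", "yes", "no"],  # dichotomous
    "pain": [1, 3, 2, 5, 4, 2],                         # ordinal
    "mood": [2, 3, 1, 4, 5, 1],                         # ordinal
    "site": ["A", "B", "C", "A", "B", "C"],             # categorical, 3 levels
})

# Dichotomous: 0 for the value that sorts first, 1 for the other
low, high = sorted(df["smoker"].unique())
smoker01 = (df["smoker"] == high).astype(float)

# Ordinal: replace values by ranks (assumption: ties share the average rank)
pain_rank = df["pain"].rank()
mood_rank = df["mood"].rank()

# Categorical with more than 2 levels: one dummy column per level
site_dummies = pd.get_dummies(df["site"], prefix="site").astype(float)

# Pearson correlations on the coded data
coded = pd.concat(
    [df["age"], smoker01.rename("smoker"), pain_rank, mood_rank, site_dummies],
    axis=1,
)
corr = coded.corr(method="pearson")

# Pearson on ranks agrees with Spearman's rho on the raw ordinal values
rho = stats.spearmanr(df["pain"], df["mood"])[0]
# Pearson on a 0/1 column agrees with the point-biserial correlation
r_pb = stats.pointbiserialr(smoker01, df["age"])[0]

print(np.isclose(corr.loc["pain", "mood"], rho))
print(np.isclose(corr.loc["age", "smoker"], r_pb))
```

Both checks compare a single entry of the coded Pearson matrix against the named coefficient computed independently by SciPy.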

There is no nan_policy argument. nan values are omitted from summary statistics for each variable, and the count of nan values is reported.

Parameters:
  • df (DataFrame) – Data to compute correlations for

  • cols (List[str]) – Columns in df for the variables to compute correlations for

Return type:

yli.descriptives.AutoCorrelationsResult

yli.auto_descriptives(df, cols, *, ordinal_range=[])

Automatically compute descriptive summary statistics

The statistics computed are:

  • For a categorical variable – Counts of values

  • For a continuous variable – Mean and standard deviation

  • For an ordinal variable – Median and IQR (default) or range

There is no nan_policy argument. nan values are omitted from summary statistics for each variable, and the count of nan values is reported.
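As a rough illustration of the statistics described above, the following sketch computes them with plain pandas. It is not yli's implementation: the data are hypothetical, and the use of the sample standard deviation and of pandas' default quantile interpolation are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented purely for illustration
df = pd.DataFrame({
    "sex": ["F", "M", "F", None, "M", "F"],         # categorical
    "age": [23.0, 35.0, np.nan, 52.0, 60.0, 30.0],  # continuous
    "pain": [1, 3, 2, 5, 4, np.nan],                # ordinal
})

# Categorical: counts of each value (missing values are skipped by default)
sex_counts = df["sex"].value_counts()

# Continuous: mean and standard deviation (pandas skips NaN by default)
age_mean = df["age"].mean()
age_sd = df["age"].std()  # sample standard deviation (ddof=1), an assumption

# Ordinal: median with IQR, or median with range
pain_median = df["pain"].median()
pain_q1, pain_q3 = df["pain"].quantile([0.25, 0.75])
pain_min, pain_max = df["pain"].min(), df["pain"].max()

# Count of missing values per variable, reported alongside the statistics
n_missing = df.isna().sum()

print(sex_counts)
print(f"age: {age_mean:.1f} ({age_sd:.1f}), missing: {n_missing['age']}")
print(f"pain: {pain_median} (IQR {pain_q1} to {pain_q3}; range {pain_min} to {pain_max})")
```

Each statistic is computed on the non-missing values of its own column only, so a missing value in one variable does not remove the row from the summaries of the other variables.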

Parameters:
  • df (DataFrame) – Data to summarise

  • cols (List[str]) – Columns in df for the variables to summarise

  • ordinal_range (List[str]) – Columns of ordinal variables in df for which to report the median and range (rather than the median and IQR)

Return type:

yli.descriptives.AutoDescriptivesResult

Result classes

class yli.descriptives.AutoCorrelationsResult(correlations)

Result of automatically computed pairwise correlation coefficients

See yli.auto_correlations().

correlations

Pairwise correlation coefficients (DataFrame)

plot()

Plot a heatmap of the pairwise correlation coefficients

summary()

Return a stringified summary of the correlation matrix

Return type:

str

class yli.descriptives.AutoDescriptivesResult(*, result_data, result_labels)

Result of automatically computed descriptive summary statistics

See yli.auto_descriptives().

Result data stored within instances of this class are not intended to be accessed directly.

summary()

Return a stringified summary of the descriptive summary statistics

Return type:

str