apply_aggregations

mobgap.utils.df_operations.apply_aggregations(
df: DataFrame,
aggregations: list[tuple[str | tuple[str, ...], callable | str | list[callable | str]] | CustomOperation],
*,
missing_columns: Literal['raise', 'ignore', 'warn'] = 'warn',
) → Series

Apply a set of aggregations to any DataFrame.

Returns a Series with one entry per aggregation. Compared to the default pandas df.agg method, this allows more flexibility in selecting the data to aggregate and makes it possible to apply aggregations that require the data of multiple columns at once.

Parameters:
df

The DataFrame containing the data to aggregate. Aggregations are applied on individual or multiple columns of this DataFrame. The identifier provided in the aggregations must be a valid loc identifier for the DataFrame.

aggregations : list

A list specifying which aggregation functions are to be applied for which metrics and data origins. There are two ways to define aggregations:

  1. As a tuple in the format (<identifier>, <aggregation>). In this case, the operation is performed based on exactly one column from the input df. Therefore, <identifier> can either be a string representing the name of the column to evaluate (for data with single-level columns), or a tuple of strings uniquely identifying the column to evaluate. <aggregation> is the function or the list of functions to apply.

  2. As a named tuple of type CustomOperation taking three arguments: identifier, function, and column_name. identifier is a valid loc identifier selecting one or more columns from the DataFrame, function is the (custom) aggregation function or list of functions to apply, and column_name is the name of the resulting entry in the output. For a single-level output, column_name is a string, whereas for a multi-level output, it is a tuple of strings. This allows for more complex aggregations that require multiple columns as input.
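The two definition styles can be sketched as follows. This is a minimal illustration, not code taken from mobgap: the `CustomOperation` below is a local stand-in named tuple with the three fields described above so that the snippet runs without mobgap installed, and the column names, the data, and the `mean_abs_error` helper are hypothetical. It also assumes that a custom function receives the sub-DataFrame selected by its identifier.

```python
from collections import namedtuple

import pandas as pd

# Local stand-in (assumption) mirroring the three documented fields.
# In real code, CustomOperation would be imported from mobgap instead.
CustomOperation = namedtuple(
    "CustomOperation", ["identifier", "function", "column_name"]
)

# Hypothetical example data with single-level columns.
df = pd.DataFrame(
    {
        "detected": [1.0, 2.0, 4.0],
        "reference": [1.5, 2.0, 3.0],
    }
)


def mean_abs_error(data: pd.DataFrame) -> float:
    """Hypothetical custom aggregation that needs two columns at once."""
    return float((data["detected"] - data["reference"]).abs().mean())


aggregations = [
    # Style 1: (<identifier>, <aggregation>) on exactly one column.
    ("detected", "mean"),
    ("reference", ["mean", "std"]),
    # Style 2: a CustomOperation selecting several columns via a loc identifier.
    CustomOperation(
        identifier=["detected", "reference"],
        function=mean_abs_error,
        column_name="error",
    ),
]

# What the custom operation would see, assuming selection via df.loc.
selected = df.loc[:, aggregations[2].identifier]
error = mean_abs_error(selected)

# With mobgap installed, the list would be passed on like this:
# result = apply_aggregations(df, aggregations)
```

Both the plain identifiers and column_name are single strings here, so all entries share the same number of levels.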

missing_columns

How to handle missing columns specified in the aggregations.

  • “raise”: Raise a MissingDataColumnsError.

  • “ignore”: Ignore the missing columns and continue with the remaining aggregations.

  • “warn”: Issue a warning and continue with the remaining aggregations (default).

Returns:
aggregated_series

A pandas Series containing the aggregated values. The index of the Series is built from the identifiers of the aggregations and the names of the applied functions. For multi-level identifiers, the resulting MultiIndex entries have the form (*identifier, function_name).
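To make the index form concrete, the following pandas-only sketch builds one such entry by hand for a DataFrame with two column levels. It does not call mobgap. The column names and values are made up, and the snippet only illustrates the (*identifier, function_name) pattern described above.

```python
import pandas as pd

# Hypothetical data with two column levels.
columns = pd.MultiIndex.from_tuples(
    [("cadence", "detected"), ("cadence", "reference")]
)
df = pd.DataFrame([[100.0, 102.0], [110.0, 108.0]], columns=columns)

# A tuple identifier that uniquely selects one column via df.loc.
identifier = ("cadence", "detected")
value = df.loc[:, identifier].agg("mean")

# The output entry is keyed by the identifier plus the function name.
entry = pd.Series({(*identifier, "mean"): value})
```

The resulting Series has a three-level index: two levels from the identifier and one for the function name.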

Notes

Warning

When mixing custom operations with built-in aggregations, make sure that the number of levels in the identifiers of the normal aggregations and the number of levels in the column_name attribute of the custom aggregations are identical. Otherwise, they cannot be combined.

As an implementation note: all traditional aggregations are handled directly by the pandas df.agg method, while all CustomOperations are applied manually. At the end, the results are concatenated.
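The split described in this note can be approximated with plain pandas. The sketch below is an assumption about the general shape of that process, not the actual mobgap implementation. The data, the `spec` dictionary, and the `mean_abs_error` function are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "detected": [1.0, 2.0, 4.0],
        "reference": [1.5, 2.0, 3.0],
    }
)

# Step 1: built-in aggregations go through df.agg in a single call.
spec = {"detected": ["mean"], "reference": ["mean", "max"]}
table = df.agg(spec)  # rows: function names, columns: column names

# Flatten the table to a Series keyed by (identifier, function_name),
# keeping only the combinations that were actually requested.
builtin = pd.Series(
    {(col, fn): table.loc[fn, col] for col, fns in spec.items() for fn in fns}
)


# Step 2: a custom operation is applied manually to the selected columns.
def mean_abs_error(data: pd.DataFrame) -> float:
    return float((data["detected"] - data["reference"]).abs().mean())


custom = pd.Series(
    {
        ("error", mean_abs_error.__name__): mean_abs_error(
            df.loc[:, ["detected", "reference"]]
        )
    }
)

# Step 3: concatenate both parts. This works cleanly because both
# indices have the same number of levels (two).
result = pd.concat([builtin, custom])
```

If the custom part had a different number of index levels than the built-in part, the two Series would no longer share one consistent index, which is what the warning above guards against.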

Examples using mobgap.utils.df_operations.apply_aggregations

Evaluation of final walking bout level DMOs


Cadence Evaluation


Stride Length Evaluation
