apply_aggregations
- mobgap.utils.df_operations.apply_aggregations(
- df: DataFrame,
- aggregations: list[tuple[str | tuple[str, ...], callable | str | list[callable | str]] | CustomOperation],
- *,
- missing_columns: Literal['raise', 'ignore', 'warn'] = 'warn',
- )
Apply a set of aggregations to any DataFrame.
Returns a Series with one entry per aggregation. Compared to the default pandas
df.agg method, this allows more flexibility in selecting the data to aggregate and allows applying aggregations that require the data of multiple columns at once.
- Parameters:
- df
The DataFrame containing the data to aggregate. Aggregations are applied on individual or multiple columns of this DataFrame. The identifier provided in the aggregations must be a valid loc identifier for the DataFrame.
- aggregations : list
A list specifying which aggregation functions are to be applied for which metrics and data origins. There are two ways to define aggregations:
As a tuple in the format (<identifier>, <aggregation>). In this case, the operation is performed based on exactly one column from the input df. Therefore, <identifier> can either be a string representing the name of the column to evaluate (for data with single-level columns), or a tuple of strings uniquely identifying the column to evaluate. <aggregation> is the function or the list of functions to apply.
As a named tuple of type CustomOperation taking three arguments: identifier, function, and column_name. identifier is a valid loc identifier selecting one or more columns from the DataFrame, function is the (custom) aggregation function or list of functions to apply, and column_name is the name of the resulting column in the output DataFrame. In case of a single-level output column, column_name is a string, whereas for multi-level output columns, it is a tuple of strings. This allows for more complex aggregations that require multiple columns as input.
- missing_columns
How to handle missing columns specified in the aggregations.
“raise”: Raise a MissingDataColumnsError.
“ignore”: Ignore the missing columns and continue with the remaining aggregations.
“warn”: Issue a warning and continue with the remaining aggregations (default).
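The three `missing_columns` policies can be pictured with a small pandas-only sketch. The helper `existing_identifiers` and the local `MissingDataColumnsError` class are hypothetical stand-ins written for this illustration; they are not imported from mobgap, and the library's actual checks may differ.

```python
import warnings

import pandas as pd


class MissingDataColumnsError(KeyError):
    """Local stand-in for the error named above; not imported from mobgap."""


def existing_identifiers(df, identifiers, missing_columns="warn"):
    """Return the identifiers that select a column of ``df`` (hypothetical helper)."""
    missing = [i for i in identifiers if i not in df.columns]
    if missing:
        if missing_columns == "raise":
            raise MissingDataColumnsError(f"Columns not found: {missing}")
        if missing_columns == "warn":
            warnings.warn(f"Skipping missing columns: {missing}", stacklevel=2)
    # For "ignore" and "warn", continue with the columns that do exist.
    return [i for i in identifiers if i in df.columns]


# Example data with two-level columns (made-up values).
df = pd.DataFrame(
    {("cadence", "detected"): [100.0, 110.0], ("cadence", "reference"): [102.0, 108.0]}
)

# A tuple of strings is a valid loc identifier for exactly one multi-level column.
detected = df.loc[:, ("cadence", "detected")]

# The second identifier does not exist and is silently dropped with "ignore".
kept = existing_identifiers(
    df,
    [("cadence", "detected"), ("stride_length", "detected")],
    missing_columns="ignore",
)
```

With `missing_columns="raise"` the same call would raise the stand-in error instead of returning a shortened list.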
- Returns:
- aggregated_series
A pandas Series containing the aggregated values. The index of the Series is defined by the identifiers of the aggregations and the names of the functions. The entries of its MultiIndex will have the form
(*identifier, function_name)
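To make the `(*identifier, function_name)` form concrete, the following sketch builds an index of that shape with plain pandas. It is one possible way to arrive at such a Series, assuming made-up column names and values; it does not call mobgap and is not necessarily how the library constructs its result.

```python
import pandas as pd

# Example data with two-level columns (made-up values).
df = pd.DataFrame(
    {
        ("cadence", "detected"): [100.0, 110.0],
        ("cadence", "reference"): [102.0, 108.0],
    }
)

# df.agg with a list of function names returns a frame whose rows are the
# function names and whose columns are the identifiers.
per_function = df.agg(["mean", "max"])

# Unstacking moves the function name to the innermost index level, so each
# entry is keyed by (*identifier, function_name).
aggregated_series = per_function.unstack()

print(aggregated_series)
```

Here each identifier has two levels, so the resulting index has three levels, with the function name last.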
Notes
Warning
When mixing custom operations with built-in aggregations, make sure that the number of levels in the identifiers of the normal aggregations and the number of levels in the column_name attribute of the custom aggregations are identical. Otherwise, they cannot be combined.
As an implementation note, all traditional aggregations are handled directly by the pandas df.agg method. All CustomOperations are applied manually. At the end, the results are concatenated.
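The implementation note can be sketched in plain pandas as follows. The `CustomOperation` defined here is a local named tuple with the three documented fields, not the mobgap class, and `mean_abs_difference` is a hypothetical aggregation; using the function's `__name__` as the last index level is likewise an assumption made for this illustration.

```python
from collections import namedtuple

import pandas as pd

# Local stand-in with the three fields described above; not the mobgap class.
CustomOperation = namedtuple("CustomOperation", ["identifier", "function", "column_name"])


def mean_abs_difference(data: pd.DataFrame) -> float:
    """Hypothetical aggregation that needs two columns at once."""
    return float((data["detected"] - data["reference"]).abs().mean())


df = pd.DataFrame({"detected": [100.0, 110.0], "reference": [102.0, 108.0]})

# Built-in aggregations: delegate to df.agg, then reshape to (identifier, function_name).
# dropna removes the combinations that were not requested.
builtin = df.agg({"detected": ["mean"], "reference": ["mean", "max"]}).unstack().dropna()

# Custom operations: apply each function manually to the selected columns.
op = CustomOperation(["detected", "reference"], mean_abs_difference, "error")
custom = pd.Series(
    {(op.column_name, op.function.__name__): op.function(df.loc[:, op.identifier])}
)

# Both parts use two index levels, so they can be concatenated into one Series.
result = pd.concat([builtin, custom])
print(result)
```

The sketch also shows why the level counts must match: `builtin` is indexed by a one-level identifier plus the function name, and `column_name` is a plain string, so both parts end up with two index levels.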