create_multi_groupby

mobgap.utils.df_operations.create_multi_groupby(
    primary_df: DataFrame,
    secondary_dfs: DataFrame | list[DataFrame],
    groupby: Hashable | list[str],
    **kwargs: Unpack[dict[str, Any]],
) -> MultiGroupBy

Group multiple dataframes by the same index levels to apply a function to each group across all dataframes.

This function returns an object similar to a DataFrameGroupBy object, but only the apply and __iter__ methods are implemented. This special groupby object applies a groupby to the primary dataframe only. When iterating over the groups or applying a function, it additionally provides the matching groups of the secondary dataframes, which are selected using loc with the group name of the primary dataframe.
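
The following is a minimal pure-pandas sketch of the iteration behaviour described above, intended only to illustrate the idea. It is not mobgap's implementation: the function name iter_multi_groups is hypothetical, and instead of calling loc on each secondary dataframe it pre-groups every secondary dataframe once and looks up the primary group name in a dictionary. This lookup keeps all index levels and falls back to an empty frame for groups that are missing in a secondary dataframe, which mirrors the output shown in the example below.

```python
import pandas as pd


def iter_multi_groups(primary_df, secondary_dfs, groupby):
    """Hypothetical sketch of the multi-groupby iteration (not mobgap's code)."""
    if isinstance(secondary_dfs, pd.DataFrame):
        secondary_dfs = [secondary_dfs]
    # Pre-group each secondary dataframe once; the dict keys are the group names.
    secondary_lookups = [
        dict(iter(sec.groupby(level=groupby))) for sec in secondary_dfs
    ]
    # Only the primary dataframe drives the grouping.
    for group, primary_group in primary_df.groupby(level=groupby):
        secondary_groups = tuple(
            # Missing group: empty frame with the same columns and index levels.
            lookup.get(group, sec.iloc[0:0])
            for lookup, sec in zip(secondary_lookups, secondary_dfs)
        )
        yield group, (primary_group, *secondary_groups)
```

Used as `for group, (df1, df2) in iter_multi_groups(df, df_2, "group1"): ...`, the sketch produces the same three groups as the example below, including an empty secondary frame for group 3.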

This also means that this function is much more limited than the standard groupby object: it only supports grouping by existing, named index levels and requires all dataframes to have the same index levels.
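
These restrictions can be checked before grouping. The sketch below uses a hypothetical helper, check_multi_groupby_inputs, which is not part of mobgap. It assumes that "the same index levels" means the same level names in the same order, and it follows the signature above in treating a list as several level names and anything else as a single level name.

```python
import pandas as pd


def check_multi_groupby_inputs(primary_df, secondary_dfs, groupby):
    """Hypothetical pre-flight check for the restrictions described above."""
    if isinstance(secondary_dfs, pd.DataFrame):
        secondary_dfs = [secondary_dfs]
    # A list means several level names; any other hashable is a single level.
    levels = list(groupby) if isinstance(groupby, list) else [groupby]
    expected = list(primary_df.index.names)
    # Grouping is only possible by existing, named index levels.
    missing = [lvl for lvl in levels if lvl not in expected]
    if missing:
        raise ValueError(f"Not named index levels of the primary dataframe: {missing}")
    # Assumption: all dataframes must share the same level names in the same order.
    for i, sec in enumerate(secondary_dfs):
        if list(sec.index.names) != expected:
            raise ValueError(
                f"Secondary dataframe {i} has index levels "
                f"{list(sec.index.names)}, expected {expected}"
            )
```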

Warning

It is important to understand that we only group the index of the primary dataframe! This means that if a value of the grouping levels exists only in one of the secondary dataframes, the corresponding rows are ignored. If a group of the primary dataframe does not exist in a secondary dataframe, an empty dataframe is provided for it instead (see group 3 in the example below). We do this to be able to "just" use the normal pandas groupby API under the hood: we group the primary dataframe, get the corresponding groups from the secondary dataframes (if available), and inject them into all operations.
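
To find out in advance which rows would be silently dropped, you can compare the group keys of the primary dataframe with those of a secondary dataframe. The helper below, ignored_group_keys, is a hypothetical sketch (not part of mobgap) that uses only the standard pandas groupby API.

```python
import pandas as pd


def ignored_group_keys(primary_df, secondary_df, groupby):
    """Hypothetical helper: group keys present in secondary_df but not in primary_df.

    Rows with these keys would be skipped, because only the index of the
    primary dataframe is grouped.
    """
    # .groups maps each group key to its index labels; we only need the keys.
    primary_keys = set(primary_df.groupby(level=groupby).groups)
    secondary_keys = set(secondary_df.groupby(level=groupby).groups)
    return sorted(secondary_keys - primary_keys)
```

A non-empty result means that some secondary rows will never appear in any group.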

Parameters:
primary_df

The primary dataframe to group by. Its index will be used to perform the actual grouping.

secondary_dfs

The secondary dataframe or list of secondary dataframes. Their groups are selected using the group names of the primary dataframe.

groupby

The name of an index level, or a list of index level names, to group by.

kwargs

All further arguments will be passed to .groupby of all dataframes.

Examples

>>> import pandas as pd
>>> from mobgap.utils.df_operations import create_multi_groupby
>>> df = pd.DataFrame(
...     {
...         "group1": [1, 1, 2, 3],
...         "group2": [1, 2, 1, 1],
...         "value": [1, 2, 3, 4],
...     }
... ).set_index(["group1", "group2"])
>>> df_2 = pd.DataFrame(
...     {
...         "group1": [1, 1, 1, 2],
...         "group2": [1, 2, 3, 1],
...         "value": [11, 12, 13, 14],
...     }
... ).set_index(["group1", "group2"])
>>> multi_groupby = create_multi_groupby(df, df_2, ["group1"])
>>> for group, (df1, df2) in multi_groupby:
...     print(group)
...     print(df1)
...     print(df2)
1
               value
group1 group2
1      1           1
       2           2
               value
group1 group2
1      1          11
       2          12
       3          13
2
               value
group1 group2
2      1           3
               value
group1 group2
2      1          14
3
               value
group1 group2
3      1           4
Empty DataFrame
Columns: [value]
Index: []

Examples using mobgap.utils.df_operations.create_multi_groupby

The Mobilise-D pipeline: Step-by-Step Breakdown

Evaluation of final walking bout level DMOs

ICD Evaluation