Mobilised Aggregator

This example shows how to use the MobilisedAggregator class to aggregate DMOs (digital mobility outcomes) over multiple walking bouts.

Loading some example data

Note

This data is randomly generated and not physiologically meaningful. However, it has the same structure as a typical input dataset for the MobilisedAggregator.

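The input data for the aggregator is a pandas.DataFrame with one row per walking bout. The columns contain the DMO parameters estimated for each walking bout, such as duration, stride length, etc. In this example, the index identifies each bout by visit type, participant, measurement date, and walking bout id.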
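To see the expected layout without the example files, the following sketch builds a tiny walking-bout table by hand, using the same index levels as the example data (visit_type, participant_id, measurement_date, wb_id). The values are made up and only a subset of the DMO columns is included; which columns the aggregator actually requires is not covered here.

```python
import pandas as pd

# One row per walking bout, identified by visit, participant, date and bout id.
index = pd.MultiIndex.from_tuples(
    [
        ("T1", 1, "2024-01-01", 0),
        ("T1", 1, "2024-01-01", 1),
        ("T1", 1, "2024-01-02", 0),
    ],
    names=["visit_type", "participant_id", "measurement_date", "wb_id"],
)
# Made-up DMO values, for illustration only.
wb_table = pd.DataFrame(
    {
        "duration_s": [12.0, 35.5, 8.2],
        "n_steps": [20, 60, 14],
        "walking_speed_mps": [1.1, 1.3, 0.9],
    },
    index=index,
)
print(wb_table)
```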

import pandas as pd
from mobgap import PACKAGE_ROOT
from mobgap.aggregation import MobilisedAggregator

DATA_PATH = (
    PACKAGE_ROOT.parent / "example_data/original_results/mobilised_aggregator"
)

data = pd.read_csv(
    DATA_PATH / "aggregation_test_input.csv", index_col=0
).set_index(["visit_type", "participant_id", "measurement_date", "wb_id"])
data.head()

visit_type  participant_id  measurement_date  wb_id  duration_s  n_steps  cadence_spm  walking_speed_mps  stride_length_m  stride_duration_s  n_turns
T1          12345           2023-01-01        0      4.74702     8        99.82188     1.16079            2.51885          1.58675            0
T1          12345           2023-01-01        1      5.13150     7        101.16429    2.57881            1.57243          1.46537            0
T1          12345           2023-01-01        2      8.52727     12       86.53527     1.60044            1.66305          2.56092            2
T1          12345           2023-01-01        3      16.24554    27       91.49977     0.95558            0.88961          3.14549            1
T1          12345           2023-01-01        4      6.09907     8        93.69895     2.33230            1.95969          2.35295            0

In addition, the aggregator accepts an optional data mask: a boolean pandas.DataFrame with the same dimensions as the input data. The data mask indicates which DMOs of the input data should be used for the aggregation (marked as True) and which should be ignored (marked as False).

For this example, we create this mask by applying the “standard” Mobilise-D thresholds to the data. To learn more, see the threshold_check example.
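Conceptually, a threshold check compares each DMO value against a lower and an upper bound and records the result in a boolean mask of the same shape. The sketch below illustrates this idea with plain pandas. The bounds are hypothetical and are not the Mobilise-D thresholds, and replacing ignored values with NaN via DataFrame.where is only one possible way to honour such a mask, not necessarily what MobilisedAggregator does internally.

```python
import pandas as pd

# Made-up walking-bout values (one row per bout).
dmos = pd.DataFrame(
    {
        "walking_speed_mps": [1.2, 3.5, 0.8],
        "cadence_spm": [95.0, 100.0, 250.0],
    }
)
# Hypothetical (lower, upper) bounds per DMO column; NOT the Mobilise-D values.
bounds = {
    "walking_speed_mps": (0.1, 3.0),
    "cadence_spm": (30.0, 200.0),
}

# Boolean mask with the same shape as `dmos`: True = use, False = ignore.
mask = pd.DataFrame(
    {col: dmos[col].between(lo, hi) for col, (lo, hi) in bounds.items()}
)

# Replace ignored values with NaN; pandas reductions such as mean() skip NaN.
masked = dmos.where(mask)
print(mask)
print(masked.mean())
```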

Note

Calling the apply_thresholds function once on the whole dataset only works here because all the example data comes from the same participant. As some thresholds are cohort- or height-specific, you would otherwise have to apply the thresholds to each participant's data separately.

from mobgap.aggregation import apply_thresholds, get_mobilised_dmo_thresholds

thresholds = get_mobilised_dmo_thresholds()
# Note: The height is "artificially" set to 1.75m, as the example data does not contain this information.
data_mask = apply_thresholds(
    data,
    thresholds,
    cohort="HA",
    height_m=1.75,
    measurement_condition="free_living",
)
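With data from several participants, the thresholds have to be applied per participant, as the note above explains. The sketch below shows one way to structure this with a pandas groupby. hypothetical_threshold_check is a made-up stand-in with an arbitrary rule (stride length must not exceed body height); in practice you would call apply_thresholds with each participant's cohort and height_m instead. Heights and stride lengths are invented.

```python
import pandas as pd

# Invented body heights per participant (metres).
heights_m = {1: 1.60, 2: 1.85}

# Invented stride lengths for two participants, indexed by participant_id.
bouts = pd.DataFrame(
    {
        "participant_id": [1, 1, 2, 2],
        "stride_length_m": [1.10, 1.70, 1.50, 2.20],
    }
).set_index("participant_id")


def hypothetical_threshold_check(df: pd.DataFrame, height_m: float) -> pd.DataFrame:
    """Made-up stand-in for a height-dependent threshold check (NOT Mobilise-D).

    Marks a value as valid (True) if it does not exceed the participant's height.
    """
    return df <= height_m


# Run the check once per participant with that participant's height,
# then combine the per-participant masks into one mask for all bouts.
mask = pd.concat(
    [
        hypothetical_threshold_check(group, heights_m[pid])
        for pid, group in bouts.groupby(level="participant_id")
    ]
)
print(mask)
```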

Performing the aggregation

The MobilisedAggregator is now used to aggregate the input data over several walking bouts, e.g., over all walking bouts of one participant, or over all walking bouts per participant and day, week, or other criteria. The grouping is defined by additional columns of the input data (here, index levels) that are not themselves aggregated. In this example, the data is grouped by visit type (visit_type), participant (participant_id), and day (measurement_date), as the groupby parameter in the output below shows.
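The grouping step itself corresponds to a pandas groupby over the index levels that identify a visit, participant, and day. The following sketch uses made-up data and hypothetical output column names to compute three simple per-day summaries. It ignores the duration filters and the data mask that MobilisedAggregator also applies, so it illustrates only the grouping idea.

```python
import pandas as pd

# Made-up walking bouts of one participant, recorded on two days.
index = pd.MultiIndex.from_tuples(
    [
        ("T1", 1, "2024-01-01", 0),
        ("T1", 1, "2024-01-01", 1),
        ("T1", 1, "2024-01-02", 0),
    ],
    names=["visit_type", "participant_id", "measurement_date", "wb_id"],
)
bouts = pd.DataFrame(
    {"duration_s": [12.0, 40.0, 20.0], "n_steps": [20, 70, 35]}, index=index
)

# Group by every index level except the bout id, so each group holds all
# walking bouts of one visit, participant and day.
groups = bouts.groupby(level=["visit_type", "participant_id", "measurement_date"])
daily = groups.agg(
    wb_count=("duration_s", "count"),
    duration_avg=("duration_s", "mean"),
    steps_sum=("n_steps", "sum"),
)
print(daily)
```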

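agg = MobilisedAggregator(
    # Start from the predefined parameter set for CVS DMO data and override
    # use_original_names to get the mobgap column names (e.g. wb_all__count)
    # instead of the original R-Script names (e.g. wb_all_sum).
    **dict(
        MobilisedAggregator.PredefinedParameters.cvs_dmo_data,
        use_original_names=False,
    )
)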
agg.aggregate(data, wb_dmos_mask=data_mask)

MobilisedAggregator(groupby=['visit_type', 'participant_id', 'measurement_date'], unique_wb_id_column='wb_id', use_original_names=False)
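The dict(..., use_original_names=False) call copies a predefined parameter set and overrides a single entry before the result is unpacked into the constructor with **. Because dict(mapping, **kwargs) builds a new dictionary, the predefined set itself is not modified. The sketch below demonstrates this with a plain dictionary standing in for MobilisedAggregator.PredefinedParameters.cvs_dmo_data; its entries are taken from the repr printed above, and the real parameter set may contain other or additional keys.

```python
# Hypothetical stand-in for the predefined parameter set (entries copied from
# the repr above; the real set may differ).
predefined = {
    "groupby": ["visit_type", "participant_id", "measurement_date"],
    "unique_wb_id_column": "wb_id",
}

# New dict: all predefined entries plus (or overridden by) the keyword argument.
params = dict(predefined, use_original_names=False)
print(params)

# The original mapping is left untouched.
print(predefined)
```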

The resulting pandas.DataFrame of aggregated data has one row per group. In this case, there is only one participant and day, so it contains a single row.

agg_data = agg.aggregated_data_
agg_data

Aggregated values for the single group (visit_type=T1, participant_id=12345, measurement_date=2023-01-01), shown transposed:

column                            value
wb_all__count                     2378
total_walking_duration_h          10.534135
wb_all__n_steps__sum              59320
wb_all__n_turns__sum              3012
wb_all__duration_s__avg           8.85868
wb_all__duration_s__max           26.926738
wb_all__duration_s__var           2.27456
wb_all__cadence_spm__avg          94.673461
wb_all__stride_duration_s__avg    2.212942
wb_all__cadence_spm__var          0.127248
wb_all__stride_duration_s__var    0.261023
wb_10_30__count                   844
wb_10_30__walking_speed_mps__avg  1.496715
wb_10_30__stride_length_m__avg    1.86458
wb_10__count                      1029
wb_10__walking_speed_mps__max     2.096355
wb_30__count                      185
wb_30__walking_speed_mps__avg     1.619353
wb_30__stride_length_m__avg       1.975227
wb_30__cadence_spm__avg           102.806937
wb_30__stride_duration_s__avg     2.100595
wb_30__walking_speed_mps__max     2.12756
wb_30__cadence_spm__max           115.174986
wb_30__walking_speed_mps__var     0.240867
wb_30__stride_length_m__var       0.252349
wb_60__count                      62
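The column prefixes (wb_all, wb_10, wb_10_30, wb_30, wb_60) indicate that the walking bouts are split by duration before the DMOs are summarised. This is consistent with the output above, where wb_10__count (1029) equals wb_10_30__count (844) plus wb_30__count (185). The sketch below reproduces the idea with made-up durations; the exact cut-offs, and whether each boundary is inclusive, are assumptions inferred from the column names, not taken from the mobgap implementation.

```python
import pandas as pd

# Made-up walking-bout durations in seconds.
bouts = pd.DataFrame({"duration_s": [5.0, 12.0, 25.0, 45.0, 75.0]})
d = bouts["duration_s"]

# Assumed duration filters, inferred from the column prefixes.
# Inclusive/exclusive boundaries are an assumption.
filters = {
    "wb_all": d.notna(),
    "wb_10": d >= 10,
    "wb_10_30": (d >= 10) & (d < 30),
    "wb_30": d >= 30,
    "wb_60": d >= 60,
}

# Number of bouts passing each filter (the equivalent of the *__count columns).
counts = {name: int(selected.sum()) for name, selected in filters.items()}
print(counts)
```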

Warning

To exactly match the expected output of the original Mobilise-D R-Script, the two stride length parameters would need to be converted to cm and all values rounded to 3 decimals. The Python implementation does not do this, to keep units consistent across the entire package.
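If you need R-Script-style numbers, a small post-processing step is enough. In the sketch below, to_r_script_style is a hypothetical helper, not part of mobgap, the values are made up, and treating the two average stride length columns as "the two stride length parameters" is an assumption.

```python
import pandas as pd

# One aggregated row with made-up values, using mobgap-style column names.
agg_row = pd.DataFrame(
    {
        "wb_10_30__stride_length_m__avg": [1.23456],
        "wb_30__stride_length_m__avg": [1.34567],
        "wb_30__cadence_spm__avg": [101.23456],
    }
)


def to_r_script_style(df: pd.DataFrame, cm_columns: list[str]) -> pd.DataFrame:
    """Hypothetical helper: convert the given metre columns to cm, round to 3 decimals."""
    out = df.copy()  # leave the input untouched
    out[cm_columns] = out[cm_columns] * 100
    # Note: the column names keep their "_m" suffix although the values are now in cm.
    return out.round(3)


# Assumption: the "two stride length parameters" are the two average stride lengths.
r_style = to_r_script_style(
    agg_row, ["wb_10_30__stride_length_m__avg", "wb_30__stride_length_m__avg"]
)
print(r_style)
```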

Comparison with R aggregation script

The outputs of this aggregation algorithm are analogous to those of the original Mobilise-D R-Script, which uses the same duration filters and aggregation metrics. However, the results can differ slightly in the second or third decimal place, because the quantile functions in Python and R return slightly different values. Furthermore, the parameter “strlen_30_var” is converted to cm for consistency, while it is in m in the original R-Script.

Grouping the data by participant and day reproduces the Daily Aggregations of the original R-Script. To obtain the Weekly Aggregations, the daily results are averaged over all recording days of each participant and then rounded according to the aggregation metric (count-type metrics to whole numbers, all others to 3 decimals). In this example, the weekly results are identical to the daily ones, because the data covers only a single day.

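# Weekly aggregation: average the daily results of each participant.
weekly_agg = (
    agg.aggregated_data_.groupby("participant_id")
    .mean(numeric_only=True)
    .reset_index()
)
# Count-type columns (original R-Script names and mobgap names). These are
# rounded to whole numbers; all other columns are rounded to 3 decimals.
round_to_int_original_cols = [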
    "steps_all_sum",
    "turns_all_sum",
    "wb_all_sum",
    "wb_10_sum",
    "wb_30_sum",
    "wb_60_sum",
]
round_to_int_new_cols = [
    "wb_all__n_steps__sum",
    "wb_all__n_turns__sum",
    "wb_all__count",
    "wb_10__count",
    "wb_30__count",
    "wb_60__count",
]

# Pick the list that matches the naming scheme the aggregator was configured with.
round_to_int = (
    round_to_int_original_cols
    if agg.use_original_names
    else round_to_int_new_cols
)

round_to_three_decimals = weekly_agg.columns[
    ~weekly_agg.columns.isin(round_to_int)
]
weekly_agg[round_to_int] = weekly_agg[round_to_int].round()
weekly_agg[round_to_three_decimals] = weekly_agg[round_to_three_decimals].round(3)
weekly_agg

Weekly aggregation for participant_id=12345 (row 0), shown transposed:

column                            value
wb_all__count                     2378.0
total_walking_duration_h          10.534
wb_all__n_steps__sum              59320.0
wb_all__n_turns__sum              3012.0
wb_all__duration_s__avg           8.859
wb_all__duration_s__max           26.927
wb_all__duration_s__var           2.275
wb_all__cadence_spm__avg          94.673
wb_all__stride_duration_s__avg    2.213
wb_all__cadence_spm__var          0.127
wb_all__stride_duration_s__var    0.261
wb_10_30__count                   844.0
wb_10_30__walking_speed_mps__avg  1.497
wb_10_30__stride_length_m__avg    1.865
wb_10__count                      1029.0
wb_10__walking_speed_mps__max     2.096
wb_30__count                      185.0
wb_30__walking_speed_mps__avg     1.619
wb_30__stride_length_m__avg       1.975
wb_30__cadence_spm__avg           102.807
wb_30__stride_duration_s__avg     2.101
wb_30__walking_speed_mps__max     2.128
wb_30__cadence_spm__max           115.175
wb_30__walking_speed_mps__var     0.241
wb_30__stride_length_m__var       0.252
wb_60__count                      62.0
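Because the example contains only one day, the weekly values above are just the rounded daily values. The sketch below applies the same averaging and rounding to three made-up days of a single participant, so the effect of averaging becomes visible. Only two columns are included for brevity, and the values are invented.

```python
import pandas as pd

# Made-up daily aggregates of one participant over three recording days.
daily = pd.DataFrame(
    {
        "participant_id": [1, 1, 1],
        "measurement_date": ["2024-01-01", "2024-01-02", "2024-01-03"],
        "wb_all__count": [40, 52, 56],
        "wb_all__duration_s__avg": [8.1234, 9.4321, 7.0],
    }
)

# Average over all days per participant; the non-numeric date column is dropped.
weekly = daily.groupby("participant_id").mean(numeric_only=True).reset_index()

# Count-type columns are rounded to whole numbers, the rest to 3 decimals.
weekly["wb_all__count"] = weekly["wb_all__count"].round()
weekly["wb_all__duration_s__avg"] = weekly["wb_all__duration_s__avg"].round(3)
print(weekly)
```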
