.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/pipeline/_03_dmo_evaluation_on_wb_level.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_pipeline__03_dmo_evaluation_on_wb_level.py: .. _gsd_evaluation_parameter: Evaluation of final walking bout level DMOs ================================================ This example shows how to evaluate the performance of parameters on a walking bout (WB) level by comparing against a reference. On this level, we usually need to deal with the issue that the WB identified by the algorithm pipeline might not match the reference WBs. This makes comparing the parameters within them difficult. In general, two approaches can be taken here [1]_: 1. First aggregate the WB-level parameters of both systems to a common level (e.g. per trial, per day, per hour, ...) and then compare the aggregated values. 2. Identify the subset of WBs that match between the two systems and compare the parameters only within these WBs. In the following example we will show both approaches. But first some general setup. .. [1] Kirk, C., Küderle, A., Micó-Amigo, M.E. et al. Mobilise-D insights to estimate real-world walking speed in multiple conditions with a wearable device. Sci Rep 14, 1754 (2024). https://doi.org/10.1038/s41598-024-51766-5 .. GENERATED FROM PYTHON SOURCE LINES 26-34 Loading some example data ------------------------- We simply load some example DMO data and their reference that we provide with the package. Usually, the "detected" data would be the output of your algorithm pipeline and the "reference" data would be the ground truth. .. note :: This data is randomly generated and not physiologically meaningful. 
    However, it has the same structure as any other typical input data for this evaluation.

.. GENERATED FROM PYTHON SOURCE LINES 34-50

.. code-block:: default

    from pprint import pprint

    import numpy as np
    import pandas as pd

    from mobgap import PACKAGE_ROOT

    DATA_PATH = PACKAGE_ROOT.parent / "example_data/dmo_data/dummy_dmo_data"

    detected_dmo = pd.read_csv(DATA_PATH / "detected_dmo_data.csv").set_index(
        ["visit_type", "participant_id", "measurement_date", "wb_id"]
    )

    reference_dmo = pd.read_csv(DATA_PATH / "reference_dmo_data.csv").set_index(
        ["visit_type", "participant_id", "measurement_date", "wb_id"]
    )

.. GENERATED FROM PYTHON SOURCE LINES 51-55

In both dataframes, each row represents one WB with all of its parameters.
The index contains multiple levels, including the visit type, participant ID, measurement day, and WB ID.
The start and end index of each WB, in samples relative to the start of the respective recording, is contained in the
columns `start` and `end`.

.. GENERATED FROM PYTHON SOURCE LINES 55-57

.. code-block:: default

    detected_dmo

.. raw:: html
start end duration_s n_steps cadence_spm walking_speed_mps stride_length_m stride_duration_s n_turns
visit_type participant_id measurement_date wb_id
T1 12345 2023-01-01 0 0 5 5.130315 8 100.232432 2.061659 2.907343 2.384807 1
1 10 15 5.436672 7 101.677896 2.722036 2.469691 2.439419 1
2 20 25 9.140576 12 87.484329 2.255931 2.572195 3.472869 3
3 30 35 17.204985 28 92.096962 1.141349 1.595533 3.507587 1
4 40 45 6.217228 9 93.988941 2.548155 2.338223 3.313120 0
5 50 55 3.521295 8 99.425820 1.820821 2.882743 2.388743 0
6 60 65 14.034649 17 87.428880 2.783120 3.309131 2.627496 1
7 70 75 8.296356 12 86.372700 2.240491 2.721844 1.653604 1


.. GENERATED FROM PYTHON SOURCE LINES 58-60 .. code-block:: default reference_dmo .. raw:: html
start end duration_s n_steps cadence_spm walking_speed_mps stride_length_m stride_duration_s n_turns
visit_type participant_id measurement_date wb_id
T1 12345 2023-01-01 0 0 4 4.74702 8 99.82188 1.16079 2.51885 1.58675 0
1 15 19 5.13150 7 101.16429 2.57881 1.57243 1.46537 0
2 20 24 8.52727 12 86.53527 1.60044 1.66305 2.56092 2
3 35 39 16.24554 27 91.49977 0.95558 0.88961 3.14549 1
4 40 44 6.09907 8 93.69895 2.33230 1.95969 2.35295 0
5 55 59 3.25806 7 99.33525 1.14732 2.37307 1.81262 0
6 60 64 13.52080 17 87.09312 2.09538 2.41782 2.00582 1
7 75 79 7.49830 12 85.96436 2.23757 1.89026 1.55788 0
8 80 84 8.21455 10 75.12352 0.59915 2.16121 2.31160 0
9 95 99 6.84377 9 76.61402 2.22903 1.03362 3.17821 0


.. GENERATED FROM PYTHON SOURCE LINES 61-68

Approach 1: Aggregate then compare
----------------------------------
First, we combine the detected and reference data, which is straightforward because both dataframes have the same index levels.
To retain the information about the origin of the data, we add a column level assigning `"detected"` and `"reference"`
to the respective DMOs.
Furthermore, we rearrange the columns so that the DMO metrics form the first level of the column index.

.. GENERATED FROM PYTHON SOURCE LINES 68-77

.. code-block:: default

    combined_dmos = (
        pd.concat(
            [detected_dmo, reference_dmo], keys=["detected", "reference"], axis=1
        )
        .reorder_levels((1, 0), axis=1)
        .sort_index(axis=1)
    )
    combined_dmos

.. raw:: html
cadence_spm duration_s end n_steps n_turns start stride_duration_s stride_length_m walking_speed_mps
detected reference detected reference detected reference detected reference detected reference detected reference detected reference detected reference detected reference
visit_type participant_id measurement_date wb_id
T1 12345 2023-01-01 0 100.232432 99.82188 5.130315 4.74702 5.0 4 8.0 8 1.0 0 0.0 0 2.384807 1.58675 2.907343 2.51885 2.061659 1.16079
1 101.677896 101.16429 5.436672 5.13150 15.0 19 7.0 7 1.0 0 10.0 15 2.439419 1.46537 2.469691 1.57243 2.722036 2.57881
2 87.484329 86.53527 9.140576 8.52727 25.0 24 12.0 12 3.0 2 20.0 20 3.472869 2.56092 2.572195 1.66305 2.255931 1.60044
3 92.096962 91.49977 17.204985 16.24554 35.0 39 28.0 27 1.0 1 30.0 35 3.507587 3.14549 1.595533 0.88961 1.141349 0.95558
4 93.988941 93.69895 6.217228 6.09907 45.0 44 9.0 8 0.0 0 40.0 40 3.313120 2.35295 2.338223 1.95969 2.548155 2.33230
5 99.425820 99.33525 3.521295 3.25806 55.0 59 8.0 7 0.0 0 50.0 55 2.388743 1.81262 2.882743 2.37307 1.820821 1.14732
6 87.428880 87.09312 14.034649 13.52080 65.0 64 17.0 17 1.0 1 60.0 60 2.627496 2.00582 3.309131 2.41782 2.783120 2.09538
7 86.372700 85.96436 8.296356 7.49830 75.0 79 12.0 12 1.0 0 70.0 75 1.653604 1.55788 2.721844 1.89026 2.240491 2.23757
8 NaN 75.12352 NaN 8.21455 NaN 84 NaN 10 NaN 0 NaN 80 NaN 2.31160 NaN 2.16121 NaN 0.59915
9 NaN 76.61402 NaN 6.84377 NaN 99 NaN 9 NaN 0 NaN 95 NaN 3.17821 NaN 1.03362 NaN 2.22903


.. GENERATED FROM PYTHON SOURCE LINES 78-91

This provides us with a dataframe containing the detected and reference values for all detected and reference WBs.
Some entries are NaN, as the number of WBs in the detected and reference data might differ.
Individual rows of this dataframe should not be compared directly, because a detected and a reference WB that share
the same WB id do not necessarily belong to the same WB.
Therefore, we need to aggregate the DMO data based on an index level of choice, e.g., per day,
to retrieve meaningful and interpretable results.
This can, for instance, be done by grouping the data and averaging over the groups.
If required, other aggregation functions (e.g., moving averages or averaging over a span of several days)
can be applied instead of simple groupwise averaging.
As long as the same aggregation method is applied to both the detected and reference data,
further processing can be done in the same way as shown below.

.. note::
    In case of missing data, applying `dropna()` to the resulting dataframe might be helpful
    to remove all groups where either detected or reference data is missing.

.. GENERATED FROM PYTHON SOURCE LINES 91-100

.. code-block:: default

    daily_matches = (
        combined_dmos.groupby(
            level=["visit_type", "participant_id", "measurement_date"], axis=0
        )
        .mean()
        .dropna()
    )
    daily_matches.T

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/examples/pipeline/_03_dmo_evaluation_on_wb_level.py:92: FutureWarning: The 'axis' keyword in DataFrame.groupby is deprecated and will be removed in a future version.
      combined_dmos.groupby(

.. raw:: html
visit_type T1
participant_id 12345
measurement_date 2023-01-01
cadence_spm detected 93.588495
reference 89.685043
duration_s detected 8.622759
reference 8.008588
end detected 40.000000
reference 51.500000
n_steps detected 12.625000
reference 11.700000
n_turns detected 1.000000
reference 0.400000
start detected 35.000000
reference 47.500000
stride_duration_s detected 2.723456
reference 2.197761
stride_length_m detected 2.599588
reference 1.847961
walking_speed_mps detected 2.196695
reference 1.693637


.. GENERATED FROM PYTHON SOURCE LINES 101-105

The resulting dataframe contains the average detected and reference values for each DMO
per visit type, participant, and day.
This is conveniently provided as a multi-index column, so that selecting a single DMO
yields a DataFrame with the detected and reference values.

.. GENERATED FROM PYTHON SOURCE LINES 105-107

.. code-block:: default

    daily_matches["cadence_spm"]

.. raw:: html
detected reference
visit_type participant_id measurement_date
T1 12345 2023-01-01 93.588495 89.685043
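The aggregate-then-compare idea can be reproduced end to end on a tiny hand-made dataset. The following is a minimal sketch using only pandas; the toy values and the reduced index (``participant_id``, ``measurement_date``, ``wb_id``) are made up for illustration and are not part of the example data shipped with the package.

```python
import pandas as pd

index_names = ["participant_id", "measurement_date", "wb_id"]

# Toy data: three detected WBs and four reference WBs over two days.
detected = pd.DataFrame(
    {"cadence_spm": [100.0, 90.0, 80.0]},
    index=pd.MultiIndex.from_tuples(
        [("p1", "2023-01-01", 0), ("p1", "2023-01-01", 1), ("p1", "2023-01-02", 0)],
        names=index_names,
    ),
)
reference = pd.DataFrame(
    {"cadence_spm": [98.0, 88.0, 84.0, 76.0]},
    index=pd.MultiIndex.from_tuples(
        [
            ("p1", "2023-01-01", 0),
            ("p1", "2023-01-01", 1),
            ("p1", "2023-01-02", 0),
            ("p1", "2023-01-02", 1),
        ],
        names=index_names,
    ),
)

# Same combination pattern as above: metric first, origin second.
combined = (
    pd.concat([detected, reference], keys=["detected", "reference"], axis=1)
    .reorder_levels((1, 0), axis=1)
    .sort_index(axis=1)
)

# Aggregate per participant and day; NaNs from unmatched rows are skipped by mean().
daily = combined.groupby(level=["participant_id", "measurement_date"]).mean().dropna()

# Compare only the aggregated values, never the individual rows.
daily_error = daily[("cadence_spm", "detected")] - daily[("cadence_spm", "reference")]
print(daily_error)
```

Note that the second day has a different number of WBs in the two systems, yet the daily means remain comparable. This is the reason for aggregating before comparing.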


.. GENERATED FROM PYTHON SOURCE LINES 108-114

In our example data, we only have data from a single day, so the aggregated result only has one row.
Normally, you would have multiple rows, one for each group of WBs.
From here on, further processing to retrieve the aggregated error metrics is identical to the further processing
when following approach 2, and is shown below.
But let's first show how to calculate the error metrics on a WB-by-WB basis.

.. GENERATED FROM PYTHON SOURCE LINES 116-134

Approach 2: Match then compare
------------------------------
As the first step, we need to identify WBs that match between the detected and reference data.
As it is unlikely that the WBs are exactly the same, we need to define a threshold for the overlap between the WBs
to consider them as a match.
This matching can be done using the :func:`~mobgap.pipeline.evaluation.categorize_intervals` function.
It classifies every WB in the data as either a true positive (TP), a false positive (FP), or a false negative (FN).

If our data only contained WBs from a single recording, we could directly provide the detected and reference data
to the function.
However, in most cases the data contains WBs from multiple recordings, trials, participants, etc.
In our case, we actually only have WBs from a single recording, but we will still show the approach assuming that
the data is more complex.

To avoid matching WBs from different recordings (the matching is performed purely based on the start/end index),
we need to group the data by the relevant index levels first and apply the matching function to each group.
This can be done using the :func:`~mobgap.utils.df_operations.create_multi_groupby` helper function.

.. GENERATED FROM PYTHON SOURCE LINES 134-142

.. code-block:: default

    from mobgap.utils.df_operations import create_multi_groupby

    per_trial_participant_day_grouper = create_multi_groupby(
        detected_dmo,
        reference_dmo,
        groupby=["visit_type", "participant_id", "measurement_date"],
    )

.. GENERATED FROM PYTHON SOURCE LINES 143-152

This provides us with a groupby-object that is similar to the normal pandas groupby-object that can be created from a
single dataframe.
The ``MultiGroupBy`` object allows us to apply a function to each group across all dataframes.
Here we apply :func:`~mobgap.pipeline.evaluation.categorize_intervals` with a threshold of 0.8 to each group.
The `overlap_threshold` parameter defines the minimum overlap between a detected and a reference WB
for the pair to be considered a match.
It can be chosen according to your needs, whereby a value closer to 0.5 will yield more matches
than a value closer to 1.

.. GENERATED FROM PYTHON SOURCE LINES 152-165

.. code-block:: default

    from mobgap.pipeline.evaluation import categorize_intervals

    wb_tp_fp_fn = per_trial_participant_day_grouper.apply(
        lambda det, ref: categorize_intervals(
            gsd_list_detected=det,
            gsd_list_reference=ref,
            overlap_threshold=0.8,
            multiindex_warning=False,
        )
    )
    wb_tp_fp_fn

.. raw:: html
gs_id_detected gs_id_reference match_type
visit_type participant_id measurement_date match_id
T1 12345 2023-01-01 0 (T1, 12345, 2023-01-01, 0) (T1, 12345, 2023-01-01, 0) tp
1 (T1, 12345, 2023-01-01, 1) NaN fp
2 (T1, 12345, 2023-01-01, 2) (T1, 12345, 2023-01-01, 2) tp
3 (T1, 12345, 2023-01-01, 3) NaN fp
4 (T1, 12345, 2023-01-01, 4) (T1, 12345, 2023-01-01, 4) tp
5 (T1, 12345, 2023-01-01, 5) NaN fp
6 (T1, 12345, 2023-01-01, 6) (T1, 12345, 2023-01-01, 6) tp
7 (T1, 12345, 2023-01-01, 7) NaN fp
8 NaN (T1, 12345, 2023-01-01, 1) fn
9 NaN (T1, 12345, 2023-01-01, 3) fn
10 NaN (T1, 12345, 2023-01-01, 5) fn
11 NaN (T1, 12345, 2023-01-01, 7) fn
12 NaN (T1, 12345, 2023-01-01, 8) fn
13 NaN (T1, 12345, 2023-01-01, 9) fn
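To build some intuition for the table above, the matching logic can be sketched in plain Python. This is only an illustration and not the implementation of ``categorize_intervals``: it assumes that overlap is measured as intersection over union of the ``start``/``end`` intervals, and the helper names ``interval_iou`` and ``categorize`` are hypothetical. The intervals are the ``start``/``end`` values of the example data shown earlier.

```python
def interval_iou(a, b):
    """Intersection over union of two (start, end) intervals (assumed overlap measure)."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def categorize(detected, reference, overlap_threshold=0.8):
    """Return TP pairs, FP detected ids, and FN reference ids (hypothetical helper)."""
    tp = []
    matched_refs = set()
    for det_id, det in enumerate(detected):
        for ref_id, ref in enumerate(reference):
            # Each reference interval can be matched at most once.
            if ref_id in matched_refs:
                continue
            if interval_iou(det, ref) >= overlap_threshold:
                tp.append((det_id, ref_id))
                matched_refs.add(ref_id)
                break
    matched_dets = {det_id for det_id, _ in tp}
    fp = [i for i in range(len(detected)) if i not in matched_dets]
    fn = [i for i in range(len(reference)) if i not in matched_refs]
    return tp, fp, fn


# start/end values taken from the detected and reference tables above.
detected = [(10 * i, 10 * i + 5) for i in range(8)]
reference = [
    (0, 4), (15, 19), (20, 24), (35, 39), (40, 44),
    (55, 59), (60, 64), (75, 79), (80, 84), (95, 99),
]

tp, fp, fn = categorize(detected, reference, overlap_threshold=0.8)
print("TP pairs:", tp)
print("FP detected ids:", fp)
print("FN reference ids:", fn)
```

Under this assumption, a detected WB spanning samples 0 to 5 and a reference WB spanning 0 to 4 overlap by 4 / 5 = 0.8 and just reach the threshold, while WBs that merely touch (e.g. 10 to 15 and 15 to 19) have zero overlap.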


.. GENERATED FROM PYTHON SOURCE LINES 166-176 We can see that the function returns a dataframe with the same index as the input dataframes and each WB is classified as TP, FP, or FN. For the TP WBs, the corresponding reference WB is assigned. For the comparison we want to perform here, only the matching WBs, i.e., the TPs, are of interest. If you are interested in the FPs or FNs, have a look at the general :ref:`GSD evaluation example `. Based on the positive matches, we can now extract the DMO data from detected and reference data that is to be compared. To make extracting all the TP WBs a little easier, we can use the :func:`~mobgap.pipeline.evaluation.get_matching_intervals` function. .. GENERATED FROM PYTHON SOURCE LINES 176-185 .. code-block:: default from mobgap.pipeline.evaluation import get_matching_intervals wb_matches = get_matching_intervals( metrics_detected=detected_dmo, metrics_reference=reference_dmo, matches=wb_tp_fp_fn, ) wb_matches.T .. raw:: html
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm detected 100.232432 87.484329 93.988941 87.428880
reference 99.821880 86.535270 93.698950 87.093120
duration_s detected 5.130315 9.140576 6.217228 14.034649
reference 4.747020 8.527270 6.099070 13.520800
end detected 5.000000 25.000000 45.000000 65.000000
reference 4.000000 24.000000 44.000000 64.000000
n_steps detected 8.000000 12.000000 9.000000 17.000000
reference 8.000000 12.000000 8.000000 17.000000
n_turns detected 1.000000 3.000000 0.000000 1.000000
reference 0.000000 2.000000 0.000000 1.000000
start detected 0.000000 20.000000 40.000000 60.000000
reference 0.000000 20.000000 40.000000 60.000000
stride_duration_s detected 2.384807 3.472869 3.313120 2.627496
reference 1.586750 2.560920 2.352950 2.005820
stride_length_m detected 2.907343 2.572195 2.338223 3.309131
reference 2.518850 1.663050 1.959690 2.417820
walking_speed_mps detected 2.061659 2.255931 2.548155 2.783120
reference 1.160790 1.600440 2.332300 2.095380
wb_id detected 0.000000 2.000000 4.000000 6.000000
reference 0.000000 2.000000 4.000000 6.000000


.. GENERATED FROM PYTHON SOURCE LINES 186-189

The returned dataframe contains the detected and reference values for all DMOs of the matched WBs.
This is conveniently provided as a multi-index column, so that selecting a single DMO
yields a DataFrame with the detected and reference values.

.. GENERATED FROM PYTHON SOURCE LINES 189-191

.. code-block:: default

    wb_matches["cadence_spm"]

.. raw:: html
detected reference
visit_type participant_id measurement_date match_id
T1 12345 2023-01-01 0 100.232432 99.82188
2 87.484329 86.53527
4 93.988941 93.69895
6 87.428880 87.09312


.. GENERATED FROM PYTHON SOURCE LINES 192-220

From here on, the aggregated DMOs (when following approach 1) or matched WBs (when following approach 2)
can be compared with the same methods to calculate error metrics.
For the sake of simplicity, we will show the calculation of error metrics for the matched WBs `wb_matches`
(approach 2) here.
However, the input can simply be replaced by the aggregated DMO dataframe from approach 1.

Estimate Errors in DMO data
---------------------------
The DMO data can now be compared day by day (approach 1) or WB by WB (approach 2).
We want to calculate general error metrics like the error, absolute error, relative error, and absolute relative error
for each day (WB) and DMO.
This can be done using the generic :func:`~mobgap.utils.df_operations.apply_transformations` helper, which allows us
to apply any list of transformation functions (a transformation function takes a DataFrame and returns a Series of the
same length).
It further allows us to declaratively define which transformation/error should be applied to which columns
(i.e. which DMOs).

As input, it receives the matching DMO data and a list of transformations that should be applied to the data.
A transformation is a function that takes some subset of the input dataframe,
performs some operation on it, and returns a series with the same length as the input.
Calculating the differences between two sets of values, e.g., between detected and reference values,
is a common type of transformation that is applied to evaluate the performance of the DMO estimation.
For this purpose, the transformations are defined as a list of tuples containing the DMO of interest as the first element
and the error functions applied to the detected and reference values as the second element.
This way, you can also define custom error functions and pass them as transformations.
Note that the columns of the detected and reference values are expected to be named `detected` and `reference`
by default.
For the standard error metrics (error, absolute error, relative error, absolute relative error),
:func:`~mobgap.pipeline.evaluation.get_default_error_transformations` returns the correct transformations.

A simple custom definition of error metrics would look like this:

.. GENERATED FROM PYTHON SOURCE LINES 220-228

.. code-block:: default

    from mobgap.pipeline.evaluation import ErrorTransformFuncs as E

    custom_errors = [
        ("cadence_spm", [E.abs_error, E.rel_error]),
        ("duration_s", [E.error]),
        ("n_turns", [E.rel_error]),
    ]

.. GENERATED FROM PYTHON SOURCE LINES 229-235

This definition should be relatively self-explanatory.
We can now apply these transformations to the DMO data using
:func:`~mobgap.utils.df_operations.apply_transformations`.
Note that there is no need to group the dataframe again,
as all the transformations are applied row-wise to the entire dataframe.

.. GENERATED FROM PYTHON SOURCE LINES 235-241

.. code-block:: default

    from mobgap.utils.df_operations import apply_transformations

    custom_wb_errors = apply_transformations(wb_matches, custom_errors)
    custom_wb_errors.T

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:84: UserWarning: Zero division occurred in rel_error because divisor contains zeroes. Affected error metrics are set to NaN.
      _handle_zero_division(ref, zero_division_hint, "rel_error")

.. raw:: html
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm abs_error 0.410552 0.949059 0.289991 0.335760
rel_error 0.004113 0.010967 0.003095 0.003855
duration_s error 0.383295 0.613306 0.118158 0.513849
n_turns rel_error NaN 0.500000 NaN 0.000000
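The numbers in the table above can be checked by hand. The sketch below re-implements the four standard error metrics in plain Python for a single pair of values. It assumes that the relative error is the error divided by the reference value, which is consistent with the table but is not taken from the library source; the function names simply mirror the metric names.

```python
import math


def error(detected, reference):
    return detected - reference


def abs_error(detected, reference):
    return abs(error(detected, reference))


def rel_error(detected, reference):
    # Assumption: the error is normalised by the reference value.
    # A zero reference yields NaN instead of raising, mirroring the warning above.
    if reference == 0:
        return math.nan
    return error(detected, reference) / reference


def abs_rel_error(detected, reference):
    return abs(rel_error(detected, reference))


# (detected, reference) pairs copied from the matched WBs shown earlier.
cadence_pairs = [(100.232432, 99.82188), (87.484329, 86.53527)]
n_turns_pairs = [(1, 0), (3, 2), (0, 0), (1, 1)]

for det, ref in cadence_pairs:
    print(round(abs_error(det, ref), 6), round(rel_error(det, ref), 6))

print([rel_error(det, ref) for det, ref in n_turns_pairs])
```

The two NaN entries for ``n_turns`` arise exactly where the reference value is zero, which explains the zero-division warning in the output above.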


.. GENERATED FROM PYTHON SOURCE LINES 242-249

We can also modify the error metrics or provide custom error functions.
We will show three options here.

1. Use a standard error metric, but change some input parameters and store the output under a new name.
   For this case, we just define a new function wrapping the old one.
   For example, we might want to suppress the warning that is raised when a zero division occurs in the relative error.
   As we saw above, this warning is raised for the `n_turns` parameter.

.. GENERATED FROM PYTHON SOURCE LINES 249-253

.. code-block:: default

    def rel_error_without_warning(x):
        return E.rel_error(x, zero_division_hint=np.nan)

.. GENERATED FROM PYTHON SOURCE LINES 254-258

2. When we want to keep the same name for the function, we could just overwrite the old function.
   But, to avoid accidentally breaking other code that uses the function, we can also use a lambda function
   and manually set the name of the function.
   As a result, we suppress the warning as above, but keep the function name for the aggregation.

.. GENERATED FROM PYTHON SOURCE LINES 258-262

.. code-block:: default

    rel_error_as_lambda = lambda x: E.rel_error(x, zero_division_hint=np.nan)
    rel_error_as_lambda.__name__ = "rel_error"

.. GENERATED FROM PYTHON SOURCE LINES 263-272

3. We can also define a completely new error function.
   The DataFrame we get as input here contains the columns `detected` and `reference` with the detected and
   reference values for the DMO of interest.
   For this example, we will create a nonsensical ``scaled_error`` function that scales the error by a factor of 2.

.. note:: If you want to introduce custom, more complex transformation functions, you can also define them as
          :class:`~mobgap.utils.df_operations.CustomOperation` as shown for aggregations in the "Aggregation" section.

.. GENERATED FROM PYTHON SOURCE LINES 272-276

.. code-block:: default

    def scaled_error(x):
        return 2 * (x["detected"] - x["reference"])

.. GENERATED FROM PYTHON SOURCE LINES 277-281

Our custom functions can now be used in the transformations list and freely combined with other error metrics.
Also, keep in mind that the definition is "just" Python, so we can use things like list comprehensions to generate the
list of transformations as shown below.

.. GENERATED FROM PYTHON SOURCE LINES 281-294

.. code-block:: default

    custom_errors = [
        ("cadence_spm", [E.error, scaled_error]),
        ("duration_s", [E.error]),
        ("n_turns", [rel_error_without_warning, rel_error_as_lambda]),
        *(
            (m, [E.abs_error, E.rel_error])
            for m in ["stride_duration_s", "stride_length_m"]
        ),
    ]

    custom_wb_errors = apply_transformations(wb_matches, custom_errors)
    custom_wb_errors.T

.. raw:: html
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm error 0.410552 0.949059 0.289991 0.335760
scaled_error 0.821103 1.898118 0.579983 0.671520
duration_s error 0.383295 0.613306 0.118158 0.513849
n_turns rel_error_without_warning NaN 0.500000 NaN 0.000000
rel_error NaN 0.500000 NaN 0.000000
stride_duration_s abs_error 0.798057 0.911949 0.960170 0.621676
rel_error 0.502951 0.356102 0.408071 0.309936
stride_length_m abs_error 0.388493 0.909145 0.378533 0.891311
rel_error 0.154234 0.546673 0.193160 0.368642
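The row labels in the table above (``rel_error_without_warning`` versus ``rel_error``) suggest that the output name of a transformation follows the name of the function. The sketch below illustrates this naming mechanism in plain Python. It is not the implementation of ``apply_transformations``; the helper ``apply_named`` and the dictionary-based data layout are hypothetical and only serve to show why overriding ``__name__`` on a lambda changes the resulting label.

```python
def error(x):
    return x["detected"] - x["reference"]


def scaled_error(x):
    return 2 * error(x)


# A lambda is named "<lambda>" by default; overriding __name__ gives it a stable label.
error_as_lambda = lambda x: error(x)
error_as_lambda.__name__ = "error"


def apply_named(rows, transformations):
    """Hypothetical helper: label each result by (metric, function name)."""
    result = {}
    for metric, funcs in transformations:
        for func in funcs:
            result[(metric, func.__name__)] = [func(row) for row in rows[metric]]
    return result


# n_steps values of two matched WBs from the example data.
rows = {
    "n_steps": [
        {"detected": 9, "reference": 8},
        {"detected": 17, "reference": 17},
    ]
}

out = apply_named(rows, [("n_steps", [scaled_error, error_as_lambda])])
for key, values in out.items():
    print(key, values)
```

Without the ``__name__`` override, the second entry would be labelled ``<lambda>``, and two lambdas for the same metric would overwrite each other in this sketch.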


.. GENERATED FROM PYTHON SOURCE LINES 295-301 As expected, the resulting dataframe contains the error metrics for the specified DMOs and could now be further processed, e.g., by aggregating the results. As an alternative to defining a custom error definition, we provide a "default" error definition that can be used to calculate the standard error metrics for the common DMOs. In most cases, this is a good starting point for the evaluation of the DMOs. .. GENERATED FROM PYTHON SOURCE LINES 301-307 .. code-block:: default from mobgap.pipeline.evaluation import get_default_error_transformations default_errors = get_default_error_transformations() pprint(default_errors) .. rst-class:: sphx-glr-script-out .. code-block:: none [('cadence_spm', [, , , ]), ('duration_s', [, , , ]), ('n_steps', [, , , ]), ('n_strides', [, , , ]), ('n_turns', [, , , ]), ('stride_duration_s', [, , , ]), ('stride_length_m', [, , , ]), ('walking_speed_mps', [, , , ])] .. GENERATED FROM PYTHON SOURCE LINES 308-312 While the visualization here is a little ugly, we can see that the default error transformation attempts to calculate the error, the relative error, the absolute error, and the absolute relative error for all the core DMOs. We can apply it as before. .. GENERATED FROM PYTHON SOURCE LINES 312-316 .. code-block:: default wb_errors = apply_transformations(wb_matches, default_errors) wb_errors.T .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:365: UserWarning: One of the transformations requires the following columns: 'n_strides'. They are not found in the DataFrame. warnings.warn(str(e), stacklevel=1) /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:84: UserWarning: Zero division occurred in rel_error because divisor contains zeroes. Affected error metrics are set to NaN. 
_handle_zero_division(ref, zero_division_hint, "rel_error") /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:146: UserWarning: Zero division occurred in abs_rel_error because divisor contains zeroes. Affected error metrics are set to NaN. _handle_zero_division(ref, zero_division_hint, "abs_rel_error") .. raw:: html
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm error 0.410552 0.949059 0.289991 0.335760
rel_error 0.004113 0.010967 0.003095 0.003855
abs_error 0.410552 0.949059 0.289991 0.335760
abs_rel_error 0.004113 0.010967 0.003095 0.003855
duration_s error 0.383295 0.613306 0.118158 0.513849
rel_error 0.080744 0.071923 0.019373 0.038004
abs_error 0.383295 0.613306 0.118158 0.513849
abs_rel_error 0.080744 0.071923 0.019373 0.038004
n_steps error 0.000000 0.000000 1.000000 0.000000
rel_error 0.000000 0.000000 0.125000 0.000000
abs_error 0.000000 0.000000 1.000000 0.000000
abs_rel_error 0.000000 0.000000 0.125000 0.000000
n_turns error 1.000000 1.000000 0.000000 0.000000
rel_error NaN 0.500000 NaN 0.000000
abs_error 1.000000 1.000000 0.000000 0.000000
abs_rel_error NaN 0.500000 NaN 0.000000
stride_duration_s error 0.798057 0.911949 0.960170 0.621676
rel_error 0.502951 0.356102 0.408071 0.309936
abs_error 0.798057 0.911949 0.960170 0.621676
abs_rel_error 0.502951 0.356102 0.408071 0.309936
stride_length_m error 0.388493 0.909145 0.378533 0.891311
rel_error 0.154234 0.546673 0.193160 0.368642
abs_error 0.388493 0.909145 0.378533 0.891311
abs_rel_error 0.154234 0.546673 0.193160 0.368642
walking_speed_mps error 0.900869 0.655491 0.215855 0.687740
rel_error 0.776083 0.409569 0.092550 0.328217
abs_error 0.900869 0.655491 0.215855 0.687740
abs_rel_error 0.776083 0.409569 0.092550 0.328217


.. GENERATED FROM PYTHON SOURCE LINES 317-319 Before we now aggregate the results, we can also combine the error metrics with the reference and detected values to have all the information in one dataframe. .. GENERATED FROM PYTHON SOURCE LINES 319-322 .. code-block:: default wb_matches_with_errors = pd.concat([wb_matches, wb_errors], axis=1) wb_matches_with_errors.T .. raw:: html
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm detected 100.232432 87.484329 93.988941 87.428880
reference 99.821880 86.535270 93.698950 87.093120
duration_s detected 5.130315 9.140576 6.217228 14.034649
reference 4.747020 8.527270 6.099070 13.520800
end detected 5.000000 25.000000 45.000000 65.000000
reference 4.000000 24.000000 44.000000 64.000000
n_steps detected 8.000000 12.000000 9.000000 17.000000
reference 8.000000 12.000000 8.000000 17.000000
n_turns detected 1.000000 3.000000 0.000000 1.000000
reference 0.000000 2.000000 0.000000 1.000000
start detected 0.000000 20.000000 40.000000 60.000000
reference 0.000000 20.000000 40.000000 60.000000
stride_duration_s detected 2.384807 3.472869 3.313120 2.627496
reference 1.586750 2.560920 2.352950 2.005820
stride_length_m detected 2.907343 2.572195 2.338223 3.309131
reference 2.518850 1.663050 1.959690 2.417820
walking_speed_mps detected 2.061659 2.255931 2.548155 2.783120
reference 1.160790 1.600440 2.332300 2.095380
wb_id detected 0.000000 2.000000 4.000000 6.000000
reference 0.000000 2.000000 4.000000 6.000000
cadence_spm error 0.410552 0.949059 0.289991 0.335760
rel_error 0.004113 0.010967 0.003095 0.003855
abs_error 0.410552 0.949059 0.289991 0.335760
abs_rel_error 0.004113 0.010967 0.003095 0.003855
duration_s error 0.383295 0.613306 0.118158 0.513849
rel_error 0.080744 0.071923 0.019373 0.038004
abs_error 0.383295 0.613306 0.118158 0.513849
abs_rel_error 0.080744 0.071923 0.019373 0.038004
n_steps error 0.000000 0.000000 1.000000 0.000000
rel_error 0.000000 0.000000 0.125000 0.000000
abs_error 0.000000 0.000000 1.000000 0.000000
abs_rel_error 0.000000 0.000000 0.125000 0.000000
n_turns error 1.000000 1.000000 0.000000 0.000000
rel_error NaN 0.500000 NaN 0.000000
abs_error 1.000000 1.000000 0.000000 0.000000
abs_rel_error NaN 0.500000 NaN 0.000000
stride_duration_s error 0.798057 0.911949 0.960170 0.621676
rel_error 0.502951 0.356102 0.408071 0.309936
abs_error 0.798057 0.911949 0.960170 0.621676
abs_rel_error 0.502951 0.356102 0.408071 0.309936
stride_length_m error 0.388493 0.909145 0.378533 0.891311
rel_error 0.154234 0.546673 0.193160 0.368642
abs_error 0.388493 0.909145 0.378533 0.891311
abs_rel_error 0.154234 0.546673 0.193160 0.368642
walking_speed_mps error 0.900869 0.655491 0.215855 0.687740
rel_error 0.776083 0.409569 0.092550 0.328217
abs_error 0.900869 0.655491 0.215855 0.687740
abs_rel_error 0.776083 0.409569 0.092550 0.328217
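Per-WB errors such as those in the table above are usually condensed into a few summary statistics. The following sketch computes the mean, the sample standard deviation, and limits of agreement for the ``walking_speed_mps`` errors using only the standard library. The limits of agreement are assumed here to be the mean plus or minus 1.96 standard deviations (the common Bland-Altman convention); the aggregation functions provided by the package may use a different definition, and the ``summarize`` helper is hypothetical.

```python
import statistics


def summarize(errors):
    """Mean, sample standard deviation, and assumed limits of agreement."""
    mean = statistics.mean(errors)
    std = statistics.stdev(errors)  # sample standard deviation (ddof=1), like pandas
    loa = (mean - 1.96 * std, mean + 1.96 * std)
    return {"mean": mean, "std": std, "loa": loa}


# walking_speed_mps errors of the four matched WBs from the table above.
walking_speed_error = [0.900869, 0.655491, 0.215855, 0.687740]

summary = summarize(walking_speed_error)
print(summary)
```

With only four matched WBs, the standard deviation and the limits of agreement are very uncertain, so such a summary only becomes meaningful on larger datasets.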


.. GENERATED FROM PYTHON SOURCE LINES 323-350

Aggregate Results
-----------------
Finally, the estimated DMO measures and their errors can be aggregated over all WBs (approach 2)
or all days (approach 1).
For this purpose, different aggregation functions can be applied to the error metrics, ranging from simple, built-in
aggregations like the mean or standard deviation to more complex functions like the limits of agreement or
the 5th and 95th percentiles.

This can be done using the :func:`~mobgap.utils.df_operations.apply_aggregations` function.
It operates similarly to the :func:`~mobgap.utils.df_operations.apply_transformations` function used above
by taking the error metrics dataframe and a list of aggregations as input.
In contrast to a transformation, an aggregation performed over a subset of dataframe columns is expected to return
a single value or a tuple of values stored in one cell of the resulting dataframe.
There are two ways to define aggregations:

1. As a tuple in the format ``(<identifier>, <aggregation>)``.
   In this case, the operation is performed based on exactly one column from the input df.
   Therefore, ``<identifier>`` can either be a string representing the name of the column to evaluate
   (for data with single-level columns),
   or a tuple of strings uniquely identifying the column to evaluate in case of multi-index columns.
   In our example, the identifier is a tuple ``(<metric>, <origin>)``,
   where ``<metric>`` is the metric column to evaluate and
   ``<origin>`` is the specific column from which data should be utilized
   (here, it would be either ``detected``, ``reference``, or one of the error columns).
   ``<aggregation>`` is the function or the list of functions to apply.
   The output dataframe will have a multilevel column with ``metric`` as the first level and
   ``origin`` as the second level.

   A valid aggregations list for all of our DMOs would consequently look like this:

.. GENERATED FROM PYTHON SOURCE LINES 350-366

.. code-block:: default

    metrics = [
        "cadence_spm",
        "duration_s",
        "n_steps",
        "n_turns",
        "stride_duration_s",
        "stride_length_m",
        "walking_speed_mps",
    ]

    aggregations_simple = [
        ((m, o), ["mean", "std"])
        for m in metrics
        for o in ["detected", "reference", "error"]
    ]
    pprint(aggregations_simple)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [(('cadence_spm', 'detected'), ['mean', 'std']),
     (('cadence_spm', 'reference'), ['mean', 'std']),
     (('cadence_spm', 'error'), ['mean', 'std']),
     (('duration_s', 'detected'), ['mean', 'std']),
     (('duration_s', 'reference'), ['mean', 'std']),
     (('duration_s', 'error'), ['mean', 'std']),
     (('n_steps', 'detected'), ['mean', 'std']),
     (('n_steps', 'reference'), ['mean', 'std']),
     (('n_steps', 'error'), ['mean', 'std']),
     (('n_turns', 'detected'), ['mean', 'std']),
     (('n_turns', 'reference'), ['mean', 'std']),
     (('n_turns', 'error'), ['mean', 'std']),
     (('stride_duration_s', 'detected'), ['mean', 'std']),
     (('stride_duration_s', 'reference'), ['mean', 'std']),
     (('stride_duration_s', 'error'), ['mean', 'std']),
     (('stride_length_m', 'detected'), ['mean', 'std']),
     (('stride_length_m', 'reference'), ['mean', 'std']),
     (('stride_length_m', 'error'), ['mean', 'std']),
     (('walking_speed_mps', 'detected'), ['mean', 'std']),
     (('walking_speed_mps', 'reference'), ['mean', 'std']),
     (('walking_speed_mps', 'error'), ['mean', 'std'])]

.. GENERATED FROM PYTHON SOURCE LINES 367-375

2. As a named tuple of type `CustomOperation` taking three values: `identifier`, `function`, and `column_name`.
   `identifier` is a valid loc identifier selecting one or more columns from the dataframe,
   `function` is the (custom) aggregation function or list of functions to apply,
   and `column_name` is the name of the resulting column in the output dataframe
   (single-level column if `column_name` is a string, multi-level column if `column_name` is a tuple).
   This allows for more complex aggregations that require multiple columns as input, for example, the intraclass correlation coefficient (ICC) for the DMOs (see below).
   A valid aggregation list for calculating the ICC of all DMOs would look like this:

.. GENERATED FROM PYTHON SOURCE LINES 376-385

.. code-block:: default

    from mobgap.pipeline.evaluation import CustomErrorAggregations as A
    from mobgap.pipeline.evaluation import get_default_error_aggregations
    from mobgap.utils.df_operations import CustomOperation

    aggregations_custom = [
        CustomOperation(identifier=m, function=A.icc, column_name=(m, "all"))
        for m in metrics
    ]
    pprint(aggregations_custom)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [CustomOperation(identifier='cadence_spm', function=, column_name=('cadence_spm', 'all')),
     CustomOperation(identifier='duration_s', function=, column_name=('duration_s', 'all')),
     CustomOperation(identifier='n_steps', function=, column_name=('n_steps', 'all')),
     CustomOperation(identifier='n_turns', function=, column_name=('n_turns', 'all')),
     CustomOperation(identifier='stride_duration_s', function=, column_name=('stride_duration_s', 'all')),
     CustomOperation(identifier='stride_length_m', function=, column_name=('stride_length_m', 'all')),
     CustomOperation(identifier='walking_speed_mps', function=, column_name=('walking_speed_mps', 'all'))]

.. GENERATED FROM PYTHON SOURCE LINES 386-391

In this case, the ICC function receives the entire "sub-dataframe" obtained by the selection ``wb_matches_with_errors.loc[:, m]``, shown below for ``stride_duration_s`` as an example, and can then perform any required calculations.
The selection can in principle be any valid loc selection, so you could even select values across multiple DMOs.

.. GENERATED FROM PYTHON SOURCE LINES 391-393

.. code-block:: default

    sub_df = wb_matches_with_errors.loc[:, "stride_duration_s"]

.. GENERATED FROM PYTHON SOURCE LINES 394-395

The ICC function just takes the ``detected`` and ``reference`` columns and calculates the ICC.

.. GENERATED FROM PYTHON SOURCE LINES 395-397

.. code-block:: default

    A.icc(sub_df)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (0.12564828430955782, array([-0.77, 0.9 ]))

.. GENERATED FROM PYTHON SOURCE LINES 398-403

Within one aggregation list, both types of aggregations can be combined as long as the resulting output dataframes can be concatenated, i.e. have the same number of column levels.
Then, the :func:`~mobgap.utils.df_operations.apply_aggregations` function can be called.
This returns a pandas Series with the aggregated values for each metric and origin.
For better readability, we sort the result and convert it to a dataframe.

.. GENERATED FROM PYTHON SOURCE LINES 403-415

.. code-block:: default

    from mobgap.utils.df_operations import apply_aggregations

    aggregations = aggregations_simple + aggregations_custom
    agg_results = (
        apply_aggregations(wb_matches_with_errors, aggregations)
        .rename_axis(index=["aggregation", "metric", "origin"])
        .reorder_levels(["metric", "origin", "aggregation"])
        .sort_index(level=0)
        .to_frame("values")
    )
    agg_results

.. raw:: html
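
    <table>
      <thead>
        <tr><th></th><th></th><th></th><th>values</th></tr>
        <tr><th>metric</th><th>origin</th><th>aggregation</th><th></th></tr>
      </thead>
      <tbody>
        <tr><th>cadence_spm</th><th>all</th><th>icc</th><td>(0.9958867337397626, [0.96, 1.0])</td></tr>
        <tr><th></th><th>detected</th><th>mean</th><td>92.283646</td></tr>
        <tr><th></th><th></th><th>std</th><td>6.128986</td></tr>
        <tr><th></th><th>error</th><th>mean</th><td>0.496341</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.305876</td></tr>
        <tr><th></th><th>reference</th><th>mean</th><td>91.787305</td></tr>
        <tr><th></th><th></th><th>std</th><td>6.267057</td></tr>
        <tr><th>duration_s</th><th>all</th><th>icc</th><td>(0.9935066257807105, [0.94, 1.0])</td></tr>
        <tr><th></th><th>detected</th><th>mean</th><td>8.630692</td></tr>
        <tr><th></th><th></th><th>std</th><td>3.980795</td></tr>
        <tr><th></th><th>error</th><th>mean</th><td>0.407152</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.214453</td></tr>
        <tr><th></th><th>reference</th><th>mean</th><td>8.22354</td></tr>
        <tr><th></th><th></th><th>std</th><td>3.86233</td></tr>
        <tr><th>n_steps</th><th>all</th><th>icc</th><td>(0.9927710843373494, [0.93, 1.0])</td></tr>
        <tr><th></th><th>detected</th><th>mean</th><td>11.5</td></tr>
        <tr><th></th><th></th><th>std</th><td>4.041452</td></tr>
        <tr><th></th><th>error</th><th>mean</th><td>0.25</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.5</td></tr>
        <tr><th></th><th>reference</th><th>mean</th><td>11.25</td></tr>
        <tr><th></th><th></th><th>std</th><td>4.272002</td></tr>
        <tr><th>n_turns</th><th>all</th><th>icc</th><td>(0.8064516129032259, [-0.03, 0.99])</td></tr>
        <tr><th></th><th>detected</th><th>mean</th><td>1.25</td></tr>
        <tr><th></th><th></th><th>std</th><td>1.258306</td></tr>
        <tr><th></th><th>error</th><th>mean</th><td>0.5</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.57735</td></tr>
        <tr><th></th><th>reference</th><th>mean</th><td>0.75</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.957427</td></tr>
        <tr><th>stride_duration_s</th><th>all</th><th>icc</th><td>(0.12564828430955782, [-0.77, 0.9])</td></tr>
        <tr><th></th><th>detected</th><th>mean</th><td>2.949573</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.525579</td></tr>
        <tr><th></th><th>error</th><th>mean</th><td>0.822963</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.150423</td></tr>
        <tr><th></th><th>reference</th><th>mean</th><td>2.12661</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.426573</td></tr>
        <tr><th>stride_length_m</th><th>all</th><th>icc</th><td>(0.1021600349369979, [-0.78, 0.9])</td></tr>
        <tr><th></th><th>detected</th><th>mean</th><td>2.781723</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.422111</td></tr>
        <tr><th></th><th>error</th><th>mean</th><td>0.64187</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.298442</td></tr>
        <tr><th></th><th>reference</th><th>mean</th><td>2.139852</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.400293</td></tr>
        <tr><th>walking_speed_mps</th><th>all</th><th>icc</th><td>(0.20380942118620765, [-0.74, 0.92])</td></tr>
        <tr><th></th><th>detected</th><th>mean</th><td>2.412216</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.317996</td></tr>
        <tr><th></th><th>error</th><th>mean</th><td>0.614989</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.2875</td></tr>
        <tr><th></th><th>reference</th><th>mean</th><td>1.797227</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.522486</td></tr>
      </tbody>
    </table>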
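
As a minimal sketch of how a custom aggregation of the second kind could look, the following function receives a sub-dataframe with ``detected`` and ``reference`` columns (like ``A.icc`` above) and returns a single value.
The function name ``rmse`` and the toy data are hypothetical and not part of mobgap; only pandas and NumPy are used.

.. code-block:: python

    import numpy as np
    import pandas as pd


    def rmse(sub_df: pd.DataFrame) -> float:
        """Root-mean-square error between detected and reference values.

        Hypothetical helper: assumes ``sub_df`` has the columns
        ``detected`` and ``reference``.
        """
        diff = sub_df["detected"] - sub_df["reference"]
        return float(np.sqrt((diff**2).mean()))


    # Toy stand-in for a selection like ``wb_matches_with_errors.loc[:, m]``.
    toy_sub_df = pd.DataFrame(
        {"detected": [1.0, 2.0, 4.0], "reference": [1.0, 3.0, 2.0]}
    )
    toy_rmse = rmse(toy_sub_df)

Assuming custom functions are called with the selected sub-dataframe as described above, such a function could be passed as ``function`` of a ``CustomOperation`` in place of ``A.icc``.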


.. GENERATED FROM PYTHON SOURCE LINES 416-418

If you simply want to apply a standard set of aggregations to the error metrics, you can use the :func:`~mobgap.pipeline.evaluation.get_default_error_aggregations` function, resulting in the following list:

.. GENERATED FROM PYTHON SOURCE LINES 418-422

.. code-block:: default

    aggregations_default = get_default_error_aggregations()
    pprint(aggregations_default)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [(('cadence_spm', 'detected'), ['mean', ]),
     (('cadence_spm', 'reference'), ['mean', ]),
     (('cadence_spm', 'abs_error'), ['mean', ]),
     (('cadence_spm', 'abs_rel_error'), ['mean', ]),
     (('duration_s', 'detected'), ['mean', ]),
     (('duration_s', 'reference'), ['mean', ]),
     (('duration_s', 'abs_error'), ['mean', ]),
     (('duration_s', 'abs_rel_error'), ['mean', ]),
     (('n_steps', 'detected'), ['mean', ]),
     (('n_steps', 'reference'), ['mean', ]),
     (('n_steps', 'abs_error'), ['mean', ]),
     (('n_steps', 'abs_rel_error'), ['mean', ]),
     (('n_strides', 'detected'), ['mean', ]),
     (('n_strides', 'reference'), ['mean', ]),
     (('n_strides', 'abs_error'), ['mean', ]),
     (('n_strides', 'abs_rel_error'), ['mean', ]),
     (('n_turns', 'detected'), ['mean', ]),
     (('n_turns', 'reference'), ['mean', ]),
     (('n_turns', 'abs_error'), ['mean', ]),
     (('n_turns', 'abs_rel_error'), ['mean', ]),
     (('stride_duration_s', 'detected'), ['mean', ]),
     (('stride_duration_s', 'reference'), ['mean', ]),
     (('stride_duration_s', 'abs_error'), ['mean', ]),
     (('stride_duration_s', 'abs_rel_error'), ['mean', ]),
     (('stride_length_m', 'detected'), ['mean', ]),
     (('stride_length_m', 'reference'), ['mean', ]),
     (('stride_length_m', 'abs_error'), ['mean', ]),
     (('stride_length_m', 'abs_rel_error'), ['mean', ]),
     (('walking_speed_mps', 'detected'), ['mean', ]),
     (('walking_speed_mps', 'reference'), ['mean', ]),
     (('walking_speed_mps', 'abs_error'), ['mean', ]),
     (('walking_speed_mps', 'abs_rel_error'), ['mean', ]),
     (('cadence_spm', 'error'), ['mean', ]),
     (('cadence_spm', 'rel_error'), ['mean', ]),
     (('duration_s', 'error'), ['mean', ]),
     (('duration_s', 'rel_error'), ['mean', ]),
     (('n_steps', 'error'), ['mean', ]),
     (('n_steps', 'rel_error'), ['mean', ]),
     (('n_strides', 'error'), ['mean', ]),
     (('n_strides', 'rel_error'), ['mean', ]),
     (('n_turns', 'error'), ['mean', ]),
     (('n_turns', 'rel_error'), ['mean', ]),
     (('stride_duration_s', 'error'), ['mean', ]),
     (('stride_duration_s', 'rel_error'), ['mean', ]),
     (('stride_length_m', 'error'), ['mean', ]),
     (('stride_length_m', 'rel_error'), ['mean', ]),
     (('walking_speed_mps', 'error'), ['mean', ]),
     (('walking_speed_mps', 'rel_error'), ['mean', ]),
     CustomOperation(identifier='cadence_spm', function=, column_name=('cadence_spm', 'all')),
     CustomOperation(identifier='duration_s', function=, column_name=('duration_s', 'all')),
     CustomOperation(identifier='n_steps', function=, column_name=('n_steps', 'all')),
     CustomOperation(identifier='n_strides', function=, column_name=('n_strides', 'all')),
     CustomOperation(identifier='n_turns', function=, column_name=('n_turns', 'all')),
     CustomOperation(identifier='stride_duration_s', function=, column_name=('stride_duration_s', 'all')),
     CustomOperation(identifier='stride_length_m', function=, column_name=('stride_length_m', 'all')),
     CustomOperation(identifier='walking_speed_mps', function=, column_name=('walking_speed_mps', 'all')),
     CustomOperation(identifier=None, function=, column_name=('all', 'all'))]

.. GENERATED FROM PYTHON SOURCE LINES 423-424

If you want to include further aggregations in addition to the default ones, you can append them to this list.

.. GENERATED FROM PYTHON SOURCE LINES 424-428

.. code-block:: default

    aggregations_default_extended = aggregations_default + [
        *(((m, o), ["std"]) for m in metrics for o in ["detected", "reference"])
    ]

.. GENERATED FROM PYTHON SOURCE LINES 429-431

This list of standard aggregations can then also be passed to the :func:`~mobgap.utils.df_operations.apply_aggregations` function.
The default list also refers to ``n_strides``, which is not among the columns of this example's dataframe, so the corresponding aggregations raise the warnings shown in the output.

.. GENERATED FROM PYTHON SOURCE LINES 431-440

.. code-block:: default

    default_agg_results = (
        apply_aggregations(wb_matches_with_errors, aggregations_default_extended)
        .rename_axis(index=["aggregation", "metric", "origin"])
        .reorder_levels(["metric", "origin", "aggregation"])
        .sort_index(level=0)
        .to_frame("values")
    )
    default_agg_results

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'detected')'. They are not found in the DataFrame.
      warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'reference')'. They are not found in the DataFrame.
      warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'abs_error')'. They are not found in the DataFrame.
      warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'abs_rel_error')'. They are not found in the DataFrame.
      warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'error')'. They are not found in the DataFrame.
      warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'rel_error')'. They are not found in the DataFrame.
      warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:566: UserWarning: One of the transformations requires the following columns: 'n_strides'. They are not found in the DataFrame.
      warnings.warn(str(e), UserWarning, stacklevel=1)

.. raw:: html
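
    <table>
      <thead>
        <tr><th></th><th></th><th></th><th>values</th></tr>
        <tr><th>metric</th><th>origin</th><th>aggregation</th><th></th></tr>
      </thead>
      <tbody>
        <tr><th>all</th><th>all</th><th>n_datapoints</th><td>4</td></tr>
        <tr><th>cadence_spm</th><th>abs_error</th><th>mean</th><td>0.496341</td></tr>
        <tr><th></th><th></th><th>quantiles</th><td>(0.29685672457991713, 0.8682829318918925)</td></tr>
        <tr><th></th><th>abs_rel_error</th><th>mean</th><td>0.005508</td></tr>
        <tr><th></th><th></th><th>quantiles</th><td>(0.003208965846203449, 0.00993913893132042)</td></tr>
        <tr><th>...</th><th>...</th><th>...</th><td>...</td></tr>
        <tr><th>walking_speed_mps</th><th>reference</th><th>mean</th><td>1.797227</td></tr>
        <tr><th></th><th></th><th>quantiles</th><td>(1.2267375, 2.2967619999999997)</td></tr>
        <tr><th></th><th></th><th>std</th><td>0.522486</td></tr>
        <tr><th></th><th>rel_error</th><th>loa</th><td>(-0.15414805045818636, 0.9573580216047631)</td></tr>
        <tr><th></th><th></th><th>mean</th><td>0.401605</td></tr>
      </tbody>
    </table>
    <p>106 rows × 1 columns</p>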
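
The ``quantiles`` and ``loa`` entries above are aggregations that store a tuple of values in a single cell.
A minimal sketch of a custom aggregation of this kind with different quantile levels, written with plain pandas, could look as follows.
The name ``quantiles_10_90`` and the chosen levels are hypothetical and not part of mobgap.

.. code-block:: python

    import pandas as pd


    def quantiles_10_90(values: pd.Series) -> tuple[float, float]:
        """Return the 10th and 90th percentile of a column as one tuple."""
        lower, upper = values.quantile([0.1, 0.9]).tolist()
        return float(lower), float(upper)


    # Toy stand-in for one error column, e.g. ``(metric, "error")``.
    toy_errors = pd.Series(range(11), dtype=float)
    toy_quantiles = quantiles_10_90(toy_errors)

Assuming callables are accepted wherever built-in names like ``"mean"`` are used in the tuple format, such a function could be listed as ``((m, "error"), [quantiles_10_90])``.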



.. GENERATED FROM PYTHON SOURCE LINES 441-445

.. note::
    If you want to modify the default arguments of the aggregation functions, e.g. to change the calculated quantiles, you can either define custom aggregation functions or adapt the default functions as shown for the transformation functions above.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 4.901 seconds)

**Estimated memory usage:** 9 MB

.. _sphx_glr_download_auto_examples_pipeline__03_dmo_evaluation_on_wb_level.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: _03_dmo_evaluation_on_wb_level.py <_03_dmo_evaluation_on_wb_level.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: _03_dmo_evaluation_on_wb_level.ipynb <_03_dmo_evaluation_on_wb_level.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_