.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/pipeline/_03_dmo_evaluation_on_wb_level.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_pipeline__03_dmo_evaluation_on_wb_level.py:

.. _gsd_evaluation_parameter:

Evaluation of final walking bout level DMOs
================================================

This example shows how to evaluate the performance of parameters on a walking bout (WB) level by comparing them against a reference.
On this level, we usually have to deal with the issue that the WBs identified by the algorithm pipeline might not match the reference WBs.
This makes comparing the parameters within them difficult.
In general, two approaches can be taken here [1]_:

1. First aggregate the WB-level parameters of both systems to a common level (e.g. per trial, per day, per hour, ...)
   and then compare the aggregated values.
2. Identify the subset of WBs that match between the two systems and compare the parameters only within these WBs.

In the following example we will show both approaches.
But first, some general setup.

.. [1] Kirk, C., Küderle, A., Micó-Amigo, M.E. et al. Mobilise-D insights to estimate real-world walking speed in
   multiple conditions with a wearable device. Sci Rep 14, 1754 (2024).
   https://doi.org/10.1038/s41598-024-51766-5

.. GENERATED FROM PYTHON SOURCE LINES 26-34

Loading some example data
-------------------------
We simply load some example DMO data and their reference that we provide with the package.
Usually, the "detected" data would be the output of your algorithm pipeline and the "reference" data would be the ground truth.

.. note::
    This data is randomly generated and not physiologically meaningful.
    However, it has the same structure as any other typical input data for this evaluation.

.. GENERATED FROM PYTHON SOURCE LINES 34-50

.. code-block:: default

    from pprint import pprint

    import numpy as np
    import pandas as pd
    from mobgap import PACKAGE_ROOT

    DATA_PATH = PACKAGE_ROOT.parent / "example_data/dmo_data/dummy_dmo_data"

    detected_dmo = pd.read_csv(DATA_PATH / "detected_dmo_data.csv").set_index(
        ["visit_type", "participant_id", "measurement_date", "wb_id"]
    )
    reference_dmo = pd.read_csv(DATA_PATH / "reference_dmo_data.csv").set_index(
        ["visit_type", "participant_id", "measurement_date", "wb_id"]
    )

.. GENERATED FROM PYTHON SOURCE LINES 51-55

In both dataframes, each row represents one WB with all of its parameters.
The index contains multiple levels: the visit type, the participant id, the measurement day, and the WB id.
The start and end index of each WB, in samples relative to the start of the respective recording,
are contained in the columns `start` and `end`.

.. GENERATED FROM PYTHON SOURCE LINES 55-57

.. code-block:: default

    detected_dmo

.. raw:: html

				start	end	duration_s	n_steps	cadence_spm	walking_speed_mps	stride_length_m	stride_duration_s	n_turns
visit_type	participant_id	measurement_date	wb_id
T1	12345	2023-01-01	0	0	5	5.130315	8	100.232432	2.061659	2.907343	2.384807	1
			1	10	15	5.436672	7	101.677896	2.722036	2.469691	2.439419	1
			2	20	25	9.140576	12	87.484329	2.255931	2.572195	3.472869	3
			3	30	35	17.204985	28	92.096962	1.141349	1.595533	3.507587	1
			4	40	45	6.217228	9	93.988941	2.548155	2.338223	3.313120	0
			5	50	55	3.521295	8	99.425820	1.820821	2.882743	2.388743	0
			6	60	65	14.034649	17	87.428880	2.783120	3.309131	2.627496	1
			7	70	75	8.296356	12	86.372700	2.240491	2.721844	1.653604	1

.. GENERATED FROM PYTHON SOURCE LINES 58-60 .. code-block:: default reference_dmo .. raw:: html

				start	end	duration_s	n_steps	cadence_spm	walking_speed_mps	stride_length_m	stride_duration_s	n_turns
visit_type	participant_id	measurement_date	wb_id
T1	12345	2023-01-01	0	0	4	4.74702	8	99.82188	1.16079	2.51885	1.58675	0
			1	15	19	5.13150	7	101.16429	2.57881	1.57243	1.46537	0
			2	20	24	8.52727	12	86.53527	1.60044	1.66305	2.56092	2
			3	35	39	16.24554	27	91.49977	0.95558	0.88961	3.14549	1
			4	40	44	6.09907	8	93.69895	2.33230	1.95969	2.35295	0
			5	55	59	3.25806	7	99.33525	1.14732	2.37307	1.81262	0
			6	60	64	13.52080	17	87.09312	2.09538	2.41782	2.00582	1
			7	75	79	7.49830	12	85.96436	2.23757	1.89026	1.55788	0
			8	80	84	8.21455	10	75.12352	0.59915	2.16121	2.31160	0
			9	95	99	6.84377	9	76.61402	2.22903	1.03362	3.17821	0

.. GENERATED FROM PYTHON SOURCE LINES 61-68

Approach 1: Aggregate then compare
----------------------------------
First, we combine the detected and reference data, which is straightforward because both dataframes have the same index levels.
To retain the information about the origin of the data, we add a column level assigning `"detected"` and `"reference"` to the respective DMOs.
Furthermore, we rearrange the columns to have the DMO metrics as the first level of the column index.

.. GENERATED FROM PYTHON SOURCE LINES 68-77

.. code-block:: default

    combined_dmos = (
        pd.concat(
            [detected_dmo, reference_dmo], keys=["detected", "reference"], axis=1
        )
        .reorder_levels((1, 0), axis=1)
        .sort_index(axis=1)
    )
    combined_dmos

.. raw:: html

				cadence_spm		duration_s		end		n_steps		n_turns		start		stride_duration_s		stride_length_m		walking_speed_mps
				detected	reference	detected	reference	detected	reference	detected	reference	detected	reference	detected	reference	detected	reference	detected	reference	detected	reference
visit_type	participant_id	measurement_date	wb_id
T1	12345	2023-01-01	0	100.232432	99.82188	5.130315	4.74702	5.0	4	8.0	8	1.0	0	0.0	0	2.384807	1.58675	2.907343	2.51885	2.061659	1.16079
			1	101.677896	101.16429	5.436672	5.13150	15.0	19	7.0	7	1.0	0	10.0	15	2.439419	1.46537	2.469691	1.57243	2.722036	2.57881
			2	87.484329	86.53527	9.140576	8.52727	25.0	24	12.0	12	3.0	2	20.0	20	3.472869	2.56092	2.572195	1.66305	2.255931	1.60044
			3	92.096962	91.49977	17.204985	16.24554	35.0	39	28.0	27	1.0	1	30.0	35	3.507587	3.14549	1.595533	0.88961	1.141349	0.95558
			4	93.988941	93.69895	6.217228	6.09907	45.0	44	9.0	8	0.0	0	40.0	40	3.313120	2.35295	2.338223	1.95969	2.548155	2.33230
			5	99.425820	99.33525	3.521295	3.25806	55.0	59	8.0	7	0.0	0	50.0	55	2.388743	1.81262	2.882743	2.37307	1.820821	1.14732
			6	87.428880	87.09312	14.034649	13.52080	65.0	64	17.0	17	1.0	1	60.0	60	2.627496	2.00582	3.309131	2.41782	2.783120	2.09538
			7	86.372700	85.96436	8.296356	7.49830	75.0	79	12.0	12	1.0	0	70.0	75	1.653604	1.55788	2.721844	1.89026	2.240491	2.23757
			8	NaN	75.12352	NaN	8.21455	NaN	84	NaN	10	NaN	0	NaN	80	NaN	2.31160	NaN	2.16121	NaN	0.59915
			9	NaN	76.61402	NaN	6.84377	NaN	99	NaN	9	NaN	0	NaN	95	NaN	3.17821	NaN	1.03362	NaN	2.22903
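To make the resulting column layout easier to see, here is a minimal sketch of the same ``concat`` / ``reorder_levels`` chain on made-up toy data. The index names and values are assumptions chosen for illustration only and are not part of the example dataset.

```python
import pandas as pd

# Toy data (made up): two detected WBs and three reference WBs.
index_names = ["participant_id", "wb_id"]
detected = pd.DataFrame(
    {"cadence_spm": [100.0, 90.0]},
    index=pd.MultiIndex.from_tuples([("p1", 0), ("p1", 1)], names=index_names),
)
reference = pd.DataFrame(
    {"cadence_spm": [99.0, 91.0, 80.0]},
    index=pd.MultiIndex.from_tuples(
        [("p1", 0), ("p1", 1), ("p1", 2)], names=index_names
    ),
)

combined = (
    pd.concat([detected, reference], keys=["detected", "reference"], axis=1)
    .reorder_levels((1, 0), axis=1)
    .sort_index(axis=1)
)

# The columns are now (metric, origin) pairs.
print(combined.columns.tolist())
# The WB that only exists in the reference has no detected value (NaN).
print(combined.loc[("p1", 2), ("cadence_spm", "detected")])
```

Rows are aligned purely by index label, which is why a WB id that exists in only one of the two inputs ends up with NaN on the other side.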

.. GENERATED FROM PYTHON SOURCE LINES 78-91

This provides us with a dataframe containing the detected and reference values for all detected and reference WBs.
Some entries are NaN, as the number of WBs in the detected and reference data might differ.
Individual rows in this dataframe should not be compared directly, as a detected and a reference WB with the same WB id
do not necessarily belong to the same actual WB.
Therefore, we need to aggregate the DMO data based on an index level of choice, e.g., per day,
to retrieve meaningful and interpretable results.
This can, for instance, be done by grouping the data and averaging over the groups.
If required, other aggregation functions (e.g., moving averages or averaging over a span of several days)
can be applied instead of a simple groupwise average.
As long as the same aggregation method is applied to both the detected and reference data,
further processing can be done in the same way as shown below.

.. note::
    In case of missing data, applying `dropna()` to the resulting dataframe might be helpful
    to remove all groups where either detected or reference data is missing.

.. GENERATED FROM PYTHON SOURCE LINES 91-100

.. code-block:: default

    # Grouping along the rows is the default, so the deprecated ``axis``
    # keyword of ``DataFrame.groupby`` is not needed here.
    daily_matches = (
        combined_dmos.groupby(
            level=["visit_type", "participant_id", "measurement_date"]
        )
        .mean()
        .dropna()
    )
    daily_matches.T

.. raw:: html

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
cadence_spm	detected	93.588495
cadence_spm	reference	89.685043
duration_s	detected	8.622759
duration_s	reference	8.008588
end	detected	40.000000
end	reference	51.500000
n_steps	detected	12.625000
n_steps	reference	11.700000
n_turns	detected	1.000000
n_turns	reference	0.400000
start	detected	35.000000
start	reference	47.500000
stride_duration_s	detected	2.723456
stride_duration_s	reference	2.197761
stride_length_m	detected	2.599588
stride_length_m	reference	1.847961
walking_speed_mps	detected	2.196695
walking_speed_mps	reference	1.693637

.. GENERATED FROM PYTHON SOURCE LINES 101-105

The resulting dataframe contains the average detected and reference values for each DMO per visit type, participant, and day.
This is conveniently provided as a multiindex column, so that selecting a single DMO yields a DataFrame
with the detected and reference values.

.. GENERATED FROM PYTHON SOURCE LINES 105-107

.. code-block:: default

    daily_matches["cadence_spm"]

.. raw:: html

			detected	reference
visit_type	participant_id	measurement_date
T1	12345	2023-01-01	93.588495	89.685043
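The following sketch illustrates, on made-up numbers, what the aggregated frame looks like with more than one day and with an aggregation other than the mean (here the median). All values and the reduced set of index levels are assumptions for illustration.

```python
import pandas as pd

# Made-up combined data for one participant on two days.
index = pd.MultiIndex.from_tuples(
    [
        ("p1", "2023-01-01", 0),
        ("p1", "2023-01-01", 1),
        ("p1", "2023-01-02", 0),
        ("p1", "2023-01-02", 1),
    ],
    names=["participant_id", "measurement_date", "wb_id"],
)
columns = pd.MultiIndex.from_product([["cadence_spm"], ["detected", "reference"]])
combined = pd.DataFrame(
    [[100.0, 98.0], [90.0, 92.0], [80.0, 82.0], [70.0, float("nan")]],
    index=index,
    columns=columns,
)

# One row per participant and day; NaN values are skipped by the median.
daily = (
    combined.groupby(level=["participant_id", "measurement_date"])
    .median()
    .dropna()
)
print(daily["cadence_spm"])
```

Each group contributes exactly one row, so the detected and reference columns of a row can be compared directly, even though the underlying WBs did not match one to one.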

.. GENERATED FROM PYTHON SOURCE LINES 108-114

In our example data, we only have data from a single day, so the aggregated result only has one row.
Normally, you would have multiple rows, one for each group of WBs.

From here on, further processing to retrieve the aggregated error metrics is identical to the processing in approach 2
and is shown below.
But let's first show how to calculate the error metrics on a WB-by-WB basis.

.. GENERATED FROM PYTHON SOURCE LINES 116-134

Approach 2: Match then compare
------------------------------
As the first step, we need to identify WBs that match between the detected and reference data.
As it is unlikely that the WBs are exactly the same, we need to define a threshold for the overlap between the WBs
to consider them a match.
This matching can be done using the :func:`~mobgap.pipeline.evaluation.categorize_intervals` function.
It classifies every WB in the data as either true positive (TP), false positive (FP), or false negative (FN).

If our data contained only WBs from a single recording, we could directly provide the detected and reference data
to the function.
However, in most cases the data contains WBs from multiple recordings, trials, participants, etc.
In our case, we actually only have WBs from a single recording, but we will still show the approach
assuming that the data is more complex.

To avoid matching WBs from different recordings (the matching is performed based on the start/end index only),
we need to group the data by the relevant index levels first and apply the matching function to each group.
This can be done using the :func:`~mobgap.utils.df_operations.create_multi_groupby` helper function.

.. GENERATED FROM PYTHON SOURCE LINES 134-142

.. code-block:: default

    from mobgap.utils.df_operations import create_multi_groupby

    per_trial_participant_day_grouper = create_multi_groupby(
        detected_dmo,
        reference_dmo,
        groupby=["visit_type", "participant_id", "measurement_date"],
    )

.. GENERATED FROM PYTHON SOURCE LINES 143-152

This provides us with a groupby-object that is similar to the normal pandas groupby-object
that can be created from a single dataframe.
The ``MultiGroupBy`` object allows us to apply a function to each group across all dataframes.
Here, we apply :func:`~mobgap.pipeline.evaluation.categorize_intervals` with a threshold of 0.8 to each group.
The `overlap_threshold` parameter defines the minimum overlap between the detected and reference WBs
required to consider them a match.
It can be chosen according to your needs, whereby a value closer to 0.5 will yield more matches
than a value closer to 1.

.. GENERATED FROM PYTHON SOURCE LINES 152-165

.. code-block:: default

    from mobgap.pipeline.evaluation import categorize_intervals

    wb_tp_fp_fn = per_trial_participant_day_grouper.apply(
        lambda det, ref: categorize_intervals(
            gsd_list_detected=det,
            gsd_list_reference=ref,
            overlap_threshold=0.8,
            multiindex_warning=False,
        )
    )
    wb_tp_fp_fn

.. raw:: html

				gs_id_detected	gs_id_reference	match_type
visit_type	participant_id	measurement_date	match_id
T1	12345	2023-01-01	0	(T1, 12345, 2023-01-01, 0)	(T1, 12345, 2023-01-01, 0)	tp
			1	(T1, 12345, 2023-01-01, 1)	NaN	fp
			2	(T1, 12345, 2023-01-01, 2)	(T1, 12345, 2023-01-01, 2)	tp
			3	(T1, 12345, 2023-01-01, 3)	NaN	fp
			4	(T1, 12345, 2023-01-01, 4)	(T1, 12345, 2023-01-01, 4)	tp
			5	(T1, 12345, 2023-01-01, 5)	NaN	fp
			6	(T1, 12345, 2023-01-01, 6)	(T1, 12345, 2023-01-01, 6)	tp
			7	(T1, 12345, 2023-01-01, 7)	NaN	fp
			8	NaN	(T1, 12345, 2023-01-01, 1)	fn
			9	NaN	(T1, 12345, 2023-01-01, 3)	fn
			10	NaN	(T1, 12345, 2023-01-01, 5)	fn
			11	NaN	(T1, 12345, 2023-01-01, 7)	fn
			12	NaN	(T1, 12345, 2023-01-01, 8)	fn
			13	NaN	(T1, 12345, 2023-01-01, 9)	fn
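To build some intuition for the overlap threshold, the sketch below applies one plausible overlap rule to the first three WBs of the example data. This is not the implementation of ``categorize_intervals``: the helper names are hypothetical, and the rule itself (the overlap must reach the threshold relative to both the detected and the reference interval, with ``>=`` as comparison) is an assumption that happens to reproduce the classification shown above for these rows.

```python
def relative_overlaps(det, ref):
    """Return the overlap relative to the detected and to the reference interval."""
    overlap = max(0, min(det[1], ref[1]) - max(det[0], ref[0]))
    return overlap / (det[1] - det[0]), overlap / (ref[1] - ref[0])


def is_match(det, ref, overlap_threshold=0.8):
    # Assumption: the threshold has to be reached in both directions.
    return all(o >= overlap_threshold for o in relative_overlaps(det, ref))


# (start, end) pairs copied from the first three WBs of the example data.
detected = [(0, 5), (10, 15), (20, 25)]
reference = [(0, 4), (15, 19), (20, 24)]

matches = [is_match(d, r) for d, r in zip(detected, reference)]
print(matches)  # [True, False, True]
```

Unlike this pairwise sketch, the real function has to search for the best-fitting reference WB for every detected WB, which is why the unmatched intervals show up as separate FP and FN rows in the table above.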

.. GENERATED FROM PYTHON SOURCE LINES 166-176

We can see that the function returns a dataframe that is indexed by the grouping levels and a new ``match_id``,
and in which each WB is classified as TP, FP, or FN.
For the TP WBs, the corresponding reference WB is assigned.
For the comparison we want to perform here, only the matching WBs, i.e., the TPs, are of interest.
If you are interested in the FPs or FNs, have a look at the general GSD evaluation example.

Based on the positive matches, we can now extract the DMO data that is to be compared from the detected and reference data.
To make extracting all the TP WBs a little easier,
we can use the :func:`~mobgap.pipeline.evaluation.get_matching_intervals` function.

.. GENERATED FROM PYTHON SOURCE LINES 176-185

.. code-block:: default

    from mobgap.pipeline.evaluation import get_matching_intervals

    wb_matches = get_matching_intervals(
        metrics_detected=detected_dmo,
        metrics_reference=reference_dmo,
        matches=wb_tp_fp_fn,
    )
    wb_matches.T

.. raw:: html

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	detected	100.232432	87.484329	93.988941	87.428880
cadence_spm	reference	99.821880	86.535270	93.698950	87.093120
duration_s	detected	5.130315	9.140576	6.217228	14.034649
duration_s	reference	4.747020	8.527270	6.099070	13.520800
end	detected	5.000000	25.000000	45.000000	65.000000
end	reference	4.000000	24.000000	44.000000	64.000000
n_steps	detected	8.000000	12.000000	9.000000	17.000000
n_steps	reference	8.000000	12.000000	8.000000	17.000000
n_turns	detected	1.000000	3.000000	0.000000	1.000000
n_turns	reference	0.000000	2.000000	0.000000	1.000000
start	detected	0.000000	20.000000	40.000000	60.000000
start	reference	0.000000	20.000000	40.000000	60.000000
stride_duration_s	detected	2.384807	3.472869	3.313120	2.627496
stride_duration_s	reference	1.586750	2.560920	2.352950	2.005820
stride_length_m	detected	2.907343	2.572195	2.338223	3.309131
stride_length_m	reference	2.518850	1.663050	1.959690	2.417820
walking_speed_mps	detected	2.061659	2.255931	2.548155	2.783120
walking_speed_mps	reference	1.160790	1.600440	2.332300	2.095380
wb_id	detected	0.000000	2.000000	4.000000	6.000000
wb_id	reference	0.000000	2.000000	4.000000	6.000000

.. GENERATED FROM PYTHON SOURCE LINES 186-189

The returned dataframe contains the detected and reference values for all DMOs of the matched WBs.
This is conveniently provided as a multiindex column, so that selecting a single DMO yields a DataFrame
with the detected and reference values.

.. GENERATED FROM PYTHON SOURCE LINES 189-191

.. code-block:: default

    wb_matches["cadence_spm"]

.. raw:: html

				detected	reference
visit_type	participant_id	measurement_date	match_id
T1	12345	2023-01-01	0	100.232432	99.82188
			2	87.484329	86.53527
			4	93.988941	93.69895
			6	87.428880	87.09312

.. GENERATED FROM PYTHON SOURCE LINES 192-220

From here on, the aggregated DMOs (when following approach 1) or the matched WBs (when following approach 2)
can be compared with the same methods to calculate error metrics.
For the sake of simplicity, we will show the calculation of error metrics for the matched WBs `wb_matches` (approach 2) here.
However, the input can simply be replaced by the aggregated DMO dataframe `daily_matches` from approach 1.

Estimate Errors in DMO data
---------------------------
The DMO data can now be compared day by day (approach 1) or WB by WB (approach 2).
We want to calculate general error metrics like the error, absolute error, relative error, and absolute relative error
for each day (WB) and DMO.
This can be done using the generic :func:`~mobgap.utils.df_operations.apply_transformations` helper,
which allows us to apply any list of transformation functions
(a transformation function takes a dataframe as input and returns a Series of the same length).
It further allows us to declaratively define which transformation/error should be applied to which columns (i.e. which DMOs).

As input, it receives the matching DMO data and a list of transformations that should be applied to the data.
A transformation is characterized as a function that takes some subset of the input dataframe,
performs some operation on it, and returns a series with the same length as the input.
Calculating the differences between two sets of values, e.g., between detected and reference values,
is a common type of transformation that is applied to evaluate the performance of the DMO estimation.
For this purpose, the transformations are defined as a list of tuples containing the DMO of interest as the first element
and the error functions applied to the detected and reference values as the second element.
This way, you can also define custom error functions and pass them as transformations.
Note that the columns of the detected and reference values are expected to be named `detected` and `reference` by default.
For the standard error metrics (error, absolute error, relative error, absolute relative error),
:func:`~mobgap.pipeline.evaluation.get_default_error_transformations` returns the correct transformations.

A simple definition of error metrics would look like this:

.. GENERATED FROM PYTHON SOURCE LINES 220-228

.. code-block:: default

    from mobgap.pipeline.evaluation import ErrorTransformFuncs as E

    custom_errors = [
        ("cadence_spm", [E.abs_error, E.rel_error]),
        ("duration_s", [E.error]),
        ("n_turns", [E.rel_error]),
    ]

.. GENERATED FROM PYTHON SOURCE LINES 229-235

This definition should be relatively self-explanatory.

We can now apply these transformations to the DMO data using
:func:`~mobgap.utils.df_operations.apply_transformations`.
Note that there is no need to group the dataframe again,
as all the transformations are applied row-wise to the entire dataframe.

.. GENERATED FROM PYTHON SOURCE LINES 235-241

.. code-block:: default

    from mobgap.utils.df_operations import apply_transformations

    custom_wb_errors = apply_transformations(wb_matches, custom_errors)
    custom_wb_errors.T

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:84: UserWarning: Zero division occurred in rel_error because divisor contains zeroes. Affected error metrics are set to NaN.
      _handle_zero_division(ref, zero_division_hint, "rel_error")

.. raw:: html

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	abs_error	0.410552	0.949059	0.289991	0.335760
cadence_spm	rel_error	0.004113	0.010967	0.003095	0.003855
duration_s	error	0.383295	0.613306	0.118158	0.513849
n_turns	rel_error	NaN	0.500000	NaN	0.000000
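The numbers in the table above can be reproduced with plain pandas. The sketch below assumes that the error is defined as detected minus reference and that the relative error divides by the reference value. Both assumptions are inferred from the values shown in this example; the helper functions are simplified stand-ins, not the ``ErrorTransformFuncs`` implementations.

```python
import pandas as pd


def error(df: pd.DataFrame) -> pd.Series:
    return df["detected"] - df["reference"]


def rel_error(df: pd.DataFrame) -> pd.Series:
    # Mask zeros in the divisor so that the affected rows become NaN.
    divisor = df["reference"].where(df["reference"] != 0)
    return error(df) / divisor


# Values of the four matched WBs, copied from the tables above.
cadence = pd.DataFrame(
    {
        "detected": [100.232432, 87.484329, 93.988941, 87.428880],
        "reference": [99.82188, 86.53527, 93.69895, 87.09312],
    }
)
n_turns = pd.DataFrame({"detected": [1, 3, 0, 1], "reference": [0, 2, 0, 1]})

cadence_abs_error = error(cadence).abs()
cadence_rel_error = rel_error(cadence)
n_turns_rel_error = rel_error(n_turns)

print(cadence_abs_error.round(6).tolist())
print(cadence_rel_error.round(6).tolist())
print(n_turns_rel_error.tolist())
```

The two NaN entries for ``n_turns`` correspond to the WBs whose reference value is zero, which is exactly the situation the ``UserWarning`` above reports.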

.. GENERATED FROM PYTHON SOURCE LINES 242-249

We can also modify the error metrics or provide custom error functions.
We will show three options here.

1. Use a standard error metric, but change some input parameters and have the output under a new name.
   For this case, we just define a new function wrapping the old one.
   For example, we might want to suppress the warning that is raised when a zero division occurs in the relative error.
   As we saw above, this warning is raised for the `n_turns` parameter.

.. GENERATED FROM PYTHON SOURCE LINES 249-253

.. code-block:: default

    def rel_error_without_warning(x):
        return E.rel_error(x, zero_division_hint=np.nan)

.. GENERATED FROM PYTHON SOURCE LINES 254-258

2. When we want to keep the same name for the function, we could just overwrite the old function.
   But, to avoid accidentally breaking other code that uses the function,
   we can also use a lambda function and manually set the name of the function.
   As a result, we suppress the warning as above, but keep the function name for the aggregation.

.. GENERATED FROM PYTHON SOURCE LINES 258-262

.. code-block:: default

    rel_error_as_lambda = lambda x: E.rel_error(x, zero_division_hint=np.nan)
    rel_error_as_lambda.__name__ = "rel_error"

.. GENERATED FROM PYTHON SOURCE LINES 263-272

3. We can also define a completely new error function.
   The dataframe we get as input here contains the columns `detected` and `reference`
   with the detected and reference values for the DMO of interest.
   For this example, we will create a nonsensical ``scaled_error`` function that scales the error by a factor of 2.

.. note::
    If you want to introduce custom, more complex transformation functions,
    you can also define them as :class:`~mobgap.utils.df_operations.CustomOperation`
    as shown for aggregations in the "Aggregation" section.

.. GENERATED FROM PYTHON SOURCE LINES 272-276

.. code-block:: default

    def scaled_error(x):
        return 2 * (x["detected"] - x["reference"])

.. GENERATED FROM PYTHON SOURCE LINES 277-281

Our custom functions can now be used in the transformations list and freely combined with other error metrics.
Also, keep in mind that the definition is "just" Python,
so we can use things like list comprehensions to generate the list of transformations as shown below.

.. GENERATED FROM PYTHON SOURCE LINES 281-294

.. code-block:: default

    custom_errors = [
        ("cadence_spm", [E.error, scaled_error]),
        ("duration_s", [E.error]),
        ("n_turns", [rel_error_without_warning, rel_error_as_lambda]),
        *(
            (m, [E.abs_error, E.rel_error])
            for m in ["stride_duration_s", "stride_length_m"]
        ),
    ]

    custom_wb_errors = apply_transformations(wb_matches, custom_errors)
    custom_wb_errors.T

.. raw:: html

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	error	0.410552	0.949059	0.289991	0.335760
cadence_spm	scaled_error	0.821103	1.898118	0.579983	0.671520
duration_s	error	0.383295	0.613306	0.118158	0.513849
n_turns	rel_error_without_warning	NaN	0.500000	NaN	0.000000
n_turns	rel_error	NaN	0.500000	NaN	0.000000
stride_duration_s	abs_error	0.798057	0.911949	0.960170	0.621676
stride_duration_s	rel_error	0.502951	0.356102	0.408071	0.309936
stride_length_m	abs_error	0.388493	0.909145	0.378533	0.891311
stride_length_m	rel_error	0.154234	0.546673	0.193160	0.368642
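As a further illustration of the contract a custom transformation has to fulfil, here is a sketch of a hypothetical ``percent_error`` function. The function is not part of mobgap; it only assumes the `detected` and `reference` column names described above and returns one value per input row. Presumably it could be placed in the transformations list in the same way as ``scaled_error``.

```python
import pandas as pd


def percent_error(x: pd.DataFrame) -> pd.Series:
    """Hypothetical custom metric: signed error in percent of the reference."""
    return 100 * (x["detected"] - x["reference"]) / x["reference"]


# Stride length of the first two matched WBs, copied from the tables above.
stride_length = pd.DataFrame(
    {"detected": [2.907343, 2.572195], "reference": [2.51885, 1.66305]}
)

result = percent_error(stride_length)

# One output value per input row, as expected from a transformation.
print(len(result) == len(stride_length))
# The function name is what shows up as the column label in the examples above.
print(percent_error.__name__)
print(result.round(2).tolist())
```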

.. GENERATED FROM PYTHON SOURCE LINES 295-301 As expected, the resulting dataframe contains the error metrics for the specified DMOs and could now be further processed, e.g., by aggregating the results. As an alternative to defining a custom error definition, we provide a "default" error definition that can be used to calculate the standard error metrics for the common DMOs. In most cases, this is a good starting point for the evaluation of the DMOs. .. GENERATED FROM PYTHON SOURCE LINES 301-307 .. code-block:: default from mobgap.pipeline.evaluation import get_default_error_transformations default_errors = get_default_error_transformations() pprint(default_errors) .. rst-class:: sphx-glr-script-out .. code-block:: none [('cadence_spm', [, , , ]), ('duration_s', [, , , ]), ('n_steps', [, , , ]), ('n_strides', [, , , ]), ('n_turns', [, , , ]), ('stride_duration_s', [, , , ]), ('stride_length_m', [, , , ]), ('walking_speed_mps', [, , , ])] .. GENERATED FROM PYTHON SOURCE LINES 308-312 While the visualization here is a little ugly, we can see that the default error transformation attempts to calculate the error, the relative error, the absolute error, and the absolute relative error for all the core DMOs. We can apply it as before. .. GENERATED FROM PYTHON SOURCE LINES 312-316 .. code-block:: default wb_errors = apply_transformations(wb_matches, default_errors) wb_errors.T .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:365: UserWarning: One of the transformations requires the following columns: 'n_strides'. They are not found in the DataFrame. warnings.warn(str(e), stacklevel=1) /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:84: UserWarning: Zero division occurred in rel_error because divisor contains zeroes. Affected error metrics are set to NaN. 
_handle_zero_division(ref, zero_division_hint, "rel_error") /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:146: UserWarning: Zero division occurred in abs_rel_error because divisor contains zeroes. Affected error metrics are set to NaN. _handle_zero_division(ref, zero_division_hint, "abs_rel_error") .. raw:: html

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	error	0.410552	0.949059	0.289991	0.335760
	rel_error	0.004113	0.010967	0.003095	0.003855
	abs_error	0.410552	0.949059	0.289991	0.335760
	abs_rel_error	0.004113	0.010967	0.003095	0.003855
duration_s	error	0.383295	0.613306	0.118158	0.513849
	rel_error	0.080744	0.071923	0.019373	0.038004
	abs_error	0.383295	0.613306	0.118158	0.513849
	abs_rel_error	0.080744	0.071923	0.019373	0.038004
n_steps	error	0.000000	0.000000	1.000000	0.000000
	rel_error	0.000000	0.000000	0.125000	0.000000
	abs_error	0.000000	0.000000	1.000000	0.000000
	abs_rel_error	0.000000	0.000000	0.125000	0.000000
n_turns	error	1.000000	1.000000	0.000000	0.000000
	rel_error	NaN	0.500000	NaN	0.000000
	abs_error	1.000000	1.000000	0.000000	0.000000
	abs_rel_error	NaN	0.500000	NaN	0.000000
stride_duration_s	error	0.798057	0.911949	0.960170	0.621676
	rel_error	0.502951	0.356102	0.408071	0.309936
	abs_error	0.798057	0.911949	0.960170	0.621676
	abs_rel_error	0.502951	0.356102	0.408071	0.309936
stride_length_m	error	0.388493	0.909145	0.378533	0.891311
	rel_error	0.154234	0.546673	0.193160	0.368642
	abs_error	0.388493	0.909145	0.378533	0.891311
	abs_rel_error	0.154234	0.546673	0.193160	0.368642
walking_speed_mps	error	0.900869	0.655491	0.215855	0.687740
	rel_error	0.776083	0.409569	0.092550	0.328217
	abs_error	0.900869	0.655491	0.215855	0.687740
	abs_rel_error	0.776083	0.409569	0.092550	0.328217

.. GENERATED FROM PYTHON SOURCE LINES 317-319 Before we now aggregate the results, we can also combine the error metrics with the reference and detected values to have all the information in one dataframe. .. GENERATED FROM PYTHON SOURCE LINES 319-322 .. code-block:: default wb_matches_with_errors = pd.concat([wb_matches, wb_errors], axis=1) wb_matches_with_errors.T .. raw:: html

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	detected	100.232432	87.484329	93.988941	87.428880
cadence_spm	reference	99.821880	86.535270	93.698950	87.093120
duration_s	detected	5.130315	9.140576	6.217228	14.034649
duration_s	reference	4.747020	8.527270	6.099070	13.520800
end	detected	5.000000	25.000000	45.000000	65.000000
end	reference	4.000000	24.000000	44.000000	64.000000
n_steps	detected	8.000000	12.000000	9.000000	17.000000
n_steps	reference	8.000000	12.000000	8.000000	17.000000
n_turns	detected	1.000000	3.000000	0.000000	1.000000
n_turns	reference	0.000000	2.000000	0.000000	1.000000
start	detected	0.000000	20.000000	40.000000	60.000000
start	reference	0.000000	20.000000	40.000000	60.000000
stride_duration_s	detected	2.384807	3.472869	3.313120	2.627496
stride_duration_s	reference	1.586750	2.560920	2.352950	2.005820
stride_length_m	detected	2.907343	2.572195	2.338223	3.309131
stride_length_m	reference	2.518850	1.663050	1.959690	2.417820
walking_speed_mps	detected	2.061659	2.255931	2.548155	2.783120
walking_speed_mps	reference	1.160790	1.600440	2.332300	2.095380
wb_id	detected	0.000000	2.000000	4.000000	6.000000
wb_id	reference	0.000000	2.000000	4.000000	6.000000
cadence_spm	error	0.410552	0.949059	0.289991	0.335760
	rel_error	0.004113	0.010967	0.003095	0.003855
	abs_error	0.410552	0.949059	0.289991	0.335760
	abs_rel_error	0.004113	0.010967	0.003095	0.003855
duration_s	error	0.383295	0.613306	0.118158	0.513849
	rel_error	0.080744	0.071923	0.019373	0.038004
	abs_error	0.383295	0.613306	0.118158	0.513849
	abs_rel_error	0.080744	0.071923	0.019373	0.038004
n_steps	error	0.000000	0.000000	1.000000	0.000000
	rel_error	0.000000	0.000000	0.125000	0.000000
	abs_error	0.000000	0.000000	1.000000	0.000000
	abs_rel_error	0.000000	0.000000	0.125000	0.000000
n_turns	error	1.000000	1.000000	0.000000	0.000000
	rel_error	NaN	0.500000	NaN	0.000000
	abs_error	1.000000	1.000000	0.000000	0.000000
	abs_rel_error	NaN	0.500000	NaN	0.000000
stride_duration_s	error	0.798057	0.911949	0.960170	0.621676
	rel_error	0.502951	0.356102	0.408071	0.309936
	abs_error	0.798057	0.911949	0.960170	0.621676
	abs_rel_error	0.502951	0.356102	0.408071	0.309936
stride_length_m	error	0.388493	0.909145	0.378533	0.891311
	rel_error	0.154234	0.546673	0.193160	0.368642
	abs_error	0.388493	0.909145	0.378533	0.891311
	abs_rel_error	0.154234	0.546673	0.193160	0.368642
walking_speed_mps	error	0.900869	0.655491	0.215855	0.687740
	rel_error	0.776083	0.409569	0.092550	0.328217
	abs_error	0.900869	0.655491	0.215855	0.687740
	abs_rel_error	0.776083	0.409569	0.092550	0.328217
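Before turning to the aggregation helpers introduced in the next section, the following sketch shows by hand what typical aggregations of one error column compute. It uses the cadence errors from the table above. The limits of agreement follow the common Bland-Altman definition (mean ± 1.96 standard deviations), which is an assumption here; the aggregation functions shipped with mobgap may differ in details.

```python
import statistics

# Cadence errors (steps/min) of the four matched WBs from the table above.
cadence_error = [0.410552, 0.949059, 0.289991, 0.335760]

mean_error = statistics.mean(cadence_error)
std_error = statistics.stdev(cadence_error)  # sample standard deviation

# Limits of agreement in the Bland-Altman sense: mean +/- 1.96 * std.
loa = (mean_error - 1.96 * std_error, mean_error + 1.96 * std_error)

print(round(mean_error, 6))
print(loa)
```

With only four WBs the interval is very wide, which is a reminder that aggregated error metrics are only meaningful on a sufficiently large number of matched WBs or days.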

.. GENERATED FROM PYTHON SOURCE LINES 323-350

Aggregate Results
-----------------
Finally, the estimated DMO measures and their errors can be aggregated over all WBs (approach 2) or all days (approach 1).
For this purpose, different aggregation functions can be applied to the error metrics, ranging from simple, built-in
aggregations like the mean or standard deviation to more complex functions like the limits of agreement
or the 5th and 95th percentiles.
This can be done using the :func:`~mobgap.utils.df_operations.apply_aggregations` function.
It operates similarly to the :func:`~mobgap.utils.df_operations.apply_transformations` function used above
by taking the error metrics dataframe and a list of aggregations as input.
In contrast to the transformations, an aggregation performed over a subset of dataframe columns is expected to return
a single value or a tuple of values stored in one cell of the resulting dataframe.

There are two ways to define aggregations:

1. As a tuple in the format ``(<identifier>, <aggregation>)``.
   In this case, the operation is performed based on exactly one column from the input df.
   Therefore, ``<identifier>`` can either be a string representing the name of the column to evaluate
   (for data with single-level columns),
   or a tuple of strings uniquely identifying the column to evaluate in case of multi-index columns.
   In our example, the identifier is a tuple ``(<metric>, <origin>)``,
   where ``<metric>`` is the metric column to evaluate and
   ``<origin>`` is the specific column from which data should be utilized
   (here, it would be either ``detected``, ``reference``, or one of the error columns).
   ``<aggregation>`` is the function or the list of functions to apply.
   The output dataframe will have a multilevel column with ``metric`` as the first level and
   ``origin`` as the second level.

   A valid aggregations list for all of our DMOs would consequently look like this:

.. GENERATED FROM PYTHON SOURCE LINES 350-366
..
code-block:: default metrics = [ "cadence_spm", "duration_s", "n_steps", "n_turns", "stride_duration_s", "stride_length_m", "walking_speed_mps", ] aggregations_simple = [ ((m, o), ["mean", "std"]) for m in metrics for o in ["detected", "reference", "error"] ] pprint(aggregations_simple) .. rst-class:: sphx-glr-script-out .. code-block:: none [(('cadence_spm', 'detected'), ['mean', 'std']), (('cadence_spm', 'reference'), ['mean', 'std']), (('cadence_spm', 'error'), ['mean', 'std']), (('duration_s', 'detected'), ['mean', 'std']), (('duration_s', 'reference'), ['mean', 'std']), (('duration_s', 'error'), ['mean', 'std']), (('n_steps', 'detected'), ['mean', 'std']), (('n_steps', 'reference'), ['mean', 'std']), (('n_steps', 'error'), ['mean', 'std']), (('n_turns', 'detected'), ['mean', 'std']), (('n_turns', 'reference'), ['mean', 'std']), (('n_turns', 'error'), ['mean', 'std']), (('stride_duration_s', 'detected'), ['mean', 'std']), (('stride_duration_s', 'reference'), ['mean', 'std']), (('stride_duration_s', 'error'), ['mean', 'std']), (('stride_length_m', 'detected'), ['mean', 'std']), (('stride_length_m', 'reference'), ['mean', 'std']), (('stride_length_m', 'error'), ['mean', 'std']), (('walking_speed_mps', 'detected'), ['mean', 'std']), (('walking_speed_mps', 'reference'), ['mean', 'std']), (('walking_speed_mps', 'error'), ['mean', 'std'])] .. GENERATED FROM PYTHON SOURCE LINES 367-375 2. As a named tuple of Type `CustomOperation` taking three values: `identifier`, `function`, and `column_name`. `identifier` is a valid loc identifier selecting one or more columns from the dataframe, `function` is the (custom) aggregation function or list of functions to apply, and `column_name` is the name of the resulting column in the output dataframe (single-level column if `column_name` is a string, multi-level column if `column_name` is a tuple). 
This allows for more complex aggregations that require multiple columns as input, for example, the intraclass correlation coefficient (ICC) for the DMOs (see below). A valid aggregation list for calculating the ICC of all DMOs would look like this: .. GENERATED FROM PYTHON SOURCE LINES 376-385 .. code-block:: default from mobgap.pipeline.evaluation import CustomErrorAggregations as A from mobgap.pipeline.evaluation import get_default_error_aggregations from mobgap.utils.df_operations import CustomOperation aggregations_custom = [ CustomOperation(identifier=m, function=A.icc, column_name=(m, "all")) for m in metrics ] pprint(aggregations_custom) .. rst-class:: sphx-glr-script-out .. code-block:: none [CustomOperation(identifier='cadence_spm', function=, column_name=('cadence_spm', 'all')), CustomOperation(identifier='duration_s', function=, column_name=('duration_s', 'all')), CustomOperation(identifier='n_steps', function=, column_name=('n_steps', 'all')), CustomOperation(identifier='n_turns', function=, column_name=('n_turns', 'all')), CustomOperation(identifier='stride_duration_s', function=, column_name=('stride_duration_s', 'all')), CustomOperation(identifier='stride_length_m', function=, column_name=('stride_length_m', 'all')), CustomOperation(identifier='walking_speed_mps', function=, column_name=('walking_speed_mps', 'all'))] .. GENERATED FROM PYTHON SOURCE LINES 386-391 In this case, the ICC function receives the entire "sub-dataframe" obtained by the selection ``wb_matches_with_errors.loc[:, m]`` (shown below for ``stride_duration_s`` as an example) and can then perform any required calculations. The selection can be any valid ``.loc`` selection, so you could even select values across multiple DMOs. .. GENERATED FROM PYTHON SOURCE LINES 391-393 .. code-block:: default sub_df = wb_matches_with_errors.loc[:, "stride_duration_s"] ..
GENERATED FROM PYTHON SOURCE LINES 394-395 The ICC function just takes the ``detected`` and ``reference`` columns and calculates the ICC. .. GENERATED FROM PYTHON SOURCE LINES 395-397 .. code-block:: default A.icc(sub_df) .. rst-class:: sphx-glr-script-out .. code-block:: none (0.12564828430955782, array([-0.77, 0.9 ])) .. GENERATED FROM PYTHON SOURCE LINES 398-403 Within one aggregation list, both types of aggregations can be combined as long as the resulting output dataframes can be concatenated, i.e. have the same number of column levels. Then, the :func:`~mobgap.utils.df_operations.apply_aggregations` function can be called. This returns a pandas Series with the aggregated values for each metric and origin. For better readability, we name and reorder its index levels, sort it, and convert it to a single-column dataframe. .. GENERATED FROM PYTHON SOURCE LINES 403-415 .. code-block:: default from mobgap.utils.df_operations import apply_aggregations aggregations = aggregations_simple + aggregations_custom agg_results = ( apply_aggregations(wb_matches_with_errors, aggregations) .rename_axis(index=["aggregation", "metric", "origin"]) .reorder_levels(["metric", "origin", "aggregation"]) .sort_index(level=0) .to_frame("values") ) agg_results .. raw:: html

			values
metric	origin	aggregation
cadence_spm	all	icc	(0.9958867337397626, [0.96, 1.0])
	detected	mean	92.283646
	detected	std	6.128986
	error	mean	0.496341
	error	std	0.305876
	reference	mean	91.787305
	reference	std	6.267057
duration_s	all	icc	(0.9935066257807105, [0.94, 1.0])
	detected	mean	8.630692
	detected	std	3.980795
	error	mean	0.407152
	error	std	0.214453
	reference	mean	8.22354
	reference	std	3.86233
n_steps	all	icc	(0.9927710843373494, [0.93, 1.0])
	detected	mean	11.5
	detected	std	4.041452
	error	mean	0.25
	error	std	0.5
	reference	mean	11.25
	reference	std	4.272002
n_turns	all	icc	(0.8064516129032259, [-0.03, 0.99])
	detected	mean	1.25
	detected	std	1.258306
	error	mean	0.5
	error	std	0.57735
	reference	mean	0.75
	reference	std	0.957427
stride_duration_s	all	icc	(0.12564828430955782, [-0.77, 0.9])
	detected	mean	2.949573
	detected	std	0.525579
	error	mean	0.822963
	error	std	0.150423
	reference	mean	2.12661
	reference	std	0.426573
stride_length_m	all	icc	(0.1021600349369979, [-0.78, 0.9])
	detected	mean	2.781723
	detected	std	0.422111
	error	mean	0.64187
	error	std	0.298442
	reference	mean	2.139852
	reference	std	0.400293
walking_speed_mps	all	icc	(0.20380942118620765, [-0.74, 0.92])
	detected	mean	2.412216
	detected	std	0.317996
	error	mean	0.614989
	error	std	0.2875
	reference	mean	1.797227
	reference	std	0.522486

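The multi-column aggregation pattern used for the ICC above can be sketched without mobgap. The following is a minimal illustration, not the package's implementation: the toy values and the helper ``pearson_r`` are hypothetical, and only pandas is used. It shows that selecting a metric by its first column level yields a sub-dataframe with ``detected`` and ``reference`` columns, from which a function can compute one value.

```python
import pandas as pd

# Toy stand-in for ``wb_matches_with_errors`` (hypothetical values):
# two-level columns with (metric, origin), one row per matched WB.
toy = pd.DataFrame(
    {
        ("cadence_spm", "detected"): [90.0, 95.0, 100.0, 105.0],
        ("cadence_spm", "reference"): [91.0, 94.0, 101.0, 104.0],
    }
)


def pearson_r(sub_df: pd.DataFrame) -> float:
    """Hypothetical aggregation: correlation of detected vs. reference."""
    return float(sub_df["detected"].corr(sub_df["reference"]))


# Selecting by the first column level drops that level, so the function
# only sees the ``detected`` and ``reference`` columns of one metric.
sub_df = toy.loc[:, "cadence_spm"]
r = pearson_r(sub_df)
print(list(sub_df.columns), round(r, 3))
```

A function with this shape (sub-dataframe in, single value out) is what the ``function`` field of a ``CustomOperation`` is described as expecting; whether further constraints apply is not covered by this sketch.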
.. GENERATED FROM PYTHON SOURCE LINES 416-418 If you simply want to apply a standard set of aggregations to the error metrics, you can use the :func:`~mobgap.pipeline.evaluation.get_default_error_aggregations` function, resulting in the following list: .. GENERATED FROM PYTHON SOURCE LINES 418-422 .. code-block:: default aggregations_default = get_default_error_aggregations() pprint(aggregations_default) .. rst-class:: sphx-glr-script-out .. code-block:: none [(('cadence_spm', 'detected'), ['mean', ]), (('cadence_spm', 'reference'), ['mean', ]), (('cadence_spm', 'abs_error'), ['mean', ]), (('cadence_spm', 'abs_rel_error'), ['mean', ]), (('duration_s', 'detected'), ['mean', ]), (('duration_s', 'reference'), ['mean', ]), (('duration_s', 'abs_error'), ['mean', ]), (('duration_s', 'abs_rel_error'), ['mean', ]), (('n_steps', 'detected'), ['mean', ]), (('n_steps', 'reference'), ['mean', ]), (('n_steps', 'abs_error'), ['mean', ]), (('n_steps', 'abs_rel_error'), ['mean', ]), (('n_strides', 'detected'), ['mean', ]), (('n_strides', 'reference'), ['mean', ]), (('n_strides', 'abs_error'), ['mean', ]), (('n_strides', 'abs_rel_error'), ['mean', ]), (('n_turns', 'detected'), ['mean', ]), (('n_turns', 'reference'), ['mean', ]), (('n_turns', 'abs_error'), ['mean', ]), (('n_turns', 'abs_rel_error'), ['mean', ]), (('stride_duration_s', 'detected'), ['mean', ]), (('stride_duration_s', 'reference'), ['mean', ]), (('stride_duration_s', 'abs_error'), ['mean', ]), (('stride_duration_s', 'abs_rel_error'), ['mean', ]), (('stride_length_m', 'detected'), ['mean', ]), (('stride_length_m', 'reference'), ['mean', ]), (('stride_length_m', 'abs_error'), ['mean', ]), (('stride_length_m', 'abs_rel_error'), ['mean', ]), (('walking_speed_mps', 'detected'), ['mean', ]), (('walking_speed_mps', 'reference'), ['mean', ]), (('walking_speed_mps', 'abs_error'), ['mean', ]), (('walking_speed_mps', 'abs_rel_error'), ['mean', ]), (('cadence_spm', 'error'), ['mean', ]), (('cadence_spm', 'rel_error'), ['mean', 
]), (('duration_s', 'error'), ['mean', ]), (('duration_s', 'rel_error'), ['mean', ]), (('n_steps', 'error'), ['mean', ]), (('n_steps', 'rel_error'), ['mean', ]), (('n_strides', 'error'), ['mean', ]), (('n_strides', 'rel_error'), ['mean', ]), (('n_turns', 'error'), ['mean', ]), (('n_turns', 'rel_error'), ['mean', ]), (('stride_duration_s', 'error'), ['mean', ]), (('stride_duration_s', 'rel_error'), ['mean', ]), (('stride_length_m', 'error'), ['mean', ]), (('stride_length_m', 'rel_error'), ['mean', ]), (('walking_speed_mps', 'error'), ['mean', ]), (('walking_speed_mps', 'rel_error'), ['mean', ]), CustomOperation(identifier='cadence_spm', function=, column_name=('cadence_spm', 'all')), CustomOperation(identifier='duration_s', function=, column_name=('duration_s', 'all')), CustomOperation(identifier='n_steps', function=, column_name=('n_steps', 'all')), CustomOperation(identifier='n_strides', function=, column_name=('n_strides', 'all')), CustomOperation(identifier='n_turns', function=, column_name=('n_turns', 'all')), CustomOperation(identifier='stride_duration_s', function=, column_name=('stride_duration_s', 'all')), CustomOperation(identifier='stride_length_m', function=, column_name=('stride_length_m', 'all')), CustomOperation(identifier='walking_speed_mps', function=, column_name=('walking_speed_mps', 'all')), CustomOperation(identifier=None, function=, column_name=('all', 'all'))] .. GENERATED FROM PYTHON SOURCE LINES 423-424 If you want to include further aggregations next to the default ones, you can also append them to this list. .. GENERATED FROM PYTHON SOURCE LINES 424-428 .. code-block:: default aggregations_default_extended = aggregations_default + [ *(((m, o), ["std"]) for m in metrics for o in ["detected", "reference"]) ] .. GENERATED FROM PYTHON SOURCE LINES 429-431 This list of standard aggregations can then also be passed to the :func:`~mobgap.utils.df_operations.apply_aggregations` function. .. GENERATED FROM PYTHON SOURCE LINES 431-440 .. 
code-block:: default default_agg_results = ( apply_aggregations(wb_matches_with_errors, aggregations_default_extended) .rename_axis(index=["aggregation", "metric", "origin"]) .reorder_levels(["metric", "origin", "aggregation"]) .sort_index(level=0) .to_frame("values") ) default_agg_results .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'detected')'. They are not found in the DataFrame. warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1) /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'reference')'. They are not found in the DataFrame. warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1) /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'abs_error')'. They are not found in the DataFrame. warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1) /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'abs_rel_error')'. They are not found in the DataFrame. warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1) /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'error')'. They are not found in the DataFrame. 
warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1) /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'rel_error')'. They are not found in the DataFrame. warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1) /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:566: UserWarning: One of the transformations requires the following columns: 'n_strides'. They are not found in the DataFrame. warnings.warn(str(e), UserWarning, stacklevel=1) .. raw:: html

			values
metric	origin	aggregation
all	all	n_datapoints	4
cadence_spm	abs_error	mean	0.496341
	abs_error	quantiles	(0.29685672457991713, 0.8682829318918925)
	abs_rel_error	mean	0.005508
	abs_rel_error	quantiles	(0.003208965846203449, 0.00993913893132042)
...	...	...	...
walking_speed_mps	reference	mean	1.797227
		quantiles	(1.2267375, 2.2967619999999997)
		std	0.522486
	rel_error	loa	(-0.15414805045818636, 0.9573580216047631)
	rel_error	mean	0.401605

106 rows × 1 columns

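The ``quantiles`` and ``loa`` rows in the table above hold tuples rather than single numbers. As a rough sketch of how such tuple-valued aggregations can be computed, the example below uses plain NumPy and pandas. It assumes the common Bland-Altman definition of the limits of agreement (mean ± 1.96 standard deviations) and the 5th and 95th percentiles mentioned earlier. The function names and the toy data are hypothetical, and the package's own functions may differ in details such as NaN handling.

```python
import numpy as np
import pandas as pd


def quantiles_5_95(series: pd.Series) -> tuple[float, float]:
    """Hypothetical helper: 5th and 95th percentile, ignoring NaNs."""
    lower, upper = np.nanquantile(series.to_numpy(dtype=float), [0.05, 0.95])
    return float(lower), float(upper)


def limits_of_agreement(
    series: pd.Series, agreement: float = 1.96
) -> tuple[float, float]:
    """Hypothetical helper: mean +/- agreement * sample standard deviation."""
    mean = float(series.mean())
    std = float(series.std())  # pandas default: ddof=1, NaNs skipped
    return mean - agreement * std, mean + agreement * std


# Toy error values (not taken from the example data).
errors = pd.Series([0.1, 0.2, 0.3, 0.4], name="error")

q = quantiles_5_95(errors)
loa = limits_of_agreement(errors)
print("quantiles:", q)
print("loa:", loa)
```

Each function returns one tuple per column, which matches the requirement that an aggregation yields a single value or a tuple of values stored in one cell.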
.. GENERATED FROM PYTHON SOURCE LINES 441-445 .. note:: If you want to modify the default arguments of the aggregation functions, e.g. to change the calculated quantiles, you can either define custom aggregation functions or adapt the default functions as shown for the transformation functions above. .. _sphx_glr_download_auto_examples_pipeline__03_dmo_evaluation_on_wb_level.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: _03_dmo_evaluation_on_wb_level.py <_03_dmo_evaluation_on_wb_level.py>` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: _03_dmo_evaluation_on_wb_level.ipynb <_03_dmo_evaluation_on_wb_level.ipynb>`