Evaluation of final walking bout level DMOs#

This example shows how to evaluate the performance of parameters on a walking bout (WB) level by comparing against a reference. On this level, we usually need to deal with the issue that the WBs identified by the algorithm pipeline might not match the reference WBs. This makes comparing the parameters within them difficult. In general, two approaches can be taken here [1]:

First aggregate the WB-level parameters of both systems to a common level (e.g. per trial, per day, per hour, …) and then compare the aggregated values.
Identify the subset of WBs that match between the two systems and compare the parameters only within these WBs.

In the following example we will show both approaches.

But first some general setup.

Loading some example data#

We simply load some example DMO data and their reference that we provide with the package. Usually, the “detected” data would be the output of your algorithm pipeline and the “reference” data would be the ground truth.

Note

This data is randomly generated and not physiologically meaningful. However, it has the same structure as any other typical input data for this evaluation.

from pprint import pprint

import numpy as np
import pandas as pd
from mobgap import PACKAGE_ROOT

DATA_PATH = PACKAGE_ROOT.parent / "example_data/dmo_data/dummy_dmo_data"

detected_dmo = pd.read_csv(DATA_PATH / "detected_dmo_data.csv").set_index(
    ["visit_type", "participant_id", "measurement_date", "wb_id"]
)

reference_dmo = pd.read_csv(DATA_PATH / "reference_dmo_data.csv").set_index(
    ["visit_type", "participant_id", "measurement_date", "wb_id"]
)

In both dataframes, each row represents one WB with all of its parameters. The index contains multiple levels: the visit type, the participant id, the measurement date, and the WB id. The start and end index of each WB, in samples relative to the start of the respective recording, are contained in the columns start and end.

detected_dmo

				start	end	duration_s	n_steps	cadence_spm	walking_speed_mps	stride_length_m	stride_duration_s	n_turns
visit_type	participant_id	measurement_date	wb_id
T1	12345	2023-01-01	0	0	5	5.130315	8	100.232432	2.061659	2.907343	2.384807	1
			1	10	15	5.436672	7	101.677896	2.722036	2.469691	2.439419	1
			2	20	25	9.140576	12	87.484329	2.255931	2.572195	3.472869	3
			3	30	35	17.204985	28	92.096962	1.141349	1.595533	3.507587	1
			4	40	45	6.217228	9	93.988941	2.548155	2.338223	3.313120	0
			5	50	55	3.521295	8	99.425820	1.820821	2.882743	2.388743	0
			6	60	65	14.034649	17	87.428880	2.783120	3.309131	2.627496	1
			7	70	75	8.296356	12	86.372700	2.240491	2.721844	1.653604	1

reference_dmo

				start	end	duration_s	n_steps	cadence_spm	walking_speed_mps	stride_length_m	stride_duration_s	n_turns
visit_type	participant_id	measurement_date	wb_id
T1	12345	2023-01-01	0	0	4	4.74702	8	99.82188	1.16079	2.51885	1.58675	0
			1	15	19	5.13150	7	101.16429	2.57881	1.57243	1.46537	0
			2	20	24	8.52727	12	86.53527	1.60044	1.66305	2.56092	2
			3	35	39	16.24554	27	91.49977	0.95558	0.88961	3.14549	1
			4	40	44	6.09907	8	93.69895	2.33230	1.95969	2.35295	0
			5	55	59	3.25806	7	99.33525	1.14732	2.37307	1.81262	0
			6	60	64	13.52080	17	87.09312	2.09538	2.41782	2.00582	1
			7	75	79	7.49830	12	85.96436	2.23757	1.89026	1.55788	0
			8	80	84	8.21455	10	75.12352	0.59915	2.16121	2.31160	0
			9	95	99	6.84377	9	76.61402	2.22903	1.03362	3.17821	0

Approach 1: Aggregate then compare#

First, we combine the detected and reference data, which can easily be done as both dataframes have the same index levels. To retain the information about the origin of the data, we add a column level assigning "detected" and "reference" to the respective DMOs. Furthermore, we rearrange the columns to have the DMO metrics as the first level of the column index.

combined_dmos = (
    pd.concat(
        [detected_dmo, reference_dmo], keys=["detected", "reference"], axis=1
    )
    .reorder_levels((1, 0), axis=1)
    .sort_index(axis=1)
)
combined_dmos

				cadence_spm		duration_s		end		n_steps		n_turns		start		stride_duration_s		stride_length_m		walking_speed_mps
				detected	reference	detected	reference	detected	reference	detected	reference	detected	reference	detected	reference	detected	reference	detected	reference	detected	reference
visit_type	participant_id	measurement_date	wb_id
T1	12345	2023-01-01	0	100.232432	99.82188	5.130315	4.74702	5.0	4	8.0	8	1.0	0	0.0	0	2.384807	1.58675	2.907343	2.51885	2.061659	1.16079
			1	101.677896	101.16429	5.436672	5.13150	15.0	19	7.0	7	1.0	0	10.0	15	2.439419	1.46537	2.469691	1.57243	2.722036	2.57881
			2	87.484329	86.53527	9.140576	8.52727	25.0	24	12.0	12	3.0	2	20.0	20	3.472869	2.56092	2.572195	1.66305	2.255931	1.60044
			3	92.096962	91.49977	17.204985	16.24554	35.0	39	28.0	27	1.0	1	30.0	35	3.507587	3.14549	1.595533	0.88961	1.141349	0.95558
			4	93.988941	93.69895	6.217228	6.09907	45.0	44	9.0	8	0.0	0	40.0	40	3.313120	2.35295	2.338223	1.95969	2.548155	2.33230
			5	99.425820	99.33525	3.521295	3.25806	55.0	59	8.0	7	0.0	0	50.0	55	2.388743	1.81262	2.882743	2.37307	1.820821	1.14732
			6	87.428880	87.09312	14.034649	13.52080	65.0	64	17.0	17	1.0	1	60.0	60	2.627496	2.00582	3.309131	2.41782	2.783120	2.09538
			7	86.372700	85.96436	8.296356	7.49830	75.0	79	12.0	12	1.0	0	70.0	75	1.653604	1.55788	2.721844	1.89026	2.240491	2.23757
			8	NaN	75.12352	NaN	8.21455	NaN	84	NaN	10	NaN	0	NaN	80	NaN	2.31160	NaN	2.16121	NaN	0.59915
			9	NaN	76.61402	NaN	6.84377	NaN	99	NaN	9	NaN	0	NaN	95	NaN	3.17821	NaN	1.03362	NaN	2.22903

This provides us with a dataframe containing the detected and reference values for all detected and reference WBs. Some entries are NaN, as the number of WBs in the detected and reference data might differ. Individual rows in this dataframe should not be compared directly, because a detected and a reference WB that share the same WB id do not necessarily describe the same walking bout. Therefore, we need to aggregate the DMO data based on an index level of choice, e.g., per day, to retrieve meaningful and interpretable results. This can, for instance, be done by grouping the data and averaging over the groups. If required, other aggregation functions (e.g., moving averages or averaging over a span of several days) can be applied instead of simple groupwise averaging. As long as the same aggregation method is applied to both the detected and reference data, further processing can be done in the same way as shown below.

Note

In case of missing data, applying dropna() to the resulting dataframe might be helpful to remove all groups where either detected or reference data is missing.

# Grouping by index levels works on the rows by default, so the deprecated
# ``axis`` keyword of ``DataFrame.groupby`` is not needed.
daily_matches = (
    combined_dmos.groupby(
        level=["visit_type", "participant_id", "measurement_date"]
    )
    .mean()
    .dropna()
)
daily_matches.T

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
cadence_spm	detected	93.588495
cadence_spm	reference	89.685043
duration_s	detected	8.622759
duration_s	reference	8.008588
end	detected	40.000000
end	reference	51.500000
n_steps	detected	12.625000
n_steps	reference	11.700000
n_turns	detected	1.000000
n_turns	reference	0.400000
start	detected	35.000000
start	reference	47.500000
stride_duration_s	detected	2.723456
stride_duration_s	reference	2.197761
stride_length_m	detected	2.599588
stride_length_m	reference	1.847961
walking_speed_mps	detected	2.196695
walking_speed_mps	reference	1.693637

The resulting dataframe contains the average detected and reference values for each DMO per visit type, participant, and day. This is conveniently provided as a multiindex column, so that selecting a single DMO yields a DataFrame with the detected and reference values.

daily_matches["cadence_spm"]

			detected	reference
visit_type	participant_id	measurement_date
T1	12345	2023-01-01	93.588495	89.685043

In our example data, we only have data from a single day, so the aggregated result only has one row. Normally, you would have multiple rows, one for each group of WBs. From here on, further processing to retrieve the aggregated error metrics is identical to the further processing when following approach 2, and is shown below.
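To illustrate what the aggregated result looks like with more than one group, the following minimal sketch repeats the same steps on a made-up dataframe with two days. All values and the reduced index (only measurement_date and wb_id) are assumptions chosen for brevity, and the median is used instead of the mean to show that the aggregation function is interchangeable.

```python
import pandas as pd

# Toy WB-level data for two days; all values are made up for illustration.
index = pd.MultiIndex.from_tuples(
    [
        ("2023-01-01", 0),
        ("2023-01-01", 1),
        ("2023-01-02", 0),
    ],
    names=["measurement_date", "wb_id"],
)
toy_detected = pd.DataFrame({"cadence_spm": [100.0, 90.0, 80.0]}, index=index)
# In this toy data, the reference system found no WB on the second day.
toy_reference = pd.DataFrame({"cadence_spm": [98.0, 94.0]}, index=index[:2])

toy_combined = (
    pd.concat(
        [toy_detected, toy_reference], keys=["detected", "reference"], axis=1
    )
    .reorder_levels((1, 0), axis=1)
    .sort_index(axis=1)
)

# One row per day; any aggregation works as long as it is applied to both systems.
toy_daily = toy_combined.groupby(level="measurement_date").median()
# Keep only the days for which both systems provide a value.
toy_daily_complete = toy_daily.dropna()
```

Here toy_daily has one row per day, while toy_daily_complete only keeps the first day, because the second day has no reference value.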

But let’s first show how to calculate the error metrics on a WB-by-WB basis.

Approach 2: Match then compare#

As the first step, we need to identify WBs that match between the detected and reference data. As it is unlikely that the WBs are exactly the same, we need to define a threshold for the overlap between the WBs to consider them as a match. This matching can be done using the categorize_intervals function. It classifies every WB in the data as either a true positive (TP), a false positive (FP), or a false negative (FN). If our data only contained WBs from a single recording, we could directly provide the detected and reference data to the function.

However, in most cases the data would contain WBs from multiple recordings, trials, participants, and so on. In our case, we actually only have WBs from a single recording, but we will still show the approach assuming that the data is more complex.

To avoid matching WBs from different recordings (the matching is performed based on the start/end index only), we need to group the data by the relevant index levels first and apply the matching function to each group. This can be done using the create_multi_groupby helper function.

from mobgap.utils.df_operations import create_multi_groupby

per_trial_participant_day_grouper = create_multi_groupby(
    detected_dmo,
    reference_dmo,
    groupby=["visit_type", "participant_id", "measurement_date"],
)

This provides us with a groupby-object that is similar to the normal pandas groupby-object that can be created from a single dataframe. The MultiGroupBy object allows us to apply a function to each group across all dataframes.
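The idea of grouping several dataframes by shared index levels can be sketched with plain pandas. The helper below is hypothetical and is not how create_multi_groupby is implemented; in particular, silently skipping groups that exist in only one dataframe is a simplifying assumption, and the toy data is made up.

```python
import pandas as pd


def apply_per_shared_group(df_a, df_b, levels, func):
    """Call func(group_a, group_b) for each group key present in both frames."""
    groups_a = dict(iter(df_a.groupby(level=levels)))
    groups_b = dict(iter(df_b.groupby(level=levels)))
    return {
        key: func(groups_a[key], groups_b[key])
        for key in groups_a
        if key in groups_b
    }


names = ["participant_id", "measurement_date", "wb_id"]
toy_detected = pd.DataFrame(
    {"start": [0, 10, 0]},
    index=pd.MultiIndex.from_tuples(
        [("P1", "d1", 0), ("P1", "d1", 1), ("P2", "d1", 0)], names=names
    ),
)
toy_reference = pd.DataFrame(
    {"start": [0, 0, 12]},
    index=pd.MultiIndex.from_tuples(
        [("P1", "d1", 0), ("P2", "d1", 0), ("P2", "d1", 1)], names=names
    ),
)

# Count the WBs of both systems within each participant/day group.
wb_counts = apply_per_shared_group(
    toy_detected,
    toy_reference,
    levels=["participant_id", "measurement_date"],
    func=lambda det, ref: (len(det), len(ref)),
)
```

Each call of func only ever sees the rows of one participant and day, which is what prevents WBs from different recordings from being compared with each other.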

Here we apply categorize_intervals with a threshold of 0.8 to each group. The overlap_threshold parameter defines the minimum overlap between the detected and reference WBs to be considered a match. It can be chosen according to your needs, whereby a value closer to 0.5 will yield more matches than a value closer to 1.

from mobgap.pipeline.evaluation import categorize_intervals

wb_tp_fp_fn = per_trial_participant_day_grouper.apply(
    lambda det, ref: categorize_intervals(
        gsd_list_detected=det,
        gsd_list_reference=ref,
        overlap_threshold=0.8,
        multiindex_warning=False,
    )
)
wb_tp_fp_fn

				gs_id_detected	gs_id_reference	match_type
visit_type	participant_id	measurement_date	match_id
T1	12345	2023-01-01	0	(T1, 12345, 2023-01-01, 0)	(T1, 12345, 2023-01-01, 0)	tp
			1	(T1, 12345, 2023-01-01, 1)	NaN	fp
			2	(T1, 12345, 2023-01-01, 2)	(T1, 12345, 2023-01-01, 2)	tp
			3	(T1, 12345, 2023-01-01, 3)	NaN	fp
			4	(T1, 12345, 2023-01-01, 4)	(T1, 12345, 2023-01-01, 4)	tp
			5	(T1, 12345, 2023-01-01, 5)	NaN	fp
			6	(T1, 12345, 2023-01-01, 6)	(T1, 12345, 2023-01-01, 6)	tp
			7	(T1, 12345, 2023-01-01, 7)	NaN	fp
			8	NaN	(T1, 12345, 2023-01-01, 1)	fn
			9	NaN	(T1, 12345, 2023-01-01, 3)	fn
			10	NaN	(T1, 12345, 2023-01-01, 5)	fn
			11	NaN	(T1, 12345, 2023-01-01, 7)	fn
			12	NaN	(T1, 12345, 2023-01-01, 8)	fn
			13	NaN	(T1, 12345, 2023-01-01, 9)	fn

We can see that the function returns a dataframe that keeps the grouping levels of the input dataframes and adds a match_id level. Each WB is classified as TP, FP, or FN. For the TP WBs, the corresponding reference WB is assigned. For the comparison we want to perform here, only the matching WBs, i.e., the TPs, are of interest. If you are interested in the FPs or FNs, have a look at the general GSD evaluation example.
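To build some intuition for the overlap threshold, the following sketch matches intervals by their intersection over union (IoU). This is an assumption made for illustration: the exact overlap definition and the treatment of interval boundaries in categorize_intervals may differ, and the helper names are hypothetical.

```python
def interval_iou(a, b):
    """Intersection over union of two (start, end) intervals in samples."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def match_intervals(detected, reference, overlap_threshold=0.8):
    """Return (detected_idx, reference_idx) pairs with IoU >= threshold."""
    matches = []
    for i, det in enumerate(detected):
        for j, ref in enumerate(reference):
            if interval_iou(det, ref) >= overlap_threshold:
                matches.append((i, j))
    return matches


# Start/end values of the first four WBs from the example tables above.
detected = [(0, 5), (10, 15), (20, 25), (30, 35)]
reference = [(0, 4), (15, 19), (20, 24), (35, 39)]

matches = match_intervals(detected, reference, overlap_threshold=0.8)
stricter_matches = match_intervals(detected, reference, overlap_threshold=0.9)
```

With a threshold of 0.8, the WBs at positions 0 and 2 are matched, while a threshold of 0.9 leaves no matches in this toy data. For thresholds above 0.5, an IoU-based criterion cannot assign one interval to two partners, so no additional tie-breaking is needed in this sketch.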

Based on the positive matches, we can now extract the DMO data from detected and reference data that is to be compared. To make extracting all the TP WBs a little easier, we can use the get_matching_intervals function.

from mobgap.pipeline.evaluation import get_matching_intervals

wb_matches = get_matching_intervals(
    metrics_detected=detected_dmo,
    metrics_reference=reference_dmo,
    matches=wb_tp_fp_fn,
)
wb_matches.T

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	detected	100.232432	87.484329	93.988941	87.428880
cadence_spm	reference	99.821880	86.535270	93.698950	87.093120
duration_s	detected	5.130315	9.140576	6.217228	14.034649
duration_s	reference	4.747020	8.527270	6.099070	13.520800
end	detected	5.000000	25.000000	45.000000	65.000000
end	reference	4.000000	24.000000	44.000000	64.000000
n_steps	detected	8.000000	12.000000	9.000000	17.000000
n_steps	reference	8.000000	12.000000	8.000000	17.000000
n_turns	detected	1.000000	3.000000	0.000000	1.000000
n_turns	reference	0.000000	2.000000	0.000000	1.000000
start	detected	0.000000	20.000000	40.000000	60.000000
start	reference	0.000000	20.000000	40.000000	60.000000
stride_duration_s	detected	2.384807	3.472869	3.313120	2.627496
stride_duration_s	reference	1.586750	2.560920	2.352950	2.005820
stride_length_m	detected	2.907343	2.572195	2.338223	3.309131
stride_length_m	reference	2.518850	1.663050	1.959690	2.417820
walking_speed_mps	detected	2.061659	2.255931	2.548155	2.783120
walking_speed_mps	reference	1.160790	1.600440	2.332300	2.095380
wb_id	detected	0.000000	2.000000	4.000000	6.000000
wb_id	reference	0.000000	2.000000	4.000000	6.000000

The returned dataframe contains the detected and reference values for all DMOs of the matched WBs. This is conveniently provided as a multiindex column, so that selecting a single DMO yields a DataFrame with the detected and reference values.

wb_matches["cadence_spm"]

				detected	reference
visit_type	participant_id	measurement_date	match_id
T1	12345	2023-01-01	0	100.232432	99.82188
			2	87.484329	86.53527
			4	93.988941	93.69895
			6	87.428880	87.09312

From here on, the aggregated DMOs (when following approach 1) or matched WBs (when following approach 2) can be compared with the same methods to calculate error metrics. For the sake of simplicity, we will show the calculation of error metrics for the matched WBs wb_matches (approach 2) here. However, the input can also simply be replaced by the aggregated DMO dataframe daily_matches (approach 1).

Estimate Errors in DMO data#

The DMO data can now be compared day by day (approach 1) or WB by WB (approach 2). We want to calculate general error metrics like the error, absolute error, relative error, and absolute relative error for each day (WB) and DMO. This can be done using the generic apply_transformations helper, which allows us to apply any list of transformation functions (a transformation function takes a dataframe as input and returns a Series of the same length). It further allows us to declaratively define which transformation/error should be applied to which columns (i.e., which DMOs).

As input, apply_transformations receives the matching DMO data and a list of transformations that should be applied to the data. A transformation is a function that takes some subset of the input dataframe, performs some operation on it, and returns a Series with the same length as the input. Calculating the differences between two sets of values, e.g., between detected and reference values, is a common type of transformation that is applied to evaluate the performance of the DMO estimation. For this purpose, the transformations are defined as a list of tuples, each containing the DMO of interest as the first element and the list of error functions applied to the detected and reference values as the second element. This way, you can also define custom error functions and pass them as transformations. Note that the columns of the detected and reference values are expected to be named detected and reference by default. For the standard error metrics (error, absolute error, relative error, absolute relative error), get_default_error_transformations returns the correct transformations. A simple definition of error metrics would look like this:

from mobgap.pipeline.evaluation import ErrorTransformFuncs as E

custom_errors = [
    ("cadence_spm", [E.abs_error, E.rel_error]),
    ("duration_s", [E.error]),
    ("n_turns", [E.rel_error]),
]

This definition should be relatively self-explanatory.

We can now apply these transformations to the DMO data using apply_transformations. Note that there is no need to group the dataframe again, as all the transformations are applied row-wise to the entire dataframe.

from mobgap.utils.df_operations import apply_transformations

custom_wb_errors = apply_transformations(wb_matches, custom_errors)
custom_wb_errors.T

/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:84: UserWarning: Zero division occurred in rel_error because divisor contains zeroes. Affected error metrics are set to NaN.
  _handle_zero_division(ref, zero_division_hint, "rel_error")

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	abs_error	0.410552	0.949059	0.289991	0.335760
cadence_spm	rel_error	0.004113	0.010967	0.003095	0.003855
duration_s	error	0.383295	0.613306	0.118158	0.513849
n_turns	rel_error	NaN	0.500000	NaN	0.000000
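To make the meaning of these error metrics concrete, here is a minimal sketch of what such transformation functions might compute on a dataframe with detected and reference columns. These are illustrative re-implementations, not the functions from ErrorTransformFuncs; the sign convention (detected minus reference) and the NaN handling for a zero reference are assumptions that are consistent with the output above.

```python
import numpy as np
import pandas as pd


def error(df):
    """Signed error: detected minus reference."""
    return df["detected"] - df["reference"]


def abs_error(df):
    """Absolute error."""
    return error(df).abs()


def rel_error(df):
    """Error relative to the reference; NaN where the reference is zero."""
    reference = df["reference"].replace(0, np.nan)
    return error(df) / reference


# n_turns of the four matched WBs from the wb_matches table above.
n_turns = pd.DataFrame(
    {"detected": [1.0, 3.0, 0.0, 1.0], "reference": [0.0, 2.0, 0.0, 1.0]}
)

n_turns_rel_error = rel_error(n_turns)
```

Each function returns a Series with one value per row, which is the property a transformation needs to have. The two NaN entries correspond to the WBs whose reference n_turns is zero.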

We can also modify the error metrics or provide custom error functions. We will show three options here.

Use a usual error metric, but change some input parameters, and have the output under a new name. For this case, we just define a new function wrapping the old one. For example, we might want to suppress the warning that is raised when a zero division occurs in the relative error. As we saw above, this warning is raised for the n_turns parameter.

def rel_error_without_warning(x):
    return E.rel_error(x, zero_division_hint=np.nan)

If we want to keep the same name for the function, we could just overwrite the old function. However, to avoid accidentally breaking other code that uses the function, we can instead use a lambda function and manually set its name. As a result, we suppress the warning as above, but keep the function name for the aggregation.

rel_error_as_lambda = lambda x: E.rel_error(x, zero_division_hint=np.nan)
rel_error_as_lambda.__name__ = "rel_error"

We can also define a completely new error function. The dataframe we get as input contains the columns detected and reference with the detected and reference values for the DMO of interest. For this example, we will create a nonsensical scaled_error function that scales the error by a factor of 2.

Note

If you want to introduce custom, more complex transformation functions, you can also define them as CustomOperation as shown for aggregations in the “Aggregation” section.

def scaled_error(x):
    return 2 * (x["detected"] - x["reference"])

Our custom functions can now be used in the transformations list and freely combined with other error metrics.

Also, keep in mind that the definition is “just” Python, so we can use things like comprehensions and unpacking to generate the list of transformations, as shown below.

custom_errors = [
    ("cadence_spm", [E.error, scaled_error]),
    ("duration_s", [E.error]),
    ("n_turns", [rel_error_without_warning, rel_error_as_lambda]),
    *(
        (m, [E.abs_error, E.rel_error])
        for m in ["stride_duration_s", "stride_length_m"]
    ),
]

custom_wb_errors = apply_transformations(wb_matches, custom_errors)
custom_wb_errors.T

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	error	0.410552	0.949059	0.289991	0.335760
cadence_spm	scaled_error	0.821103	1.898118	0.579983	0.671520
duration_s	error	0.383295	0.613306	0.118158	0.513849
n_turns	rel_error_without_warning	NaN	0.500000	NaN	0.000000
n_turns	rel_error	NaN	0.500000	NaN	0.000000
stride_duration_s	abs_error	0.798057	0.911949	0.960170	0.621676
stride_duration_s	rel_error	0.502951	0.356102	0.408071	0.309936
stride_length_m	abs_error	0.388493	0.909145	0.378533	0.891311
stride_length_m	rel_error	0.154234	0.546673	0.193160	0.368642

As expected, the resulting dataframe contains the error metrics for the specified DMOs and could now be further processed, e.g., by aggregating the results.

As an alternative to defining a custom error definition, we provide a “default” error definition that can be used to calculate the standard error metrics for the common DMOs. In most cases, this is a good starting point for the evaluation of the DMOs.

from mobgap.pipeline.evaluation import get_default_error_transformations

default_errors = get_default_error_transformations()

pprint(default_errors)

[('cadence_spm',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('duration_s',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('n_steps',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('n_strides',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('n_turns',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('stride_duration_s',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('stride_length_m',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('walking_speed_mps',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>])]

While the visualization here is a little ugly, we can see that the default error transformation attempts to calculate the error, the relative error, the absolute error, and the absolute relative error for all the core DMOs.

We can apply it as before.

wb_errors = apply_transformations(wb_matches, default_errors)
wb_errors.T

/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:365: UserWarning: One of the transformations requires the following columns: 'n_strides'. They are not found in the DataFrame.
  warnings.warn(str(e), stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:84: UserWarning: Zero division occurred in rel_error because divisor contains zeroes. Affected error metrics are set to NaN.
  _handle_zero_division(ref, zero_division_hint, "rel_error")
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:146: UserWarning: Zero division occurred in abs_rel_error because divisor contains zeroes. Affected error metrics are set to NaN.
  _handle_zero_division(ref, zero_division_hint, "abs_rel_error")

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	error	0.410552	0.949059	0.289991	0.335760
	rel_error	0.004113	0.010967	0.003095	0.003855
	abs_error	0.410552	0.949059	0.289991	0.335760
	abs_rel_error	0.004113	0.010967	0.003095	0.003855
duration_s	error	0.383295	0.613306	0.118158	0.513849
	rel_error	0.080744	0.071923	0.019373	0.038004
	abs_error	0.383295	0.613306	0.118158	0.513849
	abs_rel_error	0.080744	0.071923	0.019373	0.038004
n_steps	error	0.000000	0.000000	1.000000	0.000000
	rel_error	0.000000	0.000000	0.125000	0.000000
	abs_error	0.000000	0.000000	1.000000	0.000000
	abs_rel_error	0.000000	0.000000	0.125000	0.000000
n_turns	error	1.000000	1.000000	0.000000	0.000000
	rel_error	NaN	0.500000	NaN	0.000000
	abs_error	1.000000	1.000000	0.000000	0.000000
	abs_rel_error	NaN	0.500000	NaN	0.000000
stride_duration_s	error	0.798057	0.911949	0.960170	0.621676
	rel_error	0.502951	0.356102	0.408071	0.309936
	abs_error	0.798057	0.911949	0.960170	0.621676
	abs_rel_error	0.502951	0.356102	0.408071	0.309936
stride_length_m	error	0.388493	0.909145	0.378533	0.891311
	rel_error	0.154234	0.546673	0.193160	0.368642
	abs_error	0.388493	0.909145	0.378533	0.891311
	abs_rel_error	0.154234	0.546673	0.193160	0.368642
walking_speed_mps	error	0.900869	0.655491	0.215855	0.687740
	rel_error	0.776083	0.409569	0.092550	0.328217
	abs_error	0.900869	0.655491	0.215855	0.687740
	abs_rel_error	0.776083	0.409569	0.092550	0.328217

Before we now aggregate the results, we can also combine the error metrics with the reference and detected values to have all the information in one dataframe.

wb_matches_with_errors = pd.concat([wb_matches, wb_errors], axis=1)
wb_matches_with_errors.T

	visit_type	T1
	participant_id	12345
	measurement_date	2023-01-01
	match_id	0	2	4	6
cadence_spm	detected	100.232432	87.484329	93.988941	87.428880
cadence_spm	reference	99.821880	86.535270	93.698950	87.093120
duration_s	detected	5.130315	9.140576	6.217228	14.034649
duration_s	reference	4.747020	8.527270	6.099070	13.520800
end	detected	5.000000	25.000000	45.000000	65.000000
end	reference	4.000000	24.000000	44.000000	64.000000
n_steps	detected	8.000000	12.000000	9.000000	17.000000
n_steps	reference	8.000000	12.000000	8.000000	17.000000
n_turns	detected	1.000000	3.000000	0.000000	1.000000
n_turns	reference	0.000000	2.000000	0.000000	1.000000
start	detected	0.000000	20.000000	40.000000	60.000000
start	reference	0.000000	20.000000	40.000000	60.000000
stride_duration_s	detected	2.384807	3.472869	3.313120	2.627496
stride_duration_s	reference	1.586750	2.560920	2.352950	2.005820
stride_length_m	detected	2.907343	2.572195	2.338223	3.309131
stride_length_m	reference	2.518850	1.663050	1.959690	2.417820
walking_speed_mps	detected	2.061659	2.255931	2.548155	2.783120
walking_speed_mps	reference	1.160790	1.600440	2.332300	2.095380
wb_id	detected	0.000000	2.000000	4.000000	6.000000
wb_id	reference	0.000000	2.000000	4.000000	6.000000
cadence_spm	error	0.410552	0.949059	0.289991	0.335760
	rel_error	0.004113	0.010967	0.003095	0.003855
	abs_error	0.410552	0.949059	0.289991	0.335760
	abs_rel_error	0.004113	0.010967	0.003095	0.003855
duration_s	error	0.383295	0.613306	0.118158	0.513849
	rel_error	0.080744	0.071923	0.019373	0.038004
	abs_error	0.383295	0.613306	0.118158	0.513849
	abs_rel_error	0.080744	0.071923	0.019373	0.038004
n_steps	error	0.000000	0.000000	1.000000	0.000000
	rel_error	0.000000	0.000000	0.125000	0.000000
	abs_error	0.000000	0.000000	1.000000	0.000000
	abs_rel_error	0.000000	0.000000	0.125000	0.000000
n_turns	error	1.000000	1.000000	0.000000	0.000000
	rel_error	NaN	0.500000	NaN	0.000000
	abs_error	1.000000	1.000000	0.000000	0.000000
	abs_rel_error	NaN	0.500000	NaN	0.000000
stride_duration_s	error	0.798057	0.911949	0.960170	0.621676
	rel_error	0.502951	0.356102	0.408071	0.309936
	abs_error	0.798057	0.911949	0.960170	0.621676
	abs_rel_error	0.502951	0.356102	0.408071	0.309936
stride_length_m	error	0.388493	0.909145	0.378533	0.891311
	rel_error	0.154234	0.546673	0.193160	0.368642
	abs_error	0.388493	0.909145	0.378533	0.891311
	abs_rel_error	0.154234	0.546673	0.193160	0.368642
walking_speed_mps	error	0.900869	0.655491	0.215855	0.687740
	rel_error	0.776083	0.409569	0.092550	0.328217
	abs_error	0.900869	0.655491	0.215855	0.687740
	abs_rel_error	0.776083	0.409569	0.092550	0.328217

Aggregate Results#

Finally, the estimated DMO measures and their errors can be aggregated over all WBs (approach 2) or all days (approach 1). For this purpose, different aggregation functions can be applied to the error metrics, ranging from simple, built-in aggregations like the mean or standard deviation to more complex functions like the limits of agreement or 5th and 95th percentiles. This can be done using the apply_aggregations function. It operates similarly to the apply_transformations function used above by taking the error metrics dataframe and a list of aggregations as input. In contrast to the transformations, an aggregation performed over a subset of dataframe columns is expected to return a single value or a tuple of values stored in one cell of the resulting dataframe. There are two ways to define aggregations:

As a tuple in the format (<identifier>, <aggregation>). In this case, the operation is performed based on exactly one column from the input df. Therefore, <identifier> can either be a string representing the name of the column to evaluate (for data with single-level columns), or a tuple of strings uniquely identifying the column to evaluate in case of multi-index columns. In our example, the identifier is a tuple (<metric>, <origin>), where <metric> is the metric column to evaluate, <origin> is the specific column from which data should be utilized (here, it would be either detected, reference, or one of the error columns).

<aggregation> is the function or the list of functions to apply. The output dataframe will have a multilevel column with metric as the first level and origin as the second level. A valid aggregations list for all of our DMOs would consequently look like this:

metrics = [
    "cadence_spm",
    "duration_s",
    "n_steps",
    "n_turns",
    "stride_duration_s",
    "stride_length_m",
    "walking_speed_mps",
]
aggregations_simple = [
    ((m, o), ["mean", "std"])
    for m in metrics
    for o in ["detected", "reference", "error"]
]
pprint(aggregations_simple)

[(('cadence_spm', 'detected'), ['mean', 'std']),
 (('cadence_spm', 'reference'), ['mean', 'std']),
 (('cadence_spm', 'error'), ['mean', 'std']),
 (('duration_s', 'detected'), ['mean', 'std']),
 (('duration_s', 'reference'), ['mean', 'std']),
 (('duration_s', 'error'), ['mean', 'std']),
 (('n_steps', 'detected'), ['mean', 'std']),
 (('n_steps', 'reference'), ['mean', 'std']),
 (('n_steps', 'error'), ['mean', 'std']),
 (('n_turns', 'detected'), ['mean', 'std']),
 (('n_turns', 'reference'), ['mean', 'std']),
 (('n_turns', 'error'), ['mean', 'std']),
 (('stride_duration_s', 'detected'), ['mean', 'std']),
 (('stride_duration_s', 'reference'), ['mean', 'std']),
 (('stride_duration_s', 'error'), ['mean', 'std']),
 (('stride_length_m', 'detected'), ['mean', 'std']),
 (('stride_length_m', 'reference'), ['mean', 'std']),
 (('stride_length_m', 'error'), ['mean', 'std']),
 (('walking_speed_mps', 'detected'), ['mean', 'std']),
 (('walking_speed_mps', 'reference'), ['mean', 'std']),
 (('walking_speed_mps', 'error'), ['mean', 'std'])]
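Besides built-in aggregations given as strings, an aggregation can also be a function that reduces one column to a single value or a tuple of values. As a hedged sketch, the hypothetical helper below computes Bland-Altman style limits of agreement for one error column. The factor of 1.96 and the use of the sample standard deviation are assumptions and may differ from the limits-of-agreement aggregation shipped with the package.

```python
from statistics import mean, stdev


def limits_of_agreement(errors, z=1.96):
    """Return (lower, upper) as mean error -/+ z times the sample standard deviation."""
    center = mean(errors)
    spread = z * stdev(errors)
    return (center - spread, center + spread)


# cadence_spm errors of the four matched WBs from the table above.
cadence_errors = [0.410552, 0.949059, 0.289991, 0.335760]

lower, upper = limits_of_agreement(cadence_errors)
```

The function returns a tuple, which would end up in a single cell of the aggregated result, in the same way as described for aggregations above.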

As a named tuple of type CustomOperation taking three values: identifier, function, and column_name. identifier is a valid loc identifier selecting one or more columns from the dataframe, function is the (custom) aggregation function or list of functions to apply, and column_name is the name of the resulting column in the output dataframe (single-level column if column_name is a string, multi-level column if column_name is a tuple). This allows for more complex aggregations that require multiple columns as input, for example, the intraclass correlation coefficient (ICC) for the DMOs (see below). A valid aggregation list for calculating the ICC of all DMOs would look like this:

from mobgap.pipeline.evaluation import CustomErrorAggregations as A
from mobgap.pipeline.evaluation import get_default_error_aggregations
from mobgap.utils.df_operations import CustomOperation

aggregations_custom = [
    CustomOperation(identifier=m, function=A.icc, column_name=(m, "all"))
    for m in metrics
]
pprint(aggregations_custom)

[CustomOperation(identifier='cadence_spm', function=<function icc at 0x7fdc82f9b7f0>, column_name=('cadence_spm', 'all')),
 CustomOperation(identifier='duration_s', function=<function icc at 0x7fdc82f9b7f0>, column_name=('duration_s', 'all')),
 CustomOperation(identifier='n_steps', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_steps', 'all')),
 CustomOperation(identifier='n_turns', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_turns', 'all')),
 CustomOperation(identifier='stride_duration_s', function=<function icc at 0x7fdc82f9b7f0>, column_name=('stride_duration_s', 'all')),
 CustomOperation(identifier='stride_length_m', function=<function icc at 0x7fdc82f9b7f0>, column_name=('stride_length_m', 'all')),
 CustomOperation(identifier='walking_speed_mps', function=<function icc at 0x7fdc82f9b7f0>, column_name=('walking_speed_mps', 'all'))]

In this case, the ICC function receives the entire “sub-dataframe” obtained by the selection wb_matches_with_errors.loc[:, m], shown below for stride_duration_s as an example, and can then perform any required calculation. The selection can be any valid loc selection, so you could even select values across multiple DMOs.

sub_df = wb_matches_with_errors.loc[:, "stride_duration_s"]
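To make the effect of such a selection concrete, here is a minimal sketch on a toy dataframe. The toy data and the names toy, toy_sub, and toy_single are made up for illustration and are not part of the example data; only the two-level column layout (metric, origin) mirrors wb_matches_with_errors.

```python
import pandas as pd

# Toy stand-in (hypothetical values) with two column levels: metric and origin.
columns = pd.MultiIndex.from_product(
    [["cadence_spm", "stride_duration_s"], ["detected", "error", "reference"]],
    names=["metric", "origin"],
)
toy = pd.DataFrame(
    [
        [92.0, 2.0, 90.0, 1.1, 0.1, 1.0],
        [88.0, -1.0, 89.0, 1.3, 0.1, 1.2],
    ],
    columns=columns,
)

# Selecting a first-level key drops that level and keeps the origin columns.
toy_sub = toy.loc[:, "stride_duration_s"]
print(list(toy_sub.columns))  # ['detected', 'error', 'reference']

# A full (metric, origin) tuple selects a single column as a Series.
toy_single = toy.loc[:, ("stride_duration_s", "error")]
```

A custom aggregation function therefore sees plain origin columns such as detected and reference and does not need to know which metric it is operating on.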

The ICC function just takes the detected and reference columns and calculates the ICC.

A.icc(sub_df)

(0.12564828430955782, array([-0.77,  0.9 ]))
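For intuition about what such a function does with the sub-dataframe, the following sketch computes a point estimate of the ICC from the detected and reference columns. It is not the implementation behind A.icc: which ICC variant the package uses is not stated here, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater) and omits the confidence interval. The name icc_2_1 and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def icc_2_1(sub_df: pd.DataFrame) -> float:
    """Point estimate of ICC(2,1); assumed variant, no confidence interval."""
    ratings = sub_df[["detected", "reference"]].to_numpy(dtype=float)
    n, k = ratings.shape  # n targets (WBs), k raters (systems)
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    # Mean squares of a two-way ANOVA without replication.
    ms_rows = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand_mean) ** 2) / (k - 1)
    residual = ratings - row_means[:, None] - col_means[None, :] + grand_mean
    ms_error = np.sum(residual**2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy data (hypothetical): the detected values carry a constant offset of 1.
toy_icc_df = pd.DataFrame(
    {"detected": [2.0, 3.0, 4.0, 5.0], "reference": [1.0, 2.0, 3.0, 4.0]}
)
toy_icc = icc_2_1(toy_icc_df)
```

Because this variant measures absolute agreement, a constant offset between the two systems lowers the value below 1 even though the columns are perfectly correlated.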

Within one aggregation list, both types of aggregations can be combined as long as the resulting output dataframes can be concatenated, i.e. have the same number of column levels. Then, the apply_aggregations function can be called. This returns a pandas Series with the aggregated values for each metric and origin. For better readability, we name and reorder the index levels, sort the result, and convert it to a dataframe.

from mobgap.utils.df_operations import apply_aggregations

aggregations = aggregations_simple + aggregations_custom
agg_results = (
    apply_aggregations(wb_matches_with_errors, aggregations)
    .rename_axis(index=["aggregation", "metric", "origin"])
    .reorder_levels(["metric", "origin", "aggregation"])
    .sort_index(level=0)
    .to_frame("values")
)
agg_results

			values
metric	origin	aggregation
cadence_spm	all	icc	(0.9958867337397626, [0.96, 1.0])
	detected	mean	92.283646
	detected	std	6.128986
	error	mean	0.496341
	error	std	0.305876
	reference	mean	91.787305
	reference	std	6.267057
duration_s	all	icc	(0.9935066257807105, [0.94, 1.0])
	detected	mean	8.630692
	detected	std	3.980795
	error	mean	0.407152
	error	std	0.214453
	reference	mean	8.22354
	reference	std	3.86233
n_steps	all	icc	(0.9927710843373494, [0.93, 1.0])
	detected	mean	11.5
	detected	std	4.041452
	error	mean	0.25
	error	std	0.5
	reference	mean	11.25
	reference	std	4.272002
n_turns	all	icc	(0.8064516129032259, [-0.03, 0.99])
	detected	mean	1.25
	detected	std	1.258306
	error	mean	0.5
	error	std	0.57735
	reference	mean	0.75
	reference	std	0.957427
stride_duration_s	all	icc	(0.12564828430955782, [-0.77, 0.9])
	detected	mean	2.949573
	detected	std	0.525579
	error	mean	0.822963
	error	std	0.150423
	reference	mean	2.12661
	reference	std	0.426573
stride_length_m	all	icc	(0.1021600349369979, [-0.78, 0.9])
	detected	mean	2.781723
	detected	std	0.422111
	error	mean	0.64187
	error	std	0.298442
	reference	mean	2.139852
	reference	std	0.400293
walking_speed_mps	all	icc	(0.20380942118620765, [-0.74, 0.92])
	detected	mean	2.412216
	detected	std	0.317996
	error	mean	0.614989
	error	std	0.2875
	reference	mean	1.797227
	reference	std	0.522486
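The formatting chain used above can be tried in isolation. The sketch below applies the same pandas calls to a small hand-made Series; the values and the names toy_series and toy_frame are made up, and the level order (aggregation, metric, origin) is assumed from the rename_axis call above.

```python
import pandas as pd

# Hand-made Series (hypothetical values) with an unnamed three-level index
# in the order aggregation, metric, origin.
toy_index = pd.MultiIndex.from_tuples(
    [
        ("mean", "n_steps", "detected"),
        ("std", "n_steps", "detected"),
        ("mean", "cadence_spm", "reference"),
        ("icc", "cadence_spm", "all"),
    ]
)
toy_series = pd.Series([11.5, 4.04, 91.79, 0.99], index=toy_index)

toy_frame = (
    toy_series.rename_axis(index=["aggregation", "metric", "origin"])  # name levels
    .reorder_levels(["metric", "origin", "aggregation"])  # metric first
    .sort_index(level=0)  # group rows by metric
    .to_frame("values")  # one-column dataframe
)
print(toy_frame)
```

Putting the metric level first is what groups all aggregations of one DMO together in the printed table.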

If you simply want to apply a standard set of aggregations to the error metrics, you can use the get_default_error_aggregations function, resulting in the following list:

aggregations_default = get_default_error_aggregations()
pprint(aggregations_default)

[(('cadence_spm', 'detected'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('cadence_spm', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('cadence_spm', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('cadence_spm', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('duration_s', 'detected'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('duration_s', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('duration_s', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('duration_s', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_steps', 'detected'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_steps', 'reference'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_steps', 'abs_error'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_steps', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_strides', 'detected'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_strides', 'reference'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_strides', 'abs_error'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_strides', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_turns', 'detected'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_turns', 'reference'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_turns', 'abs_error'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_turns', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_duration_s', 'detected'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_duration_s', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_duration_s', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_duration_s', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_length_m', 'detected'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_length_m', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_length_m', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_length_m', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('walking_speed_mps', 'detected'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('walking_speed_mps', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('walking_speed_mps', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('walking_speed_mps', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('cadence_spm', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('cadence_spm', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('duration_s', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('duration_s', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_steps', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_steps', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_strides', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_strides', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_turns', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_turns', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('stride_duration_s', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('stride_duration_s', 'rel_error'),
  ['mean', <function loa at 0x7fdc82f9b910>]),
 (('stride_length_m', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('stride_length_m', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('walking_speed_mps', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('walking_speed_mps', 'rel_error'),
  ['mean', <function loa at 0x7fdc82f9b910>]),
 CustomOperation(identifier='cadence_spm', function=<function icc at 0x7fdc82f9b7f0>, column_name=('cadence_spm', 'all')),
 CustomOperation(identifier='duration_s', function=<function icc at 0x7fdc82f9b7f0>, column_name=('duration_s', 'all')),
 CustomOperation(identifier='n_steps', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_steps', 'all')),
 CustomOperation(identifier='n_strides', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_strides', 'all')),
 CustomOperation(identifier='n_turns', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_turns', 'all')),
 CustomOperation(identifier='stride_duration_s', function=<function icc at 0x7fdc82f9b7f0>, column_name=('stride_duration_s', 'all')),
 CustomOperation(identifier='stride_length_m', function=<function icc at 0x7fdc82f9b7f0>, column_name=('stride_length_m', 'all')),
 CustomOperation(identifier='walking_speed_mps', function=<function icc at 0x7fdc82f9b7f0>, column_name=('walking_speed_mps', 'all')),
 CustomOperation(identifier=None, function=<function n_datapoints at 0x7fdc82f9b9a0>, column_name=('all', 'all'))]
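The default list refers to two helper functions, quantiles and loa (limits of agreement). As a rough illustration of what such aggregations compute, here is a sketch with stand-in functions. The names quantiles_sketch and loa_sketch, the 5 % / 95 % quantile levels, and the factor 1.96 are assumptions made for this sketch and may differ from the package defaults.

```python
import pandas as pd


def quantiles_sketch(values: pd.Series, lower: float = 0.05, upper: float = 0.95):
    """Return the (lower, upper) quantiles of the values (assumed levels)."""
    return tuple(values.quantile([lower, upper]).to_list())


def loa_sketch(values: pd.Series, agreement: float = 1.96):
    """Return the limits of agreement: mean -/+ agreement * sample std."""
    mean = values.mean()
    std = values.std()  # pandas uses the sample standard deviation (ddof=1)
    return (mean - agreement * std, mean + agreement * std)


# Toy error values (hypothetical).
toy_errors = pd.Series([0.2, 0.5, 0.4, 0.9])
toy_quantiles = quantiles_sketch(toy_errors)
toy_loa = loa_sketch(toy_errors)
```

Both functions reduce one column to a tuple, which is why the corresponding cells in the results table contain pairs of numbers rather than scalars.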

If you want to include further aggregations in addition to the default ones, you can append them to this list.

aggregations_default_extended = aggregations_default + [
    *(((m, o), ["std"]) for m in metrics for o in ["detected", "reference"])
]
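The unpacked generator expression may be hard to read at first. This small sketch, using a shortened hypothetical metric list, shows which entries it produces and that a plain list comprehension gives the same result.

```python
# Shortened, hypothetical metric list for illustration.
toy_metrics = ["cadence_spm", "n_steps"]

# Same construct as above: unpack a generator into a list literal.
extra = [
    *(((m, o), ["std"]) for m in toy_metrics for o in ["detected", "reference"])
]

# Equivalent list comprehension.
extra_comprehension = [
    ((m, o), ["std"]) for m in toy_metrics for o in ["detected", "reference"]
]

for entry in extra:
    print(entry)
```

Each entry is a (column identifier, list of aggregations) tuple, i.e. the first of the two aggregation formats described earlier.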

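This extended list of standard aggregations can then also be passed to the apply_aggregations function. Because the example data contains no n_strides columns, the aggregations for this metric are skipped with a warning.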

default_agg_results = (
    apply_aggregations(wb_matches_with_errors, aggregations_default_extended)
    .rename_axis(index=["aggregation", "metric", "origin"])
    .reorder_levels(["metric", "origin", "aggregation"])
    .sort_index(level=0)
    .to_frame("values")
)
default_agg_results

/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'detected')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'reference')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'abs_error')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'abs_rel_error')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'error')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'rel_error')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:566: UserWarning: One of the transformations requires the following columns: 'n_strides'. They are not found in the DataFrame.
  warnings.warn(str(e), UserWarning, stacklevel=1)

			values
metric	origin	aggregation
all	all	n_datapoints	4
cadence_spm	abs_error	mean	0.496341
	abs_error	quantiles	(0.29685672457991713, 0.8682829318918925)
	abs_rel_error	mean	0.005508
	abs_rel_error	quantiles	(0.003208965846203449, 0.00993913893132042)
...	...	...	...
walking_speed_mps	reference	mean	1.797227
		quantiles	(1.2267375, 2.2967619999999997)
		std	0.522486
	rel_error	loa	(-0.15414805045818636, 0.9573580216047631)
	rel_error	mean	0.401605

106 rows × 1 columns
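The warnings above are harmless, but if you prefer to avoid them, the aggregation list can be filtered before it is applied. The following is a sketch under stated assumptions: ToyOperation is a local stand-in with the same three fields as CustomOperation, and keep_available is a hypothetical helper that is not part of the package. With the real list, the isinstance check would need to target CustomOperation instead.

```python
from collections import namedtuple

import pandas as pd

# Local stand-in for CustomOperation (assumption: same three fields).
ToyOperation = namedtuple("ToyOperation", ["identifier", "function", "column_name"])


def keep_available(aggregations, columns: pd.MultiIndex):
    """Keep only aggregations whose input columns exist (hypothetical helper)."""
    available_metrics = set(columns.get_level_values(0))
    kept = []
    for agg in aggregations:
        # Check the named tuple first, as it is also a plain tuple.
        if isinstance(agg, ToyOperation):
            if agg.identifier is None or agg.identifier in available_metrics:
                kept.append(agg)
        elif agg[0] in columns:  # (identifier, functions) tuple
            kept.append(agg)
    return kept


toy_columns = pd.MultiIndex.from_tuples(
    [("n_steps", "detected"), ("n_steps", "error")]
)
toy_aggregations = [
    (("n_steps", "detected"), ["mean"]),
    (("n_strides", "detected"), ["mean"]),  # column missing, will be dropped
    ToyOperation("n_strides", len, ("n_strides", "all")),  # dropped as well
    ToyOperation("n_steps", len, ("n_steps", "all")),
    ToyOperation(None, len, ("all", "all")),
]
toy_kept = keep_available(toy_aggregations, toy_columns)
```

Filtering up front keeps the output identical to the unfiltered call, since aggregations on missing columns are skipped either way.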

Note

If you want to modify the default arguments of the aggregation functions, e.g. to change the calculated quantiles, you can either define custom aggregation functions or adapt the default functions as shown for the transformation functions above.
