Evaluation of final walking bout level DMOs

This example shows how to evaluate the performance of parameters on the walking bout (WB) level by comparing them against a reference. On this level, the WBs identified by the algorithm pipeline usually do not match the reference WBs exactly, which makes a direct comparison of the parameters within them difficult. In general, two approaches can be taken here [1]:

  1. First aggregate the WB-level parameters of both systems to a common level (e.g. per trial, per day, per hour, …) and then compare the aggregated values.

  2. Identify the subset of WBs that match between the two systems and compare the parameters only within these WBs.

In the following example we will show both approaches.

But first some general setup.

Loading some example data

We load some example DMO data and the corresponding reference data that are provided with the package. Usually, the “detected” data would be the output of your algorithm pipeline and the “reference” data would be the ground truth.

Note

This data is randomly generated and not physiologically meaningful. However, it has the same structure as any other typical input data for this evaluation.

from pprint import pprint

import numpy as np
import pandas as pd
from mobgap import PACKAGE_ROOT

DATA_PATH = PACKAGE_ROOT.parent / "example_data/dmo_data/dummy_dmo_data"

detected_dmo = pd.read_csv(DATA_PATH / "detected_dmo_data.csv").set_index(
    ["visit_type", "participant_id", "measurement_date", "wb_id"]
)

reference_dmo = pd.read_csv(DATA_PATH / "reference_dmo_data.csv").set_index(
    ["visit_type", "participant_id", "measurement_date", "wb_id"]
)

In both dataframes, each row represents one WB with all of its parameters. The index has four levels: visit type, participant ID, measurement date, and WB ID. The start and end indices of each WB, in samples relative to the start of the respective recording, are stored in the columns start and end. The first table below shows detected_dmo, the second reference_dmo.

                                                  start  end  duration_s  n_steps  cadence_spm  walking_speed_mps  stride_length_m  stride_duration_s  n_turns
visit_type participant_id measurement_date wb_id
T1         12345          2023-01-01       0          0    5    5.130315        8   100.232432           2.061659         2.907343           2.384807        1
                                           1         10   15    5.436672        7   101.677896           2.722036         2.469691           2.439419        1
                                           2         20   25    9.140576       12    87.484329           2.255931         2.572195           3.472869        3
                                           3         30   35   17.204985       28    92.096962           1.141349         1.595533           3.507587        1
                                           4         40   45    6.217228        9    93.988941           2.548155         2.338223           3.313120        0
                                           5         50   55    3.521295        8    99.425820           1.820821         2.882743           2.388743        0
                                           6         60   65   14.034649       17    87.428880           2.783120         3.309131           2.627496        1
                                           7         70   75    8.296356       12    86.372700           2.240491         2.721844           1.653604        1


                                                  start  end  duration_s  n_steps  cadence_spm  walking_speed_mps  stride_length_m  stride_duration_s  n_turns
visit_type participant_id measurement_date wb_id
T1         12345          2023-01-01       0          0    4     4.74702        8     99.82188            1.16079          2.51885            1.58675        0
                                           1         15   19     5.13150        7    101.16429            2.57881          1.57243            1.46537        0
                                           2         20   24     8.52727       12     86.53527            1.60044          1.66305            2.56092        2
                                           3         35   39    16.24554       27     91.49977            0.95558          0.88961            3.14549        1
                                           4         40   44     6.09907        8     93.69895            2.33230          1.95969            2.35295        0
                                           5         55   59     3.25806        7     99.33525            1.14732          2.37307            1.81262        0
                                           6         60   64    13.52080       17     87.09312            2.09538          2.41782            2.00582        1
                                           7         75   79     7.49830       12     85.96436            2.23757          1.89026            1.55788        0
                                           8         80   84     8.21455       10     75.12352            0.59915          2.16121            2.31160        0
                                           9         95   99     6.84377        9     76.61402            2.22903          1.03362            3.17821        0


Approach 1: Aggregate then compare

First, we combine the detected and reference data, which is straightforward because both dataframes have the same index levels. To keep track of where the values come from, we add a column level labelling them as "detected" or "reference". We then reorder the column levels so that the DMO names form the first level of the column index.

combined_dmos = (
    pd.concat(
        [detected_dmo, reference_dmo], keys=["detected", "reference"], axis=1
    )
    .reorder_levels((1, 0), axis=1)
    .sort_index(axis=1)
)
combined_dmos
cadence_spm duration_s end n_steps n_turns start stride_duration_s stride_length_m walking_speed_mps
detected reference detected reference detected reference detected reference detected reference detected reference detected reference detected reference detected reference
visit_type participant_id measurement_date wb_id
T1 12345 2023-01-01 0 100.232432 99.82188 5.130315 4.74702 5.0 4 8.0 8 1.0 0 0.0 0 2.384807 1.58675 2.907343 2.51885 2.061659 1.16079
1 101.677896 101.16429 5.436672 5.13150 15.0 19 7.0 7 1.0 0 10.0 15 2.439419 1.46537 2.469691 1.57243 2.722036 2.57881
2 87.484329 86.53527 9.140576 8.52727 25.0 24 12.0 12 3.0 2 20.0 20 3.472869 2.56092 2.572195 1.66305 2.255931 1.60044
3 92.096962 91.49977 17.204985 16.24554 35.0 39 28.0 27 1.0 1 30.0 35 3.507587 3.14549 1.595533 0.88961 1.141349 0.95558
4 93.988941 93.69895 6.217228 6.09907 45.0 44 9.0 8 0.0 0 40.0 40 3.313120 2.35295 2.338223 1.95969 2.548155 2.33230
5 99.425820 99.33525 3.521295 3.25806 55.0 59 8.0 7 0.0 0 50.0 55 2.388743 1.81262 2.882743 2.37307 1.820821 1.14732
6 87.428880 87.09312 14.034649 13.52080 65.0 64 17.0 17 1.0 1 60.0 60 2.627496 2.00582 3.309131 2.41782 2.783120 2.09538
7 86.372700 85.96436 8.296356 7.49830 75.0 79 12.0 12 1.0 0 70.0 75 1.653604 1.55788 2.721844 1.89026 2.240491 2.23757
8 NaN 75.12352 NaN 8.21455 NaN 84 NaN 10 NaN 0 NaN 80 NaN 2.31160 NaN 2.16121 NaN 0.59915
9 NaN 76.61402 NaN 6.84377 NaN 99 NaN 9 NaN 0 NaN 95 NaN 3.17821 NaN 1.03362 NaN 2.22903


This gives us a dataframe with the detected and reference values of all detected and reference WBs. Some entries are NaN, because the number of detected and reference WBs can differ. Individual rows of this dataframe should not be compared directly: a detected and a reference WB that share the same WB id do not necessarily describe the same walking bout (e.g., detected WB 1 spans samples 10 to 15, while reference WB 1 spans samples 15 to 19). Therefore, we first aggregate the DMO data over an index level of choice, e.g., per day, to obtain meaningful and interpretable results. This can, for instance, be done by grouping the data and averaging within each group. Other aggregation functions (e.g., moving averages or averages over several days) can be used as well. As long as the same aggregation is applied to both the detected and the reference data, the subsequent processing is the same as shown below.

Note

If data is missing, applying dropna() to the aggregated dataframe removes all groups that contain missing values, e.g., groups where either the detected or the reference data is missing.
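To make the aggregate-then-compare idea concrete without pandas, here is a minimal plain-Python sketch. It assumes each WB is given as a (group key, value) pair, where the key stands in for (visit_type, participant_id, measurement_date); the function names are hypothetical and not part of mobgap, and the second measurement date is invented to show what happens to a group that exists in only one system. Keeping only groups present in both systems plays the role of dropna() here.

```python
from collections import defaultdict
from statistics import mean


def group_means(records):
    """Average one DMO per group key; records are (key, value) pairs."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


def aggregate_then_pair(detected, reference):
    """Pair the per-group means of both systems.

    Groups missing from either system are dropped, similar to calling
    dropna() on the aggregated dataframe.
    """
    det_means = group_means(detected)
    ref_means = group_means(reference)
    common = sorted(det_means.keys() & ref_means.keys())
    return {key: (det_means[key], ref_means[key]) for key in common}


day = ("T1", 12345, "2023-01-01")
det_cadence = [100.232432, 101.677896, 87.484329, 92.096962,
               93.988941, 99.425820, 87.428880, 86.372700]
ref_cadence = [99.82188, 101.16429, 86.53527, 91.49977, 93.69895,
               99.33525, 87.09312, 85.96436, 75.12352, 76.61402]
detected = [(day, v) for v in det_cadence]
# Hypothetical extra day that only exists in the reference data.
reference = [(day, v) for v in ref_cadence] + [(("T1", 12345, "2023-01-02"), 90.0)]

paired = aggregate_then_pair(detected, reference)
print(paired)
```

The per-day cadence means it produces for the example day agree with the daily_matches values shown below (about 93.5885 detected and 89.6850 reference), and the reference-only day is discarded.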

daily_matches = (
    combined_dmos.groupby(
        level=["visit_type", "participant_id", "measurement_date"]
    )
    .mean()
    .dropna()
)
daily_matches.T
visit_type T1
participant_id 12345
measurement_date 2023-01-01
cadence_spm detected 93.588495
reference 89.685043
duration_s detected 8.622759
reference 8.008588
end detected 40.000000
reference 51.500000
n_steps detected 12.625000
reference 11.700000
n_turns detected 1.000000
reference 0.400000
start detected 35.000000
reference 47.500000
stride_duration_s detected 2.723456
reference 2.197761
stride_length_m detected 2.599588
reference 1.847961
walking_speed_mps detected 2.196695
reference 1.693637


The resulting dataframe contains the average detected and reference values of each DMO per visit type, participant, and day. These are conveniently provided as MultiIndex columns, so selecting a single DMO yields a DataFrame with the detected and reference values.

daily_matches["cadence_spm"]
                                             detected  reference
visit_type participant_id measurement_date
T1         12345          2023-01-01        93.588495  89.685043


In our example data, we only have data from a single day, so the aggregated result has only one row. Normally, you would have multiple rows, one for each group of WBs. From here on, the error metrics are calculated in the same way as for approach 2, which is shown below.

But let’s first show how to calculate the error metrics on a WB-by-WB basis.

Approach 2: Match then compare

As a first step, we need to identify the WBs that match between the detected and the reference data. Since it is unlikely that the WBs are exactly identical, we define a threshold for the overlap between two WBs above which they are considered a match. This matching can be done with the categorize_intervals function. It classifies every WB in the data as either true positive (TP), false positive (FP), or false negative (FN). If the data contained WBs from only a single recording, we could pass the detected and reference data directly to the function.
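The overlap-based matching can be sketched in plain Python. This is not mobgap's implementation: we assume WBs are half-open (start, end) sample intervals and that "overlap" means intersection over union (IoU); the library's exact overlap definition may differ, and categorize is a hypothetical name. With a threshold above 0.5 and non-overlapping WBs within each system, a detected WB can exceed the threshold with at most one reference WB, so a simple best-match search yields a one-to-one matching.

```python
def interval_iou(a, b):
    """Intersection over union of two half-open (start, end) intervals."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def categorize(detected, reference, overlap_threshold=0.8):
    """Label WBs as tp/fp/fn; inputs map WB id -> (start, end).

    Returns a list of (detected_id, reference_id, match_type) tuples.
    """
    matches, matched_refs = [], set()
    for det_id, det in detected.items():
        # Find the reference WB with the largest overlap.
        best_id, best_iou = None, 0.0
        for ref_id, ref in reference.items():
            iou = interval_iou(det, ref)
            if iou > best_iou:
                best_id, best_iou = ref_id, iou
        if (
            best_id is not None
            and best_iou >= overlap_threshold
            and best_id not in matched_refs
        ):
            matches.append((det_id, best_id, "tp"))
            matched_refs.add(best_id)
        else:
            matches.append((det_id, None, "fp"))
    # Reference WBs without a partner are missed detections.
    matches.extend(
        (None, ref_id, "fn") for ref_id in reference if ref_id not in matched_refs
    )
    return matches


# start/end values of the example WBs
detected = {0: (0, 5), 1: (10, 15), 2: (20, 25), 3: (30, 35),
            4: (40, 45), 5: (50, 55), 6: (60, 65), 7: (70, 75)}
reference = {0: (0, 4), 1: (15, 19), 2: (20, 24), 3: (35, 39), 4: (40, 44),
             5: (55, 59), 6: (60, 64), 7: (75, 79), 8: (80, 84), 9: (95, 99)}
for match in categorize(detected, reference):
    print(match)
```

With the example start/end values and a threshold of 0.8, this sketch yields the same 4 TPs, 4 FPs, and 6 FNs, in the same order, as the categorize_intervals output shown further down.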

However, in most cases the data contains WBs from multiple recordings, trials, and participants. Our example only has WBs from a single recording, but we still show the approach for the general case.

To avoid matching WBs from different recordings (the matching is based only on the start and end indices), we first group the data by the relevant index levels and apply the matching function to each group. This can be done using the create_multi_groupby helper function.

from mobgap.utils.df_operations import create_multi_groupby

per_trial_participant_day_grouper = create_multi_groupby(
    detected_dmo,
    reference_dmo,
    groupby=["visit_type", "participant_id", "measurement_date"],
)

This provides us with a groupby-object that is similar to the normal pandas groupby-object that can be created from a single dataframe. The MultiGroupBy object allows us to apply a function to each group across all dataframes.
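Conceptually, such a multi-groupby splits every input table by the same keys and hands the matching pieces to one function. The plain-Python sketch below illustrates this idea; multi_group_apply and recording are hypothetical helpers, not mobgap's MultiGroupBy, the second measurement date is invented, and passing an empty list for a group that is missing from one table is our own choice rather than documented library behavior.

```python
from collections import defaultdict


def multi_group_apply(func, *tables, key):
    """Group several lists of records by key(record) and call func once per
    group with the corresponding sub-list of every table."""
    grouped = []
    for table in tables:
        groups = defaultdict(list)
        for record in table:
            groups[key(record)].append(record)
        grouped.append(groups)
    all_keys = sorted(set().union(*(g.keys() for g in grouped)))
    # g.get() does not insert into the defaultdict, so missing groups stay missing.
    return {k: func(*(g.get(k, []) for g in grouped)) for k in all_keys}


def recording(record):
    """Group by the fields that identify one recording."""
    return (record["visit_type"], record["participant_id"], record["measurement_date"])


detected = [
    {"visit_type": "T1", "participant_id": 12345,
     "measurement_date": "2023-01-01", "start": 0, "end": 5},
    {"visit_type": "T1", "participant_id": 12345,
     "measurement_date": "2023-01-02", "start": 0, "end": 5},
]
reference = [
    {"visit_type": "T1", "participant_id": 12345,
     "measurement_date": "2023-01-01", "start": 0, "end": 4},
]
# Count the WBs per recording in both tables.
counts = multi_group_apply(
    lambda det, ref: (len(det), len(ref)), detected, reference, key=recording
)
print(counts)
```

Because the function only ever sees WBs of one recording at a time, WBs with identical sample indices from different recordings can never be matched with each other.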

Here we apply categorize_intervals with an overlap threshold of 0.8 to each group. The overlap_threshold parameter defines the minimum overlap a detected and a reference WB must have to be considered a match. It can be chosen according to your needs: a value closer to 0.5 yields more matches than a value closer to 1.

from mobgap.pipeline.evaluation import categorize_intervals

wb_tp_fp_fn = per_trial_participant_day_grouper.apply(
    lambda det, ref: categorize_intervals(
        gsd_list_detected=det,
        gsd_list_reference=ref,
        overlap_threshold=0.8,
        multiindex_warning=False,
    )
)
wb_tp_fp_fn
gs_id_detected gs_id_reference match_type
visit_type participant_id measurement_date match_id
T1 12345 2023-01-01 0 (T1, 12345, 2023-01-01, 0) (T1, 12345, 2023-01-01, 0) tp
1 (T1, 12345, 2023-01-01, 1) NaN fp
2 (T1, 12345, 2023-01-01, 2) (T1, 12345, 2023-01-01, 2) tp
3 (T1, 12345, 2023-01-01, 3) NaN fp
4 (T1, 12345, 2023-01-01, 4) (T1, 12345, 2023-01-01, 4) tp
5 (T1, 12345, 2023-01-01, 5) NaN fp
6 (T1, 12345, 2023-01-01, 6) (T1, 12345, 2023-01-01, 6) tp
7 (T1, 12345, 2023-01-01, 7) NaN fp
8 NaN (T1, 12345, 2023-01-01, 1) fn
9 NaN (T1, 12345, 2023-01-01, 3) fn
10 NaN (T1, 12345, 2023-01-01, 5) fn
11 NaN (T1, 12345, 2023-01-01, 7) fn
12 NaN (T1, 12345, 2023-01-01, 8) fn
13 NaN (T1, 12345, 2023-01-01, 9) fn


We can see that the function returns a dataframe indexed by the grouping levels and a running match_id, in which each WB is classified as TP, FP, or FN. For TPs, both the detected WB and the corresponding reference WB are given; FPs only have a detected WB and FNs only a reference WB. For the comparison we want to perform here, only the matching WBs, i.e., the TPs, are of interest. If you are interested in the FPs or FNs, have a look at the general GSD evaluation example.

Based on the positive matches, we can now extract the DMO data from detected and reference data that is to be compared. To make extracting all the TP WBs a little easier, we can use the get_matching_intervals function.
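What this extraction step does can be sketched in plain Python: keep only the TP entries of the match list and look up the DMO values of both partners. The data layout (WB id mapped to a dict of DMOs, and (detected_id, reference_id, match_type) tuples numbered by their position) and the name tp_values are assumptions for illustration, not mobgap's internal format.

```python
def tp_values(detected, reference, matches, dmo):
    """Return {match_id: (detected_value, reference_value)} for TP matches.

    detected/reference map WB id -> {dmo_name: value}; matches is a list of
    (detected_id, reference_id, match_type) tuples, numbered by position.
    """
    return {
        match_id: (detected[det_id][dmo], reference[ref_id][dmo])
        for match_id, (det_id, ref_id, match_type) in enumerate(matches)
        if match_type == "tp"
    }


detected = {0: {"cadence_spm": 100.232432}, 1: {"cadence_spm": 101.677896},
            2: {"cadence_spm": 87.484329}}
reference = {0: {"cadence_spm": 99.82188}, 1: {"cadence_spm": 101.16429},
             2: {"cadence_spm": 86.53527}}
matches = [(0, 0, "tp"), (1, None, "fp"), (2, 2, "tp"), (None, 1, "fn")]
print(tp_values(detected, reference, matches, "cadence_spm"))
# {0: (100.232432, 99.82188), 2: (87.484329, 86.53527)}
```

Numbering matches by position explains why the match_id values of the TPs below are 0, 2, 4, 6: the FPs in between keep their own ids.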

from mobgap.pipeline.evaluation import get_matching_intervals

wb_matches = get_matching_intervals(
    metrics_detected=detected_dmo,
    metrics_reference=reference_dmo,
    matches=wb_tp_fp_fn,
)
wb_matches.T
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm detected 100.232432 87.484329 93.988941 87.428880
reference 99.821880 86.535270 93.698950 87.093120
duration_s detected 5.130315 9.140576 6.217228 14.034649
reference 4.747020 8.527270 6.099070 13.520800
end detected 5.000000 25.000000 45.000000 65.000000
reference 4.000000 24.000000 44.000000 64.000000
n_steps detected 8.000000 12.000000 9.000000 17.000000
reference 8.000000 12.000000 8.000000 17.000000
n_turns detected 1.000000 3.000000 0.000000 1.000000
reference 0.000000 2.000000 0.000000 1.000000
start detected 0.000000 20.000000 40.000000 60.000000
reference 0.000000 20.000000 40.000000 60.000000
stride_duration_s detected 2.384807 3.472869 3.313120 2.627496
reference 1.586750 2.560920 2.352950 2.005820
stride_length_m detected 2.907343 2.572195 2.338223 3.309131
reference 2.518850 1.663050 1.959690 2.417820
walking_speed_mps detected 2.061659 2.255931 2.548155 2.783120
reference 1.160790 1.600440 2.332300 2.095380
wb_id detected 0.000000 2.000000 4.000000 6.000000
reference 0.000000 2.000000 4.000000 6.000000


The returned dataframe contains the detected and reference values of all DMOs of the matched WBs. These are conveniently provided as MultiIndex columns, so selecting a single DMO yields a DataFrame with the detected and reference values.

wb_matches["cadence_spm"]
                                                       detected  reference
visit_type participant_id measurement_date match_id
T1         12345          2023-01-01       0         100.232432   99.82188
                                           2          87.484329   86.53527
                                           4          93.988941   93.69895
                                           6          87.428880   87.09312


From here on, the aggregated DMOs (approach 1) and the matched WBs (approach 2) can be compared with the same methods to calculate error metrics. For simplicity, we show the calculation of the error metrics for the matched WBs in wb_matches (approach 2). However, the input can simply be replaced by the aggregated DMO dataframe daily_matches (approach 1).

Estimate Errors in DMO data

The DMO data can now be compared day by day (approach 1) or WB by WB (approach 2). We want to calculate general error metrics, such as the error, absolute error, relative error, and absolute relative error, for each day (or WB) and each DMO. This can be done using the generic apply_transformations helper, which applies a list of transformation functions to the data. A transformation function takes a subset of the dataframe and returns a Series of the same length, i.e., one value per WB (or day). The helper further allows us to declaratively define which transformation (i.e., which error) should be applied to which columns (i.e., which DMOs).

An error definition is a list of transformations that are applied to the matched DMO data. As described above, a transformation is a function that takes a subset of the input dataframe, performs some operation on it, and returns a Series with the same length as the input. Calculating the difference between two sets of values, e.g., between the detected and reference values, is a typical transformation for evaluating the DMO estimation. For this purpose, the transformations are defined as a list of tuples, each containing the DMO of interest as the first element and a list of error functions to apply to its detected and reference values as the second element. This also lets you define custom error functions and pass them as transformations. Note that, by default, the columns holding the detected and reference values are expected to be named detected and reference. For the standard error metrics (error, absolute error, relative error, and absolute relative error), get_default_error_transformations returns the corresponding transformations.
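For intuition, the four standard metrics can be written as scalar functions. This is a sketch that is consistent with the numbers in the tables of this example (error is detected minus reference, and the relative error is taken with respect to the reference). mobgap's own functions operate on whole DataFrames with detected and reference columns and handle zero division via zero_division_hint; returning NaN for a zero reference only mimics that behavior.

```python
import math


def error(detected, reference):
    return detected - reference


def abs_error(detected, reference):
    return abs(detected - reference)


def rel_error(detected, reference):
    # Error relative to the reference; NaN if the reference is zero
    # (mimicking the NaN results for n_turns in this example).
    if reference == 0:
        return math.nan
    return (detected - reference) / reference


def abs_rel_error(detected, reference):
    return abs(rel_error(detected, reference))  # abs(nan) stays nan


# Cadence of the first matched WB, and n_turns of two matched WBs.
print(round(error(100.232432, 99.82188), 6))      # 0.410552
print(round(rel_error(100.232432, 99.82188), 6))  # 0.004113
print(rel_error(3, 2), rel_error(1, 0))           # 0.5 nan
```

Since several n_turns reference values are zero, their relative errors cannot be computed, which is where the zero-division warnings in the outputs below come from.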

from mobgap.pipeline.evaluation import ErrorTransformFuncs as E

custom_errors = [
    ("cadence_spm", [E.abs_error, E.rel_error]),
    ("duration_s", [E.error]),
    ("n_turns", [E.rel_error]),
]

This definition should be relatively self-explanatory.

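We can now apply these transformations to the DMO data by calling apply_transformations(wb_matches, custom_errors) (imported from mobgap.utils.df_operations) and displaying the transposed result. Note that there is no need to group the dataframe again, as all transformations are applied row-wise to the entire dataframe. The warning below stems from the relative error of n_turns, whose reference value is zero for some WBs.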

/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:84: UserWarning: Zero division occurred in rel_error because divisor contains zeroes. Affected error metrics are set to NaN.
  _handle_zero_division(ref, zero_division_hint, "rel_error")
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm abs_error 0.410552 0.949059 0.289991 0.335760
rel_error 0.004113 0.010967 0.003095 0.003855
duration_s error 0.383295 0.613306 0.118158 0.513849
n_turns rel_error NaN 0.500000 NaN 0.000000
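The declarative (DMO, [functions]) list can be mimicked in a few lines of plain Python to show how the output columns get their names: each result is keyed by the DMO and the function's __name__. This is also why setting __name__ on a lambda, as in option 2 below, controls the column name. apply_spec and the row layout are hypothetical; mobgap's apply_transformations works on DataFrames instead of scalar pairs.

```python
import math


def abs_error(detected, reference):
    return abs(detected - reference)


def rel_error(detected, reference):
    return math.nan if reference == 0 else (detected - reference) / reference


def apply_spec(rows, spec):
    """rows: {row_id: {dmo: (detected, reference)}};
    spec: [(dmo, [func, ...]), ...].
    Returns {row_id: {(dmo, func.__name__): value}}."""
    return {
        row_id: {
            (dmo, func.__name__): func(*values[dmo])
            for dmo, funcs in spec
            for func in funcs
        }
        for row_id, values in rows.items()
    }


rows = {
    0: {"cadence_spm": (100.232432, 99.82188), "n_turns": (1, 0)},
    2: {"cadence_spm": (87.484329, 86.53527), "n_turns": (3, 2)},
}
spec = [("cadence_spm", [abs_error, rel_error]), ("n_turns", [rel_error])]
result = apply_spec(rows, spec)
print(sorted(result[2]))
# [('cadence_spm', 'abs_error'), ('cadence_spm', 'rel_error'), ('n_turns', 'rel_error')]
print(result[2][("n_turns", "rel_error")])  # 0.5
```

Only the DMOs listed in the spec appear in the result, matching the output above where only cadence_spm, duration_s, and n_turns are evaluated.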


We can also modify the error metrics or provide custom error functions. We will show three options here.

  1. Use a standard error metric, but change some of its input parameters and return the output under a new name. In this case, we simply define a new function that wraps the original one. For example, we might want to suppress the warning that is raised when a zero division occurs in the relative error. As we saw above, this warning is raised for the n_turns parameter.

def rel_error_without_warning(x):
    return E.rel_error(x, zero_division_hint=np.nan)

  2. If we want to keep the original function name, we could simply overwrite the original function. However, to avoid accidentally breaking other code that uses it, we can instead use a lambda function and set its name manually. This suppresses the warning as above, but keeps rel_error as the name of the resulting error column.

rel_error_as_lambda = lambda x: E.rel_error(x, zero_division_hint=np.nan)
rel_error_as_lambda.__name__ = "rel_error"

  3. We can also define a completely new error function. The dataframe it receives as input contains the columns detected and reference with the detected and reference values of the DMO of interest. For this example, we create a (deliberately meaningless) scaled_error function that scales the error by a factor of 2.

Note

If you want to introduce custom, more complex transformation functions, you can also define them as CustomOperation, as shown for aggregations in the “Aggregate Results” section.

def scaled_error(x):
    return 2 * (x["detected"] - x["reference"])

Our custom functions can now be used in the transformations list and freely combined with other error metrics.

Also keep in mind that the definition is “just” Python, so we can use constructs like generator expressions (unpacked with *) to generate the list of transformations, as shown below.

custom_errors = [
    ("cadence_spm", [E.error, scaled_error]),
    ("duration_s", [E.error]),
    ("n_turns", [rel_error_without_warning, rel_error_as_lambda]),
    *(
        (m, [E.abs_error, E.rel_error])
        for m in ["stride_duration_s", "stride_length_m"]
    ),
]

from mobgap.utils.df_operations import apply_transformations

custom_wb_errors = apply_transformations(wb_matches, custom_errors)
custom_wb_errors.T
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm error 0.410552 0.949059 0.289991 0.335760
scaled_error 0.821103 1.898118 0.579983 0.671520
duration_s error 0.383295 0.613306 0.118158 0.513849
n_turns rel_error_without_warning NaN 0.500000 NaN 0.000000
rel_error NaN 0.500000 NaN 0.000000
stride_duration_s abs_error 0.798057 0.911949 0.960170 0.621676
rel_error 0.502951 0.356102 0.408071 0.309936
stride_length_m abs_error 0.388493 0.909145 0.378533 0.891311
rel_error 0.154234 0.546673 0.193160 0.368642


As expected, the resulting dataframe contains the error metrics for the specified DMOs and could now be further processed, e.g., by aggregating the results.

As an alternative to a custom error definition, we provide a “default” error definition, returned by get_default_error_transformations(), that calculates the standard error metrics for the common DMOs. In most cases, this is a good starting point for the evaluation of the DMOs. Printing it with pprint gives:

[('cadence_spm',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('duration_s',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('n_steps',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('n_strides',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('n_turns',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('stride_duration_s',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('stride_length_m',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>]),
 ('walking_speed_mps',
  [<function error at 0x7fdc83ac5480>,
   <function rel_error at 0x7fdc82f9b5b0>,
   <function abs_error at 0x7fdc82f9b640>,
   <function abs_rel_error at 0x7fdc82f9b6d0>])]

While the visualization here is a little ugly, we can see that the default error transformation attempts to calculate the error, the relative error, the absolute error, and the absolute relative error for all the core DMOs.

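We can apply it as before, by passing get_default_error_transformations() together with wb_matches to apply_transformations. The first warning shows that n_strides is not contained in our data, so no errors are calculated for it; the remaining warnings again stem from the zero reference values of n_turns.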

/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:365: UserWarning: One of the transformations requires the following columns: 'n_strides'. They are not found in the DataFrame.
  warnings.warn(str(e), stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:84: UserWarning: Zero division occurred in rel_error because divisor contains zeroes. Affected error metrics are set to NaN.
  _handle_zero_division(ref, zero_division_hint, "rel_error")
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/pipeline/_error_metrics.py:146: UserWarning: Zero division occurred in abs_rel_error because divisor contains zeroes. Affected error metrics are set to NaN.
  _handle_zero_division(ref, zero_division_hint, "abs_rel_error")
visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm error 0.410552 0.949059 0.289991 0.335760
rel_error 0.004113 0.010967 0.003095 0.003855
abs_error 0.410552 0.949059 0.289991 0.335760
abs_rel_error 0.004113 0.010967 0.003095 0.003855
duration_s error 0.383295 0.613306 0.118158 0.513849
rel_error 0.080744 0.071923 0.019373 0.038004
abs_error 0.383295 0.613306 0.118158 0.513849
abs_rel_error 0.080744 0.071923 0.019373 0.038004
n_steps error 0.000000 0.000000 1.000000 0.000000
rel_error 0.000000 0.000000 0.125000 0.000000
abs_error 0.000000 0.000000 1.000000 0.000000
abs_rel_error 0.000000 0.000000 0.125000 0.000000
n_turns error 1.000000 1.000000 0.000000 0.000000
rel_error NaN 0.500000 NaN 0.000000
abs_error 1.000000 1.000000 0.000000 0.000000
abs_rel_error NaN 0.500000 NaN 0.000000
stride_duration_s error 0.798057 0.911949 0.960170 0.621676
rel_error 0.502951 0.356102 0.408071 0.309936
abs_error 0.798057 0.911949 0.960170 0.621676
abs_rel_error 0.502951 0.356102 0.408071 0.309936
stride_length_m error 0.388493 0.909145 0.378533 0.891311
rel_error 0.154234 0.546673 0.193160 0.368642
abs_error 0.388493 0.909145 0.378533 0.891311
abs_rel_error 0.154234 0.546673 0.193160 0.368642
walking_speed_mps error 0.900869 0.655491 0.215855 0.687740
rel_error 0.776083 0.409569 0.092550 0.328217
abs_error 0.900869 0.655491 0.215855 0.687740
abs_rel_error 0.776083 0.409569 0.092550 0.328217


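Before aggregating the results, we can also combine the error metrics with the detected and reference values, so that all information is contained in one dataframe. This combined dataframe, wb_matches_with_errors, is the input for the aggregations in the next section.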

visit_type T1
participant_id 12345
measurement_date 2023-01-01
match_id 0 2 4 6
cadence_spm detected 100.232432 87.484329 93.988941 87.428880
reference 99.821880 86.535270 93.698950 87.093120
duration_s detected 5.130315 9.140576 6.217228 14.034649
reference 4.747020 8.527270 6.099070 13.520800
end detected 5.000000 25.000000 45.000000 65.000000
reference 4.000000 24.000000 44.000000 64.000000
n_steps detected 8.000000 12.000000 9.000000 17.000000
reference 8.000000 12.000000 8.000000 17.000000
n_turns detected 1.000000 3.000000 0.000000 1.000000
reference 0.000000 2.000000 0.000000 1.000000
start detected 0.000000 20.000000 40.000000 60.000000
reference 0.000000 20.000000 40.000000 60.000000
stride_duration_s detected 2.384807 3.472869 3.313120 2.627496
reference 1.586750 2.560920 2.352950 2.005820
stride_length_m detected 2.907343 2.572195 2.338223 3.309131
reference 2.518850 1.663050 1.959690 2.417820
walking_speed_mps detected 2.061659 2.255931 2.548155 2.783120
reference 1.160790 1.600440 2.332300 2.095380
wb_id detected 0.000000 2.000000 4.000000 6.000000
reference 0.000000 2.000000 4.000000 6.000000
cadence_spm error 0.410552 0.949059 0.289991 0.335760
rel_error 0.004113 0.010967 0.003095 0.003855
abs_error 0.410552 0.949059 0.289991 0.335760
abs_rel_error 0.004113 0.010967 0.003095 0.003855
duration_s error 0.383295 0.613306 0.118158 0.513849
rel_error 0.080744 0.071923 0.019373 0.038004
abs_error 0.383295 0.613306 0.118158 0.513849
abs_rel_error 0.080744 0.071923 0.019373 0.038004
n_steps error 0.000000 0.000000 1.000000 0.000000
rel_error 0.000000 0.000000 0.125000 0.000000
abs_error 0.000000 0.000000 1.000000 0.000000
abs_rel_error 0.000000 0.000000 0.125000 0.000000
n_turns error 1.000000 1.000000 0.000000 0.000000
rel_error NaN 0.500000 NaN 0.000000
abs_error 1.000000 1.000000 0.000000 0.000000
abs_rel_error NaN 0.500000 NaN 0.000000
stride_duration_s error 0.798057 0.911949 0.960170 0.621676
rel_error 0.502951 0.356102 0.408071 0.309936
abs_error 0.798057 0.911949 0.960170 0.621676
abs_rel_error 0.502951 0.356102 0.408071 0.309936
stride_length_m error 0.388493 0.909145 0.378533 0.891311
rel_error 0.154234 0.546673 0.193160 0.368642
abs_error 0.388493 0.909145 0.378533 0.891311
abs_rel_error 0.154234 0.546673 0.193160 0.368642
walking_speed_mps error 0.900869 0.655491 0.215855 0.687740
rel_error 0.776083 0.409569 0.092550 0.328217
abs_error 0.900869 0.655491 0.215855 0.687740
abs_rel_error 0.776083 0.409569 0.092550 0.328217


Aggregate Results

Finally, the estimated DMO measures and their errors can be aggregated over all WBs (approach 2) or all days (approach 1). For this purpose, different aggregation functions can be applied to the error metrics, ranging from simple, built-in aggregations like the mean or standard deviation to more complex functions like the limits of agreement or 5th and 95th percentiles. This can be done using the apply_aggregations function. It operates similarly to the apply_transformations function used above by taking the error metrics dataframe and a list of aggregations as input. In contrast to the transformations, an aggregation performed over a subset of dataframe columns is expected to return a single value or a tuple of values stored in one cell of the resulting dataframe. There are two ways to define aggregations:

  1. As a tuple in the format (<identifier>, <aggregation>). In this case, the operation is performed based on exactly one column from the input df. Therefore, <identifier> can either be a string representing the name of the column to evaluate (for data with single-level columns), or a tuple of strings uniquely identifying the column to evaluate in case of multi-index columns. In our example, the identifier is a tuple (<metric>, <origin>), where <metric> is the metric column to evaluate, <origin> is the specific column from which data should be utilized (here, it would be either detected, reference, or one of the error columns).

    <aggregation> is the function or the list of functions to apply. The output dataframe will have a multilevel column with metric as the first level and origin as the second level. A valid aggregations list for all of our DMOs would consequently look like this:

metrics = [
    "cadence_spm",
    "duration_s",
    "n_steps",
    "n_turns",
    "stride_duration_s",
    "stride_length_m",
    "walking_speed_mps",
]
aggregations_simple = [
    ((m, o), ["mean", "std"])
    for m in metrics
    for o in ["detected", "reference", "error"]
]
pprint(aggregations_simple)
[(('cadence_spm', 'detected'), ['mean', 'std']),
 (('cadence_spm', 'reference'), ['mean', 'std']),
 (('cadence_spm', 'error'), ['mean', 'std']),
 (('duration_s', 'detected'), ['mean', 'std']),
 (('duration_s', 'reference'), ['mean', 'std']),
 (('duration_s', 'error'), ['mean', 'std']),
 (('n_steps', 'detected'), ['mean', 'std']),
 (('n_steps', 'reference'), ['mean', 'std']),
 (('n_steps', 'error'), ['mean', 'std']),
 (('n_turns', 'detected'), ['mean', 'std']),
 (('n_turns', 'reference'), ['mean', 'std']),
 (('n_turns', 'error'), ['mean', 'std']),
 (('stride_duration_s', 'detected'), ['mean', 'std']),
 (('stride_duration_s', 'reference'), ['mean', 'std']),
 (('stride_duration_s', 'error'), ['mean', 'std']),
 (('stride_length_m', 'detected'), ['mean', 'std']),
 (('stride_length_m', 'reference'), ['mean', 'std']),
 (('stride_length_m', 'error'), ['mean', 'std']),
 (('walking_speed_mps', 'detected'), ['mean', 'std']),
 (('walking_speed_mps', 'reference'), ['mean', 'std']),
 (('walking_speed_mps', 'error'), ['mean', 'std'])]
  2. As a named tuple of type CustomOperation with three fields: identifier, function, and column_name. identifier is a valid loc identifier selecting one or more columns from the dataframe, function is the (custom) aggregation function or list of functions to apply, and column_name is the name of the resulting column in the output dataframe (a single-level column if column_name is a string, a multi-level column if it is a tuple). This allows for more complex aggregations that require multiple columns as input, for example, the intraclass correlation coefficient (ICC) of the DMOs (see below). A valid aggregation list for calculating the ICC of all DMOs looks like this:

from mobgap.pipeline.evaluation import CustomErrorAggregations as A
from mobgap.pipeline.evaluation import get_default_error_aggregations
from mobgap.utils.df_operations import CustomOperation

aggregations_custom = [
    CustomOperation(identifier=m, function=A.icc, column_name=(m, "all"))
    for m in metrics
]
pprint(aggregations_custom)
[CustomOperation(identifier='cadence_spm', function=<function icc at 0x7fdc82f9b7f0>, column_name=('cadence_spm', 'all')),
 CustomOperation(identifier='duration_s', function=<function icc at 0x7fdc82f9b7f0>, column_name=('duration_s', 'all')),
 CustomOperation(identifier='n_steps', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_steps', 'all')),
 CustomOperation(identifier='n_turns', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_turns', 'all')),
 CustomOperation(identifier='stride_duration_s', function=<function icc at 0x7fdc82f9b7f0>, column_name=('stride_duration_s', 'all')),
 CustomOperation(identifier='stride_length_m', function=<function icc at 0x7fdc82f9b7f0>, column_name=('stride_length_m', 'all')),
 CustomOperation(identifier='walking_speed_mps', function=<function icc at 0x7fdc82f9b7f0>, column_name=('walking_speed_mps', 'all'))]

In this case, the ICC function receives the entire “sub-dataframe” obtained by the selection wb_matches_with_errors.loc[:, m] (shown below for stride_duration_s) and can perform any required calculation on it. The selection can be any valid loc selection, so you could even select values across multiple DMOs.

sub_df = wb_matches_with_errors.loc[:, "stride_duration_s"]

The ICC function simply takes the detected and reference columns of this sub-dataframe and calculates the ICC together with a confidence interval.

A.icc(sub_df)
(0.12564828430955782, array([-0.77,  0.9 ]))
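For intuition about what an ICC measures here, the following pure-Python sketch computes one common variant, ICC(2,1) (two-way random effects, absolute agreement, single measurement), from the detected and reference values of matched WBs. The choice of this variant is an assumption for illustration; mobgap's icc function may use a different variant and implementation, so this sketch is not expected to reproduce the numbers above. icc_2_1 is a hypothetical name.

```python
def icc_2_1(detected, reference):
    """Hypothetical sketch of ICC(2,1) for paired detected/reference values.

    Two-way random effects, absolute agreement, single measurement.
    Assumed variant for illustration; not necessarily what mobgap uses.
    """
    if len(detected) != len(reference) or len(detected) < 2:
        raise ValueError("need two equally long sequences with >= 2 values")
    rows = list(zip(detected, reference))  # one row per matched WB
    n, k = len(rows), 2  # n targets (WBs), k raters (the two systems)
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(col) / n for col in zip(*rows)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Identical values give perfect agreement ...
print(icc_2_1([1, 2, 3, 4], [1, 2, 3, 4]))
# ... while a constant offset lowers the absolute-agreement ICC.
print(icc_2_1([1, 2, 3, 4], [2, 3, 4, 5]))
```

Because ICC(2,1) measures absolute agreement, a constant offset between the two systems lowers the ICC even though both series are perfectly correlated.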

Within one aggregation list, both types of aggregations can be combined as long as the resulting output dataframes can be concatenated, i.e. have the same number of column levels. The combined list is then passed to the apply_aggregations function, which returns a pandas Series with the aggregated values for each metric and origin. For better readability, we name the index levels, reorder and sort them, and convert the Series into a single-column dataframe.

from mobgap.utils.df_operations import apply_aggregations

aggregations = aggregations_simple + aggregations_custom
agg_results = (
    apply_aggregations(wb_matches_with_errors, aggregations)
    .rename_axis(index=["aggregation", "metric", "origin"])
    .reorder_levels(["metric", "origin", "aggregation"])
    .sort_index(level=0)
    .to_frame("values")
)
agg_results
                                           values
metric             origin     aggregation
cadence_spm        all        icc          (0.9958867337397626, [0.96, 1.0])
                   detected   mean         92.283646
                              std          6.128986
                   error      mean         0.496341
                              std          0.305876
                   reference  mean         91.787305
                              std          6.267057
duration_s         all        icc          (0.9935066257807105, [0.94, 1.0])
                   detected   mean         8.630692
                              std          3.980795
                   error      mean         0.407152
                              std          0.214453
                   reference  mean         8.22354
                              std          3.86233
n_steps            all        icc          (0.9927710843373494, [0.93, 1.0])
                   detected   mean         11.5
                              std          4.041452
                   error      mean         0.25
                              std          0.5
                   reference  mean         11.25
                              std          4.272002
n_turns            all        icc          (0.8064516129032259, [-0.03, 0.99])
                   detected   mean         1.25
                              std          1.258306
                   error      mean         0.5
                              std          0.57735
                   reference  mean         0.75
                              std          0.957427
stride_duration_s  all        icc          (0.12564828430955782, [-0.77, 0.9])
                   detected   mean         2.949573
                              std          0.525579
                   error      mean         0.822963
                              std          0.150423
                   reference  mean         2.12661
                              std          0.426573
stride_length_m    all        icc          (0.1021600349369979, [-0.78, 0.9])
                   detected   mean         2.781723
                              std          0.422111
                   error      mean         0.64187
                              std          0.298442
                   reference  mean         2.139852
                              std          0.400293
walking_speed_mps  all        icc          (0.20380942118620765, [-0.74, 0.92])
                   detected   mean         2.412216
                              std          0.317996
                   error      mean         0.614989
                              std          0.2875
                   reference  mean         1.797227
                              std          0.522486
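The post-processing chain above is plain pandas. A small sketch on a hand-built Series shows what each step does; the Series is a toy stand-in for the output of apply_aggregations (only the ICC point estimate is used, not the full tuple), with a few values copied from the table above.

```python
import pandas as pd

# Toy stand-in for the Series returned by apply_aggregations, indexed by
# (aggregation, metric, origin); values copied from the table above.
raw = pd.Series(
    [92.283646, 6.128986, 91.787305, 0.9958867337397626],
    index=pd.MultiIndex.from_tuples(
        [
            ("mean", "cadence_spm", "detected"),
            ("std", "cadence_spm", "detected"),
            ("mean", "cadence_spm", "reference"),
            ("icc", "cadence_spm", "all"),
        ]
    ),
)

tidy = (
    raw.rename_axis(index=["aggregation", "metric", "origin"])  # name the levels
    .reorder_levels(["metric", "origin", "aggregation"])  # metric becomes level 0
    .sort_index(level=0)  # sort by metric; remaining levels are sorted too
    .to_frame("values")  # single-column DataFrame for display
)
print(tidy)
```

Sorting after reordering groups all rows of one metric together, which is what makes the table above readable per DMO.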


If you simply want to apply a standard set of aggregations to the error metrics, you can use the get_default_error_aggregations function, which returns the following list:

aggregations_default = get_default_error_aggregations()
pprint(aggregations_default)

[(('cadence_spm', 'detected'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('cadence_spm', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('cadence_spm', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('cadence_spm', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('duration_s', 'detected'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('duration_s', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('duration_s', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('duration_s', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_steps', 'detected'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_steps', 'reference'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_steps', 'abs_error'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_steps', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_strides', 'detected'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_strides', 'reference'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_strides', 'abs_error'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_strides', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_turns', 'detected'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_turns', 'reference'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_turns', 'abs_error'), ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('n_turns', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_duration_s', 'detected'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_duration_s', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_duration_s', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_duration_s', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_length_m', 'detected'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_length_m', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_length_m', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('stride_length_m', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('walking_speed_mps', 'detected'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('walking_speed_mps', 'reference'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('walking_speed_mps', 'abs_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('walking_speed_mps', 'abs_rel_error'),
  ['mean', <function quantiles at 0x7fdc82f9b880>]),
 (('cadence_spm', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('cadence_spm', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('duration_s', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('duration_s', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_steps', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_steps', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_strides', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_strides', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_turns', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('n_turns', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('stride_duration_s', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('stride_duration_s', 'rel_error'),
  ['mean', <function loa at 0x7fdc82f9b910>]),
 (('stride_length_m', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('stride_length_m', 'rel_error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('walking_speed_mps', 'error'), ['mean', <function loa at 0x7fdc82f9b910>]),
 (('walking_speed_mps', 'rel_error'),
  ['mean', <function loa at 0x7fdc82f9b910>]),
 CustomOperation(identifier='cadence_spm', function=<function icc at 0x7fdc82f9b7f0>, column_name=('cadence_spm', 'all')),
 CustomOperation(identifier='duration_s', function=<function icc at 0x7fdc82f9b7f0>, column_name=('duration_s', 'all')),
 CustomOperation(identifier='n_steps', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_steps', 'all')),
 CustomOperation(identifier='n_strides', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_strides', 'all')),
 CustomOperation(identifier='n_turns', function=<function icc at 0x7fdc82f9b7f0>, column_name=('n_turns', 'all')),
 CustomOperation(identifier='stride_duration_s', function=<function icc at 0x7fdc82f9b7f0>, column_name=('stride_duration_s', 'all')),
 CustomOperation(identifier='stride_length_m', function=<function icc at 0x7fdc82f9b7f0>, column_name=('stride_length_m', 'all')),
 CustomOperation(identifier='walking_speed_mps', function=<function icc at 0x7fdc82f9b7f0>, column_name=('walking_speed_mps', 'all')),
 CustomOperation(identifier=None, function=<function n_datapoints at 0x7fdc82f9b9a0>, column_name=('all', 'all'))]
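The loa entries in this list presumably refer to limits of agreement in the Bland-Altman sense. A minimal sketch, assuming the common definition mean ± 1.96 × SD of the errors with the sample standard deviation, could look like this; limits_of_agreement and its agreement parameter are hypothetical, and mobgap's own loa function may differ in its defaults.

```python
import statistics


def limits_of_agreement(errors, agreement=1.96):
    """Hypothetical sketch of Bland-Altman limits of agreement.

    Returns (mean - agreement * SD, mean + agreement * SD), where SD is the
    sample standard deviation (n - 1 in the denominator) of the errors.
    """
    mean = statistics.mean(errors)
    sd = statistics.stdev(errors)
    return mean - agreement * sd, mean + agreement * sd


print(limits_of_agreement([1.0, 2.0, 3.0]))
```

The interval is symmetric around the mean error, so it captures both the systematic bias (its centre) and the spread of individual differences (its width).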

If you want to include further aggregations in addition to the default ones, you can append them to this list. Here, we add the standard deviation of the detected and reference values of each metric:

aggregations_default_extended = aggregations_default + [
    *(((m, o), ["std"]) for m in metrics for o in ["detected", "reference"])
]
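Each appended entry uses the simple ((metric, origin), functions) form. As a rough mental model, assuming that the simple form behaves like a pandas agg call on the selected column (an assumption for illustration, not a description of mobgap's internals), an entry such as (("cadence_spm", "detected"), ["mean", "std"]) corresponds to the following; the toy values are made up.

```python
import pandas as pd

# Toy matched-WB table with two-level (metric, origin) columns.
toy = pd.DataFrame(
    {
        ("cadence_spm", "detected"): [90.0, 92.0, 94.0],
        ("cadence_spm", "reference"): [89.0, 92.0, 95.0],
    }
)

column, functions = ("cadence_spm", "detected"), ["mean", "std"]
# Select one column by its (metric, origin) tuple and apply every listed function.
summary = toy[column].agg(functions)
print(summary)
```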

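The extended list can then be passed to the apply_aggregations function in the same way. Note that the default list also contains aggregations for n_strides, which is not a column of our example data. For each of these entries, apply_aggregations emits a UserWarning and skips the entry, as shown in the output below; all other aggregations are computed as usual.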

default_agg_results = (
    apply_aggregations(wb_matches_with_errors, aggregations_default_extended)
    .rename_axis(index=["aggregation", "metric", "origin"])
    .reorder_levels(["metric", "origin", "aggregation"])
    .sort_index(level=0)
    .to_frame("values")
)
default_agg_results
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'detected')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'reference')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'abs_error')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'abs_rel_error')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'error')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:475: UserWarning: One of the transformations requires the following columns: '('n_strides', 'rel_error')'. They are not found in the DataFrame.
  warnings.warn(str(MissingDataColumnsError(key)), UserWarning, stacklevel=1)
/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/utils/df_operations.py:566: UserWarning: One of the transformations requires the following columns: 'n_strides'. They are not found in the DataFrame.
  warnings.warn(str(e), UserWarning, stacklevel=1)
                                                values
metric             origin         aggregation
all                all            n_datapoints  4
cadence_spm        abs_error      mean          0.496341
                                  quantiles     (0.29685672457991713, 0.8682829318918925)
                   abs_rel_error  mean          0.005508
                                  quantiles     (0.003208965846203449, 0.00993913893132042)
...                ...            ...           ...
walking_speed_mps  reference      mean          1.797227
                                  quantiles     (1.2267375, 2.2967619999999997)
                                  std           0.522486
                   rel_error      loa           (-0.15414805045818636, 0.9573580216047631)
                                  mean          0.401605

106 rows × 1 columns
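If you prefer to avoid the n_strides warnings, one option is to filter the aggregation list against the columns that actually exist before calling apply_aggregations. The pure-Python sketch below does this; drop_missing_columns and Op are hypothetical helpers, and the sketch only handles simple entries keyed by a (metric, origin) tuple and custom entries whose identifier is a metric name or None.

```python
from collections import namedtuple

# Hypothetical stand-in for CustomOperation, used only in this sketch.
Op = namedtuple("Op", ["identifier", "function", "column_name"])


def drop_missing_columns(aggregations, columns):
    """Keep only the aggregations whose input columns exist.

    columns: iterable of (metric, origin) tuples.
    Simple entries look like ((metric, origin), functions); custom entries
    have an ``identifier`` that is a metric name or None (whole dataframe).
    """
    columns = set(columns)
    metrics = {metric for metric, _ in columns}
    kept = []
    for agg in aggregations:
        if hasattr(agg, "identifier"):  # custom operation
            if agg.identifier is None or agg.identifier in metrics:
                kept.append(agg)
        elif agg[0] in columns:  # simple ((metric, origin), functions) entry
            kept.append(agg)
    return kept


cols = [("cadence_spm", "detected"), ("cadence_spm", "reference")]
aggs = [
    (("cadence_spm", "detected"), ["mean"]),
    (("n_strides", "detected"), ["mean"]),
    Op("cadence_spm", len, ("cadence_spm", "all")),  # len is a placeholder function
    Op("n_strides", len, ("n_strides", "all")),
    Op(None, len, ("all", "all")),
]
filtered = drop_missing_columns(aggs, cols)
print(len(filtered))
```

Applied to this example, the call would presumably be drop_missing_columns(aggregations_default_extended, wb_matches_with_errors.columns), assuming the columns of wb_matches_with_errors are (metric, origin) pairs.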



Note

If you want to modify the default arguments of the aggregation functions, e.g. to change the calculated quantiles, you can either define custom aggregation functions or adapt the default functions as shown for the transformation functions above.
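As an illustration of adapting a default function, the sketch below uses functools.partial to fix different arguments of a quantile function, e.g. to report the interquartile range instead of the default quantiles. The quantiles function here is a hypothetical pure-Python stand-in using linear interpolation (the pandas/NumPy default); its lower and upper parameters and their defaults are assumptions and may not match mobgap's own function.

```python
import math
from functools import partial


def quantiles(values, lower=0.05, upper=0.95):
    """Hypothetical stand-in: (lower, upper) quantiles via linear interpolation.

    The parameter names and default values are assumptions for this sketch.
    """
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")

    def _q(p):
        pos = p * (len(data) - 1)  # fractional position in the sorted data
        lo = math.floor(pos)
        hi = min(lo + 1, len(data) - 1)
        return data[lo] + (pos - lo) * (data[hi] - data[lo])

    return _q(lower), _q(upper)


# Same function with different fixed arguments: the interquartile range.
iqr_bounds = partial(quantiles, lower=0.25, upper=0.75)

print(quantiles([1, 2, 3, 4, 5]))
print(iqr_bounds([1, 2, 3, 4, 5]))
```

The partial object can then be placed in an aggregation list wherever the original function would go, without writing a new function from scratch.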
