Performance of the cadence algorithms on the TVS dataset#

The following provides an analysis and comparison of the cadence algorithms on the TVS dataset (lab and free-living). We look into the actual performance of the algorithms compared to the reference data. Note that, at the time of writing, a comparison with the original Matlab results is not possible, as these algorithms were not run on the same version of the TVS dataset.

Note

If you are interested in how these results are calculated, head over to the processing page.

Below is the list of algorithms that we will compare. Note that we use the prefix “MobGap” to refer to the reimplemented Python algorithms.

algorithms = {
    "HKLeeImproved": ("HKLeeImproved", "MobGap"),
    "ShinImproved": ("ShinImproved", "MobGap"),
    "matlab_HKLee_Imp2": ("HKLeeImproved", "Original Implementation"),
    "matlab_Shin_Imp": ("ShinImproved", "Original Implementation"),
}

The code below loads the data and prepares it for the analysis. By default, the data will be downloaded from an online repository (and cached locally). If you want to use a local copy of the data, set the MOBGAP_VALIDATION_DATA_PATH environment variable to the path of that copy and set MOBGAP_VALIDATION_USE_LOCAL_DATA to 1.

The file download will print some log output, which can usually be ignored. You can also change the version parameter to load a different version of the data.
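As a rough illustration of how the two environment variables interact, the sketch below mirrors the path-resolution logic of the loading code using only the standard library. The helper name `resolve_local_results_path` and the example path are hypothetical; the actual example reads the variables through mobgap's `get_env_var` rather than a plain mapping.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_local_results_path(env=None) -> Optional[Path]:
    """Return the local results folder, or None to fall back to the download.

    Hypothetical helper that mimics the logic used in this example.
    """
    env = os.environ if env is None else env
    # The flag defaults to 0, i.e. the data is downloaded and cached.
    use_local = int(env.get("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
    if not use_local:
        return None
    return Path(env["MOBGAP_VALIDATION_DATA_PATH"]) / "results"


# Illustrative values only; point the path to your own copy of the data.
example_env = {
    "MOBGAP_VALIDATION_USE_LOCAL_DATA": "1",
    "MOBGAP_VALIDATION_DATA_PATH": "/data/mobgap_validation",
}
print(resolve_local_results_path(example_env))
print(resolve_local_results_path({}))
```

With the flag unset (or set to 0) the helper returns `None`, which corresponds to the default download behaviour described above.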

from pathlib import Path

import pandas as pd
from mobgap.data.validation_results import ValidationResultLoader
from mobgap.utils.misc import get_env_var


def format_loaded_results(
    values: dict[tuple[str, str], pd.DataFrame],
    index_cols: list[str],
    convert_rel_error: bool = False,
) -> pd.DataFrame:
    formatted = (
        pd.concat(values, names=["algo", "version", *index_cols])
        .reset_index()
        .assign(
            algo_with_version=lambda df: (
                df["algo"] + " (" + df["version"] + ")"
            ),
            _combined="combined",
        )
    )
    if not convert_rel_error:
        return formatted
    rel_cols = [c for c in formatted.columns if "rel_error" in c]
    formatted[rel_cols] = formatted[rel_cols] * 100
    return formatted


local_data_path = (
    Path(get_env_var("MOBGAP_VALIDATION_DATA_PATH")) / "results"
    if int(get_env_var("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
    else None
)
__RESULT_VERSION = "v1.0.0"
loader = ValidationResultLoader(
    "cad", result_path=local_data_path, version=__RESULT_VERSION
)


free_living_index_cols = [
    "cohort",
    "participant_id",
    "time_measure",
    "recording",
    "recording_name",
    "recording_name_pretty",
]

free_living_results = format_loaded_results(
    {
        v: loader.load_single_results(k, "free_living")
        for k, v in algorithms.items()
    },
    free_living_index_cols,
    convert_rel_error=True,
)

lab_index_cols = [
    "cohort",
    "participant_id",
    "time_measure",
    "test",
    "trial",
    "test_name",
    "test_name_pretty",
]

lab_results = format_loaded_results(
    {
        v: loader.load_single_results(k, "laboratory")
        for k, v in algorithms.items()
    },
    lab_index_cols,
    convert_rel_error=True,
)

cohort_order = ["HA", "CHF", "COPD", "MS", "PD", "PFF"]

Performance metrics#

Below you can find the setup for all performance metrics that we will calculate. We only use the wb__ results for the comparison. These results are obtained in three steps: first, the average cadence per WB is calculated; second, the error metrics are calculated for each WB; third, the errors are averaged over all WBs of a participant to get the wb__ results.
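The three steps can be sketched with a small toy example. The numbers, the helper names, and the exact relative-error formula are assumptions for illustration (only the sign convention error = detected - reference can be read off the result tables); the real values are produced by the mobgap evaluation pipeline.

```python
from statistics import mean

# Hypothetical per-WB average cadence values (steps/min) for two participants.
wb_values = [
    {"participant_id": "P1", "detected": 90.0, "reference": 100.0},
    {"participant_id": "P1", "detected": 110.0, "reference": 100.0},
    {"participant_id": "P2", "detected": 84.0, "reference": 80.0},
]


def wb_errors(detected: float, reference: float) -> dict:
    """Error metrics for a single WB (assumed: error = detected - reference)."""
    error = detected - reference
    return {
        "error": error,
        "abs_error": abs(error),
        "rel_error": error / reference,
        "abs_rel_error": abs(error) / abs(reference),
    }


def aggregate_per_participant(rows: list) -> dict:
    """Average the per-WB error metrics over all WBs of each participant."""
    per_participant = {}
    for row in rows:
        errors = wb_errors(row["detected"], row["reference"])
        per_participant.setdefault(row["participant_id"], []).append(errors)
    return {
        pid: {f"wb__{name}": mean(e[name] for e in errs) for name in errs[0]}
        for pid, errs in per_participant.items()
    }


results = aggregate_per_participant(wb_values)
print(results["P1"])
print(results["P2"])
```

In this toy data the two WBs of P1 have errors of opposite sign, so the signed error averages out while the absolute error does not. This is why the tables report both kinds of metrics.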

from functools import partial

from mobgap.pipeline.evaluation import CustomErrorAggregations as A
from mobgap.utils.df_operations import (
    CustomOperation,
    apply_aggregations,
    apply_transformations,
    multilevel_groupby_apply_merge,
)
from mobgap.utils.tables import FormatTransformer as F
from mobgap.utils.tables import RevalidationInfo, revalidation_table_styles
from mobgap.utils.tables import StatsFunctions as S

custom_aggs = [
    CustomOperation(
        identifier=None,
        function=A.n_datapoints,
        column_name=[("n_datapoints", "all")],
    ),
    CustomOperation(
        identifier=None,
        function=lambda df_: df_["wb__detected"].isna().sum(),
        column_name=[("n_nan_detected", "all")],
    ),
    ("wb__detected", ["mean", A.conf_intervals]),
    ("wb__reference", ["mean", A.conf_intervals]),
    ("wb__error", ["mean", A.loa]),
    ("wb__abs_error", ["mean", A.conf_intervals]),
    ("wb__rel_error", ["mean", A.conf_intervals]),
    ("wb__abs_rel_error", ["mean", A.conf_intervals]),
    CustomOperation(
        identifier=None,
        function=partial(
            A.icc,
            reference_col_name="wb__reference",
            detected_col_name="wb__detected",
            icc_type="icc2",
            # For the lab data, some trials have no results for the Original algorithms.
            nan_policy="omit",
        ),
        column_name=[("icc", "wb_level"), ("icc_ci", "wb_level")],
    ),
]

stats_transform = [
    CustomOperation(
        identifier=None,
        function=partial(
            S.pairwise_tests,
            value_col=c,
            between="version",
            reference_group_key="Original Implementation",
        ),
        column_name=[("stats_metadata", c)],
    )
    for c in [
        "wb__abs_error",
        "wb__abs_rel_error",
    ]
]

format_transforms = [
    CustomOperation(
        identifier=None,
        function=lambda df_: df_[("n_datapoints", "all")].astype(int),
        column_name="n_datapoints",
    ),
    CustomOperation(
        identifier=None,
        function=lambda df_: df_[("n_nan_detected", "all")].astype(int),
        column_name="n_nan_detected",
    ),
    *(
        CustomOperation(
            identifier=None,
            function=partial(
                F.value_with_metadata,
                value_col=("mean", c),
                other_columns={
                    "range": ("conf_intervals", c),
                    "stats_metadata": ("stats_metadata", c),
                },
            ),
            column_name=c,
        )
        for c in [
            "wb__reference",
            "wb__detected",
            "wb__abs_error",
            "wb__rel_error",
            "wb__abs_rel_error",
        ]
    ),
    CustomOperation(
        identifier=None,
        function=partial(
            F.value_with_metadata,
            value_col=("mean", "wb__error"),
            other_columns={"range": ("loa", "wb__error")},
        ),
        column_name="wb__error",
    ),
    CustomOperation(
        identifier=None,
        function=partial(
            F.value_with_metadata,
            value_col=("icc", "wb_level"),
            other_columns={"range": ("icc_ci", "wb_level")},
        ),
        column_name="icc",
    ),
]


final_names = {
    "n_datapoints": "# participants",
    "wb__detected": "WD mean and CI [steps/min]",
    "wb__reference": "INDIP mean and CI [steps/min]",
    "wb__error": "Bias and LoA [steps/min]",
    "wb__abs_error": "Abs. Error [steps/min]",
    "wb__rel_error": "Rel. Error [%]",
    "wb__abs_rel_error": "Abs. Rel. Error [%]",
    "icc": "ICC",
    "n_nan_detected": "# Failed WBs",
}


validation_thresholds = {
    "Abs. Error [steps/min]": RevalidationInfo(
        threshold=None, higher_is_better=False
    ),
    "Abs. Rel. Error [%]": RevalidationInfo(
        threshold=20, higher_is_better=False
    ),
    "ICC": RevalidationInfo(threshold=0.7, higher_is_better=True),
    "# Failed WBs": RevalidationInfo(threshold=None, higher_is_better=False),
}


def format_tables(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.pipe(apply_transformations, format_transforms)
        .rename(columns=final_names)
        .loc[:, list(final_names.values())]
    )
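The setup above reports the mean of wb__error together with its limits of agreement as "Bias and LoA". The following minimal sketch assumes the classic Bland-Altman definition (bias plus or minus 1.96 times the sample standard deviation). Whether `A.loa` uses exactly this formula is not confirmed here, although the intervals in the result tables are symmetric around the bias, which is consistent with it. The function name and the numbers are hypothetical.

```python
from statistics import mean, stdev


def bias_and_loa(errors: list, z: float = 1.96) -> tuple:
    """Return (bias, (lower, upper)) following the classic Bland-Altman recipe.

    Assumption: the limits are bias +/- z * sample standard deviation.
    """
    bias = mean(errors)
    spread = z * stdev(errors)
    return bias, (bias - spread, bias + spread)


# Hypothetical per-participant errors in steps/min.
example_errors = [2.0, -1.0, 4.0, 0.5, 3.5, -2.0]
bias, (lower, upper) = bias_and_loa(example_errors)
print(f"{bias:.2f} [{lower:.2f}, {upper:.2f}]")
```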

Free-Living Comparison#

We focus on the free-living data for the comparison as this is the expected use case for the algorithms.

All results across all cohorts#

The results below represent the average performance across all participants independent of the cohort.

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
sns.boxplot(
    data=free_living_results, x="algo_with_version", y="wb__abs_error", ax=ax
)
plt.xticks(rotation=45, ha="right")
fig.tight_layout()
fig.show()

perf_metrics_all = free_living_results.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_tables)
perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
HKLeeImproved	MobGap	101	87.73 [86.49, 88.98]	85.55 [83.63, 87.46]	2.18 [-9.06, 13.42]	6.77 [5.91, 7.63]	4.35 [2.62, 6.08]	8.82 [7.33, 10.31]	0.74 [0.60, 0.83]	1
HKLeeImproved	Original Implementation	101	89.13 [87.90, 90.35]	85.55 [83.63, 87.46]	3.58 [-7.45, 14.61]	7.59 [6.78, 8.39]	6.06 [4.35, 7.77]	9.92 [8.46, 11.38]	0.70 [0.42, 0.83]	1
ShinImproved	MobGap	101	86.07 [84.80, 87.35]	85.55 [83.63, 87.46]	0.47 [-10.60, 11.53]	6.51 [5.67, 7.36]	2.17 [0.57, 3.77]	8.22 [6.89, 9.55]	0.76 [0.67, 0.84]	1
ShinImproved	Original Implementation	101	85.45 [84.17, 86.73]	85.55 [83.63, 87.46]	-0.10 [-11.46, 11.26]	6.57 [5.72, 7.43]	1.44 [-0.13, 3.02]	8.18 [6.91, 9.46]	0.76 [0.66, 0.83]	1

Per Cohort#

The results below represent the average performance across all participants within a cohort.

fig, ax = plt.subplots()
sns.boxplot(
    data=free_living_results,
    x="cohort",
    y="wb__abs_error",
    hue="algo_with_version",
    order=cohort_order,
    ax=ax,
)
fig.show()
perf_metrics_cohort = (
    free_living_results.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_tables)
    .loc[cohort_order]
)
perf_metrics_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
HA	HKLeeImproved	MobGap	20	89.31 [86.98, 91.64]	87.33 [83.98, 90.69]	1.98 [-4.44, 8.40]	6.25 [5.34, 7.17]	3.70 [1.63, 5.77]	7.86 [6.34, 9.38]	0.84 [0.57, 0.94]	0
	HKLeeImproved	Original Implementation	20	90.57 [88.36, 92.79]	87.33 [83.98, 90.69]	3.24 [-3.43, 9.91]	7.14 [6.26, 8.02]	5.22 [3.06, 7.38]	8.98 [7.42, 10.55]	0.77 [0.19, 0.92]	0
	ShinImproved	MobGap	20	87.76 [85.45, 90.07]	87.33 [83.98, 90.69]	0.30 [-5.84, 6.44]	5.87 [5.04, 6.71]	1.58 [-0.30, 3.46]	7.13 [5.85, 8.42]	0.89 [0.75, 0.96]	0
	ShinImproved	Original Implementation	20	87.24 [84.94, 89.55]	87.33 [83.98, 90.69]	-0.09 [-6.63, 6.44]	6.05 [5.18, 6.91]	1.11 [-0.86, 3.07]	7.36 [6.01, 8.71]	0.88 [0.71, 0.95]	0
CHF	HKLeeImproved	MobGap	10	91.11 [87.90, 94.32]	89.60 [85.27, 93.93]	1.51 [-5.21, 8.24]	5.75 [4.90, 6.60]	2.86 [0.36, 5.36]	6.88 [5.69, 8.06]	0.83 [0.48, 0.95]	0
	HKLeeImproved	Original Implementation	10	92.71 [89.50, 95.92]	89.60 [85.27, 93.93]	3.11 [-3.21, 9.44]	6.58 [5.58, 7.57]	4.77 [2.31, 7.22]	8.03 [6.57, 9.48]	0.77 [0.12, 0.94]	0
	ShinImproved	MobGap	10	89.39 [86.26, 92.53]	89.60 [85.27, 93.93]	-0.36 [-6.23, 5.50]	5.61 [4.69, 6.53]	0.53 [-1.55, 2.60]	6.43 [5.60, 7.26]	0.89 [0.61, 0.97]	0
	ShinImproved	Original Implementation	10	89.01 [85.77, 92.26]	89.60 [85.27, 93.93]	-0.58 [-6.25, 5.08]	5.45 [4.40, 6.51]	0.17 [-1.77, 2.12]	6.15 [5.34, 6.96]	0.90 [0.65, 0.97]	0
COPD	HKLeeImproved	MobGap	17	85.74 [83.14, 88.35]	82.74 [79.78, 85.69]	2.98 [-0.26, 6.23]	5.32 [4.50, 6.13]	4.50 [3.36, 5.65]	6.95 [5.87, 8.03]	0.85 [-0.03, 0.97]	0
	HKLeeImproved	Original Implementation	17	87.27 [84.59, 89.94]	82.74 [79.78, 85.69]	4.53 [1.10, 7.97]	6.51 [5.59, 7.42]	6.47 [5.22, 7.72]	8.49 [7.26, 9.73]	0.74 [-0.05, 0.94]	0
	ShinImproved	MobGap	17	84.09 [81.45, 86.73]	82.74 [79.78, 85.69]	1.31 [-1.26, 3.89]	4.92 [4.21, 5.64]	2.44 [1.52, 3.36]	6.32 [5.47, 7.18]	0.95 [0.63, 0.99]	0
	ShinImproved	Original Implementation	17	83.72 [81.10, 86.34]	82.74 [79.78, 85.69]	0.98 [-1.98, 3.94]	4.84 [4.10, 5.58]	1.95 [0.90, 3.01]	6.16 [5.26, 7.06]	0.96 [0.83, 0.99]	0
MS	HKLeeImproved	MobGap	18	88.88 [85.60, 92.17]	86.67 [82.82, 90.52]	2.21 [-2.52, 6.94]	6.40 [5.10, 7.70]	3.95 [2.30, 5.59]	8.16 [6.34, 9.98]	0.92 [0.55, 0.98]	0
	HKLeeImproved	Original Implementation	18	90.38 [86.83, 93.94]	86.67 [82.82, 90.52]	3.71 [-0.95, 8.37]	7.29 [5.98, 8.60]	5.67 [4.14, 7.21]	9.34 [7.43, 11.24]	0.87 [0.02, 0.97]	0
	ShinImproved	MobGap	18	87.11 [83.48, 90.74]	86.67 [82.82, 90.52]	0.40 [-5.07, 5.87]	6.05 [4.56, 7.55]	1.62 [0.08, 3.15]	7.49 [5.55, 9.43]	0.94 [0.85, 0.98]	0
	ShinImproved	Original Implementation	18	86.18 [82.68, 89.68]	86.67 [82.82, 90.52]	-0.49 [-5.97, 4.98]	6.30 [4.89, 7.71]	0.55 [-0.95, 2.05]	7.66 [5.88, 9.45]	0.94 [0.85, 0.98]	0
PD	HKLeeImproved	MobGap	19	88.81 [85.89, 91.74]	88.01 [82.84, 93.19]	0.80 [-11.88, 13.48]	6.99 [4.93, 9.05]	2.71 [-0.43, 5.85]	8.61 [6.37, 10.85]	0.77 [0.49, 0.90]	0
	HKLeeImproved	Original Implementation	19	90.00 [87.45, 92.55]	88.01 [82.84, 93.19]	1.98 [-11.92, 15.89]	7.81 [5.83, 9.79]	4.29 [0.72, 7.86]	9.73 [7.45, 12.01]	0.69 [0.36, 0.87]	0
	ShinImproved	MobGap	19	87.11 [84.02, 90.21]	88.01 [82.84, 93.19]	-0.90 [-14.20, 12.39]	7.03 [4.81, 9.25]	0.64 [-2.49, 3.78]	8.31 [6.13, 10.48]	0.74 [0.45, 0.89]	0
	ShinImproved	Original Implementation	19	86.57 [83.31, 89.82]	88.01 [82.84, 93.19]	-1.45 [-15.04, 12.15]	7.05 [4.69, 9.41]	-0.08 [-3.21, 3.06]	8.20 [5.97, 10.42]	0.74 [0.45, 0.89]	0
PFF	HKLeeImproved	MobGap	17	83.18 [80.29, 86.07]	79.58 [73.16, 85.99]	3.60 [-18.84, 26.04]	9.75 [5.75, 13.74]	8.32 [-0.92, 17.57]	14.21 [6.35, 22.08]	0.39 [-0.08, 0.73]	1
	HKLeeImproved	Original Implementation	17	84.61 [81.93, 87.29]	79.58 [73.16, 85.99]	5.03 [-15.63, 25.69]	9.99 [6.31, 13.66]	10.01 [1.13, 18.89]	14.67 [6.98, 22.36]	0.44 [-0.00, 0.75]	1
	ShinImproved	MobGap	17	81.60 [78.91, 84.29]	79.58 [73.16, 85.99]	2.00 [-19.52, 23.51]	9.47 [5.79, 13.15]	6.06 [-2.40, 14.52]	13.41 [6.56, 20.25]	0.42 [-0.08, 0.75]	1
	ShinImproved	Original Implementation	17	80.66 [78.10, 83.23]	79.58 [73.16, 85.99]	1.09 [-21.11, 23.28]	9.52 [5.82, 13.21]	4.92 [-3.38, 13.23]	13.20 [6.73, 19.67]	0.41 [-0.12, 0.75]	1

Per relevant cohort#

An overview across all cohorts is useful, but this is not how the cadence algorithms are used in our main pipeline. There, the HA, CHF, and COPD cohorts use the IcdShinImproved algorithm, while the IcdHKLeeImproved algorithm is used for the MS, PD, and PFF cohorts. Let’s look at the performance of these algorithms on the respective cohorts.

from mobgap.pipeline import MobilisedPipelineHealthy, MobilisedPipelineImpaired

low_impairment_algo = "ShinImproved"
low_impairment_cohorts = list(MobilisedPipelineHealthy().recommended_cohorts)

low_impairment_results = free_living_results[
    free_living_results["cohort"].isin(low_impairment_cohorts)
].query("algo == @low_impairment_algo")

hue_order = ["Original Implementation", "MobGap"]

fig, ax = plt.subplots()
sns.boxplot(
    data=low_impairment_results,
    x="cohort",
    y="wb__abs_rel_error",
    hue="version",
    hue_order=hue_order,
    ax=ax,
)
sns.boxplot(
    data=low_impairment_results,
    x="_combined",
    y="wb__abs_rel_error",
    hue="version",
    hue_order=hue_order,
    legend=False,
    ax=ax,
)
fig.suptitle(f"Low Impairment Cohorts ({low_impairment_algo})")
fig.show()

perf_metrics_cohort.copy().loc[
    pd.IndexSlice[low_impairment_cohorts, low_impairment_algo], :
].reset_index("algo", drop=True).style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort"],
)

cohort	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
HA	MobGap	20	87.76 [85.45, 90.07]	87.33 [83.98, 90.69]	0.30 [-5.84, 6.44]	5.87 [5.04, 6.71]	1.58 [-0.30, 3.46]	7.13 [5.85, 8.42]	0.89 [0.75, 0.96]	0
HA	Original Implementation	20	87.24 [84.94, 89.55]	87.33 [83.98, 90.69]	-0.09 [-6.63, 6.44]	6.05 [5.18, 6.91]	1.11 [-0.86, 3.07]	7.36 [6.01, 8.71]	0.88 [0.71, 0.95]	0
COPD	MobGap	17	84.09 [81.45, 86.73]	82.74 [79.78, 85.69]	1.31 [-1.26, 3.89]	4.92 [4.21, 5.64]	2.44 [1.52, 3.36]	6.32 [5.47, 7.18]	0.95 [0.63, 0.99]	0
COPD	Original Implementation	17	83.72 [81.10, 86.34]	82.74 [79.78, 85.69]	0.98 [-1.98, 3.94]	4.84 [4.10, 5.58]	1.95 [0.90, 3.01]	6.16 [5.26, 7.06]	0.96 [0.83, 0.99]	0
CHF	MobGap	10	89.39 [86.26, 92.53]	89.60 [85.27, 93.93]	-0.36 [-6.23, 5.50]	5.61 [4.69, 6.53]	0.53 [-1.55, 2.60]	6.43 [5.60, 7.26]	0.89 [0.61, 0.97]	0
CHF	Original Implementation	10	89.01 [85.77, 92.26]	89.60 [85.27, 93.93]	-0.58 [-6.25, 5.08]	5.45 [4.40, 6.51]	0.17 [-1.77, 2.12]	6.15 [5.34, 6.96]	0.90 [0.65, 0.97]	0

high_impairment_algo = "HKLeeImproved"
high_impairment_cohorts = list(MobilisedPipelineImpaired().recommended_cohorts)

high_impairment_results = free_living_results[
    free_living_results["cohort"].isin(high_impairment_cohorts)
].query("algo == @high_impairment_algo")

fig, ax = plt.subplots()
sns.boxplot(
    data=high_impairment_results,
    x="cohort",
    y="wb__abs_rel_error",
    hue="version",
    hue_order=hue_order,
    ax=ax,
)
sns.boxplot(
    data=high_impairment_results,
    x="_combined",
    y="wb__abs_rel_error",
    hue="version",
    hue_order=hue_order,
    legend=False,
    ax=ax,
)
fig.suptitle(f"High Impairment Cohorts ({high_impairment_algo})")
fig.show()

perf_metrics_cohort.copy().loc[
    pd.IndexSlice[high_impairment_cohorts, high_impairment_algo], :
].reset_index("algo", drop=True).style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort"],
)

cohort	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
PD	MobGap	19	88.81 [85.89, 91.74]	88.01 [82.84, 93.19]	0.80 [-11.88, 13.48]	6.99 [4.93, 9.05]	2.71 [-0.43, 5.85]	8.61 [6.37, 10.85]	0.77 [0.49, 0.90]	0
PD	Original Implementation	19	90.00 [87.45, 92.55]	88.01 [82.84, 93.19]	1.98 [-11.92, 15.89]	7.81 [5.83, 9.79]	4.29 [0.72, 7.86]	9.73 [7.45, 12.01]	0.69 [0.36, 0.87]	0
MS	MobGap	18	88.88 [85.60, 92.17]	86.67 [82.82, 90.52]	2.21 [-2.52, 6.94]	6.40 [5.10, 7.70]	3.95 [2.30, 5.59]	8.16 [6.34, 9.98]	0.92 [0.55, 0.98]	0
MS	Original Implementation	18	90.38 [86.83, 93.94]	86.67 [82.82, 90.52]	3.71 [-0.95, 8.37]	7.29 [5.98, 8.60]	5.67 [4.14, 7.21]	9.34 [7.43, 11.24]	0.87 [0.02, 0.97]	0
PFF	MobGap	17	83.18 [80.29, 86.07]	79.58 [73.16, 85.99]	3.60 [-18.84, 26.04]	9.75 [5.75, 13.74]	8.32 [-0.92, 17.57]	14.21 [6.35, 22.08]	0.39 [-0.08, 0.73]	1
PFF	Original Implementation	17	84.61 [81.93, 87.29]	79.58 [73.16, 85.99]	5.03 [-15.63, 25.69]	9.99 [6.31, 13.66]	10.01 [1.13, 18.89]	14.67 [6.98, 22.36]	0.44 [-0.00, 0.75]	1

Speed dependency#

One important aspect of the algorithm performance is its dependency on walking speed, that is, how well the algorithms perform at different walking speeds. To assess this, we plot the absolute relative error against the walking speed of the reference data. For better granularity, we use the values per WB instead of the aggregates per participant.

The overlaid dots represent the trend line, calculated by taking the median of the absolute relative error within bins of 0.05 m/s.

import numpy as np

wb_level_results = format_loaded_results(
    {
        v: loader.load_single_csv_file(
            k, "free_living", "raw_wb_level_values_with_errors.csv"
        )
        for k, v in algorithms.items()
    },
    free_living_index_cols,
)

algo_names = wb_level_results.algo_with_version.unique()
fig, axs = plt.subplots(
    len(algo_names),
    1,
    sharex=True,
    sharey=True,
    figsize=(12, 3 * len(algo_names)),
)
for ax, algo in zip(axs, algo_names):
    data = wb_level_results.query("algo_with_version == @algo").copy()

    # Create scatter plot
    sns.scatterplot(
        data=data,
        x="reference_ws",
        y="abs_rel_error",
        ax=ax,
        alpha=0.3,
    )

    # Create bins and calculate medians
    bins = np.arange(0, data["reference_ws"].max() + 0.05, 0.05)
    data["speed_bin"] = pd.cut(data["reference_ws"], bins=bins)

    # Calculate bin centers for plotting
    data["bin_center"] = data["speed_bin"].apply(lambda x: x.mid)

    # Calculate the median per speed bin
    binned_data = (
        data.groupby("bin_center", observed=True)["abs_rel_error"]
        .median()
        .reset_index()
    )

    # Overlay the per-bin medians as dots
    sns.scatterplot(
        data=binned_data,
        x="bin_center",
        y="abs_rel_error",
        ax=ax,
    )

    ax.set_title(algo)
    ax.set_xlabel("Walking Speed (m/s)")
    ax.set_ylabel("Absolute Relative Error")

fig.tight_layout()
fig.show()

Figure: absolute relative error against reference walking speed (m/s), one panel per algorithm: HKLeeImproved (MobGap), ShinImproved (MobGap), HKLeeImproved (Original Implementation), ShinImproved (Original Implementation).

Laboratory Comparison#

Every datapoint below is one trial of a test. Note that each datapoint is weighted equally in the calculation of the performance metrics. This is a limitation of this simple approach, as the number of strides per trial and the complexity of the context can vary significantly. For a full picture, different groups of tests should be analyzed separately. The approach below should still provide a good overview for comparing the algorithms.
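To make the effect of equal weighting concrete, the sketch below compares a plain mean over trials with a stride-weighted mean. The trial names and numbers are hypothetical, and the stride-weighted variant is shown only for contrast; it is not used anywhere in this analysis.

```python
# Hypothetical trials: absolute cadence error (steps/min) and number of strides.
trials = [
    {"test": "short straight walk", "abs_error": 12.0, "n_strides": 5},
    {"test": "long complex walk", "abs_error": 3.0, "n_strides": 95},
]

# Equal weighting, as in the tables of this example: every trial counts once.
unweighted = sum(t["abs_error"] for t in trials) / len(trials)

# Alternative for comparison only: weight each trial by its number of strides.
total_strides = sum(t["n_strides"] for t in trials)
stride_weighted = (
    sum(t["abs_error"] * t["n_strides"] for t in trials) / total_strides
)

print(f"unweighted: {unweighted:.2f}, stride-weighted: {stride_weighted:.2f}")
```

A short trial with a large error can dominate the unweighted mean, which is one reason why different groups of tests are better analyzed separately.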

All results across all cohorts#

The results below represent the average performance across all trials of all participants, independent of the cohort.

fig, ax = plt.subplots()
sns.boxplot(data=lab_results, x="algo_with_version", y="wb__abs_error", ax=ax)
plt.xticks(rotation=45, ha="right")
fig.tight_layout()
fig.show()

perf_metrics_all = lab_results.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_tables)
perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
HKLeeImproved	MobGap	1168	95.44 [94.62, 96.26]	96.41 [95.41, 97.41]	-1.00 [-22.12, 20.12]	4.45 [3.88, 5.03]^*	0.21 [-0.45, 0.88]	4.81 [4.19, 5.43]^*	0.77 [0.74, 0.79]	114
HKLeeImproved	Original Implementation	1168	97.18 [96.29, 98.07]	96.41 [95.41, 97.41]	0.77 [-20.07, 21.62]	5.30 [4.76, 5.84]	1.94 [1.27, 2.61]	5.76 [5.15, 6.36]	0.79 [0.77, 0.81]	113
ShinImproved	MobGap	1168	94.83 [93.94, 95.72]	96.41 [95.41, 97.41]	-1.79 [-23.81, 20.24]	4.75 [4.16, 5.35]	-0.84 [-1.47, -0.21]	4.81 [4.24, 5.39]	0.76 [0.73, 0.79]	119
ShinImproved	Original Implementation	1168	94.32 [93.43, 95.21]	96.41 [95.41, 97.41]	-2.09 [-24.03, 19.84]	4.95 [4.35, 5.54]	-1.16 [-1.77, -0.54]	5.03 [4.47, 5.58]	0.76 [0.73, 0.79]	113

Per Cohort#

The results below represent the average performance across all trials of all participants within a cohort.

fig, ax = plt.subplots()
sns.boxplot(
    data=lab_results,
    x="cohort",
    y="wb__abs_error",
    hue="algo_with_version",
    order=cohort_order,
    ax=ax,
)
fig.show()
perf_metrics_cohort = (
    lab_results.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_tables)
    .loc[cohort_order]
)
perf_metrics_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
HA	HKLeeImproved	MobGap	227	99.61 [97.77, 101.45]	101.68 [99.55, 103.81]	-2.07 [-20.69, 16.56]	4.02 [2.85, 5.19]	-1.41 [-2.34, -0.49]	3.59 [2.74, 4.44]	0.80 [0.74, 0.85]	36
	HKLeeImproved	Original Implementation	227	101.38 [99.42, 103.35]	101.68 [99.55, 103.81]	-0.29 [-20.49, 19.90]	5.19 [4.01, 6.38]	0.35 [-0.70, 1.40]	4.80 [3.91, 5.69]	0.79 [0.73, 0.84]	36
	ShinImproved	MobGap	227	99.17 [97.06, 101.28]	101.68 [99.55, 103.81]	-2.51 [-24.42, 19.41]	4.88 [3.52, 6.24]	-1.97 [-3.08, -0.87]	4.23 [3.22, 5.25]	0.76 [0.68, 0.81]	36
	ShinImproved	Original Implementation	227	98.99 [96.96, 101.03]	101.68 [99.55, 103.81]	-2.69 [-22.72, 17.34]	5.11 [3.89, 6.33]	-2.16 [-3.18, -1.14]	4.52 [3.61, 5.44]	0.79 [0.72, 0.84]	36
CHF	HKLeeImproved	MobGap	106	95.01 [92.40, 97.63]	95.65 [92.35, 98.96]	-0.64 [-21.05, 19.76]	4.26 [2.42, 6.09]	0.64 [-1.58, 2.86]	4.71 [2.65, 6.77]	0.78 [0.69, 0.85]	9
	HKLeeImproved	Original Implementation	106	97.36 [94.58, 100.14]	95.65 [92.35, 98.96]	1.70 [-16.73, 20.13]	5.06 [3.49, 6.63]	3.00 [0.73, 5.27]	5.82 [3.73, 7.91]	0.82 [0.75, 0.88]	9
	ShinImproved	MobGap	106	96.03 [93.35, 98.72]	95.65 [92.35, 98.96]	0.02 [-12.79, 12.83]	3.54 [2.45, 4.62]	0.96 [-0.81, 2.73]	4.14 [2.50, 5.77]	0.91 [0.87, 0.94]	10
	ShinImproved	Original Implementation	106	94.87 [92.17, 97.56]	95.65 [92.35, 98.96]	-0.79 [-13.33, 11.75]	3.59 [2.52, 4.66]	0.08 [-1.55, 1.72]	4.05 [2.55, 5.54]	0.92 [0.88, 0.94]	9
COPD	HKLeeImproved	MobGap	214	96.18 [94.50, 97.86]	98.34 [96.14, 100.53]	-2.16 [-24.26, 19.94]	3.88 [2.42, 5.34]	-1.32 [-2.35, -0.29]	3.36 [2.38, 4.33]	0.70 [0.61, 0.76]	34
	HKLeeImproved	Original Implementation	214	97.52 [95.80, 99.24]	98.34 [96.14, 100.53]	-0.81 [-21.78, 20.15]	4.68 [3.35, 6.00]	0.02 [-1.01, 1.05]	4.30 [3.39, 5.21]	0.74 [0.66, 0.80]	34
	ShinImproved	MobGap	214	95.09 [93.40, 96.78]	98.34 [96.14, 100.53]	-3.24 [-28.22, 21.73]	4.27 [2.59, 5.96]	-2.33 [-3.46, -1.21]	3.56 [2.45, 4.67]	0.61 [0.50, 0.69]	34
	ShinImproved	Original Implementation	214	95.12 [93.40, 96.84]	98.34 [96.14, 100.53]	-3.22 [-26.61, 20.17]	4.23 [2.66, 5.79]	-2.43 [-3.48, -1.38]	3.62 [2.60, 4.64]	0.66 [0.55, 0.74]	34
MS	HKLeeImproved	MobGap	228	94.14 [92.06, 96.22]	94.89 [92.57, 97.22]	-0.74 [-24.09, 22.61]	4.97 [3.55, 6.39]	0.65 [-1.19, 2.50]	5.70 [4.00, 7.40]	0.75 [0.69, 0.81]	6
	HKLeeImproved	Original Implementation	228	96.40 [94.11, 98.69]	94.89 [92.57, 97.22]	1.50 [-19.80, 22.81]	5.70 [4.47, 6.93]	2.72 [0.98, 4.45]	6.44 [4.86, 8.01]	0.81 [0.76, 0.85]	6
	ShinImproved	MobGap	228	93.30 [91.02, 95.58]	94.89 [92.57, 97.22]	-2.21 [-28.81, 24.39]	5.91 [4.29, 7.53]	-1.12 [-3.02, 0.77]	6.22 [4.49, 7.95]	0.70 [0.62, 0.76]	10
	ShinImproved	Original Implementation	228	92.49 [90.20, 94.79]	94.89 [92.57, 97.22]	-2.40 [-28.10, 23.30]	5.85 [4.29, 7.41]	-1.39 [-3.11, 0.34]	6.13 [4.58, 7.68]	0.72 [0.65, 0.78]	6
PD	HKLeeImproved	MobGap	224	93.35 [91.63, 95.07]	93.64 [91.54, 95.74]	-0.29 [-18.64, 18.06]	4.14 [3.01, 5.27]	0.78 [-0.56, 2.12]	4.75 [3.52, 5.99]	0.80 [0.74, 0.84]	28
	HKLeeImproved	Original Implementation	224	94.90 [92.91, 96.88]	93.64 [91.54, 95.74]	1.26 [-17.92, 20.43]	5.02 [3.85, 6.18]	2.21 [0.89, 3.53]	5.63 [4.42, 6.85]	0.80 [0.74, 0.85]	28
	ShinImproved	MobGap	224	92.61 [90.75, 94.47]	93.64 [91.54, 95.74]	-1.25 [-16.78, 14.28]	3.50 [2.53, 4.46]	-0.75 [-1.63, 0.13]	3.61 [2.83, 4.40]	0.86 [0.82, 0.89]	29
	ShinImproved	Original Implementation	224	91.99 [90.12, 93.87]	93.64 [91.54, 95.74]	-1.65 [-20.64, 17.35]	4.22 [3.03, 5.42]	-0.98 [-2.07, 0.11]	4.44 [3.44, 5.43]	0.79 [0.73, 0.84]	28
PFF	HKLeeImproved	MobGap	169	94.32 [92.14, 96.50]	94.03 [91.14, 96.92]	0.08 [-22.79, 22.94]	5.36 [3.79, 6.94]	2.22 [-0.21, 4.65]	6.71 [4.45, 8.96]	0.76 [0.69, 0.82]	1
	HKLeeImproved	Original Implementation	169	95.65 [93.31, 97.98]	94.03 [91.14, 96.92]	1.62 [-21.92, 25.15]	6.04 [4.44, 7.64]	3.86 [1.37, 6.35]	7.60 [5.29, 9.90]	0.76 [0.69, 0.82]	0
	ShinImproved	MobGap	169	93.50 [91.12, 95.87]	94.03 [91.14, 96.92]	-0.52 [-23.02, 21.98]	5.78 [4.28, 7.29]	1.28 [-0.92, 3.47]	6.75 [4.79, 8.71]	0.79 [0.72, 0.84]	0
	ShinImproved	Original Implementation	169	92.95 [90.53, 95.38]	94.03 [91.14, 96.92]	-1.07 [-25.33, 23.18]	5.98 [4.33, 7.63]	0.70 [-1.55, 2.95]	6.90 [4.89, 8.90]	0.75 [0.68, 0.81]	0

Per relevant cohort#

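# Restrict the lab results to the low-impairment cohorts and the algorithm selected for them.
low_impairment_results = lab_results[
    lab_results["cohort"].isin(low_impairment_cohorts)
].query("algo == @low_impairment_algo")

# Fix the order of the hue levels so that both implementations are always drawn in the same order.
hue_order = ["Original Implementation", "MobGap"]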

fig, ax = plt.subplots()
# One box per cohort and implementation.
sns.boxplot(
    data=low_impairment_results,
    x="cohort",
    y="wb__abs_rel_error",
    hue="version",
    hue_order=hue_order,
    ax=ax,
)
# Second call on the same axes, grouped by the `_combined` column instead of by cohort.
sns.boxplot(
    data=low_impairment_results,
    x="_combined",
    y="wb__abs_rel_error",
    hue="version",
    hue_order=hue_order,
    legend=False,
    ax=ax,
)
fig.suptitle(f"Low Impairment Cohorts ({low_impairment_algo})")
fig.show()

perf_metrics_cohort.copy().loc[
    pd.IndexSlice[low_impairment_cohorts, low_impairment_algo], :
].reset_index("algo", drop=True).style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort"],
)

cohort	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
HA	MobGap	227	99.17 [97.06, 101.28]	101.68 [99.55, 103.81]	-2.51 [-24.42, 19.41]	4.88 [3.52, 6.24]	-1.97 [-3.08, -0.87]	4.23 [3.22, 5.25]	0.76 [0.68, 0.81]	36
HA	Original Implementation	227	98.99 [96.96, 101.03]	101.68 [99.55, 103.81]	-2.69 [-22.72, 17.34]	5.11 [3.89, 6.33]	-2.16 [-3.18, -1.14]	4.52 [3.61, 5.44]	0.79 [0.72, 0.84]	36
COPD	MobGap	214	95.09 [93.40, 96.78]	98.34 [96.14, 100.53]	-3.24 [-28.22, 21.73]	4.27 [2.59, 5.96]	-2.33 [-3.46, -1.21]	3.56 [2.45, 4.67]	0.61 [0.50, 0.69]	34
COPD	Original Implementation	214	95.12 [93.40, 96.84]	98.34 [96.14, 100.53]	-3.22 [-26.61, 20.17]	4.23 [2.66, 5.79]	-2.43 [-3.48, -1.38]	3.62 [2.60, 4.64]	0.66 [0.55, 0.74]	34
CHF	MobGap	106	96.03 [93.35, 98.72]	95.65 [92.35, 98.96]	0.02 [-12.79, 12.83]	3.54 [2.45, 4.62]	0.96 [-0.81, 2.73]	4.14 [2.50, 5.77]	0.91 [0.87, 0.94]	10
CHF	Original Implementation	106	94.87 [92.17, 97.56]	95.65 [92.35, 98.96]	-0.79 [-13.33, 11.75]	3.59 [2.52, 4.66]	0.08 [-1.55, 1.72]	4.05 [2.55, 5.54]	0.92 [0.88, 0.94]	9

# Restrict the lab results to the high-impairment cohorts and the algorithm selected for them.
high_impairment_results = lab_results[
    lab_results["cohort"].isin(high_impairment_cohorts)
].query("algo == @high_impairment_algo")

fig, ax = plt.subplots()
# One box per cohort and implementation.
sns.boxplot(
    data=high_impairment_results,
    x="cohort",
    y="wb__abs_rel_error",
    hue="version",
    hue_order=hue_order,
    ax=ax,
)
# Second call on the same axes, grouped by the `_combined` column instead of by cohort.
sns.boxplot(
    data=high_impairment_results,
    x="_combined",
    y="wb__abs_rel_error",
    hue="version",
    hue_order=hue_order,
    legend=False,
    ax=ax,
)
fig.suptitle(f"High Impairment Cohorts ({high_impairment_algo})")
fig.show()

perf_metrics_cohort.copy().loc[
    pd.IndexSlice[high_impairment_cohorts, high_impairment_algo], :
].reset_index("algo", drop=True).style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort"],
)

cohort	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
PD	MobGap	224	93.35 [91.63, 95.07]	93.64 [91.54, 95.74]	-0.29 [-18.64, 18.06]	4.14 [3.01, 5.27]	0.78 [-0.56, 2.12]	4.75 [3.52, 5.99]	0.80 [0.74, 0.84]	28
PD	Original Implementation	224	94.90 [92.91, 96.88]	93.64 [91.54, 95.74]	1.26 [-17.92, 20.43]	5.02 [3.85, 6.18]	2.21 [0.89, 3.53]	5.63 [4.42, 6.85]	0.80 [0.74, 0.85]	28
MS	MobGap	228	94.14 [92.06, 96.22]	94.89 [92.57, 97.22]	-0.74 [-24.09, 22.61]	4.97 [3.55, 6.39]	0.65 [-1.19, 2.50]	5.70 [4.00, 7.40]	0.75 [0.69, 0.81]	6
MS	Original Implementation	228	96.40 [94.11, 98.69]	94.89 [92.57, 97.22]	1.50 [-19.80, 22.81]	5.70 [4.47, 6.93]	2.72 [0.98, 4.45]	6.44 [4.86, 8.01]	0.81 [0.76, 0.85]	6
PFF	MobGap	169	94.32 [92.14, 96.50]	94.03 [91.14, 96.92]	0.08 [-22.79, 22.94]	5.36 [3.79, 6.94]	2.22 [-0.21, 4.65]	6.71 [4.45, 8.96]	0.76 [0.69, 0.82]	1
PFF	Original Implementation	169	95.65 [93.31, 97.98]	94.03 [91.14, 96.92]	1.62 [-21.92, 25.15]	6.04 [4.44, 7.64]	3.86 [1.37, 6.35]	7.60 [5.29, 9.90]	0.76 [0.69, 0.82]	0
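
When reading the tables above, it can help to have the usual agreement-analysis definitions of bias, limits of agreement (LoA), absolute error, and absolute relative error in mind. The following is a minimal sketch of these quantities for paired detected and reference cadence values. It is not the mobgap implementation: the function name `agreement_metrics` and the sample values are hypothetical, the LoA are assumed to be the bias plus or minus 1.96 sample standard deviations of the errors, and the confidence intervals and the aggregation level used in the tables are not covered.

```python
from statistics import mean, stdev


def agreement_metrics(detected, reference):
    """Summarise the agreement between paired detected and reference values."""
    # Signed error per pair: positive values mean the detected value is too high.
    errors = [d - r for d, r in zip(detected, reference)]
    # Absolute error relative to the reference value, in percent.
    abs_rel_errors = [abs(e) / r * 100 for e, r in zip(errors, reference)]
    bias = mean(errors)
    # Assumption: LoA as bias +/- 1.96 sample standard deviations of the errors.
    half_width = 1.96 * stdev(errors)
    return {
        "bias": bias,
        "loa": (bias - half_width, bias + half_width),
        "abs_error": mean(abs(e) for e in errors),
        "abs_rel_error": mean(abs_rel_errors),
    }


# Hypothetical cadence values in steps/min, not taken from the TVS dataset.
detected = [98.0, 102.0, 95.0, 110.0]
reference = [100.0, 100.0, 100.0, 100.0]
metrics = agreement_metrics(detected, reference)
print(metrics)
```

Under this assumption the LoA are symmetric around the bias, which matches the shape of the "Bias and LoA" column in the tables, for example -2.51 [-24.42, 19.41] for the HA cohort.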

Total running time of the script: (0 minutes 11.850 seconds)

Estimated memory usage: 80 MB
