Performance of the stride length algorithms on the TVS dataset#

The following provides an analysis and comparison of the stride length algorithms on the TVS dataset (lab and free-living). We look at the performance of the algorithms relative to the reference data and compare these results with those previously generated by the MATLAB pipeline.

Note

If you are interested in how these results are calculated, head over to the processing page.

Below is the list of algorithms that we will compare. Note that we use the version label “MobGap” to refer to the reimplemented Python algorithms. For the SlZjilstra algorithm, we compare both candidate threshold values that were determined as part of the pre-validation analysis on the MsProject dataset.

algorithms = {
    "SlZjilstra__MS_ALL": ("SlZjilstra - MS-all", "MobGap"),
    "SlZjilstra__MS_MS": ("SlZjilstra - MS-MS", "MobGap"),
    "matlab_zjilsV3__MS_ALL": (
        "SlZjilstra - MS-all",
        "Original Implementation",
    ),
    "matlab_zjilsV3__MS_MS": ("SlZjilstra - MS-MS", "Original Implementation"),
}

The code below loads the data and prepares it for the analysis. By default, the data will be downloaded from an online repository (and cached locally). If you want to use a local copy of the data, set the MOBGAP_VALIDATION_DATA_PATH environment variable to its location and MOBGAP_VALIDATION_USE_LOCAL_DATA to 1.

The file download prints some log output, which can usually be ignored. You can also change the version parameter to load a different version of the data.
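As a minimal sketch of the local-data switch described above, the snippet below sets both environment variables from within Python and resolves the result path the same way the loading code does. The directory is a hypothetical placeholder, and the sketch reads os.environ directly instead of using mobgap's get_env_var, so it only approximates the real lookup.

```python
import os
from pathlib import Path

# Hypothetical location of a local copy of the validation data (assumption).
os.environ["MOBGAP_VALIDATION_DATA_PATH"] = "/path/to/validation_data"
# Any non-zero integer enables the local copy; 0 (or unset) means "download".
os.environ["MOBGAP_VALIDATION_USE_LOCAL_DATA"] = "1"

# Mirror of the path logic in the loading code, using os.environ directly.
use_local = int(os.environ.get("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
local_data_path = (
    Path(os.environ["MOBGAP_VALIDATION_DATA_PATH"]) / "results"
    if use_local
    else None
)
print(local_data_path)
```

The variables need to be set before the loading code runs. Setting them in the shell that starts Python works equally well.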

from pathlib import Path

import pandas as pd
from mobgap.data.validation_results import ValidationResultLoader
from mobgap.utils.misc import get_env_var


def format_loaded_results(
    values: dict[tuple[str, str], pd.DataFrame],
    index_cols: list[str],
    convert_rel_error: bool = False,
) -> pd.DataFrame:
    formatted = (
        pd.concat(values, names=["algo", "version", *index_cols])
        .reset_index()
        .assign(
            algo_with_version=lambda df: (
                df["algo"] + " (" + df["version"] + ")"
            ),
            _combined="combined",
        )
    )
    if not convert_rel_error:
        return formatted
    rel_cols = [c for c in formatted.columns if "rel_error" in c]
    formatted[rel_cols] = formatted[rel_cols] * 100
    return formatted


local_data_path = (
    Path(get_env_var("MOBGAP_VALIDATION_DATA_PATH")) / "results"
    if int(get_env_var("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
    else None
)
__RESULT_VERSION = "v1.2.0"
loader = ValidationResultLoader(
    "sl", result_path=local_data_path, version=__RESULT_VERSION
)


free_living_index_cols = [
    "cohort",
    "participant_id",
    "time_measure",
    "recording",
    "recording_name",
    "recording_name_pretty",
]

free_living_results = format_loaded_results(
    {
        v: loader.load_single_results(k, "free_living")
        for k, v in algorithms.items()
    },
    free_living_index_cols,
    convert_rel_error=True,
)

lab_index_cols = [
    "cohort",
    "participant_id",
    "time_measure",
    "test",
    "trial",
    "test_name",
    "test_name_pretty",
]

lab_results = format_loaded_results(
    {
        v: loader.load_single_results(k, "laboratory")
        for k, v in algorithms.items()
    },
    lab_index_cols,
    convert_rel_error=True,
)

cohort_order = ["HA", "CHF", "COPD", "MS", "PD", "PFF"]

Performance metrics#

Below you can find the setup for all performance metrics that we will calculate. We only use the wb__ results for the comparison. These are obtained in three steps: first, the average stride length is calculated per WB; second, the error metrics are calculated for each WB; third, the error metrics are averaged over all WBs of a participant.
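The three aggregation steps can be illustrated on toy data. This is only a sketch: the stride-level table, its column names, and the error definition (detected minus reference) are assumptions for illustration and not the code used to produce the published results.

```python
import pandas as pd

# Toy stride-level data; column names are illustrative assumptions.
strides = pd.DataFrame(
    {
        "participant_id": ["p1", "p1", "p1", "p1", "p2", "p2"],
        "wb_id": [0, 0, 1, 1, 0, 0],
        "detected": [1.0, 1.2, 0.8, 1.0, 1.4, 1.6],
        "reference": [1.0, 1.0, 1.0, 1.0, 1.5, 1.5],
    }
)

# Step 1: average stride length per WB.
per_wb = strides.groupby(["participant_id", "wb_id"])[
    ["detected", "reference"]
].mean()

# Step 2: error metrics per WB (assumed definition: detected - reference).
per_wb["error"] = per_wb["detected"] - per_wb["reference"]
per_wb["abs_error"] = per_wb["error"].abs()
per_wb["rel_error"] = per_wb["error"] / per_wb["reference"]
per_wb["abs_rel_error"] = per_wb["rel_error"].abs()

# Step 3: average over all WBs of a participant.
per_participant = per_wb.groupby("participant_id").mean().add_prefix("wb__")
print(per_participant)
```

In this toy example, participant p1 has one WB that overestimates and one that underestimates by the same amount, so the mean error cancels out while the mean absolute error does not. This is why the tables report both.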

from functools import partial

from mobgap.pipeline.evaluation import CustomErrorAggregations as A
from mobgap.utils.df_operations import (
    CustomOperation,
    apply_aggregations,
    apply_transformations,
    multilevel_groupby_apply_merge,
)
from mobgap.utils.tables import FormatTransformer as F
from mobgap.utils.tables import RevalidationInfo, revalidation_table_styles
from mobgap.utils.tables import StatsFunctions as S

custom_aggs = [
    CustomOperation(
        identifier=None,
        function=A.n_datapoints,
        column_name=[("n_datapoints", "all")],
    ),
    CustomOperation(
        identifier=None,
        function=lambda df_: df_["wb__detected"].isna().sum(),
        column_name=[("n_nan_detected", "all")],
    ),
    ("wb__detected", ["mean", A.conf_intervals]),
    ("wb__reference", ["mean", A.conf_intervals]),
    ("wb__error", ["mean", A.loa]),
    ("wb__abs_error", ["mean", A.conf_intervals]),
    ("wb__rel_error", ["mean", A.conf_intervals]),
    ("wb__abs_rel_error", ["mean", A.conf_intervals]),
    CustomOperation(
        identifier=None,
        function=partial(
            A.icc,
            reference_col_name="wb__reference",
            detected_col_name="wb__detected",
            icc_type="icc2",
            # For the lab data, some trials have no results for the old algorithms.
            nan_policy="omit",
        ),
        column_name=[("icc", "wb_level"), ("icc_ci", "wb_level")],
    ),
]

stats_transform = [
    CustomOperation(
        identifier=None,
        function=partial(
            S.pairwise_tests,
            value_col=c,
            between="version",
            reference_group_key="Original Implementation",
        ),
        column_name=[("stats_metadata", c)],
    )
    for c in [
        "wb__abs_error",
        "wb__abs_rel_error",
    ]
]

format_transforms = [
    CustomOperation(
        identifier=None,
        function=lambda df_: df_[("n_datapoints", "all")].astype(int),
        column_name="n_datapoints",
    ),
    CustomOperation(
        identifier=None,
        function=lambda df_: df_[("n_nan_detected", "all")].astype(int),
        column_name="n_nan_detected",
    ),
    *(
        CustomOperation(
            identifier=None,
            function=partial(
                F.value_with_metadata,
                value_col=("mean", c),
                other_columns={
                    "range": ("conf_intervals", c),
                    **(
                        {"stats_metadata": ("stats_metadata", c)}
                        if c in ["wb__abs_error", "wb__abs_rel_error"]
                        else {}
                    ),
                },
            ),
            column_name=c,
        )
        for c in [
            "wb__reference",
            "wb__detected",
            "wb__abs_error",
            "wb__rel_error",
            "wb__abs_rel_error",
        ]
    ),
    CustomOperation(
        identifier=None,
        function=partial(
            F.value_with_metadata,
            value_col=("mean", "wb__error"),
            other_columns={"range": ("loa", "wb__error")},
        ),
        column_name="wb__error",
    ),
    CustomOperation(
        identifier=None,
        function=partial(
            F.value_with_metadata,
            value_col=("icc", "wb_level"),
            other_columns={"range": ("icc_ci", "wb_level")},
        ),
        column_name="icc",
    ),
]


final_names = {
    "n_datapoints": "# participants",
    "wb__detected": "WD mean and CI [m]",
    "wb__reference": "INDIP mean and CI [m]",
    "wb__error": "Bias and LoA [m]",
    "wb__abs_error": "Abs. Error [m]",
    "wb__rel_error": "Rel. Error [%]",
    "wb__abs_rel_error": "Abs. Rel. Error [%]",
    "icc": "ICC",
    "n_nan_detected": "# Failed WBs",
}


validation_thresholds = {
    "Abs. Error [m]": RevalidationInfo(threshold=None, higher_is_better=False),
    "Abs. Rel. Error [%]": RevalidationInfo(
        threshold=20, higher_is_better=False
    ),
    "ICC": RevalidationInfo(threshold=0.7, higher_is_better=True),
    "# Failed WBs": RevalidationInfo(threshold=None, higher_is_better=False),
}


def format_tables(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.pipe(apply_transformations, format_transforms)
        .rename(columns=final_names)
        .loc[:, list(final_names.values())]
    )

Free-Living Comparison#

We focus on the free-living data for the comparison as this is the expected use case for the algorithms.

All results across all cohorts#

The results below represent the average performance across all participants independent of the cohort.

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
sns.boxplot(
    data=free_living_results, x="algo_with_version", y="wb__abs_error", ax=ax
)
plt.xticks(rotation=45, ha="right")
fig.tight_layout()
fig.show()

perf_metrics_all = free_living_results.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_tables)
perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	# participants	WD mean and CI [m]	INDIP mean and CI [m]	Bias and LoA [m]	Abs. Error [m]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
SlZjilstra - MS-MS	MobGap	101	0.89 [0.86, 0.92]	0.79 [0.76, 0.83]	0.10 [-0.10, 0.29]	0.15 [0.14, 0.16]^**	19.51 [16.12, 22.89]	24.71 [22.03, 27.38]^*	0.70 [0.16, 0.87]	1
SlZjilstra - MS-MS	Original Implementation	101	0.93 [0.90, 0.96]	0.79 [0.76, 0.83]	0.13 [-0.08, 0.34]	0.17 [0.16, 0.19]	26.03 [21.99, 30.08]	30.15 [26.67, 33.62]	0.59 [-0.04, 0.83]	1
SlZjilstra - MS-all	MobGap	101	0.92 [0.89, 0.95]	0.79 [0.76, 0.83]	0.13 [-0.07, 0.32]	0.16 [0.15, 0.17]^**	23.47 [19.97, 26.97]	27.03 [24.07, 29.99]^*	0.64 [-0.03, 0.86]	1
SlZjilstra - MS-all	Original Implementation	101	0.96 [0.93, 0.99]	0.79 [0.76, 0.83]	0.16 [-0.05, 0.37]	0.19 [0.17, 0.21]	30.20 [26.02, 34.38]	32.89 [29.13, 36.66]	0.53 [-0.09, 0.81]	1

Per Cohort#

The results below represent the average performance across all participants within a cohort.

fig, ax = plt.subplots()
sns.boxplot(
    data=free_living_results,
    x="cohort",
    y="wb__abs_error",
    hue="algo_with_version",
    order=cohort_order,
    ax=ax,
)
fig.show()
perf_metrics_cohort = (
    free_living_results.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_tables)
    .loc[cohort_order]
)
perf_metrics_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	# participants	WD mean and CI [m]	INDIP mean and CI [m]	Bias and LoA [m]	Abs. Error [m]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
HA	SlZjilstra - MS-MS	MobGap	20	0.90 [0.85, 0.95]	0.82 [0.76, 0.87]	0.09 [-0.05, 0.22]	0.13 [0.11, 0.14]^*	16.67 [11.83, 21.52]	20.99 [17.81, 24.18]	0.69 [-0.03, 0.90]	0
	SlZjilstra - MS-MS	Original Implementation	20	0.93 [0.88, 0.99]	0.82 [0.76, 0.87]	0.11 [-0.04, 0.27]	0.15 [0.13, 0.17]	21.98 [16.32, 27.65]	25.62 [21.41, 29.83]	0.55 [-0.10, 0.85]	0
	SlZjilstra - MS-all	MobGap	20	0.93 [0.88, 0.99]	0.82 [0.76, 0.87]	0.12 [-0.02, 0.25]	0.14 [0.12, 0.16]	20.54 [15.53, 25.55]	23.33 [19.57, 27.09]	0.60 [-0.09, 0.88]	0
	SlZjilstra - MS-all	Original Implementation	20	0.96 [0.91, 1.02]	0.82 [0.76, 0.87]	0.15 [-0.01, 0.30]	0.17 [0.15, 0.19]	26.01 [20.16, 31.87]	28.31 [23.52, 33.11]	0.47 [-0.10, 0.81]	0
CHF	SlZjilstra - MS-MS	MobGap	10	0.91 [0.82, 1.00]	0.86 [0.75, 0.97]	0.06 [-0.10, 0.21]	0.13 [0.10, 0.17]	14.97 [5.64, 24.31]	22.01 [14.20, 29.83]	0.85 [0.43, 0.96]	0
	SlZjilstra - MS-MS	Original Implementation	10	0.94 [0.85, 1.03]	0.86 [0.75, 0.97]	0.08 [-0.07, 0.23]	0.15 [0.12, 0.19]	20.31 [10.82, 29.79]	26.73 [18.56, 34.89]	0.80 [0.09, 0.96]	0
	SlZjilstra - MS-all	MobGap	10	0.94 [0.85, 1.04]	0.86 [0.75, 0.97]	0.09 [-0.07, 0.24]	0.14 [0.10, 0.17]	18.78 [9.14, 28.43]	23.51 [14.93, 32.09]	0.80 [0.07, 0.95]	0
	SlZjilstra - MS-all	Original Implementation	10	0.97 [0.88, 1.06]	0.86 [0.75, 0.97]	0.11 [-0.04, 0.26]	0.16 [0.12, 0.20]	24.29 [14.49, 34.08]	28.38 [19.39, 37.37]	0.74 [-0.07, 0.94]	0
COPD	SlZjilstra - MS-MS	MobGap	17	0.94 [0.87, 1.00]	0.81 [0.75, 0.88]	0.12 [-0.03, 0.27]	0.15 [0.13, 0.17]	19.76 [14.57, 24.94]	22.96 [19.24, 26.69]	0.59 [-0.10, 0.87]	0
	SlZjilstra - MS-MS	Original Implementation	17	0.97 [0.91, 1.03]	0.81 [0.75, 0.88]	0.16 [-0.00, 0.31]	0.18 [0.15, 0.20]	25.88 [19.96, 31.81]	27.94 [23.15, 32.72]	0.46 [-0.09, 0.82]	0
	SlZjilstra - MS-all	MobGap	17	0.97 [0.90, 1.03]	0.81 [0.75, 0.88]	0.15 [0.00, 0.30]	0.17 [0.15, 0.20]	23.72 [18.36, 29.08]	25.76 [21.44, 30.08]	0.51 [-0.09, 0.84]	0
	SlZjilstra - MS-all	Original Implementation	17	1.01 [0.94, 1.07]	0.81 [0.75, 0.88]	0.19 [0.03, 0.35]	0.20 [0.17, 0.23]	30.05 [23.92, 36.17]	31.25 [25.91, 36.60]	0.39 [-0.07, 0.78]	0
MS	SlZjilstra - MS-MS	MobGap	18	0.98 [0.91, 1.05]	0.84 [0.76, 0.92]	0.13 [-0.08, 0.35]	0.18 [0.14, 0.21]	24.22 [15.50, 32.94]	27.95 [20.55, 35.35]	0.57 [-0.07, 0.85]	0
	SlZjilstra - MS-MS	Original Implementation	18	1.03 [0.96, 1.09]	0.84 [0.76, 0.92]	0.18 [-0.05, 0.41]	0.21 [0.17, 0.25]	31.40 [21.12, 41.67]	34.03 [24.71, 43.35]	0.44 [-0.11, 0.79]	0
	SlZjilstra - MS-all	MobGap	18	1.01 [0.94, 1.08]	0.84 [0.76, 0.92]	0.17 [-0.05, 0.39]	0.20 [0.16, 0.24]	28.33 [19.32, 37.35]	30.88 [22.81, 38.96]	0.51 [-0.10, 0.83]	0
	SlZjilstra - MS-all	Original Implementation	18	1.06 [0.99, 1.13]	0.84 [0.76, 0.92]	0.21 [-0.02, 0.44]	0.23 [0.18, 0.28]	35.74 [25.12, 46.36]	37.28 [27.25, 47.31]	0.39 [-0.10, 0.76]	0
PD	SlZjilstra - MS-MS	MobGap	19	0.86 [0.78, 0.94]	0.79 [0.70, 0.87]	0.07 [-0.20, 0.35]	0.15 [0.12, 0.19]	17.93 [6.19, 29.67]	26.35 [17.07, 35.63]	0.65 [0.27, 0.85]	0
	SlZjilstra - MS-MS	Original Implementation	19	0.90 [0.82, 0.97]	0.79 [0.70, 0.87]	0.11 [-0.19, 0.41]	0.18 [0.13, 0.22]	24.69 [9.99, 39.39]	31.55 [18.76, 44.33]	0.55 [0.09, 0.80]	0
	SlZjilstra - MS-all	MobGap	19	0.89 [0.81, 0.97]	0.79 [0.70, 0.87]	0.10 [-0.18, 0.38]	0.16 [0.12, 0.21]	21.84 [9.71, 33.97]	28.02 [17.85, 38.19]	0.61 [0.14, 0.84]	0
	SlZjilstra - MS-all	Original Implementation	19	0.93 [0.85, 1.01]	0.79 [0.70, 0.87]	0.14 [-0.16, 0.44]	0.19 [0.14, 0.24]	28.81 [13.62, 44.00]	33.69 [20.00, 47.39]	0.50 [-0.01, 0.79]	0
PFF	SlZjilstra - MS-MS	MobGap	17	0.74 [0.67, 0.80]	0.65 [0.56, 0.73]	0.09 [-0.08, 0.25]	0.14 [0.11, 0.16]	22.19 [14.14, 30.24]	27.28 [21.25, 33.30]	0.75 [0.08, 0.92]	1
	SlZjilstra - MS-MS	Original Implementation	17	0.78 [0.72, 0.84]	0.65 [0.56, 0.73]	0.13 [-0.04, 0.30]	0.17 [0.14, 0.19]	30.38 [21.15, 39.61]	34.25 [26.60, 41.90]	0.62 [-0.09, 0.89]	1
	SlZjilstra - MS-all	MobGap	17	0.76 [0.69, 0.83]	0.65 [0.56, 0.73]	0.11 [-0.05, 0.28]	0.15 [0.12, 0.17]	26.24 [17.92, 34.56]	29.71 [22.83, 36.60]	0.70 [-0.06, 0.91]	1
	SlZjilstra - MS-all	Original Implementation	17	0.81 [0.74, 0.87]	0.65 [0.56, 0.73]	0.16 [-0.02, 0.33]	0.18 [0.15, 0.21]	34.69 [25.16, 44.22]	37.30 [28.91, 45.69]	0.56 [-0.09, 0.87]	1

Deep Dive Analysis of Main Algorithms#

Below, you can find detailed correlation and residual plots comparing the new and the old implementation of each algorithm. Each datapoint represents one participant.

from mobgap.plotting import (
    calc_min_max_with_margin,
    make_square,
    move_legend_outside,
    plot_regline,
    residual_plot,
)


def combo_residual_plot(data):
    fig, axs = plt.subplots(
        ncols=2,
        sharey=True,
        sharex=True,
        figsize=(15, 9),
        constrained_layout=True,
    )
    fig.suptitle(data.name)
    for (version, subdata), ax in zip(data.groupby("version"), axs):
        residual_plot(
            subdata,
            "wb__reference",
            "wb__detected",
            "cohort",
            "m",
            ax=ax,
            legend=ax == axs[-1],
        )
        ax.set_title(version)
    move_legend_outside(fig, axs[-1])
    plt.show()


def combo_scatter_plot(data):
    fig, axs = plt.subplots(
        ncols=2,
        sharey=True,
        sharex=True,
        figsize=(15, 8),
        constrained_layout=True,
    )
    fig.suptitle(data.name)
    min_max = calc_min_max_with_margin(
        data["wb__reference"], data["wb__detected"]
    )
    for (version, subdata), ax in zip(data.groupby("version"), axs):
        subdata = subdata[["wb__reference", "wb__detected", "cohort"]].dropna(
            how="any"
        )
        sns.scatterplot(
            subdata,
            x="wb__reference",
            y="wb__detected",
            hue="cohort",
            ax=ax,
            legend=ax == axs[-1],
        )
        plot_regline(subdata["wb__reference"], subdata["wb__detected"], ax=ax)
        make_square(ax, min_max, draw_diagonal=True)
        ax.set_title(version)
        ax.set_xlabel("Reference [m]")
        ax.set_ylabel("Detected [m]")
    move_legend_outside(fig, axs[-1])
    plt.show()


free_living_results.groupby("algo").apply(
    combo_residual_plot, include_groups=False
)
free_living_results.groupby("algo").apply(
    combo_scatter_plot, include_groups=False
)

Below, we show the direct correlation between the results from the old and the new implementation. Each datapoint represents one participant.

def compare_scatter_plot(data):
    fig, ax = plt.subplots(figsize=(9, 9), constrained_layout=True)
    reformated_data = (
        data.pivot_table(
            values="wb__detected",
            index=("cohort", "participant_id"),
            columns="version",
        )
        .reset_index()
        .dropna(how="any")
    )

    min_max = calc_min_max_with_margin(
        reformated_data["Original Implementation"], reformated_data["MobGap"]
    )
    sns.scatterplot(
        reformated_data,
        x="Original Implementation",
        y="MobGap",
        hue="cohort",
        ax=ax,
    )
    plot_regline(
        reformated_data["Original Implementation"],
        reformated_data["MobGap"],
        ax=ax,
    )
    make_square(ax, min_max, draw_diagonal=True)
    move_legend_outside(fig, ax)
    ax.set_title(data.name)
    ax.set_xlabel("Original Implementation [m]")
    ax.set_ylabel("MobGap [m]")
    plt.show()


free_living_results.groupby("algo").apply(
    compare_scatter_plot, include_groups=False
)

Speed dependency#

One important aspect of the algorithm performance is its dependency on walking speed, i.e. how well the algorithms perform at different walking speeds. To assess this, we plot the absolute relative error against the walking speed of the reference data. For better granularity, we use the values per WB instead of the aggregates per participant.

The overlaid dots represent the trend line, calculated as the median of the absolute relative error within bins of 0.05 m/s.
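The binned-median trend can be sketched in isolation on a handful of made-up WB values. The column names follow the ones used in the plotting code of this section; the data itself is invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented WB-level values: reference walking speed [m/s] and abs. rel. error.
wbs = pd.DataFrame(
    {
        "reference_ws": [0.52, 0.53, 0.54, 0.61, 0.63],
        "abs_rel_error": [0.10, 0.30, 0.20, 0.05, 0.15],
    }
)

bin_width = 0.05
bins = np.arange(0, wbs["reference_ws"].max() + bin_width, bin_width)

# Assign each WB to a speed bin and take the median error per occupied bin.
wbs["speed_bin"] = pd.cut(wbs["reference_ws"], bins=bins)
trend = (
    wbs.groupby("speed_bin", observed=True)["abs_rel_error"]
    .median()
    .reset_index()
)
# The x-position of each trend dot is the center of its bin.
trend["bin_center"] = [interval.mid for interval in trend["speed_bin"]]
print(trend[["bin_center", "abs_rel_error"]])
```

Using the median rather than the mean keeps single WBs with very large errors from dominating a bin.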

import numpy as np

wb_level_results = format_loaded_results(
    {
        v: loader.load_single_csv_file(
            k, "free_living", "raw_wb_level_values_with_errors.csv"
        )
        for k, v in algorithms.items()
    },
    free_living_index_cols,
)

# For plotting all participants at the end
combined = wb_level_results.copy()
combined["cohort"] = "Combined"
wb_level_results = pd.concat([wb_level_results, combined]).reset_index(
    drop=True
)

algo_names = wb_level_results["algo_with_version"].unique()
cohort_names = wb_level_results["cohort"].unique()

wb_level_results["cohort"] = pd.Categorical(
    wb_level_results["cohort"], categories=cohort_names, ordered=True
)
wb_level_results["algo_with_version"] = pd.Categorical(
    wb_level_results["algo_with_version"], categories=algo_names, ordered=True
)


fig = plt.figure(constrained_layout=True, figsize=(18, 3 * len(algo_names)))
subfigs = fig.subfigures(len(algo_names), 1, wspace=0.1, hspace=0.1)

min_max_x = calc_min_max_with_margin(wb_level_results["reference_ws"])
min_max_y = calc_min_max_with_margin(wb_level_results["abs_rel_error"])

for subfig, (algo, data) in zip(
    subfigs, wb_level_results.groupby("algo_with_version", observed=True)
):
    subfig.suptitle(algo)
    subfig.supxlabel("Walking Speed (m/s)")
    subfig.supylabel("Absolute Relative Error")
    axs = subfig.subplots(1, len(cohort_names), sharex=True, sharey=True)
    for ax, (cohort, cohort_data) in zip(
        axs, data.groupby("cohort", observed=True)
    ):
        sns.scatterplot(
            data=cohort_data,
            x="reference_ws",
            y="abs_rel_error",
            ax=ax,
            alpha=0.3,
        )

        bins = np.arange(0, cohort_data["reference_ws"].max() + 0.05, 0.05)
        cohort_data["speed_bin"] = pd.cut(
            cohort_data["reference_ws"], bins=bins
        )

        # Calculate bin centers for plotting
        cohort_data["bin_center"] = cohort_data["speed_bin"].apply(
            lambda x: x.mid
        )

        # Calculate medians per bin and cohort
        binned_data = (
            cohort_data.groupby("bin_center", observed=True)["abs_rel_error"]
            .median()
            .reset_index()
        )

        # Plot median lines
        sns.scatterplot(
            data=binned_data,
            x="bin_center",
            y="abs_rel_error",
            ax=ax,
        )
        ax.set_title(cohort)
        ax.set_xlabel(None)
        ax.set_ylabel(None)

        ax.set_xlim(*min_max_x)
        ax.set_ylim(*min_max_y)

fig.show()

Figure: absolute relative error against reference walking speed, one row per algorithm and version, with one panel each for CHF, COPD, HA, MS, PD, PFF, and Combined.

Laboratory Comparison#

Every datapoint below is one trial of a test. Note that each datapoint is weighted equally in the calculation of the performance metrics. This is a limitation of this simple approach, as the number of strides per trial and the complexity of the context can vary significantly. For a full picture, different groups of tests should be analyzed separately. The approach below should still provide a good overview to compare the algorithms.
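To make the equal-weighting limitation concrete, the sketch below contrasts the plain mean over trials with a mean weighted by the number of strides per trial. The numbers and column names are invented, and the stride-weighted variant is not part of this analysis; it only illustrates how much the two views can differ.

```python
import numpy as np
import pandas as pd

# Two invented trials with very different stride counts.
trials = pd.DataFrame(
    {
        "trial": ["short_walk", "long_walk"],
        "n_strides": [4, 40],
        "abs_error": [0.02, 0.20],
    }
)

# Equal weighting per trial, as used for the tables in this section.
unweighted_mean = trials["abs_error"].mean()

# Hypothetical alternative: weight each trial by its number of strides.
stride_weighted_mean = np.average(
    trials["abs_error"], weights=trials["n_strides"]
)
print(unweighted_mean, stride_weighted_mean)
```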

fig, ax = plt.subplots()
sns.boxplot(data=lab_results, x="algo_with_version", y="wb__abs_error", ax=ax)
plt.xticks(rotation=45, ha="right")
fig.tight_layout()
fig.show()

perf_metrics_all = lab_results.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_tables)
perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	# participants	WD mean and CI [m]	INDIP mean and CI [m]	Bias and LoA [m]	Abs. Error [m]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
SlZjilstra - MS-MS	MobGap	1168	1.02 [1.01, 1.04]	1.02 [1.01, 1.03]	0.00 [-0.26, 0.27]	0.11 [0.10, 0.11]^*	3.15 [2.10, 4.20]	12.61 [11.81, 13.40]^*	0.84 [0.82, 0.85]	113
SlZjilstra - MS-MS	Original Implementation	1168	1.04 [1.03, 1.05]	1.02 [1.01, 1.03]	0.02 [-0.27, 0.31]	0.12 [0.11, 0.12]	5.58 [4.38, 6.79]	14.10 [13.12, 15.07]	0.80 [0.78, 0.82]	145
SlZjilstra - MS-all	MobGap	1168	1.06 [1.04, 1.07]	1.02 [1.01, 1.03]	0.04 [-0.23, 0.31]	0.11 [0.11, 0.12]^*	6.57 [5.48, 7.65]	13.37 [12.50, 14.23]^*	0.83 [0.79, 0.86]	113
SlZjilstra - MS-all	Original Implementation	1168	1.08 [1.06, 1.09]	1.02 [1.01, 1.03]	0.05 [-0.23, 0.34]	0.12 [0.12, 0.13]	9.07 [7.83, 10.32]	15.08 [14.03, 16.14]	0.79 [0.72, 0.84]	145

Per Cohort#

The results below represent the average performance across all trials of all participants within a cohort.

fig, ax = plt.subplots()
sns.boxplot(
    data=lab_results,
    x="cohort",
    y="wb__abs_error",
    hue="algo_with_version",
    order=cohort_order,
    ax=ax,
)
fig.show()
perf_metrics_cohort = (
    lab_results.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_tables)
    .loc[cohort_order]
)
perf_metrics_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	# participants	WD mean and CI [m]	INDIP mean and CI [m]	Bias and LoA [m]	Abs. Error [m]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Failed WBs
HA	SlZjilstra - MS-MS	MobGap	227	1.07 [1.05, 1.10]	1.08 [1.05, 1.10]	-0.01 [-0.26, 0.25]	0.11 [0.10, 0.12]	1.14 [-0.75, 3.03]	11.22 [9.92, 12.52]	0.77 [0.71, 0.82]	36
	SlZjilstra - MS-MS	Original Implementation	227	1.08 [1.06, 1.10]	1.08 [1.05, 1.10]	-0.00 [-0.27, 0.27]	0.11 [0.10, 0.12]	2.26 [0.04, 4.47]	12.21 [10.51, 13.91]	0.73 [0.66, 0.79]	36
	SlZjilstra - MS-all	MobGap	227	1.11 [1.08, 1.13]	1.08 [1.05, 1.10]	0.03 [-0.23, 0.29]	0.11 [0.10, 0.12]	4.49 [2.54, 6.45]	11.81 [10.41, 13.21]	0.76 [0.69, 0.82]	36
	SlZjilstra - MS-all	Original Implementation	227	1.11 [1.09, 1.14]	1.08 [1.05, 1.10]	0.04 [-0.24, 0.31]	0.12 [0.10, 0.13]	5.64 [3.35, 7.92]	12.78 [10.95, 14.61]	0.72 [0.63, 0.78]	36
CHF	SlZjilstra - MS-MS	MobGap	106	1.04 [1.00, 1.09]	1.10 [1.05, 1.15]	-0.06 [-0.33, 0.22]	0.11 [0.09, 0.13]	-3.81 [-6.26, -1.37]	10.15 [8.47, 11.83]	0.81 [0.69, 0.88]	9
	SlZjilstra - MS-MS	Original Implementation	106	1.09 [1.05, 1.13]	1.10 [1.05, 1.15]	-0.08 [-0.40, 0.25]	0.14 [0.12, 0.16]	-4.63 [-7.50, -1.75]	12.22 [10.29, 14.14]	0.72 [0.53, 0.83]	38
	SlZjilstra - MS-all	MobGap	106	1.08 [1.04, 1.12]	1.10 [1.05, 1.15]	-0.02 [-0.30, 0.25]	0.10 [0.08, 0.12]	-0.63 [-3.15, 1.90]	9.79 [8.07, 11.51]	0.83 [0.75, 0.88]	9
	SlZjilstra - MS-all	Original Implementation	106	1.13 [1.09, 1.17]	1.10 [1.05, 1.15]	-0.04 [-0.36, 0.28]	0.13 [0.11, 0.15]	-1.47 [-4.44, 1.49]	11.89 [9.93, 13.85]	0.75 [0.62, 0.84]	38
COPD	SlZjilstra - MS-MS	MobGap	214	1.12 [1.10, 1.14]	1.10 [1.07, 1.12]	0.02 [-0.19, 0.23]	0.09 [0.08, 0.10]	3.29 [1.64, 4.94]	9.16 [7.93, 10.39]	0.79 [0.72, 0.84]	34
	SlZjilstra - MS-MS	Original Implementation	214	1.13 [1.11, 1.15]	1.10 [1.07, 1.12]	0.03 [-0.19, 0.25]	0.09 [0.08, 0.10]	5.03 [2.92, 7.13]	9.93 [8.07, 11.78]	0.73 [0.64, 0.80]	34
	SlZjilstra - MS-all	MobGap	214	1.15 [1.13, 1.18]	1.10 [1.07, 1.12]	0.06 [-0.16, 0.28]	0.10 [0.09, 0.11]	6.72 [5.01, 8.42]	10.54 [9.22, 11.87]	0.75 [0.57, 0.84]	34
	SlZjilstra - MS-all	Original Implementation	214	1.17 [1.14, 1.19]	1.10 [1.07, 1.12]	0.07 [-0.16, 0.29]	0.10 [0.09, 0.11]	8.50 [6.33, 10.68]	11.32 [9.34, 13.30]	0.68 [0.44, 0.81]	34
MS	SlZjilstra - MS-MS	MobGap	228	1.06 [1.03, 1.09]	1.04 [1.01, 1.07]	0.02 [-0.28, 0.32]	0.12 [0.11, 0.13]	4.13 [1.80, 6.46]	13.42 [11.76, 15.07]	0.78 [0.72, 0.83]	6
	SlZjilstra - MS-MS	Original Implementation	228	1.08 [1.05, 1.11]	1.04 [1.01, 1.07]	0.04 [-0.27, 0.36]	0.13 [0.12, 0.14]	7.14 [4.53, 9.76]	14.76 [12.74, 16.77]	0.73 [0.65, 0.79]	6
	SlZjilstra - MS-all	MobGap	228	1.10 [1.06, 1.13]	1.04 [1.01, 1.07]	0.06 [-0.25, 0.36]	0.13 [0.11, 0.14]	7.58 [5.17, 9.99]	14.13 [12.27, 15.98]	0.76 [0.66, 0.83]	6
	SlZjilstra - MS-all	Original Implementation	228	1.12 [1.09, 1.15]	1.04 [1.01, 1.07]	0.08 [-0.23, 0.40]	0.14 [0.12, 0.15]	10.68 [7.98, 13.39]	15.70 [13.44, 17.96]	0.70 [0.53, 0.80]	6
PD	SlZjilstra - MS-MS	MobGap	224	0.99 [0.96, 1.02]	1.00 [0.97, 1.03]	-0.01 [-0.28, 0.26]	0.11 [0.10, 0.13]	1.57 [-0.60, 3.74]	12.95 [11.52, 14.38]	0.81 [0.75, 0.85]	28
	SlZjilstra - MS-MS	Original Implementation	224	1.01 [0.99, 1.04]	1.00 [0.97, 1.03]	0.01 [-0.27, 0.29]	0.12 [0.11, 0.13]	4.44 [1.97, 6.91]	13.86 [12.01, 15.71]	0.78 [0.72, 0.83]	30
	SlZjilstra - MS-all	MobGap	224	1.02 [0.99, 1.05]	1.00 [0.97, 1.03]	0.02 [-0.25, 0.30]	0.12 [0.11, 0.13]	4.94 [2.70, 7.18]	13.44 [11.86, 15.02]	0.80 [0.75, 0.85]	28
	SlZjilstra - MS-all	Original Implementation	224	1.05 [1.02, 1.07]	1.00 [0.97, 1.03]	0.05 [-0.23, 0.33]	0.12 [0.11, 0.13]	7.89 [5.34, 10.44]	14.64 [12.61, 16.67]	0.77 [0.69, 0.83]	30
PFF	SlZjilstra - MS-MS	MobGap	169	0.85 [0.81, 0.88]	0.82 [0.78, 0.87]	0.02 [-0.25, 0.29]	0.11 [0.10, 0.12]	9.80 [5.58, 14.03]	17.79 [14.19, 21.38]	0.87 [0.83, 0.91]	0
	SlZjilstra - MS-MS	Original Implementation	169	0.86 [0.83, 0.89]	0.82 [0.78, 0.87]	0.04 [-0.25, 0.33]	0.13 [0.11, 0.14]	13.36 [8.69, 18.02]	20.86 [16.85, 24.87]	0.84 [0.78, 0.88]	1
	SlZjilstra - MS-all	MobGap	169	0.87 [0.84, 0.91]	0.82 [0.78, 0.87]	0.05 [-0.21, 0.31]	0.11 [0.10, 0.13]^*	13.44 [9.08, 17.81]	19.10 [15.24, 22.96]	0.87 [0.80, 0.91]	0
	SlZjilstra - MS-all	Original Implementation	169	0.89 [0.86, 0.92]	0.82 [0.78, 0.87]	0.07 [-0.22, 0.35]	0.13 [0.12, 0.15]	17.11 [12.28, 21.93]	22.73 [18.46, 27.00]	0.83 [0.72, 0.89]	1
