Cadence estimation

The following provides an analysis and comparison of the Mobilise-D algorithm pipeline on the Mobilise-D Technical Validation Study (TVS) dataset for the estimation of cadence, both in free-living and in laboratory conditions. In this example, we look into the performance of the Python implementation of the pipeline compared to the reference data. We also compare this performance to that of the original Matlab-based implementation [1].

Note

If you are interested in how these results are calculated, head over to the processing page.

from typing import Optional

The pipelines that are compared are listed below. Note that we use “MobGap” to refer to the reimplemented Python algorithms and “Original Implementation” to refer to the original Matlab-based implementation.

algorithms = {
    "Official_MobiliseD_Pipeline": ("Mobilise-D Pipeline", "MobGap"),
    "EScience_MobiliseD_Pipeline": (
        "Mobilise-D Pipeline",
        "Original Implementation",
    ),
}

The code below loads the data and prepares it for the analysis. By default, the data will be downloaded from an online repository (and cached locally). If you want to use a local copy of the data, set the MOBGAP_VALIDATION_DATA_PATH environment variable to its location and MOBGAP_VALIDATION_USE_LOCAL_DATA to 1.

The file download prints some log information, which can usually be ignored. You can also change the version parameter to load a different version of the data.
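A minimal sketch of how the two environment variables could be set from Python before the loading code runs. The path is a placeholder assumption; the last lines only mirror the local_data_path logic used below, with os.environ standing in for mobgap's get_env_var.

```python
import os
from pathlib import Path

# Placeholder path (assumption): point this at your local copy of the data.
os.environ["MOBGAP_VALIDATION_DATA_PATH"] = "/path/to/mobgap_validation_data"
os.environ["MOBGAP_VALIDATION_USE_LOCAL_DATA"] = "1"

# Same decision as in the loading code below, but reading os.environ directly
use_local = int(os.environ.get("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
local_data_path = (
    Path(os.environ["MOBGAP_VALIDATION_DATA_PATH"]) / "results"
    if use_local
    else None
)
print(local_data_path)
```

The variables have to be set before the loader is created, because the path is only resolved once when local_data_path is computed.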

from pathlib import Path

import pandas as pd
from mobgap.data.validation_results import ValidationResultLoader
from mobgap.utils.misc import get_env_var


def format_loaded_results(
    values: dict[tuple[str, str], pd.DataFrame],
    index_cols: list[str],
    col_prefix_filter: Optional[str],
    convert_rel_error: bool = False,
) -> pd.DataFrame:
    formatted = (
        pd.concat(values, names=["algo", "version", *index_cols])
        .pipe(
            lambda df: (
                df.filter(like=col_prefix_filter) if col_prefix_filter else df
            )
        )
        .reset_index()
        .assign(
            algo_with_version=lambda df: (
                df["algo"] + " (" + df["version"] + ")"
            ),
            _combined="combined",
        )
    )

    if col_prefix_filter:
        formatted.columns = formatted.columns.str.removeprefix(
            col_prefix_filter
        )

    if convert_rel_error:
        rel_cols = [c for c in formatted.columns if "rel_error" in c]
        formatted[rel_cols] = formatted[rel_cols] * 100

    return formatted


local_data_path = (
    Path(get_env_var("MOBGAP_VALIDATION_DATA_PATH")) / "results"
    if int(get_env_var("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
    else None
)
__RESULT_VERSION = "v1.2.0"
loader = ValidationResultLoader(
    "full_pipeline", result_path=local_data_path, version=__RESULT_VERSION
)

# Loading free-living data
free_living_index_cols = [
    "cohort",
    "participant_id",
    "time_measure",
    "recording",
    "recording_name",
    "recording_name_pretty",
]

_free_living_results = {  # Matched and aggregate/combined per-recording results for the 2.5 h free-living recordings
    v: loader.load_single_results(k, "free_living")
    for k, v in algorithms.items()
}

_free_living_results_raw = {  # Matched per-WB results for the 2.5 h free-living recordings
    v: loader.load_single_csv_file(k, "free_living", "raw_matched_errors.csv")
    for k, v in algorithms.items()
}
free_living_results_combined = format_loaded_results(
    _free_living_results,
    free_living_index_cols,
    "combined__",
    convert_rel_error=True,
)
free_living_results_matched = format_loaded_results(
    _free_living_results,
    free_living_index_cols,
    "matched__",
    convert_rel_error=True,
)
free_living_results_matched_raw = format_loaded_results(
    values=_free_living_results_raw,
    index_cols=free_living_index_cols,
    col_prefix_filter=None,
    convert_rel_error=True,
)

del _free_living_results, _free_living_results_raw

# Loading laboratory data
laboratory_index_cols = [
    "cohort",
    "participant_id",
    "time_measure",
    "test",
    "trial",
    "test_name",
    "test_name_pretty",
]

_laboratory_results = {  # Matched and aggregate/combined per-recording results for the laboratory recordings
    v: loader.load_single_results(k, "laboratory")
    for k, v in algorithms.items()
}

_laboratory_results_raw = {  # Matched per-WB results for the laboratory recordings
    v: loader.load_single_csv_file(k, "laboratory", "raw_matched_errors.csv")
    for k, v in algorithms.items()
}
laboratory_results_combined = format_loaded_results(
    _laboratory_results,
    laboratory_index_cols,
    "combined__",
    convert_rel_error=True,
)
laboratory_results_matched = format_loaded_results(
    _laboratory_results,
    laboratory_index_cols,
    "matched__",
    convert_rel_error=True,
)
laboratory_results_matched_raw = format_loaded_results(
    values=_laboratory_results_raw,
    index_cols=laboratory_index_cols,
    col_prefix_filter=None,
    convert_rel_error=True,
)

del _laboratory_results, _laboratory_results_raw
cohort_order = ["HA", "CHF", "COPD", "MS", "PD", "PFF"]

Performance metrics

Below you can find the setup for all performance metrics that we will calculate. For the comparison, we use the combined__ (aggregated per recording) and matched__ (true-positive WBs) results loaded above.

Note

For the evaluation of the full pipeline performance, two types of aggregation are performed, which will be described later on in the example.
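The tables below report means together with a confidence interval ("mean and CI"), produced by A.conf_intervals. As an illustration only, here is a sketch of a common choice, a two-sided 95% t-based confidence interval of the mean; we assume, but have not verified, that mobgap computes its intervals in the same way.

```python
import numpy as np
from scipy import stats


def mean_with_ci(values, confidence=0.95):
    """Mean and t-based confidence interval of the mean (NaNs are dropped)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    mean = x.mean()
    sem = stats.sem(x)  # standard error of the mean (ddof=1)
    low, high = stats.t.interval(confidence, df=len(x) - 1, loc=mean, scale=sem)
    return mean, (low, high)


# Hypothetical per-participant cadence values [steps/min]
mean, (low, high) = mean_with_ci([84.0, 86.0, 88.0, 90.0])
print(f"{mean:.2f} [{low:.2f}, {high:.2f}]")
```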

from functools import partial

from mobgap.pipeline.evaluation import CustomErrorAggregations as A
from mobgap.utils.df_operations import (
    CustomOperation,
    apply_aggregations,
    apply_transformations,
    multilevel_groupby_apply_merge,
)
from mobgap.utils.tables import FormatTransformer as F
from mobgap.utils.tables import RevalidationInfo, revalidation_table_styles
from mobgap.utils.tables import StatsFunctions as S

custom_aggs_combined = [
    CustomOperation(
        identifier=None,
        function=A.n_datapoints,
        column_name=[("n_datapoints", "all")],
    ),
    ("cadence_spm__detected", ["mean", A.conf_intervals]),
    ("cadence_spm__reference", ["mean", A.conf_intervals]),
    ("cadence_spm__error", ["mean", A.loa]),
    ("cadence_spm__abs_error", ["mean", A.conf_intervals]),
    ("cadence_spm__rel_error", ["mean", A.conf_intervals]),
    ("cadence_spm__abs_rel_error", ["mean", A.conf_intervals]),
    CustomOperation(
        identifier=None,
        function=partial(
            A.icc,
            reference_col_name="cadence_spm__reference",
            detected_col_name="cadence_spm__detected",
            icc_type="icc2",
            # For the lab data, some trials have no results for the old algorithms.
            nan_policy="omit",
        ),
        column_name=[("icc", "all"), ("icc_ci", "all")],
    ),
]

custom_aggs_matched = [
    CustomOperation(
        identifier=None,
        function=lambda df_: df_["n_matched_wbs"].sum(),
        column_name=[("n_wbs_matched", "all")],
    ),
    *custom_aggs_combined,
]

stats_transform = [
    CustomOperation(
        identifier=None,
        function=partial(
            S.pairwise_tests,
            value_col=c,
            between="version",
            reference_group_key="Original Implementation",
        ),
        column_name=[("stats_metadata", c)],
    )
    for c in [
        "cadence_spm__abs_error",
        "cadence_spm__abs_rel_error",
    ]
]

format_transforms_combined = [
    CustomOperation(
        identifier=None,
        function=lambda df_: df_[("n_datapoints", "all")].astype(int),
        column_name="n_datapoints",
    ),
    *(
        CustomOperation(
            identifier=None,
            function=partial(
                F.value_with_metadata,
                value_col=("mean", c),
                other_columns={
                    "range": ("conf_intervals", c),
                    "stats_metadata": ("stats_metadata", c),
                },
            ),
            column_name=c,
        )
        for c in [
            "cadence_spm__reference",
            "cadence_spm__detected",
            "cadence_spm__abs_error",
            "cadence_spm__rel_error",
            "cadence_spm__abs_rel_error",
        ]
    ),
    CustomOperation(
        identifier=None,
        function=partial(
            F.value_with_metadata,
            value_col=("mean", "cadence_spm__error"),
            other_columns={"range": ("loa", "cadence_spm__error")},
        ),
        column_name="cadence_spm__error",
    ),
    CustomOperation(
        identifier=None,
        function=partial(
            F.value_with_metadata,
            value_col=("icc", "all"),
            other_columns={"range": ("icc_ci", "all")},
        ),
        column_name="icc",
    ),
]

format_transforms_matched = [
    CustomOperation(
        identifier=None,
        function=lambda df_: df_[("n_wbs_matched", "all")].astype(int),
        column_name="n_wbs_matched",
    ),
    *format_transforms_combined,
]


final_names_combined = {
    "n_datapoints": "# participants",
    "cadence_spm__detected": "WD mean and CI [steps/min]",
    "cadence_spm__reference": "INDIP mean and CI [steps/min]",
    "cadence_spm__error": "Bias and LoA [steps/min]",
    "cadence_spm__abs_error": "Abs. Error [steps/min]",
    "cadence_spm__rel_error": "Rel. Error [%]",
    "cadence_spm__abs_rel_error": "Abs. Rel. Error [%]",
    "icc": "ICC",
}

final_names_matched = {
    **final_names_combined,
    "n_wbs_matched": "# Matched WBs",
}

validation_thresholds = {
    "Abs. Error [steps/min]": RevalidationInfo(
        threshold=None, higher_is_better=False
    ),
    "Abs. Rel. Error [%]": RevalidationInfo(
        threshold=20, higher_is_better=False
    ),
    "ICC": RevalidationInfo(threshold=0.7, higher_is_better=True),
}


def format_tables_combined(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.pipe(apply_transformations, format_transforms_combined)
        .rename(columns=final_names_combined)
        .loc[:, list(final_names_combined.values())]
    )


def format_tables_matched(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.pipe(apply_transformations, format_transforms_matched)
        .rename(columns=final_names_matched)
        .loc[:, list(final_names_matched.values())]
    )
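The ICC column above is configured with icc_type="icc2". Assuming this corresponds to ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement), the following sketch computes it from the two-way ANOVA mean squares, ICC = (MSR - MSE) / (MSR + (k - 1) MSE + k (MSC - MSE) / n). It is an illustration, not mobgap's implementation, and it expects a complete table without NaNs.

```python
import numpy as np


def icc2_1(ratings):
    """ICC(2,1) for an array of shape (n_subjects, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between raters/systems
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Columns: hypothetical reference and detected cadence [steps/min]
print(icc2_1([[80, 80], [90, 90], [100, 100]]))  # perfect agreement
print(icc2_1([[80, 82], [90, 92], [100, 102]]))  # constant 2 steps/min offset
```

In the second example, a constant offset lowers the ICC below 1 even though both systems are perfectly correlated, which is why an absolute-agreement ICC is stricter than a correlation coefficient.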

Free-living dataset

Combined/Aggregated Evaluation

To mimic the actual use of wearable devices, where decisions are made on aggregated measures over a longer measurement period rather than WB by WB, our primary comparison is based on the median gait metrics over the entire recording. We call this the combined or aggregated evaluation. For this, we combined all WBs of a datapoint by taking the median of the calculated cadence, and then compared these combined values between the systems.
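A minimal sketch of this aggregation with pandas, using made-up per-WB values. The column names follow the cadence_spm__* naming of the result tables, but the input frames are hypothetical, and this is not the code that produced the results (see the processing page for that). The error definitions (detected minus reference, relative error in percent of the reference) match the sign convention visible in the tables.

```python
import pandas as pd

# Hypothetical per-WB cadence values [steps/min] for two recordings.
# The two systems detect their WBs independently, so no WB pairing is needed.
wd_wbs = pd.DataFrame(
    {"recording": ["r1", "r1", "r1", "r2", "r2"],
     "cadence_spm": [80.0, 90.0, 100.0, 70.0, 74.0]}
)
ref_wbs = pd.DataFrame(
    {"recording": ["r1", "r1", "r2", "r2", "r2"],
     "cadence_spm": [85.0, 95.0, 72.0, 76.0, 80.0]}
)


def combine(wbs: pd.DataFrame) -> pd.Series:
    """Aggregate all WBs of a recording into one value via the median."""
    return wbs.groupby("recording")["cadence_spm"].median()


combined = pd.DataFrame(
    {"cadence_spm__detected": combine(wd_wbs),
     "cadence_spm__reference": combine(ref_wbs)}
)
combined["cadence_spm__error"] = (
    combined["cadence_spm__detected"] - combined["cadence_spm__reference"]
)
combined["cadence_spm__abs_error"] = combined["cadence_spm__error"].abs()
combined["cadence_spm__rel_error"] = (
    combined["cadence_spm__error"] / combined["cadence_spm__reference"] * 100
)
combined["cadence_spm__abs_rel_error"] = combined["cadence_spm__rel_error"].abs()
print(combined.round(2))
```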

Note

In the free-living dataset, each datapoint represents one 2.5h recording.

All results across all cohorts

The results below represent the average performance across all participants independent of the cohort in terms of error, relative error, absolute error, and absolute relative error.

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_context("talk")
metrics = {
    "abs_rel_error": "Abs. Rel. Error (%)",
    "error": "Error (steps/min)",
    "rel_error": "Rel. Error (%)",
    "abs_error": "Abs. Error (steps/min)",
}


def multi_metric_plot(data, metrics, nrows, ncols):
    fig, axs = plt.subplots(
        nrows, ncols, sharex=True, figsize=(ncols * 6, nrows * 4 + 2)
    )
    for ax, (metric, metric_label) in zip(axs.flatten(), metrics.items()):
        overall_df = data[["version", f"cadence_spm__{metric}"]].rename(
            columns={f"cadence_spm__{metric}": metric_label}
        )

        sns.boxplot(
            data=overall_df, x="version", hue="version", y=metric_label, ax=ax
        )

        ax.set_title(metric_label)
        ax.set_ylabel(metric_label)

        ax.tick_params(axis="both", which="major")
        ax.tick_params(axis="both", which="minor")

        ax.grid(True)

    plt.tight_layout()
    plt.show()


free_living_results_combined.pipe(multi_metric_plot, metrics, 2, 2)

Figure: boxplots per version of Abs. Rel. Error (%), Error (steps/min), Rel. Error (%), and Abs. Error (steps/min)

free_living_combined_perf_metrics_all = free_living_results_combined.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs_combined),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_tables_combined)
free_living_combined_perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC
Mobilise-D Pipeline	MobGap	101	86.49 [85.26, 87.72]	85.78 [83.76, 87.79]	0.87 [-12.50, 14.23]	4.76 [3.80, 5.72]	1.91 [0.03, 3.80]	5.92 [4.38, 7.45]	0.68 [0.56, 0.77]
Mobilise-D Pipeline	Original Implementation	101	87.03 [85.58, 88.48]	85.78 [83.76, 87.79]	1.14 [-12.13, 14.41]	5.02 [4.11, 5.93]	2.08 [0.42, 3.75]	6.10 [4.87, 7.33]	0.71 [0.60, 0.80]
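The "Bias and LoA" column follows the Bland-Altman idea: the bias is the mean of the errors (detected minus reference), and the limits of agreement (LoA) span the range expected to contain most of the errors. The sketch below assumes the common 95% definition, bias ± 1.96 × the sample standard deviation of the errors; the exact factor and degrees of freedom used by A.loa are an assumption here.

```python
import numpy as np


def bias_and_loa(errors, z=1.96):
    """Return (bias, lower LoA, upper LoA) of paired errors (detected - reference)."""
    errors = np.asarray(errors, dtype=float)
    errors = errors[~np.isnan(errors)]  # ignore datapoints without a pair
    bias = errors.mean()
    sd = errors.std(ddof=1)  # sample standard deviation
    return bias, bias - z * sd, bias + z * sd


# Hypothetical per-participant cadence errors [steps/min]
bias, lo, hi = bias_and_loa([2.0, -1.0, 0.5, 3.5, np.nan])
print(f"{bias:.2f} [{lo:.2f}, {hi:.2f}]")
```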

Residual plots

from mobgap.plotting import move_legend_outside, residual_plot


def combo_residual_plot(data, name=None):
    name = name or data.name
    fig, axs = plt.subplots(
        ncols=2,
        sharey=True,
        sharex=True,
        figsize=(12, 9),
        constrained_layout=True,
    )
    fig.suptitle(name)
    for (version, subdata), ax in zip(data.groupby("version"), axs):
        residual_plot(
            subdata,
            "cadence_spm__reference",
            "cadence_spm__detected",
            "cohort",
            "steps/min",
            ax=ax,
            legend=ax == axs[-1],
        )
        ax.set_title(version)
    move_legend_outside(fig, axs[-1])
    plt.show()


free_living_results_combined.query('algo == "Mobilise-D Pipeline"').pipe(
    combo_residual_plot, name="Aggregated Analysis - Cadence"
)

Figure: Aggregated Analysis - Cadence (panels: MobGap, Original Implementation)

Per-cohort analysis

The results below show the absolute error of the cadence estimation across all participants within each cohort.

fig, ax = plt.subplots(figsize=(12, 6))
sns.boxplot(
    data=free_living_results_combined,
    x="cohort",
    y="cadence_spm__abs_error",
    hue="version",
    order=cohort_order,
    showmeans=True,
    ax=ax,
).legend().set_title(None)
ax.set_ylabel("Absolute Error [steps/min]")
ax.set_title("Absolute Error - Combined Analysis")
fig.show()

free_living_combined_perf_metrics_cohort = (
    free_living_results_combined.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs_combined),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_tables_combined)
    .loc[cohort_order]
)
free_living_combined_perf_metrics_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC
HA	Mobilise-D Pipeline	MobGap	20	85.21 [83.34, 87.08]	87.12 [83.67, 90.56]	-1.90 [-10.78, 6.97]	3.94 [2.71, 5.18]	-1.79 [-4.06, 0.48]	4.45 [3.11, 5.79]	0.72 [0.42, 0.88]
HA	Mobilise-D Pipeline	Original Implementation	20	83.96 [81.93, 85.99]	87.12 [83.67, 90.56]	-3.15 [-12.74, 6.43]	4.85 [3.50, 6.21]	-3.23 [-5.65, -0.81]	5.43 [4.03, 6.84]	0.64 [0.23, 0.85]
CHF	Mobilise-D Pipeline	MobGap	10	87.01 [83.29, 90.72]	90.03 [85.17, 94.89]	-3.02 [-11.23, 5.18]	4.35 [2.74, 5.95]	-3.13 [-6.06, -0.20]	4.80 [3.10, 6.51]	0.76 [0.24, 0.94]
CHF	Mobilise-D Pipeline	Original Implementation	10	93.28 [88.55, 98.01]	90.03 [85.17, 94.89]	3.29 [-3.49, 10.07]	4.05 [2.58, 5.52]	3.81 [1.33, 6.28]	4.62 [2.85, 6.40]	0.84 [0.23, 0.97]
COPD	Mobilise-D Pipeline	MobGap	17	83.41 [81.21, 85.60]	82.55 [79.23, 85.86]	0.86 [-5.26, 6.98]	2.57 [1.68, 3.46]	1.31 [-0.56, 3.17]	3.20 [2.01, 4.40]	0.86 [0.66, 0.95]
COPD	Mobilise-D Pipeline	Original Implementation	17	81.84 [79.56, 84.12]	82.55 [79.23, 85.86]	-0.70 [-7.01, 5.60]	2.61 [1.71, 3.52]	-0.61 [-2.49, 1.28]	3.16 [2.05, 4.27]	0.86 [0.65, 0.95]
MS	Mobilise-D Pipeline	MobGap	18	89.27 [86.69, 91.85]	87.51 [83.40, 91.62]	1.77 [-6.73, 10.26]	3.70 [2.42, 4.97]	2.50 [-0.06, 5.06]	4.51 [2.68, 6.34]	0.81 [0.57, 0.93]
MS	Mobilise-D Pipeline	Original Implementation	18	90.00 [87.63, 92.36]	87.51 [83.40, 91.62]	2.49 [-6.45, 11.43]	3.78 [2.16, 5.39]	3.39 [0.61, 6.16]	4.71 [2.42, 7.01]	0.77 [0.44, 0.91]
PD	Mobilise-D Pipeline	MobGap	19	89.80 [86.42, 93.18]	88.46 [82.92, 94.01]	1.34 [-16.62, 19.29]	6.58 [3.73, 9.43]	2.56 [-2.00, 7.13]	7.54 [4.35, 10.72]	0.61 [0.22, 0.83]
PD	Mobilise-D Pipeline	Original Implementation	19	91.29 [88.12, 94.47]	88.46 [82.92, 94.01]	2.83 [-15.94, 21.60]	7.71 [4.95, 10.46]	4.37 [-0.43, 9.17]	8.91 [5.72, 12.10]	0.54 [0.14, 0.79]
PFF	Mobilise-D Pipeline	MobGap	17	84.14 [80.77, 87.51]	79.74 [73.26, 86.22]	5.21 [-14.01, 24.43]	7.41 [3.53, 11.29]	8.91 [0.57, 17.24]	10.99 [3.28, 18.70]	0.53 [0.09, 0.80]
PFF	Mobilise-D Pipeline	Original Implementation	17	84.63 [80.34, 88.91]	79.74 [73.26, 86.22]	3.73 [-13.11, 20.56]	6.53 [3.41, 9.65]	6.43 [0.43, 12.42]	9.09 [3.99, 14.19]	0.68 [0.30, 0.87]

Scatter plot: The plot below shows the detected versus the reference cadence for all participants, colored by cohort. The correlation coefficient, p-value, and confidence interval of the regression line are shown in the plot. Each datapoint represents one participant.
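Statistics of this kind can be computed with scipy.stats.linregress, which returns the ordinary least-squares line together with Pearson's r and its p-value. This is a sketch under the assumption that plot_regline uses an OLS fit; we have not checked its internals. The 95% CI of the slope is derived from the reported standard error and a t-distribution with n - 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant cadence values [steps/min]
reference = np.array([80.0, 85.0, 90.0, 95.0, 100.0])
detected = np.array([82.0, 84.0, 91.0, 96.0, 99.0])

res = stats.linregress(reference, detected)
n = len(reference)
t_crit = stats.t.ppf(0.975, n - 2)  # two-sided 95% critical value
slope_ci = (res.slope - t_crit * res.stderr, res.slope + t_crit * res.stderr)

print(f"r = {res.rvalue:.3f}, p = {res.pvalue:.4f}")
print(f"slope = {res.slope:.2f}, 95% CI = [{slope_ci[0]:.2f}, {slope_ci[1]:.2f}]")
print(f"intercept = {res.intercept:.2f}")
```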

from mobgap.plotting import calc_min_max_with_margin, make_square, plot_regline


def combo_scatter_plot(data, name=None):
    name = name or data.name
    fig, axs = plt.subplots(
        ncols=2,
        sharey=True,
        sharex=True,
        figsize=(12, 8),
        constrained_layout=True,
    )
    fig.suptitle(name)

    min_max = calc_min_max_with_margin(
        data["cadence_spm__reference"],
        data["cadence_spm__detected"],
    )

    for (version, subdata), ax in zip(data.groupby("version"), axs):
        subdata = subdata[
            [
                "cadence_spm__reference",
                "cadence_spm__detected",
                "cohort",
            ]
        ].dropna(how="any")

        sns.scatterplot(
            subdata,
            x="cadence_spm__reference",
            y="cadence_spm__detected",
            hue="cohort",
            ax=ax,
            legend=ax == axs[-1],
        )

        plot_regline(
            subdata["cadence_spm__reference"],
            subdata["cadence_spm__detected"],
            ax=ax,
        )

        make_square(ax, min_max, draw_diagonal=True)

        ax.set_title(version)
        ax.set_xlabel("Reference [steps/min]")
        ax.set_ylabel("Detected [steps/min]")
        ax.tick_params(axis="both", labelsize=20)

    move_legend_outside(fig, axs[-1])

    plt.show()


free_living_results_combined.query('algo == "Mobilise-D Pipeline"').pipe(
    combo_scatter_plot, name="Mobilise-D Pipeline - Cadence"
)

Figure: Mobilise-D Pipeline - Cadence (panels: MobGap, Original Implementation)

Matched/True Positive Evaluation

The “Matched” evaluation directly compares the performance of cadence estimation on only those WBs that were detected by both systems (true positives). WBs were included in the true-positive analysis if the WBs detected by the two systems overlapped by more than 80% (details about the selection of this threshold can be found in [1]). The 80% threshold was selected as a trade-off that allowed us (i) to achieve as close as possible a like-for-like comparison between the selected WBs (INDIP vs. wearable device), while (ii) retaining the minimum number of WBs needed for sufficient statistical power (i.e., at least 101 WBs for each cohort). This target was based on the number of WBs rather than on a percentage of the total WBs, so that we could meet the criteria established by statistical experts for robust statistical analysis after sample-size re-evaluation (total number of WBs > 101, corresponding to an ICC > 0.7 and a CI of 0.2).
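A simplified sketch of the matching step. The text above does not define the overlap measure, so the code assumes intersection over union (IoU) of the WB sample intervals and a greedy one-to-one assignment; the helper names and both of these choices are illustrative assumptions, not necessarily what mobgap does.

```python
def interval_iou(a, b):
    """Intersection over union of two [start, end) sample intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_wbs(detected, reference, threshold=0.8):
    """Greedily pair each reference WB with the best still-unmatched detected WB
    whose IoU is strictly above the threshold. Returns (det_idx, ref_idx) pairs."""
    matches, used = [], set()
    for r_idx, ref in enumerate(reference):
        best, best_iou = None, threshold
        for d_idx, det in enumerate(detected):
            if d_idx in used:
                continue
            iou = interval_iou(det, ref)
            if iou > best_iou:
                best, best_iou = d_idx, iou
        if best is not None:
            used.add(best)
            matches.append((best, r_idx))
    return matches


# Hypothetical WB intervals in samples
detected = [(0, 1000), (2000, 2500), (5000, 6000)]
reference = [(50, 1000), (2000, 3000), (5100, 6000)]
print(match_wbs(detected, reference))  # the middle pair only overlaps by 50%
```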

Note

In contrast to the results published in [1], the primary analysis of the matched results here is based on the performance metrics averaged across all matched WBs of each recording/participant. The original publication averaged the performance metrics across all matched WBs directly, without this additional per-recording aggregation.
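The two averaging schemes can give noticeably different numbers when participants contribute very different numbers of WBs. A small sketch with made-up per-WB absolute errors:

```python
import pandas as pd

# Hypothetical per-WB absolute errors [steps/min]: r1 has 4 matched WBs, r2 only 1
wb_errors = pd.DataFrame(
    {"recording": ["r1"] * 4 + ["r2"],
     "cadence_spm__abs_error": [2.0, 2.0, 2.0, 2.0, 12.0]}
)

# Two-level averaging (this example): mean per recording, then across recordings
per_recording = wb_errors.groupby("recording")["cadence_spm__abs_error"].mean()
two_level = per_recording.mean()  # (2 + 12) / 2

# Pooled averaging (original publication): mean over all matched WBs
pooled = wb_errors["cadence_spm__abs_error"].mean()  # 20 / 5

print(two_level, pooled)
```

With two-level averaging every participant has the same weight, whereas pooled averaging is dominated by participants with many matched WBs.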

Results across all cohorts

The results below represent the average performance across all participants independent of the cohort in terms of error, relative error, absolute error, and absolute relative error.

free_living_results_matched.pipe(multi_metric_plot, metrics, 2, 2)

As each pipeline version detects different WBs, it is important to compare the number of matched WBs to put all other metrics into perspective.

fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(
    data=free_living_results_matched.groupby(["version"])["n_matched_wbs"]
    .sum()
    .reset_index(),
    x="version",
    y="n_matched_wbs",
    ax=ax,
)
fig.show()

free_living_matched_perf_metrics_all = free_living_results_matched.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs_matched),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_tables_matched)

free_living_matched_perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Matched WBs
Mobilise-D Pipeline	MobGap	101	89.24 [87.85, 90.62]	89.04 [87.16, 90.93]	0.16 [-9.84, 10.16]	4.96 [4.10, 5.83]	1.22 [-0.06, 2.49]	5.89 [4.77, 7.01]	0.82 [0.74, 0.88]	1984
Mobilise-D Pipeline	Original Implementation	101	89.44 [87.94, 90.94]	90.60 [88.76, 92.44]	-1.16 [-9.55, 7.23]	4.72 [3.94, 5.49]	-0.54 [-1.40, 0.33]	5.28 [4.47, 6.09]	0.87 [0.81, 0.91]	1697

Residual plot

free_living_results_matched.query('algo == "Mobilise-D Pipeline"').pipe(
    combo_residual_plot, name="Matched WBs - Cadence"
)

Figure: Matched WBs - Cadence (panels: MobGap, Original Implementation)

Per-cohort analysis

The bar plot below shows the number of matched WBs per cohort, and the boxplot shows the absolute error of the cadence estimation across all participants within each cohort.

fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(
    data=free_living_results_matched.groupby(["version", "cohort"])[
        "n_matched_wbs"
    ]
    .sum()
    .reset_index(),
    hue="version",
    y="n_matched_wbs",
    x="cohort",
    order=cohort_order,
    ax=ax,
)
fig.show()

fig, ax = plt.subplots(figsize=(12, 6))
sns.boxplot(
    data=free_living_results_matched,
    x="cohort",
    y="cadence_spm__abs_error",
    hue="algo_with_version",
    order=cohort_order,
    ax=ax,
).legend().set_title(None)
ax.set_ylabel("Absolute Error [steps/min]")
ax.set_title("Absolute Error - Matched Analysis")
fig.show()

Processing the per-cohort performance table

free_living_matched_perf_metrics_cohort = (
    free_living_results_matched.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs_matched),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_tables_matched)
    .loc[cohort_order]
)

free_living_matched_perf_metrics_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Matched WBs
HA	Mobilise-D Pipeline	MobGap	20	89.91 [87.15, 92.67]	90.47 [86.76, 94.17]	-0.69 [-7.81, 6.43]	5.02 [3.82, 6.23]	0.30 [-1.77, 2.37]	5.93 [4.32, 7.54]	0.88 [0.73, 0.95]	524
HA	Mobilise-D Pipeline	Original Implementation	20	90.02 [87.23, 92.81]	92.72 [89.25, 96.20]	-2.71 [-7.65, 2.24]	4.64 [3.77, 5.51]	-2.48 [-3.55, -1.40]	4.94 [4.17, 5.71]	0.88 [0.30, 0.97]	410
CHF	Mobilise-D Pipeline	MobGap	10	91.98 [88.22, 95.75]	93.24 [88.49, 97.99]	-1.06 [-5.77, 3.66]	4.50 [2.92, 6.08]	-0.38 [-1.81, 1.05]	4.84 [3.34, 6.34]	0.93 [0.74, 0.98]	220
CHF	Mobilise-D Pipeline	Original Implementation	10	93.67 [88.03, 99.30]	94.64 [88.71, 100.57]	-0.98 [-3.56, 1.61]	3.44 [1.69, 5.20]	-0.78 [-1.60, 0.05]	3.55 [1.83, 5.27]	0.99 [0.91, 1.00]	176
COPD	Mobilise-D Pipeline	MobGap	17	85.45 [82.97, 87.94]	84.42 [81.47, 87.36]	0.88 [-2.99, 4.74]	3.79 [3.05, 4.53]	1.55 [0.27, 2.83]	4.75 [3.72, 5.77]	0.92 [0.78, 0.97]	410
COPD	Mobilise-D Pipeline	Original Implementation	17	83.78 [81.42, 86.14]	86.14 [83.24, 89.05]	-2.36 [-6.25, 1.52]	4.23 [3.45, 5.00]	-2.43 [-3.46, -1.40]	4.82 [4.04, 5.60]	0.86 [0.17, 0.96]	323
MS	Mobilise-D Pipeline	MobGap	18	90.71 [87.18, 94.24]	90.39 [86.01, 94.76]	0.32 [-8.22, 8.87]	6.10 [3.96, 8.24]	1.64 [-0.81, 4.09]	7.23 [4.78, 9.68]	0.88 [0.70, 0.95]	327
MS	Mobilise-D Pipeline	Original Implementation	18	91.23 [87.35, 95.10]	89.68 [84.75, 94.60]	1.55 [-5.52, 8.61]	5.33 [3.87, 6.79]	2.98 [0.64, 5.33]	6.72 [4.45, 8.99]	0.92 [0.79, 0.97]	355
PD	Mobilise-D Pipeline	MobGap	19	92.01 [88.88, 95.14]	91.72 [86.38, 97.06]	0.29 [-16.84, 17.42]	5.32 [1.99, 8.66]	1.67 [-3.60, 6.94]	6.45 [1.78, 11.11]	0.61 [0.22, 0.83]	267
PD	Mobilise-D Pipeline	Original Implementation	19	92.47 [89.67, 95.27]	93.66 [89.63, 97.70]	-1.20 [-13.06, 10.67]	5.42 [2.48, 8.36]	-0.14 [-2.57, 2.29]	5.90 [2.99, 8.81]	0.70 [0.37, 0.87]	256
PFF	Mobilise-D Pipeline	MobGap	17	85.52 [81.84, 89.20]	84.59 [79.54, 89.65]	0.92 [-10.07, 11.92]	4.69 [2.56, 6.81]	2.05 [-0.96, 5.06]	5.55 [3.13, 7.96]	0.82 [0.56, 0.94]	236
PFF	Mobilise-D Pipeline	Original Implementation	17	86.12 [82.23, 90.01]	87.14 [81.64, 92.64]	-1.02 [-13.12, 11.08]	4.47 [1.94, 7.01]	-0.35 [-2.96, 2.25]	4.70 [2.65, 6.76]	0.82 [0.51, 0.94]	177

Deep dive investigation: Do errors depend on WB duration or walking speed?

Effect of WB duration

We investigate how the absolute cadence error of all true-positive WBs from the real-world recordings depends on the WB duration reported by the reference system. The top panel shows the WB errors grouped into (partly overlapping) duration bins; the bottom panel shows the number of WBs in each duration bin.
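Because the duration bins overlap ("All" contains every WB, and "> 10 s" overlaps with "10 - 30 s"), a single WB can be counted in several bins. The following sketch is a hypothetical stand-in for cut_into_overlapping_bins that shows the idea; the interval convention (lower bound exclusive, upper bound inclusive) is our assumption.

```python
import numpy as np
import pandas as pd


def overlapping_bins(df, col, bins):
    """Hypothetical helper: copy each row into every bin (lo, hi] containing its value."""
    parts = [
        df[(df[col] > lo) & (df[col] <= hi)].assign(bin=label)
        for label, (lo, hi) in bins.items()
    ]
    return pd.concat(parts, ignore_index=True)


wbs = pd.DataFrame({"duration_s": [5.0, 15.0, 45.0, 150.0]})
bins = {
    "All": (-np.inf, np.inf),
    "<= 10 s": (0, 10),
    "> 10 s": (10, np.inf),
    "10 - 30 s": (10, 30),
}
binned = overlapping_bins(wbs, "duration_s", bins)
print(binned["bin"].value_counts())
```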

import numpy as np
from mobgap.utils.df_operations import cut_into_overlapping_bins


def plot_wb_duration_analysis(df):
    """Generate a single figure with two rows:
    - Top: boxplots of the absolute cadence error per WB-duration bin, one box per version.
    - Bottom: a grouped bar chart of the WB count per duration bin and version.

    df: DataFrame with matched per-WB results, including a 'version' column
    (e.g. "MobGap" or "Original Implementation") and the reference WB start/end.
    """
    fig, axs = plt.subplot_mosaic(
        [["v"], ["v"], ["v"], ["n"]], sharex=True, figsize=(12, 9)
    )
    # Compute WB durations in seconds
    # (start/end are sample indices; dividing by 100 assumes 100 Hz data)
    df_with_durations = df.assign(
        duration_s=lambda df_: (
            (df_["end__reference"] - df_["start__reference"]) / 100
        )
    )

    bins = {
        "All": (-np.inf, np.inf),
        "> 10 s": (10, np.inf),
        "<= 10 s": (0, 10),
        "10 - 30 s": (10, 30),
        "30 - 60 s": (30, 60),
        "60 - 120 s": (60, 120),
        "> 120 s": (120, np.inf),
    }

    binned_df = cut_into_overlapping_bins(
        df_with_durations, "duration_s", bins
    ).reset_index()
    n = sns.countplot(
        data=binned_df, x="bin", hue="version", ax=axs["n"], legend=False
    )
    for container in n.containers:
        n.bar_label(container, size=10)

    sns.boxplot(
        data=binned_df,
        x="bin",
        y="cadence_spm__abs_error",
        hue="version",
        ax=axs["v"],
    )
    sns.despine(fig)

    axs["v"].set_ylabel("Absolute Cadence Error (steps/min)")
    axs["n"].set_ylabel("WB Count")
    axs["n"].set_xlabel("Ref. WB Duration")
    fig.show()


free_living_results_matched_raw.query("algo == 'Mobilise-D Pipeline'").pipe(
    plot_wb_duration_analysis
)

Effect of walking speed on error

One important aspect of algorithm performance is its dependency on walking speed, i.e., how well the algorithms perform at different walking speeds. For this, we plot the absolute cadence error against the walking speed of the reference data. For better granularity, we use the values per WB instead of the aggregates per participant. The overlaid dots represent the trend line, calculated as the median absolute error within walking-speed bins of 0.05 m/s.

# For plotting all participants at the end
free_living_combined = free_living_results_matched_raw.copy()
free_living_combined["cohort"] = "Combined"
ws_level_results = pd.concat(
    [free_living_results_matched_raw, free_living_combined]
).reset_index(drop=True)

algo_names = ws_level_results["algo_with_version"].unique()
cohort_names = ws_level_results["cohort"].unique()

ws_level_results["cohort"] = pd.Categorical(
    ws_level_results["cohort"], categories=cohort_names, ordered=True
)
ws_level_results["algo_with_version"] = pd.Categorical(
    ws_level_results["algo_with_version"], categories=algo_names, ordered=True
)

# Create the figure with subplots
fig = plt.figure(constrained_layout=True, figsize=(24, 5 * len(algo_names)))
subfigs = fig.subfigures(len(algo_names), 1, wspace=0.1, hspace=0.1)

# Define the min and max limits for x and y axes
min_max_x = calc_min_max_with_margin(
    ws_level_results["walking_speed_mps__reference"]
)
min_max_y = calc_min_max_with_margin(ws_level_results["cadence_spm__abs_error"])

# Plotting each algorithm version
for subfig, (algo, data) in zip(
    subfigs, ws_level_results.groupby("algo_with_version", observed=True)
):
    subfig.suptitle(algo)
    subfig.supxlabel("Walking Speed (m/s)")
    subfig.supylabel("Absolute Error (steps/min)")

    # Create subplots for each cohort
    axs = subfig.subplots(1, len(cohort_names), sharex=True, sharey=True)

    for ax, (cohort, cohort_data) in zip(
        axs, data.groupby("cohort", observed=True)
    ):
        # Scatter plot for the cohort data
        sns.scatterplot(
            data=cohort_data,
            x="walking_speed_mps__reference",  # Reference walking speed
            y="cadence_spm__abs_error",  # Absolute error
            ax=ax,
            alpha=0.3,
        )

        # Define bins for walking speed
        bins = np.arange(
            0, cohort_data["walking_speed_mps__reference"].max() + 0.05, 0.05
        )
        cohort_data["speed_bin"] = pd.cut(
            cohort_data["walking_speed_mps__reference"], bins=bins
        )

        # Calculate bin centers
        cohort_data["bin_center"] = cohort_data["speed_bin"].apply(
            lambda x: x.mid
        )

        # Calculate median error per bin and cohort
        binned_data = (
            cohort_data.groupby("bin_center", observed=True)[
                "cadence_spm__abs_error"
            ]
            .median()
            .reset_index()
        )

        # Plot the median lines for each bin
        sns.scatterplot(
            data=binned_data,
            x="bin_center",
            y="cadence_spm__abs_error",  # Median error
            ax=ax,
        )

        ax.set_title(cohort)
        ax.set_xlabel(None)
        ax.set_ylabel(None)

        # Set axis limits
        ax.set_xlim(*min_max_x)
        ax.set_ylim(*min_max_y)

fig.show()

Figure: absolute cadence error (steps/min) against reference walking speed (m/s), one row per pipeline version, with panels for CHF, COPD, HA, MS, PD, PFF, and Combined.
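The per-bin medians in the walking-speed figure come from `pd.cut` on 0.05 m/s bins followed by a median per bin. A pandas-free sketch of the same idea is shown below. It assumes right-closed bins (lo, lo + 0.05], which matches the default `right=True` of `pd.cut`; the helper name `median_error_per_speed_bin` is hypothetical and not part of mobgap.

```python
import math
from collections import defaultdict
from statistics import median


def median_error_per_speed_bin(speeds, abs_errors, width=0.05):
    """Group errors into right-closed speed bins (lo, lo + width] and return
    {bin_center: median_abs_error}, mirroring pd.cut(..., right=True)."""
    grouped = defaultdict(list)
    for speed, err in zip(speeds, abs_errors):
        if speed <= 0:
            # pd.cut with a first edge of 0 leaves speed == 0 unbinned (NaN)
            continue
        # Index of the right-closed bin containing `speed`; rounding guards
        # against float noise exactly at a bin edge
        idx = math.ceil(round(speed / width, 9)) - 1
        center = round((idx + 0.5) * width, 6)
        grouped[center].append(err)
    return {center: median(errs) for center, errs in sorted(grouped.items())}
```

Because the figure computes these medians separately for every cohort, bins that contain only a few WBs can produce noisy medians.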

Laboratory dataset#

Combined/Aggregated Evaluation#

Note

In the laboratory dataset, each datapoint represents one trial.

All results across all cohorts#

The results below represent the average performance across all participants independent of the cohort in terms of error, relative error, absolute error, and absolute relative error.
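For a single trial, the four metrics follow from the detected and the reference cadence. The sketch below assumes error = detected - reference (so negative values mean the pipeline underestimates cadence, in line with the negative biases reported in the tables) and a relative error expressed in percent of the reference value. `cadence_errors` is a hypothetical helper, not a mobgap function.

```python
def cadence_errors(detected_spm: float, reference_spm: float) -> dict:
    """Return error, relative error, absolute error and absolute relative error.

    Assumes error = detected - reference and relative error in percent of the
    reference value.
    """
    error = detected_spm - reference_spm
    rel_error = 100 * error / reference_spm
    return {
        "error": error,
        "rel_error": rel_error,
        "abs_error": abs(error),
        "abs_rel_error": abs(rel_error),
    }


print(cadence_errors(95.0, 100.0))
```

When these values are averaged over trials, errors of opposite sign cancel in the error and relative error but not in their absolute counterparts. This is why the mean relative error in the tables is much closer to zero than the mean absolute relative error.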

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_context("talk")
metrics = {
    "abs_rel_error": "Abs. Rel. Error (%)",
    "error": "Error (steps/min)",
    "rel_error": "Rel. Error (%)",
    "abs_error": "Abs. Error (steps/min)",
}


def multi_metric_plot(data, metrics, nrows, ncols):
    fig, axs = plt.subplots(
        nrows, ncols, sharex=True, figsize=(ncols * 6, nrows * 4 + 2)
    )
    for ax, (metric, metric_label) in zip(axs.flatten(), metrics.items()):
        overall_df = data[["version", f"cadence_spm__{metric}"]].rename(
            columns={f"cadence_spm__{metric}": metric_label}
        )

        sns.boxplot(
            data=overall_df, x="version", hue="version", y=metric_label, ax=ax
        )

        ax.set_title(metric_label)
        ax.set_ylabel(metric_label)
        ax.grid(True)

    fig.tight_layout()
    plt.show()


laboratory_results_combined.pipe(multi_metric_plot, metrics, 2, 2)

laboratory_combined_perf_metrics_all = laboratory_results_combined.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs_combined),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_tables_combined)

laboratory_combined_perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC
Mobilise-D Pipeline	MobGap	1168	94.93 [94.17, 95.70]	96.38 [95.38, 97.38]	-2.17 [-21.53, 17.19]	4.87 [4.37, 5.38]^*	-1.17 [-1.76, -0.58]	5.02 [4.49, 5.54]^*	0.78 [0.75, 0.81]
Mobilise-D Pipeline	Original Implementation	1168	94.22 [93.45, 94.98]	96.38 [95.38, 97.38]	-2.40 [-22.82, 18.02]	5.75 [5.24, 6.27]	-1.32 [-1.98, -0.65]	6.02 [5.44, 6.59]	0.75 [0.71, 0.78]
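The "Bias and LoA" column follows a Bland-Altman analysis. In the table, the limits of agreement are symmetric around the bias (e.g. -2.17 - 19.36 = -21.53 and -2.17 + 19.36 = 17.19), which is consistent with the conventional definition sketched below: bias = mean error and LoA = bias ± 1.96 × SD of the errors. The exact factor and SD estimator used by mobgap are assumptions here.

```python
from statistics import mean, stdev


def bland_altman(detected, reference, z=1.96):
    """Return (bias, (lower_loa, upper_loa)).

    Assumes error = detected - reference and uses the sample standard
    deviation of the errors.
    """
    errors = [d - r for d, r in zip(detected, reference)]
    bias = mean(errors)
    spread = z * stdev(errors)
    return bias, (bias - spread, bias + spread)
```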

Residual plots

def combo_residual_plot(data, name=None):
    name = name or data.name
    fig, axs = plt.subplots(
        ncols=2,
        sharey=True,
        sharex=True,
        figsize=(12, 9),
        constrained_layout=True,
    )
    fig.suptitle(name)
    for (version, subdata), ax in zip(data.groupby("version"), axs):
        residual_plot(
            subdata,
            "cadence_spm__reference",
            "cadence_spm__detected",
            "cohort",
            "steps/min",
            ax=ax,
            legend=ax == axs[-1],
        )
        ax.set_title(version)
    move_legend_outside(fig, axs[-1])
    plt.show()


laboratory_results_combined.query('algo == "Mobilise-D Pipeline"').pipe(
    combo_residual_plot, name="Aggregated Analysis - Cadence"
)

Per-cohort analysis#

The boxplot below shows the absolute cadence error across all trials within each cohort (the mean is marked in each box), followed by the per-cohort performance table.
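The per-cohort table reports mean values with confidence intervals. As a rough illustration, the sketch below groups trial-level absolute errors by cohort and returns the mean with a normal-approximation 95 % confidence interval. Whether mobgap uses this, a t-based interval, or bootstrapping is not stated on this page, so the interval method is an assumption, and `mean_ci_by_cohort` is a hypothetical name.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def mean_ci_by_cohort(cohorts, abs_errors, level=0.95):
    """Return {cohort: (mean, (ci_low, ci_high))} using a normal approximation.

    Every cohort needs at least two values for the sample standard deviation.
    """
    grouped = defaultdict(list)
    for cohort, err in zip(cohorts, abs_errors):
        grouped[cohort].append(err)
    # Two-sided critical value, about 1.96 for a 95 % interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    result = {}
    for cohort, errs in grouped.items():
        m = mean(errs)
        half_width = z * stdev(errs) / len(errs) ** 0.5
        result[cohort] = (m, (m - half_width, m + half_width))
    return result
```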

fig, ax = plt.subplots(figsize=(12, 6))
sns.boxplot(
    data=laboratory_results_combined,
    x="cohort",
    y="cadence_spm__abs_error",
    hue="version",
    order=cohort_order,
    showmeans=True,
    ax=ax,
).legend().set_title(None)
ax.set_ylabel("Absolute Error [steps/min]")
ax.set_title("Absolute Error - Combined Analysis")
fig.show()

laboratory_combined_perf_metrics_cohort = (
    laboratory_results_combined.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs_combined),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_tables_combined)
    .loc[cohort_order]
)
laboratory_combined_perf_metrics_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC
HA	Mobilise-D Pipeline	MobGap	227	96.73 [95.16, 98.30]	101.68 [99.54, 103.81]	-5.10 [-23.82, 13.62]	6.01 [4.83, 7.18]^*	-4.28 [-5.28, -3.29]	5.38 [4.49, 6.28]^*	0.73 [0.54, 0.83]
HA	Mobilise-D Pipeline	Original Implementation	227	93.41 [91.75, 95.06]	101.68 [99.54, 103.81]	-7.16 [-29.50, 15.19]	8.11 [6.71, 9.51]	-6.36 [-7.55, -5.17]	7.41 [6.32, 8.49]	0.61 [0.32, 0.76]
CHF	Mobilise-D Pipeline	MobGap	106	95.23 [92.75, 97.71]	95.69 [92.38, 98.99]	-2.33 [-18.63, 13.98]	4.80 [3.44, 6.17]	-1.26 [-3.11, 0.59]	5.07 [3.48, 6.66]	0.84 [0.76, 0.89]
CHF	Mobilise-D Pipeline	Original Implementation	106	97.06 [94.27, 99.85]	95.69 [92.38, 98.99]	-3.35 [-21.59, 14.89]	5.67 [4.13, 7.21]	-2.21 [-4.30, -0.13]	5.91 [4.11, 7.71]	0.80 [0.68, 0.88]
COPD	Mobilise-D Pipeline	MobGap	214	94.76 [93.18, 96.34]	98.25 [96.04, 100.45]	-4.18 [-28.88, 20.51]	5.67 [4.06, 7.28]	-3.33 [-4.50, -2.17]	5.13 [4.08, 6.17]	0.56 [0.43, 0.67]
COPD	Mobilise-D Pipeline	Original Implementation	214	92.54 [90.95, 94.14]	98.25 [96.04, 100.45]	-4.56 [-24.73, 15.60]	5.56 [4.25, 6.87]	-4.05 [-5.09, -3.00]	5.19 [4.25, 6.14]	0.67 [0.51, 0.77]
MS	Mobilise-D Pipeline	MobGap	228	94.18 [92.22, 96.15]	94.82 [92.50, 97.15]	-1.40 [-18.31, 15.50]	3.97 [2.96, 4.98]	-0.66 [-1.75, 0.43]	4.11 [3.16, 5.06]	0.85 [0.81, 0.89]
MS	Mobilise-D Pipeline	Original Implementation	228	94.64 [92.75, 96.54]	94.82 [92.50, 97.15]	-0.97 [-19.20, 17.26]	4.78 [3.74, 5.82]	-0.08 [-1.27, 1.12]	5.02 [4.02, 6.02]	0.83 [0.78, 0.86]
PD	Mobilise-D Pipeline	MobGap	224	94.06 [92.34, 95.77]	93.61 [91.50, 95.71]	-0.62 [-13.77, 12.52]	3.75 [3.01, 4.48]	0.04 [-0.91, 0.99]	4.12 [3.34, 4.90]	0.89 [0.86, 0.92]
PD	Mobilise-D Pipeline	Original Implementation	224	93.74 [92.13, 95.34]	93.61 [91.50, 95.71]	-0.63 [-15.42, 14.17]	4.67 [3.89, 5.45]	0.16 [-0.89, 1.21]	5.04 [4.22, 5.86]	0.85 [0.81, 0.89]
PFF	Mobilise-D Pipeline	MobGap	169	94.80 [92.65, 96.95]	94.08 [91.18, 96.97]	0.45 [-21.71, 22.61]	5.33 [3.82, 6.83]	2.52 [0.00, 5.04]	6.68 [4.34, 9.02]	0.77 [0.71, 0.83]
PFF	Mobilise-D Pipeline	Original Implementation	169	95.30 [93.20, 97.41]	94.08 [91.18, 96.97]	0.90 [-22.80, 24.60]	6.12 [4.54, 7.70]	3.36 [0.52, 6.19]	7.86 [5.24, 10.49]	0.74 [0.66, 0.80]
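The ICC column quantifies agreement between the wearable (WD) and the INDIP reference values. This page does not state which ICC form is used. The sketch below implements one common choice, ICC(2,1) (two-way random effects, absolute agreement, single measurement), from the ANOVA mean squares, so the choice of variant is an assumption.

```python
from statistics import mean


def icc_2_1(detected, reference):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    Each trial is a target (row) and the two systems are the raters (columns).
    """
    rows = list(zip(detected, reference))
    n, k = len(rows), 2
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(col) for col in zip(*rows)]

    # Partition the total sum of squares into rows, columns and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

With this absolute-agreement form, a constant offset between the two systems lowers the ICC even if the values are perfectly correlated: shifting every detected value by +5 steps/min in the test data yields about 0.77 instead of 1.0.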

from mobgap.plotting import calc_min_max_with_margin


def combo_scatter_plot(data, name=None):
    name = name or data.name
    fig, axs = plt.subplots(
        ncols=2,
        sharey=True,
        sharex=True,
        figsize=(12, 8),
        constrained_layout=True,
    )
    fig.suptitle(name)

    min_max = calc_min_max_with_margin(
        data["cadence_spm__reference"],
        data["cadence_spm__detected"],
    )

    for (version, subdata), ax in zip(data.groupby("version"), axs):
        subdata = subdata[
            [
                "cadence_spm__reference",
                "cadence_spm__detected",
                "cohort",
            ]
        ].dropna(how="any")

        sns.scatterplot(
            subdata,
            x="cadence_spm__reference",
            y="cadence_spm__detected",
            hue="cohort",
            ax=ax,
            legend=ax == axs[-1],
        )

        plot_regline(
            subdata["cadence_spm__reference"],
            subdata["cadence_spm__detected"],
            ax=ax,
        )

        make_square(ax, min_max, draw_diagonal=True)

        ax.set_title(version)
        ax.set_xlabel("Reference [steps/min]")
        ax.set_ylabel("Detected [steps/min]")
        ax.tick_params(axis="both", labelsize=20)

    move_legend_outside(fig, axs[-1])

    plt.show()


laboratory_results_combined.query('algo == "Mobilise-D Pipeline"').pipe(
    combo_scatter_plot, name="Mobilise-D Pipeline - Cadence"
)
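The scatter plot combines an identity line (`make_square(..., draw_diagonal=True)`) with a regression line from `plot_regline`. Assuming that line is an ordinary least-squares fit of detected on reference cadence (not confirmed on this page), its slope and intercept can be computed as below. Points on the identity line give a slope of 1 and an intercept of 0.

```python
from statistics import mean


def ols_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean
```

A slope below 1 combined with a positive intercept would indicate that detected values are compressed toward the mean: overestimation at low cadence and underestimation at high cadence.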

Matched/True Positive Evaluation#

Note

Compared to the results published in [1], the primary analysis of the matched results here is performed on performance metrics that are first averaged across all matched WBs within each trial. The original publication averaged the performance metrics across all matched WBs directly, without this additional per-trial aggregation.
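The two aggregation schemes diverge when trials contain different numbers of matched WBs. The toy example below uses made-up error values (hypothetical, not taken from the dataset) to show the effect.

```python
from statistics import mean

# Hypothetical absolute errors (steps/min) of matched WBs, grouped by trial
abs_errors_per_trial = {
    "trial_1": [2.0, 2.0, 2.0, 2.0],  # four matched WBs
    "trial_2": [10.0],  # a single matched WB
}

# Per-trial aggregation: average within each trial first, then across trials
per_trial = mean(mean(errs) for errs in abs_errors_per_trial.values())

# Pooled aggregation: average over all matched WBs directly
pooled = mean(e for errs in abs_errors_per_trial.values() for e in errs)

print(per_trial, pooled)
```

Per-trial averaging gives every trial the same weight, whereas pooling lets trials with many WBs dominate the result.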

Results across all cohorts#

The results below represent the average performance across all participants independent of the cohort in terms of error, relative error, absolute error, and absolute relative error.

laboratory_results_matched.pipe(multi_metric_plot, metrics, 2, 2)

As each pipeline version detects different WBs, the number of matched WBs should be compared first, as it puts all other metrics into perspective.
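Whether a detected WB counts as matched depends on the matching criterion applied in the processing step, which is not described on this page. Purely as an illustration, the sketch below matches detected and reference WBs one-to-one when their intersection over union (IoU) reaches a threshold. Both the IoU criterion and the 0.8 threshold are assumptions, and `iou` and `match_wbs` are hypothetical helpers.

```python
def iou(a, b):
    """Intersection over union of two (start, end) intervals in samples."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    # The intervals overlap, so their union spans from the earliest start
    # to the latest end
    union = max(a[1], b[1]) - min(a[0], b[0])
    return overlap / union


def match_wbs(detected, reference, threshold=0.8):
    """Greedy one-to-one matching; returns a list of (detected_idx, reference_idx)."""
    matches, used = [], set()
    for d_idx, d_wb in enumerate(detected):
        candidates = [
            (iou(d_wb, r_wb), r_idx)
            for r_idx, r_wb in enumerate(reference)
            if r_idx not in used
        ]
        if not candidates:
            continue
        best_iou, best_r = max(candidates)
        if best_iou >= threshold:
            matches.append((d_idx, best_r))
            used.add(best_r)
    return matches
```

Under a criterion like this, a version that splits or merges WBs differently from the reference produces fewer matches, even if its cadence estimates within the matched WBs are accurate.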

fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(
    data=laboratory_results_matched.groupby(["version"])["n_matched_wbs"]
    .sum()
    .reset_index(),
    x="version",
    y="n_matched_wbs",
    ax=ax,
)
fig.show()

laboratory_matched_perf_metrics_all = laboratory_results_matched.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs_matched),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_tables_matched)

laboratory_matched_perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Matched WBs
Mobilise-D Pipeline	MobGap	1168	94.24 [93.53, 94.94]	95.27 [94.47, 96.07]	-1.17 [-12.99, 10.66]	3.17 [2.86, 3.47]^*	-0.80 [-1.14, -0.45]	3.30 [3.00, 3.60]^*	0.89 [0.87, 0.91]	674
Mobilise-D Pipeline	Original Implementation	1168	94.79 [94.06, 95.52]	96.59 [95.76, 97.43]	-1.81 [-14.57, 10.96]	3.87 [3.55, 4.19]	-1.38 [-1.76, -1.00]	3.96 [3.64, 4.28]	0.88 [0.85, 0.90]	714

Residual plot

laboratory_results_matched.query('algo == "Mobilise-D Pipeline"').pipe(
    combo_residual_plot, name="Matched WBs - Cadence"
)

Per-cohort analysis#

The bar plot below shows the number of matched WBs per cohort and pipeline version, followed by boxplots of the absolute cadence error across all trials within each cohort.

fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(
    data=laboratory_results_matched.groupby(["version", "cohort"])[
        "n_matched_wbs"
    ]
    .sum()
    .reset_index(),
    hue="version",
    y="n_matched_wbs",
    x="cohort",
    order=cohort_order,
    ax=ax,
)
fig.show()

fig, ax = plt.subplots(figsize=(12, 6))
sns.boxplot(
    data=laboratory_results_matched,
    x="cohort",
    y="cadence_spm__abs_error",
    hue="algo_with_version",
    order=cohort_order,
    ax=ax,
).legend().set_title(None)
ax.set_ylabel("Absolute Error [steps/min]")
ax.set_title("Absolute Error - Matched Analysis")
fig.show()

Processing the per-cohort performance table

laboratory_matched_perf_metrics_cohort = (
    laboratory_results_matched.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs_matched),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_tables_matched)
    .loc[cohort_order]
)

laboratory_matched_perf_metrics_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	# participants	WD mean and CI [steps/min]	INDIP mean and CI [steps/min]	Bias and LoA [steps/min]	Abs. Error [steps/min]	Rel. Error [%]	Abs. Rel. Error [%]	ICC	# Matched WBs
HA	Mobilise-D Pipeline	MobGap	227	97.51 [96.11, 98.91]	99.00 [97.38, 100.62]	-1.88 [-11.93, 8.16]	3.38 [2.82, 3.93]^*	-1.61 [-2.37, -0.85]	3.50 [2.86, 4.14]	0.89 [0.81, 0.93]	80
HA	Mobilise-D Pipeline	Original Implementation	227	95.33 [93.95, 96.72]	99.82 [98.22, 101.43]	-4.49 [-15.48, 6.49]	4.94 [4.27, 5.62]	-4.20 [-4.86, -3.54]	4.72 [4.12, 5.32]	0.82 [0.47, 0.92]	102
CHF	Mobilise-D Pipeline	MobGap	106	92.12 [89.98, 94.26]	92.10 [89.17, 95.03]	-1.14 [-12.62, 10.35]	3.61 [2.71, 4.51]	-0.49 [-1.89, 0.90]	3.95 [2.77, 5.13]	0.90 [0.82, 0.94]	53
CHF	Mobilise-D Pipeline	Original Implementation	106	96.24 [93.59, 98.88]	99.48 [96.46, 102.49]	-3.24 [-11.66, 5.18]	3.91 [3.13, 4.70]	-2.95 [-3.66, -2.24]	3.77 [3.07, 4.47]	0.94 [0.77, 0.97]	60
COPD	Mobilise-D Pipeline	MobGap	214	96.25 [94.84, 97.66]	97.98 [96.42, 99.54]	-1.80 [-10.51, 6.90]	3.09 [2.61, 3.58]	-1.60 [-2.34, -0.85]	3.28 [2.65, 3.92]	0.91 [0.84, 0.95]	93
COPD	Mobilise-D Pipeline	Original Implementation	214	94.97 [93.46, 96.48]	97.89 [96.24, 99.53]	-2.92 [-9.37, 3.54]	3.35 [2.97, 3.73]	-2.86 [-3.28, -2.43]	3.35 [3.00, 3.70]	0.93 [0.69, 0.97]	106
MS	Mobilise-D Pipeline	MobGap	228	93.49 [91.72, 95.27]	94.64 [92.70, 96.58]	-1.15 [-16.47, 14.17]	3.47 [2.52, 4.41]	-0.67 [-1.62, 0.29]	3.56 [2.69, 4.43]	0.85 [0.80, 0.89]	176
MS	Mobilise-D Pipeline	Original Implementation	228	94.68 [92.81, 96.54]	95.65 [93.61, 97.68]	-0.97 [-17.41, 15.47]	3.96 [2.99, 4.93]	-0.37 [-1.45, 0.70]	4.17 [3.23, 5.10]	0.84 [0.79, 0.88]	182
PD	Mobilise-D Pipeline	MobGap	224	92.88 [91.39, 94.37]	93.12 [91.47, 94.78]	-0.24 [-7.41, 6.92]	2.48 [2.12, 2.83]^*	-0.01 [-0.57, 0.55]	2.78 [2.35, 3.21]	0.95 [0.94, 0.97]	150
PD	Mobilise-D Pipeline	Original Implementation	224	93.31 [91.76, 94.85]	93.43 [91.70, 95.15]	-0.12 [-8.37, 8.13]	3.20 [2.82, 3.58]	0.19 [-0.45, 0.82]	3.54 [3.09, 3.99]	0.94 [0.92, 0.96]	141
PFF	Mobilise-D Pipeline	MobGap	169	94.14 [92.13, 96.15]	95.52 [93.16, 97.87]	-1.38 [-15.23, 12.48]	3.31 [2.34, 4.29]	-0.92 [-1.81, -0.03]	3.19 [2.41, 3.97]	0.88 [0.83, 0.92]	122
PFF	Mobilise-D Pipeline	Original Implementation	169	95.28 [93.22, 97.34]	96.21 [93.69, 98.72]	-0.93 [-16.91, 15.06]	4.01 [2.93, 5.10]	-0.13 [-1.44, 1.18]	4.11 [2.96, 5.27]	0.86 [0.80, 0.90]	123

Deep dive investigation: Do errors depend on WB duration or walking speed?#

Effect of WB duration#

import numpy as np


def plot_wb_duration_analysis(df):
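    """Plot the absolute cadence error and the WB count per reference WB duration bin.

    - Top panel: boxplots of the absolute cadence error per duration bin, one box
      per pipeline version.
    - Bottom panel: a grouped bar chart of the number of WBs per duration bin and
      pipeline version.

    df: DataFrame with one row per matched WB, including the reference start/end
    (in samples), the absolute cadence error, and a "version" column that
    distinguishes the pipeline versions.
    """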
    fig, axs = plt.subplot_mosaic(
        [["v"], ["v"], ["v"], ["n"]], sharex=True, figsize=(12, 9)
    )
    # Compute WB durations in seconds (dividing by 100 converts samples
    # recorded at 100 Hz to seconds)
    df_with_durations = df.assign(
        duration_s=lambda df_: (
            (df_["end__reference"] - df_["start__reference"]) / 100
        )
    )

    bins = {
        "All": (-np.inf, np.inf),
        "> 10 s": (10, np.inf),
        "<= 10 s": (0, 10),
        "10 - 30 s": (10, 30),
        "30 - 60 s": (30, 60),
        "60 - 120 s": (60, 120),
        "> 120 s": (120, np.inf),
    }

    binned_df = cut_into_overlapping_bins(
        df_with_durations, "duration_s", bins
    ).reset_index()
    n = sns.countplot(
        data=binned_df, x="bin", hue="version", ax=axs["n"], legend=False
    )
    for container in n.containers:
        n.bar_label(container, size=10)

    sns.boxplot(
        data=binned_df,
        x="bin",
        y="cadence_spm__abs_error",
        hue="version",
        ax=axs["v"],
    )
    sns.despine(fig)

    axs["v"].set_ylabel("Absolute Cadence Error (steps/min)")
    axs["n"].set_ylabel("WB Count")
    axs["n"].set_xlabel("Ref. WB Duration")
    fig.show()


laboratory_results_matched_raw.query("algo == 'Mobilise-D Pipeline'").pipe(
    plot_wb_duration_analysis
)
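`cut_into_overlapping_bins` assigns a WB to every bin whose range contains its duration, so the "All" and "> 10 s" bins overlap with the finer ones and the bar counts do not add up to the total. The minimal sketch below illustrates that idea without pandas. It assumes left-open, right-closed intervals (so a WB of exactly 10 s falls into "<= 10 s", matching the labels) and the 100 Hz conversion from samples to seconds used above. `bin_labels_for_wb` is hypothetical and not the mobgap implementation.

```python
SAMPLING_RATE_HZ = 100  # assumed: start/end are sample indices at 100 Hz

BINS = {
    "All": (float("-inf"), float("inf")),
    "> 10 s": (10, float("inf")),
    "<= 10 s": (0, 10),
    "10 - 30 s": (10, 30),
    "30 - 60 s": (30, 60),
    "60 - 120 s": (60, 120),
    "> 120 s": (120, float("inf")),
}


def bin_labels_for_wb(start, end, bins=BINS):
    """Return every bin label whose (low, high] range contains the WB duration."""
    duration_s = (end - start) / SAMPLING_RATE_HZ
    return [label for label, (low, high) in bins.items() if low < duration_s <= high]
```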

Effect of walking speed on error#

# Duplicate all rows under a "Combined" cohort so that the last panel shows all participants together
laboratory_combined = laboratory_results_matched_raw.copy()
laboratory_combined["cohort"] = "Combined"
ws_level_results = pd.concat(
    [laboratory_results_matched_raw, laboratory_combined]
).reset_index(drop=True)

algo_names = ws_level_results["algo_with_version"].unique()
cohort_names = ws_level_results["cohort"].unique()

ws_level_results["cohort"] = pd.Categorical(
    ws_level_results["cohort"], categories=cohort_names, ordered=True
)
ws_level_results["algo_with_version"] = pd.Categorical(
    ws_level_results["algo_with_version"], categories=algo_names, ordered=True
)

# Create the figure with subplots
fig = plt.figure(constrained_layout=True, figsize=(24, 5 * len(algo_names)))
subfigs = fig.subfigures(len(algo_names), 1, wspace=0.1, hspace=0.1)

# Define the min and max limits for x and y axes
min_max_x = calc_min_max_with_margin(
    ws_level_results["walking_speed_mps__reference"]
)
min_max_y = calc_min_max_with_margin(ws_level_results["cadence_spm__abs_error"])

# Plotting each algorithm version
for subfig, (algo, data) in zip(
    subfigs, ws_level_results.groupby("algo_with_version", observed=True)
):
    subfig.suptitle(algo)
    subfig.supxlabel("Walking Speed (m/s)")
    subfig.supylabel("Absolute Error (steps/min)")

    # Create subplots for each cohort
    axs = subfig.subplots(1, len(cohort_names), sharex=True, sharey=True)

    for ax, (cohort, cohort_data) in zip(
        axs, data.groupby("cohort", observed=True)
    ):
        # Scatter plot for the cohort data
        sns.scatterplot(
            data=cohort_data,
            x="walking_speed_mps__reference",  # Reference walking speed
            y="cadence_spm__abs_error",  # Absolute error
            ax=ax,
            alpha=0.3,
        )

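        # Work on an explicit copy so that adding columns below does not
        # touch the groupby slice (avoids pandas' SettingWithCopyWarning)
        cohort_data = cohort_data.copy()

        # Define 0.05 m/s bins for the reference walking speed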
        bins = np.arange(
            0, cohort_data["walking_speed_mps__reference"].max() + 0.05, 0.05
        )
        cohort_data["speed_bin"] = pd.cut(
            cohort_data["walking_speed_mps__reference"], bins=bins
        )

        # Calculate bin centers
        cohort_data["bin_center"] = cohort_data["speed_bin"].apply(
            lambda x: x.mid
        )

        # Calculate median error per bin and cohort
        binned_data = (
            cohort_data.groupby("bin_center", observed=True)[
                "cadence_spm__abs_error"
            ]
            .median()
            .reset_index()
        )

        # Overlay the median error of each bin as points
        sns.scatterplot(
            data=binned_data,
            x="bin_center",
            y="cadence_spm__abs_error",  # Median error
            ax=ax,
        )

        ax.set_title(cohort)
        ax.set_xlabel(None)
        ax.set_ylabel(None)

        # Set axis limits
        ax.set_xlim(*min_max_x)
        ax.set_ylim(*min_max_y)

fig.show()

Total running time of the script: (0 minutes 21.410 seconds)

Estimated memory usage: 91 MB