Performance of the initial contact algorithms on the TVS dataset

The following provides an analysis and comparison of the performance of the initial contact detection (ICD) algorithms on the TVS dataset (lab and free-living). We look into the actual performance of the algorithms compared to the reference data and compare these results with the performance of the original MATLAB algorithm.

Note

If you are interested in how these results are calculated, head over to the processing page.

We focus on the single_results (i.e. the performance per trial in the laboratory data and per recording in the free-living data) and aggregate them over multiple levels.
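To make "aggregating over multiple levels" concrete, here is a minimal standard-library sketch. The rows and the helper `aggregate_mean` are hypothetical and only illustrate the idea; the analysis on this page uses the mobgap helpers and pandas instead.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-recording results (one row per recording).
single_results = [
    {"cohort": "HA", "algo": "IcdIonescu", "version": "MobGap", "f1_score": 0.90},
    {"cohort": "HA", "algo": "IcdIonescu", "version": "MobGap", "f1_score": 0.94},
    {"cohort": "PD", "algo": "IcdIonescu", "version": "MobGap", "f1_score": 0.80},
]


def aggregate_mean(rows, group_keys, value_key):
    """Return the mean of `value_key` for every combination of `group_keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    return {key: mean(values) for key, values in groups.items()}


# Level 1: one value per algorithm and version, across all cohorts.
per_algo = aggregate_mean(single_results, ["algo", "version"], "f1_score")
# Level 2: one value per cohort, algorithm and version.
per_cohort = aggregate_mean(
    single_results, ["cohort", "algo", "version"], "f1_score"
)
print(per_algo)
print(per_cohort)
```

The same per-recording rows feed both levels; only the grouping keys change.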

Below is the list of algorithms that we will compare. Note that we use the prefix “MobGap” to refer to the reimplemented Python algorithms and “Original Implementation” to refer to the original MATLAB algorithms.

# Note also that the IcdIonescu algorithm is the reimplementation of the Ani_McCamley algorithm from the original
# MATLAB algorithms.
# The other two algorithms (IcdShinImproved and IcdHKLeeImproved) are actually cadence algorithms.
# As they can also be used to detect initial contacts, we present their results as well.
# However, you should check the dedicated cadence analysis for a more detailed comparison of these algorithms.
algorithms = {
    "IcdIonescu": ("IcdIonescu", "MobGap"),
    "IcdShinImproved": ("IcdShinImproved", "MobGap"),
    "IcdHKLeeImproved": ("IcdHKLeeImproved", "MobGap"),
}
# We only load the MATLAB algorithms that we reimplemented
algorithms.update(
    {
        "matlab_Ani_McCamley": ("IcdIonescu", "Original Implementation"),
    }
)

The code below loads the data and prepares it for the analysis. By default, the data will be downloaded from an online repository (and cached locally). If you want to use a local copy of the data, set the MOBGAP_VALIDATION_DATA_PATH environment variable to the path of your local copy and set MOBGAP_VALIDATION_USE_LOCAL_DATA to 1.

The file download will print some log output, which can usually be ignored. You can also change the version parameter to load a different version of the data.
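As a rough illustration of how the two environment variables interact, the sketch below mirrors the selection logic with the standard library only. The function `resolve_local_results_path` and the example path are hypothetical; the mobgap helper `get_env_var` may behave differently in details that are not shown on this page.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_local_results_path(
    env: Mapping[str, str] = os.environ,
) -> Optional[Path]:
    """Return the local results folder, or None to fall back to the download."""
    use_local = int(env.get("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
    if not use_local:
        return None
    return Path(env["MOBGAP_VALIDATION_DATA_PATH"]) / "results"


# Hypothetical settings, as they could be exported in a shell before running.
example_env = {
    "MOBGAP_VALIDATION_DATA_PATH": "/data/mobgap_validation",
    "MOBGAP_VALIDATION_USE_LOCAL_DATA": "1",
}
print(resolve_local_results_path(example_env))  # local "results" folder
print(resolve_local_results_path({}))  # None, so the data is downloaded
```

Setting only the path without the switch keeps the default download behavior, because the switch is checked first.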

from pathlib import Path

import pandas as pd
from mobgap.data.validation_results import ValidationResultLoader
from mobgap.utils.misc import get_env_var

local_data_path = (
    Path(get_env_var("MOBGAP_VALIDATION_DATA_PATH")) / "results"
    if int(get_env_var("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
    else None
)
__RESULT_VERSION = "v1.2.0"
loader = ValidationResultLoader(
    "icd", result_path=local_data_path, version=__RESULT_VERSION
)

free_living_index_cols = [
    "cohort",
    "participant_id",
    "time_measure",
    "recording",
    "recording_name",
    "recording_name_pretty",
]

results = {
    v: loader.load_single_results(k, "free_living")
    for k, v in algorithms.items()
}
results = pd.concat(results, names=["algo", "version", *free_living_index_cols])
results_long = results.reset_index().assign(
    algo_with_version=lambda df: df["algo"] + " (" + df["version"] + ")",
    _combined="combined",
)

lab_index_cols = [
    "cohort",
    "participant_id",
    "time_measure",
    "test",
    "trial",
    "test_name",
    "test_name_pretty",
]

lab_results = {
    v: loader.load_single_results(k, "laboratory")
    for k, v in algorithms.items()
}
lab_results = pd.concat(lab_results, names=["algo", "version", *lab_index_cols])
lab_results_long = lab_results.reset_index().assign(
    algo_with_version=lambda df: df["algo"] + " (" + df["version"] + ")",
    _combined="combined",
)

cohort_order = ["HA", "CHF", "COPD", "MS", "PD", "PFF"]


Performance metrics

For each participant, performance metrics were calculated by classifying the detected initial contacts as true positives (TP), false positives (FP), or false negatives (FN). Based on these values, recall (sensitivity), precision (positive predictive value), and F1 score were calculated. In addition, the absolute error of each true-positive initial contact was calculated as the temporal difference between the detected and the reference value. The relative error was calculated by dividing all absolute errors within a walking bout by the average step duration estimated from the reference system. From these values, we calculate the mean and confidence interval of recall, precision, and F1 score, and the mean and limits of agreement (LoA) of the timing errors.
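The definitions above can be sketched in a few lines. This is a simplified illustration and not the mobgap implementation: the function names are hypothetical, the matching of detected to reference contacts is assumed to have happened already, and the absolute error is assumed to be the unsigned time difference.

```python
import math
from statistics import mean


def detection_scores(tp: int, fp: int, fn: int) -> dict:
    """Recall, precision and F1 score from matched initial-contact counts."""
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else math.nan
    return {"recall": recall, "precision": precision, "f1_score": f1}


def timing_errors(detected_s, reference_s, mean_step_duration_s):
    """Mean absolute and relative timing error over matched (TP) contacts."""
    # Assumption: absolute error is the unsigned difference in seconds.
    abs_errors = [abs(d - r) for d, r in zip(detected_s, reference_s)]
    # Relative error: absolute error divided by the reference step duration.
    rel_errors = [e / mean_step_duration_s for e in abs_errors]
    return mean(abs_errors), mean(rel_errors)


# Hypothetical counts and timestamps for one walking bout.
scores = detection_scores(tp=8, fp=2, fn=2)
abs_err, rel_err = timing_errors([1.02, 1.55, 2.08], [1.00, 1.50, 2.00], 0.5)
print(scores, abs_err, rel_err)
```

Because the relative error is normalized by the step duration, it stays comparable between slow and fast walkers, whereas the absolute error is reported in seconds.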

The functions that calculate these metrics are defined below.

from functools import partial

from mobgap.pipeline.evaluation import CustomErrorAggregations as A
from mobgap.utils.df_operations import (
    CustomOperation,
    apply_aggregations,
    apply_transformations,
    multilevel_groupby_apply_merge,
)
from mobgap.utils.tables import FormatTransformer as F
from mobgap.utils.tables import RevalidationInfo, revalidation_table_styles
from mobgap.utils.tables import StatsFunctions as S

custom_aggs = [
    CustomOperation(
        identifier=None,
        function=A.n_datapoints,
        column_name=[("n_datapoints", "all")],
    ),
    ("recall", ["mean", A.conf_intervals]),
    ("precision", ["mean", A.conf_intervals]),
    ("f1_score", ["mean", A.conf_intervals]),
    ("tp_absolute_timing_error_s", ["mean", A.loa]),
    ("tp_relative_timing_error", ["mean", A.loa]),
]

stats_transform = [
    CustomOperation(
        identifier=None,
        function=partial(
            S.pairwise_tests,
            value_col=c,
            between="version",
            reference_group_key="Original Implementation",
        ),
        column_name=[("stats_metadata", c)],
    )
    for c in [
        "recall",
        "precision",
        "f1_score",
    ]
]

format_transforms = [
    CustomOperation(
        identifier=None,
        function=lambda df_: df_[("n_datapoints", "all")].astype(int),
        column_name=("General", "n_datapoints"),
    ),
    *(
        CustomOperation(
            identifier=None,
            function=partial(
                F.value_with_metadata,
                value_col=("mean", c),
                other_columns={
                    "range": ("conf_intervals", c),
                    "stats_metadata": ("stats_metadata", c),
                },
            ),
            column_name=("ICD", c),
        )
        for c in [
            "recall",
            "precision",
            "f1_score",
        ]
    ),
    *(
        CustomOperation(
            identifier=None,
            function=partial(
                F.value_with_metadata,
                value_col=("mean", c),
                other_columns={"range": ("loa", c)},
            ),
            column_name=("IC Timing", c),
        )
        for c in [
            "tp_absolute_timing_error_s",
            "tp_relative_timing_error",
        ]
    ),
]

final_names = {
    "n_datapoints": "# recordings",
    "recall": "Recall",
    "precision": "Precision",
    "f1_score": "F1 Score",
    "tp_absolute_timing_error_s": "Abs. Error [s]",
    "tp_relative_timing_error": "Bias and LoA",
}


validation_thresholds = {
    ("ICD", "Recall"): RevalidationInfo(threshold=0.7, higher_is_better=True),
    ("ICD", "Precision"): RevalidationInfo(
        threshold=0.7, higher_is_better=True
    ),
    ("ICD", "F1 Score"): RevalidationInfo(threshold=0.7, higher_is_better=True),
}


def format_results(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.pipe(apply_transformations, format_transforms)
        .rename(columns=final_names)
        .loc[:, pd.IndexSlice[:, list(final_names.values())]]
    )
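For readers unfamiliar with the bracketed ranges in the tables, the sketch below shows one textbook way to compute a 95 % confidence interval of the mean and 95 % limits of agreement. It is an assumption that `A.conf_intervals` and `A.loa` follow these definitions; their exact implementation is not shown on this page, and the values used here are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

Z_95 = 1.96  # normal-approximation quantile for a 95 % range


def confidence_interval(values):
    """Approximate 95 % confidence interval of the mean (normal approximation)."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values))
    return m - Z_95 * sem, m + Z_95 * sem


def limits_of_agreement(values):
    """95 % limits of agreement: mean plus/minus 1.96 sample standard deviations."""
    m = mean(values)
    sd = stdev(values)
    return m - Z_95 * sd, m + Z_95 * sd


# Hypothetical per-recording absolute timing errors in seconds.
errors_s = [0.04, 0.06, 0.05, 0.07, 0.03]
ci_low, ci_high = confidence_interval(errors_s)
loa_low, loa_high = limits_of_agreement(errors_s)
print(f"mean={mean(errors_s):.2f} CI=[{ci_low:.2f}, {ci_high:.2f}]")
print(f"LoA=[{loa_low:.2f}, {loa_high:.2f}]")
```

Under this definition the lower limit of agreement can fall below zero even for a non-negative quantity, which would be consistent with the small negative lower bounds that appear for the absolute error in some of the tables.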

Free-Living Comparison

We focus the comparison on the free-living data, as this is the most relevant considering our final use case. In the free-living data, there is one 2.5-hour recording per participant. This means that each datapoint in the plots below and in the summary statistics represents one participant.

All results across all cohorts

import matplotlib.pyplot as plt
import seaborn as sns

hue_order = ["Original Implementation", "MobGap"]

fig, ax = plt.subplots()
sns.boxplot(
    data=results_long,
    x="algo",
    y="f1_score",
    hue="version",
    hue_order=hue_order,
    ax=ax,
)
fig.show()

perf_metrics_all = results_long.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_results)
perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	General	ICD			IC Timing
		# recordings	Recall	Precision	F1 Score	Abs. Error [s]	Bias and LoA
IcdHKLeeImproved	MobGap	101	0.93 [0.91, 0.95]	0.91 [0.89, 0.93]	0.92 [0.90, 0.94]	0.14 [0.08, 0.21]	0.21 [0.12, 0.29]
IcdIonescu	MobGap	101	0.90 [0.88, 0.92]	0.92 [0.90, 0.94]	0.91 [0.89, 0.93]	0.06 [0.01, 0.11]	0.08 [0.03, 0.14]
IcdIonescu	Original Implementation	101	0.90 [0.88, 0.92]	0.92 [0.90, 0.94]	0.91 [0.89, 0.93]	0.06 [0.01, 0.11]	0.08 [0.03, 0.14]
IcdShinImproved	MobGap	101	0.93 [0.91, 0.95]	0.92 [0.90, 0.94]	0.93 [0.91, 0.95]	0.06 [0.02, 0.11]	0.09 [0.04, 0.13]

Per Cohort

While this provides a good overview, it does not fully reflect how these algorithms perform on the different cohorts.

fig, ax = plt.subplots()
sns.boxplot(
    data=results_long, x="cohort", y="f1_score", hue="algo_with_version", ax=ax
)
fig.show()

perf_metrics_per_cohort = (
    results_long.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_results)
    .loc[cohort_order]
)
perf_metrics_per_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	General	ICD			IC Timing
			# recordings	Recall	Precision	F1 Score	Abs. Error [s]	Bias and LoA
HA	IcdHKLeeImproved	MobGap	20	0.94 [0.93, 0.96]	0.93 [0.91, 0.94]	0.93 [0.92, 0.95]	0.14 [0.09, 0.18]	0.20 [0.15, 0.26]
	IcdIonescu	MobGap	20	0.92 [0.91, 0.94]	0.94 [0.93, 0.96]	0.93 [0.92, 0.95]	0.05 [0.03, 0.08]	0.08 [0.05, 0.11]
	IcdIonescu	Original Implementation	20	0.92 [0.90, 0.94]	0.94 [0.93, 0.96]	0.93 [0.92, 0.94]	0.05 [0.02, 0.08]	0.08 [0.04, 0.11]
	IcdShinImproved	MobGap	20	0.94 [0.93, 0.96]	0.94 [0.93, 0.95]	0.94 [0.93, 0.95]	0.06 [0.03, 0.08]	0.08 [0.05, 0.11]
CHF	IcdHKLeeImproved	MobGap	10	0.94 [0.92, 0.96]	0.93 [0.91, 0.96]	0.94 [0.92, 0.96]	0.14 [0.07, 0.20]	0.21 [0.11, 0.30]
	IcdIonescu	MobGap	10	0.92 [0.89, 0.95]	0.95 [0.93, 0.97]	0.94 [0.91, 0.96]	0.06 [0.01, 0.10]	0.08 [0.02, 0.14]
	IcdIonescu	Original Implementation	10	0.93 [0.90, 0.95]	0.95 [0.94, 0.97]	0.94 [0.92, 0.96]	0.05 [0.00, 0.10]	0.08 [0.01, 0.15]
	IcdShinImproved	MobGap	10	0.95 [0.93, 0.96]	0.95 [0.94, 0.97]	0.95 [0.93, 0.96]	0.06 [0.02, 0.10]	0.09 [0.04, 0.14]
COPD	IcdHKLeeImproved	MobGap	17	0.92 [0.91, 0.93]	0.89 [0.87, 0.91]	0.91 [0.89, 0.92]	0.17 [0.10, 0.23]	0.22 [0.14, 0.31]
	IcdIonescu	MobGap	17	0.89 [0.88, 0.91]	0.92 [0.91, 0.94]	0.91 [0.89, 0.92]	0.07 [0.02, 0.12]	0.09 [0.04, 0.14]
	IcdIonescu	Original Implementation	17	0.89 [0.87, 0.90]	0.92 [0.91, 0.94]	0.90 [0.89, 0.92]	0.07 [0.03, 0.12]	0.09 [0.04, 0.15]
	IcdShinImproved	MobGap	17	0.93 [0.92, 0.94]	0.92 [0.91, 0.94]	0.93 [0.91, 0.94]	0.07 [0.03, 0.11]	0.09 [0.04, 0.14]
MS	IcdHKLeeImproved	MobGap	18	0.95 [0.93, 0.97]	0.93 [0.90, 0.95]	0.94 [0.91, 0.96]	0.14 [0.08, 0.19]	0.20 [0.11, 0.29]
	IcdIonescu	MobGap	18	0.93 [0.91, 0.95]	0.94 [0.92, 0.96]	0.93 [0.91, 0.95]	0.06 [-0.00, 0.13]	0.09 [0.02, 0.15]
	IcdIonescu	Original Implementation	18	0.93 [0.91, 0.95]	0.94 [0.92, 0.96]	0.93 [0.91, 0.96]	0.06 [-0.00, 0.13]	0.09 [0.03, 0.15]
	IcdShinImproved	MobGap	18	0.94 [0.92, 0.97]	0.94 [0.91, 0.97]	0.94 [0.92, 0.97]	0.06 [0.00, 0.12]	0.09 [0.03, 0.15]
PD	IcdHKLeeImproved	MobGap	19	0.93 [0.91, 0.96]	0.93 [0.90, 0.96]	0.93 [0.91, 0.95]	0.14 [0.09, 0.20]	0.21 [0.14, 0.28]
	IcdIonescu	MobGap	19	0.92 [0.89, 0.95]	0.94 [0.91, 0.97]	0.93 [0.90, 0.96]	0.06 [0.01, 0.11]	0.09 [0.04, 0.14]
	IcdIonescu	Original Implementation	19	0.92 [0.89, 0.95]	0.94 [0.91, 0.97]	0.93 [0.90, 0.96]	0.06 [0.01, 0.10]	0.08 [0.03, 0.14]
	IcdShinImproved	MobGap	19	0.94 [0.91, 0.96]	0.95 [0.92, 0.97]	0.94 [0.92, 0.96]	0.06 [0.02, 0.10]	0.09 [0.04, 0.13]
PFF	IcdHKLeeImproved	MobGap	17	0.88 [0.77, 0.99]	0.84 [0.73, 0.95]	0.86 [0.75, 0.96]	0.15 [0.05, 0.25]	0.19 [0.07, 0.30]
	IcdIonescu	MobGap	17	0.84 [0.73, 0.95]	0.84 [0.73, 0.95]	0.84 [0.73, 0.94]	0.07 [0.02, 0.11]	0.08 [0.03, 0.13]
	IcdIonescu	Original Implementation	17	0.84 [0.73, 0.95]	0.84 [0.73, 0.95]	0.84 [0.73, 0.95]	0.06 [0.02, 0.11]	0.08 [0.03, 0.13]
	IcdShinImproved	MobGap	17	0.87 [0.76, 0.98]	0.85 [0.73, 0.96]	0.86 [0.75, 0.97]	0.06 [0.01, 0.11]	0.08 [0.02, 0.13]

Only relevant algorithms

Finally, we present a comparison of the old and new implementations of IcdIonescu. IcdShinImproved and IcdHKLeeImproved are excluded because they are cadence algorithms, and the original MATLAB implementation does not use them to calculate ICs.

fig, ax = plt.subplots()
sns.boxplot(
    data=results_long.query("algo == 'IcdIonescu'"),
    x="cohort",
    y="f1_score",
    hue="algo_with_version",
    ax=ax,
)
fig.show()

final_perf_metrics = (
    perf_metrics_per_cohort.copy()
    .query("algo == 'IcdIonescu'")
    .reset_index(level="algo", drop=True)
)

final_perf_metrics.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort"],
)

cohort	version	General	ICD			IC Timing
		# recordings	Recall	Precision	F1 Score	Abs. Error [s]	Bias and LoA
HA	MobGap	20	0.92 [0.91, 0.94]	0.94 [0.93, 0.96]	0.93 [0.92, 0.95]	0.05 [0.03, 0.08]	0.08 [0.05, 0.11]
HA	Original Implementation	20	0.92 [0.90, 0.94]	0.94 [0.93, 0.96]	0.93 [0.92, 0.94]	0.05 [0.02, 0.08]	0.08 [0.04, 0.11]
CHF	MobGap	10	0.92 [0.89, 0.95]	0.95 [0.93, 0.97]	0.94 [0.91, 0.96]	0.06 [0.01, 0.10]	0.08 [0.02, 0.14]
CHF	Original Implementation	10	0.93 [0.90, 0.95]	0.95 [0.94, 0.97]	0.94 [0.92, 0.96]	0.05 [0.00, 0.10]	0.08 [0.01, 0.15]
COPD	MobGap	17	0.89 [0.88, 0.91]	0.92 [0.91, 0.94]	0.91 [0.89, 0.92]	0.07 [0.02, 0.12]	0.09 [0.04, 0.14]
COPD	Original Implementation	17	0.89 [0.87, 0.90]	0.92 [0.91, 0.94]	0.90 [0.89, 0.92]	0.07 [0.03, 0.12]	0.09 [0.04, 0.15]
MS	MobGap	18	0.93 [0.91, 0.95]	0.94 [0.92, 0.96]	0.93 [0.91, 0.95]	0.06 [-0.00, 0.13]	0.09 [0.02, 0.15]
MS	Original Implementation	18	0.93 [0.91, 0.95]	0.94 [0.92, 0.96]	0.93 [0.91, 0.96]	0.06 [-0.00, 0.13]	0.09 [0.03, 0.15]
PD	MobGap	19	0.92 [0.89, 0.95]	0.94 [0.91, 0.97]	0.93 [0.90, 0.96]	0.06 [0.01, 0.11]	0.09 [0.04, 0.14]
PD	Original Implementation	19	0.92 [0.89, 0.95]	0.94 [0.91, 0.97]	0.93 [0.90, 0.96]	0.06 [0.01, 0.10]	0.08 [0.03, 0.14]
PFF	MobGap	17	0.84 [0.73, 0.95]	0.84 [0.73, 0.95]	0.84 [0.73, 0.94]	0.07 [0.02, 0.11]	0.08 [0.03, 0.13]
PFF	Original Implementation	17	0.84 [0.73, 0.95]	0.84 [0.73, 0.95]	0.84 [0.73, 0.95]	0.06 [0.02, 0.11]	0.08 [0.03, 0.13]

Laboratory Comparison

Every datapoint below is one trial of a test. Note that each datapoint is weighted equally in the calculation of the performance metrics. This is a limitation of this simple approach, as the number of strides per trial and the complexity of the context can vary significantly. For a full picture, different groups of tests should be analyzed separately. The approach below should still provide a good overview to compare the algorithms.
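The effect of weighting every trial equally can be seen in a small example. The counts below are hypothetical, and the pooled variant is shown only for contrast; it is not what the analysis on this page computes.

```python
from statistics import mean

# Hypothetical trials: a short trial with few contacts and a long one.
trials = [
    {"tp": 4, "fn": 4},   # short trial, recall 0.50
    {"tp": 95, "fn": 5},  # long trial, recall 0.95
]


def recall(tp: int, fn: int) -> float:
    """Fraction of reference initial contacts that were detected."""
    return tp / (tp + fn)


# Equal weighting: average the per-trial recall values.
unweighted_recall = mean(recall(t["tp"], t["fn"]) for t in trials)

# Pooled alternative: sum the counts first, so long trials dominate.
total_tp = sum(t["tp"] for t in trials)
total_fn = sum(t["fn"] for t in trials)
pooled_recall = recall(total_tp, total_fn)

print(f"per-trial mean: {unweighted_recall:.3f}, pooled: {pooled_recall:.3f}")
```

With equal weighting, the short trial pulls the average down as strongly as the long trial pulls it up, regardless of how many initial contacts each contains.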

hue_order = ["Original Implementation", "MobGap"]

fig, ax = plt.subplots()
sns.boxplot(
    data=lab_results_long,
    x="algo",
    y="f1_score",
    hue="version",
    hue_order=hue_order,
    ax=ax,
)
fig.show()

perf_metrics_all = lab_results_long.pipe(
    multilevel_groupby_apply_merge,
    [
        (
            ["algo", "version"],
            partial(apply_aggregations, aggregations=custom_aggs),
        ),
        (
            ["algo"],
            partial(apply_transformations, transformations=stats_transform),
        ),
    ],
).pipe(format_results)
perf_metrics_all.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["algo"],
)

algo	version	General	ICD			IC Timing
		# recordings	Recall	Precision	F1 Score	Abs. Error [s]	Bias and LoA
IcdHKLeeImproved	MobGap	1168	0.80 [0.79, 0.82]	0.88 [0.86, 0.89]	0.83 [0.82, 0.85]	0.12 [0.02, 0.22]	0.18 [0.02, 0.34]
IcdIonescu	MobGap	1168	0.73 [0.72, 0.75]	0.88 [0.86, 0.89]	0.79 [0.78, 0.81]	0.05 [-0.02, 0.13]	0.08 [-0.02, 0.18]
IcdIonescu	Original Implementation	1168	0.72 [0.70, 0.73]	0.86 [0.84, 0.88]	0.78 [0.76, 0.80]	0.05 [-0.02, 0.12]	0.07 [-0.02, 0.17]
IcdShinImproved	MobGap	1168	0.79 [0.77, 0.80]	0.88 [0.86, 0.90]	0.83 [0.81, 0.84]	0.05 [-0.01, 0.12]	0.08 [-0.01, 0.17]

Per Cohort

While this provides a good overview, it does not fully reflect how these algorithms perform on the different cohorts.

fig, ax = plt.subplots()
sns.boxplot(
    data=lab_results_long,
    x="cohort",
    y="f1_score",
    hue="algo_with_version",
    ax=ax,
)
fig.show()

perf_metrics_per_cohort = (
    lab_results_long.pipe(
        multilevel_groupby_apply_merge,
        [
            (
                ["cohort", "algo", "version"],
                partial(apply_aggregations, aggregations=custom_aggs),
            ),
            (
                ["cohort", "algo"],
                partial(apply_transformations, transformations=stats_transform),
            ),
        ],
    )
    .pipe(format_results)
    .loc[cohort_order]
)
perf_metrics_per_cohort.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort", "algo"],
)

cohort	algo	version	General	ICD			IC Timing
			# recordings	Recall	Precision	F1 Score	Abs. Error [s]	Bias and LoA
HA	IcdHKLeeImproved	MobGap	227	0.74 [0.69, 0.78]	0.83 [0.78, 0.87]	0.78 [0.73, 0.82]	0.10 [0.01, 0.20]	0.17 [0.01, 0.33]
	IcdIonescu	MobGap	227	0.67 [0.63, 0.71]	0.82 [0.78, 0.87]	0.74 [0.69, 0.78]	0.04 [-0.03, 0.11]	0.06 [-0.04, 0.16]
	IcdIonescu	Original Implementation	227	0.67 [0.63, 0.71]	0.82 [0.77, 0.87]	0.73 [0.69, 0.78]	0.04 [-0.03, 0.10]	0.06 [-0.04, 0.16]
	IcdShinImproved	MobGap	227	0.72 [0.67, 0.76]	0.82 [0.78, 0.87]	0.76 [0.72, 0.81]	0.05 [-0.02, 0.11]	0.08 [-0.02, 0.18]
CHF	IcdHKLeeImproved	MobGap	106	0.81 [0.76, 0.86]	0.89 [0.83, 0.94]	0.84 [0.79, 0.89]	0.12 [0.02, 0.21]	0.18 [0.02, 0.33]
	IcdIonescu	MobGap	106	0.74 [0.70, 0.79]	0.89 [0.84, 0.94]	0.81 [0.76, 0.85]	0.06 [-0.02, 0.13]	0.08 [-0.02, 0.18]
	IcdIonescu	Original Implementation	106	0.74 [0.70, 0.79]	0.89 [0.84, 0.94]	0.80 [0.76, 0.85]	0.05 [-0.02, 0.12]	0.08 [-0.02, 0.17]
	IcdShinImproved	MobGap	106	0.80 [0.75, 0.85]	0.89 [0.84, 0.95]	0.84 [0.79, 0.89]	0.05 [-0.01, 0.12]	0.08 [-0.01, 0.17]
COPD	IcdHKLeeImproved	MobGap	214	0.74 [0.70, 0.79]	0.82 [0.78, 0.87]	0.78 [0.73, 0.83]	0.11 [-0.00, 0.23]	0.18 [-0.01, 0.38]
	IcdIonescu	MobGap	214	0.68 [0.64, 0.72]	0.83 [0.78, 0.88]	0.74 [0.70, 0.79]	0.04 [-0.02, 0.10]	0.06 [-0.03, 0.15]
	IcdIonescu	Original Implementation	214	0.68 [0.64, 0.72]	0.83 [0.78, 0.88]	0.74 [0.70, 0.79]	0.04 [-0.02, 0.10]	0.06 [-0.03, 0.16]
	IcdShinImproved	MobGap	214	0.73 [0.69, 0.78]	0.83 [0.78, 0.88]	0.77 [0.73, 0.82]	0.04 [-0.02, 0.11]	0.07 [-0.03, 0.16]
MS	IcdHKLeeImproved	MobGap	228	0.87 [0.85, 0.89]	0.94 [0.92, 0.96]	0.90 [0.88, 0.92]	0.13 [0.03, 0.22]	0.19 [0.05, 0.33]
	IcdIonescu	MobGap	228	0.78 [0.76, 0.81]	0.94 [0.91, 0.96]	0.85 [0.83, 0.87]	0.07 [-0.02, 0.16]	0.10 [-0.00, 0.21]
	IcdIonescu	Original Implementation	228	0.75 [0.72, 0.78]	0.90 [0.87, 0.94]	0.82 [0.79, 0.85]	0.06 [-0.02, 0.15]	0.09 [-0.01, 0.20]
	IcdShinImproved	MobGap	228	0.85 [0.82, 0.87]	0.94 [0.92, 0.96]	0.89 [0.87, 0.91]	0.07 [-0.01, 0.14]	0.10 [0.00, 0.19]
PD	IcdHKLeeImproved	MobGap	224	0.78 [0.74, 0.82]	0.84 [0.80, 0.89]	0.81 [0.77, 0.85]	0.12 [0.01, 0.23]	0.18 [0.02, 0.35]
	IcdIonescu	MobGap	224	0.71 [0.67, 0.74]	0.85 [0.81, 0.89]	0.77 [0.73, 0.81]	0.05 [-0.02, 0.11]	0.07 [-0.02, 0.15]
	IcdIonescu	Original Implementation	224	0.69 [0.65, 0.73]	0.83 [0.78, 0.87]	0.75 [0.71, 0.79]	0.04 [-0.02, 0.11]	0.06 [-0.02, 0.15]
	IcdShinImproved	MobGap	224	0.77 [0.73, 0.81]	0.85 [0.81, 0.90]	0.81 [0.77, 0.85]	0.05 [-0.01, 0.11]	0.07 [-0.01, 0.16]
PFF	IcdHKLeeImproved	MobGap	169	0.90 [0.89, 0.92]	0.95 [0.94, 0.97]	0.92 [0.91, 0.93]	0.13 [0.05, 0.21]	0.19 [0.09, 0.29]
	IcdIonescu	MobGap	169	0.83 [0.81, 0.84]	0.95 [0.93, 0.96]	0.88 [0.87, 0.89]	0.06 [-0.00, 0.13]	0.09 [0.02, 0.16]
	IcdIonescu	Original Implementation	169	0.81 [0.79, 0.83]	0.93 [0.90, 0.95]	0.86 [0.84, 0.88]	0.06 [-0.01, 0.13]	0.08 [0.01, 0.15]
	IcdShinImproved	MobGap	169	0.89 [0.87, 0.90]	0.95 [0.94, 0.97]	0.91 [0.90, 0.92]	0.06 [0.00, 0.12]	0.09 [0.01, 0.17]

Only relevant algorithms

fig, ax = plt.subplots()
sns.boxplot(
    data=lab_results_long.query("algo == 'IcdIonescu'"),
    x="cohort",
    y="f1_score",
    hue="algo_with_version",
    ax=ax,
)
fig.show()

final_perf_metrics = perf_metrics_per_cohort.query(
    "algo == 'IcdIonescu'"
).reset_index(level="algo", drop=True)

final_perf_metrics.style.pipe(
    revalidation_table_styles,
    validation_thresholds,
    ["cohort"],
)

cohort	version	General	ICD			IC Timing
		# recordings	Recall	Precision	F1 Score	Abs. Error [s]	Bias and LoA
HA	MobGap	227	0.67 [0.63, 0.71]	0.82 [0.78, 0.87]	0.74 [0.69, 0.78]	0.04 [-0.03, 0.11]	0.06 [-0.04, 0.16]
HA	Original Implementation	227	0.67 [0.63, 0.71]	0.82 [0.77, 0.87]	0.73 [0.69, 0.78]	0.04 [-0.03, 0.10]	0.06 [-0.04, 0.16]
CHF	MobGap	106	0.74 [0.70, 0.79]	0.89 [0.84, 0.94]	0.81 [0.76, 0.85]	0.06 [-0.02, 0.13]	0.08 [-0.02, 0.18]
CHF	Original Implementation	106	0.74 [0.70, 0.79]	0.89 [0.84, 0.94]	0.80 [0.76, 0.85]	0.05 [-0.02, 0.12]	0.08 [-0.02, 0.17]
COPD	MobGap	214	0.68 [0.64, 0.72]	0.83 [0.78, 0.88]	0.74 [0.70, 0.79]	0.04 [-0.02, 0.10]	0.06 [-0.03, 0.15]
COPD	Original Implementation	214	0.68 [0.64, 0.72]	0.83 [0.78, 0.88]	0.74 [0.70, 0.79]	0.04 [-0.02, 0.10]	0.06 [-0.03, 0.16]
MS	MobGap	228	0.78 [0.76, 0.81]	0.94 [0.91, 0.96]	0.85 [0.83, 0.87]	0.07 [-0.02, 0.16]	0.10 [-0.00, 0.21]
MS	Original Implementation	228	0.75 [0.72, 0.78]	0.90 [0.87, 0.94]	0.82 [0.79, 0.85]	0.06 [-0.02, 0.15]	0.09 [-0.01, 0.20]
PD	MobGap	224	0.71 [0.67, 0.74]	0.85 [0.81, 0.89]	0.77 [0.73, 0.81]	0.05 [-0.02, 0.11]	0.07 [-0.02, 0.15]
PD	Original Implementation	224	0.69 [0.65, 0.73]	0.83 [0.78, 0.87]	0.75 [0.71, 0.79]	0.04 [-0.02, 0.11]	0.06 [-0.02, 0.15]
PFF	MobGap	169	0.83 [0.81, 0.84]	0.95 [0.93, 0.96]	0.88 [0.87, 0.89]	0.06 [-0.00, 0.13]	0.09 [0.02, 0.16]
PFF	Original Implementation	169	0.81 [0.79, 0.83]	0.93 [0.90, 0.95]	0.86 [0.84, 0.88]	0.06 [-0.01, 0.13]	0.08 [0.01, 0.15]

Total running time of the script: (0 minutes 6.588 seconds)

Estimated memory usage: 81 MB
