.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_revalidation/laterality/_01_lrc_analysis.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_revalidation_laterality__01_lrc_analysis.py:

.. _lrc_val_results:

Performance of the laterality classification algorithms on the TVS dataset
==========================================================================

The following provides an analysis and comparison of the laterality classification algorithms on the TVS dataset
(lab and free-living).
We look into the actual performance of the algorithms compared to the reference data.
Unlike the other revalidation scripts, this one does not load old "matlab" results, as none exist.
The laterality algorithm by Ullrich et al. was validated independently and was already written in Python.
The implemented version follows the old version very closely.

The goal of this revalidation is to validate the re-trained model (with the updated training code) on the TVS dataset.
We compare it against the old model, the McCamley algorithm, and the BenMansour algorithm.

.. note:: If you are interested in how these results are calculated, head over to the
    :ref:`processing page `.

.. GENERATED FROM PYTHON SOURCE LINES 23-30

Below is the list of algorithms that we will compare.
Note that we use the postfix "MobGap" to refer to the newly trained model, while "Original Implementation" refers to
the models trained as part of previous work.
We compare all the available models.
For context, the "MS-ALL" models are used by default in the pipelines.
For the McCamley algorithm, only a single version exists.
The same is true for the acceleration-based BenMansour algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 30-39

.. code-block:: Python

    # Maps the key of each stored result to its (display name, version) label.
    algorithms = {
        "Mansour": ("BenMansour", "-"),
        "McCamley": ("McCamley", "-"),
        "UllrichOld__ms_all": ("Ullrich - MS-ALL", "Original Implementation"),
        "UllrichOld__ms_ms": ("Ullrich - MS-MS", "Original Implementation"),
        "UllrichNew__ms_all": ("Ullrich - MS-ALL", "MobGap"),
    }

.. GENERATED FROM PYTHON SOURCE LINES 40-47

The code below loads the data and prepares it for the analysis.
By default, the data will be downloaded from an online repository (and cached locally).
If you want to use a local copy of the data, you can set the `MOBGAP_VALIDATION_DATA_PATH` environment variable
and set `MOBGAP_VALIDATION_USE_LOCAL_DATA` to `1`.
The file download will print some log output, which can usually be ignored.
You can also change the `version` parameter to load a different version of the data.

.. GENERATED FROM PYTHON SOURCE LINES 47-118

.. code-block:: Python

    from pathlib import Path

    import pandas as pd
    from mobgap.data.validation_results import ValidationResultLoader
    from mobgap.utils.misc import get_env_var


    def format_loaded_results(
        values: dict[tuple[str, str], pd.DataFrame],
        index_cols: list[str],
    ) -> pd.DataFrame:
        # The (algo, version) dict keys become the first two index levels.
        formatted = (
            pd.concat(values, names=["algo", "version", *index_cols])
            .reset_index()
            .assign(
                algo_with_version=lambda df: (
                    df["algo"] + " (" + df["version"] + ")"
                ),
                _combined="combined",
            )
        )
        return formatted


    # Only use the local copy if it was explicitly requested.
    local_data_path = (
        Path(get_env_var("MOBGAP_VALIDATION_DATA_PATH")) / "results"
        if int(get_env_var("MOBGAP_VALIDATION_USE_LOCAL_DATA", 0))
        else None
    )
    __RESULT_VERSION = "v1.2.0"
    loader = ValidationResultLoader(
        "lrc", result_path=local_data_path, version=__RESULT_VERSION
    )

    free_living_index_cols = [
        "cohort",
        "participant_id",
        "time_measure",
        "recording",
        "recording_name",
        "recording_name_pretty",
    ]

    free_living_results = format_loaded_results(
        {
            v: loader.load_single_results(k, "free_living")
            for k, v in algorithms.items()
        },
        free_living_index_cols,
    )

    lab_index_cols = [
        "cohort",
        "participant_id",
        "time_measure",
        "test",
        "trial",
        "test_name",
        "test_name_pretty",
    ]

    lab_results = format_loaded_results(
        {
            v: loader.load_single_results(k, "laboratory")
            for k, v in algorithms.items()
        },
        lab_index_cols,
    )

    cohort_order = ["HA", "CHF", "COPD", "MS", "PD", "PFF"]

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0%| | 0.00/2.38k [00:00

.. code-block:: Python

    pd.DataFrame:
        return (
            df.pipe(apply_transformations, format_transforms)
            .rename(columns=final_names)
            .loc[:, list(final_names.values())]
        )

.. GENERATED FROM PYTHON SOURCE LINES 196-204

Free-Living Comparison
----------------------
We focus on the free-living data for the comparison as this is the expected use case for the algorithms.

All results across all cohorts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The results below represent the average performance across all participants independent of the cohort.

.. GENERATED FROM PYTHON SOURCE LINES 204-235

.. code-block:: Python

    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots()
    sns.boxplot(
        data=free_living_results, x="algo_with_version", y="accuracy", ax=ax
    )
    plt.xticks(rotation=45, ha="right")
    fig.tight_layout()
    fig.show()

    fig, ax = plt.subplots()
    sns.boxplot(
        data=free_living_results,
        x="algo_with_version",
        y="accuracy_pairwise",
        ax=ax,
    )
    plt.xticks(rotation=45, ha="right")
    fig.tight_layout()
    fig.show()

    perf_metrics_all = (
        free_living_results.groupby(["algo", "version"])
        .apply(apply_aggregations, custom_aggs, include_groups=False)
        .pipe(format_tables)
    )
    perf_metrics_all.style.pipe(
        revalidation_table_styles, validation_thresholds, ["algo"]
    )

.. rst-class:: sphx-glr-horizontal

    * .. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_001.png
         :alt: 01 lrc analysis
         :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_001.png
         :class: sphx-glr-single-img

    * .. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_002.png
         :alt: 01 lrc analysis
         :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_002.png
         :class: sphx-glr-single-img

.. csv-table::
    :header: "algo", "version", "# participants", "Accuracy", "Accuracy IC-pairs"

    "BenMansour", "-", 101, "0.89 [0.86, 0.92]", "0.89 [0.87, 0.90]"
    "McCamley", "-", 101, "0.78 [0.75, 0.82]", "0.77 [0.75, 0.79]"
    "Ullrich - MS-ALL", "MobGap", 101, "0.80 [0.77, 0.84]", "0.81 [0.79, 0.83]"
    "Ullrich - MS-ALL", "Original Implementation", 101, "0.77 [0.74, 0.81]", "0.74 [0.71, 0.77]"
    "Ullrich - MS-MS", "Original Implementation", 101, "0.76 [0.72, 0.80]", "0.76 [0.73, 0.78]"
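
Each cell in the table above combines a point estimate with a bracketed interval. The aggregation code is not shown in this part of the page, so the following is only a minimal sketch of one common way to produce such a cell: the mean across participants with a normal-approximation 95% confidence interval. The helper names ``mean_and_ci`` and ``format_cell``, the choice of a 95% interval, and the sample values are assumptions made for illustration, not the pipeline's actual implementation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_and_ci(values, level=0.95):
    """Return (mean, lower, upper) using a normal approximation.

    Hypothetical helper for illustration; the real aggregation may differ
    (for example, it could use a t-distribution or bootstrapping).
    """
    n = len(values)
    if n < 2:
        raise ValueError("Need at least two values to estimate an interval.")
    m = mean(values)
    # Standard error of the mean, based on the sample standard deviation.
    sem = stdev(values) / sqrt(n)
    # Two-sided critical value, roughly 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m, m - z * sem, m + z * sem


def format_cell(values):
    """Format the result in the 'mean [lower, upper]' style of the table."""
    m, lower, upper = mean_and_ci(values)
    return f"{m:.2f} [{lower:.2f}, {upper:.2f}]"


# Made-up per-participant accuracies, not taken from the TVS dataset.
example_accuracies = [0.95, 0.90, 0.85, 0.99, 0.60, 0.92]
print(format_cell(example_accuracies))
```

An interval built this way is symmetric around the mean and is not clipped to the range 0 to 1, so for small groups with a high mean its upper bound can exceed 1.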


.. GENERATED FROM PYTHON SOURCE LINES 236-239

Per Cohort
~~~~~~~~~~
The results below represent the average performance across all participants within a cohort.

.. GENERATED FROM PYTHON SOURCE LINES 239-271

.. code-block:: Python

    fig, ax = plt.subplots()
    sns.boxplot(
        data=free_living_results,
        x="cohort",
        y="accuracy",
        hue="algo_with_version",
        order=cohort_order,
        ax=ax,
    )
    ax.set_title("Accuracy")
    fig.show()

    fig, ax = plt.subplots()
    sns.boxplot(
        data=free_living_results,
        x="cohort",
        y="accuracy_pairwise",
        hue="algo_with_version",
        order=cohort_order,
        ax=ax,
    )
    ax.set_title("Accuracy IC-pairs")
    fig.show()

    perf_metrics_cohort = (
        free_living_results.groupby(["cohort", "algo", "version"])
        .apply(apply_aggregations, custom_aggs, include_groups=False)
        .pipe(format_tables)
        .loc[cohort_order]
    )
    perf_metrics_cohort.style.pipe(
        revalidation_table_styles, validation_thresholds, ["cohort", "algo"]
    )

.. rst-class:: sphx-glr-horizontal

    * .. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_003.png
         :alt: Accuracy
         :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_003.png
         :class: sphx-glr-single-img

    * .. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_004.png
         :alt: Accuracy IC-pairs
         :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_004.png
         :class: sphx-glr-single-img

.. csv-table::
    :header: "cohort", "algo", "version", "# participants", "Accuracy", "Accuracy IC-pairs"

    "HA", "BenMansour", "-", 20, "0.92 [0.88, 0.95]", "0.91 [0.89, 0.92]"
    "HA", "McCamley", "-", 20, "0.85 [0.80, 0.90]", "0.81 [0.76, 0.86]"
    "HA", "Ullrich - MS-ALL", "MobGap", 20, "0.86 [0.81, 0.91]", "0.84 [0.80, 0.88]"
    "HA", "Ullrich - MS-ALL", "Original Implementation", 20, "0.84 [0.79, 0.88]", "0.78 [0.73, 0.84]"
    "HA", "Ullrich - MS-MS", "Original Implementation", 20, "0.82 [0.76, 0.87]", "0.78 [0.73, 0.83]"
    "CHF", "BenMansour", "-", 10, "0.89 [0.76, 1.02]", "0.92 [0.90, 0.94]"
    "CHF", "McCamley", "-", 10, "0.82 [0.73, 0.91]", "0.80 [0.72, 0.88]"
    "CHF", "Ullrich - MS-ALL", "MobGap", 10, "0.83 [0.73, 0.93]", "0.85 [0.80, 0.90]"
    "CHF", "Ullrich - MS-ALL", "Original Implementation", 10, "0.80 [0.71, 0.90]", "0.78 [0.72, 0.85]"
    "CHF", "Ullrich - MS-MS", "Original Implementation", 10, "0.72 [0.61, 0.84]", "0.72 [0.63, 0.82]"
    "COPD", "BenMansour", "-", 17, "0.89 [0.84, 0.95]", "0.87 [0.83, 0.90]"
    "COPD", "McCamley", "-", 17, "0.70 [0.62, 0.78]", "0.71 [0.66, 0.76]"
    "COPD", "Ullrich - MS-ALL", "MobGap", 17, "0.76 [0.68, 0.84]", "0.78 [0.74, 0.82]"
    "COPD", "Ullrich - MS-ALL", "Original Implementation", 17, "0.70 [0.62, 0.78]", "0.69 [0.64, 0.73]"
    "COPD", "Ullrich - MS-MS", "Original Implementation", 17, "0.76 [0.70, 0.83]", "0.74 [0.70, 0.78]"
    "MS", "BenMansour", "-", 18, "0.86 [0.75, 0.97]", "0.89 [0.84, 0.93]"
    "MS", "McCamley", "-", 18, "0.78 [0.68, 0.87]", "0.79 [0.74, 0.84]"
    "MS", "Ullrich - MS-ALL", "MobGap", 18, "0.79 [0.69, 0.88]", "0.82 [0.77, 0.86]"
    "MS", "Ullrich - MS-ALL", "Original Implementation", 18, "0.75 [0.66, 0.84]", "0.73 [0.67, 0.79]"
    "MS", "Ullrich - MS-MS", "Original Implementation", 18, "0.75 [0.65, 0.86]", "0.80 [0.75, 0.85]"
    "PD", "BenMansour", "-", 19, "0.92 [0.90, 0.95]", "0.90 [0.86, 0.93]"
    "PD", "McCamley", "-", 19, "0.79 [0.71, 0.87]", "0.78 [0.72, 0.84]"
    "PD", "Ullrich - MS-ALL", "MobGap", 19, "0.81 [0.71, 0.91]", "0.83 [0.76, 0.90]"
    "PD", "Ullrich - MS-ALL", "Original Implementation", 19, "0.79 [0.70, 0.88]", "0.78 [0.70, 0.85]"
    "PD", "Ullrich - MS-MS", "Original Implementation", 19, "0.78 [0.68, 0.87]", "0.78 [0.71, 0.84]"
    "PFF", "BenMansour", "-", 17, "0.83 [0.72, 0.94]", "0.85 [0.82, 0.89]"
    "PFF", "McCamley", "-", 17, "0.75 [0.67, 0.84]", "0.73 [0.68, 0.78]"
    "PFF", "Ullrich - MS-ALL", "MobGap", 17, "0.76 [0.65, 0.86]", "0.76 [0.69, 0.83]"
    "PFF", "Ullrich - MS-ALL", "Original Implementation", 17, "0.75 [0.65, 0.84]", "0.70 [0.61, 0.79]"
    "PFF", "Ullrich - MS-MS", "Original Implementation", 17, "0.70 [0.60, 0.81]", "0.70 [0.62, 0.77]"
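
The tables above report two metrics, "Accuracy" and "Accuracy IC-pairs", but their definitions are not spelled out on this page. The sketch below assumes that the plain accuracy is the fraction of initial contacts (ICs) with the correct left/right label, and that the pairwise variant checks, for each pair of consecutive ICs, whether the predicted relation (same foot or other foot) matches the reference. Both function names and the pairwise definition are assumptions made for illustration and may differ from what the pipeline actually computes.

```python
def accuracy(reference, predicted):
    """Fraction of ICs whose left/right label matches the reference."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("Need two non-empty label sequences of equal length.")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def pairwise_accuracy(reference, predicted):
    """Fraction of consecutive IC pairs with a matching side relation.

    Assumed definition: a pair counts as correct if reference and prediction
    agree on whether the two ICs are on the same foot or on different feet.
    """
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("Need two label sequences of equal length >= 2.")
    ref_change = [a != b for a, b in zip(reference, reference[1:])]
    pred_change = [a != b for a, b in zip(predicted, predicted[1:])]
    matches = sum(r == p for r, p in zip(ref_change, pred_change))
    return matches / len(ref_change)


# Made-up example: every predicted label is mirrored relative to the reference.
reference = ["left", "right", "left", "right", "left", "right"]
mirrored = ["right" if label == "left" else "left" for label in reference]

print(accuracy(reference, mirrored))           # 0.0
print(pairwise_accuracy(reference, mirrored))  # 1.0
```

Under these assumed definitions, a consistently mirrored prediction scores 0 on the plain accuracy but 1 on the pairwise variant, so the two columns need not move together.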


.. GENERATED FROM PYTHON SOURCE LINES 272-276

Deep Dive Analysis of Main Algorithms
-------------------------------------
Below, we show the direct correlation between the results from the old and the new implementation.
Each datapoint represents one participant.

.. GENERATED FROM PYTHON SOURCE LINES 276-323

.. code-block:: Python

    from mobgap.plotting import (
        calc_min_max_with_margin,
        make_square,
        move_legend_outside,
        plot_regline,
    )


    def compare_scatter_plot(data, name):
        fig, ax = plt.subplots(figsize=(8, 8), constrained_layout=True)
        # One row per participant, one accuracy column per version.
        reformated_data = (
            data.pivot_table(
                values="accuracy",
                index=("cohort", "participant_id"),
                columns="version",
            )
            .reset_index()
            .dropna(how="any")
        )

        min_max = calc_min_max_with_margin(
            reformated_data["Original Implementation"], reformated_data["MobGap"]
        )
        sns.scatterplot(
            reformated_data,
            x="Original Implementation",
            y="MobGap",
            hue="cohort",
            ax=ax,
        )
        plot_regline(
            reformated_data["Original Implementation"],
            reformated_data["MobGap"],
            ax=ax,
        )
        make_square(ax, min_max, draw_diagonal=True)
        move_legend_outside(fig, ax)
        ax.set_title(name)
        ax.set_xlabel("Original Implementation")
        ax.set_ylabel("MobGap")
        plt.show()


    free_living_results.query("algo == 'Ullrich - MS-ALL'").pipe(
        compare_scatter_plot, "Ullrich - MS-ALL"
    )

.. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_005.png
   :alt: Ullrich - MS-ALL
   :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_005.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 324-333

Conclusion Free-Living
~~~~~~~~~~~~~~~~~~~~~~
It is good to see that the new version of the algorithm performs slightly better than the old version.
However, it is unclear why the new model is different, as we used almost the same pipeline and the same data.
The non-ML algorithm (McCamley) performs surprisingly well, and much better than in the tests we did as part of
Mobilise-D.
Overall, the performance is not as good as we would like it to be.
This is particularly true for a couple of participants, for whom the performance is as low as 0.1.
The BenMansour algorithm performs best overall, even though it is the simplest algorithm in the group.
This is very surprising and should be investigated further.

.. GENERATED FROM PYTHON SOURCE LINES 336-344

Laboratory Comparison
---------------------
Every datapoint below is one trial of a test.
Note that each datapoint is weighted equally in the calculation of the performance metrics.
This is a limitation of this simple approach, as the number of strides per trial and the complexity of the context
can vary significantly.
For a full picture, different groups of tests should be analyzed separately.
The approach below should still provide a good overview for comparing the algorithms.

.. GENERATED FROM PYTHON SOURCE LINES 344-367

.. code-block:: Python

    fig, ax = plt.subplots()
    sns.boxplot(data=lab_results, x="algo_with_version", y="accuracy", ax=ax)
    plt.xticks(rotation=45, ha="right")
    fig.tight_layout()
    fig.show()

    fig, ax = plt.subplots()
    sns.boxplot(
        data=lab_results, x="algo_with_version", y="accuracy_pairwise", ax=ax
    )
    plt.xticks(rotation=45, ha="right")
    fig.tight_layout()
    fig.show()

    perf_metrics_all = (
        lab_results.groupby(["algo", "version"])
        .apply(apply_aggregations, custom_aggs, include_groups=False)
        .pipe(format_tables)
    )
    perf_metrics_all.style.pipe(
        revalidation_table_styles, validation_thresholds, ["algo"]
    )

.. rst-class:: sphx-glr-horizontal

    * .. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_006.png
         :alt: 01 lrc analysis
         :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_006.png
         :class: sphx-glr-single-img

    * .. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_007.png
         :alt: 01 lrc analysis
         :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_007.png
         :class: sphx-glr-single-img

.. csv-table::
    :header: "algo", "version", "# participants", "Accuracy", "Accuracy IC-pairs"

    "BenMansour", "-", 1168, "0.87 [0.86, 0.87]", "0.85 [0.84, 0.86]"
    "McCamley", "-", 1168, "0.80 [0.79, 0.81]", "0.77 [0.76, 0.78]"
    "Ullrich - MS-ALL", "MobGap", 1168, "0.85 [0.84, 0.86]", "0.83 [0.82, 0.84]"
    "Ullrich - MS-ALL", "Original Implementation", 1168, "0.79 [0.78, 0.80]", "0.74 [0.73, 0.75]"
    "Ullrich - MS-MS", "Original Implementation", 1168, "0.78 [0.77, 0.79]", "0.75 [0.74, 0.76]"
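
As noted at the start of the laboratory comparison, every trial contributes equally to the aggregated values, regardless of how many strides it contains. The sketch below contrasts this unweighted mean with a stride-weighted mean on made-up numbers. The ``n_strides`` field and both helper functions are hypothetical; the loaded results shown on this page are not known to contain such a column.

```python
def unweighted_mean(trials):
    """Average accuracy where every trial counts the same."""
    return sum(t["accuracy"] for t in trials) / len(trials)


def stride_weighted_mean(trials):
    """Average accuracy where each trial counts in proportion to its strides."""
    total_strides = sum(t["n_strides"] for t in trials)
    weighted_sum = sum(t["accuracy"] * t["n_strides"] for t in trials)
    return weighted_sum / total_strides


# Made-up trials: a short, perfectly classified trial and a long, poor one.
trials = [
    {"accuracy": 1.0, "n_strides": 4},
    {"accuracy": 0.5, "n_strides": 40},
]

print(f"{unweighted_mean(trials):.3f}")       # 0.750
print(f"{stride_weighted_mean(trials):.3f}")  # 0.545
```

With equal weighting, the short trial pulls the average up as much as the long one pulls it down, which is why groups of tests with very different lengths are better analyzed separately.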


.. GENERATED FROM PYTHON SOURCE LINES 368-371

Per Cohort
~~~~~~~~~~
The results below represent the average performance across all trials of all participants within a cohort.

.. GENERATED FROM PYTHON SOURCE LINES 371-402

.. code-block:: Python

    fig, ax = plt.subplots()
    sns.boxplot(
        data=lab_results,
        x="cohort",
        y="accuracy",
        hue="algo_with_version",
        order=cohort_order,
        ax=ax,
    )
    fig.show()

    fig, ax = plt.subplots()
    sns.boxplot(
        data=lab_results,
        x="cohort",
        y="accuracy_pairwise",
        hue="algo_with_version",
        order=cohort_order,
        ax=ax,
    )
    fig.show()

    perf_metrics_cohort = (
        lab_results.groupby(["cohort", "algo", "version"])
        .apply(apply_aggregations, custom_aggs, include_groups=False)
        .pipe(format_tables)
        .loc[cohort_order]
    )
    perf_metrics_cohort.style.pipe(
        revalidation_table_styles, validation_thresholds, ["cohort", "algo"]
    )

.. rst-class:: sphx-glr-horizontal

    * .. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_008.png
         :alt: 01 lrc analysis
         :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_008.png
         :class: sphx-glr-single-img

    * .. image-sg:: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_009.png
         :alt: 01 lrc analysis
         :srcset: /auto_revalidation/laterality/images/sphx_glr__01_lrc_analysis_009.png
         :class: sphx-glr-single-img

.. csv-table::
    :header: "cohort", "algo", "version", "# participants", "Accuracy", "Accuracy IC-pairs"

    "HA", "BenMansour", "-", 227, "0.88 [0.87, 0.89]", "0.85 [0.83, 0.86]"
    "HA", "McCamley", "-", 227, "0.80 [0.77, 0.83]", "0.78 [0.75, 0.80]"
    "HA", "Ullrich - MS-ALL", "MobGap", 227, "0.82 [0.79, 0.84]", "0.82 [0.80, 0.84]"
    "HA", "Ullrich - MS-ALL", "Original Implementation", 227, "0.76 [0.73, 0.79]", "0.74 [0.72, 0.77]"
    "HA", "Ullrich - MS-MS", "Original Implementation", 227, "0.81 [0.79, 0.83]", "0.78 [0.75, 0.80]"
    "CHF", "BenMansour", "-", 106, "0.88 [0.86, 0.90]", "0.85 [0.83, 0.88]"
    "CHF", "McCamley", "-", 106, "0.85 [0.82, 0.87]", "0.80 [0.77, 0.83]"
    "CHF", "Ullrich - MS-ALL", "MobGap", 106, "0.88 [0.86, 0.90]", "0.86 [0.83, 0.88]"
    "CHF", "Ullrich - MS-ALL", "Original Implementation", 106, "0.81 [0.79, 0.84]", "0.75 [0.71, 0.79]"
    "CHF", "Ullrich - MS-MS", "Original Implementation", 106, "0.73 [0.69, 0.77]", "0.69 [0.65, 0.73]"
    "COPD", "BenMansour", "-", 214, "0.81 [0.78, 0.84]", "0.81 [0.79, 0.83]"
    "COPD", "McCamley", "-", 214, "0.80 [0.77, 0.83]", "0.77 [0.74, 0.80]"
    "COPD", "Ullrich - MS-ALL", "MobGap", 214, "0.87 [0.85, 0.90]", "0.85 [0.83, 0.88]"
    "COPD", "Ullrich - MS-ALL", "Original Implementation", 214, "0.79 [0.76, 0.82]", "0.76 [0.73, 0.78]"
    "COPD", "Ullrich - MS-MS", "Original Implementation", 214, "0.83 [0.81, 0.85]", "0.78 [0.75, 0.80]"
    "MS", "BenMansour", "-", 228, "0.87 [0.85, 0.89]", "0.85 [0.83, 0.87]"
    "MS", "McCamley", "-", 228, "0.79 [0.76, 0.82]", "0.77 [0.74, 0.79]"
    "MS", "Ullrich - MS-ALL", "MobGap", 228, "0.86 [0.84, 0.88]", "0.83 [0.80, 0.85]"
    "MS", "Ullrich - MS-ALL", "Original Implementation", 228, "0.81 [0.79, 0.83]", "0.74 [0.71, 0.77]"
    "MS", "Ullrich - MS-MS", "Original Implementation", 228, "0.76 [0.73, 0.79]", "0.74 [0.71, 0.77]"
    "PD", "BenMansour", "-", 224, "0.88 [0.87, 0.90]", "0.86 [0.84, 0.87]"
    "PD", "McCamley", "-", 224, "0.77 [0.74, 0.79]", "0.75 [0.73, 0.78]"
    "PD", "Ullrich - MS-ALL", "MobGap", 224, "0.81 [0.79, 0.84]", "0.82 [0.79, 0.84]"
    "PD", "Ullrich - MS-ALL", "Original Implementation", 224, "0.77 [0.74, 0.79]", "0.72 [0.69, 0.75]"
    "PD", "Ullrich - MS-MS", "Original Implementation", 224, "0.75 [0.73, 0.78]", "0.73 [0.70, 0.75]"
    "PFF", "BenMansour", "-", 169, "0.89 [0.87, 0.91]", "0.87 [0.84, 0.89]"
    "PFF", "McCamley", "-", 169, "0.83 [0.81, 0.85]", "0.77 [0.74, 0.80]"
    "PFF", "Ullrich - MS-ALL", "MobGap", 169, "0.86 [0.84, 0.88]", "0.81 [0.79, 0.84]"
    "PFF", "Ullrich - MS-ALL", "Original Implementation", 169, "0.81 [0.79, 0.83]", "0.74 [0.72, 0.76]"
    "PFF", "Ullrich - MS-MS", "Original Implementation", 169, "0.81 [0.79, 0.84]", "0.76 [0.73, 0.79]"


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 5.224 seconds)

**Estimated memory usage:** 81 MB

.. _sphx_glr_download_auto_revalidation_laterality__01_lrc_analysis.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: _01_lrc_analysis.ipynb <_01_lrc_analysis.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: _01_lrc_analysis.py <_01_lrc_analysis.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: _01_lrc_analysis.zip <_01_lrc_analysis.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_