.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/laterality/_99_lrc_evaluation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_laterality__99_lrc_evaluation.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_laterality__99_lrc_evaluation.py:

.. _lrc_evaluation:

LRC Evaluation
==============

This example demonstrates how to evaluate an LRC algorithm.
As left-right classification is a balanced binary classification problem, we can apply simple metrics like accuracy to
evaluate the performance of the algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 12-19

.. code-block:: default

    import pandas as pd
    from mobgap.data import LabExampleDataset
    from mobgap.laterality import LrcUllrich
    from mobgap.pipeline import GsIterator
    from mobgap.utils.conversions import to_body_frame

.. GENERATED FROM PYTHON SOURCE LINES 20-26

Loading some example data
--------------------------
First, we load some example data and apply the ``LrcUllrich`` algorithm with its default pre-trained model to it.
We use the reference initial contacts (ICs) as input for the algorithm, so that we can evaluate the L/R classification
independently of the IC detection.
However, you could use the ICs detected by any other algorithm as input as well.

.. GENERATED FROM PYTHON SOURCE LINES 26-69

..
code-block:: default


    def load_data():
        lab_example_data = LabExampleDataset(reference_system="INDIP")
        single_test = lab_example_data.get_subset(
            cohort="MS", participant_id="001", test="Test11", trial="Trial1"
        )
        return single_test


    def calculate_output(single_test_data):
        """Calculate the L/R-labelled IC list per WB using ``LrcUllrich``."""
        iterator = GsIterator()
        ref_paras = single_test_data.reference_parameters_relative_to_wb_
        for (gs, data), r in iterator.iterate(
            to_body_frame(single_test_data.data_ss), ref_paras.wb_list
        ):
            # The reference ICs of the current WB are the input of the L/R classification.
            ref_ics = ref_paras.ic_list.loc[gs.id]
            r.ic_list = (
                LrcUllrich()
                .predict(
                    data,
                    ref_ics,
                    sampling_rate_hz=single_test_data.sampling_rate_hz,
                )
                .ic_lr_list_
            )
        return iterator.results_.ic_list


    def load_reference(single_test_data):
        """Load the reference IC list (including the L/R labels) from the test data."""
        ref_ic_list = single_test_data.reference_parameters_.ic_list
        return ref_ic_list


    test_data = load_data()
    calculated_ic_lr_list = calculate_output(test_data)
    reference_ic_lr_list = load_reference(test_data)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/data/_mobilised_matlab_loader.py:1091: UserWarning: There were multiple ICs with the same index value, but different LR labels. This is likely an issue with the reference system you should further investigate. For now, we set the `lr_label` of the stride corresponding to this IC to Nan. However, both values still remain in the IC list.
      return parse_reference_parameters(
    /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/data/_mobilised_matlab_loader.py:1082: UserWarning: There were multiple ICs with the same index value, but different LR labels. This is likely an issue with the reference system you should further investigate. For now, we set the `lr_label` of the stride corresponding to this IC to Nan. However, both values still remain in the IC list.
      return parse_reference_parameters(

..
GENERATED FROM PYTHON SOURCE LINES 70-72 We can see that the calculated and the reference ic_list have the same structure with the ``lr_label`` column providing the detected label per initial contact. .. GENERATED FROM PYTHON SOURCE LINES 72-74 .. code-block:: default calculated_ic_lr_list .. raw:: html
ic lr_label
wb_id step_id
0 0 1019 right
1 1065 left
2 1129 right
3 1172 left
4 1236 right
... ... ... ...
5 7 21896 left
8 21951 right
9 22042 left
10 22089 right
11 22128 right

93 rows × 2 columns



.. GENERATED FROM PYTHON SOURCE LINES 75-77 .. code-block:: default reference_ic_lr_list .. raw:: html
ic lr_label
wb_id step_id
0 0 1019 right
1 1065 left
2 1129 right
3 1172 left
4 1236 right
... ... ... ...
5 7 21896 right
8 21951 left
9 22042 right
10 22089 left
11 22128 right

93 rows × 2 columns
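
A quick way to see where the two lists disagree is to match the rows by their ``ic`` value and keep only the rows
with differing labels.
The following is a minimal sketch in plain Python that uses only the ten rows visible in the two tables above
(not the full 93 rows).
The helper ``find_mismatches`` is a hypothetical name introduced for illustration and is not part of mobgap.

```python
# (ic, lr_label) pairs transcribed from the ten rows visible in the tables above.
calculated = [
    (1019, "right"), (1065, "left"), (1129, "right"), (1172, "left"), (1236, "right"),
    (21896, "left"), (21951, "right"), (22042, "left"), (22089, "right"), (22128, "right"),
]
reference = [
    (1019, "right"), (1065, "left"), (1129, "right"), (1172, "left"), (1236, "right"),
    (21896, "right"), (21951, "left"), (22042, "right"), (22089, "left"), (22128, "right"),
]


def find_mismatches(calculated, reference):
    """Return (ic, reference_label, detected_label) for all ICs with differing labels.

    Hypothetical helper for illustration only.
    """
    ref_by_ic = dict(reference)
    return [
        (ic, ref_by_ic[ic], label)
        for ic, label in calculated
        if ref_by_ic[ic] != label
    ]


mismatches = find_mismatches(calculated, reference)
for ic, ref_label, detected_label in mismatches:
    print(f"IC {ic}: reference={ref_label}, detected={detected_label}")
```

In the visible rows, the first five ICs of WB 0 agree, while four of the last five ICs of WB 5 carry swapped labels.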



.. GENERATED FROM PYTHON SOURCE LINES 78-81

Visual comparison of the detected and reference labels
------------------------------------------------------
One easy way to compare the results is to visualize them as colored bars.

.. GENERATED FROM PYTHON SOURCE LINES 81-110

.. code-block:: default

    import matplotlib.pyplot as plt


    def plot_lr(ref, detected):
        fig, ax = plt.subplots(figsize=(15, 5))

        # We plot one vertical bar per IC at its sample index (red for left, blue for right).
        # Detected labels are drawn in the lower row, reference labels in the upper row.
        for (_, row), (_, ref_row) in zip(detected.iterrows(), ref.iterrows()):
            ax.plot(
                [row["ic"], row["ic"]],
                [0, 0.98],
                color="r" if row["lr_label"] == "left" else "b",
                linewidth=5,
            )
            ax.plot(
                [ref_row["ic"], ref_row["ic"]],
                [1.02, 2],
                color="r" if ref_row["lr_label"] == "left" else "b",
                linewidth=5,
            )

        ax.set_yticks([0.5, 1.5])
        ax.set_yticklabels(["Detected", "Reference"])
        return fig, ax


    fig, _ = plot_lr(reference_ic_lr_list, calculated_ic_lr_list)
    fig.show()

.. image-sg:: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_001.png
   :alt: 99 lrc evaluation
   :srcset: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 111-113

If we zoom in on a longer WB, we can see that for some ICs the L/R label does not match.
However, for regular gait in the center of the WB in particular, the labels match quite well.

.. GENERATED FROM PYTHON SOURCE LINES 113-118

.. code-block:: default

    fig, ax = plot_lr(reference_ic_lr_list, calculated_ic_lr_list)
    ax.set_xlim(12000, 15000)
    fig.show()

.. image-sg:: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_002.png
   :alt: 99 lrc evaluation
   :srcset: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_002.png
   :class: sphx-glr-single-img

..
GENERATED FROM PYTHON SOURCE LINES 119-122 Calculating evaluation metrics ------------------------------ We can also quantify the agreement between the detected and the reference labels using typical classification metrics. .. GENERATED FROM PYTHON SOURCE LINES 122-133 .. code-block:: default from sklearn.metrics import classification_report pd.DataFrame( classification_report( reference_ic_lr_list["lr_label"], calculated_ic_lr_list["lr_label"], target_names=["left", "right"], output_dict=True, ) ).T .. raw:: html
precision recall f1-score support
left 0.823529 0.807692 0.815534 52.000000
right 0.761905 0.780488 0.771084 41.000000
accuracy 0.795699 0.795699 0.795699 0.795699
macro avg 0.792717 0.794090 0.793309 93.000000
weighted avg 0.796362 0.795699 0.795938 93.000000
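
To make the numbers in the report above easier to interpret, the following sketch recomputes them from confusion
counts in plain Python.
The four counts are an assumption: they are not printed in this example, but were inferred from the ``support``,
``precision``, and ``recall`` values of the report.
The helper ``per_class_metrics`` is a hypothetical name used only for this illustration.

```python
# Confusion counts, keyed by (reference label, detected label).
# Assumption: these values are inferred from the report above, not taken from the library output.
counts = {
    ("left", "left"): 42,
    ("left", "right"): 10,
    ("right", "left"): 9,
    ("right", "right"): 32,
}


def per_class_metrics(counts, label):
    """Return (precision, recall, f1, support) for one class. Hypothetical helper."""
    tp = counts[(label, label)]
    # Reference is `label`, but something else was detected.
    fn = sum(n for (ref, det), n in counts.items() if ref == label and det != label)
    # `label` was detected, but the reference is something else.
    fp = sum(n for (ref, det), n in counts.items() if det == label and ref != label)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, tp + fn


total = sum(counts.values())
accuracy = sum(n for (ref, det), n in counts.items() if ref == det) / total

for label in ("left", "right"):
    precision, recall, f1, support = per_class_metrics(counts, label)
    print(f"{label}: precision={precision:.6f} recall={recall:.6f} f1={f1:.6f} support={support}")
print(f"accuracy={accuracy:.6f}")
```

Under these assumed counts, the accuracy is simply the share of ICs on the diagonal, (42 + 32) / 93.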


.. GENERATED FROM PYTHON SOURCE LINES 134-136

In general, we focus on the accuracy, as this is a balanced binary classification problem.
If you only need this value, you can directly calculate the ``accuracy_score``.

.. GENERATED FROM PYTHON SOURCE LINES 136-142

.. code-block:: default

    from sklearn.metrics import accuracy_score

    accuracy_score(
        reference_ic_lr_list["lr_label"], calculated_ic_lr_list["lr_label"]
    )

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.7956989247311828

.. GENERATED FROM PYTHON SOURCE LINES 143-144

Similarly, we could create a confusion matrix to get more insights into the performance of the algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 144-151

.. code-block:: default

    from sklearn.metrics import ConfusionMatrixDisplay

    disp = ConfusionMatrixDisplay.from_predictions(
        reference_ic_lr_list["lr_label"], calculated_ic_lr_list["lr_label"]
    )
    disp.figure_.show()

.. image-sg:: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_003.png
   :alt: 99 lrc evaluation
   :srcset: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 152-160

Running a full evaluation pipeline
----------------------------------
Instead of manually evaluating and investigating the performance of an algorithm on a single piece of data, we often
want to run a full evaluation on an entire dataset.
This can be done using the :class:`~mobgap.laterality.pipeline.LrcEmulationPipeline` class and some ``tpcp`` functions.

But let's start with selecting some data.
We want to use all the simulated real-world walking data from the INDIP reference system (Test11).

.. GENERATED FROM PYTHON SOURCE LINES 160-166

.. code-block:: default

    simulated_real_world_walking = LabExampleDataset(
        reference_system="INDIP"
    ).get_subset(test="Test11")

    simulated_real_world_walking

.. raw:: html

LabExampleDataset [3 groups/rows]

cohort participant_id time_measure test trial
0 HA 001 TimeMeasure1 Test11 Trial1
1 HA 002 TimeMeasure1 Test11 Trial1
2 MS 001 TimeMeasure1 Test11 Trial1


.. GENERATED FROM PYTHON SOURCE LINES 167-168

Now we can create a pipeline instance and directly run it on one of the datapoints of the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 168-174

.. code-block:: default

    from mobgap.laterality.pipeline import LrcEmulationPipeline

    pipeline = LrcEmulationPipeline(LrcUllrich())

    pipeline.safe_run(simulated_real_world_walking[0]).ic_lr_list_

.. raw:: html
ic lr_label
wb_id step_id
0 0 632 right
1 709 left
2 763 left
3 824 right
4 876 left
... ... ... ...
5 3 12162 left
4 12220 right
5 12277 left
6 12335 right
7 12516 right

63 rows × 2 columns



.. GENERATED FROM PYTHON SOURCE LINES 175-181

This is exactly what we did before, just on a pipeline level, without manually extracting the data from the dataset.

To actually run a validation, we need to iterate over all datapoints and calculate the accuracy for each of them.
This can be done using the :func:`~tpcp.validate.validate` function.
Note that the ``LrcEmulationPipeline`` class already has a ``score`` method that returns the accuracy.
This is used by default, but you could supply your own scoring method as well.

.. GENERATED FROM PYTHON SOURCE LINES 181-188

.. code-block:: default

    from tpcp.validate import validate

    evaluation_results_with_opti = pd.DataFrame(
        validate(pipeline, simulated_real_world_walking)
    )
    evaluation_results_with_opti.drop(["single__raw_results"], axis=1).T

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Datapoints: 0%| | 0/3 [00:00
0
debug__score_time 1.026261
data_labels [(HA, 001, TimeMeasure1, Test11, Trial1), (HA,...
single__accuracy [0.7142857142857143, 0.8043478260869565, 0.795...
agg__accuracy 0.771444
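
The aggregated value in the table above is the unweighted mean of the per-datapoint accuracies, so every recording
counts equally, regardless of how many ICs it contains.
The sketch below reproduces this value in plain Python and contrasts it with a pooled accuracy over all ICs.
The third accuracy is truncated in the table and is taken from the ``accuracy_score`` output for the same recording
shown earlier.
The IC counts for the first and third recording (63 and 93) are printed in this example, while the 46 ICs for
``HA 002`` are an assumption inferred from its accuracy (37 / 46).

```python
from statistics import mean

# Per-datapoint accuracies (HA 001, HA 002, MS 001).
single_accuracy = [0.7142857142857143, 0.8043478260869565, 0.7956989247311828]

# Number of ICs per datapoint. Assumption: 46 for HA 002 is inferred, not printed in the example.
n_ics = [63, 46, 93]

# Unweighted mean over datapoints. This is what the table reports as ``agg__accuracy``.
agg_accuracy = mean(single_accuracy)

# Pooled accuracy: all correctly labelled ICs divided by all ICs.
n_correct = [round(acc * n) for acc, n in zip(single_accuracy, n_ics)]
pooled_accuracy = sum(n_correct) / sum(n_ics)

print(f"mean over datapoints: {agg_accuracy:.6f}")
print(f"pooled over all ICs:  {pooled_accuracy:.6f}")
```

The two values are close here, but they can differ noticeably when recordings vary strongly in length.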


.. GENERATED FROM PYTHON SOURCE LINES 189-195

The accuracy provided in ``agg__accuracy`` is the mean accuracy over all datapoints.
The accuracy per datapoint can be found in the ``single__accuracy`` column.

In addition to the metrics, we also provide the raw results for each datapoint in the ``single__raw_results`` column.
These can be used for further analysis, for example, to calculate the confusion matrix over all ICs of all datapoints.

.. GENERATED FROM PYTHON SOURCE LINES 195-203

.. code-block:: default

    raw_results = pd.concat(
        evaluation_results_with_opti["single__raw_results"][0],
        keys=evaluation_results_with_opti["data_labels"][0],
        axis=0,
    )
    raw_results.head()

.. raw:: html
ic lr_label ref_lr_label
wb_id step_id
HA 001 TimeMeasure1 Test11 Trial1 0 0 632 right left
1 709 left right
2 763 left left
3 824 right right
4 876 left left
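
Conceptually, a confusion matrix over these raw results is just a tally of (reference label, detected label) pairs.
As a minimal sketch, the following plain-Python snippet tallies the five rows shown in the table above.
It only illustrates the idea and does not replace the scikit-learn helpers used in this example.

```python
from collections import Counter

# (lr_label, ref_lr_label) of the five rows shown above, i.e. (detected, reference).
rows = [
    ("right", "left"),
    ("left", "right"),
    ("left", "left"),
    ("right", "right"),
    ("left", "left"),
]

# Tally the pairs as (reference, detected), the argument order used for the confusion matrix.
confusion = Counter((ref, det) for det, ref in rows)

for (ref, det), n in sorted(confusion.items()):
    print(f"reference={ref:<5} detected={det:<5} count={n}")

# Accuracy is the share of pairs where both labels agree.
accuracy = sum(n for (ref, det), n in confusion.items() if ref == det) / len(rows)
print(f"accuracy over these rows: {accuracy}")
```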


.. GENERATED FROM PYTHON SOURCE LINES 204-205

The confusion matrix can be calculated using the same functions as before.

.. GENERATED FROM PYTHON SOURCE LINES 205-210

.. code-block:: default

    disp = ConfusionMatrixDisplay.from_predictions(
        raw_results["ref_lr_label"], raw_results["lr_label"]
    )
    disp.figure_.show()

.. image-sg:: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_004.png
   :alt: 99 lrc evaluation
   :srcset: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 211-231

If you want to calculate additional metrics, you can either create a custom score function or subclass the pipeline
and override the ``score`` method.

Parameter Optimization and Model Training
-----------------------------------------
Simply applying an algorithm for evaluation is one thing, but often we want to optimize the parameters of the
algorithm, train internal models, or both, and then evaluate the performance of this optimization approach rather than
that of a fixed algorithm/model.

In this case, we need to create a train-test split of the dataset to ensure that we have independent data for the
optimization.
In general, we recommend using a cross-validation approach.
This can be done using the :func:`~tpcp.validate.cross_validate` function.

In the example below, we show the "most complicated" case, where we retrain the internal model of the ``LrcUllrich``
algorithm and optimize one of the hyperparameters of the internal SVM.

As we retrain the model and optimize hyperparameters, we need to use a :class:`~tpcp.optimize.GridSearchCV` nested
within the cross-validation loop.
Let's set this up first.

.. GENERATED FROM PYTHON SOURCE LINES 231-237

.. code-block:: default

    from sklearn.model_selection import ParameterGrid
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC
    from tpcp.optimize import GridSearchCV

..
GENERATED FROM PYTHON SOURCE LINES 238-239

We initialize the emulation pipeline with a new classifier pipeline that consists of an untrained scaler and an
untrained model.

.. GENERATED FROM PYTHON SOURCE LINES 239-244

.. code-block:: default

    clf_pipeline = Pipeline(
        [("scaler", MinMaxScaler()), ("clf", SVC(kernel="linear"))]
    )
    pipeline = LrcEmulationPipeline(LrcUllrich(clf_pipe=clf_pipeline))

.. GENERATED FROM PYTHON SOURCE LINES 245-247

Then we can create a parameter grid for the grid search.
Note that we use ``__`` to set nested parameters.

.. GENERATED FROM PYTHON SOURCE LINES 247-249

.. code-block:: default

    para_grid = ParameterGrid({"algo__clf_pipe__clf__C": [0.1, 1.0, 10.0]})

.. GENERATED FROM PYTHON SOURCE LINES 250-253

Then we pass the pipeline to the optimizer.
We only use a 2-fold cross-validation for this example, as we only have 2 datapoints per train set and we want to
minimize the run time.

.. GENERATED FROM PYTHON SOURCE LINES 253-255

.. code-block:: default

    optimizer = GridSearchCV(pipeline, para_grid, return_optimized="accuracy", cv=2)

.. GENERATED FROM PYTHON SOURCE LINES 256-257

Let's test the optimizer first on a manual train set.

.. GENERATED FROM PYTHON SOURCE LINES 257-259

.. code-block:: default

    optimizer.optimize(simulated_real_world_walking[:2])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Split-Para Combos: 0%| | 0/6 [00:00, pipeline=LrcEmulationPipeline(algo=LrcUllrich(clf_pipe=Pipeline(steps=[('scaler', MinMaxScaler()), ('clf', SVC(kernel='linear'))]), smoothing_filter=ButterworthFilter(cutoff_freq_hz=(0.5, 2), filter_type='bandpass', order=4, zero_phase=True))), pre_dispatch='n_jobs', progress_bar=True, pure_parameters=False, return_optimized='accuracy', return_train_score=False, safe_optimize=True, scoring=None, verbose=0)

.. GENERATED FROM PYTHON SOURCE LINES 260-261

We can inspect the results:

.. GENERATED FROM PYTHON SOURCE LINES 261-264

..
code-block:: default results = pd.DataFrame(optimizer.cv_results_) results.loc[:, ~results.columns.str.endswith("raw_results")].T .. raw:: html
0 1 2
mean__debug__optimize_time 0.296903 0.27337 0.274028
std__debug__optimize_time 0.015367 0.009517 0.010843
mean__debug__score_time 0.329314 0.330383 0.328388
std__debug__score_time 0.020126 0.021075 0.020266
split0__test__data_labels [(HA, 001, TimeMeasure1, Test11, Trial1)] [(HA, 001, TimeMeasure1, Test11, Trial1)] [(HA, 001, TimeMeasure1, Test11, Trial1)]
split1__test__data_labels [(HA, 002, TimeMeasure1, Test11, Trial1)] [(HA, 002, TimeMeasure1, Test11, Trial1)] [(HA, 002, TimeMeasure1, Test11, Trial1)]
split0__train__data_labels [(HA, 002, TimeMeasure1, Test11, Trial1)] [(HA, 002, TimeMeasure1, Test11, Trial1)] [(HA, 002, TimeMeasure1, Test11, Trial1)]
split1__train__data_labels [(HA, 001, TimeMeasure1, Test11, Trial1)] [(HA, 001, TimeMeasure1, Test11, Trial1)] [(HA, 001, TimeMeasure1, Test11, Trial1)]
param__algo__clf_pipe__clf__C 0.1 1.0 10.0
params {'algo__clf_pipe__clf__C': 0.1} {'algo__clf_pipe__clf__C': 1.0} {'algo__clf_pipe__clf__C': 10.0}
split0__test__agg__accuracy 0.507937 0.746032 0.793651
split1__test__agg__accuracy 0.652174 0.695652 0.782609
mean__test__agg__accuracy 0.580055 0.720842 0.78813
std__test__agg__accuracy 0.072119 0.02519 0.005521
rank__test__agg__accuracy 3 2 1
split0__test__single__accuracy [0.5079365079365079] [0.746031746031746] [0.7936507936507936]
split1__test__single__accuracy [0.6521739130434783] [0.6956521739130435] [0.782608695652174]
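
The ``mean__``, ``std__``, and ``rank__`` rows in the table above are summaries of the per-split test accuracies.
The following plain-Python sketch recomputes them from the values in the ``split*__test__single__accuracy`` rows.
It assumes that the standard deviation is the population standard deviation (no degrees-of-freedom correction),
which is consistent with the numbers in the table.

```python
from statistics import mean, pstdev

# Test accuracies per value of C, one entry per inner split (split0, split1).
split_scores = {
    0.1: [0.5079365079365079, 0.6521739130434783],
    1.0: [0.746031746031746, 0.6956521739130435],
    10.0: [0.7936507936507936, 0.782608695652174],
}

# Mean and population standard deviation over the splits.
summary = {c: (mean(scores), pstdev(scores)) for c, scores in split_scores.items()}

# Rank 1 is the parameter value with the highest mean test accuracy.
ranked = sorted(summary, key=lambda c: summary[c][0], reverse=True)
best_c = ranked[0]

for rank, c in enumerate(ranked, start=1):
    m, s = summary[c]
    print(f"rank {rank}: C={c} mean={m:.6f} std={s:.6f}")
```

With ``return_optimized="accuracy"``, the parameter set with the best mean accuracy (here ``C=10.0``) is the one that
gets selected for the final model.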


.. GENERATED FROM PYTHON SOURCE LINES 265-266 And apply/score the best performing and retrained model directly on the test set. .. GENERATED FROM PYTHON SOURCE LINES 266-268 .. code-block:: default optimizer.score(simulated_real_world_walking[2])["accuracy"] .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/data/_mobilised_matlab_loader.py:1091: UserWarning: There were multiple ICs with the same index value, but different LR labels. This is likely an issue with the reference system you should further investigate. For now, we set the `lr_label` of the stride corresponding to this IC to Nan. However, both values still remain in the IC list. return parse_reference_parameters( /home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/data/_mobilised_matlab_loader.py:1082: UserWarning: There were multiple ICs with the same index value, but different LR labels. This is likely an issue with the reference system you should further investigate. For now, we set the `lr_label` of the stride corresponding to this IC to Nan. However, both values still remain in the IC list. return parse_reference_parameters( 0.8709677419354839 .. GENERATED FROM PYTHON SOURCE LINES 269-270 Let's run everything combined with the external cross-validate to actually validate our optimization approach. .. GENERATED FROM PYTHON SOURCE LINES 270-279 .. code-block:: default from tpcp.validate import cross_validate evaluation_results_with_opti = pd.DataFrame( cross_validate(optimizer, simulated_real_world_walking, cv=3) ) evaluation_results_with_opti.loc[ :, ~evaluation_results_with_opti.columns.str.endswith("raw_results") ].T .. rst-class:: sphx-glr-script-out .. code-block:: none CV Folds: 0%| | 0/3 [00:00
0 1 2
debug__score_time 0.349116 0.31151 0.362476
debug__optimize_time 4.582617 4.833665 4.515596
train__data_labels [(HA, 002, TimeMeasure1, Test11, Trial1), (MS,... [(HA, 001, TimeMeasure1, Test11, Trial1), (MS,... [(HA, 001, TimeMeasure1, Test11, Trial1), (HA,...
test__data_labels [(HA, 001, TimeMeasure1, Test11, Trial1)] [(HA, 002, TimeMeasure1, Test11, Trial1)] [(MS, 001, TimeMeasure1, Test11, Trial1)]
test__single__accuracy [0.9047619047619048] [0.5869565217391305] [0.8709677419354839]
test__agg__accuracy 0.904762 0.586957 0.870968
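
With three datapoints and ``cv=3``, the outer cross-validation is effectively a leave-one-out scheme, and the inner
2-fold grid search only ever sees the two remaining datapoints.
The sketch below illustrates this nesting in plain Python.
It assumes contiguous, unshuffled folds, which matches the train and test labels in the table above.
The helper ``k_fold_splits`` and the short datapoint names are hypothetical and only used for illustration.

```python
from itertools import product


def k_fold_splits(items, n_splits):
    """Yield (train, test) lists for contiguous, unshuffled folds. Hypothetical helper."""
    base, extra = divmod(len(items), n_splits)
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        test = items[start : start + size]
        train = items[:start] + items[start + size :]
        yield train, test
        start += size


datapoints = ["HA-001", "HA-002", "MS-001"]
c_values = [0.1, 1.0, 10.0]

outer_folds = list(k_fold_splits(datapoints, 3))
for train, test in outer_folds:
    # Inner grid search: every parameter value is evaluated on every inner split.
    inner_combos = list(product(c_values, k_fold_splits(train, 2)))
    print(f"test={test} train={train} inner split-parameter combos={len(inner_combos)}")
```

The six inner combinations per outer fold correspond to the ``0/6`` in the "Split-Para Combos" progress output above.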


.. GENERATED FROM PYTHON SOURCE LINES 280-283

We can compare these results with the performance of the pre-trained model, which was not optimized for the given
dataset.
To do this, we use :class:`~tpcp.optimize.DummyOptimize` to run a cross-validation without any optimization.
This simply evaluates the pre-trained model on exactly the same test sets as the optimized model.

.. GENERATED FROM PYTHON SOURCE LINES 283-296

.. code-block:: default

    from tpcp.optimize import DummyOptimize

    optimizer = DummyOptimize(
        LrcEmulationPipeline(LrcUllrich()), ignore_potential_user_error_warning=True
    )

    evaluation_results_pre_trained = pd.DataFrame(
        cross_validate(optimizer, simulated_real_world_walking, cv=3)
    )
    evaluation_results_pre_trained.loc[
        :, ~evaluation_results_pre_trained.columns.str.endswith("raw_results")
    ].T

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    CV Folds: 0%| | 0/3 [00:00
0 1 2
debug__score_time 0.379112 0.336233 0.38061
debug__optimize_time 0.002706 0.002096 0.002234
train__data_labels [(HA, 002, TimeMeasure1, Test11, Trial1), (MS,... [(HA, 001, TimeMeasure1, Test11, Trial1), (MS,... [(HA, 001, TimeMeasure1, Test11, Trial1), (HA,...
test__data_labels [(HA, 001, TimeMeasure1, Test11, Trial1)] [(HA, 002, TimeMeasure1, Test11, Trial1)] [(MS, 001, TimeMeasure1, Test11, Trial1)]
test__single__accuracy [0.7142857142857143] [0.8043478260869565] [0.7956989247311828]
test__agg__accuracy 0.714286 0.804348 0.795699
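
As both cross-validations use the same test sets, the per-fold accuracies of the two tables can be compared directly.
The following plain-Python sketch does this with the ``test__single__accuracy`` values printed above.
With only three folds, the differences should be read as an illustration of the comparison, not as evidence for
either model.

```python
from statistics import mean

folds = ["HA 001", "HA 002", "MS 001"]

# ``test__single__accuracy`` values from the two result tables above.
optimized = [0.9047619047619048, 0.5869565217391305, 0.8709677419354839]
pre_trained = [0.7142857142857143, 0.8043478260869565, 0.7956989247311828]

# Positive values mean that the retrained and optimized model performed better on that fold.
differences = [opt - pre for opt, pre in zip(optimized, pre_trained)]

for fold, diff in zip(folds, differences):
    print(f"{fold}: {diff:+.6f}")

print(f"mean optimized:   {mean(optimized):.6f}")
print(f"mean pre-trained: {mean(pre_trained):.6f}")
```

In this run, the retrained model is better on two folds and clearly worse on ``HA 002``, which shows how unstable the
results are with so few datapoints.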


.. GENERATED FROM PYTHON SOURCE LINES 297-300

Note that using so little data is not a good idea in practice.
There are many parameters that you should tune to make this a robust validation.
However, this example should provide a good starting point for your own experiments.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 27.020 seconds)

**Estimated memory usage:** 18 MB

.. _sphx_glr_download_auto_examples_laterality__99_lrc_evaluation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: _99_lrc_evaluation.py <_99_lrc_evaluation.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: _99_lrc_evaluation.ipynb <_99_lrc_evaluation.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_