.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/laterality/_99_lrc_evaluation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_laterality__99_lrc_evaluation.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_laterality__99_lrc_evaluation.py:

.. _lrc_evaluation:

LRC Evaluation
==============

This example demonstrates how to evaluate an LRC algorithm.
As left-right classification is a balanced binary classification problem, we can apply simple metrics like accuracy
to evaluate the performance of the algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 12-19

.. code-block:: Python

    import pandas as pd
    from mobgap.data import LabExampleDataset
    from mobgap.laterality import LrcUllrich
    from mobgap.pipeline import GsIterator
    from mobgap.utils.conversions import to_body_frame

.. GENERATED FROM PYTHON SOURCE LINES 20-26

Loading some example data
-------------------------
First, we load some example data and apply the LrcUllrich algorithm with its default pre-trained model to it.
We use the reference initial contacts as input for the algorithm, so that we can focus on the evaluation of the
L/R classification independently of the detection of the initial contacts.
However, you can use the output of any other IC detection algorithm as well.

.. GENERATED FROM PYTHON SOURCE LINES 26-69

.. code-block:: Python

    def load_data():
        lab_example_data = LabExampleDataset(reference_system="INDIP")
        single_test = lab_example_data.get_subset(
            cohort="MS", participant_id="001", test="Test11", trial="Trial1"
        )
        return single_test


    def calculate_output(single_test_data):
        """Calculate the L/R classification (LrcUllrich) output per WB."""
        iterator = GsIterator()
        ref_paras = single_test_data.reference_parameters_relative_to_wb_

        for (gs, data), r in iterator.iterate(
            to_body_frame(single_test_data.data_ss), ref_paras.wb_list
        ):
            # Use the reference ICs of the current WB as input for the L/R classification.
            ref_ics = ref_paras.ic_list.loc[gs.id]
            r.ic_list = (
                LrcUllrich()
                .predict(
                    data,
                    ref_ics,
                    sampling_rate_hz=single_test_data.sampling_rate_hz,
                )
                .ic_lr_list_
            )

        return iterator.results_.ic_list


    def load_reference(single_test_data):
        """Load the reference initial contacts (including L/R labels) from the test data."""
        ref_ic_list = single_test_data.reference_parameters_.ic_list
        return ref_ic_list


    test_data = load_data()
    calculated_ic_lr_list = calculate_output(test_data)
    reference_ic_lr_list = load_reference(test_data)

.. GENERATED FROM PYTHON SOURCE LINES 70-72

We can see that the calculated and the reference ic_list have the same structure, with the ``lr_label`` column
providing the detected label per initial contact.

.. GENERATED FROM PYTHON SOURCE LINES 72-74

.. code-block:: Python

    calculated_ic_lr_list

.. raw:: html

		ic	lr_label
wb_id	step_id
0	0	1019	right
	1	1065	left
	2	1129	left
	3	1172	left
	4	1236	right
...	...	...	...
5	7	21896	left
	8	21951	right
	9	22042	left
	10	22089	left
	11	22128	right

91 rows × 2 columns

.. GENERATED FROM PYTHON SOURCE LINES 75-77

.. code-block:: Python

    reference_ic_lr_list

.. raw:: html

		ic	lr_label
wb_id	step_id
0	0	1019	right
	1	1065	left
	2	1129	right
	3	1172	left
	4	1236	right
...	...	...	...
5	7	21896	right
	8	21951	left
	9	22042	right
	10	22089	left
	11	22128	right

91 rows × 2 columns
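
To locate where the two tables disagree, the labels can be compared row by row. The following is a minimal sketch in plain Python that uses only the five leading rows of the two tables shown above (the remaining rows are elided in the output and therefore not included). The variable names are illustrative and not part of mobgap.

```python
# First five rows (wb_id 0, step_id 0-4) copied from the two tables above.
ics = [1019, 1065, 1129, 1172, 1236]
calculated = ["right", "left", "left", "left", "right"]
reference = ["right", "left", "right", "left", "right"]

# Collect the ICs whose detected label differs from the reference label.
mismatches = [
    (ic, det, ref)
    for ic, det, ref in zip(ics, calculated, reference)
    if det != ref
]
# Fraction of ICs for which both labels agree.
agreement = 1 - len(mismatches) / len(ics)

print(mismatches)  # [(1129, 'left', 'right')]
print(agreement)  # 0.8
```

With the full DataFrames, the same idea should carry over to a comparison of the two ``lr_label`` columns, assuming both share the same index.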

.. GENERATED FROM PYTHON SOURCE LINES 78-81

Visual comparison of the detected and reference labels
------------------------------------------------------
One easy way to compare the results is to visualize them as colored bars.

.. GENERATED FROM PYTHON SOURCE LINES 81-110

.. code-block:: Python

    import matplotlib.pyplot as plt


    def plot_lr(ref, detected):
        fig, ax = plt.subplots(figsize=(15, 5))

        # We plot one bar per IC at its sample position, colored by laterality
        # (red = left, blue = right). Detected labels are drawn in the bottom row,
        # reference labels in the top row.
        for (_, row), (_, ref_row) in zip(detected.iterrows(), ref.iterrows()):
            ax.plot(
                [row["ic"], row["ic"]],
                [0, 0.98],
                color="r" if row["lr_label"] == "left" else "b",
                linewidth=5,
            )
            ax.plot(
                [ref_row["ic"], ref_row["ic"]],
                [1.02, 2],
                color="r" if ref_row["lr_label"] == "left" else "b",
                linewidth=5,
            )

        ax.set_yticks([0.5, 1.5])
        ax.set_yticklabels(["Detected", "Reference"])
        return fig, ax


    fig, _ = plot_lr(reference_ic_lr_list, calculated_ic_lr_list)
    fig.show()

.. image-sg:: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_001.png
   :alt: 99 lrc evaluation
   :srcset: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 111-113

If we zoom in on a longer WB, we can see that for some ICs the L/R label does not match.
However, particularly for regular gait in the center of the WB, the labels match quite well.

.. GENERATED FROM PYTHON SOURCE LINES 113-118

.. code-block:: Python

    fig, ax = plot_lr(reference_ic_lr_list, calculated_ic_lr_list)
    ax.set_xlim(12000, 15000)
    fig.show()

.. image-sg:: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_002.png
   :alt: 99 lrc evaluation
   :srcset: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 119-122

Calculating evaluation metrics
------------------------------
We can also quantify the agreement between the detected and the reference labels using typical classification
metrics.

.. GENERATED FROM PYTHON SOURCE LINES 122-140

.. code-block:: Python

    from sklearn.metrics import classification_report

    if not reference_ic_lr_list.empty:
        lr_classification_report = pd.DataFrame(
            classification_report(
                reference_ic_lr_list["lr_label"],
                calculated_ic_lr_list["lr_label"],
                target_names=["left", "right"],
                output_dict=True,
            )
        ).T
    else:
        # Without any reference ICs there is nothing to score, so we return an empty report.
        lr_classification_report = pd.DataFrame(
            columns=["precision", "recall", "f1-score", "support"]
        )

    lr_classification_report

.. raw:: html

	precision	recall	f1-score	support
left	0.782609	0.720000	0.750000	50.000000
right	0.688889	0.756098	0.720930	41.000000
accuracy	0.736264	0.736264	0.736264	0.736264
macro avg	0.735749	0.738049	0.735465	91.000000
weighted avg	0.740383	0.736264	0.736903	91.000000
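
The numbers in the report can be traced back to four counts. The sketch below assumes the counts implied by the table above (support multiplied by recall, rounded to whole ICs) and recomputes precision, recall, and accuracy by hand. The counts are derived values, not numbers printed by the example itself.

```python
# Support and recall per class, copied from the report above.
left_support, right_support = 50, 41
left_correct = round(left_support * 0.72)  # 36 reference-left ICs labeled "left"
right_correct = round(right_support * 0.756098)  # 31 reference-right ICs labeled "right"

# Every misclassified IC is counted as a prediction of the other class.
left_as_right = left_support - left_correct  # 14
right_as_left = right_support - right_correct  # 10

recall_left = left_correct / left_support
precision_left = left_correct / (left_correct + right_as_left)
recall_right = right_correct / right_support
precision_right = right_correct / (right_correct + left_as_right)
accuracy = (left_correct + right_correct) / (left_support + right_support)

print(round(precision_left, 6), round(recall_left, 6))  # 0.782609 0.72
print(round(precision_right, 6), round(recall_right, 6))  # 0.688889 0.756098
print(round(accuracy, 6))  # 0.736264
```

Because both classes have similar support (50 vs. 41), the accuracy stays close to the per-class recalls.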

.. GENERATED FROM PYTHON SOURCE LINES 141-143

In general, we focus on the accuracy, as this is a balanced binary classification problem.
If you only want this value, you can calculate the ``accuracy_score`` directly.

.. GENERATED FROM PYTHON SOURCE LINES 143-154

.. code-block:: Python

    from sklearn.metrics import accuracy_score

    lr_accuracy = (
        accuracy_score(
            reference_ic_lr_list["lr_label"], calculated_ic_lr_list["lr_label"]
        )
        if not reference_ic_lr_list.empty
        else float("nan")
    )
    lr_accuracy

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0.7362637362637363

.. GENERATED FROM PYTHON SOURCE LINES 155-156

Similarly, we could create a confusion matrix to get more insights into the performance of the algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 156-164

.. code-block:: Python

    from sklearn.metrics import ConfusionMatrixDisplay

    if not reference_ic_lr_list.empty:
        disp = ConfusionMatrixDisplay.from_predictions(
            reference_ic_lr_list["lr_label"], calculated_ic_lr_list["lr_label"]
        )
        disp.figure_.show()

.. image-sg:: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_003.png
   :alt: 99 lrc evaluation
   :srcset: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 165-173

Running a full evaluation pipeline
----------------------------------
Instead of manually evaluating and investigating the performance of an algorithm on a single piece of data, we often
want to run a full evaluation on an entire dataset.
This can be done using the :class:`~mobgap.laterality.pipeline.LrcEmulationPipeline` class and some ``tpcp``
functions.

But let's start with selecting some data.
We want to use all the simulated real-world walking data from the INDIP reference system (Test11).

.. GENERATED FROM PYTHON SOURCE LINES 173-179

.. code-block:: Python

    simulated_real_world_walking = LabExampleDataset(
        reference_system="INDIP"
    ).get_subset(test="Test11")

    simulated_real_world_walking

.. raw:: html

LabExampleDataset [3 groups/rows]

	cohort	participant_id	time_measure	test	trial
0	HA	001	TimeMeasure1	Test11	Trial1
1	HA	002	TimeMeasure1	Test11	Trial1
2	MS	001	TimeMeasure1	Test11	Trial1

.. GENERATED FROM PYTHON SOURCE LINES 180-181

Now we can create a pipeline instance and directly run it on one of the datapoints of the dataset.

.. GENERATED FROM PYTHON SOURCE LINES 181-187

.. code-block:: Python

    from mobgap.laterality.pipeline import LrcEmulationPipeline

    pipeline = LrcEmulationPipeline(LrcUllrich())

    pipeline.safe_run(simulated_real_world_walking[0]).ic_lr_list_

.. raw:: html

		ic	lr_label
wb_id	step_id
0	0	632	right
	1	709	left
	2	763	left
	3	824	right
	4	876	left
...	...	...	...
5	3	12162	left
	4	12220	right
	5	12277	left
	6	12335	right
	7	12516	left

63 rows × 2 columns

.. GENERATED FROM PYTHON SOURCE LINES 188-194

This is exactly what we did before, just on a pipeline level, without manually extracting the data from the dataset.

To actually run a validation, we need to iterate over all datapoints and calculate the accuracy for each of them.
This can be done using the :func:`~tpcp.validate.validate` function.
Note that we pass ``lrc_score`` as the scoring function, which calculates the accuracy for each datapoint.
You could supply your own scoring function instead.

.. GENERATED FROM PYTHON SOURCE LINES 194-206

.. code-block:: Python

    from mobgap.laterality.evaluation import lrc_score
    from tpcp.validate import validate

    evaluation_results_with_opti = pd.DataFrame(
        validate(
            pipeline,
            simulated_real_world_walking,
            scoring=lrc_score,
        )
    )
    # The raw predictions are dropped here to keep the displayed table compact.
    evaluation_results_with_opti.drop(["single__raw__predictions"], axis=1).T

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Datapoints: 0%| | 0/3 [00:00

	0
debug__score_time	0.729573
data_labels	[(HA, 001, TimeMeasure1, Test11, Trial1), (HA,...
single__accuracy	[0.7619047619047619, 0.7391304347826086, 0.736...
single__accuracy_pairwise	[0.6935483870967742, 0.5333333333333333, 0.666...
agg__accuracy	0.745766
agg__accuracy_pairwise	0.631183
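
The ``agg__`` rows of the table above are the mean of the corresponding ``single__`` values. A minimal sketch to check this by hand, using only the standard library; the per-datapoint values are copied from the outputs of this example (the table above truncates the third entry of each list, the full values appear in the cross-validation output of the pre-trained model):

```python
from statistics import mean

# Per-datapoint scores for HA 001, HA 002, and MS 001.
single_accuracy = [0.7619047619047619, 0.7391304347826086, 0.7362637362637363]
single_accuracy_pairwise = [
    0.6935483870967742,
    0.5333333333333333,
    0.6666666666666666,
]

# Aggregating with the mean reproduces the reported agg__ values.
agg_accuracy = mean(single_accuracy)
agg_accuracy_pairwise = mean(single_accuracy_pairwise)

print(round(agg_accuracy, 6))  # 0.745766
print(round(agg_accuracy_pairwise, 6))  # 0.631183
```

Note that this is an unweighted mean: each datapoint contributes equally, regardless of how many ICs it contains.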

.. GENERATED FROM PYTHON SOURCE LINES 207-213

The accuracy provided is the mean accuracy over all datapoints.
The accuracy per datapoint can be found in the ``single__accuracy`` column.

In addition to the metrics, we also provide the raw predictions for each datapoint in the
``single__raw__predictions`` column.
These can be used for further analysis, for example to calculate the confusion matrix over all ICs of all
datapoints.

.. GENERATED FROM PYTHON SOURCE LINES 213-217

.. code-block:: Python

    raw_results = evaluation_results_with_opti["single__raw__predictions"][0]
    raw_results.head()

.. raw:: html

							predicted	reference
cohort	participant_id	time_measure	test	trial	wb_id	step_id
HA	001	TimeMeasure1	Test11	Trial1	0	0	right	left
						1	left	right
						2	left	left
						3	right	right
						4	left	left
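
To make explicit what a confusion matrix over such ``predicted``/``reference`` pairs contains, here is a small sketch in plain Python. It only uses the five rows displayed by ``raw_results.head()`` above, so the resulting numbers describe this excerpt and not the full dataset.

```python
from collections import Counter

# The five rows shown in the table above.
predicted = ["right", "left", "left", "right", "left"]
reference = ["left", "right", "left", "right", "left"]

# Count every (reference, predicted) combination.
counts = Counter(zip(reference, predicted))

labels = ["left", "right"]
# Rows are reference labels, columns are predicted labels.
confusion = [[counts[(ref, pred)] for pred in labels] for ref in labels]
# The diagonal holds the correctly classified ICs.
accuracy = sum(confusion[i][i] for i in range(len(labels))) / len(reference)

print(confusion)  # [[2, 1], [1, 1]]
print(accuracy)  # 0.6
```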

.. GENERATED FROM PYTHON SOURCE LINES 218-219

The confusion matrix can be calculated using the same functions as before.

.. GENERATED FROM PYTHON SOURCE LINES 219-225

.. code-block:: Python

    if not raw_results.empty:
        disp = ConfusionMatrixDisplay.from_predictions(
            raw_results["reference"], raw_results["predicted"]
        )
        disp.figure_.show()

.. image-sg:: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_004.png
   :alt: 99 lrc evaluation
   :srcset: /auto_examples/laterality/images/sphx_glr__99_lrc_evaluation_004.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 226-245

If you want to calculate additional metrics, you can create a custom score function.

Parameter Optimization and Model Training
-----------------------------------------
Simply applying an algorithm for evaluation is one thing, but often we want to optimize the parameters of the
algorithm, train internal models, or both, and evaluate the performance of this optimization approach and not just
a fixed algorithm/model.

In this case, we need to create a train-test split of the dataset to ensure we have independent data for the
optimization.
In general, we recommend using a cross-validation approach.
This can be done using the :func:`~tpcp.validate.cross_validate` function.

In the example below, we show the "most complicated" case, where we retrain the internal model of the
``LrcUllrich`` algorithm and optimize one of the hyperparameters of the internal SVM.

As we retrain the model and optimize hyperparameters, we need to use a :class:`~tpcp.optimize.GridSearchCV` nested
within the cross-validation loop.

Let's set this up first.

.. GENERATED FROM PYTHON SOURCE LINES 245-251

.. code-block:: Python

    from sklearn.model_selection import ParameterGrid
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC
    from tpcp.optimize import GridSearchCV

.. GENERATED FROM PYTHON SOURCE LINES 252-253

We initialize the algorithm with a new classifier pipeline that consists of an untrained scaler and an untrained
model.

.. GENERATED FROM PYTHON SOURCE LINES 253-258

.. code-block:: Python

    clf_pipeline = Pipeline(
        [("scaler", MinMaxScaler()), ("clf", SVC(kernel="linear"))]
    )
    pipeline = LrcEmulationPipeline(LrcUllrich(clf_pipe=clf_pipeline))

.. GENERATED FROM PYTHON SOURCE LINES 259-261

Then we can create a parameter grid for the grid search.
Note that we use ``__`` to set nested parameters.

.. GENERATED FROM PYTHON SOURCE LINES 261-263

.. code-block:: Python

    para_grid = ParameterGrid({"algo__clf_pipe__clf__C": [0.1, 1.0, 10.0]})

.. GENERATED FROM PYTHON SOURCE LINES 264-267

Then we pass the pipeline to the optimizer.
We only select a 2-fold cross-validation for this example, as we will only have 2 datapoints per train set and we
want to minimize the run time.

.. GENERATED FROM PYTHON SOURCE LINES 267-275

.. code-block:: Python

    optimizer = GridSearchCV(
        pipeline,
        para_grid,
        return_optimized="accuracy",
        cv=2,
        scoring=lrc_score,
    )

.. GENERATED FROM PYTHON SOURCE LINES 276-277

Let's test the optimizer first on a manual train set.

.. GENERATED FROM PYTHON SOURCE LINES 277-279

.. code-block:: Python

    optimizer.optimize(simulated_real_world_walking[:2])

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Split-Para Combos: 0%| | 0/6 [00:00, pipeline=LrcEmulationPipeline(algo=LrcUllrich(clf_pipe=Pipeline(steps=[('scaler', MinMaxScaler()), ('clf', SVC(kernel='linear'))]), smoothing_filter=ButterworthFilter(cutoff_freq_hz=(0.5, 2), filter_type='bandpass', order=4, zero_phase=True))), pre_dispatch='n_jobs', progress_bar=True, pure_parameters=False, return_optimized='accuracy', return_train_score=False, safe_optimize=True, scoring=Scorer(default_aggregator=, final_aggregator=, n_jobs=None, pre_dispatch='2*n_jobs', progress_bar=True, score_func=, single_score_callback=None, verbose=0), verbose=0)

.. GENERATED FROM PYTHON SOURCE LINES 280-281

We can inspect the results:

.. GENERATED FROM PYTHON SOURCE LINES 281-284

.. code-block:: Python

    results = pd.DataFrame(optimizer.cv_results_)
    results.loc[:, ~results.columns.str.endswith("raw_results")].T

.. raw:: html

	0	1	2
mean__debug__optimize_time	0.188295	0.170197	0.169777
std__debug__optimize_time	0.008776	0.008983	0.009409
mean__debug__score_time	0.221495	0.219404	0.219253
std__debug__score_time	0.017926	0.01672	0.016159
split0__test__data_labels	[(HA, 001, TimeMeasure1, Test11, Trial1)]	[(HA, 001, TimeMeasure1, Test11, Trial1)]	[(HA, 001, TimeMeasure1, Test11, Trial1)]
split1__test__data_labels	[(HA, 002, TimeMeasure1, Test11, Trial1)]	[(HA, 002, TimeMeasure1, Test11, Trial1)]	[(HA, 002, TimeMeasure1, Test11, Trial1)]
split0__train__data_labels	[(HA, 002, TimeMeasure1, Test11, Trial1)]	[(HA, 002, TimeMeasure1, Test11, Trial1)]	[(HA, 002, TimeMeasure1, Test11, Trial1)]
split1__train__data_labels	[(HA, 001, TimeMeasure1, Test11, Trial1)]	[(HA, 001, TimeMeasure1, Test11, Trial1)]	[(HA, 001, TimeMeasure1, Test11, Trial1)]
param__algo__clf_pipe__clf__C	0.1	1.0	10.0
params	{'algo__clf_pipe__clf__C': 0.1}	{'algo__clf_pipe__clf__C': 1.0}	{'algo__clf_pipe__clf__C': 10.0}
split0__test__agg__accuracy	0.507937	0.746032	0.793651
split1__test__agg__accuracy	0.652174	0.695652	0.782609
mean__test__agg__accuracy	0.580055	0.720842	0.78813
std__test__agg__accuracy	0.072119	0.02519	0.005521
rank__test__agg__accuracy	3	2	1
split0__test__agg__accuracy_pairwise	0.064516	0.548387	0.709677
split1__test__agg__accuracy_pairwise	0.688889	0.644444	0.644444
mean__test__agg__accuracy_pairwise	0.376703	0.596416	0.677061
std__test__agg__accuracy_pairwise	0.312186	0.048029	0.032616
rank__test__agg__accuracy_pairwise	3	2	1
split0__test__single__accuracy	[0.5079365079365079]	[0.746031746031746]	[0.7936507936507936]
split1__test__single__accuracy	[0.6521739130434783]	[0.6956521739130435]	[0.782608695652174]
split0__test__single__accuracy_pairwise	[0.06451612903225806]	[0.5483870967741935]	[0.7096774193548387]
split1__test__single__accuracy_pairwise	[0.6888888888888889]	[0.6444444444444445]	[0.6444444444444445]
split0__test__single__raw__predictions	...	...	...
split1__test__single__raw__predictions	...	...	...
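
The ``mean__``, ``std__``, and ``rank__`` rows for the accuracy can be reproduced from the per-split values. The sketch below is a plain-Python illustration and not the tpcp implementation; it assumes the population standard deviation, which matches the reported numbers, and it assumes that the candidate with the highest mean test accuracy is the one selected by ``return_optimized="accuracy"``.

```python
from statistics import mean, pstdev

# Per-split test accuracies copied from the table above, keyed by the value of C.
split_accuracy = {
    0.1: [0.507937, 0.652174],
    1.0: [0.746032, 0.695652],
    10.0: [0.793651, 0.782609],
}

# Aggregate over the two splits for each candidate.
mean_accuracy = {c: mean(scores) for c, scores in split_accuracy.items()}
std_accuracy = {c: pstdev(scores) for c, scores in split_accuracy.items()}

# Rank 1 goes to the candidate with the highest mean test accuracy.
ranking = sorted(mean_accuracy, key=mean_accuracy.get, reverse=True)
best_c = ranking[0]

print(ranking)  # [10.0, 1.0, 0.1]
print(best_c)  # 10.0
```

With only two splits and one datapoint per test set, the ranking is based on very little data, as the large spread for ``C=0.1`` illustrates.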

.. GENERATED FROM PYTHON SOURCE LINES 285-286

We can then score the best-performing, retrained model directly on the test set.

.. GENERATED FROM PYTHON SOURCE LINES 286-291

.. code-block:: Python

    lrc_score(optimizer.optimized_pipeline_, simulated_real_world_walking[2])[0][
        "accuracy"
    ]

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Datapoints: 0%| | 0/1 [00:00

	0	1	2
debug__score_time	0.233761	0.200758	0.242409
debug__optimize_time	2.892581	3.070624	2.827143
train__data_labels	[(HA, 002, TimeMeasure1, Test11, Trial1), (MS,...	[(HA, 001, TimeMeasure1, Test11, Trial1), (MS,...	[(HA, 001, TimeMeasure1, Test11, Trial1), (HA,...
test__data_labels	[(HA, 001, TimeMeasure1, Test11, Trial1)]	[(HA, 002, TimeMeasure1, Test11, Trial1)]	[(MS, 001, TimeMeasure1, Test11, Trial1)]
test__single__accuracy	[0.9047619047619048]	[0.5869565217391305]	[0.8681318681318682]
test__single__accuracy_pairwise	[0.8870967741935484]	[0.4444444444444444]	[0.7777777777777778]
test__agg__accuracy	0.904762	0.586957	0.868132
test__agg__accuracy_pairwise	0.887097	0.444444	0.777778

.. GENERATED FROM PYTHON SOURCE LINES 308-311

We can compare these results with the performance of the pre-trained model, which was not optimized for the given
dataset.
To do this, we use :class:`~tpcp.optimize.DummyOptimize` to run a cross-validation without any optimization.
We simply evaluate the pre-trained model on exactly the same test sets as the optimized model.

.. GENERATED FROM PYTHON SOURCE LINES 311-329

.. code-block:: Python

    from tpcp.optimize import DummyOptimize
    from tpcp.validate import cross_validate

    optimizer = DummyOptimize(
        LrcEmulationPipeline(LrcUllrich()), ignore_potential_user_error_warning=True
    )

    evaluation_results_pre_trained = pd.DataFrame(
        cross_validate(
            optimizer,
            simulated_real_world_walking,
            cv=3,
            scoring=lrc_score,
        )
    )
    evaluation_results_pre_trained.loc[
        :, ~evaluation_results_pre_trained.columns.str.endswith("raw__predictions")
    ].T

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    CV Folds: 0%| | 0/3 [00:00

	0	1	2
debug__score_time	0.268173	0.216302	0.262861
debug__optimize_time	0.002526	0.001352	0.001348
train__data_labels	[(HA, 002, TimeMeasure1, Test11, Trial1), (MS,...	[(HA, 001, TimeMeasure1, Test11, Trial1), (MS,...	[(HA, 001, TimeMeasure1, Test11, Trial1), (HA,...
test__data_labels	[(HA, 001, TimeMeasure1, Test11, Trial1)]	[(HA, 002, TimeMeasure1, Test11, Trial1)]	[(MS, 001, TimeMeasure1, Test11, Trial1)]
test__single__accuracy	[0.7619047619047619]	[0.7391304347826086]	[0.7362637362637363]
test__single__accuracy_pairwise	[0.6935483870967742]	[0.5333333333333333]	[0.6666666666666666]
test__agg__accuracy	0.761905	0.73913	0.736264
test__agg__accuracy_pairwise	0.693548	0.533333	0.666667
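
To compare the two cross-validation runs fold by fold, the ``test__agg__accuracy`` rows of both tables can be put side by side. The following sketch copies the values from the tables above and assumes that the first table belongs to the optimized and retrained model and the second one to the pre-trained model. The fold names are shorthand for the test datapoint of each fold.

```python
from statistics import mean

# test__agg__accuracy per fold, copied from the two tables above.
test_folds = ["HA 001", "HA 002", "MS 001"]
retrained = [0.904762, 0.586957, 0.868132]
pre_trained = [0.761905, 0.73913, 0.736264]

# Positive values mean the retrained model scored higher on that fold.
difference = {
    fold: round(new - old, 6)
    for fold, new, old in zip(test_folds, retrained, pre_trained)
}

mean_retrained = round(mean(retrained), 6)
mean_pre_trained = round(mean(pre_trained), 6)

print(difference)
print(mean_retrained, mean_pre_trained)
```

In this small example the retrained model scores higher on two of the three folds and lower on the third, which shows how much single datapoints can influence the comparison.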

.. GENERATED FROM PYTHON SOURCE LINES 330-333

Note that using so little data is not a good idea in practice.
There are many parameters that you should tweak to make this a robust validation.
However, this example should provide a good starting point for your own experiments.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 19.137 seconds)

**Estimated memory usage:** 89 MB

.. _sphx_glr_download_auto_examples_laterality__99_lrc_evaluation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: _99_lrc_evaluation.ipynb <_99_lrc_evaluation.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: _99_lrc_evaluation.py <_99_lrc_evaluation.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: _99_lrc_evaluation.zip <_99_lrc_evaluation.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_