.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/gait_sequences/_04_gsd_evaluation_challenges.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_gait_sequences__04_gsd_evaluation_challenges.py: GSD Evaluation Challenges ------------------------- The :ref:`gsd_evaluation` example demonstrates how to evaluate the performance of a GSD algorithm on a single datapoint and explains the individual performance metrics that are calculated. With that you could set up a custom evaluation pipeline to run and then score the output of a GSD algorithm multiple datapoints and then aggregate the results. To make this process easier, we set up opinionated evaluation challenges that can be used to quickly perform the same evaluation with multiple algorithms and datasets. Below, we will show how to use them on the example dataset. .. GENERATED FROM PYTHON SOURCE LINES 16-20 Dataset ------- To use the challenges, we need to dataset with reference information in the expected format. We will use the :class:`~mobgap.data.LabExampleDataset` for this purpose. .. GENERATED FROM PYTHON SOURCE LINES 20-26 .. code-block:: default from mobgap.data import LabExampleDataset long_test = LabExampleDataset(reference_system="INDIP").get_subset( test="Test11" ) .. GENERATED FROM PYTHON SOURCE LINES 27-30 Algorithm --------- Next we need to create an instance of a valid GSD algorithm. .. GENERATED FROM PYTHON SOURCE LINES 30-34 .. code-block:: default from mobgap.gait_sequences import GsdIluz algo = GsdIluz() .. GENERATED FROM PYTHON SOURCE LINES 35-38 This algorithm needs to be wrapped in a :class:`~mobgap.gait_sequences.pipeline.GsdEmulationPipeline` to be used in the challenges. This pipeline takes care of extracting the correct data from the dataset and running the algorithm on it. .. GENERATED FROM PYTHON SOURCE LINES 38-42 .. code-block:: default from mobgap.gait_sequences.pipeline import GsdEmulationPipeline pipe = GsdEmulationPipeline(algo) .. GENERATED FROM PYTHON SOURCE LINES 43-44 Let's demonstrate that quickly on a single datapoint. .. GENERATED FROM PYTHON SOURCE LINES 44-47 .. code-block:: default pipe_with_results = pipe.clone().run(long_test[0]) pipe_with_results.gs_list_ .. raw:: html

	start	end
gs_id
0	600	1201
1	2700	4201
2	4350	5251
3	7800	8851
4	9450	10201
5	10950	11551
6	13050	13651

.. GENERATED FROM PYTHON SOURCE LINES 48-65 Evaluation Challenge -------------------- This pipeline can now be used as part of an evaluation challenge. An evaluation challenge takes care of two things: - Running the pipeline on multiple datapoints - Scoring the results per datapoint and then aggregating the results We provide two challenges: - :class:`~mobgap.gait_sequences.evaluation.GsdEvaluation`: This challenge simply runs the pipeline on all datapoints and then scores the results. - :class:`~mobgap.gait_sequences.evaluation.GsdEvaluationCV`: This challenge runs a cross-validation on the dataset and then scores the results per fold. Before we run the entire pipeline, let's look at the scoring. We provide a default scoring function that calculates all the relevant performance metrics. Let's look at the code of it first. .. GENERATED FROM PYTHON SOURCE LINES 65-71 .. code-block:: default from inspect import getsource from mobgap.gait_sequences.evaluation import gsd_evaluation_scorer print(getsource(gsd_evaluation_scorer)) .. rst-class:: sphx-glr-script-out .. code-block:: none def gsd_evaluation_scorer(pipeline: GsdEmulationPipeline, datapoint: BaseGaitDatasetWithReference) -> dict: """Evaluate the performance of a GSD algorithm on a single datapoint. This function is used to evaluate the performance of a GSD algorithm on a single datapoint. It calculates the performance metrics based on the detected gait sequences and the reference gait sequences. This is the default scoring functions for the GSD evaluation pipelines (``GsdEvaluation`` and ``GsdEvaluationCV``). Parameters ---------- pipeline An instance of GSD emulation pipeline that wraps the algorithm that should be evaluated. datapoint The datapoint to be evaluated. Returns ------- dict A dictionary containing the performance metrics. Note, that some results are wrapped in a ``NoAgg`` object or other aggregators. The results of this function are not expected to be parsed manually, but rather the function is expected to be used in the context of the :func:`~tpcp.validate.validate`/:func:`~tpcp.validate.cross_validate` functions or similar as scorer. This functions will aggregate the results and provide a summary of the performance metrics. """ from mobgap.gait_sequences.evaluation import ( calculate_matched_gsd_performance_metrics, calculate_unmatched_gsd_performance_metrics, categorize_intervals_per_sample, ) # Run the algorithm on the datapoint detected_gs_list = pipeline.safe_run(datapoint).gs_list_ reference_gs_list = datapoint.reference_parameters_.wb_list[["start", "end"]] n_overall_samples = len(datapoint.data_ss) sampling_rate_hz = datapoint.sampling_rate_hz matches = categorize_intervals_per_sample( gsd_list_detected=detected_gs_list, gsd_list_reference=reference_gs_list, n_overall_samples=n_overall_samples ) # Calculate the performance metrics performance_metrics = { **calculate_unmatched_gsd_performance_metrics( gsd_list_detected=detected_gs_list, gsd_list_reference=reference_gs_list, sampling_rate_hz=sampling_rate_hz, ), **calculate_matched_gsd_performance_metrics(matches), "detected": NoAgg(detected_gs_list), "reference": NoAgg(reference_gs_list), } return performance_metrics .. GENERATED FROM PYTHON SOURCE LINES 72-86 We can see that this method is relatively simple, using the gsd evaluation functions that we provide. So if you want to run your own scoring function, it should be straightforward to do so. Note, the :class:`~tpcp.validate.NoAgg` wrapping some of the return values. This is a special aggregator that tells the challenge to not try to aggregate the respective values. For all other values, the challenge will try average the values across all datapoints. To learn more about these special aggregators, check out the `tpcp example `_. The scoring function takes care of running the pipeline. So we can test the scorer, by just providing it with a pipeline and a datapoint. Note, that we remove the :class:`~tpcp.validate.NoAgg` parameters from the results, as they don't visualize well. .. GENERATED FROM PYTHON SOURCE LINES 86-93 .. code-block:: default from pprint import pprint single_dp_results = gsd_evaluation_scorer(pipe, long_test[0]) single_dp_results.pop("detected") single_dp_results.pop("reference") pprint(single_dp_results) .. rst-class:: sphx-glr-script-out .. code-block:: none {'accuracy': 0.7349292709466811, 'detected_gs_duration_s': 60.14, 'detected_num_gs': 7, 'f1_score': 0.6371400198609732, 'fn_samples': 839, 'fp_samples': 2815, 'gs_absolute_duration_error_s': 19.700000000000003, 'gs_absolute_relative_duration_error': 0.487141444114738, 'gs_absolute_relative_duration_error_log': 0.3968557834081124, 'gs_duration_error_s': 19.700000000000003, 'gs_relative_duration_error': 0.487141444114738, 'npv': 0.8919093017263592, 'num_gs_absolute_error': 1, 'num_gs_absolute_relative_error': 0.16666666666666666, 'num_gs_absolute_relative_error_log': 0.15415067982725836, 'num_gs_error': 1, 'num_gs_relative_error': 0.16666666666666666, 'precision': 0.5326249377386685, 'recall': 0.7926859402026192, 'reference_gs_duration_s': 40.44, 'reference_num_gs': 6, 'specificity': 0.7109262682275621, 'tn_samples': 6923, 'tp_samples': 3208} .. GENERATED FROM PYTHON SOURCE LINES 94-96 The challenge will call this scoring method for each datapoint in the dataset. Let's test this with the `GsdEvaluation` challenge. .. GENERATED FROM PYTHON SOURCE LINES 96-99 .. code-block:: default from mobgap.gait_sequences.evaluation import GsdEvaluation eval_challenge = GsdEvaluation(long_test, scoring=gsd_evaluation_scorer) .. GENERATED FROM PYTHON SOURCE LINES 100-101 We can now run the challenge. .. GENERATED FROM PYTHON SOURCE LINES 101-103 .. code-block:: default eval_challenge = eval_challenge.run(pipe) .. rst-class:: sphx-glr-script-out .. code-block:: none Datapoints: 0%| | 0/3 [00:00

	0
agg__reference_gs_duration_s	48.926667
agg__detected_gs_duration_s	58.620000
agg__gs_duration_error_s	9.693333
agg__gs_relative_duration_error	0.237191
agg__gs_absolute_duration_error_s	9.693333
agg__gs_absolute_relative_duration_error	0.237191
agg__gs_absolute_relative_duration_error_log	0.200297
agg__detected_num_gs	6.000000
agg__reference_num_gs	5.000000
agg__num_gs_error	1.000000
agg__num_gs_relative_error	0.333333
agg__num_gs_absolute_error	1.666667
agg__num_gs_absolute_relative_error	0.444444
agg__num_gs_absolute_relative_error_log	0.333816
agg__tp_samples	3730.333333
agg__fp_samples	2138.666667
agg__fn_samples	1166.000000
agg__precision	0.632099
agg__recall	0.766355
agg__f1_score	0.688517
agg__tn_samples	10477.333333
agg__specificity	0.815958
agg__accuracy	0.802683
agg__npv	0.899915

.. GENERATED FROM PYTHON SOURCE LINES 116-117 The raw results are stored in the columns starting with ``single__``. .. GENERATED FROM PYTHON SOURCE LINES 117-119 .. code-block:: default single_results = validate_results.filter(like="single__") .. GENERATED FROM PYTHON SOURCE LINES 120-121 And it is often helpful to explode them to get a better overview. .. GENERATED FROM PYTHON SOURCE LINES 121-135 .. code-block:: default exploded_results = ( single_results.explode(single_results.columns.to_list()) .rename_axis("fold") .set_index( pd.MultiIndex.from_tuples( (dl := validate_results["data_labels"].explode().to_list()), names=list(dl[0]._fields), ), append=True, ) ) exploded_results.columns = exploded_results.columns.str.removeprefix("single__") exploded_results.T .. raw:: html

fold	0
cohort	HA		MS
participant_id	001	002	001
time_measure	TimeMeasure1	TimeMeasure1	TimeMeasure1
test	Test11	Test11	Test11
trial	Trial1	Trial1	Trial1
reference_gs_duration_s	40.44	40.82	65.52
detected_gs_duration_s	60.14	49.62	66.1
gs_duration_error_s	19.7	8.8	0.58
gs_relative_duration_error	0.487141	0.215581	0.008852
gs_absolute_duration_error_s	19.7	8.8	0.58
gs_absolute_relative_duration_error	0.487141	0.215581	0.008852
gs_absolute_relative_duration_error_log	0.396856	0.195222	0.008813
detected_num_gs	7	6	5
reference_num_gs	6	3	6
num_gs_error	1	3	-1
num_gs_relative_error	0.166667	1.0	-0.166667
num_gs_absolute_error	1	3	1
num_gs_absolute_relative_error	0.166667	1.0	0.166667
num_gs_absolute_relative_error_log	0.154151	0.693147	0.154151
tp_samples	3208	3132	4851
fp_samples	2815	1835	1766
fn_samples	839	955	1704
precision	0.532625	0.630562	0.733112
recall	0.792686	0.766332	0.740046
f1_score	0.63714	0.691849	0.736562
tn_samples	6923	10080	14429
specificity	0.710926	0.845992	0.890954
accuracy	0.734929	0.825647	0.847473
npv	0.891909	0.913457	0.894378
detected	start end gs_id 0 ...	start end gs_id 0 ...	start end gs_id 0 ...
reference	start end wb_id 0 ...	start end wb_id 0 ...	start end wb_id 0 ...

.. GENERATED FROM PYTHON SOURCE LINES 136-138 The ``detected`` and ``reference`` columns in this dataframe contain the raw un-aggregated gait-sequences. So if we want to perform further evaluation on them (e.g. visualize them), we can use them. .. GENERATED FROM PYTHON SOURCE LINES 138-144 .. code-block:: default raw_gs_list = pd.concat( exploded_results.loc[:, ["detected", "reference"]].stack().to_dict(), names=[*exploded_results.index.names, "system"], ).unstack("system") raw_gs_list .. raw:: html

							start		end
						system	detected	reference	detected	reference
fold	cohort	participant_id	time_measure	test	trial
0	HA	001	TimeMeasure1	Test11	Trial1	0	600.0	632.0	1201.0	988.0
						1	2700.0	2864.0	4201.0	3325.0
						2	4350.0	3853.0	5251.0	5085.0
						3	7800.0	7641.0	8851.0	8621.0
						4	9450.0	9451.0	10201.0	9932.0
						5	10950.0	11989.0	11551.0	12517.0
						6	13050.0	NaN	13651.0	NaN
		002	TimeMeasure1	Test11	Trial1	0	450.0	485.0	1201.0	1131.0
						1	2400.0	1746.0	3301.0	3554.0
						2	3450.0	6083.0	4051.0	7708.0
						3	5700.0	NaN	7201.0	NaN
						4	7350.0	NaN	7951.0	NaN
						5	15000.0	NaN	15601.0	NaN
	MS	001	TimeMeasure1	Test11	Trial1	0	750.0	1019.0	1651.0	1768.0
						1	4650.0	4534.0	6151.0	5549.0
						2	12900.0	9665.0	14851.0	10569.0
						3	20100.0	12337.0	21151.0	14633.0
						4	21300.0	20151.0	22501.0	20982.0
						5	NaN	21378.0	NaN	22129.0

.. GENERATED FROM PYTHON SOURCE LINES 145-146 Further there are some runtime information available (i.e. when the challenge was started, and how long it took). .. GENERATED FROM PYTHON SOURCE LINES 146-148 .. code-block:: default eval_challenge.start_datetime_, eval_challenge.end_datetime_ .. rst-class:: sphx-glr-script-out .. code-block:: none ('2024-10-21T12:43:10.783453+00:00', '2024-10-21T12:43:11.696589+00:00') .. GENERATED FROM PYTHON SOURCE LINES 149-152 .. code-block:: default eval_challenge.runtime_s_ .. rst-class:: sphx-glr-script-out .. code-block:: none 0.9129904770015855 .. GENERATED FROM PYTHON SOURCE LINES 153-174 Using :class:`~mobgap.gait_sequences.evaluation.GsdEvaluation` is great, if you are only comparing (or planning to compare) non-ML algorithms, or algorithms that don't require further optimization (e.g. through GridSearch). Therefore, it is generally recommended to run a cross-validation with :class:`~mobgap.gait_sequences.evaluation.GsdEvaluationCV`. This allows you to evaluate the performance of the algorithm on multiple folds of the dataset and through the use of :class:`~tpcp.optimize.DummyOptimize` you can also use algorithms without optimization in the same pipeline for comparison. Let's demonstrate the use of :class:`~mobgap.gait_sequences.evaluation.GsdEvaluationCV` on the example dataset using the same algorithm once with and once without GridSearch. For the CV-based challenge, we need to set up a cross-validation. As we only have 3 datapoints here, we will use a 3-fold cross-validation without grouping or stratification. In a real-world scenario, you would use a more sophisticated cross-validation strategy. You can learn more about cross-validation in the `tpcp example `_. Further, to speed things up, we are going to use multi-processing. We can configure this using the ``n_jobs`` parameter that we pass to the internal :func:`~tpcp.validate.cross_validate` function via the ``cv_params`` parameters .. GENERATED FROM PYTHON SOURCE LINES 174-183 .. code-block:: default from mobgap.gait_sequences.evaluation import GsdEvaluationCV eval_challenge_cv = GsdEvaluationCV( long_test, cv_iterator=3, scoring=gsd_evaluation_scorer, cv_params={"n_jobs": 2, "return_optimizer": True}, ) .. GENERATED FROM PYTHON SOURCE LINES 184-186 To use our pipeline from above, we need to wrap it in a :class:`~tpcp.optimize.DummyOptimize` instance. This will basically skip any optimization on the train set and just apply the pipeline to the test set. .. GENERATED FROM PYTHON SOURCE LINES 186-192 .. code-block:: default from tpcp.optimize import DummyOptimize eval_challenge_cv = eval_challenge_cv.run( DummyOptimize(pipe, ignore_potential_user_error_warning=True) ) .. rst-class:: sphx-glr-script-out .. code-block:: none CV Folds: 0%| | 0/3 [00:00

	0	1	2
test__agg__reference_gs_duration_s	40.440000	40.820000	65.520000
test__agg__detected_gs_duration_s	60.140000	49.620000	66.100000
test__agg__gs_duration_error_s	19.700000	8.800000	0.580000
test__agg__gs_relative_duration_error	0.487141	0.215581	0.008852
test__agg__gs_absolute_duration_error_s	19.700000	8.800000	0.580000
test__agg__gs_absolute_relative_duration_error	0.487141	0.215581	0.008852
test__agg__gs_absolute_relative_duration_error_log	0.396856	0.195222	0.008813
test__agg__detected_num_gs	7.000000	6.000000	5.000000
test__agg__reference_num_gs	6.000000	3.000000	6.000000
test__agg__num_gs_error	1.000000	3.000000	-1.000000
test__agg__num_gs_relative_error	0.166667	1.000000	-0.166667
test__agg__num_gs_absolute_error	1.000000	3.000000	1.000000
test__agg__num_gs_absolute_relative_error	0.166667	1.000000	0.166667
test__agg__num_gs_absolute_relative_error_log	0.154151	0.693147	0.154151
test__agg__tp_samples	3208.000000	3132.000000	4851.000000
test__agg__fp_samples	2815.000000	1835.000000	1766.000000
test__agg__fn_samples	839.000000	955.000000	1704.000000
test__agg__precision	0.532625	0.630562	0.733112
test__agg__recall	0.792686	0.766332	0.740046
test__agg__f1_score	0.637140	0.691849	0.736562
test__agg__tn_samples	6923.000000	10080.000000	14429.000000
test__agg__specificity	0.710926	0.845992	0.890954
test__agg__accuracy	0.734929	0.825647	0.847473
test__agg__npv	0.891909	0.913457	0.894378

.. GENERATED FROM PYTHON SOURCE LINES 204-205 The raw results are stored in the columns starting with ``single__``. .. GENERATED FROM PYTHON SOURCE LINES 205-222 .. code-block:: default single_results = cv_results.filter(like="test__single__") exploded_results_cv = ( single_results.explode(single_results.columns.to_list()) .rename_axis("fold") .set_index( pd.MultiIndex.from_tuples( (dl := cv_results["test__data_labels"].explode().to_list()), names=list(dl[0]._fields), ), append=True, ) ) exploded_results_cv.columns = exploded_results_cv.columns.str.removeprefix( "test__single__" ) exploded_results_cv.T .. raw:: html

fold	0	1	2
cohort	HA	HA	MS
participant_id	001	002	001
time_measure	TimeMeasure1	TimeMeasure1	TimeMeasure1
test	Test11	Test11	Test11
trial	Trial1	Trial1	Trial1
reference_gs_duration_s	40.44	40.82	65.52
detected_gs_duration_s	60.14	49.62	66.1
gs_duration_error_s	19.7	8.8	0.58
gs_relative_duration_error	0.487141	0.215581	0.008852
gs_absolute_duration_error_s	19.7	8.8	0.58
gs_absolute_relative_duration_error	0.487141	0.215581	0.008852
gs_absolute_relative_duration_error_log	0.396856	0.195222	0.008813
detected_num_gs	7	6	5
reference_num_gs	6	3	6
num_gs_error	1	3	-1
num_gs_relative_error	0.166667	1.0	-0.166667
num_gs_absolute_error	1	3	1
num_gs_absolute_relative_error	0.166667	1.0	0.166667
num_gs_absolute_relative_error_log	0.154151	0.693147	0.154151
tp_samples	3208	3132	4851
fp_samples	2815	1835	1766
fn_samples	839	955	1704
precision	0.532625	0.630562	0.733112
recall	0.792686	0.766332	0.740046
f1_score	0.63714	0.691849	0.736562
tn_samples	6923	10080	14429
specificity	0.710926	0.845992	0.890954
accuracy	0.734929	0.825647	0.847473
npv	0.891909	0.913457	0.894378
detected	start end gs_id 0 ...	start end gs_id 0 ...	start end gs_id 0 ...
reference	start end wb_id 0 ...	start end wb_id 0 ...	start end wb_id 0 ...

.. GENERATED FROM PYTHON SOURCE LINES 223-224 And the raw outputs: .. GENERATED FROM PYTHON SOURCE LINES 224-230 .. code-block:: default raw_gs_list_cv = pd.concat( exploded_results_cv.loc[:, ["detected", "reference"]].stack().to_dict(), names=[*exploded_results_cv.index.names, "system"], ).unstack("system") raw_gs_list_cv .. raw:: html

							start		end
						system	detected	reference	detected	reference
fold	cohort	participant_id	time_measure	test	trial
0	HA	001	TimeMeasure1	Test11	Trial1	0	600.0	632.0	1201.0	988.0
						1	2700.0	2864.0	4201.0	3325.0
						2	4350.0	3853.0	5251.0	5085.0
						3	7800.0	7641.0	8851.0	8621.0
						4	9450.0	9451.0	10201.0	9932.0
						5	10950.0	11989.0	11551.0	12517.0
						6	13050.0	NaN	13651.0	NaN
1	HA	002	TimeMeasure1	Test11	Trial1	0	450.0	485.0	1201.0	1131.0
						1	2400.0	1746.0	3301.0	3554.0
						2	3450.0	6083.0	4051.0	7708.0
						3	5700.0	NaN	7201.0	NaN
						4	7350.0	NaN	7951.0	NaN
						5	15000.0	NaN	15601.0	NaN
2	MS	001	TimeMeasure1	Test11	Trial1	0	750.0	1019.0	1651.0	1768.0
						1	4650.0	4534.0	6151.0	5549.0
						2	12900.0	9665.0	14851.0	10569.0
						3	20100.0	12337.0	21151.0	14633.0
						4	21300.0	20151.0	22501.0	20982.0
						5	NaN	21378.0	NaN	22129.0

.. GENERATED FROM PYTHON SOURCE LINES 231-238 If we compare these results to the ones from the non-CV challenge, we can see that "single" results are identical, just that they were called in multiple folds. This is expected, as we used :class:`~tpcp.optimize.DummyOptimize` and thus didn't optimize the algorithm. Let's try a :class:`~tpcp.optimize.GridSearch` on the algorithm to see how the results change. For the gridsearch, we will re-use the same scoring function as before, but we need to specify, which scoring result we want to optimize for. .. GENERATED FROM PYTHON SOURCE LINES 238-244 .. code-block:: default from sklearn.model_selection import ParameterGrid from tpcp.optimize import GridSearch para_grid = ParameterGrid({"algo__window_length_s": [2, 3, 4]}) optimizer = GridSearch(pipe, para_grid, return_optimized="precision") .. GENERATED FROM PYTHON SOURCE LINES 245-248 The optimizer can now be used in the same CV challenge as before. This way we can guarantee that the same folds are used for the optimization and the evaluation and ensure the best possible comparison between the algorithms versions. .. GENERATED FROM PYTHON SOURCE LINES 248-250 .. code-block:: default eval_challenge_cv = eval_challenge_cv.clone().run(optimizer) .. rst-class:: sphx-glr-script-out .. code-block:: none CV Folds: 0%| | 0/3 [00:00

	algo__window_length_s
0	3
1	2
2	2

.. GENERATED FROM PYTHON SOURCE LINES 267-269 Or we can go much deeper, by getting all information about the optimization process. Let's just look at the keys of the information that is available. .. GENERATED FROM PYTHON SOURCE LINES 269-272 .. code-block:: default all_opti_results_fold0 = pd.DataFrame(opt_results.loc[0].gs_results_) all_opti_results_fold0.columns.to_list() .. rst-class:: sphx-glr-script-out .. code-block:: none ['agg__reference_gs_duration_s', 'rank__agg__reference_gs_duration_s', 'agg__detected_gs_duration_s', 'rank__agg__detected_gs_duration_s', 'agg__gs_duration_error_s', 'rank__agg__gs_duration_error_s', 'agg__gs_relative_duration_error', 'rank__agg__gs_relative_duration_error', 'agg__gs_absolute_duration_error_s', 'rank__agg__gs_absolute_duration_error_s', 'agg__gs_absolute_relative_duration_error', 'rank__agg__gs_absolute_relative_duration_error', 'agg__gs_absolute_relative_duration_error_log', 'rank__agg__gs_absolute_relative_duration_error_log', 'agg__detected_num_gs', 'rank__agg__detected_num_gs', 'agg__reference_num_gs', 'rank__agg__reference_num_gs', 'agg__num_gs_error', 'rank__agg__num_gs_error', 'agg__num_gs_relative_error', 'rank__agg__num_gs_relative_error', 'agg__num_gs_absolute_error', 'rank__agg__num_gs_absolute_error', 'agg__num_gs_absolute_relative_error', 'rank__agg__num_gs_absolute_relative_error', 'agg__num_gs_absolute_relative_error_log', 'rank__agg__num_gs_absolute_relative_error_log', 'agg__tp_samples', 'rank__agg__tp_samples', 'agg__fp_samples', 'rank__agg__fp_samples', 'agg__fn_samples', 'rank__agg__fn_samples', 'agg__precision', 'rank__agg__precision', 'agg__recall', 'rank__agg__recall', 'agg__f1_score', 'rank__agg__f1_score', 'agg__tn_samples', 'rank__agg__tn_samples', 'agg__specificity', 'rank__agg__specificity', 'agg__accuracy', 'rank__agg__accuracy', 'agg__npv', 'rank__agg__npv', 'single__reference_gs_duration_s', 'single__detected_gs_duration_s', 'single__gs_duration_error_s', 'single__gs_relative_duration_error', 'single__gs_absolute_duration_error_s', 'single__gs_absolute_relative_duration_error', 'single__gs_absolute_relative_duration_error_log', 'single__detected_num_gs', 'single__reference_num_gs', 'single__num_gs_error', 'single__num_gs_relative_error', 'single__num_gs_absolute_error', 'single__num_gs_absolute_relative_error', 'single__num_gs_absolute_relative_error_log', 'single__tp_samples', 'single__fp_samples', 'single__fn_samples', 'single__precision', 'single__recall', 'single__f1_score', 'single__tn_samples', 'single__specificity', 'single__accuracy', 'single__npv', 'single__detected', 'single__reference', 'data_labels', 'debug__score_time', 'param__algo__window_length_s', 'params'] .. GENERATED FROM PYTHON SOURCE LINES 273-277 With that, we hope it becomes clear, how these challenges can be extremely valuable, when benchmarking algorithms across datasets. To see how we evaluate the performance of the algorithms available in mobgap, check out the other gsd evaluation examples. .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 16.891 seconds) **Estimated memory usage:** 9 MB .. _sphx_glr_download_auto_examples_gait_sequences__04_gsd_evaluation_challenges.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: _04_gsd_evaluation_challenges.py <_04_gsd_evaluation_challenges.py>` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: _04_gsd_evaluation_challenges.ipynb <_04_gsd_evaluation_challenges.ipynb>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_