GSD Evaluation

This example shows how to apply evaluation algorithms to GSD and thus how to rate the performance of a GSD algorithm.

import pandas as pd
from mobgap.data import LabExampleDataset
from mobgap.gait_sequences import GsdIluz

Loading some example data

First, we load example data and apply the GSD Iluz algorithm to it. However, you can use any other GSD algorithm as well. To have a reference to compare the results to, we also load the corresponding ground truth data. These steps are explained in more detail in the GSD Iluz example.

from mobgap.utils.conversions import to_body_frame


def load_data():
    lab_example_data = LabExampleDataset(reference_system="INDIP")
    single_test = lab_example_data.get_subset(
        cohort="MS", participant_id="001", test="Test11", trial="Trial1"
    )
    return single_test


def calculate_gsd_iluz_output(single_test_data):
    """Calculate the GSD Iluz output for one sensor from the test data."""
    det_gsd = (
        GsdIluz()
        .detect(
            to_body_frame(single_test_data.data_ss),
            sampling_rate_hz=single_test_data.sampling_rate_hz,
        )
        .gs_list_
    )
    return det_gsd


def load_reference(single_test_data):
    """Load the reference gait sequences from the test data."""
    ref_gsd = single_test_data.reference_parameters_.wb_list
    return ref_gsd


test_data = load_data()
detected_gsd_list = calculate_gsd_iluz_output(test_data)
reference_gsd_list = load_reference(test_data)

/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/data/_mobilised_matlab_loader.py:1082: UserWarning: There were multiple ICs with the same index value, but different LR labels. This is likely an issue with the reference system you should further investigate. For now, we set the `lr_label` of the stride corresponding to this IC to Nan. However, both values still remain in the IC list.
  return parse_reference_parameters(

Both lists are DataFrames in which each row describes one gait sequence that is characterized by its start and end index in samples.

detected_gsd_list

	start	end
gs_id
0	750	1651
1	4650	6151
2	12900	14851
3	20100	21151
4	21300	22501

reference_gsd_list

	start	end	n_strides	duration_s	length_m	avg_walking_speed_mps	avg_cadence_spm	avg_stride_length_m	termination_reason
wb_id
0	1019	1768	9	7.48	4.468932	0.847668	107.795850	0.942678	Pause
1	4534	5549	11	10.14	2.900453	0.365176	93.396106	0.483923	Pause
2	9665	10569	9	9.03	2.140232	0.294058	75.981133	0.506458	Pause
3	12337	14633	28	22.95	11.201110	0.634425	92.337768	0.803933	Pause
4	20151	20982	11	8.30	2.390709	0.371746	87.915774	0.507484	Pause
5	21378	22129	9	7.50	2.517558	0.492965	95.365740	0.599360	Pause

Validation of algorithm output against a reference

Let’s quantify how the algorithm output compares to the reference labels. To gain detailed insight into the performance of the algorithm, we can look at the individual matches between the detected and reference gait sequences. Note that there are two different ways to approach this:

We can calculate for each sample whether it is correctly detected as gait or not.
We can check on the level of gait sequences whether a detected gait sequence matches a reference gait sequence (by a certain overlap threshold).

In mobgap, we provide functions to calculate both types of performance metrics. Let’s start with the first one.

Sample-wise performance evaluation

To do this, we use the categorize_intervals_per_sample function to identify overlapping regions between the detected gait sequences and the reference gait sequences. These overlapping regions can then be converted into sample-wise classifications of true positives, false positives, and false negatives.

Besides the mandatory detected and reference gait sequences, the total number of samples in the recording can be passed as an optional parameter. If provided, the intervals where no gait sequences are present in either the reference or the detected list are also reported. Later on, we can use these categorized intervals to calculate a set of higher-level performance metrics.

The result is a DataFrame containing the start and end index of each categorized interval together with a match_type column that holds the type of match for each interval, i.e., tp for true positive, fp for false positive, and fn for false negative. These intervals cannot be interpreted as gait sequences. They are subsequences of the detected and reference gait sequences that mark correctly detected samples (tp), falsely detected samples (fp), samples from the reference GSD list that were not detected (fn), and (optionally) samples where no gait sequence is present in either the reference or the detected list (tn). Note that the tn intervals are not explicitly calculated, but are inferred from the total length of the recording (if provided) and from the other intervals, as everything between them is considered a true negative.

from mobgap.gait_sequences.evaluation import categorize_intervals_per_sample

categorized_intervals = categorize_intervals_per_sample(
    gsd_list_detected=detected_gsd_list,
    gsd_list_reference=reference_gsd_list,
    n_overall_samples=len(test_data.data_ss),
)

categorized_intervals

	start	end	match_type
0	0	750	tn
1	750	1019	fp
2	1019	1651	tp
3	1651	1768	fn
4	1768	4534	tn
5	4534	4650	fn
6	4650	5549	tp
7	5549	6151	fp
8	6151	9665	tn
9	9665	10569	fn
10	10569	12337	tn
11	12337	12900	fn
12	12900	14633	tp
13	14633	14851	fp
14	14851	20100	tn
15	20100	20151	fp
16	20151	20982	tp
17	20982	21151	fp
18	21151	21300	tn
19	21300	21378	fp
20	21378	22129	tp
21	22129	22501	fp
22	22501	22727	tn
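To make the four categories concrete, here is a minimal sketch of the underlying set logic in plain Python. It is not the mobgap implementation: the toy intervals are made up, and the half-open boundary convention (the end sample is excluded) is an assumption that may differ from the library.

```python
def samples(intervals):
    """Return the set of sample indices covered by (start, end) intervals.

    Assumption: intervals are half-open, i.e. `end` itself is excluded.
    """
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


# Toy data (made up for illustration): one detected and one reference sequence.
n_samples = 20
detected = samples([(2, 8)])
reference = samples([(5, 12)])
all_samples = set(range(n_samples))

tp = detected & reference                  # gait in both lists
fp = detected - reference                  # detected, but not in the reference
fn = reference - detected                  # in the reference, but not detected
tn = all_samples - (detected | reference)  # gait in neither list

print(sorted(tp))                          # [5, 6, 7]
print(len(tp), len(fp), len(fn), len(tn))  # 3 3 4 10
```

Without the total number of samples, the tn set cannot be formed, which is why it is only reported when the recording length is passed.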

Based on the individually categorized tp, fp, fn, and tn intervals, common performance metrics such as F1 score, precision, or recall can be calculated. For this purpose, the calculate_matched_gsd_performance_metrics function can be used. It calculates the metrics based on the “matched” GSD intervals, i.e., the categorized interval list where every entry has a match type (tp, fp, fn, tn) assigned. Therefore, categorize_intervals_per_sample must be called first. The categorized intervals can then be passed as an argument to calculate_matched_gsd_performance_metrics. It returns a dictionary containing the metrics for the specified categorized intervals DataFrame. The total number of samples per match type, precision, recall, and F1 score are always calculated. If true negatives are present in the categorized intervals, specificity, negative predictive value, and accuracy are reported as well.

from mobgap.gait_sequences.evaluation import (
    calculate_matched_gsd_performance_metrics,
)

matched_metrics_dict = calculate_matched_gsd_performance_metrics(
    categorized_intervals
)

matched_metrics_dict

{'tp_samples': 4851, 'fp_samples': 1766, 'fn_samples': 1704, 'precision': 0.733111682031132, 'recall': 0.740045766590389, 'f1_score': 0.736562405101731, 'tn_samples': 14429, 'specificity': 0.8909539981475764, 'accuracy': 0.8474725274725274, 'npv': 0.894377983016178}
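As a sanity check, the ratios in this dictionary can be recomputed from the four sample counts with the standard confusion-matrix formulas. The sketch below uses plain Python and only the numbers printed above. That the library uses exactly these expressions internally is an assumption, although the values agree.

```python
import math

# Sample counts copied from the output above.
tp, fp, fn, tn = 4851, 1766, 1704, 14429

metrics = {
    "precision": tp / (tp + fp),
    "recall": tp / (tp + fn),
    "f1_score": 2 * tp / (2 * tp + fp + fn),
    # The following three need true negatives.
    "specificity": tn / (tn + fp),
    "accuracy": (tp + tn) / (tp + fp + fn + tn),
    "npv": tn / (tn + fn),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

# Compare against two of the values reported above.
assert math.isclose(metrics["precision"], 0.733111682031132, rel_tol=1e-9)
assert math.isclose(metrics["accuracy"], 0.8474725274725274, rel_tol=1e-9)
```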

Furthermore, there is a range of high-level performance metrics that are calculated from the overall number of gait sequences and the overall gait duration in the reference and detected data. They can therefore be derived from the reference and detected gait sequences directly, without any intermediate steps, using the calculate_unmatched_gsd_performance_metrics function. As some of the unmatched metrics are reported in seconds, the function requires the sampling frequency of the recorded data as an additional argument. It returns a dictionary containing all metrics for the specified detected and reference gait sequences.

from mobgap.gait_sequences.evaluation import (
    calculate_unmatched_gsd_performance_metrics,
)

unmatched_metrics_dict = calculate_unmatched_gsd_performance_metrics(
    gsd_list_detected=detected_gsd_list,
    gsd_list_reference=reference_gsd_list,
    sampling_rate_hz=test_data.sampling_rate_hz,
)

unmatched_metrics_dict

{'reference_gs_duration_s': 65.52, 'detected_gs_duration_s': 66.1, 'gs_duration_error_s': 0.5799999999999983, 'gs_relative_duration_error': 0.008852258852258826, 'gs_absolute_duration_error_s': 0.5799999999999983, 'gs_absolute_relative_duration_error': 0.008852258852258826, 'gs_absolute_relative_duration_error_log': 0.008813307312826587, 'detected_num_gs': 5, 'reference_num_gs': 6, 'num_gs_error': -1, 'num_gs_relative_error': -0.16666666666666666, 'num_gs_absolute_error': 1, 'num_gs_absolute_relative_error': 0.16666666666666666, 'num_gs_absolute_relative_error_log': 0.15415067982725836}
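The duration metrics can be reproduced by hand from the two lists shown earlier. The following sketch is not the library code. It assumes a sampling rate of 100 Hz (not stated in the text, but consistent with the output), counts both boundary samples of each sequence, and takes the log error as log(1 + |relative error|). All three are inferred from the reported numbers.

```python
import math

# Start/end values copied from the detected and reference lists shown earlier.
detected = [(750, 1651), (4650, 6151), (12900, 14851), (20100, 21151), (21300, 22501)]
reference = [
    (1019, 1768), (4534, 5549), (9665, 10569),
    (12337, 14633), (20151, 20982), (21378, 22129),
]
sampling_rate_hz = 100  # assumption: inferred from the reported durations


def total_duration_s(intervals, sampling_rate_hz):
    """Sum the sequence lengths in seconds, counting both boundary samples."""
    return sum(end - start + 1 for start, end in intervals) / sampling_rate_hz


detected_s = total_duration_s(detected, sampling_rate_hz)
reference_s = total_duration_s(reference, sampling_rate_hz)

duration_error_s = detected_s - reference_s
relative_error = duration_error_s / reference_s
abs_relative_error_log = math.log(1 + abs(relative_error))

print(reference_s, detected_s)  # 65.52 66.1
print(f"{relative_error:.6f}")  # 0.008852
```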

Direct Gait Sequence Matching

Apart from the performance evaluation methods mentioned above, it might be useful in some cases to identify how many and which detected gait sequences reliably match with the ground truth.

This is primarily useful when further parameters are associated with each gait sequence, e.g., the gait speed. In this case, matching gait sequences that cover the same gait regions allows a proper comparison of these parameters. For more information on this, see the example on the overall parameter evaluation on Walking-Bout level (TODO).

For this purpose, the categorize_intervals function can be used. It compares the detected with the reference gait sequences and treats two sequences as a match if they overlap by at least a given amount. In the resulting DataFrame, the gs_id_detected and gs_id_reference columns hold the ids of the compared sequences, and the match_type column marks each row as a match (tp), a detected sequence without a matching reference (fp), or a reference sequence without a matching detection (fn). We can see that with an overlap threshold of 0.7 (70%), three of the five detected gait sequences are considered matches with the reference gait sequences for our example recording. Note that this threshold is enforced in both directions, i.e., the detected gait sequence must overlap with the reference gait sequence by at least 70% and vice versa. This means that only one-to-one matches are possible. If multiple detected gait sequences overlap with the same reference gait sequence, only the one with the highest overlap is considered a match. If one gait sequence is covered by multiple smaller ones, possibly none of them is considered a match.

from mobgap.gait_sequences.evaluation import categorize_intervals

matches = categorize_intervals(
    gsd_list_detected=detected_gsd_list,
    gsd_list_reference=reference_gsd_list,
    overlap_threshold=0.7,
)

matches

	gs_id_detected	gs_id_reference	match_type
match_id
0	0	0	tp
1	1	NaN	fp
2	2	3	tp
3	3	4	tp
4	4	NaN	fp
5	NaN	1	fn
6	NaN	2	fn
7	NaN	5	fn
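The two-directional threshold can be illustrated with a short plain-Python sketch. It is a simplified stand-in, not the mobgap implementation: the overlap is measured as the shared length divided by the length of each sequence, which is an assumption about the exact definition. With the start and end values from this example it yields the same three matches as the table above.

```python
def match_sequences(detected, reference, overlap_threshold=0.7):
    """Return (detected_id, reference_id) pairs that overlap enough in BOTH directions."""
    matches = []
    for det_id, (det_start, det_end) in enumerate(detected):
        for ref_id, (ref_start, ref_end) in enumerate(reference):
            overlap = min(det_end, ref_end) - max(det_start, ref_start)
            if overlap <= 0:
                continue  # the two sequences do not intersect
            covers_detected = overlap / (det_end - det_start)
            covers_reference = overlap / (ref_end - ref_start)
            if covers_detected >= overlap_threshold and covers_reference >= overlap_threshold:
                matches.append((det_id, ref_id))
    return matches


# Start/end values copied from the detected and reference lists shown earlier.
detected = [(750, 1651), (4650, 6151), (12900, 14851), (20100, 21151), (21300, 22501)]
reference = [
    (1019, 1768), (4534, 5549), (9665, 10569),
    (12337, 14633), (20151, 20982), (21378, 22129),
]

print(match_sequences(detected, reference))  # [(0, 0), (2, 3), (3, 4)]
```

Detected sequence 1, for example, shares 899 samples with reference sequence 1, which is only about 60% of its own length of 1501 samples, so it ends up as a false positive.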

Running a full evaluation pipeline

Instead of manually evaluating and investigating the performance of a GSD algorithm on a single piece of data, we often want to run a full evaluation on an entire dataset. This can be done using the GsdEmulationPipeline class and some tpcp functions.

But let’s start with selecting some data. We want to use all the simulated real-world walking data from the INDIP reference system (Test11).

simulated_real_world_walking = LabExampleDataset(
    reference_system="INDIP"
).get_subset(test="Test11")

simulated_real_world_walking

LabExampleDataset [3 groups/rows]

	cohort	participant_id	time_measure	test	trial
0	HA	001	TimeMeasure1	Test11	Trial1
1	HA	002	TimeMeasure1	Test11	Trial1
2	MS	001	TimeMeasure1	Test11	Trial1

Now we can use the GsdEmulationPipeline class to directly run a GSD algorithm on a datapoint. The pipeline takes care of extracting the required data.

from mobgap.gait_sequences.pipeline import GsdEmulationPipeline

pipeline = GsdEmulationPipeline(GsdIluz())

pipeline.safe_run(simulated_real_world_walking[0]).gs_list_

	start	end
gs_id
0	600	1201
1	2700	4201
2	4350	5251
3	7800	8851
4	9450	10201
5	10950	11551
6	13050	13651

Note that this only ran the pipeline on a single datapoint. If we want to run it on all datapoints and evaluate the performance of the algorithm, we can use the validate function.

It uses the built-in score method of the pipeline to calculate the performance of the algorithm on each datapoint and then takes the mean of the results. All mean and individual results are returned in a large dictionary that can easily be converted to a pandas DataFrame.

from tpcp.validate import validate

evaluation_results = pd.DataFrame(
    validate(pipeline, simulated_real_world_walking)
)

evaluation_results.drop(["single__reference", "single__detected"], axis=1).T

	0
debug__score_time	0.907791
data_labels	[(HA, 001, TimeMeasure1, Test11, Trial1), (HA,...
single__reference_gs_duration_s	[40.44, 40.82, 65.52]
single__detected_gs_duration_s	[60.14, 49.62, 66.1]
single__gs_duration_error_s	[19.700000000000003, 8.799999999999997, 0.5799...
single__gs_relative_duration_error	[0.487141444114738, 0.21558059774620278, 0.008...
single__gs_absolute_duration_error_s	[19.700000000000003, 8.799999999999997, 0.5799...
single__gs_absolute_relative_duration_error	[0.487141444114738, 0.21558059774620278, 0.008...
single__gs_absolute_relative_duration_error_log	[0.3968557834081124, 0.19522182088195622, 0.00...
single__detected_num_gs	[7, 6, 5]
single__reference_num_gs	[6, 3, 6]
single__num_gs_error	[1, 3, -1]
single__num_gs_relative_error	[0.16666666666666666, 1.0, -0.16666666666666666]
single__num_gs_absolute_error	[1, 3, 1]
single__num_gs_absolute_relative_error	[0.16666666666666666, 1.0, 0.16666666666666666]
single__num_gs_absolute_relative_error_log	[0.15415067982725836, 0.6931471805599453, 0.15...
single__tp_samples	[3208, 3132, 4851]
single__fp_samples	[2815, 1835, 1766]
single__fn_samples	[839, 955, 1704]
single__precision	[0.5326249377386685, 0.6305617072679686, 0.733...
single__recall	[0.7926859402026192, 0.7663322730609249, 0.740...
single__f1_score	[0.6371400198609732, 0.6918489065606361, 0.736...
single__tn_samples	[6923, 10080, 14429]
single__specificity	[0.7109262682275621, 0.8459924464960135, 0.890...
single__accuracy	[0.7349292709466811, 0.8256467941507312, 0.847...
single__npv	[0.8919093017263592, 0.9134571816946081, 0.894...
agg__reference_gs_duration_s	48.926667
agg__detected_gs_duration_s	58.62
agg__gs_duration_error_s	9.693333
agg__gs_relative_duration_error	0.237191
agg__gs_absolute_duration_error_s	9.693333
agg__gs_absolute_relative_duration_error	0.237191
agg__gs_absolute_relative_duration_error_log	0.200297
agg__detected_num_gs	6.0
agg__reference_num_gs	5.0
agg__num_gs_error	1.0
agg__num_gs_relative_error	0.333333
agg__num_gs_absolute_error	1.666667
agg__num_gs_absolute_relative_error	0.444444
agg__num_gs_absolute_relative_error_log	0.333816
agg__tp_samples	3730.333333
agg__fp_samples	2138.666667
agg__fn_samples	1166.0
agg__precision	0.632099
agg__recall	0.766355
agg__f1_score	0.688517
agg__tn_samples	10477.333333
agg__specificity	0.815958
agg__accuracy	0.802683
agg__npv	0.899915
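Because the aggregated values are means over the per-datapoint results, they generally differ from metrics computed on pooled sample counts. The sketch below recomputes the aggregated precision from the single__tp_samples and single__fp_samples rows above and compares it with the pooled alternative. The pooled variant is shown only for contrast and is not something the pipeline reports.

```python
from statistics import mean

# Per-datapoint sample counts copied from the table above.
tp_samples = [3208, 3132, 4851]
fp_samples = [2815, 1835, 1766]

# Precision per datapoint, then averaged (matches agg__precision).
per_datapoint_precision = [tp / (tp + fp) for tp, fp in zip(tp_samples, fp_samples)]
mean_precision = mean(per_datapoint_precision)

# Alternative for contrast: pool all samples first, then compute precision once.
pooled_precision = sum(tp_samples) / (sum(tp_samples) + sum(fp_samples))

print(f"{mean_precision:.6f}")    # 0.632099
print(f"{pooled_precision:.4f}")  # 0.6356
```

Averaging per datapoint gives every recording the same weight, whereas pooling weights each recording by its number of samples.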

In addition to the metrics, the method also returns the raw reference and detected gait sequences. These can be used for further custom analysis.

evaluation_results["single__reference"][0][0]

	start	end
wb_id
0	632	988
1	2864	3325
2	3853	5085
3	7641	8621
4	9451	9932
5	11989	12517

evaluation_results["single__detected"][0][0]

	start	end
gs_id
0	600	1201
1	2700	4201
2	4350	5251
3	7800	8851
4	9450	10201
5	10950	11551
6	13050	13651

If you want to calculate additional metrics, you can either create a custom score function or subclass the pipeline and override the score method.

Parameter Optimization

Simply applying an algorithm to the data for evaluation is often not enough. In the case of machine learning algorithms or algorithms with tunable parameters, we might want to optimize these parameters to get the best possible performance. To avoid overfitting, we can use cross-validation to evaluate the performance of the algorithm on multiple splits of the data.

Below we show this procedure by using a simple grid search to optimize the window length of the GSD Iluz algorithm and evaluating this approach within a 3-fold cross-validation. Per fold, we select the window length leading to the highest precision on the “train set” and evaluate the performance on the “test set”.

Note that on a real-world dataset, you would likely need to perform a group-wise stratified cross-validation to avoid data leakage between multiple trials from the same participant and to ensure an equal distribution of patient cohorts across the folds. See the detailed tpcp examples on these topics.

from sklearn.model_selection import ParameterGrid
from tpcp.optimize import GridSearch
from tpcp.validate import cross_validate

para_grid = ParameterGrid({"algo__window_length_s": [2, 3, 4]})

cross_validate_results = pd.DataFrame(
    cross_validate(
        GridSearch(
            GsdEmulationPipeline(GsdIluz()),
            para_grid,
            return_optimized="precision",
        ),
        simulated_real_world_walking,
        cv=3,
        return_train_score=True,
    )
)

cross_validate_results.drop(
    [
        "test__single__reference",
        "test__single__detected",
        "train__single__reference",
        "train__single__detected",
    ],
    axis=1,
).T

	0	1	2
debug__score_time	0.302269	0.296322	0.322179
debug__optimize_time	1.87483	1.871321	1.799603
train__data_labels	[(HA, 002, TimeMeasure1, Test11, Trial1), (MS,...	[(HA, 001, TimeMeasure1, Test11, Trial1), (MS,...	[(HA, 001, TimeMeasure1, Test11, Trial1), (HA,...
test__data_labels	[(HA, 001, TimeMeasure1, Test11, Trial1)]	[(HA, 002, TimeMeasure1, Test11, Trial1)]	[(MS, 001, TimeMeasure1, Test11, Trial1)]
test__single__reference_gs_duration_s	[40.44]	[40.82]	[65.52]
...	...	...	...
train__agg__f1_score	0.714206	0.739854	0.698675
train__agg__tn_samples	12254.5	10993.0	8886.0
train__agg__specificity	0.868473	0.84001	0.819709
train__agg__accuracy	0.83656	0.832687	0.812025
train__agg__npv	0.903918	0.916703	0.911692

100 rows × 3 columns
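The per-fold logic of a grid search nested in a cross-validation can be sketched without tpcp. The precision values below are hypothetical numbers made up for illustration, not results from the example data. With three datapoints and three folds, each fold holds out exactly one datapoint.

```python
from statistics import mean

# Hypothetical per-datapoint precision for each candidate window length in seconds
# (made-up numbers, NOT results from the example data).
precision = {
    2: [0.50, 0.60, 0.70],
    3: [0.55, 0.62, 0.66],
    4: [0.52, 0.63, 0.64],
}
datapoints = [0, 1, 2]

results = []
for test_idx in datapoints:  # one datapoint is held out per fold
    train_idx = [i for i in datapoints if i != test_idx]
    # Grid search: pick the window length with the best mean precision on the train set.
    best_window = max(
        precision, key=lambda w: mean(precision[w][i] for i in train_idx)
    )
    # Evaluate the selected window length on the held-out datapoint only.
    results.append((test_idx, best_window, precision[best_window][test_idx]))

print(results)  # [(0, 2, 0.5), (1, 3, 0.62), (2, 3, 0.66)]
```

The selected parameter can differ between folds, and the test score of a fold never influences which parameter is chosen for it.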

In general, it is a good idea to use cross-validation even for algorithms that do not have tunable parameters. This way you can check that the performance of the algorithm is stable across different splits of the data, and it allows a direct comparison between tunable and non-tunable algorithms.

Total running time of the script: (0 minutes 12.054 seconds)

Estimated memory usage: 9 MB
