ICD Evaluation

This example shows how to evaluate the output of an initial contact detection (ICD) algorithm against reference data, and thus how to rate the performance of an ICD algorithm.

Import useful modules and packages

import pandas as pd

from mobgap.data import LabExampleDataset
from mobgap.initial_contacts import IcdIonescu
from mobgap.pipeline import GsIterator
from mobgap.utils.conversions import to_body_frame

Loading some example data

First, we load example data and apply the Ionescu ICD algorithm (IcdIonescu) to it; any other ICD algorithm could be used in the same way. To have a reference to compare the results against, we also load the corresponding ground truth data (here from the INDIP reference system). These steps are explained in more detail in the ICD Ionescu example.

def load_data():
    """Load example data and extract a single trial for demonstration purposes."""
    example_data = LabExampleDataset(
        reference_system="INDIP", reference_para_level="wb"
    )
    single_test = example_data.get_subset(
        cohort="HA", participant_id="001", test="Test11", trial="Trial1"
    )
    return single_test


def calculate_icd_ionescu_output(single_test_data):
    """Calculate the ICD Ionescu output for one sensor from the test data."""
    imu_data = to_body_frame(single_test_data.data_ss)
    sampling_rate_hz = single_test_data.sampling_rate_hz
    reference_wbs = single_test_data.reference_parameters_.wb_list

    iterator = GsIterator()
    for (gs, data), result in iterator.iterate(imu_data, reference_wbs):
        result.ic_list = (
            IcdIonescu()
            .detect(data, sampling_rate_hz=sampling_rate_hz)
            .ic_list_
        )

    det_ics = iterator.results_.ic_list
    return det_ics


def load_reference(single_test_data):
    """Load the reference initial contacts from the test data."""
    ref_ics = single_test_data.reference_parameters_.ic_list
    return ref_ics


wb_data = load_data()
detected_ics = calculate_icd_ionescu_output(wb_data)
reference_ics = load_reference(wb_data)

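As you can see, both the detected and the reference initial contacts are stored in dataframes with a MultiIndex. The first level (wb_id) is the walking bout id and the second level (step_id) is the index of the initial contact within the walking bout. The ic column holds the sample index of each initial contact; the reference additionally provides the lr_label (left/right) of each contact.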

Detected initial contacts:

                  ic
wb_id step_id
0     0          697
      1          760
      2          814
      3          872
      4          924
1     0         2929
      1         2989
      2         3056
      3         3109
      4         3176
      5         3264
2     0         3913
      1         3975
      2         4085
      3         4141
      4         4201
      5         4273
      6         4355
      7         4488
      8         4553
      9         4638
      10        4758
      11        4815
      12        4863
      13        4968
      14        5038
3     0         7739
      1         7846
      2         7983
      3         8096
      4         8166
      5         8229
      6         8281
      7         8336
      8         8399
      9         8461
      10        8541
      11        8606
4     0         9531
      1         9593
      2         9659
      3         9723
      4         9791
      5         9853
5     0        12044
      1        12099
      2        12157
      3        12211
      4        12271
      5        12414


Reference initial contacts:

                  ic lr_label
wb_id step_id
0     0          632     left
      1          709    right
      2          763     left
      3          824    right
      4          876     left
...   ...        ...      ...
5     3        12162     left
      4        12220    right
      5        12277     left
      6        12335    right
      7        12516     left

63 rows × 2 columns



Matching ICs between detected and reference lists

Let’s quantify how the algorithm output compares to the reference labels. To gain detailed insight into the performance of the algorithm, we can look at the individual matches between the detected and reference initial contacts. To do this, we use the categorize_ic_list function, which classifies each initial contact as a true positive (a detected IC with a matching reference IC), a false positive (a detected IC without a matching reference IC), or a false negative (a reference IC without a matching detected IC). We can then use these results to calculate a range of higher-level performance metrics.

Note that we only want to match initial contacts within the same walking bout. If we simply passed the detected and reference initial contacts to the matching function, it would match all ICs regardless of the walking bout, as it ignores the MultiIndex. We will look at what this looks like, and when it might be useful, further below; for now, let’s perform the matching within the walking bouts.

For this, we need to group the detected and reference initial contacts by the walking bout id. This can be done using the create_multi_groupby helper function.

from mobgap.utils.df_operations import create_multi_groupby

per_wb_grouper = create_multi_groupby(
    detected_ics, reference_ics, groupby="wb_id"
)

This provides us with a MultiGroupBy object, which is similar to the regular pandas groupby object created from a single dataframe, but allows us to apply a function to each group across all dataframes at once. I.e., the function receives the detected and the reference initial contacts of each walking bout and can then perform some operation on them.
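Conceptually, applying a function through the MultiGroupBy object works like the plain-Python sketch below. It is a simplified, hypothetical stand-in (group_by_wb and multi_groupby_apply are not mobgap functions) that represents each IC list as a dict mapping (wb_id, step_id) to a sample index; the values are the first few ICs from the tables above.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

# Hypothetical representation of an IC list: (wb_id, step_id) -> IC sample index
IcList = Dict[Tuple[int, int], int]


def group_by_wb(ics: IcList) -> Dict[int, IcList]:
    """Split a (wb_id, step_id)-keyed IC list into one sub-list per walking bout."""
    groups: Dict[int, IcList] = defaultdict(dict)
    for (wb_id, step_id), sample in ics.items():
        # Keep the full (wb_id, step_id) key, so results can still refer to the original index
        groups[wb_id][(wb_id, step_id)] = sample
    return dict(groups)


def multi_groupby_apply(
    detected: IcList, reference: IcList, func: Callable[[IcList, IcList], object]
) -> Dict[int, object]:
    """Call func(detected_wb, reference_wb) once per walking bout id."""
    det_groups = group_by_wb(detected)
    ref_groups = group_by_wb(reference)
    results: Dict[int, object] = {}
    # This sketch iterates over the union of WB ids and passes an empty list for a bout
    # missing from one input; mobgap's own behaviour in that case is not assumed here.
    for wb_id in sorted(set(det_groups) | set(ref_groups)):
        results[wb_id] = func(det_groups.get(wb_id, {}), ref_groups.get(wb_id, {}))
    return results


# First ICs of WB 0 and WB 1 from the detected table, and of WB 0 from the reference table
detected = {(0, 0): 697, (0, 1): 760, (1, 0): 2929, (1, 1): 2989}
reference = {(0, 0): 632, (0, 1): 709, (0, 2): 763}

# Example function: count the ICs per walking bout in both inputs
counts = multi_groupby_apply(detected, reference, lambda det, ref: (len(det), len(ref)))
print(counts)  # {0: (2, 3), 1: (2, 0)}
```

In the real workflow, the function passed to apply is categorize_ic_list instead of a simple counter.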

In our case, we want to apply the categorize_ic_list function to each walking bout. It returns a dataframe with the matches found within a given tolerance.

We cannot expect initial contacts to be detected at exactly the same time by both systems. Hence, we allow for a certain deviation in the matching process: here 0.2 s, which as_samples converts into a number of samples based on the sampling rate of the data.

from mobgap.utils.conversions import as_samples

tolerance_s = 0.2
tolerance_samples = as_samples(tolerance_s, wb_data.sampling_rate_hz)
tolerance_samples
20
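The conversion itself is simple arithmetic: duration times sampling rate, rounded to whole samples. A minimal sketch, assuming rounding to the nearest sample (seconds_to_samples is a hypothetical helper, not mobgap's implementation of as_samples); the result of 20 samples for 0.2 s is consistent with a sampling rate of 100 Hz.

```python
def seconds_to_samples(duration_s: float, sampling_rate_hz: float) -> int:
    """Hypothetical helper: convert a duration in seconds to a whole number of samples."""
    # Round to the nearest sample instead of truncating, so that floating point results
    # slightly below an integer (e.g. 19.9999999) do not lose a sample
    return int(round(duration_s * sampling_rate_hz))


# Assumed sampling rate of 100 Hz (consistent with 0.2 s -> 20 samples)
print(seconds_to_samples(0.2, 100))  # 20
# At a sampling rate where the product is not an integer, the result is rounded
print(seconds_to_samples(0.2, 128))  # 26
```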

Now we can apply the matching function to each walking bout. Note that the matches retain the walking bout id as the first index level, so the matches of each walking bout are listed separately. The dataframe has 3 columns: the index value of the detected IC, the index value of the matched reference IC, and the match type (tp, fp, or fn). For false positives the reference id is NaN, and for false negatives the detected id is NaN. In our case, the two id columns contain tuples, as they stem from the original MultiIndex that we provided, so each tuple has the form (wb_id, step_id).

from mobgap.initial_contacts.evaluation import categorize_ic_list

# Reuse the grouper created above instead of building it a second time
matches_per_wb = per_wb_grouper.apply(
    lambda df1, df2: categorize_ic_list(
        ic_list_detected=df1,
        ic_list_reference=df2,
        tolerance_samples=tolerance_samples,
        multiindex_warning=False,
    )
)
matches_per_wb
          ic_id_detected ic_id_reference match_type
wb_id
0     0           (0, 0)          (0, 1)         tp
      1           (0, 1)          (0, 2)         tp
      2           (0, 2)          (0, 3)         tp
      3           (0, 3)          (0, 4)         tp
      4           (0, 4)          (0, 5)         tp
...   ...            ...             ...        ...
5     4           (5, 4)          (5, 5)         tp
      5           (5, 5)             NaN         fp
      6              NaN          (5, 0)         fn
      7              NaN          (5, 6)         fn
      8              NaN          (5, 7)         fn

69 rows × 3 columns
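To make the three match types concrete, the plain-Python sketch below pairs ICs under a simple rule that we assume for illustration (it is not necessarily the strategy categorize_ic_list uses internally): a detected and a reference IC form a true positive if each is the other's nearest neighbour and they are at most tolerance_samples apart. The remaining detected ICs are false positives and the remaining reference ICs false negatives. The input values are the first ICs of WB 0 from the tables above; match_ics and nearest are hypothetical helpers.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (wb_id, step_id)
Row = Tuple[Optional[Key], Optional[Key], str]  # (ic_id_detected, ic_id_reference, match_type)


def nearest(value: int, candidates: Dict[Key, int]) -> Optional[Key]:
    """Return the key of the candidate closest to value (None if there are no candidates)."""
    if not candidates:
        return None
    return min(candidates, key=lambda k: abs(candidates[k] - value))


def match_ics(detected: Dict[Key, int], reference: Dict[Key, int], tolerance_samples: int) -> List[Row]:
    """Simplified mutual-nearest-neighbour matching with a tolerance (illustrative only)."""
    rows: List[Row] = []
    matched_ref = set()
    for det_key, det_val in detected.items():
        ref_key = nearest(det_val, reference)
        is_match = (
            ref_key is not None
            and abs(reference[ref_key] - det_val) <= tolerance_samples
            # Mutual check: the detected IC must also be the reference IC's nearest neighbour
            and nearest(reference[ref_key], detected) == det_key
        )
        if is_match:
            rows.append((det_key, ref_key, "tp"))
            matched_ref.add(ref_key)
        else:
            rows.append((det_key, None, "fp"))
    # Every reference IC that was never matched is a false negative
    rows.extend((None, ref_key, "fn") for ref_key in reference if ref_key not in matched_ref)
    return rows


detected = {(0, 0): 697, (0, 1): 760, (0, 2): 814, (0, 3): 872}
reference = {(0, 0): 632, (0, 1): 709, (0, 2): 763, (0, 3): 824, (0, 4): 876}
for row in match_ics(detected, reference, tolerance_samples=20):
    print(row)
# ((0, 0), (0, 1), 'tp')
# ((0, 1), (0, 2), 'tp')
# ((0, 2), (0, 3), 'tp')
# ((0, 3), (0, 4), 'tp')
# (None, (0, 0), 'fn')
```

The mutual-nearest requirement keeps the pairing one-to-one: two detected ICs can never both claim the same reference IC.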



Instead of matching the initial contacts within each walking bout, we could also match all initial contacts regardless of the walking bout. This is done by passing detected_ics and reference_ics directly to categorize_ic_list; we store the result as matches_all. This can be useful if the walking bouts of the two compared systems are not identical, or if the MultiIndex contains further levels that should not be taken into account for the matching.

/home/docs/checkouts/readthedocs.org/user_builds/mobgap/checkouts/v0.9.0/mobgap/initial_contacts/evaluation.py:172: UserWarning: The index of `ic_list_detected` or `ic_list_reference` is a MultiIndex. Please be aware that the index levels will not be regarded separately for the matching process, and initial contacts might be matched across different index groups, such as walking bouts or participants.
If this is not the intended use case for you, consider grouping your input data before calling the evaluation function.

This can be done using the `create_multi_groupby` function from the `mobgap.utils.array_handling`. Checkout the example of IC-evaluation for more information.
  warnings.warn(
    ic_id_detected ic_id_reference match_type
0           (0, 0)          (0, 1)         tp
1           (0, 1)          (0, 2)         tp
2           (0, 2)          (0, 3)         tp
3           (0, 3)          (0, 4)         tp
4           (0, 4)          (0, 5)         tp
...            ...             ...        ...
64             NaN          (4, 0)         fn
65             NaN          (4, 7)         fn
66             NaN          (5, 0)         fn
67             NaN          (5, 6)         fn
68             NaN          (5, 7)         fn

69 rows × 3 columns



Note that this did not make a difference in our case, as the individual WBs are identical between the two systems and far enough apart that matches between different WBs are not possible. In general, however, this is a typical “foot-gun”: users might not be aware that the MultiIndex is ignored during matching. Hence, as you can see above, a warning is raised if you pass a MultiIndex to the matching function. It can be silenced by setting the multiindex_warning parameter to False.
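The plain-Python sketch below illustrates the foot-gun with made-up numbers (not data from this example): two walking bouts that are only a few samples apart. It uses a simplified mutual-nearest-neighbour rule (an assumption, not mobgap's implementation) and only counts true positives. When the WB level is ignored, the last detected IC of WB 0 is paired with the first reference IC of WB 1; matching per WB prevents this. count_true_positives and values_per_wb are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def count_true_positives(detected: List[int], reference: List[int], tolerance: int) -> int:
    """Count detected ICs whose nearest reference IC is within tolerance and points back to them."""
    tp = 0
    for d in detected:
        if not reference:
            break
        r = min(reference, key=lambda x: abs(x - d))
        # Mutual nearest-neighbour check keeps the pairing one-to-one
        if abs(r - d) <= tolerance and min(detected, key=lambda x: abs(x - r)) == d:
            tp += 1
    return tp


def values_per_wb(ics: Dict[Tuple[int, int], int]) -> Dict[int, List[int]]:
    """Collect the IC sample indices of each walking bout."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for (wb_id, _step_id), sample in ics.items():
        groups[wb_id].append(sample)
    return groups


# Made-up ICs: WB 0 ends around sample 1000 and WB 1 starts shortly after
detected = {(0, 0): 940, (0, 1): 1000, (1, 0): 1100}
reference = {(0, 0): 942, (1, 0): 1012, (1, 1): 1098}
tolerance = 20

# Ignoring the WB level: detected 1000 (WB 0) is matched to reference 1012 (WB 1)
tp_flat = count_true_positives(list(detected.values()), list(reference.values()), tolerance)

# Matching within each WB: that cross-bout pair is no longer possible
det_wb, ref_wb = values_per_wb(detected), values_per_wb(reference)
tp_per_wb = sum(
    count_true_positives(det_wb[wb], ref_wb[wb], tolerance) for wb in set(det_wb) | set(ref_wb)
)

print(tp_flat, tp_per_wb)  # 3 2
```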

Since we recommend matching the ICs per walking bout in our case, we continue with matches_per_wb and ignore matches_all for the rest of this example.

Calculating performance metrics

From matches_per_wb, a range of higher-level performance metrics (including the total number of true positives, false positives, and false negatives, as well as precision, recall, and F1-score) can be calculated. For this purpose, we can use the calculate_matched_icd_performance_metrics function. It returns a dictionary containing all metrics for the provided matches.

We can again decide whether to calculate these metrics across all walking bouts or for each walking bout separately. We briefly show both approaches below.

Across all walking bouts:

tp_samples    44.000000
fp_samples     6.000000
fn_samples    19.000000
precision      0.880000
recall         0.698413
f1_score       0.778761
dtype: float64
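These values follow the standard definitions precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2 * TP / (2 * TP + FP + FN). The plain-Python sketch below reproduces the numbers above from the counts 44, 6 and 19; icd_metrics is a hypothetical helper, not the mobgap function, and returning NaN on a zero denominator is a choice made only for this sketch.

```python
import math


def icd_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1-score from match counts (standard definitions)."""
    # Returning NaN when a denominator is zero is a choice made for this sketch
    precision = tp / (tp + fp) if tp + fp else math.nan
    recall = tp / (tp + fn) if tp + fn else math.nan
    # 2*TP / (2*TP + FP + FN) is equivalent to the harmonic mean of precision and recall
    f1_score = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else math.nan
    return {"precision": precision, "recall": recall, "f1_score": f1_score}


# Pooled counts across all walking bouts of this example
metrics = icd_metrics(tp=44, fp=6, fn=19)
print({name: round(value, 6) for name, value in metrics.items()})
# {'precision': 0.88, 'recall': 0.698413, 'f1_score': 0.778761}
```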

Per WB:

For this we can use the normal pandas groupby to calculate the metrics for each walking bout separately.

       tp_samples  fp_samples  fn_samples  precision    recall  f1_score
wb_id
0             5.0         0.0         2.0   1.000000  0.714286  0.833333
1             4.0         2.0         2.0   0.666667  0.666667  0.666667
2            13.0         2.0         5.0   0.866667  0.722222  0.787879
3            11.0         1.0         5.0   0.916667  0.687500  0.785714
4             6.0         0.0         2.0   1.000000  0.750000  0.857143
5             5.0         1.0         3.0   0.833333  0.625000  0.714286


Which of the two approaches makes more sense depends on the use case and what your multiindex represents.
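One way to see why the choice matters: pooling the counts over all walking bouts weighs every initial contact equally, so long walking bouts dominate, whereas averaging the per-WB metrics weighs every walking bout equally. The plain-Python sketch below uses the counts from the per-WB table of this example to show that the two aggregations give different F1-scores. Averaging the per-WB values is our own illustration, not a step performed by mobgap in this example, and f1_from_counts is a hypothetical helper.

```python
# (tp, fp, fn) per walking bout, taken from the per-WB table of this example
counts_per_wb = {
    0: (5, 0, 2),
    1: (4, 2, 2),
    2: (13, 2, 5),
    3: (11, 1, 5),
    4: (6, 0, 2),
    5: (5, 1, 3),
}


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1-score written in terms of counts: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Pooled: sum the counts over all bouts first, then compute the metric once (every IC weighs the same)
tp, fp, fn = map(sum, zip(*counts_per_wb.values()))
pooled_f1 = f1_from_counts(tp, fp, fn)

# Per WB: compute the metric for each bout, then average (every bout weighs the same)
per_wb_f1 = [f1_from_counts(*counts) for counts in counts_per_wb.values()]
mean_per_wb_f1 = sum(per_wb_f1) / len(per_wb_f1)

print((tp, fp, fn))              # (44, 6, 19)
print(round(pooled_f1, 6))       # 0.778761
print(round(mean_per_wb_f1, 6))  # 0.77417
```

The summed counts match the values reported across all walking bouts, while the unweighted mean of the per-WB F1-scores is slightly lower here.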

Total running time of the script: (0 minutes 2.028 seconds)

Estimated memory usage: 9 MB
