mobgap.initial_contacts.evaluation.categorize_ic_list

mobgap.initial_contacts.evaluation.categorize_ic_list(*, ic_list_detected: DataFrame, ic_list_reference: DataFrame, tolerance_samples: int | float = 0, multiindex_warning: bool = True) → DataFrame

Evaluate an initial contact list against a reference contact-by-contact.

This compares a dataframe of detected initial contacts with a ground truth initial contact dataframe and classifies each initial contact as true positive, false positive, or false negative. The comparison is based purely on the sample index of each initial contact (the value in the “ic” column). Two initial contacts are considered a positive match if the difference between their sample indices is less than or equal to the tolerance (tolerance_samples).
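
The matching criterion itself can be sketched in a few lines of NumPy. This is only an illustration of the `distance <= tolerance_samples` rule, not the library's implementation; the helper name `within_tolerance` is hypothetical.

```python
import numpy as np


def within_tolerance(detected, reference, tolerance_samples=0):
    """Hypothetical helper: entry [i, j] is True if detected[i] and
    reference[j] are at most `tolerance_samples` apart."""
    detected = np.asarray(detected, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Pairwise absolute distances between all detected and reference values.
    distances = np.abs(detected[:, None] - reference[None, :])
    return distances <= tolerance_samples


# Same values as in the Examples section of this page.
candidates = within_tolerance(
    [11, 23, 30, 50], [10, 20, 32, 40], tolerance_samples=2
)
```

Here only the pairs (11, 10) and (30, 32) are candidates; 23 is three samples away from its closest reference value and therefore falls outside the tolerance.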

If multiple detected initial contacts would match a single ground truth initial contact (or vice versa), only the initial contact with the lowest distance is considered an actual match. In case of multiple matches with the same distance, the first match is used. All other candidates are considered false positives or false negatives.
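
One plausible reading of this rule is a mutual nearest-neighbour check: a pair counts as a match only if each value is the other's closest candidate and the pair lies within the tolerance. The sketch below follows that reading; it is an assumption about the behaviour described above, not the library's code, and `mutual_nearest_matches` is a hypothetical name.

```python
import numpy as np


def mutual_nearest_matches(detected, reference, tolerance_samples=0):
    """Hypothetical helper: return (detected_pos, reference_pos) pairs that
    are each other's nearest neighbour and lie within the tolerance."""
    detected = np.asarray(detected, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if detected.size == 0 or reference.size == 0:
        return []
    distances = np.abs(detected[:, None] - reference[None, :])
    # np.argmin returns the first minimum, so ties go to the first candidate.
    nearest_ref = distances.argmin(axis=1)  # closest reference per detected IC
    nearest_det = distances.argmin(axis=0)  # closest detected IC per reference
    matches = []
    for det_pos, ref_pos in enumerate(nearest_ref):
        is_mutual = nearest_det[ref_pos] == det_pos
        if is_mutual and distances[det_pos, ref_pos] <= tolerance_samples:
            matches.append((det_pos, int(ref_pos)))
    return matches


# Two detected ICs (9 and 11) compete for the single reference IC at 10.
# Both are 1 sample away, so the first one wins and the other stays unmatched.
print(mutual_nearest_matches([9, 11], [10], tolerance_samples=2))  # [(0, 0)]
```

Under this reading, the unmatched detected value at 11 would end up as a false positive.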

The detected and reference initial contact dataframes must have a column named “ic” that contains the sample index of the respective initial contact. For the dataframe index, we support either a single index or a multiindex without duplicates (i.e., the index must identify each initial contact uniquely). If a multiindex is provided, the individual index levels will be ignored for the comparison and matches across different index groups will be possible. If this is not the intended use case, consider grouping your input data before calling the evaluation function (see create_multi_groupby and the example of IC-evaluation).
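
The input requirements and the grouping idea can be illustrated with plain pandas. The sketch below does not use create_multi_groupby, whose signature is not shown on this page; the level names `gs_id` and `ic_id` and the data are made up for illustration.

```python
import pandas as pd

# Hypothetical detected ICs from two gait sequences; level names are assumptions.
ic_detected = pd.DataFrame(
    {"ic": [10, 50, 12, 60]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (1, 0), (1, 1)], names=["gs_id", "ic_id"]
    ),
)

# Requirements stated above: an "ic" column and an index without duplicates.
assert "ic" in ic_detected.columns
assert ic_detected.index.is_unique

# Splitting by the outer level gives one single-index frame per gait sequence,
# so values from different gait sequences can no longer be matched together.
per_gs = {
    gs_id: group.droplevel("gs_id")
    for gs_id, group in ic_detected.groupby(level="gs_id")
}
```

Without the split, the values 10 (gait sequence 0) and 12 (gait sequence 1) would be compared against the same pool of reference values, which is the cross-group matching the paragraph above warns about. Each per-group frame could instead be evaluated against the reference frame of the same group.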

Parameters:

ic_list_detected: The dataframe of detected initial contacts.
ic_list_reference: The ground truth initial contact dataframe.
tolerance_samples: The allowed tolerance between a detected and reference initial contact in samples for it to be considered a true positive match. The comparison is done as distance <= tolerance_samples.
multiindex_warning: If True, a warning is raised if the index of the input data is a MultiIndex, explaining that the index levels will be ignored during matching. This exists because it is a common source of error when this function is used together with a typical pipeline that iterates over individual gait sequences using GsIterator. Only set this to False once you understand the two different use cases.

Returns:

matches: A three-column dataframe with the column names ic_id_detected, ic_id_reference, and match_type. Each row is a match containing the index values of the detected and the reference list that belong together, or a tuple of index values in case of a multiindex input. The match_type column indicates the type of match. For all detected initial contacts that have a match in the reference list, this will be “tp” (true positive). Detected initial contacts that do not have a match will have NaN in ic_id_reference and the match type “fp” (false positive). All reference initial contacts that do not have a counterpart in the detected list will have NaN in ic_id_detected and are marked as “fn” (false negative).
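
A frame in this format can be summarized with plain pandas. The sketch below builds the frame by hand, mirroring the documented columns, and derives precision, recall, and F1 from the match_type counts; it is a minimal illustration, not a function provided by this module.

```python
import pandas as pd

# Hand-built frame mirroring the documented return format.
matches = pd.DataFrame(
    {
        "ic_id_detected": [0, 1, 2, 3, None, None],
        "ic_id_reference": [0, None, 2, None, 1, 3],
        "match_type": ["tp", "fp", "tp", "fp", "fn", "fn"],
    }
)

# Count each match type; reindex so that missing types count as zero.
counts = (
    matches["match_type"].value_counts().reindex(["tp", "fp", "fn"], fill_value=0)
)
tp, fp, fn = int(counts["tp"]), int(counts["fp"]), int(counts["fn"])

# Guard against division by zero when a category is empty.
precision = tp / (tp + fp) if (tp + fp) else float("nan")
recall = tp / (tp + fn) if (tp + fn) else float("nan")
f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else float("nan")
```

With two true positives, two false positives, and two false negatives, all three scores are 0.5.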

Notes

This function is a simplified version of a gaitmap function.

Examples

>>> import pandas as pd
>>> from mobgap.initial_contacts.evaluation import categorize_ic_list
>>> ic_detected = pd.DataFrame([11, 23, 30, 50], columns=["ic"]).rename_axis(
...     "ic_id"
... )
>>> ic_reference = pd.DataFrame([10, 20, 32, 40], columns=["ic"]).rename_axis(
...     "ic_id"
... )
>>> result = categorize_ic_list(
...     ic_list_detected=ic_detected,
...     ic_list_reference=ic_reference,
...     tolerance_samples=2,
... )
>>> result
  ic_id_detected ic_id_reference match_type
0              0               0         tp
1              1             NaN         fp
2              2               2         tp
3              3             NaN         fp
4            NaN               1         fn
5            NaN               3         fn