mobgap.gait_sequences.evaluation.categorize_intervals
mobgap.gait_sequences.evaluation.categorize_intervals(
    *,
    gsd_list_detected: DataFrame,
    gsd_list_reference: DataFrame,
    overlap_threshold: float = 0.8,
    multiindex_warning: bool = True,
)
Evaluate a gait sequence list against a reference sequence-by-sequence with a minimum overlap threshold.
This compares a gait sequence list against a reference list and classifies each detected sequence as true positive or false positive, and each unmatched reference sequence as false negative. A detected gait sequence is classified as true positive if it has at least overlap_threshold overlap with a reference sequence. If a detected sequence does not reach this overlap with any reference sequence, it is classified as false positive. If a reference sequence is not matched by any detected sequence, it is classified as false negative.
Note that the threshold is enforced in both directions: the overlap relative to the length of the detected interval AND the overlap relative to the length of the matched reference interval must each be at least overlap_threshold.
The detected and reference dataframes are expected to have columns named “start” and “end” containing the start and end indices of the respective gait sequences. As index, either a single index or a multiindex without duplicates is supported (i.e., the index must identify each gait sequence uniquely). If a multiindex is provided, the individual index levels are ignored for the comparison, and matches across different index groups are possible. If this is not the intended use case, consider grouping your input data before calling this function (see create_multi_groupby).
Note that gsd_list_detected is assumed to contain no overlapping sequences, but this is not enforced! Additionally, note that this method does not return any new intervals (as done in categorize_intervals_per_sample). Instead, the comparison is done on a sequence-by-sequence level based on the provided intervals.
- Parameters:
- gsd_list_detected: pd.DataFrame
Each row contains a detected gait sequence interval as output by the GSD algorithms. The start index is stored in a column named “start” and the end index in a column named “end”.
- gsd_list_reference: pd.DataFrame
Gold standard to validate the detected gait sequences against. Should have the same format as gsd_list_detected.
- overlap_threshold: float
The minimum relative overlap between a detected sequence and its reference with respect to the length of both intervals. Must be larger than 0.5 and smaller than or equal to 1.
- multiindex_warning: bool
If True, a warning is raised if the index of the input data is a MultiIndex, explaining that the index levels will be ignored during matching. This exists because it is a common source of error when this function is used together with a typical pipeline that iterates over individual gait sequences during processing using GsIterator. Only set this to False once you understand the two different use cases.
- Returns:
- matches: pandas.DataFrame
A 3-column dataframe with the column names gsd_id_detected, gsd_id_reference, and match_type. Each row is a match containing the index values of the detected and the reference sequence that belong together, or tuples of index values in case of a multiindex input. The match_type column indicates the type of match. For all detected gait sequences that have a match in the reference list, this is “tp” (true positive). Detected gait sequences without a match get NaN as gsd_id_reference and the match type “fp” (false positive). Reference gait sequences without a counterpart in the detected list get NaN as gsd_id_detected and are marked as “fn” (false negative).
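As a rough illustration of how the returned matches could be used downstream, the sketch below counts the match types and derives precision and recall with plain pandas. The matches frame is typed in by hand in the format described above; it is not produced by mobgap here, and the summary logic is an assumption for illustration, not part of this function.

```python
import pandas as pd

# Hypothetical matches frame in the documented format (hand-written, not mobgap output).
matches = pd.DataFrame(
    {
        "gsd_id_detected": [0, 1, float("nan")],
        "gsd_id_reference": [0, float("nan"), 1],
        "match_type": ["tp", "fp", "fn"],
    }
)

# Count each match type; reindex so that missing types show up as 0.
counts = matches["match_type"].value_counts().reindex(["tp", "fp", "fn"], fill_value=0)
n_tp, n_fp, n_fn = (int(counts[key]) for key in ("tp", "fp", "fn"))

# Guard against empty denominators (e.g. no detected sequences at all).
precision = n_tp / (n_tp + n_fp) if (n_tp + n_fp) else float("nan")
recall = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else float("nan")

print(f"tp={n_tp}, fp={n_fp}, fn={n_fn}, precision={precision}, recall={recall}")
```

Note that there is no true-negative category at the sequence level, so only metrics built from tp, fp, and fn can be derived from this table.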
Examples
>>> import pandas as pd
>>> from mobgap.gait_sequences.evaluation import categorize_intervals
>>> detected = pd.DataFrame(
...     [[0, 10, 0], [20, 30, 1]], columns=["start", "end", "id"]
... ).set_index("id")
>>> reference = pd.DataFrame(
...     [[0, 10, 0], [15, 25, 1]], columns=["start", "end", "id"]
... ).set_index("id")
>>> result = categorize_intervals(
...     gsd_list_detected=detected, gsd_list_reference=reference
... )
>>> result
  gsd_id_detected gsd_id_reference match_type
0               0                0         tp
1               1              NaN         fp
2             NaN                1         fn
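The bidirectional threshold described above can be made concrete with a small pure-Python sketch. It is not the mobgap implementation: the helper names relative_overlaps and is_match are hypothetical, and the interval length is assumed to be end - start (whether the library treats the end index as inclusive is not stated here).

```python
def relative_overlaps(detected, reference):
    """Return the overlap relative to the detected and to the reference interval length.

    Both intervals are (start, end) tuples with end > start (assumption).
    """
    det_start, det_end = detected
    ref_start, ref_end = reference
    # Length of the shared part; 0 if the intervals do not intersect.
    overlap = max(0, min(det_end, ref_end) - max(det_start, ref_start))
    return overlap / (det_end - det_start), overlap / (ref_end - ref_start)


def is_match(detected, reference, overlap_threshold=0.8):
    """Check the threshold in both directions, as described in the text above."""
    rel_detected, rel_reference = relative_overlaps(detected, reference)
    return rel_detected >= overlap_threshold and rel_reference >= overlap_threshold


# Same intervals as in the example above.
print(is_match((0, 10), (0, 10)))    # identical intervals
print(is_match((20, 30), (15, 25)))  # overlap is 5 of 10 samples in both directions
# A short detection inside a long reference: fully covered, but it covers
# only a small fraction of the reference, so the second direction fails.
print(is_match((0, 10), (0, 100)))
```

With the default threshold of 0.8, only the first pair qualifies. This is consistent with the example result, where the second detected sequence is reported as “fp” and the second reference sequence as “fn” even though the two intervals partially overlap.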