mobgap.gsd.evaluation.find_matches_with_min_overlap

mobgap.gsd.evaluation.find_matches_with_min_overlap(
*,
gsd_list_detected: DataFrame,
gsd_list_reference: DataFrame,
overlap_threshold: float = 0.8,
) → DataFrame

Find all matches of gsd_list_detected in gsd_list_reference with at least overlap_threshold overlap.

The detected and reference dataframes are expected to have columns named “start” and “end” containing the start and end indices of the respective gait sequences.

Note that the threshold is enforced in both directions: the overlap must be at least overlap_threshold relative to the length of the detected interval AND relative to the length of the matched reference interval.
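To illustrate the two-directional check, here is a minimal sketch in plain Python. It is not mobgap's implementation: the helper names `overlap_in_both_directions` and `is_match` are hypothetical, and it assumes that an interval's length is `end - start`.

```python
def overlap_in_both_directions(detected, reference):
    """Return the overlap relative to the detected and to the reference length.

    Both arguments are (start, end) tuples. Assumption: length == end - start.
    """
    det_start, det_end = detected
    ref_start, ref_end = reference
    # Length of the shared region; zero if the intervals do not intersect.
    overlap = max(0, min(det_end, ref_end) - max(det_start, ref_start))
    return overlap / (det_end - det_start), overlap / (ref_end - ref_start)


def is_match(detected, reference, overlap_threshold=0.8):
    """Hypothetical helper: both relative overlaps must reach the threshold."""
    rel_detected, rel_reference = overlap_in_both_directions(detected, reference)
    return rel_detected >= overlap_threshold and rel_reference >= overlap_threshold


# A detected interval that lies fully inside a reference twice its length
# covers 100% of itself but only 50% of the reference, so it is not a match.
print(overlap_in_both_directions((0, 10), (0, 20)))  # (1.0, 0.5)
print(is_match((0, 10), (0, 20)))  # False
print(is_match((0, 10), (0, 10)))  # True
```

Checking both directions prevents a short detection inside a long reference (or the reverse) from counting as a match.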

Note that gsd_list_detected is assumed to contain no overlapping intervals, but this is not enforced!
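Because the no-overlap assumption is not checked, callers may want to verify it themselves before calling the function. The following is a sketch only: `has_overlaps` is a hypothetical helper, not part of mobgap, and it assumes that intervals which merely touch (one ends where the next starts) do not count as overlapping.

```python
def has_overlaps(intervals):
    """Return True if any two (start, end) intervals overlap.

    Assumption: intervals that only share a boundary are not overlapping.
    """
    ordered = sorted(intervals)
    # After sorting by start, an overlap exists exactly when some interval
    # begins before the interval directly preceding it has ended.
    return any(nxt[0] < prev[1] for prev, nxt in zip(ordered, ordered[1:]))


print(has_overlaps([(0, 10), (20, 30)]))  # False
print(has_overlaps([(0, 10), (5, 15)]))   # True
```

For a dataframe with “start” and “end” columns, the input could be built with `detected[["start", "end"]].to_numpy().tolist()`.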

Parameters:
gsd_list_detected: pd.DataFrame

Each row contains a detected gait sequence interval as output from the GSD algorithms. The respective start index is stored in the first and the stop index in the second column. Furthermore, the id of the respective gait sequence can be provided in the third column.

gsd_list_reference: pd.DataFrame

Gold standard to validate the detected gait sequences against. Should have the same format as gsd_list_detected.

overlap_threshold: float

The minimum relative overlap between a detected sequence and its reference with respect to the length of both intervals. Must be larger than 0.5 and smaller than or equal to 1.

Returns:
pandas.DataFrame

A dataframe containing the intervals from gsd_list_detected that overlap with gsd_list_reference with the specified minimum overlap. The dataframe contains the gait sequence ids as index column as well as the start and end indices of the intervals.

Examples

>>> import pandas as pd
>>> from mobgap.gsd.evaluation import find_matches_with_min_overlap
>>> detected = pd.DataFrame([[0, 10, 0], [20, 30, 1]], columns=["start", "end", "id"]).set_index("id")
>>> reference = pd.DataFrame([[0, 10, 0], [15, 25, 1]], columns=["start", "end", "id"]).set_index("id")
>>> # All arguments are keyword-only, so they must be passed by name.
>>> result = find_matches_with_min_overlap(gsd_list_detected=detected, gsd_list_reference=reference)
>>> result
   start  end
id
0      0   10