mobgap.pipeline.evaluation.get_matching_intervals

mobgap.pipeline.evaluation.get_matching_intervals(
*,
metrics_detected: DataFrame,
metrics_reference: DataFrame,
matches: DataFrame,
) -> DataFrame

Extract the detected and reference gait sequences that are considered matches on a sequence-by-sequence basis.

In addition, the metrics of the detected and reference gait sequences are extracted and returned in a single DataFrame for further comparison. If your metrics are already aggregated at a higher level (for example daily, per participant, or per session), use mobgap.gait_sequences.evaluation.combine_det_with_ref_without_matching instead.

Parameters:
metrics_detected

Each row corresponds to a detected gait sequence interval as output by the GSD algorithms. The columns contain the metrics estimated for each gait sequence based on these detected intervals. Only columns present in both metrics_detected and metrics_reference are kept in the output; all other columns are discarded.

metrics_reference

Each row corresponds to a reference gait sequence interval as retrieved from the reference system. The columns contain the metrics estimated for each gait sequence based on these reference intervals. Only columns present in both metrics_detected and metrics_reference are kept in the output; all other columns are discarded.

matches

A DataFrame containing the matched gait sequences as output by find_matches_with_min_overlap. Must have been calculated based on the same interval data as metrics_detected and metrics_reference. Expected to have the columns gs_id_detected, gs_id_reference, and match_type.
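
The input requirements described above can be checked before calling the function. The following is a minimal sketch in plain pandas, not mobgap's own implementation. The extra columns only_detected and only_reference and the match_type value "tp" are illustrative assumptions; only the three column names of matches are taken from the description above.

```python
import pandas as pd

# Hypothetical per-sequence metrics; "only_detected" and "only_reference"
# are made-up columns that exist in just one of the two inputs.
metrics_detected = pd.DataFrame(
    {"metric_1": [1, 1], "metric_2": [2, 2], "only_detected": [9, 9]},
    index=pd.Index([0, 1], name="id"),
)
metrics_reference = pd.DataFrame(
    {"metric_1": [2, 2], "metric_2": [3, 3], "only_reference": [7, 7]},
    index=pd.Index([0, 1], name="id"),
)

# Columns shared by both inputs; per the parameter description,
# everything else is discarded.
common_columns = metrics_detected.columns.intersection(metrics_reference.columns)
print(sorted(common_columns))

# Hypothetical matches frame with the three documented columns.
# The value "tp" for match_type is an assumption made for this sketch.
matches = pd.DataFrame(
    {
        "gs_id_detected": [0, 1],
        "gs_id_reference": [0, 1],
        "match_type": ["tp", "tp"],
    }
)

# Verify that the documented columns are present.
required = {"gs_id_detected", "gs_id_reference", "match_type"}
missing = required - set(matches.columns)
if missing:
    raise ValueError(f"matches is missing columns: {sorted(missing)}")
```

A check like this only validates the layout of the inputs. Whether matches was computed from the same interval data as the two metric frames still has to be ensured by the caller.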

Returns:
matches: pd.DataFrame

The detected gait sequences that are considered matches, each paired with the reference sequence it matches. The index is the unique identifier assigned to each matched gait sequence in the matches DataFrame. The columns are a two-level MultiIndex with the levels metric and origin. The first level contains every column present in both metrics_detected and metrics_reference. The second level indicates the origin of each value: detected for metrics estimated from the detected gait sequences and reference for metrics estimated from the reference gait sequences.
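
The two-level column layout makes it straightforward to compare detected and reference values with standard pandas indexing. The sketch below builds a frame by hand that mirrors the documented layout (the values and the level names metric and origin are copied from the example output on this page), so it runs without mobgap. It illustrates one possible way to consume the result and is not part of the library.

```python
import pandas as pd

# Hand-built stand-in for the returned DataFrame: first column level is the
# metric name, second level is the origin ("detected" or "reference").
columns = pd.MultiIndex.from_product(
    [["metric_1", "metric_2"], ["detected", "reference"]],
    names=["metric", "origin"],
)
matched_gs = pd.DataFrame(
    [[1, 2, 2, 3], [1, 2, 2, 3]],
    index=pd.Index([0, 1], name="id"),
    columns=columns,
)

# Selecting on the first level yields both origins for a single metric.
metric_1 = matched_gs["metric_1"]

# Taking a cross-section on the "origin" level yields one frame per origin,
# with the plain metric names as columns.
detected = matched_gs.xs("detected", level="origin", axis=1)
reference = matched_gs.xs("reference", level="origin", axis=1)

# Per-sequence error for every metric (detected minus reference).
error = detected - reference
print(error)
```

With the values above, every entry of error is -1, because each detected metric is one lower than its reference counterpart.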

Examples

>>> import pandas as pd
>>> from mobgap.gait_sequences.evaluation import categorize_intervals
>>> from mobgap.pipeline.evaluation import get_matching_intervals
>>> detected = pd.DataFrame(
...     [[0, 10, 0], [20, 30, 1]], columns=["start", "end", "id"]
... ).set_index("id")
>>> reference = pd.DataFrame(
...     [[0, 10, 0], [21, 29, 1]], columns=["start", "end", "id"]
... ).set_index("id")
>>> detected_metrics = pd.DataFrame(
...     [[1, 2, 0], [1, 2, 1]], columns=["metric_1", "metric_2", "id"]
... ).set_index("id")
>>> reference_metrics = pd.DataFrame(
...     [[2, 3, 0], [2, 3, 1]], columns=["metric_1", "metric_2", "id"]
... ).set_index("id")
>>> matches = categorize_intervals(
...     gsd_list_detected=detected, gsd_list_reference=reference
... )
>>> matched_gs = get_matching_intervals(
...     metrics_detected=detected_metrics,
...     metrics_reference=reference_metrics,
...     matches=matches,
... )
>>> matched_gs
metric metric_1           metric_2
origin detected reference detected reference
id
0             1         2        2         3
1             1         2        2         3