mobgap.initial_contacts.evaluation.icd_score

mobgap.initial_contacts.evaluation.icd_score = Scorer(default_aggregator=<function mean_agg>, final_aggregator=<function icd_final_agg>, n_jobs=None, pre_dispatch='2*n_jobs', progress_bar=True, score_func=<function icd_per_datapoint_score>, single_score_callback=None, verbose=0)

Scorer for ICD algorithms.

This is a pre-configured Scorer object that uses the icd_per_datapoint_score function as the per-datapoint scorer and the icd_final_agg function as the final aggregator. For more information about Scorer, see the tpcp documentation (Scorer). For usage information in the context of mobgap, have a look at the evaluation example for ICD.

The following metrics are calculated:

Raw metrics (part of the single results):

  • single__raw__detected: The detected initial contacts as a single dataframe with the datapoint labels as index.

  • single__raw__reference: The reference initial contacts as a single dataframe with the datapoint labels as index.

Metrics per datapoint (single results): These values are all provided as a list of values, one per datapoint.

Aggregated metrics (aggregated results):

  • All single outputs averaged over all datapoints. These are stored as agg__{metric_name}.

  • All metrics from calculate_matched_icd_performance_metrics and calculate_true_positive_icd_error, recalculated on all detected ICs across all datapoints. These are stored as combined__{metric_name}. The per-datapoint results are calculated per recording and then averaged over all recordings. The combined metrics instead pool all ICs from all recordings and calculate the performance metrics once on the pooled set. Effectively, each recording is weighted equally in the per-datapoint version, while each IC is weighted equally in the combined version.
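
The difference between the two weighting schemes can be illustrated with a small, library-independent sketch. It does not call mobgap or tpcp. The recording names and the true-positive/false-positive counts are made up for illustration, precision is used only as an example metric, and the variable names `agg_precision` and `combined_precision` merely mirror the `agg__` and `combined__` prefixes described above.

```python
from statistics import mean


def precision(tp: int, fp: int) -> float:
    """Fraction of detected ICs that match a reference IC."""
    return tp / (tp + fp)


# Hypothetical matching results for two recordings (assumed values, not real data).
# recording_a is long with many ICs, recording_b is short with few ICs.
counts = {
    "recording_a": {"tp": 90, "fp": 10},
    "recording_b": {"tp": 5, "fp": 5},
}

# Per-datapoint version: one precision value per recording, then the mean.
# Each recording contributes equally, regardless of how many ICs it contains.
per_datapoint = {name: precision(c["tp"], c["fp"]) for name, c in counts.items()}
agg_precision = mean(list(per_datapoint.values()))

# Combined version: pool the counts of all recordings first, then compute once.
# Each IC contributes equally, so the long recording dominates the result.
total_tp = sum(c["tp"] for c in counts.values())
total_fp = sum(c["fp"] for c in counts.values())
combined_precision = precision(total_tp, total_fp)

print(f"agg precision:      {agg_precision:.3f}")       # 0.700
print(f"combined precision: {combined_precision:.3f}")  # 0.864
```

In this sketch the short recording with poor precision pulls the averaged value down to about 0.70, while the pooled value stays near 0.86 because most ICs come from the long recording.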