Evaluation

class mobgap.utils.evaluation.Evaluation(dataset: BaseGaitDatasetWithReference, scoring: Callable[[T, BaseGaitDatasetWithReference], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[T, BaseGaitDatasetWithReference], *, validate_paras: dict | None = None)

Generic Evaluation challenge for all algorithms.

This challenge wraps any valid gait pipeline together with a scoring function and runs and scores it on a dataset.

This is a suitable approach when you want to evaluate and compare algorithms that are not “trainable” in any way, for example traditional algorithms or pre-trained models. Note that if you plan to compare trainable algorithms with non-trainable algorithms, you should use EvaluationCV for all of them.

Parameters:

dataset: A gait dataset with reference information. Evaluation is performed across all datapoints within the dataset.
scoring: A scoring function that evaluates the performance of the algorithm on a single datapoint. It should take a pipeline and a datapoint as input, run the pipeline on the datapoint and return a dictionary of performance metrics. These performance metrics are then aggregated across all datapoints.
validate_paras: Dictionary with further parameters that are passed directly to validate. This can overwrite all parameters except pipeline, dataset, and scoring. A typical use case is setting n_jobs to enable multiprocessing.
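
The contract described for scoring (take a pipeline and a datapoint, run the pipeline, return a dictionary of metrics that is later aggregated across datapoints) can be illustrated with a small standalone sketch. It does not use mobgap at all: toy_pipeline, score, and the dict-based dataset are hypothetical stand-ins, and the mean is only assumed here as the aggregation.

```python
from statistics import mean


# Hypothetical stand-ins (not mobgap objects): the "pipeline" is a plain
# callable and each "datapoint" is a dict with input data and a reference.
def toy_pipeline(data):
    """Count the samples above a fixed threshold (illustrative only)."""
    return sum(1 for x in data if x > 0.5)


def score(pipeline, datapoint):
    """Run the pipeline on one datapoint and return a dict of metrics."""
    predicted = pipeline(datapoint["data"])
    error = predicted - datapoint["reference"]
    return {"error": error, "abs_error": abs(error)}


dataset = [
    {"data": [0.1, 0.9, 0.7], "reference": 2},
    {"data": [0.6, 0.2, 0.3], "reference": 2},
]

# Score every datapoint individually ...
single_results = [score(toy_pipeline, dp) for dp in dataset]

# ... then aggregate each metric across datapoints (mean is an assumption).
aggregated = {
    name: mean(result[name] for result in single_results)
    for name in single_results[0]
}
```

In the real class, the per-datapoint loop and the aggregation are handled for you; only the scoring callable has to be supplied.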

Other Parameters:

pipeline: The pipeline passed to the run method.

Attributes:

results_

Dictionary with all results of the validation. The results are returned by validate. You can control what information is provided via validate_paras.

perf_

A dictionary with the performance results of the action method. This includes:

start_datetime_utc_timestamp: The start time of the action in UTC as a timestamp.
start_datetime: The start time of the action as a string.
end_datetime_utc_timestamp: The end time of the action in UTC as a timestamp.
end_datetime: The end time of the action as a string.
runtime_s: The runtime of the action in seconds.
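
To make the shape of such a timing dictionary concrete, here is a minimal sketch that records the same five keys around an arbitrary callable. The helper timed_action is hypothetical and not part of mobgap, and the ISO 8601 string format is an assumption; the source only states that the datetimes are strings.

```python
import time
from datetime import datetime, timezone


def timed_action(action):
    """Run `action` and return (result, perf) using the keys listed above.

    Hypothetical helper for illustration; not the library's implementation.
    """
    start = datetime.now(timezone.utc)
    start_counter = time.perf_counter()
    result = action()
    runtime_s = time.perf_counter() - start_counter
    end = datetime.now(timezone.utc)
    perf = {
        "start_datetime_utc_timestamp": start.timestamp(),
        "start_datetime": start.isoformat(),  # string format is an assumption
        "end_datetime_utc_timestamp": end.timestamp(),
        "end_datetime": end.isoformat(),
        "runtime_s": runtime_s,
    }
    return result, perf


result, perf = timed_action(lambda: sum(range(1000)))
```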

Methods

`clone`()	Create a new instance of the class with all parameters copied over.
`get_aggregated_results_as_df`()	Get the aggregated results as a pandas DataFrame.
`get_params`([deep])	Get parameters for this algorithm.
`get_raw_results`()	Get the raw results of the cross-validation.
`get_single_results_as_df`([columns])	Get the results as a pandas DataFrame.
`run`(pipeline)	Run the evaluation challenge.
`set_params`(**params)	Set the parameters of this Algorithm.

__init__(dataset: BaseGaitDatasetWithReference, scoring: Callable[[T, BaseGaitDatasetWithReference], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[T, BaseGaitDatasetWithReference], *, validate_paras: dict | None = None) → None

clone() → Self

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and of all nested objects.

get_aggregated_results_as_df() → DataFrame

Get the aggregated results as a pandas DataFrame.

This will return all agg__ columns that the scorer returned (see the results_ attribute) as a pandas DataFrame.

The returned DataFrame has a single row with the index 0, and each column represents one aggregated value. This shape is used to provide output equivalent to the results of the cross-validation.

Returns:

pd.DataFrame: The results as a pandas DataFrame.

get_params(deep: bool = True) → dict[str, Any]

Get parameters for this algorithm.

Parameters:

deep: Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).

Returns:

params: Parameter names mapped to their values.
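
The double-underscore prefix convention for nested parameters can be sketched without the library. ToyParams, Prefilter, and Detector below are hypothetical classes written only for this illustration; they mimic the described behavior and are not how mobgap implements it.

```python
class ToyParams:
    """Hypothetical base class that reports listed attributes as parameters."""

    _param_names = ()

    def get_params(self, deep=True):
        params = {name: getattr(self, name) for name in self._param_names}
        if deep:
            # Add the parameters of nested objects as "<name>__<sub_name>".
            for name, value in list(params.items()):
                if isinstance(value, ToyParams):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        params[f"{name}__{sub_name}"] = sub_value
        return params


class Prefilter(ToyParams):
    _param_names = ("cutoff_hz",)

    def __init__(self, cutoff_hz=3.0):
        self.cutoff_hz = cutoff_hz


class Detector(ToyParams):
    _param_names = ("prefilter", "threshold")

    def __init__(self, prefilter, threshold=0.5):
        self.prefilter = prefilter
        self.threshold = threshold


lowpass = Prefilter(cutoff_hz=5.0)
detector = Detector(prefilter=lowpass)

shallow = detector.get_params(deep=False)  # nested object only, no prefixed keys
deep = detector.get_params(deep=True)      # additionally has "prefilter__cutoff_hz"
```

With deep=True the nested object itself is still present under its own name, and its parameters appear alongside it under the prefixed keys.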

get_raw_results() → dict

Get the raw results of the cross-validation.

Get the direct output of the algorithms. These are usually handed down through the single__raw__ parameters of the scoring output.

The exact structure of the results depends on the scorer and the optimizer used. Usually, outputs are provided as pandas dataframes.

If the individual outputs are dataframes, they are concatenated along the cv_fold axis. Otherwise, they are simply returned as a list, where each element represents the output of one cv-fold.

Returns:

dict: Raw algorithm results from the cross-validation.

get_single_results_as_df(columns: list[str] | None = None) → DataFrame

Get the results as a pandas DataFrame.

This will return the results as a pandas DataFrame with the columns specified in the columns parameter. If no columns are specified, all columns are returned. We exclude single__raw__ columns, as they are by convention reserved for the direct output of the pipeline and usually don’t make sense to view together with the single results.

This will provide one row per data label.

Parameters:

columns: List of columns that should be included in the DataFrame. These need to be specified WITHOUT the “single__” prefix. (e.g. f1_score instead of single__f1_score). If not specified, all columns are included.
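
The naming rules described here (select by name without the “single__” prefix, and leave out anything under single__raw__) can be sketched on a plain dictionary. The function select_single_columns and the example keys are hypothetical, and stripping the prefix from the returned names is an assumption of this sketch, not a statement about the real DataFrame's column labels.

```python
def select_single_columns(results, columns=None):
    """Pick per-datapoint entries from a results dict by un-prefixed name.

    Hypothetical helper for illustration; entries under "single__raw__"
    are always excluded.
    """
    prefix, raw_prefix = "single__", "single__raw__"
    available = {
        key[len(prefix):]: value
        for key, value in results.items()
        if key.startswith(prefix) and not key.startswith(raw_prefix)
    }
    if columns is None:
        return available
    return {name: available[name] for name in columns}


# Illustrative results dict; the key names follow the prefixes used above.
results = {
    "agg__f1_score": 0.85,
    "single__f1_score": [0.8, 0.9],
    "single__precision": [0.7, 1.0],
    "single__raw__detected": ["<raw output 1>", "<raw output 2>"],
}

all_columns = select_single_columns(results)
only_f1 = select_single_columns(results, columns=["f1_score"])
```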

Returns: