EvaluationCV

class mobgap.utils.evaluation.EvaluationCV(
dataset: BaseGaitDatasetWithReference,
scoring: Callable[[T, BaseGaitDatasetWithReference], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[T, BaseGaitDatasetWithReference],
cv_iterator: DatasetSplitter | int | BaseCrossValidator | Iterator | None,
*,
cv_params: dict | None = None,
)

Generic evaluation challenge for all algorithms, using cross-validation for scoring.

This class uses cross_validate to evaluate the performance of a pipeline on a dataset with reference information. This approach is suitable when you want to evaluate and compare algorithms that are “trainable” in any way, either because they are ML algorithms or because they have hyperparameters that can be optimized via grid search.

The cross validation parameters can be modified by the user to adapt them to a given dataset.

Parameters:
dataset

A gait dataset with reference information. Evaluation is performed across all datapoints within the dataset.

scoring

A scoring function that evaluates the performance of the algorithm on a single datapoint. It should take a pipeline and a datapoint as input, run the pipeline on the datapoint and return a dictionary of performance metrics. These performance metrics are then aggregated across all datapoints.
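As a rough illustration of the expected shape, the sketch below uses hypothetical stand-ins (ToyDatapoint, ToyPipeline, toy_scoring); none of them are part of mobgap or tpcp. It runs a pipeline on one datapoint, returns a dictionary of metrics, and then averages each metric across datapoints, assuming a plain mean as the aggregation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ToyDatapoint:
    """Hypothetical stand-in for a single datapoint with reference labels."""

    samples: list
    reference: list


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; it flags samples above a threshold."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold

    def run(self, datapoint: ToyDatapoint) -> "ToyPipeline":
        # Store the result on the instance and return self.
        self.detected_ = [int(s > self.threshold) for s in datapoint.samples]
        return self


def toy_scoring(pipeline: ToyPipeline, datapoint: ToyDatapoint) -> dict:
    """Run the pipeline on one datapoint and return a dict of metrics."""
    detected = pipeline.run(datapoint).detected_
    n_correct = sum(d == r for d, r in zip(detected, datapoint.reference))
    return {"accuracy": n_correct / len(datapoint.reference)}


datapoints = [
    ToyDatapoint(samples=[1, 6, 7, 2], reference=[0, 1, 1, 0]),
    ToyDatapoint(samples=[8, 3, 4, 9], reference=[1, 0, 1, 1]),
]

# One metric dictionary per datapoint ...
per_datapoint = [toy_scoring(ToyPipeline(), dp) for dp in datapoints]
# ... then aggregated across all datapoints (here: a simple mean per metric).
aggregated = {key: mean(r[key] for r in per_datapoint) for key in per_datapoint[0]}
```

A real scoring function would receive an actual pipeline and a dataset subset instead of these toy classes; the metric name "accuracy" is only a placeholder.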

cv_iterator

A valid cv_iterator. For complex CVs (e.g. stratified/grouped) this should be a DatasetSplitter instance. For more information see cross_validate.

cv_params

Dictionary with further parameters that are passed directly to cross_validate. It can override all parameters except optimizable, dataset, scoring, and cv, which are set via the other parameters of this class. A typical use case is setting n_jobs to enable multiprocessing.
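The rule about reserved parameters can be pictured with a small sketch. The helper merge_cv_params and the constant RESERVED_KEYS are hypothetical names used for illustration only, and this is not how mobgap implements the check; only the four reserved parameter names and n_jobs come from the description above.

```python
from typing import Optional

# Parameters that are set through the other constructor arguments.
RESERVED_KEYS = {"optimizable", "dataset", "scoring", "cv"}


def merge_cv_params(fixed: dict, cv_params: Optional[dict]) -> dict:
    """Combine the fixed arguments with user-supplied extras (illustrative only)."""
    extras = cv_params or {}
    clashes = RESERVED_KEYS.intersection(extras)
    if clashes:
        raise ValueError(f"cv_params must not set: {sorted(clashes)}")
    return {**fixed, **extras}


# Placeholder values; in practice these would be real objects.
fixed = {"optimizable": "optimizer", "dataset": "dataset", "scoring": "scoring", "cv": 5}
merged = merge_cv_params(fixed, {"n_jobs": 2})
```

In this sketch, passing {"dataset": ...} inside cv_params raises an error instead of silently replacing the dataset.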

Other Parameters:
optimizer

The tpcp optimizer passed to the run method.

Attributes:
results_

Dictionary with all results of the cross-validation. The results are returned by cross_validate. You can control what information is provided via cv_params.

perf_

A dictionary with the performance results of the action method. This includes:

  • start_datetime_utc_timestamp: The start time of the action in UTC as a timestamp.

  • start_datetime: The start time of the action as a string.

  • end_datetime_utc_timestamp: The end time of the action in UTC as a timestamp.

  • end_datetime: The end time of the action as a string.

  • runtime_s: The runtime of the action in seconds.
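A dictionary with these keys could be produced roughly as follows. This is a minimal sketch with a hypothetical helper named timed; the exact string format of the datetime entries and the way the runtime is measured are assumptions and may differ from what the library does.

```python
from datetime import datetime, timezone


def timed(action):
    """Run ``action`` and collect timing information (illustrative only)."""
    start = datetime.now(timezone.utc)
    result = action()
    end = datetime.now(timezone.utc)
    perf = {
        "start_datetime_utc_timestamp": start.timestamp(),
        "start_datetime": start.isoformat(),  # assumed string format
        "end_datetime_utc_timestamp": end.timestamp(),
        "end_datetime": end.isoformat(),  # assumed string format
        "runtime_s": (end - start).total_seconds(),
    }
    return result, perf


result, perf = timed(lambda: sum(range(1000)))
```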

Methods

clone()

Create a new instance of the class with all parameters copied over.

get_aggregated_results_as_df(*[, group])

Get the aggregated results as a pandas DataFrame.

get_params([deep])

Get parameters for this algorithm.

get_raw_results(*[, group])

Get the raw results of the cross-validation.

get_single_results_as_df([columns, group])

Get the results as a pandas DataFrame.

run(optimizer)

Run the evaluation challenge.

set_params(**params)

Set the parameters of this Algorithm.

__init__(
dataset: BaseGaitDatasetWithReference,
scoring: Callable[[T, BaseGaitDatasetWithReference], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[T, BaseGaitDatasetWithReference],
cv_iterator: DatasetSplitter | int | BaseCrossValidator | Iterator | None,
*,
cv_params: dict | None = None,
) -> None

clone() -> Self

Create a new instance of the class with all parameters copied over.

This will create a new instance of the class itself and of all nested objects.

get_aggregated_results_as_df(
*,
group: Literal['test', 'train'] = 'test',
) -> DataFrame

Get the aggregated results as a pandas DataFrame.

This will return all agg__ columns that the scorer returned (see results_ attribute) as a pandas dataframe.

The returned DataFrame has the cv-folds as rows and the aggregated values as columns. This makes it convenient to calculate typical metrics like mean, std, etc. across the cv-folds.
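The kind of summary meant here can be sketched in plain Python. The values and the metric names precision and recall are made up for illustration; the real column names depend on the scorer. With an actual pandas DataFrame, df.agg(["mean", "std"]) would give the same kind of summary.

```python
from statistics import mean, stdev

# Made-up per-fold values, laid out like the table described above:
# one row per cv-fold, one column per aggregated metric.
fold_rows = [
    {"precision": 0.80, "recall": 0.70},
    {"precision": 0.90, "recall": 0.60},
    {"precision": 0.85, "recall": 0.80},
]

# Summarize each metric across the cv-folds.
summary = {
    metric: {
        "mean": mean(row[metric] for row in fold_rows),
        "std": stdev(row[metric] for row in fold_rows),  # sample std (n - 1)
    }
    for metric in fold_rows[0]
}
```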

Returns:
pd.DataFrame

The results as a pandas DataFrame.

get_params(deep: bool = True) -> dict[str, Any]

Get parameters for this algorithm.

Parameters:
deep

Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).
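The prefix convention can be illustrated with a small stand-in class. ToyParams is hypothetical and only mimics the naming scheme described above; it is not the implementation used by tpcp or mobgap.

```python
class ToyParams:
    """Hypothetical minimal object mimicking the nested-parameter convention."""

    def __init__(self, **params):
        self._params = params

    def get_params(self, deep: bool = True) -> dict:
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                if isinstance(value, ToyParams):
                    # Nested parameters get the "<name>__" prefix.
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


inner = ToyParams(window_s=3)
outer = ToyParams(algo=inner, n_jobs=1)

shallow = outer.get_params(deep=False)  # only the top-level parameters
nested = outer.get_params(deep=True)  # additionally contains "algo__window_s"
```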

Returns:
params

Parameter names mapped to their values.

get_raw_results(
*,
group: Literal['test', 'train'] = 'test',
) -> dict

Get the raw results of the cross-validation.

Get the direct output of the algorithms. These are usually handed down through the agg__raw__ parameters of the scoring output.

The exact structure of the results depends on the scorer and the optimizer used. Usually, outputs are provided as pandas dataframes.

If the individual outputs are dataframes, they are concatenated along the cv_fold axis. Otherwise, they are simply returned as a list, where each element represents the output of one cv-fold.

Returns:
dict

Raw algorithm results from the cross-validation.

get_single_results_as_df(
columns: list[str] | None = None,
*,
group: Literal['train', 'test'] = 'test',
) -> DataFrame

Get the results as a pandas DataFrame.

This will return the results as a pandas DataFrame with the columns specified in the columns parameter. If no columns are specified, all columns are returned.

We exclude single__raw__ columns, as they are by convention reserved for the direct output of the pipeline and usually don’t make sense to view together with the other single results.

This provides one row per data label for all datapoints across all cv-folds. Be aware that these results were potentially generated with different models or hyperparameters (depending on what you are optimizing).

Warning

When using group="train", you will likely get duplicated rows in the results, as the same datapoints were used in multiple cv-folds as training data. You should remove these duplicates depending on your application.
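Why the duplicates appear can be seen in a toy split. The sketch below uses made-up datapoint labels and a simple hand-written 3-fold split, not the actual cv_iterator; keeping the first occurrence is just one possible way to de-duplicate, and since each fold may use a differently trained model, the right choice depends on the application.

```python
from collections import Counter

datapoints = ["p1", "p2", "p3", "p4", "p5", "p6"]
n_folds = 3

train_rows = []
for fold in range(n_folds):
    test = datapoints[fold::n_folds]  # every n-th datapoint forms the test set
    train = [dp for dp in datapoints if dp not in test]
    train_rows.extend({"cv_fold": fold, "datapoint": dp} for dp in train)

# Each datapoint is part of the training data in (n_folds - 1) folds.
counts = Counter(row["datapoint"] for row in train_rows)

# One possible de-duplication: keep only the first occurrence per datapoint.
seen = set()
first_only = []
for row in train_rows:
    if row["datapoint"] not in seen:
        seen.add(row["datapoint"])
        first_only.append(row)
```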

Parameters:
columns

List of columns that should be included in the DataFrame. These need to be specified WITHOUT the “single__” and the “test__”/“train__” prefixes (e.g. f1_score instead of test__single__f1_score). If not specified, all columns are included.

group

Whether to return the results for the test or the train set. Note that the train results might only be available if you passed return_train_scores=True to the cv_params of the EvaluationCV instance.

Returns:
pd.DataFrame

The results as a pandas DataFrame.

run(
optimizer: BaseOptimize[T, BaseGaitDatasetWithReference],
) -> Self

Run the evaluation challenge.

This will call the optimizer for each train set and evaluate the performance on each test set defined by the cv_iterator on the dataset.

Parameters:
optimizer

A valid tpcp optimizer that wraps a pipeline that is compatible with the provided dataset and scorer. Usually that should be an optimizer wrapping a GsdEmulationPipeline. If you want to run without optimization, but still use the same test-folds, use DummyOptimize:

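>>> from tpcp.optimize import DummyOptimize
>>> from mobgap.gait_sequences import GsdIluz
>>> from mobgap.gait_sequences.pipeline import GsdEmulationPipeline
>>> from mobgap.utils.evaluation import EvaluationCV
>>>
>>> # Wrap the pipeline without optimizing anything.
>>> dummy_optimizer = DummyOptimize(
...     pipeline=GsdEmulationPipeline(GsdIluz()),
...     ignore_potential_user_error_warning=True,
... )
>>> # dataset, scoring, and cv_iterator are defined as described above.
>>> challenge = EvaluationCV(
...     dataset,
...     scoring,
...     cv_iterator,
... )
>>> challenge.run(dummy_optimizer)

Returns: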
self

The instance of the class with the results_ attribute set to the results of the cross-validation.

set_params(**params: Any) -> Self

Set the parameters of this Algorithm.

To set parameters of nested objects use nested_object_name__para_name=.

Examples using mobgap.utils.evaluation.EvaluationCV

GSD Evaluation Challenges
