EvaluationCV
- class mobgap.utils.evaluation.EvaluationCV(
- dataset: BaseGaitDatasetWithReference,
- scoring: Callable[[T, BaseGaitDatasetWithReference], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[T, BaseGaitDatasetWithReference],
- cv_iterator: DatasetSplitter | int | BaseCrossValidator | Iterator | None,
- *,
- cv_params: dict | None = None,
Generic Evaluation challenge for all algorithms using a cross-validation for scoring.
This class will use cross_validate to evaluate the performance of a pipeline on a dataset with reference information. This is a suitable approach when you want to evaluate and compare algorithms that are “trainable” in any way. This could be because they are ML algorithms or because they have hyperparameters that can be optimized via grid search.
The cross-validation parameters can be modified by the user to adapt them to a given dataset.
- Parameters:
- dataset
A gait dataset with reference information. Evaluation is performed across all datapoints within the dataset.
- scoring
A scoring function that evaluates the performance of the algorithm on a single datapoint. It should take a pipeline and a datapoint as input, run the pipeline on the datapoint and return a dictionary of performance metrics. These performance metrics are then aggregated across all datapoints.
- cv_iterator
A valid cv_iterator. For complex CVs (e.g. stratified/grouped) this should be a DatasetSplitter instance. For more information see cross_validate.
- cv_params
Dictionary with further parameters that are directly passed to cross_validate. This can overwrite all parameters except optimizable, dataset, scoring, and cv, which are directly set via the other parameters of this method. A typical use case is to set n_jobs to activate multiprocessing.
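The contract of the scoring parameter (pipeline and datapoint in, dictionary of metrics out) can be sketched with plain Python stand-ins. ToyPipeline, ToyDatapoint, and the metric names below are hypothetical and only illustrate the shape of such a function; a real scorer would receive a tpcp pipeline and a single datapoint of the gait dataset.

```python
from dataclasses import dataclass


@dataclass
class ToyDatapoint:
    """Hypothetical stand-in for a single datapoint with reference information."""

    signal: list[float]
    reference_n_events: int


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: counts samples above a threshold."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def run(self, datapoint: ToyDatapoint) -> "ToyPipeline":
        # Results are stored on the instance with a trailing underscore.
        self.n_events_ = sum(1 for v in datapoint.signal if v > self.threshold)
        return self


def score(pipeline: ToyPipeline, datapoint: ToyDatapoint) -> dict[str, float]:
    """Run the pipeline on one datapoint and return per-datapoint metrics."""
    result = pipeline.run(datapoint)
    error = result.n_events_ - datapoint.reference_n_events
    return {"error": float(error), "abs_error": float(abs(error))}


metrics = score(ToyPipeline(threshold=0.5), ToyDatapoint([0.1, 0.9, 0.7], 3))
```

Each key of the returned dictionary becomes one performance metric that is later aggregated across all datapoints.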
- Other Parameters:
- optimizer
The tpcp optimizer passed to the run method.
- Attributes:
- results_
Dictionary with all results of the cross-validation. The results are returned by cross_validate. You can control what information is provided via cv_params.
- perf_
A dictionary with the performance results of the action method. This includes:
start_datetime_utc_timestamp: The start time of the action in UTC as a timestamp.
start_datetime: The start time of the action as a string.
end_datetime_utc_timestamp: The end time of the action in UTC as a timestamp.
end_datetime: The end time of the action as a string.
runtime_s: The runtime of the action in seconds.
Methods
clone(): Create a new instance of the class with all parameters copied over.
get_aggregated_results_as_df(*[, group]): Get the aggregated results as a pandas DataFrame.
get_params([deep]): Get parameters for this algorithm.
get_raw_results(*[, group]): Get the raw results of the cross-validation.
get_single_results_as_df([columns, group]): Get the results as a pandas DataFrame.
run(optimizer): Run the evaluation challenge.
set_params(**params): Set the parameters of this Algorithm.
- __init__(
- dataset: BaseGaitDatasetWithReference,
- scoring: Callable[[T, BaseGaitDatasetWithReference], float | Aggregator[Any] | dict[str, float | Aggregator[Any]]] | Scorer[T, BaseGaitDatasetWithReference],
- cv_iterator: DatasetSplitter | int | BaseCrossValidator | Iterator | None,
- *,
- cv_params: dict | None = None,
- clone() -> Self
Create a new instance of the class with all parameters copied over.
This will create a new instance of the class itself and of all nested objects.
- get_aggregated_results_as_df(
- *,
- group: Literal['test', 'train'] = 'test',
Get the aggregated results as a pandas DataFrame.
This will return all agg__ columns that the scorer returned (see results_ attribute) as a pandas DataFrame.
The returned DataFrame will have the cv-folds as rows and the aggregated values as columns. This makes it convenient to then calculate typical metrics like mean, std, etc. across the cv-folds.
- Returns:
- pd.DataFrame
The results as a pandas DataFrame.
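As a minimal sketch of the per-fold summary mentioned above, the following uses a made-up DataFrame in place of the real output. The column names and values are assumptions for illustration only; the actual columns depend on what the scorer returns.

```python
import pandas as pd

# Made-up stand-in for the output of get_aggregated_results_as_df():
# one row per cv-fold, one column per aggregated metric.
agg_results = pd.DataFrame(
    {"f1_score": [0.80, 0.90, 0.85], "precision": [0.70, 0.80, 0.90]}
)

# Summarize each metric across the folds (std uses pandas' default ddof=1).
summary = agg_results.agg(["mean", "std"])
```

The resulting summary has one row per statistic and one column per metric, which is usually the form needed when comparing algorithms.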
- get_params(deep: bool = True) -> dict[str, Any]
Get parameters for this algorithm.
- Parameters:
- deep
Only relevant if the object contains nested algorithm objects. If this is the case and deep is True, the params of these nested objects are included in the output using a prefix like nested_object_name__ (note the two “_” at the end).
- Returns:
- params
Parameter names mapped to their values.
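The double-underscore prefix convention can be illustrated with a small stand-in class. ToyParams is hypothetical and is not part of mobgap or tpcp; it only mimics how nested parameters show up in the output when deep is True.

```python
from typing import Any


class ToyParams:
    """Hypothetical object that exposes its constructor arguments as parameters."""

    def __init__(self, **params: Any) -> None:
        self._params = params

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                if hasattr(value, "get_params"):
                    # Nested parameters get the "<name>__" prefix.
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


algo = ToyParams(window_length_s=3.0)
pipeline = ToyParams(algo=algo, resample=True)

shallow = pipeline.get_params(deep=False)  # only the top-level parameters
deep = pipeline.get_params(deep=True)  # additionally contains "algo__window_length_s"
```

Keys of the deep form are also the names one would typically pass to set_params to change a nested value.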
- get_raw_results(
- *,
- group: Literal['test', 'train'] = 'test',
Get the raw results of the cross-validation.
Get the direct output of the algorithms. These are usually handed down through the agg__raw__ parameters of the scoring output.
The exact structure of the results depends on the scorer and the optimizer used. Usually, outputs are provided as pandas DataFrames.
If the individual outputs are DataFrames, they are concatenated along the cv_fold axis. Otherwise, they are simply returned as a list, where each element represents the output of one cv-fold.
- Returns:
- dict
Raw algorithm results from the cross-validation.
- get_single_results_as_df( ) -> DataFrame
Get the results as a pandas DataFrame.
This will return the results as a pandas DataFrame with the columns specified in the columns parameter. If no columns are specified, all columns are returned.
We exclude single__raw__ columns, as they are by convention reserved for the direct output of the pipeline and usually don’t make sense to view together with the other single results.
This will provide one row per data label of all datapoints across all cv-folds. Be aware that these results were therefore potentially generated with different models or hyperparameters (depending on what you are optimizing).
Warning
When using group="train", you will likely get duplicated rows in the results, as the same datapoints were used in multiple cv-folds as training data. You should remove these duplicates depending on your application.
- Parameters:
- columns
List of columns that should be included in the DataFrame. These need to be specified WITHOUT the test__/train__ and “single__” prefixes (e.g. f1_score instead of test__single__f1_score). If not specified, all columns are included.
- group
Whether to return the results for the test or the train set. Note that the train results might only be available if you passed return_train_score=True via the cv_params of the EvaluationCV instance.
- Returns:
- pd.DataFrame
The results as a pandas DataFrame.
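The naming rule for the columns parameter can be sketched as a small helper that turns full result keys into the short names expected here. short_column_name is a hypothetical helper, not part of mobgap; the key layout follows the test__single__f1_score example given above.

```python
def short_column_name(full_name: str, group: str = "test") -> str:
    """Strip the '<group>__single__' prefix from a result key (hypothetical helper)."""
    prefix = f"{group}__single__"
    if not full_name.startswith(prefix):
        raise ValueError(f"{full_name!r} does not start with {prefix!r}")
    return full_name[len(prefix):]


full_names = ["test__single__f1_score", "test__single__precision"]
columns = [short_column_name(name) for name in full_names]
```

The resulting list holds the short names in the form the columns parameter expects.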
- run(
- optimizer: BaseOptimize[T, BaseGaitDatasetWithReference],
Run the evaluation challenge.
This will call the optimizer for each train set and evaluate the performance on each test set defined by the cv_iterator on the dataset.
- Parameters:
- optimizer
A valid tpcp optimizer that wraps a pipeline that is compatible with the provided dataset and scorer. Usually that should be an optimizer wrapping a GsdEmulationPipeline. If you want to run without optimization, but still use the same test-folds, use DummyOptimize:
>>> from tpcp.optimize import DummyOptimize
>>> from mobgap.gait_sequences import GsdIluz
>>>
>>> dummy_optimizer = DummyOptimize(
...     pipeline=GsdEmulationPipeline(GsdIluz()),
...     ignore_potential_user_error_warning=True,
... )
>>> challenge = EvaluationCV(
...     dataset,
...     scoring,
...     cv_iterator,
... )
>>> challenge.run(dummy_optimizer)
- Returns:
- self
The instance of the class with the results_ attribute set to the results of the cross-validation.
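What "call the optimizer for each train set and evaluate on each test set" means can be sketched in plain Python. Everything below (the toy data, the threshold "optimizer", the interleaved fold assignment) is an assumption made for illustration and is not how mobgap or tpcp implement the procedure.

```python
# Toy data: (signal value, reference label) pairs. Entirely made up.
data = [(0.1, 0), (0.2, 0), (0.35, 0), (0.6, 1), (0.7, 1), (0.9, 1)]


def accuracy(threshold: float, samples: list[tuple[float, int]]) -> float:
    """Fraction of samples where 'value > threshold' matches the reference."""
    hits = sum((value > threshold) == bool(label) for value, label in samples)
    return hits / len(samples)


def optimize(train: list[tuple[float, int]]) -> float:
    """Stand-in for an optimizer: pick the best threshold on the train set."""
    candidates = [0.25, 0.5, 0.75]
    return max(candidates, key=lambda t: accuracy(t, train))


def run_cv(samples: list[tuple[float, int]], n_folds: int) -> list[float]:
    """Optimize on each train set and score on the matching test set."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample forms the test set; the rest is the train set.
        test = samples[fold::n_folds]
        train = [s for i, s in enumerate(samples) if i % n_folds != fold]
        best_threshold = optimize(train)
        scores.append(accuracy(best_threshold, test))
    return scores


fold_scores = run_cv(data, n_folds=3)
```

Because the threshold is re-selected on every train set, different folds may be scored with different "models", which is why per-datapoint results from different folds are not always directly comparable.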