skpro.benchmarking.evaluate.evaluate
- skpro.benchmarking.evaluate.evaluate(estimator, cv, X, y, scoring=None, return_data=False, error_score=nan, backend=None, compute=True, backend_params=None, C=None, **kwargs)
Evaluate estimator using re-sample folds.
All-in-one statistical performance benchmarking utility for estimators which runs a simple backtest experiment and returns a summary pd.DataFrame.
The experiment run is the following:
Denote by \(X_{train, 1}, X_{test, 1}, \dots, X_{train, K}, X_{test, K}\) the train/test folds produced by the generator cv.split(X).
Denote by \(y_{train, 1}, y_{test, 1}, \dots, y_{train, K}, y_{test, K}\) the train/test folds produced by the generator cv.split(y).
For i = 1 to cv.get_n_folds(X) do:
1. fit the estimator to \(X_{train, i}\), \(y_{train, i}\)
2. y_pred = estimator.predict (or predict_proba or predict_quantiles, depending on scoring) with exogenous data \(X_{test, i}\)
3. Compute scoring on y_pred versus \(y_{test, i}\).
Results returned by this function are:
* results of scoring calculations, from 3, in the i-th loop
* runtimes for fitting and/or predicting, from 1, 2 in the i-th loop
* \(y_{train, i}\), \(y_{test, i}\), y_pred (optional)
A distributed and/or parallel backend can be chosen via the backend parameter.
- Parameters:
- estimator : skpro BaseProbaRegressor descendant (concrete estimator)
skpro estimator to benchmark
- cv : sklearn splitter
determines split of X and y into test and train folds
- X : pandas DataFrame
Feature instances to use in evaluation experiment
- y : pd.DataFrame, must be same length as X
Labels to use in the evaluation experiment
- scoring : subclass of skpro.performance_metrics.BaseMetric or list of same, default=None
Used to get a score function that takes y_pred and y_test arguments and accepts y_train as keyword argument. If None, then uses scoring = MeanAbsolutePercentageError(symmetric=True).
- return_data : bool, default=False
If True, returns three additional columns in the DataFrame. Each cell of these columns contains a pd.Series for y_train, y_pred, y_test.
- error_score : “raise” or numeric, default=np.nan
Value to assign to the score if an exception occurs in estimator fitting. If set to “raise”, the exception is raised. If a numeric value is given, FitFailedWarning is raised.
- backend : {“dask”, “loky”, “multiprocessing”, “threading”}, by default None.
Runs parallel evaluate if specified and strategy is set as “refit”.
* “None”: executes loop sequentially, simple list comprehension
* “loky”, “multiprocessing” and “threading”: uses joblib.Parallel loops
* “joblib”: custom and 3rd party joblib backends, e.g., spark
* “dask”: uses dask, requires dask package in environment
* “dask_lazy”: same as “dask”, but changes the return to (lazy) dask.dataframe.DataFrame.
Recommendation: Use “dask” or “loky” for parallel evaluate. “threading” is unlikely to see speed ups due to the GIL, and the serialization backend (cloudpickle) for “dask” and “loky” is generally more robust than the standard pickle library used in “multiprocessing”.
- compute : bool, default=True
If backend=“dask”, whether returned DataFrame is computed. If set to True, returns pd.DataFrame, otherwise dask.dataframe.DataFrame.
- backend_params : dict, optional
additional parameters passed to the backend as config. Directly passed to utils.parallel.parallelize. Valid keys depend on the value of backend:
* “None”: no additional parameters, backend_params is ignored
* “loky”, “multiprocessing” and “threading”: default joblib backends. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs, with the exception of backend, which is directly controlled by backend. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
* “joblib”: custom and 3rd party joblib backends, e.g., spark. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs; backend must be passed as a key of backend_params in this case. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
* “dask”: any valid keys for dask.compute can be passed, e.g., scheduler
- C : pd.DataFrame, optional (default=None)
censoring information to use in the evaluation experiment,
should have same column name as y, same length as X and y
should have entries 0 and 1 (float or int), 0 = uncensored, 1 = (right) censored
if None, all observations are assumed to be uncensored. Can be passed to any probabilistic regressor, but is ignored if capability:survival tag is False.
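The pairing of backend and backend_params described above can be pictured as a few keyword-argument sets. This is only a sketch: the dictionary names, the n_jobs value of 2 and the "threads" scheduler are illustrative choices, and the final call is shown as a comment because it assumes the objects from the Examples section below.

```python
# Illustrative keyword-argument sets for the backend / backend_params pair.
# All concrete values here are example choices, not recommendations.

# Sequential execution: backend_params is ignored.
sequential = {"backend": None, "backend_params": None}

# Built-in joblib backend: any joblib.Parallel key except "backend" may be
# passed, because the backend name is taken from the `backend` argument.
loky = {"backend": "loky", "backend_params": {"n_jobs": 2}}

# Custom or third-party joblib backend: here the joblib backend name must be
# given inside backend_params.
custom_joblib = {
    "backend": "joblib",
    "backend_params": {"backend": "spark", "n_jobs": 2},
}

# dask: keys are forwarded to dask.compute, e.g. the scheduler.
dask_threads = {"backend": "dask", "backend_params": {"scheduler": "threads"}}

# Any of these could then be unpacked into the call, for instance:
# results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps, **loky)
```

Keeping the two arguments together in one dictionary makes it harder to pass a joblib backend name in the wrong place.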
- Returns:
- results : pd.DataFrame or dask.dataframe.DataFrame
DataFrame that contains several columns with information regarding each refit/update and prediction of the estimator. Row index is splitter index of train/test fold in cv. Entries in the i-th row are for the i-th train/test split in cv. Columns are as follows:
* test_{scoring.name}: (float) Model performance score. If scoring is a list, then there is a column with name test_{scoring.name} for each scorer.
* fit_time: (float) Time in sec for fit on train fold.
* pred_time: (float) Time in sec to predict from fitted estimator.
* pred_[method]_time: (float) Time in sec to run predict_[method] from fitted estimator.
* len_y_train: (int) length of y_train.
* y_train: (pd.Series) only present if return_data=True; train fold of the i-th split in cv, used to fit the estimator.
* y_pred: (pd.Series) only present if return_data=True; predictions from fitted estimator for the i-th test fold indices of cv.
* y_test: (pd.Series) only present if return_data=True; testing fold of the i-th split in cv, used to compute the metric.
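To make the link between the per-fold experiment and the returned columns concrete, the following is a minimal pure-Python sketch of such a loop. It is not skpro's implementation: k_fold_indices, MeanPredictor, mean_absolute_error and evaluate_sketch are hypothetical names introduced only for illustration, the toy estimator makes point predictions rather than probabilistic ones, and each row is a plain dict instead of a DataFrame row.

```python
import time
from statistics import mean


def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) for contiguous, unshuffled folds."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [j for j in range(n_samples) if j < start or j >= stop]
        yield train_idx, test_idx
        start = stop


class MeanPredictor:
    """Hypothetical toy estimator: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def evaluate_sketch(estimator, X, y, n_splits, return_data=False):
    """Fit, predict and score on every fold; collect one result row per fold."""
    rows = []
    for train_idx, test_idx in k_fold_indices(len(X), n_splits):
        X_train = [X[j] for j in train_idx]
        y_train = [y[j] for j in train_idx]
        X_test = [X[j] for j in test_idx]
        y_test = [y[j] for j in test_idx]

        t0 = time.perf_counter()
        estimator.fit(X_train, y_train)              # step 1: fit on train fold
        fit_time = time.perf_counter() - t0

        t0 = time.perf_counter()
        y_pred = estimator.predict(X_test)           # step 2: predict on test fold
        pred_time = time.perf_counter() - t0

        row = {
            "test_MeanAbsoluteError": mean_absolute_error(y_test, y_pred),  # step 3
            "fit_time": fit_time,
            "pred_time": pred_time,
            "len_y_train": len(y_train),
        }
        if return_data:
            # Optional data columns, mirroring return_data=True.
            row.update(y_train=y_train, y_pred=y_pred, y_test=y_test)
        rows.append(row)
    return rows


X = [[float(v)] for v in range(6)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rows = evaluate_sketch(MeanPredictor(), X, y, n_splits=3)
```

With three folds over six samples, rows holds three dicts, one per train/test split, in the order the splits are generated.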
Examples
>>> import pandas as pd
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.model_selection import KFold
>>>
>>> from skpro.benchmarking.evaluate import evaluate
>>> from skpro.metrics import CRPS
>>> from skpro.regression.residual import ResidualDouble
>>>
>>> X, y = load_diabetes(return_X_y=True, as_frame=True)
>>> y = pd.DataFrame(y)  # skpro assumes y is pd.DataFrame
>>>
>>> estimator = ResidualDouble(LinearRegression())
>>> cv = KFold(n_splits=3)
>>> crps = CRPS()
>>>
>>> results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps)
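The error_score behaviour described under Parameters can be pictured with a small standalone sketch. It assumes nothing about skpro internals: score_fold and failing_fold are hypothetical helpers, and FitFailedWarningSketch is a local stand-in for the FitFailedWarning named in the parameter description.

```python
import math
import warnings


class FitFailedWarningSketch(UserWarning):
    """Hypothetical stand-in for the FitFailedWarning mentioned above."""


def score_fold(fit_and_score, error_score=math.nan):
    """Run one fold; on failure either re-raise or fall back to error_score."""
    try:
        return fit_and_score()
    except Exception as exc:
        if error_score == "raise":
            raise  # propagate the original exception unchanged
        warnings.warn(
            f"Fit failed, score set to {error_score}: {exc!r}",
            FitFailedWarningSketch,
        )
        return error_score


def failing_fold():
    """Simulate an estimator whose fit fails on this fold."""
    raise ValueError("fit failed on this fold")


# Numeric error_score: the failure becomes a warning plus a fallback score.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback = score_fold(failing_fold, error_score=math.nan)

# error_score="raise": the original exception reaches the caller.
try:
    score_fold(failing_fold, error_score="raise")
    raised = False
except ValueError:
    raised = True
```

A numeric fallback keeps a long benchmark running when a single fold fails, while "raise" is usually the more convenient setting when debugging an estimator.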