skpro.benchmarking.evaluate.evaluate#

skpro.benchmarking.evaluate.evaluate(estimator, cv, X, y, scoring=None, return_data=False, error_score=nan, backend=None, compute=True, backend_params=None, C=None, **kwargs)[source]#

Evaluate estimator using re-sample folds.

All-in-one statistical performance benchmarking utility for estimators which runs a simple backtest experiment and returns a summary pd.DataFrame.

The experiment run is the following:

Denote by \(X_{train, 1}, X_{test, 1}, \dots, X_{train, K}, X_{test, K}\) the train/test folds produced by the generator cv.split(X). Denote by \(y_{train, 1}, y_{test, 1}, \dots, y_{train, K}, y_{test, K}\) the train/test folds produced by the generator cv.split(y).

  1. For i = 1 to cv.get_n_folds(X) do:

  2. fit the estimator to \(X_{train, i}\), \(y_{train, i}\)

  3. y_pred = estimator.predict (or predict_proba or predict_quantiles, depending on scoring), with exogenous data \(X_{test, i}\)

  4. Compute scoring on y_pred versus \(y_{test, i}\).

Results returned by this function are:

  • results of scoring calculations, from step 4, in the i-th loop

  • runtimes for fitting and/or predicting, from steps 2 and 3, in the i-th loop

  • \(y_{train, i}\), \(y_{test, i}\), y_pred (optional)
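In outline, the loop is equivalent to the following sketch. This is illustrative only: the actual implementation additionally records runtimes, applies error_score on failures, and dispatches to predict, predict_proba, or predict_quantiles depending on the metric. The sketch assumes estimator, cv, X, y, scoring as in the Parameters section, a metric with an evaluate(y_true, y_pred) method, and the standard skbase clone method:

>>> scores = []
>>> for i, (trn, tst) in enumerate(cv.split(X)):
...     est = estimator.clone()                             # fresh copy per fold
...     est.fit(X.iloc[trn], y.iloc[trn])                   # step 2: fit on train fold i
...     y_pred = est.predict_proba(X.iloc[tst])             # step 3: here, for a distribution metric
...     scores.append(scoring.evaluate(y.iloc[tst], y_pred))  # step 4: score versus test fold i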

A distributed and/or parallel back-end can be chosen via the backend parameter.

Parameters:
estimator : skpro BaseProbaRegressor descendant (concrete estimator)

skpro estimator to benchmark

cv : sklearn splitter

determines split of X and y into test and train folds

X : pandas DataFrame

Feature instances to use in evaluation experiment

y : pd.DataFrame, must be same length as X

Labels used in the evaluation experiment

scoring : subclass of skpro.performance_metrics.BaseMetric or list of same, default=None

Used to get a score function that takes y_pred and y_test arguments and accepts y_train as a keyword argument. If None, uses scoring = MeanAbsolutePercentageError(symmetric=True).
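For instance, several metrics can be evaluated in one run by passing a list. A sketch, using the objects constructed in the Examples section below and assuming LogLoss is available in skpro.metrics alongside CRPS:

>>> from skpro.metrics import CRPS, LogLoss
>>> results = evaluate(
...     estimator=estimator, X=X, y=y, cv=cv, scoring=[CRPS(), LogLoss()]
... )

The returned DataFrame then contains one test_{scoring.name} column per metric, as described in the Returns section.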

return_data : bool, default=False

If True, returns three additional columns in the DataFrame: y_train, y_pred, and y_test. Each cell of these columns contains a pd.Series.
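A sketch of retrieving the stored series after such a run, with objects as in the Examples section below:

>>> results = evaluate(
...     estimator=estimator, X=X, y=y, cv=cv, scoring=crps, return_data=True
... )
>>> y_pred_fold0 = results["y_pred"].iloc[0]  # predictions on the first test fold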

error_score : “raise” or numeric, default=np.nan

Value to assign to the score if an exception occurs in estimator fitting. If set to “raise”, the exception is raised. If a numeric value is given, FitFailedWarning is raised.

backend : {“dask”, “loky”, “multiprocessing”, “threading”}, default=None

Runs parallel evaluate if specified.

  • “None”: executes loop sequentially, simple list comprehension

  • “loky”, “multiprocessing” and “threading”: uses joblib.Parallel loops

  • “joblib”: custom and 3rd party joblib backends, e.g., spark

  • “dask”: uses dask, requires dask package in environment

  • “dask_lazy”: same as “dask”, but changes the return to (lazy) dask.dataframe.DataFrame.

Recommendation: use “dask” or “loky” for parallel evaluate. “threading” is unlikely to see speed-ups due to the GIL, and the serialization backend (cloudpickle) used by “dask” and “loky” is generally more robust than the standard pickle library used by “multiprocessing”.
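A minimal sketch of a parallel run following this recommendation, with objects as in the Examples section below (n_jobs is a standard joblib.Parallel key, see backend_params):

>>> results = evaluate(
...     estimator=estimator, X=X, y=y, cv=cv, scoring=crps,
...     backend="loky", backend_params={"n_jobs": -1},
... )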

compute : bool, default=True

If backend=”dask”, whether returned DataFrame is computed. If set to True, returns pd.DataFrame, otherwise dask.dataframe.DataFrame.

backend_params : dict, optional

Additional parameters passed to the backend as config. Directly passed to utils.parallel.parallelize. Valid keys depend on the value of backend:

  • “None”: no additional parameters, backend_params is ignored

  • “loky”, “multiprocessing” and “threading”: default joblib backends. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs, with the exception of backend, which is directly controlled by backend. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.

  • “joblib”: custom and 3rd party joblib backends, e.g., spark. Any valid keys for joblib.Parallel can be passed here, e.g., n_jobs; backend must be passed as a key of backend_params in this case. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.

  • “dask”: any valid keys for dask.compute can be passed, e.g., scheduler
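For example, a dask run could be configured as in the sketch below, with objects as in the Examples section. This assumes dask is installed; scheduler is a standard dask.compute key:

>>> results = evaluate(
...     estimator=estimator, X=X, y=y, cv=cv, scoring=crps,
...     backend="dask", backend_params={"scheduler": "processes"},
... )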

C : pd.DataFrame, optional (default=None)

Censoring information to use in the evaluation experiment. Should have the same column name as y and the same length as X and y; entries should be 0 or 1 (float or int), where 0 = uncensored and 1 = (right) censored. If None, all observations are assumed to be uncensored. Can be passed to any probabilistic regressor, but is ignored if the capability:survival tag is False.
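As a sketch, a censoring frame matching these requirements can be built as below, with objects as in the Examples section; here all observations are marked uncensored, which is equivalent to passing C=None:

>>> import numpy as np
>>> import pandas as pd
>>> # same column name as y, same length, entries 0 (uncensored) or 1 (censored)
>>> C = pd.DataFrame(np.zeros(len(y)), index=y.index, columns=y.columns)
>>> results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps, C=C)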

Returns:
results : pd.DataFrame or dask.dataframe.DataFrame

DataFrame that contains several columns with information regarding each refit/update and prediction of the estimator. Row index is the splitter index of the train/test fold in cv; entries in the i-th row are for the i-th train/test split in cv. Columns are as follows:

  • test_{scoring.name}: (float) Model performance score. If scoring is a list, then there is a column with name test_{scoring.name} for each scorer.

  • fit_time: (float) Time in sec for fit on train fold.

  • pred_time: (float) Time in sec to predict from fitted estimator.

  • pred_[method]_time: (float) Time in sec to run predict_[method] from fitted estimator.

  • len_y_train: (int) length of y_train.

  • y_train: (pd.Series) present only if return_data=True; train fold of the i-th split in cv, used to fit the estimator.

  • y_pred: (pd.Series) present only if return_data=True; predictions from the fitted estimator for the i-th test fold indices of cv.

  • y_test: (pd.Series) present only if return_data=True; test fold of the i-th split in cv, used to compute the metric.

Examples

>>> import pandas as pd
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.model_selection import KFold
>>> from skpro.benchmarking.evaluate import evaluate
>>> from skpro.metrics import CRPS
>>> from skpro.regression.residual import ResidualDouble
>>> X, y = load_diabetes(return_X_y=True, as_frame=True)
>>> y = pd.DataFrame(y)  # skpro assumes y is pd.DataFrame
>>> estimator = ResidualDouble(LinearRegression())
>>> cv = KFold(n_splits=3)
>>> crps = CRPS()
>>> results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps)
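The returned frame can then be inspected like any pd.DataFrame. Per the Returns section, the score column is named test_{scoring.name}, which for the CRPS metric above should be test_CRPS:

>>> mean_crps = results["test_CRPS"].mean()  # average score across the 3 folds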