skpro.benchmarking.evaluate.evaluate#
- skpro.benchmarking.evaluate.evaluate(estimator, cv, X, y, scoring=None, return_data=False, error_score=nan, backend=None, compute=True, backend_params=None, C=None, **kwargs)[source]#
Evaluate estimator using re-sample folds.
All-in-one statistical performance benchmarking utility for estimators which runs a simple backtest experiment and returns a summary pd.DataFrame.
The experiment run is the following:

1. Denote by \(X_{train, 1}, X_{test, 1}, \dots, X_{train, K}, X_{test, K}\) the train/test folds produced by the generator cv.split(X).
2. Denote by \(y_{train, 1}, y_{test, 1}, \dots, y_{train, K}, y_{test, K}\) the train/test folds produced by the generator cv.split(y).
3. For i = 1 to cv.get_n_folds(X) do:
   1. fit the estimator to \(X_{train, i}\), \(y_{train, i}\)
   2. y_pred = estimator.predict (or predict_proba or predict_quantiles, depending on scoring) with exogenous data \(X_{test, i}\)
   3. compute scoring on y_pred versus \(y_{test, i}\)
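The backtest loop described above can be sketched manually; this is a rough illustration using only sklearn components, where a point predictor and mean absolute error stand in for an skpro probabilistic estimator and metric:

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True, as_frame=True)
cv = KFold(n_splits=3)

scores = []
for train_idx, test_idx in cv.split(X):
    est = LinearRegression()
    # step 3.1: fit the estimator to the i-th train fold
    est.fit(X.iloc[train_idx], y.iloc[train_idx])
    # step 3.2: predict on the i-th test fold
    y_pred = est.predict(X.iloc[test_idx])
    # step 3.3: compute the score on y_pred versus the i-th test fold
    scores.append(mean_absolute_error(y.iloc[test_idx], y_pred))
```

`evaluate` automates this loop, adds timing columns, and optionally parallelizes over folds.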
Results returned by this function are:

- results of the scoring calculations (loop step 3) for each fold i
- runtimes for fitting and predicting (loop steps 1 and 2) for each fold i
- \(y_{train, i}\), \(y_{test, i}\), y_pred (optional, if return_data=True)

A distributed and/or parallel backend can be chosen via the backend parameter.
- Parameters:
- estimator : skpro BaseProbaRegressor descendant (concrete estimator)
  skpro estimator to benchmark
- cv : sklearn splitter
  determines split of X and y into test and train folds
- X : pandas DataFrame
  Feature instances to use in the evaluation experiment
- y : pd.DataFrame, must be same length as X
  Labels used in the evaluation experiment
- scoring : subclass of skpro.performance_metrics.BaseMetric or list of same, default=None
  Used to get a score function that takes y_pred and y_test arguments and accepts y_train as a keyword argument. If None, uses scoring = MeanAbsolutePercentageError(symmetric=True).
- return_data : bool, default=False
  If True, returns three additional columns in the DataFrame, whose cells each contain a pd.Series for y_train, y_pred, and y_test respectively.
- error_score : "raise" or numeric, default=np.nan
  Value to assign to the score if an exception occurs in estimator fitting. If set to "raise", the exception is raised. If a numeric value is given, a FitFailedWarning is raised.
- backend : {"dask", "loky", "multiprocessing", "threading"}, default=None
  Runs parallel evaluate if specified.
  - "None": executes loop sequentially, via a simple list comprehension
  - "loky", "multiprocessing" and "threading": uses joblib.Parallel loops
  - "joblib": custom and 3rd-party joblib backends, e.g., spark
  - "dask": uses dask, requires the dask package in the environment
  - "dask_lazy": same as "dask", but changes the return to a (lazy) dask.dataframe.DataFrame

  Recommendation: use "dask" or "loky" for parallel evaluate. "threading" is unlikely to see speed-ups due to the GIL, and the serialization backend (cloudpickle) used by "dask" and "loky" is generally more robust than the standard pickle library used in "multiprocessing".
- compute : bool, default=True
  If backend="dask", whether the returned DataFrame is computed. If set to True, returns pd.DataFrame, otherwise dask.dataframe.DataFrame.
- backend_params : dict, optional
  additional parameters passed to the backend as config. Directly passed to utils.parallel.parallelize. Valid keys depend on the value of backend:
  - "None": no additional parameters, backend_params is ignored
  - "loky", "multiprocessing" and "threading": default joblib backends; any valid keys for joblib.Parallel can be passed here, e.g., n_jobs, with the exception of backend, which is directly controlled by the backend parameter. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
  - "joblib": custom and 3rd-party joblib backends, e.g., spark; any valid keys for joblib.Parallel can be passed here, e.g., n_jobs. backend must be passed as a key of backend_params in this case. If n_jobs is not passed, it will default to -1; other parameters will default to joblib defaults.
  - "dask": any valid keys for dask.compute can be passed, e.g., scheduler
- C : pd.DataFrame, optional (default=None)
  censoring information to use in the evaluation experiment. Should have the same column name as y, and the same length as X and y; entries should be 0 or 1 (float or int), where 0 = uncensored and 1 = (right) censored. If None, all observations are assumed to be uncensored. Can be passed to any probabilistic regressor, but is ignored if the capability:survival tag is False.
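As an illustration of the expected shape of C, here is a minimal sketch with pure pandas; the survival target column name "time" and the values are hypothetical:

```python
import pandas as pd

# hypothetical survival times; column name "time" is illustrative only
y = pd.DataFrame({"time": [5.0, 3.2, 8.1, 1.4]})

# censoring frame C: same column name as y, same length and index,
# 0 = uncensored, 1 = (right) censored — here the second observation is censored
C = pd.DataFrame({"time": [0, 1, 0, 0]}, index=y.index)
```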
- Returns:
- results : pd.DataFrame or dask.dataframe.DataFrame
  DataFrame that contains several columns with information regarding each refit/update and prediction of the estimator. Row index is the splitter index of the train/test fold in cv. Entries in the i-th row are for the i-th train/test split in cv. Columns are as follows:
  - test_{scoring.name}: (float) model performance score. If scoring is a list, then there is a column named test_{scoring.name} for each scorer.
  - fit_time: (float) time in sec for fit on the train fold.
  - pred_time: (float) time in sec to predict from the fitted estimator.
  - pred_[method]_time: (float) time in sec to run predict_[method] from the fitted estimator.
  - len_y_train: (int) length of y_train.
  - y_train: (pd.Series) only present if return_data=True; train fold of the i-th split in cv, used to fit the estimator.
  - y_pred: (pd.Series) only present if return_data=True; predictions from the fitted estimator for the i-th test fold indices of cv.
  - y_test: (pd.Series) only present if return_data=True; test fold of the i-th split in cv, used to compute the metric.
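Schematically, a results frame for a metric named CRPS and a 3-fold splitter might look like the mock below; all numbers are made up purely to illustrate the column layout:

```python
import pandas as pd

# mock of the result layout for scoring named "CRPS" over 3 folds;
# values are fabricated for illustration, not real benchmark output
results = pd.DataFrame(
    {
        "test_CRPS": [41.2, 39.8, 43.5],
        "fit_time": [0.012, 0.010, 0.011],
        "pred_time": [0.005, 0.004, 0.005],
        "len_y_train": [294, 295, 295],
    }
)
```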
Examples
>>> import pandas as pd
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.model_selection import KFold
>>> from skpro.benchmarking.evaluate import evaluate
>>> from skpro.metrics import CRPS
>>> from skpro.regression.residual import ResidualDouble
>>> X, y = load_diabetes(return_X_y=True, as_frame=True)
>>> y = pd.DataFrame(y)  # skpro assumes y is pd.DataFrame
>>> estimator = ResidualDouble(LinearRegression())
>>> cv = KFold(n_splits=3)
>>> crps = CRPS()
>>> results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps)
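The joblib-based backends described under backend and backend_params dispatch one task per fold via joblib.Parallel. A rough sketch of the analogous dispatch for backend="loky" with backend_params={"n_jobs": 2}, where the fold-scoring function is a placeholder rather than skpro's internal implementation:

```python
from joblib import Parallel, delayed

def _score_fold(i):
    # placeholder for fit/predict/score on fold i
    return i * 10

# analogous to evaluate(..., backend="loky", backend_params={"n_jobs": 2}):
# one delayed task per fold, executed by 2 loky worker processes
fold_results = Parallel(backend="loky", n_jobs=2)(
    delayed(_score_fold)(i) for i in range(3)
)
print(fold_results)  # [0, 10, 20]
```

Any other valid joblib.Parallel keyword (e.g., batch_size) can likewise be supplied through backend_params.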