Changelog#
All notable changes to this project beginning with version 0.1.0 will be documented in this file. The format is based on Keep a Changelog and we adhere to Semantic Versioning. The source code for all releases is available on GitHub.
You can also subscribe to skpro’s PyPI release.
For planned changes and upcoming releases, see the roadmap in the issue tracker.
[2.3.0] - 2024-05-16#
new tutorial notebooks for survival prediction and probability distributions (#303, #305) @fkiraly
interface to ngboost probabilistic regressor and survival predictor (#215, #301, #309, #332) @ShreeshaM07
interface to Poisson regressor from sklearn (#213) @nilesh05apr
probability distributions rearchitecture, including scalar valued distributions, e.g., Normal(mu=0, sigma=1) - see “core interface changes”
probability distributions: illustrative and didactic plotting functionality, e.g., my_normal.plot("pdf") (#275) @fkiraly
more distributions: beta, chi-squared, delta, exponential, uniform - @an20805, @malikrafsan, @ShreeshaM07, @sukjingitsit
Core interface changes#
Probability distributions have been rearchitected with API improvements:
all changes are fully downwards compatible with the previous API.
distributions can now be scalar valued, e.g., Normal(mu=0, sigma=1). More generally, all distributions behave as scalar distributions if index and columns are not passed and all parameters passed are scalar or scalar-like. In this case, methods such as pdf, cdf or sample will return scalar (float) values instead of pd.DataFrame.
ndim and shape - distributions now possess an ndim property, which evaluates to 0 for scalar distributions, and 2 otherwise. The shape property evaluates to the empty tuple for scalar distributions, and to a 2-tuple with the shape for array-like distributions. This is in line with numpy conventions.
plot - distributions now have a plot method, which can be used to plot any method of the distribution. The method is called as my_distr.plot("pdf") or my_distribution.plot("cdf"), or similar. If the distribution is scalar, this will create a single matplotlib plot in an ax object. DataFrame-like distributions will create a plot for each marginal component, returning fig with an array of ax objects, of same shape as the distribution object.
head, tail - distributions now possess head and tail methods, which return the first and last n rows of the distribution, respectively. This is useful for inspecting the distribution object in a Jupyter notebook, in particular when combined with plot.
at, iat - distributions now possess at and iat subsetters, which can be used to subset a DataFrame-like distribution to a scalar distribution at a given location index or integer index, respectively.
pdf, pmf - all distributions now possess a pdf and pmf method, for probability density function and probability mass function. These are available for all distributions, continuous, discrete, and mixed. pdf returns the density of the continuous part of the distribution, pmf the mass of the discrete part. Continuous distributions will return 0 for pmf, discrete distributions will return 0 for pdf. Logarithmic versions of these methods are available as log_pdf and log_pmf; these may be more numerically stable.
surv, haz - distributions now possess shorthand methods to return survival function evaluates, surv, and hazard function evaluates, haz. These are available for all distributions. In case of mixed distributions, hazard is computed with the continuous part of the distribution.
distr:paramtype tag - distributions are now annotated with a new public tag: distr:paramtype indicates whether the distribution is "parametric", "non-parametric", or "composite". Parametric distributions have only numpy array-like or categorical parameters. Non-parametric distributions may have further types of parameters such as data-like, but no distributions. Composite distributions have other distributions as parameters.
to_df, get_params_df - parametric distributions now provide methods to_df, get_params_df, which return distribution parameters coerced to DataFrame, or dict of DataFrame, keyed by parameter names, respectively.
the extension contract for distributions has been changed to a boilerplate layered design. Extenders will now implement private methods such as _pdf, _cdf, instead of overriding the public interface. This allows for more flexibility in boilerplate design, and ensures more consistent behavior across distributions. The new extension contract is documented in the new skpro extension template, extension_templates/distribution_template.py.
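The layered extension contract and the surv and haz shorthands can be illustrated with a minimal, self-contained sketch. The classes below are hypothetical stand-ins written for this note, not skpro code: the base class owns the public methods and derives surv and haz from cdf and pdf, while the extender implements only _pdf and _cdf. The actual contract is the one documented in the extension template.

```python
import math


class SketchBaseDistribution:
    """Hypothetical stand-in for a distribution base class (not skpro's code)."""

    def pdf(self, x):
        # public layer: shared input handling lives here, in one place
        return self._pdf(float(x))

    def cdf(self, x):
        return self._cdf(float(x))

    def surv(self, x):
        # survival function, derived from the cdf as a default
        return 1.0 - self.cdf(x)

    def haz(self, x):
        # hazard function, derived from pdf and survival function as a default
        return self.pdf(x) / self.surv(x)

    def _pdf(self, x):
        raise NotImplementedError

    def _cdf(self, x):
        raise NotImplementedError


class SketchExponential(SketchBaseDistribution):
    """Extender code: only the private methods are implemented."""

    def __init__(self, rate):
        self.rate = rate

    def _pdf(self, x):
        return self.rate * math.exp(-self.rate * x) if x >= 0 else 0.0

    def _cdf(self, x):
        return 1.0 - math.exp(-self.rate * x) if x >= 0 else 0.0


d = SketchExponential(rate=2.0)
hazard_at_one = d.haz(1.0)  # constant hazard, equal to the rate up to rounding
```

Because the public methods are defined once in the base class, every subclass gets surv and haz without extra work, which is the consistency benefit the entry above describes.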
Deprecations and removals#
At version 2.4.0, the bound parameter will be removed from the CyclicBoosting probabilistic supervised regression estimator, and will be replaced by use of lower or upper. To retain previous behaviour, users should replace bound="U" with upper=None and lower=None; bound="L" with upper=None and lower set to the value of the lower bound; and bound="B" with both upper and lower set to the respective values. To silence the warnings and prevent exceptions occurring from 2.4.0, users should not explicitly set bound, and ensure values for any subsequent parameters are set as keyword arguments, not positional arguments.
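The replacement rules above can be written down as a small lookup. The helper below is hypothetical (its name and its lower_value and upper_value arguments are made up for illustration and are not part of skpro); it only encodes the three cases listed in the deprecation note and returns the keyword arguments to use in place of the legacy setting.

```python
def bound_to_lower_upper(bound, lower_value=None, upper_value=None):
    """Hypothetical helper: translate a legacy bound setting to lower/upper."""
    if bound == "U":
        # unbounded: neither limit is set
        return {"lower": None, "upper": None}
    if bound == "L":
        # lower bounded only
        return {"lower": lower_value, "upper": None}
    if bound == "B":
        # bounded on both sides
        return {"lower": lower_value, "upper": upper_value}
    raise ValueError(f"unknown bound mode: {bound!r}")


# the resulting dict would be passed on as keyword arguments, never positionally
kwargs_unbounded = bound_to_lower_upper("U")
kwargs_lower = bound_to_lower_upper("L", lower_value=0.0)
kwargs_both = bound_to_lower_upper("B", lower_value=0.0, upper_value=10.0)
```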
Enhancements#
Probability distributions#
[ENH] probability distributions - boilerplate refactor (#265) @fkiraly
[ENH] probability distributions: convenience feature to coerce index and columns to pd.Index (#276) @fkiraly
[ENH] distribution quantile method for scalar distributions (#277) @fkiraly
[ENH] systematic suite tests for scalar probability distributions (#278) @fkiraly
[ENH] scalar test cases for probability distributions (#279) @fkiraly
[ENH] activate tests for distribution base class defaults (#266) @fkiraly
[ENH] probability distributions: illustrative and didactic plotting functionality (#275) @fkiraly
[ENH] Chi-Squared Distribution (#217) @sukjingitsit
[ENH] Adapter for Scipy Distributions (#287) @malikrafsan
[ENH] simplify coercion in BaseDistribution._log_pdf and _pdf default (#293) @fkiraly
[ENH] Beta Distribution (#298) @malikrafsan
[ENH] distributions: survival and hazard function and defaults (#294) @fkiraly
[ENH] improved Empirical distribution - scalar mode, new API compatibility (#307) @fkiraly
[ENH] increase distribution default plot resolution (#308) @fkiraly
[ENH] distribution get_params in data frame format (#285) @fkiraly
[ENH] head and tail for distribution objects (#310) @fkiraly
[ENH] full support of hierarchical MultiIndex index in Empirical distribution, tests (#314) @fkiraly
[ENH] at and iat subsetters for distributions (#274) @fkiraly
[ENH] Exponential distribution (#325) @ShreeshaM07
[ENH] Mixture distribution upgrade - refactor to new extension interface, support scalar case (#315) @fkiraly
[ENH] native implementation of Johnson QPD family, explicit pdf (#327) @fkiraly
[ENH] improved defaults for BaseDistribution _mean, _var, and _energy_x (#330) @fkiraly
Probabilistic regression#
[ENH] interface to ngboost (#215) @ShreeshaM07
[ENH] interfacing Poisson regressor from sklearn (#213) @nilesh05apr
[ENH] refactor NGBoostRegressor to inherit NGBoostAdapter (#309) @ShreeshaM07
[ENH] Exponential dist in NGBoostRegressor, NGBoostSurvival (#332) @ShreeshaM07
Survival and time-to-event prediction#
Test framework#
Fixes#
Probability distributions#
[BUG] bugfixes for distribution base class default methods (#281) @fkiraly
[BUG] fix Empirical index to be pd.MultiIndex for hierarchical data index (#286) @fkiraly
[BUG] update Johnson QPDistributions with bugfixes and vectorization (cyclic-boosting ver.1.4.0) (#232) @setoguchi-naoki
[BUG] BaseDistribution._var: fix missing factor 2 in Monte Carlo variance default method (#331) @fkiraly
Survival and time-to-event prediction#
Maintenance#
[MNT] [Dependabot](deps): Update sphinx-gallery requirement from <0.16.0 to <0.17.0 (#288) @dependabot[bot]
[MNT] move GHA runners consistently to ubuntu-latest, windows-latest, macos-13 (#272) @fkiraly
[MNT] set macos runner for release workflow to macos-13 (#273) @fkiraly
[MNT] moving ensemble regressors to regression.ensemble (#302) @fkiraly
[MNT] deprecation handling for CyclicBoosting (#329) @fkiraly, @setoguchi-naoki
[MNT] fix repository variables in changelog generator (#333) @fkiraly
Documentation#
[DOC] add missing contributors to all-contributorsrc - @an20805, @duydl, @sukjingitsit (#284) @fkiraly
[DOC] tutorial notebook for probability distributions (#303) @fkiraly
[DOC] tutorial notebook for survival prediction (#305) @fkiraly
[DOC] visualizations for first intro vignette in intro notebook and minor updates (#311) @fkiraly
Contributors#
@an20805, @fkiraly, @malikrafsan, @nilesh05apr, @setoguchi-naoki, @ShreeshaM07, @sukjingitsit
[2.2.2] - 2024-04-20#
lifelines predictive survival regressors are available as skpro estimators: accelerated failure time (Fisk, Log-normal, Weibull), CoxPH variants, Aalen additive model (#247, #258, #260) @fkiraly
scikit-survival predictive survival regressors are available as skpro estimators: CoxPH variants, CoxNet, survival tree and forest, survival gradient boosting (#237) @fkiraly
GLM regressor using statsmodels GLM, with Gaussian link (#222) @julian-fong
various survival type distributions added: log-normal, logistic, Fisk (=log-logistic), Weibull (#218, #241, #242, #259) @bhavikar, @malikrafsan, @fkiraly
Core interface changes#
Probability distributions#
Probability distributions (BaseDistribution) now have a len method, which returns the number of rows of the distribution; this is the same as the len of a pd.DataFrame returned by sample.
the interface now supports discrete distributions and those with integer support. Such distributions implement pmf and log_pmf methods.
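As an illustration of why a discrete distribution offers both pmf and log_pmf, here is a minimal sketch for a Poisson distribution using only the standard library. The function names are hypothetical and this is not how skpro implements these methods; it only shows that the log form stays finite where the plain mass underflows.

```python
import math


def poisson_log_pmf(k, mu):
    """Log of the Poisson mass at integer k >= 0, for mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def poisson_pmf(k, mu):
    """Poisson mass, obtained by exponentiating the log mass."""
    return math.exp(poisson_log_pmf(k, mu))


mass_at_zero = poisson_pmf(0, 2.0)        # equals exp(-2)
far_tail_mass = poisson_pmf(2000, 1.0)    # too small for a float
far_tail_log = poisson_log_pmf(2000, 1.0) # still a finite number
```

In the far tail the mass itself is no longer representable as a float, while the log mass remains usable, for example when summing log-likelihood contributions.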
Enhancements#
Probability distributions#
[ENH] make Empirical distribution compatible with multi-index rows (#233) @fkiraly
[ENH] empirical quantile parameterized distribution (#236) @fkiraly
[ENH] add len of BaseDistribution, test shape, len, indices (#239) @fkiraly
[ENH] Logistic distribution (#241) @malikrafsan
[ENH] Weibull distribution (#242) @malikrafsan
[ENH] Johnson QP-distributions - add some missing capability tags (#253) @fkiraly
[ENH] remove stray _get_bc_params from LogNormal (#256) @fkiraly
[ENH] Fisk distribution aka log-logistic distribution (#259) @fkiraly
Probabilistic regression#
[ENH] GLMRegressor using statsmodels GLM with Gaussian link (#222) @julian-fong
[ENH] added test parameters for probabilistic metrics (#234) @fkiraly
Survival and time-to-event prediction#
Test framework#
Fixes#
Probability distributions#
Documentation#
Maintenance#
Contributors#
[2.2.1] - 2024-03-03#
Minor bugfix and maintenance release.
[2.2.0] - 2024-02-08#
interface to cyclic_boosting package (#144) @setoguchi-naoki, @FelixWick
framework support for probabilistic survival/time-to-event prediction with right censored data (#157) @fkiraly
basic set of time-to-event prediction estimators and survival prediction metrics (#161, #198) @fkiraly
Johnson Quantile-Parameterized Distributions (QPD) with bounded and unbounded mode (#144) @setoguchi-naoki, @FelixWick
abstract parallelization backend, for benchmarking and tuning (#160) @fkiraly, @hazrulakmal
Dependency changes#
pandas bounds have been updated to >=1.1.0,<2.3.0.
Core interface changes#
estimators and objects now record author and maintainer information in the new tags "authors" and "maintainers". This is required only for estimators in skpro proper and compatible third party packages. It is also used to generate mini-package headers used in lookup functionality of the skpro webpage.
the model_selection and benchmarking utilities now support abstract parallelization backends via the backend and backend_params arguments. This has been standardized to use the same backend options and syntax as the abstract parallelization backend in sktime.
Probabilistic regression#
all probabilistic regressors now accept an argument C in fit, to pass censoring information. This is for API compatibility with survival and is ignored when passed to non-survival regressors, corresponding to the naive reduction strategy of “ignoring censoring information”.
existing pipelines, tuners and ensemble methods have been extended to support survival prediction - if C is passed, it is passed to the underlying components.
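The “ignoring censoring information” behaviour can be sketched with a toy regressor. The class below is hypothetical and not part of skpro; it only illustrates the described convention that a non-survival regressor accepts C in fit for API compatibility and then does not use it.

```python
class SketchMeanRegressor:
    """Hypothetical non-survival regressor; predicts the training mean."""

    def fit(self, X, y, C=None):
        # C is accepted for API compatibility with survival prediction,
        # but deliberately unused: censoring information is ignored
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 9.0]
C = [0, 1, 0]  # 1 marks a censored observation in this toy example

plain = SketchMeanRegressor().fit(X, y)
with_censoring = SketchMeanRegressor().fit(X, y, C=C)
```

Both fits give the same model, which is exactly what the naive reduction strategy means for a regressor without genuine survival support.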
Survival and time-to-event prediction#
support for probabilistic survival or time-to-event prediction estimators with right censored data has been introduced. The interface and base class are identical to the tabular probabilistic regression interface, with the addition of a C argument to the fit methods. Regressors that genuinely support survival prediction have the capability: survival tag set to True in their metadata.
an extension template for survival prediction has been added to the skpro extension templates, in extension_templates
the interface for probabilistic performance metrics has been extended to also accept censoring information, which can be passed via the optional C_true argument, to all performance metrics. Metrics genuinely supporting survival prediction have the capability: survival tag set to True. Other metrics still take the C_true argument, but ignore it. This corresponds to the naive reduction strategy of “ignoring censoring information”.
for pipelining and tuning, the existing compositors in model_selection and regression.compose can be used, see above.
for benchmarking, the existing benchmarking framework in benchmarking can be used; it has been extended to support survival prediction and censoring information.
Enhancements#
BaseObject and base framework#
Probability distributions#
[ENH] Johnson Quantile-Parameterized Distributions (QPD) with bounded and unbounded mode (#144) @setoguchi-naoki, @FelixWick
Probabilistic regression#
[ENH] Cyclic boosting interface (#144) @setoguchi-naoki, @FelixWick
[ENH] abstract parallelization backend, refactor of evaluate and tuners, extend evaluate and tuners to survival predictors (#160) @fkiraly, @hazrulakmal
Survival and time-to-event prediction#
[ENH] support for survival/time-to-event prediction, statsmodels Cox PH model (#157) @fkiraly
[ENH] survival prediction compositor - reducers to tabular probabilistic regression (#161) @fkiraly
[ENH] survival prediction metrics - framework support and tests, SPLL, Harrell C (#198) @fkiraly
Fixes#
Probabilistic regression#
Test framework#
Documentation#
Maintenance#
[MNT] [Dependabot](deps): Bump styfle/cancel-workflow-action from 0.12.0 to 0.12.1 (#183) @dependabot
[MNT] skip CyclicBoosting and QPD tests until #189 failures are resolved (#193) @fkiraly
[MNT] [Dependabot](deps-dev): Update pandas requirement from <2.2.0,>=1.1.0 to >=1.1.0,<2.3.0 (#182) @dependabot
[MNT] [Dependabot](deps): Bump codecov/codecov-action from 3 to 4 (#201) @dependabot
[MNT] [Dependabot](deps): Bump pre-commit/action from 3.0.0 to 3.0.1 (#202) @dependabot
Contributors#
[2.1.3] - 2024-01-22#
sklearn compatibility update:
compatibility with sklearn 1.4.X
addition of feature_names_in_ and n_features_in_ default attributes to BaseProbaRegressor, written to self in fit
Dependency changes#
sklearn bounds have been updated to >=0.24.0,<1.5.0.
Core interface changes#
probabilistic regressors will now always save attributes feature_names_in_ and n_features_in_ to self in fit. feature_names_in_ is a 1D np.ndarray of feature names seen in fit, n_features_in_ is an int, and equal to len(feature_names_in_).
this ensures compatibility with sklearn, where these attributes are expected.
the new attributes can also be queried via the existing get_fitted_params interface.
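A minimal sketch of the attribute convention described above, using a hypothetical toy class rather than skpro's base class. To keep the example free of dependencies, a dict of columns stands in for a pd.DataFrame and a plain list stands in for the 1D np.ndarray of feature names; both substitutions are assumptions made for illustration only.

```python
class SketchProbaRegressor:
    """Hypothetical regressor that records feature metadata in fit."""

    def fit(self, X, y):
        # X: dict of column name -> values, standing in for a pd.DataFrame
        # a list stands in for the 1D np.ndarray of feature names
        self.feature_names_in_ = list(X.keys())
        self.n_features_in_ = len(self.feature_names_in_)
        return self


X = {"age": [31, 45, 52], "dose": [0.5, 1.0, 1.5]}
y = [1.2, 2.3, 3.1]
reg = SketchProbaRegressor().fit(X, y)
```

After fit, reg.n_features_in_ always equals len(reg.feature_names_in_), which is the invariant stated in the entry above.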
Enhancements#
[ENH] in BaseRegressorProba.fit, use "feature_names" metadata field to store feature names and write to self in fit (#180) @dependabot
Maintenance#
[MNT] [Dependabot](deps): Bump actions/dependency-review-action from 3 to 4 (#178) @dependabot
[MNT] [Dependabot](deps-dev): Update polars requirement from <0.20.0 to <0.21.0 (#176) @dependabot
[MNT] [Dependabot](deps-dev): Update sphinx-issues requirement from <4.0.0 to <5.0.0 (#179) @dependabot
[MNT] [Dependabot](deps-dev): Update scikit-learn requirement from <1.4.0,>=0.24.0 to >=0.24.0,<1.5.0 (#177) @dependabot
[2.1.2] - 2024-01-07#
sklearn based probabilistic regressors - Gaussian processes, Bayesian linear regression (#166) @fkiraly
SklearnProbaReg - general interface adapter to sklearn regressors with variance prediction model (#163) @fkiraly
Dependency changes#
scikit-base bounds have been updated to <0.8.0,>=0.6.1.
polars (data container soft dependency) bounds have been updated to allow python 3.12.
Enhancements#
Data types, checks, conversions#
Probability distributions#
Probabilistic regression#
[ENH] sklearn wrappers to str-coerce columns of pd.DataFrame before passing (#148) @fkiraly
[ENH] clean up copy-paste leftovers in BaseProbaRegressor (#156) @fkiraly
[ENH] adapter for sklearn probabilistic regressors (#163) @fkiraly
[ENH] interfacing all concrete sklearn probabilistic regressors (#166) @fkiraly
Test framework#
[ENH] scenario tests for mixed pandas column index types (#145) @fkiraly
[ENH] scitype inference utility, test class register, test class test condition (#159) @fkiraly
Fixes#
Probabilistic regression#
[BUG] in probabilistic regressors, ensure correct index treatment if X: pd.DataFrame and y: np.ndarray are passed (#146) @fkiraly
Documentation#
Maintenance#
[MNT] [Dependabot](deps): Bump actions/upload-artifact from 3 to 4 (#154) @dependabot
[MNT] [Dependabot](deps): Bump actions/download-artifact from 3 to 4 (#153) @dependabot
[MNT] [Dependabot](deps): Bump actions/setup-python from 4 to 5 (#152) @dependabot
[MNT] [Dependabot](deps-dev): Update sphinx-gallery requirement from <0.15.0 to <0.16.0 (#149) @dependabot
[MNT] [Dependabot](deps-dev): Update scikit-base requirement from <0.7.0,>=0.6.1 to >=0.6.1,<0.8.0 (#169) @dependabot
[MNT] adding codecov.yml and turning coverage reports informational (#165) @fkiraly
[MNT] handle deprecation of pandas.DataFrame.applymap (#170) @fkiraly
[2.1.1] - 2023-11-02#
probabilistic regressor: multiple quantile regression (#108) @Ram0nB
probabilistic regressor: interface to MapieRegressor from mapie package (#136) @fkiraly
Data types, checks, conversions#
Probabilistic regression#
Test framework#
[ENH] integrate check_estimator with TestAllEstimators and TestAllRegressors for python command line estimator testing (#138) @fkiraly
Documentation#
Maintenance#
Fixes#
Contributors#
[2.1.0] - 2023-10-09#
Python 3.12 compatibility release.
[MNT] [Dependabot](deps-dev): Update numpy requirement from <1.25,>=1.21.0 to >=1.21.0,<1.27 (#118) @dependabot
[MNT] Python 3.12 support - for skpro release 2.1.0 (#109) @fkiraly
[2.0.1] - 2023-10-08#
Release with minor maintenance actions and enhancements.
[MNT] address deprecation of skbase.testing.utils.deep_equals (#111) @fkiraly
[MNT] activate dependabot for version updates and maintenance (#110) @fkiraly
[MNT] [Dependabot](deps): Bump styfle/cancel-workflow-action from 0.9.1 to 0.12.0 (#113) @dependabot
[MNT] [Dependabot](deps): Bump actions/dependency-review-action from 1 to 3 (#114) @dependabot
[MNT] [Dependabot](deps): Bump actions/checkout from 3 to 4 (#115) @dependabot
[MNT] [Dependabot](deps): Bump actions/download-artifact from 2 to 3 (#116) @dependabot
[MNT] [Dependabot](deps): Bump actions/upload-artifact from 2 to 3 (#117) @dependabot
[2.0.0] - 2023-09-13#
Re-release of skpro, newly rearchitected using skbase!
Try out skpro v2 on Binder!
Contributions, bug reports, and feature requests are welcome on the issue tracker or on the community Discord.
[1.0.1] - 2019-02-18#
First stable release of skpro, last release before hiatus.
[1.0.0b] - 2017-12-08#
First public release (beta) of skpro.