Releases: alteryx/evalml
v0.12.0.dev1
Publishing a new package to TestPyPi that has unit tests included.
v0.12.0 Aug. 3, 2020
Enhancements
- Added string and categorical targets support for binary and multiclass pipelines and check for numeric targets for DetectLabelLeakage data check #932
- Added clear exception for regression pipelines if target datatype is string or categorical #960
- Added target column names and class labels in predict and predict_proba output for pipelines #951
- Added _compute_shap_values and normalize_values to pipelines/explanations module #958
- Added explain_prediction feature which explains single predictions with SHAP #974
- Added Imputer to allow different imputation strategies for numerical and categorical dtypes #991
- Added support for configuring logfile path using env var, and don't create logger if there are filesystem errors #975
- Updated catboost estimators' default parameters and automl hyperparameter ranges to speed up fit time #998
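The logfile entry above (#975) describes two behaviours: the log path can come from an environment variable, and file logging is skipped when the filesystem refuses. A minimal sketch of that pattern using only the standard library follows. The variable name EXAMPLE_LOG_FILE, the default file name, and the function get_logger are illustrative assumptions, not EvalML's actual names.

```python
import logging
import os
import tempfile


def get_logger(name, env_var="EXAMPLE_LOG_FILE", default_path="example_debug.log"):
    """Return a logger that also writes to the path named by env_var, if it can."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    if logger.handlers:  # already configured; avoid duplicate handlers
        return logger

    # Console logging is always available.
    logger.addHandler(logging.StreamHandler())

    log_path = os.environ.get(env_var, default_path)
    try:
        file_handler = logging.FileHandler(log_path)
    except OSError as err:
        # Read-only directory, missing parent folder, and similar failures:
        # carry on with console logging only.
        logger.warning("Could not open log file %s: %s", log_path, err)
    else:
        logger.addHandler(file_handler)
    return logger


# Usage: a writable path gets a file handler, an unwritable one does not.
good_path = os.path.join(tempfile.mkdtemp(), "run.log")
os.environ["EXAMPLE_LOG_FILE"] = good_path
good = get_logger("example.good")

# good_path is now a regular file, so a path "inside" it cannot be opened.
os.environ["EXAMPLE_LOG_FILE"] = os.path.join(good_path, "nested", "run.log")
bad = get_logger("example.bad")
```

Catching OSError covers the usual filesystem failures (permissions, missing directories) without hiding unrelated bugs.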
Fixes
- Fixed ReadtheDocs warning failure regarding embedded gif #943
- Removed incorrect parameter passed to pipeline classes in _add_baseline_pipelines #941
- Added universal error for calling predict, predict_proba, transform, and feature_importances before fitting #969, #994
- Made TextFeaturizer component and pip dependencies featuretools and nlp_primitives optional #976
- Updated imputation strategy in automl to no longer limit impute strategy to most_frequent for all features if there are any categorical columns #991
- Fixed UnboundLocalError for cv_pipeline when automl search errors #996
- Fixed Imputer to reset dataframe index to preserve behavior expected from SimpleImputer #1009
Changes
- Moved get_estimators to evalml.pipelines.components.utils #934
- Modified Pipelines to raise PipelineScoreError when they encounter an error during scoring #936
- Moved evalml.model_families.list_model_families to evalml.pipelines.components.allowed_model_families #959
- Renamed DateTimeFeaturization to DateTimeFeaturizer #977
Documentation Changes
- Update README.md #963
- Reworded message when errors are returned from data checks in search #982
- Added section on understanding model predictions with explain_prediction to User Guide #981
- Added a section to the user guide and api reference about how XGBoost and CatBoost are not fully supported. #992
- Added custom components section in user guide #993
- Update FAQ section formatting #997
- Update release process documentation #1003
Testing Changes
- Moved predict_proba and predict tests regarding string / categorical targets to test_pipelines.py #972
- Fix dependency update bot by updating python version to 3.7 to avoid frequent github version updates #1002
Breaking Changes
- get_estimators has been moved to evalml.pipelines.components.utils (previously was under evalml.pipelines.utils) #934
- Removed the raise_errors flag in AutoML search. All errors during pipeline evaluation will be caught and logged. #936
- evalml.model_families.list_model_families has been moved to evalml.pipelines.components.allowed_model_families #959
- TextFeaturizer: the featuretools and nlp_primitives packages must be installed after installing evalml in order to use this component #976
- Renamed DateTimeFeaturization to DateTimeFeaturizer #977
v0.11.2 July 16, 2020
Enhancements
- Added NoVarianceDataCheck to DefaultDataChecks #893
- Added text processing and featurization component TextFeaturizer #913, #924
- Added additional checks to InvalidTargetDataCheck to handle invalid target data types #929
Fixes
- Makes automl results a read-only property #919
Changes
- Deleted static pipelines and refactored tests involving static pipelines, removed all_pipelines() and get_pipelines() #904
- Moved list_model_families to evalml.model_family.utils #903
- Updated all_pipelines, all_estimators, all_components to use the same mechanism for dynamically generating their elements #898
- Rename master branch to main #918
- Add pypi release github action #923
- Updated AutoMLSearch.search stdout output and logging and removed tqdm progress bar #921
- Moved automl config checks previously in search() to init #933
Documentation Changes
Testing Changes
- Cleaned up fixture names and usages in tests #895
Breaking Changes
- list_model_families has been moved to evalml.model_family.utils (previously was under evalml.pipelines.utils) #903
- Static pipeline definitions have been removed, but similar pipelines can still be constructed via creating an instance of PipelineBase #904
- all_pipelines() and get_pipelines() utility methods have been removed #904
v0.11.dev1 July 10, 2020
A development release to check pypi github action deployment to test.pypi.org.
v0.11.0 June 30, 2020
Enhancements
- Added multiclass support for ROC curve graphing #832
- Added preprocessing component to drop features whose percentage of NaN values exceeds a specified threshold #834
- Added data check to check for problematic target labels #814
- Added PerColumnImputer that allows imputation strategies per column #824
- Added transformer to drop specific columns #827
- Added support for categories, handle_error, and drop parameters in OneHotEncoder #830 #897
- Added preprocessing component to handle DateTime columns featurization #838
- Added ability to clone pipelines and components #842
- Define getter method for component parameters #847
- Added utility methods to calculate and graph permutation importances #860, #880
- Added new utility functions necessary for generating dynamic preprocessing pipelines #852
- Added kwargs to all components #863
- Updated AutoSearchBase to use dynamically generated preprocessing pipelines #870
- Added SelectColumns transformer #873
- Added ability to evaluate additional pipelines for automl search #874
- Added default_parameters class property to components and pipelines #879
- Added better support for disabling data checks in automl search #892
- Added ability to save and load AutoML objects to file #888
- Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance #876
- Saved learned binary classification thresholds in automl results cv data dict #876
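The NaN-threshold component from #834 can be pictured with the plain-Python sketch below. The function name, the pct_null_threshold argument, and the dict-of-lists data layout are assumptions made for illustration. The component itself works on tabular data and its exact parameters are not given in these notes.

```python
import math


def drop_null_columns(columns, pct_null_threshold):
    """Keep only columns whose fraction of missing values does not exceed the threshold.

    columns maps a column name to a list of values; None and float NaN count as missing.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept = {}
    for name, values in columns.items():
        fraction_missing = sum(is_missing(v) for v in values) / len(values)
        if fraction_missing <= pct_null_threshold:
            kept[name] = values
    return kept


data = {
    "age": [31.0, float("nan"), 45.0, 52.0],      # 25% missing
    "notes": [None, None, None, "call back"],     # 75% missing
}
kept = drop_null_columns(data, pct_null_threshold=0.5)
```

Here only "age" survives, because "notes" is missing in more than half of its rows.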
Fixes
- Fixed bug where SimpleImputer cannot handle dropped columns #846
- Fixed bug where PerColumnImputer cannot handle dropped columns #855
- Enforce requirement that builtin components save all inputted values in their parameters dict #847
- Don't list base classes in all_components output #847
- Standardize all components to output pandas data structures, and accept either pandas or numpy #853
- Fixed rankings and full_rankings error when search has not been run #894
Changes
- Update all_pipelines and all_components to try initializing pipelines/components, and on failure exclude them #849
- Refactor handle_components to handle_components_class, standardize to ComponentBase subclass instead of instance #850
- Refactor "blacklist"/"whitelist" to "allow"/"exclude" lists #854
- Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
- Renamed feature_importances and permutation_importances methods to use singular names (feature_importance and permutation_importance) #883
- Updated automl default data splitter to train/validation split for large datasets #877
- Added open source license, update some repo metadata #887
Documentation Changes
- Fix some typos and update the EvalML logo #872
Testing Changes
- Update the changelog check job to expect the new branching pattern for the deps update bot #836
- Check that all components output pandas data structures, and can accept either pandas or numpy #853
- Replaced AutoClassificationSearch and AutoRegressionSearch with AutoMLSearch #871
Breaking Changes
- Pipelines' static component_graph field must contain either ComponentBase subclasses or str, instead of ComponentBase subclass instances #850
- Rename handle_component to handle_component_class. Now standardizes to ComponentBase subclasses instead of ComponentBase subclass instances #850
- Renamed automl's cv argument to data_split #877
- Pipelines' and classifiers' feature_importances is renamed feature_importance, graph_feature_importances is renamed graph_feature_importance #883
- Passing data_checks=None to automl search will not perform any data checks as opposed to default checks. #892
- Pipelines to search for in AutoML are now determined automatically, rather than using the statically-defined pipeline classes. #870
- Updated AutoSearchBase.get_pipelines to return an untrained pipeline instance, instead of one which happened to be trained on the final cross-validation fold #876
v0.10.0 May 29, 2020
Enhancements
- Added baseline models for classification and regression, add functionality to calculate baseline models before searching in AutoML #746
- Port over highly-null guardrail as a data check and define DefaultDataChecks and DisableDataChecks classes #745
- Update Tuner classes to work directly with pipeline parameters dicts instead of flat parameter lists #779
- Add Elastic Net as a pipeline option #812
- Added new Pipeline option ExtraTrees #790
- Added precision-recall curve metrics and plot for binary classification problems in evalml.pipeline.graph_utils #794
Fixes
- Update pipeline score to return nan score for any objective which throws an exception during scoring #787
- Fixed bug introduced in #787 where binary classification metrics requiring predicted probabilities error in scoring #798
- CatBoost and XGBoost classifiers and regressors can no longer have a learning rate of 0 #795
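The scoring fix in #787 means that one failing objective no longer aborts the rest. A rough sketch of that behaviour follows, assuming objectives are plain callables keyed by name (the real objective classes are richer than this, and score_all is a hypothetical name).

```python
def score_all(objectives, y_true, y_pred):
    """Score every objective; an objective that raises gets NaN instead."""
    scores = {}
    for name, objective in objectives.items():
        try:
            scores[name] = objective(y_true, y_pred)
        except Exception:
            scores[name] = float("nan")
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def broken(y_true, y_pred):
    """Stand-in for an objective that cannot handle its input."""
    raise ValueError("this objective cannot handle the input")


scores = score_all(
    {"accuracy": accuracy, "broken": broken},
    y_true=[1, 0, 1, 1],
    y_pred=[1, 0, 0, 1],
)
```

The healthy objective still reports its score while the failing one shows up as NaN. Note that v0.12.0 (#936) later changed pipelines to raise PipelineScoreError in this situation.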
Changes
- Cleanup pipeline score code, and cleanup codecov #711
- Remove pass for abstract methods for codecov #730
- Added __str__ for AutoSearch object #675
- Add util methods to graph ROC and confusion matrix #720
- Refactor AutoBase to AutoSearchBase #758
- Updated AutoBase with data_checks parameter, removed previous detect_label_leakage parameter, and added functionality to run data checks before search in AutoML #765
- Updated our logger to use Python's logging utils #763
- Refactor most of AutoSearchBase._do_iteration impl into AutoSearchBase._evaluate #762
- Port over all guardrails to use the new DataCheck API #789
- Expanded import_or_raise to catch all exceptions #759
- Adds RMSE, MSLE, RMSLE as standard metrics #788
- Don't allow Recall to be used as an objective for AutoML #784
- Removed feature selection from pipelines #819
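For reference, the three metrics added in #788 have standard textbook definitions, sketched below in plain Python. The function names are illustrative, and the use of log(1 + x) in MSLE follows the common convention (as in scikit-learn) rather than anything stated in these notes.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))


def msle(y_true, y_pred):
    """Mean squared log error; log1p(x) is log(1 + x), so inputs must be > -1."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def rmsle(y_true, y_pred):
    """Root mean squared log error: the square root of MSLE."""
    return math.sqrt(msle(y_true, y_pred))


y_true = [3.0, 5.0, 2.0]
y_pred = [3.0, 3.0, 2.0]
example_rmse = rmse(y_true, y_pred)    # sqrt(4 / 3)
example_msle = msle(y_true, y_pred)    # log(6 / 4) ** 2 / 3
```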
Documentation Changes
- Add instructions to freeze master on release.md #726
- Update release instructions with more details #727 #733
- Add objective base classes to API reference #736
- Fix components API to match other modules #747
Testing Changes
- Delete codecov yml, use codecov.io's default #732
- Added unit tests for fraud cost, lead scoring, and standard metric objectives #741
- Update codecov client #782
- Updated AutoBase __str__ test to include no parameters case #783
- Added unit tests for ExtraTrees pipeline #790
- If codecov fails to upload, fail build #810
- Updated Python version of dependency action #816
- Update the dependency update bot to use a suffix when creating branches #817
Breaking Changes
- The detect_label_leakage parameter for AutoML classes has been removed and replaced by a data_checks parameter #765
- Moved ROC and confusion matrix methods from evalml.pipeline.plot_utils to evalml.pipeline.graph_utils #720
- Tuner classes require a pipeline hyperparameter range dict as an init arg instead of a space definition #779
- Tuner.propose and Tuner.add work directly with pipeline parameters dicts instead of flat parameter lists #779
- PipelineBase.hyperparameters and custom_hyperparameters use pipeline parameters dict format instead of being represented as a flat list #779
- All guardrail functions previously under evalml.guardrails.utils will be removed and replaced by data checks #789
- Recall disallowed as an objective for AutoML #784
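The Tuner changes in #779 replace flat parameter lists with dictionaries keyed by component and then by parameter name. The sketch below illustrates the difference between the two layouts. The component and parameter names are made-up examples, and both helper functions are hypothetical rather than EvalML code.

```python
def to_pipeline_parameters(flat_values, layout):
    """Rebuild {component: {parameter: value}} from a flat list.

    layout is an ordered list of (component, parameter) pairs, one per value.
    """
    if len(flat_values) != len(layout):
        raise ValueError("flat_values and layout must have the same length")
    parameters = {}
    for (component, parameter), value in zip(layout, flat_values):
        parameters.setdefault(component, {})[parameter] = value
    return parameters


def to_flat_list(parameters, layout):
    """Inverse operation: read values out of the nested dict in layout order."""
    return [parameters[component][parameter] for component, parameter in layout]


layout = [
    ("Imputer", "strategy"),
    ("Estimator", "n_estimators"),
    ("Estimator", "max_depth"),
]
flat = ["mean", 100, 6]
nested = to_pipeline_parameters(flat, layout)
```

The nested form names what each value means, whereas the flat form only works if every caller agrees on the ordering.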
v0.9.0 Apr. 27, 2020
Enhancements
- Added accuracy as a standard objective #624
- Added verbose parameter to load_fraud #560
- Added Balanced Accuracy metric for binary, multiclass #612, #661
- Added XGBoost regressor and XGBoost regression pipeline #666
- Added Accuracy metric for multiclass #672
- Added objective name in AutoBase.describe_pipeline #686
Fixes
- Removed direct access to cls.component_graph #595
- Add testing files to .gitignore #625
- Remove circular dependencies from Makefile #637
- Add error case for normalize_confusion_matrix() #640
- Fixed XGBoostClassifier and XGBoostRegressor bug with feature names that contain [, ], or < #659
- Update make_pipeline_graph to not accidentally create empty file when testing if path is valid #649
- Fix pip installation warning about docutils version, from boto dependency #664
- Removed zero division warning for F1/precision/recall metrics #671
- Fixed summary for pipelines without estimators #707
Changes
- Updated default objective for binary/multiseries classification to log loss #613
- Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes #405
- Changed the output of score to return one dictionary #429
- Created binary and multiclass objective subclasses #504
- Updated objectives API #445
- Removed call to get_plot_data from AutoML #615
- Set raise_error to default to True for AutoML classes #638
- Remove unnecessary "u" prefixes on some unicode strings #641
- Changed one-hot encoder to return uint8 dtypes instead of ints #653
- Pipeline _name field changed to custom_name #650
- Removed graphs.py and moved methods into PipelineBase #657, #665
- Remove s3fs as a dev dependency #664
- Changed requirements-parser to be a core dependency #673
- Replace supported_problem_types field on pipelines with problem_type attribute on base classes #678
- Changed AutoML to only show best results for a given pipeline template in rankings, added full_rankings property to show all #682
- Update ModelFamily values: don't list xgboost/catboost as classifiers now that we have regression pipelines for them #677
- Changed AutoML's describe_pipeline to get problem type from pipeline instead #685
- Standardize import_or_raise error messages #683
- Updated argument order of objectives to align with sklearn's #698
- Renamed pipeline.feature_importance_graph to pipeline.graph_feature_importances #700
- Moved ROC and confusion matrix methods to evalml.pipelines.plot_utils #704
- Renamed MultiClassificationObjective to MulticlassClassificationObjective, to align with pipeline naming scheme #715
Documentation Changes
- Fixed some sphinx warnings #593
- Fixed docstring for AutoClassificationSearch with correct command #599
- Limit readthedocs formats to pdf, not htmlzip and epub #594, #600
- Clean up objectives API documentation #605
- Fixed function on Exploring search results page #604
- Update release process doc #567
- AutoClassificationSearch and AutoRegressionSearch show inherited methods in API reference #651
- Fixed improperly formatted code in breaking changes for changelog #655
- Added configuration to treat Sphinx warnings as errors #660
- Removed separate plotting section for pipelines in API reference #657, #665
- Have leads example notebook load S3 files using https, so we can delete s3fs dev dependency #664
- Categorized components in API reference and added descriptions for each category #663
- Fixed Sphinx warnings about BalancedAccuracy objective #669
- Updated API reference to include missing components and clean up pipeline docstrings #689
- Reorganize API ref, and clarify pipeline sub-titles #688
- Add and update preprocessing utils in API reference #687
- Added inheritance diagrams to API reference #695
- Documented which default objective AutoML optimizes for #699
- Create separate install page #701
- Include more utils in API ref, like import_or_raise #704
- Add more color to pipeline documentation #705
Testing Changes
- Matched install commands of check_latest_dependencies test and its GitHub action #578
- Added Github app to auto assign PR author as assignee #477
- Removed unneeded conda installation of xgboost in windows checkin tests #618
- Update graph tests to always use tmpfile dir #649
- Changelog checkin test workaround for release PRs: If 'future release' section is empty of PR refs, pass check #658
Breaking Changes
- Pipelines will now no longer take an objective parameter during instantiation, and will no longer have an objective attribute.
- fit() and predict() now use an optional objective parameter, which is only used in binary classification pipelines to fit for a specific objective.
- score() will now use a required objectives parameter that is used to determine all the objectives to score on. This differs from the previous behavior, where the pipeline's objective was scored on regardless.
- score() will now return one dictionary of all objective scores.
- ROC and ConfusionMatrix plot methods via Auto(*).plot have been removed by #615 and are replaced by roc_curve and confusion_matrix in evalml.pipelines.plot_utils in #704
- normalize_confusion_matrix has been moved to evalml.pipelines.plot_utils #704
- Pipelines _name field changed to custom_name
- Pipelines supported_problem_types field is removed because it is no longer necessary #678
- Updated argument order of objectives' objective_function to align with sklearn #698
- pipeline.feature_importance_graph has been renamed to pipeline.graph_feature_importances in #700
- Removed unsupported MSLE objective #704
v0.8.0 Apr. 1, 2020
Enhancements
- Add normalization option and information to confusion matrix #484
- Add util function to drop rows with NaN values #487
- Renamed PipelineBase.name as PipelineBase.summary and redefined PipelineBase.name as class property #491
- Added access to parameters in Pipelines with PipelineBase.parameters (used to be return of PipelineBase.describe) #501
- Added fill_value parameter for SimpleImputer #509
- Added functionality to override component hyperparameters and made pipelines take hyperparameters from components #516
- Allow numpy.random.RandomState for random_state parameters #556
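The confusion-matrix normalization option from #484 can be sketched in plain Python as below. The function name normalize_matrix is hypothetical, and the three option names ("true", "pred", "all") are an assumption borrowed from scikit-learn's confusion_matrix. These notes do not say which options EvalML exposes.

```python
def normalize_matrix(matrix, option="true"):
    """Normalize a confusion matrix given as a list of rows.

    Rows are true labels and columns are predicted labels.
    """
    if option == "true":  # each row sums to 1
        return [[value / sum(row) for value in row] for row in matrix]
    if option == "pred":  # each column sums to 1
        column_sums = [sum(column) for column in zip(*matrix)]
        return [[value / total for value, total in zip(row, column_sums)] for row in matrix]
    if option == "all":  # the whole matrix sums to 1
        grand_total = sum(sum(row) for row in matrix)
        return [[value / grand_total for value in row] for row in matrix]
    raise ValueError(f"unknown option: {option!r}")


matrix = [[8, 2], [1, 9]]
by_true = normalize_matrix(matrix, "true")
by_all = normalize_matrix(matrix, "all")
```

The sketch does not guard against rows or columns that sum to zero, which would need an explicit error or a default value.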
Fixes
Changes
- Undo version cap in XGBoost placed in #402 and allowed all releases of XGBoost #407
- Support pandas 1.0.0 #486
- Made all references to the logger static #503
- Refactored model_type parameter for components and pipelines to model_family #507
- Refactored problem_types for pipelines and components into supported_problem_types #515
- Moved pipelines/utils.save_pipeline and pipelines/utils.load_pipeline to PipelineBase.save and PipelineBase.load #526
- Limit number of categories encoded by OneHotEncoder #517
Documentation Changes
- Updated API reference to remove PipelinePlot and added moved PipelineBase plotting methods #483
- Add code style and github issue guides #463, #512
- Updated API reference to surface class variables for pipelines and components #537
Testing Changes
- Added automated dependency check PR #482, #505
- Updated automated dependency check comment #497
- Have build_docs job use python executor, so that env vars are set properly #547
- Run windows unit tests on PRs #557
Breaking Changes
- AutoClassificationSearch and AutoRegressionSearch's model_types parameter has been refactored into allowed_model_families
- ModelTypes enum has been changed to ModelFamily
- Components and Pipelines now have a model_family field instead of model_type
- get_pipelines utility function now accepts model_families as an argument instead of model_types
- PipelineBase.name no longer returns structure of pipeline and has been replaced by PipelineBase.summary
- PipelineBase.problem_types and Estimator.problem_types have been renamed to supported_problem_types
- pipelines/utils.save_pipeline and pipelines/utils.load_pipeline moved to PipelineBase.save and PipelineBase.load
v0.7.0 Mar. 9, 2020
Enhancements
- Added emacs buffers to .gitignore #350
- Add CatBoost (gradient-boosted trees) classification and regression components and pipelines #247
- Added Tuner abstract base class #351
- Added n_jobs as parameter for AutoClassificationSearch and AutoRegressionSearch #403
- Changed colors of confusion matrix to shades of blue and updated axis order to match scikit-learn's #426
- Added PipelineBase graph and feature_importance_graph methods, moved from previous location #423
- Added support for python 3.8 #462
Fixes
- Fixed ROC and confusion matrix plots not being calculated if user passed own additional_objectives #276
- Fixed ReadtheDocs FileNotFoundError exception for fraud dataset #439
Changes
- Added n_estimators as a tunable parameter for XGBoost #307
- Remove unused parameter ObjectiveBase.fit_needs_proba #320
- Remove extraneous parameter component_type from all components #361
- Remove unused rankings.csv file #397
- Downloaded demo and test datasets so unit tests can run offline #408
- Remove _needs_fitting attribute from Components #398
- Changed plot.feature_importance to show only non-zero feature importances by default, added optional parameter to show all #413
- Dropped support for Python 3.5 #438
- Removed unused apply.py file #449
- Clean up requirements.txt to remove unused deps #451
Documentation Changes
- Update release.md with instructions to release to internal license key #354
Testing Changes
- Added tests for utils (and moved current utils to gen_utils) #297
- Moved XGBoost install into its own separate step on Windows using Conda #313
- Rewind pandas version to before 1.0.0, to diagnose test failures for that version #325
- Added dependency update checkin test #324
- Rewind XGBoost version to before 1.0.0 to diagnose test failures for that version #402
- Update dependency check to use a whitelist #417
- Update unit test jobs to not install dev deps #455
Breaking Changes
- Python 3.5 will not be actively supported.
v0.6.0 (Dec. 16, 2019)
Enhancements
- Added ability to create a plot of feature importances #133
- Add early stopping to AutoML using patience and tolerance parameters #241
- Added ROC and confusion matrix metrics and plot for classification problems and introduce PipelineSearchPlots class #242
- Enhanced AutoML results with search order #260
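Early stopping with patience and tolerance (#241) can be pictured as the check below. This is a sketch under assumptions: higher scores are taken to be better, tolerance is treated as a minimum relative improvement over the best score so far, and the function name is hypothetical. The exact rule EvalML applies is not spelled out in these notes.

```python
def should_stop(scores, patience, tolerance):
    """Return True once `patience` consecutive scores fail to beat the best
    score seen so far by more than `tolerance` (relative improvement)."""
    best = None
    without_improvement = 0
    for score in scores:
        if best is None:
            best = score
            continue
        # Relative improvement; fall back to absolute when best is zero.
        improvement = (score - best) / abs(best) if best != 0 else score - best
        if improvement > tolerance:
            best = score
            without_improvement = 0
        else:
            without_improvement += 1
            if without_improvement >= patience:
                return True
    return False


# Three small gains in a row after 0.80 are all below the 1% tolerance.
stalled = should_stop([0.70, 0.80, 0.801, 0.802, 0.803], patience=3, tolerance=0.01)

# A large gain resets the counter before patience runs out.
recovering = should_stop([0.70, 0.80, 0.801, 0.90], patience=3, tolerance=0.01)
```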
Fixes
- Lower botocore requirement #235
- Fixed decision_function calculation for FraudCost objective #254
- Fixed return value of Recall metrics #264
Changes
- Renamed automl classes to AutoRegressionSearch and AutoClassificationSearch #287
- Updating demo datasets to retain column names #223
- Moving pipeline visualization to PipelinePlots class #228
- Standardizing inputs as pd.DataFrame / pd.Series #130
- Enforcing that pipelines must have an estimator as last component #277
- Added ipywidgets as a dependency in requirements.txt #278
Documentation Changes
- Adding class properties to API reference #244
- Fix and filter FutureWarnings from scikit-learn #249, #257
- Adding Linear Regression to API reference and cleaning up some Sphinx warnings #227
Testing Changes
Breaking Changes
- The fit() method for AutoClassifier and AutoRegressor has been renamed to search().
- AutoClassifier has been renamed to AutoClassificationSearch
- AutoRegressor has been renamed to AutoRegressionSearch
- AutoClassificationSearch.results and AutoRegressionSearch.results are now dictionaries with pipeline_results and search_order keys. pipeline_results can be used to access a dictionary that is identical to the old .results dictionary, whereas search_order returns a list of the search order in terms of pipeline id.
- Pipelines now require an estimator as the last component in component_list. Slicing pipelines now throws a NotImplementedError to avoid returning Pipelines without an estimator.
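The new shape of .results described above can be sketched with a plain dictionary. The two top-level keys come from these notes. The per-pipeline fields (pipeline_name, score) and all values are invented for illustration.

```python
# Hypothetical contents; only the "pipeline_results" and "search_order"
# keys are taken from the release notes.
results = {
    "pipeline_results": {
        0: {"pipeline_name": "Pipeline A", "score": 0.81},
        1: {"pipeline_name": "Pipeline B", "score": 0.86},
        2: {"pipeline_name": "Pipeline C", "score": 0.79},
    },
    "search_order": [0, 1, 2],
}

# Code written against the old flat .results dictionary needs one extra lookup.
old_style = results["pipeline_results"]

# Walk the pipelines in the order they were searched.
scores_in_search_order = [
    old_style[pipeline_id]["score"] for pipeline_id in results["search_order"]
]

# Find the id of the best-scoring pipeline (assuming higher is better).
best_id = max(old_style, key=lambda pipeline_id: old_style[pipeline_id]["score"])
```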