Documentation misleading about search with no parameters #27

@micahjsmith

  • AutoBazaar version: 560730b
  • Python version: 3.7.7
  • Operating System (python -c 'import platform;print(platform.platform())'): Darwin-19.6.0-x86_64-i386-64bit

Description

Documentation has this claim:

For example if you want to search for the best

$ abz search -i /path/to/your/datasets/folder name_of_your_dataset

This will evaluate the default pipeline without performing additional tuning iteration on it.

This seems misleading: running the search with no additional options evaluated more than 1,000 iterations (1,693 according to the summary table below) before I killed it after about an hour.

What I Did

$ time abz search 196_autoMpg
Using TensorFlow backend.
20201015192335979857 - Processing Datasets: ['196_autoMpg']
###############################
#### Searching 196_autoMpg ####
###############################
[15:23:37] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
<repeated 8000 times>
^C
###############################
#### Executing 196_autoMpg ####
###############################
[16:23:50] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Executing best pipeline ABPipeline({
    "primitives": [
        "mlprimitives.custom.feature_extraction.CategoricalEncoder",
        "sklearn.impute.SimpleImputer",
        "sklearn.preprocessing.RobustScaler",
        "xgboost.XGBRegressor"
    ],
    "init_params": {},
    "input_names": {},
    "output_names": {},
    "hyperparameters": {
        "mlprimitives.custom.feature_extraction.CategoricalEncoder#1": {
            "keep": false,
            "copy": true,
            "features": "auto",
            "max_unique_ratio": 0,
            "max_labels": 25
        },
        "sklearn.impute.SimpleImputer#1": {
            "missing_values": NaN,
            "fill_value": null,
            "verbose": false,
            "copy": true,
            "strategy": "median"
        },
        "sklearn.preprocessing.RobustScaler#1": {
            "quantile_range": [
                25.0,
                75.0
            ],
            "copy": true,
            "with_centering": true,
            "with_scaling": true
        },
        "xgboost.XGBRegressor#1": {
            "n_jobs": -1,
            "n_estimators": 617,
            "max_depth": 9,
            "learning_rate": 0.03240539972838852,
            "gamma": 0.27690923264683187,
            "min_child_weight": 5
        }
    },
    "tunable_hyperparameters": {
        "mlprimitives.custom.feature_extraction.CategoricalEncoder#1": {
            "max_labels": {
                "type": "int",
                "default": 0,
                "range": [
                    0,
                    100
                ]
            }
        },
        "sklearn.impute.SimpleImputer#1": {
            "strategy": {
                "type": "str",
                "default": "mean",
                "values": [
                    "mean",
                    "median",
                    "most_frequent",
                    "constant"
                ]
            }
        },
        "sklearn.preprocessing.RobustScaler#1": {
            "with_centering": {
                "description": "If True, center the data before scaling. This will cause transform to raise an exception when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory",
                "type": "bool",
                "default": true
            },
            "with_scaling": {
                "description": "If True, scale the data to interquartile range",
                "type": "bool",
                "default": true
            }
        },
        "xgboost.XGBRegressor#1": {
            "n_estimators": {
                "type": "int",
                "default": 100,
                "range": [
                    10,
                    1000
                ]
            },
            "max_depth": {
                "type": "int",
                "default": 3,
                "range": [
                    3,
                    10
                ]
            },
            "learning_rate": {
                "type": "float",
                "default": 0.1,
                "range": [
                    0,
                    1
                ]
            },
            "gamma": {
                "type": "float",
                "default": 0.1,
                "range": [
                    0,
                    1
                ]
            },
            "min_child_weight": {
                "type": "int",
                "default": 1,
                "range": [
                    1,
                    10
                ]
            }
        }
    },
    "outputs": {
        "default": [
            {
                "name": "y",
                "type": "array",
                "variable": "xgboost.XGBRegressor#1.y"
            }
        ]
    },
    "id": "e168ec26-31f0-4e78-a3a7-3ef18bf432c8",
    "name": "single_table/regression/default",
    "template": null,
    "loader": {
        "data_modality": "single_table",
        "task_type": "regression"
    },
    "score": 8.4004691556447,
    "rank": 8.400469155645126,
    "metric": "meanSquaredError"
})
#############################
#### Scoring 196_autoMpg ####
#############################
Score: 7.041906911649814
       predictions     targets
count   100.000000  100.000000
mean     23.589642   23.478000
std       7.581228    7.573446
min      10.351545   10.000000
25%      17.002141   17.375000
50%      24.067155   23.250000
75%      29.522121   28.000000
max      38.241291   44.000000
                                         pipeline     score      rank  cv_score            metric data_modality   task_type task_subtype     elapsed  iterations  load_time  trivial_time      cv_time error  step
dataset
196_autoMpg  e168ec26-31f0-4e78-a3a7-3ef18bf432c8  7.041907  8.400469  8.400469  meanSquaredError  single_table  regression   univariate  3613.11274      1693.0   0.059046      1.091654  3307.688052  None  None

real    60m17.985s
user    61m12.325s
sys     50m16.661s
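
As a rough sanity check on the output above, the following sketch compares the XGBRegressor values reported under "hyperparameters" with the defaults listed under "tunable_hyperparameters", and computes the average time per iteration from the summary row. The numbers are copied by hand from the log; the helper name `changed_from_default` is hypothetical and not part of AutoBazaar.

```python
# Values copied by hand from the pipeline dump above (XGBRegressor only).
selected = {
    "n_estimators": 617,
    "max_depth": 9,
    "learning_rate": 0.03240539972838852,
    "gamma": 0.27690923264683187,
    "min_child_weight": 5,
}
defaults = {
    "n_estimators": 100,
    "max_depth": 3,
    "learning_rate": 0.1,
    "gamma": 0.1,
    "min_child_weight": 1,
}


def changed_from_default(selected, defaults):
    """Return {name: (default, selected)} for every value that differs."""
    return {
        name: (defaults[name], value)
        for name, value in selected.items()
        if name in defaults and value != defaults[name]
    }


changed = changed_from_default(selected, defaults)
for name, (default, value) in changed.items():
    print(f"{name}: default={default} -> selected={value}")

# "elapsed" and "iterations" columns from the summary row of the log.
elapsed_seconds = 3613.11274
iterations = 1693
seconds_per_iteration = elapsed_seconds / iterations
print(f"{seconds_per_iteration:.2f} s per iteration")
```

Every listed XGBRegressor hyperparameter differs from its default, which is consistent with tuning having taken place rather than a single evaluation of the default pipeline.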
