
Questions on hyperparameter distributions and validation percentage #174

Open
@ClimbsRocks

Description

Question from @MelvinDunn that I'm documenting here:

Had one question while I was looking at the ol' machina:

-How does this machine determine the starting points for hyperparams?
-How does it determine the validation size? (Couldn't find it)

Sorry, I was interested, and while I know I could easily just look at the
code myself, I thought you would know off the top of your head.

I'm extremely interested in AutoML, and I think this machine is, well,
wonderful.

Thanks again,

Melvin

My response:
I love curiosity. Thanks for continuing to ask questions!

  1. We use RandomizedSearchCV to find the optimal hyperparameters. It picks parameter values randomly from the distributions we give it. Those distributions can be found in pySetup/parameterMakers.
  2. Right now the validation size is just hard-coded. It's a pretty large split. I've tried different values, but I want to say it's somewhere around 20-40%, depending on the size of the input data. The exception is data like Numer.ai's, which has a specific validationSplit column; that column must be flagged in the dataDescription row (where we specify what type of data each column holds), and in that case we just use the validation split the data provides.
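To make the two answers above concrete, here is a minimal sketch of both ideas using scikit-learn directly. The parameter names, distributions, and the 20-40% size rule below are illustrative only; auto_ml's actual distributions live in pySetup/parameterMakers, and its real split logic is hard-coded elsewhere.

```python
# Sketch: random hyperparameter sampling + a size-dependent validation split.
# Everything here is illustrative, not auto_ml's actual configuration.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Illustrative stand-in for the hard-coded rule: smaller datasets get a
# larger validation fraction (somewhere in the 20-40% range).
val_fraction = 0.4 if len(X) < 1000 else 0.2
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=val_fraction, random_state=0
)

# Each key maps to a distribution (or list) that RandomizedSearchCV
# samples from at random, rather than exhaustively like GridSearchCV.
param_distributions = {
    "n_estimators": randint(10, 50),   # any int in [10, 50)
    "max_depth": [2, 4, 8, None],      # uniform choice from a list
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,        # number of random parameter combinations tried
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)
print(search.best_params_)
```

The key point is that RandomizedSearchCV never needs explicit "starting points": each of the `n_iter` trials is an independent draw from the given distributions.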

Keep the questions coming!
