Predict sales prices and practice feature engineering, random forests, and gradient boosting
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that the same relative error on an expensive house and on a cheap house affects the result equally.)
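As a minimal sketch of the metric described above, the snippet below computes the RMSE between log predictions and log observations using only the standard library. The function name rmse_log and the example prices are illustrative assumptions, not part of this repository or of Kaggle's scoring code.

```python
import math


def rmse_log(predicted, observed):
    """RMSE between log(predicted) and log(observed). All prices must be > 0."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices: a 10% overestimate on a cheap house and on an expensive one.
cheap = rmse_log([110_000.0], [100_000.0])
expensive = rmse_log([1_100_000.0], [1_000_000.0])

# Both scores are log(1.1) up to floating-point rounding, so the two houses
# contribute equally despite the tenfold difference in price.
print(cheap, expensive)
```

On the raw price scale the second error would be ten times larger than the first, which is why the competition scores on logs.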
You can use the Makefile to download the data from Kaggle into the raw_data folder.
make download_files
First you need to import the trainer.
# Import
from houses_trainer.trainer import Trainer
You have to initialize the trainer by choosing a predefined model.
# Instantiate trainer
trainer_ridge = Trainer(model="ridge")
Then you can load the data and build the pipeline.
# Load data
trainer_ridge.load_data()
# Build Pipeline
trainer_ridge.build_pipeline(feature_cutoff_percentage=75)
Once the pipeline is built, you can cross-validate your model and make a prediction on the test dataset. The prediction is saved as a CSV file in the submission folder.
# Cross-validate
trainer_ridge.cross_validate(cv=5)
# Prediction
trainer_ridge.predict()
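The cv=5 argument presumably requests five-fold cross-validation. The Trainer's internals are not shown here, so the following is only a sketch of the general technique, not of this package's implementation: the data is split into five folds, each fold is held out once, and the model is scored on it. The helper names, the made-up prices, and the mean-of-training-set baseline (standing in for the ridge model) are all assumptions for illustration.

```python
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate_baseline(log_prices, k=5):
    """Return one RMSE (on log prices) per held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(log_prices), k):
        held = set(held_out)
        train = [y for i, y in enumerate(log_prices) if i not in held]
        # Baseline "model": always predict the mean log price of the training folds.
        prediction = sum(train) / len(train)
        mse = sum((log_prices[i] - prediction) ** 2 for i in held_out) / len(held_out)
        scores.append(math.sqrt(mse))
    return scores


# Made-up sale prices, converted to logs to match the competition metric.
prices = [100_000, 150_000, 200_000, 250_000, 300_000,
          120_000, 180_000, 220_000, 260_000, 310_000]
log_prices = [math.log(p) for p in prices]

scores = cross_validate_baseline(log_prices, k=5)
mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

Averaging the per-fold scores gives an estimate of how the model would score on unseen data, which is useful to check before spending a Kaggle submission.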
Finally, you may submit your results to the Kaggle competition.
# Submit results
trainer_ridge.submit()