As part of this task I:
- created a Python 3.9.13 venv for the task; the venv was used both for the analysis notebooks and for testing `train.py`/`predict.py`
- analysed the provided dataset `train.csv` (the analysis process is in `1.EDA.ipynb` and `2 Model selection.ipynb`)
- found out that the target column is generated as `target = abs(var6)**2 + var7`, where `var6` and `var7` are columns '6' and '7' respectively
- prepared `train.py` to recreate the linear regression model training and `predict.py` for model inference
- generated predictions for the `hidden_test.csv` dataset: `predictions.csv`
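
The relationship `target = abs(var6)**2 + var7` can be checked row by row. The sketch below is only an illustration: the three CSV rows are synthetic (not taken from `train.csv`), and the helper names `expected_target` and `formula_holds` are hypothetical. Note that for real values `abs(var6)**2` equals `var6**2`, so the target would be linear in the features `var6**2` and `var7`.

```python
import csv
import io
import math

# Synthetic rows for illustration only; they are NOT taken from train.csv.
SAMPLE_CSV = """6,7,target
-3.0,0.5,9.5
2.0,-1.0,3.0
0.0,4.25,4.25
"""


def expected_target(var6: float, var7: float) -> float:
    """Relationship reported in this README: abs(var6)**2 + var7."""
    return abs(var6) ** 2 + var7


def formula_holds(csv_text: str, tol: float = 1e-9) -> bool:
    """Return True if every row's target matches the formula within tol."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return all(
        math.isclose(
            expected_target(float(row["6"]), float(row["7"])),
            float(row["target"]),
            abs_tol=tol,
        )
        for row in reader
    )


print(formula_holds(SAMPLE_CSV))
```

The same check could be pointed at the real `train.csv` by reading the file instead of the in-memory string, assuming its columns are named `6`, `7` and `target`.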
- `1.EDA.ipynb` - general analysis of `train.csv`
- `2 Model selection.ipynb` - additional research to find the model that most effectively describes the relationship target ~ data, and to see what information about this relationship can be extracted from it
- `train.py` - script for model training and saving
- `predict.py` - script for generating predictions from the saved model
- `.gitignore` - git exceptions
- `README.md` - this README
- `requirements.txt` - requirements for venv recreation
- `predictions.csv` - predictions for `hidden_test.csv`
All elements (both notebooks and scripts) were created and tested in a Python 3.9.13 venv with the requirements listed in `requirements.txt`.
To train the model, run in a terminal:
$ python train.py
The script accepts two optional arguments:
- `--train-file`: Path to the CSV file containing the training data. Default is `train.csv`.
- `--model-file`: Path to save the trained model. Default is `model.pkl`.
If you need to use a dataset other than `train.csv` and/or a model file name other than `model.pkl`, use this command:
$ python train.py --train-file custom_train_data.csv --model-file custom_model.pkl
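
For reference, the two flags and their defaults could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the actual contents of `train.py`: the function name `build_train_parser` and the help strings are assumptions, while the flag names and defaults come from the description above.

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented for train.py."""
    parser = argparse.ArgumentParser(description="Train the model and save it to disk.")
    parser.add_argument(
        "--train-file",
        default="train.csv",
        help="Path to the CSV file containing the training data.",
    )
    parser.add_argument(
        "--model-file",
        default="model.pkl",
        help="Path to save the trained model.",
    )
    return parser


# No flags given: both documented defaults apply.
defaults = build_train_parser().parse_args([])
print(defaults.train_file, defaults.model_file)

# Both flags overridden, as in the command above.
custom = build_train_parser().parse_args(
    ["--train-file", "custom_train_data.csv", "--model-file", "custom_model.pkl"]
)
print(custom.train_file, custom.model_file)
```

`argparse` converts the dashes in `--train-file` to an underscore, so the value is read as `args.train_file`.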
To generate predictions from the saved model, run in a terminal:
$ python predict.py
The script accepts three optional arguments:
- `--model-file`: Path to the pre-trained model file. Default is `model.pkl`.
- `--test-file`: Path to the CSV file containing the test data. Default is `hidden_test.csv`.
- `--output-file`: Path to save the prediction results as a CSV file. Default is `predictions.csv`.
If you need to use a dataset other than `hidden_test.csv`, and/or a model file name other than `model.pkl`, and/or a predictions file name other than `predictions.csv`, use this command:
$ python predict.py --model-file custom_model.pkl --test-file custom_test_data.csv --output-file custom_predictions.csv
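
The load, predict and save flow behind these three paths can be sketched as below. This is an illustration under several assumptions and may differ from the real `predict.py`: the model is assumed to be a pickled dict of coefficients (the actual `model.pkl` format is not described here), the features are assumed to be `var6**2` and `var7`, and the output column name `prediction` and the function names are hypothetical.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def predict_row(model: dict, var6: float, var7: float) -> float:
    """Apply a hypothetical linear model to the features var6**2 and var7."""
    w6, w7 = model["coef"]
    return w6 * var6 ** 2 + w7 * var7 + model["intercept"]


def run_inference(model_file: str, test_file: str, output_file: str) -> int:
    """Load the model, predict for each test row, write a CSV; return the row count."""
    with open(model_file, "rb") as fh:
        model = pickle.load(fh)  # only unpickle files you trust
    with open(test_file, newline="") as fh:
        rows = list(csv.DictReader(fh))
    with open(output_file, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["prediction"])  # assumed column name
        for row in rows:
            writer.writerow([predict_row(model, float(row["6"]), float(row["7"]))])
    return len(rows)


# Demo with synthetic files in a temporary directory (not the real data).
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    test_path = Path(tmp) / "hidden_test.csv"
    out_path = Path(tmp) / "predictions.csv"
    model_path.write_bytes(pickle.dumps({"coef": [1.0, 1.0], "intercept": 0.0}))
    test_path.write_text("6,7\n-3.0,0.5\n2.0,-1.0\n")
    n_rows = run_inference(str(model_path), str(test_path), str(out_path))
    output_lines = out_path.read_text().splitlines()

print(n_rows, output_lines)
```

With coefficients of 1 and an intercept of 0, this hypothetical model reproduces `abs(var6)**2 + var7` exactly.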