Overall, I really liked the write-up on your blog. Thank you.
I ran into a few issues while getting this to work on Linux; they may or may not be Linux-specific.
- `sklearn.datasets.make_classification` appears to return NaN values for the `Result` column when running `dataset.py` to create the dataset. (This is likely not related to Linux.)
- The mlflow experiment's `artifact_location` defaults to `"/C:"`, which is very Windows-specific.
My fixes for these were:
```diff
diff --git a/steps/clean.py b/steps/clean.py
index ecbcc4b..64d755f 100644
--- a/steps/clean.py
+++ b/steps/clean.py
@@ -24,5 +24,9 @@ class Cleaner:
         IQR = Q3 - Q1
         upper_bound = Q3 + 1.5 * IQR
         data = data[data['AnnualPremium'] <= upper_bound]
+
+        data['Result'] = data['Result'].fillna(0.0)
```
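To show the first fix in isolation, here is a self-contained sketch with made-up numbers (not the real dataset): the IQR outlier filter runs first, then the NaN labels in `Result` are backfilled with `0.0`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset produced by dataset.py:
# 'Result' is the label column that came back with NaNs.
data = pd.DataFrame({
    "AnnualPremium": [30000.0, 32000.0, 500000.0, 31000.0],
    "Result": [1.0, np.nan, 0.0, np.nan],
})

# Outlier clipping as in Cleaner (IQR rule on AnnualPremium).
Q1, Q3 = data["AnnualPremium"].quantile([0.25, 0.75])
upper_bound = Q3 + 1.5 * (Q3 - Q1)
data = data[data["AnnualPremium"] <= upper_bound]

# The fix: backfill missing labels with 0.0 so training doesn't choke on NaN.
data["Result"] = data["Result"].fillna(0.0)
print(data["Result"].isna().sum())  # 0 remaining NaNs
```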
```diff
diff --git a/main.py b/main.py
index 71b6ce1..d8936ad 100644
--- a/main.py
+++ b/main.py
@@ -1,3 +1,4 @@
+import os
 import logging
 import yaml
 import mlflow
@@ -49,9 +50,16 @@ def train_with_mlflow():
     with open('config.yml', 'r') as file:
         config = yaml.safe_load(file)
-    mlflow.set_experiment("Model Training Experiment")
-
-    with mlflow.start_run() as run:
+    experiment_name = "Model Training Experiment #1"
+    try:
+        experiment = mlflow.get_experiment_by_name(experiment_name)
+        experiment_id = experiment.experiment_id
+    except AttributeError:
+        print(f"Creating experiment: {experiment_name}")
+        artifact_path = os.path.join(os.path.dirname(__file__), "mlruns")
+        experiment_id = mlflow.create_experiment(experiment_name, artifact_location=artifact_path)
+
+    with mlflow.start_run(experiment_id=experiment_id) as run:
```
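A smaller variant that may also work (an untested sketch on my part): MLflow accepts a URI for `artifact_location`, and `pathlib.Path.as_uri()` produces the correct `file:///C:/...` form on Windows and `file:///...` on Linux, which sidesteps the bare `/C:` default entirely.

```python
from pathlib import Path

# Sketch: build the artifact location as a proper file:// URI so the Windows
# drive letter is encoded correctly and the same code runs on Linux.
# "mlruns" matches the directory in the diff above; Path.cwd() stands in for
# os.path.dirname(__file__) here.
artifact_uri = (Path.cwd() / "mlruns").as_uri()
print(artifact_uri)  # e.g. file:///home/user/project/mlruns

# Then pass it to MLflow instead of a bare path:
# experiment_id = mlflow.create_experiment(experiment_name,
#                                          artifact_location=artifact_uri)
```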