
Some issues getting this to work on Linux #2

Open
@samr


Overall, I really liked the write-up on your blog. Thank you.

I ran into a few issues while getting this to work on Linux; some of them may not actually be Linux-specific.

  1. `sklearn.datasets.make_classification` appears to return NaN values for the `Result` column when running `dataset.py` to create the dataset. (This is likely not related to Linux.)
  2. The mlflow experiment's `artifact_location` defaults to `"/C:"`, which is very Windows-specific.

My fixes for these were:

```diff
diff --git a/steps/clean.py b/steps/clean.py
index ecbcc4b..64d755f 100644
--- a/steps/clean.py
+++ b/steps/clean.py
@@ -24,5 +24,9 @@ class Cleaner:
         IQR = Q3 - Q1
         upper_bound = Q3 + 1.5 * IQR
         data = data[data['AnnualPremium'] <= upper_bound]
+
+        data['Result'] = data['Result'].fillna(0.0)
```
```diff
diff --git a/main.py b/main.py
index 71b6ce1..d8936ad 100644
--- a/main.py
+++ b/main.py
@@ -1,3 +1,4 @@
+import os
 import logging
 import yaml
 import mlflow
@@ -49,9 +50,16 @@ def train_with_mlflow():
     with open('config.yml', 'r') as file:
         config = yaml.safe_load(file)
 
-    mlflow.set_experiment("Model Training Experiment")
-    
-    with mlflow.start_run() as run:
+    experiment_name = "Model Training Experiment #1"
+    try:
+        experiment = mlflow.get_experiment_by_name(experiment_name)
+        experiment_id = experiment.experiment_id
+    except AttributeError:
+        print(f"Creating experiment: {experiment_name}")
+        artifact_path = os.path.join(os.path.dirname(__file__), "mlruns")
+        experiment_id = mlflow.create_experiment(experiment_name, artifact_location=artifact_path)
+
+    with mlflow.start_run(experiment_id=experiment_id) as run:
```
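One further thought on the artifact path: passing a raw OS path is what produces the `"/C:"` form on Windows in the first place, so a `file://` URI may be the more portable choice. A stdlib-only sketch (here `project_root` stands in for `os.path.dirname(__file__)` in `main.py`, and I'm assuming your mlflow version accepts a URI for `artifact_location`, which the tracking API documents):

```python
import os
from pathlib import Path

# Hypothetical stand-in for os.path.dirname(__file__) in main.py
project_root = Path.cwd()

# os.path.join yields an OS-specific path string; on Windows this is
# what ends up rendered as "/C:..." in the experiment's artifact_location
plain_path = os.path.join(project_root, "mlruns")

# pathlib's as_uri() yields an unambiguous file:// URI on both platforms,
# suitable for mlflow.create_experiment(..., artifact_location=artifact_uri)
artifact_uri = (project_root / "mlruns").as_uri()
```

This avoids special-casing either OS while keeping the artifacts under the repo's `mlruns` directory.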
