- Install minikube and Docker Desktop.
- Disable the firewall for network access, if needed.
- Unzip the data and mount its location for data access.
- Assuming `data.zip` is unzipped to the current directory `mlops_pipeline_demo`, start minikube with the data mapping for data loading:

  ```sh
  minikube start --mount-string="D:/mlops_pipeline_demo/data:/data" --mount
  ```

  or

  ```sh
  minikube start --mount-string="/home/<user>/mlops_pipeline_demo/data:/data" --mount
  ```
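  To confirm the mount is visible inside the minikube VM before continuing, list the mounted directory (a quick sanity check; `/data` matches the `--mount-string` target above):

  ```sh
  # List the mounted data directory inside the minikube VM.
  minikube ssh "ls -la /data"
  ```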
- Create the config map:

  ```sh
  kubectl create configmap ml-config --from-file config-maps/config-map.yaml -n ml-workflow
  ```
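  The command above targets the `ml-workflow` namespace. If no earlier step creates it (an assumption; the repo may create it elsewhere), it has to exist first:

  ```sh
  # Assumption: the ml-workflow namespace is not created by any other step.
  kubectl create namespace ml-workflow
  ```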
- Build the Docker images for the pipeline stages:

  ```sh
  ./build_docker_images.sh
  ```
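  Depending on how `build_docker_images.sh` tags and publishes the images (its contents are not shown here, so this is an assumption), they may need to be built against minikube's Docker daemon so the cluster can find them locally:

  ```sh
  # Point the current shell at minikube's internal Docker daemon,
  # then rebuild so the images land where the cluster can see them.
  eval $(minikube docker-env)
  ./build_docker_images.sh
  ```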
- Start MinIO storage:

  ```sh
  kubectl apply -f deployments/minio.yaml -n=ml-workflow
  ```
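  Before the data prep step writes to MinIO, it is worth confirming the pod is ready (the deployment name `minio` is assumed from the manifest file name):

  ```sh
  # Block until the MinIO deployment rolls out successfully.
  kubectl -n ml-workflow rollout status deployment/minio
  ```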
- Run the `alameda_data_prep:25.05.3` container to load data from the local path, apply feature engineering, and store the results in MinIO. This step populates the `train` and `predict` buckets with transformed features, and creates a `holdout` bucket containing the first file from the train set for model evaluation purposes.

  ```sh
  kubectl apply -f deployments/data_prep.yaml
  ```
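  Data prep should finish before training starts. Assuming `data_prep.yaml` defines a Job named `data-prep` in `ml-workflow` (hypothetical names; check the manifest), completion can be awaited and the logs inspected:

  ```sh
  # Wait for the data prep Job to complete, then show the tail of its logs.
  # Job name and namespace are assumptions taken from the manifest file name.
  kubectl -n ml-workflow wait --for=condition=complete job/data-prep --timeout=600s
  kubectl -n ml-workflow logs job/data-prep --tail=20
  ```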
- Start the MLflow experiment tracking server:

  ```sh
  kubectl apply -f deployments/mlflow.yaml
  ```
- Run the `alameda_model_training:25.05.1` container:

  ```sh
  kubectl apply -f deployments/model_training.yaml
  ```
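  Training progress can be followed live (again assuming a Job in `ml-workflow`, here named `model-training`; adjust to the actual name in the manifest). Runs should also appear in the MLflow UI once ports are forwarded below.

  ```sh
  # Stream the training logs; the Job name is an assumption.
  kubectl -n ml-workflow logs -f job/model-training
  ```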
- Run the `alameda_batch_inference:25.05.1` container:

  ```sh
  kubectl apply -f deployments/model_batch_inference.yaml
  ```
- Launch Postgres for storing predictions:

  ```sh
  kubectl apply -f deployments/postgres.yaml
  ```
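  Once inference has written its output, the predictions can be inspected directly in the database. The deployment name and user below are assumptions; take the real values from `deployments/postgres.yaml`:

  ```sh
  # Run psql inside the Postgres pod and list the tables.
  # deploy/postgres and -U postgres are assumed names.
  kubectl -n ml-workflow exec deploy/postgres -- psql -U postgres -c '\dt'
  ```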
- Start model monitoring:

  ```sh
  kubectl apply -f deployments/model_monitoring.yaml
  ```

- Run `./forward_ports.sh` and navigate to the MinIO, MLflow, and AlamedaMonitoring dashboards.
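  `forward_ports.sh` presumably wraps `kubectl port-forward`; a hand-rolled equivalent looks like the sketch below, with all service names and ports assumed (take the real ones from the script or the service manifests):

  ```sh
  # Forward the dashboards to localhost; service names and ports are assumptions.
  kubectl -n ml-workflow port-forward svc/minio 9001:9001 &
  kubectl -n ml-workflow port-forward svc/mlflow 5000:5000 &
  kubectl -n ml-workflow port-forward svc/model-monitoring 8080:8080 &
  ```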
Known simplifications and caveats:

- The persistent storage option is not used, to simplify the installation process.
- Data loading for model training is not distributed, so memory usage can overflow on large datasets.
- Feature engineering is simplified (no map-reduce to efficiently calculate features across shards of data).
- A config map is used for secrets, for simplicity.
- Deployment specs are not fully parametrized; reproducing the pipeline fully may require manually setting image tags for consistent execution.