Existing ways of deploying an Amazon EKS cluster with Kubeflow inside don’t offer a clear blueprint to set up, serve, update, and scale multiple clusters quickly. They require using several CLIs (kubectl, kfctl, etc.), limit what resources you can add to your cluster after deployment, and don’t allow for CI/CD automation or easy replication of cluster configurations you create.
Swiss Army Kube for Kubeflow (SAKK) is a free open-source Terraform-based IaC tool that allows you to declaratively set up modular ML-ready AWS EKS clusters with Kubeflow, automated with GitOps. SAKK provides a blueprint based on the best DevOps practices, which allows for one-click cluster replication, easy management, and augmenting your clusters with any resources, not limited to native tools. SAKK helps to quickly bring ML-ready clusters to production, and scale them by adding new modules as you go. SAKK is built on top of Terraform (infrastructure as code), ArgoCD (deployment automation & management of all Kubernetes resources), and Cognito (AWS identity provider). As a result of deployment with SAKK, you get a scalable modular cluster like this:
We believe that any organization or engineer using ML should be able to focus on their pipelines and applications without having to worry too much about the nitty-gritty of infrastructure deployment. Currently, SAKK is available for the Amazon EKS (Elastic Kubernetes Service) cluster only. We plan to expand to other platforms soon.
Swiss Army Kube for Kubeflow is based on the main Swiss Army Kube repository. SAKK is a SAK modification for the Kubeflow EKS setup based on SAK's collection of modules.
- Provision an AWS EKS cluster with Kubeflow inside in minutes
- Use existing project structure to set up your ML cluster configuration
- Configure your deployment with a single .tf file
- Deploy with a couple of Terraform commands
- Add any resources to your cluster before or after deployment
- Deliver your projects and apps with GitOps CI/CD automation
- Easily edit, reconfigure, rerun, add or destroy resources
- Build your own ML training pipelines in Kubeflow on AWS EKS
- Manage your cluster with Terraform and Kubernetes CLI or ArgoCD CLI/UI
- Replicate your cluster configuration with a couple of clicks
- Configure and deploy as many ML clusters as you need fast
- Scale deployments by adding new resources as modules
- Reduce your cloud infrastructure spend with spot instances
- Maximize your workload cost-efficiency
This repository is a template of a Kubeflow EKS cluster for your ML projects. Modify the main.tf file to set up a cluster and deploy it to AWS with Terraform commands. With this simple yet powerful workflow, you can provision as many ML-ready EKS clusters (with different settings of variables, networks, Kubernetes versions, etc.) as you want in no time.
- Prerequisites
  - Prepare an AWS account with configured IAM user
  - Fork and clone this repository
  - Install Terraform
  - Install AWS CLI
- Configure your EKS cluster before deployment using the repo as a template
  - Configure backend.hcl
  - Configure variables in main.tf
- Deploy your Kubeflow Kubernetes EKS cluster with Terraform commands
- Commit and push the repository
- Manage your Kubernetes cluster with ArgoCD (or configure kubectl) and deploy your ML apps to it.
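Before configuring anything, it can help to confirm that the tools from the prerequisites are actually on your PATH. The following is a minimal sketch, not part of SAKK itself: the helper name `check_tools` is hypothetical, and it only checks that each command exists, not that its version is suitable.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SAKK): report which of the given
# command-line tools are available on PATH.
# Returns 0 if every tool is found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools terraform aws git` covers the tools named in the prerequisites. If it succeeds, `aws sts get-caller-identity` is one way to confirm that the AWS CLI is using the IAM user you expect.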
To see the cluster configuration and deployment process, you can check out this demo video:
Cluster configuration and deployment
To see the deployment of an example located in the sak-kubeflow/examples/simple directory, you can check out this demo video:
Example deployment
SAKK is great for enterprises that work on ML/AI projects and want to deploy and manage Kubeflow clusters on AWS EKS in a declarative, modular, repeatable, GitOps way.
Please visit our Quickstart to get ready with prerequisites, configure your cluster, and deploy it with Terraform commands:
terraform init
terraform apply
aws --region <region> eks update-kubeconfig --name <cluster-name>
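If you deploy more than one cluster, the three commands above can be wrapped in a small script. This is only a sketch under stated assumptions: the function names `run` and `deploy_cluster` and the `DRY_RUN` variable are hypothetical and not part of SAKK, and the script assumes it is run from the directory that contains your main.tf.

```shell
#!/bin/sh
# Hypothetical wrapper around the quickstart commands (not shipped with SAKK).

# Print a command, then execute it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# deploy_cluster REGION CLUSTER_NAME
# Runs the quickstart commands in order and stops at the first failure.
deploy_cluster() {
    if [ "$#" -ne 2 ]; then
        echo "usage: deploy_cluster REGION CLUSTER_NAME" >&2
        return 2
    fi
    region=$1
    cluster=$2
    run terraform init &&
    run terraform apply &&
    run aws --region "$region" eks update-kubeconfig --name "$cluster"
}
```

Setting `DRY_RUN=1` before calling `deploy_cluster us-east-1 my-cluster` (both values are placeholders) prints the commands without running them, which is a cheap way to check the region and cluster name first. Note that `terraform apply` still asks for interactive confirmation when it actually runs.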
After the deployment, you'll have a Kubernetes cluster with Kubeflow and ArgoCD inside, deployed on AWS EKS and automated with GitOps (as shown below). You can manage your Kubernetes cluster with kubectl CLI or ArgoCD CLI/UI, and your AWS cluster with AWS or Terraform CLI.
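As a first look at a freshly deployed cluster, something like the sketch below may be useful. It is an illustration, not part of SAKK: the function name `cluster_overview` is hypothetical, and the namespaces `kubeflow` and `argocd` are assumptions based on common defaults, so check your own configuration if the commands return nothing.

```shell
#!/bin/sh
# Binary to invoke; override KUBECTL to point at a wrapper or specific version.
KUBECTL=${KUBECTL:-kubectl}

# cluster_overview [KUBEFLOW_NAMESPACE] [ARGOCD_NAMESPACE]
# Hypothetical helper: lists the nodes, the Kubeflow pods, and the ArgoCD
# Application resources. Namespace defaults are assumptions, not SAKK facts.
cluster_overview() {
    kubeflow_ns=${1:-kubeflow}
    argocd_ns=${2:-argocd}
    "$KUBECTL" get nodes &&
    "$KUBECTL" get pods --namespace "$kubeflow_ns" &&
    "$KUBECTL" get applications.argoproj.io --namespace "$argocd_ns"
}
```

Run `cluster_overview` after `aws eks update-kubeconfig` has written your kubeconfig. If you prefer the ArgoCD CLI and are logged in to the ArgoCD server, `argocd app list` shows the same applications.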
To get involved, please check out our CONTRIBUTING.md.
We are always happy to hear your thoughts and questions about SAKK. Please join our Slack to discuss: