aws/eks: Removing unnecessary abstraction layers, improving flexibility, addressing several issues #14
Thx. I am not familiar with TF. Will let others review.
abc7e98 to f431791
* Added additional variables to eks-private and eks-public modules to allow for more customization of the EKS cluster.
* General README cleanup including links for installing requirements in the eks-public (eks-private still needs updating)

Additional updates:
* pre-commit-config updates to latest versions.

Changes to be committed:
    modified: ../../.pre-commit-config.yaml
    modified: eks-private/README.md
    modified: eks-private/eks.tf
    modified: eks-private/main.tf
    modified: eks-private/variables.tf
    modified: eks-public/README.md
    modified: eks-public/eks.tf
    modified: eks-public/main.tf
    modified: eks-public/variables.tf

Changes to be committed:
    modified: eks-private/README.md
    modified: eks-public/README.md

- Including sample value files for the helm charts.
- Cleanup of readme for helm chart installation, ensuring that all helm charts are used similarly.
- Included sample value yaml files which can be used with the helm charts.
- Added additional outputs to make it slightly easier to use with helm charts.

Changes to be committed:
    modified: README.md
    modified: outputs.tf
    new file: sample-values_elb.yaml
    new file: sample-values_nvdp.yaml

* Ensure that all helm chart installation commands are similar
* Ensure that Anyscale Helm Chart examples are correct
* Additional general cleanup

Changes to be committed:
    modified: README.md

Tested and validated eks-private and eks-public examples.

Changes to be committed:
    modified: ../eks-private/README.md
    modified: ../eks-private/main.tf
    modified: ../eks-private/outputs.tf
    new file: ../eks-private/sample-values_elb.yaml
    new file: ../eks-private/sample-values_nvdp.yaml
    modified: ../eks-private/variables.tf
    modified: ../eks-public/README.md
    modified: ../eks-public/main.tf
    modified: ../eks-public/variables.tf

* Changed to use existing Anyscale SG module
* Removed VPC modules and added parameters for existing VPC and existing subnets
* Readme updates

Changes to be committed:
    modified: eks-existing/README.md
    modified: eks-existing/main.tf
    modified: eks-existing/outputs.tf
    new file: eks-existing/sample-values_elb.yaml
    new file: eks-existing/sample-values_nvdp.yaml
    modified: eks-existing/variables.tf
    modified: eks-private/README.md
    modified: eks-public/README.md
    modified: eks-public/main.tf
lgtm - huge thanks for working on this, @hongchaodeng !
* kubectl CLI
* helm CLI

### Creating Anyscale Resources
Suggested change:
- ### Creating Anyscale Resources
+ ### Creating Anyscale AWS Resources
examples/aws/eks-private/README.md
Outdated
Create `terraform.tfvars`:

```hcl
aws_region = "us-west-2"
```

Run:

```shell
terraform init
terraform apply
```
Suggested change:
- Create `terraform.tfvars`:
- ```hcl
- aws_region = "us-west-2"
- ```
- Run:
- ```shell
- terraform init
- terraform apply
- ```
+ Review the `eks.tf` and make any changes necessary for your EKS deployment.
+ Initialize, plan and apply the Terraform following your company's policies.
examples/aws/eks-private/README.md
Outdated
### Installing K8s Components
Suggested change:
- ### Installing K8s Components
+ ### Installing Kubernetes Components
examples/aws/eks-public/eks.tf
Outdated
version = "20.33.1"

# Cluster basic configuration
cluster_name = "anyscale-eks-public"
Let's make this a variable with a default of `anyscale-eks-public` (and the same for private). I need to modify this file manually to know which one I'm using.
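One way to act on this suggestion is sketched below. The variable name `eks_cluster_name`, its description, and the module source `terraform-aws-modules/eks/aws` are assumptions for illustration; only the version and the default name come from the diff context above. The remaining module arguments are omitted, so this is a fragment rather than a complete configuration.

```hcl
# Hypothetical variable; the name and description are illustrative only.
variable "eks_cluster_name" {
  description = "Name of the EKS cluster."
  type        = string
  default     = "anyscale-eks-public" # the private example would set its own default
}

module "eks" {
  # Assumed source for the "official AWS EKS module"; confirm against eks.tf.
  source  = "terraform-aws-modules/eks/aws"
  version = "20.33.1"

  # Cluster basic configuration
  cluster_name = var.eks_cluster_name

  # Other arguments (VPC, subnets, node groups) are omitted from this sketch.
}
```

With a variable like this, the name could be overridden in `terraform.tfvars` instead of editing `eks.tf` by hand.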
Summary
This pull request makes significant improvements to our EKS setup by removing unnecessary abstraction layers, improving flexibility, and addressing several issues with system components.
Changes
Replace `anyscale-eks` with the official AWS EKS module
Previously, we relied on the `anyscale-eks` module, which added an extra layer of abstraction and limited user flexibility. Users had to request changes in the `anyscale-eks` module before they could modify the underlying EKS configuration. By switching to the official AWS EKS module, users now have full control over their cluster configurations.
Remove Helm management from Terraform and provide documentation for manual installation
Previously, system component Helm charts (e.g., `ingress-nginx`) were managed within Terraform, leading to multiple issues:
* `terraform destroy` could get stuck when certain components were hanging, even though the entire cloud infrastructure could be safely removed.
* Changes to these components had to go through `terraform apply`, which is not aligned with the expected Kubernetes declarative model.
The components are now installed with `helm install`, giving users direct control over their configurations.
Fixes
AWS Load Balancer Controller (LBC) issues
After upgrading, LBC required additional subnet tagging and IAM policies, causing it to stop functioning. These necessary configurations have been added.
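As an illustration of the subnet tagging involved, the sketch below applies the role tags that the AWS Load Balancer Controller uses for subnet auto-discovery. The variable names `public_subnet_ids` and `private_subnet_ids` are hypothetical, and this is not necessarily how the PR applies the tags; the IAM policy changes are not shown.

```hcl
# Hypothetical inputs; the names are illustrative only.
variable "public_subnet_ids" {
  description = "IDs of subnets that should host internet-facing load balancers."
  type        = list(string)
  default     = []
}

variable "private_subnet_ids" {
  description = "IDs of subnets that should host internal load balancers."
  type        = list(string)
  default     = []
}

# Tag used to discover subnets for internet-facing load balancers.
resource "aws_ec2_tag" "public_elb" {
  for_each    = toset(var.public_subnet_ids)
  resource_id = each.value
  key         = "kubernetes.io/role/elb"
  value       = "1"
}

# Tag used to discover subnets for internal load balancers.
resource "aws_ec2_tag" "private_elb" {
  for_each    = toset(var.private_subnet_ids)
  resource_id = each.value
  key         = "kubernetes.io/role/internal-elb"
  value       = "1"
}
```

If the subnets are created in the same configuration, setting these keys in each subnet's own `tags` is usually simpler than managing them with separate `aws_ec2_tag` resources.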
Ingress-nginx service misconfiguration
The ingress-nginx service was configured as `internal`, even for public clusters. This has been corrected to allow public-facing configurations where needed.
NVIDIA device plugin controller issues
The existing setup lacked the correct node tolerations, preventing the controller from working properly. This has been fixed.
Pull request checklist
Please check if your PR fulfills the following requirements:
Pull Request Type
Example updates.
Does this introduce a breaking change?
The core examples for `eks-public` and `eks-private` have been rewritten. These changes will force the creation of a new EKS cluster if you run this as an update to an existing deployment.