Skip to content

aws/eks: Removing unnecessary abstraction layers, improving flexibility, addressing several issues #14

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 7, 2025

Conversation

hongchaodeng
Copy link
Member

@hongchaodeng hongchaodeng commented Feb 21, 2025

Summary

This pull request makes significant improvements to our EKS setup by removing unnecessary abstraction layers, improving flexibility, and addressing several issues with system components.

Changes

  • Replace anyscale-eks with the official AWS EKS module
    Previously, we relied on the anyscale-eks module, which added an extra layer of abstraction and limited user flexibility. Users had to request changes in the anyscale-eks module before they could modify the underlying EKS configuration. By switching to the official AWS EKS module, users now have full control over their cluster configurations.

  • Remove Helm management from Terraform and provide documentation for manual installation
    Previously, system component Helm charts (e.g., ingress-nginx) were managed within Terraform, leading to multiple issues:

    • terraform destroy could get stuck when certain components were hanging, even though the entire cloud infrastructure could be safely removed.
    • User modifications to Helm chart values could be unintentionally overridden on subsequent terraform apply, which is not aligned with the expected Kubernetes declarative model.
    • Instead, we now document how to install these Helm charts using helm install, giving users direct control over their configurations.

Fixes

  • AWS Load Balancer Controller (LBC) issues
    After upgrading, LBC required additional subnet tagging and IAM policies, causing it to stop functioning. These necessary configurations have been added.

  • Ingress-nginx service misconfiguration

    • The service load balancer type was previously hardcoded to internal, even for public clusters. This has been corrected to allow public-facing configurations where needed.
    • Several outdated annotations have been updated following the upgrade.
  • NVIDIA device plugin controller issues
    The existing setup lacked the correct node tolerations, preventing the controller from working properly. This has been fixed.


This version enhances readability, making it more structured and easier for users to understand. Let me know if you'd like any further refinements! 🚀

Pull request checklist

Please check if your PR fulfills the following requirements:

  • pre-commit has been run
  • Tests for the changes have been added (for bug fixes / features)
  • All tests passing
  • Docs have been reviewed and added / updated if needed (for bug fixes / features)

Pull Request Type

  • Bugfix
  • New feature
  • Refactoring (no functional changes)
  • Documentation change
  • Other (please describe):
    Example updates.

Does this introduce a breaking change?

  • Yes
  • No

The core examples for eks-public and eks-private have been rewritten. These changes will force a new creation of an EKS cluster if you run this as an update to an existing deployment.

@hongchaodeng hongchaodeng requested a review from a team as a code owner February 21, 2025 16:58
@github-actions github-actions bot added documentation Improvements or additions to documentation examples labels Feb 21, 2025
@hongchaodeng hongchaodeng changed the title aws/eks: remove anyscale-eks dep and use official aws-eks module aws/eks: Removing unnecessary abstraction layers, improving flexibility, addressing several issues Feb 21, 2025
Copy link

@brucez-anyscale brucez-anyscale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx. I am not familiar with TF. Will let others review.

brent-anyscale and others added 7 commits February 26, 2025 17:22
* Added additional variables to eks-private and eks-public modules to allow for more customization of the EKS cluster.
* General README cleanup including links for installing requirements in the eks-public (eks-private still needs updating)

Additional updates:
* pre-commit-config updates to latest versions.

Changes to be committed:
	modified:   ../../.pre-commit-config.yaml
	modified:   eks-private/README.md
	modified:   eks-private/eks.tf
	modified:   eks-private/main.tf
	modified:   eks-private/variables.tf
	modified:   eks-public/README.md
	modified:   eks-public/eks.tf
	modified:   eks-public/main.tf
	modified:   eks-public/variables.tf
Changes to be committed:
	modified:   eks-private/README.md
	modified:   eks-public/README.md
- Including sample value files for the helm charts.
- Cleanup of readme for helm chart installation, ensuring that all helm charts are used similarly.
- Included sample value yaml files which can be used with the helm charts.
- Added additional outputs to make it slightly easier to use with helm charts.

Changes to be committed:
	modified:   README.md
	modified:   outputs.tf
	new file:   sample-values_elb.yaml
	new file:   sample-values_nvdp.yaml
* Ensure that all helm chart installation commands are similar
* Ensure that Anyscale Helm Chart examples are correct
* Additional general cleanup

Changes to be committed:
	modified:   README.md
Tested and validated eks-private and eks-public examples.

Changes to be committed:
	modified:   ../eks-private/README.md
	modified:   ../eks-private/main.tf
	modified:   ../eks-private/outputs.tf
	new file:   ../eks-private/sample-values_elb.yaml
	new file:   ../eks-private/sample-values_nvdp.yaml
	modified:   ../eks-private/variables.tf
	modified:   ../eks-public/README.md
	modified:   ../eks-public/main.tf
	modified:   ../eks-public/variables.tf
* Changed to use existing Anyscale SG module
* Removed VPC modules and added parameters for existing VPC and existing subnets
* Readme updates

Changes to be committed:
	modified:   eks-existing/README.md
	modified:   eks-existing/main.tf
	modified:   eks-existing/outputs.tf
	new file:   eks-existing/sample-values_elb.yaml
	new file:   eks-existing/sample-values_nvdp.yaml
	modified:   eks-existing/variables.tf
	modified:   eks-private/README.md
	modified:   eks-public/README.md
	modified:   eks-public/main.tf
Copy link
Collaborator

@brent-anyscale brent-anyscale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - huge thanks for working on this, @hongchaodeng !

* kubectl CLI
* helm CLI

### Creating Anyscale Resources
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Creating Anyscale Resources
### Creating Anyscale AWS Resources

Comment on lines 18 to 29
Create `terraform.tfvars`:

```hcl
aws_region = "us-west-2"
```

Run:

```shell
terraform init
terraform apply
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Create `terraform.tfvars`:
```hcl
aws_region = "us-west-2"
```
Run:
```shell
terraform init
terraform apply
```
Review the `eks.tf` and make any changes necessary for your EKS deployment.
Initialize, plan and apply the Terraform following your companies policies.

terraform apply
```

### Installing K8s Components
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Installing K8s Components
### Installing Kubernetes Components

version = "20.33.1"

# Cluster basic configuration
cluster_name = "anyscale-eks-public"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a variable with a default of anyscale-eks-public (and the same for private). I need to modify this file manually to know which one I'm using.

@brent-anyscale brent-anyscale added the enhancement New feature or request label Mar 7, 2025
@brent-anyscale brent-anyscale merged commit bff81ab into main Mar 7, 2025
3 checks passed
@hongchaodeng hongchaodeng deleted the hdeng/eks branch March 7, 2025 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request examples
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants