Update GPU ec2 instance types in integration tests #8217

Merged: 1 commit into main from update-gpu-test-instances, Feb 13, 2025

Conversation

cheeseandcereal
Member

Description

This PR updates the instance types used in the integration tests so that they no longer use P3 or G4ad instances, which are much older. P3 instances are scheduled for retirement at some point, so we should stop using them.

Replaced with more modern equivalents.

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the userdocs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes
  • (Core team) Added labels for change area (e.g. area/nodegroup) and kind (e.g. kind/improvement)

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

@cheeseandcereal cheeseandcereal added the area/testing and skip-release-notes (causes PR not to show in release notes) labels on Feb 13, 2025
@@ -924,7 +924,7 @@ var _ = Describe("(Integration) Create, Get, Scale & Delete", func() {
 "--timeout=45m",
 "--cluster", params.ClusterName,
 "--nodes", "1",
-"--instance-types", "p3.2xlarge,p3.8xlarge,g3s.xlarge,g4ad.xlarge,g4ad.2xlarge",
+"--instance-types", "g6.xlarge,g6.2xlarge",
Member

This should work, but FYI it might run into insufficient capacity with these instance types.

If we go a bit larger, to 4xlarge, 8xlarge, or 16xlarge, the price difference won't be too drastic given that the test only runs for a short duration, and there will be more available capacity. Customers tend to avoid these sizes because they really need the GPUs, not the added vCPU/memory, so they all aim for the xlarge and 2xlarge sizes.
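For illustration only, here is a minimal Go sketch of how the flag value would differ between the sizes merged in this PR and the larger sizes suggested above. The helper `instanceTypesFlag` is hypothetical and is not part of eksctl or this PR; it only joins a family name and sizes into the comma-separated string passed after `--instance-types`. The G6 size names are assumed from the EC2 naming scheme.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceTypesFlag is a hypothetical helper (not from eksctl or this PR).
// It builds the comma-separated value passed after "--instance-types"
// from an instance family and a list of sizes.
func instanceTypesFlag(family string, sizes ...string) string {
	types := make([]string, 0, len(sizes))
	for _, size := range sizes {
		types = append(types, family+"."+size)
	}
	return strings.Join(types, ",")
}

func main() {
	// Sizes as merged in this PR.
	fmt.Println(instanceTypesFlag("g6", "xlarge", "2xlarge"))
	// Larger sizes suggested in review, which may have more spare capacity.
	fmt.Println(instanceTypesFlag("g6", "4xlarge", "8xlarge", "16xlarge"))
}
```

Listing several sizes lets the nodegroup fall back to whichever type has capacity, which is the point of the suggestion; whether the larger sizes are actually easier to obtain is the reviewer's observation, not something this sketch verifies.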

Member Author

@cheeseandcereal cheeseandcereal Feb 13, 2025

Yeah. I wanted to try using another instance type entirely, but they're either also old or gigantic (like the P5 instances start at 24xl).

We can try this for now and adjust if there are issues.

@cheeseandcereal cheeseandcereal merged commit 7e25a4c into main Feb 13, 2025
11 checks passed
@cheeseandcereal cheeseandcereal deleted the update-gpu-test-instances branch February 13, 2025 21:21