-
Notifications
You must be signed in to change notification settings - Fork 89
Updating Azure Plugin to Support Flexible VMSS #1106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Updating Azure Plugin to Support Flexible VMSS #1106
Conversation
- Updated Azure SDK imports to use the new `azidentity` and `armcompute` packages. - Replaced deprecated methods for creating Azure clients with new client secret credential methods. - Modified `setupAzureClient` to handle client creation and error handling more effectively. - Enhanced `scaleOut` and `scaleIn` methods to support both Uniform and Flexible orchestration modes. - Implemented concurrent processing for VM instance views in Flexible mode to improve performance. - Updated tests to reflect changes in the Azure SDK and added new test cases for instance view processing. - Simplified instance view processing logic to accommodate new SDK structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work @benjamin-lykins. I've left a bunch of comments -- some on small details and some on design. For the smaller comments like contents of debug logging, try to take a pass over the whole PR with those in mind. (Edit: it looks like GitHub is unhelpfully doing the "hidden conversations" thing so make sure you expand that as that includes some of the major design items)
We've added a lot of new code here and it doesn't seem like there's a lot of test coverage. Unfortunately we don't have integration tests so that's challenging, but maybe as you break up some of those large methods I pointed out there'd be a way to extract some of the logic into testable chunks.
Removing Co-authored-by: Tim Gross <[email protected]>
removing comments and cleanup Co-authored-by: Tim Gross <[email protected]>
…/benjamin-lykins/nomad-autoscaler into feat/azure-vmss-support-flexible
…/benjamin-lykins/nomad-autoscaler into feat/azure-vmss-support-flexible
consolidating scale in function and generalizing. tested both uniform and flexible without issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making good progress!
checking for running and provisioned vms.
@tgross Replied back to your latest reviews and put in a series of commits to address the recommendations. I'm working to clean up the functions which get both status and IDs for Flexible VMSS. I may try to consolidate those two functions, or generalize some of it that I can. |
Co-authored-by: Tim Gross <[email protected]>
both uniform & flexible
Fixes: #1096
I will note, this might not be a small first PR to put in. Ended up being a little bit more noisy since I did migrate off the legacy Azure SDK.
Changes
"github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2020-06-01/compute"
"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute"
Supports both flexible and uniform scale sets. Queries the VMSS to get the
type
and then conditionally runs the respective process status/scale in and out processes.Flexible scale sets are a shift from uniform and were unable to query the instance view of the instance. Essentially, grab the list of instances from the VMSS, then use that against the VM (not VMSS) API to get instance status.
When getting instance states, since Flexible VMSS can have up to 2000 instances, I used a goroutine to iterate through them more effectively. Hardcoded at 5 concurrent, might be worth allowing the end user to provide a limit on this, but from my testing with 5 concurrent against 100 instances, it took ~1 second. When I was sequentially pulling instance state, it took ~30 seconds for the same count. I did test limiting to 10 concurrent, but it took it from ~1 second to a little less than a second, so not much gained.
processInstanceView
.I can double check the why, but I believe one of the attributes was not available in the current api.
Admittedly
Test_processInstanceView
used copilot to refactor for the new api.Testing
Flexible VMSS
Uniform VMSS