-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Add Node Weight to GetDesiredBalance #131025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add Node Weight to GetDesiredBalance #131025
Conversation
Extends the `_internal/desired_balance` API to return the node weights. These are found within the `DesiredBalanceResponse.ClusterBalanceStats .NodeBalanceStats` object. If no node weights have been calculated, then this value defaults to 0. Issue: elastic#126579
78aa356
to
aeb59b2
Compare
3ed1d0d
to
3ac4def
Compare
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. Looks promising. I left some comments.
...rc/main/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterBalanceStats.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterBalanceStats.java
Outdated
Show resolved
Hide resolved
...st/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterBalanceStatsTests.java
Outdated
Show resolved
Hide resolved
- requires: | ||
capabilities: | ||
- method: GET | ||
path: _internal/desired_balance | ||
capabilities: [ cluster_balance-node_balance_stats-node_weights_returned ] | ||
test_runner_features: [ capabilities ] | ||
reason: "Node weights returned in the node balance stats was added in version 9.2.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not add this requires
section to existing tests since (1) they do not test the new field and (2) it reduces the test coverage for old versions. Instead we should add a new test with this requires
section and checks the new field in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had issues with the CI tests failing without this - will wait for the CI tests to run on my most recent commit and try and debug from there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised that the existing tests would fail without it. If you share a buildscan of the failure, I can also help looking into it.
...r/src/main/java/org/elasticsearch/rest/action/admin/cluster/RestGetDesiredBalanceAction.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterBalanceStats.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great. I only had minor comments.
...st/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterBalanceStatsTests.java
Outdated
Show resolved
Hide resolved
- requires: | ||
capabilities: | ||
- method: GET | ||
path: _internal/desired_balance | ||
capabilities: [ cluster_balance-node_balance_stats-node_weights_returned ] | ||
test_runner_features: [ capabilities ] | ||
reason: "Node weights returned in the node balance stats was added in version 9.2.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised that the existing tests would fail without it. If you share a buildscan of the failure, I can also help looking into it.
129ab38
to
d255680
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more minor comments.
...-api-spec/src/yamlRestTest/resources/rest-api-spec/test/cluster.desired_balance/10_basic.yml
Outdated
Show resolved
Hide resolved
test/framework/src/main/java/org/elasticsearch/test/ESTestCase.java
Outdated
Show resolved
Hide resolved
...st/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterBalanceStatsTests.java
Outdated
Show resolved
Hide resolved
9b99150
to
08d6cc8
Compare
594fc20
to
cd5e8f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great. Some minor comments over testing code.
.../test/java/org/elasticsearch/cluster/routing/allocation/allocator/NodeBalanceStatsTests.java
Outdated
Show resolved
Hide resolved
public void testSerializationWithTransportVersionV_8_7_0() throws IOException { | ||
ClusterBalanceStats.NodeBalanceStats instance = createTestInstance(); | ||
// Serialization changes based on this version | ||
TransportVersion oldVersion = TransportVersions.V_8_7_0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's common to randomize from a broader version range for the oldVersion, i.e.:
TransportVersion oldVersion = TransportVersions.V_8_7_0; | |
final var oldVersion = TransportVersionUtils.randomVersionBetween( | |
random(), | |
TransportVersions.MINIMUM_COMPATIBLE, | |
TransportVersionUtils.getPreviousVersion(TransportVersions.V_8_8_0) | |
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 4a7ec19
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, so MINIMUM_COMPATIBLE
refers to Elastic version 8.19. Therefore any transport version less than this can be safely removed, as per the comment on line 51 of this file, and the Jira ticket https://elasticco.atlassian.net/browse/ES-10337.
Therefore, as a follow up to this PR, I will remove all references to TransportVersion.V_8_8_0
and TransportVersions.V_8_12_0
in the NodeBalanceStats.readFrom()
method (and then delete these tests since they will then not be necessary).
Until that time, I have just randomly generated a TransportVersion between V_8_0_0
and V_8_8_0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah you are right. 👍
.../test/java/org/elasticsearch/cluster/routing/allocation/allocator/NodeBalanceStatsTests.java
Outdated
Show resolved
Hide resolved
.../test/java/org/elasticsearch/cluster/routing/allocation/allocator/NodeBalanceStatsTests.java
Outdated
Show resolved
Hide resolved
.../test/java/org/elasticsearch/cluster/routing/allocation/allocator/NodeBalanceStatsTests.java
Outdated
Show resolved
Hide resolved
.../test/java/org/elasticsearch/cluster/routing/allocation/allocator/NodeBalanceStatsTests.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/cluster/routing/allocation/allocator/ClusterBalanceStats.java
Outdated
Show resolved
Hide resolved
3955ca7
to
72635ca
Compare
❌ Author of the following commits did not sign a Contributor Agreement: Please, read and sign the above mentioned agreement if you want to contribute to this project |
dca823a
to
dfe3506
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
One comment about handling not-yet-included node is somewhat important and please address it. Thanks for the iterations!
@@ -300,18 +372,45 @@ private static ClusterState createClusterState(List<DiscoveryNode> nodes, List<T | |||
.build(); | |||
} | |||
|
|||
private static DesiredBalance createDesiredBalance(ClusterState state) { | |||
private static DesiredBalance createDesiredBalanceWithEmptyNodeWeights(ClusterState state) { | |||
return createDesiredBalance(state, randomDoubleBetween(-1, 1, true), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: seems wasteful to randomize given its unused
return createDesiredBalance(state, randomDoubleBetween(-1, 1, true), true); | |
return createDesiredBalance(state, 0, true); |
Double nodeWeight = desiredBalance.weightsPerNode().isEmpty() | ||
? null | ||
: desiredBalance.weightsPerNode().get(routingNode.node()).nodeWeight(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two suggestions
- We can add
assert desiredBalance != null
before this line to make the intention clear that it should not benull
. Otherwise if it throws a NPE in some future tests, it would make people pause and wonder whether null handling is needed here. - Desired balance is compueted asynchrounously. It is possible that a new node from the latest cluster state is not yet included. So we need to count for it as well, e.g. something like:
Double nodeWeight = Optional.of(desiredBalance.weightsPerNode().get(routingNode.node())
.map(NodeWeightStats::nodeWeight)
.orElse(null);
Extends the
_internal/desired_balance
API to return the node weights. These are found within theDesiredBalanceResponse.ClusterBalanceStats.NodeBalanceStats
object. If no node weights have been calculated, then this value defaults to 0.Github issue: #126579
Jira Ticket: ES-11546