-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
Is your feature request related to a problem? Please describe.
Right now we have Debug-level log events that describe when a rebalance between two shard regions is imminent:
akka.net/src/contrib/cluster/Akka.Cluster.Sharding/ShardCoordinator.cs
Lines 518 to 527 in 4edb257
| if (!coordinator.RebalanceInProgress.ContainsKey(shard)) | |
| { | |
| if (coordinator.CurrentState.Shards.TryGetValue(shard, out var rebalanceFromRegion)) | |
| { | |
| coordinator.Log.Debug("Rebalance shard [{0}] from [{1}]", shard, rebalanceFromRegion); | |
| coordinator.StartShardRebalanceIfNeeded(shard, rebalanceFromRegion, isRebalance: true); | |
| } | |
| else | |
| coordinator.Log.Debug("Rebalance of non-existing shard [{0}] is ignored", shard); | |
| } |
This will eventually result in the individual Shard actors that are being handed off logging the following message locally:
| shard.Log.Debug("HandOff shard [{0}]", shard.ShardId); |
This is very useful to have if you're running with verbose logging enabled - however, it comes with some limitations:
- Most users don't run with
akka.loglevel=DEBUGso these events get missed; - From a telemetry perspective, it'd be useful to track how frequently this happens and how long it takes to rebalance each shard; and
- It'd be useful to have some programmatic notifications for these events akin to what we receive from Akka.Cluster's membership and reachability events for instance.
The dominant use case for adding this feature is likely going to be telemetry - tracking rebalancing operations over time with strongly typed messages with useful context. This data could of course be constructed via parsing debug-level events in a system like ELK but maybe there's some use cases where it'd be better for us to do this via strongly typed messages.
Each entity actor inside a rebalanced
Shardreceives its own passivation message (which can be customized via configuration) already today. There's no real use-case for this feature to help user's customize how entity actors handle the migration of their data et al - you already have tools to do that through handling of thePassivate/ custom stop message feature built into sharding as-is.In addition that - all
ShardRegionandShardRegionProxyactors that point to shards that are being rebalanced already receive a notification that causes them to buffer and pause message traffic to those shards until they're re-homed. Therefore, there's really no added value for these message types for the purpose of traffic direction either - as that's also already solved.
Describe the solution you'd like
These are local events isolated to the node being rebalanced only; we will not, under any circumstances, add global support so every single node knows about every other node's rebalance.
I think it'd be helpful if each local Shard that was rebalanced published the following two events locally:
ShardRebalanceBegin- a notification of which shard on which node is being rebalanced, the total number of entities affected, and theShardRegiontype name.ShardRebalanceComplete- a notification that the previous operation completed, its status, and the total elapsed duration.
These messages would not be transmitted over the wire and to subscribe these notifications one would have to simply request IShardRebalanceNotification from the EventStream:
Context.System.EventStream.Subscribe(typeof(IShardRebalanceNotification), Context.Self);The events themselves (pseudo-code) :
public interface IShardRebalanceNotification{
string ShardRegion {get;}
string ShardId {get;]
int TotalEntities {get;}
}
public sealed class ShardRebalanceBegin : IShardRebalanceNotification{
public string ShardRegion {get;}
public string ShardId {get;]
public int TotalEntities {get;}
public RebalanceReason {get;} // either Rebalance or Shutdown
}
public sealed class ShardRebalanceComplete : IShardRebalanceNotification{
public string ShardRegion {get;}
public string ShardId {get;]
public int TotalEntities {get;}
public TimeSpan Elapsed {get;}
}We could also, in theory, provide a list to all of the local entity actors that are being rebalanced since that data is available at the time this event is being published.
Describe alternatives you've considered
As mentioned earlier in this post, this data could roughly be extracted from log messages using an external system but that's extremely cumbersome and doesn't really offer any sort of in-proc support for passivation and message handling.
