Description
Background information:
- The original problems with the old algorithm;
- The fix to the claim/join algorithm;
- The PR to fix the leave algorithm;
- The original rack awareness PR.
 
When joining a node, the following algorithms are attempted:
1 - A basic attempt to satisfy wants (vnodes required by the joining node) by asking node-by-node which vnodes can be passed on without breaking target_n_val (the claim_v2 algorithm).
2 - If Step 1 is unsuccessful, then attempt to stripe all vnodes across all nodes (the sequential_claim algorithm).
3 - If Step 2 creates tail violations (i.e. if 0 < RingSize rem NodeCount < TargetNVal), resolve through the solve_tail_violations algorithm.
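
To illustrate the tail-violation condition in Step 3, here is a minimal Python sketch (not the riak_core implementation). The function names `sequential_stripe`, `has_tail_violation` and `meets_target_n_val` are hypothetical, and the sketch assumes a plain round-robin stripe with at least target_n_val nodes.

```python
def sequential_stripe(ring_size, nodes):
    """Assign partitions to nodes round-robin (a simplified stand-in for striping)."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def has_tail_violation(ring_size, node_count, target_n_val):
    """The condition quoted in Step 3: 0 < RingSize rem NodeCount < TargetNVal."""
    return 0 < ring_size % node_count < target_n_val


def meets_target_n_val(owners, target_n_val):
    """True if every run of target_n_val consecutive partitions, wrapping
    around the end of the ring, is owned by distinct nodes."""
    n = len(owners)
    for start in range(n):
        window = [owners[(start + k) % n] for k in range(target_n_val)]
        if len(set(window)) < target_n_val:
            return False
    return True


if __name__ == "__main__":
    for node_count in (5, 7, 8):
        owners = sequential_stripe(64, list(range(node_count)))
        print(node_count,
              has_tail_violation(64, node_count, 4),
              meets_target_n_val(owners, 4))
```

With a ring size of 64 and target_n_val of 4, seven nodes leave a remainder of 1, so the last partition and the first partition share an owner across the wrap-around. Five or eight nodes leave a remainder of 4 or 0, and the plain stripe already meets target_n_val.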
When leaving a node, the following algorithms are attempted:
1 - A basic attempt to perform a simple_transfer (vnodes are passed in turn to nodes that would not break target_n_val).
2 - Use sequential_claim as in join.
3 - Use solve_tail_violations extension to sequential_claim as in join.
Ideally, in both cases Step 1 should succeed, as Step 2 will inevitably lead to a full cluster reorganisation (and hence a large volume of transfers).
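
The idea behind Step 1 on leave can be sketched as below. This is only an illustration of "pass each vnode to a node that would not break target_n_val", not the actual simple_transfer code. The function name `simple_transfer_sketch` and the least-loaded tie-break are assumptions, and a `None` result stands in for falling back to Step 2.

```python
def simple_transfer_sketch(owners, leaving, target_n_val):
    """Reassign each partition owned by `leaving` to a remaining node that
    does not own any partition within target_n_val - 1 positions either side.
    Returns the new owner list, or None if some partition has no safe
    candidate (i.e. a fallback to a full re-stripe would be needed)."""
    n = len(owners)
    new_owners = list(owners)
    remaining = sorted(set(owners) - {leaving})
    for idx in range(n):
        if new_owners[idx] != leaving:
            continue
        # Owners of the neighbouring partitions, wrapping around the ring.
        neighbours = {
            new_owners[(idx + d) % n]
            for d in range(-(target_n_val - 1), target_n_val)
            if d != 0
        }
        candidates = [c for c in remaining if c not in neighbours]
        if not candidates:
            return None
        # Assumed tie-break: prefer the node that currently owns the fewest partitions.
        new_owners[idx] = min(candidates, key=new_owners.count)
    return new_owners


if __name__ == "__main__":
    ring = list("abcdefgh") * 2          # 16 partitions striped over 8 nodes
    print(simple_transfer_sketch(ring, "h", 3))
    small = list("abcd") * 2             # 8 partitions striped over 4 nodes
    print(simple_transfer_sketch(small, "d", 3))
```

In the first case every vnode of the leaving node finds a safe new owner, so only those vnodes move. In the second case there is no safe candidate, which is the kind of situation that would force a full reorganisation.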
As part of #967, location awareness was added to the sequential_claim algorithm (Step 2).
This issue documents an ongoing investigation into these three problems:
- Under what conditions does the `sequential_claim` algorithm (both with and without the need for the `solve_tail_violations` algorithm) provide a location-safe cluster?
- Can the `claim_v2` (Step 1 for joins) and `simple_transfer` (Step 1 for leaves) algorithms be extended to be location aware?
- Can the `claim_v2` and `simple_transfer` algorithms be extended to reduce the scenarios in which cluster changes fall back to `sequential_claim`?
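
For experimenting with the first question, a check along the following lines may be useful. This issue does not define "location safe", so the sketch assumes one possible reading: every run of target_n_val consecutive partitions, wrapping around the ring, covers target_n_val distinct locations. The function name `is_location_safe` and the `node_location` mapping are hypothetical.

```python
def is_location_safe(owners, node_location, target_n_val):
    """Assumed definition: each window of target_n_val consecutive partitions
    (wrapping around the ring) is spread over target_n_val distinct locations."""
    n = len(owners)
    for start in range(n):
        locations = {
            node_location[owners[(start + k) % n]]
            for k in range(target_n_val)
        }
        if len(locations) < target_n_val:
            return False
    return True


if __name__ == "__main__":
    owners = ["n1", "n2", "n3", "n4"] * 4   # 16 partitions striped over 4 nodes
    spread = {"n1": "A", "n2": "B", "n3": "C", "n4": "D"}
    paired = {"n1": "A", "n2": "A", "n3": "B", "n4": "C"}
    print(is_location_safe(owners, spread, 3))
    print(is_location_safe(owners, paired, 3))
```

Under this assumed definition a cluster with fewer locations than target_n_val can never pass, so a weaker criterion would be needed for such clusters.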