feat: add block-level partition shuffle #19311
Draft
dantengsky
wants to merge
4
commits into
databendlabs:main
from
dantengsky:feat/block-level-partition-shuffle
+115
−5
5c7075e to
c0a0d92
ade09d8 to
9764f7a
Add heuristic-based block-level shuffle for better load balancing when tables have few segments relative to cluster size.
Changes:
- Add BlockMod shuffle kind for block-level distribution
- Add auto_block_shuffle_threshold setting (default=5, 0 to disable)
- When segment_count < nodes * threshold, use block-level shuffle
- Each executor filters blocks by block_idx % num_executors == executor_idx
- Add info logging for shuffle strategy selection
- Preserve partition kind during reshuffle to prevent data duplication
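The heuristic and the per-executor filter described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names `use_block_level_shuffle` and `blocks_for_executor` are hypothetical, and the real implementation operates on Databend's partition and plan types rather than plain integers.

```rust
/// Hypothetical helper (not from the PR): decide whether to switch from
/// segment-level to block-level shuffle.
/// Mirrors the rule `segment_count < nodes * threshold`; a threshold of 0
/// disables the heuristic.
fn use_block_level_shuffle(segment_count: usize, num_nodes: usize, threshold: usize) -> bool {
    threshold > 0 && segment_count < num_nodes.saturating_mul(threshold)
}

/// Hypothetical helper (not from the PR): indices of the blocks one executor
/// keeps under the rule `block_idx % num_executors == executor_idx`.
fn blocks_for_executor(num_blocks: usize, executor_idx: usize, num_executors: usize) -> Vec<usize> {
    if num_executors == 0 {
        // No executors means nothing can be assigned; avoids a modulo by zero.
        return Vec::new();
    }
    (0..num_blocks)
        .filter(|block_idx| block_idx % num_executors == executor_idx)
        .collect()
}
```

With 4 nodes and the default threshold of 5, a table with 3 segments would take the block-level path (3 < 20), while a table with 20 segments would stay on segment-level distribution.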
9764f7a to
cedc7b5
Move block_slot computation from executor-side (prune_segments_with_pipeline) to coordinator-side (redistribute_source_fragment). This ensures all executors use the same cluster view that was determined when the plan was created, preventing data duplication or loss if cluster membership changes.
Changes:
- Add block_slot field to DataSourcePlan
- Compute block_slot in redistribute_source_fragment for BlockMod shuffle
- Pass block_slot through plan instead of computing at execution time
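To illustrate why fixing the slot on the coordinator avoids duplication or loss, here is a small sketch under stated assumptions. The `BlockSlot` and `SourcePlan` types and both functions are hypothetical stand-ins; the field layout of the real `block_slot` on `DataSourcePlan` is not shown in this PR text.

```rust
/// Hypothetical stand-in for the slot carried in the plan: which slot this
/// executor owns, out of how many slots in total.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockSlot {
    slot: usize,
    num_slots: usize,
}

/// Hypothetical, heavily reduced stand-in for a per-executor source plan.
#[derive(Debug, Clone)]
struct SourcePlan {
    block_slot: Option<BlockSlot>,
}

/// Coordinator side: assign slots once, from a single snapshot of the cluster.
/// Every plan carries the same `num_slots`, so the slots partition the blocks.
fn plans_for_executors(executors: &[&str]) -> Vec<(String, SourcePlan)> {
    let num_slots = executors.len();
    executors
        .iter()
        .enumerate()
        .map(|(slot, name)| {
            let plan = SourcePlan {
                block_slot: Some(BlockSlot { slot, num_slots }),
            };
            (name.to_string(), plan)
        })
        .collect()
}

/// Executor side: filter only by what the plan says, never by the executor's
/// own (possibly newer) view of cluster membership.
fn keep_block(plan: &SourcePlan, block_idx: usize) -> bool {
    match plan.block_slot {
        Some(s) => block_idx % s.num_slots == s.slot,
        None => true, // no slot in the plan: process every block
    }
}
```

Because `num_slots` is decided once and shipped inside the plan, every block index matches exactly one executor's slot, even if a node joins or leaves between planning and execution.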
…castWarehouse Block filtering is now controlled by plan.block_slot, not by partition kind. After reshuffle, all executors just process partitions sequentially. Also revert incorrect change in memory_table.rs (should use BroadcastCluster).
Labels
ci-benchmark-cloud
Benchmark: run only cloud tests for tpch/hits
ci-cloud
Build docker image for cloud test
pr-feature
this PR introduces a new feature to the codebase
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Add heuristic-based block-level shuffle for better load balancing when tables have few segments relative to cluster size.
Background
In distributed queries, when a table has few segments relative to the cluster size, the original segment-level Mod distribution strategy spreads work unevenly: some nodes receive whole segments while others receive little or nothing.
Problem scenarios:
Solution
Introduce an automatic block-level distribution heuristic:
auto_block_shuffle_threshold (default: 5, set to 0 to disable)
How it improves
With block-level distribution, workload is evenly distributed regardless of segment count.
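A small worked comparison makes the difference concrete. The sketch below is illustrative only: both function names are hypothetical, it assumes `num_nodes > 0`, and it assumes blocks are numbered globally across segments, which the PR text does not specify.

```rust
/// Blocks per node when whole segments are assigned by `segment_idx % num_nodes`.
/// Assumes `num_nodes > 0`.
fn segment_level_load(blocks_per_segment: &[usize], num_nodes: usize) -> Vec<usize> {
    let mut load = vec![0; num_nodes];
    for (segment_idx, blocks) in blocks_per_segment.iter().enumerate() {
        load[segment_idx % num_nodes] += blocks;
    }
    load
}

/// Blocks per node when individual blocks are assigned by `block_idx % num_nodes`.
/// Assumes `num_nodes > 0` and a global block numbering (an assumption made
/// for this illustration).
fn block_level_load(blocks_per_segment: &[usize], num_nodes: usize) -> Vec<usize> {
    let total_blocks: usize = blocks_per_segment.iter().sum();
    let mut load = vec![0; num_nodes];
    for block_idx in 0..total_blocks {
        load[block_idx % num_nodes] += 1;
    }
    load
}
```

For a table with two segments of 100 blocks each on a 4-node cluster, `segment_level_load(&[100, 100], 4)` gives `[100, 100, 0, 0]`, leaving two nodes idle, while `block_level_load(&[100, 100], 4)` gives `[50, 50, 50, 50]`.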
New Settings
-- View current threshold
SELECT value FROM system.settings WHERE name = 'auto_block_shuffle_threshold';
-- Adjust threshold (block-level distribution is used when segment_count < nodes * threshold)
SET auto_block_shuffle_threshold = 5; -- default
-- Disable automatic block-level distribution
SET auto_block_shuffle_threshold = 0;
Changes
Tests
Type of change