feat: add block-level partition shuffle #19311

dantengsky · 2026-01-22T02:10:57Z

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Add heuristic-based block-level shuffle for better load balancing when tables have few segments relative to cluster size.

Background

In distributed queries, when a table has few segments relative to the cluster size, the original segment-level Mod distribution strategy spreads work unevenly across nodes.

Problem scenarios:

3 segments, 4 nodes, 300 blocks → Segments per node: 1-1-1-0 (one node idle while the other three each process 100 blocks instead of an even 75)
10 segments, 4 nodes, 1000 blocks → Segments per node: 3-3-2-2 (the busiest nodes process 50% more blocks than the least loaded ones)
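
The numbers in these scenarios can be reproduced with a minimal sketch. It assumes that segment-level Mod distribution sends segment i to node i % num_nodes and that every segment holds the same number of blocks (100 here, matching 300 blocks over 3 segments). The function names are hypothetical and are not taken from the Databend codebase.

```python
def segments_per_node(segment_count: int, num_nodes: int) -> list[int]:
    """Count segments per node, assuming segment i goes to node i % num_nodes."""
    counts = [0] * num_nodes
    for segment_idx in range(segment_count):
        counts[segment_idx % num_nodes] += 1
    return counts


def blocks_per_node(segment_count: int, num_nodes: int, blocks_per_segment: int) -> list[int]:
    """Blocks each node reads, assuming equally sized segments (an assumption)."""
    return [n * blocks_per_segment for n in segments_per_node(segment_count, num_nodes)]


print(segments_per_node(3, 4))      # [1, 1, 1, 0]
print(blocks_per_node(3, 4, 100))   # [100, 100, 100, 0]
print(segments_per_node(10, 4))     # [3, 3, 2, 2]
print(blocks_per_node(10, 4, 100))  # [300, 300, 200, 200]
```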

Solution

Introduce an automatic block-level distribution heuristic:

Trigger condition: activates when segment_count < cluster_nodes * threshold
Distribution method: all segments are assigned to all nodes; each node keeps only the blocks where block_idx % num_nodes == node_idx
Setting: auto_block_shuffle_threshold (default: 5, set to 0 to disable)
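
As an illustration only, the trigger condition and the per-node block filter might look like the sketch below. It assumes block_idx is a single running index over all blocks of the table; the PR description does not say whether the index is global or per segment. Both function names are hypothetical.

```python
def use_block_shuffle(segment_count: int, num_nodes: int, threshold: int = 5) -> bool:
    """Trigger condition from the description: segment_count < nodes * threshold.

    With threshold = 0 the right-hand side is 0, so the condition never holds
    and block-level shuffle stays disabled.
    """
    return segment_count < num_nodes * threshold


def blocks_for_node(block_count: int, num_nodes: int, node_idx: int) -> list[int]:
    """Blocks kept by one node: those with block_idx % num_nodes == node_idx.

    Assumes block_idx runs from 0 to block_count - 1 across the whole table.
    """
    return [b for b in range(block_count) if b % num_nodes == node_idx]


# 3 segments on 4 nodes: 3 < 4 * 5, so the heuristic switches to block level.
if use_block_shuffle(segment_count=3, num_nodes=4):
    print([len(blocks_for_node(300, 4, node)) for node in range(4)])  # [75, 75, 75, 75]
```

Under this assumption every block is kept by exactly one node, since each index has exactly one remainder modulo num_nodes.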

How it improves

┌───────────────────────────────────┬────────────────────────────────────────────┬─────────────────────────┐
│             Scenario              │               Original (Mod)               │     New (BlockMod)      │
├───────────────────────────────────┼────────────────────────────────────────────┼─────────────────────────┤
│ 3 segments, 4 nodes, 300 blocks   │ Segments: 1-1-1-0, blocks: 100-100-100-0   │ Blocks: 75-75-75-75     │
├───────────────────────────────────┼────────────────────────────────────────────┼─────────────────────────┤
│ 10 segments, 4 nodes, 1000 blocks │ Segments: 3-3-2-2, blocks: 300-300-200-200 │ Blocks: 250-250-250-250 │
└───────────────────────────────────┴────────────────────────────────────────────┴─────────────────────────┘

With block-level distribution, each node receives about the same number of blocks regardless of segment count.

New Settings

-- View current threshold
SELECT value FROM system.settings WHERE name = 'auto_block_shuffle_threshold';

-- Adjust threshold (block-level distribution is used when segment_count < nodes * threshold)
SET auto_block_shuffle_threshold = 5; -- default

-- Disable automatic block-level distribution
SET auto_block_shuffle_threshold = 0;

Changes

Add BlockMod shuffle kind for block-level distribution
Add auto_block_shuffle_threshold setting (default=5, 0 to disable)
When segment_count < nodes * threshold, use block-level shuffle
Each executor filters blocks by block_idx % num_executors == executor_idx
Add info logging for shuffle strategy selection

Tests

Unit Test
Logic Test
Benchmark Test
No Test - Explain why

Type of change

Bug Fix (non-breaking change which fixes an issue)
New Feature (non-breaking change which adds functionality)
Breaking Change (fix or feature that could cause existing functionality not to work as expected)
Documentation Update
Refactoring
Performance Improvement
Other (please describe):

github-actions · 2026-01-22T03:16:53Z

Docker Image for PR

tag: pr-19311-14cf1ad-1769051528

note: this image tag is only available for internal use.

github-actions · 2026-01-22T07:18:58Z

Docker Image for PR

tag: pr-19311-27fa899-1769066148

note: this image tag is only available for internal use.

github-actions · 2026-01-22T08:10:51Z

ClickBench Report

hits: https://benchmark.databend.com/clickbench/pr/19311/21238603203/hits.html
tpch100: https://benchmark.databend.com/clickbench/pr/19311/21238603203/tpch100.html
tpch1000: https://benchmark.databend.com/clickbench/pr/19311/21238603203/tpch1000.html

github-actions · 2026-01-22T08:34:36Z

🤖 CI Job Analysis

Workflow: 21243076599

⛔️ CANCELLED

Higher priority request detected - retry cancelled to avoid conflicts.

Add heuristic-based block-level shuffle for better load balancing when tables have few segments relative to cluster size.
Changes:
- Add BlockMod shuffle kind for block-level distribution
- Add auto_block_shuffle_threshold setting (default=5, 0 to disable)
- When segment_count < nodes * threshold, use block-level shuffle
- Each executor filters blocks by block_idx % num_executors == executor_idx
- Add info logging for shuffle strategy selection
- Preserve partition kind during reshuffle to prevent data duplication

Move block_slot computation from executor-side (prune_segments_with_pipeline) to coordinator-side (redistribute_source_fragment). This ensures all executors use the same cluster view that was determined when the plan was created, preventing data duplication or loss if cluster membership changes.
Changes:
- Add block_slot field to DataSourcePlan
- Compute block_slot in redistribute_source_fragment for BlockMod shuffle
- Pass block_slot through plan instead of computing at execution time
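
A rough sketch of why the coordinator-side computation helps, written in Python for brevity rather than in the project's own language. Only the names block_slot, DataSourcePlan, and redistribute_source_fragment come from the commit message; the BlockSlot layout (slot, num_slots), the keep_block helper, and all signatures are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockSlot:
    slot: int       # this executor's index in the coordinator's cluster view (assumed layout)
    num_slots: int  # number of executors in that same view (assumed layout)


@dataclass(frozen=True)
class DataSourcePlan:
    # None means no block-level filtering for this plan.
    block_slot: Optional[BlockSlot] = None


def redistribute_source_fragment(executors: list[str]) -> dict[str, DataSourcePlan]:
    """Coordinator side: fix every executor's slot once, from a single cluster view."""
    num_slots = len(executors)
    return {
        name: DataSourcePlan(block_slot=BlockSlot(slot=i, num_slots=num_slots))
        for i, name in enumerate(executors)
    }


def keep_block(plan: DataSourcePlan, block_idx: int) -> bool:
    """Executor side: only read the slot carried in the plan, never recompute it."""
    if plan.block_slot is None:
        return True
    return block_idx % plan.block_slot.num_slots == plan.block_slot.slot


plans = redistribute_source_fragment(["node-a", "node-b", "node-c"])
# Each block is claimed by exactly one executor because all plans share num_slots.
print(all(sum(keep_block(p, b) for p in plans.values()) == 1 for b in range(10)))  # True
```

If each executor derived num_slots from its own view of the cluster at execution time, two executors could disagree on the modulus, and a block could then be read twice or not at all.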

…castWarehouse
Block filtering is now controlled by plan.block_slot, not by partition kind. After reshuffle, all executors just process partitions sequentially. Also revert incorrect change in memory_table.rs (should use BroadcastCluster).

github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) Jan 22, 2026

dantengsky added the ci-cloud label (Build docker image for cloud test) Jan 22, 2026

dantengsky force-pushed the feat/block-level-partition-shuffle branch from 5c7075e to c0a0d92 January 22, 2026 06:30

dantengsky added the ci-benchmark-cloud label (Benchmark: run only cloud tests for tpch/hits) Jan 22, 2026

dantengsky force-pushed the feat/block-level-partition-shuffle branch 2 times, most recently from ade09d8 to 9764f7a January 22, 2026 08:33

dantengsky force-pushed the feat/block-level-partition-shuffle branch from 9764f7a to cedc7b5 January 22, 2026 08:35

dantengsky added 3 commits January 22, 2026 16:52

fix: restore BroadcastCluster test case and add BlockMod test

11044bf
