
[#2506] feat(spark3): Introduce option to enable reorder multi servers for reader #2507


Merged
merged 1 commit into from
Jun 18, 2025

Conversation

zuston
Member

@zuston zuston commented Jun 18, 2025

What changes were proposed in this pull request?

Introduce an option that lets the reader reorder the shuffle servers when a partition is spread across multiple servers.

Why are the changes needed?

If partition splitting is enabled, large partitions will be distributed across multiple shuffle servers. With the help of Spark AQE (Adaptive Query Execution), these large partitions will be processed by multiple tasks.

In this case, all split tasks may read from the same set of shuffle servers in the same sequence, which can put high concurrent RPC pressure on specific servers.

This PR introduces the ability to randomly reorder the underlying shuffle servers to achieve better load balancing during reading.
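The idea can be illustrated with a minimal sketch. It is not the code from this PR: the class `ReorderSketch`, the method `readOrder`, and the server names are hypothetical, and the sketch only assumes that each reader task holds a list of shuffle servers for its partition and visits them in list order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReorderSketch {

  // Hypothetical helper: returns the order in which one reader task visits
  // the shuffle servers that hold pieces of a split partition.
  static List<String> readOrder(List<String> servers, boolean reorderEnabled, Random random) {
    // Copy first so the caller's list is never modified.
    List<String> order = new ArrayList<>(servers);
    if (reorderEnabled && order.size() > 1) {
      // Each task shuffles independently, so concurrent tasks are unlikely
      // to all start with the same server.
      Collections.shuffle(order, random);
    }
    return order;
  }

  public static void main(String[] args) {
    List<String> servers = List.of("server-1", "server-2", "server-3");
    // Option disabled: every task sees the original order.
    System.out.println(readOrder(servers, false, new Random()));
    // Option enabled: the order varies from task to task.
    System.out.println(readOrder(servers, true, new Random()));
  }
}
```

With reordering disabled, every split task starts with the same first server; with it enabled, the first requests are spread across the servers that hold the partition.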

Does this PR introduce any user-facing change?

Yes.

rss.client.read.reorderMultiServersEnable=false
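As a rough illustration of how a client might interpret this option, the sketch below reads the key from a plain string map and falls back to `false` when it is absent. It assumes the `=false` shown above is the default value; the class `ReorderOptionSketch` and the method `isReorderEnabled` are hypothetical and do not correspond to Uniffle's actual configuration classes.

```java
import java.util.Map;

public class ReorderOptionSketch {

  // Key name taken from the PR description.
  static final String KEY = "rss.client.read.reorderMultiServersEnable";

  // Hypothetical helper: treats a missing key as "false" (assumed default).
  static boolean isReorderEnabled(Map<String, String> conf) {
    return Boolean.parseBoolean(conf.getOrDefault(KEY, "false"));
  }

  public static void main(String[] args) {
    System.out.println(isReorderEnabled(Map.of()));            // key absent
    System.out.println(isReorderEnabled(Map.of(KEY, "true"))); // explicitly enabled
  }
}
```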

How was this patch tested?

No tests needed.

@zuston zuston requested a review from jerqi June 18, 2025 04:08

Test Results

 3 049 files  ±0   3 049 suites  ±0   6h 48m 33s ⏱️ +53s
 1 186 tests ±0   1 185 ✅ ±0   1 💤 ±0  0 ❌ ±0 
15 042 runs  ±0  15 027 ✅ ±0  15 💤 ±0  0 ❌ ±0 

Results for commit a5f48f9. ± Comparison against base commit 22be628.

@zuston zuston merged commit a69930d into apache:master Jun 18, 2025
41 checks passed
@zuston zuston deleted the readRandom branch June 18, 2025 09:19
zuston added a commit to zuston/incubator-uniffle that referenced this pull request Jul 2, 2025
…servers for reader (apache#2507)
