[#2537] feat(spark): Introduce option to activate small cache in grpc server #2538

Merged
merged 3 commits into from
Jul 8, 2025


zuston
Member

@zuston zuston commented Jul 7, 2025

What changes were proposed in this pull request?

Introduce a config option to enable the small cache in the gRPC server.

Why are the changes needed?

for #2537

When partition reassignment is enabled in the production environment, we observed that some Spark jobs failed due to gRPC request timeouts (DEADLINE_EXCEEDED). Upon investigating the Spark driver logs, we found severe GC events, indicating significant memory pressure on the driver process.

Based on PR #1780, the small cache appears to be effective in gRPC mode.

This PR aims to enable the small cache by default, because GRPC_NETTY is now the default RPC mode.

Does this PR introduce any user-facing change?

Yes.

rss.rpc.netty.smallCacheEnabled=true
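
As a rough illustration of how such a flag might be consumed, the sketch below reads `rss.rpc.netty.smallCacheEnabled` from a `java.util.Properties` object and turns it into a cache size for the allocator. This is a model under stated assumptions, not the code in this PR: the class and method names are hypothetical, the `false` fallback mirrors the `defaultValue(false)` visible in the diff excerpt quoted in the review thread, and 256 is assumed to match Netty's default small cache size.

```java
import java.util.Properties;

// Hypothetical sketch; the names below are illustrative and not part of Uniffle.
final class SmallCacheFlagSketch {
  // Key taken from the PR description.
  static final String KEY = "rss.rpc.netty.smallCacheEnabled";
  // Assumption: mirrors Netty's default for io.netty.allocator.smallCacheSize.
  static final int ASSUMED_SMALL_CACHE_SIZE = 256;

  // Reads the flag, falling back to false when the key is absent.
  static boolean isSmallCacheEnabled(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
  }

  // In this model, a cache size of 0 stands for "small cache disabled".
  static int smallCacheSize(Properties conf) {
    return isSmallCacheEnabled(conf) ? ASSUMED_SMALL_CACHE_SIZE : 0;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println("unset: " + smallCacheSize(conf));
    conf.setProperty(KEY, "true");
    System.out.println("true:  " + smallCacheSize(conf));
  }
}
```

Keeping the flag lookup separate from the size decision makes it easy to swap in a different default without touching the allocator setup.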

How was this patch tested?

Existing unit tests.

@zuston zuston linked an issue Jul 7, 2025 that may be closed by this pull request
3 tasks

github-actions bot commented Jul 7, 2025

Test Results

 3 072 files  ±0   3 072 suites  ±0   6h 50m 47s ⏱️ + 2m 7s
 1 190 tests ±0   1 189 ✅ ±0   1 💤 ±0  0 ❌ ±0 
15 090 runs  ±0  15 075 ✅ ±0  15 💤 ±0  0 ❌ ±0 

Results for commit b277b01. ± Comparison against base commit 77e6ab1.

♻️ This comment has been updated with latest results.

@jerqi jerqi requested a review from rickyma July 7, 2025 06:45
@jerqi
Contributor

jerqi commented Jul 7, 2025

@rickyma Could you take a look at this PR?

.booleanType()
.defaultValue(false)
.withDescription(
"The option to control whether enable the pooled byte buf allocator small cache. This is only valid for spark driver side grpc server");
Contributor

Option to control whether to enable the small cache in the pooled byte buffer allocator. This option is only applicable to the gRPC server on the Spark driver side.

Contributor

Also, we need to describe this new config in docs.

@@ -102,14 +103,17 @@ static void reset() {
}

private Server buildGrpcServer(int serverPort) {
boolean isClientSmallCacheEnabled =
Contributor

@rickyma rickyma Jul 7, 2025

Maybe we can also put this config into RssBaseConf, since all the other configs are in it?
In that case, we should rename it to isSmallCacheEnabled so that it can be used by both clients and servers.
Then we need to update its description as well.

Member Author

Sounds good. This option could be extended to cover the #1780 requirements.
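
To make the suggestion in this thread concrete, here is a minimal sketch of a single option definition that both a client and a server could resolve against their own settings. It is a simplified stand-in, not Uniffle's config classes: `BoolOptionSketch` and `resolve` are hypothetical names, and the sketch assumes the key from the PR description is kept after the rename.

```java
import java.util.Map;

// Hypothetical stand-in for a typed config option; not Uniffle's ConfigOption class.
final class BoolOptionSketch {
  final String key;
  final boolean defaultValue;
  final String description;

  BoolOptionSketch(String key, boolean defaultValue, String description) {
    this.key = key;
    this.defaultValue = defaultValue;
    this.description = description;
  }

  // Resolves the option against raw string settings, using the default when unset.
  boolean resolve(Map<String, String> settings) {
    String raw = settings.get(key);
    return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
  }

  // One shared definition, as suggested in the review, rather than a client-only one.
  // Assumption: the key from the PR description is unchanged.
  static final BoolOptionSketch SMALL_CACHE_ENABLED =
      new BoolOptionSketch(
          "rss.rpc.netty.smallCacheEnabled",
          false,
          "Option to control whether to enable the small cache in the pooled byte buffer allocator.");

  public static void main(String[] args) {
    Map<String, String> clientSettings = Map.of();
    Map<String, String> serverSettings = Map.of("rss.rpc.netty.smallCacheEnabled", "true");
    System.out.println("client: " + SMALL_CACHE_ENABLED.resolve(clientSettings));
    System.out.println("server: " + SMALL_CACHE_ENABLED.resolve(serverSettings));
  }
}
```

Defining the option once means the key, default, and description cannot drift apart between the client and server code paths.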

@zuston
Member Author

zuston commented Jul 7, 2025

PTAL @rickyma

@zuston zuston requested a review from rickyma July 8, 2025 02:39
Contributor

@rickyma rickyma left a comment

Overall LGTM

@zuston zuston requested a review from rickyma July 8, 2025 05:40
@zuston
Member Author

zuston commented Jul 8, 2025

Thanks @rickyma @jerqi. Merged.

@zuston zuston merged commit ece59ee into apache:master Jul 8, 2025
40 of 41 checks passed
@zuston zuston deleted the grpcDriver branch July 8, 2025 07:49
Development

Successfully merging this pull request may close these issues.

[Bug] GRPC connection timeout in driver side
3 participants