[#2537] feat(spark): Introduce option to activate small cache in grpc server #2538

Merged
zuston merged 3 commits into apache:master from zuston:grpcDriver
Jul 8, 2025
@zuston
Member

@zuston zuston commented Jul 7, 2025

What changes were proposed in this pull request?

Introduce a config option to activate the small cache in the gRPC server.

Why are the changes needed?

for #2537

When partition reassignment is enabled in the production environment, we observed that some Spark jobs failed due to gRPC request timeouts (DEADLINE_EXCEEDED). Upon investigating the Spark driver logs, we found severe GC events, indicating significant memory pressure on the driver process.

Based on PR #1780, the small cache appears to be effective in gRPC mode.

This PR aims to enable the small cache by default, because GRPC_NETTY has become the default RPC mode.

Does this PR introduce any user-facing change?

Yes.

rss.rpc.netty.smallCacheEnabled=true
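
As an illustration of the switch above, here is a minimal sketch of reading such a boolean flag with a fallback default. The `SmallCacheFlag` class and its `isEnabled` method are hypothetical names introduced for this example and are not Uniffle APIs; only the key `rss.rpc.netty.smallCacheEnabled` comes from this PR.

```java
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of Uniffle. */
final class SmallCacheFlag {
  // Key taken from the PR description.
  static final String KEY = "rss.rpc.netty.smallCacheEnabled";

  private SmallCacheFlag() {}

  /** Returns the configured value, or defaultValue when the key is absent. */
  static boolean isEnabled(Properties conf, boolean defaultValue) {
    String raw = conf.getProperty(KEY);
    // Boolean.parseBoolean treats anything other than "true" (case-insensitive) as false.
    return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
  }
}
```

A caller would populate a `Properties` object with `rss.rpc.netty.smallCacheEnabled=true` and pass it to `SmallCacheFlag.isEnabled(conf, false)`. How the key is supplied to a real Spark job (for example, whether a prefix is required) is not stated here, so consult the project docs for that.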

How was this patch tested?

Existing unit tests.

@zuston zuston linked an issue Jul 7, 2025 that may be closed by this pull request
3 tasks
@github-actions

github-actions bot commented Jul 7, 2025

Test Results

 3 072 files  ±0   3 072 suites  ±0   6h 50m 47s ⏱️ + 2m 7s
 1 190 tests ±0   1 189 ✅ ±0   1 💤 ±0  0 ❌ ±0 
15 090 runs  ±0  15 075 ✅ ±0  15 💤 ±0  0 ❌ ±0 

Results for commit b277b01. ± Comparison against base commit 77e6ab1.

♻️ This comment has been updated with latest results.

@roryqi roryqi requested a review from rickyma July 7, 2025 06:45
@roryqi
Contributor

roryqi commented Jul 7, 2025

@rickyma Could you take a look at this PR?

.booleanType()
.defaultValue(false)
.withDescription(
"The option to control whether enable the pooled byte buf allocator small cache. This is only valid for spark driver side grpc server");
Contributor

Option to control whether to enable the small cache in the pooled byte buffer allocator. This option is only applicable to the gRPC server on the Spark driver side.

Contributor

Also, we need to describe this new config in the docs.

}

private Server buildGrpcServer(int serverPort) {
boolean isClientSmallCacheEnabled =
Contributor

@rickyma rickyma Jul 7, 2025

Maybe we can also put this config into RssBaseConf, since all the other configs are there?
In that case, we should rename it to isSmallCacheEnabled so that it can be used by both clients and servers.
We would then need to update its description as well.

Member Author

Sounds good. This option could be extended to cover the #1780 requirements.
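
The idea discussed in this thread, one option declared once and consulted by both sides, could look roughly like the sketch below. Everything in it is an assumption for illustration: `SharedConfSketch` is a stand-in and not Uniffle's `RssBaseConf`, the `false` default mirrors the `.defaultValue(false)` line in the diff under review, and the cache size of 256 is a placeholder rather than a value taken from this PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a shared base configuration; not Uniffle's RssBaseConf. */
final class SharedConfSketch {
  // Key taken from the PR description.
  static final String SMALL_CACHE_ENABLED = "rss.rpc.netty.smallCacheEnabled";
  // Placeholder size used only in this sketch.
  static final int ILLUSTRATIVE_SMALL_CACHE_SIZE = 256;

  private final Map<String, String> values = new HashMap<>();

  SharedConfSketch set(String key, String value) {
    values.put(key, value);
    return this;
  }

  /** Single accessor that client and server code paths can both call. */
  boolean isSmallCacheEnabled() {
    return Boolean.parseBoolean(values.getOrDefault(SMALL_CACHE_ENABLED, "false"));
  }

  /** Assumed policy: a size of 0 stands for "small cache disabled". */
  static int smallCacheSizeFor(SharedConfSketch conf) {
    return conf.isSmallCacheEnabled() ? ILLUSTRATIVE_SMALL_CACHE_SIZE : 0;
  }
}
```

Keeping the accessor on the shared configuration means the client and the driver-side server cannot drift apart on the key name or the default.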

@zuston
Member Author

zuston commented Jul 7, 2025

PTAL @rickyma

@zuston zuston requested a review from rickyma July 8, 2025 02:39
Contributor

@rickyma rickyma left a comment

Overall LGTM

@zuston zuston requested a review from rickyma July 8, 2025 05:40
@zuston
Member Author

zuston commented Jul 8, 2025

Thanks @rickyma @jerqi. Merged.

@zuston zuston merged commit ece59ee into apache:master Jul 8, 2025
40 of 41 checks passed
@zuston zuston deleted the grpcDriver branch July 8, 2025 07:49
zuston added a commit to zuston/incubator-uniffle that referenced this pull request Sep 23, 2025
…n grpc server (apache#2538)


Development

Successfully merging this pull request may close these issues.

[Bug] GRPC connection timeout in driver side

3 participants