[SPARK-55341][SQL] Add storage level flag for cached local relations #54118

Closed
pranavdev022 wants to merge 2 commits into apache:master from
pranavdev022:add-flag-for-cached-disk-storage

Contributor

@pranavdev022 pranavdev022 commented Feb 3, 2026

What changes were proposed in this pull request?

This PR adds an internal configuration, spark.sql.artifact.cacheStorageLevel, that controls the storage level used for cached artifact blocks:

  • When set to DISK_ONLY: blocks are stored on disk only, which reduces memory pressure
  • When left at the default (MEMORY_AND_DISK_SER): blocks keep the current behavior

This allows users to opt into disk-only storage for cached artifacts when memory is constrained, while maintaining backward compatibility with the default behavior.
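
As a rough illustration of opting in, the sketch below sets the configuration on a session. It assumes a Spark build that contains this change and that the flag accepts a storage level name as a string, which is how the diff excerpt in this PR reads it. The local master, the application name, and the object name are placeholders chosen for the example.

```scala
import org.apache.spark.sql.SparkSession

object CacheStorageLevelExample {
  def main(args: Array[String]): Unit = {
    // Placeholder local session; any existing SparkSession would work the same way.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cache-storage-level-example")
      .getOrCreate()

    // Assumption: the flag takes the storage level name as a string.
    spark.conf.set("spark.sql.artifact.cacheStorageLevel", "DISK_ONLY")

    // Read the value back to confirm what the session will use.
    println(spark.conf.get("spark.sql.artifact.cacheStorageLevel"))

    spark.stop()
  }
}
```

Leaving the configuration unset keeps the default, MEMORY_AND_DISK_SER.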

Why are the changes needed?

Cached artifact blocks in ArtifactManager currently use the MEMORY_AND_DISK_SER storage level. With large artifacts, especially large local relations, this can cause memory pressure.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests verify that the default behavior (MEMORY_AND_DISK_SER) continues to work correctly. The flag is internal and defaults to MEMORY_AND_DISK_SER, so current behavior is unchanged unless the flag is set.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Feb 3, 2026

github-actions bot commented Feb 3, 2026

JIRA Issue Information

=== Task SPARK-55341 ===
Summary: Add disk only optional feature flag when adding cached blocks
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@pranavdev022 pranavdev022 force-pushed the add-flag-for-cached-disk-storage branch from 1e88816 to a73ff6f February 4, 2026 19:33
if (existingBlock == null || existingBlock.id != blockId) {
  val storageLevelStr = session.conf.get(
    SQLConf.ARTIFACT_MANAGER_CACHE_STORAGE_LEVEL)
  val storageLevel = try {
Member

Seems like you're not setting storageLevel to level. BTW, I think you don't need to do a try-catch. You can add the validation logic in conf itself with checkValues.
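
The suggestion above, validating in the conf definition instead of wrapping the read in a try-catch, can be sketched roughly as follows. This is a standalone model and not Spark's SQLConf code: ConfEntry and ConfStore are hypothetical types made up for the example, and the set of allowed values lists only the two levels named in this PR.

```scala
// Standalone model: these types are hypothetical and are not Spark's SQLConf classes.
final case class ConfEntry(key: String, default: String, validValues: Set[String]) {
  // The default itself must be one of the allowed values.
  require(validValues.contains(default), s"Default '$default' is not valid for $key")
}

final class ConfStore {
  private var values: Map[String, String] = Map.empty

  // Validation happens once, at the point where the value is set.
  def set(entry: ConfEntry, value: String): Unit = {
    val allowed = entry.validValues.mkString(", ")
    require(
      entry.validValues.contains(value),
      s"Invalid value '$value' for ${entry.key}; expected one of: $allowed")
    values = values.updated(entry.key, value)
  }

  // Readers receive a value already known to be valid, so no try-catch is needed here.
  def get(entry: ConfEntry): String = values.getOrElse(entry.key, entry.default)
}

object CheckValuesSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: only the two levels mentioned in the PR description are allowed.
    val cacheStorageLevel = ConfEntry(
      key = "spark.sql.artifact.cacheStorageLevel",
      default = "MEMORY_AND_DISK_SER",
      validValues = Set("MEMORY_AND_DISK_SER", "DISK_ONLY"))

    val store = new ConfStore
    store.set(cacheStorageLevel, "DISK_ONLY")
    println(store.get(cacheStorageLevel))
  }
}
```

With the check attached to the definition, an invalid value fails when it is set, and the code that consumes the setting can use the result directly.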

Contributor Author

@HyukjinKwon thanks for pointing these out.
I have set level to use StorageLevel and simplified the validation logic using checkValues.

Contributor

@hvanhovell hvanhovell left a comment

LGTM

@hvanhovell
Contributor

Merging to master
