@MonkeyCanCode MonkeyCanCode commented Nov 15, 2025

Summary

Fix a memory leak in Spark caused by AuthSessionCache when using Iceberg, and ensure resources get cleaned up.

Background

I am using Spark Connect, where end users submit their Spark jobs and queries to a remote Spark Connect server. Query runtimes range from seconds to minutes, and the number of queries per user varies as well. In this case, the end users are the ones who create the Spark session and define the connection info for the Iceberg REST catalog. By default, the Spark Connect server cleans up idle sessions after one hour.

What I found interesting is that the memory used by Spark Connect is not garbage collected after the Spark Connect server kills idle sessions that have reached the default TTL. After some debugging, this pointed me to Apache Spark ClassLoaders being leaked through AuthSessionCache.java in Apache Iceberg.

Changes

  1. Fix the Apache Spark ClassLoader leak in AuthSessionCache.java
    The existing ThreadPools.newExitingWorkerPool call creates a ScheduledExecutorService and registers a JVM-level shutdown hook. This hook can inadvertently hold a strong reference to session-specific ClassLoaders in Spark Connect via the tasks it manages, which prevents them from being released. This change replaces newExitingWorkerPool with newScheduledPool, which creates a thread pool with daemon threads. Based on my understanding, daemon threads do not block the JVM from exiting, so no shutdown hook is needed and the issue above is avoided.

  2. Ensure proper resource cleanup in catalogs
    CachingCatalog and SparkCatalog now implement java.io.Closeable, which allows them to propagate the close call to the underlying wrapped catalog. This ensures that any resources referenced by the catalogs are properly released.
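
To illustrate the two changes, here is a minimal standalone sketch. It is not the Iceberg code: `newDaemonScheduledPool`, `SessionBackedCatalog`, and `CachingWrapper` are hypothetical names that stand in for the daemon-thread pool and the close propagation described above, using only JDK classes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class CatalogCleanupSketch {

  // Hypothetical helper: a scheduled pool backed by daemon threads.
  // No JVM shutdown hook is registered, so nothing JVM-global keeps
  // a reference to the pool or to the tasks it runs.
  static ScheduledExecutorService newDaemonScheduledPool(String namePrefix, int poolSize) {
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory factory = task -> {
      Thread thread = new Thread(task, namePrefix + "-" + counter.getAndIncrement());
      thread.setDaemon(true); // daemon threads do not keep the JVM alive
      return thread;
    };
    return Executors.newScheduledThreadPool(poolSize, factory);
  }

  // Runs a probe task on the pool and reports whether the worker is a daemon thread.
  static boolean runsOnDaemonThread(ScheduledExecutorService pool) {
    Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
    try {
      return pool.submit(probe).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  // Hypothetical stand-in for a catalog that owns a background executor.
  static final class SessionBackedCatalog implements Closeable {
    final ScheduledExecutorService evictor = newDaemonScheduledPool("session-evictor", 1);

    @Override
    public void close() {
      evictor.shutdownNow(); // release worker threads and queued tasks
    }

    boolean isClosed() {
      return evictor.isShutdown();
    }
  }

  // Hypothetical stand-in for a wrapping catalog: close() is forwarded
  // to the wrapped object when that object is Closeable.
  static final class CachingWrapper implements Closeable {
    private final Object delegate;

    CachingWrapper(Object delegate) {
      this.delegate = delegate;
    }

    @Override
    public void close() {
      if (delegate instanceof Closeable) {
        try {
          ((Closeable) delegate).close();
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    }
  }

  public static void main(String[] args) {
    SessionBackedCatalog inner = new SessionBackedCatalog();
    CachingWrapper wrapper = new CachingWrapper(inner);
    System.out.println("daemon worker: " + runsOnDaemonThread(inner.evictor));
    wrapper.close();
    System.out.println("inner closed: " + inner.isClosed());
  }
}
```

The point of the sketch is that once the wrapper is closed, the executor is shut down explicitly and no shutdown hook remains that could reference session-scoped objects.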

Reference

JIRA for Apache Spark: https://issues.apache.org/jira/browse/SPARK-54367
