[SPARK-51974][CONNECT][ML] Limit model size and per-session model cache size #50751
Conversation
Signed-off-by: Weichen Xu <[email protected]>
sql/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
@@ -61,6 +61,31 @@ private[connect] class MLCache(sessionHolder: SessionHolder) extends Logging {

  private[ml] val totalSizeBytes: AtomicLong = new AtomicLong(0)

  private[ml] val totalModelCacheSizeBytes: AtomicLong = new AtomicLong(0)
The `totalSizeBytes` above is already tracking the total size in memory. Why do we need a separate `totalModelCacheSizeBytes`?
`totalModelCacheSizeBytes` counts the in-memory plus offloaded data size, while `totalSizeBytes` counts only the in-memory size. I will rename the latter to `totalInMemorySizeBytes`.
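The two-counter scheme discussed above can be sketched as follows. This is a hypothetical simplification, not the actual `MLCache` implementation: the class name `ModelSizeTracker` and its methods are invented for illustration, and only the two counter names follow the review thread.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch of the two counters discussed in the review thread.
class ModelSizeTracker {
  // Bytes of model data currently held in driver memory.
  private[this] val totalInMemorySizeBytes = new AtomicLong(0)
  // Bytes of model data in memory plus offloaded to disk, per session.
  private[this] val totalModelCacheSizeBytes = new AtomicLong(0)

  // Caching a model adds to both counters.
  def cache(sizeBytes: Long): Unit = {
    totalInMemorySizeBytes.addAndGet(sizeBytes)
    totalModelCacheSizeBytes.addAndGet(sizeBytes)
  }

  // Offloading frees memory but the data still counts against
  // the per-session model cache total.
  def offload(sizeBytes: Long): Unit =
    totalInMemorySizeBytes.addAndGet(-sizeBytes)

  def inMemoryBytes: Long = totalInMemorySizeBytes.get()
  def cacheBytes: Long = totalModelCacheSizeBytes.get()
}
```

The split lets a memory-based eviction policy and a per-session total cap be enforced independently.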
sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLCache.scala
I think we need to reformat the error classes, otherwise LGTM
Refactored configurations: @zhengruifeng @xi-db
Only when `spark.connect.session.connectML.mlCache.memoryControl.enabled` is true do the other configurations take effect.
LGTM pending CI
Merged to master.
What changes were proposed in this pull request?
Limit model size and per-session model cache size.
Configurations:
The other configurations take effect only when `spark.connect.session.connectML.mlCache.memoryControl.enabled` is true.
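The gating described above can be sketched as follows. This is a hedged illustration, not the actual Spark Connect config code: the `MLCacheConf` case class and limit names are invented, and only the master-switch semantics follow the PR description.

```scala
// Hypothetical sketch: sub-limits apply only when the master flag is on.
object MLCacheGating {
  // Invented config holder; field names are illustrative only.
  case class MLCacheConf(
      memoryControlEnabled: Boolean,
      maxModelSizeBytes: Long,
      maxCacheSizeBytes: Long)

  // Returns the (model size, cache size) limits to enforce, or None
  // when memory control is disabled and all limits are ignored.
  def effectiveLimits(conf: MLCacheConf): Option[(Long, Long)] =
    if (conf.memoryControlEnabled)
      Some((conf.maxModelSizeBytes, conf.maxCacheSizeBytes))
    else
      None
}
```

A single master switch keeps the default behavior unchanged for existing users while letting operators opt in to the new limits.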
Why are the changes needed?
Motivation: This is for ML cache management, to avoid huge ML cache affecting Spark driver availability.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT.
Was this patch authored or co-authored using generative AI tooling?
No.