
[SPARK-55337][SS] Fix MemoryStream backward compatibility#54108

Open
cloud-fan wants to merge 4 commits into apache:master from cloud-fan:memory-stream-compat

Conversation

@cloud-fan
Contributor

What changes were proposed in this pull request?

This is a followup to #52402 that addresses backward compatibility concerns:

  1. Keep the original implicit SQLContext factory methods for full backward compatibility
  2. Add new overloads with explicit SparkSession parameter for new code
  3. Fix TestGraphRegistrationContext to provide implicit spark and sqlContext to avoid name shadowing issues in nested classes
  4. Remove redundant implicit val sparkSession declarations from pipeline tests that are no longer needed with the fix

Why are the changes needed?

PR #52402 changed the MemoryStream API to use an implicit SparkSession, which broke backward compatibility for code that only has an implicit SQLContext available. This followup ensures:

- Old code continues to work without modification
- New code can use SparkSession with explicit parameters (see the sketch below)
- The internal implementation uses SparkSession (the modernization from #52402)
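A minimal sketch of the two call styles, assuming a classic local SparkSession; the `MemoryStream` import path shown is the historical one and may differ across Spark versions:

```scala
import org.apache.spark.sql.{Encoders, SQLContext, SparkSession}
// Historical location of the test utility; the package may differ in newer Spark versions.
import org.apache.spark.sql.execution.streaming.MemoryStream

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._  // supplies the Encoder[Int] context bound used below

// Old style, kept by this PR: an implicit SQLContext in scope is all that is needed.
implicit val sqlContext: SQLContext = spark.sqlContext
val legacyStream = MemoryStream[Int]
legacyStream.addData(1, 2, 3)

// New style, added by this PR: pass the encoder and the SparkSession explicitly.
val explicitStream = MemoryStream(Encoders.scalaInt, spark)
explicitStream.addData(4, 5, 6)
```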

Does this PR introduce any user-facing change?

No. This maintains full backward compatibility while adding new API options.

How was this patch tested?

Existing tests pass. The API changes are additive.

Was this patch authored or co-authored using generative AI tooling?

Yes

Made with Cursor

@github-actions

github-actions bot commented Feb 3, 2026

JIRA Issue Information

=== Bug SPARK-55337 ===
Summary: Fix MemoryStream backward compatibility
Assignee: None
Status: Open
Affected: ["4.1.1"]


This comment was automatically generated by GitHub Actions


// Deprecated: Used when an implicit SQLContext is in scope
@deprecated("Use MemoryStream.apply with an implicit SparkSession instead of SQLContext", "4.1.0")
def apply[A: Encoder]()(implicit sqlContext: SQLContext): MemoryStream[A] =
Contributor Author


This is the problem. This is not backward compatible, as the previous API is `def apply[A: Encoder](implicit sqlContext: SQLContext): MemoryStream[A]` (no parentheses).

There is no way to keep both implicits, so the proposal here is to only keep the implicit SQLContext and require the SparkSession to be passed explicitly.
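For context, a standalone sketch (no Spark dependency; `Ctx` and `Sess` are stand-ins for SQLContext and SparkSession) of why two factories that differ only in their implicit parameter cannot coexist as overloads:

```scala
class Ctx
class Sess

object MemStream {
  // Two apply methods that differ only in their implicit parameter list.
  def apply[A](implicit ctx: Ctx): String = "built from an implicit Ctx"
  def apply[A](implicit sess: Sess): String = "built from an implicit Sess"
}

object CallSite {
  implicit val ctx: Ctx = new Ctx
  // MemStream[Int]  // does not compile: "ambiguous reference to overloaded definition".
  //                 // Overloads are resolved from the explicit argument lists (empty here),
  //                 // so both alternatives match before implicits are even looked up.
}
```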

cloud-fan force-pushed the memory-stream-compat branch 2 times, most recently from 5825d62 to 14f1f0f (February 3, 2026 05:57)
@cloud-fan
Contributor Author

cc @ganeshashree @HeartSaVioR

Member

@dongjoon-hyun dongjoon-hyun left a comment


@cloud-fan , we cannot create a follow-up for the released JIRA issue because your PR has a different fix version, 4.1.2 (or 4.2.0), instead of 4.1.0. Please create a new JIRA ID.

[Image: screenshot of the JIRA issue]

cloud-fan changed the title from [SPARK-53656][SS][FOLLOWUP] Improve MemoryStream backward compatibility to [SPARK-55337][SS] Fix MemoryStream backward compatibility on Feb 3, 2026
@dongjoon-hyun
Member

Thank you for getting a new JIRA ID.

cloud-fan and others added 2 commits February 3, 2026 18:54
Remove the `apply[A: Encoder](numPartitions: Int, sparkSession: SparkSession)` factory method, which creates a semantic trap: it can accidentally match calls like `MemoryStream[T](0, spark)`, interpreting the first argument as `numPartitions` instead of `id`, causing zero partitions to be created and no data to flow.

Users who need both `numPartitions` and explicit `SparkSession` can use the case class constructor directly: `new MemoryStream[A](id, sparkSession, Some(numPartitions))`.

Co-authored-by: Cursor <cursoragent@cursor.com>
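A minimal sketch of the direct-constructor route mentioned in the commit message above, assuming the three-argument case-class shape quoted there and the historical `MemoryStream` import path (neither verified against the current Spark source):

```scala
import org.apache.spark.sql.SparkSession
// Historical location of the test utility; the package may differ in newer Spark versions.
import org.apache.spark.sql.execution.streaming.MemoryStream

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._  // supplies the Encoder[Int] context bound

// The first argument (0) is unambiguously the stream id, and the partition count
// is explicit, so no overload is left to misread it as numPartitions.
val partitionedStream = new MemoryStream[Int](0, spark, Some(4))
```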
Contributor

@HeartSaVioR HeartSaVioR left a comment


+1 (one minor comment, which I think is easily addressable), but it's probably good to be clear about the point below...

Do we know the impact of this change? It seems complicated to think through the impact of this change (and of the prior change, since that turned out not to be backward compatible).

* Creates a MemoryStream with explicit encoder and SparkSession.
* Usage: `MemoryStream(Encoders.scalaInt, spark)`
*/
def apply[A](encoder: Encoder[A], sparkSession: SparkSession): MemoryStream[A] =
Contributor


I roughly remember that the intention was to discourage the usage of SQLContext. If that's the case, we probably want a way to pass the numPartitions parameter.

That said, it looks like this method (taking an explicit encoder instance) is newly added. Is there any usage of it? We don't seem to add the same to ContinuousMemoryStream and LowLatencyMemoryStream.

Contributor Author


It's used in quite a few places, like StreamingQueryManagerSuite:

  testQuietly("can start a streaming query with the same name in a different session") {
    val session2 = spark.cloneSession()

    val ds1 = MemoryStream(Encoders.INT, spark).toDS()
    val ds2 = MemoryStream(Encoders.INT, session2).toDS()

Contributor Author


I've added a new overload to specify numPartitions together with SparkSession.

intsToDF(expected)(schema))
}

test("LowPriorityMemoryStreamImplicits works with implicit sqlContext") {
Contributor


lol, we added tests in the wrong place... this file seems to be for the memory "sink", not the memory "source".

@cloud-fan
Contributor Author

> Do we know the impact of this change? It seems complicated to think through the impact of this change (and of the prior change, since that turned out not to be backward compatible).

It's an internal API, so this is not strictly a bug fix. It will break new Spark 4.1 apps that have started to use the new `def apply` with an implicit SparkSession, but not fixing it would break a lot more Spark apps that have not yet upgraded to 4.1. There is no way to support both, as they simply conflict with each other.

Contributor

@HeartSaVioR HeartSaVioR left a comment


+1 pending CI

It's unfortunate that we broke backward compatibility and that fixing it breaks things again, but I understand there is no better way. Thanks for fixing the nasty bug.

@HeartSaVioR
Contributor

Shall we rerun the CI? It's good to try again before looking into the CI failure and declaring it unrelated to this change.
