Commit 9a9c714
[SPARK-55628][SS] Integrate stream-stream join state format V4
### What changes were proposed in this pull request?
Integrate stream-stream join state format V4, which uses timestamp-based indexing with a secondary index.
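To make "timestamp-based indexing with a secondary index" concrete, here is a toy in-memory model in Scala (2.12, 2.13 or 3). It is an illustration under stated assumptions, not the V4 state store. The names `JoinRow` and `ToyTsIndexedJoinState` are hypothetical, a `mutable.Map` stands in for the primary store, and a sorted set of `(timestamp, key)` pairs stands in for a secondary index such as `TsWithKeyStore`. It shows why ordering the index by timestamp helps: watermark eviction can stop at the first non-expired entry instead of visiting every join key.

```scala
import scala.collection.mutable

// A buffered join-side row; event time is in microseconds, mirroring TimestampType.
final case class JoinRow(key: String, tsMicros: Long, value: String)

// Toy model only (hypothetical, not Spark's V4 state store):
// a primary store keyed by join key plus a secondary index ordered by time.
final class ToyTsIndexedJoinState {
  // Primary store: join key -> rows buffered for that key.
  private val primary = mutable.Map.empty[String, Vector[JoinRow]]
  // Secondary index: (timestamp, key) pairs kept in ascending order.
  private val tsIndex = mutable.TreeSet.empty[(Long, String)]

  def put(row: JoinRow): Unit = {
    primary(row.key) = primary.getOrElse(row.key, Vector.empty) :+ row
    tsIndex += ((row.tsMicros, row.key))
  }

  // Lookups by join key only touch the primary store.
  def get(key: String): Seq[JoinRow] = primary.getOrElse(key, Vector.empty)

  // Remove every row whose event time is below the watermark and return how
  // many rows were removed. Because tsIndex is sorted by timestamp, the scan
  // stops at the first entry that has not expired yet.
  def evictBefore(watermarkMicros: Long): Int = {
    val expired = tsIndex.iterator.takeWhile(_._1 < watermarkMicros).toList
    var removed = 0
    for ((ts, key) <- expired) {
      tsIndex -= ((ts, key))
      val rows = primary.getOrElse(key, Vector.empty)
      val kept = rows.filterNot(_.tsMicros == ts)
      removed += rows.size - kept.size
      if (kept.isEmpty) primary -= key else primary(key) = kept
    }
    removed
  }
}
```

Keeping both structures in sync on every write is the cost of this design; the payoff is that eviction work is proportional to the number of expired rows rather than to the total number of buffered keys.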
Key changes:
- Enable V4 in `STREAMING_JOIN_STATE_FORMAT_VERSION` config
- Gate V4 behind `spark.sql.streaming.join.stateFormatV4.enabled` while V4 is under development
- Route V4 to use VCF (`stateFormatVersion >= 3`) and hardcode schema version 3 for the VCF path
- Fix checkpoint ID routing for V4's single-store design
- Mark V4's secondary index (`TsWithKeyStore`) as `isInternal = true` to prevent double-counting in `numRowsTotal` metrics
- Convert the watermark from milliseconds to microseconds at all 4 eviction call sites (V4 stores timestamps as `TimestampType`, whose internal representation is microseconds)
- Add `TimestampAsPostfixKeyStateEncoderSpec` and `TimestampAsPrefixKeyStateEncoderSpec` to `KeyStateEncoderSpec.fromJson` so they can be deserialized when restarting from a checkpoint
- Add V4 branch in `getSchemaForStateStores` and `getSchemasForStateStoreWithColFamily` for correct column family schemas and encoder specs
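Two of the changes above are easy to model in isolation. The sketch below uses hypothetical helpers (`V4ChangeSketches`, `watermarkMsToMicros`, `shouldEvict`, `ColumnFamilyStats`, `numRowsTotal`) that are not Spark APIs. It shows why a millisecond watermark has to be scaled by 1,000 before it is compared with `TimestampType` values, and how skipping column families flagged `isInternal` keeps a secondary index from doubling the reported row count.

```scala
// Hypothetical helpers illustrating two of the V4 changes; not Spark code.
object V4ChangeSketches {
  private val MicrosPerMilli = 1000L

  // The event-time watermark is tracked in milliseconds, but TimestampType
  // values are microseconds since the epoch, so scale before comparing.
  def watermarkMsToMicros(watermarkMs: Long): Long =
    Math.multiplyExact(watermarkMs, MicrosPerMilli) // throws on overflow

  // Eviction predicate for a row whose event time is stored as TimestampType.
  def shouldEvict(rowTsMicros: Long, watermarkMs: Long): Boolean =
    rowTsMicros < watermarkMsToMicros(watermarkMs)

  // Per-column-family row count as a metrics source might see it.
  final case class ColumnFamilyStats(name: String, numRows: Long, isInternal: Boolean)

  // Internal families (e.g. a secondary index) hold entries that mirror rows
  // already counted in a primary family, so they are excluded from the total.
  def numRowsTotal(families: Seq[ColumnFamilyStats]): Long =
    families.filterNot(_.isInternal).map(_.numRows).sum
}
```

Without the conversion, a microsecond timestamp such as `1700000000000000L` is never less than a millisecond watermark such as `1700000100000L`, so no row would ever be evicted.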
### Why are the changes needed?
SPARK-55628 tracks the integration of the V4 state format into the stream-stream join operator. V4 was implemented in SPARK-55144 but not yet wired into the operator.
### Does this PR introduce _any_ user-facing change?
No. V4 is gated behind an internal config (`spark.sql.streaming.join.stateFormatVersion=4`, default remains 2). V4 is marked as experimental and subject to change.
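The gating can be modeled as a small config-resolution function. This is a hypothetical sketch, not Spark's `SQLConf` logic. The two config keys and the default of 2 come from this description; the assumptions that the V4 flag is off unless set and that requesting version 4 without it is rejected are made for illustration only. It also encodes the routing rule that versions 3 and above use virtual column families.

```scala
// Hypothetical model of the V4 gating; Spark's real checks may differ.
object JoinStateFormatGate {
  val VersionKey = "spark.sql.streaming.join.stateFormatVersion"
  val V4EnabledKey = "spark.sql.streaming.join.stateFormatV4.enabled"
  val DefaultVersion = 2 // per this PR, the default remains 2

  // Resolve the effective state format version from a plain key/value conf.
  def resolveVersion(conf: Map[String, String]): Int = {
    val version = conf.get(VersionKey).map(_.trim.toInt).getOrElse(DefaultVersion)
    // Assumption: the V4 flag is off unless explicitly set to "true".
    val v4Enabled = conf.get(V4EnabledKey).exists(_.trim.equalsIgnoreCase("true"))
    // Assumption: asking for V4 without the flag is rejected outright.
    require(version != 4 || v4Enabled, s"state format version 4 requires $V4EnabledKey=true")
    version
  }

  // Routing rule from the PR: versions >= 3 use virtual column families (VCF).
  def usesVirtualColumnFamilies(version: Int): Boolean = version >= 3
}
```

In an actual session the equivalent settings would be applied with `spark.conf.set(key, value)`; since both keys are internal and V4 is experimental, their names and behavior may change.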
### How was this patch tested?
- Added `StreamingJoinV4Suite.scala` with 4 new test suites: `StreamingInnerJoinV4Suite`, `StreamingOuterJoinV4Suite`, `StreamingFullOuterJoinV4Suite`, `StreamingLeftSemiJoinV4Suite`
- All suites re-run the existing join tests with the V4 config via the `TestWithV4StateFormat` trait
- 2 V4-specific tests: plan assertion (verifies `stateFormatVersion == 4` in execution plan) and schema validation (verifies correct column families and encoder specs)
- 94/94 tests pass across all 4 suites
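The V4 suites reuse existing tests by mixing in a trait. The sketch below shows that pattern with toy stand-ins (`JoinSuiteLike`, `ToyInnerJoinSuite`, `ToyTestWithV4StateFormat` and `ToyInnerJoinV4Suite` are all hypothetical), assuming the real `TestWithV4StateFormat` works by overriding the SQL configs that every inherited test runs with; which exact keys it sets is also an assumption here.

```scala
// Toy stand-ins for the test classes named above; not the real Spark suites.
trait JoinSuiteLike {
  def testNames: Seq[String]
  // SQL confs applied to every test; subclasses and mixins extend this.
  def sqlConfs: Map[String, String] = Map.empty
  // Stand-in for a test runner: pairs each test with the confs it would use.
  def plannedRuns: Seq[(String, Map[String, String])] = testNames.map(n => n -> sqlConfs)
}

class ToyInnerJoinSuite extends JoinSuiteLike {
  def testNames: Seq[String] = Seq("inner join: basic", "inner join: watermark eviction")
}

// Stackable trait: mixing it in re-runs every inherited test with V4 confs
// layered on top of whatever the base suite already sets.
trait ToyTestWithV4StateFormat extends JoinSuiteLike {
  override def sqlConfs: Map[String, String] = super.sqlConfs ++ Map(
    "spark.sql.streaming.join.stateFormatVersion" -> "4",
    "spark.sql.streaming.join.stateFormatV4.enabled" -> "true")
}

class ToyInnerJoinV4Suite extends ToyInnerJoinSuite with ToyTestWithV4StateFormat
```

Because the trait calls `super.sqlConfs`, Scala's trait linearization resolves that call to the base suite, so the V4 suite inherits every test unchanged and only the configuration differs.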
### Was this patch authored or co-authored using generative AI tooling?
Yes
### Behavioral Change Information
- [ ] This is a behavioral change
- [x] This is not a behavioral change
Closes #54777 from nicholaschew11/spark-55628-v4-join-integration.
Authored-by: Nicholas Chew <chew.nicky@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

1 parent 8efc4c6
File tree (6 files changed, +324 / -42 lines):
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/internal
  - core/src
    - main/scala/org/apache/spark/sql/execution/streaming
      - operators/stateful/join
      - state
    - test/scala/org/apache/spark/sql/streaming

Lines changed: 13 additions & 4 deletions
Lines changed: 20 additions & 9 deletions
Lines changed: 58 additions & 28 deletions
Lines changed: 4 additions & 0 deletions
Lines changed: 1 addition & 1 deletion